This repo collects a dataset for dense video captioning. We focus on automatic labeling, which lets us build larger datasets in less time.
First, we present two methods for data labeling: the first uses the raw video descriptions provided by YouTube, and the second uses the videos' raw subtitles.
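The subtitle-based method relies on subtitles already carrying the timestamps that dense captioning needs. As a minimal sketch (the repo's actual pipeline is not shown here; the SRT format and the `parse_srt` helper are assumptions for illustration), raw subtitles can be converted into timestamped caption entries like this:

```python
import re

def parse_srt(srt_text):
    """Parse SRT subtitle text into (start_sec, end_sec, caption) tuples,
    which can serve as weak dense-captioning labels."""
    entries = []
    # Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm" timestamp lines.
    pattern = re.compile(
        r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
    )
    # SRT blocks are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2:
            continue
        m = pattern.search(lines[1]) or pattern.search(lines[0])
        if not m:
            continue
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, m.groups())
        start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
        end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
        # Caption text: every line that is neither an index nor a timestamp.
        text = " ".join(
            l for l in lines if not pattern.search(l) and not l.strip().isdigit()
        )
        entries.append((start, end, text))
    return entries

sample = """1
00:00:01,000 --> 00:00:04,500
A man opens the door.

2
00:00:05,000 --> 00:00:08,250
He walks into the kitchen."""

print(parse_srt(sample))
# [(1.0, 4.5, 'A man opens the door.'), (5.0, 8.25, 'He walks into the kitchen.')]
```

Each tuple pairs a temporal segment with a caption, which is exactly the label format a dense video captioning dataset needs.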
Finally, a dense video captioning model will be trained on the collected data so that its impact can be compared against other datasets.