Description: VATEX is a large-scale, high-quality multilingual dataset for video-and-language research.
Tasks: video captioning, video description, multilingual video captioning, video-based machine translation
A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research.
Captions are provided in both English and Chinese.
826K captions for 41.3K video clips.