Caption videos

These models generate text descriptions and captions for videos. They use large multimodal transformers trained on datasets that pair video content with corresponding text, such as captions, titles, and descriptions.

Key capabilities:

  • Video captioning: Produce captions that summarize a video's content and context. Useful for indexing videos and for accessibility, such as automating alt text for videos.
  • Visual question answering: Generate natural language answers to questions about videos. Ask questions about your videos.