
Caption videos

These models generate text descriptions and captions from videos. They use large multimodal transformers trained on vast datasets that include both video content and corresponding text, such as captions, titles, and descriptions.

Key capabilities:

  • Video captioning: Produce captions that summarize a video's contents and context. Useful for indexing videos, improving accessibility, and automating alt text for videos.
  • Visual question answering: Generate natural-language answers to questions about a video. Ask questions about your videos.
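Both capabilities come down to sending a video plus a text prompt to a model and reading back text. The sketch below illustrates that flow in Python. It assumes the models are hosted on Replicate (the `owner / name` identifiers and run counts suggest this) and called through the `replicate` Python client's `replicate.run`. The input keys `video` and `prompt`, and the helper names `build_caption_input`, `caption_video`, and `_to_text`, are hypothetical. Each model defines its own input schema, and some models may require a `owner/name:version` reference, so check the model's API page before relying on this.

```python
from collections.abc import Iterable
from typing import Any, Callable, Optional

DEFAULT_PROMPT = "Describe this video in detail."


def build_caption_input(video_url: str, prompt: str = DEFAULT_PROMPT) -> dict:
    """Build an input payload for a captioning / video QA model.

    The keys "video" and "prompt" are ASSUMPTIONS: each model defines its own
    input schema, so adjust them to match the model you call.
    """
    if not video_url:
        raise ValueError("video_url must be non-empty")
    return {"video": video_url, "prompt": prompt}


def _to_text(output: Any) -> str:
    """Normalize model output: some text models return a string, others yield chunks."""
    if isinstance(output, str):
        return output
    if isinstance(output, Iterable):
        return "".join(str(chunk) for chunk in output)
    return str(output)


def caption_video(
    model_ref: str,
    video_url: str,
    prompt: str = DEFAULT_PROMPT,
    run: Optional[Callable[..., Any]] = None,
) -> str:
    """Caption a video (default prompt) or ask a question about it (custom prompt).

    `run` defaults to replicate.run, which needs the `replicate` package and a
    REPLICATE_API_TOKEN environment variable; pass a stub to test offline.
    """
    if run is None:
        import replicate  # imported lazily so the helpers work without the package

        run = replicate.run
    output = run(model_ref, input=build_caption_input(video_url, prompt))
    return _to_text(output)


if __name__ == "__main__":
    # Offline demo: a stub runner that streams two text chunks, like a streaming LLM.
    def fake_run(ref: str, input: dict) -> Any:
        return iter(["A dog ", "runs along a beach."])

    print(caption_video("lucataco/apollo-7b", "https://example.com/clip.mp4", run=fake_run))
```

Switching between captioning and visual question answering only changes the prompt, for example `prompt="What happens after the person enters the room?"`. Injecting `run` keeps the network call at the edge, so the payload construction and output handling can be checked without an API token.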

Recommended models

lucataco / minicpm-v-4

MiniCPM-V 4.0 has strong image and video understanding performance

Updated 1 month ago

111 runs

lucataco / qwen2.5-omni-7b

Qwen2.5-Omni is an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner.

Updated 5 months, 1 week ago

13.1K runs

lucataco / videollama3-7b

VideoLLaMA 3: Frontier Multimodal Foundation Models for Video Understanding

Updated 7 months ago

7.8K runs

lucataco / apollo-7b

Apollo 7B - An Exploration of Video Understanding in Large Multimodal Models

Updated 8 months, 4 weeks ago

106.9K runs

lucataco / apollo-3b

Apollo 3B - An Exploration of Video Understanding in Large Multimodal Models

Updated 8 months, 4 weeks ago

140 runs

lucataco / bulk-video-caption

Video preprocessing tool for captioning multiple videos using GPT, Claude, or Gemini

Updated 9 months ago

129 runs

chenxwh / cogvlm2-video

CogVLM2: Visual Language Models for Image and Video Understanding

Updated 11 months, 3 weeks ago

661.8K runs

cuuupid / qwen2-vl-2b

State-of-the-art open-source model for chatting with videos; at the time of its release, the newest model in the Qwen family

Updated 1 year ago

589 runs

lucataco / qwen-vl-chat

A multimodal, LLM-based AI assistant trained with alignment techniques. Qwen-VL-Chat supports more flexible interaction, such as multi-round question answering, along with creative capabilities.

Updated 1 year, 10 months ago

825.5K runs