Caption videos
These models generate text descriptions and captions from videos. Most are large multimodal transformers trained on vast datasets that pair video content with corresponding text, such as captions, titles, and descriptions. Others transcribe a video's speech into timed captions.
Key capabilities:
- Video captioning: Produce captions that summarize a video's content and context. Useful for indexing videos, improving accessibility, and automating alt text.
- Visual question answering: Ask questions about your videos and get natural-language answers.
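Captions help with indexing once they are searchable. The following is a minimal sketch that assumes you already have one caption per video (the video IDs and caption text are made-up placeholders). It builds a simple inverted index from caption words to video IDs:

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9']+")

def build_caption_index(captions: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the IDs of videos whose caption contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for video_id, caption in captions.items():
        for word in TOKEN.findall(caption.lower()):
            index[word].add(video_id)
    return dict(index)

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return videos whose captions contain every query word (AND semantics)."""
    words = TOKEN.findall(query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical captions, standing in for output from a captioning model.
captions = {
    "vid-001": "A dog catches a frisbee in the park.",
    "vid-002": "A cat sleeps on a sunny windowsill.",
    "vid-003": "Two dogs play in the park at sunset.",
}
index = build_caption_index(captions)
print(sorted(search(index, "park")))  # ['vid-001', 'vid-003']
```

Exact word matching is deliberately naive: "dog" does not match "dogs". A real index would add stemming or embedding-based search.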
Featured models

lucataco / qwen2-vl-7b-instruct
Latest model in the Qwen family for chatting with videos and images
Updated 6 months, 3 weeks ago

shreejalmaharjan-27 / tiktok-short-captions
Generate TikTok-style captions powered by Whisper (GPU)
Updated 7 months, 3 weeks ago
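TikTok-style captions usually show only a few words at a time. The sketch below shows that grouping step, but it is not tiktok-short-captions' actual implementation. It assumes word-level timestamps shaped like the `words` entries that openai-whisper returns with `word_timestamps=True`: dicts with `word`, `start`, and `end` in seconds. The `max_words` and `max_gap` limits are hypothetical defaults.

```python
def _close_chunk(chunk: list[dict]) -> dict:
    """Collapse a run of timed words into one caption with a start and end time."""
    return {
        "text": " ".join(w["word"].strip() for w in chunk),  # Whisper words carry leading spaces
        "start": chunk[0]["start"],
        "end": chunk[-1]["end"],
    }

def group_words(words: list[dict], max_words: int = 3, max_gap: float = 0.5) -> list[dict]:
    """Start a new caption when the current one is full or a pause exceeds max_gap seconds."""
    chunks: list[dict] = []
    current: list[dict] = []
    for w in words:
        if current and (len(current) >= max_words or w["start"] - current[-1]["end"] > max_gap):
            chunks.append(_close_chunk(current))
            current = []
        current.append(w)
    if current:
        chunks.append(_close_chunk(current))
    return chunks

# Hypothetical word timings for a short clip.
words = [
    {"word": " This", "start": 0.0, "end": 0.2},
    {"word": " is", "start": 0.2, "end": 0.35},
    {"word": " a", "start": 0.35, "end": 0.4},
    {"word": " caption", "start": 0.4, "end": 0.9},
    {"word": " demo.", "start": 2.0, "end": 2.4},
]
chunks = group_words(words)
for c in chunks:
    print(f"{c['start']:.2f}-{c['end']:.2f} {c['text']}")
```

This example prints three captions: "This is a", "caption", and "demo.". The pause before "demo." forces a break even though the second caption holds only one word.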

fictions-ai / autocaption
Automatically add captions to a video
Updated 1 year, 6 months ago
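Adding captions to a video often means producing a subtitle track. The following minimal sketch writes caption segments in the SubRip (SRT) format, which uses numbered cues with `HH:MM:SS,mmm --> HH:MM:SS,mmm` time ranges. It assumes segments with `start` and `end` in seconds plus `text`. The segment data is hypothetical, and this is not how autocaption itself is implemented.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(segments: list[dict]) -> str:
    """Render numbered SRT cues: index, time range, text, separated by blank lines."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        start, end = srt_timestamp(seg["start"]), srt_timestamp(seg["end"])
        cues.append(f"{i}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)

# Hypothetical segments, e.g. from a speech-to-text model.
segments = [
    {"start": 0.0, "end": 1.5, "text": "Welcome to the demo."},
    {"start": 1.5, "end": 3.25, "text": "Captions make videos accessible."},
]
print(to_srt(segments))
```

Saved as a `.srt` file, the output can be loaded as a separate subtitle track by most players. It can also be burned into the video frames with a tool such as ffmpeg.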
Recommended models

lucataco / qwen2.5-omni-7b
Qwen2.5-Omni is an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner.
Updated 3 months, 1 week ago

lucataco / videollama3-7b
VideoLLaMA 3: Frontier Multimodal Foundation Models for Video Understanding
Updated 5 months ago

lucataco / apollo-7b
Apollo 7B - An Exploration of Video Understanding in Large Multimodal Models
Updated 6 months, 4 weeks ago

lucataco / apollo-3b
Apollo 3B - An Exploration of Video Understanding in Large Multimodal Models
Updated 6 months, 4 weeks ago

lucataco / bulk-video-caption
Video preprocessing tool for captioning multiple videos using GPT, Claude, or Gemini
Updated 7 months ago
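Captioning many videos is mostly a fan-out problem. The sketch below assumes a hypothetical `caption_video(path)` function that wraps whichever model or API you use. It is a stand-in for illustration, not part of bulk-video-caption. The batch runs calls concurrently in a thread pool and records per-video failures instead of aborting the whole batch.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

def caption_batch(
    paths: list[str], caption_video: Callable[[str], str], max_workers: int = 4
) -> tuple[dict[str, str], dict[str, str]]:
    """Caption every path concurrently; return (captions, errors), both keyed by path."""
    captions: dict[str, str] = {}
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(caption_video, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                captions[path] = future.result()
            except Exception as exc:  # one bad video should not sink the batch
                errors[path] = repr(exc)
    return captions, errors

# Hypothetical stand-in for a real captioning call (e.g. a request to a hosted model).
def fake_caption(path: str) -> str:
    if path.endswith(".broken"):
        raise ValueError("unreadable video")
    return f"Caption for {path}"

captions, errors = caption_batch(["a.mp4", "b.mp4", "c.broken"], fake_caption)
print(sorted(captions))  # ['a.mp4', 'b.mp4']
print(errors)
```

Threads suit this workload because each call mostly waits on network I/O. Keep `max_workers` within the rate limits of the API you call.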

chenxwh / cogvlm2-video
CogVLM2: Visual Language Models for Image and Video Understanding
Updated 9 months, 3 weeks ago

cuuupid / qwen2-vl-2b
SOTA open-source model for chatting with videos and the newest model in the Qwen family
Updated 10 months, 2 weeks ago

lucataco / qwen-vl-chat
A multimodal LLM-based AI assistant trained with alignment techniques. Qwen-VL-Chat supports more flexible interaction, such as multi-round question answering, along with creative capabilities.
Updated 1 year, 9 months ago