Caption videos

These models generate text descriptions and captions from videos. They use large multimodal transformers trained on vast datasets that include both video content and corresponding text, such as captions, titles, and descriptions.

Key capabilities:

  • Video captioning: Produce captions that summarize a video's content and context. Useful for indexing videos, improving accessibility, and automating alt text.
  • Visual question answering: Generate natural language answers to questions about your videos.

Frequently asked questions

Which models are the fastest?

If you’re after quick turnaround for short clips, lucataco/qwen2-vl-7b-instruct is a strong choice—it’s designed to process short videos efficiently while maintaining descriptive accuracy.

Another practical option for speed is fictions-ai/autocaption, which is optimized for adding captions to videos and performs well for quick runs where ultra-low latency isn’t critical.

Which models offer the best balance of cost and quality?

If you want good quality without excessive compute, lucataco/qwen2-vl-7b-instruct strikes a great balance. It supports detailed video understanding and performs well for most captioning and summarization tasks.

For more complex videos that require deeper reasoning or multiple scenes, lucataco/apollo-7b offers a richer understanding with slightly higher compute tradeoffs.

What works best for adding stylized captions to social videos?

For social-style captioning—bold overlays, subtitles, and engaging visuals—fictions-ai/autocaption is purpose-built. It lets you upload a video and receive an output with clean, readable captions.

You can customize font, color, and subtitle placement, making it ideal for short-form content like Reels or TikToks.
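If you prefer to script this rather than use the web form, the sketch below uses the Replicate Python client. The input field names (video_file_input, font, color, subs_position) and their values are guesses at autocaption's schema, so check the model's API tab on Replicate for the exact parameters it accepts.

    # Sketch: stylized overlay captions via the Replicate Python client.
    # Input names below are assumptions; verify them against the model's schema.
    import replicate

    output = replicate.run(
        "fictions-ai/autocaption",
        input={
            "video_file_input": open("reel.mp4", "rb"),  # assumed name for the video input
            "font": "Poppins-Bold",                      # assumed styling options
            "color": "white",
            "subs_position": "bottom75",
        },
    )
    print(output)  # typically a URL or file handle for the captioned video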

What works best for scene-level description or video understanding?

If your goal is to generate textual descriptions of what’s happening in a video (instead of just overlaying captions), lucataco/qwen2-vl-7b-instruct supports video input and produces detailed visual reasoning outputs.
This makes it useful for accessibility captions, summaries, or content indexing.
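As a rough sketch of what that looks like programmatically, the call below uses the Replicate Python client; the "media" and "prompt" input names are assumptions, so confirm them against the model's schema before relying on them.

    # Sketch: scene-level description with a vision-language model.
    import replicate

    description = replicate.run(
        "lucataco/qwen2-vl-7b-instruct",
        input={
            "media": open("clip.mp4", "rb"),  # assumed input name for the video
            "prompt": "Describe what happens in this video, scene by scene.",
        },
    )
    print(description)  # plain text you can store as alt text or an index entry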

What’s the difference between key subtypes or approaches in this collection?

There are two main types of models here:

  • Overlay caption models (e.g., autocaption): These take a video file and add subtitles directly to the output, ideal for ready-to-publish content.
  • Vision-language models (e.g., qwen2-vl-7b-instruct): These interpret the visuals and generate descriptive text about what’s happening in the video. They offer more flexibility but may require post-processing.

What kinds of outputs can I expect from these models?

Overlay caption models typically output a video with burned-in subtitles, and sometimes a transcript file as well.

Vision-language models usually output text responses—scene descriptions, summaries, or even conversational answers about the video content.
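In practice that means your code needs to handle two shapes of result. The helpers below assume the Replicate Python client, where an overlay model usually returns a URL (or file-like object) for the rendered video and a vision-language model returns text, sometimes as a list of streamed chunks.

    # Handling the two common output shapes.
    import urllib.request

    def save_video(output_url: str, path: str = "captioned.mp4") -> None:
        # Download a rendered, subtitled video to disk.
        urllib.request.urlretrieve(output_url, path)

    def join_text(output) -> str:
        # Some models stream text as a list of chunks; others return one string.
        return "".join(output) if isinstance(output, (list, tuple)) else str(output)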

How can I self-host or push a model to Replicate?

Many captioning and vision-language models are open source and can be self-hosted using Cog or Docker.

To publish your own model, define a cog.yaml file (build environment and dependencies) and a predict.py file (inputs and outputs), push it to Replicate with cog push, and it'll run on managed GPUs.
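Here is a minimal predict.py sketch using Cog's Python API. Only the BasePredictor, Input, and Path pieces come from Cog; load_captioning_model and the caption call are placeholders for whatever model you wrap.

    # predict.py: minimal Cog predictor sketch for a video captioning model.
    from cog import BasePredictor, Input, Path


    class Predictor(BasePredictor):
        def setup(self) -> None:
            # Load weights once, when the container starts.
            self.model = load_captioning_model()  # placeholder for your model loader

        def predict(
            self,
            video: Path = Input(description="Video to caption"),
            prompt: str = Input(default="Describe this video."),
        ) -> str:
            # Run the wrapped model and return a text caption.
            return self.model.caption(str(video), prompt)  # placeholder call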

Can I use these models for commercial work?

Yes—most models in this collection allow commercial use, but always check the License section on the model’s page for specific terms.

If you’re adding captions to copyrighted content, ensure you have the right to modify and distribute that media.

How do I use or run these models?

Go to a model’s page on Replicate, upload your video, and click Run.
Models like fictions-ai/autocaption return a captioned video, while lucataco/qwen2-vl-7b-instruct and lucataco/apollo-7b generate text outputs that you can format or display however you like.

What should I know before running a job in this collection?

  • Longer or high-resolution videos require more compute, so trim clips when possible (see the trimming sketch after this list).
  • If you need timestamped captions (like .srt or .vtt), confirm that the model supports transcript output.
  • Vision-language models currently focus on visual reasoning—some don’t interpret audio yet, so spoken dialogue might not be included in results.
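If you have ffmpeg installed, trimming a clip before upload is quick; the stream-copy flag avoids re-encoding. This is a general-purpose sketch, not tied to any particular model.

    # Keep only the first 30 seconds of a clip without re-encoding.
    import subprocess

    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", "input.mp4",
            "-ss", "0",        # start time in seconds
            "-t", "30",        # duration to keep
            "-c", "copy",      # copy streams instead of re-encoding
            "trimmed.mp4",
        ],
        check=True,
    )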

Any other collection-specific tips or considerations?

  • For batch processing, use models like lucataco/bulk-video-caption to handle multiple videos efficiently (a simple batching loop is sketched after this list).
  • For social media workflows, choose a model that supports subtitle styling and automatic line breaks.
  • For accessibility or archival tasks, consider combining both types: overlay captions for the video and descriptive text from a vision-language model.
  • Always review generated captions or descriptions—models can miss nuance, subtle action, or audio-only context.
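For the batch case, a plain loop over the Replicate Python client is often enough. The model name and input fields below are assumptions; substitute whichever captioning model and schema you actually use.

    # Sketch: caption every clip in a folder with one vision-language model.
    import pathlib
    import replicate

    for clip in sorted(pathlib.Path("clips").glob("*.mp4")):
        result = replicate.run(
            "lucataco/qwen2-vl-7b-instruct",   # assumed model choice
            input={
                "media": open(clip, "rb"),     # assumed input name
                "prompt": "Write a one-sentence caption for this video.",
            },
        )
        text = "".join(result) if isinstance(result, (list, tuple)) else str(result)
        print(f"{clip.name}: {text}")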