These models generate text descriptions and captions from videos. They use large multimodal transformers trained on vast datasets that include both video content and corresponding text, such as captions, titles, and descriptions.
Key capabilities:
Featured models


lucataco/qwen2-vl-7b-instruct
Latest model in the Qwen family for chatting with videos and images
Updated 10 months ago
260.6K runs

shreejalmaharjan-27/tiktok-short-captions
Generate TikTok-style captions powered by Whisper (GPU)
Updated 11 months ago
195.9K runs


fictions-ai/autocaption
Automatically add captions to a video
Updated 1 year, 9 months ago
60.1K runs
Recommended Models


lucataco/minicpm-v-4
MiniCPM-V 4.0 has strong image and video understanding performance
Updated 2 months, 2 weeks ago
228 runs


lucataco/qwen2.5-omni-7b
Qwen2.5-Omni is an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner.
Updated 6 months, 3 weeks ago
15.3K runs


lucataco/videollama3-7b
VideoLLaMA 3: Frontier Multimodal Foundation Models for Video Understanding
Updated 8 months, 1 week ago
8.3K runs


lucataco/apollo-7b
Apollo 7B - An Exploration of Video Understanding in Large Multimodal Models
Updated 10 months, 1 week ago
117.8K runs


lucataco/apollo-3b
Apollo 3B - An Exploration of Video Understanding in Large Multimodal Models
Updated 10 months, 1 week ago
142 runs


lucataco/bulk-video-caption
Video Preprocessing tool for captioning multiple videos using GPT, Claude or Gemini
Updated 10 months, 1 week ago
163 runs


chenxwh/cogvlm2-video
CogVLM2: Visual Language Models for Image and Video Understanding
Updated 1 year, 1 month ago
663.9K runs


cuuupid/qwen2-vl-2b
SOTA open-source model for chatting with videos and the newest model in the Qwen family
Updated 1 year, 1 month ago
601 runs


lucataco/qwen-vl-chat
A multimodal LLM-based AI assistant trained with alignment techniques. Qwen-VL-Chat supports more flexible interaction, such as multi-round question answering, and creative capabilities.
Updated 2 years ago
825.5K runs