lucataco / bulk-video-caption

Video preprocessing tool that captions multiple videos using GPT, Claude, or Gemini

  • Public
  • 118 runs
  • CPU
  • GitHub
  • License

Input

*file

ZIP archive containing videos to process

boolean

Whether to include CSV in output

Default: true

string

Optional prefix for video captions

Default: ""

string

Optional suffix for video captions

Default: ""
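A minimal sketch of how a prefix and suffix might be combined with a generated caption. The function name `apply_affixes` is hypothetical, and joining with a single space is an assumption; the page does not say which separator the model uses.

```python
def apply_affixes(caption: str, prefix: str = "", suffix: str = "") -> str:
    """Wrap a caption with an optional prefix and suffix.

    Assumption: parts are joined with single spaces and empty parts are
    skipped. The actual model may join them differently.
    """
    parts = [prefix.strip(), caption.strip(), suffix.strip()]
    return " ".join(part for part in parts if part)


if __name__ == "__main__":
    print(apply_affixes("A dog runs across a beach.", prefix="TOK,"))
```

With both defaults left as `""`, the caption passes through unchanged.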

integer

Number of frames to extract from each video for analysis

Default: 2
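One plausible way to pick which frames to extract is to sample them evenly across the clip. The sketch below is an assumption about the sampling strategy, not the model's confirmed behavior, and `sample_frame_indices` is a hypothetical helper name.

```python
def sample_frame_indices(total_frames: int, frame_count: int = 2) -> list[int]:
    """Return evenly spaced frame indices for a video of total_frames frames.

    The video is split into frame_count equal segments and the midpoint of
    each segment is chosen, so the first and last frames (often black or
    transitional) are avoided. Integer arithmetic keeps results deterministic.
    """
    if total_frames <= 0 or frame_count <= 0:
        return []
    # Never request more frames than the video contains.
    frame_count = min(frame_count, total_frames)
    return [
        ((2 * i + 1) * total_frames) // (2 * frame_count)
        for i in range(frame_count)
    ]


if __name__ == "__main__":
    print(sample_frame_indices(100))      # default of 2 frames
    print(sample_frame_indices(10, 3))
```

Higher frame counts give the captioning model more context about motion, at the cost of larger requests to the provider.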

string

System prompt for caption generation

Default: "\n Analyze these frames from a video and write a detailed caption. \n Describe the type of video (e.g., animation, live-action footage, etc.).\n Focus on consistent elements across frames and any notable motion or action.\n Describe the main subjects, setting, and overall mood of the video.\n Use clear, descriptive language suitable for text-to-video generation.\n "

string

AI model to use for captioning

Default: "gpt-4o"
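Because only one of the three API keys is needed for a given model, a caller may want to check which provider a model name belongs to before submitting a job. The sketch below assumes provider can be inferred from the name prefix; `provider_for_model` is hypothetical, and every model name other than `gpt-4o` is illustrative rather than a confirmed option.

```python
def provider_for_model(model: str) -> str:
    """Guess the provider for a model name from its prefix.

    Assumption: OpenAI names start with "gpt", Anthropic names with
    "claude", and Google names with "gemini".
    """
    name = model.strip().lower()
    prefixes = {"gpt": "openai", "claude": "anthropic", "gemini": "google"}
    for prefix, provider in prefixes.items():
        if name.startswith(prefix):
            return provider
    raise ValueError(f"Cannot infer provider for model {model!r}")


if __name__ == "__main__":
    print(provider_for_model("gpt-4o"))
```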

secret

A secret has its value redacted after being sent to the model.

API key for OpenAI

secret

API key for Anthropic

secret

API key for Google Generative AI

Output

Run time and cost

This model runs on CPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

Batch video caption

A Cog model for batch video captioning using AI models from OpenAI, Anthropic, and Google

Features

  • Process multiple videos from a ZIP archive
  • Supports MOV and MP4 files
  • Customizable caption prefixes and suffixes
  • Support for multiple AI models:
    • OpenAI: GPT-4 and variants
    • Anthropic: Claude-3.5, Claude-3 variants
    • Google: Gemini-1.5 variants
  • Flexible system prompts
  • Error handling and retry mechanism
  • Output as a ZIP file containing captions whose filenames match the video filenames, plus an optional CSV summary
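The output packaging in the last feature can be sketched with the standard library. This is an illustration under stated assumptions, not the model's actual code: `build_caption_zip`, the `.txt` extension, the `captions.csv` filename, and the CSV column names are all hypothetical.

```python
import csv
import io
import zipfile
from pathlib import PurePosixPath


def build_caption_zip(captions: dict[str, str], include_csv: bool = True) -> bytes:
    """Pack captions into an in-memory ZIP, one text file per source file.

    captions maps a source filename (e.g. "clip.mp4") to its caption.
    Each caption is stored under the same base name with a .txt extension.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for source_name, caption in captions.items():
            txt_name = PurePosixPath(source_name).with_suffix(".txt").name
            archive.writestr(txt_name, caption)
        if include_csv:
            # Optional summary: one row per source file.
            table = io.StringIO()
            writer = csv.writer(table)
            writer.writerow(["filename", "caption"])
            writer.writerows(captions.items())
            archive.writestr("captions.csv", table.getvalue())
    return buffer.getvalue()


if __name__ == "__main__":
    data = build_caption_zip({"clip.mp4": "A dog runs across a beach."})
    print(zipfile.ZipFile(io.BytesIO(data)).namelist())
```

Keeping the caption's base name identical to its source file is what lets downstream training tools pair each clip with its text.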