zsxkib / create-video-dataset

Easily create video datasets with auto-captioning for Hunyuan-Video LoRA finetuning

  • Public
  • 523 runs
  • L40S
  • GitHub
  • License

Input

  • string - YouTube/video URL to process. Leave empty if uploading a file. Note: URL takes precedence if both URL and file are provided.
  • file - Video file to process. Leave empty if using URL. Ignored if URL is provided.
  • string - Scene detection method: 'content' (fast cuts), 'adaptive' (camera movement), or 'threshold' (fades). Default: "content"
  • number - Minimum scene length in seconds. Default: 1
  • number - Maximum scene length in seconds. Default: 10
  • integer - Number of scenes to extract (0 = all detected scenes). Default: 4
  • number - Target frame rate (e.g. 24, 30). Set to -1 to keep the original fps. Default: 24
  • number - Start time in seconds for video processing. Default: 0
  • number - End time in seconds for video processing. Set to 0 to process until the end. Default: 0
  • boolean - Automatically skip the first 10 seconds (typical intro). Default: false
  • boolean - Generate scene previews without creating the full dataset. Default: false
  • string - Video quality preset: 'fast' (lower quality, smaller files), 'balanced', or 'high' (best quality, larger files). Default: "balanced"
  • boolean - Let the AI generate a caption for your video. If false, you must provide custom_caption. Default: true
  • string - Caption style: 'minimal' (short), 'detailed' (longer descriptions), or 'custom'. Default: "detailed"
  • string - Your custom caption. Required if caption_style is 'custom' or autocaption is false.
  • string - Trigger word to include in captions (e.g., TOK, STYLE3D). Added at the start of the caption. Default: "TOK"
  • string - Text to add BEFORE the caption. Example: 'a video of'
  • string - Text to add AFTER the caption. Example: 'in a cinematic style'
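The sketch below shows one way to invoke the model from the Replicate Python client. It is a minimal example, not the canonical usage: the input keys are taken from the README's parameter table (video_url, autocaption, trigger_word, autocaption_prefix, autocaption_suffix), the placeholder URL is hypothetical, and the shape of the returned output (assumed here to be a link to the generated dataset zip) may differ.

```python
# Minimal sketch using the Replicate Python client (pip install replicate).
# Input keys follow the README's parameter table; the output is assumed to be
# a reference (e.g. a URL) to the generated dataset zip.
import replicate

output = replicate.run(
    "zsxkib/create-video-dataset",
    input={
        "video_url": "https://example.com/clip.mp4",  # or pass "video_file" instead
        "autocaption": True,
        "trigger_word": "TOK",
        "autocaption_prefix": "a video of",
        "autocaption_suffix": "in a cinematic style",
    },
)
print(output)  # inspect the returned dataset artifact
```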

Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

Create Video Dataset

A tool to easily prepare video datasets with automatic captioning for AI training. It processes videos from URLs or local files, generates captions using QWEN-VL, and packages everything into a training-ready format.

Features

  • 🎥 Process YouTube URLs or local video files
  • 🤖 Automatic video captioning using QWEN-VL
  • ✍️ Support for custom captions
  • 🏷️ Configurable trigger words for training
  • 📝 Prefix/suffix support for caption formatting
  • 🗃️ Clean output in zip format

Input Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| video_url | YouTube/video URL to process | None |
| video_file | Local video file to process | None |
| trigger_word | Training trigger word (e.g., TOK, STYLE3D) | "TOK" |
| autocaption | Use AI to generate captions | True |
| custom_caption | Your custom caption (required if autocaption=False) | None |
| autocaption_prefix | Text to add before captions | None |
| autocaption_suffix | Text to add after captions | None |
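The trigger word, prefix, and suffix are combined with the generated (or custom) caption into a single caption line per clip. The sketch below illustrates one plausible ordering inferred from the parameter descriptions (trigger word first, then prefix, caption, suffix); the helper name and the example caption are hypothetical, not the tool's actual implementation.

```python
# Hypothetical illustration of how a caption line might be assembled from
# trigger_word, autocaption_prefix, the caption, and autocaption_suffix.
def build_caption(caption: str,
                  trigger_word: str = "TOK",
                  prefix: str = "",
                  suffix: str = "") -> str:
    parts = [trigger_word, prefix, caption, suffix]
    return " ".join(p.strip() for p in parts if p.strip())

print(build_caption("a man walking through a neon-lit city at night",
                    prefix="a video of",
                    suffix="in a cinematic style"))
# -> "TOK a video of a man walking through a neon-lit city at night in a cinematic style"
```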

Output

The tool produces a zip file containing:

  • Processed video file
  • Caption files (.txt) for each video
  • Proper directory structure for training
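
Once downloaded, the archive can be inspected with the standard library, as in the sketch below. The file name "dataset.zip" and the pairing mentioned in the comment are illustrative assumptions; the exact layout depends on how many scenes were detected.

```python
# Unpack and list the dataset zip; names in the comment are illustrative.
import zipfile

with zipfile.ZipFile("dataset.zip") as zf:
    for name in zf.namelist():
        print(name)  # e.g. a scene .mp4 next to its matching .txt caption
    zf.extractall("dataset/")
```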