genmoai / mochi-1-lora-trainer

a-r-r-o-w/cogvideox-factory for Mochi-1 LoRA Training


Input

file

A zip file containing the video snippets that will be used for training. We recommend a minimum of 12 videos of only a few seconds each. If you include captions, provide one .txt file per video; e.g., video-1.mp4 should have a caption file named video-1.txt.
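That layout can be assembled with a few lines of Python (a minimal sketch; the my_clips directory and archive name are placeholders):

```python
import zipfile
from pathlib import Path

# Bundle each video and its matching caption (if any) into one archive.
# "my_clips" is a placeholder directory holding video-1.mp4, video-1.txt, etc.
clips = Path("my_clips")
with zipfile.ZipFile("training_data.zip", "w") as zf:
    for video in sorted(clips.glob("*.mp4")):
        zf.write(video, arcname=video.name)
        caption = video.with_suffix(".txt")
        if caption.exists():  # captions are optional but recommended
            zf.write(caption, arcname=caption.name)
```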

boolean

Automatically trim and crop video inputs

Default: true

integer
(minimum: 10, maximum: 6000)

Number of training steps. Recommended range: 500-4000.

Default: 100

number

Learning rate. If you're new to training, you probably don't need to change this.

Default: 0.0004

number
(minimum: 0.01, maximum: 1)

Caption dropout. If you're new to training, you probably don't need to change this.

Default: 0.1

integer

Batch size. You can leave this as 1.

Default: 1

string

Optimizer to use for training. Supports: adam, adamw.

Default: "adamw"

boolean

Compile the transformer

Default: false

integer
(minimum: 0, maximum: 100000)

Seed for reproducibility. You can leave this as 42.

Default: 42

string

Hugging Face repository ID, if you'd like to upload the trained LoRA to Hugging Face. For example, lucataco/mochi-lora-vhs. If the given repo does not exist, a new public repo will be created.

secret

A secret has its value redacted after being sent to the model.

Hugging Face token, if you'd like to upload the trained LoRA to Hugging Face.
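Taken together, a training run can be started through the Replicate Python client along these lines. This is a sketch: the listing above omits the actual input names, so every key below is an assumption inferred from the field descriptions.

```python
import replicate

# All input keys are assumed names inferred from the schema descriptions;
# check the model's API tab for the real ones before running.
output = replicate.run(
    "genmoai/mochi-1-lora-trainer",  # pin a specific version hash in practice
    input={
        "input_videos": open("training_data.zip", "rb"),  # assumed key
        "trim_and_crop": True,         # assumed key
        "steps": 1000,                 # recommended range 500-4000
        "learning_rate": 0.0004,
        "caption_dropout": 0.1,
        "batch_size": 1,
        "optimizer": "adamw",          # supports: adam, adamw
        "compile_transformer": False,  # assumed key
        "seed": 42,
        # Optional Hugging Face upload:
        # "hf_repo_id": "your-username/mochi-lora-example",  # assumed key
        # "hf_token": "hf_...",                              # assumed key
    },
)
print(output)
```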


Run time and cost

This model runs on NVIDIA H100 GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

About

A Cog implementation of a-r-r-o-w/cogvideox-factory for Mochi-1 LoRA training

How to use

You must include a zip file of .mov/.mp4 video snippets. It is recommended, but not required, to include a caption for each video in a separate txt file. Note that a lack of captions may hurt fine-tuning quality.
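Before zipping, a quick check like this can flag clips that are missing captions (a sketch; the my_clips directory name is a placeholder):

```python
from pathlib import Path

# Warn about any video without a matching caption file; missing
# captions are allowed but may hurt fine-tuning quality.
clips = Path("my_clips")
for video in sorted(clips.iterdir()):
    if video.suffix.lower() in {".mp4", ".mov"}:
        if not video.with_suffix(".txt").exists():
            print(f"warning: no caption for {video.name}")
```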

Feel free to use any video captioning model to caption your videos; for a list of models, see our Caption Video collection. Captions should be fairly detailed, ideally more than 50 words per video.
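For instance, a caption can be generated and length-checked like this (a sketch; the lucataco/apollo-7b reference and its video input key are assumptions based on the captioner named later in this README):

```python
import replicate
from pathlib import Path

# The model reference and input key are assumptions; swap in whichever
# video captioning model you prefer from the Caption Video collection.
caption = str(replicate.run(
    "lucataco/apollo-7b",                        # assumed model reference
    input={"video": open("video-1.mp4", "rb")},  # assumed input key
))
if len(caption.split()) < 50:
    print("caption is under 50 words; consider asking for more detail")
Path("video-1.txt").write_text(caption)
```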

Make sure each input video is no longer than 2 seconds. Use this tool to help you split up video files: video-split
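If you'd rather split videos locally, ffmpeg's segment muxer does the same job (a sketch assuming ffmpeg is installed and source.mp4 is your input):

```python
import subprocess

# Cut source.mp4 into 2-second pieces: clip-000.mp4, clip-001.mp4, ...
# "-c copy" avoids re-encoding, so each cut lands on the nearest keyframe.
subprocess.run([
    "ffmpeg", "-i", "source.mp4",
    "-f", "segment", "-segment_time", "2",
    "-c", "copy", "-reset_timestamps", "1",
    "clip-%03d.mp4",
], check=True)
```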

Example use case: VHS effect

Let's say we want to train a VHS video-effect LoRA that looks like this: Luis Quintero Noise-Video

  • First, caption the video with apollo-7b or any other video captioning model

  • Now that you have a caption txt file, split the large video into smaller 2.5-second snippets with this tool: mochi1-video-split

  • Now train your VHS LoRA using the same settings as this training run: mochi-lora-vhs

  • Finally, test your LoRA with the LoRA Explorer to see the effect! Be sure to use words similar to those in your caption txt files. Here is an example run of the VHS LoRA

Example Trained LoRAs:

Under the Examples tab you will see Mochi-1 LoRAs trained with this model.

How to Run your LoRA:

Once you have uploaded your LoRA file to a Hugging Face repository, you can try it out with the Mochi-1 LoRA Explorer model.
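For example (a sketch; the explorer's model reference and input keys are assumptions, so check its input schema for the actual names):

```python
import replicate

# Run a test generation with the trained LoRA. The model reference and
# input keys below are assumptions, not the explorer's confirmed schema.
output = replicate.run(
    "lucataco/mochi-1-lora-explorer",  # assumed model reference
    input={
        "prompt": "a city street at night, VHS tape artifacts, analog noise",
        "lora_url": "your-username/mochi-lora-example",  # assumed key
        "seed": 42,
    },
)
print(output)
```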