🐣 Follow https://twitter.com/camenduru
🔥 Discord server https://discord.gg/k5BwmmvJJU
Potat 1️⃣
First Open-Source 1024x576 Text To Video Model 🥳 https://huggingface.co/camenduru/potat1
Info
- Prototype Model
- Trained with https://lambdalabs.com
- 2197 clips, 68388 frames tagged with salesforce/blip2-opt-6.7b-coco
- train_steps: 10000
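
For a rough sense of scale, the clip and frame counts listed above can be turned into an average number of tagged frames per clip. This is only arithmetic on the two figures given here; it says nothing about how frames were sampled from each clip.

```python
# Figures copied from the Info list; nothing else is assumed.
clips = 2197
tagged_frames = 68388

# Average number of tagged frames per clip.
frames_per_clip = tagged_frames / clips
print(f"{frames_per_clip:.1f} tagged frames per clip on average")
```

That works out to roughly 31 tagged frames per clip.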
Dataset & Config
https://huggingface.co/camenduru/potat1_dataset/tree/main
Finetuning
- https://github.com/Breakthrough/PySceneDetect
- https://github.com/ExponentialML/Video-BLIP2-Preprocessor
- https://github.com/ExponentialML/Text-To-Video-Finetuning
- https://github.com/camenduru/Text-To-Video-Finetuning-colab
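
The first tool in the list, PySceneDetect, splits source videos at cut points, which is one plausible way to obtain clips for captioning and finetuning. The sketch below is illustrative only: the exact settings used for this model are not documented here, so the `threshold` and `min_seconds` values are assumptions, and `keep_long_scenes` and `detect_scenes` are hypothetical helper names, not part of any of the linked repositories.

```python
def keep_long_scenes(scenes, min_seconds=1.0):
    """Keep only (start, end) pairs, in seconds, lasting at least min_seconds.

    Hypothetical helper; the 1.0 s default is an assumption, not a value
    taken from the potat1 training setup.
    """
    return [(start, end) for start, end in scenes if end - start >= min_seconds]


def detect_scenes(video_path, threshold=27.0):
    """Return scene boundaries of a video as (start, end) pairs in seconds."""
    # Imported lazily so keep_long_scenes works without PySceneDetect installed.
    from scenedetect import detect, ContentDetector

    # detect() returns a list of (start, end) FrameTimecode pairs.
    scene_list = detect(video_path, ContentDetector(threshold=threshold))
    return [(start.get_seconds(), end.get_seconds()) for start, end in scene_list]


if __name__ == "__main__":
    import sys

    # Usage: python split_scenes.py my_video.mp4
    if len(sys.argv) > 1:
        for start, end in keep_long_scenes(detect_scenes(sys.argv[1])):
            print(f"{start:.2f}s -> {end:.2f}s")
```

The resulting time ranges could then be cut into clips and passed to Video-BLIP2-Preprocessor for captioning before training with Text-To-Video-Finetuning.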