Latent blending
Enables exceptionally smooth video transitions between prompts. The method mixes intermediate latent representations from the diffusion process to create a seamless transition.
Cost: ~$0.02 / transition
Bootup (if required): ~2 minutes
Runtime: ~10s / transition
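The core idea of mixing intermediate latents can be sketched as interpolation between two noise latents. The project's exact mixing schedule is more involved; below is only a minimal illustration using spherical linear interpolation (slerp), a common choice for Gaussian diffusion latents, with NumPy standing in for the real tensor stack:

```python
import numpy as np

def slerp(z_a, z_b, t):
    """Spherical linear interpolation between two latent vectors."""
    a = z_a / np.linalg.norm(z_a)
    b = z_b / np.linalg.norm(z_b)
    omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        # Vectors are nearly parallel; fall back to linear blending.
        return (1 - t) * z_a + t * z_b
    return (np.sin((1 - t) * omega) * z_a + np.sin(t * omega) * z_b) / np.sin(omega)

# Generate a short sequence of blended latents between two endpoints.
rng = np.random.default_rng(0)
z_a = rng.standard_normal(16)
z_b = rng.standard_normal(16)
frames = [slerp(z_a, z_b, t) for t in np.linspace(0.0, 1.0, 5)]
```

Each blended latent would then be decoded into a video frame; decoding and the per-step mixing inside the denoising loop are omitted here.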
Recommendations
caption
More ambiguity is better, as it gives the model more flexibility to find perceptual similarity. More detailed prompts lead to more motion, so you'll want to make those transitions longer.
transition_time
10 seconds is roughly the cutoff for imperceptible motion: at this length or longer, you won't notice the transition happening. The shorter the time, the more obvious the blending effect (not a bad thing!).
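The two recommendations above can be summarized as a settings dictionary. The parameter names `caption` and `transition_time` come from this document, but the surrounding request shape is an illustrative assumption, not the actual API:

```python
# Illustrative settings only; the dict structure is an assumption.
transition_request = {
    # Deliberately ambiguous captions give the model room to find
    # perceptually similar intermediates.
    "caption": ["a shape dissolving in fog", "light scattering through water"],
    # Seconds; >= 10 reads as seamless, shorter makes the blend visible.
    "transition_time": 10,
}
```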
My comments
I really liked the aesthetic of the blend in this project and decided to use it as a base for future vector-based content generation. Anyone can replicate an image of something that exists, but what new things can we generate with the right model?
Models utilized
- Diffusion: Stability AI - Stability XL Turbo
- Feature extraction (perceptual similarity): AlexNet
TODO
- [ ] Quality - Add 4x upscaling option once the processing pipeline is optimized
- [ ] Quality - Compare seed latents with LPIPS to select the closest seeds (degrades performance)
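The seed-selection idea in the last TODO item can be prototyped independently of the model stack. In the sketch below, a plain L2 distance stands in for LPIPS (AlexNet backbone) so it runs without torch; the selection logic is the part being illustrated, and the helper name `closest_seed` is my own:

```python
import numpy as np

def closest_seed(candidates, reference, distance):
    """Return the index of the candidate closest to `reference` under `distance`.

    In the real pipeline, `distance` would be LPIPS with an AlexNet
    backbone; plain L2 stands in here as a runnable placeholder.
    """
    return int(np.argmin([distance(c, reference) for c in candidates]))

def l2(a, b):
    return float(np.linalg.norm(a - b))

rng = np.random.default_rng(1)
reference = rng.standard_normal(8)
candidates = [reference + rng.standard_normal(8) for _ in range(4)]
candidates[2] = reference.copy()  # exact match, so it must win
best = closest_seed(candidates, reference, l2)  # -> 2
```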