
mehdidc/feed_forward_vqgan_clip

Feed forward VQGAN-CLIP model
Readme

Feed forward VQGAN-CLIP model, where the goal is to eliminate the need to optimize the latent space of VQGAN for each input prompt. This is done by training a model that takes a text prompt as input and returns the corresponding VQGAN latent, which is then decoded into an RGB image. The model is trained on a dataset of text prompts and can be used on unseen prompts. The loss minimizes the distance between the CLIP features of the generated image and the CLIP features of the input text. Additionally, a diversity loss can be used to increase the diversity of the images generated for the same prompt.
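
To make the objective concrete, below is a minimal sketch of one training step. It assumes a feed-forward text-to-latent network `net` and a pretrained, frozen VQGAN with a `decode` method; these names, and the cosine-distance formulation, are illustrative stand-ins rather than this repository's actual API.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)

def training_step(net, vqgan, prompts):
    # Encode the text prompts with frozen CLIP.
    tokens = clip.tokenize(prompts).to(device)
    with torch.no_grad():
        text_feats = F.normalize(clip_model.encode_text(tokens).float(), dim=-1)

    # Feed-forward prediction: text features -> VQGAN latent.
    z = net(text_feats)

    # Decode the latent into RGB images with the frozen VQGAN decoder
    # (`vqgan.decode` is an assumed interface).
    images = vqgan.decode(z)

    # Re-encode the generated images with CLIP. CLIP's input pixel
    # normalization is omitted here for brevity.
    images = F.interpolate(images, size=(224, 224), mode="bilinear",
                           align_corners=False)
    image_feats = F.normalize(clip_model.encode_image(images).float(), dim=-1)

    # Minimize the cosine distance between image and text features.
    return (1.0 - (image_feats * text_feats).sum(dim=-1)).mean()
```

A diversity term could be layered on top of this, for example by drawing several latents per prompt (via a noise input to `net`) and penalizing how similar the resulting images are to one another.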

Acknowledgements
