mehdidc/feed_forward_vqgan_clip

Feed forward VQGAN-CLIP model
54,808 runs

Performance

This model runs predictions on Nvidia T4 GPU hardware.

80% of predictions complete within 4 seconds.

Readme

A feed-forward VQGAN-CLIP model whose goal is to eliminate the need to optimize the VQGAN latent space separately for each input prompt. This is done by training a model that takes a text prompt as input and returns a VQGAN latent representation, which is then transformed into an RGB image. The model is trained on a dataset of text prompts and can be used on unseen text prompts. The loss function minimizes the distance between the CLIP features of the generated image and the CLIP features of the input text. Additionally, a diversity loss can be used to increase the diversity of the images generated for the same prompt.
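
The training objective described above can be illustrated with a minimal sketch in plain Python. This is not the repository's implementation: it assumes cosine distance as the distance measure and mean pairwise similarity as the diversity term, and the names clip_loss, diversity_loss, total_loss and diversity_weight are hypothetical. Feature vectors are plain lists of floats standing in for real CLIP embeddings.

```python
import math


def normalize(v):
    """Scale a feature vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def clip_loss(image_features, text_features):
    """Mean cosine distance between paired image and text features.

    Assumption: the "distance" in the readme is taken to be 1 - cosine similarity.
    """
    pairs = list(zip(image_features, text_features))
    return sum(1.0 - cosine_similarity(i, t) for i, t in pairs) / len(pairs)


def diversity_loss(image_features):
    """Mean pairwise cosine similarity among images generated for one prompt.

    Assumption: lower similarity means more diverse images, so minimizing
    this value pushes the generated images apart.
    """
    n = len(image_features)
    if n < 2:
        return 0.0
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += cosine_similarity(image_features[i], image_features[j])
            count += 1
    return total / count


def total_loss(image_features, text_features, diversity_weight=0.0):
    """Text-image matching loss plus an optional weighted diversity term."""
    loss = clip_loss(image_features, text_features)
    if diversity_weight > 0.0:
        loss += diversity_weight * diversity_loss(image_features)
    return loss
```

In this sketch, setting diversity_weight to 0.0 recovers the plain text-image matching loss; a positive weight additionally penalizes generated images whose features are close to each other.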

Acknowledgements
