I'm Clay, a member of the team at Replicate. In this post, I'll show you how Replicate's API allows you to easily explore many open text-to-image models.
You can follow along by downloading the accompanying Jupyter notebook here.
If you're running the notebook in Colab, it's recommended to use Firefox/Chrome.
It's wise to use a virtual environment to keep your global Python installation clean. Either venv or conda will work fine.

With venv:

python3 -m venv replicate_venv
source replicate_venv/bin/activate

Or with conda:

conda create --name replicate_venv
conda activate replicate_venv
In whatever environment you choose, install the Replicate Python client:
(replicate_venv) % pip install replicate
To use the API, you'll need an API access token, which you can get by subscribing to Replicate. You'll then authenticate with that token each time you run Python.
You should never store your API token directly in a Python file or notebook; doing so could let others gain unauthorized access to your account. Instead, set the REPLICATE_API_TOKEN variable in your shell prior to running Python:
(replicate_venv) % REPLICATE_API_TOKEN="..." python run_text2image_model.py
If you're in a Jupyter notebook, you can use getpass to read user input beneath a cell without displaying it.
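For example, you might wrap this in a small helper in a notebook cell. This is just a sketch: ensure_replicate_token is our own name, not part of the Replicate client.

```python
import os
from getpass import getpass

def ensure_replicate_token() -> str:
    """Return the Replicate API token, prompting for it if it isn't already set."""
    token = os.environ.get("REPLICATE_API_TOKEN")
    if not token:
        # getpass reads input without echoing it beneath the cell.
        token = getpass("Paste your Replicate API token: ")
        os.environ["REPLICATE_API_TOKEN"] = token
    return token
```

Because the Replicate client reads REPLICATE_API_TOKEN from the environment, setting it this way is enough to authenticate for the rest of the session.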
Assuming everything worked, you should now be able to import the replicate module. In our accompanying notebook, we also include pathlib.Path, which is needed for some inputs.

import replicate
from pathlib import Path
With a few lines of Python, you can programmatically generate an image from text.
Replicate allows us to look up models by name. For the example in our notebook, we'll use "afiaka87/glid-3-xl", a great model for generating photorealistic images. For fun, let's generate an image of an avocado lightbulb!
text2image_model = replicate.models.get("afiaka87/glid-3-xl")
Replicate models are run using the .predict method. Let's take a quick look at the named/keyword arguments for glid-3-xl; the named/keyword arguments for each model will vary. glid-3-xl requires one input:

prompt - a scene description you would like to visualize.
We also set the seed to 0. Setting a manual seed encourages a model to return the exact same output for a given set of inputs; otherwise, a seed is chosen randomly. Outputs may still differ slightly, but managing the seed is generally a good idea, and many Replicate models support it.
prediction_generator = text2image_model.predict(prompt="an image of a fresh avocado in the form of a lightbulb", seed=0)
.predict initializes the model but does not queue it to run on Replicate. To run the model, iterate over the generator.
generated_image_batches = list(prediction_generator)
final_image_batch = generated_image_batches[-1]  # ["https://...",]
print(final_image_batch)
Because we are only interested in the final, finished output the model returns, we can cast the generator to a list and grab the last batch. The final batch is a list of URLs whose size is determined by batch_size (1 by default).
The API opens up lots of possibilities, like passing the output from one model as the input to another. A common example of this is upscaling, where an image generated by one model is piped into a super-resolution model to enlarge it.
generation_to_enhance = Path(final_image_batch[0])  # There's only one URL in the list by default.
upscaling_model_api = replicate.models.get("raoumer/srrescgan")
high_res_outputs = upscaling_model_api.predict(image=generation_to_enhance)
Some text-to-image models allow you to pass in an existing image called an init image. This produces different variations of your image, with some influence from the specified prompt.
We'll use "laion-ai/ongo", a version of glid-3-xl finetuned on WikiArt. You'll need an image to create variations of. We'll use ongo to vary the image of this farmhouse:
Image inputs to the Replicate API may be a URL or local path.
init_image = Path("https://replicate.com/static/blog/exploring-text-to-image-models/farmhouse.jpeg")
It can be valuable to tweak various settings to improve model performance, or you can simply remove optional arguments to use the default values set by the model author.
init_image: pathlib.Path. An "initial image" to mix with the generation, causing the model to take influence from the provided image in addition to the specified prompt. Can also be a URL (cast as a Path).

guidance_scale: Determines how much the generation should be guided by your text.

batch_size: Integer from 1 to 12. How many variations should be produced. Low batch sizes are much faster than high batch sizes.

steps: Integer from 30 to 250. Number of discrete timesteps to run the model for. When using an init_image, the actual number of timesteps will be steps * init_skip_fraction (half as many by default). Increasing it will improve accuracy at the cost of speed.

init_skip_fraction: Decimal from 0.0 to 1.0, 0.5 by default. How much influence your image will have on the generation: 0.0 will use almost none; 1.0 will simply encode your image without influence from the model.
When in doubt, you can simply remove an argument and its default will be used instead.
ongo_text2painting_model = replicate.models.get("laion-ai/ongo")
ongo_variation_generator = ongo_text2painting_model.predict(
    prompt="professional painting of a red lakehouse in the style of monet",
    guidance_scale=10.0,  # 1.0 - 100.0
    total_steps=250,  # 30 - 250
    init_skip_fraction=0.35,  # 0.0 - 1.0
    batch_size=3,  # 1 - 12
    init_image=Path("https://replicate.com/static/blog/exploring-text-to-image-models/farmhouse.jpeg"),
    seed=0,
)
.predict returns a generator; to start the prediction, iterate over it to get your batched output URLs. We are only interested in the last batch.
ongo_variations_final_batch = list(ongo_variation_generator)[-1]
Using a batch size of 3 means we should get back 3 image URLs.
number_of_variations = len(ongo_variations_final_batch)  # should be equal to batch_size
print(ongo_variations_final_batch)  # should be a list of URLs
The output of our inference is a series of beautiful lakehouses in the style of the original farmhouse image!
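If you want local copies of the generated variations, one approach is to fetch each URL with the standard library. This is a sketch: download_batch is a hypothetical helper of our own, not part of the Replicate client, and it assumes each URL points directly at an image file.

```python
from pathlib import Path
from urllib.request import urlretrieve

def download_batch(image_urls, out_dir="ongo_variations"):
    """Download each generated image URL into out_dir and return the local paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, url in enumerate(image_urls):
        # Keep the URL's file extension if it has one; fall back to .png.
        suffix = Path(url).suffix or ".png"
        destination = out / f"variation_{i}{suffix}"
        urlretrieve(url, destination)
        saved.append(destination)
    return saved
```

You could then call download_batch(ongo_variations_final_batch) to save all three variations locally.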
Lacking inspiration? Model not outputting what you want? Text-to-image models can be great at some things yet completely fail at others. Getting them to perform the way you want without updating the weights of the network is sometimes referred to as prompt engineering. Prompt engineering is pretty difficult: we'll be releasing a blog post about our experiences with it soon.
There is a lot of opportunity for creative uses of the API. If you have any other creative ways to access models on Replicate, feel free to share on our Discord!