Exploring text to image models
Posted by @afiaka87 and @rossjillian
I’m Clay, a member of the team at Replicate. In this post, I’ll show you how Replicate allows you to easily explore many open text to image models.
You can follow along by downloading the accompanying Jupyter notebook here
If you’re running the notebook in Colab, it’s recommended to use Firefox/Chrome.
Install
It’s wise to use a virtual environment to keep your global Python installation clean. Either venv or conda will work fine:
python3 -m venv replicate_venv
source replicate_venv/bin/activate
or
conda create --name replicate_venv python pip
conda activate replicate_venv
In whatever environment you choose, install Replicate’s Python client.
(replicate_venv) % pip install replicate
Login
To use the API, you’ll need an API access token. You can get a token by subscribing to Replicate. Then, you’ll be able to log in to Replicate using your API token each time you need to run Python.
You should never store your API token directly in a Python file or notebook, since this would let others gain unauthorized access to your account. Instead, set the REPLICATE_API_TOKEN variable in your shell before running Python:
(replicate_venv) % REPLICATE_API_TOKEN="..." python run_text2image_model.py
If you’re in a Jupyter notebook, you can use getpass to receive user input beneath a cell without displaying it.
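A minimal sketch of that notebook flow, assuming the client reads the token from the REPLICATE_API_TOKEN environment variable as in the shell example above. The helper name ensure_api_token and its parameters are made up for illustration; only getpass.getpass and os.environ come from the standard library.

```python
import os
from getpass import getpass


def ensure_api_token(env=os.environ, ask=getpass):
    """Return the token from `env`, prompting for it (without echo) if missing.

    Hypothetical helper: `env` and `ask` are parameters only so the function
    can be exercised without a real prompt.
    """
    token = env.get("REPLICATE_API_TOKEN")
    if not token:
        token = ask("Replicate API token: ").strip()
        if not token:
            raise ValueError("No API token was provided.")
        env["REPLICATE_API_TOKEN"] = token
    return token
```

Calling ensure_api_token() with no arguments in a notebook cell prompts beneath the cell and keeps the value in the process environment for the rest of the session, so the token never appears in the notebook itself.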
Assuming everything worked, you should be able to import the replicate module now. In our accompanying notebook, we also import pathlib.Path, which is needed for some inputs.
import replicate
from pathlib import Path
Generate an image from text
With a few lines of Python, you can generate an image from text.
Replicate allows us to look up models by f"{username}/{model_name}" with replicate.models.get.
For the example in our notebook, we’ll use “afiaka87/glid-3-xl”, a great model for generating photorealistic images. For fun, let’s generate an image of an avocado lightbulb!
model = replicate.models.get("afiaka87/glid-3-xl")
version = model.versions.get("d74db2a276065cf0d42fe9e2917219112ddf8c698f5d9acbe1cc353b58097dab")
Models on Replicate are run using the .predict method. Let’s take a quick look at the named/keyword arguments for .predict.
The named/keyword arguments for each model will vary. glid-3-xl requires one input, prompt: a scene description you would like to visualize.
We also set the seed to 0. Setting a manual seed will encourage models to return the exact same output for a set of inputs. Otherwise, a seed will be chosen randomly. Outputs may still differ slightly, but managing a seed is generally a good idea and lots of models on Replicate have support for it.
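To see why a fixed seed helps, here is a small stand-in that uses Python’s own random module rather than a model on Replicate. fake_generation is a hypothetical function; it only illustrates that the same seed reproduces the same pseudo-random draws, while leaving the seed unset lets a fresh one be chosen each time.

```python
import random


def fake_generation(prompt, seed=None):
    """Stand-in for a model call: returns pseudo-random 'pixels'.

    This is NOT how glid-3-xl works internally; it only mimics seeded
    sampling. The prompt is accepted but unused in this sketch.
    """
    rng = random.Random(seed)  # seed=None lets Python pick its own seed
    return [rng.randint(0, 255) for _ in range(8)]


first = fake_generation("an avocado lightbulb", seed=0)
second = fake_generation("an avocado lightbulb", seed=0)
print(first == second)  # same seed, same draws
```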
prediction_generator = version.predict(prompt="an image of a fresh avocado in the form of a lightbulb", seed=0)
Calling .predict simply initializes the model, but does not queue it to be run on Replicate. To run the model, iterate over the generator.
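This is ordinary Python generator laziness. The sketch below uses a hypothetical stand-in, fake_predict, instead of the Replicate client: nothing inside the generator body runs until the first item is requested.

```python
events = []


def fake_predict(prompt):
    """Hypothetical stand-in for version.predict: yields batches of URLs."""
    events.append("started")  # runs only once iteration begins
    for step in range(3):
        yield [f"https://example.invalid/{step}.png"]


generator = fake_predict("an avocado lightbulb")
print(events)  # still empty: creating the generator did no work

batches = list(generator)  # iterating is what drives the work
print(events)
print(len(batches))
```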
generated_image_batches = list(prediction_generator)
final_image_batch = generated_image_batches[-1] # ["https://...",]
print(final_image_batch)
Because we are only interested in the final, finished output the model returns, we can just cast the generator to a list and grab the last (-1) element.
The final batch is a list of URLs whose length is determined by batch_size (1 by default).
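Casting to a list keeps every intermediate batch in memory. If only the last one matters, collections.deque with maxlen=1 is one alternative. This is a sketch with a hypothetical stand-in generator, not something the notebook does.

```python
from collections import deque


def fake_batches(batch_size=1, updates=4):
    """Hypothetical stand-in that yields `updates` batches of placeholder URLs."""
    for update in range(updates):
        yield [f"https://example.invalid/{update}-{i}.png" for i in range(batch_size)]


def last_item(iterable):
    """Consume `iterable` and return its final item, keeping only one in memory."""
    tail = deque(iterable, maxlen=1)
    if not tail:
        raise ValueError("the iterable produced no items")
    return tail[0]


final_batch = last_item(fake_batches(batch_size=2))
print(final_batch)
```

The explicit error for an empty iterable replaces the IndexError that list(...)[-1] would raise in the same situation.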
Enhance an image
The API opens up lots of possibilities, like passing the output from one model as the input to another. A common example of this is upscaling, where an image generated by one model is piped into a super-resolution model to enlarge it.
We’ll use “raoumer/srrescgan” to upscale our image of an avocado lightbulb, but there are many upscaling models on Replicate that you can explore.
generation_to_enhance = Path(final_image_batch[0]) # There's only one URL in the list by default.
upscaling_model_api = replicate.models.get("raoumer/srrescgan")
high_res_outputs = upscaling_model_api.predict(image=generation_to_enhance)

Create variations of an image
Some text-to-image models allow you to pass in an existing image called an init image. This produces different variations of your image, with some influence from the specified prompt.
We’ll use “laion-ai/ongo”, a version of glid-3-xl finetuned on WikiArt.
You’ll need an image to create variations of. We’ll use ongo to vary the image of this farmhouse:
Image inputs to Replicate may be a URL or local path.
init_image = Path("https://d31rfu1d3w8e4q.cloudfront.net/static/blog/exploring-text-to-image-models/farmhouse.jpeg")
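One detail worth checking in your own environment before wrapping a URL in Path: pathlib collapses repeated slashes, so the string form no longer matches the original URL. The sketch below shows only that standard-library behavior (using PurePosixPath so the result does not depend on the operating system); how the Replicate client handles such a value is outside what it demonstrates. looks_like_url is a hypothetical helper and the URL is a placeholder.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def looks_like_url(value):
    """Hypothetical helper: True if `value` has an http(s) scheme and a host."""
    parsed = urlparse(str(value))
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


url = "https://example.invalid/farmhouse.jpeg"  # placeholder URL
as_path = PurePosixPath(url)

print(str(as_path))  # the "//" after the scheme is collapsed to "/"
print(looks_like_url(url))
print(looks_like_url(as_path))
```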
It can be valuable to tweak various settings to improve model performance, or you can simply remove optional arguments to use the default values set by the model author.
init_image: A pathlib.Path “initial image” to mix with the generation, causing the model to take influence from the provided image in addition to the specified prompt. Can also be a URL (cast as a Path).
guidance_scale: Determines how much the generation should be guided by your text.
batch_size: Integer from 1 to 12. How many variations should be produced. Low batch sizes are much faster than high batch sizes.
steps: Integer from 30 to 250. Number of discrete timesteps to run the model for. When using an init_image, the actual number of timesteps will be steps * init_skip_fraction (half as many by default). Increasing will improve accuracy at the cost of performance.
init_skip_fraction: Decimal from 0.0 to 1.0, 0.5 by default. How much influence your image will have on the generation. 0.0 will use almost none, 1.0 will simply encode your image without influence from the model.
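The arithmetic and ranges described above can be sketched as follows. check_settings and effective_steps are hypothetical helpers that restate the description; the formula steps * init_skip_fraction is taken from the text as written, and truncating the result to an integer is an assumption.

```python
def check_settings(batch_size, steps, init_skip_fraction):
    """Raise ValueError if a setting falls outside the documented ranges."""
    if not 1 <= batch_size <= 12:
        raise ValueError("batch_size must be between 1 and 12")
    if not 30 <= steps <= 250:
        raise ValueError("steps must be between 30 and 250")
    if not 0.0 <= init_skip_fraction <= 1.0:
        raise ValueError("init_skip_fraction must be between 0.0 and 1.0")


def effective_steps(steps, init_skip_fraction=0.5):
    """Timesteps actually run with an init image, per the description.

    Truncating to an int is an assumption made for this sketch.
    """
    check_settings(batch_size=1, steps=steps, init_skip_fraction=init_skip_fraction)
    return int(steps * init_skip_fraction)


print(effective_steps(250))  # default fraction of 0.5 gives half: 125
print(effective_steps(100, init_skip_fraction=0.25))
```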
When in doubt, you can simply remove an argument and its default will be used instead.
model = replicate.models.get("laion-ai/ongo")
version = model.versions.get("1b3cd15121ec450baa71bbbdacddef9217519f12ca12ccfef36eeaa20ad89b9d")
ongo_variation_generator = version.predict(
    prompt="professional painting of a red lakehouse in the style of monet",
    guidance_scale=10.0,  # 1.0 - 100.0
    total_steps=250,  # 30-250
    init_skip_fraction=0.35,  # 0.0 - 1.0
    batch_size=3,  # 1 - 12
    init_image=Path("https://d31rfu1d3w8e4q.cloudfront.net/static/blog/exploring-text-to-image-models/farmhouse.jpeg"),
    seed=0,
)
Recall that .predict simply returns a generator. To start the prediction, you must iterate over it to get your final batch of output URLs. We are only interested in the last batch.
ongo_variations_final_batch = list(ongo_variation_generator)[-1]
Using a batch size of 3 means we should get back 3 image URLs.
number_of_variations = len(ongo_variations_final_batch)  # should be equal to batch_size
print(ongo_variations_final_batch)  # should be a list of URLs
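If you want to keep the results, one option is to give each returned URL a local filename and download it. A minimal sketch, assuming the URLs are directly downloadable; local_name and save_all are hypothetical helpers, and the example URLs are placeholders rather than real outputs.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def local_name(url, index, prefix="variation"):
    """Build a filename such as 'variation_0.png', keeping the URL's extension."""
    suffix = Path(urlparse(url).path).suffix or ".png"  # fallback is an assumption
    return f"{prefix}_{index}{suffix}"


def save_all(urls, directory="outputs"):
    """Download every URL into `directory` and return the saved paths."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        destination = target / local_name(url, index)
        urlretrieve(url, destination)  # network access happens here
        saved.append(destination)
    return saved


example_urls = ["https://example.invalid/a/out-0.png", "https://example.invalid/a/out-1.jpeg"]
print([local_name(url, i) for i, url in enumerate(example_urls)])
```

With a real batch, save_all(ongo_variations_final_batch) would write one file per variation into the outputs directory.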
The output of our inference is a series of beautiful lakehouses in the style of the original farmhouse image!
Explore
Lacking inspiration? Model not outputting what you want? Text-to-image models can be great at some things and fail completely at others. Getting them to perform the way you want without updating the weights of the network is sometimes referred to as prompt engineering. Prompt engineering is pretty difficult: we’ll be releasing a blog post about our experiences with prompt engineering soon.
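One simple way to start exploring prompts is to combine a few subjects and styles and compare the results side by side. A small sketch; prompt_grid and the example phrases are illustrative only, and each prompt would still need to be passed to a model’s .predict call.

```python
from itertools import product


def prompt_grid(subjects, styles, template="{subject} {style}"):
    """Return one prompt per (subject, style) pair, in a stable order."""
    return [
        template.format(subject=subject, style=style)
        for subject, style in product(subjects, styles)
    ]


prompts = prompt_grid(
    subjects=["professional painting of a red lakehouse", "photo of an avocado lightbulb"],
    styles=["in the style of monet", "at sunset"],
)
for prompt in prompts:
    print(prompt)
```

Keeping the seed fixed while varying only the prompt makes it easier to attribute differences in the output to the wording.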
There is a lot of opportunity for creative uses of the API. If you have any other creative ways to access models on Replicate, feel free to share on our Discord!