Learn how to package a trained model using Cog and deploy it with Replicate's API.
Pushing models to Replicate is currently in closed beta while we iron out the wrinkles. If you don't have an invite, come talk to us in Discord or send an email to firstname.lastname@example.org and tell us what you're thinking of making.
Before using the API, you'll need to subscribe for $10/mo to cover the cost of the GPUs.
Next you'll create a page for your model on Replicate. Visit replicate.com/create to choose a name for your model, and specify whether it should be public or private.
Cog is an open source tool that makes it easy to put a machine learning model in a Docker container. Run this to install it:
sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`
sudo chmod +x /usr/local/bin/cog
To configure your project for use with Cog, you'll need to add two files:
cog.yaml defines system requirements, Python package dependencies, etc.
predict.py describes the prediction interface for your model
Use the cog init command to generate these files in your project:
cd path/to/your/model
cog init
The cog.yaml file defines everything that needs to be installed for your model to run. You can think of it as a simple way of defining a Docker image. For example:
build:
  python_version: "3.8"
  python_packages:
    - "torch==1.7.0"
This will generate a Docker image with Python 3.8 and PyTorch 1.7 installed, along with various sensible best practices.
To use GPUs, add the gpu: true option to the build section of your cog.yaml:
build:
  gpu: true
  # ...
Cog will use the nvidia-docker base image and automatically figure out what versions of CUDA and cuDNN to use based on the version of Python, PyTorch, and TensorFlow that you are using.
To run a command inside this environment, prefix it with cog run:
$ cog run python
✓ Building Docker image from cog.yaml... Successfully built 8f54020c8981
Running 'python' in Docker with the current directory mounted as a volume...
────────────────────────────────────────────────────────────────────────────────────────
Python 3.8.10 (default, May 12 2021, 23:32:14)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
This is handy for ensuring a consistent environment for development or training.
With cog.yaml, you can also install system packages and other things. Take a look at the full reference to see what else you can do.
The next step is to update predict.py to define the interface for running predictions on your model. The predict.py generated by cog init looks something like this:
from cog import BasePredictor, Path, Input
import torch

class Predictor(BasePredictor):
    def setup(self):
        """Load the model into memory to make running multiple predictions efficient"""
        self.net = torch.load("weights.pth")

    def predict(self,
            image: Path = Input(description="Image to enlarge"),
            scale: float = Input(description="Factor to scale image by", default=1.5)
    ) -> Path:
        """Run a single prediction on the model"""
        # ... pre-processing ...
        output = self.net(input)
        # ... post-processing ...
        return output
Edit your predict.py file and fill in the functions with your own model's setup and prediction code. You might need to import parts of your model from another file.
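The split between setup() and predict() can be illustrated without Cog at all. The following is a minimal sketch, assuming a stand-in class (SketchPredictor is hypothetical and is not Cog's BasePredictor) and a made-up "weights" dictionary in place of a real model. It only shows the idea that expensive loading happens once while predictions reuse the loaded state.

```python
class SketchPredictor:
    """Stand-in that mimics the setup/predict split; not Cog's BasePredictor."""

    def __init__(self):
        self.setup_calls = 0
        self.weights = None

    def setup(self):
        # Expensive, one-time work belongs here. A real model would load
        # its weights from disk; this sketch uses a hypothetical dict.
        self.setup_calls += 1
        self.weights = {"gain": 2.0}

    def predict(self, value: float, scale: float = 1.5) -> float:
        # Cheap, per-request work that reuses whatever setup() loaded.
        return value * self.weights["gain"] * scale


predictor = SketchPredictor()
predictor.setup()  # run once at startup
results = [predictor.predict(v) for v in (1.0, 2.0, 3.0)]  # run many times
```

Keeping the loading step out of predict() is what makes repeated predictions cheap: each call only pays for the computation itself.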
You also need to define the inputs to your model as arguments to the predict() function, as demonstrated above. Each argument needs a type annotation. The supported types are:
str: a string
int: an integer
float: a floating point number
bool: a boolean
cog.File: a file-like object representing a file
cog.Path: a path to a file on disk
You can provide more information about the input with the Input() function, as shown above. It takes these basic arguments:
description: A description of what to pass to this input for users of the model
default: A default value to set the input to. If this argument is not passed, the input is required. If it is explicitly set to None, the input is optional.
ge: For int or float types, the value should be greater than or equal to this number.
le: For int or float types, the value should be less than or equal to this number.
choices: For str or int types, a list of possible values for this input.
There are some more advanced options you can pass, too. For more details, take a look at the prediction interface documentation.
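To make the rules for defaults, bounds, and allowed values concrete, here is a minimal sketch of the checks they imply. It is not Cog's implementation: resolve_input and the _MISSING sentinel are hypothetical names invented for this illustration, and the keyword names ge, le, and choices are chosen to mirror the Input() arguments.

```python
_MISSING = object()  # sentinel: tells "no default given" apart from default=None


def resolve_input(name, value=_MISSING, *, default=_MISSING,
                  ge=None, le=None, choices=None):
    """Hypothetical helper: apply the default, then check the constraints."""
    if value is _MISSING:
        if default is _MISSING:
            # No value and no default: the input is required.
            raise ValueError(f"{name} is required")
        value = default
    if value is None:
        # default=None marks the input as optional; nothing to validate.
        return None
    if ge is not None and value < ge:
        raise ValueError(f"{name} must be >= {ge}")
    if le is not None and value > le:
        raise ValueError(f"{name} must be <= {le}")
    if choices is not None and value not in choices:
        raise ValueError(f"{name} must be one of {list(choices)}")
    return value


scale = resolve_input("scale", default=1.5, ge=1.0, le=4.0)  # falls back to 1.5
mask = resolve_input("mask", default=None)                   # optional, unset
```

In a real predict.py you would declare these constraints through Input() and let Cog enforce them; the sketch only spells out the behaviour the descriptions above suggest.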
Next, add the line predict: "predict.py:Predictor" to your cog.yaml, so it looks something like this:
build:
  python_version: "3.8"
  python_packages:
    - "torch==1.7.0"
predict: "predict.py:Predictor"
To test that this works, try running a prediction on the model:
$ cog predict -i image=@input.jpg
✓ Building Docker image from cog.yaml... Successfully built 664ef88bc1f4
✓ Model running in Docker image 664ef88bc1f4
Written output to output.png
To pass more inputs to the model, you can add more -i options:
$ cog predict -i image=@input.jpg -i scale=2.0
In this case it is just a number, not a file, so you don't need the @ prefix.
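On the cog predict command line, each -i option is a name=value pair, and a value that starts with @ refers to a file on disk rather than a literal. The sketch below shows one way such pairs could be interpreted. It is an illustration only, not Cog's actual parser: parse_inputs is a hypothetical function, and the file name input.jpg is a placeholder.

```python
from pathlib import Path


def parse_inputs(pairs):
    """Hypothetical parser for name=value pairs as passed to -i."""
    parsed = {}
    for pair in pairs:
        name, sep, raw = pair.partition("=")
        if not sep:
            raise ValueError(f"expected name=value, got {pair!r}")
        # A leading @ marks the value as a path to a file;
        # anything else is kept as a plain string.
        parsed[name] = Path(raw[1:]) if raw.startswith("@") else raw
    return parsed


inputs = parse_inputs(["image=@input.jpg", "scale=2.0"])
```

Plain values stay as strings in this sketch; converting "2.0" to a float would be driven by the type annotation on the matching predict() argument.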
Now that you've configured your model for use with Cog, it's time to publish it to the Replicate registry:
cog login
cog push r8.im/your-username/your-model
Note: You can also set the image property in your cog.yaml file. This allows you to run cog push without specifying the image, and also makes your Replicate model page more discoverable for folks reading your model's source code.
Once you've pushed your model to Replicate, it will be visible on the website, and you can use the web-based form to run predictions using your model.
To run predictions in the cloud from your code, you can use the Python client library.
Install it from pip:
pip install replicate
Authenticate by setting your token in the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<token>
Then, you can use it from your Python code:
$ python
>>> import replicate
>>> model = replicate.models.get("replicate/hello-world")
>>> model.predict(text="python")
"hello python"
To pass files as input, use a file handle or URL:
>>> model = replicate.models.get("replicate/resnet")
>>> model.predict(image=open("mystery.jpg", "rb"))
# or...
>>> model.predict(image="https://example.com/mystery.jpg")
URLs are more efficient if your file is large or is already hosted somewhere in the cloud.
If your model returns a file, it will be represented as a URL in the output. To fetch these files securely, pass an Authorization: Token <token> header, as documented in the HTTP API reference. (We are working on a better Python API for fetching files.)
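Until that API exists, one way to download an output file is to attach the header yourself using only the standard library. This is a sketch under assumptions: build_file_request and download_output are hypothetical helper names, the URL and token shown are placeholders, and the token is assumed to live in the REPLICATE_API_TOKEN environment variable used by the Python client.

```python
import os
import shutil
import urllib.request


def build_file_request(url, token):
    """Build a GET request carrying the Authorization: Token <token> header."""
    return urllib.request.Request(
        url, headers={"Authorization": f"Token {token}"}
    )


def download_output(url, dest, token=None):
    """Stream the file at `url` to `dest` (hypothetical helper)."""
    # Assumption: the token is stored in REPLICATE_API_TOKEN.
    token = token or os.environ["REPLICATE_API_TOKEN"]
    request = build_file_request(url, token)
    with urllib.request.urlopen(request) as response, open(dest, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return dest


# Placeholder URL and token, used only to show the header being set.
request = build_file_request("https://example.com/output.png", "my-token")
```

In practice you would call download_output() with a URL taken from a prediction's output; the final line only builds a request and performs no network access.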
For more details, see the full documentation on GitHub.
You can also run your model with the raw HTTP API. See the HTTP API reference for more details.