Introducing a new Cog runtime

We are introducing a new implementation of Cog’s production runtime component. This is the part of Cog responsible for predictor schema validation, prediction execution and HTTP serving.

tl;dr:

If you’re a model author and want to try out the new runtime, make sure you’re on Cog >= 0.16.0 and add build.cog_runtime: true to cog.yaml:

build:
    # Enable new Cog runtime implementation
    cog_runtime: true

Most existing models should work as is, with a few exceptions. If you hit one of them, follow the messages printed by cog to update your code. The sections below explain why these changes are necessary.

Note that:

  • The experimental training interface is not supported yet.
  • This new runtime will become the default in a future Cog release, after which the existing one will be deprecated.

Why build this?

The existing Cog runtime was written in Python and relies heavily on Pydantic and several other libraries when performing predictions. This leads to several problems:

  • Dependency issues: many Python libraries pull in conflicting versions of common dependencies, e.g. Pydantic. This causes runtime errors, sometimes triggered merely by rebuilding the image, which can pull in a newer version of a dependency. Removing all Python dependencies from the Cog runtime gives you full control of your model’s dependency graph.
  • Ambiguous predictor interface: we relied on Pydantic for checking predictor input and output types, which can be ambiguous and error prone, e.g. allowing types that may be handled incorrectly by other parts of our ecosystem or user code. It’s also hard to support custom data types due to potentially incompatible Pydantic versions, i.e. v1 vs v2.
  • Error handling: since the Cog HTTP server and the predictor are both Python code running via multiprocessing, it’s hard to tell platform errors (Cog) apart from application errors (the predictor). A model crash may leave the server in a bad state with no useful logging.
  • Performance: certain things are hard to implement correctly and efficiently in Python, e.g. async HTTP handling, file upload and download, concurrency, serialization.

To tackle these problems, we re-implemented the runtime part of Cog with the following components:

  • Schema validation in pure Python via the standard inspect module, with no Pydantic or any other dependency
  • Decoupled HTTP server rewritten in Go
  • Custom, pluggable data serialization
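
To give a sense of what dependency-free schema validation can look like, here is a minimal sketch that derives an input schema from a function signature using only inspect and typing. It is an illustration under our own assumptions, not Cog’s actual implementation: describe_inputs, JSON_TYPES and the schema layout are hypothetical, and only typing.Optional style unions are handled.

```python
import inspect
from typing import Any, Dict, Optional, Union, get_args, get_origin, get_type_hints

# Hypothetical mapping from Python types to JSON-schema-style type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_inputs(fn) -> Dict[str, Dict[str, Any]]:
    """Build a simple input schema from a function's type hints."""
    hints = get_type_hints(fn)
    schema = {}
    for name, param in inspect.signature(fn).parameters.items():
        if name == "self":
            continue
        if name not in hints:
            raise TypeError(f"input {name!r} has no type annotation")
        tp = hints[name]
        optional = False
        # Optional[T] is Union[T, None]; unwrap it to find the underlying type.
        if get_origin(tp) is Union:
            args = [a for a in get_args(tp) if a is not type(None)]
            if len(args) != 1 or len(args) == len(get_args(tp)):
                raise TypeError(f"unsupported union type for input {name!r}")
            tp, optional = args[0], True
        if tp not in JSON_TYPES:
            raise TypeError(f"unsupported type for input {name!r}: {tp}")
        schema[name] = {
            "type": JSON_TYPES[tp],
            # Required only if it is not Optional and has no default value.
            "required": not optional and param.default is inspect.Parameter.empty,
        }
    return schema


def predict(prompt: str, steps: int = 20, seed: Optional[int] = None) -> str:
    return prompt


print(describe_inputs(predict))
```

Because the check rejects anything it does not recognize, unsupported or unannotated inputs fail loudly when the schema is built rather than being silently coerced at prediction time.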

This allows us to minimize the runtime logic in Python and reduce the risk of it interfering with application code. The Go server is now responsible for most of the heavy lifting:

  • HTTP server and webhooks
  • Input file download and output file upload
  • Logging

The Go server communicates with a minimal Python runner via JSON files for input/output and HTTP/signals for IPC. The Python runner is solely responsible for invoking the predictor’s setup() and predict() methods.
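
The file-based handoff can be pictured roughly as follows. This is only a sketch of the idea, not the actual protocol: the file names (request.json, response.json), the JSON layout, the status values and the run_once helper are all assumptions, and the HTTP/signal notification between the Go server and the runner is left out.

```python
import json
import tempfile
from pathlib import Path


class Predictor:
    """Stand-in predictor with the usual setup() and predict() methods."""

    def setup(self) -> None:
        self.prefix = "hello "

    def predict(self, prompt: str) -> str:
        return self.prefix + prompt


def run_once(predictor: Predictor, workdir: Path) -> None:
    """Read one request file, run predict(), and write a response file."""
    request = json.loads((workdir / "request.json").read_text())
    try:
        output = predictor.predict(**request["input"])
        response = {"status": "succeeded", "output": output}
    except Exception as exc:  # report application errors instead of crashing
        response = {"status": "failed", "error": str(exc)}
    (workdir / "response.json").write_text(json.dumps(response))


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    # In this sketch we play the server's role and write the request by hand.
    (workdir / "request.json").write_text(json.dumps({"input": {"prompt": "world"}}))
    predictor = Predictor()
    predictor.setup()
    run_once(predictor, workdir)
    result = json.loads((workdir / "response.json").read_text())

print(result)
```

Keeping the runner this small is what makes the split attractive: an exception in predict() becomes a failed response that the server can report, and everything else (HTTP, file transfer, logging) stays outside the Python process.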

What do I need to change?

Most of the Cog API (Predictor, Input, BaseModel, etc.) is source compatible. Three changes might require updating your model:

  • Improved semantics of optional inputs
  • Cleaner dependencies
  • Removal of the deprecated File API

First, ambiguous optional inputs are no longer allowed. In the existing Cog runtime, declaring prompt: str suggests that the input cannot be None, yet default=None is still accepted. This can confuse type checkers and lead to buggy code, e.g. code that never checks for None. So instead of:

def predict(prompt: str=Input(description="prompt", default=None))

We should use:

def predict(prompt: Optional[str]=Input(description="prompt"))

Note that default=None is now redundant and has been removed, since Optional[str] already implies that the input may be None, and a type checker can warn us if we forget to check for it.
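
To make the rule concrete, here is a rough sketch of how such a check could be written with inspect. It is not the code Cog uses: Input below is a hypothetical stand-in for cog.Input so that the example runs without Cog installed, and check_optional_inputs and its message are made up for illustration.

```python
import inspect
from dataclasses import dataclass
from typing import Any, List, Optional, Union, get_args, get_origin

_UNSET = object()  # sentinel meaning "no default was given"


@dataclass
class _Field:
    description: str
    default: Any


def Input(description: str = "", default: Any = _UNSET) -> Any:
    """Hypothetical stand-in for cog.Input, just enough for this sketch."""
    return _Field(description, default)


def allows_none(tp: Any) -> bool:
    """True if the annotation is Optional[...] or Union[..., None]."""
    return get_origin(tp) is Union and type(None) in get_args(tp)


def check_optional_inputs(fn) -> List[str]:
    """Return messages for inputs that default to None without being Optional."""
    errors = []
    for name, param in inspect.signature(fn).parameters.items():
        field = param.default
        if (
            isinstance(field, _Field)
            and field.default is None
            and not allows_none(param.annotation)
        ):
            errors.append(f"{name}: default=None requires an Optional[...] type")
    return errors


def old_predict(prompt: str = Input(description="prompt", default=None)) -> str:
    return prompt


def new_predict(prompt: Optional[str] = Input(description="prompt")) -> str:
    # With Optional[str], a type checker expects the None case to be handled.
    return prompt if prompt is not None else ""


print(check_optional_inputs(old_predict))  # one message for "prompt"
print(check_optional_inputs(new_predict))  # no messages
```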

The second notable change is that the new Cog runtime no longer depends on any of the Python dependencies of the existing runtime, listed below. You’ll have to add them to requirements.txt if your model relies on them and they’re not pulled in by another third-party library.

  • attrs
  • fastapi
  • pydantic
  • PyYAML
  • requests
  • structlog
  • typing_extensions
  • uvicorn
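
One way to see which of these packages are still present after switching runtimes is to query the installed package metadata from inside the built image. The following is a small sketch using only the standard library; the helper name installed_versions is ours, and the check only tells you what is installed, not what your model actually imports.

```python
from importlib import metadata

# Distribution names of the former runtime dependencies listed above.
FORMER_RUNTIME_DEPS = [
    "attrs", "fastapi", "pydantic", "PyYAML",
    "requests", "structlog", "typing_extensions", "uvicorn",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(FORMER_RUNTIME_DEPS).items():
    print(f"{name}: {version or 'not installed'}")
```

Any package that your model imports directly but that shows up as not installed needs to be added to requirements.txt.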

The third change is the removal of the deprecated cog.File API. Use cog.Path instead.
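
As a rough sketch of what the migration looks like inside predict(): with cog.File the input behaved like an open file object, whereas with cog.Path you receive a path and read the file yourself. The Cog documentation describes cog.Path as a subclass of pathlib.Path, so pathlib.Path is used here as a stand-in to keep the example runnable without Cog; the predict function itself is hypothetical.

```python
import pathlib
import tempfile

# With Cog installed you would write: from cog import Path
# pathlib.Path is a stand-in so this sketch runs without Cog.
Path = pathlib.Path


def predict(image: Path) -> int:
    """Return the size of the input file in bytes."""
    # Previously, with cog.File, the input was a file-like object:
    #     data = image.read()
    # With a path, read (or open) the file yourself:
    data = image.read_bytes()
    return len(data)


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "input.bin"
    sample.write_bytes(b"\x00" * 16)
    size = predict(sample)

print(size)
```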