We built this to deploy models to Replicate (https://replicate.com/), but it can also be used to deploy models to your own infra.
Andreas, my co-founder, used to work at Spotify. Spotify wanted to run models inside Docker containers, but Docker was too hard to use for most ML researchers. So, Andreas built a set of templates and scripts to help researchers deploy their own models.
This was mixed in with my experience working at Docker. I created Docker Compose, which makes Docker easier to use for dev environments. We were also joined by Zeke, who created Swagger (now OpenAPI), which is used to define a model’s inputs/outputs. Dominic and some other contributors have since joined! https://github.com/replicate/cog#contributors-
It’s still early days, so expect a few rough edges, but it’s ready to use for deploying models. We’d love to hear what you think.
My first reaction was: sigh _another_ tool to help ML/DS folk not write a Dockerfile? Aren't there enough already?
But on closer inspection, Cog seems to have an edge on some of the competitors like Seldon or Bento – namely its use of modern Python libraries (like Pydantic and FastAPI), its handling of CUDA/cuDNN/PyTorch/TensorFlow/Python compatibility, and (probably most important to me) automatic queue workers.
It generated a 1 GB image with nothing but Python 3.8 in the config, so folks who really care about deployment size will want to keep writing their own Dockerfiles.
Cog is optimized for getting a deep learning model inside a Docker image. We found that ML researchers struggled to use Docker, so we made that process easier. It generates a best-practice Dockerfile with all your dependencies, and resolves the CUDA versions automatically. It also includes a queue worker, which we found was the optimal way of deploying long-running/batch models at Spotify and Replicate.
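To give a sense of what that looks like in practice, here's a rough sketch of a Cog config. The package versions and keys are illustrative, not exact – check the Cog repo for the current schema:

```yaml
# cog.yaml — declares the environment; Cog generates the Dockerfile from this
build:
  gpu: true                # Cog resolves compatible CUDA/cuDNN base images for you
  python_version: "3.8"
  python_packages:
    - "torch==1.8.1"
    - "torchvision==0.9.1"
# points at a Python class that defines the model's inputs/outputs
predict: "predict.py:Predictor"
```

Running `cog build` then produces a Docker image containing the model, its resolved dependencies, and an HTTP prediction server – no hand-written Dockerfile needed.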
Bento is more flexible – the models can be used outside of Docker, and it has built-in support for deploying to lots of deployment environments, which Cog doesn't have yet.