
Gradient Deployments

Get a high-level overview of Gradient Deployments.


Gradient Deployments are containers-as-a-service without the hassle and boilerplate of Kubernetes. Deployments allow you to run container images and serve machine learning models using a high-performance, low-latency service with a RESTful API. Deployments are defined by a spec and can be managed through the web console or CLI/SDK.

A deployment can run multiple replicas at a time. Each request sent to the deployment endpoint goes to a load balancer, which directs traffic to the deployment's active replicas.
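To make the routing behavior concrete, here is a minimal sketch of a load balancer that only sends requests to active replicas. The round-robin policy and the `Replica` and `RoundRobinBalancer` names are assumptions chosen for illustration; the text above does not say which balancing policy Gradient actually uses.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    # Hypothetical stand-in for one running container of a deployment.
    name: str
    active: bool = True


class RoundRobinBalancer:
    """Toy load balancer: cycles through replicas, skipping inactive ones."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self._next = 0  # index of the replica to try first on the next call

    def route(self):
        # Try each replica at most once, starting where the last call left off.
        for _ in range(len(self.replicas)):
            replica = self.replicas[self._next]
            self._next = (self._next + 1) % len(self.replicas)
            if replica.active:
                return replica.name
        raise RuntimeError("no active replicas available")


balancer = RoundRobinBalancer(
    [Replica("replica-0"), Replica("replica-1", active=False), Replica("replica-2")]
)
# Four consecutive requests alternate between the two active replicas.
targets = [balancer.route() for _ in range(4)]
```

In this sketch `replica-1` never receives traffic because it is marked inactive, which mirrors the idea that only active replicas are served by the endpoint.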

A deployment can be updated by changing its spec. Each spec update generates a new spec_id under the same deployment_id. Previous versions of the spec can be viewed in the deployment object's history.
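The relationship between deployment_id, spec_id, and the spec history can be modeled roughly as follows. This is an illustrative data structure only: the `Deployment` class, its `update_spec` method, and the use of random UUIDs for spec IDs are assumptions, not the Gradient SDK or its ID format.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Deployment:
    # The deployment_id stays fixed for the lifetime of the deployment.
    deployment_id: str
    # (spec_id, spec) pairs, oldest first. Hypothetical representation.
    history: list = field(default_factory=list)

    def update_spec(self, spec: dict) -> str:
        """Record a new spec version and return its freshly generated spec_id."""
        spec_id = uuid.uuid4().hex
        self.history.append((spec_id, dict(spec)))  # copy so later edits don't alter history
        return spec_id

    @property
    def current_spec_id(self):
        return self.history[-1][0] if self.history else None


deployment = Deployment(deployment_id="example-deployment")
first = deployment.update_spec({"image": "example/image:v1"})
second = deployment.update_spec({"image": "example/image:v2"})
```

After the two updates, `deployment.history` holds both versions under the same `deployment_id`, and `current_spec_id` points at the most recent one.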

The best place to start learning how to deploy models on Gradient is the official Gradient Deployments Tutorial.

Deployment basics


A deployment may be running one or more containers. In Paperspace apps, the number of containers running at any given time is referred to as the number of replicas. Replicas are scaled up and down based on your app configuration.

A deployment's containers run the Docker image set in your configuration. This image is pulled from a registry and run on a Paperspace machine. Deployments can pull from both public and private repositories.

Each container has its own logs and metrics reported to the Paperspace Web Console and Web API.


Deployments can be configured to autoscale based on CPU, memory usage, or request duration. You may also set a minimum and maximum number of replicas.

Alternatively, you can manually scale your deployment to a specific number of replicas.
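As a rough illustration of metric-based autoscaling with replica bounds, the sketch below computes a replica count from an observed metric and a target value, then clamps it to the configured minimum and maximum. The proportional rule is the one used by the Kubernetes Horizontal Pod Autoscaler; whether Gradient uses the same formula is an assumption, and the function and parameter names are hypothetical.

```python
import math


def desired_replicas(current, observed, target, min_replicas, max_replicas):
    """Proportional scaling rule (assumed), clamped to [min_replicas, max_replicas].

    current  -- number of replicas running now
    observed -- current average metric value (e.g. CPU percent per replica)
    target   -- metric value the autoscaler tries to maintain
    """
    if target <= 0:
        raise ValueError("target must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    raw = math.ceil(current * observed / target)
    return max(min_replicas, min(max_replicas, raw))


# CPU at 90% against a 60% target with 2 replicas suggests scaling up.
scale_up = desired_replicas(current=2, observed=90, target=60, min_replicas=1, max_replicas=5)
# Low load never drops below the configured minimum.
scale_down = desired_replicas(current=2, observed=10, target=60, min_replicas=1, max_replicas=5)
# High load never exceeds the configured maximum.
capped = desired_replicas(current=4, observed=200, target=50, min_replicas=1, max_replicas=5)
```

Manual scaling corresponds to skipping this calculation entirely and setting the replica count to a fixed number.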


Every deployment has a public endpoint that can be accessed over the internet. The endpoint resolves to a load balancer which distributes traffic to your containers.


Integrations are a way to connect your app to other services. For example, you can mount volumes, models, and repositories to your containers.

Health Checks

Health checks run at a specified cadence and monitor the status of each replica in the deployment. When a replica enters an unhealthy state, traffic to that replica is stopped and the replica is restarted. Traffic resumes once the replica is deemed healthy.
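The health check lifecycle described above can be sketched as a small state machine. Everything here is hypothetical and for illustration: the `ReplicaState` class, the `probe` callback, and the restart counter are assumptions, and a real health check would typically probe an HTTP path or run a command on a timer.

```python
from dataclasses import dataclass


@dataclass
class ReplicaState:
    name: str
    healthy: bool = True
    restarts: int = 0


def run_health_check(replica, probe):
    """Run one check cycle; probe(name) returns True if the replica responds correctly."""
    ok = probe(replica.name)
    if replica.healthy and not ok:
        # Failing replica: stop routing to it and trigger a restart.
        replica.healthy = False
        replica.restarts += 1
    elif not replica.healthy and ok:
        # Restarted replica passes its check again: traffic may resume.
        replica.healthy = True
    return replica.healthy


def routable(replicas):
    """Names of replicas that should currently receive traffic."""
    return [r.name for r in replicas if r.healthy]


replicas = [ReplicaState("replica-0"), ReplicaState("replica-1")]
failing = {"replica-1"}  # simulated set of replicas whose probe fails

for r in replicas:
    run_health_check(r, lambda name: name not in failing)
during_outage = routable(replicas)

failing.clear()  # the restarted replica recovers
for r in replicas:
    run_health_check(r, lambda name: name not in failing)
after_recovery = routable(replicas)
```

During the simulated outage only `replica-0` is routable; after the next passing check, both replicas receive traffic again.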


Gradient Deployments can run on any available Gradient machine, with one exception: Free-tier machines are not available for deployments.