This page provides a reference guide to using health checks in Gradient Deployments.
Configure health checks that monitor the health and readiness of Gradient Deployments.
How health checks work
Gradient health checks are built on Kubernetes probes. A few configuration details differ slightly from plain Kubernetes probes to provide a better experience.
There are three configurable health checks available:
liveness: Liveness checks detect deployment containers that have transitioned to an unhealthy state and remedy the situation through targeted restarts.
readiness: Readiness checks tell the load balancers when a container is ready to receive traffic. These checks run for the life of the container. They are useful for applications that must load a model into memory or open connections to external services before receiving requests.
startup: Startup checks detect whether a container has started successfully. If the container never reaches a successful state, it is killed and restarted. Once a startup check detects that the container has started successfully, the readiness checks (if configured) begin.
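The gating between check types can be sketched as a small function. This is a minimal sketch, not Gradient's implementation: the name `active_checks` is hypothetical, and it assumes standard Kubernetes semantics, where a configured startup check holds back the other checks (including liveness, which the text above does not mention) until it succeeds.

```python
def active_checks(configured: set[str], startup_succeeded: bool) -> set[str]:
    """Return which of the configured checks are currently probing.

    Hypothetical helper. Assumes Kubernetes semantics: while a startup check
    is configured and has not yet succeeded, it is the only check that runs.
    """
    if "startup" in configured and not startup_succeeded:
        return {"startup"}
    # Once startup succeeds it stops probing, and the remaining checks
    # run for the life of the container.
    return configured - {"startup"}
```

For example, with all three checks configured, only `"startup"` is active until the container has started; afterwards, `"liveness"` and `"readiness"` take over.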
Any HTTP status code greater than or equal to 200 and less than 400 indicates success. Any other code indicates failure.
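To make the success rule concrete, here is a minimal HTTP probe written with Python's standard library. It is an illustrative sketch, not Gradient's prober. The name `http_probe` is hypothetical. It does not follow redirects, so a 3xx response simply counts as success under the rule above. Connection errors and timeouts count as failures, which loosely mirrors the `timeoutSeconds` setting.

```python
import http.client


def http_probe(host: str, port: int, path: str, timeout_seconds: float = 1.0) -> bool:
    """Return True if GET `path` answers with 200 <= status < 400 in time.

    Hypothetical sketch of a single HTTP health check attempt.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout_seconds)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts (socket timeouts are OSErrors) and
        # malformed responses are all treated as failed checks.
        return False
    finally:
        conn.close()
    return 200 <= status < 400
```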
Health check configuration
healthChecks: The top-level key under which all health checks are specified.
liveness/readiness/startup: The type of health check specified.
path: The path of the HTTP endpoint that the health check calls.
port: (Optional) The port on which the path is served. Defaults to the same port as the image itself.
initialDelaySeconds: (Optional) The number of seconds after the container has started before health checks are initiated. Defaults to 0 seconds.
periodSeconds: (Optional) How often (in seconds) to perform the health check. Defaults to 10 seconds.
timeoutSeconds: (Optional) The number of seconds after which the health check times out. Defaults to 1 second. Minimum value is 1.
failureThreshold: (Optional) The number of consecutive failed responses after which the health check is assigned a failed status. Defaults to 3.
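The fields and defaults above can be captured in a small data structure, which is handy for generating or validating specs programmatically. This is a minimal sketch, assuming only the keys listed above. `HealthCheck` and `health_checks_section` are hypothetical helpers, not part of any Gradient SDK. The resulting dict mirrors the `healthChecks` block and would still need to be serialized to the spec file.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class HealthCheck:
    """One liveness, readiness, or startup check (hypothetical helper).

    Field names mirror the spec keys (hence camelCase); defaults mirror the
    documented ones.
    """
    path: str
    port: Optional[int] = None  # None means "use the image's own port"
    initialDelaySeconds: int = 0
    periodSeconds: int = 10
    timeoutSeconds: int = 1
    failureThreshold: int = 3

    def __post_init__(self) -> None:
        if self.timeoutSeconds < 1:
            raise ValueError("timeoutSeconds must be at least 1")

    def to_spec(self) -> dict:
        # Drop the port when unset so the default (the image's port) applies.
        return {k: v for k, v in asdict(self).items() if v is not None}


def health_checks_section(**checks: HealthCheck) -> dict:
    """Build the healthChecks block from keyword arguments named after check types."""
    unknown = set(checks) - {"liveness", "readiness", "startup"}
    if unknown:
        raise ValueError(f"unknown health check type(s): {sorted(unknown)}")
    return {"healthChecks": {name: check.to_spec() for name, check in checks.items()}}
```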
Health check example
Below is a deployment spec and a Python script that use health checks to monitor a FastAPI application. The application downloads a model on startup, verifies that it can connect to an S3 bucket, and waits to be marked ready before serving requests.
Deployment spec example using HTTP
FastAPI application example
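```python
# Paths other than /predict/ are illustrative; match them to the deployment spec.
try:
    from pydantic import BaseSettings  # pydantic v1
except ImportError:
    from pydantic_settings import BaseSettings  # pydantic v2
from fastapi import FastAPI, Response, status

class LoadStatus(BaseSettings):
    model_loaded: bool = False
    # Other statuses

load_status = LoadStatus()
app = FastAPI()

@app.on_event("startup")
async def model_load():
    # Download model; the server does not answer requests until this returns
    load_status.model_loaded = True

@app.get("/liveness")
def liveness_check():
    return "Liveness check succeeded."

@app.get("/readiness")
def readiness_check(response: Response):
    s3_successful = True  # Placeholder: replace with a real S3 connection check
    if not s3_successful or not load_status.model_loaded:
        response.status_code = status.HTTP_503_SERVICE_UNAVAILABLE
        return "Readiness check failed."
    return "Readiness check succeeded."

@app.get("/startup")
def startup_check():
    return "Startup check succeeded."

@app.post("/predict/")
def predict():
    ...  # Make a prediction, upload to S3 bucket, return response
```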
In the above scenario, when the deployment spec is submitted, the container image is pulled from the container registry. Once the pull finishes, the container starts, and the startup health check begins probing the application. The app has 60 seconds to start up before the startup health check marks the container as unhealthy and restarts it (periodSeconds × failureThreshold = 10 × 6 = 60 seconds).
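The arithmetic above generalizes to a rule of thumb for sizing a startup check. This is a minimal sketch with hypothetical helper names. It follows the periodSeconds × failureThreshold estimate used above and adds initialDelaySeconds, but it ignores timeoutSeconds and scheduling jitter, so treat the result as approximate.

```python
import math


def max_startup_seconds(period_seconds: int, failure_threshold: int,
                        initial_delay_seconds: int = 0) -> int:
    """Approximate time a container gets to start before it is restarted."""
    return initial_delay_seconds + period_seconds * failure_threshold


def failure_threshold_for(target_seconds: int, period_seconds: int = 10) -> int:
    """Smallest failureThreshold that allows at least target_seconds of startup."""
    return math.ceil(target_seconds / period_seconds)


print(max_startup_seconds(period_seconds=10, failure_threshold=6))  # 60
print(failure_threshold_for(300))  # 30: five minutes at the default 10 s period
```

If a model download can take several minutes, raising failureThreshold (rather than periodSeconds) keeps the startup check responsive while extending the time budget.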
Because the FastAPI app above defines a startup event handler (model_load), that handler (the model download) must finish before the container is considered to have started successfully. When the model download finishes, assuming it takes less than 60 seconds, the startup health check succeeds and stops probing. The readiness health check then probes the container every 10 seconds (the default periodSeconds) for the life of the container. The readiness health check verifies that the model has been downloaded and that the container can connect to the S3 bucket, then returns a 200 status code, which marks the container as ready.
Once all health checks have passed, the container starts to receive incoming traffic (for example, on the /predict/ endpoint), and the readiness health check keeps probing the container. If at some point the container can no longer connect to the S3 bucket, the readiness check returns a 503 status code, telling the deployment to stop sending traffic to this container until it can connect to the S3 bucket again.
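The interaction between readiness results and failureThreshold can be replayed with a short simulation. This is a hypothetical sketch, not Gradient's logic. It assumes the container starts out not ready and becomes ready after a single successful probe, which is the Kubernetes default successThreshold of 1 (a setting not covered on this page). It also assumes the container is taken out of rotation after failureThreshold consecutive failures.

```python
def readiness_states(status_codes: list[int], failure_threshold: int = 3) -> list[bool]:
    """Replay readiness probe results and return the ready flag after each probe."""
    ready = False
    consecutive_failures = 0
    states = []
    for code in status_codes:
        if 200 <= code < 400:
            # Assumed successThreshold of 1: one success restores readiness.
            consecutive_failures = 0
            ready = True
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                ready = False
        states.append(ready)
    return states


# The S3 bucket becomes unreachable for three probes, then recovers.
print(readiness_states([200, 200, 503, 503, 503, 200]))
# [True, True, True, True, False, True]
```

Under these assumptions, with the default failureThreshold of 3, traffic stops only after the third consecutive 503, not on the first one.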