Health checks
This page provides a reference guide to using health checks in Gradient Deployments.
Configure health checks that monitor the health and readiness of Gradient Deployments.
How health checks work
Gradient health checks are built on Kubernetes probes. The configuration differs slightly from the Kubernetes one to deliver a better experience.
There are three configurable health checks available: liveness, readiness, and startup.
liveness checks detect deployment containers that transition to an unhealthy state and remedy the situation through targeted restarts.
readiness checks tell our load balancers when a container is ready to receive traffic. These checks run for the life of the container. They are useful for applications that need to load a model into memory or initiate connections to external services before receiving requests.
startup checks detect whether a container has started successfully. If the container never enters a successful state, the container is killed and restarted. Once a startup health check detects a successful start of the container, the liveness and readiness health checks (if configured) begin probing.
Any status codes returned greater than or equal to 200 and less than 400 indicate success. Any other code indicates failure.
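The success rule above can be expressed as a one-line predicate. This is a minimal sketch for illustration only; the function name is_success_status is hypothetical and not part of Gradient.

```python
def is_success_status(status_code: int) -> bool:
    """Return True when an HTTP status code counts as a passing health check.

    Mirrors the rule stated above: 200 <= code < 400 is success,
    anything else is failure.
    """
    return 200 <= status_code < 400


if __name__ == "__main__":
    # Print how a few common status codes would be classified.
    for code in (200, 204, 307, 399, 400, 404, 503):
        print(code, "success" if is_success_status(code) else "failure")
```

Note that redirects (3xx) fall inside the success range, while any 4xx or 5xx response counts as a failed check.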
Health check configuration
healthChecks: The overall label used to specify any health checks.
  liveness/readiness/startup: The type of health check specified.
    path: The path of the HTTP endpoint that the health check will call.
    port: (Optional) The port that the path is running on. The default is to use the same port as the image itself.
    initialDelaySeconds: (Optional) The number of seconds after the container has started before health checks are initiated. Defaults to 0 seconds.
    periodSeconds: (Optional) How often (in seconds) to perform the health check. Defaults to 10 seconds.
    timeoutSeconds: (Optional) The number of seconds after which the health check times out. Defaults to 1 second. Minimum value is 1.
    failureThreshold: (Optional) The number of times the health check has to return a failed response for the health check to be assigned a failed status. Defaults to 3 tries.
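To make the defaults above concrete, here is a small sketch that models a single health check and fills in the documented default values. The HealthCheck class and resolve_health_check helper are hypothetical names used only for illustration; they are not part of any Gradient SDK.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class HealthCheck:
    """Illustrative model of one health check entry with the documented defaults."""

    path: str
    port: Optional[int] = None      # None means "use the deployment's own port"
    initialDelaySeconds: int = 0
    periodSeconds: int = 10
    timeoutSeconds: int = 1
    failureThreshold: int = 3

    def __post_init__(self) -> None:
        # The documentation states a minimum value of 1 for timeoutSeconds.
        if self.timeoutSeconds < 1:
            raise ValueError("timeoutSeconds must be at least 1")


def resolve_health_check(check: HealthCheck, deployment_port: int) -> dict:
    """Return the effective settings, falling back to the deployment port."""
    resolved = asdict(check)
    if resolved["port"] is None:
        resolved["port"] = deployment_port
    return resolved


if __name__ == "__main__":
    startup = HealthCheck(path="/startup", failureThreshold=6)
    print(resolve_health_check(startup, deployment_port=8888))
```

Only path is required here; every other field falls back to the default listed in the reference above.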
Health check example
Below is a deployment spec and a Python script that use health checks to monitor a FastAPI application. On startup the application will download a model, check that it can make a connection to an S3 bucket, and wait to be marked healthy before serving requests.
Deployment spec example using HTTP
enabled: true
image: paperspace/deployment-fixture
port: 8888
resources:
  replicas: 1
  instanceType: A5000
healthChecks:
  liveness:
    path: /liveness
  readiness:
    path: /readiness
  startup:
    path: /startup
    failureThreshold: 6
FastAPI application example
# Note: in pydantic v2, BaseSettings is provided by the pydantic-settings package.
from pydantic import BaseSettings
from fastapi import FastAPI, Response, status


class LoadStatus(BaseSettings):
    model_loaded: bool = False
    # Other statuses


load_status = LoadStatus()
app = FastAPI()


@app.on_event("startup")
async def model_load():
    # Download model
    load_status.model_loaded = True


@app.get("/liveness/", status_code=200)
def liveness_check():
    return "Liveness check succeeded."


@app.get("/readiness/", status_code=200)
def readiness_check(response: Response):
    s3_successful = True  # Placeholder: replace with an S3 connection check
    if not s3_successful or not load_status.model_loaded:
        response.status_code = status.HTTP_503_SERVICE_UNAVAILABLE
        return "Readiness check failed."
    return "Readiness check succeeded."


@app.get("/startup/", status_code=200)
def startup_check():
    return "Startup check succeeded."


@app.post("/predict/")
def predict():
    # Make a prediction
    # Upload to S3 bucket
    # Return response
    ...
Example Scenario:
In the above scenario, when the deployment spec is submitted, the container image is pulled from the container registry. When the pull finishes, the deployment will start to build the container. As the container starts to build, the startup health check will start probing the application. The app has 60 seconds to start up before the startup health check marks the container as unhealthy and restarts it (periodSeconds * failureThreshold = 10 * 6 = 60 seconds).
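The 60-second figure follows directly from the probe settings. The sketch below reproduces that arithmetic; the function name is hypothetical, and adding initialDelaySeconds to the total is an assumption based on how the delay is described in the configuration reference (it is 0 in this example, so the result is unchanged).

```python
def startup_budget_seconds(
    period_seconds: int = 10,
    failure_threshold: int = 3,
    initial_delay_seconds: int = 0,
) -> int:
    """Approximate time a container has to start before it is restarted.

    Computes periodSeconds * failureThreshold, plus any initial delay
    (the delay term is an assumption, not stated in the scenario).
    """
    return initial_delay_seconds + period_seconds * failure_threshold


if __name__ == "__main__":
    # Startup check from the example spec: default period, failureThreshold of 6.
    print(startup_budget_seconds(period_seconds=10, failure_threshold=6))  # prints 60
```

If the model download is expected to take longer than this budget, raising failureThreshold or periodSeconds on the startup check extends the window.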
Because the FastAPI app above defines a startup event process, that process (model download) will have to finish before the container is considered to have a successful startup. When the model download finishes, assuming it finishes within 60 seconds, the startup health check will succeed and stop probing, and the liveness and readiness health checks will start to probe the container every 10 seconds (the Kubernetes default) to monitor its health and readiness for the life of the container.
The readiness health check will ensure the model has been downloaded and the container can make a connection to the S3 bucket, and then return a 200 status code, which marks the container as being in a successful and ready state.
Once all health checks have passed, the container will start to receive incoming traffic (e.g. into the /predict/ endpoint) and the liveness and readiness health checks will continue to probe and monitor the container. In the case of the readiness probe, if at some point in the future the container can't make a connection to the S3 bucket, it will return a 503 status code to tell the deployment to no longer send traffic to this container until it can successfully make a connection with the S3 bucket again.
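The way readiness results gate traffic can be illustrated with a toy simulation. This is only a sketch under stated assumptions: it models the usual Kubernetes probe behavior (the container is removed from traffic after failureThreshold consecutive failures and restored on the next success), which may not match Gradient's internals exactly, and the function name is hypothetical.

```python
from typing import Iterable, List


def traffic_states(status_codes: Iterable[int], failure_threshold: int = 3) -> List[bool]:
    """Simulate whether a container receives traffic after each readiness probe.

    Assumptions (not confirmed by the page above): the container starts as
    not ready, becomes ready on any successful probe, and becomes not ready
    after `failure_threshold` consecutive failed probes.
    """
    ready = False
    consecutive_failures = 0
    states = []
    for code in status_codes:
        if 200 <= code < 400:
            # A passing probe resets the failure count and marks the container ready.
            consecutive_failures = 0
            ready = True
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                ready = False
        states.append(ready)
    return states


if __name__ == "__main__":
    # S3 becomes unreachable for three probes in a row, then recovers.
    probes = [200, 200, 503, 503, 503, 200]
    print(traffic_states(probes))
```

In this model a single 503 does not immediately remove the container from traffic; with the default failureThreshold of 3 and a 10-second period, that takes roughly 30 seconds of consecutive failures.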