Deploy any model as a high-performance, low-latency micro-service with a RESTful API. Easily monitor, scale, and version deployments. A deployment takes a trained model and exposes it as a persistent service at a known, secure URL endpoint.
Out-of-the-box integration with TensorFlow, ONNX, and TensorRT, as well as Flask for custom models
A variety of GPU & CPU types to deploy on
Per-second, pay-as-you-go billing
Multi-instance deployments with automatic load balancing
A dedicated, secure endpoint URL per deployment
Accessible via the Gradient CLI, Web UI, or API, or from your own custom applications
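
As a rough illustration of calling a deployment from a custom application, the sketch below posts a JSON payload to a deployment endpoint using only the Python standard library. The endpoint URL is a placeholder, the helper names `build_predict_request` and `predict` are hypothetical, and the `{"instances": [...]}` body assumes a TensorFlow Serving style REST interface; the URL, payload shape, and any authentication your deployment requires may differ.

```python
import json
import urllib.request

# Placeholder URL (an assumption): substitute the endpoint URL shown for
# your own deployment.
ENDPOINT_URL = "https://example.com/your-deployment-endpoint"


def build_predict_request(url, instances):
    """Build a JSON POST request carrying a batch of model inputs.

    The {"instances": [...]} body is an assumed, TensorFlow Serving style
    payload; a custom (e.g. Flask) deployment may expect a different shape.
    """
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(url, instances, timeout=10.0):
    """Send the request and return the response decoded from JSON."""
    request = build_predict_request(url, instances)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a real endpoint in place of the placeholder, `predict(ENDPOINT_URL, [[1.0, 2.0, 5.0]])` would send one input row and return the parsed JSON response. Building the request separately from sending it keeps the payload easy to inspect before any network call is made.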