Deploy any trained model as a high-performance, low-latency microservice with a RESTful API. Easily monitor, scale, and version deployments. A deployment takes a trained model and exposes it as a persistent service at a known URI.
Out-of-the-box integration with TensorFlow models; easily extended to serve other types of models and data (ONNX support coming soon)
A variety of GPU & CPU SKUs to deploy to
Per-second billing
Multi-instance deployments with load balancing
Dedicated endpoint URI per deployment
Accessible via the CLI, Web UI or API
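Once a model is deployed, clients reach it over plain HTTP at its dedicated endpoint URI. As a minimal sketch, the snippet below builds a TensorFlow Serving-style JSON predict request; the endpoint URL and auth header are hypothetical placeholders, not the service's actual values.

```python
import json

# Hypothetical endpoint URI assigned to a deployment (illustrative only).
ENDPOINT = "https://example.com/deployments/my-model:predict"

def build_predict_request(instances):
    """Build a TensorFlow Serving-style JSON request body.

    `instances` is a list of input rows, one entry per prediction.
    """
    return json.dumps({"instances": instances})

body = build_predict_request([[1.0, 2.0, 3.0]])

# To call the deployed service, POST the body to the endpoint, e.g.:
#   requests.post(ENDPOINT, data=body,
#                 headers={"Authorization": "Bearer <token>"})
print(body)
```

The `{"instances": [...]}` envelope follows the TensorFlow Serving REST convention; other model types may expect a different request schema.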