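Most model-serving frameworks expose REST endpoints. TensorFlow Serving and TensorRT Inference Server (now Triton Inference Server) also offer gRPC endpoints, which take more setup (a service schema and generated client code) but perform better. The properties below cover REST first, then gRPC.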
Stateless - No client context is stored on the server between requests
Self-contained - All information that is needed to service a request is packaged with the request itself
Flexible - REST is programming-language agnostic, works from any browser or HTTP client, and supports many data formats (JSON, XML, plain text, binary)
Bi-directional - gRPC supports two-way streaming between client and server
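Simplicity - gRPC calls look like local function calls, so client code does not handle HTTP methods, headers, or bodies directly, and gRPC defines its own dedicated set of status codes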
Performant - gRPC serializes structured data as binary protocol buffers, which are more compact than text formats and perform better under high load
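
To make the payload difference concrete, here is a minimal sketch that encodes the same self-contained prediction request two ways: as a JSON body, as a REST endpoint would typically receive it, and as a packed binary buffer. The binary side uses Python's `struct` module as a stand-in for protocol buffers, not the real protobuf wire format, and the model name, field names, and feature values are hypothetical.

```python
import json
import struct

# Hypothetical feature vector for a single prediction request.
features = [i / 7 for i in range(1000)]

# REST-style body: self-contained JSON, so the server needs no stored
# client context to answer it. "model" and "inputs" are made-up field names.
json_body = json.dumps({"model": "demo", "inputs": features}).encode("utf-8")

# Binary stand-in (an assumption, not protobuf itself): a 4-byte unsigned
# count followed by that many little-endian float32 values.
binary_body = struct.pack(f"<I{len(features)}f", len(features), *features)


def decode_binary(payload):
    """Recover the float list from the packed buffer built above."""
    (count,) = struct.unpack_from("<I", payload, 0)
    return list(struct.unpack_from(f"<{count}f", payload, 4))


decoded = decode_binary(binary_body)

print("JSON bytes:  ", len(json_body))
print("binary bytes:", len(binary_body))
```

The binary buffer spends a fixed 4 bytes per value, while JSON spells each number out as text, so the gap widens as inputs grow. The trade-off is precision and readability: float32 rounds the values, and the buffer cannot be inspected without knowing its layout, which is the role a `.proto` schema plays in gRPC.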