Seldon 1.4 adds gRPC

Clive Cox
Machine Learning Deployment
5 min read · Dec 21, 2016

In the 1.4 release of Seldon we have added an alpha release of gRPC endpoints to complement our REST and JavaScript endpoints. Remote Procedure Calls (RPC), and Google's implementation of them (gRPC), provide several advantages over REST.

  • Faster. gRPC is built on top of protocol buffers and provides a binary transport mechanism, which decreases latency and complexity compared to RESTful methods that require parsing of endpoints and payloads (JSON).
  • Clearer. Procedures, inputs and outputs as well as errors can be clearly defined.

See also the gRPC FAQs and Kelsey Hightower's humorous discussion of gRPC in Kubernetes. However, gRPC may be unfamiliar to many developers and requires a certain expertise to build a gRPC client or server.

For the 1.4 release of Seldon we have added gRPC as an external prediction endpoint as well as allowing prediction microservices to be deployed as gRPC servers internal to Seldon. The gRPC proto definition is shown below:

syntax = "proto3";

import "google/protobuf/any.proto";

option java_multiple_files = true;
option java_package = "io.seldon.api.rpc";
option java_outer_classname = "PredictionAPI";

package io.seldon.api.rpc;

service Seldon {
  rpc Classify (ClassificationRequest) returns (ClassificationReply) {}
}

// Classification Request
message ClassificationRequest {
  ClassificationRequestMeta meta = 1;
  google.protobuf.Any data = 2;
}

message ClassificationRequestMeta {
  string puid = 1;
}

// Classification Reply
message ClassificationReply {
  ClassificationReplyMeta meta = 1;
  repeated ClassificationResult predictions = 2;
  google.protobuf.Any custom = 3;
}

message ClassificationReplyMeta {
  string puid = 1;
  string modelName = 2;
  string variation = 3;
}

message ClassificationResult {
  double prediction = 1;
  string predictedClass = 2;
  double confidence = 3;
}

message DefaultCustomPredictRequest {
  repeated float values = 1;
}

A ClassificationRequest has two parts:

  • meta : metadata associated with the request, presently an optional prediction-id the client wishes to associate with the request.
  • data : a custom entry, defined by the user, that holds the features needed for the prediction request. If not defined, a variable-length array of floats is assumed, as defined in DefaultCustomPredictRequest.
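
As an illustrative sketch (not code from the Seldon docs), a ClassificationRequest carrying the default float-array payload could be built in Python roughly as follows, assuming the proto above has been compiled with protoc into a hypothetical module prediction_pb2:

# Sketch: build a ClassificationRequest with the default float-array payload.
# Assumes the proto above was compiled into a hypothetical module `prediction_pb2`.
from prediction_pb2 import (ClassificationRequest,
                            ClassificationRequestMeta,
                            DefaultCustomPredictRequest)

def build_request(features, puid=None):
    request = ClassificationRequest()
    if puid is not None:
        request.meta.CopyFrom(ClassificationRequestMeta(puid=puid))
    # Pack the feature vector into the Any `data` field.
    request.data.Pack(DefaultCustomPredictRequest(values=features))
    return request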

A ClassificationReply has three parts:

  • meta : metadata associated with the prediction. This contains a prediction unique id, either supplied by the client in the request or created by Seldon; the name of the model that satisfied the request; and the AB test variation used in satisfying the request ("default" if there is a single variation).
  • predictions : the predictions for each class
  • custom : optional custom additional data that is defined by the user
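
On the reply side, the fields can be read directly from the generated message; a minimal sketch:

# Sketch: inspect a ClassificationReply returned by the Classify RPC.
def print_reply(reply):
    print("prediction id:", reply.meta.puid)
    print("model:", reply.meta.modelName, "variation:", reply.meta.variation)
    for p in reply.predictions:
        print(p.predictedClass, p.prediction, p.confidence)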

The stages to deploy a gRPC service are discussed in detail in our docs and are summarized below.

  1. (Optional) Create a custom protocol buffer file. If your model can take an array of floats as input this step can be skipped. However, if you wish to specify a custom data format for your model then you can create a protocol buffer file that describes it.
  2. Build the model and package the microservice using gRPC. We provide an easy wrapper class in Python; otherwise you are free to use any language supported by gRPC to build your microservice (a bare-bones sketch of such a service follows this list).
  3. (Optional) Inform Seldon of custom protocol buffers. If you defined a custom protocol buffer file in step 1 then you need to inform Seldon's server of it so it can understand your gRPC requests. We provide a simple command in our CLI to allow you to do this.
  4. Launch the gRPC microservice. Launch your gRPC microservice using our start-microservice script.
  5. Test via REST or gRPC clients. Your microservice can be queried externally via gRPC or REST. Obviously for the lowest latency gRPC should be used. For REST we translate requests into internal gRPC calls, and similarly for replies. The ability to use gRPC and REST interchangeably gives you maximum flexibility in deploying your ML service.
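
For illustration only (this is not Seldon's Python wrapper, and the generated module names are assumptions based on the proto above), a bare-bones gRPC model microservice could look something like this:

# Sketch: a minimal gRPC prediction microservice built directly on the generated
# stubs (hypothetical modules prediction_pb2 / prediction_pb2_grpc), not using
# Seldon's Python wrapper class.
import time
from concurrent import futures

import grpc
import prediction_pb2
import prediction_pb2_grpc


class ModelServicer(prediction_pb2_grpc.SeldonServicer):
    def Classify(self, request, context):
        # Unpack the default float-array payload from the Any field.
        data = prediction_pb2.DefaultCustomPredictRequest()
        request.data.Unpack(data)
        score = score_model(data.values)  # placeholder for the real model call
        reply = prediction_pb2.ClassificationReply()
        reply.meta.puid = request.meta.puid
        reply.predictions.add(prediction=score,
                              predictedClass="positive",
                              confidence=score)
        return reply


def score_model(values):
    # Placeholder scoring function for the sketch.
    return 0.5


def serve(port=5000):
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    prediction_pb2_grpc.add_SeldonServicer_to_server(ModelServicer(), server)
    server.add_insecure_port("[::]:%d" % port)
    server.start()
    while True:
        time.sleep(3600)


if __name__ == "__main__":
    serve()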

To compare REST with gRPC we ran a load test using locust.io's open source load testing tool. Locust provides a simple Python interface and can easily be extended, which allowed us to cover gRPC even though it is unavailable by default. The details of the benchmarking can be found in our docs. We used the MNIST TensorFlow model demo as the focus of our testing. We created REST and gRPC microservice variants and tested each at around 50 requests per second.
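
As an illustration of the REST side of such a test (a sketch only; the endpoint path and payload are assumptions, not the actual benchmark code from our docs):

# Sketch of a Locust test for the REST prediction endpoint (pre-1.0 Locust API).
# The endpoint path and JSON payload below are illustrative assumptions.
from locust import HttpLocust, TaskSet, task


class PredictTasks(TaskSet):
    @task
    def predict(self):
        # Hypothetical REST prediction call; adjust the path and payload
        # to match your deployment (e.g. a flattened MNIST image).
        self.client.post("/api/v1/predict", json={"values": [0.0] * 784})


class PredictUser(HttpLocust):
    task_set = PredictTasks
    min_wait = 900   # milliseconds between calls, matching the test described below
    max_wait = 1000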

The Locust web interface panels for each run are shown below:

[Locust panel: REST run]

[Locust panel: gRPC run]

On average, the response time for REST is over 50% slower than for gRPC.

The response time percentiles (in milliseconds) from the two tests are as follows.

Name   50%   66%   75%   80%   90%   95%   98%   99%   100%
gRPC    14    16    18    19    25    36    67   110   5045
REST    20    25    30    36    63   110   190   240    717

The advantage of gRPC increases as the percentile increases, except at the 100th percentile, which suggests there was some delay at the start or an outlier that needs further investigation.

The Locust tests were done with 50 clients, each waiting between 900 and 1000 ms between calls. Locust, unlike the technically more advanced Iago, does not allow you to create a fixed request rate, so delays in the load testing framework itself could affect the REST and gRPC response times differently. However, we found it harder to develop a gRPC test in Iago.

Final Thoughts

The addition of gRPC provides an extra option for those needing to deploy machine learning models in more demanding latency environments. There are some open issues with gRPC concerning its ability to load balance over multiple endpoints inside a Kubernetes deployment. At present, because the gRPC client creates a direct connection to the microservice over which it multiplexes calls, it is not guaranteed that multiple microservices behind a replication controller will be properly load balanced. There is also a transition issue when microservices are redeployed (for example, when new model versions come into production): there would be a delay until the gRPC client connection failed and was restarted. For the latter issue we restart gRPC clients inside the Seldon API server when a new deployment definition is received, as would be the case on model redeploy via the Seldon CLI. However, this causes a delay while the new connections are created, which may be unacceptable in some situations of constant, high-demand API use.
