A Machine Learning API to rule them all: Caffe, XGBoost and TensorFlow are in a boat…

Hiking an API away

A year ago, I was building up my fourth Machine Learning API while hiking alone for days through one of the beautiful inner jungles of Taiwan. With no paper handy, and walking through the rainy season, I had to get it clear in my mind first.

A year later, the result is DeepDetect, an Open Source API and server for deep learning, and more. And for the first time, I am happy with the result, especially with the API. I feel it is intuitive, easy to use, generic and malleable at the same time.

I was interested in building up a useful and generic API to serve a set of complementary techniques. With help from a handful of contributors, I was able to integrate Caffe, XGBoost, and soon TensorFlow, with no modification to either the server or the API.

In practice, XGBoost’s gradient boosted trees are a great alternative or complement to deep models. TensorFlow supports both model and data distribution and has good support for LSTMs and other RNNs, while Caffe shines in production on image and text data. Having a generic API allows painless switches among these libraries.

The following is an attempt to write down the main principles that were used when building this particular machine learning API. Hopefully it will benefit others and/or generate some thoughts and criticism, leading to improvements.

I have been working with machine learning, deep learning, reinforcement learning and Markov decision processes for over ten years and I know the joys and pains. I have been developing my own tools and custom systems, most of them Open Source, for a variety of industries, from NASA Mars rovers activity planning to Airbus cybersecurity and other industrial automated systems.

So a year ago, my focus was on commoditizing the latest addition to the AI toolbox: deep learning, the neural net renaissance. Many great libraries are available out there and, what a great surprise, most of them are fully Open Source, transparent, contributor friendly and, most of all, up to date with the state of the art in publications. For the first time, I was seeing code put into the hands of practitioners before the papers had even reached the conference reviewers!

I had dealt with various industries and research institutes for long enough to know that only a small portion of enterprises and developers could absorb such a massive and fast-moving experimental codebase. My focus was therefore firmly on commoditizing machine learning, starting with the main deep learning algorithms and architectures.

While walking among the macaques and bears, I slowly organized the various elements in my mind. To start with, I knew that developers and practitioners would get used to the existing libraries, and that they’d certainly start building around them. On the other hand, existing businesses would ponder how to make good use of the new technology in a pragmatic way, with clear returns and without compromising the existing working codebases. What would they have in common, then?

  • Startups: build an authenticated API around a SaaS deep learning backend, need to scale, fast to market, fast move from development to production
  • Enterprises: seamless integration into the existing ecosystems, data flow is slow at first, models can be improved with more data over time, technology can spread to other tasks and departments

An Open Source project that fits both needs in another context, that of search, is Elasticsearch. It started as a scalable search backend with a clear REST API and full JSON input/output data structures. Though I believe their API is a bit bloated nowadays, I still like the product, and its growth was quite spectacular.

So what would a deep learning API with an integrated server backend look like, then? I needed to structure it a bit more:

  • No Rewrite: deep learning (and machine learning in general) is like cryptography, it is advisable not to write anything twice, as bugs & stochasticity lead to nasty situations, and this played in favor of uniting around a set of existing libraries
  • Seamless switch: having the same environment in both development and production accelerates the testing & deployment cycle, and avoids bugs
  • Simplicity in the command line: simple & human readable input / output format such as JSON eases the integration into existing pipelines, simplicity of the API is mandatory
  • Production: most of the lifetime of a professional machine learning service is expected to be spent on predicting from data, not training

If these elements could be brought together into a generic machine learning server with a simple yet powerful API, it would have to be malleable enough to accommodate both developer and enterprise needs, that is, the seamless switch from development to production (and back). It would have to speak JSON and unify several deep learning and machine learning libraries around a single framework and API while hiding most of their inner complexity.

So starting with a Machine Learning API, the core elements would be the resources and the data input / output structure.

Resources were taken to be the server’s resources, not just the machine learning services. This design was favored because GPU and memory would be the scarce resources to which machine learning jobs are POSTed. It was also simpler to memorize. Let’s see the core resources:

  • Server information: /info with GET
  • Machine learning services management: /services with PUT (create a service), GET (get a service state), POST (update a service)
  • Training of models: /train with POST (a new training job), GET (get a training job status), DELETE (cancel a training job)
  • Prediction: /predict with POST (send data over to a service)

So services holds the machine learning services, while train and predict are the resources associated with the two main operations on statistical models. Note that there’s no difference between supervised and unsupervised services at this stage.
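The resource list above can be written down as a small lookup table. This is only a sketch of the verb and resource pairs named in this post, not the server's routing code; the `RESOURCES` dict and the `describe` helper are hypothetical names introduced for illustration.

```python
# Verb/resource pairs exactly as listed in the post; the dict layout is illustrative.
RESOURCES = {
    "/info": {"GET": "server information"},
    "/services": {
        "PUT": "create a service",
        "GET": "get a service state",
        "POST": "update a service",
    },
    "/train": {
        "POST": "start a new training job",
        "GET": "get a training job status",
        "DELETE": "cancel a training job",
    },
    "/predict": {"POST": "send data over to a service"},
}


def describe(method: str, path: str) -> str:
    """Return what a call does, or raise ValueError if the pair is not in the table."""
    # "/services/imageserv" and "/services" both resolve to the "/services" resource.
    resource = "/" + path.strip("/").split("/")[0]
    try:
        return RESOURCES[resource][method.upper()]
    except KeyError:
        raise ValueError(f"unsupported call: {method} {path}") from None


print(describe("PUT", "/services/imageserv"))
print(describe("post", "/predict"))
```

Four resources and a handful of verbs fit on one screen, which is a large part of why the API is easy to memorize.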

Main parameters for machine learning are input acquisition & pre-processing, statistical processing, and final output. Thus the natural data structure that comes to mind is very simple: input, mllib and output. mllib corresponds to the specific parameters of each supported library, the other two are self-explanatory. Let’s see an example that creates an image classification service:

PUT /services/imageserv
{
  "description": "image classification service",
  "mllib": "caffe",
  "model": {
    "repository": "/path/to/models/imgnet",
    "templates": "../templates/caffe/"
  },
  "parameters": {
    "input": {
      "connector": "image"
    },
    "mllib": {
      "nclasses": 1000,
      "template": "googlenet"
    },
    "output": {
    }
  },
  "type": "supervised"
}

The breakdown of parameters into input, mllib and output is generic: it covers both supervised and unsupervised settings by adapting the output connector. The input connector deals with input formats, from CSV and libsvm to text (including character-based features) and images. The mllib component embeds the machine learning library parameters at service creation, training and prediction time. This is very convenient, as users can refer to each library’s original documentation for parameters, since these remain identical when used through the DeepDetect API.
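To make the input / mllib / output breakdown concrete, here is a minimal Python sketch that assembles a service creation call from those three parts. The helper name `service_creation_call` and its argument names are assumptions made for this example; only the JSON keys come from the request shown earlier, and nothing is sent over the network.

```python
import json


def service_creation_call(name, mllib, repository, input_params=None,
                          mllib_params=None, output_params=None,
                          description="", service_type="supervised"):
    """Return (method, path, body) for creating a service; nothing is sent."""
    body = {
        "description": description,
        "mllib": mllib,
        "model": {"repository": repository},
        # The three generic parameter groups, whatever the underlying library.
        "parameters": {
            "input": input_params or {},
            "mllib": mllib_params or {},
            "output": output_params or {},
        },
        "type": service_type,
    }
    return "PUT", f"/services/{name}", json.dumps(body, indent=2)


method, path, body = service_creation_call(
    "imageserv", "caffe", "/path/to/models/imgnet",
    input_params={"connector": "image"},
    mllib_params={"nclasses": 1000, "template": "googlenet"},
    description="image classification service",
)
print(method, path)
print(body)
```

Switching library then means changing the `mllib` name and the `mllib` parameter group, while the rest of the call keeps its shape.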

Let’s see an input connector for a CSV format:

"input": {
  "id": "Id",
  "label": "Cover",
  "separator": ",",
  "shuffle": true,
  "test_split": 0.1
}

Pretty straightforward and independent from the machine learning library. Let’s see a typical output connector for a training job:

"output": {
  "measure": [
    "acc",
    "mcll",
    "f1"
  ]
}
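For readers unfamiliar with the measure names, the sketch below shows what two of them compute, reading "acc" as accuracy and "mcll" as multi-class log loss. These are the textbook definitions written as plain Python, not the server's code; details such as the clipping constant `eps` are assumptions, and "f1" is left out because its averaging mode is not stated here.

```python
import math


def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class is the true label."""
    hits = 0
    for p, y in zip(probs, labels):
        predicted = max(range(len(p)), key=p.__getitem__)
        hits += predicted == y
    return hits / len(labels)


def multiclass_log_loss(probs, labels, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for p, y in zip(probs, labels):
        # Clip to avoid log(0) when a model is confidently wrong.
        total -= math.log(min(max(p[y], eps), 1.0))
    return total / len(labels)


# Three samples, three classes; each row holds per-class probabilities.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3]]
labels = [0, 1, 2]
print(accuracy(probs, labels))
print(multiclass_log_loss(probs, labels))
```

Both functions only need class probabilities and true labels, so they make no reference to the library that produced the predictions.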

Here again, the metrics apply to all supervised services. Now let’s look at a trick with the output. An output template in Mustache format can be set so that the standard JSON output can be transformed into any other format:

{
  "network": {
    "http_method": "POST",
    "url": "http://localhost:9200/images/img"
  },
  "template": "{ {{#body}}{{#predictions}} \"uri\":\"{{uri}}\",\"categories\": [ {{#classes}} { \"category\":\"{{cat}}\",\"score\":{{prob}} } {{^last}},{{/last}}{{/classes}} ] {{/predictions}}{{/body}} }"
}

The template above allows supervised classification results to be directly fed and indexed into Elasticsearch, see http://www.deepdetect.com/tutorials/es-image-classifier/ for the full details. Also take note of the network object, that holds the output server to POST to. This object could be used within the input connector as well, to connect remote sources.

The template above is applied to a typical supervised classification JSON output from the DeepDetect server:

"body": {
  "predictions": {
    "classes": [
      {
        "cat": "n03868863 oxygen mask",
        "prob": 0.24278657138347626
      }
    ],
    "loss": 0.0,
    "uri": "http://i.ytimg.com/vi/0vxOhd4qlnA/maxresdefault.jpg"
  }
}

This trick removes the need for glue code when integrating into existing pipelines, and nicely fits many enterprise use cases.
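To see what the output template does, here is the same mapping written as plain Python instead of Mustache: it takes the "body" of a classification output and builds the document the template would emit. The function name `to_search_document` is hypothetical, and this is the kind of glue code the template makes unnecessary rather than anything the server runs.

```python
import json


def to_search_document(response_body):
    """Map a classification output body to the shape the template emits."""
    prediction = response_body["predictions"]
    return {
        "uri": prediction["uri"],
        # cat -> category and prob -> score, one entry per predicted class.
        "categories": [
            {"category": c["cat"], "score": c["prob"]}
            for c in prediction["classes"]
        ],
    }


body = {
    "predictions": {
        "classes": [
            {"cat": "n03868863 oxygen mask", "prob": 0.24278657138347626}
        ],
        "loss": 0.0,
        "uri": "http://i.ytimg.com/vi/0vxOhd4qlnA/maxresdefault.jpg",
    }
}
doc = to_search_document(body)
print(json.dumps(doc, indent=2))
```

With the template, this renaming happens inside the server and the result is POSTed straight to the URL given in the network object.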

Let’s now take a quick look at the mllib component, with Caffe, then XGBoost:

// Caffe
"mllib": {
  "gpu": true,
  "net": {
    "batch_size": 128
  },
  "solver": {
    "test_interval": 1000,
    "iterations": 16000,
    "base_lr": 0.01,
    "solver_type": "SGD"
  }
}
// XGBoost
"mllib": {
"iterations": 100,
"objective": "multi:softprob"
}

With Caffe, the server is instructed to use the GPU, and other parameters set the solver, the learning rate, etc… With XGBoost, the number of iterations and the objective are set. In both cases, these parameters are from the respective machine learning libraries, making it easier for users with some prior knowledge.

Now, importantly, the prediction resource remains in practice independent from the mllib component. This matters given our observation that the lifetime of a machine learning service is mostly spent predicting from data:

curl -X POST 'http://localhost:8080/predict' -d '{"service":"covert","parameters":{"input":{"id":"Id","separator":","}},"data":["test.csv"]}'

The mllib component is omitted. Sometimes it can be useful however, typically when extracting features from a deep net. In our API jargon, this is akin to unsupervised learning, since the output is a tensor, not a class or a regression objective:

"mllib":{"extract_layer":"pool5/7x7_s1"}
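The two prediction variants can be put side by side in a short Python sketch that only builds the request bodies. The helper `predict_call` is a name made up for this example, and the image path in the second call is a placeholder; the JSON keys are the ones used in the calls above.

```python
import json


def predict_call(service, data, input_params=None, mllib_params=None):
    """Return (method, path, body) for a prediction; nothing is sent."""
    parameters = {}
    if input_params:
        parameters["input"] = input_params
    # The mllib group is optional at prediction time and usually omitted.
    if mllib_params:
        parameters["mllib"] = mllib_params
    body = {"service": service, "parameters": parameters, "data": list(data)}
    return "POST", "/predict", json.dumps(body)


# Plain prediction over a CSV file: no mllib parameters at all.
_, _, plain = predict_call(
    "covert", ["test.csv"], input_params={"id": "Id", "separator": ","}
)

# Feature extraction from a deep net: the only library-specific part is mllib.
_, _, features = predict_call(
    "imageserv", ["/path/to/image.jpg"],  # placeholder image path
    mllib_params={"extract_layer": "pool5/7x7_s1"},
)
print(plain)
print(features)
```

The caller code is the same in both cases, which is the point: the library only shows up when a library-specific feature is requested.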

In summary the core principles learnt from the elaboration of this machine learning API are:

  • Readability: all data structures are simple and human readable
  • Genericity: fits the generic endpoints of both supervised and unsupervised machine learning services
  • REST & Programmatic API: the API is available through the network, but remains identical from straight C++
  • Fictionality: ability to project the API further, to easily study the addition of features and resources

Let’s finish this short review with the last two points.

REST + Programmatic API

The presented API is available over HTTP and is RESTful. However, it is independent from the network and can be used from straight C++. Here is an example:

// Create machine learning service
APIData model_ad;
model_ad.add("repository",model_repo);
CaffeModel cmodel(model_ad);
add_service(service_name,std::move(MLService<CaffeLib,ImgCaffeInputFileConn,SupervisedOutput,CaffeModel>(service_name,cmodel)));
// Predict over data
std::vector<std::string> vdata = { image_filename };
APIData ad; // prediction call parameters
ad.add("data",vdata);
APIData out;
predict(ad,0,out); // prediction output is in out
// Destroy the service
APIData adr; // empty but can serve to pass parameters
remove_service(service_name,adr);

The important construct in the code above is APIData. It is the server’s internal representation of all of the API data structures. It is equivalent to JSON, but for some ugly performance reasons it is not exactly JSON.

However, it reads from and converts to JSON. APIData makes it possible to write machine learning services, and to train and predict with Caffe, XGBoost and others, still using the same API: instead of JSON, the exact same parameters are fed to APIData objects.

The point above is of importance, as it means that changes to the API do not affect the inner server. Typically, adding a new parameter to the API, handling it from the machine learning code, does not modify the server itself.
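The idea of a JSON-equivalent internal container can be illustrated with a toy Python analogue. This is not DeepDetect's APIData, whose internals are not described here; the class name `ApiData` and all of its methods except `add` are invented for this sketch, which only shows how nested key/value objects can carry the same parameters as the JSON calls.

```python
import json


class ApiData:
    """Toy nested key/value container that converts to and from JSON."""

    def __init__(self):
        self._values = {}

    def add(self, key, value):
        self._values[key] = value
        return self  # returning self allows chained add() calls

    def get(self, key):
        return self._values[key]

    def to_dict(self):
        # Nested ApiData objects become nested dicts; other values pass through.
        return {
            k: v.to_dict() if isinstance(v, ApiData) else v
            for k, v in self._values.items()
        }

    def to_json(self):
        return json.dumps(self.to_dict())

    @classmethod
    def from_dict(cls, values):
        ad = cls()
        for k, v in values.items():
            # JSON objects become nested ApiData; lists and scalars are kept as-is.
            ad.add(k, cls.from_dict(v) if isinstance(v, dict) else v)
        return ad

    @classmethod
    def from_json(cls, text):
        return cls.from_dict(json.loads(text))


input_ad = ApiData().add("id", "Id").add("separator", ",")
params = ApiData().add("input", input_ad)
ad = ApiData().add("service", "covert").add("parameters", params).add("data", ["test.csv"])
text = ad.to_json()
print(text)
round_trip = ApiData.from_json(text)
print(round_trip.get("parameters").get("input").get("separator"))
```

Because the container mirrors the JSON structure key for key, a new API parameter is just one more key that the machine learning code reads, with no change to the code that moves the container around.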

API Fiction

The second and last point is “fictionality”, and it remains my favorite. This is what makes it possible to anticipate changes to the API in a simple manner. It also lets users navigate the API intuitively without going back to the documentation every few minutes.

I love to try it when pondering future additions to the API. Here is a first one, the (fictionized) ability to schedule lists of jobs on the server. Let’s fictionize a new resource named chain that would chain up any list of API operations:

PUT /chain/chain_name
{
  "chain": {
    "call": {
      "resource": "predict",
      "calls": [
        {"service":…},
        {"service":…}
      ]
    }
  }
}

This basically takes the JSON calls on existing resources as objects and chains them up. Here is a second one: let’s say we would like to use the chain resource to implement ensembling over multiple predictions:

PUT /chain/chain_name
{
  "chain": {
    "call": {
      "ensembling": {
        "type": "vote"
      },
      "resource": "predict",
      "calls": [
        {"service":…},
        {"service":…}
      ]
    }
  }
}

We could change the chain call for supporting different types of ensembling:

"ensembling": {
"type":"average"
}
"ensembling": {
"type":"weighs",
"weights":[0.3,0.7]
}
"ensembling": {
"type":"rank_average"
}
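Since the chain resource is fiction, so is any implementation of it, but the ensembling types themselves are standard techniques. The sketch below shows one plausible reading of each in Python, with every function name assumed for illustration; each prediction is a dict mapping a class name to a probability, and the type string "weighs" is spelled as in the example above.

```python
from collections import Counter


def vote(predictions):
    """Majority vote over each model's top class; ties go to the first seen."""
    tops = [max(p, key=p.get) for p in predictions]
    return Counter(tops).most_common(1)[0][0]


def average(predictions, weights=None):
    """Weighted mean probability per class; weights default to uniform."""
    if weights is None:
        weights = [1.0 / len(predictions)] * len(predictions)
    classes = sorted(set().union(*predictions))
    return {
        c: sum(w * p.get(c, 0.0) for w, p in zip(weights, predictions))
        for c in classes
    }


def rank_average(predictions):
    """Mean rank per class across models (1 = most probable, lower is better)."""
    classes = sorted(set().union(*predictions))
    totals = dict.fromkeys(classes, 0.0)
    for p in predictions:
        ordered = sorted(classes, key=lambda c: p.get(c, 0.0), reverse=True)
        for rank, c in enumerate(ordered, start=1):
            totals[c] += rank
    return {c: t / len(predictions) for c, t in totals.items()}


def ensemble(config, predictions):
    """Dispatch on the fictional "ensembling" object of a chain call."""
    kind = config["type"]
    if kind == "vote":
        return vote(predictions)
    if kind == "average":
        return average(predictions)
    if kind == "weighs":
        return average(predictions, config["weights"])
    if kind == "rank_average":
        return rank_average(predictions)
    raise ValueError(f"unknown ensembling type: {kind}")


model_a = {"cat": 0.6, "dog": 0.4}
model_b = {"cat": 0.3, "dog": 0.7}
print(ensemble({"type": "average"}, [model_a, model_b]))
print(ensemble({"type": "weighs", "weights": [0.3, 0.7]}, [model_a, model_b]))
```

Each strategy only consumes the outputs of the chained predict calls, so adding a new ensembling type would not touch the calls themselves.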

This machine learning API is still young, and much work remains. But as a fourth attempt, I believe it has hit the right level, at least the one I was looking for.

Despite being young, the API and the server are robust because they are simple. As such, they already power several production services in both startups and large corporations. And that was quick, or at least it has felt that way.

Much remains to be done, and a malleable, simple and generic API is a great way to update a product in a lean manner. Better to waste time fictionizing new additions while hiking than to lose weeks or months of hard work in front of a screen.

Or at least that is how I prefer to spend my time.