Running an Occasionally Connected Neural Network at the Edge with Monocker (TempusML)

TensorFlow Model Serving with Microservices


by Randy Pitcher

I have felt for a long time that there is a vacuum in the way we discuss data science. I’ve read articles emphasizing the importance of clean data, I’ve watched videos that explain how to properly train and evaluate models, and I’ve seen discussions on the pros and cons of different data science tools.

Where this content falls short is in discussing what you do after you have a production-ready model. Does clean data, an accurate model, and having the latest tools really matter if no one is able to make use of your results?

Focus on Algorithms, Not Model Deployment, Consumption, Tracking, and Management

While the perfect data scientist would be able to expose the proper API for all models they build, it is unrealistic to expect that every data scientist within an organization is perfect or follows a defined process. Some will have the programming ability to build robust and scalable consumption layers, but many others will waste time struggling to build something that can’t be reused and that won’t scale. And really, does it make sense to waste a data scientist’s time on something that should be automatable?

But even after a consumption strategy has been implemented, you now have the problem of keeping track of all the ways your various models can be consumed.

Why force your team to constantly rewrite scattered model APIs when they could be working on your next latest and greatest algorithm?

(Note: An API, or Application Programming Interface, is just a way for two programs to talk to each other. For example, your browser used an API to ask medium.com for this blog post. Check out this link for a much better, still readable explanation: https://medium.freecodecamp.org/what-is-an-api-in-english-please-b880a3214a82)

Automate Model Deployment and Tracking with Monocker (TempusML)

To help address these issues, we’ve started work on Project Monocker (a subset of TempusML).

Monocker is a Kubernetes+Docker based collection of services for rapid deployment of TensorFlow models. It lets your data science team focus more on building models and less on wrestling with API design, load balancing, or keeping track of where models are deployed.

Monocker is developed in the open and released under the Apache 2.0 license.

Monocker Design Principles

At Hashmap, we emphasize simple solutions to big problems delivered quickly.

Monocker is built around six simple requirements:

- All models must be automatically deployed

- All deployments must have a REST API for using models

- Models must be easy to find and easy to document

- Model usage must be tracked and easy to view

- Monocker must be flexible and scalable for variable traffic

- Monocker must be able to operate in occasionally connected environments

To achieve these requirements in a flexible way, Monocker is Docker-based and built with a microservice architecture. This allows for rapid development of new features that can be quickly incorporated into Monocker with very little change management.

Monocker architecture is depicted below:

[Architecture diagram: monocker-models, monocker-registry, monocker-api, monocker-logging, and a central sqlite db]

So How Does Monocker Work?

At a high level, monocker-models are created, register their information and location with the monocker-registry instance, and listen for model requests.
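As a rough sketch, the registration handshake might look something like this. The registry URL, endpoint path, and payload fields here are all assumptions for illustration, not Monocker's actual contract:

```python
import json
from urllib import request

# Hypothetical sketch of a monocker-model announcing itself to the
# monocker-registry on startup. The address, route, and field names
# are assumptions -- check the Monocker repo for the real contract.

REGISTRY_URL = "http://monocker-registry:5000/register"  # assumed address

def build_registration(name, description, host, port):
    """Collect the metadata the registry needs to locate and document a model."""
    return {
        "name": name,                 # unique model identifier
        "description": description,   # shown in the model browser UI
        "endpoint": f"http://{host}:{port}/predict",  # where requests are served
    }

def register(payload):
    """POST the registration payload to the registry as JSON."""
    req = request.Request(
        REGISTRY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return request.urlopen(req)

payload = build_registration(
    "iris-classifier", "Predicts iris species", "10.0.0.7", 8501
)
```

Once registered, the model simply listens on its advertised endpoint and waits for traffic.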

The monocker-api gets information about which models are available, where they are, and what they do from the monocker-registry. The api provides a UI for users to browse these models and an interactive API page for testing calls to the models (implemented with Swagger).

Importantly, the monocker-api ensures that each model is reachable through a REST API. This matters for traditional web applications that may have trouble interacting with other communication protocols (TensorFlow Serving exposes a gRPC interface by default).
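To make that translation concrete, here is a minimal sketch of the REST-to-model hand-off, with the backend gRPC call replaced by a stand-in function. The "instances"/"predictions" field names are assumptions for illustration:

```python
import json

# Sketch of the kind of REST-to-model translation the monocker-api performs.
# The real service fronts TensorFlow Serving's gRPC interface; here the gRPC
# call is replaced with a stand-in so the shape of the translation is clear.

def grpc_predict(instances):
    """Stand-in for the gRPC call to the model backend."""
    return [[sum(row)] for row in instances]  # dummy model: sum of features

def handle_rest_request(body: bytes) -> bytes:
    """Translate a JSON REST request into a model call and back to JSON."""
    instances = json.loads(body)["instances"]   # parse the REST payload
    predictions = grpc_predict(instances)       # delegate to the model backend
    return json.dumps({"predictions": predictions}).encode("utf-8")

response = handle_rest_request(b'{"instances": [[1.0, 2.0], [3.0, 4.0]]}')
```

A web app can now talk plain JSON over HTTP without ever knowing gRPC is involved.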

All logging is handled through the monocker-logging service. This is where errors are recorded as well as where model usage traffic is captured. This allows for efficient debugging and analysis in a single location.
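A usage event shipped to the logging service might look something like the sketch below. The schema and the idea of shipping JSON lines are assumptions for illustration, not Monocker's actual format:

```python
import json
import time

# Hypothetical usage record a monocker-model might emit to the
# monocker-logging service after each request. Field names are assumptions.

def usage_event(model_name, status, latency_ms, timestamp=None):
    """Build one structured usage record for the logging service."""
    return {
        "model": model_name,             # which model served the request
        "status": status,                # "ok" or "error", for debugging
        "latency_ms": latency_ms,        # per-request latency, for analysis
        "ts": timestamp or time.time(),  # when the request was handled
    }

event = usage_event("iris-classifier", "ok", 12.5, timestamp=1700000000.0)
line = json.dumps(event)  # one JSON line, ready to ship to monocker-logging
```

Structured events like this are what make the "single location" debugging and traffic analysis possible.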

The sqlite db provides a data persistence layer that can be easily accessed from Python for visualization and analysis. It also connects easily to visualization tools for live monitoring.
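Here is a small sketch of the kind of analysis that persistence layer enables. The table name and columns are assumptions for illustration; the point is that plain Python can query usage data directly for dashboards or notebooks:

```python
import sqlite3

# An in-memory stand-in for Monocker's sqlite persistence layer, with an
# assumed model_usage table. The real schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model_usage (model TEXT, status TEXT, latency_ms REAL)")
conn.executemany(
    "INSERT INTO model_usage VALUES (?, ?, ?)",
    [
        ("iris-classifier", "ok", 12.5),
        ("iris-classifier", "ok", 10.0),
        ("churn-model", "error", 55.0),
    ],
)

# Request count and average latency per model -- the kind of query a live
# monitoring dashboard might run against the persistence layer.
rows = conn.execute(
    "SELECT model, COUNT(*), AVG(latency_ms) "
    "FROM model_usage GROUP BY model ORDER BY model"
).fetchall()
```

Because it is just sqlite, anything from pandas to a BI tool can sit on top of the same file.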

A Point on Scalability and Automated Deployment with Kubernetes

An important final note is that every Monocker service except the central database is independently scalable. This means that if you have high user traffic, you can simply scale out additional instances of the monocker-api to handle it. This is all trivially simple thanks to Monocker’s integration with Kubernetes, which makes Monocker ready for cloud, on-prem, and hybrid environments.

Let Us Hear From You

Monocker is a simple and flexible tool that can help your data science teams generate more value, waste less time, and focus on what they enjoy most!

We add new features to Monocker every day, so please feel free to submit feature requests, bug reports, or pull requests on our GitHub: https://github.com/hashmapinc/Monocker

If you’d like more information about how Hashmap, Monocker, and TempusML can help your organization spend less time struggling with data science and more time benefitting from it, please reach out to me (randy.pitcher@hashmapinc.com).

Feel free to share on other channels, and be sure to keep up with all new content from Hashmap at https://medium.com/hashmapinc.

Randy Pitcher is a Big Data Developer at Hashmap working across industries with a group of innovative technologists and domain experts accelerating high value business outcomes for our customers. You can connect with him on LinkedIn at linkedin.com/in/randypitcherii.
