How to run many deep learning models locally
Quick! Build two networks and have them predict on a webcam! Sounds hard, right?
Motivation for this type of abstraction
Software is building blocks. As the saying goes, “It’s turtles all the way down.” Much of the advancement in deep learning has focused on the hard work of getting models to train well and perform in a research setting. Now it’s time to turn attention to how we use those models in a day-to-day environment.
Starting point — run a single model
When to use local
Running a model locally (instead of through an API call) usually makes sense for:
- processing a high volume of information
- time-sensitive data
- data security
Here I’ll walk through one way to run a model locally using TensorFlow. We will:
- Perform a one-time setup to load the brain
- Run new images
We assume we wish to run multiple brains, and want to keep each brain independent.
The three parts we set up are:
- The weights
- The graph definition
- A label map
TensorFlow’s saved model format bundles the weights and graph definition. Collectively we refer to all three as part of the Brain.
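As a rough sketch, the bundle could be represented like this (the class and field names are illustrative, not the SDK’s actual API):

```python
from dataclasses import dataclass, field

@dataclass
class Brain:
    """Hypothetical container for the three parts of a brain."""
    weights_path: str       # path to the saved model weights
    graph_def_path: str     # path to the serialized graph definition
    label_map: dict = field(default_factory=dict)  # model_id -> label name

brain = Brain(
    weights_path="model/weights",
    graph_def_path="model/graph.pb",
    label_map={0: "page", 1: "graph"})
```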
The setup process may be thought of as:
- Get the brain. While we want to run the brain locally, we assume the latest version lives on a remote server.
- Load the brain into memory.
Getting the brain
- Make a call to an endpoint that returns the remote file paths needed to download the brain
- Request the remote file to get the model
- Fiddle with the label maps.
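The first two steps can be sketched as a small helper. The endpoint response shape and field names here are hypothetical, and the network call is injected so the flow is easy to see (and test) in isolation:

```python
def get_brain_file(endpoint_response: dict, fetch) -> bytes:
    """Two-step fetch: the endpoint gives us a download link,
    then we request the remote file itself."""
    # Step 1: the endpoint response contains the remote file path
    download_url = endpoint_response["download_url"]
    # Step 2: request the remote file to get the model bytes
    return fetch(download_url)

# Example with a stubbed fetcher standing in for an HTTP GET
model_bytes = get_brain_file(
    {"download_url": "https://example.com/brain.pb"},
    fetch=lambda url: b"fake-model-bytes")
```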
Label map fun
There are at least 3 label maps.
The rationale for this is:
- The name is considered arbitrary, so the real reference point is the file_id
- The model_id is the sequential id, usually starting at 0, that the model was actually trained with.
Given the number of labels is usually < 100 and relatively static, it makes sense to simply keep these dictionaries available for fast access, depending on which direction we are converting.
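A minimal sketch of those dictionaries, with made-up ids (illustrative only):

```python
# file_id is the stable reference; model_id is the sequential training id.
file_id_to_name = {101: "page", 102: "graph"}
file_id_to_model_id = {101: 0, 102: 1}
# The reverse map lets us go from a model's raw output back to a label.
model_id_to_file_id = {v: k for k, v in file_id_to_model_id.items()}

def label_for_model_id(model_id: int) -> str:
    """Convert a raw model output id into a human-readable label."""
    return file_id_to_name[model_id_to_file_id[model_id]]
```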
Load model into memory
- Read the file we just downloaded
- Get a session ready²
with tf.gfile.GFile(self.model_path, 'rb') as fid:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(fid.read())
with self.graph.as_default():
    tf.import_graph_def(graph_def, name='')
self.sess = tf.Session(graph=self.graph)
Great! Now we are ready to run the model on demand.
- Open the image and read the data³
- Run session
- Parse high confidence values
The abbreviated version is:
with open(path, "rb") as image_file:
    image = image_file.read()
image = tf.compat.as_bytes(image)
We then run the session and parse the high confidence values:
for i in range(self.boxes.shape[0]):
    if self.scores[i] > self.min_score_thresh:
Current algorithms usually have a bunch of low confidence predictions that must be discarded.
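That filtering step can be sketched in plain Python (illustrative, not the SDK’s code; assumes boxes and scores are parallel sequences, as detection models typically return):

```python
def filter_high_confidence(boxes, scores, min_score_thresh=0.5):
    """Keep only (box, score) pairs above the confidence threshold."""
    keep = []
    for box, score in zip(boxes, scores):
        if score > min_score_thresh:
            keep.append((box, score))
    return keep

results = filter_high_confidence(
    boxes=[(0, 0, 10, 10), (5, 5, 8, 8)],
    scores=[0.92, 0.11])
# Only the first prediction survives the 0.5 threshold.
```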
By default we create an Inference() object that contains an Instance() object for each high confidence prediction.
The goal being to make it easier to work with the output through a standardized interface.
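A hypothetical shape for those output objects (the real SDK classes may carry more fields):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Instance:
    """One high confidence prediction."""
    label: str
    score: float
    box: Tuple[int, int, int, int]

@dataclass
class Inference:
    """The standardized container for one model run."""
    instances: List[Instance]

inference = Inference(
    instances=[Instance(label="page", score=0.92, box=(0, 0, 10, 10))])
```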
We abstract the brain setup into:
brain = project.get_model(
name = None,
local = True)
Then the run:
inference = brain.predict_from_local(file_path)
- A clean abstraction for different deep learning methods, local vs. online prediction, and file types
- Designed for changing models and data: the same object you call .train() on can also call .predict()
- Ground-up support for many models. See local_cam for one example.
The goal is to be able to call a similar method, be it for an object detection problem, semantic segmentation, or some future method.
Two brains are better than one
Get two brains:
page_brain = project.get_model(
    name = "page_example_name",
    local = True)

graphs_brain = project.get_model(
    name = "graph_example_name",
    local = True)
We open an image from a local path and run both brains on the same image. We are only reading the image once, so you can stack as many brains as you need here (though there are memory and compute implications as you add more).
with open(path, "rb") as image_file:
    image = image_file.read()

page_inference = page_brain.run(image)
graphs_inference = graphs_brain.run(image)
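Stacking more brains follows the same pattern: read once, run many. A sketch, assuming each brain exposes a run() method taking raw image bytes as above (the FakeBrain is a stand-in for testing the flow):

```python
def run_brains(brains: dict, image_bytes: bytes) -> dict:
    """Run every brain on the same, already-read image bytes."""
    return {name: brain.run(image_bytes) for name, brain in brains.items()}

class FakeBrain:
    """Stand-in brain; a real one would run a model on the bytes."""
    def __init__(self, label):
        self.label = label
    def run(self, image_bytes):
        return f"{self.label}:{len(image_bytes)} bytes"

inferences = run_brains(
    {"page": FakeBrain("page"), "graphs": FakeBrain("graphs")}, b"abc")
```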
Why many models?
- pages all look similar
- what’s on the page will likely have a lot more variance, and require a lot more data.
So the trade-off here is:
- More compute
- Less annotation
- More flexibility — it’s easier.
This is the age-old argument of new vs. old. And historically, we tend to favor what’s easier.
Thanks for reading!
The SDK is a work in progress. There’s a lot of stuff in current version I’m not happy with yet! If you see any glaring issues or feature ideas please feel free to create an issue here.
1. As is standard, the model was trained automatically. It was fine-tuned from prior data, and trained on similar images in the same book. If this sounds like “cheating,” consider it the new age of working with deep learning, where we purpose-build our training data to most closely fit our test distribution. It works!
2. Using a feed_dict is considered suboptimal in some cases, and this whole setup assumes we must construct the graph definition ourselves, which sounds to be done differently in TF 2.0.
3. The online prediction, at time of writing, expects an encoded_string_tensor, which requires the acrobatics of tf.compat.as_bytes().