TensorFlow: How to export and freeze models with the Python API and deploy them with the C++ API
Recently, I was in charge of deploying a YOLO network (a deep learning network for real-time object detection) as part of the Audi Autonomous Driving Cup (AADC). This led me to explore how to train models and how to export and deploy them. This article will focus on exporting models from your training environment and deploying them in C++ applications.
It was a bit painful due to the many misleading tutorials and Stack Overflow answers that require you to install Bazel and build TensorFlow (TF) on your machine, which you clearly do not need unless you are using the C++ API; on top of that, many of those tutorials were written for older TF versions. A good starting point for me was this article, but we will be doing things differently here.
A general machine learning application pipeline includes:
- Pre-processing training data
- Training along with validation and test sets
These two steps are, and should be, unrelated to how you deploy your model in production and to the framework or programming language that will run your algorithm. In practice, these two steps are usually done in Python with the TensorFlow Python API, and you end up with a TensorFlow graph definition. As said before, this article will not go into how to train models but rather how to export, freeze and deploy them in C++. The next steps are:
- Export a Graph definition (model) and the weights
- Freezing the graph
- Deploying the frozen graph
Step #1:
Somewhere in the code before your training loop, and where you have access to the TensorFlow Session, the following call should be executed:
# Args:
# graph_or_graph_def: A Graph or a GraphDef protocol buffer.
# logdir: Directory where to write the graph. This can refer to remote filesystems, such as Google Cloud Storage (GCS).
# name: Filename for the graph.
# as_text: If True, writes the graph as an ASCII proto.
tf.train.write_graph(session.graph_def, "./export", "network.pb", False)
This saves a serialized GraphDef
to a file that holds a network of nodes, each representing one operation, connected to each other as inputs and outputs. Most importantly, the file represents just the structure, with no weights. To inspect the nodes in the exported file, the following Python script prints all the node names in the graph:
import tensorflow as tf
from tensorflow.python.platform import gfile

# TODO: point to where the graph is located
GRAPH_PB_PATH = ''

with tf.Session() as sess:
    print("load graph")
    with gfile.FastGFile(GRAPH_PB_PATH, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        sess.graph.as_default()
        tf.import_graph_def(graph_def, name='')
        graph_nodes = [n for n in graph_def.node]
        names = []
        for t in graph_nodes:
            names.append(t.name)
        print(names)
The next step is to save the weights; this can easily be done with the Saver API. You can create a tf.train.Saver object after your variables have been declared and initialized, and before using the session (i.e. before entering the training or inference loop):
saver = tf.train.Saver()
Then, within your training loop, you can call Saver#Save, where the first argument is the Session and the second argument is the path prefix for the checkpoint files to be created:
saver.save(session, "./export/my_model_checkpoint")
This saves four different files (checkpoint, my_model_checkpoint.data-00000-of-00001, my_model_checkpoint.index and my_model_checkpoint.meta); the important ones here are the “.data” file, which holds the weights, and the “.meta” file, which holds the graph and all its metadata (so you can retrain it, etc.).
But when we want to serve a model in production, we don’t need any of that extra metadata cluttering our files; we just want the model and its weights nicely packaged in one file. This facilitates storage, versioning and updating of your different models.
Step #2:
Freezing a graph converts your variable tensors into constant tensors and combines the graph structure with the values from the checkpoint files into a single file, so when you import it into your C++ code it contains both your network architecture and your trained variable values.
One of the most common pitfalls I saw out there is tutorials or guides where you have to build TensorFlow with Bazel to get this done, but this is an unnecessary step that just complicates the whole procedure. To do this step you only need an installed TensorFlow, which you can use directly to freeze your model.
Just go to your TensorFlow installation directory and execute the following (also refer to the documentation for more options and correct usage):
python python/tools/freeze_graph.py \
--input_graph=<REPLACE_WITH_WORKING_DIR>/export/network.pb \
--input_checkpoint=<REPLACE_WITH_WORKING_DIR>/export/my_model_checkpoint \
--output_graph=<REPLACE_WITH_WORKING_DIR>/export/frozen_network.pb \
--output_node_names=<REPLACE_WITH_YOUR_MODEL_OUTPUT_NODES> \
--input_binary
Internally, the script uses the utility function tf.graph_util.convert_variables_to_constants, which replaces all the variables in a graph with constants holding the same values. Per the documentation:
If you have a trained graph containing Variable ops, it can be convenient to convert them all to Const ops holding the same values. This makes it possible to describe the network fully with a single GraphDef file, and allows the removal of a lot of ops related to loading and saving the variables.
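If you prefer to stay in Python rather than invoking the script, a minimal sketch of the same freeze step could look like the following; the checkpoint paths and the "output" node name are placeholders, so substitute your own:
import tensorflow as tf

# Placeholders: adjust the paths and output node names to your own model
META_PATH = "./export/my_model_checkpoint.meta"
CKPT_PREFIX = "./export/my_model_checkpoint"
OUTPUT_NODES = ["output"]

with tf.Session() as sess:
    # restore the graph structure and the trained weights
    saver = tf.train.import_meta_graph(META_PATH)
    saver.restore(sess, CKPT_PREFIX)
    # bake the variable values into the graph as constants
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, OUTPUT_NODES)
    with tf.gfile.GFile("./export/frozen_network.pb", "wb") as f:
        f.write(frozen.SerializeToString())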
You can also prune all the nodes you don’t use for inference, saving additional space and optimizing performance. Again, refer to the documentation for more options and correct usage:
python python/tools/optimize_for_inference.py \
--input=<REPLACE_WITH_WORKING_DIR>/export/frozen_network.pb \
--output=<REPLACE_WITH_WORKING_DIR>/export/optim_frozen_network.pb \
--input_names=<REPLACE_WITH_YOUR_MODEL_INPUT_NODE> \
--output_names=<REPLACE_WITH_YOUR_MODEL_OUTPUT_NODES> \
--frozen_graph=True
Step #3:
After ending up with a frozen graph from the previous step, this step details how to do inference in C++ by means of a utility class, while also noting some pitfalls. The project dependencies are:
- OpenCV C++
- Eigen
- Protobuf
- TensorFlow C++
Create an ObjectDetection.h file with the following content:
#ifndef WRAPPER_H
#define WRAPPER_H
#include "tensorflow/core/public/session.h"
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/lib/io/path.h"
#include "tensorflow/cc/ops/const_op.h"
#include "tensorflow/cc/ops/image_ops.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/graph/default_device.h"
#include "tensorflow/core/graph/graph_def_builder.h"
#include "tensorflow/core/lib/core/errors.h"
#include "tensorflow/core/lib/core/stringpiece.h"
#include "tensorflow/core/lib/core/threadpool.h"
#include "tensorflow/core/lib/strings/str_util.h"
#include "tensorflow/core/lib/strings/stringprintf.h"
#include "tensorflow/core/platform/init_main.h"
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/platform/types.h"
#include "tensorflow/core/util/command_line_flags.h"
#include <opencv2/opencv.hpp>
using namespace tensorflow;
using namespace cv;
using namespace std;
class ObjectDetectionWrapper {
int32 input_width;
int32 input_height;
string input_tensor_name;
string output_tensor_name;
unique_ptr<Session> session;
GraphDef graph_def;
public:
ObjectDetectionWrapper(int32 input_width_, int32 input_height_, const string& input_tensor_name_, const string& output_tensor_name_);
Status load_graph(const string&);
vector<Tensor> forward_path(const Mat&);
Tensor readTensorFromMat(const Mat&);
};
#endif //WRAPPER_H
This is how our header file looks: it includes various useful TensorFlow, OpenCV and standard-library headers and declares six members that define the input size of the model’s input layer, the input and output layer names, the session used to load the graph and run predictions, and the graph definition.
The public API of the wrapper consists of the constructor that initializes these members, a method that loads the graph into the session, a method that converts an OpenCV cv::Mat into a TensorFlow Tensor, and a method that runs a prediction on a single image.
The implementation file ObjectDetection.cpp includes our header file and defines the non-empty constructor that configures our utility class:
ObjectDetectionWrapper::ObjectDetectionWrapper(int32 input_width_, int32 input_height_, const string& input_tensor_name_, const string& output_tensor_name_) {
    input_width = input_width_;
    input_height = input_height_;
    input_tensor_name = input_tensor_name_;
    output_tensor_name = output_tensor_name_;
}
Then we get to the function implementations; the first one loads the frozen graph into the session using its path:
Status ObjectDetectionWrapper::load_graph(const string& graph_path) {
    Status load_graph_status =
        ReadBinaryProto(tensorflow::Env::Default(), graph_path, &graph_def);
    if (!load_graph_status.ok()) {
        return tensorflow::errors::NotFound("Failed to load compute graph at '", graph_path, "'");
    }
    session.reset(tensorflow::NewSession(tensorflow::SessionOptions()));
    Status session_create_status = session->Create(graph_def);
    return session_create_status;
}
Then, to convert an input image into a tensor and run predictions, the following can be used:
Tensor ObjectDetectionWrapper::readTensorFromMat(const Mat& mat) {
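    // Note: this assumes 'mat' is an 8-bit, 3-channel BGR image (CV_8UC3),
    // already resized to input_width x input_height; the pixel values are
    // written into the tensor in RGB order and scaled to [0, 1].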
    int depth = mat.channels();
    int batch = 1;
    Tensor inputTensor(tensorflow::DT_FLOAT, tensorflow::TensorShape({batch, input_height, input_width, depth}));
    auto inputTensorMapped = inputTensor.tensor<float, 4>();
    const tensorflow::uint8 *source_data = (tensorflow::uint8 *) mat.data;
    const tensorflow::uint8 *all_sources[1] = {source_data};
    for (int b = 0; b < batch; b++) {
        const tensorflow::uint8 *source_data_temp = all_sources[b];
        for (int y = 0; y < input_height; y++) {
            const tensorflow::uint8 *source_row = source_data_temp + (y * input_width * depth);
            for (int x = 0; x < input_width; x++) {
                const tensorflow::uint8 *source_pixel = source_row + (x * depth);
                const tensorflow::uint8 *source_value_blue = source_pixel;
                const tensorflow::uint8 *source_value_green = source_pixel + 1;
                const tensorflow::uint8 *source_value_red = source_pixel + 2;
                inputTensorMapped(b, y, x, 0) = (*source_value_red) / 255.;
                inputTensorMapped(b, y, x, 1) = (*source_value_green) / 255.;
                inputTensorMapped(b, y, x, 2) = (*source_value_blue) / 255.;
            }
        }
    }
    return inputTensor;
}
vector<Tensor> ObjectDetectionWrapper::forward_path(const Mat& camera_image) {
    Mat input_mat;
    vector<Tensor> outputs;
    Tensor inputTensor;
    resize(camera_image, input_mat, Size(input_width, input_height));
    inputTensor = ObjectDetectionWrapper::readTensorFromMat(input_mat);
    Status run_status = session->Run({{input_tensor_name, inputTensor}},
                                     {output_tensor_name}, {}, &outputs);
    if (!run_status.ok()) {
        LOG(ERROR) << "Running model failed: " << run_status;
    }
    return outputs;
}
The above piece of code does the following: it resizes the input image to the network input size as configured in the constructor, and then converts it to a Tensor. The conversion also normalizes the input; if you do not need that, just remove the division by 255 in readTensorFromMat. Then a forward pass is run on the network using Session#Run and the output is returned. The returned vector has one entry per output node, so you need to index into it, i.e. if you only have one output layer then outputs[0] holds the values of that layer.
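For example, a minimal sketch of reading the raw values of a single output node could look like this (assuming the node produces a float tensor; the actual post-processing depends on your model):
// Sketch: access the values of the first (and only) output node.
// Assumes the output tensor holds floats; adapt to your model's output format.
const tensorflow::Tensor& output = outputs[0];
auto values = output.flat<float>();   // flatten the tensor to a 1-D view
for (int i = 0; i < values.size(); ++i) {
    float v = values(i);              // e.g. a class score or a box coordinate
    // ... post-process v according to your model
}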
Example usage of the wrapper class:
#include <iostream>
#include "ObjectDetection.h"
int main() {
    // TODO: provide your working directory where the input image and frozen graph reside
    string root_dir = "/home/mohamed/code/deployObjectDetection/";
    string image = "data/grace_hopper.jpg";
    string graph = "data/inception_v3_2016_08_28_frozen.pb";
    // TODO: provide your own model configurations
    int32 input_width = 299;
    int32 input_height = 299;
    string input_layer = "input";
    string output_layer = "InceptionV3/Predictions/Reshape_1";
    Mat inputImg = imread(root_dir + image);
    ObjectDetectionWrapper predictor(input_width, input_height, input_layer, output_layer);
    // load frozen graph
    Status load_graph_status = predictor.load_graph(root_dir + graph);
    if (!load_graph_status.ok()) {
        LOG(ERROR) << load_graph_status;
        return -1;
    }
    // forward pass on the model
    vector<Tensor> outputs = predictor.forward_path(inputImg);
    if (outputs.empty()) {
        LOG(ERROR) << "Running model failed";
        return -1;
    }
    // TODO: post-processing of network output
    return 0;
}
TL;DR
Have a look at my repository, which contains the implementation and example usage. Look for TODO in the code base to know what you need to change in order to fully run the project.
References
- https://medium.com/@hamedmp/exporting-trained-tensorflow-models-to-c-the-right-way-cf24b609d183
- https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc
- https://stackoverflow.com/questions/38947658/tensorflow-saving-into-loading-a-graph-from-a-file
- https://www.tensorflow.org/guide/extend/model_files
- https://medium.com/@fanzongshaoxing/tensorflow-c-api-to-run-a-object-detection-model-4d5928893b02
- TensorFlow documentation