From training to deployment: A tutorial on serving TensorFlow models with TensorFlow Serving

This tutorial assumes that you have TensorFlow and TensorFlow Serving installed and configured.

Step 1: Define the model

import tensorflow as tf
import numpy as np

# This tutorial uses the TensorFlow 1.x-style graph and session APIs
# through tf.compat.v1, so eager execution must be disabled first
tf.compat.v1.disable_eager_execution()

# Define the model: a simple linear function y = x * W + b
def model(x):
    return x * W + b

# Define the loss function: mean squared error
def loss(y_pred, y_true):
    return tf.reduce_mean(tf.square(y_pred - y_true))

# Define the optimization operation for a given loss tensor
def optimize(loss_op, learning_rate=0.01):
    return tf.compat.v1.train.AdamOptimizer(learning_rate).minimize(loss_op)

Step 2: Generate synthetic data

# Generate synthetic data following y = x + 1
x_train = [float(i) for i in range(100)]
y_train = [x + 1.0 for x in x_train]

Step 3: Train the model

# Placeholders for the inputs and targets
x = tf.compat.v1.placeholder(tf.float32, shape=[None], name='x')
y = tf.compat.v1.placeholder(tf.float32, shape=[None], name='y')

# Initialize the variables
W = tf.compat.v1.Variable(0.0, name='W')
b = tf.compat.v1.Variable(0.0, name='b')

# Build the graph: predictions, loss, and the optimization operation
y_pred = model(x)
loss_op = loss(y_pred, y)
optimizer = optimize(loss_op, learning_rate=0.01)

# Initialize the global variables
init = tf.compat.v1.global_variables_initializer()

# Start a TensorFlow session (kept open so we can export the model in Step 4)
sess = tf.compat.v1.Session()

# Run the initialization operation
sess.run(init)

# Training loop
for step in range(100):
    # Select a random batch of data
    indices = np.random.choice(len(x_train), size=32)
    x_batch = [x_train[i] for i in indices]
    y_batch = [y_train[i] for i in indices]

    # Run the optimization operation
    sess.run(optimizer, feed_dict={x: x_batch, y: y_batch})
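
Since the training data follows y = x + 1, optimization should push the parameters toward W ≈ 1.0 and b ≈ 1.0. As an optional sanity check (an addition of mine, not part of the original tutorial), you can print the learned values before exporting:

# Inspect the learned parameters; with only 100 training steps they may
# not have fully converged, so increase the iterations if needed
print('W =', sess.run(W), 'b =', sess.run(b))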

Step 4: Save the trained model

# Save the model; TensorFlow Serving expects each model version in its
# own numeric subdirectory, so export to ./model/1
tf.compat.v1.saved_model.simple_save(sess, './model/1',
                                     inputs={'x': x}, outputs={'y_pred': y_pred})

# Close the session now that training and export are done
sess.close()
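
Before serving, it is worth checking the exported signature with the saved_model_cli tool that ships with TensorFlow (an optional verification step, not part of the original tutorial):

saved_model_cli show --dir ./model/1 --tag_set serve --signature_def serving_default

This should list the input tensor x and the output tensor y_pred along with their dtypes and shapes.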

Step 5: Serve the model using TensorFlow Serving

# Start the TensorFlow Serving server with the REST API enabled.
# Note: --model_base_path must be an absolute path, hence $(pwd).
tensorflow_model_server --rest_api_port=8501 \
  --model_name=sequential_number_predictor \
  --model_base_path="$(pwd)/model"

This will start the TensorFlow Serving server and load the latest model version from the model directory (version 1 in our case). The server listens for gRPC requests on port 8500 by default and, with the flag above, for REST requests on port 8501.
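
For reference, after Step 4 the model directory should look roughly like this (the exact file names under variables/ may differ):

model/
└── 1/
    ├── saved_model.pb
    └── variables/
        ├── variables.data-00000-of-00001
        └── variables.index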

To test the server, we can use curl to send a prediction request:

# Send a request to the server and get a prediction
curl -d '{"instances": [1.0, 2.0, 3.0]}' -X POST http://localhost:8501/v1/models/sequential_number_predictor:predict

Since the model has learned y ≈ x + 1, this should return predictions close to the next numbers in the series (the exact values depend on how well training converged):

{
  "predictions": [2.0, 3.0, 4.0]
}
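
You can also confirm that the model loaded correctly by querying TensorFlow Serving's model status endpoint:

curl http://localhost:8501/v1/models/sequential_number_predictor

A healthy server reports the model version with a state of AVAILABLE.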

Step 6: Consume the model from your application

import requests

# Send a request to the server and get a prediction
response = requests.post('http://localhost:8501/v1/models/sequential_number_predictor:predict', json={'instances': [1.0, 2.0, 3.0]})

# Print the predictions
print(response.json()['predictions'])

This should print the same predictions that we got earlier using the curl command.
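
In a real application you will want to check the HTTP status before parsing the body. Here is a minimal sketch (the predict_next helper is my own naming, not part of any TensorFlow Serving client API):

import requests

def predict_next(values,
                 url='http://localhost:8501/v1/models/sequential_number_predictor:predict'):
    # Send the prediction request and fail loudly on HTTP errors
    response = requests.post(url, json={'instances': values})
    response.raise_for_status()

    # TensorFlow Serving reports problems in an 'error' field
    body = response.json()
    if 'error' in body:
        raise RuntimeError(body['error'])
    return body['predictions']

print(predict_next([1.0, 2.0, 3.0]))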

That’s it! You have now trained and served a TensorFlow model using TensorFlow Serving. You can now use the model to make predictions from your application or from other systems.

Happy Coding!
