ML Design Pattern #1: Transform

Moving an ML model to production is much easier if you keep inputs, features, and transforms separate

Lak Lakshmanan
Sep 19, 2019 · 4 min read

An occasional series of design patterns for ML Engineers. Full list here.

I will illustrate the Transform design pattern using BigQuery ML because SQL makes the concept easy to understand without getting bogged down in syntax. Then, I’ll explain how to implement the pattern in TensorFlow 2.0 and Keras.

The problem: Inputs != Features

Consider a BigQuery ML linear regression model that predicts the duration of a bicycle rental:

CREATE OR REPLACE MODEL ch09eu.bicycle_model
OPTIONS(input_label_cols=['duration'],
        model_type='linear_reg')
AS
SELECT
  duration
  , start_station_name
  , CAST(EXTRACT(dayofweek from start_date) AS STRING)
      as dayofweek
  , CAST(EXTRACT(hour from start_date) AS STRING)
      as hourofday
FROM
  `bigquery-public-data.london_bicycles.cycle_hire`

This model has three features (start_station_name, dayofweek, and hourofday) computed from two inputs, start_station_name and start_date.

But the SQL code above mixes up the inputs and features and doesn’t keep track of the transformations that were carried out. This comes back to bite when we try to predict with this model. Because the model was trained on three features, this is what the prediction signature has to look like:

SELECT * FROM ML.PREDICT(MODEL ch09eu.bicycle_model,(
  SELECT
    'Kings Cross' AS start_station_name
    , '3' as dayofweek
    , '18' as hourofday
))

Note that, at inference time, we have to know the transformations that were applied, and remember to send in ‘3’ for dayofweek. It’s not just that we have to know what features the model was trained on. That ‘3’ … is that Tuesday or Wednesday? Depends on which library was used by the model! This is one of the key reasons why productionization of ML models is so hard.
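To make the ‘3’ ambiguity concrete, here is a small sketch in plain Python that numbers the same Tuesday under four common conventions. The helper bigquery_dayofweek is a hypothetical name of mine; it only mimics BigQuery's documented DAYOFWEEK numbering (Sunday = 1) and is not part of any library.

```python
from datetime import date


def bigquery_dayofweek(d: date) -> int:
    """Mimic BigQuery's EXTRACT(DAYOFWEEK ...): 1 = Sunday ... 7 = Saturday."""
    return d.isoweekday() % 7 + 1


d = date(2019, 9, 17)  # a Tuesday

conventions = {
    "bigquery_dayofweek": bigquery_dayofweek(d),  # Sunday = 1
    "date.weekday": d.weekday(),                  # Monday = 0
    "date.isoweekday": d.isoweekday(),            # Monday = 1
    "strftime_%w": int(d.strftime("%w")),         # Sunday = 0
}
print(conventions)
```

The same day comes out as 3, 1, 2, or 2 depending on the convention, so a client that computes dayofweek itself can silently send the wrong feature value.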

Transform to the rescue

CREATE OR REPLACE MODEL ch09eu.bicycle_model
OPTIONS(input_label_cols=['duration'],
        model_type='linear_reg')
TRANSFORM(
  SELECT * EXCEPT(start_date)
  , CAST(EXTRACT(dayofweek from start_date) AS STRING)
      as dayofweek
  , CAST(EXTRACT(hour from start_date) AS STRING)
      as hourofday
)
AS
SELECT
  duration, start_station_name, start_date
FROM
  `bigquery-public-data.london_bicycles.cycle_hire`

Notice how we have clearly separated the inputs (in the SELECT clause) from the features (in the TRANSFORM clause). Now, prediction is a whole lot easier. We can simply send the model a timestamp:

SELECT * FROM ML.PREDICT(MODEL ch09eu.bicycle_model,(
  SELECT
    'Kings Cross' AS start_station_name
    , CURRENT_TIMESTAMP() as start_date
))

BigQuery ML keeps track of the transformations for you, saves them in the model graph, and automatically applies them during prediction. Neat, eh?
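If it helps to see the idea without any SQL, here is a minimal plain-Python sketch of the same pattern: the model object owns its transform function and applies it in both fit and predict, so callers only ever pass raw inputs. Everything here is an assumption for illustration: the class ModelWithTransform, the transform function, the sample rows, and the "model" itself (a lookup table of mean durations standing in for linear regression). It is not how BigQuery ML is implemented.

```python
from datetime import datetime
from statistics import mean


def transform(row):
    """Turn raw inputs into features; the single source of truth for both phases."""
    ts = row["start_date"]
    return {
        "start_station_name": row["start_station_name"],
        "dayofweek": str(ts.isoweekday() % 7 + 1),  # Sunday = '1', as in BigQuery
        "hourofday": str(ts.hour),
    }


class ModelWithTransform:
    """Toy model: mean duration per feature tuple, with a global-mean fallback."""

    def __init__(self, transform_fn):
        self.transform_fn = transform_fn  # saved alongside the learned state
        self.table = {}
        self.default = 0.0

    def _key(self, row):
        return tuple(sorted(self.transform_fn(row).items()))

    def fit(self, rows, label="duration"):
        groups = {}
        for row in rows:
            groups.setdefault(self._key(row), []).append(row[label])
        self.table = {k: mean(v) for k, v in groups.items()}
        self.default = mean(r[label] for r in rows)
        return self

    def predict(self, row):
        # The caller sends raw inputs; the stored transform is applied here.
        return self.table.get(self._key(row), self.default)


# Made-up training rows, purely for illustration.
training_rows = [
    {"start_station_name": "Kings Cross", "start_date": datetime(2019, 9, 17, 18, 5), "duration": 600},
    {"start_station_name": "Kings Cross", "start_date": datetime(2019, 9, 24, 18, 40), "duration": 900},
    {"start_station_name": "Hyde Park", "start_date": datetime(2019, 9, 18, 9, 0), "duration": 1200},
]
model = ModelWithTransform(transform).fit(training_rows)

prediction = model.predict(
    {"start_station_name": "Kings Cross", "start_date": datetime(2019, 10, 1, 18, 15)}
)
print(prediction)
```

The prediction call never mentions dayofweek or hourofday, which is exactly the property the TRANSFORM clause gives you.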

Note: at the time of writing, “transform” is in alpha.

Transformations in Keras / TensorFlow 2.0

Let’s say that we want to take in four inputs (pickup latitude, pickup longitude, dropoff latitude, dropoff longitude) and create a transformed feature: the Euclidean distance between pickup and dropoff. We also want to scale the inputs explicitly (BigQuery ML scales inputs automatically, so this step was not needed there).

1. Make every input to the Keras model an Input Layer, and make every transformation a Lambda Layer. You will have four Input Layers:

import tensorflow as tf

# One scalar float32 Input layer per raw input column
inputs = {
    colname : tf.keras.layers.Input(name=colname, shape=(), dtype='float32')
    for colname in ['pickup_longitude', 'pickup_latitude', 'dropoff_longitude', 'dropoff_latitude']
}

2. Maintain a dictionary of transformed features, and scale these inputs using Lambda layers:

transformed = {}
for lon_col in ['pickup_longitude', 'dropoff_longitude']:
    transformed[lon_col] = tf.keras.layers.Lambda(
        lambda x: (x+78)/8.0,
        name='scale_{}'.format(lon_col)
    )(inputs[lon_col])
for lat_col in ['pickup_latitude', 'dropoff_latitude']:
    transformed[lat_col] = tf.keras.layers.Lambda(
        lambda x: (x-37)/8.0,
        name='scale_{}'.format(lat_col)
    )(inputs[lat_col])

You will also have one Lambda Layer for the Euclidean distance, which is computed from all four Input Layers:

def euclidean(params):
    lon1, lat1, lon2, lat2 = params
    londiff = lon2 - lon1
    latdiff = lat2 - lat1
    return tf.sqrt(londiff*londiff + latdiff*latdiff)

transformed['euclidean'] = tf.keras.layers.Lambda(euclidean, name='euclidean')([
    inputs['pickup_longitude'],
    inputs['pickup_latitude'],
    inputs['dropoff_longitude'],
    inputs['dropoff_latitude']
])

3. All five of these transformed layers will be concatenated into a DenseFeatures Layer:

dnn_inputs = tf.keras.layers.DenseFeatures(feature_columns.values())(transformed)

4. But wait! The constructor for DenseFeatures requires a set of feature columns — you will have to specify how to take each of the transformed values and convert them into an input to the neural network. You might use them as-is, or you might one-hot encode them or you might choose to bucketize the numbers. For simplicity, let’s just use them all as-is:

feature_columns = {
    colname: tf.feature_column.numeric_column(colname)
    for colname in ['pickup_longitude', 'pickup_latitude', 'dropoff_longitude', 'dropoff_latitude']
}
feature_columns['euclidean'] = tf.feature_column.numeric_column('euclidean')

5. Once you have a DenseFeatures, you can build the rest of your Keras model as usual:

h1 = tf.keras.layers.Dense(32, activation='relu', name='h1')(dnn_inputs)
h2 = tf.keras.layers.Dense(8, activation='relu', name='h2')(h1)
output = tf.keras.layers.Dense(1, name='fare')(h2)
model = tf.keras.models.Model(inputs, output)
model.compile(optimizer='adam', loss='mse', metrics=['mse'])
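Before training, it can be handy to sanity-check the transformation arithmetic outside TensorFlow. The sketch below mirrors the Lambda layers from step 2 in plain Python: four raw inputs go in, five features come out. The function name transform and the sample coordinates are made up for illustration; the constants (78, 37, 8.0) are the ones used in the Lambda layers above.

```python
import math


def transform(raw):
    """Plain-Python mirror of the Lambda layers: scale four inputs, add a distance."""
    out = {}
    for col in ("pickup_longitude", "dropoff_longitude"):
        out[col] = (raw[col] + 78) / 8.0
    for col in ("pickup_latitude", "dropoff_latitude"):
        out[col] = (raw[col] - 37) / 8.0
    # As in the Keras code, the distance uses the raw (unscaled) coordinates.
    londiff = raw["dropoff_longitude"] - raw["pickup_longitude"]
    latdiff = raw["dropoff_latitude"] - raw["pickup_latitude"]
    out["euclidean"] = math.sqrt(londiff * londiff + latdiff * latdiff)
    return out


# Hypothetical trip, chosen so the differences are 0.03 and 0.04 degrees.
raw = {
    "pickup_longitude": -74.0,
    "pickup_latitude": 40.75,
    "dropoff_longitude": -73.97,
    "dropoff_latitude": 40.79,
}
features = transform(raw)
print(features)
```

A client calling the Keras model supplies only the four raw columns, just like the raw dictionary here; the scaled values and the distance are produced inside the model graph.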

A complete example is here on GitHub.

Efficient transformations with tf.transform

Use tf.transform for an efficient way of carrying out transformations and saving them so that the transformations can be applied by tf-serving during prediction time. See this canonical TFX example. Obviously, that takes a lot more engineering.
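The core idea is a two-phase split: an analysis pass over the full training set computes statistics (say, the min and max of a column), and those statistics are saved so that the identical per-row transformation can be applied at serving time. The sketch below illustrates that split in plain Python. It is not the tf.transform API; the function names analyze and apply_transform, the JSON storage, and the sample rows are all assumptions of mine.

```python
import json


def analyze(rows, columns):
    """Full pass over the training data: record min and max of each column."""
    return {
        col: {"min": min(r[col] for r in rows), "max": max(r[col] for r in rows)}
        for col in columns
    }


def apply_transform(row, stats):
    """Per-row step: scale each column to [0, 1] using the saved statistics."""
    out = {}
    for col, s in stats.items():
        span = s["max"] - s["min"]
        out[col] = (row[col] - s["min"]) / span if span else 0.0
    return out


# Made-up training rows, purely for illustration.
training_rows = [
    {"pickup_longitude": -74.0, "pickup_latitude": 40.6},
    {"pickup_longitude": -73.5, "pickup_latitude": 40.8},
    {"pickup_longitude": -73.0, "pickup_latitude": 41.0},
]

# Training time: compute the statistics once and serialize them with the model.
saved = json.dumps(analyze(training_rows, ["pickup_longitude", "pickup_latitude"]))

# Serving time: reload the statistics and apply the identical transformation.
stats = json.loads(saved)
scaled = apply_transform({"pickup_longitude": -73.5, "pickup_latitude": 40.8}, stats)
print(scaled)
```

The hard-coded constants in the Lambda layers earlier play the role of these statistics; computing them from the data instead is what requires the extra pass and the extra engineering.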
