Serverless TensorFlow Model in AWS
Training and Optimizing a Binary Classification Model — pt 1

Adam Burek
9 min read · Oct 8, 2023


Written by: Adam Burek and Gustavo Jurado

While developing our machine learning workflow, we faced the challenge of how to host our trained model. Our workload required multiple predictions to be generated in under a minute each day, yet the out-of-the-box solutions only offered model hosting that runs 24/7.

Our primary objective is to provide a cost-efficient solution for hosting and deploying a TensorFlow model on AWS, while also offering flexibility for our burstable workload.

This article is part of a two-part series:

  • In this first article, we discuss the steps we took to train, optimize, and package our model to be uploaded to S3.
  • In the second article, we will use the SAM framework to deploy a Lambda function that retrieves our trained model. You can click this link to jump to the second article in this series.

Training a Local TensorFlow Model

The first step is to build and save a TensorFlow model locally, which we will later upload to AWS. For this example, we will take a popular dataset from the 1912 sinking of the RMS Titanic and build a basic survival-prediction model. The model will generate a prediction of survival for the ship’s 891 passengers based on the available feature set.

Loading and Preprocessing the Dataset

import pandas as pd

url = 'https://github.com/datasciencedojo/datasets/raw/master/titanic.csv'
titanic = pd.read_csv(url)

To get started, we load the data from a GitHub repository into a Jupyter notebook. We then clean the data to prepare it for model training.

titanic = titanic.drop(['Name', 'Ticket', 'Cabin', 'PassengerId'], axis=1)  # drop irrelevant fields
titanic['Age'].fillna(titanic['Age'].mean(), inplace=True)  # fill in missing ages with the mean
titanic['Sex'] = titanic['Sex'].map({'female': 1, 'male': 0})  # encode sex as binary
titanic = pd.get_dummies(titanic, columns=['Embarked'])  # one-hot encode 'Embarked'
titanic.dropna(inplace=True)  # drop remaining rows with null values

The goal of preprocessing is to make training our machine learning model smoother and more efficient. We accomplish this by:

  • Removing unnecessary columns such as ‘Name’, ‘Ticket’, ‘Cabin’, and ‘PassengerId’, which are not relevant factors for predicting a passenger’s survival and would introduce unwanted bias.
  • Filling missing values in the ‘Age’ column with the mean age of the passengers.
  • Converting the ‘Sex’ field into a binary format by mapping ‘female’ to 1 and ‘male’ to 0.
  • Transforming the ‘Embarked’ column into dummy variables using one-hot encoding. This creates three new binary columns, ‘Embarked_C’, ‘Embarked_Q’, and ‘Embarked_S’, indicating which of the three possible values (‘C’, ‘Q’, and ‘S’) the original field contained.
  • Lastly, dropping rows with any remaining missing or null values to avoid potential issues or biases in our analysis.
from sklearn.model_selection import train_test_split

X = titanic.drop('Survived', axis=1)
y = titanic['Survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

After preprocessing, we move on to splitting the dataset into training and testing sets. Partitioning the data allows us to evaluate the model’s performance and ability to generalize.

During the initial training phase, 80% of the data is randomly selected and allocated to the training set, which is used to fit the model and learn the underlying correlations in the dataset. Afterwards, the remaining 20% is used as the testing set to assess the model’s performance on unseen data. The goal here is to generate an unbiased prediction of the likelihood of survival, based on all the variables provided in the dataset.
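As a quick sanity check (the exact counts assume the preprocessing above ran as shown), we can confirm the split sizes and that nine feature columns remain; this is where the input shape of (9,) used in the next section comes from:

print(X_train.shape, X_test.shape)  # roughly an 80/20 split of the remaining rows
print(list(X.columns))  # the 9 feature columns fed to the model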

Model Training and Evaluation

We start by defining the structure of our model. We can think of this step like creating a blueprint or framework that will be used to make predictions. In this case we use a Sequential model, which is a linear stack of layers in which the output of each layer serves as the input to the following layer, providing a simple and intuitive architecture for the data to flow through.

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(9,)),  # hidden layer: 32 neurons over the 9 input features
    tf.keras.layers.Dense(1, activation='sigmoid')  # output layer: a single survival probability
])

So far the model architecture consists of two layers:

  • The first layer of the model is like the initial building block that receives input data and performs calculations to extract meaningful information. It consists of 32 individual neurons or processing units, each representing a specific function. These neurons take in the input data, which has a shape of (9,), meaning there are 9 features or variables associated with each data point. The ReLU (Rectified Linear Unit) activation function is applied to the outputs of the first layer, ensuring that only positive values are passed through, while any negative values are turned into zeros.
  • The second and final layer takes the information passed from the previous layer and makes a prediction. It uses the ‘sigmoid’ activation function, which produces a value between 0 and 1 representing the two possible outcomes. In this case, whether a passenger survived or not.
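Before compiling, ‘model.summary()’ offers a quick sanity check of this architecture. The parameter counts follow directly from the layer sizes: 9 × 32 weights + 32 biases = 320 for the hidden layer, and 32 weights + 1 bias = 33 for the output layer.

model.summary()  # two Dense layers, 320 + 33 = 353 trainable parameters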
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Once the model is defined, we need to compile it before training. This step involves configuring the model with specific settings that determine how it will be trained. These are the key components:

  • Optimizer: For this model, we use the ‘adam’ optimizer, which is a popular and effective optimization algorithm. The Adam optimizer combines the advantages of two other optimization algorithms: Adaptive Gradient Algorithm (AdaGrad) and Root Mean Square Propagation (RMSProp). It dynamically adjusts the learning rate during training based on the gradient’s magnitude, which helps the model converge faster and more efficiently.
  • Loss Function: The ‘binary_crossentropy’ loss function is commonly used for binary classification tasks. It quantifies the difference between the predicted output and the true target values in the training data, telling us how well the model is performing in terms of predicting the binary outcomes.
  • Metrics: We specify ‘accuracy’ as the metric we want to monitor during training. Accuracy measures the percentage of correct predictions made by the model.
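To make the loss function concrete, here is a small illustrative calculation on toy values (not from our dataset): a confident correct prediction incurs a small loss, while a confident wrong one is penalized heavily.

bce = tf.keras.losses.BinaryCrossentropy()
print(bce([1.0], [0.9]).numpy())  # ~0.105: confident and correct
print(bce([1.0], [0.1]).numpy())  # ~2.303: confident and wrong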
# Train the model
model.fit(X_train, y_train, epochs=100, validation_split=0.2)


# Make predictions on the test set
y_pred = model.predict(X_test)

Once the model is compiled, we pass the training data to the ‘fit()’ function, iterating for 100 epochs and withholding 20% of the training data to validate the model’s performance during training.

After training the model, we make predictions on the test set using the ‘predict()’ function. The results are numerical values between 0 and 1, representing each passenger’s probability of survival. However, in order to classify passengers into ‘survived’ or ‘did not survive’ categories, we need a systematic way to convert these probabilities into binary values. TensorFlow’s Sequential models offered a ‘predict_classes()’ method for exactly this (it has since been removed in recent releases), but it naively applies a threshold of 0.5.
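That naive cutoff is easy to reproduce by hand; a one-line equivalent (using the NumPy array returned by ‘predict()’) looks like this:

y_pred_naive = (y_pred.ravel() > 0.5).astype(int)  # naive 0.5 cutoff: 1 = survived, 0 = did not survive

To check whether 0.5 is actually a good cutoff for our dataset, let’s plot the survival probabilities.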

import seaborn as sns
import matplotlib.pyplot as plt

# distplot is deprecated in recent seaborn releases; histplot is the modern equivalent
sns.histplot(y_pred.ravel(), bins=30)
plt.xlabel("Prediction")
plt.ylabel("Count")
plt.title("Histogram of Model Predictions")
plt.show()
[Figure: Frequency Distribution of Model Predictions]

The histogram above shows the predicted survival probabilities of passengers in the test set. The frequency distribution is clearly skewed, which means it doesn’t make sense to simply choose a 50% probability as the cutoff point for survival. Therefore, we need a more rational approach to classify our survival predictions into binary categories.
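The skew mirrors the class balance of the dataset itself, which we can check directly (roughly 38% of the passengers in this dataset survived):

print(titanic['Survived'].mean())  # ~0.38: far fewer survivors than non-survivors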

Instead, we will employ an ROC (Receiver Operating Characteristic) curve analysis to choose our cutoff. This method offers significant benefits over the standard predict_classes approach, which is why it is widely used in medical research, insurance risk assessment, and the social sciences.

Determining the Optimal Threshold

When using machine learning models to predict outcomes, such as when predicting whether someone survived or not as in our case, the traditional approach is to use a threshold of 0.5. If the predicted probability is above 0.5, we classify the outcome as ‘Survived’, and if it is 0.5 or below, we classify it as ‘Did not survive’.

However, as we saw in the previous histogram, relying solely on this threshold does not always yield the most accurate results. As an alternative, the Receiver Operating Characteristic (ROC) curve evaluates the performance of our model at different probability thresholds, allowing us to identify the threshold that best balances the trade-off between the true positive rate (sensitivity) and the false positive rate (1 − specificity). This helps us select a threshold that maximizes the accuracy of our predictions and leads to more reliable survival classifications.
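As a reminder of the quantities involved, the snippet below computes both rates by hand from a confusion matrix (shown for concreteness with scikit-learn and the naive 0.5-cutoff predictions ‘y_pred_naive’ from earlier; this is illustrative, not part of the original pipeline):

from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, y_pred_naive).ravel()
tpr_manual = tp / (tp + fn)  # sensitivity: fraction of actual survivors correctly flagged
fpr_manual = fp / (fp + tn)  # 1 - specificity: fraction of non-survivors wrongly flagged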

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

fpr, tpr, thresholds = roc_curve(y_test, y_pred)

# Plot the ROC curve
plt.plot(fpr, tpr)
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.show()

# Calculate the AUC score
auc_score = roc_auc_score(y_test, y_pred)
print(f"AUC Score: {auc_score}")

# Find the best threshold based on the ROC curve
best_threshold = thresholds[np.argmax(tpr - fpr)]
print(f"Best threshold: {best_threshold}")
[Figure: ROC Curve]

AUC Score: 0.8812097812097812
Best threshold: 0.3392092287540436

To select the optimal threshold for classification, we generate the ROC curve for our model and find the point on the curve that maximizes the difference between the True Positive Rate (TPR) and the False Positive Rate (FPR), a quantity known as Youden’s J statistic. The threshold selected this way converts the predicted probabilities into binary predictions with greater accuracy than the naive 0.5 cutoff.

The first step is to calculate the False Positive Rate (FPR), True Positive Rate (TPR), and corresponding threshold values using the ‘roc_curve’ function. These values are computed by comparing the true labels y_test with the predicted probabilities y_pred.

We then plot the ROC curve using plt.plot to visualize the trade-off between the FPR (x-axis) and TPR (y-axis). This shows how the model’s performance varies across threshold values; the best-performing thresholds lie closest to the upper-left corner.

Next, we calculate the Area Under the ROC Curve (AUC) score using the ‘roc_auc_score’ function. The AUC score provides a single metric to evaluate the overall performance of the model across all possible threshold values. It quantifies the model’s ability to distinguish between positive and negative instances, with a higher AUC score indicating better performance.

Finally, we locate the optimal threshold by finding the index with the maximum TPR − FPR difference and retrieving the corresponding value from the thresholds array. Using this threshold, we convert the predicted probabilities into binary values by comparing them to the threshold and assigning 1 or 0 accordingly. This helps us account for skews like the one in the Titanic dataset, where fewer people survived.
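In code, that conversion is a single comparison, mirroring the ‘tf.where()’ logic we bake into the model itself below (the ‘accuracy_score’ check is an illustrative addition):

from sklearn.metrics import accuracy_score

y_pred_binary = (y_pred.ravel() >= best_threshold).astype(int)  # 1 = survived, 0 = did not survive
print(f"Accuracy at best threshold: {accuracy_score(y_test, y_pred_binary):.3f}")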

It’s good to keep in mind that the training set will determine your AUC score, and therefore your optimal threshold. Every time you draw a different training set you will get a slightly different optimal threshold. We are making the assumption that the training set we randomly select is a good representative sample of the whole dataset.
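A quick way to see this variability is to recompute the threshold over several random splits. The experiment below is illustrative only (it retrains the same small network on each split) and is not part of the original pipeline:

# Illustrative: the "optimal" threshold shifts with each random train/test split
for seed in (0, 1, 2):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    m = tf.keras.models.Sequential([
        tf.keras.layers.Dense(32, activation='relu', input_shape=(9,)),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    m.compile(optimizer='adam', loss='binary_crossentropy')
    m.fit(X_tr, y_tr, epochs=100, verbose=0)
    fpr_s, tpr_s, thr_s = roc_curve(y_te, m.predict(X_te, verbose=0).ravel())
    print(f"seed={seed}: best threshold = {thr_s[np.argmax(tpr_s - fpr_s)]:.3f}")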

Model Evaluation and Saving

Once we have our optimal threshold, we incorporate it into our model before saving it.

# Create a Lambda layer that converts model outputs to binary using the best threshold
binary_layer = tf.keras.layers.Lambda(lambda x: tf.where(x < best_threshold, 0, 1))(model.output)

# Create a new model that outputs the binary layer
binary_model = tf.keras.Model(inputs=model.input, outputs=binary_layer)

We use ‘tf.keras.layers.Lambda()’ to create a Lambda layer. The Lambda layer applies a function to the model’s output, allowing us to perform custom operations on the output tensor. In this case, the function checks each value in the tensor and converts it to either 0 or 1 based on our threshold.

Inside the lambda function, we use ‘tf.where()’ to perform the conversion. The ‘tf.where()’ function takes three arguments: a condition, a value to return when the condition is True, and a value to return when the condition is False. In this case, the condition is x < best_threshold, where x represents each probability value. If the condition is True, the value 0 is returned, indicating the prediction of not surviving. If the condition is False, the value 1 is returned, indicating the prediction of surviving.
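A tiny example makes the mapping concrete (illustrative values; with our threshold of roughly 0.34, a probability of 0.2 maps to 0 and a probability of 0.8 maps to 1):

print(tf.where(tf.constant([0.2, 0.8]) < best_threshold, 0, 1))  # tf.Tensor([0 1], shape=(2,), dtype=int32)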

Using the Lambda layer, we create a new binary_model using ‘tf.keras.Model()’. This new model takes the same inputs as the original (specified by inputs=model.input) but outputs the binary predictions obtained from the Lambda layer (specified by outputs=binary_layer).

The added layer effectively transforms the continuous probability outputs into binary predictions based on the provided threshold value (best_threshold). This allows us to obtain direct binary outputs from the new model, simplifying the interpretation and evaluation of the model’s predictions.
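A quick prediction on a few test rows confirms the new model emits hard 0/1 labels rather than probabilities (the exact labels depend on your trained weights):

print(binary_model.predict(X_test[:5]))  # e.g. [[0], [1], [0], ...] instead of values like 0.12 or 0.87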

# Evaluate the original model on the test set
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Loss: {loss}, Test Accuracy: {accuracy}")

# Save the binary model in the TensorFlow SavedModel format, along with the threshold value
binary_model.save('titanic_saved_model', save_format='tf')
with open('titanic_saved_model/threshold.txt', 'w') as f:
    f.write(str(best_threshold))

Finally, we save the trained binary model using TensorFlow’s ‘save()’ method for future use. Additionally, we store the optimal threshold value in a text file called ‘threshold.txt’ so the same cutoff can be applied consistently when using the model for predictions.
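As a preview of part two, the saved artifacts can be restored as sketched below (a minimal example, assuming the same TensorFlow version is available at load time):

# Reload the SavedModel and the stored threshold
reloaded = tf.keras.models.load_model('titanic_saved_model')
with open('titanic_saved_model/threshold.txt') as f:
    saved_threshold = float(f.read())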

By utilizing the ROC curve method, we achieved a more data-driven and context-specific threshold selection, surpassing the standard 0.5 threshold used in the TensorFlow package. The presented code demonstrates the implementation of this method on the Titanic dataset and provides a valuable resource for training, evaluating, and saving the model. With this foundation you can confidently proceed to deploy the model on AWS and apply it to real-world scenarios requiring accurate binary classification predictions.

Conclusion

In this article, we used the Titanic dataset to build a local TensorFlow model, optimized the classification output using an ROC curve analysis, added a Lambda layer to the model, and finally saved our local model to be later uploaded to AWS S3.

In the second article of this two-part series, we write a Lambda function to host our model in AWS. You can read the article by clicking on this link.

Code Repository: https://github.com/SouthernYoda/Serverless-TensorFlow-Model-in-AWS
