Effortless model training and deployment using Falcon ML, ONNX and FastAPI

Oleh Kostromin
10 min read · Apr 20, 2023


Machine learning is spreading across industries, driving the need for efficient and straightforward ways to train and deploy ML models. This is especially true for tabular models, since tabular data is the most common format in business settings, where it is used to analyze customer behaviour, forecast sales, and optimize supply chains, among other applications. In these rapidly changing environments, fast prototyping is often a key to success. However, training, fine-tuning and deploying ML models remains time-consuming and complex. Tools that streamline model development are therefore extremely useful, as they allow data scientists to quickly produce effective solutions and adapt to evolving business needs.

Contrary to the misconception that data scientists mainly focus on model building, their core task should be understanding the problem, the process behind it, and collecting relevant data to solve it. While model building is not the primary activity, it still consumes a significant amount of time. Feature preprocessing, handling missing values, and treating imbalanced classes can be cumbersome when done manually. In addition, iterative hyperparameter optimization is lengthy, and a poorly tuned model performs worse. All of these tasks slow down development and lead to suboptimal results when not done carefully.

In addition to the challenges of model building, serialization and deployment can also be a major obstacle. One of the most popular ways to persist an ML model is Python’s pickle format. Although widely used, it comes with significant drawbacks: it is not secure, because loading a pickle can execute arbitrary code; it requires all the original libraries to be present; and it may break when library versions change. This fragility and lack of safety make it a poor fit for production environments.
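
To make the code injection risk concrete, here is a minimal sketch that uses only the standard library. The `Payload` class is hypothetical: it relies on `__reduce__` to tell pickle which callable to invoke when the bytes are loaded. The call here is a harmless `eval` of an arithmetic expression, but it could just as well be any function with any arguments.

```python
import pickle


class Payload:
    """Hypothetical object that controls what runs when it is unpickled."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and calls it on load.
        # A harmless expression stands in for arbitrary attacker code.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading the bytes does not rebuild a Payload; it executes eval("6 * 7").
result = pickle.loads(blob)
print(result)  # 42
```

This is why a pickle file should only ever be loaded when its origin is fully trusted.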

A solid alternative to pickle is the ONNX serialization format, which transforms a model into a computational graph of standardized nodes that can be shared and deployed across many platforms, regardless of the target operating system or programming language. ONNX promotes interoperability, which helps users avoid vendor lock-in and benefit from the strengths of different ML frameworks. However, converting certain parts of an ML pipeline into the ONNX format is not always straightforward.

To address both challenges, we developed Falcon, an open-source AutoML library designed to streamline the training and deployment process. With Falcon, developers can train ML models with a single line of code, making the process even more accessible for newcomers. Furthermore, each trained model (including the data processing steps) is exported to ONNX automatically without additional user input.

In this article we will explore the advantages of Falcon AutoML over a traditional manual training method using scikit-learn. We will demonstrate how to train a simple classification model for churn prediction, export it to ONNX format and deploy it as a REST-API microservice using FastAPI. Afterwards, we will compare the process with and without using Falcon to showcase the simplicity of the library.

Step 1: Training and saving the model

In this section we will use the Telco Customer Churn dataset to build a simple classifier. The dataset consists of 19 features, such as customers’ demographic information and the services they use. The target column “Churn” indicates whether the customer left within the last month.

Preparing the environment

As a first step we need to install all the required libraries. We will use scikit-learn, numpy and pandas for initial preprocessing and pipeline building, matplotlib for plotting, and imbalanced-learn for rebalancing the classes. Additionally, we will use XGBoost to build the model, as it is considered a state-of-the-art technique for supervised predictive modeling on tabular data.

pip install "scikit-learn>=1.2.0" pandas matplotlib imbalanced-learn xgboost

To convert our pipeline to ONNX, three additional libraries are required. Their exact purpose will be explained later on.

pip install skl2onnx onnxmltools onnx

When all the libraries are installed we can import the necessary submodules:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from imblearn.over_sampling import RandomOverSampler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder, LabelEncoder
from sklearn.pipeline import make_pipeline
from sklearn.metrics import balanced_accuracy_score
from xgboost import XGBClassifier

from skl2onnx import to_onnx, update_registered_converter
from onnxmltools.convert.xgboost.operator_converters.XGBoost import convert_xgboost
from skl2onnx.common.shape_calculator import calculate_linear_classifier_output_shapes
from skl2onnx.common.data_types import FloatTensorType, StringTensorType
from onnx import save_model

Data preparation

Before training the model, we need to load and prepare the dataset. One important step is to check for missing values. In our case, there are 11 missing entries in the column “TotalCharges”. To solve this problem, we will use the simplest technique and drop the affected rows from the dataset.

df = pd.read_csv('train_dataset.csv')
print(df.shape)
df.head()
def explore_dataset(df):

    features = []
    dtypes = []
    count = []
    unique = []
    nans = []

    for item in df.columns:
        features.append(item)
        dtypes.append(df[item].dtype)
        count.append(len(df[item]))
        unique.append(len(df[item].unique()))
        nans.append(df[item].isna().sum())

    output = pd.DataFrame({
        'Feature': features,
        'Dtype': dtypes,
        'Count': count,
        'Nr Unique': unique,
        'Nr NA': nans
    })

    return output

explore_dataset(df)
df.dropna(axis = 0, inplace = True)

Splitting the data

In order to estimate the performance of the model, we will allocate a fixed test set containing 25% of the original data. Generally, splitting into the train and test subsets must be done as early as possible, and before any additional preprocessing steps.

seed = 42
# features X
X = df.drop('Churn', axis = 1)
# target y
y = df['Churn']
# split the dataset into train and test subsets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = seed)
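
The reason for splitting first is to keep information from the test set out of any preprocessing step that is fitted to the data. The sketch below illustrates the idea with plain NumPy on made-up numbers (not the churn data): the scaling statistics are computed on the training part only and then reused for the test part.

```python
import numpy as np

# Made-up values for a single numerical feature (not taken from the dataset).
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0])

# Hold out the last 25% of the rows as a test set.
n_test = len(values) // 4
train, test = values[:-n_test], values[-n_test:]

# Fit the scaling statistics on the training rows only...
mean, std = train.mean(), train.std()

# ...and apply the same statistics to both subsets.
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(), train_scaled.std())
print(test_scaled)
```

Had the statistics been computed on all rows, the outlier in the test part would have shifted the mean and standard deviation used for training, which is a simple form of data leakage.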

Feature preprocessing

A common model-agnostic preprocessing approach is to scale numerical features to mean = 0 and std = 1, and encode categorical features into one-hot vectors. Even though XGBoost does not strictly require scaling of numerical features, and can natively handle categorical ones (with additional configuration), we will still do both scaling and encoding for the completeness of the example. To combine both preprocessors into a single operation, we can use a ColumnTransformer which assigns independent preprocessors to the specified columns. For the target column we will simply apply a LabelEncoder which replaces string categories with consecutive integers.

num_columns = [4, 17, 18]
cat_columns = [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
transformers = [("num", StandardScaler(), num_columns), ("cat", OneHotEncoder(handle_unknown='ignore', sparse_output=False), cat_columns)]
ct = ColumnTransformer(transformers)
# fit the transformers using X_train
ct.fit(X_train)
le = LabelEncoder()
# fit label encoder
le.fit(y_train)
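
To see what the categorical encoder computes, consider the following pure-Python sketch of one-hot encoding with the same "ignore unknown categories" behaviour that `handle_unknown='ignore'` requests. The helper names and sample values are illustrative and are not part of scikit-learn.

```python
def fit_categories(column):
    """Collect the sorted distinct categories seen during fitting."""
    return sorted(set(column))


def one_hot(column, categories):
    """Encode each value as a 0/1 vector; unseen values become all zeros."""
    return [[1.0 if value == cat else 0.0 for cat in categories] for value in column]


train_contract = ["Month-to-month", "One year", "Two year", "One year"]
categories = fit_categories(train_contract)

# "Weekly" never appeared in the training data, so its row has no active column.
encoded = one_hot(["Two year", "Weekly"], categories)
print(categories)
print(encoded)
```

Encoding unseen categories as all zeros keeps prediction from failing when production data contains a value that was absent at training time.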

Dataset rebalancing

As a final preprocessing step we need to make sure that the training set is balanced. Otherwise, the minority class is underrepresented and risks being ignored by the model. In our case, as the bar chart below shows, there are significantly more samples with Churn = No, so we need to compensate for that. Again, we will take the simplest approach and duplicate randomly chosen samples of the minority class until both classes are equally represented.

unique_labels, counts = np.unique(df['Churn'], return_counts=True)
plt.bar(unique_labels, counts)
plt.xlabel('Churn')
plt.ylabel('Number of samples')
plt.title('Churn distribution in the dataset')
plt.show()
X_train_resampled, y_train_resampled = RandomOverSampler(random_state = seed).fit_resample(X_train, y_train)
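
Conceptually, random oversampling draws additional minority-class rows with replacement until the class counts match. The NumPy sketch below shows the idea on toy labels; it is a simplified illustration, not the imbalanced-learn implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy labels with a 6:2 imbalance (illustrative, not the churn data).
y = np.array(["No", "No", "No", "No", "No", "No", "Yes", "Yes"])
X = np.arange(len(y)).reshape(-1, 1)  # one dummy feature: the row index

labels, counts = np.unique(y, return_counts=True)
target = counts.max()

keep = [np.arange(len(y))]  # start with every original row
for label, count in zip(labels, counts):
    if count < target:
        # Draw extra row indices of this class with replacement.
        pool = np.flatnonzero(y == label)
        keep.append(rng.choice(pool, size=target - count, replace=True))

idx = np.concatenate(keep)
X_res, y_res = X[idx], y[idx]

print(dict(zip(*np.unique(y_res, return_counts=True))))
```

Note that only the training split is resampled, so the test set keeps the original class distribution.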

Training

In order to determine the best hyperparameters, we will apply grid search, a technique that exhaustively evaluates every combination of the specified hyperparameter candidate values.
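
To get a feeling for the size of the search, the combinations can be enumerated by hand. The sketch below uses only the standard library and the same candidate values as the grid defined in this section.

```python
from itertools import product

# Candidate values, matching the grid used in this section.
grid = {
    "booster": ["gbtree", "dart"],
    "n_estimators": [50, 100, 150],
    "max_depth": [None, 5, 10],
    "random_state": [42],
}

# Cartesian product of all candidate values, one dict per combination.
candidates = [dict(zip(grid, combo)) for combo in product(*grid.values())]

print(len(candidates))  # 18
print(candidates[0])
```

With the default 5-fold cross-validation of `GridSearchCV`, each of the 18 candidates is fitted five times, so the search trains 90 models before refitting the best one.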

Afterwards, we can train the final model with the best found hyperparameters and evaluate the performance on the test set.

# specify the hyperparameter grid to search over
parameters = [{
    'booster': ['gbtree', 'dart'],
    'n_estimators': [50, 100, 150],
    'max_depth': [None, 5, 10],
    'random_state': [42]}]
clf = GridSearchCV(
    XGBClassifier(), parameters, scoring='balanced_accuracy', verbose = 3
)
# run the search on the preprocessed training data
clf.fit(ct.transform(X_train), le.transform(y_train))
# fit the final pipeline
pipeline = make_pipeline(ct, XGBClassifier(**clf.best_params_))
pipeline.fit(X_train_resampled, le.transform(y_train_resampled))
# evaluate
y_pred = pipeline.predict(X_test)
balanced_accuracy_score(y_test, le.inverse_transform(y_pred))
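
Balanced accuracy is used here because plain accuracy is misleading on imbalanced data. A minimal NumPy sketch of the metric (the average of per-class recall), evaluated on made-up labels, shows the difference:

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Average of per-class recall, so each class counts equally."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


# Made-up example: a model that always predicts the majority class.
y_true = ["No"] * 8 + ["Yes"] * 2
always_no = ["No"] * 10

plain_accuracy = float(np.mean(np.asarray(y_true) == np.asarray(always_no)))
print(plain_accuracy)                        # 0.8
print(balanced_accuracy(y_true, always_no))  # 0.5
```

A classifier that ignores the minority class entirely still reaches 80% plain accuracy in this example, while its balanced accuracy is no better than chance.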

Exporting to ONNX

In order to export the whole scikit-learn pipeline to ONNX we can use the skl2onnx tool. However, an XGBoost classifier is not a native scikit-learn model, so skl2onnx does not know how to convert it. To fix that, we need to provide a custom converter. Luckily, another package, `onnxmltools`, already contains the required XGBoost converter, which is also compatible with skl2onnx, so we can simply reuse it.

After the converter is registered, we need to prepare the specification of the model input. As our dataset contains a mix of numerical and categorical features, and ONNX tensors are strongly typed, we will not be able to have a single matrix/tensor as an input. Instead, each column will be passed and preprocessed independently. We specify StringTensorType for categorical features and FloatTensorType for numerical ones. Each input will have a shape [None, 1] corresponding to a single column and arbitrary batch size (number of samples).

The rest of the conversion process is straightforward. We call the `to_onnx` function and pass the pipeline and the inputs. Additionally, we provide an extra config option for the standard scaler, which is needed to compensate for the fact that the scikit-learn standard scaler uses float64 while the ONNX scaler uses float32, but we will not go into the details here.

update_registered_converter(
    XGBClassifier,
    "XGBoostXGBClassifier",
    calculate_linear_classifier_output_shapes,
    convert_xgboost,
    options={"nocl": [True, False], "zipmap": [True, False, "columns"]},
)

initial_types = []
for i, c in enumerate(df.columns[:-1]):
    if i in cat_columns:
        tensor_type = StringTensorType
    else:
        tensor_type = FloatTensorType
    initial_types.append((c, tensor_type([None, 1])))
onx = to_onnx(pipeline, initial_types = initial_types, options={StandardScaler: {"div": "div_cast"}})

save_model(onx, 'model.onnx')
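
As a brief illustration of why the float64/float32 mismatch mentioned above can matter, the NumPy sketch below casts an arbitrary value to float32 and shows that the result differs slightly from the float64 original. The threshold comparison is a hypothetical scenario, included only to suggest how such a small difference could change the branch taken in a decision tree.

```python
import numpy as np

# Illustrative value; the cast to float32 keeps fewer significant digits.
x64 = np.float64(0.1)
x32 = np.float32(x64)

difference = abs(float(x64) - float(x32))
print(difference)

# A hypothetical split threshold that lies between the two representations
# would send the same sample down different branches.
threshold = (float(x64) + float(x32)) / 2
print(float(x64) < threshold, float(x32) < threshold)
```

Such discrepancies are rare, but they explain why predictions of the converted model are worth comparing against the original pipeline on a held-out sample.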

Step 2: Deploying the model

In this section, we will create a simple web microservice that exposes a single /predict endpoint.

To begin with, we need to install FastAPI and Uvicorn. Additionally, we need onnxruntime to run the model.

pip install fastapi "uvicorn[standard]" onnxruntime

Once we have all of our dependencies installed, we can begin developing the microservice. First, we need to create a file called constants.py that will hold model-related metadata, such as the input names (corresponding to feature names), categorical indices, and the names of the classes.

# constants.py
MODEL_PATH = "model.onnx"

INPUT_NAMES = [
"gender",
"SeniorCitizen",
"Partner",
"Dependents",
"tenure",
"PhoneService",
"MultipleLines",
"InternetService",
"OnlineSecurity",
"OnlineBackup",
"DeviceProtection",
"TechSupport",
"StreamingTV",
"StreamingMovies",
"Contract",
"PaperlessBilling",
"PaymentMethod",
"MonthlyCharges",
"TotalCharges",
]

CAT_IND = [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]

# can be obtained by calling le.classes_ on the fitted LabelEncoder
CLASSES = ["No", "Yes"]

Next, we need to create a function that takes a list of lists as an input, where each inner list represents a single sample, and converts it into a format that can be processed by the ONNX runtime.

# convert.py
from typing import List, Dict
import numpy as np
from constants import CAT_IND, INPUT_NAMES

def convert_input(X: List[List]) -> Dict[str, np.ndarray]:
    X = np.array(X, dtype=object)
    onnx_input = {}
    for i, n in enumerate(INPUT_NAMES):
        if i in CAT_IND:
            onnx_input[n] = X[:, i].astype(str).reshape(-1, 1)
        else:
            onnx_input[n] = X[:, i].astype(np.float32).reshape(-1, 1)
    return onnx_input

Finally, we create a small FastAPI instance with a single endpoint that takes in the data and returns the predictions.

# main.py
from typing import List
import onnxruntime as ort
from constants import MODEL_PATH, CLASSES
from convert import convert_input
from fastapi import FastAPI

app = FastAPI()

sess = ort.InferenceSession(MODEL_PATH)

@app.post("/predict")
def predict(X: List[List]):
    pred = sess.run(["output_label"], convert_input(X))[0].tolist()
    y = [CLASSES[i] for i in pred]
    return {"y": y}

We can start the server by running the command:

uvicorn main:app --reload

If everything has been set up correctly, an auto-generated documentation page should be available at localhost:8000/docs. By expanding the description of the /predict endpoint and clicking on “Try it out”, we can verify that it works properly. As shown on the image below, we can enter the features of the data points in the first text field, and the server will respond with a list of predictions (displayed at the very bottom of the image).
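
The endpoint can also be called from code instead of the documentation page. The following sketch uses only the standard library and assumes the server is running locally on the default port; the sample values are illustrative, and the helper names `build_request` and `predict` are hypothetical.

```python
import json
import urllib.request

# Assumes the server from this section is running locally on the default port.
URL = "http://localhost:8000/predict"

# One illustrative sample; values follow the order of INPUT_NAMES.
sample = [
    "Female", 0, "Yes", "No", 1, "No", "No phone service", "DSL", "No", "Yes",
    "No", "No", "No", "No", "Month-to-month", "Yes", "Electronic check",
    29.85, 29.85,
]


def build_request(rows):
    """Wrap a list of samples into a JSON POST request for /predict."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def predict(rows):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(rows)) as response:
        return json.load(response)


request = build_request([sample])
print(request.get_method(), request.full_url)
# With the server running, predict([sample]) returns a dict with the key "y".
```

The request body is a JSON list of lists, one inner list per sample, which matches the `List[List]` parameter of the endpoint.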

Step 3: Repeating the steps (1) and (2) using Falcon

Now we will repeat the steps above, but using the Falcon ML library, which will significantly reduce the amount of code that we have to write.

First, let’s install Falcon itself. Since we used XGBoost as our model in the previous steps, we’ll use it again. For that we also need to install the falcon-ml-xgboost extension. Extensions are another nice feature of Falcon: the library includes only the absolutely necessary dependencies by default, but allows for easy extensibility. Once the extension is installed, Falcon will register it automatically without any additional actions required from the user.

pip install falcon-ml falcon-ml-xgboost

As we already mentioned, with Falcon it is possible to train a model in a single line of code.

from falcon import AutoML

AutoML(
    task = 'tabular_classification',
    train_data = 'train_dataset.csv',
    config = 'XGBOOST::OptunaLearner'
)

You may have noticed that we provide a configuration name called `XGBOOST::OptunaLearner`. The name before :: indicates the name of the extension (in our case, XGBoost), while OptunaLearner means that the Optuna framework will be used to optimize the hyperparameters.

At the end of the training procedure, Falcon will save the model into the current working directory, allowing us to immediately proceed with deployment.

In addition, Falcon provides a small wrapper around ONNX Runtime that takes care of parsing model inputs (and their types) and converting the data to the required format. As a result, we no longer need the constants.py and convert.py files.

# main.py
from typing import List
from falcon.runtime import ONNXRuntime
import numpy as np
from fastapi import FastAPI

app = FastAPI()

rt = ONNXRuntime("model_falcon.onnx")

@app.post("/predict")
def predict(X: List[List]):
    y = rt.run(np.asarray(X, dtype=object))[0].tolist()
    return {"y": y}

Conclusion

In conclusion, using Falcon ML, ONNX and FastAPI we were able to both train and deploy a model in under 15 lines of code. Such a simplification in the model building process enables data scientists to focus on understanding and solving core business problems at hand, rather than spending excessive amounts of time on model training and tuning.

You can learn more about Falcon ML by visiting the following links:
