Tracking and Monitoring Transformers with MLFoundry
Efficient tracking and monitoring of Transformer models for Financial Sentiment Analysis using MLFoundry, by TrueFoundry
Natural language processing (NLP) has become increasingly popular in financial applications in recent years. Stock/forex market forecasting, volatility modeling, asset allocation, business taxonomy construction, credit scoring, initial public offering (IPO) valuation, and other applications are among them. One common task (or rather subtask, which can provide features for some tasks mentioned previously) in this financial domain is Financial Sentiment Analysis (FSA). The goal of FSA is to classify financial text as expressing bullish or bearish opinions on specific arguments.
In this new era of NLP, tasks like FSA have also been impacted by the dominance of transformers. Models like FinBERT perform significantly better compared to previous approaches.
In this tutorial, we will explore how to easily fine-tune any pretrained transformer for the FSA task using the Simple Transformers library. Then, we will experiment iteratively by changing parameters and base models and tracking the relevant information using MLFoundry. We will create a custom demo of our model using the MLFoundry Web App, which could be shared with others or showcased to users.
So, let’s dive in!
Tl;dr: If you wish to see only the working implementation of all that we cover in this article, please refer to this notebook. Although GitHub gists are used as code snippets throughout this article, if copied directly, they may not work as intended. Feel free to refer to the mentioned notebook in case you face such issues.
We first create a virtual environment for this project and install the necessary libraries:
Note: It is advisable to have some GPU access to train the transformer models because they are large and require considerable time for training otherwise. Using Google Colab can be a viable solution.
Next, we need to create an IPython notebook and import the required libraries. Optionally, we can also clear the cache in CUDA.
We do not need to set the device in PyTorch explicitly because the simpletransformers library automatically takes care of that and uses the GPU by default.
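As a rough illustration of the optional cache-clearing step, the sketch below wraps `torch.cuda.empty_cache()` in a small helper. The helper name `clear_cuda_cache` is hypothetical, and the guarded import is only an assumption made so that the snippet also loads on a machine without PyTorch.

```python
import gc

try:
    import torch  # optional here so the sketch still loads without PyTorch installed
except ImportError:
    torch = None


def clear_cuda_cache() -> bool:
    """Free cached GPU memory; return True only if a CUDA cache was cleared."""
    gc.collect()  # drop unreachable Python objects first so their tensors can be released
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
        return True
    return False


cleared = clear_cuda_cache()
print("CUDA cache cleared" if cleared else "No CUDA device found; nothing to clear")
```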
Sentiment Analysis for Financial News using Simple Transformers
Exploring & Processing the Dataset
For this tutorial, we use the processed FinancialPhraseBank data available from Kaggle. This dataset contains sentiments for financial news headlines as seen through the eyes of a retail investor. The all-data.csv file contains two columns, Sentiment and News Headline. The sentiment can be negative, neutral, or positive. Thus, the FSA task is treated as a multi-class classification problem.
Since the Simple Transformers library requires data to be in Pandas DataFrames with at least two columns, named text (of type str) and labels (of type int), we do the required processing before splitting the data into training & evaluation sets.
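The processing described above could look roughly like the following sketch. It assumes the two columns are literally named `Sentiment` and `News Headline`, uses a hypothetical label mapping (negative = 0, neutral = 1, positive = 2), and runs on a few made-up rows rather than the real all-data.csv file. The helper names and the 80/20 split are illustrative choices too.

```python
import pandas as pd

# Assumed mapping; any consistent assignment of integers to sentiments works.
LABEL_MAP = {"negative": 0, "neutral": 1, "positive": 2}


def prepare_frame(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a frame with a str column `text` and an int column `labels`."""
    df = (
        raw.rename(columns={"News Headline": "text", "Sentiment": "labels"})
        .assign(labels=lambda d: d["labels"].map(LABEL_MAP))
        .dropna(subset=["labels"])  # discard rows whose sentiment is not in LABEL_MAP
        .astype({"text": str, "labels": int})
    )
    return df[["text", "labels"]].reset_index(drop=True)


def split_frame(df: pd.DataFrame, eval_fraction: float = 0.2, seed: int = 42):
    """Randomly hold out `eval_fraction` of the rows for evaluation."""
    eval_df = df.sample(frac=eval_fraction, random_state=seed)
    train_df = df.drop(eval_df.index)
    return train_df.reset_index(drop=True), eval_df.reset_index(drop=True)


# Made-up rows for illustration only; these are not taken from the dataset.
raw = pd.DataFrame(
    {
        "Sentiment": ["positive", "negative", "neutral", "neutral", "positive"],
        "News Headline": [
            "Quarterly profit rose compared with last year",
            "The company issued a profit warning",
            "The annual general meeting will be held in March",
            "The firm has offices in three countries",
            "Order intake grew strongly in the second quarter",
        ],
    }
)

df = prepare_frame(raw)
train_df, eval_df = split_frame(df)
print(train_df.shape, eval_df.shape)
```

A stratified split (for example with scikit-learn's `train_test_split` and its `stratify` argument) may be preferable when the sentiment classes are imbalanced.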
Training a BERT Model for FSA
We define a simple training function that takes in the specifications of the model to create a ClassificationModel, along with the training hyperparameters to perform the training loop. We also evaluate the model using Micro-F1 and Accuracy scores by leveraging
As a wrapper around the Hugging Face library, Simple Transformers makes it extremely simple to do all this by abstracting away the heavy-lifting required. Thus, we can create, train & evaluate our transformer with just 3–4 statements.
In the constructor of ClassificationModel, the first parameter is the model_type, the second is the model_name, and the third is the number of labels in the data (set to 3 because we have three sentiments). The model_type must be one of the types that Simple Transformers currently supports. The model_name can be any of the models available on Hugging Face.
After training the model, we evaluate it using eval_model(). This function returns the following:
- A dict containing the performance metrics on the evaluation dataset (Matthews correlation coefficient and loss by default, along with micro-F1 and accuracy defined by us)
- A list of model outputs for each evaluation instance
- A list of inputs for which the model predicts incorrectly
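The create, train, and evaluate steps can be sketched as follows. This is a minimal illustration rather than the article's original snippet: the keys of `MODEL_PARAMS`, the contents of `TRAINING_ARGS` other than `num_train_epochs`, and the use of scikit-learn for the two extra metrics are all assumptions. The heavy imports sit inside the function so the definitions load even where the libraries are not installed.

```python
from functools import partial

# Assumed structure for the model specification; the key names are hypothetical.
MODEL_PARAMS = {"model_type": "bert", "model_name": "bert-base-uncased", "num_labels": 3}

# A small subset of Simple Transformers training arguments.
TRAINING_ARGS = {"num_train_epochs": 3, "overwrite_output_dir": True}


def trainModel(train_df, eval_df, model_params, training_args):
    """Fine-tune a classifier on train_df and report metrics on eval_df."""
    # Imported lazily: both packages are large and only needed when training runs.
    from simpletransformers.classification import ClassificationModel
    from sklearn.metrics import accuracy_score, f1_score

    # A GPU is used by default; pass use_cuda=False to the constructor for CPU-only runs.
    model = ClassificationModel(
        model_params["model_type"],
        model_params["model_name"],
        num_labels=model_params["num_labels"],
        args=training_args,
    )
    model.train_model(train_df)

    # Extra keyword arguments are metric functions called as func(labels, preds).
    result, model_outputs, wrong_predictions = model.eval_model(
        eval_df,
        f1=partial(f1_score, average="micro"),
        acc=accuracy_score,
    )
    return model, result
```

Calling `trainModel(train_df, eval_df, MODEL_PARAMS, TRAINING_ARGS)` would then download the pretrained weights and start fine-tuning, so it is best run on a machine with a GPU.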
The trained model checkpoints are stored in
There are many hyperparameters that you can change based on your requirement. The full list with their default values can be found below:
Before actually supplying the trainModel() function with the required parameters, we will introduce MLFoundry to set up the tracking required for our experimentation.
Introducing MLFoundry for Tracking and Monitoring
MLFoundry is the ML Monitoring & Experiment Tracking solution created by TrueFoundry, which allows users to track their experiments, models, metrics, data & features. Each experiment, with a unique combination of parameters, dataset, and metrics, is considered a run, and multiple such runs can be grouped logically as a project. Each run has a unique run_id, but can also be given a run_name for easier referencing. Later, these runs can be inspected and compared using an interactive dashboard.
Logging Experiment Details with MLFoundry
First, we import MLFoundry and initialize the API. Then, we create our project (called financial-sentiment-analysis) with our first run named bert_3epochs. This experiment will involve fine-tuning a BERT model for three epochs.
This creates an mlf/ folder in the project directory that will contain all the information across the various runs logged by MLFoundry.
Now, we modify our trainModel() function to accept a run as input and log all the required information (parameters, dataset, metrics, and dataset stats).
- To log the training & evaluation datasets, we use
- We log the model specifications (type and name), along with the hyperparameters as a dictionary using
- The dictionary containing the performance metrics on our evaluation set (accuracy and micro-f1) is logged using
- Various metrics related to our dataset, along with statistics like counters, summaries, histograms, and most frequent values are estimated using whylogs automatically when logged using
Now, we set the hyperparameters in the form of a dictionary along with the model specifications and call the trainModel() function for the run that we defined previously.
This loads weights from the pre-trained bert-base-uncased model and fine-tunes it based on the supplied hyperparameters for three epochs.
Navigating around the MLFoundry Dashboard
Once the training is complete, we can view all the logged information in the MLFoundry Dashboard by running the command mlfoundry ui from within the project folder (containing the mlf/ folder). This starts the dashboard on localhost:4200 by default (we can change the port by running
Under the Single Run View, select financial-sentiment-analysis as the Project Name and bert_3epochs as the Run Name. Now, we can inspect all the information about that specific run that has been tracked, using the different tabs:
- The Model Health section shows the performance metrics of the current model on the evaluation dataset. These include a confusion matrix (since this is a multiclass classification task) along with other relevant plots.
- The Data Health section contains various stats related to our dataset, which can be used to understand the data quality and compare it against other datasets if there is a data change later.
- The Feature Health section shows the numerical distribution of labels and predicted values based on input features. For our case, there is only one input feature named headline, containing the financial news headline.
- The Run Details section displays all the parameters and metrics logged for the run and also allows users to view the datasets and other artifacts related to the run that were tracked.
We see that after three epochs of training, our fine-tuned BERT model can achieve an accuracy of 0.82 and micro-F1 score of 0.82.
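The two scores coincide for a reason: in single-label multi-class classification, every wrong prediction counts as one false positive (for the predicted class) and one false negative (for the true class), so micro-averaged precision, recall, and F1 all reduce to accuracy. The plain-Python sketch below illustrates this on a few made-up labels. The function names are hypothetical and the numbers are not results from the experiments in this article.

```python
from collections import Counter


def accuracy(labels, preds):
    """Fraction of predictions that match the true label."""
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)


def micro_f1(labels, preds):
    """F1 computed from true/false positives and negatives pooled over all classes."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(labels, preds):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # the predicted class gains a false positive
            fn[true] += 1  # the true class gains a false negative
    total_tp = sum(tp.values())
    if total_tp == 0:
        return 0.0  # no correct predictions; avoid dividing by zero
    precision = total_tp / (total_tp + sum(fp.values()))
    recall = total_tp / (total_tp + sum(fn.values()))
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 0 = negative, 1 = neutral, 2 = positive (assumed encoding).
labels = [0, 1, 2, 2, 1, 0, 2, 1]
preds = [0, 1, 2, 1, 1, 0, 2, 2]

print(accuracy(labels, preds), micro_f1(labels, preds))
```

Macro-averaged F1, which averages the per-class F1 scores, would generally differ from accuracy and can be more informative when the classes are imbalanced.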
Efficient Tracking and Comparison of Multiple Experiments with MLFoundry
Now, we will see how easy it is to experiment with different models and parameters for a specific task while being able to track all the relevant information for reference by creating different runs within the same project.
To demonstrate experimentation by changing hyperparameters, we create a new run called bert_5epochs and pass it to the trainModel() function along with the training_args dictionary (this time, the value of num_train_epochs in this dictionary is set to 5 instead of 3) and model_params (no change).
To illustrate the use of different transformer architectures for experimentation, we create a new run called roberta. This time, we keep the training_args unchanged from the previous run (with 5 epochs) and change the model_params to use the weights of pre-trained RoBERTa instead of BERT.
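One way to keep the three configurations straight is to derive each from a shared base instead of editing dictionaries in place. The sketch below does that in plain Python. The key names inside `model_params`, the extra `overwrite_output_dir` argument, and the `roberta-base` checkpoint name are assumptions, since the article only states that pre-trained RoBERTa weights are used.

```python
# Base configuration for the first run (bert_3epochs).
base_training_args = {"num_train_epochs": 3, "overwrite_output_dir": True}
bert_params = {"model_type": "bert", "model_name": "bert-base-uncased", "num_labels": 3}

# Derived configurations: dict unpacking copies the base, so it is never mutated.
five_epoch_args = {**base_training_args, "num_train_epochs": 5}
roberta_params = {**bert_params, "model_type": "roberta", "model_name": "roberta-base"}

# Run name -> (model_params, training_args), mirroring the three runs in this article.
experiments = {
    "bert_3epochs": (bert_params, base_training_args),
    "bert_5epochs": (bert_params, five_epoch_args),
    "roberta": (roberta_params, five_epoch_args),
}

for run_name, (model_params, training_args) in experiments.items():
    print(run_name, model_params["model_name"], training_args["num_train_epochs"])
```

Each entry can then be passed to the training function together with its own run object.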
Once the training of these models completes, we again go to the MLFoundry Dashboard to inspect the tracked information by selecting the specific run_name under Single Run View. Now that we have more than one run, we can also compare them under Run Comparison.
The runs to be compared can be selected using either their run_id or their run_name, the latter being more convenient.
- When trained for more epochs, the BERT model showed better accuracy
- The RoBERTa model gives the best accuracy of 0.866
Scrolling down, we can also see the consolidated plot of all the performance metrics tracked for each run.
Model Demo using MLFoundry Web App
Having understood the process of iterative experimentation, tracking, and comparison of transformer models with MLFoundry, we now wish to demonstrate the performance of our best model (i.e. RoBERTa) to a broader audience. This is where the MLFoundry Web App comes in handy.
For this, we can create a standalone web app file that will be registered with the appropriate run using log_webapp_file(). Since we want to make the demo for our RoBERTa model, we name this file streamlit_roberta.py (it could be named anything else as well). In this file, we first need to write a function that can load a saved Simple Transformer model and predict the sentiment using the loaded model for an input financial news headline. Then, we need to initialize the MLFoundry client, create a run, and call webapp() from the run by supplying it with the prediction function, type(s) of inputs and outputs (here, we have just one input and one output, each of type text). This defines a model demo interface on the dashboard.
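The load-and-predict part of such a file might look like the sketch below. It covers only the Simple Transformers side and leaves out the MLFoundry registration calls. The checkpoint directory `outputs/`, the label mapping, and the function names are assumptions, and the small stand-in model exists only so the prediction logic can be tried without a trained checkpoint.

```python
# Assumed encoding; it must match the mapping used when the labels were created.
ID_TO_SENTIMENT = {0: "negative", 1: "neutral", 2: "positive"}


def load_model(model_dir: str = "outputs/", model_type: str = "roberta"):
    """Load a saved Simple Transformers classifier from model_dir (assumed path)."""
    # Imported lazily so the rest of the file can be loaded without the library.
    from simpletransformers.classification import ClassificationModel

    return ClassificationModel(model_type, model_dir)


def predict_sentiment(model, headline: str) -> str:
    """Return the sentiment name predicted for a single news headline."""
    # predict() takes a list of texts and returns (predictions, raw_outputs).
    predictions, _raw_outputs = model.predict([headline])
    return ID_TO_SENTIMENT[int(predictions[0])]


class StandInModel:
    """Hypothetical stand-in that always predicts class 2, for a dry run only."""

    def predict(self, texts):
        return [2] * len(texts), None


print(predict_sentiment(StandInModel(), "Quarterly profit rose compared with last year"))
```

With a real checkpoint, `predict_sentiment(load_model(), headline)` would replace the stand-in call. Loading the model once and reusing it across requests avoids reloading the weights for every prediction.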
Back in our notebook where we trained our models, we can register this file with mlf_run_3, which tracks all the information for our RoBERTa-based experiment.
In the dashboard, when we select the roberta run and open the Model Demo section, we see an interactive demo (similar to the Model Card in Hugging Face) that leverages our saved RoBERTa checkpoint to predict the sentiment of a financial news headline.
Since this web app is built using Streamlit, we can import streamlit into the streamlit_roberta.py file and use it to customize the model demo dashboard with other elements (for example, explainability).
In this tutorial, we initially saw how to easily train transformer models for an NLP task like Financial Sentiment Analysis using Simple Transformers. Later, we used MLFoundry to track the parameters, metrics, datasets, and stats for the different experiments that we performed by varying some hyperparameters and the model architecture. In the end, we also created a web app to demonstrate the working of our trained model to predict the sentiments of financial news headlines.
Feel free to check out the References section below for more details regarding the documentation of the libraries mentioned in this tutorial. Please leave any feedback or suggestions in the comments section below.
TrueFoundry is building one of the fastest frameworks for ML pipelines, which relies on open standards and saves 30–40% of a data science team’s time through its automated post-model pipeline. Feel free to sign up for early access to TrueFoundry’s ML monitoring and auto-scaling solution!
Tezan Sahu is on LinkedIn and GitHub.
References
- Financial Sentiment Analysis: An Investigation into Common Mistakes and Silver Bullets | aclanthology.org
- Simple Transformers
- Simple Transformers — Multi-Class Text Classification with BERT, RoBERTa, XLNet, XLM, and DistilBERT | by Thilina Rajapakse | The Startup | Medium
- MLFoundry | gitbook.io
- TrueFoundry’s ML Monitoring & Experiment Tracking Solution | by Nikunj Bajaj | Medium