Overview

In this article, I extend the Mergers and Acquisitions (M&A) predictive modeling published earlier by incorporating a Natural Language Processing (NLP) based news sentiment variable into it. The original model used only financial variables, utilized logistic regression for M&A target identification, and showed whether that produces an abnormal return for investors. The purpose of this article, instead, is to test whether news sentiment derived by NLP makes any significant contribution to M&A predictive modeling. To do that, the significance of news sentiment (by comparing evaluation metrics) is tested on different Machine Learning (ML) models, including logistic regression, random forest, and XGBoost. As for the NLP models, FinBert and BERT-RNA are tested to calculate the sentiment of the news preceding an M&A announcement.

The motivation behind using the news sentiment variable comes from a literature finding suggesting that target companies generate significant run-up returns during the month before the announcement of the deal. The problem here is that abnormal returns may happen not only because of potential future merger announcements but also because of other positive news impacting the share prices. Thus, the overall news sentiment should be evaluated and discussed in relation to the abnormal return. Our hypothesis here is that only an abnormal return amid a no- or low-positive news sentiment environment is an indication of an upcoming M&A announcement.

The article has the following structure. In Section 1, datasets for target and non-target companies are constructed. For the target dataset, the RDP Search function is used to get the list of target companies for the specified period. For the non-targets, the PEER screen function is used to request peer companies of the target list. Financial variables for both target and non-target companies are requested via the get_data function. The article utilizes the Refinitiv Data Platform (RDP) API to access the required data. In Section 2, news sentiment prior to the M&A announcements is calculated via NLP techniques. Finally, in Section 3, the performance of different ML models with and without the news sentiment variable, calculated by both the FinBert and BERT-RNA models, is evaluated.

Install and import packages

To start, the necessary packages are installed and imported. I use the Refinitiv Data Platform (RDP) API to get the data. To do that, we need to authorize ourselves with an app key. The code is built using Python 3.9. The other prerequisite packages are installed below.

We also need to import the following packages:
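A representative sketch of the imports is shown below. The data-access libraries (refinitiv.dataplatform and eikon) are assumptions based on the functions referenced later in the article, and the exact list in the original notebook may differ slightly.

```python
import numpy as np
import pandas as pd

# data access (assumed libraries, based on the functions used later in the article)
import refinitiv.dataplatform as rdp
import eikon as ek

# NLP and ML packages used in Sections 2 and 3
import torch
from transformers import BertForSequenceClassification, BertTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA
```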

In addition to the desktop session, I open the RDP platform session to be able to access news data on RDP.
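A minimal sketch of opening the two sessions, assuming the refinitiv.dataplatform function layer; the app key and RDP credentials below are placeholders.

```python
APP_KEY = 'YOUR_APP_KEY'

# desktop session for the desktop-based data requests
rdp.open_desktop_session(APP_KEY)

# platform session for accessing news data on RDP (credentials are placeholders)
rdp.open_platform_session(
    APP_KEY,
    rdp.GrantPassword(username='YOUR_RDP_LOGIN', password='YOUR_RDP_PASSWORD'))
```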

Section 1: Construct dataset for predictive modeling

In order to train and evaluate any classification model, a dataset with at least two classes is required. Thus, two separate datasets, for target and non-target companies, are constructed. First, I access M&A data using the Search function of the RDP API. Then, I get the list of target companies and request financial variables using the get_data function. Next, I use the PEER screen function to get peer companies and construct the non-target dataset for the models. Finally, these datasets are merged into a single one with appropriate labels in order to estimate the models' outputs.

1.1 Construct dataset for the target group of companies

The list of target companies is requested using DealsMergersAndAcquisitions Search view. The following criteria are used to filter the data and access the ones needed for the current model:

Form of the transaction is equal to Merger or Acquisition — acquisition of majority or partial interest is not included in the model.

Status of the transaction is equal to Completed, Pending or Withdrawn — I have included pending and withdrawn deals as well, as those are claimed to provide abnormal returns for investors.

Transaction Value is greater than USD 100 mln — this is set to exclude very small deals.

Target company is equal to public — we are interested in only public companies as we want to buy the stock of those companies classified as a target by the model.

Acquirer Company Name is not equal to Creditors or Shareholders — this filter is used to include only acquisitions by an actual company.

Transaction Announcement Date is less than 2021-11-15 and greater than 2020-09-15 — the upper limit is set to fix the number of companies; otherwise, every run of the code would add new entries and affect the reproducibility of the model. As for the lower limit, it is set to meet the current restriction of the RDP API, which allows getting news data for the last 15 months only.

Target Country is equal to US or UK — In the initial model I included only US companies; however, here, to increase the sample size, deals from the UK are also included. UK is the closest to the US in terms of M&A activity and market reactions to the deal announcements.

The code below requests M&A data using the filters specified above and orders the data by announcement date in descending order. More on how you can use Search, including guidance, examples, and tips to determine the possible approaches, from simple discovery through experimentation to more advanced techniques, is presented in this article.
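A hedged sketch of the request is shown below. Apart from TargetCountry, which is named in the text, the filter property names, the SearchViews member name, and the exact filter syntax are assumptions; use the property-discovery tool referenced later in this section to confirm them in your environment.

```python
# illustrative only: property names other than TargetCountry are assumptions
ma_deals = rdp.search(
    view=rdp.SearchViews.DealsMergersAndAcquisitions,
    filter="TargetCountry in ('US' 'UK') and "
           "TransactionAnnouncementDate ge 2020-09-15 and "
           "TransactionAnnouncementDate le 2021-11-15",
    order_by="TransactionAnnouncementDate desc",
    top=1000)

print(f'Number of M&A deals for the specified period is {len(ma_deals)}')
```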

Number of M&A deals for the specified period is 324

One very valid question that may arise from the code above, especially regarding the filter properties and values, is how to identify the exact names and possible values of the filter properties. For example, how do we know that the property name for the country where the target company is based is “TargetCountry”, and that the possible value for the United Kingdom is “UK” rather than simply “United Kingdom”? The problem is that, while Search provides a significant amount of content, power, and flexibility, navigating the hundreds of available properties when deciding how to extract data is challenging. In this article, Nick Zincone outlines a convenient tool that significantly simplifies the discovery of these properties when programmatically building Search queries.

I built the search query following the referenced article, and the resulting output from the code above is 324 M&A deals from the US and UK from September 15, 2020, to November 15, 2021. Further, I create lists of RICs and announcement dates, including one for the dates 30 days prior to the announcements. These lists are later used to get financial data for the target companies.

Below, I use the get_data function to request the specified financial variables for the 324 target companies. It should be noted that the initial fields come from my first article, where I outline the motivation behind choosing them. Later, I also run a correlation analysis to remove variables that may introduce multicollinearity. I use try/except statements to handle possible request errors (such as runtime, connection, or bad-request errors) and rerun the failed requests.
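The request loop could look roughly like the sketch below, assuming an eikon-style get_data call. The field codes shown are a hypothetical subset (the full list of variables is motivated in the first article), and target_rics and request_dates stand for the lists built above.

```python
# hypothetical subset of field codes; see the first article for the full list
fields = ['TR.Revenue', 'TR.GrossProfit', 'TR.EBITDA']

target_data = pd.DataFrame()
failed = []
for ric, date in zip(target_rics, request_dates):
    try:
        df, err = ek.get_data(ric, fields, parameters={'SDate': date})
        target_data = pd.concat([target_data, df], ignore_index=True)
    except Exception as exc:
        # collect failed requests (runtime, connection, bad request) and rerun them
        failed.append((ric, date))
        print(f'Request failed for {ric}: {exc}')
```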

Further, I drop some of the variables which are eliminated from the model. Again please see the previous article for more details.

Since data retrieval takes a relatively long time, I store the data in an Excel file once the code is fully executed and further read from there. The dataset is available in the GitHub folder.

After removing target companies with missing values, we end up with 209 companies in our target dataset.

1.2 Construct dataset for the non-target group of companies

The non-target sample is constructed from companies similar to the target ones. The best way to identify similar companies is to look at the peers, for which I use the Peer screen function. The peer group for each company, with the variables to be used in the prediction model, is requested using the function below. The function takes a RIC and date as an input and returns a dataframe containing peer companies along with the specified financial variables.
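A minimal sketch of such a function, assuming the Eikon Peers() instrument expression; the fields list is the same hypothetical subset used above.

```python
def get_peers(ric, date):
    """Return peer companies of `ric` with the specified financial variables as of `date`."""
    peers, err = ek.get_data(f'Peers("{ric}")', fields, parameters={'SDate': date})
    return peers
```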

Below, I store the RICs and dates in separate lists and call the function above for each RIC in the list. Then, I drop peers with missing values and merge the resulting dataframe with the main dataframe of peer companies. The code involves a try/except statement to catch API request errors and rerun the failed requests.

Since data retrieval takes a relatively long time, I store the data in an Excel file once the code is fully executed and further read from there. The dataset is available in the GitHub folder.

The resulting dataset consists of 7394 peer companies of 209 targets. However, there are many duplicates in this list, as many target companies have the same peers, and some of the peers were targets themselves. The data is further filtered and processed to eliminate the duplicates, which is presented next in this article.

1.3 Merge the two datasets, add remaining variables and labels

Here, I merge the target and non-target datasets, remove duplicates and calculate/add the rest of the variables which are not directly accessible through the API calls. First, the labels are added to the datasets and two datasets are merged into one.

Further, I remove the duplicate peers and those that are in the target list. Here, I extract the first part of each RIC (before the “.”) into a separate column and remove duplicates on that column. This allows treating companies whose RIC has changed due to a corporate event as a single entity.
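A minimal pandas sketch of this step; the column names (Instrument, Label) are assumptions about the merged dataframe.

```python
# extract the RIC root (the part before the first ".") so that companies whose
# RIC changed after a corporate event are still treated as one entity
merged['RIC_root'] = merged['Instrument'].str.split('.').str[0]

# sort so targets (Label == 1) come first, then drop duplicates on the RIC root;
# this removes duplicate peers as well as peers that are themselves targets
merged = (merged.sort_values('Label', ascending=False)
                .drop_duplicates(subset='RIC_root', keep='first'))
```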

After the dataset with no duplicates is ready, I add the rest of the variables to be used in the ML models that are not directly accessible via the API.

The last variable which needs to be calculated separately is the abnormal return, for which I use the function from the previous article.
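The exact function is defined in the previous article; a simplified sketch of the idea is shown below, assuming the eikon get_timeseries call and .SPX as the benchmark index (both assumptions). The cumulative abnormal return is the sum of the daily differences between the stock return and the benchmark return over the pre-announcement window.

```python
import pandas as pd

def cumulative_abnormal_return(ric, ann_date, benchmark='.SPX', window=30):
    """Simplified sketch: CAR = sum of (stock daily return - benchmark daily return)
    over the `window` days preceding the announcement date."""
    start = (pd.to_datetime(ann_date) - pd.Timedelta(days=window)).strftime('%Y-%m-%d')
    stock = ek.get_timeseries(ric, fields='CLOSE', start_date=start, end_date=ann_date)
    index = ek.get_timeseries(benchmark, fields='CLOSE', start_date=start, end_date=ann_date)
    abnormal = stock['CLOSE'].pct_change() - index['CLOSE'].pct_change()
    return abnormal.sum()
```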

After the function is defined, I store the RICs and dates in separate lists to call them in a loop.

Then I run the function above for each company and append the cumulative abnormal return to a list, which is further inserted into the main dataframe.

As before, since data retrieval takes a relatively long time, I store the data in an Excel file once the code is fully executed and further read from there. The dataset is available in the GitHub folder.

Number of target companies in the dataset: 182
Number of non-target companies in the dataset: 3648

The dataset consists of 182 target and 3648 non-target companies, totaling 3830 companies. Although it is important to test the model on an imbalanced dataset, considering that non-target companies are much more common in the real world than target ones, it is not necessary to keep a ratio as large as 20:1. The challenge with so many companies is that we would end up with over a hundred thousand textual instances to run NLP on. This would obviously require a lot of time and computational power, which I believe is not necessary for the purpose of this article. Thus, I take up to the seven closest peers per target company, which still ensures a distribution similar to the real world. One can easily select more peers, or even all of them, and still run all of the processes coming next in this article.

Finally, I remove the rest of the variables which were used to calculate the variables to be included in the ML models. The final dataset structure is reported next.

Number of target companies in the dataset: 182
Number of non-target companies in the dataset: 1157

Section 2: Evaluate news sentiment prior to the M&A

This section walks through the retrieval of news headlines during the 30 days preceding the M&A announcement and utilizes the FinBert and BERT-RNA models to evaluate the sentiment of each headline. While FinBert comes with a pre-trained classification engine, so the headlines can be passed to it directly, BERT-RNA returns embeddings only. To get sentiment classifications from the latter, a classification model first needs to be trained on the embeddings; only then can the headlines be passed through it.

2.1 Get news headlines

Before running sentiment analysis, news headlines for both target and non-target companies are requested using the get_news_headlines function of the RDP API. Here, instead of the actual news stories, I use only news headlines considering their high number, which, as we will see further in this article, is around 122,000 instances. The challenge is that to get the individual stories, we would have to make 122,000 API calls. This alone would take more than 12 days given the daily API call limit of 10,000 requests. Thus, I have decided to avoid that API call overhead and focus on the headline text. I simply iterate over the news headlines and pass the headline text to the sentiment engines, which return the sentiment score.

Since we don’t want the news related to the actual M&A announcement to appear in our sentiment classification dataset, I specify the news request period from 30 days before to 5 days before the announcement.

Then, the code below loops over all instruments and requests news headlines for the specified period by storing the data into a separate dataframe.
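A hedged sketch of the retrieval loop, assuming the refinitiv.dataplatform get_news_headlines function and the lists of RICs and announcement dates built earlier; the query syntax and count limit are assumptions.

```python
import pandas as pd

news_df = pd.DataFrame()
for ric, ann_date in zip(all_rics, announcement_dates):
    date_from = pd.to_datetime(ann_date) - pd.Timedelta(days=30)
    date_to = pd.to_datetime(ann_date) - pd.Timedelta(days=5)
    try:
        headlines = rdp.get_news_headlines(
            query=f'R:{ric} and Language:LEN',
            count=100,
            date_from=date_from.strftime('%Y-%m-%dT00:00:00'),
            date_to=date_to.strftime('%Y-%m-%dT00:00:00'))
        if headlines is not None and not headlines.empty:
            headlines['RIC'] = ric
            news_df = pd.concat([news_df, headlines])
    except Exception as exc:
        print(f'News request failed for {ric}: {exc}')
```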

As before, since data retrieval takes a relatively long time, I store the data in an Excel file once the code is fully executed and further read from there. The dataset is available in the GitHub folder.

After getting the news headlines and removing duplicated values, we end up with 1041 instances in our dataset.

2.2 Load pretrained FinBert sentiment classification model

About the key terminology and processes behind the NLP

Before loading the FinBert model, it is worth giving a basic understanding of the key terminology and processes behind the NLP:

Tokenization — Tokenization is the first step of NLP, in which a text is split into words or subwords, which are then converted to IDs through a look-up table. Although this seems pretty straightforward, there are multiple ways of splitting sentences into words or subwords, and each way has its own advantages and disadvantages. Hugging Face provides a great introductory guide on tokenization, which can be found here.

Word Embedding — Word embeddings are vector representations of words, where words or phrases from the vocabulary are mapped to vectors of real numbers. The vector encodes the meaning of the word such that words that are closer in the vector space are expected to be similar in meaning. As for the technical creation of the embeddings, they are produced using a neural network with an input layer, a hidden layer, and an output layer. An illustrative and explanatory example is provided in this blog post.

Transformers — The Hugging Face transformers package is a Python library that provides numerous pre-trained models used for a variety of NLP tasks. One such pre-trained model is FinBert, which is introduced in greater detail in this section.

This article, which walks through the NLP text sentiment classification processes with illustrative examples, is a great source to learn more and have a hands-on experience with tokenization, word embeddings, and transformers.

About FinBert model

As for the FinBert model itself, it is a pre-trained NLP model for analyzing the sentiment of financial text. It is built by further training the BERT language model on the finance domain, using the Reuters TRC2 financial corpus, and then fine-tuning it for financial sentiment classification. After the model is adapted to the domain-specific language, it is trained with labeled data for the sentiment classification task.

The Financial PhraseBank dataset by Malo et al. (2014) has been used to train the classification task. The dataset, consisting of 4845 instances, was carefully labeled by 16 experts and master's students with finance backgrounds, who, along with the labels, reported inter-annotator agreement levels for each sentence.

According to the FinBert GitHub account, in order to use the pre-trained FinBert model, one should:

  • Create a directory for the model.
  • Download the model (pytorch_model.bin) and put it into the created directory.
  • Put a copy of config.json in that same directory.
  • Call the model with .from_pretrained(model directory name)

I have already created a folder and stored the required files in a directory called finbert. To load the model, we just need to run the code below. Additionally, I load the BERT tokenizer after loading the model.
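A minimal sketch of the loading step, assuming the downloaded files sit in a local directory named finbert and that the standard bert-base-uncased tokenizer is used (as in the FinBert repository).

```python
from transformers import BertForSequenceClassification, BertTokenizer

# the "finbert" directory contains pytorch_model.bin and config.json
finbert = BertForSequenceClassification.from_pretrained('finbert', num_labels=3)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
```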

2.3 Train sentiment classification model on Labs BERT-RNA

BERT-RNA is a financial language model created by LSEG Labs. BERT-RNA extends BERT-BASE and creates a finance-domain-specific model leveraging LSEG’s depth and breadth of unstructured financial data. The model is pre-trained using Reuters News Archive, which consists of all Reuters articles published between 1996 and 2019.

LSEG labs BERT-RNA model returns a vector of word embeddings, the process of which is illustrated in the image below:

[Image: the BERT-RNA word-embedding workflow]

Unlike FinBert, BERT-RNA doesn't have a sentiment classification engine; it returns a vector of word embeddings, on top of which a classifier needs to be trained on a labeled dataset to perform the classification task. In order to train the classification engine, we need a labeled dataset with sentiment scores. For that purpose, I use the same Financial PhraseBank dataset that was used to train FinBert.

To do that, I downloaded the dataset from here and stored it in a local directory. Below, I read the dataset, replacing the sentiment words with labels from 0 to 2.
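A minimal sketch of this step, assuming the Sentences_50Agree.txt file from the Financial PhraseBank archive (sentences and sentiment words separated by “@”) and a 0/1/2 mapping for positive/negative/neutral; the file name and label order are assumptions.

```python
import pandas as pd

phrasebank = pd.read_csv('Sentences_50Agree.txt', sep='@',
                         names=['text', 'label'], encoding='latin-1')

# assumed label mapping: 0 = positive, 1 = negative, 2 = neutral
phrasebank['label'] = phrasebank['label'].map({'positive': 0,
                                               'negative': 1,
                                               'neutral': 2})
```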

The next step is to format our training data into the CSV structure that BERT RNA accepts. BERT RNA expects a CSV structure with a single column for the text. The header for this column can be named anything.

Then I upload the CSV file into the Labs environment by creating a new Job.

[Screenshots: creating a new BERT-RNA job and uploading the CSV in the Labs environment]

After the job is fully executed, an OUT file is created, which can be downloaded from the Labs environment.

[Screenshot: downloading the OUT file from the Labs environment]

Finally, we can read the OUT file back into our environment using the pandas read_json function.

The output is a vector for each input text (sentence), and each vector consists of 768 numbers, corresponding to the number of hidden units in the NLP model.

BERT-RNA is trained with a maximum sentence length of 512 tokens. The BERT-RNA documentation advises that if the average length of the text inputs is much shorter than 512, the feature space can be very sparse; applying a dimensionality reduction technique to the embedding output is therefore recommended.

I will use the Principal component analysis (PCA) technique for dimensionality reduction, but before that, I check the average size of the text inputs.

Average length of our text input is 127.0

Now let’s apply the PCA dimensionality reduction with principal components of 127.

original shape:    (4846, 768)
transformed shape: (4846, 127)

Finally, I split PCA output into train and test datasets and run logistic regression to train the sentiment classification model.
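A minimal sketch of the split and training step; the split proportion and solver settings are assumptions (the reported test set of 970 instances corresponds to roughly a 20% split).

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

X_train, X_test, y_train, y_test = train_test_split(
    embeddings_pca, phrasebank['label'], test_size=0.2, random_state=42)

labs_classifier = LogisticRegression(max_iter=1000)
labs_classifier.fit(X_train, y_train)

print(classification_report(y_test, labs_classifier.predict(X_test)))
```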

              precision    recall  f1-score   support

           0       0.74      0.66      0.70       267
           1       0.70      0.64      0.67       128
           2       0.83      0.89      0.86       575

    accuracy                           0.79       970
   macro avg       0.76      0.73      0.74       970
weighted avg       0.79      0.79      0.79       970

The classification results suggest an overall accuracy of 0.79; moreover, the model produced the highest F1 score on the neutral class (0.86) and the lowest on the negative class (0.67). The varying per-class scores may also be explained by the number of instances in each class, which is the highest for the neutral class (575 instances) and the lowest for the negative class (128 instances).

Overall, the results are satisfactory to proceed with classifying the news sentiment of the target and non-target companies.

2.4 Evaluate news sentiments based on FinBert and BERT-RNA models

Now, as we have both of our NLP models ready for the news sentiment classification task, let’s proceed with it.

Get and Label news headline embeddings from BERT-RNA

First, let’s label the news using the BERT-RNA model. For that, I split the news headlines file into 4 different CSVs to make sure the embeddings job is completed properly and download/store the OUT files in a local directory. The code below reads and merges the news headlines into a single dataframe.

Next, I apply the PCA dimensionality reduction to the headline embeddings and label them using the logistic regression model trained above.

Label headlines via FinBert model and evaluate pre M&A overall sentiment

Now let’s label news headlines via the FinBert model and evaluate pre-M&A overall sentiment by both models. To calculate overall sentiment, I first initiate SentOverallFBert and SentOverallLabs variables starting from 0, and each time a news headline is labeled positive for a particular company, I increase the corresponding variable value by 1 and decrease by 1 if the headline is labeled as negative. Neutral labels don’t impact overall sentiment.

Before starting the actual labeling and overall sentiment calculation processes, I first group the news headlines by RICs and then loop over the headlines belonging to the RIC. The code below does the grouping.

Now I loop over each headline of the RIC, label it and calculate overall sentiment for each company based on the two NLP models.
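A hedged sketch of the labeling loop is shown below. The FinBert label order (0 = positive, 1 = negative, 2 = neutral) and the dataframe column names (headline, LabsLabel) are assumptions; LabsLabel is assumed to hold the BERT-RNA based label produced in the previous step.

```python
import pandas as pd
import torch

def finbert_label(text):
    """Classify one headline with the loaded FinBert model (label order assumed)."""
    inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=64)
    with torch.no_grad():
        logits = finbert(**inputs).logits
    return int(torch.argmax(logits, dim=1))

overall = []
for ric, group in news_df.groupby('RIC'):
    sent_fbert, sent_labs = 0, 0
    for _, row in group.iterrows():
        fb = finbert_label(row['headline'])
        labs = row['LabsLabel']
        sent_fbert += 1 if fb == 0 else (-1 if fb == 1 else 0)      # 0 = positive, 1 = negative (assumed)
        sent_labs += 1 if labs == 0 else (-1 if labs == 1 else 0)
    overall.append({'RIC': ric,
                    'SentOverallFBert': sent_fbert,
                    'SentOverallLabs': sent_labs})

sentiment_df = pd.DataFrame(overall)
```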

Here again, I store the data in an Excel file once the code is fully executed and further read from there. The dataset is available in the GitHub folder.

Below we plot the overall sentiments calculated by both models to explore the outputs visually and compare the sentiments between the two models.

According to the graph, most of the overall sentiment scores are close to neutral; moreover, most of the strongly negative outputs come from BERT-RNA, while FinBert produces more of the strongly positive classifications.

Section 3: Evaluation of M&A predictive modeling

This section evaluates the M&A predictive modeling. Before evaluating the ML models, it should be noted that the sample size, especially for the target companies, is too small for robust predictive results. Thus, the primary purpose of this article is to showcase a workflow of M&A predictive modeling using Refinitiv data/APIs and to discover whether news sentiment has any significant explanatory impact on the predictive power of the model. Anyone who wants to build a robust M&A predictive model can use this workflow and these variables to train the models on much larger datasets.

Logistic regression, random forest, and XGBoost ML techniques are used for the predictive modeling. The reason for using multiple ML techniques is to evaluate the explanatory power of the news sentiment variables from multiple perspectives and make a robust conclusion regarding the importance of that variable for M&A predictive modeling. In particular, logistic regression allows looking at the p-values and coefficients of the variables, while random forest and XGBoost provide feature importances.

As these models are not used for actual prediction but rather for showing the impact of the sentiment variables on the evaluation metrics, I employ repeated stratified cross-validation with 10 splits and 5 repeats instead of a single train/test split. Considering the imbalanced nature of the dataset, the ROC_AUC score is used as the main accuracy metric.
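A minimal sketch of the evaluation protocol used throughout this section; model, X, and y stand for whichever estimator and dataset are being evaluated.

```python
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=1)
scores = cross_val_score(model, X, y, scoring='roc_auc', cv=cv, n_jobs=-1)
print('AUC:%.2f' % scores.mean())
```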

3.1 Preparing dataset for predictive modeling

Before training and evaluating the models mentioned above, let’s first construct our final dataset of independent and dependent variables. For that, I join NLP-based sentiment variables to our initial dataset consisting of the financial variables.

Then I run correlation analysis to eliminate the highly correlated variables and avoid multicollinearity. The code below unstacks and sorts the outputs showing the highly correlated variables.
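A minimal sketch of this step, assuming X is the dataframe of candidate independent variables.

```python
import numpy as np

corr = X.corr().abs()

# keep each pair once (upper triangle), unstack, and sort by correlation strength
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
print(upper.unstack().dropna().sort_values(ascending=False).head(10))
```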

It can be observed from the correlation output that Abnormal return (AR) is correlated with Price To Sales Per Share and EV to EBITDA. At the same time, the latter two variables are highly correlated with each other as well. Thus, I removed both of them and kept AR for the final model. Another correlated pair is Free Cash Flow to Sales and Operating Margin. Among those, I eliminated Operating Margin since there is already another variable, Gross Profit Margin, describing the management efficiency component of the company.

I drop the variables Operating Margin, EV to EBITDA, and Price to Sales per Share in the cell below. Additionally, I create three different sets of independent variables. The first dataset (X_NoSent) consists of only financial variables, the second dataset (X_BertRna) includes the overall sentiment variable derived by the BERT-RNA model, and the third one (X_Finbert) includes the overall sentiment based on the FinBert model. Throughout the analysis, evaluation metrics based on all three datasets are reported to showcase the effect of the news sentiment variable, including in relation to AR.

3.2 Evaluation of Logistic regression model outputs

First, I train and evaluate the outputs from the logistic regression with the 'liblinear' solver and an 'l2' penalty. This will allow us to look at the p-values and coefficients of the independent variables, helping us draw conclusions regarding the impact of the sentiment variables. The ROC_AUC score is used as the main accuracy metric.

Model with no sentiment variable
AUC:0.5
With Sentiment from BERT-RNA
AUC:0.54
With Sentiment from FinBert
AUC:0.57

From the reported ROC_AUC scores, we can see that the logistic regression models with an NLP-based sentiment variable outperform the no-sentiment model. Moreover, the model with sentiment derived by FinBert achieves the highest ROC_AUC of 0.57. Further, I calculate and report the coefficients and p-values of the variables from the FinBert sentiment-based model. This will show the significance and the direction of the impact of each variable. For that, I first normalize the data by subtracting the mean from the actual values and dividing the result by the standard deviation, which makes the coefficients comparable and easier to interpret.

Before showing the resulting dataframe of coefficients, I add the p-values to it for a more comprehensive outlook. Since sklearn doesn't provide p-values out of the box, I calculate them myself by adapting an example from a Stack Overflow thread.
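A sketch of that calculation (a Wald test using the estimated covariance of the coefficients), adapted from the approach commonly shared on Stack Overflow; it assumes a fitted binary sklearn LogisticRegression and the normalized feature matrix.

```python
import numpy as np
from scipy import stats

def logit_pvalues(model, X):
    """Wald-test p-values for a fitted binary sklearn LogisticRegression."""
    p = model.predict_proba(X)[:, 1]
    X_design = np.hstack([np.ones((X.shape[0], 1)), X])      # add intercept column
    V = np.diagflat(p * (1 - p))                              # weight matrix
    cov = np.linalg.inv(X_design.T @ V @ X_design)            # coefficient covariance
    se = np.sqrt(np.diag(cov))
    coefs = np.concatenate([model.intercept_, model.coef_.ravel()])
    z = coefs / se
    return 2 * (1 - stats.norm.cdf(np.abs(z)))                # two-sided p-values
```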

From the results above, we can observe a statistically significant impact of the Gross Profit Margin, Price To Book Value Per Share, Net Debt per Share, AR, Sales Growth, and SentOverallFBert variables. Moreover, the coefficient of SentOverallFBert is one of the largest in magnitude, at -0.46, while the coefficient of AR is the largest, at 0.53. The negative coefficient of SentOverallFBert and the positive coefficient of AR (both statistically significant) suggest that a higher abnormal return combined with lower positive sentiment indicates a higher probability of M&A. This is in line with my initial assumption and supports the hypothesis that abnormal returns amid no or lower positive news are an indication of an M&A announcement.

Further, I train and evaluate random forest and XGBoost models on the same datasets and look at the differences of the evaluation metrics across the models with different datasets. Most importantly, I look at the feature importance to confirm the significance of the Sentiment variable for the M&A predictive modeling.

3.3 Evaluation of Random forest model outputs

Next, I train and evaluate the outputs from the random forest model with 500 estimators and the entropy criterion. I use the 'balanced_subsample' class weight considering the imbalanced nature of the dataset. Also, I set the max_depth parameter to 3 to make sure I don't overfit the model. The ROC_AUC score is used as the main accuracy metric. Additionally, I report values for precision, recall, and F1. Finally, I look at the feature importance of the independent variables to evaluate the significance of the sentiment variables according to the random forest model.
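A hedged sketch of the training and evaluation loop; X_NoSent, X_BertRna, and X_Finbert are the three feature sets built in Section 3.1 and y is the target label.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate

rf = RandomForestClassifier(n_estimators=500, criterion='entropy', max_depth=3,
                            class_weight='balanced_subsample', random_state=1)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=1)

datasets = {'Model with no sentiment variable': X_NoSent,
            'With Sentiment from BERT-RNA': X_BertRna,
            'With Sentiment from FinBert': X_Finbert}

for name, X in datasets.items():
    res = cross_validate(rf, X, y, cv=cv,
                         scoring=['roc_auc', 'precision', 'recall', 'f1'], n_jobs=-1)
    print(name)
    print('AUC:%.2f' % res['test_roc_auc'].mean())
    print('Precision:%.2f' % res['test_precision'].mean())
    print('Recall:%.2f' % res['test_recall'].mean())
    print('F1:%.2f' % res['test_f1'].mean())
```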

Model with no sentiment variable
AUC:0.62
Precision:0.24
Recall:0.49
F1:0.32
With Sentiment from BERT-RNA
AUC:0.64
Precision:0.25
Recall:0.49
F1:0.33
With Sentiment from FinBert
AUC:0.64
Precision:0.26
Recall:0.51
F1:0.34

The reported evaluation metrics are in line with those from the logistic regression model in the sense that the models with an NLP-based sentiment variable outperform the no-sentiment model. Moreover, we observe a much higher ROC_AUC score for the random forest models: it is 0.12 higher for the no-sentiment model and 0.10 and 0.07 higher for the models with BERT-RNA and FinBert sentiment, respectively. It is also worth highlighting that the FinBert- and BERT-RNA-based models have the same ROC_AUC score of 0.64; however, FinBert still slightly outperforms BERT-RNA on the precision, recall, and F1 measures.

Next, I calculate and plot the feature importance for both models with sentiment variables.
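A minimal sketch of the plotting step; the model is refit on the full FinBert-sentiment dataset only to extract its feature importances.

```python
import pandas as pd
import matplotlib.pyplot as plt

rf.fit(X_Finbert, y)
importances = pd.Series(rf.feature_importances_, index=X_Finbert.columns).sort_values()
importances.plot(kind='barh', title='Random forest feature importance (FinBert sentiment)')
plt.tight_layout()
plt.show()
```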

Both of the graphs above suggest the high importance of the news sentiment variable. In particular, SentOverallFBert has the second-highest feature importance after Profit to Capital. The latter has the highest importance in the BERT-RNA based model as well; in that model, Net Debt per Share has slightly higher importance than SentOverallLabs. Nevertheless, the results from the feature importance values are in line with the results from the logistic regression model: they show the high importance of the NLP-based news sentiment variables along with the AR variable.

3.4 Evaluation of XgBoost model outputs

Finally, I train and evaluate the outputs from the XGBoost model with 500 estimators, a learning rate of 0.1, and an alpha of 30. I set the scale_pos_weight parameter to balance the classes, considering the imbalanced nature of the dataset. Also, I set the max_depth parameter to 3 and subsample to 0.5 to make sure I don't overfit the model. Here again, the ROC_AUC score is used as the main accuracy metric. Additionally, I report values for precision, recall, and F1. Finally, I look at the feature importance of the independent variables to evaluate the significance of the sentiment variables according to the XGBoost model.
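A hedged sketch of the estimator configuration; the same cross-validation loop as for the random forest is reused, and setting scale_pos_weight to the non-target/target class ratio is an assumption about how the class weight was balanced.

```python
from xgboost import XGBClassifier

xgb = XGBClassifier(n_estimators=500, learning_rate=0.1, reg_alpha=30,
                    max_depth=3, subsample=0.5,
                    scale_pos_weight=(y == 0).sum() / (y == 1).sum(),
                    eval_metric='logloss')
# evaluated exactly as before: cross_validate(xgb, X, y, cv=cv, scoring=[...])
```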

Model with no sentiment variable
AUC:0.62
Precision:0.24
Recall:0.48
F1:0.32
With Sentiment from BERT-RNA
AUC:0.63
Precision:0.25
Recall:0.49
F1:0.33
With Sentiment from FinBert
AUC:0.64
Precision:0.26
Recall:0.53
F1:0.35

The reported evaluation metrics are in line with those from the random forest model. Here again, we observe a much higher ROC_AUC score compared to the logistic regression outputs. It is also worth highlighting that here the FinBert-based model slightly outperforms the BERT-RNA-based one not only on the precision, recall, and F1 measures but also on the ROC_AUC score.

Here again, I calculate and plot the feature importance for both models with sentiment variables.

As in the case of the random forest model, both graphs suggest that the news sentiment variable is among the most important features. The main distinctive part of the XGBoost feature importance is that it reveals Debt to EV as another important feature, whose value exceeds both SentOverallFBert and SentOverallLabs. Nevertheless, the results from the feature importance values are in line with those from the logistic regression and random forest models, suggesting the high importance of the NLP-based news sentiment variables.

Summary

This article was an extension of my first article on predicting M&A targets using machine learning techniques. While the first article used only financial variables to predict M&A, here an NLP-based news sentiment variable was added to increase the predictive power of the model. The main hypothesis behind the news sentiment variable was the intuition that abnormal returns amid a no- or low-positive news sentiment environment could indicate an upcoming M&A announcement. For the news sentiment analysis, two BERT-based models, FinBert and BERT-RNA, were used, and the significance of the variables derived from both models was compared across logistic regression, random forest, and XGBoost ML techniques.

Although the dataset wasn’t large enough to claim the robustness of the model’s predictive power, the evaluation results on different ML techniques allow to claim the importance of the NLP-based news sentiment variable along with Abnormal return for the M&A predictive analysis. In fact, the latter was the main purpose of this article. Although the predictive power of this model isn’t high enough to use this model for an actual prediction and trading, I hope that this prediction workflow along with the selected variables, can be useful for training more robust models on much larger datasets and achieve much higher accuracies.

Apart from this, another important aspect of this article was the usage of the Search function for M&A data retrieval and the usage of BERT-RNA embeddings for sentiment classification, which I believe can be useful for the developer community who use Refinitiv products.
