Data engineering — Data Ingestion for M&A predictive modeling

Haykaz Aramyan
LSEG Developer Community
3 min read · May 3, 2023

Overview

This post summarises an article, available in full on LSEG’s Developer Portal, that focuses on the ingestion phase of an AI pipeline analysing corporate events. In particular, we discuss how to ingest M&A and Fundamental & Reference data for target and non-target companies via the Refinitiv Data Platform (RDP) API.

Article in Brief

Data Ingestion for target dataset

In this section, we show how to ingest data both from the Advanced Search of Refinitiv Workspace and via the API.

One of the most direct ways to retrieve a dataset on M&A target companies is the M&A Advanced Search section of Refinitiv Workspace.

While this is a great first step for getting M&A data from Refinitiv, it is perhaps not the most scalable approach to data retrieval for AI pipelines. We would want to access the data programmatically and integrate it into the pipeline. For that reason, the main article provides a function that uses the Search capabilities of the RDP API.
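The function itself lives in the main article. As a rough illustration of the idea only, the sketch below builds a date-bounded filter string and passes it to an injected search callable. The field names (`TransactionAnnouncementDate`, `TargetCountry`), the filter syntax, and the helper names are assumptions made for this example, not the article's code. In practice `search_fn` could be a thin wrapper around whatever RDP Search client you use (for instance `rd.discovery.search` from the Refinitiv Data Library, if that is available in your environment).

```python
from datetime import date
from typing import Any, Callable, Sequence


def build_deal_filter(start: date, end: date,
                      date_field: str = "TransactionAnnouncementDate",
                      countries: Sequence[str] = ("US",),
                      country_field: str = "TargetCountry") -> str:
    """Build a filter string for deals announced between start and end.

    The field names and the OData-like syntax are assumptions; check them
    against the Search metadata available to your account.
    """
    country_clause = " or ".join(f"{country_field} eq '{c}'" for c in countries)
    return (f"({country_clause}) and "
            f"{date_field} ge {start.isoformat()} and "
            f"{date_field} le {end.isoformat()}")


def fetch_target_deals(search_fn: Callable[..., Any], start: date, end: date,
                       select: Sequence[str], top: int = 100) -> Any:
    """Run the injected search callable with the filter and the field list."""
    return search_fn(filter=build_deal_filter(start, end),
                     select=", ".join(select),
                     top=top)
```

Injecting the search callable keeps the filter logic testable without credentials and makes it easy to swap the client later.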

Workflow for ingesting data as of a universal date and for different dates

After ingesting the M&A data, the next step is to get the fundamental & reference data for the target companies. Here, we describe two ways to ingest this data: as of a single universal date shared by all target companies, and as of a specific date for each target.
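The difference between the two workflows can be sketched as follows. This is a minimal illustration, not the article's implementation: the `SDate` parameter name, the request dictionary layout, and the `days_before` offset (how far ahead of the announcement the snapshot is taken) are all assumptions for the example.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

Deal = Tuple[str, date]  # (target RIC, announcement date)


def build_requests(deals: List[Deal],
                   universal_date: Optional[date] = None,
                   days_before: int = 30) -> List[Dict[str, object]]:
    """Describe the data requests needed for a list of deals.

    With a universal date, all targets share one snapshot date, so a single
    request covers every instrument. Without it, each target gets its own
    request dated `days_before` days ahead of its announcement
    (a hypothetical offset chosen for illustration).
    """
    if universal_date is not None:
        rics = sorted({ric for ric, _ in deals})
        return [{"universe": rics,
                 "parameters": {"SDate": universal_date.isoformat()}}]
    return [{"universe": [ric],
             "parameters": {"SDate": (announced - timedelta(days=days_before)).isoformat()}}
            for ric, announced in deals]
```

The universal-date path needs one request regardless of how many targets there are, while the per-target path needs one request per deal, which is what motivates the optimisation discussed next.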

Optimising ingestion requests

One important concern during data ingestion is making sure the ingestion layer is optimised and does not introduce unnecessary bottlenecks into the pipeline. How to do so depends on the use case; in ours, we noticed that there are dates on which multiple M&A deals were announced. To optimise our ingestion requests, we can bundle the deals by date. This reduces the number of API calls and retrieves the results much faster.

The optimised version of our code returned the data roughly 40% faster than the other versions provided in the main article, owing to the reduced number of API calls.
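The bundling idea can be sketched in a few lines. The helper name and the `(RIC, date)` tuple layout are assumptions made for this example; the article's own code is in the main article.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def bundle_by_date(deals: Iterable[Tuple[str, date]]) -> Dict[date, List[str]]:
    """Group target RICs by announcement date.

    Each resulting bundle can be sent as one request, so the number of API
    calls equals the number of distinct dates rather than the number of deals.
    """
    bundles: Dict[date, List[str]] = defaultdict(list)
    for ric, announced in deals:
        if ric not in bundles[announced]:  # skip duplicate RICs on the same date
            bundles[announced].append(ric)
    return dict(bundles)
```

For example, five deals spread over three announcement dates produce three bundles, so three requests instead of five. The actual speed-up depends on how clustered the announcement dates are in the dataset.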

Data Ingestion for non-target dataset

The non-target sample set is constructed from companies similar to the target companies in terms of business activity. Identifying such a control group would ordinarily be a complicated process; however, Refinitiv has a function called Peer Screener, which allows us to easily retrieve the top 50 peers of any company. Details can be found in the main article.
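One way to assemble the control group from peer lists is sketched below. The `fetch_peers` callable stands in for the actual Peer Screener request, which is not shown here, and dropping peers that are themselves targets is an assumption of this sketch rather than something stated in the article.

```python
from typing import Callable, Dict, Iterable, List


def build_non_target_groups(target_rics: Iterable[str],
                            fetch_peers: Callable[[str], List[str]],
                            max_peers: int = 50) -> Dict[str, List[str]]:
    """Map each target RIC to its list of non-target peers.

    `fetch_peers` is a hypothetical callable returning a ranked peer list
    for one RIC. Peers that are also targets are removed so the two sample
    sets do not overlap (an assumption of this sketch).
    """
    targets = list(target_rics)
    target_set = set(targets)
    groups: Dict[str, List[str]] = {}
    for ric in targets:
        top_peers = fetch_peers(ric)[:max_peers]  # keep at most the top N peers
        groups[ric] = [p for p in top_peers if p not in target_set]
    return groups
```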

Conclusion

In this guide, we presented the business incentive for an M&A AI model and discussed data ingestion approaches for acquiring the dataset for our pipeline. In the next phase of Data Engineering, the Data Exploration phase, we will look more closely at the specifics of our dataset and the available feature space.
