Using Machine Learning to Solve Data Reconciliation Challenges in Financial Services

Mar 19, 2018

Winning in financial services is increasingly about the speed and accuracy of data normalization and reconciliation. Banks, asset managers, custodians, broker-dealers, portfolio managers, market utilities, etc. are all fundamentally data-driven. Thousands of employees onboard, match, compute and report massive quantities of data every day across every part of the business.

Reconciling data so it can be aggregated, analyzed and reported is a never-ending activity in financial services. Much of this reconciliation work is done by data teams with traditional ETL tools, but spreadsheets and manual processes are rampant. Inefficiencies and costs abound, particularly when on-boarding new data sources.

Data reconciliation inefficiencies are exactly the type of “low hanging fruit” basic machine learning can resolve. Employing machine learning at key data reconciliation points can save time, cut operating costs and help avoid regulatory penalties.

Where Data Reconciliation Costs Impact Financial Institutions

Data reconciliation inefficiencies can occur in any part of the business where:

New data sources must be matched with internal or external records (customer, security master, position, LEI, etc.)
Multiple data sources / types are compared or aggregated (market risk, credit risk, RWA, liquidity stress testing, exposure limits, BCBS 239, etc.)
Internal data must match an external database of record (trade repository, regulator database, 3rd party credit reports, AML / KYC / CFT, etc.)
Manual controls and approval processes exist (customer onboarding, transaction control, loan control, accounts payable, etc.)
There is a system or data migration project (aggregating multiple systems into one, onboarding a new system, acquiring a new loan portfolio or customer accounts, etc.)
Regulatory reporting is mandated (Dodd-Frank, MiFID II / MiFIR, OFAC, GDPR, etc.)
Audit trails are produced and analyzed (internal audits, regulatory audits, CAT 613, OATS, IFRS 9, SMR, SOX, etc.)
Multiple 1-to-1 reconciliations are aggregated into multi-part reconciliations (collateral netting, bank holding company reporting, etc.)
Market requirements have evolved rapidly (multiple exchanges, market structure, cloud computing, TARGET2 settlements, utilities, etc.)

The “New Normal” — Why Traditional ETL and Fixed Data Structures Fail

Far-reaching new business and regulatory requirements that depend on rapidly handling, reconciling and aggregating complex data have become the “new normal” in financial services. Unfortunately, traditional fixed data model / ETL approaches adapt poorly to these complex new requirements.

Under the traditional model, Business Analysts and ETL / data specialists must create technical requirements, onboard new data sources, analyze the data, and apply ETL processes that match a fixed data model downstream. This is highly complex, takes significant time, and often requires expanding the project scope to multiple parts of the business.

These limitations can cause four effects on the business:

Rigorous constraints on ingesting new data sources can cause critical delays when speed is required to meet a mission-critical goal (M&A, regulatory / compliance)
The traditional approach does not scale well. Jumping from BAU-level workloads and data volumes to handle large new projects on short timelines can throw the data and technology teams into chaos.
The past defines the future: Systems designed to meet old requirements may substantially limit how new data can be ingested and what can be done with the data downstream.
New data types and sources can require significant process re-engineering, training and hiring. This can push project time frames past “hard stop” dates defined by the business or regulators.

Two further (very negative) things may also occur:

New business or regulatory projects can fall behind schedule and exceed the available budgets.
Business users avoid getting the data team involved and develop their own spreadsheet-based processes to onboard and reconcile data. This causes high costs, low transparency / auditability, and high error rates.

How Machine Learning Can Increase Efficiency and Reduce Data Reconciliation Costs

Basic machine learning can be implemented to help solve the speed and cost issues of on-boarding and reconciling new data sources.

The main problem with structured data / ETL approaches is the slow speed of taking on new data and matching. The greatest “bang for the buck” can be achieved by taking out slow human-based processes in the initial data onboarding stage and replacing them with a machine that analyzes and teaches itself how to handle the new data.

The ideal system for this purpose:

Connects to most/all data sources (the new source as well as existing sources to match, plus existing structured data sources and ETL layer)
Ingests data in a wide range of formats (csv, XML, feed, SQL, NoSQL, etc.)
Processes the data in-memory to maximize speed and capacity
Has a built-in data engine that automatically “learns” the data sources and patterns, analyzes them for likely matches across multiple data sets, highlights reconciliation exceptions / mismatches, and presents actionable “to do” lists to resolve data issues
Has an easy-to-use interface that helps analysts quickly build data control rules in a central location with the ability to implement automated approval processes
Records all activities in an auditable format
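
To make the matching step concrete, here is a minimal Python sketch of probability-style matching between a new source and internal records. It is an illustration only, not any vendor's engine: plain string similarity from the standard library stands in for a learned model, and the record IDs, names and the 0.85 threshold are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional

MATCH_THRESHOLD = 0.85  # hypothetical cut-off; would be tuned per data set


@dataclass
class MatchResult:
    source_id: str
    internal_id: Optional[str]  # best candidate found, if any
    score: float                # 0.0 (nothing in common) to 1.0 (identical)


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """String similarity used here as a stand-in for a match probability."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def reconcile(source, internal):
    """Split source records into likely matches and exceptions for review."""
    matches, exceptions = [], []
    for source_id, source_name in source.items():
        best_id, best_score = None, 0.0
        for internal_id, internal_name in internal.items():
            score = similarity(source_name, internal_name)
            if score > best_score:
                best_id, best_score = internal_id, score
        result = MatchResult(source_id, best_id, round(best_score, 3))
        if best_score >= MATCH_THRESHOLD:
            matches.append(result)
        else:
            exceptions.append(result)  # goes to the analyst "to do" list
    return matches, exceptions


# Hypothetical sample data
new_source = {"N1": "Acme Holdings, Inc.", "N2": "Globex Corp", "N3": "Initech"}
internal_records = {
    "C10": "ACME HOLDINGS INC",
    "C11": "Globex Corporation",
    "C12": "Umbrella Ltd",
}

matches, exceptions = reconcile(new_source, internal_records)
for item in matches + exceptions:
    print(item)
```

In this sample, the first record matches once case and punctuation are normalized, while the abbreviated and unknown names fall below the threshold and land in the exception list, each with its best candidate attached for an analyst to confirm or reject.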

Three Case Studies of Machine Learning in Large Scale Reconciliation Projects

Case #1: Fees, pricing and transaction data from 200+ Financial Advisors to a U.S.-based Wealth Management firm

A reconciliation platform featuring machine learning capabilities was implemented at a major U.S.-based Wealth Management firm. Prior to implementing the system, the Operations team had to manually reconcile hundreds of data sources on a daily basis from Excel, PDF, emails and 220+ websites submitted by the firm’s Financial Advisors. The process lacked control, automation and supervisory review/approval.

Implementing the new system involved pointing it to the data sources, then allowing the machine learning and reconciliation engine to process the data in memory. Probability-based algorithms were applied, and potential mismatches / exceptions were generated in a report. The Operations team then processed these and was able to quickly develop reusable matching rules and approvals / controls in a central location. Once the rules were built up, the system could automatically check data quality within specified tolerances, generate exception reports, and output a file to be ingested by the firm’s accounting system. This eliminated the operational risk from manually cutting and pasting data, implemented supervisory review, and automatically integrated the reconciled output with the accounting system. Efficiency gains of several million dollars per year were achieved.
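
As a rough illustration of what checking "within specified tolerances" can look like once rules exist, the Python sketch below compares advisor-submitted fees against expected values and produces an exception list. The account IDs, amounts and the 0.01 tolerance are hypothetical, and the case study does not describe the firm's actual rules.

```python
from decimal import Decimal

FEE_TOLERANCE = Decimal("0.01")  # hypothetical tolerance, in currency units


def check_fees(submitted, expected, tolerance=FEE_TOLERANCE):
    """Return (account, reason) pairs for every item that needs review."""
    exceptions = []
    for account, amount in submitted.items():
        if account not in expected:
            exceptions.append((account, "no internal record"))
            continue
        diff = amount - expected[account]
        if abs(diff) > tolerance:
            exceptions.append((account, f"differs by {diff}"))
    # Internal records that no advisor submission covered
    for account in sorted(expected.keys() - submitted.keys()):
        exceptions.append((account, "missing from advisor submission"))
    return exceptions


# Hypothetical sample data
submitted_fees = {
    "A-100": Decimal("125.00"),
    "A-101": Decimal("80.505"),  # within tolerance of 80.50
    "A-102": Decimal("40.00"),   # unknown account
    "A-104": Decimal("60.00"),   # outside tolerance of 62.50
}
expected_fees = {
    "A-100": Decimal("125.00"),
    "A-101": Decimal("80.50"),
    "A-103": Decimal("15.00"),   # never submitted
    "A-104": Decimal("62.50"),
}

exception_report = check_fees(submitted_fees, expected_fees)
for account, reason in exception_report:
    print(account, reason)
```

Decimal is used instead of float so that small fee differences are compared exactly rather than being blurred by binary rounding.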

Case #2: Broker-dealer reconciliation of multiple exchanges to multiple internal systems for ETDs

A large broker-dealer with global operations trades ETDs across 80+ different exchanges. Multiple internal systems are used to capture and process the securities, trade, price, position and customer data. Due to the high complexity level and performance requirements, combined with the lack of a reconciliation and normalized control framework, the firm was increasingly unable to expand into new markets. The firm estimated it took over 200 man-days to onboard a new exchange.

The firm implemented a reconciliation system featuring machine learning and in-memory matching capabilities. The system ingested data from 80+ exchanges and normalized / reconciled it with the various internal systems with their fixed data structures. The learning engine was able to quickly process millions of historical transactions, display exceptions and mismatches, and suggest matching rules. The firm was able to onboard 2 exchanges per day, rather than 2 per month.
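
One small piece of normalizing an exchange feed against a fixed internal structure is proposing which feed column maps to which internal field. The Python sketch below suggests mappings by name similarity and leaves anything uncertain for an analyst. It is a simple stand-in for a learning engine, and the field names, column names and the 0.6 cutoff are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical internal schema
INTERNAL_FIELDS = ["trade_id", "instrument_code", "quantity", "price", "trade_date"]


def suggest_mapping(feed_columns, internal_fields=INTERNAL_FIELDS, cutoff=0.6):
    """Suggest an internal field for each feed column, or None if unsure."""
    suggestions = {}
    for column in feed_columns:
        key = column.strip().lower().replace(" ", "_")
        hits = get_close_matches(key, internal_fields, n=1, cutoff=cutoff)
        suggestions[column] = hits[0] if hits else None
    return suggestions


# Hypothetical column headers from a new exchange feed
mapping = suggest_mapping(["TradeID", "Qty", "Price", "Trade Date", "Venue"])
for column, field in mapping.items():
    print(f"{column!r} -> {field or 'needs analyst review'}")
```

Name similarity alone misses abbreviations such as "Qty", which is why unmapped columns are surfaced for review rather than guessed. A confirmed mapping can then be saved as a reusable rule for the next feed with the same layout.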

Case #3: Acquisition of a new business loan portfolio from another lender

A regional bank acquired a portfolio of several thousand business loans from a competitor. The onboarding process for these loans would require Client Onboarding, KYC / AML, Treasury, Limits, Technology, Project Management, Accounts Receivable, the selling bank’s Loan Operations team, and potentially the borrowers’ Finance teams.

The data was housed at the selling bank in different systems and formats, and was tied to different customer and credit records. Onboarding the loan data required analyzing the data, matching it with the bank’s internal LMS, identifying mismatches and gaps, creating reconciliation rules, and matching it to internal customer records. This had to be done before the loans could be managed, included in P&L and risk calculations, and billed. The bank’s internal team estimated that onboarding the loan portfolio would require 5–6 months with their existing systems, with analysis required for every loan.

The bank chose to implement a cloud-based reconciliation system with machine learning capabilities. The loan portfolio and associated customer and payment records were loaded into the system and matched to the bank’s internal LMS and accounting records. The system was able to match approximately 65% of the data to internal records within 1 day, and presented the remaining data in a central dashboard for resolution. The entire portfolio was onboarded in 2 weeks, including new matching rules and records, controls and import to the firm’s LMS.
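
A first automated pass of this kind can be sketched as a keyed match that reports a match rate and queues the remainder for review. The Python example below assumes, purely for illustration, that acquired loans and internal customers share a tax identifier. The field names and sample records are hypothetical and do not reproduce the bank's data or its 65% figure.

```python
def normalize_tax_id(raw: str) -> str:
    """Keep digits only so '12-3456789' and '123456789' compare equal."""
    return "".join(ch for ch in raw if ch.isdigit())


def match_loans(acquired_loans, customers_by_tax_id):
    """Match loans to internal customers; return matches, queue and rate."""
    matched, review_queue = {}, []
    for loan in acquired_loans:
        key = normalize_tax_id(loan["borrower_tax_id"])
        customer_id = customers_by_tax_id.get(key)
        if customer_id is None:
            review_queue.append(loan["loan_id"])  # resolved on the dashboard
        else:
            matched[loan["loan_id"]] = customer_id
    match_rate = len(matched) / len(acquired_loans) if acquired_loans else 0.0
    return matched, review_queue, match_rate


# Hypothetical sample data
acquired_loans = [
    {"loan_id": "L-1", "borrower_tax_id": "12-3456789"},
    {"loan_id": "L-2", "borrower_tax_id": "98 7654321"},
    {"loan_id": "L-3", "borrower_tax_id": "555555555"},
    {"loan_id": "L-4", "borrower_tax_id": "11-1111111"},
]
internal_customers = {"123456789": "CUST-7", "987654321": "CUST-9", "111111111": "CUST-2"}

matched, review_queue, match_rate = match_loans(acquired_loans, internal_customers)
print(f"Matched {match_rate:.0%}; needs review: {review_queue}")
```

Records left in the review queue are the ones where new matching rules or manual decisions are needed, so the match rate after the first pass is a quick indicator of how much analyst work remains.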

In a Nutshell

As these examples show, basic machine learning capabilities can be leveraged to rapidly meet business and regulatory requirements where traditional structured data approaches fail. With the speed of change in financial services, investigating how machine learning can be used for data reconciliation in your firm could result in significant benefits.