Integrating Machine Learning Models within Mature Business Processes

Machine learning today reaches nearly every business process in the enterprise, helping to create value, enhance customer experience, or improve operational efficiency. Businesses today have the necessary infrastructure, the right tooling, and the data to generate insights faster than before.

While machine learning models can have a significant and positive impact on how business processes are run, they can also be risky if put into live production without being monitored for a reasonable amount of time. A major hurdle arises when organizations have some form of business rules embedded in their critical business processes. These rules may have evolved over time, drawing on real-world domain knowledge, and may be performing exceptionally well. In these scenarios, stakeholders typically push back on doing away with the existing rules ecosystem completely.

The challenge with rule-based systems is that data and business scenarios now change so fast that either the rules cannot keep up with real-world scenarios or creating and maintaining additional rules becomes very time consuming.

With that background set, this article is about how we can make use of the best of both worlds (rules + machine learning) and, over time, measure the performance of machine learning models on real-world data to see whether they can stand on their own.

I am going to take a banking use case, “Small Business Risk Scoring”, as the business process to elaborate on the model deployment options. That said, the deployment options below can be applied to any business process.

A typical risk scoring architecture today works as follows.

Risk scoring is performed based on customer behavior/history and data from the credit bureau. These scores, along with the underlying transaction data, are stored in a centralized feature store, which in turn distributes them to the bank's downstream systems such as Collections, Regulatory, and Risk. High-risk customers who are 30+ days past due are prioritized to service agents in real time.

The brain of score calculation today is a rule-based scoring engine built over the last decade, applying domain context as well as risk patterns seen over time. The business now wants to upgrade the scoring engine to take into account external data sources, such as micro- and macroeconomic factors and data on the industry the small business operates in, to identify external risk factors that may turn an account in good standing delinquent.

That said, introducing new data sources adds complexity to enhancing and maintaining the rule-based system. This is exactly where machine learning helps, since it can learn from the underlying data without being explicitly programmed.

Let us get into a few deployment approaches, assuming we already have a machine learning model built with customer behavior/transaction history, bureau data, micro- and macroeconomic data, and industry segmentation data.

Approach 1 — Stacked scoring model using meta-learner

In this case we use the outputs of both the rule-based engine and the ML model to predict customer risk. The outputs are stacked together either with rules or, preferably, with a simple logistic regression model (the meta-learner).
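To make the stacking idea concrete, here is a minimal sketch of a logistic regression meta-learner that takes the rule-based score and the ML score as its only two inputs. Everything in it is an assumption for illustration: the function names (`fit_meta_learner`, `predict_risk`), the toy score history, and the plain gradient-descent fit, which stands in for whatever library a real project would use.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_meta_learner(rows, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression on (rule_score, ml_score) pairs
    using batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss w.r.t. the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_risk(weights, bias, rule_score, ml_score):
    """Combine both base scores into one delinquency probability."""
    return sigmoid(bias + weights[0] * rule_score + weights[1] * ml_score)

# Hypothetical history: (rule_score, ml_score) per account, and whether
# the account later went delinquent (1) or stayed in good standing (0).
history = [
    (0.9, 0.8), (0.8, 0.9), (0.7, 0.9), (0.9, 0.6),
    (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.4, 0.1),
]
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_meta_learner(history, outcomes)
high = predict_risk(weights, bias, rule_score=0.85, ml_score=0.9)
low = predict_risk(weights, bias, rule_score=0.15, ml_score=0.2)
```

The learned weights also give a rough read on how much the meta-learner leans on each base scorer, which is one way to judge over time whether the rules engine is still adding value.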

Advantages

  • Potentially higher accuracy, as the stacked meta-learner learns from both the hand-coded rules and the machine learning model
  • Easy to monitor individual model performance and understand the strengths and weaknesses of each system

Disadvantages

  • Increased execution time and slightly higher deployment complexity
  • Need to train and monitor multiple ML models (the risk scoring ML model and the meta-learner)
  • The rule-based engine is always in the loop (not necessarily a bad thing). Over time, once you are comfortable with the ML model, the rules engine and meta-learner can be removed

Approach 2 — Champion/Challenger model deployment

In Champion/Challenger mode, the machine learning model is deployed in a pipeline parallel to the rule-based engine, and both pipelines score incoming transactions. To start with, the risk scoring decision is made by the rule-based engine while the ML model operates in dark mode, where its scores are used only for offline analysis. Over time, once we have sufficient positive metrics on the ML model's performance, we can toggle the ML model to champion and the rules to challenger.
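A minimal sketch of this toggle is shown below. The class `ChampionChallenger`, the two toy scorers, and the transaction fields (`id`, `days_past_due`, `utilization`) are all hypothetical names made up for illustration. Both scorers run on every transaction, only the champion's score is returned for the live decision, and every score is logged for offline comparison.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Scorer = Callable[[dict], float]

@dataclass
class ChampionChallenger:
    scorers: Dict[str, Scorer]   # e.g. {"rules": ..., "ml": ...}
    champion: str                # name of the scorer driving live decisions
    shadow_log: List[dict] = field(default_factory=list)

    def score(self, txn: dict) -> float:
        # Every scorer sees every transaction.
        scores = {name: fn(txn) for name, fn in self.scorers.items()}
        # Keep all scores for offline analysis (the "dark mode" record).
        self.shadow_log.append(
            {"txn_id": txn["id"], "champion": self.champion, **scores}
        )
        # Only the champion's score is used for the live decision.
        return scores[self.champion]

    def promote(self, name: str) -> None:
        if name not in self.scorers:
            raise KeyError(f"unknown scorer: {name}")
        self.champion = name

# Toy stand-ins for the real rules engine and ML model (assumptions).
def rules_score(txn: dict) -> float:
    return 0.9 if txn["days_past_due"] >= 30 else 0.2

def ml_score(txn: dict) -> float:
    return min(1.0, 0.01 * txn["days_past_due"] + 0.5 * txn["utilization"])

router = ChampionChallenger(
    scorers={"rules": rules_score, "ml": ml_score}, champion="rules"
)
txn = {"id": "t1", "days_past_due": 45, "utilization": 0.6}

live_before = router.score(txn)   # decided by the rules engine
router.promote("ml")              # toggle once offline metrics look good
live_after = router.score(txn)    # now decided by the ML model
```

Keeping the champion name as a single piece of state makes the switch, and any rollback, a one-line change rather than a redeployment.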

Another option is to look at performance by segment and balance the load between the two scoring components accordingly. Say the rules engine works better for customers with high credit scores and the ML model for those with low scores; we can then add a simple rule that diverts incoming transactions to the respective scoring component by credit score level.
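As a rough sketch of segment-level routing, the code below compares how often each scorer was right per credit score segment on past transactions with known outcomes, then routes new transactions to the better scorer for their segment. The cutoff of 700, the 0.5 decision threshold, the field names, and the toy records are assumptions chosen only for illustration.

```python
from collections import defaultdict

def segment_of(credit_score, cutoff=700):
    """Assign a transaction to a coarse credit score segment."""
    return "high" if credit_score >= cutoff else "low"

def build_routing_table(labelled_log, decision_threshold=0.5):
    """Pick, per segment, the scorer with more correct calls.
    Ties go to the rules engine, the incumbent."""
    hits = defaultdict(lambda: {"rules": 0, "ml": 0})
    for rec in labelled_log:
        seg = segment_of(rec["credit_score"])
        for name in ("rules", "ml"):
            predicted_delinquent = rec[name] >= decision_threshold
            hits[seg][name] += int(predicted_delinquent == rec["delinquent"])
    return {
        seg: ("ml" if counts["ml"] > counts["rules"] else "rules")
        for seg, counts in hits.items()
    }

def pick_scorer(txn, routing):
    """Return the name of the scoring component for this transaction."""
    return routing[segment_of(txn["credit_score"])]

# Hypothetical past scores from both components, with observed outcomes.
labelled_log = [
    {"credit_score": 760, "rules": 0.2, "ml": 0.6, "delinquent": False},
    {"credit_score": 720, "rules": 0.7, "ml": 0.8, "delinquent": True},
    {"credit_score": 610, "rules": 0.3, "ml": 0.7, "delinquent": True},
    {"credit_score": 580, "rules": 0.6, "ml": 0.2, "delinquent": False},
]

routing = build_routing_table(labelled_log)
chosen = pick_scorer({"credit_score": 650}, routing)
```

In practice the comparison metric would be whatever the business already tracks for the rules engine; simple hit counts are used here only to keep the sketch short.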

Advantages

  • Simple pipeline
  • Easy to toggle between the two based on performance, or to balance load across both

Disadvantages

  • Added complexity if incoming transactions need to be balanced intelligently between the champion and challenger models
  • Possibly less accurate than the meta-learner method

Approach 3 — Serial pipeline deployment

In serial deployment, the ML model is trained slightly differently from the other two deployment options. Here the machine learning model is trained with the output of the rules engine as a separate feature, alongside features from the other data sources. This is the approach to take when you are looking to complement and augment the current rule-based system with an ML model for a better decision boundary.
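The serial flow can be sketched as below: the rules engine runs first, its score is appended to the feature row, and the model scores that enriched row. The feature names, the toy `rules_engine`, and the hand-set weights in `stand_in_model` are placeholders, not a trained model; in a real pipeline the same `build_features` step would be used at training time so that the model learns from the rules score.

```python
from typing import Callable, Dict, List

# Column order expected by the model; the rules score is one more feature.
FEATURE_ORDER = ["utilization", "days_past_due", "industry_stress", "rules_score"]

def rules_engine(customer: Dict[str, float]) -> float:
    """Toy stand-in for the existing rule-based scoring engine."""
    return 0.9 if customer["days_past_due"] >= 30 else 0.2

def build_features(customer: Dict[str, float],
                   rules_fn: Callable[[Dict[str, float]], float]) -> List[float]:
    """Run the rules engine first, then add its score to the feature row."""
    enriched = dict(customer, rules_score=rules_fn(customer))
    return [float(enriched[name]) for name in FEATURE_ORDER]

def stand_in_model(features: List[float]) -> float:
    """Placeholder for a trained model: a weighted sum clipped to [0, 1].
    The weights are made up for illustration only."""
    weights = [0.3, 0.005, 0.2, 0.4]
    raw = sum(w * x for w, x in zip(weights, features))
    return max(0.0, min(1.0, raw))

def serial_score(customer, rules_fn, model_fn) -> float:
    """Rules engine output feeds the model; the model makes the final call."""
    return model_fn(build_features(customer, rules_fn))

customer = {"utilization": 0.8, "days_past_due": 40, "industry_stress": 0.5}
features = build_features(customer, rules_engine)
risk = serial_score(customer, rules_engine, stand_in_model)
```

Because the rules score is just another column, the same pattern applies when the upstream score comes from a black-box product rather than an in-house rules engine.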

Advantages

  • Simpler pipeline compared to the stacked approach

Disadvantages

  • The rules engine is always in the loop

Each of the approaches above has its purpose, and the right choice depends on the business process flow, the criticality of the business process, the business/domain context coded into the current rules ecosystem, and finally the decision accuracy of the current process.

In my experience in banking and finance, I have seen Approach 1 favored in anti-money laundering and underwriting projects, Approach 2 more on the marketing and fraud analytics side, and Approach 3 in places where black-box or vendor products like Actimize, Kofax etc. exist today and a new machine learning model is trained on the score coming from these vendor products along with other new features.

Happy machine learning deployment!

Data Driven Investor

from confusion to clarity, not insanity

Written by Srivatsan Srinivasan

Data Scientist | Data Engineer
