Algorithms vs. data: lessons learned from AI at global scale

vBase.ai
Oct 11, 2021 · 3 min read

Recently I had a great discussion with Shyam Maddali. Shyam was an early proponent of machine learning at eBay. As early as 2006 he built prototypes with ML techniques like SVMs (support vector machines) and Random Forests to compete with the logistic regression models that were commonplace then.

He then went on to be the first manager of the team that built eBay’s neural-network-based advanced fraud detection system: RADAR.

Below are some lessons learned from that conversation.

Neural nets vs SVM vs Random Forest vs Boosted Trees

Shyam talked about the various experiments the team conducted to determine the best method for applying ML to a specific fraud detection problem. All methods were fed the same dataset and compared using a standard technique: the precision-recall (PR) curve. In the beginning there were some differences, but as each model was appropriately tuned, all models started showing similar results over time. The key conclusion: in this type of business AI effort, it is the data that matters more than the specific AI algorithm.

Below is an example of various PR curves; the top four could be from the best-fit models across different algorithms.
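As a rough illustration of that comparison setup (my own sketch, not the original eBay code), here is a minimal scikit-learn example that trains a few model families on the same dataset and overlays their PR curves. The synthetic dataset and model settings are placeholders.

```python
# Sketch: compare several model families on one dataset via PR curves.
# Assumes scikit-learn and matplotlib; the data here is synthetic, not real fraud data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import precision_recall_curve, auc
import matplotlib.pyplot as plt

# Imbalanced "fraud-like" dataset (placeholder for the real data).
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "SVM": SVC(probability=True, random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Boosted Trees": GradientBoostingClassifier(random_state=0),
    "Neural Net": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]          # predicted fraud probability
    precision, recall, _ = precision_recall_curve(y_test, scores)
    plt.plot(recall, precision, label=f"{name} (PR AUC={auc(recall, precision):.2f})")

plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend()
plt.show()
```

With enough tuning of each model family, the curves tend to cluster together, which is the point of the lesson above.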

Then why did we choose neural networks? Simply put, it was faster to train models with them, because we used a very powerful library: FANN (Fast Artificial Neural Networks). And as the name suggests, it was really fast!

The biggest operational challenge: hiring, training, and structuring the team!

One of the biggest challenges we faced (and possibly many face even today) is how to hire and train engineers. Should we hire PhDs, engineers, or both? And how should the teams be structured?

Ultimately, what worked best was a very rigorous training program built around true peer-based, on-the-job training. That allowed us to hire for passion and skill set rather than theoretical knowledge of AI. And the teams were structured around business/product domains, rather than around technology or the functional components of an AI system.

Operational best practice: unifying training and scoring platforms

A lot has been written on MLOps and other functions that support the operational aspects of ML. What we discovered was that a unified platform, or at least a unified set of technologies and programming languages across training and production (scoring), was critical for scaling the effort.
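One concrete way to read that lesson (my sketch under assumed names, not eBay’s actual platform) is to keep feature engineering in a single module that both the offline training job and the online scoring service import, so the two paths cannot drift apart.

```python
# shared_features.py -- one definition of the features, imported by both
# the offline training job and the online scoring service.
# The function and field names below are hypothetical.

def build_features(txn: dict) -> list[float]:
    """Turn a raw transaction record into the model's feature vector."""
    return [
        float(txn["amount"]),
        float(txn["account_age_days"]),
        1.0 if txn["country"] != txn["card_country"] else 0.0,  # cross-border flag
        float(txn.get("txns_last_24h", 0)),
    ]

# Training job (offline):
#   X = [build_features(t) for t in historical_transactions]
#   model.fit(X, labels)
#
# Scoring service (online):
#   score = model.predict_proba([build_features(incoming_txn)])[0][1]
```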

Other operational issues

Beyond the team and the platform, there are various operational decisions involved: how long to train a model, how to avoid overfitting, how to handle QA, how to avoid (and catch) data gaps between training and production, and so on. Many of these decisions are made by trial and error, especially if left to the ML engineers and data scientists. We quickly learned the need for a ‘management’ approach to these decisions, based on simple cost-vs-benefit analysis as well as standard practices for scaling operations.
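As one hedged example of turning such a decision into a standard practice rather than trial and error, the sketch below compares simple per-feature statistics (null rate and mean) between the training set and recent production traffic and flags large gaps. The thresholds and names are illustrative placeholders, not recommended values.

```python
import numpy as np

def detect_feature_gaps(train_col: np.ndarray, prod_col: np.ndarray,
                        null_rate_tol: float = 0.05, mean_shift_tol: float = 0.25):
    """Flag a feature whose null rate or mean drifts between training and production."""
    issues = []

    # Compare the fraction of missing values.
    train_nulls = np.isnan(train_col).mean()
    prod_nulls = np.isnan(prod_col).mean()
    if abs(train_nulls - prod_nulls) > null_rate_tol:
        issues.append(f"null rate moved from {train_nulls:.2%} to {prod_nulls:.2%}")

    # Compare the mean, relative to the training mean.
    train_mean = np.nanmean(train_col)
    prod_mean = np.nanmean(prod_col)
    denom = abs(train_mean) if train_mean != 0 else 1.0
    if abs(prod_mean - train_mean) / denom > mean_shift_tol:
        issues.append(f"mean moved from {train_mean:.3g} to {prod_mean:.3g}")

    return issues

# Example: run the check per feature column before trusting production scores.
# for name, (train_col, prod_col) in feature_columns.items():
#     for issue in detect_feature_gaps(train_col, prod_col):
#         print(f"[data gap] {name}: {issue}")
```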

For a quick primer on the operational challenges of AI, check out this video.

Scaling AI efforts globally

The last part we discussed was how to scale AI efforts across the organization. As AI is used for a variety of tasks and for localized solutions in many countries, it creates its own challenges. Some of the issues are scaling feature (or variable) generation to enable cross-team sharing, local and global compliance (especially in regulated industries), and appropriately learning from global ‘and’ local data for localized models. Shyam talked about recent advances in these areas and some of the innovative techniques he has been using in the last few years. More on that in a future article!
