Spread your bets: Improving AI performance through diversity

Published in QuantumBlack, AI by McKinsey, Mar 21, 2018

QuantumBlack’s Chief Operating Officer, Chris Wigley, writes about why diversity should run much deeper than gender and explores how truly diverse teams have better results in driving performance.

It’s not often that a middle-class middle-aged white guy gets to be a diversity candidate. But, a few years ago, it happened to me.

I was working with a jewellery company, doing customer analytics (using SPSS, which dates this story). We were reviewing their existing segmentation, in a beautiful high-ceilinged room, with posters on the walls, and mood-boards for each segment. There were eight women in the room. I was the only man. It struck me that all seven customer segments were women. “How about”, I asked, “we have some segments for men? Even one?” It’s women’s jewellery, the response came back. “Yes, but if we look at most of your ads, it’s a man giving a gift to a woman. The man is the actual purchaser here.” Ahh. Yes. We added a segment for men.

Diversity is clearly not about being a man. Nor, in a tech context, is it just about being a woman. Diversity is about bringing an additional, fresh perspective. It leads to better performance, for people and for Artificial Intelligence (AI).

These days, I have a much broader focus at QuantumBlack, applying machine learning and AI to real world opportunities with clients in pharmaceutical, aerospace, financial services, government and many other sectors. AI is still in its infancy “in the wild” outside of the lab and the mega-tech companies, and it’s easy to criticise its performance or flag its limitations. But we’re experimenting a lot, and finding that diversity is critical to success.

So, if “AI is a child, and we are the parents”, what can we do to raise a good and helpful kid? I argue in this short paper that we can both mitigate the risks and enhance the performance of AI by embracing diversity, specifically:

Diversity of People
Diversity of Data
Diversity of Models
Diversity of Mindset

We’re trying to bring all these kinds of diversity to bear on our work. We haven’t got it right. There probably isn’t a point at which we ever declare victory. But we’re learning and hopefully improving as we go.

We’d love to hear your thoughts on these topics in the comments or on Twitter at Chris Wigley or QuantumBlack.

1. Diversity of People

Diverse people bring diverse experience, and diverse approaches to solving problems. This is essential for breaking new ground and has been proven to result in increased performance in many fields. AI is no different.

Let’s look first at gender diversity. Having just been at SXSW in Austin, it was refreshing and illuminating to have no all-male panels and many all-women panels. It shouldn’t be noteworthy; sadly, today it still is. The insights and charisma of the women on stage raised SXSW above its competition.

*Panel at SXSW2018 with Fei Fei Li, Megan Smith and Joanne*

At QuantumBlack we were for some time above 50% women in our technical roles. As we’ve grown (from 35 people two years ago to 350 now) both organically and by taking on entire teams, our ratio has slipped, but we’re working hard to get it back there because it very clearly helps our performance.

Attracting and retaining talent is our number one challenge; many of our best and brightest are women and we desperately need more. It also helps that many of our senior clients (men and women) have commented that they were excited to work with us because we did not have a “bro” culture that they didn’t trust, so diversity also helps client development.

So, to take that point further, gender is important but it’s not just about gender. Other elements of diversity are also critical.

Age and historical perspective. Our team at QuantumBlack ranges from 23 to 63 years old. Our lead Data Architect is in his 50s and was building big data systems in the 1980s. Our younger machine learning engineers have grown up on a contemporary open source stack and use TensorFlow and H2O and other frameworks with the fluency of a native tongue. That combination of skills and perspectives is vital for our work. It is also important for us as a young company to benefit from the experience and scar tissue of veterans of previous technology revolutions (for example, one of our leadership team was the CTO of the New York Stock Exchange when it went from paper-based trading to high frequency trading).

And historical perspective goes beyond our own lifetime (and tends to come from folks with a Humanities background). As Megan Smith points out, driving large scale technical revolutions is nothing new — “Lincoln inherited the Pony Express, and had to upgrade to the Telegraph”.

We only get this range of perspectives by having a range of people. Diversity leads to performance.

2. Diversity of Data

It’s something of a truism that to improve the performance of a Machine Learning (ML) model, you don’t need a sharper algorithm, you need more data. This is not always true, and as we’ll see below, different modelling techniques can bring material step-changes in performance. However, our experience is that bringing not just more data but, critically, more diverse data can lead to real breakthroughs in insight and model performance.

For example, we were working with a large pharmaceutical company on using machine learning to predict which of 3,000–4,000 clinical trials were likely to have patient safety issues. We had ingested all the big, structured datasets: finance, operations, HR and so on. We had plenty of data, but the models weren’t meaningfully predictive. By engaging with the humans running the clinical trials, we identified new sources of data that were “latent” in the system, for instance the typed reports that visitors to a site make, which were then backed up as PDFs.

The text from these reports (geo-stamped and time-stamped, see picture below), turned into new ML features, was one of a number of additional data sources that made the models accurate enough to be practical to use. In production, the models have increased the productivity of the patient safety teams by over 4X (vs. being randomly assigned, as they were previously).

*Text fragments extracted from PDFs to form features in ML models in Pharma (patient safety)*
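As a rough illustration of what turning report text into model features can look like, here is a minimal Python sketch. It assumes the PDF text has already been extracted to a plain string. The watch-list of terms, the function name `report_features` and the example report are all invented for illustration; the article does not say which features were actually built.

```python
import re
from collections import Counter
from typing import Dict, Sequence

# Hypothetical watch-list: the terms actually used are not stated in the article.
WATCH_TERMS = ("missing", "delay", "deviation")


def report_features(text: str, terms: Sequence[str] = WATCH_TERMS) -> Dict[str, int]:
    """Turn one free-text report into a small dictionary of numeric features."""
    # Lower-case the text and keep alphabetic tokens only.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    # One count feature per watched term, plus the overall report length.
    features = {f"count_{term}": counts[term] for term in terms}
    features["n_tokens"] = len(tokens)
    return features


# Invented example text, purely for illustration.
example_report = "Consent forms missing for two patients. Missing signatures caused a delay."
example_features = report_features(example_report)
```

Simple counts like these can sit alongside the structured columns in a training table; richer text representations would follow the same pattern of one row of numbers per report.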

So we constantly challenge ourselves to bring more diverse, more granular data to bear on a problem: structured AND unstructured; machine AND human; detailed AND meta. Diversity leads to performance.

3. Diversity of Models

In the last section we touched on how just “sharpening” an algorithm or model rarely leads to step changes in performance in and of itself. What we do find leads to performance (by which we mean something like “useful applicability in the real world to take action or generate value”, not just “area under the curve”) is using a diverse range of models and generating a compound output from that ensemble approach.

We use a lot of modelling techniques, as per the graphic below:

Applying them in the real world, on messy, often unstructured datasets, means we have to squeeze every ounce of signal out of the noise we are confronted with. For example, we worked with a telecoms infrastructure player in India to predict failures of cell towers that would lead to deterioration in user experience.

At its most basic, this might be an alert that says the aircon unit has blown, which means that we know three hours later (given temperature conditions), the main CPU in the cell tower will overheat and it’ll stop functioning. No ML needed there, just expert knowledge and digital tech. At its most advanced, hundreds or even thousands of patterns or features in the data streams coming off all the sensors could tell us what’s likely to happen next. If we can map those patterns to previously recorded failures, and use natural language processing to analyse the previous maintenance log entry, we can also get to a stab at the likely root cause.

For the models to be useful, they had to predict issues far enough in advance to get an engineer to the site in time to pre-empt the outage by fixing the imminent issue. If we can also predict the root cause, that helps speed up the fix. The challenge is that the further in advance we predict, the less accurate the predictions are: predicting a failure one minute before it happens is easy; predicting it a day in advance is near-impossible. We settled on four hours as a ‘sweet spot’ that maximised predictive power while allowing an engineer time to deploy.

The difficulty was that no single model could reliably predict at the four-hour window, no matter how many data sources we threw at it. So we ended up running three different models in parallel, each making a prediction every 10 minutes. Each individual model would “turn red” (i.e., predict failure) if two of its last three predictions were red. The overall alarm would sound if two of the three models had “turned red”. That seemed to work! Phew.

*Ensemble model output flagging warning (line) four hours ahead of outage (thick bar)*
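The two-level voting rule described above (two of the last three predictions per model, then two of three models) can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not the production system: the class names, the model names and the sample predictions are all assumptions.

```python
from collections import deque
from typing import Deque, Dict, Iterable


class ModelVoter:
    """Tracks one model's recent predictions and applies the 2-of-3 rule."""

    def __init__(self, window: int = 3, threshold: int = 2) -> None:
        self.threshold = threshold
        # deque(maxlen=...) silently drops the oldest prediction when full.
        self.recent: Deque[bool] = deque(maxlen=window)

    def update(self, predicts_failure: bool) -> bool:
        """Record the latest prediction and report whether the model is 'red'."""
        self.recent.append(predicts_failure)
        return sum(self.recent) >= self.threshold


class EnsembleAlarm:
    """Sounds when enough individual models have 'turned red'."""

    def __init__(self, model_names: Iterable[str], models_needed: int = 2) -> None:
        self.voters = {name: ModelVoter() for name in model_names}
        self.models_needed = models_needed

    def update(self, predictions: Dict[str, bool]) -> bool:
        """Feed one round of predictions (one per model); return the alarm state."""
        red_models = sum(
            voter.update(predictions[name]) for name, voter in self.voters.items()
        )
        return red_models >= self.models_needed


# Hypothetical model names and predictions, one dict per 10-minute round.
alarm = EnsembleAlarm(["model_a", "model_b", "model_c"])
rounds = [
    {"model_a": True, "model_b": False, "model_c": False},
    {"model_a": True, "model_b": True, "model_c": False},
    {"model_a": False, "model_b": True, "model_c": True},
]
history = [alarm.update(r) for r in rounds]
print(history)  # [False, False, True]
```

The alarm only sounds in the third round, once two models each have two red predictions in their last three. A single red prediction from one model never triggers it, which is the point of the rule: isolated false positives are filtered out at both levels.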

A final point on model diversity is that there is also an important spectrum of complex to simple models, where we’re often trading off accuracy against explicability. In banking, say, we might make the most accurate predictions, on fraud or creditworthiness, using a Deep Learning model. But we might be unable to explain the outcome to the regulator despite our best efforts with LIME (a model explanation technique). So we might also run a Random Forest model where the importance of each feature is more transparent. And if we’re optimising for speed or we need a fallback model in case of failure of the more advanced model, we might also run a multi-variate regression model. This generates both more comfort with what’s happening and more robustness in operation: resilient AI.
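One way to picture the fallback idea is a chain of model tiers tried in order, most advanced first. The Python sketch below assumes each model is just a callable that takes a feature vector and returns a score; the three stand-in functions and their outputs are invented for illustration and do not represent real models.

```python
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[Sequence[float]], float]


def predict_with_fallback(
    features: Sequence[float], tiers: List[Tuple[str, Predictor]]
) -> Tuple[str, float]:
    """Try each model tier in order; return the first successful (name, score)."""
    errors = []
    for name, predictor in tiers:
        try:
            return name, predictor(features)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all model tiers failed: " + "; ".join(errors))


# Hypothetical stand-ins for the three tiers mentioned in the text.
def deep_model(features: Sequence[float]) -> float:
    raise TimeoutError("deep model unavailable")  # simulate an outage


def forest_model(features: Sequence[float]) -> float:
    return 0.75  # placeholder score


def regression_model(features: Sequence[float]) -> float:
    return 0.5 * features[0] + 0.25 * features[1]  # made-up coefficients


tiers = [
    ("deep_learning", deep_model),
    ("random_forest", forest_model),
    ("regression", regression_model),
]
used_model, score = predict_with_fallback([1.0, 2.0], tiers)
print(used_model, score)  # random_forest 0.75
```

Returning the name of the tier that answered, alongside the score, keeps a record of which model produced each decision, which matters when the outcome later has to be explained.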

So we can see that often in order to actually generate operationally useful outputs, AI models need to be built on ensembles of ensembles of ensembles, and need to be able to be explainable and robust as well as driving accuracy. Diversity leads to performance.

4. Diversity of Mindset

A final thought, and maybe the hardest to quantify, is around diversity of mindset. Ray Dalio, in his thought-provoking book Principles, makes a powerful case for “spending time with really smart people who disagree with you: that’s the best way I’ve found to reduce the likelihood that you’ll make bad decisions”. That’s very much our approach at QuantumBlack as well. We don’t live-score our colleagues during meetings, but we do push hard on…

· Diversity of academic background, where our main disciplines are data science, data engineering, software engineering, UX design, information design and strategy.

· Diversity of nationalities, where at last count we had over 50 nationalities represented among our 350 people.

· Diversity of personality types, ranging across whatever frameworks we can use. We like Myers-Briggs, because we think the difference between e.g., “N” big picture thinking and “S” bottom up thinking is a very healthy debate, and because we recognise the need to accommodate both “E” extroverted thinkers and “I” introverted thinkers. But we also see value in the differing perspectives that simpler differences like optimists and pessimists bring to a problem (one of our leadership team has been nicknamed “Eeyore”, another “Tigger” — that’s another healthy element to a debate: either one of those perspectives taken alone is likely to lead to bad decisions over time).

So the step-back point here is obviously: diversity is not a “nice to have” topic, nor a CSR agenda. It’s a fundamental building block in successfully developing and deploying AI in the real world. If we want resilient AI, if we want performant AI, we need diversity. In all its forms.

Written by QuantumBlack, AI by McKinsey