These days, collecting data is easy; maybe too easy. That’s why more and more firms are turning to Enigma CEO Hicham Oudghiri to make sense of it.
By Dan Costa
Hicham Oudghiri is co-founder and CEO of Enigma. I first met Hicham years ago, when Enigma was mainly focused on collecting public data sets. Now it has expanded into private data sets, and today, he’s using all that data to do some extraordinary things, including combatting human trafficking.
If you want to understand the potential of Big Data both for good and ill, Hicham is a perfect guide. Here’s what he had to say on stage at the Techonomy Conference in New York City.
Dan Costa: I want to start with a quote from a Forbes article: ‘To date, Enigma has synthesized 100,000 data sets in more than 100 countries, organized intelligence for 30 million small businesses, and accumulated 140 billion points of data on the US population.’ So what are you doing with all that?
Hicham Oudghiri: So, lots of good stuff, hopefully. Let me step back and give you the big, overarching premise for us. I think a dirty secret in the data industry is that most of the advanced AI work, all of the buzz around machine learning, and most of what people are doing with Big Data is about understanding how people behave online. The success stories we've heard come from the Googles, the Facebooks, and the Amazons of the world, essentially using very sophisticated math and data at scale to get you to click on things on the internet. That has made some amazing things possible, like the way we now communicate as human beings.
But when it comes to fundamentally changing how businesses work, be it drug safety or extending credit to a whole segment of the population that the big banks have left by the wayside because they don't understand them, putting data to work in the real world is quite difficult. So our goal and our mission has always been to collect a new kind of information and model how the real world operates, for a variety of use cases that I'm happy to get into. That's the big dividing line, and it's what we've been doing with all of it: trying to model how the actual, real world works and improve it wherever we can.
A lot of people have heard of Enigma Public, where you take public data sources, such as government sources, and make them available, and then you've been layering in more private data sets. Can you explain how that works?
So in this drive to understand how things work, we came to a point where we understood the symbiosis between what was available openly and publicly to everyone and what we could get by…buying data from folks who had spent good, hard time collecting it. The fundamental process for us is the same: Does this data have signal? Does it have quality? How was this data collected, from a lineage point of view? I mean, just because a data set is public doesn't mean we have unlimited rights to use it. You can't use property tax assessments for marketing, for example, and sometimes those regulations vary city by city.
So for us, it wasn't that much of a shift. But given the scale of the business and the questions we're trying to answer now, we're agnostic as to where the data comes from. The public data is the foundation we use to resolve entities, i.e., to merge very, very disparate data sets that sometimes don't speak to each other. Having that backbone, a reference spine of every business, every person, and so on, is what has gotten us to where we are in this regard.
Can you give us an example of a problem that you’ve solved for a financial institution, a bank, or a lender?
So we do a lot of compliance work. Basically: Is the person or company you're doing business with legitimate? That question is actually quite hard to answer. If you're a small business and you've tried to open a bank account, you've probably sat there annoyed, like, 'Why can't I give these people my money?' Half the time it's because the bank's processes are just really bad in that regard. But the other half of the time, there's some due diligence that needs to be done. Automating that due diligence, or at least the first 90 percent of it, so that you can let folks investigate the real bad guys, is something we've done quite well.
We've done this for American Express and helped them with their anti-money-laundering operations. We've done this with folks like BB&T, where we help score every client that comes into the bank. So think about a credit score, and then think about scoring someone on what I'd call a "shadiness factor," as it were.
Is that an official metric that you’ve calculated?
Yes, the shadiness factor. It takes all kinds of data in and helps the bank do business with the right people much, much faster.
So we’re going to get to the bad guys in just a second. What’s the strangest data set that you have, and what makes it useful?
One of my faves is the expense detail for various government agencies. Like, how much does the NYPD spend on bagels? [That's] something I can answer. [Another is] the voter registration data in the United States, which we use in a very different way. It's a public data set: everyone who's registered to vote, their address, all of this good stuff. It's quite hard to access, and it's quite hard to structure.
But it gives you the topography of where people live: how densely they're located relative to each other, how dense the population is. I think it's a better, or at least more granular, metric than the census. It gives you approximate counts, which lets us do all kinds of interesting things. For example, we help CPG [consumer packaged goods] companies place products like drinks and soups based on the profiles of where people live, their driving radius from businesses, and so on. There's a tremendous amount of waste in that system; that supply chain is not well understood, and voter registration data is like the chicken stock in our algorithmic recipe.
Interesting. But I imagine it leaves out all the people who don’t vote. How much of a problem has that been?
Well, it's not a problem, because we don't target person by person. Our use case is always probabilistic: What does the shape of this neighborhood look like? Is it a residential neighborhood? How clustered are people around shopping centers? What's the average drive time? All of these things go into the calculation. It's not like, 'Oh, your data set is incomplete, so I can't send these people a piece of marketing, or I can't use it to underwrite them.' That's not what we use it for. So in that sense, it does the job pretty well.
So obviously it’s sort of easy to understand the commercial applications of a lot of this data. Talk a little bit about STAT, Stand Together Against Trafficking, and the Polaris Project, and what you’re bringing to that effort.
This one was born out of what we've seen in the field. Just as in the rest of the economy, where folks want to found businesses and there's a revived sense of entrepreneurship, that spirit has carried over to the illegitimate part of the economy. You no longer have large mafia families controlling most of the crime, or maybe you do, but there's a massive proliferation of…call them young founders in the criminal space.
Shady entrepreneurs. We've noticed a pretty big uptick in human trafficking, which is not a well-understood concept. People are trafficked all the time. It could be for farm labor. It could be for sex trafficking. We started doing this work with the banks to help them catch these people because, one, they're obligated by regulation to do so, and two, there are massive liabilities, in terms of fraud and the like, when these folks transact on your network. Then we started to see patterns emerge that would help identify these folks in a more and more automated fashion.
One thing that's particularly hard at a bank is sharing information. I believe there are good reasons for that, from a privacy perspective and otherwise, but initially we tried to get the banks to share, like, 'Hey, we caught X, Y, and Z. Be on the lookout.' That turned out to be a compliance burden in and of itself. If one bank tipped off another, and that bank turned out to have those people in its system, it proved that its systems for catching them weren't effective enough. So we said, 'Okay, you don't need to share the target list, or share the data, but what if we … what if we got everyone to crowdsource the queries they ran against the external data and the internal data?'
So you’re not naming names, but you’re saying, ‘This is how you find people. We found people this way; you can find people the same way.’
Exactly. So the entire industry can say, 'This is the kind of activity at a nail salon that has resulted in multiple instances of human trafficking for us. We've noticed this with nail salons, or truck insurance companies, or whatever,' and that helps the banks categorize a whole swath of those businesses. We are relying on Polaris, on their expertise and their role as an NGO in raising awareness around this. We set out to build this crowdsourcing tool, and a bunch of the banks have jumped on it. It's in release with a couple of folks right now. It'll always be mostly private and closed within the banks, because we don't want the bad guys to catch on to the tools of the trade, but we're excited about it. I think it's a good step toward sharing information in an industry that's usually extremely averse to collaborating in this way.
How many partners have come on? Have you sensed any reluctance, or are they all like, ‘Yes, this is exactly what we’ve been waiting for?’
When we came up with this paradigm, the reluctance dropped pretty significantly. I think the main gating issue for them is, 'How do we operationalize this in our processes?' So we're finding extremely lightweight ways for them to do that. But we've had tons of bank partners; some even want to fund the project now.
I think coming up with ways for people to share all kinds of information is what's necessary for problems like human trafficking, where the target is constantly changing. No one person and no one institution is going to have the expertise required to follow a new pattern of criminal activity. And listen, a lot of people ask us, 'Well, why isn't the government doing this?' The reality of the matter is that the data sits in the banks. It sits in the banks, and in the kinds of databases that we have and that we productize. The government is a recipient of this: it gets the signal it needs, and then it sends boots on the ground. Olivia Benson from SVU shows up at your client's door and starts giving you problems.
There's a reason why there's a fruitful and necessary collaboration between the private and public parts of the economy here: the folks who live in that world have often moved back and forth between the two. There's a revolving door between compliance officers and the district attorney's office. It's kind of fun to see people motivated by that kind of passion as well.
So this is going to be up and running. It’s in beta now. It’ll probably launch in a couple of months?
It's up and running now. We have a restricted set of partners we're working with, mostly because we want to get it right, and we don't want to onboard the entire banking system at once for a pro bono project, a labor of love. But this summer we'll be doing a bigger round in this regard, and anyone's welcome to get in touch with us. You can email info@enigma.com if you're interested. We have a team of folks dedicated [to it] and plenty of collaboration with organizations like Polaris; that's why we decided to partner with people who have the expertise and could carry this together with us.
Originally published at https://www.pcmag.com on June 10, 2019.