Fighting Fraud With AWS Machine Learning

I work on the analytics team at Minibar Delivery. We’re an on-demand e-commerce platform for alcohol, which makes us particularly vulnerable to fraudsters. In this post, I will talk about how we used the AWS Machine Learning service to cut our fraud rate in half and save thousands of dollars per month.

Some Context

We partner with local liquor stores to provide consumers with on-demand delivery of alcohol. Customers enter their address, are matched with a liquor store, and then shop that store’s inventory. We’re in 30+ markets across the U.S. and receive thousands of orders per night across the platform.

The nature of our business model and the scale at which we operate make it particularly difficult to fight fraud. Before using ML, we relied on a manual process to catch fraudsters: a point system triggered a manual check by someone on our Customer Service team, who would look the customer up on Facebook and other social sites to verify that the person was real. This system was ineffective and time-consuming, and it was costing us a ton of money.

AWS Machine Learning

If you’re using AWS, you have a ton of services available to you, Machine Learning being one of them. It takes minimal coding to get up and running, and because it trains on your own data, the model is tailored to your business. I’ve outlined the steps below.

1. Preparation

ML works by finding patterns between variables in a data set, so most of the prep work is deciding which variables you want to test.

If: fraud = A·x1 + B·x2 + C·x3

You must determine what the variables x1, x2, and x3 are (order size, for example) so that ML can determine A, B, and C (the weight of each variable).
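To make the idea concrete, here is a minimal Python sketch of a weighted fraud score: each variable is multiplied by a weight, the products are summed, and a learned intercept is added. Because the service reports a score between 0 and 1, the sketch passes the sum through a logistic function, which is how logistic regression (the technique the Amazon ML documentation describes for binary models) turns a weighted sum into a score. The variable names, weights, and intercept below are hypothetical illustrations, not our actual model.

```python
import math

# Hypothetical weights that training would learn; the values are made up.
WEIGHTS = {"order_total": 0.02, "is_late_night": 1.5, "new_customer": 0.8}
BIAS = -3.0  # hypothetical intercept, also learned during training


def fraud_score(order: dict) -> float:
    """Weighted sum of the variables plus the intercept, squashed into (0, 1)."""
    z = BIAS + sum(weight * order[name] for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic function


order = {"order_total": 50.0, "is_late_night": 1, "new_customer": 1}
print(round(fraud_score(order), 3))  # → 0.574
```

Training is the search for the weights and intercept that make these scores line up best with the orders historically labeled as fraud.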

Coming up with variables took a lot of work and required input from multiple teams. In the end, we came up with 34 variables to include in our model, such as order size, zip code, and time of day, which we hypothesized could be indicators of a fraudulent order.

2. Creating a data source

Once we determined the variables we wanted to test, we were ready to use AWS’s Machine Learning service. It’s super simple to use. We connected it to our data warehouse and then wrote a SQL query to pull our historical orders with the variables we determined in Step 1.
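The shape of that query can be sketched in Python with an in-memory SQLite database standing in for the warehouse. The table name, column names, and the two sample rows are hypothetical; the point is that the query returns one row per historical order, with each input variable as a column alongside the target.

```python
import sqlite3

# In-memory SQLite stands in for the data warehouse; names and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        total REAL,
        zip_code TEXT,
        created_at TEXT,   -- timestamp as 'YYYY-MM-DD HH:MM:SS'
        is_fraud INTEGER   -- 1 if the order was later confirmed fraudulent
    )"""
)
conn.executemany(
    "INSERT INTO orders (total, zip_code, created_at, is_fraud) VALUES (?, ?, ?, ?)",
    [
        (42.50, "10001", "2016-03-04 21:15:00", 0),
        (310.00, "94103", "2016-03-05 02:40:00", 1),
    ],
)

# One row per historical order: the input variables plus the target column.
TRAINING_QUERY = """
SELECT
    total AS order_total,
    zip_code,
    CAST(strftime('%H', created_at) AS INTEGER) AS hour_of_day,
    is_fraud AS target
FROM orders
ORDER BY id
"""

rows = conn.execute(TRAINING_QUERY).fetchall()
for row in rows:
    print(row)
```

Deriving features such as hour of day inside the query keeps the exported data source ready to use without a separate preprocessing step.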

Note:

  • Make sure you include the variable you are trying to predict (the target)
  • Try to normalize data whenever possible. For example, if you’re using phone numbers, remove all formatting characters (dashes, parentheses, spaces) so every number is stored the same way
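As an example of that kind of normalization, here is a small Python helper that reduces phone numbers to bare digits so the same number always produces the same value. It assumes US numbers; the function name is hypothetical.

```python
import re


def normalize_phone(raw: str) -> str:
    """Reduce a US phone number to its 10 digits so equal numbers compare equal."""
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, parentheses, dots, '+'
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # strip the leading US country code
    return digits


for raw in ["(555) 123-4567", "555.123.4567", "+1 555 123 4567"]:
    print(normalize_phone(raw))  # each prints 5551234567
```

Without this step, the three inputs above would look like three different values to the model.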

3. Determine a schema and target

Here you tell AWS the format of each variable (binary, categorical, numerical, text) and select a target. In our case, we created a binary variable, “Fraud?”, and set it as our target.
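For reference, Amazon ML describes a data source with a JSON schema. The sketch below builds one in Python; the key names follow the schema format in the Amazon ML Developer Guide as best I recall it (double-check them there), and the attribute names are hypothetical stand-ins for real columns.

```python
import json

# Attribute names are hypothetical; the types are Amazon ML's
# NUMERIC, CATEGORICAL, TEXT, and BINARY.
schema = {
    "version": "1.0",
    "targetAttributeName": "is_fraud",
    "dataFormat": "CSV",
    "dataFileContainsHeader": True,
    "attributes": [
        {"attributeName": "order_total", "attributeType": "NUMERIC"},
        # Zip codes are labels, not quantities, so treat them as categories.
        {"attributeName": "zip_code", "attributeType": "CATEGORICAL"},
        {"attributeName": "hour_of_day", "attributeType": "CATEGORICAL"},
        {"attributeName": "is_fraud", "attributeType": "BINARY"},
    ],
}

print(json.dumps(schema, indent=2))
```

Declaring zip code as categorical matters: as a number, 94103 would be treated as "bigger" than 10001, which means nothing for fraud.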

4. Creating an ML Model

Now that we had the historical data, AWS could apply its ML techniques to find correlations.

If you have ML experience, you can choose to create a custom recipe. We used the default recipe and it worked great.

AWS uses 70% of the data to train the model and the other 30% to evaluate it. You will generally want your data randomized rather than ordered by date, so that business trends over time don’t skew the model.
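One way to randomize the data yourself before handing it over is a seeded shuffle followed by a 70/30 cut, sketched below in Python. The function name and seed are illustrative choices. With date-ordered data, a plain sequential cut would train only on older orders and evaluate only on the newest ones.

```python
import random


def shuffled_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows, then split into training and evaluation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for 100 orders sorted by date (0 = oldest, 99 = newest).
orders_by_date = list(range(100))
train, evaluate = shuffled_split(orders_by_date)
print(len(train), len(evaluate))  # → 70 30
```

Fixing the seed means retraining on the same data yields the same split, which makes it easier to compare one version of the model against the next.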

5. Evaluation

Now that the model is created, we can see how it performs on the evaluation data. In our case, we are predicting a binary outcome (Fraud = True or False), so AWS outputs a score between 0 (definitely not fraud) and 1 (definitely fraud).

We set a threshold of .35, so any order with a score above .35 is flagged as potential fraud. There are tradeoffs to explore when setting your threshold: a lower threshold catches more true positives at the expense of more false positives, and vice versa.
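The tradeoff is easy to see by counting outcomes at a few thresholds. This Python sketch uses made-up evaluation scores and labels, purely for illustration; an order is flagged when its score is above the threshold.

```python
def confusion_counts(scores, labels, threshold):
    """Count outcomes when every score above `threshold` is flagged as fraud."""
    tp = fp = fn = tn = 0
    for score, is_fraud in zip(scores, labels):
        flagged = score > threshold
        if flagged and is_fraud:
            tp += 1  # fraud caught
        elif flagged:
            fp += 1  # legitimate order sent to manual review
        elif is_fraud:
            fn += 1  # fraud missed
        else:
            tn += 1  # legitimate order passed through
    return tp, fp, fn, tn


# Made-up evaluation scores and true labels (1 = fraud), for illustration only.
scores = [0.05, 0.20, 0.30, 0.40, 0.55, 0.70, 0.90, 0.95]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

for threshold in (0.25, 0.35, 0.60):
    tp, fp, fn, tn = confusion_counts(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  TP={tp} FP={fp} FN={fn} TN={tn}")
```

On this toy data, dropping the threshold from 0.35 to 0.25 catches one more fraudulent order, while raising it to 0.60 removes two false positives. Each false positive costs a manual check, and each false negative costs a fraudulent order, so the right threshold depends on which of those is more expensive for your business.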

It took some refinement, including removing and adding variables, but once we had a model we were happy with, we set up a real-time endpoint so that every new order is fed into it. If an order scores higher than our .35 threshold, a manual check is triggered and someone from our Customer Service team decides whether to cancel the order.
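The routing step can be sketched in Python as follows. `score_order` is a hypothetical callable standing in for the endpoint call. With boto3, it would wrap the `predict` operation of the `machinelearning` client, which takes the model ID, the record (as string values), and the endpoint URL; check the API reference for how the score appears in the response. A stub scorer keeps the sketch runnable offline.

```python
from typing import Callable, Dict

FRAUD_THRESHOLD = 0.35  # orders scoring above this go to manual review


def route_order(order: Dict[str, str], score_order: Callable[[Dict[str, str]], float]) -> str:
    """Return 'manual_review' if the fraud score exceeds the threshold."""
    score = score_order(order)  # in production, a call to the real-time endpoint
    return "manual_review" if score > FRAUD_THRESHOLD else "proceed"


# Hypothetical stub scorer for local testing only.
def fake_scorer(order: Dict[str, str]) -> float:
    return 0.8 if order.get("hour_of_day") == "3" else 0.1


print(route_order({"hour_of_day": "3"}, fake_scorer))   # → manual_review
print(route_order({"hour_of_day": "20"}, fake_scorer))  # → proceed
```

Passing the scorer in as a parameter keeps the threshold logic testable without a live endpoint.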

Limitations

Unfortunately, the model is static, so it doesn’t get better with each new row of data fed into it. Because of this, we regularly create a new model that includes the orders placed since the previous version was trained.

— — —

I hope this was helpful. I’m happy to answer any questions, and I would love to hear how else people are using AWS ML to solve business problems!

Twitter: @rtbrennan1