Real World Data Science Interview Assignment for Torqata

Ted Petrou
Dunder Data
19 min read · Jun 10, 2021

This post contains my work for a data science interview I completed for Torqata, a data analytics company. The Jupyter Notebook and data are located in this repository.

The objective is to determine whether a transaction is fraudulent. Take note of the slow and methodical pace. The vast majority of the analysis is done without any machine learning; in fact, a conclusion can be reached without it.

Executive Summary

A simple model that flags every transaction that occurred in one second, along with those that had a repeated device ID and a price above 29 dollars, appears to capture most of the potential value. Further investigation is needed to determine whether other models could find more signal in the data and flag additional transactions as fraud.
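To make the rule concrete, here is a minimal sketch of how such a flag might be computed with pandas. It is an illustration under stated assumptions, not the notebook's actual code: the column names (`signup_time`, `purchase_time`, `device_id`, `purchase_value`) and the toy rows are hypothetical, and "occurred in one second" is read here as a purchase made exactly one second after signup.

```python
import pandas as pd

# Toy data; column names and values are assumptions for illustration only.
df = pd.DataFrame({
    "signup_time": pd.to_datetime([
        "2015-01-01 10:00:00", "2015-02-03 08:30:00",
        "2015-02-10 14:45:00", "2015-03-05 09:00:00",
    ]),
    "purchase_time": pd.to_datetime([
        "2015-01-01 10:00:01", "2015-03-01 12:15:00",
        "2015-02-20 09:05:00", "2015-04-01 18:20:00",
    ]),
    "device_id": ["A", "B", "B", "C"],
    "purchase_value": [10, 35, 20, 50],
})

# Rule 1 (assumed reading): purchase happened one second after signup.
elapsed = (df["purchase_time"] - df["signup_time"]).dt.total_seconds()
one_second = elapsed == 1

# Rule 2: device ID appears more than once AND price is above 29 dollars.
repeated_device = df["device_id"].duplicated(keep=False)
pricey_repeat = repeated_device & (df["purchase_value"] > 29)

# A transaction is flagged if either rule fires.
df["flag"] = one_second | pricey_repeat
print(df[["device_id", "purchase_value", "flag"]])
```

Using `duplicated(keep=False)` marks every row that shares a device ID, not just the later occurrences, which seems closest to "had a repeated device ID".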

Read in data

On read, several columns are converted to a different type. The two time columns are converted to datetime. The source, browser, and sex columns are converted to categorical, as they have very low cardinality. Some memory is saved by using an 8-bit integer for the class column. The first column in the CSV is unnamed, contains unique integers, and is not read.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Plot styling: higher-resolution figures with smaller tick and axis labels
sns.set_theme(rc={'figure.dpi': 144,
                  'ytick.labelsize': 7,
                  'axes.labelsize': 8})
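The type conversions described above could be expressed in a single `pd.read_csv` call. The sketch below is one possible way to do it, not necessarily how the notebook does it: the in-memory CSV, its values, and the time-column names `signup_time` and `purchase_time` are assumptions made so the example runs on its own.

```python
import io
import pandas as pd

# Tiny stand-in for the real file; names of the two time columns and all
# values are hypothetical. Note the unnamed first column of unique integers.
csv_text = """,signup_time,purchase_time,source,browser,sex,class
0,2015-01-01 10:00:00,2015-01-01 10:00:01,SEO,Chrome,M,1
1,2015-02-03 08:30:00,2015-03-01 12:15:00,Ads,Safari,F,0
2,2015-02-10 14:45:00,2015-02-20 09:05:00,SEO,Chrome,F,0
"""

# Read only the header row to get the column names, then drop the first one.
header = pd.read_csv(io.StringIO(csv_text), nrows=0)
keep = list(header.columns[1:])

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=keep,                                  # skip the unnamed column
    parse_dates=["signup_time", "purchase_time"],  # time columns -> datetime
    dtype={
        "source": "category",   # low-cardinality columns -> categorical
        "browser": "category",
        "sex": "category",
        "class": "int8",        # 8-bit integer saves memory
    },
)
print(df.dtypes)
```

With a real file, the `io.StringIO(csv_text)` arguments would be replaced by the path to the CSV.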


Author of Master Data Analysis with Python and Founder of Dunder Data