Can Machine Learning Help in Detecting Credit Card Fraud?

Priya Chaurasiya
Aug 2, 2022


Let’s find out by building a credit card fraud detection model with a machine learning approach!

Motivation:

Payment card fraud losses reached $28.65 billion worldwide in 2019, according to the most recent Nilson Report data. The United States alone is responsible for more than a third of the total global loss, making it the most card fraud-prone country in the world.

Julie Conroy, a research director for Aite Group’s fraud and anti-money laundering practice, said, “Our estimate was that at the end of 2020, the U.S. was seeing about $11 billion worth of losses due to credit card fraud.”

The coronavirus pandemic is also fueling explosive growth in card fraud activity.

“What happens in every economic downturn is that the attacks start to become more successful,” warned Julie Fergerson, CEO of Merchant Risk Council. “So over the next two to three years, I fully expect credit card fraud numbers to increase in a pretty meaningful way.”

As we move towards a digital world, cybersecurity is becoming a crucial part of our lives. When we talk about security in digital life, the main challenge is detecting abnormal activity.

When making an online purchase, a good number of people prefer credit cards. The credit limit on a credit card sometimes lets us make purchases even when we don’t have the funds at that moment. On the other hand, these same features are misused by cyber attackers.

To tackle this problem, we need a system that can abort a transaction if something looks fishy.

In other words, we need a system that tracks the patterns of all transactions and aborts any transaction whose pattern is abnormal.

Today, we have many machine learning algorithms that can help us classify abnormal transactions. The only requirements are past data and a suitable algorithm that fits the data well.

In this article, I will walk you through the complete end-to-end model training process; by the end, you will have a model that can classify transactions as normal or abnormal.

Context:

It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase.

Content of the dataset:

The dataset contains transactions made by credit cards in September 2013 by European cardholders.
It presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, the original features and more background information about the data cannot be provided. Features V1, V2, … V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are ‘Time’ and ‘Amount’. ‘Time’ contains the seconds elapsed between each transaction and the first transaction in the dataset. ‘Amount’ is the transaction amount, which can be used for example-dependent cost-sensitive learning. ‘Class’ is the response variable: it takes value 1 in case of fraud and 0 otherwise.

Importing the Libraries:

It is good practice to import all the necessary libraries in one place, so that we can find and modify them quickly.
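The original code screenshots are not reproduced here, but the imports for this kind of workflow typically look like the following (a sketch; adjust to whatever your notebook actually uses):

```python
# Core libraries for data handling and modelling
import numpy as np
import pandas as pd

# scikit-learn pieces used later in this article
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
```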

Data Collection and Preprocessing:

Here we load the data into a pandas DataFrame and then check the first five rows.
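A minimal sketch of this step. The real notebook would read the Kaggle file (the `creditcard.csv` path is an assumption); here a tiny stand-in frame with the same column pattern is built inline so the snippet runs on its own:

```python
import pandas as pd

# In the real notebook:
# df = pd.read_csv("creditcard.csv")  # path is an assumption
# Stand-in frame mimicking the dataset's columns:
df = pd.DataFrame({
    "Time":   [0.0, 1.0, 2.0, 3.0],
    "V1":     [-1.36, 1.19, -1.36, -0.97],
    "Amount": [149.62, 2.69, 378.66, 123.50],
    "Class":  [0, 0, 1, 0],
})
print(df.head())  # inspect the first five rows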

Checking the missing values in the dataset:

As we can see, there are no missing values at all in the dataset, which is a plus for us. Otherwise we would have to handle them, either by dropping the affected rows or by replacing the missing entries with other values, a method known as imputation.
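The missing-value check, sketched on a toy frame (the real check would run on the full dataset):

```python
import pandas as pd

df = pd.DataFrame({"Amount": [10.0, 20.0, 30.0], "Class": [0, 1, 0]})
missing = df.isnull().sum()   # count of NaNs per column
print(missing)

# If any values were missing, options include dropping or imputing, e.g.:
# df = df.dropna()
# df["Amount"] = df["Amount"].fillna(df["Amount"].median())
```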

In the next step, we will check the number of occurrences of each value of our target variable, Class.

As we can see, the counts of the two classes differ enormously, which means our dataset is heavily imbalanced. We will handle this with a method such as undersampling, where we take a random sample of the majority class to balance the data, so that our model can be trained in a generalised way.

Here:

0 → Normal transaction

1 → Fraudulent transaction
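A sketch of the class-count check, using a tiny stand-in frame (the real dataset has 284,807 rows with 492 frauds):

```python
import pandas as pd

df = pd.DataFrame({"Class": [0] * 8 + [1]})   # toy stand-in for the full dataset
counts = df["Class"].value_counts()
print(counts)   # class 0 (normal) vastly outnumbers class 1 (fraud)
```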

In the next step, we separate the legitimate and fraudulent cases into separate variables to deal with them:

We will then observe the data more closely by looking at some statistical measures.
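The separation and summary statistics might be sketched like this, with a toy frame standing in for the full dataset (`legit` and `fraud` are assumed variable names):

```python
import pandas as pd

df = pd.DataFrame({
    "Amount": [10.0, 250.0, 5.0, 400.0, 20.0],
    "Class":  [0,    1,     0,   1,     0],
})

legit = df[df.Class == 0]   # legitimate transactions
fraud = df[df.Class == 1]   # fraudulent transactions

print(legit.Amount.describe())   # summary statistics for normal transactions
print(fraud.Amount.describe())   # summary statistics for fraudulent transactions
```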

Under-Sampling: Since the dataset is highly imbalanced, this is a technique to overcome that. We build a sample dataset containing a similar distribution of normal and fraudulent transactions.

Since the number of fraudulent transactions is 492, we build a sample of only 492 normal transactions as well.

All the fraudulent transactions are then concatenated with the sampled normal transactions using the concat function.
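The undersampling and concatenation, sketched on a toy frame (in the real data `n` would be 492; `legit_sample` and `balanced` are assumed names):

```python
import pandas as pd

df = pd.DataFrame({"Amount": range(10), "Class": [0] * 8 + [1] * 2})
legit = df[df.Class == 0]
fraud = df[df.Class == 1]

# Draw as many random normal rows as there are frauds
legit_sample = legit.sample(n=len(fraud), random_state=1)

# Stack the sampled normal rows and the fraud rows into one balanced frame
balanced = pd.concat([legit_sample, fraud], axis=0)
print(balanced["Class"].value_counts())   # both classes now equally represented
```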

Now we can see that our dataset is fully balanced!

By grouping the transactions by class and taking the mean, we can see that there is no huge difference between the values of normal and fraudulent transactions.
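The group-wise comparison might look like this (toy values; `balanced` stands in for the undersampled frame):

```python
import pandas as pd

balanced = pd.DataFrame({
    "Amount": [10.0, 20.0, 200.0, 300.0],
    "Class":  [0,    0,    1,     1],
})

# Mean of every feature, computed separately for each class
print(balanced.groupby("Class").mean())
```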

Splitting the data into features and targets:
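A sketch of this step, with `balanced` standing in for the undersampled DataFrame and `X`/`Y` as assumed variable names:

```python
import pandas as pd

balanced = pd.DataFrame({
    "Time":   [0, 1, 2, 3],
    "Amount": [1.0, 2.0, 3.0, 4.0],
    "Class":  [0, 0, 1, 1],
})

X = balanced.drop(columns="Class")   # features: everything except the label
Y = balanced["Class"]                # target: the fraud label
print(X.shape, Y.shape)
```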

Splitting Dataset into Training data and Testing Data:

In the next step, we will split our data into training data and testing data. The training data is used to train our model, and the test data is used to evaluate the trained model. By checking the accuracy score of the trained model on the test data, we can determine whether our model is properly trained, or whether it is overfitted or underfitted.
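The split, sketched on toy data. Passing `stratify=Y` (an assumption about the original notebook, but standard practice here) keeps the class ratio the same in both splits:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

X = pd.DataFrame({"Amount": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]})
Y = pd.Series([0, 0, 0, 0, 1, 1, 1, 1])

# 75% training data, 25% testing data, preserving the class proportions
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, stratify=Y, random_state=2)

print(X_train.shape, X_test.shape)
```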

Training Our Logistic Regression Model:

Now we will train our model using logistic regression, one of the most widely used models for binary classification problems.
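The fitting step might look like this; synthetic separable data stands in for the real training split so the sketch runs on its own:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy, linearly separable stand-in for the real training data
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))
Y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, Y_train)          # learn weights for the fraud/normal boundary
print(model.score(X_train, Y_train)) # training accuracy
```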

Model building on imbalanced data:

Metric selection for heavily imbalanced data:

As we have seen, the data is heavily imbalanced, with only 0.17% of transactions being fraudulent, so we should not consider accuracy a good measure for evaluating the model. A model that blindly returned the same class (1 or 0) for every data point, regardless of the input, would still achieve more than 99% accuracy.

Hence, we should measure the ROC-AUC score for a fair evaluation of the model. The ROC curve shows the strength of the model by evaluating its performance at all classification thresholds. The default threshold of 0.5 is not always the ideal one for labelling a test point. Because the ROC curve is measured at all thresholds, the best threshold is one at which the TPR is high and the FPR is low, i.e., misclassifications are few. After determining the optimal threshold, we can calculate the F1 score of the classifier to measure precision and recall at that threshold.
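The threshold-selection idea above can be sketched as follows, with hand-made scores standing in for model probabilities. Choosing the threshold that maximises TPR minus FPR (Youden's J statistic) is one common way to realise the "high TPR, low FPR" criterion:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve, f1_score

y_true  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
# Toy predicted fraud probabilities standing in for model.predict_proba output
y_score = np.array([0.1, 0.2, 0.35, 0.8, 0.4, 0.7, 0.85, 0.9])

auc = roc_auc_score(y_true, y_score)          # threshold-free summary of ranking quality
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Pick the threshold maximising TPR - FPR (Youden's J statistic)
best = thresholds[np.argmax(tpr - fpr)]

# F1 score of the labels produced at that threshold
y_pred = (y_score >= best).astype(int)
print(auc, best, f1_score(y_true, y_pred))
```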

Why was SVM not tried for model building, and why was Random Forest not tried in a few cases?

The dataset has 284,807 data points, and in the case of oversampling we would have even more. SVM is not very efficient with a large number of data points, because the kernel transformation takes a lot of computational power and resources. Performing k-fold cross-validation for hyperparameter tuning on top of that is very resource-hungry and time-consuming. Hence, due to the unavailability of the required resources and time, SVM was not tried.

For the same reason, Random Forest was not tried in a few of the hyperparameter-tuning runs for the oversampling technique.

Why was KNN not used for model building?

KNN is not memory-efficient: the model needs to store all the data points, so it becomes very slow as their number increases. It is also computationally heavy, because for a single query point the algorithm has to calculate the distance to every data point in order to find the nearest neighbours.

Model Evaluation:

In this last step, we calculate the accuracy of our trained model on both the training data and the testing data.
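A self-contained sketch of this evaluation, with synthetic data standing in for the real splits; comparing the two scores is what reveals overfitting (large train/test gap) or underfitting (both low):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Toy stand-in data: a linearly separable fraud/normal boundary
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
Y = (X[:, 0] - X[:, 1] > 0).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, stratify=Y, random_state=3)

model = LogisticRegression(max_iter=1000).fit(X_train, Y_train)
train_acc = accuracy_score(Y_train, model.predict(X_train))
test_acc = accuracy_score(Y_test, model.predict(X_test))
print(train_acc, test_acc)   # a large gap between the two would signal overfitting
```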

If the trained model doesn’t perform well on the test data, it is not a good model.

Here we can see that our model performs well on both the training data and the testing data, so it has been trained well to classify transactions as normal or fraudulent.

Hope you liked my approach to solving this problem statement.

Thank you for reading.

Have a nice day! :)
