Credit Card Fraud Detection — Part I

Published in Analytics Vidhya · 4 min read · Feb 6, 2021


Credit Card Fraud Detection is an online challenge on Kaggle where we aim to predict whether a transaction is fraudulent or not. I’ve divided this article into two parts: Part 1 has information about the dataset and the Exploratory Data Analysis, and Part 2 deals with data imbalance and a comparison of various classification models.

About the Dataset

We’re given features V1, V2, …, V28, which are the principal components obtained with PCA. The only features that have not been transformed with PCA are ‘Time’ and ‘Amount’. The feature ‘Time’ contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature ‘Amount’ is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. The feature ‘Class’ is a binary variable that takes the value 1 in case of a fraudulent transaction and 0 otherwise. The competition link for the same is given below.

Exploratory Data Analysis

First, we’ll import all the dependencies needed for Exploratory Data Analysis, in which we’ll analyze the dataset to find significant patterns, deal with missing and duplicate data, and examine heatmaps and distributions.

Reading the given dataset using pandas:
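The code for this step isn’t reproduced here, so below is a minimal sketch of what it might look like; the file path `creditcard.csv` and the helper name `load_dataset` are my own choices, not from the original post:

```python
import pandas as pd

def load_dataset(path):
    """Read the credit-card CSV into a DataFrame and preview it.

    `path` is assumed to point at the Kaggle file (usually creditcard.csv).
    """
    df = pd.read_csv(path)
    print(df.head())  # first 5 rows: Time, V1..V28, Amount, Class
    return df
```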

Fig. Output for the first 5 rows of the dataset.

Now, we’ll see if there is any missing data in the dataset.
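A sketch of the missing-data check, written as a small helper (the function name is mine, not the post’s); a total of 0 means there is nothing to impute or drop:

```python
import pandas as pd

def count_missing(df):
    """Print the per-column null counts and return the total number of missing cells."""
    per_column = df.isnull().sum()
    print(per_column)
    return int(per_column.sum())
```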

No missing values in the dataset

As we can see, there are no null values that we have to deal with in the dataset, so we proceed further.

The dataset might contain some duplicates, which we are going to check for and remove. To see this, we’ll look at the shape of the data before and after removing the duplicates.
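A minimal sketch of the deduplication step (the helper name is my own); it prints the shape before and after so the removed-row count is easy to read off:

```python
import pandas as pd

def drop_duplicate_rows(df):
    """Report the shapes before and after deduplication; return the deduplicated frame."""
    print("before:", df.shape)
    deduped = df.drop_duplicates()
    print("after:", deduped.shape)
    print("removed:", len(df) - len(deduped), "duplicate rows")
    return deduped
```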

OUTPUT:(284807, 31)

OUTPUT:(283726, 31)

As we can see, the shape of the data has changed after removing duplicates: 1081 rows containing duplicated data have been deleted.

Transaction Time Distribution for the dataset

There are two peaks in the graph. The dataset covers 2 days, so the two peaks correspond to the time in each day when the maximum number of transactions happens, and the troughs correspond to night time, when people are hardly making any transactions.
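A sketch of how such a histogram could be produced (the function name, bin count, and the `Agg` backend line for non-interactive use are my assumptions; drop the backend line in a notebook):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

def plot_time_distribution(df, bins=48):
    """Histogram of the Time feature (seconds since the first transaction)."""
    fig, ax = plt.subplots()
    ax.hist(df["Time"], bins=bins)
    ax.set_xlabel("Seconds since first transaction")
    ax.set_ylabel("Number of transactions")
    ax.set_title("Transaction Time Distribution")
    return ax
```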

Huge Imbalance between two classes.

We can see that we have a huge class imbalance here. We’ll discuss the issues associated with it, and how we’ll deal with them, in Part 2 of this blog series.
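A quick way to quantify that imbalance is to count the rows per class; this helper (the name is mine) returns the fraud fraction:

```python
import pandas as pd

def class_balance(df):
    """Print counts per class and return the fraction of fraudulent rows."""
    counts = df["Class"].value_counts()
    frac = counts.get(1, 0) / len(df)
    print(counts)
    print(f"fraud fraction: {frac:.4%}")
    return frac
```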

Now we’re going to plot the time-distribution graphs of fraud and non-fraud transactions and observe whether we find any patterns.
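A possible sketch of this per-class comparison, stacking the two histograms on a shared time axis (the function name and bin count are my assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

def plot_time_by_class(df, bins=48):
    """Stacked time histograms for fraud (Class == 1) and non-fraud rows."""
    fig, axes = plt.subplots(2, 1, sharex=True)
    for ax, cls, title in zip(axes, (1, 0), ("Fraud", "Non-Fraud")):
        ax.hist(df.loc[df["Class"] == cls, "Time"], bins=bins)
        ax.set_title(title)
    axes[-1].set_xlabel("Seconds since first transaction")
    return axes
```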

Distributions of Fraud and Non-Fraud Transactions

We don’t observe any significant patterns, so we’ll move ahead.

Heatmap for Fraud and Non-Fraud transactions
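One way such heatmaps could be drawn is with the feature correlations of each subset; this sketch uses plain matplotlib rather than whatever plotting library the original used, and the function name is mine:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

def plot_corr_heatmaps(df):
    """Side-by-side correlation heatmaps for the non-fraud and fraud subsets."""
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    for ax, cls, title in zip(axes, (0, 1), ("Non-Fraud", "Fraud")):
        corr = df[df["Class"] == cls].corr()
        im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
        ax.set_title(title)
    fig.colorbar(im, ax=list(axes))
    return axes
```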

It is a better idea to scale the features before using the dataset so that all the values fall within a similar range. This is important so that less significant features do not end up dominating the more significant ones simply because of their larger range.

E.g., in some datasets the column Salary might be in lakhs/crores while the column Age stays under 100. This would lead the Salary column to dominate the prediction even though it might be less significant. For this reason, different types of scaling are used: log, standardization, and normalization. We’ll decide which of these to choose depending on our dataset.

Log scaling is a technique applied when a variable spans several orders of magnitude.

Standardization is a scaling technique in which values are centered around the mean with a unit standard deviation. This means the mean of the attribute becomes zero and the resulting distribution has a standard deviation of one.

Normalization (Min-Max Scaling) is a scaling technique in which values are shifted and are then rescaled so that they end up ranging between 0 and 1.
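The three scalings of the Amount column can be sketched directly with numpy/pandas (the formulas mirror what sklearn’s StandardScaler and MinMaxScaler would do; the helper and column names are my own):

```python
import numpy as np
import pandas as pd

def scale_amount(df):
    """Return log-, standard-, and min-max-scaled versions of the Amount column."""
    a = df["Amount"].astype(float)
    out = pd.DataFrame(index=df.index)
    out["log"] = np.log1p(a)                              # log1p copes with zero amounts
    out["standard"] = (a - a.mean()) / a.std()            # mean 0, unit std
    out["minmax"] = (a - a.min()) / (a.max() - a.min())   # rescaled to [0, 1]
    return out
```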

We’re going to compare which scaling technique suits our dataset best, so we’ll make box plots for the comparison.
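A sketch of the comparison plot, one box plot per scaling split by class (the function name and the `scaled` frame with `log`/`standard`/`minmax` columns are my assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

def boxplot_scaled_amounts(df, scaled):
    """One box plot per scaling technique, with one box each for Class 0 and Class 1."""
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    for ax, col in zip(axes, ("log", "standard", "minmax")):
        data = [scaled.loc[df["Class"] == c, col] for c in (0, 1)]
        ax.boxplot(data)
        ax.set_xticklabels(["0", "1"])
        ax.set_title(col)
    return axes
```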

Box plots for Class vs Scaled Amounts

The minimum difference between the 0 and 1 classes can be seen with log scaling; the rest show a huge difference in amounts between the two classes. Thus, we’ll go further with log scaling.

Link for part-2:


