Financial Transaction Fraud Detection

Logistic Regression, TensorFlow Keras, or XGBoost

Maziar Izadi
Mar 9 · 10 min read
  • Australia lost $574 million to fraudulent transactions in 2018, of which $487.5 million occurred through card-not-present channels.
  • In the same year, $24.26 billion was lost to payment card fraud worldwide.
  • In Fraud the Facts 2019, UK Finance reported that “unauthorised financial fraud losses across payment cards, remote banking and cheques totalled £844.8 million in 2018, an increase of 16 per cent 😱 compared to 2017”.

These figures show how important fraud detection capabilities are in the banking and financial sector. Unfortunately, financial institutions have been reluctant to switch to more advanced technologies such as machine-learning and deep-learning engines because of restrictions imposed by regulators, so banks have stuck to the original (and, fair to say, well-proven) rule-based systems.

Hopefully 🤞, thanks to recent advances in computing power and data availability, we’ll soon see this trend change.

Here, I compare the performance of three mathematically different tools in detecting fraudulent transactions. For that, I used the dataset provided by the Machine Learning Group - ULB as part of the Credit Card Fraud Detection data on Kaggle.

The dataset contains credit card transactions made by European cardholders over two days in September 2013, with 492 frauds out of 284,807 transactions.

The dataset is highly imbalanced: the positive class (frauds) accounts for 0.172% of all transactions. This extreme imbalance is the root of the complexity in this topic, and I explain how to overcome it further down.


Notes on the current article:

  • This was the first time I used Google Colab to write Python code. Honestly, I didn’t find anything wow about the experience compared with Jupyter Notebook; I used Colab mainly for TensorFlow.
  • Logistic Regression, tf.Keras, and XGBoost are used to predict fraudulent transactions, and the results are compared in terms of precision and recall.

The complete Python code is available on my GitHub.

Let’s jump into it 😏


1. Import data from Google Drive

Load the CSV file from my Google Drive and save it into a pandas DataFrame.

# Code to read csv file into Colaboratory:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

To import the CSV file from Google Drive, you need to enter its file ID. Look at the third method introduced here if you’d like to learn the details.

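import pandas as pd
# You need to enter your own file ID
file_id = '1grwIZR_LdcdyirULSoJ_VFhtuPpv00AB'
downloaded = drive.CreateFile({'id': file_id})
# Copy the Drive file into the Colab VM so pandas can read it
downloaded.GetContentFile('Filename.csv')
data = pd.read_csv('Filename.csv')
# Dataset is now stored in a Pandas Dataframe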

Our data has 284,807 records in 31 columns, of which 30 hold independent variables that, in theory, explain the changes in our dependent variable. In our scenario, the dependent variable (Class) is a binary column showing whether a transaction was fraudulent or genuine.

The data description (image below) shows that the columns span very different ranges. As a result, data normalisation is needed to bring the numeric columns onto a common scale.

Dataframe description

2. Scaling the data frame

I separated the dependent variable from the independent variables. Remember that normalisation is applied only to the independent variables.

X_data = data.iloc[:,0:30]
y_data = data.iloc[:,-1]

… and then used StandardScaler to normalise the independent variables.

from sklearn import preprocessing

standard_scaler = preprocessing.StandardScaler()
X_standard_scaled_df = standard_scaler.fit_transform(X_data)
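StandardScaler rescales every column to zero mean and unit variance, z = (x − mean) / std. A minimal sketch on a small made-up array (not the fraud data), checking that the manual formula matches scikit-learn’s output:

```python
import numpy as np
from sklearn import preprocessing

# Made-up toy data: two columns on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 900.0]])

# Manual standardisation; StandardScaler uses the population std (ddof=0)
z_manual = (X - X.mean(axis=0)) / X.std(axis=0)

z_sklearn = preprocessing.StandardScaler().fit_transform(X)

print(np.allclose(z_manual, z_sklearn))  # True
print(z_sklearn.mean(axis=0).round(6), z_sklearn.std(axis=0).round(6))
```

A common refinement is to fit the scaler on the training rows only and call transform on the test rows, so that test-set statistics do not leak into training.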

3. Feature Extraction

In ML you are always dealing with the bias-variance trade-off, and overcoming overfitting is a constant challenge. I tried a couple of methods and found that Principal Component Analysis (PCA) gave the best results in handling this trade-off.

Towards Data Science has a brief and effective explanation of PCA. If you are just starting out as a data scientist, I suggest you have a look… good stuff!

from sklearn.decomposition import PCA

# Make an instance of the Model (keep 10 principal components)
pca = PCA(n_components=10)
# fit and transform data frame in one jump
pca_selected = pca.fit_transform(X_standard_scaled_df)
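Under the hood, PCA centres the data and projects it onto the directions of largest variance, which are the right singular vectors of the centred matrix. A minimal sketch on random toy data (not the fraud data) comparing a NumPy SVD projection with scikit-learn’s PCA; components are only defined up to sign, so the comparison uses absolute values:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Toy data: 30 columns with very different variances, standing in for the 30 features
X = rng.normal(size=(500, 30)) * np.arange(1, 31)

k = 10
Xc = X - X.mean(axis=0)                    # 1. centre each column
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores_manual = Xc @ Vt[:k].T              # 2. project onto the top-k directions

pca = PCA(n_components=k, svd_solver="full")
scores_sklearn = pca.fit_transform(X)

# Same components up to a sign flip per column
print(np.allclose(np.abs(scores_manual), np.abs(scores_sklearn)))  # True
# Share of the total variance kept by each of the 10 components
print(pca.explained_variance_ratio_.round(3))
```

Looking at explained_variance_ratio_ (or its cumulative sum) is one way to judge whether 10 components keep enough of the original information.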

The result of PCA is interesting! Since I asked for 10 components, the function returns 10 new features. I converted the result into a pandas DataFrame and looked at the first 5 rows.

data set first 5 rows
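The split below works on a DataFrame called ready_data that holds the 10 components plus the Class label as its last column. A minimal sketch of how it could be assembled; the tiny stand-in arrays and the PC1…PC10 column names are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for pca_selected (n_rows x 10) and y_data (the Class column)
rng = np.random.default_rng(0)
pca_selected = rng.normal(size=(6, 10))
y_data = pd.Series([0, 0, 1, 0, 1, 0], name="Class")

# Hypothetical names for the 10 principal components
columns = [f"PC{i}" for i in range(1, 11)]
ready_data = pd.DataFrame(pca_selected, columns=columns)

# Append the label last, so iloc[:, 0:-1] gives features and iloc[:, -1] gives Class
ready_data["Class"] = y_data.values

print(ready_data.head())
```

Using .values sidesteps any index mismatch between the PCA output and the original label column.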

4. Train and Test Data Split

As mentioned before, the trickiest point about fraud data sets is the extremely imbalanced distribution of positive and negative instances. For example, let’s look at our current data set using the .value_counts() function and then plot it.

positive and negative value counts

The results show only 492 fraudulent transactions out of 284,807 records, which is just 0.1727% of all the samples.
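That percentage is simply 492 / 284,807. A quick check with value_counts(normalize=True) on a stand-in label Series with the same counts (284,315 genuine and 492 fraudulent rows):

```python
import pandas as pd

# Stand-in for data['Class'] with the same class counts as the real data set
labels = pd.Series([0] * 284_315 + [1] * 492, name="Class")

counts = labels.value_counts()
share = labels.value_counts(normalize=True) * 100  # percentage of each class

print(counts)
print(f"fraud share: {share[1]:.4f}%")  # fraud share: 0.1727%
```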

Here comes the tricky part: splitting an extremely imbalanced data set…

Since I need to make sure that the handful of positive samples is distributed proportionally in both the training and test data, I took the following steps:

  1. Separate all the positive records from the negative ones
data_class_0 = ready_data[ready_data['Class']==0]
data_class_1 = ready_data[ready_data['Class']==1]

2. Split each class into train and test sets, 67% and 33% respectively

from sklearn.model_selection import train_test_split

# Since the number of fraud transactions is tiny compared to non-fraud,
# I make sure that they are distributed proportionally in both train and test set
X_0 = data_class_0.iloc[:,0:-1]  # independent columns
y_0 = data_class_0.iloc[:,-1]    # target column, i.e. Class
X_1 = data_class_1.iloc[:,0:-1]  # independent columns
y_1 = data_class_1.iloc[:,-1]    # target column, i.e. Class
X_train_0, X_test_0, y_train_0, y_test_0 = train_test_split(X_0, y_0, test_size=0.33, random_state=42)
X_train_1, X_test_1, y_train_1, y_test_1 = train_test_split(X_1, y_1, test_size=0.33, random_state=42)

3. Join them back into one train set and one test set, each containing the positive and negative classes in proportion

X_train = pd.concat([X_train_0, X_train_1])
y_train = pd.concat([y_train_0, y_train_1])
X_test = pd.concat([X_test_0, X_test_1])
y_test = pd.concat([y_test_0, y_test_1])

At this point, the train and test sets look like this:

Train and Test data sets
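The same proportional split can also be done in one call with scikit-learn’s stratify argument, which preserves the class ratio in both parts. A sketch on stand-in data with the same class counts; the single dummy feature column is a placeholder, and the exact per-class counts may differ by a row or so from the manual per-class split above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: same class counts as the real set, one dummy feature column
y = np.array([0] * 284_315 + [1] * 492)
X = np.arange(len(y)).reshape(-1, 1)

# stratify=y keeps the fraud ratio (almost) identical in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42, stratify=y)

print(len(y_train), y_train.sum())    # training rows, frauds in training
print(len(y_test), y_test.sum())      # test rows, frauds in test
print(y_train.mean(), y_test.mean())  # both roughly 0.0017
```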

Given how imbalanced the data set is, the next step is to balance our training set, X_train (together with y_train).

5. Balance training data set

Before getting into what I did, you might like to have a look at the concepts of over-sampling and under-sampling in general. Here is a good explanation by “Machine Learning Mastery”.

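I used the SMOTE() function (Synthetic Minority Over-sampling Technique) to balance the training set. In a nutshell, SMOTE creates new synthetic minority-class samples by interpolating between existing minority samples and their nearest minority-class neighbours. Its sampling_strategy parameter sets the desired ratio of the number of samples in the minority class to the number in the majority class after resampling; by default, the minority class is over-sampled until it matches the majority. Check here for the full documentation. To do so, we simply enter: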

from imblearn.over_sampling import SMOTE

sm = SMOTE(random_state=42)
X_res, y_res = sm.fit_resample(X_train, y_train)

Now if we compare the dataset before and after SMOTE, here we see the magic.

Output:
Original dataset shape Counter({0: 190491, 1: 329})
Resampled dataset shape Counter({0: 190491, 1: 190491})
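The core idea of SMOTE is to pick a minority sample, pick one of its k nearest minority-class neighbours, and create a new point at a random position on the line segment between them. A simplified from-scratch sketch of that interpolation step; smote_sketch is a hypothetical helper, not imblearn’s implementation, and k=5 mirrors imblearn’s default k_neighbors:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic minority rows by SMOTE-style interpolation."""
    rng = np.random.default_rng(seed)
    # Ask for k + 1 neighbours because each point is its own nearest neighbour
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)

    base = rng.integers(0, len(X_min), size=n_new)         # random minority samples
    neigh = idx[base, rng.integers(1, k + 1, size=n_new)]  # one of their k neighbours
    gap = rng.random((n_new, 1))                           # random position in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

# Toy minority class: 20 points with 3 features
rng = np.random.default_rng(1)
X_min = rng.normal(size=(20, 3))
synthetic = smote_sketch(X_min, n_new=50)
print(synthetic.shape)  # (50, 3)
```

To reach the 1:1 ratio shown in the output above, n_new would be the majority count minus the minority count, 190491 − 329 = 190162.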

At this stage, we (…finally 😫) have the data set ready for modelling.

6. Fraud Detection Model

For this purpose, I used 3 algorithms and compared their results:

6.1. Logistic Regression


Logistic regression is the most widely accepted and used of these algorithms in the real-life banking industry.

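from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

logisticRegr = LogisticRegression()
logit_model = logisticRegr.fit(X_train, y_train)
logit_predict = logisticRegr.predict(X_test)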

Here is the classification report for Logistic Regression:

print(classification_report(y_test, logit_predict))
Logistic Regression classification report
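Logistic regression scores each transaction as p = 1 / (1 + exp(−(w·x + b))), and predict() labels it as fraud when p > 0.5. A sketch on made-up toy data (not the fraud set) showing that this formula reproduces predict_proba:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up toy data with a roughly linear class boundary
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Manual probability of class 1: sigmoid of the linear score w.x + b
z = X @ model.coef_.ravel() + model.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))

print(np.allclose(p_manual, model.predict_proba(X)[:, 1]))  # True
print(np.array_equal((p_manual > 0.5).astype(int), model.predict(X)))
```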

6.2. Deep Learning Neural Network: TensorFlow Keras


The second algorithm is an Artificial Neural Network, for which I used TensorFlow Keras.

Input and hidden layers

  • If you look at the code on my GitHub, you’ll see that I built the neural network twice: once with units = 10 in the hidden layer and once with units = 32 (the more common practice). The network with 32 hidden units achieved higher accuracy.
  • I also played with the number of epochs, another influential hyper-parameter for managing overfitting. I found that 10 epochs produced an overfitted model, whereas 5 was reasonably acceptable.
  • Another hyper-parameter is the activation function. At the most basic level, an activation function decides whether a neuron should fire or not. It takes the weighted sum of the neuron’s inputs plus a bias as its input. Step, Sigmoid, ReLU, Tanh, and Softmax are examples of activation functions. MissingLink gives a good summary of them in 7 Types of Neural Network Activation Functions.
ANN Activation functions
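For reference, the five activation functions named in the last bullet fit in a few lines of NumPy. A minimal sketch; these plain functions are illustrations, not the Keras implementations:

```python
import numpy as np

def step(x):
    return (x >= 0).astype(float)    # fires (1) when the input is non-negative

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # squashes to (0, 1)

def relu(x):
    return np.maximum(0.0, x)        # keeps positives, zeroes negatives

def tanh(x):
    return np.tanh(x)                # squashes to (-1, 1)

def softmax(x):
    e = np.exp(x - np.max(x))        # subtract the max for numerical stability
    return e / e.sum()               # outputs sum to 1

x = np.array([-2.0, 0.0, 3.0])
print(step(x))       # [0. 1. 1.]
print(sigmoid(0.0))  # 0.5
print(relu(x))       # [0. 0. 3.]
print(softmax(x).sum())
```

Sigmoid is the natural fit for a single binary output, because its value can be read as the probability of the positive class.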

Output layer

The output layer is a single binary unit, for which I used the Sigmoid (👆) activation function.

from tensorflow import keras

# Initialising the ANN
classifier = keras.Sequential()

# Adding the input layer and the first hidden layer (10 inputs = 10 PCA features)
classifier.add(keras.layers.Dense(units=32, kernel_initializer='uniform', activation='relu', input_dim=10))
# Adding the output layer
classifier.add(keras.layers.Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))
# Compiling the ANN
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# And finally
# Fitting the ANN to the Training set
model = classifier.fit(X_train.values, y_train.values, batch_size=128, epochs=5)
Model is being trained through 5 Epochs
# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
93987/93987 [==============================] - 3s 31us/sample - loss: 0.0041 - accuracy: 0.9992

Let’s see how our model performed

print(classification_report(y_test, y_pred))
tf.Keras classification report

A couple of extras:

  • Confusion Matrix
Keras confusion matrix
  • ROC AUC Diagram
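Both extras come from scikit-learn’s metrics module. One detail worth noting: confusion_matrix takes thresholded labels (like y_pred after the > 0.5 step), while roc_auc_score needs the raw sigmoid scores returned by classifier.predict(X_test). A sketch on a tiny made-up example:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Made-up example: true labels and a model's sigmoid scores
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

y_label = (scores > 0.5).astype(int)  # same 0.5 threshold as the Keras code
tn, fp, fn, tp = confusion_matrix(y_true, y_label).ravel()
print(tn, fp, fn, tp)                 # 2 0 1 1

# AUC is computed from the scores, not the thresholded labels
print(roc_auc_score(y_true, scores))  # 0.75
```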

6.3. XGBoost

XGBoost stands for eXtreme Gradient Boosting. It takes the gradient boosting implementations of scikit-learn and R to the next level, with new additions such as regularisation. Again, have a look at MachineLearningMastery for further explanations.

XGBoost is:

  • an algorithm that has recently been dominating applied machine learning and Kaggle competitions for structured or tabular data;
  • an implementation of gradient boosted decision trees designed for speed and performance.

Two important hyper-parameters to keep an eye on are learning_rate and n_estimators. They help you fight overfitting and improve accuracy. I found that a 1% learning rate produced better “recall” than 10%. And to be honest, I believe 10,000 n_estimators was overkill, but I did it anyway 😅.
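To see what learning_rate and n_estimators control, here is a heavily simplified gradient-boosting loop for binary classification: each new tree is fitted to the residuals y − p (the negative gradient of the log-loss), and its output is shrunk by learning_rate before being added. This sketch uses scikit-learn regression trees on made-up data; it is not XGBoost’s algorithm, which additionally uses second-order gradients and regularisation, and fit_boosting / predict_proba are hypothetical helpers:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_boosting(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    """Hypothetical helper: simplified gradient boosting with log-loss."""
    base = np.log(y.mean() / (1 - y.mean()))  # start from the log-odds of class 1
    F = np.full(len(y), base)
    trees = []
    for _ in range(n_estimators):
        residual = y - sigmoid(F)             # negative gradient of log-loss w.r.t. F
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, residual)
        F += learning_rate * tree.predict(X)  # small, shrunken step
        trees.append(tree)
    return base, trees

def predict_proba(X, base, trees, learning_rate):
    F = base + learning_rate * sum(tree.predict(X) for tree in trees)
    return sigmoid(F)

# Made-up data with a diagonal class boundary
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

losses = {}
for lr in (0.1, 0.01):
    base, trees = fit_boosting(X, y, n_estimators=100, learning_rate=lr)
    losses[lr] = log_loss(y, predict_proba(X, base, trees, lr))
    print(lr, round(losses[lr], 3))
```

With the smaller rate, the same 100 trees move the scores much less, which is why a 1% learning rate is usually paired with many more estimators.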

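from xgboost import XGBClassifier

# Learning rate = 0.01
# eval_metric is set in the constructor, as recent XGBoost versions expect
XGB_classifier = XGBClassifier(n_estimators=10000, learning_rate=.01, eval_metric='aucpr')
XGB_classifier.fit(X_train, y_train)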


Xgboost trained model

Next, I used the model to predict the dependent variable from the independent test data…

XGB_classifier_predict_smote = XGB_classifier.predict(X_test)

… and compared the predictions with the actual dependent test set.


Finally, here is the result 👇

Xgboost classification report

Comparing results and conclusion

“what is the best accuracy measure in our scenario?”

hmmm… 🤔 any idea?!


Why did I even pose this question? Because, remember… banking is a sensitive industry and there are huge losses every year.

To answer this question, I would like to refer you back to our friends who make up the confusion matrix… the four (in)famous TP, FP, TN, and FN.

If you think about which of the four matters most to us, you’ll know what the best accuracy measure is. Since we are dealing with fraudulent transactions, it is critically important that we do not flag a fraudulent transaction as genuine. The opposite (flagging a genuine transaction as fraud) is also costly, but not as critical.

Therefore, the priority is minimising the number of False Negatives, that is, the number of “fraudulent transactions marked as genuine”. Since Recall = TP / (TP + FN), this means we need to push Recall as high as possible.
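In confusion-matrix terms, Recall = TP / (TP + FN) and Precision = TP / (TP + FP), so every missed fraud (FN) directly lowers recall. A small worked example with made-up counts, checked against scikit-learn:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Made-up outcome: 10 frauds and 990 genuine transactions
y_true = np.array([1] * 10 + [0] * 990)
# The model catches 8 frauds (2 FN) and wrongly flags 4 genuine ones (4 FP)
y_pred = np.array([1] * 8 + [0] * 2 + [1] * 4 + [0] * 986)

tp, fn, fp = 8, 2, 4
print(tp / (tp + fn), recall_score(y_true, y_pred))  # 0.8 0.8
print(tp / (tp + fp), precision_score(y_true, y_pred))
```

Missing two of the ten frauds costs 20% of recall, whereas the four false alarms among 990 genuine transactions only affect precision.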

This Idiot’s Guide to Precision, Recall and Confusion Matrix helped me big time. You can also have a look if you feel like you’re not following what I’m saying here.

Comparing the results of the three algorithms, Precision hasn’t changed drastically, while Recall improved substantially from Logistic Regression to tf.Keras and again from tf.Keras to XGBoost.

Final comparison of 3 algorithms

… and the prize 🥇🏆 goes to XGBoost for its humble, mathematically powerful gradient boosted engine.


Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data…
