Fraud Detection in Python; Predict Fraudulent Credit Card Transactions

Oct 12, 2021

“Big data can enable companies to identify variables that recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase”, Lital Naor, CPA, Partner at Schneider, Naor & Co. CPA, August 2018

“Fraud detection analytics is the process of classifying, grouping and segmenting your data to analyze through millions of transactions, in an attempt to identify abnormal patterns and detect fraud”, Adi Schneider, CPA, CEO at FraudWize Ltd., March 2017

This post presents a reference implementation of a fraud detection analysis project that is built by using Python’s Scikit-Learn library. In this article, we introduce Logistic Regression, Random Forest, and Support Vector Machine. We also measure the accuracy of models that are built by using Machine Learning, and we assess directions for further development. And we will do all of the above in Python. Let’s get started!

1. Business Understanding

Fraud detection is a set of activities that are taken to prevent money or property from being obtained through false pretenses.

Fraud can be committed in different ways and different settings. For example, fraud can be committed in banking, insurance, government and healthcare sectors.

A common type of banking fraud is customer account takeover. This is when someone illegally gains access to a victim’s bank account using bots. Other examples of fraud in banking include the use of malicious applications, the use of false identities, money laundering, credit card fraud and mobile fraud.

Insurance fraud includes premium diversion fraud, which is the embezzlement of insurance premiums, or fee churning, which is excessive trading by a stockbroker to maximize commissions. Other forms of insurance fraud include asset diversion, workers’ compensation fraud, car accident fraud, stolen or damaged car fraud, and house fire fraud. The motive behind all insurance fraud is financial profit.

Government fraud is committing fraud against federal agencies such as the U.S. Department of Health and Human Services, Department of Transportation, Department of Education or Department of Energy. Types of government fraud include billing for unnecessary procedures, overcharging for items that cost less, providing old equipment when billing for new equipment and reporting hours worked for a worker that does not exist.

Healthcare fraud includes drug fraud and medical fraud, as well as insurance fraud. Healthcare fraud is committed when someone defrauds an insurer or government health care program.

2. Data Understanding

Fraud can be committed in different ways and in many industries. The majority of detection methods combine a variety of fraud detection datasets to form a connected overview of both valid and non-valid payment data to make a decision. This decision must consider IP address, geolocation, device identification, “BIN” data, global latitude/longitude, historic transaction patterns, and the actual transaction information. In practice, this means that merchants and issuers deploy analytically based responses that use internal and external data to apply a set of business rules or analytical algorithms to detect fraud.

Credit Card Fraud Detection with Machine Learning is a process of data investigation by a Data Science team and the development of a model that will provide the best results in revealing and preventing fraudulent transactions. This is achieved through bringing together all meaningful features of card users’ transactions, such as Date, User Zone, Product Category, Amount, Provider, Client’s Behavioral Patterns, etc. The information is then run through a subtly trained model that finds patterns and rules so that it can classify whether a transaction is fraudulent or is legitimate.

The dataset comes from Prediction Consultants and it is related to transactions made by credit cards in August 2021 by Israeli cardholders. The classification goal is to predict whether a transaction is fraudulent (1) or legitimate (0); this outcome is stored in the “fraud” column. The dataset can be downloaded from here.

It is pretty straightforward. Each row represents a credit card transaction, each column contains credit card transaction attributes:

Sum: the transaction amount; this feature can be used for example-dependent cost-sensitive learning (numeric)
A: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
B: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
C: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
D: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
E: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
F: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
G: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
H: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
I: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
J: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
K: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
L: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
M: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
N: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
O: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
P: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
Q: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
R: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
S: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
T: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
U: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
V: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
W: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
X: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
Y: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
Z: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
a: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
b: due to confidentiality issues, we cannot provide the original feature and more background information about it (numeric)
fraud: whether the transaction is fraudulent (1) or not (0)

import pandas as pd
df = pd.read_csv('frauddetection.csv')
col_names = df.columns.tolist()
print("Column names:")
print(col_names)
print("\nSample data:")
df.head()

Column names:
['Sum', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'fraud']

Sample data:

The type of the columns can be found out as follows:

df.dtypes

Our data is pretty clean, no missing values

df.isnull().any()

The data contains 14,999 transactions and 30 columns (29 features plus the “fraud” outcome):

df.shape

(14999, 30)

The “fraud” column is the outcome variable recording 1 and 0. 1 for transactions which were fraudulent and 0 for those which were not.

Data Exploration

First of all, let us find out the number of transactions which were fraudulent and those which were not:

df['fraud'].value_counts()

There are 3571 transactions which were fraudulent and 11428 transactions which were legitimate in our data.
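
Because accuracy is the main metric used later in this post, it helps to know how balanced the two classes are. The short calculation below is a sketch that uses only the two counts reported above; the variable names are illustrative.

```python
# Class counts reported by value_counts() above
n_fraud = 3571
n_legit = 11428
n_total = n_fraud + n_legit

fraud_share = n_fraud / n_total
# Accuracy of a trivial model that labels every transaction as legitimate
baseline_accuracy = n_legit / n_total

print(f"Total transactions: {n_total}")
print(f"Share of fraudulent transactions: {fraud_share:.1%}")
print(f"Accuracy of always predicting 'legitimate': {baseline_accuracy:.1%}")
```

Any model worth keeping should beat this trivial baseline by a clear margin.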

Let us get a sense of the numbers across these two classes:

df.groupby('fraud').mean()

Several observations:

The average sum of transactions which were fraudulent is bigger than that of the transactions which were legitimate.
The average values of the features “B”, “D”, “H”, “S”, “T”, “U”, “V”, “Y“, “Z”, “a” and “b” of transactions which were fraudulent are higher than those of the transactions which were legitimate.
The average values of the features “A”, “C”, “E”, “F”, “G”, “W” and “X” of transactions which were fraudulent are lower than those of the transactions which were legitimate.

Data Visualization

Let us visualize our data to get a much clearer picture of the data and the significant features.

import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib
matplotlib.rcParams['figure.figsize'] = (10.0, 6.0)
sns.set(style="white")
sns.countplot(x='fraud', data=df)
plt.show()

Histograms are often one of the most helpful tools we can use for numeric variables during the exploratory phase.

Histogram of numeric variables

%matplotlib inline
import matplotlib.pyplot as plt
df.hist(bins=50, figsize=(20,15))
plt.savefig("attribute_histogram_plots")
plt.show()

The variable we are going to predict is the “fraud”. So let’s look at how much each independent variable correlates with this dependent variable.

matplotlib.rcParams['figure.figsize'] = (20.0, 10.0)
sns.heatmap(df.corr(), annot=True)

corr_matrix = df.corr()
corr_matrix["fraud"].sort_values(ascending=False)

The feature “fraud” tends to increase when the features “D”, “K”, and “B” go up, and tends to decrease when the features “N”, “L”, “J”, “P”, “Q”, “C”, “I”, “G” and “R” go up. You can see small positive correlations between the features “S”, “T”, “U”, “b”, “a” and the feature “fraud”, and small negative correlations between the features “A”, “E”, “F” and the feature “fraud”. Finally, coefficients close to zero indicate that there is no linear correlation.

We are now going to visualize the correlation between variables by using Pandas’ scatter_matrix function. We will just focus on a few promising variables, that seem the most correlated with the feature “fraud”.

from pandas.plotting import scatter_matrix
attributes = ["D", "K", "N", "L", "J"]
scatter_matrix(df[attributes], figsize=(16, 10))
plt.savefig('matrix.png')

The most promising variable for predicting the feature “fraud” is the feature “N”, so let’s zoom in on their correlation scatter plot.

matplotlib.rcParams['figure.figsize'] = (10.0, 6.0)
df.plot(kind="scatter", x="N", y="fraud", alpha=0.5)
plt.savefig('scatter.png')

The correlation is indeed very strong; you can clearly see that the points are not too dispersed.

3. Data Preparation

Since we have no categorical variables in the dataset, we don’t need to convert any of them to dummy variables before using our data for modelling.

df.columns.values

array(['Sum', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
'X', 'Y', 'Z', 'a', 'b', 'fraud'], dtype=object)

The outcome variable is “fraud”, and all the other variables are predictors.

df_vars = df.columns.values.tolist()
y = ['fraud']
X = [i for i in df_vars if i not in y]

Feature Selection

Recursive Feature Elimination (RFE) works by recursively removing variables and building a model on those variables that remain. At each step it uses the fitted model’s coefficients or feature importances to rank the variables and drops the weakest one, until only the requested number of variables is left.
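
The elimination loop can be seen on a small example. The sketch below runs RFE on synthetic data generated with `make_classification`; the data, the names `X_demo`, `y_demo` and `selector`, and the choice of 3 features are assumptions made for illustration and have nothing to do with the credit card dataset.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption, not the article's dataset):
# 8 features, of which 3 are informative and 5 are noise
X_demo, y_demo = make_classification(
    n_samples=500, n_features=8, n_informative=3,
    n_redundant=0, random_state=0)

selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
selector.fit(X_demo, y_demo)

# support_ is True for kept features; ranking_ is 1 for kept features,
# and larger values mean the feature was eliminated earlier
print(selector.support_)
print(selector.ranking_)
```

With one feature removed per step, the five discarded features receive ranks 2 through 6, and the three survivors all share rank 1.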

Let’s use feature selection to help us decide which variables are most useful for predicting fraudulent transactions.

len(X)

There are 29 columns in X in total. How about selecting 10?

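from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
# n_features_to_select must be passed by keyword in recent scikit-learn versions
rfe = RFE(model, n_features_to_select=10)
# ravel() turns the one-column target frame into the 1-D array scikit-learn expects
rfe = rfe.fit(df[X], df[y].values.ravel())
print(rfe.support_)
print(rfe.ranking_)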

[False False False False True False False False True False True True
True True True False True True True False False False False False
False False False False False]
[20 9 10 15 1 6 13 18 1 11 1 1 1 1 1 19 1 1 1 7 16 3 2 12
4 14 17 8 5]

You can see that RFE chose the 10 variables for us, which are marked True in the support_ array and marked with a “1” in the ranking_ array. Let’s find them.

df[X].columns

Index(['Sum', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a',
'b'], dtype='object')

data_X1 = pd.DataFrame({
    'Feature': df[X].columns,
    'Importance': rfe.ranking_})
data_X1.sort_values(by=['Importance'])

# Collect the features that RFE ranked 1, i.e. the selected ones
cols = []
for i in range(len(data_X1["Importance"])):
    if data_X1["Importance"][i] == 1:
        cols.append(data_X1["Feature"][i])
print(cols)
print(len(cols))

['D', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R']
10

4. Modeling

Fraud typically involves multiple repeated methods, which makes searching for patterns a general focus of fraud detection. For example, data analysts can prevent insurance fraud by building algorithms that detect patterns and anomalies.

Fraud detection techniques fall into two groups: statistical data analysis techniques and artificial intelligence (AI).

Statistical data analysis techniques include:

calculating statistical parameters
regression analysis
probability distributions and models
data matching

AI techniques used to detect fraud include:

Data mining classifies, groups and segments data to search through millions of transactions to find patterns and detect fraud.
Neural networks learn suspicious-looking patterns and use those patterns to detect them further.
Machine learning automatically identifies characteristics found in fraud.
Pattern recognition detects classes, clusters and patterns of suspicious behavior.

from sklearn.model_selection import train_test_split
X = df[cols]
y = df['fraud']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
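
To make the split sizes concrete, the arithmetic below reproduces how many rows end up in each set. It is a sketch based on the 14,999 rows reported earlier and on scikit-learn rounding a fractional `test_size` up; the variable names are illustrative.

```python
import math

n_samples = 14999   # rows in the dataset
test_size = 0.33    # fraction passed to train_test_split

# For a fractional test_size, scikit-learn rounds the test set up
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

The resulting test-set size matches the 4950 total predictions reported in the Evaluation section.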

Logistic Regression Model

from sklearn.linear_model import LogisticRegression
from sklearn import metrics

logreg = LogisticRegression(random_state=42)
logreg.fit(X_train, y_train)

LogisticRegression(random_state=42)

from sklearn.metrics import accuracy_score
print('Logistic regression accuracy: {:.3f}'.format(accuracy_score(y_test, logreg.predict(X_test))))

Logistic regression accuracy: 0.985

Cross-validation helps to detect overfitting while still producing a prediction for every observation in the dataset. We use 10-fold cross-validation on the training set to evaluate our logistic regression model.
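
To show what “10-fold” means in practice, here is a minimal pure-Python sketch of how consecutive, unshuffled folds can be carved out of a dataset. The helper `kfold_indices` is hypothetical and written only for illustration; the figure 10049 is the training-set size that follows from 14,999 rows minus the 4950 test rows.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, val_idx) pairs for consecutive, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds get one additional sample
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

# 10049 = 14999 rows minus the 4950 rows held out for testing
fold_sizes = [len(val) for _, val in kfold_indices(10049, 10)]
print(fold_sizes)
```

Each observation lands in exactly one validation fold, so the model is scored once on every training row while never being evaluated on data it was fitted on in that round.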

from sklearn import model_selection
from sklearn.model_selection import cross_val_score

# random_state only applies when shuffle=True (recent scikit-learn versions raise an error otherwise), so it is omitted
kfold = model_selection.KFold(n_splits=10)
modelCV = LogisticRegression(random_state=42)
scoring = 'accuracy'
results = model_selection.cross_val_score(modelCV, X_train, y_train, cv=kfold, scoring=scoring)
print("10-fold cross validation average accuracy of the logistic regression model: %.3f" % (results.mean()))

10-fold cross validation average accuracy of the logistic regression model: 0.983

The average accuracy remains very close to the Logistic Regression model accuracy; hence, we can conclude that the model generalizes well.

Random Forest

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)

RandomForestClassifier(random_state=42)

print('Random Forest Accuracy: {:.3f}'.format(accuracy_score(y_test, rf.predict(X_test))))

Random Forest Accuracy: 0.989

Cross-validation helps to detect overfitting while still producing a prediction for every observation in the dataset. We use 10-fold cross-validation on the training set to evaluate our Random Forest model.

# random_state only applies when shuffle=True, so it is omitted here
kfold = model_selection.KFold(n_splits=10)
modelCV = RandomForestClassifier(random_state=42)
scoring = 'accuracy'
results = model_selection.cross_val_score(modelCV, X_train, y_train, cv=kfold, scoring=scoring)
print("10-fold cross validation average accuracy of the random forest model: %.3f" % (results.mean()))

10-fold cross validation average accuracy of the random forest model: 0.986

The average accuracy remains very close to the Random Forest model accuracy; hence, we can conclude that the model generalizes well.

Support Vector Machine

from sklearn.svm import SVC

svc = SVC(random_state=42)
svc.fit(X_train, y_train)

SVC(random_state=42)

print('Support vector machine accuracy: {:.3f}'.format(accuracy_score(y_test, svc.predict(X_test))))

Support vector machine accuracy: 0.986

Cross-validation helps to detect overfitting while still producing a prediction for every observation in the dataset. We use 10-fold cross-validation on the training set to evaluate our Support Vector Machine model.

from sklearn import model_selection
from sklearn.model_selection import cross_val_score
# random_state only applies when shuffle=True, so it is omitted here
kfold = model_selection.KFold(n_splits=10)
modelCV = SVC(random_state=42)
scoring = 'accuracy'
results = model_selection.cross_val_score(modelCV, X_train, y_train, cv=kfold, scoring=scoring)
print("10-fold cross validation average accuracy of the support vector machine model: %.3f" % (results.mean()))

10-fold cross validation average accuracy of the support vector machine model: 0.983

The average accuracy remains very close to the Support Vector Machine model accuracy; hence, we can conclude that the model generalizes well.

5. Evaluation

We construct a confusion matrix to visualize the predictions made by each classifier and to evaluate the accuracy of the classification.

Logistic Regression

logreg_y_pred = logreg.predict(X_test)

import seaborn as sns
import matplotlib
matplotlib.rcParams['figure.figsize'] = (10.0, 6.0)
from sklearn.metrics import confusion_matrix

# y_test goes first so that rows are true classes and columns are predicted classes, matching the axis labels
logreg_cm = metrics.confusion_matrix(y_test, logreg_y_pred, labels=[1, 0])
sns.heatmap(logreg_cm, cmap='RdPu', annot=True, fmt='.0f', xticklabels=["Fraudulent", "Legitimate"], yticklabels=["Fraudulent", "Legitimate"])
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.title('Logistic Regression')
plt.savefig('logistic_regression')

print("\033[1m The result is telling us that we have: ", (logreg_cm[0,0]+logreg_cm[1,1]), "correct predictions\033[1m")
print("\033[1m The result is telling us that we have: ", (logreg_cm[0,1]+logreg_cm[1,0]), "incorrect predictions\033[1m")
print("\033[1m We have a total predictions of: ", (logreg_cm.sum()))

The result is telling us that we have: 4876 correct predictions
The result is telling us that we have: 74 incorrect predictions
We have a total predictions of: 4950

from sklearn.metrics import classification_report
print(classification_report(y_test, logreg.predict(X_test)))

Random Forest

y_pred = rf.predict(X_test)

# y_test goes first so that rows are true classes and columns are predicted classes, matching the axis labels
forest_cm = metrics.confusion_matrix(y_test, y_pred, labels=[1, 0])
sns.heatmap(forest_cm, cmap='RdPu', annot=True, fmt='.0f', xticklabels=["Fraudulent", "Legitimate"], yticklabels=["Fraudulent", "Legitimate"])
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.title('Random Forest')
plt.savefig('random_forest')

print("\033[1m The result is telling us that we have: ", (forest_cm[0,0]+forest_cm[1,1]), "correct predictions\033[1m")
print("\033[1m The result is telling us that we have: ", (forest_cm[0,1]+forest_cm[1,0]), "incorrect predictions\033[1m")
print("\033[1m We have a total predictions of: ", (forest_cm.sum()))

The result is telling us that we have: 4896 correct predictions
The result is telling us that we have: 54 incorrect predictions
We have a total predictions of: 4950

from sklearn.metrics import classification_report
print(classification_report(y_test, rf.predict(X_test)))

Support Vector Machine

svc_y_pred = svc.predict(X_test)
# y_test goes first so that rows are true classes and columns are predicted classes, matching the axis labels
svc_cm = metrics.confusion_matrix(y_test, svc_y_pred, labels=[1, 0])
sns.heatmap(svc_cm, cmap='RdPu', annot=True, fmt='.0f', xticklabels=["Fraudulent", "Legitimate"], yticklabels=["Fraudulent", "Legitimate"])
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.title('Support Vector Machine')
plt.savefig('support_vector_machine')

print("\033[1m The result is telling us that we have: ", (svc_cm[0,0]+svc_cm[1,1]), "correct predictions\033[1m")
print("\033[1m The result is telling us that we have: ", (svc_cm[0,1]+svc_cm[1,0]), "incorrect predictions\033[1m")
print("\033[1m We have a total predictions of: ", (svc_cm.sum()))

The result is telling us that we have: 4880 correct predictions
The result is telling us that we have: 70 incorrect predictions
We have a total predictions of: 4950

print(classification_report(y_test, svc.predict(X_test)))

Conclusion

The winner is … Random forest.

When a transaction is fraudulent, how often does the classifier predict that correctly? This measurement is called “recall”, and a quick look at the confusion matrices shows that random forest is clearly the best on this criterion. Out of all the fraudulent cases in the test set, random forest correctly retrieved 1142 out of 1189. This translates to a recall of about 96% (1142/1189), better than logistic regression (about 95%, 1128/1189) or support vector machine (about 94%, 1123/1189).

When a classifier predicts that a transaction is fraudulent, how often is that transaction actually fraudulent? This measurement is called “precision”. Random forest has a precision of about 99% (1142 out of 1149), logistic regression about 99% (1128 out of 1141), and support vector machine close to 100% (1123 out of 1127, or 99.6%).
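
The percentages above can be reproduced directly from the quoted counts. The sketch below does that arithmetic; the F1 score is not discussed in this post and is included only as a common single-number summary of precision and recall. All variable names are illustrative.

```python
# Counts quoted above: (true positives, actual frauds, predicted frauds)
counts = {
    "Random forest": (1142, 1189, 1149),
    "Logistic regression": (1128, 1189, 1141),
    "Support vector machine": (1123, 1189, 1127),
}

results = {}
for name, (tp, actual_fraud, predicted_fraud) in counts.items():
    recall = tp / actual_fraud          # share of real frauds that were caught
    precision = tp / predicted_fraud    # share of fraud alerts that were right
    f1 = 2 * precision * recall / (precision + recall)
    results[name] = (precision, recall, f1)
    print(f"{name}: precision={precision:.3f}, recall={recall:.3f}, F1={f1:.3f}")
```

Support vector machine raises the fewest false alarms, but random forest misses the fewest frauds, which is usually the costlier error in this setting.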

The ROC Curve

The receiver operating characteristic (ROC) curve is another common tool used with binary classifiers. The dotted line represents the ROC curve of a purely random classifier; a good classifier stays as far away from that line as possible (toward the top-left corner).
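
As a rough illustration of how such a curve is computed, the sketch below uses scikit-learn’s `roc_curve` and `roc_auc_score` on four toy observations. The labels and scores are made up for the example; in the setting of this post the scores would come from something like `rf.predict_proba(X_test)[:, 1]`.

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Toy labels and scores (illustrative only, not taken from the dataset)
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# One (false positive rate, true positive rate) point per decision threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

for f, t in zip(fpr, tpr):
    print(f"FPR={f:.2f}  TPR={t:.2f}")
print(f"AUC = {auc:.2f}")
```

The area under the curve (AUC) summarizes the plot in one number: 0.5 corresponds to the random classifier on the diagonal and 1.0 to a perfect ranking.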

Feature Importance for Random Forest Model

Feature importance rates how important each feature is for the decision a tree makes. It is a number between 0 and 1 for each feature, where 0 means “not used at all” and 1 means “perfectly predicts the target”. The feature importances always sum to 1.

feature_labels = X.columns
importance = rf.feature_importances_
feature_indexes_by_importance = importance.argsort()[::-1]
for index in feature_indexes_by_importance:
    print('{}-{:.2f}%'.format(feature_labels[index], (importance[index] * 100.0)))

N-31.92%
J-20.70%
Q-14.47%
L-10.70%
D-8.07%
K-7.24%
P-3.74%
H-1.28%
R-0.95%
M-0.93%

According to our Random Forest model, the list above shows the features that most influence whether a transaction is classified as fraudulent, in descending order of importance.
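
As a quick sanity check, the printed percentages can be added up to confirm that they sum to 100% and to see how concentrated the importance is. The sketch below only re-uses the numbers printed above; the variable names are illustrative.

```python
# Percentages printed by the random forest above
importances = {
    "N": 31.92, "J": 20.70, "Q": 14.47, "L": 10.70, "D": 8.07,
    "K": 7.24, "P": 3.74, "H": 1.28, "R": 0.95, "M": 0.93,
}

total = sum(importances.values())
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
top4_share = sum(value for _, value in ranked[:4])

print(f"Sum of all importances: {total:.2f}%")
print(f"Share of the top four features: {top4_share:.2f}%")
```

The four features N, J, Q and L together account for more than three quarters of the total importance.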

Then we can visualize the feature importances:

import numpy as np

# X holds exactly the 10 selected features, so all of them are plotted
df_features = list(X.columns)
sns.set(style="white")
matplotlib.rcParams['figure.figsize'] = (14.0, 7.0)

def plot_feature_importances(model):
    plt.figure(figsize=(8, 6))
    n_features = 10
    # Use the model passed in rather than the global rf
    plt.barh(range(n_features), model.feature_importances_, align='center')
    plt.yticks(np.arange(n_features), df_features)
    plt.xlabel("Feature importance")
    plt.ylabel("Feature")
    plt.ylim(-1, n_features)

plot_feature_importances(rf)
plt.savefig('feature_importance')

Feature “N” is by far the most important feature.

6. Deployment

Let’s name our finalized model “model” rather than “rf”.

model = rf.fit(X_train, y_train)

So, our Random Forest model is a pretty good model for predicting the probability of fraud. Now, how do we predict the probability of fraud for a new transaction?

new_data = np.array([[1.130443, 0.675859, 0.726281, 0.344679, 0.602552, -1.279443, 0.180442, -0.030783, -0.470214, 0.057878]]).reshape(1, -1)

Suppose there is a new transaction with D=1.130443, H=0.675859, J=0.726281, K=0.344679, L=0.602552, M=-1.279443, N=0.180442, P=-0.030783, Q=-0.470214, and R=0.057878. We can take this new data and use it to predict the probability that the new transaction is fraudulent.

prediction = model.predict_proba(new_data)[:,1][0]
print("\033[1m This new transaction has a {:.2%}".format(prediction), "chance of being fraudulent")

This new transaction has a 0.00% chance of being fraudulent

Saving the finalized model with pickle saves us a lot of time, as we don’t have to retrain the model every time we run the application. Once the model is saved as a pickle file, we can load it later when making predictions.
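
A compact way to write the save and load steps is with `with` blocks, which close the file automatically. The sketch below is a hedged illustration: `model_stub` is a plain dictionary standing in for the trained model, and the temporary directory is an assumption made so the example can run anywhere.

```python
import os
import pickle
import tempfile

# Stand-in for the trained model (an assumption for this sketch); any picklable
# object, including a fitted scikit-learn estimator, is handled the same way
model_stub = {"name": "random_forest",
              "features": ["D", "H", "J", "K", "L", "M", "N", "P", "Q", "R"]}

path = os.path.join(tempfile.mkdtemp(), "fw_model1")

# The with-statement closes the file even if dump() raises
with open(path, "wb") as f:
    pickle.dump(model_stub, f)

# Only unpickle files you trust: loading a pickle can execute arbitrary code
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model_stub)
```

The step-by-step version with explicit open and close calls works the same way.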

import pickle

First, let’s open a new file for our finalized model and call it “fw_model1”

f1 = open("fw_model1", "wb")

Then, let’s save into this file our Random Forest model

pickle.dump(model , f1)

And let’s close this file.

f1.close()

Now, let’s open a new Python notebook and write:

import pickle
f2 = open("fw_model1", "rb")
model = pickle.load(f2)

Now, let’s make a new prediction for the above transaction:

model.predict_proba([[1.130443, 0.675859, 0.726281, 0.344679, 0.602552, -1.279443, 0.180442, -0.030783, -0.470214, 0.057878]])[:,1][0]

0.0

Ladies and gentlemen, we deployed our model.

Summary

This brings us to the end of the post. I am not going to print out a list of the transactions that the model predicts are likely to be fraudulent; that is not the objective of this analysis.

Fraud detection is a set of activities undertaken to prevent money or property from being obtained through false pretenses.

Fraud detection is applied to many industries such as banking or insurance. In banking, fraud may include forging checks or using stolen credit cards. Other forms of fraud may involve exaggerating losses or causing an accident solely for the payout.

With a large and rising number of ways someone can commit fraud, detection can be difficult. Activities such as reorganization, downsizing, moving to new information systems or encountering a cybersecurity breach could weaken an organization’s ability to detect fraud. Techniques such as real-time monitoring for fraud are recommended. Organizations should look for fraud in financial transactions, locations, devices used, initiated sessions and authentication systems.

Source code that created this post can be found here. I would be pleased to receive feedback or questions on any of the above.

Written by Roi Polanitzer