My First ML Blog: Detecting Fraudulent Transactions (IEEE-CIS)

3 min read · Jan 17, 2020

Fraud Detection Using Machine Learning

1. Motivation/Introduction:

Hello ML enthusiasts,

Today I am very excited to write my first blog post, on IEEE-CIS Fraud Detection, a competition hosted on Kaggle. I learned many things from this project, and I am ready to give a detailed explanation of it.

The machine learning (ML) approach to fraud detection has received a lot of publicity in recent years and has shifted industry interest from rule-based fraud detection systems to ML-based solutions. Let's see the difference between the two systems.

Researchers from the IEEE Computational Intelligence Society (IEEE-CIS) want to decrease the fraud rate while also improving the customer experience. With higher-accuracy fraud detection, you can get on with your purchases without the hassle.

IEEE-CIS works across a variety of AI and machine learning areas, including deep neural networks, fuzzy systems, evolutionary computation, and swarm intelligence. Today they're partnering with the world's leading payment service company, Vesta Corporation, to seek the best solutions for the fraud prevention industry on Kaggle. So let's get into the nitty-gritty details of the data and build a model that can do this.

2. Prerequisites:

This post assumes familiarity with basic data preprocessing steps for machine learning, boosting libraries like LightGBM and XGBoost, Python syntax, the scikit-learn library, etc.

3. Data Collection:

As mentioned above, all the data comes from Kaggle, a home for data science. You can find the data at the link below:

IEEE-CIS Fraud Detection

Can you detect fraud from customer transactions?

www.kaggle.com

The data is split into two files, identity and transaction, which are joined on a column called TransactionID.

Train-Identity: 144,233 rows and 41 features.

Train-Transaction: 590,540 rows and 394 features.
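
The join on TransactionID can be sketched with pandas. This is only a minimal sketch on tiny made-up frames, not the real competition files, and the values are invented for illustration. Using a left join is an assumption on my part: since the identity file has far fewer rows than the transaction file, a left join keeps every transaction and leaves the identity columns empty where no identity record exists.

```python
import pandas as pd

# Tiny stand-ins for the two tables (all values are made up for illustration).
transaction = pd.DataFrame({
    "TransactionID": [1, 2, 3, 4],
    "TransactionAmt": [50.0, 12.5, 300.0, 7.99],
})
identity = pd.DataFrame({
    "TransactionID": [2, 3],
    "DeviceType": ["mobile", "desktop"],
})

# Left join: keep every transaction; rows without identity info get NaN.
train = transaction.merge(identity, on="TransactionID", how="left")

print(train.shape)
print(train["DeviceType"].isna().sum())  # transactions with no identity record
```

With the real files, the same `merge` call would apply once both tables are loaded into DataFrames.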

Let's take a quick look at the data:

TransactionDT: timedelta from a given reference datetime (not an actual timestamp)
TransactionAmt: transaction payment amount in USD
ProductCD: product code, the product for each transaction
card1-card6: payment card information, such as card type, card category, issue bank, country, etc.
addr: address
dist: distance
P_emaildomain and R_emaildomain: purchaser and recipient email domain
C1-C14: counting, such as how many addresses are found to be associated with the payment card, etc. The actual meaning is masked.
D1-D15: timedelta, such as days between previous transaction, etc.
M1-M9: match, such as names on card and address, etc.
Vxxx: Vesta engineered rich features, including ranking, counting, and other entity relations.
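
Because TransactionDT is only an offset from an undisclosed reference datetime, one way to work with it is to derive relative features such as a day index and an hour of day. The sketch below assumes the offset is measured in seconds, which the description above does not state, and the placeholder start date is purely hypothetical; it is there only to get datetime-style values, not to recover the real dates.

```python
import pandas as pd

# Made-up offsets for illustration.
df = pd.DataFrame({"TransactionDT": [86400, 90000, 172800 + 3600 * 13]})

SECONDS_PER_DAY = 24 * 60 * 60  # assumption: TransactionDT is in seconds

# Relative features that do not need the (unknown) reference datetime.
df["day_index"] = df["TransactionDT"] // SECONDS_PER_DAY
df["hour_of_day"] = (df["TransactionDT"] % SECONDS_PER_DAY) // 3600

# Optional: anchor the offsets to an arbitrary placeholder date.
PLACEHOLDER_START = pd.Timestamp("2000-01-01")  # hypothetical, not the real reference
df["pseudo_datetime"] = pd.to_timedelta(df["TransactionDT"], unit="s") + PLACEHOLDER_START

print(df)
```

If the seconds assumption is wrong, only `SECONDS_PER_DAY` and the `unit` argument need to change.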

Categorical Features:
ProductCD
card1-card6
addr1, addr2
P_emaildomain, R_emaildomain
M1-M9
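
As a rough sketch of how the categorical columns listed above might be prepared, the snippet below collects their names and casts the ones that are present to the pandas category dtype. The toy frame and its values are made up, and the underscore spelling of the email-domain columns is my assumption about the actual column names.

```python
import pandas as pd

# Column names from the list above; ranges are expanded programmatically.
categorical_cols = (
    ["ProductCD", "addr1", "addr2", "P_emaildomain", "R_emaildomain"]
    + [f"card{i}" for i in range(1, 7)]   # card1 .. card6
    + [f"M{i}" for i in range(1, 10)]     # M1 .. M9
)

# Toy frame with only a few of those columns (values are made up).
df = pd.DataFrame({
    "ProductCD": ["W", "C", "W"],
    "card4": ["visa", "mastercard", None],
    "M1": ["T", "F", None],
    "TransactionAmt": [10.0, 25.5, 99.0],
})

# Cast only the categorical columns that actually exist in this frame.
present = [c for c in categorical_cols if c in df.columns]
for col in present:
    df[col] = df[col].astype("category")

print(df.dtypes)
```

Numeric columns such as TransactionAmt are left untouched, and missing values stay missing rather than becoming a category of their own.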

**Identity Table**:

Variables in this table are identity information: network connection information (IP, ISP, proxy, etc.) and digital signatures (UA/browser/OS/version, etc.) associated with transactions.
They're collected by Vesta's fraud protection system and digital security partners.
(The field names are masked, and a pairwise dictionary will not be provided, for privacy protection and contract agreement.)

Categorical Features:
DeviceType
DeviceInfo
id_12-id_38

Some of the data is masked, so we don't know the actual meaning of those variables, while others are quite easy to understand.