Online Shoppers Purchasing Intention -Part 1-
— OVERVIEW —
Predicting whether an online user will generate revenue when they enter the website can be very valuable to the site. In this story, we will start building a revenue predictor using machine learning techniques. The dataset can be found in the UCI Machine Learning Repository under Online Shoppers Purchasing Intention.
— Data —
The dataset consists of 10 numerical and 8 categorical attributes.
UCI Attribute Information:
“Administrative”, “Administrative Duration”, “Informational”, “Informational Duration”, “Product Related” and “Product Related Duration” represent the number of different types of pages visited by the visitor in that session and total time spent in each of these page categories. The values of these features are derived from the URL information of the pages visited by the user and updated in real time when a user takes an action, e.g. moving from one page to another. The “Bounce Rate”, “Exit Rate” and “Page Value” features represent the metrics measured by “Google Analytics” for each page in the e-commerce site. The value of “Bounce Rate” feature for a web page refers to the percentage of visitors who enter the site from that page and then leave (“bounce”) without triggering any other requests to the analytics server during that session. The value of “Exit Rate” feature for a specific web page is calculated as for all pageviews to the page, the percentage that were the last in the session. The “Page Value” feature represents the average value for a web page that a user visited before completing an e-commerce transaction. The “Special Day” feature indicates the closeness of the site visiting time to a specific special day (e.g. Mother’s Day, Valentine’s Day) in which the sessions are more likely to be finalized with transaction.
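To make the “Bounce Rate” and “Exit Rate” definitions above concrete, here is a small sketch that computes both from a toy session log. The sessions and page names are made up for illustration, and the rule "a one-page session counts as a bounce" is a simplifying assumption; Google Analytics works from its own request logs.

```python
from collections import Counter

# Each inner list is one session: the pages viewed, in order (hypothetical data).
sessions = [
    ["home"],                     # enters on home and leaves: a bounce
    ["home", "product", "cart"],
    ["product"],                  # enters on product and leaves: a bounce
    ["home", "product"],
]

pageviews = Counter(page for s in sessions for page in s)   # all views per page
exits = Counter(s[-1] for s in sessions)                    # last page of each session
entrances = Counter(s[0] for s in sessions)                 # first page of each session
bounces = Counter(s[0] for s in sessions if len(s) == 1)    # one-page sessions

# Exit rate: of all pageviews of a page, the share that were last in the session.
exit_rate = {page: exits[page] / pageviews[page] for page in pageviews}

# Bounce rate: of all sessions that started on a page, the share that bounced.
bounce_rate = {page: bounces[page] / entrances[page] for page in entrances}

print("exit rate:", exit_rate)
print("bounce rate:", bounce_rate)
```

Note that the two rates use different denominators: exit rate divides by all views of the page, while bounce rate divides only by the sessions that started on it.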
— Import Data and Analyze —
I loaded the data from my AWS cloud instance.
Host: my server's IP address.
— Data Info —
My target feature is the Revenue column.
As can be seen in the table, the dataset is imbalanced. Before applying machine learning, I need to convert some columns, so I will use Label Encoder. What is Label Encoder? It converts labels into numeric form so that they are machine-readable.
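As a rough illustration of the encoding step, the sketch below applies scikit-learn's LabelEncoder to a few toy rows. The rows are made up, and the choice of which columns to encode is my assumption here, not necessarily the exact set used in the project.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy rows that mimic a few columns of the dataset (values are made up).
df = pd.DataFrame({
    "Month": ["Feb", "Mar", "Nov", "Mar"],
    "VisitorType": ["Returning_Visitor", "New_Visitor", "Returning_Visitor", "Other"],
    "Weekend": [False, True, False, False],
    "Revenue": [False, False, True, False],
})

# Share of each class in the target: a quick way to see the imbalance.
print(df["Revenue"].value_counts(normalize=True))

# Encode the text columns; keep each encoder so codes can be mapped back later.
encoders = {}
for col in ["Month", "VisitorType"]:
    enc = LabelEncoder()
    df[col] = enc.fit_transform(df[col])
    encoders[col] = enc

# Boolean columns only need a cast to 0/1.
for col in ["Weekend", "Revenue"]:
    df[col] = df[col].astype(int)

print(df)
print(list(encoders["Month"].classes_))  # position in this list = assigned code
```

LabelEncoder assigns codes in sorted label order, so the numbers carry no real ordering (here "Feb" comes before "Mar" only alphabetically). The scikit-learn documentation describes LabelEncoder as intended for the target column; for input features, OrdinalEncoder or one-hot encoding is a common alternative.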
Let’s get started by first looking at the data correlation!
The correlation map shows which columns are more important for us. As seen in the correlation map, the traffic type and weekend columns appear to be unnecessary for machine learning.
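Here is a minimal sketch of how such a correlation ranking can be computed with pandas. The data is synthetic (generated so that only one column drives the target), so the numbers say nothing about the real dataset; only the column names are borrowed from it.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for two columns (not the real data).
page_values = rng.exponential(5.0, n)
traffic_type = rng.integers(1, 20, n)

# Synthetic target that depends on page_values only.
revenue = (page_values + rng.normal(0, 2, n) > 8).astype(int)

df = pd.DataFrame({
    "PageValues": page_values,
    "TrafficType": traffic_type,
    "Revenue": revenue,
})

# Pearson correlation of every feature with the target.
target_corr = df.corr()["Revenue"].drop("Revenue")

# Rank by absolute value, since a strong negative correlation is also informative.
order = target_corr.abs().sort_values(ascending=False).index
target_corr = target_corr.reindex(order)
print(target_corr)
```

One caveat: Pearson correlation only measures linear relationships, and for a column like traffic type, whose numbers are category codes, a low value does not by itself prove the column is useless.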
After the correlation map, I wanted to show this graph. The ‘ProductRelated_Duration’ and ‘Page Values’ columns have a very high correlation. To select which columns are important, we can use the SelectKBest function.
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
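A short sketch of how these two imports fit together is shown below. The data is synthetic and k=2 is an arbitrary choice for the example. Keep in mind that chi2 requires non-negative feature values.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
n = 500

# Synthetic, non-negative features (chi2 rejects negative values).
X = pd.DataFrame({
    "PageValues": rng.exponential(5.0, n),
    "ProductRelated": rng.integers(0, 50, n),
    "TrafficType": rng.integers(1, 20, n),
})

# Synthetic target driven by PageValues only.
y = (X["PageValues"] > 8).astype(int)

# Keep the k features with the highest chi-squared score against the target.
selector = SelectKBest(score_func=chi2, k=2)
selector.fit(X, y)

scores = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
selected = list(X.columns[selector.get_support()])

print(scores)
print("selected:", selected)
```

The scores give a ranking, so printing them is usually more informative than looking at the selected column names alone.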
After that, I tried several machine learning techniques. Scores:
Based on these scores, I decided that Random Forest and Decision Tree are good fits for my project.
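For readers who want to try a comparison like this themselves, here is a hedged sketch on a synthetic imbalanced dataset. The data, the 5-fold stratified cross-validation, and the F1 metric are all assumptions for the example; they are not the setup or the scores from this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data: roughly 85% negative, 15% positive.
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=4,
    weights=[0.85, 0.15],
    random_state=0,
)

# Stratified folds keep the class ratio the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # F1 is used instead of accuracy because the classes are imbalanced.
    fold_scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
    results[name] = fold_scores.mean()
    print(f"{name}: mean F1 = {results[name]:.3f}")
```

On an imbalanced target, plain accuracy can look high even for a model that always predicts "no revenue", which is why a metric such as F1 is often worth reporting next to it.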
In Part 2, I will describe Random Forest and Decision Tree in detail.