Published in Analytics Vidhya

Web User Behavior Analyzer

The internet plays a huge role in society nowadays. Almost every day we surf it to find what we need and to help us complete our work. A website, made up of web pages, is the most basic form of internet content that users can visit. There are various types of websites based on their use and purpose, such as commercial sites, social media, education, etc.

The number and variety of a website’s visitors can leave the admin or site owner overwhelmed and unsure of what to do next. This is where the Web User Behavior Analyzer (UBA) comes in: a tool to track, collect, and analyze visitors’ behavior on a given website. The results of this analysis benefit the site admin in several ways, such as finding defects in the site, improving existing features, and checking that information actually reaches site visitors.

Some of UBA services

Building the UBA

This is pretty much my first project related to data analysis. It was done about a year ago during my internship, where I was assigned to a team of 3 people. All of us were quite new to data analysis work, so we decided to do a bunch of research first about the system we were going to make.

After several days of research, we finally came up with our system architecture and workflow. First, we need a website to be analyzed. Then, we need a database to store all of the system’s information. We also need a script that can extract and scrape information from the website and store it in the database. Next, we need a script to fetch data from the database and analyze it. The last part is a dashboard page that contains the analysis results and can be accessed by the user.

UBA System’s Architecture

We decided to use Django, a web framework written in Python, as our code base. We run a live server using this framework, containing all of the system’s architecture above. We use PostgreSQL as our database service and the pandas library for data analysis. Some JavaScript packages are also needed for visualization purposes. Here is a link for this repository in GitHub

Implementing Django

The first thing to do is to install the Django package on our local development machine. I followed the guidelines provided in the official Django documentation. After that, I created a project called ‘analyzer’ using Django’s startproject command and an app in it called ‘webanalyzer’ using the startapp command. Here is how it looks now in the repository.

Repository overview

We also followed the project tutorial provided in the official Django documentation at this link.

These are the default files created when we start a project and an app using Django. The file settings.py contains the configuration for our project; we register our ‘webanalyzer’ app and the PostgreSQL database configuration there. The file urls.py contains the routing to the running application that we can access in the browser. The file manage.py is the default entry point that we use to run the live server.

Inside the webanalyzer directory, there are more files and subdirectories created by default. Here is a snippet of how it looks.

webanalyzer subdirectory overview

In this project, we mostly worked in models.py, urls.py, views.py, static, and templates.

models.py contains our object definitions. We define 3 model classes whose instances will be stored in the database later on; they also determine the table schema in our database. These 3 classes are called Fingerprint, Link, and Behavior. Fingerprint holds the basic information of a visitor: the browser and OS being used, the type of device, the language and timezone, etc. Link helps to track and count the pages our website has. Behavior describes a visitor’s specific behaviour during a certain visit. It contains information such as the IP address (a visitor can access the internet with the same device but a different WiFi connection), timestamps for when they enter and leave a certain page, their cursor movement in the page, etc.

urls.py defines the routing address patterns of the whole application. It also maps each path to a processing function declared in views.py.

As mentioned before, views.py contains all of the processing functions needed. It could be said that this is the most important file of all, since most of our work is done here. static is a directory containing the JavaScript libraries needed to process or visualize our analysis, while templates is a directory containing the HTML files into which we render our analysis.

Database Setup

PostgreSQL is used as our database management system. In order to run this project, we also need to install PostgreSQL on our development machine (because we will run it on localhost). After installation, open pgAdmin and create a database. Give it any name and password you want. Make sure to include them in settings.py as below.

Snippet of database configuration in settings.py
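For reference, a PostgreSQL entry in settings.py generally has the shape sketched below. The engine string is Django’s standard PostgreSQL backend; every other value is a placeholder and should be replaced with the name, user, and password chosen in pgAdmin.

```python
# Fragment of settings.py; all values except ENGINE are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "uba_db",          # placeholder database name
        "USER": "postgres",        # placeholder user
        "PASSWORD": "change-me",   # placeholder password
        "HOST": "localhost",       # the project runs on localhost
        "PORT": "5432",            # PostgreSQL's default port
    }
}
```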

Before running the live server for the first time, the tables need to be migrated to build the schema. Django handles this with just these 2 commands in the terminal.

python manage.py makemigrations
python manage.py migrate

After the migration, empty tables will be created and we can check them in pgAdmin. Next, we can launch the live server on localhost using the following command.

python manage.py runserver

Extract and Scrape

After launching the live server, we can start to test it. We use a random website template that we found on the internet. For demo purposes, we set up 3 pages, each having the same content. Here is how it looks.

Website template

When a visitor opens this page, the system collects the visitor’s data and uploads it to the database. We can see this in urls.py, where the webanalyzer/fingerprint path calls the collect function from views.py. If we take a look at the collect function, we can see that it handles the request according to its method. If it is GET, the system renders the page so the visitor is able to see it. If the request’s method is POST, the system takes all of the information and sends it to the corresponding table in the database. We also need to include several JS scripts (static/fingerprint.js, static/jquery.js, static/client.js) in the HTML template so that the data they gather can later be handled by the collect function.

Here is how the tables in the database look after a visitor enters the website.

Visitor’s table
Behavior’s table
Website’s URL table

The visitor table contains the list of visitors that have visited our website, and the behavior table records all visits to the website. The visitor table stores unique visitors, so in this example all of the records in the behavior table were made by only one visitor. A foreign key in the behavior table connects it to the visitor table and tells us which visitor made each visit.

Analysis

We use the pandas library for Python to help us with the analysis. We chose pandas because it has a lot of methods and functions to process data and it also copes with larger datasets. All of the analysis work is done in views.py. The first thing to do is to fetch the data from the database and convert it into a pandas DataFrame. This is done in the database_to_df function in views.py.
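A conversion step like this can be quite small. The sketch below is one possible shape for it, not the project’s actual function: the signature and the datetime_columns parameter are assumptions. It accepts any iterable of dict rows, which is what a Django call such as Behavior.objects.values() yields.

```python
import pandas as pd


def database_to_df(rows, datetime_columns=()):
    """Turn an iterable of dict rows into a DataFrame.

    `rows` can be a list of dicts or a Django QuerySet.values() result.
    `datetime_columns` (hypothetical parameter) lists columns to parse as dates.
    """
    df = pd.DataFrame.from_records(list(rows))
    for column in datetime_columns:
        if column in df.columns:
            df[column] = pd.to_datetime(df[column])
    return df
```

Inside the project, a call could look like database_to_df(Behavior.objects.values(), ["enter_time", "leave_time"]), assuming those field names exist on the model.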

After the data has been imported as a DataFrame, we do our analysis on it. The result of the analysis is then rendered to a specific HTML page to be visualized. Each data processing function is also mapped to a path in urls.py, so it is executed each time a user hits that path.

One example in views.py is the function that analyses the cumulative visit count and duration across the whole website. It renders the data into the webanalyzer/session_analisis.html template and passes the data in the form of a Python dictionary or JSON.
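The core of such a session analysis can be sketched with a few pandas group-by operations. The column names (visitor_id, url, enter_time, leave_time) and the function name are assumptions for illustration. The function returns a plain dictionary, which is the kind of object a Django view can pass to a template as context.

```python
import pandas as pd


def session_summary(df):
    """Summarise visits per day and average visit duration per URL."""
    df = df.copy()
    df["enter_time"] = pd.to_datetime(df["enter_time"])
    df["leave_time"] = pd.to_datetime(df["leave_time"])
    # Duration of each visit in seconds.
    df["duration_s"] = (df["leave_time"] - df["enter_time"]).dt.total_seconds()
    # Calendar day of each visit, as a string such as "2021-01-01".
    df["day"] = df["enter_time"].dt.strftime("%Y-%m-%d")

    daily_visits = df.groupby("day").size()
    daily_unique = df.groupby("day")["visitor_id"].nunique()
    avg_duration = df.groupby("url")["duration_s"].mean()

    # Convert to built-in types so the result is JSON-serialisable.
    return {
        "daily_visits": {k: int(v) for k, v in daily_visits.items()},
        "daily_unique_visitors": {k: int(v) for k, v in daily_unique.items()},
        "avg_duration_by_url": {k: float(v) for k, v in avg_duration.items()},
    }
```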

Dashboard View

Here is the main menu. It contains 3 viewing options: data visitor, session analysis, and visitor analysis.

Main menu

Data visitor contains cumulative data of all visitors to the website. It displays a demographic analysis of all visitors, like how many use Chrome instead of Firefox, what share uses Windows compared to other OSes, etc. It uses bar charts and pie charts to visualize the report.

Data visitor view
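The numbers behind a pie chart like this are simple shares per category. Here is a minimal pandas sketch; the share_by helper and the browser column name are hypothetical.

```python
import pandas as pd


def share_by(df, column):
    """Return {category: percentage of rows} for one categorical column."""
    shares = df[column].value_counts(normalize=True) * 100
    return {key: round(float(value), 1) for key, value in shares.items()}
```

For example, share_by(visitors, "browser") on a visitor DataFrame would give the percentage of visitors per browser, ready to feed into a charting library.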

Session analysis contains cumulative data on visit count and duration. It presents graphs of daily visit count, average time spent per URL, daily unique visitors, etc.

Session analysis

We can also see the analysis of a specific URL’s page by clicking its URL id. It displays information like which types of HTML element are most commonly clicked on that page.

URL analysis

Visitor analysis contains specific information about a visitor, like their browser, OS, language, and even their first visit to our website. It also reports their visit summary using a bar chart. It can also show their journey by displaying their mouse movement and a click heatmap.

Visitor analysis
Heatmap
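The data behind a heatmap is essentially a count of points per screen region. The sketch below shows one simple way to prepare it, by binning pixel coordinates into square cells. The function name and the 50-pixel cell size are assumptions, and the actual drawing would still be done by a JavaScript library on the dashboard.

```python
from collections import Counter


def bin_points(points, cell_size=50):
    """Count (x, y) pixel positions per square grid cell.

    Returns a Counter keyed by (column, row) cell indices.
    """
    counts = Counter()
    for x, y in points:
        cell = (int(x) // cell_size, int(y) // cell_size)
        counts[cell] += 1
    return counts
```

Cells with higher counts correspond to the hotter areas of the page.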

Fauzan Ragitya
