Web User Behavior Analyzer
The internet plays a huge role in society today. Almost every day we go online to find what we need and to get our work done. A website, made up of web pages, is the most basic thing an internet user can visit. Websites come in many types depending on their purpose, such as commercial sites, social media, and educational sites.
The number and variety of a website’s visitors can leave the admin or owner overwhelmed and unsure what to do next. This is where the Web User Behavior Analyzer (UBA) comes in: a tool to track, collect, and analyze visitors’ behavior on a given website. The results of the analysis benefit the site admin in several ways, including finding defects in the site, improving existing features, and making sure information actually reaches visitors.
Building the UBA
This was pretty much my first project related to data analysis. I did it about a year ago during my internship, when I was assigned to a team of three people. All of us were quite new to data analysis work, so we decided to do a fair amount of research first on the system we were going to build.
After several days of research, we came up with our system architecture and workflow. First, we need a website to analyze. Then we need a database to store all of the system’s information. We also need a script that can extract and scrape information from the website and store it in the database. Next, we need a script that pulls the data from the database and analyzes it. The last part is a dashboard page that presents the analysis results to the user.
The first thing to do is install Django on our local development machine. I followed the guidelines in the official Django documentation. After that, I created a project called ‘analyzer’ with Django’s startproject command and, inside it, an app called ‘webanalyzer’ with the startapp command. Here is how it looks now in the repository.
We also followed parts of the project tutorial in the official Django documentation at this link.
These are the default files created when we start a project and an app with Django. settings.py contains the configuration for our project; we register our ‘webanalyzer’ app and the PostgreSQL database configuration there. urls.py contains the routing for the running application, that is, the URLs we can access in the browser. manage.py is the default command-line entry point, which we use, among other things, to run the development server.
Inside the webanalyzer directory, more files and subdirectories are created by default. Here is a snippet of how it looks.
In this project, we mostly worked in models.py, urls.py, views.py, static, and templates.
models.py contains our object definitions. We define three model classes that will later be stored in the database; Django also uses them to initialize the table schema. The three classes are Fingerprint, Link, and Behavior. Fingerprint holds the basic information about a visitor: the browser and OS in use, the type of device, the language and timezone, and so on. Link tracks the pages of our website, so we can count how many there are. Behavior describes what a visitor did during a particular visit. It contains information such as the IP address (a visitor can go online with the same device over a different WiFi connection), the timestamps at which they entered and left a page, their cursor movement on the page, and so on. Here is a code snippet of models.py
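As a rough illustration of the shape of these three records, here is a minimal sketch written with plain dataclasses instead of Django model classes, so that it runs on its own. In the actual project they would be django.db.models.Model subclasses, with the references expressed as ForeignKey fields. Every field name below is an assumption based on the description above, not the project's real schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple


@dataclass
class Fingerprint:
    """Basic, fairly stable information about one visitor (field names assumed)."""
    visitor_id: str   # hypothetical identifier for the visitor
    browser: str
    os: str
    device_type: str
    language: str
    timezone: str


@dataclass
class Link:
    """One page of the tracked website."""
    url: str


@dataclass
class Behavior:
    """What one visitor did during one visit to one page."""
    fingerprint: Fingerprint   # plays the role of a foreign key to the visitor
    link: Link                 # plays the role of a foreign key to the page
    ip_address: str
    entered_at: datetime
    left_at: datetime
    cursor_path: List[Tuple[int, int]] = field(default_factory=list)

    def duration_seconds(self) -> float:
        """Time spent on the page, in seconds."""
        return (self.left_at - self.entered_at).total_seconds()


# Made-up sample records, only to show how the pieces fit together.
visitor = Fingerprint("v1", "Chrome", "Windows", "desktop", "en-US", "UTC+7")
page = Link("/home")
visit = Behavior(
    fingerprint=visitor,
    link=page,
    ip_address="192.0.2.10",
    entered_at=datetime(2020, 1, 1, 9, 0, 0),
    left_at=datetime(2020, 1, 1, 9, 1, 30),
    cursor_path=[(10, 20), (15, 42)],
)
```

Keeping the visitor and the page as separate records means a single Fingerprint can be shared by many Behavior rows, which is what makes per-visitor analysis possible later.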
urls.py defines the URL routing patterns for the whole application. It maps each path to the processing function declared in views.py that should handle it. Here is a code snippet of urls.py
PostgreSQL is our database management system. To run this project, we need to install PostgreSQL on the machine we deploy to (we run everything on localhost). After installation, open pgAdmin and create a database with any name and password you like. Make sure to include them in settings.py as below.
Before running the server for the first time, the models need to be migrated to create the schema. Django handles this with just these two commands in the terminal.
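A minimal sketch of what that part of settings.py might look like. The DATABASES setting and the django.db.backends.postgresql engine are standard Django; the database name, user, and password are placeholders to replace with whatever you chose in pgAdmin.

```python
# settings.py (fragment)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "analyzer_db",     # placeholder: the database created in pgAdmin
        "USER": "postgres",        # placeholder: your PostgreSQL user
        "PASSWORD": "change-me",   # placeholder: the password you set
        "HOST": "localhost",       # the project runs on localhost
        "PORT": "5432",            # PostgreSQL's default port
    }
}
```

Django also needs a PostgreSQL driver such as psycopg2 installed in the same environment for this engine to work.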
python manage.py makemigrations
python manage.py migrate
After the migration, the empty tables are created, and we can check them in pgAdmin. Next, we can launch the development server on localhost with the following command.
python manage.py runserver
Extract and Scrape
After launching the server, we can start testing. We use a random website template that we found on the internet. For demo purposes, we set up three pages, each with the same content. Here is how it looks.
When a visitor opens a page, the system gathers the visitor’s data and uploads it to the database. We can see this in urls.py, where the webanalyzer/fingerprint path calls the collect function from views.py. Looking at collect, we can see that it handles the request according to its method. If it is GET, the system renders the page so the visitor can see it. If it is POST, the system reads all of the submitted information and saves it to the corresponding table in the database. We also need to include several JS scripts (static/fingerprint.js, static/jquery.js, static/client.js) in the HTML template so that the data they gather in the browser can be sent to the collect function.
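The GET/POST branching can be sketched without Django as follows. This is only an illustration of the control flow described above: the real collect view receives a Django request object and writes to PostgreSQL, while here the method is passed as a string and an in-memory dictionary stands in for the tables. The payload layout (a "fingerprint" part and a "behavior" part) is an assumption.

```python
# Stand-in for the database tables; the real project writes to PostgreSQL.
TABLES = {"fingerprint": [], "behavior": []}


def collect(method, payload=None):
    """Branch on the HTTP method the way the collect view is described."""
    if method == "GET":
        # The real view would render the HTML template here.
        return "render page"
    if method == "POST":
        data = payload or {}
        # Store each part of the posted data in its corresponding table.
        TABLES["fingerprint"].append(data.get("fingerprint", {}))
        TABLES["behavior"].append(data.get("behavior", {}))
        return "saved"
    # Anything other than GET or POST is not handled by this sketch.
    return "method not allowed"


# A visitor first loads the page, then the scripts on it post the data.
first_response = collect("GET")
second_response = collect(
    "POST",
    {
        "fingerprint": {"browser": "Chrome", "os": "Windows"},
        "behavior": {"url": "/home", "ip_address": "192.0.2.10"},
    },
)
```

Handling both methods on one path keeps page delivery and data collection together: the same URL that serves the page also receives what the scripts on that page send back.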
Here is how the tables in the database look after a visitor enters the website.
The visitor table lists the visitors who have come to our website, and the behavior table records every visit to the website. The user table stores each unique visitor only once, so in this example all of the records in the behavior table were made by a single visitor. A foreign key in the behavior table connects it to the user table and tells us which user made each visit.
We use the pandas library for the analysis. We chose pandas because it offers many methods and functions for processing data and also copes with larger datasets. All of the analysis is done in views.py. The first step is to fetch the data from the database and convert it into a pandas DataFrame. This is done in the database_to_df function in views.py, as below.
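A minimal sketch of such a loader, assuming it simply reads a whole table with pandas.read_sql_query. The signature, the table layout, and the sample rows are assumptions, and an in-memory SQLite database stands in for PostgreSQL so the example runs without a database server.

```python
import sqlite3

import pandas as pd


def database_to_df(connection, table_name):
    """Load every row of one table into a pandas DataFrame."""
    # table_name is interpolated directly into the SQL, so it must come
    # from trusted code and never from user input.
    return pd.read_sql_query(f"SELECT * FROM {table_name}", connection)


# In-memory SQLite stands in for the project's PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE behavior (visitor_id TEXT, url TEXT, duration REAL)")
conn.executemany(
    "INSERT INTO behavior VALUES (?, ?, ?)",
    [("v1", "/home", 12.5), ("v1", "/about", 30.0), ("v2", "/home", 7.5)],
)
conn.commit()

behavior_df = database_to_df(conn, "behavior")
```

Inside a Django project, another option is to build the DataFrame from a queryset, for example from the list of dictionaries returned by a model's values() query.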
Once the data has been loaded as a DataFrame, we run our analysis on it. The result is then rendered to a specific HTML page for visualization. Each data processing function is also mapped to a path in urls.py, so it is executed each time a user hits that path. Here is an example of an analysis function in views.py
This function analyzes the cumulative visit count and visit duration across the whole website. It renders the data into the webanalyzer/session_analisis.html template and passes the data as a Python dictionary (or JSON).
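To give an idea of the kind of computation involved, here is a minimal sketch that derives daily visit counts, daily unique visitors, and average duration per URL from a DataFrame of visits, and collects them in a dictionary like the one a view could pass to a template. The function name session_summary, the column names, and the sample rows are all assumptions for illustration.

```python
import pandas as pd

# Made-up visit records; the column names are assumptions.
visits = pd.DataFrame(
    {
        "visitor_id": ["v1", "v1", "v2", "v2"],
        "url": ["/home", "/about", "/home", "/home"],
        "entered_at": pd.to_datetime(
            ["2020-01-01 09:00:00", "2020-01-01 09:05:00",
             "2020-01-01 10:00:00", "2020-01-02 11:00:00"]
        ),
        "left_at": pd.to_datetime(
            ["2020-01-01 09:01:00", "2020-01-01 09:07:00",
             "2020-01-01 10:00:30", "2020-01-02 11:01:30"]
        ),
    }
)


def session_summary(df):
    """Summarize visit counts and durations as a plain dictionary."""
    df = df.copy()
    # Duration of each visit in seconds.
    df["duration_s"] = (df["left_at"] - df["entered_at"]).dt.total_seconds()
    # Calendar day of each visit, as a string so it can serve as a chart label.
    df["day"] = df["entered_at"].dt.strftime("%Y-%m-%d")
    daily = df.groupby("day")
    return {
        "daily_visits": daily.size().to_dict(),
        "daily_unique_visitors": daily["visitor_id"].nunique().to_dict(),
        "avg_duration_per_url": df.groupby("url")["duration_s"].mean().to_dict(),
    }


context = session_summary(visits)
```

In a Django view, a dictionary like context would be handed to render() as the template context, where the charting code on the page can read it.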
Here is the main menu. It contains three views: data visitor, session analysis, and visitor analysis.
Data visitor contains aggregate data on all visitors to the website. It shows a demographic breakdown of visitors, such as how many use Chrome rather than Firefox or what share uses Windows compared to other operating systems. It uses bar charts and pie charts to visualize the report.
Session analysis contains aggregate data on visit counts and durations. It presents graphs of daily visit count, average time spent per URL, daily unique visitors, and so on.
We can also open the analysis of an individual URL by clicking its URL id. It displays information such as which types of HTML element are most commonly clicked on that page.
Visitor analysis contains specific information about one visitor, such as their browser, OS, language, and even their first visit to our website. It also summarizes their visits in a bar chart and can show their journey by displaying their mouse movement and a click heatmap.
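One common way to prepare data for a click heatmap is to divide the page into a grid and count the clicks that fall into each cell. The sketch below shows that idea with the standard library only. It is not necessarily how the project's heatmap is built, and the function name, cell size, and coordinates are made up for illustration.

```python
from collections import Counter


def click_heatmap(clicks, cell_size=50):
    """Count clicks per grid cell; each cell is cell_size x cell_size pixels."""
    grid = Counter()
    for x, y in clicks:
        # Integer division maps a pixel coordinate to its grid cell.
        grid[(x // cell_size, y // cell_size)] += 1
    return grid


# Made-up click coordinates in pixels.
clicks = [(10, 20), (30, 40), (120, 60), (125, 70), (130, 95), (400, 300)]
heat = click_heatmap(clicks)
hottest_cell, hottest_count = heat.most_common(1)[0]
```

The resulting counts can then be mapped to colors on top of a screenshot of the page, with a smaller cell_size giving a finer but sparser heatmap.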