Analyzing Site Navigation Using Google Analytics

Weichen Lu
Data Science is life
4 min read · Apr 2, 2018

Google Analytics is a powerful tool for tracking users’ behaviour and extracting valuable information from it. You can also pull the data out of Google Analytics and use other software to build dashboards. Recently, a client asked me to visualize the path flow of their web pages along with key information such as views, exit rate and average time on page. My client is a leading data science training institution in Toronto that provides both personal and corporate training. My solution was to get the report from Google Analytics in Python by exploring dimensions and metrics, then build dashboards and import the data into Gephi for visualization.

Before moving to the details, I want to mention that we should always have a pipeline for the project in mind. It helps other people follow and understand your process whenever you present your work.

Pipeline

Let’s get started!

Data Collection

The first step was to configure and initialize the API for my client’s account, which I did in Python. After that, I used Python to extract the report from Google Analytics by defining the dimensions and metrics.

Dimensions: ga:previousPagePath, ga:pagePath

Metrics: ga:pageviews, ga:exits, ga:avgTimeOnPage

Explore dimensions and metrics

You can always change the dimensions and metrics to get a different report based on the business requirements. Once you have defined the dimensions and metrics, you can get the report by defining a function in Python.

Function for getting report
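The original function was shown as an image, so here is a minimal sketch of what such a function could look like, assuming the Reporting API v4. The function names, the view ID argument and the default date range are my own placeholders, not the ones used in the project. The `analytics` argument is assumed to be an authorized service object, typically created with `googleapiclient.discovery.build("analyticsreporting", "v4", credentials=...)`.

```python
def build_report_request(view_id, start_date="30daysAgo", end_date="today"):
    """Build a Reporting API v4 request body for page-to-page flow.

    The default date range is an assumption for illustration only.
    """
    return {
        "reportRequests": [
            {
                "viewId": view_id,
                "dateRanges": [{"startDate": start_date, "endDate": end_date}],
                "dimensions": [
                    {"name": "ga:previousPagePath"},
                    {"name": "ga:pagePath"},
                ],
                "metrics": [
                    {"expression": "ga:pageviews"},
                    {"expression": "ga:exits"},
                    {"expression": "ga:avgTimeOnPage"},
                ],
            }
        ]
    }


def get_report(analytics, view_id, **date_range):
    """Send the request; `analytics` is an authorized v4 service object."""
    body = build_report_request(view_id, **date_range)
    return analytics.reports().batchGet(body=body).execute()


def report_to_rows(response):
    """Flatten the first report of a v4 response into a list of dicts."""
    report = response["reports"][0]
    header = report["columnHeader"]
    dimension_names = header["dimensions"]
    metric_names = [
        entry["name"] for entry in header["metricHeader"]["metricHeaderEntries"]
    ]
    rows = []
    for row in report.get("data", {}).get("rows", []):
        record = dict(zip(dimension_names, row["dimensions"]))
        # Metric values arrive as strings, one list per date range.
        record.update(zip(metric_names, row["metrics"][0]["values"]))
        rows.append(record)
    return rows
```

Keeping the request body in its own function makes it easy to swap in other dimensions and metrics without touching the code that calls the API.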

Data Preprocessing

The data I got included some 404 pages, commonly known as error pages. I removed them since they were not useful for this analysis. I also removed some low-traffic pages, since the client was only interested in the path flow of popular pages.

One of the challenges of this task is that there is no dimension that gives you the path of the next page. To deal with this, I treated the previous page path as the starting page and the current page path as the next page.
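The cleaning and relabelling steps can be sketched roughly as follows. This is an illustration in plain Python rather than the data frame code used in the project. The rule for spotting error pages ("404" appears in the path), the `min_pageviews` threshold for "popular" and the output column names are all assumptions of mine.

```python
def preprocess(rows, min_pageviews=10):
    """Drop error pages and low-traffic flows, then relabel the path columns.

    `rows` is a list of dicts keyed by GA dimension and metric names.
    """
    cleaned = []
    for row in rows:
        start_page = row["ga:previousPagePath"]  # treated as the starting page
        next_page = row["ga:pagePath"]           # treated as the next page
        pageviews = int(row["ga:pageviews"])
        # Assumption: error pages can be recognised by "404" in the path.
        if "404" in start_page or "404" in next_page:
            continue
        # Assumption: "popular" means at least `min_pageviews` views.
        if pageviews < min_pageviews:
            continue
        cleaned.append({
            "start_page": start_page,
            "next_page": next_page,
            "pageviews": pageviews,
            "exits": int(row["ga:exits"]),
            "avg_time_on_page": float(row["ga:avgTimeOnPage"]),
        })
    return cleaned
```

Each remaining row then describes one step of the path flow, from `start_page` to `next_page`.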

Part of the data frame

I also added two columns: view rate and exit rate.

View Rate: Indicates the percentage of views from a particular page to the subsequent page

Exit Rate: Indicates the percentage of exits for a particular path flow

Total columns
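One way to compute the two extra columns is sketched below. It reads view rate as the share of a starting page's outgoing views that go to a given next page, and exit rate as exits divided by pageviews for that path flow. Both formulas are my interpretation, not necessarily the exact ones used in the project, and the column names are hypothetical.

```python
from collections import defaultdict


def add_rates(flows):
    """Return a copy of each flow with `view_rate` and `exit_rate` added."""
    # Total views leaving each starting page, used as the view-rate denominator.
    total_from_start = defaultdict(int)
    for flow in flows:
        total_from_start[flow["start_page"]] += flow["pageviews"]

    enriched = []
    for flow in flows:
        total = total_from_start[flow["start_page"]]
        views = flow["pageviews"]
        enriched.append({
            **flow,
            "view_rate": views / total if total else 0.0,
            "exit_rate": flow["exits"] / views if views else 0.0,
        })
    return enriched
```

With this reading, the view rates of all flows that share a starting page add up to 1.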

Data Visualization

Once the data was prepared, it was time to visualize it for analysis. I used Gephi, a tool for network visualization. I highly recommend it if you want to make some fancy network plots.

The software itself is not that complicated and it supports many formats. To import data from CSV files, you need a nodes table and an edges table.

Nodes: Id, Label, (other attributes)

Edges: Source, Target, Weight, (other attributes)

For this task, I saved my data frame as two CSV files, one for nodes and one for edges, then imported them into Gephi.
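The export step could look something like the sketch below, which uses the standard `csv` module instead of a data frame. The flow column names (`start_page`, `next_page`, `pageviews`) and the extra `Views` node attribute are assumptions for illustration. Here a node's views are taken as the sum of the pageviews on its incoming edges, which is one possible choice for sizing nodes.

```python
import csv


def write_gephi_csvs(flows, nodes_path, edges_path):
    """Write a nodes table and an edges table that Gephi can import."""
    # Collect every page once; views = sum of pageviews on incoming edges.
    views = {}
    for flow in flows:
        views.setdefault(flow["start_page"], 0)
        next_page = flow["next_page"]
        views[next_page] = views.get(next_page, 0) + flow["pageviews"]

    with open(nodes_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "Label", "Views"])
        for page, count in views.items():
            writer.writerow([page, page, count])

    with open(edges_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target", "Weight"])
        for flow in flows:
            writer.writerow(
                [flow["start_page"], flow["next_page"], flow["pageviews"]]
            )
```

Using the page path as both Id and Label keeps the two files consistent, since the Source and Target values in the edges table must match node Ids.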

Page path flow graph

The entrance node represents visitors landing directly on a particular page. The label of each node is the link of that page, and the size of the node reflects the number of views. The thickness of each edge represents the volume of flow between the two pages. As we can see from the graph, most people go directly to the main page and then on to the courses page.

I separated three subgraphs from the main graph in order to see the details.

Courses

Courses path flow

Course-schedule

Course-schedule path flow

Blog

Blog path flow

Data Analysis

Based on the above graphs, I can draw several conclusions:

Important pages: Main page, Courses page, Course-schedule page, Learning path page

Common path flow: Main -> Courses -> Course-schedule -> learning path

Popular courses: Python for data science, Data science tool box, Data science bootcamp, machine learning

Recommendation

I made the following suggestions to my client to help them stay competitive and stand out in the market.

Update the blog on a regular basis to make it more attractive.

Make each section of the learning path clickable so that visitors don’t have to go back to the courses page.

That is pretty much all about my second project.

Thanks for reading.
