Zentel Network Service Center: An Analysis of Ticket Data

Neto Anyama
6 min read · Nov 17, 2022


In October, specifically on the 5th and 8th, I competed in Datafest Africa’s Data Hackathon. I participated as a member of a team, Analytical Minds.

The competition was a three-part contest. The first part involved making a machine learning model and performing exploratory data analysis. The second part, which was reserved for the top 25 teams from the first part, involved wrangling, modelling and visualizing a dataset using PowerBI only. The last part, for the top 5, was the presentation of the reports created.

As David Abu and the other organizers said, the aim of the hackathon was to enable each participant to add a new project to their portfolio. With this in mind, I decided to do the visualization in the second stage of the contest on my own. And so, in this article, I will be guiding you through each step I took.

I carried out an analysis of Zentel Network Service Center’s ticket data, which covered the period between the 1st and 31st of December, 2020. The dataset contained 5,998 entries across 12 columns/features, including Report ID, channel and operator, among others.

Data was wrangled using Python and Power Query, while visualization was done using PowerBI.

METHODOLOGY

Analysis of the data was done using the following steps:

  • Data Wrangling: Gathering, Assessing and Cleaning
  • Data Modelling
  • Data Visualization

WHAT DID I WANT TO KNOW?

Before diving into the process, I posed questions to be answered, most of which were provided in the brief received from the datathon organizers.

A. What was the general performance of the service center and its staff? In other words:

  1. How many tickets were opened?
  2. From which 5 states were the most complaints received?
  3. What channels were used most in lodging complaints?
  4. What were the main faults?
  5. Who were the quickest responding operators?
  6. Who were the quickest resolving operators?

B. The board wanted to keep ticket response time below 10 seconds, on average. What were the factors preventing this?

C. How good was staff performance? Who were the best and worst performing managers and operators?

DATA WRANGLING

Wrangling a dataset involves three main steps: gathering, assessing and cleaning. To gather the data, I downloaded the dataset from this link, which had already been provided. The Excel file obtained was, at the time, partially modelled (discussed in the next section), with the fact and dimension tables already prepared. Afterwards, I extracted the fact table into a separate .csv file, which was used for wrangling in Python.
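
The extraction itself was a simple read-and-save. Below is a minimal sketch of that step; the file name and sheet name are assumptions, since they depend on how the workbook was saved.

```python
import pandas as pd

# File and sheet names are assumed here; adjust them to match the downloaded workbook.
workbook_path = "zentel_service_center.xlsx"
fact_table = pd.read_excel(workbook_path, sheet_name="Fact Table")

# Save the fact table on its own so it can be wrangled separately in Python.
fact_table.to_csv("fact_table.csv", index=False)
```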

Assessing involves using visual (observation) and programmatic (code-based) methods to identify issues with the dataset. On assessment, I noticed the dataset was cleaner than most and wasn’t going to require a tedious cleaning process. However, I noted the following issues:

  • Fault Type had missing values: the column contained only 5,472 entries, while every other column had 5,998.
  • The column labels were written as capitalized titles with spaces instead of underscores, which would make them awkward to reference in further work.
  • Ticket open time, ticket response time, issue response time and ticket close time needed to be converted to the datetime datatype.
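
These issues surfaced through a couple of standard programmatic checks in pandas; a rough sketch of what that looked like, using the fact table extracted earlier:

```python
import pandas as pd

tickets = pd.read_csv("fact_table.csv")

# Column names, non-null counts and datatypes in one view — this is where the
# 5,472 non-null Fault Type entries (against 5,998 everywhere else) and the
# object-typed time columns show up.
tickets.info()

# Missing values per column, to confirm Fault Type is the only affected column.
print(tickets.isna().sum())
```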

To clean this data, I adjusted the labels, filled the missing values in Fault Type with ‘Unknown’ and changed the erroneous datatypes, in that order.

The cleaned data was then saved into another .csv file and appended to the master dataset.
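
A minimal sketch of those three cleaning steps in pandas is shown below; the exact column labels are assumptions based on the descriptions above.

```python
import pandas as pd

tickets = pd.read_csv("fact_table.csv")

# 1. Adjust the labels: lower-case them and swap spaces for underscores.
tickets.columns = tickets.columns.str.lower().str.replace(" ", "_")

# 2. Fill the missing Fault Type values with 'Unknown'.
tickets["fault_type"] = tickets["fault_type"].fillna("Unknown")

# 3. Convert the time columns to the datetime datatype.
time_columns = [
    "ticket_open_time",
    "ticket_response_time",
    "issue_response_time",
    "ticket_close_time",
]
for column in time_columns:
    tickets[column] = pd.to_datetime(tickets[column])

# Save the cleaned data into another .csv file.
tickets.to_csv("fact_table_clean.csv", index=False)
```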

DATA MODELLING

An article on Microsoft PowerBI’s site defines data modelling as the process of analyzing and defining all the different data your business collects and produces, as well as the relationships between those bits of data. This step was extremely important for this dataset because it made the analysis a lot easier, clearer and less error-prone.

As I said in the previous section, the data was already partially modelled. The dimension tables (Channel Type, Service Type, Employee, Fault Type and Location) had already been defined in the master dataset.

I loaded the data into Power Query in PowerBI to transform it. On my first attempt at modelling, all the connections had a many-to-many cardinality. To correct this, I went back into Power Query and made the following corrections:

  1. The column headers were renamed to the original format.
  2. All irrelevant rows and columns, especially in the dimensions tables, were dropped.
  3. A new column, Service Code, was created by duplicating Report ID and splitting the duplicate column by its right-most delimiter.
  4. Following on from the previous point, duplicate rows in the Service Type table were dropped.

However, two key columns were missing from the fact table: Response Duration (the amount of time taken to respond to a complaint) and Resolution Duration (the amount of time taken to fix the issue raised). Using the DAX function DATEDIFF, these columns were calculated from the existing timestamp columns.
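
In Power BI these were DATEDIFF calculated columns. Purely to make the logic explicit, here is the equivalent calculation sketched in pandas rather than DAX, with the column names and the exact endpoints of Resolution Duration assumed.

```python
import pandas as pd

time_columns = ["ticket_open_time", "ticket_response_time", "ticket_close_time"]
tickets = pd.read_csv("fact_table_clean.csv", parse_dates=time_columns)

# Response Duration: seconds between a ticket being opened and first responded to.
tickets["response_duration_secs"] = (
    tickets["ticket_response_time"] - tickets["ticket_open_time"]
).dt.total_seconds()

# Resolution Duration: taken here as open-to-close; the exact endpoints depend
# on the DATEDIFF definition used in the actual report.
tickets["resolution_duration_secs"] = (
    tickets["ticket_close_time"] - tickets["ticket_open_time"]
).dt.total_seconds()
```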

After this, the connections between the fact table and the dimension tables were made.

The Completed Model

Finally, I was ready to move on to the next stage: Data Visualization.

DATA VISUALIZATION

Visualization of this data was done using PowerBI. To reiterate, the questions I set out to answer were listed in an earlier section of this article. To derive insights for cases such as the best and worst performing operators, I chose standard deviation, rather than the usual average, as my measure of consistency. Average values alone do not capture how consistent a member of staff is; standard deviation measures how far each person’s performance strayed from the mean; literally, the deviation from the mean.

Furthermore, to assess staff performance, I based my insights only on tickets with the Completed status. You wouldn’t track productivity with incomplete work, right?
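
To make that concrete, the aggregation behind the operator comparisons looks roughly like the pandas sketch below; the status, operator and duration column names are assumptions carried over from the earlier sketches.

```python
import pandas as pd

# Assumes the duration columns calculated earlier have been saved back to the file.
tickets = pd.read_csv("fact_table_clean.csv")

# Staff performance is judged only on tickets with the Completed status.
completed = tickets[tickets["status"] == "Completed"]

# A low standard deviation means an operator's response times cluster tightly
# around their mean, i.e. the operator is consistent; the mean is kept for context.
consistency = (
    completed.groupby("operator")["response_duration_secs"]
    .agg(["mean", "std"])
    .sort_values("std")
)
print(consistency.head())
```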

Some of the insights derived are as follows.

A. General Performance:

  1. Complaints received spiked between the 29th and 31st of the month.
  2. Delta State recorded the most complaints, with 420 reports.
  3. Social Media, Webform and USSD were the commonest channels used to lodge complaints.

B. Ticket Response Time Analysis: Some of the main factors causing ticket response time to be greater than 10 seconds were as follows:

  1. Day of the Month: Most of the tickets were opened between the 29th and the 31st, and response times on those days were much slower.
  2. Channel Used: Complaints made via Social Media took longer to attend to than those from other channels.

C. Staff Performance:

  1. Victor and his team completed 2,485 tickets, the most of any manager’s team.
  2. It was observed that Manager Wale consistently had the lowest response and resolution times.

All others can be found in the complete interactive dashboard, linked here.

RECOMMENDATIONS

  1. More operators should be employed, possibly as contract staff, to handle the spike in complaints logged between the 29th and 31st of the month.
  2. Following on from the first point, a manager like Victor may need more operators under him, as his team clearly receives the most complaints.
  3. An automated system could be developed to handle late night complaints.
  4. Best performing operators and managers should be recognized and rewarded adequately to improve the overall work ethic of staff.
  5. Poorly performing operators should be looked into, to find out the reasons for their current performance and ways to improve it.

FINAL WORDS

Even though I didn’t win this, I am grateful for the opportunity to have participated in Datafest’s maiden datathon. It was my first attempt at any hackathon and I am happy I did it. Going forward, I am open to collaborations on other projects and hackathons. Corrections on this project are welcome and appreciated.

Till my next article, ¡adiós!

PS: Follow me on Twitter and connect with me on LinkedIn. All the code used and the link to the interactive report are stored in the Zentel Analysis repository on my GitHub.
