My tryKibo Final Project: The beginning of my analytics journey

Ogunjirin Oluwasotun Goodness
Jun 27, 2024


I am happy to have completed my data analysis course with tryKibo, having embarked on my AI/ML journey. I had the opportunity to join a 5-week bootcamp on data analysis hosted by Kibo School. The whole learning phase was fun for me, from the live sessions to the assignments and weekly projects, down to the final project, which I talk about in the rest of this post. We were introduced to a range of tools, from spreadsheets to SQL and databases, as well as data visualization with Google Looker Studio. I also learnt the basics of Machine Learning and Artificial Intelligence before the final project. For the final project, I analyzed a Kaggle dataset on electric power consumption in Tetouan, Morocco, for the year 2017 (Find the dataset here). The main aim of this project was to study the trend of power consumption in different zones of Tetouan while also considering the weather conditions.

The tools I used in this analysis were Google Sheets and Google Looker Studio. The dataset has 9 columns and 52,417 rows spanning 01-Jan-2017 to 30-Dec-2017, with date and float data types. The analysis also showed that in 2017 the average temperature in Tetouan was 18.8 degrees Celsius, the average humidity was 68.3%, and the average wind speed was 2.0 meters per second.
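For readers who would rather compute these summary figures in code than with a spreadsheet AVERAGE formula, here is a minimal Python sketch. It assumes the data is exported as CSV with columns named "Temperature", "Humidity" and "Wind Speed"; those names and the three sample rows are illustrative assumptions, not the real dataset.

```python
import csv
import io
from statistics import mean

# Illustrative sample only: column names are assumed and values are made up.
SAMPLE_CSV = """DateTime,Temperature,Humidity,Wind Speed
1/1/2017 0:00,10.0,70.0,2.0
1/1/2017 0:10,20.0,60.0,1.0
1/1/2017 0:20,30.0,80.0,3.0
"""

def column_averages(csv_text, columns):
    """Return the mean of each named numeric column, rounded to 1 decimal place."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {col: round(mean(float(row[col]) for row in rows), 1) for col in columns}

averages = column_averages(SAMPLE_CSV, ["Temperature", "Humidity", "Wind Speed"])
print(averages)  # {'Temperature': 20.0, 'Humidity': 70.0, 'Wind Speed': 2.0}
```

Pointing `column_averages` at the full export (read from a file instead of the inline string) would give the dataset-wide averages.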

The steps I used in this analysis include:

Cleaning: There was little cleaning to do for this dataset, as most of the data was already in the correct format. The remaining tasks were to check for blanks and for rows with an incorrect format, and to get a quick overview of some of the columns. Using the filter feature in Google Sheets, I checked all 9 columns and found no blank values or incorrectly formatted data. After that, I changed the format of the date column from MM/DD/YYYY to DD/MM/YYYY, which I find easier to read.
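The same two cleaning checks (looking for blank cells and switching dates from month-first to day-first) can be sketched in Python. This is not the workflow used here, which was done with filters and formatting in Google Sheets. It assumes a "DateTime" column holding strings such as "1/13/2017 0:10"; the column names and sample rows are hypothetical, and one blank cell is planted on purpose so the check has something to find.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample: one Temperature cell is deliberately left blank.
SAMPLE_CSV = """DateTime,Temperature,Zone 1
1/1/2017 0:00,10.0,30000
1/13/2017 0:10,,29000
"""

def find_blank_cells(rows):
    """Return (row_index, column) pairs for missing or whitespace-only cells."""
    return [
        (i, col)
        for i, row in enumerate(rows)
        for col, value in row.items()
        if value is None or value.strip() == ""
    ]

def to_day_first(value, src="%m/%d/%Y %H:%M", dst="%d/%m/%Y %H:%M"):
    """Parse a month-first timestamp string and rewrite it day-first."""
    return datetime.strptime(value, src).strftime(dst)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
blanks = find_blank_cells(rows)
for row in rows:
    row["DateTime"] = to_day_first(row["DateTime"])

print(blanks)                # [(1, 'Temperature')]
print(rows[1]["DateTime"])   # 13/01/2017 00:10
```

One design difference worth noting: changing a date format in Google Sheets only changes how the underlying date value is displayed, whereas this sketch rewrites the text itself.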

A quick check through the data
Filtering the data and checking for blanks
Checking for the summary of a column

Visualisation: I visualized the data using Google Looker Studio. After linking the dataset from Google Sheets, I cross-checked the columns to make sure they kept the same data types before generating the charts. The charts I used include scorecards, column charts, and time series charts. I also added a date range control so the data can be viewed for a specific time frame, which makes the dashboard fully interactive and dynamic. Here is the link to the visualization if you wish to check it out.
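Conceptually, a date range control filters the rows before each chart aggregates them. The Python sketch below imitates that for a per-zone total, roughly what a scorecard shows. It is a simplified model, not how Looker Studio is implemented; the zone labels and readings are illustrative assumptions.

```python
from datetime import date, datetime

ZONES = ("Zone 1", "Zone 2", "Zone 3")

# Illustrative readings only (not from the real dataset):
# (timestamp, zone 1, zone 2, zone 3 consumption)
READINGS = [
    (datetime(2017, 1, 1, 0, 0), 30000.0, 20000.0, 18000.0),
    (datetime(2017, 6, 15, 12, 0), 35000.0, 22000.0, 19000.0),
    (datetime(2017, 8, 10, 18, 0), 40000.0, 25000.0, 24000.0),
]

def totals_in_range(readings, start, end):
    """Sum each zone's consumption for readings dated within [start, end], inclusive."""
    totals = {zone: 0.0 for zone in ZONES}
    for ts, *values in readings:
        if start <= ts.date() <= end:
            for zone, value in zip(ZONES, values):
                totals[zone] += value
    return totals

summer = totals_in_range(READINGS, date(2017, 6, 1), date(2017, 8, 31))
print(summer)  # {'Zone 1': 75000.0, 'Zone 2': 47000.0, 'Zone 3': 43000.0}
```

Changing `start` and `end` plays the role of moving the date control: every total is recomputed from the filtered rows.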

The Dashboard
Date range control on the dashboard
A glimpse of the dashboard’s interactivity and dynamism

After the analysis, I found that Zone 1 consumed the most power over the year, while Zone 3 consumed the least. Power consumption was also high around August across all the zones and low around January and December. This prompted me to look at power consumption by quarter, and the results showed that total power consumption was highest in the third quarter and lowest in the fourth quarter. In addition, the general diffuse flow trend, which I treated as a key indicator of power leakage, suggests that power leakage may have been quite high around June. So, three key takeaways from this are:

Power consumption appears to be highest in Q3 (July to September) and lowest in Q1 (January to March), and there is a general upward trend throughout the year.

Power consumption in the other zones seems to follow a pattern similar to Zone 1's, with a peak in Q3 and a low point in Q1. However, the fluctuations seem to be less pronounced than in Zone 1.

Power consumption in Zone 3 appears to be more consistent throughout the year, with a slight increase in Q3.
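The quarterly comparison behind these takeaways comes down to mapping each date to its calendar quarter and summing. In Google Sheets a helper column such as =ROUNDUP(MONTH(A2)/3, 0) would give the quarter (A2 being a hypothetical date cell). A minimal Python sketch of the same idea follows; the readings are illustrative assumptions, not results from the dataset.

```python
from collections import defaultdict
from datetime import datetime

# Illustrative (timestamp, total consumption) pairs, not real data.
READINGS = [
    (datetime(2017, 2, 1), 100.0),
    (datetime(2017, 5, 1), 120.0),
    (datetime(2017, 7, 15), 90.0),
    (datetime(2017, 9, 30), 60.0),
    (datetime(2017, 11, 1), 110.0),
]

def quarter_of(ts):
    """Calendar quarter: Jan-Mar -> 1, Apr-Jun -> 2, Jul-Sep -> 3, Oct-Dec -> 4."""
    return (ts.month - 1) // 3 + 1

def quarterly_totals(readings):
    """Sum consumption per calendar quarter, returned in quarter order."""
    totals = defaultdict(float)
    for ts, value in readings:
        totals[quarter_of(ts)] += value
    return dict(sorted(totals.items()))

totals = quarterly_totals(READINGS)
peak = max(totals, key=totals.get)    # quarter with the highest total
trough = min(totals, key=totals.get)  # quarter with the lowest total
print(totals, peak, trough)  # {1: 100.0, 2: 120.0, 3: 150.0, 4: 110.0} 3 1
```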

With these findings, it is possible to make informed decisions on whether to increase power production in particular zones, and to trace the causes of low and high power consumption as well as of potentially high power leakage during specific periods. Weather conditions may also have affected power consumption, but I did not visualize this because of the huge disparity in scale between the electric power consumption values and the weather metrics, which made it impossible to show all the selected metrics meaningfully on the same graph. To visualize this, I would need to convert the power consumption values into a more compact form, which is why I left it out, but it may be a useful avenue for anyone else hoping to work on this data. Suggestions and other perspectives are welcome, and I will be posting more analyses in the coming days, so stay tuned.
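On the scale problem mentioned above: one common way to put power consumption and weather metrics on the same chart is to rescale them first. The sketch below shows two options, dividing power by 1,000 (a "compact form") or min-max scaling every series to the 0 to 1 range. Min-max scaling is a swapped-in technique, not something used in this analysis, and the sample values are illustrative.

```python
def to_thousands(values):
    """Express values in thousands, e.g. 20000.0 -> 20.0."""
    return [v / 1000 for v in values]

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

power = [20000.0, 30000.0, 40000.0]   # illustrative consumption values
temperature = [10.0, 15.0, 20.0]      # illustrative temperatures

print(to_thousands(power))          # [20.0, 30.0, 40.0]
print(min_max_scale(power))         # [0.0, 0.5, 1.0]
print(min_max_scale(temperature))   # [0.0, 0.5, 1.0]
```

Min-max scaling makes shapes and peaks directly comparable but discards the original units, so axis labels would need to say the values are normalized.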
