Excel to SQL to PowerBi Portfolio Project

Michelle Wanderi
7 min readJun 13, 2024

--

In this project, we are pulling data from Kaggle, exploring and analysing it in Excel, cleaning and testing it in SQL, and visualizing it in Power BI.

The aim is to find the top Youtubers in the UK for a marketing company so that they pick the creator with the highest subscriber count who will help them in promoting their businesses.

So what will be our key solution?

To create a dashboard that provides insights into the top UK YouTubers in 2024 that includes their

  • subscriber count
  • total views
  • total videos, and
  • engagement metrics

This will help the marketing team make informed decisions about which YouTubers to collaborate with for their marketing campaigns.

Project Stages

  1. Get the requirements from the user- this is what the user wants to see in the dashboard
  2. Design the dashboard- this will be just a rough outline of what we want our final product will look like
  3. Collect data from the source- in our case, Kaggle
  4. Explore the data and its content- this includes scanning the fields and its properties
  5. Clean the data
  6. Test the data- here we run data quality checks to make sure there are no issues.
  7. Visualize the data
  8. Justify and record our findings.
  9. Write the documentation + commentary
  10. Publish the data to GitHub Pages

User requirements

The objectives and user requirements have already been identified now we can proceed to getting and preparing the data

Data Collection

The dataset is provided by Kaggle which contains the Top 100 influencers of most countries on Instagram, Youtube, Tiktok and Threads. Our main focus will be on Youtube. The dataset is in the form of an Excel file. Here is a brief overview of the dataset.

Data Exploration

This is the stage where you have a scan of what’s in the data, errors, inconsistencies, bugs, weird and corrupted characters etc

  • What are your initial observations with this dataset? What’s caught your attention so far?
  1. There are at least 4 columns that contain the data we need for this analysis, which signals we have everything we need from the file without needing to contact the client for any more data.
  2. The first column contains the channel ID with what appears to be channel IDS, which are separated by a @ symbol — we need to extract the channel names from this.
  3. Some of the cells and header names are in a different language — we need to confirm if these columns are needed, and if so, we need to address them.
  4. We have more data than we need, so some of these columns would need to be removed

Now we load the Excel file to the SQL Server, in this case Excel is a flat file. So we’ll need to create a new database using the CREATE DATABASE command then we’ll import the flat file. This should be the final look.

Data cleaning

The aim is to refine our dataset to ensure it is structured and ready for analysis.

The cleaned data should meet the following criteria and constraints:

  • Only relevant columns should be retained.
  • All data types should be appropriate for the contents of each column.
  • No column should contain null values, indicating complete data for all records.

These are steps that are needed to clean and shape the data into the desired format.

  1. Remove unnecessary columns by only selecting the ones you need
  2. Extract Youtube channel names from the first column
  3. Rename columns using aliases

The column NOMBRE which contains the channel names and the channel ids separated by @, so we’ll use CHARINDEX and SUBSTRING function to separate the channel names from channel ids and removing the @ character.

The CHARINDEX function is used to find the position of the @ symbol then using the SUBSTRING function removes it from the string by subtracting one.

We then have to make sure that the channel name column only accepts string characters, this will be done by using CAST and specifying it as VARCHAR

Next step will be creating a view which will contain the data we specifically want the users to see on PowerBi.

This is how the query looks.

The view looks like this :

Now we have the neccessary and relevant columns to proceed with running data quality checks. This is to ensure that we have complete and accurate data.

Data Quality Checks

Data analysts must ensure that they are presenting high quality data that will enable accurate insights to be derived.

We will perform 4 data quality checks that will prove that our data is valid. They include:

The queries of the checks :

The output of the checks:

Since all the checks have passed we can now load our data to PowerBI for visualization.

Data Visualization

You’ll need PowerBI Desktop installed for this next step.

Now we import the data which is the view we created to PowerBI

On Data Tab on the right we can see that the data has been successfully imported.

Now, we’ll create measures which will help us come up with our visualizations. A measure is a column that summarizes a group of records such as total subscribers.

We are going to use DAX code to create measures for all the components we want in our visualization

This is the first measure which sums up all the subscribers we have in our view.

The second measure which sums up all the views we have in our view

The third measure which sums up the total number of videos.

The fourth measure which averages the total number of views per video.

The fifth measure which finds the subscriber engagement rate

Now we have all our measures now we can proceed with building visuals starting with the table which contains the fields we need. Our columns contains all the totals so it will be easier to represent the data.

Then we proceed to a tree map which will help in visualizing top 10 youtubers with the most views. If you hover over the blocks you’ll see the total number of views, total subscribers and total views.

The next visualization will be score cards.

Lastly we’ll need the clustered bar chart which will filter to top 10 Youtubers by subscriber count.

Now all the visualizations are loaded into the dashboard so now we format the dashboard — this includes renaming the fields, titles and adding titles where they are needed

Also changing the color scheme to make it look more appealing. Avoid a lot of colors which cause visual clutter.

This is our final product :

Findings and Recommendations

  • Findings

We discovered that

  1. NoCopyrightSOunds, Dan Rhodes and DanTDM are the channnels with the most subscribers in the UK
  2. GRM Daily, Man City and Yogscast are the channels with the most videos uploaded
  3. DanTDM, Dan RHodes and Mister Max are the channels with the most views
  4. Entertainment channels are useful for broader reach, as the channels posting consistently on their platforms and generating the most engagement are focus on entertainment and music

Recommendations

  1. Dan Rhodes is the best YouTube channel to collaborate with if we want to maximize visbility because this channel has the most YouTube subscribers in the UK
  2. Although GRM Daily, Man City and Yogcasts are regular publishers on YouTube, it may be worth considering whether collaborating with them with the current budget caps are worth the effort, as the potential return on investments is significantly lower compared to the other channels.
  3. Mister Max is the best YouTuber to collaborate with if we’re interested in maximizing reach, but collaborating with DanTDM and Dan Rhodes may be better long-term options considering the fact that they both have large subscriber bases and are averaging significantly high number of views.
  4. The top 3 channels to form collaborations with are NoCopyrightSounds, DanTDM and Dan Rhodes based on this analysis, because they attract the most engagement on their channels consistently.

Publishing

More details about the project are found in my Github. Here is the link to the repository.

https://github.com/Michelle-Wanderi/Excel-to-SQL-to-PoweBI-Portfolio-Project

Conclusion

Working in this project has helped me learn master SQL and PowerBI concepts and learn DAX, writing the documentation of a project from start to end and deploying on Github pages.

Feel free to provide feedback, like and share

Follow me for more!

You can find me on LinkedIn and Github.

--

--

Michelle Wanderi

A student doing Data Science. Learning data analysis and data engineering