As part of the She Code Mentorship Program, Data Science Track, I had the opportunity to learn Power BI. Microsoft Power BI is one of the leading data visualization tools, which aims to provide easy capabilities for creating interactive reports and dashboards.
Power BI has various components that allow you to perform different functionalities. The Power BI components include Power BI Desktop, Power BI Service, Power BI Gateway, Power BI Pro, and Power BI Mobile App.
Power BI Components & Tools
The Power BI Desktop version which I will be using in this article to share my experience is 100% free. It allows you to perform data analysis, create interactive visualizations and reports. However, the limitation of using this version is on collaboration and creating dashboards. For you to share reports, create dashboards or collaborate with other team members, you have to make use of the Power BI Service (cloud service component).
Power BI Desktop allows for connectivity from different sources of data with a wide range of file formats.
The Global Suicide Rate Dataset was the mini project is a CSV file with 12 columns; country, year, sex, age, suicide-no, population, suicide/100k pop, country-year, HDI for year, gdp_for_year ($), gdp_per_capita ($), and generation.
Power BI Desktop has three views:
- Report view (default): enables you to create reports and visuals
- Data view: allows you to see the data being used in the data model, associated with the report.
- Relationships view: shows the relationship view of the data model and allows you to manage relationships among the tables.
After importing the dataset, the next was cleaning and transform it to a presentable form. Power BI Desktop has a powerful feature for this, called Power BI Query Editor that allows you to perform this functionality before loading the data to Power BI.
Functions performed on the dataset:
- Removing columns
Used the RIGHT() function to extract the year from the country_year data. Here, I discovered that the data from the country_year column was a merged column for the country and year columns. I did not require the column, hence dropped it from the dataset.
- Replacing values
HDI for year had blank rows. Using the Replace Value option, I replaced all the null values in the column with 0.
- Custom column
Some data in the generation column did not align with the age column.
Created a custom column AgeGeneration Group in the dataset to match generation with the correct age range, using the DAX Nested IF function.
DAX (Data Analysis Expression) is the language for creating calculations in Power BI.
Think of DAX as Excel functions and formulas.
In a real-world scenario, the age group 5–14 years are “Generation Alpha”, while 88+ years are considered the “G.I Generation”. In my data, since “Generation Alpha” was not in the initial data set, they were factored as “Generation Z”. The “G.I Generation” was combined with “Silent” as 75+ years suited their age range.
#AgeGeneration Group calculated columnAgeGeneration Group = IF(master[age] ="5-14 years", "Generation Z", IF(master[age] = "15-24 years", "Generation Z", IF(master[age] = "25-34 years", "Millenials",IF(master[age] = "35-54 years", "Generation X", IF(master[age] = "55-74 years","Boomers", IF(master[age] = "75+ years", "Silent/G.I Generation"))))))
4. Sort by column
Power BI allows you to sort specified data in ascending or descending order, and by their data fields.
I am ordering the suicide rate by age, based on descending order. However, if you look at the chart above the age range, 5–14 years is wrongly ordered. To make it in the order of 75+ years, 55–74 years, 35–54 years, 25- 34 years, 15–24 years, and 5–14 years, or vice versa, we have to sort our column based on another column.
Steps: Used Enter data option to create a Sort Age table. The new table should have the same values as the original column you want to sort. Once you have created the age column, you create an order column to indicate the order in which the age range should be ranked.
Once the sorting table is created, you create a relationship between the sort age table and the master table. A Many-to-One relationship will be created.
If you want to bring the age column from the sort age table to the original master table, you can create a calculated column using the RELATED () function below. In my case, I decided to retain the new table as is.
// New Column in master table
Now, when sorting the suicide_no column by the new age sort column, it is well ranked.
Power BI Reports
This functionality allows you to visualize and present different findings and insights from the dataset. The report can comprise one or more pages of different interrelated visuals.
In these reports, you are also able to drill down, filter, and use DAX functions to provide more insights.
Power BI offers visualization options with different functionalities. They include column & bar chart, stacked bar/column chart, clustered bar/column chart, 100% stacked bar/column chart, Combo chart, Line chart, Area chart, Pie chart, Donut chart, Treemap, Maps, Card, Multi-row card, KPI, Gauge Chart, Table and Matrix.
Global Suicide Rate Analysis
The United States has the highest population. The bigger the circle, the higher the population in that particular country.
Overview of Global Suicide Rates
Russian Federation with a population of 4 billion people, records the highest number of suicide rates with a majority aged between 35–54 years (Generation X).
Suicides by GDP/Year, GDP/Capita, HDI for Year
Countries with the highest GDP per Capita, HDI, and GDP per year have a higher population and higher suicide rate.
Suicides by Sex
More males than females are committing suicide over the years. However, there is a drop in suicide numbers from the year 2011, with a significant reduction in suicide rate in both genders in 2016.
Suicide by Population
Highly populated countries have higher suicide rates, while countries with low populations have lower suicides occurring.
Suicides by Age
In the charts here, the suicide rate is broken down by age, generation, year, and top 10 populated countries. Globally, the suicide rates are high in people aged 35–54 years (Generation X).
In the chart here we see the number of suicide deaths per 100k people in a given population. The numbers are broken down by sex, age, and top 10 countries by suicide numbers. As per the chart, there are more deaths by males and 75+-year-olds, due to suicide.
This being my first attempt using Power BI, I can gladly affirm that it is indeed user-friendly. I like the fact that it is interactive in the creation of visualizations and reports, due to the drag and drop feature.
The other thing is, with Power BI you can integrate and run your R or Python Scripts and obtain visualizations. You can read all about it here. As a Pythonista, I cannot wait to dive into this feature.