Analytics Vidhya
Published in

Analytics Vidhya

Building & Publishing A MS PowerBI COVID-19 Analytics Dashboard Online

Test out the dashboard for yourself here at the link below:-

(Jan 2021 Edit Note that the chart no longer displays “live” data as the ECDC switched from daily to weekly for COVID-19 reporting and discontinued daily updates from 14 December effectively ‘breaking’ the auto-update rules I had set up. Therefore until I figure out how to ‘merge’ historical daily and weekly data , the dashboard only shows 2020 data)

TLDR Version:

Power BI is a great tool with a bunch of useful ‘out of the box’ features that should fit most standard data visualization and analytics tasks. A big plus is the friendly learning curve especially if you are already familiar with Excel and some basic SQL.

If you are just starting out with PBI , my top tips are :

1. Keep your Data Schema as simple as possible- trust me, this will save you grief and avoid the need for complex DAX queries later

2. Learn how DAX works especially around how it evaluates context/row filters - DAX is a powerful tool but it’s easy to get tripped up if you dive into it without proper understanding of what it can and can’t do…

3. Develop your ‘data-story’ - (e.g Are you showing a trend ? contrasting categories ?) BEFORE constructing your visuals ( Less is more! )

4. Consider how you want the user to interact with the dashboard — PBI is great for visualizing data but less straightforward if you want users to actively ‘add/edit’ the underlying data

Outline

  1. Introduction & Motivation
  2. Starting Up With Power BI
  3. Pulling The Source Data Into Power BI
  4. Cleaning Up The Data & Building The Data Model
  5. Building The Visualization Report
  6. Creating “User Interactive” Curves Via What If Parameters vs Custom Python Visuals
  7. (Unsuccessfully) Trying To Create A Solver/Optimizer In Power BI
  8. Exploring PBI Web Service
  9. Conclusion
  10. Other Learning Resources

1. Introduction & Motivation

Those of you who follow me over on LinkedIn may remember a post I made back in Jun 2020 at the height of the COVID19 crisis over here in Malaysia/Singapore where I built a simple excel based dashboard that pulls data from public sources to visualize some metrics

Since then, over the last few months, I’ve learnt some Microsoft Power BI , a data visualization / analytics tool and re-built the same dashboard in PowerBI and published it online (Yes, I apologise in advance for not picking a more uplifting topic other than COVID-19 but it was convenient to just build further on what I already had).

I initially tried other similar tools like Tableau and even Google Cloud’s Data Studio but the advantage PowerBI had was it’s similar “Look & Feel” to Excel which made it relatively easy to pick up and as with all ‘out of the box’ viz/analytics packages , it has a friendlier learning curve than trying to “DIY” your own visualizations via D3.js or Python’s Bokeh etc

The rest of this article is a tutorial covering all the various steps I took from building to publishing a report/dashboard and a few things I picked up working with DAX (PBI’s Formula & Query Language) and Data Modelling. Hopefully other data enthusiasts starting out with MS Power BI will find this helpful.

2.Starting Up With Power BI

Users normally build reports in the Power BI Desktop version (which can run from your local PC). After the report is completed, you then publish the report onto the Power BI (Web)Service and then it can be shared with others (or posted publicly — more on that later). There is also a PowerBI Mobile App that allows users to view reports on their devices too.

Downloading Power BI Desktop is relatively painless, it’s on the official website https://powerbi.microsoft.com/en-us/downloads/

However , the tricky bit is signing up for the PowerBI Web-Service as Microsoft does NOT allow you to sign up via email addresses provided by consumer email services. This is inconvenient as the Desktop version works fine but you can’t share your reports with anyone unless you send-over the PBIX file itself (which means receipient must have PBI themselves) . Also there are some visualizations/features that are only available in the Web Service version.

Thankfully, I had an existing separate Domain Name that I bought from GoDaddy and an email that came with it (I used it to set up a blog — more on that some other time). Therefore I could register directly for a new account.

Fortunately, there are other workarounds where for example you could register for a trial of Office365 or if you already have a Azure (Microsoft’s Cloud Service), you can set up an Azure Active Directory. More details in the video below

3.Pulling The Source Data Into PowerBI

For those of you who have used Power Query and Power Pivot in Excel, you should find this familiar as PowerBI uses the Power Query as well.

The COVID19 Case data comes from the European Centre For Disease Control’s website and I went with a CSV file.The Stock Market data comes from Yahoo Finance (also in CSV format)

Remember to set the permissions as I left these as Credentials = Anonymous and Privacy = None but the data occasionally didn’t load right when I tried to refresh it later. Setting Privacy to Level to Public seemed to have fixed it but not sure if it was coincidence.

Next, I faced a problem: Yahoo Finance only allows you to query one market at a time (i.e you cannot pull data for multiple markets at the same time in a single request). In the original Excel version of the dashboard, I dynamically pulled the Stock Exchange data “one country” at a time and over-wrote it each time the user refreshes the country selection (via a Excel Macro)

I was looking for an equivalent way in PowerBI where I could ‘load’ only one country’s stock market data at a time and update the table every time the user selects a different country. Although there does seem to be a way where you can set up “Query Parameters” to allow users to dynamically change queries, it ONLY WORKS IF the user have both the PBIX file and Power BI Desktop.

Unfortunately, it seems like there is NO SIMPLE WAY to change the data source “on the fly” after the Power BI Report is published as all reports reference a common dataset (Meaning that you have to load ALL your data sources first). In the end, I gave up this approach and just individually extracted the stock market info for a few key ASEAN markets rather than get too clever.

(You’ll see a similar problem later on in Section 6.0 where I try to allow the user to ‘change the parameters’ of a curve and dynamically reflect that ‘live’)

One last step is to ensure that the data is kept up to date , however you can only do that on the Power BI Web Service so I will cover this only later in Section 8.0

4.Cleaning Up The Data & Building A Data Model To “Link Up” The Tables

Alas, it’s not as simple as just dumping everything into Power BI and hoping stuff sorts itself out when you create the visualizations.

The first step was simple enough as I added new columns for Stock Exchange Name, Country Name (aligned to the labels in the ECDC data set) and Currency to the individual data tables extracted from Yahoo Finance

I then created a New Table and merged the various stock exchange data tables using a UNION function . (Which is the same .append for any panda lovers out there)

The trick then was finding a way to ensure that if I set a filter later for Country or Dates that the ECDC data and the Stock Exchange data ‘move together’. If you try to just link the two tables together , you’ll find that because there are many rows with duplicate dates (representing different countries) this results in a Many to Many relationship that won’t work. Furthermore because you can only have ONE active link between 2 tables , I could’t link the “countriesAndTerritories” field either.

Do Not Do This ! It Will NOT Work

I was looking for solutions where I could transform the data into the simplest form — a Star Schema but I didn’t want to mess around splitting up the existing tables so I eventually landed on living with 2 ‘Fact Tables’ and building two separate new dimensions tables for “Date” and “Country” and linking them up like so

Do This Instead !

I am not certain but it looks a lot like what some call the “Galaxy” Schema or “Fact Constellation” (who comes up with these names?!) where there are multiple fact tables linked to the same dimension table.

https://www.geeksforgeeks.org/fact-constellation-in-data-warehouse-modelling/

Apparently it’s not ideal because it’s hard to maintain and can lead to need to run complex queries. However for purpose of demonstrating PBI, what I set up seemed to work when I tested it out on some simple visualizations so I stuck with it.

5. Building The Visualizations And Other Misc. Features(Making A Reset Button)

I was not aiming for anything too fancy or complex so I went with a simple set of charts showing different metrics (COVID19 Cases and Deaths and also Stock Market info where all charts were controlled via the same slicers for selection of Country and Date Range and aligned horizontally.

I initially was thinking of putting everything into one SINGLE chart (I.e Daily , Cumulative Case and Stock Mkt Movements) but thought it would get very cluttered especially if users select MULTIPLE countries.

However I also began to notice some very odd behaviors where the if I used the Line & Clustered Column Chart , the Line can only reflect the SUM of all the categories.

I didn’t want the YELLOW line — I wanted the three lines below AND clustered columns in the same graph :(

Apparently there isn’t a Multiple Line and Clustered Column Chart — some quick searching shows that I am not the first to ask. There is a solution that involves adding a “Measure” with some DAX formula but it looks like you have to add it FOR EACH category ! Therefore at the end I just went with separate graphs for the cumulative and daily case numbers.

Next I also added a “Reset Filter” button. While PBIX has a nice feature that allows you to “slice/dice” your data when you click on a single data point to see how it relates to the other charts, it can get rather confusing and messy for report viewers when there are selections turned on.

Creating a “Reset Filter” button is super simple as you just need to go to VIEW > and check the BOOKMARK pane. In the Bookmark pane, create a new Bookmark and update it after you’ve set whatever default view you want in the current report. In the button, just set the “Action” to link to the Bookmark.

6. Creating A “User-Interactive” Curves

6.1 Using Power BI What If Parameters

In the Excel version of the Dashboard I had a feature where user can change the parameters that make up a Logistic Growth or SIR Model curve visualise how the Estimated Cases compare versus Actual Case numbers.

This proved to be a much trickier thing to do with PowerBI as fundamentally, it is a tool to query a data source, transform it by filtering values out etc or creating new calcs from the data and then displaying/visualizing it. You can’t ‘per-se’ amend values directly in your data source.

So I had to do some digging and found that there is a workaround of sorts using What-If Parameters in PowerBI.

These are normally useful to create a variable that Users can directly change and can be used in conjunction with data in the model to do calculations. The canonical example is setting a What-If Parameter for Discount % where you can then use that parameter to multiply against Retail Price where user can dynamically see the impacts to Total Revenue etc if more or less Discount % is given.

However I needed something a bit more complicated where I wanted to plot a chart where the Estimated No of cases are expressed via a function called the Logistic Growth Curve

Regrettably, Power BI is not designed to allow users to input/edit the underlying data in Report or Dashboard. Power BI is primarily a tool for extracting , transforming and then visualizing data — it does not allow users to ‘interact and alter’ the data itself (I.e change the value in a row or column rather than just filter them out). This is the so called ‘write-back’ problem . It can be achieved using another Microsoft tool (Power Apps) but it needs some extra work (Edit — Watch this for a demo ! https://www.youtube.com/watch?v=us0hPFwQ4Zs )

Thankfully, though after trawling through some online forums, I found a relatively simple way to automatically edit a curve ‘on the fly’ based on user input without having to resort to PowerApps or any complicated coding.

First I set up What-If Parameters reflecting the L , Xo, k values which creates a new measure and a slicer on the report tied to the value of the parameter.

After that I created a new table with just a bunch of dates matching the ECDC date range and created two new measures — the first was just to calculate the “No of days since the start of infection wave” that is used as the “X” value in the Logistic Curve.

LogisticCurveDays =VAR StartDay=INT(SUMX(LogisticCurveEst,LogisticCurveEst[DaysFrom1Jan2020]-'LogisticCurve - Start Day(From1Jan20)'))RETURN IF(StartDay<0,0,StartDay)

and another to reflect the Logistic Curve itself

Estimated Cumulative Cases = 'LogisticCurve - Est Max Cum Cases L'[LogisticCurve - Est Max Cum Cases L Value] / 
(1 + EXP(-'LogisticCurve - Steepness k'[LogisticCurve - Steepness k Value]*([LogisticCurveDays]- 'LogisticCurve - Mid Point Xo'[LogisticCurve - Mid Point Xo Value])))

The problem I did run into was that I could not find a satisfactory was to “link up” the Logistic Curve Dates field with the “Date” Dimension table.

When I did link them, it created a weird behavior where because the Logistic Curve Dates had a One to One Relationship.

The top chart correctly shows the Jun cut off but the bottom chart shows zero (because it is calculated using a “Measure” and not a “Calculated Column”) but does not automatically adjust the x axis scale to match.

Therefore whenever the Slicer range is EARLIER than the last day in the Market or ECDC table , the Logistic Curve Estimated value gets ‘cut-off’

In the end , I decided that since I never used any links between the Logistic Curve and the Market Data, I could just link the Logistic Curve directly with the ECDC data as a ‘dimension table’ instead.

The problem then is that I could not plot both the estimated and actual cumulative case numbers on the same chart. Somehow because I was using the date in the Logistic Curve table as the X-axis , it was messing up the calculated measure for Actual Cumulated Cases (It was coming out tiny)

The Actual Cumulated Cases at the bottom and top chart do not match up because of the filter context (I think…)

I then realized that the simpler way to do this rather than create a new table and make a relationship with the ECDC Data table was to replicate the same measures DIRECTLY in the ECDC Data table itself — side stepping the issue altogether ! ( Note — I left the standalone Logistic Curve table in the PBIX file if you want to experiment and replicate the error yourself when following along )

I created the same Measures but this time DIRECTLY in the ECDC Table rather than in a linked Table

6.1 Using Custom Python Visuals

The SIR Model from the Excel version of the dashboard was a lot harder to replicate in Power BI than I expected.

It goes back to the fact that the SIR calculation uses differential equations that refer to a value in the prior time step. In Excel, it was a simple straight-forward job where I just set a formula for the next row in the series based on the previous row of the series as per below image.

Every row refers to values from the row above — i.e ‘recursive’

Strangely this is NOT possible to do in DAX :-

There are dozens of threads of users trying to “workaround” the problem with different solutions depending on the exact nature of the problem:

a) For cases where the column references just an increasing number (e.g. Y= aX + C ) , the solution is fairly simple — create a “helper” index columns with GENERATESERIES where the formula would reference the index values (which is sort of what I did for the earlier Logistic Growth function)

b) For cases where the function needs to access the prior row IN ANOTHER COLUMN, use a filter with the EARLIER function

c) For columns with “self-join” characteristics (e.g. such as the common problem of displaying organisational reporting relationships) — you could use the Parent-Child Hierarchy pattern that flattens the hierarchy into multiple columns or some form of Lookup+Index+PATH formula

d) For cases where the function is self-referential/complex but “static” (i.e doesn’t change) — use LIST.GENERATE in Power Query’s M Language even before the data enters PowerBI where you can set more complex formulas that need this “self-referential” nature

Unfortunately — none of these fit my use case which was a column that referred to itself (not another column) AND needed to be dynamic so that the user can change some parameters and immediately visualize the new graph. Therefore I eventually gave up on a ‘native’ Power BI solution and turned to ….Python !

PowerBI has a feature that allows you to run Python scripts for custom visuals

Therefore I adapted code from this site and added in the SIR Model parameters into the custom Python Visual.

Screen Cap of How It Looks On Power BI Desktop On My Local PC

Therefore I side stepped the problem entirely because I used the Scipy library which has functions to handle differential equations to calculate the SIR values and then plotted it alongside the Actuals for Daily Cases from the ECDC website.

However before you get too excited — the really depressing part for me is the fact I can’t showcase my work as PowerBI only allows “Pro” subscribers to publish reports with custom Python Visuals.

Apparently I wasted ~3–4 Hours :( Python Visuals Are Only Supported For “Pro” Subscribers !

(In any event, the native pbix file and the Python script is in the github repo if you would like to try this out on your own machine)

So to wrap up this section — DAX is an extremely powerful tool once you get the hang of it (Like a katana to a butter knife if you were to compare it to Excel’s HLOOKUP/VLOOKUP/INDEX MATCH capabilities) and don’t try to use it to do something it wasn’t designed for (Which is like using the said katana to pound in a nail and complaining that it is an awful hammer) .

As someone starting out with PowerBI the important key concepts I found were:

  • How DAX evaluates a formula i.e things like Filter/Row Context
  • How iterators (e.g. SUMX) that work on row-by-row evaluation vs SUM that just consider total column
  • The difference between Calculated Columns and Measures
  • Quite often you may need to iterate onto your Data Model/Schema as well (i.e what data stored on which table and how related) as a fix on the Data Model may be quicker than writing a super complex DAX query
  • DAX really doesn’t like ‘recursive’ calculations so things like loops or ‘self-referential’ formulas are tricky to ‘replicate’ in PowerBI

7. (Unsuccessfully ) Trying To Create An Solver / Optimisation Method In PowerBI

Almost there…I wanted to see if there was a way to use PowerBI to run something like Excel’s Solver Add-In where in the original Excel dashboard I used it to minimize the error term by varying the k, Xo, start and L values .Sadly there was nothing “Out of The Box” in PowerBI .

FrontLine Solver (the company that makes the Solver function for Excel) seems to have a solution. I admit , it looks pretty slick but sadly to get it you need to have a paid subscription of ~250USD /yr for Analytic Solver so I gave it a pass.

Some articles seem to indicate that it may be possible to do something similar with a custom Python or R script

The challenge was that I could not find an equivalent example for Python. I have worked with some Python libraries (SciPy and PulP) that can do linear programming but the problem is that the problem is optimising for the Logistic Growth Curve is a Non-Linear problem.

Considering how complex this was getting , I left this feature for a future update of the dashboard.

8. Exploring Power BI Web Service & Creating An Embed Code

With the report completed, I ‘published’ it to Power BI’s web service. (Note — whenever you ‘publish’ the report from Power BI Desktop to Power BI Web Service, it creates a Content (e.g. Report/Dashboards) and separate a Dataset ‘file’ vs on PowerBI Desktop version where these are in a single PBIX file)

It’s worth exploring the web service version of PowerBI as it has some features that do not exist on the Desktop version. I particularly like the Get Quick Insights functionality.

Whilst it’s fairly obvious in this data set (i.e Cases have increased over time — duh) , this feature is quite handy if you have other multi dimensional datasets.

The web version also has a Q&A widget that you can add that allows users to “ask” questions in natural languages.

It’s pretty neat as users can just enter a free form question and PowerBI tries to “parse” it to return relevant data without the user having to build the chart themselves

Frustratingly — the Q&A function doesn’t seem to work when you publish the report live. (It seems like it’s only for Pro/Premium Tier users)

I mentioned it earlier but the Power BI Web Service is also where you specify a schedule refresh so that you can keep your dataset and report current. You do this by going into the Dataset > Settings > Datasets > Scheduled Refresh

Admittedly, for the simple report I’ve built, it’s relatively simple as the data only gets updated daily. In practice, most reports would more likely be built from a data pipeline (where data is streamed in live) and this can get quite involved. If it’s of interest, I suggest a read through of the official docs to get a better understanding of what to do for rapidly changing data or different sources with different update frequencies.

(Jan 2021 Edit Note that the ECDC switched from daily to weekly for COVID-19 reporting and discontinued daily updates from 14 December effectively ‘breaking’ the auto-update rules I had set up earlier. Therefore until I figure out how to ‘merge’ historical daily and weekly data , the dashboard only shows 2020 data)

So anyways, getting back to the main objective, to make your final report viewable on the public internet, you need to get an embed code as per below.

Once you have it, you can either embed the report as an iframe [I would have done it here in this article but Medium doesn’t allow you to embed i-framess :( ] or share a hyperlink to a newly created page with the report that Power BI automatically hosts for you.

This brings me to the end of the example but there are still a lot of features that I have yet to fully explore as my intent was to just build a demo report

9.Conclusion

In this article, I outlined the various steps I took to recreate the Excel Dashboard in PowerBI and publish it online.

A summary of my personal reflections while learning PowerBI :-

  • Power BI’s DAX query language is a very powerful tool but it requires a good understanding of how DAX evaluates formulas
  • Invest time upfront to think through the Data Model / Schema. This can avoid the needing complex queries or issues in future model updates
  • Less Is More with Visualisations- Understand what your users might be interested in and what you are trying to show
  • PowerBI is primarily an analytics tool — It’s great for visualizing data interactively but if you want something that allows users to change the underlying data itself, you’ll need to extend it’s functionality with something else like MS’s PowerApps or use some other ERP system tool

Hope this helps. Leave me feedback in the comments if there is anything else you think I missed (e.g. I was considering taking a closer look at the Power BI Mobile reports )

Till next time !

#PowerBI

#DataAnalytics

#DataVisualization

For anyone looking to follow along, the raw PBIX file itself available here :

https://github.com/ZhijingEu/Demo_PowerBI_Covid19_Dashboard

10.Other Learning Resources :

I haven’t fully explored all the options as there is a healthy user community online and a lot of youtube channels (Guy In A Cube is pretty good).

I also recommend the following resources if you want to learn PowerBI:

--

--

--

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Recommended from Medium

Quantifying political disruption

The Data Visualization Polarity

3 Reasons I’m Not Becoming a Data Scientist

(Photo Credit: https://unsplash.com/@cgower )

Data Observability is Irrelevant without Context

Voters Guide Software Product Development

Building a Youtube music recommender

Static vs Dynamic in Application Security Testing

How Google and Facebook Handles the huge data thrown at them?

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Zhijing Eu

Zhijing Eu

Hi ! I’m “Z”. I am big on sci-fi, tech and digital trends.

More from Medium

Google Data Analytics — Case Study (Cyclistic Bike Sharing Data Analysis)

Google Data Analytics Capstone Project: Bike-share Cyclistic Case Study

Google Data Analytics Capstone Project

Processing and Analysis on Dataset About Sleep Data Using Hadoop Tools