Self-service quantitative research (feat. immigration data)

Jesse Kim
Published in Cruising Altitude
6 min read · May 26, 2024


Since the 1990s, proficiency in Excel has been regarded as a virtue among office workers. Being able to command formulas and knowing which of the hundreds of built-in functions will produce time-saving results can indeed make a boring spreadsheet do wonders that deliver sought-after knowledge. That is, as long as people who use the spreadsheet don’t break the formatting, violate the meticulously stipulated rules, or wander off with their own isolated versions of the truth.

Times have changed. Self-service quantitative research, also commonly referred to as self-service analytics and self-service business intelligence (BI), has steadily commoditised much of the wizardry only Excel gurus, data nerds, and some management consultants could perform in the past. There are numerous highly accessible tools and services available, some of which are free and optionally offer AI-driven assistance. Here’s an example I recently put together on my own as a non-Excel professional:

The question of Big Australia, visualised (link in text)

View the live interactive report. Contents may change over time as up-to-date figures and additional sets of data are introduced.

Why self-service is a win for (almost) everyone

Before describing what went into my mini research project or why I embarked on it in the first place, let me go over why self-service quantitative research, beyond the commoditisation of tools and services, is relevant and necessary going forward:

  1. Lack of tolerance for mediocrity. Equipped with options and know-how, people will no longer tolerate reports and dashboards that fail to meet usability standards, fail to unearth sought-after knowledge, or simply don’t exist where they are expected. Astute organisations that recognise this will have an appropriately governed self-service analytics platform rolled out accordingly.
  2. Getting AI to work. Self-service means taking charge of raw data and its lifecycle. Self-service is not the polar opposite of leveraging artificial intelligence; it is in fact a prerequisite to any meaningful attempt at AI-generated analysis. Proactive data profiling, transformation, and modelling are akin to engineering a prompt that works. This is critical when choosing to involve generative AI because the principle of garbage-in-garbage-out is acutely accentuated in the world of quantitative research where hallucinations are disallowed, full stop.
  3. Jobs lost. “Strong analytical skills” are no longer just a cliché casually attached to every job description. These are now a real must-have requirement in today’s job market, irrespective of industry, role, or seniority. To be successful, candidates need to cite appropriately sourced evidence to demonstrate the analytical skills they possess. At work, non-data professionals with self-service analytical know-how will have their roles augmented to take jobs away from full-time spreadsheet junkies. Data analysis is a function, not a profession.
  4. Trust issues. In a world where misinformation and disinformation are the number one global risk, forming an opinion or decision over a subject matter based on what sensationalist news outlets, influencers, and interest groups put forward as fact may lead to a regrettable outcome. Confidence can materialise only from data one can trust. The more intimate and business-critical a subject matter is to a person or team, the better they should know how the sausage is made.
  5. Competitive edge. Ultimately, the ability to conduct self-service quantitative research becomes a competitive edge, one that adds authority and weight to what a person or team acts for. It facilitates responsive, data-backed conversations, recommendations, and actions.
  6. Win-win. For existing Excel gurus, walking through the lifecycle of self-service quantitative research is a low-hanging fruit as long as they are open to separating the source of truth from the tool for analysis. As for non-Excel professionals, learning paths take them straight to modern techniques of data profiling, transformation, modelling, and visualisation that produce impactful results through true collaboration and futureproofing. It’s a win for everyone who jumps on the bandwagon.

The “Big Australia” example

Back to the abovementioned report: I set out to build it because I wanted to establish a simple, reusable example of self-service quantitative research that I could demonstrate in a series of Power BI training sessions at work. I chose a public dataset whose visualisation would resonate one way or another with anyone who comes across it. The process of building such a report will be illustrated in the sessions, taking the participants through the steps along the self-service lifecycle, namely: problem statement, data acquisition and profiling, transformation, modelling, visualisation, and storytelling.

This particular example research raises and answers the following questions:

  • By how much has Australia’s resident population grown, really, over the last 10 years? How is the trend visualised on a quarterly basis?
  • What portion of the population growth was attributed to net overseas migration (immigration minus emigration) before the pandemic? Specifically, what was the annual average over the last 5 pre-pandemic years, that is, 2015-Q2 through 2020-Q1? (A minimal DAX sketch of this ratio follows the list.)
  • How has the above changed in the post-pandemic years, that is, 2022-Q1 through the final quarter for which official data is available (2023-Q3 at the time of writing)?
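
For the second and third questions, the underlying arithmetic is a simple ratio: net overseas migration divided by total growth over whichever quarters are in view. The sketch below shows one way this could be written in DAX; the table name 'Population' and its [Measure] and [Value] columns are hypothetical placeholders, and it assumes total growth decomposes into natural increase plus net overseas migration, roughly as in the ABS national series.

    NOM Share of Growth =
    VAR NOM =
        CALCULATE ( SUM ( 'Population'[Value] ), 'Population'[Measure] = "Net Overseas Migration" )
    VAR Growth =
        NOM + CALCULATE ( SUM ( 'Population'[Value] ), 'Population'[Measure] = "Natural Increase" )
    RETURN
        DIVIDE ( NOM, Growth )   -- evaluated over whatever quarters are in the current filter context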

The results are quite telling, and I look forward to getting my hands on data for more recent quarters. “Where to go from here?” is also a key follow-up question for the researcher. If I or someone else were to continue working on this in the future, additional datasets such as employment, housing, inflation, and air passenger traffic could be incorporated for deeper, multifaceted analysis.

That said, I must stress that the whole point of this generic showcase is to illustrate how self-service quantitative research can be produced and delivered. Formulating and asserting opinions over a deliverable is strictly outside the scope of this exercise and entirely up to each individual consuming what they see.

Nevertheless, I hope to see a variety of “everyday” self-service research, no matter how small, appear in people’s blog posts, LinkedIn articles, podcasts, and throughout the wider public domain, beyond academia and outside paywall-protected vaults of papers. In this age of automatically generated content and the Dead Internet theory, I would like producers of organic human activity to feature intellectual pursuits that are intimately theirs.

As an addendum, here are some technical notes on my Big Australia Power BI report:

  • The initial profiling of the raw data released by the Australian Bureau of Statistics revealed two streams of figures combined into a single table: Australia-wide and State/Territory. The two streams featured different sets of subject-measures (Net Overseas Migration, Natural Increase, etc.) and counting units, yet all observed values shared a single physical column. Building a combined line-bar chart that plots values from two different subject-measures within a single visual required a new calculated table (using DAX) containing filtered rows of data; a sketch follows this list.
  • A calculated date table was added to the data model to facilitate time-intelligence analysis. To ensure the correct range of auto-detected dates, however, each of the time-period designations (“2020-Q2”, etc.) in the source data was first converted into a corresponding date value (2020-04-01, etc.) and stored in a new column upstream in Power Query; an equivalent conversion is sketched after this list.
  • The table of State/Territory breakdown follows the quarter selection made in the line-bar chart above it. If no single quarter is selected, it defaults to the final quarter rendered in the line-bar chart. To display values across all 4 numeric columns correctly, 4 separate DAX measures were implemented, each with REMOVEFILTERS() included in the logic to prevent unwanted filters from kicking in; one such measure is sketched below.
  • The card visuals in the right column use DAX measures to calculate each result. Asking ChatGPT how to calculate, in DAX, the average year-over-year increase over the past 5 years going back from a specified date produced a reasonable starting point for the logic, which was subsequently adapted to the data model at hand; the last sketch below shows the general shape.
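
To make the first note concrete, here is a minimal sketch of a calculated table that keeps only the rows the line-bar chart needs. The table name 'Population', its [Region] and [Measure] columns, and the two subject-measure labels are assumptions made for illustration, not the actual names in the report.

    ChartData =
    FILTER (
        'Population',
        'Population'[Region] = "Australia"
            && 'Population'[Measure]
                IN { "Estimated Resident Population", "Net Overseas Migration" }
    )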
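
The quarter-to-date conversion in the report happens upstream in Power Query; purely for illustration, the same logic expressed as a DAX calculated column, followed by a calculated date table, might look like this (names again hypothetical):

    -- "2020-Q2" becomes 2020-04-01, i.e. the first day of that quarter
    Period Start =
    VAR Yr  = VALUE ( LEFT ( 'Population'[Period], 4 ) )
    VAR Qtr = VALUE ( RIGHT ( 'Population'[Period], 1 ) )
    RETURN
        DATE ( Yr, ( Qtr - 1 ) * 3 + 1, 1 )

    -- Calculated date table spanning the observed range, for time intelligence
    Dates =
    CALENDAR ( MIN ( 'Population'[Period Start] ), MAX ( 'Population'[Period Start] ) )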
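
One plausible shape for the State/Territory measures is a CALCULATE that falls back to the latest quarter when none is selected, and uses REMOVEFILTERS so a subject-measure filter arriving from the chart selection cannot distort the column. This is a sketch under the same hypothetical names, not the report's exact logic.

    NOM by State =
    VAR SelectedQuarter =
        -- Fall back to the latest quarter when no single quarter is selected in the chart
        COALESCE ( SELECTEDVALUE ( 'Dates'[Date] ), MAX ( 'Population'[Period Start] ) )
    RETURN
        CALCULATE (
            SUM ( 'Population'[Value] ),
            'Population'[Period Start] = SelectedQuarter,
            REMOVEFILTERS ( 'Population'[Measure] ),          -- drop any subject-measure filter passed in by the chart
            'Population'[Measure] = "Net Overseas Migration"  -- then pin the one this column reports
        )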
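
Finally, a hedged sketch of the kind of measure behind the cards: the average annual increase over the five years ending at a chosen anchor date, computed as the difference between the level at the anchor and the level five years earlier, divided by five. Measure and table names are placeholders rather than the report's own.

    -- Hypothetical base measure
    Resident Population =
        CALCULATE ( SUM ( 'Population'[Value] ), 'Population'[Measure] = "Estimated Resident Population" )

    -- Average year-over-year increase over the 5 years ending at the anchor date
    Avg Annual Increase over 5 Years =
    VAR AnchorDate = MAX ( 'Dates'[Date] )
    VAR LevelNow   = CALCULATE ( [Resident Population], 'Dates'[Date] = AnchorDate )
    VAR LevelThen  = CALCULATE ( [Resident Population], 'Dates'[Date] = EDATE ( AnchorDate, -60 ) )
    RETURN
        DIVIDE ( LevelNow - LevelThen, 5 )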
