PowerQuery Puzzle solved with R

Numbers around us
Numbers around us
Published in
3 min readJun 4, 2024

#187–188

Puzzles

Author: ExcelBI

All files (xlsx with puzzle and R with solution) for each and every puzzle are available on my Github. Enjoy.

Puzzle #187

There are two ways of showing data and it really depends on what is the purpose of certain presentation. Sometimes we only need to show those data point that have any data in it, but sometimes we need to make something like empty data point, which means that we need to include in our report all needed dimension values, all labels, but still have no value or value replaced by 0 for this row.

And that is what the task today is about. We have sales values for different continents in different years, but we don’t have cases where sales were 0. But we need to restructure report to have it. With some twists because we need summary rows and empty rows, and… North America is abbreviated to NA, and you know what could it cause in data. Check my solution.

Loading libraries and data

library(tidyverse)
library(readxl)

input = read_excel("Power Query/PQ_Challenge_187.xlsx", range = "A1:C12")
test = read_excel("Power Query/PQ_Challenge_187.xlsx", range = "E1:G30")

Transformation

all <- expand_grid(Continent = unique(sort(input$Continent)), Year = unique(sort(input$Year)))

result1 <- all %>%
left_join(input, by = c("Continent", "Year")) %>%
mutate(Sales = replace_na(Sales, 0),
Year = as.character(Year))

years <- unique(sort(result1$Year))

empty_row <- tibble(Continent = NA, Year = NA, Sales = NA_real_)

totals <- map_dfr(years, ~ {
yearly_data <- result1 %>%
filter(Year == .x)
total_row <- summarise(yearly_data, Continent = "TOTAL", Year = .x, Sales = sum(Sales))
bind_rows(yearly_data, total_row, empty_row)
})

grand_total <- summarise(result1, Continent = "GRAND TOTAL", Year = "2010-2013", Sales = sum(Sales))

result <- bind_rows(totals, grand_total)

Validation

identical(result, test)
# [1] TRUE

Puzzle #188

Sometimes we are missing some dimension of data, because they are just aggregated into bigger sets. And we have it here. Sales were summarized by period, very irregular period. And we need that data per quarter. As we don’t know how exactly sales went, we need to calculate how many days of each quarters there was sales and assign money proportionally. Let dig in it.

Loading libraries and data

library(tidyverse)
library(readxl)

input = read_excel("Power Query/PQ_Challenge_188.xlsx", range = "A1:D4")
test = read_excel("Power Query/PQ_Challenge_188.xlsx", range = "F1:H11")

Transformation

result = input %>%
mutate(date = map2(`From Date`, `To Date`, seq, by = "day"),
days = map_int(date, length),
daily = Amount / days) %>%
unnest(date) %>%
mutate(quarter = quarter(date),
year = year(date) %>% as.character() %>% str_sub(3, 4),
Quarter = paste0("Q",quarter,"-",year)) %>%
summarise(Amount = sum(daily) %>% round(0), .by = c(Store, Quarter))

Validation

identical(result, test)
# [1] TRUE

Feel free to comment, share and contact me with advices, questions and your ideas how to improve anything. Contact me on Linkedin if you wish as well.

--

--

Numbers around us
Numbers around us

Self developed analyst. BI Developer, R programmer. Delivers what you need, not what you asked for.