PowerQuery Puzzle solved with R

Numbers around us
Numbers around us
Published in
3 min readDec 26, 2023

# 141–142

Puzzles:

PQ_141: content file
PQ_142: content file

Last time I noted that usually PQ Puzzles are focused on times and dates. And… it confirms this week. Lets count it.

PQ_141

Rolling averages and other rolling calculations are very useful concepts because it can show us main trend cleaned from lower time granulation anomalies. In R we can play with comparison of dates to dates with lagged measurements. It is not hard but can complex task. Every complex task has much bigger chance for mistakes. Thats why I use dedicated package for out today puzzle. We have measurements for months and all we need is to calculate mean average from last 3 and 5 months.

Load libraries and data

library(tidyverse)
library(slider)
library(readxl)

input = read_excel("Power Query/PQ_Challenge_141.xlsx", range = "A1:C35")
test = read_excel("Power Query/PQ_Challenge_141.xlsx", range = "E1:I35")

Transformation

result = input %>%
group_by(Month) %>%
mutate(
`3 Year MV` = slide_dbl(Defects, mean,
.after = -1,
.before = 3,
.complete = TRUE) %>% round(0),
`5 Year MV` = slide_dbl(Defects, mean,
.after = -1,
.before = 5,
.complete = TRUE) %>% round(0)
) %>%
ungroup()

Validation

identical(result, test)
#> [1] TRUE

PQ_142

In this puzzle we calculate something that HR called “head count”, which means number of people hired or working in certain period of time. In our task we have to create head count for each quarter of hour from 9AM to 9PM, having start and end time for 3 people. It can be tricky in some ways but I will explain it later.

Load data and libraries

library(tidyverse)
library(readxl)
library(lubridate)

input = read_excel("Power Query/PQ_Challenge_142.xlsx", range = "A1:C4")
%>% janitor::clean_names()
test = read_excel("Power Query/PQ_Challenge_142.xlsx", range = "E1:F49")

Transformation

input <- input %>%
mutate(interval = interval(ymd_hms(start_time), ymd_hms(end_time)))

quarter_table <- tibble(
interval = interval(
seq(ymd_hms("1899-12-31 09:00:00"),
ymd_hms("1899-12-31 20:45:00"),
by = "15 mins"),
seq(ymd_hms("1899-12-31 09:14:59"),
ymd_hms("1899-12-31 20:59:59"),
by = "15 mins")
)
)
head_count <- quarter_table %>%
mutate(
Count = map_dbl(interval, ~sum(int_overlaps(.x, input$interval))),
Time = paste(format(int_start(interval), "%I:%M:%S %p"),
format(int_end(interval), "%I:%M:%S %p"),
sep = " - ")
) %>%
select(Time, Count)

Validation

You can validate it by your eyes, because there is one issue. If we want to have disjoint time intervals we need to finish previous one second before the next started. But if we do it that way we have problem with person who ends its job at 19:00:00 because it would look like working only one second in next interval. That is why I cannot validate it with given results, but except one line values are exactly the same.

Thanks for your engagement, and let me know if you have any comments. Stay tuned, because this Thursday article about memoization will be online.

--

--

Numbers around us
Numbers around us

Self developed analyst. BI Developer, R programmer. Delivers what you need, not what you asked for.