PowerQuery Puzzle solved with R

Numbers around us · Nov 21, 2023

Excel BI’s Weekend PQ Puzzles Solutions

Until now I hadn't really looked at the PowerQuery Puzzles that ExcelBI publishes every weekend. But after writing some posts about the Excel Puzzles, I realized that the PQ ones are equally interesting, if not more so. They usually need more steps to reach the goal, and sometimes they are a little harder. In this series I will not show alternative framework approaches (data.table or base R), but rather explain the code step by step.
Today, as part of a consistent publishing schedule, I am starting a second format, "PQ Puzzles Solved with R", which will be published on Tuesdays, one day after the Excel Puzzles from the previous week. At the very beginning I'll start with 4 puzzles (from the last two weekends). I hope you will enjoy it.

Puzzles:

PQ_129: content file
PQ_130: content file
PQ_131: content file
PQ_132: content file

PQ_129

What do we have here? A two-column table with dated temperature measurements is provided, and we have to build something we could call a periodical report. From this time series we need to extract specific measurements, aggregates and properties: the highest, the lowest and the average temperature per period, plus the days when those extremes occurred.

But we are not given fixed intervals like Jan 1st to Feb 18th or anything similar; we have to calculate them dynamically. So… let's dive in.

Load data and libraries

library(tidyverse)
library(readxl)
library(lubridate)

input = read_excel("PQ_Challenge_129.xlsx", range = "B2:C1002")

I do not read the testing table, because it would only be valid on the day of publication.
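For orientation, here is a quick look at the structure we get (a sketch of my own; the column names Date and Temp come from the code below, and the actual values are elided):

glimpse(input)
#> Rows: 1,000
#> Columns: 2
#> $ Date <dttm> …
#> $ Temp <dbl> …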

Prepare functions and mid-steps

extract_data <- function(data, period, date_range) {
  # Convert the first element of date_range to a Date object
  date_range_start <- as.Date(date_range[1], format = "%Y-%m-%d")

  # The data manipulation begins here
  data %>%
    # Filter rows where Date is within the specified range
    filter(Date <= today() & Date >= date_range_start) %>%
    # Summarise the data to find maximum, minimum, and average temperature
    summarise(
      MaxTemp = max(Temp),                 # Maximum temperature
      MaxtempDate = Date[which.max(Temp)], # Date when maximum temperature occurred
      MinTemp = min(Temp),                 # Minimum temperature
      MintempDate = Date[which.min(Temp)], # Date when minimum temperature occurred
      AvgTemp = mean(Temp)                 # Average temperature
    ) %>%
    # Ensure only one row of summary is returned
    slice(1) %>%
    # Add a column for the specified period
    mutate(Period = period) %>%
    # Reorder columns to have Period first
    select(Period, everything())
}

The function takes three arguments:

  1. data: a dataframe that contains temperature data.
  2. period: a descriptive value (like a string) indicating the period of the data.
  3. date_range: a vector of dates defining the range for data extraction.
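To see the function in action on a single period before we map it over all of them, a call could look like this (my own illustration, not a step of the final solution):

# One-off report for the current year; returns a one-row summary
extract_data(input, "Year to Date", floor_date(today(), "year"))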

But we need to define those periods and date ranges, so let's prepare one more auxiliary structure:

periods <- data.frame(
  # Create a column 'Period' with descriptive names for time periods
  Period = c("Last 7 Days", "Last 30 Days", "Last 365 Days",
             "Month to Date", "Quarter to Date", "Year to Date"),

  # Create a column 'DateRangeStart' with calculated start dates for each period
  DateRangeStart = c(
    today() - days(6),              # Start date for the last 7 days
    today() - days(29),             # Start date for the last 30 days
    today() - days(364),            # Start date for the last 365 days
    floor_date(today(), "month"),   # Start date for the current month
    floor_date(today(), "quarter"), # Start date for the current quarter
    floor_date(today(), "year")     # Start date for the current year
  ),

  stringsAsFactors = FALSE # Ensure that strings are not converted to factors
)

How does it look? Of course, the dates are calculated dynamically and will change every day.

           Period DateRangeStart
1     Last 7 Days     2023-11-14
2    Last 30 Days     2023-10-22
3   Last 365 Days     2022-11-21
4   Month to Date     2023-11-01
5 Quarter to Date     2023-10-01
6    Year to Date     2023-01-01

We have to run the calculation for each of the given periods, i.e. for the intervals between each start date and today. So let's use our function, which takes the original time-series source plus two values from the periods data frame, to build the report. We map the function over the input and the periods data frame with the following script:

output <- map2(periods$Period, periods$DateRangeStart, ~ extract_data(input, .x, .y)) %>%
  bind_rows() %>%
  mutate(AvgTemp = round(AvgTemp, 2))

And something weird happened: I got warnings that there is infinity in the results. A quick look and I knew why. We cannot calculate results for the last 7 days, because I prepared this article over a week after the dates in the source ended. But whatever… Here is the result as we get it, for Nov 20th.

# A tibble: 5 × 6
  Period          MaxTemp MaxtempDate         MinTemp MintempDate         AvgTemp
  <chr>             <dbl> <dttm>                <dbl> <dttm>                <dbl>
1 Last 30 Days         34 2023-10-22 00:00:00     -15 2023-10-29 00:00:00   11.1
2 Last 365 Days        39 2023-05-29 00:00:00     -26 2022-12-19 00:00:00    7.68
3 Month to Date        31 2023-11-02 00:00:00      -5 2023-11-04 00:00:00   16.6
4 Quarter to Date      34 2023-10-22 00:00:00     -17 2023-10-20 00:00:00   10.0
5 Year to Date         39 2023-05-29 00:00:00     -23 2023-07-05 00:00:00    7.7
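If you would rather drop such empty periods explicitly instead of relying on the warnings, one option (my own addition, not required by the puzzle) is to filter out non-finite rows after binding:

# max()/min() on an empty vector return -Inf/Inf,
# so a non-finite MaxTemp marks a period with no observations
output <- output %>%
  filter(is.finite(MaxTemp))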

I think it looks good, so we can proceed to the next puzzle.

PQ_130

Here we've got two tables, and we have to find out which keywords from the first one are present in which column of the second one. Sounds weird? Maybe a little, but what exactly do we have to do?

Load data and libraries

library(tidytext)
library(tidyverse)
library(readxl)

T1 = read_excel("PQ_Challenge_130.xlsx", range = "A1:B12")
T2 = read_excel("PQ_Challenge_130.xlsx", range = "D1:G6")

Test = read_excel("PQ_Challenge_130.xlsx", range = "D11:F23")

In table T1 we have keywords with their ids; in T2 we have the columns to check.

Transform data

Now we have to take the two columns we need to check (Text and City / Country) and "slice & dice" them into single words. At the end we put everything into one table that tells us which column each word comes from.

R1 = T2 %>%
  unnest_tokens(word, Text) %>%     # split the Text column into single words
  ungroup() %>%
  mutate(`Column Name` = "Text")    # tag each word with its source column

R2 = T2 %>%
  unnest_tokens(word, `City / Country`) %>%
  ungroup() %>%
  mutate(`Column Name` = "City / Country")

# Stack both tokenized tables into one
Result = R1 %>%
  bind_rows(R2)

Finally, we "slice" the first table into single words as well and join it with the Result table. That gives us, for each person's description, which keyword id appears in which column. I promise it looks more clever than that sentence.

T1R = T1 %>%
  unnest_tokens(word, Value) %>%    # split keywords into single words
  ungroup()

Res1 = Result %>%
  left_join(T1R, by = "word") %>%            # match tokens against keywords
  select(`Emp ID`, No, `Column Name`) %>%
  filter(!is.na(No)) %>%                     # keep only actual keyword hits
  arrange(`Emp ID`, No) %>%
  unique()                                   # drop duplicate matches

How does it look at the end?

# A tibble: 12 × 3
   `Emp ID`    No `Column Name`
      <dbl> <dbl> <chr>
 1        1     6 City / Country
 2        1     9 Text
 3        1    11 Text
 4        2     4 Text
 5        2    10 City / Country
 6        3     1 City / Country
 7        3     3 City / Country
 8        4     2 Text
 9        4     4 City / Country
10        5     5 City / Country
11        5     7 Text
12        5     8 City / Country

What else is left? We can check whether this solution equals the solution/target table provided by the puzzle author. And it does.

identical(Res1, Test)
# [1] TRUE

PQ_131

In this case it looks like somebody is keeping accounting notes but has a really narrow piece of paper, and we need to expand it into a shape where all the facts about one file sit in a single row.

Load data and libraries

library(tidyverse)
library(readxl)

input = read_excel("PQ_Challenge_131.xlsx", range = "A1:F10", col_names = FALSE)
test = read_excel("PQ_Challenge_131.xlsx", range = "I1:R4", col_names = TRUE)

Transform data

We need to split this weird table into two parts: labels and values. That will help us change its shape. Every odd row goes to labels and every even row to values. Then both auxiliary tables are pivoted to long form. As a result both tables have exactly the same dimensions, which means every label sits on the same row as its value in the second table.

labels = input %>%
  filter(row_number() %% 2 == 1) %>%   # odd rows hold the labels
  pivot_longer(cols = -c(...1), names_to = "row", values_to = "label")

values = input %>%
  filter(row_number() %% 2 == 0) %>%   # even rows hold the values
  pivot_longer(cols = -c(...1), names_to = "row", values_to = "value")

After this manipulation we can bind them column-wise. Then a little cleaning, and we pivot back to a wide version. What changes here? Every file, named Group, now has only one row of data about it.
Let's finally check whether we get the same values as in the provided solution.

final = bind_cols(labels, values) %>%
  select(Group = ...1, label, value) %>%
  filter(!is.na(label)) %>%                              # drop empty label cells
  pivot_wider(names_from = label, values_from = value) %>%
  mutate(across(everything(), ~ ifelse(is.na(.), NA_integer_, as.numeric(.))))

identical(test, final)
#> [1] TRUE

And it is identical.

PQ_132

I have a weird thought: in the office we meet in these puzzles, they have only narrow sheets of paper. First the accountant needed to expand data from a narrow sheet into a wide version; now HR needs to fit a wide table into only five columns. But we are here to help, so let's go.

Load data and libraries

library(tidyverse)
library(readxl)

input = read_excel("PQ_Challenge_132.xlsx", range = "A1:I6")
test = read_excel("PQ_Challenge_132.xlsx", range = "A10:E30")

Transform data

The first thing we are asked for is to concatenate first and last name into one column. For example, First Name "Marisol" and Last Name "Hunt" become Full Name "Marisol Hunt", and unite() drops the source columns by default.

result = input %>%
  unite("Full Name", c("First Name", "Last Name"), sep = " ")

Then I need to create one auxiliary structure which is empty but will be needed: we have an odd number of attribute columns to wrap, so we will need it to fill the hole.

empty = result %>%
  select(1) %>%   # keep only the Emp ID column
  mutate(Attr = NA_character_, Value = NA_character_, `Emp ID` = as.character(`Emp ID`))

Then we do the main magic. This time the first step is to pivot to a longer version and nest the rows. After nesting we mark whether each row (each property of a data record) is even or odd, because that determines whether the property ends up in the first or the second column pair of the unnested data.

rest = result %>%
  mutate(across(everything(), as.character)) %>%            # unify types before pivoting
  pivot_longer(-1, names_to = "Attribute", values_to = "Value") %>%
  group_by(`Emp ID`) %>%
  mutate(row = row_number()) %>%                            # number properties per employee
  nest(data = c(Attribute, Value)) %>%
  arrange(row) %>%
  mutate(row_even = if_else(row %% 2 == 0, "2", "1"))       # odd/even flag

Finally we split it into two tables by the even/odd flag. Each of those tables has to be unnested separately, and it turns out one of them is 4×5 and the other 3×5. This is where we need our empty table. After those manipulations we can bind both column-wise, clean up a little, and take care of the column names.

final_one = rest %>%
  filter(row_even == "1") %>%
  unnest(data) %>%
  select(-row_even, -row)

final_two = rest %>%
  filter(row_even == "2") %>%
  unnest(data) %>%
  select(-row_even, -row) %>%
  bind_rows(empty)          # pad the shorter table with the empty filler rows

final = final_one %>%
  bind_cols(final_two) %>%
  select(1, 2, 3, 5, 6)     # drop the duplicated Emp ID column

colnames(final) <- c("Emp ID", "Attribute1", "Value1", "Attribute2", "Value2")

Unfortunately, making the provided solution identical to our final result would need some additional adjustments, like type conversions and so on. But if you would like to check it visually, you will see that it is exactly the same.

# A tibble: 20 × 5
   `Emp ID` Attribute1    Value1            Attribute2 Value2
   <chr>    <chr>         <chr>             <chr>      <chr>
 1 813185   Full Name     Marisol Hunt      Gender     F
 2 252591   Full Name     Emmett Cotton     Gender     M
 3 781324   Full Name     Monica Adkins     Gender     F
 4 598718   Full Name     Frederick Frazier Gender     M
 5 435759   Full Name     Prince Hanson     Gender     M
 6 813185   Date of Birth 1990-02-08        Weight     50
 7 252591   Date of Birth 1968-10-24        Weight     71
 8 781324   Date of Birth 1970-05-26        Weight     44
 9 598718   Date of Birth 1989-03-07        Weight     65
10 435759   Date of Birth 1984-05-09        Weight     60
11 813185   Salary        154143            State      NJ
12 252591   Salary        78407             State      TX
13 781324   Salary        193067            State      NV
14 598718   Salary        136736            State      NJ
15 435759   Salary        72788             State      IN
16 813185   Sales         127               NA         NA
17 252591   Sales         3673              NA         NA
18 781324   Sales         783               NA         NA
19 598718   Sales         93783             NA         NA
20 435759   Sales         7893              NA         NA

It is, isn’t it?
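If you prefer a programmatic check over the visual one, a rough sketch (my own addition, assuming the remaining differences are only column types and attributes) is to coerce both tables to character and compare:

# Align types before comparing; date formatting may still need tweaking
test_chr <- test %>%
  mutate(across(everything(), as.character))
identical(as.data.frame(final), as.data.frame(test_chr))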

Thank you for reading the first part of PQ puzzles solved with R. This "episode" has 4 riddles, but next Tuesday we'll start the normal mode: puzzles from the last weekend only.

I hope you liked it. Look out also for my general articles about R, which will come on Thursdays, and the Excel puzzles that will always precede the PQ ones by one day.
