PowerQuery Puzzle solved with R
# 133–134
This weekend brought us two stories, two puzzles which have to things in common: they are pretty hard and difficulties were both linked to time dimension.
Puzzles:
PQ_133: content file
PQ_134: content file
PQ_133
In first puzzle we have data about electricity consumption measured in some points of time during two days across three meters. It shouldn’t be hard right? Yes, it could. Especially when one of the measurements crosses midnight. Let see what was difficult there.
Load data and libraries
library(tidyverse)
library(readxl)
library(lubridate)
input = read_excel("PQ_Challenge_133.xlsx", range = "A1:E9", col_types = c("date", "date", "numeric", "numeric", "numeric"))
test = read_excel("PQ_Challenge_133.xlsx", range = "G1:J3")
There is one issue with reading time column, and that is why I had to add “many” years to it as first data manipulation.
Prepare functions and mid-steps
df = input %>%
mutate(Time = Time + years(124) - days(ifelse(day(Date) == 1, 60, 59))) %>%
select(-Date) %>%
pivot_longer(-Time, names_to = "Meter", values_to = "Value") %>%
group_by(Meter) %>%
arrange(Meter, Time) %>%
mutate(end = lead(Time),
next_reading = lead(Value),
cross_midnight = ifelse(day(Time) != day(end), 1, 0),
duration = end-Time,
consumption = next_reading - Value ,
time_to_midnight = difftime( ceiling_date(Time, "day"),Time, units = "hours"))
)
In this step I had some things to clean data and make auxiliary columns:
- time of next reading;
- flag if measurement is crossing midnight;
- consumptions based on two consecutive measurements;
- and time left to midnight.
But there is where we need to think little bit more. First of all we need to split data for crossing and not crossing midnight. Only for crossing ones we need to calculate proportion of time which happens before and after midnight, then somehow split this proportion into two time periods: before and after.
df_cross = df %>%
ungroup() %>%
filter(cross_midnight == 1) %>%
mutate(perc_of_dur = time_to_midnight/as.numeric(duration),
cons_before_midnight = consumption * perc_of_dur,
cons_after_midnight = consumption - cons_before_midnight) %>%
select(Time, end, Meter, cons_before_midnight, cons_after_midnight) %>%
pivot_longer(-c(Time, end, Meter), names_to = "Midnight", values_to = "Consumption") %>%
mutate(Date_start = as.Date(Time),
Date_end = as.Date(end),
Date = if_else(Midnight == "cons_before_midnight", Date_start, Date_end),
Consumption = as.numeric(Consumption)) %>%
select(Date, Meter, Consumption)
Second part need just simple reshaping before bringing all data back together.
df_else = df %>%
ungroup() %>%
filter(cross_midnight == 0) %>%
select(Time, end, Meter, consumption) %>%
pivot_longer(-c(Time, end, Meter), names_to = "Midnight", values_to = "Consumption") %>%
mutate(Date = as.Date(Time)) %>%
select(Date, Meter, Consumption)
And then we can put it back in one data.frame and reshape it to be compared with provided solution. And it is.
result = bind_rows(df_cross, df_else) %>%
arrange(Date, Meter) %>%
group_by(Date, Meter) %>%
summarize(Consumption = round(sum(Consumption))) %>%
pivot_wider(names_from = Meter, values_from = Consumption) %>%
ungroup() %>%
mutate(Date = as.POSIXct(Date))
identical(result, test)
# [1] TRUE
PQ_134
Second puzzle was about stocks. We are provided with two tables: one with opening stock for first day in three stores, and one that has Ins and Outs from the stock for three consecutive days. What we need to find are closing stocks for each of those three days. I was trying with recursive calculations at first, but… I found out that there is another way. Lets find out!
Load data and libraries
library(tidyverse)
library(readxl)
input1 = read_excel("PQ_Challenge_134.xlsx", range = "A1:D12")
input2 = read_excel("PQ_Challenge_134.xlsx", range = "F1:G4")
test = read_excel("PQ_Challenge_134.xlsx", range = "J1:N12")
input2
is about opening stocks, but we don’t really know for what date. But we can 100% sure to assume that we need first day from second table.
opening = input2 %>%
mutate(Date = min(input1$Date)) %>%
rename(open = `Open Stock`)
There is where magic begins… we can little bit transform opening
df to have exactly the same structure like the second one. We have Date
, open
and store
. We need IN
and OUT
. Let make it by changing open
into IN
and create empty OUT
column. And then we can bind both tables together. Finally grouping data by Store
and Date
we can use cummulative sums to find daily stock moves and closing_stock
. Finally we can compare it to provided solution.
result <- opening %>%
rename(IN = open, OUT = integer(length(open))) %>%
bind_rows(input1) %>%
group_by(Store) %>%
arrange(Store, Date) %>%
mutate(OUT = ifelse(is.na(OUT), 0, OUT),
closing_stock = cumsum(IN) - cumsum(OUT)) %>%
ungroup() %>%
filter(OUT != 0) %>%
select(Store, Date, IN, OUT, `Closing Stock` = closing_stock)
identical(result, test)
#> [1] TRUE
Second puzzle was easier, but also tricky to find good and suprisingly easy solution.
We’ve seen some nice puzzles considering time changes and it shows little bit out of the box thinking. Feel free to ask, like and share.
Lets be in touch for next article about the functions.