rstudio::conf 2020

RStudio Conference 2020 Packages and Highlights and Random Packages and thoughts

Nima Safaian
9 min readOct 7, 2020

Sortable

This was part of the poster session in the rstudio::conf() 2020. Great for drag and drop shiny objects from one bucket to another and capture the interaction in the server side.

[Sortable github page](https://rstudio.github.io/sortable/index.html)

End-to-End Data Science with Shiny, Plumber and Pins

This was a presentation by Alex Gold (@alexgold) engineer from RStudio. The workflow makes it easier for deployment and sharing models and input data. This is another way to make the entire process very reproducible. The files for this presentation is located in github.

Noteworthy package here is pins:

The package library can be found in: pins github directory

Setting up boards are easy in the git hub case you can use:

You can pin the data to your github board using the pin command as follows:

pin(iris, description = "The iris data set", board = "github")

You can get the data back using:

pin_get("iris", board = "github")

Or find the data in the board:

pin_find("iris", board = "github")

**The rule is if the data is large (25M in github) and greater then 100M using rsconnect setup avoid using this.

Plumber Workflow

Very well done a presentation on how to develop and deploy plumber API with unit tests. An interesting use case for continuous deployment from github directly to RStudio Connect. The code base is located under: github and slide decks for the talk are under: Slides

Producing Reproducible Examples

This package is used to generate reproducible examples for stackoverflow or github. Reprex R package is located under: reprex

require(reprex)
(y <- 1:4)
## [1] 1 2 3 4mean(y)## [1] 2.5

File System

fs package is cross-platform file system access package. More user-friendly version of the base R. The package location is: fs

rmarkdown series of talks

The first discussion by Atlas the full discussion in located under ratlas. The most interesting aspect of the talk was branded microsoft word branded documents. Interesting no discussion on google branded products from rmarkdown this could be ideal for project rgooglemarkdown???

Debugging Keynote

Really interesting talk by Jenney Bryan from RStudio entertaining but useful tips full presentation is under debugging

ggplot2 Discussions

Great presentation on ggplot2 Slides but the same information can be found in the tidyverse documentation Link

The focus of the presentation was around using ggplot within a function and package where input parameters need to become dynamics for generic functions.

Use .data[[“variable_name”]] within aes() and/or vars()

ggplot(…) + item1 + item2 is identical to ggplot(…) + list(item1, item2) Use it to conditinally more than one item to a plot!

Example of passing the col name

require(ggplot2)
col_summary <- function(df, col) {
ggplot(df) +
geom_bar(aes(x = {{ col }})) +
coord_flip()
}

col_summary(mpg, drv)

Using the vars within the facet wrap to make it more dynamic

#' @importFrom ggplot2 ggplot aes vars facet_wrap geom_point labs
#'
plot_mpg <- function(colour_var, facet_var) {
mapping <- aes(displ, hwy, colour = .data[[colour_var]])
if (is.null(colour_var)) {
mapping$colour <- NULL
}
if (is.null(facet_var)) {
facet <- NULL
} else {
facet <- facet_wrap(vars(.data[[facet_var]]))
}
ggplot(ggplot2::mpg) +
geom_point(mapping) +
facet
}

tidyr

Not directly from the conference but there were a bunch of discussions on new tidyr. Since I have been using tidyr the old way trying to explore the new functions. This is mainly using the Hadleys Sept 13 blog post

  • pivot_longer() and pivot_wider()
require(tidyr)## Loading required package: tidyriris2<-iris %>% pivot_longer(
-dplyr::starts_with("Spe"),
names_to = c("Type"),
names_pattern = "(.*)"
)
iris2
## # A tibble: 600 x 3
## Species Type value
## <fct> <chr> <dbl>
## 1 setosa Sepal.Length 5.1
## 2 setosa Sepal.Width 3.5
## 3 setosa Petal.Length 1.4
## 4 setosa Petal.Width 0.2
## 5 setosa Sepal.Length 4.9
## 6 setosa Sepal.Width 3
## 7 setosa Petal.Length 1.4
## 8 setosa Petal.Width 0.2
## 9 setosa Sepal.Length 4.7
## 10 setosa Sepal.Width 3.2
## # … with 590 more rows
tmp<-iris2 %>% dplyr::mutate(Species=as.character(Species),
val=as.numeric(value)) %>%
dplyr::select(-value) %>%
pivot_wider(
names_from = c("Species"),
values_from = val
)

tmp
## # A tibble: 4 x 4
## Type setosa versicolor virginica
## <chr> <list> <list> <list>
## 1 Sepal.Length <dbl [50]> <dbl [50]> <dbl [50]>
## 2 Sepal.Width <dbl [50]> <dbl [50]> <dbl [50]>
## 3 Petal.Length <dbl [50]> <dbl [50]> <dbl [50]>
## 4 Petal.Width <dbl [50]> <dbl [50]> <dbl [50]>
tmp %>% unnest(everything())## # A tibble: 200 x 4
## Type setosa versicolor virginica
## <chr> <dbl> <dbl> <dbl>
## 1 Sepal.Length 5.1 7 6.3
## 2 Sepal.Length 4.9 6.4 5.8
## 3 Sepal.Length 4.7 6.9 7.1
## 4 Sepal.Length 4.6 5.5 6.3
## 5 Sepal.Length 5 6.5 6.5
## 6 Sepal.Length 5.4 5.7 7.6
## 7 Sepal.Length 4.6 6.3 4.9
## 8 Sepal.Length 5 4.9 7.3
## 9 Sepal.Length 4.4 6.6 6.7
## 10 Sepal.Length 4.9 5.2 7.2
## # … with 190 more rows
  • unnest_auto(), unnest_longer(), unnest_wider(), and hoist()
tmp %>%  unnest_longer(setosa)## # A tibble: 200 x 4
## Type setosa versicolor virginica
## <chr> <dbl> <list> <list>
## 1 Sepal.Length 5.1 <dbl [50]> <dbl [50]>
## 2 Sepal.Length 4.9 <dbl [50]> <dbl [50]>
## 3 Sepal.Length 4.7 <dbl [50]> <dbl [50]>
## 4 Sepal.Length 4.6 <dbl [50]> <dbl [50]>
## 5 Sepal.Length 5 <dbl [50]> <dbl [50]>
## 6 Sepal.Length 5.4 <dbl [50]> <dbl [50]>
## 7 Sepal.Length 4.6 <dbl [50]> <dbl [50]>
## 8 Sepal.Length 5 <dbl [50]> <dbl [50]>
## 9 Sepal.Length 4.4 <dbl [50]> <dbl [50]>
## 10 Sepal.Length 4.9 <dbl [50]> <dbl [50]>
## # … with 190 more rows
tmp %>% unnest_wider(setosa)## # A tibble: 4 x 53
## Type ...1 ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Sepal.… 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 5.4
## 2 Sepal.… 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 3.7
## 3 Petal.… 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 1.5
## 4 Petal.… 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 0.2
## # … with 41 more variables: ...12 <dbl>, ...13 <dbl>, ...14 <dbl>,
## # ...15 <dbl>, ...16 <dbl>, ...17 <dbl>, ...18 <dbl>, ...19 <dbl>,
## # ...20 <dbl>, ...21 <dbl>, ...22 <dbl>, ...23 <dbl>, ...24 <dbl>,
## # ...25 <dbl>, ...26 <dbl>, ...27 <dbl>, ...28 <dbl>, ...29 <dbl>,
## # ...30 <dbl>, ...31 <dbl>, ...32 <dbl>, ...33 <dbl>, ...34 <dbl>,
## # ...35 <dbl>, ...36 <dbl>, ...37 <dbl>, ...38 <dbl>, ...39 <dbl>,
## # ...40 <dbl>, ...41 <dbl>, ...42 <dbl>, ...43 <dbl>, ...44 <dbl>,
## # ...45 <dbl>, ...46 <dbl>, ...47 <dbl>, ...48 <dbl>, ...49 <dbl>,
## # ...50 <dbl>, versicolor <list>, virginica <list>
tmp %>% unnest_auto(setosa)## # A tibble: 200 x 4
## Type setosa versicolor virginica
## <chr> <dbl> <list> <list>
## 1 Sepal.Length 5.1 <dbl [50]> <dbl [50]>
## 2 Sepal.Length 4.9 <dbl [50]> <dbl [50]>
## 3 Sepal.Length 4.7 <dbl [50]> <dbl [50]>
## 4 Sepal.Length 4.6 <dbl [50]> <dbl [50]>
## 5 Sepal.Length 5 <dbl [50]> <dbl [50]>
## 6 Sepal.Length 5.4 <dbl [50]> <dbl [50]>
## 7 Sepal.Length 4.6 <dbl [50]> <dbl [50]>
## 8 Sepal.Length 5 <dbl [50]> <dbl [50]>
## 9 Sepal.Length 4.4 <dbl [50]> <dbl [50]>
## 10 Sepal.Length 4.9 <dbl [50]> <dbl [50]>
## # … with 190 more rows
  • nest() and unnest() moved to (pack()/unpack(), and chop()/unchop())
df <- tibble(x1 = 1:3, x2 = 4:6, x3 = 7:9, y = 1:3)
df
## # A tibble: 3 x 4
## x1 x2 x3 y
## <int> <int> <int> <int>
## 1 1 4 7 1
## 2 2 5 8 2
## 3 3 6 9 3
df %>% pack(x = starts_with("x")) %>% unpack(x)## # A tibble: 3 x 4
## y x1 x2 x3
## <int> <int> <int> <int>
## 1 1 1 4 7
## 2 2 2 5 8
## 3 3 3 6 9
  • new expand_grid()
students <- tribble(
~ school, ~ student,
"A", "John",
"A", "Mary",
"A", "Susan",
"B", "John"
)
expand_grid(students, semester = 1:2)
## # A tibble: 8 x 3
## school student semester
## <chr> <chr> <int>
## 1 A John 1
## 2 A John 2
## 3 A Mary 1
## 4 A Mary 2
## 5 A Susan 1
## 6 A Susan 2
## 7 B John 1
## 8 B John 2

slider

Great new package not yet in cran but simplifies and replaces some of the tibbletime functionality. Great for applications such as rolling sd and rolling correlation

Good write-up in the slider website simple example below:

library(slider)
library(tidyverse)
## ── Attaching packages ────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.2 ✓ dplyr 1.0.2
## ✓ tibble 3.0.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.0
## ✓ purrr 0.3.4

## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
set.seed(123)
df <- tibble(
y = rnorm(100),
x = rnorm(100)
)
out<-df %>% mutate(m=slide2_dbl(y,x,cor, .before = 19,.complete = T))

tail(out)
## # A tibble: 6 x 3
## y x m
## <dbl> <dbl> <dbl>
## 1 1.36 -1.31 -0.141
## 2 -0.600 2.00 -0.204
## 3 2.19 0.601 -0.0799
## 4 1.53 -1.25 -0.154
## 5 -0.236 -0.611 -0.114
## 6 -1.03 -1.19 -0.00903

usethis

There was a lot of discussions on usethis package for package building setup and deployment. The link to usethis package is here: usethis

There is also a good blogpost on the steps to setup a R package quickly: Package blogpost

testthat

This was another workflow package that forces good software engineering. This will work in concert with usethis and other package building tools. Good write up by Hadley and package library for testthat

scales

Scales

Presentation on the functionality of scales from scale. The slides for this presentation highlight 5 areas of functionality for scales package.

  • transformation
  • bounds and rescaling
  • breaks
  • labels
  • palettes

A good usecase is to create custom transformation:

require(scales)## Loading required package: scales

##
## Attaching package: 'scales'

## The following object is masked from 'package:purrr':
##
## discard

## The following object is masked from 'package:readr':
##
## col_factor
require(ggplot2)
# use trans_new to build a new transformation
dollar_log <- trans_new(
name = "dollar_log",
# extract a single element from another trans
trans = log10_trans()$trans,
# or write your own custom functions
inverse = function(x) 10^(x),
breaks = breaks_log(),
format = label_dollar()
)
ggplot(diamonds, aes(y = price, x = carat)) +
geom_hex() +
scale_y_continuous(trans = dollar_log) +
scale_x_log10()

or rescaling data:

squish(c(-1, 0.5, 1, 2, NA), range = c(0, 1))## [1] 0.0 0.5 1.0 1.0  NAdiscard(c(-1, 0.5, 1, 2, NA), range = c(0, 1))## [1] 0.5 1.0  NAcensor(c(-1, 0.5, 1, 2, NA), range = c(0, 1))## [1]  NA 0.5 1.0  NA  NAggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
colour = Sepal.Length)) +
geom_point() +
scale_color_continuous(limit = c(6, 8), oob = scales::squish)

tidyeval

Using tidyeval framework for metaprogramming in R. It is commonly used in tidyverse to apply data masking. There is good writeup and bookdown is uder tidyevaluation.

The new innovation here is tunneling using brackets’

require(tidyverse)

mean_by1<-function(data,by,var) {
data %>%
group_by(by) %>%
summarize(var:= mean(var))
}

#mean_by1(iris,Species,Petal.Width)

mean_by<-function(data,by,var) {
data %>%
group_by({{ by}}) %>%
summarize({{var}} := mean({{var}}))
}

mean_by(iris,Species,Petal.Width)
## # A tibble: 3 x 2
## Species Petal.Width
## <fct> <dbl>
## 1 setosa 0.246
## 2 versicolor 1.33
## 3 virginica 2.03

Time series forecasting workshop

Next week blog post.

This is for the next blogpost the github for the training sessions github

Text book is under timeseries forecasting

renv

New blog post idea: renv

tidymodels/tune

New blog post idea: tidymodels/tune

tfprobability

New blogpost idea: tfprobability

--

--

Nima Safaian

My name is Nima Safaian, financial professional, data scientist, and corporate entrepreneur. Follow me on: https://github.com/bigdatalib