The use case of S7 — the last (or just most recent) OOP system in R

Petr Pavlík
Published in Ph.D. stories
6 min read · Jan 7, 2024

R has been around for almost 30 years and has become widely used, especially in academia and research. Much of its popularity comes from its focus on data, the number of statistical tools available and the ability to produce high-quality visuals out of the box. It is fairly easy to write and run simple R code, even for a newcomer. RStudio (now Posit) also hit the spot with some astonishing reporting tools like R Markdown and Shiny (special credit should be given to Yihui Xie for tools like knitr and bookdown).

The speed and simplicity of prototyping in R has some drawbacks, though. Without additional packages like data.table or R6, R is limited to pass-by-value semantics (copy-on-modify, strictly speaking), in-memory computation and single-threaded execution by default. Compared to languages like Julia, the performance in heavy-duty computing is just not there (although it has improved thanks to the aforementioned packages and some refactoring of the core in the past). Of course there are ways to speed things up, but the most efficient ones involve coding in compiled languages like Fortran or C/C++. Dirk Eddelbuettel and Romain Francois put a lot of effort into their Rcpp package to make this easier for the rest of us.

In the author's field, hydrological modeling, the workflow consists of a loop that combines high-performance computation with visualization and reporting. The sections below outline how the latest OOP approach and Rcpp can work very well together.

A very short intro to object oriented programming paradigms in R

Even as a novice R user, one surely comes across R's unique (or weird) OOP systems. Yes, plural, there were actually about 10 in total. Now some are useful, some are even more useful and then there are the dead and buried.

Let’s start with the ones in agony and slowly build up a use case for the latest addition, S7. These categories are opinionated.

Arguably, the first row of the graveyard contains mutatr and R5, both experimental and abandoned. R.oo and OOP (the package) also never really made it and, together with proto, are buried in the second row. proto was used in the ggplot2 package but was replaced in version 2.0.0 by ggproto, ggplot2's own OOP system. The one that could be considered slowly dying is the Reference Classes system (setRefClass()). Although it has features like inheritance and encapsulation, R6 seems to do the same in a somewhat less clunky way, and the same seems to be true of the newest addition, S7.

A step aside to the operators

Learning a new programming language can be hard, particularly if you have never programmed before. R is pretty straightforward (data structures resemble the data itself and the control flow is obvious) until the moment you are introduced to the ggplot2 package with its +-powered layering system. This usually happens very soon, given the rise of the tidyverse ecosystem, of which ggplot2 is a part. Until then, you have probably never had to use the plus sign to add anything other than numbers or logical values. Suddenly, consecutive lines of code are glued together and attributes like data or aesthetics are inherited. What + does here is an example of an overloaded operator, used for combining prototype objects.

library(ggplot2)
data(CO2)
# conc, uptake and Plant are columns of the built-in CO2 dataset
CO2 |>
  ggplot() +
  geom_line(aes(x = conc, y = uptake, group = Plant))

That would not be so confusing if the workflow of almost every other tidyverse package did not use a different glue operator, the pipe. The pipe's main purpose is to pass the data through a chain of consecutive function calls, which avoids naming the object over and over. It also significantly changes how the code reads, and so divides R programmers into distinct camps of tidyverse lovers and haters.

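# introducing the pipe operator (%>% is re-exported by dplyr)
library(dplyr)
CO2 %>%
  group_by(Type) %>%
  summarize(conc = mean(conc)) %>%
  # group = 1 joins the per-Type means with a single line
  ggplot(aes(x = Type, y = conc, group = 1)) +
  geom_line()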

Hadley Wickham, arguably the most prolific author in the R community and the creator of a plethora of packages, including titans like ggplot2, has noted on multiple occasions that the + is one of those older and by now somewhat obsolete design choices, but that it is too late and would be too demanding to change it to e.g. |> for the sake of consistency. Now, what is |>? It is the native pipe, available since R 4.1.0, and it does (almost) the same thing as the %>% pipe, most of the time just mildly faster, since it is handled by the parser (written in C) rather than by an R function. Operators in R are just functions and overloading them is common. With OOP they are particularly useful.
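To make the last two points tangible, here is a minimal sketch in base R (4.1 or newer). The recipe class and its steps are hypothetical, made up only for illustration, and merely mimic the idea of ggplot2-style layering; this is not how ggplot2 itself is implemented.

```r
# Operators are ordinary functions: these two calls are equivalent.
1 + 2
`+`(1, 2)

# The native pipe is rewritten by the parser into a plain function call.
piped <- quote(CO2 |> head(3))
plain <- quote(head(CO2, 3))
identical(piped, plain)

# Overloading + for a hypothetical S3 class "recipe": adding two recipes
# concatenates their steps.
recipe <- function(...) structure(list(steps = c(...)), class = "recipe")
`+.recipe` <- function(e1, e2) recipe(e1$steps, e2$steps)

combined <- recipe("load data") + recipe("draw line")
combined$steps
```

Because the method is looked up by the class of the operands, the plus sign keeps its usual meaning for numbers and only changes behavior for recipe objects.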

Back to OOP

The plot() and summary() functions are representatives of the S3 system, an extremely small but efficient OOP system that is very easy to define and use.

# Creating an S3 class
my_object <- list(data = 1:10)
class(my_object) <- "my_class"

# Creating a method for the class
print.my_class <- function(x, ...) {
  cat("Data:", x$data, "\n")
  # print methods conventionally return their argument invisibly
  invisible(x)
}

# Calling our new method on a new object
print(my_object)

The purpose of S3 is straightforward — to be able to use commonly known functions as methods with defined behavior on different objects.
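The same mechanism works for generics of your own. The sketch below uses a hypothetical generic called describe (not part of base R) to show how one function name gets a different behavior for each class, with a default as a fallback.

```r
# A hypothetical generic: UseMethod() picks the method by the class of x.
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists.
describe.default <- function(x, ...) "an ordinary R object"

# Method for objects of class "my_class".
describe.my_class <- function(x, ...) {
  paste("my_class with", length(x$data), "values")
}

obj <- structure(list(data = 1:10), class = "my_class")
describe(obj)
describe(42)
```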

Later came the S4 system, which is widely used among Bioconductor packages. It is also very popular with libraries that handle complex spatial data, such as terra and sp. It introduces some important features such as virtual classes that cannot be instantiated, a structured approach to class definition, and a higher level of organization through slots.
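For comparison with S3, here is a small S4 sketch using the methods package that ships with R. The Shape and Circle classes and the area generic are hypothetical names chosen for illustration; they show a virtual class, slots and a formally declared generic.

```r
library(methods)

# A virtual class cannot be instantiated; it only serves as a parent.
setClass("Shape", representation("VIRTUAL", name = "character"))

# A concrete child class with an additional slot.
setClass("Circle", contains = "Shape", representation(radius = "numeric"))

# Generics are declared explicitly, then methods are attached per class.
setGeneric("area", function(shape) standardGeneric("area"))
setMethod("area", "Circle", function(shape) pi * shape@radius^2)

circle <- new("Circle", name = "unit circle", radius = 1)
area(circle)

# Trying to instantiate the virtual parent fails.
attempt <- try(new("Shape"), silent = TRUE)
inherits(attempt, "try-error")
```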

S3 + S4 = S7 use case

Actually, the original name was R7. It was apparently changed to S7 because the S in S3 and S4 comes from the S language and the new system is meant to be a successor to both. It should be stated that, according to its authors, S7 is still experimental and is not yet part of base R (it is available on CRAN, though), so substantial changes may still occur. What seems useful is that, even though the definition of an S7 class resembles the style of R6, its usage looks more functional and is free of $ member function calls.

So let Model be a general class for which we will define a child and overload operators that work with the property r, the runoff.

Model <- S7::new_class(
  "Model",
  package = "wrmt",
  properties = list(
    structure = S7::class_character,
    date = S7::class_Date,
    r = S7::class_numeric))
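Before moving on to the child class, it may help to see how such a class behaves in use. The following is a small sketch that assumes the S7 package from CRAN is installed; it redefines a stand-in Model without the package argument, and the property values are made up.

```r
library(S7)

# Stand-in for the Model class (package argument omitted).
Model <- new_class(
  "Model",
  properties = list(
    structure = class_character,
    date = class_Date,
    r = class_numeric))

# Properties that are not supplied fall back to empty defaults.
m <- Model(structure = "demo", r = c(1.5, 2.5))

# Properties are read and written with @, and writes are type-checked.
m@r <- m@r * 2

# Assigning a value of the wrong type is rejected with an error.
bad <- tryCatch({
  m@r <- "a lot"
  "accepted"
}, error = function(e) "rejected")
bad
```

The type check on assignment is what distinguishes properties from the free-form attributes of S3 objects.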

The HBV class inherits from the parent Model and defines a constructor that stores the result of HBVcppInit() in the ptr property. The C++ HBV object is made accessible to R through an Rcpp::XPtr export. HBV stands for Hydrologiska Byråns Vattenbalansavdelning, and the model can be considered one of the standard conceptual hydrological models because of how widely it is used. For reference see [1]. Since this contribution is about S7, I will not delve into the C++ implementation behind HBVcppInit().

HBV <- S7::new_class(
  "HBV",
  package = "wrmt",
  parent = Model,
  properties = list(
    pars = S7::class_numeric,
    ptr = S7::new_property(class = S7::class_any)),
  # structure and ptr are set inside the constructor, not by the caller
  constructor = function(date = as.Date(character()),
                         r = numeric(),
                         pars = numeric()) {
    S7::new_object(
      # the parent instance carries the properties inherited from Model
      Model(structure = "HBV", date = date, r = r),
      # properties added by HBV itself
      pars = pars,
      ptr = wrmt::HBVcppInit())
  })

With the HBV model defined, let's provide a sample operator, for example one that combines the r property of two instances. This is done in two steps: first a new generic is created that dispatches on two arguments, representing the left- and right-hand side of the operator, and then a method is registered for Model. In this case the operator works with the properties of the class (the S7 counterpart of slots).

`%>>%` <- S7::new_generic(
  "%>>%",
  dispatch_args = c("i", "o"),
  function(i, o) {
    S7::S7_dispatch()
  })

# one class per dispatch argument: left-hand side, right-hand side
S7::method(`%>>%`, list(Model, Model)) <- function(i, o) {
  # the result is a plain Model that carries the summed runoff
  x <- Model(r = i@r + o@r)
  return(x)
}
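The double dispatch can be tried out without any C++ backend. The sketch below assumes the S7 package is installed and uses a stripped-down stand-in for Model with only the r property; the runoff values are invented.

```r
library(S7)

# Simplified stand-in for Model: only the runoff property r is kept.
Model <- new_class("Model", properties = list(r = class_numeric))

# Generic that dispatches on both operands.
`%>>%` <- new_generic("%>>%", c("i", "o"), function(i, o) {
  S7_dispatch()
})

# Method for the (Model, Model) pair: sum the two runoff series.
method(`%>>%`, list(Model, Model)) <- function(i, o) {
  Model(r = i@r + o@r)
}

upstream <- Model(r = c(1, 2, 3))
tributary <- Model(r = c(0.5, 0.5, 0.5))
downstream <- upstream %>>% tributary
downstream@r

# No method is registered for (Model, numeric), so dispatch fails.
outcome <- tryCatch(upstream %>>% 1, error = function(e) "no method found")
outcome
```

Dispatching on both sides means the operator only works for combinations that were explicitly registered, which is a useful safety net when different model types are mixed.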

Other member functions, such as getters and setters, could be implemented the same way, except that they would typically dispatch on a single argument. An example workflow with HBV instances could look like this:

# define two Model/HBV objects
# (separate calls, so that each object gets its own C++ pointer)
A <- HBV(r = 1:10)
B <- HBV(r = 1:10)

# a default print method of S7 class instance
> A
<wrmt::HBV>
@ structure: chr "HBV"
@ date : 'Date' num(0)
@ p : int(0)
@ r : int [1:10] 1 2 3 4 5 6 7 8 9 10
@ pars : int(0)
@ ptr :<externalptr>

# use the operator to pass the r from A as input to B
A %>>% B

# extract values from the inner pointer (get_values is a wrapper for C++ func)
A |>
  get_values()

# or other member methods
A |>
  optimize(...)

As can be seen, the workflow is the same as with ordinary R objects. Although hydrological models are often quite complex structures, this implementation using S7 classes seems to offer the benefit of handling C++ pointers while keeping the simplicity of R's functional style.
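The single-dispatch member functions and the piped style can be sketched without the C++ part as well. The example assumes the S7 package is installed; Basin, get_runoff and scale_runoff are hypothetical names and are not the wrmt API.

```r
library(S7)

# Simplified stand-in for a model class, without the external pointer.
Basin <- new_class("Basin", properties = list(r = class_numeric))

# Hypothetical getter dispatching on a single argument.
get_runoff <- new_generic("get_runoff", "x")
method(get_runoff, Basin) <- function(x, ...) x@r

# Hypothetical modifier that returns an updated copy of the object.
scale_runoff <- new_generic("scale_runoff", "x")
method(scale_runoff, Basin) <- function(x, ..., factor = 1) {
  x@r <- x@r * factor
  x
}

basin <- Basin(r = c(1, 2, 3))

# Methods chain with the native pipe like any other R function.
total <- basin |>
  scale_runoff(factor = 10) |>
  get_runoff() |>
  sum()
total

# The original object is untouched, as usual for R values.
basin@r
```

Note that this copy-on-modify behavior applies to plain properties only; an external pointer stored in a property would still refer to the same C++ object after such a copy.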

If you find this approach interesting, follow the development of the wrmt package.

[1] Bergström, S., 1976. Development and application of a conceptual runoff model for Scandinavian catchments, SMHI Report RHO 7, Norrköping, 134 pp.
