Dependencies and bloat

David Hugh-Jones
Mar 14, 2018 · 1 min read

Dirk Eddelbuettel worries about excessive dependencies in R packages.

Okay, but there is a countervailing worry about the amount of bloat in base R:

> length(ls("package:base"))
[1] 1217

Can you tell me what all of these do?

> grep("apply", ls("package:base"), value = TRUE)
[1] "apply" "eapply" "lapply" "mapply" "rapply" "sapply" "tapply" "vapply"
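For what it's worth, here is a minimal sketch of what six of them return on toy inputs. The inputs are made up for illustration, and this is only the simplest use of each function, not a full account of the differences.

```r
# Toy inputs, invented for this example.
x <- list(a = c(1, 2, 3), b = c(4, 5, 6))
m <- matrix(1:6, nrow = 2)

# lapply always returns a list.
as_list <- lapply(x, sum)

# sapply simplifies the result when it can, here to a named numeric vector.
simplified <- sapply(x, sum)

# vapply is like sapply, but you declare the type and length of each result.
checked <- vapply(x, sum, FUN.VALUE = numeric(1))

# mapply walks along several vectors in step: 1 * 4, 2 * 5, 3 * 6.
pairwise <- mapply(function(n, v) n * v, 1:3, 4:6)

# tapply applies a function within the groups defined by its second argument.
by_group <- tapply(c(10, 20, 30, 40), c("u", "v", "u", "v"), sum)

# apply works over the margins of a matrix (1 = rows, 2 = columns).
row_totals <- apply(m, 1, sum)

print(simplified)   # a = 6, b = 15
print(by_group)     # u = 40, v = 60
print(row_totals)   # 9 12
```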

And then there’s:

  • A function rownames and a function row.names.
  • rowSums and rowsum and .rowSums.
  • nrow and NROW.
  • base::rbind and methods::rbind2.
  • Two functions in the parallel package, mclapply and parLapply, that appear to do exactly the same thing. Like bitter divorcees, mclapply mentions parLapply in the documentation, but parLapply doesn’t mention mclapply, and neither of them explains the differences between them.
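To make the near-duplicates concrete, here is a rough sketch of how some of these pairs behave on small inputs. The matrix, vector and data frame are invented for illustration, and the comments describe only what these particular calls return.

```r
# Toy inputs, invented for this example.
m  <- matrix(1:6, nrow = 2, dimnames = list(c("r1", "r2"), NULL))
v  <- 1:3
df <- data.frame(x = 1:2, row.names = c("a", "b"))

# nrow() reads the dim attribute, so a plain vector gives NULL;
# NROW() treats the vector as a one-column matrix.
n_lower <- nrow(v)   # NULL
n_upper <- NROW(v)   # 3

# rowSums() adds across each row; rowsum() adds rows together within groups.
rs      <- rowSums(m)                      # r1 = 9, r2 = 12
grouped <- rowsum(m, group = c("g", "g"))  # one row "g": 3 7 11

# .rowSums() is the bare-bones version: you pass the dimensions yourself
# and get no names back.
fast <- .rowSums(m, 2, 3)                  # 9 12

# On a data frame, rownames() and row.names() return the same thing here.
same_names <- identical(rownames(df), row.names(df))  # TRUE

# rbind() takes any number of arguments; rbind2() takes exactly two.
two_rows <- methods::rbind2(1:2, 3:4)

print(n_lower)
print(n_upper)
print(grouped)
```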

And to match a regular expression, should you use grep, grepl, regexpr, gregexpr or regexec?
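As a hedged illustration, this is what each of the five returns for one pattern on a toy vector (the vector and patterns are made up for the example):

```r
x <- c("apple", "banana", "cherry")

# grep: which elements match (or the elements themselves with value = TRUE).
idx  <- grep("an", x)                 # 2
vals <- grep("an", x, value = TRUE)   # "banana"

# grepl: one TRUE/FALSE per element.
hit <- grepl("an", x)                 # FALSE TRUE FALSE

# regexpr: position of the first match in each element, -1 if none.
first_pos <- regexpr("an", x)         # -1 2 -1, plus a match.length attribute

# gregexpr: positions of all matches, as a list with one entry per element.
all_pos <- gregexpr("an", x)[[2]]     # 2 4 for "banana"

# regexec: the first match plus its capture groups, also as a list.
# regmatches() pulls out the matched text.
groups <- regmatches(x, regexec("(b)(an)", x))[[2]]  # "ban" "b" "an"

print(hit)
print(as.integer(first_pos))
print(groups)
```

So grep and grepl answer "does it match?", while regexpr, gregexpr and regexec answer "where?", each in a different return format.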

I could go on. I’ve been programming in R for fifteen years, and looking through the documentation today, I came across functions that I’d never even heard of, much less used. No wonder it is daunting for a beginner.

In this respect, R is a bit like Windows. It has carefully kept backwards compatibility, and this is surely a big reason for its success — especially with scientists who need old scripts to keep working, for reasons of reproducibility. But it has also added layers of cruft.

The tidyverse has been successful because it gets rid of the cruft and creates a more modern and consistent interface. So I’d rather depend on a few packages, if it means more readable code.
