How to Create and Distribute an R Package

Shian Su
Jul 9, 2019

R is an open-source statistical programming language commonly used for data analysis. It features an easy-to-use package system that allows code to easily be shared.

The goal of this article is to set up an easily installable R package on Github for others to use via remotes::install_github(). I will assume you have a working understanding of Git and Github.

The official repository for R packages is CRAN, and the alternative repository Bioconductor contains many methods for bioinformatics. While these are the preferred channels for sharing R packages, it’s possible to distribute packages through Github, which has fewer requirements.

The main tools required are RStudio along with the packages roxygen2 and usethis. RStudio will be the program used for developing your package, roxygen2 provides support for documentation and usethis provides useful utilities for enhancing your package distribution.
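As a rough sketch, you could check which of these packages still need installing before you start. The helper name missing_packages is my own invention for this example, and I have added remotes to the list because it is used for installation later on.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("roxygen2", "usethis", "remotes"))
print(to_install)

# Uncomment to install whatever is missing from CRAN.
# if (length(to_install) > 0) install.packages(to_install)
```

The install line is left commented out so that nothing is downloaded until you decide to run it.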

Why make a package?

Creating a package allows you to share your analysis methods with others in a very efficient way. Having a package available for your methods dramatically improves usability and visibility of your work.

Packages are not just for complete software; they are also useful for sharing reproducible research. The most important facility packages provide is the inclusion of dependencies. When set up properly, a package will install with a single line and its contents can be run with minimal effort.

How to create an R package?

The easiest way to get started with creating a package is to use RStudio. Simply follow the instructions at Developing Packages with RStudio, and at the “Creating a New Package” step make sure to check “Create a git repository”.

I have called my package “PackageHow” and RStudio has generated the following folder structure.

.
├── DESCRIPTION
├── PackageHow.Rproj
├── NAMESPACE
├── R
│   └── hello.R
└── man
    └── hello.Rd

hello.R and hello.Rd are placeholders you are going to want to delete. So you should end up with a clean package looking like the structure below.

.
├── DESCRIPTION
├── PackageHow.Rproj
├── NAMESPACE
├── R
└── man

For my package I want just one function that links to this article. The function is documented using roxygen2; see RStudio’s instructions on how to set this up.

#' How to create an R package
#'
#' Open up an article describing how to create and distribute an R
#' package
#'
#' @return None
#' @export
package_how <- function() {
  browseURL("https://medium.com")
}

Save this in package_how.R under the R directory and build the documentation.
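If you prefer the console over RStudio’s menus, the documentation can also be built with roxygen2 directly. This is a minimal sketch that assumes your working directory is the package root; the can_document variable is just a guard I added so the snippet does nothing when run elsewhere.

```r
# Only run from the package root, i.e. the folder that contains DESCRIPTION.
can_document <- file.exists("DESCRIPTION") &&
  requireNamespace("roxygen2", quietly = TRUE)

if (can_document) {
  # Turns the #' comments in R/ into .Rd files under man/.
  roxygen2::roxygenise()
}
```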

Afterwards we will have the following structure.

.
├── DESCRIPTION
├── NAMESPACE
├── PackageHow.Rproj
├── R
│   └── package_how.R
└── man
    └── package_how.Rd

As per the instructions from RStudio, this package is now buildable and installable. But before it’s ready to be distributed, there is more work to be done.

Filling in the DESCRIPTION

Looking into the DESCRIPTION file you will find

Package: PackageHow
Type: Package
Title: What the Package Does (Title Case)
Version: 0.1.0
Author: Who wrote it
Maintainer: The package maintainer <yourself@somewhere.net>
Description: More about what it does (maybe more than one line)
    Use four spaces when indenting paragraphs within the Description.
License: What license is it under?
Encoding: UTF-8
LazyData: true

I will edit this with the basic information.

Package: PackageHow
Type: Package
Title: How to Make an R Package
Version: 0.1.0
Author: Shian Su
Maintainer: Shian Su <my_email_address@geemail.com>
Description: Link to a Medium article describing how to create an
    R package.
License: What license is it under?
Encoding: UTF-8
LazyData: true

Dependencies

CRAN dependencies

Then we need to add in the dependencies. R has three main notions of dependencies: Depends, Imports and Suggests. These affect how packages are installed and loaded¹.

  • Depends. These packages are loaded when your package loads. It is the “strongest” form of dependency and usually shouldn’t be used, except when your package essentially complements another package. An example of when Depends is necessary is if your package mainly operates on a data structure defined by another package.
  • Imports. This should be the most common form of dependency. The functions of these packages are made available to your package but not to the user.
  • Suggests. This is the weakest form of dependency. It indicates packages whose functions are used in your package but only in rare instances. This allows you to provide obscure features that might be useful to some users without forcing every user to install the packages required to support those features.
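To make the Suggests idea concrete, here is a sketch of how a function might guard an optional feature. The function name plot_if_available and the column names x and y are assumptions made up for this example; the pattern itself is to check for the package with requireNamespace() and fail with a helpful message if it is absent.

```r
# Hypothetical function that relies on a package listed under Suggests.
plot_if_available <- function(df) {
  if (!requireNamespace("ggplot2", quietly = TRUE)) {
    stop(
      "Package 'ggplot2' is needed for this function. Please install it.",
      call. = FALSE
    )
  }
  # Functions from the optional package are called with the :: prefix.
  ggplot2::ggplot(df, ggplot2::aes(x = x, y = y)) +
    ggplot2::geom_point()
}
```

Users who never call this function never need ggplot2, which is the point of listing it under Suggests rather than Imports.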

Under the default options of remotes::install_github(), installing a package will also install the packages listed under Depends and Imports, while additional options can be used to also install packages from Suggests. These can be added to your DESCRIPTION file as follows.

Package: PackageHow
Type: Package
Title: How to Make an R Package
Version: 0.1.0
Author: Shian Su
Maintainer: Shian Su <my_email_address@geemail.com>
Description: Link to a Medium article describing how to create an
    R package.
Depends:
    tibble
Imports:
    dplyr,
    tidyr
Suggests:
    ggplot2
License: What license is it under?
Encoding: UTF-8
LazyData: true
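One way to double-check that the dependency fields parse the way you expect is to read them back with base R’s read.dcf(). The sketch below writes a cut-down copy of the DESCRIPTION to a temporary file so it can run anywhere; the parse_deps helper is my own name for this example, and in practice you would point read.dcf() at your real DESCRIPTION file.

```r
# A trimmed copy of the DESCRIPTION, written to a temporary file.
desc_text <- c(
  "Package: PackageHow",
  "Type: Package",
  "Title: How to Make an R Package",
  "Version: 0.1.0",
  "Depends:",
  "    tibble",
  "Imports:",
  "    dplyr,",
  "    tidyr",
  "Suggests:",
  "    ggplot2"
)
desc_file <- tempfile("DESCRIPTION")
writeLines(desc_text, desc_file)

# Read only the dependency fields.
fields <- read.dcf(desc_file, fields = c("Depends", "Imports", "Suggests"))

# Hypothetical helper: split a comma-separated field into package names.
parse_deps <- function(x) trimws(strsplit(x, ",")[[1]])
deps <- lapply(fields[1, ], parse_deps)
print(deps)
```

Note that multiple packages in one field are separated by commas.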

Bioconductor and Github Dependencies

By default the installer will only look for dependencies on CRAN. Two additional tricks can be applied for other dependencies: Remotes and biocViews.

  • Remotes. This allows you to depend on other Github packages that are not yet on CRAN or Bioconductor. List the source of each such package using user/repository notation. The package itself should still appear under Depends, Imports or Suggests, since Remotes only tells the installer where to find it.
  • biocViews. This tells remotes::install_github() to also look inside Bioconductor for dependencies. This works differently from the other fields, as it’s simply the tagging system for Bioconductor. You list your dependencies in their regular fields and add biocViews: Software to let remotes know that your dependencies might be found on Bioconductor.

Package: PackageHow
Type: Package
Title: How to Make an R Package
Version: 0.1.0
Author: Shian Su
Maintainer: Shian Su <my_email_address@geemail.com>
Description: Link to a Medium article describing how to create an
    R package.
Depends:
    tibble
Imports:
    dplyr,
    tidyr,
    limma
Suggests:
    ggplot2,
    testthat
Remotes:
    r-lib/testthat
biocViews:
    Software
License: What license is it under?
Encoding: UTF-8
LazyData: true

At this point your package can be installed along with all the required dependencies using a single command.
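As a sketch of what that single command looks like, including the option for also pulling in Suggests, consider the wrapper below. The function install_package_how is hypothetical and exists only to show the two variants side by side; the dependencies argument belongs to remotes::install_github(), and the repository name is the one used for this article.

```r
# Hypothetical wrapper around the single install command.
install_package_how <- function(with_suggests = FALSE) {
  # dependencies = NA is the default (Depends, Imports, LinkingTo);
  # dependencies = TRUE also installs the packages under Suggests.
  remotes::install_github(
    "shians/PackageHow",
    dependencies = if (with_suggests) TRUE else NA
  )
}

# install_package_how()                      # required dependencies only
# install_package_how(with_suggests = TRUE)  # also install Suggests
```

The calls are commented out because they need a network connection and will install packages into your library.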

Finishing Touches

The final things to do are to put in a license for your open source project and write a helpful README. The usethis package provides some quick functions to help with this. It can also help you set up unit testing and continuous integration, but we won’t worry about that here. To set up your README and license (I’ll use MIT in this example) you run

usethis::use_readme_md()
usethis::use_mit_license()

This creates the files README.md, LICENSE, and LICENSE.md for you and modifies your DESCRIPTION to say License: MIT + file LICENSE.

.
├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── PackageHow.Rproj
├── R
│   └── package_how.R
├── README.md
└── man
    └── package_how.Rd

Then we should fill in the README with useful information.

# PackageHow

<!-- badges: start -->
<!-- badges: end -->

This package links to an article about creating R packages.

## Installation

You can install this package using

``` r
remotes::install_github("shians/PackageHow")
```

## Example

To use the function provided by this package, run the following code.

```r
library(PackageHow)
package_how()
```

## License

This package is licensed under the MIT license.

Publishing the Package

To publish the package via Github, you first need to commit all your work to the local git repository. You can do this within RStudio’s integrated git interface.

You can quickly add all the files by using Ctrl/Cmd + a and Space/Enter

Once you’ve added all your files to git and committed your first set of changes, head over to Github.com to create a new repository.

All you need is the name; leave everything else as is. After clicking “Create Repository” you should be given instructions on how to upload from an existing local repository. For me the commands look like this.

git remote add origin git@github.com:Shians/PackageHow.git
git push -u origin master

You can run these commands inside RStudio’s terminal.

The first two lines are taken from Github’s instructions upon creating a new blank repository

My package is now ready at https://github.com/shians/packagehow, and it doesn’t look half bad! Good luck with creating your own R packages and please comment with suggestions on how to improve this article.

[1]: R documentation uses the terms “attached” and “loaded” to have specific meanings. I use “loaded” in the sense of “attached and loaded”, as the distinction is beyond the scope of this article. Attached refers to when the functions of a package are made available to the user, while loaded refers to when the functions are made available to the package specifying the dependency.
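For the curious, the difference can be seen in a short sketch using the tools package that ships with R. The variable names are just for illustration.

```r
# loadNamespace() loads a package without attaching it.
loadNamespace("tools")
loaded <- isNamespaceLoaded("tools")
attached_before <- "package:tools" %in% search()

# library() attaches it, putting its functions on the search path.
library(tools)
attached_after <- "package:tools" %in% search()

print(c(loaded = loaded, attached_after = attached_after))
```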
