Getting Started with R

Introduction to the R programming language

Kenneth Reilly
CodeX
Aug 17, 2021


What is R?

R is a programming language and environment designed for statistical computing and static graphics. First released in 1993, it builds upon the S language developed at Bell Labs in the 1970s, with influences from Common Lisp, Scheme, and others. R is an interpreted language: code is not compiled ahead of time into a native binary, but is evaluated at runtime (although since version 3.4, R byte-compiles functions just in time by default).

The R language is dynamically typed, meaning that the type of a variable is not fixed before runtime and a name can be re-bound to a value of a different type. This is a common trait of interpreted languages, which have no separate compile step to check types before an executable is produced. R has stronger support for object-oriented programming than most statistical languages, through its S3, S4, and Reference Class systems. It also provides many features for linear and nonlinear modeling, spatial and time-series analysis, clustering, static graphics with mathematical symbols, and more, including the ability to link performance-critical modules written in languages such as C, C++, and Fortran.
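To make the two ideas above concrete, here is a short sketch in R. It shows a name being re-bound to values of different types, and a minimal S3 class (S3 is one of R's object systems) with its own print() method. The class name station and the constructor new_station() are hypothetical, chosen only for illustration.

```r
# Dynamic typing: the same name can be re-bound to a value of another type.
x <- 42
typeof(x)   # "double"
x <- "forty-two"
typeof(x)   # "character"

# A minimal S3 class. "station" and new_station() are hypothetical names;
# an S3 object is simply a list tagged with a class attribute.
new_station <- function(name, lat, lon) {
  structure(list(name = name, lat = lat, lon = lon), class = "station")
}

# print() is a generic function; for objects of class "station" it
# dispatches to this method.
print.station <- function(x, ...) {
  cat(sprintf("Station %s at (%.2f, %.2f)\n", x$name, x$lat, x$lon))
  invisible(x)
}

s <- new_station("Example", 45.5, -73.6)
print(s)   # Station Example at (45.50, -73.60)
s          # auto-printing at the prompt dispatches the same way
```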

Environment Setup

A fully-functional development environment for R includes the R interpreter itself and, optionally, an editor or IDE with R support such as RStudio or VSCode. Installation differs for each operating system, so follow the instructions below for the OS you will be using to work with R:

macOS (via Homebrew): brew install r

Ubuntu: sudo apt-get install r-base

Windows: Download and install R and (optionally) RStudio

There is also an R extension available for VSCode.

Concepts

The R environment provides an integrated system for working with numerical data. It includes a set of operators for working with arrays and matrices, integrated tools for data analytics, publication-quality static graphics for both screen and print, and much more.
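As a quick illustration of the array and matrix operators mentioned above, the following sketch uses only base R. Arithmetic on vectors is element-wise, and %*% performs matrix multiplication:

```r
# Arithmetic on vectors is element-wise, so no explicit loop is needed.
v <- c(1, 2, 3, 4)
v * 2                   # 2 4 6 8
v + c(10, 20, 30, 40)   # 11 22 33 44
mean(v)                 # 2.5

# matrix() fills column by column unless byrow = TRUE.
m <- matrix(1:6, nrow = 2)   # rows: (1, 3, 5) and (2, 4, 6)
t(m)                         # transpose, a 3 x 2 matrix
p <- m %*% t(m)              # matrix product, a 2 x 2 matrix
p                            # rows: (35, 44) and (44, 56)
```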

The best way to get started with many interpreted languages is to explore the interactive interpreter from the command line. To start the R interpreter, open a terminal and run the command R. This should display some version information along the following lines:

R version 4.1.1 (2021-08-10) -- "Kick Things"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin20.4.0 (64-bit)
...
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

To get an idea of what R can do, run the demo() command and browse the list of available demos, which range from error handling and recursion to a table of Japanese characters that showcases some of R’s graphical and print capabilities. Press the q key to close the list and return to the prompt. Let’s check out the graphics demo by entering the following command at the interactive prompt:

> demo(graphics, package="graphics")

This will queue up the demo as follows:

    demo(graphics)
    ---- ~~~~~~~~

Type  <Return>  to start :

This should load up a blank window which will be used to display the output graph. Press return or enter once more to plot the first graph:

Screenshot of example output for the graphics demo

This illustrates one of the basic features of R: the ability to analyze data and plot it on a graph. The terminal also echoes the R code used to create each graph, which shows the functions involved and their parameters, such as those that set the foreground and background colors. Pressing return or enter again cycles to the next example graph.
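You can reproduce plots like the ones in the demo with a few lines of your own. This minimal sketch uses only the base graphics functions plot(), lines(), and legend(). It writes to a PDF device rather than a screen window, so it also works on a machine without a display. The output file name is arbitrary.

```r
# Draw to a PDF file instead of an on-screen window; the file name is
# arbitrary and the file is written to R's temporary directory.
out_file <- file.path(tempdir(), "sin-cos.pdf")
pdf(out_file, width = 8, height = 6)   # page size in inches

x <- seq(0, 2 * pi, length.out = 200)
plot(x, sin(x), type = "l", col = "blue", lwd = 2,
     main = "sin(x) and cos(x)", xlab = "x", ylab = "value")
lines(x, cos(x), col = "red", lty = 2)   # add a second series to the plot
legend("topright", legend = c("sin", "cos"),
       col = c("blue", "red"), lty = c(1, 2))

invisible(dev.off())   # close the device so the file is flushed to disk
```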

Language Syntax

R differs from languages used for systems-level programming and application engineering, with a syntax streamlined for mathematics and for creating static graphics for publication. One visible difference is that R allows the period (dot) character . in variable names. In many other languages, this character is used for member access or to resolve scope, and cannot appear in a variable name.
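Here is a short sketch of what this means in practice (the variable names are illustrative only):

```r
# Dots are ordinary characters in R names and carry no scoping meaning.
user.name <- "Ada"
max.retries <- 3
user.name       # "Ada"

# Elements of a list are accessed with $ or [[ ]], not with a dot.
user <- list(name = "Ada", city = "Toronto")
user$name       # "Ada"
user[["city"]]  # "Toronto"
```

One caveat: dots are not entirely meaningless. R's S3 object system names methods as generic.class (print.data.frame, for example), so a function named print.station is treated as a print() method for objects of class station. Partly for this reason, some style guides, such as the tidyverse style guide, recommend snake_case names instead.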

Here is a quick example we can run in the interpreter which will store a number in a variable and then determine the data type of what was stored:

> asdf <- 1234
> typeof(asdf)
[1] "double"

Here we have assigned the number 1234 to a variable named asdf and then requested the data type of the stored value. We find that asdf contains a double, R’s representation of a double-precision (64-bit IEEE 754) floating-point number, which provides at least 15 significant decimal digits. Double is the default type for numeric literals in R, and special values such as Inf, -Inf, and NaN are doubles as well. An integer must be requested explicitly, for example with the L suffix (1234L).
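The following base R lines extend the typeof() example to a few related cases. They show how to get an integer instead of a double, and why doubles should be compared with a tolerance:

```r
typeof(1234)     # "double"  (numeric literals are doubles by default)
typeof(1234L)    # "integer" (the L suffix requests a 32-bit integer)
typeof(Inf)      # "double"
typeof(NaN)      # "double"
typeof(1:3)      # "integer" (the colon operator yields integers here)

.Machine$double.digits   # 53 bits in a double's significand
.Machine$integer.max     # 2147483647, the largest integer value

0.1 + 0.2 == 0.3                   # FALSE: binary floating point is inexact
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE: compare with a tolerance instead
```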

To retrieve some input from the user and do some work with it:

> user.name <- readline(prompt="Name: ")
Name: ケン
> user.dtob <- as.Date(readline(prompt="DOB, yyyy/mm/dd: "))
DOB, yyyy/mm/dd: 1981/12/21
> user.days <- as.numeric(Sys.Date() - user.dtob)
> print(paste0("Hello ", user.name, ", you have been alive for ", user.days, " total days"))
[1] "Hello ケン, you have been alive for 14483 total days"

This illustrates how to accomplish a simple task: read information from the user and derive new information from it. Instances of the Date class can be manipulated with standard math operators. Subtracting one Date from another dispatches to a Date-specific method that returns the difference in days, and leap years are accounted for internally. This is a common feature of OOP: abstraction and encapsulation, where the object hides how the calculation is done.
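readline() only waits for input in an interactive session; in a script it returns an empty string. The sketch below therefore uses fixed dates to show the same Date arithmetic non-interactively, including the leap-year handling mentioned above:

```r
# Fixed dates instead of readline(), so this runs non-interactively.
# as.Date() accepts both "yyyy-mm-dd" and "yyyy/mm/dd" by default.
dob <- as.Date("1981/12/21")

# Subtracting Dates calls the Date method for "-" and returns a difftime.
leap <- as.Date("2020-03-01") - as.Date("2020-02-28")
leap                 # Time difference of 2 days (2020 has a Feb 29)
as.numeric(leap)     # 2

# The same span in a non-leap year is one day shorter.
as.numeric(as.Date("2021-03-01") - as.Date("2021-02-28"))   # 1

# difftime() reports the difference in an explicit unit.
age <- difftime(as.Date("2021-08-17"), dob, units = "days")
age

# Adding a number to a Date moves it forward by that many days.
dob + 30             # "1982-01-20"
```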

Real-World Example

For a real use case, we will retrieve a JSON list of EV charging station locations in Canada and the US from the Open Charge Map API and plot these stations on a map using the maps package. The source code for this project is available on GitHub. The project is organized into source and data directories and provides a configuration file.

Let’s take a look at config.yaml first, which defines the datasource:

When loading config data with the config package, at least a default configuration must be specified in the .yaml file. Here, we have defined a datasource with url and key properties. The URL contains query string parameters for the API, including the country IDs (based on Open Charge Map’s ID assignment) and a maximum of 1000 results. An API key must be provided, which can be obtained by creating an account on the OCM website.
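The contents of config.yaml are not reproduced here, so the following is only a sketch of how such a file might look and be loaded with the config package’s config::get() function. The YAML layout follows the description above: a default configuration containing a datasource with url and key. The endpoint URL, the placeholder key, and the helper name load_datasource() are assumptions, not the project’s actual code.

```r
# Hypothetical config.yaml layout: a required `default` configuration with
# a `datasource` holding `url` and `key`. The URL is an assumed Open Charge
# Map endpoint and the key is a placeholder; check the OCM docs for both.
sample_yaml <- c(
  "default:",
  "  datasource:",
  "    url: \"https://api.openchargemap.io/v3/poi/?output=json&maxresults=1000\"",
  "    key: \"YOUR-API-KEY\""
)

# Hypothetical helper: read the `datasource` section of the default
# configuration. config::get() looks for "config.yml" unless told otherwise,
# so the file name is passed explicitly.
load_datasource <- function(file = "config.yaml") {
  ds <- config::get("datasource", file = file)
  if (is.null(ds$url) || is.null(ds$key)) {
    stop("config file must define datasource$url and datasource$key")
  }
  ds
}

# Usage (writes the sample to a temporary file first):
# cfg_file <- tempfile(fileext = ".yaml")
# writeLines(sample_yaml, cfg_file)
# ds <- load_datasource(cfg_file)
# ds$url
```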

Next, let’s check out setup.r which will configure our packages:

This will retrieve and install the packages required for this project from the CRAN R package repository. Next, let’s check out ev-chargers-get.r:
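Since setup.r is not reproduced here, this is a hypothetical sketch of a setup script that installs only the packages that are missing. config, httr, and maps are named in this article. jsonlite (for parsing JSON) is an assumption, as are the helper names.

```r
# Hypothetical package list: config, httr and maps are named in the article;
# jsonlite is an assumption for the JSON parsing step.
required_packages <- c("config", "httr", "jsonlite", "maps")

# Return the subset of `pkgs` that is not installed. system.file() returns
# "" for a package it cannot find, so nothing is loaded by this check.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

# Install only what is missing, from a CRAN mirror.
install_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  todo <- missing_packages(pkgs)
  if (length(todo) > 0) install.packages(todo, repos = repos)
  invisible(todo)
}

# install_missing(required_packages)
```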

This uses the httr package to make a GET request to the Open Charge Map API based on the config data loaded from config.yaml and then retrieves the contents of the response body as text to be stored in data/dataset.json for processing at a later time. This allows us to retrieve a fresh copy of data from the API only as necessary and not each time the map is rendered.
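The project’s ev-chargers-get.r is not shown here, so the following is a hypothetical sketch of the same idea with httr. It requests the API, fails on an HTTP error, and caches the response body in data/dataset.json. It re-downloads only when the cached file is missing or older than a chosen age. Passing the key as a key query parameter is an assumption about how the request is built. Passing encoding = "UTF-8" to httr::content() avoids the "No encoding supplied" message seen when running the script. The helper names are illustrative.

```r
# Hypothetical helper: TRUE when `path` is missing or older than
# `max_age_hours`, i.e. when a fresh download is warranted.
needs_refresh <- function(path, max_age_hours = 24) {
  if (!file.exists(path)) return(TRUE)
  age <- difftime(Sys.time(), file.info(path)$mtime, units = "hours")
  as.numeric(age) > max_age_hours
}

# Hypothetical fetch step: download the API response body and store it as
# text for later processing. Sending the key as `key=` in the query string
# is an assumption.
fetch_chargers <- function(url, key, out_file = "data/dataset.json",
                           max_age_hours = 24) {
  if (!needs_refresh(out_file, max_age_hours)) return(invisible(out_file))
  resp <- httr::GET(url, query = list(key = key))
  httr::stop_for_status(resp)   # turn HTTP errors into R errors
  body <- httr::content(resp, as = "text", encoding = "UTF-8")
  dir.create(dirname(out_file), showWarnings = FALSE, recursive = TRUE)
  writeLines(body, out_file, useBytes = TRUE)
  invisible(out_file)
}
```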

Map rendering is handled by ev-chargers-render.r:

This file loads the JSON data retrieved in the previous step and draws a simple map of the US and Canada using the maps package. It then iterates over each object in the JSON payload to retrieve the address information of each charging station, whose coordinates are then plotted on the map.
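Likewise, here is a hypothetical sketch of the rendering step. It assumes jsonlite for parsing and Open Charge Map’s POI layout, in which each record has an AddressInfo object with Latitude and Longitude fields. The helper names, region names, and plot limits are illustrative choices rather than the project’s code.

```r
# Return a number, or NA when the field is absent (null in the JSON).
as_num_or_na <- function(v) if (is.null(v)) NA_real_ else as.numeric(v)

# Pull longitude/latitude out of POI records parsed with
# jsonlite::fromJSON(..., simplifyVector = FALSE). The AddressInfo,
# Latitude and Longitude field names are assumed from OCM's POI format.
extract_coords <- function(pois) {
  data.frame(
    lon = vapply(pois, function(p) as_num_or_na(p$AddressInfo$Longitude),
                 numeric(1)),
    lat = vapply(pois, function(p) as_num_or_na(p$AddressInfo$Latitude),
                 numeric(1))
  )
}

# Hypothetical render step: outline the US and Canada, then add one point
# per charging station that has usable coordinates.
render_chargers <- function(json_file = "data/dataset.json") {
  pois <- jsonlite::fromJSON(json_file, simplifyVector = FALSE)
  coords <- extract_coords(pois)
  coords <- coords[complete.cases(coords), ]
  maps::map("world", regions = c("USA", "Canada"),
            xlim = c(-170, -50), ylim = c(15, 85))
  points(coords$lon, coords$lat, pch = 20, cex = 0.4, col = "red")
  invisible(coords)
}
```

Changing the regions argument of maps::map() (and the axis limits) is one way to adapt the map to other countries.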

To run this example, navigate to the project directory, then run the following commands in the R interpreter:

> source('./setup.r')
trying URL etc ... ... ...
> source('./source/ev-chargers-get.r')
No encoding supplied: defaulting to UTF-8
> source('./source/ev-chargers-render.r')
...

This should open a new window displaying a map of Canada/US and begin plotting coordinate points on the map:

Screenshot of EV Charger map for Canada/US using data from Open Charge Map API

This example can be modified for any country or combination of countries, and may be extended to include roads, traffic patterns, and much more.

Conclusion

R is well suited to statistics, data analytics, static graphics generation, and similar tasks. In addition to numerous features for advanced data analysis, R can be extended with modules developed in other languages such as C, C++, Java, Python, and others (for example through packages such as Rcpp, rJava, and reticulate). This allows integration with a wide variety of other systems and tools.

While the syntax may be unfamiliar to many programmers (specifically those working in software engineering disciplines), the many tools provided can save a tremendous amount of work when used properly. This is especially true when compared to languages such as C/C++ which often require years of study to achieve the proficiency required to implement basic solutions such as the map plot example above, which we can do with a few lines of R.

Thanks for reading and good luck with your next data project!

~ 8_bit_hacker
