Baby Steps into Data Science 04 — Programming: Introduction to R

Editor: Ishmael Njie & Sulayman Saleem

DataRegressed Team
DataRegressed
5 min read · Jul 3, 2018


This chapter is about getting started with R. You will learn the basics and functions to get your baby feet into a running stride!

Firstly, you need a platform to run R on. You can download R itself from here: http://cran.r-project.org/

Or you can download RStudio, which is much more user-friendly and adds extras such as debugging and visualization tools (RStudio runs on top of R, so you will need R installed as well):

https://www.rstudio.com/products/rstudio/

Both are free to use, but I recommend RStudio!

Let’s get started

Once you have everything installed and running, let's enter our first command! We'll start by playing with the command line (the console). R uses functions to perform operations. To run a function, for example one called funcname, simply type:

funcname(input1, input2)

Let's start with a simple function, c(), which combines its inputs into a vector.

For example, c(1,2,3) creates a vector containing the elements 1, 2 and 3.

We can also assign the result to an object, say X, using the assignment arrow <-. This is useful if we want to use the same vector again later. You can do this by typing:

X <- c(1, 2, 3)
Y <- c(2, 3, 4)

Here we have two vectors, X and Y (R is case-sensitive, so X and x would be two different objects). From here you can do many things. For example, let's add them. In the command line, type:

X+Y

What did you get?
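Once you've tried it, here is a small sketch of a few other operations on the same two vectors. Arithmetic between vectors of the same length works element by element, and a single number is "recycled" across every element.

```r
X <- c(1, 2, 3)
Y <- c(2, 3, 4)

# Arithmetic between vectors of the same length is done element by element
diff_xy <- Y - X   # 1 1 1
prod_xy <- X * Y   # 2 6 12

# A single number is recycled and applied to every element
doubled <- X * 2   # 2 4 6

print(diff_xy)
print(prod_xy)
print(doubled)
```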

Listing and Removing

Say we want to see which objects we have created so far. We can do this by typing ls() in the command line.

If you started from a fresh session, the output should be “X” “Y”.

We can also remove any object by using the rm() function. Simply type rm(itemToBeRemoved). For example, the following line of code will remove the object Y:

rm(Y)

Now use ls() to see if it has been removed.

Later, try rm(list=ls()), which removes every object at once. This saves time, since you don't have to type out each object's name, but use it with care: it clears your whole workspace.
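Here is a small sketch of the whole list-and-remove workflow. It wraps the commands in local(), which gives them their own temporary workspace, so running rm(list = ls()) here cannot delete anything in your real session. At the console you would simply type the commands directly.

```r
# local() runs the code in its own temporary workspace, so rm(list = ls())
# below only clears that private workspace, not your real session.
leftover <- local({
  X <- c(1, 2, 3)
  Y <- c(2, 3, 4)
  print(ls())        # [1] "X" "Y"

  rm(Y)              # remove a single object
  print(ls())        # [1] "X"

  rm(list = ls())    # remove every object in this workspace at once
  ls()               # the value local() hands back: character(0)
})

print(length(leftover))  # [1] 0
```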

Creating a matrix

Okay, now let's create a matrix. You can learn more about the matrix() function by typing the following into the command line:

?matrix

The matrix() function takes three main arguments: the data, the number of rows (nrow) and the number of columns (ncol). For example:

m <- matrix(data=c(1,2,3,4), nrow=2, ncol=2)

This creates a 2x2 matrix containing the elements 1, 2, 3 and 4.

Note: we could omit the argument names data, nrow and ncol:

m <- matrix(c(1,2,3,4),2,2)

However, it is often useful to name the arguments explicitly. Unnamed arguments are matched by position, so R assumes they are given in the same order as in the function's help page.

We can also fill the matrix row by row, which is often more intuitive, by adding byrow=TRUE:

m <- matrix(data=c(1,2,3,4), nrow=2, ncol=2, byrow=TRUE)

Typing m will then print:

     [,1] [,2]
[1,]    1    2
[2,]    3    4
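To see what byrow actually changes, this sketch builds the same data both ways. By default, matrix() fills column by column, so the two results are transposes of each other (t() swaps rows and columns).

```r
# Default fill: column by column
m_col <- matrix(data = c(1, 2, 3, 4), nrow = 2, ncol = 2)
# byrow = TRUE: row by row
m_row <- matrix(data = c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = TRUE)

print(m_col)
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4

# Positional arguments give exactly the same matrix as named ones
print(identical(m_col, matrix(c(1, 2, 3, 4), 2, 2)))  # [1] TRUE

# Filling by row gives the transpose of filling by column
print(identical(t(m_col), m_row))                     # [1] TRUE
```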

Let's try a matrix multiplication, using R's %*% operator:

x <- matrix(c(1,2,3,4),2,2)
y <- matrix(c(0,3,2,1),2,2)
x %*% y

This will result in:

     [,1] [,2]
[1,]    9    5
[2,]   12    8

Exercise: Try this out by hand and see if you get the same result!
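To check your hand calculation, here is a short sketch that recomputes one entry of the product and contrasts %*% with the plain * operator, which multiplies matching elements instead of doing matrix multiplication.

```r
x <- matrix(c(1, 2, 3, 4), 2, 2)
y <- matrix(c(0, 3, 2, 1), 2, 2)

# %*%: entry [i, j] is row i of x times column j of y, summed
xy <- x %*% y
print(xy)

# Check entry [1, 1] by hand: row 1 of x is (1, 3), column 1 of y is (0, 3)
print(sum(x[1, ] * y[, 1]))   # [1] 9

# * on its own multiplies element by element, a different operation
print(x * y)
#      [,1] [,2]
# [1,]    0    6
# [2,]    6    4
```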

Indexing

We often want to examine part of a dataset. Suppose our data is stored in the matrix A:

A <- matrix(1:16, 4, 4)

Say we want the element in the second row and third column. We can get it by typing:

A[2,3]

This should return the number 10.
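Why 10? Because matrix() fills A column by column, column 3 holds the numbers 9 to 12, and the second of those is 10. A quick sketch to confirm:

```r
A <- matrix(1:16, 4, 4)

# Column 3 holds 9, 10, 11, 12 because A is filled column by column
print(A[, 3])                 # [1]  9 10 11 12
print(A[2, 3])                # [1] 10

# Counting down the columns: two full columns of 4, then 2 more
print((3 - 1) * nrow(A) + 2)  # [1] 10
```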

You can also select several elements at once, creating a new matrix from this one. Try these commands and see what you get:

A[c(1,3),c(2,4)]
A[1:3,2:4]
A[1:2,]
A[,1:2]

In the last two examples we left the column index and then the row index empty. What did this do?

A negative sign (-) in the index tells R to keep all rows or columns except those listed in the index.

Example 1:

A[-c(1,3),]

This will result in:

     [,1] [,2] [,3] [,4]
[1,]    2    6   10   14
[2,]    4    8   12   16

Example 2:

A[-c(1,3),-c(1,3,4)]

This will result in:

6 8
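Notice that Example 2 printed a plain vector rather than a matrix: when only a single row or column survives, R drops the matrix shape. The sketch below confirms this and shows the drop = FALSE argument, which keeps the result as a matrix.

```r
A <- matrix(1:16, 4, 4)

# Drop rows 1 and 3, keep every column
print(A[-c(1, 3), ])

# Drop rows 1 and 3 and columns 1, 3 and 4: only column 2 of rows 2 and 4 is left
v <- A[-c(1, 3), -c(1, 3, 4)]
print(v)        # [1] 6 8
print(dim(v))   # NULL: the result is a plain vector, not a matrix

# drop = FALSE keeps the result as a 2 x 1 matrix
m <- A[-c(1, 3), -c(1, 3, 4), drop = FALSE]
print(dim(m))   # [1] 2 1
```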

The dim() function will tell us the dimensions of a matrix.

For example:

dim(A)

The output will be:

4 4

This indicates that A is a 4 by 4 matrix.
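Here is a sketch that combines the indexing commands from this section with dim() (and its companions nrow() and ncol()) to check the shape of each result. Spoiler: it also answers the question above about empty indices.

```r
A <- matrix(1:16, 4, 4)

# Rows 1 and 3, columns 2 and 4
sub1 <- A[c(1, 3), c(2, 4)]
print(sub1)
#      [,1] [,2]
# [1,]    5   13
# [2,]    7   15

print(dim(A[1:3, 2:4]))  # [1] 3 3
print(dim(A[1:2, ]))     # [1] 2 4  (empty column index = keep all columns)
print(dim(A[, 1:2]))     # [1] 4 2  (empty row index = keep all rows)

# nrow() and ncol() return the two parts of dim() separately
print(nrow(A))           # [1] 4
print(ncol(A))           # [1] 4
```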

Loading Data

Getting and loading data is a key skill for any analyst or data scientist. In R, this is a simple job for the read functions, such as read.csv() and read.table(). However, R looks for files in its working directory (folder), so we first need to make sure it points at the folder that holds our data. To find out your current working directory, use the getwd() function.

You can change the current directory by using the setwd() function.

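For example, here I change into a folder called DataRegressed in my Documents folder on the C drive. Note the forward slashes: inside an R string a backslash starts an escape sequence, so Windows paths need either / or \\.

setwd("C:/Users/sulayman/Documents/DataRegressed")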
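A minimal sketch of switching directories and back again. It uses tempdir() (a temporary folder R creates for each session) instead of a real path, so it runs on any machine; swap in your own folder as needed.

```r
# setwd() changes the working directory and invisibly returns the previous one,
# which makes it easy to switch back afterwards.
start_dir <- getwd()
old_dir <- setwd(tempdir())   # tempdir(): a temporary folder R creates per session
print(getwd())                # now points at the temporary folder

setwd(old_dir)                # go back to where we started
print(identical(getwd(), start_dir))  # [1] TRUE
```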

Now, let's go find some data to import!

Let's use the Auto data set. Go to the "An Introduction to Statistical Learning" webpage, download the Auto.csv file and save it in your current working directory.

Now run the command:

Auto <- read.csv("Auto.csv")

There you go! Your first set of real data!
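If you don't have Auto.csv to hand yet, here is a self-contained sketch that writes a tiny, made-up CSV file (hypothetical car data, not the real Auto set) to a temporary file and reads it back. Note that read.csv() assumes by default that the first row contains the column names (header = TRUE).

```r
# Write a tiny, hypothetical CSV file to a temporary location
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,mpg,cylinders",
             "car_a,18,8",
             "car_b,30,4"), csv_path)

# read.csv() uses the first line as column names by default
cars_small <- read.csv(csv_path)
print(cars_small)
print(dim(cars_small))    # [1] 2 3  (2 rows, 3 columns)
print(names(cars_small))  # [1] "name"      "mpg"       "cylinders"
```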

Of course, not all data comes as a neat and tidy CSV file. Sometimes it is stored in plain text files, and to import those you'll need the read.table() function instead, e.g. read.table("example.txt").

By default, read.table() treats the first row of a text file as data, not as column names. To tell R that the first row holds the column names, and that missing values are marked with "?", use:

Auto <- read.table("Auto.txt", header = TRUE, na.strings = "?")

You can then follow up by calling the fix() function on our “Auto” object.

fix(Auto)

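The header = TRUE argument tells read.table() to use the first row as the column names, and na.strings = "?" tells it to treat every "?" entry as a missing value (NA). The fix() function opens the data in a spreadsheet-style editor so we can check that everything has been read in correctly. Be careful: any changes you make there are saved back to Auto, whereas View(Auto) gives a read-only view.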
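The sketch below, again on a small made-up file rather than the real Auto.txt, shows what header = TRUE and na.strings = "?" each change.

```r
# Write a tiny, hypothetical whitespace-separated text file
txt_path <- tempfile(fileext = ".txt")
writeLines(c("mpg horsepower",
             "18 130",
             "25 ?",
             "30 95"), txt_path)

# Without header = TRUE, the first line is read as an ordinary data row
no_header <- read.table(txt_path)
print(dim(no_header))       # [1] 4 2

# With header = TRUE the first line becomes the column names,
# and na.strings = "?" turns the "?" into a missing value (NA)
cars_txt <- read.table(txt_path, header = TRUE, na.strings = "?")
print(dim(cars_txt))        # [1] 3 2
print(cars_txt$horsepower)  # [1] 130  NA  95
```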

Exercise: What are the dimensions of the Auto dataset, i.e. how many rows and columns does it have? (Hint: try dim().)

Script writing

Instead of typing everything into the command line, you may find it more convenient to write a script of R commands and then run them all at once. This also lets you save your work.

In RStudio, click File > New File > R Script to create an empty R script. From here, let's write and run our first script. I will kill two birds with one stone and also teach you the "if" statement: the code inside the curly braces runs only if the condition in the parentheses is TRUE. For example:

i <- 2
if (i > 0) {
  print(i)
}

This means: if the object i is greater than 0, print i.
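An if statement can be extended with else, whose code runs when the condition is FALSE. A small sketch:

```r
i <- -1
if (i > 0) {
  result <- "i is positive"
} else {
  # runs only when the condition above is FALSE
  result <- "i is zero or negative"
}
print(result)  # [1] "i is zero or negative"
```

Keep else on the same line as the closing brace }. If else starts a new line at the top level, R treats the if statement as already finished and reports an "unexpected 'else'" error.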

Once you've saved this script as a .r file in your working directory, run it from the command line by typing:

source("filename.r")
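To try source() without creating a file by hand, this sketch writes the if example above into a temporary file (standing in for your own filename.r) and runs it.

```r
# Write the four-line script to a temporary .r file
script_path <- tempfile(fileext = ".r")
writeLines(c("i <- 2",
             "if (i > 0) {",
             "  print(i)",
             "}"), script_path)

# source() runs every line of the file, as if you had typed it in the console
source(script_path)   # prints [1] 2
```

By default source() runs the script in your main workspace, so afterwards the object i is available in the console just as if you had typed the lines yourself.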

What is to come…

Thanks for reading our Baby Steps R tutorial! In the future, we will be releasing guides on writing your own scripts for more interesting things, like algorithms!
