R for Hockey Analysis — Part 1: Installation and First Steps

Sep 9, 2018

2018 1st overall pick Rasmus Dahlin is reportedly a huge proponent of RStudio

This is the first part of a series where I’ll show you some ways in which you can use R to analyze hockey data.

You can click here for Part 2, here for Part 3, and here for Part 4.

My goal for this series is, ultimately, to demonstrate all of the tools that I use for my hockey analysis. To get there, however, I believe it’s important to have a base upon which we can build that knowledge. For that reason, this tutorial focuses on installing R and RStudio on your computer and on the basics of R.

Installation

This link provides a fairly easy tutorial on installing R and RStudio on your personal computer. All you need is to install R and RStudio — don’t worry about installing the tidyverse or anything else. If you’ve already installed R and RStudio on your computer, then feel free to skip this section.

Wait, what the h*ck is R?

I’m glad you asked! R is a statistical programming language. People use it for all sorts of things, from building statistical models to designing web-scrapers for accessing data from websites to creating art and animations. There’s a lot you can do in R! R is also a program — as you can see from your installation of it.

RStudio is an environment in which you use R. RStudio is — by far — the easiest and most beautiful way to code in the R language, as RStudio provides many features that the base R program does not provide, making your learning experience far easier than it could be. Bear in mind that you need to install R to be able to use RStudio.

From here on out, you will only be using RStudio. If I ever refer to “R” alone, I’m either referring to RStudio or the language of “R” itself.

First Steps

Go ahead, and open up RStudio. The icon for RStudio should look like the logo on the puck that Rasmus Dahlin is holding in the picture above.

After opening RStudio, you should see something with four quadrants, likely in white (unlike the picture below). That’s RStudio!

Each of the four quadrants serves a purpose. Though people have different names for each of these quadrants, I’ll refer to them as the source, console, environment, and viewer.

Source

The source is where you typically write multiple lines of code so that you can save that code and re-run it later.

Console

The console is similar to the source in that you can run code there. However, you can’t save your code from the console. The console is better for running “throw-away” lines of code, whereas the source is the safer option for writing [and saving] code.

Environment

The environment is where the attributes of objects that you created (datasets, functions, etc.) can be viewed.

Viewer

The viewer is where you can view function and package documentation, plots, and files.

If you found any of that confusing, don’t worry. It should all make sense after some time using RStudio.

One More Thing…

While we’re here, it’s best that we take care of the working directory now. The working directory is the folder on your computer where R looks for files that you want to load and where it creates files that you export from R. More information can be found here.

Following the directions below, you can change your working directory to any folder on your computer. First, go to the top of RStudio and click “Session”. Then click “Set Working Directory”, and then “Choose Directory”. From there, you can navigate to whatever folder you’d like to make your working directory.

To change your working directory go to the toolbar on top and click “Session”, then “Set Working Directory”, and then “Choose Directory”

My working directory is my “Documents” folder. If there’s a particular folder on your computer in which you intend to save datasets or other R-related files, I’d recommend that you make that folder your working directory.

Once you set your working directory, you can test that you did it correctly by typing and entering getwd() in your RStudio console. That will print the location of the folder of your working directory, so you can check if the location is where you want it to be.

> getwd()
[1] "C:/Users/eoppe/Documents"
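As a rough sketch, the same change can also be made from code with setwd(), which is part of base R just like getwd(). The example below uses tempdir() only as a stand-in folder so that it runs on any computer, and the path in the comment is hypothetical. You would swap in your own folder.

```r
# Remember the current working directory so it can be restored afterwards.
old_wd <- getwd()

# tempdir() is only a stand-in here. With your own folder, this would look like
# setwd("C:/Users/yourname/Documents")   # hypothetical path
setwd(tempdir())

# Confirm that the working directory changed.
new_wd <- getwd()
print(new_wd)

# Switch back to the original working directory.
setwd(old_wd)
```

The menu route and setwd() do the same job, so use whichever you find more comfortable.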

Once that’s all set up, take a nice long breath, as you just completed all the hard steps of getting R up and running! Congrats! You should be proud of yourself.

And now onto the fun…

Let’s Have Fun!

From here on out, we’ll see how writing code in R works. For any code examples I have here, I highly suggest that you re-type them yourself in RStudio rather than copying and pasting from this article. That’ll help you learn how typing R code works.

Basics

R is a working calculator. That means that addition (+), subtraction (-), multiplication (*), and division (/) all work as you’d expect.

Here are some problems you can try to get the hang of it. Try typing each of them in your console and hitting “enter”:

> 3+5
[1] 8
> 4-6
[1] -2
> 4*7
[1] 28
> 13/2
[1] 6.5
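One thing worth trying next is combining operations. This small sketch (the numbers are arbitrary) shows that R follows the usual order of operations, and that parentheses change that order:

```r
# Multiplication happens before addition.
without_parentheses <- 3 + 5 * 2
print(without_parentheses)   # 13

# Parentheses force the addition to happen first.
with_parentheses <- (3 + 5) * 2
print(with_parentheses)      # 16
```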

You can also print any statements that you’d like by surrounding the statement — known as a string — in quotes.

Try these out:

> "I love hockey... I think"
[1] "I love hockey... I think"
> print("I love hockey... I think")
[1] "I love hockey... I think"
> print("We tend to overrate the potential of CHL prospects")
[1] "We tend to overrate the potential of CHL prospects"

And what if we wanted to save all of these numbers and strings that we’ve printed out? Maybe we want to view them later without having to re-write all of that code.

Well, you can give names to these numbers and strings. In R, we use something called the assignment operator (<-) to name these objects. It also works to use the equals sign (=) for this purpose, but I recommend sticking with the assignment operator for naming objects.

Here’s how it works:

> my_message <- "I'm learning to analyze hockey data in R!"
>

Awesome! Wait… where’d that message go?

When you use the assignment operator (<-) to name an object, you need to call that object to see it. For now, that message was saved as my_message, but you won’t see it again until you ask R to see my_message.

To see my_message, try this:

> my_message
[1] "I'm learning to analyze hockey data in R!"

Whoa! And get this — remember the four quadrants I was telling you about earlier? Do you remember the “environment” quadrant in the top right? Check it now. What do you see?

Here’s what I see:

A demonstration of the purpose of the environment

Folks, that is precisely the purpose of the environment. It’ll remind you what names you’ve given your objects so that you can call those objects again later. Useful stuff!

You can also give names to numbers and, well, nearly anything in R, and those names will all show up in the environment.
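Here is a minimal sketch of naming numbers. The object names goals_scored, assists_made, and total_points are made up for illustration. The base R function ls() prints the names currently stored in your environment, which should match what the environment quadrant shows.

```r
# Give names to two numbers, then reuse them in a calculation.
goals_scored <- 16
assists_made <- 37
total_points <- goals_scored + assists_made

# Call the object to see its value.
total_points   # 53

# ls() lists the names of the objects in the environment.
ls()
```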

To recap what we’ve learned so far:

Installing R and RStudio
The four quadrants (source, console, environment, and viewer)
Setting the working directory (and checking it with getwd())
Basic calculations (+, -, *, and /)
Strings and print()
Using the assignment operator (<-) to name objects

Wow, that’s a lot of information, and if all of this has worked — awesome! If not… you may want to re-check your steps, and try again. Anyways, if you’ve made it this far, then you’ve already learned a fair amount about R! Feel free to take a break if that’ll help you.

Vectors (No Linear Algebra, I promise)

For these next sections, let’s try writing all of the code in the source (the top left quadrant in RStudio). Running code in the source is slightly different from running code in the console — I’ll explain when we get there.

In R, a vector may as well be defined as a single column in a “spreadsheet” of data. For example, below is a vector of my favorite Rangers.

To create a vector in R, we can use the function c(). This function technically stands for “concatenate”, but, for our purposes, it stands for “combine to make a vector”.

Let’s create a vector in the source (the top left quadrant). This vector will be the number of goals — 16, 26, and 25 — for the 2017–18 Rangers’ top point-getters. And while we’re at it, let’s name it goals using the assignment operator.

To do this, we write in the source:

goals <- c(16, 26, 25)

Now, as I said earlier, executing code in the source is a bit different from executing code in the console.

To run code from the source:

Click on (or highlight) the line of code that you’re interested in running
Click “Run”
This code should then show up in your console

I posted some pictures below to illustrate this.

First, click on the line of code. Then, click “Run”. The code should show up in the console.

To check that we did everything right, we can type goals in the source, and run that. Checking the console, we can see:

> goals
[1] 16 26 25

Great! And while we’re at it, let’s create a vector named assists for the number of assists — 37, 20, and 19 — for these same players. Try it in the source again, while leaving alone the code you just wrote for goals.

> assists <- c(37, 20, 19)
> assists
[1] 37 20 19

And I almost forgot! Remember the environment? That’s where object names are stored. Check it out, and see what’s changed:

Another demonstration of the environment

We now have values for assists, goals, and my_message.

Let’s try making 2 more vectors, okay?

We don’t have the names for these three players, so we may as well create a vector called names in the source. The three players are Mats Zuccarello, Mika Zibanejad, and Kevin Hayes.

> names <- c("Mats Zuccarello", "Mika Zibanejad", "Kevin Hayes")
> names
[1] "Mats Zuccarello" "Mika Zibanejad"  "Kevin Hayes"

See how there are quotes around each player’s name? For R to recognize these names as strings, we need to put quotes around them (try it without quotes, and you’ll get an error).
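If you’re curious how R tells numbers and strings apart, the base R function class() reports the type of an object. This is just an illustrative sketch that re-creates the two vectors so it runs on its own:

```r
goals <- c(16, 26, 25)
names <- c("Mats Zuccarello", "Mika Zibanejad", "Kevin Hayes")

# Numbers typed without quotes are stored as numeric values.
class(goals)   # "numeric"

# Anything wrapped in quotes is stored as a string (R calls this "character").
class(names)   # "character"

# Even a number becomes a string once it is wrapped in quotes.
class("16")    # "character"
```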

Since we have a vector for goals and a vector for assists, we can also have a vector for points! Points are the sum of a player’s goals and assists, so, because we have the same number of values for goals and assists, we can use R’s functionality as a calculator to create this vector.

Try running these commands in the source:

points <- goals + assists
points

If everything worked correctly, the output should list the players’ points:

> points <- goals + assists
> points
[1] 53 46 44
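The addition above works position by position: the first goals value is added to the first assists value, and so on. As a quick sketch of the same idea, subtraction works the same way, and the base R function sum() adds up every value in a vector:

```r
goals   <- c(16, 26, 25)
assists <- c(37, 20, 19)
points  <- goals + assists

# Subtracting goals from points recovers the assists, position by position.
points - goals   # 37 20 19

# sum() adds together all of the values in a vector.
sum(points)      # 143
```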

Easy! So I guess we’re all fini…

The Data Frame

…shed. Oh, sorry. I forgot about the data frame (I actually didn’t, I was lying).

Remember how I said a vector is like a single column in a spreadsheet? A data frame is the spreadsheet. Basically, a data frame is a collection of vectors. Bear in mind that, to be able to bind a collection of vectors into a data frame, the vectors must be of equal length. So, because the vectors that we created each contain 3 values, we’re good to go.
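If you’d like to double-check the lengths before combining vectors, one simple way (a sketch, not the only approach) is the base R function length(). The helper name same_length is made up for this example.

```r
goals   <- c(16, 26, 25)
assists <- c(37, 20, 19)
names   <- c("Mats Zuccarello", "Mika Zibanejad", "Kevin Hayes")

# length() reports how many values a vector holds.
length(goals)     # 3
length(assists)   # 3
length(names)     # 3

# TRUE only when all three vectors have the same length.
same_length <- length(goals) == length(assists) && length(assists) == length(names)
same_length       # TRUE
```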

The function in R to create a data frame is data.frame(). Supplying the names of the vectors we created, we can create a data frame named player_stats.

Try running this in your source:

player_stats <- data.frame(names, goals, assists, points)
player_stats

What does your output look like? For me, it’s:

> player_stats <- data.frame(names, goals, assists, points)
> player_stats
            names goals assists points
1 Mats Zuccarello    16      37     53
2  Mika Zibanejad    26      20     46
3     Kevin Hayes    25      19     44
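To see that a data frame really is a collection of vectors, here is a small sketch using a few base R functions that this tutorial hasn’t covered yet: nrow() and ncol() count rows and columns, and the $ operator pulls a single column back out as a vector.

```r
names   <- c("Mats Zuccarello", "Mika Zibanejad", "Kevin Hayes")
goals   <- c(16, 26, 25)
assists <- c(37, 20, 19)
points  <- goals + assists

player_stats <- data.frame(names, goals, assists, points)

# One row per player, one column per vector.
nrow(player_stats)    # 3
ncol(player_stats)    # 4

# The $ operator extracts one column as a vector.
player_stats$points   # 53 46 44
```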

Whoa. That looks like legitimate data. It is legitimate data — that’s what’s wild about all of this.

While this data frame only has 3 rows, we’ll often have data frames with hundreds or thousands of rows. That makes the data difficult to read when it’s printed in the console. Fortunately, R has a function called View() that allows us to scroll through the data more easily (note: View() has a capital “V”, not a lower case “v”).

Try this:

View(player_stats)

A window should suddenly pop up with the data frame, like this:

Wild. That’s absolutely bonkers. View() will serve as a useful function for when your data frames contain many, many rows of data.
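If you’d rather stay in the console, the base R function head() is another option for peeking at a large data frame. The sketch below uses a made-up data frame called big_example, with invented values, purely for illustration.

```r
# A made-up data frame with 1,000 rows (not real hockey data).
big_example <- data.frame(id = 1:1000, value = (1:1000) * 2)

# head() prints only the first 6 rows by default.
head(big_example)

# A second argument changes how many rows are shown.
head(big_example, 3)

# nrow() confirms how many rows there are in total.
nrow(big_example)   # 1000
```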

Last thing. Since I asked that you write all of the vector and data frame code in your source, you should have an unsaved document titled Untitled1* . Let’s save it, so you can view it again.

To save, you can click the “save” icon. I posted an image below of what it should look like.

You should be transported to whatever folder you set as your working directory. There, you can save this as any file name you’d like. I’ll save it as “R Hockey Analysis Part 1”. It’ll then show up in my working directory folder as “R Hockey Analysis Part 1.R”.

Once you’re done playing around in R, I suggest closing your RStudio window. When you do that, you’ll see a pop-up window asking “Save workspace image to ~/.RData?”. If you click “Save”, R will create an “.RData” file, which you can later open to have all of the same vectors and data frames in your environment. However, I typically suggest that you don’t do this, as this is more of a “last resort” technique to save your work. We already saved our R script from the source, so we don’t need to do this. You can click “Don’t Save”.

And if all that worked…you made it! Congratulations!

Final Thoughts

Let’s take a quick trip down memory lane to recap all that we’ve learned.

Installing R and RStudio
The four quadrants (source, console, environment, and viewer)
Setting the working directory (and checking it with getwd())
Basic calculations (+, -, *, and /)
Strings and print()
Using the assignment operator (<-) to name objects
Creating vectors with c()
Running code from the source
Creating data frames with data.frame()
Viewing data with View()
Saving R scripts (code from the source)

I’m being serious here — if you, at least to some extent, understand what we’ve worked through here, you’re already pretty dangerous with R. You can do quite a bit, and with a bit more experience and knowledge, you’ll be conducting your own wacky analyses in no time.

Anyways, that’s it! I have a lot more in store for you kind folks — namely, an introduction to the tidyverse using NHL draft data coming soon. I’ll be posting all of my tutorials on Twitter (@OppenheimerEvan). If you have any questions, feel free to contact me there, or e-mail me at eoppe1022@gmail.com.