R Starter Kit

Where to begin when you don’t know where to begin, with an emphasis on People Analytics

Rob Stilson
8 min read · Aug 22, 2023
Photo by Brett Jordan on Unsplash

Overview

The way I’ve seen most people succeed at learning R is to get the foundation of the programming language down (see Step 2) and then directly apply it to something they care about or that needs to get done anyway. When I first started out, I would begin a project in R and get as far as I could, making a note of wherever I got stuck so that I could come back later and figure it out. I would then finish the project in SPSS, Excel, etc. in order to meet the deadline. I would visit R-Bloggers every day to get a feel for the “art of the possible” and then see what I could take from there and apply to People Analytics.

Now that ChatGPT and other conversational chatbots have come online, you have another helper, and I encourage you to use one of them as your “copilot” on your coding journey.

Step 0 — GUIs for R

If you don’t feel quite ready for a scripted programming language, then Jamovi may be the place to start. This is a GUI for R (similar to SPSS) as you can see in the user manual.

Step 0.5 — Transitioning from Excel

If you are comfortable with Excel, this article provides a nice transition into R.

How to transition from Excel to R

Step 1 — Get R and RStudio!

First, make sure you have both R and RStudio. The download link to the RStudio Integrated Development Environment (IDE) is here.

Step 2 — Introduction video tutorial

Once you’ve installed R and RStudio, work through the following YouTube tutorial from freeCodeCamp.org, presented by Barton Poulson, which I’ve found very helpful. It covers the following areas:

  • Installing R
  • RStudio
  • Packages
  • plot()
  • Bar Charts
  • Histograms
  • Scatterplots
  • Overlaying Plots
  • summary()
  • describe()
  • Selecting Cases
  • Data Formats
  • Factors
  • Entering Data
  • Importing Data
  • Hierarchical Clustering
  • Principal Components
  • Regression
  • Next Steps

If you enjoy it as well, make sure to like, comment, and subscribe!

Once you are feeling more comfortable with R, Posit (formerly RStudio) maintains a great repository of cheatsheets for various R packages here.

If you learn better from videos and a structured class:

Step 3a — Data Science and Machine Learning Bootcamp with R

Udemy: Data Science and Machine Learning Bootcamp with R, by Jose Portilla. It is usually on sale for less than $15. Anything by Jose Portilla is great! You could also start with Introduction to R on Udemy, which is free.

If you learn better from following a book and like to jump around:

Step 3b — Hands-on Programming with R

Hands-on Programming with R is a helpful free book.

Introduction to R, by Andrew Ellis and Boris Mayer, takes you from installation of R all the way to CFA, SEM, and HLM.

If you learn better from slides and like the structure of Beginner, Intermediate, Advanced:

Step 3c — Brad Boehmke’s workshops

Introduction to R

Intermediate R

Advanced R

Brad Boehmke does a phenomenal job of walking you through the process of getting started with R all the way to more advanced techniques with his use of entertaining and interactive slides along with providing you with the scripts and the files.

Step 3d — Richard Landers’ Data Science for Social Scientists course

Data Science for Social Scientists Schedule and Materials

I love that Dr. Richard Landers covers string manipulation and Natural Language Processing (NLP), including regular expressions (regex) via the RegexOne course. This is valuable to social scientists because we often encounter qualitative data, and NLP and regex let us pull out the necessary information in the minimum amount of time.
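To give a feel for why regex pays off on qualitative data, here is a minimal base R sketch that pulls email addresses and years out of free-text comments. The `comments` vector and both patterns are made up for illustration and are not taken from Dr. Landers’ course; the email pattern is deliberately simple rather than fully standards-compliant.

```r
# Hypothetical free-text survey comments (illustrative data only)
comments <- c(
  "Contact me at jane.doe@example.com about the 2021 survey.",
  "No contact details here, but I joined in 2019.",
  "Reach out: sam_lee@example.org"
)

# Deliberately simple patterns: fine for a demo, not for production use
email_pattern <- "[[:alnum:]._-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}"
year_pattern  <- "\\b(19|20)[0-9]{2}\\b"

# gregexpr() locates every match per comment; regmatches() extracts them as a list
emails <- regmatches(comments, gregexpr(email_pattern, comments))
years  <- regmatches(comments, gregexpr(year_pattern, comments, perl = TRUE))

# grepl() flags which comments contain an email address at all
has_email <- grepl(email_pattern, comments)

print(unlist(emails))
print(unlist(years))
print(has_email)
```

The same patterns can be reused with `gsub()` to redact the matches instead of extracting them, which is handy when comments need to be anonymized before analysis.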

If you are already feeling comfortable with R, skip to Step 4, then jump to the DPLYR section:

Step 4 — Data Wrangling

Data Wrangling with dplyr (4 parts) is a comprehensive tutorial on using dplyr from the tidyverse to wrangle your data.

Suzan Baert walks you through the major aspects of the tidyverse to help you bend data to your will.
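For a taste of what that wrangling looks like, here is a minimal sketch of a typical dplyr pipeline. It is not taken from Suzan Baert’s tutorial: the `employees` data frame and its column names are hypothetical, and the example assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical employee data (made up for illustration)
employees <- data.frame(
  employee_id = 1:6,
  department  = c("Sales", "Sales", "HR", "HR", "IT", "IT"),
  tenure_yrs  = c(1.5, 4.0, 2.0, 7.5, 0.5, 3.0),
  engagement  = c(3.8, 4.2, 3.1, 4.5, 2.9, 3.6)
)

dept_summary <- employees %>%
  filter(tenure_yrs >= 1) %>%                # keep employees with at least 1 year
  mutate(long_tenure = tenure_yrs >= 3) %>%  # add a derived TRUE/FALSE column
  group_by(department) %>%                   # one group per department
  summarise(
    headcount       = n(),                   # rows per group
    mean_engagement = mean(engagement),
    share_long      = mean(long_tenure),     # proportion with 3+ years
    .groups = "drop"
  ) %>%
  arrange(desc(mean_engagement))             # highest engagement first

print(dept_summary)
```

Each verb takes a data frame and returns a data frame, which is what makes the steps easy to chain and to read from top to bottom.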

Step 4a — Dates and Times!

Dates and times can be a stumbling point when working with data. lubridate from the tidyverse makes transforming data to the correct format much easier.

Lubridate

Date & Time Data with lubridate
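As a small illustration of the kind of cleanup lubridate handles, the sketch below parses hire dates stored as text in two different formats and computes tenure as of a fixed date. The dates and variable names are hypothetical, and the example assumes the lubridate package is installed.

```r
library(lubridate)

# Hypothetical hire dates stored as text in two different formats
hire_iso <- c("2019-03-15", "2021-11-01")  # year-month-day
hire_us  <- c("07/04/2020", "12/31/2018")  # month/day/year

# ymd() and mdy() parse text into Date objects based on the component order
hire_dates <- c(ymd(hire_iso), mdy(hire_us))

# Fixed reference date so the result is reproducible
as_of <- ymd("2023-08-22")

# Tenure in years: build an interval, then measure its length
tenure_years <- time_length(interval(hire_dates, as_of), unit = "years")

hires <- data.frame(
  hire_date    = hire_dates,
  hire_year    = year(hire_dates),                 # extract the year component
  hire_month   = month(hire_dates, label = TRUE),  # month as a labelled factor
  tenure_years = round(tenure_years, 1)
)
print(hires)
```

In real work you would likely swap the fixed `as_of` date for `today()`, at the cost of getting a different answer each day you rerun the script.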

Step 5 — R Software Handbook

R Software Handbook is free and includes the following:

  1. R Basics
  2. Data Preparation and Cleaning
  3. Data Manipulation
  4. Stats
  5. Data viz in base R
  6. Data viz with ggplot2

Step 6 — The Big Book of R

The Big Book of R from Oscar Baruffa has everything you need (300+ books and counting) and gets updated. I’ve listed some stops below in case you don’t know where to get started:

  1. 2 New to R? Start here
  2. 22.20 The Tidyverse Cookbook
  3. 12.1 A ggplot2 Tutorial for Beautiful Plotting in R
  4. 14.10 Handbook of Regression Modeling in People Analytics
  5. 14.9 Handbook of Graphs and Networks in People Analytics with Examples in R and Python
  6. 14.15 R for Excel users
  7. 17.4 Text Mining with R
  8. 20.14 Tidy Modeling with R
  9. 20.5 Hands on Machine Learning with R

Step 7 — If you have worked through the above steps, you will now know where you need to go to continue your R journey. I’ve made some suggestions below if you are a social scientist like myself.

Data Science for Psychologists (another free book)

R for HR

Computing for Information Science (this has a little bit of everything)

7 HR Data Sets for People Analytics

If you don’t want a video, start here with Coding Club and work through the slides of Introduction to R.

Then go to Jumpstart with R created by the amazing Matt Dancho. This is a paid class, but very worth it in my opinion.

To see how a pro uses R, here are David Robinson’s TidyTuesday walkthroughs on YouTube where he wrangles and visualizes a different dataset every week using R.

Julia Silge, a data scientist and software engineer at Posit PBC, also does an awesome job with the TidyTuesday data and often incorporates machine learning as well.

Highlighted Blogs from Julia Silge:

There is also this course out of the UC Business Analytics program, which uses DataCamp.

Step 8 — Object Oriented Programming (OOP)

Object-oriented programming (OOP) is a way of organizing and designing computer programs to make them more manageable and easier to understand. It’s like thinking about the world around us, where everything is an object with certain characteristics and behaviors.

Object Oriented Programming (OOP) in R | Create R Objects & Classes

Advanced R — OOP Introduction
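To make “characteristics and behaviors” concrete, here is a minimal sketch using S3, the simplest of R’s OOP systems (the linked resources also cover others). The `employee` class, the `new_employee()` constructor, and the `give_raise()` generic are hypothetical names invented for this example; everything runs in base R.

```r
# Constructor: an S3 object is just a list with a class attribute
new_employee <- function(name, role, salary) {
  stopifnot(is.character(name), is.character(role), is.numeric(salary))
  structure(
    list(name = name, role = role, salary = salary),  # characteristics
    class = "employee"
  )
}

# A generic declares a behavior; UseMethod() dispatches on the object's class
give_raise <- function(x, pct, ...) UseMethod("give_raise")

# Method for employee objects: returns a modified copy
give_raise.employee <- function(x, pct, ...) {
  x$salary <- x$salary * (1 + pct / 100)
  x
}

# Method for the built-in print() generic
print.employee <- function(x, ...) {
  cat("<employee>", x$name, "|", x$role, "| salary:", x$salary, "\n")
  invisible(x)
}

alex <- new_employee("Alex", "Analyst", 60000)
alex <- give_raise(alex, pct = 5)  # reassign, because R objects are copied on modify
print(alex)
```

Note that `give_raise()` does not change `alex` in place; you have to assign the result back, which is a common surprise for people coming from other object-oriented languages.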

Step 9 — Time Series Analysis

In this series, Boris Guarisma walks you through various ways of looking at your time series data. https://blog.bguarisma.com/series/time-series-forecasting

Step 10 — You should now have a good idea of where you need to go next. Here are some additional useful sites that I’ve found

DPLYR

Once you’ve hit the dplyr section of the above classes, you may want to pepper these in.

In the meantime, hit up these sites to learn how you can actually put R to work for presentations to lay people (C-suite, execs, managers, etc.)

  • Tabyls: a tidy, fully-featured approach to counting things
  • Openxlsx: simplifies the creation of Excel files with R
  • Spin: turning R scripts into HTML, PDF, Word (a derivative of knitr and R Markdown)
  • formattable: pretty tables in R
  • Officer package: native PowerPoint and Word docs straight from R

People Analytics Bonuses

R for People Analytics — RStudio Conference 2022 — Keith McNulty

Data Storytelling

Telling Stories with Data — with applications in R — Rohan Alexander (2023)

Writing functions in the Tidyverse

R for Epidemiology — Writing functions

Machine Learning Bonuses

Introduction to Machine Learning with the Tidyverse — This is Alison Hill’s talk from the rstudio::conf. Click on “start here” for all of the slides and code.

Julia Silge’s blog — Lots of TidyTuesday data being preprocessed and modeled with TidyModels

Machine learning with {tidymodels} — train multiple models at once

Explainability of {tidymodels} models with {iml}

My Data Science Notes — Michael Foley’s notes across various data science classes in bookdown format

Introduction to Applied Machine Learning — John J. Curtin — This course is designed to introduce students to a variety of computational approaches in machine learning. The course is designed with two key foci. First, students will focus on the application of common, “out-of-the-box” statistical learning algorithms that have good performance and are implemented in tidymodels in R. Second, students will focus on the application of these approaches in the context of common questions in behavioral science in academia and industry.

ISLR Tidymodels Labs — labs from Introduction to Statistical Learning with applications in R using tidymodels

Getting Machine Learning into Production

Machine Learning for Social Scientists — Jorge Cimentada

Where does probability fit in? — vignette on threshold_perf function to determine optimal classification threshold cutoff using tidymodels

Solutions from Posit — Getting Started

Other helpful websites dealing with statistics

StatQuest — Josh Starmer

Statistics of Doom — Dr. Erin Buchanan

Quant Psych — Dr. Dustin Fife

If you enjoyed this article, you can help me share this knowledge with others by:👏claps, 💬comment, and be sure to 👤+ follow.
