R Starter Kit
Where to begin when you don’t know where to begin, with an emphasis on People Analytics
Overview
The way that I’ve seen most people succeed at learning R is to get the foundation of the programming language down (see Step 2) and then apply it directly to something they care about or that needs to get done anyway. When I first started out, I would begin a project in R, get as far as I could, and make a note of wherever I got stuck so that I could come back later and figure it out. Meanwhile, I would finish the project in SPSS, Excel, etc. in order to meet the deadline. I would visit R-Bloggers every day to get a feel for the “art of the possible” and then see what I could take from there and apply to People Analytics. Now that ChatGPT and other conversational chatbots are available, you have another helper, and I encourage you to use one of them as your “copilot” on your coding journey.
Step 0 — GUIs for R
If you don’t feel quite ready for a scripted programming language, then Jamovi may be the place to start. This is a GUI for R (similar to SPSS) as you can see in the user manual.
Step 0.5 — Transitioning from Excel
If you are comfortable with Excel, this article provides a nice transition into R.
How to transition from Excel to R
Step 1 — Get R and R Studio!
First, make sure you have both R and RStudio. The download link to the RStudio Integrated Development Environment (IDE) is here.
Step 2 — Introduction video tutorial
Once you’ve installed R and RStudio, watch the following tutorial on YouTube from freeCodeCamp.org, presented by Barton Poulson, which I’ve found very helpful. It covers the following areas:
- Installing R
- RStudio
- Packages
- plot()
- Bar Charts
- Histograms
- Scatterplots
- Overlaying Plots
- summary()
- describe()
- Selecting Cases
- Data Formats
- Factors
- Entering Data
- Importing Data
- Hierarchical Clustering
- Principal Components
- Regression
- Next Steps
If you enjoy it as well, make sure to like, comment, and subscribe!
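To give a flavor of a few of the topics on that list (factors, summary(), selecting cases, histograms, scatterplots, and regression), here is a minimal base R sketch. The `employees` data frame and its values are invented purely for illustration and do not come from the tutorial.

```r
# Hypothetical toy data: tenure (years) and engagement score for eight employees
employees <- data.frame(
  tenure     = c(1, 3, 5, 2, 8, 10, 4, 6),
  engagement = c(3.1, 3.8, 4.2, 3.4, 4.6, 4.9, 3.9, 4.3),
  department = c("HR", "IT", "IT", "HR", "Sales", "Sales", "IT", "HR")
)

# Factors: store department as a categorical variable
employees$department <- factor(employees$department)

# summary(): quick descriptive statistics for every column
summary(employees)

# Selecting cases: keep only employees with more than 3 years of tenure
experienced <- employees[employees$tenure > 3, ]

# Histogram and scatterplot with base graphics
hist(employees$engagement, main = "Engagement scores", xlab = "Score")
plot(employees$tenure, employees$engagement,
     xlab = "Tenure (years)", ylab = "Engagement")

# Regression: engagement predicted from tenure
fit <- lm(engagement ~ tenure, data = employees)
coef(fit)
```

Running this line by line in RStudio, with the plots appearing in the Plots pane, is a reasonable way to check that your installation works before starting the video.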
Once you are feeling more comfortable with R, Posit (formerly RStudio) maintains a great repository of cheatsheets for various R packages here.
If you learn better from videos and a structured class:
Step 3a — Data Science and Machine Learning Bootcamp with R
Udemy: Data Science and Machine Learning Bootcamp with R, by Jose Portilla. It is usually on sale for less than $15. Anything by Jose Portilla is great! You could also start with Introduction to R on Udemy, which is free.
If you learn better from following a book and like to jump around:
Step 3b — Hands-on Programming with R
Hands-on Programming with R is a free and helpful book. Introduction to R, by Andrew Ellis and Boris Mayer, takes you from installing R all the way to confirmatory factor analysis (CFA), structural equation modeling (SEM), and hierarchical linear modeling (HLM).
If you learn better from slides and like the structure of Beginner, Intermediate, Advanced:
Step 3c — Brad Boehmke’s workshops
Introduction to R
- Intro & Fundamentals
- Importing data
- Data transformation
- Data visualization
- Data types
- Tidy data
- Joining data
- Data structures
Intermediate R
- Scoped variable transformations
- Control statements
- Workflow
- Iteration with loops
- Iteration with functional programming
- Writing functions
Advanced R
- Unsupervised modeling
- Supervised modeling process
- Feature & target engineering
- Regression & cousins
- Decision trees, bagging, & random forests
- Gradient boosting machines
- Stacked models & auto ML
- Interpretable machine learning
Brad Boehmke does a phenomenal job of walking you from getting started with R all the way to more advanced techniques. His slides are entertaining and interactive, and he provides the scripts and files to go with them.
Step 3d — Richard Landers’ Data Science for Social Scientists course
Data Science for Social Scientists Schedule and Materials
I love that Dr. Richard Landers covers String Manipulation and Natural Language Processing (NLP), including Regular Expressions (REGEX) via the RegexOne course. This is valuable to social scientists because we often encounter qualitative data, and NLP and REGEX let us pull out the necessary information in the minimum amount of time.
If you are already feeling comfortable with R, skip to Step 4, then jump to the DPLYR section:
Step 4 — Data Wrangling
A comprehensive tutorial using dplyr from the tidyverse to wrangle your data.
Data Wrangling with dplyr (4 parts)
Suzan Baert walks you through the major aspects of the tidyverse to help you bend data to your will.
- Data Wrangling Part 1: Basic to Advanced Ways to Select Columns
- Part 2: Transforming your columns into the right shape
- Part 3: Filtering rows
- Part 4: Summarising your data
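The four parts above map onto four core dplyr verbs. The following is a minimal sketch rather than an excerpt from the tutorial; it assumes the dplyr package is installed, and the `hr` data frame and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical HR data, invented for illustration
hr <- data.frame(
  employee_id = 1:6,
  department  = c("HR", "IT", "IT", "Sales", "Sales", "IT"),
  salary      = c(52000, 71000, 68000, 60000, 64000, 75000),
  tenure_yrs  = c(2, 5, 1, 7, 3, 9)
)

dept_summary <- hr %>%
  select(department, salary, tenure_yrs) %>%  # Part 1: select columns
  mutate(salary_k = salary / 1000) %>%        # Part 2: transform a column
  filter(tenure_yrs >= 2) %>%                 # Part 3: filter rows
  group_by(department) %>%                    # Part 4: summarise by group
  summarise(
    n             = n(),
    mean_salary_k = mean(salary_k),
    .groups       = "drop"
  )

dept_summary
```

Each verb takes a data frame and returns a data frame, which is why the steps chain together so naturally with the pipe.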
Step 4a — Dates and Times!
Dates and times can be a stumbling point when working with data. lubridate, from the tidyverse, makes transforming data to the correct format much easier.
Lubridate
Date & Time Data with lubridate
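As a small illustration of why lubridate helps, the sketch below parses dates stored as text in two different formats, extracts components, and does some date arithmetic. It assumes the lubridate package is installed; the hire dates and variable names are made up for the example.

```r
library(lubridate)

# Hypothetical hire dates stored as text in two different formats
hire_us  <- mdy("03/15/2021")   # month-day-year
hire_iso <- ymd("2019-11-01")   # year-month-day

# Pull out individual components
year(hire_us)
month(hire_us)
wday(hire_iso, label = TRUE)    # day of the week as a labelled factor

# Date arithmetic: add a 90-day probation period
probation_end <- hire_us + days(90)

# Tenure in whole years as of a fixed reference date
as_of <- ymd("2024-03-15")
tenure_years <- interval(hire_us, as_of) %/% years(1)
```

The parsing functions are named after the order of the components in the text (`mdy`, `ymd`, `dmy`, and so on), so you pick the one that matches how your source data is written.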
Step 5 — R Software Handbook
R Software Handbook is free and includes the following:
- R Basics
- Data Preparation and Cleaning
- Data Manipulation
- Stats
- Data viz in base R
- Data viz with ggplot2
Step 6 — The Big Book of R
The Big Book of R from Oscar Baruffa has everything you need (300+ books and counting) and is kept up to date. Here are some stops if you don’t know where to get started:
- 2 New to R? Start here
- 22.20 The Tidyverse Cookbook
- 12.1 A ggplot2 Tutorial for Beautiful Plotting in R
- 14.10 Handbook of Regression Modeling in People Analytics
- 14.9 Handbook of Graphs and Networks in People Analytics with Examples in R and Python
- 14.15 R for Excel users
- 17.4 Text Mining with R
- 20.14 Tidy Modeling with R
- 20.5 Hands on Machine Learning with R
Step 7 — If you have worked through the above steps, you will now know where you need to go to continue your R journey. I’ve made some suggestions below if you are a social scientist like myself.
- Data Science for Psychologists: another free book
- R for HR
- Computing for Information Science: this has a little bit of everything
- 7 HR Data Sets for People Analytics
If you don’t want a video, start here with Coding Club and work through the slides of Introduction to R.
Then go to Jumpstart with R created by the amazing Matt Dancho. This is a paid class, but very worth it in my opinion.
To see how a pro uses R, here are David Robinson’s TidyTuesday walkthroughs on YouTube where he wrangles and visualizes a different dataset every week using R.
Julia Silge, a data scientist and software engineer at Posit PBC, also does an awesome job with the TidyTuesday data and often incorporates Machine Learning as well.
Highlighted blog posts from Julia Silge:
- Practice using lubridate… THEATRICALLY
- Handle class imbalance in #TidyTuesday climbing expedition data with tidymodels
- Predict housing prices in Austin TX with tidymodels and xgboost
- Predict #TidyTuesday giant pumpkin weights with workflowsets
- Text predictors for #TidyTuesday chocolate ratings
- PCA and UMAP with tidymodels and #TidyTuesday cocktail recipes
- Use Docker to deploy a model for #TidyTuesday LEGO sets
- You can also browse everything she has posted on tidymodels (https://juliasilge.com/categories/tidymodels/)
There is also this course out of the UC Business Analytics program, which uses DataCamp.
Step 8 — Object Oriented Programming (OOP)
Object-oriented programming (OOP) is a way of organizing and designing computer programs to make them more manageable and easier to understand. It’s like thinking about the world around us, where everything is an object with certain characteristics and behaviors.
- Object Oriented Programming (OOP) in R | Create R Objects & Classes
- Advanced R: OOP Introduction
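To make "characteristics and behaviors" concrete, here is a minimal sketch using S3, the simplest of R's OOP systems, in base R only. The `employee` class, the `new_employee()` constructor, and the `give_raise()` generic are hypothetical names invented for this example, not something taken from the resources above.

```r
# Constructor: builds an object of the (hypothetical) class "employee".
# The list elements are the object's characteristics.
new_employee <- function(name, department, salary) {
  obj <- list(name = name, department = department, salary = salary)
  class(obj) <- "employee"
  obj
}

# Behavior 1: a method for the existing print() generic
print.employee <- function(x, ...) {
  cat("Employee:", x$name, "| Dept:", x$department, "\n")
  invisible(x)
}

# Behavior 2: a new generic plus a method for our class
give_raise <- function(x, pct) {
  UseMethod("give_raise")
}

give_raise.employee <- function(x, pct) {
  x$salary <- x$salary * (1 + pct / 100)
  x
}

ana <- new_employee("Ana", "HR", 50000)
print(ana)                  # dispatches to print.employee()
ana <- give_raise(ana, 10)  # dispatches to give_raise.employee()
ana$salary
```

S3 dispatch works by naming convention: calling `give_raise()` on an object of class "employee" looks for a function called `give_raise.employee`.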
Step 9 — Time Series Analysis
In this series, Boris Guarisma walks you through various ways of looking at your time series data: https://blog.bguarisma.com/series/time-series-forecasting
- Time Series Forecasting Lab (Part 1) — Introduction to Feature Engineering
- Time Series Forecasting Lab (Part 2) — Feature Engineering with Recipes
- Time Series Forecasting Lab (Part 3) — Machine Learning with Workflows
- Time Series Forecasting Lab (Part 4) — Hyperparameter Tuning
- Time Series Forecasting Lab (Part 5) — Ensembles
- Time Series Forecasting Lab (Part 6) — Stacked Ensembles
Step 10 — You should now have a good idea of where you need to go next. Here are some additional useful sites that I’ve found
DPLYR
Once you’ve hit the dplyr section of the above classes, you may want to pepper these in.
- Introduction to dplyr for Faster Data Manipulation in R
- dplyr tutorial
- DATA MANIPULATION WITH DPLYR (WITH 50 EXAMPLES)
- Comprehensive Guide to Data Visualization in R
- Beautiful plotting in R: A ggplot2 cheatsheet
- Tidyr’s pivot_longer() and pivot_wider() Examples From the #TidyTuesday Challenge
In the meantime, hit up these sites to learn how you can actually put R to work for presentations to lay people (C-suite, execs, managers, etc.)
- Tabyls: a tidy, fully-featured approach to counting things
- Openxlsx: simplifies the creation of Excel files with R
- Spin: turning R scripts into HTML, PDF, Word (a derivative of knitr and Rmarkdown)
- formattable: pretty tables in R
- Officer package: native PowerPoint and Word docs straight from R
People Analytics Bonuses
R for People Analytics — RStudio Conference 2022 — Keith McNulty
Data Storytelling
Telling Stories with Data — with applications in R — Rohan Alexander (2023)
Writing functions in the Tidyverse
R for Epidemiology — Writing functions
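One pattern that comes up quickly when writing your own tidyverse functions is passing bare column names through to dplyr verbs. The sketch below uses the embrace operator `{{ }}` for that. It assumes dplyr 1.0 or later is installed; the `mean_by()` helper and the `hr` data are hypothetical and only meant to show the idea.

```r
library(dplyr)

# Hypothetical helper: mean of one column within groups of another.
# {{ }} ("embrace") forwards the caller's bare column names to dplyr.
mean_by <- function(data, group_col, value_col) {
  data %>%
    group_by({{ group_col }}) %>%
    summarise(
      mean_value = mean({{ value_col }}, na.rm = TRUE),
      .groups    = "drop"
    )
}

# Invented example data with one missing engagement score
hr <- data.frame(
  department = c("HR", "HR", "IT", "IT"),
  engagement = c(3.5, 4.5, 3.0, NA)
)

result <- mean_by(hr, department, engagement)
result
```

Without the embrace operator, dplyr would look for columns literally named `group_col` and `value_col`, which is the usual source of confusion when first wrapping dplyr code in a function.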
Machine Learning Bonuses
Introduction to Machine Learning with the Tidyverse — This is Alison Hill’s talk from the rstudio::conf. Click on “start here” for all of the slides and code.
Julia Silge’s blog — Lots of TidyTuesday data being preprocessed and modeled with TidyModels
Machine learning with {tidymodels} — train multiple models at once
Explainability of {tidymodels} models with {iml}
My Data Science Notes — Michael Foley’s notes across various data science classes in bookdown format
Introduction to Applied Machine Learning — John J. Curtin — This course is designed to introduce students to a variety of computational approaches in machine learning. The course is designed with two key foci. First, students will focus on the application of common, “out-of-the-box” statistical learning algorithms that have good performance and are implemented in tidymodels in R. Second, students will focus on the application of these approaches in the context of common questions in behavioral science in academia and industry.
ISLR Tidymodels Labs — labs from Introduction to Statistical Learning with applications in R using tidymodels
Getting Machine Learning into Production
Machine Learning for Social Scientists — Jorge Cimentada
Where does probably fit in? A vignette from the probably package on the threshold_perf function, which helps determine the optimal classification threshold cutoff using tidymodels
Solutions from Posit — Getting Started
Other helpful websites dealing with statistics
StatQuest — Josh Starmer
Statistics of Doom — Dr. Erin Buchanan
Quant Psych — Dr. Dustin Fife