The Ultimate R Guide

An Index of Learning Resources and Libraries For Data Science

Zoshua Colah
Data Science Library
7 min read · Nov 2, 2018

Source: https://cdn0.tnwcdn.com/wp-content/blogs.dir/1/files/2018/01/IJii3Ho-1-796x398.png

About R

R is a free programming language and software environment for statistical computing and graphics that is supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and performing data analysis. Polls, surveys of data miners, and studies of scholarly literature databases show that R’s popularity has increased substantially in recent years. At the time of writing, R ranks 8th in the TIOBE index.

Download R here!

Packages, the fundamental building blocks of R, are bundles of code, data, documentation, and tests that extend what you can do with R.

Search for packages here!
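
As a rough sketch of how packages are used day to day, assuming a standard R installation: a package is installed once with install.packages() and then loaded in each session with library(). The install line is commented out so the snippet runs offline, and the built-in 'stats' package stands in for whichever package you actually need.

```r
# Install a package from CRAN once (needs an internet connection).
# 'jsonlite' is only an example name; swap in the package you want.
# install.packages("jsonlite")

# Load a package for the current session. 'stats' ships with R itself.
library(stats)

# List the packages currently attached to the search path
attached <- grep("^package:", search(), value = TRUE)
print(attached)

# Check whether a package is installed without attaching it
has_stats <- requireNamespace("stats", quietly = TRUE)
print(has_stats)
```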

There are a variety of IDEs (essentially, programs that combine a text editor, a debugger, and tools for running code) that can be used for R, but arguably the most useful one is RStudio. It autocompletes names and functions, is a very strong text editor, and integrates seamlessly with some very useful packages.

Download RStudio here!

Books

Technical Foundations of Informatics (highly recommended)

Easily the best resource for any beginner data scientist, this book is a free textbook written for Informatics students at the University of Washington. It covers setting up your machine, the basic structures and ideas of R, creating repositories to share your code with others on GitHub, and much more. Give it a read before you start any coding; it’s more than worth your time!

Hands-on Programming with R

This book assumes you have some basic knowledge of statistics and programming, so be aware of that going in. If you have that background, you’ll be rewarded with an entertainingly written tour of a wide variety of R concepts and techniques! At the time of writing, it costs around $18 to buy (or $7 to rent) on Amazon.

Online Courses

Microsoft’s Introduction to R

This is a free course offered by Microsoft that covers the fundamentals of R over 4 weeks, at a rate of around 2 or 3 hours a week. No prior experience is required, so it is perfect for beginners who want some insight into what the data science industry is like!

Data Analysis with R

Udacity offers this Facebook-sponsored course for free, although it requires a greater knowledge of R and programming than some of the other courses mentioned here. It also lasts 2 months, so it goes into more depth than some other online resources. With a strong community of like-minded students, a large influence from industry pros, and rich content, you should definitely check this out further down the road of your R career!

DataCamp (highly recommended)

This website is really good for beginners who are just starting to learn to code, since its solid built-in web IDE helps you learn R through small and simple problems. The courses are short and the videos don’t waste any time, so be sure to look into it!

Variance Explained Data Analysis

The Variance Explained R course is structured similarly to the Informatics textbook mentioned earlier in this post, so anyone familiar with that resource will find it highly useful! It also explains some more complex topics, like data visualization and linear regression, so it provides a strong foundation for more advanced classes you may take in the future.

Johns Hopkins University’s R Programming Course

Coursera offers this R programming class with the support of Johns Hopkins University entirely for free, which makes it a strong entry point for new programmers. It only takes around 20 hours to complete and is flexible around your own schedule, but the difficulty is what one would expect from a university-level course. It also requires some background knowledge of Python (a different programming language entirely).

Data Manipulation With ‘dplyr’ in R

‘dplyr’ is a package in R that is perfect for data transformations and manipulations. While this is more of an informational post on ‘dplyr’ than an actual course, it requires a large amount of background knowledge in R, so keep that in mind before attempting it. But once you start getting into it, you’ll find that the methods and techniques it describes are invaluable for any data science project!

Other Recommended Websites

BlogR is a blog run by a data scientist that offers subtle R tips and tricks that one might miss in the larger, previously mentioned online courses. From adding numbers to the end of a line in a graph to adding interesting color palettes to a scatterplot, this website is worth checking out once your R career starts getting more serious.

If you’re interested in seeing what other data scientists have come up with, you should check out R-Bloggers! It has some very interesting, deep articles about topics like machine learning and neural networks; at the same time, feel free to check out fun posts about Valentine’s Day statistics or a data visualization inspired by the popular comedy show Brooklyn Nine-Nine.

R Packages and Libraries

Source: http://resizeandsave.online/dappy-February_16_7.html

R provides a lot of powerful packages for different tasks like loading, manipulating, modeling, and visualizing data. Below is a brief list of some of those packages.

Importing Data into R

  • readxl — Fast way to read Excel files in R without additional software
  • googlesheets — Read data from Google Sheets into R
  • jsonlite — Parse JSON files within R or turn data frames into JSON
  • rvest — Extract data from HTML pages
  • RMySQL, RPostgreSQL, RSQLite — If you’d like to read in data from a database, these packages are a good place to start. Choose the package that fits your type of database.
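
To give a feel for what importing looks like, here is a minimal sketch using ‘jsonlite’, assuming the package is installed. The JSON text and the variable names are made up for illustration; in practice the input would usually be a file path or URL.

```r
library(jsonlite)

# A small JSON array of records (illustrative data, not from a real source)
json_text <- '[{"name":"Ada","score":91},{"name":"Grace","score":88}]'

# fromJSON() simplifies an array of records into a data frame
scores <- fromJSON(json_text)
str(scores)

# toJSON() turns the data frame back into JSON text
json_out <- toJSON(scores)
print(json_out)
```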

Manipulating Data in R

  • dplyr — Essential shortcuts for subsetting, summarizing, rearranging, and joining together data sets. ‘dplyr’ is our go-to package for fast and intuitive data manipulation.
  • tidyr — Powerful tools for changing the layout of your data sets. Use its functions to convert your data into the tidy format, the layout R likes best.
  • stringr — Easy-to-learn tools for regular expressions and character strings.
  • lubridate — Tools that make working with dates and times easier.
  • sqldf — Run SQL queries on your data frame with ‘sqldf’.
  • janitor — Data cleaning made easy. It lets you find duplicate records, remove empty rows and columns, etc.
  • tidytext — Text mining with tidy data frames
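
As an illustration of the kind of manipulation these packages make easy, the sketch below chains a few ‘dplyr’ verbs on R’s built-in mtcars data set. It assumes ‘dplyr’ is installed; the 100 horsepower cutoff and the column names mean_mpg and n_cars are arbitrary choices for the example.

```r
library(dplyr)

summary_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                 # keep cars with more than 100 horsepower
  group_by(cyl) %>%                    # one group per cylinder count
  summarise(
    mean_mpg = mean(mpg),              # average fuel economy in each group
    n_cars   = n()                     # number of cars in each group
  ) %>%
  arrange(desc(mean_mpg))              # most economical group first

print(summary_by_cyl)
```

Each verb takes a data frame and returns a data frame, which is what lets the steps be read from top to bottom.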

Visualizing Data in R

  • ggplot2 — One of R’s most popular packages, essential for making beautiful graphics
  • plotly — Basic interactive graphs which are easy to make
  • ggvis — Interactive, web-based graphics
  • rgl — Interactive 3-D visualizations
  • leaflet — Intuitive and easily-made maps
  • dygraphs — Graphs and charts for time-based information
  • DT — Interactive HTML tables
  • DiagrammeR — Both simple and complex diagrams, useful for math or programming related drawings
  • networkD3 — Tools for deep and complicated network graphs
  • threejs — 3-D scatterplots and globes
  • gganimate — Animating data visualizations
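
For a sense of how a ‘ggplot2’ graphic is put together, here is a small sketch using the built-in mtcars data set. It assumes ‘ggplot2’ is installed; the choice of variables and labels is only for illustration.

```r
library(ggplot2)

# Build the plot in layers: data and aesthetics, then points, then labels
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    colour = "Cylinders",
    title = "Fuel economy versus weight"
  )

# Printing the object draws it on the active graphics device
print(p)
```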

Reporting Data using R

  • shiny — Interactive and easily-made web apps with R. A perfect way to explore data and share findings with non-programmers.
  • R Markdown — The perfect workflow for reproducible reporting. Write R code in your markdown reports. R Markdown will replace the code with its results and then export the report as an HTML, PDF, or Word document, or as an HTML or PDF slideshow.
  • xtable — The xtable function takes an R object (like a data frame) and returns the LaTeX or HTML code you need to paste a prettier version of it into your documents.
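
The ‘xtable’ description above can be sketched as follows, assuming the package is installed. The three columns taken from mtcars and the caption text are arbitrary choices for the example.

```r
library(xtable)

# Wrap the first rows of a data frame in an xtable object
tab <- xtable(
  head(mtcars[, c("mpg", "cyl", "wt")], 3),
  caption = "First three cars in mtcars"
)

# Printing emits LaTeX by default; type = "html" emits an HTML table instead
print(tab, type = "html")

# Capture the default LaTeX output as lines of text, ready to paste into a report
latex_lines <- capture.output(print(tab))
```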

Check these sites out for an exhaustive list:

Thank you for reading. If you found this article useful, please give it a clap. Also, a big thank you to Sanjay Unni for editing this article. Add Sanjay on LinkedIn here!
