R is a powerful language used widely for data analysis and statistical computing. It was developed in the early 1990s. Since then, continuous efforts have been made to improve R’s user interface. The journey of the R language from a rudimentary text editor to the interactive RStudio and, more recently, Jupyter Notebooks has engaged many data science communities across the world.
This was possible only because of generous contributions by R users globally. The steady addition of new packages has made R more capable over time. Packages such as dplyr, tidyr, readr, data.table, SparkR, and ggplot2 have made data manipulation, visualization, and computation much faster.
This is a complete tutorial for learning data science and machine learning using R. By the end of this tutorial, you will have had good exposure to building predictive models with machine learning on your own.
Note: No prior knowledge of data science / analytics is required. However, prior knowledge of algebra and statistics will be helpful.
Table of Contents
- Basics of R Programming for Data Science
- Why learn R?
- How to install R / RStudio?
- How to install R packages?
- Basic computations in R
- Data Types and Objects in R
- Control Structures (Functions) in R
- Useful R Packages
- Basic Graphs
- Treating Missing Values
- Working with Continuous and Categorical Variables
- Feature Engineering
- Label Encoding / One Hot Encoding
- Linear Regression
- Decision Tree
- Random Forest
I’ve used a step-by-step approach to explain the underlying concepts of R programming. The focus is largely on implementing data manipulation and machine learning tasks in R.