Getting Started with Data Science in 30 Days with R Programming: Day 1

SaiGayatri Vadali
3 min read · Dec 19, 2017

This article is the first in the series; the other articles are available here.

What is this about?

With the explosion of data and the revolution it is bringing, many employees and students want a peek into this data revolution but are discouraged by the sight of so many tools, such as R, Python and SAS. Are you one of them? Are you intrigued by terms like data analytics, business analytics, machine learning and artificial intelligence, but clueless about how to learn more about them? Then read along!

This series of posts is an attempt to help those on this pursuit and to introduce them to the world of data using one of the most popular languages for data analysis: R.

Every day, I will post a concept or the intuition behind a data visualization or data exploration technique using R.

There are absolutely no prerequisites for this tutorial, as I will take you through the necessary statistics and coding wherever needed. But you are always encouraged to learn the extra math, statistics and algorithmic details yourself. After all, being a data explorer means being a continuous explorer.

Let’s get started!

What do data scientists do?

Today, let us look at the general sequence of steps that the data and the data scientist go through together: playing with, fighting and understanding each other.

  1. Collecting data from various sources: It’s a common myth among newbies and outsiders that data science is the easiest job once you learn R or Python. You become a data scientist and you have everything on your table: the data, the tools, the algorithms. All you need to do is apply that one best algorithm (I have never found it yet ☹), get results and enjoy. That picture is a pile of myths. The problems of the data scientist begin right at this step because of three well-known properties of data: volume (how much of it there is), variety (how many forms it takes) and velocity (how fast it arrives).
  2. Cleaning data: Also known as data preprocessing, this is a very important step and is in turn a collection of many smaller steps. The goal is to make the data more understandable and clear enough to dig into further. It might involve anything from filling in missing values to scaling and reducing certain features (we will cover these clearly in the coming articles).
  3. Exploring data: This step generally includes, but is not limited to, graphical visualizations of the data. R provides various libraries that help you view data in 2D and 3D images, apart from producing beautiful animations.
  4. Applying various algorithms: Now comes the most important step of building the model: applying the algorithm. We apply different machine learning algorithms and statistical methods to build models and compare their performance. This step might give us the accuracy we need straight away with available models like linear regression or SVM, or it might require us to build an ensemble model (a model built by combining several others).
  5. Testing the model: Wondering where testing, the most important step of software development, fits in? It happens right in the fourth step, when the different models are validated against one another. We will learn more about this step as we progress.

Note that these steps may not occur independently of each other, and may not even happen in the order listed.
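To make the five steps above a little more concrete, here is a minimal sketch in R. It is only an illustration, not the workflow this series will follow: it assumes the small `mtcars` data set that ships with R stands in for "collected" data, it blanks out one value on purpose so there is something to clean, and the variable names (`car_data`, `train_idx`, `fit`, `rmse`) are my own choices. Don't worry if some lines look unfamiliar for now.

```r
# 1. Collect: use the built-in mtcars data set as a stand-in for real data.
car_data <- mtcars

# 2. Clean: mtcars has no gaps, so we create one artificially (assumption
#    for illustration) and then fill it with the mean of the column.
car_data$hp[3] <- NA
car_data$hp[is.na(car_data$hp)] <- mean(car_data$hp, na.rm = TRUE)

# 3. Explore: a numeric summary and a simple scatter plot.
print(summary(car_data$mpg))
plot(car_data$wt, car_data$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# 4. Model: hold out some rows, then fit a linear regression on the rest.
set.seed(42)                                   # make the random split repeatable
train_idx <- sample(nrow(car_data), size = 24) # 24 of the 32 rows for training
train <- car_data[train_idx, ]
test  <- car_data[-train_idx, ]
fit <- lm(mpg ~ wt + hp, data = train)

# 5. Test: predict on the held-out rows and measure the error (RMSE).
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
print(rmse)
```

The smaller the `rmse` value, the closer the predictions are to the real mileage figures of the cars the model never saw. Trying a different formula in `lm()` (for example `mpg ~ wt` alone) and comparing the two errors is a tiny version of steps 4 and 5 feeding back into each other.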

Having read about so many steps and so many new things, you might be wondering how anyone makes it to the end of all these steps and becomes a data scientist. This tutorial aims to remove your inhibitions and give you the motivation to push further. It takes 10 minutes of your day and slowly equips you with all the essential weapons to wage your war. I will do my best to give you the best experience of learning data science with R. All it takes is 20–40 minutes out of your day if you want that extra edge beyond this reading. As already mentioned, don’t worry about prerequisites like math and coding. After all, you are building a model in your first 30 days! Happy learning R!

Please don’t forget to share, clap and comment below. I am a new learner and contributor here, and these would mean a lot in keeping me going.

SaiGayatri Vadali

An inquisitive Machine Learning Engineer, yoga trainer, fitness freak and a passionate writer!