Day 2: Install R and get started with its basics

SaiGayatri Vadali
5 min read · Dec 20, 2017


This article is the second in the series “Getting started with Data Science in 30 days using R programming”. Today we will install the necessary software and learn the basics of R.

Introduction to R:

R is a free, open-source statistical programming language and part of the GNU project. R and its packages are distributed through CRAN (the Comprehensive R Archive Network). The community contributes to CRAN continuously, so it hosted nearly 10,000 packages at the time of writing. R is an interpreted language. Commands typed at the console run immediately, without first building a complete program as in compiled languages such as C, Fortran or Pascal.

Installation of R and RStudio:

The latest version of R can be downloaded from here. Choose the installer (or zip archive) that is compatible with your operating system, then run it (or unzip it) to install R.

After installing R, I highly recommend downloading RStudio, an interactive development environment that makes coding in R easier and more fun. Here is the link to the RStudio website. Choose the open-source version, then download and install it on your system.

Once you have installed both, keep your RStudio window open and dive in along with me.

In today’s article, we’ll discuss the following basics of R:

  • R objects
  • Variables and Data Types
  • Vectors and Lists
  • Matrices
  • Data Frames

R objects:

Unlike languages such as C and C++, R does not require you to declare a variable’s data type. Instead, R works with objects. A variable is bound to an object, and its data type is simply that of the object assigned to it. The important objects of R are:

  • Vectors
  • Lists
  • Matrices
  • Arrays
  • Factors
  • Data Frames
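Here is a quick illustration of the point above. Because no type is declared, the same name can be rebound to objects of different classes. The variable name v is just an example.

```r
# The name v has no declared type; its class follows whatever object it holds
v <- 42
class(v)   # "numeric"
v <- "forty-two"
class(v)   # "character"
v <- c(TRUE, FALSE)
class(v)   # "logical"
```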

Variables:

In R, a variable is created by assigning a value directly to an identifier. Valid variable names consist of letters, numbers, dots and underscores. A name must start with a letter or a dot, and a leading dot must not be followed by a number. Here is an example of creating a variable:

a <- 6
print(a)
[1] 6
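The naming rules can be checked with a short sketch. The names below are illustrative only, and the invalid ones are commented out because they would be syntax errors. The base function make.names() shows how R would repair each invalid name.

```r
my.var  <- 1   # a dot inside a name is fine
my_var  <- 2   # so is an underscore
.hidden <- 3   # a leading dot followed by a letter is allowed
# 2var  <- 4   # invalid: starts with a number
# _var  <- 5   # invalid: starts with an underscore
# .2var <- 6   # invalid: a leading dot followed by a number

# make.names() prefixes "X" to names that do not start validly
fixed <- make.names(c("my.var", "2var", "_var", ".2var"))
fixed   # "my.var" "X2var" "X_var" "X.2var"
```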

Data Types:

The data type of an object can be found using the class() function. The basic (atomic) classes in R are:

  1. Logical
  2. Integer
  3. Numeric
  4. Character
  5. Complex
  6. Raw

For example, the class of a newly created variable can be found with:

x <- 2.3
class(x)
[1] "numeric"
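To see all six basic classes side by side, here is one literal of each, using only base R. A common surprise is that a plain number such as 5 is "numeric" (a double). To get an integer, you need the L suffix.

```r
class(TRUE)            # "logical"
class(5L)              # "integer"   (the L suffix makes an integer)
class(5)               # "numeric"   (plain numbers are doubles by default)
class("hello")         # "character"
class(3 + 2i)          # "complex"
class(charToRaw("A"))  # "raw"       (raw bytes)
```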

Vectors:

Vectors are the most basic objects in R. Even a single value is a vector of length one, which makes data easy to represent and reason about. There are six types of atomic vectors, one for each of the six atomic classes listed above.
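A two-line check of the claim that R has no separate scalar type:

```r
x <- 6
is.vector(x)   # TRUE: a single number is already a vector
length(x)      # 1
```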

Creation of vectors in R:

  1. Using c():

The c() function creates a vector by combining values. We can even combine values of different data types. In that case R coerces every element to the most general type present, in the order logical < integer < numeric < complex < character.

x <- c(1,2,3,4)
print(x)
[1] 1 2 3 4
x <- c(1,2,3,4.4)
class(x)
[1] "numeric"
x <- c(1,2,3,4.4,"c")
class(x)
[1] "character"
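Each pair below combines two adjacent types from that order, so you can watch the coercion happen step by step. Only base R is used.

```r
class(c(TRUE, 2L))   # "integer": the logical TRUE becomes the integer 1
class(c(1L, 2.5))    # "numeric": the integer is promoted to a double
c(1, "a")            # "1" "a":   the number becomes a character string
```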

2. Using the ‘:’ operator:

x <- 1:6
print(x)
[1] 1 2 3 4 5 6
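A few more cases of the ‘:’ operator worth knowing, all in base R:

```r
class(1:6)   # "integer": ':' yields integers when it starts from a whole number
5:1          # 5 4 3 2 1: counts down when the start is larger than the end
1.5:4        # 1.5 2.5 3.5: steps of 1, stopping before passing the end
```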

3. Using the ‘seq()’ function:

x <- seq(1, 6)
print(x)
[1] 1 2 3 4 5 6
x <- seq(1, 6, by=2)
print(x)
[1] 1 3 5
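Unlike ‘:’, seq() accepts a non-integer step, a negative step, or a target length. Here is a short sketch of those options using base R arguments:

```r
seq(0, 1, by = 0.25)         # 0.00 0.25 0.50 0.75 1.00
seq(10, 1, by = -3)          # 10 7 4 1: a negative step counts down
seq(1, 2, length.out = 5)    # 1.00 1.25 1.50 1.75 2.00: fix the length, not the step
seq_len(4)                   # 1 2 3 4
```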

Accessing elements of the vector:

We access elements of a vector by indexing with square brackets, ‘[]’. Indexing starts from 1, not 0.

x <- 1:6
print(x)
[1] 1 2 3 4 5 6
print(x[2])
[1] 2
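Square brackets accept more than a single position. The sketch below shows the common forms of indexing on the same vector.

```r
x <- 1:6
x[c(1, 3)]   # 1 3       : several positions at once
x[2:4]       # 2 3 4     : a range of positions
x[-1]        # 2 3 4 5 6 : a negative index drops that element
x[x > 4]     # 5 6       : a logical condition keeps the matching elements
x[10]        # NA        : indexing past the end gives NA, not an error
```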

Lists:

Lists are flexible, all-in-one objects. Unlike vectors, they can hold elements of different types, including numbers, vectors, matrices and even other lists. Let’s look at how to create them and access their elements, just as we did with vectors.

A list can be created explicitly with the list() function, as shown below:

list_data <- list("Bob", "Builder", c(1,2,3,4), TRUE, 981.23,119.13)
print(list_data)
[[1]]
[1] "Bob"
[[2]]
[1] "Builder"
[[3]]
[1] 1 2 3 4
[[4]]
[1] TRUE
[[5]]
[1] 981.23
[[6]]
[1] 119.13
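To confirm that the list really keeps each element’s own type, we can apply class() to every element. This uses base R’s sapply().

```r
list_data <- list("Bob", "Builder", c(1,2,3,4), TRUE, 981.23, 119.13)
length(list_data)          # 6
sapply(list_data, class)   # "character" "character" "numeric" "logical" "numeric" "numeric"
```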

Two lists can be merged using the c() function as follows:

list_data <- list("Bob", "Builder", c(1,2,3,4), TRUE, 981.23,119.13)
second_list <- list("cat","bat",2.4,5)
Third_list <- c(list_data,second_list)
print(Third_list)
[[1]]
[1] "Bob"
[[2]]
[1] "Builder"
[[3]]
[1] 1 2 3 4
[[4]]
[1] TRUE
[[5]]
[1] 981.23
[[6]]
[1] 119.13
[[7]]
[1] "cat"
[[8]]
[1] "bat"
[[9]]
[1] 2.4
[[10]]
[1] 5
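Note the difference between merging and nesting. c() concatenates the elements of both lists, while list() would wrap the two lists inside a new one. A small comparison:

```r
list_data   <- list("Bob", "Builder", c(1,2,3,4), TRUE, 981.23, 119.13)
second_list <- list("cat", "bat", 2.4, 5)
length(c(list_data, second_list))      # 10: elements are concatenated
length(list(list_data, second_list))   # 2:  the two lists are nested instead
```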

We can also assign names to the elements of a list. For example:

Named_list <- list(c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"), c("January","February","March","April"))
names(Named_list) <- c("Weekdays","Months")
print(Named_list)
$Weekdays
[1] "Monday" "Tuesday" "Wednesday" "Thursday" "Friday" "Saturday"
$Months
[1] "January" "February" "March" "April"
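Names can also be given at creation time, which avoids the separate names() call. The result is the same named list.

```r
Named_list <- list(
  Weekdays = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"),
  Months   = c("January", "February", "March", "April")
)
names(Named_list)   # "Weekdays" "Months"
```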

Accessing the elements of lists:

We can access the elements of a list by index or by name.

second_list <- list("cat","bat",2.4,5)
second_list[2]
[[1]]
[1] "bat"
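The output above is still a list (note the [[1]]), because single brackets return a sub-list. Double brackets return the element itself, as this comparison shows:

```r
second_list <- list("cat", "bat", 2.4, 5)
second_list[2]            # a list of length 1 containing "bat"
second_list[[2]]          # the element itself: "bat"
class(second_list[2])     # "list"
class(second_list[[2]])   # "character"
```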

Accessing through names:

Named_list <- list(c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"), c("January","February","March","April"))
names(Named_list) <- c("Weekdays","Months")
print(Named_list$Weekdays)
[1] "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"    "Saturday"
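Besides $, a name can be passed to double brackets. This also works when the name is stored in a variable. In the sketch below the list is shortened for brevity, and key and Years are illustrative names only.

```r
Named_list <- list(Weekdays = c("Monday", "Tuesday"),
                   Months   = c("January", "February"))   # shortened for brevity
Named_list[["Months"]]   # "January" "February": same as Named_list$Months
key <- "Weekdays"
Named_list[[key]]        # "Monday" "Tuesday": [[ ]] accepts a name held in a variable
Named_list$Years         # NULL: a missing name returns NULL instead of an error
```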

Matrices:

Matrices are two-dimensional vectors. Like vectors, all their elements must have the same data type. They can be created explicitly using the matrix() function.

Sample_matrix <- matrix(1:4,2,2)
print(Sample_matrix)
     [,1] [,2]
[1,]    1    3
[2,]    2    4

The basic syntax of this function is matrix(data, nrow, ncol), with the following parameters:

data: the data to be placed in the matrix

nrow: the number of rows

ncol: the number of columns
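By default, matrix() fills the data column by column. Its byrow argument fills it row by row instead. The sketch below uses named arguments for clarity. The names m and m_rows are illustrative.

```r
# Filled column by column (the default)
m <- matrix(1:6, nrow = 2, ncol = 3)
m
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# Filled row by row with byrow = TRUE
m_rows <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m_rows
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

dim(m)    # 2 3
m[2, 3]   # 6: the element in row 2, column 3
```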

sample_matrix <- matrix(1:7,3,3)
Warning message:
In matrix(1:7, 3, 3) :
  data length [7] is not a sub-multiple or multiple of the number of rows [3]

Try to find the reason behind this warning and share it in the responses!

Factors:

Factors are used to store categorical data. A factor is created with the factor() function. Factors are self-describing: along with the values, they record the set of possible categories, called levels. Here is an example:

x <- factor(c("male","female","male","female"))
levels(x)
[1] "female" "male"
x
[1] male female male female
Levels: female male

The levels() function returns the distinct categories of the factor, which are sorted alphabetically by default.
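Internally, a factor stores each value as an index into its levels. Passing levels= explicitly sets the category order instead of the alphabetical default. A brief sketch with base R (the name y and its values are illustrative):

```r
x <- factor(c("male", "female", "male", "female"))
as.integer(x)   # 2 1 2 1: each value is stored as the position of its level
table(x)        # counts per level: female 2, male 2

y <- factor(c("low", "high", "medium", "low"),
            levels = c("low", "medium", "high"))   # choose the level order
levels(y)       # "low" "medium" "high"
```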

Data Frames:

If you have already worked on data-related projects, you know how important a data frame is. When you are given a dataset, you usually try to put it into a format that makes the data easy to view and understand, and R’s data frame fulfills exactly this need. Before I take you through creating and manipulating one, have a glance at the data frame below.

my_dataframe
  id name age
1  1  Job  23
2  2  Bob  24
3  3  Ram  25

You may notice that it looks like a matrix whose columns hold different types of data and carry headers. Indeed, a data frame organizes data into rows and columns like a two-dimensional table, with column headers and row names. Unlike a matrix, each column can have its own data type.

The following are the characteristics of data frames:

  1. Column names should not be empty
  2. Row names should be unique
  3. Each column should contain the same number of data items
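The sketch below checks the last two rules by wrapping invalid calls in tryCatch(), so they return "error" instead of stopping the script. The data frame and its columns are illustrative. stringsAsFactors = FALSE is set because R versions before 4.0 turned strings into factors by default.

```r
# Each column keeps its own type
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
sapply(df, class)   # x: "integer", y: "character"
row.names(df)       # "1" "2" "3": default row names are unique

# Columns whose lengths cannot be reconciled are rejected
bad_cols <- tryCatch(data.frame(a = 1:3, b = 1:2), error = function(e) "error")
bad_cols            # "error"

# Duplicate row names are rejected too
bad_rows <- tryCatch(data.frame(a = 1:2, row.names = c("r1", "r1")),
                     error = function(e) "error")
bad_rows            # "error"
```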

Creating a data frame in R:

Usually, a data frame is created by reading a dataset from a file. read.csv() does this for CSV files, and read.table() for other delimited text files.

read.csv():

This function reads a CSV-formatted file and returns a data frame. We will learn more about it in the coming articles.

read.csv("filename")
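To try read.csv() without an existing file, this sketch first writes a tiny CSV to a temporary path with write.csv(), then reads it back. The path and its contents are illustrative. By default, read.csv() treats the first line as column headers.

```r
# Write a small CSV to a temporary file, so no existing file is assumed
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:2, name = c("Job", "Bob")), path, row.names = FALSE)

# Read it back: the header line becomes the column names
df <- read.csv(path)
names(df)   # "id" "name"
nrow(df)    # 2
```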

data.frame():

data.frame() creates a data frame by combining vectors as columns, as shown below:

my_dataframe <- data.frame(id=c(1,2,3), name=c("Job","Bob","Ram"), age=c(23,24,25))
my_dataframe
  id name age
1  1  Job  23
2  2  Bob  24
3  3  Ram  25
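As a preview of working with this data frame, here are the basic ways to pull out a column, a row, or a filtered subset. stringsAsFactors = FALSE keeps the name column as character on R versions before 4.0, so the comments hold on either version.

```r
my_dataframe <- data.frame(id = c(1, 2, 3), name = c("Job", "Bob", "Ram"),
                           age = c(23, 24, 25), stringsAsFactors = FALSE)
my_dataframe$name                            # "Job" "Bob" "Ram": one column, as a vector
my_dataframe[2, ]                            # the second row, still a data frame
my_dataframe[my_dataframe$age > 23, "name"]  # "Bob" "Ram": rows filtered by a condition
dim(my_dataframe)                            # 3 3: rows, columns
```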

We have gone through R objects and data types. Since this article has already become lengthy, in the next article we shall learn more about manipulating these objects and performing arithmetic operations on them.

I hope you enjoyed reading this article. Please let me know what you thought of it in the response section. Keep following the series and come through R with flying colours!


SaiGayatri Vadali

An inquisitive Machine Learning Engineer, yoga trainer, fitness freak and a passionate writer!