GETTING STARTED WITH R: BASICS

Daniel Tope Omole
Published in Nerd For Tech
6 min read · Nov 27, 2022

The central aim of research is to generate new insights or reaffirm existing ones on a specific topic. Insights come from turning data into information, and the process of doing so is data analysis. Data analysis is a structured process that involves data extraction, cleaning, exploratory data analysis, validation, visualization and presentation. Depending on the requirements of the study and its methodology, these steps may be carried out in a single software package or across several.

In this article we will work with the R language, using RStudio as the integrated development environment. R was created specifically for statistical computing and graphics. It is a GNU project that is similar to the S language and environment that John Chambers and colleagues developed at Bell Laboratories (formerly AT&T, now Lucent Technologies). R offers a wide range of statistical and graphical techniques, including linear and nonlinear modelling, time-series analysis, classification and clustering, and it is highly extensible (R-project, 2022).

Installing R and RStudio:
Download the most recent version of R from this page. Select the installer or zip archive that is compatible with your operating system, then run the installer (unzipping it first if necessary).

After installing R, you will need to download RStudio, the most practical interactive environment for working with R. Here is the website address for RStudio. Select the open-source version, download it to your computer, and install it.

Once you have finished installing both programmes, keep your RStudio window open and follow along with me.

Let’s begin by discussing some important terms in R.

R objects:
Everything you work with in R is an object. Unlike a procedural language such as C, where a variable must be declared with a data type, an R variable is simply a name linked to an object, and it takes on the data type of that object. Vectors, lists, arrays, matrices, data frames, tibbles, and factors are a few of the most useful R objects.

Values are stored in objects using the assignment operators:

· The arrow (<-), formed by a less-than sign followed by a hyphen with no space between them.

· The equals sign (=).

These objects can then be used in other calculations. To print an object, simply type its name. There are some rules for naming an object:

· !, +, -, and # are examples of symbols that cannot be used in object names.

· Both a dot (.) and an underscore (_) are acceptable, as is a name that begins with a dot (provided the dot is not followed by a digit).

· Although they can end with a number, object names cannot begin with one.

· R is case-sensitive, so X and x, as well as temp and temP, are separate objects.
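
The naming rules above can be tried out directly in the console. The following is a minimal sketch; all object names are made up for illustration, and make.names() is a base R helper that shows how R would repair a name that breaks the rules.

```r
# Valid object names: dots, underscores, trailing digits, and a leading dot
my.value <- 1
my_value <- 2
value2   <- 3
.hidden  <- 4

# R is case-sensitive, so these are two separate objects
temp <- 10
temP <- 20
temp == temP
# [1] FALSE

# make.names() converts invalid names into valid ones:
# a leading digit gets an "X" prefix and a hyphen becomes a dot
fixed <- make.names(c("2value", "my-value", "ok_name"))
fixed
# [1] "X2value"  "my.value" "ok_name"
```

Typing an invalid name such as 2value <- 1 directly in the console produces a syntax error, which is why the sketch uses make.names() to demonstrate the rules instead.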

Variables:
Variables store data values. R has no command for declaring a variable: a variable is created the first time a value is assigned to it. Use the <- operator to assign a value to a variable, and simply type the variable name to output (or print) its value.

# Assigning a value to a variable

e <- 1
e
[1] 1

E <- 2
E
[1] 2

q <- "mike"
q
[1] "mike"

Data types:
Scalars, vectors (numerical, character, and logical), matrices, data frames, and lists are just a few of the numerous data types available in R.

Use the class() function to identify the data type of an object.

# Identifying the data type of an object

e <- 1
class(e)
[1] "numeric"

q <- "mike"
class(q)
[1] "character"

# Named "flag" rather than "c" to avoid confusion with the c() function
flag <- TRUE
class(flag)
[1] "logical"
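
Beyond the three classes shown above, a few more can be inspected the same way. This is a small sketch using only base R functions; the variable names are made up for illustration. It also shows typeof(), which reports how a value is stored internally, and the is.*() helpers that test for a given type.

```r
whole   <- 1L       # the L suffix creates an integer
decimal <- 1        # plain numbers are stored as doubles
cplx    <- 2 + 3i   # complex number

class(whole)     # "integer"
class(decimal)   # "numeric"
class(cplx)      # "complex"

# typeof() reports the internal storage type
typeof(decimal)  # "double"

# is.*() functions test whether an object is of a given type
is.numeric(whole)     # TRUE: integers count as numeric
is.character("mike")  # TRUE
```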

Vectors:
The most fundamental objects in R are vectors, because they make data simple to represent and understand. Vectors are created by combining multiple values with the c() function.

Values of different data types can also be combined, in which case the whole vector takes the most general type among its elements (logical values become numeric, and numeric values become character). In other words, R generates a vector with a mode that can accommodate all of the elements it contains. This conversion between storage modes is known as “coercion”. When R performs it automatically based on the vector’s content, it is referred to as “implicit coercion”.

# Creating vectors

# Using the c() function to create vectors
a <- c(1,2.1,5.3,6,-2,4)
class(a)
[1] "numeric"

b <- c("five","six","ten")
class(b)
[1] "character"

# Named "flags" rather than "c" to avoid confusion with the c() function
flags <- c(FALSE,TRUE,TRUE,FALSE,TRUE,FALSE)
class(flags)
[1] "logical"

d <- c(1,2,4,2,"one","two",3)
class(d)
[1] "character"

# Vectors can also be created using:
# the : operator, which creates a vector from a sequence of consecutive numbers

nums <- 3:13
nums
[1]  3  4  5  6  7  8  9 10 11 12 13

# the seq() function, which also creates a vector from a sequence of numbers

nums2 <- seq(1, 13)
nums2
[1]  1  2  3  4  5  6  7  8  9 10 11 12 13

nums3 <- seq(4, 20, by = 3)
nums3
[1]  4  7 10 13 16 19
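
Coercion, described earlier in this section, is easiest to understand by watching it happen. The sketch below uses made-up variable names and only base R functions. It shows implicit coercion when types are mixed, and explicit coercion with as.numeric().

```r
# Implicit coercion: logical values become numeric (TRUE = 1, FALSE = 0)
lgl_num <- c(TRUE, FALSE, 2.5)
lgl_num
# [1] 1.0 0.0 2.5
class(lgl_num)   # "numeric"

# Implicit coercion: numbers become character when mixed with text
num_chr <- c(1, "one")
num_chr
# [1] "1"   "one"
class(num_chr)   # "character"

# Explicit coercion: convert text that looks like numbers back to numeric
converted <- as.numeric(c("1", "2.5"))
converted
# [1] 1.0 2.5

# Text that is not a number becomes NA (R also issues a warning,
# which is suppressed here to keep the output clean)
failed <- suppressWarnings(as.numeric("one"))
is.na(failed)    # TRUE
```

Because TRUE counts as 1, sum() applied to a logical vector returns the number of TRUE values, which is a common use of this rule.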

Lists:
Lists are versatile objects that can hold elements of different types. They can contain numeric values, vectors, matrices, and even other lists.

# Creating a list
# Lists are created using the list() function
list1 <- list("mike",1,3,4,c(9,5,7),"one","two")
list1
[[1]]
[1] "mike"

[[2]]
[1] 1

[[3]]
[1] 3

[[4]]
[1] 4

[[5]]
[1] 9 5 7

[[6]]
[1] "one"

[[7]]
[1] "two"

Multiple vectors can be combined into a single list, and the elements of the list can be named. To access one of the named elements, use the “$” operator.

# Combining vectors into a list and naming its elements
named_list <- list(c(16,19,20,31,22,24), c("lagos","abuja","cape town","new york"))
names(named_list) <- c("age","cities")
named_list
$age
[1] 16 19 20 31 22 24

$cities
[1] "lagos" "abuja" "cape town" "new york"


# Selecting an element from the named list
print(named_list$age)
[1] 16 19 20 31 22 24
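
Besides “$”, list elements can also be reached with square brackets. The sketch below uses a hypothetical list called person to show the difference: double brackets return the element itself, while single brackets return a smaller list.

```r
# Names can be given directly when the list is created
person <- list(name = "mike", scores = c(9, 5, 7))

# $ and [[ ]] both return the element itself
person$scores
# [1] 9 5 7
person[["scores"]]
# [1] 9 5 7

# [ ] returns a list that contains the element
class(person["scores"])     # "list"
class(person[["scores"]])   # "numeric"

# Elements of a vector inside the list can be indexed further
second_score <- person$scores[2]
second_score
# [1] 5

# New elements can be added by assigning to a new name
person$city <- "lagos"
names(person)
# [1] "name"   "scores" "city"
```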

Matrices:
A matrix is a vector with two dimensions. All columns in a matrix must have the same mode (numeric, character, etc.) and the same length.
The general format is:

mymatrix <- matrix(vector, nrow = r, ncol = c, byrow = FALSE, dimnames = list(char_vector_rownames, char_vector_colnames))

# Creating a matrix
# Generate 3 x 5 numeric matrix
y <- matrix(1:15, nrow = 3, ncol = 5)
y
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    4    7   10   13
[2,]    2    5    8   11   14
[3,]    3    6    9   12   15

# Generate 2 x 2 matrix filled by row, with row and column names
cells <- c(13,61,54,28)
rnames <- c("A1", "A2")
cnames <- c("B1", "B2")
x <- matrix(cells, nrow=2, ncol=2, byrow=TRUE, dimnames=list(rnames, cnames))
x
   B1 B2
A1 13 61
A2 54 28
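
The effect of the byrow argument, and how to pull values out of a matrix, can be seen in a short sketch. The names m_col and m_row are made up for illustration.

```r
# By default a matrix is filled column by column
m_col <- matrix(1:6, nrow = 2)
m_col[1, ]   # first row
# [1] 1 3 5

# With byrow = TRUE it is filled row by row
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)
m_row[1, ]   # first row
# [1] 1 2 3

# dim() returns the number of rows and columns
dim(m_row)
# [1] 2 3

# Index with [row, column]; leave one side empty to take a whole row or column
m_row[2, 3]  # single value
# [1] 6
m_row[, 2]   # second column
# [1] 2 5
```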

Data Frames:
A data frame is more general than a matrix because different columns can have different modes (numeric, character, factor, etc.). It is an efficient way to structure data into rows and columns, similar to a two-dimensional array but with mandatory column headers and row identifiers.

Data frames have the following characteristics:
· Column names cannot be empty.
· Row identifiers should be unique.
· Each column must have the same number of data items.

# Creating a Data Frame

age <- c(15,19,17,13)
sex <- c("male", "female", "female","male")
member <- c(FALSE,TRUE,FALSE,FALSE)
mydata <- data.frame(age,sex,member)
names(mydata) <- c("Age","sex","member") # variable names

mydata
  Age    sex member
1  15   male  FALSE
2  19 female   TRUE
3  17 female  FALSE
4  13   male  FALSE
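
Once a data frame exists, a few base R functions are enough to inspect it and select parts of it. The sketch below rebuilds the same data under the hypothetical name people so that it runs on its own.

```r
# Column names can be set directly inside data.frame()
people <- data.frame(
  age    = c(15, 19, 17, 13),
  sex    = c("male", "female", "female", "male"),
  member = c(FALSE, TRUE, FALSE, FALSE)
)

# Number of rows and columns
dim(people)
# [1] 4 3

# $ extracts a single column as a vector
people$age
# [1] 15 19 17 13
avg_age <- mean(people$age)
avg_age
# [1] 16

# Rows can be filtered with a logical condition inside [rows, columns]
older <- people[people$age > 15, ]
nrow(older)
# [1] 2

# str() gives a compact summary of each column and its type
str(people)
```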

A data frame can also be created from a dataset stored in a file, such as a CSV file, using the read.csv() and read.table() functions.

read.csv():

This function reads a CSV-formatted file and returns a data frame. We will discuss it further at another time.

# First row contains variable names, comma is the separator
edustats <- read.csv(file = 'C:/Users/Omole Daniel Tope/Downloads/edustats.csv')
head(edustats)
                                         Indicator  Unit Subgroup    Area Area.ID Time.Period                                         Source Data.Value
1 Gender parity index in secondary level enrolment Index    Total Jamaica  LACJAM        2008 JAM-Ministry of Education_Statistics Unit_2008       0.98
2 Gender parity index in secondary level enrolment Index    Total Jamaica  LACJAM        2009 JAM-Ministry of Education_Statistics Unit_2009       1.00
3 Gender parity index in secondary level enrolment Index    Total Jamaica  LACJAM        2010 JAM-Ministry of Education_Statistics Unit_2010       1.03
4 Gender parity index in secondary level enrolment Index    Total Jamaica  LACJAM        2011 JAM-Ministry of Education_Statistics Unit_2011       1.06
5 Gender parity index in secondary level enrolment Index    Total Jamaica  LACJAM        2012 JAM-Ministry of Education_Statistics Unit_2012       1.04
6 Gender parity index in secondary level enrolment Index    Total Jamaica  LACJAM        2013 JAM-Ministry of Education_Statistics Unit_2013       1.05

# The View() function invokes a spreadsheet-style data viewer within RStudio. Note: Make sure you type a capital “V” when using this function
View(edustats)
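
If you do not have a CSV file at hand, the round trip can still be tried out. The sketch below writes a small, made-up data frame to a temporary file and reads it back. The column names and values are hypothetical (loosely modelled on the output above), and tmp_file is simply a path generated by tempfile().

```r
# Hypothetical data, written to a temporary file so the example is self-contained
sample_data <- data.frame(
  Area        = c("Jamaica", "Jamaica", "Jamaica"),
  Time.Period = c(2008, 2009, 2010),
  Data.Value  = c(0.98, 1.00, 1.03)
)
tmp_file <- tempfile(fileext = ".csv")
write.csv(sample_data, tmp_file, row.names = FALSE)

# header = TRUE (first row holds the column names) and sep = ","
# are the defaults of read.csv(); they are spelled out here for clarity
stats <- read.csv(tmp_file, header = TRUE, sep = ",")

head(stats, 2)   # first two rows
str(stats)       # column types chosen by read.csv()

# Remove the temporary file when done
unlink(tmp_file)
```

The same call works with your own file by replacing tmp_file with its path, written with forward slashes as in the example above.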

We have now covered the basic data types and objects in R. In the following article, we will learn more about manipulating and using them and about performing mathematical operations.

Thanks for taking the time to read my blog ❤️. You can reach out to me on LinkedIn

If you have any thoughts on the topic, please share them in the comments; I’m always looking to learn more and improve.

A data scientist with a background in healthcare. My expertise is in data analysis and machine learning, using tools like Python, R, STATA, and SQL to deliver insights.