# The Shortest Most Powerful Line of Code in Exploratory Data Analysis Is in R

Apr 4 · 7 min read

Coding is simply the arrangement of keywords into coherent statements that machines can understand. Without that understanding, we can’t instruct machines to run the analysis for us or produce the result we want to present. Newbies often get overwhelmed by the sheer number of functions they come across: a function may call many other functions within itself, and some even call functions defined elsewhere. How can newbies dissect these functions easily? What variables does a function work with, and what does one even use it for? Today I’d like to talk about a simple yet powerful function in R: the `str()` function.

First, let’s talk about `str()`. The idea behind this function is to display the internal structure of an object in a compact form, hence its name: `str` is short for structure. It’s a simple, versatile diagnostic tool that works with essentially any R object, functions included. Once called, it prints a compact summary of what the object contains, even if it is nested over several layers.
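As a small illustration of the nesting point, here is a minimal sketch. The `settings` list and its contents are invented for this example and are not part of any dataset; `capture.output()` is only used to keep the printed lines in a variable.

```r
# Hypothetical nested list, made up purely for illustration.
settings <- list(
  name   = "demo",
  limits = list(min = 0, max = 10),  # a list inside a list
  tags   = c("a", "b", "c")
)

# str() prints one line per element and indents the nested ones.
str(settings)

# Keep the printed lines as a character vector for inspection.
str_lines <- capture.output(str(settings))
```

The nested `limits` list shows up as its own indented block, which is what makes `str()` handy for objects you did not build yourself.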

Let’s see exactly what it does. To keep things simple in case you’re practicing along, let’s use a dataset that already ships with R. My dataset of choice is `infert`, because I’ve never explored it before. So…

```r
head(infert, 5)
##   education age parity induced case spontaneous stratum pooled.stratum
## 1    0-5yrs  26      6       1    1           2       1              3
## 2    0-5yrs  42      1       1    1           0       2              1
## 3    0-5yrs  39      6       2    1           0       3              4
## 4    0-5yrs  34      4       2    1           0       4              2
## 5   6-11yrs  35      3       1    1           1       5             32
```

`infert` is a dataset about infertility after spontaneous and induced abortion. If we want a snapshot of the data, we can check its structure…

```r
str(infert)
## 'data.frame':    248 obs. of  8 variables:
##  $ education     : Factor w/ 3 levels "0-5yrs","6-11yrs",..: 1 1 1 1 2 2 2 2 2 2 ...
##  $ age           : num  26 42 39 34 35 36 23 32 21 28 ...
##  $ parity        : num  6 1 6 4 3 4 1 2 1 2 ...
##  $ induced       : num  1 1 2 2 1 2 0 0 0 0 ...
##  $ case          : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ spontaneous   : num  2 0 0 0 1 1 0 0 1 0 ...
##  $ stratum       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ pooled.stratum: num  3 1 4 2 32 36 6 22 5 19 ...
```

…now we know that the dataset has 248 observations and 8 variables. With `str()` we can also see the name of each variable, its data type, and its first few values. Types that carry more detail get expanded: `education`, being a factor, is shown with its 3 levels, the first of which are listed, and the values that follow are the underlying integer codes. This gives us a quick sense of what the data looks like.
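If you want any of those facts as values you can reuse, rather than just reading them off the printout, a rough sketch using base R helpers could look like this. The variable names `dims`, `col_classes` and `edu_levels` are my own.

```r
# infert ships with base R (datasets package), so nothing needs to be loaded.
dims        <- dim(infert)               # number of rows and columns
col_classes <- sapply(infert, class)     # one class name per column
edu_levels  <- levels(infert$education)  # all factor levels, not only the first two

print(dims)
print(col_classes)
print(edu_levels)
```

`str()` stays the quickest way to look; these helpers are for when the next line of code needs the answer.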

`str()` also works on functions, showing a snapshot of the arguments a function accepts. Let’s try it on one.

```r
str(ls)
## function (name, pos = -1L, envir = as.environment(pos), all.names = FALSE, pattern, sorted = TRUE)
```

We can see that `ls` is a function, and its arguments and their default values are displayed. Reading function documentation is great, but many trips to the documentation can be avoided if we use `str()` more often.
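When you need the argument list as data instead of printed text, base R’s `formals()` is one option. This is a small sketch, with `arg_names` and `defaults` as names I picked for the example.

```r
# formals() returns the arguments of a closure as a named pairlist.
defaults  <- formals(ls)
arg_names <- names(defaults)

print(arg_names)

# Arguments without a default (such as `name` and `pattern`) hold an empty
# symbol; the others hold the default expression.
print(defaults$sorted)
print(defaults$all.names)
```

Note that `formals()` returns `NULL` for primitive functions such as `sum`, so `str()` remains the more forgiving first look.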

Let’s try it on nested data, such as a collection of data frames. We’ll use `EuStockMarkets` here.

```r
str(EuStockMarkets)
##  Time-Series [1:1860, 1:4] from 1991 to 1999: 1629 1614 1607 1621 1618 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:4] "DAX" "SMI" "CAC" "FTSE"
```

After checking the structure, we see it’s a time series with four columns and nothing nested inside, so it isn’t a great dataset to illustrate what I’m about to show you. Let’s return to our first dataset instead, because its data still makes sense after splitting.
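The same conclusion can be checked in code. This is a quick sketch using base R predicates; the variable names are mine.

```r
# EuStockMarkets ships with base R (datasets package).
is_nested   <- is.list(EuStockMarkets)    # a list would mean nested elements
is_mat      <- is.matrix(EuStockMarkets)  # a matrix means one flat block of numbers
ts_dims     <- dim(EuStockMarkets)        # rows (time points) and columns (indices)
index_names <- colnames(EuStockMarkets)   # the column names str() showed under "dimnames"

print(is_nested)
print(is_mat)
print(ts_dims)
print(index_names)
```

So the object is a numeric matrix with time-series attributes, not a container of smaller datasets.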

```r
Infertile <- split(infert, infert$education)
```

We’ve split our data frame into 3 data frames, all packed into a list named `Infertile`. Let’s look at the first few rows of each one.

```r
lapply(Infertile, head)
## $`0-5yrs`
##    education age parity induced case spontaneous stratum pooled.stratum
## 1     0-5yrs  26      6       1    1           2       1              3
## 2     0-5yrs  42      1       1    1           0       2              1
## 3     0-5yrs  39      6       2    1           0       3              4
## 4     0-5yrs  34      4       2    1           0       4              2
## 84    0-5yrs  26      6       2    0           0       1              3
## 85    0-5yrs  42      1       0    0           0       2              1
## 
## $`6-11yrs`
##    education age parity induced case spontaneous stratum pooled.stratum
## 5    6-11yrs  35      3       1    1           1       5             32
## 6    6-11yrs  36      4       2    1           1       6             36
## 7    6-11yrs  23      1       0    1           0       7              6
## 8    6-11yrs  32      2       0    1           0       8             22
## 9    6-11yrs  21      1       0    1           1       9              5
## 10   6-11yrs  28      2       0    1           0      10             19
## 
## $`12+ yrs`
##    education age parity induced case spontaneous stratum pooled.stratum
## 45   12+ yrs  30      1       0    1           0      45             44
## 46   12+ yrs  37      1       1    1           0      46             48
## 47   12+ yrs  28      2       0    1           2      47             51
## 48   12+ yrs  27      4       2    1           0      48             61
## 49   12+ yrs  26      2       2    1           0      49             49
## 50   12+ yrs  38      3       0    1           2      50             60
```

Then let’s go ahead and check the structure of `Infertile`.

```r
str(Infertile)
## List of 3
##  $ 0-5yrs :'data.frame': 12 obs. of  8 variables:
##   ..$ education     : Factor w/ 3 levels "0-5yrs","6-11yrs",..: 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ age           : num [1:12] 26 42 39 34 26 42 39 34 26 42 ...
##   ..$ parity        : num [1:12] 6 1 6 4 6 1 6 4 6 1 ...
##   ..$ induced       : num [1:12] 1 1 2 2 2 0 2 0 2 0 ...
##   ..$ case          : num [1:12] 1 1 1 1 0 0 0 0 0 0 ...
##   ..$ spontaneous   : num [1:12] 2 0 0 0 0 0 0 1 0 0 ...
##   ..$ stratum       : int [1:12] 1 2 3 4 1 2 3 4 1 2 ...
##   ..$ pooled.stratum: num [1:12] 3 1 4 2 3 1 4 2 3 1 ...
##  $ 6-11yrs:'data.frame': 120 obs. of  8 variables:
##   ..$ education     : Factor w/ 3 levels "0-5yrs","6-11yrs",..: 2 2 2 2 2 2 2 2 2 2 ...
##   ..$ age           : num [1:120] 35 36 23 32 21 28 29 37 31 29 ...
##   ..$ parity        : num [1:120] 3 4 1 2 1 2 2 4 1 3 ...
##   ..$ induced       : num [1:120] 1 2 0 0 0 0 1 2 1 2 ...
##   ..$ case          : num [1:120] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ spontaneous   : num [1:120] 1 1 0 0 1 0 0 1 0 0 ...
##   ..$ stratum       : int [1:120] 5 6 7 8 9 10 11 12 13 14 ...
##   ..$ pooled.stratum: num [1:120] 32 36 6 22 5 19 20 37 9 29 ...
##  $ 12+ yrs:'data.frame': 116 obs. of  8 variables:
##   ..$ education     : Factor w/ 3 levels "0-5yrs","6-11yrs",..: 3 3 3 3 3 3 3 3 3 3 ...
##   ..$ age           : num [1:116] 30 37 28 27 26 38 24 36 27 28 ...
##   ..$ parity        : num [1:116] 1 1 2 4 2 3 3 5 3 1 ...
##   ..$ induced       : num [1:116] 0 1 0 2 2 0 1 1 1 0 ...
##   ..$ case          : num [1:116] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ spontaneous   : num [1:116] 0 0 2 0 0 2 2 2 1 1 ...
##   ..$ stratum       : int [1:116] 45 46 47 48 49 50 51 52 53 54 ...
##   ..$ pooled.stratum: num [1:116] 44 48 51 61 49 60 56 62 57 42 ...
```

This lists the three data frames, commonly referred to as the elements of the list `Infertile`, and then details the structure of the variables in each one. We can see from the first line of each element that `0-5yrs` has 12 observations, `6-11yrs` has 120 observations, and `12+ yrs` has 116 observations. Do with that piece of info what you wish.
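If the full printout is more than you need, `str()` takes a `max.level` argument that limits how deep it descends. The sketch below repeats the split so it runs on its own, and `group_sizes` is a name I chose for the row counts.

```r
# Rebuild the list of data frames so this block is self-contained.
Infertile <- split(infert, infert$education)

# max.level = 1 stops at the top level: one summary line per data frame.
str(Infertile, max.level = 1)

# The same observation counts, as a named integer vector.
group_sizes <- vapply(Infertile, nrow, integer(1))
print(group_sizes)
```

For deeply nested objects this keeps the output to a handful of lines, and you can raise `max.level` step by step as you drill in.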

We can now see the power of `str()`: it gives an as-compact-as-it-can-get view of the data, so you can quickly understand what you have, what’s missing, and the next step to take in your EDA. So anytime you have an R object and you don’t know what is in it, I implore you to throw a `str()` at it.


## Geek Culture

A new tech publication by Start it up (https://medium.com/swlh).

Written by

## Okunola Musbaudeen

A solution driven data analyst, domain expertise cut across procurement planning, inventory management, Supply chain network design, and operations research.
