The Shortest Most Powerful Line of Code in Exploratory Data Analysis Is in R

Okunola Musbaudeen
Apr 4 · 7 min read

Coding is simply the arrangement of keywords into coherent statements that a machine can understand. Without that understanding, we cannot instruct the machine to run the analysis we need or to output the result we want to present. Newbies often get overwhelmed by the sheer number of functions they come across: functions that call many other functions internally, and some that call functions defined elsewhere. How can a newbie dissect these easily? What arguments does a function work with? What does one even use it for? Today I’d like to talk about a simple yet powerful function in R: str().

First, let’s talk about str() itself. The idea behind this function is to display the internal structure of an object in a compact form, hence its name: str(ucture). It’s a simple, versatile diagnostic tool that works on any R object, functions included. Once called, it prints a compact summary of what the object contains, even if the contents are nested several layers deep.
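To make the "nested over several layers" point concrete, here is a minimal sketch on a small made-up list. The object `experiment` and everything inside it are hypothetical names invented for illustration; `max.level` is a real argument of str() that limits how deep the display goes.

```r
# A small, made-up nested object (all names and values are hypothetical).
experiment <- list(
  id      = 42L,
  tags    = c("pilot", "2021"),
  results = list(
    scores = c(0.81, 0.77, 0.93),
    model  = list(name = "baseline", converged = TRUE)
  )
)

# Full structure: every nested level is expanded.
str(experiment)

# Top level only: nested lists are summarised instead of expanded.
str(experiment, max.level = 1)

# str() prints rather than returning its summary, so capture.output()
# is one way to keep the text as a character vector.
full_view <- capture.output(str(experiment))
top_level <- capture.output(str(experiment, max.level = 1))
```

Limiting the depth is handy when an object is large and you only want to know what its top-level pieces are.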

Let’s see exactly what it does. To keep things simple in case you’re practicing along, let’s use a dataset that already ships with R. My dataset of choice is infert, because I’ve never explored it before. So…

head(infert, 5)

infert is a dataset about infertility after spontaneous and induced abortion. If we want a snapshot of the data, we can check its structure…

str(infert)

…now we know that the dataset has 248 observations and 8 variables. str() also shows the name of each variable, its data type, and its first few values. Types that carry more detail get expanded: education is a factor, so we see that it has 3 levels, and the first of those levels are listed. This gives us a quick sense of what the data looks like.
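As a rough cross-check, the same facts that str() prints can be pulled out one at a time with other base R functions. This is only a sketch; `vec.len` and `give.attr` are documented arguments of str(), and the particular values chosen here are arbitrary.

```r
# infert ships with base R (datasets package), so nothing needs installing.
str(infert)

# The same facts, one at a time:
dim(infert)                # number of rows and columns
sapply(infert, class)      # data type of each column
levels(infert$education)   # all factor levels, not just the first few

# str() has arguments that control how much it prints.
# vec.len scales how many leading values are shown per column;
# give.attr = FALSE suppresses the display of attributes.
str(infert, vec.len = 2, give.attr = FALSE)
```

str() is the quickest of these for a first look; the individual calls are useful once you need the values in code rather than on screen.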

str() also works on functions, showing a snapshot of the arguments the function accepts. Let’s try it on one.

str(ls)

We can see that ls is a function, and its arguments are displayed along with their default values. Reading function documentation is great, but many trips to the documentation can be avoided if we use str() more often.
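For comparison, base R has a couple of relatives that expose the same signature information. The sketch below puts them next to str(); which one to reach for is a matter of taste.

```r
# str() prints the function's signature in one compact line.
str(ls)

# args() also prints the signature.
args(ls)

# formals() returns the arguments as a list, so they can be used in code.
arg_names <- names(formals(ls))
arg_names

# Default value of a single argument, here all.names.
formals(ls)$all.names
```

formals() is the one to use when the argument names or defaults are needed programmatically rather than just read on screen.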

Let’s try it on a nested object, a collection of data frames. We’ll try EuStockMarkets first.

str(EuStockMarkets)

After checking the structure, we see it’s a time series object rather than a set of data frames, so it isn’t a great dataset to illustrate what I’m about to show you. Let’s return to our first dataset, because it still makes sense after splitting.
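What str() reports about EuStockMarkets can be confirmed with a few direct checks. This is a minimal sketch using only base R functions.

```r
# EuStockMarkets ships with base R (datasets package).
str(EuStockMarkets)

# Confirm what str() reported:
class(EuStockMarkets)          # includes "ts" for time series objects
is.ts(EuStockMarkets)          # is it a time series?
is.data.frame(EuStockMarkets)  # is it a data frame?
dim(EuStockMarkets)            # a matrix underneath: observations x series
colnames(EuStockMarkets)       # one column per stock index
```

A multivariate time series is a single matrix with time attributes attached, which is why it cannot stand in for a collection of data frames.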

Infertile <- split(infert, infert$education)

We’ve split our data frame into 3 data frames, one per education level, all packed into a list named Infertile. Let’s look at the first rows of each one.

lapply(Infertile, head)

Then let’s go ahead and check the structure of Infertile.

str(Infertile)

This lists the data frames, then details the structure of each one. Each data frame is an element of the list Infertile. We can see from the first line of each element that 0-5yrs has 12 observations, 6-11yrs has 120 observations, and 12+ yrs has 116 observations. Do with that piece of info what you wish.
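The observation counts above can be double-checked in code. The sketch below rebuilds Infertile so that it runs on its own, then uses the `max.level` argument of str() to print only the top-level summary of each element.

```r
# Rebuild the split so this block is self-contained.
Infertile <- split(infert, infert$education)

class(Infertile)    # split() on a data frame returns a list
names(Infertile)    # one element per education level

# Number of rows in each element.
rows_per_level <- sapply(Infertile, nrow)
rows_per_level

# One summary line per data frame, without expanding its columns.
str(Infertile, max.level = 1)
```

With larger lists, max.level = 1 keeps the output short enough to scan before deciding which element to inspect in full.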

We can now see the power of str(): it gives an as-compact-as-it-can-get view of an object, so you can quickly understand what it contains, spot what’s missing, and decide on the next step in your EDA. So anytime you have an R object and you don’t know what is in it, I implore you to throw a str() at it.

