Using .loc

Angie Rincon
5 min read · Sep 20, 2021


I found myself pretty deep into studying data science and still confused about how and when to use .loc. This gap in my knowledge became an annoyance while I was preprocessing a very large and confusing dataset for a group project. It was a real stumbling block during EDA (exploratory data analysis) and while working out how to deal with null values. I wanted to close this gap and learn more! It turns out .loc is a very useful tool, and it is actually fun and not very difficult!

.loc lets you pick out particular rows and columns of your dataset by their labels. It can be used in endless ways and is a fundamental tool to understand and be comfortable with. The basic layout for .loc is:

dataframe.loc[row:row, column_name:column_name]

Both ends of a row range or column range are inclusive.

So, dataframe.loc[1:3, 'column1':'column3'] would include rows 1 through 3 and columns column1, column2, and column3 (assuming the columns sit next to each other in that order).

Rows are called by their index and columns are called by their names.
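To make the inclusive slicing concrete, here is a minimal sketch on a tiny made-up dataframe. The column names column1 through column4 and the values are placeholders invented for illustration, not part of the Titanic data.

```python
import pandas as pd

# Made-up data purely for illustration; the column names are placeholders.
df = pd.DataFrame(
    {
        "column1": [10, 11, 12, 13, 14],
        "column2": [20, 21, 22, 23, 24],
        "column3": [30, 31, 32, 33, 34],
        "column4": [40, 41, 42, 43, 44],
    }
)

# Rows labelled 1 through 3 and columns column1 through column3.
# Both endpoints of each slice are included.
subset = df.loc[1:3, "column1":"column3"]
print(subset)
```

The result has three rows (1, 2, 3) and three columns; row 0, row 4, and column4 are left out.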

To play around and practice, I imported pandas and downloaded the Titanic dataset from Kaggle.

I took a look at the basic information about the dataset.

.info() shows us the size of our dataframe (887 rows, 8 columns), our column names, our datatypes, and our non-null counts, which tell us there are no null values!
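A first look like this can be sketched as follows. Since the Kaggle file is not bundled here, the snippet builds a few invented stand-in rows using some of the column names mentioned in this post; the file name in the comment is a hypothetical placeholder.

```python
import pandas as pd

# Stand-in rows invented for illustration. With the real file you would
# load it instead, e.g. df = pd.read_csv("titanic.csv") (hypothetical path).
df = pd.DataFrame(
    {
        "Survived": [0, 1, 1, 0],
        "Pclass": [3, 1, 3, 2],
        "Name": ["Mr. A Example", "Mrs. B Example", "Miss. C Example", "Mr. D Example"],
        "Sex": ["male", "female", "female", "male"],
        "Age": [22.0, 38.0, 4.0, 35.0],
        "Fare": [7.25, 71.28, 16.70, 13.00],
    }
)

df.info()                      # prints column names, dtypes, and non-null counts
n_rows, n_cols = df.shape      # (rows, columns) as a tuple
null_counts = df.isna().sum()  # number of nulls in each column
print(n_rows, n_cols)
print(null_counts)
```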

After this initial look I decided to use my new knowledge to answer some questions about our Titanic dataset.

Basic .loc exploration:

  1. What does row 42 look like?
  2. What if we just want to know if the person in row 42 survived?
  3. What if we just want to know their age?
  4. How do we see their age and sex at the same time?
  5. What do rows 10–20 look like?
  6. What about just name, sex, and age for rows 10–20?

Deeper Exploration:

  1. How many children between the ages of 1 and 5 died?
  2. How many men on board had 2 or more siblings or spouses with them?
  3. How many children in Pclass 3 died?
  4. What percentage of married women survived?
  5. What is the average fare paid by Pclass 1?

Basic .loc exploration:

What does row 42 look like?

We give the index of the row that we want to see, and then : means we want to see all columns.
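A minimal sketch of this lookup, using three invented rows whose index happens to include 42 (the values are not the real passenger's):

```python
import pandas as pd

# Invented rows for illustration; only the index label 42 matters here.
df = pd.DataFrame(
    {
        "Survived": [0, 1, 0],
        "Pclass": [3, 2, 1],
        "Name": ["Mr. A Example", "Mrs. B Example", "Mr. C Example"],
        "Sex": ["male", "female", "male"],
        "Age": [19.0, 28.0, 54.0],
        "Fare": [8.05, 26.0, 51.86],
    },
    index=[41, 42, 43],
)

# Row labelled 42, all columns. The result is a Series with one entry per column.
row = df.loc[42, :]
print(row)
```

Writing df.loc[42] without the colon should return the same thing; spelling out the : just makes the "all columns" part explicit.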

What if we just want to know if the person in row 42 survived?

We indicate the row that we would like to see and the column we want. Our dataset tells us that 0 = did not survive and 1 = did survive, so our person in row 42 did survive!

What if we just want to know their age?
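Both single-value questions (survival and age) follow the same row-then-column pattern. Here is a small sketch on invented rows; the values are placeholders, not the real passenger's data.

```python
import pandas as pd

# Invented rows for illustration; only the index label 42 matters here.
df = pd.DataFrame(
    {
        "Survived": [0, 1, 0],
        "Name": ["Mr. A Example", "Mrs. B Example", "Mr. C Example"],
        "Sex": ["male", "female", "male"],
        "Age": [19.0, 28.0, 54.0],
    },
    index=[41, 42, 43],
)

# One row label plus one column name returns a single value.
survived = df.loc[42, "Survived"]
age = df.loc[42, "Age"]
print(survived, age)
```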

How do we see their age and sex at the same time?

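We put the two column names in a list, separated by a comma, to indicate that we only want to see these two columns. If they were separated by : instead, .loc would treat them as a range and return every column between the two, in the order the columns appear in the dataframe.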
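The difference between a list of columns and a column slice is easy to see side by side. This sketch uses invented rows and assumes the columns are ordered Name, Sex, Age, as they appear to be in this dataset.

```python
import pandas as pd

# Invented rows for illustration; column order is assumed to be Name, Sex, Age.
df = pd.DataFrame(
    {
        "Survived": [0, 1, 0],
        "Name": ["Mr. A Example", "Mrs. B Example", "Mr. C Example"],
        "Sex": ["male", "female", "male"],
        "Age": [19.0, 28.0, 54.0],
    },
    index=[41, 42, 43],
)

# A list picks exactly the named columns, in the order you list them.
pair = df.loc[42, ["Age", "Sex"]]

# A slice picks every column from the first name through the second,
# following the dataframe's own column order.
span = df.loc[42, "Name":"Age"]

print(pair)
print(span)
```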

What about just name, sex, and age for rows 10–20?

The : returns all columns from Name through Age. In this case that is Name, Sex, and Age. Remember that both ends of the range are included.
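Row ranges and column ranges can be combined in one call. The sketch below generates 25 placeholder passengers (all values invented) so that rows 10 through 20 exist.

```python
import pandas as pd

# 25 generated placeholder rows; none of these values come from the real data.
n = 25
df = pd.DataFrame(
    {
        "Survived": [i % 2 for i in range(n)],
        "Name": [f"Passenger {i}" for i in range(n)],
        "Sex": ["male" if i % 3 else "female" for i in range(n)],
        "Age": [20.0 + i for i in range(n)],
        "Fare": [7.0 + i for i in range(n)],
    }
)

rows_only = df.loc[10:20]                 # rows 10 through 20, every column
name_to_age = df.loc[10:20, "Name":"Age"]  # same rows, only Name, Sex, Age
print(name_to_age)
```

Because both ends are included, rows 10 through 20 come back as 11 rows, not 10.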

Deeper .loc Exploration:

This is where I thought this tool became really fun! In my group project we used things like the following examples to build functions for preprocessing our data.

I find it easiest to break the question down into individual steps. I ran the code after each addition to make sure it was returning what I wanted.

How many children between the ages of 1 and 5 died?

  1. Limit the age data to greater than 1 (df.Age > 1)
  2. Limit the age data to less than 5 (df.Age < 5)
  3. Limit to people who did not survive (df.Survived == 0)

12 children between the ages of 1 and 5 died.
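The three steps above can be combined into one boolean mask and passed to .loc. This is a sketch of one way to write it, not necessarily the exact code behind the number above, and it runs on a handful of invented rows, so its count differs from the real dataset's.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame(
    {
        "Age": [0.5, 2.0, 3.0, 4.0, 5.0, 8.0, 30.0, 2.0],
        "Survived": [0, 0, 1, 0, 0, 0, 0, 0],
    }
)

# Each condition needs its own parentheses because & binds more tightly
# than the comparison operators.
mask = (df.Age > 1) & (df.Age < 5) & (df.Survived == 0)

matches = df.loc[mask]  # only the rows where all three conditions hold
count = len(matches)
print(count)
```

Note that the strict > and < leave out passengers who are exactly 1 or exactly 5; switching to >= and <= would include them.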

How many men on board had 2 or more siblings or spouses with them?

  1. Limit to men (df.Sex == 'male')
  2. Limit siblings and spouses to greater than 1 (df['Siblings/Spouses Aboard'] > 1)

40 men on board had 2 or more siblings or spouses on board with them.
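Here is a sketch of the two conditions combined, again on invented rows, so the count is not the real one. The column name contains a slash and spaces, so it has to be accessed with square brackets rather than dot notation.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame(
    {
        "Sex": ["male", "male", "female", "male", "male"],
        "Siblings/Spouses Aboard": [0, 2, 3, 1, 4],
    }
)

is_male = df.Sex == "male"
two_or_more = df["Siblings/Spouses Aboard"] > 1  # same as >= 2 for whole numbers

men_with_family = df.loc[is_male & two_or_more]
count = len(men_with_family)
print(count)
```

Giving each condition its own name is optional, but it can make a long filter easier to read and to check one piece at a time.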

How many children in Pclass 3 died?

  1. Limit to passengers under 18 years of age (df.Age < 18)
  2. Limit to Pclass 3 (df.Pclass == 3)
  3. Limit to those who did not survive (df.Survived == 0)

62 children in Pclass 3 died.
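The same pattern works with three conditions. In this sketch (invented rows, so the count differs from the real one) the mask itself is summed, which counts the True values without building the filtered dataframe first.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame(
    {
        "Age": [4.0, 17.0, 18.0, 10.0, 40.0, 16.0],
        "Pclass": [3, 3, 3, 1, 3, 3],
        "Survived": [0, 0, 0, 0, 0, 1],
    }
)

mask = (df.Age < 18) & (df.Pclass == 3) & (df.Survived == 0)

children_lost = df.loc[mask]  # the matching rows, if you want to inspect them
count = int(mask.sum())       # True counts as 1, so the sum is the row count
print(count)
```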

What percentage of married women survived?

Note: I did not do a deep exploration to make sure that all married women were listed with the ‘Mrs’ prefix. Illustration purposes only here.

  1. Limit to passengers with the prefix 'Mrs' (df.Name.str.contains('Mrs'))
  2. Indicate that you want to look at the Survived column (, 'Survived')
  3. Get value counts
  4. Do a bit of math

79.2% of married women survived.
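The four steps above might look like this in code. It is a sketch on invented names, so the percentage it prints is not the real one, and the exact "bit of math" used for the figure above is an assumption on my part.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame(
    {
        "Name": [
            "Mrs. A Example",
            "Mr. B Example",
            "Mrs. C Example",
            "Miss. D Example",
            "Mrs. E Example",
            "Mrs. F Example",
        ],
        "Survived": [1, 0, 1, 1, 0, 1],
    }
)

# Steps 1 and 2: rows whose name contains 'Mrs', Survived column only.
mrs_survived = df.loc[df.Name.str.contains("Mrs"), "Survived"]

# Step 3: how many 0s and 1s there are.
counts = mrs_survived.value_counts()

# Step 4: survivors divided by everyone with the prefix, as a percentage.
pct_survived = counts.loc[1] / counts.sum() * 100
print(pct_survived)
```

Since Survived only holds 0s and 1s, mrs_survived.mean() * 100 should give the same percentage in one step.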

What is the average fare paid by Pclass 1?

  1. Limit to Pclass 1 (df.Pclass == 1)
  2. Indicate that you want to look at the Fare
  3. Get the mean

Pclass 1 passengers paid, on average, $84.15 for their passage.
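As a sketch, the three steps fit on one line: a row condition, a column name, and .mean() on the result. The fares below are invented, so the average is not the real one.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame(
    {
        "Pclass": [1, 3, 1, 2, 1],
        "Fare": [80.0, 7.25, 100.0, 13.0, 60.0],
    }
)

# Fare column for Pclass 1 rows only, then the mean of those fares.
first_class_fares = df.loc[df.Pclass == 1, "Fare"]
average_fare = first_class_fares.mean()
print(average_fare)
```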

.loc is a powerful tool for a data scientist! I found it quite fun and rewarding, and I'm looking forward to using it more.

Some resources:

Pandas documentation for .loc can be found here.

A very informative youtube video from Data School regarding .loc.
