Introduction to Pandas in Python — Guide for iloc and loc

The correct way to select data in pandas, with example Python code!

Shuhan Lu
The Startup
6 min read · Feb 6, 2020

by GagoDesign, via shutterstock

This post is the first in a series offering a thorough introduction to the pandas package in Python. It covers the following topics.

  1. Basic introduction and data exploration methods for pandas.
  2. How to select a single value in pandas?
  3. Summary and comparison between iloc[] and loc[].

What is Pandas?

Pandas is a Python package for data manipulation and data analysis. If you do anything related to data in Python, pandas is likely one of the packages you will use.

The official pandas documentation lists all the functions and explains how to use them.

Introduction and Basic Functions for Pandas

The basic structure for storing data in pandas is the DataFrame. It is like an Excel sheet, with columns and rows. Each column has a column name and each row has a row index. Most pandas functions operate on DataFrames.

There are several ways to get a general idea of what a dataset looks like.

You can check the full code on my GitHub.

DataFrame.head(n) lets you examine the first n rows of the dataset (the default is 5 rows).

First row of the dataset

DataFrame.tail(n) lets you examine the last n rows of the dataset (the default is 5 rows).

Last row of the dataset
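
As a minimal sketch of head() and tail(), here is a small made-up DataFrame. The column names and values are assumptions for illustration only, not the dataset used in the original screenshots.

```python
import pandas as pd

# Hypothetical toy dataset (not the one from the original article).
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara", "Dan", "Eve", "Finn", "Gus"],
    "age": [28, 35, 41, 23, 30, 52, 19],
})

first_five = df.head()    # no argument: first 5 rows
first_two = df.head(2)    # first 2 rows
last_five = df.tail()     # no argument: last 5 rows
last_two = df.tail(2)     # last 2 rows

print(first_two)
print(last_two)
```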

DataFrame.info() gives you an overview of the dataset and of each column, including the value types, non-null counts, etc.

Basic information about dataset and columns

DataFrame.describe() allows you to get a statistical summary for numerical columns.

Statistical summary for numerical columns
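
A short sketch of info() and describe() on a made-up DataFrame follows. The columns "name", "age" and "height_cm" are hypothetical and only serve the example.

```python
import pandas as pd

# Hypothetical toy dataset (not the one from the original article).
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara", "Dan"],
    "age": [28, 35, 41, 23],
    "height_cm": [165.0, 180.5, 172.0, 177.5],
})

# info() prints column names, dtypes and non-null counts; it returns None.
df.info()

# describe() summarises the numerical columns by default
# (count, mean, std, min, quartiles, max).
summary = df.describe()
print(summary)
```

Note that the text column "name" does not appear in the describe() result, since only numerical columns are summarised by default.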

DataFrame.columns allows you to examine the columns of the dataset.

Columns of the dataset

DataFrame.index allows you to examine the row indices of the dataset.

Row indices of the dataset
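
The two attributes above can be sketched like this, again on a hypothetical DataFrame with the default row index.

```python
import pandas as pd

# Hypothetical toy dataset (not the one from the original article).
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara"],
    "age": [28, 35, 41],
})

print(df.columns)  # the column labels
print(df.index)    # the row labels (a RangeIndex when no index is given)

# Both can be turned into plain Python lists.
column_names = list(df.columns)
row_labels = list(df.index)
```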

How to select a single value using pandas?

Some people may ask: how hard can it be to select a single data point from a DataFrame? Well, at least in Python, it can be quite tricky.

You can select data by its position in the DataFrame, by its labels (column names or row indices), or by using boolean values to select a set of data.

Select a single data point by its position

In pandas, DataFrame.iloc[] can be used to select data by its position.

iloc[] takes two arguments (separated by a comma), one for the rows and one for the columns, both given as integer positions. Each argument can be a single integer, a range index like 1:3, or a list of integers like [1, 3, 5].

First three rows of the dataset

Note that positions in iloc[] start from 0, and a range index includes the start index but excludes the end index. So in iloc[], 1:3 selects only the second and third rows or columns.

Example of using range index

The range index 1:2 includes only the start index 1 and excludes the end index 2.
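
To make the range behaviour concrete, here is a minimal sketch on a made-up DataFrame. The data and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy dataset (not the one from the original article).
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara", "Dan", "Eve"],
    "age": [28, 35, 41, 23, 30],
    "city": ["Austin", "Boston", "Chicago", "Denver", "El Paso"],
})

first_three = df.iloc[0:3]   # positions 0, 1, 2 (end position 3 is excluded)
rows_1_2 = df.iloc[1:3]      # positions 1 and 2: the second and third rows
one_row = df.iloc[1:2]       # position 1 only, still returned as a DataFrame
block = df.iloc[1:3, 0:2]    # rows 1-2 and columns 0-1

print(block)
```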

Example of using a single integer
Example of using a list of integers

Note that although the resulting values are the same, their types differ (as you can see, the second result looks a little different). Use type(object) in Python to check the type of an object.
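
The type difference can be sketched as follows on a hypothetical DataFrame: a single integer returns the value itself, while a list of integers keeps the pandas container around it.

```python
import pandas as pd

# Hypothetical toy dataset (not the one from the original article).
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara"],
    "age": [28, 35, 41],
})

value = df.iloc[1, 1]          # single integers: the value itself
as_series = df.iloc[[1], 1]    # list for the rows: a Series with one element
as_frame = df.iloc[[1], [1]]   # lists for both: a 1x1 DataFrame

print(type(value))
print(type(as_series))
print(type(as_frame))
```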

Select a single data point by its labels (column names & row indices)

In pandas, DataFrame.loc[] can be used to select data by its labels.

loc[] takes two arguments (separated by a comma). The first is the row index label and the second is the column name.

Example of using a single label

Note that the default row indices are sequential numbers, but keep in mind that even when you pass numbers to loc[], they are matched against the row index labels, not the positions!

When the row index is default value
When the row index is not default value
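
Here is a minimal sketch of the difference, using a made-up DataFrame and a hypothetical custom index of 10, 20, 30.

```python
import pandas as pd

# Hypothetical toy dataset (not the one from the original article).
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara"],
    "age": [28, 35, 41],
})

# Default index: the labels are 0, 1, 2, so label 1 is also position 1.
default_value = df.loc[1, "name"]

# Custom index: the labels are now 10, 20, 30.
relabelled = df.copy()
relabelled.index = [10, 20, 30]
custom_value = relabelled.loc[20, "name"]   # label 20 is the second row

# Label 1 no longer exists, so loc[] raises a KeyError.
try:
    relabelled.loc[1, "name"]
    label_1_exists = True
except KeyError:
    label_1_exists = False

# iloc[] still works by position, whatever the labels are.
by_position = relabelled.iloc[1, 0]
```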

Also, for loc[], a range index of labels includes both the start label and the end label.

Example of using range index of labels
Example of using a list of labels
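
A short sketch of label ranges and label lists follows. The row labels "a" to "d" and the column names are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical toy dataset with string row labels.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cara", "Dan"],
        "age": [28, 35, 41, 23],
        "city": ["Austin", "Boston", "Chicago", "Denver"],
    },
    index=["a", "b", "c", "d"],
)

# Range of labels: both the start label and the end label are included.
by_range = df.loc["b":"c", "name":"age"]

# List of labels: exactly the listed rows and columns are returned.
by_list = df.loc[["a", "d"], ["name", "city"]]

print(by_range)
print(by_list)
```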

Select a single data point by a mix of position & labels

Chaining loc[] and iloc[] (DataFrame.loc[].iloc[], or vice versa) enables us to select data points by a mix of positions & labels.

Example of using a mix of labels and integers

Just keep in mind that iloc[] receives integers as input and loc[] receives labels as input.
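
The mixed selection can be sketched as below on a hypothetical DataFrame. The last line is an alternative that is not from the original article: it assumes you prefer a single iloc[] call and converts the column label to a position with Index.get_loc().

```python
import pandas as pd

# Hypothetical toy dataset with string row labels.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cara"],
        "age": [28, 35, 41],
    },
    index=["a", "b", "c"],
)

# Column by label, then row by position.
label_then_position = df.loc[:, "age"].iloc[1]

# Row by position, then column by label.
position_then_label = df.iloc[1].loc["age"]

# Alternative (an assumption, not the article's method): one iloc[] call,
# with the column label translated to its integer position first.
single_call = df.iloc[1, df.columns.get_loc("age")]
```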

Try NOT to do this!

Some of you may find that DataFrame[column_name][row_number] can also be used to select data.

This is called chained indexing. Try not to use it to select data points, even though it works. The official pandas documentation explains why it is not recommended.

If you don’t want to go through the documentation, just remember the methods I provided above using loc[] and iloc[]. They should be enough for selecting a single data point.
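
As a rough sketch of the difference, the hypothetical example below reads the same value both ways, then updates it with a single loc[] call, which is the form to prefer.

```python
import pandas as pd

# Hypothetical toy dataset (not the one from the original article).
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara"],
    "age": [28, 35, 41],
})

# Chained indexing: two separate lookups, first the column, then the row.
chained = df["age"][1]

# Preferred: one loc[] call with the row label and the column name.
direct = df.loc[1, "age"]

# For assignment, a single loc[] call updates the DataFrame itself.
# Assigning through chained indexing may act on a temporary copy instead,
# so the original DataFrame might not change.
df.loc[1, "age"] = 36
updated = df.loc[1, "age"]
```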

Summary & Comparison between “loc” and “iloc”

Differences between “loc” and “iloc”

loc[] takes labels as input (column names and row indices), while iloc[] takes integers as input (the sequential positions of columns and rows).

The range index in loc[] includes both the start and end indices, while the range index in iloc[] includes the start index and excludes the end index.

Similarities between “loc” and “iloc”

Both loc[] and iloc[] can take boolean values (covered in my next article) as input.

Both loc[] and iloc[] receive two parameters, the first one indicates the rows and the second one indicates the columns.

Both loc[] and iloc[] contain loc. LOL.

Summary

To sum up, loc[] and iloc[] can both select data points from a DataFrame. Choose between them based on whether you are working with labels or with positions.

I will cover more tips and functions for data manipulation and data analysis with pandas. Please feel free to reach out if you would like to discuss any related topics!

Also, check out my next post, Introduction to Pandas in Python — Selecting data and handling missing values, for a further guide to selecting multiple values and dealing with missing data in pandas.

Shuhan Lu
The Startup

MSBA student in San Francisco seeking full-time analytics positions. Learn more about me on my website: shuhanlu.net