Slicing Hierarchical Indexes

Jimmy Aidoo
Analytics Vidhya
Published in
5 min readMay 5, 2020

In the previous tutorial, we learnt some useful tools on how to get a more nuanced data analysis done on some datasets particularly if they are presented in higher dimensions. It is what we call Hierarchical Indexing or MultiIndexing. A DataFrame is said to be hierarchically indexed when it has more than one index column or higher dimensions. You can refresh your memory here just so you get a sense of where we are and what it is I will be talking about in this tutorial.

I will be getting into some useful analysis that can be done on MultiIndexed DataFrames. I will be using the same Fifa19 dataframe I used in the last post.

In [1]: import pandas as pd
In [2]: data = {'Position': ['GK','GK','GK', 'DF','DF','DF',
'MF','MF','MF','CF','CF','CF'],
'Name': ['De Gea', 'Coutois','Allison','Van Dijk',
'Ramos','Godin','Hazard','Kante','De Bruyne', 'Ronaldo'
'Messi','Neymar'],
'Overall': ['91','88','89','89','91','90','91',
'90','92','94','93','92'],
'Rank': ['1st','3rd','2nd','3rd','1st','2nd',
'2nd','3rd','1st','1st','2nd','3rd']}
In [3]: fifa19 = pd.DataFrame(data, columns=['Position','Name',
'Overall','Rank'])
In [4]: fifa19
Out[4]:
Position Name Overall Rank
0 GK De Gea 91 1st
1 GK Coutios 88 3rd
2 GK Allison 89 2nd
3 DF Van Dijk 89 3rd
4 DF Ramos 91 1st
5 DF Godin 90 2nd
6 MF Hazard 91 2nd
7 MF Kante 90 3rd
8 MF De Bruyne 92 1st
9 CF Ronaldo 94 1st
10 CF Messi 93 2nd
11 CF Neymar 92 3rd

This Dataframe is from the player ratings of a few selected players fromEA Sport's FIFA19 dataset. The dataframe is not currently presented in the format we like it so I will quickly jump a few steps and get it Multiindexed and sorted. If you miss any step, you can refer to the previous post here.

In [5]: fifa19.set_index(['Position','Rank']).sort_index()
Out[5]: Name Overall
Position Rank
CF 1st Ronaldo 94
2nd Messi 93
3rd Neymar 92
DF 1st Ramos 91
2nd Godin 90
3rd Van Dijk 89
GK 1st De Gea 91
2nd Coutois 89
3rd Allison 88
MF 1st De Bruyne 92
2nd Hazard 91
3rd Kante 90

Our data is now sorted hierarchical manner and ready to be analysed.

Sorting a hierarchically indexed dataframe is useful slicing. For instance, if we want to find out who is the best Centre-Forward among these lot according to the FIFA19 dataset, we will call the .locaccessor and pass a tuple ('CF' ,'1st') to extract the highest-ranked Forward in the DataFrame.

In [6]: fifa19.loc[('CF', '1st')]
Out[6]:
Name Ronaldo
Overall 94
Name: (CF, 1st), dtype: object

However, if we follow the tuple with a specific column name, ie, either ‘Name’ or ‘Overall’, we will extract a single element of the table.

In [7]: fifa19.loc[('CF', '1st'), 'Name']
Out[7]:
'Ronaldo'

Next, if we happen to call the .loc accessor with a single string, let’s say ‘GK’ alone, Pandas will then slice the outer index and return all the rows corresponding to string sliced, ie, ‘GK’.

In [8]: fifa19.loc['GK']
Out[8]: Name Overall
Rank
1st De Gea 91
2nd Alisson 89
3rd Coutois 88

Another thing we can do is to extract the slice using a list of indexes. We do this by passing a list of two or more index strings after calling the .locaccessor. Pandas then returns all rows of those indexes.

In [9]: fifa19.loc[['GK', 'MF']]
Out[9]: Name Overall
Position Rank
GK 1st De Gea 91
3rd Coutois 88
2nd Alisson 89
MF 2nd Hazard 91
3rd Kante 90
1st De Bruyne 92

There are instances where we need to find out, for instance, only the highest-ranked players in some specific positions. Pandas makes it possible to do that using tuple consisting of a list and the inner index of choice. What I am saying is, if we are required to slice, let’s say, the 3rd best player amongst the Goalkeepers (‘GK’) and the Midfielders (‘MF’), we will pass a tuple after calling the .loc accessor and in the tuple, pass a list comprising ‘GK’ and ‘MF’ only and then ‘3rd’ after a comma. We will then add a colon at the very end of the .loc call to indicate that we want all the rows for those indexes returned.

Let’s see

In [10]: fifa19.loc[(['GK' ,'MF'], '3rd'), :]
Out[10]: Name Overall
Postion Rank
GK 3rd Coutois 88
MF 3rd Kante 90

In the event, we are only interested in a specific column rather than the entire row of the selected Positions, we call that specific column instead of the colon :

In [10]: fifa19.loc[(['GK' ,'MF'], '3rd'), 'Name']
Out[10]:
Postion Rank
GK 3rd Coutois
MF 3rd Kante
Name: Name, dtype: object

This fanciful way of slicing indexes is also applicable for the inner levels. Looking up data based on inner indexes could be tricky at times. You could run into problems if you pay attention to detail. The syntax is quite similar to the outer indexing. That is, if we want to see the Best and Worst ranked Centre-Forward in the DataFrame, we could do that by calling ‘CF’ followed by a list of ‘1st’ and ‘3rd’ and then a colon : to return all columns of those rows.

In [11]: fifa19.loc[('CF', ['1st' , '3rd']),:]
Out[11]: Name Overall
Postion Rank
CF 1st Ronaldo 94
3rd Neymar 92

There is this last trick I feel I have to introduce you to when it comes to slicing hierarchical indexes. It is the use of python’s in-built function Slice()

A tuple used for indexing does not recognise slicing with colon natively. To make that work, we use Python’s Slice() explicitly.

A Slice() creates an instance of type slice. As we already know, a Slice can be used to index sequences in Start, End, Step.

Let’s say we want to select the highest-rated players in each of the positions, we could do with a

In [12]: fifa19.loc[(['CF','DF','GK' ,'MF', ], '1st'), :]
out[12]: Name Overall
Postion Rank
CF 1st Ronaldo 94
DF 1st Ramos 91
GK 1st De Gea 91
MF 1st De Bruyne 92

That works perfectly but we can have a ‘simpler’ alternative which is the use of Slice().

In [13]: fifa19.loc[(slice(None), '1st'), :]
out[13]: Name Overall
Postion Rank
CF 1st Ronaldo 94
DF 1st Ramos 91
GK 1st De Gea 91
MF 1st De Bruyne 92

We use ‘None’ to get the default value for all of the 3 slice arguments.

Slice(None) creates a slice instance as Slice(none, none, none) and hence returns all the rows that have a ‘1st’ index.

That seems like a handful. You should have enough on your plate to chew on at this point. I will end it here. You can take time out and practice these on your own. The concepts are relatively easy to follow and understand.

In the next tutorial, I will be dealing with Pivoting, Pivot tables, Stacks and Melting.

Until then, Keep Coding!

--

--

Jimmy Aidoo
Analytics Vidhya

CyberSecurity. Machine Learning. Bread & Circuses.