Pandas Slicing : Readable Way

Yodi D

Published in

MyTake

2 min readSep 29, 2019

Image source : https://www.sunkist.com/tips/how-to-slice-oranges/

Let say we have df1.

Most of us, doing things like this when slicing dataframe.

value9 = df1.loc[df1[‘ID’]==’1', 'value9']

It’s ok actually, but it gets worse when we have multiple criteria

#THIS ALL IN ONE LINE ACTUALLY
value9 = df1.loc[(df1[‘ID’]==’1') & (df1[‘value1’]==’10') & (df1[‘value2’]==’21') & (df1[‘value3’]==’22') & (df1[‘value4’]==’24'), 'value9']

It becomes not readable. Why don’t we add a new line?

value9 = df1.loc[(df1[‘ID’]==’1') & \
(df1[‘value1’]==’10') & \
(df1[‘value2’]==’21') & \
(df1[‘value3’]==’22') & \
(df1[‘value4’]==’24'), \
'value9']

Well ummh, it’s readable, but the codes become ugly, any solution ?

What if we separate the selection part and put it in an object ?

select = (df1[‘ID’]==’1') & \
(df1[‘value1’]==’10') & \
(df1[‘value2’]==’21') & \
(df1[‘value3’]==’22') & \
(df1[‘value4’]==’24')value9 = df1.loc[select, 'value9']

Yeap, it’s better now. Even, we can re-use the selection if we need other value/data.

select = (df1[‘ID’]==’1') & \
(df1[‘value1’]==’10') & \
(df1[‘value2’]==’21') & \
(df1[‘value3’]==’22') & \
(df1[‘value4’]==’24')#JUST EXAMPLE, I know u always copy-paste the slicing part, even me
value9 = df1.loc[select, 'value9']
value8 = df1.loc[select, 'value8']
shape = df1.loc[select, 'value9'].shape
index = df1[select].index

But there is something that bugs me. I have new line marks (‘\’) which make me :

I don’t know what is first slicing criteria, is it slice by ‘ID’ on first-line or ‘value1’ at second-line or else ? Besides, ‘ID’ looks hidden.
I need to read lines one by one where is the end of ‘select’ operation. Maybe in the middle of the line is a different process of code.

So, instead of using new lines, what if we use parenthesis?

select = (
     (df1[‘ID’]==’1') &
     (df1[‘value1’]==’10') &
     (df1[‘value2’]==’21') &
     (df1[‘value3’]==’22') &
     (df1[‘value4’]==’24')
)value9 = df1.loc[select, 'value9']

Now, I know where is ‘select’ code start and end at one sight. More positive things using parenthesis is, it’s easier to read if we have nested criteria, for example :

select = (
     (df1[‘ID’]==’1') &
     (
          (df1[‘value1’]==’10') | 
          (df1[‘value1’]==’10')
     ) &
     (
          (df1[‘value2’]==’21') | 
          (df1[‘value2’]==’21')
     ) &
     (df1[‘value3’]==’22') &
     (df1[‘value4’]==’24')
)value9 = df1.loc[select, 'value9']

Wow, it’s ugly again, but it’s readable. Can we still improve? Add comments!

select = (
     #CRITERIA1 =============
     (df1[‘ID’]==’1') &     #CRITERIA2 =============
     (
          (df1[‘value1’]==’10') | 
          (df1[‘value1’]==’10')
     ) &     #CRITERIA3 =============
     (
          (df1[‘value2’]==’21') | 
          (df1[‘value2’]==’21')
     ) &     #CRITERIA4 =============
     (df1[‘value3’]==’22') &     #CRITERIA5 =============
     (df1[‘value4’]==’24')
)#DO SELECTION
value9 = df1.loc[select, 'value9']

Well, this one is still ugly but very readable (at least for me). Can you help me to improve it? beside the chaining method?

Feedback really appreciated. Thanks!

Pandas Slicing : Readable Way

Written by Yodi D