Pandas Slicing : Readable Way

Yodi D
MyTake
Published in
2 min readSep 29, 2019

Let say we have df1.

df1

Most of us, doing things like this when slicing dataframe.

value9 = df1.loc[df1[‘ID’]==’1', 'value9']

It’s ok actually, but it gets worse when we have multiple criteria

#THIS ALL IN ONE LINE ACTUALLY
value9 = df1.loc[(df1[‘ID’]==’1') & (df1[‘value1’]==’10') & (df1[‘value2’]==’21') & (df1[‘value3’]==’22') & (df1[‘value4’]==’24'), 'value9']

It becomes not readable. Why don’t we add a new line?

value9 = df1.loc[(df1[‘ID’]==’1') & \
(df1[‘value1’]==’10') & \
(df1[‘value2’]==’21') & \
(df1[‘value3’]==’22') & \
(df1[‘value4’]==’24'), \
'value9']

Well ummh, it’s readable, but the codes become ugly, any solution ?

What if we separate the selection part and put it in an object ?

select = (df1[‘ID’]==’1') & \
(df1[‘value1’]==’10') & \
(df1[‘value2’]==’21') & \
(df1[‘value3’]==’22') & \
(df1[‘value4’]==’24')
value9 = df1.loc[select, 'value9']

Yeap, it’s better now. Even, we can re-use the selection if we need other value/data.

select = (df1[‘ID’]==’1') & \
(df1[‘value1’]==’10') & \
(df1[‘value2’]==’21') & \
(df1[‘value3’]==’22') & \
(df1[‘value4’]==’24')
#JUST EXAMPLE, I know u always copy-paste the slicing part, even me
value9 = df1.loc[select, 'value9']
value8 = df1.loc[select, 'value8']
shape = df1.loc[select, 'value9'].shape
index = df1[select].index

But there is something that bugs me. I have new line marks (‘\’) which make me :

  • I don’t know what is first slicing criteria, is it slice by ‘ID’ on first-line or ‘value1’ at second-line or else ? Besides, ‘ID’ looks hidden.
  • I need to read lines one by one where is the end of ‘select’ operation. Maybe in the middle of the line is a different process of code.

So, instead of using new lines, what if we use parenthesis?

select = (
(df1[‘ID’]==’1') &
(df1[‘value1’]==’10') &
(df1[‘value2’]==’21') &
(df1[‘value3’]==’22') &
(df1[‘value4’]==’24')
)
value9 = df1.loc[select, 'value9']

Now, I know where is ‘select’ code start and end at one sight. More positive things using parenthesis is, it’s easier to read if we have nested criteria, for example :

select = (
(df1[‘ID’]==’1') &
(
(df1[‘value1’]==’10') |
(df1[‘value1’]==’10')
) &
(
(df1[‘value2’]==’21') |
(df1[‘value2’]==’21')
) &
(df1[‘value3’]==’22') &
(df1[‘value4’]==’24')
)
value9 = df1.loc[select, 'value9']

Wow, it’s ugly again, but it’s readable. Can we still improve? Add comments!

select = (
#CRITERIA1 =============
(df1[‘ID’]==’1') &
#CRITERIA2 =============
(
(df1[‘value1’]==’10') |
(df1[‘value1’]==’10')
) &
#CRITERIA3 =============
(
(df1[‘value2’]==’21') |
(df1[‘value2’]==’21')
) &
#CRITERIA4 =============
(df1[‘value3’]==’22') &
#CRITERIA5 =============
(df1[‘value4’]==’24')
)
#DO SELECTION
value9 = df1.loc[select, 'value9']

Well, this one is still ugly but very readable (at least for me). Can you help me to improve it? beside the chaining method?

Feedback really appreciated. Thanks!

--

--