Pandas — GroupBy.first vs GroupBy.nth vs GroupBy.head

Published in The Startup · 4 min read · Sep 2, 2020

Grouped data usually calls for computing some operation on each group. But there are times when getting the first or the nth row from each group is the highest priority.

Comic-con tickets🎫️ were going on sale next Tuesday, and the sale was divided into 3 sessions: the first in the morning at 10 AM, the second at 2 PM, and the last one at 8 PM.

It was decided that the first user🥇 of each session would be given a free tour to any country that the user specified during the ticket reservation.

At the end of the day, the following user data was collected:

Comic-con user data

Extracting the first user from each session

A pandas DataFrame has the groupby([column(s)]).first() method, which is used to get the first record from each group.

groupby on basis of session

Penny didn’t put anything in the country field ❌

The result of groupby.first() goes a little off the road for the last group, 8 PM, where Penny was the first one to get a ticket. For some reason Penny left the country field empty, but the above result has Australia in it.

As a result, Penny was forced to go to Australia✈️. Poor Penny🤣.

It seems that there is something wrong with the GroupBy.first method😕

The expected result should be

`groupby.first()` was expected to give the above result ✔️

There is nothing wrong with the groupby.first() method. That is simply how it works.

If there is a null value in the first record, the first non-null value of that column within the group is carried up into the first record. This is the reason why Penny went on a tour to Australia.

What if an entire group contains only null values? What should be expected from GroupBy.first then?

Let's take a look at last year's Comic-con, where no one wanted to go on a tour.
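The behaviour described above can be reproduced with a small sketch. The table below is hypothetical: only the three sessions, Penny's empty country field, and the Australia result come from the story, while the other names and countries are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical ticket data; only Penny's empty country mirrors the story.
df = pd.DataFrame({
    "session": ["10 AM", "10 AM", "2 PM", "2 PM", "8 PM", "8 PM"],
    "user": ["Amy", "Stuart", "Bernadette", "Leslie", "Penny", "Wil"],
    "country": ["Japan", "India", "Brazil", "Canada", np.nan, "Australia"],
})

# first() takes the first non-null value of each column within a group,
# so the 8 PM row combines Penny's name with the next user's country.
first_users = df.groupby("session").first()
print(first_users)
```

Because the lookup happens column by column, the 8 PM row in the result is not a row that exists anywhere in the original data.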

Last year comic-con

Getting the first user of each session.

first user extraction

What if there is only one record in a group and that record has null in it?

The result is the same as in the scenario above: the record for that group will have null in it.
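Both cases can be checked with a rough sketch. The data is again hypothetical: the 2 PM session has only null countries, the 8 PM session has a single record with a null country, and the 10 AM session is filled in purely for contrast.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 2 PM is an all-null group, 8 PM is a single null record.
df = pd.DataFrame({
    "session": ["10 AM", "10 AM", "2 PM", "2 PM", "8 PM"],
    "user": ["Amy", "Stuart", "Bernadette", "Leslie", "Penny"],
    "country": ["Japan", "India", np.nan, np.nan, np.nan],
})

# With no non-null value to fall back on, first() leaves the country null.
first_users = df.groupby("session").first()
print(first_users)
```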

Is there a way to make it work to get the correct result using GroupBy.first?

The way to fix this problem is to replace np.nan (NaN) with None using the np.where() method.

replacing np.nan(NaN) with None

There's a simpler way to get this done without worrying about np.nan and None. GroupBy.nth comes in handy in such situations⛑️.

GroupBy.nth doesn't change anything and returns the records in their original order, even if the first record of the group has a null value in it. It also has extra capabilities.

GroupBy.nth to get the first record

Here np.nan is not replaced with None, and GroupBy.nth still gives the expected result. The result stays the same even if the np.nan values are replaced with None.
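A minimal sketch of this, using a hypothetical table in which only Penny's empty country field is taken from the story:

```python
import numpy as np
import pandas as pd

# Hypothetical ticket data; Penny is first at 8 PM with an empty country.
df = pd.DataFrame({
    "session": ["10 AM", "10 AM", "2 PM", "2 PM", "8 PM", "8 PM"],
    "user": ["Amy", "Stuart", "Bernadette", "Leslie", "Penny", "Wil"],
    "country": ["Japan", "India", "Brazil", "Canada", np.nan, "Australia"],
})

# nth(0) returns the row at position 0 of each group as it is,
# without filling nulls from later rows.
first_rows = df.groupby("session").nth(0)
print(first_rows)
```

Depending on the pandas version, the result is indexed either by session or by the original row labels, but in both cases Penny's country stays empty.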

GroupBy.nth has some extra powers

Now Comic-con wants to select 2 users from each session. The first and third users will be the lucky winners.

GroupBy.nth can be used to get multiple specific records within each group.

Provide a list of positions: here, 0 stands for the first record and 2 for the third record.

First and third record from each session
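As a sketch, assuming a hypothetical table with three users per session (all names and countries except Penny's empty field are invented):

```python
import numpy as np
import pandas as pd

# Hypothetical data with three users in each session.
df = pd.DataFrame({
    "session": ["10 AM"] * 3 + ["2 PM"] * 3 + ["8 PM"] * 3,
    "user": ["Amy", "Stuart", "Wil",
             "Bernadette", "Leslie", "Barry",
             "Penny", "Zack", "Emily"],
    "country": ["Japan", "India", "Brazil",
                "Canada", "Kenya", "Peru",
                np.nan, "Australia", "Norway"],
})

# Positions are zero-based: 0 is the first record, 2 is the third.
winners = df.groupby("session").nth([0, 2])
print(winners)
```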

Finally, the day has arrived when Sheldon, Leonard, and Raj (except Howard😞) got lucky. The country was pre-planned by the group. Comic-con decided to pick the first 3 entries from each session.

Is there a way to get the first n records from each group?

At such a time GroupBy.head comes to the rescue🏃‍♂️. GroupBy.nth can be used too, but the catch is that it needs a list of positions running from 0 all the way to n-1.

GroupBy.head for first n records

GroupBy.head preserves the index of the original dataframe.

GroupBy.head can also be used to get the first record from each group irrespective of np.nan(NaN) and None.
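The points about GroupBy.head can be sketched as follows. The table is hypothetical: the order Sheldon, Leonard, Raj, Howard and Penny's empty country follow the story, while the shared country ("Switzerland") and all other names and values are placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical data: four users at 10 AM, two at 2 PM, four at 8 PM.
df = pd.DataFrame({
    "session": ["10 AM"] * 4 + ["2 PM"] * 2 + ["8 PM"] * 4,
    "user": ["Sheldon", "Leonard", "Raj", "Howard",
             "Amy", "Stuart",
             "Penny", "Wil", "Leslie", "Barry"],
    "country": ["Switzerland"] * 4
               + ["Japan", "India"]
               + [np.nan, "Australia", "Brazil", "Canada"],
})

# First three rows of every session, keeping the original row labels.
top3 = df.groupby("session").head(3)

# The same selection through nth needs every position from 0 to n-1.
via_nth = df.groupby("session").nth([0, 1, 2])

# head(1) returns the first row of each session, nulls included.
first_each = df.groupby("session").head(1)

print(top3)
print(first_each)
```

A session with fewer than three users, like 2 PM here, simply contributes all of its rows.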

Safe journey to the winners.

Written by Faith