# Counting, as a fraction of a group

One of the common operations that I keep coming back to in my data science projects is counting the number of items in a group that meet some condition, as a fraction of the total number of items in that group.

For example, in one dataset of marathon runners I have columns for `gender` and `age`, and also for `expected_time`, where *some* runners have indicated their expected finish time. I’d like to group runners by `gender` and `age` and count the percentage of runners (in each group) who provide an `expected_time` value, to see how it varies with gender and age.

In the past I have always calculated this using division, with two separate `groupby` operations: one to count the number of runners (by `gender` and `age`) with non-null `expected_time` entries, divided by another to count the total number of runners (by `gender` and `age`). Straightforward, but not very elegant.
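As a sketch of that division-based approach (the sample data here is made up for illustration; only the column names come from the post):

```python
import pandas as pd

# Hypothetical sample data with the columns described above
df = pd.DataFrame({
    'gender': ['F', 'F', 'M', 'M', 'M'],
    'age': [30, 30, 40, 40, 40],
    'expected_time': [210.0, None, 180.0, 195.0, None],
})

# Count runners with a non-null expected_time in each group
# (count() skips nulls) ...
provided = df.groupby(['gender', 'age'])['expected_time'].count()

# ... and divide by the total number of runners in each group
# (size() counts every row, null or not)
total = df.groupby(['gender', 'age'])['expected_time'].size()

fraction = provided / total
```

Two passes over the data and an extra division step, which is exactly the clunkiness the rest of the post gets rid of.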

I recently came across a better way to do this, based on the fact that the fraction I want is the same as the `mean` of a boolean series marking the non-null values, as follows:

```python
df.groupby(['gender', 'age'])['expected_time'].apply(
    lambda x: x.notnull().mean())
```

Here, each group is transformed into a boolean series, based on whether `expected_time` is non-null or not. Since `True` is counted as 1 (and `False` as 0), the mean of this boolean series is the fraction of `True` values, i.e. the fraction of non-null `expected_time` values. For instance, if a group contains 100 runners and 30 of them have provided `expected_time` values, then the `mean` will be 30/100, which is exactly the fraction we want.
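Putting it together on a small made-up dataset (the values are illustrative, not from the original data):

```python
import pandas as pd

df = pd.DataFrame({
    'gender': ['F', 'F', 'M', 'M', 'M'],
    'age': [30, 30, 40, 40, 40],
    'expected_time': [210.0, None, 180.0, 195.0, None],
})

# Single groupby: the mean of the boolean "is non-null" series
# is the fraction of runners who provided an expected_time
fraction = df.groupby(['gender', 'age'])['expected_time'].apply(
    lambda x: x.notnull().mean())
```

Here the (F, 30) group has one value out of two runners, and the (M, 40) group has two out of three. In recent pandas versions `x.notna()` is an equivalent alias for `x.notnull()`, if you prefer that spelling.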

That’s a lot neater and more efficient: one `groupby` instead of two, and a much more succinct piece of code, with less scope for errors … I think.