# Counting, as a fraction of a group

One of the common operations that I keep coming back to in my data science projects is counting the number of items in a group as a fraction of the total number of items in the group.

For example, in one dataset of marathon runners I have columns for `gender` and `age` and also for `expected_time`, where some runners have indicated their expected finish-time. I’d like to group runners by `gender` and `age` and count the percentage of runners (in each group) who provide an `expected_time` value, to see how it varies with gender and age.

In the past I have always calculated this using division, with two separate `groupby` operations: one counting the runners (by `gender` and `age`) with non-null `expected_time` entries, divided by another counting the total number of runners in each group. Straightforward, but not very elegant.
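To make the two-`groupby` version concrete, here is a minimal sketch using made-up data (the column names match the post; the values are purely illustrative):

```python
import pandas as pd

# Hypothetical stand-in for the marathon dataset
df = pd.DataFrame({
    "gender": ["F", "F", "M", "M", "M"],
    "age": [30, 30, 40, 40, 40],
    "expected_time": [210.0, None, 195.0, None, 240.0],
})

# First groupby: count of non-null expected_time per (gender, age) group
provided = df.groupby(["gender", "age"])["expected_time"].count()

# Second groupby: total number of runners per (gender, age) group
total = df.groupby(["gender", "age"]).size()

# Divide one by the other to get the fraction per group
fraction = provided / total
print(fraction)
```

This works because `count()` skips nulls while `size()` counts every row, and pandas aligns the two results on their shared `(gender, age)` index before dividing.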

I recently came across a better way to do this, based on the fact that the fraction that I want is the same as the `mean` of non-null values in a boolean data frame, as follows:

`df.groupby(['gender', 'age'])['expected_time'].apply(lambda x: x.notnull().mean())`

Here, each group is transformed into a boolean Series, based on whether `expected_time` is non-null or not. Since `True` counts as 1 (and `False` as 0), the mean of this boolean Series is the fraction of `True` values (non-null `expected_time` values). For instance, if a group contains 100 runners and 30 of them have provided `expected_time` values, then the `mean` will be 30/100, which is exactly the fraction we want.
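A tiny illustration of the mean-of-booleans idea in isolation, with made-up numbers:

```python
import pandas as pd

# Two of these four values are non-null
s = pd.Series([210.0, None, 195.0, None])

# notnull() yields a boolean Series: [True, False, True, False]
mask = s.notnull()

# True counts as 1 and False as 0, so the mean is the
# fraction of non-null values: 2 out of 4
fraction = mask.mean()
print(fraction)  # 0.5
```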

That’s a lot neater and more efficient: one `groupby` instead of two, and a much more succinct piece of code, with less scope for errors … I think.