Baby Pandas!

Madison John
Madison John
Published in
6 min readDec 13, 2019
Photo by Stepan Kriz from Pexels

I apologize (but only a little) for the click-bait title. Other than the image above, there will be no images of pandas, the bears.

Instead, I will be showcasing how I used Pandas, the Python library, along with some PyPlot and Seaborn, to analyze my baby’s feedings.

Welcome to the first entry in the Learn By Doing series!

The purpose of this post is not to give a full-blown tutorial on Pandas and other topics, but rather to show how I used this library and its tools to perform analysis of a specific data set, namely my baby’s bottle feedings.

While I won’t paste all of my code in this post, I will highlight some interesting and/or useful concepts and will provide the link to my GitHub repository at the end.

Lastly, we have been tracking our baby’s feeds with the Baby Tracker application, available for Android and IOS.

Here we go!

What is Pandas?

Pandas (not the bear), is a data manipulation and analysis library for the Python programming language, providing its own unique data structure, the Data Frame.

More information can be found at the Pandas homepage, including installation instructions.

Loading Data

First things first, we need to load the data into a Data Frame. Fortunately, the application exports its data in a CSV format, one of two formats readily supported by Pandas via a built-in function.

Figure 1: Reading data from CSV file and loading into Data Frame

Pandas.read_csv()

  • Load the csv-formatted data into a Data Frame using the Pandas.read_csv() function.
  • There is also a Pandas.read_json() function for the JSON data format.

Filtering Data

We also use the application to track other things such as breast pump volume, sleep times, and diaper status among other things. Since I am only interested in bottle feedings, I need to filter out everything else.

Filtering by column values

  • Filter by values in the RecordCategory and RecordSubCategory columns. I only wanted to look at Feeding and Bottle categories, respectively.
  • Filtering by column values is as simple as indexing the Data Frame with a Boolean expression, as shown in Figure 2.
Figure 2: Filtering by column value to only include bottle feedings.

Splitting Strings

Looking at the data, I spot an issue that needs immediate attention. The amount of milk consumed is embedded in the Details column along with some informational but non-numeric text. I need to get the numerical amount into its own column in order to do any analysis on it.

Splitting using Series.str()

Luckily, Series objects, which are the columns of the DataFrame, have String methods built in, including replace and split. I use Series.split() to isolate the feed amount.

Figure 3: Using Series.str.split() method to isolate the feed amount into its own column.

Working with DateTime

Before doing any more data manipulation, let’s visualize the data as it is currently so we can inform and adjust our next steps.

Figure 4: Raw data plotted in a line chart.

We first downloaded the tracker application when our baby was born, but we did not start regularly recording their feedings until we brought them home from the hospital, hence the gap in recordings. This gap in usage is not particularly useful so I will filter it out.

Filtering by a DateTime column

Here are the steps I took to exclude this gap.

  • Convert the new start date from String to DateTime.date data type.
  • Filter the data set by comparing values in the FeedDate column against the converted start date using a Boolean expression
Figure 5: Filtering by DateTime, method 1.
  • Note: The filtering above can also be achieved by casting the FeedDate column as a string using the Series.astype() method and comparing against a String date. The code below results in the same filtered data set.
Figure 6: Filtering by DateTime, method 2.
Figure 7: Raw data plotted in a line chart, with adjusted start date.

Great, now let’s take a look at some statistics.

Grouping and Aggregating Data

To get some meaningful information out of this data set, I needed to calculate some statistics such as means, minimums, and maximums.

This was accomplished using DataFrame.groupby() in conjunction with DataFrame.aggregate() as detailed below.

Figure 8: Aggregating the data by various statistical metrics.

DataFrame.groupby()

This function splits data into groups based on a given criteria. In this case that grouping criteria is the FeedDate value.

DataFrame.aggregate()

This function applies operations to one or more columns. For this study, I used max, min, mean, and sum aggregations

The result of using the above two functions is a new DataFrame object with the aggregated data.

Figure 9: Max, Min, and Mean feed amounts per date.
Figure 10: Sum of feed amounts per date.

Merging DataFrames

Notice in the previous section that I created a separate DataFrame for each statistical measurement (df_mean, df_sum, etc.). To simplify further work, I merged these DataFrames together using the DataFrame.merge() method.

DataFrame.merge()

Merge DataFrame or named Series objects with a database-style join.

Below I merge using the common column of FeedDate.

Figure 11: Merging DataFrames together, joined by common FeedDate column.

Cumulative and Moving Window Calculations

While the mean, min, and max aggregations in the previous two charts have their merit, our baby’s progress is not quite as obvious as I would like.

I wanted to smooth out the variation by implementing a cumulative average for long-term progress and a moving window average for short-term trends.

The DataFrame.expanding() and DataFrame.rolling() functions, respectively, made both simple to implement.

Figure 12: Adding cumulative and rolling averages.

DataFrame.rolling()

This function provides functionality for rolling window calculations, calculations executed on a subset of the data in which the subset of data, or window, changes with each calculation. I used this feature to compute a 7-day average for each date.

DataFrame.expanding()

This function is similar to a rolling window calculation, except that only one side of the window changes. I used it to compute a cumulative average of feed amounts for each date. One edge of the window remained static at the first date (July 28, 2018), while the other edge is expanded to one additional date for each calculation.

Figure 10: Per-feed cumulative averages, split by month. Includes each individual feed (grey) and an all-time cumulative average (black).
Figure 11: Daily total amount cumulative averages, split by month. Includes daily total amounts (grey) and a 7-day moving average (black).

Misc. Plots

With Pandas, I was able to effectively summarize our baby’s feeding trends and overall progress. That’s all for this episode!

Link to analysis scripts.

Note: Analysis code can be found on GitHub.

--

--

Madison John
Madison John

husband. father. enginerd. not necessarily in that order.