How to parse your HealthKit data with Python

Stefan Luyten
Aug 2, 2015


If you want to get fancy and do your own magic with the health or workout data recorded in Apple’s Health app, you can write your own script to parse the exported XML.

For those of you who wish to get started in Python, check out the small sample script at the end of this article (GitHub link).

This script gives a little insight into how HealthKit stores and processes your data.

The script reads the exported XML and stores it as comma-separated values, which are easier to process.
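To give an idea of what such a conversion involves, here is a minimal sketch using only the Python standard library. It is not the script linked from this article. The sample XML is made up, and every element and attribute name in it except ‘recordCount’ is an assumption about the export format, so adjust them to whatever your own export contains.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up sample. Apart from recordCount, the element and attribute
# names are assumptions, not a documented HealthKit export schema.
SAMPLE_XML = """<HealthData>
  <Record type="HeartRate" startDate="20150802173500" min="61" max="75" average="68" recordCount="4"/>
  <Record type="HeartRate" startDate="20150802173600" min="72" max="72" average="72" recordCount="1"/>
</HealthData>"""

FIELDS = ["type", "startDate", "min", "max", "average", "recordCount"]


def xml_to_csv(xml_text, fields=FIELDS):
    """Return CSV text with a header row and one row per <Record> element."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    for record in root.iter("Record"):
        # Attributes that are absent become empty cells instead of raising.
        writer.writerow([record.get(name, "") for name in fields])
    return out.getvalue()


csv_text = xml_to_csv(SAMPLE_XML)
print(csv_text)
```

For a real export you would read the file with `ET.parse(path)` instead of parsing a string, and write to a file opened with `newline=""` so the csv module controls the line endings.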

Here’s what the original XML looks like (just a small sample):

Export from HealthKit — original XML (sample)

Some findings after a couple of experimental runs:

  • The Health app consolidates all measurements after one hour. To retrieve the individual measurements, you need to export the data within that hour.
  • For all data older than one hour, Apple adds a ‘recordCount’ attribute that indicates how many measurements were used to calculate the min, max and average values.
  • All data older than one day is further consolidated into one record per hour.
  • All data older than one week is consolidated even further, into one record per day.
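The findings above can be summed up as a small lookup on a record's age at export time. The sketch below only restates my observations in code. The function name `expected_granularity` is made up for illustration, and the thresholds come from my experiments, not from any Apple documentation.

```python
from datetime import datetime, timedelta


def expected_granularity(record_time, export_time):
    """Guess how consolidated a record is, based on its age at export time.

    The thresholds mirror the observations listed above; they are
    empirical, not documented behaviour.
    """
    age = export_time - record_time
    if age > timedelta(weeks=1):
        return "one record per day"
    if age > timedelta(days=1):
        return "one record per hour"
    if age > timedelta(hours=1):
        return "consolidated, with recordCount"
    return "individual measurements"


export_time = datetime(2015, 8, 2, 17, 37)
for record_time in (
    datetime(2015, 8, 2, 17, 20),   # a few minutes old
    datetime(2015, 8, 2, 12, 0),    # a few hours old
    datetime(2015, 7, 30, 12, 0),   # a few days old
    datetime(2015, 7, 20, 12, 0),   # almost two weeks old
):
    print(record_time, "->", expected_granularity(record_time, export_time))
```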

See my screenshot on the left…

This is the output of the Python script in Excel.

Note the header cells, which indicate:

  • timestamp: year (yr), month (m), day (d), hour (hr), minute (min)
  • measurements: minimum value (MIN), maximum value (MAX), average value (AVG), number of measurements/record count (rc)
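As an illustration of how one record could be turned into a row with these columns, here is a small sketch. The helper `to_row` is hypothetical, and the timestamp format string is an assumption, so check it against the dates in your own export before relying on it.

```python
from datetime import datetime

# Column names as they appear in the spreadsheet header.
HEADER = ["yr", "m", "d", "hr", "min", "MIN", "MAX", "AVG", "rc"]


def to_row(start, minimum, maximum, average, record_count,
           fmt="%Y%m%d%H%M%S"):
    """Split a timestamp string into its parts and append the measurements.

    The default fmt is an assumption about how the export writes dates.
    """
    ts = datetime.strptime(start, fmt)
    return [ts.year, ts.month, ts.day, ts.hour, ts.minute,
            minimum, maximum, average, record_count]


row = to_row("20150802173500", 61.0, 75.0, 68.0, 4)
print(dict(zip(HEADER, row)))
```

Keeping the timestamp parts in separate columns makes it easy to sort and filter by hour or by day in a spreadsheet.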

I did this export around 17:37 on Aug 2nd 2015, and all cells marked with a ‘+’ are less than one hour old. These are the unconsolidated ones.

The cells in yellow were recorded during a 3-minute ‘workout’, just to see how the sample frequency increases: there are now several measurements per minute.

1 hour later — consolidated values

Here’s the same data one hour later. I simply waited one hour, exported the data again, and marked the same period in yellow. You can now clearly see that the workout minutes have been consolidated and that the record count has increased accordingly.
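To make the effect concrete, the sketch below reproduces this kind of consolidation on made-up heart rate samples: several measurements within the same minute collapse into one record with a minimum, maximum, average and record count. The per-minute grouping is my assumption based on the exported data. How the Health app actually does this internally is not something I can confirm.

```python
from collections import defaultdict
from datetime import datetime


def consolidate_per_minute(samples):
    """Group (timestamp, value) pairs by minute and summarise each group."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Drop seconds so all samples from one minute share a key.
        buckets[ts.replace(second=0, microsecond=0)].append(value)
    return {
        minute: {
            "min": min(values),
            "max": max(values),
            "average": sum(values) / len(values),
            "recordCount": len(values),
        }
        for minute, values in sorted(buckets.items())
    }


# Made-up samples: three measurements at 17:20, one at 17:21.
samples = [
    (datetime(2015, 8, 2, 17, 20, 5), 90.0),
    (datetime(2015, 8, 2, 17, 20, 25), 100.0),
    (datetime(2015, 8, 2, 17, 20, 50), 110.0),
    (datetime(2015, 8, 2, 17, 21, 10), 95.0),
]
result = consolidate_per_minute(samples)
for minute, stats in result.items():
    print(minute, stats)
```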

Here’s a sample from my data, showing the consolidation of records older than one week:

Consolidation of records older than one week

Blue rows are records older than one week, yellow ones are more recent.

The script can be found here:
