How I used Azure Databricks, Synapse Analytics, and Azure Machine Learning to get insights from Apple Health app data

Dhangerkapil
4 min read · Jul 15, 2021

Health and fitness is one of my passions. I have been using an iPhone and an Apple Watch for many years, and both gather health data automatically. The Apple Health app records steps, miles, and flights climbed using the iPhone's sensors, in addition to activity data from the Apple Watch, so there is a lot of health data sitting on your phone. The problem is how to use that data to get better insights into your workouts and health. In this post, I will explain how I used Azure data services to get a better understanding of my health data and the insights you can benefit from. Currently, the following activity types are available in Apple Health:

ActiveEnergyBurned, ActivitySummary, AppleExerciseTime, AppleStandHour, AppleStandTime, BasalEnergyBurned, BodyMass, DistanceCycling, DistanceWalkingRunning, FlightsClimbed, HeadphoneAudioExposure, HeartRate, HeartRateVariabilitySDNN, HKDataTypeSleepDurationGoal, MindfulSession, RestingHeartRate, SixMinuteWalkTestDistance, SleepAnalysis, VO2Max, WalkingAsymmetryPercentage, WalkingDoubleSupportPercentage, WalkingHeartRateAverage, WalkingSpeed, WalkingStepLength, workoutactivity

You can find the code for this post in the GitHub repo: https://github.com/dhangerkapil/apple_health_analysis

Exporting health data from iPhone

1) Launch the Apple Health app on your iPhone.

2) Tap your profile icon in the corner, then tap “Export All Health Data”.

3) Choose how you want to save or share the exported health data, for example with Microsoft OneDrive or Google Drive.

4) The export is a zip file containing export.xml, which holds all the data. It is a large XML file with many nested child elements. My file was around 1.5 GB and covered the last two and a half years of data. I copied it from OneDrive to Azure Blob Storage / Azure Data Lake.

Azure Databricks for XML parsing

I extracted all the XML elements using Azure Databricks and stored the output as CSV files in Azure Data Lake / Blob Storage. Converting and exploding XML requires a lot of in-memory processing, and Databricks can easily handle large XML files like this one. The notebook is in the GitHub repo linked above.
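To give a feel for the extraction step, here is a minimal sketch in plain Python. It is not the Databricks notebook from the repo. It assumes that export.xml stores each measurement as a `<Record>` element with attributes such as `type`, `unit`, `startDate`, `endDate`, and `value`; the function names and the column selection are my own illustrative choices.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed attribute names on each <Record> element of export.xml.
FIELDS = ["type", "sourceName", "unit", "startDate", "endDate", "value"]


def records_to_rows(xml_source):
    """Stream the XML and yield one dict per <Record> element."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "Record":
            yield {field: elem.attrib.get(field, "") for field in FIELDS}
        # Drop the element's contents once it has been handled to limit memory use.
        elem.clear()


def write_csv(rows, out):
    """Write rows to an open text file and return how many were written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# Tiny in-memory stand-in for export.xml (hypothetical sample data).
sample_xml = io.BytesIO(
    b'<HealthData>'
    b'<Record type="HKQuantityTypeIdentifierStepCount" sourceName="iPhone" '
    b'unit="count" startDate="2021-01-01 08:00:00 +0000" '
    b'endDate="2021-01-01 08:10:00 +0000" value="120"/>'
    b'</HealthData>'
)
csv_out = io.StringIO()
written = write_csv(records_to_rows(sample_xml), csv_out)
```

For the real file you would pass the path to export.xml and an open output file instead of the in-memory objects. Streaming with `iterparse` avoids loading the whole document at once, which matters at 1.5 GB.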

Azure Synapse Analytics for data exploration and storage

Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and big data analytics. I used a serverless SQL pool in Synapse to explore the data for different activities. After that, I used a Synapse pipeline and data flows to split the files by activity type and rename columns. I also added columns for the day, month, and year for reporting purposes. I explained how you can split the file and apply business rules in an earlier post. Finally, I created a table for each activity in the Synapse dedicated SQL pool.
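The split-and-enrich logic can be sketched in plain Python to show what the transformation does. This is not the Synapse data flow itself. The `HKQuantityTypeIdentifier` / `HKCategoryTypeIdentifier` prefixes, the timestamp layout, and the output column names are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Assumed prefixes on the "type" column; stripping them leaves names like "HeartRate".
TYPE_PREFIXES = ("HKQuantityTypeIdentifier", "HKCategoryTypeIdentifier")
# Assumed timestamp layout, e.g. "2021-01-01 08:00:00 +0000".
DATE_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def activity_name(record_type):
    """Return the short activity name for a raw record type."""
    for prefix in TYPE_PREFIXES:
        if record_type.startswith(prefix):
            return record_type[len(prefix):]
    return record_type


def split_by_activity(rows):
    """Group rows by activity and add year, month, and day columns."""
    tables = defaultdict(list)
    for row in rows:
        start = datetime.strptime(row["startDate"], DATE_FORMAT)
        tables[activity_name(row["type"])].append({
            "start_time": row["startDate"],  # renamed column (hypothetical name)
            "value": row["value"],
            "unit": row["unit"],
            "year": start.year,
            "month": start.month,
            "day": start.day,
        })
    return dict(tables)


# Hypothetical sample rows, shaped like the CSV output of the parsing step.
sample_rows = [
    {"type": "HKQuantityTypeIdentifierHeartRate", "unit": "count/min",
     "startDate": "2021-03-05 07:30:00 +0000", "value": "62"},
    {"type": "HKQuantityTypeIdentifierFlightsClimbed", "unit": "count",
     "startDate": "2021-03-05 09:00:00 +0000", "value": "3"},
]
tables = split_by_activity(sample_rows)
```

Each key of `tables` corresponds to one per-activity file or table, and the extra date columns are what make drill-down by year, month, and day straightforward later on.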

Power BI Reports

I pointed Power BI at the Synapse dedicated SQL pool and created drill-up/drill-down reports for YoY/MoM/DoD comparisons using the date field.

I created a dashboard to show activity details and calories burned by day/month/year.
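The arithmetic behind these comparisons is simple enough to sketch in plain Python. This is only an illustration of the roll-up and the period-over-period change, not how Power BI computes it; the input shape (an ISO date string and a kcal value per row) and the function names are assumptions.

```python
from collections import defaultdict


def totals_by_period(rows):
    """Sum kcal per day, month, and year from (YYYY-MM-DD, kcal) pairs."""
    daily = defaultdict(float)
    monthly = defaultdict(float)
    yearly = defaultdict(float)
    for date, kcal in rows:
        daily[date] += kcal
        monthly[date[:7]] += kcal  # "YYYY-MM"
        yearly[date[:4]] += kcal   # "YYYY"
    return dict(daily), dict(monthly), dict(yearly)


def pct_change(current, previous):
    """Percentage change from the previous period to the current one."""
    return (current - previous) / previous * 100.0


# Hypothetical sample values.
sample = [
    ("2021-01-10", 500.0),
    ("2021-01-11", 300.0),
    ("2021-02-10", 1000.0),
]
daily, monthly, yearly = totals_by_period(sample)
mom = pct_change(monthly["2021-02"], monthly["2021-01"])
```

The same `pct_change` works for YoY and DoD by passing yearly or daily totals instead.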

The chart below shows the relationship between the hours I slept and the active energy burned the next day. Better sleep always helps :)

Azure Machine Learning

Finally, I used Azure Machine Learning AutoML to predict realistic targets for calories burned per day. The predicted targets were more realistic than the target I had set on the Apple Watch. I tried both time series forecasting and regression models, and both gave me a better forecast of calories burned. There is a lot of data, and you can apply machine learning to many different activities; I have used it for only a couple so far. I will keep exploring and will update this post with my findings.
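When judging a forecast like this, it helps to have a simple baseline to compare against. The sketch below is a same-weekday average in plain Python. It is a swapped-in baseline technique, not AutoML and not the models I trained; the function name and the four-week window are assumptions.

```python
from statistics import mean


def weekday_baseline(daily_kcal, weeks=4):
    """Forecast the next day as the mean of the same weekday in recent weeks.

    daily_kcal is a list of daily totals, oldest first, with no missing days.
    """
    n = len(daily_kcal)
    # The next day has index n, so the same weekday one week earlier is n - 7.
    history = [daily_kcal[n - 7 * k] for k in range(1, weeks + 1) if n - 7 * k >= 0]
    if not history:
        raise ValueError("need at least 7 days of history")
    return mean(history)


# Hypothetical two weeks of daily totals.
two_weeks = [400, 420, 380, 500, 450, 600, 300,
             410, 430, 390, 520, 470, 620, 320]
next_day_target = weekday_baseline(two_weeks)
```

If a trained model cannot beat a baseline like this on held-out days, the target it suggests is probably no more realistic than a simple average.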

Conclusion

In this post, I used different Azure data services for different purposes, which shows how they work together to deliver insights quickly from large and complex datasets. Using these services on my own health data made the work all the more interesting.

Dhangerkapil

Principal Cloud Architect, Data & AI @ Microsoft. Please feel free to connect with me on LinkedIn: https://www.linkedin.com/in/kapil-dhanger-a8b060aa