Activity Classification for watchOS: Part 1

Introduction and Data Collection

iPhones, iPads, smart watches, shoe sole chips… soon to be brain chips…? Packed with powerful sensors, edge devices are all around us, continuously creating new data for us engineers to play with. Given this steady technological advancement, the recent emphasis on tracking health-related data shouldn't be surprising. Over 200,000 mobile applications today are related to health and fitness goals¹. Anything from logging workouts, counting steps, monitoring heart rate… you name it.

How does ML fit into the mix? With rich data available at the edge, leveraging machine learning to optimize user experience, track and monitor health-related goals, or generate recommendations is alluring. Even better, integrating machine learning models on the edge is now easier than ever thanks to tools like Core ML and Skafos. In this 3-part blog series, we will walk through how we built an example watchOS application that uses an activity classifier model, trained and deployed with Skafos.

The 3 parts of the series are as follows:

Part 1: Introduction and Data Collection

Part 2: Model Training and Deployment with Skafos

Part 3: Building the watchOS App (coming soon)

This post, “Introduction and Data Collection”, is part 1 of the series. Our goal is to introduce the topic of activity classification, provide a specific example of a toy application that uses ML to create some value, and detail the data collection process.

Activity Classification 101

Before we dive into an example, let’s talk about the basics. Most smart devices, loaded with sensors like accelerometers and gyroscopes, are capable of monitoring many aspects of their environment. Subtle movements, loud noises, shifts in pressure: your device can capture that data and consume it. Apple’s Core Motion framework allows developers to access this data from on-board hardware. Using a combination of machine learning techniques, it is possible to train a model to recognize the relationship between patterns in motion-sensor observations and pre-identified physical activities such as sitting, walking, or running. For a full introduction to the world of activity classification, also known as “Human Activity Recognition” (HAR), check out this resource.

Why is it useful though? In short, this type of model allows your device to “know” what you are doing, assuming you are wearing the device. We’ve got a simple idea to help communicate the potential utility.

Introducing “Sedentarian”

Sedentarian: an app that tracks daily movement and activity patterns using an activity classifier model.

With the release of iOS 12, your iPhone now tells you how much time during the day you spend looking at your screen. Kinda creepy? Maybe. It certainly has value even if it’s driven by a guilt factor. Truthfully, that’s often the best motivator for behavior change anyway. Using Apple’s Screen Time tools as our inspiration, we set out to build an app called Sedentarian.

It’s no secret that an active lifestyle is super important to being a healthy human. According to the Centers for Disease Control and Prevention (CDC) Behavioral Risk Factor Surveillance System (BRFSS), as of late 2018 just over half of adults in the U.S. meet the national standards for aerobic activity². Physical inactivity is prevalent among adults, especially in the southern part of the country.

Figure 1. Plot of adult physical inactivity by U.S. State.

For those of us that work day jobs, it can be a challenge to remind ourselves to stand up or walk around the block for fresh air.

Sedentarian uses an activity classifier model deployed with Skafos to track your daily (9am-5pm) habits across three categories: sitting, standing, and moving. These categories are broad enough to capture almost all types of activity during the day. Around 5pm each day, users get a notification with a report that audits their motion activity in each of these three categories. The hope is that this type of information will help inspire lifestyle changes in users, leading to patterns of healthy living. Even though the app is simple, we hope that this example sparks ideas for how to bring ML to the edge.
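To make the 5pm report concrete, here is a rough sketch of how time-in-category could be tallied from a stream of classifier predictions. The function name, label names, and polling interval are illustrative, not Sedentarian's actual implementation:

```python
from collections import Counter

def daily_report(predictions, poll_seconds=1.0):
    """Summarize a day's worth of activity predictions into minutes per category.

    `predictions` is a sequence of predicted labels, one per polling interval.
    """
    counts = Counter(predictions)
    return {label: counts[label] * poll_seconds / 60.0
            for label in ("sitting", "standing", "moving")}

# Two hours of sitting followed by one hour of moving, polled once per minute:
report = daily_report(["sitting"] * 120 + ["moving"] * 60, poll_seconds=60.0)
# report -> {"sitting": 120.0, "standing": 0.0, "moving": 60.0}
```

A real app would aggregate predictions only within the 9am-5pm window, but the counting logic is the same.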

Data Collection

Sedentarian’s workhorse is an on-device activity classifier model that runs constantly in the background. In order to create a model like that though, we had to collect training data.

Specifically, the training data was a time series of sensor readings from the watch paired with an activity label. Producing that type of data is not trivial. Fortunately, there are several apps available that expose the on-board sensors of your iOS device.

Below I will walk through how we collected data for the activity classifier model from a bunch of Apple Watches. The materials we used included the watches themselves, the Sensor Log app (to record sensor readings), and a stopwatch app (to record activity split times).

Gathering watchOS Sensor Data

We have a team of assassins working on Skafos, so I figured I would ask a couple of them to participate in a data collection experiment around the office. Our goal was to generate a bunch of data that looks like this:

Figure 2. Sample motion-sensor data taken from a watchOS device.

What you see above are tri-axial readings from the watch’s accelerometer and gyroscope, taken every 10th of a second while performing different activities. We also used a stopwatch app to record the split times between each activity and, with some python-pandas magic, merged the two time series. We’ll cover that in more detail later on.
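To preview what that merge looks like, here is a toy example with made-up readings. The timestamps and column names are illustrative, not the exact Sensor Log schema:

```python
import pandas as pd

# Toy sensor readings (10 Hz) and stopwatch-style activity labels.
sensor = pd.DataFrame({
    "loggingTime": pd.to_datetime(
        ["2019-01-01 09:00:00.0", "2019-01-01 09:00:00.1",
         "2019-01-01 09:00:05.0", "2019-01-01 09:00:05.1"]),
    "accelX": [0.01, 0.02, 0.50, 0.55],
})
labels = pd.DataFrame({
    "loggingTime": pd.to_datetime(
        ["2019-01-01 09:00:00.0", "2019-01-01 09:00:05.0"]),
    "activity": ["sitting", "moving"],
})

# Each reading gets the most recent activity label at or before its timestamp.
merged = pd.merge_asof(sensor, labels, on="loggingTime")
print(merged["activity"].tolist())  # ['sitting', 'sitting', 'moving', 'moving']
```

Both frames must be sorted by the merge key for `merge_asof` to work, which is why split times and sensor logs are kept in timestamp order.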

Back to our process… While the time collecting data was rather enjoyable, I had to be clear that we needed to follow a strict set of guidelines to limit the amount of noise in our data.

The Guidelines

  1. Each device must have the Sensor Log app configured to poll sensor data at 10Hz (10 times per second) and export data in CSV (comma separated value) format.
  2. The collection will take place in 2 rounds of 3 sessions. Each session will produce a CSV file for each user like the one in figure 2.
  3. A time keeper (that’s me) will record split times using a stopwatch at the transition between each pre-determined activity in a separate log.
  4. All transitions between activities (sitting, standing, moving) must happen together.

The amount of data you gather depends on how accurate you need the model to be. In general, the more training data collected, the better. Likewise, the more unique users contributing data, the better.

While collecting data, we sat at the main room table and talked, walked around the office floor, stood around a whiteboard, worked at our desks, took a walk on the beautiful pedestrian mall outside, gathered around a map, oh… and we even did some calisthenics!

With the right group of people, gathering sensor data can be fun!

Preparing watchOS Sensor Data for ML

Those of you in the Machine Learning or Data Science community know that most of the time spent building cool models is actually allocated towards preparing training data. Fortunately, the Sensor Log app makes it easy to export data from the device, so our data collection efforts were not in vain!

Just hit the “send to iPhone” button!

Each person exported their CSV files from watch to iPhone to MacBook (thanks, AirDrop). I also copied the stopwatch split times to a text editor where I created a JSON snippet. We were officially in business to do some data wrangling.

Our data collection effort generated two types of data to handle:

  1. Sensor Data Logs: CSV files containing sensor readings from the watches.

Figure 3. Sensor Data Logs generated by the Sensor Log App.

  2. Activity Label Logs: JSON snippets containing activity labels and a timestamp for each round and session.

Figure 4. Activity Label Logs in JSON format, associated with a round and a session of collection.
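The exact shape of our label log isn't reproduced here, so the snippet below assumes a hypothetical JSON layout (the `round`, `session`, and `labels` field names are my own) just to show how such a log can be pulled into pandas:

```python
import json
import pandas as pd

# Hypothetical activity label log; the real field names may differ.
label_log = json.loads("""
{
  "round": 1,
  "session": 1,
  "labels": [
    {"loggingTime": "2019-01-01 09:00:00.0", "activity": "sitting"},
    {"loggingTime": "2019-01-01 09:00:05.0", "activity": "standing"}
  ]
}
""")

labels = pd.DataFrame(label_log["labels"])
labels["loggingTime"] = pd.to_datetime(labels["loggingTime"])
print(labels["activity"].tolist())  # ['sitting', 'standing']
```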

Training an activity classifier requires sensor data to be accompanied by activity labels (figure 4). Therefore, part of the data cleansing process was figuring out how to join the sensor data logs to the activity label logs. This is why recording accurate split times at each activity transition was so critical. The cleaning algorithm followed these steps:

  1. Load all files and split by type
  2. For each round, group files by session
  3. For each session and each file within a session, grab the needed columns, and load the matching activity label log
  4. Perform an “as-of” merge with Pandas on the loggingTime field
  5. Append chunks to the final dataset

All Python code to perform the cleaning & joining algorithm is shared below.

Figure 5. Data cleaning and joining python code.
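In case the embedded code doesn't render for you, here is a minimal sketch of the same five steps. The file naming scheme, column names, and label-log layout are assumptions for illustration, not the exact formats we used:

```python
import glob
import json
import pandas as pd

# Columns kept from each sensor log; actual Sensor Log column names may differ.
SENSOR_COLS = ["loggingTime", "accelX", "accelY", "accelZ",
               "gyroX", "gyroY", "gyroZ"]

def load_labels(path):
    """Load one activity label log (assumed JSON layout) as a sorted DataFrame."""
    with open(path) as f:
        log = json.load(f)
    labels = pd.DataFrame(log["labels"])
    labels["loggingTime"] = pd.to_datetime(labels["loggingTime"])
    return labels.sort_values("loggingTime")

def build_dataset(data_dir):
    """Join each sensor CSV to its matching label log and stack the chunks."""
    chunks = []
    # Assumed naming: round<r>_session<s>_<user>.csv with round<r>_session<s>.json
    for csv_path in sorted(glob.glob(f"{data_dir}/round*_session*_*.csv")):
        sensors = pd.read_csv(csv_path, usecols=SENSOR_COLS)
        sensors["loggingTime"] = pd.to_datetime(sensors["loggingTime"])
        sensors = sensors.sort_values("loggingTime")
        labels = load_labels(csv_path.rsplit("_", 1)[0] + ".json")
        # "As-of" merge: each reading gets the most recent activity label.
        merged = pd.merge_asof(sensors, labels, on="loggingTime")
        chunks.append(merged)
    return pd.concat(chunks, ignore_index=True)
```

Each chunk carries the six sensor columns plus the merged `activity` label, which is exactly the shape the training step in part 2 expects.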


Applying the algorithm in figure 5 to our raw data produced a pretty dataset that we will use to train an activity classifier model. The model will take as input 6 columns of sensor data and try to predict the associated activity.

Figure 6. Slice of the final activity dataset that we can use to train an activity classifier model.

What’s Next?

Hopefully you learned a little bit about collecting and preparing ML-ready activity data for an activity classifier model. The next post in this series will address how to use this data to create the model on the Skafos platform. Stay tuned!