TandemTrack: Combining Smartphones and Smart Speakers

Yuhan Luo
Sparks of Innovation: Stories from the HCIL
May 18, 2020

Shaping a consistent exercise experience by complementing a mobile app with a smart speaker.

This article summarizes our CHI 2020 paper on Shaping Consistent Exercise Experience by Complementing a Mobile App with a Smart Speaker, by Yuhan Luo, Bongshin Lee, and Eun Kyoung Choe.

The TandemTrack app and Alexa skill.

Voice interaction has been rapidly integrated into people’s daily lives in recent years, especially with the introduction of smart speakers such as Amazon Echo and Google Home. As a result, the number of third-party applications for smart speakers has been growing steadily. Taking Amazon Alexa as an example, the platform provides numerous “skills” for many kinds of activities. In particular, skills listed under “Health & Fitness” support in-home exercise training (e.g., 7-minute Workout and 30-day Push-up Challenge).

As of May 2020, more than 2,000 Alexa skills are listed under the “Health & Fitness” category.

In delivering in-home exercise training, smart speakers have the following advantages:

  1. The hands-free interaction can lower the burden of capturing exercise data during exercise.
  2. A voice reminder can be more noticeable than a mobile notification at home, where people do not always carry their phones.
  3. Smart speakers’ lack of mobility can aid in creating exercise routines in a consistent location.

The downside of smart speakers is the limited visual feedback they can provide for an exercise session. Mobile fitness apps (e.g., Nike Training Club and SWORKIT), on the other hand, provide a rich visual interface but lack the hands-free convenience of voice interaction.

Given the pros and cons of the smart speaker and the mobile phone, we examined how the two devices can complement each other in supporting consistent exercise at home. Incidentally, this activity is increasingly important in quarantine times such as these, when many people around the world are confined to their homes.

TandemTrack

We designed and developed TandemTrack, a multimodal system that couples a mobile app (the TandemTrack app) with an Amazon Alexa skill (the TandemTrack skill). To let people interact with a smart speaker and a mobile app in a simple exercise context, both modalities support the same key features: an exercise regimen alternating between sit-ups and push-ups, data capture for exercise performance, exercise feedback, and daily reminders.

Exercise Flow

The entire exercise session on TandemTrack lasts for three minutes and 30 seconds, including three exercise sets (either sit-up or push-up) and two breaks in between. Each set lasts for 30 seconds and each break lasts for 60 seconds.
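The timing above can be checked with a small sketch. The Python below is illustrative only and is not taken from the TandemTrack implementation; the names `Segment`, `build_session`, and `total_seconds` are hypothetical. It lays out three 30-second sets separated by two 60-second breaks and confirms that they add up to three minutes and 30 seconds.

```python
from dataclasses import dataclass

# Durations stated in the article.
SET_SECONDS = 30
BREAK_SECONDS = 60
NUM_SETS = 3


@dataclass
class Segment:
    kind: str      # "set" or "break"
    start: int     # offset from the start of the session, in seconds
    duration: int  # length of the segment, in seconds


def build_session(num_sets=NUM_SETS, set_seconds=SET_SECONDS,
                  break_seconds=BREAK_SECONDS):
    """Return the ordered segments of one session (hypothetical helper)."""
    segments = []
    t = 0
    for i in range(num_sets):
        segments.append(Segment("set", t, set_seconds))
        t += set_seconds
        # A break follows every set except the last one.
        if i < num_sets - 1:
            segments.append(Segment("break", t, break_seconds))
            t += break_seconds
    return segments


def total_seconds(segments):
    """Sum the durations of all segments."""
    return sum(s.duration for s in segments)


session = build_session()
minutes, seconds = divmod(total_seconds(session), 60)
print(f"{len(session)} segments, {minutes} min {seconds} s")
# prints: 5 segments, 3 min 30 s
```

Keeping the breaks between sets only, rather than after every set, is what yields two breaks for three sets.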

Let’s look at the exercise flow on the TandemTrack skill: the user receives a reminder, tells Alexa to open the TandemTrack skill, then uses the voice command “start trainer” to start exercising.

The exercise flow on the TandemTrack Alexa skill.

At the end of each set, the TandemTrack skill prompts the user to say how many sit-ups or push-ups they completed and takes the voice response as input. When the user completes all three sets, the TandemTrack skill records their exercise information and gives a confirmation message: “today’s training is completed, you’ve done a total of X sit-ups/push-ups. Great job!”

On the TandemTrack app, the exercise flow is similar: the user first receives a mobile notification, then opens the app to start exercising.

The exercise flow on the TandemTrack mobile app.

Exercise Feedback

With the TandemTrack skill, people can ask general questions about their exercise using voice commands such as “how was my workout?”, or specific questions such as “how many push-ups did I complete yesterday?”

On the TandemTrack app, the interface presents several visualizations of people’s exercise data, including their daily exercise repetitions (A), a summary of longest streak and total workouts (B), and a series of aggregated views: the exercise streak view (C1), sit-up progress (C2), and push-up progress (C3).

Exercise feedback on the TandemTrack mobile app.
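As a rough illustration of the summary in view (B), the sketch below derives total workouts and longest streak from a list of workout dates. It is not the app's actual code. It assumes that a streak is a run of consecutive calendar days with a completed session, that at most one session is counted per day, and that an average streak is the mean length of those runs; the function names `streak_lengths` and `summarize` are hypothetical.

```python
from datetime import date, timedelta


def streak_lengths(workout_days):
    """Return the length of each run of consecutive days, oldest first."""
    runs = []
    previous = None
    for day in sorted(set(workout_days)):  # assumption: one session per day
        if previous is not None and day - previous == timedelta(days=1):
            runs[-1] += 1      # the day extends the current streak
        else:
            runs.append(1)     # a gap (or the first day) starts a new streak
        previous = day
    return runs


def summarize(workout_days):
    """Compute the summary numbers under the assumptions stated above."""
    runs = streak_lengths(workout_days)
    return {
        "total_workouts": len(set(workout_days)),
        "longest_streak": max(runs, default=0),
        "average_streak": sum(runs) / len(runs) if runs else 0.0,
    }


# Made-up example data: three days in a row, one day off, then two days.
days = [date(2020, 5, d) for d in (1, 2, 3, 5, 6)]
print(summarize(days))
# prints: {'total_workouts': 5, 'longest_streak': 3, 'average_streak': 2.5}
```

Sorting and de-duplicating the dates first means the input can arrive in any order and repeated entries for the same day do not inflate a streak.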

The Four-Week Field Study

To examine the added value of a smart speaker to a mobile app in supporting in-home exercise, we conducted a four-week study with 22 participants. Half of the participants used only the mobile app, while the other half used both the mobile app and the Alexa skill.

Throughout the four weeks, we found no significant difference between the two groups regarding their completed exercise sessions, longest streak and average streak, which led us to further examine how participants used the TandemTrack app and the skill to exercise.

Usage of Exercise Regimen & Data Capture

Among the 11 participants who used both the app and the skill, we found three usage patterns:

  • Five participants preferred using the TandemTrack skill.
  • Four participants preferred using the TandemTrack app.
  • Two participants never interacted with the TandemTrack skill.

Personal Preferences: Participants highlighted that hands-free voice interaction made data capture easy, especially for push-ups, which require extensive hand use. Those who were fitness fanatics noted that the voice-based exercise regimen reduced distraction, so they could focus on their performance instead of looking at their phones.

On the other hand, two participants did not interact with the TandemTrack skill at all, because they did not feel comfortable talking to Alexa. One of the participants remarked:

“It’s something like almost human but not. The way to interact with it is not intuitive, it’s awkward.”

Proximity and Exercise Space: To perform sit-ups and push-ups effectively using the TandemTrack skill, participants needed enough space while staying close enough to the Echo Dot. The exercise setting, however, was not always ideal. For example, one participant rearranged her room during the study, which narrowed the space where she kept the Echo Dot and did not leave enough room for her to exercise with the skill.

Social Context: When the Echo Dot was placed in a location that other people could access, interacting with the device became a “public” rather than a “private” activity. Some participants worried that talking to Alexa might disturb other house members; others considered exercise a personal activity and did not want others to know that they were exercising.

In addition, how other house members interacted with the Echo device affected participants’ device preference. For example, a participant who preferred the app over the skill mentioned:

“Ever since my four-year old son found he could ask Alexa ‘Knock Knock Jokes,’ he kept shouting at it every day. My husband got really annoyed so he unplugged the Echo a couple of times.”

Exercise Feedback

Visual feedback on the app: Compared with those who used only the TandemTrack app, participants who used both the app and the skill spent less time reviewing their exercise feedback on the app, especially when they exercised using the skill.

Voice feedback on the skill: We found that participants who used both the TandemTrack app and the skill rarely asked the skill questions about their exercise data. They explained that they did not feel it necessary to ask Alexa questions, given that the visual feedback on the app already provided sufficient information. Often, they forgot which voice commands to use and did not find the voice responses insightful.

Lessons Learned

Through the four-week study, we learned that simply complementing a mobile app with a smart speaker does not increase people’s exercise adherence, but that many opportunities exist for enriching their exercise experience.

Supporting diverse exercises: Hands-free interaction could help extend TandemTrack to exercises beyond sit-ups and push-ups, such as planks and body pumps, which involve intensive use of the hands.

Optimizing exercise performance: People’s performance can be optimized by tailoring the primary interface to their exercise skill level. For example, exercise novices can focus on the visual interface to learn proper postures and pace, while exercise experts who are already familiar with the exercise flow can focus on the voice interface to minimize distraction.

Delivering multimodal feedback: To enrich the exercise feedback, it is important to leverage the advantages of both voice and visual modalities in a synergistic manner. For example, we can use the mobile app as the primary interface for the exercise data overview, while enabling people to ask specific questions through voice interaction. In addition, instead of waiting for people to initiate a query, the skill can proactively prompt them, especially before or after exercise.

Building an integrated, multimodal exercise experience: While using a smart speaker to complement a mobile app, it is equally important to use the mobile app to complement the smart speaker. Ideally, people should be able to control their progress through either device during exercise. For example, even if people start exercising with the TandemTrack skill, they should be able to use the TandemTrack app to pause the session or input their exercise data.
