Hacking The Peloton Bike To Play A Cycling Video Game

Cezar Babin
8 min read · Apr 15, 2020


During the quarantine, my workout routine has been almost entirely replaced by the Peloton bike. I am a big fan of the product and have been very impressed with the quality bar of the entire experience. One observation, however, is that there are a lot of missed opportunities to gamify the workouts.

Aside from ranking all class participants by their wattage output and incentivizing workout streaks, Peloton doesn’t do anything else to make the experience more like a game. It’s a shame, especially since the big display could make for an immersive gaming experience. So, I decided to hack together a prototype to see what that feels like.

tl;dr: this is what the final experience looked like.

Hacking the project together

Even though Peloton runs on a flavor of Android, you can’t install any apps on it. Peloton doesn’t allow any outside developers to build apps for its bike. The bike is solely intended for the workouts produced in the company’s studio. One thing you can do, however, is run a browser from the debug menu.

Having the browser opens the door to a bunch of experiences. There is one big issue remaining: in order to make any immersive cycling video game, I needed access to the bike’s power output, or at least RPM data from the pedals. Again, Peloton doesn’t offer an SDK for that (although it easily could). How do we get around that?

Tracking the pedal motion

The easiest way to do this would probably be to use some type of accelerometer that streams data to the game. But any additional hardware is a pain to deal with, and it would be a huge barrier for anyone else who wanted to try this at home. It would be much easier to use a phone’s camera, conveniently placed next to the pedals, to track how fast you are pedaling.

Luckily, I have a lot of experience running ML on mobile devices, so I went ahead and tried a couple of ways to identify the pedaling motion in a video from my iPhone.

Attempt #1 — Object tracking with iOS Vision Framework

The iOS Vision SDK has some built-in functionality for tracking objects across frames. All you have to do is select the item you want to track. Apple’s approach relies on a mix of deep learning and old-school CV to run the detection. Surely, this should work just fine.

It turns out that, in practice, it doesn’t work so well. After a couple of frames, the tracker loses the object or mistakes it for other objects in the frame.
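Vision’s API is Swift-only, so here is a rough Python/OpenCV analogue of the same select-and-track approach. The CSRT tracker from opencv-contrib is a stand-in for Vision’s object tracker, and the file name is a placeholder:

```python
import cv2

# Rough analogue of the "select an object, then track it" approach.
# Requires opencv-contrib-python; CSRT stands in for Vision's tracker.
cap = cv2.VideoCapture("pedaling.mp4")  # hypothetical clip of the pedals
ok, frame = cap.read()

# Manually select the cycling shoe in the first frame.
bbox = cv2.selectROI("select shoe", frame)
tracker = cv2.TrackerCSRT_create()
tracker.init(frame, bbox)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, box = tracker.update(frame)
    if found:
        x, y, w, h = map(int, box)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(30) == 27:  # Esc to quit
        break
```

General-purpose trackers like this suffer from the same failure mode: fast, blurry motion makes them drift onto whatever nearby patch looks most similar.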

Attempt #2 — Semantic Segmentation with CoreML

The tracking algorithm from the Vision framework is probably generalized to work OK, but not great, for most use cases. I wanted to try something more tailored, so I started considering building a neural net that zeroes in on the cycling shoe for more precise tracking. For prototyping purposes, I ran a couple of pre-trained segmentation models to see what comes out.

Semantic Segmentation On Cycling Image

I noticed a few problems straight out of the gate, even without building a model tailored to my use case. First, segmentation models run very slowly (about 100 ms per frame). Second, once the pedaling motion starts, motion blur makes it very difficult to maintain high accuracy, especially since the color of the cycling shoes sometimes blends into the bike. This showed some promise, but it would have taken a lot of work to get reasonable performance out of it.
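For reference, running a pre-trained segmentation model over a single frame takes only a few lines. My prototype used CoreML models on-device, so treat this torchvision sketch as an illustration of the idea rather than the exact setup:

```python
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50
from PIL import Image

# Pre-trained DeepLabV3 (newer torchvision versions take weights=...
# instead of pretrained=True). Class 15 in the Pascal VOC label set
# is "person".
model = deeplabv3_resnet50(pretrained=True).eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

frame = Image.open("cycling_frame.jpg").convert("RGB")  # hypothetical still
with torch.no_grad():
    out = model(preprocess(frame).unsqueeze(0))["out"][0]
mask = out.argmax(0) == 15  # boolean mask covering the rider
```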

Attempt #3 — Optical flow

Computer vision doesn’t work well in unconstrained environments, which makes it very difficult to build something that works reliably in most scenarios. Fortunately, this specific use case of tracking pedal motion has a couple of constants that can easily be baked into the algorithm’s assumptions; most importantly, that some regular circular motion exists in the frame at all times.

With all of this in mind, I fell back on an old-school CV technique: optical flow tracking. At its core, optical flow consists of identifying corners and other standout features in an image and then tracking those features over time through a video. My first attempt was fairly successful, but as you can see below, there was a lot of noise.

These artifacts are partly due to the fact that the features detected and tracked throughout the video are picked indiscriminately, regardless of what they belong to. What if we could filter out movements that are not in sync with the average movement in the image? If we take the average of all displacements across a series of frames, we can easily detect outliers.

This approach turned out to be successful in removing most of the noise. Once I had that, I looked at the normalized displacement of all points in the image in order to detect regular motion.
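Here is a minimal sketch of that pipeline with OpenCV: Shi-Tomasi features plus pyramidal Lucas-Kanade flow, with a simple two-sigma outlier filter (the file name and thresholds are illustrative):

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("pedaling.mp4")  # hypothetical clip of the pedals
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# Detect corners / standout features to track (Shi-Tomasi).
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100,
                              qualityLevel=0.3, minDistance=7)

signal = []  # one averaged displacement per frame
while True:
    ok, frame = cap.read()
    if not ok or pts is None or len(pts) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Track the features into the new frame (pyramidal Lucas-Kanade).
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    good_new = new_pts[status.flatten() == 1]
    good_old = pts[status.flatten() == 1]

    # Per-feature displacement; drop outliers that move out of sync
    # with the average motion in the frame (> 2 standard deviations).
    d = (good_new - good_old).reshape(-1, 2)
    mean, std = d.mean(axis=0), d.std(axis=0) + 1e-6
    inliers = np.all(np.abs(d - mean) < 2 * std, axis=1)
    signal.append(d[inliers].mean(axis=0))

    prev_gray, pts = gray, good_new.reshape(-1, 1, 2)
```

The vertical component of `signal` is essentially the wave in the plot below, and it is what feeds the cadence estimate.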

Y-axis: normalized coordinates of the cycling shoe in motion. X-axis: frame number.

From here, the RPM is easily deducible just by looking at the wave’s frequency over time. In other words, I could just track when the motion of the cycling shoe completes a full loop, which is easy given the assumption that all salient motion in the image is circular.
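A minimal sketch of that estimate, assuming the per-frame displacement has been reduced to a 1-D numpy array sampled at the camera’s frame rate (one pedal revolution per wave period):

```python
import numpy as np

def cadence_rpm(signal, fps):
    """Estimate pedaling cadence from a periodic 1-D motion signal."""
    centered = signal - signal.mean()
    spectrum = np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(len(centered), d=1.0 / fps)
    peak_hz = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
    return peak_hz * 60.0  # revolutions per minute

# Sanity check: a synthetic 1.5 Hz wave at 30 fps comes out near 90 RPM.
t = np.arange(0, 10, 1 / 30)
print(cadence_rpm(np.sin(2 * np.pi * 1.5 * t), fps=30))  # ~90.0
```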

Building the game

The Peloton runs on a MediaTek MT8173 SoC with 2 GB of RAM. The performance of the chip is roughly comparable to that of an iPhone 5s: not a lot of power to run a video game at a high resolution and frame rate. The SoC does include a video decoder with 4K and H.265 support, which makes sense given that the main function of the device is to stream workout classes. So how do you build a game for a device that is built for streaming?

Google recently released a service doing just that. Google Stadia lets you play games that run on servers in the cloud, streamed to your screen as video. All you need is a decent internet connection, a good Wi-Fi router and a device that can decode video at high speed, like the $35 Google Chromecast dongle.

A game running in the cloud, where every input and frame has to travel over the internet, responds more slowly than one running on a local console. Google solves this problem with a couple of tricks:

  1. Rendering games at high frame rates
  2. Connecting customers to servers that are as close to them as possible
  3. Using some form of predictive system that algorithmically guesses the player’s button inputs and renders frames early, before sending them to the screen

It turns out this type of system is a perfect match for a Peloton video game. The bikes are connected to Wi-Fi by default and have a chip built for high-res video streaming. And building a predictive system for user inputs in a cycling game is much easier than what Google has to do for the modern games it streams.

The server

WebRTC seemed to be the best protocol for this kind of application. I decided to build a proof-of-concept server in Go for its high performance, leveraging the Pion WebRTC library (https://github.com/pion/webrtc/) and an x264 encoder (https://github.com/gen2brain/x264-go). For simplicity, I hosted the server with AWS Fargate and put it in us-east-1 for proximity.
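The Go source isn’t reproduced here, but the shape of the server is easy to show. This sketch swaps Pion for aiortc, a Python WebRTC library, and uses a file as a placeholder for the game’s encoded video capture: accept an SDP offer over HTTP, attach the video track, answer.

```python
from aiohttp import web
from aiortc import RTCPeerConnection, RTCSessionDescription
from aiortc.contrib.media import MediaPlayer

pcs = set()  # keep peer connections alive for the session

async def offer(request):
    # The browser on the bike POSTs its SDP offer here.
    params = await request.json()
    pc = RTCPeerConnection()
    pcs.add(pc)

    await pc.setRemoteDescription(
        RTCSessionDescription(sdp=params["sdp"], type=params["type"]))

    # Placeholder source standing in for a live capture of the game window.
    player = MediaPlayer("game_capture.mp4")
    pc.addTrack(player.video)

    await pc.setLocalDescription(await pc.createAnswer())
    return web.json_response({"sdp": pc.localDescription.sdp,
                              "type": pc.localDescription.type})

app = web.Application()
app.router.add_post("/offer", offer)
web.run_app(app, port=8080)
```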

The game

I had already built a cycling game in Unity a couple of years ago, but it was designed to be played on a mobile device. The game used a computer vision algorithm developed by BitGym to detect a player’s cycling speed from their shoulder and head movements. It never took off: playing a game on a crappy indoor cycling trainer, on a small display, coupled with the imprecise algorithm for tracking user speed, made for a very poor user experience.

The game only needed some minor modifications. I changed the input mechanism to let events from my new motion tracking algorithm control the gameplay, increased the game’s default frame rate to facilitate streaming, and made some changes so the game could run on a Linux machine.

Gameplay basics

Behind the scenes, the game is just a high-intensity interval workout. Users are shown how far off they are from the target intensity…

The green zone ahead of the player’s ball is the target intensity zone.

…and get rewarded for staying in the high-intensity zone.

Users get rewarded for staying in the target intensity zone for an extended period of time.

Putting everything together

At this point, I had:

  1. A mobile app that tracks the cycling speed using the phone’s camera
  2. A server for rendering and streaming the video game
  3. A prototype of the game

All I had to do was integrate everything end-to-end for the experience to work. For this, I used a very basic WebSocket server that lets the game runtime receive motion events from the motion tracking app. Below is a diagram of the data flow for the whole system.
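The relay itself can be tiny. Here is a sketch using Python’s websockets package (version 10 or newer), assuming the tracking app and the game runtime both connect as clients and every motion event is fanned out to the other peers:

```python
import asyncio
import websockets

clients = set()

async def relay(ws):
    # Both the tracking app and the game runtime connect here; every
    # motion event (e.g. a JSON blob like {"rpm": 92}) is forwarded
    # to all other connected peers.
    clients.add(ws)
    try:
        async for message in ws:
            for peer in clients - {ws}:
                await peer.send(message)
    finally:
        clients.discard(ws)

async def main():
    async with websockets.serve(relay, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever

asyncio.run(main())
```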

End result

In this under-optimized setup, the perceived streaming latency ended up being about 150 ms. That is just enough for the lag to be noticeable, as the human eye can perceive latencies above roughly 100 ms. And this does not account for the time it takes to track the motion of the pedals and send it to the server, which probably adds another 50 ms, for roughly 200 ms end-to-end between a change in cadence and its reflection in the game.

Overall, however, I think this is a good start that proves the feasibility of gaming on the Peloton bike. For the best user experience, Peloton should open up its platform and give external developers a way to read the bike’s cadence and resistance. Even though my computer vision algorithm showed promise, there would be a lot of hoops to jump through to make it production-ready. There are a lot of ways to mess up computer vision in the wild, not to mention that users have the cumbersome extra step of setting up their phone to enable the tracking.

Having this interactive medium opens the door to some intriguing possibilities: you can tailor workouts to the user, track fitness level more precisely over time and create new ways to make the experience more competitive. Not to mention, it’s a game! The gameplay mechanics listed above are only a glimpse of what can be done.

Of course, on the other end, you miss the human touch. Peloton classes are half meditation sessions, half workouts. The instructors and the atmosphere they create are a big reason for the product’s success. The constant encouragement, music choices and tribe-like feeling are hard to replace. That said, there is probably room for both types of experiences, and they can complement each other.

If you liked this and would like to get updates on this project, sign up below or DM me on Twitter https://twitter.com/cezarbabin.
