A Richer Activity, Part 2

Zachary Isaacs
Published in strava-engineering
May 31, 2019

Displaying data from the activity data service in the Strava feed

Background

Over the last year, the Strava API Team has been working to build a service which can persist new types of activity data. This is the second in a series of blog posts about this initiative. Part 1 provided motivation for a more flexible activity data store at Strava, covered the implementation of the new service, and left off with examples of displaying a new data field for a single activity, such as the number of runs for alpine snow sports. However, our real goal for this project was to get data from the activity data service into the Strava feed.

In this part, we’ll discuss how we integrated the activity data service with our monolithic application and displayed its data in the Strava feed, as well as some of the pitfalls we encountered along the way. To show that data in the feed, we had to be efficient about how we called the service when compiling feed data.

Making it too easy to hit the activity data service

Once we were ready to integrate the activity data service with our monolithic Rails application, the hottest topic of debate was how the monolith would read data from the activity data service. Although we are currently migrating our object models to our own lightweight relational mapping system, Activity is still an ActiveRecord model (ActiveRecord being the object-relational mapping system that ships with Rails). We decided we would try to make data from the activity data service appear like a native activity attribute that was persisted in our older MySQL activities table.

Whenever we wanted to add a new value type to the activity data service, we would add the value type to an enum in the service’s thrift file. The value type would then be available to both the activity data server and the client interface gem.
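As an illustration, Thrift’s Ruby code generation exposes an enum as a module of integer constants, so adding a value type amounts to adding a constant that both the server and the regenerated client gem pick up. The sketch below is hypothetical; the names and numbering are illustrative rather than our actual interface.

# Hypothetical sketch of the Thrift-generated Ruby enum; names and
# numbering are illustrative, not the actual Strava interface.
module ActivityData
  module ValueType
    CALORIES = 1
    ALPINE_RUN_COUNT = 2
    # Each new field becomes a new constant, visible to the activity
    # data server and to any client using the regenerated gem.
  end
end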

When we initially integrated the activity data service with our monolith, we decided the first field we would test end to end was “calories.” Calories was not denormalized in our old MySQL activities table. This meant that every time we wanted to display calories we had to do an expensive calculation using an activity’s raw time-series data, which we at Strava call streams. It also meant we had no way of displaying calories for activities without streams. When an activity had streams, showing calories for a single activity was slow but acceptable; for any display with multiple activities, such as the Strava feed, it was prohibitively slow.

Before integrating the activity data service into the monolith, our implementation for calories looked like this:

class Activity
  def calories
    # Calories were never persisted, so derive them from the activity's
    # raw time-series data (streams) on every call.
    if streams?
      calculate_calories
    end
  end
end

There were several problems with this implementation. As mentioned before, the streams-based calculation was expensive, and that performance cost prevented us from being able to show calories on the feed. Another issue was that devices often do their own calorie calculations, but we had no ability to save device calories. Over the years we got thousands of support tickets asking why calories on Strava were different than calories on a device. Furthermore, this implementation only worked for activities with streams. This streams requirement was acceptable when Strava only cared about rides and runs, but now Strava supports dozens of sport types, many of which cannot be easily recorded with stream data. For example, there are gym apps that count reps and use that data to estimate calories.

Once we integrated the activity data service with the monolith, the first thing we did was persist calories as reported by the uploading app or device. Next, we updated our calories method to read from the activity data service.

class Activity
  def calories
    # Prefer the value persisted in the activity data service.
    activity_service = ActivityDataService.new
    service_calories = activity_service.get_calories(self)
    return service_calories if service_calories

    # Fall back to the expensive streams-based calculation.
    if streams?
      calculate_calories
    end
  end
end

This solution overcame almost all the problems described with our initial calories implementation. We avoided doing the expensive streams calculation for new activities, we made calories on Strava match calories on the recording device, and we were able to efficiently retrieve calories from our activity data service. But we hadn’t accounted for streams-based activities uploaded before we spun up the activity data service. These older activities had no persisted calories data to facilitate efficient display.

In order to show calories in the feed, we had to be smart about what to do with calories from older activities. Strava passed 2 billion activities last year. At that scale, it was untenable to backfill all of those activities just so we could show calories in the feed.

Our solution to show calories for older activities was to implement what we called a “lazy backfill.” Whenever we called activity.calories, we would check the activity data service for a calories value and return the value if there was one. If the activity data service had no calories value but the activity had streams, we’d calculate the activity’s calories, insert that value into our activity data service, and then return the calculated value. Our final implementation looked like this:

class Activity
  def calories
    activity_service = ActivityDataService.new
    service_calories = activity_service.get_calories(self)
    return service_calories if service_calories

    # Lazy backfill: compute calories from streams once, persist the
    # result in the activity data service, then return it.
    if streams?
      calories = calculate_calories
      activity_service.insert_calories(self, calories)
      calories
    end
  end
end

In the worst case we would only ever do our expensive streams-based calculation once per activity. A drawback of this approach is that we would not display the calories value calculated by the recording device. However, completely reparsing the raw file for older activities was too expensive, so we decided to continue to use the streams calculation for older activities. Even though device calories would not match the calories shown on Strava for older activities, at least the calories value on Strava would not suddenly change, which also could have caused confusion among our athletes. With this infrastructure in place, we felt comfortable putting calories in the feed for certain activity types.

Our first attempt at putting calories in the feed did not go smoothly. A different team was responsible for the feed. They knew that calories was now a denormalized field in our service and that they could access it by calling activity.calories for each activity. Before rendering a feed, the feed code automatically bulk loaded every column in the activities table for each activity in the feed, and the feed team assumed the same was true for calories retrieved from the activity data service. However, calories was retrieved from the activity data service at the individual activity level, not the feed level, so those values were not bulk loaded. Unaware of this problem, the feed team rolled out calories to the feed and quickly noticed a roughly 20x increase in the number of calls made to our activity data service, as well as site-crippling latency.

The problem was that our implementation instantiated a new connection to the activity data service for each activity in the feed (activity_service = ActivityDataService.new), so we had introduced, and been bitten by, an N+1 query. We had made it really easy for an engineer to get an activity’s calories, but we had also hidden the fact that the activity data service was called under the hood whenever someone called activity.calories.

To prevent this type of N+1 query from happening again, we decided to stop making the source of the data opaque to the caller. We did this by removing calls to the activity data service from the activity.calories method and moving them into a feed-level cache object. That object held every value from our activity data service for every activity in the feed, but it was decidedly different from a standard ActiveRecord activity attribute.
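A minimal sketch of what such a feed-level cache could look like, assuming a hypothetical batch method (get_values) on the service client and illustrative class names:

# Hypothetical sketch: one batched request to the activity data service
# for all activities in the feed, instead of one request per activity.
class FeedActivityDataCache
  def initialize(activity_ids)
    activity_service = ActivityDataService.new
    # Assumed batch call returning { activity_id => { value_type => value } };
    # the real client interface may differ.
    @values_by_activity_id = activity_service.get_values(activity_ids)
  end

  def calories_for(activity_id)
    @values_by_activity_id.dig(activity_id, :calories)
  end
end

Because the feed renders against this object, the number of activity data service calls stays constant no matter how many activities appear in the feed.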

Backward-Compatible Upgrades

After we successfully wired up calories, we wanted to add many more fields to our service. One such field was application_id, the ID of the Strava application or third party application which uploaded the activity. We needed to record application ID so that we could know when to apply special treatment to activities uploaded by partners. We added application_id to our activity data service’s value type enum in the thrift interface, generated a gem, deployed our monolith with the updated gem, and started writing/reading application ID to/from our service. The only problem was that we had forgotten to deploy the activity data service with the new value type.

The server raised an exception each time we inserted an application ID value because it did not know about that value type, and our monitoring system soon fired an alert because of those exceptions. It’s important to note that even though exceptions were being raised, the unknown value type was still being persisted by the activity data service. At this point, strava.com was still functional. We rolled back the monolith deploy, and that was when the mess really started. Just as the service had been persisting the value type it didn’t know about, it had also been returning it when the monolith requested values to load the feed. After rolling back the monolith, the activity data service was returning a value type the monolith didn’t know about, and exceptions from the monolith became user-facing errors that prevented anyone from viewing their feeds or activities. Once we had diagnosed the issue, we deployed both the activity data service and the monolith with the updated interface so that both knew about the application ID value type, and that fixed our issues.

Moving forward, we needed a way to prevent this issue from happening again. It was unlikely that someone would again forget to deploy the server before integrating against new value types on the client, but it was nonetheless worth protecting against. However, the same issue would have happened if we had rolled back the monolith deploy because of an unrelated exception in the deploy diff, and that was guaranteed to happen eventually.

Revised Approach

To realize the full potential of our activity data service, we needed to improve the resiliency of both the monolith (acting as a client) and the activity data server. We continued to have the server save values for unknown types, but we logged whenever this happened and did not return values for unknown types on read requests. On the client we created a list of known value types in addition to the list provided by the service’s interface gem. The client would log unknown value types but only allow callers to access known types. Lastly, we established process guidelines for adding a new value type:

  • Add the value type to the thrift interface
  • Deploy the server with the updated interface
  • Bump the gem version and redeploy the monolith
  • Read/write values of the new type in a subsequent monolith deploy
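As a rough illustration of the client-side guard described above (names here are hypothetical, not our actual code), restricting reads to known value types might look something like this:

# Hypothetical sketch: only value types the client explicitly knows about
# reach callers; anything else is logged and dropped instead of raising.
KNOWN_VALUE_TYPES = [:calories, :application_id].freeze

def filter_known_value_types(values)
  values.each_with_object({}) do |(type, value), known|
    if KNOWN_VALUE_TYPES.include?(type)
      known[type] = value
    else
      # Logged so we notice interface drift, but never surfaced as an
      # exception that could become a user-facing error.
      Rails.logger.warn("Activity data service returned unknown value type: #{type}")
    end
  end
end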

The activity data service now has 35 different value types, and every product team at Strava is using it to store information they need. Number of ski runs, perceived exertion, and activity visibility are just a few examples of fields we are now able to store for any activity. Every scalar field on the activities table is being stored in the activity data service for new activities, which has brought us closer to being able to retire the activities table altogether.

Conclusion

In this post, we discussed how we integrated our new activity data service with our monolithic application. We also discussed the benefits of providing data at the feed level and the pitfalls of providing data at the individual activity level. To further improve the performance of the activity data service, we’re currently building a Redis cache into the service so we don’t have to hit Cassandra to serve every request. Look out for a third installment of this miniseries in a future Strava Engineering blog post.
