AN EVENTFUL SUMMER AT STRAVA

Bisman Sodhi
Published in strava-engineering
4 min read · Jan 8, 2024

Hi, my name is Bisman and I studied Computer Science at the University of California, Santa Barbara. During the summer of 2022, I had the most amazing experience working as a Software Engineer Intern on Strava’s Data Platform Team. In the first few weeks, I learned the tools my team uses, and then I spent the rest of the time working on my project.

TRACKING BAD EVENTS

For my major summer project, I created a data pipeline that pulls user behavior data out of external storage and persists it in our data warehouse. Strava uses a service called Snowplow to collect user behavior data, such as loading a club page or uploading a profile photo. Sometimes this data fails to match the schema that we’ve set; a piece of data that fails schema validation is called a bad event. Previously, these bad events were only stored temporarily in Elasticsearch. Persisting them in Snowflake, our data warehouse, makes them accessible to a wider audience. It also makes it easier to combine the bad events data with other services used at Strava.

To start my project, I used Python to create a directed acyclic graph (DAG) in Apache Airflow, a scheduling framework, that extracts bad events data from buckets in S3, AWS’s storage service, on a daily cadence. The data was stored on S3 as gzip files, which I decompressed and stored as JSON blobs. Because I was working with billions of rows, it was important to maintain data integrity and to take measures in case data failed to load from S3. I therefore loaded the data into a staging table in Snowflake first, which ensured that if loading from S3 failed, the production table would remain untouched. Once the load succeeded, the data was moved into the production table, free of any partial data. After all the data was in the production table, I created six views on top of it, one for each of the six types of bad events stored there.
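The staging pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual Strava pipeline: it uses Python’s built-in sqlite3 module as a stand-in for Snowflake and in-memory bytes as a stand-in for S3 objects, and the table names (bad_events, bad_events_staging), the payload column, and both function names are assumptions made for the example. It also assumes each gzip file holds one JSON object per line.

```python
import gzip
import json
import sqlite3


def decompress_json_lines(gz_bytes: bytes) -> list:
    """Decompress a gzip payload and parse one JSON object per non-empty line."""
    text = gzip.decompress(gz_bytes).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_via_staging(conn: sqlite3.Connection, files: list) -> int:
    """Stage every file first, then promote the whole batch to production.

    Table names here are hypothetical. If any file fails to decompress or
    parse, the exception propagates before the production table is touched.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS bad_events (payload TEXT)")
    conn.execute("DROP TABLE IF EXISTS bad_events_staging")
    conn.execute("CREATE TABLE bad_events_staging (payload TEXT)")

    # Staging step: the transaction is rolled back if any file fails.
    with conn:
        for gz_bytes in files:
            rows = [(json.dumps(event),) for event in decompress_json_lines(gz_bytes)]
            conn.executemany("INSERT INTO bad_events_staging VALUES (?)", rows)

    staged = conn.execute("SELECT COUNT(*) FROM bad_events_staging").fetchone()[0]

    # Promotion step: copy the complete batch in a single transaction.
    with conn:
        conn.execute("INSERT INTO bad_events SELECT payload FROM bad_events_staging")
    return staged
```

Because the production insert only runs after every file has been staged successfully, a corrupt or missing file leaves the production table exactly as it was before the run.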

I collaborated with our stakeholders, the data analysts, throughout this process to craft tables based on their input. Since the JSON for each type of bad event follows a different schema, I extracted the fields unique to each type. I chose to build them as SQL views to reduce data redundancy and decrease query latency. After the data was organized into the appropriate views, I created a Tableau dashboard that visualizes the number of bad events and displays important metrics. I collaborated with all members of my team throughout my project to discuss timelines, progress, roadblocks, and next steps.
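As a rough illustration of one view per event type, the helper below generates a Snowflake-style CREATE VIEW statement that pulls type-specific fields out of a JSON column using Snowflake’s `column:path::type` syntax. Everything named here is hypothetical: the function, the payload and event_type columns, the view and table names, and the example field path are assumptions for the sketch, not the real schema.

```python
def build_view_sql(view_name: str, source_table: str, event_type: str, fields: dict) -> str:
    """Build a CREATE VIEW statement for one bad event type.

    `fields` maps an output column alias to a (JSON path, SQL type) pair.
    All identifiers are assumed to be trusted constants, not user input.
    """
    columns = ",\n    ".join(
        f"payload:{path}::{sql_type} AS {alias}"
        for alias, (path, sql_type) in fields.items()
    )
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT\n    {columns}\n"
        f"FROM {source_table}\n"
        f"WHERE event_type = '{event_type}'"
    )


# Hypothetical usage: one view for a made-up "schema_violation" type.
sql = build_view_sql(
    "bad_events_schema_violations",
    "bad_events",
    "schema_violation",
    {"failed_at": ("data.failure.timestamp", "timestamp_ntz")},
)
print(sql)
```

Since a view stores only the query and not a copy of the rows, each type gets its own tailored columns without duplicating the underlying data.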

Visual representation of the pipeline

TRACKING GOOD ‘EVENTS’, TOO

My internship was not all about bad events. There were plenty of opportunities to enjoy good events, like JAMS and my team’s offsite meetup at the San Francisco office.

JAMS is Strava’s week-long hackathon. Everyone at Strava enthusiastically participates in JAMS and creates features that range from super useful for the app to something just for fun. It was an incredible experience where I worked with engineers and interns across different teams to integrate Spotify with Strava. The other interns and I would hop on a Zoom call together and try to debug Scala code, which I also learnt during JAMS! I also created a database to store an athlete’s information, the activity during which they played a song, and a list of songs they have played so far.

Although this internship was fully remote, my team organized a week-long offsite meeting at the San Francisco office. The Strava office is beautiful and exactly how you would imagine it: an indoor gym, a room full of bikes, and too many snacks and coffee flavors. During this time, my team and I worked in the mornings, spent the afternoons making wood boards as a team-building exercise, and later enjoyed dinner together.

This internship was an intense and rewarding experience, filled with opportunities to learn skills beyond my project guidelines, develop non-technical skills, and learn how my team’s work supports Strava.

Great events

Acknowledgement

Thank you to my manager, Kau, for guiding me throughout my internship, trusting me to lead the discussions with our stakeholders, giving me opportunities to grow beyond programming, and setting me up for success.

Thank you, Alaena, for mentoring me, reviewing my code, giving me clarity on the project, and always being there to guide me.

Shoutout to the rest of the team (Daniel, Eric, Alex, and Stephen) for making my internship a memorable experience, reviewing my PRs, answering all my questions on Slack and Zoom, and for all the wonderful memories during the team’s offsite meetup in San Francisco.
