Amazing Summer on ML Team, Search + Recommendation

Shuyi Li
Published in strava-engineering
5 min read · Nov 8, 2023

First and foremost, thank you for stopping by my corner of the internet! I hope what follows offers useful insights to anyone interested in the Machine Learning Engineer (MLE) internship at Strava, or in ML work in general.

About me

Greetings from Shuyi! Currently a fifth-year Ph.D. candidate in Statistics at ASU, I am thrilled to be embarking on an MLE internship at Strava, a pearl in the ocean!

Despite my passion for hiking and my frequent searches for routes, I hadn’t come across the Strava App before (a missed opportunity indeed). I stumbled upon Strava serendipitously while exploring intern positions, did some research, and was instantly captivated by what the company has to offer. The resoundingly positive reviews and the enthusiasm radiating from Strava’s employees heightened my eagerness to dive in. Here begins my summer journey!

About the ML team

The team’s mission revolves around providing machine learning-driven functionalities and expanding the scope of personalized interactions within Strava’s ecosystem. The ML team has successfully introduced an array of remarkable features, exemplified by the following:

  • Route Photos (with Geo Team)
  • Suggested Follows (with Growth Team)
  • Challenge Recommendation (with Events Team)
  • and more

About My Project

My project, “Post Search and Recommendation Prototyping,” is an important technical step for the future of the Strava App. I will introduce each part separately.

Search

Search delivers relevant, high-quality Strava content in response to a user’s query, e.g. ‘What’s the scenic route in Bay Area’. This would be very useful to:

  • Help users find interesting, relevant content that was previously hard to find;
  • Analyze queries to understand user needs.

Within the Strava App, posts play a pivotal role: they let us share glimpses of our daily lives, undiscovered trails, and breathtaking views, and interact with fellow users. Consequently, we use all post data (more than 4 million posts) for our prototyping project. A comparable search mechanism could later extend to elements like the “Club search bar,” the “Route search bar,” and beyond.

Architecture (Prototyping)

In essence, the search approach converts each post into a numerical vector and converts the user’s search query or question into a vector as well. A “matching engine” then compares the query vector against the post vectors using a distance metric. The closer two vectors are, the closer their meaning, so the nearest post vectors identify the content most relevant to the query.

  1. Choose an embedding/Natural Language Processing (NLP) model that maps a paragraph to a numerical embedding, e.g. ‘I love Strava!’ → [1, 2, 9];
  2. Construct a vector database (DB), here the Vertex AI Matching Engine on Google Cloud Platform (GCP);
  3. Given a search query (‘where is the best route for San Francisco’), the vector DB retrieves the posts most relevant to the query in real time.
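
The three steps above can be sketched in a few lines of Python. This is only a toy illustration, not the prototype itself: the bag-of-words `embed` function stands in for a real embedding model, the `ToyVectorIndex` class is a hypothetical in-memory stand-in for the Vertex AI Matching Engine, and the sample posts are made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (1.0 = same direction)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorIndex:
    """Hypothetical in-memory vector DB that does brute-force search."""

    def __init__(self):
        self._items = []  # list of (post_id, vector)

    def add(self, post_id: str, text: str) -> None:
        # Step 2: store the embedding of each post.
        self._items.append((post_id, embed(text)))

    def search(self, query: str, k: int = 3):
        # Step 3: embed the query and return the k closest posts.
        q = embed(query)
        scored = [(cosine_similarity(q, vec), pid) for pid, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(pid, score) for score, pid in scored[:k]]


# Made-up posts, for illustration only.
posts = {
    "p1": "Best scenic route for a morning run in San Francisco",
    "p2": "New skis arrived, can't wait for the first snow",
    "p3": "Weekend hike with the club, great views from the ridge",
}

index = ToyVectorIndex()
for pid, text in posts.items():
    index.add(pid, text)

results = index.search("where is the best route for San Francisco", k=2)
for pid, score in results:
    print(pid, round(score, 3), posts[pid])
```

A real embedding model captures meaning rather than word overlap, and a managed vector DB replaces the brute-force loop with approximate nearest-neighbor search so that lookups stay fast over millions of posts.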

Recommendation

If a user explicitly tells us they are interested in skiing and hiking and want to find peers to explore new trails with, can we connect them to the relevant clubs, posts, and routes? By providing unbounded recommendations, we aim to:

  • Improve new user experience;
  • Increase engagement, especially engagement between two users with no connection;
  • Inspire creators to post more often by giving them more exposure through recommendations. This goal rests on the hypothesis that creators would be inclined to post more if they received more clicks, kudos, or comments from non-connected users.

We use the post data (100,000 posts in 2023 after filtering) for the prototyping project, but a similar recommendation mechanism can also be applied to “Post curation/ranking”, “Club recommendation”, “Route description/recommendation” and more.

Architecture (Prototyping)

In summary, the key idea is to convert each post to a vector and to convert the user’s explicit inputs/preferences to a vector as well. All the vectors can then be compared within the same domain via a distance metric, i.e. the closer two vectors are, the closer their meaning. This way we are able to find content relevant to the user’s interests. The diagram above shows the components of this prototyping project.

  1. Choose an embedding model that maps a word/post to a numerical embedding, e.g. ‘I love Strava!’ → [1, 2,…, 9];
  2. Classify posts into clusters/topics (100,000 posts here for the demo) and obtain the top representation for each cluster/topic;
  3. Recommend posts based on the preferred activities (hike/run/etc.) or goals (“Track my activities and workouts”) that the user gives in the “New Reg Intent Survey”.
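
As a rough illustration of these three steps, here is a minimal Python sketch. Everything in it is an assumption made for the example: the word-count `embed` function stands in for a real embedding model, the `TOPIC_SEEDS` texts are hypothetical (the post does not say how the clusters/topics were derived), and the sample posts and survey answers are made up.

```python
import math
import re
from collections import Counter, defaultdict


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical topic descriptions; a real system might learn clusters instead.
TOPIC_SEEDS = {
    "hike": "hike hiking trail summit views",
    "run": "run running pace marathon tempo",
    "ski": "ski skiing snow powder slopes",
}


def build_topic_index(posts: dict) -> dict:
    """Step 2: assign each post to its closest topic, ranked best first."""
    by_topic = defaultdict(list)
    for pid, text in posts.items():
        vec = embed(text)
        scores = {t: cosine(vec, embed(seed)) for t, seed in TOPIC_SEEDS.items()}
        best = max(scores, key=scores.get)
        if scores[best] > 0:  # skip posts that match no topic at all
            by_topic[best].append((scores[best], pid))
    for ranked in by_topic.values():
        ranked.sort(reverse=True)  # top representation first
    return by_topic


def recommend(preferred: list, by_topic: dict, per_topic: int = 1) -> list:
    """Step 3: return the top posts for each activity the user selected."""
    recs = []
    for topic in preferred:
        recs.extend(pid for _, pid in by_topic.get(topic, [])[:per_topic])
    return recs


# Made-up posts, for illustration only.
posts = {
    "p1": "Sunrise hike to the summit, the trail was empty",
    "p2": "Easy hike with friends",
    "p3": "Tempo run today, marathon pace felt smooth",
    "p4": "Fresh powder on the slopes, best ski day this season",
    "p5": "New bike day",
}

by_topic = build_topic_index(posts)
survey_answers = ["hike", "ski"]  # hypothetical survey response
recommendations = recommend(survey_answers, by_topic)
print(recommendations)
```

Because recommendations are looked up by topic rather than through the social graph, a brand-new user with no connections can still receive posts that match the interests they stated at registration.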

Team Offsite

For the team offsite, I joined the ML + Data Platform Engineering team on a scenic hike along the “Vista Trail” (picked by Strava: https://www.strava.com/routes/3116741709062223292). The adventure continued with an engaging exploration of each team member’s enneagram type, followed by an exciting round of axe throwing, a truly unique experience!

Photo: hike with the ML + Data Platform Team

Big Shoutouts:

I want to shout out to my manager & mentor Lucinda, teammates Shuyun, Dan, and Jun, and PM Dustin! Lucinda’s unwavering commitment to guiding me through challenges and keeping me aligned as an MLE has been invaluable. Her wealth of knowledge shines through, and she consistently answers my questions with patience. The team’s responsiveness and eagerness to help in many ways, such as integrating GCP with Snowflake, restructuring backfill logic, and explaining business intricacies while sharing domain expertise, have been truly remarkable. Their guidance has streamlined everything, and I’m profoundly thankful for the wealth of insights they’ve shared.

Furthermore, I’ve gained valuable perspectives from engineers spanning different teams, including Analytics, Data Platform, and Search. Their generous assistance in familiarizing me with their services, along with the time they’ve taken to connect and elucidate their projects’ contributions to Strava’s overarching goals, has been immensely enriching.

TO BE CONTINUED
