Yik Yak Feed System Architecture (Golang)

Eric Mayers
Yik Yak Engineering
6 min read · Mar 15, 2017

Background

At its core, Yik Yak is a service where users can create posts at their location and ask “What are people saying near me?” It’s the responsibility of the Feed Service to answer that question and return a set of content relevant to the user. The product evolved from a PHP implementation, but this article describes the Go-based microservice architecture that replaced it.

Design Goals

We set out to design a system to satisfy a set of goals:

  1. Support existing clients (expecting an HTTPS/JSON interface). The new system should be a drop-in replacement for the old, returning a feed similar to the previous PHP stack. This implied some specific required features:
    - Authenticate users, checking their verification state (user service).
    - Support muting user-reported posts and users (user service).
    - Support targeted message injection that implements announcements (tools data).
    - Calculate appropriate interaction permissions (e.g. allow users to delete their own posts but not those of others).
    - Support simple geo-fence blocks (to block usage of the service in schools).
  2. Provide a modern platform that would support sophisticated user and location adaptive queries and ranking schemes.
  3. Be built as a microservice in Go using gRPC.
  4. Be robust, flexible, and fast.

The result was a fairly simple architecture of microservices communicating via gRPC. This approach adds a bit of friction when building new functionality, but it guarantees strong interfaces that you can black-box test, along with independent scaling and failure isolation. Another article will go into more depth on this architecture.

Here’s a simplified system diagram:

You’ll notice that all interfaces are gRPC/proto except for the client ⟷ API interface. We chose grpc-gateway to construct a JSON interface for our clients, but the API layer does a bit more than that: it also preserves the legacy API that we had to continue to support. That meant some fields were always hard-coded to specific values, and some components needed mapping or translation from the proto interface below them in the stack. We may write another post discussing the API microservice in more detail.

The following diagram describes the logical flow and sequence of events that happen when a feed is generated. Within a feed query there are concurrent operations to reduce latency. Note that we could be more aggressive with concurrency (for example, starting to prepare the feed before validating the user or geofence), further reducing latency but opening ourselves up to higher costs and complexity.

The “prepare” sections listed in the diagram are fairly simple. Query validation amounts to parameter sanity checking (not nil, ranges within allowed bounds, strings free of disallowed characters, etc). GeoFence Blocking is a basic feature of the Yik Yak platform: we block the use of the app at many schools (high schools, etc) by defining a geospatial area from which we reject most user queries.
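As a rough illustration, here is a minimal sketch of a circular geofence check, assuming blocked areas are stored as a center point plus a radius; the production service defines more general geospatial areas, and the types and numbers below are illustrative only.

```go
package geofence

import "math"

// Fence describes a blocked area as a center point and a radius. This is a
// simplification; real fences may be arbitrary polygons.
type Fence struct {
	Lat, Lng float64 // center, in degrees
	RadiusM  float64 // blocked radius, in meters
}

const earthRadiusM = 6371000.0

// haversineM returns the great-circle distance between two points in meters.
func haversineM(lat1, lng1, lat2, lng2 float64) float64 {
	toRad := func(deg float64) float64 { return deg * math.Pi / 180 }
	dLat := toRad(lat2 - lat1)
	dLng := toRad(lng2 - lng1)
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(toRad(lat1))*math.Cos(toRad(lat2))*math.Sin(dLng/2)*math.Sin(dLng/2)
	return 2 * earthRadiusM * math.Asin(math.Sqrt(a))
}

// Blocked reports whether a query location falls inside any blocked fence.
func Blocked(lat, lng float64, fences []Fence) bool {
	for _, f := range fences {
		if haversineM(lat, lng, f.Lat, f.Lng) <= f.RadiusM {
			return true
		}
	}
	return false
}
```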

Similarly, we check whether a user has been suspended or is unverified, and we may restrict or modify what we return in those cases.

The last step in preparation is determining the query area. In the Yik Yak app the client sends a point location and asks for posts “nearby”. It is up to this part of the service to interpret what that means for the user. This was an area where we invested in considerable experimentation. Consider a few ways to approach this problem:

  • Fixed radius. Use a fixed-size circle around the user. A user in Manhattan might find a circle of 500 meters “close” and would likely find many posts in this range that seem nearby.
  • Expanding radius. While our New York user might be happy with a 500m radius, another user in a less dense area, say on a hiking trail, would find few if any posts within 500 meters. We could broaden the range if there aren’t enough posts found, expanding from 500 to 1000 to 2000 meters, etc. until we hit some threshold count of posts. The problem with this is that it creates a load multiplier and causes query latency to be highly variable.
  • Adaptive radius. To avoid the latency variation and load, we decided to pre-calculate the density of posts in active areas. With this approach we were able to perform a fast lookup to get a “hint” in the form of a radius expected to contain roughly N posts (see the sketch after this list). This is the method we landed on and use today. There is a drawback though: the pre-calculation can lag if there is sudden activity. This approach (like anything other than a fixed query) also makes the app’s behavior somewhat unpredictable for users.
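To make the adaptive approach concrete, here is a minimal sketch of the kind of lookup involved, assuming post density is pre-calculated per coarse geographic cell; the cell IDs, densities, and bounds here are made up for illustration and are not the production values.

```go
package feedquery

import "math"

// densityPerKm2 maps a coarse geographic cell to its recent post density.
// In production this table would be refreshed offline from recent activity;
// the entries here are illustrative.
var densityPerKm2 = map[string]float64{
	"cell_manhattan": 900.0,
	"cell_trailhead": 0.4,
}

// queryRadiusM returns a radius hint, in meters, expected to contain roughly
// targetPosts posts at the local density, clamped to sane bounds.
func queryRadiusM(cellID string, targetPosts float64) float64 {
	const minM, maxM = 500.0, 20000.0

	d, ok := densityPerKm2[cellID]
	if !ok || d <= 0 {
		// Unknown or inactive area: fall back to the widest allowed query.
		return maxM
	}

	// Area (km^2) expected to hold targetPosts posts, converted to a radius.
	areaKm2 := targetPosts / d
	r := math.Sqrt(areaKm2/math.Pi) * 1000 // km -> m

	return math.Min(maxM, math.Max(minM, r))
}
```

The drawback mentioned above shows up directly here: if activity spikes in a cell before the density table is refreshed, the hint over- or under-estimates the radius needed.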

We also imagined other mechanisms, which we accounted for in the architecture and which would be exciting to evaluate in the future:

  • User range. Instead of looking at the count of posts around the user, analyze how the user typically moves and construct a query polygon based on their typical travel patterns or, more simply, based on the typical distance they travel each day. A user driving long distances to commute might care more about posts far from them than a user who only walks each day would.
  • Geospatial structure-aware. A user on the edge of a park might consider posts in that park “nearby” and relevant, but posts across the street in an office building less so. With this approach we could use the structures around the user’s location to construct polygon queries that fetch posts relevant to those spaces.

To explore the “fetching ingredients” phase, let’s take a look at how we constructed concurrent activities in Go. In the following code we construct four channels and make three anonymous function calls (go func() {…}) that operate concurrently. In each, to keep the code simple and readable, another function is called that does the meat of the work with a simple return; the anonymous function then writes that response into a channel using a simple struct that carries both data and error. This approach, in contrast to passing a channel into a function, was an effective pattern for us as it kept the function interfaces clean and isolated the concurrency management.
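Concretely, the pattern looks roughly like the sketch below, condensed here to three ingredients; the service calls, types, and names are hypothetical stand-ins rather than the production code.

```go
package feed

import "context"

// Hypothetical types standing in for the real proto messages.
type Post struct{ ID, AuthorID, Text string }
type Block struct{ PostID, AuthorID string }
type FeedQuery struct {
	UserID   string
	Lat, Lng float64
}

// Small structs carry a payload together with its error so each goroutine can
// report its outcome on a single channel.
type postsRes struct {
	posts []Post
	err   error
}
type blocksRes struct {
	blocks []Block
	err    error
}

// Stubs standing in for the real RPC calls to the geo, user, and tools services.
func fetchNearbyPosts(ctx context.Context, q FeedQuery) ([]Post, error)   { return nil, nil }
func fetchUserBlocks(ctx context.Context, userID string) ([]Block, error) { return nil, nil }
func fetchTargetedPosts(ctx context.Context, q FeedQuery) ([]Post, error) { return nil, nil }

// fetchIngredients gathers everything the mixing stage needs, concurrently.
func fetchIngredients(ctx context.Context, q FeedQuery) (posts []Post, blocks []Block, targeted []Post, err error) {
	ctx, cancelCtx := context.WithCancel(ctx)
	defer cancelCtx()

	postsCh := make(chan postsRes, 1)
	blocksCh := make(chan blocksRes, 1)
	targetedCh := make(chan postsRes, 1)

	// Each anonymous function calls a plain worker that returns (data, error)
	// and only then wraps the result onto a channel; the workers themselves
	// know nothing about channels.
	go func() { p, e := fetchNearbyPosts(ctx, q); postsCh <- postsRes{p, e} }()
	go func() { b, e := fetchUserBlocks(ctx, q.UserID); blocksCh <- blocksRes{b, e} }()
	go func() { t, e := fetchTargetedPosts(ctx, q); targetedCh <- postsRes{t, e} }()

	// Block until any channel has data. A channel variable is set to nil once
	// read; a nil channel is never selected, so the loop ends when all are nil.
	for postsCh != nil || blocksCh != nil || targetedCh != nil {
		select {
		case r := <-postsCh:
			if r.err != nil {
				cancelCtx() // signal the other goroutines that their work is no longer needed
				return nil, nil, nil, r.err
			}
			posts, postsCh = r.posts, nil
		case r := <-blocksCh:
			if r.err != nil {
				// Fail "closed": without block data we refuse to build a feed.
				// (The production code also records timing and error metrics
				// on paths like this one.)
				cancelCtx()
				return nil, nil, nil, r.err
			}
			blocks, blocksCh = r.blocks, nil
		case r := <-targetedCh:
			if r.err != nil {
				cancelCtx()
				return nil, nil, nil, r.err
			}
			targeted, targetedCh = r.posts, nil
		case <-ctx.Done():
			return nil, nil, nil, ctx.Err()
		}
	}
	return posts, blocks, targeted, nil
}
```

Buffered channels of size one mean each goroutine can always complete its send, even if we bail out early on an error, so nothing leaks.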

The code then falls into a for-select loop which blocks until any of the channels has data. We made the decision to fail if we couldn’t, for example, fetch user block data, and you can see that happening in the code. We could instead have treated that data as optional, failing “open” instead of “closed”. In this early-fail case you also see the code cancelling the context (cancelCtx()). Context is another powerful pattern in Go that I won’t go into detail on here, but simply put, it allows code elsewhere in the concurrent operations to check whether the work it is doing is still necessary. The for-select looks for such a condition by checking ctx.Done(), allowing this whole set of activity to be terminated early.

Each time through the for-select loop the code checks whether the channel variables are nil (we set each one to nil once it has been read). When they are all nil we’ve read all the necessary data and can proceed.

Going back to the feed generation steps, in the “Mix” stage (continuing a baking metaphor) we perform simple operations, processing the set of posts returned and removing unwanted content: posts authored by blocked authors, posts specifically blocked by the user (“blocked posts”), posts in abuse-evaluation states, and so on.

We also add in any targeted posts that were returned. These are typically used for announcements in an area or to notify a specific user of some information (such as a suspension warning for bad behavior).
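A minimal sketch of this mixing step might look like the following; the Post type and field names are hypothetical, and the real filters cover more cases than shown here.

```go
package mix

// Post is a hypothetical stand-in for the real proto message.
type Post struct {
	ID, AuthorID     string
	UnderAbuseReview bool
}

// mix removes unwanted content from the fetched posts and appends any
// targeted posts (announcements, per-user notices) to the result.
func mix(posts []Post, blockedAuthors, blockedPosts map[string]bool, targeted []Post) []Post {
	out := make([]Post, 0, len(posts)+len(targeted))
	for _, p := range posts {
		switch {
		case blockedAuthors[p.AuthorID]:
			continue // author muted or reported by this user
		case blockedPosts[p.ID]:
			continue // post specifically blocked by this user
		case p.UnderAbuseReview:
			continue // post is in an abuse-evaluation state
		}
		out = append(out, p)
	}
	return append(out, targeted...)
}
```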

In the last stage we “Package” the data for return to the user. In this phase the steps are pretty simple:

  • Anonymize and fuzz some data to further protect user information such as precise location (a sketch follows this list).
  • Package everything into the return proto.
  • Perform some book-keeping such as recording query timing and other data. You can see some examples of this in the error handling portions of the code snippets above.
  • Return the data.
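As an example of the anonymization step in the first bullet, here is a minimal sketch of coordinate fuzzing; the grid size and jitter bounds are illustrative, not the values the production service used.

```go
package packaging

import (
	"math"
	"math/rand"
)

// fuzzLocation snaps a coordinate to a coarse grid and adds bounded random
// jitter, so a post can be shown as "nearby" without revealing the precise
// location it was created from.
func fuzzLocation(lat, lng float64) (float64, float64) {
	const gridDeg = 0.01    // roughly 1 km of latitude
	const jitterDeg = 0.002 // up to roughly 200 m of extra noise

	snap := func(v float64) float64 { return math.Round(v/gridDeg) * gridDeg }
	jitter := func() float64 { return (rand.Float64()*2 - 1) * jitterDeg }

	return snap(lat) + jitter(), snap(lng) + jitter()
}
```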

Summary

We reviewed the overall structure and basic flow of the Yik Yak feed generation system and also discussed some aspects of the design strategy and query approach. One area we did not discuss here was the actual fetching of the posts from the GeoIndex service, which will have to wait for another blog post.

Overall I was very happy with the flexibility that our architecture lent us. Along with our experimentation framework, we could easily try new strategies and expand functionality. Go’s concurrency features and the patterns we adopted made it easy to implement sophisticated “mixing” of diverse data sources. The Go + gRPC + proto combination is a nice technology fit for a feed system architecture.
