*ding* Notifications, Subscriptions, and Productivity!

Notifications are hard. Whether it’s Facebook, Gmail, or your own app, getting users the right information at the right time is crucial. Too chatty, and users tune out; too quiet, and they might miss a critical update. At its core, a notification is a summary of an event (a state change), like “Mike sent you a message”, “Your flight is delayed”, or “Driver departs in 5 minutes”. We built a declarative, expressive subscription system that lets individual users configure listeners for the events that are important to them.

Written by Zeke Nierenberg, Alex Bazhenov, and Omri Bernstein.


When we started Fraight, our notification system was simple. Every business event (load delivery, new message, phone call, etc.) triggered a mobile notification for every staff member. It was perfect for the time. Our business coordinated truck shipments and we only had one employee moving freight. If anything went wrong, we needed to be sure we all knew. Moreover, complex notifications and subscriptions weren’t going to add to our value proposition at that early stage. We needed to focus on what made us different: conversational UI.

It doesn’t take a CS degree to predict a scalability problem. When Ryan Schreiber (VP of Ops) started sharing operational responsibility with Parker Holcomb (CEO), both of their notification feeds became cluttered. We’re hiring right now, and we’re aware the problem is about to get much worse.

As our work becomes more concurrent and requires more people, the relevance of each notification declines along the function (numEmployees) => 1/numEmployees. With ten employees, only about one notification in ten is relevant to any given person.

We needed a solution that let some users know about some events, instead of letting all users know about all events.

First Idea: Every load gets a user_id

At first, the solution didn’t seem terribly complex: shard events to employees by load. Put another way, shard load events by user_id. We figured that most events can probably be readily associated with a load, so we could just notify the employee who owns that load.

Problem 1: Users work together on loads

We want to make collaboration software. By forcing the Load belongsTo User relationship, we precluded the possibility that two users could work together on a load. This meant that Ryan couldn’t say “hey Parker, can you watch load 3722 while I’m meeting with the new customer?” This problem alone could be mitigated by a belongsToMany relationship, but the ownership system is still very limiting because…

Problem 2: It’s not always clear what load an event is associated with

Sometimes events are created from communication that’s difficult to associate with a load. For example, we might get a phone call from a customer who is currently moving 5 loads with us. Who should get the “buzz buzz?”

Problem 3: Some events just aren’t associated with loads at all

Some events are broader than loads. For example, a customer emails to talk through their supply chain strategy for Q2. Even though loads are the centerpiece of our software, not everything fits neatly into a load.

Solution: Subscriptions

We built a tool that lets users subscribe to events. Let’s nail down some high level terminology.

Event

At its core, an Event can be conceptualized as a single change to our database. It has three parts:

  • verb (think CRUD)
  • subject (table, foreign_key pair)
  • meta (the diff of changed fields for update, delete)

We create events by hooking into our ORM for several tables.
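To make the three parts concrete, here is a minimal sketch of what such an event could look like. The type names, the `foreignKey` field name, the `[before, after]` diff format, and the `eventFromUpdate` helper are assumptions for illustration; the article does not show the real schema or ORM hooks.

```typescript
// Hypothetical shapes: only verb / subject / meta come from the article.
type Verb = "create" | "update" | "delete";

interface Subject {
  table: string;      // e.g. "loads"
  foreignKey: number; // key of the changed row in that table
}

interface ChangeEvent {
  verb: Verb;
  subject: Subject;
  // Diff of changed fields: field name -> [before, after]
  meta: Record<string, [unknown, unknown]>;
}

// Compute a field-level diff between two versions of a row.
function diffRows(
  before: Record<string, unknown>,
  after: Record<string, unknown>
): Record<string, [unknown, unknown]> {
  const diff: Record<string, [unknown, unknown]> = {};
  for (const field of Object.keys({ ...before, ...after })) {
    if (before[field] !== after[field]) {
      diff[field] = [before[field], after[field]];
    }
  }
  return diff;
}

// Stand-in for an ORM "after update" hook (assumed, not a real ORM API).
function eventFromUpdate(
  table: string,
  id: number,
  before: Record<string, unknown>,
  after: Record<string, unknown>
): ChangeEvent {
  return {
    verb: "update",
    subject: { table, foreignKey: id },
    meta: diffRows(before, after),
  };
}

const example = eventFromUpdate(
  "loads",
  3722,
  { status: "in_transit", eta: "Monday" },
  { status: "delivered", eta: "Monday" }
);
console.log(JSON.stringify(example.meta));
```

Only the `status` field ends up in `meta`, since `eta` did not change.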

Subscription

A subscription is a user’s filtering predicate function. It determines whether an event is relevant to a user: it takes the event and the user, and returns a boolean indicating relevance. (event, user) => …relevant?.

A subscription is not actually expressed in code, but is instead persisted to the database. It has the following fields:

  • user_id
  • channel_id: how should we notify the user?
  • criteria: more on this in a sec, we created a JSON schema for expressing the predicate function
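As a rough illustration of how a persisted row can stand in for a predicate function, the sketch below turns a stored subscription into an `(event, user) => boolean` function. The `Criteria` shape here (optional `table` and `verb` equality checks) is a deliberately simplified assumption, not the JSON schema the article refers to, and `toPredicate` is a hypothetical helper.

```typescript
interface User {
  id: number;
}

interface ChangeEvent {
  verb: "create" | "update" | "delete";
  subject: { table: string; foreignKey: number };
}

// Assumed, simplified criteria: every field that is present must match.
interface Criteria {
  table?: string;
  verb?: ChangeEvent["verb"];
}

// Mirrors the three fields listed above.
interface SubscriptionRow {
  user_id: number;
  channel_id: number;
  criteria: Criteria;
}

type RelevancePredicate = (event: ChangeEvent, user: User) => boolean;

// Interpret the stored data as a predicate at runtime.
function toPredicate(row: SubscriptionRow): RelevancePredicate {
  return (event, user) =>
    user.id === row.user_id &&
    (row.criteria.table === undefined || row.criteria.table === event.subject.table) &&
    (row.criteria.verb === undefined || row.criteria.verb === event.verb);
}

// The criteria column would arrive from the database as JSON.
const row: SubscriptionRow = {
  user_id: 1,
  channel_id: 2,
  criteria: JSON.parse('{"table":"messages","verb":"create"}'),
};

const isRelevant = toPredicate(row);
const newMessage: ChangeEvent = {
  verb: "create",
  subject: { table: "messages", foreignKey: 10 },
};
console.log(isRelevant(newMessage, { id: 1 })); // true
```

Because the predicate is data rather than code, changing what a user hears about is a database write, not a deploy. In the real system the criteria are matched against the relevance DAG described in the next section rather than against the bare event.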

Relevance DAGs

When an event occurs, we cascade it through our data models to find all relevant records. This isn’t simply all associated records, but instead a set of records that is determined programmatically.

Any model in our system can register an async method that defines what other records are relevant to a fired event.

For example, the participants of a phone call are relevant to the call.

Users in turn cascade to other entities.

This process continues.

Relevance DAG for an incoming message

When an event is created, we cascade through these methods and assemble a directed acyclic graph in memory. This relevance DAG is what we match subscriptions against. The DAG ensures that each record appears only once.
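The cascade can be sketched roughly as follows. Everything here is an assumption made for illustration: the `registerRelevance` registry, the `RecordRef` shape, the breadth-first traversal, and the rule of only keeping edges that point to a deeper level (one simple way to guarantee the graph stays acyclic). The article does not describe how its implementation handles these details.

```typescript
interface RecordRef {
  table: string;
  id: number;
}

// Each model may register an async method returning its relevant records.
type RelevanceMethod = (record: RecordRef) => Promise<RecordRef[]>;
const relevanceMethods = new Map<string, RelevanceMethod>();

function registerRelevance(table: string, method: RelevanceMethod): void {
  relevanceMethods.set(table, method);
}

interface RelevanceDag {
  nodes: Map<string, RecordRef>;   // key -> record, so each record is unique
  edges: Map<string, Set<string>>; // key -> keys of records it cascades to
}

const keyOf = (record: RecordRef): string => `${record.table}:${record.id}`;

async function buildRelevanceDag(root: RecordRef): Promise<RelevanceDag> {
  const dag: RelevanceDag = { nodes: new Map(), edges: new Map() };
  const depth = new Map<string, number>();
  dag.nodes.set(keyOf(root), root);
  depth.set(keyOf(root), 0);
  const queue: RecordRef[] = [root];

  while (queue.length > 0) {
    const current = queue.shift()!;
    const currentKey = keyOf(current);
    const currentDepth = depth.get(currentKey)!;
    const method = relevanceMethods.get(current.table);
    const related = method ? await method(current) : [];

    for (const record of related) {
      const key = keyOf(record);
      if (!dag.nodes.has(key)) {
        dag.nodes.set(key, record);
        depth.set(key, currentDepth + 1);
        queue.push(record);
      }
      // Keep only edges that point to a deeper level, so no cycle can form.
      if (depth.get(key)! > currentDepth) {
        const children = dag.edges.get(currentKey) ?? new Set<string>();
        children.add(key);
        dag.edges.set(currentKey, children);
      }
    }
  }
  return dag;
}

// Hypothetical data: a message relates to its sender and a load,
// and both of those relate to the same organization.
registerRelevance("messages", async () => [
  { table: "users", id: 7 },
  { table: "loads", id: 3722 },
]);
registerRelevance("users", async () => [{ table: "organizations", id: 2 }]);
registerRelevance("loads", async () => [{ table: "organizations", id: 2 }]);

buildRelevanceDag({ table: "messages", id: 1 }).then((dag) => {
  console.log(dag.nodes.size); // 4
});
```

The organization is reached along two paths (via the user and via the load) but is stored as a single node, which is the uniqueness property mentioned above.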

To understand how the subscription matches work, let’s look at some examples.

“Tell me about all new inbound messages”

In our system, we write this as a criteria query.

All relevance DAGs are matched against this query. If any node of the DAG is an inbound message, the subscription matches, and a notification is created.
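The matching rule in this paragraph (any node of the DAG satisfying the query) can be sketched like this. The `Criterion` shape with a `table` name and a `where` map of attribute equalities is a made-up stand-in; the article's actual criteria JSON schema is not shown, so treat the field names as hypothetical.

```typescript
// Hypothetical node and criterion shapes for illustration only.
interface DagNode {
  table: string;
  id: number;
  attrs: Record<string, unknown>;
}

interface Criterion {
  table: string;
  where?: Record<string, unknown>; // attribute equality filters
}

// A node matches when its table matches and every filter is satisfied.
function nodeMatches(node: DagNode, criterion: Criterion): boolean {
  if (node.table !== criterion.table) return false;
  const where = criterion.where ?? {};
  return Object.keys(where).every((field) => node.attrs[field] === where[field]);
}

// The subscription matches if any node in the DAG matches the criterion.
function anyNodeMatches(nodes: DagNode[], criterion: Criterion): boolean {
  return nodes.some((node) => nodeMatches(node, criterion));
}

const inboundMessages: Criterion = {
  table: "messages",
  where: { direction: "inbound" },
};

const dagNodes: DagNode[] = [
  { table: "messages", id: 1, attrs: { direction: "inbound" } },
  { table: "users", id: 7, attrs: {} },
];

console.log(anyNodeMatches(dagNodes, inboundMessages)); // true
```

Restricting the subscription to new messages would additionally mean checking that the triggering event's verb is a create; that check is left out here to keep the sketch short.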

“Tell me about all inbound customer messages”

There’s an implicit descendant relationship between the first object in our query and the second. It’s almost like the descendant selector in CSS: nav a { color: blue; } refers to all anchor tags inside a nav tag, not just the direct children. Indeed, in our SQL database, there is no organization_id on Message. There could be multiple paths in the DAG between a message and an organization.
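One way to picture this CSS-like matching: treat the query as an ordered list of criteria, and require that a node matching each criterion be reachable (through any number of edges) from a node matching the previous one. The sketch below assumes that interpretation, along with hypothetical `Dag`, `Criterion`, and attribute names such as `direction` and `kind`; none of these come from the article.

```typescript
interface DagNode {
  table: string;
  id: number;
  attrs: Record<string, unknown>;
}

interface Criterion {
  table: string;
  where?: Record<string, unknown>;
}

interface Dag {
  nodes: Map<string, DagNode>;  // key -> node
  edges: Map<string, string[]>; // key -> keys of directly relevant nodes
}

function nodeMatches(node: DagNode, criterion: Criterion): boolean {
  if (node.table !== criterion.table) return false;
  const where = criterion.where ?? {};
  return Object.keys(where).every((field) => node.attrs[field] === where[field]);
}

// Every node reachable from `start` by following one or more edges.
function descendants(dag: Dag, start: string): string[] {
  const seen: Record<string, boolean> = {};
  const found: string[] = [];
  const stack = [...(dag.edges.get(start) ?? [])];
  while (stack.length > 0) {
    const key = stack.pop()!;
    if (seen[key]) continue;
    seen[key] = true;
    found.push(key);
    stack.push(...(dag.edges.get(key) ?? []));
  }
  return found;
}

// Like `nav a` in CSS: each criterion must match some descendant
// (not necessarily a direct child) of a node matching the previous one.
function chainMatches(dag: Dag, chain: Criterion[]): boolean {
  const [first, ...rest] = chain;
  if (first === undefined) return false;
  let frontier: string[] = [];
  dag.nodes.forEach((node, key) => {
    if (nodeMatches(node, first)) frontier.push(key);
  });
  for (const criterion of rest) {
    const next: string[] = [];
    for (const key of frontier) {
      for (const candidate of descendants(dag, key)) {
        const node = dag.nodes.get(candidate);
        if (node && nodeMatches(node, criterion) && next.indexOf(candidate) === -1) {
          next.push(candidate);
        }
      }
    }
    frontier = next;
  }
  return frontier.length > 0;
}

// message -> user -> organization: no direct message/organization link.
const dag: Dag = {
  nodes: new Map<string, DagNode>([
    ["messages:1", { table: "messages", id: 1, attrs: { direction: "inbound" } }],
    ["users:7", { table: "users", id: 7, attrs: {} }],
    ["organizations:2", { table: "organizations", id: 2, attrs: { kind: "customer" } }],
  ]),
  edges: new Map<string, string[]>([
    ["messages:1", ["users:7"]],
    ["users:7", ["organizations:2"]],
  ]),
};

const inboundCustomerMessages: Criterion[] = [
  { table: "messages", where: { direction: "inbound" } },
  { table: "organizations", where: { kind: "customer" } },
];

console.log(chainMatches(dag, inboundCustomerMessages)); // true
```

The match succeeds even though the message and the organization are two hops apart, which is why no organization_id column on Message is needed.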

A few notes on performance

This is an insanely cool system, but you probably have a weird feeling in the back of your throat. Adding this much logic in the middle of the request/response cycle could grind our web service to a halt. Indeed, our first prototypes of this system were slow, so slow that they broke our phone system in development!

We knew async workers could come in handy for improving performance because they could take this expensive task and queue it up. We broke the process up into:

  • Generate event: We kept this in the request/response cycle. After all, it’s only one INSERT. Once the event is inserted, we enqueue a job to take the next step.
  • Build Relevance DAG: This is done in a job. We get all the relevant records of the DAG in memory, then compare them against a cached copy of all our active subscriptions. For every match, we enqueue a notification job.
  • Send Notifications: This is the job that actually sends the notification message, be that over SMS, email, or web push.
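The three steps above can be sketched with in-memory stand-ins. This is an illustration under loud assumptions: the arrays below replace a real database and job queue, `relevantTables` is a placeholder for the relevance cascade, and the subscription cache matches on table name only. None of the names come from the article.

```typescript
// In-memory stand-ins; a real system would use a database and a job queue.
interface StoredEvent {
  id: number;
  table: string;
}

interface CachedSubscription {
  userId: number;
  channel: "sms" | "email" | "web_push";
  table: string;
}

type SendJob = { kind: "send"; eventId: number; userId: number; channel: string };
type Job = { kind: "build_dag"; eventId: number } | SendJob;

const events: StoredEvent[] = [];
const jobs: Job[] = [];
const sent: string[] = [];

const subscriptionCache: CachedSubscription[] = [
  { userId: 1, channel: "sms", table: "messages" },
  { userId: 2, channel: "email", table: "loads" },
];

// Step 1, inside the request/response cycle: one insert, one enqueue.
function recordEvent(table: string): number {
  const id = events.length + 1;
  events.push({ id, table });
  jobs.push({ kind: "build_dag", eventId: id });
  return id;
}

// Placeholder for the relevance cascade: tables present in the DAG.
function relevantTables(event: StoredEvent): string[] {
  return event.table === "messages" ? ["messages", "users"] : [event.table];
}

// Step 2, in a worker: build the DAG and match the cached subscriptions.
function buildDagJob(eventId: number): void {
  const event = events.filter((e) => e.id === eventId)[0];
  if (!event) return;
  const tables = relevantTables(event);
  for (const sub of subscriptionCache) {
    if (tables.indexOf(sub.table) !== -1) {
      jobs.push({ kind: "send", eventId, userId: sub.userId, channel: sub.channel });
    }
  }
}

// Step 3, in a worker: deliver the notification (here, just record it).
function sendJob(job: SendJob): void {
  sent.push(`${job.channel}:user${job.userId}:event${job.eventId}`);
}

// Simulated worker loop that processes jobs until the queue is empty.
function drainJobs(): void {
  let job = jobs.shift();
  while (job !== undefined) {
    if (job.kind === "build_dag") buildDagJob(job.eventId);
    else sendJob(job);
    job = jobs.shift();
  }
}

recordEvent("messages"); // the request returns here, before any matching runs
drainJobs();             // workers pick up the rest
console.log(sent);       // one SMS notification for user 1
```

The point of the split is that the request path only pays for `recordEvent`; the expensive DAG construction and matching happen after the response has already gone out.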

This system returned us to baseline performance.

Where we ended up

  • Early tests show a ~60% reduction in news feed volume
  • We’re able to accommodate user requests quickly, without changes to code
  • We don’t anticipate any performance or scalability problems for at least the next year
  • In summary, noise is down, signal is up, user experience is sky high

Where we’re going from here

Subscriptions are powerful for more than just notifications. In the future, Ryan could automatically assign a new hire all Track and Trace tasks for customer X with a subscription. Additionally, we could see developer tasks using the system. We might want to ping a microservice when a subscription criterion is met.

We’re considering making this into a microservice and open-sourcing it. Let us know if you’d like to work on it, or use it!

We don’t know every direction this system could take us, but it feels like we’re building the foundation for something special.