Behind the scenes of Hootsuite Inbox

Inbox is one of the latest product offerings at Hootsuite, with the goal of offering customers a new way to interact and engage with their clients across multiple social networks in one centralized view. Since launching in August 2018, Inbox has grown to support engagement across multiple channels such as Twitter direct messages, Facebook private messages and LinkedIn page posts.

A notable feature of Inbox is that it enables teams to collaborate in real time to triage and respond to customer messages. Two users on the same team will have a synchronized view of Inbox, where actions one user takes will be reflected on both users’ interfaces. In order to support this real-time functionality, the code that powers the Inbox UI has evolved to be somewhat unique compared to other products at Hootsuite. I’d like to share with you some insights into how the architecture of Inbox evolved as well as an overview of how it works today.

A short history of Inbox

The majority of Hootsuite front-end products up until Inbox were based on a traditional request-response model. Clients would query for information from REST APIs, and would often have to do significant work to “massage” this data into formats that made sense for their user interfaces. This often meant that clients would have a great deal of responsibility in how they handled and persisted data, which led to growing complexity and bleeding of business logic into our user interface code.

With Inbox, there was a desire to implement an application-centric API between the clients and back-end services. This application API would offer up data in a format that the clients could render as-is without transformations.

High level service diagram for application API

One of the early ideas was offering support for client-defined JSON schemas. Instead of the service defining the form of the data, a client would define the schema of the response it expected, and the server would need to conform to that schema in the structure of its responses.

The assumption of client-defined schemas drove some of the early development of the Inbox web and mobile clients. Unfortunately, when it came time to implement the application API, there was significant debate over whether supporting client-defined schemas was feasible given the development resources that were available at the time. In addition, only the web client schema was near completion, with both mobile clients lagging behind.

Eventually a compromise was reached. We would invert the control, and have the application API define its own schema, but that schema would map directly onto interface elements. This led to the adoption of GraphQL as the middle-man application API service, since it provided support for server-side schemas, as well as allowing the clients to query for only the fields they required.

Real-time task tracking

To keep explanations streamlined and avoid some of the domain specific details of the Inbox product, I’ll present a re-imagining of the traditional to-do task tracking application, using the same schema and UI patterns we utilize in Inbox. Picture that this task tracker is going to be used by anywhere between 10–100 users working to triage and keep track of hundreds of ongoing tasks.

Note that some familiarity with GraphQL schema notation is helpful, but not required, to read this article. You can learn more about GraphQL in the official GraphQL documentation.

Interface foundations

The first element of our UI is a list of tasks that we can scroll through. We will refer to this element as the TaskList. When a user clicks on a task in the TaskList, the notes relating to that task are displayed inside a second UI element which we call the TaskDetail.

The foundation of our UI, split between the TaskList (left) and the TaskDetail of the selected item (right)

If we represent the elements we have as GraphQL schema objects, we get the snippet below. For simplicity, let’s assume that details contained within a task are represented by a single piece of text. Note that the ! in GraphQL schema notation indicates a required field, and [] indicates a list.

type Task {
  id: String!
  title: String
  details: TaskDetail
}
type TaskList {
  tasks: [Task!]!
}
type TaskDetail {
  description: String
}

With our current model, a client could query for a TaskList containing all the tasks, and display them in a scrollable list. When a user selects a task, the details are displayed in the TaskDetail section.

One note here is that our current implementation doesn’t support real-time task management or interacting with tasks. We’ll be coming back to that later.

Different kinds of tasks

A natural thing to do with tasks is organize them in some way. For example, we might want our tasks to move from a status of “Todo” to “In Progress” to “Done”. In addition, maybe we want to be able to sort our tasks by created date, and filter for tasks assigned to specific teams.

To extend our app to support such functionalities, we introduce the concept of a View. A view is the combination of filtering criteria that decide what tasks should be displayed within the TaskList.

Within the application API, a view might look something like this:

class View {
  status: TaskStatus
  teams: List[Team]
  sort: Sort
}
enum TaskStatus {
  Todo
  InProgress
  Done
}
class Team {
  name: String
}
enum Sort {
  DateAscending
  DateDescending
}

At this point, these fields could be added directly to the GraphQL schema, but by doing that we would leak the implementation of filtering tasks to all our clients. Code would need to be maintained across all clients to manage and modify view state, and adding or removing fields from the View type would become much more complex.

Instead, we recognize that a client only really needs to know two things about a view:

  • When querying for the TaskList, it can provide a view to get back only Tasks that are relevant to that view
  • It can somehow change views in order to filter tasks by different criteria

So, the only data the application API needs to provide to the client is a unique identifier for a view, which can later be parsed to retrieve the filtering criteria. This identifier could be, for example, a unique ID to a database entry that the back-end maintains.

The diagram below shows how three different views could be used to filter a list of tasks. When a field is omitted in the view, we do not filter by that criterion.

Filtering by views. Note that team/status labels are provided for clarity, and are not UI elements.
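To make the idea of an opaque view identifier a little more concrete, here is a minimal Python sketch of how a back-end might resolve one into filtering criteria. It is not how Inbox is implemented: the `VIEWS` table, the `tasks_for_view` function, the view ids and the sample tasks are all hypothetical, and an in-memory dictionary stands in for whatever storage a real service would use. A criterion left as `None` is simply not filtered on.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TaskStatus(Enum):
    TODO = "Todo"
    IN_PROGRESS = "InProgress"
    DONE = "Done"


@dataclass(frozen=True)
class Task:
    id: str
    title: str
    status: TaskStatus
    team: str
    created: int  # creation time as a sortable number (assumption)


@dataclass(frozen=True)
class View:
    status: Optional[TaskStatus] = None  # None: do not filter by status
    teams: Optional[frozenset] = None    # None: do not filter by team
    descending: bool = True              # sort direction on created date


# Hypothetical server-side table: opaque view id -> filtering criteria.
# Clients only ever see the keys, never the View objects.
VIEWS = {
    "view-1": View(),
    "view-2": View(status=TaskStatus.TODO),
    "view-3": View(teams=frozenset({"Design"}), descending=False),
}


def tasks_for_view(view_id, tasks):
    """Resolve the view id on the server and return the matching tasks."""
    view = VIEWS[view_id]
    matching = [
        t for t in tasks
        if (view.status is None or t.status == view.status)
        and (view.teams is None or t.team in view.teams)
    ]
    return sorted(matching, key=lambda t: t.created, reverse=view.descending)


ALL_TASKS = [
    Task("1", "Write spec", TaskStatus.TODO, "Design", created=1),
    Task("2", "Build API", TaskStatus.IN_PROGRESS, "Platform", created=2),
    Task("3", "Draw icons", TaskStatus.DONE, "Design", created=3),
]
```

Because the client only passes `"view-2"` or `"view-3"` back to the server, new criteria can be added to `View` without any client needing to change.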

Actions and interactables

Once we’ve introduced the concept of views, the client will need a way to transition from one view to the next.

This brings us to the implementation of our first Action. Any change a user can take within the interface is encapsulated by an action. Actions should be self-contained and declarative: they contain all pieces of information the client needs to take that action.

Actions also give us a way to do client side tracking, in the form of a provided tracking string that can be sent to a tracking service by clients whenever they perform that action.

To solve our view switching problem, we introduce a ChangeViewAction as a way for clients to change their current view of the data.

interface Action {
  tracking: String
}
type ChangeViewAction implements Action {
  # Represents the view to be changed to
  viewId: String
  tracking: String
}

From the current view, a client will have multiple possible next views, represented by multiple ChangeViewActions. We can imagine each possible view in our application as a node on an undirected graph, and the edges between these nodes as actions.

A selection of available views, forming nodes on a graph.

If we think of these edges as UI options available to a user, then from the (top-left) view containing sort: DateDescending three available options (among others) would be:

  • An option to change the sort from descending to ascending
  • An option to view only “Todo” tasks
  • An option to view only “Done” tasks

To allow the clients to render these options to the user, we introduce the concept of an Interactable. An interactable is a representation of any UI element that a user can interact with. It acts as a wrapper for an action, and lets the back-end provide some text or an image that the client can use to create buttons for the user.

We place interactables inside ViewGroups, which are logical groupings of the available user interactions inside the current view. For example, all the options for selecting a task status might go in a different group than the option to switch sorting.

Combining all of this, we can define the views query a client would make to fetch a view of the application. To fetch the initial view, a client will make this query without including the optional viewId query parameter. It can then use the returned view id as the viewId in the tasks query to fetch its list of filtered tasks.

When a ChangeViewAction is performed (akin to travelling along one of the graph edges), the client repeats the views query with the viewId contained in the action to fetch the new set of UI elements available to the user.

type View {
  id: String!
  viewGroups: [ViewGroup]
}
type ViewGroup {
  interactables: [Interactable]
  header: String
}
type Interactable {
  action: Action
  imageSrc: String
  text: String
}
type Query {
  views(viewId: String): View
  tasks(viewId: String): TaskList
}
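The navigation loop described above can be sketched in a few lines of Python. This is an illustration only: `VIEW_GRAPH`, `views_query`, `Client`, the view ids and the tracking strings are hypothetical, and plain dictionaries shaped like the View, ViewGroup and Interactable types stand in for real GraphQL responses.

```python
def button(text, view_id, tracking):
    """Build an Interactable that wraps a ChangeViewAction."""
    return {"text": text, "action": {"viewId": view_id, "tracking": tracking}}


def make_view(view_id, groups):
    """Build a View from a {header: [interactables]} mapping."""
    return {
        "id": view_id,
        "viewGroups": [
            {"header": header, "interactables": items}
            for header, items in groups.items()
        ],
    }


# Hypothetical in-memory stand-in for the data behind the `views` query.
VIEW_GRAPH = {
    v["id"]: v
    for v in [
        make_view("all-desc", {
            "Sort": [button("Oldest first", "all-asc", "sort_ascending")],
            "Status": [button("Todo", "todo-desc", "filter_todo")],
        }),
        make_view("all-asc", {
            "Sort": [button("Newest first", "all-desc", "sort_descending")],
        }),
        make_view("todo-desc", {
            "Status": [button("All tasks", "all-desc", "filter_none")],
        }),
    ]
}
DEFAULT_VIEW_ID = "all-desc"


def views_query(view_id=None):
    """Omitting view_id returns the initial view, as in the schema above."""
    return VIEW_GRAPH[view_id or DEFAULT_VIEW_ID]


class Client:
    def __init__(self):
        self.tracked = []          # tracking strings sent so far
        self.view = views_query()  # initial view, no viewId supplied

    def options(self):
        """Labels of every button the current view offers."""
        return [
            i["text"]
            for group in self.view["viewGroups"]
            for i in group["interactables"]
        ]

    def press(self, text):
        """Perform the action behind a button: track it, then re-query."""
        for group in self.view["viewGroups"]:
            for interactable in group["interactables"]:
                if interactable["text"] == text:
                    action = interactable["action"]
                    self.tracked.append(action["tracking"])
                    self.view = views_query(action["viewId"])
                    return
        raise KeyError(text)
```

Note that the client never inspects what a view id means. It renders whatever buttons it is given and follows the action attached to the one the user presses.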

Fire and forget

One can imagine a whole host of other actions that a user may take within the task tracker application. Two critical pieces of interactivity would be editing the details of a todo item, and re-assigning its status.

We run into some challenges here. If our task tracker is meant to be a real-time collaborative tool, then when our user Bob changes the details of a task, his partner Alice next door should see that reflected on her client. Also, if Alice had a view that was filtering only for “In Progress” tasks, and Bob switches a task from “In Progress” to “Done”, it would need to disappear from Alice’s view.

With our task tracker, we will tackle this problem by having all actions be inherently asynchronous. As opposed to a traditional request-response model, where a client would get the result of an action immediately, it will instead dispatch an action, and then at some point in the future receive an event corresponding to that action.

By doing this, when Bob changes a task, his client and all other clients will be dispatched the same event, which they can use to update their UIs. This gives us eventual consistency among all our clients, and enables real-time collaboration on the shared task list.

For our simple application, we will need two types of events:

  • TaskViewEvents, which are sent when the set of tasks a client can see changes
  • TaskDetailEvents, which are sent when the details of a task change

As a first pass, we use a polling query to fetch new events from the server. The taskViewEvents query takes a viewId so that clients can receive updates specific to their current view, and the taskDetailEvents query takes a taskId since a client only needs to perform real-time updates on the currently open task.

# The actual event types that represent changes in the UI
interface TaskViewEvent {
  task: Task
}
type TaskViewRemovedEvent implements TaskViewEvent {
  task: Task
}
type TaskViewInsertedEvent implements TaskViewEvent {
  task: Task
}
type TaskDetailEvent {
  taskDetail: TaskDetail
}
# Summaries that contain multiple events and provide cursors
type TaskViewEventSummary {
  cursor: String
  events: [TaskViewEvent]
}
type TaskDetailEventSummary {
  cursor: String
  events: [TaskDetailEvent]
}
type Query {
  taskViewEvents(viewId: String, cursor: String): TaskViewEventSummary!
  taskDetailEvents(taskId: String, cursor: String): TaskDetailEventSummary!
}

We do have to introduce cursors to support this event polling. This is so that the server can keep track of which events a given client has already been sent, allowing us to avoid duplicate events and to handle large event volumes via pagination. The initial event cursor would be provided in the tasks query.
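One simple way to picture cursor-based polling is an append-only event log where the cursor is an offset into the log. The Python sketch below assumes exactly that. `EventLog`, `drain` and the page size of 2 are hypothetical choices for illustration, and a real cursor would more likely be an opaque token than a stringified integer.

```python
class EventLog:
    """Append-only log of events; a cursor is an offset into the log."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def poll(self, cursor, limit=2):
        """Return at most `limit` events after `cursor`, plus the next cursor.

        The result is shaped like the event summary types in the schema.
        """
        start = int(cursor)
        page = self._events[start:start + limit]
        return {"cursor": str(start + len(page)), "events": page}


def drain(log, cursor):
    """Client-side loop: keep polling until a page comes back empty."""
    received = []
    while True:
        summary = log.poll(cursor)
        if not summary["events"]:
            return received, cursor
        received.extend(summary["events"])
        cursor = summary["cursor"]
```

Since the client always sends back the last cursor it was given, the server never has to remember per-client state, and polling twice with the same cursor returns the same page rather than skipping events.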

The diagram below shows how two clients would respond to the same user action while having different views. Client A is filtering by “In Progress”, while Client B is filtering by “Done”.

A task re-assignment being handled asynchronously by Client A (left) and Client B (right)

The order of events is as follows:

  • The user of Client A clicks a button to re-assign Task Two from “In Progress” to “Done”
  • An AssignTaskAction is dispatched from Client A to the application API
  • The application API performs any coordination required to update the data for Task Two in the database, and produces events for the action
  • At some time later, Client A polls for events and receives a TaskViewRemovedEvent, with Task Two as the content
  • At some other time, Client B polls for events and receives a TaskViewInsertedEvent, with Task Two as the content
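The steps above hinge on the application API turning one action into different events for different views. A minimal Python sketch of that fan-out follows, assuming each view filters on at most one status. The function name `events_for_reassignment` and the dictionary shapes are hypothetical, and the sketch emits nothing for views where the task stays visible because the schema above only defines removed and inserted events.

```python
def events_for_reassignment(task_id, old_status, new_status, view_filters):
    """Return {view_id: [event, ...]} for a task whose status changed.

    view_filters maps a view id to the status that view filters on,
    or None if the view shows tasks of every status.
    """
    events = {}
    for view_id, wanted in view_filters.items():
        was_visible = wanted is None or wanted == old_status
        is_visible = wanted is None or wanted == new_status
        if was_visible and not is_visible:
            # The task no longer matches this view's filter.
            events[view_id] = [{"type": "TaskViewRemovedEvent", "task": task_id}]
        elif is_visible and not was_visible:
            # The task has just started matching this view's filter.
            events[view_id] = [{"type": "TaskViewInsertedEvent", "task": task_id}]
        else:
            # Visibility did not change, so there is nothing to insert or remove.
            events[view_id] = []
    return events
```

With Client A on an “In Progress” view and Client B on a “Done” view, re-assigning Task Two yields a removal for A and an insertion for B, which each client picks up on its next poll.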

Task manager complete

To summarize, we’ve utilized the following high level concepts in creating our real-time task manager:

  • A GraphQL application API to allow our clients to render their UI with minimal client side logic and data manipulation
  • Views to enable filtering by multiple criteria in a server-driven manner
  • Actions and interactables to provide declarative self-encapsulated bundles that enable UI interactions
  • Asynchronous dispatch and event polling to enable eventual consistency and real-time updates amongst all clients

What you see above represents much of how the Inbox product functions at Hootsuite. In reality, we add on a fair amount of additional complexity to address performance and domain concerns, but the core of the application remains very similar to this constructed task manager.

The good

The obvious advantages of this model are that it satisfies the specific needs of the Inbox application (real-time, collaborative, client agnostic), but we’ve also derived some other interesting benefits from adopting this architecture.

The use of the server defined schema has done a lot to ensure feature parity across the different clients and avoid versioning issues. All clients consume from the same version-less application API, so any changes must be validated by both web and mobile front-end teams, in order to ensure we don’t make breaking changes as we evolve the product.

Given that our server schema maps to interface elements, as opposed to domain objects, our clients are generally agnostic to new content so long as it fits within the schema. Our task tracking application, for example, could be repurposed as an IT help desk support application where tasks become support tickets. We can also add new buttons, drop-downs or selectors with minimal friction.

Any logic around data transformations is mostly confined to our application API and further back-end services. All a client does is render out elements, dispatch actions and update based on events. This allows us to unify all business logic into our services, and significantly reduces the complexity of our clients.

The bad

While interactables provide a useful abstraction for user input on our clients, they are somewhat limited when it comes to informing the UI how to represent or render information. It’s not necessarily a negative that a client defines what rendered elements should look like, but there can sometimes be functional differences between UI representations. For example, an interactable with 3 actions could be a set of radio buttons (allowing only one option at a time) or a set of checkboxes (allowing combinations of the options).

Performance considerations have been a challenge in this model. The idea of a stateless view can often lead to a “sluggish” feeling UI, since all actions are only reflected when an eventual event is received. As an example, making a change to a task would not update immediately, since the back-end would need to persist that new task content and create the corresponding event. Optimistic rendering can be used to get around this, but can add additional client-side complexity that must be maintained across different codebases.
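To give a sense of the extra client-side state that optimistic rendering introduces, here is a minimal Python sketch. The class `OptimisticTaskDetail` and its methods are hypothetical, and the sketch makes a simplifying assumption that any incoming event for the task settles the pending edit, which would not hold if a teammate's change arrived first.

```python
class OptimisticTaskDetail:
    """Client-side holder for a task description with optimistic edits."""

    def __init__(self, description):
        self.confirmed = description  # last value received from the server
        self.pending = None           # local edit awaiting its event

    def edit(self, new_description, dispatch):
        """Show the edit immediately, then dispatch the action."""
        self.pending = new_description
        dispatch(new_description)

    def on_event(self, description):
        """Apply a TaskDetailEvent; the server's value always wins."""
        self.confirmed = description
        self.pending = None

    def rendered(self):
        """What the UI should display right now."""
        return self.pending if self.pending is not None else self.confirmed
```

Even in this toy form the client now tracks two values per field and has to decide how to reconcile them, which is the kind of logic the server-driven model was meant to keep out of the clients.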

Error handling can also be tricky. If an action is “fire and forget”, how do you handle cases where an action fails or a particular error response is returned? We wouldn’t generally want an error caused by one team member’s action to be posted to all clients, so we need some way of reconciling which events are caused by which clients and selectively dispatching them.
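One possible shape for that reconciliation, sketched in Python, is to tag an event with the client it is meant for and filter at delivery time. This is an assumption made for illustration, not a description of how Inbox handles errors: the `only_for` field, the event dictionaries and `events_visible_to` are all hypothetical.

```python
def events_visible_to(client_id, events):
    """Return the events a given client should receive.

    An event without an "only_for" field is broadcast to everyone;
    an event with one is delivered only to the client it names.
    """
    return [e for e in events if e.get("only_for") in (None, client_id)]


# Example log: one shared update, and one error meant only for Bob's client.
EVENTS = [
    {"type": "TaskViewInsertedEvent", "task": "task-2"},
    {"type": "ActionFailedEvent", "task": "task-3", "only_for": "bob"},
]
```

Here Alice's client would receive only the first event, while Bob's client would receive both and could show the failure next to the action that caused it.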

Conclusion

Since joining the Engage team in September 2018, I’ve greatly enjoyed working on enhancing and extending the Inbox product. I hope you’ve enjoyed learning about the key design patterns that make Inbox possible.

I’d like to thank the entirety of the Engage team at Hootsuite for providing me with the insights and knowledge about the Inbox product necessary to write this article (and also for just being generally an awesome group of people to work with). Shout-out to Joan Fornells and James Hunter for their guidance and input in helping me put this article together.


About the author

Daniel Bajj is a 4th year computer science student at the University of British Columbia, and a co-op software developer on the Engage team at Hootsuite. He enjoys working across the software stack, leveraging a variety of technologies to create robust and reliable solutions for customers. In his spare time you’ll find him playing guitar, creating indie video games or trying out new dishes in the kitchen.