How We Built Our In-house Chat Platform for the Web

By Vasudevan K.

Grab
Grab
9 min readSep 9, 2020

--

At Grab, we’ve built an in-house chat platform to help connect our passengers with drivers during a booking, as well as with their friends and family for social sharing purposes.

P2P chat for the Angbow campaign and GrabHitch chat

We wanted to focus on our consumer support chat experience, and so we replaced the third-party live chat tool that we’ve used for years with our newly developed chat platform. As a part of this initiative, we extended this platform for the web to integrate with our internal Customer Support portal.

Sample chat between a driver and a customer support agent

This is the first time we introduced chat on the web, and we faced a few challenges while building it. In this article, we’ll go over some of these challenges and how we solved them.

Current Architecture

A vast majority of the communication from our Grab passenger and driver apps happens via TCP. Our TCP gateway takes care of processing all the incoming messages, authenticating, and routing them to the respective services. Our TCP connections are unicast, which means there is only one active connection possible per user at any point in time. This served us well, as we only allow our users to log in from one device at a time.

A TL;DR version of our current system

However, this model breaks on the web since our users can have multiple tabs open at the same time, and each would establish a new socket connection. Due to the unicast nature of our TCP connections, the older tabs would get disconnected and wouldn’t receive any messages from our servers. Our Customer Support agents love their tabs and have a gazillion open at any time. This behaviour would be too disruptive for them.

The obvious answer was to change our TCP connection strategy to multicast. We took a look at this and quickly realised that it was going to be a huge undertaking and could introduce a lot of unknowns for us to deal with.

We had to consider a different approach for the web and zeroed in on a hybrid approach with a little known Javascript APIs called SharedWorker and BroadcastChannel.

Understanding the Basics

Before we jump in, let’s take a quick detour to review some of the terminologies that we’ll be using in this post.

If you’re familiar with how WebWorker works, feel free to skip ahead to the next section. For the uninitiated, JavaScript on the browser runs in a single-threaded environment. Workers are a mechanism to introduce background, OS-level threads in the browser. Creating a worker in JavaScript is simple. Let’s look at it with an example:

The worker API comes with a handy postMessage method which can be used to pass messages between the main thread and worker thread. Workers are a great way to add concurrency in a JavaScript application and help in speeding up an expensive process in the background.

Note: While the method looks similar, worker.postMessage is not the same as window.postMessage.

What is a SharedWorker?

SharedWorker is similar to a WebWorker and spawns an OS thread, but as the name indicates, it’s shared across browser contexts. In other words, there is only one instance of that worker running for that domain across tabs/windows. The API is similar to WebWorker but has a few subtle differences.

SharedWorkers internally use MessagePort to pass messages between the worker thread and the main thread. There are two ports- one for sending a message to the main thread and the other to receive. Let’s explore it with an example:

There is a lot to unpack here. Once a SharedWorker is created, we’ve to manually start the port using mySharedWorker.port.start() to establish a connection between the script running on the main thread and the worker thread. Post that, messages can be passed via the worker’s postMessage method. On the worker side, there is an onconnect callback which helps in setting up listeners for connections from each browser context.

Under the hood, SharedWorker spawns a single OS thread per worker script per domain. For instance, if the script name is worker.js running in the domain https://ce.grab.com. The logic inside worker.js runs exactly once in this domain. The advantage of this approach is that we can run multiple worker scripts in the same-origin each managing a different part of the functionality. This was one of the key reasons why we picked SharedWorker over other solutions.

What are Broadcast Channels?

In a multi-tab environment, our users may send messages from any of the tabs and switch to another for the next message. For a seamless experience, we need to ensure that the state is in sync across all the browser contexts.

Message passing across tabs

The BroadcastChannel API creates a message bus that allows us to pass messages between multiple browser contexts within the same origin. This helps us sync the message that’s being sent on the client to all the open tabs.

Let’s explore the API with a code example:

One thing to note here is that communication is restricted to listeners from the same origin.

How Our Chat Rooms are Powered

Now that we have a basic understanding of how SharedWorker and Broadcast channels work, let’s take a peek into how Grab is using it.

Our Chat SDK abstracts the calls to the worker and the underlying transport mechanism. On the surface, the interface just exposes two methods: one for sending a message and another for listening to incoming events from the server.

The SDK does all the heavy lifting to manage the connection with our TCP service, and keeping the information in-sync across tabs.

SDK flow

In our worker, we additionally maintain all the connections from browser contexts. When an incoming event arrives from the socket, we publish it to the first active connection. Our SDK listens to this event, processes it, sends out an acknowledgment to the server, and publishes it in the BroadcastChannel. Let’s look at how we’ve achieved this via a code example.

Managing connections in the worker:

And in the ChatSDK:

This ensures that there is only one connection between our application and the TCP service irrespective of the number of tabs the page is open in.

Some Caveats

While SharedWorker is a great way to enforce singleton objects across browser contexts, the developer experience of SharedWorker leaves a lot to be desired. There aren’t many resources on the web, and it could be quite confusing if this is the first time you’re using this feature.

We faced some trouble integrating SharedWorker with bundling the worker code along. This plugin from GoogleChromeLabs did a great job of alleviating some pain. Debugging an issue with SharedWorker was not obvious. Chrome has a dedicated page for inspecting SharedWorkers (chrome://inspect/#workers), and it took some getting used to.

The browser support for SharedWorker is far from universal. While it works great in Chrome, Firefox, and Opera, Safari and most mobile browsers lack support. This was an acceptable trade-off in our use case, as we built this for an internal portal and all our users are on Chrome.

Shared race

SharedWorker enforces uniqueness using a combination of origin and the script name. This could potentially introduce an unintentional race condition during deploy times if we’re not careful. Let’s say the user has a tab open before the latest deployment, and another one after deployment, it’s possible to end up with two different versions of the same script. We built a wrapper over the SharedWorker which cedes control to the latest connection, ensuring that there is only one version of the worker active.

Wrapping Up

We’re happy to have shared our learnings from building our in-house chat platform for the web, and we hope you found this post helpful. We’ve built the web solution as a reusable SDK for our internal portals and public-facing websites for quick and easy integration, providing a powerful user experience.

We hope this post also helped you get a deeper sense of how SharedWorker and BroadcastChannels work in a production application.

Authored By Vasu on behalf of the Real-Time Communications team at Grab. Special thanks to the working team for their contributions- Sanket Thanvi, Dinh Duong, Kevin Lee, and Matthew Yeow.

Join Us

Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 428 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

Originally published at https://engineering.grab.com.

--

--

Grab
Grab

Southeast Asia’s leading Online to Offline (O2O) mobile platform, providing the everyday services that matter most to consumers. www.grab.com