Optimizing Memory and Performance for i18n

Daniel Tsang
Kustomer Engineering
Oct 28, 2021

Written by: Anne Zhou and Daniel Tsang

When optimizing memory usage, take a step back and look at the data you’re storing in your application. Are you fetching more data than you’re using? Is your API sending more data than necessary? There are tools that help return only the data you need, but can you improve performance without adding new services or completely re-architecting your application? These are the questions we asked ourselves when investigating our platform for performance issues.

In Kustomer, snippets live at the core of assisting agents with sending translated messages to their customers. The most basic example of a snippet is “Hello,” which an organization can add translations for in their supported languages, such as German, Spanish, and French. If an agent is assisting a French-speaking customer, inserting the “Hello” snippet and selecting the French language would display “Bonjour” to the customer.

As our platform has grown to serve thousands of organizations, serving and storing snippets for all of our customers has led to scalability and performance concerns. Therefore, we took a close look at how we can improve this essential part of our platform.

Identifying the problem

In order to support internationalization in our web application, the front-end client fetched all of the organization’s snippets in every language they supported. It also fetched the ~150 default snippets that Kustomer translates into the 75+ languages it natively supports; these default snippets are seeded for every new organization created in our system. The default snippets alone amounted to 11,250 snippet variations (150 snippets × 75 languages).

If a smaller client only used the default snippets, they loaded roughly 888 KB of snippets on each page load. If an organization with heavy snippet usage had 4,500+ custom snippets and three enabled languages, it’d return 13,500 snippet variations (4,500 snippets × 3 languages) on top of the 11,250 defaults. Those 24,750 snippets would equate to roughly 5 MB. With agents logging in and out, coming back from breaks, refreshing their browser, and more, the data was requested roughly 25,000 times every 10 minutes.
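The variation counts above follow directly from multiplying snippets by languages; a quick check of the arithmetic:

```javascript
// Back-of-envelope snippet-variation counts from the figures above.
const DEFAULT_SNIPPETS = 150;   // default snippets seeded per organization
const SUPPORTED_LANGUAGES = 75; // languages Kustomer natively supports
const CUSTOM_SNIPPETS = 4500;   // a heavy-usage organization's custom snippets
const ENABLED_LANGUAGES = 3;    // languages that organization has enabled

const defaultVariations = DEFAULT_SNIPPETS * SUPPORTED_LANGUAGES; // 11,250
const customVariations = CUSTOM_SNIPPETS * ENABLED_LANGUAGES;     // 13,500
const totalVariations = defaultVariations + customVariations;     // 24,750

console.log({ defaultVariations, customVariations, totalVariations });
```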

Once all snippets were fetched from the API, the data was stored in the application state to make it globally available. Agents use snippets across various pages, including ones where they configure the copy of consumer-facing surfaces like CSAT emails. While storing every snippet in every language in the application state may have been acceptable for organizations with few languages, it quickly became unscalable for our larger clients, adding client-side memory bloat as they added more languages and snippets.

The volume of data also had a large impact on our API servers. The backend needed to query the database, await the results, process them, and return the data to the client. The more data that had to be transferred to our clients, the longer each request blocked and the longer our servers took to respond.

Querying only what we need

The solution appeared fairly trivial. Our web application needed to stop requesting snippets in every language. In order to do that, the front-end had to only request the organization’s enabled languages, and the back-end needed to support returning data in select languages.

At the time, our APIs only allowed a single language to be specified. When one language was requested, the server queried the snippet from the database with all of its translations, then filtered out the unneeded languages in application code. This needlessly used server memory to hold every translation before narrowing the response down to the requested language. A more memory-efficient solution was to select the languages at the database level, so the server wouldn’t need to post-process the data at all.
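Selecting languages at the database level can be sketched as a projection. The document shape below (a `translations` map keyed by locale) and the function name are illustrative assumptions, not Kustomer’s actual schema:

```javascript
// Build a MongoDB-style projection that selects only the requested
// languages from a snippet document shaped like:
//   { name: "hello", translations: { en: "Hello", fr: "Bonjour", de: "Hallo" } }
// Document shape and field names here are hypothetical.
function buildLanguageProjection(languages) {
  const projection = { name: 1 };
  for (const lang of languages) {
    projection[`translations.${lang}`] = 1;
  }
  return projection;
}

// e.g. db.collection("snippets").find({ org: orgId }, { projection })
const projection = buildLanguageProjection(["en", "fr"]);
// → { name: 1, "translations.en": 1, "translations.fr": 1 }
```

With the projection applied, the database returns only the requested translations, so the server never holds the other 70+ languages in memory.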

After that, adding support for multiple languages was trivial. The server would parse the languages requested in the URL query string and tell the database to only select the specific languages. No more post-processing on the server was required.
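The query-string parsing step can be sketched as follows; the `languages` parameter name is an assumption for illustration:

```javascript
// Parse the requested languages out of a URL query string such as
// "?languages=en,fr,de". The parameter name "languages" is hypothetical.
function parseLanguages(queryString) {
  const params = new URLSearchParams(queryString);
  const raw = params.get("languages");
  if (!raw) return []; // no filter requested; the caller decides the default
  return raw.split(",").map((l) => l.trim()).filter(Boolean);
}

parseLanguages("?languages=en, fr,de"); // → ["en", "fr", "de"]
```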

On the client-side, the web application asynchronously prefetched certain data, such as the current user, their teams, and permission sets. The client waited for all of the specified data to be available in the application state before rendering the UI.
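That render gate amounts to a simple predicate over the application state; the state key names below are hypothetical:

```javascript
// Slices of state the client waits on before rendering; names are hypothetical.
const REQUIRED_STATE_KEYS = ["currentUser", "teams", "permissionSets"];

// The UI renders only once every required slice of state has been loaded.
function isReadyToRender(state, requiredKeys = REQUIRED_STATE_KEYS) {
  return requiredKeys.every((key) => state[key] !== undefined);
}

isReadyToRender({ currentUser: {}, teams: [] });                      // → false
isReadyToRender({ currentUser: {}, teams: [], permissionSets: [] });  // → true
```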

Although fetching more data would slow down application initialization, we ultimately decided to prefetch the organization’s language settings as well. The additional fetch paid for itself by drastically reducing the client’s overall memory footprint: once the language settings were loaded, the client requested only the enabled languages from our snippets service.
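On the client side, the request can be built from the enabled-languages list; the URL path and parameter name here are illustrative, not our actual API surface:

```javascript
// Once the organization's settings are loaded, request snippets only in
// the enabled languages. The path and parameter name are hypothetical.
function buildSnippetsUrl(enabledLanguages) {
  const params = new URLSearchParams({ languages: enabledLanguages.join(",") });
  return `/v1/snippets?${params.toString()}`;
}

buildSnippetsUrl(["en", "fr"]); // → "/v1/snippets?languages=en%2Cfr"
```

Note that `URLSearchParams` percent-encodes the commas, and the same class on the server decodes them back, so the two ends stay in sync.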

Client memory improvement

Looking at system snippets alone, an org with at most two languages enabled would load ~87 KB of snippets instead of ~888 KB, reducing memory usage by ~90%. Most clients have five or fewer languages enabled. If an organization created custom snippets for a language and later stopped supporting it by disabling the language, that would further shrink their memory footprint.
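The quoted savings can be checked directly from the article’s own numbers:

```javascript
// Verify the ~90% reduction: ~87 KB with two languages vs ~888 KB with all 75.
const beforeKB = 888; // all default snippet translations
const afterKB = 87;   // two enabled languages only
const reductionPct = Math.round((1 - afterKB / beforeKB) * 100);
console.log(`${reductionPct}% smaller`); // → "90% smaller"
```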

Findings and next steps

Our efforts to optimize performance around snippets reduced our memory footprint on both the front-end and back-end. They also exposed further opportunities for improvement. It became apparent that the number of languages to fetch could be permanently reduced to two by fetching only the user’s default language and the organization’s default language; if an agent encountered a language that was not loaded, the front-end could lazily load it. We plan to tackle this follow-up work in the future, so stay tuned for another blog post!

Here are some of the improved metrics for our Node.js snippets API service.

Latency Graph:

  • ~150ms P95 response times on average pre-optimization
  • ~50ms P95 response times post-optimization

Node.js Runtime Metrics Graphs:
