Visualized by Ilyas Fahreza

Building Traveloka Web Performance Culture

Ryan Nixon Salim
Traveloka Engineering Blog
8 min read · Oct 27, 2020


Editor’s Note:

Today, we hear from Ryan Nixon Salim, who shares one of the key frontend performance challenges faced by Traveloka (or any fast-growing online business), the technical initiative to help mitigate it, and, more importantly for the long run, the kind of culture needed to make performance a major competitive advantage.

As a corporate technology and web infra engineer, Ryan, together with his team, is responsible not only for securing top-notch performance and stellar reliability of the Traveloka website, but also for maintaining the core web frameworks and, in collaboration with the web product teams, overseeing the adoption and advancement of a web performance culture.

Background

Traveloka has many products that enhance our users' travel and lifestyle experiences across five categories: Transports, Accommodations, Lifestyle, Finance, and Travel Enhancements.

As Traveloka's products grow rapidly, so does our codebase. Web engineers are often unaware of product or feature regressions caused by unintentional changes until it is too late to notice, let alone fix, them. Regressions are among the hardest problems to track, especially when they involve web performance. Who will notice that a page has grown by 200 KB because the wrong library was imported? Or that a hotel's detail page takes one second longer to load than usual?

Usually, we (or, in most cases, a product manager) only start detecting performance degradation once, for example, Traveloka's homepage becomes noticeably slower to load. The most common follow-up questions are:

  1. When did it happen?
  2. Which commits or changes caused the performance degradation?
  3. What is the root cause?

Prior to January 2019, Traveloka did not have any visibility or monitoring tool to track this kind of regression, which made answering the questions above accurately much harder: engineers had to dig deep and wide into the code and Git commits to find that one needle in the haystack.

For web engineers, creating new features and products takes precedence over improving performance. The former has a clear impact on our users' experience and is a lot more exciting for us engineers, whereas the latter, such as improving rendering speed by 200 ms, tends to fall by the wayside because its individual impact is negligible. However, performance tuning performed collectively and consistently over the long run is a sound investment that can spare us a massive restoration effort of refactoring or rewriting later.

Traveloka Frontend Technical Excellence Initiative

To solve this problem, we started the Traveloka Frontend Technical Excellence initiative, a project to enable visibility and prevent regressions on Traveloka's frontend platforms (Android, iOS, and web).

First, for each of those platforms, we defined the right metrics to track and the definitive methods to measure them. Together, these are called the Technical Excellence Metrics (TEM).

The TEM for the web platform measures reliability, performance, security, agility, and quality, with 16 metrics in total. For example, the Lighthouse score covers performance and unit test coverage covers quality.

The Tools

To enable visibility on the frontend platforms through the TEM, we created Themis and Layar.

Figure 1. Themis and Layar Logo

Themis

Themis is a command-line interface (CLI) that asserts thresholds (budgets) for various technical metrics, blocks pull requests (PRs), and sends report data to our dashboard. In a nutshell, it is a general-purpose take on Lighthouse's budget feature that caters not only to Lighthouse metrics but to all of our TEM.

We run Themis in our continuous integration (CI) pipelines. Themis reads our budget config (defined in a threshold.json file) and decides whether a metric's value in a particular commit breaches our budget (thresholds). If it does, the Themis Lighthouse GitHub Check blocks the PR.

Figure 2. Themis Lighthouse GitHub Check
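The budget check Themis performs in CI can be sketched as follows. Themis' actual threshold.json schema is not published, so the entry shape (`warn`/`fail` per metric key), the metric keys, and the function names `evaluateMetric`, `evaluateCommit`, and `shouldBlock` are all hypothetical. The 85/80 warning/fail values are the thresholds the mobile homepage team chose, and the sketch assumes a score exactly at a threshold passes it.

```typescript
// Hypothetical shape of one threshold.json entry; Themis' real schema is not
// public, so these field names are illustrative only.
interface MetricBudget {
  warn: number; // scores below this produce a warning
  fail: number; // scores below this block the PR
}

type ThresholdConfig = Record<string, MetricBudget>;
type Verdict = "pass" | "warn" | "fail";

// Assumes a higher-is-better metric such as a 0-100 Lighthouse score.
function evaluateMetric(value: number, budget: MetricBudget): Verdict {
  if (value < budget.fail) return "fail";
  if (value < budget.warn) return "warn";
  return "pass";
}

// Checks every measured metric that has a budget. Metrics without a budget
// are skipped, so a page can be tracked without ever blocking a PR.
function evaluateCommit(
  measured: Record<string, number>,
  config: ThresholdConfig,
): Record<string, Verdict> {
  const verdicts: Record<string, Verdict> = {};
  for (const [metric, value] of Object.entries(measured)) {
    const budget = config[metric];
    if (budget !== undefined) {
      verdicts[metric] = evaluateMetric(value, budget);
    }
  }
  return verdicts;
}

// A single failing metric is enough to block the PR.
function shouldBlock(verdicts: Record<string, Verdict>): boolean {
  return Object.values(verdicts).includes("fail");
}

// In CI this string would be read from threshold.json; the key is hypothetical.
const config: ThresholdConfig = JSON.parse(`{
  "mobile-homepage.lighthouse-performance": { "warn": 85, "fail": 80 }
}`);

const verdicts = evaluateCommit(
  {
    "mobile-homepage.lighthouse-performance": 82, // budgeted: 80 <= 82 < 85
    "hotel-detail.lighthouse-performance": 40, // no budget yet: ignored
  },
  config,
);
console.log(verdicts, shouldBlock(verdicts)); // a warning only, so the PR is not blocked
```

Keeping "warn" separate from "fail" lets a team see a score drifting toward its budget before anything is actually blocked.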

Layar

Layar is a generic dashboard for displaying our TEM results. For each TEM, it supports historical data with per-commit granularity, a detail page, filtering, and a leaderboard. When we spot a drop in a particular metric, as in Figure 3 below, we can simply click the specific dot on the chart (the commit that caused the drop) to open its detail page for debugging.

Figure 3. A performance drop can be seen on the chart of Layar’s Overview page
Figure 4. Layar’s Web Performance Detail Page

As Figure 4 above shows, the detail page offers thorough information about a specific commit's web performance metrics: each Lighthouse test value (with a link to its corresponding report), a comparison with other commits, a performance fluctuation indicator, and, most importantly, the "Attribute" section, which shows the commit's metadata such as the author name, GitHub commit URL, report date, and build URL.

With such visibility, tracking and debugging the root cause of any performance drop becomes easier.
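Layar surfaces drops visually, but the underlying idea (compare each commit's score with the previous commit's and flag large falls) can be sketched in a few lines. This is an illustration, not Layar's implementation: the `CommitScore` shape, the `findDrops` name, and the `minDrop` threshold are assumptions. A threshold matters because Lighthouse scores fluctuate slightly between runs, so small dips should not be flagged.

```typescript
// One data point per commit, as a dashboard like Layar might store it.
// Field names are hypothetical.
interface CommitScore {
  sha: string;
  score: number; // e.g. Lighthouse performance score, 0-100
}

interface Drop {
  sha: string; // commit where the score dropped
  previousSha: string; // commit it is compared against
  delta: number; // negative: how many points were lost
}

// Returns commits whose score fell by at least `minDrop` points compared with
// the immediately preceding commit. `history` must be in commit order.
function findDrops(history: CommitScore[], minDrop: number): Drop[] {
  const drops: Drop[] = [];
  for (let i = 1; i < history.length; i++) {
    const delta = history[i].score - history[i - 1].score;
    if (delta <= -minDrop) {
      drops.push({ sha: history[i].sha, previousSha: history[i - 1].sha, delta });
    }
  }
  return drops;
}

// Made-up scores: the 1-point dip at b2 is noise, the 16-point fall at c3 is not.
const scoreHistory: CommitScore[] = [
  { sha: "a1", score: 78 },
  { sha: "b2", score: 77 },
  { sha: "c3", score: 61 },
  { sha: "d4", score: 62 },
];
console.log(findDrops(scoreHistory, 5)); // one drop: c3, 16 points below b2
```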

The Flow

Figure 5. Themis + Layar Flow for web performance

Every pull request, and every commit merged to master, goes through Themis' checking process. Themis can be integrated with any metric and measurement tool. In the flow shown in Figure 5 above, the Lighthouse CLI (the "Lighthouse Testing Registered Page" step) measures our web performance score by testing a list of registered pages on the Traveloka Preview website, which is backed by the Mock API. The Mock API gives us more "sterile" and stable lab test data because it reduces the flakiness of Lighthouse's test results.
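The Lighthouse CLI can write its result as JSON (for example with `--output=json`), and in that report the performance category score sits at `categories.performance.score` on a 0-1 scale, or `null` if the run errored. A minimal sketch of turning per-page reports into the 0-100 scores a budget check consumes follows. The page ids, preview URLs, and function names are hypothetical, and only the handful of report fields used here are typed.

```typescript
// Minimal subset of the Lighthouse result (LHR) JSON that this sketch reads.
// Category scores are 0-1 and can be null when the run errored.
interface LighthouseResult {
  requestedUrl: string;
  categories: {
    performance?: { score: number | null };
  };
}

// Converts one report into the 0-100 score shown in Lighthouse's UI.
// Throws on a missing score so that a broken run can never pass a budget.
function performanceScore(lhr: LighthouseResult): number {
  const score = lhr.categories.performance?.score;
  if (score === undefined || score === null) {
    throw new Error(`No performance score for ${lhr.requestedUrl}`);
  }
  return Math.round(score * 100);
}

// Collects scores for every registered page, keyed by a hypothetical page id.
function scoresForRegisteredPages(
  reports: Record<string, LighthouseResult>,
): Record<string, number> {
  const scores: Record<string, number> = {};
  for (const [pageId, lhr] of Object.entries(reports)) {
    scores[pageId] = performanceScore(lhr);
  }
  return scores;
}

// Hypothetical page ids and URLs; the scores are made up for illustration.
const reports: Record<string, LighthouseResult> = {
  "mobile-homepage": {
    requestedUrl: "https://preview.example.com/",
    categories: { performance: { score: 0.83 } },
  },
  "hotel-detail": {
    requestedUrl: "https://preview.example.com/hotel/123",
    categories: { performance: { score: 0.57 } },
  },
};
console.log(scoresForRegisteredPages(reports)); // mobile-homepage: 83, hotel-detail: 57
```

Failing loudly on a `null` score is a deliberate choice: treating a crashed run as zero would block PRs for infrastructure reasons, while treating it as passing would hide regressions.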

Once Themis' checking process is done, Themis sends the result data to Layar for display and reports back to GitHub whether the PR needs to be blocked for exceeding its budget.
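One way to report a verdict back to GitHub is a check run. The sketch below only builds the request body; the fields `name`, `head_sha`, `status`, `conclusion`, and `output.title`/`output.summary` belong to GitHub's "create a check run" REST endpoint, while the check name, the warn-to-`neutral` mapping, and the helper names are assumptions rather than how Themis actually works. Authentication and the HTTP call are omitted. A `failure` conclusion only prevents merging when branch protection marks the check as required.

```typescript
type Verdict = "pass" | "warn" | "fail";
type CheckConclusion = "success" | "neutral" | "failure";

// Subset of the body accepted by POST /repos/{owner}/{repo}/check-runs.
interface CheckRunPayload {
  name: string;
  head_sha: string;
  status: "completed";
  conclusion: CheckConclusion;
  output: { title: string; summary: string };
}

// Assumed mapping: warnings are visible but do not fail the check.
const CONCLUSION: Record<Verdict, CheckConclusion> = {
  pass: "success",
  warn: "neutral",
  fail: "failure",
};

// Worst verdict wins: one failing metric fails the whole check.
function overallVerdict(verdicts: Verdict[]): Verdict {
  if (verdicts.includes("fail")) return "fail";
  if (verdicts.includes("warn")) return "warn";
  return "pass";
}

function buildCheckRun(headSha: string, results: Record<string, Verdict>): CheckRunPayload {
  const verdict = overallVerdict(Object.values(results));
  const lines = Object.entries(results).map(([metric, v]) => `- ${metric}: ${v}`);
  return {
    name: "Themis Lighthouse", // hypothetical check name
    head_sha: headSha,
    status: "completed",
    conclusion: CONCLUSION[verdict],
    output: {
      title: `Budget check: ${verdict}`,
      summary: lines.join("\n") || "No budgeted metrics.",
    },
  };
}

const payload = buildCheckRun("abc123", {
  "mobile-homepage.lighthouse-performance": "warn",
  "hotel-detail.lighthouse-performance": "pass",
});
console.log(payload.conclusion); // "neutral"
```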

As of today, these two regression-prevention tools are operational on all of Traveloka's frontend platforms. Themis makes it easy to set budgets for every TEM, and Layar shows reports and captures any performance regression effortlessly.

Building the Web Performance Culture

The next big step is spreading and cultivating a web performance culture among our engineers.

Figure 6. Web Performance Culture cycle

1. Spread the Best Practices

To raise awareness of web performance and spread its best practices, the web infra team ran workshops for all web product engineering teams covering the purpose of the web performance metrics, how to measure them, how to improve them, and practical tips and tricks.

We also introduced our regression monitoring tools (Themis + Layar) and showed how they can prevent frontend TEM regressions, especially in web performance.

2. Collaboration

At the end of the workshops, we asked each web product engineering team to register its product pages with Themis + Layar without any budget, so that no PR would be blocked if a regression occurred. The goal was to collect baseline data and show each team the performance scores of its product pages, encouraging them to understand the value of the two tools and adopt them.

Afterwards, each web product engineering team, in collaboration with the web infra team, repeated the same exercise, this time with a self-determined budget in place (starting as low as a Lighthouse score of 20) that could be raised gradually as the team grew more confident in its product pages. It is important to emphasize that setting the budget is up to the product engineering team, not to web infra as the central team. Nevertheless, as technical evangelists, the central team holds a biweekly sync meeting with a representative of every web product engineering team to gather problems, status updates, and feedback in order to maintain technical excellence.
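Because budgets start low and are raised as confidence grows, a team might want a suggested next value. The following is a hypothetical ratcheting rule, not Traveloka's policy (the teams set budgets themselves): propose a fail threshold that every recent commit has already cleared with some margin, and never lower the current one. The `proposeFailBudget` name, the 5-point margin, and the 10-sample minimum are all assumptions.

```typescript
// Hypothetical ratchet: suggest a fail threshold that every recent commit has
// already cleared with some margin, and never lower the existing one.
function proposeFailBudget(
  currentFail: number,
  recentScores: number[], // e.g. Lighthouse scores of the last N commits
  margin = 5,
  minSamples = 10,
): number {
  // Too little history: keep the current budget rather than guess.
  if (recentScores.length < minSamples) return currentFail;
  const worstRecent = Math.min(...recentScores);
  const candidate = Math.floor(worstRecent - margin);
  // Only ratchet upward, and never above the 0-100 Lighthouse scale.
  return Math.min(100, Math.max(currentFail, candidate));
}

// Made-up scores for a team that started at a fail budget of 20.
const recent = [52, 49, 55, 47, 50, 53, 51, 48, 54, 50];
console.log(proposeFailBudget(20, recent)); // 42 (worst recent score 47, minus 5)
```

Basing the suggestion on the worst recent score, rather than the average, keeps the proposed budget from blocking PRs that merely reproduce the page's normal run-to-run variation.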

3. Recognition

The Frontend Technical Excellence initiative, which has secured the support of our VP(s) of Engineering, is meant to become an integral part of Traveloka's frontend engineering culture. Integrating it into individual performance reviews therefore incentivizes engineers to embrace its best practices and earns them recognition for doing so.

The central team also sends a monthly report to each product's upper managers and VP so that they can see any regressions per product and per platform.
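The per-product, per-platform view in that monthly report boils down to a grouping step. A minimal sketch, assuming a hypothetical `Regression` record (the field names and product labels are illustrative; the metric names are the two TEM examples mentioned earlier):

```typescript
type Platform = "web" | "android" | "ios";

// One detected regression; field names and product labels are hypothetical.
interface Regression {
  product: string;
  platform: Platform;
  metric: string;
  sha: string;
  date: string; // ISO date, "YYYY-MM-DD"
}

// Counts regressions per "product/platform" for one month given as "YYYY-MM".
function monthlySummary(regressions: Regression[], month: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const r of regressions) {
    // Matching on "YYYY-MM-" avoids "2020-1" accidentally matching October.
    if (!r.date.startsWith(`${month}-`)) continue;
    const key = `${r.product}/${r.platform}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

const regressions: Regression[] = [
  { product: "Hotel", platform: "web", metric: "lighthouse-performance", sha: "c3", date: "2020-10-05" },
  { product: "Hotel", platform: "web", metric: "unit-test-coverage", sha: "e5", date: "2020-10-19" },
  { product: "Flight", platform: "android", metric: "unit-test-coverage", sha: "f6", date: "2020-10-21" },
  { product: "Hotel", platform: "web", metric: "lighthouse-performance", sha: "a0", date: "2020-09-30" },
];
monthlySummary(regressions, "2020-10").forEach((count, key) => console.log(key, count));
// Hotel/web 2, Flight/android 1 (the September entry is excluded)
```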

4. Do More

Every engineer can do more by tracking and measuring how the noticeable performance improvements they make correlate with business metrics. They can also team up with the data team to run A/B test experiments that look for further correlation, and use field data such as SpeedCurve or the Chrome UX Report as additional data points.
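A first check of whether a performance metric moves with a business metric is the Pearson correlation coefficient between two aligned series, for instance a daily page load time and a daily conversion rate. This sketch uses made-up numbers, not Traveloka data, and the series names are hypothetical. Pearson only captures linear association, and correlation alone does not show that the speed-up caused the business change, which is why a controlled A/B test is the stronger tool.

```typescript
// Pearson correlation coefficient between two equally long series.
// Returns a value in [-1, 1]; near -1 means one rises as the other falls.
function pearson(xs: number[], ys: number[]): number {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("Need two series of equal length with at least 2 points");
  }
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  if (varX === 0 || varY === 0) {
    throw new Error("Correlation is undefined for a constant series");
  }
  return cov / Math.sqrt(varX * varY);
}

// Toy numbers for illustration only.
const loadTimeSec = [3.1, 2.8, 2.5, 2.2, 2.0];
const conversionPct = [1.9, 2.0, 2.2, 2.3, 2.4];
console.log(pearson(loadTimeSec, conversionPct).toFixed(2)); // strongly negative, close to -1
```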

Conclusions

As of today, 61 desktop and 51 mobile pages from various products across the five categories have been registered to have their web performance (Lighthouse score) tracked and measured. In fact, the team behind the mobile version of Traveloka's homepage is confident enough to have set its warning and fail thresholds at Lighthouse scores of 85 and 80, respectively. Eight products have also set their minimum budgets. This is a major achievement for Traveloka's web platform team, and we continue to encourage the rest of the web product engineering teams to follow suit.

There is no silver bullet for improving web performance or for building a culture around it. Understandably, a performance improvement sometimes has no impact on business metrics. However, once a performance culture has been instilled in product engineers and its essence is well understood, they will improve the overall user experience of the products they build of their own accord and conviction.

Web performance is a long-term investment. In the short run, an average-performing e-commerce site running massive promotions (an incentive dearly sought after, especially in Indonesia) will most likely pull ahead of a similar site that has tuned its performance on both business and technical metrics. However, running promotions indefinitely is not sustainable for any business. That is why Traveloka needs to start early to win the long-term battle of making web performance one of its competitive advantages.

If you are interested in our technical efforts to improve Traveloka's web performance, stay tuned for "Improving Traveloka Web Performance", the follow-up to this blog post. Until then, take a look at our list of open engineering roles if you are interested in tackling the sort of technical challenges described here.
