Memory Leaks in SSR Web Apps

German Quinteros
The Glovo Tech Blog
11 min read · Apr 12, 2023

Introduction

This article explains the memory leak issues we faced at Glovo on our landing page: how we dealt with them, what we did to solve them, and how we prevent them from happening again in the future.

Context

The Glovo landing page is a Server Side Rendering (SSR) web application developed with VueJs and Nuxt; currently we are using versions 2.6.14 and 2.15.8 respectively. It’s deployed on AWS, where we have 25 EC2 instances, each running up to 7 Node.js processes.

Problem

At one point we realized the EC2 instances were reaching the maximum memory allowed (12 GB), which caused the web app to be down until the EC2 instances were restarted. The following chart shows the maximum memory used by an EC2 instance (the Y axis is measured in GB):

maximum memory used by an EC2 instance (the Y axis is measured in GB)

As we can see, we had three spikes that went above 12 GB, causing the web app instances to be down until they were restarted. Although we understood that memory leaks were causing more memory consumption than expected, we needed something to keep the hosts from reaching 12 GB and crashing the app.

Action Plan

We decided to define a plan to execute in order to fix the current memory leak and more importantly, prevent this from happening again. We split the plan into four phases:

  1. Mitigate the memory spikes
  2. Solve memory leaks
  3. Improve memory consumption observability
  4. Prevent memory leaks

Phase one was the most important because we had to resolve those spikes to prevent the web application from crashing again. Afterward, we had to resolve the memory leaks to prevent the web app from consuming more memory than it should and avoid unexpected issues.

Subsequently, we had to improve our memory observability to identify which resources in our web app consume more memory than they should, and to detect unexpected increases so we can mitigate them as soon as possible. Finally, we had to add checks to our CI/CD to prevent possible memory leaks from reaching production.

1. Mitigate the memory spikes

We realized the web app instances (the ones that run as Node.js processes) were allowed to use up to 3 GB of memory before the garbage collector was triggered. Since they are hosted on EC2 instances that support up to 12 GB, there were situations where the sum of all the Node.js processes reached 12 GB before the garbage collector got triggered, bringing the EC2 instance down. Because of that, we updated the --max-old-space-size value from 3072 (3 GB) to 1024 (1 GB), which makes the garbage collector start earlier, ensures the processes free unused memory, and consequently prevents the EC2 instances from going down again. In the following chart, we can see how the maximum memory used by the EC2 instances decreased:

max memory size used by EC2 instances

Even better, it also decreased the average memory use:

average memory size used by EC2 instances

2. Solve memory leaks

Once the memory spikes were resolved, it was time to solve the memory leaks. Otherwise, the application could consume more memory than is available, causing it to crash.

Next, we split the solutions we applied into three categories:

  • 2.1 JavaScript memory leaks: well-known memory leaks caused by the wrong use of event listeners, timers, and observers
  • 2.2 VueJs memory leaks: memory leaks caused by the wrong use of lifecycle hooks, plus leaks caused by the framework itself that were fixed in newer versions
  • 2.3 Nuxt memory leaks: memory leaks caused by using resources that aren’t available in SSR, plus leaks caused by the framework itself

2.1 JavaScript memory leaks

Event Listeners

Every time we add an event listener:

window.addEventListener('resize', this.sendEvent)

We need to remember to remove it before we destroy that component, page, or function that is using it:

window.removeEventListener('resize', this.sendEvent)

In fact, there are many cases where we only need to react once to the event we are listening for. In those cases we can avoid the removeEventListener call by passing the object { once: true } as the third parameter to addEventListener:

window.addEventListener('resize', this.sendEvent, {once: true})

We had several cases like this last one in scripts from third-party libraries that listened to the load event to resolve a promise only once, after the library had loaded successfully.
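
To illustrate, here is a minimal, framework-free sketch of the { once: true } behavior using the standard EventTarget API (available in browsers and in Node.js 15+); this is not our production code:

```javascript
// With { once: true } the listener detaches itself after the first call,
// so no matching removeEventListener is needed.
const target = new EventTarget()
let calls = 0

target.addEventListener('load', () => { calls++ }, { once: true })

target.dispatchEvent(new Event('load'))
target.dispatchEvent(new Event('load'))

// calls is 1: the second dispatch found no listener left
```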

setTimeout and setInterval

It’s important to clear timers created with setTimeout (via clearTimeout) and setInterval (via clearInterval) before the component that uses them gets destroyed. Strictly speaking, you don’t need to clear a setTimeout whose callback has already executed; however, if you are not 100% sure the callback runs before the component is destroyed, it’s better to clear it than to leave a memory leak.

export default {
  name: 'MyTooltip',
  props: {
    visibility: {
      type: Number,
      required: false,
      default: 3000,
    },
  },
  data() {
    return {
      visible: false,
    }
  },
  mounted() {
    this.visible = true
    if (this.visibility) {
      this.timeoutId = setTimeout(() => {
        this.visible = false
      }, this.visibility)
    }
  },
  beforeDestroy() {
    clearTimeout(this.timeoutId)
  },
}

The same applies to setInterval:

mounted() {
  // Pass the function reference; invoking it here would be a bug
  this.polling = setInterval(this.executeMyFunc, 30000)
},
beforeDestroy() {
  clearInterval(this.polling)
}

Observers

Every time we use an observer such as IntersectionObserver, ResizeObserver, or MutationObserver we need to disconnect from them:

export default {
  name: 'OnResize',
  props: {
    wrapperRef: {
      default: () => {},
    },
  },
  data() {
    return {
      observer: null,
    }
  },
  methods: {
    onResizeRef() {
      ...
    },
  },
  mounted() {
    const observer = new ResizeObserver(this.onResizeRef)
    observer.observe(this.wrapperRef)
    this.observer = observer
  },
  beforeDestroy() {
    if (this.observer) {
      this.observer.disconnect()
    }
  },
}

Promises

Even promises can leak if they are never resolved or rejected; that’s why it’s important to make sure every promise eventually settles.
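
As a sketch of one way to guarantee settlement, a promise can be raced against a timeout; withTimeout is an illustrative helper under our own naming, not something from the Glovo codebase:

```javascript
// Force a promise to settle within a deadline so it cannot stay pending
// forever and retain its captured scope.
function withTimeout(promise, ms) {
  let timer
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms)
  })
  // Whichever settles first wins; always clear the timer so it does not
  // keep the process (and the closures it captures) alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}
```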

2.2. VueJs Memory leaks

beforeDestroy vs destroyed

One issue we detected was that we were releasing resources (removing event listeners, disconnecting observers) in the destroyed lifecycle hook. This caused errors because by the time destroyed runs, the component instance has already been torn down, which means the HTML elements we were targeting no longer exist. The correct solution is to release those resources in beforeDestroy.

Wrong:

mounted() {
  document.addEventListener('keydown', this.navigateToItem)
},
destroyed() {
  document.removeEventListener('keydown', this.navigateToItem)
}

Correct:

mounted() {
  document.addEventListener('keydown', this.navigateToItem)
},
beforeDestroy() {
  document.removeEventListener('keydown', this.navigateToItem)
}

Vue 2.6 vs 2.7

As we mentioned in the context, we were using VueJs version 2.6.14. After some research, we found there is a memory leak issue when we use v-on, because it leaves detached HTML elements in components that use the v-on directive (please find more information about this issue here).

Thankfully, there is a fix for that issue; however, Vue version 2.6.14 does not include it, and version 2.6.15 hasn’t been released yet. That’s why there were only two solutions available:

  • Wait until version 2.6.15 is released
  • Migrate to version 2.7.10 which already contains the mentioned fix (source).

In our case, we decided to migrate our web app to version 2.7.10 because we don’t know when version 2.6.15 will be released.

2.3 Nuxt Memory leaks

plugin definition

As mentioned in the official documentation, we must make sure we don’t use Vue.use() or Vue.component() to define plugins, and that we don’t make changes to the Vue prototype or the global Vue object inside the function exported by the plugin, because doing so causes a memory leak on the server side.
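
As a hedged sketch of the safe pattern (myHelper is an illustrative name, not a real plugin of ours), a Nuxt plugin should use the inject function it receives instead of touching the global Vue object:

```javascript
// plugins/my-helper.js (illustrative)
//
// Wrong: mutating the global Vue inside the exported function runs once
// per request on the server, so registrations pile up and leak:
//
//   import Vue from 'vue'
//   export default () => {
//     Vue.use(SomePlugin)              // re-registers on every request
//     Vue.prototype.$helper = () => {} // grows the shared prototype
//   }
//
// Correct: use inject(), which scopes the helper to the current request:
const myHelperPlugin = (context, inject) => {
  inject('myHelper', (msg) => `[helper] ${msg}`)
}

// In a real Nuxt plugin file this would be the default export:
// export default myHelperPlugin
```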

client plugins

If we define a plugin as client-side only, then we must make sure it is not used during SSR, because that will cause errors and may cause memory leaks. In case a client plugin is being used on the server, we have the following options to resolve the issue:

  • make the plugin available on both the server and the client
  • if the plugin is used in server lifecycle hooks, move it to client lifecycle hooks, or use process.client as a guard in case it is executed from other places, such as middlewares, that run on both the client and the server side
  • if the plugin is executed in the template, use the client-only tag to prevent the element from being processed on the server side
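
For example, a middleware that calls a browser-only SDK can be guarded with process.client; this is a sketch, and myAnalytics is a hypothetical client-side global, not a real Glovo SDK:

```javascript
// middleware/track-page.js (illustrative)
const trackPageMiddleware = ({ route }) => {
  // process.client is set by Nuxt: true in the browser, false during SSR.
  if (process.client) {
    // Only runs in the browser, where window and the SDK actually exist.
    window.myAnalytics.pageView(route.fullPath)
  }
}

// In a real Nuxt middleware file this would be the default export:
// export default trackPageMiddleware
```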

window

We need to remember that the window interface is only available on the client. If pages or components try to use window on the server side, it will throw an error and may cause memory leaks. That’s why it’s important to remember which lifecycle hooks execute on the server and which on the client side: https://nuxtjs.org/docs/concepts/nuxt-lifecycle/
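
A safe pattern is to only touch window inside hooks that run exclusively on the client, such as mounted and beforeDestroy. The sketch below assumes a hypothetical component that tracks the viewport width; it is illustrative, not code from our landing page:

```javascript
// window is only accessed in mounted and beforeDestroy, which never run
// during server-side rendering, so SSR never sees it.
const viewportWidth = {
  data() {
    return { width: 0 }
  },
  methods: {
    onResize() {
      this.width = window.innerWidth
    },
  },
  mounted() {
    // mounted only runs in the browser, so window is guaranteed to exist
    this.onResize()
    window.addEventListener('resize', this.onResize)
  },
  beforeDestroy() {
    window.removeEventListener('resize', this.onResize)
  },
}
```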

client-only

A memory leak issue was reported in the Nuxt repository, caused by using the client-only tag on asynchronous components: https://github.com/nuxt/nuxt.js/issues/9279

The following code causes a memory leak:

<template>
  <div>
    <client-only>
      <empty />
    </client-only>
  </div>
</template>

<script>
export default {
  components: {
    Empty: () => {
      return import('~/components/empty.vue')
    }
  }
}
</script>

The correct way to import an async component inside the client-only tag and prevent the memory leak is through a flag that delays loading the component until the web app is running on the client:

<template>
  <div>
    <client-only v-if="clientMounted">
      <empty />
    </client-only>
  </div>
</template>

<script>
export default {
  components: {
    Empty: () => {
      if (process.client) {
        return import('~/components/empty.vue')
      }
    }
  },
  data () {
    return {
      clientMounted: false
    }
  },
  mounted () {
    this.clientMounted = true
  }
}
</script>

3. Improve memory use observability

At Glovo we use Datadog to manage the observability of our tech solutions. Before the incidents occurred, we had already defined dashboards to review memory consumption by host and process, as well as monitors that trigger alarms when memory consumption exceeds the defined thresholds. However, we realized that we should add more tools to help us better understand the memory usage of our web app.

Continuous Profiler

The continuous profiler is a Datadog tool that shows how much work each function is doing by collecting data about the program as it runs. In our case, we enabled the Node.js profiler. The tool gives us very detailed information about the functions executed in our web app and the resources they consume. However, after we enabled it we realized that it increased memory consumption, not to the point of breaking the app, but enough to increase the cost of our AWS EC2 hosts. That’s why we decided to leave the tool set up but disabled behind a toggle, in case we face an unexplained increase in memory consumption and need to dig deeper.

Tracing

We added a set of utilities to be able to trace NuxtJS runtime and display the spans in the APM flamegraph. To be more specific, we enabled this feature to trace Vuex Actions and Nuxt Middlewares.

In order to do this, we created a module that uses the dd-trace library:

declare module '@my-custom/tracer' {
  import { Tracer } from 'dd-trace'
  import { ActionTree } from 'vuex'

  declare const tracer: Tracer

  export const tracedNuxtMiddleware: <T extends Function>(handler: T, middlewareName: string) => T

  export const tracedVuexActions: <R, S>(actions: ActionTree<R, S>) => ActionTree<R, S>

  export default tracer
}
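
To give an idea of what such a utility can look like, here is a hedged, dependency-free sketch in the spirit of tracedNuxtMiddleware. It takes the tracer as an argument so it can be shown without a dd-trace install, and relies only on dd-trace's documented tracer.wrap(name, fn) API; the span naming is illustrative, not our exact implementation:

```javascript
// Wrap a Nuxt middleware handler so it executes inside a span named
// after the middleware.
function makeTracedMiddleware(tracer, handler, middlewareName) {
  // tracer.wrap(name, fn) returns a function that runs fn inside a span
  return tracer.wrap(`nuxt.middleware.${middlewareName}`, handler)
}
```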

Then we imported the function tracedNuxtMiddleware in the corresponding Middleware:

import { tracedNuxtMiddleware } from '@my-custom/tracer'

const middlewareHandler = () => {
  // Do whatever
}

export default tracedNuxtMiddleware(middlewareHandler, 'middleware-name')

Or the function tracedVuexActions in the corresponding Vuex action:

import { tracedVuexActions } from '@my-custom/tracer'

export const actions = tracedVuexActions({
  action1() {
    // Do whatever
  },
  action2: {
    root: true,
    handler() {
      // Do whatever
    },
  },
})

Thanks to this, we are now able to detect whether a Vuex action or a middleware is failing on the server and may be causing a memory leak.

4. Prevent memory leaks

To prevent memory leaks from appearing again in production, we defined and tried the following strategies:

Linters

We added the following linters, which must pass in order for a pull request to be merged.

eslint-plugin-clean-timer

This linter prevents adding a setTimeout or setInterval without a clearTimeout or clearInterval, respectively. Link.

eslint-plugin-listeners

This linter prevents adding an addEventListener without a removeEventListener. Link.

eslint-plugin-observers

This linter prevents calling the observe method of observers (IntersectionObserver, ResizeObserver, MutationObserver) without a disconnect or unobserve at the end. Link.
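
Wiring the three plugins up looks roughly like this .eslintrc.js fragment. The rule names below are illustrative placeholders, not taken from the plugins' actual docs; check each plugin's README for the exact rules it exposes:

```javascript
// .eslintrc.js (sketch; rule names are hypothetical placeholders)
module.exports = {
  plugins: ['clean-timer', 'listeners', 'observers'],
  rules: {
    // Illustrative rule names only; see each plugin's README:
    'clean-timer/assign-timer-id': 'error',
    'listeners/no-missing-remove-event-listener': 'error',
    'observers/no-missing-unobserve-or-disconnect': 'error',
  },
}
```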

Memory Leak Tests

We tried to find a tool that, on every pull request, would allow us to:

  • navigate to a page
  • execute a snapshot of the memory allocated (snapshot 1)
  • execute an action that causes navigation to another page
  • then go back to the previous page
  • execute a snapshot of the memory allocated (snapshot 2)

Afterward, we would compare snapshot 1 against snapshot 2; if snapshot 2 is bigger than snapshot 1, it means that resources allocated on the second page were not released and will cause memory leaks.

We tried Memlab using a mock server in order to avoid network variability. This tool had all the features we were looking for:

  • good documentation and easy to understand
  • straightforward to configure and easy to set up in CI
  • scalable
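
The five steps above map onto Memlab's scenario format. The sketch below is illustrative, with a placeholder URL and selector rather than our real ones:

```javascript
// scenarios/landing.js (illustrative Memlab scenario)
const scenario = {
  // Page to load; Memlab takes the baseline heap snapshot here
  url: () => 'http://localhost:3000/',
  // Navigate away from the page under test
  action: async (page) => {
    await page.click('a[href="/categories"]')
  },
  // Return to the original page; objects still retained after this
  // step are reported as potential leaks
  back: async (page) => {
    await page.goBack()
  },
}

module.exports = scenario
```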

After several attempts, we realized that the same test scenario executed many times gives different results, which makes it really difficult to set thresholds without introducing flakiness into the pull request pipeline. We even considered allowing a range of variation between pull requests, for example, a 5% difference from the set threshold. However, because of the wide variance in the results, it would be really difficult to choose a good allowance percentage without causing false positives or introducing flakiness.

Because of that, we decided not to introduce Memlab to check memory leaks in the pipeline, because it would introduce flaky tests, pull requests would be delayed, and, even more importantly, it would introduce confusion and frustration among the engineers.

Teach about Memory Leaks

We believe that memory leaks aren’t introduced on purpose; most of the time they happen because of a lack of knowledge about what causes them. That’s why the best way to prevent them from happening again is to teach your colleagues about them and raise awareness of their effects.

Because of that, at Glovo we have shared articles and good practices through different learning channels, as well as running workshops.

In fact, we believe this article is a good way to share knowledge, and that’s why we have written it.

Conclusion

If your web app is facing memory leak issues, the first thing I would recommend is to understand how the leaks are affecting your web app: it’s not the same to have memory leaks that make your web app crash as to have leaks that merely consume more resources than they should. Even if you have crashes, it’s important to know how often they occur and how long your web app stays down until it recovers. The severity of your memory leaks and their impact on your web app will determine how much time you should invest in researching their causes and in the actions to mitigate, solve, and prevent them.

After you understand your memory leaks, you need to define your action plan and how much time you can dedicate to each of its phases. Subsequently, you should execute that plan.

Thanks for reading to the end. I hope this article has contributed to improving your knowledge about memory leaks.

Acknowledgments

I would like to highlight that solving the memory leaks at Glovo was a collaborative effort to which many engineers contributed.

That’s why I would like to finish the article by thanking Fotis Adamakis, Juan Arocha, Mateusz Chrzonstowski, Łukasz Strączyński, Joel Almeida, Evgenii Zhukov, Ian Roskow, Aliaksei Harakh, Magdalena Dusza, Karolina Jabłońska, and Dominik Kościelak for their contribution and support in solving the memory leaks that we faced in our landing page at Glovo.

Disclaimer: At the time of posting this article on the Glovo Engineering Blog, Mateusz Chrzonstowski, Ian Roskow, Karolina Jabłońska, and Dominik Kościelak are no longer employed at Glovo.
