Nurullah Samed Savaş
Trendyol Tech
Sep 25, 2023

Sharing NLP Models among Gunicorn Workers: Reducing Memory Usage and Boosting Performance

Introduction

As the Trendyol Translation team, we are developing processes to meet the translation needs of different teams within our company.

In some cases, before the translation process, we need to perform various text analyses on the source text. One of these analyses is segmentation: we split long source texts into sentences so that each sentence can be translated independently.
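Conceptually, the flow looks like the sketch below, where 'segment' and 'translate' are hypothetical stand-ins for our real components:

# Toy sketch of sentence-by-sentence translation; 'segment' and
# 'translate' are placeholder callables, not our production code.
def translate_text(text, segment, translate):
    sentences = segment(text)  # split the source text into sentences
    return " ".join(translate(s) for s in sentences)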

To meet this need, we have a Flask application written in Python. Like many Python applications, it relies on natural language processing (NLP) models to perform text analysis. However, NLP models can consume a significant amount of memory, and that excessive memory usage can hurt the application's performance.

Our Problem: Excessive Memory Usage Without Preloading

In our application, we need to load a specialized NLP model into memory, as shown below, in order to perform segmentation using the spaCy library.

# main.py
import spacy
from flask import Flask

app = Flask(__name__)

# Load the spaCy model at import time
nlp = spacy.load("en_core_web_sm")


@app.route("/")
def segmentation():
    # Use the model for text processing
    text = "Hello, this is a spaCy example."
    doc = nlp(text)
    # Return the detected sentences as JSON
    return {"sentences": [sent.text for sent in doc.sents]}

At first, our application used a single Gunicorn worker, and we were scaling it only with Kubernetes replicas. However, this meant a separate resource cost for each replica. To get more out of each replica, we decided to increase the number of workers, which Gunicorn starts according to the count we specify in its configuration.

# gunicorn_config.py
workers = 4 # worker count

This way, we allow each replica to handle a higher number of concurrent connections. However, when we increased the number of workers, we noticed a significant increase in the application’s memory usage.

The reason was that the 'spacy.load' call in the code above runs once in each worker, so the NLP model was loaded into memory separately by every worker.

This problem becomes particularly pronounced when dealing with NLP models, which can consume a significant amount of memory. The redundant loading of these models for each worker not only wastes resources but also impacts the application’s scalability and responsiveness.
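An easy way to see this redundant loading is to log the process ID right next to the model load. The line below is ours, added purely for illustration:

# main.py (excerpt, diagnostic only)
import os
import spacy

# Without preloading, this prints once per worker, each with a
# different PID; with preloading it prints only once, in the master.
print(f"Loading spaCy model in PID {os.getpid()}")
nlp = spacy.load("en_core_web_sm")

With 4 workers and no preloading, the log shows four different PIDs, one model load per worker.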

Solution: Preloading the Language Model

The solution is a simple configuration change. If we set the 'preload_app' configuration to 'True' as shown below, the application code is imported into the master process before the worker processes are forked. The workers are then created as copies of this master Gunicorn process, and the operating system's copy-on-write mechanism lets them share the memory pages holding the model for as long as those pages are not modified.
This means that the 'nlp' variable in the example code above is loaded only once and shared by all the workers.

# gunicorn_config.py
workers = 4 # worker count
preload_app = True

Preloading ensures that the model is available for all workers. This approach is particularly beneficial for applications using NLP models, improving memory efficiency and response times.
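As a sanity check, Gunicorn's post_fork server hook (part of Gunicorn's standard configuration API; the log message itself is our own) can confirm that workers are forked only after the application, and therefore the model, is already loaded:

# gunicorn_config.py
workers = 4
preload_app = True

def post_fork(server, worker):
    # Runs in each worker right after the fork; by this point the
    # preloaded app (and the spaCy model) already exists in the
    # master's memory and is shared via copy-on-write.
    server.log.info("Worker %s forked from preloaded master", worker.pid)

Starting the service is unchanged, e.g. 'gunicorn --config gunicorn_config.py main:app'. One caveat: since the application is now imported once in the master, workers will not pick up code changes without a full restart.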

Result

Before this parameter change, our application running with 4 workers used approximately 1.9GB of memory; after the change, with the same number of workers, usage dropped to 572MB.

[Figure: memory usage with 4 workers before the parameter change]
[Figure: memory usage with 4 workers after the parameter change]

This solution makes your application more scalable and resource-efficient, especially in cases involving substantial NLP models.

If you are also using Gunicorn for resource-intensive services like ours, this single parameter can bring significant cost efficiency. The more memory your application uses, the greater the benefit you will gain from this change.

If you want to be part of a team that tries new technologies and want to experience a new challenge every day, join us.
