Scaling a Digital Library Server for University-Wide Deployment: An Evaluation of Performance and Scalability

Jakub Dubec
Networks @ FIIT STU
7 min read · Mar 31, 2024

Introduction

This blog post details my first experience with scaling a digital library used by our faculty’s students, with the aim of a broader deployment across the entire university. It’s also the first time I’ve tackled a deployment of this scale without sufficient traffic data. The purpose of this post is to share my methodology, based on experiments conducted with a tool named Locust, for making an informed estimate of the resources needed to handle an increase in traffic.

The goal is to create a deployment diagram and calculate the necessary physical resources — CPU, RAM, and storage. These calculations are for the base hardware requirements, on which some type of hypervisor will subsequently run (probably Kernel Virtual Machine — KVM).

Background

At the Faculty of Informatics and Information Technologies, our students and staff have collaborated to develop an academic digital library named ELVIRA. The architecture of the project is shown in the component diagram below.

Component diagram of the ELVIRA application

The diagram consists of the following components:

  • EvilFlowers OPDS: A Python-based application that functions as a document server. It is noteworthy that this component may consume substantial CPU resources, as it is sometimes tasked with generating PDF documents in real time. The project is open source on GitHub as EvilFlowersCatalog; the name is a tribute to the one and only Charles Baudelaire (books, haha).
  • Elvira Portal: A Single Page Application (SPA) that interfaces with the document server.
  • LDAP: Employed as the authentication backend for EvilFlowers OPDS.
  • Redis: Utilized by EvilFlowers OPDS as a cache backend and for managing asynchronous job queues.
  • PostgreSQL: The relational database that serves as the storage backend.
  • Storage: Represents the persistent storage solution for the system.
  • nginx: Facilitates server-side caching, serves static files for the SPA, and acts as an HTTP reverse proxy for EvilFlowers OPDS.

The application, as depicted in the diagram below, is currently deployed on a single virtual machine equipped with 4 vCPUs and 8 GB of RAM.

Deployment diagram: ELVIRA deployment on elvira.fiit.stuba.sk

Methodology

Tools and Setup

We utilized a Python-based tool named Locust to simulate user traffic on our test environments. A key advantage of the Locust ecosystem is that user scenarios can be generated from a HAR file: we simply opened a web browser, performed the basic user interactions, and exported the resulting requests to the document server from the Network Inspector. Such HAR files can be converted into Locust tests with the assistance of the har2locust utility. Although it is beneficial to refine the generated Python script afterwards, the whole process was relatively quick (including authentication). A demonstration of such scenario creation is available in the video below.

To enhance the accuracy of our measurements, we provisioned an additional VM with comparable hardware specifications outside of the university’s infrastructure, specifically on Azure. This virtual machine was configured with an identical database to the development server.
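
For context, a comparable VM (4 vCPUs, 8 GB RAM) can be provisioned with the Azure CLI roughly as follows. The resource group, VM name, image alias, and size below are illustrative assumptions, not the exact values we used:

# Hypothetical example: provision a 4 vCPU / 8 GB VM on Azure
az vm create \
  --resource-group elvira-benchmarks \
  --name elvira-loadtest \
  --image Ubuntu2204 \
  --size Standard_F4s_v2 \
  --admin-username elvira \
  --generate-ssh-keys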

Experiment Design

We conducted various experiments on both instances to understand the server’s capabilities:

  • Capacity Testing: We gradually increased the number of users over time to determine the maximum number of active users the server can handle. We deemed the server capable if it responded to user requests in less than one second (see the sketch after this list).
  • Peak Traffic Performance: We tested the server’s long-term operation at the user limit identified in the previous test to see if response times remained consistent.
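
As a side note, the one-second criterion does not have to be read off the charts by hand. Below is a minimal sketch of how it could be tracked automatically with Locust’s request event hook; we relied on the charts ourselves, so treat this as an optional convenience (the counter is per-process):

# Optional addition to locustfile.py: count responses slower than 1 s
from locust import events

SLOW_THRESHOLD_MS = 1000  # our "server is coping" criterion
slow_requests = 0

@events.request.add_listener
def track_slow_requests(request_type, name, response_time, **kwargs):
    # Locust reports response_time in milliseconds
    global slow_requests
    if response_time > SLOW_THRESHOLD_MS:
        slow_requests += 1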

Below is an example of how to set up a sample testing project using experiment.har:

# Creating Python virtual environment
python -m venv .venv

# Activating the environment (Linux / macOS)
source .venv/bin/activate

# Install dependencies
pip install locust har2locust

# Create a locustfile scenario from experiment.har
har2locust experiment.har > locustfile.py

The conversion of your HAR recording into a Python script, which includes HTTP requests to your server, is automated. However, due to its machine-generated nature, some manual refactoring is advisable; a sketch of the cleaned-up result follows the list:

  • Removed all HTTP OPTIONS calls.
  • Removed headers that were not needed (Accept, Referer, TE, and CORS- and client-cache-related headers).
  • Implemented authorization.
  • Tweaked request payloads where needed to ensure they were valid JSON.
  • Adjusted the wait time between user requests to range from 1 s to 10 s.
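
To make this concrete, here is a minimal sketch of what a locustfile might look like after such a cleanup. The endpoint paths and the token-based login are illustrative assumptions, not the actual ELVIRA API:

# Simplified, hypothetical locustfile after manual refactoring
from locust import HttpUser, task, between

class ElviraUser(HttpUser):
    wait_time = between(1, 10)  # think-time between user requests (1-10 s)

    def on_start(self):
        # Illustrative token-based authorization; the real endpoint differs
        response = self.client.post("/api/v1/token", json={
            "username": "student",
            "password": "secret",
        })
        self.client.headers["Authorization"] = f"Bearer {response.json()['token']}"

    @task
    def browse_catalog(self):
        # Hypothetical catalog endpoint recorded from the HAR file
        self.client.get("/api/v1/entries")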

Once we had updated locustfile.py, we were all set to run our first capacity test. First, we started the Locust web interface (inside the activated virtual environment):

python -m locust -f locustfile.py

Here are the parameters we used for capacity testing:

  • We set the maximum number of parallel users to 50.
  • A new user was added every 30 seconds to gradually build up the load on the server.
  • The experiment duration was set to 40 minutes (the equivalent command-line invocation is shown after the figure below).

Capacity test configuration for Azure VM instance
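
The same configuration can also be reproduced headlessly from the command line, which is convenient for repeated runs. The host below is a placeholder; a new user every 30 seconds corresponds to a spawn rate of roughly 0.033 users per second:

# Capacity test: ramp up to 50 users, ~1 new user every 30 s, run for 40 minutes
python -m locust -f locustfile.py --headless -u 50 -r 0.033 -t 40m --host https://elvira.example.sk --csv capacity

# Peak traffic test (described below): hold 25 users for one hour
python -m locust -f locustfile.py --headless -u 25 -r 1 -t 1h --host https://elvira.example.sk --csv peak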

Results

Capacity testing

The capacity test yielded informative visual data through the charts generated by Locust. From these results, it was determined that:

The FIIT STU server can serve up to 25 active users at a time.

Capacity test results for FIIT STU instance

The Azure VM can support up to 30 active users simultaneously.

Capacity test results for Azure VM instance

Peak Traffic Performance

After identifying the user limits for the existing infrastructure, we aimed to determine if the server could sustain such traffic over an extended period. To this end, we conducted a test with 25 active users for one hour.

As you can see, the results were worse than expected. The average response time for the FIIT STU instance rose by more than 0.65 seconds under sustained load, although the drift between the beginning and the end of the experiment itself was only 0.1 seconds.

Peak Traffic Performance for FIIT STU instance with 25 active users.

The Azure instance performed slightly better, with the average response time increasing by 0.45 seconds and a similar difference between the start and end of the experiment as the FIIT STU instance (0.1 seconds).

To investigate the long-term impact of this variation in response times, we conducted another experiment by reducing the number of active users to 20:

  • By the end of the experiment, the FIIT STU instance had an average response time of 1.3 seconds, with a minor change of 0.07 seconds from start to finish.
  • The Azure instance showed an average response time of 1.2 seconds at the end of the experiment, where the change between the start and the end was negligible.

Discussion

The results showed that our instances struggled with handling even a moderate number of users simultaneously. This outcome points to a need for a better scaling strategy, especially as we prepare for wider university use.

The Azure instance served as a benchmark that helped us factor out potential confounders at the university, such as background load or outdated hardware. Essentially, it showed us the application’s performance without the constraints of our current infrastructure.

Our main goals were to determine the application’s capacity limits and to identify which resources it primarily consumes. Interestingly, we found that the application is more CPU-intensive than expected, with less demand on RAM (these measurements are not covered in this blog post; we plan to describe the process in the next one). This insight is crucial for planning our next steps.

Scaling strategy

We aim for our application to support around 550 active users. Based on the outcomes of our experiments, we’ve decided to deploy the document server behind a load balancer as part of an application cluster. Each node in this cluster will be equipped with 32 vCPUs and 16 GB of RAM.

From our capacity testing, we determined that handling a single user requires, on average, 0.1467 CPU cores. Therefore, to accommodate approximately 550 users, we calculated a need for around 80 CPU cores. Our planned cluster will have a total of 96 CPU cores available, which should be more than adequate, even when considering the peak traffic performance results. This setup ensures we have a buffer to maintain performance during high demand, aligning with our scaling strategy to provide a reliable service to the wider university community.
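
For transparency, the 0.1467 figure is consistent with averaging the two capacity-test ratios (4 vCPUs / 25 users = 0.16 and 4 vCPUs / 30 users ≈ 0.133). A quick sanity check of the whole calculation in Python:

# Back-of-the-envelope sizing derived from the capacity tests
import math

vcpus = 4
cores_per_user = (vcpus / 25 + vcpus / 30) / 2   # ≈ 0.1467 cores per user
cores_needed = cores_per_user * 550              # ≈ 80.7 cores for 550 users
nodes = math.ceil(cores_needed / 32)             # 3 nodes of 32 vCPUs = 96 cores
print(f"{cores_per_user:.4f} cores/user, {cores_needed:.1f} cores, {nodes} nodes")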

Production application cluster deployment

Conclusion

Admittedly, the final infrastructure setup will demand significantly more resources than our initial calculations suggest. Our experiments didn’t account for several critical components, such as the database server, the load balancer’s requirements, and management services like the Elastic Stack for monitoring and indexing. Integrating these elements will constitute the next phase of our infrastructure planning. For now, we’ve outlined a proposed deployment diagram, laying a foundational blueprint for further development.

University deployment diagram for digital library

In the coming months, we’ll have access to our first batch of hardware, equipped with 64 CPU cores, 128 GB of RAM, and ample storage capacity (including a dedicated storage RAID). This setup will initially support the digital library’s deployment across the first group of faculties. Subsequently, we will conduct another round of experiments to verify the accuracy of our predictions and ensure that our scaling strategy meets the real-world demands of our university community.

If you have any notes or questions, feel free to discuss them in the comments or reach out to us. See you in the next post about the experiments on our first infrastructure setup!
