Web Performance Optimizations
A lot of programmers are familiar with the famous quote by Donald Knuth:
“Premature optimization is the root of all evil.” — Donald Knuth
However, as is often the case with famous quotes, the meaning shifts slightly once you read it in full context.
“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” — Donald Knuth
The real question is how to detect in advance which optimizations will be critical for a specific program. That depends on countless circumstances and conditions, and the answer can be almost anything.
Still, we can try to distinguish the most common pitfalls, at least for web applications. There are already plenty of fine sources with recommendations for frontend performance (e.g. this checklist), including such advice as minifying HTML, CSS, and JS, lazy loading, image optimization, non-blocking JS calls, etc., which are easy to implement and practically guarantee a performance boost. In this article, I’d like to put together a similar checklist, but for backend performance. The backend is trickier in terms of general recommendations, so each piece of advice should be carefully checked in the context of your specific application before you blindly apply all of these practices.
I’ll group the advice into four categories: Data, CPU & IO, Metrics, and Scaling. Entire books have been written solely on database performance tuning, so in this article we’ll focus on how the app interacts with its data, which we’ll assume is already stored in a perfectly tuned database.
Data
Store the required data in memory
IO operations increase latency, so keeping required and/or commonly used data in memory boosts performance. To minimize the cost of crashes, use persistent in-memory storage so the data can be at least partially restored after a process restart.
But keep in mind that cache invalidation can be expensive, so use caching wisely.
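As a minimal sketch of the idea in Python (the profile loader is a made-up stand-in for an expensive IO call), a process-local cache can serve hot reads from memory while keeping invalidation explicit:

```python
import time
from functools import lru_cache

def load_profile_from_db(user_id: int) -> dict:
    # Placeholder for an expensive IO call (DB query, RPC, etc.).
    time.sleep(0.05)
    return {"id": user_id, "name": f"user-{user_id}"}

@lru_cache(maxsize=10_000)  # keep hot profiles in process memory
def get_user_profile(user_id: int) -> dict:
    return load_profile_from_db(user_id)

get_user_profile(42)  # first call pays the IO cost
get_user_profile(42)  # repeated calls are served from memory

# Invalidation is explicit and, as noted above, has its own cost:
get_user_profile.cache_clear()
```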
Construct a colocated data model
Ideally, the related data should be located on the same host (and be stored in memory). All the data necessary to service a specific request should be available locally without extra lookups.
Use data types which can be stored sequentially
With heavy cache usage, memory access itself can become a bottleneck. Prefer flat, contiguous arrays over linked lists when possible: sequential layouts play much better with CPU caches and prefetching.
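A small illustration in Python, just to contrast the two layouts (the linked-list class here is only for comparison, not something you would normally write):

```python
from array import array

# A contiguous block of doubles: cache-friendly, sequential access.
prices = array("d", (float(i) for i in range(1_000_000)))
total = sum(prices)

# A linked list, by contrast, scatters nodes across the heap and
# chases a pointer per element, which defeats CPU cache prefetching.
class Node:
    __slots__ = ("value", "next")
    def __init__(self, value, next=None):
        self.value = value
        self.next = next
```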
Fewer DB requests
When you do have to retrieve data from the database, it’s better to do it in a single query than in several.
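For example, a classic N+1 pattern can often be collapsed into one round trip. A small self-contained sketch with SQLite (the schema and data are invented purely for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 9.99), (2, 1, 5.00), (3, 2, 12.50);
""")

user_ids = [1, 2]

# N+1 pattern: one query per user -- avoid this.
# for uid in user_ids:
#     conn.execute("SELECT * FROM orders WHERE user_id = ?", (uid,))

# Single round trip instead:
placeholders = ",".join("?" * len(user_ids))
rows = conn.execute(
    f"SELECT user_id, id, total FROM orders WHERE user_id IN ({placeholders})",
    user_ids,
).fetchall()
```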
Fewer SQL JOINs
Multiple JOINs can dramatically degrade query performance. When a query requires three or more JOINs, that’s a good reason to think about denormalizing the database tables.
Don’t store more data than you really need
Unnecessary data slows things down. That doesn’t mean blindly truncating everything; just be smarter about deciding what gets stored and where. For instance, session data can be kept in cookies instead of a database table. By the way, cookies shouldn’t be too heavy either.
RAM
Have as much RAM as you need (within reasonable limits of hardware, OS, and funds).
CPU & IO
The number of threads should be close to the number of cores
In a perfect case, threads should be running in parallel, each pinned to its own core, with a minimum amount of context switches.
Batch writes
Instead of doing individual writes, group your data and write it in batches wherever possible.
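A minimal sketch with SQLite (the events table is invented for illustration): one batched statement inside a single transaction instead of thousands of individual inserts, each with its own round trip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts REAL, payload TEXT)")

events = [(1700000000.0 + i, f"event-{i}") for i in range(10_000)]

# One batched write inside a single transaction.
with conn:
    conn.executemany("INSERT INTO events VALUES (?, ?)", events)
```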
Use async and non-blocking operations
Locks are overhead: every time a lock is taken, the app has to go down the OS stack. Prefer async operations wherever it makes sense and doesn’t complicate the codebase too much.
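A tiny sketch of the non-blocking style in Python, assuming the slow calls are IO-bound (the fetch function here just simulates IO with a sleep):

```python
import asyncio

async def fetch(name: str) -> str:
    # Stand-in for a non-blocking IO call (HTTP request, DB query, ...).
    await asyncio.sleep(0.1)
    return f"{name}: ok"

async def main() -> None:
    # The three calls overlap instead of running one after another.
    results = await asyncio.gather(fetch("users"), fetch("orders"), fetch("prices"))
    print(results)

asyncio.run(main())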
Parallelize operations
Run independent operations in parallel to reduce the overall processing time (see the thread pool sketch under the next tip).
Use thread pools with a fixed number of workers
This implies a queue to pull work from, which substantially increases throughput. A thread-per-connection model usually leaves you with more threads than cores, which means your system is trying to do too much at once.
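A rough sketch of a fixed pool sized to the core count (the handle function is a made-up unit of work); the executor’s internal queue feeds idle workers instead of spawning a thread per request:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def handle(job_id: int) -> str:
    # Stand-in for a unit of work pulled from the queue.
    return f"job {job_id} done"

# A fixed pool close to the number of cores.
workers = os.cpu_count() or 4
with ThreadPoolExecutor(max_workers=workers) as pool:
    results = list(pool.map(handle, range(100)))
```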
Compress the data on a disk
IO operations are usually time-consuming. Compressing data costs some CPU cycles but effectively increases IO throughput.
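For instance, writing through a gzip wrapper trades a few CPU cycles for far fewer bytes hitting the disk (the records here are invented sample data):

```python
import gzip
import json

records = [{"id": i, "payload": "x" * 100} for i in range(10_000)]

# Write compressed data to disk...
with gzip.open("records.json.gz", "wt", encoding="utf-8") as f:
    json.dump(records, f)

# ...and read it back transparently.
with gzip.open("records.json.gz", "rt", encoding="utf-8") as f:
    restored = json.load(f)
```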
Compress the data being sent over the network
It decreases transfer time and increases throughput. The CPU time cost of the compression and decompression is usually trivial. The overall efficiency of a system using compressed network transmissions is almost always higher than sending data uncompressed.
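A hedged sketch of the sending side (the endpoint is hypothetical, and the server would need to understand Content-Encoding: gzip):

```python
import gzip
import json
import urllib.request

payload = json.dumps({"items": list(range(10_000))}).encode("utf-8")
body = gzip.compress(payload)  # typically a fraction of the original size

req = urllib.request.Request(
    "https://api.example.com/bulk",  # hypothetical endpoint
    data=body,
    headers={"Content-Type": "application/json", "Content-Encoding": "gzip"},
)
# urllib.request.urlopen(req)  # uncomment to actually send it
```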
Keepalive connections
Minimizes the cost of opening and closing connections. Especially valuable when requests are frequent.
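With the popular requests library, for example, a Session reuses the underlying TCP (and TLS) connection across calls via HTTP keep-alive (the endpoint below is made up):

```python
import requests

session = requests.Session()  # reuses connections instead of reconnecting

for page in range(1, 4):
    resp = session.get(
        "https://api.example.com/items",  # hypothetical endpoint
        params={"page": page},
        timeout=5,
    )
    resp.raise_for_status()
```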
Data streaming
You can save CPU and memory by streaming data to external services directly instead of assembling a full file in advance (e.g. first writing the data to disk, then reading it back and posting it to the network).
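One way to sketch this with the requests library, which can send a generator as a chunked upload (the endpoint is hypothetical and the call is left commented out):

```python
import requests

def row_stream():
    # Produce (or read) data chunk by chunk instead of building
    # the whole payload in memory or on disk first.
    for i in range(1_000_000):
        yield f"{i},value-{i}\n".encode("utf-8")

# requests.post("https://api.example.com/import", data=row_stream(), timeout=30)
```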
Operating systems limitations
On Linux, once a directory holds more than roughly 1,000 files, performance starts to degrade. You can split them up and store the files in nested directories. Popular databases (MySQL, PostgreSQL, etc.) store tables in files, so having too many tables can hurt performance at the OS level as well.
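A common way to split files up is to shard by a hash prefix; a minimal sketch (the directory layout is just one possible convention):

```python
import hashlib
from pathlib import Path

def sharded_path(root: Path, filename: str) -> Path:
    # Spread files across nested directories keyed by a hash prefix,
    # e.g. root/ab/cd/<filename>, so no single directory grows too large.
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    return root / digest[:2] / digest[2:4] / filename

path = sharded_path(Path("uploads"), "avatar-12345.png")
path.parent.mkdir(parents=True, exist_ok=True)
path.write_bytes(b"...image bytes...")
```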
Metrics
Little’s law
Using this law we can estimate how many app instances we need to handle the application load. A theorem by John Little states that:
L = λ * W
In the context of a web application, L is the number of app instances (e.g. threads/processes with an app copy), λ is the average requests rate (e.g. 10 requests per second), W is the average response time (e.g. 0.3 seconds).
10 * 0.3 = 3 (3 is the number of app instances we need)
It’s not the most accurate prediction, but it’s an easy way to roughly estimate the number of app instances required for a given load. Using the same formula we can calculate the theoretical maximum throughput. It also makes it clear that page load time matters a great deal when it comes to surviving traffic peaks.
Keep in mind that in real life these are not isolated units: as the load on the database, cache, and network grows, the numbers won’t change strictly proportionally. In other words, if the DB is the bottleneck, adding more app instances won’t increase throughput.
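Put as a tiny helper, using the numbers from the example above, the formula also shows how faster responses directly cut the capacity needed during a spike:

```python
def required_instances(arrival_rate: float, avg_response_time: float) -> float:
    """Little's law: L = lambda * W."""
    return arrival_rate * avg_response_time

print(required_instances(10, 0.3))    # 3.0 instances, as in the example above
print(required_instances(100, 0.3))   # a 10x traffic spike: 30 instances
print(required_instances(100, 0.15))  # halving response time halves that need
```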
Know your requirements
You should know in advance the expected traffic on your web site and plan the app infrastructure accordingly.
Use monitoring tools
Automatically detect spikes in CPU and RAM usage. Over an application’s lifespan, pages tend to grow in size and their load times tend to creep up. It’s good practice to measure this, set boundary values (e.g. a max page load time of 200ms and a max page size of 100KB, excluding data served from a CDN), and take action when things start to get out of control.
Automated testing is essential
Use integration and unit tests for daily agile development. Use performance testing to ensure your application will perform well under the expected workload.
Aggregate your logs
Aggregate and store logs in a central location for easy access. Keeping the logs explicit but not extremely verbose is a separate art form.
Timeouts
Put timeouts on all out-of-process calls and pick a default timeout for everything.
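A sketch of what that looks like with the requests library (the endpoint and the 2-second default are assumptions for illustration): without a timeout, a stuck dependency can hold a worker forever.

```python
import requests

DEFAULT_TIMEOUT = 2.0  # seconds; an assumed project-wide default

try:
    resp = requests.get(
        "https://api.example.com/health",  # hypothetical downstream service
        timeout=DEFAULT_TIMEOUT,
    )
    resp.raise_for_status()
except requests.Timeout:
    # Fail fast and fall back or retry instead of hanging.
    resp = None
```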
Scaling
Use CDNs
A CDN guarantees faster load times for users and can be scaled quickly in case of traffic spikes.
Prefer eventual consistency if possible
Eventually consistent data storage systems use asynchronous processes to update remote replicas. If BASE (Basically Available Soft-State Eventually Consistent) is sufficient for your data, then you can easily achieve scalability and availability from a distributed data storage system.
Autoscaling
Set up auto-scaling based on the data from your monitoring tools. DDoS attacks may trigger unnecessary scale-ups, so set smart rules that check whether a scale-up is really needed.
Service discovery
A central server (or a cluster of servers) maintains a global view of service addresses, and clients connect to it to register and retrieve addresses. It’s a must-have for a dynamic system with autoscaling in place.
Containers
Containers make it easier to manage deployments and service discovery.
Decentralize services
Embrace self-service wherever possible, keeping services independently deployable and backward compatible. Prefer choreography over orchestration, with smart endpoints, so that associated logic and data stay cohesive within service boundaries.
Instead of a conclusion
Resources
Donald Knuth, “Computer Programming as an Art”
https://github.com/futurice/backend-best-practices
http://khaidoan.wikidot.com/performance-tuning-backend
https://github.com/binhnguyennus/awesome-scalability
Piotr Murach talk “It is correct, but is it fast?”