Web Performance Optimizations
A lot of programmers are familiar with the famous quote by Donald Knuth:
“Premature optimization is the root of all evil.” — Donald Knuth
However, as is often the case with famous quotes, the meaning shifts slightly once you read it in full context.
“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” — Donald Knuth
The real question is how to detect in advance which optimizations will be critical for a specific program. That depends on countless circumstances and conditions, and the answer can be almost anything.
Still, we can try to distinguish the most common pitfalls, at least for web applications. There are already plenty of fine sources with recommendations for frontend performance (e.g. this checklist), including such advice as minifying HTML, CSS, and JS, lazy loading, image optimization, non-blocking JS calls, etc., which are easy to implement and practically guarantee a performance boost. In this article, I’d like to put together a similar checklist, but for backend performance. The backend is trickier in terms of general recommendations, so each piece of advice should be carefully checked in the context of your specific application before you blindly apply all of these practices.
I’ll group the advice into four categories: Data, CPU & IO, Metrics, and Scaling. Entire books have been written solely on database performance tuning, so in this article we’ll focus on how the app interacts with its data, which we’ll assume is already stored in a perfectly tuned database.
Data
Store the required data in memory
IO operations increase latency, so keeping required and/or commonly used data in memory boosts performance. To minimize the cost of crashes, use persistent in-memory storage so the data can be at least partially restored after a process restart.
But keep in mind that cache invalidation can be expensive, so use caching wisely.
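As a minimal sketch of the idea in Python (the profile loader is a made-up stand-in for an expensive IO call), a process-local cache can serve hot reads from memory while keeping invalidation explicit:

```python
import time
from functools import lru_cache

def load_profile_from_db(user_id: int) -> dict:
    # Placeholder for an expensive IO call (DB query, RPC, etc.).
    time.sleep(0.05)
    return {"id": user_id, "name": f"user-{user_id}"}

@lru_cache(maxsize=10_000)  # keep hot profiles in process memory
def get_user_profile(user_id: int) -> dict:
    return load_profile_from_db(user_id)

get_user_profile(42)  # first call pays the IO cost
get_user_profile(42)  # repeated calls are served from memory

# Invalidation is explicit and, as noted above, has its own cost:
get_user_profile.cache_clear()
```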
Construct a colocated data model
Ideally, the related data should be located on the same host (and be stored in memory). All the data necessary to service a specific request should be available locally without extra lookups.
Use data types which can be stored sequentially
With heavy cache usage, memory access itself can become a bottleneck. Prefer flat, contiguous arrays over linked lists when possible: sequential layouts play much better with CPU caches and prefetching.
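A small illustration in Python, just to contrast the two layouts (the linked-list class here is only for comparison, not something you would normally write):

```python
from array import array

# A contiguous block of doubles: cache-friendly, sequential access.
prices = array("d", (float(i) for i in range(1_000_000)))
total = sum(prices)

# A linked list, by contrast, scatters nodes across the heap and
# chases a pointer per element, which defeats CPU cache prefetching.
class Node:
    __slots__ = ("value", "next")
    def __init__(self, value, next=None):
        self.value = value
        self.next = next
```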
Fewer DB requests
When you do have to retrieve data from the database, it’s better to do it in a single query than in several.
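For example, a classic N+1 pattern can often be collapsed into one round trip. A small self-contained sketch with SQLite (the schema and data are invented purely for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 9.99), (2, 1, 5.00), (3, 2, 12.50);
""")

user_ids = [1, 2]

# N+1 pattern: one query per user -- avoid this.
# for uid in user_ids:
#     conn.execute("SELECT * FROM orders WHERE user_id = ?", (uid,))

# Single round trip instead:
placeholders = ",".join("?" * len(user_ids))
rows = conn.execute(
    f"SELECT user_id, id, total FROM orders WHERE user_id IN ({placeholders})",
    user_ids,
).fetchall()
```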
Fewer SQL JOINs
Multiple JOINs can dramatically degrade query performance. When a query requires three or more JOINs, that’s a good reason to think about denormalizing the database tables.
Don’t store more data than you really need
Unnecessary data slows things down. That doesn’t mean blindly truncating everything; just be smarter about deciding what gets stored and where. For instance, session data can be kept in cookies instead of a database table. By the way, cookies shouldn’t be too heavy either.
RAM
Have as much RAM as you need (within reasonable limits of hardware, OS, and funds).
CPU & IO
The number of threads should be close to the number of cores
In a perfect case, threads should be running in parallel, each pinned to its own core, with a minimum amount of context switches.
Batch writes
Instead of doing individual writes, group your data and write it in batches wherever possible.
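A minimal sketch with SQLite (the events table is invented for illustration): one batched statement inside a single transaction instead of thousands of individual inserts, each with its own round trip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts REAL, payload TEXT)")

events = [(1700000000.0 + i, f"event-{i}") for i in range(10_000)]

# One batched write inside a single transaction.
with conn:
    conn.executemany("INSERT INTO events VALUES (?, ?)", events)
```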
Use async and non-blocking operations
Locks are overhead: every time a lock is taken, the app has to go down the OS stack. Prefer async operations wherever it makes sense and doesn’t complicate the codebase too much.
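A tiny sketch of the non-blocking style in Python, assuming the slow calls are IO-bound (the fetch function here just simulates IO with a sleep):

```python
import asyncio

async def fetch(name: str) -> str:
    # Stand-in for a non-blocking IO call (HTTP request, DB query, ...).
    await asyncio.sleep(0.1)
    return f"{name}: ok"

async def main() -> None:
    # The three calls overlap instead of running one after another.
    results = await asyncio.gather(fetch("users"), fetch("orders"), fetch("prices"))
    print(results)

asyncio.run(main())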
Parallelize operations
Run independent operations in parallel to reduce the overall processing time (see the thread pool sketch under the next tip).
Use thread pools with a fixed number of workers
This implies a queue to pull work from, which substantially increases throughput. A thread-per-connection model usually leaves you with more threads than cores, which means your system is trying to do too much at once.
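A rough sketch of a fixed pool sized to the core count (the handle function is a made-up unit of work); the executor’s internal queue feeds idle workers instead of spawning a thread per request:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def handle(job_id: int) -> str:
    # Stand-in for a unit of work pulled from the queue.
    return f"job {job_id} done"

# A fixed pool close to the number of cores.
workers = os.cpu_count() or 4
with ThreadPoolExecutor(max_workers=workers) as pool:
    results = list(pool.map(handle, range(100)))
```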
Compress the data on a disk
IO operations are usually time-consuming. Compressing data costs some CPU cycles but effectively increases IO throughput.
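For instance, writing through a gzip wrapper trades a few CPU cycles for far fewer bytes hitting the disk (the records here are invented sample data):

```python
import gzip
import json

records = [{"id": i, "payload": "x" * 100} for i in range(10_000)]

# Write compressed data to disk...
with gzip.open("records.json.gz", "wt", encoding="utf-8") as f:
    json.dump(records, f)

# ...and read it back transparently.
with gzip.open("records.json.gz", "rt", encoding="utf-8") as f:
    restored = json.load(f)
```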
Compress the data being sent over the network
It decreases transfer time and increases throughput. The CPU time cost of the compression and decompression is usually trivial. The overall efficiency of a system using compressed network transmissions is almost always higher than sending data uncompressed.
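A hedged sketch of the sending side (the endpoint is hypothetical, and the server would need to understand Content-Encoding: gzip):

```python
import gzip
import json
import urllib.request

payload = json.dumps({"items": list(range(10_000))}).encode("utf-8")
body = gzip.compress(payload)  # typically a fraction of the original size

req = urllib.request.Request(
    "https://api.example.com/bulk",  # hypothetical endpoint
    data=body,
    headers={"Content-Type": "application/json", "Content-Encoding": "gzip"},
)
# urllib.request.urlopen(req)  # uncomment to actually send it
```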
Keepalive connections
Minimizes the cost of opening and closing connections. Especially valuable when requests are frequent.
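With the popular requests library, for example, a Session reuses the underlying TCP (and TLS) connection across calls via HTTP keep-alive (the endpoint below is made up):

```python
import requests

session = requests.Session()  # reuses connections instead of reconnecting

for page in range(1, 4):
    resp = session.get(
        "https://api.example.com/items",  # hypothetical endpoint
        params={"page": page},
        timeout=5,
    )
    resp.raise_for_status()
```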
Data streaming
You can save CPU and memory by streaming data to external services directly instead of assembling a full file in advance (e.g. first writing the data to disk, then reading it back and posting it to the network).
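One way to sketch this with the requests library, which can send a generator as a chunked upload (the endpoint is hypothetical and the call is left commented out):

```python
import requests

def row_stream():
    # Produce (or read) data chunk by chunk instead of building
    # the whole payload in memory or on disk first.
    for i in range(1_000_000):
        yield f"{i},value-{i}\n".encode("utf-8")

# requests.post("https://api.example.com/import", data=row_stream(), timeout=30)
```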
Operating systems limitations
On Linux, once a directory holds more than roughly 1,000 files, performance starts to degrade. You can split them up and store the files in nested directories. Popular databases (MySQL, PostgreSQL, etc.) store tables in files, so having too many tables can hurt performance at the OS level as well.
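A common way to split files up is to shard by a hash prefix; a minimal sketch (the directory layout is just one possible convention):

```python
import hashlib
from pathlib import Path

def sharded_path(root: Path, filename: str) -> Path:
    # Spread files across nested directories keyed by a hash prefix,
    # e.g. root/ab/cd/<filename>, so no single directory grows too large.
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    return root / digest[:2] / digest[2:4] / filename

path = sharded_path(Path("uploads"), "avatar-12345.png")
path.parent.mkdir(parents=True, exist_ok=True)
path.write_bytes(b"...image bytes...")
```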
Metrics
Little’s law
Using this law we can estimate how many app instances we need to handle the application load. A theorem by John Little states that:
L = λ * W
In the context of a web application, L is the number of app instances (e.g. threads/processes with an app copy), λ is the average requests rate (e.g. 10 requests per second), W is the average response time (e.g. 0.3 seconds).
10 * 0.3 = 3 (3 is the number of app instances we need)
It’s not the most accurate prediction, but it’s an easy way to roughly estimate the number of app instances required for a given load. Using the same formula we can calculate the theoretical maximum throughput. It also makes it clear that page load time matters a great deal when it comes to surviving traffic peaks.
Keep in mind that in real life these are not isolated units: as the load on the database, cache, and network grows, the numbers won’t change strictly proportionally. In other words, if the DB is the bottleneck, adding more app instances won’t increase throughput.
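Put as a tiny helper, using the numbers from the example above, the formula also shows how faster responses directly cut the capacity needed during a spike:

```python
def required_instances(arrival_rate: float, avg_response_time: float) -> float:
    """Little's law: L = lambda * W."""
    return arrival_rate * avg_response_time

print(required_instances(10, 0.3))    # 3.0 instances, as in the example above
print(required_instances(100, 0.3))   # a 10x traffic spike: 30 instances
print(required_instances(100, 0.15))  # halving response time halves that need
```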
Know your requirements
You should know in advance the expected traffic on your web site and plan the app infrastructure accordingly.
Use monitoring tools
Automatically detect spikes in CPU and RAM usage. Over an application’s lifespan, pages tend to grow in size and their load times tend to creep up. It’s good practice to measure this, set boundary values (e.g. a max page load time of 200ms and a max page size of 100KB, excluding data served from a CDN), and take action when things start to get out of control.
Automated testing is essential
Use integration and unit tests for daily agile development. Use performance testing to ensure your application will perform well under the expected workload.
Aggregate your logs
Aggregate and store logs in a central location for easy access. Keeping the logs explicit but not extremely verbose is a separate art form.
Timeouts
Put timeouts on all out-of-process calls and pick a default timeout for everything.
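A sketch of what that looks like with the requests library (the endpoint and the 2-second default are assumptions for illustration): without a timeout, a stuck dependency can hold a worker forever.

```python
import requests

DEFAULT_TIMEOUT = 2.0  # seconds; an assumed project-wide default

try:
    resp = requests.get(
        "https://api.example.com/health",  # hypothetical downstream service
        timeout=DEFAULT_TIMEOUT,
    )
    resp.raise_for_status()
except requests.Timeout:
    # Fail fast and fall back or retry instead of hanging.
    resp = None
```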
Scaling
Use CDNs
A CDN guarantees faster load times for users and can be scaled quickly in case of traffic spikes.
Prefer eventual consistency if possible
Eventually consistent data storage systems use asynchronous processes to update remote replicas. If BASE (Basically Available Soft-State Eventually Consistent) is sufficient for your data, then you can easily achieve scalability and availability from a distributed data storage system.
Autoscaling
Set up auto-scaling based on the data from your monitoring tools. DDoS attacks may trigger unnecessary scale-ups, so set smart rules that check whether a scale-up is really needed.
Service discovery
A central server (or a cluster of servers) maintains a global view of service addresses, and clients connect to it to register and retrieve addresses. It’s a must-have for a dynamic system with autoscaling in place.
Containers
Containers make it easier to manage deployments and service discovery.
Decentralize services
Embrace self-service wherever possible, keeping services independently deployable and backward compatible. Prefer choreography over orchestration, with smart endpoints, so that associated logic and data stay cohesive within service boundaries.
Instead of a conclusion
Resources
Donald Knuth, “Computer Programming as an Art”
https://github.com/futurice/backend-best-practices
http://khaidoan.wikidot.com/performance-tuning-backend
https://github.com/binhnguyennus/awesome-scalability
Piotr Murach talk “It is correct, but is it fast?”