Capacity Planning

Krishni Andradi
WSO2 Solution Architecture Team Blog
5 min read · Jan 3, 2020

Understanding the importance of capacity planning and concepts around it.

What is Capacity Planning and Why is it Important?

Capacity Planning

Capacity planning is deciding the amount of resources that need to be allocated to an application so that it runs with the required uptime, the required latency, the expected security, and the required QoS.

In essence, you need to identify the CPU and memory requirements of your application.

Importance

When you successfully develop and deploy your application, you expect it to stay up and running without occasional failures, to perform well with low latency, to provide a good quality of service with no message loss, and to offer a fair level of security.

On the other hand, you want to run your application on hardware that takes minimal resources at minimal cost, so that the amount users need to pay stays low.

If your application does not fulfill the above requirements, people won’t embrace it, even if it is operationally the best application for a particular task. Occasional downtime, poor QoS, weak security, and high latency harm the user experience and user trust.

So, in order to guarantee the above non-functional requirements, you need a proper sizing of the hardware and space your application will need.

What are the factors to consider, and how do they affect capacity?

Throughput

Throughput is the most important factor when deciding capacity. It is the amount of work you want your system to process during a specific period of time, normally measured in transactions per second (TPS). You may need to consider both average TPS and peak TPS when planning capacity.
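
To make the arithmetic concrete, here is a minimal Python sketch of that sizing step. The peak TPS, the per-node TPS, and the 70% headroom factor are all assumed figures for illustration; in practice the per-node number should come from your own load tests.

```python
import math

def nodes_required(peak_tps: float, per_node_tps: float, headroom: float = 0.7) -> int:
    """Nodes needed to serve peak_tps while running each node at no
    more than `headroom` (here 70%) of its benchmarked capacity."""
    usable_per_node = per_node_tps * headroom
    return math.ceil(peak_tps / usable_per_node)

# Assumed figures: a 5,000 TPS peak against 800 TPS per node from a load test.
print(nodes_required(peak_tps=5000, per_node_tps=800))  # -> 9
```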

Latency

Latency is the time a user has to wait until he/she gets a response back from the system, i.e., the response time of the system. For some systems a high latency may be fine, but most users would like to receive a response within the first 30 seconds.

Message Size

Message size also affects the performance of the system, because messages need to remain in RAM while they are being processed. If messages are too large, the system may have to wait until RAM becomes available.
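
As a rough illustration, you can estimate the RAM needed just to hold in-flight messages. The overhead factor below is an assumed multiplier standing in for the parsed object trees, buffers, and copies a runtime typically creates on top of the raw payload:

```python
def heap_estimate_mb(in_flight_messages: int, avg_message_kb: float,
                     overhead_factor: float = 3.0) -> float:
    """Rough RAM (MB) needed to hold in-flight messages; the overhead
    factor is an assumed multiplier for parsed object trees and buffers."""
    return in_flight_messages * avg_message_kb * overhead_factor / 1024

# e.g. 500 in-flight messages of 100 KB each -> roughly 146 MB
print(round(heap_estimate_mb(500, 100), 1))
```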

Concurrent users

The number of concurrent users also affects performance, as several users may try to access the same resource at once, and later requests have to wait until the first user releases that resource.
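
Little’s Law ties this factor to the previous two: the number of requests in flight equals throughput multiplied by average latency. A one-line sketch:

```python
def in_flight_requests(throughput_tps: float, avg_latency_s: float) -> float:
    """Little's Law: requests in the system = arrival rate x time in system."""
    return throughput_tps * avg_latency_s

# 200 TPS with a 0.5 s average response time keeps ~100 requests in
# flight at any moment, each holding a thread and some memory.
print(in_flight_requests(200, 0.5))  # -> 100.0
```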

Work per transaction

A transaction that requires processing obviously takes more resources and time than a simple ‘pass-through’ one. Some transactions perform several operations in one go, such as database operations, back-end calls, and data transformations, so they consume more resources. You either need to count them as multiple transactions or measure the work per transaction.
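
One way to account for this (a sketch with assumed weights, not a formula from any product) is to express each transaction type as a multiple of a plain pass-through call and size against the weighted total:

```python
# Hypothetical transaction mix; weights are assumed relative costs
# versus a plain pass-through call.
mix = {
    "pass_through":       {"tps": 300, "weight": 1.0},
    "db_lookup":          {"tps": 100, "weight": 2.5},
    "transform_and_call": {"tps": 50,  "weight": 4.0},
}

# Effective load expressed in pass-through equivalents per second.
effective_tps = sum(t["tps"] * t["weight"] for t in mix.values())
print(effective_tps)  # -> 750.0
```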

Clustering

Active

The active node is the primary node that accepts incoming traffic and processes it.

Passive

A passive node may or may not be kept running; it is activated in a failover scenario.

There are three passive modes in which a server can exist: hot-standby, warm-standby, and cold-standby.

Hot-standby

Identical to the primary node. Both the server and the application are running, so in case of failure it only takes the state replication time for this node to act as the primary. Normally this state replication takes a few seconds.

Warm-standby

Identical to the primary node. The server is running, but the application is not, so in case of failure it takes the application start time plus the state replication time for this node to act as the primary. Typically this takes a few minutes.

Cold-standby

This is also identical to the primary node, but neither the server nor the application is running. In case of failure, it takes the server startup time, the application start time, and the state replication time for this node to act as the primary. Typically this time is measured in hours.
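
A small sketch makes the trade-off explicit. The step timings below are assumptions for illustration only; measure your own server startup, application startup, and state replication times:

```python
# Assumed step timings in seconds, for illustration only.
SERVER_START, APP_START, STATE_REPLICATION = 3600, 120, 5

failover_time = {
    "hot-standby":  STATE_REPLICATION,                             # everything already running
    "warm-standby": APP_START + STATE_REPLICATION,                 # server up, app must start
    "cold-standby": SERVER_START + APP_START + STATE_REPLICATION,  # nothing running yet
}
for mode, seconds in failover_time.items():
    print(f"{mode}: ~{seconds} s")
```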

Load balancer

The load balancer is the component that distributes traffic across the primary nodes. In case of failure, it redirects traffic to the failover nodes. If you are clustering, you need a load balancer that routes traffic to the correct servers.

When allocating failover nodes, you also need to consider the load you expect them to handle. If you have 4 primary nodes and 1 secondary passive node, and 2–3 primary nodes fail, that one secondary node won’t help, as it cannot handle the load of all the failed nodes.
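
As a sizing sketch (with assumed per-node capacity and peak load), you can check how many passive nodes are needed to keep serving traffic after a given number of primary failures:

```python
import math

def passive_nodes_needed(primaries: int, tolerated_failures: int,
                         per_node_tps: float, peak_tps: float) -> int:
    """Passive nodes required to keep serving peak_tps after
    `tolerated_failures` primaries go down (a sizing sketch)."""
    surviving_tps = (primaries - tolerated_failures) * per_node_tps
    shortfall = max(0.0, peak_tps - surviving_tps)
    return math.ceil(shortfall / per_node_tps)

# 4 primaries at 800 TPS each serving a 2,400 TPS peak: losing 2 nodes
# leaves 1,600 TPS, so one passive node covers the remaining 800 TPS.
print(passive_nodes_needed(4, 2, 800, 2400))  # -> 1
```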

Scaling

Horizontal scaling

Horizontal scaling is increasing the number of servers. This is also referred to as scaling out/in.

Vertical scaling

Vertical scaling is increasing the resources (CPU, memory) of the existing servers. This is referred to as scaling up/down.

Capacity forecasting

When you are deploying an application, you also need to forecast future requirements: your application’s usage may grow year by year, and there may be peak usage in certain months.

So your application should be able to scale up or scale out at those times, and that scaling needs to happen quickly to minimize downtime.

So you need to design your architecture in a way that it can be scaled out or scaled up easily.
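
A simple compound-growth projection is often enough for a first forecast. The growth rate below is an assumption you should revisit against real usage data:

```python
def projected_tps(current_tps: float, annual_growth: float, years: int) -> float:
    """Compound-growth projection of traffic; the growth rate is an
    assumption to be revisited against real usage data."""
    return current_tps * (1 + annual_growth) ** years

# 1,000 TPS today, assuming 25% growth per year:
for year in (1, 2, 3):
    print(year, round(projected_tps(1000, 0.25, year)))  # 1250, 1562, 1953
```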

Auto Scaling

When it comes to scaling, you don’t need to worry as much if you deploy your application on a cloud platform that supports auto-scaling.

With autoscaling, you configure a range of resources you are willing to use. Depending on the load, the platform will use more resources up to the maximum, or run with the minimal amount of resources in that range.
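
Conceptually, many autoscalers (for example, the Kubernetes Horizontal Pod Autoscaler) apply a proportional rule: scale the replica count by the ratio of observed to target utilization, clamped to the configured range. A minimal sketch:

```python
import math

def desired_replicas(current: int, cpu_utilization: float,
                     target: float = 0.6, min_n: int = 2, max_n: int = 10) -> int:
    """Proportional rule used by autoscalers such as the Kubernetes HPA:
    scale replicas by observed/target utilization, clamped to a range."""
    wanted = math.ceil(current * cpu_utilization / target)
    return max(min_n, min(max_n, wanted))

print(desired_replicas(current=3, cpu_utilization=0.9))  # -> 5 (scale out)
print(desired_replicas(current=3, cpu_utilization=0.2))  # -> 2 (floor of range)
```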

Conclusion

In conclusion, you need to understand your requirements and check your application’s performance numbers to come up with a proper sizing. When you measure performance, it is ideal to use the same environmental factors as the system the application is going to run on; a performance test on a separate machine with no background operations will not give you realistic results. And if you are using someone else’s performance numbers, always keep a buffer and expect somewhat lower performance than stated.

Thank You
