Software Performance Tuning Methodology: Discover, Design, Measure & Refine

dm03514 · Published in Dm03514 Tech Blog · 7 min read · Dec 17, 2019

Software performance tuning is often regarded as a dark art for low-level hackers. In my experience, the majority of performance tuning is much more social and systems-based, as opposed to low-level OS wizardry. This post outlines an approach to software performance tuning that incorporates performance into the very beginning of the software development lifecycle. This post was adapted from an internal talk.

What is Software Performance? And Why Should We Care?

Software performance is an overloaded term and can be used to mean a number of things:

  • Responsiveness (latency)
  • Stability (availability)
  • -ilities (scalability, extensibility, maintainability)

Without refining the definition yet, we should care because software performance:

  • Impacts Customer Experience
  • Informs Scaling & Provisioning
  • Indicates Performance Regressions (relative changes)

What’s the Problem?

Performance is (unfortunately) emergent, meaning that the performance of individual components may not predict the performance of the larger combined system. Even if this is not strictly 100% true, it’s difficult to predict the performance of the end user system until it is fully assembled and running on the hardware that will service the production workload. This has two important implications:

  • Difficult to predict performance (Favor Observation)
  • System Specific (Local Doesn’t Predict Remote)

Because of this, we often see the mantra Make it Work, then Make it Fast emerge. It suggests that we shouldn’t worry about performance until we have a working project. While I think this is better advice than the other end of the spectrum, there is plenty of low-overhead work we can do to design performance in.

Finally! Performance is:

Defined by clients, and a proxy for their experience! Performance is the executable validation of a design. Performance will happen either explicitly, under our control and guidance, or implicitly, running up against unforeseen user expectations or physical limitations (such as the speed of light).

Ensuring Code is Performant

Discover

What are the known or discoverable constraints?

Explicit Constraints

These will often be viewed from a client’s perspective. They are often contractual, i.e. a customer is paying for a certain level of service:

  • Responses must be served within 100ms
  • 99.99% of requests must complete in < 1 second

The two most common sources for these are:

  • Contractual
  • Product Management

Implicit Constraints

Implicit constraints are performance limits hidden in requirements. If a program produces daily reports or hourly rollups, the interval of those actions may define the expected limit. These constraints are often focused on end user experience (UX). For example, if an action blocks the user path, there is a ~100ms limit for the site to feel “responsive”.

Physical Limitations of Systems

Physical limitations are often dictated by the minimum achievable performance in the physical world! If certain latencies are required between widely separated locations, performance will run up against the speed of light! There are other performance-critical constraints based on the current speed of hardware, captured in the “Latency Numbers Every Programmer Should Know” infographic.
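A few order-of-magnitude figures from that table (approximate, and worth re-checking against current hardware): an L1 cache reference is ~0.5 ns, a main memory reference ~100 ns, a random SSD read ~150 µs, a round trip within a datacenter ~500 µs, and a packet round trip from California to the Netherlands and back ~150 ms. No amount of tuning will push that last number below what physics allows.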

Reasonable Ranges

Reasonable ranges are formed from knowledge of physical system limitations and experience observing systems in production. Suppose there is a Go service that accepts an HTTP connection, makes a GET request to Redis, and then writes and closes the HTTP connection. The service is tested locally by running both the service and Redis and driving traffic using Apache Benchmark. “Locally” is a 2019 MacBook Pro with 6 cores and 16GB of RAM. On the first test the service is only able to handle 50 requests per second! Based on experience, this is unreasonable; a service like this should easily be able to push a couple hundred requests per second.
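A minimal sketch of a service like the one described, assuming the go-redis client (the address, key, and port are placeholders):

```go
package main

import (
	"net/http"

	"github.com/go-redis/redis/v8"
)

func main() {
	// Assumes a Redis instance on the default local port.
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// One GET per request; "greeting" is a stand-in for the real key.
		val, err := rdb.Get(r.Context(), "greeting").Result()
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Write([]byte(val))
	})

	http.ListenAndServe(":8080", nil)
}
```

Against a sketch like this, 50 requests per second on a modern laptop immediately suggests something is wrong elsewhere (connection churn, accidental per-request client creation, and similar suspects).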

Design

The design phase is focused on materializing the structure of a concrete implementation. This is where discovered constraints are combined with implementation strategies. Most of the performance work should be done here, as this is where the solution is most malleable and has the lowest cost of change.

Measurement Strategy

The first step is to define a measurement strategy:

  • What is being measured?
  • Where is it being measured?
  • How will it be collected?

And finally, what will we use to determine whether the system is “performant”? Google’s strategies for choosing good SLOs align closely with this step.
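As one possible answer to those three questions for an HTTP service (what: request latency; where: at the service boundary; how: a Prometheus scrape of a /metrics endpoint), here is a sketch using the prometheus/client_golang library; the metric name and buckets are illustrative:

```go
package main

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// What: end-to-end request latency, as the client-facing handler sees it.
var requestDuration = promauto.NewHistogram(prometheus.HistogramOpts{
	Name:    "http_request_duration_seconds",
	Help:    "Request latency in seconds.",
	Buckets: prometheus.DefBuckets, // tune buckets around the SLO threshold
})

// Where: a middleware at the service boundary.
func instrument(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next(w, r)
		requestDuration.Observe(time.Since(start).Seconds())
	}
}

func main() {
	// How: collected by a periodic Prometheus scrape of /metrics.
	http.Handle("/metrics", promhttp.Handler())
	http.HandleFunc("/", instrument(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	}))
	http.ListenAndServe(":8080", nil)
}
```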

Theoretical Performance

Theoretical design takes into account the theoretical performance and runtime complexity of the implementation’s algorithms and data structures. It also includes system-specific implications for performance, such as the effect of schema choices on database query execution.
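As a small illustration of how a data structure choice sets the theoretical floor, compare a membership check against a slice versus a map (a sketch; real-world constants matter, so always measure):

```go
package lookup

// containsSlice is O(n) per lookup: fine for a handful of items,
// a bottleneck when called on a hot path with a large n.
func containsSlice(items []string, target string) bool {
	for _, item := range items {
		if item == target {
			return true
		}
	}
	return false
}

// containsSet is amortized O(1) per lookup, at the cost of building
// (and holding in memory) the set up front.
func containsSet(set map[string]struct{}, target string) bool {
	_, ok := set[target]
	return ok
}
```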

Isolatable Components — Testable Design

This brings us to the code level. There are a number of techniques that reduce friction in isolating components for individual benchmarks and in providing stub (highly controllable) implementations for performance testing; a sketch follows the list:

  • Interfaces & Swappable Implementations
  • Dependency Injection
  • Exercise components in isolation using test harness
  • Ability to bring up and exercise the service using test load
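A sketch of what these techniques can look like in Go (the names are hypothetical): the handler receives its dependency through an interface, so a benchmark can inject a stub and measure the handler’s own overhead in isolation:

```go
package service

import (
	"context"
	"net/http"
	"net/http/httptest"
	"testing"
)

// Store is the dependency boundary; the production implementation might be
// backed by Redis, while tests swap in a controllable stub.
type Store interface {
	Get(ctx context.Context, key string) (string, error)
}

type stubStore struct{}

func (stubStore) Get(ctx context.Context, key string) (string, error) {
	return "stub-value", nil // instant, deterministic response
}

// NewHandler takes its Store via dependency injection.
func NewHandler(s Store) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		val, err := s.Get(r.Context(), "greeting")
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Write([]byte(val))
	}
}

// BenchmarkHandler (in a _test.go file) exercises the handler in isolation
// using the standard test harness, with the stub removing Redis from the picture.
func BenchmarkHandler(b *testing.B) {
	h := NewHandler(stubStore{})
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		h(httptest.NewRecorder(), req)
	}
}
```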

Measurement

Strategies

Using the measurements defined above, it’s time to start determining whether performance has been “achieved”! Since this methodology puts the client experience first, if performance meets the client’s expectations then no more tuning is necessary.

Execution: Generate Load

There are many different ways to load a system. It’s best to start with higher-level (customer-representative) metrics and slowly work down into implementation details as more data becomes available. There are many different test types (an example follows the list):

  • Local (relative) vs. remote in a prod-like environment (absolute)
  • Remote long-running tests, which verify steady state
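As a concrete local starting point, Apache Benchmark (used in the scenario above) drives a simple closed-loop load; for example, `ab -n 10000 -c 50 http://localhost:8080/` issues 10,000 requests at a concurrency of 50 (the URL and numbers are placeholders). Numbers from a laptop like this are only meaningful relative to earlier local runs.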

Establish Baseline

Establishing a baseline is crucial for seeing relative changes in performance. Without a baseline it’s difficult to understand the impact of a change:
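In Go, one concrete way to do this is with the benchstat tool: save benchmark output from the current code (`go test -bench=. -count=10 > old.txt`), apply the change, save the output again to `new.txt`, and run `benchstat old.txt new.txt` to see the relative delta with some statistical rigor. The file names here are arbitrary.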

Profile

Profiling is used to determine where an application is spending its time or other resources (e.g. memory). This is often the “hands-on” part of performance tuning: using low-level tools to understand which resources a program is consuming. The goal is to generate empirical data about where an application spends its time and which resources it uses:
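In Go, a common way to do this is the built-in net/http/pprof package; a minimal sketch (the side port is arbitrary):

```go
package main

import (
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// Expose the profiling endpoints on a side port, away from user traffic.
	go http.ListenAndServe("localhost:6060", nil)

	// ... the rest of the service runs here ...

	// A 30-second CPU profile can then be collected with:
	//   go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
	select {} // placeholder to keep the sketch running
}
```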

Visualize!

Once data is captured it’s important to put it in a form that can be easily understood. A common profile visualization technique is the flame graph, created by Brendan Gregg.

A detailed description of how it can be used to understand application performance can be found in the acmqueue article here.

(Flame graph image by Brendan Gregg, published in acmqueue.)
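For Go profiles, recent versions of pprof can render this view directly: `go tool pprof -http=:8081 cpu.prof` opens an interactive web UI that includes a flame graph of the captured profile (the port and file name are placeholders).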

Refine

Refining is the hands-on act of combining the previous steps: generate data, observe the performance, create hypotheses, and execute experiments.

Scientific Method in Action

Strategies

The Theory of Constraints (“a chain is no stronger than its weakest link”) is a common tuning strategy. It focuses on finding the largest bottleneck, or contributor to latency, in order to achieve the desired performance: identify the bottleneck, remove it, and then remeasure to identify the next one. In performance tuning there will always be a slowest operation, so tuning provides diminishing returns; at some point it no longer makes economic sense to continue increasing performance.
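As a worked example: if database queries contribute 60% of total latency, halving them cuts overall latency by 30% and exposes a new weakest link, say serialization at 20% of the (now smaller) total; halving that next step saves only about 10%, and each subsequent iteration buys less.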

Anti-Patterns

Performance Engineering has a number of insidious anti-patterns:

Identifying bottlenecks based on intuition vs. facts

The solution is to let reality (fact-based evidence) be the guide.

Using local performance to predict global performance

Performance is emergent, so the best local performance can offer is a relative increase or decrease against previous local baselines.

Tuning without overarching client-centric measurements

Choose a client-representative performance metric in order to understand the impact of performance tuning on the client’s experience.

Choosing low-impact candidates

If one operation is responsible for 5% of latency and takes 4 weeks to halve, while another bottleneck is responsible for 10% of latency and also takes 4 weeks to halve, it should be intuitive to attack the 10% operation: halving it saves 5% of total latency versus 2.5% for the same effort. This can be addressed by using fact-based evidence combined with a methodology like the Theory of Constraints.

Not collecting data because it’s too difficult, too low level, the app wasn’t designed properly, etc.

There are ways to measure pretty much everything now (e.g. with BPF)!

(Anti-pattern image property of Martin Fowler.)

Conclusion

  • Performance is as much Design as it is Execution
  • Performance is social
  • Performance is continuous
  • Performance is about uncovering what the program is actually doing by observing where it spends its time
  • Performance is an application of the Scientific Method

Happy Performance Tuning!
