When Every Millisecond Counts — Designing Adobe Experience Platform Audience Manager for Performance
Adobe Experience Platform Audience Manager processes billions of events every day. Some are collected in batches and others are collected in real time. However, all of these events need to be processed in an efficient and timely manner in order to help our clients successfully grow their businesses. In today’s highly competitive business landscape where every millisecond counts, this can only be achieved through establishing a culture of continuous performance improvement.
We often think of performance in terms of throughput, latency, or memory usage. But many other factors contribute to performance, such as:
- Making the best possible use of resources by maximizing output from given inputs, which minimizes costs.
- Processing high data volumes within milliseconds, with high reliability and low latency.
- Improving user experience.
- Increasing stability.
These are just a few of the factors we should consider when it comes to the performance of experience applications. Thinking about performance more holistically required Adobe Experience Platform to build a mindset in which every line of code we write and every application we deploy is guided by performance.
Performance as a mindset
We have all heard that premature optimization is the root of all evil. Donald Knuth wrote as much in his 1974 paper “Structured Programming with go to Statements.” But what if later is too late?
Consider that today’s agile methodologies allow development teams to ship code continually and iterate quickly. This frequently allows engineers to deploy bug fixes as soon as issues are identified.
Designing for performance remains a balancing act: precious time and resources should not be wasted on the wrong things. For example, we definitely do not want to focus on optimizing for performance before the primary functionality of the product has been established. However, there are a number of things we’ll explore here that you should consider earlier in the design process to improve performance and avoid potentially costly problems in your applications.
Simplicity and readability
A line of code is read many more times than it is written. Complex code takes time to understand, and time is one of the most essential resources when it comes to performance. The more complex the code, the more difficult it is to debug, maintain, and extend.
Keeping your code as simple as possible through consistent indentation, logical code organization, and naming conventions, along with good documentation, makes it more readable and more easily extensible.
Keep your code stateless
Horizontal scaling, which means adding virtual machines to provide more resources when needed, is a common practice in cloud environments. It can easily be applied to stateless applications that do not keep an in-memory context. Stateless machines in the cloud are usually part of an auto-scaling group that automatically adjusts the number of active servers to match the demand for computational resources. Because they do not maintain session information in memory, you can add or remove machines without jeopardizing data consistency.
In contrast, with a stateful machine that maintains session information or other data in its memory, removing it can be risky. With this type of system, a hot standby — a copy ready to take over whenever the main machine becomes inactive — is one possible way to avoid costly downtime. To the extent that you can keep your code stateless, you are less likely to experience this type of performance issue.
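To make the contrast concrete, here is a minimal Java sketch of a stateless handler. It is an illustration under stated assumptions, not how Audience Manager is built: `SessionStore`, `InMemorySessionStore`, and `VisitCounterHandler` are hypothetical names, and the in-memory store only stands in for a store that would live outside the server process.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction over an external store shared by all instances.
interface SessionStore {
    int incrementAndGet(String sessionId);
}

// In-memory stand-in used only for this sketch; a real deployment would
// back this interface with a store outside the server process.
final class InMemorySessionStore implements SessionStore {
    private final Map<String, Integer> counts = new ConcurrentHashMap<>();

    @Override
    public int incrementAndGet(String sessionId) {
        // merge() atomically adds 1 and returns the new value.
        return counts.merge(sessionId, 1, Integer::sum);
    }
}

// Stateless handler: it keeps no per-session fields, so any instance
// can serve any request.
final class VisitCounterHandler {
    private final SessionStore store;

    VisitCounterHandler(SessionStore store) {
        this.store = store;
    }

    String handle(String sessionId) {
        return "visits=" + store.incrementAndGet(sessionId);
    }
}
```

Because the handler holds no session data of its own, two handler instances sharing one store behave like a single logical service, which is what lets an auto-scaling group add or remove machines safely.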
Keep your components decoupled
A good rule of thumb for designing a scalable application is to keep your components decoupled. Decoupling is yet another way to enhance performance in the early stages of development by decomposing your application into simple, maintainable, and focused (high cohesion) components. This will allow you to safely change and scale one component without affecting the others.
Decoupling is usually achieved by adding a new level of abstraction. This can be accomplished in a variety of ways, such as:
- Using a simple well-written interface or abstract class.
- Taking advantage of the factory design pattern.
- Passing messages between components through a message queue.
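The three options in the list above can be combined in one small Java sketch. All names here (`EventSink`, `QueueEventSink`, `ListEventSink`, `EventSinks`) are hypothetical and chosen only for illustration; the queue is a plain in-process `BlockingQueue` standing in for whatever messaging system you actually use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Hypothetical abstraction: producers depend on this interface only.
interface EventSink {
    void accept(String event);
}

// Hands events to a queue so that another component can consume them later.
final class QueueEventSink implements EventSink {
    private final BlockingQueue<String> queue;

    QueueEventSink(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    @Override
    public void accept(String event) {
        queue.offer(event);
    }
}

// Keeps events in a list, which is handy in tests.
final class ListEventSink implements EventSink {
    final List<String> events = new ArrayList<>();

    @Override
    public void accept(String event) {
        events.add(event);
    }
}

// Factory: the only place that knows which concrete class is in use.
final class EventSinks {
    private EventSinks() {}

    static EventSink create(String kind, BlockingQueue<String> queue) {
        switch (kind) {
            case "queue": return new QueueEventSink(queue);
            case "list": return new ListEventSink();
            default: throw new IllegalArgumentException("Unknown sink: " + kind);
        }
    }
}
```

A producer written against `EventSink` does not change when the implementation behind the factory is swapped, so each side can be modified or scaled on its own.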
Go parallel and asynchronous
After splitting a large monolith of a project into decoupled components, we faced the question of how to make the best use of resources within the individual components.
We approached the issue by addressing concurrency first to improve performance. Want to speed up? Go parallel. The reality is that concurrency can be very tricky. It is not uncommon for concurrent applications to run into issues that can severely impact their overall performance and availability.
This doesn’t mean concurrency shouldn’t be used — just use it carefully. Think ahead to ensure you have a plan for fail safety, data recovery, and how to monitor your application and ensure timely alerts of any unexpected behavior.
When you think about multithreading, you should also think about thread orchestration. Consider using the right thread pool executor, or designing a custom one that better suits your needs. Fixed thread pools suit CPU-intensive tasks, while single-thread executors suit sequential tasks where predictability matters. Be careful with cached thread pools: because they create a new thread whenever none is idle, thread creation can get out of control with long-running tasks. Fixed thread pools should mostly do the trick unless you have made an informed decision that another option fits your needs better.
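As a rough illustration of the fixed-pool advice, the sketch below splits a CPU-bound computation into chunks and runs them on a pool sized to the number of available processors. `ParallelSum` and its parameters are hypothetical; sizing the pool to the core count is a common starting point for CPU-bound work, not a rule taken from our production setup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelSum {
    private ParallelSum() {}

    // Sums 1..n by splitting the range into chunks that run on a fixed pool.
    static long sumTo(long n, int chunks) {
        if (chunks < 1) throw new IllegalArgumentException("chunks must be >= 1");
        // Assumption: one thread per core is a reasonable size for CPU-bound work.
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            long chunkSize = Math.max(1, n / chunks);
            for (long start = 1; start <= n; start += chunkSize) {
                final long from = start;
                final long to = Math.min(n, start + chunkSize - 1);
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (long i = from; i <= to; i++) sum += i;
                    return sum;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get();
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while waiting for results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A chunk failed", e.getCause());
        } finally {
            pool.shutdown(); // always release the worker threads
        }
    }
}
```

The pool never grows beyond its fixed size, so a burst of long-running tasks queues up instead of spawning an unbounded number of threads.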
Likewise, asynchronous programming is crucial for developing scalable and responsive applications, and it often goes hand in hand with concurrency. It is mostly used in the event-driven programming model. Used well, it avoids long waits on blocking calls and expensive IO operations. Used incorrectly, it can tie up both the CPU and the currently running thread.
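One way to picture this in Java is with `CompletableFuture`. The sketch below starts two independent lookups and combines their results once both complete, without blocking the caller. `AsyncProfile`, `fetchName`, and `fetchScore` are hypothetical; the two fetch methods return immediately here and only stand in for slow network or disk calls.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

final class AsyncProfile {
    private AsyncProfile() {}

    // Hypothetical lookups standing in for network or disk IO.
    static String fetchName(String id) {
        return "user-" + id;
    }

    static int fetchScore(String id) {
        return id.length() * 10;
    }

    // Starts both lookups on the given executor and combines the results
    // when both are done; the calling thread is not blocked here.
    static CompletableFuture<String> describe(String id, Executor io) {
        CompletableFuture<String> name =
                CompletableFuture.supplyAsync(() -> fetchName(id), io);
        CompletableFuture<Integer> score =
                CompletableFuture.supplyAsync(() -> fetchScore(id), io);
        return name.thenCombine(score, (n, s) -> n + ":" + s);
    }
}
```

The incorrect usage warned about above typically looks like calling `join()` or `get()` on such a future from inside a worker thread of the same pool: the thread then sits idle waiting, which is the blocking the asynchronous style was meant to avoid.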
If you can’t measure it, you can’t improve it.
This quote, often attributed to the management consultant Peter Drucker, is as applicable to computer programming as it is to business. But what should we actually measure?
Monitoring and measuring application availability is most essential. If you have a web application, the easiest way to do that is via a simple recurring HTTP check. You should also gather — at a minimum — information on error rates, throughput, memory, CPU, and if the program uses the network, information on response time and bandwidth as well.
If your application is Java-based, keep an eye on the performance issues that can occur during garbage collection. If the collector is incorrectly tuned, it can pause your application for short periods and use a lot of CPU. Collecting garbage collection metrics may not be your first thought, but it’s always a good idea to track them.
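The JVM exposes basic garbage collection counters through the standard `java.lang.management` API, so a first look does not require extra tooling. The sketch below reads them; `GcStats` is a hypothetical helper name, and in practice you would export these values to your monitoring system instead of formatting them as strings.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcStats {
    private GcStats() {}

    // Returns one line per collector with its cumulative count and time.
    static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            lines.add(gc.getName()
                    + " count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }
}
```

The values are cumulative since JVM start, so sampling them periodically and looking at the difference between samples shows how much time recent collections have taken.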
Performance analysis is all about visibility — knowing what is going on inside of an application and the application’s environment. And, visibility is all about using the right tools. Grafana, Prometheus, or New Relic are a few examples you might want to take a look at.
Start simple. Create dashboards for tracking basic metrics and apply static thresholds, but automate the alerting on them, because visual detection is error-prone. Static thresholds work well for predictable values but can cause false alarms whenever your system’s usage fluctuates. If this happens, your next steps should be to apply transformations and correlate metrics.
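To show why a transformation can help, the sketch below compares a static threshold on raw samples with the same threshold applied after smoothing. The moving average is only one possible transformation, chosen here as an assumption for illustration; `Thresholds` and its methods are hypothetical names.

```java
final class Thresholds {
    private Thresholds() {}

    // Static threshold check on a single value.
    static boolean exceeds(double value, double limit) {
        return value > limit;
    }

    // Moving average over the last `window` samples (fewer at the start).
    static double[] movingAverage(double[] samples, int window) {
        if (window < 1) throw new IllegalArgumentException("window must be >= 1");
        double[] out = new double[samples.length];
        double sum = 0;
        for (int i = 0; i < samples.length; i++) {
            sum += samples[i];
            if (i >= window) sum -= samples[i - window]; // drop the oldest sample
            out[i] = sum / Math.min(i + 1, window);
        }
        return out;
    }

    // Counts how many values are above the limit.
    static int countAlerts(double[] values, double limit) {
        int alerts = 0;
        for (double v : values) {
            if (exceeds(v, limit)) alerts++;
        }
        return alerts;
    }
}
```

With samples of 10, 10, 100, 10, 10 and a limit of 50, the raw series triggers one alert on the single spike, while the series smoothed over a window of 5 stays below the limit. The trade-off is that smoothing also delays detection of real, sustained increases.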
Getting a good picture of what is happening behind the scenes can be more difficult. For example, is your multithreaded application hanging under load? Or are you experiencing constant memory issues? With performance issues like these, you might need to take a snapshot of the live threads or perform a heap dump to spot the problem. The Java ecosystem offers several tools that can help you do this, including VisualVM, JProfiler, and jmap.
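Besides external tools, a thread snapshot can also be taken from inside the application through the standard `ThreadMXBean`. The sketch below is one possible way to do it; `ThreadSnapshot` is a hypothetical helper, and it prints only thread names and states, not full stack traces.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class ThreadSnapshot {
    private ThreadSnapshot() {}

    // Lists every live thread with its current state, one per line.
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip locked monitor and synchronizer details.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append(info.getThreadName())
               .append(' ')
               .append(info.getThreadState())
               .append('\n');
        }
        return out.toString();
    }

    // Returns how many threads the JVM currently reports as deadlocked.
    static int deadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.isSynchronizerUsageSupported()
                ? threads.findDeadlockedThreads()
                : threads.findMonitorDeadlockedThreads();
        return ids == null ? 0 : ids.length; // null means no deadlock found
    }
}
```

Logging such a snapshot when a health check starts failing gives you a record of thread states at the moment of the problem, which is often gone by the time someone attaches a profiler.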
Stability is also critical to success
No application is immune to performance issues. Despite all the steps we’ve shared here, and despite the performance tuning and optimization that all applications go through, something can always go wrong. Because of this, automated anomaly detection and disaster recovery should also be considered during the development stage. Organizations that adopt a performance mindset, however, will be better able to focus their creative energies on innovating and providing high-quality applications.