In the modern world, data has become one of the most valuable commodities available. Companies collectively spend billions sourcing, shaping and analysing it.
As a result of this shift, the amount of data available to us has outgrown our traditional ways of viewing it.
Now engineers all over the world are rethinking the technologies we use, and employing various tactics to ensure they are capable of handling this increase in information.
The user-facing element is no exception to this. It is unrealistic to assume it could render millions of data points out of the box.
I was recently in a team at a startup that was responsible for building the front-end to a network monitoring tool. The challenge here was the sheer amount of events we were expecting to have to process and present to the client in a valuable way.
Within this overarching challenge lay several questions we’d have to answer if the product were to be successful, with one standing out in particular.
How do we create a stable user experience?
There is no single solution to this question. A combination of technology and tricks were needed to achieve it, which can be broken down into several categories:
- Enhancing cognitive stimulation — The user needs cognitive stimulation almost immediately. Accustomed to instant digital feedback, users have little patience for empty screens. Every second the screen is void of content is detrimental to the user’s engagement with the app, and in the realm of big data, those seconds can build up.
- Controlling the influx of data — Too much, and the app will become unresponsive. Too little, and your product loses its value. You cannot expect your product to thrive without providing meaningful insights in a highly performant way, so finding a balance is crucial.
- Maximising data processing power — We need to maximise the amount of data we’re able to process in the front-end. Controlling the amount coming in becomes less of an issue if there are efficient processing methods in place for when it does.
- Improving external response times — By spending some time improving API response times, we can accelerate data refreshing, leading to a more fluid experience.
Let’s take a look at each of these categories in more detail.
Enhancing cognitive stimulation
There is not a lot that can be done to make dynamic content instantaneously available to the user. The server has to process requests, query for data, and then respond over a network of undetermined quality.
Traditionally, we would display a single loader or blank screen during this process, neither of which is sufficient to keep the human brain stimulated. What we needed to do was incorporate some intelligent design into the UI to increase perceived performance.
“Perceived performance is a measure of how quick a user thinks your site is” — Matt West
Optimistic UI
To give the impression your app is quicker than it is, we can decouple the API responses from the UI updates. This way, we no longer have to wait for a success or failure message before pushing ahead with UI changes. Instead, we retroactively notify the user when the response comes in.
An excellent example of this occurs in popular messaging apps. When you send a message, it’s pushed to the main chat window, pending delivery. If successful, a ‘Delivered’ notification is displayed. If the delivery fails, an error message is displayed, often paired with a CTA to re-send.
What does this mean in the context of big data?
Rather than lying idle while API requests are processed (which could be several seconds considering the large amount of data we’re dealing with), we start setting the UI up for the response. For example, scaling chart axes to the correct timeframe, or moving new filters to a “selected filters” list.
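The pattern can be sketched in a few lines of plain JavaScript. Here, `renderState` and `sendToApi` are hypothetical stand-ins for your UI layer and API client — the point is only the ordering: update the UI first, reconcile when the response arrives.

```javascript
// Minimal optimistic-update sketch (no framework assumed).
function createOptimisticStore(renderState, sendToApi) {
  const state = { items: [] };

  async function addItem(item) {
    // 1. Update the UI immediately, marking the item as pending.
    const optimistic = { ...item, status: "pending" };
    state.items.push(optimistic);
    renderState(state);

    try {
      // 2. Fire the request after the UI has already moved on.
      await sendToApi(item);
      optimistic.status = "delivered";
    } catch (err) {
      // 3. Retroactively notify the user on failure.
      optimistic.status = "failed";
    }
    renderState(state);
  }

  return { state, addItem };
}
```

The same shape covers the messaging example above: the ‘Delivered’ label and the re-send CTA are just the two branches of the reconciliation step.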
Optimistic UI works best when mutating data. What do we do if we’re retrieving it?
Skeleton screens
Skeleton screens have become increasingly popular over the last few years. They are now a fundamental part of interfaces constructed by companies such as Facebook and Youtube.
A skeleton screen is a variant of a page that imitates the layout without providing the content
Upon retrieval of the content, we replace individual elements of the mock layout with live data. Often skeleton screens will include a pulsing or shimmering animation to give the impression of progress.
Bill Chung has written a great article on skeleton screens if you’re interested in reading more about them.
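A minimal sketch of the idea: render placeholder rows while the data is still in flight, then swap in real content. The `skeleton-row` and `shimmer` class names are assumptions — they would map to CSS that draws the grey blocks and the animation.

```javascript
// Render grey placeholder rows until the data arrives, then swap in
// real content. `rowCount` matches the expected layout so the page
// doesn't jump when the data lands.
function renderList(data, rowCount = 5) {
  if (data === null) {
    // Placeholder rows carry classes that apply the shimmer animation.
    return Array.from(
      { length: rowCount },
      () => '<div class="skeleton-row shimmer"></div>'
    ).join("");
  }
  return data.map(item => `<div class="row">${item.label}</div>`).join("");
}
```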
Employing these two methods should hold the user’s attention until the data required returns from the server, which brings us to the next category.
Controlling the influx of data
We’ve covered perceived performance, now let’s talk about actual performance.
When building our network monitoring tool, we needed to think about the variety of devices that would be used to view it. Not all of them would be powerful enough to handle the vast amounts of data we expected to generate.
Two techniques were utilised to reduce this risk.
Data bucketing
In software engineering, data buckets have several definitions. In this case, it means batching data up into time intervals. Let’s use an example to demonstrate the concept.
Say you have a time series chart with the x-axis being the time in minutes, and the y-axis being an arbitrary measurement. Over the course of an hour, the server receives 2 million events.
Without bucketing, these would all be passed to the UI, causing performance problems. With bucketing, we could break the hour down into sixty 1-minute intervals.
For each 1-minute interval, we would take the average measurement of all the data points that occurred.
There are now only 60 data points we need to worry about
The trade-off here is that we’ve severely reduced the value of the data. In response to this, the front-end will allow the user to change the time range, triggering a bucket refresh with the new time parameters. By doing this, the ability to dive into the data on a granular level is retained.
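The bucketing step itself is a small reduction. A sketch, assuming each event has `timestamp` (ms) and `value` fields — the field names are an assumption about the event shape:

```javascript
// Collapse raw events into fixed time intervals, keeping the average
// measurement per interval.
function bucketAverages(events, intervalMs) {
  const buckets = new Map();
  for (const { timestamp, value } of events) {
    // Snap each event to the start of its interval.
    const key = Math.floor(timestamp / intervalMs) * intervalMs;
    const b = buckets.get(key) || { sum: 0, count: 0 };
    b.sum += value;
    b.count += 1;
    buckets.set(key, b);
  }
  // One point per interval, however many raw events arrived.
  return [...buckets.entries()]
    .sort(([a], [b]) => a - b)
    .map(([start, { sum, count }]) => ({ start, average: sum / count }));
}
```

With a 1-minute interval (`intervalMs = 60000`), 2 million events over an hour come out as at most 60 points. Re-running it with new time parameters is what the bucket refresh above amounts to.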
Lazy loading
“Lazy loading is the approach of waiting to load resources until they are needed, rather than loading them in advance. This can improve performance by reducing the amount of resources that need to be loaded and parsed on initial page load.” — Sheila Simmons
In the context of “controlling the influx of data”, we are talking about reducing the number of API requests by breaking them down into two categories — onscreen and offscreen.
If a component is offscreen, we delay its requests by not rendering it until its intended position is nearly onscreen. Libraries such as react-lazyload make this painless to implement.
Delaying unnecessary requests will help speed up the initial load, delivering valuable information to the user faster
This introduces a new issue: multiple data loads now fire as the user scrolls. That may not sound like much of a problem, but when dealing with big data, it can very quickly become one.
To remedy this, we can iterate on the technique. Rather than deferring the offscreen API calls until the components are onscreen, we delay them until the completion of in-view API requests instead.
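One way to sketch that iteration is a small scheduler: components register their fetches as onscreen or offscreen, and the offscreen queue only drains once every onscreen request has settled. This is an illustrative shape, not a library API; in practice you would pair it with visibility detection from something like react-lazyload or an IntersectionObserver.

```javascript
// Defer offscreen requests until the in-view ones finish.
function createRequestScheduler() {
  const onscreen = [];
  const offscreen = [];

  function schedule(fetchFn, { visible }) {
    if (visible) {
      // In-view requests fire immediately.
      const p = fetchFn();
      onscreen.push(p);
      return p;
    }
    // Offscreen requests wait in a queue; the returned promise resolves
    // once the queue is flushed and the fetch actually runs.
    return new Promise(resolve => offscreen.push(() => resolve(fetchFn())));
  }

  async function flushOffscreen() {
    // Wait for every onscreen request to settle, then drain the queue.
    await Promise.allSettled(onscreen);
    offscreen.splice(0).forEach(run => run());
  }

  return { schedule, flushOffscreen };
}
```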
Combining these methods should minimise any freezing when loading data. That is only half the job though. The focus now needs to shift to increasing performance when interacting with that data.
Maximising data processing power
Once our data is in the client, the user will begin mutating it in various ways.
This can demand a large amount of resources on the user’s device. To minimise this, we pre-aggregated our data and used Big O analysis to make sure any functions we performed on it were as efficient as possible.
Pre-aggregation
To avoid unnecessary computing in the front-end, we perform calculations in advance and store them in a database.
Let’s say the application consistently needs to display a total of two measurements. If this is only happening once — it’s not expensive, and we don’t have to worry about pre-aggregation. If this is happening millions of times — pre-emptively running these computations will take the pressure off of the client, freeing it up to focus on user-driven interactions.
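A sketch of the write-time side of this, with an in-memory Map standing in for the real database table: totals are maintained as events arrive, so the client reads one precomputed number instead of summing millions of rows.

```javascript
// Running totals keyed by metric name; a stand-in for a database table.
const totalsTable = new Map();

// Called on the write path as each event arrives.
function recordEvent(metric, value) {
  totalsTable.set(metric, (totalsTable.get(metric) || 0) + value);
}

// What the API serves to the front-end: an O(1) lookup, not a scan.
function getTotal(metric) {
  return totalsTable.get(metric) || 0;
}
```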
As an alternative to pre-aggregating using databases, it is also possible to do it at runtime.
Whilst you can do this manually, using a library like Crossfilter can simplify the process.
The caveat of doing it at runtime is that additional processing power is needed on app load. Once that has completed, functions such as filtering and reducing the data will be a lot faster.
The benefit of doing it at runtime is you have more control over what you can aggregate, as you are not tied to using persistent storage.
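A hand-rolled sketch of the kind of runtime aggregation a library like Crossfilter provides (this is not Crossfilter’s API — it only illustrates the principle): index the dataset by a dimension once at load, then filter and reduce against that index rather than rescanning the raw array.

```javascript
// Build a one-time index of rows keyed by a dimension function.
function buildIndex(rows, dimensionFn) {
  const index = new Map();
  for (const row of rows) {
    const key = dimensionFn(row);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(row);
  }
  return index;
}

// Reduce over one group without touching the rest of the dataset.
function reduceGroup(index, key, reducer, initial) {
  return (index.get(key) || []).reduce(reducer, initial);
}
```

For a network monitoring dataset, you might index events by protocol once, then sum bytes per protocol on every filter change — paying the indexing cost once at load instead of on each interaction.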
Big O notation
Code efficiency is often overlooked in favour of developing features at speed. Learning how to measure and then reduce code complexity is an excellent way of ingraining efficiency into your programming, and Big O notation is a great system to help with this.
If we ignore efficiency, user actions may crash the interface
When dealing with resource-heavy code such as iterations, it’s best to be cautious and consider the following:
- How much data is this code likely to be handling?
- Are there unnecessarily complicated iterations?
- Are there nested iterations?
- Can the iteration be broken early?
There’s a lot to consider when it comes to code complexity, but it’s well worth investing the time in learning how to reduce it.
You can read more about Big O notation here.
Improving external response times
We’ve covered how to optimise the internal responsiveness of your UI. Now, what about optimising externally too?
Caching client-ready data using tools such as Redis is common these days. There is, however, a limit to how much data we can store in-memory, as it is not the cheapest of storage options.
Considering the amount of data, we had to introduce smarter ways of caching to really get the most out of it.
Adjacent caching
Instead of just caching static data, we also cache data that is similar to what is currently in-view.
For example, we talked earlier about bucketing data into 1-minute intervals. We can predict, with a high rate of success, that the user will want to see the refined data within those intervals. By caching this data beforehand, the response time of the API used to retrieve it is reduced.
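A sketch of the idea: when the client asks for one time window, warm the cache for the neighbouring windows too, on the assumption the user will pan or zoom into them next. `fetchWindow` is a hypothetical data loader, and the Map stands in for a store like Redis.

```javascript
// Cache time windows of data, prefetching the windows either side of
// whatever the user actually requested.
function createAdjacentCache(fetchWindow, windowMs) {
  const cache = new Map();

  function load(start) {
    // Cache the promise itself so concurrent requests share one fetch.
    if (!cache.has(start)) {
      cache.set(start, fetchWindow(start, start + windowMs));
    }
    return cache.get(start);
  }

  function get(start) {
    // Warm the adjacent windows without waiting for them.
    load(start - windowMs);
    load(start + windowMs);
    return load(start);
  }

  return { get, cache };
}
```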
Adaptive caching
A user will usually have a consistent set of behaviours. Often they will visit the same pages or select the same filters. Over time, we can start to predict these patterns and pre-emptively cache the relevant data upon login. This helps strike a balance between the cost of caching and maintaining a high level of cache hits.
Adaptive caching can be seen as a smarter, albeit more complex, form of adjacent caching: it is more accurate because it is tailored to the user.
These two forms of caching aren’t mutually exclusive, but take care to ensure data duplication does not occur when using them together.
How do we avoid regressing on our performance enhancements?
Once you’ve built your app, it’s essential not to regress on the hard work with new features that don’t comply with the previous performance-driven standards.
Developers should live and breathe performance
Too often it’s seen as an afterthought, which may be acceptable for apps with minimal data throughput. However, if we’re building a system that involves big data, it has to be at the forefront of everyone’s mind.
Documentation & onboarding
Assuming your app performs well in the market, scaling your team will become a priority. It’s essential during this time to be disciplined and ensure you grow in the right way.
Remember — Twice as many developers does not immediately equal twice the work
New engineers need time to onboard and familiarise themselves with the system. Pushing them into the deep end without a structured onboarding process may speed up feature delivery in the short term, but it will also hurt the quality and performance of your product.
Automation
Automation can catch anything a developer may have missed. Ideally, you want the developers to be building features with performance in mind. However, as the last line of defence, automation can be a lifesaver.
It can cover anything from page load time, using tools such as Pingdom, to checking for memory leaks with a custom build pipeline job.
These techniques represent a high-level response to the challenges we needed to overcome to take an interface dealing with large amounts of data from ideation through to delivery.
There are many other great tactics out there. I strongly suggest thoroughly researching and exploring multiple avenues before settling on a roadmap, especially with the front-end ecosystem continually evolving at such a high speed.
Every year, innovations surface that improve the way we handle data in the front-end
Keeping up with them is a fantastic way to ensure the app you’re building is consistently performant, allowing the focus to shift to delivering great features.