How do you like my clickbait title? While you are here, I want to share some tricks to handle larger datasets without locking the UI or slowing down your chart interaction.
Trick 1: Progressive rendering
A common problem is the UI locking up while a piece of graphics renders. One solution is to chunk the rendering and draw each piece in a loop, yielding the thread back to the UI between chunks. For this trick, I often use a little tool called renderSlicer.
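The core idea can be sketched in a few lines. This is not renderSlicer's actual API, just a minimal illustration of the chunking pattern, with a hypothetical `drawChunk` callback standing in for your canvas drawing code:

```javascript
// Split the data into fixed-size chunks.
function chunk(data, size) {
  const chunks = [];
  for (let i = 0; i < data.length; i += size) {
    chunks.push(data.slice(i, i + size));
  }
  return chunks;
}

// Render one chunk per animation frame, yielding to the UI in between.
function renderProgressively(data, drawChunk, size) {
  const chunks = chunk(data, size);
  let i = 0;
  function frame() {
    if (i >= chunks.length) return; // all chunks rendered
    drawChunk(chunks[i++]);         // draw just this slice of the data
    requestAnimationFrame(frame);   // hand the thread back before the next slice
  }
  requestAnimationFrame(frame);
}
```

The UI stays responsive because each frame only pays for one chunk of drawing, not the whole dataset.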
In this example, I deliberately misused the technique to illustrate the point: instead of rendering only the section visible in the window, I re-render the whole line chart on every brush event. I also set the render-slicer parameters to be very slow, so we can see the effect of progressive rendering.
It would be fun to try other patterns for progressive rendering, like iteratively refining the resolution of a grid, or the renderQueue example from the original library.
Render-slicer uses requestAnimationFrame, a timer that fires before each repaint, as fast as the browser can refresh. Another optimization I used on this chart is throttling the brush events, so it doesn't try to re-render more than once every 300 ms.
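A throttle like the one I describe can be written in a handful of lines. This is a generic sketch, not the exact code from the example:

```javascript
// Wrap fn so it runs at most once every `wait` milliseconds;
// calls arriving inside the window are simply dropped.
function throttle(fn, wait) {
  let last = 0;
  return function (...args) {
    const now = Date.now();
    if (now - last >= wait) {
      last = now;
      fn.apply(this, args);
    }
  };
}

// usage sketch: brush.on('brush', throttle(redraw, 300));
```

Dropping intermediate brush events is fine here because each redraw only depends on the latest brush extent, not on the ones in between.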
Trick 2: Buffer canvas
One challenge I often meet is handling streaming charts: new data arrives periodically, and the whole chart shifts to make room for the new data points.
In this basic streamgraph example, I'm just redrawing everything on every tick. That's fine for a toy example. But what if you need to stream more data points at a higher frequency, or your graphic computation is expensive, or you have tons of these charts on the same panel?
My first reaction to this situation would be to find a better abstraction, like downsampling or focus+context, better suited to the bandwidth of human vision. But I have seen cases where data overload was a necessary evil (which I could share in a future blog post). For now, let's assume the goal is to scale a streaming chart up to more data points than the browser can handle without a noticeable UX penalty.
While designing the Firespray streaming charts library, I made an example inspired by side-scrolling games and infinite-scrolling tables like clusterize.js or ∞.js. The idea is to always keep a buffer canvas preloaded, ready to fill the gap when the main canvas shifts left. As soon as the main canvas has translated out of view, it jumps back to the right of the second canvas, which now acts as the main one. And the cycle continues.
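The bookkeeping behind that cycle is simple enough to sketch. The names below are mine, not Firespray's; in the browser you would apply the offset with scrollLeft or a CSS transform, and redraw the demoted canvas with upcoming data:

```javascript
// Track which of the two canvases is "main", shift both left each tick,
// and swap roles once the main canvas has scrolled fully out of view.
function createCanvasCycle(width) {
  let offset = 0; // how far the pair has scrolled left, in px
  let main = 0;   // index (0 or 1) of the canvas currently on screen
  return {
    tick(shift) {
      offset += shift;
      if (offset >= width) {   // main canvas fully out of view:
        offset -= width;       // jump the pair back by one canvas width...
        main = 1 - main;       // ...and promote the buffer canvas
        return { swapped: true, redraw: 1 - main }; // old main becomes the new buffer
      }
      return { swapped: false };
    },
    state() { return { offset, main }; }
  };
}
```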
There are multiple advantages to this technique. First, instead of redrawing everything on each tick, you draw only once per cycle and shift the canvases using a simple scrollLeft, or even a CSS animation (which is hardware accelerated). I also like that you can lazy-load the data, draw it on the canvas, and then leave it to the GC to free the memory.
Here is an example with a canvas buffer on both sides, ready for scrolling in both directions. I’m sure you will excuse my prototype-quality code.
One trade-off is that you need to be able to afford this buffer zone, but it can be as short as a single sample. I realized it is often more important for the user to see a chart ticking exactly on the clock, even if it's one sample behind. Real-time visualization can mean many things, but for me it's all about trust: the system doesn't have to be fast, it has to be dependable.
Trick 3: Quadtree and bisection
Using the previous idea of sliding canvases, here is another example showing one solution for fast interactivity. Even with 100 line charts, each showing 300 data points, hovering is totally smooth. The trick is to look up the hovered point in data space, using bisection instead of hit-testing pixels.
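For time series sorted by x, a binary search finds the nearest sample in O(log n); for scattered 2D points, a quadtree (e.g. d3.quadtree's `find`) plays the same role. A minimal sketch of the 1D case, with my own function names:

```javascript
// Classic bisection: return the insertion point of x in the sorted array xs.
function bisect(xs, x) {
  let lo = 0, hi = xs.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (xs[mid] < x) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// Index of the data point whose x value is closest to x.
function nearest(xs, x) {
  const i = bisect(xs, x);
  if (i === 0) return 0;               // before the first point
  if (i === xs.length) return xs.length - 1; // past the last point
  return (x - xs[i - 1] <= xs[i] - x) ? i - 1 : i; // pick the closer neighbor
}

// on mousemove: const i = nearest(xs, xScale.invert(mouseX)); highlight(i);
```

With 100 charts of 300 points each, that's a hundred ~8-step searches per mousemove instead of scanning 30,000 points, which is why the hover stays smooth.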
Trick 4: Breaking free from the thread
But don't take this as an example of good datavis. Mapbox GL offers a far better demonstration of the power of WebGL, in this case for rendering interactive maps. Another very impressive framework is deck.gl, by the Uber datavis team.
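Short of going full WebGL, one simple way to break free from the main thread is to keep the expensive computation in a pure function so it can run inside a Web Worker, leaving the main thread to do nothing but draw. The function and file names below are hypothetical, just to illustrate the pattern:

```javascript
// Min/max downsampling: keep the extremes of each bucket so spikes
// survive the reduction. Pure function, so it can run in a worker.
function downsampleMinMax(points, bucketSize) {
  const out = [];
  for (let i = 0; i < points.length; i += bucketSize) {
    const bucket = points.slice(i, i + bucketSize);
    out.push(Math.min(...bucket), Math.max(...bucket));
  }
  return out;
}

// In the browser (sketch, hypothetical worker file):
// const worker = new Worker('downsample-worker.js');
// worker.postMessage({ points, bucketSize: 100 });
// worker.onmessage = e => draw(e.data); // the main thread only draws
```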
Trick 5: Stream loading
Another process that can block the UI is loading data. My favorite solution is streaming it with Papaparse. Papaparse is not just a CSV parser: it can also stream data by chunking it and transferring each part sequentially, and it can even use Web Workers with a simple config switch. I should really bring one of my examples online, but the fun part is that you can render each data chunk directly to a canvas as soon as it arrives, and then discard it from memory. You get the same idea as with renderSlicer, but across the whole data pipeline.
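The pattern looks roughly like this. The `chunk`, `worker`, and `download` options are Papa Parse config; `makeChunkHandler` and `drawRows` are my own hypothetical names for the draw-and-discard step:

```javascript
// Paint each parsed chunk immediately, keep only a cheap summary,
// and let the raw rows be garbage-collected.
function makeChunkHandler(drawRows) {
  let total = 0;
  return function handleChunk(rows) {
    drawRows(rows);       // e.g. plot these points on the canvas right now
    total += rows.length; // retain just a counter, not the rows themselves
    return total;
  };
}

// In the browser (sketch):
// const handle = makeChunkHandler(rows => paintOnCanvas(rows));
// Papa.parse(url, {
//   download: true,                     // fetch the remote CSV
//   worker: true,                       // parse off the main thread
//   chunk: results => handle(results.data)
// });
```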
Trick 6: Fast data layer
I often see the data layer as the bottleneck. There are of course tons of powerful backend technologies to get things done efficiently, but sometimes we forget details that can have a big impact, like data compression. I liked how Tamper takes care of finding the best compression scheme for your categorical data, especially when coupled with PourOver, also by the NYTimes, a kind of Crossfilter for categorical data.
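To give a feel for why categorical data compresses so well, here is a sketch of plain dictionary encoding, the simplest member of the family of schemes a tool like Tamper picks from (this is my own illustration, not Tamper's code):

```javascript
// Dictionary-encode a categorical column: store each distinct value once
// and replace the column with small integer codes.
function encodeCategorical(values) {
  const dict = [];
  const index = new Map();
  const codes = values.map(v => {
    if (!index.has(v)) { index.set(v, dict.length); dict.push(v); }
    return index.get(v);
  });
  // One byte per row instead of a full string (assumes < 256 categories).
  return { dict, codes: Uint8Array.from(codes) };
}
```

A million-row column with a dozen distinct labels shrinks to a megabyte of codes plus a tiny dictionary, and equality filters become cheap integer comparisons.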
Crossfilter and PourOver make filtering, aggregating, and other client-side data operations very efficient. MapD-charting goes one step further by connecting to MapD's very fast database, which leverages the power of GPUs.
A lot can be done to optimize client-side rendering, and there's a lot more we could discuss (caching, server-side rendering, aggregation, LOD, blitting, shaders, WebCL, etc.). But I just wanted to share some solutions I've had to use in my datavis work so far. Thanks for your attention!