Butter Performance

Aleem Mawani
Streak Engineering Blog
Jul 9, 2012

Performance is a top priority at Streak and is something we’re constantly working on improving.

This post goes into the different techniques we used to optimize a data-heavy view in our app. Specifically, we wanted to make a 5000-row, 10-column HTML/JS table feel as “buttery smooth” as a 20-row, 10-column one.

At Streak, our users rely heavily on our “pipeline spreadsheet” to see all of their data. It isn’t uncommon to have thousands of rows in the table. We also didn’t want users to have to page through rows — we think a smooth and fast scrolling experience is the ideal UX. And we need to support common table operations such as sorting, filtering, and grouping.

Detailed problem:

  • Rendering the large table should happen in under three seconds
  • 60 frames per second should be maintained while scrolling, panning and editing cell contents

Approach:

At Google I/O 2012, Nat Duca and Tom Wiltzius gave a great session on jank and the process of discovering and eliminating it. I highly recommend watching the session, as it provides context for the 60 fps goal and some ways of achieving it. What follows are the various wins we uncovered while breaking down our performance bottlenecks.

Win #1 — Use smart multisort

We wanted our table to allow sorting on multiple columns. Given the following inputs, this can be done in one of two ways:

List — a collection of JavaScript objects with properties
Properties — an array of properties you want to sort by

Slow multisort
Fast multisort

The difference between the two is the placement of the for-loop. In the slow version the for-loop sits outside the sort call, which means you sort the entire list once for each property. With the fast multisort you sort only once, and each comparison checks all the properties in priority order. This results in a dramatic speed improvement.
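As a rough sketch of the two approaches (the function names and comparison logic here are illustrative, not the exact code from the gists above):

    // Slow: sort the whole list once per property, from lowest to highest
    // priority, relying on a stable sort so lower-priority orderings survive
    // as tie-breakers on the later passes.
    function slowMultisort(list, properties) {
      for (var p = properties.length - 1; p >= 0; p--) {
        var prop = properties[p];
        list.sort(function (a, b) {
          if (a[prop] < b[prop]) return -1;
          if (a[prop] > b[prop]) return 1;
          return 0;
        });
      }
      return list;
    }

    // Fast: sort once; each comparison walks the properties in priority order.
    function fastMultisort(list, properties) {
      list.sort(function (a, b) {
        for (var p = 0; p < properties.length; p++) {
          var prop = properties[p];
          if (a[prop] < b[prop]) return -1;
          if (a[prop] > b[prop]) return 1;
        }
        return 0;
      });
      return list;
    }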

Win #2 — Sub-optimal algorithms can be much faster than optimal ones

Levenshtein distance is the smallest number of edits required to transform one list into another using a defined set of operations. We use this edit distance to update only the specific parts of the spreadsheet affected by changes to the underlying data.

I coded up a modified version of the dynamic programming algorithm found in the Wikipedia article (a modified Levenshtein). What you’ll notice about this code is that it’s really bad: it’s inefficient and takes up an unnecessary amount of space, even though it’s optimal (it finds the minimum number of edits). There are simple improvements that can be made to this basic algorithm to reduce the memory footprint, and execution time can be improved using the Four Russians method.
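For reference, a bare-bones version of that dynamic programming approach looks roughly like this (the rowKey accessor is an assumption for illustration; rows are compared by key):

    // O(n*m) time and space: fills the entire distance matrix, which is exactly
    // the inefficient-but-optimal behaviour described above.
    function editDistance(oldRows, newRows, rowKey) {
      var n = oldRows.length, m = newRows.length;
      var d = [];
      for (var i = 0; i <= n; i++) d[i] = [i];   // i deletions to reach an empty list
      for (var j = 0; j <= m; j++) d[0][j] = j;  // j insertions from an empty list
      for (i = 1; i <= n; i++) {
        for (j = 1; j <= m; j++) {
          var same = rowKey(oldRows[i - 1]) === rowKey(newRows[j - 1]);
          d[i][j] = Math.min(
            d[i - 1][j] + 1,                     // delete a row
            d[i][j - 1] + 1,                     // insert a row
            d[i - 1][j - 1] + (same ? 0 : 1)     // keep or replace a row
          );
        }
      }
      return d[n][m];
    }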

However, the dynamic programming method gets us the optimal edit distance. In the practical world, sub-optimal is often enough, and a lot faster. Furthermore, our data tends to undergo the same kinds of changes, so we can bias our edit distance algorithm to handle those changes better. Our fast Levenshtein variant runs in O(n) and produces a pretty good set of edit operations for our needs.
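To give a flavour of what an O(n), good-enough diff can look like, here is a minimal sketch: a single pass that matches rows by key at each index and treats any leftover tail as inserts or deletes. This is a simplified illustration, not the exact algorithm we run.

    // One linear pass: rows whose key is unchanged at the same index are left
    // alone, mismatches become "replace" ops, and the leftover tail becomes
    // inserts or deletes. Not minimal, but cheap and predictable.
    function approximateDiff(oldRows, newRows, rowKey) {
      var ops = [];
      var shared = Math.min(oldRows.length, newRows.length);
      for (var i = 0; i < shared; i++) {
        if (rowKey(oldRows[i]) !== rowKey(newRows[i])) {
          ops.push({ type: 'replace', index: i, row: newRows[i] });
        }
      }
      for (var j = shared; j < newRows.length; j++) {
        ops.push({ type: 'insert', index: j, row: newRows[j] });
      }
      for (var k = oldRows.length - 1; k >= shared; k--) {
        ops.push({ type: 'delete', index: k });
      }
      return ops;
    }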

Win #3 — Document fragments are fast, innerHTML is faster

When adding a large number of nodes to the DOM, the standard practice is to create a document fragment, add all the nodes to the fragment, and then add them to the DOM all at once. Unfortunately, this is still too slow. The faster method is a little old school: create all the nodes as one large string, then set the innerHTML of a container node.

JavaScript (especially on Chrome) is actually really fast, so constructing the string, even with a large amount of data, takes a minuscule amount of time. Setting the innerHTML of an element, even when that string is large, is still substantially faster than creating thousands of nodes and adding them to a fragment individually.
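A sketch of the two approaches (the function names are illustrative, and real code would need to HTML-escape the cell contents before concatenating them):

    // Fragment approach: thousands of createElement/appendChild calls.
    function renderWithFragment(tbody, rows) {
      var fragment = document.createDocumentFragment();
      rows.forEach(function (row) {
        var tr = document.createElement('tr');
        row.forEach(function (cell) {
          var td = document.createElement('td');
          td.textContent = cell;
          tr.appendChild(td);
        });
        fragment.appendChild(tr);
      });
      tbody.appendChild(fragment);
    }

    // innerHTML approach: build one big string and hand it to the parser once.
    function renderWithInnerHTML(tbody, rows) {
      var html = [];
      rows.forEach(function (row) {
        html.push('<tr>');
        row.forEach(function (cell) {
          html.push('<td>', String(cell), '</td>'); // escape in real code
        });
        html.push('</tr>');
      });
      tbody.innerHTML = html.join('');
    }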

Win #4 — Event binding is slow

If you’re not familiar with jQuery’s on/off event API, take the time to really grok it, because the savings are huge. Previously we attached event bindings to each cell; attaching an event takes a non-zero amount of time, and multiplying that by tens of thousands of cells adds up to a lot. With the on/off API and event delegation you attach one set of events to the parent node just once. With event delegation you no longer care about the number of cells, so binding the events takes a roughly constant millisecond.
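In jQuery terms the change looks roughly like this (the selector and handler name are illustrative):

    // Before: one handler bound per cell, i.e. tens of thousands of bind calls.
    $('#pipeline td').on('click', function () {
      startEditingCell(this); // startEditingCell is a placeholder for our own handler
    });

    // After: a single delegated handler on the table. Clicks on any current or
    // future td bubble up to it, so binding cost no longer grows with row count.
    $('#pipeline').on('click', 'td', function () {
      startEditingCell(this);
    });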

Win #5 — Alternatives to overflow:hidden

A known bug in Chrome is that rendering elements with overflow:hidden takes a performance hit. Again, this isn’t a big deal for a small number of nodes, but multiplied across tens of thousands of cells it adds up to a noticeable slowdown.

Refer to the Google I/O talk above for how to track down these poor performers.

After a lot of investigation we discovered a DOM structure that would give the same visual look as a table with overflow:hidden cells, but without the associated performance penalty.

Traditional table structure
New table structure
Style for new table structure

This works because of the browser’s default stacking order: siblings that come later in the document are painted on top of earlier ones, so each cell’s background covers whatever spills out of the cell before it.
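A minimal reconstruction of the idea, with illustrative class names and widths rather than the exact markup and styles from the gists above: cells are fixed-width siblings with opaque backgrounds and no overflow:hidden, so text that runs long is simply painted over by the next cell.

    <!-- Each cell is a fixed-width sibling with an opaque background and no
         overflow:hidden; a long value spills out of its box but is covered by
         the background of the next cell, which paints later (on top). -->
    <style>
      .cell {
        display: inline-block;
        width: 150px;            /* every cell needs a defined width */
        white-space: nowrap;     /* keep contents on one line */
        background: #fff;        /* must be opaque to hide the neighbour's overflow */
      }
    </style>

    <div class="row">
      <div class="cell">short value</div>
      <div class="cell">a long value that would normally need overflow:hidden</div>
      <div class="cell">this cell hides the overflow</div>
    </div>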

Downsides to this technique:

  • Each cell has to have a defined width
  • Background color has to be defined, can’t be transparent
  • Cells with long contents can “peek out” of the table if you’re not careful (you can wrap all the cells in a container div that is overflow:hidden)
  • Inspecting element in dev tools is really slow because of all the sibling nodes

Win #6 — Reduce number of nodes

Previously, each cell of the spreadsheet was responsible for rendering its own read and edit contents. This meant that each cell had its own set of edit controls, and while those controls wouldn’t be added to the DOM until they were necessary, there were still some containers present in memory, and the edit controls were still found in the DOM.

Similar to having one parent event binding, we moved to having one set of edit inputs that would move around depending on the cell being edited. If you look at the differences in HTML structure…

Cell with scaffolding
Cell without scaffolding

you’ll see that it’s much simpler and we’ve gone from three nodes to one.

That’s huge.
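A minimal sketch of the shared-editor idea (the function name and event wiring are illustrative, not our exact code):

    // One editor input for the whole spreadsheet, moved into whichever cell is
    // currently being edited, instead of every cell carrying its own scaffolding.
    var editorInput = document.createElement('input');

    function editCell(cellElement, onCommit) {
      editorInput.value = cellElement.textContent;
      cellElement.textContent = '';
      cellElement.appendChild(editorInput);
      editorInput.focus();
      editorInput.onblur = function () {
        var newValue = editorInput.value;
        cellElement.removeChild(editorInput);
        cellElement.textContent = newValue;
        onCommit(newValue);
      };
    }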

Win #7 — Reduce number of visible nodes

The spreadsheet with thousands of rows will be thousands of pixels high. Even on the latest full-resolution retina display you’ll only be able to see a subset of rows at a time. Unfortunately, the browser doesn’t know which rows are in the visible viewport and has to push every cell through at least part of the rendering pipeline. Our code has a better idea of which rows are visible, so we can help the browser by telling it to render only that subset. The rest of the rows are hidden by setting their display to none.

Display is set to none instead of visibility:hidden as Chrome is able to handle display:none a lot faster. This makes sense because with visibility:hidden Chrome still has to render the cell to figure out the height and width, whereas a node that is display:none can just be removed from the render path.

This does present us with a new problem though in that the scroll bars won’t work properly. You can see how Google Spreadsheets handles this issue as they actually render the scrollbars themselves — notice how you can’t scroll a Google Spreadsheet by half of a row. We went with a different approach involving spacers which preserves the browser’s default scrollbars and results in a smoother scrolling experience…

How spacers maintain browser’s scrollbar

and the associated pseudo-code.

Pseudo-code for handling visibility
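A minimal sketch of that logic (the row height, buffer size and settle delay are illustrative assumptions):

    // Spacer divs above and below the rendered rows stand in for the hidden
    // rows, so the native scrollbar keeps the same range as a fully rendered
    // table. Rows near the viewport are shown; everything else is set to
    // display:none once scrolling settles.
    var ROW_HEIGHT = 30;   // assumed fixed row height, in px
    var BUFFER = 20;       // extra rows kept visible above/below the viewport
    var settleTimer = null;

    function shouldRowBeVisible(index, viewport) {
      var first = Math.floor(viewport.scrollTop / ROW_HEIGHT) - BUFFER;
      var last = Math.ceil((viewport.scrollTop + viewport.clientHeight) / ROW_HEIGHT) + BUFFER;
      return index >= first && index <= last;
    }

    // hideOffscreen = false: only reveal newly visible rows (cheap, per scroll event).
    // hideOffscreen = true:  also hide everything else (run once scrolling settles).
    function updateVisibility(viewport, rows, topSpacer, bottomSpacer, hideOffscreen) {
      var hiddenAbove = 0, hiddenBelow = 0;
      rows.forEach(function (row, i) {
        if (shouldRowBeVisible(i, viewport)) {
          row.style.display = '';
        } else if (hideOffscreen) {
          row.style.display = 'none';
        }
        if (row.style.display === 'none') {
          if (i * ROW_HEIGHT < viewport.scrollTop) hiddenAbove++; else hiddenBelow++;
        }
      });
      topSpacer.style.height = (hiddenAbove * ROW_HEIGHT) + 'px';
      bottomSpacer.style.height = (hiddenBelow * ROW_HEIGHT) + 'px';
    }

    function onScroll(viewport, rows, topSpacer, bottomSpacer) {
      updateVisibility(viewport, rows, topSpacer, bottomSpacer, false);
      clearTimeout(settleTimer);
      settleTimer = setTimeout(function () {
        updateVisibility(viewport, rows, topSpacer, bottomSpacer, true);
      }, 200); // scrolling is considered "done" after 200 ms without scroll events
    }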

You’ll notice that while you’re scrolling we don’t hide rows that are no longer visible. The reason is that even with lots of rows visible, the act of scrolling itself isn’t slow; what leads to a jerky scrolling experience is toggling the visibility of thousands of cells while you’re scrolling fast. So we set a timer that figures out when scrolling is “done” and then reset the visibility of all the rows.

The tricky part in the above code is optimizing the “should row be visible” function, which determines how performant scrolling is. The more rows you deem visible, the more visibility toggles you do and the slower the rendering. You also want to load a buffer of extra rows whenever you load new ones so that you don’t have to do it as often — clearly there are many tradeoffs here.

Conclusion

Did we achieve our goals? Almost.

Initial load time went down to under 2 seconds.

60 fps is maintained most of the time; occasionally, though, there are some hiccups while scrolling.

Overall we’re extremely happy with what we’ve accomplished so far. When we first started this project we didn’t even know whether rendering a table with that many cells would be possible. We ended up figuring out that not only is it possible, we can make it fast too.

And we can make it faster. Improvements for the future:

  • Web workers — sort/filter the spreadsheet and construct the HTML string in a web worker while changes to the data are occurring. This would get initial load to under 1 second.
  • More server processing
  • Hybrid smart scroll — show/hide rows and also remove and re-add them as necessary; this may eliminate some of the scrolling hiccups, as there would be far fewer nodes to RecalculateStyle on
  • Horizontal scrolling — do a similar show/hide of columns for horizontal scrolling, again to reduce the number of nodes to RecalculateStyle on

Thanks to the web development community for all their great work and contributions. It’s only by sharing our knowledge and discoveries that we can push the web forward. Hopefully you’ll have gained a trick or two from this post, and if you have any further suggestions please let us know.
