Roku’s SceneGraph Benchmarks: AA vs Node

Paweł Hertman
DAZN Engineering
Sep 24, 2018

Roku applications might look very simple from the user's perspective, but they often hide a lot of complexity behind the scenes, and data management makes up a big part of it. With data constantly being fetched, parsed and shared across the whole app, giving users a smooth and responsive experience can be quite a challenge given how limited Roku boxes can be in terms of performance. Roku offers two ways to handle this data: associative arrays (AAs) and nodes. But which one should we use?

The official Roku Docs says:

The first category represents small, shallow data structures where each structure instance is usually treated as a single cohesive item. These are reasonable and efficient to model as AA fields. The second represents large, deep data webs where copying would be prohibitive. It is reasonable to model these as node trees.

But what exactly do “small”, “shallow”, “large” and “deep data” mean? Let’s check this out!

Nomenclature and assumptions

For the purposes of the article, let’s use the following nomenclature for data structures:

  • 10x1 means an object containing 10 intrinsic-type fields (strings, in this case)
  • 10x10x1 means a two-level object: each of its 10 fields contains another object with 10 intrinsic-type fields
  • And so on…
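For illustration, here is roughly what those shapes look like as AA literals (the key and value names are placeholders, not the ones used in the benchmarks):

' 10x1: a one-level object with 10 intrinsic (string) fields
obj10x1 = {
    key0001: "value01"
    key0002: "value02"
    key0003: "value03" ' ...and so on, up to key0010
}

' 10x10x1: each of the 10 top-level fields holds a 10x1 object
obj10x10x1 = {
    group01: obj10x1
    group02: obj10x1 ' ...and so on, up to group10 (each would hold its own 10x1 object)
}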

Object creation time

Before benchmarking any other, more complex case, let's check how much creating an object "costs". It's obviously fast for both AAs and nodes, so to make the results comparable, the benchmark creates 1000 objects in a loop. The self-evident variable impacting creation time is the number of fields. The chart below shows the average of 10 benchmark runs.
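As a reference, here is a minimal sketch of such a timing loop (an assumed harness, not the author's exact code; it must run on a SceneGraph thread, and roTimespan provides the timing):

' Time 1000 AA creations
timer = createObject("roTimespan")
timer.mark()
for i = 1 to 1000
    aa = { field01: "value01", field02: "value02" }
end for
print "1000 AAs: "; timer.totalMilliseconds(); " ms"

' Time 1000 node creations with the same fields
timer.mark()
for i = 1 to 1000
    node = createObject("roSGNode", "Node")
    node.addFields({ field01: "value01", field02: "value02" })
end for
print "1000 nodes: "; timer.totalMilliseconds(); " ms"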

Creating an empty node takes 10 times longer than an empty AA. Every additional field adds about 2 ms for AAs. For nodes this number varies, from 25 ms at the beginning down to 10 ms at 12 fields. The chart doesn't contain node results for more than 12 fields, because the benchmark ended with an execution timeout.

The more fields, the smaller the ratio between AAs and nodes, but the difference remains big.

Node operations time: extending nodes

Because a node is a SceneGraph component, it can have its own interface fields, which gives it an OOP feel and makes extending nodes tempting. The table below shows the difference between creating empty nodes and empty extended nodes:

Average of 10 tests on Roku Ultra

The conclusion is that extending a native node is almost 3 times more "expensive", whereas extending an already non-native node adds no noticeable extra cost. Even the official Roku Docs says:

Node creation, especially of inherited components, can be expensive. It is suggested that “AddFields” be used instead of “Extends” for such components.
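A sketch of that suggestion: instead of declaring a custom component in XML (e.g. a hypothetical <component name="MovieNode" extends="Node"> with <field> tags), create a plain node and attach the fields at runtime:

movie = createObject("roSGNode", "Node")
movie.addFields({
    title: "Some title"
    score: 8.5
})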

Node operations time: setting fields values

Average of 10 tests on Roku Ultra; 7-character field keys

The fastest way to populate a plain node with data is the addFields() method. However, if a node has interface fields declared, the best way is to assign values directly to those fields. Note that this matters only for a large number of objects; for a few, the difference will be almost unnoticeable.
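Both strategies, sketched with illustrative field names (ContentNode stands in here for "a node with a declared interface"):

' Plain node, no declared interface: addFields() creates and sets
' all the fields in a single call
node = createObject("roSGNode", "Node")
node.addFields({ field01: "value01", field02: "value02" })

' Node whose interface already declares the fields: assign directly
item = createObject("roSGNode", "ContentNode")
item.title = "value01"
item.description = "value02"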

Data exchanging between render and task threads

As shown above under "Object creation time", even though the difference between creating an AA and a node is substantial, the execution time is still unnoticeable to a human. The game-changer is a rendezvous. Unlike in real life, the more rendezvous, the worse.

When a Task thread operates on a Render-thread-owned node, it triggers a rendezvous. […] Nodes are passed by reference whereas Associative Arrays are passed by value.
— Roku Docs

What's more, this applies not only to nodes passed between the Task and Render threads, but to fields of every type.

Let's check the impact of a rendezvous. Because it can be really "expensive", we don't need to run it in a loop to notice differences. While checking execution times for different object sizes, I noticed that they depend on key length (that's why I emphasized that the object creation benchmarks used 7-character keys and 7-character values).

The charts below show execution times, measured from creating a task until the task's output field change is observed, for different objects and key lengths, in two cases:

  • passing an object to a task, so the task's function code is:

sub benchmark()
    input = m.top.input
    m.top.output = true
end sub

  • passing an object to the task and setting it as the output:

sub benchmark()
    input = m.top.input
    m.top.output = input
end sub
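For context, here is a minimal sketch of the render-thread side of such a benchmark (the BenchmarkTask component name is hypothetical; the input/output field names match the snippets above):

sub runBenchmark(data as object)
    m.timer = createObject("roTimespan")
    m.task = createObject("roSGNode", "BenchmarkTask")
    m.task.observeField("output", "onBenchmarkDone")
    m.timer.mark()
    m.task.input = data
    m.task.control = "RUN"
end sub

sub onBenchmarkDone()
    print "task round trip: "; m.timer.totalMilliseconds(); " ms"
end sub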

During benchmarking I wondered how these times could be decreased, so I decided to try two more types of objects:

  • In the "Stringified AA" case the AA is stringified, so a JSON string is passed to the task and parsed back into an AA inside it
  • "Array" means an array of `{ key: "realKey", value: "realValue" }` objects

Notice that, for every case except the array, execution times increase dramatically around a key length of 33 characters (it may be worth checking this number on models other than the Roku Ultra). BrightScript seems to be well optimized up to this point; within that range both AA and node are a good choice, although the node performs a bit better. On the second chart, where the input is set as the task's output, the difference is bigger because of the additional copy ("Nodes are passed by reference whereas Associative Arrays are passed by value.")

33 characters seems like a lot, but sometimes you may need longer keys. Because the execution time doesn't depend on the value's length (only on the key's), in such cases try generating a special array of key+value objects:

data = [
    { key: "veryLongDescriptiveKeyOrKeyFromAnExternalAPIService", value: "value may be simple" },
]

If you plan to pass data only one way (so using a node instead won't bring a performance boost) and for some reason (e.g. certain operations being more convenient on an AA) you want to avoid generating such an array, consider stringifying the data before passing it to the task.
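A sketch of the stringified variant, using the built-in formatJson()/parseJson() (this assumes the task's input field is declared as a string):

' Render thread: serialize before crossing the thread boundary
m.task.input = formatJson(data)

' Task thread: parse back into an AA
sub benchmark()
    input = parseJson(m.top.input)
    m.top.output = true
end sub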

What about times for different object sizes? The table below shows the results of passing an object to a task and back:

7-character keys; tested on Roku Ultra

The stringified AA was slower than the plain AA in every case, so it is omitted from the table. "Nested nodes" means that when a field contains an object, that object is another node, whereas for "Node" it is an AA.

The conclusion is clear: for small, not very composite objects there is no noteworthy difference, but for bigger, complex data structures you should use a node (and preferably avoid using nodes as fields). Notice how the time for the same total number of fields differs between structures. The fewer items at the first level, the faster the execution. It may be worth splitting a large "flat" (one-level) object into a few smaller ones if the data can easily be categorized: 10x100x1 is 25% faster than 1000x1!

Notice the big difference between a slightly more complex data structure of 10000 fields composed of smaller objects and a flat object of the same size.
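A sketch of such a split, building a flat 1000x1 object next to a grouped 10x100x1 one (key names are illustrative and not length-normalized):

' Flat 1000x1: all fields on a single level
flat = {}
for i = 1 to 1000
    flat["key" + i.toStr()] = "value01"
end for

' Grouped 10x100x1: ten categories of 100 fields each
grouped = {}
for g = 1 to 10
    category = {}
    for i = 1 to 100
        category["key" + i.toStr()] = "value01"
    end for
    grouped["group" + g.toStr()] = category
end for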

You could say, "whoa, whoa, who the heck uses such big objects?!" I can imagine a few use cases, like fetched translations or complex video list data.

Data exchanging between components

We might expect a similar trend for different key lengths when passing an object between SceneGraph components. And indeed it's almost the same, with one remark: generating an array for long keys doesn't seem worthwhile here, because the time for a node is similar (this may change beyond 40 characters).

The results of passing an object between SceneGraph components for various object sizes are similar to those for the Component-Task relation:

Tested on Roku Ultra

Conclusions

  • When using data within a single scope, avoid nodes, because their creation time is much longer than that of AAs.
  • Even if it's tempting to extend the node class and create a specific node type with a custom interface, keep in mind that it's more expensive than a standard node without an interface. For a small number of objects it doesn't matter, but when working with many, a plain node object (filled with data using the addFields() method) is a potential speed boost.
  • Remember that key length matters. The Roku Ultra appears optimized for keys of up to ~32 characters. When using longer keys, consider generating a special array of key+value objects and avoid plain AAs.
  • For small, not very complex data (recommended max 200 fields), the decision between a node and an AA is up to you and should depend on how many times the data is passed between threads. For bigger and more compound data, always use a node (without nested nodes, if possible).

A big “thank you” for discussions and remarks to my teammates at DAZN: Tomasz Rejment, Radosław Zambrowski and Mauricio Dziedzinski!
