Using a Directed Acyclic Graph in a React-Redux Application

Published in

Splunk Engineering

18 min readNov 20, 2019

The Splunk Value Platform (SVP) team has built a suite of internal web applications which are used by Splunk Sales and Customer Success (CS) employees all around the world. Each application helps prospective and existing customers understand the value Splunk can provide for them. While every SVP application has a unique purpose, they are all built using React-Redux and implement a directed acyclic graph as the core data structure to store user input and other presentational data.

Splunk’s Sales organization initially created a set of spreadsheets to use during outreach sessions with potential and existing customers. These spreadsheets were designed to showcase the quantifiable benefits of utilizing Splunk. Our web applications were born out of the need to surpass limitations naturally created by using static spreadsheets. For example, consider the cumbersome steps which would need to occur if Sales and CS wanted to include additional metrics in a spreadsheet or change the way a certain cell is calculated. All Sales and Customer Success members would need to immediately start using a copy of the newly updated spreadsheet. Moreover, any spreadsheets currently being worked on or previously shared with customers would be considered outdated. In contrast, versioning is much more convenient and feasible through our web platform. If we want to change any aspect of our front end or back end, we can make the necessary modifications in our codebase and publish new changes whenever we desire. When users reopen the application, they will seamlessly see our additions and improvements.

The UI of our web applications contains input and output fields directly corresponding to cells of the original spreadsheets. The screenshot below shows the translation from an Excel spreadsheet to our UI for a question regarding the customer’s current consumption of Splunk:

Excel cells in old spreadsheets are converted into input and output fields on our web applications. — Input fields (yellow) and output fields (white) from a static spreadsheet compared to how they look on our website’s UI

In the picture above, the blue arrows indicate mapping of input fields (from the spreadsheet to our UI). The purple arrows indicate the mapping of output fields. These values are calculated in Excel using formulas. We implement these formulas in our application code to achieve the same results. Sales and Customer Success employees typically work on the application alongside the customer, modifying input (as possible) depending on the output values desired. Some input fields capture information about the customer, such as their background and preferred investment schedule. Output fields show values calculated from at least one input field, such as ROI.

A Quick Aside About Graphs

Graphs are a type of data structure in which a network of nodes is connected by edges.
The simplest type of graph consists of one node, without edges extending to any other node(s).
In graph theory, directed graphs are composed of nodes which are connected by one-way edges.

A world map showing various airport locations and a round-trip travel plan including an arrival at each airport. — A travel itinerary. Image by: Dennis Shepherd, Sr. Web Application UI Designer (SVP)

We can consider the flight itinerary above to be a directed graph. The airports in San Francisco, Dubai, Tokyo, Singapore, and Sydney (SFO, HND, DXB, SIN, SYD) are nodes of the graph. A flight between two airports is an edge. (SFO, HND), (HND, DXB), (DXB, SYD), (SYD, SIN), and (SIN, SFO) are edges in the graph above.

This graph is directed because there is an asymmetrical dependency between every pair of nodes. In the sample travel itinerary above, an individual must first take a flight from San Francisco to Tokyo before they can take a flight from Tokyo to Dubai. The round-trip travel itinerary is also an example of a cyclic graph, in which there exists at least one node such that you can traverse from the node back to itself. As per the flight schedule above, the individual departs from SFO on August 26th, but also returns to SFO on September 1st. Acyclic graphs, on the other hand, do not have any nodes with circular references. If we change our entire flight itinerary to simply be the first flight from San Francisco to Tokyo, we would not have any way of returning to San Francisco. There wouldn’t be any cycle in the flight schedule.

Spreadsheets Can Be Modeled as Directed Acyclic Graphs

The neat characteristic about spreadsheets is that a cell’s value can be driven entirely by a calculation of one or more other cells. As a consequence, changing any parent cells will immediately change the calculated value of the child cell. Let’s say we have four cells (Cell A, Cell B, Cell C, Cell D) where:

Cell C is a direct result of Cell A + Cell B.
Cell D is a percentage value of Cell C, so Cell C * 100.

We can draw out this relationship for more clarity:

An example of a directed acyclic graph, in which Cell A and Cell B are parents of Cell C and Cell C is the parent of Cell D.

A user can put in values for Cell A and Cell B.
Cell A and Cell B are immediate parents of Cell C.
C = A + B, so both Cell A and Cell B influence Cell C’s value.
Cell C is the immediate parent of Cell D.
D = C * 100, so Cell C directly influences Cell D’s value.
Cell D is a calculated value dependent on the outcome of (A + B) * 100.

The values on our web applications are derived in a similar way to the example above. Some values, like Cell A and Cell B, are purely driven by user input. All output fields on our UI, like Cell C and Cell D, require calculations to be done for their values.

The relationship shown above between Cell A, Cell B, Cell C, and Cell D is that of a directed acyclic graph (DAG). The directed edges (A, C), (B, C), and (C, D) specify formulas of how values in child nodes are calculated from their parents. Each of our applications similarly has a one-way flow of calculations from parent nodes to child nodes and consequently implements a DAG. We don’t implement a cyclic graph because there is no circumstance in which a calculated child node affects its parent.

We use a directed acyclic graph as the primary data structure in our applications because it perfectly models the “parent”-”child” relationship of calculations between cells in a spreadsheet. We can efficiently translate logic for hundreds or thousands of cells from a spreadsheet into this data structure and be able to have a clear visualization of how one part of our app directly affects the values of another section.

Each of SVP’s five applications has a uniquely structured DAG, but I’ll only focus on one application and its DAG for more clarity.

A Closer Look: A Sample Relationship in Our Directed Acyclic Graph (DAG)

We can take a closer look at one relationship between two nodes of our DAG. In the screenshot below, there is an output field which we can call ‘yearly.sev1.incidents’:

Part of our UI showing that the number of incidents investigated with Splunk/year is a node in our directed acyclic graph. — The node ‘yearly.sev1.incidents’ in our DAG currently has the value of 36. It is not an input field, which shows that the value is calculated from at least one other node.

As this is an output field, we know that it must rely on at least one input field for its value. It turns out that the value of “36” in this case only depends on the total number of incidents investigated with Splunk per month (‘monthly.sev1.incidents’):

Many values on our UI are interconnected. Answers from input fields provide the base for calculations which generate values for output fields. ‘monthly.sev1.incidents’ is an input field which affects the value of ‘yearly.sev1.incidents’, an output field.

In the diagram above, we can see that ‘yearly.sev1.incidents’ is only calculated from one parent, ‘monthly.sev1.incidents’. However, it is common for a node to influence many others in our DAG. A child node can act as a parent node as the number of levels grow:

A node in our DAG can be a child node and a parent node. It may be derived from some node(s), and affect others as well. — A small portion of the DAG from one of our applications

Defining the DAG in Our Application

Our React application contains a static definition for the DAG. We use a JavaScript object to store details about every node used in our application. We can assign this to a variable called nodeDefinitions.

In nodeDefinitions:

Each key is the name of a node.
The value for each key is an object containing the node’s properties, such as its parents.

A definition for every node of our application’s DAG is stored in an object called nodeDefinitions.

‘yearly.sev1.incidents’ (string): The name we gave to a particular node. The node defined in the code snippet above corresponds to the purple box in the previous discussion about Sev1/Sev2/Sev3 incidents.
‘is_input’ (boolean): Boolean value indicating if the node’s value comes from user input or from a calculation involving values of other nodes.
‘parents’ (array): Array of strings detailing the names of immediate parent nodes.
‘calc’ (function): A function definition for how to calculate the node’s value. This property is only present for nodes which are calculated from at least one input field.
‘type’ (string): This property tells us more about how the value is calculated and should be displayed on our UI. The node in this example is “numeric”, which gives a hint to the presentation layer of how to display this data. Other possible types could be “percent” or “currency”.

More about the ‘calc’ property:

If the user changes the value of an input field, the ‘calc’ function is called for every child node of the input field that was modified.
The function takes part of the Redux state called valuesOfAllNodes as an argument and returns the new value for the node based on new value(s) of its parent(s) (more on this later). In order to calculate the next value of a child node, we need to know:

1. The new value(s) of its parent(s)

2. Its ‘calc’ property, which is a JavaScript function that returns the new value of the node

You might see the node definition above and ask: “If you include the names of parent nodes, what about a property for the node’s children?”

Good question!

As a developer-friendly optimization, we do not hard-code the ‘children’ array for each key in nodeDefinitions. It would be an unnecessary manual step that naturally increases chances of errors due to typos or accidental omission. Instead, we statically define nodeDefinitions such that every node has a property for its parents. At application startup time, we have a function loop through each key in nodeDefinitions and add a ‘children’ property if the node currently being parsed influences any other node(s).

Consider that we know ‘monthly.sev1.incidents’ is a parent of ‘yearly.sev1.incidents’ given the definition of ‘yearly.sev1.incidents’ above. As a result, ‘yearly.sev1.incidents’ must be a child of ‘monthly.sev1.incidents’.

After defining nodeDefinitions, we call a function which iterates through each key of the object (every node of the DAG) and does the following steps:

1. Check if the key has parents.

'yearly.sev1.incidents': {
  'is_input': false,
  'parents': ['monthly.sev1.incidents'], //this node has one parent, since it has one key name in its 'parents' array
  'calc': function(valuesOfAllNodes) {
    return valuesOfAllNodes['monthly.sev1.incidents'] * 12;
  },
  'type': 'numeric'
}

Suppose we are on the iteration for ‘yearly.sev1.incidents’. We can see from its definition that it has one parent, ‘monthly.sev1.incidents’.

2. For each parent found in (1), initialize its ‘children’ property to an empty array.

nodeDefinitions['monthly.sev1.incidents'].children = [];

As a result, the node definition for ‘monthly.sev1.incidents’ would look like:

'monthly.sev1.incidents': {
  'is_input': true,
  'parents': [], //no parents, since it is an input field not influenced by anything other than user input
  'children': [], //empty 'children' array initialized
  'type': 'numeric'
}

3. Add the name of the child node into the ‘children’ array for each parent found in (2):

nodeDefinitions['monthly.sev1.incidents'].children.push('yearly.sev1.incidents');
//Parent: 'monthly.sev1.incidents'
//Child: 'yearly.sev1.incidents'

As a result, the node definition for ‘monthly.sev1.incidents’ would be updated to:

'monthly.sev1.incidents': {
  'is_input': true,
  'parents': [], //no parents, since it is an input field not influenced by anything other than user input
  'children': ['yearly.sev1.incidents'], //child node added since 'yearly.sev1.incidents' is a child of 'monthly.sev1.incidents'
  'type': 'numeric'
}

If a node does not influence the calculations of any other node(s), it should have an empty array for its ‘children’ property in nodeDefinitions. In a visual sense, this implies that there are no directed edges from the node leading to any other ones in our DAG.

Definitions for the ‘monthly.sev1.incidents’ and ‘yearly.sev1.incidents’ nodes after populating the ‘children’ property for ‘monthly.sev1.incidents’

Maintaining the Values of Our DAG’s Nodes in the Redux State

Redux is a library which can be used to manage the state of any JavaScript application. By adding Redux to our React application, we are able to maintain the values of our DAG’s nodes in an organized and predictable manner. We utilize the Redux state to have a single source of truth for storing and updating the value of every node. Almost every React component in our application needs to display some variety of values from our DAG and several of our components need to be able to update the values in our DAG as well. By using Redux, we can easily map these values into our React components to display them on our UI. We can also dispatch actions from our React components to the Redux store if we need to change the value of a node in the Redux state.

We store the value of every node in a JavaScript object within our Redux state. We can name this portion of our state as valuesOfAllNodes.

Each key in valuesOfAllNodes is the name of a node, such as ‘yearly.sev1.incidents’. The value of each key is either the user input (if the node corresponds to an input field) or calculated value (if the node is a child node).

let reduxState = {
  //other properties of the Redux state
  valuesOfAllNodes: {
    //other key-value pairs in the format of node_name: value
    'monthly.sev1.incidents': 3, //user input
    'yearly.sev1.incidents': 36, //calculated value
    'monthly.sev2.incidents': 6, //user input
    'yearly.sev2.incidents': 72, //calculated value
    //more key-value pairs in the format of node_name: value
  },
  //more properties in the Redux state
};

In general, a change to the value of a parent node should cascade through relevant parts of the directed acyclic graph, recalculating the values of child nodes.

Any modification to the value of an input field will cause the React component to dispatch an action to the Redux store. Here is a brief summary of the steps which would take place after a React component dispatches the action:

1. The action:

Creates a local copy of valuesOfAllNodes.
Updates the key of the input node in this copy with the new value the user inputted.
Breadth-first search (BFS) is subsequently used to traverse our DAG and call the ‘calc’ function on all child nodes of the input field. Each child’s ‘calc’ function is called with the local copy of valuesOfAllNodes. As a result, the calculation for each child node uses the updated value of the parent input node and will be up-to-date.

2. The action passes its local copy of valuesOfAllNodes in its payload to the Redux store.

3. The store calls the appropriate reducing function for the action just dispatched.

4. The reducer returns the payload of valuesOfAllNodes passed from the action.

5. The store saves valuesOfAllNodes returned from the reducer in step 4 in the Redux state.

6. React components which are connected to the Redux store via mapStateToProps would re-render as a response to the change in Redux state. They would receive the newly updated Redux state and pass it down to other components as necessary.

Our Custom Input and Output React Components Tie into Our DAG Logic

React is a front-end library used for building Single-Page Applications. From a React perspective, the UI of a website can be thought of as being composed of different components. Each of these components can maintain its own state.

The way we model our data in a DAG on the front end has also influenced the design of our React components. Our Input and Output components are two of the most commonly used components in our entire application. We developed these to help us present the values of our DAG’s nodes to the user with more ease.

Recall that one node defined in our DAG has:

A name, which corresponds to a key in nodeDefinitions.
A ‘type’ property in its definition, which specifies how the value should be formatted. Some types are “numeric”, “currency”, and “percent”.

Values from two nodes of our DAG are shown on our UI. One is derived from user input, while the other is calculated. — An Input (green) and Output (purple) component on our UI

Our Input component is a controlled, stateful component. Every input field on our UI is an instance of our Input component. We pass the name of an input node as a prop (like ‘monthly.sev1.incidents’) and the component handles validation for the user input depending on the node’s ‘type’ property. We use react-number-format for some types, so validation for user input does not have to be implemented from scratch. For example, the input field will not save any user input which is negative or greater than 100 if the node is of type “percent”. Once the user input passes validation, the component then dispatches a Redux action to the store to update the Redux state, as discussed in the previous section. The reducer ultimately returns a new version of valuesOfAllNodes, which contains the modified user input and updated values for child nodes which depend on the input field for their calculations.

In the parent component where the Input component needs to be rendered, the Input component would be instantiated like so:

<Input
  nameOfNode='monthly.sev1.incidents' //name of input node passed as a prop
  nodeDefinitions={nodeDefinitions} //nodeDefinitions is passed to the Input component for information about the node's 'type' and other properties
/>

In the render function of the Input component itself, the markup would look similar to:

Render function of the Input component

Our Output component is a stateless component that also takes the name of a node as a prop (like ‘yearly.sev1.incidents’). It finds the value of the node by doing a lookup in the Redux state object valuesOfAllNodes. As an example,

valuesOfAllNodes['yearly.sev1.incidents'] //would yield 36 based on the example above

Depending on the node’s ‘type’ from nodeDefinitions, the Output component tailors the formatting of the value.

If a node is of type “currency”, the Output component renders its value differently on our UI than if it is of “numeric” type. Our Output component formats currency values by determining the currency symbol of the user’s region and prefacing it before the monetary amount.

In the parent component where the Output component needs to be rendered, the Output component would be instantiated similar to:

<Output
  nameOfNode='yearly.sev1.incidents' //name of a non-input node passed as a prop
  nodeDefinitions={nodeDefinitions} //nodeDefinitions is passed to the Output component for information about the node's 'type' and other properties
  valuesOfAllNodes={this.props.valuesOfAllNodes} //valuesOfAllNodes from the Redux state is passed to the Output component to find the value of the node
/>

Render function of Output component

The convenience and consistency in how we manage user input increases drastically with our Input and Output components. They both use the ‘type’ property of the node from nodeDefinitions and have access to the node’s value from part of the Redux state, valuesOfAllNodes. We render hundreds of Input and Output components on some pages of our application. Having common components to validate, save, and show values of our nodes provides a sense of uniformity in our UI.

We Only Save Values of Input Fields on the Back End

Our database only stores values of input fields. If we look at the Sev1/Sev2/Sev3 Incidents section of our application’s UI:

Only values from input fields are saved in our database. Values from calculations on our UI are recomputed upon startup time. — Input values are saved on the back end, while output values are recalculated upon application startup.

Values of our application’s input fields (parent nodes of the DAG) are saved in the database, while calculated values (child nodes of the DAG) are not. The back end does not store any models related to formulas or calculated values. The directed acyclic graph storing the relationships between input and calculated fields is implemented by the front end.

When a user reopens our application, the base React component initializes valuesOfAllNodes (Redux state storing the value of every node) by:

1. Making an API call to fetch user inputs from the database.

We have a table that stores the name of every input field in the UI and the value the user answered for that field. These values (shown in the green boxes above) are fetched. From the screenshot above, we would fetch the total number of Sev1, Sev2, and Sev3 incidents investigated with Splunk per month.

2. Calling the ‘calc’ function defined in nodeDefinitions for every child node of the input fields fetched in step 1. By doing this, valuesOfAllNodes in our Redux state will contain values for input nodes as well as their children. We can then display input and calculated values from our Redux state to the user.

Since we fetched the user input for the total number of Sev1, Sev2, and Sev3 incidents per month investigated with Splunk, we can recalculate the values boxed in pink (the total number of incidents investigated with Splunk per year) by calling their respective ‘calc’ functions.

We essentially reconstruct the values of our DAG’s child nodes and store them in our Redux state while initializing our application. We can do this because we know the structure of our DAG from the ‘parents’ and ‘children’ properties defined for each node in nodeDefinitions.

Why We Don’t Store Calculated Values on the Back End

When a user changes the value in an input field, we make an API call to save the answer in our database. If we wanted to store calculated values on the back end as well, we would need to make API calls to maintain the value of every child node in our DAG. If a change to one input field had a cascading effect on ten child nodes, we would need to update the values of those ten calculated fields in our database every time the input changed. Inserting, updating, and fetching operations are not nearly as efficient to do on a database as compared to in-memory (on the front end). For example, fetching all the calculated values from our database (hundreds of child nodes) would be much more expensive than recalculating these values upon application startup time.

In addition, it would be a poor use of storage capacity to store derived values because it is essentially redundant to do so. Since we know all of the parent-child relationships of our DAG, we know all the calculations that need to be done and how they can be computed.

What If We Chose Not to Use a Directed Acyclic Graph?

A key question which would arise if we do not use a directed acyclic graph is: How would we know how to calculate the values of non-input fields? We would not have the concept of “parents” and “children” for each “node”. Every output cell of an Excel spreadsheet needs to know how to be calculated, and the same is true for the calculated outputs displayed on our web application.

A convoluted workaround might be to store a list of formulas for how each non-input field should be calculated from input fields. We would need to call the right formula from that list when a certain calculation should be done.

This approach becomes tricky for two reasons:

1. If we change the value of an input field which affects the calculation of many other values, we would need to be careful to call the formulas of the calculated fields in the correct order. From an Excel spreadsheet point of view, there could be a cell (Cell G) which is the average of four other cells (Cell C, Cell D, Cell E, Cell F). Let’s say each of these four cells is calculated in its own way based on user input from cells A and B.

Relationships between cells of a spreadsheet can easily be seen when structured as a directed acyclic graph. — We can intuitively understand relationships among spreadsheet cells when they are structured in a DAG.

If we change the value of the input in either Cell A or Cell B, we would first need to calculate the values of Cell C, Cell D, Cell E, and Cell F. Then, we can take the average of those to calculate the value of Cell G. Since there is no concept of the fact that cells C, D, E, and F are essentially “children” of cells A and B, it would not be immediately obvious that their values should be calculated first. We would need to ensure that the values for cells C, D, E, and F are calculated first and then do the calculation for Cell G. Imagine if we had an even deeper nesting of calculations — the number of function calls we would need to manually call in the correct order to calculate each output would keep growing. There would be no ability to do a breadth-first search or other type of topological sorting through every value of our application that needs to be calculated.

In reality, the depth of the dependency of calculations in our application extends dozens and dozens of layers, which would be an unmaintainable mess if we used a more traditional approach.

2. The maintainability of the code would significantly decrease. There would be no immediate awareness in the application logic of how certain values relate to other values. You would need to read through all of the formulas in order to understand the cascading effect each input value has on the rest of the application’s values. In contrast, our architecture produces a simple model for application developers to follow, making debugging easy and fast.

Unit Tests for Our Directed Acyclic Graph

We are also able to verify our directed acyclic graph has the expected structure by adding unit tests through our Jest framework. For example, we check the following in nodeDefinitions if node A is a parent for node B:

Node A is present in the ‘parents’ array of node B
Node B is present in the ‘children’ array of node A

Another potential test could verify that there is a ‘calc’ property defined as a valid function for nodes which depend on others for value.

Conclusion

Directed acyclic graphs can model spreadsheet behavior, and this idea was instrumental for the foundation of our web applications. The definition of each DAG not only describes the parent-child structure between spreadsheet cells, but also includes additional information on how to calculate and display the value of each cell. By storing values of the DAG’s nodes in the Redux state, we can easily work with them in any container or presentational React component. Our implementation is just one example of how graphs allow us to better understand and visually depict relationships between interconnected ideas and information.