XCSG Compendium — Data Flow
Find the index for the Compendium here.
In this article we will cover parts of XCSG schema that pertain to Data Flow. Data Flow is the largest portion of XCSG and is constantly improving; especially w.r.t to data flow involving pointers. We periodically update this article as the schema improves.
Let’s first understand how Data Flow is represented in XCSG.
- XCSG.DataFlow_Edge: An XCSG.Edge that represents transfer of data from one memory location to another. The memory location in question corresponds to a constant or a variable in the source code. Which brings us to,
- XCSG.DataFlow_Node: An XCSG.Node that corresponds to a value stored in a memory location. If the value is a constant then it is a variable or a constant. If the value is an address then it is a pointer.
An XCSG.DataFlow_Edge a->b
represents that the value corresponding to a
is accessed by the program and then stored in b
.
Organization of DataFlow nodes
Each XCSG.ControlFlow_Node represents an execution statement that may involve data flow. In XCSG, this data flow will be “contained” in the control flow i.e., the XCSG.DataFlow_Nodes representing the variables involved are connected to the XCSG.ControlFlow_Node by an XCSG.Contains edge.
Query example:var dfg = <function>.contained().nodes(XCSG.DataFlow_Node).induce(universe.edges(XCSG.DataFlow_Edge)
- This query gets all the data flow contained within a function. It enables computation of a intraprocedural slice as described by Mark Weiser.
Data Flow Nodes in XCSG can be roughly divided into three categories, first related to read/write operations, second related to arithmetic operations, and third related to logical operations. In addition, read/write operations can be further distinguished based on whether a pointer or an array is involved or not. We will cover them in this same order: read/write, arithmetic/logical, and pointers/arrays.
Read/Writes
- XCSG.Assignment: An XCSG.DataFlow_Node that represents a variable assignment. In terms of slicing literature, this will be called a ‘Definition’.
- XCSG.Constant: An XCSG.DataFlow_Node that represents a constant value in the source code.
Arithmetic and Logical operators
- XCSG.Operator: An XCSG.DataFlow_Node that represents an operation on one or more XCSG.DataFlow_Nodes.
- XCSG.UnaryOperator: An XCSG.Operator node that represents a unary operation such as negation.
- XCSG.BinaryOperator: An XCSG.Operator node that represents a binary operation such as addition.
- XCSG.leftOperand: An XCSG.DataFlow_Edge that connects an XCSG.DataFlow_Node to an XCSG.BinaryOperator node where the data flow node represents the left operand.
- XCSG.rightOperand: An XCSG.DataFlow_Edge that connects an XCSG.DataFlow_Node to an XCSG.BinaryOperator node where the data flow node represents the right operand.
- XCSG.Addition: An XCSG.Operator node that represents an addition operation.
Pointers/Arrays
- XCSG.Pointer: An XCSG.DataFlow_Node that represents a pointer variable.
- XCSG.ArrayRead: An XCSG.DataFlow_Node that represents an array read.
- XCSG.ArrayWrite: An XCSG.DataFlow_Node that represents an array write.