XCSG Compendium: Introduction
In our Unlocking the Power of Atlas series, we introduced Atlas, a software analysis platform, and went over its basics. The key component of Atlas is its graph representation, the eXtensible Common Software Graph (XCSG). In this new series, we will go over the details of the XCSG schema, explain the software artifacts it captures, and provide examples of how it can be harnessed using the Atlas query language.
XCSG Basics
XCSG is the graph schema that Atlas uses to represent software artifacts mined from a codebase. Every element in a database that follows the XCSG schema is therefore either a node or an edge. Nodes and edges are further categorized by kind; XCSG calls each such category a ‘tag’, and the schema is a hierarchical organization of all the tags in XCSG.
The most basic tags are as follows:
- XCSG.Node: Every node in XCSG is tagged XCSG.Node, which is the supertype that encompasses all nodes in XCSG.
- XCSG.Edge: Similarly, every edge in XCSG is tagged XCSG.Edge. An XCSG.Edge always connects two XCSG.Nodes. The endpoints are accessed using e.from() and e.to(), where e is an XCSG.Edge.
- XCSG.Contains: An edge that represents a containment relation. Because codebases are modular, code segments are contained in files, and files are contained in the codebase as a whole. Similarly, statements are contained in a function, variable definitions are contained in the statements where they are declared, and so on. XCSG.Contains enables seamless browsing of the entire codebase.
- XCSG.Project: A node that represents the entire codebase. All the code artifacts in a codebase are contained within this node by a chain of XCSG.Contains edges.
- XCSG.File: A node that represents a file in the codebase. It is connected to an XCSG.Project node by an XCSG.Contains edge (project->file).
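To make the tags above concrete, here is a minimal sketch in plain Java. It is a toy model written for this article and does not use the Atlas API: the classes `Node` and `Edge`, the helper `containmentPath`, and the sample names `MyProject`, `Main.java`, and `main` are all hypothetical. It only illustrates that every element carries tags, that an edge exposes its endpoints through `from()` and `to()`, and that a chain of XCSG.Contains edges leads from any artifact back to the project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model for illustration only; it does NOT use the Atlas API.
final class XcsgBasicsSketch {
    // Tag names taken from the list above, modeled here as plain strings.
    static final String NODE = "XCSG.Node";
    static final String EDGE = "XCSG.Edge";
    static final String CONTAINS = "XCSG.Contains";
    static final String PROJECT = "XCSG.Project";
    static final String FILE = "XCSG.File";

    // A node has a name and the set of tags that categorize it.
    static final class Node {
        final String name;
        final Set<String> tags;

        Node(String name, String... tags) {
            this.name = name;
            this.tags = new HashSet<>(Arrays.asList(tags));
            this.tags.add(NODE); // every node carries the XCSG.Node supertype tag
        }
    }

    // An edge always connects two nodes; from() and to() return its endpoints.
    static final class Edge {
        private final Node from;
        private final Node to;
        final Set<String> tags;

        Edge(Node from, Node to, String... tags) {
            this.from = from;
            this.to = to;
            this.tags = new HashSet<>(Arrays.asList(tags));
            this.tags.add(EDGE); // every edge carries the XCSG.Edge supertype tag
        }

        Node from() { return from; }
        Node to() { return to; }
    }

    // Follows XCSG.Contains edges backwards from a node to its outermost container.
    // Assumes each node has at most one container and containment is acyclic.
    static List<String> containmentPath(Node start, List<Edge> edges) {
        List<String> path = new ArrayList<>();
        Node current = start;
        while (current != null) {
            path.add(0, current.name);
            Node parent = null;
            for (Edge e : edges) {
                if (e.tags.contains(CONTAINS) && e.to() == current) {
                    parent = e.from();
                    break;
                }
            }
            current = parent;
        }
        return path;
    }

    // Hypothetical codebase: a project contains a file, which contains a function.
    static List<String> demoPath() {
        Node project = new Node("MyProject", PROJECT);
        Node file = new Node("Main.java", FILE);
        Node function = new Node("main"); // tagged only XCSG.Node in this sketch
        List<Edge> edges = Arrays.asList(
            new Edge(project, file, CONTAINS),
            new Edge(file, function, CONTAINS));
        return containmentPath(function, edges);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" -> ", demoPath()));
    }
}
```

In Atlas itself the graph is built by the indexer and queried through the Atlas query language, so none of this bookkeeping is written by hand. The sketch only mirrors the shape of the data.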
Attributes
An XCSG.Node or XCSG.Edge may have values associated with its properties, for example its location in the source code: the path to the file that contains it, the line number, and the offset. Such values are stored as key-value pairs, and the keys are called attributes. A node or edge can have various attributes depending on its kind. The two most common attributes are:
- XCSG.name: Every XCSG.Node and XCSG.Edge has a name attribute. For an edge, it is always formatted as <name of from node> -> <name of to node>. For a node, its value depends on the kind of node; for a function, for example, it is the function name.
- XCSG.sourcecorrespondence: Every XCSG.Node has this attribute, which holds the source code location of the artifact the node represents.
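The key-value idea can be sketched in a few lines of plain Java. As before, this is a toy model under stated assumptions and not the Atlas API: the classes `Node`, `Edge`, and `SourceLocation`, the `attributes` map, and the sample file path are hypothetical, and the attribute keys are plain strings spelled as in the text above. It shows attributes stored as key-value pairs and an edge name derived from the names of its endpoints.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model for illustration only; it does NOT use the Atlas API.
final class XcsgAttributesSketch {
    // Attribute keys, spelled as in the text above and modeled as plain strings.
    static final String NAME = "XCSG.name";
    static final String SOURCE_CORRESPONDENCE = "XCSG.sourcecorrespondence";

    // Hypothetical value type for a source location: file path, line number, offset.
    static final class SourceLocation {
        final String filePath;
        final int line;
        final int offset;

        SourceLocation(String filePath, int line, int offset) {
            this.filePath = filePath;
            this.line = line;
            this.offset = offset;
        }
    }

    // A node stores its attributes as key-value pairs.
    static final class Node {
        final Map<String, Object> attributes = new HashMap<>();

        Node(String name, SourceLocation location) {
            attributes.put(NAME, name);
            attributes.put(SOURCE_CORRESPONDENCE, location);
        }
    }

    // An edge also has a name attribute, derived from its two endpoints.
    static final class Edge {
        final Map<String, Object> attributes = new HashMap<>();

        Edge(Node from, Node to) {
            // Edge names follow the "<name of from node> -> <name of to node>" format.
            attributes.put(NAME,
                from.attributes.get(NAME) + " -> " + to.attributes.get(NAME));
        }
    }

    // Builds a hypothetical file -> function edge and returns its name attribute.
    static String demoEdgeName() {
        Node file = new Node("Main.java", new SourceLocation("src/Main.java", 1, 0));
        Node function = new Node("main", new SourceLocation("src/Main.java", 3, 42));
        return (String) new Edge(file, function).attributes.get(NAME);
    }

    public static void main(String[] args) {
        System.out.println(demoEdgeName());
    }
}
```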
The compendium is organized into three parts; please refer to the linked articles to learn more.