Thinking in Graphs: Exploring with Timesketch

As an incident response engineer at Google, I have found that nearly every incident I’ve investigated leads to one common truth: relationships between events are more important than the events themselves. As of today, Timesketch supports a new experimental graph integration to explore these relationships.

Say hello to the new Graph View

Complementing the tabular view

Timesketch was designed to make collaboration, sharing and search easy, and to help analysts quickly correlate disparate events. The default search experience is based on Elasticsearch and the output is detailed and verbose by design. The tabular view shows the matching events for a specific query in order.

Common investigations that are best suited for a tabular view include:

  • Profiling a sequence of actions performed by a set of users
  • Sequencing a number of process executions
  • Listing files created during a set time period

More complex correlations, like listing all users that were logged into a system that executed a specific process, might however require multiple tabs, several queries and a lot of copying around. Building graph models from activities like this makes the data more accessible to the analyst without sacrificing transparency.

Graph backend

Timesketch uses Neo4j as a graph database backend. Neo4j implements a property graph and uses native graph storage to make traversals fast. Nodes in the graph can have zero or more labels, and edges have exactly one type. Edges (or relationships, as they are called in Neo4j) are directed, and both nodes and relationships can have properties. These features give us a powerful and expressive way to model and query graphs.
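To make the property graph idea concrete, here is a minimal Python sketch of the data model described above. It is only an illustration and not how Neo4j or Timesketch store data; the class names and the hostname property are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class Node:
    """A node with zero or more labels and arbitrary properties."""
    node_id: int
    labels: Set[str] = field(default_factory=set)
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed edge with exactly one type and arbitrary properties."""
    source: Node   # the relationship points from source ...
    target: Node   # ... to target
    rel_type: str
    properties: Dict[str, Any] = field(default_factory=dict)


# Hypothetical example data: a user logging in to a machine.
user = Node(1, {"WindowsADUser"}, {"username": "alice"})
machine = Node(2, {"WindowsMachine"}, {"hostname": "WORKSTATION-1"})
login = Relationship(user, machine, "ACCESS", {"method": "Interactive"})
```

Because both nodes and relationships carry properties, details such as the login method can live on the edge itself instead of requiring an extra node.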

Graph model

The following examples profile a hypothetical (but realistic) incident and focus on Windows logins and services. Windows event log entries (EventID 4624/7045) were used to structure the example graph used in this post:

Nodes
Windows machines: (machine)
Windows users: (user)
Windows services: (service)
Windows service image path: (image path)

Relationships
(user)-[ACCESS]->(machine)
(machine)-[ACCESS{user:<USERNAME>}]->(machine)
(service)-[START]->(machine)
(service)-[HAS]->(image path)
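As a rough illustration of how event log entries could map onto this model, the sketch below turns simplified event records into relationships. The field names (event_id, username, computer, source_computer, service_name, image_path) are hypothetical and chosen for this example; they are not the Timesketch schema, and the real implementation may differ.

```python
from typing import Dict, List, Tuple

# (source node, relationship type, target node, relationship properties)
Edge = Tuple[Tuple[str, str], str, Tuple[str, str], Dict[str, str]]


def edges_from_event(event: dict) -> List[Edge]:
    """Translate one simplified event record into graph relationships."""
    edges: List[Edge] = []
    if event["event_id"] == 4624:  # successful logon
        target = ("machine", event["computer"])
        edges.append((("user", event["username"]), "ACCESS", target, {}))
        source = event.get("source_computer")
        if source:  # network logon: also link the originating machine
            edges.append(
                (("machine", source), "ACCESS", target,
                 {"user": event["username"]})
            )
    elif event["event_id"] == 7045:  # service installed
        service = ("service", event["service_name"])
        edges.append((service, "START", ("machine", event["computer"]), {}))
        edges.append((service, "HAS", ("image path", event["image_path"]), {}))
    return edges


# Hypothetical sample events.
login_edges = edges_from_event(
    {"event_id": 4624, "username": "alice",
     "computer": "SERVER-1", "source_computer": "WS-1"}
)
service_edges = edges_from_event(
    {"event_id": 7045, "service_name": "Groove",
     "computer": "SERVER-1", "image_path": "C:\\GROOVE.EXE"}
)
```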

Graph query language

Neo4j ships with a powerful query language called Cypher. Cypher has one of the gentlest learning curves among graph query languages thanks to its visual nature and good tutorials. A query looks like this:

MATCH (a:Label)-[r:TYPE]->(b {property: "foo"})
WHERE a.foo = "bar"
RETURN *

This query roughly translates to: “Give me all paths where a node a, with the label Label and a property foo equal to bar, has a directed TYPE relationship to another node b whose property named property equals foo.” As you can see, Cypher is like working directly from a whiteboard representation of the graph with some SQL added, which makes it easy to get started.
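For readers who think more easily in code, the pattern in the query above can be mimicked over a tiny in-memory graph. This Python sketch only illustrates what the query asks for; it is not how Neo4j evaluates Cypher, and the sample nodes and edges are made up.

```python
# Hypothetical sample graph: node id -> labels and properties.
nodes = {
    "a1": {"labels": {"Label"}, "props": {"foo": "bar"}},
    "a2": {"labels": {"Label"}, "props": {"foo": "baz"}},
    "b1": {"labels": set(), "props": {"property": "foo"}},
}
# Directed edges as (source id, relationship type, target id).
edges = [("a1", "TYPE", "b1"), ("a2", "TYPE", "b1"), ("b1", "TYPE", "a1")]


def match(nodes, edges):
    """Return the edges that satisfy the pattern and the WHERE clause."""
    results = []
    for src, rel_type, dst in edges:
        a, b = nodes[src], nodes[dst]
        if (
            "Label" in a["labels"]                   # (a:Label)
            and rel_type == "TYPE"                   # -[r:TYPE]->
            and b["props"].get("property") == "foo"  # (b {property: "foo"})
            and a["props"].get("foo") == "bar"       # WHERE a.foo = "bar"
        ):
            results.append((src, rel_type, dst))
    return results


matches = match(nodes, edges)
```

Only the edge from a1 to b1 satisfies every condition: a2 fails the WHERE clause, and the edge starting at b1 fails the label check.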

Example 1: Get all interactive logins

Let’s get started with some examples. First, let’s retrieve all users who logged in to a system interactively:

Returns all paths where a WindowsADUser has an ACCESS relationship to a WindowsMachine and the ACCESS login method is one of Interactive, CachedInteractive or Unlock
MATCH (user:WindowsADUser)-[r1:ACCESS]->(m1:WindowsMachine) WHERE r1.method IN ["Interactive", "CachedInteractive", "Unlock"]

In the graph view, you might notice that there is a number in parentheses on the relationship. Instead of adding many edges to the graph, the system adds the timestamp for each login event as a list property on the relationship itself. You will see how an analyst can take advantage of this later.
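The idea of collapsing repeated events into one relationship with a list of timestamps can be sketched in a few lines of Python. This is an illustration under assumptions, not the Timesketch implementation: the input tuples and the function name are hypothetical, and the number shown in the graph view is assumed here to be the number of events behind the edge.

```python
from collections import defaultdict


def aggregate_logins(events):
    """Collapse (user, machine, timestamp) login events into one edge each.

    Returns a dict keyed by (user, "ACCESS", machine) whose value is the
    sorted list of timestamps for that relationship.
    """
    timestamps = defaultdict(list)
    for user, machine, ts in events:
        timestamps[(user, "ACCESS", machine)].append(ts)
    return {key: sorted(values) for key, values in timestamps.items()}


# Hypothetical sample events: alice logs in twice, bob once.
edges = aggregate_logins(
    [("alice", "WS-1", 100), ("alice", "WS-1", 50), ("bob", "WS-1", 70)]
)
# Number of events behind each edge.
counts = {key: len(values) for key, values in edges.items()}
```

Three events become two relationships, and the per-edge timestamp list stays available for later time-based filtering.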

Transparency

In digital forensics it is important to be able to reason about how events are connected and why tools represent the data the way they do. When building tools the developer has to be transparent about how these relationships are created.

Selecting a relationship shows an option to display which events were used to create that connection. This way you can explore and verify the data within the context of the graph and reason about why the data is represented the way it is. You can quickly switch between tabular view and graph view for the same data.

Example 2: List all users who logged in to a system where a specific service was started

This query is a bit more complicated and shows some of the power of the Cypher query language. For the sake of this example, a normal, legitimate service was chosen, but you can imagine this being an evil service started by an attacker:

Returns all paths where a user logged in to machine A and then moved laterally to a machine B where, at some point in time, a service was started whose image path points to an executable named GROOVE.EXE
MATCH (user:WindowsADUser)-[r1:ACCESS]->(machineA:WindowsMachine)-[r2:ACCESS]->(machineB:WindowsMachine),(machineB)<-[r3:START]-(service:WindowsService)-[r4:HAS]->(path:WindowsServiceImagePath) WHERE r2.username = user.username AND path.image_path CONTAINS "GROOVE.EXE"

Example 3: What about time?

There is one thing missing in the above queries: time! Timesketch is about timeline analysis and the temporal dimension in the graph is important. Remember that timestamps are added to each relationship as a list property. Let’s see how that can be leveraged:

Returns all paths where a user logged in interactively on machine A and then moved laterally to machine B using a network login. The result is constrained by a time window: the second login has to happen within 12 hours after the first login.
MATCH (user:WindowsADUser)-[r1:ACCESS]->(m1:WindowsMachine)-[r2:ACCESS]->(m2:WindowsMachine) WHERE r1.method IN ["Interactive", "CachedInteractive", "Unlock"] AND r2.username = user.username AND r1.timestamp < r2.timestamp < r1.timestamp + 60 * 720

The important detail in this query is:

r1.timestamp < r2.timestamp < r1.timestamp + 60 * 720

Instead of writing complex Cypher statements, the analyst can express time filters directly on relationships. The query is transpiled into valid Cypher on the backend and returns the matching nodes and relationships based on the time filter supplied.
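Since each relationship holds a list of timestamps, one plausible reading of the time filter is that it matches when any pair of timestamps satisfies the inequality. The Python sketch below shows that reading. It is an assumption about the semantics, not the actual transpiled Cypher, and it assumes timestamps in seconds, which is what 60 * 720 = 43,200 (12 hours) implies.

```python
WINDOW_SECONDS = 60 * 720  # 43,200 seconds = 12 hours


def matches_time_filter(r1_timestamps, r2_timestamps, window=WINDOW_SECONDS):
    """True if some r2 timestamp falls strictly after some r1 timestamp
    and strictly before that r1 timestamp plus the window."""
    return any(
        t1 < t2 < t1 + window
        for t1 in r1_timestamps
        for t2 in r2_timestamps
    )


# Hypothetical timestamps in seconds.
within_window = matches_time_filter([0], [3600])           # 1 hour later
outside_window = matches_time_filter([0], [WINDOW_SECONDS])  # exactly 12 hours
wrong_order = matches_time_filter([3600], [0])             # second login first
```

Both bounds are strict, so a second login exactly 12 hours after the first falls outside the window.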

This is just the beginning and the Timesketch team is excited to see what can be done with graphs and incident response. The roadmap includes:

  • A framework to model graphs from Timesketch queries
  • Manually adding nodes/relationships
  • Exploring graphs by following relationships
  • Saving snapshots of the graph layout (like saved views)
  • Executing graph queries and alerting on matches
  • Adding graph support to timesketch-api-client
  • Pre-defined queries with user supplied parameters

If you made it this far, you are probably keen on testing this feature yourself. The Timesketch team has made experimenting easy by loading and building a graph on the public Timesketch demo server. Just log in with demo/demo and start exploring!

Many thanks to Franciszek Piszcz, who implemented this feature over the course of his 3-month Google internship! Also many thanks to Stefan Weghofer, Google software engineer, who lent his expertise and knowledge to the project and helped make it successful.


Johan is a Senior Security Engineer at Google. He is the author of Timesketch. If you like articles like this, or are interested in open source digital forensic tools, you can follow him on Twitter.

If you want to fetch the code it is available over at GitHub. Installation instructions are available on the Wiki.
