Jupyter Notebooks for BloodHound Analytics and Alternative Visualizations 📊

Roberto Rodriguez
Open Threat Research
13 min read · Nov 18, 2019

Nowadays, whether you are an attacker or a defender, detecting hard-to-find privilege relationship patterns and structures in connected data is a very important step when attacking or securing Active Directory domains. In this space, BloodHound has become by far a must-have open source tool among attackers and defenders for leveraging the power of graph analytics to accomplish that. However, what if I wanted to use alternative ways to visualize graph query results, or be able to replicate and share the overall data analysis in a more interactive, efficient and flexible way?

In this post, I will show you how you can leverage Jupyter Notebooks and Python libraries such as py2neo, altair and plotly.py to interact with BloodHound, share cypher queries, and visualize graph data in familiar chart formats during an engagement or while creating a report. This post was inspired by Andy Robbins’ work with BloodHound and PowerBI, and I hope it gets not only defenders, but also attackers, interested in using notebooks for data analysis during engagements 💜🍻

Graphs vs Charts?

The words graph and chart are often used interchangeably, but they are not the same when referring to connected data. Charts represent data across multiple axes (mostly two, x and y), like a bar, pie or line chart, while a graph is a representation of connected data in the form of nodes and edges. Even though they mean two different things, you can use charts as a complement to graph visualizations to represent data in alternative ways that satisfy specific user and business needs.

BloodHound and Charts?

According to Andy Robbins, BloodHound is comprised of three parts: the Neo4j database, the SharpHound data collector, and the BloodHound user interface. Therefore, when I talk about using alternative visualizations for graph data, I mean using charts to represent cypher query results from the Neo4j database, not replacing the graph visualizations.

BloodHound, Charts and Jupyter Notebooks?

Yes, in this post I will show you how powerful Jupyter Notebooks are when working with the BloodHound database to do the following:

  • Connect to a BloodHound database and interact with it via the Bolt network protocol.
  • Send cypher queries to retrieve information about specific relationships from Active Directory connected data.
  • Visualize cypher query results in chart format.
  • Save and share all the input and output, including visualizations, with anyone in the 🌎 in a flexible and practical way.

Prepare Use Case

As I mentioned before, I was inspired by two posts that Andy Robbins put together to show how you can use PowerBI to “create elegant data visualizations that will help reveal and communicate security-related insights about your Active Directory domains”. He ended up creating the following charts for his basic use case:

https://posts.specterops.io/visualizing-bloodhound-data-with-powerbi-part-2-3e1c521fb7ae

After reading those two posts, I started wondering if there was a more efficient, more practical, and easier way to re-create all of them with some Python code and share the results via notebooks.

Pre-Requirements

I was ready to start running some tests for this proof of concept, but I was missing a few very important things:

  • An Active Directory environment (idea for a Mordor environment)
  • A BloodHound Database up and running (idea for a Mordor environment)
  • Data 😆 (another idea for a Mordor dataset 😉)

However, I remembered that Andy Robbins had shared, at the beginning of this year, a way to test BloodHound with some sample data 😱. It felt so good that I did not have to build an AD environment, deploy BloodHound and collect data to start using notebooks to interact with it. Also, something I was not aware of is that the data shown in those posts actually comes from the same lab domain available in the public BloodHound database 🤔.

We still need a Jupyter Notebook Server

That’s right! However, I have already taken care of it for you 🍻, and you have two options:

Notebooks Forge Project

The notebooks forge project is an initiative from the Threat Hunters Forge community dedicated to building and providing notebook servers for defensive and offensive operators via Docker containers. For the purpose of this post, and for future use of notebooks with BloodHound, I built the jupyter-bloodhound Docker image and shared it via my public Docker registry for you.

Also, all the notebooks used in this post are available as part of the image. All you need to do is the following:

git clone https://github.com/hunters-forge/notebooks-forge
cd notebooks-forge/docker/jupyter-bloodhound/
docker-compose -f docker-compose.yml up --build -d

Once the server is built and running, run the following command to get the full URL to access the Jupyter Notebook server and run the notebooks:

docker exec -ti jupyter-bloodhound jupyter notebook list

Currently running servers:
http://0.0.0.0:8888/jupyter/?token=78cdd491e6e9cad3d9796b9ef28c3e35fce41f0b62c4bef4 :: /opt/jupyter/notebooks

BinderHub Link

If you do not want to build anything on your own, or you just do not feel comfortable with Docker containers, I leveraged the amazing work from the Binder team to host the notebooks from this post on open infrastructure. Since the BloodHound database is public and available at bolt://206.189.85.93:7687, all we need is a notebook.

Note: Currently, outbound calls to port 7687 are not allowed in the public service 🤔 For now, I recommend using the Docker containers from the notebooks forge project.

Ready? Connect to the BloodHound Database

There are several ways to connect to a Neo4j database from Python, and the following link has all the available drivers for the job. I personally find py2neo very easy to work with when doing so.

What is Py2neo?

Py2neo is a client library and toolkit for working with Neo4j from within Python applications and from the command line. The library supports both Bolt and HTTP and provides a high level API, an OGM, admin tools, an interactive console, a Cypher lexer for Pygments, and many other bells and whistles.

Install Py2neo

If you are using the docker containers I provided, you do not need to install it. However, if you want to install it in a different system, you can do it via pip as shown below:

pip install py2neo

Now, all you have to do is initialize the Graph class to connect to the Neo4j database via the Bolt protocol. Remember, I am using the credentials Andy Robbins shared for the public sample database. Run the following commands:

from py2neo import Graph

g = Graph("bolt://206.189.85.93:7687", auth=("neo4j", "BloodHound"))
g

Retrieve Security Groups with Local Admin Rights over Computers

Now, following Andy Robbins’ instructions, “we’ll construct the Cypher query that tells us the name of each security group in Active Directory and the number of computers that group has local admin rights on”:

MATCH (g:Group)
OPTIONAL MATCH (g)-[:AdminTo]->(c1:Computer)
OPTIONAL MATCH (g)-[:MemberOf*1..]->(:Group)-[:AdminTo]->(c2:Computer)
WITH g, COLLECT(c1) + COLLECT(c2) AS tempVar
UNWIND tempVar AS computers
RETURN g.name AS GroupName,COUNT(DISTINCT(computers)) AS AdminRightCount
ORDER BY AdminRightCount DESC

Then, all I have to do is use the run method from the Graph class and pass the query as a parameter as shown below:

sg_computers_df = g.run("""
MATCH (g:Group)
OPTIONAL MATCH (g)-[:AdminTo]->(c1:Computer)
OPTIONAL MATCH (g)-[:MemberOf*1..]->(:Group)-[:AdminTo]->(c2:Computer)
WITH g, COLLECT(c1) + COLLECT(c2) AS tempVar
UNWIND tempVar AS computers
RETURN g.name AS GroupName,COUNT(DISTINCT(computers)) AS AdminRightCount
ORDER BY AdminRightCount DESC
""").to_data_frame()

Very easy, right? Did you notice the to_data_frame method at the end of the command? That is because py2neo can return the whole result set as a pandas DataFrame. That opens the door to several new possibilities and creative ways to perform additional analysis on the results of a cypher query 😱 🙏 🍻 💜
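
For example, once the results are in a DataFrame, everything pandas offers is fair game. Here is a minimal sketch using the sg_computers_df DataFrame created above (the threshold of five admin rights is just an arbitrary example):

# Quick look at the ten most privileged groups
sg_computers_df.head(10)

# Summary statistics for the admin-rights counts
sg_computers_df['AdminRightCount'].describe()

# Keep only groups with local admin rights over more than 5 computers (arbitrary threshold)
sg_computers_df[sg_computers_df['AdminRightCount'] > 5]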

Visualize Cypher Results : Bar Chart

One of the awesome things about doing all this via a notebook is that I can skip all the steps Andy Robbins took to process the data and make it available via PowerBI’s interface. I am working with a DataFrame now, and all I need to do is pass it to a Python library that creates visualizations from data in DataFrame format. One of my favorite ones is altair!

What is Altair?

Altair is a declarative statistical visualization library for Python, based on Vega and Vega-Lite, and the source is available on GitHub.

With Altair, you can spend more time understanding your data and its meaning. Altair’s API is simple, friendly and consistent and built on top of the powerful Vega-Lite visualization grammar. This elegant simplicity produces beautiful and effective visualizations with a minimal amount of code.

If you are using the docker containers I provided, you do not need to install it. However, if you want to install it in a different system, you can do it via pip as shown below:

pip install altair

Now, all I have to do to create a bar chart similar to the one shown in the reference post is run the following code, using the DataFrame sg_computers_df as the data input, the AdminRightCount column as my X axis and the GroupName column as my Y axis.

import altair as alt

bars = alt.Chart(sg_computers_df, title="Most Privileged Active Directory Security Groups").mark_bar().encode(
    x='AdminRightCount:Q',
    y=alt.Y(
        "GroupName:N",
        sort=alt.EncodingSortField(
            field="AdminRightCount",
            order="descending"
        )
    )
)
text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3
).encode(
    text='AdminRightCount:Q'
)
(bars + text).properties(height=300)

That’s it! The bar charts look the same to me 😆, with only a few lines of code! Also, do not get scared of the extra lines of code you have to type to create the visualization. All of that can be automated, and it is currently being worked on by Jose Luis Rodriguez in his Python library OpenHunt.
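
As a rough idea of what that automation could look like, here is a minimal sketch of a reusable helper (a hypothetical function of my own, not OpenHunt’s actual API):

import altair as alt

def bar_chart(df, x, y, title, height=300):
    # Horizontal bar chart sorted by the numeric column, with value labels
    bars = alt.Chart(df, title=title).mark_bar().encode(
        x=alt.X(x, type='quantitative'),
        y=alt.Y(y, type='nominal',
                sort=alt.EncodingSortField(field=x, order='descending'))
    )
    text = bars.mark_text(align='left', baseline='middle', dx=3).encode(
        text=alt.Text(x, type='quantitative')
    )
    return (bars + text).properties(height=height)

# Same chart as above, now in one call
bar_chart(sg_computers_df, 'AdminRightCount', 'GroupName',
          'Most Privileged Active Directory Security Groups')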

What about other visualizations in chart format? Well, another one you can create with Python is a gauge chart.

Visualize Cypher Results : Gauge Chart

I took the same approach as with the bar chart above and looked for a Python library that would let me use a DataFrame as an input. Another one of my favorites is plotly.py.

What is Plotly.py?

plotly.py is an interactive, open-source, and browser-based graphing library for Python ✨ Built on top of plotly.js, plotly.py is a high-level, declarative charting library. plotly.js ships with over 30 chart types, including scientific charts, 3D graphs, statistical charts, SVG maps, financial charts, and more.

If you are using the docker containers I provided, you do not need to install it. However, if you want to install it in a different system, you can do it via pip as shown below:

pip install plotly

Next, how do we build this gauge chart?

Based on the title “Percentage of Users with a Path to Domain Admin”, we can use the following query to get similar results. I tweaked a query I found in the BloodHound Cypher Cheatsheet put together by Ryan Hausknecht:

MATCH (totalUsers:User {domain:'TOKYO.JAPAN.LOCAL'})
MATCH p=shortestPath((UsersWithPath:User {domain:'TOKYO.JAPAN.LOCAL'})-[r*1..]->(g:Group {name:'DOMAIN ADMINS@TOKYO.JAPAN.LOCAL'}))
WITH COUNT(DISTINCT(totalUsers)) as totalUsers, COUNT(DISTINCT(UsersWithPath)) as UsersWithPath
RETURN 100.0 * UsersWithPath / totalUsers AS percentUsersToDA

Once again, I use the run method from the Graph class and pass the query as a parameter, as shown below. Following the reference viz, we are talking about users from TOKYO.JAPAN.LOCAL:

users_to_da = g.run("""
MATCH (totalUsers:User {domain:'TOKYO.JAPAN.LOCAL'})
MATCH p=shortestPath((UsersWithPath:User {domain:'TOKYO.JAPAN.LOCAL'})-[r*1..]->(g:Group {name:'DOMAIN ADMINS@TOKYO.JAPAN.LOCAL'}))
WITH COUNT(DISTINCT(totalUsers)) as totalUsers, COUNT(DISTINCT(UsersWithPath)) as UsersWithPath
RETURN 100.0 * UsersWithPath / totalUsers AS percentUsersToDA
""").to_data_frame()
users_to_da

value_df = users_to_da['percentUsersToDA'].values[0]
value_df

Now, all I have to do is pass that percentage value of 13.58 to plotly:

import plotly.graph_objects as go

fig = go.Figure(go.Indicator(
    domain = {'x': [0, 1], 'y': [0, 1]},
    value = value_df,
    mode = "gauge+number",
    title = {'text': "Percentage of Users with a Path to Domain Admin"},
    gauge = {'axis': {'range': [None, 100]},
             'steps' : [{'range': [0, 250], 'color': "lightgray"}],
             'threshold' : {'line': {'color': "red", 'width': 4}, 'thickness': 0.75, 'value': 490}}))
fig.show()

That’s it! Again, only a few more lines of code.

What if I want to refresh the charts?

One of the main selling points of using PowerBI with BloodHound was the beta Neo4j connector for PowerBI, which lets you connect to your Neo4j database, run the cypher query, process the results as JSON and then update your charts with the new results. That sounds cool, but with notebooks, all you have to do to perform the same task is run the notebook again.

You can either do it manually (similar to the refresh button)

Or you can schedule your notebooks and run them programmatically with other libraries, saving the output every time the notebook runs for reporting purposes.
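
One common option for executing notebooks programmatically is papermill (my choice here, not something prescribed by the BloodHound or PowerBI workflow). A minimal sketch, assuming papermill is installed and the notebook from this post has been saved as bloodhound_charts.ipynb (a hypothetical filename):

import papermill as pm
from datetime import datetime

# Re-run the analysis notebook and keep a timestamped copy of the executed output for reporting
output_name = 'bloodhound_charts_{}.ipynb'.format(datetime.now().strftime('%Y%m%d_%H%M%S'))
pm.execute_notebook(
    'bloodhound_charts.ipynb',   # input notebook (hypothetical filename)
    output_name                  # executed copy with refreshed query results and charts
)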

What else? You can build BloodHound Playbooks!

What else can you do with notebooks and BloodHound data? You could either create your own cheatsheet with all your cypher queries in a more interactive way, or group specific cypher queries for specific operations (e.g. exploring Kerberoastable users). This can be very useful when you are trying to show others how to run cypher queries against BloodHound data, and even to teach red and blue teams specific steps during an engagement (training maybe? 😉).

Explore Kerberoastable Users

I was looking for specific workflows to document with notebooks, and I found Andy Robbins and Rohan Vazarkar’s presentation at Derbycon 2019 named “BloodHound head to tail”. I liked how they explained a few basic initial steps to find Kerberoastable users and the context around each query. Therefore, I took their queries and comments and created a notebook with them 🍻.

Import Libraries and Initialize Graph Class

from py2neo import Graph
g = Graph("bolt://206.189.85.93:7687", auth=("neo4j", "BloodHound"))

Count Users with Service Principal Name Set

According to the BloodHound team, when SharpHound finds a user with a Service Principal Name set, it sets the property named hasspn on the User node to True. Therefore, if we want to count the number of users with that property set, we just need to query for users with hasspn = True.

users_hasspn_count = g.run("""
MATCH (u:User {hasspn:true})
RETURN COUNT(u)
""").to_data_frame()
users_hasspn_count

You can then see the name of the users with the service principal name property set with the following query:

g.run("""
MATCH (u:User {hasspn:true})
RETURN u.name
""").to_data_frame()

Retrieve Kerberoastable Users with Path to DA

According to the BloodHound team, we can limit our results and return only Kerberoastable users with paths to DA. We can find Kerberoastable users with a path to DA and also see the length of each path to determine which one is the closest.

krb_users_path_to_DA = g.run("""
MATCH (u:User {hasspn:true})
MATCH (g:Group {name:'DOMAIN ADMINS@JAPAN.LOCAL'})
MATCH p = shortestPath(
(u)-[*1..]->(g)
)
RETURN u.name,LENGTH(p)
ORDER BY LENGTH(p) ASC
""").to_data_frame()
krb_users_path_to_DA

Return Most Privileged Kerberoastable users

What if we do not have Kerberoastable users with a path to DA? We can still look for the most privileged Kerberoastable users based on how many computers they have local admin rights on.

privileged_kerberoastable_users = g.run("""
MATCH (u:User {hasspn:true})
OPTIONAL MATCH (u)-[:AdminTo]->(c1:Computer)
OPTIONAL MATCH (u)-[:MemberOf*1..]->(:Group)-[:AdminTo]->(c2:Computer)
WITH u,COLLECT(c1) + COLLECT(c2) AS tempVar
UNWIND tempVar AS comps
RETURN u.name,COUNT(DISTINCT(comps))
ORDER BY COUNT(DISTINCT(comps)) DESC
""").to_data_frame()
privileged_kerberoastable_users
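
And since the result is already a DataFrame, nothing stops us from reusing the charting approach from earlier in the post. A minimal sketch (assuming the query columns come back named u.name and COUNT(DISTINCT(comps)), which I rename first so altair does not treat the dot as a nested field):

import altair as alt

# Rename the raw Cypher column names to something chart-friendly
df = privileged_kerberoastable_users.rename(
    columns={'u.name': 'UserName', 'COUNT(DISTINCT(comps))': 'AdminRightCount'})

alt.Chart(df, title="Most Privileged Kerberoastable Users").mark_bar().encode(
    x='AdminRightCount:Q',
    y=alt.Y('UserName:N',
            sort=alt.EncodingSortField(field='AdminRightCount', order='descending'))
)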

That’s how easy it is to interact with a BloodHound database with notebooks and a few lines of Python 😉 🍻

You are now ready to save the notebook and use it for either training or reporting or just to share your research with other BloodHound lovers around the 🌎.

Initial BloodHound Notebooks:

All the notebooks I put together for this post are available here. You can get a static view with the following links. Unfortunately, BinderHub denies outbound calls to port 7687 (maybe next time I can create a Mordor dataset for it to avoid those outbound calls 😉):

Future Work:

I hope you enjoyed this post! Remember, the goal is not to replace the BloodHound UI, but to complement the capabilities of the BloodHound stack from a data analysis and visualization perspective. Once again, notebooks open the doors to additional analytics workflows, alternative visualizations and more creative ways to interact with and process BloodHound graph data. Also, I believe this is the first time I have used a notebook to attack and secure Active Directory environments at the same time, all from the same tool 💜 Bringing purple team ideas together 😉!

References

https://py2neo.org/v4/

https://github.com/hunters-forge/notebooks-forge

https://www.youtube.com/watch?v=fqYoOoghqdE

https://blog.cptjesus.com/posts/introtocypher

https://github.com/BloodHoundAD/BloodHound

https://github.com/BloodHoundAD/BloodHound/wiki/Cypher-Query-Gallery

https://github.com/chryzsh/awesome-bloodhound

https://www.youtube.com/watch?v=o22EMEUbrNk

https://medium.com/neo4j/py2neo-v4-2bedc8afef2

https://neo4j.com/blog/charting-neo4j-3-0/

https://neo4j.com/developer/python/

https://posts.specterops.io/visualizing-bloodhound-data-with-powerbi-part-1-ba8ea4908422

https://posts.specterops.io/visualizing-bloodhound-data-with-powerbi-part-2-3e1c521fb7ae
