Threat Hunting in the cloud with Azure Notebooks: supercharge your hunting skills using Jupyter and KQL

Maarten Goet
Feb 20, 2019

Robert M. Lee has a great quote: “Threat hunting exists where automation ends”. Threat hunting is largely manual work, performed by SOC analysts trying to find a ‘needle in the haystack’. And in the case of cybersecurity, that haystack is a pile of ‘signals’.

These analysts often use separate tools for querying the data, manipulating the data set, reversing the potential malware, etcetera. What if we could provide an environment where you can perform all these tasks in context, and share the outcome with your team?

Azure Notebooks, with a little KQL magic sauce, is exactly that. Let’s supercharge your hunting skills with Azure, Jupyter, Python and KQL!

Kusto Query Language (KQL)

Kusto Query Language, or KQL for short, is the default way to work with data in Azure Data Explorer powered services such as Log Analytics, Azure Security Center, Azure Monitor and many more. It is a powerful yet easy to learn language.

Robert Cain, a Microsoft MVP, has written a 4-hour long course on Pluralsight that you can take for free, to learn the language all the way up to the advanced queries. KQL skills are something you’ll need if you do threat hunting in Azure; most of the security data will be in Log Analytics workspaces.

Jupyter

Jupyter Notebook, formerly called IPython Notebook, is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text written in Markdown. It is already broadly used in data science, and supports many programming languages such as R and Python. The multi-user version of Jupyter is called JupyterHub.

The cool thing is that you can share your notebook with others, and that you can produce interactive output using HTML etc. and display that through a so-called “presentation mode”. This makes it great for threat hunting and sharing signals within the SOC team.

On GitHub you’ll find ready-to-run Docker images containing Jupyter.

Azure Notebooks

Azure Notebooks is currently in public preview and is a free hosted service to develop and run Jupyter notebooks in the cloud with no installation. Although the service is free, each project is limited to 4GB memory and 1GB data to prevent abuse. Legitimate users that exceed these limits see a Captcha challenge to continue running notebooks.

However, if the Azure Active Directory account you sign in with is associated with an Azure subscription, you can connect to any Azure Data Science Virtual Machine (DSVM) instances within that subscription. DSVMs can be found in the Azure Marketplace. With these dedicated DSVMs you get more processing power and are no longer bound by those limits.

PRO TIP: You need to deploy the Ubuntu version of the DSVM. The Windows version of DSVM does not contain JupyterHub by default. The Ubuntu template of DSVM has an extra bonus: it will open up the right ports by default in your NSG!

Azure Notebooks also lets you share your notebooks through GitHub.

Pandas, KQLMagic and other libraries

One of the things you will find out early using Jupyter is that you will want to manipulate data. This is where a library called Pandas comes in. Pandas is an open source Python framework, maintained by the PyData community and mostly used for Data Analysis and Processing.

Another library you will need is KQLMagic. It was written by Michael Binshtock, who works at Microsoft, and allows you to directly query Log Analytics-based workspaces in Azure, for instance when working with data in Azure Monitor, Azure Security Center, etcetera. The great thing about this library is that it uses the Kusto Query Language (KQL), which means that you can use your favorite KQL queries directly in Jupyter.

The big picture

Putting all the pieces together you get something like this:

Real-world threat hunting

Let’s look at a real-world example. In this case we have a number of virtual machines running in Microsoft Azure, and Azure Security Center is turned on at the subscription level to capture relevant security events.

We’re suspicious of a machine called APPSERVER, based on an Alert we got from Azure Security Center, and want to do some investigation.

We go to Azure Notebooks and log in:

We create a new Project called ‘Threat Hunting’:

We create a new Notebook called ‘Azure threat hunting’:

We will use the Free Compute option and open the notebook:

The docker image that the Free Compute option provides *already contains* the Kqlmagic library that we will need. If you’re using a dedicated DSVM, or are running Jupyter locally, you should run the install command to get the library installed:

!pip install Kqlmagic --no-cache-dir --upgrade
%reload_ext Kqlmagic
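If the same notebook has to run both on Free Compute and on a dedicated DSVM, it can help to check whether the library is already present before installing it. The following is a minimal sketch using only the Python standard library; the helper name `is_installed` is made up for this example.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# In a notebook cell you could then install only when needed, for example:
#   if not is_installed("Kqlmagic"):
#       !pip install Kqlmagic --no-cache-dir --upgrade
print(is_installed("json"))  # json ships with Python itself
```

The check only tells you that the module can be found, not which version it is, so the `--upgrade` install is still the way to go when you need the latest release.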

PRO TIP: The Free Compute is a docker container in a shared compute environment and therefore it will take a couple of minutes before the library loads. You can look ‘behind the scenes’ by using the Terminal button:

Through the terminal window you can issue commands such as ‘ps’ or ‘top’:

Now we will need to authenticate to the Log Analytics workspace we will be using. In this case we will be connecting to Azure Security Center:

%kql loganalytics://tenant='<tenant-id>';clientid='<aad-appid>';clientsecret='<aad-appkey>';workspace='<workspace-id>';alias='<workspace-friendly-name>'
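Since you will probably share the notebook, you may not want the client secret typed into a cell. One possible approach, sketched below, is to assemble the connection string from environment variables. The function name and the `HUNT_*` variable names are assumptions made for this example; only the `loganalytics://` format itself comes from the line above.

```python
import os


def build_connection_string(tenant, client_id, client_secret, workspace, alias):
    """Assemble a loganalytics:// string in the format shown above."""
    fields = [
        ("tenant", tenant),
        ("clientid", client_id),
        ("clientsecret", client_secret),
        ("workspace", workspace),
        ("alias", alias),
    ]
    return "loganalytics://" + ";".join(f"{key}='{value}'" for key, value in fields)


# Hypothetical environment variable names; the fallbacks are the placeholders
# from the text, so the sketch also runs without any variables set.
conn = build_connection_string(
    tenant=os.environ.get("HUNT_TENANT_ID", "<tenant-id>"),
    client_id=os.environ.get("HUNT_CLIENT_ID", "<aad-appid>"),
    client_secret=os.environ.get("HUNT_CLIENT_SECRET", "<aad-appkey>"),
    workspace=os.environ.get("HUNT_WORKSPACE_ID", "<workspace-id>"),
    alias=os.environ.get("HUNT_WORKSPACE_ALIAS", "<workspace-friendly-name>"),
)

# Inside a notebook with Kqlmagic loaded, the string can be handed to the magic:
#   get_ipython().run_line_magic("kql", conn)
```

`run_line_magic` is the standard IPython way to call a line magic from Python code, so the secret never has to appear in the notebook itself.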

We can now run KQL queries to look at the data being captured by Azure Security Center for this machine. In this case we’ll have a look at the network connections it has:

%kql VMConnection | where Computer == 'APPSERVER'

PRO TIP: While in the KQL query interface in Azure you’ll be using the double quote character for specifying input, you’ll be using the single quote in Jupyter. Make sure to change your queries so that they work properly in Jupyter.

If you want to spread a query over multiple lines to make it more readable, you need to use a double % (%%kql). As our application server is in The Netherlands, I will apply a filter and only show the connections that are going to IP addresses that are outside of our country:

%%kql
VMConnection
| where Computer == 'APPSERVER'
| where Direction == 'outbound'
| where RemoteCountry != 'Netherlands'

In the next cell, convert the result to a Pandas DataFrame:

connections = _.to_dataframe()
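When you hunt across several machines you may want to reuse this query with a different computer name or home country. A rough sketch of a query builder follows; `build_outbound_query` is a hypothetical helper, and to stay simple it refuses values that would need escaping rather than trying to escape them.

```python
def build_outbound_query(computer: str, home_country: str) -> str:
    """Return KQL listing outbound connections that leave home_country."""
    for value in (computer, home_country):
        # Keep the sketch simple: refuse values that would need escaping.
        if "'" in value or "\\" in value:
            raise ValueError(f"unsupported character in {value!r}")
    return "\n".join([
        "VMConnection",
        f"| where Computer == '{computer}'",
        "| where Direction == 'outbound'",
        f"| where RemoteCountry != '{home_country}'",
    ])


query = build_outbound_query("APPSERVER", "Netherlands")
print(query)
```

In a notebook with Kqlmagic loaded, `get_ipython().run_cell_magic("kql", "", query)` should submit the generated text in the same way as a `%%kql` cell.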

Let’s see if any of these IP addresses match a TOR node. There is a current list of TOR nodes and their IP addresses at https://www.dan.me.uk/torlist. We can load that into our notebook using Pandas:

import pandas as pd
torlist = pd.read_csv('https://www.dan.me.uk/torlist', header=None, names=["DestinationIp"])
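To see how the list ends up in a DataFrame without hitting the website, here is a small offline sketch. It assumes the list is plain text with one address per line and no header row; the sample addresses are made up and come from the documentation-only ranges of RFC 5737. With `header=None` pandas treats the first line as data, so the first address is not lost.

```python
import io

import pandas as pd

# Made-up sample using documentation-only addresses (RFC 5737).
sample = "192.0.2.10\n198.51.100.7\n192.0.2.10\n"

# header=None: the file has no header row, so every line is an address.
torlist = pd.read_csv(io.StringIO(sample), header=None, names=["DestinationIp"])

# Remove repeated addresses so a later join does not duplicate rows.
torlist = torlist.drop_duplicates().reset_index(drop=True)
print(len(torlist))
```

Naming the column `DestinationIp` is what makes the comparison with the connection data straightforward, because both frames then share a column to join on.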

The next step is to compare the two lists to see if there are any matches:

connections.merge(torlist, on="DestinationIp")
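The merge is an inner join by default, so only connections whose destination appears in the TOR list survive. The sketch below shows this on made-up sample data (the addresses are from documentation-only ranges and the rows are purely illustrative), and also shows a left join with `indicator=True` in case you would rather label every connection than drop the non-matches.

```python
import pandas as pd

# Illustrative stand-ins for the real query result and the real TOR list.
connections = pd.DataFrame({
    "Computer": ["APPSERVER", "APPSERVER", "APPSERVER"],
    "DestinationIp": ["203.0.113.5", "198.51.100.7", "203.0.113.5"],
    "DestinationPort": [443, 9001, 443],
})
torlist = pd.DataFrame({"DestinationIp": ["198.51.100.7", "192.0.2.10"]})

# Inner join (the default): keep only connections whose destination is listed.
matches = connections.merge(torlist, on="DestinationIp")

# Left join with indicator=True labels every connection instead of dropping it.
labelled = connections.merge(torlist, on="DestinationIp", how="left", indicator=True)
labelled["IsTor"] = labelled["_merge"] == "both"
print(matches)
```

An empty `matches` frame means none of the destinations is on the list at the moment you downloaded it; it does not prove the machine is clean.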

Another great way of visualizing your data is taking the Longitude and Latitude points from the KQL query and putting them on a world map. Add the following lines to your KQL query:

| distinct RemoteLongitude, RemoteLatitude
| project RemoteLongitude, RemoteLatitude

And use the following Python code in Jupyter:

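!conda install basemap -y

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

# locations: DataFrame holding the result of the KQL query above
# Mercator world map, cropped to 80 degrees north and south
map = Basemap(projection='merc', llcrnrlat=-80, urcrnrlat=80,
              llcrnrlon=-180, urcrnrlon=180, lat_ts=20, resolution='c')
map.drawcoastlines()

# Convert longitude/latitude to map coordinates and plot each point as a blue dot
lons = locations.RemoteLongitude
lats = locations.RemoteLatitude
x, y = map(lons, lats)
map.plot(x, y, 'bo', markersize=10)
plt.show()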
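The plotting code expects a `locations` DataFrame with the two coordinate columns. If your `connections` DataFrame already contains those columns, you could derive `locations` in Pandas instead of running a second query. This is a sketch on made-up coordinates, not output from a real workspace:

```python
import pandas as pd

# Illustrative sample; real values would come from the VMConnection query.
connections = pd.DataFrame({
    "DestinationIp": ["203.0.113.5", "198.51.100.7", "203.0.113.5"],
    "RemoteLongitude": [-77.0, 13.4, -77.0],
    "RemoteLatitude": [38.9, 52.5, 38.9],
})

# Pandas counterpart of "| distinct RemoteLongitude, RemoteLatitude":
# keep the two columns, drop rows without coordinates, remove duplicates.
locations = (
    connections[["RemoteLongitude", "RemoteLatitude"]]
    .dropna()
    .drop_duplicates()
    .reset_index(drop=True)
)
print(locations)
```

Dropping rows without coordinates first avoids passing missing values to the map projection.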

From this point on you’ll likely want to do some more investigation and assess whether or not there is a real threat. Use your own hunting skills for that ;-)

Sharing your findings

A unique feature of Jupyter is the Presentation mode. It allows you to easily share key items from your notebook with other people in a visually friendly way, without having to copy/paste data to another application.

You can use Markdown text to annotate your notebook. Enable the Slide picker by going to the View menu, Cell Toolbar, then Slide Show. Then go to any cell and, on its right-hand side, choose whether to skip it, make it part of a slide, etcetera.

Lastly, click on ‘Enter/Exit RISE Slideshow’ to share your findings:

I’ve published this Jupyter notebook on my GitHub repository. Another great thing about Azure Notebooks is that you can clone any repository and turn it into a Jupyter project:

Other examples

John Lambert, distinguished engineer at Microsoft’s Threat Intelligence Center, has some other great examples on threat hunting with Jupyter which he has shared here:

* Analyze shellcode payloads
* Malware analysis
* Decoding malicious PowerShell from the safety of an Azure Linux VM running PsXray

There is also a sample notebook on MyBinder that shows you step-by-step which Kqlmagic commands are available, and how to use them.

Conclusion

Jupyter is a great platform for threat hunting. You can work with data in-context and natively connect to security backends in Microsoft Azure using Kqlmagic.

Best of all, using Azure Notebooks and Azure Security Center, we didn’t spend a dollar and got our threat hunting platform for free :-)

Start learning KQL, Python and Jupyter today and supercharge your hunting skills!

Thank you

A big thank you goes out to Michael Binshtock, John Lambert, Giovanni Lanzani and Paul Shealy. They have been invaluable in providing support and background information while I wrote this article.

Happy hunting!

— Maarten Goet, MVP & RD
