Exploring Network Traffic and Anomalies through Interactive 3D Surface Graph Visualization in Python

Marcell Ujlaki
5 min read · Jan 19, 2023


Hi,

This is my first blog post, in which I'd like to share my experience with data visualization: my journey of visualizing network and anomaly data using Python. I used Plotly this time, though I have heard that Matplotlib may be a good option too, so who knows, maybe next time I will use that.

I believe that data visualization can greatly enhance our understanding of our environment and aid in threat hunting. A 3D surface graph provides a comprehensive view of network and anomaly data: it lets us spot patterns, trends, and anomalies that can help prevent cyber threats, and Python makes the data processing and visualization efficient. Okay, enough of this intro bullshit…

TL;DR

Instead of visualizing summarized transmitted network data, focus on visualizing anomaly scores. In the following sections of this blog, I will showcase three graphs:

  • A graph displaying a summary of request and response sizes for test data over the past 15 minutes, with a 1-minute span.
  • A graph showing data from the past 7 days, with a 1-hour span, generated by the 192.168.0.0/24 subnet.
  • A 3D visualization of machine-learning-based anomaly data in Python (I cannot give it a more buzzwordy title 😃).

The test phase

Initially, I experimented with a small amount of data for testing purposes. The data was sourced from Splunk and covered the past 15 minutes: I collected proxy logs and summarized the request and response traffic sizes over a 1-minute span, converting the values to megabytes. Below are the example Splunk searches I used:

index=proxy source=proxy sourcetype="proxy"
| eval responsesize_mb = round(responsesize/1024/1024,2)
| timechart span="1m" sum(responsesize_mb) by clientip useother=f

index=proxy source=proxy sourcetype="proxy"
| eval requestsize_mb = round(requestsize/1024/1024,2)
| timechart span="1m" sum(requestsize_mb) by clientip useother=f
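These timechart searches produce one row per time bucket and one column per clientip, while the plotting code further below expects the transposed layout (hosts as rows, time buckets as columns). Here is a hypothetical sketch of that reshaping; the export file name and the _time index column are assumptions about Splunk's CSV export:

import pandas as pd

# Hypothetical cleanup of the Splunk timechart export: rows are _time buckets
# and columns are client IPs, so fill the gaps with zero and transpose to get
# hosts as rows and time buckets as columns (the layout the plot code expects)
raw = pd.read_csv('splunk_request_export.csv', index_col='_time')  # assumed name
raw.fillna(0).T.to_csv('request.csv')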

I exported the results from Splunk and reshaped the CSV along those lines, as the raw output was not yet suitable for visualization. In my code, I lifted the response graph above the requests (for easier viewing) by adding a constant value. This may not be the best practice, but it works for now; a cleaner offset variant follows the script below. The Python code I used:

import plotly.graph_objects as go
import pandas as pd

# Hosts as rows, 1-minute time buckets as columns (see the reshaping sketch above)
z_data_request = pd.read_csv('request.csv', index_col=0)
z_data_response = pd.read_csv('response.csv', index_col=0)

# Lift the response surface above the requests so the two don't overlap
z_data_response_pushed = z_data_response + 4000

fig = go.Figure(data=[
    go.Surface(z=z_data_request.values, name='request'),
    go.Surface(z=z_data_response_pushed.values, showscale=False, opacity=0.5, name='response')])
fig.update_traces(showlegend=True,
                  showscale=False,
                  contours_z=dict(show=True, usecolormap=True,
                                  highlightcolor="limegreen", project_z=True))
fig.update_layout(title='test',
                  autosize=False,
                  scene_camera_eye=dict(x=1.87, y=0.88, z=-0.64),
                  scene=dict(
                      xaxis=dict(
                          title='time',
                          nticks=5,
                          ticktext=z_data_request.columns,
                          tickvals=list(range(0, z_data_request.shape[1]))),
                      yaxis=dict(
                          title='hosts',
                          nticks=5,
                          ticktext=z_data_request.index,
                          tickvals=list(range(0, z_data_request.shape[0]))),
                      zaxis=dict(
                          title='MB/minute'),
                  ),
                  width=1000, height=1000,
                  margin=dict(l=65, r=50, b=65, t=90))
fig.show()
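As mentioned above, a cleaner alternative to the hard-coded 4000 is to stack the response surface on the actual peak of the request surface. Note the .values.max(): z_data_request.max() alone returns a per-column Series, which would shift each column by a different amount:

# Offset the response surface by the request surface's global peak instead
# of a magic constant, so the two never overlap regardless of the data scale
z_data_response_pushed = z_data_response + z_data_request.values.max()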

And the result:

More data needed

After conducting tests, I wanted to gain a broader understanding of the data, so I analyzed the proxy logs over a 7-day period with a 1-hour span, focusing specifically on the 192.168.0.0/24 subnet. To accomplish this, I used the following Splunk search:

index=proxy source=proxy sourcetype="proxy" clientip=192.168.0.0/24
| eval requestsize_mb = round(requestsize/1024/1024,2)
| timechart span="1h" sum(requestsize_mb) by clientip useother=f
| fillnull

Only a little code modification was required: the data source changed and a few names had to be updated.

As we can observe from this graph, the request traffic peaks are clearly visible. By narrowing down the host information and performing some baselining, valuable insights can be extracted from it (a toy sketch of this follows the result below). The code itself:

import plotly.graph_objects as go
import pandas as pd

# 7 days of request traffic for the 192.168.0.0/24 subnet:
# 1-hour buckets as rows, hosts as columns
z_data_request = pd.read_csv('subnet_7day.csv', index_col=0)

fig = go.Figure(data=[
    go.Surface(z=z_data_request.values, name='request')])
fig.update_traces(showlegend=True,
                  showscale=False,
                  contours_z=dict(show=True, usecolormap=True,
                                  highlightcolor="limegreen", project_z=True))
fig.update_layout(title='test_data_subnet_7day',
                  autosize=False,
                  scene_camera_eye=dict(x=1.87, y=0.88, z=-0.64),
                  scene=dict(
                      xaxis=dict(
                          title='hosts',
                          nticks=5,
                          ticktext=z_data_request.columns,
                          tickvals=list(range(0, z_data_request.shape[1]))),
                      yaxis=dict(
                          title='time',
                          nticks=5,
                          ticktext=z_data_request.index,
                          tickvals=list(range(0, z_data_request.shape[0]))),
                      zaxis=dict(
                          title='MB/hour')),  # 1-hour span, so MB per hour
                  width=1000, height=1000,
                  margin=dict(l=65, r=50, b=65, t=90))
fig.show()

The result:
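Before moving on to the anomaly data, a quick note on the baselining idea mentioned above. A minimal, hypothetical sketch in pandas: flag the hours where a host's traffic sits far above its own 7-day average. The 3-sigma threshold is an arbitrary assumption; the layout matches the CSV used above:

import pandas as pd

# Toy baselining sketch: rows are 1-hour buckets, columns are hosts,
# matching the subnet_7day.csv layout used above
z = pd.read_csv('subnet_7day.csv', index_col=0)

baseline = z.mean()              # per-host average MB/hour
spread = z.std().replace(0, 1)   # avoid division by zero for flat hosts
zscores = (z - baseline) / spread

# Report the (hour, host) pairs more than 3 standard deviations above baseline
outliers = zscores[zscores > 3].stack()
print(outliers.sort_values(ascending=False).head(10))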

Visualizing anomalies

Finally, we have reached the main focus of the article. The initial results did not meet my expectations, so I shifted my attention to Microsoft 365 Defender to gather anomaly data from the past 30 days. I created a series of aggregated values from the DeviceLogonEvents successful logon events (with make-series) and applied a machine learning anomaly detection technique based on series decomposition to identify and score anomalous points. The query I used:

let fromdate = ago(30d);
let enddate = now();
DeviceLogonEvents
| where ActionType == "LogonSuccess"
| make-series ActionType = count() on Timestamp from fromdate to enddate step 1d by DeviceName
| extend (flag, score, baseline) = series_decompose_anomalies(ActionType)
| sort by DeviceName asc
| project DeviceName, score

It was clear that the data required some manual adjustments. After formatting the CSV as necessary, our data looks like this:
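For reference, here is a hypothetical sketch of that formatting step. The export file name and the exact column layout are assumptions; the KQL above returns the 30 daily scores packed into a single array-valued score column per device:

import ast
import pandas as pd

# Hypothetical reshaping of the exported KQL result: one row per DeviceName,
# with the daily anomaly scores packed into one array-valued 'score' column
raw = pd.read_csv('m365d_anomaly_export.csv')    # assumed export file name
scores = raw['score'].apply(ast.literal_eval)    # '[0.1, 2.3, ...]' -> list
wide = pd.DataFrame(scores.tolist(), index=raw['DeviceName'])
wide.to_csv('modded_m365d_anomaly.csv')          # hosts as rows, days as columns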

I made only minimal code modifications: I added two values to fix the z-axis scale of the graph, since the scores include negative values as well. Note the chained .min().min(): a single DataFrame.min() returns one value per column, and the second call collapses those into a single number:

minvalue = z_data_request.min().min()
maxvalue = z_data_request.max().max()

[…]

zaxis = dict(
    title = 'how far from the baseline',
    range = [minvalue, maxvalue]),

The full script:

import plotly.graph_objects as go
import pandas as pd

# Anomaly scores: hosts as rows, 30 daily buckets as columns
z_data_request = pd.read_csv('modded_m365d_anomaly.csv', index_col=0)

# Global min/max across the whole frame (scores can be negative),
# used to pin the z-axis range
minvalue = z_data_request.min().min()
maxvalue = z_data_request.max().max()

fig = go.Figure(data=[
    go.Surface(z=z_data_request.values, name='logon')])
fig.update_traces(showlegend=True,
                  showscale=False,
                  contours_z=dict(
                      show=True,
                      usecolormap=True,
                      highlightcolor="limegreen",
                      project_z=True))
fig.update_layout(title='test_data_m365d_anomaly',
                  autosize=True,
                  scene_camera_eye=dict(x=1.87, y=0.88, z=-0.64),
                  scene=dict(
                      xaxis=dict(
                          title='time',
                          nticks=50,
                          ticktext=z_data_request.columns,
                          tickvals=list(range(0, z_data_request.shape[1]))),
                      yaxis=dict(
                          title='hosts',
                          nticks=50,
                          ticktext=z_data_request.index,
                          tickvals=list(range(0, z_data_request.shape[0]))),
                      zaxis=dict(
                          title='how far from the baseline',
                          range=[minvalue, maxvalue]),
                  ),
                  width=1000, height=1000,
                  margin=dict(l=65, r=50, b=65, t=90))
fig.show()

And the result:

The anomaly data clearly demonstrates its potential: it can be used to kick off a threat hunting process. For example, it raises questions such as: why is there a significant peak for one host, with a score value of around 28k? What caused this spike?
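To put a name to that spike before pivoting into the hunt, pandas can point straight at the offending cell. A minimal sketch against the same CSV used for the plot:

import pandas as pd

# Locate the (host, day) cell carrying the highest anomaly score in the
# reshaped CSV plotted above
z = pd.read_csv('modded_m365d_anomaly.csv', index_col=0)
host, day = z.stack().idxmax()
print(f'peak score {z.loc[host, day]:,.0f} on day {day} for {host}')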

I hope you liked it. Bye!
