Plotting Hackers: Visualizing Attack Patterns

12 min read · Dec 27, 2016

I’ve fancied myself more of a command-line application developer for quite a while now. Recently, though, I’ve had a lot of fun visualizing data with web applications, which can provide interactive tools and new ways to digest and analyze the information detailed in log files.

Generally, use cases like system metrics, financial applications and GUI network tools get graphical representations in pie charts, bar graphs and line graphs: stuff that’s nice and fancy to show off.

Well, I really wanted to apply those use cases to attack metrics on applications, meaning malicious events inside of log files, and visualize those to see if I could get a new understanding of threats on my network.

So, I figured I’d share how I do that using a very simple Ruby-based web stack that — yes — doesn’t have anything to do with Rails.

To handle the stuff I wouldn’t want to write myself, I use Sinatra, which provides an extremely elegant DSL for quickly creating web applications in Ruby with minimal effort. I use Bootstrap for the front-end to make things responsive and pretty tolerable out of the box, and then I couple that with Chartkick to generate beautiful JS charts with one-liners like this one, which renders a pie chart:

<%= pie_chart(@data) %>

And since we’re building a web application, we can easily build tools to interact with and visualize the data in different ways. In fact, any way you can come up with ( or that is provided to you in whatever APIs ).

The data I’ll be talking about is data that can be parsed from logs, something like an SSH log or an FTP log. In this case, we’ll be focusing on malicious login attempts from a sample log that I’ve made up. I’d like to note that these IPs and data have been randomly generated, so they’re not real attackers. The log reflects an admittedly rather custom application log you may find in the real world.

We can also work under the assumption that the log has already been grep’d for failed login attempts, so we can focus on those.

Parsing

Logs, especially custom ones, can be rather tricky to sort through. There may be very little consistency. Depending on the quality of the logs you come across, you may have varying degrees of difficulty properly parsing information out of them.

Grep and awk can be your friends to help pre-parse the log data you’re interested in visualizing, if you’re into that.

As a rule of thumb, I would suggest trusting as little information as possible from the log contents, and focusing solely on the placement of the information within each line of the log. Logs are typically computer generated, though they can contain information that users themselves can inject to mess with sillier forms of parsing or weak regular expressions.

In some cases, you may also come across logs where a single entry actually spans multiple lines, and so the way you approach extracting information from that log may change.
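As a rough illustration of the multi-line case, here’s a minimal sketch that folds continuation lines back into the entry they belong to. It assumes, purely for the example, that every new entry starts with a YYYY-MM-DD HH:MM:SS timestamp; the group_records name and the sample lines are made up.

```ruby
# Assumption: a new entry always begins with a "YYYY-MM-DD HH:MM:SS" timestamp;
# any other line is treated as a continuation of the previous entry.
RECORD_START = /\A\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} /

# Hypothetical helper: fold continuation lines into the entry they follow.
def group_records(lines)
  lines.map(&:chomp)
       .slice_before { |line| line =~ RECORD_START }
       .map { |chunk| chunk.join("\n") }
end

# Invented sample: the middle entry spans two lines.
sample = [
  "2016-12-22 05:12:43 | W | Failed login 'admin' from 192.168.1.1",
  "2016-12-22 05:12:44 | E | Unhandled exception",
  "    at login_handler",
  "2016-12-22 05:12:45 | W | Failed login 'root' from 192.168.1.1"
]

puts group_records(sample).length # prints 3
```

Once the entries are regrouped like this, the rest of the parsing can go back to treating each entry as one unit.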

Before you start parsing information, be sure to go over some of the information by hand ( or with tools like grep and awk ) to get an understanding of the general layout of the log format you’re going to work with.

I could probably write a lot more about some methods I’ve found to parse logs successfully — or badly, or both. But, anyway:

In our case, the sample log I provided has the following layout:

2016-12-22 05:12:43 | W | Failed login 'admin' from 192.168.1.1

We can see that for failed login attempts, the IP address is at the end of the line. The time of the event is at the beginning of the line; and after that is the type of log message that it happens to be, which is a W. We can assume — since I made this — the W means “Warning”.

The actual username that failed to log in is kind of oddly placed, with single quotes for delimiters. So, if we filter for information that matches that layout, we can pretty easily parse the line based on position. I’ll touch on some issues you could run into later, perhaps… anyway:

Re-using logic can be helpful. From the sample log, there’s not a whole lot going on. We know they’re all warnings. We know they all contain the username that’s been tried, the IP address of that attempt, and the time of the event. So, we have three main characteristics we can derive from each line.
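To make the position-based idea concrete, here’s a small sketch that pulls those three characteristics out of one line using nothing but string positions. The split_failed_login name is hypothetical, and the second sample line is an invented injection attempt where the “username” tries to smuggle in a fake IP.

```ruby
FAILED_PREFIX = "Failed login '"

# Hypothetical helper: split one failed-login line into time, username and IP
# by position only, without trusting what the username contains.
def split_failed_login(line)
  time, level, message = line.chomp.split(' | ', 3)
  return nil unless level == 'W' && message

  # The IP is whatever follows the LAST " from " on the line.
  head, separator, ip = message.rpartition(' from ')
  return nil if separator.empty?
  return nil unless head.start_with?(FAILED_PREFIX) && head.end_with?("'")

  # Everything between the prefix and the closing quote is the username.
  { time: time, username: head[FAILED_PREFIX.length..-2], ip: ip }
end

normal   = "2016-12-22 05:12:43 | W | Failed login 'admin' from 192.168.1.1"
injected = "2016-12-22 05:12:43 | W | Failed login 'x' from 10.0.0.1' from 192.168.1.1"

split_failed_login(normal)[:ip]         # => "192.168.1.1"
split_failed_login(injected)[:ip]       # => "192.168.1.1"
split_failed_login(injected)[:username] # => "x' from 10.0.0.1"
```

Because the IP is taken from the end of the line, the injected “from 10.0.0.1” stays inside the username where it belongs.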

Build’a Parser

It’s not like you’re going to be handed a handy-dandy log-parsing tool that’s going to fit your needs every time. If you can build your own — if you have the time — then, you can take advantage of managing how you work with the information in logs without being limited.

We can build our own custom API to help us work with our data. We can store it however or wherever we want to. I choose to just shove everything into a hash very easily for this example. Then shove all those hashes into an array.

The parser class treats parsing in two forms: it has a parse_file() method and a parse_line() method, which, as you would expect, parse a file and a single line in that file respectively.

Parsing a line results in a key/value pairing that’s really easy to “query” to build up other methods such as usernames(), ip(), and times().

Or whatever you’d like to create — once you’ve put that relevant information in the proper container.
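Here’s a minimal sketch of what a class like that could look like. Only the method names come from the description above; the LogParser class name, the regular expression and the hash keys are my assumptions for illustration, not the exact code behind the screenshots.

```ruby
require 'time'

# A hypothetical parser class; only the method names come from the text above.
class LogParser
  # Anchored at both ends: the timestamp must lead the line and the IP must
  # end it. The greedy (.*) lets a username contain quotes or " from ".
  FAILED_LOGIN = /\A(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \| W \| Failed login '(.*)' from (\S+)\z/

  attr_reader :events

  def initialize
    @events = [] # one hash per parsed line
  end

  # Parse every line of the file at path; lines that don't match are skipped.
  def parse_file(path)
    File.foreach(path) { |line| parse_line(line) }
    self
  end

  # Parse one line into a hash, store it, and return it (nil if no match).
  def parse_line(line)
    match = FAILED_LOGIN.match(line.chomp)
    return nil unless match

    event = {
      time: Time.strptime(match[1], '%Y-%m-%d %H:%M:%S'), # read as local time
      username: match[2],
      ip: match[3]
    }
    @events << event
    event
  end

  def usernames
    @events.map { |event| event[:username] }.uniq
  end

  def ip
    @events.map { |event| event[:ip] }.uniq
  end

  def times
    @events.map { |event| event[:time] }
  end
end

parser = LogParser.new
parser.parse_line("2016-12-22 05:12:43 | W | Failed login 'admin' from 192.168.1.1")
parser.usernames # => ["admin"]
parser.ip        # => ["192.168.1.1"]
```

Keeping every event as a plain hash in a plain array means the “query” methods are just one-line map calls.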

Feel free to build your custom parsing logic whichever way pleases you the most or works for your workflow the best.

The behavior you want and the way you contain your data may make some processes easier than others. I like keeping things simple, if I can.

Extending the Parser

So, let’s try to keep things simple and extend our current parser to take advantage of Chartkick’s timeline. We need to get our data in the proper form to be digested correctly by Chartkick.
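A timeline() method along these lines is one way to get there. This is a sketch under assumptions: the events are taken to be an array of hashes with :ip and :time keys ( my assumed container ), and the output is one [label, start, end] row per IP, which, as far as I know, is the row shape Chartkick’s timeline chart takes.

```ruby
require 'time' # for Time#iso8601

# Hypothetical timeline() helper.
# Input:  an array of hashes shaped like { ip: String, time: Time }.
# Output: one [ip, first_seen, last_seen] row per IP, as ISO 8601 strings.
def timeline(events)
  events.group_by { |event| event[:ip] }.map do |ip, hits|
    times = hits.map { |event| event[:time] }
    [ip, times.min.iso8601, times.max.iso8601]
  end
end

# Invented sample events.
events = [
  { ip: '10.0.0.1', time: Time.utc(2016, 12, 22, 5, 12, 43) },
  { ip: '10.0.0.1', time: Time.utc(2016, 12, 25, 9, 0, 0) }
]

timeline(events)
# => [["10.0.0.1", "2016-12-22T05:12:43Z", "2016-12-25T09:00:00Z"]]
```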

This gives us a representation of our parsed data with — hopefully — the proper information to use Chartkick’s timeline feature! The output of the timeline() method would look something like this with pry:

Having a flexible container for data like arrays and hashes can make re-organizing data easy.

Build’a Web App

Building a web application with Sinatra is incredibly straightforward. In fact, let’s get started with the following:

To install the sinatra gem: $ gem install sinatra

We can start running this application from the command-line:

$ ruby simple_sinatra_app.rb
== Sinatra (v1.4.6) has taken the stage on 4567
Thin web server (v1.7.0 codename Dunder Mifflin)
Maximum connections set to 1024
Listening on localhost:4567, CTRL+C to stop

So, if we were to open up our web browser and go to localhost:4567 we would find our application there, hopefully working like so:

Not too shabby! But, not too pretty, eh?

Bootstrap’n Sinatra

Sinatra can be used pretty easily like an MVC framework. We can edit the views ( the way stuff looks; the front-end ) in all sorts of ways with Sinatra. Inline, with templates, without templates, whatever you like… Let’s build a really simple Bootstrap template to use for Sinatra!

I’ve actually made some boiler-plate code that I use from time-to-time for Bootstrap 3.3.7 and Sinatra. But, the gist of it is:

We need to make a new directory in the same folder as our Sinatra application called “views” — this, quite obviously, will contain the templates ( or “views” ) we’ll use for our application. There are lots of template syntax options available.

For this use case, we’ll use erb. It looks a lot like HTML — because it basically is with some ruby mixed in. The directory structure should look like this for our web app:

web_app_dir/
├── app.rb
└── views
    └── layout.erb

1 directory, 2 files
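To give a feel for erb syntax before wiring it into Sinatra, here’s a tiny standalone sketch using the ERB library that ships with Ruby, rather than Sinatra’s own rendering. The template and the render_usernames helper are invented for illustration, and the html_escape call is my own precaution, since usernames pulled out of a log are attacker-controlled.

```ruby
require 'erb'

# HTML with Ruby mixed in: <% %> runs code, <%= %> prints a value.
LIST_TEMPLATE = <<~HTML
  <ul>
  <% usernames.each do |name| %>
    <li><%= ERB::Util.html_escape(name) %></li>
  <% end %>
  </ul>
HTML

# Hypothetical helper: render the list for the given usernames.
def render_usernames(usernames)
  ERB.new(LIST_TEMPLATE).result(binding)
end

puts render_usernames(['admin', '<b>root</b>'])
```

The second username comes out as escaped text instead of bold markup, which is what you want when the data comes from someone trying to break in.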

The layout.erb template will be applied to all of the web pages for our application. So, let’s use that to figure out the HTML, CSS and JS we want for pretty much every page.

At this point you could go download Bootstrap and Chartkick, make a new directory called public for Sinatra, and put all of the JS and CSS you want to include in there, so it’s served from the same local server as your pages.

But, nah. I ain’t about that fam. Yolo. Bruh. I’m hip. We’ll use a CDN to make things easy to build something like this:

We can also create another erb template for our application — this time, how about for our index/home page? We can make things look pretty professional pretty easily — all while being pretty, well… pretty!:

So, we should have a directory structure that looks like this now:

web_app_dir/
├── app.rb
└── views
    ├── index.erb
    └── layout.erb

1 directory, 3 files

We now have to update our app.rb file to know to serve up the erb template for our index page and to include the Chartkick gem to use it:

To install the chartkick gem: $ gem install chartkick

When we spin that bad-boy up, we get something neat looking:

Now, let’s start getting some of that data all visualized and what-not!

Put a Parser in a Web App and Shake it All Up!

Since we’ve already written a parser, and a web application template, we just need to put them together with a little more magic. We could have an upload log feature here or something if we wanted to be super user-friendly to ourselves, but I really want to just get into the data, ya’ know? That’s the point after all! I just like it when things look nice.

Timeline

Since we’ve already written the timeline method to use for Chartkick, we edit our app.rb file to look like so to make our timeline feature work:

We also need to add a timeline.erb file into our views directory now:

When we spin up our web application again and go to /timeline, we’ll see the first fruits of our parsing labor!

This kind of timeline is really useful for comparing how long an attacker has been around compared to others.

Maybe we want to boil that down a little further with something like a line graph to see when an attacker tried to get in, how many times, and what that looks like one IP address at a time.

Well, let’s add a targeting option to look at the timeline for a specific IP address using a line graph.

Line Graph

To add the feature to target a specific IP address, let’s add a form to the timeline page by modifying the erb template like so:

We’ve basically just put a form with an input box right above the graph. When a user posts a single IP address, we’re going to want to show them a single timeline line graph, so we’ve updated our erb logic to reflect that. We need to make sure our Sinatra logic and parser logic work correctly.

Let’s make sure Sinatra knows how to handle that POST parameter. We also haven’t written a method that would get the timeline events for a specific IP address… But, that’s really simple:
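As a sketch of the parser side of that, here’s one way such a method might look. The attempts_per_day name and the event hash shape ( :ip and :time keys ) are my assumptions; it returns a date-keyed hash of counts, which is a shape Chartkick’s line_chart can plot.

```ruby
# Hypothetical helper: count one IP's failed logins per calendar day.
# Input:  an array of hashes shaped like { ip: String, time: Time }.
# Output: { "YYYY-MM-DD" => count } for the requested IP only.
def attempts_per_day(events, ip)
  counts = Hash.new(0)
  events.each do |event|
    next unless event[:ip] == ip

    counts[event[:time].strftime('%Y-%m-%d')] += 1
  end
  counts
end

# Invented sample events.
events = [
  { ip: '10.0.0.1', time: Time.utc(2016, 8, 8, 1, 0, 0) },
  { ip: '10.0.0.1', time: Time.utc(2016, 8, 8, 2, 0, 0) },
  { ip: '10.0.0.2', time: Time.utc(2016, 8, 8, 3, 0, 0) }
]

attempts_per_day(events, '10.0.0.1') # => {"2016-08-08"=>2}
```

An IP that never shows up in the log simply gives back an empty hash, so the page has nothing to plot rather than blowing up.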

Then, let’s say we had a question about that suspicious looking timeline at the very top for the ( not real attacker ) IP address: 100.210.123.5

It has a timeline of 114 days?! Better look at what’s happened in that time!

So, let’s put that IP into our new form near the top of the page!

Then we are taken to our new line graph:

We can see that this chart is less scary. We now know that, yes, the attacker has been around for a while, but hasn’t been that persistent either. Maybe it’s just someone who messed up their password?

But this seems to catch my interest a little more.

Pie Chart

Maybe I want to know which IP address has attacked that login page the most? A pie chart to compare our results could be really nice to have!

In order to make that happen, we need a piechart.erb file in the views directory we made earlier — then we need some Sinatra logic, and some new parser logic to accommodate the Chartkick pie_chart() method.
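The new parser logic can be as small as a tally per IP. This is a sketch with assumed names ( attempts_per_ip, and events as hashes with an :ip key ); the result is a label-to-count hash, which is a shape pie_chart() accepts.

```ruby
# Hypothetical helper: count failed logins per IP, busiest IP first.
# Input:  an array of hashes that each have an :ip key.
# Output: { "ip" => count } ordered from most to fewest attempts.
def attempts_per_ip(events)
  counts = Hash.new(0)
  events.each { |event| counts[event[:ip]] += 1 }
  counts.sort_by { |_ip, count| -count }.to_h
end

# Invented sample events.
events = [
  { ip: '10.0.0.2' },
  { ip: '10.0.0.1' },
  { ip: '10.0.0.1' }
]

attempts_per_ip(events) # => {"10.0.0.1"=>2, "10.0.0.2"=>1}
```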

We need to update our app.rb like so:

Then, we need to create the piechart.erb template that Sinatra can serve up:

Which, when run, looks like this:

Now, I’d have some questions about that 157.160.48.237 IP, wouldn’t you?

Checking out the single timeline view for 157.160.48.237 could be helpful to figure out when all those attacks took place:

Since I left in the same form from the timeline, it should work just the same to view that single IP address’s timeline, right? I could just put the IP in the search bar, right?

Bingo, we get some answers! Works like a charm.

On August 8th, there were 143 attacks, and on September 9th, there were 86 attacks… interesting!

If I wanted to help correlate that on the timeline, that’s simple too:

This helps us compare attackers and attack cycles.

Nifty Patterns to Discover

Sometimes you run across some interesting data for sure… For this IP address 105.22.61.253 there seems to be a legit sine wave going on in that attack pattern…

And this ended up being really nifty — like, someone was (sine)waving at me or something. 👋

September  9th : 1 attempt(s)
October   10th : 4 attempt(s)
November  11th : 3 attempt(s)
December  12th : 5 attempt(s)

Some attackers seem to try less often, or will try more at first and then seemingly wean off to a slower attack. Maybe with default passwords? Or maybe it’s just a user who forgot their password? I dunno.

The location of the IP address may help determine that kind of thing. But, this IP was randomly generated — so, it’s no big deal.

Findings

I love Ruby ( and the community )! You can do all of this stuff so easily. Granted, you could accomplish the same sort of stuff with pretty much any programming language.

Regardless, I find the simplicity and rapid prototyping I’m able to accomplish with Ruby to be easy and insanely satisfying. I find myself really happy doing this stuff, and excited to dig through the data!

Creating custom log analysis tools is genuinely helpful for analyzing the different attack patterns you may find or collect. I can better understand log data by having multiple visual representations of it in different contexts, beyond what the text content provides at first glance.

I’m finding web applications to be an interesting digital “canvas” of sorts for building wonderful new visual representations of, and ways to interact with, the data I’m interested in learning about. Sinatra, Bootstrap, Chartkick and of course Ruby are all wonderful mediums to work with on that canvas.

I really think you should check them out!

Quick Security Warning

Just because this stuff is easy to do doesn’t make it 100% secure. Wait… when has easy ever meant secure? Like, you’re probably fine if you’re not opening stuff up to the internet.

I was showing you ways to get started building applications, but I didn’t really talk about any of the security involved. Rule of thumb: don’t trust user input and just don’t open web applications like these to the internet, yeah? Just be careful.
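As one small example of not trusting user input, here’s a sketch that checks the IP typed into the search form before it goes anywhere near the parser. It leans on Ruby’s standard IPAddr class; the clean_ip name is hypothetical, and this is only one precaution, not a complete defense.

```ruby
require 'ipaddr'

# Hypothetical helper: return a normalized IP string, or nil when the input
# is not a single valid IP address.
def clean_ip(input)
  text = input.to_s.strip
  # Reject empty input, and "/" so ranges like 10.0.0.0/8 don't slip through.
  return nil if text.empty? || text.include?('/')

  IPAddr.new(text).to_s
rescue IPAddr::Error
  nil
end

clean_ip('100.210.123.5') # => "100.210.123.5"
clean_ip('<script>')      # => nil
```

Anything that comes back nil can just be sent back to the form instead of being looked up.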

Coolio.