My Serverless Frankenstein for Honeypots
This past May I posted a blog on Honeypots As A Service (HaaS), a new feature I launched on my threat information sharing site HoneyDB. It’s a cool feature that enables anyone to deploy servers running the HoneyPy honeypot. It was, and still is, a paid service, as the goal is to raise support for more reliable server infrastructure. Since the site was created, I had been running HoneyDB on a very unreliable provider. Why? Because it was dirt cheap! In this post, I’ll provide an update on HaaS and, more importantly, HoneyDB’s new infrastructure and architecture (hint: the title of this post).
Honeypots As A Service
You have to HaaS it! That was the little slogan I came up with when I presented and launched the new feature at CarolinaCon 13. I mean, who doesn’t want to run a honeypot? However, this was an experimental concept, for a few reasons. First, to test whether enough consumers of HoneyDB’s threat information found it useful enough to help support it financially. Second, to see whether there were enough users out there who would pay for HaaS knowing it helps support the project.
However, aside from a few inquiries there really wasn’t much interest in the HaaS feature. In June, I was invited to present on HoneyPy and HoneyDB at a Triangle Python Users Group where I felt I needed to change the HaaS slogan to, “it turns out, you don’t have to HaaS it”.
To be fair, I really haven’t promoted the HaaS feature much at all. The initial version of HaaS on HoneyDB is very much functional, but it still needs many enhancements to make it a more pleasant user experience. Also, being an experimental concept, I didn’t want to invest too much time into the initial version. Perhaps as time goes on, and interest in honeypots grows, the HaaS feature will gain some popularity and you will want to HaaS it!
But the unreliable infrastructure problem was still not solved!
New Infrastructure & New Architecture
I’m very happy to say that HoneyDB is now running on new infrastructure with a new “Frankenstein” architecture as I like to call it. Before diving into the details of the new, here is what the old looked like:
The above is a very simple architecture: one web server handling all requests with a MySQL database. Replication is set up with another MySQL database just in case service becomes extremely unreliable and I lose the master database. The main issues with this were:
- It doesn’t scale. The more HoneyPy nodes are deployed, the more pressure is placed on the web server to process those requests. That pressure is only compounded by API clients pulling data and browser traffic querying data.
- While I had 80 GB of disk storage on the database servers, the database had pushed past 20 GB and was steadily growing. Database maintenance and backups were soon going to get uncomfortable.
- At times, the cloud provider’s datacenter would lose connectivity to the Internet. It didn’t happen often, but enough to be very annoying.
- The cloud provider’s SSD storage would frequently error out, making the drives read-only. The only way to remediate it was to reboot and run fsck to fix errors. Again, very annoying.
- In some extreme cases, when a server inexplicably became unreachable, there was no way to recover. I had to delete the instance and rebuild. Incredibly annoying.
Certainly, HoneyDB is not a mission critical application so availability is not one of the highest requirements. However, I do want the community that does use HoneyDB to experience a solid level of site availability and responsiveness.
Okay, I wasn’t going to assume or expect that HaaS was going to make infrastructure dollars rain from the community at large. However, there was one member of the InfoSec community that came through big time. After my presentation at CarolinaCon, a Partner from Novcon Solutions offered to provide VM resources sitting on reliable infrastructure at no cost. This was and is so awesome!
The good folks at Novcon were happy to contribute to the project, and ultimately the community, by providing VMs with plenty of CPU, memory, and storage space. Novcon also operates their own threat information project called Minotaur. You can find this project at https://minotr.net, and you can expect to see integrations between Minotaur and HoneyDB in the future.
Now that Novcon has solved the reliability and stability problem, I still have a scalability problem. Even with a more robust web server, the potential for a large number of HoneyPy sensors feeding HoneyDB could still be problematic. So I began investigating all the wonderful things cloud has to offer, and for relatively low cost. In the end, the result is freakishly awesome.
Up Into the Cloud
My goals for investigating cloud solutions were as follows:
- Cost effective, preferably low cost, storage of large volumes of data. With the database servers provided by Novcon I have plenty of room to grow, but I’m thinking about the long term and unexpected bursts in HoneyPy sensor data. I’m fine with storing about 90–100 days’ worth of data in MySQL on the VM; however, I also want to be able to retain or archive that data for future use.
- Scale to absorb high volumes of HoneyPy sensor deployments, resulting in very large volumes of requests. The volumes aren’t there yet, but with less than 15 active sensors HoneyDB has seen over 750,000 requests per day. Imagine if there were 50 or 100 active sensors deployed.
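To put that scaling concern in rough numbers, here is a back-of-envelope linear projection from the figures above (it assumes, hypothetically, that every sensor produces a similar volume):

```python
REQUESTS_PER_DAY = 750_000   # observed with fewer than 15 active sensors
SENSORS = 15                 # upper bound on current active sensors

# At least ~50,000 requests per sensor per day
PER_SENSOR = REQUESTS_PER_DAY / SENSORS

def projected_daily(sensors, per_sensor=PER_SENSOR):
    """Linear extrapolation of daily request volume for a sensor count."""
    return sensors * per_sensor

# projected_daily(100) → 5,000,000 requests/day, roughly 58 per second
```

A crude estimate, but it shows why a single web server taking every sensor request directly would not hold up.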
My investigation led me to experiment with various solutions on IBM Bluemix, Google Cloud, and Amazon AWS. Disclaimer: in no way did I do an exhaustive evaluation of all cloud offerings. I looked at what I was familiar with and what was recommended by colleagues. A majority of the evaluation criteria was, “how much is this going to cost?”
I did look at the possibility of replacing the MySQL VMs with a cloud database. However, I quickly realized the options for cloud database storage are not cheap. With a database size of 20 GB, the starting cost would be north of $100 per month. For some other cloud options, storage costs were not so bad, but charges for querying the database would easily add up.
So what did I end up with? I found out that I can get very large archive data storage and scalability to handle extremely large volumes of requests, and get it crazy cheap! The resulting architecture is a bit of a freak show of solutions and providers, but it is working really well! I’ll now walk you through what I call a Frankenstein architecture.
This is the breakdown of my Frankenstein architecture:
Serverless
It’s all the rage these days, and for good reason. HoneyDB uses both OpenWhisk on Bluemix and Lambda on AWS. Both have an entry-level service that can handle a ton of requests for little cost. I leverage both to spread the load, and the cost, even further. And because, why not?
OpenWhisk: $0.000017 per second of execution, per GB of memory allocated. Bluemix API Gateway: no cost.
Lambda: 1 million free requests per month and 400,000 GB-seconds of compute time per month. $0.20 per 1 million requests thereafter ($0.0000002 per request). AWS API Gateway: $3.50 per million API calls received, plus the cost of data transfer out ($0.09/GB for the first 10 TB).
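To make the ingestion path concrete, here is a minimal sketch of what one of these functions might look like: a Lambda-style handler behind API Gateway that validates a HoneyPy sensor event and hands it to a publish callable (in production that callable would be a PubSub client; the event field names and the `publish` hook are assumptions for illustration, not HoneyDB's actual schema):

```python
import json

def normalize_event(raw):
    """Validate and normalize a HoneyPy sensor event (hypothetical schema)."""
    required = ("service", "remote_host", "remote_port", "data")
    if not all(key in raw for key in required):
        raise ValueError("missing required field")
    return {
        "service": str(raw["service"]),
        "remote_host": str(raw["remote_host"]),
        "remote_port": int(raw["remote_port"]),
        "data": str(raw["data"]),
    }

def handler(event, context=None, publish=lambda payload: None):
    """API Gateway proxy entry point: parse the request body, normalize,
    then forward the event to the queue via the publish callable."""
    try:
        record = normalize_event(json.loads(event["body"]))
    except (KeyError, ValueError):
        return {"statusCode": 400, "body": "bad request"}
    publish(json.dumps(record).encode("utf-8"))
    return {"statusCode": 200, "body": "ok"}
```

The point of the pattern is that the function does almost no work itself; it accepts the request, queues it, and returns, which is what keeps per-request compute time (and therefore cost) tiny.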
PubSub
It too is all the rage these days, also for good reason. I’m leveraging Google Cloud PubSub to process all the requests taken in by the OpenWhisk and Lambda serverless functions. PubSub is another solution that can handle tons of requests for little cost. This also helps to avoid putting too much load on the MySQL database, since the MySQL server can ingest data from PubSub at a controlled pace.
Monthly volume: $0.00 for the first 10 GB; $0.06 per GB for the next 50 TB.
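The “controlled pace” part is the key idea, so here is a small sketch of the consumer side: drain a round of queued messages and write them to MySQL in fixed-size batches. The `pull` and `execute` callables stand in for a PubSub subscriber pull and a MySQL `cursor.executemany`, and the `events` table and its columns are assumptions for illustration:

```python
import itertools
import json

def batches(items, size=500):
    """Yield fixed-size batches so MySQL ingests at a controlled pace."""
    it = iter(items)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch

def ingest(pull, execute, batch_size=500):
    """Drain one round of queued sensor events into MySQL.

    pull:    callable returning a list of JSON message payloads
             (stand-in for a PubSub subscriber pull)
    execute: callable taking (sql, rows)
             (stand-in for a MySQL cursor.executemany)
    """
    events = [json.loads(message) for message in pull()]
    rows = [(e["service"], e["remote_host"], e["remote_port"], e["data"])
            for e in events]
    sql = ("INSERT INTO events (service, remote_host, remote_port, data) "
           "VALUES (%s, %s, %s, %s)")
    for batch in batches(rows, batch_size):
        execute(sql, batch)
    return len(rows)
```

Because the consumer decides when to pull and how big each batch is, a burst of sensor traffic piles up in PubSub instead of hammering the database.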
Big Data Storage
Since I don’t need to have all the historical data available on the HoneyDB site, I can offload all the data to Google BigQuery. As a result, I’ll only keep about 90–100 days of data in the MySQL database. Google BigQuery pricing is insanely cheap.
Storage: $0.02 per GB per month; Streaming inserts: $0.05 per GB.
Frankenstein Architecture Diagram
Here is what the final monstrosity looks like:
I had a lot of fun going through the process of figuring out and building this architecture. With special thanks to Novcon, HoneyDB is now vastly more reliable and scalable. To wrap this up, I have a few final thoughts.
- Serverless is awesome.
- I think my “Frankenstein” architecture isn’t really that unusual. Many modern applications, especially big data applications, leverage similar components and architecture.
- Serverless will take over.
- Application server stacks/architectures are rapidly changing due to the overall cost savings cloud offers. I think one of the more significant savings will be in operations, e.g. no servers or infrastructure to manage, just push code.
- Have you heard about this amazing thing called Serverless? Here are three great articles, all on Serverless: