MacOS Security with Osquery and AWS Kinesis Firehose

Craig Huber
May 3, 2018 · 7 min read

Recently, I’ve been more concerned with the security of my digital devices and my online presence. In 2011, my online email account was compromised; all my emails were exfiltrated and then deleted. The amount of personal information I lost in those emails was mind-numbing. I suspect my credentials were key-logged at an Internet cafe I used in Bali back then, but since then I’ve enabled two-factor authentication on all my online accounts to prevent it from happening again. If you don’t already use two-factor authentication, or don’t know how to use it, I suggest reading this article from the awesome Tobias van Schneider.

Despite doing all these things, I still feel a bit wary installing new apps on my Mac these days. With software password managers, local cryptocurrency wallets, and API keys stored on a machine, being infected by a seemingly harmless application like Handbrake feels like a recipe for disaster.

Connected Black Box

As soon as you connect to the Internet, you have very little visibility into what your device is connecting to or what it’s doing. As an average consumer, you might think the only thing connected to the internet is your browser. However, there are dozens of operating system services and applications you’ve installed that are making outbound connections and listening for inbound connections as well. Built-in utilities such as Activity Monitor, nettop, and lsof may help you poke around, but who keeps those running and monitors them all the time?

Commercial utilities like Little Snitch or Private Eye will do the trick, but I wanted something more robust and open source.

Enter Osquery

Osquery is a neat utility from Facebook that allows you to query your Linux or macOS operating system through a SQLite-like interface. With it, you can query things like the filesystem, processes, and network state of your machine. For example, to find all running processes on your system you would write something like this:

osquery> SELECT name, path, pid FROM processes;

What’s interesting, though, is that it comes with a daemon, osqueryd, that runs in the background on your machine and executes queries at scheduled intervals. For example, this query lists the daemons and agents that launchd loads:

SELECT path, label, program_arguments, inetd_compatibility, root_directory FROM launchd WHERE label NOT LIKE '';

What I really wanted, though, was to see which processes have a listening TCP socket, so I added a query for that to my osqueryd configuration file, scheduled to run every 60 seconds:

// Define a schedule of queries:
"schedule": {
  "listening_processes": {
    // The exact query to run.
    "query": "SELECT DISTINCT process.name, listening.port, process.pid FROM processes AS process JOIN listening_ports AS listening ON process.pid = listening.pid WHERE listening.address = '0.0.0.0';",
    // The interval in seconds to run this query, not an exact interval.
    "interval": 60,
    "description": "listening_processes",
    "value": "Processes with listening ports"
  }
}
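Config stanzas like this are easy to break with a stray comma, and osqueryd will just refuse the config. A quick sanity check in Python (a sketch using a comment-free copy of the schedule stanza, since plain `json.loads` does not tolerate `//` comments the way osquery’s config parser does):

```python
import json

# A comment-free copy of the "schedule" stanza from osquery.conf.
config_text = """
{
  "schedule": {
    "listening_processes": {
      "query": "SELECT DISTINCT process.name, listening.port, process.pid FROM processes AS process JOIN listening_ports AS listening ON process.pid = listening.pid WHERE listening.address = '0.0.0.0';",
      "interval": 60,
      "description": "listening_processes",
      "value": "Processes with listening ports"
    }
  }
}
"""

config = json.loads(config_text)  # raises ValueError on malformed JSON
for name, entry in config["schedule"].items():
    # Every scheduled query needs at least "query" and "interval".
    assert {"query", "interval"} <= entry.keys(), f"{name} is missing keys"
    print(f"{name}: every {entry['interval']}s")
```

If the JSON is malformed, `json.loads` fails loudly before you ever restart the daemon.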

By default Osquery will log to your local filesystem in /var/log/osquery/osqueryd.results.log and you will see something like this:

{"name":"listening_processes","hostIdentifier":"Craigs-MacBook-Pro.local","calendarTime":"Wed May 2 23:19:31 2018 UTC","unixTime":"1525303171","epoch":"0","counter":"2060","columns":{"name":"syslogd","pid":"61","port":"63313"},"action":"added"}

Here you’ll see the name and port of the process as expected, but notice the action field. Osquery writes differential logs: it has noticed that a new syslogd process has started listening since the last time it ran the query, and it logged only that difference. This means you don’t have to sift through huge log files looking for potential changes in your system; the changes are neatly organized in one log file.
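The idea behind differential logging is simple: keep the previous result set, diff it against the new one, and log only the rows that changed. A minimal sketch of that logic (my own approximation, not osquery’s actual implementation):

```python
import json

def differential(previous, current):
    """Emit one log entry per row that was added or removed between runs."""
    # Serialize rows so they can live in sets and be compared.
    prev = {json.dumps(row, sort_keys=True) for row in previous}
    curr = {json.dumps(row, sort_keys=True) for row in current}
    entries = []
    for row in curr - prev:
        entries.append({"columns": json.loads(row), "action": "added"})
    for row in prev - curr:
        entries.append({"columns": json.loads(row), "action": "removed"})
    return entries

last_run = [{"name": "launchd", "pid": "1", "port": "0"}]
this_run = [{"name": "launchd", "pid": "1", "port": "0"},
            {"name": "syslogd", "pid": "61", "port": "63313"}]

for entry in differential(last_run, this_run):
    print(entry["action"], entry["columns"]["name"])  # → added syslogd
```

Only the new syslogd row is emitted; the unchanged launchd row never reaches the log.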

Kinesis Firehose

So now we have a local log of any changes to the daemons running on our system. But it’s still local to our device, and I think most people aren’t disciplined enough to check it constantly. It would be nice to have it streamed to the cloud so we can visualize it in a browser. Enter Kinesis Firehose. Kinesis Firehose allows you to capture, transform, and load your streaming data into another system like S3, Elasticsearch, or Redshift. Sounds like a good candidate for storing streaming Osquery data, right? You would be correct: Osquery supports Kinesis Firehose out of the box. I mentioned earlier that Osquery uses the local filesystem logger by default, but we can add an additional logger plugin for Firehose like this:

"logger_plugin": "filesystem,aws_firehose",
"aws_firehose_stream": "osquery-firehose",
"aws_access_key_id": "XYZ123",
"aws_secret_access_key": "XYZ123",
"aws_region": "us-east-1"
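Behind this config, the Firehose logger has to respect the limits of the PutRecordBatch API (at most 500 records and 4 MB per call). A sketch of the kind of batching such a plugin must do (my own illustration, not osquery’s actual code):

```python
MAX_RECORDS_PER_BATCH = 500            # Firehose PutRecordBatch record limit
MAX_BYTES_PER_BATCH = 4 * 1024 * 1024  # Firehose PutRecordBatch size limit

def batch_records(records):
    """Group log lines into batches that fit Firehose's PutRecordBatch limits."""
    batches, current, current_bytes = [], [], 0
    for record in records:
        size = len(record.encode("utf-8"))
        # Flush the current batch if adding this record would exceed a limit.
        if current and (len(current) >= MAX_RECORDS_PER_BATCH
                        or current_bytes + size > MAX_BYTES_PER_BATCH):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(record)
        current_bytes += size
    if current:
        batches.append(current)
    return batches

lines = ['{"name":"listening_processes"}'] * 1200
print([len(b) for b in batch_records(lines)])  # → [500, 500, 200]
```

Each inner list would then be sent as one PutRecordBatch call.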

Now we just need a Firehose Delivery Stream.

Create a delivery stream

And let’s name the Firehose stream to match the stream name in our Osquery configuration.

Name the delivery stream

We need to load our log files somewhere, so we will use S3 as the destination.

S3 as the destination for our logs

And finally, choose a bucket name to store your logs. You don’t need to transform the data at this point, so the default options are fine for the next few prompts.

After a few minutes you will see your Firehose become active. The Monitoring tab of the Firehose will show some incoming bytes if everything is set up correctly.

Firehose: Active!

If we check our S3 bucket, we will also see our log files being generated there.


Visualizing the data with AWS Athena and AWS Quicksight

At this point we have log data streaming from a MacBook to Kinesis Firehose. Something like this:

Macbook > AWS Firehose > S3.

But this does us no good unless we open the log files in the S3 bucket on a daily basis, which I can assure you I won’t. We need a way to query the .log files and visualize them on a dashboard. This is where AWS Athena comes in. Athena lets us query all the log files in the bucket without needing any kind of SQL server. But first we’ll use AWS Glue to crawl all our files and create a schema for our database.

AWS Glue

Before we can query a table in AWS Athena, we need some sort of schema for our JSON-formatted data. Essentially we will want to convert this:

{"name":"listening_processes","hostIdentifier":"Craigs-MacBook-Pro.local","calendarTime":"Wed May 2 23:19:31 2018 UTC","unixTime":"1525303171","epoch":"0","counter":"2060","columns":{"name":"syslogd","pid":"61","port":"63313"},"action":"added"}

Into a SQL table that we can query later. Fortunately, AWS Glue has the concept of a Crawler that automagically discovers your schema when you point it at an S3 bucket. First, let’s create a new database in AWS Glue.

New database in AWS Glue

Then we will create a table via a Crawler.

Add a table using crawler
Give the table a name

Choose S3 as the data store, and set the Include Path to the same path that Firehose is delivering the logs to.


Once you’ve created the Crawler and run it against your data, you’ll find that it has discovered the schema for your logs and mostly figured out the data types.

Discovering the schema
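To get a feel for what the crawler is doing, here is a toy version of its schema inference (my own sketch, not Glue’s actual algorithm): walk a sample of log lines and record a column type for every field, turning the nested columns object into a struct.

```python
import json

def infer_schema(log_lines):
    """Map each top-level JSON field to a crude column type, one level deep."""
    schema = {}
    for line in log_lines:
        record = json.loads(line)
        for field, value in record.items():
            if isinstance(value, dict):
                # Nested objects become struct<...> columns, as Glue does.
                inner = ", ".join(f"{k}:string" for k in value)
                schema[field] = f"struct<{inner}>"
            else:
                schema[field] = "string"
    return schema

sample = ['{"name":"listening_processes","unixTime":"1525303171",'
          '"columns":{"name":"syslogd","pid":"61","port":"63313"},'
          '"action":"added"}']
for column, column_type in infer_schema(sample).items():
    print(column, column_type)
```

Note how the nested columns field comes out as a struct; that is exactly why we will need a custom SQL query in Quicksight later.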

Querying the Data with AWS Athena

With our schema discovered and our tables created, we can switch back to AWS Athena and start querying. In the query editor you can write SQL queries like you normally would. The important thing here is to make sure we are seeing the data as expected.

Running a query against the data

Visualizing the Data with AWS Quicksight

Okay, the data looks good, and now we can start painting some pretty graphs with Quicksight. All we need to do is set Quicksight to use Athena as a data source.

Create a new dataset with Athena as the source, and choose the Athena database you created earlier.

Choose the Athena database name

Next, choose Edit/Preview data. This step is important because we want to create a custom SQL query to handle our nested JSON. Choose:

  • Data source: Query
  • Custom SQL: Use SQL

The reason we want to use custom SQL is that our data contains nested JSON. If you look at the log data, you’ll see the fields we are interested in are actually inside a nested struct.


We need to lift name, pid, and port into their own columns so that Quicksight can visualize them. In the Custom SQL box, place the following query:

SELECT name,
action,
columns.name AS process,
columns.pid AS pid,
columns.port AS port
FROM osquery
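The Athena query is doing the same flattening that this Python sketch does to a single log record: lifting the fields of the nested columns struct into top-level columns that Quicksight can plot.

```python
import json

# One osquery differential log line, as delivered to S3 by Firehose.
log_line = ('{"name":"listening_processes","hostIdentifier":"Craigs-MacBook-Pro.local",'
            '"columns":{"name":"syslogd","pid":"61","port":"63313"},"action":"added"}')

record = json.loads(log_line)
# Pull the nested struct fields up into a flat row.
flat = {
    "name": record["name"],
    "action": record["action"],
    "process": record["columns"]["name"],
    "pid": record["columns"]["pid"],
    "port": record["columns"]["port"],
}
print(flat)
```

Each flat key corresponds to one column alias in the query above.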

Now, Quicksight can automatically create graphs from that query.

Graph of Listening Processes over time


This is just an example of what you can do with Osquery and some AWS primitives to improve your OS security. We could extend this further and generate SNS alerts from the Firehose, or have a Lambda function read from the Firehose and analyze it in real time for specific threats. Additionally, Osquery has the concept of “packs”, which are predefined query bundles such as it-compliance, ossec-rootkit, vuln-management, and more.
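As a sketch of that Lambda idea, each incoming record could be decoded and checked against a list of processes you expect to be listening. The handler shape below follows the Kinesis-style event format with base64-encoded data; the allowlist and alert wording are my own invention:

```python
import base64
import json

# Hypothetical allowlist of processes we expect to have listening sockets.
EXPECTED_LISTENERS = {"launchd", "syslogd", "rapportd"}

def handler(event, context=None):
    """Flag any newly listening process that isn't on the allowlist."""
    alerts = []
    for record in event["Records"]:
        # The osquery log line arrives base64-encoded in the record payload.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        columns = payload.get("columns", {})
        if payload.get("action") == "added" and columns.get("name") not in EXPECTED_LISTENERS:
            alerts.append(f"unexpected listener: {columns.get('name')} "
                          f"on port {columns.get('port')}")
    return alerts

# Simulate an incoming event carrying one suspicious log line.
line = json.dumps({"name": "listening_processes", "action": "added",
                   "columns": {"name": "cryptominer", "pid": "666", "port": "4444"}})
event = {"Records": [{"kinesis": {"data": base64.b64encode(line.encode()).decode()}}]}
print(handler(event))
```

From here, the alerts list could be published to an SNS topic instead of returned.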

If you have any suggestions on how I could improve this further, I’d love to hear them.

Stay safe.
