Introducing Syslog to AWS Kinesis via Osquery

Logs awaiting collection (Logs in Yyteri by kallerna, licensed under Creative Commons)

At Airbnb, we are committed to protecting our community. This blog is an extension of that effort, as we will be providing periodic updates about investments we are making into our security program and the broader community.

For our opening security-related post, we’re happy to announce we’ve open sourced osquery tables that allow enterprises to collect and query syslog data for both OS X and Linux hosts! This allows you to capture privileged actions (sudo), lateral movement (sshd), errors impacting system availability/reliability and more. In addition to these tables, we’ve open sourced osquery plugins that allow you to send query results to Amazon’s Kinesis Streams and Firehose offerings. Read below for more information!


We wanted an agent for OS X & Linux that supports:

  • File integrity monitoring (FIM)
  • IOC (indicator of compromise) intrusion detection 
    (IPs, domains, ports, file names/paths/hashes, …)
  • State-based intrusion detection
    (shell history, /etc/hosts, NFS shares, firewall settings, …)
  • Syslog collection
  • Flexible remote logging

osquery was the first option we looked at.

For those unfamiliar with osquery, it’s an open source tool that exposes the operating system as a performant relational database. You write SQL-based queries against tables that represent system attributes, such as users, processes, devices, network connections, etc.

Out of the box, osquery supports all but one of our use-cases: Syslog collection for OS X & Linux.

Since osquery is an open source project, we built syslog tables and contributed back to the community!

Before we jump into the details, it’s important to note that we only used safe, supported operating system APIs. You won’t find any kernel hacks or shelling out like you’ll commonly see in security vendor products. You can see the code for yourself here and here.

Apple Syslog

Our first contribution is an osquery table that allows you to surface, collect and query OS X ASL syslog data without any additional configuration.

In the hypothetical example below, we are querying for privileged actions performed on a given host. The query results show an attacker modifying /etc/hosts, escalating their privileges to root and loading a key-logger:
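A minimal sketch of what such a query might look like, written in Python against an in-memory SQLite table (osquery exposes its tables through SQLite's query engine). The column names are an assumed subset of the ASL table, and the sample rows are invented purely for illustration; they are not real log output.

```python
import sqlite3

# Hypothetical stand-in for osquery's ASL table; the columns are an assumed subset.
SCHEMA = (
    "CREATE TABLE asl "
    "(time INTEGER, sender TEXT, facility TEXT, level INTEGER, message TEXT)"
)

# Invented sample rows for illustration only (not real log output).
SAMPLE_ROWS = [
    (1460000000, "sudo", "authpriv", 5,
     "mallory : USER=root ; COMMAND=/usr/bin/vim /etc/hosts"),
    (1460000030, "sudo", "authpriv", 5,
     "mallory : USER=root ; COMMAND=/bin/bash"),
    (1460000045, "kernel", "kern", 5, "network interface link up"),
    (1460000060, "sudo", "authpriv", 5,
     "mallory : USER=root ; COMMAND=/sbin/kextload /tmp/logger.kext"),
]

# The kind of query you might run to surface privileged actions.
PRIVILEGED_ACTIONS = (
    "SELECT time, message FROM asl WHERE sender = 'sudo' ORDER BY time"
)


def privileged_actions(rows):
    """Load rows into a throwaway table and return the sudo entries, oldest first."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany("INSERT INTO asl VALUES (?, ?, ?, ?, ?)", rows)
    return db.execute(PRIVILEGED_ACTIONS).fetchall()


if __name__ == "__main__":
    for timestamp, message in privileged_actions(SAMPLE_ROWS):
        print(timestamp, message)
```

Against a real host, only the SELECT statement would be handed to osquery; the table setup here exists just to make the sketch runnable.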

Besides security-related queries, you can also query ASL to surface errors that are impacting system availability or reliability:
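As a rough illustration, the Python sketch below counts error-level messages per sender, again using an in-memory SQLite table as a stand-in for the ASL table. The columns, their types, the sender names and the rows are all assumptions made for the example; the level numbers follow the standard syslog convention (0 is emergency, 3 is error, 7 is debug).

```python
import sqlite3

# Invented sample rows: (time, sender, level, message). Not real log output.
SAMPLE_ROWS = [
    (1, "disk-helper", 3, "unable to mount volume"),
    (2, "backup-agent", 3, "backup failed"),
    (3, "backup-agent", 3, "backup failed"),
    (4, "loginwindow", 5, "session started"),
    (5, "vpn-client", 7, "debug: keepalive sent"),
]

# Count messages at or above a given severity, grouped by sender.
ERRORS_BY_SENDER = """
    SELECT sender, COUNT(*) AS n
    FROM asl
    WHERE level <= ?
    GROUP BY sender
    ORDER BY n DESC, sender
"""


def errors_by_sender(rows, max_level=3):
    """Return (sender, count) pairs for messages with level <= max_level."""
    db = sqlite3.connect(":memory:")
    # Hypothetical subset of the ASL table's columns.
    db.execute(
        "CREATE TABLE asl (time INTEGER, sender TEXT, level INTEGER, message TEXT)"
    )
    db.executemany("INSERT INTO asl VALUES (?, ?, ?, ?)", rows)
    return db.execute(ERRORS_BY_SENDER, (max_level,)).fetchall()


if __name__ == "__main__":
    for sender, count in errors_by_sender(SAMPLE_ROWS):
        print(sender, count)
```

Raising `max_level` to 7 includes debug output as well, which is one way a chatty client would stand out.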

In our case, the ASL table informed us that our VPN client was flooding syslog with debug information, a bug that was addressed in a later build.

The ASL table definition is here and additional usage and configuration details can be found here.

Linux Syslog

Our second contribution is an osquery table that allows you to surface, collect and query Linux syslog data.

Using osquery’s event framework, we ingest logs forwarded from rsyslog over a named pipe, maintaining appropriate permissions for data integrity. We then make those logs available for consumption through our new table. This is compatible with any Linux distribution supported by osquery and rsyslog (Ubuntu 12/14, CentOS, RHEL, …).

With relative ease you now have greater visibility into your infrastructure:
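To make the flow above concrete, here is a small Python sketch, not the osquery implementation itself. It assumes rsyslog forwards each message as one CSV line with six fields (timestamp, host, severity, facility, tag, message); that layout, the table columns and the sample lines are assumptions for illustration. The lines are parsed, loaded into an in-memory SQLite table and queried for sshd activity.

```python
import csv
import sqlite3

# Assumed field order of each CSV line forwarded by rsyslog (hypothetical layout).
FIELDS = ("datetime", "host", "severity", "facility", "tag", "message")

# Invented sample lines for illustration only.
SAMPLE_LINES = [
    '"2016-04-07T10:00:00+00:00","web01","6","auth","sshd[4242]:",'
    '" Accepted publickey for deploy from 10.0.0.5 port 50022 ssh2"',
    '"2016-04-07T10:00:05+00:00","web01","6","cron","CRON[4250]:",'
    '" (root) CMD (run-parts /etc/cron.hourly)"',
    "not,a,valid,line",
]

SSHD_QUERY = (
    "SELECT datetime, message FROM syslog WHERE tag LIKE 'sshd%' ORDER BY datetime"
)


def parse_line(line):
    """Parse one CSV line into a dict keyed by FIELDS, or None if malformed."""
    values = next(csv.reader([line]), None)
    if values is None or len(values) != len(FIELDS):
        return None
    return dict(zip(FIELDS, values))


def load(lines):
    """Insert every well-formed line into a fresh in-memory syslog table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE syslog (datetime TEXT, host TEXT, severity INTEGER, "
        "facility TEXT, tag TEXT, message TEXT)"
    )
    for line in lines:
        row = parse_line(line)
        if row is not None:  # silently skip lines that do not match the layout
            db.execute(
                "INSERT INTO syslog VALUES "
                "(:datetime, :host, :severity, :facility, :tag, :message)",
                row,
            )
    return db


if __name__ == "__main__":
    for when, message in load(SAMPLE_LINES).execute(SSHD_QUERY):
        print(when, message)
```

Because `load` accepts any iterable of lines, the same function could read from an open named pipe instead of the sample list.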

Additional usage and configuration details about the Linux syslog table can be found here. This feature is merged in osquery’s master branch and is expected to ship with osquery v1.7.4.

Syslog Challenges

By default, OS X doesn’t send all of its logs to the Apple System Log (ASL). For example, here are two logs that are not sent to ASL that capture information about application and package installation:


We toiled for hours, trying to capture this data via /etc/asl.conf and /etc/syslog.conf. We Google’d, we read the man page, nothing. We almost came to the conclusion these paths were hardcoded as a result of some references in /System/Library/PrivateFrameworks/*.

We stumbled upon success when we carefully re-read the man page:

Messages that match the query associated with a 'claim' action are not processed by the main ASL configuration file /etc/asl.conf.

Therefore, we can get these logs flowing into ASL by appending the store action into the following files and by restarting syslogd:

? [= Facility install] store
? [= Facility] store

There are unfortunate use-cases where an application may not (or cannot) log to syslog. As a result, we will be developing an osquery table for OS X and Linux that can consume any log file and present it in a schema like so: {path, line, time, message}.
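One way to picture that schema is the Python sketch below, which turns an arbitrary log file into rows with those four keys. It is not the planned osquery table; in particular, treating `line` as a 1-based line number and `time` as the moment the line was read are assumptions, since neither is spelled out here.

```python
import time
from typing import Dict, Iterator, Union

Row = Dict[str, Union[str, int]]


def log_rows(path: str) -> Iterator[Row]:
    """Yield one {path, line, time, message} row per line of the file at `path`."""
    # errors="replace" keeps the reader going if the file has undecodable bytes.
    with open(path, "r", errors="replace") as handle:
        for number, text in enumerate(handle, start=1):
            yield {
                "path": path,
                "line": number,             # assumption: 1-based line number
                "time": int(time.time()),   # assumption: time the line was read
                "message": text.rstrip("\n"),
            }


if __name__ == "__main__":
    import sys

    # Usage: python log_rows.py <some log file>
    for row in log_rows(sys.argv[1]):
        print(row)
```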

Amazon Kinesis Streams & Firehose

In addition to the syslog tables, we have released osquery plugins that allow any query results to be sent to Amazon Kinesis Streams & Kinesis Firehose. This feature is merged into osquery’s master branch and is expected to ship with osquery v1.7.4. These plugins use the AWS C++ SDK to avoid the need to deploy the Amazon Kinesis Agent.
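The plugins themselves are written against the AWS C++ SDK; the Python sketch below only illustrates the general idea of shipping query results to a Kinesis stream in batches. The function names, the stream name and the partition key are hypothetical. The `client` argument is expected to behave like a boto3 Kinesis client, whose `put_records` call accepts at most 500 records per request.

```python
import json
from typing import Any, Dict, Iterable, List

MAX_BATCH = 500  # PutRecords accepts at most 500 records per request


def to_records(results: Iterable[Dict[str, Any]], partition_key: str) -> List[dict]:
    """Serialize each query result row as one Kinesis record."""
    return [
        {
            "Data": json.dumps(row, sort_keys=True).encode("utf-8"),
            "PartitionKey": partition_key,
        }
        for row in results
    ]


def send(client, stream_name: str, results, partition_key: str) -> int:
    """Send results in batches of MAX_BATCH; return the number of requests made.

    `client` is assumed to expose put_records(StreamName=..., Records=...),
    as a boto3 Kinesis client does. A production sender would also inspect
    FailedRecordCount in each response and retry the failed records.
    """
    records = to_records(results, partition_key)
    requests = 0
    for start in range(0, len(records), MAX_BATCH):
        client.put_records(
            StreamName=stream_name,
            Records=records[start:start + MAX_BATCH],
        )
        requests += 1
    return requests
```

With boto3 installed and AWS credentials configured, a call might look like `send(boto3.client("kinesis"), "osquery-results", rows, "host-01")`, where the stream name and partition key are placeholders.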

Kinesis Streams & Kinesis Firehose give us flexibility in how we both process and store our logs. Auto-scaling features and interoperability with services like S3 (~$0.0300 per GB) make it cost effective and low-maintenance.

In an upcoming blog post, Airbnb Security Engineer Jack Naglieri will detail our Kinesis Streams & Kinesis Firehose use-cases and infrastructure. Stay tuned!

Concluding Thoughts

This engineering effort was a labor of love by Zach Wasserman. We want to thank Teddy Reed and Mike Arpaia for their code reviews and help.

It was a lot of fun contributing to this open source initiative and we encourage others to do the same! One fantastic way to give back is by sharing your syslog-related queries with the community via query pack contributions.

We hope this blog post and our open source contributions encourage others to explore this logging architecture with us.

Manager, Airbnb Security

Check out all of our open source projects and follow us on Twitter: @AirbnbEng + @AirbnbData