This article was originally written by Zach Wasserman
Interested in correlating events from network monitoring tools to host activity? Support for Community ID hashing in osquery allows osquery’s endpoint instrumentation to be easily correlated with that of network monitors such as Zeek. Similar strategies can be used to correlate osquery logs with those from other tools that support Community ID. This includes Arkime (formerly Moloch), Suricata, and more.
Community ID is a hash of the network connection parameters that allows a connection to be matched between monitoring solutions that support the hash.
To generate a Community ID, a hash is computed over the source and destination IP addresses and ports, along with the protocol and a seed. The result is deterministic, so it can be compared across any software that implements it.
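To make the hashing step concrete, here is a minimal Python sketch of the calculation for TCP and UDP flows. The byte layout, the SHA-1 digest, the base64 encoding, and the `1:` version prefix are not given in this article. They follow the published Community ID version 1 specification as best understood here, so check the output against a reference implementation before relying on it. The function name `community_id_v1` and its parameters are chosen for illustration, and ICMP handling is omitted.

```python
import base64
import hashlib
import ipaddress
import struct

TCP, UDP = 6, 17  # IANA protocol numbers


def community_id_v1(src_ip, dst_ip, src_port, dst_port, proto=TCP, seed=0):
    """Sketch of a Community ID v1 hash for a TCP or UDP flow."""
    src = ipaddress.ip_address(src_ip).packed
    dst = ipaddress.ip_address(dst_ip).packed

    # Put the "smaller" endpoint first so that both directions of the
    # same connection produce an identical hash.
    if (src, src_port) > (dst, dst_port):
        src, dst = dst, src
        src_port, dst_port = dst_port, src_port

    # Assumed layout: seed (2 bytes), source address, destination address,
    # protocol (1 byte), zero padding (1 byte), source port, destination
    # port (2 bytes each), all in network byte order.
    data = (
        struct.pack("!H", seed)
        + src
        + dst
        + struct.pack("!BBHH", proto, 0, src_port, dst_port)
    )
    digest = hashlib.sha1(data).digest()
    return "1:" + base64.b64encode(digest).decode("ascii")


# Addresses from the documentation ranges, used purely as placeholders.
outbound = community_id_v1("192.0.2.10", "198.51.100.20", 51234, 443)
inbound = community_id_v1("198.51.100.20", "192.0.2.10", 443, 51234)
print(outbound == inbound)
```

Because the endpoints are ordered before hashing, a network sensor that sees the flow from either side and a host agent that sees only its local socket arrive at the same identifier, provided both use the same seed.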
How do we use Community ID to correlate the logs between a networking tool and osquery? Consider the following conn.log generated by Zeek:
Say that we are interested in the TCP connection involving 184.108.40.206:443. Looking in the last column of the log, we retrieve its Community ID.
We can now use the Community ID as the link to the information available in osquery:
This query provides a great deal more context for the network connection observed in Zeek. We can see what command is running (cmdline), the path to the executable (path), the executing user (uid), the start time of the process (start_time), and much more.
In this case we clearly observe that the network connection was made with the netcat tool.
The data retrieved by osquery can be further extended by joining to additional tables. For example, the following query also retrieves the MD5 hash of the process binary:
Extend the Concept
In the above examples, a live investigation of the network traffic is performed using osqueryi. How can we take advantage of this functionality to perform retroactive analysis?
Queries can be scheduled within osqueryd to log the Community ID of network connections along with the details about the associated process. These logs can be collected in the log aggregation pipeline/SIEM and correlated with the logs from network monitoring. Consider scheduling queries such as:
The community_id column can then be used to correlate the events logged by network monitors.
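As an illustration of what such a schedule could look like, the Python sketch below builds an osquery configuration fragment and prints it as JSON. It is not the author's query. It assumes osquery's `community_id_v1` SQL function and the `process_open_sockets` and `processes` tables behave as described in the osquery documentation. The query name `socket_community_ids`, the 60-second interval, and the filter on protocol and remote port are arbitrary choices made for this example.

```python
import json

# Assumption: community_id_v1(local_address, remote_address, local_port,
# remote_port, protocol) is available in the osquery version in use.
QUERY = (
    "SELECT DISTINCT pos.pid, p.path, p.cmdline, p.uid, pos.protocol, "
    "pos.local_address, pos.local_port, "
    "pos.remote_address, pos.remote_port, "
    "community_id_v1(pos.local_address, pos.remote_address, "
    "pos.local_port, pos.remote_port, pos.protocol) AS community_id "
    "FROM process_open_sockets pos JOIN processes p USING (pid) "
    # Illustrative filter: TCP/UDP sockets that have a remote peer.
    "WHERE pos.protocol IN (6, 17) AND pos.remote_port != 0;"
)

# Hypothetical schedule entry; the name and interval are placeholders.
config = {
    "schedule": {
        "socket_community_ids": {
            "query": QUERY,
            "interval": 60,
        }
    }
}

print(json.dumps(config, indent=2))
```

The printed JSON can be merged into an existing osqueryd configuration file. A shorter interval catches more short-lived connections at the cost of more query executions.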
On Linux, the socket_events table may be even more useful, as it captures all socket connections, not only those active at the time of query execution.
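Once both log streams reach the aggregation pipeline, the correlation itself amounts to a join on the Community ID. The Python sketch below shows the idea on already-parsed records. The helper name `correlate`, the field names other than `community_id`, and every sample value are hypothetical placeholders rather than real log output. It also assumes the osquery result columns have been flattened into one dictionary per row.

```python
from collections import defaultdict


def correlate(zeek_records, osquery_records):
    """Pair each Zeek connection with osquery rows sharing its Community ID."""
    by_cid = defaultdict(list)
    for row in osquery_records:
        cid = row.get("community_id")
        if cid:
            by_cid[cid].append(row)

    matches = []
    for conn in zeek_records:
        for row in by_cid.get(conn.get("community_id"), []):
            matches.append((conn, row))
    return matches


# Made-up sample records; the IDs are placeholders, not real hashes.
zeek_records = [
    {"uid": "C-example-1", "id.resp_h": "198.51.100.20", "community_id": "1:PLACEHOLDER-A"},
    {"uid": "C-example-2", "id.resp_h": "198.51.100.30", "community_id": "1:PLACEHOLDER-B"},
]
osquery_records = [
    {"pid": "4242", "path": "/usr/bin/nc", "cmdline": "nc 198.51.100.20 443",
     "community_id": "1:PLACEHOLDER-A"},
]

for conn, row in correlate(zeek_records, osquery_records):
    print(conn["uid"], "->", row["path"], row["cmdline"])
```

Indexing the osquery rows first keeps the lookup for each network event constant-time, which matters when the network monitor produces far more records than the endpoints do.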