Using osquery for Audits and Compliance on Windows
Regular audits of cyber telemetry are not only required by various compliance regimes (PCI-DSS, HIPAA, GDPR, etc.); they are equally important for maintaining cyber hygiene and can prevent plenty of breaches. As the famous adage goes, ‘prevention is better than cure’.
Following the latest breach of DNS infrastructure, the US government has also issued an advisory on the importance of audits.
But then comes the question of the RoI of regular audits and of setting up practices around them. For one, audits require procuring the relevant tools and then a bevy of experts to generate reports, neither of which is easy or cheap.
For most security-based audits, the following activities on a device need to be monitored regularly:
1) File activity (File Integrity Monitoring): Define a set of files (and folders) on which all write/modify/delete actions are tracked.
2) Process activity: Record the launch of every process so that it can be matched against rules for suspicious behavior.
3) Networking activity: Record all inbound/outbound connection activity and (as mentioned above) DNS lookups and resolutions, HTTP requests, and so on.
4) Removable media activity: Record USB inserts.
5) Health check monitoring of the security software on the device.
6) Regular monitoring of application and system logs.
When it comes to audit reports for endpoint devices, the sheer volume of data they generate compounds the problem.
With agents like osquery, this problem gets addressed to quite an extent. First, it is a community-built agent and therefore ‘free’. (Yes, free as in ‘free beer’.) Second, its secret sauce is that it collects data in the form of structured SQL tables, which makes for a much easier audit process. Unfortunately, osquery doesn’t provide a great deal of support for audits on the Windows operating system. However, given its open and extensible nature, that problem has been solved in a variety of ways with varying degrees of complexity and success, which gives end users plenty of choices to pick the one that works for their environment.
1) Using Windows Audits: With Windows 7 (and Server 2008 R2), Microsoft introduced advanced audit capabilities for system activities. These audit features are built on a mechanism called ETW (Event Tracing for Windows). The log of these activities can then be pumped into the Windows Event Log, and given that osquery has a way of capturing event log entries, a whole bunch of Windows audit events can be extracted via osquery. The auditing, however, is not enabled by default and has to be enabled, typically through a GPO. This trick is used successfully by a commercial solution built on osquery for Windows process audits. It is indeed a lightweight and fairly simple approach to filling the gap in osquery. However, it has a few Achilles’ heels, which make the fidelity of this technique a bit questionable:
a) If the audit policy can be enabled by an external mechanism (e.g. a GPO) or a local command, it can just as easily be disabled by malware that has managed to gain privileges, which pretty much every malware manages to do in a successful intrusion.
b) The ETW mechanism can be easily tampered with.
c) The Windows event logging system can be tampered with by attackers (and has been known to be).
So if the audit and compliance regime that one has to meet sets stringent criteria for the source of the data, the fidelity of the collection mechanism in this technique can be called into question.
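To make this first approach a bit more concrete, here is a minimal sketch (in Python, purely for illustration) that generates an osquery configuration for it. It assumes the Security channel is subscribed through the `windows_event_channels` option and that process creation shows up as Security event ID 4688. The helper `build_audit_config` and the query name `process_audit` are made up for this example. It also assumes the flag is accepted inside the config's `options` block; if not, it belongs in the flagfile. The Windows audit policy and osquery's event collection still have to be enabled separately.

```python
import json

# Security-log event ID for "A new process has been created".
PROCESS_CREATION_EVENT_ID = 4688


def build_audit_config(interval_seconds=60):
    """Return an osquery config dict that schedules a query over windows_events."""
    query = (
        "SELECT datetime, eventid, data FROM windows_events "
        f"WHERE source = 'Security' AND eventid = {PROCESS_CREATION_EVENT_ID};"
    )
    return {
        "options": {
            # Assumption: only the Security channel is needed for this audit.
            "windows_event_channels": "Security",
        },
        "schedule": {
            # "process_audit" is a hypothetical name chosen for this sketch.
            "process_audit": {"query": query, "interval": interval_seconds},
        },
    }


if __name__ == "__main__":
    # Print the config as JSON so it can be saved to a file and reviewed.
    print(json.dumps(build_audit_config(), indent=2))
```

Note that the sketch does nothing about the weaknesses listed above: if the audit policy or the event log is tampered with, the scheduled query simply returns nothing.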
2) Using Sysmon: This approach is a step up. Sysmon is an extremely flexible, lightweight monitoring tool from the stable of Sysinternals (Microsoft). It has an enormous fan following, including yours truly. Sysmon can be configured to collect a variety of system events and pump them out to the Windows event log. From there, using the same mechanism as above, that is, monitoring osquery’s windows_events table, the endpoint telemetry can be captured. Can the attacker not turn Sysmon off? Sure they can, but given that it is not part of the standard Windows image, the attacker has to take a few extra steps to determine whether telemetry is being captured by Sysmon and then turn it off. In doing so, the attacker might leave behind enough traces to get caught. Sysmon also captures all its events through its kernel driver and doesn’t rely on the ETW framework, so the fidelity of events captured by Sysmon would certainly be higher than that of any tool built on ETW in terms of accuracy, breadth and timeliness.
There are, however, some shortcomings to this solution too:
a) Now we are dealing with two independently built pieces of software (osquery and Sysmon), so some glue has to be added somewhere to create a common deployment mechanism and to give Sysmon its configuration, without which it can get really cranky and send a blizzard of data.
b) The beauty of osquery is its ability to present operating system data and properties as SQL tables. The SQL syntax allows for all kinds of query constraints and joins to get better context around each event and activity. Without that, osquery is merely a log stream forwarder, and the shelves are full of such dime-a-dozen tools. Having Sysmon log its events into the event log and using osquery just to forward that stream hardly does justice to the technical capabilities of either osquery or Sysmon.
c) Because this mechanism also depends on the Windows event logger, it is equally vulnerable to attackers shutting down the event logging service.
d) Sysmon’s DNS recording lacks the level of sophistication and precision that the US government advisory arguably calls for.
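Point (b) is easier to appreciate with an example. In the Sysmon route, every event lands in the single windows_events table, and the interesting fields have to be dug out of its `data` column after the fact. The sketch below, in Python, does that for process launches. It assumes rows arrive as dictionaries of strings (as in osquery's JSON results), that the channel name is `Microsoft-Windows-Sysmon/Operational`, that Sysmon event ID 1 means process creation, and that `data` holds JSON with an `EventData` object. That last layout is an assumption to verify against your own output, and `extract_process_launches` is a name invented for this example.

```python
import json

SYSMON_CHANNEL = "Microsoft-Windows-Sysmon/Operational"
SYSMON_PROCESS_CREATE = "1"  # Sysmon event ID for process creation, as a string


def extract_process_launches(rows):
    """Pick Sysmon process-creation events out of windows_events rows."""
    launches = []
    for row in rows:
        if row.get("source") != SYSMON_CHANNEL:
            continue
        if str(row.get("eventid")) != SYSMON_PROCESS_CREATE:
            continue
        try:
            payload = json.loads(row.get("data") or "")
        except json.JSONDecodeError:
            continue  # skip rows whose payload is not valid JSON
        if not isinstance(payload, dict):
            continue
        # Assumed layout: {"EventData": {"Image": ..., "CommandLine": ...}}
        event_data = payload.get("EventData")
        if not isinstance(event_data, dict):
            continue
        launches.append(
            {
                "datetime": row.get("datetime"),
                "image": event_data.get("Image"),
                "command_line": event_data.get("CommandLine"),
            }
        )
    return launches
```

All of this unpacking happens outside osquery, which is exactly the "log forwarder" problem: the SQL layer never gets to filter or join on the process path or command line.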
3) PolyLogyx Approach (via an osquery extension): There is no doubt that both of the above mechanisms can be made to work and are decent enough alternatives for getting features like FIM and process audits into osquery. As an engineer, however, one always tends to look for the next, better solution. With that aim, we created an osquery extension for real-time events on Windows. The advantages of this model are:
a) High-fidelity telemetry, as all the events are captured by a kernel component.
b) Sysmon-style filters that are applied and consumed through the standard osquery config file, thereby truly combining the two technologies rather than gluing them together with some kludgy shim.
c) The benefits of osquery’s SQL syntax, as all the telemetry is exported as normal SQL tables (instead of all the data being gobbled up in one single table called ‘windows_events’), thereby creating a better data pipeline.
d) An extensible model, with new tables being added on a regular basis, thereby making osquery a single agent for investigation, real-time telemetry, application log monitoring and incident response.
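To illustrate why separate tables make for a better pipeline than one catch-all table, here is a small sketch using Python's built-in sqlite3 module as a stand-in for osquery's SQL layer. The table names (`demo_process_events`, `demo_dns_events`), their columns and the sample rows are all hypothetical and are not the extension's actual schema. The only point is to show a join that ties DNS lookups back to the process that made them.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical event tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE demo_process_events (pid INTEGER, path TEXT);
        CREATE TABLE demo_dns_events (pid INTEGER, domain_name TEXT);
        """
    )
    # Made-up sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO demo_process_events VALUES (?, ?)",
        [(100, r"C:\Windows\System32\cmd.exe"), (200, r"C:\Tools\updater.exe")],
    )
    conn.executemany(
        "INSERT INTO demo_dns_events VALUES (?, ?)",
        [(200, "example.com"), (200, "example.org")],
    )
    return conn


def dns_lookups_by_process(conn):
    """Join DNS events to the process that issued them, matching on pid."""
    return conn.execute(
        "SELECT p.path, d.domain_name "
        "FROM demo_process_events AS p "
        "JOIN demo_dns_events AS d ON d.pid = p.pid "
        "ORDER BY p.path, d.domain_name"
    ).fetchall()


if __name__ == "__main__":
    for path, domain in dns_lookups_by_process(build_demo_db()):
        print(path, "looked up", domain)
```

With everything in a single event-log table, the same question would need the payload of every row to be parsed first, as in the Sysmon route.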
Depending on the level of sophistication needed by your compliance auditors, you can now take the best of the lot.