Event Log Auditing, Demystified
In my personal experience, the topic of reviewing event logs has received a fair amount of grunts, groans, and questions such as “You honestly expect us to review all of that data?!”, “We have so many systems! Where would we even begin?”, or “We already have enough on our plate to worry about!” Fortunately, times have changed, and log aggregation has matured over a relatively short period. Its existence alone, however, is not the complete answer to log auditing woes.
To start, let’s cover the ‘why’. What is the purpose of undertaking another tedious task and writing out an elaborate SOP? Well, from a practical perspective, incident detection. According to FireEye/Mandiant’s M-Trends 2018 report, the global median detection time among their clients in 2017 was an astounding 101 days. Think about your most important systems and the type of data they process. What kind of information could your organization be bleeding in that timeframe? What if you could have been tipped off to that login from a high-risk country, or repeated failed logins against your critical servers?
Second, most compliance initiatives require that your organization perform some form of auditing on your event logs. The infamous NIST 800-53, commonly cited across SOX, HIPAA, PCI DSS, and of course FISMA, requires that your organization define a timeframe in which you review your logs. See most controls prefixed with “AU”.
Straightforward enough in theory, but let’s examine the practical pieces of the puzzle.
Choosing Your Software
For starters, the idea of connecting into every system week after week to examine countless event codes across your entire network is not feasible. Having access to a plethora of information is great, but potentially useless without something to parse out the noise. If your organization does not currently have a log aggregation solution, get one set up ASAP. The power and insight a well-structured log aggregation implementation provides is invaluable to any organization; if not for security reasons, then for operational troubleshooting.
There are quite a few options to choose from, ranging from commercial to open source. Splunk is a clear contender for the top spot, and you can get started with a free 500MB license. Splunk was an early product to the game, and has been extensively covered by numerous cybersecurity outlets. If you are looking for a product with support, this is presently the best route to go. If you are even more cost-conscious and feeling adventurous, take a look at the open-source ELK Stack. This option combines Elasticsearch, Logstash, and Kibana into a robust and cost-effective alternative to the commercial products.
The choice is yours, but at this point in the game there is no reason not to have one.
Configuring Your Indexes
Now that you have a place to put your logs, which systems will you pull from and how will you structure them? This one is tough, as guidance typically is not clear. Do you pull logs from all of your servers and workstations? If your resources for storage are unlimited, sure! Unfortunately, this is likely never the case.
The rule of thumb here is to collect enough logs from your critical systems and infrastructure to accomplish successful reconstruction of events on your network. To start, some events you will want to correlate are:
- Domain Controller security logs, including authentication events.
- DNS queries (specifically A record lookups, if possible).
- DHCP lease events.
- PowerShell usage.
- Process Creation events.
- Any logging from systems used to remotely connect into your network.
- Web server logs from IIS, Apache, or your webserver of choice.
- Administrative events from your networking equipment, including wireless connection events.
The key is to keep your indexes clean. Think of your indexes as a collection of buckets for your logs. Combining your inputs is the equivalent of mixing your car’s oil collection pan with the water you just pulled from the well. Things swirl around, mix up, and get messy. In the end no one has any idea what is in there, but everyone knows it is useless.
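To make the bucket analogy concrete, here is a minimal sketch of routing events into dedicated indexes by source type. The source-type names and index names are illustrative assumptions, not tied to any specific product’s configuration; the point is simply that each log source lands in its own bucket rather than one undifferentiated pile.

```python
from collections import defaultdict

# Hypothetical mapping of log source types to dedicated indexes.
# Names are illustrative; adapt them to your own environment.
INDEX_FOR_SOURCE = {
    "wineventlog:security": "wineventlog",
    "dns:query": "dns",
    "dhcp:lease": "dhcp",
    "powershell:operational": "powershell",
    "iis:access": "web",
}

def route_event(source_type, event, indexes=None):
    """Drop each event into the bucket (index) for its source type."""
    if indexes is None:
        indexes = defaultdict(list)
    index_name = INDEX_FOR_SOURCE.get(source_type, "misc")
    indexes[index_name].append(event)
    return indexes

# Usage: DNS queries and security events end up in separate buckets.
indexes = defaultdict(list)
route_event("dns:query", {"query": "example.com", "type": "A"}, indexes)
route_event("wineventlog:security", {"EventCode": 4624}, indexes)
```

In a real deployment this separation is done in your aggregator’s input configuration rather than in code, but the principle is identical: one source type, one clearly named index.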
Selecting Your Artifacts
With that being said, it’s time to think about the most efficient use of the data being aggregated. I don’t know about you, but going into your log aggregator of choice with a list of event codes sounds pretty daunting, and likely will lead to the process getting lost in the fray as other “more important” operational tasks take priority. The answer is to assemble a carefully curated dashboard with key artifacts ready for review.
Tools such as Splunk have some excellent built-in visualization capabilities to digest the information as easily as possible (pie charts, tables, etc.). A great reference point to get started with is this SANS Reading Room article on detecting penetration testers on your network. A good penetration test will simulate the actions of an intruder in a controlled fashion, and home in on some key criteria to detect lateral movement in your network. This includes:
- User account lockouts.
- Failed VPN login attempts.
- Failed Windows logins.
- Modifications to sensitive groups (Domain Admins especially).
- Crawling on C$ and ADMIN$ shares.
- Log file deletion.
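As a rough sketch of what a review of these artifacts might look like in code, the snippet below scans parsed Windows Security events for a handful of the IDs above (4625 failed logon, 4740 account lockout, 4728/4732 sensitive group modifications, 1102 audit log cleared) and flags bursts of failed logons per account. The threshold and event shape are assumptions; in practice this logic lives in your aggregator’s dashboards and alerts, not a script.

```python
from collections import Counter

# Windows Security event IDs commonly tied to lateral movement:
SUSPICIOUS_EVENTS = {
    4625: "failed logon",
    4740: "account lockout",
    4728: "member added to security-enabled global group",
    4732: "member added to security-enabled local group",
    1102: "audit log cleared",
}
FAILED_LOGON_THRESHOLD = 5  # illustrative; tune to your environment

def review(events):
    """Flag suspicious event IDs and bursts of failed logons per account."""
    findings = []
    failures = Counter()
    for e in events:
        label = SUSPICIOUS_EVENTS.get(e["EventCode"])
        if label:
            findings.append((e["EventCode"], label, e.get("Account")))
        if e["EventCode"] == 4625:
            failures[e.get("Account")] += 1
    for account, count in failures.items():
        if count >= FAILED_LOGON_THRESHOLD:
            findings.append(("burst", f"{count} failed logons", account))
    return findings
```

The same event IDs map directly onto dashboard panels: one table per ID, plus a chart of failed logons grouped by account.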
These are all common actions a threat may take to work their way through your network, and should absolutely be raising some eyebrows with your ops teams. I urge you to take a look at the cited reference below, and think about how you can apply these items to your own infrastructure. SANS is a fantastic resource for everything incident response and cybersecurity, and will very likely be a part of your research journeys.
Implementing The Process
Here’s the fun part (sarcasm, obviously); it’s time to document the process. The unfortunate truth for us tech-minded folks is that unless there is a documented policy and procedure, there might as well be no process at all. When an auditor inevitably gives your office a ring and requests a kickoff meeting, you will need to prove the work that was done.
Outline the policy describing what requirements are being fulfilled here. If you made it this far into the article, you likely have a business reason for why this process must be performed, and it should be clearly noted. From there, outline the procedure explaining:
- Who is responsible for performing the task.
- How frequently the task is going to be performed.
- What artifacts are going to be evaluated.
- What the escalation process is if something looks suspicious, including who to contact.
- How you will document that the process was performed.
When it comes to tracking your actions, don’t think too hard on it. Leverage your existing ticket tracking or change management system (you do have one… right?) or establish a tracking form/spreadsheet to document your findings. Finally, do your peer review and get the sign-off needed to make it official.
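If a spreadsheet is the route you take, the record-keeping can be as simple as appending one row per review. The sketch below assumes a hypothetical CSV file and column set; the filename, fields, and reviewer ID are all illustrative, not a prescribed format.

```python
import csv
from datetime import date
from pathlib import Path

LOG_REVIEW_FILE = Path("log_review_tracker.csv")  # hypothetical tracking sheet
FIELDS = ["date", "reviewer", "artifacts_reviewed", "findings", "escalated_to"]

def record_review(reviewer, artifacts_reviewed, findings="none", escalated_to="n/a"):
    """Append one row documenting that the review was performed."""
    new_file = not LOG_REVIEW_FILE.exists()
    with LOG_REVIEW_FILE.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "date": date.today().isoformat(),
            "reviewer": reviewer,
            "artifacts_reviewed": artifacts_reviewed,
            "findings": findings,
            "escalated_to": escalated_to,
        })

# Usage: one row per completed review, suspicious or not.
record_review("j.doe", "failed logins; lockouts; DA group changes")
```

Whatever the medium, the point is the same: a dated, attributable record that the review happened, ready to hand to an auditor.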
Meeting security requirements in your network can be a tough task, and ambiguously worded guidance doesn’t make things much easier. Hopefully this article has provided you with a place to start when making your organization safer and more compliant in the realm of event log auditing. Having that level of transparency and insight into your network may be the most valuable defense against the ever-changing threat landscape.
Thanks for reading! The goal here is to help organizations of all sizes tackle complex IT security challenges, and bridge cybersecurity policy into operations. Comments or critiques? Reach me on LinkedIn, Twitter, email (jeremy.trinka[at]gmail[dot]com), or reply below.
Most importantly, don’t forget to clap and subscribe! If you don’t have an account with Medium, get one!
References
- FireEye/Mandiant, “M-Trends 2018 Report”, https://www.fireeye.com/content/dam/collateral/en/mtrends-2018.pdf
- Splunk, https://www.splunk.com/
- ELK Stack, https://www.elastic.co/elk-stack
- SANS Institute, Infosec Reading Room, “Detecting Penetration Testers on a Windows Network with Splunk”, https://www.sans.org/reading-room/whitepapers/logging/detecting-penetration-testers-windows-network-splunk-37367