The Truth Behind Anomaly Detection

I am often asked “How did you find that?” The answer is usually not as simple as “I clicked here, then there, then boom.” Rather, it’s an in-depth discovery process: “I found IP x.x.x.x is their internal scan box, x.x.x.x is the CMS (Configuration Management System), x.x.x.x is the Nagios server, x.x.x.x runs their ‘Y’ script, etc.”

All of this information needs to be sifted through by your security team. Whether it’s post-incident analysis or daily review and monitoring, all events must be accounted for. On top of all the undocumented network traffic, there are noisy (weak) signatures to sift through.

Speeding Up Analysis

This analysis can be sped up in a number of ways, but first it’s important to note that it is impossible to baseline IDS/IPS traffic without knowledge of the network. The incident responder does not necessarily need to hold this knowledge personally to be successful: as long as the network and IDS are properly tuned, anomalous behavior can be identified without it. I’m talking about a perfect world here, and we all know that doesn’t exist.

The next point to note is that this knowledge can be acquired during the analysis phase, but doing so extends the response time. This is why proper documentation, research, and configuration of your network and IDS during the initial deployment of your security tool are crucial.

A proper baseline cannot be established without an initial investment of time and lots of intervention. The first place to start is the source of your data: the network. Intervening throughout your network to remove the ‘white noise’ will drastically decrease your time to respond and improve your anomaly detection, because it leaves you with smaller, more actionable data sets.


  • Remove/fix broken scripts
  • Identify scanning hosts
      • Vulnerability scanner
      • Configuration Management Systems
  • Document purpose of hosts
      • Web Server
      • Database Server
      • Active Directory
      • Etc.


  • Whitelist scanning hosts (HOST => INTERNAL) for expected types of traffic, but keep malware and reputation-based signatures on:
      • NMap, Vulnerability, etc.
  • Adjust signatures to watch each host for its type of service. For example, there is no need to have signatures watching a Linux box for IIS attacks.
  • Do not whitelist:
      • INTERNAL => HOST, EXTERNAL => HOST, or HOST => EXTERNAL (depending on purpose)
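
As a rough sketch of the whitelisting rule above: the addresses, the `src`/`dst`/`category` event fields, and the category names below are all hypothetical placeholders, not tied to any particular IDS. The idea is only that a scanner's traffic is suppressed in one direction (HOST => INTERNAL) and for expected categories, while everything else, including malware hits from the scanner itself, stays visible.

```python
import ipaddress

# Hypothetical values for illustration; substitute your own inventory.
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")
SCANNING_HOSTS = {"10.0.0.5", "10.0.0.6"}
# Signature categories a scanner is expected to trigger (assumed names).
EXPECTED_CATEGORIES = {"scan", "vulnerability-probe"}


def is_internal(ip):
    """True if the address falls inside the internal network."""
    return ipaddress.ip_address(ip) in INTERNAL_NET


def is_whitelisted(event):
    """True only for scanner => INTERNAL traffic of an expected category."""
    return (
        event["src"] in SCANNING_HOSTS
        and is_internal(event["dst"])
        and event["category"] in EXPECTED_CATEGORIES
    )


def filter_events(events):
    """Drop whitelisted events; everything else stays in the view."""
    return [e for e in events if not is_whitelisted(e)]
```

Because the check requires all three conditions, traffic aimed at the scanner (INTERNAL => HOST, EXTERNAL => HOST) and traffic leaving the network from it (HOST => EXTERNAL) is never suppressed.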


Below are two histograms covering a four-day period. Figure 1 displays the raw data without filters. It’s obvious something occurred on 12/12, but what’s not so obvious is what occurred on the evening of 12/09. This is due to the overwhelming number of events created by two hosts, both of which are known scanning hosts. Once these two hosts are filtered out (Figure 2), our view moves from 30,000 feet to somewhere closer to the third or fourth floor. A nice detail to note is the top number on the y-axis (9,000 vs 2,500). We could do even better here, but this is a great example of how views like these lose value when overwhelming, less significant data is included in your data sets.

[Figures 1 and 2: event histograms over the four-day period, unfiltered and with the two scanning hosts filtered out]
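
The same before-and-after view can be approximated in code. This is a minimal sketch, assuming each event is a `(timestamp, source_ip)` pair; the function name, the event shape, and the addresses in the usage lines are made up for illustration and are not the hosts from the figures.

```python
from collections import Counter
from datetime import datetime


def hourly_histogram(events, exclude_hosts=frozenset()):
    """Count events per hour, skipping any source in exclude_hosts."""
    counts = Counter()
    for timestamp, source_ip in events:
        if source_ip in exclude_hosts:
            continue
        # Truncate the timestamp to the start of its hour.
        counts[timestamp.replace(minute=0, second=0, microsecond=0)] += 1
    return counts


# Made-up sample data: two events from a scanner, two from other hosts.
sample = [
    (datetime(2014, 12, 9, 20, 5), "192.0.2.10"),
    (datetime(2014, 12, 9, 20, 15), "192.0.2.10"),
    (datetime(2014, 12, 9, 20, 30), "198.51.100.7"),
    (datetime(2014, 12, 12, 3, 0), "198.51.100.8"),
]
raw = hourly_histogram(sample)
filtered = hourly_histogram(sample, exclude_hosts={"192.0.2.10"})
```

Comparing `max(raw.values())` with `max(filtered.values())` gives the same kind of y-axis drop described above.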




  1. Continuous baselining
      • Baselines should adjust constantly, week to week and month to month, as tuning continues
      • Baselines can adjust while you are doing anomaly detection (see the screenshots)
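
One way to picture a baseline that keeps adjusting: compare each new daily event count against the mean and standard deviation of a sliding window of recent days, then fold the new count into the window. This is only a sketch; the class name, the seven-day window, and the three-sigma threshold are assumptions chosen for illustration, not values from this post.

```python
from collections import deque
from statistics import mean, pstdev


class RollingBaseline:
    """Sliding-window baseline over daily event counts (illustrative only)."""

    def __init__(self, window=7, threshold=3.0):
        self.history = deque(maxlen=window)  # oldest counts fall off automatically
        self.threshold = threshold

    def observe(self, count):
        """Return True if count is anomalous, then fold it into the baseline."""
        anomalous = False
        if len(self.history) >= 2:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            # Use a floor of 1.0 so a flat window does not flag every change.
            anomalous = abs(count - mu) > self.threshold * max(sigma, 1.0)
        self.history.append(count)
        return anomalous
```

Note that this version folds anomalous counts into the window as well; a stricter variant might leave them out until they have been researched.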


  1. Display all data
  2. Identify…
      • Top x (IP, Signature, Port, Country, etc)
      • Bottom x (IP, Signature, Port, Country, etc)
      • Random subset of (IP, Signature, Port, Country, etc)
  3. Research (Internal Host, Packets, External IP)
  4. Action (Block, Document, Restore, etc)
  5. Remove from current view
  6. Rinse and Repeat

If I could script it…

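import random

def research(signature):
    # Stand-in for the manual work (internal host, packets, external IP);
    # there is no real 'research' module to import.
    print("research:", signature)

# Assumes 'list-o-sigs' holds one signature per line, sorted by event count.
with open('list-o-sigs') as f:
    signatures = [line.strip() for line in f if line.strip()]

x = 10
for signature in signatures[:x]:    # top x
    research(signature)
for signature in signatures[-x:]:   # bottom x
    research(signature)
for signature in random.sample(signatures, min(x, len(signatures))):  # random x
    research(signature)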
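
The ‘Identify’ and ‘Remove from current view’ steps of the workflow can also be sketched directly against event data. This assumes events are dictionaries keyed by field name (IP, signature, port, country, and so on); the function names and the `seed` parameter are hypothetical helpers, not part of any tool.

```python
import random
from collections import Counter


def identify(events, field, x=10, seed=None):
    """Return the top x, bottom x, and a random x of the values seen in field."""
    # most_common() with no argument ranks every value, highest count first.
    ranked = Counter(e[field] for e in events).most_common()
    values = [value for value, _ in ranked]
    rng = random.Random(seed)
    return {
        "top": values[:x],
        "bottom": values[-x:],
        "random": rng.sample(values, min(x, len(values))),
    }


def remove_from_view(events, field, handled):
    """Drop events whose field value has already been researched and actioned."""
    return [e for e in events if e[field] not in handled]
```

Each pass shrinks the view, so the next call to `identify` surfaces whatever the previous top talkers were hiding.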