Statistics used for Anomaly Detection on Network Systems

Sherif Mansour
Sep 28, 2014


My notes

Operative Models
Take a metric x and a sequence of observations x1, x2, x3 and so on. With fixed thresholds, you set a limit (for example xn+30) and raise an alert whenever an observation exceeds it.
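A minimal sketch of a fixed-threshold check, assuming the metric is a simple per-minute counter. The function name, the sample data and the threshold of 30 are all made up for illustration.

```python
def threshold_alerts(observations, threshold):
    """Return (index, value) pairs for observations above a fixed threshold."""
    return [(i, x) for i, x in enumerate(observations) if x > threshold]


# Hypothetical failed-login counts per minute.
failed_logins = [3, 5, 2, 41, 4, 38]
alerts = threshold_alerts(failed_logins, threshold=30)
print(alerts)
```

The weakness is visible straight away: the threshold never adapts, so it has to be picked by hand for every metric.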

First & second order moments
Take the average of previous observations and measure how far new observations deviate from it. We can also derive the standard deviation from those same observations.

We can do that because of Chebyshev’s inequality: the probability that a new observation falls more than d standard deviations away from the mean is at most 1/d^2.

What does this mean?
If you assume certain statistical properties in your original observations, you can make a pretty good educated guess: if something is THIS far out, there is a pretty low probability that it counts as normal behaviour.
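To make the bound concrete, here is a small sketch that computes the mean and standard deviation of past observations and then the Chebyshev upper bound 1/d^2 for a new value. The function name and sample numbers are hypothetical, and the zero-variance handling is my own assumption.

```python
import statistics


def chebyshev_bound(history, new_value):
    """Upper bound on the probability of a value at least this far from the mean."""
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    if std == 0:
        # Assumption: with no spread at all, any different value is treated as impossible.
        return 1.0 if new_value == mean else 0.0
    d = abs(new_value - mean) / std  # distance in standard deviations
    if d <= 1:
        return 1.0  # the bound says nothing useful within one deviation
    return 1.0 / d ** 2


# Hypothetical connections-per-second history.
history = [10, 12, 11, 9, 10, 12, 11, 9]
bound = chebyshev_bound(history, 20)
print(f"probability of a value this extreme is at most {bound:.4f}")
```

The bound holds for any distribution with a finite mean and variance, which is why it is loose but safe to use without knowing the shape of the data.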

Drawback: Concept Drift
If you give all observations equal weight, the baseline is slow to follow changes in normal behaviour. You need a decay function, because at a certain point old data no longer holds value.
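One common way to add decay is an exponentially weighted mean and variance, where each new observation gets weight alpha and older ones fade out geometrically. This is a sketch of that idea, not necessarily the decay function meant in these notes; the class name and alpha=0.1 are assumptions.

```python
class DecayingBaseline:
    """Exponentially weighted running mean and variance."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # weight of the newest observation
        self.mean = None
        self.var = 0.0

    def update(self, x):
        if self.mean is None:
            self.mean = float(x)
            return
        diff = x - self.mean
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)


# Hypothetical traffic level that shifts from 10 to 50.
values = [10] * 100 + [50] * 100
baseline = DecayingBaseline(alpha=0.1)
for v in values:
    baseline.update(v)

plain_mean = sum(values) / len(values)
print(baseline.mean, plain_mean)
```

The equal-weight average is stuck at 30, halfway between the old and new levels, while the decayed baseline has settled close to 50.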

Multivariate
If we care about correlation, we look at the relationships between many variables. Factor analysis can identify the covariance between sets of variables through a smaller set of latent variables.

Things get smudged a bit, but it is great to know if something on this corner of the network has a bearing on that corner of the network.

The place to deploy factor analysis is where you have linear dependencies (such as individual network packets), as long as you can map them to a scalar value.
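Full factor analysis needs a statistics library, so the sketch below is deliberately not factor analysis. It only computes the pairwise Pearson correlation between two scalar metrics, which is the kind of relationship factor analysis summarises across many variables. The metric names and values are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical metrics from two corners of the network.
edge_traffic = [10, 20, 30, 40, 50]
core_cpu = [12, 22, 29, 41, 52]
r = pearson(edge_traffic, core_cpu)
print(f"correlation: {r:.3f}")
```

A value close to 1 suggests the two corners move together; a sudden break in that relationship is itself something worth alerting on.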

Multidimensional scaling
Here we are not looking for a single observation that is strange. Instead we condense observations into clusters in a multidimensional space and define a proximity metric for the objects.

Scale things down with a mapping function that projects the vectors of the high-dimensional space onto a smaller number of dimensions, so that the distances between points condense the differences between them.

Markov process (preceding step determines the next one)
Limited to certain events. A random process in which the transition probabilities from one state to the next depend solely on the preceding state (used with event counter metrics).

In a first-order Markov process only the single preceding observation is considered. The transitions are held as a matrix of states, and if an observed transition crosses a certain threshold an alert is raised. That is great because it's not time based (it can be run for hours or days).
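A rough sketch of the first-order case: learn transition probabilities from a known-good event sequence, then flag any transition whose learned probability is below a cutoff. The event names, the function names and the 0.05 cutoff are assumptions for illustration.

```python
from collections import Counter, defaultdict


def train_transitions(events):
    """Build a transition matrix {prev: {next: probability}} from a sequence."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(events, events[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for prev, nxts in counts.items()
    }


def unlikely_transitions(events, matrix, min_prob=0.05):
    """Return (prev, next, probability) for transitions rarer than min_prob."""
    alerts = []
    for prev, nxt in zip(events, events[1:]):
        p = matrix.get(prev, {}).get(nxt, 0.0)  # unseen transitions count as 0
        if p < min_prob:
            alerts.append((prev, nxt, p))
    return alerts


# Hypothetical session events used as the "normal" baseline.
normal = ["login", "read", "read", "logout"] * 20
matrix = train_transitions(normal)

suspicious = ["login", "read", "delete", "logout"]
alerts = unlikely_transitions(suspicious, matrix)
print(alerts)
```

Nothing here depends on timestamps, only on the order of events, which is why it can keep running over hours or days.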

Time Series
The exact opposite of the Markov process is the time series. You observe the sequence and the time interval between observations. An observation is considered abnormal if the probability of it occurring at the measured point in time is low.
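As a simple sketch of the time-based view, the code below looks at the gaps between event timestamps and flags a gap that lies many standard deviations from the baseline gap. Using a deviation count as a stand-in for "low probability" is my assumption, as are the names and the sample data.

```python
import statistics


def gap_alerts(timestamps, baseline_gaps, max_deviations=3.0):
    """Return (timestamp, gap) pairs whose gap is far from the baseline."""
    mean = statistics.fmean(baseline_gaps)
    std = statistics.pstdev(baseline_gaps)
    alerts = []
    for earlier, later in zip(timestamps, timestamps[1:]):
        gap = later - earlier
        if std > 0 and abs(gap - mean) / std > max_deviations:
            alerts.append((later, gap))
    return alerts


# Hypothetical heartbeat that normally arrives about every 60 seconds.
baseline_gaps = [60, 62, 58, 61, 59, 60]
timestamps = [0, 60, 121, 123, 183]  # seconds; one event arrives far too early
alerts = gap_alerts(timestamps, baseline_gaps)
print(alerts)
```

The event at 123 seconds is flagged because it arrives 2 seconds after the previous one, when the baseline says roughly 60.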

