Analyzing TI feeds for overlap, novelty and aging

By M H. Published in OUSPG, Jul 16, 2019.

In this blog post we address the problem of evaluating threat intelligence (TI) feeds. We analyze a set of 16 freely available feeds from the point of view of overlap, novelty and aging. This is accomplished using two tools: Combine, which gathers the feed data, and tiq-test, which runs the tests.

We learned that overlap, novelty and aging tests are a feasible first step towards repeatable analysis of threat intelligence feeds. However, one should keep in mind that interpreting the results may not be as straightforward as one would initially think.

Gathering the data

Our main interest is in feeds containing IP addresses as indicators of compromise (IoC). To this end, we use Combine to gather TI feed data and store it in a format suitable for tiq-test. We do this daily and get about 18 MB of gzipped CSV data per day. The data gathering step takes about 10 minutes on a typical desktop computer. Tests from the tiq-test suite are then run against this data set in the R programming environment.
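To make the shape of the gathered data concrete, here is a minimal Python sketch of reading one day's gzipped CSV and grouping IP indicators by feed. It is an illustration only, not part of Combine or tiq-test: the column names `entity` and `source`, the feed names and the sample rows are all assumptions made for this example.

```python
import csv
import gzip
import io
from collections import defaultdict


def load_feed_ips(fileobj):
    """Group IP indicators by feed from a gzip-compressed CSV stream.

    Assumption: the CSV has a header row with 'entity' (the IP address)
    and 'source' (the feed it came from) columns.
    """
    feeds = defaultdict(set)
    with gzip.open(fileobj, mode="rt", newline="") as handle:
        for row in csv.DictReader(handle):
            feeds[row["source"]].add(row["entity"])
    return dict(feeds)


# A tiny in-memory sample stands in for a real daily file.
sample_csv = (
    "entity,type,direction,source,notes,date\n"
    "192.0.2.1,IPv4,inbound,feed_a,,2019-07-01\n"
    "192.0.2.2,IPv4,inbound,feed_a,,2019-07-01\n"
    "192.0.2.2,IPv4,inbound,feed_b,,2019-07-01\n"
)
sample_gz = io.BytesIO(gzip.compress(sample_csv.encode("utf-8")))

feeds = load_feed_ips(sample_gz)
print({name: sorted(ips) for name, ips in feeds.items()})
```

Keeping each feed as a set of addresses makes the later comparisons (overlap, additions, removals) simple set operations.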

Feed Overlap

The overlap test takes 20 seconds and tells us what portion of a feed is contained in another feed. The results are given in graphical form, such as the figure below, and the numerical values are available in R variables. We see that all but one feed are quite unique in their IP address content.
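The idea behind the overlap measure can be sketched in a few lines of Python. This is an illustration of the concept, not the tiq-test implementation; the function name `overlap_matrix` and the sample feeds are hypothetical.

```python
def overlap_matrix(feeds):
    """Return {(a, b): fraction of feed a's IPs that also appear in feed b}."""
    matrix = {}
    for a, ips_a in feeds.items():
        for b, ips_b in feeds.items():
            if a == b:
                continue
            # Guard against empty feeds to avoid division by zero.
            matrix[(a, b)] = len(ips_a & ips_b) / len(ips_a) if ips_a else 0.0
    return matrix


# Hypothetical sample feeds using documentation address ranges.
feeds = {
    "feed_a": {"192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"},
    "feed_b": {"192.0.2.4", "198.51.100.7"},
}

matrix = overlap_matrix(feeds)
for (a, b), share in sorted(matrix.items()):
    print(f"{share:.0%} of {a} is contained in {b}")
```

Note that the measure is not symmetric: in this sample a quarter of feed_a is found in feed_b, while half of feed_b is found in feed_a. This is one reason the results need careful interpretation.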

A similar overlap analysis matrix is available from MISP at https://www.misp.software/feeds/

Feed Novelty

Running the novelty test takes about two minutes and produces numerical results, which are also displayed in graphical form. For the purposes of illustration, we show results for only a few feeds in the figures below. The graphs depict the ratio of IP addresses added and removed per day. Our take is that high quality feeds update their content more often than lower quality feeds. However, this is highly dependent on feed type. For example, a feed may contain highly relevant data which, for one reason or another, is not updated very often.
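A rough Python sketch of the added and removed ratios follows. The normalization used here is an assumption made for illustration (additions relative to the current day's feed size, removals relative to the previous day's size) and may differ from what tiq-test computes; the function name `daily_churn` and the sample snapshots are hypothetical.

```python
def daily_churn(snapshots):
    """Compute per-day added/removed ratios for one feed.

    snapshots maps an ISO date string to the set of IPs seen that day.
    Returns a list of (date, added_ratio, removed_ratio) tuples.
    """
    days = sorted(snapshots)  # ISO dates sort chronologically as strings
    results = []
    for prev_day, day in zip(days, days[1:]):
        prev_ips, ips = snapshots[prev_day], snapshots[day]
        added = ips - prev_ips
        removed = prev_ips - ips
        # Assumed normalization: see the note above the code.
        added_ratio = len(added) / len(ips) if ips else 0.0
        removed_ratio = len(removed) / len(prev_ips) if prev_ips else 0.0
        results.append((day, added_ratio, removed_ratio))
    return results


# Hypothetical three-day history of a single feed.
snapshots = {
    "2019-07-01": {"192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"},
    "2019-07-02": {"192.0.2.3", "192.0.2.4", "192.0.2.5", "192.0.2.6"},
    "2019-07-03": {"192.0.2.3", "192.0.2.4", "192.0.2.5", "192.0.2.6", "192.0.2.7"},
}

churn = daily_churn(snapshots)
for day, added_ratio, removed_ratio in churn:
    print(day, f"added={added_ratio:.2f}", f"removed={removed_ratio:.2f}")
```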

This test failed for one of our feeds because of a URL redirect in the data gathering stage. In other words, the source recorded in our data is not the same as the URL in our list of feeds.
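A mismatch of this kind can be spotted before running the tests by comparing the configured feed URLs with the source values that actually appear in the gathered data. The following Python sketch assumes both are available as plain lists of strings; the function name and the example URLs are hypothetical.

```python
def find_source_mismatches(configured_urls, observed_sources):
    """Compare the configured feed list against sources seen in the data."""
    configured = set(configured_urls)
    observed = set(observed_sources)
    return {
        # Configured feeds with no matching records (e.g. after a redirect).
        "missing_from_data": sorted(configured - observed),
        # Sources in the data that are not in the configured list.
        "unexpected_in_data": sorted(observed - configured),
    }


# Hypothetical example: one feed was redirected to a different URL.
configured_urls = ["http://feeds.example.org/a.txt", "http://feeds.example.org/b.txt"]
observed_sources = ["http://feeds.example.org/a.txt", "https://cdn.example.org/b.txt"]

report = find_source_mismatches(configured_urls, observed_sources)
print(report)
```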

Feed Aging

The aging test takes about two minutes to run. Here, aging means the number of times an indicator is repeated on a feed throughout the time interval of the data. A sample of results is shown in the figures below.
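The aging definition above can be illustrated with a short Python sketch that counts on how many daily snapshots each indicator appears and then summarizes those counts. This is a sketch of the idea, not the tiq-test code; the function names and sample data are hypothetical.

```python
from collections import Counter


def indicator_ages(snapshots):
    """Count on how many daily snapshots each IP appears.

    snapshots maps a date string to the set of IPs seen on a feed that day.
    """
    ages = Counter()
    for ips in snapshots.values():
        ages.update(ips)  # each IP in a day's set adds one to its count
    return ages


def age_histogram(ages):
    """Map an age in days to the number of indicators with that age."""
    return Counter(ages.values())


# Hypothetical three-day history of a single feed.
snapshots = {
    "2019-07-01": {"192.0.2.1", "192.0.2.2"},
    "2019-07-02": {"192.0.2.1", "192.0.2.2", "192.0.2.3"},
    "2019-07-03": {"192.0.2.1", "192.0.2.3"},
}

ages = indicator_ages(snapshots)
histogram = age_histogram(ages)
for age in sorted(histogram):
    print(f"{histogram[age]} indicator(s) seen on {age} day(s)")
```

A feed whose histogram is concentrated at the maximum age rarely expires its indicators, while a feed concentrated at low ages rotates its content quickly.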

Again, this test failed for one feed for the same reason as the novelty test (the data record did not match the original feed URL).

Sources

https://gitlab.com/CinCan/wp1/tree/master/tiq-tests
