8 Steps to normalise naming of cyber threats and related entities

RST Cloud
Aug 28, 2022


Many vendors and cyber researchers use their own taxonomy or threat classification. One of the benefits of RST Threat Feed is that threat names attached to indicators are normalised automatically across all of the various threat intelligence sources. When indicators come from different sources, the names used by different parties may vary even if the indicators relate to one specific threat: they may be spelled slightly differently, or be a completely different word, sometimes a synonym. As a result, there are usually many variations of the name for the same threat. Our engine parses out the names used by different sources and normalises them into one specific tag that uniquely identifies that particular threat.
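At its simplest, normalisation means mapping every known variant of a name to one canonical tag. The sketch below is a minimal illustration, not RST Cloud's implementation: the alias table is a tiny hypothetical sample (Fancy Bear and Sofacy are widely reported aliases of APT28), and the folding rules in the hypothetical `canonical_key` helper (lowercase, drop spaces, hyphens, underscores and dots) are our assumptions.

```python
import re
from typing import Optional

# Hypothetical alias table: folded variant -> canonical tag.
# A production dictionary would hold thousands of entries; these are illustrative.
ALIASES = {
    "apt28": "apt28",
    "fancybear": "apt28",
    "sofacy": "apt28",
    "lockbit": "lockbit",
    "lockbit20": "lockbit",
}


def canonical_key(name: str) -> str:
    """Fold case and strip separators so 'Fancy-Bear' and 'fancy bear' compare equal."""
    return re.sub(r"[\s\-_.]+", "", name.lower())


def normalise(name: str) -> Optional[str]:
    """Return the canonical tag for a known name variant, or None if it is unknown."""
    return ALIASES.get(canonical_key(name))


for raw in ["Fancy Bear", "SOFACY", "apt-28", "LockBit 2.0", "UnknownThing"]:
    print(raw, "->", normalise(raw))
```

Folding before lookup lets one dictionary key cover "APT 28", "apt-28" and "APT_28"; genuinely different words such as "Sofacy" still need their own alias entries, which is why such a dictionary keeps growing.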

A similar approach is applied to malware families, TTPs, threat actor groups, vulnerability names, phishing toolkits, and other tools used by adversaries.
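For entity types with a fixed identifier format, such as vulnerabilities (CVE IDs) and TTPs (MITRE ATT&CK technique IDs), a single regular expression per type can absorb most spelling variants. The patterns and function names below are a simplified sketch of our own; the accepted separators are assumptions, and free-text names such as malware families or groups still need a dictionary.

```python
import re
from typing import Optional

# Accepts spellings such as "cve 2021 44228", "CVE_2021-44228", "Cve-2021-44228".
CVE_RE = re.compile(r"\bcve[\s_-]*(\d{4})[\s_-]*(\d{4,})\b", re.IGNORECASE)
# ATT&CK technique IDs: "T" + 4 digits, optional ".NNN" (or "_NNN") sub-technique.
TTP_RE = re.compile(r"\bt(\d{4})(?:[._](\d{3}))?\b", re.IGNORECASE)


def normalise_cve(text: str) -> Optional[str]:
    """Return the first CVE ID in the text in canonical 'CVE-YYYY-NNNN' form."""
    m = CVE_RE.search(text)
    return f"CVE-{m.group(1)}-{m.group(2)}" if m else None


def normalise_ttp(text: str) -> Optional[str]:
    """Return the first ATT&CK technique ID in the text as 'T1234' or 'T1234.001'."""
    m = TTP_RE.search(text)
    if not m:
        return None
    return f"T{m.group(1)}" + (f".{m.group(2)}" if m.group(2) else "")


print(normalise_cve("exploits cve 2021 44228"))
print(normalise_ttp("uses t1059.001 for execution"))
```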

Here is what the threat name normalisation process looks like in a real-life scenario:

  1. An indicator comes with a short description (1–2 sentences).
  2. The engine breaks those sentences into tokens (word tokenisation).
  3. It then combines the tokens into trigrams (sequences of three consecutive tokens).
  4. The trigrams are matched using regular expressions (see the picture below) against a specially crafted dictionary that aggregates various synonyms for threat names. Interestingly, the engine can also deal with the most frequent typos. Our dictionary has now become quite rich and includes several thousand entries.
  5. If a trigram contains an already known threat name, it is discarded. The remaining trigrams move on to the next step.
  6. We run the remaining trigrams through the stop-word dictionary (words that should never be part of a threat name).
  7. If any trigrams are still left, we split them into batches and push them through the RST Cloud engine, which automatically searches for each one in the data sources we use for threat name analysis (search engines, Malpedia, many TI portals, specialised blogs, forums, the deep web, etc.).
  8. The final check is to examine what the engine found and update the dictionaries based on the results.
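Steps 2 to 8 can be sketched end to end in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not RST Cloud's engine: the dictionary entries, the stop-word list, the tokenisation regex and the function names (`compile_dictionary`, `tokenise`, `trigrams`, `process`) are hypothetical; step 6 is interpreted as stripping stop words from each trigram and keeping the remainder as a candidate name; and the external search in step 7 is left out, so `process` only returns the batches that would be sent.

```python
from __future__ import annotations

import re

# Step 4 dictionary (hypothetical entries): regex pattern -> canonical tag.
# Patterns absorb a few frequent typos/spellings, e.g. "Emottet", "Lock Bit".
DICTIONARY = {
    r"\bemot+et\b": "emotet",
    r"\block\s?bit\b": "lockbit",
    r"\b(?:apt\s?28|fancy\s?bear|sofacy)\b": "apt28",
}

# Step 6 stop words (hypothetical): words that should never be part of a threat name.
STOP_WORDS = {"a", "an", "the", "and", "by", "of", "to", "used", "deliver",
              "domain", "malicious", "tooling"}


def compile_dictionary(entries: dict[str, str]) -> list[tuple[re.Pattern[str], str]]:
    return [(re.compile(p, re.IGNORECASE), tag) for p, tag in entries.items()]


def tokenise(text: str) -> list[str]:
    # Step 2: word tokens; keeps internal dots/hyphens ("2.0", "Cobalt-Strike").
    return re.findall(r"\w+(?:[.-]\w+)*", text)


def trigrams(tokens: list[str]) -> list[tuple[str, ...]]:
    # Step 3: runs of three consecutive tokens; short texts yield one shorter n-gram.
    if len(tokens) < 3:
        return [tuple(tokens)] if tokens else []
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def process(description: str,
            patterns: list[tuple[re.Pattern[str], str]],
            stop_words: set[str],
            batch_size: int = 50) -> tuple[set[str], list[list[str]]]:
    tags: set[str] = set()
    candidates: list[str] = []
    for gram in trigrams(tokenise(description)):
        text = " ".join(gram)
        found = {tag for rx, tag in patterns if rx.search(text)}  # Step 4
        if found:
            tags |= found  # Step 5: known name -> record its tag, discard the trigram
            continue
        # Step 6 (assumed variant): drop stop words, keep the rest as a candidate.
        kept = " ".join(t for t in gram if t.lower() not in stop_words)
        if kept and kept not in candidates:
            candidates.append(kept)
    # Step 7: batch the leftovers for an external lookup (not implemented here).
    batches = [candidates[i:i + batch_size]
               for i in range(0, len(candidates), batch_size)]
    return tags, batches


desc = "Malicious domain used by Emottet to deliver APT XYZ tooling"
tags, batches = process(desc, compile_dictionary(DICTIONARY), STOP_WORDS)
print(sorted(tags), batches)

# Step 8: once a reviewer confirms the new name, add a pattern for it and re-run.
updated = {**DICTIONARY, r"\bapt\s?xyz\b": "apt_xyz"}
tags, batches = process(desc, compile_dictionary(updated), STOP_WORDS)
print(sorted(tags), batches)
```

In this sample run the typo "Emottet" resolves to the `emotet` tag, while the unknown fragments "APT" and "APT XYZ" end up in a batch for lookup. After step 8 adds a pattern for the confirmed name, the same description is tagged directly and only the fragment "APT" is left over, which is the kind of noise the review step has to filter out.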

These are just the basic steps. In practice, you have to solve many very specific problems: for example, whether or not to apply stemming, lemmatisation and other NLP techniques, and then how to deal with what they turn group names like “APT XYZ. Flying young elephant in a china shop” into.
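To illustrate the kind of damage generic preprocessing can do to such a name, here is a toy example of our own. A deliberately naive sentence splitter and a crude suffix stripper (`split_sentences` and `crude_stem` are hypothetical helpers, not a real stemmer or lemmatiser) stand in for real NLP tooling: the group name is cut in two at the full stop, and "Flying" is reduced to "Fly", so a dictionary entry written for the original name would no longer match.

```python
from __future__ import annotations

import re

NAME = "APT XYZ. Flying young elephant in a china shop"


def split_sentences(text: str) -> list[str]:
    # Naive sentence splitter: break after ".", "!" or "?" followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def crude_stem(word: str) -> str:
    # Toy suffix stripper (NOT a real stemmer): drop "ing"/"ed"/"s" from longer words.
    for suffix in ("ing", "ed", "s"):
        if len(word) > len(suffix) + 2 and word.lower().endswith(suffix):
            return word[: -len(suffix)]
    return word


print(split_sentences(NAME))
print(" ".join(crude_stem(w) for w in NAME.split()))
```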
