How we are building a defense against the next big cyber threat with machine learning

Esteban Vargas
3 min read · Jan 4, 2019

--

You might have already heard of cryptojacking, which is when hackers hijack computers to mine cryptocurrency without the user’s consent.

Cryptojacking is turning out to be a better illegal business model than ransomware because it doesn’t depend on the victim paying a ransom. A site containing a cryptojacking script runs in-browser code that mines Bitcoin, Monero or some other cryptocurrency. The user just notices a slower computer, but doesn’t know why. Since the scripts run in the browser, they are really hard for antivirus software to detect. That’s why illegal download sites have shifted their business model towards cryptojacking.

Over 33,000 sites (together reaching over 1 billion users) have been reported to contain cryptojacking scripts. The monthly growth rate is calculated to be 18%.

For businesses this is a problem because some cryptojackers act as worms, meaning that if one machine on a Wi-Fi network gets infected, the rest of the machines can be infected as well. Companies end up spending money on IT support, on replacing equipment and even on an electrician (since the high levels of CPU usage generate expensive electricity bills).

At https://www.safetalpa.com/ we saw one of our customers suffer from this attack. When exploring how to solve the problem, we concluded that current solutions are superficial. They are open-source browser extensions that blacklist known malicious sites, but we asked ourselves: what if a malicious site isn’t listed yet? This is where machine learning comes to the rescue.

  1. We are performing both supervised and unsupervised learning on the machine’s network statistics. The netstat command gives us a high-dimensional dataset containing variables such as foreign address, local address and transmission protocol, among others. We’ve been able to reverse engineer cryptojacking scripts to build a labeled dataset that we can feed to a neural network, but we’ve also learned that we need to cluster network packets with unsupervised learning for robustness.
  2. We’re experimenting with adding static analysis to our platform. This means that we’re collecting JavaScript code that runs cryptojackers and analyzing its software complexity metrics (Halstead complexity, cyclomatic complexity, SLOC, etc.). Since we expect the malicious scripts to evolve, a machine learning classifier that can pick up new patterns is the way to go.
  3. Our product roadmap also includes analyzing CPU usage.
  4. Since we are very conscious of the fact that hackers will fight AI with adversarial AI, we’ve researched how to defend against it. There’s a great paper on the topic that proposes a solution based on Kerckhoffs’s second cryptographic principle. In that scenario it’s assumed that the attacker knows the system’s architecture completely, has access to the data used for training and testing, and can observe the output of the classifier for each given input.
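To make the first point more concrete, here is a minimal sketch of how netstat output could be turned into rows that a classifier or clustering algorithm can consume. It is not our production code. The sample text, the `Connection` class, the `parse_netstat` and `to_features` helpers and the chosen features are all hypothetical, and the column layout assumes Linux-style `netstat -an` output (other systems format it differently).

```python
from dataclasses import dataclass

# Hypothetical sample of Linux-style `netstat -an` output (addresses are made up).
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.168.1.10:51234      203.0.113.7:3333        ESTABLISHED
tcp        0      0 192.168.1.10:51240      198.51.100.4:443        ESTABLISHED
udp        0      0 0.0.0.0:68              0.0.0.0:*
"""


@dataclass
class Connection:
    proto: str
    local_addr: str
    local_port: str
    foreign_addr: str
    foreign_port: str
    state: str


def parse_netstat(text):
    """Parse tcp/udp lines of netstat output into Connection records."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the header and anything that is not a tcp/udp socket line.
        if len(parts) < 5 or not parts[0].startswith(("tcp", "udp")):
            continue
        proto, _recv_q, _send_q, local, foreign = parts[:5]
        state = parts[5] if len(parts) > 5 else ""  # udp lines have no state
        # Split "address:port" on the last colon.
        l_addr, _, l_port = local.rpartition(":")
        f_addr, _, f_port = foreign.rpartition(":")
        rows.append(Connection(proto, l_addr, l_port, f_addr, f_port, state))
    return rows


def to_features(conn):
    """Map one connection to a small numeric feature dict (illustrative only)."""
    return {
        "is_tcp": int(conn.proto.startswith("tcp")),
        "foreign_port": int(conn.foreign_port) if conn.foreign_port.isdigit() else -1,
        "established": int(conn.state == "ESTABLISHED"),
    }


if __name__ == "__main__":
    for conn in parse_netstat(SAMPLE):
        print(conn.foreign_addr, to_features(conn))
```

Rows like these could then be labeled for supervised training or passed unlabeled to a clustering algorithm. A real pipeline would need many more features than the three shown here.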

We’re very proud to already be running our beta version with 3 SMEs (over 70 computers in total). If you would like to test our beta, feel free to write to me at esteban@safetalpa.com
