Securing Windows: Machine Learning for Cybersecurity

Bryan Phee
Intel Student Ambassadors
7 min read · Oct 28, 2019

Over the past few decades, significant technological advancements have reduced costs and widened access to personal computers. However, the increased uptake of computing devices has been paralleled by an increase in the number of cyber-attacks that users are exposed to. Cyber-attacks are the fastest-growing crime in the United States, and cybercrime has been predicted to cost companies US $6 trillion every year by 2021 [1]. Large-scale cyber-attacks have also attracted significant public attention, such as WannaCry, in which more than 200,000 Windows users had their personal files encrypted and were forced to pay ransoms [2].

Many massive hacks are classified as advanced persistent threats (APTs): well-resourced attacks that target specific organizations holding high-value data. As WannaCry showed, APTs can have devastating effects, and most organizations therefore treat them as a priority. Organizations generally secure their systems using signature-based and rule-based protection tools such as firewalls. However, many APTs are programmed to lie dormant within the target system for a long period and reactivate only when needed, allowing them to slip past these outer defenses; signature- and rule-based tools alone are therefore insufficient to block advanced APTs. APTs can, however, be identified through anomaly detection: because they are built to accomplish specific criminal objectives, the processes they invoke differ from those invoked by typical system users.

Windows is the dominant desktop operating system worldwide, with an estimated 77.61% share of the market as of July 2019 [3]. An estimated 800 million devices run the latest version of Windows [4], making it a large market to target. A review of current research on APT protection for the Windows operating system revealed previous work by Tuor et al. [5], who implemented a neural network for anomaly detection. Motivated by this, the objective of this project is to implement and iterate upon the models proposed in [5].

Methodology

A subset of the publicly available “Comprehensive, Multi-Source Cybersecurity Events” dataset published by the Los Alamos National Laboratory [6] was used with the neural network. The chosen dataset contains more than 1.6 billion authentication events captured on an internal network over 58 days. In total, 386 malicious event logs were identified in the portion of the data used for testing. The log format and an example log are shown in Figure 1.
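As a concrete illustration of how one such log can be turned into model input, the sketch below splits a comma-separated authentication record into word-level tokens. The field layout follows the LANL dataset's published format; the sample line and the choice to drop the timestamp are illustrative assumptions, not the project's exact preprocessing.

```python
# Hypothetical preprocessing sketch for a LANL-style authentication log.
# Field names follow the dataset's published format; the sample line is illustrative.
FIELDS = [
    "time", "source_user", "dest_user", "source_computer",
    "dest_computer", "auth_type", "logon_type", "auth_orientation", "outcome",
]

def tokenize_log(line: str) -> list[str]:
    """Split a raw comma-separated log line into word-level tokens."""
    values = line.strip().split(",")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    # Dropping the timestamp is a modeling choice assumed here:
    # the language model predicts the categorical fields only.
    return values[1:]

sample = "1,C625$@DOM1,U147@DOM1,C625,C625,Negotiate,Batch,LogOn,Success"
tokens = tokenize_log(sample)
print(tokens)
```

Each resulting token is then mapped to an integer index before being fed to the embedding layer of the network.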

Figure 1. Log Format and Example of Data

The process by which the loss associated with each individual log was calculated is illustrated in Figure 2. Each log was first split into n shorter sequences of equal size, which were then fed into the network. The network predicted a probability distribution for each token, and the loss of each prediction was computed from it. The average of all sequence losses was then used to determine whether the event should be considered anomalous. A separate neural network model was trained for each user to learn that user’s normal behavior, which differs from individual to individual. To maximize predictive performance on the dataset, different variants of RNNs were tested and compared, and multiple optimizations were made to the network structure.
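The loss-averaging step described above can be sketched as follows. The lists of probabilities stand in for the trained network's predicted probability of each true token; they are hypothetical placeholders, and the threshold value is likewise an assumption.

```python
import math

def sequence_loss(predicted_probs: list[float]) -> float:
    """Average cross-entropy of one sequence: -log p for each true token."""
    return sum(-math.log(p) for p in predicted_probs) / len(predicted_probs)

def log_anomaly_score(sequences: list[list[float]]) -> float:
    """Average the losses of the n sub-sequences a log was split into."""
    return sum(sequence_loss(s) for s in sequences) / len(sequences)

def is_anomalous(score: float, threshold: float) -> bool:
    """Flag the event if its average loss exceeds a (per-user) threshold."""
    return score > threshold

# Toy example: two sub-sequences, each holding the model's predicted
# probability for the token that actually occurred.
seqs = [[0.9, 0.8, 0.7], [0.95, 0.85, 0.6]]
score = log_anomaly_score(seqs)
```

A log whose tokens the user-specific model predicts confidently yields a low average loss; an unusual sequence of fields drives the loss, and hence the anomaly score, up.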

Figure 2. Loss Calculation Process for Single Log

As event logs are time-ordered and sequential, recurrent neural networks (RNNs) were chosen because they can learn the relationships within such data and make suitable predictions. To find the best-performing model, comparisons were made between the standard RNN, the more advanced Long Short-Term Memory (LSTM) model, and bidirectional versions of both. LSTM networks improve on traditional RNNs because, in addition to the short-term memory of tokens near the next predicted token, they can retain long-term information in their hidden states. In a bidirectional model, this memory process runs in both directions: tokens that appear later in the sequence are also used to predict earlier ones. This makes the algorithm independent of the order in which fields appear within a log, a suitable property here since the optimal field order is unknown. Tuor et al. [5] implemented a bidirectional LSTM for this purpose, and their reported results were chosen as the basis for reference and comparison.

Preliminary Testing

The baseline performance was first recorded using an embedding input dimension of 50, with 2 hidden units in the RNN / LSTM layer. These values were chosen arbitrarily and were tweaked during hyperparameter tuning after the baseline performance was recorded. The neural networks were created using the Keras API with the TensorFlow backend and trained on the Intel DevCloud platform, which provides optimized frameworks for more efficient training and evaluation. The obtained results are shown in Table 1 below.
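A rough sketch of this baseline in Keras is shown below. The embedding dimension (50) and recurrent units (2) come from the text; the vocabulary size and sequence length are placeholders, since the article does not specify them, and the output head and optimizer are assumptions about the setup.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 1000  # placeholder: number of distinct tokens in the logs
SEQ_LEN = 8        # placeholder: tokens per input sequence

def build_baseline(bidirectional: bool = True, use_lstm: bool = False):
    """Baseline from the text: embedding dim 50, 2 recurrent hidden units."""
    rnn_cls = layers.LSTM if use_lstm else layers.SimpleRNN
    rnn = rnn_cls(2)
    if bidirectional:
        rnn = layers.Bidirectional(rnn)
    model = models.Sequential([
        layers.Embedding(VOCAB_SIZE, 50),
        rnn,
        layers.Dense(VOCAB_SIZE, activation="softmax"),  # next-token distribution
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model
```

Toggling `bidirectional` and `use_lstm` yields the four variants compared in Table 1 from one builder, which keeps the comparison controlled apart from the recurrent layer itself.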

Table 1. Comparison of RNN Model Performance

The optimized structure achieved a true positive rate of 83.1% with a significantly lower false positive rate of 5.56%. Notably, the bidirectional SimpleRNN model outperformed the bidirectional LSTM model. This can be attributed to the relatively short input sequences: long-term memory offers no benefit when tokens are never far apart. Bidirectional models were also observed to outperform their unidirectional counterparts. Because the optimal ordering of the fields in each input sequence is unknown, bidirectional models mitigate this uncertainty by taking both directions of propagation into account. The results obtained with the various models improve on the findings reported in the reference paper using word-level tokenization, which could be due to differences in data preparation and model setup.
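For reference, the rates reported in Table 1 follow directly from confusion-matrix counts. The helper below shows the computation; the counts used here are illustrative stand-ins (only the 386 malicious test logs are stated in the text), not the experiment's actual confusion matrix.

```python
def rates(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Return (true positive rate, false positive rate) as fractions."""
    tpr = tp / (tp + fn)  # detected attacks / all attacks
    fpr = fp / (fp + tn)  # false alarms / all benign events
    return tpr, fpr

# Illustrative counts only: 386 malicious logs were present in the test data.
tpr, fpr = rates(tp=321, fn=65, fp=556, tn=9444)
print(f"TPR={tpr:.1%}, FPR={fpr:.1%}")
```

The trade-off discussed throughout the article is between these two quantities: raising the anomaly threshold lowers the false positive rate but lets more attacks through.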

Tweaking and Tuning

Further optimization was carried out by tuning the network hyperparameters. A single-layer neural network can only represent linearly separable functions, where the classes to be predicted are easily separated [7]; using a network with multiple layers allows the modeling of more complex functions, which could improve predictive accuracy if the problem demands a deeper model. A model with an additional dense layer was therefore also tested. The results of these tests are reported in Table 2.
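A sketch of the deeper variant is shown below. The vocabulary size is again a placeholder, and the width of the added dense layer is an assumed value, since the article does not specify it.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 1000  # placeholder: number of distinct tokens in the logs

def build_deeper(hidden_units: int = 2, embed_dim: int = 50,
                 dense_units: int = 32):  # dense width is an assumed value
    """Variant tested in Table 2: one extra dense layer before the output."""
    model = models.Sequential([
        layers.Embedding(VOCAB_SIZE, embed_dim),
        layers.Bidirectional(layers.LSTM(hidden_units)),
        layers.Dense(dense_units, activation="relu"),  # the added dense layer
        layers.Dense(VOCAB_SIZE, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model
```

The same builder also covers the other sweep in Table 2: passing larger `hidden_units` and `embed_dim` reproduces the wider configurations.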

Table 2. Results of Hyperparameter Tuning

The results in Table 2 show that increasing the embedding input dimension and the number of hidden units in the LSTM layer raised the true positive rate, at the cost of a correspondingly higher false positive rate. The added dense layer, on the other hand, caused a significant decrease in the true positive rate, while lowering the false positive rate by 33% relative to the original model. Since detecting attacks is the priority in this context, the model was judged to perform better without the additional dense layer.

Conclusion

In conclusion, we have created a recurrent neural network that performs well at detecting anomalous events on the Windows operating system, and we have explored how different neural network structures and hyperparameter choices produce different results on the same dataset. Cybersecurity is a highly relevant field that affects all of us in our daily lives, and machine learning has the potential to enhance it, helping protect our systems and data from attack.

References

[1] Morgan, S. (2018). Global Cybercrime Damages Predicted To Reach $6 Trillion Annually By 2021. Retrieved from Cybercrime Magazine: https://cybersecurityventures.com/cybercrime-damages-6-trillion-by-2021/

[2] Piper, E. (2017). Cyber Attack Hits 200,000 In At Least 150 Countries: Europol. Retrieved from Reuters: https://www.reuters.com/article/us-cyber-attack-europol/cyber-attack-hits-200000-in-at-least-150-countries-europol-idUSKCN18A0FX

[3] Statcounter. (2018). Desktop Operating System Market Share Worldwide. Retrieved from Statcounter: https://gs.statcounter.com/os-market-share/desktop/worldwide#monthly-201807-201807-map

[4] Microsoft. (n.d.). Microsoft By The Numbers. Retrieved from Microsoft: https://news.microsoft.com/bythenumbers/en/windowsdevices

[5] Tuor, A., Baerwolf, R., Knowles, N., Hutchinson, B., Nichols, N., & Jasper, R. (2018). Recurrent Neural Network Language Models for Open Vocabulary Event-level Cyber Anomaly Detection. Proceedings of AAAI-2018 Artificial Intelligence in Cyber Security Workshop.

[6] Kent, A. (2015). Cybersecurity Data Sources for Dynamic Network Research. Dynamic Networks in Cybersecurity.

[7] Brownlee, J. (2018). How to Configure the Number of Layers and Nodes in a Neural Network. Retrieved from Machine Learning Mastery: https://machinelearningmastery.com/how-to-configure-the-number-of-layers-and-nodes-in-a-neural-network/
