If You’re Into Cybersecurity, Get Into Splunk and Machine Learning

--

The future will see the rise of the cybersecurity specialist who uses data science and machine learning in the way that we have used spreadsheets in the past. These specialists will take data sets from system logs and from open source data, and then make predictions from that data. They will find correlations and build new threat models. The rise of Splunk, too, seems almost endless, especially as it provides a great interface to machine learning models.

So let’s take an example of malware infection [download dataset]:

The data set contains 98,944 records, which is rather a lot, so we will analyse only the first 50,000. The fields are: receive_time, serial_number, session_id, src_ip, dst_ip, bytes_sent, bytes_received, packets_sent, packets_received, dest_port, src_port, used_by_malware, and has_known_vulnerability. First we will use logistic regression to train a model that predicts “used_by_malware” from all the other fields.
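To make the idea concrete outside of Splunk, here is a minimal sketch of logistic regression in plain Python. It is not the Splunk workflow itself, and it does not use the real data set: the records are synthetic, generated by a hypothetical helper (`make_synthetic_records`) under the made-up assumption that malware sessions send more bytes and packets. Only four numeric fields from the list above are used as inputs, and all function names are illustrative.

```python
import math
import random

# A numeric subset of the article's fields, used as model inputs.
FEATURES = ["bytes_sent", "bytes_received", "packets_sent", "packets_received"]
TARGET = "used_by_malware"

def make_synthetic_records(n, seed=0):
    """Hypothetical stand-in for the log data; NOT the real data set."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        malware = rng.random() < 0.3
        # Demo-only assumption: malware sessions send roughly 3x more traffic.
        scale = 3.0 if malware else 1.0
        records.append({
            "bytes_sent": rng.gauss(1000 * scale, 200),
            "bytes_received": rng.gauss(800, 200),
            "packets_sent": rng.gauss(10 * scale, 3),
            "packets_received": rng.gauss(8, 3),
            TARGET: int(malware),
        })
    return records

def column_stats(rows):
    """Per-column mean and standard deviation (std falls back to 1.0 if zero)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def scale_rows(rows, means, stds):
    """Standardise each column with the given statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def sigmoid(z):
    # Split by sign so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, xi):
    """Return 1 if the predicted probability of malware is at least 0.5."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5)

records = make_synthetic_records(2000)
rows = [[r[f] for f in FEATURES] for r in records]
labels = [r[TARGET] for r in records]

# 70/30 train/test split; scaling statistics come from the training rows only.
split = int(0.7 * len(rows))
means, stds = column_stats(rows[:split])
X_train = scale_rows(rows[:split], means, stds)
X_test = scale_rows(rows[split:], means, stds)

w, b = fit_logistic(X_train, labels[:split])
correct = sum(predict(w, b, xi) == yi for xi, yi in zip(X_test, labels[split:]))
accuracy = correct / len(X_test)
print(f"hold-out accuracy on synthetic data: {accuracy:.3f}")
```

The learned weights in `w` line up with `FEATURES`, so the larger magnitudes show which fields the model leans on. With real logs, categorical fields such as src_ip or dest_port would need encoding before they could be used in the same way.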

--

Prof Bill Buchanan OBE FRSE
ASecuritySite: When Bob Met Alice

Professor of Cryptography. Serial innovator. Believer in fairness, justice & freedom. Based in Edinburgh. Old World Breaker. New World Creator. Building trust.