This short post catalogs some resources that may be useful for those interested in security data science. It is not meant to be exhaustive; rather, it is a curated list to help you get started.
Staying Current with Security Data Science
Here is my strategy for keeping up with security data science research. It leans more heavily toward academic research, since that is what interests me at the moment.
- Google Scholar Publication alerts on known respected researchers.
- Google Scholar Citation alerts on interesting or noteworthy papers.
- Follow security ML researchers on Twitter and Medium. They frequently share interesting, cutting-edge research papers, videos, and blogs.
- Periodically review proceedings from noteworthy security conferences.
- Skim published security conference videos from Irongeek looking for topics of interest.
Google Scholar alerts
Citation Alerts on these papers:
- “Acing the IOC game: Toward automatic discovery and analysis of open-source cyber threat intelligence”
- “AI^2: training a big data machine to defend”
- “APT Infection Discovery using DNS Data”
- “Beehive: Large-scale log analysis for detecting suspicious activity in enterprise networks”
- “Deep neural network based malware detection using two dimensional binary program features”
- “Detecting malicious domains via graph inference”
- “Detecting malware based on DNS graph mining”
- “Detecting structurally anomalous logins in Enterprise Networks”
- “Discovering malicious domains through passive DNS data graph analysis”
- “EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models”
- “Enabling network security through active DNS datasets”
- “Feature-based transfer learning for network security”
- “Gotcha-Sly Malware!: Scorpion A Metagraph2vec Based Malware Detection System”
- “Guilt by association: large scale malware detection by mining file-relation graphs”
- “Identifying suspicious activities through dns failure graph analysis”
- “Polonium: Tera-scale graph mining and inference for malware detection”
- “Segugio: Efficient behavior-based tracking of malware-control domains in large ISP networks”
New article alerts on these authors with the bolded being the most relevant / interesting to me.
- Alina Oprea: heavily focused on operational security ML.
- Josh Saxe, Rich Harang, and Konstantin Berlin: heavily focused on malware detection/analytics using ML. Josh Saxe is also a published book author.
- Manos Antonakakis and Roberto Perdisci: heavily focused on network security analytics using ML, with a specialty in DNS traffic.
- Balduzzi Marco
- Battista Biggio
- Chaz Lever
- Christopher Kruegel
- Damon McCoy
- David Dagon
- David Freeman
- Gianluca Stringhini
- Giovanni Vigna
- Guofei Gu
- Han Yufei
- Hossein Siadati
- Issa Khalil
- Jason (Iasonas) Polakis
- Michael Donald Bailey
- Michael Iannacone
- Nick Feamster
- Niels Provos
- Nir Nissim
- Patrick McDaniel
- Stefan Savage
- Steven Noel
- Terry Nelms
- Ting-Fang Yen
- Vern Paxson
- Wenke Lee
- Yacin Nadji
- Yanfang (Fanny) Ye
- Yizheng Chen
- Yuval Elovici
Twitter can be a gold mine for new and relevant ideas, blogs, presentations, etc. for security data science. You just need to make sure you continually follow the right folks. Here is a short list of thought leaders in this space (if I left you off, it is my oversight, so please don’t take offense).
For a more exhaustive list of others I would recommend following on Twitter, see this gist. This list is focused on Threat Intel, Threat Hunting, Detection Engineering, IR, and Security Engineering. It is not exhaustive, but is a good start.
Below are several interesting security conferences where research on security data science topics is published. It is a good idea to be on the lookout for the proceedings from these events.
- ACM CODASPY (ACM Conference on Data and Application Security and Privacy)
- AISec (ACM Workshop on Artificial Intelligence and Security)
- Annual Computer Security Applications Conference (ACSAC)
- Conference on Applied Machine Learning for Information Security
- Deep Learning and Security Workshop (colocated with the IEEE Symposium on Security and Privacy, also known as “Oakland”)
- DEEPINTEL Conference. Focus on security intelligence.
- Defcon AIVillage
- Machine Learning and Computer Security Workshop (colocated with NIPS)
- ScAINet: 2018 USENIX Security and AI Networking Conference
- Workshop on Data Management for End-to-End Learning
- Workshop on Graph Data-management Experiences & Systems (colocated with SIGMOD/PODS)
- Workshop on Managing Insider Security Threats (In Conjunction with ACM CCS 2017)
- Workshop on Mining and Learning with Graphs (colocated with KDD)
This page is also an excellent general resource for top academic security conferences: Top Academic Security conferences list. The major industry-focused security conferences like Blackhat, RSA, Defcon, BSides*, DerbyCon, and ShmooCon all frequently have talks relevant to security data science, but it is not their primary focus, so they are not explicitly called out above.
These resources will help you build a baseline of knowledge in cyber security and machine learning.
Security:
- Extrusion Detection: Security Monitoring for Internal Intrusions by Richard Bejtlich
- Intelligence-Driven Incident Response: Outwitting the Adversary by Scott J. Roberts and Rebekah Brown
- Counter Hack Reloaded: A Step-by-Step Guide to Computer Attacks and Effective Defenses (2nd Edition) by Edward Skoudis and Tom Liston
Security Data Science:
- Network Security Through Data Analysis: Building Situational Awareness by Michael S Collins
- Malware Data Science: Attack Detection and Attribution by Joshua Saxe and Hillary Sanders
- Machine Learning and Security: Protecting Systems with Data and Algorithms by Clarence Chio and David Freeman
Machine Learning / Data Science:
- Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow, 2nd Edition by Sebastian Raschka and Vahid Mirjalili
- Deep Learning with Python by Francois Chollet
- O’Reilly Learning Platform
- FastAI: Practical Deep Learning for Coders
- FastAI: Cutting Edge Deep Learning for Coders
- FastAI: Introduction to Machine Learning for Coders
- FastAI: Computational Linear Algebra
- Coursera Deep Learning
- Coursera Machine Learning
- Udacity Deep Learning
Short Courses / Live Sessions
O’Reilly’s learning platform has some pretty interesting security + ML / DS related “Live Training sessions”. These are usually just a few hours long, and they all make their course materials available through O’Reilly’s GitLab instance, which is open to the public.
This blog post was originally published on my personal blog at http://www.covert.io/security-data-science-learning-resources/