CodeX

Everything connected with Tech & Code. Follow to join our 1M+ monthly readers

Data Science & Exploratory Data Analysis: the Panda versus the Pony!

Exploratory data analysis sits at the core of any insightful data work. Analyzing logs in search of threats is no different.

Whether you are a security analyst triaging alerts or a detection engineer hunting for detection opportunities in logs, your ability to understand data is what determines your success.

Now that security logs are abundant, being fluent in data is what sets you apart from the average cybersecurity pro who has domain knowledge 'only'.

What is EDA?

In statistics, exploratory data analysis (EDA) is an approach of analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods — Wikipedia.

In the context usually covered here, mainly around detection engineering and security analytics, that's the process of digging into log data to answer the questions we often have when facing a new log data source (dataset).

Exploratory data analysis is the discovery of trends and patterns in data using statistics and visual representations.

Many individuals in our industry already do that without knowing, myself included (a few years back). EDA is the entire process of untangling a brand new log source to figure out what’s inside!
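To make that untangling concrete, here is a minimal sketch of the first questions one might ask of a new log source: which fields exist, how often each is populated, how many distinct values it holds, and what time span the data covers. The sample events, field names, and the helper functions `profile` and `time_range` are all hypothetical, made up purely for illustration, and the sketch sticks to the Python standard library.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample events; field names and values are made up for illustration.
events = [
    {"timestamp": "2024-01-01T10:00:00", "user": "alice", "action": "login", "src_ip": "10.0.0.5"},
    {"timestamp": "2024-01-01T10:05:00", "user": "bob", "action": "login", "src_ip": "10.0.0.7"},
    {"timestamp": "2024-01-01T10:07:00", "user": "alice", "action": "file_read", "src_ip": "10.0.0.5"},
    {"timestamp": "2024-01-01T11:30:00", "user": "alice", "action": "logout"},
]


def profile(events):
    """Summarize each field: share of events carrying it, cardinality, top values."""
    summary = {}
    fields = sorted({key for event in events for key in event})
    for field in fields:
        values = [event[field] for event in events if field in event]
        counts = Counter(values)
        summary[field] = {
            "coverage": len(values) / len(events),  # fraction of events with this field
            "distinct": len(counts),                # number of unique values
            "top": counts.most_common(3),           # most frequent values
        }
    return summary


def time_range(events, field="timestamp"):
    """Return the earliest and latest timestamps found in the events."""
    stamps = [datetime.fromisoformat(event[field]) for event in events]
    return min(stamps), max(stamps)


report = profile(events)
first_seen, last_seen = time_range(events)

for field, stats in report.items():
    print(field, stats)
print("time range:", first_seen, "to", last_seen)
```

Even on a toy sample like this, the summary surfaces the kind of detail that matters later, such as a field (`src_ip` here) that is not present in every event.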

Meet the Panda

I have recently started (formally) studying Data Science as part of a professional program at MIT. I highly recommend it! ❤️

As expected, the very first lab involves programming in Python and leveraging some of its many libraries for exploring data.

Turns out, the work we do when faced with new log data is not far from what data engineers or data analysts do when dealing with a new dataset.

Ultimately, security-relevant logs represent a fraction of the vast array of data relevant to businesses. Nevertheless, the practices employed for acquiring and consuming that data are becoming remarkably consistent.

When it comes to numerical data in Python, NumPy is the fundamental package for scientific computing, alongside others such as SciPy.
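As a rough illustration of what that looks like in practice, the sketch below uses NumPy to summarize a small numeric series. The data (failed logins per hour over one day) and the variable names are assumptions invented for this example, not taken from any real log source.

```python
import numpy as np

# Hypothetical count of failed logins per hour over one day (made-up values).
failed_logins = np.array([3, 2, 4, 3, 5, 2, 3, 4, 6, 5, 4, 3,
                          4, 5, 3, 2, 48, 4, 3, 5, 4, 3, 2, 3])

mean = failed_logins.mean()              # sensitive to the single spike
median = np.median(failed_logins)        # robust to the single spike
p95 = np.percentile(failed_logins, 95)   # 95th percentile (linear interpolation)

# Hour with the highest count, and all hours strictly above the 95th percentile.
peak_hour = int(failed_logins.argmax())
busy_hours = np.flatnonzero(failed_logins > p95)

print("mean:", round(float(mean), 2))
print("median:", float(median))
print("peak hour:", peak_hour)
print("hours above p95:", busy_hours.tolist())
```

Comparing the mean with the median is a quick way to notice that a handful of extreme values dominate a series, which is often the first hint that something in the logs deserves a closer look.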

Published in CodeX

Written by Alex Teixeira

I design and build detection and SIEM/EDR/XDR content for Enterprise #SecOps teams #DetectionEngineering http://opstune.com
