Pre-processing Your Data For Machine Learning using Splunk

PCA and Standard Scaler

PCA (Principal Component Analysis) is a machine learning method that maps multi-dimensional data into a much simpler, lower-dimensional space. For example, we might have seven different variables within an IoT solution, which would be difficult to visualise in seven dimensions. With PCA, we can reduce this to two or three dimensions, which makes the data much easier to visualise.

Now, let’s use PCA to prepare our IoT data for clustering:

We then create a new experiment:

Next, we load our dataset [here] with:

| inputlookup track_day.csv

We can then see that we have data relating to “batteryVoltage”, “engineCoolantTemperature”, “engineSpeed”, “lateralGForce”…
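As a rough sketch of where this is heading, the search below scales the fields and then applies PCA. It assumes the Splunk Machine Learning Toolkit is installed, and it uses only the four fields named above; the remaining fields in the lookup are not listed here, so they are left out. The choice of k=2 is an illustrative assumption, picked so that the result can be plotted in two dimensions. StandardScaler writes its output to new fields prefixed with SS_, and PCA writes its components to PC_1 and PC_2:

```spl
| inputlookup track_day.csv
| fit StandardScaler batteryVoltage engineCoolantTemperature engineSpeed lateralGForce
| fit PCA "SS_*" k=2
| table PC_1 PC_2
```

Scaling first matters because PCA is sensitive to the range of each variable: without it, a field with large values such as engineSpeed would probably dominate the principal components.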

Prof Bill Buchanan OBE FRSE
ASecuritySite: When Bob Met Alice

Professor of Cryptography. Serial innovator. Believer in fairness, justice & freedom. Based in Edinburgh. Old World Breaker. New World Creator. Building trust.