Insightful analytics is the secret sauce of today’s data-driven, hyperconnected economy: it helps any kind of trade run effectively, no matter the business size, vertical or geography. The surging demand for, and abundance of, data within our reach stems from the digitalisation of almost everything around us. Digitalisation is the essence of digital transformation, the evolution that traditional brick-and-mortar enterprises are currently undergoing. The result is an ongoing demand for new and innovative digital solutions.
When it comes to blockchain technology there is not a lot more we can add to the myriad of articles on…
When I was working in the ship repair industry we had a saying: if we could make a ship sail with half a propeller or even one fin, ship-owners would use it immediately. The point was that shipping is an extremely cost-sensitive business, and ship-owners carve operational margins out of ruthless cost cutting. There are, of course, periods of high margins as the market goes up, and lots of money is made during those times. …
A few days ago we open-sourced our platform for anomaly detection in Python — you can read more about that here.
This post is focused on one feature of our framework: integration with scikit-learn. Sklearn is the flagship ML toolbox for Python, and it is growing by the day. To ignore its models and design patterns would be to reinvent the wheel.
So we have added a small example of how you can bring the full strength of scikit-learn to bear on your detection problem while still using dsio. Consider the following file, which you can find in the examples folder:
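The example file itself is not reproduced in this excerpt. As a rough sketch of the general pattern it illustrates (the detector choice and data here are illustrative assumptions, not the actual file), a scikit-learn outlier model can be fitted on "normal" behaviour and then asked to score new stream observations:

```python
# Illustrative sketch only: IsolationForest stands in for whatever
# detector the actual dsio example uses.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
train = rng.normal(0, 1, size=(500, 1))          # "normal" behaviour
stream = np.vstack([rng.normal(0, 1, size=(10, 1)),
                    [[8.0]]])                    # last point is far off

detector = IsolationForest(random_state=42).fit(train)
labels = detector.predict(stream)                # +1 = inlier, -1 = outlier
print(labels[-1])
```

Any estimator following the sklearn fit/predict convention can be slotted into this pattern, which is precisely what makes the integration attractive.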
We are proud to launch the very first version of our open-source project for Anomaly Detection and Behavioural Profiling on data-streams, datastream.io (dsio on github).
We have a long roadmap ahead of us, but, release often and release early, as they say. So here it is — a minimal viable full-stack Python anomaly detector:
pip install -e git+https://github.com/MentatInnovations/datastream.io#egg=dsio
The purpose of the project is to perform the following functions:
Robust Anomaly Detection at Scale
One of the core competencies of the Mentat team is anomaly detection, in particular unsupervised anomaly detection on streaming data.
Anomaly detection (also known as outlier detection) is the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset. These events may indicate network intrusions, industrial component failures, financial fraud or health problems.
Classifying anomalies correctly and efficiently determines the usability and effectiveness of many downstream systems. Anomaly detection is a horizontal technology, core to data-driven methodologies across domains.
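To make the idea concrete, here is a minimal illustration (a toy sketch, not dsio's actual method) of the simplest possible detector, a z-score rule that flags points far from the mean of a univariate series:

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [x for x in values if abs(x - mu) > threshold * sigma]

readings = [9.8, 10.1, 10.0, 9.9, 10.2, 25.0, 10.0]
print(zscore_outliers(readings))  # the spike at 25.0 stands out
```

Real detectors are more robust than this (a large outlier inflates the mean and standard deviation it is judged against), but the shape of the problem is the same: score each observation against a model of expected behaviour.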
Assume you want to maintain constant pressure in a certain container. You control a valve that can pump more air into it or release some. It sounds like a simple task: if the pressure is below the target, pump more air in; if it is above the target, release some. That should do it. Months pass and your valve accumulates wear and tear, but you notice nothing, because your controller is doing its job and keeping the pressure constant, even if that means pumping a little more air each week to offset a small leak…
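A toy simulation makes the trap visible (this is an illustrative sketch with made-up constants, not a real plant model): a proportional controller holds the pressure near target while a slowly growing leak drains air, so the degradation shows up in the actuation signal rather than in the controlled variable.

```python
# Toy model: each week the leak worsens, the controller compensates.
TARGET, GAIN = 100.0, 0.5

pressure, leak = 100.0, 0.0
weekly_effort = []
for week in range(10):
    leak += 0.2                           # valve wear: leak grows each week
    pressure -= leak                      # air escapes
    effort = GAIN * (TARGET - pressure)   # proportional correction
    pressure += effort                    # pump the correction back in
    weekly_effort.append(round(effort, 2))

print(weekly_effort)  # effort keeps climbing while pressure looks fine
```

The lesson for anomaly detection: monitoring only the controlled output hides the fault; the anomaly lives in the steadily rising control effort.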
The fusion of the IoT with Artificial Intelligence is driving a new Industrial Revolution. A key ingredient in this transformation is autonomy: as learning machines gain sophistication and experience, they reach a point where they can be trusted to take decisions on their own, without direct or continuous human control. A great example of this can be found in the form of drones.
At Mentat we view drones as an agile flying platform for advanced sensors, hence the “Flying IoT”.
Consider the example of a remote wind farm that requires regular visual inspection of the wind turbines to detect and…
As promised in our previous blog post (Flavours of Streaming Processing), in this post we report the performance of our weapon of choice when it comes to classification on data streams: the Streaming Random Forest (SRF).
Random Forests were introduced by Leo Breiman in 2001. They combine two of the most powerful ideas in classification, decision trees and bagging, to represent a decision rule as a kind of majority vote over a large number of different decision trees that are generated probabilistically.
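The majority-vote structure is visible directly in scikit-learn, where each tree in the forest is grown on a bootstrap sample and predictions aggregate the trees' votes (a small sketch on synthetic data, not related to SRF itself):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Each estimator is an independent decision tree grown on a bootstrap
# sample of the training data; the forest aggregates their votes.
votes = np.array([tree.predict(X[:1])[0] for tree in forest.estimators_])
print(votes.mean())               # fraction of trees voting for class 1
print(forest.predict(X[:1])[0])   # the forest's aggregated decision
```

Because every tree sees a full bootstrap resample of the dataset, training cost grows with dataset size for each of the many trees, which is exactly the scaling problem the next paragraph raises.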
However, Random Forests scale poorly with the size of the dataset. This makes them impractical in streaming contexts…
In his excellent recent blog post “Streaming 101”, Tyler Akidau made a great contribution to the streaming community by teasing apart notions that have been conflated by standard, albeit increasingly obsolete, practices. In that same spirit of paving the way for the streaming revolution, this blog post emphasises one observation, with a particular focus on machine learning: streaming does not necessarily mean approximate.
Let’s start with the simplest possible example, that of a linear sum: it can be easily computed in an incremental (exactly-once) processing manner:
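The original snippet is not shown in this excerpt; a minimal sketch of such an incremental sum looks like this:

```python
def incremental_sum(stream):
    """Exact one-pass sum: a single state variable, updated once per observation."""
    total = 0.0
    for x in stream:
        total += x  # exactly-once update; no buffering of past data
    return total

print(incremental_sum([1.0, 2.5, 3.5]))  # → 7.0
```

The state carried between observations is a single number, yet nothing is approximated: the streaming result equals the batch result.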
The answer in this case is exact: it agrees precisely…
A typical wind turbine is equipped with around 100 sensors, each producing 100 datapoints per second. And yet communication can be patchy, 3G rare. CCTV networks can only afford to send back to the data center a small fraction of the captured videostreams. Robots in a manufacturing plant can sense at millisecond granularity, but the plant’s SCADA infrastructure can handle perhaps 1 observation per minute (with 1 observation per 15 minutes being a typical sampling rate).
This is a very ‘2015’ problem. As the IoT gets more commoditized and high-frequency, one Moore’s law (cheaper sensors) goes against another (cheaper…