Event-Driven Machine Learning

Victor Nitu
Published in Phi Skills · Apr 8, 2020 · 7 min read

How do we predict the data that a future Machine Learning model will need? We are at the start of a new software revolution: Artificial Intelligence. Knowing how to design your data to extract as much insight as possible should always be a priority. Unfortunately, this is not always the case. We will explore how event-driven designs can improve data architecture and anticipate future needs.

Better Software with Machine Learning

Machine Learning has opened up promising new capabilities in software engineering. Software applications become personal assistants trained to answer predefined questions, like “Will this customer buy that product?”. Before, the only way to build software was to code human-defined rules for the application to answer those questions. In our example, a human had to tell the software how to calculate the probability that a customer would buy a given product. But with Machine Learning, instead of telling the software how to do the calculation, we can provide examples from the past, and the software will analyze this data to come up with rules on its own. This process is what we call “learning”.
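To make the contrast concrete, here is a minimal sketch of both approaches in Python. The features, the hand-written rule, and the training examples are all hypothetical, and the scikit-learn classifier simply stands in for any Machine Learning model:

```python
from sklearn.linear_model import LogisticRegression

# Hand-coded rule: a human decides how to estimate the probability.
def will_buy_rule_based(visits: int, price: float) -> float:
    # Hypothetical rule: frequent visitors buy cheap products.
    return 0.9 if visits > 3 and price < 50 else 0.1

# Machine Learning: we provide examples from the past and let the
# model derive its own rules from the data.
X = [[5, 20.0], [1, 300.0], [4, 45.0], [0, 80.0]]  # [visits, price]
y = [1, 0, 1, 0]                                   # 1 = bought, 0 = did not

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[3, 30.0]])[0][1])  # learned probability of buying
```

The rule-based function encodes a human guess; the trained model derives its own decision boundary from the examples it was given.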

The most crucial factor in Machine Learning is data. Its quality determines the quality of the rules the software will define. Without the required information, the software can’t find the correct answers. This constraint means that if we want to leverage the power of Machine Learning, we need to turn our attention to one question: How do we get the most data?

Event-Driven Data Collection

Data collection¹ is the first step in any data science project, and it is essential when working with Machine Learning. There are various ways of collecting data from one or multiple sources. When working with enterprises, it is not always easy to get direct access to sensitive but essential information. Too often, data scientists need to request data dumps. This practice might prevent accidents, but it drives a wedge between enterprise reality and ongoing projects. It prevents teams from having real-time access to the data and requires bureaucratic processes that create unnecessary work and frustration.

How do we fix this? With event-driven data collection. By listening to the events that the enterprise system emits, data scientists can rebuild the data they need in real time. Additionally, they can process their copy of the data without any risk of altering the enterprise data. The enterprise system can also filter out sensitive information or anonymize it. The only requirement: the enterprise system has to adopt an event-driven architecture or event sourcing.

Going back to our example, let’s imagine a data scientist wants to predict the probability that a customer will buy a product. To analyze the customer’s behavior, the data scientist receives an event each time a customer looks at a product, adds it to their basket, and finally buys it. By collecting those events in real time, the data scientist can correlate them with newly introduced features, recent incidents, or other circumstances that could affect the customer’s decision to buy a product.
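As a sketch of what such a collector could look like, here is a minimal, broker-agnostic version in Python. The event names and fields are hypothetical, and in a real system the handler would be subscribed to a message broker such as Kafka rather than fed from a local list:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Event:
    name: str          # e.g. "ProductViewed", "AddedToBasket", "ProductPurchased"
    customer_id: str
    product_id: str
    occurred_at: datetime

# Local dataset owned by the data scientist: collecting events never
# risks altering the enterprise system's own data.
collected: list[Event] = []

def handle(event: Event) -> None:
    # The producer side can filter or anonymize sensitive fields
    # before the event ever reaches this consumer.
    collected.append(event)

# Simulate an incoming stream of enterprise events.
stream = [
    Event("ProductViewed", "c-42", "p-7", datetime(2020, 4, 1, 9, 0)),
    Event("AddedToBasket", "c-42", "p-7", datetime(2020, 4, 1, 9, 2)),
    Event("ProductPurchased", "c-42", "p-7", datetime(2020, 4, 1, 9, 5)),
]
for event in stream:
    handle(event)

print(f"{len(collected)} events collected in real time")
```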

Event-Driven Data Exploration

To produce the most accurate answers, the data scientist has to filter and pre-process the right information. Data exploration² is a vital step in discovering the right combination of transformations to feed the Machine Learning model with optimal data. With the wide range of data visualization tools³ available, graphical representations of the data help us grasp the connections between the underlying pieces of information.

As I explained in my previous article, “Events are the Key to Master Data⁴”, using events helps the data scientist put data into context. If the exploration is limited to mutable data, data that is updated without keeping its previous values, the data scientist cannot explore by whom, when, and why data has been modified. The additional information events provide will highlight behaviors, patterns, habits, and more. Event-driven data exploration increases Machine Learning capabilities by providing more qualitative data.

In our example, if we limit our exploration to the list of sold products, it will be harder to understand why a customer purchased a product. By using the various events to chart the customer journey, we can analyze behaviors, patterns, and habits. For instance, the data scientist could use a bar chart to visualize the conversion rate, comparing the number of times customers looked at a product with the number of times the product was actually bought.
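Here is a sketch of that conversion-rate chart, assuming the collected events have already been loaded into a pandas DataFrame and plotted with matplotlib; the column names and sample events are hypothetical:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical collected events: one row per event.
events = pd.DataFrame({
    "event":      ["ProductViewed", "ProductViewed", "ProductPurchased",
                   "ProductViewed", "ProductViewed", "ProductPurchased",
                   "ProductViewed"],
    "product_id": ["p-1", "p-1", "p-1", "p-2", "p-2", "p-2", "p-3"],
})

# Count views and purchases per product, then derive the conversion rate.
views = events[events["event"] == "ProductViewed"].groupby("product_id").size()
purchases = events[events["event"] == "ProductPurchased"].groupby("product_id").size()
conversion = (purchases / views).fillna(0)  # products never bought convert at 0

ax = conversion.plot(kind="bar", title="Conversion rate by product")
ax.set_ylabel("conversion rate")
plt.show()
```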


Event-Driven Data Preparation

When we develop a product, we design data for humans, not machines. Unfortunately, humans and machines have different affinities with data. Our brains are remarkably good at analyzing pages of text but hopelessly inefficient at calculating massive matrix operations. For a machine, it is the exact opposite: computers excel at working with large quantities of numbers but struggle to understand literature. This means that to get the most out of Machine Learning, we need to transform our human-centric data.

Data preparation⁵ cleans, pre-processes, and specializes data for each specific Machine Learning model. Each model answers a different question, which means that the data prepared for one model will not fit another one’s needs. Depending on the amount of original data, those transformations might require a significant amount of computation. Producing data for several models can quickly become an arduous task.

Thanks to event-driven data collection, we can also achieve event-driven data preparation. Instead of transforming the entire enterprise dataset at once, we can incrementally produce specialized data by continuously processing incoming events. This spreads the workload over time and avoids the risk of overload.

Returning to our example, imagine that a product carries various pieces of information, such as a name and a description. For a machine, analyzing a product description is not an easy task. However, simply by comparing description lengths, we might discover that the longer the description, the more likely the customer is to buy the product. If that holds, it would be wise to feed our Machine Learning model the length of the description instead of the description itself.
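A minimal sketch of this incremental preparation step, assuming a hypothetical ProductDescribed event and a simple in-memory feature store:

```python
# Hypothetical feature store: product_id -> features ready for the model.
features: dict[str, dict] = {}

def prepare(event: dict) -> None:
    """Transform one incoming event into model-ready features.

    Processing events one at a time spreads the preparation workload
    over time instead of transforming the whole dataset at once.
    """
    if event["name"] == "ProductDescribed":
        # The model receives the length of the description,
        # not the raw text a machine struggles to understand.
        features[event["product_id"]] = {
            "description_length": len(event["description"]),
        }

prepare({"name": "ProductDescribed", "product_id": "p-7",
         "description": "A sturdy, hand-crafted oak desk with cable routing."})
print(features["p-7"])  # the model will see a number, not raw text
```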

Event-Driven Model Training

In Machine Learning, the training phase⁶ is when the model learns how to answer the question from the data. Analyzing the data and deriving the rules is usually a long process: the more data there is and the more complex the model, the more time it takes. It is not unusual for training to take hours or even days. In cases where the data is fairly static and evolves slowly, like real estate prices, we can train models once in a while. In many cases, however, the data is dynamic and continuously evolving.

Coming up with strategies to train the models at the right time with the relevant data is crucial: it ensures our Machine Learning models give accurate answers at any given time. A simple strategy is to retrain the model at a fixed interval on the entire dataset. Another approach is to retrain the model every time a certain amount of new data has accumulated. Data often evolves in unpredictable ways, so it is wise to choose a strategy that can adapt to data changes.

An event-driven approach leverages the strength of events to trigger a new training phase on specific data changes. Thanks to the context they give about the data, events provide the best signals for determining the optimal time to train the model. An event-driven approach offers the most flexible strategy and enables updates that are as close to real time as possible.

When we look at our example, we know that products follow trends. This means that if our Machine Learning model learns from last year’s sales data, it will most likely be terrible at predicting this year’s sales. A good strategy would be to group the data into fixed-size batches and retrain the model on the latest ten batches every time a new batch completes.
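Here is a sketch of that sliding-window strategy; the batch size, the event shape, and the train function are placeholders:

```python
from collections import deque

BATCH_SIZE = 3   # hypothetical events per batch (tiny, for illustration)
WINDOW = 10      # retrain on the latest ten batches

batches: deque = deque(maxlen=WINDOW)   # oldest batches fall out automatically
current_batch: list = []

def train(data: list) -> None:
    # Placeholder for fitting the actual Machine Learning model.
    print(f"training on {len(data)} examples")

def on_sale_event(example: dict) -> None:
    """Accumulate incoming events and retrain whenever a batch completes."""
    current_batch.append(example)
    if len(current_batch) == BATCH_SIZE:
        batches.append(list(current_batch))
        current_batch.clear()
        # Flatten the sliding window so last year's trends fade out over time.
        window = [ex for batch in batches for ex in batch]
        train(window)

# Simulate a stream of sale events; training triggers on every full batch.
for i in range(7):
    on_sale_event({"product_id": f"p-{i}", "sold": True})
```

The deque with a fixed maxlen is what implements the “latest ten batches” rule: appending an eleventh batch silently discards the oldest one.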

Conclusion

Machine Learning is a fantastic tool that will help revolutionize the software industry. Using events to achieve event-driven Machine Learning will make this revolution even more impactful and unlock incredibly helpful new possibilities.

  • Event-driven data collection and preparation ensure that data scientists always obtain the data they need, when they need it.
  • Event-driven data exploration puts the context of the data under the spotlight to increase knowledge and comprehension.
  • Event-driven model training keeps Machine Learning models continuously synchronized with the real world.

In future articles, we will dive deeper into these event-driven concepts with more concrete and technical examples. We will learn how to unlock the potential of event-driven Machine Learning by implementing event-driven data engineering.

[1] Nemanja Jovancic (2019). 5 Data Collection Methods for Obtaining Quantitative and Qualitative Data — https://www.leadquizzes.com/blog/data-collection-methods/

[2] Roberto Verdelli (2019). Data Science Process: Data Exploration — https://www.techedgegroup.com/blog/data-science-process-data-exploration

[3] Cameron Chapman (2019). A Complete Overview of the Best Data Visualization Tools — https://www.toptal.com/designers/data-visualization/data-visualization-tools

[4] Victor Nitu (2020). Events are the Key to Master Data — https://medium.com/phi-skills/events-are-the-key-to-master-data-11984332e435

[5] Import.io (2018). 10 Best Practices in Data Preparation — https://www.import.io/post/10-best-practices-data-preparation/

[6] Amazon Web Services (2020). Training ML Models — https://docs.aws.amazon.com/machine-learning/latest/dg/training-ml-models.html
