How to build a Predictive Maintenance application with Dash and Vaex

Jovan Veljanoski · Published in Plotly · Nov 4, 2021 · 15 min read

authors: Jovan Veljanoski & Maarten Breddels

📌 Want a video walk-through of Dash and Vaex? Check out this recorded webinar.

Reactive was, predictive will be

Most companies and organizations have figured out that collecting and leveraging data can help their business. Tracking relevant metrics, KPIs, and all kinds of statistics related to past actions and outcomes can help one better understand one's domain and position, and hopefully make better adjustments for the future. This is typically referred to as reactive analytics: the analysis processes the outcomes of past events, gathers the relevant information, and proposes a strategy for the future.

There is a major drawback with this approach, however, as it creates a sort of “blind spot” in the business process. By the time the data from a past event is analyzed and a strategy for the next segment decided, so much time has passed that the strategy either arrives too late for the very next segment or is already out-of-date due to changed circumstances.

Companies and organizations which are more mature in the data domain are trying to take the next step and build various statistical or machine learning models that will help them predict the likely outcomes of an event — hence the term predictive analytics. While more involved to put into practice, this is a continuous and often automated approach that always uses the latest relevant data, and it allows one to anticipate events before they occur and act accordingly.

Can your dashboard take me where I want to go?

Dashboards are ubiquitous and are often the primary method for communicating data insights between data teams and stakeholders. Tools like Tableau, PowerBI, Google Data Studio, and others have dominated this space for years. While these tools allow users to make pretty-looking charts relatively easily, they cater to the reactive approach to data analytics. One can choose from a fixed set of visualizations and display statistics that are calculated or summarized via SQL commands. Working with anything beyond clean tabular data is questionable at best.

So this raises the question: how do we “fine-tune” a dashboard to the specific needs of a business and the problem that business is trying to solve? How do we move beyond showing bar charts, pie charts, and trend lines? This is not a matter of “status” but of need. Businesses are different. They have different data, they operate differently and in different domains, and they care about different things. The need for data analytics specialized for each business, analytics that leverage the right data, tooling, and domain knowledge, is very real and immensely valuable.

Enter Dash

Dash is a Python Open Source library for creating web-based applications. With it, one can create stunningly beautiful dashboards and analytics applications, fully customizable to the particular use-case or problem one is trying to solve. A quick look at the Dash Gallery will give you a good idea of how versatile this solution is. And you don’t need to be a React expert to create such stunning apps!

The best part about using Dash as a dashboard solution is that it allows you to leverage the entire Python ecosystem! The Python / PyData ecosystem is abundant with a multitude of tools for sophisticated data analysis for virtually any domain, tools for processing and transformation of data big and small, libraries for machine learning and model interpretation, APIs for communication with various services, cloud storage, and more. All this can be harnessed and put to use to create web applications specific to the problems you care about.

A predictive maintenance example

To illustrate the points made above, we are going to use data from the NASA Turbofan Degradation Simulation to build an application that tells us when a jet engine is expected to fail. The data is split into two parts: a training set and a test set. In the training set, a set of engines are run until failure. In the test set, a different set of engines of the same type are run for a number of cycles, with the goal being to estimate the number of cycles before each engine breaks. The data itself is composed of a number of sensor readings for each cycle a particular engine is operating.

We will be using a few popular Python Open Source libraries to build our solution, namely:

  • Vaex: for data exploration, data preprocessing & transformation, and for building the model “pipeline”;
  • Keras: for building the predictive model (an LSTM neural network);
  • Shap: for interpreting the predictions of the model;
  • Dash: for putting it all together into an intuitive and informative application!

Note that the NASA Turbofan Degradation Simulation dataset has been public for over a decade now, and there are many solutions to this problem floating around the internet. Thus, the purpose of this article is not to provide an optimal solution to this problem. Rather, it is to show how one can combine a few quite powerful libraries with Dash to create a specialized web application tailored to the particular problem at hand. This article will not touch on every little detail of the solution, but the full project can be found here and the deployed app can be found here.

Let’s get started!

Data Preparation

The NASA Turbofan Degradation Simulation dataset comes in a bit of an unusual format, and we need to do some work to transform it into a regular supervised learning problem. The training set does not have an explicit target column. Since each engine was run until it broke, we can infer the target variable, which we call RUL, i.e. remaining useful life, directly from the total number of cycles. For the test set, the final RUL is provided in a separate file, which we would like to join to the main feature set for convenience. These transformations can be easily done with this function:

The main data preparation function.
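A minimal sketch of what this function might look like, assuming the standard CMAPSS file layout; the column names, file paths, and helper name are assumptions, not the repo's exact code:

```python
import vaex
import pandas as pd

def prepare_data(data_path, rul_path=None):
    # Standard CMAPSS layout: engine id, cycle, 3 settings, 21 sensors (assumed).
    columns = (['unit_number', 'time_in_cycles']
               + [f'setting_{i}' for i in range(1, 4)]
               + [f'sensor_{i}' for i in range(1, 22)])
    df = vaex.from_pandas(pd.read_csv(data_path, sep=r'\s+', header=None, names=columns))

    # The maximum number of cycles reached by each engine
    df_max = df.groupby('unit_number', agg={'max_cycles': vaex.agg.max('time_in_cycles')})
    df = df.join(df_max, on='unit_number')

    if rul_path is None:
        # Training set: each engine ran until failure, so the RUL at any cycle
        # is the number of cycles remaining before the last one.
        df['RUL'] = df.max_cycles - df.time_in_cycles
    else:
        # Test set: the RUL at the last observed cycle comes from a separate file,
        # one row per engine, in engine order.
        rul = pd.read_csv(rul_path, sep=r'\s+', header=None, names=['final_RUL'])
        rul['unit_number'] = rul.index + 1
        df = df.join(vaex.from_pandas(rul), on='unit_number')
        df['RUL'] = df.final_RUL + df.max_cycles - df.time_in_cycles
    return df

df = prepare_data('train_FD001.txt')  # hypothetical file name
```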

Note that we are using Vaex to do the data manipulation. The full data preparation notebook can be found in the GitHub repo.

Predictive Modelling

We are now ready to start building our model! The idea is to use a Recurrent Neural Network (RNN) as the basis of our model, in order to better capture the temporal aspect of the sequence of sensor readings for each engine.

Before we get there, however, we need to prepare and preprocess the data so that the RNN can make the most of it. In this example, we will not spend any time on sophisticated data exploration. You are of course very welcome to dig into the data and see if you can come up with extra features or transformations that improve the model performance.

Let's start by defining a “control” sample, otherwise known as a validation set, drawn from the original full training set. Such a sample is crucial for training most models, as it gives us a sense of how well the model is learning and whether it is generalizing to unseen data rather than simply memorizing the training set. In our case, the validation set consists of 5 engines randomly picked out of the full training set:

With Vaex, these operations do not take extra memory!
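A sketch of how such a split might look; the seed and the unit column name are assumptions. Vaex filters are lazy views over the same underlying data, which is why neither selection copies anything:

```python
import numpy as np

np.random.seed(42)  # assumed seed, for reproducibility
validation_units = np.random.choice(df.unit_number.unique(), size=5, replace=False)

# Filtered Vaex DataFrames are views, so these selections cost no extra memory.
df_validation = df[df.unit_number.isin(validation_units)]
df_train = df[~df.unit_number.isin(validation_units)]
```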

Note that since we are using Vaex for manipulating the data, the creation of the train and validation sets in the last two lines of the code block above costs no extra memory! While the dataset we are using for this example is small enough that any DataFrame library could handle it, using Vaex ensures that your code will remain valid even if your data volume suddenly grows beyond what your RAM can hold, without upgrading your machine. This is not the main reason for choosing Vaex as the DataFrame library in this example, though! Stay tuned to find out how Vaex accelerates the development of ML solutions.

Next up, we would like to clean the data a bit, as not all sensors are useful for predicting the RUL. Namely, we would like to remove sensors that have constant or near-constant values throughout the runtime of an engine, and sensors whose values are too weakly correlated with the RUL, meaning that they have virtually no predictive power for our use-case. We would also like to drop one sensor out of any pair that is very highly correlated, since the two carry nearly the same information. Finally, the dataset contains columns that are not sensor readings, which we decide to remove as well.

Dropping unnecessary columns.
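Illustrative only: which sensors are constant, weakly correlated, or redundant follows from the correlation analysis in the notebook, so the exact lists below are assumptions:

```python
# Sensors with (near-)constant values, weak correlation with the RUL,
# or a very high correlation with another sensor, plus non-sensor columns (assumed lists).
to_drop = ['sensor_1', 'sensor_5', 'sensor_10', 'sensor_16', 'sensor_18', 'sensor_19',
           'setting_1', 'setting_2', 'setting_3']

df_train = df_train.drop(to_drop)
features = [col for col in df_train.get_column_names() if col.startswith('sensor_')]
```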

The readings of different sensors span different value ranges: some are in the tens, while others are in the thousands. We do not want our model to think that certain sensors are more important simply because their values are generally higher than those of other sensors. Hence we decide to scale the data, in this case with MinMax scaling between 0 and 1.

Scaling each sensor between 0 and 1.
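vaex-ml ships a MinMaxScaler that scales to [0, 1] by default and records itself in the DataFrame state; the output prefix below is an assumption:

```python
from vaex.ml import MinMaxScaler

scaler = MinMaxScaler(features=features, prefix='minmax_')
df_train = scaler.fit_transform(df_train)  # adds the scaled virtual columns
features = ['minmax_' + feature for feature in features]
```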

Before we start training the model, we need to reshape the data such that it is suitable for the RNN model. Traditional ML models, especially those APIs that follow scikit-learn’s fit/transform design pattern, expect the data in the form of a two-dimensional table or matrix with the shape (n_samples, n_features), where n_samples is the number of samples or rows in our data, and n_features is the number of features or columns in our table that we will use to build a model. Such an approach generally assumes temporal independence amongst the samples in the input matrix.

Since we do want to capture the temporal aspect of our data, which is the very reason to use a more complicated RNN model, we need to reshape the data into a three-dimensional matrix of the form (n_sequences, n_samples, n_features). A sequence, in this case, represents a series of consecutive readings from all of the sensors that we feed to the model. We build the set of sequences in a “sliding window” fashion: for our chosen window size of 50, for each engine, we create a set of sequences made of readings 1..50, then 2..51, then 3..52, and so on. Doing this programmatically in Vaex looks like this:

Reshaping the data making it suitable for the RNN model.
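The repo implements this step with Vaex so that it stays part of the DataFrame state; as a plain NumPy illustration of the same sliding-window logic:

```python
import numpy as np

def to_sequences(df, features, window=50):
    # Build overlapping windows of consecutive readings, per engine.
    sequences = []
    for unit in df.unit_number.unique():
        values = df[df.unit_number == unit][features].values  # (n_cycles, n_features)
        for start in range(len(values) - window + 1):
            sequences.append(values[start:start + window])
    return np.stack(sequences)  # (n_sequences, window, n_features)
```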

We are now ready to set up the RNN model, which is based primarily on two LSTM layers.

Schematic of the RNN model used in this example.
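A minimal Keras sketch of the architecture described above; the layer sizes and dropout rates are assumptions, not the repo's tuned values:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.LSTM(64, return_sequences=True, input_shape=(50, len(features))),
    layers.Dropout(0.2),
    layers.LSTM(32),
    layers.Dropout(0.2),
    layers.Dense(1),  # the predicted RUL for the sequence
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
```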

You are welcome to modify the hyper-parameters of this model, and see if you can improve the results, or use a different model architecture altogether!

We are now almost ready to start training our model! We just need to bring the validation sample “up to speed”. At this point you might be wondering: “do we go back and re-do all the steps for the validation sample?” or “do we need to re-implement our approach above in the form of a pipeline so the validation set and the future test data can be processed in the same way?”. The answer is: not exactly! The cool thing about a Vaex DataFrame is that it automatically creates a pipeline for you! A Vaex DataFrame “remembers” all the transformations done to the data, and we can use this record to transform any other DataFrame, provided it has a similar schema. Thus, to propagate the changes done on the training set to the validation set, all we need to do is:

Re-apply the transformations on the validation set.
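This uses Vaex's state mechanism, which holds every transformation applied so far:

```python
# Transfer the accumulated transformations from the training DataFrame
# to the validation DataFrame, which has the same schema.
state = df_train.state_get()
df_validation.state_set(state)
```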

The data cleaning, the scaling, and even the custom reshaping will be propagated to the validation set! In fact, this feature is the main reason why we chose Vaex as the DataFrame library in this example.

Passing the data from a Vaex DataFrame to the Keras model is quite simple: Vaex has a convenience method that creates a data generator fully compatible with Keras:

Create data generators from Vaex DataFrames as Keras model inputs.
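A sketch assuming vaex-ml's TensorFlow accessor; the exact argument names can vary between versions:

```python
train_generator = df_train.ml.tensorflow.to_keras_generator(
    features=features, target='RUL', batch_size=512)
validation_generator = df_validation.ml.tensorflow.to_keras_generator(
    features=features, target='RUL', batch_size=512)
```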

Training the model is then simply done by:

Training the Keras model.
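The generator loops over the data indefinitely, so Keras needs to be told how many steps make up one epoch; the batch size and epoch count are assumptions:

```python
model.fit(train_generator,
          steps_per_epoch=len(df_train) // 512,
          validation_data=validation_generator,
          validation_steps=len(df_validation) // 512,
          epochs=10)
```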

You can find more information about the model training phase and the relevant metrics in the GitHub repo. Once the model is trained, we can use it to get predictions for each sequence. A nice thing about Vaex is that it can create a scikit-learn-like transformer out of the trained model, with which we can add the predictions as a virtual column in the DataFrame. This is quite convenient, especially for model diagnostics and explainability purposes.

Obtain the model predictions as a virtual column in a Vaex DataFrame.
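A sketch assuming vaex-ml's Keras wrapper; the class location, name, and arguments are assumptions, so check the vaex-ml docs for your version:

```python
from vaex.ml.tensorflow import KerasModel  # assumed location of the wrapper

keras_model = KerasModel(features=features, prediction_name='RUL_prediction', model=model)
df_train = keras_model.transform(df_train)  # predictions appear as a virtual column
```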

At this point, we save the internal transformations pipeline of df_train in order to be able to propagate all of these steps to the validation and test sets as needed.

Serializing the entire transformation pipeline automatically created by the Vaex training DataFrame.
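In Vaex this is a single call; the file name is an assumption:

```python
# Serialize the full pipeline (dropping, scaling, sequencing, predictions) to JSON.
df_train.state_write('state_sequences.json')
```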

As it stands now, we have a predicted RUL for each sequence of sensor readings that we created. However, we would like to have the best estimate of the RUL for each engine. To get a robust estimate of the RUL per engine, we average the RUL predictions of the last 5 sequences, while adjusting for the temporal offset between them, i.e. knowing that in an ideal case the predicted RUL of adjacent sequences would differ by exactly 1 cycle. Finally, we save the training transformation pipeline that leads to this final DataFrame.

Obtaining the RUL estimate per engine, and saving the pipeline leading to the final DataFrame.
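A sketch of the aggregation logic; in the repo this is implemented as a DataFrame transformation so that the Vaex pipeline captures it as well:

```python
import numpy as np

def estimate_rul(sequence_predictions, n_last=5):
    last = np.asarray(sequence_predictions[-n_last:])
    offsets = np.arange(n_last - 1, -1, -1)  # older sequences predict that many cycles more
    return float(np.mean(last - offsets))    # align all predictions to the final cycle

df_train.state_write('state_final.json')  # the pipeline leading to the final DataFrame
```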

At this point, we can truly appreciate the power of the internal transformations pipeline of Vaex. By simply doing exploratory-style work, we had Vaex automatically create a production-ready pipeline that captures all the data transformations and DataFrame manipulations we did. In the first stage, we dropped some columns, scaled the data, created sequences via a window function, and obtained the RUL predictions from the trained model. All of these steps are trivially propagated to a fully untouched test or production DataFrame like so:

A Vaex DataFrame contains all of the data transformations in one place, including the predictions
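Assuming the state file saved earlier, applying the first-stage pipeline to a raw test DataFrame could look like this:

```python
df_test = prepare_data('test_FD001.txt', rul_path='RUL_FD001.txt')  # hypothetical file names
df_test.state_load('state_sequences.json')  # re-applies every step, incl. the predictions
print(df_test.get_column_names())  # original columns + scaled, sequenced, predicted ones
```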

You can see that the transformed DataFrame contains not only the original columns but also all the transformations we applied later on, such as the scaling, the reshaping into sequences, and the RUL predictions per sequence. This is incredibly powerful, not only for model diagnostics and data transparency but also for building informative dashboards that might need data from each step of the process.

Obtaining the very final results for the test data is just as easy. Notice that the Vaex internal transformations pipeline is able to capture and propagate custom transformations of the DataFrame itself, even if they are defined outside of Vaex.

Obtaining the final RUL predictions of the test set.
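With the final pipeline, the per-engine estimates come along for free; the column name below is an assumption:

```python
df_test.state_load('state_final.json')  # includes the per-engine aggregation step
print(df_test[['unit_number', 'RUL_estimate']])  # 'RUL_estimate' is an assumed name
```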

So how does our model do on the test set? We can quote a mean absolute error of 3.3 cycles, but perhaps this figure is more informative:

The model performance on the test set.

The figure shows that the model performs very well on engines that will fail within 40 cycles or so, and is not so accurate for engines that have longer RULs. This makes sense — we care most about having accurate predictions for engines that are likely to fail soon. The exact RUL for engines that will stay operational for longer periods of time is harder to predict.

Dashing it up!

No matter how well an ML model performs, it is useless until it is put in production, and its predictions are reported to the relevant stakeholders. This is an area where Dash shines. We will use Dash to create an informative dashboard that shows not only the estimated RULs for each engine in the test set but also the likely reasons behind those estimates — which sensors contributed most towards the estimates as well as which sensor readings were the ones to prompt the model to make the estimate. We obtain the explanations via the SHAP library, which can help us determine whether the model is actually learning the right aspects of the problem.

Dash exposes various UI components in a dashboard. These components can be all kinds of figures, tables, buttons, sliders, interactive menus, and other custom components. All of them can be monitored for user interactions like clicking, dragging, zooming, etc. When such an interaction is detected, it triggers a callback function that updates one or more components in the dashboard.

We will not explain in great detail how to build the whole application, since everything is available on GitHub, but we will go over the core concepts instead. The dashboard looks like this:

The predictive maintenance dashboard.

Let us discuss how to create a single interactive element, in this case the figure at the bottom. The figure shows which of the last 50 sensor readings had the most impact on the prediction; the readings with the most impact are shaded in a darker blue. The sensor can be chosen via the dropdown menu, but also by clicking on the figure at the top right or on the table at the top left. There is also a “Normalize” checkbox which specifies whether we want to display the original or the normalized values of the sensor readings.

A good practice to follow when building a Dash application is to split your Python functions into 3 groups, which allows us to have more easily readable and maintainable code:

  • compute functions: they calculate the required quantities we want to display on the dashboard. The input arguments will often come from some user interaction.
  • plotting functions: they typically take the outputs of the compute functions and produce a visualization.
  • callback functions: they monitor and are triggered by certain events like user interactions, after which they simply fire one or more compute functions and then plotting functions, and use the results to update the dashboard.

Let's start with a compute function. For the figure we want to create, for a given sensor of a particular engine, we need to extract the relevant SHAP values, as well as the sensor readings themselves, and their associated “time stamps”:
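A hypothetical compute function following this pattern; how the SHAP values are actually stored and passed around differs in the repo:

```python
import numpy as np

def compute_sensor_impact(df, shap_data, sensor, window=50):
    # shap_data: a dict holding the selected engine id and its per-sensor
    # SHAP values (a hypothetical layout).
    selection = df[df.unit_number == shap_data['engine_id']]
    readings = selection[sensor].values[-window:]          # the last 50 readings
    cycles = selection['time_in_cycles'].values[-window:]  # their "time stamps"
    impact = np.asarray(shap_data['values'][sensor][-window:])  # matching SHAP values
    return cycles, readings, impact
```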

Then, we would like to make a visualization based on the outputs of the above function:

Create a Plotly figure highlighting the relevant sensor readings.
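A sketch of the plotting function: readings with a larger absolute SHAP value get a darker blue marker, as in the dashboard figure:

```python
import numpy as np
import plotly.graph_objects as go

def plot_sensor_impact(cycles, readings, impact, sensor_name):
    fig = go.Figure(go.Scatter(
        x=cycles, y=readings, mode='lines+markers',
        marker=dict(color=np.abs(impact), colorscale='Blues', size=9),
    ))
    fig.update_layout(title=f'Impact of {sensor_name} on the RUL estimate',
                      xaxis_title='Cycle', yaxis_title='Sensor reading')
    return fig
```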

The above functions are triggered by a callback function as shown below:

A Dash callback function that triggers the relevant compute & plotting functions, and updates the Dashboard.
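A sketch of the wiring; apart from 'explainer-figure', which the text below mentions, all component ids are assumptions:

```python
import dash
from dash.dependencies import Input, Output

app = dash.Dash(__name__)

@app.callback(
    Output('explainer-figure', 'figure'),
    Input('shap-store', 'data'),           # SHAP values for the selected engine
    Input('sensor-dropdown', 'value'),     # the chosen sensor
    Input('normalize-checkbox', 'value'),  # show original or normalized readings
)
def update_explainer_figure(shap_data, sensor, normalize):
    # A checked dcc.Checklist yields a non-empty (truthy) list.
    column = ('minmax_' + sensor) if normalize else sensor
    cycles, readings, impact = compute_sensor_impact(df_test, shap_data, column)
    return plot_sensor_impact(cycles, readings, impact, sensor)
```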

The callback function is prepended by a Dash decorator. The decorator specifies which values of which UI components are monitored. In the above example, the callback function will be triggered if the SHAP values change (because a different engine was selected), if a new sensor is selected (via the dropdown menu or a click in the feature importance plot), or if one (un)checks the “Normalize” checkbox. The Output in the decorator specifies which figure in the dashboard is to be updated; in this case, it is the figure in the explainer-figure component.

With this simple design pattern, one can easily create a custom and use-case specific dashboard.

Dash Enterprise

Plotly has poured recent efforts into easing the journey of technical and non-technical developers alike. Dash Enterprise intends to be the one-stop shop for data scientists to build, test, and deploy their analytical apps. You can code in Workspaces, the browser-based, VS Code-like IDE, and build your frontend with Dash Design Kit to eliminate the need to write thousands of custom CSS rules.

Creating a dashboard has truly become simpler than ever with our all-new Dashboard Engine tool and Pro Components library. Dashboard Engine allows a single senior Python developer to create a canvas on which analysts, executives, or even interns can create customized analytical views of their dataset without diving into the code. By the way, Dashboard Engine also uses Vaex for pre-processing, powering cross-filtering across millions of rows of data.

Pro Components are tools that deliver even more value by adding customizable and powerful building blocks: utilize the Tour component for simple-to-implement tutorial instructions overlaid onto your Dash app, or employ the Gantt chart component to add project management features to your Dash app. If you’re intrigued by these features, tune into Plotly’s next webinar to learn more! ➡️ https://tinyurl.com/fyahmez4

Power at your fingertips

We hope that this example showed you how you can create a very use-case-specific dashboard by combining a variety of open Python libraries. And here lies the true power of Dash: it can harness the entire Python ecosystem to create a meaningful dashboard tailored to virtually any problem, in any domain. Want to learn more about Dash and Vaex? Watch this recorded webinar.
