Predictive Maintenance — a dive into DB2 and Watson Studio

May 20, 2020

MarketsandMarkets forecasts that the global predictive maintenance market will grow from USD 3.0 billion in 2019 to USD 10.7 billion by 2024.

Predictive maintenance is a strategy that directly monitors the condition of equipment to detect when maintenance is needed, minimizing unplanned failures. It is one of the top applications of artificial intelligence and machine learning. Predictive maintenance is generally considered most applicable to manufacturing, where any equipment downtime is very costly. At the same time, servicing equipment unnecessarily is also expensive, because you may be paying a technician to inspect equipment that is working perfectly.

For this tutorial, you will need to download this hard drive data set, which includes the date, serial number, model, failure, and a number of SMART parameters. SMART, which stands for Self-Monitoring, Analysis and Reporting Technology, is a monitoring system built into most modern hard drives. It is designed to detect various reliability problems at an early stage, giving warning signs well before a drive fails. By the end of this lab, you will be able to identify the likelihood of failure for a specific hard drive model.
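To make "likelihood of failure for a specific model" concrete before any machine learning, here is a minimal Python sketch that computes, for each model, the share of rows marked as failed. It assumes the CSV has `model` and `failure` columns with `failure` recorded as 0 or 1, and that each row is one drive on one day. The function name, sample rows, and model names are invented placeholders, not values from the real data set.

```python
import csv
import io
from collections import defaultdict
from typing import Dict


def failure_rate_by_model(csv_text: str) -> Dict[str, float]:
    """Return, per model, the fraction of rows whose failure flag is 1.

    Assumes 'model' and 'failure' columns, with failure stored as 0 or 1.
    Each row is one drive on one day, so this is a per-drive-day rate.
    """
    drive_days: Dict[str, int] = defaultdict(int)  # rows seen per model
    failures: Dict[str, int] = defaultdict(int)    # rows with failure == 1
    for row in csv.DictReader(io.StringIO(csv_text)):
        drive_days[row["model"]] += 1
        failures[row["model"]] += int(row["failure"])
    return {model: failures[model] / drive_days[model] for model in drive_days}


# Hypothetical rows for illustration only; not taken from the real data set.
sample = """date,serial_number,model,failure,smart_1_raw
5/1/2020,SN-001,MODEL_A,0,0
5/1/2020,SN-002,MODEL_A,1,117
5/1/2020,SN-003,MODEL_B,0,0
5/1/2020,SN-004,MODEL_B,0,52
"""
print(failure_rate_by_model(sample))  # {'MODEL_A': 0.5, 'MODEL_B': 0.0}
```

This simple ratio ignores the SMART readings entirely; the AutoAI experiment later in the lab is what brings those into the prediction.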

The step-by-step instructions below show how to import that data into Db2, connect it to Watson Studio, and generate an automated machine learning pipeline, all on IBM Cloud.

1. IBM Db2: Create a Db2 instance

IBM® Db2® Warehouse on Cloud is a managed public cloud service. You can also set up IBM Db2 Warehouse on premises with your own hardware or in a private cloud. As a data warehouse, it includes features such as in-memory data processing and columnar tables for online analytical processing (OLAP). All of these deployment options share a common database engine, so your data workloads can be moved and optimized with ease.

To get started, **sign up or log in** to your IBM Cloud account at https://www.ibm.com/cloud

Once you are signed in, search for **Db2** in the search bar.

In the **Create** tab, select Dallas as your region. You can complete this lab with a **FREE Lite plan** for your DB2 instance.

Once the instance has been created, you will be taken automatically to your Db2 Getting Started page.

You will need to create a **Service Credential** to connect your Db2 instance to Watson Studio later in the lab. In the navigation panel, go to **Service credentials** and select **New Credential**.

You may rename the credential or leave it as is. Then click **Add**.

Now that you have created a service credential, you can connect an app or external consumer. Leave this page open in your browser and duplicate the tab.

Next, go to **Manage**, then click **Open Console** to open the Db2 console.

In the top-left navigation panel, click **Load**, then **Load Data**.

Use the **Load > Load Data** screen to load a single delimited text file (CSV) from your computer into the system.

In the **Source** stage, choose what type of data you have and whether it is on your local system or in online object storage. Drag **EoM_HardDriveData.csv** into the File Selection box, or browse your local computer to upload it.

In the **Target** stage, select where you want to put your data. It can be an existing table or one you create on the fly. You will select a schema to create your table in. Each Db2 instance has a **unique schema name, so your schema name will NOT be the same** as the one in the screenshot. Select the schema with a similar 8-character name (NOT one that starts with DB2 or SQL).

In the **Define** stage, you have the option to change the code page, reformat columns, separate columns, and prepare the database. Notice the error symbol on the right. To fix it, click the down arrow under **Date Format** and select the format M/D/YYYY.
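If you want to check the date column before loading, or preprocess the file yourself, the following Python sketch parses M/D/YYYY strings and rewrites them in ISO form (YYYY-MM-DD), a date string format Db2 accepts. The helper name `parse_mdy` is hypothetical; it only illustrates the format selected above.

```python
from datetime import date, datetime


def parse_mdy(value: str) -> date:
    """Parse an M/D/YYYY string such as '5/1/2020' into a date.

    When parsing, strptime treats the leading zero in %m and %d as optional,
    so '5/1/2020' and '05/01/2020' give the same result.
    """
    return datetime.strptime(value.strip(), "%m/%d/%Y").date()


print(parse_mdy("5/1/2020").isoformat())  # 2020-05-01
```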

In the **Finalize** stage, review your selections before you start loading your data. No changes are needed, so click **Begin Load**.

Db2 will take a few seconds to load your data. When it finishes, you have successfully loaded data into Db2. **If you receive an error, make sure you have selected the appropriate schema. You may need to retry using another schema listed in the Target stage.**

(OPTIONAL) TRY IT YOURSELF: Repeat the previous steps to append more hard drive data (e.g., from the **2013-HardDriveFailure** folder, try appending the **2013-05-01.csv** file).
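Appending works cleanly only when each new file has the same columns as the table. If you prefer to combine daily files locally before a single load, a minimal Python sketch could look like this. It assumes each file starts with a header row; the function name and the file contents in the example are made up for illustration.

```python
import csv
import io
from typing import Iterable, List, Optional


def append_csvs(texts: Iterable[str]) -> List[List[str]]:
    """Concatenate CSV contents that share one header row.

    The first header is kept once; a later file whose header differs raises
    ValueError, because its columns would not line up with the table.
    Empty inputs are skipped.
    """
    header: Optional[List[str]] = None
    rows: List[List[str]] = []
    for text in texts:
        reader = csv.reader(io.StringIO(text))
        file_header = next(reader, None)  # None when the input is empty
        if file_header is None:
            continue
        if header is None:
            header = file_header
            rows.append(header)
        elif file_header != header:
            raise ValueError(f"column mismatch: {file_header} != {header}")
        rows.extend(reader)  # data rows only; the header was already consumed
    return rows


# Hypothetical daily files for illustration only.
day1 = "date,model,failure\n5/1/2013,MODEL_A,0\n"
day2 = "date,model,failure\n5/2/2013,MODEL_A,1\n"
for row in append_csvs([day1, day2]):
    print(",".join(row))
```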

Click **View Table** once you have finished uploading your data.

As you can see in the table, the hard drive data set includes the date, serial number, model, failure, and a number of SMART parameters. Which parameters are available depends on the vendor and model of the hard drive, so some SMART columns contain 0s as placeholders for missing values where the vendor did not supply them.
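If you later work with this data outside Watson Studio, you may want to turn those placeholder 0s back into explicit missing values. The minimal Python sketch below takes the list of columns you know are unreported for a given vendor; the helper name and the column names in the example are assumptions patterned on `smart_1_raw`. Apply it selectively: some SMART counters legitimately read 0 on healthy drives, and converting every column would erase that signal.

```python
from typing import Any, Dict, Iterable


def zeros_to_missing(row: Dict[str, str], columns: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of row where the given SMART columns become floats,
    with a reading of 0 (or an empty or absent cell) replaced by None.
    Columns not listed are left exactly as they were."""
    cleaned: Dict[str, Any] = dict(row)
    for name in columns:
        raw = row.get(name, "")
        number = float(raw) if raw else 0.0
        cleaned[name] = None if number == 0 else number
    return cleaned


# Hypothetical row for illustration only.
row = {"model": "MODEL_A", "smart_1_raw": "0", "smart_5_raw": "8"}
print(zeros_to_missing(row, ["smart_1_raw", "smart_5_raw"]))
# {'model': 'MODEL_A', 'smart_1_raw': None, 'smart_5_raw': 8.0}
```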

Use the **Connections** screen to monitor all available connections to the database and which application is using each one. In the navigation panel, select **Connection Information**. You will need these connection details later in the lab.

Select **Without SSL** and leave this page open. Duplicate your tab once more so that you can refer back to this connection information later in the lab.
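Outside Watson Studio, the same details can be assembled into a connection string for IBM's `ibm_db` Python driver. The sketch below assumes the service credential JSON exposes `db`, `hostname`, `port`, `username`, and `password` keys; check those names against your own credential. The helper name is hypothetical and the values in the example are placeholders.

```python
from typing import Mapping, Union


def db2_connection_string(cred: Mapping[str, Union[str, int]], ssl: bool = False) -> str:
    """Build an ibm_db-style connection string from Db2 service credentials.

    The credential key names are assumptions. With ssl=False this matches the
    "Without SSL" connection information used in this lab; with ssl=True the
    credential must carry the SSL port, which differs from the plain port.
    """
    parts = {
        "DATABASE": cred["db"],
        "HOSTNAME": cred["hostname"],
        "PORT": cred["port"],
        "PROTOCOL": "TCPIP",
        "UID": cred["username"],
        "PWD": cred["password"],
    }
    if ssl:
        parts["SECURITY"] = "SSL"
    return "".join(f"{key}={value};" for key, value in parts.items())


# Placeholder values only; copy the real ones from your service credential.
cred = {"db": "BLUDB", "hostname": "db2.example.com", "port": 50000,
        "username": "abc12345", "password": "secret"}
conn_str = db2_connection_string(cred)
print(conn_str)
# With the ibm_db package installed, this string could be passed as
# ibm_db.connect(conn_str, "", "").
```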

2. IBM Watson Studio: Let’s Create a Project

Watson Studio provides you with the environment and tools to solve your business problems by collaboratively working with data. You can choose the tools you need to analyze and visualize data, to cleanse and shape data, or to create and train machine learning models.

To get started, **sign up or log in** to your Watson Studio account on IBM Cloud at https://www.ibm.com/cloud/watson-studio

Once you have signed in, click **Create a project**.

Name your project **Predictive Maintenance** and add a description.

Once you have created your project, you will see an overview on your project dashboard. Take a moment to explore the dashboard by going through each tab.

After exploring the tabs, select **Assets > Add to project**.

You will see a number of third-party and IBM services that you can connect to. Select **DB2**.

Enter your Db2 connection details using the **service credentials** and the connection information (see the screenshots below). Deselect the option stating that the port is configured to accept SSL connections, then click **Create**.

Use the **DB2 Service Credential** above.

Use the **DB2 Connection Information** above.

After your Db2 connection has been created, it will be listed under your data assets.

Next, connect to your data tables. Click **Add to project** and select **Connected Data**.

Name the connected data asset **Db2_PredictiveMaintenance** and click **Select source**.

Select **Db2_PredictiveMaintenance**, your **unique schema** name, and the **DB2_PREDICTIVEMAINTENANCE** table.

Enter Db2_PredictiveMaintenance as the name and add a description. Then click **Create**.

You have now successfully connected to your tables in Db2. Click the **Db2_PredictiveMaintenance** data asset to explore the details of this table.

From here, you can preview the first 1,000 rows. A very handy feature of Watson Studio is the **Refine** capability, which lets you clean, prepare, and transform your data set.

3. IBM Watson Studio: AutoAI (Watson Machine Learning)

AutoAI is a graphical tool in Watson Studio that automatically analyzes your data and generates candidate model pipelines customized for your predictive modeling problem. These model pipelines are created over time as AutoAI analyzes your dataset and discovers data transformations, algorithms, and parameter settings that work best for your problem setting. Results are displayed on a leaderboard, showing the automatically generated model pipelines ranked according to your problem optimization objective.

Back in the Predictive Maintenance Project, re-upload the **EoM_HardDriveData.csv** data.

To start an AutoAI project, select **Add to project** and **AutoAI.**

Name your AutoAI experiment **Hard Drive Failure AutoAI** and give it any description you like. You will need a Watson Machine Learning service instance, so click **Associate a Machine Learning service instance**.

Next, select **WatsonMachineLearning** from the dropdown menu.

Select **Reload**, and your WatsonMachineLearning instance will be created. Then click **Create**.

Click **Browse** and select the **EndofMonth.csv** file

Click **Select Asset** once you have chosen the appropriate file.

Under **Configure details**, select **failure** as your prediction column. Explore the **Experiment settings**, but leave everything at the defaults. Then click **Run experiment** to begin creating model pipelines.
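Because `failure` is a 0/1 column, this is a binary classification problem, and drive failures are usually rare in daily snapshots like this one. Checking the class balance before the run helps you read the leaderboard later. A minimal Python sketch using only the standard library; the function name and sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def label_counts(csv_text: str, label: str = "failure") -> Counter:
    """Count how many rows take each value of the prediction column."""
    return Counter(row[label] for row in csv.DictReader(io.StringIO(csv_text)))


# Hypothetical rows for illustration only.
sample = "model,failure\nMODEL_A,0\nMODEL_A,0\nMODEL_B,0\nMODEL_B,1\n"
counts = label_counts(sample)
print(counts["1"] / sum(counts.values()))  # share of failure rows: 0.25
```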

The **Relationship Map** infographic shows the pipelines as they are created for your data. How long this phase takes depends on the size of your data set; in this case, the experiment can take up to about **40 minutes** to complete. You can explore other parts of Watson Studio while the pipelines build.

Click **Swap view** on the right to see the **Progress map**, which shows more clearly the entire sequence of steps AutoAI is applying to your data.

Scroll down to see the highest-ranked pipelines displayed in a leaderboard. From the leaderboard, you can view more information about each pipeline and save selected pipelines after reviewing them. As you can see, **Pipeline 1** performed best, with an accuracy of 1.0 and the fastest build time.
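A perfect accuracy is worth a second look. When failures are rare, a model can score very high accuracy while missing every failure, which is why the confusion matrix and precision-recall curve in the model evaluation deserve attention. The Python sketch below computes the three metrics from confusion-matrix counts, treating `failure = 1` as the positive class; the function name and the counts in the example are hypothetical.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, and recall from confusion-matrix counts.

    The positive class is 'failure = 1'. Precision and recall are reported
    as 0.0 when their denominator is zero.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision, "recall": recall}


# Hypothetical counts: a model that never predicts failure on 1,000 drive-days
# containing 2 failures still scores 99.8% accuracy but catches no failures.
print(classification_metrics(tp=0, fp=0, fn=2, tn=998))
# {'accuracy': 0.998, 'precision': 0.0, 'recall': 0.0}
```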

The **Model Evaluation** section summarizes the pipeline, including its evaluation metrics, confusion matrix, and precision-recall curve. Select **Feature Importance** to see some of the indicators of failure.

**Feature importance** shows some of the indicators of hard drive failure. According to this AutoAI experiment, smart_1_raw (the raw value of SMART attribute 1, the read error rate) was the biggest indicator of failure. After exploring your model, **save the model**.

The model name is generated automatically; click **Save**.
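To sanity-check a feature the experiment flags as important, such as smart_1_raw, you can compare its average value on failed and healthy rows. This is a simple descriptive check, not how AutoAI computes feature importance. The function name and rows below are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, List, Mapping


def mean_by_label(rows: Iterable[Mapping[str, str]], feature: str = "smart_1_raw",
                  label: str = "failure") -> Dict[str, float]:
    """Average a numeric feature separately for each value of the label column."""
    groups: Dict[str, List[float]] = {}
    for row in rows:
        groups.setdefault(row[label], []).append(float(row[feature]))
    return {value: mean(readings) for value, readings in groups.items()}


# Hypothetical rows for illustration only.
rows = [
    {"failure": "0", "smart_1_raw": "10"},
    {"failure": "0", "smart_1_raw": "30"},
    {"failure": "1", "smart_1_raw": "200"},
]
print(mean_by_label(rows))  # {'0': 20.0, '1': 200.0}
```

A large gap suggests the feature separates the two classes, but with only a handful of failures the averages can be dominated by a few drives.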

4. IBM Watson Studio: Deployment

After you train and save a model, you create a deployment space so you can embed the model into your applications. Find your saved model under **Watson Machine Learning Models** on your project's assets page.

Select the **Deployments** tab and click **Add Deployment** to create a deployment space, which lets you use the model to make predictions.

Name the deployment space as **Predictive Maintenance Deployment Space** and click **Save**

Your deployment space will take a few seconds to be created. Once its status is ready, click your deployment space to see more details.

The **Overview** tab shows details about your deployment space.

In your deployment space, AutoAI has automatically created code snippets in Java, JavaScript, Python, and Scala. To interact with an AutoAI deployment programmatically, refer to the deployment syntax in the Watson Machine Learning Python client library documentation.

Alternatively, on the **Test** tab of the deployment details page, you can test your deployment by entering a JSON-formatted payload in the **input data** box. Enter **5/1/2020** for date and **Hitachi HDS5C3030** for model.
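The same test can be built in code. A minimal Python sketch using only the standard library; it assumes the `input_data` / `fields` / `values` payload layout used by Watson Machine Learning v4 scoring requests and bearer-token authentication with an IBM Cloud IAM access token. Compare both against the Python snippet generated in your deployment space, and make the field list match your model's input schema; the two fields here only mirror the test above. The helper names, URL, and token are placeholders.

```python
import json
import urllib.request
from typing import Any, Dict, List


def scoring_payload(fields: List[str], rows: List[List[Any]]) -> Dict[str, Any]:
    """Wrap column names and rows in an input_data payload (assumed v4 layout)."""
    return {"input_data": [{"fields": fields, "values": rows}]}


def scoring_request(scoring_url: str, iam_token: str,
                    payload: Dict[str, Any]) -> urllib.request.Request:
    """Prepare, but do not send, a POST request to a deployment's scoring URL."""
    return urllib.request.Request(
        scoring_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {iam_token}"},
        method="POST",
    )


payload = scoring_payload(["date", "model"], [["5/1/2020", "Hitachi HDS5C3030"]])
print(json.dumps(payload))  # JSON text in the shape the input data box expects

# Placeholder URL and token: copy the real scoring URL from your deployment
# and obtain an IAM access token for your IBM Cloud account.
request = scoring_request(
    "https://example.com/ml/v4/deployments/DEPLOYMENT_ID/predictions",
    "IAM_TOKEN",
    payload,
)
# urllib.request.urlopen(request) would send it and return the prediction JSON.
```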

Congratulations!

You have successfully finished the lab and deployed an automated predictive maintenance model with DB2 and Watson Studio.

IBM is helping companies across industries apply predictive maintenance to improve business performance. Check out these 5 IBM client examples demonstrating how predictive maintenance in the cloud is helping businesses from five different industries excel.

Feel free to connect with me on LinkedIn or email me directly at vincent.cheng@ibm.com if you have any questions about this lab.

Check out this tutorial by Parker Merritt where he goes over how to connect IBM Cloud Pak for Data to an Amazon Web Services S3 data source to prepare data for analysis, and generate a similar automated AI pipeline.

Disclaimer: All data collected from this tutorial was pulled directly from an external public source and used for informational purposes only.
