Predict the Likelihood of Heart Disease with SPSS Modeler Flows

Joseph Kent
IBM watsonx Assistant
Jun 7, 2018

Intro

In this tutorial, we are going to grab a dataset from the Watson Studio Community (a great resource for sample data, articles and tutorials). Then, we’ll build a model to predict the likelihood of heart disease in a sample of patients. The dataset contains fields such as age, gender, blood pressure, and other patient health metrics, which we’ll use as inputs to the model. This tutorial demonstrates how quickly and simply you can build models with SPSS Modeler Flows.

Getting Started

  • If you haven’t already, log in to Watson Studio at https://datascience.ibm.com/ and either create a new project or open an existing one (up to you!).
  • Once you have a project that you want to use, click the ‘Community’ link in the header bar and search for the dataset ‘UCI: Heart Disease - Cleveland’.
  • Click the dataset tile to open the preview and look at the field names and contents. The field names are a little cryptic, but the only one you need to remember is ‘NUM’, which we will use as our model target.
  • Click the back arrow, add the dataset to your chosen project, then open the project. Create a new Modeler Flow using ‘Add to project’ and make sure you use the ‘IBM SPSS Modeler’ runtime.
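If you’d like to sanity-check the target field outside Watson Studio, here is a minimal Python sketch that tallies the values of ‘NUM’ from a CSV copy of the dataset. It assumes you have a CSV file with a header row that includes a ‘NUM’ column; the function name is made up for illustration, and nothing here is part of Watson Studio or SPSS Modeler.

```python
import csv
from collections import Counter
from typing import TextIO


def target_distribution(f: TextIO, target: str = "NUM") -> Counter:
    """Count how often each value of the target field occurs.

    Assumes a CSV with a header row that includes the target column.
    Values are kept as stripped strings, so a missing-value marker
    (if the file has one) shows up as its own key instead of being dropped.
    """
    reader = csv.DictReader(f)
    if reader.fieldnames is None or target not in reader.fieldnames:
        raise ValueError(f"no {target!r} column in the header")
    return Counter(row[target].strip() for row in reader)
```

For example, `with open("heart_disease.csv", newline="") as f: print(target_distribution(f))`, where `heart_disease.csv` is a hypothetical local export, should show five keys, 0 to 4, if the file matches the preview.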

Add and Prepare Data

  • Add a ‘Data Asset’ import node onto the canvas and open the editor by double-clicking the node. Click ‘Change data asset’ to open the Asset Browser.

The Asset Browser shows the data assets available in your project. If you are using an existing project you’ve worked with before, you may see many assets listed; if you started a new project, you should see only the ‘Heart Disease’ dataset we just added.

  • Once you have selected the dataset, click ‘OK’ to close the Asset Browser, then click ‘Save’ in the node editor to commit the changes.
  • Next, add a ‘Type’ node from the ‘Field Operations’ category in the node palette. Drop it onto the canvas and connect the ‘Data Asset’ node to it by dragging a link from the ‘Data Asset’ node’s output port.

We are going to try to predict the ‘NUM’ field, which is the diagnosis of heart disease represented as five categories from 0 (no presence) to 4. Because the predicted values are categories, we are going to use a C5.0 decision tree model, which can predict a categorical target. However, by default the SPSS Modeler runtime treats a numeric field with values 0–4 as continuous rather than categorical, so we must use the Type node to change its measurement type (the measurement type describes how Modeler can use the field).

  • Open the ‘Type’ node editor then click the ‘Configure types’ link. This will open the types table panel where you can add fields which you want to modify.
  • Use the ‘Add Columns’ link to open the field picker and add the ‘NUM’ field.
  • In the ‘Measure’ column change the measurement type to ‘Ordinal’ and change the ‘Role’ to ‘Target’.

Ordinal means the categories have a natural order, such as school grades or, in this case, the severity of heart disease present in the patient. If the categories have no inherent order, the measurement type would be Nominal.
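To make the measurement-type idea concrete, here is a small Python sketch of what the Type node step does conceptually: assign a default measure to every field (numeric values default to Continuous), then override ‘NUM’ to Ordinal. The `default_measure` rule is a deliberately simplified stand-in, not Modeler’s actual inference logic, and all names here are hypothetical.

```python
from enum import Enum
from typing import Dict, Iterable, List, Mapping


class Measure(Enum):
    CONTINUOUS = "Continuous"
    NOMINAL = "Nominal"
    ORDINAL = "Ordinal"


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def default_measure(values: Iterable[str]) -> Measure:
    """Simplified stand-in for the runtime default: all-numeric values -> Continuous."""
    return Measure.CONTINUOUS if all(_is_number(v) for v in values) else Measure.NOMINAL


def configure_types(columns: Mapping[str, List[str]],
                    overrides: Mapping[str, Measure]) -> Dict[str, Measure]:
    """Infer a default measure per field, then apply explicit overrides (like the Type node)."""
    types = {name: default_measure(vals) for name, vals in columns.items()}
    types.update(overrides)
    return types


def ordinal_levels(values: Iterable[str]) -> List[str]:
    """Distinct categories of a numerically coded ordinal field, in their natural order."""
    return sorted(set(values), key=float)
```

Without the override, ‘NUM’ would come out as Continuous, which is exactly why the Type node is needed before a categorical model like C5.0.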

  • Click ‘OK’ on the Configure Types panel and then ‘Save’ on the node editor.

Build the model

  • Next, add a C5.0 model builder node from the node palette and connect the ‘Type’ node to it.
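For a feel of how a tree like C5.0 decides where to split, here is a sketch of information gain ratio, the split criterion used by Quinlan’s C4.5 family of algorithms, to which C5.0 belongs. It is an illustration of the criterion for categorical inputs only (real C5.0 also handles numeric thresholds, pruning, boosting and more), not Modeler’s implementation.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature: Sequence[Hashable], target: Sequence[Hashable]) -> float:
    """Information gain ratio of splitting `target` on a categorical `feature`."""
    if not target or len(feature) != len(target):
        raise ValueError("feature and target must be non-empty and of equal length")
    n = len(target)
    groups = defaultdict(list)
    for x, y in zip(feature, target):
        groups[x].append(y)
    # Expected entropy of the target after splitting on each feature value.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(target) - remainder
    # Split information penalises features with many distinct values.
    split_info = entropy(feature)
    return gain / split_info if split_info > 0 else 0.0
```

A field whose values line up perfectly with the target scores 1.0, while a field unrelated to the target scores close to 0; the tree greedily picks the highest-scoring split at each node.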

Modeling nodes in the SPSS Modeler runtime use the field roles inherited from upstream nodes in the flow. By default, all fields have the ‘Input’ role. Because we set the role of ‘NUM’ to ‘Target’ in the Type node, our model is already set up to predict ‘NUM’ using all the other fields as inputs.
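The role rule above can be sketched in a few lines of Python: every field without an explicit role counts as an input, and exactly one field is the target. The function name and the lowercase role strings are assumptions for illustration; the sketch also treats any other role (for example ‘none’) as excluded from the model.

```python
from typing import List, Mapping, Sequence, Tuple


def split_by_role(fields: Sequence[str],
                  roles: Mapping[str, str]) -> Tuple[List[str], str]:
    """Return (input fields, target field); fields with no explicit role default to 'input'."""
    role_of = {f: roles.get(f, "input") for f in fields}
    targets = [f for f in fields if role_of[f] == "target"]
    if len(targets) != 1:
        raise ValueError(f"expected exactly one target field, found {targets}")
    # Any role other than 'input' or 'target' keeps the field out of the model.
    inputs = [f for f in fields if role_of[f] == "input"]
    return inputs, targets[0]
```

With `roles = {"NUM": "target"}`, every other field in the dataset becomes an input, which mirrors why no extra configuration is needed on the C5.0 node.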

So all we need to do is run the model, either from the node’s context menu or with the run button in the canvas toolbar (which executes the entire flow).

Once the model has finished building, it will appear as a new node on the canvas, with a link from the Type node and a symbolic dashed link to the model build node. This new node is the model object, which scores data when you run the whole flow (which runs all nodes and branches in the flow) or when you use the context menu to run a node downstream of the model in the same branch.
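Conceptually, scoring means passing each record through the tree and attaching the prediction as a new column. The sketch below shows that idea with a toy tree whose fields and thresholds are made up (not learned from this dataset), and it uses only binary numeric splits, whereas C5.0 trees can also split categorical fields several ways. The output column name `predicted_NUM` is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping, Union


@dataclass
class Leaf:
    prediction: str


@dataclass
class Split:
    field: str
    threshold: float
    left: "Node"   # followed when the record's value <= threshold
    right: "Node"  # followed otherwise


Node = Union[Leaf, Split]


def predict(node: Node, record: Mapping[str, str]) -> str:
    """Walk from the root to a leaf, testing one field per split."""
    while isinstance(node, Split):
        value = float(record[node.field])
        node = node.left if value <= node.threshold else node.right
    return node.prediction


def score(tree: Node, records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return copies of the records with a hypothetical 'predicted_NUM' column added."""
    return [{**r, "predicted_NUM": predict(tree, r)} for r in records]
```

The original fields pass through unchanged, and only the prediction column is added, which is roughly what you will see in a Table node placed after the model.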

View the model and take a look at its details.

Of particular interest is the ‘Predictor Importance’ chart which shows which input fields have the most impact on the model predictions. Often you can remove less useful fields to improve the performance of your model.
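One simple way to act on the ‘Predictor Importance’ chart is a cut-off rule: keep inputs whose share of the total importance clears a threshold and drop the rest before rebuilding. This is a sketch of that idea, not a Modeler feature; you would read the importance values off the chart yourself, and the 5% default threshold is an arbitrary assumption.

```python
from typing import List, Mapping


def select_inputs(importance: Mapping[str, float], min_share: float = 0.05) -> List[str]:
    """Keep fields whose share of total importance is >= min_share, most important first."""
    total = sum(importance.values())
    if total <= 0:
        raise ValueError("importance values must sum to a positive number")
    kept = [f for f, v in importance.items() if v / total >= min_share]
    return sorted(kept, key=lambda f: importance[f], reverse=True)
```

After dropping fields, rebuild the model and compare its performance with the original before committing to the smaller set of inputs.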

Congratulations! You’ve built a model! You can now connect a ‘Table’ node from the ‘Output’ category of the palette to the model node to score your model and see the output predictions. Alternatively, try the ‘Analysis’ node to look at the performance of your model; make sure to open its node editor and select the analysis items you want to generate (the coincidence matrix is a good start).
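A coincidence matrix simply counts how often each actual category coincides with each predicted category. Here is a minimal Python sketch of that calculation, plus overall accuracy (the share of records on the diagonal). The row and column orientation (rows actual, columns predicted) is this sketch’s convention; check the Analysis node output for how Modeler lays it out.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def coincidence_matrix(actual: Sequence[Hashable],
                       predicted: Sequence[Hashable]) -> Tuple[List[Hashable], List[List[int]]]:
    """Return (labels, matrix) where matrix[i][j] counts actual=labels[i], predicted=labels[j]."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter(zip(actual, predicted))
    labels = sorted(set(actual) | set(predicted), key=str)
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


def accuracy(actual: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Share of records whose prediction matches the actual value."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need equal-length, non-empty sequences")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)
```

For an ordinal target like ‘NUM’, it is also worth noticing whether the off-diagonal errors sit next to the diagonal (for example, predicting 1 when the truth is 2) or far from it.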

After that, you can save your model with Watson Machine Learning.

And that’s it! There are many different model types for different applications, but the general process of building them is the same, so get building!

Joseph Kent:

I’ve been a UX designer for IBM for six years and previously worked on the SPSS suite of data science products. My particular areas of interest are graphical representations of data flows and simplifying data science through good interaction design.


UX Designer for Modeler and Spark flows in IBM Watson Studio. Formerly wielded crayons for various SPSS data science applications.