What Clinicians Should Know About AI, Part 2: The Process of AI Model Creation

Jenine John
8 min read · May 30, 2023

Photo by Desola Lanre-Ologun on Unsplash

Before getting into how AI models work, let’s start with a bird’s-eye view of how AI models are created. Understanding the process will help with understanding some of the pitfalls we’ll discuss later.

Who creates AI models?

Two of the major roles involved with machine learning are data scientist and machine learning engineer. There can be overlap between these roles. Data scientists generally focus on exploring data and creating machine learning models. Machine learning engineers generally focus on taking these early machine learning models and creating a system to implement them — this is called putting the model into production or deployment. There are others involved as well, such as data engineers, who focus on managing data and ensuring it is prepared properly for use.

There are also multiple executive-level positions that can be involved with implementing AI at healthcare organizations. Not all organizations have all these roles, and the responsibilities vary by organization. The alphabet soup includes:

  • Chief Information Officer (CIO) — This can also be called the Chief Digital Information Officer (CDIO). This person works to lead information technology (IT) efforts such that they’re in alignment with an organization’s strategic goals. Think: introducing telemedicine to the organization.
  • Chief Medical Information Officer (CMIO) — This can also be called the Chief Medical Informatics Officer (CMIO) or the Chief Clinical Information Officer (CCIO). This role is similar to the CIO but specific to healthcare organizations, and it’s often held by a physician who acts as a bridge between tech teams and clinical teams. This role may report to the Chief Executive Officer (CEO), Chief Operating Officer (COO), Chief Medical Officer (CMO), or CIO. Think: implementing clinical decision support systems.
  • Chief Technology Officer (CTO) — This role involves oversight of the technological infrastructure of an organization. The CTO reports to the CIO in some organizations, while in others they may be on the same level as the CIO. Think: purchasing hardware and managing data centers.
  • Chief Data Officer (CDO) — This role involves managing an organization’s data assets. Think: setting up data governance policies.
  • Chief Digital Officer (also CDO) — This is a newer role that involves leading digital transformation efforts, which generally refers to using technology to change how patient care is provided. Think: implementing a digital health app to help manage chronic conditions.

What Sort of Coding is Involved?

AI models are generally created by writing code in a programming language such as Python or, less commonly, R. You can write code to perform a set of steps and save it in a file called a script. You can then run or execute this script to perform the steps defined by that code.
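For example, a script might be a plain text file containing a few lines of Python (the file name below is just for illustration):

```python
# Contents of a hypothetical script file named convert_heights.py
height_in_feet = 5.5
height_in_inches = height_in_feet * 12
print(f"Height: {height_in_inches} inches")
```

Running the command python convert_heights.py would then execute those steps from top to bottom.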

In the early steps of AI model creation, interactive tools such as Jupyter Notebooks are often used instead of script files. A Jupyter Notebook makes it convenient to write a small snippet of code, run that snippet, and then examine the result. This makes it easier to pick up on unexpected behavior that can be missed when running longer sequences of steps.

Here’s an example of a Jupyter Notebook that I’m running on my laptop and accessing through my web browser. Dataset_1 is a dataset with 100 records (rows). At the bottom, I’m using Python to examine records where the weight is 110 or lower:
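In code, the cell at the bottom of that notebook might look something like this (a sketch, assuming the data has been loaded into a pandas DataFrame named dataset_1 with a weight column):

```python
import pandas as pd

# Load the dataset into a DataFrame (the file name is illustrative)
dataset_1 = pd.read_csv("dataset_1.csv")

# Show the records where the weight is 110 or lower
dataset_1[dataset_1["weight"] <= 110]
```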

Now let’s go over the general steps of creating an AI model.

Step 1: Data Exploration

Once an AI project is defined, the first step is generally data exploration — seeing what data is available, assessing its quality, and deciding which data to use for the AI model. This is done using steps like the one shown above.
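For instance, a few lines of pandas code can give a quick overview of a dataset (again a sketch, with an illustrative file name):

```python
import pandas as pd

# In practice the data might come from a database rather than a file
dataset_1 = pd.read_csv("dataset_1.csv")

# In a notebook, each of these would typically go in its own cell
print(dataset_1.head())        # look at the first few records
print(dataset_1.describe())    # summary statistics for numeric columns
print(dataset_1.isna().sum())  # count of missing values in each column
```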

Step 2: Choosing an AI Approach

We’ll get into some AI approaches in the next post. Deep learning is the group of approaches most commonly used in healthcare.

Step 3: Preprocessing

The next step involves getting the data into a form that can be used for the AI model.

The first part of preprocessing is “cleaning” the dataset. For example, you may consider a height of 5 inches or 53 feet to be erroneous and decide not to use those values in the model. If you have a dataset where a variable — let’s say the weight — is missing for several records, you will generally choose from a few different options: you may exclude records with a missing weight, fill in the missing values with a reasonable estimate (imputation), or remove weight from the set of variables that are used for the AI model. While cleaning the dataset, you’ll also want to ensure that any duplicate records are removed.
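As a rough sketch of what some of these cleaning steps might look like in pandas (the dataset, column names, and cutoffs below are all made up for illustration):

```python
import pandas as pd

# A tiny made-up dataset (heights in inches, weights in pounds)
dataset_1 = pd.DataFrame({
    "height": [64, 5, 70, 636, 70],
    "weight": [150, 120, None, 180, None],
})

# Remove exact duplicate records
dataset_1 = dataset_1.drop_duplicates()

# Remove records with implausible heights (the cutoffs here are arbitrary)
dataset_1 = dataset_1[(dataset_1["height"] >= 48) & (dataset_1["height"] <= 90)]

# Option 1: exclude records that are missing a weight
option_1 = dataset_1.dropna(subset=["weight"])

# Option 2: fill in missing weights with the median weight (simple imputation)
option_2 = dataset_1.fillna({"weight": dataset_1["weight"].median()})
```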

Preprocessing also involves getting the data into the proper format, which often means making the data consistent and simplified. For example, you may write code to convert all heights to inches. Preprocessing can get pretty complicated for images. For example, the first step may be to convert each image to a collection of numbers called an array — each pixel of the image would be represented by 3 numbers giving its red, green, and blue values. You may then convert this array to grayscale so that each pixel is represented by a single number giving the pixel intensity. Then, you may resize and crop these arrays so they’re the same size.
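Here is one possible sketch of image preprocessing using the Pillow and NumPy libraries (the file name and dimensions are made up; real pipelines vary):

```python
import numpy as np
from PIL import Image

# Illustrative file name
image = Image.open("scan_001.png")

# Convert to grayscale so each pixel is a single intensity value
image = image.convert("L")

# Resize so all images end up the same size (the dimensions are arbitrary here)
image = image.resize((256, 256))

# Represent the image as an array of numbers that a model can work with
pixels = np.array(image)
print(pixels.shape)  # (256, 256)
```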

Step 4: Splitting the Data

Data splitting is not relevant for all AI approaches, but it’s important to understand for deep learning.

In this technique, datasets are usually randomly split into training, validation, and test sets. For example, 60% of the records may be in the training set, 20% may be in the validation set, and 20% may be in the test set.
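Using the scikit-learn library, a 60/20/20 split might look something like this (a sketch, assuming the data is in a pandas DataFrame named dataset_1):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

dataset_1 = pd.read_csv("dataset_1.csv")  # illustrative file name

# First set aside 60% for training; the remaining 40% is split in half below
train_set, remainder = train_test_split(dataset_1, test_size=0.4, random_state=42)
validation_set, test_set = train_test_split(remainder, test_size=0.5, random_state=42)

print(len(train_set), len(validation_set), len(test_set))  # roughly 60/20/20
```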

The training set is used to train the model, meaning the model gradually “learns” from the data in the training set. The test set is used to evaluate how well the model performs on “new” data.

I’ll start with an analogy here, and we’ll touch on this again in the next post. A model learning from a training set is like a student learning multiplication by doing practice problems. The student may end up memorizing the answers to the specific practice problems — they would do great on those practice problems, but then do terribly on the test. This situation is like an AI model that overfits the training set and doesn’t generalize well, which becomes obvious when it does not perform well on the test set.

Ideally, the test set should only be used once. If the student tries the test over and over again, they may then focus on the practice problems that are important for the test. Similarly, if a model is evaluated on the test set over and over during model development, the model may become overfit to the test set and not perform well on a new dataset. The validation set helps remedy this situation. It’s like a mock exam that helps the student evaluate how well they’re learning the material prior to the actual test. During model development, the validation set helps with evaluating generalizability and overfitting of the model.

It should now be apparent why the dataset is split randomly. If a student learns from practice problems that go up to 11x11 but the test consists of problems where numbers are multiplied by 12, the test is not a fair assessment of how well they learned the material covered by the practice problems. Similarly, if the split is not random, the test set can end up systematically different from the training data, and the evaluation will not reflect how well the model actually learned.

Step 5: Training the Model

AI models are usually created using a machine learning framework such as TensorFlow or PyTorch. TensorFlow and PyTorch are free software libraries that simplify the code required for frequently used steps, so you can create machine learning models with far less code than would otherwise be required.

Once you have set up code for the machine learning algorithm and have your data ready, you can start training the model. Initial attempts may not work out great, and you may have to modify the algorithm — and sometimes the data preprocessing steps — to try to improve the model. Training a model can take several hours or days, so this can be a lengthy process.
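As a very rough sketch of how concise this can be, here is a tiny neural network being defined and trained on made-up data with TensorFlow’s Keras interface (the layer sizes and settings are arbitrary):

```python
import numpy as np
import tensorflow as tf

# Made-up data standing in for the training and validation sets
# (5 input variables per record, binary outcome)
X_train, y_train = np.random.rand(60, 5), np.random.randint(0, 2, size=60)
X_val, y_val = np.random.rand(20, 5), np.random.randint(0, 2, size=20)

# A small neural network defined in a few lines
model = tf.keras.Sequential([
    tf.keras.Input(shape=(5,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Training ("fitting") the model; real projects use far more data and epochs
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=5)
```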

The types of parallel calculations used to train complex machine learning models are performed far more efficiently on a GPU (graphics processing unit) than on a CPU (central processing unit). Companies such as Nvidia initially created GPUs to perform the complex calculations required for video game graphics and similar purposes. While your personal computer may have a GPU, it is often not sufficient for creating robust AI models, since AI model training is a computationally intensive process that generally involves large volumes of data. Instead, you may log into a server that your organization has set up, which provides access to a central set of computational resources, including GPUs.
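For example, PyTorch provides a quick way to check whether a CUDA-capable GPU is visible to your code:

```python
import torch

# Prints True if PyTorch can see a CUDA-capable GPU on this machine
print(torch.cuda.is_available())
```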

An increasingly popular approach is to use cloud servers rather than on-premise servers. Common cloud GPU platforms are Amazon Web Services, Google Cloud, and Microsoft Azure. The concept is similar to an on-premise server — there is a collection of computers with GPUs that is accessible to users. But these platforms have large-scale servers and charge for computational resources as they are used, saving organizations the cost and effort of creating and managing on-premise servers that are rarely used at full capacity. Individuals can access free cloud GPU resources through sources such as Google Colaboratory, Kaggle, and Amazon Web Services. These allow individuals to create AI models, but there are limits to how much you can use the GPU resources.

Step 6: Evaluating the Model

This is a step that is increasingly getting the attention it deserves. It includes evaluating the model on the test set, but should go beyond that as well.

You may want to explore different types of metrics for model performance. It may be important to obtain additional datasets, for example from other institutions, to evaluate whether the model works well with data from different settings. You’ll want to evaluate failure cases — cases where the AI model did not perform well — to see what went wrong. For example, it may turn out that the model does not work well on images obtained with a particular brand of CT scanner. You’ll want to evaluate not only the model’s overall performance, but also its performance on edge cases, or situations that come up only rarely. For example, if your dataset has relatively few records from patients with kidney failure, you may want to evaluate whether the model still performs well for those patients. Additionally, you’ll want to evaluate for biases. We’ll discuss this step further in a future post.
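As a small sketch, common metrics can be computed with the scikit-learn library once you have the true outcomes and the model’s predictions for the test set (the numbers below are made up):

```python
from sklearn.metrics import accuracy_score, roc_auc_score

# Illustrative values: true outcomes and predicted probabilities for 5 test records
y_true = [0, 1, 1, 0, 1]
y_prob = [0.2, 0.8, 0.6, 0.4, 0.9]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # probabilities turned into yes/no

print("Accuracy:", accuracy_score(y_true, y_pred))
print("AUC:", roc_auc_score(y_true, y_prob))

# The same metrics can be recomputed within subgroups (for example, patients
# with kidney failure) to look for gaps in performance.
```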

Step 7: Deploying the Model

Once the model is created and evaluated, it can be deployed into a production environment. During deployment, issues that are addressed include:

  • How can the model be incorporated within existing software and processes?
  • Where will data come from?
  • How will the data be preprocessed?
  • Will we set up a system so that the AI model is continuously updated using newer data (online machine learning)?
  • What will we do with the output of the AI model?
  • Has the code been thoroughly tested?
  • How will we monitor the model to detect issues that come up?
  • How will we ensure the system functions reliably?

Deploying AI models can be particularly difficult in healthcare as clinical environments are often not set up to enable this complex process.
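To make this concrete, one common pattern is to wrap the trained model in a small web service that other software, such as the electronic health record, can send data to and receive predictions from. Here is a minimal sketch using the FastAPI library, with a stand-in for the real model (every name below is illustrative):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PatientFeatures(BaseModel):
    age: float
    weight: float

def predict_risk(features: PatientFeatures) -> float:
    # Stand-in for a real trained model loaded from disk
    return 0.5

@app.post("/predict")
def predict(features: PatientFeatures):
    # Receive patient data, run the model, and return the prediction
    return {"risk": predict_risk(features)}
```

Running this with a server program such as uvicorn would let other systems send a patient’s data as a request and receive the model’s output in response; a real deployment would also need logging, monitoring, security review, and thorough testing, which is part of why this step is so involved.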

Stay Tuned

In the next post, we’ll get into different machine learning approaches. I’ll be posting on LinkedIn and Twitter when it comes out. If you missed Part 1 of this series, you can find it here. See you next time!
