Healthcare Machine Learning: Getting Started

Jenine John
8 min read · Mar 29, 2023


Photo by Sigmund on Unsplash

If you’re in healthcare, you have no doubt heard about all the interesting new projects that involve machine learning. You may want to gain a basic understanding of healthcare machine learning, or you may want to actually create machine learning models. If you’re in the latter group, this guide is for you. If you’re simply looking to gain a basic understanding, check out this introduction.

We (Pierre Elias and Jenine John) start out with some pointers based on our experience as cardiologists who have gotten into data science, and then provide a compilation of some of the best resources we found over the past few years. It is far from comprehensive and we welcome additions.

Accessing Data and GPUs

If you’d like to engage in healthcare ML research, you’ll want to be in a group that will provide you access to guidance, data, and GPUs (graphics processing units, which are better suited for machine learning than CPUs). While you’re learning, though, you can start off with publicly available datasets (from sources like Kaggle or PhysioNet) and use free GPU resources like Google Colab (AWS also provides free GPU use, but it can be tricky to avoid accidental charges).
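Once you have a session running (for example, in Colab), it helps to confirm that a GPU is actually visible before you start training. The snippet below is a minimal sketch that assumes PyTorch is the framework in use; `pick_device` is a hypothetical helper name, and it simply falls back to the CPU if PyTorch is not installed or no GPU is found.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # assumption: PyTorch is the framework in use
    except ImportError:
        # PyTorch is not installed, so fall back to the CPU label.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Training would run on: {device}")
```

If this prints `cpu` in Colab, the notebook's runtime type is probably not set to use a GPU.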

Picking a Language and ML Framework

If you’re in a group, it’s best to learn the language and ML framework your group is using.

Python is the language commonly used for machine learning. Some groups use R instead.

Two of the most commonly used machine learning frameworks are PyTorch and TensorFlow 2, which incorporates Keras. PyTorch tends to be favored for research because it is flexible, whereas TensorFlow tends to be favored for models that will be deployed. Also, PyTorch’s commands are closer to Python and may therefore be more intuitive, but TensorFlow’s commands are a little simpler. You can’t go wrong with either one as your first framework, and you can learn another framework later.

Recommended Resources for Coding

Photo by Boitumelo Phetla on Unsplash

Coding is a skill that is learned best through practice. Try out multiple resources to see which ones work best for you. Once you choose a resource, you should also consider whether it is worth completing it to the end, because some of the later portions may not be relevant to your needs.

Jump into applying the skills to real data as early as possible. It is one thing to learn from homework exercises and an entirely different one to tackle real data. The sooner you get to building things for yourself, the sooner it will begin to click. When you need to figure out how to do something, you’ll know which resources to turn to, or you’ll search online (Stack Overflow is incredibly useful). Jupyter notebooks are very helpful as you’re learning: they allow you to check your code step by step as you go, so you can catch unexpected behavior early.
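To illustrate what checking your code as you go can look like, here is a small sketch written the way you might split it across notebook cells. The data are made up for illustration (the column names `patient_id`, `age`, and `ef` are hypothetical), and only the Python standard library is used.

```python
import csv
import io

# Cell 1: load the data. These made-up rows stand in for a real dataset.
raw = io.StringIO("patient_id,age,ef\n1,64,55\n2,71,\n3,58,35\n")
rows = list(csv.DictReader(raw))
assert len(rows) == 3, "unexpected number of rows after loading"

# Cell 2: drop rows with a missing ejection fraction, then check how many remain.
complete = [r for r in rows if r["ef"] != ""]
print(f"kept {len(complete)} of {len(rows)} rows")

# Cell 3: convert to numbers and confirm the values are plausible percentages.
efs = [float(r["ef"]) for r in complete]
assert all(0 <= ef <= 100 for ef in efs), "ejection fraction out of range"
print(f"mean EF: {sum(efs) / len(efs):.1f}")
```

Inspecting the output of each step, and asserting what you expect to be true, makes it much easier to spot the point at which the data stop looking the way you assumed.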

Python Basics

Jupyter notebooks are great for running short pieces of Python code and seeing the results. When writing scripts, you’ll want to use an Integrated Development Environment (IDE) or code editor. PyCharm Pro is an IDE that is free with a .edu e-mail address, and Sublime Text is a great code editor. If your group has set up a remote server, PyCharm can be used when working on the server (be careful with the settings, because it’s possible to accidentally delete the server’s data when syncing). Alternatively, Sublime Text can be used in combination with a file syncing tool like Forklift or Cyberduck to work on a remote server.

Here are some courses to get you started with Python — try some out and see which ones you like.

  • Codecademy — Learn Python
    While you will almost certainly code in Python 3 rather than Python 2, the differences are minimal. The Python 2 version is completely free while the Python 3 version has a free trial. This is a great interactive approach to beginning to learn Python.
  • YouTube — Corey Schafer videos on Python
    Excellent set of practical videos that includes topics like setting up Python and importing modules.
  • Coursera — Python for Everybody
    A good basic introduction to Python. The PDF of the accompanying book and other materials are available here.
  • Hitchhiker’s Guide to Python
    A resource for understanding some core topics better, such as organizing scripts and logging.

Python for Data Science

Shell

One of the first things you should do is get comfortable with the command-line shell, which will enable you to do things like run your scripts. There are lots of online resources about this, such as this page.
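As an illustration of what running a script from the shell involves, here is a minimal sketch of a Python script that reads its inputs from the command line using the standard `argparse` module. The file name `summarize_hr.py`, the options, and the default values are all hypothetical.

```python
import argparse


def build_parser():
    """Define the command-line options this script accepts."""
    parser = argparse.ArgumentParser(description="Summarize heart rates.")
    parser.add_argument("--rates", nargs="+", type=float,
                        default=[72.0, 88.0, 110.0],
                        help="heart rates in beats per minute")
    parser.add_argument("--threshold", type=float, default=100.0,
                        help="rates above this value are counted as high")
    return parser


def summarize(rates, threshold):
    """Return the mean rate and the number of rates above the threshold."""
    mean = sum(rates) / len(rates)
    n_high = sum(r > threshold for r in rates)
    return mean, n_high


def main(argv=None):
    # With argv=None, argparse reads the arguments typed in the shell.
    args = build_parser().parse_args(argv)
    mean, n_high = summarize(args.rates, args.threshold)
    print(f"mean={mean:.1f} bpm, above threshold: {n_high}")
    return mean, n_high


if __name__ == "__main__":
    main()
```

Saved as `summarize_hr.py`, this could be run from the shell with `python summarize_hr.py --rates 60 120 --threshold 100`, and `python summarize_hr.py --help` would list the available options.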

GitHub

Git is an important tool for version control (tracking changes to scripts so that you can collaborate with others or go back to prior versions if needed). The website GitHub is commonly used to host Git repositories. Git can be a little tricky to get the hang of. There are lots of basic Git and GitHub introductions online. If you’re looking for a detailed introduction, this YouTube playlist explains it well.

It’s a good idea to store private data and Jupyter notebooks in separate directories from git repositories so that you don’t accidentally upload sensitive information. An alternative is to list those files and directories in a .gitignore file so that Git skips them.
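If you go the .gitignore route, the file is just a list of patterns, one per line. The sketch below shows one way to add patterns to it from Python; the helper name `add_ignore_patterns` and the patterns themselves are hypothetical and should be adapted to where your group keeps its data. The demo writes to a temporary directory rather than a real repository.

```python
import tempfile
from pathlib import Path

# Hypothetical patterns; adjust them to match where your data and notebooks live.
PRIVATE_PATTERNS = ["data/", "*.csv", "*.ipynb", ".ipynb_checkpoints/"]


def add_ignore_patterns(repo_dir, patterns=PRIVATE_PATTERNS):
    """Append any missing patterns to repo_dir/.gitignore; return those added."""
    gitignore = Path(repo_dir) / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    existing = {line.strip() for line in text.splitlines()}
    missing = [p for p in patterns if p not in existing]
    if missing:
        # Start on a fresh line if the file does not already end with a newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with gitignore.open("a") as f:
            f.write(prefix + "\n".join(missing) + "\n")
    return missing


# Demo in a throwaway directory standing in for a repository.
with tempfile.TemporaryDirectory() as repo:
    print(add_ignore_patterns(repo))  # patterns written on the first call
    print(add_ignore_patterns(repo))  # nothing left to add on the second call
```

Keep in mind that .gitignore only affects files Git is not already tracking, so a file that was committed earlier stays in the repository's history.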

Tmux (optional)

Tmux is a handy tool that allows processes (including Jupyter notebooks) to continue on a remote server even if you get disconnected. It also allows multiple processes to be run at the same time. This is a brief tutorial you can turn to if you’d like to use tmux.

Other

Recommended Resources for Machine Learning

Photo by Alina Grubnyak on Unsplash

Focus first on gaining a conceptual understanding of machine learning, then create some basic models. Once you have the basics down, you can supplement with additional resources to flesh out your knowledge. The resources with asterisks are especially good.

Machine Learning Techniques

Machine Learning for Healthcare

Math for Healthcare Machine Learning

Other Resources

Photo by Art Wall — Kittenprint on Unsplash

Keeping Up with the Field

Datasets

Machine Learning Packages for Cardiology

  • IntroECG
    Pierre Elias’s deep learning library that allows users to create their own synthetic ECGs and teaches them how to build deep learning models for electrocardiograms. It provides multiple architecture examples and step-by-step instructions on how to review experimental findings and automate experimentation.
  • EchoNet
    David Ouyang’s deep learning library for determining ejection fraction from echocardiograms. This library comes with a large, freely available echocardiographic dataset and facilitates many of the important principles for doing deep learning research with echocardiograms.
