Deep Learning in Healthcare: Challenges and Opportunities
“Medicine is an art and a science, but the science dominates the art.”
Dr. Dave Channin received a Bachelor’s degree in computer science and molecular biology from Brandeis University. After graduation, he worked as a programmer for a couple of years, and then left the United States to study medicine at the Faculté de Médecine Lariboisière-St. Louis in Paris. Returning to the USA, Dr. Channin completed medical school and residency in radiology at the Penn State College of Medicine. At the completion of residency, Dr. Channin was recruited to Northwestern University as the principal architect of the Northwestern Memorial Hospital PACS. In 2010, Dr. Channin became Chair of the Guthrie Clinic medical imaging service line. There, he had shared administrative responsibility for imaging at 4 hospital and 7 outpatient locations, performing 240,000 procedures per year. In 2015, Dr. Channin left Guthrie to return to his roots in informatics and technology, founding Insightful Medical Informatics, Inc.
What makes deep learning in medical and imaging informatics different from applications that are more consumer-facing?
This is healthcare and healthcare, itself, is fundamentally different from every other industry. People assign the highest priority to their health (or lack thereof), expect the highest levels of care and service regardless of cost and are more emotional and ideological about this industry than any other. Because it consumes 17.5% of US GDP and still does not meet societal expectations, it is the most regulated aspect of American society.
You are both a physician and an entrepreneur. What are the difficulties in starting a medically-relevant company as a physician, and what advice do you have for those who are looking to do so?
Actually, I was a computer programmer who became a radiologist and through an interest in R&D became an entrepreneur. Radiology, in particular, is a great specialty in which to find a technology driven path and apply the tools of the programmer.
The challenge to starting a medically relevant company is identifying the niche upon which you are going to focus. Work backward from the patient and their pain and suffering. Do not underestimate the size, complexity and regulation of the American healthcare system, or the scientific rigor to which you will be held. Consider the American healthcare system an ugly shrub that only 200 years of carefully metered cuts will transform into the bonsai we all so desire. It is unrealistic to think you will uproot the entire shrub to plant something new. Even your branch may take decades to change.
Collaborate with people who are already in healthcare. You will be surprised by their insights and their desire to improve the system.
What are the most important factors teams must consider when building healthcare-minded products more generally?
In today’s environment, everything done in healthcare must address the pillars of the Triple Aim: improving the health of populations, lowering the cost of care, and improving the patient experience. Some add a fourth aim of improving the provider experience so as to recruit and retain the best people. If your product or service does not address one or more of these, don’t bother.
Medicine is an art and a science but the science dominates the art. Medicine, directly or indirectly, is evidence-based and sooner or later you are going to have to produce hard scientific data to back up your marketing claims. The road from Hippocrates to HIPAA is littered with snake oil and its promoters.
Assume it is a zero sum game. You are going to make money in this business by taking it away from someone else. They, their lobbyists, legal staff and everyone else they can muster are going to try and stop you and maintain their playing field advantages.
You are dealing with a large number of highly educated, highly trained, highly enculturated individuals. Respect the validated, accumulated knowledge and wisdom and the culture of altruism, empathy and compassion; challenge unvalidated beliefs, disrupt bad workflow and bureaucracy and help these people do what they do best, better.
What catalyzed the interest in deep learning applied to healthcare?
It is important to remember that ‘artificial intelligence’ (in the largest, traditional sense) and ‘algorithmic learning’ have been applied to medical data, including images, since the earliest days of computing. Computer-assisted diagnosis systems have been around since the 1970s. Automated processing and analysis of one-dimensional time signals (e.g., electrocardiograms) has been around for decades. Computer-aided detection and diagnosis of medical images (e.g., Papanicolaou smear cytology, detection of masses and microcalcifications in mammograms) have also been around for quite some time. Some of the latter already use deep learning techniques such as convolutional neural networks.
The current interest in deep learning in healthcare stems from two things. First, the flowering of machine learning techniques, in general, and especially unsupervised learning techniques, in the commercial space with the likes of Google, Facebook and IBM Watson. The second factor is the explosion of available healthcare data (lagging only slightly behind the explosion of internet data) that was triggered by the HITECH portion of the American Recovery and Reinvestment Act (ARRA). The latter effectively transformed medical records from carbon paper to silicon chips and made that data, structured and unstructured, available.
What hurdles do you see for these first-movers going forward?
Data in, data out and regulation.
Machine learning methods used in a vacuum have next to no utility — you need data to train your model. How significant of a data barrier is there when it comes to medical applications of machine learning concepts, given the significant privacy considerations?
This is the “data in” problem. The problem is not privacy. The use of medical subjects and data in research, including research to develop new technologies, is well established both within the context of the Federal Policy for the Protection of Human Subjects (the so-called “Common Rule”) and HIPAA. Even the transfer of technology and intellectual property developed with federal research dollars to the private sector has been facilitated for decades by the Bayh-Dole Act of 1980. Companies in this space “only” need to respect policy, paperwork and process.
The real “data in” problem affecting deep learning applications, especially, but not exclusively, in medical imaging, is truth. Truth means knowing what is in the image. It is very easy to get a large number of images of hats and have people annotate the images that contain red hats or fedoras. Crowdsourcing the annotation or validation of data to millions (billions?) of people (e.g., CAPTCHA) can also work to create or validate large datasets. Other small and large annotated datasets, for specific recognition tasks, have been created by government, academia and industry at no small cost in time and money.
Medical images are much more complex. There are dozens of kinds of medical imaging devices each producing images according to their respective physical principles. These machines are producing images of hundreds of different anatomic structures and normal variants and pathophysiologic processes resulting in thousands of observable imaging features.
In the case of supervised learning, and creating annotated datasets, it is important to remember that in the United States, there are only approx. 35,000 people trained and licensed to annotate all of those observable imaging features (though there are perhaps triple that number that could contribute annotations in their specialty areas).
Large numbers of patient imaging studies performed with digital technologies over the past 30 years have been annotated by this rolling population of 35,000 experts. The vast majority of those annotations, however, are in the form of unstructured free text and lack links to the coordinates of the pixels containing the image features that engendered the annotation. The good news is that there is a new standard for Annotation and Image Markup (AIM), developed under a National Cancer Institute program, and anyone developing annotated medical imaging datasets ignores the importance of standardized annotation at their peril.
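To make the contrast concrete, here is a minimal sketch of what a structured annotation adds over free text: the finding is explicitly linked to the pixels that engendered it. The field names below are entirely hypothetical simplifications; the real AIM information model is far richer.

```python
from dataclasses import dataclass, field

@dataclass
class ImageAnnotation:
    """Hypothetical, simplified AIM-style annotation:
    a coded finding tied to pixel coordinates in a specific image."""
    study_uid: str      # identifier of the imaging study
    image_uid: str      # identifier of the specific image
    finding: str        # coded observation, e.g. "mass"
    annotator_id: str   # which expert observer made the annotation
    roi_pixels: list = field(default_factory=list)  # (x, y) region of interest

# A free-text report might say "mass in the right upper lobe"; a structured
# annotation additionally records *where* in the image that mass is.
ann = ImageAnnotation(
    study_uid="1.2.3.4",
    image_uid="1.2.3.4.5",
    finding="mass",
    annotator_id="radiologist-017",
    roi_pixels=[(120, 88), (121, 88), (121, 89)],
)
print(ann.finding, len(ann.roi_pixels))
```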
But you can’t just take single annotations from one of the 35,000. Even though they are experts and very good at what they do, they are human and make mistakes. So you have to have consensus annotations by multiple expert observers.
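One possible consensus policy is a simple majority vote with an agreement threshold; the sketch below is illustrative (the threshold value and the policy itself are assumptions, not a standard).

```python
from collections import Counter

def consensus_label(labels, min_agreement=0.5):
    """Return the majority label among expert annotations, or None
    if no label exceeds the (hypothetical) agreement threshold."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) > min_agreement else None

# Three experts annotate the same image; two agree, so "mass" wins.
print(consensus_label(["mass", "mass", "normal"]))
# Two experts split evenly: no consensus, flag for adjudication.
print(consensus_label(["mass", "normal"]))
```

In practice, disagreements would typically go to an adjudication round with additional readers rather than simply being discarded.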
What about data for unsupervised learning? Can’t we find millions of, for example, chest X-rays and see what patterns are found?
Well, yes, you could but you might suffer from garbage in — garbage out. There are thousands of imaging procedures. The Current Procedural Terminology (CPT) and other code sets used to classify and bill for these procedures lack the granularity to characterize the exact nature of the imaging performed. It turns out, there are 11 or so ways to produce a radiograph of the chest. The billing code, 71020, can be used for any two of these 11 views. In computed tomography (CT) there are dozens of parameters that can be varied to produce images, including whether or not the patient was injected with contrast media. In magnetic resonance imaging, even more parameters. Which of those parameters are going to affect the output of the unsupervised system? There are no widespread, detailed standards for the acquisition of medical imaging studies. The good news is that there is a developing standard for the nomenclature of imaging studies (the Radiological Society of North America’s RadLex™ playbook now being harmonized with LOINC). Furthermore, medical imaging has one of the best standards, DICOM, that specifies, in infinite detail, the metadata of medical images, so you can use this information to assist an intelligent triage of the images. As the saying goes, “DICOM is always documented in brown, because it is clear as mud, but delivers like UPS.”
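As a sketch of that kind of metadata-driven triage: the tag names below (Modality, BodyPartExamined, ViewPosition, ContrastBolusAgent) are real DICOM attribute names, but the headers are represented here as plain dictionaries rather than parsed DICOM files, and the filter itself is an illustrative assumption.

```python
# Hypothetical study headers; a real pipeline would read these
# attributes out of the DICOM files themselves.
studies = [
    {"Modality": "CR", "BodyPartExamined": "CHEST", "ViewPosition": "PA"},
    {"Modality": "CR", "BodyPartExamined": "CHEST", "ViewPosition": "AP"},
    {"Modality": "CT", "BodyPartExamined": "CHEST", "ContrastBolusAgent": "IOHEXOL"},
]

def is_pa_chest_radiograph(hdr):
    """Keep only upright PA chest radiographs — one of the ~11 ways
    a 'chest X-ray' can actually be acquired."""
    return (hdr.get("Modality") == "CR"
            and hdr.get("BodyPartExamined") == "CHEST"
            and hdr.get("ViewPosition") == "PA")

training_set = [s for s in studies if is_pa_chest_radiograph(s)]
print(len(training_set))  # only the PA radiograph survives
```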
Standards for non-image structured data are less, ummm, standardized. Even then, much non-image medical data is still unstructured (e.g., notes or structured laboratory data transformed into unstructured document formats). Vocabularies, lexicons and ontologies are mature but schemata and usage still have large local variance.
Lastly, there is no central clearinghouse or national interoperability for medical record data though some has been in development for a decade or more. Each institution, cluster of institutions or other association of data stewards act on their own within the limits of the law. So, obtaining high quality annotated data sets for both supervised and unsupervised learning will remain a costly challenge for years to come.
What is the “data out” problem?
Let’s say that you’ve overcome the data-in hurdles, you’ve acquired a great, annotated data set and the results on the test set are great. Now you have to validate it: compare the performance of your system to humans for this task, and, I would warn, humans are very good at these tasks. This is done by performing an observer performance study and calculating a receiver operating characteristic (ROC) curve that relates the observer’s sensitivity and specificity. And since you are hoping the difference between your system and the human is small, the study must be large to have the statistical power to distinguish the two. These experiments take time and are costly to perform. Perhaps the system and the human used together are better than either alone? Does the system speed up the interpretation process or slow it down? I don’t want to throw any shade, but humans can determine gross normality of a chest radiograph in 200 milliseconds (Radiology. 1975 Sep;116(3):527–32).
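A toy illustration of the arithmetic underlying such a study: sensitivity and specificity at a single operating point, which, swept across thresholds, traces the ROC curve. The reader scores and truth labels below are invented for illustration.

```python
def sensitivity_specificity(scores, truths, threshold):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP) at one
    operating point; varying the threshold traces the ROC curve."""
    tp = sum(1 for s, t in zip(scores, truths) if s >= threshold and t)
    fn = sum(1 for s, t in zip(scores, truths) if s < threshold and t)
    tn = sum(1 for s, t in zip(scores, truths) if s < threshold and not t)
    fp = sum(1 for s, t in zip(scores, truths) if s >= threshold and not t)
    return tp / (tp + fn), tn / (tn + fp)

# Invented reader confidence scores (nodule present?) vs. ground truth.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
truths = [True, True, False, True, False, False]
sens, spec = sensitivity_specificity(scores, truths, threshold=0.5)
print(sens, spec)  # both 2/3 at this operating point
```

Detecting a small difference between two such curves (machine vs. human) is what drives the large sample sizes, and hence the cost, of these studies.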
OK. You’ve got an AI and it’s good enough for clinical use. How are you going to deliver your result to the clinician, radiologist or other anticipated user of the system and incorporate it into the electronic medical record? Their eyes are not fixed to generic development platforms like iOS or Android. Rather, they are attached to large, expensive, proprietary, often regulated devices and systems. There are standards for integration and interoperability but they must be addressed.
Unlike many consumer technology applications of machine learning, healthcare has a dedicated regulatory body in the FDA. As a result, the FDA will play a significant role in determining the future of machine learning in healthcare. What challenges do developers face in working with the FDA?
The first challenge is not to ignore the 800-pound gorilla in the room. Start early. Find out if your device is a device. I would argue that if your deep learning system is going to do anything meaningful it is going to be a device but there is plenty of guidance available to help the developer make that determination. Once you determine that your device is a device, you can determine what class of device it is and whether any exemptions apply. The class of the device is “based on the level of control necessary to assure the safety and effectiveness of the device.” These determinations will define the path you will take to FDA approval of your device.
Again, policy, paperwork, process. One fundamental philosophy of the FDA is “Quality System (QS) Regulation/Medical Device Good Manufacturing Practices.” While we all love ‘garage code’ that gets us 7 million users in 7 days, the FDA will insist that the code was developed under current good manufacturing practice (CGMP). There are many software development methodologies that will meet CGMP and you might as well start using one from day one. Similarly, the FDA will look for GMP and appropriate regulations to have been applied to any data you use and any experiments you perform to validate that data.
Identify who is going to shepherd your company and product through the FDA process. Do you have a lawyer, accountant and CFO to deal with the IRS? You will probably need similar for the FDA. Prepare as much as you can in advance and work in parallel as much as possible.
What challenges does the FDA face in its consideration of these technologies? How can regulatory bodies such as the FDA keep up with the speed of development? How should investors and entrepreneurs think about the FDA’s role in the process of development?
How smart is the gorilla and how good is he at his job? Pretty smart and fairly good. The FDA works by assigning devices for evaluation to one of 16 medical specialty “panels”. These panels rely on published and unpublished scientific studies. One power of the FDA is its ability to convoke panels of industry and academic experts to analyze the evidence. The radiology panel has, for example, already approved “Analyzer, Medical Image” (govspeak) systems based on deep learning techniques such as convolutional neural networks.
The system is, admittedly, slow. This is not, however, solely due to the nature of a large government bureaucracy. Following and documenting the CGMP process, even for software, is tedious and time consuming. Performing and documenting the scientific validation is meticulous and time consuming. Statistical analyses, publishing and analyzing the published and unpublished results all take time. Remember, we are talking about a medical device that could diagnose or steer the diagnosis in many directions. After all, a demonstration of “safety and effectiveness” is only what your mother would ask for before she allowed the device to be used on her.
What are the benefits that deep learning can provide in healthcare? What is its value proposition, and in what areas of the healthcare system is it most helpful? How does the development of AI fit within the conversation about the rising and unsustainable costs in healthcare?
The value of deep learning systems in healthcare comes only in improving accuracy and/or increasing efficiency. Healthcare, today, is a human-machine collaboration that may ultimately become a symbiosis or even cyborg relationship. We are still at the stage, however, where both humans and machines perform tasks at which they are suboptimal. As deep learning systems develop and evolve, they will more and more assist humans with the tasks at which humans are not good. So, for example, humans are very good at processing information from their senses, including vision. They are very good at perceiving human emotions. But humans are not so good at remembering things, at searching for and organizing data, and not too good at correlating and reasoning about that data. So I foresee DL systems that will make physicians and other providers faster and smarter in their diagnoses and reduce uncertainty in their decisions, thereby avoiding costs and hazards and saving time.
A similar debate that is facing industrial automation with robotics could be made about deep learning in health informatics when it comes to job replacement. Do you see backlash from the medical community towards utilizing concepts such as deep learning with regard to its part in changing medical practice? Are there any similar historical analogies you could speak on where technology fundamentally changed the way medicine was practiced, but had significant risks to “traditional” medical practice?
Medicine, in general, and radiology, perhaps more so than any other specialty, has been very good at developing and adapting to new technology. The golden road to the annual meeting of the Radiological Society of North America (the largest medical meeting in the world) is paved with technological innovation. Many fundamental technology “sea changes” have occurred in radiology, in a relatively short time, many within our lifetimes. For example, the transition within a decade or two from film-based imaging to digital imaging. Dark room staff (large numbers of whom were blind!)? Eliminated like buggy whip manufacturers. Film file storage (cf. The Cleveland Clinic X-Ray Fire of 1929) “librarians”? Reduced or eliminated. Job loss? Some, but not as much as you would think. The transformation to digital and the (ongoing) explosion of new imaging modalities opened new opportunities, as did work in information systems and the changing healthcare environment itself. Industrial disruption? Sure (cf. Kodak, where the small, growing digital Siamese twin slew the body of the mighty film producer). Job loss? Some, especially locally. But less than expected given the number of healthcare information technology companies that arose in parallel.
What about radiologists? Remarkably adaptable to technology perceived as positive to the patient or the institution. At one institution, in 1999, 25 radiologists went from reading images on film to reading images on computer workstations overnight without a significant degradation in accuracy or efficiency. Eventually, they were faster on the new workstations and with new, learned behaviors could never return to film. Fewer radiologists? Not really as new uses for imaging and new imaging technologies were developed. Look how well radiologists have adapted first to mammography (special techniques and technology) then digital mammography, then digital mammography with computer assisted detection/diagnosis and now digital breast tomosynthesis. Accuracy and efficiency have incrementally increased at each step to the benefit of women everywhere. Fewer mammographers and radiologists? Not really.
We, as a society, are going to have to face the accelerating pace of automation and its impact on the workforce and society. There is, however, nothing to suggest to me that these effects will occur faster or in different form in healthcare and in particular due to deep learning. Do I still recommend Radiology as a career to high school and college students? Absolutely.
Deep learning in healthcare has been thriving in recent years. What do you see for the field going forward? What are the important considerations deep learning researchers need to consider for deep learning to be most effective (both from a cost and computational perspective) and ethical going forward?
I see unlimited opportunity to improve the system. Despite current best efforts, there are innumerable inaccuracies and inefficiencies in the system that are ripe targets for DL and other technologies. The most important consideration is to choose your target wisely. Don’t lose sight of the link between the accuracy and efficiency you improve and the pain and suffering you reduce.
If you’re interested in deep learning or working on something in this area, we’d love to hear from you.