Training Undergraduate Medical Students in
Artificial Intelligence: A Student Perspective

Isha Tambolkar
AI Hero
Apr 2, 2022

The use of Artificial Intelligence (AI) in healthcare has the potential to revolutionize the field but requires a blend of both medical and technical expertise. A course about the use of AI in healthcare should be introduced at the undergraduate level so that medical students are well-versed in it by the time they become practitioners. I believe that no-code tools provide a suitable platform for medical students, who have no exposure to AI, Machine Learning (ML), or even computer programming, to form an introductory “black-box” understanding of AI techniques. This article offers a unique perspective on the questions that a medical student might have about how AI learns and how it can be used to automate a typical medical use case. It aims to find answers to these questions, present a first-hand evaluation of a no-code tool for learning about AI, and build a mental model of how machines learn and of the nuances of training data selection.

Data and Code Availability: This article uses the MP-IDB (Malaria Parasite Image Database for Image Processing and Analysis) dataset (Loddo et al., 2018), which is available here. The semantic segmentation model was trained using the Keras implementation of DeepLabV3+ available at https://github.com/bonlime/keras-deeplab-v3-plus.

Photo by Testalize.me on Unsplash

Introduction

The integration of Artificial Intelligence (AI) and Machine Learning (ML) in healthcare has given rise to a specialized branch of research that requires both medical and technical expertise. This has the potential to revolutionize the quality of healthcare delivered to the community in terms of the speed, accuracy, and scalability of diagnosis and treatment. Research in this field necessitates cross-disciplinary literacy and a basic understanding of core concepts in AI that pathologists and laboratories have typically been unfamiliar with (Dhillon and Singh, 2019).

Outside of research, various government regulatory agencies have laid out action plans for releasing intelligent Software as a Medical Device (SaMD) (e.g. Food et al., 2020). Novel AI-powered solutions are already getting regulatory approval (FDA, 2021) and are available to medical practitioners such as physicians and surgeons.

In light of the above advancements, it can be assumed that AI is going to play a huge role in clinical medicine and diagnosis in the near future. As users of this technology, practitioners must be able to understand AI and apply AI-powered solutions in the same manner that they must understand any other technology that has an impact on clinical decision-making, such as MRI (McCoy et al., 2020). While Continuing Medical Education (CME) courses are available for physicians (e.g. https://stanford.cloud-cme.com/course/courseoverview?EID=40335), there is also widespread global interest among medical students in AI literacy and training in providing patient care using AI-based diagnosis support systems (Wood et al., 2021; Tran et al., 2021). Budding doctors expect training regarding AI in various healthcare functions even at the undergraduate level (Mehta et al., 2021). It follows that medical schools must prepare physicians to use, interpret, and explain the outputs of such AI to their patients, without the need to fully understand the Machine Learning algorithms used.

Meanwhile, disruption in the software industry has led to the rise of “no-code” platforms that allow enthusiasts and engineers to build ML models without the need for technical coding expertise (https://levity.ai/blog/no-code-ai-map). As the list of no-code tools grows, there has been some promise in the use of such tools in medical image analysis (Kalshetty and Rakshit, 2021). In my view, such no-code tools provide a suitable alternative for medical students, who have no exposure to AI, Machine Learning, or even computer programming, to create an introductory “black-box” understanding of AI techniques. I believe that by using such tools, they can build a mental model of how machines can be taught routine image analysis tasks, so that it becomes easier for them to understand, justify, and explain the results of the models that are part of released SaMDs.

Methodology

The Task: Detecting Malarial Parasites in Peripheral Blood Smears (PBS)

In patients affected with malaria, detecting the type of malarial parasite is critical for timely and effective treatment. Malaria, sometimes called the ‘King of Diseases’, is caused by different species of the parasite Plasmodium: P. falciparum, P. malariae, P. ovale, and P. vivax. The disease occurs when the pathogen enters the bloodstream of a human through the bite of the female Anopheles mosquito. The need for practical diagnostics for malaria control is growing because early, accurate diagnosis reduces both morbidity and mortality. It may be difficult to distinguish malaria from other tropical infections based on patients’ signs and symptoms alone, so confirming diagnoses using laboratory methods is critical (Tangpukdee et al., 2009). We chose this use-case because of its potential impact: using AI to automate the detection of parasites in digitized PBS images could reduce the detection time, increase accuracy, and make the malaria test more accessible, allowing a professional with technician-level expertise and a digital microscope to perform it even in rural areas.

The Dataset: The Malaria Parasite Image Database (MP-IDB)

We investigated several openly available datasets on malarial parasites as candidates for the assessment. The available datasets have digitized images of peripheral smears of patients affected with malaria, along with annotations for different ML tasks such as classification, object detection, and semantic segmentation. Among the three we considered, NIH’s malaria dataset (Rajaraman et al., 2018) has small images of individual infected and uninfected cells annotated for a binary classification task. The Broad Bioimage Benchmark Collection (Ljosa et al., 2012) is a dataset of P. vivax-affected peripheral blood smears with bounding-box annotations around different types of cells: two classes of uninfected cells (RBCs and leukocytes) and four classes of infected cells (gametocytes, rings, trophozoites, and schizonts). The Malaria Parasite Image Database (Loddo et al., 2018) contains images of entire PBSs with each parasite semantically segmented and classified by type and stage of life. We decided to use the MP-IDB for our analysis because of its simplicity in visualizing the annotations for review, its reasonable size, and the learning challenge for semantic segmentation posed by the often missing annotations of early-stage parasites.

The MP-IDB dataset has 229 digitized images of PBSs, with each image containing at least one parasite. Along with the images, the dataset contains the semantic segmentation mask around each infected parasite along with its type. While it is also provided in the dataset, we ignored the stage of the parasite for this evaluation. As seen in Figure 1, a typical image in the dataset contains a large number of red blood cells with neutrophils and platelets floating among them. In the ground truth provided, only the obvious fully infected RBC is highlighted. However, there is at least one missing infected cell (about one-fifth of the way down and three-fifths of the way across). The RBCs at the bottom have platelets overlaid on them and do not appear infected. 90% of the images were used for training and the remaining 10% were used as a validation split.
For the entire dataset, the overall image quality seems fair: readable enough for the human eye despite occasional blur and shadow artifacts from when the photographs were taken.
An observation to note, which will be more relevant in our assessment, is that different images have different lighting, hue, and staining concentrations, resulting in large variation across the images given that the dataset contains only 229 of them.
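To make the data handling concrete, the sketch below shows one way such a 90/10 split could be produced outside the no-code tool. The directory names and file layout are illustrative assumptions, not the actual MP-IDB folder structure, and the no-code platform performs this step automatically.

```python
import random
from pathlib import Path

# Assumed layout for illustration only: one folder of PBS images and one
# folder of segmentation masks with matching file names.
IMAGE_DIR = Path("mp-idb/images")
MASK_DIR = Path("mp-idb/masks")

# Pair each image with its mask.
pairs = [(img, MASK_DIR / img.name) for img in sorted(IMAGE_DIR.glob("*.png"))]

# Shuffle with a fixed seed for reproducibility, then hold out 10% for validation.
random.seed(42)
random.shuffle(pairs)
cut = int(0.9 * len(pairs))
train_pairs, val_pairs = pairs[:cut], pairs[cut:]

print(f"{len(train_pairs)} training pairs, {len(val_pairs)} validation pairs")
```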

The Model: A no-code implementation of DeepLabV3+ with ANONYMIZED NO-CODE PLATFORM

Since this work focuses on treating the model training as a black box and creating a mental model of how AI learns, we decided to use a state-of-the-art semantic segmentation model for the evaluation. We acknowledge that there may be many opportunities for improving the model architecture. However, the aim here is to study the performance of AI from the perspective of an undergraduate student in automating the malarial parasite detection task, not the ML algorithm itself. As a result, we focus on evaluating images in the dataset individually and present our quantitative and qualitative assessment of how the model is learning in the following section.

The ANONYMIZED NO-CODE PLATFORM automated the model training pipeline and provided a user interface that allowed easy browsing of the images along with the visualization and comparison of both the predictions and the ground truth (Figure 2). For the semantic segmentation task, the platform used a DeepLabV3+ architecture with an Xception backbone (Chen et al., 2017), with model weights pre-trained on the Pascal-VOC dataset (Everingham et al., 2010). An open-source Python implementation of the model architecture using TensorFlow’s Keras framework was used (https://github.com/bonlime/keras-deeplab-v3-plus). The no-code platform training pipeline had the following presets: an Adam optimizer, 100 epochs, and a focal Tversky loss to correct for class imbalance.
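For readers curious about what happens behind the no-code interface, the following is a minimal sketch of what such a preset pipeline might look like in code, assuming the keras-deeplab-v3-plus implementation cited above. The number of classes, input size, and the focal Tversky hyperparameters are illustrative assumptions; the platform's exact configuration is not exposed.

```python
import tensorflow as tf
from tensorflow.keras import backend as K
from model import Deeplabv3  # model.py from the cited keras-deeplab-v3-plus repository

NUM_CLASSES = 5  # assumed: background + four Plasmodium species

def focal_tversky_loss(alpha=0.7, gamma=0.75, smooth=1e-6):
    """One common formulation of the focal Tversky loss: false negatives are
    weighted more heavily than false positives to counter class imbalance."""
    def loss(y_true, y_pred):
        y_true_f = K.flatten(y_true)
        y_pred_f = K.flatten(y_pred)
        tp = K.sum(y_true_f * y_pred_f)
        fn = K.sum(y_true_f * (1.0 - y_pred_f))
        fp = K.sum((1.0 - y_true_f) * y_pred_f)
        tversky = (tp + smooth) / (tp + alpha * fn + (1.0 - alpha) * fp + smooth)
        return K.pow(1.0 - tversky, gamma)
    return loss

# DeepLabV3+ with an Xception backbone, pre-trained on Pascal-VOC,
# with a new prediction head for the malaria classes.
model = Deeplabv3(weights="pascal_voc", backbone="xception",
                  input_shape=(512, 512, 3), classes=NUM_CLASSES,
                  activation="softmax")

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=focal_tversky_loss(),
              metrics=["accuracy"])

# train_ds and val_ds are assumed tf.data pipelines yielding (image, one-hot mask).
# model.fit(train_ds, validation_data=val_ds, epochs=100)
```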

Evaluation

Below, we first present the quantitative and qualitative performance of the learned model. Our evaluation focuses on a student’s perspective on how the AI learns, based on first-hand experience of using the no-code AI platform. It is followed by a discussion of the impact this has on student understanding, and finally identifies questions that a course on AI using such a tool would need to address.

Quantitative Evaluation: Model Performance

Table 1 shows the metrics reported by the ANONYMIZED NO-CODE PLATFORM on how well the semantic segmentation model learned. These metrics, being specific to semantic segmentation, were not particularly helpful from a medical student's perspective in judging how well the model predicted malarial parasites. However, visually browsing the masks predicted by the model showed that it was indeed learning to mask the malarial parasites.

Scope for improvement

One challenge in comparing our results with previous work was the lack of prior art performing semantic segmentation on the entire image in the MP-IDB dataset: existing solutions either cite the dataset but avoid semantic segmentation, citing the lack of bounding-box annotations (Arshad et al., 2021; Sultani et al., 2021), or prefer semantic segmentation on other datasets (van Driel, 2020; Abraham, 2019). In spite of this, we recognize that the following would be more desirable for a no-code AI system for such tasks in healthcare. As future medical practitioners, undergraduate students are taught the parameters in which the accuracy of laboratory tests in medicine is expressed: the ‘Sensitivity’ (true positive rate) and ‘Specificity’ (true negative rate) of a particular test. Undergraduate students are well-versed in these terms. However, they are unaware of the different metrics used in Artificial Intelligence to denote the accuracy of a model.

Hence, they may find it difficult to assess how well the model is working in its function to detect pathological findings in the image. It is therefore vital to use terms that medical students can comprehend. If the no-code tool could convert metrics like the Dice loss or IoU score into ‘Sensitivity’ and ‘Specificity’, it would be easier for undergraduate students to understand its accuracy and compare different models with each other.
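As a rough illustration of the translation being asked for, the snippet below derives Dice, IoU, sensitivity, and specificity from the same four pixel-level counts (true/false positives and negatives) of a predicted versus ground-truth binary mask. This is a sketch of one possible mapping, not a feature of the platform, and pixel-level sensitivity and specificity are not the same as the per-parasite or per-patient figures a laboratory would report.

```python
import numpy as np

def mask_metrics(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7) -> dict:
    """Pixel-level metrics for binary masks (1 = parasite, 0 = background)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)        # parasite pixels correctly predicted
    fp = np.sum(pred & ~truth)       # background pixels wrongly flagged
    fn = np.sum(~pred & truth)       # parasite pixels missed
    tn = np.sum(~pred & ~truth)      # background pixels correctly ignored
    return {
        "dice":        2 * tp / (2 * tp + fp + fn + eps),
        "iou":         tp / (tp + fp + fn + eps),
        "sensitivity": tp / (tp + fn + eps),   # true positive rate
        "specificity": tn / (tn + fp + eps),   # true negative rate
    }
```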

Qualitative Evaluation

Additionally, we made the following findings when evaluating around 50 images (typical examples are presented below). Overall, we can see that the model did not miss any of the P. falciparum parasites but could not accurately detect the other species such as P. malariae; moreover, it incorrectly predicted them as P. falciparum. In some places, it even detected ‘false positives’ of P. falciparum. Figure 3 shows an example.

This could be attributed to:

(a) the small number of annotated images in the dataset, and

(b) most of the annotations being of P. falciparum, leaving too few images with other Plasmodium species for the model to train on (i.e. class imbalance).

The model also incorrectly predicted platelets or other RBCs, not designated as parasitized in the annotations, as parasites of P. falciparum. It could therefore be reasoned that the model makes its predictions on the basis of the shape and color of a structure rather than the identifying features of P. falciparum, such as the ring form or accolé form of the parasite. Any structure even distantly resembling the parasite in shape or hue is predicted as one. Figure 4 shows an example of this. Curiously, there are also images where the model predicted nothing at all for annotations of parasites other than P. falciparum; Figure 5 shows an example. This challenges our working hypothesis that the model predicts purely on the basis of shape and color: there appears to be more to the pattern of predictions than the shape and color of the structures it identifies as parasites. Some explainability in the tool would therefore be appreciated.
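One simple, model-agnostic way to probe what a segmentation model is relying on is an occlusion test: grey out a patch of the image and measure how much the predicted parasite area shrinks. The sketch below is an illustrative probe written against a generic Keras segmentation model; it is not something the no-code platform offers, and the patch size and stride are arbitrary assumptions.

```python
import numpy as np

def occlusion_sensitivity(model, image, target_class, patch=32, stride=32):
    """Crude explainability probe: occlude patches of `image` and record how
    much the predicted pixel area for `target_class` drops. High values mark
    regions the model depends on. Slow (one forward pass per patch)."""
    base = model.predict(image[None])[0].argmax(axis=-1)
    base_area = np.sum(base == target_class)
    h, w = image.shape[:2]
    heat = np.zeros(((h + stride - 1) // stride, (w + stride - 1) // stride))
    for i, y in enumerate(range(0, h, stride)):
        for j, x in enumerate(range(0, w, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = image.mean()
            pred = model.predict(occluded[None])[0].argmax(axis=-1)
            heat[i, j] = base_area - np.sum(pred == target_class)
    return heat
```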

An alarming finding was that the dataset itself is not completely and accurately annotated, and a few questionable annotations were present; thus, non-uniformity exists between the images in the dataset. The incompleteness of the annotations can be seen in several images. Figure 6 shows a (cropped) example where a couple of annotations were missing.

This could have played a role in the comparatively low efficacy of the model in making accurate predictions. Procedures for modifying datasets and bias mitigation strategies may only be beneficial if the dataset in question reflects a well-designed task. When preparing lemonade from lemons, it’s important to make sure the lemons aren’t sour or deformed (Paullada et al., 2021)!

Finally, we find the data varying in hue, lighting, and sharpness (Figure 7). The backgrounds of the images also differ considerably from one another: while some images have a pinkish tinge to the background, others have a straw-colored one. This raises questions about the effect that inconsistency in data preparation (e.g. staining), image capture device (e.g. digital microscope vs. digitized photographs), and other factors (such as lighting and imaging artifacts from borders) has on model performance.

Discussion

While the model performance needs improvement, the use of the no-code AI tool for learning about ML has raised a few pertinent concerns. The most important concern is related to competency: “Can AI be used as an independent diagnostic tool or should it be used to complement the healthcare worker involved, in this case, the pathologist?”.

This is because it is certainly not detecting every pathological finding the image has to offer. However, there are two ways to look at this shortcoming for a better student understanding of how the AI learns:

• Having good data is of paramount importance, and the data should have the right variety and volume: some apprehension about the accuracy of datasets already exists in the minds of pathologists. If incorrectly or incompletely annotated datasets lead to decreased accuracy of the model's predictions, this would further cast doubt on the utility of AI among pathologists, or even doctors in general.

• The AI trains on the images without knowing the entire spectrum of pathological changes beyond the presence of the parasites alone. For example, RBCs infected by P. vivax are slightly larger than uninfected ones. A pathologist, having been trained on these facts, will notice this immediately on comparing the infected RBCs with the uninfected ones. The AI model, however, bases its identification solely on the parasitic infestation of the RBC and only considers the morphology and color of the parasite. This could also hamper the accuracy of the model.

The first issue is quite resolvable: better annotation of the data will proportionally increase the quality and accuracy of the model's predictions. The second issue, however, is complex and casts doubt on the independent functioning of the model. Thus, from a student's perspective, a course on Artificial Intelligence should enable them to identify what a good dataset is and how important it is to use a dataset that is up to the mark.

A point to be noted about the model is the inconsistency in its predictions: a structure predicted as ‘pathological’ in one image is detected as normal elsewhere. The reason for such a pattern of detection is unknown and remains a topic for further research. Another aspect of concern is the way the model predicts and detects an abnormality. After analyzing the images, the authors' impression was that the no-code platform made predictions based on a host of factors; detection was an interplay of the size, the shape, the color of the dye, and the depth of its shade. Besides this, the model also looked at the immediate surroundings of the pathology, in this case the parasite. For example, it detected even a spot of dye inside an RBC as a parasite, while it spared a similar spot located extracellularly. It therefore becomes important to understand how the model detects, i.e. what the model ‘thinks’, and compare it with how a pathologist ‘thinks’. This will help us understand where the two lines of thought diverge and give us insight into how to combine the best of both worlds to increase the accuracy of predictions. Thus, as a student learning about Artificial Intelligence, one needs to know when to trust the model and when not to; the course needs to teach students to assess the findings of the model so that they know when to ‘trust’ them.
An important advantage that the model brings to the table is that it does not miss even the slightest abnormality in the image, provided it has learned it well. It directs the observer's attention towards the abnormality, thus increasing the sensitivity of predictions. However, the specificity may be compromised, and this is a concern. It may need human help to improve the specificity. In that case, the whole system might not be cost-effective, especially for a developing country like India. It would also not serve the very purpose it was developed for: reaching every nook and corner of the country, especially where skilled pathologists are unavailable for detection and diagnosis. The point this brings forward is whether the model should be reserved only for the diagnosis of rare diseases that may otherwise go unnoticed. These are practical questions that need to be addressed in due course.

Another point that needs to be highlighted is that one generally requires a very large dataset to train a model. This means one needs skilled manpower to annotate every image completely and correctly.

The model also trains only on images that are provided to it, so a source for capturing those images, such as a digital microscope, needs to be present. All of this has a bearing on the cost-effectiveness of the model. If we intend to use it in peripheral or rural areas, we need to ‘rethink’ the entire process of training and using the model so that it becomes a cost-effective option for primary healthcare centers in rural areas. Thus, while designing a course for undergraduate students, it is imperative to also give them a perspective on the financial aspects of using the model and not just limit the boundaries of the course to academic knowledge about Artificial Intelligence. This will ensure that, in the future, when they research the use of Artificial Intelligence in healthcare, they will do so keeping the financial aspect in mind.

In this process, we would ensure that Artificial Intelligence is used where it is both practically required and feasible.

Additional Questions Raised

The analysis and the findings have led to more questions in the minds of the authors, which remain topics for further research:

• How do we trust the model when we do not know at what point it fails to predict correctly and when it actually gets its prediction right?

• How does the AI model work when the human clearly has information external to the image?

• If humans themselves are unsure about the annotations and are unable to completely annotate the images, how do we expect an AI model to do it?

• Is this algorithm of just looking at similar images and then predicting actually useful, or should an alternate and different method be devised altogether?

• Should AI be used only to supplement the diagnosis made by a pathologist? If yes, is this a cost-effective method, especially in developing countries?

• If it does not serve the purpose of being cost-effective, should it only be reserved for rare conditions and diseases?

• Will the functioning of the model improve if the data collection strategies and dataset quality improve?

Conclusion

Although there is some buzz about the use of AI in healthcare, it is still in its nascent stages. Moreover, there exists a certain level of skepticism about its use in the field of healthcare among medical professionals.
However, with improved methods and models, AI can be integrated into routine medical practice and will have great utility in it. Thus, the possibility of AI in healthcare becoming routine practice cannot be refuted, and future doctors need to be prepared with a set of skills to seamlessly integrate AI into their medical practice. It is important to acquire these skills early on to smoothen the process of routinely using AI in medicine. I believe that introducing AI and machine learning to undergraduate students, in the form of essential knowledge about the subject with the help of no-code tools, is promising in its endeavor to impart the skills that doctors would need to use AI in medicine routinely.

However, knowing what undergraduate medical students think about AI and the questions they have in mind when they are first exposed to AI should
be prioritized. This could help us develop a student-friendly AI course that is just enough to prepare them for a future in which artificial intelligence will play a prominent role in healthcare. The design of such a course should be dynamic and student-centric: constant feedback from students should be taken, and timely, relevant updates to the course should be made. This article highlighted the questions that arose in the mind of an undergraduate medical student about the use and working of AI in healthcare.

Future Work

We intend to gather the opinions and perspectives of many more undergraduate medical students on the use of AI in healthcare. We would like to do so by developing a short course for undergraduate students that provides them with essential knowledge about AI and how it works. We would then get feedback from them about what they feel needs to be improved or added to the course, as well as the questions about AI that they have in mind. We would try to find answers to those questions and include them in the course, so as to create a student-friendly course that provides important yet holistic knowledge to students about this subject.

References

1. Julisa Bana Abraham. Malaria parasite segmentation using u-net: Comparative study of loss functions. Communications in Science and Technology, 4(2):57–62, 2019.

2. Qazi Ammar Arshad, Mohsen Ali, Saeed-ul Hassan, Chen Chen, Ayisha Imran, Ghulam Rasul, and Waqas Sultani. A dataset and benchmark for malaria life-cycle classification in thin blood smear images. arXiv preprint arXiv:2102.08708, 2021.

3. Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587, 2017.

4. Arwinder Dhillon and Ashima Singh. Machine learning in healthcare data analysis: a survey. Journal of Biology and Today’s World, 8(6):1–10, 2019.

5. Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. International journal of computer vision, 88(2):303–338, 2010.

6. US FDA. FDA authorizes software that can help identify prostate cancer. The U.S. Food & Drug Administration Press Announcements, 2021. URL: https://www.fda.gov/news-events/press-announcements/fda-authorizes-software-can-help-identify-prostate-cancer.

7. US Food and Drug Administration, et al. Artificial intelligence and machine learning in software as a medical device, 2020. Content current as of January 28, 2020.

8. Ashwini Kalshetty and Sutapa Rakshit. Use case of no code machine learning tools for medical image classification. 2021.

9. Vebjorn Ljosa, Katherine L Sokolnicki, and Anne E Carpenter. Annotated high-throughput microscopy image sets for validation. Nature methods, 9(7):637–637, 2012.

10. Andrea Loddo, Cecilia Di Ruberto, Michel Kocher, and Guy Prod’Hom. MP-IDB: The malaria parasite image database for image processing and analysis. In Sipaim–Miccai Biomedical Workshop, pages 57–65. Springer, 2018.

11. Liam G McCoy, Sujay Nagaraj, Felipe Morgado, Vinyas Harish, Sunit Das, and Leo Anthony Celi. What do medical students actually need to know about artificial intelligence? NPJ Digital Medicine, 3(1):1–3, 2020.

12. Nishila Mehta, Vinyas Harish, Krish Bilimoria, Felipe Morgado, Shiphra Ginsburg, Marcus Law, and Sunit Das. Knowledge of and attitudes on artificial intelligence in healthcare: A provincial survey study of medical students. medRxiv, 2021.

13. Amandalynne Paullada, Inioluwa Deborah Raji, Emily M Bender, Emily Denton, and Alex Hanna. Data and its (dis)contents: A survey of dataset development and use in machine learning research. Patterns, 2(11):100336, 2021.

14. Sivaramakrishnan Rajaraman, Sameer K Antani, Mahdieh Poostchi, Kamolrat Silamut, Md A Hossain, Richard J Maude, Stefan Jaeger, and George R Thoma. Pre-trained convolutional neural networks as feature extractors toward improved malaria parasite detection in thin blood smear images. PeerJ, 6:e4568, 2018.

15. Waqas Sultani, Wajahat Nawaz, Syed Javed, Muhammad Sohail Danish, Asma Saadia, and Mohsen Ali. Towards low-cost and efficient malaria detection. arXiv preprint arXiv:2111.13656, 2021.

16. Noppadon Tangpukdee, Chatnapa Duangdee, Polrat Wilairatana, and Srivicha Krudsood. Malaria diagnosis: a brief review. The Korean Journal of Parasitology, 47(2):93, 2009.

17. Anh Quynh Tran, Long Hoang Nguyen, Hao Si Anh Nguyen, Cuong Tat Nguyen, Linh Gia Vu, Melvyn Zhang, Thuc Minh Thi Vu, Son Hoang Nguyen, Bach Xuan Tran, Carl A Latkin, et al. Determinants of intention to use artificial intelligence-based diagnosis support system among prospective physicians. Frontiers in Public Health, 9, 2021.

18. N van Driel. Automating malaria diagnosis: a machine learning approach: Erythrocyte segmentation and parasite identification in thin blood smear microscopy images using convolutional neural networks. 2020.

19. Elena A Wood, Brittany L Ange, and D Douglas Miller. Are we ready to integrate artificial intelligence literacy into medical school curriculum: Students and faculty survey. Journal of Medical Education and Curricular Development, 8:23821205211024078, 2021.

This article is written by Isha Tambolkar and Rahul Parundekar for AI Hero.
