Segment Anything Model (SAM) in Need of a Medical Makeover!
Can SAM truly be considered a proficient segmentation tool in the domain of medical imaging, or is it a mere gimmick attempting to emulate the success of ChatGPT?
SAM is primarily designed for segmenting objects in natural images rather than medical images. While SAM performs well on natural images, it may not be directly applicable to medical image segmentation due to several reasons:
- Domain differences: medical images differ significantly from natural images in resolution, sharpness, noise levels, color diversity, and deformations.
- Limited annotated data: acquiring annotated medical data is time-consuming, leading to limited availability of training samples.
- Imbalanced class distribution: medical data often exhibit imbalanced classes, because pathological regions such as tumors are naturally rare and typically occupy only a small fraction of each image.
- Variability in anatomical structures: the diversity observed in medical images is a reflection of the wide range of variations present in different patients. Each patient’s unique characteristics and conditions contribute to the large distribution of medical cases.
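The class-imbalance challenge above is usually tackled with a weighted loss. A minimal sketch (toy data, not from any of the cited studies) of inverse-frequency class weighting:

```python
import numpy as np

def inverse_frequency_weights(mask, num_classes):
    """Per-class weights inversely proportional to pixel frequency.

    Rare classes (e.g. a small tumor) get larger weights so a weighted
    loss does not simply ignore them in favor of the background.
    """
    counts = np.bincount(mask.ravel(), minlength=num_classes).astype(float)
    freq = counts / counts.sum()
    weights = 1.0 / np.maximum(freq, 1e-8)  # guard against empty classes
    return weights / weights.sum()          # normalize to sum to 1

# Toy 2-class mask: 1% "tumor" pixels, 99% background.
mask = np.zeros((100, 100), dtype=np.int64)
mask[:10, :10] = 1                          # 100 tumor pixels out of 10,000
w = inverse_frequency_weights(mask, num_classes=2)
```

With this toy mask, nearly all of the weight lands on the rare tumor class, which is exactly the corrective behavior a weighted cross-entropy or Dice loss relies on.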
Cases requiring accurate segmentation include:
- Radiation therapy: during treatment planning, medical imaging techniques such as CT (Computed Tomography), MRI (Magnetic Resonance Imaging), and PET (Positron Emission Tomography) scans are used to visualize and precisely delineate tumors and surrounding healthy tissues. This allows radiation oncologists to create a treatment plan that delivers the necessary radiation dose to the target while sparing nearby critical structures and organs.
- Surgical planning: supporting surgeons by providing precise preoperative segmentation of structures and tumors, enabling better visualization and planning for complex surgeries.
- Disease monitoring: aiding in monitoring the progression of diseases, such as tumor growth or changes in organ structures, by segmenting and comparing medical images over time.
- Image-guided interventions: assist in image-guided procedures, such as needle biopsies or catheter placements.
The advantage of SAM lies in its zero-shot learning capability. However, there are several competing architectures that are not zero-shot but are widely used for segmentation and classification of medical images.
The non-zero-shot medical segmentation heroes:
- U-Net: A popular architecture for medical image segmentation, U-Net consists of an encoder path and a decoder path.
- 3D U-Net: Extending U-Net to process volumetric medical images.
- Attention U-Net: Attention mechanisms are integrated into the U-Net framework to emphasize relevant regions and suppress irrelevant information.
- V-Net: A fully convolutional network for volumetric segmentation, trained end-to-end with a Dice-based loss.
It’s important to note that the choice of architecture depends on the specific requirements of the medical image segmentation task, the dataset, and the characteristics of the images being segmented. Researchers often customize and adapt these architectures or develop novel architectures that address the unique challenges and characteristics of medical imaging data.
The following figure illustrates a comparison of the mentioned architectures, where the Dice accuracy serves as a metric to evaluate the segmentation performance across various medical applications, with a higher Dice coefficient indicating better segmentation accuracy.
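For reference, the Dice coefficient used in that comparison is straightforward to compute for binary masks. A minimal numpy sketch:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-8):
    """Dice = 2|A∩B| / (|A| + |B|) for binary masks; 1.0 is perfect overlap."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy example: prediction covers 4 pixels, ground truth covers 8, 4 shared.
a = np.zeros((4, 4), dtype=int); a[:2, :2] = 1
b = np.zeros((4, 4), dtype=int); b[:2, :4] = 1
# dice = 2*4 / (4 + 8) = 0.666...
```

Unlike plain pixel accuracy, Dice is insensitive to the large background class, which is why it is the standard metric for medical segmentation.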
The performance of SAM was assessed using the comprehensive COSMOS 553K dataset. This dataset consists of 16 modalities, 68 objects, and a staggering 553K slices, providing a rich and diverse set of images for analysis and assessment of SAM’s segmentation capabilities. Figure 1 illustrates the various modalities included in the COSMOS 553K dataset: CT, T1-weighted MRI, diffusion-weighted MRI, CMR (Cardiac Magnetic Resonance), cine-MRI, MRI, T2-weighted MRI, histopathology, electron microscopy, ultrasound (US), X-ray, fundus, colonoscopy, dermoscopy, and microscopy. However, this blog will discuss several case studies that SAM aims to tackle, including:
- Abdominal organs (mentioned in the context of SAM’s performance in abdominal CT organ segmentation)
- Liver (mentioned specifically in the context of multi-phase liver tumor segmentation)
- Brain (mentioned in the context of SAM’s effectiveness in brain tumor segmentation)
- Colon (mentioned in the context of SAM’s performance in polyp segmentation in colonoscopy images)
- Kidneys (mentioned in the context of the impact of interactive platforms on SAM’s adaptive segmentation in abdominal CT images)
- Retinal (mentioned in the context of retinal blood vessel segmentation in fundus images)
How does SAM work? Honestly, very simple! Here are its key abilities:
- Zero-shot segmentation: users can segment objects without any task-specific training or fine-tuning.
- Bounding box selection: users can define object regions using bounding boxes by simply drawing a rectangle around the desired object.
- Point selection: users can specify key points within an object of interest and SAM leverages this information to segment the entire object.
- Interactive experience: users can refine and improve the initial segmentation by providing additional guidance or corrections.
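The point and box prompts above map directly onto simple coordinate arrays. A hedged sketch of how such prompts are assembled (the helper `make_prompts` is mine, not part of SAM; the commented calls follow the official `segment_anything` package, which needs a downloaded checkpoint and is therefore not run here):

```python
import numpy as np

def make_prompts(points_fg, points_bg, box=None):
    """Build SAM-style prompt arrays: coords (N, 2) in (x, y) pixel order,
    labels (N,) with 1 = foreground click, 0 = background click."""
    coords = np.array(list(points_fg) + list(points_bg), dtype=np.float32)
    labels = np.array([1] * len(points_fg) + [0] * len(points_bg),
                      dtype=np.int32)
    box_arr = np.array(box, dtype=np.float32) if box is not None else None
    return coords, labels, box_arr

coords, labels, box = make_prompts(
    points_fg=[(120, 80), (140, 95)],   # clicks inside the object
    points_bg=[(10, 10)],               # click on background
    box=(100, 60, 180, 130))            # (x0, y0, x1, y1) around the object

# With the official package and a checkpoint, these arrays plug in as:
#   from segment_anything import sam_model_registry, SamPredictor
#   sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
#   predictor = SamPredictor(sam)
#   predictor.set_image(image)          # RGB uint8, HxWx3
#   masks, scores, _ = predictor.predict(
#       point_coords=coords, point_labels=labels, box=box)
```

Refinement in the interactive setting is then just another call to `predict` with extra clicks appended to the arrays.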
Interactive platform: it’s the matter that makes a difference!
The interactive platform plays a crucial role in acquiring accurate segmentation results. When examining kidney segmentation, the significance of different visual prompting types is highlighted in Figure 1. The study compares the effectiveness of point prompts and bounding-box prompts, with particular emphasis on the competitive performance achieved by box prompting, even in the presence of moderate jitter. These findings underscore the importance of interface choices and the role of interactive platforms in achieving accurate and robust kidney segmentation.
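To get an intuition for what "moderate jitter" does to a box prompt, here is a numpy-only toy (my own simulation with a hypothetical kidney box, not the study's protocol): perturb each box corner with Gaussian noise and measure how much the jittered box still overlaps the original.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given as (x0, y0, x1, y1)."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def jitter_box(box, sigma, rng):
    """Perturb each corner coordinate with Gaussian noise of std `sigma` px."""
    return tuple(c + rng.normal(0.0, sigma) for c in box)

rng = np.random.default_rng(0)
gt = (20.0, 30.0, 120.0, 130.0)  # hypothetical 100x100 px kidney box

mean_iou = {}
for sigma in (0, 5, 15):
    ious = [box_iou(gt, jitter_box(gt, sigma, rng)) for _ in range(200)]
    mean_iou[sigma] = float(np.mean(ious))
```

The mean IoU degrades gracefully as sigma grows, which is consistent with the study's observation that box prompting stays competitive under moderate jitter.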
Swiss-SAM slicing: a heavenly match in medical imaging!
3D Slicer, the Swiss army knife of medical imaging, offers advanced visualization and processing capabilities, allowing users to explore and analyze image data in a 3D environment. It provides tools for interactive segmentation, annotation, registration, and fusion of multiple image modalities. SAMM (Segment Any Medical Model) leverages these capabilities to integrate SAM seamlessly into this workflow: users can visualize the segmentation masks SAM generates directly inside 3D Slicer and refine them as needed. The integration thus combines SAM's segmentation capabilities with 3D Slicer's rich image processing, visualization, and analysis tools, giving researchers and medical professionals a powerful and familiar platform for efficient, accurate segmentation of medical images.
Under unprompting settings, how does SAM perform for polyp segmentation during colonoscopy?
In this scenario, SAM did not demonstrate superior performance compared to the existing state-of-the-art models. Therefore, there is a need for further research and development to enhance SAM’s performance specifically in polyp segmentation, which is crucial for improving the diagnosis and treatment of colorectal cancer. Figure 2 presents examples of SAM’s performance in polyp segmentation, showcasing both improved segmentation results (on the left) and less accurate segmentations (on the right).
To support radiotherapy treatment planning, SAM achieves favorable brain tumor segmentation accuracy on MRI. Here is why.
Radiotherapy treatment planning is the process of determining the optimal trajectory for delivering radiation therapy to a patient with a tumor. It involves careful consideration of the tumor’s location, size, shape, and proximity to critical functional areas (speech, visual, auditory, sensory etc.), as well as the desired radiation dose and treatment technique. The goal is to maximize the effectiveness of the treatment while minimizing the impact on healthy tissues.
Figure 3 depicts the comparison between ground truth tumor segmentation (red) and SAM segmentation (green) overlaid on an MRI image of the brain. In my opinion, the study achieved high accuracy due to the selection of “easy” images with distinct contrast between the tumor and surrounding pixels, as well as well-defined tumor boundaries, unlike ambiguous or indistinct cases. This enabled more precise segmentation results, contributing to the observed high accuracy in tumor segmentation.
How do different resolutions and imaging phases affect the performance of SAM in liver tumor segmentation?
Does SAM see things clearly with high-resolution images and point out the key details, or does it struggle to connect the dots when faced with low resolution and a sparse selection?
Evaluating SAM at different resolutions [224, 512, 1024] and across the phases [non-contrast, arterial, portal venous, delayed] on a dataset of 1,552 multi-phase contrast-enhanced CT volumes indicates that SAM's performance falls short of the expected level. It does, however, demonstrate potential as an efficient annotation tool when provided with sufficient human guidance, particularly with a higher number of prompt points (P = 20).
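Emulating that guidance in an experiment amounts to sampling P clicks from the ground-truth mask, as a human annotator clicking inside a tumor would. A minimal numpy sketch (toy mask, helper name my own):

```python
import numpy as np

def sample_prompt_points(mask, num_points, rng):
    """Sample foreground pixels of a ground-truth mask as (x, y) point
    prompts, simulating an annotator clicking inside the lesion."""
    ys, xs = np.nonzero(mask)
    idx = rng.choice(len(xs), size=min(num_points, len(xs)), replace=False)
    return np.stack([xs[idx], ys[idx]], axis=1)  # (P, 2), (x, y) order

rng = np.random.default_rng(0)
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 25:45] = True                        # toy "tumor" region
points = sample_prompt_points(mask, num_points=20, rng=rng)  # P = 20
```

Re-running a SAM evaluation with P = 1, 5, 20 points generated this way is how the effect of prompt quantity on segmentation quality can be measured.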
In the context of the multi-phase liver tumor segmentation task, the phases refer to different stages or time points during a contrast-enhanced CT scan. Typically, such a scan involves multiple phases to capture different aspects of liver tissue enhancement. Commonly, the phases include:
- Non-contrast (NC) phase: This is the initial scan without the administration of a contrast agent.
- Arterial (ART) phase: This phase captures the arterial blood supply to the liver and is typically acquired shortly after the contrast agent injection.
- Portal venous (PV) phase: This phase captures the venous blood supply to the liver and is acquired when the contrast agent has circulated through the portal veins.
- Delayed (DE) phase: This phase is acquired at a later time point to capture delayed enhancement characteristics of the liver tissue.
What impact does the resolution of images and the quantity of selected points have on SAM’s performance?
- Providing more prompt points as guidance improves the segmentation results.
- A larger data resolution does not always lead to better segmentation results, contrary to expectations: increasing the resolution of the CECT images does not necessarily improve SAM's overall performance.
Can we customize SAM and improve its performance by fine-tuning the model with few-shot learning?
Segmenting retinal blood vessels in fundus images proved to be a challenge for SAM, as initial attempts at zero-shot segmentation fell short in accurately identifying and segmenting the vessels. Even with manual prompts in areas with prominent vessel visibility, SAM struggled to achieve precise segmentation, likely due to the complex nature of continuously branching structures like blood vessels in medical images and tree branches in nature images.
To overcome this limitation, SAM was fine-tuned using a SAM adapter, a task-specific fine-tuning method. In this experiment, 20 image-mask pairs from the Digital Retinal Images for Vessel Extraction (DRIVE) dataset were selected for the fine-tuning process. The training was supervised, with ground-truth masks used to guide the process.
After fine-tuning, the segmentation results depicted in Figure 5 show nearly perfect alignment between the predictions and the ground truth in most cases, indicating SAM's potential for precise medical image segmentation after domain-specific fine-tuning. However, small missing parts were observed at the terminal ends of the vessels, suggesting areas for further improvement.
Closure: slice and dice: SAM’s quest for perfect segmentation in the medical maze (mess)
In conclusion, accurate segmentation is crucial in cases such as radiation therapy, surgical planning, disease monitoring, and image-guided interventions. While SAM demonstrates proficiency in segmenting natural images, it faces challenges in the domain of medical image segmentation. However, there are alternative architectures available, such as U-Net, 3D U-Net, Attention U-Net, and V-Net, that are specifically designed for medical image segmentation. These architectures offer tailored solutions to address the complexities and challenges of medical imaging data.
Fine-tuning SAM with few-shot learning shows promise in improving its performance in specific tasks, as demonstrated in the segmentation of retinal blood vessels. However, it is important to note that SAM is not yet ready for clinical use, and further research and development are needed to enhance its capabilities.
If you enjoy my work, feel free to drop me an email at miritrope@gmail.com or connect with me on LinkedIn. Additionally, I’ve just opened my website, and you are welcome to visit. I’d love to hear your thoughts and chat about our shared interests.
Reference:
- Huang, Yuhao, et al. “Segment anything model for medical images?” arXiv preprint arXiv:2304.14660 (2023).
- Roy, Saikat, et al. “SAM.MD: Zero-shot medical image segmentation capabilities of the Segment Anything Model.” arXiv preprint arXiv:2304.05396 (2023).
- Liu, Yihao, et al. “SAMM (Segment Any Medical Model): A 3D Slicer integration to SAM.” arXiv preprint arXiv:2304.05622 (2023).
- Zhou, Tao, et al. “Can SAM segment polyps?” arXiv preprint arXiv:2304.07583 (2023).
- Putz, Florian, et al. “The Segment Anything foundation model achieves favorable brain tumor autosegmentation accuracy on MRI to support radiotherapy treatment planning.” arXiv preprint arXiv:2304.07875 (2023).
- Hu, Chuanfei, and Xinde Li. “When SAM meets medical images: An investigation of Segment Anything Model (SAM) on multi-phase liver tumor segmentation.” arXiv preprint arXiv:2304.08506 (2023).
- Shi, Peilun, et al. “Generalist vision foundation models for medical imaging: A case study of Segment Anything Model on zero-shot medical segmentation.” Diagnostics 13.11 (2023): 1947.