New Technologies to Improve Quality in Pathology

April Khademi, PhD, PEng.
13 min read · Jan 2, 2024

--

QUALITY IN PATHOLOGY

Pathology provides the definitive diagnosis and is critical to patient care and management. The accuracy, quality and reliability of pathological analyses are therefore critically important. Several years ago, there was scrutiny around the quality of care and treatment of patients in some Pathology centres [1]. This led to the development of quality assurance protocols and governing bodies to ensure that high-quality care is delivered through laboratory services. Such programs focus on improving patient safety by assessing the quality of lab test results and ensuring standards of excellence [2]. Quality management systems are being pursued in different regions around the world, including Canada [2], Europe [3] and the US [4].

Quality assurance programs seek to improve pre-analytic, analytic and post-analytic phases. © Khademi 2023

Quality assurance programs and improvement plans in surgical pathology seek to ‘‘assure’’ and ‘‘improve’’ surgical pathology ‘‘products’’ [5]. As a result, quality assurance and improvement monitors can cover a variety of factors along the pathology workflow, which are summarized into three categories [3], [5]:

Pre-Analytic Factors: All factors that play a role before the slide reaches the Pathologist, including whether the right test is ordered for the right patient, whether specimen fixation is adequate, and whether samples are prepared in a timely and cost-efficient manner. These can broadly be grouped into test ordering, delivery and accessioning.

Analytic Factors: The way the Pathologist visually interprets the slide and renders a diagnosis and/or treatment recommendation. The main factors affecting analytic quality are the accuracy and precision of the diagnosis, scoring/grading reproducibility, and bias.

Post-Analytic Factors: All factors that come after the Pathologist renders a diagnosis and recommendation. This includes transcription, report delivery, follow-up, monitoring and maintenance.

Quality assurance requires an end-to-end solution, as these factors touch a variety of workflow processes [3]. New digital pathology systems can be employed to eliminate paperwork, streamline case review and management, and share images remotely. Although digital imaging offers increased access and efficiency, additional challenges surrounding slide interpretation (Analytic Quality) remain.

DEEP DIVE ON ANALYTIC QUALITY ISSUES — DIAGNOSTIC REPRODUCIBILITY, BIAS AND PRECISION

Once a slide is labeled with a diagnosis, the Pathologist may grade or score the lesion [6]. Grading for breast carcinomas is an important surrogate marker of metastatic potential used by oncologists to make treatment decisions. In non-neoplastic conditions, grading can reflect the activity of the disease related to inflammatory and/or fibrotic processes [6]. Additionally, grading and scoring systems can be used in research to investigate treatment response. These semi-quantitative scales play an important role in patient management, treatment and research.

While the clinical utility of scoring and grading is well understood, there are psychological issues associated with the way these systems are developed [6], and visual examination of tissue slides is highly subjective. Discordance rates for slide interpretation between Pathologists can be moderate to high, even when they use the same semi-quantitative systems [7] [8] [9] [10] [11] [12] [13]. This creates challenges in patient management, since patients can receive different treatment regimens depending on who reviews the slide.

A commonly discussed discordance rate is based on Her2/neu overexpression scores [16]. Pathologists may evaluate the percentage of cells stained at each intensity score [14], which has poor concordance, especially for 1+ and 2+ scores [16], as well as between local and reference laboratories [7]. Labs may lack specialists and experience in Her2/neu scoring [7]. While FISH testing can be used to examine equivocal (weakly positive) cases more definitively [16], additional testing is expensive and not always available, especially in under-served regions. Similarly, many studies have examined the agreement between Pathologists for ER/PR intensity and proportion scores, and although the rates are higher than those of Her2/neu analysis, discordance still exists [8] [9] [16].

Her2 assays for breast carcinomas across 0, 1+, 2+ and 3+ scores.

Other IHC samples such as Ki-67 and p53 have associated scoring methods that depend on the number of positively stained nuclei, not on the intensity scores as in Her2/ER/PR assays. For example, the Ki-67 proliferation index (PI), which is a prognostic marker for tumour proliferation, is measured by counting the number of positively stained cells among the total number of malignant cells and can have low-to-moderate concordance rates [17]. One study showed that among 5 pathologists, moderate agreement was found for Grade 1/Grade 3 tumours (κ = 0.56–0.72) and poor-to-moderate concordance for Grade 2 tumours (κ = 0.17–0.49) [10]. Comparably, the concordance between specialized and non-specialized pathologists for p53 scoring was found to be poor (κ = 0.13–0.25) [11]. Despite not relying on intensity scoring, disagreement in Ki-67 and p53 scoring still exists.
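The arithmetic behind the PI itself is simple; a minimal sketch (the function name and cell counts here are illustrative, not taken from any clinical system):

```python
def ki67_proliferation_index(positive_cells: int, total_malignant_cells: int) -> float:
    """Return the Ki-67 PI: positively stained malignant cells as a
    percentage of all malignant cells counted."""
    if total_malignant_cells <= 0:
        raise ValueError("total_malignant_cells must be positive")
    if not 0 <= positive_cells <= total_malignant_cells:
        raise ValueError("positive_cells must lie between 0 and the total")
    return 100.0 * positive_cells / total_malignant_cells

# e.g. 140 positive nuclei among 700 malignant cells counted -> PI of 20%
pi = ki67_proliferation_index(140, 700)
```

The formula is trivial; the discordance arises from the two inputs, since which cells count as "positive" and which fields get counted are both observer-dependent.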

Studies showing discordance among pathologists for grading of breast carcinomas.

Grading systems have similar challenges, such as the Nottingham Grading System (NGS) for invasive ductal carcinoma in Hematoxylin and Eosin (H&E) stained breast samples. The NGS relies on three tumour features: 1) Mitotic Count, 2) Nuclear Grade and 3) Tubular Formation. The agreement between Pathologists using the NGS has been described in many research articles. In [12], the agreement between 6 Pathologists was found to be poor-to-moderate across the three features (κ = 0.64, 0.52 and 0.40 for tubule formation, mitotic count and nuclear grade, respectively). Another paper summarizes the findings of several research studies on NGS concordance rates [13]. These findings further demonstrate the variability in pathological slide review, as well as the motivation for creating new technologies to address these issues.
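The κ values quoted throughout these studies are chance-corrected agreement statistics. A minimal sketch of Cohen's kappa for two raters (the function and the toy grade lists are my own illustration, not data from the cited papers):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for the agreement
    expected by chance, given each rater's marginal score frequencies."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # chance agreement: product of marginal proportions, summed over categories
    expected = sum(freq_a[c] * freq_b[c] for c in set(rater_a) | set(rater_b)) / n**2
    return (observed - expected) / (1 - expected)

# Two hypothetical pathologists grading ten slides on a 1-3 scale:
a = [1, 2, 2, 3, 1, 2, 3, 3, 2, 1]
b = [1, 2, 3, 3, 1, 2, 2, 3, 2, 1]
kappa = cohens_kappa(a, b)
```

Note that 8/10 raw agreement shrinks to a κ near 0.7 once chance agreement is subtracted, which is why κ, not percent agreement, is the standard concordance measure in these studies.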

WHAT CAUSES VARIABILITY IN DIAGNOSTIC PRECISION?

There are many reasons why discordance in pathological tissue interpretation exists. Some of it may be due to a lack of sub-specialty expertise in a particular lab, or a lack of experience. But the majority of it is due to the subjective nature of the task (as with any human-based analysis).

Consider the Ki-67 PI: variability can arise from many subjective factors, including detecting Ki-67 positivity (deciding whether a cell is positive based on "brown staining"), the user-specific selection, number and location of the high power fields (HPFs) used for scoring, and rough estimation or "eye-balling" of cell counts rather than physically counting them. Although many recommendations have been made to standardize Ki-67 scoring, such as the Global Method proposed by the International Ki67 Working Group [15], which specifies the number and location of HPFs to use, these standards are continuously evolving and do not address all of the concerns, especially operator-dependent subjectivity.

Regions from Ki67 slides. Left shows darkly (green) and lightly (red) stained Ki67 cells. Right shows different regions from different HPF.

Scores for other IHC stained samples, such as Her2/ER/PR, depend on both the number (proportion) of cells positively stained and the darkness (intensity) of the stain. "Darkness-of-stain" is associated with higher antigen concentration, and the intensity of the stain (DAB chromogen) is visually estimated. Therefore, unlike determining Ki-67 positivity, which is a binary decision, determining IHC intensity is more challenging.

Visual examination of staining intensity is used to estimate antigen concentration in IHC slides.

Moreover, antigen concentration is a continuous variable, but intensity scores are usually quantized into bins (0, 1+, 2+, 3+). As a result, visual analysis may not yield a reliable and repeatable threshold to separate adjacent scores. This becomes especially challenging in centres with limited access to subspecialty expertise [7].
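The quantization problem can be made concrete with a small sketch that bins a continuous, normalized intensity into the 0/1+/2+/3+ scale. The cut-points below are purely hypothetical, chosen only to show how two nearly identical intensities can straddle a bin boundary:

```python
# ILLUSTRATIVE cut-points on a normalized [0, 1] intensity axis; real
# thresholds would need calibration against reference-laboratory scores.
CUTPOINTS = (0.15, 0.40, 0.70)

def intensity_score(mean_intensity: float) -> str:
    """Quantize a continuous, normalized stain intensity into the
    discrete 0 / 1+ / 2+ / 3+ IHC bins."""
    if not 0.0 <= mean_intensity <= 1.0:
        raise ValueError("intensity must be normalized to [0, 1]")
    for score, cut in zip(("0", "1+", "2+"), CUTPOINTS):
        if mean_intensity < cut:
            return score
    return "3+"

# Two nearly identical intensities straddling a cut-point land in
# different bins: the exact threshold decision that is unreliable by eye.
low, high = intensity_score(0.39), intensity_score(0.41)
```

An algorithm applies the same cut-points every time; a human estimating "darkness" has no fixed cut-point at all, which is where the irreproducibility enters.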

What intensity value distinguishes 1+ and 2+, or 2+ and 3+ scores?

Grading systems that analyze the morphology and structure of nuclei and tissue in H&E slides suffer from similar challenges. Consider the NGS, which uses three morphological features: Tubular Formation — the percentage of tubules or ducts that occupy the tumour area; Nuclear Grade — shape, size, presence/lack of nucleoli, cellularity, other nuclei features; and Mitotic Counts — the number of cells undergoing cell division per HPF.

Estimation of tubular formation percentage requires accurate calculation of the area of the tubules or ducts formed within the tumour boundaries, which is challenging without software assistance. Nuclear grading is based on qualitative descriptions, such as "pleomorphic nuclei," used to differentiate high-grade from low-grade cancers, and these descriptions are difficult to reproduce. Mitosis counting suffers from similar challenges: mitotic figures can be difficult to recognize, and counting them is time-consuming.

Visual factors contributing to slide interpretation variability:

  • Selecting HPF for scoring/grading
  • Qualitative descriptors to explain visual cues
  • Manual counting, “eye-balling”, visual estimation of objects

Any human-based visual analysis creates variability in the estimation of scales and grades. This effect is more pronounced in laboratories with limited access to pathology subspecialists, since lack of experience creates greater discordance. Additionally, obtaining some scores is more laborious and time-consuming than others; these tasks are not only pain points for Pathologists but also contribute to subjectivity (i.e. estimating the number of cells versus actually counting them).

HOW CAN WE IMPROVE ANALYTICAL QUALITY?

As grading scales and scores play a critical role in patient management and treatment, new technologies are being sought to increase the accuracy, efficiency, reliability and repeatability of Pathological scoring and grading systems. Fortunately, now that pathology samples are being digitized and stored in large databases, image analysis solutions, or algorithms, can be utilized to combat the challenges in pathological slide interpretation.

Image analysis systems are a series of software modules that automatically perform operations on a digital image. The output is a processed image, which can display algorithm results visually (as an overlay), or as a series of metrics that describe what was detected by the algorithm. Consider Figure 1, a workflow for an algorithm to detect nuclei in breast biopsies. The output is displayed as an overlay (red outlines detected nuclei) and the table below shows numerical values describing the number of nuclei detected, and the average area of the nuclei in this HPF (mm²).
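A toy version of such a pipeline can be sketched in a few lines: threshold the dark (hematoxylin-like) pixels, label connected components, and report the count and mean area. This is a deliberately simplified illustration on a synthetic image with a hypothetical threshold, not a clinical nuclei detector:

```python
import numpy as np
from scipy import ndimage

def detect_nuclei(gray: np.ndarray, threshold: float):
    """Threshold dark pixels, label connected components, and return
    (label_image, nucleus_count, mean_area_in_pixels). A real system
    would convert pixel areas to mm^2 using the scanner resolution."""
    mask = gray < threshold                 # nuclei stain darker than background
    labelled, n = ndimage.label(mask)       # connected-component analysis
    if n == 0:
        return labelled, 0, 0.0
    areas = ndimage.sum_labels(mask, labelled, index=range(1, n + 1))
    return labelled, n, float(np.mean(areas))

# Synthetic HPF: bright background with two dark square "nuclei".
hpf = np.full((64, 64), 0.9)
hpf[10:14, 10:14] = 0.2     # 4x4 = 16-pixel nucleus
hpf[40:46, 40:46] = 0.3     # 6x6 = 36-pixel nucleus
_, count, mean_area = detect_nuclei(hpf, threshold=0.5)
```

The label image plays the role of the overlay, while the count and mean area are the per-HPF metrics shown in the table.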

Types of information we can readily provide to the pathologist using computer-aided diagnostic tools.

Since algorithms depend on mathematical analysis, they produce robust, objective and quantitative measures of disease. Every time a deterministic program is executed on the same tissue sample, the answer will be the same, ensuring reliable and repeatable solutions. Algorithms run on computing devices and are therefore efficient, speeding up quantification tasks. Image analysis solutions also have the potential to aid in creating more uniform grading and scoring methods.

Software-based measurement tools are not meant to replace Pathologists, but rather, to assist them in their daily tasks — for workflow augmentation.

Workflow augmentation algorithms can be applied to entire slides, or to selected, reproducible HPFs. Such technical requirements can be hard-coded into the program, or designed according to guidelines and protocols set out by the College of American Pathologists (CAP). Moreover, if the ideal number of HPFs is not known, image analysis tools can be used to discover it.
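One way such reproducible field selection might be hard-coded is by seeding the random sampler, so the same fields are chosen on every run of the program. A sketch, with all names and dimensions hypothetical:

```python
import random

def select_hpfs(slide_width: int, slide_height: int,
                fov: int, n_fields: int, seed: int = 0):
    """Deterministically sample n_fields HPF corner coordinates.

    Seeding the RNG makes field selection reproducible: re-running on
    the same slide yields the same fields every time, removing one
    source of inter-observer variability (user-specific HPF choice)."""
    rng = random.Random(seed)   # fixed seed -> identical fields each run
    fields = []
    for _ in range(n_fields):
        x = rng.randrange(0, slide_width - fov)
        y = rng.randrange(0, slide_height - fov)
        fields.append((x, y))
    return fields

# The same call always returns the same field list:
first = select_hpfs(20000, 15000, fov=1024, n_fields=5, seed=42)
second = select_hpfs(20000, 15000, fov=1024, n_fields=5, seed=42)
```

A production tool would sample within the tumour mask rather than the whole slide, but the reproducibility mechanism is the same.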

Image analysis solutions also offer advantages in research studies, since it is possible to employ these methods on large datasets, helping to uncover disease etiology and mechanisms and to build more standardized grading and scoring systems. Automated results can be correlated to patient outcomes for clinical trials and drug discovery research. Large patient cohorts can be split into homogeneous sub-groups using objective imaging biomarkers in a way not possible with manual analysis.

Computer-aided diagnostic (CAD) algorithms combat quality issues in pathology that arise in the analytic phase, including the subjectivity, reproducibility and efficiency of rendering diagnoses, thereby improving the quality of care delivered to the patient.

With new AI and deep learning technologies for computer vision applications — in particular CNNs and vision transformers — these goals are attainable!

But…

Demonstrating the return-on-investment for using AI to specifically improve quality of care in pathology — is HARD.

While the goal is noble and could positively impact the lives of many patients suffering from diseases such as cancer, it is difficult to measure the value that AI brings to improving patient outcomes. This limits industry from building and marketing AI tools specifically for quality of care, which is a leading barrier. Many studies in the digital pathology space have instead focused on technology that makes Pathologists "more efficient", i.e. able to read more cases in less time, or on improving access in under-served regions through remote consults and telepathology.

However, AI tools that augment workflows can help the Pathologist be more accurate, with higher scoring and grading agreement. This would translate into more robust and personalized treatment decisions, ensuring the right therapy is delivered to the right patient at the right time. Quantifying these economic benefits, i.e. the cost saved by withholding a therapy when it is not needed, or by keeping patients out of hospitals because they received the correct treatment, is a daunting and ill-defined task. This may be related to the long cycle of measuring therapeutic outcomes (i.e. 5- or 10-year survival).

Hopefully, in the years to come, researchers and companies will focus on AI solutions that address reproducibility and reliability in pathology (Analytic Quality) for the greatest impact on patients. At the same time, we need to demonstrate the value proposition that such AI tools would bring to healthcare organizations, which helps to sell (and adopt) AI devices.

IN THE FINAL ANALYSIS

Pathological analysis is the definitive diagnosis for many diseases, including cancer, and is critical in providing care and therapy. There are challenges with tissue slide interpretation, including accuracy and inter-observer agreement in grading and scoring systems, and some tasks are laborious. Given increasing access to whole-slide imaging scanners and advanced AI algorithms, tissue specimens can be digitized and automatically analyzed to combat this subjectivity. AI tools can also make some tasks easier and faster for the pathologist, which reduces pressure on pathology labs and pathologist burnout.

AI algorithms can be used to automatically count objects for proportion scoring, quantify IHC positivity for intensity scoring, compute area measurements of various objects and structures, describe the texture, shape and size of nuclei for nuclear grading, and more. These systems quantify disease in a reliable, repeatable and objective manner. Moreover, because they are implemented in software, they are also efficient. The result is a completely digital workflow that can improve patient care, since scores and grades become more consistent and repeatable. Some of the pain points of the pathologist, such as cell counting, can also be alleviated.

Integrating multiple modalities and scales will be key to unlocking personalized medicine. © Khademi 2023

Clinical algorithms for digital pathology are starting to be approved for workflow augmentation. In the future, multi-modal AI systems will be designed, where algorithms integrate pathology, radiology and molecular data to quantify relationships between these different yet complementary modalities. Patient health information can also be included by way of large language models (LLMs). Such a "Pathologist Cockpit" could output a personalized disease signature that uniquely describes the state of the patient in terms of disease, prognosis, and likelihood of survival. Since multiple sources of information are included, a much broader picture of the patient's health can be explained. Multimodal tools will be key to unlocking the personalized medicine paradigm.

REFERENCES

[1] B. McLellan, R. McLeod and J. Srigley. Report of the Investigators of Surgical and Pathology Issues at Three Essex County Hospitals: Hotel-Dieu Grace Hospital, Leamington District Memorial Hospital and Windsor Regional Hospital [Report]. 2010.

[2] Ontario Medical Association. About QMP-LS. [Online] www.qmpls.org. (2013). Accessed March 2013.

[3] Royal College of Pathologists. What is Quality in Pathology? Report of a meeting to discuss the development of laboratory accreditation in the UK. [Report]. 2009, pp:1–27.

[4] D.A. Novis, G. Konstantakos. Reducing Errors in the Practices of Pathology and Laboratory Medicine: An Industrial Approach. American Journal of Clinical Pathology, (2006) 126, S30–S35.

[5] R. E. Nakhleh, What is quality in surgical pathology? Journal of Clinical Pathology. 2006. 59(7), pp:669–672.

[6] S.S. Cross. Grading and Scoring in Histopathology. Histopathology, 33, pp:99–106, 1998.

[7] S.C. Wludarski, L.F. Lopes, T.R. Berto E. Silva, F.M. Carvalho, L.M. Weiss, C.E. Bacchi. HER2 testing in breast carcinoma: very low concordance rate between reference and local laboratories in Brazil. Appl Immunohistochem Mol Morphol. 2011 Mar;19(2):112–8.

[8] F.Z. Bischoff, T. Pham, K.L. Wong, E. Villarin, X. Xu, K. Kalinsky, and J.A. Mayer. Immunocytochemistry staining for estrogen and progesterone receptor in circulating tumor cells: Concordance between primary and metastatic tumors. Cancer Research, 72(24), Supplement 3, 2012.

[9] E.N. Kornaga, A.C. Klimowicz, M. Konno, N. Guggisberg, T. Ogilvie, R.W. Cartun, D.G. Morris, M.A. Webster, and A.M. Magliocco. Comparison of three commercial ER/PR assays on a single clinical outcome series. Cancer Research: 72(24), Supplement 3, 2012

[10] Z. Varga, J. Diebold, C. Dommann-Scherrer, H. Frick, D. Kaup, A. Noske, E. Obermann, C. Ohlschlegel, B. Padberg, C. Rakozy, S. Sancho Oliver, S. Schobinger-Clement, H. Schreiber-Facklam, G. Singer, C. Tapia, U. Wagner, M. G. Mastropasqua, G. Viale, H.-A. Lehr. How Reliable Is Ki-67 Immunohistochemistry in Grade 2 Breast Carcinomas? A QA Study of the Swiss Working Group of Breast- and Gynecopathologists. PLoS One. 2012; 7(5): e37379.

[11] K. Garg, M.M. Leitao Jr, C.A. Wynveen, G.L. Sica, J. Shia, W. Shi, R.A. Soslow. p53 overexpression in morphologically ambiguous endometrial carcinomas correlates with adverse clinical outcomes. Modern Pathology (2010) 23, 80–92.

[12] H.F. Frierson Jr, R.A. Wolber, K.W. Berean, D.W. Franquemont, M.J. Gaffey, J.C. Boyd, D.C. Wilbur. Interobserver reproducibility of the Nottingham modification of the Bloom and Richardson histologic grading scheme for infiltrating ductal carcinoma. Am J Clin Pathol. 1995 Feb;103(2):195–8.

[13] E.A. Rakha, J.S. Reis-Filho, F. Baehner, D.J. Dabbs, T. Decker, V. Eusebi, S.B. Fox, S. Ichihara, J. Jacquemier, S.R. Lakhani, J. Palacios, A.L. Richardson, S.J. Schnitt, F.C. Schmitt, P.H. Tan, G.M. Tse, S. Badve, I.O. Ellis. Breast cancer prognostic classification in the molecular era: the role of histological grade. Breast Cancer Research, 2010, 12:207.

[14] S. Detre, J.G. Saclani, M. Dowsett. A “quickscore” method for immunohistochemical semiquantitation: validation for oestrogen receptor in breast carcinomas. J Clin Pathol. 1995 Sep;48(9):876–8.

[15] M. Dowsett, T.O. Nielsen, R. A’Hern, J. Bartlett, R.C. Coombes, J. Cuzick, M. Ellis, N.L. Henry, J.C. Hugh, T. Lively, L. McShane, S. Paik, F. Penault-Llorca, L. Prudkin , M. Regan, J. Salter, C. Sotiriou, I.E. Smith, G. Viale, J.A. Zujewski, D.F. Hayes; International Ki-67 in Breast Cancer Working Group. Assessment of Ki67 in breast cancer: recommendations from the International Ki67 in Breast Cancer working group. J Natl Cancer Inst. 2011 Nov 16;103(22):1656–64.

[16] M. Zaakouk, C. Quinn, E. Provenzano, C. Boyd, G. Callagy, S. Elsheikh, J. Flint, R. Millican-Slater, A. Gunavardhan, Y. Mir, P. Makhija, S. Di Palma, S. Pritchard, B. Tanchel, E. Rakha, N.M. Atallah, A.H.S. Lee, S. Pinder, A.M. Shaaban. Concordance of HER2-low scoring in breast carcinoma among expert pathologists in the United Kingdom and the Republic of Ireland, on behalf of the UK national coordinating committee for breast pathology. The Breast, Volume 70, 2023, pp. 82–91.

[17] M. G. Davey, S. O. Hynes, M. J. Kerin, N. Miller, A. J. Lowery. Ki-67 as a prognostic biomarker in invasive breast cancer. Cancers (Basel), 2021, 13(17), 4455.
