An Explanation of the Risks & Benefits of Screening Tests

Mary McQuilkin, NP, MPH
Circle Medical
16 min read · Mar 6, 2020


If you just read the annual wellness exam post, you may be left thinking, “what’s the harm in just getting screened for everything? I want all the tests.”

To explain why this is a bad idea, I need to define some terms and provide an overview of testing principles in the context of preventive health. If you want to skip the theory and cut right to the clinical examples, click here.

Types of preventive health:

  1. Primary prevention: Health interventions that keep people from getting sick in the first place (e.g. vaccines)
  2. Secondary prevention: Screening tests for people who are well and have no symptoms, with the goal of detecting disease early when it can be treated or cured. (e.g. pap smear to detect precancerous cervical changes, which can be treated to prevent cervical cancer from developing)
  3. Tertiary prevention: Promoting wellness in people already diagnosed with a disease (e.g. nutrition counseling, exercise, and medications for someone with diabetes to improve quality of life following diagnosis).

What conditions we screen for varies based on many factors, including the prevalence of a disease in the community a patient comes from (how common it is) and how serious the disease is. This is the foundation we start from when deciding which conditions to screen for in preventive health care.

What makes a good screening test? How do we measure how good a test is?

In the diagram below, the blue shading represents the people in the population being screened who actually have the disease, while the white rectangle represents the people being screened who do not have the disease. Whether or not people actually have the disease is shown across the top of the table. The left side of the table indicates the result each person got after everyone was screened with the test.

Assume we are using a screening test with dichotomous results (the result is either positive or negative). If the test were perfect, all the results would be true positives (TP) or true negatives (TN), but in real life, no test is perfect. Tests incorrectly give some people who have the disease a negative result (false negatives) and incorrectly give some people who do not have the disease a positive result (false positives).

We usually pick an initial screening test with a high sensitivity. That way, nearly all the people we test who actually have the disease will test positive (true positives), but the downside is that high sensitivity often comes at the cost of some people getting a positive result who don’t really have the disease (false positives). We deal with this problem by testing everyone who is positive on the initial screening with a second, diagnostic test that has a high specificity. A test with a high specificity is good at correctly labeling people who don’t have the disease as not having it (the true negatives). While the initial screening test is often something like a simple blood test, the high-specificity test needed to re-test all the people who get positive results may be more invasive (e.g. breast biopsy) or more expensive.
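To make these four categories concrete, here is a minimal sketch that computes sensitivity and specificity from a 2x2 table of hypothetical counts (the same 80%/90% figures come up again in the HIV study example further down):

```python
# Minimal sketch: computing sensitivity and specificity from a 2x2 table.
# The counts below are hypothetical, chosen only to illustrate the definitions.

true_positives = 80    # have the disease, test positive
false_negatives = 20   # have the disease, test negative
true_negatives = 90    # don't have the disease, test negative
false_positives = 10   # don't have the disease, test positive

# Sensitivity: of the people who truly have the disease, what fraction test positive?
sensitivity = true_positives / (true_positives + false_negatives)

# Specificity: of the people who truly don't have the disease, what fraction test negative?
specificity = true_negatives / (true_negatives + false_positives)

print(f"Sensitivity: {sensitivity:.0%}")  # 80%
print(f"Specificity: {specificity:.0%}")  # 90%
```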

The challenge of deciding what a test’s cutoff point should be

In humans, whether or not you have a disease is rarely a simple yes/no result. For most diseases, the measure we use to diagnose the condition has a range of values in a population. It can be challenging to decide what counts as “too high” or “too low” on a health measure, but where we draw the line is important because that determines who gets a diagnosis and who is told they don’t have the disease. This is generally not a decision made by the individual clinician; labs usually follow agreed-upon standards for reporting results, based on consensus among medical providers.

The following diagram shows the impact of where we draw the cutoff point for what counts as disease and what counts as negative for the disease. The blue horizontal line going across the box represents the cutoff; people with test results above the line are labeled as positive for the disease, while people with results below the line are labeled as negative. Just like the previous diagrams, the blue shaded area on the left represents the people who actually have the disease and the white vertical area represents the people who really don’t have the disease, labeled across the top. On the left are the test results after the entire population was screened. The circles represent individual people, showing the distribution of test result values in this population.

If we make the cutoff for disease versus no disease where it is now, 4 people are correctly labeled as having the disease, but we incorrectly label 6 people as negative who actually do have the disease (false negatives). Meanwhile, 1 person is incorrectly told they may have the disease (false positive) and 9 people who don’t have the disease are correctly told they don’t have it (true negatives).

Now let’s move the cutoff point for this test down. In the next diagram, we correctly label 9 people as having the disease, out of the 10 people in this population who have it. The downside is that we now incorrectly label 3 people as having the disease who really don’t. These individuals will have to come back for additional testing. They may experience stress over a disease they don’t actually have, incur medical bills for unnecessary care, and face physical discomfort or other harms from the follow-up testing needed to rule out the disease, which varies by disease but could include radiation exposure from imaging, or even surgery.

The lower right graph is another way of looking at the cutoff value for a screening test and how it impacts the test results people with and without the disease will receive. This example illustrates what false positives are and why they occur, and helps explain why we can’t eliminate them without risking missing people who do have the disease. This difficult balance of where to draw the cutoff point is one of the reasons screening tests should only be used on people from populations at high risk for that condition; the tests were designed for that purpose when the cutoff point was chosen. Using the test for people at low risk may cause more harm than good.
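Here is a minimal sketch of that trade-off. The test values are hypothetical, chosen so the two cutoffs reproduce the counts from the diagrams above (4 true positives and 1 false positive at the higher cutoff; 9 true positives and 3 false positives at the lower one):

```python
# Minimal sketch of the cutoff trade-off described above. The test values are
# hypothetical; higher values suggest disease, as in the diagrams.

diseased     = [3.1, 4.2, 4.8, 5.0, 5.5, 5.9, 6.3, 7.0, 7.8, 8.5]  # 10 people with the disease
not_diseased = [1.2, 1.8, 2.3, 2.9, 3.4, 3.8, 3.9, 4.5, 5.2, 6.1]  # 10 people without it

def classify(cutoff):
    """Label everyone with a value at or above the cutoff as positive."""
    tp = sum(v >= cutoff for v in diseased)       # true positives
    fn = len(diseased) - tp                       # false negatives
    fp = sum(v >= cutoff for v in not_diseased)   # false positives
    tn = len(not_diseased) - fp                   # true negatives
    return tp, fn, fp, tn

for cutoff in (6.0, 4.0):
    tp, fn, fp, tn = classify(cutoff)
    print(f"cutoff {cutoff}: TP={tp} FN={fn} FP={fp} TN={tn}")
# cutoff 6.0: TP=4 FN=6 FP=1 TN=9  (misses more disease, few false alarms)
# cutoff 4.0: TP=9 FN=1 FP=3 TN=7  (catches more disease, more false alarms)
```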

As I mentioned above, we tend to err on the side of high sensitivity with screening tests because we can bring people in for a second test if there is a chance they have the disease, but we don’t want to incorrectly tell people they are fine and send them on their way without follow up if they do actually have the disease (false negatives). This means that in general, screening tests may yield many false positive results, but this depends on both characteristics of the specific test used and characteristics of the population the individual patient is from.

Sensitivity and specificity aren’t the most useful measures for determining how to interpret a screening test result in the real world

Sensitivity and specificity are both fixed characteristics of the test. A test that is positive in 8 of 10 patients with a disease has a sensitivity of 0.8 (80%). A test that is negative in 9 of 10 patients without the disease has a specificity of 0.9 (90%). This information is most helpful in a research setting where you already know at the outset who has the disease and who doesn’t.

For example, imagine a study that enrolls 100 people with HIV and 100 people without HIV as a control group. In such a situation, testing the 100 participants with HIV using the above test would correctly label 80 of them as positive (20 would get a false negative result), and testing the control group would correctly label 90 people as HIV negative (10 would get a false positive result).

When screening real patients, we don’t know who has the disease and who doesn’t… that’s why we’re screening. There are more helpful ways of determining how accurate a screening test is likely to be for an individual patient outside of a research setting.

Positive likelihood ratio, abbreviated as LR(+), tells us how much to increase the probability of disease if a test is positive. It is the probability of a person who has the disease testing positive (sensitivity) divided by the probability of a person who does not have the disease testing positive (1-specificity).

Negative likelihood ratio, LR(-), tells us how much to decrease the probability of disease if a test is negative. It’s the probability of a person who has the disease testing negative (1 − sensitivity) divided by the probability of a person who does not have the disease testing negative (specificity).

A likelihood ratio of greater than 1 indicates the test result is associated with the disease. A likelihood ratio less than 1 indicates that the result is associated with the absence of the disease. Before ordering any test, we always need to consider how the result will affect our treatment plan for the individual patient. Likelihood ratios help make this determination.
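As a minimal sketch, likelihood ratios can be computed directly from sensitivity and specificity and applied by converting a probability of disease to odds, multiplying by the LR, and converting back. The 80%/90% figures and the 10% starting probability below are hypothetical:

```python
# Minimal sketch, reusing the hypothetical 80% sensitivity / 90% specificity above.
# The pre-test probability is also hypothetical, used only to show how an LR is applied.

sensitivity = 0.80
specificity = 0.90

lr_positive = sensitivity / (1 - specificity)   # LR(+) = 8.0
lr_negative = (1 - sensitivity) / specificity   # LR(-) ~ 0.22

def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, multiply by the LR, convert back to probability."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

pre_test = 0.10  # hypothetical 10% chance of disease before testing
print(f"LR+ = {lr_positive:.1f}, LR- = {lr_negative:.2f}")
print(f"After a positive result: {post_test_probability(pre_test, lr_positive):.0%}")  # ~47%
print(f"After a negative result: {post_test_probability(pre_test, lr_negative):.0%}")  # ~2%
```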

The shortfall of likelihood ratios is that they are based only on sensitivity and specificity, which are fixed characteristics of the test. Because you don’t show up for a screening test already knowing definitively if you have the disease or not, and your personal characteristics impact how likely it is that you have a disease, likelihood ratios are a helpful but imperfect measure for clinical care.

Positive and negative predictive values

Predictive value is not a fixed characteristic of the test because it is affected by prevalence. Prevalence is the number of people in a population who have the disease, usually expressed as the proportion of people who have the disease out of the total people in the population at one point in time. For example, the prevalence of obesity in West Virginia is 39%, which was calculated by dividing the number of residents of the state who were obese in 2018 by the total number of people in the state.

To explain predictive values, I’ll literally turn what we talked about before on its side; now the blue shaded area is horizontal rather than vertical. The same diagram is used again, but this time the information we know is whether people got a positive or negative test result, rather than if they actually have the disease or not. This is more like the real-life situation when you get a screening test result back from your primary care provider, but don’t know if you actually have the disease or not.

Positive predictive value (PPV) answers the question: if your test result is positive, how likely is it that you, the individual patient, actually have the disease?

PPV is much higher in a population with a high disease prevalence. When a disease is rare, even a low false positive rate produces more false positives than true positives, so a positive result means little; when the disease is common, most positive results are true positives. This means screening works well for conditions that are relatively common in the population, but a positive result is far less informative for conditions that are very rare in that population.
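A minimal sketch of this prevalence effect, holding sensitivity and specificity fixed at the hypothetical 80%/90% used earlier:

```python
# Minimal sketch of how positive predictive value depends on prevalence.
# Sensitivity and specificity are held fixed at the hypothetical 80%/90% used earlier.

def positive_predictive_value(prevalence, sensitivity=0.80, specificity=0.90):
    """P(disease | positive test), via Bayes' theorem."""
    true_positive_rate  = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)

for prevalence in (0.001, 0.01, 0.10, 0.30):
    ppv = positive_predictive_value(prevalence)
    print(f"prevalence {prevalence:>5.1%} -> PPV {ppv:.1%}")
# prevalence  0.1% -> PPV  0.8%
# prevalence  1.0% -> PPV  7.5%
# prevalence 10.0% -> PPV 47.1%
# prevalence 30.0% -> PPV 77.4%
```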

When deciding who to screen for a particular disease, it is especially important to have a high PPV if a positive test result will lead to invasive follow-up. A screening test is more likely to cause harm if the PPV is low, say <10%, for the individual patient based on their characteristics (e.g. gender, age, geographic area of residence, occupational exposures, family history).

Let’s say, for example, that you saw a documentary about coal workers’ pneumoconiosis, so you want to get screened to make sure you don’t have it. You have never been in a mine. You live in San Francisco and work in an office. You have no occupational history of exposure to coal mine dust. Based on this history, the pre-test probability is very nearly 0. If you get a chest X-ray and your primary care provider writes “coal workers’ pneumoconiosis screening” on the order, there is a chance the radiologist reading the X-ray may not feel confident stating that coal workers’ pneumoconiosis is definitely not present. The radiologist may then recommend follow-up testing with more expensive imaging or a lung biopsy, potentially exposing you to more radiation and the stressful uncertainty of thinking you may have a serious lung disease.

Should you get a lung biopsy? Might you be the first case of coal workers’ pneumoconiosis in history to be diagnosed in someone without occupational exposure to a mine? No and no. A positive result in this scenario should be interpreted as a false positive, and the screening test never should have been ordered in the first place because the PPV was far too low given the occupational and environmental exposure history.

Test results need to be interpreted taking into account pre-test probability: how likely it is that you have the disease before you know the test result. For example, if you are obese, both your parents have diabetes, and your ethnic background is Samoan, a population with a high prevalence of diabetes, then your pre-test probability before getting screened for diabetes is high. Bayes’ theorem tells us how to adjust pre-test probability to post-test probability (you don’t throw out the pre-test probability, you adjust it to interpret the test result). The pneumoconiosis example was a clear-cut case, since you shouldn’t trust that test result at all, but most situations are more nuanced.

P(A|B) = P(B|A) × P(A) / P(B)

where:
P(A) = probability of having the disease (the pre-test probability)
P(B) = probability of a positive test result
P(B|A) = probability of a positive test result in someone who has the disease (the test’s sensitivity)
P(A|B) = probability of having the disease given a positive test result (the post-test probability)

If you get a positive screening test result and want to know how likely it is that you actually have the disease, it is helpful to remember that the PPV will be much higher if the screening was for a condition with a high prevalence in the population you come from. The likelihood that you had the disease before knowing the test result (the pre-test probability) determines how much a positive result actually tells you.

Calculating predictive values for a COVID-19 antibody test
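As a rough sketch of this kind of calculation, assume an antibody test with 90% sensitivity and 95% specificity used in a community where 2% of people have been infected; all three numbers are illustrative assumptions, not measured values for any particular COVID-19 test or community:

```python
# Rough sketch of a predictive-value calculation for an antibody test.
# All three inputs are assumptions chosen for illustration, not measured values
# for any particular COVID-19 test or community.

sensitivity = 0.90   # assumed: 90% of infected people test positive
specificity = 0.95   # assumed: 95% of uninfected people test negative
prevalence  = 0.02   # assumed: 2% of the screened population has been infected
population  = 10_000

infected   = prevalence * population
uninfected = population - infected

true_positives  = sensitivity * infected
false_negatives = infected - true_positives
true_negatives  = specificity * uninfected
false_positives = uninfected - true_negatives

ppv = true_positives / (true_positives + false_positives)   # P(infected | positive)
npv = true_negatives / (true_negatives + false_negatives)   # P(not infected | negative)

print(f"Of {population:,} people: {true_positives:.0f} true positives, "
      f"{false_positives:.0f} false positives")
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
# Of 10,000 people: 180 true positives, 490 false positives
# PPV = 26.9%, NPV = 99.8%
```

Under these assumptions, most positive antibody results would be false positives, even though a negative result is very reassuring, because the assumed prevalence is low.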

Breast cancer screening example

The mammogram is a well-known screening test for breast cancer. Years ago, women were screened before age 50 even if they didn’t have an increased risk (such as a genetic susceptibility to breast cancer). Over time, it has become increasingly apparent based on population-level data that screening women under 50 with an average risk for breast cancer causes more harm than good. This is partially because younger women’s breast tissue is more dense, which makes a mammogram harder to read. The risk of breast cancer also increases with age; the annual incidence of breast cancer in 40-year-olds is about 1 case per 1000 women, while the incidence reaches about 5 cases per 1000 at age 65.

Another factor is that the screening tests we use in modern medicine aren’t nearly as accurate as people often assume. Imagine you recently had your first mammogram, and your primary care provider informs you that your screening was positive, so you need to come in for additional imaging and a breast biopsy. Intuitively, you may think it is very likely that you have breast cancer based on your positive test result. After all, mammograms are considered to be a highly accurate tool for breast cancer detection.

People often incorrectly think “if mammography has a sensitivity of 81%, then my positive test must mean I have an 81% chance of having cancer.” Remember that sensitivity refers to the proportion of people correctly identified by the test as having the disease when testing a group of people already known to have the disease (like the HIV research study example above). It’s positive predictive value, not sensitivity, that tells you the likelihood of disease given a positive screening test result.

If we assume you are a 40-year-old woman and use breast cancer prevalence data from a relatively small study conducted locally in San Francisco and surrounding counties, the positive predictive value of screening mammography is 4%. That means given your positive mammogram, there is a 4% chance you have breast cancer. It’s likely even lower than that though; large international databases yield a positive predictive value of 1.3% for a 40-year-old, although it increases to 9.8% at age 60. The increase in PPV with age is important, because it’s after age 50 when it nears 10% that the benefits of screening may start to outweigh the risks.
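To see how an 81% sensitivity can coexist with a roughly 4% chance of cancer after a positive mammogram, here is a rough sketch; the specificity and prevalence below are illustrative assumptions, not figures from the studies cited:

```python
# Rough sketch: why a positive result on an "81% sensitive" test can still mean
# a low chance of cancer. The specificity and prevalence below are illustrative
# assumptions, not figures from the studies cited in the text.

sensitivity = 0.81    # quoted in the text
specificity = 0.90    # assumed for illustration
prevalence  = 0.005   # assumed: ~5 cancers per 1,000 women screened

chance_positive_and_cancer    = sensitivity * prevalence
chance_positive_and_no_cancer = (1 - specificity) * (1 - prevalence)

ppv = chance_positive_and_cancer / (chance_positive_and_cancer + chance_positive_and_no_cancer)
print(f"Chance of cancer given a positive mammogram: {ppv:.1%}")  # ~3.9%, roughly 4%
```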

If a woman completes screening mammography over 20 years, she has a 25% to 50% risk of receiving a false positive result and being called back for more testing or treatment. Screening also causes a significant amount of overdiagnosis and overtreatment, because ductal carcinoma in situ, a non-invasive cancer that often will not progress to cause negative health effects or death, is frequently detected by mammography and then treated. Some of these cases do progress, but we don’t have a way of determining which ones will, so all these women get treated to be on the safe side. In addition to psychological stress and medical bills, false positive results and overdiagnosis can lead to surgeries, chemotherapy, and radiation therapy for cancerous cells that never would have progressed to invasive breast cancer had they been left untreated.

Getting regular screening mammograms for more than 10 years reduces the risk of dying from breast cancer by 0–15%, with the most well-designed studies finding the smallest benefit or no benefit. The table below shows that when 10,000 women are screened repeatedly over 10 years, 3 deaths may be prevented. These three deaths are at the cost of 1,212 women receiving false positive results, 164 women getting breast biopsies, and 10 women having cancer that was not detected by their mammograms.

Prostate cancer screening example

The U.S. Preventive Services Task Force (USPSTF) does not recommend routinely screening men for prostate cancer using a PSA blood test. After reviewing the available evidence on prostate cancer screening in American men and grading the evidence based on quality, an independent panel of experts determined that for most men, the benefits of this screening do not outweigh the harms. Even for men ages 55–69, the decision of whether or not to screen for prostate cancer should be individualized with a discussion of the potential risks and benefits of this screening. The decision should take into account individual risk factors and patient preferences.

Potential benefits and risks of prostate cancer screening for men 55–69:

Herpes example

Herpes Simplex Virus can cause sores on the mouth or genitals. In general, HSV-1 infects the mouth and HSV-2 infects the genitals, but the viruses are very closely related and either type can affect either part of the body. Once a person gets infected, the virus will always remain in the body, even if no sores are present. When there are sores on the skin, the virus can spread to other people by skin-to-skin contact.

When a patient with sores comes in for treatment, if it is unclear to the provider if the sores are caused by HSV or another type of infection, such as a bacterial skin infection, the sore can be swabbed and tested in the lab to confirm the cause. This testing is diagnostic testing, not screening, because the person in this example already has symptoms and the test is being used to clarify the cause of disease.

People with no symptoms of herpes sometimes ask to have a blood test to screen for HSV, but this is not recommended. For one woman’s story of how getting screened for HSV impacted her, read this article.

The positive predictive value of HSV-2 blood testing in an American is 50–75%, depending on the specific test used. This means that if you get a positive result with the most widely used test, there is only about a 50% chance you actually have HSV-2. Flipping a coin would be about as accurate a way to predict whether or not you have the infection.

Summary of key points

  • The goal of screening is to catch early signs of disease before symptoms develop when a cure is more likely.
  • No medical test is 100% accurate, and any time you get a screening test there is a chance of a false positive or a false negative result.
  • Screening tests are intended for use when there is a high prevalence of the disease in the population the patient belongs to. A personal and family health history is needed, in combination with data for geographic area, age, gender, ethnic background, lifestyle, and other factors to determine how likely it is that you have the disease before testing.
  • If a screening test is used when there is a very low pre-test probability for the individual, false positive results are more likely, and it is less clear how the result should be interpreted.
  • Potential harms of screening tests include psychological stress, unnecessary procedures and treatments, and the cost, time, and health risks associated with these.
  • Benefits of screening: When screening tests are ordered for the right patients and results are interpreted accurately by the provider, secondary prevention catches disease in the early stages of development when a disease can be cured or effectively treated.
  • Health screenings can save lives. To schedule a wellness exam to discuss which screening tests are right for you, click here to schedule an appointment with your primary care provider.

Most tables and graphs above were created or reproduced using data from published papers, websites, or lectures, cited where applicable. For a references list of sources without links above, click here.


Mary McQuilkin, NP, MPH
Circle Medical

Nurse Practitioner certified in primary care, public health, and HIV. marymcquilkin.com