Demystifying Vaccination Metrics
Nine considerations when reading COVID-19 vaccination charts
The line charts the world is obsessing over these days aren’t a race to flatten the curve. Instead, our attention is focused on a race to a herd immunity threshold that promises a return to some semblance of normalcy.
If you skip the rest of this article about the nuances and complexities of vaccination data, please remember this: all of the prevention measures (masks, distancing, hand washing) that we’re practicing now need to be part of our lives for the foreseeable future, particularly with new COVID-19 variants in circulation.
Listen to public health officials. Be hopeful and excited for this huge scientific milestone, but also remember that we still don’t have sound data on whether full vaccination prevents you from acquiring the SARS-CoV-2 virus and transmitting it to others, even if you yourself don’t get sick.
For a year, we unpacked the complexity of COVID-19 case data.
Over the last year, the data visualization community has collectively learned how complex COVID-19 case data can be.
What seemed like straightforward numbers (infections, recoveries) turned out to require a more nuanced understanding of data collection methods, related metrics like the complicated test positivity rate, and the predictable peaks and valleys that are routine in health information reporting.
I’ve developed charts of immunization coverage data many times while working on global health programs. Immunization experts corrected my missteps and helped me better understand uncertainty, denominator challenges, and data quality issues to monitor in coverage numbers. Vaccination data gets even more complex with multi-dose vaccines. As data journalists, health departments, and other sites launch trackers, the focus seems to be on the cheery cumulative climb of the ‘doses administered’ curve, which tells an incomplete story.
We can collectively benefit from becoming more informed readers (and creators, with the appropriate subject matter expertise) of data visualizations about immunization as COVID-19 vaccination trackers spin up. Here are nine considerations when reading or creating charts of COVID-19 vaccination data.
Nine Considerations When Reading or Creating Charts of COVID-19 Vaccination Data
1. Don’t fixate solely on the endless climb of the cumulative ‘doses administered’ line chart.
The total doses administered feels like a vanity metric — or at least only one part of a more complicated puzzle. Yes, we want to see the line keep climbing. But who is receiving those doses? What share of the total available doses in that state have been administered? How many people are fully immunized (two shots, with the current regimen) compared to those partially immunized?
In some trackers, the total doses is normalized to doses per 100 people in a country or state. While doses per population can help us compare vaccine rollout across countries or states, having two-dose vaccine regimens, and eventually a mix of single- and multi-dose vaccines, will make doses a poor proxy for coverage — which is really what we’re trying to understand.
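To make the distinction concrete, here is a minimal Python sketch. The function names and every number in it are hypothetical, chosen only for illustration. It shows two regions with identical doses per 100 people and very different shares of people fully vaccinated.

```python
def doses_per_100(doses_administered: int, population: int) -> float:
    """Doses administered per 100 residents (a rollout-pace metric, not coverage)."""
    return 100 * doses_administered / population


def percent_fully_vaccinated(people_fully_vaccinated: int, population: int) -> float:
    """Percent of the population that has completed the full regimen."""
    return 100 * people_fully_vaccinated / population


# Hypothetical figures: two regions, same population, same number of doses given.
population = 1_000_000
doses = 200_000

# Region A gave two shots each to 100,000 people.
# Region B gave one shot each to 200,000 people.
region_a_fully = 100_000
region_b_fully = 0

print("Doses per 100 (both regions):", doses_per_100(doses, population))
print("Region A % fully vaccinated:", percent_fully_vaccinated(region_a_fully, population))
print("Region B % fully vaccinated:", percent_fully_vaccinated(region_b_fully, population))
```

Both regions report 20 doses per 100 people, yet one has 10% of its population fully vaccinated and the other has 0%.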
2. Clearly identify benchmarks and goals to aid understanding, but recognize those benchmarks are likely to change.
Why do we care so much about calculating coverage? The percent of the population fully vaccinated is the metric we compare to the ‘herd immunity’ thresholds, making this metric the number that will be used in debates about reopening timelines. Make reopening decisions too early, when vaccination coverage is still low and community spread is high, and there could be real life or death consequences to those choices.
Herd immunity thresholds vary depending on the disease. Estimates for herd immunity necessary to curb the spread of COVID-19 currently range from 75% to 90%, but estimates are still being adjusted as we learn more. Current estimates of a herd immunity threshold are based on what data we have about COVID-19 — and if you’ve been following the disease over the past year, you’ve seen that there’s much we’re still learning. Discussing the upward shift in estimates (which started out at 60–70%), Marc Lipsitch, an epidemiologist at Harvard’s T.H. Chan School of Public Health, said, “You tell me what numbers to put in my equations, and I’ll give you the answer…but you can’t tell me the numbers, because nobody knows them.”
Vaccine coverage does not account for those who have recovered from COVID-19 and may have immunity from the virus (having had COVID-19 does not disqualify you from receiving the vaccine). Deduplicating the number of ‘immune’ persons would be exceedingly complex, which is why you’ll see vaccine coverage used as a proxy for comparison against the herd immunity threshold.
The herd immunity estimated benchmarks can also be helpful for interpreting vaccine acceptance data. Reading the trend chart below, the shaded band (representing a range, rather than a specific threshold) spans from 75% to 90%, since a precise number hasn’t been established. Ideally the green area (‘definitely yes or probably yes will take the COVID-19 vaccine’) would exceed the lower bound of that reference band, indicating that at least enough people would be willing to take a COVID-19 vaccine to meet the minimum threshold.
No population is expected to reach 100% vaccine acceptance. Without the reference band, though, we’re limited to tracking fluctuations in acceptance, which makes it harder to assess whether a country has enough vaccine demand to put it on the path towards herd immunity once vaccine supply and accessibility constraints are addressed.
3. Display distribution (supply) and administration (demand) data together for a more complete picture of the vaccine rollout.
To make sense of what was happening with COVID cases, charts from groups like The COVID Tracking Project clustered trends on testing, cases, hospitalizations, and deaths for a more complete picture. Similarly, we can’t look at data on doses administered in isolation to understand how a country or state is performing on vaccine rollout.
The New York Times displays a combination of the percent of people given at least one shot or two shots and information about the doses distributed and the share of doses used. Together, these metrics give a high-level snapshot of the supply distributed and administered. Note, though, that understanding demand requires knowing more than how many people received shots, since that number is itself likely constrained by supply.
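As a rough sketch of how these paired metrics relate, the Python below computes the share of distributed doses used alongside the percent of people with at least one shot. The `StateRollout` class, its field names, and the figures are all hypothetical and are not taken from any tracker.

```python
from dataclasses import dataclass


@dataclass
class StateRollout:
    """Hypothetical snapshot of one state's rollout numbers."""
    name: str
    population: int
    doses_distributed: int
    doses_administered: int
    people_with_one_or_more_doses: int

    def share_of_doses_used(self) -> float:
        """Percent of distributed doses that have been administered."""
        return 100 * self.doses_administered / self.doses_distributed

    def percent_with_at_least_one_dose(self) -> float:
        """Percent of the population that has received one or more shots."""
        return 100 * self.people_with_one_or_more_doses / self.population


# Illustrative values only.
states = [
    StateRollout("State A", 1_000_000, 200_000, 150_000, 100_000),
    StateRollout("State B", 1_000_000, 100_000, 90_000, 70_000),
]

for s in states:
    print(
        s.name,
        f"{s.percent_with_at_least_one_dose():.1f}% with at least one dose,",
        f"{s.share_of_doses_used():.1f}% of distributed doses used",
    )
```

In this made-up example, State A has reached more people, but State B has used a larger share of the supply it received. Neither number alone tells the whole story.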
NPR shares some of the same metrics, posed as answers to two specific questions on their tracker: how much of the population has been vaccinated and how efficiently are states administering their doses? Note that the color scales on the heat map are specific to the current range of values and not scaled for the maximum value color to be a target or 100% — perhaps another debate for the dataviz community about color ramps.
There are a number of other metrics used to track vaccine logistics and distribution. While supply chain data is used primarily by teams managing vaccination programs and not in broad public reporting, these metrics provide more detail on vaccine rollout progress. Examples include:
Order response time: the amount of time between a product being ordered (requested) and the product arriving at the destination. States and countries aim to keep adequate supply on hand, so that everyone who receives a first dose of the current regimens will have access to a second. Managing that supply requires understanding how long it takes to receive more product across manufacturers and distributors.
Wastage: a measurement of the amount of vaccine that is not administered (both leftover doses from opened vials and unopened vials that go unused or spoil), compared to the amount of vaccine issued. Quantifying vaccine wastage is particularly critical in a mass immunization campaign where there are shortages of vaccine. We want to ensure the limited supply is being utilized, and tracking wastage creates greater transparency and accountability around progress to that goal. According to ProPublica, although the CDC requires reporting of COVID-19 vaccine waste, some states are failing to report this information.
Dropout rate: the percent difference in coverage between two different doses in sequence. For the current COVID-19 regimens, this would compare coverage for the first dose and the second. If we see the dropout rate increasing, public health practitioners will seek to understand why: a supply issue, like vaccine not being available; an access issue, like not being able to schedule a second appointment due to issues with the scheduling application; or a demand issue, like a bad experience with the first dose or hesitancy for some other reason about getting the second. Understanding more about common reasons for dropout between doses enables public health experts to develop appropriate interventions.
You can learn more from the WHO about vaccine supply chains and logistics measures, which also extend into cold chain capacity. You can learn more about these routine metrics in the WHO Global Health Observatory or through the Better Immunization Data Initiative.
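The three logistics metrics above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not an official formula from any program: the function names and inputs are hypothetical, and the dropout rate is expressed relative to the number of first-dose recipients, which is one common convention.

```python
from datetime import date


def order_response_days(ordered: date, arrived: date) -> int:
    """Days between placing an order and the product arriving."""
    return (arrived - ordered).days


def wastage_rate(doses_issued: int, doses_administered: int) -> float:
    """Percent of issued doses that were not administered."""
    return 100 * (doses_issued - doses_administered) / doses_issued


def dropout_rate(first_dose_recipients: int, second_dose_recipients: int) -> float:
    """Percent of first-dose recipients who have not received a second dose.

    Assumption: both counts share the same target population, so the ratio of
    counts equals the ratio of coverage figures.
    """
    return 100 * (first_dose_recipients - second_dose_recipients) / first_dose_recipients


# Illustrative values only.
print(order_response_days(date(2021, 1, 4), date(2021, 1, 11)))  # days in transit
print(wastage_rate(doses_issued=1_000, doses_administered=950))
print(dropout_rate(first_dose_recipients=800, second_dose_recipients=600))
```

One caution when applying the dropout calculation during an active rollout: people who are still inside the waiting window between doses will look like dropouts, so the comparison is more meaningful for cohorts whose second dose is already due.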
4. Always seek to understand who is represented in the data and potential issues of inequity.
Equity issues are global. A virus doesn’t recognize the borders we draw on our maps. As vaccination rates increase, new cases decline, and global travel picks up again, COVID-19 anywhere can be a threat to the rest of the world. As Ifeanyi Nsofor, a former Routine Immunization Surveillance Officer in Nigeria, wrote:
If all nations would join together, they would be stronger. I’m reminded of an Igbo word, igwebuike, which means “there is strength in community.” However, to work as a stronger global community, the well-off Western world must stop behaving as if poorer countries are invisible. And they must acknowledge how their plunder of these poor nations contributed to those countries’ poverty.
Maps and charts illuminate the inequities in the distribution of vaccine, with estimates that many low- and middle-income countries will not have widely-available vaccine until 2022 or 2023.
With slow ramp up and limited availability of testing in many countries, some may look at maps of cases and wrongly assume that African countries may not need the vaccine as urgently. Researchers are using other approaches to estimate the mortality from COVID-19. A study led by BUSPH researchers collected post-mortem samples and found that 15% of recently-deceased people admitted to the main morgue in Lusaka, Zambia tested positive for COVID-19.
We also have significant equity concerns within the United States, which can be hard to assess and address if data gaps persist. Early in the pandemic, the number of states reporting cases by race/ethnicity was limited. Advocacy by The COVID Tracking Project and others pushed states to provide more granular reporting to better understand who was most impacted by COVID.
Black and Indigenous Americans have been disproportionately impacted by COVID-19. According to the APM Research Lab’s Color of Coronavirus update on January 7, 2021, 1 in 735 Black Americans has died from COVID-19, and 1 in 595 Indigenous Americans has died from COVID-19, compared to 1 in 1,030 white Americans. Take a minute to sit with those statistics if you’re seeing them for the first time.
We need to advocate for sharing disaggregated data, in the US and around the world, to understand who is getting vaccinated and identify the supply and demand barriers preventing those communities most impacted by COVID-19 from receiving the vaccine.
Currently, only a limited number of US states are reporting vaccination data by race/ethnicity, according to a tracker set up by the Kaiser Family Foundation. Reporting by the Washington Post highlights that many of the states that do report vaccinations by race/ethnicity are sharing very incomplete information.
Wondering what incomplete looks like? As of Thursday, January 28, 2021, according to the Virginia Department of Health’s COVID-19 Vaccine Dashboard, race/ethnicity data was available for less than half of people who received the first dose of the COVID-19 vaccine. On the dashboard, the ‘not reported’ number is included as a footnote; instead, placing that category on the chart itself ensures the information does not get ignored when the ‘not reported’ category accounts for such a large share of patients.
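A small Python sketch shows why keeping ‘not reported’ as its own category matters. The group labels and counts below are invented for illustration and do not come from the Virginia dashboard.

```python
from typing import Dict


def percent_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Percent share of each category, rounded to one decimal place."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


# Hypothetical first-dose counts, including records with no race/ethnicity.
first_doses = {"Group A": 300, "Group B": 150, "Not reported": 550}

# Shares with the 'Not reported' category kept on the chart.
with_missing = percent_shares(first_doses)

# Shares if 'Not reported' is dropped and only known records are charted.
known_only = percent_shares(
    {group: n for group, n in first_doses.items() if group != "Not reported"}
)

print(with_missing)
print(known_only)
```

Dropping the unknown records makes Group A look like two thirds of recipients, when in fact race/ethnicity is known for fewer than half of the people in this made-up dataset.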
The COVID Tracking Project has a detailed article on the issues with the available disaggregates in our state vaccine data including missing or incomplete demographic data, inconsistent definitions across states, and more.
5. Pay attention to the nuances in the actual measure plotted in charts about total doses administered.
The Our World in Data team gives a few different views of data on doses per 100 people by country: cumulative doses (left), daily new doses (center), and a seven-day rolling average of daily new doses (right). The daily charts show us more about the pace of the rollout, compared to having to interpret the slope of the cumulative line.
As with other health metrics, smoothing daily doses administered to a seven-day rolling average (chart above, right) helps account for reporting delays and reduced availability of health services on weekends (even with some US states introducing 24-hour appointment windows).
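For readers who want to see the smoothing step itself, here is a minimal sketch of a trailing seven-day average in plain Python. The daily counts are made up to mimic a weekend dip, and Our World in Data's exact method may differ from this one.

```python
from typing import List, Optional


def rolling_average(daily: List[float], window: int = 7) -> List[Optional[float]]:
    """Trailing average over `window` days; None until a full window exists."""
    averages: List[Optional[float]] = []
    for i in range(len(daily)):
        if i + 1 < window:
            averages.append(None)  # not enough history yet
        else:
            chunk = daily[i + 1 - window : i + 1]
            averages.append(sum(chunk) / window)
    return averages


# Two hypothetical weeks: steady weekdays, fewer doses reported on weekends.
daily_doses = [100, 100, 100, 100, 100, 30, 30] * 2

smoothed = rolling_average(daily_doses)
print(smoothed)
```

The raw series swings between 100 and 30 every week, while the smoothed series is flat once a full week of data is available, because every seven-day window contains the same mix of weekdays and weekend days.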
6. Differentiate between metrics about people who have received one shot or two shots, particularly as new vaccines enter the market.
Both the Moderna and Pfizer vaccines currently approved for emergency use in the US have a two-dose recommended regimen. Other vaccines on the market, like Covaxin (Bharat Biotech) and Covishield (AstraZeneca) being deployed in India, are also two-dose regimens. Eventually, we are likely to have single-dose options as well, like Johnson & Johnson’s regimen.
Doses per 100 people — as illustrated in the charts above from Our World In Data — won’t translate directly to a percent of population vaccinated. They say as much in their subtitle, which hasn’t stopped the same data from being republished as percent of population vaccinated, unfortunately.
Let’s think through what metrics we can calculate from an illustrative population of 100 people eligible for the currently available two-dose regimens.
With only two-dose regimens on offer at the time of this article’s publication, our hope is that most people return for a follow-up appointment to get their second dose on the recommended timeline. Public health experience indicates that not everyone will make that follow-up appointment. We quantify this as the dropout rate described in a previous section. Because some people will only receive one dose and others will be in the three-to-four-week window between doses — not yet eligible for a second dose — we also cannot simply divide the number of doses administered by two to count the number of people fully vaccinated.
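Here is one way to work through the 100-person illustration in Python. The specific counts (40 people with a first dose, 15 of whom also have a second) are assumptions made up for this sketch.

```python
from typing import Dict


def two_dose_summary(eligible: int, first_doses: int, second_doses: int) -> Dict[str, float]:
    """Metrics for a population where only two-dose regimens are available."""
    total_doses = first_doses + second_doses
    return {
        "doses_per_100": 100 * total_doses / eligible,
        "pct_at_least_one_dose": 100 * first_doses / eligible,
        "pct_fully_vaccinated": 100 * second_doses / eligible,
        "pct_partially_vaccinated": 100 * (first_doses - second_doses) / eligible,
        # The shortcut the text warns against: halving total doses.
        "naive_doses_divided_by_two": total_doses / 2,
    }


# Hypothetical: 100 eligible people, 40 first doses, 15 second doses.
summary = two_dose_summary(eligible=100, first_doses=40, second_doses=15)
for metric, value in summary.items():
    print(metric, value)
```

With these numbers, 55 doses per 100 people corresponds to 40% with at least one dose and only 15% fully vaccinated. Halving the dose count would suggest 27.5 people fully vaccinated, nearly double the real figure.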
Now, look forward to a time when we have some two-dose regimens (those already on the market) and single-dose regimens. We would need to know which vaccine someone received as their first dose to estimate the share of the population fully vaccinated. One-dose and two-dose language will need to be replaced with ‘fully vaccinated’ and ‘partially vaccinated’. You can see how this becomes increasingly complex, requiring more granular data to calculate.
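A sketch of that person-level logic in Python might look like the following. The product labels, the `DOSES_REQUIRED` lookup, and the records are hypothetical placeholders. Real registries would carry far more detail, such as dates and dose intervals.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical lookup: doses needed to complete each regimen.
DOSES_REQUIRED: Dict[str, int] = {
    "two_dose_vaccine": 2,
    "single_dose_vaccine": 1,
}


def vaccination_status(product: str, doses_received: int) -> str:
    """Classify one person based on the product they started with."""
    if doses_received <= 0:
        return "unvaccinated"
    if doses_received >= DOSES_REQUIRED[product]:
        return "fully vaccinated"
    return "partially vaccinated"


def tally_status(records: List[Tuple[str, int]]) -> Counter:
    """Count people in each status from (product, doses_received) records."""
    return Counter(vaccination_status(product, doses) for product, doses in records)


# Four hypothetical people who together received five doses.
records = [
    ("two_dose_vaccine", 1),
    ("two_dose_vaccine", 2),
    ("single_dose_vaccine", 1),
    ("two_dose_vaccine", 1),
]

print(tally_status(records))
```

Five doses here translate to two people fully vaccinated and two partially vaccinated, a result that cannot be recovered from the dose total alone.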
7. Be mindful of coverage denominator issues.
To calculate coverage, we divide the number of people receiving all doses of the vaccine (fully vaccinated) by the number of people in the target population. An accurate denominator requires accurate population statistics, which are not universally available.
In high income countries, births, deaths and migration are generally well recorded and accurate, and age-specific population estimates exist. In England for example, a census is conducted every 10 years. In addition, electronic patient registers are available, allowing quasi-real time identification of individuals attending a specific primary care facility.
But in many low- and middle-income countries (LMICs), recent, local population estimates are rarely available. For example, the last population censuses conducted in Angola and the Democratic Republic of Congo were 1970 and 1984, respectively. The usefulness and reliability of administrative denominator data for identifying target populations and planning immunization activities is therefore limited.
For routine childhood immunizations, for example, gaps in vital registration (registering a new birth with the government) can create uncertainty in the denominator for calculating coverage. In the long run, outdated population estimates can result in coverage estimates that exceed 100% for widely-distributed vaccines.
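The effect of a stale denominator is easy to demonstrate. In the Python sketch below, the population figures are invented. The point is only that the same numerator yields an impossible coverage value when the target population estimate is too small.

```python
def coverage(people_fully_vaccinated: int, target_population: int) -> float:
    """Percent of the target population that is fully vaccinated."""
    if target_population <= 0:
        raise ValueError("target_population must be positive")
    return 100 * people_fully_vaccinated / target_population


# Hypothetical district: 9,500 people fully vaccinated.
fully_vaccinated = 9_500

outdated_estimate = 9_000   # projection from an old census
current_population = 11_000  # what a recent count would have found

print(round(coverage(fully_vaccinated, outdated_estimate), 1))
print(round(coverage(fully_vaccinated, current_population), 1))
```

Against the outdated estimate, coverage appears to exceed 100%. Against the larger, more recent figure, the same district still has a meaningful share of people left to reach.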
A more immediate consideration for COVID-19 vaccine distribution is quantifying who is eligible within a population and how accurate our estimates are for groups within the general population (for example, how many people qualified in the Priority 1 groups across states).
Read more about estimating global target populations for COVID-19 vaccine in the BMJ and explore the interactive COVID-19 Vaccine Allocation Planner for US states and counties to learn more.
8. Consider how data about vaccine acceptance and vaccine hesitancy will be used (or mis-used) when representing survey findings.
Vaccine acceptance studies are being conducted in the US (Kaiser Family Foundation, McKinsey) and around the world (Johns Hopkins CCP and partners). These tracking studies involve multiple rounds of surveys to reveal trends over time, often disaggregated by demographic groups.
Vaccine hesitancy statistics provide health communicators with target audience insights when developing health messaging. But, more widely amplifying these numbers can have unintended consequences, particularly in the months before vaccines are made widely available.
Yes, understanding what share of the population is hesitant or unlikely to get the COVID-19 vaccine is important, particularly when we understand why they’re hesitant or who is likely to influence them, in order to tailor public health messaging and connect with community leaders. But we have a more immediate challenge right now related to supply and logistics constraints to overcome on the path towards herd immunity.
If you are sharing vaccine hesitancy numbers, take the time to review the complete survey report for additional, more actionable insights, such as information about potential influencers and where people are seeking information.
9. Learn more about how the data is collected and consider how it should be used.
This last consideration is not unique to vaccination data or COVID-19 case data. Any time we’re using data to inform or make decisions, we should spend time learning about how that data was collected. We can gripe all we want about whether the charts and vaccination dashboards abide by data viz best practices, but ultimately we need access to quality, open, machine-readable data for those charts to be meaningful.
Fragmented systems and issues of interoperability will vary across countries. US vaccination data, like much health information, is primarily decentralized to the states and counties. In an MIT Technology Review analysis of the failings of VAMS, the new system custom built for the COVID-19 vaccination rollout in the US, the authors pointed out that our information challenges aren’t unique to COVID-19: “America’s heavily privatized medical system was held together by duct tape and bubble gum long before the biggest public health crisis of our lifetimes.”
Despite being critical to reporting national public health statistics, many state information systems are woefully out of date, particularly for handling the volume of incoming data from a mass-vaccination campaign like we hope to see for COVID-19.
Other countries have invested deeply in improving the quality of their routine immunization data, whether with dedicated immunization registries or integrations within their national health information systems. These investments have the potential to pay dividends — if we can start allocating vaccine supplies more equitably to low- and middle-income countries.
Here in the US, teams adjacent to the public health informatics systems are stepping in to transform data like this into readable formats. In the state of Georgia, vaccine orders by county are released every 7–10 days as a PDF. Standard Co scrapes the vaccine orders and, via an API, converts those PDFs into data available as an Excel file, housed in a basic dashboard. COVIDActNow, COVID Tracking Project, Johns Hopkins University, and others stepped in and did the same data transformation and compilation heavy lifting for COVID case information.
These short-term fixes for the demand for data are not the same as an investment in our public health information systems. As you watch the ‘real-time’ updates, remember that there will be different lags for distribution and delivery measures, and we can expect some routine reporting noise in the data.
Over the coming months, think about the amount of time you could spend watching immunization trackers — the same way many did with case curves. What are you looking to learn? How is that information serving your purpose? What is missing from the visual stories you’re reading?
Finally, for dataviz designers, be cautious and thoughtful if you are tasked with visualizing vaccination data. These numbers will eventually be used to justify re-opening and return to work decisions that could have life or death consequences if made prematurely. There are some additional notes on priorities (mobile-first, accessibility) related to creating vaccine data visualizations in this thread from Moritz Stefaner, whose team developed a mobile-friendly dashboard of vaccination data for Germany.
More than any design consideration though: spend the time getting to know the data before you spin up any new charts and graphs of vaccination coverage. Always consider the equity issues masked by summary statistics. And don’t forget the people behind the numbers.
Amanda Makulec is the Senior Data Visualization Lead at Excella and holds a Master of Public Health from the Boston University School of Public Health. She worked with data in global health programs for eight years before joining Excella, where she leads teams and develops user-centered data visualization products for federal, non-profit, and private sector clients. Amanda volunteers as the Executive Director for the Data Visualization Society and is a co-organizer for Data Visualization DC. Find her on Twitter at @abmakulec.