A Craftsman Blames His Tools: Blood Glucose Meter Accuracy & Long-Term Diabetes Control
After consistently receiving higher-than-expected A1c results, I tested my meter against four of the most popular meters available to see if systematic bias was at the heart of my frustration. The results were particularly damning, not only to my meter but to the landscape as a whole. I found that meter accuracy alone could be responsible for a full 1.5% difference in A1c without any change to perceived glucose control, resulting in significantly different treatment plans and emotional outcomes. This contrasts with the rhetoric presented by device manufacturers, medical providers, and leading diabetes organizations, all of whom take a particularly casual approach to the question of accuracy.
When I started using Dexcom’s G4 Platinum w/ Share continuous glucose monitor (CGM) about a year ago, I found that my average glucose was not correlating well with my periodic A1c results. Specifically, for three separate three-month periods, my A1c came in a full percentage point higher than my average glucose (as calculated from my CGM data) would predict. This was especially frustrating given how much effort I was spending tuning my homemade artificial pancreas. While there are numerous potential sources for this discrepancy, the fact that it was occurring consistently pointed me in the direction of something that could cause systematic bias: the twice-daily blood glucose meter readings used to calibrate my CGM.
For the past seven years or so, I have been using the Johnson & Johnson LifeScan OneTouch UltraLink blood glucose meter. This is the meter that was provided with my Medtronic MiniMed Paradigm 722 insulin pump and carries Medtronic branding. It is not a “current” meter in that it is no longer available for purchase (it was succeeded by the Bayer Contour Next Link in terms of Medtronic pump-integrated meters), but it is still commonly used and no upgrade has ever been suggested to me for the purposes of accuracy. It also uses the extremely popular and easily-obtained OneTouch Ultra test strips.
Because I don’t have access to laboratory-grade glucose measurement equipment, I decided that a side-by-side comparison of consumer meters would be a reasonable way to test my hypothesis. Luckily, meters are inexpensive if not free—manufacturers make their money selling the test strips—and thus easily acquired. In addition to the meter I use regularly, I obtained four new meters from my endocrinologist, each with a sample vial of ten test strips.
The following blood glucose meters were used for this test:
- LifeScan OneTouch UltraLink
- LifeScan OneTouch Verio
- LifeScan OneTouch VerioIQ
- Abbott FreeStyle Lite
- Bayer Contour Next
Details on the meters can be found in the Appendix section.
This represents three of the four most common meter companies, with the other being Roche’s Accu-Chek series. There are of course numerous other meters available on the market, and many other offerings from these companies, but the sample here is reasonably representative of the selection in the US. Importantly, all of these meters report their results as plasma-equivalent glucose, not whole blood glucose, which makes them directly comparable to each other and to the estimated average glucose reported with A1c results.
The key to this set of tests is using the same blood sample to supply all of the meters. If you instead use different samples — even in rapid succession — you introduce a number of confounding factors, including sample contamination and legitimate sample inconsistency, perhaps caused by local or temporal variations in glucose concentrations. Due to the sparse blood sample requirements of modern meters, a sample of the size required to supply all five meters is easily obtained (i.e. I squeezed out a big drop of blood).
Nine testing rounds were performed over five days (December 12, 2015 through December 16, 2015). The following procedure was used for each round:
- Wash and dry hands
- Open meter cases and arrange meters on table in randomly-determined order
- Insert a new lancet into the lancing device
- Remove a test strip from each meter’s strip vial and insert strips into meters (thereby turning on meters)
- Wipe fingertip with an alcohol pad; pause momentarily to allow drying
- Use lancing device to prick fingertip; squeeze base of fingertip to present a large drop of blood
- Apply the blood to each test strip in order; do not wait for results before proceeding to next meter
- Record results
Additional procedural notes:
- The order in which the blood samples were applied to the test strips was randomized on a per-round basis. The order can be found in the Appendix.
- The tests were performed only when CGM readings were considered suitably stable (i.e. not rapidly increasing or decreasing).
- While an attempt was made to take readings over a reasonably representative range of glucose values, glucose was not purposefully controlled to achieve specific conditions outside normal treatment.
- The test strips used for each meter all came from their own unique vials. Only the Verio and VerioIQ used the same type of strip, but the strip vials were not mixed. Test strip information can be found in the Appendix.
- After completing the nine testing rounds, the meters were checked using their respective control solutions. Each meter comes with its own control solution for ensuring that the meter (and test strips) are functioning properly, and the meter-specific instructions were followed in each case. All meters passed their control solution tests; details are in the Appendix.
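The per-round randomization of meter order can be sketched in a few lines of Python. This is only an illustration: the article does not say how the orders were actually drawn, and the function name and seed below are hypothetical.

```python
import random

# Meter names as listed in the article.
METERS = [
    "LifeScan OneTouch UltraLink",
    "LifeScan OneTouch Verio",
    "LifeScan OneTouch VerioIQ",
    "Abbott FreeStyle Lite",
    "Bayer Contour Next",
]


def draw_round_orders(n_rounds, seed=None):
    """Return one independently shuffled meter order per testing round."""
    # The seed is optional and only makes the draw reproducible.
    rng = random.Random(seed)
    return [rng.sample(METERS, k=len(METERS)) for _ in range(n_rounds)]


# Nine rounds, as in the article; the seed value is an arbitrary assumption.
orders = draw_round_orders(9, seed=2015)
for round_number, order in enumerate(orders, start=1):
    print(round_number, order)
```

Drawing the order before each round keeps any effect of application order (for example, the drop drying or clotting slightly between the first and last strip) from consistently favoring one meter.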
Results & Analysis
The following are the results of the blood glucose meter tests:
The meters produced a wide range of values for each blood sample. For example, round 1 had a minimum of 139 mg/dL and a maximum of 202 mg/dL—a spread of 63 mg/dL, all from the same drop of blood. The data is plotted below along with linear regression lines for the two meters that appear to deviate the most from the mean results.
Of note is that the regression lines are not parallel to the mean-equivalent line and have intercepts near the origin, which indicates that the apparent bias is a scaling issue, not a shift. The individual and aggregate results can therefore be effectively expressed as deviations from the mean values:
The LifeScan OneTouch VerioIQ, Abbott FreeStyle Lite, and Bayer Contour Next (with the exception of one outlier) all exhibited tight internal grouping as well as relatively close agreement with the mean. The LifeScan OneTouch UltraLink and LifeScan OneTouch Verio, on the other hand, showed significant internal scatter and deviation from the mean.
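The distinction between a scaling bias and a shift can be illustrated with a small least-squares fit. The sketch below uses synthetic numbers, not the measurements from this test, and the function names are hypothetical; it only shows what each kind of bias looks like in the slope, the intercept, and the percent deviation from the mean.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys against xs; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def percent_deviation(readings, means):
    """Deviation of each reading from the corresponding round mean, in percent."""
    return [100.0 * (r - m) / m for r, m in zip(readings, means)]


# SYNTHETIC illustration values (assumed), not the article's data.
round_means = [90.0, 130.0, 170.0, 210.0, 250.0]
scaled_meter = [0.9 * m for m in round_means]    # pure scaling bias
shifted_meter = [m - 17.0 for m in round_means]  # pure offset bias

scaled_slope, scaled_intercept = fit_line(round_means, scaled_meter)
shifted_slope, shifted_intercept = fit_line(round_means, shifted_meter)

print(scaled_slope, scaled_intercept)    # slope away from 1, intercept near 0
print(shifted_slope, shifted_intercept)  # slope near 1, intercept away from 0
print(percent_deviation(scaled_meter, round_means))   # constant percentage
print(percent_deviation(shifted_meter, round_means))  # shrinks as glucose rises
```

A scaling bias gives a constant percent deviation at every glucose level, which is why expressing results as deviations from the mean is a reasonable summary when the intercepts sit near the origin.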
It is important to note that there is no “correct” value to compare these results against, and therefore the results can only be compared to each other. Similarly, these results are not an indictment of any particular brand or meter; rather, the salient point should be just how different the results are from each other. For the purposes of the discussion below, however, let’s assume that the mean is in fact the “true” blood glucose concentration of the blood samples. The analysis holds even if this is an incorrect assumption—the inter-meter variability is still the same—but it allows us to outline a concrete example.
The American Diabetes Association’s (ADA) official guidance on A1c target for adults with type 1 diabetes is 7.0%, which equates to an average plasma glucose of 154 mg/dL. If I were to use the UltraLink and successfully achieve an average glucose of 154 mg/dL as reported by the meter, I would expect (within the error bounds of the A1c test) to have an A1c of 7.0%. According to this data and the assumptions presented above, I would instead have a “true” average glucose of 176 mg/dL and an A1c of 7.8%—well above the ADA’s target threshold.
Now, if I were instead to use the Verio and successfully achieve an average glucose of 154 mg/dL as reported by the meter, I would also expect to have an A1c of 7.0%, the same as with the UltraLink, as I have simply been controlling to the meter’s feedback. According to this data and the assumptions presented above, I would instead have a “true” average glucose of 135 mg/dL and an A1c of 6.3%, well below the ADA’s target threshold.
Without changing anything but my meter, I will have reduced my average glucose by 41 mg/dL and my A1c by 1.5%. This is a massive difference in terms of evaluating the efficacy of a treatment regimen, both in design and execution. A patient with an A1c of 6.3% receives an entirely different course of action than one with an A1c of 7.8%, even with all other things being equal.
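The conversions in this example can be checked with the widely published ADAG relationship between A1c and estimated average glucose, eAG (mg/dL) = 28.7 × A1c − 46.7. That this exact formula was used for the figures above is an assumption, although it does reproduce the 7.0% ↔ 154 mg/dL pairing. The function names are hypothetical.

```python
def a1c_to_eag(a1c_percent):
    """Estimated average glucose (mg/dL) from A1c (%), ADAG relationship."""
    return 28.7 * a1c_percent - 46.7


def eag_to_a1c(eag_mg_dl):
    """A1c (%) from estimated average glucose (mg/dL), inverse of the above."""
    return (eag_mg_dl + 46.7) / 28.7


target_eag = a1c_to_eag(7.0)     # the ADA target expressed as glucose
ultralink_a1c = eag_to_a1c(176)  # "true" average when the UltraLink reads 154
verio_a1c = eag_to_a1c(135)      # "true" average when the Verio reads 154

print(round(target_eag), round(ultralink_a1c, 1), round(verio_a1c, 1))
# → 154 7.8 6.3
```

Note that the last digit of the A1c gap depends on rounding: the rounded figures differ by 1.5, while feeding the rounded glucose values through this formula gives an unrounded gap closer to 1.4.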
There is also an emotional toll associated with this discrepancy. I have repeatedly felt like I managed my disease well over an extended period of time and met the goals I set with my care team only to find out I’ve failed. Even at the minimum requested calibrations, three months of testing is about 25,000 CGM readings and 200 finger sticks, which in aggregate are shouting “congratulations,” while a single-point lab test is saying (correctly, it turns out) “sorry, try again.” But no matter how hard you try, you will never achieve your goals if your meter is constantly lying to you.
If these assertions stand up, it is a smear on the entire system: device manufacturers for accepting their products’ inadequacies; medical providers for neither emphasizing the importance of meter accuracy nor providing guidance on available calibration services; the FDA for not pushing this harder from a regulatory angle; and me, for not taking a closer look at the tools I depend on sooner. And yet, despite how impactful this is, meter accuracy seems to be near the bottom of the list for many of the most influential players in the diabetes treatment ecosystem.
Coincidentally, just as I began this study, I saw a tweet show up in my feed with a link to the 2015 Diabetes Forecast blood glucose meter feature guide. After reviewing it, I was curious about why there was no mention of accuracy or precision in the guide:
Diabetes Forecast is the official publication of the American Diabetes Association (ADA). The link provided in the above response by Diabetes Forecast took me to a post from 2012 titled “Why Not Compare Meters’ Accuracy?”, written by Dr. Sue Kirkman of the ADA along with Dr. David A. Simmons of Bayer.
Given the ADA’s prominent place in all aspects of diabetes care, research, and advocacy—the ADA took in about $200,000,000 in 2013—the article was quite disappointing. An excerpt from Dr. Kirkman’s section reads:
Some meters report their accuracy results as a “regression line,” with a correlation coefficient, slope, and Y-axis. (Yikes! Shades of forgotten algebra!) Other companies report in a table format the percentage of readings above 75 mg/dl that are within plus or minus 5 percent, plus or minus 10 percent, and so on. It would be difficult to compare one regression line to another, and even more difficult to compare a regression line to a table.
And later, from Dr. Simmons:
Even published studies might have misleading conclusions unless they have been carefully constructed to avoid these sources of bias and the reviewers are aware of all of these pitfalls.
Their point was more about why they cannot provide a consumer-friendly comparison than why a comparative study cannot be performed, but the result is the same: no adequate quantitative comparison and two prominent players justifying its nonexistence. I was frankly shocked by how flippantly Dr. Kirkman inserted her “math is hard” parenthetical commentary, which is antithetical to the patient-enabling rhetoric found on the ADA’s website. Dr. Simmons’ profound statement that poorly-executed research leads to incorrect results amounts to suggesting that we probably shouldn’t do any science because there is a chance we’ll get it wrong.
So what are the current standards these devices must meet? From the lead-in article posted with the 2015 glucose meter guide:
The current standards require 95 percent of all meter test results to be within 20 percent of the actual blood glucose level for results greater than 75 mg/dl and within 15 mg/dl for values below 75 mg/dl. So a blood glucose that in reality is 100 mg/dl could show on a meter as being between 80 and 120 mg/dl — and still be considered accurate.
And that is really where it stands. These percentage bins don’t distinguish between scatter (imprecision) and bias (inaccuracy), and the result is exactly what was seen in this test: drastic differences in outcomes that are entirely opaque to the patient and care team. Even a 2014 revision to the standards that tightened the approval process did nothing for real-world accuracy. In fact, the OneTouch Verio meter (the same one that ended up furthest from the mean in the present testing) meets these new standards, at least per the accuracy tables in its user manual. The fact is that +/- 15% is still too wide an acceptable range for the treatment results patients are expected to achieve.
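To make the scatter-versus-bias point concrete, here is a minimal sketch of the rule as quoted above (20 percent at or above 75 mg/dl, 15 mg/dl below it, for 95 percent of results). The readings are hypothetical, the function names are made up for illustration, and treating exactly 75 mg/dl as part of the percentage band is an assumption, since the quote does not say.

```python
def within_tolerance(meter_mg_dl, actual_mg_dl):
    """Apply the quoted tolerance to a single reading."""
    error = abs(meter_mg_dl - actual_mg_dl)
    if actual_mg_dl >= 75:  # boundary handling at exactly 75 is an assumption
        return error <= 0.20 * actual_mg_dl
    return error <= 15


def meets_standard(meter_readings, actual_values, required_fraction=0.95):
    """True if enough readings fall inside the tolerance band."""
    hits = sum(within_tolerance(m, a) for m, a in zip(meter_readings, actual_values))
    return hits / len(actual_values) >= required_fraction


# HYPOTHETICAL values, chosen only to illustrate the point.
actual = [60, 90, 120, 154, 200, 260]
factors = [0.88, 1.12, 0.88, 1.12, 0.88, 1.12]
biased_low = [0.88 * a for a in actual]                 # always 12% low
scattered = [f * a for f, a in zip(factors, actual)]    # 12% off, alternating

print(meets_standard(biased_low, actual), meets_standard(scattered, actual))
print(sum(biased_low) / sum(actual), sum(scattered) / sum(actual))
```

Both hypothetical meters satisfy the rule, but only the consistently biased one drags the long-run average away from the actual values, and the average is what drives A1c.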
As for me, I have decided to take a few immediate actions as well as to incorporate some long-term best practices. First, since I still have hundreds of OneTouch Ultra test strips, I will continue to use my OneTouch UltraLink meter for the time being. When I calibrate my CGM, however, I add 10% to the reading. If my hypothesis is correct, this should bring my A1c more in line with my average CGM data. I am looking to switch meters, but I have yet to decide which meter will fit the bill. There is more to a good meter than just accuracy, so it isn’t as simple as finding which is the best in that regard. I will certainly use accuracy as a necessary but not sufficient criterion, however, and continue to check my meter’s accuracy over its lifetime.
I plan on repeating this type of test in the future, whenever a new batch of meters (and test strips) is made available to me. This particular experiment took me a few hours of actual work and cost $20, mostly to get some new control solution for the one meter that didn’t have it. There are definitely some holes in my work — see the Appendix for some discussion on this topic — but for what little effort it was, it resulted in an immensely valuable understanding of the technology I use every day. I encourage others to try and repeat these tests with whatever equipment they have available, and to post their results online. When you do, share them with me (@hannemannemann) and we’ll see how they compare.
The following subsections provide further data and references.
Meter & Test Strip Information
Below are links to each meter’s product page and user manual, as well as the test strip information:
LifeScan OneTouch UltraLink (Medtronic-branded)
- Strip Type: OneTouch Ultra (blue) test strips
- Strip Expiration: 04/2017
- Strip Lot #: 3897881
LifeScan OneTouch Verio
- Strip Type: OneTouch Verio test strips
- Strip Expiration: 02/2017
- Strip Lot #: 3847810
LifeScan OneTouch VerioIQ
- Strip Type: OneTouch Verio test strips
- Strip Expiration: 03/2016
- Strip Lot #: 3688521
Abbott FreeStyle Lite
- Strip Type: FreeStyle Lite test strips
- Strip Expiration: 12/2016
- Strip Lot #: 1514815
Bayer Contour Next
- Strip Type: Contour Next test strips
- Strip Expiration: 03/2017
- Strip Lot #: DW5CFEC52B
Control Solution Results
The following table shows the results of the control solution testing. Each meter has its own control solution and procedure for performing the test. Manufacturers designate this test as pass/fail, and the values are not used for any calibration purposes. Because the strip supply for the OneTouch UltraLink was not limited, three control solution tests were performed.
The following table shows the testing order for each round; all but the first round were randomized prior to beginning each test.
Due to the limited scope of this testing, there are numerous relevant questions which could not be addressed. These include:
- What is the lot-to-lot variation in test strip accuracy?
- What is the meter-to-meter variation in accuracy (within the same product)?
- Does the meter accuracy drift over time?
Some of these could be answered by repeating the test with additional meters and/or test strips while others require a longer study timeline. There is work being done to investigate the accuracy of meters once they pass regulatory approval, namely through the Diabetes Technology Society’s Surveillance Program.
For the example comparing hypothetical meter averages and equivalent expected A1c values, the linear regressions for the meter values as a function of the mean values were used. With regard to the 1.5% difference in A1c, it should be noted that the spread (in A1c percentage points) is a function of the “true” (mean) glucose selected for the example. In the Commentary section, the ADA’s target of 7.0% was selected. If, instead, the patient is at 8.0%, the spread increases to 1.7% (7.2% to 8.9%). Similarly, it decreases at lower A1c values.
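The dependence of the spread on the chosen A1c can be approximated without the full regressions. The sketch below makes two assumptions that are not from the original analysis: it treats each meter's bias as pure scaling (zero intercept), with ratios taken from the rounded figures in the example (176/154 for the UltraLink, 135/154 for the Verio), and it uses the ADAG relationship eAG = 28.7 × A1c − 46.7 for conversions. All names are hypothetical.

```python
ADAG_SLOPE = 28.7
ADAG_OFFSET = 46.7

# Ratio of "true" (mean) glucose to meter-reported glucose, taken from the
# rounded figures in the worked example; pure scaling is an assumption.
ULTRALINK_RATIO = 176 / 154
VERIO_RATIO = 135 / 154


def a1c_to_eag(a1c_percent):
    return ADAG_SLOPE * a1c_percent - ADAG_OFFSET


def eag_to_a1c(eag_mg_dl):
    return (eag_mg_dl + ADAG_OFFSET) / ADAG_SLOPE


def implied_a1c_range(believed_a1c):
    """A1c implied by each meter when its readings average to believed_a1c."""
    believed_eag = a1c_to_eag(believed_a1c)
    low = eag_to_a1c(believed_eag * VERIO_RATIO)
    high = eag_to_a1c(believed_eag * ULTRALINK_RATIO)
    return low, high


for believed in (6.0, 7.0, 8.0):
    low, high = implied_a1c_range(believed)
    print(f"{believed:.1f}% believed: {low:.1f}% to {high:.1f}%")
# the 8.0% row prints: 8.0% believed: 7.2% to 8.9%
```

Under these assumptions the sketch reproduces the 7.2% to 8.9% range quoted for a patient at 8.0%, and the range narrows as the believed A1c falls, because a fixed percentage error in glucose is a smaller absolute error at lower glucose levels.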
No attempt was made to determine which meter is actually the most accurate. While instantaneous plasma glucose is often provided along with A1c results, this only provides a single data point and is taken from a different sample than a consumer would use to do a simultaneous meter check.