The PDMP: Raising Issues in Data Design, Use and Implementation

Terri Lewis
10 min read · Aug 7, 2021

How machine learning, algorithms, and poorly designed data collection combine to create vicarious harm to health care users

1/ “The worst part of machine learning snake-oil isn’t that it’s useless or harmful–it’s that ML-based statistical conclusions have the veneer of mathematics, the empirical facewash that makes otherwise suspect conclusions seem neutral, factual and scientific.

Think of “predictive policing,” in which police arrest data is fed to a statistical model that tells the police where crime is to be found. Put in those terms, it’s obvious that predictive policing doesn’t predict what criminals will do; it predicts what police will do.” — @CoryDoctorow, 2021, twitter

2/ Machine learning is an application of artificial intelligence (AI) that gives digital data systems the ability to learn automatically from an existing dataset without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves. https://www.ibm.com/cloud/learn/machine-learning

3/ Predictive modeling is the formulaic application of algorithms to project a behavior based on patterns detected in retrospective data. https://light-it.net/blog/use-of-predictive-modeling-in-healthcare/

4/ Let’s apply the ideas of machine learning, ‘predictive policing,’ and ‘predictive modeling’ to prescription opioid surveillance data. Keep in mind that the CDC Guidelines (2016) provide the reference thresholds for dose (<90 MME), days (<90 days), units (dose × days), and the inclusion (primary care, acute pain) and exclusion criteria (chronic pain associated with cancer, palliative care, or end-of-life hospice care). https://www.brennancenter.org/our-work/research-reports/predictive-policing-explained and https://www.cdc.gov/mmwr/volumes/65/rr/rr6501e1.htm
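
To make those thresholds concrete, here is a minimal sketch in Python of how a rule-based flag might be computed from the reference values. The field names and cutoffs are illustrative assumptions, not the actual PDMP or NarxCare implementation.

```python
# Illustrative sketch only: hypothetical field names, not actual PDMP logic.

EXCLUDED_CONTEXTS = {"cancer", "palliative", "hospice"}  # CDC 2016 exclusions

def flag_prescription(daily_mme: float, days_supply: int, care_context: str) -> list[str]:
    """Return the reference thresholds a prescription record crosses."""
    flags = []
    if care_context.lower() in EXCLUDED_CONTEXTS:
        return flags                              # outside the guideline's scope
    if daily_mme >= 90:                           # dose threshold (MME per day)
        flags.append("dose>=90MME")
    if days_supply >= 90:                         # duration threshold (days)
        flags.append("days>=90")
    if daily_mme * days_supply >= 90 * 90:        # units = dose x days (assumed cutoff)
        flags.append("units_high")
    return flags

print(flag_prescription(daily_mme=120, days_supply=30, care_context="primary care"))
# -> ['dose>=90MME']
```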

5/ From here it gets murky, primarily because within these algorithms, opioids are associated exclusively with ‘risk of harms’ for persons with conditions associated with noncancer chronic pain. This association was incorporated into the CDC Guidelines (2016) based on low quality evidence and under the undue influence of associates of Physicians for Responsible Opioid Prescribing or PROP. (pI, pp60–68) https://www.cdc.gov/drugoverdose/pdf/prescribing/CDC-DUIP-QualityImprovementAndCareCoordination-508.pdf

6/ Here opioid prescribing data captured by a statewide PDMP is fed into a statistical model that tells the DEA that an aberrant pattern of behavior may reflect a ‘crime in progress’ based on accumulated patient, prescriber, or pharmacy data in one or more of 17 elements detected across rolling windows of time. https://static1.squarespace.com/static/54d50ceee4b05797b34869cf/t/5fac5d0d16947a58fe85ba09/1605131535197/DEA+RFP+%282%29.pdf
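
A rough illustration of the ‘rolling window’ idea: the sketch below counts how many flagged dispensing events accumulate against a patient, prescriber, or pharmacy over a trailing 90-day window. The window length, alert threshold, and data layout are assumptions for illustration, not the DEA contractor’s actual specification.

```python
# Illustrative sketch: hypothetical data layout, not the actual DEA/PDMP pipeline.
from datetime import date, timedelta
from collections import defaultdict

WINDOW = timedelta(days=90)   # assumed rolling window length
ALERT_THRESHOLD = 3           # assumed count of flagged events that triggers review

# (entity_id, event_date) pairs for dispensing events that tripped any flag
flagged_events = [
    ("patient-001", date(2021, 1, 5)),
    ("patient-001", date(2021, 2, 10)),
    ("patient-001", date(2021, 3, 20)),
    ("pharmacy-42", date(2021, 3, 1)),
]

def entities_over_threshold(events, as_of):
    """Count flagged events per entity inside the trailing window ending at `as_of`."""
    counts = defaultdict(int)
    for entity, when in events:
        if as_of - WINDOW <= when <= as_of:
            counts[entity] += 1
    return {entity: n for entity, n in counts.items() if n >= ALERT_THRESHOLD}

print(entities_over_threshold(flagged_events, as_of=date(2021, 3, 31)))
# -> {'patient-001': 3}
```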

7/ This crime may be fraudulent billing, wasteful diagnostic testing and treatment, or abuses of medications thought to be associated with system, community, or patient harms. https://www.cms.gov/Outreach-and-Education/Medicare-Learning-Network-MLN/MLNProducts/Downloads/CombMedCandDFWAdownload.pdf

8/ ‘Predictive policing’ attempts to identify the potential for a crime to occur based on the presence of data believed to have a reliable association with a pattern of crime. https://www.rand.org/content/dam/rand/pubs/research_reports/RR200/RR233/RAND_RR233.pdf

9/ The DEA Strike Force can only find a crime when and where they can LOOK for it. Where the PDMP collects information about dose, days, and units, surveillance entities will always perform pretextual investigations upon patients who utilize opioids, the physicians who prescribe them, and the pharmacies that fill them. https://www.dea.gov/operations/ocdetf

10/ Given the very nature of the algorithm, predictive modeling doesn’t predict what physicians, pharmacies, or patients will do; it predicts what the DEA will do in response to indicators and patterns of aberrant behaviors associated with retrospective patterns of opioid prescribing. https://twitter.com/doctorow/status/1422239691034664991?s=20

11/ The DEA will ONLY find ‘harms’ associated with prescribed opioids, prescription fills, and days of use among patients who receive these medications through their physician offices and pharmacies because the only indicators programmed into the PDMP focus on behaviors that have been associated in the algorithm with fraud, waste, and abuse of medications. https://www.ehra.org/sites/ehra.org/files/EHRA%20Recommended%20Ideal%20Dataset%20for%20PDMP%20Inquiry%20-%201.14.19.pdf

12/ Despite claims of patient-centered prescribing, there is no data collected about potential positive patient outcomes. The PDMP algorithms cannot infer appropriate use from legal prescribing alone. ‘Benefit’ or ‘no harm’ has no assigned value in these algorithms. https://www.cdc.gov/drugoverdose/pdf/pubs/2019-cdc-drug-surveillance-report.pdf

13/ That’s not because patients have more illicit medications or are engaged in more antisocial behavior, but because surveillance entities that rely on the PDMP are only checking for harmful behavior among people with prescribed, legal medications. This imposes a form of confirmation bias. (If we build it, it will come) http://www.collegiatetimes.com/opinion/digital-algorithms-are-reinforcing-confirmation-bias/article_a23423fe-a457-11e6-9992-e7a835b30d18.html

14/ Opioid use is reflected as ‘Suspect’ (1) and becomes ‘more suspect’ (1+1) → ‘public menace’ (1+1+1) compared to other members of the ingroup data set as case characteristics increase in dose, distributed prescription units, or accumulating days. When that surveillance data is fed into an algorithm that relies on harms (1), the algorithm treats it like the truth and predicts harmful behavior accordingly. https://www.bmj.com/content/361/bmj.k1479
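
A minimal sketch of that one-way arithmetic, with invented indicators and weights (not the NarxCare or PDMP formula): every term can only add risk, and, as point 12 notes, there is no term for benefit or ‘no harm,’ so the score can only climb.

```python
# Illustrative sketch: invented indicators and weights, not an actual PDMP score.

HARM_WEIGHTS = {
    "high_dose": 1,            # e.g., >= 90 MME/day
    "long_duration": 1,        # e.g., >= 90 days supply
    "multiple_prescribers": 1,
}

def risk_score(indicators: set[str]) -> int:
    """Sum the weights of whichever harm indicators are present.

    Note what is missing: no term for stable function, pain relief, or years
    of uneventful use, so the score can never go down.
    """
    return sum(HARM_WEIGHTS.get(flag, 0) for flag in indicators)

print(risk_score({"high_dose"}))                                           # 1     'suspect'
print(risk_score({"high_dose", "long_duration"}))                          # 1+1   'more suspect'
print(risk_score({"high_dose", "long_duration", "multiple_prescribers"}))  # 1+1+1
```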

15/ Add to this naïve ‘experts’ who designed algorithms that lack indicators of patient characteristics, benefit, or absence of harm; such algorithms can only find ‘cases associated with risk of harm.’ The system will produce mathematical calculations that we perceive to be empirically neutral, but that label people as harmful based on the scale of their distribution within the measured ingroup of patients, physicians, and pharmacies tagged by dispensing and/or use of opioids. https://www.brookings.edu/research/algorithmic-bias-detection-and-mitigation-best-practices-and-policies-to-reduce-consumer-harms/

16/ By what method are these algorithms biased toward one outcome or another?

17/ The ‘less-is-better bias’ is the phenomenon of ascribing more value (better-ness) to something smaller in quantity (less-ness) in situations where we lack a good baseline for comparison (think MME, days, units dispensed). When a person judges an option in isolation, the judgment is influenced more by attributes that are easy to evaluate than by attributes that are hard to evaluate, even if the hard-to-evaluate attributes are more important. https://steemit.com/cognitive-biases/@natator88/less-is-better-effect-cognitive-bias-1-of-188 https://www.healthcareitnews.com/news/addressing-ai-bias-algorithmic-nutrition-label

18/ An attribute is said to be easy to evaluate if the decision maker knows how its values are distributed and thereby knows whether a given value on the attribute is good or bad. By claiming there is no evidence of positive benefit for opioids, our understanding of how benefit is distributed is foreclosed, and we don’t even ask the question. https://steemit.com/cognitive-biases/@natator88/less-is-better-effect-cognitive-bias-1-of-188

19/ The PDMP is programmed to reproduce the ‘less-is-better’ bias that the DEA is intent on tracking and prosecuting. The algorithms answer not WHO IS BAD, but HOW BAD ARE MEMBERS OF THE DATASET COMPARED TO MEMBERS WHO ARE LESS BAD? https://bja.ojp.gov/sites/g/files/xyckuh186/files/Publications/Global-JusticeSystemUsePDMPs.pdf
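
A minimal sketch of that relative framing: rank everyone in the dataset by their score and report each member as a percentile of the rest. The scores below are invented; the point is that the output is always a comparison within the surveilled group, never a judgment against an external standard of benefit.

```python
# Illustrative sketch: invented scores, showing comparison-only output.

scores = {"patient-A": 2, "patient-B": 4, "patient-C": 7, "patient-D": 1}

def percentile_rank(scores: dict[str, int]) -> dict[str, float]:
    """Report each member's score as the share of the dataset scoring at or below it."""
    values = sorted(scores.values())
    return {
        member: 100.0 * sum(v <= s for v in values) / len(values)
        for member, s in scores.items()
    }

print(percentile_rank(scores))
# -> {'patient-A': 50.0, 'patient-B': 75.0, 'patient-C': 100.0, 'patient-D': 25.0}
```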

20/ Because the algorithms treat opioid quantities only as markers of risk, they tag only behavior deemed ‘risky’ and can’t measure or look for positive patient outcomes — because data that is not associated with risk is nowhere to be found. (p. 9) https://www.ojp.gov/ncjrs/virtual-library/abstracts/technical-assistance-guide-pdmp-administrators-standardizing and https://www.wmpllc.org/ojs/index.php/jom/article/view/2675

21/ Where else do we see this AI design problem show up?

22/ Notably, Black women in AI encountered significant resistance for asserting that facial recognition systems are inherently racist because they disproportionately misclassify darker skin tones as aberrant. https://arstechnica.com/tech-policy/2019/01/yes-algorithms-can-be-biased-heres-why/

23/ Similarly, Kilby (2020) found that the PDMP algorithm, applied to multiple years of CMS billing claims, over-detected chronically ill patients as aberrant (over-utilizers) based on the scale of the prescriptions dispensed, filled, and purchased within the measured group. http://www2.nber.org/conferences/2020/SI%20subs/main_draft23.pdf

24/ You don’t have to have a degree in computer science or be an AI specialist to understand that algorithms primed with biased data can reasonably be expected to predict only harmful behavior. Coined in 1957, the phrase “Garbage In, Garbage Out” (GIGO) has been an iron law of computing since the days of hand tabulation of data. Yet another inherent problem with data submitted from the states into the PDMP is a lack of standardization in collection and a concerning data error rate. https://towardsdatascience.com/problems-in-machine-learning-models-check-your-data-first-f6c2c88c5ec2 and https://www.ncpdp.org/NCPDP/media/pdf/WhitePaper/NCPDP_Standards-based_Facilitated_Model_for_PDMP_-_Phase_I_and_II.pdf

25/ Sometimes humans cut corners. “If all you have is a hammer, then everything is a nail” is a cautionary tale about scientific malpractice. If scientists don’t address data integrity, the results can impose what has been referred to as ‘vicarious harms’ upon those whose data is targeted by digital surveillance. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3850418

26/ This can be lethal. USDOJ-DEA relies on statistical modeling to figure out which physicians are over-prescribing, based on the accumulation of positive hits on harm-associated data. All data submitted to the system relies on positive hits (harms) to predict antisocial conduct around the use of controlled substances. https://towardsdatascience.com/problems-in-machine-learning-models-check-your-data-first-f6c2c88c5ec2

27/ The most egregious statistical sin in AI algorithm development is recycling training data to validate a model. Whenever you create a statistical model, you hold back some of the data for later testing and let the algorithm learn only from the rest (the ‘training data’ it analyzes to find commonalities). https://towardsdatascience.com/train-validation-and-test-sets-72cb40cba9e7
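
A minimal sketch of the discipline being described, using scikit-learn on synthetic data: the rows used to fit the model are kept separate from the rows used to judge it, so validation measures generalization rather than memorization.

```python
# Minimal sketch of a proper holdout: fit on one portion, evaluate on another.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Hold back 25% of the rows; the model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("accuracy on the data the model was fit to:", model.score(X_train, y_train))
print("accuracy on held-out data:", model.score(X_test, y_test))
```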

28/ Machine-learning systems — “algorithms” — produce outputs that reflect the training data over time. If the inputs are biased (in the mathematical sense of the word), the outputs will be, too. Eliminating sociological bias is very hard because it depends on how the data are designed, which questions are asked, and which information is collected. https://arstechnica.com/tech-policy/2019/01/yes-algorithms-can-be-biased-heres-why/

29/ Retrospective cohort studies suffer from selection bias because participants are selected based on known outcomes that have already occurred. Short on data, the original developers of the PDMP algorithm in Ohio (2015) used a shortcut: they trained and tested their algorithm for predicting aberrant opioid use on a single dataset of 1,687 users suspected of misusing opioids, with subsequent mortality. https://academic.oup.com/biostatistics/article/10/1/17/269436

30/ The construction of the PDMP model involved assessment of existing cases, which were then mirrored to create a control group for training. The algorithm was then asked to confidently predict that the cases in the control group were also legitimate cases. https://apprisshealth.com/wp-content/uploads/sites/2/2017/02/NARxCHECK-Score-as-a-Predictor.pdf

31/ There’s a major issue with predictive modeling based on data the model has already digested. It’s the equivalent of asking a witness in a police lineup, ‘Have you seen this face before?’ It becomes a test of recall rather than generalization to the features of novel data: ‘Have you seen this before?’ (matching, recall) versus ‘Is this LIKE something you have seen?’ (categorization, generalization). https://academic.oup.com/biostatistics/article/10/1/17/269436
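
A minimal sketch of the recall-versus-generalization distinction on synthetic data: a one-nearest-neighbor model memorizes its training rows perfectly (pure recall) but does noticeably worse on rows it has never seen. The data and model here are illustrative only.

```python
# Illustrative sketch: memorization (recall) versus generalization to unseen data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# flip_y adds label noise so memorization and generalization visibly diverge.
X, y = make_classification(n_samples=500, n_features=8, flip_y=0.2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# A 1-nearest-neighbor model simply memorizes every training row.
model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

print("recall of rows it memorized:", model.score(X_train, y_train))  # 1.0
print("generalization to new rows:", model.score(X_test, y_test))     # noticeably lower
```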

32/ A training set of data must be representative of the cases you want to generalize to. Machine learning is excellent at recall. The PDMP has repeatedly demonstrated that it can recognize users of opioids and aggregate their use based on dose, days, and units. https://academic.oup.com/biostatistics/article/10/1/17/269436

33/ What the PDMP is NOT designed to do is distinguish patients who are using their opioids correctly from patients who are misusing their medications. It can detect that physicians are prescribing and dispensing within specific parameters. It CANNOT predict whether prescribing and dispensing is associated with appropriate use or misuse by patients. It can detect that pharmacies are filling authorized prescriptions. It CANNOT distinguish prescriptions that will be used appropriately from those that will be diverted. https://academic.oup.com/biostatistics/article/10/1/17/269436
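
One way to see the limitation: two patients can present identical dispensing records in the PDMP, and any function of that data must then return identical outputs, even if one patient is using the medication appropriately and the other is diverting it. The sketch below is purely illustrative; the scoring function stands in for any algorithm that sees only dispensing data.

```python
# Illustrative sketch: identical PDMP features force identical outputs,
# regardless of what each patient actually does with the medication.

def any_pdmp_score(record: dict) -> int:
    """Stand-in for any scoring function that sees only dispensing data."""
    return record["daily_mme"] + record["days_supply"] + 10 * record["pharmacies"]

appropriate_user = {"daily_mme": 60, "days_supply": 30, "pharmacies": 1}
diverting_user   = {"daily_mme": 60, "days_supply": 30, "pharmacies": 1}  # same record

print(any_pdmp_score(appropriate_user) == any_pdmp_score(diverting_user))  # True
```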

34/ Machine learning relies on the use of patterns associated with its own training data. The PDMP only recognizes the presence and quantity of doses, days, units dispensed for people who it is programmed to assume may be misusing the system. People with the same characteristics who are not prescribed opioids are not found in the dataset. What they may do for palliation remains unknown.

35/ Applied algorithms distribute the available data to predict who is engaged in aberrant behavior based on the scale of the data (smaller → larger). They cannot predict harms associated with unknown users of drugs purchased outside the physician-pharmacy-patient relationship.

36/ Machine learning in AI can impose vicarious harms upon patients, physicians and pharmacies whose experience is captured in the data. These harms are imposed by the treatment of the data by the algorithms that encode specific assumptions or values. https://effectivehealthcare.ahrq.gov/products/algorithms-bias-healthcare-delivery/request-info

37/ Data algorithms can cause great harms if individual health behavior is filtered through a forensic model to compare it to desirable public health outcomes. https://www.practicalpainmanagement.com/resources/ethics/when-opioid-prescriptions-are-denied

https://www.belmonthealthlaw.com/2020/02/04/narxcare-pharmacies-way-of-tracking-opioid-usage-of-patients-what-you-need-to-know/

38/ This brings me to the models that emerge from combining PDMP data with other federal, state and private insurance datasets to create comparative analytics designed to detect ‘aberrant patterns of prescribing, dispensing, patient use.’ Twenty-one public datasets combine to create a Frankenstein data framework for evaluation by AI contractors and DOJ-DEA. https://www.cms.gov/hfpp/become-a-partner/benefits-of-membership

39/ All of this is shrouded in secrecy by nondisclosure agreements among the data partners. The data and methods are covered by contracting agreements with “AI” contractors who don’t have to disclose their source data, their data treatment, or the algorithms used to process the data submitted by multiple data-sharing parties. https://www.cms.gov/hfpp/become-a-partner/benefits-of-membership

40/ Stakeholders most affected by outcomes are not invited to participate in the data design process. This forecloses on the necessary and independent scrutiny that might catch errors of assumptions in algorithm construction. https://wecount.inclusivedesign.ca/uploads/WeBuildAI_Participatory-Framework-for-Algorithmic-Governance-tagged.pdf

41/ It also pits research teams against one another, rather than setting them up for collaboration, a phenomenon exacerbated by scientific career advancement, which structurally gives preference to independent work. It pits governments against physicians, pharmacy companies and patients whose inputs could actually improve the process and reduce the vicarious harms imposed upon them by forensic modeling. https://www.scientificamerican.com/article/what-skepticism-reveals/

42/ Making mistakes is human — the scientific method demands disclosure, peer review, validation, and reliability testing as checks against fallibility and harms to the public.

43/ The combination of untested assumptions, financial incentives, poor-quality practices, and badly designed data makes for poorly designed clinical guidelines and poorly implemented public policy. https://www.ncbi.nlm.nih.gov/books/NBK22928/

44/ Without the discipline of good science, nontransparent implementation produces poor public outcomes. These outcomes are pressed into service in the field, offer no benefit, and harm physicians, pharmacies, patients, and public policy at large. https://www.worldbank.org/content/dam/Worldbank/Event/MNA/yemen_cso/english/Yemen_CSO_Conf_Social-Accountability-in-the-Public-Sector_ENG.pdf
