Propublica’s #surgeonscorecard should be retracted

I’d given up being a journalist — despite years in the field, some awards and a master’s degree — after spending about six weeks trying to cover other journalists for FishbowlDC. Let me be clear: I ultimately left FishbowlDC because the site was unnecessarily mean and vicious. I thought, unwisely, I could bring a different tone. I refused to write some things I didn’t agree with. An unraveling relationship with my editor and a growing discomfort with being associated with the site led me to finally ask for my contract to be terminated.

But I learned a lot. There wasn’t a day that went by that someone didn’t hate me. Between the emails from angry New York Times reporters and National Review editors — all of whom demanded they be covered in a different way than they covered their own subjects — there were the lower rungs on the ladder, like the Buzzfeed reporters with knacks for rewriting their own public histories. It was astonishing, sometimes, the way journalists would challenge publicly verifiable facts about themselves if those facts, when reported, made them look bad. The words “unmitigated liars” came into my vocabulary and I suddenly looked at my profession in a really different way.

It became clear one night, while I was out running off anger at the way some new reporter had torn into me for having the gall to report their actual words as fact, that I didn’t want to be a journalist anymore. Rather, I didn’t want to be associated with the kind of people I was covering.*

Fast forward to where we are today, and Propublica, an organization I once aspired to work for, has published a Surgeon Scorecard: a statistical analysis that purports to calculate the complication rates of various surgeons based on Medicare billing data it obtained through an odd (and probably legally unenforceable) agreement with the government.

Propublica is revered by other journalists and journalism organizations. I’ve rarely, if ever, seen any reporter seriously challenge their work. That’s a recipe for disaster, because as we’ve seen with the Surgeon Scorecard, they start to believe their own hype. It’s also a dereliction of the duty most journalists would profess to hold — the Society of Professional Journalists’ Code of Ethics requires that journalists hold each other publicly accountable for their mistakes. But as I learned in my short time at Fishbowl, that only happens when someone isn’t a friend or isn’t well liked.

Let’s get this out of the way. I know and am very, very close to a lot of surgeons. None of them, from what I’ve seen, are in Propublica’s database. Spending the last ten years around doctors has given me a lot of insight. It’s never stopped me from holding them accountable, though. Ask WVU and its medical school what a pain in the ass I was when I was in grad school there. There’s a dermatology practice in SoCal that threatened to sue me and the now-defunct San Diego News Network over my reporting. And, in theory, I believe publicly available complication rates are an idea that merits a lot of study. There’s evidence from New York that this kind of transparency has lowered patient mortality rates. There’s also evidence that doctors began cherry-picking easy cases to protect their numbers. It’s a complex topic, one Propublica didn’t do justice to.

First, let’s be clear — they’ve not published “complication rates.” Surgeons know their actual numbers. What Propublica has here is a statistically derived number that they arrived at through a particularly tortuous process. They used ICD-9 codes, billing information, and a panel of doctors and experts to decide whether a patient had a complication related to a recent surgery, based on the billing code and that code alone. They had no other access to case details, but they did have some basic demographic information about the patient, which they used to control for things like age and general health.
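To make “statistically derived” concrete, here is a minimal sketch of the general shape of a risk-adjusted rate, using invented data, made-up coefficients, and a plain logistic regression. This is my illustration of the genre, not Propublica’s actual model; every number in it is fictional.

```python
# Illustration only: invented claims data and a plain logistic regression.
# Not ProPublica's model -- just the general shape of a "risk-adjusted" rate.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "surgeon":     rng.choice(list("ABCDE"), size=n),
    "age":         rng.integers(50, 90, size=n),
    "comorbidity": rng.integers(0, 4, size=n),   # crude stand-in for "general health"
})
# In this toy world, risk depends only on the patient, never on the surgeon.
true_p = 1 / (1 + np.exp(-(-5.0 + 0.03 * df["age"] + 0.4 * df["comorbidity"])))
df["complication"] = rng.binomial(1, true_p)     # the flag a billing code would imply

# "Adjustment": model patient-level risk, then compare each surgeon's
# observed complications to what the model expected for their case mix.
X = sm.add_constant(df[["age", "comorbidity"]].astype(float))
fit = sm.Logit(df["complication"], X).fit(disp=0)
df["expected"] = fit.predict(X)

per_surgeon = df.groupby("surgeon").agg(
    cases=("complication", "size"),
    observed=("complication", "sum"),
    expected=("expected", "sum"),
)
per_surgeon["adjusted_rate"] = (
    per_surgeon["observed"] / per_surgeon["expected"] * df["complication"].mean()
)
print(per_surgeon.round(3))
```

Even in this cleanest possible toy case, the “rate” at the end is a model output, not a count a surgeon would recognize from their own case logs.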

Billing codes were never designed to communicate this kind of information, and they shouldn’t be used this way. That’s fatal flaw number one. The codes are often not chosen by the surgeon, and the surgeon often has no oversight to make sure they’re correct. I spent six months arguing with WVU and its lawyers over a mistakenly entered billing code on one of my own bills. An attorney even sent me a copy of the doctor’s order, thinking it proved him right. It didn’t.

Beyond that, the codes don’t provide enough information to know whether the diagnosis was actually a complication related to the surgery. In other words, Propublica is reporting these complications as fact when they are, at best, guesses. When the organization’s expert panel couldn’t agree on a complication, the case was tossed and not included in the analysis. As far as I can tell, they’ve not given more detail on that process. How often did doctors disagree? What was the threshold for tossing a case? Did disagreement arise for certain diagnosis codes or procedures more often than others?

Propublica’s agreement with the government means they can’t share their data, so we can’t ever be sure. But selection bias is a huge problem — it can dramatically alter an analysis — and the questions surrounding it deserve clearer, more concrete answers. One doctor, on Twitter, said Propublica’s analysis showed he had a zero complication rate. The problem is, his real rate, by his own admission, wasn’t zero. That should stop any journalist in their tracks. It hasn’t here.
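To show why non-random exclusions matter, here is a tiny simulation under a purely made-up assumption: suppose the panel deadlocks more often on cases that really were complications. Dropping those cases pulls the measured rate below the true one. The disagreement rates below are invented and reflect nothing about Propublica’s actual panel.

```python
# Toy simulation of non-random case exclusion. The disagreement rates are
# assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(1)
n_cases = 20_000
is_complication = rng.random(n_cases) < 0.08          # true 8% complication rate

# Assume ambiguous chart language makes the panel deadlock more often on
# true complications (30%) than on clean recoveries (5%).
disagree_prob = np.where(is_complication, 0.30, 0.05)
panel_disagrees = rng.random(n_cases) < disagree_prob

kept = ~panel_disagrees                                # tossed cases never enter the analysis
print(f"true rate:     {is_complication.mean():.2%}")
print(f"measured rate: {is_complication[kept].mean():.2%}")  # biased low
```

Flip the assumption and the bias flips with it; the point is simply that without knowing how often and why cases were tossed, the size and direction of the distortion is unknowable.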

And it doesn’t get better from there. The small sample sizes dictate wide 95 percent confidence intervals. That means, for all intents and purposes, the results for many doctors in the analysis are absolutely meaningless. Check the database for yourself and note how many intervals overlap all three complication rate categories: low, medium and high. As a patient, the very person Propublica says it had in mind when it published this data, what are you supposed to do with that? Basically, your surgeon could be very good, good, or merely okay. Feeling safer?
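Here is what that overlap looks like in numbers, using an invented surgeon with 40 Medicare cases and placeholder category cutoffs (these are not Propublica’s actual boundaries):

```python
# Invented example: a surgeon with 40 cases and 2 flagged complications.
# The category cutoffs are placeholders for illustration.
from statsmodels.stats.proportion import proportion_confint

cases, complications = 40, 2
lo, hi = proportion_confint(complications, cases, alpha=0.05, method="wilson")
print(f"point estimate: {complications / cases:.1%}")
print(f"95% CI: [{lo:.1%}, {hi:.1%}]")                 # roughly 1.4% to 16.5%

bands = {"low": (0.00, 0.03), "medium": (0.03, 0.06), "high": (0.06, 1.00)}
overlapped = [name for name, (b_lo, b_hi) in bands.items() if lo < b_hi and hi > b_lo]
print("bands the interval overlaps:", overlapped)      # ['low', 'medium', 'high']
```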

The results just can’t be trusted. One post-analysis I’ve seen argued, convincingly, that Propublica’s results were not much better than random chance. Random chance. That’s usually a death knell for a statistical analysis.

And that brings us to another point about complications — Propublica didn’t control for the fact that some doctors are fixers. They have a habit of taking on very complex patients or very difficult cases with inherently higher complication rates. Does that mean they’re unsafe surgeons? No; it might mean they’re amazing surgeons who could be a patient’s only hope. Propublica, however, runs the risk of branding them as unsafe, and that’s not only unfair, it’s probably defamatory.

On Twitter, a Propublica staffer who worked on the story said they’d considered submitting their research to journals for publication, but they were in a hurry to get this information out to the public. If you’re a journalist, you understand that motivation. You have something that’s going to get you a lot of attention. Why bury it in a journal? In this case, skipping peer review was a huge mistake.

For one thing, peer review would’ve exposed the flaws in their methodology. The senior reporter on the story is fond of touting the expert opinions of those who worked with them on this analysis as proof they did a good job. Fine. These doctors and public-health experts are proud of their work. That’s not peer review, however. Last time I checked, Mark Regnerus is still really proud of his now-debunked research pitching gay parents as a danger to children. And that piece actually underwent something that looked, at the time, like peer review.

But more importantly, it’s a sign of arrogance and hubris. They’re entering a realm of science. They’re using science to do journalism-type things. Doctors use science to do healing-type things. Evidence-based science. Peer-reviewed science. What Propublica is saying to the thousands of surgeons in their database, who’ve spent their entire adult lives in the sciences, is this: we — an upstart group of journalists — are so good at science that we don’t need your peer review. We can do this so well that a process refined over time by brilliant minds to help prevent exactly these issues is useless to us. We’re just that good.

And look where this has gotten them. On the defensive and under attack by doctors who otherwise might welcome a new metric to gauge safety and better protect patients. They could’ve had allies, and now they just have enemies. In their zeal and, I’ll say it again, hubris, they’ve now risked making doctors even more resistant to transparency. A complication rate is something every. single. patient should know before undergoing a procedure — but the number has to be accurate. It has to be real. And it has to be put in context.

Propublica has targeted surgeons because of the high-minded ideal that surgeons should be responsible for everything that happens to a patient under their care. That’s a good policy, because the surgeon is the one best positioned to care for the patient at every level. But as the crux of a journalistic pursuit regarding complication rates, it doesn’t bear much on what actually causes complications or what happens in real-life situations. Surgeons, for example, have zero control over anesthesia. Anesthesia-related complications, at least as far as Propublica’s database is concerned, are a ding against the surgeon. Is that fair? Does it accurately represent the surgeon’s ability? It doesn’t seem even this very obvious fact has been considered.

That leads me to this. The database is so flawed it should be retracted. The more the statistics are picked at by outside experts, the more they fall apart. The confidence intervals alone should’ve given any competent researcher pause. And if Propublica actually cares about its work and its own integrity, it should immediately submit that work to peer review and at least suspend access to the analysis while that process is ongoing.

I have other suggestions, like, say, examining the inherent bias in giving a staffer the title “patient safety reporter” (or, at least, allowing that reporter to say they cover “patient safety”). If you don’t have the self-awareness as a journalism organization to see how you’ve cast one of your own as an advocate first and a journalist second, you’ve got some very big problems (ones that, as I’ve alluded to, can come from not being challenged by your colleagues).

The second is to actually listen. I’ve seen a lot of snarky comments on Twitter from Propublica staffers, I’ve seen a lot of deflecting (I’ve been told several times that I can contact all of the experts who helped with the database, even the ones they allowed, for some inexplicable reason, to go on background) and I’ve seen a tremendous amount of just ignoring legitimate questions and concerns. Late yesterday, Propublica found a flaw in its code that was omitting entire groups of doctors, and they were then asked how they could prove no actual cases were missing. That’s something that could actually affect individual surgeons’ numbers.

We got silence.

I’ll give you that again. Silence. To a scientist asking questions about the data in, say, a paper, that silence would be a very bad sign. What should we make of it in regard to Propublica’s work?

And to the journalists covering this database — the dozens of you parroting back Propublica’s announcement language as though you were stenographers and not reporters, casting nary a critical eye on the project — do me a favor (and forgive the language): get your heads out of your asses and do your damn jobs. I’ve yet to see a story by a mainstream outlet covering the critical backlash to the Surgeon Scorecard, and I’m not holding my breath that there ever will be one.

It seems Propublica wants to have it both ways. They want to do journalism, and they want to do science. They just haven’t proven that they know how to do science and safeguard the process — and as history has taught us, bad science is in no one’s interest, the public’s or otherwise.

ps — I’m sorry for the typos and poor writing. A lot of thoughts and I have an appointment coming up soon that I can’t get around, so this is pretty raw. Update: some light edits for clarity… pesky pronouns.

*I was thinking about this and just want to add: I know a lot of amazing journalists — my former CityBeat colleagues and many others I met in San Diego, for instance. They didn’t do the kinds of things that attracted Fishbowl’s attention, precisely because they’re that good.