Data Science in a Crisis. Episode 2! The Professionals strike back, or return, or something.
This is another post which isn’t about Covid-19.
It’s about what happens to data science in a crisis and the lessons we can draw from that, and mostly about what people who are dealing with data science outputs in a crisis need to do: the process and skills they (you) need to deal with what they are being told.
In the last post of this (hopefully, cough, cough) series, I talked about some of the… I’ll be professional… problems and issues around amateur/outsider data science and analysis in a crisis situation.
This time I’ll look at the outputs of the professionals — after all the bard said something about people who don’t love not showing love. So show some love I will.
The objective is not to do a review in the traditional sense, because I don’t have the expertise to do that, and nor, I suspect, do you. Instead I want to show how it is possible to evaluate and interrogate the outputs of a data science exercise without technical knowledge, from a professional perspective instead. My aim is to argue that non-professionals can do this without simply looking at which graphs seem to be best drawn or how charismatic the people pitching it are. So, onward with that.
Imperial College published some analysis with the snappy title “Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand.” [1]
The title isn’t catchy (henceforth I’ll call it Covid-NPI), but I think there is a lot to admire about the contents of the paper. I won’t rewrite it. Lots of people seem to think that they can take this paper and represent it for wider consumption; they may be right, but I am afraid that if I were to do that I would potentially misrepresent the paper in the round, so I think the best thing for people to do is to actually read it themselves, end to end. This is also not a review of the paper in any traditional sense; I am simply not equipped to properly review it, as I will explain later in this post. Short version: I do review papers as part of my professional practice for the UK national funding bodies, the EU and some other people, but those are about computer science and not epidemiology, and I can’t do epidemiology.
In my last article I looked at the basis of some of the reasoning in the Covid-19 blog posts I chose and concluded that they came up short, based on some straightforward critical reasoning about what they were trying to do with the data and analysis. I looked to see if the work was numerology or data science: had the data been understood, and were the models well founded and consistent? I also set out four ways to evaluate data science without looking at the details, but instead by looking at the things that produce it. That’s what I will try to apply to the Covid-NPI paper.
I believe this is a good test of data science work in the real world, and you can do it quite quickly. Working through the Covid-NPI paper probably took me about 16 hours, two days' work (spread out over quite a lot of evenings, because I have a job and a family). Obviously doing the write-up took longer, but if you are doing this process yourself you don’t need a decent write-up, just a list of complaints to go back to the team with (if you find some, that is).
Right let’s go folks! It’s time for a Rodeo!
Step one: what have you been given?
First of all, notice that the Covid-NPI paper’s title is limited and specific. This isn’t a paper that claims to have all the answers, and it’s purposeful. Covid-NPI has a defined use at the outset; it is for something and it will give clear direction. This is a good sign, but not a deal breaker/maker either way.
The paper is structured with an Introduction, Method, Results and Discussion; again a good sign, but no more.
In the results we see quantitative outcomes — actual numbers, not trends or patterns, but numbers.
In the discussion (and summary) there are specific outcomes and recommendations (tempered by the team’s scope of responsibility).
From this I conclude that I have been given a tool for making future decisions — if the tool is good enough to use. If I can’t conclude this from this kind of inspection I can’t really see the point of spending more time on it.
Step two: can the results possibly stand up?
I didn’t bother much with the introduction. I am sure it’s very good, but I really don’t care about the whys and wherefores, although I know that if you don’t put them into a document like this, some people get really disorientated and upset.
What I wanted to know was how the team did the work, and can I believe their results? So I focused on the method. The first line is worth quoting: “We modified an individual-based simulation model developed to support pandemic influenza planning (5,6) to explore scenarios for COVID-19 in GB.” Snappy, but what are they on about?
So, I went and had a look at [5] and [6] (you should too; I struggled with some of the content so I won’t comment on what it really means, but figure 4 of [5] is a doozy and it’s worth knowing about when you are watching the news in the next few months).
[5] Ferguson NM, et al. “Strategies for mitigating an influenza pandemic.” Nature, 2006. Retrieved March 2020 from http://courses.washington.edu/b578a/readings/ferguson2006.pdf
[6] Halloran ME, et al. “Modelling targeted layered containment of an influenza pandemic in the United States.” PNAS, 2008. Retrieved March 2020 from https://www.pnas.org/content/pnas/105/12/4639.full.pdf
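Before digging in, it is worth unpacking what an “individual-based simulation model” is: rather than tracking aggregate compartments of people, you simulate discrete individuals and the contacts between them. Here is a toy sketch of the idea in Python; it is emphatically not the Covid-NPI model, just the bare concept, and every parameter in it is invented.

```python
# Toy individual-based epidemic: discrete people, random mixing, a fixed
# per-contact infection chance. This is NOT the Covid-NPI model, just an
# illustration of the general idea; all parameters are invented.
import numpy as np

rng = np.random.default_rng(42)
N, DAYS = 10_000, 100
P_INFECT, P_RECOVER, CONTACTS = 0.03, 0.1, 10

state = np.zeros(N, dtype=np.int8)            # 0 = susceptible, 1 = infected, 2 = recovered
state[rng.choice(N, 10, replace=False)] = 1   # seed ten initial infections

for day in range(DAYS):
    infected = np.flatnonzero(state == 1)
    for person in infected:
        contacts = rng.integers(0, N, CONTACTS)               # random mixing
        hits = contacts[(state[contacts] == 0) &
                        (rng.random(CONTACTS) < P_INFECT)]
        state[hits] = 1                                        # new infections
    recovers = rng.random(len(infected)) < P_RECOVER
    state[infected[recovers]] = 2                              # recoveries

print(f"Ever infected: {np.mean(state != 0):.0%} of the population")
```

The real model replaces the “random mixing” line with households, schools, workplaces and commuting patterns built from data, which is exactly where [5] takes us next.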
[5] tells me that the model in use was developed on the basis of high-resolution population data, so I look up that data (helpfully referenced; the link is now dead, but two seconds on Google finds the right one) and find [3], the LandScan data repository. I have a look at the resource and discover that this data, compiled at Oak Ridge National Laboratory in the US, gives expected population values for roughly every square kilometre (30 arc-second cells) in the USA and the UK (in fact, in basically the whole world). I look at the samples and discover that I can now find out where the population of Cyprus hangs out for free — admittedly if free includes spending several hours fiddling round with strange Python libraries. But it’s there, and it contains data that corresponds to reality as far as I can see (for Cyprus).
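For the curious, the Python fiddling looks roughly like this. A minimal sketch, assuming you have downloaded a LandScan-style GeoTIFF (the file name here is made up) and have the rasterio library installed:

```python
# Minimal sketch of pulling a population total out of a LandScan-style
# GeoTIFF. The file name is hypothetical; rasterio is assumed installed.
import rasterio

with rasterio.open("landscan_cyprus.tif") as src:   # hypothetical file name
    pop = src.read(1).astype(float)                 # band 1: people per grid cell
    if src.nodata is not None:
        pop[pop == src.nodata] = 0.0                # zero out no-data cells (sea, etc.)

print(f"Estimated population: {pop.sum():,.0f}")
print(f"Busiest single cell:  {pop.max():,.0f} people")
```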
However, just having expectations about where people are during a day isn’t enough, so [5] describes the need to generate mixing areas — schools and workplaces — and distribute them relative to population density. I suppose this is something you could pick on; the artificial population of schools and workplaces will be a little different from the real-world one. But Covid-NPI is clear about what has been done to create this part of the data. The authors go on to describe how they acquired another part of the data for their model: the information on the kinds of people they are simulating in the epidemics of interest [4]. Some 7,800 people took part in a study to understand how people in different demographics mix, with around 90,000 interactions recorded. Overall it is clear that the model of population and behaviour is meticulous and comprehensive, and that significant effort has been expended in underpinning it with quality data.
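To make “distribute them relative to population density” concrete, here is a rough sketch of the general idea. It is not the authors’ actual algorithm; the toy grid and every count in it are invented.

```python
# Sketch of placing mixing areas (e.g. schools) in proportion to population
# density. NOT the Covid-NPI algorithm; the toy grid and counts are invented.
import numpy as np

rng = np.random.default_rng(0)

# A toy 100x100 population grid (people per cell), denser towards the centre.
x, y = np.meshgrid(np.linspace(-1, 1, 100), np.linspace(-1, 1, 100))
population = 1000 * np.exp(-(x**2 + y**2) / 0.2)

# Sample cells for 200 schools, weighted by each cell's share of the population.
weights = population.ravel() / population.sum()
school_cells = rng.choice(population.size, size=200, p=weights)
rows, cols = np.unravel_index(school_cells, population.shape)

print(f"Placed {len(rows)} schools; the densest cell received "
      f"{np.bincount(school_cells).max()} of them")
```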
In [6] the model used in Covid-NPI is compared empirically to two other models. First off (hang onto your seats here, folks), it turns out that the Covid-NPI authors wanted to make sure that their models were right, so they thought of a way to test them, and then they tested them. Now, even if the evaluation and comparison turns out to have been weak sauce, the fact that they tried to do it at all is a serious marker of quality. In the spirit of being a bit of a spoilsport, though, let’s have a go at the comparison and see if we can debunk it. Because that would be fun!
There are three aspects of the comparison that are informative in judging the quality and applicability of the model — in plain speak… why should you trust this?
1. What is being compared?
2. How are they being compared?
3. What is the outcome?
There are two other models: one from the University of Washington / Fred Hutchinson Cancer Research Center / Los Alamos National Laboratory, and one from the Virginia Bioinformatics Institute. Both of these groups appear to be founded in large institutions enjoying long-term, stable funding (I am sure they would like larger, longer-term and more stable funding, but it looks pretty good to me).
On how the models are compared: “The intervention scenarios and baseline R0 values examined were selected in consultation with government employees working with the Homeland Security Council and the Department of Health and Human Services in the United States” [6]. So this test was not just made up; it was deliberately designed to demonstrate something meaningful.
What is the outcome of the test? Well: “Especially at values of R0 of 2 or below, the more probable values for a pandemic strain, the interventions are similarly, although not identically, effective in all three models. At the lower R0, all three models show considerable effectiveness of the suite of NPIs. School closure plays an important role in all three models.” This is interesting; it shows that the models generally agree, but it highlights some of the differences between the evaluation scenario and what I understand of the current position. I’m not expert enough to speculate, though, so all I can really draw from this reading is that a detailed evaluation of the models was made by a team that included people external to the model development team, and that this evaluation has passed peer review, passed the review of an invested funding body (the National Institute of General Medical Sciences), and (as far as one can tell) has stood the test of time of community evaluation, in that these papers have not been withdrawn or superseded. I checked that by looking up citations of [6] with the intent of discovering any criticism. A comprehensive survey [7] cites it as part of a second “golden age” of progress in epidemiology. In [8] it is cited as an example of a group of studies that motivate the development of guidelines for conduct. Neither reference criticizes [5] or [6].
Step three: does the method underpin the results logically and systematically?
OK, so having taken a hatchet to the methods in the paper, and having come away with a blunted and broken hatchet and no significant scratches on the machine presented by Covid-NPI, let’s examine one of the artifacts the authors develop, which I read as representative of the Covid-NPI paper’s reasoning.
I’ve copied this directly from the paper, so it’s labelled figure 2 (there is no figure one in this post!). So, is this any good? The first thing I note is that there is a benchmark projection, labelled “Do nothing”. Good analysis provides a base case, and this has a base case. The second thing I note is that there is a red line across the chart: the capacity of the health system, the specific limit that the authors hoped the charted mitigations would avoid. This gives the chart something to be measured against and a simple take-away conclusion. The line isn’t “going to the moon” or “exponential”, neither of which is genuinely useful except as commentary. The chart tells us that a particular set of interventions is not going to work. The authors have driven a nail into that, and now they move on.
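If you want the shape of that chart in the abstract, here is a toy reconstruction: a base case, a mitigated scenario, and the hard capacity line both are measured against. Every number is invented purely for illustration.

```python
# Toy illustration of the chart pattern being praised: a "do nothing" base
# case, a mitigated scenario, and a hard capacity line to measure both
# against. Every number here is invented.
import numpy as np
import matplotlib.pyplot as plt

days = np.arange(200)
do_nothing = 300 * np.exp(-((days - 90) ** 2) / (2 * 20**2))    # tall, early peak
mitigated = 120 * np.exp(-((days - 120) ** 2) / (2 * 35**2))    # flatter, later peak
capacity = 80                                                    # arbitrary units

plt.plot(days, do_nothing, label="Do nothing (base case)")
plt.plot(days, mitigated, label="With mitigations")
plt.axhline(capacity, color="red", linestyle="--", label="Health system capacity")
plt.xlabel("Days")
plt.ylabel("Critical care demand (arbitrary units)")
plt.legend()
plt.show()
```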
Step four: am I being blinded with science?
So, I can’t undermine the arguments in the paper with simple analytic/critical thinking. But are the team a good team working in good faith? They could be plausible liars, they could in good faith be spinning a line, or they could be doing this with out-and-out malicious intent. I could be fooled; this could be BS.
How can I know?
I think all we can do is look at the characteristics of the team and process. In the last post I set out the following tests of this:
- Process (was there a process that produced the analysis?)
- Team (is there a team that has worked together to develop the analysis — or is this a lone wolf effort?)
- Experience (are there people in the team with domain expertise?)
- Behaviour (do the team stand together? Are they open to discussion?)
For Covid-NPI, was there a process? When I downloaded the report I got it from the MRC centre page with Covid-19 updates. I had been looking at that for a few weeks already and I was aware of the other reports there; you can look at them too [9]. Covid-NPI is number 9 in a series of reports by the centre investigating the properties of the epidemic/pandemic, the first of which was published on 17th January. This progression of work, looking at the details and fathoming the parameters of a model, is strong evidence of a systematic and deliberate process of knowledge creation, confirmation and dissemination.
On team, the paper trail gives some evidence that the team grew from a core group of about eight to 31 for Covid-NPI. I expect that a lot more people were involved in supporting the report’s authors; there will be students running and debugging things, tech support and more. A team of 31 sounds impressive, but 31 silly people makes a team of silly people. How do we know that this is not a team of silly people?
The signifier for me is that this is a team with affiliations to four institutes, which means a multiplicity of income streams and collaboration structures. It is true that some PIs in academia are silly people who have either lucked out, won the “game of thrones” style politics of academia (either by being very nasty or very socially adept), or are just independently wealthy and therefore able to build a career regardless of the derailing blows that empty out cohorts of the less blessed. But in my experience they are quite rare, and other academics are quite wary of them. Additionally, folks like that are generally rather poor at being part of the sustained collaborations that knit together cross-institutional teams like this.
Experience: there are 31 people on the author list. Obviously the lead author is extremely experienced (we know, having had a look at [5] and [6]), but what of the others? I sampled a few of the contributors at random to find out. Arran Hamlet is on the list; he is a research assistant and PhD student at Imperial, with experience working on yellow fever outbreaks in Africa and Brazil and some experience of working on the progress of Ebola-like viruses. Hayley Thompson is also on the list; she has been working on a project to model the impact of vaccines on malaria transmission. Han Fu has published models of tuberculosis infection. It seems that the team have been involved in a diverse set of real-world projects with commonalities to this work.
Behaviour: Ferguson (first author) has been prominent in the media discussing his team’s findings (right up until he came down with virus symptoms), and Ghani has also appeared frequently in the media. The team do seem to be engaged and open to debate.
Using the process
So — I can’t find a thing to point to and complain about in Covid-NPI, so am I saying it is right?
No: I am saying that I just can’t find a problem with it. I am not competent to say if it is right or not. I am a professional data scientist; I know Python, Julia, R, Tensorflow and all the tricks. I know Bayesian statistics, although some of my collaborators know that I struggle with all that “them thar math” of conjugate distributions business. But anyway, I try very hard at Bayesian statistics.
I know that I can’t reproduce the work in Covid-NPI, at least not this year, even if I had all the data to hand. I don’t know enough to be able to point to alternative methods or approaches.
Despite 25 years of training and experience in a near neighbour of the topic underpinning the Covid-NPI work, I am just a bystander and consumer, and I have to evaluate what I see on the basis of the signifiers above. It’s a funny feeling for me, as I have happily out-analyzed people for my whole adult life. Almost everyone is clueless and makes the same dumb mistakes time and again; the sieve that I went through above fishes them out, and I can dance about pointing at them, hooting and waving my bum in their general direction.
Here I have to say “I don’t know, those people seem to have done a really good job and now I have to decide if I am going to be rational and back them as the best bet, or irrational and back something else that is more attractive to my prejudices and preconceptions.” What’s a lad to do ehh?
Wrap up context witter
Of course, for the Covid-NPI case and team there is a really nasty twist: if people do back this analysis and follow it, then they will appear to be proven wrong, because the data will not emulate the nasty graphs and people will point at this and say “Y2K all over again, innit”. If people don’t follow the analysis, then the data will follow the nasty graphs (I am backing them implicitly now) and people will say “didn’t work, load of rubbish”. The Covid-NPI team will feel pretty sick about this, and it will go on and on, but hopefully other people will say “OK, here’s a seat in the House of Lords and lots of cash for your research”, which might be a consolation to them. Especially because, over time, I expect that the people throwing stones will look crazier and crazier and the seat in the House of Lords will feel comfier and comfier.
But that is all in the future anyway; good luck and God bless. For me, the interesting thing for now is how this paper has been received in the throes of the crisis. Folks, it’s been received in exactly the same way that bad news has been received since time immemorial: unevenly, and in some quarters not so well.
So, this is problem 2.
It is very, very hard to accept bad news. It is very, very easy to throw bricks at teams of people who aren’t necessarily charismatic, fun, people like me, people who I want to be like, people in the latest style… More classically for data scientists, people who aren’t on the front line, people who didn’t go to the school of hard knocks, people who don’t know about marketing (sales, engineering, accounts, HR) — people who aren’t in my tribe.
At the same time, it’s critical for data science teams, and the organizations that host them, that their insights are not limited to good news only. That’s not just the way to get blindsided; it’s the way to get blindsided having spent your contingency and dumped your insurance. Or, one might say, it’s the way to get infected by a pandemic having destroyed your capability to monitor and react to it, and to deal with the outcome.
How can we stop this from happening though? It does happen (cf. no testing, no management, no coping) — how do we win the land war, do the hand to hand combat and retain our integrity? What do we do when the bad news has to be delivered and how do we survive afterwards?
I might write about that next. If I can be bothered, and if the world hasn’t ended in the meantime.
References
[4] Mossong J, Hens N, Jit M, et al. “Social contacts and mixing patterns relevant to the spread of infectious diseases.” PLoS Med 2008;5(3):0381–91.
[5] Ferguson NM, et al. “Strategies for mitigating an influenza pandemic.” Nature, 2006. Retrieved March 2020 from http://courses.washington.edu/b578a/readings/ferguson2006.pdf
[6] Halloran ME, et al. “Modelling targeted layered containment of an influenza pandemic in the United States.” PNAS, 2008. Retrieved March 2020 from https://www.pnas.org/content/pnas/105/12/4639.full.pdf
[7] Pastor-Satorras R, Castellano C, Van Mieghem P, Vespignani A. “Epidemic processes in complex networks.” Reviews of Modern Physics 2015;87(3):925.
[8] Den Boon S, Jit M, Brisson M, Medley G, Beutels P, White R, Flasche S, et al. “Guidelines for multi-model comparisons of the impact of infectious disease interventions.” BMC Medicine 2019;17(1):163.
[9] MRC Centre Covid-19 reports page: https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/news--wuhan-coronavirus/