Weeknotes S1E6

Jessica Rose Morley
12 min read · Apr 6, 2019


My name is Jess. I am AI lead at the UK Department of Health and Social Care (DHSC), which is in the process of becoming NHSX. I’m also an MSc student at the Oxford Internet Institute (OII), University of Oxford, and a Research Assistant at the OII’s Digital Ethics Lab (DELab). I am supervised by Professor Luciano Floridi.

[Image: My Kindle Cloud Reader Library]

Things that happened

  • It was April Fool’s Day on Monday so we had a little bit of NHSX fun (mostly this made me realise that I laugh too much at my own jokes).
  • The AI Mission* Delivery Board took place at the beginning of the week. It was chaired by our Minister for Innovation, which always adds a bit of extra pressure. Luckily I think it was a pretty good meeting, although this might primarily be because Indra and I drew a picture and used post-its in a senior-official board meeting. The picture was drawn in multi-coloured sharpies and the post-its were all bright pink. This, combined with the fact that we only decided to do this at about 9pm the previous evening, made me really happy for some inexplicable reason. Things we discussed included: (1) what metrics we want to assign to the Mission, and what it should specifically be aiming to achieve; (2) next steps for the development of the Code of Conduct. Reactions to the plans were positive and I’m hoping to be able to share more very soon**

*The AI and Early Diagnosis Mission comes from the Prime Minister and is ‘to use data, Artificial Intelligence and innovation to transform the prevention, early diagnosis and treatment of chronic disease by 2030’

**In an ideal world we wouldn’t make decisions in closed boardrooms

[Image: This photo is too close to my face…]
  • When Indra and I blogged about the new version of the Code of Conduct in February, we said that we would have to get much clearer about what ‘good looks like’ for principle 7: ‘Show what type of algorithm is being developed or deployed, the ethical examination of how the data is used, how its performance will be validated and how it will be integrated into health and care provision.’ We’ve been working with a variety of partner organisations, including Future Advocacy, to deliver on this promise. This week we began testing our ideas with an expert roundtable (we’ll be testing with patients and those in the care community next week, followed by a roundtable with a wider group of stakeholders). The notes I took down to think about [NB these are notes from my policy brain, not my technical brain; I will need to write something more in-depth from that perspective later]:
  1. Guidance is not sufficient if it is not embedded in a wider infrastructure that helps support companies of all sizes to demonstrate good practice in this area. It’s complex, and we need to make sure that we’re not giving large companies, with large amounts of resources to dedicate solely to this, a significant advantage over SMEs or academic researchers. This might include hosting workshops with facilitators to guide inexperienced developers/commissioners etc. through the process.
  2. What good looks like in this area needs to be focused on the problem being solved, not the solution, as this provides context that will be essential if we are to be proportionate to risk (i.e. it will need to be defined at the appropriate level of abstraction).
  3. We need to be talking a common language; at the moment, the vocabularies of ethicists, data specialists, technologists, developers, business managers and patients don’t match. I personally really like this glossary from IEEE ethically-aligned design and would like to at least explore whether we can adapt it before we try to do something different.
  4. We should make it possible for any company demonstrating good practice to publish this in a centrally accessible location, so that it becomes something companies actively want to do as a way of staying competitive. We can then also use these examples as case studies to demonstrate, to those less confident in their ability to follow any guidance we provide, that it is possible.
  5. Ethical examination and performance validation will require bringing together wide perspectives, not just in terms of diverse viewpoints but also in terms of the definition of ‘performance’, e.g. economic impact, social impact, clinical effectiveness, etc.
  • NHSX announced some stuff. It’s genuinely starting to feel exciting as it comes together. There’s still a lot to work out, but I like the direction of travel, and the values that are embedded (e.g. promote openness and standardisation, limit hierarchy) are closely aligned with my own personal values. Change is always scary but I’m choosing to embrace it.

Things I finished

  • I finished reading and responding to the WHO’s Public Consultation on the Draft Global Strategy on Digital Health:

It defines Digital Health as:

“the field of knowledge and practice associated with any aspect of adopting digital technologies to improve health, from inception to operation”

Its Vision Statement is:

“Improve health for everyone, everywhere by accelerating the adoption of Digital Health”

I very much like the fact that, as it says in the document, the definition of Digital Health it adopts frames digital technologies as a means to an end, putting the emphasis on improving health rather than on specific ‘widgets’. I think, ideally, you would also want clarification of what ‘improving health’ means, or at least recognition that this will look very different for different people, and that we should be creating a digitally-enhanced health and care system tolerant of this plurality. Extending this further (although I haven’t said this part in the response because it would be daft), and being a proper nerd, I would like to adapt the vision statement to:

‘Improve health for everyone, everywhere by making full use of all available human knowledge and practice’

This is because I think that what we should be doing is agreeing culturally sensitive ‘improved health’ goals and enabling these to be achieved with the best knowledge and associated tools we have available to us on any given day. This knowledge and its associated tools of practice (available technologies) may or may not be digital, or at least not directly digital.

The strategy does do a reasonable job of presenting a balanced view of the opportunities and risks associated with digital health technologies (DHTs). It states that DHTs “open new ways of interacting with individuals, citizens, families, communities, patients and health care workers” but also introduce new risks associated “with the increasingly sophisticated collection and misuse of personal data”, and that “it will promote ways to protect populations against the misuse of information, cyber-attacks, fraud, extortion, fake news, racism and other human-rights violations” as well as “stimulate the adoption of health technology assessment methods that support and encourage effective innovation.”

Where it falls down is in linking this recognition with its discussion of metrics for progress evaluation, which is almost entirely focused on progress measurement and monitoring, i.e. it assumes that we should launch the digital technology in question and that we only need to measure its relative level of success. There is no recognition of the need to create metrics that can be used to assess whether harm would come to the overall system as a result of introducing a new DHT (rather than assessing the DHT itself), or, for example, whether there will be metrics to ‘improve the health’ of specific population groups that are currently very underserved. I think this is extremely hard to do (or at least to do well), but I worry that if we don’t do this as an international community we will create significant pockets of cumulative disadvantage/advantage. This is because:

A) Digital technologies run on data

B) Healthcare data, overall, is highly biased, skewed towards more affluent communities (especially in places that don’t have a national health service).

C) These communities tend to already be served by high-performing healthcare establishments.

D) Because these healthcare establishments are performing well, they do not get penalised (fined) for underperforming and/or they make significantly more profit.

E) This means that these establishments have more available resources to invest in digital technologies.

F) These establishments then get more investment to act as the ‘leaders’ in this space, even though this further investment will make very little difference to the outcomes of the patients they serve.

G) The gap between the low-performing areas and the high-performing areas widens, undermining the self-efficacy of the under-performing areas and limiting their desire to ‘try’ to invest in digital.

H) This process is mirrored in the health outcomes of the people served by these under-performing areas: their confidence in their ability to improve their own health diminishes (because they have limited tools available to help them do so, compounded by environmental factors), they become uninterested in the digital agenda, and so they generate less data.

I) This entire cycle becomes self-reinforcing.

Instead, I would really like to see commitment at an international level to improving the health outcomes of underserved communities, to understanding which barriers can be alleviated by capitalising on digital technologies, and to investing there, at the bottom, rather than at the top first. This will have a greater likelihood of achieving the goal of ‘ensuring no one is left behind’ and will also create a greater opportunity for demonstrating evidence of impact, given that the potential for ‘gain’ is so much larger than it is in areas already doing well. We would then need to complement this with a mechanism for measuring ‘harm’ (this is very different to measuring under-performance) and a way of dealing with it when identified (e.g. withdrawing the service and replacing it temporarily with an ‘offline’ option).

  • I also finished reviewing the FDA’s discussion paper on a Total Product Life Cycle approach for continuously updating AI clinical software.

[Thanks to the mansplainer in my comments] I am aware that the FDA is the regulator in the US and that there will be differences between their market and ours, but I always think it’s good to look for opportunities to share learning both ways. I strongly believe that in an area with so many ‘open questions’ there is no point in being competitive in a ‘mine is bigger’ or ‘we went first’ fashion; the aim should just be to make good stuff happen, regardless of who came up with the good idea. Then we can adapt it to fit the context.

I think it presents an interesting proposition and I like the three-stage approach (it’s very similar to our develop, deploy and use model), but I don’t think this (yet) represents a truly Total Product Life Cycle approach, because the entire focus is on the output and on monitoring the model. There is very little discussion in the paper of the need to create ‘check-points’ of regulation at every stage of development, from idea → data collection → data preparation → model selection → training of model → testing of model → evaluation of model → parameter tuning. It assumes that there is a mechanism for monitoring continuous development at set times once the model is deployed, and trusts that if a pre-market test of the company in question indicates that it abides by principles of good behaviour, there’s no need for regulatory approval for any of these first phases. There may be very good reasons for this (and I could sit and debate the pros/cons for several very boring hours), so I wouldn’t want to assume that they are not thinking of this; it just currently stands out as not-quite-true to say this is a TPLC approach when half of the lifecycle isn’t covered, in this particular document anyway.
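To make that last point concrete, here is a minimal sketch (entirely my own illustration, not anything proposed in the FDA paper) of what per-stage regulatory ‘check-points’ could look like. The stage names follow the pipeline above; the class and check names are hypothetical:

```python
# Hypothetical sketch of per-stage 'check-points' across the lifecycle.
# Stage names follow the pipeline in the text; everything else is
# illustrative, not taken from the FDA discussion paper.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

STAGES = [
    "idea", "data_collection", "data_preparation", "model_selection",
    "training", "testing", "evaluation", "parameter_tuning",
]

@dataclass
class Checkpoint:
    stage: str
    checks: List[Callable[[dict], bool]] = field(default_factory=list)

    def passes(self, artefacts: dict) -> bool:
        # Every registered check must pass before work can proceed.
        return all(check(artefacts) for check in self.checks)

class Lifecycle:
    def __init__(self) -> None:
        self.checkpoints: Dict[str, Checkpoint] = {
            s: Checkpoint(s) for s in STAGES
        }

    def register(self, stage: str, check: Callable[[dict], bool]) -> None:
        self.checkpoints[stage].checks.append(check)

    def advance(self, stage: str, artefacts: dict) -> None:
        if not self.checkpoints[stage].passes(artefacts):
            raise RuntimeError(f"Checkpoint failed at stage: {stage}")
        print(f"Checkpoint passed: {stage}")

# Example: block the data-collection stage unless consent is documented.
lifecycle = Lifecycle()
lifecycle.register("data_collection", lambda a: a.get("consent_documented", False))
lifecycle.advance("data_collection", {"consent_documented": True})
```

The point is simply that each stage has to actively pass a gate before the next can begin, rather than relying on a one-off pre-market assessment of the company plus post-deployment monitoring.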

  • I finally ‘finished’ the typology of tools/methodologies available to ‘do’ ethical/responsible AI. It’s been a really great piece of research to conduct because it’s helped to quantify my belief that ‘we’, as an ethics-in-AI community, are spending way too long debating the issues rather than just getting on and doing something about them, when there are already so many people doing exactly that. We need to come together and start testing the tools/methods that are available, making the results of this testing process openly available for reproducibility purposes, and quickly working out what works and iterating what doesn’t.

Things I continued to work on

  • I’ve started to draft (and will finish tomorrow) the paper that accompanies the typology set out above.
  • A collaborator and I have continued to work on a paper exploring the use of mental health/wellbeing apps.
  • I started drafting a more formal write-up of the work Indra and I have been doing on creating the ecosystem for the safe development, deployment and use of data-driven health and care tech.

Things I thought about

  • Mostly this week I have thought a lot about openness and what it truly means at an organisational level. At the moment I think it often comes too late in the chain of events. It’s one thing to publish the results of a study, or the outcomes of a decision, and to ask for feedback; it’s another thing entirely to be open about how the decision was made in the first place or, indeed, to make the decision publicly. Working within central government I can completely understand how this might feel slightly terrifying, but I also think it would ensure public acceptability at the point of initiation, because all of the kinks could be worked out earlier. With this in mind, I have been wondering whether it’s time to rethink the purpose of Non-Disclosure Agreements. They are definitely important when it’s necessary to protect commercially sensitive information or information relevant to national security, but when we are talking about decision-making processes, advice given etc. for public services, it might be time for us to think of a new operating model.
  • In line with this, I’ve been thinking a lot about the value of community and community creation, as well as the importance of open source. I can’t help but think that in the responsibly-applied-AI space, particularly in health, we should let people come together, highlight work of excellent quality that has already been done, and have discussions as a community about where the gaps in resources/materials/tools/methodologies etc. are, instead of constantly thinking everyone has to do the cool thing independently of each other. If everyone wants to achieve a common goal, we should let everyone be involved, and we should think about the variety of ways we can enable people to be involved, so that we encourage as much diversity of thought and input as possible.

Things I learned

  • I watched this tutorial from Kolter and Madry (2018) on the theory and practice of adversarial robustness. It provides a pretty in-depth overview of the topic and combines both maths and illustrative code to highlight some of the key methods for developing deep learning classifiers that are robust to perturbations of their inputs by an adversary intent on fooling the classifier.
  • I will not attempt to explain it in detail, but I have been thinking about how one of the ways we make ‘AI’ meet the criterion of non-maleficence is to make models resistant and robust, so that outside attackers cannot make a system cause harm (a toy sketch of the kind of attack involved follows below).
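For anyone curious about what such an attack actually looks like, here is a toy sketch of the simplest one, the fast gradient sign method (FGSM), in PyTorch. The linear model and random data are throwaway placeholders of my own; the tutorial itself works through much stronger attacks (and the corresponding defences) properly:

```python
# Toy FGSM sketch: nudge the input in the direction of the loss
# gradient's sign, bounded by epsilon, to try to flip the prediction.
# The model and data below are placeholders, purely for illustration.
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                epsilon: float = 0.1) -> torch.Tensor:
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step so as to *increase* the loss as much as possible per unit
    # of l-infinity budget.
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

model = nn.Linear(4, 3)          # throwaway classifier
x = torch.randn(1, 4)            # random 'input'
y = torch.tensor([2])            # its supposed label
x_adv = fgsm_attack(model, x, y)
print(model(x).argmax().item(), model(x_adv).argmax().item())
```

Adversarial training, one of the defences the tutorial covers in depth, essentially folds examples like x_adv back into the training loop so the model learns to resist them.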

Things I read

Brothers, K. B., & Rothstein, M. A. (2015). Ethical, legal and social implications of incorporating personalized medicine into healthcare. Personalized medicine, 12(1), 43–51. doi:10.2217/pme.14.65

Colijn, C., Jones, N., Johnston, I., Yaliraki, S., & Barahona, M. (2017). Towards precision healthcare: context and mathematical challenges. Frontiers in Physiology, 8. doi:10.3389/fphys.2017.00136

Caulfield, T., & Zarzeczny, A. (2014). Defining ‘medical necessity’ in an age of personalised medicine: A view from Canada. BioEssays, 36(9), 813–817. doi:10.1002/bies.201400073

Feeney, O., Borry, P., Felzmann, H., Galvagni, L., Haukkala, A., Loi, M., . . . Vears, D. (2017). Genuine participation in participant-centred research initiatives: the rhetoric and the potential reality. Journal of Community Genetics, 9(2). doi:10.1007/s12687-017-0342-4

Finlay, T. (2017). Testing the NHS: the tensions between personalized and collective medicine produced by personal genomics in the UK. New genetics and society, 36(3), 227–249. doi:10.1080/14636778.2017.1351873

Hood, L., & Friend, S. H. (2011). Predictive, personalized, preventive, participatory (P4) cancer medicine. Nature Reviews Clinical Oncology, 8, 184. doi:10.1038/nrclinonc.2010.227

IEEE Ethically-Aligned Design Standards

James, J. (2016). The charms and harms of personalized medicine (pp. 245–281). A shorter version of this chapter was published in the European Journal of Epidemiology (James, 2014).

Juengst, E. T., Settersten, R. A., Jr., Fishman, J. R., & McGowan, M. L. (2012). After the revolution? Ethical and social challenges in ‘personalized genomic medicine’. Personalized medicine, 9(4), 429–439. doi:10.2217/pme.12.37

Kleine, D. (2011). The capability approach and the `medium of choice’: steps towards conceptualising information and communication technologies for development. Ethics and Information Technology, 13(2).

Nath, S. (2018). Risk shift: An institutional logics perspective. Administration & Society. doi:10.1177/0095399718760581

Nyatanga, L., & Dann, K. L. (2002). Empowerment in nursing: the role of philosophical and psychological factors. Nursing Philosophy, 3(3).

O’Hara, K., Tuffield, M. M., & Shadbolt, N. (2008). Lifelogging: Privacy and empowerment with memories for life. Identity in the Information Society, 1(1).

Riso, B., Tupasela, A., Vears, D. F., Felzmann, H., Cockbain, J., Loi, M., . . . Rakic, V. (2017). Ethical sharing of health data in online platforms- which values should be considered? Life Sciences, Society and Policy, 13(1).

Shneiderman, B. (1990). Human values and the future of technology: a declaration of empowerment. Acm Sigcas Computers and Society, 20(3).

Some more technical things:

https://github.com/slundberg/shap

https://arxiv.org/pdf/1705.07874.pdf

https://github.com/andosa/treeinterpreter

https://arxiv.org/abs/1704.02685

https://github.com/kundajelab/deeplift
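If it helps anyone, this is roughly what using shap (the first two links) looks like in practice. A minimal sketch with a toy model and random data of my own; note that the exact return shape of shap_values varies between shap versions:

```python
# Minimal sketch of using shap to explain a tree-based model.
# The data is random and purely illustrative.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy label rule

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# TreeExplainer decomposes each prediction into additive per-feature
# contributions (SHAP values).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:5])
print(np.shape(shap_values))
```

treeinterpreter and deeplift (the remaining links) serve a broadly similar purpose, for tree ensembles and deep networks respectively.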
