Ersilia becomes a Digital Public Good

The Ersilia Model Hub has been recognised as a public good by the Digital Public Goods Alliance

Miquel Duran-Frigola
ersiliaio

--

We have been incorporated into the Digital Public Goods Alliance (DPGA) catalogue, an initiative that recognises software contributing to the attainment of the UN’s Sustainable Development Goals (SDGs). Ersilia’s target SDGs are SDG 3 (Good Health and Well-Being), SDG 4 (Quality Education), and SDG 17 (Partnerships for the Goals). This recognition by the DPGA is particularly significant as we try to understand the role that research software should play in the rather messy ecosystem of scientific code repositories and how much effort we should put into making our tools truly accessible, well-maintained, and robust. To be labelled as a ‘public good’ is somewhat reassuring, too: there is something in this pair of words that makes things feel a bit more tangible, which is very gratifying.

The 17 Sustainable Development Goals. Source: United Nations

I like it when material terms are employed to describe software. For example, I enjoy software being referred to as ‘infrastructure’. The Invest in Open Infrastructure (IOI) initiative publishes articles along those lines regularly, adding nuance to what it means to be essential in the digital world and discussing how certain software needs to be considered and protected as a common good. In the last year, we’ve had the chance to participate in a few related debates, most notably at RightsCon, the most important convening for human rights in the digital era. As expected, most narratives at RightsCon focused on AI and, less often, on open source, and it was interesting to see how things get complicated as we factor in data privacy, ownership, security, justice, and so on. A mantra in these kinds of meetings is that the internet is almost entirely built and sustained by open source and that, inevitably, we need to ask ourselves whether we will be silly enough to let the private sector take control of the next transformative technology. Despite all the activism and dissenting voices present at conferences like RightsCon, nobody really questioned that AI will soon underpin many aspects of our lives.

A molecular model produced by AlphaFold 3. Source: DeepMind.

This is a timely debate in the biomedical sciences as well. In a 2021 paper published in Nature, DeepMind (Google) presented AlphaFold 2, an AI method capable of predicting the structure of proteins solely based on their sequence of amino acids. AlphaFold 2 was open-sourced, and furthermore, DeepMind subsequently partnered with the European Bioinformatics Institute (EBI) to release predictions for hundreds of millions of proteins, which was awe-inspiring to molecular biologists. Quite unanimously, AlphaFold 2 was considered to be a breakthrough. Interestingly, shortly after the AlphaFold 2 release, RoseTTAFold was published in Science. RoseTTAFold comes from David Baker’s lab, the group that has been shaping the field of computational structural biology for years. With comparable performance to AlphaFold 2, RoseTTAFold was also open-sourced and, while perhaps less known to the general public, it has been equally impactful in the field.

Now, this was in 2021. Shortly after the release of AlphaFold 2, DeepMind spun out an AI-first company named Isomorphic Labs with an explicit focus on drug discovery. Drug discovery (when it works) is highly profitable and, therefore, also tightly bound to intellectual property (IP) constraints. It is, for all intents and purposes, an industrial endeavour, so as someone who does not expect the pharmaceutical industry to give much back to science, I received the Isomorphic Labs announcement with disappointment, even with a bit of despair. The natural evolution for both AlphaFold and RoseTTAFold was to accept any kind of input, not only protein sequences. This would include DNA, RNA, metabolites, and small molecule drugs.

This time, a version of RoseTTAFold called ‘All-Atom’ came first, published again in Science a few weeks ago. The code can be found in this GitHub repository. Subsequently, AlphaFold 3 was published in Nature, in another significant (and possibly rushed) paper, along with an impressive web server to run predictions online. However, AlphaFold 3, now co-authored by Isomorphic Labs, does not have any code available. Perhaps more regrettably, the prediction of protein-small molecule interactions (i.e., precisely what you need for drug discovery) is restricted in the web server. There have been reactions on social media, particularly questioning whether Nature should be publishing work that is neither fully reproducible nor fully available for others to build upon. At the core of this issue is whether we should applaud (as in ‘publish in Nature’) a tool produced by authors who are not entirely willing to abide by the rules of science. In response to criticisms, Nature commented on why they decided to publish AlphaFold 3 despite the limited accessibility of the tool, and it seems that the authors will end up publishing the code within six months. As much as I am all for openness and knowledge sharing, I understand Nature’s public-vs-private dilemma, which is a central dilemma in AI ethics. The best AI tools (including AlphaFold 3) are currently developed under a for-profit model, and we should not alienate ourselves from this reality. Instead, we need to be demanding of these secretive AI systems, perhaps even cracking them, to maximise their benefit for society.

That said, there is no way Ersilia will pivot to a for-profit model or become IP-locked. I’d rather go do something else. I am convinced that, in the long term, free and open-source AI tools will be much more prominent than their commercial counterparts. In many instances, they already are. It seems unlikely that AlphaFold 3 will reach the same level of recognition and esteem that AlphaFold 2 did, and this is not because the tool is any less impressive, but because its authors have decided to act opaquely. Therefore, I doubt that AlphaFold 3 will eventually become scientific infrastructure in the way version 2 did. The goal of Ersilia is to become scientific infrastructure one day too, offering an AI-based research platform to scientists worldwide, especially those operating in the Global South and investigating infectious and neglected tropical diseases. In this respect, the recognition of our tool as a ‘public good’ by the DPGA is encouraging and lays the foundation (and sets the boundaries) for how the Ersilia Model Hub should grow in alignment with the SDGs and the principles of open science.

--

Miquel Duran-Frigola
ersiliaio

Computational pharmacologist with an interest in global health. Lead Scientist and Founder at Ersilia Open Source Initiative. Occasional fiction writer.