Invisible cities so far

2023 marks the centenary of Italo Calvino. Our organisation, Ersilia, is named after a city imagined in his book Invisible Cities

Miquel Duran-Frigola
ersiliaio
6 min read · Oct 26, 2023

This year we commemorate the centenary of the Cuban-born Italian writer Italo Calvino. Much has been written about Calvino’s relevance in today’s world, starting with his celebrated Six Memos for the Next Millennium: lightness, quickness, exactitude, visibility, multiplicity, and an unwritten one, consistency. It is easy to see how these memos can be translated from literature (their intended field) to almost any human activity. Modern data science is no exception. As it turns out, Calvino also imagined a machine that could write creatively (and do much more), anticipating today’s large language models like GPT-4 & co. Of all his oeuvre, perhaps the best known is The Baron in the Trees. It is one of the greatest “noes” of literature. In it, a firstborn aristocratic child refuses to eat snail soup and, to oppose his father, climbs a tree, vowing never to come down, much to the despair of his family. Many of us struggle with the “no”; it is impossible not to applaud this one.

Italo Calvino, Marcello Mencarini, The Estate of Italo Calvino.

Our organisation’s connection with Calvino originates from a short, perfect book titled Invisible Cities. The book channels the voice of Marco Polo as he describes his (imagined) travels to emperor Kublai Khan. Each chapter is dedicated to a city, and every city portrays, in a poetic tone, a societal metaphor. One such city is Ersilia, from which we take our name. Gradually, we are finalising smaller and satellite projects, adopting city names for them too — Isaura, Zaira, and Olinda, so far.

The city of Ersilia, a network of drug discovery collaborators in the global south

The goal of our non-profit organisation is to strengthen research capacity in the global south, better termed the majority world, where funding for science is critically small. Traditionally, drug discovery has been a closed-source discipline, confined behind the walls of large pharmaceutical companies. Nothing in the public sector can match the resources of pharma; therefore, drug discovery in academia must be collaborative and associative at its core. In Calvino’s depiction, the inhabitants of Ersilia stretch threads from door to door to symbolise their relationships and partnerships. You can read Ersilia’s chapter here; it is quite memorable. The mesh of strings that forms Ersilia is a testament to the connections between people and, as such, it is the main feature of the city, more enduring than wood and stone. Ersilia is a nomadic city, ceaselessly travelling across an esplanade. Due to the demands of our work, we’ve also been travelling frequently for the past three years, and when not, we operate remotely in digital spaces. We aim to connect partners from the global south, the global north, the open-source realm, data donors from large firms, and contributors from major tech companies. We believe Calvino’s metaphor works well to describe all this.

More practically, the tool we’ve developed to coordinate our growth and engage our partners is the Ersilia Model Hub. You can find a rudimentary interface for it here and the code repository here. The Ersilia Model Hub offers an ever-growing set of artefacts for scientists to use, aimed at fostering collaboration with and within the drug discovery community in the global south — the strings in Calvino’s story, if you’ll allow me.

The City of Ersilia, imagined by the AI model DALL·E 3

The city of Zaira, automated AI for virtual screening cascades

By AI industry standards, drug discovery is a severely “low data” scenario, especially in the often-neglected area of anti-infective drugs. We do not have many data points available to train our AI models for a specific pathogen, so the best we can do is capitalise on historical data produced in other studies and consider it cumulatively to at least have somewhere to start. In AI terminology, this approach is termed “transfer learning”, i.e., using a generic model trained on a large dataset and fine-tuning it with a smaller dataset specific to the domain of interest. The city of Zaira, in Invisible Cities, offers an almost direct analogy to this. Zaira encapsulates memories in its very architecture, with layers of history embedded within houses and streets. I find it challenging to articulate in my own words; you can read Calvino’s vivid description here.

Recently, we published a tool named ZairaChem in Nature Communications, in collaboration with the H3D Centre in Cape Town. ZairaChem provides a fully automated framework to train AI models from bioactivity screening datasets, and it uses transfer learning to pull as much historical information as possible into the training procedure, so that acceptable models can be derived from a minimal amount of data. If you are a computational chemist, feel free to explore the tool here. ZairaChem is relatively mature, and we’d love to hear your feedback.
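For readers who prefer code to metaphors, here is a deliberately tiny sketch of the transfer learning idea described above: pretrain on a large dataset, then fine-tune from those weights on a handful of task-specific points. It is not ZairaChem's pipeline. The one-variable linear model, the toy datasets, and the names `fit_linear`, `historical`, `task`, and `held_out` are all assumptions made up for illustration.

```python
def fit_linear(data, w=0.0, b=0.0, lr=0.05, steps=20):
    """Gradient descent on mean squared error for the model y = w * x + b."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def mse(data, w, b):
    """Mean squared error of y = w * x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


# Toy "historical" data: many points from a related relationship, y = 5x + 1.
historical = [(i / 50, 5 * (i / 50) + 1) for i in range(100)]
# Toy "new assay": only three points, from a slightly shifted y = 5x + 1.3.
task = [(0.2, 2.3), (1.0, 6.3), (1.6, 9.3)]
held_out = [(0.5, 3.8), (1.3, 7.8)]

# 1) Pretrain a generic model on the large dataset.
pre_w, pre_b = fit_linear(historical, steps=2000)
# 2) Fine-tune: start from the pretrained weights, take a few steps on the small set.
tuned_w, tuned_b = fit_linear(task, w=pre_w, b=pre_b, steps=20)
# Baseline: same small dataset and step budget, but starting from zero.
scratch_w, scratch_b = fit_linear(task, steps=20)

tuned_error = mse(held_out, tuned_w, tuned_b)
scratch_error = mse(held_out, scratch_w, scratch_b)
```

In this toy setting, `tuned_error` ends up well below `scratch_error`, because the pretrained weights already encode most of the relationship and the three new points only need to nudge them. Real bioactivity models are far more complex, but the division of labour is the same.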

The city of Isaura, a data lake of AI model precalculations

Even when you believe you have a solid technical roadmap, there are some things you can’t anticipate. A mantra in AI is that while training is computationally expensive, predictions are cheap. This is generally true, but we have learned that in some contexts where we operate, even making AI predictions is prohibitively costly or just not feasible. This is often the case in centres without dedicated IT departments. Moreover, there are some very popular sets of molecules (FDA-approved drugs or the MMV Boxes) that people use over and over, so it makes sense for us to stay a step ahead and make pre-calculations. Our vision is to offer a “data lake” of readily available AI predictions.

Thanks to an outstanding team of volunteers from The Good Data Institute, we have developed a pipeline to run our AI models and store the results seamlessly. Another team from Harvard Tech for Social Good (T4SG) is helping us visualise and monitor the process with Splunk. All of this revolves around a code repository named Isaura, which aids in managing and storing all precalculations. One of our star open-source contributors, Ankur Kumar, was instrumental in kicking off the code. Calvino’s description of Isaura is one of the most evocative: a city of “thousand wells” that sits atop a deep, subterranean lake. “Its green border repeats the dark outline of the buried lake”.
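The precalculation idea can be pictured as a look-up-before-compute store: results are keyed by model and molecule, and the model is only run for molecules not seen before. The sketch below is an illustration under assumptions, not the Isaura codebase. The class `PrecalculationStore`, its `predict` method, the `toy_model` stand-in, and the in-memory dictionary are all hypothetical; a real data lake would persist results in shared storage.

```python
from typing import Callable, Dict, List, Tuple


class PrecalculationStore:
    """Hypothetical in-memory store of predictions keyed by (model, molecule)."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, str], float] = {}

    def predict(
        self,
        model_id: str,
        smiles_list: List[str],
        run_model: Callable[[List[str]], List[float]],
    ) -> List[float]:
        # Find molecules with no stored result (deduplicated, order preserved).
        missing = list(dict.fromkeys(
            s for s in smiles_list if (model_id, s) not in self._results
        ))
        # Only the missing molecules are sent to the (expensive) model.
        if missing:
            for smiles, value in zip(missing, run_model(missing)):
                self._results[(model_id, smiles)] = value
        return [self._results[(model_id, s)] for s in smiles_list]


calls = []  # records which batches actually reached the model


def toy_model(batch: List[str]) -> List[float]:
    """Stand-in for a real predictor: counts the letter C in each string."""
    calls.append(list(batch))
    return [float(s.count("C")) for s in batch]


store = PrecalculationStore()
first = store.predict("toy-model", ["CCO", "CC(=O)O"], toy_model)
second = store.predict("toy-model", ["CCO", "CC(=O)O", "C"], toy_model)
```

On the second request only the new molecule reaches `toy_model`; the other two are served from the store. For popular molecule sets that are queried over and over, this is where the savings come from.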

The city of Olinda, AI model distillation

Another repository that Ankur developed is named Olinda. You can read a recent interview with Ankur here. Olinda is still under development, and we hope to bring it to production in the not-so-distant future. In Invisible Cities, Olinda is presented as a city within cities, a geometric marvel of concentric streets that emerge one inside the other. As a result, the inner city is very thin but still retains all features of the ensemble, in a sort of distilled form. In AI jargon, “model distillation” is a relatively well-defined term that refers to the process of obtaining a lighter model from a larger one. Lighter AI models are suitable for conventional computers or even mobile devices, and are also much cheaper to serve in the cloud. Our Olinda code does exactly this distillation. Let’s hope we can release a well-tested version of the tool within the next few months.
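As a minimal sketch of what distillation means in practice (not Olinda's implementation), the toy below treats an averaged ensemble as the "large" teacher and fits a single linear function to its outputs as the "light" student. The functions `teacher`, `distill`, and `student`, and the ensemble itself, are assumptions invented for illustration.

```python
from statistics import mean


def teacher(x: float) -> float:
    """Toy 'large' model: an ensemble of 25 linear members, averaged."""
    return mean((2 + 0.01 * k) * x + (1 - 0.02 * k) for k in range(25))


def distill(inputs):
    """Fit a student y = slope * x + intercept to the teacher's outputs."""
    targets = [teacher(x) for x in inputs]  # targets come from the teacher
    x_bar, y_bar = mean(inputs), mean(targets)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(inputs, targets)) / sum(
        (x - x_bar) ** 2 for x in inputs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Unlabelled inputs are enough: the teacher provides the training targets.
slope, intercept = distill([i / 10 for i in range(50)])


def student(x: float) -> float:
    """Toy 'light' model: one multiply-add instead of 25 evaluations."""
    return slope * x + intercept
```

Here the student can reproduce the teacher almost exactly because the averaged ensemble is itself linear. With real models the student is only an approximation, and the trade-off between size and fidelity has to be measured.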

What’s next?

Not all our current tools are named after Calvino’s work. Some have more mundane names like ChemSampler (work in progress), and for a few of our flagship projects, we do not even have branded names (like the PharmacoGx embeddings repository). Also, we don’t want to overuse Calvino’s metaphors. In any case, it is unlikely that we will be developing new software anytime soon. We are now consolidating the existing tools and aim to release and publish them. It has been an enormous amount of work, and we need to make sure that they reach our beneficiaries in the global south effectively. Expect substantial updates by the end of 2023.

Miquel Duran-Frigola
ersiliaio

Computational pharmacologist with an interest in global health. Lead Scientist and Founder at Ersilia Open Source Initiative. Occasional fiction writer.