Uncreative science

Some personal thoughts before populating the Ersilia Model Hub

Miquel Duran-Frigola
ersiliaio

--

Conceptual artist Douglas Huebler once wrote:

“The world is full of objects, more or less interesting, I don’t wish to add any more.”

Poet and curator Kenneth Goldsmith said the same about texts. “The world is full of texts, more or less interesting, I don’t wish to add any more.” Surely I’m not the only scientist who feels this way too. The world is full of knowledge, more or less interesting, I don’t wish to add any more.

As of June 2021 (only a year and a half into the pandemic) we’ve produced over 500,000 scholarly articles related to COVID-19. Go to Semantic Scholar and you’ll find 200 million papers from all fields of science. You will also find a “too long; didn’t read” (TL;DR) AI tool there that will summarize each article for you. Which raises the question: who reads scientific literature? I don’t read that much anymore. I skim, copy, share on Twitter, email, parse and archive. But I don’t read, really. I am just pushing language around.

I am well aware that nobody is reading my work, either. Almost half of all published papers will never be cited. If you make it to 10 citations you’ll be in the top 25% of cited work worldwide. Let’s be honest: this is not an “audience”. If you are a pharmacologist, let’s be honest one more time: it is unlikely that you’ll ever “cure” anyone. The net result of decades and decades of research adds up to fewer than 3,000 approved small molecule drugs, a few dozen per year. What are the chances, with so much pharma industry, so many biotechs and so many professors competing for these few dozen FDA approvals?

I guess I’m just trying to come to terms with my own frustration. In academia success means publications, publications mean novelty, and novelty means creativity. As it turns out, I am not a very creative individual. Certainly not good at making discoveries. Of course I wish to add more knowledge to the world; this was not a boutade. But I am slowly, silently giving up. It is fine: most of us in STEM give up on a professorial career in one way or another.

It’s a pity, though, that things have to be like this. They’ve been different in the visual arts for one hundred years, or probably more. Think of Duchamp’s readymades or Andy Warhol’s Brillo boxes. No need to make new stuff. It is perfectly fine to be an artist even if you just push objects around. No mastery required.

American pop artist Andy Warhol amid his Brillo box sculptures. Photo by Fred W. McDarrah.

We know how visual artists were liberated by photography, a technology so good at replicating reality that it forced them to explore new avenues, including abstraction, collage, appropriation, and intentional plagiarism. This is why Huebler was not willing to add new objects to the world, and this is why Goldsmith, in the age of the Web and social media, is not willing to bring in any new text. His essay Uncreative Writing is quite an interesting read, advocating for remixing, repeating, reframing, CTRL+C CTRL+V and all of that. For example, uncreative writing would be:

  • A 2,637-page PDF by Steve Giasson containing the entire comment stream of the 9/11 video on YouTube; or
  • A poem by Caroline Bergvall composed of the first tercet of Dante’s Inferno as it appeared in 48 translations found in the British Library; or, by Goldsmith himself,
  • A written record of every single word he spoke during one week.

Try and do something like this in science. Good luck with funding and good luck with getting it published. Nobody really cares about text mining, integrative knowledgebases or data visualization tools. I’ve had the most wondrous data points at hand, and it was not until we came up with a petty story that we managed to publish them well.

The Ersilia Model Hub

In the next few months, I will devote my time to the development of the Ersilia Model Hub, which I feel will be my most uncreative scientific experience to date. We want the Ersilia Model Hub to be the largest collection of AI tools for drug discovery. At first, we will focus on infectious and neglected tropical diseases. We expect to host 100 AI models by September 2021, and 500 before the end of the year.

Sketch of the Ersilia Model Hub. Follow development on GitHub.

The uncreative procedure will be the following, repeated for each AI model:

  1. Identify, based on our experience or on collaborators’ demands, a predictive task of interest. For example, “broad-spectrum antibiotic activity”.
  2. Search the scientific literature for existing research on the topic. Rarely, a model will be available with pre-trained parameters and code. In this case: simply download. Sometimes, code will be there, but parameters will be missing. Then: train in-house. Often, only data will be at hand: download the data, process it and run Ersilia’s automated AI pipeline.
  3. Bundle model code and parameters in a Docker container. Try to make it work with Anaconda and Virtualenv as well.
  4. Test model performance on a few computers with different operating systems installed. Perhaps avoid Windows. Make a “lite” version of the model if it turns out to be too slow.
  5. Give the AI model an identifier. Do a write-up, select an open-source license file, explain succinctly what the model does. What is the input? Molecules. What is the output? Growth inhibition of microbes.
  6. Finally, upload the model bundle to the Ersilia Model Hub. Distribute and publicize, credit and engage the original authors. Depending on demand, prepare a web app.
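
Step 2 is essentially a small decision rule, and step 5 a short metadata record, so both can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names LiteratureHit, Action, choose_action and ModelCard are hypothetical and are not taken from any Ersilia code, the identifier and license shown are placeholders, and the “nothing usable found” fallback is an added assumption that the procedure above does not mention.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Possible outcomes of the literature search in step 2."""
    DOWNLOAD = "download the pre-trained model"
    TRAIN_IN_HOUSE = "train in-house with the published code"
    RUN_PIPELINE = "process the data and run the automated pipeline"
    SKIP = "nothing usable found"  # assumption: not part of the original procedure


@dataclass
class LiteratureHit:
    """Hypothetical record of what a search turned up for one predictive task."""
    task: str
    has_code: bool
    has_parameters: bool
    has_data: bool


def choose_action(hit: LiteratureHit) -> Action:
    """Apply the rule from step 2: prefer the least work that yields a model."""
    if hit.has_code and hit.has_parameters:
        return Action.DOWNLOAD
    if hit.has_code:
        return Action.TRAIN_IN_HOUSE
    if hit.has_data:
        return Action.RUN_PIPELINE
    return Action.SKIP


@dataclass
class ModelCard:
    """Hypothetical write-up for step 5: identifier, license, input and output."""
    identifier: str
    description: str
    input_type: str
    output_type: str
    license: str


hit = LiteratureHit(
    task="broad-spectrum antibiotic activity",
    has_code=True,
    has_parameters=False,
    has_data=True,
)
card = ModelCard(
    identifier="example-model-0001",  # placeholder identifier
    description="Predicts " + hit.task,
    input_type="Molecules",
    output_type="Growth inhibition of microbes",
    license="placeholder open-source license",
)
print(choose_action(hit).value)
```

The rule checks for code plus parameters first because that is the cheapest path; only when neither code nor parameters exist does it fall back to building a model from raw data.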

I am truly excited about this challenge, and I am curious to see how fast the Ersilia Model Hub can grow. I also want to find out how it feels to be a curator liberated from the urge to make discoveries, and how the community perceives the tool.

--

Miquel Duran-Frigola
ersiliaio

Computational pharmacologist with an interest in global health. Lead Scientist and Founder at Ersilia Open Source Initiative. Occasional fiction writer.