Thoughts on COVID-19, Scientific Gatekeeping, and Substack Newsletters

Charles Yang
Published in The Startup · Jun 14, 2020

I recently started a free email newsletter called ML4Sci, which highlights applications of machine learning in the sciences, a topic I’ve also written about here on Medium. This essay is a lightly edited version of the 12th issue of the ML4Sci newsletter. Interested in reading more? You can find more issues here, and if you like what you see, feel free to sign up!

This essay was inspired by Jeffrey Ding’s reflections on gatekeeping in his ChinAI Substack newsletter, which covers China-US tech policy.

COVID-19: A Catalyst for Experimentation

“There are decades where nothing happens, and there are weeks where decades happen.” -Vladimir Lenin

COVID-19 has forced experiments with new technologies and tools and may have pushed many nascent technologies past the “valley of death” between early adopters and mass adoption. For instance, telemedicine, e-commerce, and new scientific collaboration tools have all taken off during the global pandemic.

In addition to serving as an accelerant, COVID-19 has also laid bare many fault lines in the scientific world. For the first time, we are seeing the collision of a global pandemic with the (mis-)information age. Suddenly, the general public is intensely interested in the preprints coming out of bioRxiv, and scientists are finding that the scientific community is woefully unprepared to cogently communicate the pitfalls of the scientific process. As a result, we’ve seen the proliferation of “coronavirus influencers,” i.e. people with no medical background writing on Medium about coronavirus research and models, as well as the spread of misinformation online about fake cures and treatments. Even scientists are struggling to keep up with the deluge of papers, which has led to the creation of new tools to analyze and mine the flood of literature coming out.

Now we see the inadequacies of our scientific communication infrastructure exposed when quite literally millions of lives are at stake. There are several distinct problems:

  1. lack of infrastructure for rapid, robust, yet high-quality and open-source peer review to filter preprints
  2. lack of infrastructure to aggregate useful papers and for scientists to identify and discuss early trends
  3. lack of infrastructure to communicate science, with all its nuances and pitfalls, to the broader public and news media [1]

Lessons learned from the Machine Learning Community

Of course, another field with broad public interest, rampant misinformation and misconceptions, and overwhelming growth in papers, not to mention enormous financial incentives, is artificial intelligence (AI), particularly where deep learning is concerned. [Somehow, all of these manage to be encapsulated in the persona of Elon Musk, who regularly tweets about Terminator-esque AI, has developed a highly valued software-driven car company with AI as a core competency, and co-founded an AI institute (OpenAI) that publishes a breathtaking number of high-quality papers.]

The broader scientific community might be able to learn from how the ML community has used open science principles to streamline the review process, while avoiding its mistakes. For instance, OpenReview is a great open-source peer-review platform for ML conferences that allows everyone to view reviewer comments and author responses. The preprint and open-source culture of ML research has certainly begun permeating other fields, particularly those at the interface of ML4Sci, that is, machine learning for the sciences.

However, the enormous financial incentives around publishing new state-of-the-art models and the fame associated with incremental achievements on benchmark datasets are still distorting the publishing and review landscape. See “Troubling Trends in ML Scholarship” or “Peer Review in NLP: reject-if-not-SOTA” for more.

Building digital gatekeeper communities

Fingerspitzengefühl: literally, “fingertip feeling”

It seems to me the problem is that we are missing a middle abstraction barrier. We have the very low barrier to entry of preprint publishing; basically anyone can put something on arXiv. On the other hand, we have the very high barrier to entry of peer-reviewed publication, which (as of now) happens behind closed doors and can take months. In both cases, the scientific writing is formal and difficult to parse for anyone not in the field. What we need is to build infrastructure that can help improve people’s situational awareness of the published literature, rather than asking them to drink from the firehose of preprints.

Currently, Twitter is the trending solution to this problem. Arxiv-sanity, an improved UI for arXiv, has a tab specifically for preprints trending on Twitter (as does Rxivist, the bioRxiv equivalent). Evidently, the alternative to peer review is a paper’s popularity on Twitter. Yet anyone who has spent any time on Twitter knows it’s a terrible platform for scientific discussion: 280-character limits, complicated threading, and a tendency toward “wisdom of the crowd”/mob mentality and “rich get richer” network effects. The social media sphere is also easily infiltrated by those seeking to promote misinformation, as China and Russia have shown. (Imagine if someone built Twitter bots to promote their own research and “game” the system… or, more likely, nation-states doing so to promote their own “national” research and increase prestige.)

What we need are gatekeepers of knowledge, who aggregate and identify high-value information and articles: something more long-form and conducive to substantive discussion than Twitter, but also faster than peer review. We also need people to provide high-level overviews of, and opinions on, the trajectory of a field. What might that look like?

Well, that’s what I hope my newsletter, ML4Sci, will become! But the point of this essay is not just a circuitous plug for my newsletter; rather, it is to advocate for the development of an ecosystem of career-advancing, financially feasible scientific writing focused on what might best be called “mildly opinionated reviews and analysis” of a particular field. My hope is that newsletters like this (or other forms of media that have yet to be created) can help fill the middle gap by aggregating both peer-reviewed publications and preprints, as well as offering analysis and opinions on emerging trends in research.

This would mirror shifts already occurring throughout the digital information economy. For instance, A16Z, a prominent venture capital firm in Silicon Valley, has written about the growth of the “passion economy,” akin to the gig economy for creators. [2]

To see the opportunity presented by these new business models, we need look no further than journalism. Plummeting ad revenue due to COVID-19 has hollowed out newsroom journalism, which has traditionally been funded by classified ads. Yet many recently unemployed journalists have found a new business model for themselves on Substack. Perhaps it’s time for scientists to similarly evolve new “business models” and career paths?

In the past, the only way to make it in academic research was to grind it out toward a tenured professorship or else be consigned to perpetual post-doc purgatory. The tightening funding pool and intense publish-or-perish culture have contributed to a mental health crisis among graduate students, who are supposed to be the next generation of our best and brightest minds.

One piece of the solution to all these problems may be to create new career paths. The resurgence of subscription-based content, driven by an era where trust is the new currency, makes such roles financially feasible for newly minted Ph.D.s looking to do something besides academia. Such gatekeepers can be outward-facing (providing an accessible point of contact for reporters looking to actually understand the science and direction of a field, communicating in times of crisis, combating misinformation put out by corporations or other groups, etc.) or inward-facing (providing high-level overviews of fields, resources for scientists looking to learn new skills e.g. ML/AI, high-level analysis of new techniques in the recent literature, etc.).

The transition will be hard. The impact-factor metric is deeply entwined with academic prestige. I don’t know what the solution to this is (maybe someone will one day cite an issue of my newsletter?), but I am confident that the publishing schemes we have now, even with preprints and open peer review, are not sufficient for the challenges posed by the information era. [3]

Finally, we need diverse gatekeepers. Science thrives on open discussion and debate. A single authoritative voice hemming in an entire field will lead to groupthink and stale science. In other words, gatekeepers need peer review too, but probably not the anonymized, closed-door, synchronous kind we have now. The term “gatekeeper” might conjure up the image of a walled garden, with a singular authority that determines who comes in and out. But what we really need is an ensemble of gatekeepers who debate and build off of each other, providing back-and-forth. [4,5]

This culture of open debate over blog posts is already embedded in ML, e.g. “The Bitter Lesson,” “Deep RL Doesn’t Work Yet,” etc. We need more of that in the sciences, preferably in a more structured format than scattered personal blogs, but until recently we have lacked the infrastructure and incentives to do so. Now, new technology services and business structures like Substack and Revue, as well as the growth of other mediums like podcasts, have made it more feasible both technologically and financially.

Conclusion

Science is becoming more complex, more important to public policy, and just bigger. We can’t just keep trying to drink from the flood of preprints. We need a diverse set of trusted voices who can summarize and aggregate trends, particularly in fields that are closest to the public’s mind e.g. epidemiology, environmental science, public health, AI, etc.

There is plenty of room in this growing field. We are still in the early stages of the creator-friendly economy. There are so many different ways of doing this: different audiences, verticals, monetization schemes, etc. The scientific community is desperately in need of new solutions to the age-old problem of communication and trust. What will you start?

Thanks for making it to the end! If you want more essays like this, as well as (roughly) weekly updates on the latest advances in AI for scientific and engineering fields delivered straight to your inbox, sign up for my free email newsletter ML4Sci!

[1] To be fair, much of the misinformation online is not the fault of scientists: larger geostrategic considerations and domestic political polarization are the dominant factors. That said, we still have a responsibility to keep identifying new and better ways of communicating in the new information age.

[2] Some people might think of Medium as a salient example of subscription-based monetization of content creation. But I think writing on Medium is a little too close to being paid for making tweetable articles (one title that popped up on my feed: “How I went from zero coding skills to data scientist in 6 months”). Medium is definitely an improvement, and some of its publications offer nice tutorials (I also publish on Medium, so I don’t want to be too unfair). But I think the incentive structure of Medium is too misaligned with the objectivity required of science, as I think anyone who has really perused Medium will agree.

[3] As this was going to press, I found a great Nature Index article that discussed several exciting new preprint projects aimed at filling this “middle abstraction barrier.” The ideas all look great, and I’m definitely excited to continue tracking progress in these projects. However, these solutions are all some variant of “open-sourcing peer review for preprints.” This is undeniably an incredibly important step forward (I honestly can’t express how badly the journal-based peer review system is in need of an overhaul), but I still believe that, in conjunction with these initiatives, we need more casual, long-form, field-level analysis and discussion, rather than only article-specific review (important as it is!).

[4] As an example, see Jeff Ding’s discussion on how gatekeepers shape policy debate in the China-US tech sphere.

[5] Some people might take issue with the mixing of opinionated gatekeepers and science. But when we consider something as broad as an entire field, say, ML4Sci, then naturally people will have different opinions, be excited about different things, and believe the future lies in different places. The important thing is that we acknowledge what backgrounds we come from and never cease to welcome open debate and rebuttal.
