Leveraging Data, Automation And AI to Change The Way We Do Science

Zavain Dar
Sep 26, 2017

Announcing our Series A partnership with Recursion Pharma

This piece was initially published on Oct 3, 2016 at Lux Capital.

A little over two years ago I first started writing and working on an ongoing thesis: the emergence of big data sets, automation, connectivity, and machine learning is fundamentally altering how we do science, in essence disrupting the scientific method. In short, these developing technologies and emerging sciences let scientists shift from hunting analytic truths by reasoning a priori about causation to relying on large, reproducible data sets that correlate input and output: first rigorously understanding correlation, and only then asking about causation. This transition not only lifts the burden of the “eureka” moment from the “lone” scientist, but shifts intellectual responsibility increasingly onto machines, which first give predictive, correlative insight and then guide the scientist’s search for causation, truth, and, in biological terms, mechanism of action.

I’m hard pressed to find an area where this technology-enabled disruption of how we conduct science has had a bigger impact than in biology. If past decades were about intuiting the semantics and grammars of Gs, As, Ts, and Cs (footnote: the nucleotide building blocks of DNA) via a priori analytic methods, and about leveraging supercomputers to run our analytically induced proxy models in simulations of mechanistic biology, the immediate future marks a significant jump: one that emphasizes reproducible digitization of input and output data (generally, form and function data), automates data creation and evaluation assays, and employs distributed compute and machine learning to couple and correlate form with function. Net: we’re nearing a gradual but algorithmic ability to deliver predictive insight into human health, leads for future therapeutics, and novel means of accessing underlying mechanistic biology.

We don’t need to go far to see this playing out in industry. Illumina, a current biotech favorite, has perfected the assays and tooling to quickly and reliably produce large, reproducible amounts of genetic data. On top of that data, they’re not working to unearth a mechanistic grammar of rules dictating how a “GATTC” might differ from a “GATCC”; rather, they rely on machine learning and data science to correlate and predict what a “GATTC” could express given prior expression data from a “GATCC”.
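To make the contrast concrete, here is a minimal, hypothetical sketch of the correlative approach (Python with NumPy; every sequence and expression value is invented, and this is emphatically not Illumina’s actual pipeline): fit a purely statistical model from one-hot-encoded sequence variants to a measured expression readout, with no mechanistic rules anywhere.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Flatten a DNA sequence into a one-hot feature vector of length len(seq) * 4."""
    v = np.zeros(len(seq) * 4)
    for i, base in enumerate(seq):
        v[i * 4 + BASES.index(base)] = 1.0
    return v

# Synthetic training data: sequence variants paired with a made-up expression readout.
train = {"GATCC": 0.8, "GATTC": 1.3, "GAACC": 0.5, "GATGC": 1.1}
X = np.array([one_hot(s) for s in train])
y = np.array(list(train.values()))

# Least-squares fit: a purely correlative sequence -> expression model,
# learned from data rather than derived from mechanism.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(seq):
    return float(one_hot(seq) @ w)

print(round(predict("GATTC"), 2))  # recovers the training measurement, 1.3
```

The point of the toy model is the workflow, not the math: given enough reproducible input/output pairs, the model correlates first, and mechanistic questions come afterward.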

Moving up a level of abstraction, Lux portfolio companies 3Scan and Vium similarly focus on the creation and digitization of reproducible data to couple form and function. 3Scan captures and digitizes novel high-dimensional tissue data, whereas Vium adds orders of magnitude of statistical reliability and data granularity to traditional in vivo studies, playing a role in launching the field of living systems informatics.

The above points notwithstanding, the shift from a priori discovery of mechanistic biology to empirically accessed correlative biology has yet to gain traction in early-stage drug discovery and Big Pharma. In fact, it’s not unfair to say the industry views it as ‘luddite biology’: a biology that hides from the complexity of mechanism, equivalent to pathologists of yore peering through microscopes to interrogate the phenotype (structure, shape, or form) of cells to assess disease states (function) while screening for drugs. It’s reasonable to assume that supercomputer simulations of our proxies for mechanistic biology should far outperform empirical data void of analytical human input or model simulation. The reality, surprisingly, is quite different. The majority of drugs brought through the FDA pipeline today are still discovered phenotypically, with scientists and microscopes assessing cell shape, form, and function, not via novel analytic insight or computer-driven simulation of mechanistic, target-driven biology.

The above context is the backdrop for the announcement of our newest portfolio company: Recursion Pharma. We’re thrilled to be leading the $13M Series A. Building on cofounder and CEO Dr. Chris Gibson’s research on automating phenotypic screening for drug discovery and on the work of The Broad’s Anne Carpenter, Recursion uses cutting-edge automation, robotics, biology, machine learning, and computer vision to automate the “pathologist of yore”, in effect making reproducible, and far higher throughput, the most prolific method in history for yielding therapies for human disease: phenotypic screening. In line with the theme of capturing and algorithmically analyzing high-fidelity form and function data, cofounders Chris, Dr. Blake Borgeson, and Dr. Dean Li have developed an automated platform that stains and images disease states within human cells, then employs machine learning and AI to model input to output: cell structure to biological health state.

(Image: Colored Cell Structures)
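As an illustration of the form-to-function idea, and emphatically not Recursion’s actual platform, the following toy sketch maps simulated image-derived morphology features to a health state with a nearest-centroid classifier. Every feature name and number here is invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical morphology features extracted from stained-cell images
# (say, nucleus area, cell roundness, texture score). Purely illustrative.
def simulate_cells(profile, n=50, noise=0.1):
    """Draw n noisy cell-feature vectors around a morphology profile."""
    return rng.normal(loc=profile, scale=noise, size=(n, len(profile)))

healthy = simulate_cells([1.0, 0.8, 0.2])   # "healthy" morphology profile
diseased = simulate_cells([0.4, 1.5, 0.9])  # "disease-state" profile

# Nearest-centroid classifier: a purely correlative form -> function model.
centroids = {
    "healthy": healthy.mean(axis=0),
    "diseased": diseased.mean(axis=0),
}

def classify(features):
    """Label a cell by the closest learned morphology centroid."""
    return min(centroids, key=lambda k: np.linalg.norm(features - centroids[k]))

print(classify(np.array([1.0, 0.8, 0.2])))  # -> healthy
print(classify(np.array([0.5, 1.4, 0.8])))  # -> diseased
```

The sketch is the “pathologist of yore” in miniature: structure in, disease state out, with the mechanistic question deferred until the correlation is established at scale.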

This interdisciplinary approach across biology, robotics, distributed systems, and machine learning is both unique in its application and contrarian in its industry methodology. Chris, Blake, and Dean have assembled a team that matches their own capabilities and tenacity, reflected in the company’s three top-10 pharma contracts and in early indications that the Recursion team is well on its way to building a fundamental map from cell structure to health state. We’re similarly proud of the unique, interdisciplinary syndicate (hat tip: Obvious Ventures, DCVC, AME Cloud Ventures, and EPIC) that supports the company through the many challenging and exciting pharma, technical, biological, data, and computational hurdles it will face as it scales up internal efforts and grows its pipeline.

Welcome Chris, Blake, Dean and the entire Recursion team to the Lux portfolio!



Zavain Dar

radical Computer Scientist, recovering Nihilist, VC @Lux_Capital, adjunct @Stanford