Leveraging disease complexity and patient diversity for drug discovery

Marcel van der Brug
Character Biosciences
May 17, 2022


Note: The original version of this post was published on July 10, 2019. The company was known as Clover Therapeutics at that time. Clover Therapeutics is now Character Biosciences.

Drug discovery, at its simplest, is answering a series of questions about the nature of disease. Answer enough questions correctly, and you can discover a new medicine that eases suffering. This, of course, is not simple or easy. Success in drug discovery is built on a foundation of failures. This in and of itself is not a bad thing, provided we learn from it. What is concerning is the long development times and costly late-stage clinical failures that threaten to choke the pace and affordability of developing new medicines that improve patients’ lives.

I do not mean the drug discovery process is inherently broken. However, models that worked well since the beginning of the 20th century now have poorer success rates. A new drug entering Phase 1 clinical trials now has only a ~10% chance of reaching approval, and it takes more than a decade to get to patients. The majority of the failures occur at the Phase 2 and 3 clinical stages, and they drive the total cost of bringing a single drug to market to more than $2.5 billion. Even then, many new medicines don’t improve outcomes over existing drugs and fail to lower healthcare costs for patients or payers.

There are new approaches to augmenting the drug discovery process that are encouraging. For example, we have more clinical, imaging and genetic datasets than ever before (e.g. All of Us and the UK Biobank). The convergence of machine learning (ML) with big clinical data and other big biological data promises new understanding of disease mechanisms. It’s easy to be seduced by this idea of hidden insights uncovered by ML. How effective is it? Can a large cohort of clinical and molecular data be analyzed to provide real breakthroughs in understanding disease mechanisms? Can it also really diagnose sub-types of disease and the exact patients to enroll in a trial?

Maybe, but there is a catch. Derek Lowe, Director in Chemical Biology Therapeutics at Novartis, makes the point that “The important problems with drug discovery in general are not data handling problems … [We] like to think that everything could be solved if we could just obtain and correlate enough data.”

We need to obtain what we think is the right data now, AND continuously collect new human data as our understanding of disease evolves.

For Character Biosciences (“Character”), this means we:
(1) Build long-term, trusted relationships with patients that respect their consent, protect their privacy and advance their care;
(2) Add longitudinal quantitative data to our extensive live clinical data stream;
(3) Use genomics and other multimodal data to study disease in vivo; and
(4) Develop in-house drug discovery expertise and collaborate with industry leaders.

Assembling these capabilities under one roof is an opportunity to accelerate the development of first-in-class therapeutics for diseases of aging.

Machine learning-driven discovery of disease sub-types

The way diseases are classified has not changed dramatically from the original symptom-based descriptions used in the 17th century. To identify a valid drug target, we cannot just rely on these classical definitions of disease. We need to identify sub-groups of patients for whom we believe that a specific target is driving disease. Why are some patients afflicted with disease early? Why do some diseases progress more quickly? Why don’t some respond to current therapies while others are cured? What gene/protein/pathology is driving each of these phenotypes? Do all of these different manifestations even mean we’re talking about one disease anymore?

Heterogeneity in complex diseases is the reason even blockbuster drugs don’t work for everyone. We want to embrace the concept of disease heterogeneity and develop targeted therapeutics with greater efficacy than the current one-size-fits-all approach.

To do this we need more data than is found in standard clinical records. Character applies machine learning to genomic, imaging, and digital biomarker data to refine how we classify diseases. Through this understanding, we can identify candidate drug targets and their relationship to phenotypes. It also allows us to be more creative in choosing clinical outcome measures for therapeutic clinical trials, and potentially decrease the size of those trials.
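
As a rough illustration of what data-driven sub-typing can look like, here is a minimal sketch that groups patients by similarity across two measurements. It uses plain k-means clustering, chosen only because it is simple; the post does not say which methods Character uses. The feature names and all patient values are invented for the example.

```python
from math import dist
from statistics import fmean


def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points so runs are repeatable."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each patient to the nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of the patients assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(fmean(axis) for axis in zip(*members))
    return labels, centroids


# Hypothetical standardized features per patient: (imaging score, biomarker score).
# These values are made up for illustration only.
patients = [
    (0.1, 0.2), (2.1, 1.9), (0.0, -0.1),
    (1.8, 2.2), (-0.2, 0.1), (2.0, 2.0),
]

labels, centroids = kmeans(patients, k=2)
print(labels)     # candidate sub-type assignment for each patient
print(centroids)  # the "typical" profile of each candidate sub-type
```

Clusters like these are only hypotheses. Whether a grouping reflects a real disease sub-type still has to be tested against genetics, biology and clinical outcomes.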

Finding therapeutic targets through analysis of genetic diversity

The patients enrolled in Character’s clinical trials are diverse. We have an opportunity to increase the representativeness of genetic studies, and consider the influence of genetic ancestry on disease. By 2016, 81% of participants in genome-wide association studies were people of European ancestry, and 14% were Asian. Real inequalities in healthcare exist because of this imbalance in genetic knowledge. For example, patients with African ancestry are more likely than those of European ancestry to be misdiagnosed as carrying a pathogenic mutation that causes a life-threatening heart condition known as hypertrophic cardiomyopathy.

Problems for genetics-driven discovery of new drug targets also exist. Most known disease-associated genetic variants are located outside protein-coding regions. Causal variants and genes have not been definitively identified for most loci. Further refinement of these loci using only genomes from people of European descent may be problematic because of linkage disequilibrium (LD) structure: nearby variants that are inherited together cannot be told apart by association alone. Allele frequencies, LD patterns, and effect sizes of common polymorphisms vary with ancestry. By prioritizing patient diversity in our drug discovery effort, we may refine existing disease loci, discover new loci, identify novel rare variants, and advance precision medicine for all patients.
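
To make the point about linkage disequilibrium concrete, the sketch below computes the standard r² statistic between two variants from haplotype counts, for two populations. The counts are invented for illustration and do not come from any real cohort; the function name is likewise just a label for this example.

```python
def ld_r2(hap_counts):
    """Return r^2 between two biallelic sites from haplotype counts.

    hap_counts maps the four two-site haplotypes ("AB", "Ab", "aB", "ab")
    to the number of chromosomes carrying each.
    """
    total = sum(hap_counts.values())
    p_ab = hap_counts["AB"] / total                        # frequency of the A-B haplotype
    p_a = (hap_counts["AB"] + hap_counts["Ab"]) / total    # frequency of allele A
    p_b = (hap_counts["AB"] + hap_counts["aB"]) / total    # frequency of allele B
    d = p_ab - p_a * p_b                                   # LD coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical haplotype counts for the same pair of variants in two populations.
population_1 = {"AB": 45, "Ab": 5, "aB": 5, "ab": 45}
population_2 = {"AB": 30, "Ab": 20, "aB": 20, "ab": 30}

print(round(ld_r2(population_1), 2))  # 0.64: the variants travel together
print(round(ld_r2(population_2), 2))  # 0.04: the variants are nearly independent
```

In the first made-up population the two variants are so tightly correlated that an association signal cannot say which one matters. In the second, the correlation is weak, so the same study could separate them. That is the sense in which studying more ancestries can help narrow a locus.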

Deep phenotyping of individuals with genetic variants related to a drug target

As a consequence of genotyping well-characterized patients, we can identify rare variants of large effect. With patients as voluntary and informed partners, we generate additional biological evidence to link these variants to disease and begin to dissect the mechanisms.

Because of our focus on building long-term partnerships with patients, we have an efficient way to re-engage patients with key genetic variants for additional hypothesis-driven phenotyping and follow their clinical journey for years. The collection of biosamples to support in vitro testing of candidate genes also becomes feasible, enabling functional genomic studies to validate targets and tools to develop therapeutic molecules.

Efficient clinical development

Time spent enrolling patients is often the single biggest factor in the duration of a clinical trial. At Character, we can have years of longitudinal data before patients enroll in a therapeutic clinical trial. This would provide unprecedented understanding of disease manifestation and progression — at the individual patient level. This opens the possibility of selecting the smallest number of patients to enroll in Phase 2 and 3 studies.

Smaller trials lower trial costs and potentially shorten trial times. This idea is not new, but successful execution is part of the important but inconvenient path to improving clinical success.
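
A back-of-the-envelope calculation shows why patient selection can shrink a trial. The sketch below uses the textbook normal-approximation formula for comparing the means of two equal-sized arms, n = 2(z_{1-α/2} + z_{power})²(σ/δ)² per arm. The effect sizes are hypothetical and are not taken from any Character study.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(effect, sd, alpha=0.05, power=0.8):
    """Approximate patients needed per arm to detect a mean difference `effect`.

    Two-sided test at level `alpha`, equal arms, outcome standard deviation `sd`.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)


# Hypothetical scenario: in an unselected population the average benefit is
# diluted by non-responders; in a target-driven sub-type it is twice as large.
unselected = patients_per_arm(effect=0.25, sd=1.0)
enriched = patients_per_arm(effect=0.5, sd=1.0)

print(unselected)  # 252 patients per arm
print(enriched)    # 63 patients per arm
```

Under these assumptions, doubling the expected effect cuts the required enrollment to about a quarter. Real trial design involves more than this formula (dropout, multiple endpoints, regulatory requirements), so treat the numbers as an illustration of the scaling only.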

Looking backward and forward

My personal induction into genetics and drug discovery began in the winter of 2003, though I did not realize it at the time. I had started my postdoctoral work on Parkinson’s Disease (PD) in a neurogenetics laboratory at the National Institutes of Health, just outside of Washington DC.

To prepare, I read as much literature as I could about the genetics of PD, of which there was little. The laboratory where I worked was changing this. It had a prominent role in discovering genetic causes of PD and was rapidly expanding both the literature and its lab space. We were well resourced and not wanting for DNA sequencers, PCR machines, confocal microscopes and other expensive scientific instruments of discovery. But then, as now, money and resources were not the bottleneck to discovering what causes disease; insights and understanding only came with access to the right patients.

In 2004, our geneticists were screening families from Spain and England that appeared to have a heritable form of Parkinson’s disease. Along with another academic group, they soon pinpointed the gene responsible (LRRK2). Academically the gene was fascinating. From a drug development perspective, it was a potential breakthrough.

It was a kinase, a class of proteins that had been drug targets for more than a decade. This was a rare opportunity to develop a new treatment for PD. New inhibitors were developed by several pharmaceutical companies, and seven years later I joined one of these programs. Unfortunately, our collective lack of understanding of LRRK2 biology, along with the risk of toxicity, stopped our program and others. It took another six years, an amazing patient advocacy group, and a new company to figure out how to bring this drug into clinical testing in 2018, in part by finding patients with a sub-type of PD driven by the target. We don’t yet know whether LRRK2 drugs will work, but I do think we could have found out sooner.

At Character, we believe embracing complexity in disease, diversity in patients, and a live stream of clinical genomic data is a faster path to developing personalized medicines and improving patient lives.



