Health Data: Unlocking its full potential, at last

Nov 15, 2022


By Gabriel Franz, with the help of Samantha Jérusalmy, Sacha Loiseau, Marc Rougier, Saish Rane, Louisa Mesnard and Jean-David Zeitoun

In July 2021, DeepMind (acquired by Google in 2014) released AlphaFold, a groundbreaking AI system that predicts the 3D structure of a protein from its amino acid sequence alone. Trained on a data set of 170,000 protein structures that previous technologies took nearly 50 years to establish, AlphaFold predicted the remaining protein universe (200 million structures) in less than two years. AlphaFold structures have already become key for basic research, and will likely lead to better clinical research outcomes: for instance, drugs designed with the help of such systems may be more likely to prove effective and/or safe.

An undeniable momentum has been building around the use of health data to foster research breakthroughs as critical pieces, including regulations, advanced computational techniques, and the volume and quality of health data, fall into place. Entrepreneurs, researchers, data scientists and health professionals have developed a much better understanding of the fit between health systems’ needs in terms of efficacy and efficiency, and the potential of increasingly rich health data. Legacy life science and pharmaceutical companies are more willing to use that data to drive their product and process developments.

This dynamic is driving spectacular start-up funding rounds and exits, validating the hypothesis that health data has become one of the hottest sectors for private equity. After several years, if not decades, of waiting for data science to play a significant role in life sciences, we might finally be at an inflection point, which makes health data an all the more exciting investment space.

After having focused on Digital Life Sciences last year, and French Digital Health Deep Tech Startups last month, we’ll now explore why we at Elaia invest in health data.

We’ll talk about these points in 3 chapters:

1. Health Data: context
  • Definition and Scope
  • Key Trends
  • Usage Challenges

2. Medico-Economical value of health data

3. Hot funding & M&A space

Chapter 1 — Health data: definition, trends & challenges

Definition and scope: What is health data?

One of the trends around health data is the exceptional growth in both its diversity and volume, with approximately 30% of global data being generated by the healthcare industry. Let’s start by defining exactly what we mean when we use the term “health data”.

Health data is commonly divided into two contrasting types: Randomized Controlled Trial (RCT) data and Real World Data (RWD). RCT data is collected in an experimental setting, normally comparing a proposed new treatment to an existing one. RWD is collected in real-world settings and covers all data produced by healthcare systems, doctors, hospitals, labs and patients. This includes patient files, biology reports, pathology reports, imaging, administrative, operational and claims data. It also includes “multi-omics” data: genomics, epigenomics, transcriptomics, proteomics, microbiomics, etc.

Some of this RWD is even collected outside of medical facilities and institutions. This is possible thanks to the growing popularity of connected health devices such as smartwatches, scales, bracelets, and even smartphones that measure heart activity, body mass, and number of steps. Tilak is a great example of this new RWD collection paradigm.

Its digital therapeutics solution helps ophthalmologists monitor patients with chronic ophthalmic diseases. Tilak collects RWD through clinically validated mobile eyesight tests, and engages the player/patient in a series of games to track their visual acuity. The collected data is then sent to the physician to adapt ongoing treatments while also helping pharmaceutical companies improve the efficiency and comprehensiveness of their clinical trials.

Key trends: Digitalization of health processes and 5P medicine

Until now, the health sector has been slow to digitize compared to other industries. While it produces, collects, and stores vast amounts of health data, much of that data is unstructured and unstandardized, and sometimes even still recorded on paper.

However, there are signs that this logjam is finally breaking. The accessibility and amount of digitized data have drastically increased in recent years, in large measure thanks to the massive adoption of EHRs (Electronic Health Records).

This points to a larger shift underway. Medicine is evolving rapidly, becoming a field where life sciences go hand in hand with data science. The explosion of patient data is offering medical fields the opportunity to adapt by leveraging this new information. The race is now on to fully unlock the insights from health data because everyone, from patients to providers to insurers, is beginning to grasp the value of this information.

That has led to the creation of a new framework for thinking and talking about the benefits of this new data dynamic. The new concept, dubbed “5P medicine”, takes a patient-centric approach to health by focusing on increasing treatment success rates and decreasing negative side effects on patient health. Health data and digital technologies such as IoT and AI are the core enablers of 5P medicine.

Challenges to the use of health data

If you want to understand why the road to progress has been this complex, you’ll have to look back a decade to the genesis of a health data project in the U.S. In 2011, the National Academies of Sciences, Engineering, and Medicine called for the creation of an “Information Commons” and a “Knowledge Network.” The goal was to create a common pool of medical and research data to promote more rapid advances in medical research. While the project continues to advance the spirit of open science, it has also been plagued by questions over structure, organization, and governance.

These hurdles reflect two broad challenges that all health data projects face: regulation and data flaws.

Let’s start with the regulatory constraints. Across borders, whether between U.S. states or EU nations, regulations on privacy and data sharing can vary widely. In Europe, health data is personal data within the meaning of the General Data Protection Regulation (GDPR). In some cases, the question of data sovereignty has become political, with rules restricting where data is stored and by which providers. For instance, the Health Data Hub (HDH) has stirred controversy in France because it is hosted on Microsoft Azure, an American cloud provider. This potentially places France’s data sovereignty at risk because the U.S. CLOUD Act allows U.S. authorities to compel cloud providers to hand over users’ data no matter where in the world it is stored. Similar concerns about U.S. access to European data led the Court of Justice of the European Union to invalidate the Privacy Shield in 2020.

Irrespective of region, any analysis and innovation must follow strict medical ethics guidelines. In France, such aggregated data can only be used for research purposes. Beyond informing patients about the use of their data, it’s just as important that the public trust that this sensitive information is guarded safely and is only being used for their personal and public good. These guidelines, long considered constraints, have now laid the groundwork for enabling regulations that foster novel research breakthroughs.

The 21st Century Cures Act in the U.S. will make EHRs a mainstay and democratize data access for health providers. In parallel, the European Commission is developing a European Health Data Space proposal to support the use of health data for innovation and research. In France, under a framework imposed by the CNIL (French National Commission for Information Technology and Civil Liberties), approved health data warehouses (EDS) offer a valid solution to the aggregated data storage problem. At the same time, government regulators are addressing sensitive topics like standards and interoperability to unlock the potential of health data.

While regulations gradually evolve towards a favorable health research environment, there remains an even deeper and more stubborn technical problem: health data is siloed, unstructured, unstandardized, biased and drawn from multiple sources.

Let’s break down these flaws.

Data typically comes from a wide range of sources that use different formats, remaining siloed across different hospitals, research centers and institutions. Creating a standard format has proved difficult, resulting in interoperability issues, and therefore limiting the ability of researchers to share and leverage the data from different institutions.
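To make the interoperability problem concrete, here is a minimal Python sketch of what harmonizing two sources can look like. Everything in it is an assumption made for illustration: the two hospital exports, their field names, and the `CommonRecord` schema are hypothetical and do not correspond to any real standard or vendor format.

```python
from dataclasses import dataclass
from datetime import date, datetime

LB_TO_KG = 0.45359237  # exact definition of the international pound


@dataclass(frozen=True)
class CommonRecord:
    """Hypothetical shared schema that every source is mapped into."""
    patient_id: str
    birth_date: date
    weight_kg: float


def from_hospital_a(row: dict) -> CommonRecord:
    # Hypothetical source A: ISO dates, weight already in kilograms.
    return CommonRecord(
        patient_id=f"A-{row['id']}",
        birth_date=date.fromisoformat(row["dob"]),
        weight_kg=float(row["weight_kg"]),
    )


def from_hospital_b(row: dict) -> CommonRecord:
    # Hypothetical source B: day/month/year dates, weight in pounds.
    return CommonRecord(
        patient_id=f"B-{row['patient_ref']}",
        birth_date=datetime.strptime(row["birth"], "%d/%m/%Y").date(),
        weight_kg=round(float(row["weight_lb"]) * LB_TO_KG, 1),
    )


records = [
    from_hospital_a({"id": 17, "dob": "1980-03-02", "weight_kg": "72.5"}),
    from_hospital_b({"patient_ref": "x9", "birth": "02/03/1980", "weight_lb": "160"}),
]
for record in records:
    print(record)
```

Each new source needs its own mapping function, which is why agreeing on a shared target format matters: the cost of integration grows with the number of formats in circulation, not just with the volume of data.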

Complicating the situation is that much of this data is unstructured and comes in various forms (PDFs, images, etc.). The sheer quantity of this data makes it almost impossible to categorize, clean, and label by hand, making it hard to access. Only a tiny part of the data found in health institutions is structured and formatted in a way that makes it easy to exploit.

In the instances where data can be sorted and mixed together, scientists must still address potential bias correction issues, due to the lack of sufficiently large volumes of quality data to train machine learning models. At the same time, it can be technically challenging to combine multiple data sources to leverage any insights. This may nudge researchers to collect detailed, in-depth data from a smaller number of individuals, rather than a larger volume of data on an even greater number of individuals.

LynxCare is a compelling example of how to unlock the full potential of health data. Hospitals generate petabytes of data each year but only around 20% of it is used in daily practice to improve patient lives, because the data is locked in many disparate databases and free-form digital text. By mining and aggregating both structured and unstructured hospital data using NLP (Natural Language Processing) & AI, LynxCare is unlocking that information to enable better patient treatment and facilitate research with Life Sciences companies like Johnson & Johnson, Pfizer and AstraZeneca. For example, LynxCare’s AI is used to help doctors find undiagnosed patients affected by rare diseases, generate insights on how to better treat them, and determine which treatments are effective and which ones aren’t.

Besides providing a holistic data solution, LynxCare is also solving the pain of data standardization. By formatting the data in the Observational Medical Outcomes Partnership (OMOP) common data model, which is increasingly being used across Europe, it gives all data users a uniform language foundation for working with this data.

In conclusion, for those who believe in the potential of health data, the mission is clear: Health data must be easier to process and utilize, it must be standardized and structured, it must respect regulatory frameworks, and it must be consolidated and hosted in a way that is secure and protects privacy.

Chapter 2 — Medico-economical value of health data

Highly valuable medical & business outputs

Health providers are recognizing patient data as a valuable intangible asset: a treasure trove of information sought after by multiple stakeholders.

On a human level, this data can provide medical value in terms of earlier, faster, and more accurate diagnoses, improved outcomes, care pathways, and operational efficiency. When combined into a single longitudinal data set, patient-level records will create a 360-degree view of patients’ health, wellness, diagnosis, treatments, medical procedures, and outcomes.

But on a larger, societal scale, health data can fuel innovation in medical research and improve patient care, placing it at the heart of today’s health revolution. This will enable incredible breakthroughs in the health ecosystem through improved diagnosis, treatments, and operational effectiveness.

How does this value drive innovation?

Given this evolution of health data, there is a range of ways for startups to solve the challenges or help others leverage the opportunities. Health companies, including startups and life sciences companies, have understood the potential of the health data market and are therefore tackling various points along the value chain, from data collection to data interpretation.

At Elaia, we see 3 categories of startups addressing these data challenges:

  1. Creating new ways to collect data: These startups add value through data exclusivity. The data they stockpile is only available from a single source, leading to a scarce asset that can serve as the foundation for new applications. By collecting a larger volume of data, they also help reduce or eliminate biases.
  2. Enabling data processing: The resulting raw datasets are not yet ready to be analyzed because of their lack of structure. These startups create a pipeline for extracting, cleaning, standardizing, structuring and hosting data. As a result, data is made accessible while preserving privacy. This compilation of aggregated and cleaned data also depends on interoperability capabilities between parties.
  3. Analyzing complex data: In this category, the ROI is determined by the target. The more complex the type of case and the more critical the applications, the more value these companies create. That value is directly proportional to the insights and innovations that can be unlocked from the curated data. Startups that address the challenges set by the 5P medicine model make for particularly compelling investments.


Two remarks on this classification: some companies operate in several categories at the same time, and there is also a transverse category that supports the 3 main categories through services that reinforce and protect data across its different forms and stages.

To illustrate the third category, let’s consider the example of genomic data. For lack of accurate genomic data, clinical labs miss important actionable variants and biopharmas fail to identify patients who will respond to treatments, which results in clinicians prescribing the wrong therapies. As sequencing costs fall, genomic profiling (genotype-level or whole genome/exome sequencing) is becoming more frequent and has the potential to vastly increase the complexity of analyses and the data set size.

Developing the most advanced end-to-end genomic analysis platform, SeqOne provides state-of-the-art bioinformatic tools to identify complex genomic events, tackles the lack of collaboration between different parties, and aims to make genomic tests as simple and accessible as a blood test for any clinical lab. This will enable personalized medicine at scale: genomic data can also be used for drug discovery, to give more precise diagnoses, make more efficient use of medicines, and increase the quality of life.

Private & public initiatives

The rise of these startups is leading to new collaborations with the institutions that own health data. These partnerships are crucial to safely unlock access to a sufficient volume of data, which is critical for AI initiatives, and to achieve improved outcomes at scale.

These collaborations involve various configurations of startups, established life science companies, hospitals, and research institutions that recognize the enhanced value that results from combining and sharing their respective data. They take the form of incubators, government investments, and strategic initiatives. In France, this ecosystem is flourishing.

In the public sphere, we’ve seen initiatives such as the previously mentioned Health Data Hub (HDH), which is dedicated to partnerships around health data, to make France an AI leader on the world stage. As part of the French government’s France 2030 program, a €50m budget is allocated to foster innovation through the construction of health data warehouses.

On the private side, LynxCare has teamed up with PSIH, the French leader in Business Intelligence for hospitals, to improve the use of data in patient care and research. Arkhn and Owkin have launched Oncolab to make oncology data more accessible for research and innovation. Octopize, CHU Brest, and Roche have developed a process to simplify the transmission of patient data from clinical studies. Finally, incubators such as Future4Care and Parisanté Campus contribute to this ecosystem by supporting Digital Health companies at the earliest stages.

Chapter 3 — Hot funding & exits

As the health data sector expands, it’s creating a stronger foundation for even more robust businesses. According to Straits Research, the market for Big Data in healthcare was valued at $32.9bn worldwide in 2021, and is projected to grow at a 13.85% CAGR to reach $105.7bn by 2030.
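As a quick sanity check on these market figures, the short Python sketch below compounds the 2021 value at the quoted growth rate and, conversely, derives the growth rate implied by the two endpoints. The function names are ours, and the nine-year horizon (2021 to 2030) is an assumption about how the forecast period is counted.

```python
def project_value(start_value: float, cagr: float, years: int) -> float:
    """Compound start_value at a constant annual growth rate."""
    return start_value * (1 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Constant annual growth rate that links two values over a period."""
    return (end_value / start_value) ** (1 / years) - 1


YEARS = 2030 - 2021  # assumed forecast horizon: 9 compounding periods

projected = project_value(32.9, 0.1385, YEARS)   # in $bn
rate = implied_cagr(32.9, 105.7, YEARS)

print(f"Projected 2030 market size: ${projected:.1f}bn")
print(f"Implied CAGR: {rate:.2%}")
```

Under that assumption the two figures are consistent with each other: compounding $32.9bn at 13.85% for nine years lands at roughly $105.7bn.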

Investors are taking notice and startups are ready for VC money. Over the past year, we’ve seen some eye-watering VC rounds for health data startups. In the US, Commure raised $500m for their health OS in 2021, while Tempus raised an additional $275m a few weeks ago, bringing their total amount raised to $1.3bn.

In France, Owkin’s federated learning platform for drug discovery is paving the way with a $180m round led by Sanofi in 2021. Additionally, Lifen raised $58m to further develop the interoperability capabilities of health players. As for Elaia, we have supported Aqemia from pre-seed to their recent €30m Series A round, and have joined LynxCare’s €20m Series A round in 2022 as it prepares to roll out its solution to hospitals in Belgium & France, as well as SeqOne’s €20m Series A to democratize genomic analyses & precision medicine.

These VC investments are already translating into a remarkable track record of exits. Back in 2018, Roche set the bar for these deals when it paid $1.9bn to acquire Flatiron, which develops an oncology platform for better patient experience, healthier practice and smarter research. Roche is not an isolated case, being one of the many acquirers among pharmaceutical companies, manufacturers and big tech companies including GAFAMs.

More recent eye-catching acquisitions include Nuance, which provides a better patient-physician experience notably through its conversational AI, acquired by Microsoft for $19bn in a deal announced in 2021. Similarly, an equity consortium led by Nordic Capital bought Inovalon and its transverse data-driven health platform mid-2021 for $7.3bn.

All this activity is leading to increased confidence around the future of startups in the Digital Health Data sector. Even with current market turbulence, at Elaia, we remain optimistic about the long-term potential of the health data sector. Indeed, it falls into the broader Digital Health category, which continues to set new records in Europe, where fundraising grew 6% QoQ in Q2 2022, according to CB Insights. That is particularly impressive considering both the US and Asia were down significantly.

At Elaia we believe it is the right moment to invest in collecting, processing, and analyzing health data. The ecosystem is on the verge of unlocking the potential of health data thanks to the support of regulation and a world-class network of hospitals and research labs. We can’t think of another area where investors’ potential to help improve society’s quality of life is so well aligned with the prospect of big returns.


