Foundation models: The key to solving radiology with AI?
In 2016, Geoffrey Hinton, one of the three founding fathers of deep learning, predicted the end of the radiologist within five years, comparing the profession to the coyote that does not yet realize the ground has given way beneath it with the rise of AI.
In 2022, radiologists are still here. The prediction missed its deadline, but a paradigm shift is nonetheless under way in the world of medicine, and it is even more striking in radiology.
One figure testifies to this phenomenon: the number of AI solutions cleared by the FDA, a list largely dominated by radiology.
But this impressive number is misleading. It says nothing about the problem of adoption by doctors, who consider many of these products to be of little or no use and end up disengaging from them, like the unused applications that clutter our smartphones. Some even speak of an “adoption crisis” of AI in radiology.
Yet the needs are there: examinations keep multiplying in an ever more technical medicine, while radiologists remain too scarce and take a long time to train (around ten years of study). This growing demand takes a toll on radiologists' lives: 49% of them show signs of burnout in the US.
We need more intelligently designed tools to meet this growing demand, because radiology saves lives. Early diagnosis of lung cancer by CT scan, for example, has a huge impact on the outcome of at-risk patients (95% five-year survival at stage Ia), whereas cancer diagnosed too late, as is currently the case for 75% of patients, carries a much worse prognosis (16% five-year survival without a screening program).
Radiology was among the first disciplines to see a generation of deep-learning-based software emerge; it will probably also be among the first medical domains to see the next one appear.
AI has seen its paradigm change in recent years. Larger and larger models have emerged, showing impressive results in Natural Language Processing (NLP), such as GPT-3, and even more spectacular ones, such as DALL-E or Stable Diffusion, capable of generating very realistic images from a single prompt.
The rise of foundation models is a clear signal of this paradigm shift.
Biological intuition behind Foundation models
Let’s take the example of a child: their brain learns a great deal in an unsupervised way, i.e. without explicit teaching from the parents. A representation of the world is built through probabilistic learning, via observation of and interaction with the environment.
Parents do not explain the laws of physics, such as gravity, to a one-year-old, who nevertheless builds a first model from perception: by observing phenomena (astonishment at a falling object, etc.) or by interacting with the environment (learning to walk, for example). The same goes for language, with first words that are not necessarily dictated to the child, or for the development of the brain's visual areas, which enable autonomous recognition of shapes, contrasts, faces…
This emergent autonomous learning does not, however, exclude the role of education and supervision; rather, supervision costs less when it builds on prior knowledge. A child will learn more words, faster, with the help of parents or teachers. The same is true for machines: the first generations were already based on this principle through transfer learning, which consists of reusing a pre-trained network for a new task (see the sketch below). This principle turns out to be even more powerful with foundation models.
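As a concrete illustration, here is a minimal transfer-learning sketch in PyTorch (the two-class task is illustrative, not one of ours): a ResNet-50 pre-trained on ImageNet is frozen, and only its classification head is replaced for the new task.

```python
import torch.nn as nn
import torchvision.models as models

# Load a ResNet-50 pre-trained on ImageNet: its learned
# representations are the "prerequisites" the new task builds on
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze the pre-trained backbone
for param in model.parameters():
    param.requires_grad = False

# Replace only the classification head (here, a hypothetical 2-class task);
# during fine-tuning, only these new parameters are updated
model.fc = nn.Linear(model.fc.in_features, 2)
```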
A new generation of AI with Foundation models
Even if this idea of autonomous learning seems simple and intuitive (it is how we all learned!), it is not how most current Machine Learning models are designed: the models powering most of today's AI applications rely on Supervised Learning. They are designed to solve a single task and are therefore of small to intermediate size; a ResNet-50, for example, has about 25 million parameters. This goes hand in hand with the limited size of supervised datasets: the ImageNet ILSVRC dataset contains 1.3 million images.
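As a quick check of that order of magnitude, the parameters of a torchvision ResNet-50 can be counted directly (a minimal sketch, assuming PyTorch and torchvision are installed):

```python
import torchvision.models as models

resnet50 = models.resnet50()
n_params = sum(p.numel() for p in resnet50.parameters())
print(f"ResNet-50: {n_params / 1e6:.1f}M parameters")  # ~25.6M
```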
While these numbers may seem large, those encountered in the foundation model paradigm are orders of magnitude higher.
Indeed, foundation models are trained on very large amounts of unlabeled data, which is far more abundant than labeled data. The Florence foundation model, for example, is trained on a corpus of 900M image-text pairs.
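Models like Florence and CLIP rely on variants of a contrastive image-text objective, which needs no human labels: the pairing of an image with its caption, harvested from the web, is the supervision signal. Here is a minimal sketch of the symmetric contrastive (InfoNCE) loss over a batch of paired embeddings; the image and text encoders producing them are assumed to exist and are not shown.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss: matching image/text pairs lie on the diagonal."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Each image should match its own caption, and vice versa
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2
```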
These foundation models can surpass supervised models on the supervised tasks themselves: the CoCa foundation model, for example, is currently the best-performing algorithm on the ImageNet classification task.
But foundation models are not only better than supervised models on their own tasks; they also open the way to many tasks that were previously out of reach.
For example, the PaLM model developed by Google can explain jokes, summarize texts, translate between programming languages, and perform many other tasks.
Even though foundation models originated in the language realm, they are evolving fast to handle multiple modalities at once. DALL-E, for instance, relies on the CLIP foundation model, and the BEiT-3 model achieves state-of-the-art performance on multimodal tasks involving text and images.
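A concrete illustration of what such multimodal models unlock is zero-shot classification: describe the candidate classes in plain text and let CLIP score an image against them, with no task-specific training. A minimal sketch using the Hugging Face transformers library (the image path and labels are illustrative):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # hypothetical local image
labels = ["a photo of a dog", "a photo of a cat", "a chest X-ray"]

# Score the image against each text description and normalize to probabilities
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
for label, p in zip(labels, probs.tolist()):
    print(f"{label}: {p:.2%}")
```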
What can be done with Foundation models
Foundation models already power specific use cases outside medicine. At Raidium, we already use tools based on this technology: GitHub Copilot, based on GPT-3, lets you code faster by treating your code and comments as a “prompt” that acts as a semi-automatic instruction for writing lines of code, saving precious time.
This design inspires us: do not automate a job away, but make it more efficient by building tools that serve the people who do it. A desirable disruption, in a way.
Our objective at Raidium is to bring these advanced technologies to the world of medical imaging, for research as well as for clinical practice. To do so, we will need to build new tools within a medical environment full of legacy software.
We are aiming for a world first: building the first foundation model for radiology, to power a visual assistant, a “GitHub Copilot” of radiology. With the needs of precision medicine already growing faster than the supply of radiologists, such a foundation model will open up a new world of radiological practice: smarter development of AI-based imaging biomarkers, powered by large models that embed both visual and medical knowledge, making AI development more agile and scalable than existing solutions.
At Raidium, we want to focus on medical use cases where radiology can have a huge impact on outcomes, such as cancer detection and metabolic or cardiovascular diseases.
We believe this technology can have a major positive impact on healthcare. Doctors will work more efficiently, clinical research will accelerate, and in the end everyone wins: as patients, we will all benefit from more efficient medicine.
This will not happen by itself; the mission is ambitious. If you want to join the adventure, contact us!
Paul Hérent is a radiologist and conducted the first French thesis on the uses of AI in radiology.
Pierre Manceron is an applied mathematics engineer from Ecole Centrale Paris and MVA, specializing in Machine Learning.
They met at Owkin, where Paul trained in Machine Learning and Pierre led the ML Engineering group. Together they helped expand clinical research with AI through high-impact publications.
Contact : contact@raidium.eu
Follow us on LinkedIn