Is Alzheimer’s disease too complex even for AI?

AD research with AI is massive, but does it pay off?

Semen Yesylevskyy
Receptor.AI
5 min read · Jan 13, 2022

Alzheimer’s disease (AD) is one of the most fearsome curses of our rapidly aging civilization. This severe neurodegenerative disease still has no cure and, once diagnosed, inevitably leads to mental disability and dementia. AD is prevalent in the elderly, but recent research shows that the disease may start several decades before the onset of symptoms.

Despite very intensive research, AD remains mysterious and poorly understood in many ways. The condition diagnosed as AD appears to be extremely heterogeneous, with no clear classification or subtyping. Beyond advanced age, risk factors are still poorly defined: the known genetic variants, such as APOE ε4, and environmental conditions account for only part of the risk.

It is clear that the disease is so complex and multifaceted that traditional approaches fail not only in its treatment but also in diagnostics and in understanding its biological basis. The whole arsenal of modern techniques is now used to better understand AD: genomics, transcriptomics, metabolomics, and various types of macroscopic and microscopic imaging, along with clinical observations and demographic analysis. All these techniques produce huge amounts of data, which overwhelm traditional computational tools.

As usual in such cases, machine learning (ML) techniques may help to make sense of the flood of heterogeneous data. A recent review identifies several major areas where ML could be beneficial in AD research: biomarker identification, disease subtyping and classification, prediction of progression and, finally, drug discovery and drug repurposing.

The major applications of ML in AD research. Image from https://portlandpress.com/emergtoplifesci/article/5/6/765/230422/Applied-machine-learning-in-Alzheimer-s-disease.

Disease classification is especially important for early diagnostics and relies mostly on magnetic resonance imaging (MRI), positron emission tomography (PET) and electroencephalography (EEG) data. Recently, neuropsychological data, such as analysis of recordings of the patient’s speech, have also been incorporated into ML models. The best models, which integrate different sources of data, are reported to identify the early stages of AD with 98% accuracy.

Drug repurposing is now considered more promising than de novo drug discovery for AD because it is not clear which molecular targets should be used for effective treatment. There are several approaches to finding existing drugs that could be used off-label to treat AD. The first is to compare drug-induced changes in gene expression with AD-induced changes. The second is network pharmacology, which is based on association graphs that combine known biological networks and pathways with the known effects of drugs. ML can reveal hidden drug-target associations and thus help to find new targets and their prospective off-label ligands. The third approach is population-based analysis of treatment effects. Large real-world databases of patient records contain indirect data about the off-label effects of prescribed drugs. ML methods can process such data and identify the drugs that are associated with a lower risk of AD onset and progression.
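
The first approach, comparing drug-induced and disease-induced expression changes, can be illustrated with a minimal sketch. It assumes that each signature is a mapping from a gene name to a log fold change, and it scores a drug by the Spearman rank correlation with the disease signature over the shared genes. A strongly negative score suggests that the drug shifts expression in the direction opposite to the disease. The gene names, values and function names below are hypothetical, and real pipelines work with much larger signatures and more robust statistics.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def reversal_score(disease_sig, drug_sig):
    """Spearman correlation of two signatures over their shared genes."""
    genes = sorted(set(disease_sig) & set(drug_sig))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    disease = [disease_sig[g] for g in genes]
    drug = [drug_sig[g] for g in genes]
    return pearson(ranks(disease), ranks(drug))


# Hypothetical log fold changes, for illustration only.
disease_signature = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": -0.5, "GENE_D": -1.5}
drug_x = {"GENE_A": -1.8, "GENE_B": -0.7, "GENE_C": 0.4, "GENE_D": 1.2}
drug_y = {"GENE_A": 1.5, "GENE_B": 0.9, "GENE_C": -0.2, "GENE_D": -1.0}

score_x = reversal_score(disease_signature, drug_x)  # drug opposes the disease pattern
score_y = reversal_score(disease_signature, drug_y)  # drug mimics the disease pattern
```

In this toy setting, the drug with the lower score would be ranked higher as a repurposing candidate. The score is only a hypothesis generator, not evidence of clinical benefit.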

Subtyping of AD is another important task where ML can help a lot by classifying huge amounts of data. The idea of subtyping is to find distinct groups of patients who have similar patterns of disease progression depending on their comorbidities, genotype, race, gender, biomarkers, etc.

Prediction of disease progression has the opposite goal: to estimate the risk of progression from the available patient data. There are two major goals of progression prediction: to identify the “profile” of healthy individuals who are at high risk of AD, and to predict the “aggressiveness” of already diagnosed AD. The latter uses longitudinal measurements, such as series of lab results and cognitive tests of the same patient over time.
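
As a very simple illustration of how longitudinal measurements can be turned into an “aggressiveness” estimate, the sketch below fits a straight line to repeated cognitive test scores of one patient and uses the slope as the rate of decline. This is an assumption made for illustration, not the method of any particular study: the scores are hypothetical, and published models are typically far more elaborate than an ordinary least-squares line.

```python
def decline_rate(times_years, scores):
    """Least-squares slope of score versus time, in points per year."""
    n = len(times_years)
    if n < 2 or n != len(scores):
        raise ValueError("need at least two paired measurements")
    mean_t = sum(times_years) / n
    mean_s = sum(scores) / n
    sxx = sum((t - mean_t) ** 2 for t in times_years)
    if sxx == 0:
        raise ValueError("all measurements were taken at the same time")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times_years, scores))
    return sxy / sxx


# Hypothetical yearly cognitive test scores for two patients.
visit_times = [0, 1, 2, 3]
fast_decliner = [28, 26, 24, 22]
slow_decliner = [29, 29, 28, 28]

fast_rate = decline_rate(visit_times, fast_decliner)
slow_rate = decline_rate(visit_times, slow_decliner)
```

A more negative slope corresponds to a faster decline, so the first patient in this toy example would be flagged as progressing more aggressively.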

Finally, ML could be beneficial in discovering new AD biomarkers. The most pragmatic approach is to identify optimal combinations of existing biomarkers of disease progression, but attempts are also being made to find completely new biomarkers that were not previously used for AD.

Machine learning methods applied to Alzheimer’s disease research. Image from https://portlandpress.com/emergtoplifesci/article/5/6/765/230422/Applied-machine-learning-in-Alzheimer-s-disease.

The ML methods used in AD research are as diverse as the research topics themselves and include almost all modern ML techniques. The amount of effort invested by the ML community in AD research is huge and grows every year. However, the “silver bullet” has still not been found, and many researchers doubt that it exists at all.

The situation with AD clearly shows that ML techniques can’t make sense of heaps of data if the data are not correctly filtered and annotated. The biggest problem with AD datasets is their unprecedented heterogeneity. We still don’t know whether AD is a single disease or a number of clinically similar conditions with different etiologies and provoking factors. If our working hypothesis is wrong, the AI could be as confused as we are.

Available datasets can be huge but often lack important details. For example, real-world patient data are collected for billing purposes, not for academic science. As a result, diagnosis records are often presented as disease codes without any details. It is often not clear whether AD is correctly differentiated from other causes of dementia. This is not so important for billing statistics but is crucial for ML training.

Integration of data from different sources is still painful due to differing formats and quality. For example, not all patients undergo high-resolution brain imaging or genotyping. How should one work with such sparse data, where only a small fraction of the patients are covered by the whole set of diagnostic methods? Moreover, such data carry a systematic demographic bias because these methods are expensive: rich patients from rich countries are more likely to have them.
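
The scale of this sparsity problem is easy to quantify before any model is trained. The sketch below assumes a hypothetical list of patient records in which a missing diagnostic method is stored as None, and it reports the share of patients covered by each method and by all of them together. The field names and values are invented for illustration.

```python
def coverage_report(records, modalities):
    """Return (per-modality coverage, complete-case fraction) for the records."""
    n = len(records)
    if n == 0:
        raise ValueError("no records given")
    per_modality = {
        m: sum(1 for r in records if r.get(m) is not None) / n
        for m in modalities
    }
    complete = sum(
        1 for r in records if all(r.get(m) is not None for m in modalities)
    ) / n
    return per_modality, complete


# Hypothetical patient records; None marks a diagnostic method that was not performed.
patients = [
    {"cognitive_test": 27, "mri": 0.81, "genotype": 1},
    {"cognitive_test": 22, "mri": None, "genotype": None},
    {"cognitive_test": 25, "mri": 0.77, "genotype": None},
    {"cognitive_test": 19, "mri": None, "genotype": None},
]

per_modality, complete_fraction = coverage_report(
    patients, ["cognitive_test", "mri", "genotype"]
)
```

If the complete-case fraction is small, training only on fully covered patients discards most of the cohort and may amplify the demographic bias, so the report is a useful first check before choosing how to handle the missing values.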

The final concern is the reproducibility of results, which is, of course, not unique to AD studies. Most existing ML studies do not deposit their software and datasets in public repositories, which makes them impossible to reproduce and reuse. This leads to redundant effort in data preparation and model training.

Nevertheless, the progress in AD studies using ML techniques is clearly visible, and the future directions are clearly identified. During the next decade, some of the developed ML techniques could finally find their way into clinical practice to help diagnose and treat AD.

Semen Yesylevskyy
Receptor.AI

PhD, Doctor of Sciences, researcher in the area of molecular dynamics and drug discovery. CSO of Receptor.AI. https://t.me/semen_yesylevskyy