Machine Learning: A Way of Treating Cancer?

Jennifer Zhang
Published in deMISTify
Sep 25, 2022

Introduction

We hear the phrase “machine learning” in our daily lives, yet it can be difficult to understand what it actually means. Personally, I think one of the best ways to understand it is through its literal meaning: it is a method by which machines learn from data and analyze the information they are given in order to perform tasks. Arthur C. Clarke once famously said, “Any sufficiently advanced technology is indistinguishable from magic.” Magic seems unrealistic and far away from us, but we can see the advancement of technology in our everyday lives. I believe that machine learning is simply one of the tools we can use to solve problems, answer questions, and discover new things with data.

Machine learning can be used in many different realms, including, but not limited to, translation, facial recognition, and product recommendations. It can also be used to identify otherwise undiagnosable cancers. Cancer is hard to treat because cancer cells grow at incredible speeds, their structures vary, and they can sometimes avoid an immune response [1]. The first step in choosing an appropriate treatment is to determine the primary site, that is, where the cancer began in the body [2]. As technology has advanced, we have developed different methods of treating cancer, but in rare cases the origin of the cancer cannot be determined [2]. That is where machine learning comes into play.

The Machine Learning Model

Cancer cells look and behave quite differently from normal, healthy cells because their gene expression differs. Gene expression also varies widely among cancer cells. Thanks to advances in single-cell profiling and efforts to catalog different cell expression patterns, we have abundant data from which machine learning can find clues about how and where cancer cells originate [2]. It might seem intuitive to let machine learning models learn from the data and parse the differences in gene expression between healthy and cancerous cells, and among different types of cancer cells, to find the origin of a cancer cell. However, building such a model into a diagnostic tool is a challenging task. If the model is too complex and includes too many features of cancer gene expression, it may appear to learn the training data perfectly yet fail when it encounters new data. On the other hand, if the model is too simple and does not account for enough features of cancer gene expression, it will miss information and be unable to identify the cancer type correctly.
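To make this trade-off concrete, here is a minimal sketch that has nothing to do with real gene expression data. It fits polynomials of different degrees to a small, noisy, synthetic data set. The sine curve, the noise level, and the degrees 1, 3, and 10 are all assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Draw n noisy samples from a made-up 'true' relationship (a sine curve)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(15)    # small training set
x_test, y_test = make_data(200)     # unseen data from the same source

def fit_and_score(degree):
    """Fit a polynomial on the training set; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

# Degree 1 is a very simple model, degree 10 a very flexible one.
errors = {degree: fit_and_score(degree) for degree in (1, 3, 10)}

for degree, (train_mse, test_mse) in errors.items():
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The training error can only go down as the polynomial becomes more flexible, so it says little about how a model will behave on new data. In runs like this, the most flexible fit typically does worse on the unseen points than a moderate one, while the straight line misses the curve altogether.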

To strike a balance, researchers at the Koch Institute for Integrative Cancer Research at MIT and Massachusetts General Hospital (MGH) developed a new deep-learning approach. The team focused on modelling the pathways that direct how cancer cells change [2]. As a cancerous tumor develops, its cells can lose many of their specialized traits and begin to resemble embryonic cells [2]. Therefore, by studying the correlation between cancer cells and embryonic cells, we might be able to find the origin of those cancer cells, since existing technology already allows us to categorize embryonic cells. The researchers compared two large cell atlases: The Cancer Genome Atlas (TCGA), which profiles 33 tumor types, and the Mouse Organogenesis Cell Atlas (MOCA), which contains 56 separate trajectories of embryonic cells as they grow and differentiate [2]. These two large databases were then used to train a model to extract a relationship between cancer cells and embryonic cells. More specifically, the researchers broke down the cancer cell gene expression from TCGA into individual components, each corresponding to a specific point in time in a developmental process (cancer cells gradually change as a tumor develops), and assigned each component a mathematical value as an input to the machine-learning model [2]. The resulting model, which relates the TCGA and MOCA data, is called the Developmental Multilayer Perceptron (D-MLP).

Figure 1: Diagnosis of Malignancy by Developmental Deconvolution and Machine Learning [4]

In a study by Enrico Moiso et al., the researchers showed the process of identifying a specific tumor type. In brief, they mapped tumors to trajectories belonging to major cell lineages and developmental programs (Fig. 1A-B). This allowed them to deconvolute bulk tumor gene expression signatures into developmental components (Fig. 1C), which they then fed into a Multilayer Perceptron classifier that outputs the tumor type (Fig. 1D) [4].
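The two steps described here, deconvolution followed by classification, can be sketched on toy data. This is not the researchers' D-MLP. The reference profiles, the synthetic "tumors", the least-squares deconvolution, and the network size are all assumptions made up for illustration, and the source does not say which deconvolution method or architecture details were actually used.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_components, n_types, n_per_type = 50, 4, 3, 40

# Hypothetical reference: one expression profile per developmental component.
reference = rng.normal(size=(n_components, n_genes))

# Synthetic tumors: each made-up tumor type is dominated by a different component.
type_mix = np.array([[0.7, 0.1, 0.1, 0.1],
                     [0.1, 0.7, 0.1, 0.1],
                     [0.1, 0.1, 0.7, 0.1]])
labels = np.repeat(np.arange(n_types), n_per_type)
weights = np.abs(type_mix[labels] + rng.normal(scale=0.05, size=(labels.size, n_components)))
bulk = weights @ reference + rng.normal(scale=0.01, size=(labels.size, n_genes))

# Step 1 (deconvolution, here plain least squares): find component scores
# such that bulk is approximately scores @ reference.
scores = np.linalg.lstsq(reference.T, bulk.T, rcond=None)[0].T

# Step 2: a tiny multilayer perceptron (one hidden ReLU layer, softmax output).
def forward(x, params):
    w1, b1, w2, b2 = params
    hidden = np.maximum(0.0, x @ w1 + b1)
    logits = hidden @ w2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    return hidden, probs / probs.sum(axis=1, keepdims=True)

def train(x, y, n_hidden=16, lr=0.5, epochs=500):
    params = [rng.normal(scale=0.5, size=(x.shape[1], n_hidden)), np.zeros(n_hidden),
              rng.normal(scale=0.5, size=(n_hidden, n_types)), np.zeros(n_types)]
    onehot = np.eye(n_types)[y]
    for _ in range(epochs):
        hidden, probs = forward(x, params)
        d_logits = (probs - onehot) / len(x)               # cross-entropy gradient
        d_hidden = (d_logits @ params[2].T) * (hidden > 0)
        grads = [x.T @ d_hidden, d_hidden.sum(axis=0),
                 hidden.T @ d_logits, d_logits.sum(axis=0)]
        params = [p - lr * g for p, g in zip(params, grads)]
    return params

params = train(scores, labels)
_, probs = forward(scores, params)
predicted = probs.argmax(axis=1)
print("training accuracy on synthetic data:", np.mean(predicted == labels))
```

The point of the design is that the classifier never sees the 50 raw "genes". It only sees the four component scores, which keeps the input small and tied to something interpretable.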

Results

After training, the D-MLP was applied to 52 new samples of cancers of unknown type. The model classified these samples into four categories and produced predictions that could guide how these cancers might be treated [2].

Researchers explicitly showed the correlation between TCGA and MOCA data sets.

Figure 2: Systematic Mapping of TCGA to Developmental Trajectories [4]

Specifically, the researchers plotted a matrix in which every entry contained MOCA cells belonging to a specific developmental sub-trajectory (in rows), colored by each cell’s similarity with different TCGA tumor/tissue types (in columns, i.e. primary, metastatic, or normal tissue) (Fig. 2A). Within each grid unit, MOCA single cells are plotted by their Uniform Manifold Approximation and Projection (UMAP) coordinates given by the MOCA dataset and colored by a single composite gene expression similarity signature for TCGA samples of each type. They then examined the similarity between sample types and sub-trajectories and found many expected relationships. For example, inhibitory neuronal trajectories showed similarity with low grade gliomas (LGG) but not hepatocellular tumors (LIHC), and vice versa for hepatocyte trajectories (Fig. 2B) [4].
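The idea of a trajectory-by-tumor-type similarity matrix can be illustrated with a small sketch. The numbers below are random, not TCGA or MOCA data, and the match between each tumor label and a trajectory is built in by construction rather than discovered. Pearson correlation is also an assumption, since the passage above does not say which similarity measure the study used.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes = 200
trajectories = ["inhibitory neurons", "hepatocytes", "other trajectory"]
tumor_types = ["LGG", "LIHC"]

# Synthetic mean expression profile for each developmental sub-trajectory.
trajectory_profiles = rng.normal(size=(len(trajectories), n_genes))

# Synthetic tumor signatures: each is a noisy copy of one trajectory profile
# (index 0 for "LGG", index 1 for "LIHC"), purely so the example has structure.
source = np.array([0, 1])
tumor_signatures = trajectory_profiles[source] + rng.normal(
    scale=0.3, size=(len(tumor_types), n_genes)
)

def similarity_matrix(rows, cols):
    """Pearson correlation between every row profile and every column profile."""
    full = np.corrcoef(rows, cols)          # square matrix over all profiles stacked
    return full[: len(rows), len(rows):]    # keep only the rows-vs-cols block

similarity = similarity_matrix(trajectory_profiles, tumor_signatures)
best_match = similarity.argmax(axis=0)      # most similar trajectory per tumor type

for j, tumor in enumerate(tumor_types):
    print(tumor, "->", trajectories[best_match[j]], np.round(similarity[:, j], 2))
```

Each column of the matrix answers the question "which developmental trajectory does this tumor type look most like?", which is the kind of pattern Fig. 2B reports for the real data.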

Moreover, the D-MLP model’s comparisons reveal important insights into gene expression profiles of specific tumor types. Some findings suggest that differences in developmental programs could someday be used to design personalized or targeted cancer treatments [2]. In future work, the researchers plan to increase the model’s predictive power by incorporating other types of data and integrating radiology, pathology, and gene expression information together [2]. This is also the next step in creating personalized medicine for cancer patients [2].

Conclusion

The takeaway from this is that, with the help of machine learning, cancer might no longer be such a problem in the future. With the D-MLP model, we are able to further classify different cancer cell types and find the origin of these cells, which could help to create personalized medicine to treat cancer. Moreover, building a machine learning model is a balancing act. Algorithms trained on datasets that exclude important data can be biased and will not yield accurate results [3]. Therefore, it is important to choose the input data wisely. Regardless, machine-learning models play an important and effective role in science, and hopefully someday we will find a way to treat cancer successfully.

References

[1] M. Wanner, “Why is cancer so difficult to cure?,” The Jackson Laboratory, 01-Dec-2015. [Online]. Available: https://www.jax.org/news-and-insights/2015/december/why-no-cure-for-cancer#:~:text=Cancer%20cells%2C%20although%20different%20in,invading%20pathogens%20%E2%80%94%20actually%20infiltrate%20tumors. [Accessed: 11-Sep-2022].

[2] B. Schroeder, “Using machine learning to identify undiagnosable cancers,” MIT News | Massachusetts Institute of Technology, 01-Sep-2022. [Online]. Available: https://news.mit.edu/2022/using-machine-learning-identify-undiagnosable-cancers-0901. [Accessed: 11-Sep-2022].

[3] E. Burns, “In-depth guide to machine learning in the Enterprise,” SearchEnterpriseAI, 05-Apr-2021. [Online]. Available: https://www.techtarget.com/searchenterpriseai/In-depth-guide-to-machine-learning-in-the-enterprise. [Accessed: 18-Sep-2022].

[4] E. Moiso, A. Farahani, H. Marble, A. Hendricks, S. Mildrum, S. Levine, J. Lennerz, and S. Garg, “Developmental deconvolution for classification of cancer origin,” medRxiv, 01-Jan-2021. [Online]. Available: https://www.medrxiv.org/content/10.1101/2021.11.15.21266314v1.full-text#F2. [Accessed: 22-Sep-2022].
