How Machine Learning Might Be Able to Find a Cure for Cancer

An Huynh
Mar 12, 2019

Hi, my name is An and I’m currently enrolled in the Impact of AI on Society course at Wentworth Institute of Technology. Our main reading is The Master Algorithm by Pedro Domingos, and in the book he offers an interesting suggestion for how machine learning can help find a cure for cancer.

Link to buy the book: https://www.amazon.com/Master-Algorithm-Ultimate-Learning-Machine-ebook/dp/B012271YB2

To understand how machine learning might be able to find a cure for cancer, we need to understand how a machine “learns”. Most of Pedro’s book is about the various types of machine learning algorithms, with which computers figure out solutions to problems on their own by making inferences from data.

Pedro categorizes the different ways machines learn into five “tribes”:

  1. Symbolists: View learning as the inverse of deduction and take ideas from philosophy, psychology, and logic
  2. Connectionists: Reverse engineer the brain and are inspired by neuroscience and physics
  3. Evolutionaries: Simulate evolution on the computer and draw on genetics and evolutionary biology
  4. Bayesians: Believe that learning is a form of probabilistic inference and have their roots in statistics
  5. Analogizers: Learn by extrapolating from similarity judgments and are influenced by psychology and mathematical optimization

The main tribe we will focus on is the Symbolists.

As noted in chapter 3 of The Master Algorithm, one of the biggest issues in machine learning right now is figuring out what to do in cases that neither we nor the machine have seen before. For example, if two patients have the same symptoms, we can assume their diagnosis is the same. However, if a patient’s symptoms don’t match anyone else’s, we wouldn’t know what to do right away. And no matter how much data you have, the chance that a new case (one the machine needs to make a decision about) is already in the data set is very small. Machine learning therefore has an unavoidable element of gambling. As Pedro states on page 65:

“Just like evolution, machine learning doesn’t get it right every time; in fact, errors are the rule, not the exception”

But this isn’t the worst thing in the world: the machine can discard the errors and build on what it gets right.

Think of it like training a dog

One technique suggested by Pedro to accomplish this is to start by assuming that all matches are good. Then, try excluding all matches that don’t have some attribute. The machine repeats this for every attribute and chooses the one that excludes the most bad matches and the fewest good ones. Another technique is to learn a set of rules one rule at a time. After the computer learns each rule, it can discard the positive examples (the proper examples of a concept) that the rule accounts for. The next rule then tries to account for as many of the remaining positive examples as possible, and so on.

One of the funniest examples of learned rules was how Walmart sold beer next to diapers back in the ’90s. One of the early findings from retail analytics was that if a customer bought diapers, they would also likely buy beer. The interpretation behind this is that the mom sends the dad to the store to buy diapers, and as “compensation”, the dad buys a case of beer to go with them.
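The two rule-learning techniques described above can be sketched together in a few lines of Python. This is only a minimal illustration under my own assumptions, not code from the book: the toy data set, the attribute names, and the function names (`grow_rule`, `learn_rules`, `predict`) are all made up for this post. `grow_rule` starts with a rule that matches everything and keeps adding the attribute test that excludes the most bad matches and the fewest good ones. `learn_rules` then learns one rule at a time and discards the positive examples each rule accounts for.

```python
from typing import Dict, List

Example = Dict[str, str]
Rule = Dict[str, str]  # attribute -> required value; an empty rule matches everything


def matches(rule: Rule, example: Example) -> bool:
    """A rule matches when every attribute test in it holds for the example."""
    return all(example.get(attr) == value for attr, value in rule.items())


def grow_rule(positives: List[Example], negatives: List[Example]) -> Rule:
    """Start by assuming all matches are good, then greedily add attribute tests."""
    rule: Rule = {}
    pos, neg = list(positives), list(negatives)
    while neg:  # stop once no bad example matches the rule
        best, best_score = None, 0
        candidates = sorted(
            {(a, v) for ex in pos for a, v in ex.items() if a not in rule}
        )
        for attr, value in candidates:
            excluded_bad = sum(1 for ex in neg if ex.get(attr) != value)
            excluded_good = sum(1 for ex in pos if ex.get(attr) != value)
            score = excluded_bad - excluded_good
            if score > best_score:
                best, best_score = (attr, value), score
        if best is None:  # no test improves the rule any further
            break
        rule[best[0]] = best[1]
        pos = [ex for ex in pos if matches(rule, ex)]
        neg = [ex for ex in neg if matches(rule, ex)]
    return rule


def learn_rules(positives: List[Example], negatives: List[Example]) -> List[Rule]:
    """Learn rules one at a time, discarding the positives each rule covers."""
    rules: List[Rule] = []
    remaining = list(positives)
    while remaining:
        rule = grow_rule(remaining, negatives)
        covered = [ex for ex in remaining if matches(rule, ex)]
        if not covered or (not rule and negatives):
            break  # nothing useful was learned, so stop
        rules.append(rule)
        remaining = [ex for ex in remaining if not matches(rule, ex)]
    return rules


def predict(rules: List[Rule], example: Example) -> bool:
    """An example is classified as positive if any learned rule matches it."""
    return any(matches(rule, example) for rule in rules)


# Hypothetical toy data: was going out a good idea or not?
POSITIVES = [
    {"day": "weekend", "weather": "warm", "tv": "good"},
    {"day": "weekend", "weather": "cold", "tv": "bad"},
    {"day": "weekday", "weather": "warm", "tv": "bad"},
]
NEGATIVES = [
    {"day": "weekday", "weather": "cold", "tv": "good"},
    {"day": "weekday", "weather": "cold", "tv": "bad"},
]

rules = learn_rules(POSITIVES, NEGATIVES)
print(rules)  # [{'day': 'weekend'}, {'weather': 'warm'}]
```

On this toy data the first rule covers the two weekend examples, and the second rule picks up the remaining positive one. The scoring used here (bad matches excluded minus good matches excluded) is just one simple choice; real rule learners use more careful measures.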

The problem with this divide and conquer approach is that the machine could end up finding meaningless rules. For example, the machine can derive a rule set that covers only the exact positive examples it has seen and nothing else. Such a rule set will end up classifying every new example as negative. Another flaw is overfitting, or overgeneralizing based on too little data (similar to assuming all Latinos are maids based on a couple of maids that you’ve seen). The machine should be able to take knowledge provided by humans or learned in previous runs and use it to guide new generalizations from data. Granted, the above technique can’t do that, but Pedro suggests another way to learn rules that can.
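To see why a rule set that only covers the exact examples it has seen is meaningless, here is a small sketch. It assumes a made-up toy data set and made-up function names (`memorize`, `predict`), none of which come from the book. Each “rule” is simply a copy of one positive example, so it tests every attribute at once.

```python
from typing import Dict, List

Example = Dict[str, str]


def memorize(positives: List[Example]) -> List[Example]:
    """Build one overly specific rule per positive example (pure memorization)."""
    return [dict(ex) for ex in positives]


def predict(rules: List[Example], example: Example) -> bool:
    """Positive only if some rule matches the example on every attribute."""
    return any(
        all(example.get(attr) == value for attr, value in rule.items())
        for rule in rules
    )


# Hypothetical toy data, invented for this illustration.
SEEN_POSITIVES = [
    {"day": "weekend", "weather": "warm", "tv": "good"},
    {"day": "weekend", "weather": "cold", "tv": "bad"},
]

rules = memorize(SEEN_POSITIVES)

# Perfect on the examples it has already seen...
seen_predictions = [predict(rules, ex) for ex in SEEN_POSITIVES]

# ...but any example that differs in even one attribute is labeled negative.
unseen = {"day": "weekend", "weather": "warm", "tv": "bad"}
unseen_prediction = predict(rules, unseen)

print(seen_predictions, unseen_prediction)  # [True, True] False
```

The memorized rules score perfectly on the training examples and still say nothing useful about a new weekend case, which is exactly the failure described above.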

Remember how Symbolists view learning as the inverse of deduction (that is, induction)?

Deductive Reasoning: The process of reasoning from one or more statements (premises) to reach a logically certain conclusion.

Inductive Reasoning: The process of reasoning where the premises are viewed as evidence towards the truth of the conclusion.
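The difference between the two can be made concrete with a tiny sketch. The facts below are toy data I made up (in the spirit of the classic “Socrates is mortal” example), and the function names `induce_rules` and `deduce` are hypothetical. Induction goes from specific observations to general rules, and deduction applies those rules back to a specific case.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy facts, invented for this illustration.
CATEGORY: Dict[str, str] = {
    "Socrates": "human",
    "Plato": "human",
    "Aristotle": "human",
    "Rex": "dog",
}
OBSERVED: Dict[str, Set[str]] = {
    "Socrates": {"mortal"},
    "Plato": {"mortal"},
    "Rex": {"mortal", "barks"},
}


def induce_rules(
    category: Dict[str, str], observed: Dict[str, Set[str]]
) -> Set[Tuple[str, str]]:
    """Induction: propose 'every <category> is <property>' when the property
    holds for every observed member of that category."""
    by_category: Dict[str, List[Set[str]]] = {}
    for entity, props in observed.items():
        by_category.setdefault(category[entity], []).append(props)
    return {
        (cat, prop)
        for cat, prop_sets in by_category.items()
        for prop in set.intersection(*prop_sets)
    }


def deduce(
    entity: str, category: Dict[str, str], rules: Set[Tuple[str, str]]
) -> Set[str]:
    """Deduction: apply the general rules to one specific entity."""
    return {prop for cat, prop in rules if category[entity] == cat}


rules = induce_rules(CATEGORY, OBSERVED)
# We never observed anything about Aristotle, but the induced rule covers him.
aristotle = deduce("Aristotle", CATEGORY, rules)
print(aristotle)  # {'mortal'}
```

Note that the sketch also induces “every dog barks” from a single dog. The conclusion of an induction is only supported by the evidence, never guaranteed by it.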

Rule induction involves extracting rules from a set of observations. One example Pedro gives is using induction to predict whether new drugs will have harmful effects. By generalizing from known toxic molecular structures, machines can form rules that quickly weed out many apparently promising compounds, which can speed up the process by which new drugs are developed. This can lead to a lot of victories in biology, as Pedro states:

“More generally inverse deduction is a great way to discover new knowledge in biology, and doing that is the first step in curing cancer”

In terms of figuring out a cure for cancer, we need to learn how to stop the bad cells from reproducing without harming the good ones (unlike chemotherapy, which affects all cells indiscriminately).

The key to this is sequencing the cancer cell’s genome, which we can use to predict which drugs will work against which cancer genome. Pedro suggests we can use this alongside induction to figure out the cure for each individual patient’s cancer.

  1. Gather a database of patients, their cancer’s genomes, drugs already tried, and outcomes.
  2. Use machine learning to induce rules with complex conditions involving the cancer’s genome, the patient’s genome, and the patient’s medical history. (Note that most cancers involve a combination of mutations, and some may only be curable by drugs that haven’t been discovered yet.)
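The two steps above can be sketched very roughly as follows. Everything here is a hypothetical illustration: the records, the mutation names (`mut_A`, `mut_B`), the drug names (`drug_X`, `drug_Y`), and the functions are invented, and real data would be far larger and messier. For simplicity, each rule has a single condition (“tumors with this mutation responded to this drug”), while the rules Pedro has in mind would combine many conditions.

```python
from collections import Counter
from typing import FrozenSet, List, NamedTuple, Set, Tuple


class Record(NamedTuple):
    patient_id: str
    mutations: FrozenSet[str]  # mutations found in the cancer's genome
    drug: str                  # drug that was tried
    responded: bool            # outcome


# Step 1: a (completely made-up) database of patients, drugs, and outcomes.
RECORDS: List[Record] = [
    Record("p1", frozenset({"mut_A", "mut_B"}), "drug_X", True),
    Record("p2", frozenset({"mut_A"}), "drug_X", True),
    Record("p3", frozenset({"mut_B"}), "drug_X", False),
    Record("p4", frozenset({"mut_B"}), "drug_Y", True),
    Record("p5", frozenset({"mut_A", "mut_B"}), "drug_Y", True),
    Record("p6", frozenset({"mut_A"}), "drug_Y", False),
]


def induce_drug_rules(
    records: List[Record], min_support: int = 2
) -> List[Tuple[str, str]]:
    """Step 2: keep (mutation, drug) rules that worked in every recorded case
    and were tried at least `min_support` times."""
    tried: Counter = Counter()
    worked: Counter = Counter()
    for rec in records:
        for mutation in rec.mutations:
            tried[(mutation, rec.drug)] += 1
            if rec.responded:
                worked[(mutation, rec.drug)] += 1
    return sorted(
        key for key, n in tried.items() if n >= min_support and worked[key] == n
    )


def suggest_drugs(mutations: Set[str], rules: List[Tuple[str, str]]) -> List[str]:
    """Apply the induced rules to a new patient's cancer genome."""
    return sorted({drug for mutation, drug in rules if mutation in mutations})


rules = induce_drug_rules(RECORDS)
print(rules)                            # [('mut_A', 'drug_X'), ('mut_B', 'drug_Y')]
print(suggest_drugs({"mut_A"}, rules))  # ['drug_X']
```

The `min_support` threshold is a crude guard against rules built on a single lucky case. It is my own assumption, not something taken from the book.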

Ultimately, what we would need is a model of how the cell works, which would enable us to simulate on the computer the effects of a specific patient’s mutations, as well as the effects of different combinations of drugs (existing or speculative).

Some of the tools from IBM Cloud that can be used to model this possible solution are:

  • IBM Watson Studio Cloud and Watson Knowledge Catalog (to gather and query data)
  • IBM Watson Visual Recognition (to classify images)
  • IBM Watson Machine Learning (to implement the algorithm and train it)

Whether or not this is a true solution to finding the cure, there’s only one way to find out.