Why Graph Theory is Cooler Than you Thought

Graph Theory in Machine Learning, and How it’s Changed the Game

Sid Arcidiacono
Feb 26 · 7 min read

What are Graphs?

Talk to a scientist in just about any discipline, and ask them the question — based on their discipline — “how that stuff works”. You’ll likely find that there are systems and networks that you have to consider before you can really understand how any given thing works: whether that’s the human body, a food chain in an ecosystem, a chemical reaction, or a society as a whole. Without understanding the relationship between two animals in an ecosystem, two atoms in a molecule, or cells and tissues in our body, you just have a bunch of data: a list of cells, a readout of animals and what they eat, etc.

Traditional machine learning models often take data this way: they take lists or tables of data and do some stuff (the details of which depend on the algorithm being used as well as a few other parameters) to make predictions about a thing. But for some problems, there’s a better way.

Graphs are data structures that represent networks of or relationships between the data they contain. Typically, they’re represented as “nodes” and lines, or “edges”.

fig. 1: A graph representing a social network, with arrows (small) showing who is “following” who, as well as mutual relationships where they exist.

Figure 1, above is an example of a “directed graph”, or, a graph in which data has a one-way relationship with other data. This is demonstrated through arrows — Medium renders this photo in a rather small frame, so they’re slightly hard to see — that show who is “following” who, and which indicate mutual relationships where they exist.

The circles which represent data (in this case, an image and a username) are called nodes or vertices, and the lines which connect them are called edges. These lines represent relationships between the vertices, and can be represented (as they are here), as an “all or nothing relationship” (i.e: you’re following someone or you aren’t) or as a “weighted” relationship (i.e: a thicker line can represent higher engagement between two users, while a thinner line can represent a weaker relationship or lower engagement). We can see an example of a weighted graph below: Figure 2 represents differing levels of connectivity between varied regions in a brain.

fig. 2: An example of a weighted graph, showing neural connections. The thicker lines represent higher “connectivity” and the thinner lower “connectivity”. Source

At this point, it’s possible you’re feeling how I felt when I first was introduced to graphs and graph theory in a computer science class: bored and possibly slightly confused. The good news is, since we’ve covered some of the terminology that’s necessary to understand the good stuff, we can start to get into why graphs matter, and what makes them so cool.

So, what?

Graphs are already used for some pretty neat stuff in computer science: your Maps application, for example, is using graphs behind the scenes to store data about locations and the streets that connect them, and is using shortest-distance algorithms to find you the shortest route to your destination.

But it gets even better when we start to look at using graphs for machine learning. Because graphs demonstrate comprehensive relationships between pieces of data (as compared to ordered lists of data, or tensors which tell us little about the relationships between data points or features by themselves), we can use them to perform in-depth analysis and make predictions based on these relationships.

Graph Theory & Machine Learning — But How?

Before we get to reap the benefits of combining these graphs or networks we keep talking about with machine learning, we have to somehow represent our graph in a way that a computer — and then a machine learning algorithm — can understand.

Graphs can be represented traditionally in one of three basic ways:

  1. An Adjacency Matrix

Adjacency matrices do… kind of just what they sound like they’d do. They represent connections, or edges, between different nodes using a matrix. We can look at an example to illustrate what this might look like:

fig. 3: an adjacency matrix representing an undirected graph. In the matrix, 0 means for “no connection” between the intersecting nodes, and 1 means “connection” between the intersecting nodes. Source

Here, if we look at A, C, we can see that there is no direct connection, because there is a zero in that spot. However, if we look at E, C (or C, E because this is an undirected graph), we see a 1 which represents an edge between those two nodes.

2. An Edge List

An edge list is another way to represent our network — or graph — in a way that’s computationally understandable. Here, we represent pairs of connected nodes within a list. You can see an example below:

fig. 4: An edge list contains pairs of vertices or nodes which are connected to each other.

3. An Adjacency List

Adjacency lists combine the above two approaches, representing a list of nodes, connected to a list of all of the nodes they’re directly connected to. To illustrate, let’s look at an example:

fig. 5: an adjacency list representing an undirected graph. We can tell this graph is undirected, because connected nodes all contain each other in their “connections” list.

With the above three approaches, we’re able to tackle the difficulty of representing graphs computationally in our code. However, there are still some challenges when feeding graphs to machine learning models. Traditionally, deep learning models are good at handling data which takes up a fixed amount of space and is unidirectional. No matter how we represent them, graphs don’t take up a fixed amount of space in memory, and aren’t continuous, but rather, each node holds a reference to nodes it’s directly connected to.

There are some really fabulous ways of tackling these challenges, which I’ll dive deeper into in a later article. For now, I’ll leave you with a few resources to research on your own should you be interested, which are providing us with new ways to expand the problems machine learning is able to solve.

Graph Theory and Machine Learning — What Can we Do With It?

Nothing exists in a vacuum, and understanding the interconnected networks of data that make up many of our scientific disciplines provides the exciting potential to answer so many questions — more than I can begin to wrap into this article.

What if we could better understand the human brain? What if we could make predictions about the effect of some stimulus or change on an ecosystem? Or, predict which compound is the most likely to create an effective drug?

The best part of everything we’ve just learned is that we can — and it isn’t simply a theoretical possibility, but something we’re doing right now!

Graph theory is already being used for:

  1. Diagnostic modeling (predicting to a certain degree of certainty whether or not a patient has a specific diagnosis).
  2. Helping with the diagnoses and treatment of patients with cancer.
  3. Developing pharmaceuticals (medications).

4. Seeking to develop a theoretical synthesis between the theories of ecology and evolution.

How Graph Theory Makes it All Happen

Let’s dive a little deeper into these applications, so we can look at the utilization of graph theory within them in more detail.

Let’s use diagnostic models as an example. Specifically, I want to look at this example of network analysis being used for the diagnosis and identification of possible schizophrenia in patients:

Using graphs to represent network analyses of the brain, neuroscientists are able to map key findings related to the diagnosis of schizophrenia. Given that there are certain markers for the onset of schizophrenia:

  • less efficiently wired networks
  • less local clustering
  • less hierarchical organization
fig. 6: Representations of neurological network findings in schizophrenic patients. Source

We could potentially evaluate these networks with a machine learning algorithm and predict the probability a patient has or will develop schizophrenia based on these markers.

Without the knowledge of these networks, in this example, this kind of analyses becomes an entirely different neurological analysis of the patient. The promising discoveries of these findings for schizophrenia has promising implications for the diagnosis and treatment of this disorder — possible early diagnosis and intervention that goes far beyond simply evaluating symptoms.

This is just an example, but it’s entirely illustrative of the benefits of graph theory in machine learning as it intersects with other disciplines.

The fact of the matter is, there is often much more to our data than we can represent in lists, data frames, or tensors alone. While there are ways to explore our data and present it in such a way that we can hypothesize relationships and even enable our algorithms to predict these, when we’re able to represent the connections between our data in a different way, we’re able to do more with the data that we have.

When we understand the ways in which things relate to one another, we understand them better: we can make more comprehensive predictions, and answer harder questions, with some pretty life changing results.

Resources:

  1. Everything You Need to Know About Graph Theory and Machine Learning
  2. Graph Theory & Machine Learning in Neuroscience
  3. Knowing Your Neighbours — Machine Learning on Graphs
  4. Network-based machine learning and graph theory algorithms for precision oncology

MLearning.ai

Data Scientists must think like an artist when finding a solution

Sign up for AI & ART

By MLearning.ai

A weekly collection of the best news and resources on AI & ART Take a look.

By signing up, you will create a Medium account if you don’t already have one. Review our Privacy Policy for more information about our privacy practices.

Check your inbox
Medium sent you an email at to complete your subscription.

MLearning.ai

Data Scientists must think like an artist when finding a solution, when creating a piece of code.Artists enjoy working on interesting problems, even if there is no obvious answer.

Sid Arcidiacono

Written by

Back End Developer & Machine Learning Engineer dedicated to creating innovative solutions. Currently attending Make School San Francisco class of 2022.

MLearning.ai

Data Scientists must think like an artist when finding a solution, when creating a piece of code.Artists enjoy working on interesting problems, even if there is no obvious answer.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store