# Why Graph Theory is Cooler Than You Thought

## Graph Theory in Machine Learning, and How it’s Changed the Game

## What are Graphs?

Talk to a scientist in just about any discipline and ask them how the stuff they study *actually works*. You’ll likely find that there are systems and networks you have to consider before you can *really* understand how any given thing works: whether that’s the human body, a food chain in an ecosystem, a chemical reaction, or a society as a whole. Without understanding the relationship between two animals in an ecosystem, two atoms in a molecule, or the cells and tissues in our bodies, you just have a bunch of data: a list of cells, a readout of animals and what they eat, and so on.

Traditional machine learning models often take data this way: they take lists or tables of data and do some *stuff* (the details of which depend on the algorithm being used as well as a few other parameters) to make predictions about a thing. **But for some problems, there’s a better way.**

Graphs are data structures that represent networks: the relationships between the data they contain. Typically, they’re drawn as circles, or “nodes”, connected by lines, or “edges”.

Figure 1, above, is an example of a “directed graph”: a graph in which data has a one-way relationship with other data. This is shown through arrows (Medium renders this photo in a rather small frame, so they’re slightly hard to see) that indicate who is “following” whom, and which mark mutual relationships where they exist.

The circles which represent data (in this case, an *image* and a *username*) are called **nodes** or **vertices**, and the lines which connect them are called **edges**. These lines represent relationships between the vertices, and can be represented (as they are here) as an “all or nothing” relationship (i.e., you’re following someone or you aren’t) or as a “weighted” relationship (i.e., a thicker line can represent higher engagement between two users, while a thinner line can represent a weaker relationship or lower engagement). We can see an example of a weighted graph below: Figure 2 represents differing levels of connectivity between varied regions in a brain.
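To make this concrete, here’s a minimal sketch of a directed, weighted “follower” graph in Python, stored as a dictionary mapping each user to the users they follow along with an engagement weight. The usernames and weights are made up for illustration:

```python
# A directed, weighted graph: each user maps to {followed_user: engagement}.
# All names and weights here are invented for the example.
follows = {
    "ada": {"grace": 0.9, "alan": 0.4},  # ada follows grace and alan
    "grace": {"ada": 0.7},               # grace follows ada back -> mutual edge
    "alan": {},                          # alan follows no one
}

def is_mutual(graph, a, b):
    """Two users are mutual if each follows the other."""
    return b in graph.get(a, {}) and a in graph.get(b, {})

print(is_mutual(follows, "ada", "grace"))  # True
print(is_mutual(follows, "ada", "alan"))   # False
```

The weights play the role of the thicker and thinner lines in Figure 1: a higher number stands in for stronger engagement between two users.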

At this point, it’s possible you’re feeling how I felt when I first was introduced to graphs and graph theory in a computer science class: bored and possibly slightly confused. The good news is, since we’ve covered some of the terminology that’s necessary to understand the good stuff, we can start to get into **why graphs matter, and what makes them so cool.**

## So, what?

Graphs are already used for some pretty neat stuff in computer science: your Maps application, for example, is using graphs behind the scenes to store data about locations and the streets that connect them, and is using shortest-distance algorithms to find you the *shortest route to your destination.*
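A shortest-route query like the one your Maps app runs can be sketched with Dijkstra’s algorithm over a weighted graph. The “street map” below is a toy example with made-up distances, not any real routing API:

```python
import heapq

def shortest_distance(graph, start, goal):
    """Dijkstra's algorithm over a dict-of-dicts weighted graph.
    Returns the length of the shortest path, or None if unreachable."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return None

# Toy street map: intersections A-D, edge weights as distances (invented).
streets = {
    "A": {"B": 2, "C": 5},
    "B": {"C": 1, "D": 4},
    "C": {"D": 1},
    "D": {},
}
print(shortest_distance(streets, "A", "D"))  # 4  (A -> B -> C -> D)
```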

But it gets even better when we start to look at using graphs for machine learning. Because graphs demonstrate comprehensive relationships between pieces of data (as compared to ordered lists of data, or tensors which tell us little about the relationships between data points or features by themselves), we can use them to perform in-depth analysis and make predictions based on these relationships.

## Graph Theory & Machine Learning — But How?

Before we get to reap the benefits of combining these graphs or networks we keep talking about with machine learning, we have to somehow represent our graph in a way that a computer — and then a machine learning algorithm — can understand.

Graphs can be represented traditionally in one of three basic ways:

**1. An Adjacency Matrix**

Adjacency matrices do… kind of just what they sound like they’d do. They represent connections, or *edges*, between different nodes using a matrix. We can look at an example to illustrate what this might look like:

Here, if we look at A, C, we can see that there is no direct connection, because there is a zero in that spot. However, if we look at E, C (or C, E because this is an undirected graph), we see a 1 which represents an edge between those two nodes.
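Since the original figure isn’t reproduced here, here’s an illustrative adjacency matrix over nodes A–E that keeps the two lookups from the text: no A–C edge, and a C–E edge. The rest of the edge set is an assumption for the example:

```python
# Undirected graph over A-E as an adjacency matrix.
# matrix[i][j] == 1 means an edge exists between node i and node j.
nodes = ["A", "B", "C", "D", "E"]
idx = {n: i for i, n in enumerate(nodes)}

matrix = [
    # A  B  C  D  E
    [0, 1, 0, 1, 0],  # A
    [1, 0, 1, 0, 0],  # B
    [0, 1, 0, 0, 1],  # C
    [1, 0, 0, 0, 1],  # D
    [0, 0, 1, 1, 0],  # E
]

def connected(a, b):
    return matrix[idx[a]][idx[b]] == 1

print(connected("A", "C"))  # False: a 0 in that spot
print(connected("E", "C"))  # True: same as C, E, since the graph is undirected
```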

**2. An Edge List**

An edge list is another way to represent our network — or graph — in a way that’s computationally understandable. Here, we represent pairs of connected nodes within a list. You can see an example below:
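For instance, an illustrative undirected graph over nodes A–E (the edge set here is assumed, since the original figure isn’t reproduced) can be written as a list of connected pairs:

```python
# Each tuple is one undirected edge between two nodes.
edges = [("A", "B"), ("A", "D"), ("B", "C"), ("C", "E"), ("D", "E")]

def connected(a, b):
    # Undirected: check the pair in both orders.
    return (a, b) in edges or (b, a) in edges

print(connected("E", "C"))  # True
print(connected("A", "C"))  # False
```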

**3. An Adjacency List**

Adjacency lists combine the above two approaches, representing a list of nodes, connected to a list of all of the nodes they’re directly connected to. To illustrate, let’s look at an example:
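The same idea in Python is a dictionary mapping each node to the list of its direct neighbors (again, the edge set is illustrative):

```python
# Adjacency list: every node maps to the nodes it directly touches.
adjacency = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B", "E"],
    "D": ["A", "E"],
    "E": ["C", "D"],
}

print(adjacency["C"])  # all of C's direct neighbors: ['B', 'E']
```

One nice property of this representation: finding all of a node’s neighbors is a single lookup, rather than a scan of a whole matrix row or edge list.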

With the above three approaches, we’re able to tackle the difficulty of representing graphs computationally in our code. However, there are still some challenges when feeding graphs to machine learning models. Traditionally, deep learning models are good at handling fixed-size, grid-like input: think of the pixels of an image or the columns of a table. No matter how we represent them, graphs can vary in size and aren’t laid out contiguously; rather, each node holds references to the nodes it’s directly connected to.

There are some really fabulous ways of tackling these challenges, which I’ll dive deeper into in a later article. For now, should you be interested, I’ll leave you with a few resources to research on your own; they’re providing us with new ways to expand the set of problems machine learning is able to solve.

- Learn about Deep Learning on Graphs with Graph Convolutional Networks
- Learn about DeepWalk — a project finding another way to represent graphs for machine learning

## Graph Theory and Machine Learning — What Can we Do With It?

Nothing exists in a vacuum, and understanding the interconnected networks of data that make up many of our scientific disciplines provides the exciting potential to answer so many questions — more than I can begin to wrap into this article.

What if we could better understand the human brain? What if we could make predictions about the effect of some stimulus or change on an ecosystem? Or, predict which compound is the most likely to create an effective drug?

The **best part of everything we’ve just learned** is that we can — and it isn’t simply a theoretical possibility, but something we’re doing right now!

Graph theory is already being used for:

- Diagnostic modeling (predicting to a certain degree of certainty whether or not a patient has a specific diagnosis).
- Helping with the diagnoses and treatment of patients with cancer.
- Developing pharmaceuticals (medications).
- Seeking to develop a theoretical synthesis between the theories of ecology and evolution.

## How Graph Theory Makes it All Happen

Let’s dive a little deeper into these applications, so we can look at the utilization of graph theory within them in more detail.

Let’s use diagnostic models as an example. Specifically, I want to look at this example of network analysis being used for the diagnosis and identification of possible schizophrenia in patients:

Using graphs to represent network analyses of the brain, neuroscientists are able to map key findings related to the diagnosis of schizophrenia. Given that there are certain markers for the onset of schizophrenia:

- less efficiently wired networks
- less local clustering
- less hierarchical organization

We could potentially evaluate these networks with a machine learning algorithm and predict the probability a patient has or will develop schizophrenia based on these markers.
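The markers above correspond to measurable, graph-level numbers. As a sketch (using a tiny invented network, not real brain data), here’s how one of them, local clustering, can be computed by hand and averaged into a feature a classifier could consume:

```python
# Local clustering coefficient: for each node, the fraction of its
# neighbor pairs that are themselves connected to each other.
def clustering(adj, node):
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # clustering is undefined/zero with fewer than 2 neighbors
    links = sum(
        1
        for i, u in enumerate(nbrs)
        for v in nbrs[i + 1:]
        if v in adj[u]
    )
    return links / (k * (k - 1) / 2)

# Toy undirected network (NOT real connectivity data): a triangle x-y-z
# with a dangling node w attached to x.
adj = {
    "x": ["y", "z", "w"],
    "y": ["x", "z"],
    "z": ["x", "y"],
    "w": ["x"],
}

avg_clustering = sum(clustering(adj, n) for n in adj) / len(adj)
print(round(avg_clustering, 3))  # 0.583
```

A graph-level feature vector like `[average clustering, network efficiency, ...]`, computed per patient, is the kind of input such a predictive model could take.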

Without the knowledge of these networks, this kind of analysis becomes an entirely different neurological evaluation of the patient. These findings have promising implications for the diagnosis and treatment of the disorder: possible early diagnosis and intervention that goes far beyond simply evaluating symptoms.

This is just an example, but it’s entirely illustrative of the benefits of graph theory in machine learning as it intersects with other disciplines.

The fact of the matter is, there is often much more to our data than we can represent in lists, data frames, or tensors alone. While there are ways to explore our data and present it in such a way that we can hypothesize relationships and even enable our algorithms to predict these, when we’re able to represent the connections between our data in a different way, we’re able to *do more* with the data that we have.

When we understand the ways in which *things* relate to one another, we understand them better: we can make more comprehensive predictions, and answer harder questions, with some pretty life-changing results.

**Resources:**