Graph Convolutional Neural Networks- A Talk at IISc

Sukriti Paul
ACM-W Manipal
5 min read · Feb 21, 2018

“What if we could input a computer network as a graph, directly, and use neural networks to predict, say, the top few computers that are prone to attacks by hackers?”
- Dr. Tijmen Tieleman

Being an intern at IISc has its perks, one of them being the talks given by researchers in trending technological domains. I was fortunate to attend a recent talk organised by the IEEE Signal Processing Society, Bangalore Chapter, and the Electrical Engineering department at IISc. Dr. Tijmen Tieleman, current CTO of mind.ai and a graduate of Geoff Hinton’s lab (University of Toronto), introduced us to the concept of graph convolutional neural networks within an hour! I’ve tried to pen down the remnants of the session from my perspective. This one’s for the AI & deep learning enthusiasts.

We are accustomed to focusing on grid-based data in the Euclidean domain, with well-defined neighbours. For example, in a traditional CNN we know the neighbourhood of an image pixel, especially while applying filters in the convolutional or pooling layers (I’ll elaborate on this a little later). Imagine a scenario where we wish to input data in the form of a graph, with all the neighbourhood connections maintained. Many scenarios require data to be analysed in the non-Euclidean space, as graphs: computer networks, social media networks, studies pertaining to genetics and chemical bonds all involve data in the form of graphs.
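As a concrete illustration (my own sketch, not from the talk), here is how such graph data might be represented in code: an adjacency matrix for a toy 4-node network, and the irregular neighbourhoods that make grid-style filtering awkward.

```python
import numpy as np

# A toy undirected graph of 4 nodes (e.g. computers in a network):
# 0 -- 1, 0 -- 2, 1 -- 2, 2 -- 3
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency matrix: A[i, j] = 1 if nodes i and j are connected.
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1  # undirected graph, so the matrix is symmetric

# Unlike an image pixel, a node's neighbourhood has no fixed size or order:
neighbours = {i: np.flatnonzero(A[i]).tolist() for i in range(n)}
print(neighbours)  # node 2 has three neighbours, node 3 has only one
```

Note how node 2 has three neighbours while node 3 has one; a fixed 3x3 filter has no natural way to slide over such a structure.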

About the talk

Graph CNNs versus Grid CNNs

“Graph CNNs fall under bleeding edge research, when it comes to deep learning.”
- Dr. Tijmen Tieleman

Traditional grid CNNs have two characteristics which pose challenges for graph CNNs. It’s difficult to assign weights to nodes when their relative locations are not in a grid format: how does one know which node to assign w1, w2, or w3 to, without a 3x3 or NxN spatial structure? The following are the challenges faced while designing a graph CNN.

  • No defined neighbourhood (What are the neighbours?)
  • The weight vectors pertaining to neighbours are unknown (Which weight should be assigned to which neighbour?)
Traditional Grid CNN

Hence, researchers started looking at localisation techniques. There is an inherent tension between semantics and location in semantic segmentation: local information resolves where, while global information resolves what. Fully convolutional networks with 1x1 convolutions combine shallow appearance information with deep, coarse semantic information. Although using 1x1 filters may seem a little absurd, since neighbouring pixels are not considered during convolution, converting fully connected layers to convolutional layers this way proves effective for localised information. The method proposed by Jonathan Long et al. learns end-to-end. Addressing the two issues above, the following architecture was proposed:

  • Use the same weight w within a layer for every graph node (instead of w1, w2, w3…). This resolves the weight-assignment issue.
  • Each layer has the same architecture in terms of node connections. This resolves the neighbourhood issue.
Graph CNNs with the same weight
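To make the weight-sharing idea concrete, here is a minimal sketch of one graph-convolution layer (my own simplified version, in the spirit of Kipf & Welling’s propagation rule, not necessarily the exact method from the talk): every node aggregates its neighbours’ features and applies the same weight matrix W, with no per-neighbour w1, w2, w3.

```python
import numpy as np

def graph_conv(A, H, W):
    """One graph-convolution layer with a single shared weight matrix W.

    Each node averages its neighbours' features (plus its own, via a
    self-loop) and applies the same W. A simplified sketch; details such
    as the normalisation vary between formulations.
    """
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))     # normalise by node degree
    return np.maximum(D_inv @ A_hat @ H @ W, 0)  # ReLU activation

# 4 nodes, 3 input features per node, 2 output features per node
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.random.randn(4, 3)
W = np.random.randn(3, 2)  # the one weight matrix shared by all nodes
H_next = graph_conv(A, H, W)
print(H_next.shape)  # (4, 2): same nodes, new feature dimension
```

Because W is shared, the layer is indifferent to node ordering and neighbourhood size, which is exactly what the two bullet points above require.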

Using concepts from graph theory can be useful in several cases. Most current research focuses on the spatial domain; however, results in the spectral domain also prove efficient and comparable to state-of-the-art results. For instance, average pooling can be substituted by graph clustering!
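A toy illustration of pooling by clustering (my own sketch; the cluster assignments are hard-coded here, whereas a real pipeline would obtain them from a graph-clustering algorithm): instead of averaging a fixed 2x2 window, we average the features within each cluster of nodes, coarsening the graph.

```python
import numpy as np

H = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 8.0]])          # 4 nodes, 2 features each
clusters = np.array([0, 0, 1, 1])   # nodes {0,1} and {2,3} form two clusters

# Average-pool each cluster, as a 2x2 window would average pixels:
pooled = np.stack([H[clusters == c].mean(axis=0) for c in range(2)])
print(pooled)  # [[2. 3.], [6. 7.]] -- the graph is coarsened to 2 nodes
```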

A modification of this is to use the same weight for immediate neighbours and a different weight for the self-loop, as shown. Furthermore, a graph network within a graph network can store local information per node. This is inspired by the Network in Network architecture, where each node is another smaller network instead of simply a convolution followed by a non-linear activation! Following this, the audience had many questions; a plethora of them are open questions.

Modified Graph CNNs

Questions from the audience

1. Why can’t we use an adjacency matrix to represent a graph and then apply a standard 2D CNN to it?
Graph CNNs are designed to preserve connectedness across layers. Not only will the matrix be extremely sparse, but information about connectedness will also be lost over deeper layers. An adjacency-matrix image also doesn’t give a one-shot representation of a node, say Node A, connected to Node B and Node C simultaneously.
Flattening of nodes

2. Can graph CNN concepts be used for grid CNNs?
Yes. Essentially, grid representations can be viewed as a simple form of graph representation with known, structured neighbours. A good suggestion was that a graph representation can be viewed as a stacked or flattened grid representation, where each layer represents a central node with a weight of w at the spatial locations where other neighbouring or connected nodes are present (as shown).
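The “a grid is just a special graph” view can be made explicit. The sketch below (mine, not from the talk) builds the adjacency matrix of a 3x3 pixel grid with 4-connectivity, showing that each pixel simply has a known, fixed pattern of neighbours.

```python
import numpy as np

# Build the adjacency matrix of a 3x3 pixel grid (4-connectivity):
# pixel (r, c) is connected to its right and bottom neighbours.
h, w = 3, 3
n = h * w
A = np.zeros((n, n), dtype=int)
for r in range(h):
    for c in range(w):
        i = r * w + c               # flatten (r, c) to a node index
        if c + 1 < w:               # right neighbour
            A[i, i + 1] = A[i + 1, i] = 1
        if r + 1 < h:               # bottom neighbour
            A[i, i + w] = A[i + w, i] = 1

# Node degrees are fixed by position: corners 2, edges 3, centre 4.
print(A.sum(axis=1))
```

Because these neighbourhoods are regular and known in advance, the grid case can afford distinct weights w1…w9 per filter position, which is precisely what general graphs lack.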

3. What if two graph nodes are not directly connected?
ConvNets always follow a hierarchy when passing information, so in case two nodes are not directly connected, their information will be combined, with corresponding weights, over deeper layers.
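This stacking behaviour can be verified with adjacency-matrix powers (a standard graph-theory fact, sketched here by me): A^k has a non-zero entry wherever two nodes are joined by a k-step walk, so k propagation layers suffice to mix information between k-hop neighbours.

```python
import numpy as np

# A path graph 0 - 1 - 2 - 3: nodes 0 and 3 are three hops apart.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

print((A @ A)[0, 2] > 0)       # True: node 0 reaches node 2 in 2 steps
print((A @ A)[0, 3] > 0)       # False: node 3 is too far for 2 steps
print((A @ A @ A)[0, 3] > 0)   # True: a third layer closes the gap
```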

4. What if we have to consider weighted graphs?
Currently, researchers are working on techniques where the link weights are represented as nodes.

5. What if two graphs have identical relative degrees?
The corresponding weights will vary, because two things are taken into consideration: connectedness (which will be the same in this case) and individual node identity (which will vary, based on the information).

6. If we apply the same-weights concept to grid layers, will it be effective?
No. Having fewer weights reduces the number of parameters and hence the learning capacity. Graph CNNs use weight sharing as a novel approach out of necessity, but for grids, distinct weights prove to be the better approach.

Open-ended Questions

  • Can we extend this concept to RNNs?
  • Can we extend this concept to Directed Graphs?
  • Can we generalise this for both, Graph CNNs and Grid CNNs?
  • How can we represent link information? (For example, whether it is a single, double or triple bond between atoms).
  • How can we distribute distinct weights better?
  • How can GPUs and other hardware increase computational efficiency?

Note: This is a summary of the hour-long session, and none of the methodologies are proposed by me. Feel free to correct me or provide your suggestions in the comments section. Also, excuse my drawings :P
