# aMazeGNN: A Maze clustering GNN

*By Paul Falcone, Geena Kim, and Shan Lu as part of the Stanford CS224W Fall 2021 course project.*

**1. Introduction**

Visual navigation in robotics is an active research area where the goal is to use visual information to locate and navigate a robotic/autonomous agent. There have been an increasing number of approaches using deep learning for both visual information processing (localization) and navigation tasks, most of which use combinations of CNN, MLP, and RNN-family neural network architectures. Since the search space on state and action can be very large, a naive reinforcement learning approach does not always work well when navigating in a complex environment. Taking an abstraction of the environment and then forming an abstraction graph, one can search for a path in the abstraction graph which can simplify navigating in a complex environment. Our project is about clustering on maze like grids to coarsen the graph and see if our algorithm can group similar room-like structures together. This can be useful for robotic navigation in a new environment, by analyzing an unlabeled floor plan, the algorithm would be able to divide the layout into rooms, hallways, corridors, etc, which will then make it easier to perform further downstream tasks such as navigation. For the first step, we use a Graph Neural Network (GNN) to perform a k-means clustering on a maze graph. We show that a GNN can perform a k-means clustering on a 2D grid and suggest ideas to improve clustering on grids.

**2. Data**

We created a maze image and then applied pooling by 10 pixels by 10 pixels to create a grid. Points in the grid are the nodes and edges are formed between the four (maximum) nodes up, down, left, and right of a given node. A pytorch dataset saves the raw grid (with the maximum node degree = 4) with coordinates as node features and action tuples as edge features. The resulting graph has about 61 k nodes and 120 k edges. When the dataset is created, about 43 k nodes are randomly assigned to the training set and about 18 k nodes are randomly assigned to the test set.

**3. Methods**

For the clustering task, we used a 2-layer GraphSage architecture and added Euclidean loss and regularization terms. Let us first review how the GraphSAGE algorithm works. GraphSAGE [1] is a graph neural network that takes as an input a graph with feature vectors associated to each node. The algorithm is inductive, meaning that it can be applied to more general unseen data (that is similarly structured), as opposed to transductive frameworks that only apply to a fixed graph. The GraphSAGE algorithm uses the general graph neural network architecture. There are three primary steps: message passing, message aggregation, and updating. Initially, each node has their feature vector as their message that they propagate to their neighbors. The aggregation step happens for each node; using a permutation invariant aggregation function (e.g. sum, mean, or max), we take all of a node’s neighbor’s messages and aggregate them. We then concatenate the aggregated message with the message from the node itself, apply a linear layer to it, and then apply a nonlinear activation function. The final step is to normalize the output and update the message function for the given node. This structure allows information from each of the nodes to propagate outwards to be shared across the graph. See figures 2 and 3 below for a diagram and a code snippet of the algorithm respectively.

We used the following k-means loss function:

The first two terms are the mean squared distance from the nodes within the cluster *k* to the cluster mean *Xₘ⁽ᵏ⁾* and the mean squared distance from the nodes within other clusters *Cⱼ*(*j ≠ k*). The first term incentivizes the nodes within the cluster to stay close together and the second term incentivizes nodes from other clusters *Cⱼ *to stay far away from the cluster *Cₖ*. The cluster centroid is calculated below; it is obtained by averaging the cluster assignment probabilities, and thus is initially not an accurate centroid, but it converges to the real centroid over time as assignment probabilities become more confident. The soft centroid of the *kᵗʰ *cluster *Cₖ* is calculated by:

To get the cluster assignment probabilities *P⁽ᵏ⁾*, we apply the GNN to the graph and its features: GNN(*A, X*) → *P⁽ᵏ⁾*.

The ℓₛₑₚ is the separation loss, which penalizes nodes from other clusters *Cⱼ, j* ≠ *k* being too close to the centroid of *kᵗʰ* cluster:

where *K* is the number of clusters, *Nⱼ* is the number of nodes in cluster *j*, and small ε is to prevent divergence. The next loss term is the probability assignment loss, which penalizes assignment probabilities of a node being uniformly distributed among the clusters:

The next loss term, ℓ*ₙᵤₘ *is loss that penalizes all nodes for being in a single cluster:

The last loss term is the inter loss and is given by the sum of the mean squared distances between nodes within each cluster. It is a naive measure of the size of a cluster, and it further forces nodes in a cluster to be located close together:

where *s* is a number of samples in the cluster *k*. For calculation efficiency, we randomly sampled 1000 nodes in each cluster. Our implementation of the different loss functions is given below:

**4. Results**

Using the k-means loss mentioned above and 2-layer GraphSAGE with Mean aggregator and four cluster assignments, we demonstrate the clustering performance of the model on the maze data. Fig. 4 shows the initial clustering assignment probability on train and test dataset, whereas Fig. 5 and 6 show the final assignment after 490 epochs of training. Comparing the figures 7–9 to the figures 4–6, one can see the different clustering behavior when with and without the ℓᵢₙₜₑᵣ. When we do not include the ℓᵢₙₜₑᵣ loss, the results are not satisfactory; for instance, in figure 8, cluster 0 on the test data is very irregular because there is not enough regularization on keeping nodes together in the same cluster. With the ℓᵢₙₜₑᵣ regularization term, the nodes in the cluster tend to group together and the prediction patterns on train and test data match.

**5. Discussion and Next Steps**

In this work, we demonstrated k-means clustering using GraphSAGE [1] model on the maze grid data. With some loss function engineering, the GNN model is able to demonstrate k-means algorithm on the grid data. We found it is more difficult to apply clustering using a GNN on grids than real-world graphs such as citation networks or social media where there are naturally occurring community structure. In maze grids, most nodes are uniformly connected (d = 4), and they do not have naturally occurring communities or clusters. We used Euclidean loss functions with a few regularization terms and demonstrated k-means clustering using a GNN. The results show that it works with some limitations:

(1) k-means clustering of nodes in a maze grid using Euclidean distance ignores walls and cluster the nodes that are on the other side of the wall.

(2) The current model (2-layer GraphSAGE) only covers 2-hop neighborhood per node when making decisions, thus it uses very little neighborhood information. A bigger field of view might be beneficial. For that, a deeper architecture may help, but also it may cause over-parametrization. To overcome it, we can use simpler layers such as lightGCN and increase the number of layers in the GNN. Another idea we propose is to augment the graph to have more clustered structure. If the graph has more clustered structures, it is easier to cluster using modularity information. In [2], they demonstrated clustering using modularity-based loss function on paper-citation networks and more.

Inspired by the approach in [2], we explored the idea of augmenting the raw grid to a more densely connected graph. We applied an edge augmentation to increase the number of edges between nodes that are nearby each other. To perform this edge augmentation, we applied a random walk from each node, and then added edges based on the number of times each neighboring node was visited. Adding these additional edges can help performance of the GNN to increase its field of view with small number of layers as well as it can allow us to use modularity-based loss for clustering. Figure 12 shows an example of 5-hop-random walk. Figures 10 and 11 show connections in the raw grid and random walk -augmented graph. Figure 13 shows a heat map of node degrees of the nodes in the augmented graphs. One can see that there are clustered structures. Both these clusters and the random walk edges can make the clustering algorithm more effective by modularity information and increased range for reaching neighborhood.

**6. Colab**

Our code can be found at the following Google Colab: https://colab.research.google.com/drive/1v-NJB2RWRwZTiBZEAsaRBPqPbEuFfOZF?usp=sharing&fbclid=IwAR3mCMtZO_t4dXC3reYJIil5I4g8pvbFPBNfPUgIix4HddvJ2t3GQkLBLPc.

**References**

[1] Hamilton, Will, Zhitao Ying, and Jure Leskovec. “Inductive representation learning on large graphs.” *Advances in neural information processing systems* 30 (2017).

[2] Tsitsulin, Anton, et al. “Graph clustering with graph neural networks.” *arXiv preprint arXiv:2006.16904* (2020).

[3] Kim, Geena, Paul Falcone, and Shan Lu. aMazeGNN: A Maze clustering GNN, (2021), GitHub repository, https://github.com/libphy/robonav-gnn.git