<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Kanu Grover on Medium]]></title>
        <description><![CDATA[Stories by Kanu Grover on Medium]]></description>
        <link>https://medium.com/@groverk?source=rss-e97aac5c84f8------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/0*Pr6_myt0v-kCF5Ns</url>
            <title>Stories by Kanu Grover on Medium</title>
            <link>https://medium.com/@groverk?source=rss-e97aac5c84f8------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Thu, 28 May 2026 17:02:16 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@groverk/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[CS 224W: Unveiling the Patterns in Random Graphs]]></title>
            <link>https://medium.com/@groverk/cs-224w-project-7b3d1fa35ebb?source=rss-e97aac5c84f8------2</link>
            <guid isPermaLink="false">https://medium.com/p/7b3d1fa35ebb</guid>
            <dc:creator><![CDATA[Kanu Grover]]></dc:creator>
            <pubDate>Wed, 06 Dec 2023 22:13:06 GMT</pubDate>
            <atom:updated>2023-12-20T18:49:44.439Z</atom:updated>
            <content:encoded><![CDATA[<blockquote>Authors: Shayan Talaei, Kamyar Rajabalifardi, Kanu Grover</blockquote><blockquote><a href="https://colab.research.google.com/drive/1eLm6EmoxykMz5eHEAE7l_mq2wi4qcpLM?usp=sharing&amp;fbclid=IwAR0clV-ldvrzpQu4D-OUnSlkxXVJGFJfnjw80gXpcaD3LSEQoB5jqyHkoHw#scrollTo=-d-GczmP1BXV">Colab Notebook</a></blockquote><h3>Introduction</h3><p>In this blog post, we discuss an application of graph machine learning techniques in random graph detection. Our primary focus is on developing a Graph Neural Network (GNN) that can accurately classify different types of random graphs. In particular, we are keen on understanding three families of random graphs —<em> </em><strong>Erdős–Rényi (ER), Barabasi-Albert (BA), and Stochastic Block Models (SBM)</strong><em>.</em> Each of these families exhibits unique characteristics, making them ideal candidates for our study.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*n3N72LzIyeM4awbWG17eDg.png" /></figure><p>A high-level outline of our project is as follows:</p><ul><li><strong>Overview of Random Graphs: </strong>In this section, we’ll describe the characteristics of the three random graph families by delving into the mathematics behind each one. Additionally, we will explain the PyG modules tailored for generating samples of graphs from each family. An important element in our discussion will be the exploration of various parameter choices, which play a crucial role in shaping the unique patterns that each family exhibits.</li><li><strong>Modeling: </strong>In this section, we will tackle a two-fold modeling problem. We will first embark on a classification task that aims to predict the family that some particular graph is sampled from. After exploring various architecture choices for this model, we will introduce a variation that performs regression to estimate the unknown parameters of that family. We’ll analyze some results from our work, describe some pitfalls, and propose additional questions for further study.</li></ul><p>Without further ado, let’s get into it!</p><h3>Overview of Random Graphs</h3><p>Random graphs exist everywhere in the world around us — in everything from epidemiology to social network analysis. Many real-world phenomena can be modeled using random graphs, and understanding the underlying structure of these networks can provide invaluable insights into their behavior. Listed below are a few examples of where random graphs are especially useful in modeling real-world events.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*Gis4KLtmg9xKeuEg" /></figure><ul><li><strong>Epidemiology</strong>: Random graphs are useful in understanding and modeling stochastic processes that happen over a network, as with disease transmission during epidemics. By simulating transmission over a network, researchers can determine the effectiveness of various treatments and interventions on a population.</li><li><strong>Social Networks: </strong>Various types of social networks can be modeled as random graphs. By modeling nodes as people and edges as connections/friendships, random graphs can give insight into how social networks form, multiply, and spread information.</li><li><strong>Neural Subgraph Matching: </strong>Distinguishing between the structured patterns and randomness in real-world graphs has helped researchers measure the presence of significant motifs in the area of <a href="https://arxiv.org/abs/2007.03092">neural subgraph matching</a>.</li></ul><p>Random graphs are instances of random graph families, which are broadly defined in the following sections. Each family is uniquely parameterized by a different distribution — this allows us to generate a rich variety of random graphs that closely resemble real-world networks.</p><p>In the following sections, we’ll describe properties of the three families of random graphs — Erdős–Rényi, Barabasi-Albert, Stochastic Block Models — and explain the various PyG functionalities surrounding them.</p><h4>Erdős–Rényi</h4><p>Erdős–Rényi (ER) graphs are perhaps the most simple and intuitive class of random graphs. ER graphs are defined by two required parameters:</p><ul><li>n<strong> </strong>(int): the total number of nodes in the graph.</li><li>p<strong> </strong>(float): the probability of an edge between any two nodes.</li></ul><p>Due to their simplicity, ER graphs are unable to model many complex, real-world relationships, but they are important in the field of graph theory. They often serve as a benchmark for the study of more complex graph structures and random graphs.</p><blockquote><strong>Implementation</strong></blockquote><p>PyG provides a simple implementation of an ER Graph Generator.</p><pre># Below is PyG&#39;s implementation of the ER Graph Generator<br><br>from torch_geometric.data import Data<br>from torch_geometric.datasets.graph_generator import GraphGenerator<br>from torch_geometric.utils import erdos_renyi_graph<br><br>class ERGraph(GraphGenerator):<br>    r&quot;&quot;&quot;Generates random Erdos-Renyi (ER) graphs.<br>    See :meth:`~torch_geometric.utils.erdos_renyi_graph` for more information.<br><br>    Args:<br>        num_nodes (int): The number of nodes.<br>        edge_prob (float): Probability of an edge.<br>    &quot;&quot;&quot;<br>    def __init__(self, num_nodes: int, edge_prob: float):<br>        super().__init__()<br>        self.num_nodes = num_nodes<br>        self.edge_prob = edge_prob<br><br>    def __call__(self) -&gt; Data:<br>        # erdos_renyi_graph() constructs the edge index of the random graph<br>        # See below for a description of the erdos_renyi_graph function<br>        edge_index = erdos_renyi_graph(self.num_nodes, self.edge_prob)<br>        return Data(num_nodes=self.num_nodes, edge_index=edge_index)<br><br>    def __repr__(self) -&gt; str:<br>        return (f&#39;{self.__class__.__name__}(num_nodes={self.num_nodes}, &#39;<br>                f&#39;edge_prob={self.edge_prob})&#39;)</pre><p>We can construct an ER graph (n=6, p=0.6) as follows:</p><pre>from torch_geometric.datasets.graph_generator import ERGraph<br>er_graph_generator = ERGraph(6,0.6)</pre><p>Here, the er_graph_generator variable simply defines an instance of ERGraph; to produce a graph sampled from the ER distribution specified by n and p, we must implicitly invoke the __call__ function, as follows.</p><pre>er_graph = er_graph_generator()<br>print(er_graph.num_nodes, er_graph.edge_index)</pre><p>In the code above, er_graph is an object of type Data, with class attributes num_nodes and edge_index that specify the number of nodes and edge connectivity of the randomly sampled graph, respectively.</p><p>Here, we’ve defined a helper function to plot the ER graph.</p><pre>import matplotlib.pyplot as plt<br>import networkx as nx<br>from torch_geometric.utils import to_networkx<br><br>def plot_graph(graph: torch_geometric.data.Data) -&gt; None:<br>  G = to_networkx(graph, to_undirected=True)<br>  nx.draw(G, with_labels=True)<br>  plt.show()<br><br>plot_graph(er_graph)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*fDmm831N5da41jI3" /><figcaption>ER (n=6, p=0.6) Graph</figcaption></figure><p>As shown below, if we modify the edge probability parameter p<strong> </strong>to 0.9, we observe a more dense graph. A larger p<strong> </strong>parameter is especially useful for modeling highly interconnected graphs.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*aYGmuXYwqdUdhlF4" /><figcaption>ER(n=6, p=0.9) Graph</figcaption></figure><p>Earlier, we glossed over how PyG actually generates an ER graph. It does so using the erdos_renyi_graph method in the __call__ function of the ERGraph class. The function is outlined in <a href="https://pytorch-geometric.readthedocs.io/en/latest/_modules/torch_geometric/utils/random.html">torch_geometric.utils.random.erdos_renyi_graph</a>. From a high-level, PyG first creates a fully-connected graph, and then masks each edge with probability 1-p. For brevity, we’ve omitted further details.</p><h4>Barabasi-Albert</h4><p>Barabasi–Albert (BA) graphs are a more complex, second family of graphs that model various real-world activities, including network growth processes. These family of graphs incorporate two important phenomena observed in many real-world graphs: growth and preferential attachment.</p><ul><li><strong>Growth </strong>is the ability of a graph to enlarge over time (just as many social networks grow over time).</li><li><strong>Preferential Attachment </strong>describes how new nodes in the graph tend to connect with existing nodes that are high-degree in nature —analogous to the popular saying “the rich get richer”. More specifically, the probability of forming an edge with an existing node is proportional to the node’s degree. As such, BA graphs tend to have high-degree “hub” nodes, although most nodes in BA graphs have fewer connections.</li></ul><p>By nature of their interconnectivity, BA graphs are better at modeling some real-world events than ER graphs. A few examples are listed below.</p><ul><li><strong>Citation Networks</strong>: Citation networks are “scale-free”, meaning their degree distribution contains a few high-degree hub nodes. Citation networks also tend to display preferential attachment — for example, new papers are likely to cite results from existing, well-known papers.</li><li><strong>Internet Networks: </strong>Hub nodes in BA graphs are similar to high-traffic websites and social media networks that receive many regular visits. Newer websites are more likely connected to hub nodes over other smaller websites (i.e. there are more outgoing links from a hub node like Google to a smaller website).</li></ul><p>Now that we understand BA graphs well, let’s see how we can leverage PyG to randomly generate them!</p><blockquote><strong>Implementation</strong></blockquote><p>PyG’s implementation of BA graphs is quite similar to that of ER graphs, albeit with different parameters. BA graphs are defined by two required parameters:</p><ul><li>n (int): the total number of nodes in the graph.</li><li>e (int): the number of edges a new node forms with existing nodes.</li></ul><pre># Below is PyG&#39;s implementation of the BA Random Graph Generator<br><br>from torch_geometric.utils import barabasi_albert_graph<br>class BAGraph(GraphGenerator):<br>    r&quot;&quot;&quot;Generates random Barabasi-Albert (BA) graphs.<br>    See :meth:`~torch_geometric.utils.barabasi_albert_graph` for more<br>    information.<br><br>    Args:<br>        num_nodes (int): The number of nodes.<br>        num_edges (int): The number of edges from a new node to existing nodes.<br>    &quot;&quot;&quot;<br>    def __init__(self, num_nodes: int, num_edges: int):<br>        super().__init__()<br>        self.num_nodes = num_nodes<br>        self.num_edges = num_edges<br><br>    def __call__(self) -&gt; Data:<br>        edge_index = barabasi_albert_graph(self.num_nodes, self.num_edges)<br>        return Data(num_nodes=self.num_nodes, edge_index=edge_index)<br><br>    def __repr__(self) -&gt; str:<br>        return (f&#39;{self.__class__.__name__}(num_nodes={self.num_nodes}, &#39;<br>                f&#39;num_edges={self.num_edges})&#39;)</pre><p>Below, we’ve constructed a BA graph (n=10, e=1):</p><pre>from torch_geometric.datasets.graph_generator import BAGraph<br>ba_graph_generator = BAGraph(10, 1)<br>ba_graph = ba_graph_generator()<br>plot_graph(ba_graph)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/502/0*WA7P7rLHoNBGJjW0" /><figcaption>BA (n=10, e=1) Graph</figcaption></figure><p>With a small value of e, we are more likely to observe a possibly unconnected graph with only a small number of high-degree nodes. The unconnectedness may seem counter-intuitive — when node 1 is added to the graph, shouldn’t it connect to node 0? The answer is no. If we look at <a href="https://pytorch-geometric.readthedocs.io/en/latest/_modules/torch_geometric/utils/random.html">torch_geometric.utils.random.erdos_barabasi_albert_grap</a>h, we’ll see that when e is small, depending on the initialization of the nodes and edge connections, some nodes may not be connected at all!</p><p>As e increases to 20 (similarly n is changed to 50), we notice many more high-degree hub nodes in the BA graph. The nodes with fewer edges tend to be the ones added earlier in the graph’s formation. This is because there were only a limited number of existing nodes for connection at that time.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1016/1*QKXjoaZaeyz3awGf5fzmfA.png" /><figcaption>BA (n=50, e=20) Graph</figcaption></figure><h4>Stochastic Block Models</h4><p>Stochastic Block Models (SBM) are the final class of random graphs, distinguished primarily by their node clusterings. In SBMs, nodes are divided into a predetermined number of groups, and the nodes within each group share similar properties. Node clusterings in a SBM have a high degree of connectivity within each group (these edges are referred to as <strong>intra-group</strong> connections). In most SBMs observed in the real-world, edges from one group to another (<strong>inter-group</strong> connections) occur less frequently than intra-group connections.</p><p>Due to their high degree of clustering connectivity, SBMs are useful in community detection applications. A few examples are listed below.</p><ul><li><strong>Social Networks: </strong>In social network analysis, individuals are often grouped into communities depending on their shared interactions. SBMs can help discover patterns and underlying trends between users in similar groups, and help inform strategies to cluster future nodes together.</li><li><strong>Recommender Systems: </strong>SBMs may be used in recommender systems to find clusters of users with a similar purchase history, which may be useful in informing future recommendations for individuals within the group. This community-based approach is able to capture complex patterns in users and items.</li></ul><blockquote><strong>Implementation</strong></blockquote><p>SBM graphs are defined by two required parameters:</p><ul><li>block_size<strong> </strong>([int]): the total number of nodes in each cluster. The length of this list is assumed to be the number of clusters in the graph.</li><li>edge_probs<strong> </strong>([[float]]): the density of edges going from each block to each other block.</li></ul><p>Unfortunately, PyG does not provide a class to randomly generate SBM graphs, as it does for ER and BA graphs. Below, we propose an addition to the PyG module torch_geometric.datasets.graph_generator that does exactly this.</p><pre>from torch_geometric.utils import stochastic_blockmodel_graph<br>from typing import List, Union<br><br># Below is our custom addition of the SBM Graph Generator<br><br>class SBMGraph(GraphGenerator):<br>    r&quot;&quot;&quot;Generates random stochastic_blockmodel_graph (SBM) graphs.<br>    See :meth:`~torch_geometric.utils.stochastic_blockmodel_graph` for more<br>    information.<br><br>    Args:<br>        block_sizes ([int] or LongTensor): The sizes of blocks.<br>        edge_probs ([[float]] or FloatTensor): The density of edges going<br>            from each block to each other block. Must be symmetric if the<br>            graph is undirected.<br>        directed (bool, optional): If set to :obj:`True`, will return a<br>            directed graph. (default: :obj:`False`)<br>    &quot;&quot;&quot;<br><br>    def __init__(self, block_sizes: Union[List[int], torch.LongTensor],<br>                 edge_probs: Union[List[List[float]], torch.FloatTensor],<br>                 directed: bool = False):<br>        super().__init__()<br>        self.block_sizes = block_sizes<br>        self.edge_probs = edge_probs<br>        self.directed = directed<br><br>    def __call__(self) -&gt; Data:<br>        edge_index = stochastic_blockmodel_graph(self.block_sizes,<br>                                                 self.edge_probs,<br>                                                 self.directed)<br>        return Data(num_nodes=np.sum(self.block_sizes), edge_index=edge_index)<br><br>    def __repr__(self) -&gt; str:<br>        return (f&#39;{self.__class__.__name__}(block_sizes={self.block_sizes}, &#39;<br>                f&#39;edge_probs={self.edge_probs}, directed={self.directed})&#39;)</pre><p>The code that generates the SBM exists in the __call__ method. We take advantage of <a href="https://pytorch-geometric.readthedocs.io/en/latest/_modules/torch_geometric/utils/random.html">torch_geometric.utils.stochastic_blockmodel_graph</a> to generate a random set of edge connections sampled from a SBM distribution, which we transform into a Data object — this is eventually returned from the __call__ function.</p><p>Below, we’ve plotted a SBM with parameters specified in the code cell below. Notice how the three clusters of nodes contain a significantly higher amount of intra-group connections than inter-group connections.</p><pre>sbm_graph_generator = SBMGraph([10, 10, 10], [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]])<br>sbm_graph = sbm_graph_generator()<br>plot_graph(sbm_graph)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/984/1*H9ze5h6igX9HmXxrQM9AxA.png" /><figcaption>SBM Graph (parameters specified above)</figcaption></figure><p>However, when we assign an equal probability of edge connection from each node cluster to every other, we end up with a graph that doesn’t resemble a SBM at all — this is just an Erdős–Rényi graph!</p><pre>sbm_graph_generator = SBMGraph([10, 10, 10], [[0.33, 0.33, 0.33], [0.33, 0.33, 0.33], [0.33, 0.33, 0.33]])<br>sbm_graph = sbm_graph_generator()<br>plot_graph(sbm_graph)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/984/1*YMyfPG6zNfEL8PO38ZSUzQ.png" /><figcaption>SBM Graph (parameters specified above)</figcaption></figure><h3>Data Preparation</h3><p>The section above provides a comprehensive review of random graphs and PyG. We will now propose a practical application of random graphs in the field of random graph detection.</p><p><strong>Question: </strong>Can we design a Graph Neural Network (GNN) capable of accurately classifying the family that a randomly sampled graph originates from? Can we predict the parameters associated with that family?</p><h4>Dataset Generation</h4><p>In order to tackle this problem, we will begin by generating synthetic data from each of the three graph families. To design a robust GNN capable of distinguishing different graph families, we will diversify our dataset by incorporating example graphs with a range of parameters.</p><p>In our code, we implement a class RandomGraphDataset which generates multiple random graphs sampled from a particular graph family with specified parameters. By inheriting from InMemoryDataset, we’ve created a versatile PyG class that is useful for generating random graphs in a multitude of other contexts. The _download, _process, __len__, and __getitem__ functions define this interface, and we’ve adapted these functions for our particular use case.</p><pre>import torch<br>import os<br>from torch_geometric.data import Data, InMemoryDataset<br><br>class RandomGraphDataset(InMemoryDataset):<br>    def __init__(self, graph_generator, num_samples, seed, root):<br>        super(RandomGraphDataset, self).__init__(root, None, None)<br>        self.graph_generator = graph_generator<br>        self.num_samples = num_samples<br>        self.seed = seed<br>        self.root = root<br><br>        if isinstance(graph_generator, ERGraph):<br>            self.graph_generator_params = {&#39;name&#39;: &quot;ER&quot;, &#39;num_nodes&#39;: graph_generator.num_nodes, &#39;p&#39;: graph_generator.edge_prob}<br>        elif isinstance(graph_generator, BAGraph):<br>            self.graph_generator_params = {&#39;name&#39;: &quot;BA&quot;, &#39;num_nodes&#39;: graph_generator.num_nodes, &#39;m&#39;: graph_generator.num_edges}<br>        elif isinstance(graph_generator, SBMGraph):<br>            self.graph_generator_params = {&#39;name&#39;: &quot;SBM&quot;, &#39;block_sizes&#39;: graph_generator.block_sizes, &#39;edge_probs&#39;: graph_generator.edge_probs, &#39;directed&#39;: graph_generator.directed}<br>        else:<br>            raise ValueError(&#39;graph_generator must be an instance of ERGraph, BAGraph or SBMGraph&#39;)<br><br>        file_path = &quot;_&quot;.join([str(value) for value in self.graph_generator_params.values()]) + &quot;_&quot; + str(seed) + &quot;_&quot; + str(num_samples)<br>        # print(file_path)<br>        self.data_list = []<br><br>        # if the file exists in the root folder, load it<br>        if os.path.isfile(os.path.join(self.root, file_path + &#39;.pt&#39;)):<br>            self.data_list = torch.load(os.path.join(self.root, file_path + &#39;.pt&#39;))<br><br>        else:<br>            self._download(file_path)<br>            self._process()<br><br>    def _download(self, download_path):<br>        # set the seed<br>        torch.manual_seed(self.seed)<br>        # generate the graphs<br>        self.data_list = [self.graph_generator() for _ in range(self.num_samples)]<br><br>        # save the graphs<br>        torch.save(self, os.path.join(self.root, download_path + &#39;.pt&#39;))<br><br>    def _process(self):<br>        pass<br><br>    def __len__(self):<br>        return len(self.data_list)<br><br>    def __getitem__(self, idx):<br>        data = self.data_list[idx]<br>        return data</pre><p>The RandomGraphDataset takes in a few important parameters:</p><ul><li>graph_generator: an instance of the graph generator family, which includes the parameters of the family. These parameters are contained as attributes of the graph_generator instance (i.e. graph_generator.num_nodes or graph_generator.edge_prob).</li><li>num_samples: the number of random graphs to generate from the family specified in graph_generator.</li></ul><p>This class will either extract the previously generated data from the file directory, or call the _download function to create and download the data. The random graphs are saved in a .pt file. The names of these files contain the ground truth parameters of the distribution they were sampled from. For instance, the file ER_5_0.3_100.pt<strong> </strong>contains 100 Erdős–Rényi graphs with 5 nodes, whose edges exist with a probability of 0.3. We then save these files in our root directory, so we may access them with ease.</p><p>We create multiple such files of data, where each file consists of 100 randomly sampled graphs from a fixed graph family with a particular set of parameters (for example, one such file was ER_5_0.3_100.pt). For brevity, the code that generates these files is excluded here, but can be accessed through our Google Colab linked above.</p><h4>Incorporating Position-Aware Features</h4><p>Before combining the files into one larger dataset, we critically add positional features to the node embeddings for each graph (via add_position_features). Provided below is our implementation of this function — it’s worth noting that PyG does not currently support this functionality.</p><pre>import torch<br>import numpy as np<br>import networkx as nx<br>from torch_geometric.utils import from_networkx<br><br>def add_position_features(data, num_anchors=10):<br>    # Convert to NetworkX graph<br>    graph = to_networkx(data)<br><br><br>    # Select anchor nodes<br>    anchor_nodes = np.random.choice(graph.nodes, size=num_anchors, replace=False)<br><br>    # Calculate distances to anchor nodes<br>    distances = []<br>    for node in graph.nodes:<br>        node_distances = []<br>        for anchor in anchor_nodes:<br>            try:<br>                distance = nx.shortest_path_length(graph, source=node, target=anchor)<br>            except nx.NetworkXNoPath:<br>                distance = len(graph.nodes)#float(&#39;inf&#39;)<br>            node_distances.append(distance)<br>        distances.append(node_distances)<br><br>    # Convert to tensor<br>    distances_tensor = torch.tensor(distances, dtype=torch.float)<br><br>    # Normalize distances<br>    min_val = torch.min(distances_tensor)<br>    max_val = torch.max(distances_tensor)<br>    distances_tensor = (distances_tensor - min_val) / (max_val - min_val)<br><br>    try:<br>      # Add to node features<br>      data.x = torch.cat([data.x, distances_tensor], dim=1)<br>    except:<br>      data.x = distances_tensor<br><br>    # Add a small noise<br>    data.x = data.x + 0.001 * torch.randn(data.x.shape)<br><br>    return data</pre><p>This step is vital in enabling the GNN to discriminate between random graph families. Without positional embeddings, our experiments have shown that a basic GNN is unable to differentiate between two families of graphs that have a similar number of nodes and average node degree. In the example below, the GNN may take an incorrect shortcut in classifying the graphs by relying heavily on the node count and average degree, but incorporating position-aware features helps mitigate this problem.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/660/1*-Qd_1Nv_KepMDetUPXWvmQ.jpeg" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/660/1*Ohex-Jv9Wxd9D_lpa3raow.jpeg" /><figcaption>A GNN will struggle to distinguish between the two graphs (ER — left, SBM — right) because they have the same average node degree.</figcaption></figure><h3>Modeling (Classification / Regression)</h3><p>We are finally ready to begin our modeling task! We will first embark on a classification task, where our objective will be to accurately predict the random graph family that some arbitrary graph was sampled from. Then, we will attempt to predict the family’s parameters.</p><h4>Classification</h4><p>Our initial task involves solving a graph-level classification task. First, we will compute the individual node embeddings and apply a global pooling operator to generate a graph-level embedding. We will then design a classification head that returns the most probable graph family.</p><p>Here, we consider two architectures to generate embeddings — a simple <strong>Graph Convolutional Network</strong> <strong>(GCN) </strong>and a more sophisticated message-passing architecture, <strong>Graph Isomorphism Network</strong> <strong>(GIN)</strong>. After trial-and-error, we found that the results from the GIN model surpassed the results of the GCN. This was expected — GIN is known to have one of the most powerful message-passing architectures (per the Weisfeiler-Lehman Test).</p><p>The GNN architecture is shown below.</p><pre>from torch.nn import Linear<br>import torch.nn.functional as F<br>from torch_geometric.nn import GINConv<br>from torch_geometric.nn import global_mean_pool<br><br><br>class GIN(torch.nn.Module):<br>    def __init__(self, num_node_features, hidden_channels):<br>        super(GIN, self).__init__()<br>        torch.manual_seed(12345)<br>        self.conv1 = GINConv(Linear(num_node_features, hidden_channels))<br>        self.conv2 = GINConv(Linear(hidden_channels, hidden_channels))<br>        self.conv3 = GINConv(Linear(hidden_channels, hidden_channels))<br><br>        self.lin = Linear(hidden_channels, dataset.num_classes)<br><br>    def forward(self, x, edge_index, batch):<br>        # 1. Obtain node embeddings<br>        x = self.conv1(x, edge_index)<br>        x = x.relu()<br>        x = self.conv2(x, edge_index)<br>        x = x.relu()<br>        x = self.conv3(x, edge_index)<br><br>        # 2. Readout layer<br>        # do sum pool over all nodes in each graph<br>        x = global_mean_pool(x, batch)<br><br>        # 3. Apply a final classifier<br>        x = F.dropout(x, p=0.2, training=self.training)<br>        x = self.lin(x)<br><br>        return x</pre><p>After partitioning our examples into a training, validation, and testing set, we formulated a training and evaluation loop to fine-tune our model.</p><p>The results of our final model are listed below.</p><pre>num_classes = train_dataset.num_classes<br>dataset_x_features = train_dataset[0].x.shape[1]<br><br>model = GIN(num_node_features=dataset_x_features,<br>            hidden_channels=64,<br>            out_channels=num_classes)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ECjTsS45e-hF3QIelihZHQ.png" /><figcaption>Training Loss of the GNN Classifier</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*yOVDnhC9YJ9uVV0cQrdBGw.png" /><figcaption>Training / Testing Accuracy of the GNN Classifier</figcaption></figure><p><strong>Further Study: </strong>Although our classifier performed very well, there are a few areas where we can improve. In particular, our model has difficulty classifying very dense graphs. As we increase the number of edges in a graph, families of random graphs begin to resemble each other. This is a hard problem to solve, and position-aware node embeddings seemed to offer the largest improvement in accuracy. Supplementing this approach with a random-walk strategy may help.</p><p>Moreover, we would love to extend our project to identify whether some real-world graphs can be approximated with random graphs. This would involve supplementing our dataset with positive and negative samples of real-world graphs to help our model better learn the intricacies of this task. Simultaneously, we’d have to increase the complexity of our GNN model. Due to the time constraints of this project, this was not a realistic goal, but it’s something that we’re looking forward to experimenting with in the future!</p><h4>Regression</h4><p>In the final part of our project, we are interested in a regression task — predicting the most likely parameters of the random graph family that an example graph was generated from. However, we quickly realized that estimating the parameters of Barabasi-Albert and Stochastic Block Model random graphs is a very complex task. Instead, we invested our time into designing a GNN that can infer the underlying probability of edge connection parameter p in Erdős–Rényi graphs. We deliberately focused on a data-dense regime for this task, only including ER graph examples in our dataset where the parameter p &gt; 0.5. We found that accuracy decreased significantly for p &lt; 0.5.</p><p>Our regression model closely resembled the GIN model from the classification task. As this was a regression task, we chose to only have a single output channel that returned the predicted p parameter. Additionally, we chose to minimize MSE loss instead of CLE loss. The final model and results are presented below.</p><pre>dataset_x_features = train_dataset[0].x.shape[1]<br><br>model = GIN(num_node_features=dataset_x_features,<br>            hidden_channels=64,<br>            out_channels=1)<br>model = model.to(device) </pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*DNGwKo6S2zuiSm6z4QEJsQ.png" /><figcaption>Regression Training Loss</figcaption></figure><p>Though it’s hard to discern the precise empirical loss from the plot above, our GNN model achieved a remarkably low MSE loss after 100 epochs. Below, we compare the loss of our GNN model with that of the trivial p prediction derived from the average node degree of the graph. While our GNN didn’t perform quite as well, it still showed comparable results. With an enhanced architecture, we’re hopeful to surpass this benchmark.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/900/1*G5T7SfHV4TTLnyeX6RmAZg.jpeg" /></figure><p><strong>Further Study:</strong> One pitfall of our regression model was that it had difficulty estimating the p parameter for very sparse Erdős–Rényi graphs. This makes intuitive sense, as graphs with fewer edges naturally lead to less precise estimates and greater variability. Once again, this is a difficult problem to solve, and likely requires increasing the complexity of the GNN, among overcoming other challenges.</p><p>Looking ahead, we’re eager to continue our work to estimate model parameters for the more intricate random graph families (Barabasi-Albert, Stochastic Block Model). While time constraints prevented us from delving into this area during our current project, it remains an exciting and promising direction for our future endeavors!</p><h3>Conclusion</h3><p>In this project, we explored the field of random graphs — particularly the Erdős–Rényi, Barabasi-Albert, and Stochastic Block Models. We thoroughly examined the characteristics and parameters of each graph family. Our journey included a detailed exploration of the PyG’s source code to generate samples from each random graph family. We also introduced several potential functional enhancements to PyG!</p><p>In the second half of our project, we built a position-aware GNN to predict the family that an arbitrary random graph was sampled from, and began to embark on the process of estimating the parameters. While we achieved some promising results, we recognize the need for more time and a sophisticated model to fully address both challenges. Nonetheless, we’re excited to build off of this work in the future.</p><p>Hope you enjoyed!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7b3d1fa35ebb" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>