Graph Representational Learning: Creating node and graph embeddings — Part 2

Dhaval Taunk
3 min read · Jul 1, 2024


In the previous blog post, I covered various techniques for node-level and graph-level embeddings, explaining their intuition and training methods. Now, we’ll delve into coding some of these techniques in Python. Let’s begin!

1. Node2Vec

First, we’ll begin with Node2Vec. We’ll use NetworkX to create a random graph, then train Node2Vec on it; under the hood, the node2vec package generates random walks over the graph and feeds them to Word2Vec from the gensim package.

  1. Install the necessary packages
pip install networkx node2vec

2. Next, we create the input graph using the NetworkX package.

import networkx as nx

G = nx.fast_gnp_random_graph(n=100, p=0.5)

The above code creates a graph with 100 nodes. The parameter p is the probability that any given pair of nodes is connected by an edge. With p=0.5, the graph won’t be fully connected; in expectation it has about half the edges of a complete graph on 100 nodes. You can adjust the p value as desired.
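You can sanity-check that expectation with a quick sketch (the seed here is just my addition for reproducibility):

```python
import networkx as nx

# G(n, p) graph: each of the n*(n-1)/2 possible edges exists with probability p
G = nx.fast_gnp_random_graph(n=100, p=0.5, seed=42)
max_edges = 100 * 99 // 2  # a complete graph on 100 nodes has 4950 edges
print(G.number_of_edges(), "of", max_edges)  # roughly half
```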

3. Now, we will initialize a Node2Vec class that takes the generated graph as input and generates random walks over the graph.

from node2vec import Node2Vec

node2vec = Node2Vec(G, dimensions=64, walk_length=30, num_walks=200, workers=4)

In the above code, you can see that I am creating random walks with a length of 30 and a total number of walks equal to 200. The embedding size is set to 64 in this case. Feel free to adjust these parameters according to your specific use-case.

4. Next, let’s fit the model on the generated walks. The keyword arguments (such as window and min_count) are passed through to gensim’s Word2Vec.

model = node2vec.fit(window=10, min_count=1, batch_words=4)

5. Next, save the learned embeddings in word2vec text format so they can be reloaded later (for example with gensim’s KeyedVectors).

model.wv.save_word2vec_format("embeddings_node2vec.txt")

6. To extract embeddings from the model, you can use the following code:

embeddings = {str(node): model.wv[str(node)] for node in G.nodes()}

Now feel free to experiment with it as you like.
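A common next step is comparing two nodes by the angle between their vectors. Here is a minimal NumPy sketch; the random vectors below are just stand-ins for real entries of the embeddings dict above:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# stand-ins for embeddings["0"] and embeddings["1"] from the dict above
rng = np.random.default_rng(0)
emb_a, emb_b = rng.normal(size=64), rng.normal(size=64)

print(cosine_similarity(emb_a, emb_b))  # some value in [-1, 1]
print(cosine_similarity(emb_a, emb_a))  # ~1.0 for identical vectors
```

Nodes that co-occur on many walks should end up with a noticeably higher cosine similarity than random pairs.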

DeepWalk

The second algorithm we’ll discuss is DeepWalk. The coding approach remains mostly the same as for Node2Vec. The difference lies in the walking strategy: DeepWalk uses uniform (unbiased) random walks, whereas Node2Vec uses biased second-order walks controlled by its return parameter p and in-out parameter q.
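To make the distinction concrete, DeepWalk’s walk strategy fits in a few lines of plain Python (toy adjacency list and function name are my own):

```python
import random

def uniform_random_walk(adj, start, length, seed=None):
    """DeepWalk-style walk: each step picks a neighbor uniformly at random."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbors))
    return walk

# toy graph: a path 0 - 1 - 2 - 3
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(uniform_random_walk(adj, start=0, length=5, seed=1))
```

Node2Vec would instead weight each neighbor by 1/p, 1, or 1/q depending on its distance from the previous node in the walk.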

  1. Installing packages
pip install networkx karateclub

2. Importing required packages

from karateclub import DeepWalk
import networkx as nx

3. Creating a graph

G = nx.fast_gnp_random_graph(n=100, p=0.5)

4. Initialize DeepWalk class

model = DeepWalk(dimensions=64, walk_length=30, num_walks=200, workers=4)

5. Fitting the model

model.fit(G)

6. Get the embeddings

embeddings = model.get_embedding()

Note that get_embedding() returns a NumPy array of shape (num_nodes, dimensions), where row i holds the embedding of node i. So that’s how you create DeepWalk embeddings. Feel free to experiment with this approach.

Graph2Vec

The last algorithm I’m going to discuss is Graph2Vec. It differs from the previous two because it creates graph-level embeddings instead of node-level embeddings: each graph is treated as a document of Weisfeiler-Lehman subtree features and embedded as a whole.

  1. Installing required packages
pip install karateclub networkx

2. Importing the packages

import networkx as nx
from karateclub import Graph2Vec
import os

os.makedirs('graphs', exist_ok=True)

3. Creating the graph

for i in range(5):
    G = nx.fast_gnp_random_graph(n=10 + i, p=0.5)
    nx.write_gml(G, f'graphs/graph_{i}.gml')

4. Creating a list of graphs for training

graphs = []
for i in range(5):
    # read_gml returns string node labels by default, but karateclub expects
    # nodes indexed as consecutive integers; label='id' keeps them as ints
    G = nx.read_gml(f'graphs/graph_{i}.gml', label='id')
    graphs.append(G)

5. Fitting the Graph2Vec model. The wl_iterations parameter controls how many rounds of Weisfeiler-Lehman relabeling are used to extract subtree features from each graph.

model = Graph2Vec(dimensions=64, wl_iterations=2, attributed=False)
model.fit(graphs)

6. Extracting the embeddings

embeddings = model.get_embedding()

for idx, embedding in enumerate(embeddings):
    print(f'Embedding for graph_{idx}: {embedding}')

So that’s how you learn Graph2Vec embeddings.
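Since each graph now maps to a single vector, you can compare whole graphs directly. A sketch of one way to find the most similar pair; the random matrix below is only a stand-in for the real model.get_embedding() output, which karateclub returns as a (num_graphs, dimensions) array:

```python
import numpy as np

# stand-in for model.get_embedding(): 5 graphs, 64-dimensional embeddings
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 64))

# cosine-similarity matrix between every pair of graphs
unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
sim = unit @ unit.T

# most similar distinct pair (mask the diagonal of self-similarities)
np.fill_diagonal(sim, -np.inf)
i, j = np.unravel_index(np.argmax(sim), sim.shape)
print(f"most similar pair: graph_{i} and graph_{j}")
```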

For now, that’s it from my side. I hope you enjoyed this blog. Stay tuned for more important topics in the next one.

