Corerec vs. Twitter : My Engine About to Drop the Mic 🎤

Vishesh Yadav
7 min read · May 26, 2024


Yo Fam, peep this: Graph analysis and recommendation systems are the real deal in today’s tech game, from social media to online shopping. I’m diving deep into the graph scene using two heavy hitters: VishGraphs and CoreRec. Inspired by the Twitter squad’s SimClusters paper, these tools bring the heat with sick visuals, deep analysis, and dope recommendations. Let’s vibe, shall we?

Introducing VishGraphs

VishGraphs is a versatile Python library designed for graph visualization and analysis. It simplifies complex graph-related tasks and offers features such as generating random graphs, drawing graphs in both 2D and 3D, and analyzing graph properties.

Installation

(only for limited users)

See the README of the CoreRec repository for installation: click_here

Repository link: click_here

[PAUSED] You can easily install VishGraphs via pip:

pip install vishgraphs==0.2

Usage

Generating Random Graphs

VishGraphs allows you to generate random graphs with ease. Here’s a simple example:

import vishgraphs as vg

# Generate a random graph with 10 nodes and save it to a CSV file
graph_file = vg.generate_random_graph(10, "random_graph.csv")
Dataset created by the above code as a preprocessor for our graphs (right now it’s a graph with n = 100)

Drawing Graphs

Visualizing graphs is straightforward with VishGraphs. You can draw graphs in both 2D and 3D, highlighting specific nodes if needed:

import numpy as np
import vishgraphs as vg

# Load the adjacency matrix from the CSV generated earlier
file_path = "random_graph.csv"
adj_matrix = np.loadtxt(file_path, delimiter=",")
top_nodes = vg.find_top_nodes(adj_matrix)
vg.draw_graph_3d(adj_matrix, top_nodes)

Draw a Bipartite Graph

# Load the generated graph from the CSV file
adj_matrix = vg.bipartite_matrix_maker(graph_file)

Draw the graph in 2D

nodes = list(range(len(adj_matrix)))  # assuming `nodes` is just the list of node indices
vg.draw_graph(adj_matrix, nodes, top_nodes)
Red nodes are the popular nodes with the largest number of connections on both sides

Draw the graph in 3D

Making a graph in 3D has never been this easy, with any dataset. Here I am again using the generate_random_graph feature of VishGraphs to generate 3D graphs.

Blue nodes are the popular nodes with the largest number of connections on both sides

Exploring CoreRec: An Intelligent Recommendation Engine

CoreRec complements VishGraphs by providing functionality for graph analysis and recommendation. I created CoreRec because my main goal was not just to write functions for finding the popular nodes produced by VishGraphs, but to observe the overall pattern through the multi-head attention of the Transformer architecture, which gave birth to GraphTransformers.

Let’s explore its main features:

Recommendation System

CoreRec offers a robust recommendation system based on graph analysis. It can recommend similar nodes within a graph, aiding in various applications such as personalized recommendations in social networks or product recommendations in e-commerce platforms.

import core_rec as cr
import numpy as np

# Assuming 'adj_matrix' is the adjacency matrix of a graph
# (here, a 40-node graph built through vishgraphs; CSV path assumed)
adj_matrix = np.loadtxt("random_graph.csv", delimiter=",")
print(adj_matrix)
# [[0. 0. 1. ... 1. 0. 1.]
#  [0. 0. 0. ... 1. 1. 1.]
#  [0. 0. 0. ... 0. 0. 0.]
#  ...
#  [0. 0. 0. ... 0. 0. 0.]
#  [0. 0. 0. ... 0. 0. 0.]
#  [0. 0. 0. ... 0. 0. 0.]]

Let’s visualize this adj_matrix first:

strong_relations, top_nodes = vg.find_top_nodes(adj_matrix)
vg.draw_graph_3d(adj_matrix, top_nodes)  # Pass both adj_matrix and top_nodes
print(top_nodes)

# Recommend similar nodes for a specific node
node = 2
recommendations = cr.recommend_similar_nodes(adj_matrix, node)
print(f"Recommended nodes for node {node}: {recommendations}")

Training Transformer Models for Graph Data

CoreRec enables training Transformer models tailored for graph data. These models can be trained for various graph-related tasks such as node classification or link prediction.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
import core_rec as cr

# Define parameters for the Transformer model
num_layers = 3
d_model = 128
num_heads = 4
d_feedforward = 256
input_dim = 20 # Assuming input dimension

# Initialize the Transformer model
model = cr.GraphTransformer(num_layers, d_model, num_heads, d_feedforward, input_dim)

# Create a dataset for graph data
dataset = cr.GraphDataset(adj_matrix)

# Define loss function, optimizer, and other training parameters
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
num_epochs = 15

# Train the model
cr.train_model(model, DataLoader(dataset), criterion, optimizer, num_epochs)
Red nodes: top nodes; green nodes: recommended nodes for the target, i.e. node 2
Bipartite Network Training: Loss Trends and Node Recommendations

Observe: For node 2, CoreRec has recommended nodes 18, 34, 22, 27, 39, 1, 2, 4, 10, and 32

Backend of CoreRec (That’s How The Model is being Tortured 🫣)

The nodes recommended by the model are based on the patterns it has learned during training. To better understand this, let’s break down the logic and process:

GraphTransformer: Precision in Processing, Fine-Tuning Relationships, and Decoding Node Relevance.

1. Model Architecture

The `GraphTransformer` is a neural network designed to process graph data. It takes an input graph, processes it through several layers, and outputs scores for each node. The architecture consists of:

  • Input Linear Layer: This maps the input dimensions (features of each node) to a higher-dimensional space suitable for the transformer.
  • Transformer Encoder Layers: These layers capture the relationships and patterns within the graph. The transformer mechanism, with attention heads, allows the model to focus on different parts of the graph to learn complex interactions.
  • Output Linear Layer: This produces scores for each node, indicating how suitable each node is as a recommendation for the given input node.
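
To make those three pieces concrete, here is a minimal PyTorch sketch of that kind of architecture. This is my own illustration of the idea, not the actual CoreRec source: the class name, the `batch_first` choice, and the use of adjacency rows as node features are all assumptions.

import torch
import torch.nn as nn

class TinyGraphTransformer(nn.Module):
    # Illustrative only: input linear -> transformer encoder -> output linear
    def __init__(self, num_layers, d_model, num_heads, d_feedforward, input_dim):
        super().__init__()
        # Input linear layer: map node features (e.g. adjacency rows) up to d_model
        self.input_proj = nn.Linear(input_dim, d_model)
        # Transformer encoder layers: attention heads learn interactions between nodes
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=num_heads,
            dim_feedforward=d_feedforward, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Output linear layer: one score per node (same width as an adjacency row)
        self.output_proj = nn.Linear(d_model, input_dim)

    def forward(self, x):
        # x: (batch, num_nodes, input_dim) node features
        h = self.input_proj(x)
        h = self.encoder(h)
        return self.output_proj(h)

With the parameters from the training snippet above (num_layers=3, d_model=128, num_heads=4, d_feedforward=256), this produces a vector of scores per node.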

2. Training Logic

During training, the model learns to predict the adjacency matrix row of each node, which indicates the connections (or relationships) of that node with others. The loss function (MSE loss in this case) measures how well the predicted adjacency matrix matches the actual one. Over many epochs, the model adjusts its weights to minimize this loss, thereby learning the underlying structure and relationships in the graph.
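
To make that training logic concrete, here is a hand-rolled sketch of such a loop. This is my reading of the description above, not the internals of `cr.train_model`; in particular, I assume the dataset yields adjacency-matrix rows that serve as both input and target.

import torch
import torch.nn as nn

def train_sketch(model, loader, num_epochs=15, lr=0.001):
    criterion = nn.MSELoss()                      # compare predicted vs. actual adjacency rows
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(num_epochs):
        total_loss = 0.0
        for batch in loader:                      # batch of adjacency rows (assumed format)
            optimizer.zero_grad()
            pred = model(batch)                   # model predicts each node's connections
            loss = criterion(pred, batch)         # MSE between prediction and the true row
            loss.backward()                       # push the weights toward lower error
            optimizer.step()
            total_loss += loss.item()
        print(f"epoch {epoch + 1}: loss = {total_loss / len(loader):.4f}")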

3. Recommendation Logic

When predicting recommendations for a given node, the model outputs scores for all nodes. These scores can be interpreted as the model’s confidence or likelihood of each node being relevant or connected to the given node. The nodes with the highest scores are recommended.
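
In code, that last step is just a top-k over the score vector. A minimal sketch (the value of k and the masking of the query node are my choices, not necessarily CoreRec’s):

import torch

def recommend_from_scores(model, node_features, node, k=10):
    # node_features: (num_nodes, num_nodes) tensor of adjacency rows, so the model's
    # output scores line up with node indices (an assumption for this sketch)
    model.eval()
    with torch.no_grad():
        scores = model(node_features.unsqueeze(0)).squeeze(0)[node]  # scores over all nodes
    scores = scores.clone()
    scores[node] = float("-inf")                   # optionally exclude the query node itself
    return torch.topk(scores, k).indices.tolist()  # the k highest-scoring nodes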

Why These Nodes are Recommended
The nodes recommended by the model are those with the highest scores, indicating they are the most relevant according to the model’s learned patterns. These patterns could include direct connections, shared neighbors, structural roles in the graph, etc.

How to Interpret the Scores

1. Direct Connections: Nodes directly connected to the input node in the graph are likely to have higher scores.
2. Neighborhood Similarity: Nodes with similar neighborhoods (i.e., they share many common neighbors with the input node) may also be recommended.
3. Graph Structure: Nodes that play similar roles in the graph (e.g., central nodes, hubs) might be recommended even if they are not directly connected to the input node.
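
If you want a rough sanity check on these interpretations, the first two signals can be computed directly from the adjacency matrix and compared with the model’s top scores. A small NumPy sketch of my own (not part of CoreRec):

import numpy as np

def direct_connections(adj_matrix, node):
    # 1. Nodes directly connected to `node`
    return np.flatnonzero(adj_matrix[node])

def neighborhood_similarity(adj_matrix, node):
    # 2. Jaccard similarity between `node`'s neighborhood and every other node's;
    #    high values mean many shared neighbors with the input node
    neigh = adj_matrix[node].astype(bool)
    sims = np.zeros(adj_matrix.shape[0])
    for other in range(adj_matrix.shape[0]):
        other_neigh = adj_matrix[other].astype(bool)
        union = np.logical_or(neigh, other_neigh).sum()
        inter = np.logical_and(neigh, other_neigh).sum()
        sims[other] = inter / union if union else 0.0
    return sims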

And that’s a wrap!

Graph analysis and recommendation systems are powerful tools for extracting insights and making informed decisions from graph data. With VishGraphs and CoreRec, you have the necessary tools at your disposal to explore, analyze, and recommend nodes within graphs effectively. Whether you’re a researcher, data scientist, or enthusiast, these libraries empower you to unlock the potential of graph data in your projects.

Updates: In further blogs I will surely bring some evaluation methods to judge CoreRec ... so stayyyy tuned!!

Solid proof that I’m all about that open-source!

Functional Architecture of CoreRec

Sure, let’s give a round of applause for…

Special thanks to:

  • Venu Satuluri, Yao Wu, Xun Zheng, Jimmy Lin, Yilei Qian, Brian Wichers, Qieyun Dai, Gui Ming Tang, and Jerry Jiang, the engineers at Twitter, for their research paper on SimClusters, which provided valuable insights into graph analysis and recommendation systems.
  • Vishesh Yadav (@vishesh9131) for the foundational work on VishGraphs and CoreRec libraries, which inspired this project.
  • Andrej Karpathy for his Transformer implementation, which served as a reference for the implementation of the Transformer/encoder-decoder functions in this project.

Vishesh Says

“Hey readers, you’re at the bottom of this Medium blog, whether you find it interesting or you’re just really fast at scrolling! 🤣 By the way, I hope you liked these libraries. I know they’re really basic, but I hadn’t thought about making them LIVE yet; they were only for my research purposes. But now, with the objective of knowing what you folks think of this, I’ve launched it. I hope you like it. If not, then make your own! 🤣 Signing off, bye…”

Oh, because clearly, I’m just overflowing with gratitude for CSGO Duck’s invaluable assistance with her staring competition with the codes.

Thank you ;
