
Running GCN Classification on TigerGraph ML Workbench Using Intel-Optimized Libraries

Graphs Are a Fundamental Data Structure to Represent Connected Data and Organize Information

Izabela Irzyńska
5 min read · Mar 2, 2023


Izabela Irzynska and Karol Brejna, Intel Corporation

A simple graph consists of nodes (also called vertices) and edges that connect the nodes. Graph data is all around us. It is used in many domains and applications in our daily lives and can provide valuable insights and connections between different entities and phenomena. Social media platforms use graphs to organize connections among users, as well as the interactions and content shared between them. Online retailers use graphs to represent the relationships between users, items, and ratings to make personalized recommendations to customers. Maps and navigation applications use graphs to represent transportation routes that connect locations, as well as the distances and travel times between them. Graph databases facilitate the analysis of connected data. One of the most popular is TigerGraph, a highly scalable graph database platform.

Artificial Intelligence and Graphs

Artificial intelligence (AI) is increasingly present in our personal or business lives. Organizations use machine learning (ML) and deep learning (DL) to automate and optimize decision making and to enhance customer experiences. AI helps businesses analyze large amounts of data, identify patterns and trends, and make predictions that support decision making, which allows them to operate more efficiently, respond to changing market conditions, and gain a competitive advantage.

Graph neural networks (GNN) connect the worlds of AI and graphs. GNN are a type of neural network designed to operate on and take advantage of graph data. They are particularly useful for analyzing data that has a complex shape, such as social networks or molecular structures. They can be applied to a wide range of graph-related tasks, such as node classification, link prediction, and graph generation. GNN are better for graph data than regular neural networks because they capture the complicated relationships between nodes in a graph. It is possible to use regular neural networks to analyze graph data, but it would require simplifying the structure or “flattening” it to fit into a neural network. Such manipulation can lead to information loss.

GNN operate by using a series of neural network layers to process the graph data. Each layer takes as input the representation of the nodes and edges in the graph from the previous layer and applies a series of transformations to generate a new representation of the graph. These transformations typically involve operations like weighted summation and non-linear activation functions, commonly used in other types of neural networks.

A popular GNN architecture is the graph convolutional network (GCN), which uses a convolutional neural network architecture to process the graph data. This involves applying a series of filters to extract local features from the graph and combining them to generate a new representation of the graph. The ability to effectively process and analyze the structural information in graph data makes GCN particularly well-suited to classification tasks. For example, in a network of social media users, a GCN can identify important nodes (such as influencers) and use this information to classify the users into groups or categories.
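To make these layer operations concrete, here is a minimal, self-contained sketch of a single GCN layer in plain PyTorch. It is not code from the ML Workbench notebooks; the class and toy data are our own illustration. Neighbor features are aggregated through a symmetrically normalized adjacency matrix with self-loops, linearly transformed, and passed through a non-linear activation.

import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    # One GCN layer: H_out = ReLU(D^-1/2 (A + I) D^-1/2 · H_in · W)
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # add self-loops so each node keeps its own features
        a_hat = adj + torch.eye(adj.size(0))
        # symmetric degree normalization
        deg_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        norm = deg_inv_sqrt.unsqueeze(1) * a_hat * deg_inv_sqrt.unsqueeze(0)
        # aggregate neighbor features, transform, and apply the non-linearity
        return torch.relu(self.linear(norm @ x))

# toy graph: 4 nodes connected in a path, 3 input features per node
adj = torch.tensor([[0., 1., 0., 0.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [0., 0., 1., 0.]])
x = torch.randn(4, 3)
layer = SimpleGCNLayer(3, 2)
print(layer(x, adj).shape)  # torch.Size([4, 2])

Stacking several such layers lets information propagate multiple hops across the graph before a final layer assigns each node to a class.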

TigerGraph ML Workbench

GNN sound exciting, but how do we use them? TigerGraph provides a powerful tool called the TigerGraph ML Workbench for creating and deploying ML models on the TigerGraph platform. The ML Workbench is a Jupyter-based Python development framework that makes it easy to explore graph-enhanced ML and GNN because it is fully integrated with the TigerGraph database.

The easiest way to start exploring the ML Workbench is to use a sandbox image that lets us run the TigerGraph database and ML Workbench in a Docker container. Example Jupyter notebooks are available and ready to use once the container is running (Figure 1). There are step-by-step instructions for basics like connecting to the database and data processing, as well as more complex examples with model training and inference for different GNN architectures like GCN, GAT, and GraphSAGE.

Figure 1. The TigerGraph ML Tutorial notebook is a good place to start experimenting with the ML Workbench.

Running the TigerGraph ML Workbench with Intel Extension for PyTorch

Jupyter notebooks make it easy to use the Intel-optimized versions of TensorFlow, PyTorch, scikit-learn, NumPy, pandas, and more. To evaluate the ML Workbench, we use the GCN classification example that comes with the sandbox container and show how to set up and run model training. The example uses PyTorch together with the pyTigerGraph package, which connects to the TigerGraph database and works with graph data. We will show how to use Intel Extension for PyTorch to accelerate training on Intel CPUs.
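As a rough illustration of how pyTigerGraph fits in, the sketch below connects to the sandbox database and requests a PyTorch Geometric data loader. It follows the pattern used by the ML Workbench tutorials, but the host, graph name, credentials, and loader parameters are assumptions based on the sandbox defaults and may differ in your setup.

from pyTigerGraph import TigerGraphConnection

# connect to the local sandbox database (credentials are the sandbox defaults)
conn = TigerGraphConnection(
    host="http://127.0.0.1",
    graphname="Cora",
    username="tigergraph",
    password="tigergraph",
)

# the gds module streams graph data into PyTorch; these parameters are illustrative
train_loader = conn.gds.neighborLoader(
    v_in_feats=["x"],
    v_out_labels=["y"],
    v_extra_feats=["train_mask", "val_mask", "test_mask"],
    num_batches=5,
    num_neighbors=10,
    num_hops=2,
    shuffle=True,
    filter_by="train_mask",
    output_format="PyG",
)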

To get started, run the ML Workbench container:

docker run -it -p 14022:22 -p 8888:8888 -p 9000:9000 -p 14240:14240 -p 6006:6006 --name tgsandbox --ulimit nofile=1000000:1000000 -v ~/tgsandbox:/home/tigergraph/tgsandbox/save tigergraphml/sandbox:1.0.0

Once the Docker container is up and running, it prints a URL where the Jupyter notebooks are accessible. Open this URL in your browser.

Follow these steps to activate the database (see https://docs.tigergraph.com/ml-workbench/current/on-prem/sandbox and https://act.tigergraphlabs.com/ for more info):

wget https://act.tigergraphlabs.com/assets/download/latest/linux/mlwb
chmod +x mlwb
./mlwb activate http://127.0.0.1 -u tigergraph -p tigergraph

Once ML Workbench is running, you should be able to access the gcn_node_classification.ipynb Jupyter notebook. The example uses the GCN implementation from PyTorch Geometric (PyG), and the model is trained on the Cora dataset from PyG's built-in datasets.
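For reference, a two-layer GCN built on PyTorch Geometric's GCNConv looks roughly like the sketch below. It is similar in spirit to the model trained in gcn_node_classification.ipynb, but the hidden size, dropout, and optimizer settings are illustrative assumptions rather than the notebook's exact values.

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, num_classes):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)
        return self.conv2(x, edge_index)

# Cora has 1433 bag-of-words features per paper and 7 classes
model = GCN(in_channels=1433, hidden_channels=64, num_classes=7)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)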

Run the following command to install Intel Extension for PyTorch:

pip install intel_extension_for_pytorch

And then use it in your scripts:

import torch
import intel_extension_for_pytorch as ipex
...
model.train()
model, optimizer = ipex.optimize(model, optimizer=optimizer)
...

An easy way to measure the duration of training is to use the Python time package:

import time

dur = []
for epoch in range(10):
    t0 = time.time()
    model.train()
    model, optimizer = ipex.optimize(model, optimizer=optimizer)
    ...
    dur.append(time.time() - t0)
...
print('Training finished, took {:.2f}s'.format(sum(dur)))

The training results of the Jupyter notebook with and without Intel Extension for PyTorch are shown in Figures 2 and 3. Accuracy for both training experiments is around 80%.

Figure 2. Example results from running training in gcn_node_classification.ipynb
Figure 3. Example results from running training in gcn_node_classification.ipynb with Intel Extension for PyTorch

The example above shows how to use Intel Extension for PyTorch with the GCN model training provided with ML Workbench. The training itself is not very demanding in terms of hardware resources and does not take much time, but Intel Extension for PyTorch still reduces the training time from 0.354 seconds to 0.243 seconds (roughly a 1.46x speedup) on a 2.0 GHz Intel Xeon Gold 6330 processor.

For more information about Intel Extension for PyTorch and its acceleration for different workloads, see https://www.intel.com/content/www/us/en/developer/tools/oneapi/optimization-for-pytorch.html#gs.pnzlax or https://medium.com/pytorch/accelerate-pytorch-with-ipex-and-onednn-using-intel-bf16-technology-dca5b8e6b58f.

Conclusions

AI on graphs is a dynamic and fast-growing domain, and new opportunities emerge from the analysis of these complex structures and dependencies. The TigerGraph ML Workbench is a useful, Jupyter-based Python tool that, combined with Intel-optimized AI libraries, boosts performance while preserving accuracy.
