Spread No More: Twitter Fake News Detection with GNN

By Li Tian, Sherry Wu, Yifei Zheng as part of the Stanford CS224W course project.


This tutorial provides two methods for detecting fake news in a social network like Twitter. Both methods make predictions by leveraging the social network structure that a news piece flows through. A step-by-step implementation can be found in the Google Colab notebook.

1. Introduction

Fake news on the internet is a growing social and political issue, and inaccurate information can propagate easily through existing online social networks. Traditional approaches to combating this problem, which rely on human fact-checkers or NLP models, can fall short in generalization or computational capacity.

With the rise of Graph Neural Networks (GNNs), the same network that aided the flow of fake news may be used to stop it. That is, what if a social network could preserve information about the nature of the news pieces that propagate through it?

Inside a social network graph, we treat Twitter users and news pieces as nodes, and we model (re)tweeting behavior as edges. Sitting at the intersection of graph ML and NLP, our GCN method propagates nodes’ text embeddings and pools them to make a classification. Alternatively, borrowing ideas from Convolutional Neural Networks (CNNs), our GNN-DP method iteratively aggregates information across coarsened sub-graphs to produce the final label.

Below is a visualization of applying our GCN model to testing (i.e. unseen) data. As we can see, our methods are able to efficiently distinguish networks associated with real vs. fake news!

Embeddings before and after training, with each row corresponding to the embedding of one news/graph in the test set. The darker the color, the larger the numeric value

2. Dataset

We used the Twitter fake news propagation graph dataset available on GitHub [1], released with the paper User Preference-aware Fake News Detection [2]. The dataset is also integrated into the PyG package as the UPFD (User Preference-aware Fake News Detection) dataset [3].

2.1. Description

The UPFD dataset contains both real and fake news propagation networks on Twitter, with ground-truth labels obtained from fact-checking organizations such as Politifact and GossipCop. We will be using the Politifact data for this tutorial: it contains N = 314 graphs, 157 of which are associated with fake news. At a more granular level, we have:

  • Graph: Every graph is a tree-structured news propagation network.
  • Node: The root node represents a news piece, and the remaining nodes represent Twitter users who have retweeted that news piece.
  • Edge: Every edge represents a retweet: a user retweets the news either directly from the root or indirectly from another user (see the toy example below).
Three example graphs from the dataset
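To make this concrete, a toy propagation tree can be written as a PyG Data object. Below is a minimal sketch; the node indices and the label value are illustrative, not taken from the dataset.

import torch
from torch_geometric.data import Data

# node 0 is the news piece; users 1 and 2 retweet it directly, user 3 retweets from user 1
edge_index = torch.tensor([[0, 0, 1],
                           [1, 2, 3]], dtype=torch.long)
toy_graph = Data(edge_index=edge_index, num_nodes=4, y=torch.tensor([1]))  # y: graph-level real/fake label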

2.2. Preprocessing

Since this dataset is already integrated into the PyG package, loading and preprocessing the data is straightforward. There are two main aspects to take care of. First, we cast the directed social network graph into an undirected one. Second, we load the node features: for every user and news node, we concatenate the profile attributes (10-dimensional) with a BERT encoding of the node’s past tweets (768-dimensional).

import torch
import torch_geometric.transforms as T
from torch_geometric.datasets import UPFD
from torch_geometric.transforms import ToUndirected

def load_data(split, feature=None):
    """
    Load the train, validation, or test split of the UPFD dataset in PyG. By default,
    concatenate the node features *profile* and *bert*, which are a Twitter user's
    profile attributes and historical tweets encoded through BERT, respectively.
    The ToDense transformation is applied to return an adjacency matrix instead of
    edge_index.

    -------------------------------------
    split: 'train', 'val', or 'test' for retrieving the respective portion of UPFD.
    feature: 'content' or None for which features to retrieve.
    -------------------------------------
    Return: PyG dataset object.
    """
    max_nodes = 500  # for converting to a dense adjacency matrix, instead of edge_index

    if feature == 'content':
        return UPFD('/tmp/test', "politifact", feature, split,
                    transform=T.ToDense(max_nodes), pre_transform=ToUndirected())
    else:
        data_profile = UPFD('/tmp/test', "politifact", "profile", split,
                            transform=T.ToDense(max_nodes), pre_transform=ToUndirected())
        data_bert = UPFD('/tmp/test', "politifact", "bert", split,
                         transform=T.ToDense(max_nodes), pre_transform=ToUndirected())
        data_profile.data.x = torch.cat((data_profile.data.x, data_bert.data.x), dim=1)
        return data_profile
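As a quick sanity check, the loader can be called on each split like this (a minimal sketch; the printed contents are illustrative):

train_data = load_data('train')
val_data = load_data('val')
test_data = load_data('test')

print(len(train_data), len(val_data), len(test_data))  # number of propagation graphs per split
print(train_data[0])  # one graph with dense adjacency (adj), node features (x), mask, and label (y)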

3. Implementations

3.1. GCN Implementation

Model Structure

GCN is one of the most robust and classical graph neural networks, which makes it a natural fit for our task. In our implementation, we designed the structure as shown above. With user-specified inputs, we create a designated number of convolutional layers, with batch normalization, ReLU, and dropout between consecutive layers. At the very end, we apply global pooling across all nodes to get the graph-level embedding, which is then transformed by a linear layer and a log-softmax to produce the label.

Code it up!

Since the data available in UPFD is rather limited in size compared to real-world social networks, we do not want that to constrain future use cases of our model. Therefore, we made our GCN adaptable to potentially bigger and more complex graphs.

Our GCN model takes in a dictionary of arguments for initialization and training. Other than the common ones such as dimensions, users may specify input arguments like `num_layers` and `dropout`. In our provided Colab notebook, you may see that we trained with 2 layers of GCN and no dropout, primarily because our toy graphs are relatively shallow. This can be easily adjusted for deeper and larger graphs.

Below is a snippet of how to define our GCN class; please refer to the training section in our Colab for more information on initializing, training, and evaluating the GCN model. The original published implementation can be found in the authors’ repository [4].

import torch
import torch.nn.functional as F
from torch.nn import Linear
from torch_geometric.nn import GCNConv, global_mean_pool as gmp

class GCN(torch.nn.Module):
    def __init__(self, args):
        """
        Initialize a simple GCN with specified parameters.

        By default, this GCN has 2 convolutional layers and 1 linear layer.
        It also has 1 batch normalization layer between the two conv layers.

        If num_layers is provided by the user, then this function initializes
        the corresponding number of convolutional layers, with batch
        normalization in between. num_layers must be >= 2.

        -------------------------------------
        self: GCN object
        args["num_features"]: dimension of the input
        args["hidden_dim"]: dimension of the hidden layer(s)
        args["num_classes"]: dimension of the output (i.e. number of classes)
        args["dropout"]: fraction of neurons being zeroed, can be None
        args["num_layers"]: number of convolutional layers, must be >= 2
        """

        assert args.num_layers >= 2, "num_layers must be >= 2."

        super(GCN, self).__init__()

        # Initialize parameters
        self.num_layers = args.num_layers
        self.dropout = args.dropout

        # Initialize the first convolutional layer
        self.convs = torch.nn.ModuleList([GCNConv(args.num_features, args.hidden_dim)])

        # Initialize the list of batch normalization layers
        self.bns = torch.nn.ModuleList()

        # Initialize batch normalization layer(s) and the rest of the convolutional layer(s)
        for _ in range(self.num_layers - 1):

            # Initialize a batch normalization layer
            self.bns.extend([torch.nn.BatchNorm1d(args.hidden_dim)])

            # Initialize the next convolutional layer
            self.convs.extend([GCNConv(args.hidden_dim, args.hidden_dim)])

        # Initialize the final linear layer
        self.lin0 = Linear(args.hidden_dim, args.num_classes)

    def forward(self, data):
        """
        One forward pass of the GCN.

        -------------------------------------
        data: PyG Data/Batch object, with properties like x, edge_index, batch, etc.

        -------------------------------------
        Return: log-softmax class predictions, one row per graph in the batch.
        """

        # get features, edge index, and batch assignment
        # -- the batch property associates nodes with their graph; it takes
        #    the form [0,...,0,1,...,1,...,n-1,...,n-1] with n being the number
        #    of independent graphs in the batch
        out, edge_index, batch = data.x, data.edge_index, data.batch

        # apply one layer of GNN at a time
        for i in range(self.num_layers - 1):

            # convolutional layer
            out = self.convs[i](out, edge_index)

            # batch normalization
            out = self.bns[i](out)

            # non-linear activation with ReLU
            out = F.relu(out)

            # dropout if requested
            if self.dropout:
                out = F.dropout(out, p=self.dropout, training=self.training)

        # the last convolutional layer
        out = self.convs[self.num_layers - 1](out, edge_index)

        # apply graph-level pooling per graph in the batch
        # -- the embeddings of each graph's k nodes (1 news node, k-1 user nodes)
        #    are pooled with the mean to generate a graph-level embedding
        #    for label prediction
        out = gmp(out, batch)

        # the final linear layer
        out = self.lin0(out)

        # log-softmax for final prediction
        out = F.log_softmax(out, dim=-1)

        return out
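As a minimal usage sketch (the dimension values below are illustrative, and the Colab contains the full training loop), the GCN can be initialized and run on one batch as follows. Since the class accesses arguments as attributes, a SimpleNamespace is used to wrap them, and the data is loaded without the ToDense transform because the forward pass uses edge_index.

from types import SimpleNamespace
from torch_geometric.datasets import UPFD
from torch_geometric.loader import DataLoader
from torch_geometric.transforms import ToUndirected

# Load the train split with BERT features only (the Colab uses the concatenated
# profile + BERT features); ToDense is not applied because the GCN uses edge_index.
train_set = UPFD('/tmp/test', 'politifact', 'bert', 'train', pre_transform=ToUndirected())
loader = DataLoader(train_set, batch_size=64, shuffle=True)

# Hyperparameter values here are illustrative, not the tuned configuration.
args = SimpleNamespace(
    num_features=train_set.num_features,  # 768 for the 'bert' features
    hidden_dim=128,
    num_classes=2,
    dropout=0.0,
    num_layers=2,
)
model = GCN(args)

batch = next(iter(loader))
log_probs = model(batch)  # shape [num_graphs_in_batch, 2]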

3.2. GNN with DiffPool Implementation

Hierarchical Graph Representation Learning with Differentiable Pooling (DiffPool) is another powerful approach to graph-level classification. Compared to the GCN structure, which is inherently flat, the GNN-DP structure generates predictions hierarchically.

Specifically, GNN-DP consists of two GNNs trained in parallel:

• GNN A processes node embeddings

• GNN B maps nodes to a set of clusters

A differentiable pooling layer then processes the node embeddings from GNN A according to the cluster assignments from GNN B to obtain updated node embeddings and a coarsened adjacency matrix.

Differentiable Pooling Layer
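Concretely, following the DiffPool paper, if X are the current node features, A the adjacency matrix, Z the node embeddings produced by GNN A, and S the soft cluster assignment produced by GNN B, one pooling step computes:

S = softmax(GNN_B(A, X))   # soft assignment of the N nodes to C clusters (N × C)
X' = Sᵀ Z                  # pooled cluster embeddings (C × d)
A' = Sᵀ A S                # coarsened adjacency matrix (C × C)

PyG’s dense_diff_pool implements this step and additionally returns a link-prediction loss and an entropy loss that regularize the assignments (the l and e terms in the forward pass below).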

The full structure of GNN-DP is illustrated step by step below:

Code it up!

Our GNN-DP model first creates a customized GNN module (consisting of GraphSAGE and BatchNorm layers) that is used by both parallel GNNs. Then we construct the differentiable pooling modules, which call the pre-defined GNNs for node embedding processing and cluster assignment. We choose to perform differentiable pooling twice, reducing the cluster size sequentially from 500 to 100 to 20 (20% at a time) before making a final softmax prediction with mean pooling and a linear transformation. The cluster reduction rate and the number of DiffPool layers are design choices, and we chose the combination that worked best on the UPFD data.
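The customized GNN module itself is not included in the snippet below. Here is a minimal sketch of what it might look like, loosely following the PyG DiffPool example and matching the channel sizes expected by the GCNDP class; the published implementation is in the authors’ repository [4]. The imports here also cover the GCNDP class that follows.

from math import ceil

import torch
import torch.nn.functional as F
from torch_geometric.nn import DenseSAGEConv, dense_diff_pool

class GNN(torch.nn.Module):
    """Two DenseSAGEConv layers with batch normalization; the per-layer outputs are
    concatenated (so lin=False returns hidden + out channels), and an optional linear
    layer maps them back to out_channels for the cluster-assignment branch."""

    def __init__(self, in_channels, hidden_channels, out_channels, lin=True):
        super().__init__()
        self.conv1 = DenseSAGEConv(in_channels, hidden_channels)
        self.bn1 = torch.nn.BatchNorm1d(hidden_channels)
        self.conv2 = DenseSAGEConv(hidden_channels, out_channels)
        self.bn2 = torch.nn.BatchNorm1d(out_channels)
        self.lin = torch.nn.Linear(hidden_channels + out_channels, out_channels) if lin else None

    def bn(self, i, x):
        # BatchNorm1d expects (N, C); flatten the (batch, node) dimensions first
        batch_size, num_nodes, num_channels = x.size()
        x = getattr(self, f'bn{i}')(x.view(-1, num_channels))
        return x.view(batch_size, num_nodes, num_channels)

    def forward(self, x, adj, mask=None):
        x1 = self.bn(1, F.relu(self.conv1(x, adj, mask)))
        x2 = self.bn(2, F.relu(self.conv2(x1, adj, mask)))
        x = torch.cat([x1, x2], dim=-1)  # concatenate layer outputs (jumping-knowledge style)
        if self.lin is not None:
            x = F.relu(self.lin(x))
        return x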

Below is a snippet of how to define our GNN-DP class; please refer to the training section in our Colab for more information on initializing, training, and evaluating the model. The original published implementation can be found in the authors’ repository [4].

class GCNDP(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        """
        Initialize a Graph Convolutional Network with Differentiable Pooling (GCNDP).

        The embedding matrix and the assignment matrix of each graph are computed by
        two separate customized GNN models, respectively. In the 2-DIFFPOOL-layer
        architecture, the number of clusters is set to 20% of the number of nodes
        before applying DIFFPOOL. As a result, with max_nodes=500, we reduce the
        nodes to 100 and then 20. A final GNN layer is applied to compute the
        embedding matrix before mean aggregation for each graph.

        The final output is the 2-class softmax prediction after applying a ReLU
        activation and an affine layer.
        -------------------------------------
        self: GCNDP object
        input_dim: dimension of the input
        hidden_dim: dimension of the hidden layer(s)
        output_dim: dimension of the output (i.e. number of classes)
        """
        super(GCNDP, self).__init__()
        max_nodes = 500

        num_nodes = ceil(0.2 * max_nodes)
        # note below that gnn1_pool has lin=True for cluster assignment, but gnn1_embed has lin=False
        self.gnn1_pool = GNN(input_dim, hidden_dim, num_nodes, lin=True)
        self.gnn1_embed = GNN(input_dim, hidden_dim, hidden_dim, lin=False)

        num_nodes = ceil(0.2 * num_nodes)
        self.gnn2_pool = GNN(hidden_dim * 2, hidden_dim, num_nodes, lin=True)
        self.gnn2_embed = GNN(hidden_dim * 2, hidden_dim, hidden_dim, lin=False)

        self.gnn3_embed = GNN(2 * hidden_dim, hidden_dim, hidden_dim, lin=False)

        self.lin1 = torch.nn.Linear(2 * hidden_dim, hidden_dim)
        self.lin2 = torch.nn.Linear(hidden_dim, output_dim)

    def forward(self, x, adj, mask):
        # first diff pool: s for assignment, x for embedding; train both GNNs simultaneously
        # note below that gnn1_pool has lin=True for cluster assignment, but gnn1_embed has lin=False
        s = self.gnn1_pool(x, adj, mask)
        x = self.gnn1_embed(x, adj, mask)
        # returns out, out_adj, link_loss, ent_loss; out is the cluster embedding of size B x C x feature_dim
        x, adj, l1, e1 = dense_diff_pool(x, adj, s, mask)

        # 2nd diff pool, with reduced assignment size [num_nodes = ceil(0.2 * num_nodes)]
        s = self.gnn2_pool(x, adj)
        x = self.gnn2_embed(x, adj)
        x, adj, l2, e2 = dense_diff_pool(x, adj, s)

        # update the embeddings once more, without computing a new assignment
        x = self.gnn3_embed(x, adj)

        # graph-level readout: mean over clusters, then two linear layers
        x = x.mean(dim=1)
        x = F.relu(self.lin1(x))
        x = self.lin2(x)

        # softmax class probabilities, plus the auxiliary link-prediction and entropy losses
        return F.softmax(x, dim=-1), l1 + l2, e1 + e2
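As a minimal usage sketch (the batch size and hidden dimension below are illustrative; the full training loop is in the Colab), the model can be run on one dense batch like this:

from torch_geometric.loader import DenseDataLoader

# load_data applies ToDense, so each batch carries x, adj, and mask
train_loader = DenseDataLoader(load_data('train'), batch_size=64, shuffle=True)
model = GCNDP(input_dim=778, hidden_dim=64, output_dim=2)  # 778 = 10 profile + 768 BERT features

batch = next(iter(train_loader))
probs, link_loss, ent_loss = model(batch.x, batch.adj, batch.mask)
print(probs.shape)  # [num_graphs_in_batch, 2] class probabilities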

4. Results

For the GCN model, we obtained a test accuracy of 0.8371 and an F1 score of 0.8286. For the GNN with Differentiable Pooling model, we obtained a test accuracy of 0.7692 and an F1 score of 0.7773. The GCN model performs slightly better.
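For reference, here is a minimal sketch of how such test metrics can be computed, assuming `model` is the trained GCN and `test_loader` iterates over the test split, as in the Colab:

import torch
from sklearn.metrics import accuracy_score, f1_score

model.eval()
all_preds, all_labels = [], []
with torch.no_grad():
    for batch in test_loader:
        all_preds.append(model(batch).argmax(dim=-1))  # predicted class per graph
        all_labels.append(batch.y)
all_preds = torch.cat(all_preds)
all_labels = torch.cat(all_labels)
print('test accuracy:', accuracy_score(all_labels, all_preds))
print('test F1:', f1_score(all_labels, all_preds))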

Train and Validation Accuracy for GCN and GNN with DiffPool Model
Train and Validation Loss for GCN and GNN with DiffPool Model

Below, the left heatmap shows the embeddings before training and the right heatmap shows the embeddings after training. Each row represents one news piece, with either 778 features (the raw profile + BERT input, before training) or 128 features (the learned hidden dimension, after training). The pre-training embeddings of real and fake news look similar, while the post-training embeddings differ clearly between real and fake news, showing that our model learns to separate the two classes.
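A heatmap like this can be produced with matplotlib. Below is a minimal sketch, assuming `graph_emb` is a [num_test_graphs, hidden_dim] tensor of graph-level embeddings collected from the pooling layer over the test set:

import matplotlib.pyplot as plt

plt.imshow(graph_emb.detach().cpu().numpy(), aspect='auto')
plt.xlabel('embedding dimension')
plt.ylabel('test graph (news piece)')
plt.colorbar()
plt.show()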

We also tuned the hyperparameters of the GCN model, focusing on the learning rate, batch size, and hidden dimension (a sketch of such a sweep follows the table below). The best model uses a learning rate of 0.001, a batch size of 64, and a hidden dimension of 128.

+---------------+------------+------------+---------------+---------+
| Learning Rate | Batch Size | Hidden Dim | Test Accuracy | Test F1 |
+---------------+------------+------------+---------------+---------+
| 0.01          | 128        | 128        | 0.8281        | 0.8273  |
| 0.001         | 128        | 128        | 0.8286        | 0.8387  |
| 0.001         | 64         | 128        | 0.8371        | 0.8386  |
| 0.001         | 64         | 64         | 0.8326        | 0.8230  |
+---------------+------------+------------+---------------+---------+

5. Discussion

5.1. Practical Implications

Our project shows that GNN models have high potential for quickly and correctly identifying fake news situated in a social network. This can be a helpful addition to traditional methods such as human fact-checkers or massive NLP models.

This discussion is more relevant than ever. With the rise of technologies such as GPT-4, as much as they open up a new level of artificial intelligence, the generation of information that is not entirely accurate, or that is fabricated with more nuance, can be automated and performed at large scale. In part, this can jeopardize efforts to stop fake news propagation. GNN models can play a role here by leveraging and targeting social networks, where misinformation does the most harm.

5.2. Design Choices

Fact checking for news is no simple task, so while our models achieve promising results with GNN, we would like to make note of some of the design choices we’ve made along the way.

First, if given more time, we would dive deeper into fine-tuning the structure of the graph neural networks we built, as well as the hyperparameters.

Second, GNN-DP may be a bit of overkill for a task like ours. Differentiable pooling, and hierarchical pooling in general, is a very expressive method that can give strong results, but some of our graphs are relatively small, shallow, or without clear local neighborhood boundaries. Nevertheless, for the larger news propagation networks found in the real world, GNN-DP may still be a good choice.

Last but not least, we converted our directed graphs into undirected ones during modeling. While GNN models were still able to learn from this, more specialized models that perform well on directed graphs could also be considered in future work.

6. Appendix

[1] https://github.com/safe-graph/GNN-FakeNews

[2] Dou, Y., Shu, K., Xia, C., Yu, P. S., & Sun, L. (2021, July). User preference-aware fake news detection. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 2051–2055).

[3] https://pytorch-geometric.readthedocs.io/en/latest/_modules/torch_geometric/datasets/upfd.html

[4] https://github.com/safe-graph/GNN-FakeNews/tree/main/gnn_model
