Uncovering Environmental, Social & Governance Themes in Tweets: A Co-occurrence Network Analysis

Raphael Apeaning
Published in Data And Beyond · 5 min read · Apr 13, 2023


In today’s investment landscape, Environmental, Social, and Governance (ESG) criteria have emerged as a significant consideration for investors aiming to align their activities with sustainability and ethical values. Companies that prioritize ESG principles are often seen as more sustainable, ethical, and forward-thinking, which can lead to improved reputation, stakeholder engagement, and financial performance. The ESG framework is a multidimensional and dynamic topic that encompasses a wide array of concepts and practices related to sustainability, responsible investing, and corporate accountability. To learn more about ESG, refer to the primer below.

This blog offers a step-by-step guide to performing a co-occurrence network analysis of ESG-related hashtags in Python. The purpose of the analysis is to visualize the significant themes present in ESG tweets by mapping the connections between commonly co-occurring hashtags.

1. Harvesting Twitter Data

The first step of the process involves harvesting ESG-related tweets using the snscrape library. My initial inspection of #ESG revealed that the hashtag conveys a wealth of information through the other hashtags it co-occurs with. The code below scrapes a total of 150,000 tweets using #ESG as the search term and keeps the associated hashtags, as highlighted in the example #ESG tweet shown in the figure below.

import itertools
import pandas as pd
import snscrape.modules.twitter as sntwitter

# Scrape up to 150,000 English #ESG tweets posted between 2022-07-31 and 2023-03-31
scraper = sntwitter.TwitterSearchScraper('#ESG since:2022-07-31 until:2023-03-31 lang:en')
df = pd.DataFrame(itertools.islice(scraper.get_items(), 150000))
tweet_df = df['hashtags']
Examples of hashtags associated with #ESG

2. Preprocessing ESG Tweets

In the next step, I preprocessed the hashtags using the clean_hashtags function shown in the code snippet below. This function tokenizes the hashtags and stems each token to its root form, which reduces redundancy and strengthens the connections between related words. It is worth noting that stemming is not always accurate and can sometimes produce incorrect or inconsistent results, but in this case it improved the analysis.

import re
import nltk
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def clean_hashtags(tweet):
    tweet = str(tweet)
    tweet = re.sub(r"[^\w\s]", "", tweet)        # Remove punctuation
    tweet = tweet.lower()                         # Convert to lowercase
    tweet = re.sub(r"\s+", " ", tweet).strip()    # Remove extra whitespace
    tweet_list = []
    stop_words = nltk.corpus.stopwords.words('english')
    stop_words.extend(['esg', 'amp'])             # Drop the search term itself and the HTML artifact 'amp'
    for token in re.split(r'\W+', tweet):
        if token not in stop_words:
            tweet_list.append(stemmer.stem(token))
    return tweet_list
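
Assuming tweet_df is the hashtags column extracted in step 1, the cleaning function can be applied row by row:

# Each row becomes a list of cleaned, stemmed tokens
tweet_df = tweet_df.apply(clean_hashtags)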

3. Building Co-occurrence Table

The pair_words function is used to create the co-occurrence table. It iterates over the list of hashtags in each tweet, forms every unique pair, and aggregates the frequency of each pair.

def pair_words(tweet):
    word_list = tweet.tolist()
    pair_words = []
    for words in word_list:
        words_ = list(set(words))  # Deduplicate hashtags within a tweet
        for i in range(len(words_) - 1):
            for j in range(i + 1, len(words_)):
                word_i = words_[i]
                word_j = words_[j]
                # Order each pair alphabetically so (a, b) and (b, a)
                # are counted as the same pair
                if word_i < word_j:
                    pair_words.append([word_i, word_j])
                else:
                    pair_words.append([word_j, word_i])
    pair_words_df = pd.DataFrame(data=pair_words, columns=['Source', 'Target'])
    pair_words_df = pair_words_df.groupby(['Source', 'Target']).size().sort_values().reset_index()
    pair_words_df = pair_words_df.rename(columns={0: 'Weight'})
    return pair_words_df
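
Building the table is then a single call; since the table is sorted by weight in ascending order, its tail holds the most frequent pairs:

pair_words_df = pair_words(tweet_df)
print(pair_words_df.tail())  # the most frequent co-occurring pairs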

4a. Exploring the Network Features

Before creating the co-occurrence network, it is important to explore the attributes of the graph. To this end, I used the NetworkX library to create a network graph object. The output shows that the graph consists of 36,453 nodes and 361,749 edges. The reported network density is close to zero (roughly 0.0005), which implies that the graph is very sparse.
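
A minimal sketch of this exploration step, assuming pair_words_df is the co-occurrence table from step 3:

import networkx as nx

# Build an undirected graph from the co-occurrence table,
# keeping the pair frequency as an edge attribute
G = nx.from_pandas_edgelist(pair_words_df, source='Source', target='Target',
                            edge_attr='Weight', create_using=nx.Graph())
print(f"Nodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")
print(f"Density: {nx.density(G):.4f}")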

The figure below shows the top 10 most connected hashtags as measured by degree centrality (a sketch of the computation follows the figure caption). By far, the stemmed term “sustain” is the most connected hashtag, and hashtags related to investment, climate change, net zero, and the SDGs also show substantial connections.

Top 10 hashtags by degree centrality (Source: Author)
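
The ranking can be computed with NetworkX, assuming the graph G built above:

# Degree centrality: fraction of all other nodes each hashtag is connected to
degree_centrality = nx.degree_centrality(G)
top_10 = sorted(degree_centrality.items(), key=lambda kv: kv[1], reverse=True)[:10]
for hashtag, score in top_10:
    print(f"{hashtag}: {score:.4f}")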

4b. Building the Co-occurrence Network Graph

I used the community_network_viz function to create the network of co-occurring hashtags. Because the graph is so sparse, the function takes a threshold K, which filters the co-occurrence table down to pairs appearing more than K times.

import networkx as nx
import community as community_louvain  # python-louvain package
from pyvis.network import Network
from IPython.display import display, HTML

def community_network_viz(df, K):
    # Keep only pairs that co-occur more than K times
    df = df.loc[df['Weight'] > K]
    max_ = max(df['Weight'])
    G = nx.from_pandas_edgelist(df, source="Source", target="Target",
                                edge_attr="Weight", create_using=nx.Graph())
    # Louvain community detection: assign each node to a cluster ('group')
    partition = community_louvain.best_partition(G)
    nx.set_node_attributes(G, partition, 'group')
    # Normalize edge weights to [0, 1]
    for _, row in df.iterrows():
        G.add_edge(row['Source'], row['Target'], weight=row['Weight'] / max_)
    # Size each node by its degree
    d = dict(G.degree)
    nx.set_node_attributes(G, d, 'size')
    net_com = Network(height="600px",
                      width="75%",
                      directed=False,
                      notebook=True,
                      neighborhood_highlight=True,
                      select_menu=True,
                      bgcolor="#36454F",
                      font_color='white',
                      layout=None,
                      cdn_resources="remote")
    net_com.repulsion()
    net_com.set_options("""var options = {"edges": {"color": {"inherit": true},"font": {"size": 50,"strokeWidth": 3},"scaling": {"max": 14},"smooth": false},"interaction": {"tooltipDelay": 100}}""")
    net_com.from_nx(G)
    net_com.save_graph('ESG_com.html')
    net_com.show('ESG_com.html')
    return display(HTML('ESG_com.html'))
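
For example, the visualization could be generated with a call like the one below; the threshold K=30 is an arbitrary illustration, not necessarily the value used for the published graph:

# Keep only hashtag pairs that co-occur more than 30 times (illustrative threshold)
community_network_viz(pair_words_df, K=30)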

The function sizes each node by its degree, scales edge widths by co-occurrence frequency, and applies Louvain community detection to extract clusters within the network. The resulting co-occurrence network is shown below.

ESG tweet network analysis: co-occurring ESG hashtags (Source: Author)

The interactive version of the network graph can be accessed using the link below.

5. Important Takeaways and Concluding Remarks

A thorough investigation of the interactive network reveals the following notable insights.

a. ESG is primarily connected to sustainability themes, including climate action, the circular economy, and technological trends (such as the deployment of electric vehicles and renewable technologies).

b. The Social and Environmental dimensions of ESG are closely connected (i.e., both themes belong to the same community). Notably, both themes are linked to the global sustainable development agenda (i.e., agenda2030).

c. The ESG trend also conveys negative sentiment through hashtags such as “greenwashing”, “invasionusa”, “americanisfallen”, “climatescam”, and many more. These negative connotations are mainly connected to the social and environmental dimensions.

d. The Governance dimension of ESG primarily relates to organizational accountability and leadership themes, such as “audit committees”, “board effect”, “C-suite”, and many more.

This blog discussed how to create a co-occurrence network graph from tweets to gain instructive insights about ESG (Environmental, Social, and Governance) themes. It is worth noting that the insights may be limited by the sample of data used. The link below provides the complete code for reference, and I welcome comments and suggestions. Also, if you find the blog informative, please clap. Thank you for reading!

References:

  1. https://programminghistorian.org/en/lessons/exploring-and-analyzing-network-data-with-python
  2. https://towardsdatascience.com/using-network-science-to-explore-hashtag-culture-on-instagram-1f7917078e0
  3. https://www.lexology.com/library/detail.aspx?g=0da9621f-99a4-4f9a-bc32-bf4df9c33733
