Published in Urban AI

Does sustainability lack sexiness?

Fig. 1 — I created lexicons for six urban topics by extracting the 150 most frequent words associated with each.

At least, that is what my lexical analysis of tweet content across 109 smart cities worldwide suggests. Using Natural Language Processing (NLP), the branch of Artificial Intelligence concerned with the linguistic side of Human-Computer Interaction, I found that the sustainability lexicon is by far the least used among urban studies topics such as infrastructure, governance, or entrepreneurship.

In Democracy Studio, I share a complete analysis of spontaneous communications by Twitter users who mention, in a hashtag, one of the 109 smart cities listed in the Smart City Index 2020. The original datasets comprise 110,862 tweets gathered from the four corners of the globe, totaling 19,184,388 words associated with the name of a smart city. From very basic lexical features, such as the number of words in a tweet or the average sentence length, to more advanced ones, such as the polarity of the sentiment expressed, NLP techniques allow me to point out some interesting correlations between the 29 variables collected from the tweets. For example, the average number of stopwords is highly correlated with all sentiment scores, which means that the more stopwords a tweet contains, the more it tends to express an opinion. Similarly, the average numbers of numerics, hashtags, and punctuation marks are correlated with the average tweet length and the average number of words, but none of them is related to the intensity of the sentiment expressed.
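To make the "basic lexical features" concrete, here is a minimal sketch of how a few of them could be counted for a single tweet. It is an illustration under stated assumptions, not the pipeline from the book: the function name `lexical_features`, the tiny `STOPWORDS` set, and the sample tweet are all made up for the example, and sentiment polarity is left out because it would require a dedicated library.

```python
import string

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "is", "in", "of", "to", "and", "for", "on"}


def lexical_features(tweet: str) -> dict:
    """Return a few simple counts for one tweet (hypothetical helper)."""
    words = tweet.split()  # naive whitespace tokenization
    return {
        "n_chars": len(tweet),
        "n_words": len(words),
        "n_hashtags": sum(w.startswith("#") for w in words),
        "n_numerics": sum(w.isdigit() for w in words),
        "n_stopwords": sum(w.lower() in STOPWORDS for w in words),
        # '#' counts as punctuation here, so hashtags add to this total.
        "n_punctuation": sum(ch in string.punctuation for ch in tweet),
    }


# Made-up sample tweet, not taken from the dataset.
features = lexical_features(
    "Oslo is building 3 new tram lines for the city! #Oslo #mobility"
)
print(features)
```

Averaging such counts per city would give variables of the kind that are then compared with sentiment scores.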

After stripping the tweets of all punctuation, numerics, emojis, and stopwords, I ended up with clean lists of meaningful tokens that capture the semantics used in each of the 109 smart cities. I then set out to evaluate how much urban studies vocabulary Twitter users employ in their online communication. To do so, I distilled the most frequent words associated with smart-grid, IoT, urban planning, urban development, innovation, gov-tech, open-data, e-citizenship, empowerment, transportation, mobility, environment, energy, democracy2.0, policy, economy, and business. The resulting word lists form thematic lexicons. They are commonly called Bags-of-Words (BoW): texts of various lengths are each represented as a bag of their own words, which can then serve as a reference for document classification or topic modeling of other texts.
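The cleaning and lexicon-building steps can be sketched in a few lines of Python. This is a simplified illustration, assuming the tweets for one topic have already been gathered: `clean_tokens`, `build_lexicon`, the short `STOPWORDS` set, and the sample tweets are hypothetical, and the regular expression keeps only unaccented Latin letters, which would split hyphenated terms such as e-citizenship.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "is", "in", "of", "to", "and", "for", "on"}


def clean_tokens(tweet: str) -> list:
    """Lowercase a tweet and keep only meaningful alphabetic tokens."""
    # Runs of letters only: punctuation, numerics, and emojis are dropped.
    tokens = re.findall(r"[a-z]+", tweet.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_lexicon(tweets: list, size: int = 150) -> list:
    """Return the `size` most frequent cleaned tokens across the tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(clean_tokens(tweet))
    return [word for word, _ in counts.most_common(size)]


# Made-up sample tweets, not taken from the dataset.
sample_tweets = [
    "Recycling is the future of energy! ♻️ #recycling",
    "Clean energy and recycling for 2030",
    "Energy storage matters",
]
lexicon = build_lexicon(sample_tweets, size=2)
print(lexicon)
```

With `size=150`, the same function would return a list of the length used for each lexicon in this article.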

Finally, I combined these word lists into six BoWs, each containing the 150 most frequently used words of one of the following themes: smart city, civic tech, infrastructure, sustainability, governance, and entrepreneurship. I avoided overlaps between BoWs by assigning cross-field words to a single category, so that the same word is never counted several times. Indeed, my method for evaluating the predominance of a given topic on Twitter across the world's smart cities consists of weighting each Bag-of-Words in each city, that is, counting the occurrences of the words belonging to each theme. In a way, this technique passes the 19 million collected words through the different strainers and checks the weight of each BoW at the end, to see how much each urban studies topic has been discussed online. I was quite surprised to find that sustainability is by far the least discussed of my six topics (see Fig. 2).

Fig. 2 — Weight of topical lexicons used on Twitter, following the Bag-of-Words technique.
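The weighting idea, counting how many tokens fall into each thematic lexicon, can be sketched as follows. This is a toy illustration rather than the code behind Fig. 2: `bow_weights` is a hypothetical helper, the lexicons hold only a handful of the example words quoted in this article, the token list is invented, and the "first topic listed wins" rule for shared words is an assumption standing in for the manual assignment of cross-field words.

```python
def bow_weights(tokens: list, lexicons: dict) -> dict:
    """Count how many tokens belong to each topical lexicon."""
    # Map every lexicon word to exactly one topic so that a word shared by
    # two lexicons is never counted twice (assumption: first topic wins).
    word_to_topic = {}
    for topic, words in lexicons.items():
        for word in words:
            word_to_topic.setdefault(word, topic)

    weights = {topic: 0 for topic in lexicons}
    for token in tokens:
        topic = word_to_topic.get(token)
        if topic is not None:  # tokens outside every lexicon are ignored
            weights[topic] += 1
    return weights


# Toy lexicons; "energy" is deliberately listed in both to show the rule.
lexicons = {
    "sustainability": ["resource", "recycling", "resilience", "biodiversity", "energy"],
    "infrastructure": ["supply", "system", "storage", "mobility", "energy"],
}

# Invented tokens standing in for one city's cleaned tweets.
tokens = ["supply", "system", "recycling", "mobility", "storage",
          "tram", "resilience", "system", "energy"]

weights = bow_weights(tokens, lexicons)
print(weights)
```

Repeated occurrences count every time ("system" appears twice), which is what makes the result a weight rather than a simple presence check.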

For a few years, I had the feeling that environmental issues had become the number one priority in global policies, and that smart cities were among the few front-runners tackling them, communicating massively on green solutions with the support of institutional marketing forces. Yet the sustainability vocabulary, which contains words like resource, recycling, resilience, or biodiversity, is used only 27% as often as infrastructure words like supply, system, storage, or mobility. I would never have bet on that result, especially on a platform like Twitter, which carries the spontaneous communications of official media, politicians, and ordinary inhabitants alike. Online messaging is most often motivated by social reward, measured in the clicks a publication receives, and this has led to the drama of our times: dumb content and fake news often get more clicks than insightful content. So I wonder: why does sustainability lack sexiness?

I haven’t found the answer so far, but I took a deeper look at this early observation. Does it hold for every city taken individually? Are there geographical areas in the world where sustainability is more discussed than infrastructure or any other topic?

Taking the average share of each topic in each city, obtained by dividing the weight of each BoW by the number of tweets collected in that city, I can first confirm that the sustainability vocabulary is also less used than the others at the scale of individual cities. The entrepreneurship lexicon represents 5.11% of all words used in tweets in Singapore; the governance and civic technology lexicons represent 5.08% and 4.11% respectively in Hong Kong; and the infrastructure and smart-city lexicons represent 4.01% and 3.9% respectively in Shenzhen. By contrast, the highest proportion reached by the sustainability lexicon is 1.77%, in Abu Dhabi. The map below shows the regional distribution of the proportion of sustainability vocabulary used on Twitter (see Fig. 3). It clearly shows that the smart cities where Twitter users communicate the most on sustainability, proportionally, are, in decreasing order: Abu Dhabi in the United Arab Emirates; Hangzhou, Chongqing, Shanghai, and Nanjing in China; Singapore; Oslo in Norway; Geneva in Switzerland; Madrid in Spain; New Delhi in India; and Gothenburg in Sweden.

Fig. 3 — Regional distribution of the proportion of the sustainability lexicon. Intense red marks the highest proportions, shaded blue the lowest.
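The per-city normalization behind such a comparison can be sketched as below. This is a hedged illustration, not the notebook from the book: `topic_shares` and `top_city` are hypothetical helpers, the city names and numbers are invented, and the denominator is left as a parameter because it could be either the number of tweets or the number of words collected in a city.

```python
def topic_shares(weights: dict, denominator: int) -> dict:
    """Express each topic weight as a percentage of `denominator`."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return {topic: 100.0 * w / denominator for topic, w in weights.items()}


def top_city(shares_by_city: dict, topic: str) -> str:
    """Return the city with the highest share for the given topic."""
    return max(shares_by_city, key=lambda city: shares_by_city[city][topic])


# Invented weights and totals for two fictional cities.
raw = {
    "city_a": ({"sustainability": 2, "infrastructure": 5}, 100),
    "city_b": ({"sustainability": 3, "infrastructure": 4}, 200),
}

shares_by_city = {
    city: topic_shares(weights, total) for city, (weights, total) in raw.items()
}
print(shares_by_city)
print(top_city(shares_by_city, "sustainability"))
```

In this toy example city_b has the larger raw sustainability count but the smaller share, which is why cities are ranked on proportions rather than on raw weights.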

All the methods and tools needed to reproduce this lexical analysis of tweets are detailed in the book Democracy Studio, which comes with online resources, including video tutorials and notebooks of code accessible to programming newbies.

Further collaboration between Democracy Studio and UrbanAI is envisioned for the coming months, in the form of a working group on shared research topics. Follow us on social networks to stay tuned!



Julien Carbonnell

Urban Developer & Scientist, founder @LandMinis3 // Machine Learning, Tokenized Real Estate