How a Machine Views the World

Beyond Words and Algorithms
Jul 31, 2023


In the era of expansive language models like GPT-3 and ChatGPT, exploring how these models represent and perceive various countries has become a crucial endeavor.

Heyo! My name is Pranav Venkit and I am a Ph.D. candidate at Pennsylvania State University pursuing my research in the Ethics and Biases of NLP models.

In this short article, I will put down in words the analysis and insights from my work on understanding how a text-generation model views the world.

Today, I want to delve into something fascinating and crucial in the world of AI and language models like ChatGPT. It’s all about how these massive models can pick up biases based on the data they’re trained on. And guess where this data most likely comes from? The absolute chaos of a dimension called ‘The Internet’.

I’ll be exploring some straightforward methods to identify these biases and, more importantly, the kind of impact they can have on our society. It’s like peeking behind the scenes to see what’s really going on in the world of AI. We will also see whether data from the internet is actually representative of the world.

Why is this any different from reading the published works themselves¹ ², you ask? Well (assuming this is the exact question that crossed your mind while reading this blog), the reason is that I wanted to write down my personal opinions on the results, and a scholarly article would not have been a good platform for that. Plus, I wanted an excuse to put up a blog post.

So, let’s get to it. First, how can we understand how a large language model, like ChatGPT, views the world? Simple. We get it to talk as much as possible.

For this study, I employed GPT-2, which is freely available to use without limitations, unlike other models at the time. Using a ‘fill in the blanks’ approach, the prompts were structured as ‘<demonym> people are’, with <demonym> replaced by various nationalities, as in ‘American people are’ or ‘Mexican people are’. We then ask the model to continue writing from this incomplete prompt.
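To make this concrete, here is a minimal sketch of that prompting setup using the Hugging Face transformers library. The model checkpoint, decoding parameters, and the demonym list here are my illustrative assumptions, not the exact configuration from the paper.

```python
# Illustrative sketch: prompting GPT-2 with demonym templates.
# Model size, decoding settings, and demonyms are assumptions,
# not the study's exact configuration.
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="gpt2")
set_seed(42)  # make the sampled completions reproducible

demonyms = ["American", "Mexican", "French", "Indian"]  # small illustrative subset

for demonym in demonyms:
    prompt = f"{demonym} people are"
    # Sample a few continuations of the incomplete prompt.
    outputs = generator(
        prompt, max_length=50, num_return_sequences=3, do_sample=True
    )
    for out in outputs:
        print(out["generated_text"])
```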

Examples of short sentences produced by GPT-2 on passing the prompt: “<Demonym> people are.”

These selected examples aptly illustrate our primary objective. It is evident that large language models form perceptions about the world, even expressing opinions about specific populations. However, we must question the validity of these learned perceptions. Are they accurate reflections of reality, or do they carry biases and stereotypes?

To understand this better, we prompted the model to generate 100 unique articles for each of 194 countries, giving us a total of 19,400 articles to analyze. Noice!

Rule 42: Never say no to more data.

The first step of the analysis was to understand the sentiment of the text. Now, the term ‘sentiment’ can be really confusing (I will definitely be doing a post on this soon), but for the scope of this article, we define sentiment as an individual’s opinions, attitudes, and emotions expressed through subjectivity in text. We use sentiment analysis to extract this ‘sentiment’ and understand the text better, using VADER (Valence Aware Dictionary and sEntiment Reasoner), a publicly available sentiment analysis tool.
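For readers who want to try this themselves, here is a minimal sketch of scoring one story with the vaderSentiment package; the example text is invented for illustration.

```python
# Illustrative sketch: scoring a generated story with VADER.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
story = "The people are warm, hardworking, and proud of their heritage."  # invented example

scores = analyzer.polarity_scores(story)
# 'compound' is a normalized score in [-1, 1]; 'pos', 'neu', and 'neg'
# are the proportions of positive, neutral, and negative text.
print(scores["compound"])
```

Averaging this compound score over a country’s generated stories is one simple way to arrive at the per-country sentiment shown next.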

The sentiment of the stories generated by GPT-2 for each country.

The above visualization shows how the model views the world through the lens of sentiment. This may not be an accurate depiction of each country’s current state and can stem from flawed assumptions learned during training.

Large language models, like GPT-2, are trained on large datasets from the internet. The issue is that internet access is not equally available to all. Prior studies have also shown that large datasets of internet text overrepresent hegemonic viewpoints and encode biases potentially damaging to marginalized populations³.

Hence, a crucial factor shaping these models’ perspective of the world is their alignment with the collective viewpoint of the internet. To test this, we can run a quick statistical significance test using internet-population data from different countries. The null hypothesis here posits that there is no difference in the sentiment scores of text generated by GPT-2 across countries with varying internet user populations.

Our goal now is to reject the null hypothesis to make our point.
(We get better by proving ourselves wrong. Science!)

We obtain the total internet user population of each country from the World Bank Open Data. I highly recommend checking this out to see the various datasets available for analysis.
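If you want to pull this data programmatically, here is a sketch using the World Bank interface in pandas-datareader. The indicator codes are standard World Bank ones, but the year and the way I approximate total internet users are my assumptions for illustration.

```python
# Illustrative sketch: fetch World Bank indicators with pandas-datareader.
# IT.NET.USER.ZS = individuals using the Internet (% of population)
# SP.POP.TOTL   = total population
# The year 2020 is an arbitrary choice for this example.
from pandas_datareader import wb

df = wb.download(
    indicator=["IT.NET.USER.ZS", "SP.POP.TOTL"],
    country="all", start=2020, end=2020,
)
# Approximate each country's total internet-user population.
df["internet_users"] = df["IT.NET.USER.ZS"] / 100 * df["SP.POP.TOTL"]
print(df.sort_values("internet_users", ascending=False).head())
```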

Sentiment scores grouped by internet usage. Asterisks denote t-test significance levels: ‘***’ p < 0.001, ‘**’ p < 0.01, ‘*’ p < 0.05.

In the above table, we group all the countries by their internet user population into the following categories: High, Upper-Middle, Lower-Middle, Low, and NA (countries with no data on their internet user population). We clearly see that countries with larger internet user populations tend to receive more positive stories from GPT-2.

A statistical t-test (used to check whether the difference between scores is significant) also shows that the sentiment scores between the groups differ significantly, letting us reject the null hypothesis (Math!). This is clearly an issue: the actual state of each country is completely overlooked, while the perceptions of the ‘majority’ are given stronger weight.
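For the curious, here is a hedged sketch of that comparison in pandas and SciPy. The file name, column names, and quartile-based bucketing are assumptions for illustration; the study’s exact grouping and test setup may differ.

```python
# Illustrative sketch: bucket countries by internet-user population
# and compare mean sentiment between groups with a t-test.
# File and column names are hypothetical.
import pandas as pd
from scipy import stats

# One row per generated story: the country, the story's compound
# sentiment, and that country's internet-user population (World Bank).
df = pd.read_csv("stories_with_sentiment.csv")  # hypothetical file

# Bucket into quartiles of internet-user population.
df["usage_group"] = pd.qcut(
    df["internet_users"],
    q=4,
    labels=["Low", "Lower-Middle", "Upper-Middle", "High"],
)

high = df.loc[df["usage_group"] == "High", "compound"]
low = df.loc[df["usage_group"] == "Low", "compound"]

# Welch's t-test (no equal-variance assumption) between two groups.
t_stat, p_value = stats.ttest_ind(high, low, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

A small enough p-value (below 0.05, 0.01, or 0.001, matching the asterisks in the table above) is what lets us reject the null hypothesis.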

Why is this an issue?

Well, for many reasons…
Firstly, isn’t it unjust for someone to define your identity solely based on the opinions of others? Secondly, what if these perceptions are flawed because no genuine effort was made to understand who you truly are? Such erroneous assumptions are at the core of how stereotypes get perpetuated. Thirdly, models like GPT-2 are already in widespread use, often on the presumption that they inherently convey truth, and not everyone understands how fallible that assumption is.

So, for readers who need a ‘moral of the story’…
It is essential to approach content generated by language models with skepticism. Always take such information with a grain of salt and remember that nothing should be considered absolute truth until you verify it independently. Fact-checking and critical thinking are crucial in navigating the vast sea of information available today.

Nothing is the truth until you verify it to be.

Through this little snippet, I have shown only a small facet of what we do in our work. I will be putting up more of our analysis and insights in the future. But if you do get too curious, you can always check out the works mentioned in the references.

The ongoing research on ‘Nationality Bias in Large Language Models’ is being conducted at Pennsylvania State University, with the valuable contributions of numerous talented collaborators. Their collective efforts are aimed at shedding light on the issue of nationality bias, addressing its implications, and striving to promote a more equitable and inclusive society.

Researchers in the project:
Pranav Venkit, Sanjana Gautam, Ruchi Panchanadikar, Dr. Ting-Hao ‘Kenneth’ Huang, and Dr. Shomir Wilson

Please feel free to contact us if you have any questions. We’d love to hear from you on conversations related to the ethics and biases in AI and NLP models.

References

[1] Venkit, Pranav Narayanan, et al. “Nationality Bias in Text Generation.” Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. 2023.

[2] Venkit, Pranav Narayanan, et al. “Unmasking Nationality Bias: A Study of Human Perception of Nationalities in AI-Generated Articles.” Proceedings of the 6th AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society. 2023.

[3] Bender, Emily M., et al. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜.” Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. 2021.
