Can ChatGPT assist with computational text analysis of news articles? What I learned as someone without a coding background.

Abigail Chang
The Impact Architects
6 min read · Apr 24, 2024
Homepage of ChatGPT. “ChatGPT Homepage” by Harvici is marked with CC0 1.0.

Since the launch of ChatGPT in 2022, AI has captivated the public with its ability to answer questions (with varying degrees of accuracy), produce photorealistic images, solve complicated puzzles and equations, and pose new regulatory and legal challenges. Amid an already very saturated AI news environment, I’m here to share another story at the intersection of AI and journalism. This January, I set out to use ChatGPT to assist me in conducting computational text analysis on news articles from various U.S. outlets.

A few caveats — there are many ways to conduct computational text analysis, and I am by no means an expert. I have used Python notebooks to analyze tone, topics, collocation of certain words or phrases, and other article information for large datasets of news stories, but I don’t actually know how to write code. With this in mind, I was interested in whether I could use ChatGPT to bridge this gap. I thought it could be useful to explore whether journalists and researchers without a coding background could rely on the chatbot to advise them on the best ways to conduct certain computational text analysis tasks and then write the code for that analysis. Journalists who were interested in getting a sense of the tone of their reporting across certain topics or identifying instances of non-inclusive language in their reporting, for instance, could export the text of their stories and conduct their own basic analysis without needing to write their own code. I did some preliminary research and found that others had successfully used the tool for similar purposes, so I was optimistic about my prospects.

My dataset was composed of a few hundred articles about ChatGPT and education (provided to us by Mercury Public Affairs as part of a project we conducted in partnership with the Walton Family Foundation), with stories published from January 2023 through late September 2023. I then used ChatGPT 3.5 to write code for Python notebooks that allowed me to conduct rudimentary sentiment analysis (essentially giving each article a tone “score” indicating how positive or negative it was) and run keyword searches. Here are my major takeaways from this experiment:
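For readers who want a sense of what this kind of analysis can look like, here is a minimal sketch (not my exact code). It assumes the articles are already in a CSV with a text column and uses NLTK’s VADER lexicon as the scoring tool; the filename, column name, and keyword list are placeholders rather than the setup from my project.

```python
# Minimal sketch: score each article's tone and flag keyword matches.
# The filename, "text" column, and keywords are illustrative placeholders.
import pandas as pd
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of VADER's scoring lexicon

df = pd.read_csv("articles.csv")  # hypothetical file with a "text" column
sia = SentimentIntensityAnalyzer()

# VADER's compound score runs from -1 (most negative) to +1 (most positive).
df["tone_score"] = df["text"].fillna("").apply(
    lambda text: sia.polarity_scores(text)["compound"]
)

# Basic keyword search: flag articles that mention any term of interest.
keywords = ["cheating", "plagiarism", "homework"]  # illustrative keywords
df["has_keyword"] = df["text"].fillna("").str.lower().apply(
    lambda text: any(k in text for k in keywords)
)

print(df["tone_score"].describe())
print(df["has_keyword"].sum(), "articles contain at least one keyword")
```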

1. Having your article data formatted correctly and consistently is crucial for speeding up data cleaning and setting it up for further analysis.

The article data that I used for this experiment was originally gathered for a human-coded content analysis project Impact Architects was conducting. Metadata — such as the headline, byline, date of publication, and outlet name — and body text did not have clear markers at their start or end. ChatGPT struggled to produce code that could correctly identify each of these article components and write them to separate columns in a CSV file. It took hours of back and forth with ChatGPT, asking it to write regular expressions or code that inferred the start and end of an article from the presence of the word “by” at the start of the byline, before I finally had a usable CSV file of my data. Ensuring that your data is already formatted with clear markers for metadata and for the start and end of each article could save significant time on the front end. Some databases that allow you to export large quantities of news articles as plain text files (such as ProQuest) already include regular markers.
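To illustrate the kind of parsing I was asking for, here is a rough sketch. The delimiter, filename, and byline convention are all assumed for the example rather than taken from my actual dataset; real exports vary, which is exactly where I lost hours.

```python
# Rough sketch of the parsing task. Assumes each article in a plain-text
# export is separated by a delimiter line (a row of equal signs here, as a
# stand-in for whatever marker your export uses), starts with a headline,
# and has a byline line beginning with "By".
import csv
import re

with open("articles.txt", encoding="utf-8") as f:  # hypothetical filename
    raw = f.read()

articles = [a.strip() for a in raw.split("=" * 20) if a.strip()]

rows = []
for article in articles:
    headline = article.splitlines()[0].strip()
    # Treat the first line starting with "By" as the byline ...
    byline_match = re.search(r"^By\s+(.+)$", article, flags=re.MULTILINE)
    byline = byline_match.group(1).strip() if byline_match else ""
    # ... and everything after that line as the body text.
    body = article[byline_match.end():].strip() if byline_match else ""
    rows.append({"headline": headline, "byline": byline, "body": body})

with open("articles_clean.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["headline", "byline", "body"])
    writer.writeheader()
    writer.writerows(rows)
```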

2. ChatGPT might “hallucinate” when advising you about your computational text analysis options.

The chatbot often did not recognize when I was asking for something that was not possible; instead, it would keep producing broken or incomplete code in an attempt to do what I asked. This poses a particular challenge for someone like me who is using the chatbot to write code precisely because they can’t write it themselves: to diagnose certain problems, you sometimes need to know enough about coding to recognize what isn’t possible in a given situation. This issue arose frequently when I was searching for an out-of-the-box sentiment analysis tool that would not require significant additional coding to tailor it to my dataset, and when I was trying to figure out a way to test the accuracy of my sentiment analysis.

3. Some kinds of computational text analysis may require checks for accuracy that will be difficult to conduct when you have little to no previous experience.

Once I had cleaned my data, run sentiment analysis, and pulled out articles containing the keywords I was looking for, I wanted to check how much I could rely on the tone scores that had been calculated. There are many different tools and methods for conducting sentiment analysis, but a basic explanation of the rules-based approach I used is that it compares a lexicon of words with tone scores to the words in the dataset and calculates how positive or negative the dataset, or components of it, are. To get a sense of the reliability of my analysis, I tried to get ChatGPT either to write code that would show me how many of the words in my dataset appeared in the lexicons used to generate the sentiment scores, or to recommend other sentiment analysis tools I could use to check my results. A long back and forth produced no working options; however, someone with coding knowledge or more experience with computational text analysis tools might have known what to ask for to get the solution they needed. This is another area where a beginner like me might still run into challenges.
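For what it’s worth, the coverage check I was after is conceptually simple once you know what to ask for. Here is a sketch of one way to do it, assuming the VADER lexicon and a CSV with a body column (both assumptions for the example, not necessarily what I used):

```python
# Sketch of a lexicon-coverage check: what share of the word tokens in the
# articles appear in the sentiment lexicon doing the scoring? Assumes the
# VADER lexicon and a CSV with a "body" column (both illustrative choices).
import re
import pandas as pd
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")

df = pd.read_csv("articles.csv")                     # hypothetical filename
lexicon = set(SentimentIntensityAnalyzer().lexicon)  # words VADER can score

all_words = []
for text in df["body"].fillna(""):
    all_words.extend(re.findall(r"[a-z']+", text.lower()))

covered = sum(1 for w in all_words if w in lexicon)
print(f"{covered / len(all_words):.1%} of word tokens appear in the lexicon")
```

A low percentage isn’t automatically a red flag, since sentiment lexicons deliberately contain only opinion-bearing words, but knowing the number at least tells you how much of your text the scoring tool can actually “see.”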

What I took away from this experience is that I do think ChatGPT can be a useful, low-cost computational text analysis assistant if you either have some coding knowledge and know how to ask the chatbot for exactly what you need, or if you don’t have coding knowledge but are only interested in a more basic analysis. For journalists with no coding experience who want to know more about their reporting, the kind of analysis I experimented with might be sufficient.

As the potential use cases and dangers of AI have spun out in every direction, journalists have been quick to cover the issue — and to write about how AI could change the journalism industry itself. It seems like every day journalists are talking about new applications for AI in the newsroom, how it will speed up processes and allow for greater innovation, how it has thus far fallen short, or how it could spell the end for journalism as we know it. But ChatGPT also has the potential to be very useful for simpler computational text analysis applications — in my experiment, it worked well for identifying keywords and pulling out the list of articles containing them. It could help journalists and newsrooms identify themes and create “features” composed of sets of words, allowing them to better understand what topics they report on, which topics often appear together, and how coverage has shifted over time or following relevant events, without being limited to pre-existing backend article tags. If outlets have specific questions about content inclusivity, it could also be useful for finding instances of non-inclusive or inaccessible language, an effort we’ve identified as critical in our work with newsrooms in the past.
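As a sketch of what that “feature” approach could look like, assuming a CSV with body and date columns and purely illustrative topic names and word sets:

```python
# Illustrative sketch of keyword "features": count how often each topic's
# word set appears per month. Topic names, word sets, and column names are
# placeholders, not taken from the actual project.
import pandas as pd

features = {
    "cheating": {"cheating", "plagiarism", "academic integrity"},
    "classroom_use": {"lesson plan", "tutoring", "teacher"},
}

df = pd.read_csv("articles.csv", parse_dates=["date"])  # hypothetical columns
df["month"] = df["date"].dt.to_period("M")

for topic, terms in features.items():
    df[topic] = df["body"].str.lower().fillna("").apply(
        lambda text: any(term in text for term in terms)
    )

# Share of articles per month touching each topic.
print(df.groupby("month")[list(features)].mean())
```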

It’s also worth noting that I did this analysis back in January, and given the pace at which AI is changing, it’s likely already somewhat outdated. As AI tools improve, more complicated analysis may become increasingly accessible to journalists and media researchers without a coding or data science background.
