Impact of the Linguistic Choice of Words in News Articles on Our Society

Harish Gandhi
5 min read · May 22, 2018

Both good and bad things happen all around the world, and we learn only about those that are reported to us. Reporting them is the primary responsibility of the media. But the bigger responsibility of media houses lies in how they present that content to people.

A responsible media house’s content should be original, unbiased, and free of exaggeration, and it should handle the emotions of its readers and viewers with care. The same story can be told in different ways, and each telling can trigger different emotions in its readers.

It is often said that what we say and what we read shape who we become. Reading a story filled with positive words tends to make us feel more positive, and vice versa. So the wording of a piece of content plays as important a role as the content itself.

This project aims to find out how much importance some of the major media houses in the USA give to the wording of their content. The answer would allow readers to wisely choose a daily news source that truly cares about its readers.

For detailed information check out the ‘Detailed Research resources’ section.

Assumptions/Target Audience:

  1. Our target audience is exposed to ALL the articles published on the home page.
  2. Data has been scraped from all the sources at the same time (since the home pages are updated regularly).
  3. Only the USA news web market is considered for this research.
  4. CNN, Fox News, The New York Times, The Huffington Post, and Reuters are the top news websites considered, based on the unique visitor counts obtained from the research.
  5. Our sample considers only the articles published on these websites at 10am (CST).

1) Data Extraction/Preparation Phase:

The data is collected through a script that uses the Newspaper3K API. The script is designed to collect all the articles published at 10am (CST) on the home pages of the above-mentioned news sites. Here is a sample image of a few articles published on Reuters.com on 10/17/2017 at 10am.

I then pipelined this raw text into CSV format, segregated into columns (as shown below) for easy exploration.

The CSV file has the following columns:

  • TITLE: the Title of the article.
  • SUMMARY: first few lines of the article’s text.
  • TEXT: full text of the article.
  • URL: web link to the article.
  • KEYWORDS: important words in the article.
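
The extraction step could be sketched roughly as follows. This is a minimal illustration and not the script used for this project: the function names `fetch_articles`, `write_csv`, and `demo_csv` are hypothetical, and the sketch assumes the `newspaper3k` package (plus the NLTK data its `nlp()` step relies on) is installed. Filling SUMMARY and KEYWORDS from newspaper3k's `summary` and `keywords` attributes is also an assumption about how those columns were produced.

```python
import csv
import io

COLUMNS = ["TITLE", "SUMMARY", "TEXT", "URL", "KEYWORDS"]


def fetch_articles(homepage_url, limit=None):
    """Collect articles linked from a news homepage (requires network access)."""
    import newspaper  # third-party: pip install newspaper3k

    # memoize_articles=False asks newspaper3k not to skip articles seen on earlier runs.
    source = newspaper.build(homepage_url, memoize_articles=False)
    rows = []
    for article in source.articles[:limit]:
        try:
            article.download()
            article.parse()
            article.nlp()  # fills in .summary and .keywords
        except Exception:
            continue  # skip articles that fail to download or parse
        rows.append({
            "TITLE": article.title,
            "SUMMARY": article.summary,
            "TEXT": article.text,
            "URL": article.url,
            "KEYWORDS": ";".join(article.keywords),
        })
    return rows


def write_csv(rows, handle):
    """Write article rows to an open text handle using the columns listed above."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


def demo_csv():
    """Run the CSV step on a made-up row, so it can be tried without network access."""
    sample = [{
        "TITLE": "Example headline",
        "SUMMARY": "First lines.",
        "TEXT": "Full article text.",
        "URL": "https://example.com/a",
        "KEYWORDS": "example;headline",
    }]
    buffer = io.StringIO()
    write_csv(sample, buffer)
    return buffer.getvalue()
```

Keeping the network-dependent fetch separate from the CSV writer means the file format can be checked on its own, for example with `print(demo_csv())`.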

Note also that not every article published on a site’s home page comes from its own editors. For instance, a Reuters article is shared on the home page of HuffingtonPost.com.

2) Preprocessing/Cleaning Phase:

My concern is to analyze only the textual content of the articles. Thus, only the text data (from the TEXT column of the CSV file) is tokenized.

A major issue with this large volume of content is that most of it is not relevant to our analysis. So we apply language preprocessing and then store the tokenized vocabulary in a JSON file, which gives the analysis phase faster access to only the relevant terms.
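
One way the preprocessing and JSON storage might look is sketched below. It is only an illustration under stated assumptions: the stopword list is a tiny hypothetical one, the tokenizer is a simple regular expression rather than whatever tokenizer the project actually used, and the function names are made up for this example.

```python
import json
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); a real run would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "for"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, optional apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def build_vocabulary(texts):
    """Count the non-stopword terms across all article texts."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in tokenize(text) if t not in STOPWORDS)
    return counts


def vocabulary_to_json(counts):
    """Serialize term counts so the analysis phase can load them quickly."""
    return json.dumps(dict(counts), sort_keys=True)


vocab = build_vocabulary(["The storm is a disaster.", "Relief after the storm!"])
stored = vocabulary_to_json(vocab)
```

Storing counts per term, rather than the raw token stream, keeps the file small and lets the analysis look up any term directly.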

3) Analysis/Model Building Phase:

Let’s check the distribution of negative words (words that have a negative connotation), as shown below. The media house that uses the fewest of these negative words is Fox News, followed by The New York Times. They deliver their content in a more optimistic way than their counterparts. The net negative score is calculated using the equation:

$$\text{Net Negative Score} = \sum \left(\text{Negative terms}_{\text{per media}} \times \text{Sentiment score}\right)$$
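
As a rough sketch of this score, the snippet below multiplies each negative term's count by its sentiment score and sums the results. This is one reading of the equation, not necessarily the exact computation used here, and the lexicon values and term counts are invented purely for illustration.

```python
def net_negative_score(term_counts, sentiment_scores):
    """Sum count x score over the terms whose sentiment score is negative."""
    total = 0.0
    for term, count in term_counts.items():
        score = sentiment_scores.get(term, 0.0)  # unknown terms count as neutral
        if score < 0:
            total += count * score
    return total


# Hypothetical sentiment lexicon and per-outlet term counts (illustration only).
lexicon = {"disaster": -2.0, "attack": -3.0, "relief": 1.5}
counts = {"disaster": 2, "attack": 1, "relief": 4, "storm": 3}

negative_score = net_negative_score(counts, lexicon)
print(negative_score)  # 2 * -2.0 + 1 * -3.0 = -7.0
```

A score closer to zero means the outlet leaned less on negatively scored words in the sampled articles.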

However, to make the comparison fairer, we also need to look at the full vocabulary of the articles, which includes both positive and negative words. It turns out that the Fox News articles contain MORE text than those of The New York Times. So, for a fair analysis, we account for this through normalization. Thus a normalized score is introduced!

The normalized score is the ratio of the net sentiment score of all articles to the total number of terms used across all the articles in a day (computed separately for each media house).

$$\text{Net Normalized Score} = \frac{\sum \left(\text{terms}_{\text{per day}} \times \text{Sentiment score}\right)}{\text{Total Number of terms}}$$
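
To see why normalization matters, here is a small sketch with made-up numbers. Two hypothetical outlets use exactly the same sentiment-bearing words, but one publishes twice as much text overall. The lexicon, the counts, and the function name are assumptions for illustration and not the project's data.

```python
def net_normalized_score(term_counts, sentiment_scores):
    """Net sentiment (count x score over all terms) divided by the total term count."""
    total_terms = sum(term_counts.values())
    if total_terms == 0:
        return 0.0  # no text collected, so there is nothing to score
    weighted = sum(
        count * sentiment_scores.get(term, 0.0)  # unknown terms count as neutral
        for term, count in term_counts.items()
    )
    return weighted / total_terms


# Hypothetical lexicon and one day of term counts for two outlets.
lexicon = {"disaster": -2.0, "relief": 1.0}
outlet_a = {"disaster": 2, "relief": 2, "storm": 6}   # 10 terms in total
outlet_b = {"disaster": 2, "relief": 2, "storm": 16}  # 20 terms in total

score_a = net_normalized_score(outlet_a, lexicon)  # -2.0 / 10
score_b = net_normalized_score(outlet_b, lexicon)  # -2.0 / 20
```

Both outlets have the same net sentiment (-2.0), yet the one with more text ends up with a normalized score half as large in magnitude, which is the length effect the normalization is meant to account for.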

Conclusion:

As seen from the above plot, we can infer that The New York Times not only conveys the news but does so in a healthy way (comparatively more optimistic). Thus I recommend The New York Times for the specific target audience of web users who just want a good taste of the daily news.

"People like to think they're objective and making decisions based on numbers," Dr. Lera Boroditsky said. "They want to believe they're logical. But they're really being swayed by metaphors."

Links:

  1. How the words we use affect the way we think.
  2. According to new research by Stanford psychologists, your thinking can be swayed by just one word.
  3. The Law of Attraction, a well-known concept popularized by Rhonda Byrne in her book The Secret, says that we become who we are by what we say!
  4. Lera Boroditsky: How language shapes the way we think.

Thanks for reading!

If you liked this article keep sharing with your community and consider exploring what happens when you click the clap icon more than once 👏

Reach out to me on https://twitter.com/Harishaaram91 if you want me to give a lecture/talk on this project.

Final Remarks: All the data collected and used is openly accessible to anyone under this License.

Harish Gandhi

I am a CS graduate with expertise in data exploration through software and machine learning techniques.