How to generate Word Clouds from multiple news sources (with Python code)

Jonathan C.T. Kuo
Published in Analytics Vidhya
2 min read · Jan 4, 2021

This tutorial has two parts: retrieving news information from the News API, and generating word clouds.

News API

News API is a simple, easy-to-use API that returns JSON metadata for headlines and articles currently being published across the web.

First, head to News API to get the API key.

Now, open your Jupyter notebook and start coding! We’ll need to first install the newsapi-python client library.

!pip install newsapi-python

Then, import the required libraries and create the API client. Paste your own API key between the quotes as shown below.

import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from newsapi import NewsApiClient

# Replace the placeholder with your own News API key.
newsapi = NewsApiClient(api_key='YOUR_API_KEY')

Data retrieval

sources = newsapi.get_sources()
sources_list = []
for source in sources['sources']:
    sources_list.append(source['id'])
print(len(sources_list))
sources_list

This code prints the number of available news sources and lists their IDs. Part of the result is shown below:

Now, you can specify the time range of the news articles that you intend to retrieve, for instance all news articles published over the past 30 days. You also need to specify the query string ("data science", for example). In my case, I only retrieve articles related to "data science" that were published over the past 30 days by the first 50 news agencies in the source list. All results are stored in the results array.
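The retrieval step described above could look roughly like the sketch below. It is only one possible version, not necessarily the original code: the function name fetch_articles_by_source and its parameters days_back and max_sources are hypothetical, and it assumes results is built as a list of (source_id, articles) pairs. The only library call used is get_everything from the newsapi-python client, with its q, sources and from_param arguments.

```python
from datetime import datetime, timedelta


def fetch_articles_by_source(client, source_ids, query, days_back=30, max_sources=50):
    """Return a list of (source_id, articles) pairs, one per news source.

    `client` is expected to be a NewsApiClient instance (hypothetical helper,
    not part of the newsapi-python library).
    """
    # Earliest publication date to include, formatted as YYYY-MM-DD.
    start_date = (datetime.now() - timedelta(days=days_back)).strftime("%Y-%m-%d")

    results = []
    for source_id in source_ids[:max_sources]:
        # One request per source, so each source gets its own set of articles.
        response = client.get_everything(
            q=query,
            sources=source_id,
            from_param=start_date,
        )
        results.append((source_id, response.get("articles", [])))
    return results


# Usage, with the client and source list created earlier:
# results = fetch_articles_by_source(newsapi, sources_list, "data science")
```

Sending one request per source keeps the articles grouped by agency, which is what the word cloud step needs. Note that how far back you can search depends on your News API plan.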

Word Cloud

Once we have our content data, we can start creating the word clouds. Here, we use the WordCloud library to create a single word cloud for each news agency. Since the results array stores 50 sets of news articles, the code generates 50 word clouds, one per agency.
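A minimal sketch of this step follows, under a few assumptions that are not in the original: results is taken to be a list of (source_id, articles) pairs, each article is a dict with the title and description fields that News API returns, and the helper names articles_to_text and plot_word_clouds are hypothetical. It needs the wordcloud and matplotlib packages (for example, !pip install wordcloud matplotlib).

```python
def articles_to_text(articles):
    """Join the title and description of each article into one string."""
    parts = []
    for article in articles:
        # Either field can be None in a News API response, so fall back to "".
        parts.append(article.get("title") or "")
        parts.append(article.get("description") or "")
    return " ".join(part for part in parts if part)


def plot_word_clouds(results):
    """Draw one word cloud per (source_id, articles) pair in results."""
    # Imported here so articles_to_text can be used without the plotting libraries.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud, STOPWORDS

    for source_id, articles in results:
        text = articles_to_text(articles)
        if not text.strip():
            continue  # skip sources with no text; WordCloud needs at least one word

        cloud = WordCloud(
            width=800,
            height=400,
            background_color="white",
            stopwords=STOPWORDS,  # drop common English words such as "the"
        ).generate(text)

        plt.figure(figsize=(10, 5))
        plt.imshow(cloud, interpolation="bilinear")
        plt.axis("off")
        plt.title(source_id)
        plt.show()


# Usage in the notebook:
# plot_word_clouds(results)
```

Sources that returned no articles for the query are skipped, so you may end up with fewer than 50 clouds.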

Some examples are shown below:

That concludes this tutorial on generating word clouds for multiple news sources. The same techniques can be applied to many other use cases. I hope this simple demonstration helps!
