How to generate Word Clouds from multiple news sources (with Python code)
This tutorial is divided into two parts: retrieving news articles from News API, and generating word clouds from them.
News API
News API is a simple, easy-to-use API that returns JSON metadata for headlines and articles published live across the web.
First, head to News API (newsapi.org) and register to get an API key.
Now, open a Jupyter notebook and start coding! We first need to install the newsapi-python client library.
!pip install newsapi-python
Then, import the required libraries and create the client. Paste your own API key in place of the placeholder string, as shown below.
import pandas as pd
import numpy as np
from datetime import date, datetime, timedelta
from newsapi import NewsApiClient

# Replace the placeholder with your own key; avoid sharing notebooks that contain it.
newsapi = NewsApiClient(api_key='YOUR_API_KEY')
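Pasting the key directly into a notebook makes it easy to leak when the notebook is shared. One alternative is to read it from an environment variable. This is a minimal sketch, not part of the original tutorial: the helper `load_api_key` and the variable name `NEWS_API_KEY` are hypothetical, so choose whatever name suits your setup.

```python
import os


def load_api_key(var_name="NEWS_API_KEY"):
    """Return the News API key stored in an environment variable.

    NEWS_API_KEY is a hypothetical name chosen for this example.
    """
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {var_name} environment variable to your News API key.")
    return key
```

With the variable set before starting Jupyter, the client can then be created with `newsapi = NewsApiClient(api_key=load_api_key())`.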
Data retrieval
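# Fetch metadata for every source News API knows about.
sources = newsapi.get_sources()

# Keep only the source IDs; these are what the article queries expect.
sources_list = []
for source in sources['sources']:
    sources_list.append(source['id'])

print(len(sources_list))
sources_list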
This code collects the IDs of all available news sources and prints how many there are. Part of the result is shown below:
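The newsapi-python `get_sources` method also accepts optional filters (`language`, `category`, `country`). If you only want sources in one language, for example so that every word cloud is in English, you could filter at this step. This is a sketch, not the author's code; the helper name `source_ids` is hypothetical, and `client` is expected to be the `NewsApiClient` created earlier.

```python
def source_ids(client, language=None):
    """Return the IDs of all News API sources, optionally filtered by language.

    `client` is assumed to be a NewsApiClient; language is a code such as "en".
    """
    response = client.get_sources(language=language)
    return [source["id"] for source in response["sources"]]
```

For example, `sources_list = source_ids(newsapi, language="en")` would replace the loop above with an English-only list.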
Next, specify the time window of the articles you want to retrieve, for instance all articles published in the last 30 days, along with the query string (“data science”, for example). In my case, I only retrieve articles related to “data science” from the first 50 news sources, published within the last 30 days. All results are stored in the results array.
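The retrieval step described above can be sketched as follows. This is a minimal reconstruction, not the author's original code. It assumes the newsapi-python `get_everything` method (where `from_param` maps to the API's `from` parameter), and the helper name `fetch_articles` and its defaults are hypothetical. It sends one request per source so that each source ends up with its own set of articles, which means 50 requests with the defaults.

```python
from datetime import date, timedelta


def fetch_articles(client, source_ids, query, days_back=30, n_sources=50):
    """Query News API once per source and return the list of raw responses.

    `client` is assumed to be a NewsApiClient (anything with a compatible
    get_everything method works). Each response is a dict with an "articles" list.
    """
    # Earliest publication date to include, formatted as YYYY-MM-DD.
    from_date = (date.today() - timedelta(days=days_back)).isoformat()

    results = []
    for source_id in source_ids[:n_sources]:
        # One request per source, so each news agency gets its own article set.
        response = client.get_everything(
            q=query,
            sources=source_id,
            from_param=from_date,
        )
        results.append(response)
    return results
```

In the notebook this would be called as `results = fetch_articles(newsapi, sources_list, "data science")`.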
Word Cloud
Once we have the article data, we can start creating the word clouds. Here, we use the WordCloud library to create one word cloud per news agency. Since the results array holds 50 sets of news articles, 50 word clouds are generated.
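A possible implementation of this step is sketched below. It is not the author's original code. It assumes the `wordcloud` and `matplotlib` packages are installed (for example with `!pip install wordcloud matplotlib`), and the helper names `articles_to_text`, `show_word_cloud` and `show_all` are hypothetical. Each source's titles, descriptions and (truncated) content are joined into one string, and News API's truncation marker, which typically looks like `[+1234 chars]`, is removed so it does not show up as words.

```python
import re


def articles_to_text(response):
    """Join title, description and content of every article in one response."""
    parts = []
    for article in response.get("articles", []):
        for field in ("title", "description", "content"):
            value = article.get(field)
            if value:  # any of these fields may be None
                parts.append(value)
    text = " ".join(parts)
    # Truncated content typically ends with a marker like "[+1234 chars]"; strip it.
    return re.sub(r"\[\+\d+ chars\]", " ", text)


def show_word_cloud(text, title):
    """Render a single word cloud inline in the notebook."""
    # Imported here so articles_to_text() works without these packages installed.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud, STOPWORDS

    cloud = WordCloud(width=800, height=400, background_color="white",
                      stopwords=STOPWORDS).generate(text)
    plt.figure(figsize=(10, 5))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()


def show_all(results):
    """Draw one word cloud per source response, skipping sources without text."""
    for response in results:
        text = articles_to_text(response)
        if not text.strip():
            continue  # WordCloud.generate() raises ValueError when it finds no words
        name = response["articles"][0]["source"]["name"]
        show_word_cloud(text, name)
```

Calling `show_all(results)` draws the clouds one after another. Sources that returned no articles are skipped, so you may end up with fewer than 50 clouds.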
Some examples are shown as follows:
That brings us to the end of this tutorial on generating word clouds for multiple news sources. The same techniques can be applied to many other use cases. I hope this simple demonstration helps!