NEWSEMBLE

Rishabh Gupta
3 min read · Jul 17, 2021

An API for fetching current news data

What is Newsemble?

Newsemble is a new API for fetching current Indian news data.

SOURCE CODE: GitHub

Why Newsemble?

For programmers (especially Python programmers), there are many APIs, such as Newsdata and NewsAPI, that support Indian news data, but we ran into certain restrictions while using them:

a) Some of them only provided the headline, and not the content.

b) Even if the content was available, it was restricted to a low character limit.

c) Almost all had strict limits regarding the number of requests for the free version.

d) The quality and relevance of the news weren’t very high in some cases.

To overcome these problems, we decided to build our own API that provides complete access to current news.

What does Newsemble provide?

When a request is made to the API, a list of JSON objects is returned.

Each JSON object has the following fields:

link: URL of the news article

content: Content of the article

source: Newspaper that published the article

title: Headline of the article

time: Date and time of release
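
As a rough illustration of this shape, one returned object could be modelled in Python as below. This is only a sketch: the `Article` class and its `from_json` helper are hypothetical names, the field types are assumed to be plain strings, and the sample record is made up rather than real API output.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Article:
    """One news item, mirroring the five fields listed above."""
    link: str     # URL of the news article
    content: str  # content of the article
    source: str   # newspaper
    title: str    # headline of the article
    time: str     # date and time of release, kept as the raw string

    @classmethod
    def from_json(cls, obj):
        """Build an Article from one decoded JSON object.

        Extra keys are ignored; a missing field raises KeyError.
        """
        return cls(**{f.name: obj[f.name] for f in fields(cls)})


# Made-up sample record, for illustration only (not real API output).
sample = {
    "link": "https://example.com/some-article",
    "content": "Full text of the article goes here.",
    "source": "Example Times",
    "title": "An example headline",
    "time": "2021-07-17 09:30:00",
}

article = Article.from_json(sample)
```

Accessing `article.title` or `article.source` then works like any other attribute, and a response missing one of the five fields fails loudly instead of silently.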

How to access Newsemble?

Here is a quick way to access the API using Python:

Accessing Newsemble with Python
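
A minimal sketch of such a request, using only the standard library, might look like this. It is not the original snippet: `fetch_news` and `print_headlines` are hypothetical helper names, the `https://` scheme is an assumption (the endpoint is listed without one), and the response is assumed to be a JSON list of objects with the fields described earlier.

```python
import json
from urllib.request import urlopen

# Assumption: the endpoint is listed without a scheme; "https://" is a guess.
NEWS_URL = "https://www.newsemble.ml/news"


def fetch_news(url=NEWS_URL, opener=urlopen, timeout=10):
    """Request `url` and return the decoded list of article dicts.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without network access.
    """
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def print_headlines(articles, limit=5):
    """Print the source and title of the first `limit` articles."""
    for article in articles[:limit]:
        print(f"[{article['source']}] {article['title']}")
```

Calling `print_headlines(fetch_news())` would then print a few headlines, provided the service is reachable.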

This can easily be ported to the programming language of your choice!

Additional Information

Currently Supported Websites

API Links

www.newsemble.ml/news: Link to fetch data from all sources

www.newsemble.ml/news/toi: Link to fetch data from Times of India

www.newsemble.ml/news/th: Link to fetch data from The Hindu

www.newsemble.ml/news/tie: Link to fetch data from The Indian Express

www.newsemble.ml/news/ndtv: Link to fetch data from NDTV

www.newsemble.ml/news/it: Link to fetch data from India Today
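
The links above follow one pattern: the base path plus a short source code. A small helper can capture that, sketched here with hypothetical names (`SOURCES`, `endpoint`) and an assumed `https://` scheme:

```python
# Assumption: the scheme is not given in the list above; "https://" is a guess.
BASE_URL = "https://www.newsemble.ml/news"

# Source codes and names, taken from the list of API links.
SOURCES = {
    "toi": "Times of India",
    "th": "The Hindu",
    "tie": "The Indian Express",
    "ndtv": "NDTV",
    "it": "India Today",
}


def endpoint(code=None):
    """Return the URL for one source code, or for all sources if code is None."""
    if code is None:
        return BASE_URL
    if code not in SOURCES:
        raise ValueError(f"unknown source code: {code!r}")
    return f"{BASE_URL}/{code}"
```

Rejecting unknown codes up front gives a clear error locally instead of a failed request.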

Technical Details

Pipeline

The Pipeline for Newsemble

Behind The Scenes

Steps:

  • We built custom scrapers for each of the supported websites using BeautifulSoup. These scrapers are responsible for collecting all the data.
  • We also connect our application to a MongoDB database in the cloud (MongoDB Atlas).
  • Whenever the news articles are scraped, they are inserted into a collection.
  • Before each insert, we delete everything already in the collection, so that it always contains only the most recent articles.
  • We then use Flask to expose this app as a REST API.
  • To automate this system, we deploy it to the cloud (Heroku).
  • On Heroku, we use a scheduler that runs the scrapers every hour, and stores the results in the database.
  • Whenever a request is made to the API, the results are fetched from the database and served promptly. This avoids the latency that scraping on every request would introduce.
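
The refresh-then-serve flow in these steps can be sketched roughly as follows. This is an illustration, not the project's code: `refresh` and `serve` are hypothetical names, scrapers are passed in as plain callables that return lists of dicts, and `InMemoryCollection` is a stand-in so the example runs without a database. The three methods it exposes (`delete_many`, `insert_many`, `find`) match the pymongo `Collection` methods of the same names, so a real collection could be passed instead. Scraping before deleting, and keeping the old articles when a run returns nothing, is a choice made for this sketch.

```python
class InMemoryCollection:
    """Stand-in exposing the three pymongo Collection methods used below."""

    def __init__(self):
        self._docs = []

    def delete_many(self, query):
        # Only the empty filter {} (match everything) is supported here.
        self._docs.clear()

    def insert_many(self, docs):
        self._docs.extend(dict(d) for d in docs)

    def find(self, query=None, projection=None):
        # The projection is ignored; a real collection would apply it.
        return [dict(d) for d in self._docs]


def refresh(collection, scrapers):
    """Run every scraper, then replace the collection's contents.

    This is the job an hourly scheduler would trigger.
    """
    articles = [article for scrape in scrapers for article in scrape()]
    if articles:  # pymongo's insert_many rejects an empty list
        collection.delete_many({})
        collection.insert_many(articles)
    return len(articles)


def serve(collection):
    """Return what an API handler would send back: stored articles only.

    No scraping happens here, so a request costs one database read.
    """
    return list(collection.find({}, {"_id": 0}))
```

In a Flask app, `serve` would sit behind a route and its result would be returned as JSON, while `refresh` would run on the scheduler, separate from request handling.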

Additional Utilities

One additional utility function converts the time of each article from a string to Python’s native datetime object.

This allows for easy manipulation and filtering of the news articles.

Utility Functions
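
A sketch of such a conversion, and of the filtering it enables, is shown below. The helper names (`with_parsed_time`, `published_since`) are hypothetical, and so is `TIME_FORMAT`: the actual format of the `time` string is not stated here, so the pattern must be adjusted to whatever the API returns.

```python
from datetime import datetime

# Assumption: the real format of the `time` field is not documented here;
# this pattern is a placeholder to be adapted to the actual API output.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def with_parsed_time(article, fmt=TIME_FORMAT):
    """Return a copy of `article` whose `time` is a datetime object."""
    return {**article, "time": datetime.strptime(article["time"], fmt)}


def published_since(articles, cutoff):
    """Keep only the articles released at or after `cutoff`.

    Expects articles whose `time` has already been parsed.
    """
    return [a for a in articles if a["time"] >= cutoff]
```

With the times parsed, sorting by recency is a one-liner as well: `sorted(articles, key=lambda a: a["time"], reverse=True)`.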

Future Scope

This project can be used for news analysis, keyword detection, finding trending topics, and news summarization, among many other interesting things.

We are currently building a system for finding the trending news keywords using this API, and plan to release it soon!

If there is another innovative application you want to use it for, please let us know!

Thanks

Thanks for reading this far!

Please do check out the source code here.

If you find this useful or interesting in any way, please 👏 the article or ⭐️ the repository. It serves as good motivation going forward!

Lastly, and most importantly, if there is any other feature you want us to add, or any other news site that might be useful, please let us know.

Any feedback is greatly appreciated!

Rishabh Gupta

Computer Science Undergraduate. Current Areas of Interest: Self-Supervised Learning, Meta-Learning, Algorithm Design and Design Patterns.