NEWSEMBLE
An API for fetching current news data
Why Newsemble?
For programmers, especially those working in Python, there are many news APIs, such as Newsdata and NewsAPI, that support Indian news data. However, we ran into several restrictions while using them:
a) Some of them only provided the headline, and not the content.
b) Even if the content was available, it was restricted to a low character limit.
c) Almost all had strict limits on the number of requests in the free version.
d) The quality and relevance of the news wasn’t very high in some cases.
To overcome these problems, we decided to build our own API that provides complete access to current news.
What does Newsemble provide?
When a request is made to the API, a list of JSON objects is returned.
Each JSON object has the following fields:
link: URL of the news article
content: Content of the article
source: Newspaper
title: Headline of the article
time: Date and time of release
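As a rough illustration of the response shape described above, the sketch below types one article and keeps only objects that carry all five fields. The `Article` type, the `parse_articles` helper, and the sample values (including the format of the `time` string) are made up for illustration and are not part of Newsemble itself.

```python
import json
from typing import List, TypedDict


class Article(TypedDict):
    """One object of the returned list, with the five documented fields."""
    link: str
    content: str
    source: str
    title: str
    time: str


REQUIRED_FIELDS = ("link", "content", "source", "title", "time")


def parse_articles(payload: str) -> List[Article]:
    """Decode a JSON response body and keep only complete article objects."""
    data = json.loads(payload)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of article objects")
    return [
        item
        for item in data
        if isinstance(item, dict) and all(field in item for field in REQUIRED_FIELDS)
    ]


# Placeholder payload: every value here is invented for the example.
sample = json.dumps([
    {
        "link": "https://example.com/article-1",
        "content": "Full text of the article.",
        "source": "The Hindu",
        "title": "Example headline",
        "time": "2021-01-01 09:30:00",
    },
    {"title": "Incomplete object without the other fields"},
])

articles = parse_articles(sample)
```

Here `articles` holds only the first object, since the second one lacks four of the five fields.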
How to access Newsemble?
Here is a quick way to access the API (using Python)
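A minimal sketch of such a request, using only Python's standard library. It assumes the endpoint is reachable over HTTPS and returns a JSON list; the `fetch_news` name and its `opener` parameter (which lets the network call be swapped out) are our own choices for this example.

```python
import json
from urllib.request import Request, urlopen

# Assumption: the API is served over HTTPS at this address.
BASE_URL = "https://www.newsemble.ml/news"


def fetch_news(url: str = BASE_URL, timeout: float = 10.0, opener=urlopen) -> list:
    """Request the endpoint and return the decoded list of article objects."""
    request = Request(url, headers={"Accept": "application/json"})
    # The response is read fully, then decoded from UTF-8 JSON.
    with opener(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_news()` performs the request against all sources; passing a different `url`, such as one of the per-source links listed further down, narrows the result.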
This can easily be transferred to the programming language of your choice!
Additional Information
Currently Supported Websites
API Links
www.newsemble.ml/news: Link to fetch data from all sources
www.newsemble.ml/news/toi: Link to fetch data from Times of India
www.newsemble.ml/news/th: Link to fetch data from The Hindu
www.newsemble.ml/news/tie: Link to fetch data from The Indian Express
www.newsemble.ml/news/ndtv: Link to fetch data from NDTV
www.newsemble.ml/news/it: Link to fetch data from India Today
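The links above follow one pattern: the base address, optionally followed by a short source code. A small helper can build them, as in the sketch below. The `endpoint_for` function and the `https` default are assumptions made for this example; only the codes and paths come from the list above.

```python
from typing import Optional

# Source codes taken from the links listed above.
SOURCE_CODES = {
    "toi": "Times of India",
    "th": "The Hindu",
    "tie": "The Indian Express",
    "ndtv": "NDTV",
    "it": "India Today",
}


def endpoint_for(source_code: Optional[str] = None, scheme: str = "https") -> str:
    """Return the URL for one source, or for all sources when no code is given."""
    base = f"{scheme}://www.newsemble.ml/news"
    if source_code is None:
        return base
    if source_code not in SOURCE_CODES:
        raise ValueError(f"unsupported source code: {source_code!r}")
    return f"{base}/{source_code}"
```

For example, `endpoint_for("th")` gives the link for The Hindu, while an unknown code raises a `ValueError` instead of producing a request that cannot succeed.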
Technical Details
Pipeline
Behind The Scenes
Steps:
- We built custom scrapers for each of the supported websites using BeautifulSoup. These scrapers are responsible for collecting all the data.
- We also connect our application to a MongoDB database on the cloud (MongoDB Atlas).
- Whenever the news articles are scraped, they are inserted into a collection.
- Whenever we insert anything into this collection, we first delete everything it previously held, so that it always contains only the most recent articles.
- We then use Flask to expose this app as a REST API.
- To automate this system, we deploy it on the cloud (Heroku).
- On Heroku, we use a scheduler that runs the scrapers every hour, and stores the results in the database.
- Whenever a request is made to the API, the results are fetched from the database and promptly served. Serving from the database avoids the latency of scraping on every request.
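The refresh-then-serve pattern in the steps above can be sketched without any external services. The code below is not the actual implementation: `ArticleStore` is an in-memory stand-in for the MongoDB collection, the scrapers are plain callables rather than BeautifulSoup code, and `handle_request` plays the part of a Flask route. All of these names are hypothetical.

```python
from typing import Callable, Iterable, List, Optional


class ArticleStore:
    """In-memory stand-in for the database collection."""

    def __init__(self) -> None:
        self._articles: List[dict] = []

    def replace_all(self, articles: Iterable[dict]) -> None:
        # Drop whatever was stored before and keep only the new batch.
        self._articles = [dict(article) for article in articles]

    def find(self, source: Optional[str] = None) -> List[dict]:
        # Return copies so callers cannot mutate the stored articles.
        return [
            dict(article)
            for article in self._articles
            if source is None or article.get("source") == source
        ]


def refresh(store: ArticleStore, scrapers: Iterable[Callable[[], List[dict]]]) -> int:
    """Run every scraper, then swap the stored articles for the fresh batch."""
    fresh: List[dict] = []
    for scrape in scrapers:
        fresh.extend(scrape())
    store.replace_all(fresh)
    return len(fresh)


def handle_request(store: ArticleStore, source: Optional[str] = None) -> List[dict]:
    """Serve stored articles directly; no scraping happens at request time."""
    return store.find(source)
```

One design choice in this sketch differs slightly from the step list: the new batch is collected in full before the old one is discarded, so a scraper that raises an error leaves the previous articles in place. In the deployed setup, a scheduler would call something like `refresh` every hour.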
Additional Utilities
One additional utility function converts the time of each article from a string to Python’s native datetime object.
This allows for easy manipulation and filtering of the news articles.
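A minimal sketch of such a conversion and of the filtering it enables. The time format string is an assumption, since the exact format returned by the API is not stated here; adjust `TIME_FORMAT` to match the real data. The helper names are likewise hypothetical.

```python
from datetime import datetime
from typing import List

# Assumption: times look like "2021-01-01 09:30:00".
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def with_parsed_time(article: dict, fmt: str = TIME_FORMAT) -> dict:
    """Return a copy of the article whose 'time' field is a datetime object."""
    converted = dict(article)
    converted["time"] = datetime.strptime(article["time"], fmt)
    return converted


def published_since(articles: List[dict], cutoff: datetime) -> List[dict]:
    """Keep articles (already converted) released at or after the cutoff."""
    return [article for article in articles if article["time"] >= cutoff]


# Placeholder articles, invented for the example.
raw = [
    {"title": "Morning story", "time": "2021-01-01 09:30:00"},
    {"title": "Evening story", "time": "2021-01-01 18:45:00"},
]
parsed = [with_parsed_time(article) for article in raw]
recent = published_since(parsed, datetime(2021, 1, 1, 12, 0, 0))
```

With datetime values in place, `recent` contains only the evening story, and the same comparison works for sorting or grouping by day.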
Future Scope
This project can be used for news analysis, keyword detection, finding trending topics, and news summarization, among many other interesting things.
We are currently building a system for finding the trending news keywords using this API, and plan to release it soon!
If there is any other innovative application that you want to use it for, please do let us know!
Thanks
Thanks for reading this far!
Please do check out the source code here.
If you find this useful or interesting in any way, please 👏 the article or ⭐️ the repository. It serves as good motivation going forward!
Lastly, and most importantly, if there is any other feature you want us to add, or any other news site that might be useful, please let us know.
Any feedback is greatly appreciated!