Stock Market News Scraping using BeautifulSoup

Poojan Shah
3 min read · Apr 15, 2022


What is Web Scraping and Why do we need it?

Web scraping is the process of automatically extracting raw data from websites, which makes it possible to collect large amounts of data in a short time. The scraped data can later be analyzed in different ways. Not every website offers an Application Programming Interface (API) or ready-made data sets, so web scraping lets us collect the data ourselves and save it in a format of our choice. In this article, we will use BeautifulSoup and Python to scrape the data.

What is BeautifulSoup?

BeautifulSoup is a Python library for pulling data out of HTML and XML files. It builds a parse tree from the page source, which can then be used to extract data in a hierarchical and more readable manner.

Installation of Required Python Libraries:

1. BeautifulSoup.

pip install beautifulsoup4

2. Requests Library.

To scrape data from a website, we first need to download the content of the web page. The requests library sends the HTTP request, and the response gives us the page's HTML source as plain text.

pip install requests

Let’s get started with the code:

First, we import BeautifulSoup and the requests library. The website we will scrape news from is https://www.equitybulls.com/.

The following code sends a GET request to the web page; the doc object then contains the HTML parsed into a tree-based structure using html.parser.
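The request-and-parse step described above could look roughly like the sketch below. The function name fetch_doc, the timeout value, and the tiny sample page are assumptions made for illustration; only the URL, the doc object, and html.parser come from the text.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://www.equitybulls.com/"  # site named in the article


def fetch_doc(url, timeout=10):
    """Send a GET request and return the page parsed with html.parser.

    fetch_doc and the timeout default are hypothetical choices, not
    something the article prescribes.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return BeautifulSoup(response.text, "html.parser")


# Offline check of the parsing step on a tiny hand-written page
# (this markup is made up for illustration, not taken from the site).
sample_doc = BeautifulSoup(
    "<html><head><title>Sample page</title></head><body></body></html>",
    "html.parser",
)
print(sample_doc.title.string)
```

Calling doc = fetch_doc(URL) would then give the parsed tree for the live site, provided the site is reachable and allows the request.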

Here are a few of the headlines we are scraping:

Extract News Headlines from the website:

To find out which HTML tags contain our headlines, we need to inspect the web page. Press F12 to open the browser's developer tools.

Here, we can see that the first headline is located inside a <div> tag with class=media-body. We will use the find_all() method to search for <div> tags with a class of media-left. Using a for loop, we will iterate through each matching tag, scrape the headline, and append it to the headings list.
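A minimal sketch of this loop is shown below. It runs on hand-written sample markup rather than the live site, so the HTML structure is an assumption. The class name is a parameter because the text mentions both media-body and media-left; the default used here is a guess, so check the inspector to see which class actually wraps the headline text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<div class="media">
  <div class="media-left"><img src="thumb1.png"></div>
  <div class="media-body">
    <a class="catg_title" href="news1.html"> First sample headline </a>
  </div>
</div>
<div class="media">
  <div class="media-left"><img src="thumb2.png"></div>
  <div class="media-body">
    <a class="catg_title" href="news2.html"> Second sample headline </a>
  </div>
</div>
"""


def extract_headlines(doc, css_class="media-body"):
    """Collect the visible text of every <div> carrying css_class."""
    headings = []
    for tag in doc.find_all("div", class_=css_class):
        text = tag.get_text(strip=True)  # drop surrounding whitespace
        if text:  # skip divs that hold no text, such as image wrappers
            headings.append(text)
    return headings


doc = BeautifulSoup(SAMPLE_HTML, "html.parser")
headings = extract_headlines(doc)
print(headings)
```

If the list comes back empty on the real page, the class name is the first thing to re-check.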

Now let’s scrape the URL for each headline:

Here we can see that the <a> tag with class=catg_title carries the link in its href attribute. We will use the find_all() method with class_=catg_title and href=True, so that only tags that actually have an href attribute are matched. l.get reads the href value of each <a> tag; we then concatenate https://www.equitybulls.com/ with the scraped href to build the full URL and append it to our list named click_link.
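The link extraction might be sketched as follows, again on made-up sample markup. The function name extract_links is hypothetical, and the loop variable is spelled out as link where the text calls it l. The sketch assumes, as the text does, that the href values are relative paths.

```python
from bs4 import BeautifulSoup

BASE_URL = "https://www.equitybulls.com/"

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<a class="catg_title" href="news1.html">First sample headline</a>
<a class="catg_title" href="news2.html">Second sample headline</a>
<a class="catg_title">Anchor without an href, skipped</a>
<a class="other" href="about.html">Different class, skipped</a>
"""


def extract_links(doc, base_url=BASE_URL):
    """Build a full URL for every catg_title anchor that has an href."""
    click_link = []
    # href=True keeps only tags where the attribute is present
    for link in doc.find_all("a", class_="catg_title", href=True):
        click_link.append(base_url + link.get("href"))
    return click_link


doc = BeautifulSoup(SAMPLE_HTML, "html.parser")
click_link = extract_links(doc)
print(click_link)
```

Plain concatenation only works for relative href values; if a page mixes relative and absolute links, urllib.parse.urljoin from the standard library handles both cases.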

Map each headline to its link:

We will create a dictionary: with the help of the zip() function, we pair up the two lists, headings and click_link, and turn them into a single dictionary.

When we print the dictionary, we get the headings and links as key-value pairs:
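The zip step can be illustrated with two short lists. The values below are placeholders, not real scraped data, and the dictionary name news is an assumption.

```python
# Placeholder values standing in for the scraped lists.
headings = ["First sample headline", "Second sample headline"]
click_link = [
    "https://www.equitybulls.com/news1.html",
    "https://www.equitybulls.com/news2.html",
]

# zip() pairs items by position; dict() turns the pairs into key-value entries.
news = dict(zip(headings, click_link))

for heading, link in news.items():
    print(f"{heading} -> {link}")
```

Two details are worth keeping in mind: zip() stops at the shorter list, so a mismatch in length silently drops entries, and two identical headlines would collapse into a single key.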

Here is the Full Code:
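Putting the steps together, a complete script could look like the sketch below. This is a reconstruction under assumptions, not the original listing: the function names, the headline_class default, the timeout, and the sample markup are all hypothetical, and the class names should be verified against the live page.

```python
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.equitybulls.com/"


def scrape_news(html, base_url=BASE_URL, headline_class="media-body"):
    """Parse HTML and map each headline to its link."""
    doc = BeautifulSoup(html, "html.parser")
    headings = [
        tag.get_text(strip=True)
        for tag in doc.find_all("div", class_=headline_class)
    ]
    click_link = [
        base_url + link.get("href")
        for link in doc.find_all("a", class_="catg_title", href=True)
    ]
    return dict(zip(headings, click_link))


def scrape_site(url=BASE_URL, timeout=10):
    """Download the page with a GET request, then scrape it."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # stop on 4xx/5xx responses
    return scrape_news(response.text, base_url=url)


# Offline run on made-up markup, so no network access is needed here.
SAMPLE_HTML = """
<div class="media-body">
  <a class="catg_title" href="news1.html">First sample headline</a>
</div>
<div class="media-body">
  <a class="catg_title" href="news2.html">Second sample headline</a>
</div>
"""

for heading, link in scrape_news(SAMPLE_HTML).items():
    print(f"{heading} -> {link}")
```

To run it against the live site, call scrape_site() and print the returned dictionary.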

List of websites you can try news scraping on:

https://www.livemint.com/market/stock-market-news
https://www.indiainfoline.com/markets/news

HAPPY SCRAPING 😃
