Python: Stock Data Scraping

alpha2phi
CodeX
4 min read · Jan 18, 2021

Photo by Annie Spratt on Unsplash

Overview

Data is the new asset! In this article, I go through the fundamentals of using Python to scrape data from the Internet for use in your data projects.

There are many Python libraries for web scraping. You can pair the requests library with a parser such as BeautifulSoup, lxml, or Parsel, or use a framework like Scrapy, a browser automation tool like Selenium, or a combination of the two for dynamic websites.

Personally, I use requests + lxml for most of my scraping needs, and only use Scrapy + Selenium for certain scenarios, such as getting content from dynamic or interactive websites. Most of the time, a simple approach suffices.
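As a rough illustration of the requests + lxml pairing, the sketch below downloads a page and parses it into an element tree. The helper names (`parse_tree`, `fetch_tree`, `page_title`) and the User-Agent string are placeholders chosen for this example, not part of either library.

```python
import requests
from lxml import html

# Hypothetical header value; some sites reject requests without a User-Agent.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-scraper/0.1)"}


def parse_tree(page_source):
    """Parse HTML (str or bytes) into an lxml element tree."""
    return html.fromstring(page_source)


def fetch_tree(url, timeout=10):
    """Download a page with requests and return it as an lxml tree."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return parse_tree(response.content)


def page_title(tree):
    """Return the text of the <title> element, or None if it is missing."""
    return tree.findtext(".//title")
```

Calling `fetch_tree()` with the URL of a page you are allowed to scrape returns a tree that can then be queried with XPath. Keeping the parsing step in its own function also lets you test it against saved HTML without touching the network.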

Below, I use requests + lxml to scrape stock balance sheet data, highlighted by the red boxes in the screenshot that follows.

Stock Balance Sheet

XPath vs CSS Selector

There is a fair amount of debate about XPath versus CSS selectors and which is better for web scraping. This article does not compare the two; I will simply use XPath for my scraping needs.
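To give a feel for what XPath queries look like with lxml, here is a minimal sketch run against a small, made-up HTML table. The markup, the `balance-sheet` id, the class names, and the figures are all invented for illustration; a real page will need its own expressions.

```python
from lxml import html

# Made-up markup standing in for a balance sheet page; real sites differ.
SAMPLE_HTML = """
<html><body>
  <table id="balance-sheet">
    <tr><th>Item</th><th>2020</th></tr>
    <tr><td class="label">Total Assets</td><td class="value">1,200</td></tr>
    <tr><td class="label">Total Liabilities</td><td class="value">450</td></tr>
  </table>
</body></html>
"""


def extract_rows(page_source):
    """Return {label: value} for each data row of the sample table."""
    tree = html.fromstring(page_source)
    rows = {}
    # Select only rows that contain <td> cells, which skips the header row.
    for row in tree.xpath('//table[@id="balance-sheet"]//tr[td]'):
        label = row.xpath('string(td[@class="label"])').strip()
        value = row.xpath('string(td[@class="value"])').strip()
        rows[label] = int(value.replace(",", ""))  # "1,200" -> 1200
    return rows


print(extract_rows(SAMPLE_HTML))
```

A CSS selector such as `table#balance-sheet td.label` would target the same cells. lxml supports this through its `cssselect()` method when the separate cssselect package is installed.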

alpha2phi
CodeX

Software engineer, Data Science and ML practitioner.