CODEX

Python: Stock Data Scraping

alpha2phi

Published in CodeX

4 min read · Jan 18, 2021

Overview

Data is the new asset! In this article, I go through the fundamentals of using Python to scrape data from the Internet for use in your data project.

There are many Python libraries for web scraping. You can use the requests library together with a parser such as BeautifulSoup, lxml, or Parsel. For dynamic websites, you can use a framework like Scrapy, a browser automation tool like Selenium, or a combination of both.

Personally, I use requests + lxml for most of my scraping needs, and only use Scrapy + Selenium for certain scenarios, e.g. getting content from dynamic or interactive websites. Most of the time, a simple approach should suffice.

I will use requests + lxml to scrape a stock's balance sheet data, highlighted in the red boxes shown below.
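To make the requests + lxml approach concrete, here is a minimal sketch. It is not tied to any real site: the table id `balance-sheet`, the two-column row layout, and the helper names `fetch_page` and `parse_balance_sheet` are all assumptions for illustration, so you will need to adapt the XPath expressions to the page you are actually scraping.

```python
import requests
from lxml import html


def fetch_page(url, timeout=10):
    """Download a page and return its HTML source as text."""
    # Some sites reject the default requests User-Agent, so send a browser-like one.
    response = requests.get(
        url, headers={"User-Agent": "Mozilla/5.0"}, timeout=timeout
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def parse_balance_sheet(page_source):
    """Extract {label: value} pairs from a hypothetical balance sheet table."""
    tree = html.fromstring(page_source)
    rows = {}
    # Assumed markup: <table id="balance-sheet"> with one label cell and
    # one value cell per row. Header rows (only <th>) are skipped by tr[td].
    for row in tree.xpath('//table[@id="balance-sheet"]//tr[td]'):
        cells = [cell.text_content().strip() for cell in row.xpath("./td")]
        if len(cells) >= 2:
            rows[cells[0]] = cells[1]
    return rows


# Stand-in HTML so the parsing step can be tried without a network call.
SAMPLE_HTML = """
<html><body>
  <table id="balance-sheet">
    <tr><th>Item</th><th>Amount</th></tr>
    <tr><td>Total Assets</td><td>1,000</td></tr>
    <tr><td>Total Liabilities</td><td>400</td></tr>
  </table>
</body></html>
"""

balance_sheet = parse_balance_sheet(SAMPLE_HTML)
print(balance_sheet)
```

Against a live page, you would pass the result of `fetch_page(url)` to `parse_balance_sheet` instead of the sample string. Keeping the download and the parsing in separate functions makes it easy to test the XPath logic on saved HTML.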

XPath vs CSS Selector

There is a fair amount of debate about XPath versus CSS selectors, and which is better for web scraping. This article does not compare the two; I will simply use XPath for my scraping needs.
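For readers who have not used XPath before, the short sketch below shows what such expressions look like with lxml. The HTML fragment, the `balance-sheet` id, and the `label`/`value` class names are made up for illustration. The CSS selector appears only in a comment as a rough equivalent and is not executed.

```python
from lxml import html

# Hypothetical markup used only to demonstrate the expressions.
SAMPLE_HTML = """
<table id="balance-sheet">
  <tr><td class="label">Total Assets</td><td class="value">1,000</td></tr>
  <tr><td class="label">Total Liabilities</td><td class="value">400</td></tr>
</table>
"""

tree = html.fromstring(SAMPLE_HTML)

# Select by id and class attribute.
# Rough CSS equivalent: table#balance-sheet td.label
# (the XPath below matches the class attribute exactly, which is fine here
# because each cell carries a single class).
labels = tree.xpath('//table[@id="balance-sheet"]//td[@class="label"]/text()')

# Select a cell by its text, then step to the next sibling cell.
# Matching on text content is something XPath can express directly.
total_assets = tree.xpath(
    '//td[text()="Total Assets"]/following-sibling::td[1]/text()'
)

print(labels)
print(total_assets)
```

The second expression is the kind of query that comes up often when scraping financial tables, where the value you want sits next to a known label rather than behind a stable id or class.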

Written by alpha2phi