An easy technique for web scraping an interactive web chart with Python

Toyosi Bamidele
Published in Analytics Vidhya
3 min read · Oct 10, 2020

You browse a dynamic website with an interactive chart, and it has all the data you need for your next data project. How should you go about scraping it?

For this process, you only need two libraries:

  1. The requests library
  2. json_normalize from pandas (pandas.json_normalize in pandas 1.0 and later; older releases expose it as pandas.io.json.json_normalize)

The key to this process is exploring the website’s Network tab before digging into HTML.

For this article, gold prices in Brazilian Real will be scraped from bullionstar.com for dates between 01/01/2019 and 11/09/2020.

Steps

1. Right-click on the page and click on Inspect

2. Reload the page and click on the Network tab

3. This is the part that requires exercising your investigative skills!

Make sure to click on the XHR tab. XHR (XMLHttpRequest) is a browser API, in the form of an object, that transfers data between a web browser and a web server; the tab lists the requests the page made through it.

Within the XHR tab, explore the different objects to see if any house the web chart data by looking at the Preview tab.

After some digging, I'm able to find the web chart dataSeries stored in the “chartsData” object under the Preview tab.
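If clicking through the Preview tab gets tedious, the same hunt can be done in Python on a response body copied from the Response tab. The following is only a minimal sketch: the helper name `find_key_paths` and the sample JSON layout are hypothetical, since the real response structure is not reproduced here; only the key name "dataSeries" comes from the steps above.

```python
import json


def find_key_paths(node, target, path=()):
    """Yield the path (a tuple of keys and list indexes) to every `target` key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == target:
                yield path + (key,)
            # Keep descending: the same key may also appear deeper down.
            yield from find_key_paths(value, target, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_key_paths(item, target, path + (index,))


# Hypothetical response body, standing in for text copied from the Response tab.
sample_text = '{"meta": {"id": 1}, "chart": {"dataSeries": [{"values": []}]}}'
payload = json.loads(sample_text)

paths = list(find_key_paths(payload, "dataSeries"))
print(paths)  # [('chart', 'dataSeries')]
```

Each path tells you which keys to index, in order, to reach the chart data once the response is loaded in Python.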

4. Now let's head back to the Headers tab and locate the four parameters needed for the request method

a. Request URL: The URL of the request

https://www.bullionstar.com/widget/generator/line/generate/chartsData

b. Request Method: The HTTP method of the request; POST is used to send data to a server to create or update a resource

POST

c. Headers: A dictionary of HTTP headers to send to the specified URL

user-agent:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36

d. Form data: The key-value pairs sent in the body of the request (listed under Form Data in the Headers tab)

5. Now let’s jump into Python and start coding

a. Import the relevant libraries

b. Specify the URL variable

c. Use .get() to pull cookies

d. Specify the variables for form data and headers

e. Pass all four parameters into r.post()

f. Parse the response into JSON with .json()

g. Normalize the pulled data and call out the specific data series
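Steps a to g can be sketched roughly as follows. This is a hedged reconstruction, not the original code: it uses a `requests.Session` to carry the cookies from the GET into the POST, the function names are hypothetical, the form fields are placeholders to be replaced with the pairs shown under Form Data, and it assumes the decoded JSON holds a list of records under a top-level "dataSeries" key.

```python
# a. Import the relevant libraries
import pandas as pd
import requests

# b. Request URL, as listed in step 4
URL = "https://www.bullionstar.com/widget/generator/line/generate/chartsData"

# d. Headers and form data. The user-agent string is the one from step 4.
HEADERS = {
    "user-agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36"
    )
}
# PLACEHOLDER: replace with the key/value pairs shown under Form Data.
FORM_DATA = {"placeholder_field": "placeholder_value"}


def fetch_chart_json(url, form_data, headers, timeout=30):
    """GET once to collect cookies, then POST the form and decode the JSON."""
    with requests.Session() as session:
        # c. The session stores any cookies set by this response.
        session.get(url, headers=headers, timeout=timeout)
        # e. The session sends those cookies back along with the form data.
        response = session.post(
            url, data=form_data, headers=headers, timeout=timeout
        )
        response.raise_for_status()
        # f. Decode the JSON body into Python dicts and lists.
        return response.json()


def series_to_frame(payload, series_key="dataSeries"):
    """g. Flatten the list stored under `series_key` (assumed) into a DataFrame."""
    return pd.json_normalize(payload[series_key])


# Example call (performs real network requests, so it is left commented out):
# payload = fetch_chart_json(URL, FORM_DATA, HEADERS)
# df = series_to_frame(payload)
# print(df.tail(10))
```

If the chart data sits deeper in the response, index down to it first and pass that inner dictionary to `series_to_frame`.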

Figure: Snapshot of the last 10 gold prices in Brazilian Real, with their dates, after running the “values” variable
Figure: Normalized JSON data, now in a data frame

Now you can go on and save to a CSV file or do further ETL work for your data project!
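As one possible follow-up, here is a small sketch of a tidy-and-save step. The function name `tidy_and_save`, the column names "date" and "value", and the output file name are assumptions for illustration; adjust them to whatever columns your normalized data frame actually has.

```python
import pandas as pd


def tidy_and_save(df, path, date_col="date", value_col="value"):
    """Parse dates, coerce prices to numbers, sort by date, and write a CSV."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    # Non-numeric entries become NaN and are dropped on the next line.
    out[value_col] = pd.to_numeric(out[value_col], errors="coerce")
    out = out.dropna(subset=[value_col]).sort_values(date_col)
    # index=False keeps the row index out of the file.
    out.to_csv(path, index=False)
    return out


# Example call (hypothetical frame and file name):
# clean = tidy_and_save(df, "gold_prices_brl.csv")
```

The function returns the cleaned frame as well, so further ETL steps can continue from it without rereading the file.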

Note: This is not a one-size-fits-all approach. Some webpages might require .get(), the Beautiful Soup library, or even Selenium. Still, by exploring this technique first, you can cut down on the time needed to scrape an interactive chart with just a few libraries.

If you like this post, follow me on LinkedIn and subscribe to my YouTube channel.

I’m currently pursuing my Master’s in Data Science and trying to pivot from the energy industry to technology. Please show your support and share any helpful hints.
