Scraping Excel Online Read-Only File With Requests

Hit the nail on the head

Han Qi
The Startup


Copy as cURL (bash) to formulate a request

In a previous article (https://medium.com/@hanqi_47643/scraping-excel-online-read-only-file-with-selenium-and-javascript-in-python-7bb549f05d66), I used Selenium to scrape this Excel Online file, but that approach felt indirect and slow. This is a new attempt, using new tools and the knowledge gained since. The full notebook is at https://gist.github.com/gitgithan/b9f48e1b23e88f1fb1c56ad9b739adef

Creating the request

In the previous article, the strategy was to scroll, find, parse, then scroll, find, parse again, and so on. Now the goal is to send requests with the Python requests library that directly target the information we want.

Begin by pressing F12 in Chrome to open Developer Tools, then select the Network tab. Next, load http://www.presupuesto.pr.gov/PRESUPUESTOPROPUESTO2020-2021/_layouts/15/WopiFrame.aspx?sourcedoc=%7B566feecf-1e0d-46b8-a505-7cd762665268%7D&action=edit&source=http%3A%2F%2Fwww%2Epresupuesto%2Epr%2Egov%2FPRESUPUESTOPROPUESTO2020%2D2021%2FFOMB%2520Budget%2520Requirements%2520FY%25202021%2FForms%2FAllItems%2Easpx%3FRootFolder%3D%252FPRESUPUESTOPROPUESTO2020%252D2021%252FFOMB%2520Budget%2520Requirements%2520FY%25202021, or press F5 to reload the page if it is already open, to see a list of network requests being recorded. We want to focus on (or filter for) the GetRangeContent requests (discovered by manually scrolling…
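Once a GetRangeContent request has been copied with Copy as cURL (bash), it still has to be turned into something Python can send. The following is a minimal sketch of that conversion, not the notebook's actual code: `parse_curl` is a hypothetical helper, the example.com URL and the header values are placeholders, and only the URL and `-H` headers are handled (cookies, request bodies and bash `$'...'` quoting are ignored).

```python
import shlex


def parse_curl(command: str) -> tuple[str, dict[str, str]]:
    """Split a 'Copy as cURL (bash)' command into (url, headers).

    Hypothetical helper for illustration. It assumes the URL is the
    first non-flag argument, which is where Chrome puts it.
    """
    # Chrome wraps long commands with backslash-newline; join them first.
    tokens = shlex.split(command.replace("\\\n", " "))
    if not tokens or tokens[0] != "curl":
        raise ValueError("expected a command starting with 'curl'")

    url = None
    headers: dict[str, str] = {}
    i = 1
    while i < len(tokens):
        token = tokens[i]
        if token in ("-H", "--header") and i + 1 < len(tokens):
            # Header arguments look like "Name: value"; split on the first colon.
            name, _, value = tokens[i + 1].partition(":")
            headers[name.strip()] = value.strip()
            i += 2
            continue
        if not token.startswith("-") and url is None:
            url = token
        # Any other flag or argument (e.g. --compressed) is skipped.
        i += 1

    if url is None:
        raise ValueError("no URL found in the cURL command")
    return url, headers


# Placeholder command; a real one would be pasted from Developer Tools.
sample = """curl 'http://example.com/GetRangeContent' \\
  -H 'Accept: application/json' \\
  -H 'User-Agent: Mozilla/5.0' \\
  --compressed"""

url, headers = parse_curl(sample)
print(url)
print(headers)
```

The resulting pair can then be handed to the requests library, for example `requests.get(url, headers=headers, timeout=30)`. Whether the server also expects cookies or other session state is something this sketch does not check.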

