Scraping Excel Online Read-Only File With Requests
Hit the nail on the head
In a previous article https://medium.com/@hanqi_47643/scraping-excel-online-read-only-file-with-selenium-and-javascript-in-python-7bb549f05d66, I used Selenium to scrape this Excel Online file, but that approach felt indirect and slow. Here is a new attempt, using the tools and knowledge gained since. The full notebook is at https://gist.github.com/gitgithan/b9f48e1b23e88f1fb1c56ad9b739adef
Creating the request
In the previous article, the strategy was to scroll, find, and parse, over and over. Now the goal is to send requests with the Python requests library that directly target the information we want.
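As a rough sketch of what "sending a request directly" can look like, the helper below builds a URL with query parameters and fetches it through a session object. Everything named here is an assumption for illustration: `build_request_url` and `fetch_text` are hypothetical helpers, `example.com` is a placeholder host, and `row` / `rowCount` are made-up parameter names, not the real ones the Excel Online endpoint expects. The `session` argument is meant to be a `requests.Session`, but any object with a compatible `get` method works.

```python
from urllib.parse import urlencode


def build_request_url(base_url, params):
    """Append URL-encoded query parameters to base_url (hypothetical helper)."""
    if not params:
        return base_url
    # Reuse an existing query string if the base URL already has one.
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)


def fetch_text(session, base_url, params, timeout=30):
    """GET the built URL through a requests.Session-like object and return the body."""
    response = session.get(build_request_url(base_url, params), timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.text


# Placeholder values only; the real endpoint and parameters come from the Network tab.
example_url = build_request_url(
    "https://example.com/GetRangeContent", {"row": 0, "rowCount": 20}
)

# Usage with the real library (requires `pip install requests`):
#   import requests
#   with requests.Session() as session:
#       body = fetch_text(session, copied_url_from_network_tab, {})
```

Passing the session in, rather than creating it inside the function, keeps cookies and headers shared across calls, which tends to matter when a page expects the same browser-like context for every request.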
Begin by pressing F12 in Chrome to open Developer Tools and switching to the Network tab. Then load http://www.presupuesto.pr.gov/PRESUPUESTOPROPUESTO2020-2021/_layouts/15/WopiFrame.aspx?sourcedoc=%7B566feecf-1e0d-46b8-a505-7cd762665268%7D&action=edit&source=http%3A%2F%2Fwww%2Epresupuesto%2Epr%2Egov%2FPRESUPUESTOPROPUESTO2020%2D2021%2FFOMB%2520Budget%2520Requirements%2520FY%25202021%2FForms%2FAllItems%2Easpx%3FRootFolder%3D%252FPRESUPUESTOPROPUESTO2020%252D2021%252FFOMB%2520Budget%2520Requirements%2520FY%25202021 (or press F5 to reload the page if it is already open) to see the list of network requests being recorded. We want to focus/filter on the GetRangeContent requests (discovered by manually scrolling…