Scraping Excel Online Read-Only File With Selenium and Javascript in Python

Experience the joy of human-machine cooperation

Han Qi
The Startup

--

This exercise was prompted by a question on a forum https://community.dataquest.io/t/how-to-download-an-excel-online-file/494093 regarding how to download a read-only file http://www.presupuesto.pr.gov/PRESUPUESTOPROPUESTO2020-2021/_layouts/15/WopiFrame.aspx?sourcedoc=%7B566feecf-1e0d-46b8-a505-7cd762665268%7D&action=edit&source=http%3A%2F%2Fwww%2Epresupuesto%2Epr%2Egov%2FPRESUPUESTOPROPUESTO2020%2D2021%2FFOMB%2520Budget%2520Requirements%2520FY%25202021%2FForms%2FAllItems%2Easpx%3FRootFolder%3D%252FPRESUPUESTOPROPUESTO2020%252D2021%252FFOMB%2520Budget%2520Requirements%2520FY%25202021 from excel online that required authentication to Download.

Copy pasting a few cells works fine, but Ctrl+A copy-pasting leads to just the text “Retrieving data. Wait a few seconds and try to cut or copy again.” being pasted with no data, making data analysis of the full file difficult. The follow sections will go through how to move around the document, get all the information, clean them, and put them together. Full notebook at https://gist.github.com/gitgithan/28f63f707bdbdd5dd9f51f553c6322dc

Installing and Connecting to the WebDriver

--

--

Han Qi
The Startup

Shares ideas that I can’t find online. You can support my writing by joining Medium through https://hanqi01.medium.com/membership (affiliate link)