Scraping an Excel Online Read-Only File With Selenium and JavaScript in Python
Experience the joy of human-machine cooperation
This exercise was prompted by a forum question https://community.dataquest.io/t/how-to-download-an-excel-online-file/494093 about how to download a read-only file http://www.presupuesto.pr.gov/PRESUPUESTOPROPUESTO2020-2021/_layouts/15/WopiFrame.aspx?sourcedoc=%7B566feecf-1e0d-46b8-a505-7cd762665268%7D&action=edit&source=http%3A%2F%2Fwww%2Epresupuesto%2Epr%2Egov%2FPRESUPUESTOPROPUESTO2020%2D2021%2FFOMB%2520Budget%2520Requirements%2520FY%25202021%2FForms%2FAllItems%2Easpx%3FRootFolder%3D%252FPRESUPUESTOPROPUESTO2020%252D2021%252FFOMB%2520Budget%2520Requirements%2520FY%25202021 from Excel Online that requires authentication to download.
Copy-pasting a few cells works fine, but selecting everything with Ctrl+A and copy-pasting only pastes the text “Retrieving data. Wait a few seconds and try to cut or copy again.” with no data, which makes analyzing the full file difficult. The following sections go through how to move around the document, get all the information, clean it, and put it together. The full notebook is at https://gist.github.com/gitgithan/28f63f707bdbdd5dd9f51f553c6322dc