Web scraping @edge, AWS Kinesis Data Firehose & Glue
What is “web scraping @edge”, or “intranet web scraping”? It is the ability to extract data from a private website, for example an IoT resource or device that is not publicly reachable. Why should we care? Because it lets us collect data that is only available privately and process it in the cloud, at scale, to produce information and insights.
The purpose of this project is to acquire data from the motion detection sensors usually installed as part of a home alarm system. It’s a closed, proprietary system, for obvious security reasons. A Raspberry Pi on the local network scrapes the UI of the Paradox alarm control unit and sends the collected data in (near) real time to AWS Kinesis Data Firehose for subsequent processing. After a few weeks of data collection, I explored the data in a notebook to identify the most used rooms and the most frequent movement vectors.
Let’s see the general scheme.
The project repository is available on GitHub.
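To make the edge-to-cloud step more concrete, here is a minimal sketch of how a single motion reading could be pushed from the Raspberry Pi to a Firehose delivery stream with boto3. It is not taken from the project repository: the field names (zone, active, timestamp), the stream name and the region are assumptions made up for illustration. The only AWS call used is the Firehose PutRecord operation.

```python
import json
import time
from typing import Any, Dict, Optional


def build_record(zone: str, active: bool, ts: Optional[float] = None) -> Dict[str, bytes]:
    """Serialize one motion reading as a newline-terminated JSON record."""
    event = {
        "zone": zone,      # hypothetical field: room or sensor name
        "active": active,  # hypothetical field: True while motion is detected
        "timestamp": ts if ts is not None else time.time(),
    }
    # Firehose concatenates records as-is, so a trailing newline keeps
    # one JSON object per line in the delivered files.
    return {"Data": (json.dumps(event) + "\n").encode("utf-8")}


def send_event(client: Any, stream_name: str, zone: str, active: bool,
               ts: Optional[float] = None) -> Dict[str, Any]:
    """Push one reading to a delivery stream through the PutRecord API."""
    return client.put_record(
        DeliveryStreamName=stream_name,
        Record=build_record(zone, active, ts),
    )


def make_firehose_client(region_name: str = "eu-west-1") -> Any:
    """Create a Firehose client; the default region is an assumption."""
    import boto3  # imported lazily so the helpers above work without boto3
    return boto3.client("firehose", region_name=region_name)
```

On the device, the call would be something like send_event(make_firehose_client(), "motion-events", "living_room", True), where "motion-events" is a placeholder for a delivery stream that already exists. Writing one JSON object per line is a design choice that keeps the delivered files easy to crawl and query later.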
Web scraping with Raspberry Pi and Selenium
Since I already had experience with Selenium, I decided to adopt the same framework in this project as well…