Tutorial: Creating a Webpage Monitor Using Python and Running It on a Raspberry Pi

Paul Bitutsky
Published in The Startup
10 min read · Jan 22, 2021

Introduction

Buying concert tickets? Waiting for a college admission decision? Are you ever in a situation where you’re constantly refreshing a webpage over and over again hoping desperately that the next time it loads, you’ll see something different?

If so, what you’re looking for is a website monitor. A website monitor (or webpage monitor) is a program that repeatedly requests a webpage and notifies the user, via text or email, when it has detected any changes.

I was recently in a situation where I needed a website monitor. In January 2021, my hometown began COVID vaccine distribution to healthcare workers and the elderly. My grandmother wanted to get the vaccine as soon as possible, but so many other people were interested that on the first day the vaccine appointment form was released, all of the slots were taken within minutes. I decided that we needed a webpage monitor to repeatedly check this website and tell me as soon as new appointment slots became available.

I found a few paid services that do this, but I prefer to not spend money when it’s avoidable.

So, I created a simple python script that does the following:

  1. Sends a request to a specified URL and receives the HTML from that webpage
  2. Stores that HTML in a file on the computer’s permanent storage (hard drive)
  3. Waits for 10 minutes
  4. Sends another request to the same URL and receives the new HTML code
  5. Compares the new HTML code with the stored HTML code in the file on disk. If they’re equal, that means the website hasn’t changed; do nothing. If they’re different, send an email and text message to me to let me know that the webpage has changed.
  6. Repeats steps 3–5 forever…

I could have just let this python script run on my personal laptop, but I use my laptop for lots of other things and don’t like to leave it on for long periods of time. So I decided to run it on my Raspberry Pi, which is perfect for just this kind of project.

If you want to skip the tutorial, you can find all of the code on this Github repo.

Who is this tutorial for?

This tutorial is perfect for you if you are…

  • Learning python
  • Looking for a fun project that you can do on your Raspberry Pi

Before starting, make sure you’re familiar at least with the basics of python, know how to execute .py files, and are able to install python packages/modules using pip.

I’m using Python 3.7.3 but any Python version 3.* should work for this project.

You don’t need a Raspberry Pi to run this webpage monitor! Any dedicated device that has Python and can connect to the internet will work. That means you can use an old PC from your attic or an AWS EC2 instance. If you just want to learn how to make the webpage monitor, stop after reading through Part 1 of the tutorial.

A Word of Caution

The word of caution is actually an acronym: DoS.

Tl;dr: If you request the same webpage too frequently, the web server may think that you are trying to attack it and will block you.

DoS stands for Denial-of-Service. It’s a kind of network attack that tries to take down a server or network by bombarding it with far too many requests. The most infamous variant is the DDoS attack, short for Distributed Denial-of-Service: a bunch of computers in lots of different places simultaneously send requests to the same server. Flooded with traffic, the server can’t distinguish legitimate clients from attackers, so it has no choice but to stop serving everyone.

Do not use this tutorial to perform a DDoS attack! That is bad. And probably illegal.

Make sure your program waits longer than 10 seconds before sending another request. If you don’t wait long enough, the server may think you’re trying to attack it and it may ban your IP address.

Part 1: Writing the Python Script

Checking to see if the webpage was changed

Let’s start by writing a function that checks whether the contents of a webpage have changed. The function webpage_was_changed() will be called repeatedly. It will return True only if the HTML of the webpage you are checking has changed since the last time you called it.

You can use the python requests module to send an HTTP GET request to a specified URL. If the file previous_content.txt doesn’t exist, the script creates it. Then it reads from previous_content.txt and compares the HTML stored there with the HTML response the script just received.

The headers are additional parameters attached to your request that give the server more information about its context. I specifically added 'Pragma': 'no-cache', 'Cache-Control': 'no-cache', which is important to prevent caching. Caching is when a program such as a browser (or the requests module) temporarily remembers the HTML from a webpage so that it doesn’t have to send as many HTTP requests. You can read more about HTTP caching here. I like to think of it as the difference between refreshing a page in Chrome by hitting Command+R vs. Command+Shift+R (or Control+R and Control+Shift+R on Windows).
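A minimal sketch of that function might look like this (URL_TO_MONITOR, the file name, and the no-cache headers follow the description above; adapt them to the page you care about):

```python
import os

import requests

URL_TO_MONITOR = "https://example.com"  # replace with the page you care about
PREVIOUS_CONTENT_FILE = "previous_content.txt"

# These headers ask the server for a fresh copy instead of a cached one.
HEADERS = {
    "Pragma": "no-cache",
    "Cache-Control": "no-cache",
}

def webpage_was_changed() -> bool:
    """Return True if the page's HTML differs from the copy saved on disk."""
    response = requests.get(URL_TO_MONITOR, headers=HEADERS)

    # Create the storage file on the first run.
    if not os.path.exists(PREVIOUS_CONTENT_FILE):
        open(PREVIOUS_CONTENT_FILE, "w").close()

    with open(PREVIOUS_CONTENT_FILE, "r") as f:
        previous_content = f.read()

    if previous_content == response.text:
        return False

    # The page changed: save the new HTML for the next comparison.
    with open(PREVIOUS_CONTENT_FILE, "w") as f:
        f.write(response.text)
    return True
```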

Removing <script> and <meta> tags

Some part of the webpage you’re checking may change every time you refresh the page. What if, for example, some nincompoop web designer decided to display the current time somewhere on the webpage?

You can check whether you need to do this by putting the above code into a file called monitor.py and running it in interactive mode as follows:

python3 -i monitor.py
>>> URL_TO_MONITOR = "https://paul.bitutsky.com"
>>> webpage_was_changed()
True
>>> webpage_was_changed()
False
>>> webpage_was_changed()
False
>>> webpage_was_changed()
False

The webpage_was_changed() function should always return True the first time it is called, because there’s nothing yet in the file previous_content.txt. But after the first call, it should return False. If it doesn’t, look at how previous_content.txt changes each time webpage_was_changed() gets called. If a <script> or <meta> tag changes every time you fetch new HTML, consider adding a processing function that removes those tags:
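A sketch of such a function using BeautifulSoup (the name process_html is my own — run the HTML through it before comparing and saving):

```python
from bs4 import BeautifulSoup

def process_html(html: str) -> str:
    """Strip <script> and <meta> tags so incidental changes don't trigger alerts."""
    soup = BeautifulSoup(html, "html.parser")
    # soup([...]) is shorthand for find_all; decompose() removes each tag in place.
    for tag in soup(["script", "meta"]):
        tag.decompose()
    return str(soup)
```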

You will need to install the BeautifulSoup python library by executing the command pip install beautifulsoup4. Depending on how you installed python3, you may need to use pip3 instead of pip. More info on installing python libraries with pip.

Getting Text Message Notifications

Now let’s make a function that sends a text (SMS) message. For this, we’re going to use the Twilio API. Here’s what you need to do:

  1. Create a Twilio Account.
  2. Register a phone number in your account. https://www.twilio.com/console/phone-numbers/incoming
  3. Find your Account SID and Auth Token. https://www.twilio.com/console
  4. Install the Twilio python library by running pip install twilio on the command line.

And now let’s write a function that sends a text message. The argument to the function will be the text to be sent in the message.
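A sketch of that function — the SID, token, and phone numbers below are placeholders for the values from your own Twilio console:

```python
TWILIO_ACCOUNT_SID = "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"  # from twilio.com/console
TWILIO_AUTH_TOKEN = "your_auth_token"                       # from twilio.com/console
TWILIO_FROM_NUMBER = "+15551234567"  # the number you registered with Twilio
SMS_TO_NUMBER = "+15557654321"       # your own phone number

def send_text_message(body: str) -> None:
    """Send an SMS containing `body` via Twilio."""
    # Imported here so the rest of the script loads even before twilio is installed.
    from twilio.rest import Client

    client = Client(TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN)
    client.messages.create(
        body=body,
        from_=TWILIO_FROM_NUMBER,
        to=SMS_TO_NUMBER,
    )
```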

Getting Email Notifications

There are multiple ways you can send an email using python, but I found that using the Yagmail library is the simplest option. Yagmail works by logging in to a Gmail account and sending email from that account. Before writing any code, here’s what you should do:

  1. Create a new (throw-away) gmail account and remember the username and password.
  2. Turn on less secure app access in this gmail account. Go to https://myaccount.google.com/. Click Security. Scroll down to “Less Secure App Access” and toggle “On”.
  3. Install Yagmail by running pip install yagmail on the command line. You may also need to install keyring: pip install keyring.

Here’s the code for sending an email notification.
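A sketch of that function — the account names and the function name send_email are placeholders of my own:

```python
SENDING_EMAIL_USERNAME = "throwaway.account@gmail.com"  # the account you created
SENDING_EMAIL_PASSWORD = "the-account-password"
RECEIVING_EMAIL = "you@example.com"  # where notifications should go

def send_email(subject: str, contents: str) -> None:
    """Send an email notification from the throw-away Gmail account."""
    # Imported here so the rest of the script loads even before yagmail is installed.
    import yagmail

    yag = yagmail.SMTP(SENDING_EMAIL_USERNAME, SENDING_EMAIL_PASSWORD)
    yag.send(to=RECEIVING_EMAIL, subject=subject, contents=contents)
```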

Note that it’s generally bad practice to store email passwords in plain text. One alternative to the above code is to not provide the password argument to yagmail.SMTP (e.g. yagmail.SMTP(SENDING_EMAIL_USERNAME).send(...)). However, if you don’t provide the password, you will need to call yagmail.register at some point before starting your python script to store the password for this account in your keychain. I found this approach to be problematic. What happens if you forget to run yagmail.register, or need to re-register at some point? If the yagmail library doesn’t have the password to the gmail account, it will prompt the user for it, which means the monitoring script will block waiting for input, pausing the loop that’s supposed to continuously check the webpage.

The forever loop

Here’s the part of the code that checks if the webpage was changed, and sends email and text alerts when it has.
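A sketch of the loop. I’ve factored it to take the check and the notifiers as arguments (the names check_fn and notify_fns are mine), so you’d pass in webpage_was_changed and whatever you named your SMS and email functions:

```python
import time

SLEEP_SECONDS = 600  # wait 10 minutes between checks

def monitor_loop(check_fn, notify_fns, sleep_seconds=SLEEP_SECONDS, max_checks=None):
    """Run the check forever (or max_checks times), notifying on every change."""
    checks = 0
    while max_checks is None or checks < max_checks:
        if check_fn():
            for notify in notify_fns:
                notify("The webpage has changed!")
        checks += 1
        time.sleep(sleep_seconds)

# In the real script you'd call something like (using your own function names):
# monitor_loop(webpage_was_changed, [send_text_message, send_email])
```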

Adding logging

How will you know if the script is still running? You can add a bunch of print statements, but using the python logging library is better. You can make it add the date and time to every log message too! This will be especially useful later on when we’re running this on the Raspberry Pi and will want others to be able to see if our script is still running.

An important edge case: Network issues

What happens if there’s a network connectivity issue? Answer: Python’s requests module will raise an exception, which means your loop will crash! That’s why you should add a try/except, like so:
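One way to do it is to wrap the check so a network error counts as “no change” instead of killing the loop (the wrapper name safe_webpage_check is mine; the placeholder stands in for the function from earlier):

```python
import logging

import requests

def webpage_was_changed():
    # Placeholder for the real check from earlier in the tutorial.
    ...

def safe_webpage_check() -> bool:
    """Return the check result, treating any network error as 'no change'."""
    try:
        return bool(webpage_was_changed())
    except requests.exceptions.RequestException as err:
        # Connection dropped, DNS failure, timeout, etc. Log it and move on
        # instead of letting the exception kill the forever loop.
        logging.error("Network problem while checking the page: %s", err)
        return False
```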

Tying it all together

Let’s put the forever loop into a main function, as is python best practice, and make sure that main function gets called when we execute the python script. If you put the following code into a file called monitor.py and install all of the imported libraries via pip (beautifulsoup4, twilio, yagmail), you should be able to start it by running python3 monitor.py on the command line.

Part 2: Running it on a Raspberry Pi

You can run that python script from your computer and it should work fine. But who wants to keep their computer turned on all the time? Let’s deploy this baby to a Raspberry Pi!

I’ll be using the Raspberry Pi 4 Model B because it has built-in wifi (unlike the earliest Pi models). You can use any Pi you want, as long as you’re able to connect to it via SSH and it’s able to connect to the internet. You do not need to connect the Pi to a monitor.

You need to set up your Raspberry Pi first to be able to connect to it via SSH. This handsome man I know made a tutorial on how to do that if you have a mac.

Copy your monitor.py from your computer to the Pi

You can copy files over ssh using the scp command. This should copy monitor.py to your home directory on the Pi:

scp path/to/monitor.py pi@raspberrypi.lan:monitor.py

Starting the webpage monitor

You can always start the script the simple way: SSH into the Raspberry Pi and execute python3 monitor.py (you will need to install all dependencies with pip/pip3 first!). But once the SSH connection terminates, the python script will stop running! So what you need to do is use nohup. Start the script like this instead: nohup python3 monitor.py &

The ampersand (&) means run the process in the background. If you execute the above command and then end the SSH connection, the script will not stop.

pi@raspberrypi:~ $ nohup python3 monitor.py &
[1] 1911
pi@raspberrypi:~ $ nohup: ignoring input and appending output to 'nohup.out'
pi@raspberrypi:~ $

That’s what it looks like when I run the python script with nohup. The output from the nohup command overflowed a little bit, so I just pressed enter/return to get a clean input line. The four-digit number (1911 in my case) is the process identifier of the python process running monitor.py.

You can check the output of the script by running tail nohup.out. That’s a file that nohup creates in your home directory, into which it writes everything that would normally be printed to the terminal.

Stopping the webpage monitor

You can terminate the nohup process that’s running the python script by SSH’ing into the Pi and doing the following.

pi@raspberrypi:~ $ ps ax | grep python3
1911 pts/0 S 0:02 python3 monitor.py
1919 pts/0 S+ 0:00 grep --color=auto python3
pi@raspberrypi:~ $ kill 1911

The ps command lists processes running on the system; the a and x flags together show processes from all users, whether or not they have a controlling terminal. That long list of processes is piped into the grep command, which filters out all lines that don’t contain the string “python3”. In English, the first command says “Show me all of the current processes that are related to python3”. For each process, the process identifier (pid) is listed in the leftmost column. Of the processes shown, you’ll want to kill the one that says python3 monitor.py. In my case, that’s the process with pid 1911. Run kill <pid>, replacing <pid> with the pid of the process you want to kill.

Make your logs easily accessible (so you don’t have to SSH every time)

You can easily make your logs viewable to anyone on your local network by running an apache server on your Raspberry Pi. Execute sudo apt install apache2 -y. This will install (and start) apache. You can test that it’s working by going to “raspberrypi.lan” in a browser. If you see the “Apache2 Debian Default Page” then it works!

Now, anything in the folder /var/www/html/ on the Raspberry Pi will be served as a webpage. So if you create a file /var/www/html/web_monitor.log, you should be able to access it by going to raspberrypi.lan/web_monitor.log.

Let’s do just that. You can create the file with

sudo touch /var/www/html/web_monitor.log

The touch command creates an empty file. You need to use sudo because this directory is protected, and not editable by other users. The sudo command means “As the superuser (admin/root user) do the following…”.

By default, files created by the superuser are only editable/writeable by the superuser. Let’s make this file writeable by all users:

sudo chmod o+w /var/www/html/web_monitor.log

The chmod command changes permissions. o means all other users. + means add a permission. w means the write permission. In English, “Allow all other users to write to this file.”

So now all you have to do is tell nohup to write to /var/www/html/web_monitor.log instead of nohup.out:

nohup python3 monitor.py > /var/www/html/web_monitor.log &

The “>” means redirect output to this file. You should now be able to see your logs on raspberrypi.lan/web_monitor.log.

What’s Next

That’s the end of the tutorial! Here are some things you can try on your own to improve this webpage monitor:

  • Using venv and managing python packages with a requirements.txt.
  • Making the web monitor logs accessible outside of your local network. One way to do this is to enable port forwarding.
  • Instead of using a loop with a sleep timer in python, you can have a cron job on the Raspberry Pi that activates a python script every few minutes.
  • Hashing the contents of the webpage to save space on your hard drive.
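For the hashing idea, the standard-library hashlib module is enough — you’d store and compare a short digest like this instead of the full HTML (fingerprint is a hypothetical helper name):

```python
import hashlib

def fingerprint(html: str) -> str:
    """A fixed-size SHA-256 digest of the page, instead of the whole HTML."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()
```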

Hope this was helpful!

Paul Bitutsky

Software Engineer at Google working on the Translate iOS app. Recent graduate of UC Berkeley, majoring in Computer Science and Business.