Beyond Scraping — Scheduling and Automating Web Interactions

Sam Stone
Structured Ramblings
Jul 7, 2020

You may find this useful if you’re interested in:

  • Automating interactions with 3rd-party websites that lack APIs
  • Managing recurring jobs in a web app
  • Developing with Selenium, Django, or Heroku

The story behind Booking Bot

I love tennis and every week I try to play at the same time and place on San Francisco public courts. In April 2019, SF Parks & Rec closed the Golden Gate Park tennis complex for 2 years for renovations; the closure of these 20 courts tipped the supply/demand balance and it became much harder to find an available public court.

Consequently, SF Parks & Rec released an online tennis court booking system. Each day at 8am, courts become available for booking online 7 days in advance. For example, on Wednesday, March 4 at 8am, courts became available for booking at any time on Wednesday, March 11.

So, for example, I like to play at Hamilton Rec Center each Friday at 7:30am. To reliably book that court, I have to log onto the booking system the previous Friday at 8:00am or soon after. That’s a problem because I want to be playing tennis at that time! It turns out a number of my tennis friends had the same issue, which sparked the idea for an app that could satisfy this user story:

As a user, I want to book the same tennis court at the same time each week automatically, so I don’t have to wait in line for a court and I don’t have to wake up early to book it myself.

A little bit more digging uncovered an additional complication, and a notable non-requirement. The complication: the SF court booking system has no API, so this all has to be done programmatically via GUI. The non-requirement: a user-facing UI. My friends indicated they actually preferred emailing me their recurring court booking preferences, since they didn’t change often, rather than having a new app to download and learn.

1. Making a single court booking

The Happy Path

Here’s an overview of the user interaction flow that satisfies the user story described above:

  1. Authenticate: type in username and password, click “log in”
  2. Search for desired date: enter month, year, and date, click “search”
  3. Select all the courts at the desired location on the relevant day
  4. Check booking availability: loop through selected courts to see if desired time is available
  5. Finalize the booking: click into the desired time at the desired court; save a screenshot of the confirmation page.
  6. Send user confirmation email, or if the booking failed, an email explaining why it failed.
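
Step 4 is where the flow branches between success and failure. Here is a minimal sketch of that check, assuming the availability read from the booking pages has already been reduced to a mapping from court number to the set of open start times. That mapping and the function name are hypothetical, not the booking site's actual format or the app's real code:

```python
from datetime import time
from typing import Dict, Optional, Set


def find_available_court(
    availability: Dict[int, Set[time]], desired_time: time
) -> Optional[int]:
    """Return the lowest-numbered court open at desired_time, or None."""
    for court_number in sorted(availability):
        if desired_time in availability[court_number]:
            return court_number
    # No court is free at that time: the caller should report a failed booking.
    return None


# Hypothetical availability for one location on one day.
open_slots = {
    1: {time(9, 0), time(10, 30)},
    2: {time(7, 30), time(9, 0)},
    3: {time(7, 30)},
}

court = find_available_court(open_slots, time(7, 30))
print(court if court is not None else "no courts available")
```

Returning None rather than raising keeps the "no courts available" case an ordinary outcome that step 6 can turn into a failure email.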

Implementation via Selenium

I built a script that followed this user interaction path with Selenium. Selenium is a library that allows you to launch a headless browser (in this case, I chose Chrome), navigate to a website, and manipulate the site’s DOM via code. It’s often used for UI testing, but it can be used for many other things, in this case interacting with a 3rd-party application that lacks an API.

Two things are especially nice about Selenium:

  • You can interact with the DOM in the language of your choice: you don’t have to use JavaScript! I used Python, because that’s the language with which I’m most familiar, but Selenium also supports Java, C#, Ruby, Kotlin, and JavaScript.
  • Selenium has an awesome IDE (with a browser plugin) where you can manually navigate the DOM, record your actions, and then review code that replicates those actions (it’s like the macro recorder in Excel). Here’s an IDE screenshot after recording the interactions for authenticating at the booking reservation site:

I recorded myself doing steps 1–5 using the Selenium IDE, exported into Python, and then turned that into cleaned-up code that could accept different inputs (e.g. court locations, times) and handle errors (e.g. no courts available).

At the end of this stage, I had a Python script with the function make_booking(user, court_location, datetime) which would launch a browser, make the booking, and send the user a confirmation email.

2. Putting the Selenium job into a web app

Setting up data structures

I wanted to be able to run the make_booking( ) function with different inputs, persist the inputs and outputs, and expose a UI for viewing and changing the inputs and outputs. The easiest path for me to do so was to place my script into a Django web app. I chose Django because it was the web app framework with which I was most familiar, it was designed for Python, and it had good out-of-the-box functionality for:

  • Standing up a database with an ORM
  • Making data viewable and editable
  • Running recurring jobs
  • Sending emails

Here are the data structures, aka models in Django terminology (and the associated code):

  • Django comes with a built-in User model, but to append fields to the user, we need to create a UserProfile model, mapped one-to-one to User.
  • A Booking is a specific, real-world instance of a BookingParameter. A Booking has a status: pending (the app has created the booking, but not yet attempted to book it), succeeded, or failed. If it succeeds, it will have a court number, a confirmation number, and a file path to the confirmation screenshot within the app’s directory. If it fails, we record the failure reason.
  • A CourtLocation, e.g. “Dolores Park”, can have multiple courts (Dolores Park has 6). So when a booking is successful, we need to record which specific court at the CourtLocation is booked (e.g. court #1, court #2, etc.).
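
To make the relationships concrete, here is a rough sketch using plain Python dataclasses rather than Django models, so it runs without a Django project. The field names, the `active` flag, and the helper methods are assumptions for illustration, not the app's actual schema:

```python
from dataclasses import dataclass
from datetime import date, time
from enum import Enum
from typing import Optional


class BookingStatus(Enum):
    PENDING = "pending"      # created, but not yet attempted
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class CourtLocation:
    name: str
    num_courts: int


@dataclass
class BookingParameter:
    """A user's standing weekly request (hypothetical fields)."""
    username: str
    location: CourtLocation
    day_of_week: int  # 0 = Monday ... 6 = Sunday, matching date.weekday()
    start_time: time
    active: bool = True


@dataclass
class Booking:
    """One real-world instance of a BookingParameter on a specific date."""
    parameter: BookingParameter
    booking_date: date
    status: BookingStatus = BookingStatus.PENDING
    court_number: Optional[int] = None
    confirmation_number: Optional[str] = None
    screenshot_path: Optional[str] = None
    failure_reason: Optional[str] = None

    def mark_succeeded(self, court_number: int, confirmation_number: str,
                       screenshot_path: str) -> None:
        # The booked court must exist at this location.
        if not 1 <= court_number <= self.parameter.location.num_courts:
            raise ValueError(f"court #{court_number} does not exist")
        self.status = BookingStatus.SUCCEEDED
        self.court_number = court_number
        self.confirmation_number = confirmation_number
        self.screenshot_path = screenshot_path

    def mark_failed(self, reason: str) -> None:
        self.status = BookingStatus.FAILED
        self.failure_reason = reason
```

In a Django app, each dataclass would be a `models.Model` subclass and the references between them would be foreign keys.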

I configured Django to use SQLite (its default), but it’s easy to move to a more robust database like Postgres.

Making data viewable and editable

Django’s admin functionality makes it easy to view and manipulate data:

And here’s what the Django Admin UI looks like for the Booking page:

Preparing jobs to be run automatically

Django has nice built-in functionality for creating commands that can be run from the command line, by calling $ python manage.py command_name. Two commands (scripts) need to run each morning when new bookings become available; both are located in this directory:

  • Create_pending_bookings.py: look ahead 7 days, identify all active BookingParameter objects with DayOfWeek on that day, and create a new Booking object for each one, with status=Pending
  • Execute_pending_bookings.py: for each pending Booking, run make_booking( ) and write the results to the Booking object.
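
The core logic of the two jobs can be sketched without Django. The version below is illustrative only: `Param`, `PendingBooking`, and the `book` callable are hypothetical stand-ins for the BookingParameter model, the Booking model, and make_booking(), and statuses are plain strings for brevity:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List, Optional


@dataclass
class Param:
    user: str
    day_of_week: int  # 0 = Monday ... 6 = Sunday, matching date.weekday()
    active: bool = True


@dataclass
class PendingBooking:
    param: Param
    booking_date: date
    status: str = "pending"
    failure_reason: Optional[str] = None


def create_pending_bookings(params: List[Param], today: date,
                            lookahead_days: int = 7) -> List[PendingBooking]:
    """Create one pending booking per active parameter matching the target day."""
    target = today + timedelta(days=lookahead_days)
    return [
        PendingBooking(param=p, booking_date=target)
        for p in params
        if p.active and p.day_of_week == target.weekday()
    ]


def execute_pending_bookings(bookings: List[PendingBooking],
                             book: Callable[[PendingBooking], None]) -> None:
    """Attempt each pending booking and record the outcome on the object."""
    for booking in bookings:
        if booking.status != "pending":
            continue
        try:
            book(booking)  # stand-in for the Selenium-driven booking
            booking.status = "succeeded"
        except Exception as exc:
            # One failed booking should not stop the rest of the batch.
            booking.status = "failed"
            booking.failure_reason = str(exc)
```

Passing the booking function in as an argument makes it possible to exercise the batch logic with a fake that never launches a browser.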

At the end of this stage, I had an app I could run locally which allowed me to (1) make user bookings in batch from the command line, and (2) add, update, and delete user data via Admin.

3. Deploying the app and running regular jobs

An app that only runs when your laptop is open, and that requires command-line prompts at 8am each morning, still isn’t very useful. The next step was to deploy this app, and set up recurring jobs to make the right bookings automatically each day.

Deployment

I deployed via Heroku because (1) it’s free for small projects and (2) it has a simple UI for managing recurring jobs. Deploying via Heroku required:

  1. Create a Heroku account
  2. Install the Heroku CLI tool

$ brew install heroku/brew/heroku

  3. Log in via CLI

$ heroku login

  4. Connect my booking_bot git repo to Heroku

$ heroku create

  5. Add 2 files to the repo:

  6. Deploy!

$ git push heroku master

Recurring Jobs

Using the Heroku dashboard, I selected “Resources” and added “Heroku Scheduler” using “Find more add-ons.” I then added the 2 command-line prompts discussed above. Note that Heroku defaults to UTC, so I entered the schedule times in UTC: the app creates new pending bookings at 7:00am Pacific Time each day, and then attempts to execute (make) all pending bookings at 8:00am Pacific Time each day.
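
One wrinkle with a UTC schedule is that Pacific Time's offset from UTC changes with daylight saving, so a fixed UTC slot shifts by an hour in local time twice a year. Here is a small sketch of the conversion using the standard-library zoneinfo module (Python 3.9+). The function name is hypothetical and this is not part of the original app:

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def pacific_to_utc(day: date, local_time: time) -> datetime:
    """Return the UTC instant for a Pacific wall-clock time on a given day."""
    local = datetime.combine(day, local_time, tzinfo=PACIFIC)
    return local.astimezone(timezone.utc)


# 8:00am Pacific in summer (PDT, UTC-7) and in winter (PST, UTC-8).
print(pacific_to_utc(date(2020, 7, 7), time(8, 0)).strftime("%H:%M UTC"))  # 15:00 UTC
print(pacific_to_utc(date(2020, 1, 7), time(8, 0)).strftime("%H:%M UTC"))  # 16:00 UTC
```

A job pinned to 15:00 UTC therefore fires at 8:00am Pacific in summer but 7:00am in winter, so the scheduled times need revisiting when the clocks change.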

4. Metrics and monitoring

Good products need good monitoring; good monitoring reflects the nuances of the product and use case. Booking Bot is different from a typical app with user-initiated events because it only runs 2 jobs per day and is otherwise dormant. So it needed monitoring that was triggered by job attempts, rather than user interactions. I built a lightweight reporting function, set to run 1x per day after the other jobs had run, which would:

  • Summarize the number of bookings made vs attempted
  • Give views for the past 1 day, 7 days, and all time
  • Send all this to me each morning via email
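
A summary along these lines could be computed as in the sketch below. It assumes each booking attempt has been flattened to a (timestamp, succeeded) pair, which is a hypothetical simplification of the Booking records, and it leaves the emailing step out:

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional, Tuple

Attempt = Tuple[datetime, bool]  # (attempted_at, succeeded)

# None means "no cutoff", i.e. all time.
WINDOWS: Dict[str, Optional[timedelta]] = {
    "past 1 day": timedelta(days=1),
    "past 7 days": timedelta(days=7),
    "all time": None,
}


def summarize(attempts: Iterable[Attempt],
              now: datetime) -> Dict[str, Tuple[int, int]]:
    """Return {window label: (bookings made, bookings attempted)}."""
    attempts = list(attempts)  # allow several passes over any iterable
    summary = {}
    for label, span in WINDOWS.items():
        outcomes = [ok for ts, ok in attempts
                    if span is None or now - ts <= span]
        summary[label] = (sum(outcomes), len(outcomes))
    return summary


def format_summary(summary: Dict[str, Tuple[int, int]]) -> str:
    """Render the summary as plain text suitable for an email body."""
    return "\n".join(
        f"{label}: {made} of {attempted} bookings made"
        for label, (made, attempted) in summary.items()
    )
```

Keeping the counting separate from the formatting means the same numbers could later feed a different report without changing the query logic.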

I did this by creating another Django management command summarize_bookings.py and adding

python manage.py summarize_bookings 

to the Heroku Scheduler. Here’s what it generates:

Reflections

After operating this app for a few months for myself and a few friends, I came away with a few product observations:

  1. Email can be a killer UI for a v0 app. Users check their email; it doesn’t require a change in behavior and it helps promote adoption. I’m glad I didn’t build a user-facing UI, and instead interacted with users purely through email.
  2. I didn’t do enough planning up front and relied on some bad assumptions. For example, I assumed I would run scheduled jobs in production the same way I ran scheduled jobs locally. I invested a bunch of time getting scheduled jobs to run locally using crontab, only to discover the default Heroku environment lacks crontab. But it turns out Heroku has an awesome Heroku Scheduler UI, which I found much easier to use than crontab! I could have saved myself a lot of time by pinning down the details of the later development stages earlier in my planning process.
