Scraping the Surface of Airtable (Part 1)

Recently, the company I’m working at started using and tinkering with this Google Sheets-like relational database service (to be honest, I think this description may be misleading; it will do you better to just check out the product here). After tinkering around with it, I wondered if I could scrape data from Airtable without using the API, relying only on the view sharing feature.

But before I continue, a disclaimer:

  1. This post is not sponsored by Airtable, and I’m not promoting it. I’m just another user who is excited about the product and interested in making the most out of it.
  2. This is not a tutorial; it is a journey of how I explored web scraping, so it may not be a good reference on how to do things.

Introduction

Lots of templates, from personal use to professional project management at work

For now, let’s pick a simple book tracker as an example.

After you open the template in your workspace, you will be greeted with the view inside your base, which in this case consists of two tables (two tabs): one for Books and one for Authors. Each table contains records (rows), and each record has one primary field (the leftmost column) and multiple other fields (the rest of the columns).

For example, our Books table has 5 records whose primary field is the book name, and each record has 6 fields (excluding the primary field). Okay, now that we’ve got some basics covered, I believe it is better for you to try it out yourself, since this post is not about introducing how to work with Airtable, but about how I want to scrape data from it and my progress so far.
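To make that structure concrete, here is a minimal sketch of how I picture a base in plain Python; the tables, records, and field names below are just illustrative guesses based on the book tracker template, not Airtable’s actual schema or data:

```python
# Rough mental model of an Airtable base: tables -> records -> fields.
# All names and values below are made up for illustration.
base = {
    "Books": [
        {
            "Name": "The Hobbit",            # primary field (leftmost column)
            "Author": ["J. R. R. Tolkien"],  # linked record pointing to Authors
            "Status": "Read",
            "Rating": 5,
        },
        # ... more records (rows) would follow here
    ],
    "Authors": [
        {"Name": "J. R. R. Tolkien", "Books": ["The Hobbit"]},
    ],
}

print(base["Books"][0]["Name"])  # -> "The Hobbit"
```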

What I Want

So I thought of something: how about I just aggregate the data from each base? There must be an API for that.

Well, guess what? Yes, there’s an API for it, and there are libraries for that API in both JavaScript and Python. But to use the API I would need to be invited to each base, and given the number of teams I monitor, that might use up my entire free plan quota (CMIIW, I haven’t really tried it; to be honest I just needed an excuse to try web scraping). So since I am a cheapskate and I want to learn scraping anyway, I’ll just make it an exploration and a chance to learn.
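For reference, and assuming you do have an invite to the base and an API key, the official route looks roughly like this with plain requests in Python (the base ID, table name, and key below are placeholders):

```python
import requests

# Sketch of the official Airtable REST API route I'm trying to avoid.
# BASE_ID, TABLE_NAME, and API_KEY are placeholders for your own values.
BASE_ID = "appXXXXXXXXXXXXXX"
TABLE_NAME = "Books"
API_KEY = "keyXXXXXXXXXXXXXX"

resp = requests.get(
    f"https://api.airtable.com/v0/{BASE_ID}/{TABLE_NAME}",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
resp.raise_for_status()

# Each record comes back with an id and a "fields" dict of column values.
for record in resp.json()["records"]:
    print(record["fields"].get("Name"))
```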

By the way, I got my motivation from a Google Sheets extension called Airtable Importer. Since it has been done before, I should be able to do it too, right? (Well, at least I hope so.)

Let’s Start!

0. My Plan

This is what I want to achieve

It should be easy, right?

1. Searching the data

The shared view in Chrome

Since it is on the web and I can access it with a browser, I should be able to just fetch the data from the HTML, right?

Turns out I was too naive. Using Chrome DevTools, I found out that:

The complex structure of the HTML

The data exists! That means the data is there in the HTML, so I could scrape it directly by following the HTML structure. But after giving it more thought, I assume the data must be sent from the back-end, and there’s JavaScript that selects the div and populates it with the table data. Maybe I could get the data in a better format if I can grab it before it populates the table.
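A quick way to test that assumption is to fetch the shared view URL without a browser and check whether a known record name appears in the raw response; if it doesn’t, the table is being filled in by JavaScript after the page loads. The URL and the record name below are placeholders for your own shared view:

```python
import requests

# Sanity check: is the table data present in the raw HTML response,
# or is it only rendered later by JavaScript in the browser?
SHARED_VIEW_URL = "https://airtable.com/shrXXXXXXXXXXXXXX"  # your "share view" link

html = requests.get(SHARED_VIEW_URL).text

# True  -> the record text is in the raw HTML and could be scraped directly.
# False -> the data is injected later by JavaScript from some back-end request.
print("The Hobbit" in html)
```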

Here I’ve got two choices:

  1. Scrape the data from HTML
  2. Try to get the data from HTTP request

For now, let’s try the second option first, since it sounds cooler.

2. Finding the Request

There’s a moment when the content suddenly gets populated

If you look closely, there’s a request that happens right before the content is loaded.

THE DATA

And there it is, the data I need!
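The natural next step is to copy that request’s URL and headers from the Network tab (“Copy as cURL” in Chrome makes this easy) and try replaying it outside the browser. A minimal sketch with requests, using placeholder values rather than the real endpoint:

```python
import requests

# Replay the request seen in the Network tab. REQUEST_URL and HEADERS are
# whatever Chrome shows for that request; the values here are placeholders.
REQUEST_URL = "https://airtable.com/...?...&requestId=reqXXXXXXXXXXXXXX"
HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "X-Requested-With": "XMLHttpRequest",
}

resp = requests.get(REQUEST_URL, headers=HEADERS)
print(resp.status_code)
print(resp.text[:500])  # peek at the payload to see if the table data is there
```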

3. Pulling the Request?

Turns out, no. After reloading the page several times, I found that the requestId part of the URL changes each time. After a quick search on the matter, I found that the requestId is a unique code generated by the service for each request (CMIIW). For now, I’ve hit a wall here. Sooo I will step back and try the other method: scraping the HTML.

Since the post has gotten quite long already, I will see you in part 2, where I will try to scrape the data from the HTML.
