Recently the company I’m working in started using and tinkering with this google sheets like relational database service (to be honest I think this description may be misleading, it will do you better to just checkout the product here. After tinkering around with it I’m wondering if I can just scrape data from airtable without the use of API and just the feature of view sharing.
But, before I continue, Disclaimer :
- this post is not sponsored by airtable or me promoting it. I’m just another user who is excited with certain product, and interested in make the most out of it.
- It is not a Tutorial, it is a journey of how I explore web scraping. So it may not be a good reference on how to do things.
First let me introduce you to Airtable. In Airtable you can make a base which analogues to worksheet file in Google Sheet. And fortunately Airtable offers lots of template for us to start with.
For now let us pick a simple book tracker just for example.
After you open the template on your workspace, you will be greeted with the view inside your base which consists of two tables in this case (two tabs). i.e. table for Books and for Authors. In each table there are records (row) and for each records you can have one primary field (the leftmost column) and multiple fields (the rest of the column)
For example in our Books Table there are 5 records with primary field of book names, and each has 6 fields (excluding the primary field).Okay now We’ve got some basic covered, I believe it is better for you to try it out yourself since this post is not for introducing how to work with airtable, but how I want to scrape data from it and my progress on it.
What I Want
Actually what I want to do is pretty simple. We have multiple team, and each has a base with the same format and template. And every now and then I need to keep track of what’s happened in just one table of each base that every team has. And also I need to make a summary from them. And to be honest checking each of them is such a pain.
So I think of something, how about I just aggregate the data from each base? There must be an API for it.
Btw I get my motivation from a google sheet extension called airtable importer, since it has been done before, I should be able to do it right? (well at least I hope so)
0. My Plan
Well in short I want to be able to copy the shared view link for a table and have a program to extract the data I want and make an aggregate of the data in a spreadsheet so that I won’t be bothered by the free plan limitation of airtable.
It should be easy, right?
1. Searching the data
This is the view of what I want to scrape, you also can access it here
Since it is on the web, and I could access it with browser, I could just fetch the data from the HTML right?
Turns out I am to naive for this, using chrome dev tools, I found out that:
Here I’ve got two choices:
- Scrape the data from HTML
- Try to get the data from HTTP request
For now let’s try the second option first, since it sounds cooler
2. Finding the Request
After looking closely on how the page load, I guess I could find out how the data is loaded into the table. So I used Network tab in Chrome DevTools and… There!
If you look closely there’s this request that’s happened right before the content is loaded.
And there it is the data I need!
3. Pulling the Request ?
So how should I get it, I could just copy pasting it to postman and it will works right?
Turns out no. After reloading the page several times I found out that the RequestId part of the URL is changing each time. After a quick search on the matter I found out that RequestID is a truly unique code generated by the service (CMIIW). And for now I hit a wall here. Sooo I will step back and try the second method, by scraping the html.
Since the post got quite long in the progress, I will see you in part 2 on how I will try to scrape the data from HTML.