Web Scraping Football Data — Serverless Edition

Collect data wherever you want, whenever you want

Sergi Lehkyi
Geek Culture


When you want to start a project with data, the first concern is getting the data. Once you have it you can build fancy graphs, pie charts and ML models, but before that, please be a nice guy/gal and collect the data.

In one of my previous articles I already explained the process of scraping data from a website: which libraries to use, how to navigate through the tags, etc. You can find that article here.

In this one I would like to take a step forward and improve the scraping process by automating it and moving it to the cloud. We will use AWS Lambda and AWS S3 to achieve that.

So the idea is to write a Lambda function that runs once a day, collects the latest data and adds it to the dataset we maintain in S3. Pretty easy, pretty straightforward.

For this we will need to configure: the Lambda function code, a cron trigger, an IAM role and policy with write permissions, and an S3 bucket.
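To make the idea concrete, here is a minimal sketch of what such a Lambda could look like, assuming the dataset is kept as a single CSV object in S3. The bucket name, object key, URL and HTML selectors below are all placeholders and would need to be adapted to the actual site and dataset.

```python
import io

import boto3
import pandas as pd
import requests
from bs4 import BeautifulSoup

BUCKET = "football-data-scraping"  # placeholder bucket name
KEY = "matches.csv"                # placeholder object key

s3 = boto3.client("s3")


def scrape_latest_results():
    """Scrape the latest results; the URL and selectors are placeholders."""
    html = requests.get("https://example.com/football/results").text
    soup = BeautifulSoup(html, "html.parser")
    rows = [
        [cell.get_text(strip=True) for cell in row.find_all("td")]
        for row in soup.select("table.results tr")[1:]  # skip header row
    ]
    return pd.DataFrame(rows, columns=["date", "home", "away", "score"])


def lambda_handler(event, context):
    # Load the dataset we maintain in S3 (assumes the object already exists).
    existing = pd.read_csv(
        io.BytesIO(s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read())
    )
    # Append the newly scraped rows and write the file back.
    updated = pd.concat([existing, scrape_latest_results()], ignore_index=True)
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=updated.to_csv(index=False).encode("utf-8"))
    return {"rows": len(updated)}
```

Keep in mind that requests, BeautifulSoup and pandas are not part of the default Lambda Python runtime, so they would have to be shipped in the deployment package or as a layer; boto3 is available out of the box.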

S3

Let's start with the most basic and easiest part. If you want to save data in the cloud you have to set up some storage solution; in AWS that is S3. In our case, we just have to create a bucket where we are going to upload the scraped data. This is how my…
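The bucket can be created in the AWS console in a couple of clicks, or programmatically. As a rough sketch with boto3 (the bucket name and region are placeholders; bucket names must be globally unique):

```python
import boto3

# Region and bucket name are placeholders for this example.
s3 = boto3.client("s3", region_name="eu-west-1")

s3.create_bucket(
    Bucket="football-data-scraping",
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)
```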
