
Let’s Parse the Web

We create a small web app (in Flask) that serves some data on the food habits of different countries

Nil Madhab · Published in webtutsplus · Dec 11, 2020 · 9 min read


Introduction

The web is full of valuable data (mostly public) that can be used for research or other purposes. Now, in the age of AI and Machine Learning, data is more valuable than ever before! But most of this data is meant for humans to read (i.e. it is presented as HTML) and is not available in formats that are easier for computers to parse (e.g. XML, JSON, or CSV).

To collect this data for our purposes, we use scrapers that “scrape” the web pages we are interested in. A scraper simply fetches the page, parses the HTML, and extracts the data (text, images, links, etc.) from it. We define hierarchical paths (XPath) or simply use CSS selectors to identify which parts of the HTML we are interested in.
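As a quick illustration, here is a small sketch of how CSS selectors and XPath pull values out of an HTML snippet, using the parsel library (the selector library that Scrapy uses under the hood). The HTML and the selectors are purely illustrative:

```python
# Illustrative only: extracting data from an HTML snippet with CSS selectors
# and XPath, using parsel (the selector library that powers Scrapy).
from parsel import Selector

html = """
<html>
  <body>
    <h1>Food habits</h1>
    <ul id="countries">
      <li><a href="/italy">Italy</a>: <span class="food">pasta</span></li>
      <li><a href="/japan">Japan</a>: <span class="food">sushi</span></li>
    </ul>
  </body>
</html>
"""

sel = Selector(text=html)

# CSS selectors: the country names and the foods
countries = sel.css("#countries li a::text").getall()      # ['Italy', 'Japan']
foods = sel.css("#countries li span.food::text").getall()  # ['pasta', 'sushi']

# XPath: the href attribute of each country link
links = sel.xpath('//ul[@id="countries"]/li/a/@href').getall()  # ['/italy', '/japan']

print(list(zip(countries, foods, links)))
```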

In this lesson, we will create a small web app (in Flask) that serves some data on the food habits of different countries in HTML, and we will write a spider (or crawler, if you will) for it using the Scrapy framework.
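To make this concrete, here is a minimal sketch of what such a Flask app could look like. The route, the sample data, and the HTML structure are assumptions for illustration, not necessarily the exact app we build in this lesson:

```python
# Minimal sketch of a Flask app that renders food-habit data as HTML,
# so a scraper has something to parse. Data and markup are illustrative.
from flask import Flask

app = Flask(__name__)

# Illustrative data; the real lesson may use a different or larger data set
FOOD_HABITS = {
    "Italy": "pasta",
    "Japan": "sushi",
    "India": "dal and rice",
}


@app.route("/")
def index():
    items = "".join(
        f'<li><a href="/{country.lower()}">{country}</a>: '
        f'<span class="food">{food}</span></li>'
        for country, food in FOOD_HABITS.items()
    )
    return (
        "<html><body><h1>Food habits</h1>"
        f'<ul id="countries">{items}</ul>'
        "</body></html>"
    )


if __name__ == "__main__":
    app.run(debug=True)  # serves on http://localhost:5000 by default
```

Running this file and opening http://localhost:5000/ would show the HTML list that our spider will later scrape.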

Why Scrapy? (Optional, you can skip)

Scrapy is the most popular Python framework for writing web crawlers. Even Google uses Scrapy to…
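As a quick preview, a spider for the hypothetical Flask page above could look roughly like this. The spider name, start URL, and selectors are illustrative assumptions, not the final spider from this lesson:

```python
# Illustrative sketch of a Scrapy spider that crawls the hypothetical Flask
# app above; the name, URL, and selectors are assumptions for this example.
import scrapy


class FoodHabitsSpider(scrapy.Spider):
    name = "food_habits"
    start_urls = ["http://localhost:5000/"]  # assumed local address of the Flask app

    def parse(self, response):
        # Each <li> under the #countries list holds one country and its food
        for item in response.css("#countries li"):
            yield {
                "country": item.css("a::text").get(),
                "food": item.css("span.food::text").get(),
            }
```

Saved as, say, food_spider.py (a hypothetical file name), it could be run with `scrapy runspider food_spider.py -o food.json` to dump the scraped items to a JSON file.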
