Web Crawling with Scrapy
In data analytics, the most important resource is the data itself. Since web crawling means “programmatically going over a collection of web pages and extracting data”, it is a useful way to collect data when no official API is available.
In this article, we will go through the following topics:
- Setting up Scrapy
- Crawling data from web pages
- Dealing with infinite scrolling pages
Setting up Scrapy
Scrapy is a powerful Python library for web crawling. To install it, run the following in the command line:
pip install scrapy
Our goal
In this article, we will use Yummly as an example. Our goal is to download the ingredients of each recipe for later text-mining use (see the related Kaggle competition). Now it’s time to create our spiders :)
Create our first Spider
Create a Python file called crawler.py:

import scrapy

class RecipeSpider(scrapy.Spider):
    name = "recipe_spider"
    start_urls = ["https://www.yummly.com/recipes"]
Here we create a class that inherits from scrapy.Spider. (In the library, Spider already defines the machinery for following links and scraping data.) We need to give…