Scrapy Tutorial — Part 5
How to deploy a Scrapy spider to production
This blog post is part of a tutorial series: PART 1, PART 2, PART 3, PART 4, PART 5
Running Scrapy spiders on your local machine is very convenient for the (early) development stage, but much less so when you need to run long-running spiders or keep spiders running continuously in production. This is where solutions for deploying Scrapy spiders come in.
Popular choices for deploying Scrapy spiders are:
- Scrapyd (open source) — 🆓
- Zyte Scrapy Cloud (cloud-based) — 💰
Let us look at how we can deploy the spider using Scrapyd.
Deploying to a Scrapyd Server
Scrapyd is an open source application for running Scrapy spiders. It provides a server with an HTTP API that can run and monitor Scrapy spiders.
To deploy spiders to Scrapyd, you can use the scrapyd-deploy tool provided by the scrapyd-client package. Please refer to the scrapyd-deploy documentation for more information.
Scrapyd is maintained by some of the Scrapy developers.
Install the packages
pip install scrapyd
pip install scrapyd-client
Start the Scrapyd server
$ scrapyd
You should see Scrapyd's startup log in your terminal.
If you open http://localhost:6800/ in your browser, you should see the Scrapyd web console.
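Besides the browser, you can check the server through its HTTP API. Below is a minimal sketch using only Python's standard library. It assumes Scrapyd is at its default address, http://localhost:6800/, and calls the daemonstatus.json endpoint, which answers with a small JSON object whose status field is "ok" when the daemon is healthy. The helper names are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: Scrapyd listens on its default address, as in this tutorial.
SCRAPYD_URL = "http://localhost:6800/"


def endpoint_url(base_url: str, endpoint: str) -> str:
    """Join the server's base URL with an API endpoint such as 'daemonstatus.json'."""
    # A trailing slash makes urljoin append the endpoint instead of
    # replacing the last path segment of the base URL.
    if not base_url.endswith("/"):
        base_url += "/"
    return urljoin(base_url, endpoint)


def daemon_status(base_url: str = SCRAPYD_URL, timeout: float = 5.0) -> dict:
    """Return the parsed JSON from Scrapyd's daemonstatus.json endpoint."""
    with urlopen(endpoint_url(base_url, "daemonstatus.json"), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_up(base_url: str = SCRAPYD_URL) -> bool:
    """True if the server answers daemonstatus.json with status 'ok'."""
    try:
        return daemon_status(base_url).get("status") == "ok"
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all OSErrors.
        return False
```

With the server from the previous step running, is_up() should return True; after stopping scrapyd it returns False instead of raising.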
Now that the Scrapyd server is running, we can deploy our spider to it with the following steps.
Update quotesspider/scrapy.cfg to the following.
scrapy.cfg is the deploy configuration file: it describes where and how your project is going to be deployed.
[settings]
default = quotesspider.settings

[deploy:local]
# where your Scrapyd server is running
url = http://localhost:6800/
# project name
project = quotesspider
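To see what a config parser actually reads from this file, here is a minimal sketch using Python's standard configparser. It assumes scrapyd-deploy reads scrapy.cfg with configparser's default settings, as Scrapy itself does. It lists the [deploy:<target>] sections and also shows that, with those defaults, a # comment written on the same line as a value becomes part of the value, which is why comments are safest on their own lines. deploy_targets is a hypothetical helper name.

```python
import configparser

# scrapy.cfg for this tutorial, with comments on their own lines.
SCRAPY_CFG = """
[settings]
default = quotesspider.settings

[deploy:local]
# where your Scrapyd server is running
url = http://localhost:6800/
# project name
project = quotesspider
"""


def deploy_targets(cfg_text: str) -> dict:
    """Map each [deploy:<target>] section to its url and project options."""
    parser = configparser.ConfigParser()  # default: no inline comment stripping
    parser.read_string(cfg_text)
    targets = {}
    for section in parser.sections():
        if section.startswith("deploy:"):
            name = section.split(":", 1)[1]
            targets[name] = {
                "url": parser.get(section, "url", fallback=None),
                "project": parser.get(section, "project", fallback=None),
            }
    return targets
```

deploy_targets(SCRAPY_CFG) returns one target, local, pointing at http://localhost:6800/ for project quotesspider; that target name is what you pass to scrapyd-deploy.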
Now run:
$ cd quotesspider
# format: scrapyd-deploy <target> -p <project>
$ scrapyd-deploy local -p quotesspider
This will eggify your project and upload it to the target. If you have a setup.py file in your project, it will be used; otherwise one will be created automatically. If the deploy succeeds, scrapyd-deploy prints the server's JSON response, which should contain "status": "ok".
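To double-check the deploy from code, you can ask Scrapyd which spiders it found in the uploaded project. A minimal sketch with Python's standard library, assuming the default server address; it calls Scrapyd's listspiders.json endpoint, which reports the spiders in the latest version of a project. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen


def listspiders_url(base_url: str, project: str) -> str:
    """Build the GET URL for Scrapyd's listspiders.json endpoint."""
    if not base_url.endswith("/"):
        base_url += "/"
    return urljoin(base_url, "listspiders.json") + "?" + urlencode({"project": project})


def deployed_spiders(base_url: str, project: str, timeout: float = 5.0) -> list:
    """Return the spider names Scrapyd reports for the latest version of `project`."""
    with urlopen(listspiders_url(base_url, project), timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    if payload.get("status") != "ok":
        # Scrapyd reports failures as {"status": "error", "message": ...}
        raise RuntimeError(payload.get("message", "Scrapyd returned an error"))
    return payload.get("spiders", [])
```

If the deploy worked, deployed_spiders("http://localhost:6800/", "quotesspider") should include "quotes".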
Now you can schedule a run of the spider:
# format: scrapyd-client schedule -p <project_name> <spider_name>
$ scrapyd-client schedule -p quotesspider quotes
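Under the hood, scheduling is a POST to Scrapyd's schedule.json endpoint with project and spider form fields (the Scrapyd docs show the same call as curl http://localhost:6800/schedule.json -d project=... -d spider=...). A minimal sketch of that request with Python's standard library, assuming the default server address; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen


def schedule_request(base_url: str, project: str, spider: str) -> Request:
    """Build a form-encoded POST request for Scrapyd's schedule.json endpoint."""
    if not base_url.endswith("/"):
        base_url += "/"
    body = urlencode({"project": project, "spider": spider}).encode("ascii")
    return Request(urljoin(base_url, "schedule.json"), data=body, method="POST")


def schedule(base_url: str, project: str, spider: str, timeout: float = 10.0) -> str:
    """Schedule a spider run and return the job id Scrapyd assigns to it."""
    with urlopen(schedule_request(base_url, project, spider), timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    if payload.get("status") != "ok":
        raise RuntimeError(payload.get("message", "Scrapyd returned an error"))
    return payload["jobid"]
```

schedule("http://localhost:6800/", "quotesspider", "quotes") should have the same effect as the scrapyd-client command and return the new job's id, which is handy when you want to trigger runs from another Python program.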
If you now go to http://localhost:6800/jobs, you can see that our spider is running.
From the same page, you can open the job's log.
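The jobs page is backed by Scrapyd's listjobs.json endpoint, which returns pending, running and finished lists of jobs (each with an id and a spider name). A minimal sketch, assuming the default server address and Scrapyd's default logs_dir, so that each job's log is served under logs/<project>/<spider>/<job id>.log; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen


def _base(base_url: str) -> str:
    """Ensure a trailing slash so urljoin appends paths to the base URL."""
    return base_url if base_url.endswith("/") else base_url + "/"


def list_jobs(base_url: str, project: str, timeout: float = 5.0) -> dict:
    """Fetch Scrapyd's listjobs.json for a project (pending/running/finished lists)."""
    url = urljoin(_base(base_url), "listjobs.json") + "?" + urlencode({"project": project})
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def log_url(base_url: str, project: str, spider: str, job_id: str) -> str:
    """URL of a job's log file, assuming Scrapyd's default logs_dir layout."""
    return urljoin(_base(base_url), f"logs/{project}/{spider}/{job_id}.log")


def running_job_logs(base_url: str, jobs: dict, project: str) -> list:
    """Log URLs for every job listed under 'running' in a listjobs.json response."""
    return [
        log_url(base_url, project, job["spider"], job["id"])
        for job in jobs.get("running", [])
    ]
```

For example, running_job_logs(url, list_jobs(url, "quotesspider"), "quotesspider") with url = "http://localhost:6800/" gives the log links for the quotes run you just scheduled.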
Whoooo!! We have deployed our spider. You can run Scrapyd anywhere (for example, on a cloud server): just replace the url in scrapy.cfg with that server's address and deploy the same way.
That’s a wrap!!
Happy Scraping!! 🕷
Please leave a comment if you face any issues.