PySpider — Part 1

A powerful web crawler

Vorathep Sumetphong
2 min read · Feb 6, 2018

I am writing this blog series to document and share my experience using PySpider.

Digital Spider

What is PySpider?

PySpider is a web crawling system whose crawlers are written as Python scripts. It provides an easy-to-use web UI where you can edit your scripts, monitor ongoing tasks, and view results.

Check out its awesome demo!

Tasks running on demo page

Why PySpider?

  • Prioritization: every task can be given a priority
  • JavaScript support: can run JavaScript on a page after it loads
  • Database support: results can be saved directly to a database
  • Headless: runs without the overhead of a full browser
  • Retryable: failed tasks can be retried
  • Periodic: tasks can be re-run daily or after any set interval
  • Efficient: each task has an age, so recently crawled pages are not fetched again
  • Scalable: uses a distributed architecture
  • Python: written in your favourite language

Editor on web interface
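
To make the prioritization, age, and retry ideas from the list above more concrete, here is a toy scheduler using only the standard library. It is a rough sketch of the concepts, not PySpider's actual implementation: the class ToyScheduler and its methods are invented for illustration. In PySpider itself, these ideas appear as options such as priority, age, and retries when a task is submitted.

```python
import heapq
import itertools
import time


class ToyScheduler:
    """Toy model of prioritized, age-aware, retryable tasks (not PySpider's code)."""

    def __init__(self, max_retries=3):
        self._heap = []                    # entries: (-priority, sequence, url)
        self._counter = itertools.count()  # tie-breaker that keeps insertion order
        self._last_success = {}            # url -> time of last successful fetch
        self._failures = {}                # url -> consecutive failure count
        self.max_retries = max_retries

    def submit(self, url, priority=0, age=0, now=None):
        """Queue a task unless a result younger than `age` seconds exists."""
        now = time.time() if now is None else now
        done_at = self._last_success.get(url)
        if done_at is not None and now - done_at < age:
            return False  # previous result is still fresh, skip the extra work
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), url))
        return True

    def pending(self):
        """Number of tasks waiting in the queue."""
        return len(self._heap)

    def run_next(self, fetch, now=None):
        """Pop the highest-priority task and call fetch(url); requeue on failure."""
        now = time.time() if now is None else now
        neg_priority, _, url = heapq.heappop(self._heap)
        try:
            result = fetch(url)
        except Exception:
            self._failures[url] = self._failures.get(url, 0) + 1
            if self._failures[url] <= self.max_retries:
                heapq.heappush(self._heap, (neg_priority, next(self._counter), url))
            return url, None
        self._failures.pop(url, None)
        self._last_success[url] = now
        return url, result
```

A task submitted with age=60 is skipped if the same URL was fetched successfully less than 60 seconds earlier, which is the "cached with age" behaviour in miniature.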

Installation

Technical Prerequisites

  1. Python and pip
  2. PhantomJS

Install PySpider

$ pip install pyspider

*If you have both Python 2.x and Python 3.x installed, use pip3 so that PySpider is installed for Python 3:

$ pip3 install pyspider
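
If it is unclear which Python a given pip command belongs to, one option is to run pip as a module of the exact interpreter you want. The snippet below is a minimal sketch of that idea; the helper name pip_install_command is made up for illustration, and it only builds and prints the command rather than running it.

```python
import sys


def pip_install_command(package, python=sys.executable):
    """Build a pip command that is bound to one specific interpreter."""
    return [python, "-m", "pip", "install", package]


# Print the command for the interpreter that is running this script.
print(" ".join(pip_install_command("pyspider")))
```

To actually perform the install, the returned list can be passed to subprocess.check_call.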

Run!

$ pyspider

That’s it! By default the web UI is served at http://localhost:5000.

If PySpider fails to start because PhantomJS is still running from an earlier session, kill the process that is holding its port, then run pyspider again.
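
A quick way to see whether a port is already taken is to try connecting to it. This is a small sketch using only the standard library; the port numbers (25555 for PhantomJS and 5000 for the web UI) are assumed defaults, so adjust them if your setup differs. The function name port_in_use is illustrative.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising on refusal.
        return sock.connect_ex((host, port)) == 0


# Assumed defaults: 25555 for PhantomJS, 5000 for the web UI.
for name, port in (("phantomjs", 25555), ("webui", 5000)):
    state = "in use" if port_in_use(port) else "free"
    print(f"{name} port {port}: {state}")
```

If a port is reported as in use before you start PySpider, find and stop the process that owns it first.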

Resources


Have a question or having trouble? Please leave a comment!

Part 2→
