PySpider — Part 1
A powerful web crawler
Feb 6, 2018
I am starting this blog series to document and share my experience using PySpider.
What is PySpider?
PySpider is a web crawling system driven by Python scripts. It provides an easy-to-use web UI where you can edit your scripts, monitor ongoing tasks, and view results.
Check out its awesome demo!
Why PySpider?
- Prioritization — Every task can be prioritized
- JS compatible — Can run JavaScript on a page after it loads
- Database Support — Can save results directly to a database
- Headless — Runs without the overhead of a full browser
- Retry-able — Failed tasks can be retried automatically
- Periodic — Tasks can be scheduled to run daily or at any other fixed interval
- Efficient — Each result is stored with an age, so a page that was crawled recently is not fetched again
- Scalable — Uses a distributed architecture
- Python — Written in your favourite language
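To make the prioritization, retry, and age ideas from the list above a bit more concrete, here is a minimal sketch in plain Python. It is not PySpider's scheduler, just a toy model I put together for illustration: the `ToyScheduler` class, its method names, and the example URLs are all hypothetical. In PySpider itself, these behaviours are controlled through options such as `priority`, `age`, and `retries` on `self.crawl`.

```python
import heapq
import itertools


class ToyScheduler:
    """Toy model of a crawl queue. Illustrative only, not PySpider's code."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self._queue = []                 # heap of (-priority, order, url, attempt)
        self._order = itertools.count()  # tie-breaker: equal priorities stay FIFO
        self._last_crawled = {}          # url -> time of the last successful crawl

    def add(self, url, priority=0, age=0, now=0.0):
        """Queue `url` unless it was crawled less than `age` seconds ago."""
        last = self._last_crawled.get(url)
        if last is not None and now - last < age:
            return False  # the stored result is still fresh, so skip the work
        heapq.heappush(self._queue, (-priority, next(self._order), url, 0))
        return True

    def run(self, fetch, now=0.0):
        """Drain the queue, highest priority first; `fetch(url)` may raise."""
        results = {}
        while self._queue:
            neg_priority, _, url, attempt = heapq.heappop(self._queue)
            try:
                results[url] = fetch(url)
                self._last_crawled[url] = now
            except Exception:
                # Retry a failed task until it has used up its attempts.
                if attempt + 1 < self.max_attempts:
                    heapq.heappush(
                        self._queue,
                        (neg_priority, next(self._order), url, attempt + 1),
                    )
        return results
```

With this sketch, a task added with `priority=5` is fetched before one added with `priority=1`, a fetch that raises is attempted again (up to `max_attempts` times in total), and calling `add(url, age=60, now=...)` within 60 seconds of a successful crawl returns `False` instead of queueing the URL again.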
Installation
Technical Prerequisites
1. Python and pip
2. PhantomJS
Install PySpider
$ pip install pyspider
*For people with both Python 2.x and Python 3.x installed, use pip3 instead:
$ pip3 install pyspider
Run!
$ pyspider
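That’s it! By default, the web UI is served at http://localhost:5000.
Make sure PhantomJS is not already running before you start PySpider. If it is, kill the process that is occupying the port and run the command again.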
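If you are not sure whether something is already occupying the port, a quick check like the sketch below can help. It only uses Python's standard `socket` module. The port number 25555 is my assumption for where PySpider's PhantomJS fetcher listens, so adjust it if your setup differs; the `port_in_use` helper is a hypothetical name, not part of PySpider.

```python
import socket

# Assumption: port used by PySpider's PhantomJS fetcher; change if yours differs.
PHANTOMJS_PORT = 25555


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(PHANTOMJS_PORT):
        print("Port %d is busy: stop the old process first." % PHANTOMJS_PORT)
    else:
        print("Port %d is free." % PHANTOMJS_PORT)
```

If the port turns out to be busy, find and stop the old process with your operating system's usual tools before running `pyspider` again.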
Resources
Have a question or having trouble? Please leave a comment!