5 Scraping Tips
In this story, I will share 5 tips that I have collected while working on my freelance scraping projects.
- Depend on stable elements
This one looks like an obvious tip, but you can still find something like //div/div[2]/div/p/span in commercial scrapers.
This is OK for a one-time scraper, but it is terrible for a long-term one, because the expression is not stable. Imagine that a developer changes the structure of the website and adds one more div to the path that this already long expression describes. Now the expression no longer matches and your scraper does not work at all, which is not what you expect.
Instead, use selectors based on the id / name / property / class attributes.
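To make the difference concrete, here is a minimal sketch using only Python's standard library. The two page snippets, the `price` id, and the `extract_text` helper are made up for illustration; a real scraper would more likely use lxml, Scrapy, or a similar tool, but the effect is the same.

```python
import xml.etree.ElementTree as ET

# Hypothetical page before and after a small redesign:
# the developer inserted one extra <div> (a banner) before the content block.
OLD_PAGE = (
    "<body><div>"
    "<div>menu</div>"
    "<div><div><p><span id='price'>42</span></p></div></div>"
    "</div></body>"
)
NEW_PAGE = (
    "<body><div>"
    "<div>menu</div>"
    "<div>banner</div>"
    "<div><div><p><span id='price'>42</span></p></div></div>"
    "</div></body>"
)

# Position-based path, similar to //div/div[2]/div/p/span from the text.
FRAGILE_PATH = ".//div/div[2]/div/p/span"
# Attribute-based path that does not depend on the surrounding layout.
STABLE_PATH = ".//span[@id='price']"


def extract_text(page, path):
    """Return the text of the first element matching path, or None."""
    element = ET.fromstring(page).find(path)
    return element.text if element is not None else None


print(extract_text(OLD_PAGE, FRAGILE_PATH))  # matches on the old layout
print(extract_text(NEW_PAGE, FRAGILE_PATH))  # no match after the redesign
print(extract_text(NEW_PAGE, STABLE_PATH))   # still matches
```

The position-based path silently stops matching once the second div is no longer the content block, while the id-based path keeps working on both versions of the page.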
- Use a ready-made proxy management solution
If you need to crawl a lot, consider using a ready-made proxy management solution. There are lots of them available on the internet, so just choose the one that fits you best.
From my experience, building something like this from scratch is not worth your time and effort.
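Many such services expose a single gateway address and handle rotation behind it, so the scraper side can stay very small. The sketch below assumes that kind of setup and uses only the standard library; the host `proxy.example.com:8000` and the credentials are placeholders, not a real provider, and the exact connection format depends on the service you choose.

```python
import urllib.parse
import urllib.request


def make_proxy_url(gateway, username, password):
    """Build a proxy URL for a (hypothetical) gateway with basic credentials."""
    user = urllib.parse.quote(username, safe="")
    secret = urllib.parse.quote(password, safe="")
    return f"http://{user}:{secret}@{gateway}"


def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder values: replace them with the ones from your provider.
PROXY_URL = make_proxy_url("proxy.example.com:8000", "user", "secret")
opener = build_proxy_opener(PROXY_URL)

# A real request would then look like this (not executed here):
# html = opener.open("https://example.com", timeout=30).read()
```

All the hard parts, such as rotating addresses and retiring banned ones, stay on the provider's side.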
- Use monitoring
Of course, this is not for a one-time scraping project.
If you are using custom-made scrapers, you will most likely need to write a custom monitoring system yourself. But it does not have to be as complicated as you may think.
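As a rough illustration of how small such a monitor can be, here is a minimal sketch in Python. The `RunStats` fields, the `check_run` function, and the threshold values are all hypothetical; the idea is only that every scraper run reports a few counters and a simple check turns them into alerts.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Counters collected during one scraper run (hypothetical fields)."""
    items_scraped: int
    requests_made: int
    requests_failed: int
    items_missing_fields: int


def check_run(stats, min_items=1, max_error_rate=0.1, max_missing_rate=0.2):
    """Return a list of alert messages; an empty list means the run looks healthy."""
    alerts = []

    if stats.items_scraped < min_items:
        alerts.append(f"too few items: {stats.items_scraped}")

    # Treat a run without any requests as fully failed.
    if stats.requests_made:
        error_rate = stats.requests_failed / stats.requests_made
    else:
        error_rate = 1.0
    if error_rate > max_error_rate:
        alerts.append(f"high request error rate: {error_rate:.0%}")

    # Many items with empty required fields usually mean a selector broke.
    if stats.items_scraped:
        missing_rate = stats.items_missing_fields / stats.items_scraped
    else:
        missing_rate = 0.0
    if missing_rate > max_missing_rate:
        alerts.append(f"many items with missing fields: {missing_rate:.0%}")

    return alerts


healthy = RunStats(items_scraped=200, requests_made=210,
                   requests_failed=4, items_missing_fields=5)
broken = RunStats(items_scraped=0, requests_made=50,
                  requests_failed=50, items_missing_fields=0)

for alert in check_run(broken):
    print("ALERT:", alert)
```

In practice, the alerts could be sent to email or a chat channel after every run, so a broken selector or a banned IP gets noticed the same day.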