5 Scraping Tips

Artem Rys
Published in python4you
2 min read · Nov 13, 2019

Photo by Jeroen Bosch on Unsplash

In this story, I will share 5 tips that I have collected while working on my freelance scraping projects.

  • Depend on stable elements

This one looks like an obvious tip, but you can still find XPath expressions like //div/div[2]/div/p/span in commercial scrapers.

This is OK for a one-time scraper, but it is terrible for a long-term one, because it is not stable. Imagine that a developer changes the structure of the website, and there is now one more div inside that already long expression. The expression no longer matches, and your scraper does not work at all. That is not what you expect.

Instead, use selectors based on the id, name, property, or class attributes.
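To make the difference concrete, here is a minimal sketch. The page markup, the `price` class, and the `extract` helper are all made up for illustration, and the standard-library XML parser is used only to keep the example self-contained; real pages are rarely well-formed XML, so a proper HTML parser would usually take its place.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup before a layout change.
OLD_PAGE = """
<body>
  <div>
    <div>sidebar</div>
    <div>
      <div><p><span class="price">9.99</span></p></div>
    </div>
  </div>
</body>
""".strip()

# The same page after a developer wraps the content in one more div.
NEW_PAGE = """
<body>
  <div>
    <div>sidebar</div>
    <div>
      <div><div><p><span class="price">9.99</span></p></div></div>
    </div>
  </div>
</body>
""".strip()

BRITTLE = "./div/div[2]/div/p/span"   # depends on the exact nesting
STABLE = ".//span[@class='price']"    # depends only on the class attribute


def extract(page, path):
    """Return the text of the first node matching path, or None if nothing matches."""
    node = ET.fromstring(page).find(path)
    return node.text if node is not None else None


for name, page in (("old", OLD_PAGE), ("new", NEW_PAGE)):
    print(name, "brittle:", extract(page, BRITTLE), "stable:", extract(page, STABLE))
```

The positional path finds the price on the old layout and nothing on the new one, while the class-based selector finds it on both.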

  • Use a ready-made proxy management solution

If you need to crawl a lot, consider using a ready-made proxy management solution. There are lots of them available on the internet. Just choose the one that fits you best.

In my experience, building something like this from scratch is not worth your time and effort.
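Many such services are used by pointing your HTTP client at an address they give you. Here is a rough sketch of that idea with the standard library only. The proxy address and credentials below are placeholders, and `build_proxied_opener` is a hypothetical helper; a real provider documents its own host, port, and authentication.

```python
import urllib.request

# Placeholder address: replace with the values from your provider's documentation.
PROXY_URL = "http://user:password@proxy.example.com:8000"


def build_proxied_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_proxied_opener(PROXY_URL)
# A call such as opener.open("https://example.com", timeout=10)
# would now be routed through the configured proxy.
```

The scraper code stays the same; rotation, retries, and bans are left to the service behind that address.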

  • Use monitoring

Of course, this does not apply to a one-time scraping project.

If you are using custom-made scrapers, you will most likely need to write a custom monitoring system for yourself. But it does not have to be as complicated as you may think.
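As an illustration of how small such a system can start, here is a sketch that compares the statistics of one run against previous runs and returns a list of alerts. The `RunStats` fields, the `check_run` function, and the thresholds are all assumptions chosen for the example, not a recommended set of checks.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunStats:
    """Counters collected during one scraper run (hypothetical fields)."""
    items_scraped: int
    items_missing_fields: int
    failed_requests: int
    total_requests: int


def check_run(current, history, drop_ratio=0.5, max_missing=0.1, max_failed=0.2):
    """Return human-readable alerts for the current run; an empty list means healthy."""
    alerts = []

    if current.items_scraped == 0:
        alerts.append("no items scraped")
    elif history:
        # Compare against the average item count of earlier runs.
        baseline = mean(run.items_scraped for run in history)
        if current.items_scraped < baseline * drop_ratio:
            alerts.append(
                f"item count dropped to {current.items_scraped} (baseline {baseline:.0f})"
            )

    # A high share of incomplete items often means a selector stopped matching.
    if current.items_scraped and current.items_missing_fields / current.items_scraped > max_missing:
        alerts.append("too many items with missing fields")

    # A high share of failed requests may point to blocking or network problems.
    if current.total_requests and current.failed_requests / current.total_requests > max_failed:
        alerts.append("too many failed requests")

    return alerts


history = [RunStats(100, 0, 1, 50), RunStats(110, 2, 0, 50)]
print(check_run(RunStats(105, 1, 1, 50), history))
print(check_run(RunStats(20, 10, 30, 50), history))
```

The returned alerts can then be sent wherever you already look: an email, a chat message, or just a log line.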

Artem Rys
python4you

Principal Software Engineer @ Splunk. Writing about Python, GitHub and Splunk.