5 Scraping Tips

Artem Rys
Nov 13 · 2 min read

In this story, I will share five tips that I have collected while working on my freelance scraping projects.

  • Depend on stable elements

This tip may look obvious, yet you can still find expressions like //div/div[2]/div/p/span in commercial scrapers.

This is fine for a one-time scraper but terrible for a long-running one, because it is not stable. Imagine that a developer changes the structure of the website and there is now one more div in the middle of that already long expression. The expression no longer matches, and your scraper stops working. That is not what you want.

Instead, use selectors based on the id, name, property, or class attributes.
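To make the difference concrete, here is a minimal sketch that runs a positional expression and an attribute-based one against two versions of the same page. The page snippets, the `price` class, and the helper name `text_at` are made up for illustration. The standard library's ElementTree is used only to keep the example self-contained; it needs well-formed markup, so for real pages you would more likely use lxml, parsel, or Scrapy selectors.

```python
import xml.etree.ElementTree as ET

# Hypothetical page before a redesign.
OLD_PAGE = """<body>
  <div>
    <div>header</div>
    <div>
      <div><p><span class="price">42</span></p></div>
    </div>
  </div>
</body>"""

# The same page after a developer adds one more div (a banner).
NEW_PAGE = """<body>
  <div>
    <div>header</div>
    <div>new banner</div>
    <div>
      <div><p><span class="price">42</span></p></div>
    </div>
  </div>
</body>"""

POSITIONAL = ".//div/div[2]/div/p/span"   # depends on the exact nesting
BY_ATTRIBUTE = ".//span[@class='price']"  # depends only on the class attribute


def text_at(page, path):
    """Return the text of the first element matching path, or None."""
    node = ET.fromstring(page).find(path)
    return None if node is None else node.text


for name, page in (("old", OLD_PAGE), ("new", NEW_PAGE)):
    print(name, "positional:", text_at(page, POSITIONAL))
    print(name, "by attribute:", text_at(page, BY_ATTRIBUTE))
```

The positional expression finds the value on the old page and nothing on the new one, while the attribute-based expression finds it on both.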

  • Use a ready-made proxy management solution

If you need to crawl a lot, consider using a ready-made proxy management solution. There are plenty of them available on the internet, so just choose the one that fits you best.

From my experience, building something like this from scratch is not worth your time and effort.
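As a rough sketch of what using such a solution can look like from the scraper's side, assume the provider gives you a single proxy endpoint to send all traffic through. The URL and credentials below are placeholders, not a real service, and the helper name `build_proxied_opener` is made up. Only the standard library is used here.

```python
import urllib.request

# Placeholder endpoint: a real provider supplies its own host, port and credentials.
PROXY_URL = "http://user:password@proxy.example.com:8000"


def build_proxied_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_proxied_opener(PROXY_URL)
# opener.open("https://example.com") would now be routed through the proxy.
```

Rotation, retries, and ban handling then stay on the provider's side, which is exactly the part that is expensive to build yourself.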

  • Use monitoring

Of course, this does not apply to a one-time scraping project.

If you are using custom-made scrapers, you will most likely need to write a custom monitoring system yourself. But it does not have to be as complicated as you may think.

The simplest metric to monitor is, for example, the number of scraped items. With this one, you can quickly notice when something is wrong with your scraper.

If you decide to use Scrapy for your scraping project, take a look at the spidermon project.
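A minimal sketch of such a check could compare the item count of the current run with the previous one. The function name `check_item_count` and the 50% threshold are assumptions for illustration; pick a threshold that matches how much your data normally fluctuates.

```python
def check_item_count(scraped, previous, min_ratio=0.5):
    """Return a list of problems found; an empty list means the run looks healthy.

    scraped   -- number of items collected in the current run
    previous  -- number of items collected in the previous run
    min_ratio -- smallest acceptable share of the previous count (assumed default)
    """
    problems = []
    if scraped == 0:
        problems.append("no items scraped")
    elif previous > 0 and scraped < previous * min_ratio:
        problems.append(f"item count dropped from {previous} to {scraped}")
    return problems


for problem in check_item_count(scraped=40, previous=100):
    print("ALERT:", problem)
```

In a real setup, the alert would go to email or a chat channel instead of being printed.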

  • Use API instead of HTML page scraping

This option is rarely available on the websites you may want to scrape, but it is still worth checking. It is really easy to check using Chrome Developer Tools: just go to the Network tab and observe which requests the page makes.
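If the Network tab shows the page loading its data as JSON, parsing becomes much simpler than working with HTML. The sketch below assumes a made-up response body with an `items` list; the field names and the helper `parse_items` are hypothetical, and a real endpoint will have its own structure.

```python
import json

# Hypothetical response body, as it might be copied from a request in the Network tab.
SAMPLE_RESPONSE = """
{
  "items": [
    {"name": "Widget", "price": 9.99},
    {"name": "Gadget", "price": 19.5}
  ],
  "next_page": 2
}
"""


def parse_items(body):
    """Extract (name, price) pairs from a JSON response body."""
    payload = json.loads(body)
    return [(item["name"], item["price"]) for item in payload["items"]]


for name, price in parse_items(SAMPLE_RESPONSE):
    print(name, price)
```

There are no selectors to maintain here, so a change in the page layout does not affect the scraper as long as the response format stays the same.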

  • Buy data instead of scraping

In a few cases, you can buy the data from the website owners. One example I saw was a website with data about companies that operate in different spheres. It was just a website with a dictionary of companies. Of course, it is not a big deal to scrape all the information you need from such a site, but consider the option of buying it as well.

You can even go further and make an agreement with the website owners to provide you with the API you need. But this is an extremely rare case, and it has to be worth the cost.

python4you

Articles about general Python, best practices and interviews.


Written by

Artem Rys

Senior Python Developer @ EPAM Poland
