I am a software engineer, and I used to work for a data analysis company. At that time, I was in charge of both the data collection and data visualization teams. We built a data collection system that could capture all of a user’s interactions without any additional code, unlike tools such as Google tags, which need a lot of coding to tag the user’s activities on the target website.
During that period, our marketing team regularly had web crawling requests. They wanted to scrape certain sites to find potential customer leads (I will cover that in more detail in other posts), so web scraping became our data collection team’s responsibility.
The problems we faced with web scraping
The first thing we ran into was that these tasks are really tedious. They are all about XPath. We had to go through each website the marketing team requested and extract the data they wanted, and we had to rewrite the extraction rules for every site because each one has different XPaths.
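To make the per-site problem concrete, here is a minimal sketch of what such rules can look like. It is an illustration, not the system we actually ran: the site names, field names, XPaths, and sample markup are all invented, and it uses Python's standard `xml.etree.ElementTree`, which only supports a subset of XPath and only parses well-formed markup. Real pages usually need a proper HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: every site needs its own row and field XPaths.
SITE_RULES = {
    "site-a.example": {
        "row": ".//div[@class='company']",
        "fields": {"name": "./h2", "email": "./span[@class='email']"},
    },
    "site-b.example": {
        "row": ".//tr[@class='lead']",
        "fields": {"name": "./td[1]", "email": "./td[2]"},
    },
}


def extract(site, markup):
    """Apply the rules registered for `site` to a well-formed page."""
    rules = SITE_RULES[site]
    root = ET.fromstring(markup)
    records = []
    for row in root.findall(rules["row"]):
        record = {}
        for field, path in rules["fields"].items():
            node = row.find(path)
            has_text = node is not None and node.text
            record[field] = node.text.strip() if has_text else None
        records.append(record)
    return records


# Invented sample pages: the same data, laid out two different ways.
SAMPLE_A = (
    "<html><body><div class='company'><h2>Acme</h2>"
    "<span class='email'>info@acme.example</span></div></body></html>"
)
SAMPLE_B = (
    "<html><body><table><tr class='lead'><td>Globex</td>"
    "<td>sales@globex.example</td></tr></table></body></html>"
)

print(extract("site-a.example", SAMPLE_A))
print(extract("site-b.example", SAMPLE_B))
```

Both pages hold the same kind of record, yet neither set of rules works on the other page, which is why every new site meant writing the rules again.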
Second, the extraction rules we defined were hard to test. They could only be checked after the data extraction task had finished. A typical scenario: we waited several hours, only to find that the rules had been defined incorrectly. At the time, we thought how wonderful it would be to have a tool that showed the scraping result in real time.
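One way to shorten that feedback loop, short of a real-time tool, is to run the rules against a small saved sample page before starting the long job. The sketch below assumes a hypothetical `check_rules` helper and invented sample markup; like any `xml.etree.ElementTree` example, it only handles well-formed markup and a subset of XPath.

```python
import xml.etree.ElementTree as ET


def check_rules(sample_markup, row_path, field_paths):
    """Return a list of problems found when applying rules to a sample page."""
    root = ET.fromstring(sample_markup)
    rows = root.findall(row_path)
    if not rows:
        return [f"row rule {row_path!r} matched nothing"]
    problems = []
    for field, path in field_paths.items():
        # Count the rows where this field's XPath finds no element at all.
        missing = sum(1 for row in rows if row.find(path) is None)
        if missing:
            problems.append(
                f"field {field!r} ({path!r}) missing in {missing} of {len(rows)} rows"
            )
    return problems


# Invented sample page with two records.
SAMPLE = (
    "<html><body>"
    "<div class='company'><h2>Acme</h2></div>"
    "<div class='company'><h2>Globex</h2></div>"
    "</body></html>"
)

# A correct rule reports no problems; a mistyped one is caught in seconds.
print(check_rules(SAMPLE, ".//div[@class='company']", {"name": "./h2"}))
print(check_rules(SAMPLE, ".//div[@class='company']", {"name": "./h3"}))
```

A check like this only proves the rules fit the sample, not every page on the site, so it reduces the wasted runs rather than removing them.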
Third, because most websites have deployed anti-scraping measures, it is hard to simulate a real user when scraping data. The usual solution is to build an IP pool. But in most companies, web scraping is a non-mainstream task, so it is hard to get enough support, especially when it comes to requesting budget.
Fourth, we considered hiring outsourcing companies or using SaaS tools to solve the problems mentioned above. But in a lot of cases, the target websites we wanted to scrape require a login. Our marketing team did not want the usernames and passwords to be exposed, so this route was blocked.
We built what we wanted
So that is why AnyPicker was born.
Drawing on our data collection experience, we built a visual web scraper. The goal is that all you need to do is click on the target website to select what you want to collect.
We solved the problems we had run into. AnyPicker does not need an IP pool. There is no need to upload the target website’s username or password, because it runs in your local Chrome browser. It is a real user! Even so, it can still run scraping tasks in parallel. And running in your local web browser significantly reduces the costs!
It also shows the scraping result in real time.
If you are interested, visit our website and download it. It is free now!