I am a software engineer who worked for a data analysis company, where I was in charge of both the data collection team and the data visualization team. We built a data collection system that could capture every user interaction on a target website without any additional code, unlike Google tags, which require a lot of coding to tag each user action.
During that period, our marketing team frequently had web crawling requirements: they wanted to scrape certain sites to find potential customer leads (a topic for another post). Web scraping therefore became our data collection team's responsibility.
What we faced with web scraping
First, these tasks were really tedious, and they were all about XPath. We had to go through every website the marketing team requested and extract the data they wanted. We also had to rewrite the extraction rules for each site, because each site's structure required different XPaths.
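To illustrate why every site needs its own rules, here is a minimal sketch. It assumes two hypothetical sites (`site-a.example`, `site-b.example`) that publish the same company name and email in different markup. It uses Python's standard-library `xml.etree.ElementTree`, which supports only a limited XPath subset and requires well-formed pages. This is an illustration of the problem, not how AnyPicker works internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-site rules: the same fields live in different markup,
# so each site needs its own XPath expressions.
RULES = {
    "site-a.example": {
        "name": ".//div[@class='company']/h2",
        "email": ".//div[@class='company']/span[@class='mail']",
    },
    "site-b.example": {
        "name": ".//table/tr/td[1]",   # first cell of each row
        "email": ".//table/tr/td[2]",  # second cell of each row
    },
}

# Hypothetical sample pages (well-formed, since ElementTree needs valid XML).
PAGE_A = ("<html><body><div class='company'><h2>Acme</h2>"
          "<span class='mail'>a@acme.example</span></div></body></html>")
PAGE_B = ("<html><body><table><tr><td>Acme</td>"
          "<td>a@acme.example</td></tr></table></body></html>")


def extract(site: str, page: str) -> dict[str, list[str]]:
    """Apply the rules registered for `site` to one page's markup."""
    root = ET.fromstring(page)
    return {
        field: [el.text for el in root.findall(path)]
        for field, path in RULES[site].items()
    }
```

Calling `extract("site-a.example", PAGE_A)` and `extract("site-b.example", PAGE_B)` both yield the name `Acme` and the email `a@acme.example`. Applying site A's rules to site B's page finds nothing, which is exactly why the rules had to be rewritten for every site.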
Second, the extraction rules we defined were hard to test. They could only be checked after the extraction task had finished. A common scenario was waiting several hours, only to find that the rules had been defined incorrectly. At the time, we thought a tool that showed scraping results in real time would be wonderful.
Third, because most websites deploy anti-scraping measures, it is hard to scrape data while appearing to be a real user. Solving this usually requires building an IP pool. But in most companies, web scraping is a non-core task, so it is hard to get enough support, especially budget.
Fourth, we considered hiring an outsourcing company or using SaaS tools to solve the problems mentioned above. But in many cases, the target websites required a login, and our marketing team did not want to expose the usernames and passwords, so that route was blocked.
We built what we wanted
So that is why AnyPicker was born.
Drawing on our data collection experience, we built a visual web scraper: all you need to do is click on the target website to choose what you want to collect.
It solves the problems we ran into. AnyPicker does not need an IP pool, and you never have to upload your username or password for the target website. It runs in your local Chrome browser, so it really is a real user! Even so, it can still run scraping tasks in parallel. And because it runs in your local browser, it significantly reduces costs!
It also shows scraping results in real time.
If you are interested, visit our website and download it. It is free now!