Recent developments in the browser automation & web scraping space

For a long time there wasn’t a lot of action in this space. We had PhantomJS and NightmareJS, and to be fair, NightmareJS at least was all you could ever wish for.
Earlier this year Chromium, the open source project behind Chrome, the world’s most popular web browser, released technology that lets you run the browser headlessly, that is, without a visible window.
This stirred up some proper action in the community because, I believe, it removes the need for Electron (which NightmareJS depends on). And removing dependencies sounds like a good thing.
A bunch of different projects have surfaced, each hoping to gain traction as the next hot thing. This blog post will try to give you a brief update on how things are going with that.
Google’s homegrown
Google decided to join the party and built their own high-level Node API on top of headless Chrome. It’s called Puppeteer.
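
To give a feel for what a high-level Node API on top of headless Chrome looks like, here is a minimal sketch. The helper name scrapeTitle and the two small interfaces are my own assumptions for illustration and are not part of Puppeteer; they only describe the handful of methods the helper calls (newPage, goto, title, close), which Puppeteer's browser and page objects do provide.

```typescript
// Minimal structural types covering only the Puppeteer methods used below.
// These interfaces are illustrative assumptions, not Puppeteer's own typings.
interface PageLike {
  goto(url: string): Promise<unknown>;
  title(): Promise<string>;
  close(): Promise<void>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
}

// Hypothetical helper (not part of Puppeteer): open a tab, load the URL,
// read the document title, and always close the tab afterwards.
async function scrapeTitle(browser: BrowserLike, url: string): Promise<string> {
  const page = await browser.newPage();
  try {
    await page.goto(url);
    return await page.title();
  } finally {
    // Runs even if navigation fails, so tabs do not pile up.
    await page.close();
  }
}

// Usage with the real library, assuming `npm install puppeteer`:
//
//   const puppeteer = require("puppeteer");
//   const browser = await puppeteer.launch();
//   console.log(await scrapeTitle(browser, "https://example.com"));
//   await browser.close();
```

Passing the browser in, rather than launching it inside the helper, means one headless Chrome instance can be reused across many pages.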

It looks promising, but I have long had the feeling that when it comes to data aggregation, Google does not like competition. That in turn leads me to believe that this might become a problem for the library in the long run. OK, tin foil hat off. Next.
Navalia
Judging by the growth rate of its GitHub stars, this is the project mentioned here that has gained the least traction so far.

Having said that, there is definitely good involvement from developers, and things are looking nice and smooth.
Chromeless
This thing looks very promising and has gotten almost as much traction as the Puppeteer project, judging by the number of GitHub stars. It runs either locally or on AWS Lambda, which seems very convenient and scalable. Very nice indeed.
Check this demo:
How cool is that?

