Thoughts after trying Puppeteer
The Chrome team just released Puppeteer. It’s great. I think it’s gonna be my scraping tool of choice from now on.

A rapidly evolving library
The repo is clearly still in its early days: a lot of what the API docs mention simply hasn’t been implemented yet, and calling it gets you “… is not a function”. There are lots of pull requests and issues, and features are being actively worked on. The future is bright! More functionality is on the way, e.g. ElementHandle and various event hooks, so it will become more and more usable.
When is a page loaded?
The event tree on modern web pages is hardly linear. In the old days, you clicked a link, requested some bytes, they arrived, and the ‘load’ event fired. Nowadays, a lot of elements are loaded on the fly, sometimes after the load event. When you scroll to the bottom, a new bottom emerges. When you wait for 2 seconds, some annoying modal pops up without any user action. A very common case is that the page has technically loaded but the data request is still in flight, so if you take a screenshot at that moment, you see a spinner.
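One way to cope is to stop trusting the load event and wait for a condition you actually care about, such as “the spinner is gone”. Puppeteer’s own page.waitForSelector and page.waitForFunction cover many such cases; the sketch below is a generic, library-free polling helper that shows the idea. The name waitUntil, its timeoutMs/intervalMs options and the simulated spinner flag are all hypothetical.

```javascript
// Hypothetical helper: poll `predicate` until it returns a truthy value,
// or reject once `timeoutMs` has elapsed. The predicate may be sync or async.
function waitUntil(predicate, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const check = async () => {
      try {
        if (await predicate()) return resolve(true);
      } catch (err) {
        return reject(err); // a failing predicate aborts the wait
      }
      if (Date.now() >= deadline) {
        return reject(new Error(`waitUntil: condition not met within ${timeoutMs} ms`));
      }
      setTimeout(check, intervalMs); // try again a little later
    };
    check();
  });
}

// Usage sketch: simulate a data request that finishes ~300 ms after "load".
let spinnerVisible = true;
setTimeout(() => { spinnerVisible = false; }, 300);

waitUntil(() => !spinnerVisible, { timeoutMs: 2000, intervalMs: 50 })
  .then(() => console.log('data loaded, safe to take a screenshot'));
```

In a real script the predicate would query the page (for example, check that a spinner element no longer exists) instead of reading a local flag.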
Non-chainable promises
At first, I wondered why the examples had so many awaits and dismissed it as cool-kids nonsense. Very soon, though, it became clear that, given how the APIs are designed, await is just so much less painful.
Almost every method (loading a page, finding an element, focusing it, clicking, keyboard events, virtually everything) returns a promise, but that promise does not resolve to the original upstream object! So page.doSomething() returns a promise that does not resolve to page. The native Promise API also has nothing like Bluebird’s .tap(), so you end up appending .then(() => page) to the other calls…
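To make the chaining problem concrete, here is a sketch against a hypothetical mock page (not the real Puppeteer object) whose methods, like the ones described above, resolve to undefined. It shows the .then(() => page) workaround and a minimal hand-rolled tap helper in the spirit of Bluebird’s .tap(); the names fillFormWithThen, fillFormWithTap and tap are illustrative only.

```javascript
// Hypothetical stand-in for a page: each method records what it did
// and resolves to undefined, NOT to the page itself.
const page = {
  log: [],
  async focus(sel) { this.log.push(`focus ${sel}`); },
  async type(text) { this.log.push(`type ${text}`); },
  async click(sel) { this.log.push(`click ${sel}`); },
};

// Without await: every step needs `.then(() => p)` to pass the page along.
function fillFormWithThen(p) {
  return p.focus('input')
    .then(() => p)
    .then((pg) => pg.type('Hello World'))
    .then(() => p)
    .then((pg) => pg.click('#submit'));
}

// Minimal Bluebird-style tap: run a side effect, then pass the original value through.
const tap = (fn) => async (value) => {
  await fn(value);
  return value;
};

function fillFormWithTap(p) {
  return Promise.resolve(p)
    .then(tap((pg) => pg.focus('input')))
    .then(tap((pg) => pg.type('Hello World')))
    .then(tap((pg) => pg.click('#submit'))); // resolves to the page itself
}
```

Both versions perform the same three steps in order; the tap version at least resolves to the page, but it is still noisier than the await style.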
With “await”, you can just write it like synchronous blocking code:
await page.focus('input');
await page.keyboard.type('Hello World'); // types into the focused element
await page.click('#submit');
Selecting and addressing elements
We probably need a more powerful interface than the pseudo-CSS selectors: mostly functional pipes that take a CSS-selected array as their first, crude input. A very simple example is selecting anchor tags whose href matches a pattern; the ideal idiom would be something like:
return $('.section a')
  .filter(el => el.href.match(/myRegExp/))
  .map(el => el.href)
If it’s your own site, and you are using Puppeteer for testing, then systematically naming every testable element with a unique ID would be a good strategy. For scraping, I’m trying to build “bots” that browse the web like humans do, but faster.
Error handling and control flow
Basically, it’s a question of “If this button or link exists, do X; if not, do Y”, and then the X and Y branches cascade away and grow more complex… [to be continued if I have more to say on this topic…]
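A minimal sketch of the basic branch, assuming only that page.$(selector) resolves to an element handle or to null when nothing matches (which is how Puppeteer’s page.$ behaves). The helper name ifExists and the fake page are hypothetical; in a real script the onFound branch might be something like (h) => h.click().

```javascript
// Hypothetical helper: "if this element exists, do X, else do Y".
async function ifExists(page, selector, onFound, onMissing) {
  const handle = await page.$(selector); // null when nothing matches
  return handle ? onFound(handle) : onMissing();
}

// Fake page for illustration: only '#accept-cookies' exists.
const fakePage = {
  async $(selector) {
    return selector === '#accept-cookies' ? { selector } : null;
  },
};
```

Keeping each branch as a function that returns a promise means X and Y can themselves call ifExists, which is exactly where the cascading trees start to grow.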
But wait, there is more!
You can debug your tests/scripts live with DevTools! That means you can set a breakpoint on any line with an await and inspect run-time variables!
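A minimal sketch of that workflow, assuming a plain Node script: start it with node --inspect-brk, open chrome://inspect in Chrome and attach DevTools, and execution stops at each debugger statement. The step function is a hypothetical stand-in for Puppeteer calls such as page.goto or page.click. Launching the browser with puppeteer.launch({ headless: false }) additionally lets you watch the page while the script is paused.

```javascript
// Run with: node --inspect-brk this-script.js
// then attach from chrome://inspect. Without an attached debugger,
// `debugger;` is a no-op and the script runs normally.

// Hypothetical stand-in for an awaited Puppeteer call.
async function step(label) {
  return `${label} done`;
}

async function run() {
  const results = [];
  results.push(await step('goto'));
  debugger; // pause here and inspect `results` in DevTools
  results.push(await step('click'));
  return results;
}

run().then((r) => console.log(r));
```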
