For utility players, a must-have in your toolbox: headless Chrome

Tao-Sheng Chen
ShopBack Tech Blog
4 min read · Sep 8, 2018


Every baseball team has utility players, who give the manager tactical flexibility. A 25-player roster typically carries 10 to 12 pitchers plus 1 catcher, 1 shortstop, 3 basemen, and 3 outfielders. That leaves only about 5 backup players for 8 fielding positions, so somebody needs to cover more than one position.

In an engineering organization, the full-stack developer is the same concept. That flexibility can help a developer's career, at least in its early stages.

In a company, especially a small start-up, utility players are everywhere. Common cases: a PM plays the QA role, HR handles office admin, and BD sometimes handles marketing and even key accounts. And an engineer is expected to do anything that involves electricity, from changing a lightbulb to writing a program that has only one button and, once clicked, reads the user's mind and does whatever the user wants.

The most epic case is the engineer (especially one specific kind of engineer)! There are many miserable stories about 工具人 (“utility player” in Traditional Chinese) on Taiwanese social media, fan pages, and blogs. Simply search for “工具人” on Google Taiwan and you will understand what I mean.

Being a utility man is not easy. You need a lot of tools on hand, and headless Chrome is one of the important ones!

On the internet, everything users want comes down to web pages or actions on them. At the protocol level, most engineers know how to work with HTTP itself, but it is always hard to fully simulate a user's behavior in a browser.

The reason is simple: the early web (HTTP/HTML) was not designed to produce exactly the same view everywhere. It was designed to transfer data and link information easily, and each browser could interpret a page in its own way as long as it followed the HTML standard. Freedom is not free. It comes with a price.

Of course, where there is a pain, there will be a product to ease it. Automation testing tools like Selenium let you program actions in a browser. Before Chrome v59, PhantomJS was probably the best option if an engineer needed to operate a browser headlessly, meaning without an X environment while still simulating as much user behavior as possible.

An epic requirement came from our hard-working and talented BD (let's call her J). She is another kind of utility girl, who works hard not only to achieve her own goals but also to move the whole business forward. Just recently, on a thunderstorm afternoon, J asked whether I could help build a “simple way to take a full website screenshot every few seconds”, to cut the time she spends checking whether our rolling banner works. J is pretty smart. She knew she could simply work harder to reach the goal and benefit her team, but she also knew that finding a better way to handle repetitive tasks is critical. If she could save a bit of time on repetitive tasks, she would have a bit more time for BD work to grow our business. She wants to own the change she is seeking, and killer team execution (meaning getting another utility man to help) is of course a reasonable way to do it.

As a tech person at ShopBack, I certainly didn't want to just point J to a few Chrome-extension solutions or drop her a few web pages to read. Those tools could help, but she would still need to take action herself. If this could run on a Linux box with a simple cron job, would that make me a kind of utility man for the utility man?!

Anyway, a few steps are required:

(1) a Linux box (here I use Ubuntu)

(2) Chrome version ≥ 59 and Node.js

(3) Preferably, install supervisor (simply use apt: # apt-get install supervisor) and set up a service in it so that headless Chrome runs as a service. Just prepare a chrome.conf and put it in /etc/supervisor/conf.d/

See the content of chrome.conf:

Make sure the language is set to zh_TW, and make sure a font that covers zh_TW.UTF-8 (Traditional Chinese) is installed on your Linux box.
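A chrome.conf along these lines might look roughly like the sketch below. It is only an illustration under stated assumptions, not the original file: the Chrome binary path, the debugging port 9222, the `chrome` user, and the log paths are all hypothetical and should be adapted to your box.

```ini
; /etc/supervisor/conf.d/chrome.conf (sketch; paths, port, and user are assumptions)
[program:chrome]
; run Chrome headless and expose the DevTools protocol on a local port
command=/usr/bin/google-chrome --headless --disable-gpu --remote-debugging-port=9222 --lang=zh-TW
; set the locale so pages render in Traditional Chinese
environment=LANG="zh_TW.UTF-8",LC_ALL="zh_TW.UTF-8"
; hypothetical unprivileged user; this account must already exist
user=chrome
autostart=true
autorestart=true
stdout_logfile=/var/log/supervisor/chrome.out.log
stderr_logfile=/var/log/supervisor/chrome.err.log
```

After adding the file, `supervisorctl reread` followed by `supervisorctl update` should make supervisor pick up and start the new program.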

(4) Write a script to access headless Chrome. In short, the script uses chrome-remote-interface to access the page. After the DOM has loaded, we simply use Page.captureScreenshot to get the full web page as a PNG. Take a look at the DevTools protocol docs to learn more -> https://chromedevtools.github.io/devtools-protocol/tot/Page

(5) Simply use /etc/crontab to schedule the script that accesses headless Chrome. Note that it is much better to daemonize the script; supervisord is what I picked here. Also, as the utility man of the utility man, we of course send the result file via text message. In Taiwan, LINE might be the best option.
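For illustration, an /etc/crontab entry could look like the sketch below. The user name, the script path, the target URL, and the output directory are all hypothetical. Note that cron fires at most once per minute, so a capture "every few seconds" would need a long-running loop instead, which may be one reason daemonizing the script works better.

```
# m h dom mon dow user     command   (all names and paths below are assumptions)
* * * * *         shopback /usr/bin/node /opt/screenshot/capture.js https://www.example.com /var/screenshots
```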

This image is one of the scheduled full-page screenshots for utility girl J.
