How to write ES2015 / ES6 JavaScript for PhantomJS using Webpack

Disclaimer: Even though the title says “How to”, it’s really more “How I did it”. There are obviously several ways to achieve the same thing, most probably better ways than mine as well.

I’m going to assume that you know a thing or two about JavaScript, more specifically server-side JavaScript via NodeJS. It’s also a good idea to know what PhantomJS is. Essentially, it’s a headless web browser that you operate through JavaScript code.

First some background: every programmer should have side projects. That way you can improve your skills and learn new tools without the pressure of a deadline. My current side project is a small application that periodically extracts data from a webshop and saves it to Podio.

The first version was a sloppy stack of for loops spread across more or less two large JavaScript files. PhantomJS runs “crawler.js” to fetch the data from the webshop and saves it to text files on my application server. “index.js” is a small Express app that handles the communication with Podio.

While it did work (most of the time), it was extremely difficult to debug due to the famous callback hell, duplicated code and plain poor design. When the whole thing completely broke down, I decided to do things “the right way”™. Since I also wanted to take this opportunity to learn as much new stuff as possible, I decided to read up on some of the best practices in the modern JavaScript world.

I set up some basic principles that I will use in every JavaScript project:

  1. Use ES2015 as much as possible (it’s the future, so I had better learn it)
  2. Use small composable modules that can be tested separately and used by both Node and Phantom (this should already have been a no-brainer)

Thanks to the more recent versions of NodeJS, many ES2015 features can be used without transpiling them to the earlier ES5. But PhantomJS is built on WebKit and its JavaScriptCore engine, not V8 like NodeJS, and doesn’t have nearly the same support for ES2015. PhantomJS’s file system API also differs from Node’s, which we need to handle in our modules.

Webpack to the rescue

If you Google “phantomjs es2015 OR es6” you get lots of results that involve testing frameworks and how to get them going with ES2015. Since we have no “karma.conf.js” or similar, we can’t use that approach.

I’ve been looking at Webpack together with React lately and I thought that this might as well be a project where I learn more about Webpack too (after all this is a side project).

“If I can use Webpack and Babel to transpile browser JavaScript, then server-side JS should work too!”, I thought. I found this tutorial by James Long, Backend Apps with Webpack (Part I), which is great! Basically, it comes down to setting the target: 'node' option in the webpack.config.js file to tell Webpack not to bundle built-in Node modules like fs (the file system).

We also don’t want to bundle modules in the node_modules/ directory since our crawler will have access to them directly. James describes how to do this very well using webpack’s externals option, so I’m not going to repeat it.

PhantomJS has its own built-in modules, and you also need to tell Webpack how to handle those.

// Won't work in Webpack
let webPage = require('webpage');

My simple solution was to just manually add the module that I needed to the externals option. My complete webpack.config.js looks like this:

File system API differences

Both my application and the Phantom crawler needed access to the file system. But since PhantomJS and NodeJS implement the fs module differently, I needed a way to determine the environment in which the script was run. A simple check for the phantom object did the trick:

const isNode = typeof phantom === 'undefined';

With the isNode variable, my list files function looks like this (abbreviated for clarity):

let list = () => {
  let files;
  if (isNode) {
    // Node: synchronous directory listing
    files = fs.readdirSync(path);
  } else {
    // PhantomJS: its fs module uses list() instead
    files = fs.list(path);
  }
  ...
};
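
For completeness, here is a self-contained sketch of the same idea that runs as-is in Node. The function name `listFiles` and the `dir` parameter are names chosen for illustration. It also filters out the "." and ".." entries, which PhantomJS's fs.list() includes but Node's readdirSync() does not, so both environments return the same kind of list:

```javascript
const fs = require('fs');

// PhantomJS exposes a global `phantom` object; Node does not.
const isNode = typeof phantom === 'undefined';

// List the entries of a directory in either runtime.
// `dir` is a hypothetical parameter name used for illustration.
const listFiles = (dir) => {
  const entries = isNode
    ? fs.readdirSync(dir)  // Node
    : fs.list(dir);        // PhantomJS
  // Drop "." and ".." so both runtimes yield the same entries.
  return entries.filter((name) => name !== '.' && name !== '..');
};

console.log(listFiles('.'));
```

Keeping the environment check inside one small function means the rest of the code never has to know which runtime it is in.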

Current directory

PhantomJS does not have the __dirname and __filename variables. To get a unified current directory variable I used the same isNode variable:

// Immediately invoked function; its return value is stored in current_dir
const current_dir = (() => {
  if (isNode) {
    return __dirname;
  } else {
    const fs = require('fs');
    return fs.absolute('.');
  }
})();

I only need this in my config file, which is why it’s a self-executing function. A regular function would be a better fit if you need to call it from multiple files.

Promises

Phantom lacks support for Promises (remember, I need to learn and most importantly get out of callback hell). This was an easy fix, just

npm install es6-promise --save

and then

require('es6-promise').polyfill();

at the top of the crawler.js script. You can probably use Babel’s Promise polyfill as well.

The result

Together, these settings enable some pretty sweet code:
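
To give a flavour of the style, here is a small sketch that runs in plain Node. `promisify` and `fakeOpen` are names invented for this example; `fakeOpen` is only a stand-in that mimics how PhantomJS's page.open(url, callback) calls back with a status string:

```javascript
// Wrap a callback-style function (callback is the last argument and receives
// a single result) in a function that returns a Promise instead.
const promisify = (fn) => (...args) =>
  new Promise((resolve) => fn(...args, resolve));

// Hypothetical stand-in for an asynchronous call such as page.open(url, callback).
const fakeOpen = (url, callback) => setTimeout(() => callback('success'), 10);

const openPage = promisify(fakeOpen);

openPage('http://example.com').then((status) => {
  console.log(`Opened page with status: ${status}`);
});
```

Arrow functions, rest/spread arguments and template literals are all ES2015 features that Babel transpiles for PhantomJS.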

and since promises are chainable we can get (imho) very readable sequences:
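
A chain could look something like the sketch below. The step functions (`openShop`, `extractProducts`, `saveProducts`) and the data they return are placeholders made up for illustration, not the real crawler code; the point is only the shape of the sequence:

```javascript
// Hypothetical steps; each one returns a Promise so they can be chained.
const openShop = () => Promise.resolve('<html></html>');
const extractProducts = (html) => Promise.resolve([{ name: 'Example', price: 10 }]);
const saveProducts = (products) => Promise.resolve(products.length);

const crawl = () =>
  openShop()
    .then(extractProducts)   // receives the HTML from openShop
    .then(saveProducts)      // receives the extracted products
    .then((count) => {
      console.log(`Saved ${count} products`);
      return count;
    })
    .catch((error) => {
      // A single handler covers a failure in any of the steps above.
      console.error('Crawl failed:', error);
      throw error;
    });

crawl();
```

Each step reads top to bottom, and error handling lives in one place instead of being repeated in every callback.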

There are still a tonne of improvements that can be made, but this sets the right direction for me.

What do you think, would you have done anything differently? Have you solved the same problems, if so, how? Since this is my first post on Medium and my first “how-to” article I appreciate all the feedback I can get.