webpack freelancing log book (week 4)

2017/04/24–2017/04/30

Tobias Koppers · webpack · Apr 30, 2017

--

This week was about build performance.

I merged some of the breaking changes that improve performance into a beta branch to test whether they work together. They actually do. I’ll continue to merge breaking changes for the next major version into this branch.

Algorithmic Complexity

The most important changes on this branch are two PRs that change the type of Chunk.modules and Module.chunks from Array to Set. This reduces the complexity of some algorithms and has a positive performance impact on large builds. There are more places where Arrays would be better replaced with Sets, but these two were the most important ones.

The problem is that these arrays are accessed all over the codebase and probably in 3rd-party plugins too. So changing the type is a breaking change.
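The complexity win is easy to see in isolation: Array.prototype.includes scans linearly, while Set.prototype.has is a constant-time hash lookup on average. A tiny illustration (not webpack's actual code; the names are made up):

```javascript
// Illustration only: membership tests, the kind of operation that became
// cheaper when Chunk.modules / Module.chunks moved from Array to Set.
const moduleIds = Array.from({ length: 100000 }, (_, i) => "module-" + i);

const asArray = moduleIds;          // includes() is O(n) per lookup
const asSet = new Set(moduleIds);   // has() is O(1) on average

console.log(asArray.includes("module-99999")); // true, after a full scan
console.log(asSet.has("module-99999"));        // true, via hash lookup
```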

(Persistent) Caching

The next thing I created this week is a simple loader called cache-loader, which has a simple job: it caches the result of the loaders following it in the loader chain. Luckily the loader API gives us all the tools needed to implement such a loader. Loaders need to declare their dependencies on files and directories for watching to work, and the loader API also allows reading the current dependencies.

So the loader was pretty easy to implement. Note that loaders run in two phases: the pitching phase from left to right and the normal phase from right to left. In between these phases the resource file is read. Most loaders operate in the normal phase.
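The two phases map directly onto the loader module shape: the exported function is the normal phase, and an optional pitch function on it is the pitching phase. A toy example (the transformation is meaningless, just to show the shape):

```javascript
// A toy loader demonstrating both phases. Real loaders use exactly this
// module shape; the behavior here is only for illustration.
function toyLoader(source) {
  // normal phase, runs right-to-left after the resource has been read
  return source.trim();
}

toyLoader.pitch = function (remainingRequest) {
  // pitching phase, runs left-to-right before the resource is read;
  // returning a value here would skip the remaining loaders
  return undefined;
};

module.exports = toyLoader;
```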

The cache-loader operates in both phases: in the pitching phase it checks the cache for a cache entry; in the normal phase it stores a cache entry.

Pitching phase:

  1. Hash the remaining request to generate the cache filename.
  2. Read cache file from disk. If not found, return nothing.
  3. Compare timestamps in cache file with files on filesystem. If not equal, return nothing.
  4. Return the result from cache entry. This skips all remaining loaders.

Normal phase:

  1. Hash the remaining request to generate the cache filename.
  2. Read dependencies from loader API.
  3. Read timestamps of dependencies from filesystem.
  4. Store dependencies, timestamps and result to cache file.
  5. Return result. The loader is transparent.

Simple but efficient.

https://github.com/webpack-contrib/cache-loader

Parallelization

The next thing was more complicated (and less successful): thread-loader

This loader runs the loaders following it in the loader chain in a thread pool, distributing the expensive loader work across more CPUs, which should result in faster builds. At least this was the plan…

But it was not that easy. Let’s start with the easy parts.

It’s possible to run loaders outside the webpack loader pipeline with a module called loader-runner; webpack uses the same module to run loaders. webpack also attaches some additional functions and properties to the loader context, like resolve, emitWarning, emitFile or options, and plugins can add more. Not everything is needed by every loader, so we can emulate some of them and skip others. This means our thread-loader probably only works for cases where that special stuff is not needed.

Now the difficult part: the thread pool. Probably nobody knows this, but webpack 0.7 to 0.9 or so already included an option to run loaders in a worker process. The result was disappointing: the overhead for spawning processes and passing data between them was too high, and running out-of-process was slower in all cases.

But you might think things have changed since then: loaders are more expensive, node.js has improved, and there are threading libraries now.

The threading option was the most promising one. Communication between threads is faster than between processes.

So I grabbed webworker-threads from npm. I’m on Windows, so I downloaded and installed the build tools first. A pretty straightforward process; it was more complicated a few years ago. Great work…

After some experiments with the library I noticed a problem: the thread is not a full node.js environment. It doesn’t even have a module system, so there is no require in the thread. This is a big problem: loaders won’t run here. I wasted an hour googling. I also thought about implementing the require function myself, but most native modules are not accessible from the thread; only fs is. I read something about some native modules not being thread-safe…

We need a node-in-a-thread! Any idea?

Ok, so far bad news. I thought about aborting this project (only the thread-loader, not webpack), but the next day I decided to try the process pool approach again.

I tried the IPC included in node.js (process.send), but it was pretty slow. It JSON.stringifys all the data passed, so it’s not suitable for large binary blobs.

So I rolled my own IPC using binary streams, which is much faster.
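The core idea of such an IPC is framing: each message is a length header followed by the raw payload, so binary blobs never pass through JSON.stringify. A minimal sketch (not thread-loader's actual protocol):

```javascript
// Minimal length-prefixed framing: a 4-byte big-endian length header
// followed by the raw payload. Buffers stay binary end to end.
function encodeMessage(payload) {
  const header = Buffer.alloc(4);
  header.writeUInt32BE(payload.length, 0);
  return Buffer.concat([header, payload]);
}

function decodeMessage(frame) {
  const length = frame.readUInt32BE(0);
  return frame.slice(4, 4 + length);
}
```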

But… it’s still an overhead. Adding the overhead for process spawning and the overhead for loading the loader in every worker process…

My own experiments didn’t show a significant performance improvement, but maybe your application is large enough and has expensive enough loader chains. So the loader is published, but don’t expect too much.

https://github.com/webpack-contrib/thread-loader

So far I’m pretty happy with the first month. I’ll probably have a problem in a few months, because we are now using more money from the collective than we receive. Note that webpack is not “owned” by a company like many other big Open Source projects. More details in a separate blog post…

Sponsor us here!

Survey didn’t help…

Plan for the next weeks:

  • Fix some bugs reported this week for webpack
  • Write a blog post about some financial aspects of webpack
  • Check what’s up with the css-loader, maybe it needs a refactoring
  • Add some new features to import()
  • Release a new webpack major version with some breaking changes
  • Fix bugs released with the major version
  • Start implementing scope hoisting…

Thanks to happypack, hard-source-webpack-plugin and babel-loader’s cacheDirectory option: their authors did a lot of work on caching and parallelization and served as inspiration for these loaders.

< week 2+3 | week 5–7 >
