Serving 2TB/day of dynamically-scaled images: Part 1
How I built my first native Node bindings, patched GraphicsMagick++, and tuned and optimized GM itself
So I had the pleasure of working on iHeart’s image scaler and figured I should document as many of the technical aspects as I can for future reference. At the time it was serving roughly 2TB of dynamically sized images per day, using a total of ~60 cores and ~15GB of memory.
The need for the project came up because iHeart was supporting many devices/systems with different screen sizes. Our image catalog was also updated so regularly by different music labels that offline batch processing became rather unmaintainable. Each client also has a very different set of requirements, ranging from scaling operations (resize, fill, blur) to formats (PNG vs JPEG vs WEBP) and quality. So we knew we’d need to build something to service those requests, at scale.
Some details regarding the driver can be found here. That post partially covers the scaffolding of a native Node driver project with node-nan, binding.gyp, benchmarks and such. The other part is patching the GM++ library itself, since at the time it lacked some functionality that the CLI had (resize and extent with gravity, to be exact). I could certainly do some of the gravity box calculation in the Node layer, but that would be less efficient: the image is buffered in C++, its dimensions would have to be extracted and type-converted to JS while the buffer is still held, and gravity-based resizing would need to be re-implemented there too. It would also break the simplicity of the 1-to-1 binding mapping between GM++ and Node. So patching GM++ it is.
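For reference, the gravity box math itself is simple enough; here is a minimal sketch of what would have had to be re-implemented on the Node side (the function name and signature are illustrative, not the project’s actual API; the gravity names follow GM’s convention):

```js
// Illustrative only: computes the top-left offset of a target box placed
// inside a source image according to a GM-style gravity.
function gravityOffset(srcW, srcH, boxW, boxH, gravity) {
  const xMid = Math.round((srcW - boxW) / 2);
  const yMid = Math.round((srcH - boxH) / 2);
  const xEnd = srcW - boxW;
  const yEnd = srcH - boxH;
  switch (gravity) {
    case 'NorthWest': return { x: 0,    y: 0 };
    case 'North':     return { x: xMid, y: 0 };
    case 'NorthEast': return { x: xEnd, y: 0 };
    case 'West':      return { x: 0,    y: yMid };
    case 'Center':    return { x: xMid, y: yMid };
    case 'East':      return { x: xEnd, y: yMid };
    case 'SouthWest': return { x: 0,    y: yEnd };
    case 'South':     return { x: xMid, y: yEnd };
    case 'SouthEast': return { x: xEnd, y: yEnd };
    default:          return { x: 0,    y: 0 };
  }
}

// e.g. a centered 800x800 box inside a 1920x1080 source:
// gravityOffset(1920, 1080, 800, 800, 'Center') -> { x: 560, y: 140 }
```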
Being a C/C++ lib that supports multiple platforms, GM has a rather long release cycle (2 years, to be exact, until CentOS considers a version stable). The downside is that we’ll have to live with dev snapshots until GraphicsMagick >=1.3.21 is out. C/C++ programming paradigms are also radically different from JavaScript’s, type conversion being one example and buffer utilization another (buffers are allocated, reused and passed along instead of new-ing things and memcpy-ing), which is certainly very interesting to pick up. In case anyone wondered, JS function parameters are call-by-sharing, so not exactly pass-by-reference or pass-by-value.
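A quick illustration of call-by-sharing: reassigning a parameter has no effect on the caller, while mutating the shared object does.

```js
function reassign(buf) {
  buf = Buffer.from('new'); // rebinds the local name; the caller is unaffected
}

function mutate(obj) {
  obj.width = 100; // mutates the shared object; the caller sees this
}

const image = { width: 50 };
let data = Buffer.from('old');

reassign(data);
mutate(image);

console.log(data.toString()); // 'old'
console.log(image.width);     // 100
```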
After building the bindings, tuning GraphicsMagick is a whole different set of things to consider. Scaling images is traditionally insanely expensive. GM does support OpenMP as a way of utilizing multiple cores, but I have not had success with that flag turned on: early symptoms ranged from slow performance to CPU thrashing as it tried to parallelize processing. It’d be interesting to see how the GM GPU delegate turns out, but the author seems to have some strong opinions about it, so I’ll see how that goes. Anyway, the storage hierarchy GM uses is pretty standard (memory, mmap-ed files, swap, then straight disk, which can itself be virtualized to sit in memory), so you can tune rather heavily on pixel cache limits, memory & disk usage and such.
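For reference, here is a minimal sketch of the kind of knobs involved, assuming GM’s MAGICK_LIMIT_* environment variables and OpenMP’s standard OMP_NUM_THREADS; the exact variable set and value formats depend on your GM build, and the numbers below are placeholders rather than what we ran in production.

```js
// Illustrative only: spawn `gm convert` with capped resources.
// Values are placeholders; tune against your own workload and GM version.
const { execFile } = require('child_process');

const env = Object.assign({}, process.env, {
  OMP_NUM_THREADS: '1',          // effectively disable OpenMP fan-out per process
  MAGICK_LIMIT_MEMORY: '256MB',  // pixel cache held in heap memory up to this
  MAGICK_LIMIT_MAP: '512MB',     // then mmap-ed files
  MAGICK_LIMIT_DISK: '1GB',      // then plain disk
  MAGICK_LIMIT_FILES: '64'
});

execFile('gm', ['convert', 'in.jpg', '-resize', '800x800', 'out.jpg'], { env }, (err) => {
  if (err) throw err;
});
```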
In terms of minimizing file size and speeding up specific operations, I ended up using a handful of techniques (a combined sketch follows the list):
- -strip: Stripping metadata off images sometimes drastically reduces their file size. Most of the time they contain redundant EXIF data that you won’t need anyway.
- -interlace Plane: This allows your image to be progressively rendered from blurry to clear, due to the way it structures the RGB planes. By default images are rendered top to bottom, left to right, which in case of failure leaves you with half an image; with interlacing you get a blurry full image instead, which can be the better trade-off.
- -filter Lagrange: The Lagrange filter seemed to yield better performance for me during resampling and resizing of images.
- Quality control: Image quality is not linearly correlated with file size, so this parameter takes some tuning. The output is, however, fairly indistinguishable anywhere between quality 75 and 100, even for 4K images.
- WEBP: WEBP is the newer image format from Google, and GM happens to have a delegate for it as well. I didn’t end up utilizing this as much as I had hoped, for several reasons: encoding/decoding WEBP is somewhat more expensive than JPEG/PNG on mobile devices, which might affect other CPU-intensive threads (like decoding the music stream in iHeart’s case); WEBP adoption across browsers is still rather low; and specifying one more parameter may result in cache fragmentation, since full URLs are the cache keys.
- Resize before blurring: Blur is extremely expensive, especially on big images; it can take upwards of a few seconds on a 1080p image. How did I optimize this? Downsize first. For any request whose URL includes a blur, I purposefully downsize the image to 20%, blur it, then upsize it to 500% so the original dimensions are preserved. This significantly reduces processing time and thus improves scalability quite nicely.
- Minimize buffer copies/conversions: You can read all about Node’s SlowBuffer (and why it’s called slow, due to heap allocation) and see that repeatedly type-converting Buffers isn’t really efficient, even though in this case the data is binary so it’ll likely be a memcpy.
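To make the list above concrete, here is a rough, CLI-flavored sketch of how those flags compose into a single convert call. The real service drives GM through the native bindings rather than the CLI, the blur and quality numbers are illustrative, and this assumes gm applies operators in command-line order, which is worth double-checking against your GM version.

```js
// Illustrative only: builds a `gm convert` argument list from a scale request.
function buildConvertArgs(src, dest, { width, height, blur, quality = 85 }) {
  const args = ['convert', src, '-strip', '-interlace', 'Plane', '-filter', 'Lagrange'];

  if (blur) {
    // Blur is far cheaper on a small image: downsize to 20%, blur,
    // then upsize 500% so the original dimensions are preserved.
    args.push('-resize', '20%', '-blur', '0x8', '-resize', '500%');
  }

  args.push('-resize', `${width}x${height}`, '-quality', String(quality), dest);
  return args;
}

// e.g. buildConvertArgs('in.jpg', 'out.jpg', { width: 800, height: 600, blur: true })
```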
It has been a very rewarding and challenging process, at least for me, to get the first piece of the project going. Working with GM and learning about image manipulation are already fun in and of themselves, so feel free to read more about layering, masking and such. In part 2 I’ll talk about API design and infrastructure setup!
P.S.: Rounded corners, as simple as they sound, are actually quite complex. A quick Google will tell you what’s actually happening, but in short it’s a 3-layer composition: layer 1 is the image, layer 2 contains 4 boxes, one at each corner, and layer 3 contains 4 circles, one at each corner, each with the border radius as its radius. And yes, it’s pretty expensive too.
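To make that layering a bit more concrete, here is a small, purely illustrative sketch of the geometry for layers 2 and 3 (the corner boxes and corner circles); the compositing itself still happens in GM.

```js
// Illustrative only: geometry of the corner boxes (layer 2) and corner
// circles (layer 3) for an image of w x h with border radius r.
function roundedCornerLayers(w, h, r) {
  const boxes = [
    { x: 0,     y: 0,     w: r, h: r }, // top-left
    { x: w - r, y: 0,     w: r, h: r }, // top-right
    { x: 0,     y: h - r, w: r, h: r }, // bottom-left
    { x: w - r, y: h - r, w: r, h: r }  // bottom-right
  ];
  const circles = [
    { cx: r,     cy: r,     r }, // top-left
    { cx: w - r, cy: r,     r }, // top-right
    { cx: r,     cy: h - r, r }, // bottom-left
    { cx: w - r, cy: h - r, r }  // bottom-right
  ];
  return { boxes, circles };
}
```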