Solving the Box Selection Algorithm
Note: The links in this article to shotput’s documentation no longer work. If you have any questions, please feel free to reach out.
When I joined Shotput in February, I knew I was going to get to dive deep into development. It’s a small team with some tough problems to solve, and I was really excited to flex my newly acquired programming skills and simultaneously continue to develop as a software engineer. What I hadn’t yet realized was that I would get the opportunity to take on a non-trivial algorithmic problem and solve it: box packing.
Think about it for a second. At first it seems like there is a simple solution to the problem: see if a product fits, if it does, put it in the box, if not, use a different box. But then you realize you have to think about dimensions, rotations, packing more than one item at a time, and shipping costs. With so many variables, runtime exponentially increases which means you have to apply smart optimizations. In other words, it reduces to a well-known computer science problem called Knapsack.
When I took on the task of building a three dimensional packing algorithm so our warehouse could accurately select the best and most efficient box for a shipment, I knew it had to be good, and I knew it wouldn’t be easy. We’d been using a 3D packing library I’d installed into the code a month prior, but it was constantly inaccurate and our operations team was frequently having to change the box it had selected, which ultimately affected shipping cost, operation time, and experience for both our clients and their customers.
Before I get into what goes into writing the box algorithm (dreaming in shapes and code for one), let me talk a bit about why it’s important, and why it needs to be accurate and fast.
For a lot of fulfillment companies, box selection is a nightmare. Many larger companies have found it more cost effective to select a box that is too large than to have to figure out which one is would work best. Have you ever gotten a package from Amazon, that had one item — perhaps with non-golden ratio dimensions — and the box is simply WAY too big for the product inside?
Unless you live under a rock, surely this has happened to you. As a consumer, upon receiving a comically large box for it’s contents, you might laugh a bit, post a photo to twitter or blog about the ridiculousness of it, then forget about it; as a business, using a box that is too big is hugely inefficient in cost, materials, and energy consumption.
Here, I’ll give an example:
Say you are using a box to ship a product that is 25% larger than the necessary box — and let’s face it that’s being conservative. To prevent movement and damage to the contents of a package, you’ll probably use additional packing material — the bigger the box, the more packing material required. Not only that, but carrier rates are often affected by volume. For USPS, the difference in shipping cost between an 18x18x15” parcel and an 18x18x16” parcel is $5, while UPS charges approximately $0.35 for every additional inch in the dimensions of a parcel.
There is an environmental impact to using too large a box as well. The obvious impact is the result of using more materials, which, especially if you are using plastic, certainly adds to your carbon footprint as well as overall cost. Less obvious is the energy consumption from the carrier itself. If your box is 25% larger than a workable alternative, that means you could only ship 4 in an equal amount of space for every 5 that you would if your box were the right size, ultimately requiring more vehicles to deliver an equal number of packages.
Now it should be obvious why selecting the right box is important. By using an algorithm to select the correct shipping box for a shipment, we are not only bringing fulfillment into the 21st century — through providing accurate and quick pricing and a consciousness of our carbon footprint — we also reduce decisions that need to be made by the operations team, which then allows them to focus their energies on higher priority issues, such as customer support.
The simplest, most obvious accurate solution to the box packing problem:
For each product you need to pack, add it to a box, rotating the product and any other contents of the box until you have determined it does or doesn’t fit. If it doesn’t fit, start another box.
That sounds simple enough. Let’s work through it.
Say you a few 4x5x5 boxes, and you have 3 products:
- First, take product 1, rotate until you can fit it into a box.
- Next, take product 2, try to put it in the box. If it does not fit, rotate product 1 and rotate product 2 until you have tried all possible options (36 at minimum, not including any possible shifting and moving of the products from one location of the box to another) until you realize that product 2 will not fit into the first box.
- Open a second box, and put product 2 in.
- Finally take product 3 and rotate it and product 1 until you find a rotation at which both products fit into the box.
But what if you had a fourth product that was also 1x3x3? It would fit into the first box, and in order to find a position in which it would fit, you would again need to rotate products 1, 3 and now 4 until you find a rotation in which everything would fit, resulting in up to 216 rotations of products, and several more shifts from one location to another.
The runtime on this is of course is ridiculous and it’s an incredibly suboptimal way of discovering what would and would not fit into a space and how to pack products for shipping. For every new item in a box, you multiply the time complexity by 6. This doesn’t seem too bad if you have 2 or 3 items in a box, but what if you are packing lots of small items? 10 items results in the possibility of 60,466,176 iterations. You read that right, Sixty Million rotations just to get the 10th item into the box. When trying to implement known exact solutions to the knapsack problem, scalability — the bane of software engineering — quickly becomes an issue. Though we strive for perfection, using an algorithm which slogs down the system with exponentially increasing runtime makes the solution not only ineffective, but a liability.
Solution #2: How I Went about developing a fast, accurate algorithm:
When I went to the whiteboard to begin diagramming and pseudo-coding the algorithm, rather than thinking about how a computer might pack boxes, I thought about how a person would pack products into a box. I wrote a series of optimizations to not only reduce the number of times an object has to be rotated, but also to allow a product to, once digitally placed into a box, not require movement. From there, I could quickly figure out how many of which products could fit into a box and then compare these results against other options.
The first optimization I integrated is the rule of First Fit Descending. What that means is ordering what you are packing by largest first. From there we can also apply the principle throughout the algorithm in determining which side to lay an item on, and which direction to place it in the box. So we pack the biggest products first in the smallest space you can. From there I thought about how to optimize the amount of space left — no matter how you turn it, a product will only ever take up a given volume, but you can optimize the dimensions of the space remaining through rotation. By organizing the products we are packing by size prior to beginning the algorithm, we then know that any given block of space remaining in the box need not be larger than the last item you just packed.
After each iteration, I have a complete look at all the remaining available space in a box, and can weed out any space that is unsuitable to fit any of the remaining products in an order. By also taking into consideration physical constraints, such as max weight, I can easily determine which boxes will be unsuitable for a dense order. In the end, after all the determining how many items fit into which boxes, ultimately the goal is to figure out which box to use at the lowest cost to the consumer, so we find the smallest box that will fit all the order’s items in the fewest number of parcels.
During the weeks I wrote the algorithm, I would lay down to sleep at night and see virtual prisms rotating and shifting to fit more. Every once and a while I’ll come up with a way that my algorithm might return results that aren’t 100% accurate, and I quickly resolve to mitigate the situation. A few weeks ago, I dreamt an obscure situation where the algorithm would provide the wrong answer, I woke up, tried it, tested it, and sure enough I was right. In writing an initial solution, I had been assuming that the smallest item would be the only thing we needed to look at when determining if a remaining space was still eligible to use. I discovered the problem, isolated the error, tweaked of the optimization techniques, and then, with just a little bit of sweat as I waited for all my tests to turn green, the problem was solved.
I’ve had a lot of fun writing and perfecting the box packing algorithm. I was faced with the problem of writing something that is fast and accurate enough to use throughout the Shotput system, and I’ve developed something that I am not only proud of personally, but also am excited to share it with the rest of the world, and to see what you all decide to do with it.
Checkout a more technical analysis of the algorithm and how it compares to other packing algorithms out there here.