Prototyping A Dimensioning System

Using range cameras for parcel sizing

Thung Han Hee
Ninja Van Tech
8 min read · Jun 3, 2018


A couple of months ago, I was tasked with figuring out a way to measure the size of parcels quickly and accurately. At the time, we were very dependent on the parcel dimensions we received from our shippers, and we had no effective way of verifying the accuracy of those numbers. This was a real problem, especially when it came to pricing: we would undercharge our shippers for delivery, and have no reasonable evidence with which to resolve disputes. Physically measuring each parcel was not an option, given the volumes that we inbound every day. So we needed a tech solution. We briefly considered purchasing an existing system on the market, but eventually decided to see if we could build a low-cost, portable system with range cameras ourselves.

And this is the story of how we did it.

Stage 1: Preliminary research

As with any venture into unfamiliar territory, the first thing to do was to survey the available technologies out there and map out what could be done within a reasonable budget and time frame. At this stage, we considered multiple devices of varying types, sizes and sensitivities: cameras mounted on iPads, webcams, as well as the Kinect and other Kinect-like devices. There were plenty of options, but many of them did not meet our range, sensitivity, price and ease-of-development requirements. After much consideration, we procured a variety of cameras and started playing around with them to see if we could build something with their SDKs. Eventually, we found a camera that could perform to our specifications, and we kept our setup to a single camera (as opposed to multiple ones at different angles) to keep things simple for a start.

Stage 2: A workable prototype

Now that we had a suitable range camera, our first task was to “teach” it to identify parcels. Assuming the camera would be mounted in a fixed position, this reduced our problem to the well-known background subtraction problem, for which many image processing and computer vision techniques already exist that yield pretty satisfactory results.

So we went ahead with an approach that used background mixture models. Essentially, this involves two phases. In the first phase, we build our background model by running the camera for a period of time pointed at the background scene. Once that’s done, we move on to the second phase, where we introduce our parcel into the scene. Here we take the difference between our background model and the current scene, apply a threshold to filter out any noise and unwanted artifacts, and magically obtain the segment of our scene that contains just our parcel. If this sounds complicated, perhaps this image will give a better explanation:
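For the code-inclined, here’s a minimal sketch of the same two-phase idea, using OpenCV’s MOG2 background subtractor as a stand-in. The post doesn’t name the exact mixture model or library we used, so treat the subtractor choice, camera index and threshold below as illustrative assumptions:

```cpp
#include <opencv2/opencv.hpp>

int main() {
    // Color stream from the camera; index 0 is an illustrative choice.
    cv::VideoCapture camera(0);
    cv::Ptr<cv::BackgroundSubtractor> subtractor =
        cv::createBackgroundSubtractorMOG2(/*history=*/500, /*varThreshold=*/16.0);

    cv::Mat frame, mask;
    while (camera.read(frame)) {
        // Phase 1 happens implicitly: the mixture model keeps learning
        // the background from the first frames of an empty scene.
        subtractor->apply(frame, mask);

        // Phase 2: threshold the mask to drop noise and shadow pixels
        // (MOG2 marks shadows as 127), leaving just the parcel segment.
        cv::threshold(mask, mask, 200, 255, cv::THRESH_BINARY);

        cv::imshow("parcel segment", mask);
        if (cv::waitKey(30) >= 0) break;
    }
    return 0;
}
```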

This seemed like the perfect solution for us. But life as a software engineer is never easy. After implementing the algorithm, we found several problems with this approach that would have made it difficult for us to proceed down this path:

  • The background model was generated from the color image, and as a result it was very sensitive to the color of the parcel. If the parcel was similar in color to the background, the algorithm would fail to segment it properly.
  • The background model was very sensitive to changes in the scene. This meant that small changes in camera stability or illumination would again cause our segmentation to fail.
  • The color and depth sensors on the range camera were physically offset, so the resulting color and depth images were not aligned at all distances from the camera. There was a minimum and maximum distance within which the two images would align, but our parcels come in all shapes and sizes, and it would have been difficult to guarantee that they would always fall within that range.

It was a shame to abandon this approach, but eventually we found a much simpler solution. It required a few assumptions: (1) that the camera would always face the ground, and (2) that the ground is always flat. With these assumptions, we could simply average all the depth values within the camera’s view and use that as the baseline for all our future measurements. Anything placed within the camera’s view that produced depth values above this baseline (i.e., closer to the camera than the floor) would be treated as foreground, and those values gave us the vertical dimension (height) of the parcel.
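In code, the baseline idea is only a few lines. Here’s a minimal sketch, assuming the depth frame arrives as a single-channel cv::Mat in metres; the function names and the 1 cm noise margin are my own for illustration:

```cpp
#include <opencv2/opencv.hpp>

// Average the depth over an empty scene to get the distance to the floor.
double calibrateBaseline(const cv::Mat& emptyDepth) {
    // Zero depth readings are invalid on most range cameras; mask them out.
    return cv::mean(emptyDepth, emptyDepth > 0)[0];
}

// Any pixel sufficiently closer to the camera than the floor is foreground;
// the parcel's height is the baseline minus the depth of its top surface.
double parcelHeight(const cv::Mat& depth, double baseline, double noiseMargin = 0.01) {
    cv::Mat foreground = (depth > 0) & (depth < baseline - noiseMargin);
    if (cv::countNonZero(foreground) == 0) return 0.0;  // nothing in view

    double topDepth;  // smallest depth value = highest point of the parcel
    cv::minMaxLoc(depth, &topDepth, nullptr, nullptr, nullptr, foreground);
    return baseline - topDepth;
}
```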

That completed the first task: we could identify the parcel and measure its height. Next, we moved on to figuring out how to obtain its length and width.

Our parcels come in all shapes and sizes:

Oftentimes, they are not even rectangular, and they can be placed under our camera in any orientation. So how do we even define length and width? Some math (specifically, linear algebra) came in quite useful here. I won’t describe what we did in detail, but essentially, we compute and draw the minimum bounding box around our foreground depth values. Here’s a point cloud visualization of the result:

A rectangular-shaped parcel (left) and an irregularly-shaped parcel (right).
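One standard way to get such a box in two dimensions is OpenCV’s cv::minAreaRect, which computes the minimum-area rotated rectangle around a set of points. The sketch below is illustrative rather than our exact code; it assumes the foreground depth pixels have already been projected onto floor coordinates, and parcelFootprint is a name I’ve made up for this post:

```cpp
#include <algorithm>
#include <vector>
#include <opencv2/opencv.hpp>

// points: (x, y) floor-plane positions of the foreground depth pixels,
// in metres, after projection from the camera's view.
cv::Size2f parcelFootprint(const std::vector<cv::Point2f>& points) {
    // Minimum-area rotated rectangle around the foreground points.
    cv::RotatedRect box = cv::minAreaRect(points);

    // Report the longer side as length and the shorter as width, so the
    // answer doesn't depend on the parcel's orientation under the camera.
    float length = std::max(box.size.width, box.size.height);
    float width  = std::min(box.size.width, box.size.height);
    return cv::Size2f(length, width);
}
```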

And we were done! Or so I thought. At this point, it was made known to me that we also needed to measure the dimensions AND the weight of the parcel at the same time! Weight had not been factored into our plans. Moreover, we hadn’t planned to deploy our initial prototype with any screen for visuals, so it was difficult for a user to get a physical sense of whether the parcel was within view of the camera. The idea, then, was to put a large flat surface on top of the weighing scale and have the camera crop to that surface, so that only parcels within its boundaries would be measured.

This was a bit of a challenge, but it turned out not to be too difficult to solve. We ended up using Euclidean clustering to divide the scene into ‘clusters’, where points in the point cloud belong to the same cluster if they are close enough to each other. With that, we could find the surface on top of the weighing scale by picking the largest cluster that was flat and not the floor. Here’s a visualization of the automatic clustering algorithm:
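For the curious, here’s roughly what that clustering step looks like. The Point Cloud Library (PCL) offers Euclidean cluster extraction out of the box; the post doesn’t name the library we actually used, so treat this choice, along with the 2 cm tolerance and minimum cluster size, as illustrative assumptions:

```cpp
#include <vector>
#include <pcl/point_types.h>
#include <pcl/search/kdtree.h>
#include <pcl/segmentation/extract_clusters.h>

// Group the scene's point cloud into clusters of mutually nearby points.
// The caller then picks the largest cluster that is flat and not the floor.
std::vector<pcl::PointIndices> clusterScene(
    const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud) {
  auto tree = pcl::search::KdTree<pcl::PointXYZ>::Ptr(
      new pcl::search::KdTree<pcl::PointXYZ>);
  tree->setInputCloud(cloud);

  pcl::EuclideanClusterExtraction<pcl::PointXYZ> extractor;
  extractor.setClusterTolerance(0.02);  // points within 2 cm join a cluster
  extractor.setMinClusterSize(500);     // discard tiny noise clusters
  extractor.setSearchMethod(tree);
  extractor.setInputCloud(cloud);

  std::vector<pcl::PointIndices> clusters;
  extractor.extract(clusters);
  return clusters;
}
```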

Stage 3: Optimization for real-world usage

One issue that we didn’t anticipate until the late stages of development was how to integrate this dimensioning system into our existing inbounding process. In particular, our inbounding process at that point involved using a handheld scanner to read the barcode on the parcel. And since our application constantly published the dimension and weight values in real time, the moment someone reached into view of the camera with the scanner, his or her hand would become part of the measurement! To address this, we adapted our code to publish dimensions only when the values were “stable”, i.e., when the values over the past couple of image frames didn’t deviate by more than a predefined threshold. This meant that any motion under the camera would stop the publishing of values. So you could put your parcel on the weighing scale, take your hands out of view of the camera, wait a second for the readings to stabilize, and then safely read the barcode with your scanner.
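Here’s a minimal sketch of that stability rule, with a window size and tolerance that are illustrative guesses rather than our production values: keep a short history of readings and report “stable” only when the recent spread stays under a threshold.

```cpp
#include <algorithm>
#include <deque>

class StableReading {
public:
    StableReading(size_t window, double tolerance)
        : window_(window), tolerance_(tolerance) {}

    // Feed a new per-frame measurement (e.g. height in metres); returns
    // true once the last `window_` values agree to within the tolerance.
    bool update(double value) {
        history_.push_back(value);
        if (history_.size() > window_) history_.pop_front();
        if (history_.size() < window_) return false;

        auto [lo, hi] = std::minmax_element(history_.begin(), history_.end());
        return (*hi - *lo) <= tolerance_;
    }

private:
    std::deque<double> history_;
    size_t window_;
    double tolerance_;
};
```

Publishing is then gated on update() returning true: a hand entering the frame perturbs the readings and closes the gate until the scene settles again.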

Finally, our work was done! Except that our application was written in C++ and ran on a modern MacBook Pro, and we couldn’t possibly afford to buy a Mac for every setup that we rolled out. Initially, we tried running it on the Raspberry Pi and some of its alternatives, but they didn’t have enough computing power to process all the data coming from our camera, even at lower image resolutions. Eventually, we managed to deploy our application on a cost-effective, small-form-factor Intel Celeron PC, and got it running satisfactorily in real time, but only after refactoring our application to utilize all available cores on the CPU.
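To give a flavor of that refactor (the real change was spread across our own pipeline, so this is just the general pattern, not our actual code): split each frame’s per-pixel work by image rows and fan it out across the available hardware threads.

```cpp
#include <algorithm>
#include <functional>
#include <thread>
#include <vector>
#include <opencv2/opencv.hpp>

// Placeholder for the real per-pixel work on rows [firstRow, lastRow).
void processRows(cv::Mat& depth, int firstRow, int lastRow) {
    // ... segment, measure, accumulate ...
}

// Fan one frame's work out across all available hardware threads.
void processFrame(cv::Mat& depth) {
    const unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    const int rowsPerWorker =
        (depth.rows + static_cast<int>(workers) - 1) / static_cast<int>(workers);

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const int first = static_cast<int>(w) * rowsPerWorker;
        const int last = std::min(depth.rows, first + rowsPerWorker);
        if (first < last)
            pool.emplace_back(processRows, std::ref(depth), first, last);
    }
    for (auto& t : pool) t.join();
}
```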

Conclusion

This system is currently deployed as a prototype in a few of our warehouses in Indonesia, the Philippines and Vietnam. Some occasional issues are still being reported, mostly related to incorrect calibration before use. Even though we did our best to come up with a standardized setup, tested to work in our Singapore office, the system is unfortunately still very sensitive to the tilt of the camera and other environmental conditions (such as how cluttered the area is). So, overall, this prototype is still very much a work in progress.

We have some ideas for improving our prototype. Possible features include automatic picture-taking and barcode scanning of parcels, better visual feedback of the current camera view, more safeguards against incorrect calibration, the ability to measure larger parcels, and so on. Progress on these features will, of course, depend on the need.

I had a lot of fun working on this project, though it was quite challenging at times. It would not have been possible without the help and input of Shaun Chong, Cheehan, Chao Zhang and Ivan Wang, and I would like to thank them for their valuable contributions. I look forward to hearing any feedback or ideas you may have about this project, so please don’t hesitate to leave a comment or speak to me about it.

Finally, I’d like to also thank my wife who helped me immensely in the writing of this short post. You are the best, I will buy you a Porg.
