Form Auto: Cleaning the Window Crashed Our Software

Seth Berg
Published in Layer by Layer
7 min read · Mar 11, 2024

Welcome to the Formlabs Hardware Engineering blog! Here, we will share learnings and stories from our team about 3D printing, mechanical engineering, electrical engineering, optics, manufacturing, product development, and hardware engineering more broadly. Interested in joining our team? Reach out: careers.formlabs.com

A video taken from inside a Form Auto unit showing it removing parts from a Form 3 to enable automatic, continuous printing.

Intro to Seth: Seth Berg joined Formlabs 8 years ago as an R&D intern. He transitioned to a full-time role and spent his first 5 years as an engineer and technical program manager on the Fuse 1 program. With a few other colleagues at the Formlabs hackathon in 2021, Seth conceived the idea for and prototyped what would become Form Auto. Following the launch of Fuse, he led the Form Auto product development effort. Seth is currently working on future SLA products.

Seth has an Electrical Engineering and Computer Science degree from MIT (though he did consider majoring in Mechanical Engineering… his interests across various disciplines have served him well in leading product development at Formlabs!).

Bugs Are Frustrating

One of the most frustrating yet satisfying parts of shipping hardware products is encountering bugs that don’t make any sense, until suddenly the root cause is crystal clear. The bug I’m writing about didn’t impact customers, it didn’t cause any timeline delays, and it didn’t cost resources (other than some of our time) to fix. In the scheme of bugs, this one was pretty low stakes. But what I find so memorable about it is how unlikely the final root cause seemed when the issue first arose.

A Bug Appears at the Form Auto Factory

In January, 2023, we were building Form Auto EVT (Engineering Validation Test) units at the factory in China, and for the most part, everything was going smoothly. However, when the latest batch reached end-of-line testing, 100% of units were failing the full system test. The printer cover didn’t open when it was supposed to, and we got a generic software timeout error.

Photo taken while Form Auto EVT (Engineering Validation Tests) were being conducted at a factory in China.

Our first step when something like this happens is to look at what changed between the previous functional batch and the batch with issues, and then ask which of those changes might have caused the unexpected behavior we’re seeing.

Here were some of the changes we had recently made, along with our guesses as to the likelihood of them causing the failure:

  1. Used a new vendor for the cover actuator — This was a big red flag because this actuator opens the printer cover
  2. Changed some capacitor values in the circuitry around the USB hub that handles communication with the printer — This seemed possibly related, given the software timeout error
  3. Made various small software changes to test scripts that run at end-of-line testing — None were specifically related to the cover actuator, but this was still potentially related given that we made changes to the script that was now throwing an error
  4. Updated the latching mechanism for the part removal toolhead for improved reliability and serviceability — This is in a different subsystem than the one handling the printer cover, and had no electrical or software changes; therefore, it seemed unrelated to the issue at hand
  5. Replaced the plastic window for the camera (used to determine if parts are successfully removed from the printer build platform) with a different one with improved clarity — This was a single change to a clear piece of plastic, and was in a different subsystem than the one handling the printer cover; therefore, it seemed unrelated to the issue at hand

De-Bugging From Across the Globe

We made our list of possible culprits (items 1–3 from above) and reverted each change individually in the current (non-passing) batch of Form Auto units to see if the issue would disappear; if it did, we’d know our root cause. Unfortunately, none of the reversions fixed the issue. Then we applied each suspected culprit individually to the previous batch of passing units to see if we could reproduce the issue. We were not able to reproduce the issue on those units either.

All of these tests were being run at the factory in China, and it took a couple of days and nights of back-and-forth between the factory and our team in Somerville, MA, to rule out each change as the root cause of the issue. We were frustrated because none of the obvious suspects turned out to be the root cause, and 100% of units were still failing the end-of-line test.

Photos of (left) an image taken from the camera through the original lower-clarity window, and (right) an image taken from the camera through the higher-clarity window.

Our contract manufacturer had been running experiments of their own: changing different sub-assembly revisions and then re-running the test. They had no success over the first couple of days, but on the morning of day 3 or 4, they sent us an email indicating that they could solve the issue by swapping the new, higher-clarity camera window with the former, lower-clarity one. I remember receiving this email and thinking they must have made a mistake. It made no sense that a piece of plastic in front of a webcam, in a completely different part of the machine from the printer cover actuator, would stop a linear actuator from moving. But this was quick to test, so we took one of our sample units at our headquarters in Somerville, swapped in the new window, and ran the end-of-line test.

To everyone’s surprise, we suddenly got the exact same error message they’d seen at the factory. Taking the window out entirely also caused the error, but putting in the original blurry window made the error go away. We repeated this test a number of times in each configuration, and the results were always the same. For some reason, Form Auto only seemed to work when the camera window was blurry…

The Answer Was the Least Obvious One

Why was the camera window causing the actuator to stall? It turns out that the problem had nothing to do with the cover actuator, its electronics or any of its software. As soon as we were able to reproduce the issue, we looked closely at everything involved with handling USB communications, and watched what the software did when the issue was triggered. The root cause was an issue with how we were handling the encoded USB camera signal on the printer:

  • The video coming from the USB camera was being compressed before it was sent over USB. A blurry image has less high-frequency content than a higher-clarity one, meaning that the JPEG compression algorithm was able to produce a smaller file size than if the image had greater high-frequency content.
  • This meant that with the higher-clarity window, the camera was sending more data to the printer for each frame of video than it was with the lower-clarity window. We had designed the software to prioritize certain operations (such as moving motors and responding to user input) over others (such as receiving data from the camera). However, in this case, the prioritization had a side effect that we had not anticipated: The larger image sizes that had resulted from swapping in the higher-clarity window led the software to ignore (drop) some frames from the camera, as they could not be processed in the amount of time that had previously been sufficient. And, due to a bug in how dropped frames were handled, the software would crash when this occurred.
  • The only reason this appeared to impact the cover actuator is that the camera begins recording right before the cover actuator moves. Had the order been different, we would have seen a different motion system freeze.
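To make the compression effect concrete, here is a minimal Python sketch. It is not the camera's actual encoder or any Formlabs code: it applies an 8-sample DCT (the kind of transform JPEG applies to 8×8 pixel blocks) to a single synthetic row of pixels and counts the coefficients that survive quantization, as a rough proxy for encoded size. The names `dct_block`, `nonzero_coefficients`, and `box_blur`, the quantization step of 32, and the test signal are all assumptions chosen for illustration.

```python
import math
import random

BLOCK = 8  # JPEG works on 8x8 pixel blocks; this sketch uses 1-D blocks of 8 samples


def dct_block(samples):
    """Orthonormal DCT-II of one block of samples."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        total = sum(s * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, s in enumerate(samples))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        coeffs.append(scale * total)
    return coeffs


def nonzero_coefficients(row, step=32):
    """Count DCT coefficients that do not quantize to zero.

    Fewer surviving coefficients means less data to encode.
    The step size is an arbitrary illustrative value.
    """
    count = 0
    for start in range(0, len(row) - BLOCK + 1, BLOCK):
        coeffs = dct_block(row[start:start + BLOCK])
        count += sum(1 for c in coeffs if round(c / step) != 0)
    return count


def box_blur(row, radius=2):
    """Average each sample with its neighbours, mimicking a hazy window."""
    blurred = []
    for i in range(len(row)):
        lo, hi = max(0, i - radius), min(len(row), i + radius + 1)
        blurred.append(sum(row[lo:hi]) / (hi - lo))
    return blurred


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sharp_row = [rng.choice((40, 220)) for _ in range(256)]  # hard edges: lots of fine detail
blurry_row = box_blur(sharp_row)

sharp_cost = nonzero_coefficients(sharp_row)
blurry_cost = nonzero_coefficients(blurry_row)
print(f"sharp: {sharp_cost} coefficients, blurry: {blurry_cost} coefficients")
```

In this sketch the blurred row keeps fewer coefficients than the sharp one, because blurring attenuates exactly the high-frequency terms that quantization then rounds to zero. That is the same reason frames seen through the hazier window came out smaller.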

The fix was easy: allow dropped frames rather than error out. This is how the system should have worked all along, but we just happened to be running right below the threshold that triggered dropped frames, so we never hit the issue until we changed the window.
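The difference between the two behaviors can be sketched in a few lines of Python. None of this is Formlabs code: `Frame`, `FrameDroppedError`, `receive_frames`, the byte budget, and the frame sizes are hypothetical stand-ins. The budget is a simplification of "how much camera data can be handled in the time left over after higher-priority work."

```python
from dataclasses import dataclass


@dataclass
class Frame:
    index: int
    size_bytes: int


class FrameDroppedError(RuntimeError):
    """Raised by the strict policy when a frame cannot be handled in time."""


def receive_frames(frames, budget_bytes, tolerate_drops):
    """Return (indices of processed frames, number of dropped frames)."""
    processed = []
    dropped = 0
    for frame in frames:
        if frame.size_bytes <= budget_bytes:
            processed.append(frame.index)
        elif tolerate_drops:
            dropped += 1  # skip this frame and carry on with the next one
        else:
            # Models the original bug: a dropped frame is treated as fatal.
            raise FrameDroppedError(f"frame {frame.index} exceeded the budget")
    return processed, dropped


BUDGET = 60_000  # hypothetical per-frame budget, in bytes

# Old window: every frame fits, so the strict policy never fires.
blurry_frames = [Frame(i, 45_000) for i in range(5)]
blurry_result = receive_frames(blurry_frames, BUDGET, tolerate_drops=False)

# New window: some frames are larger than the budget.
sizes = [55_000, 72_000, 58_000, 75_000, 59_000]
sharp_frames = [Frame(i, size) for i, size in enumerate(sizes)]

try:
    receive_frames(sharp_frames, BUDGET, tolerate_drops=False)
    crashed = False
except FrameDroppedError:
    crashed = True

# Same frames with drops tolerated: the loop finishes and reports what it skipped.
fixed_result = receive_frames(sharp_frames, BUDGET, tolerate_drops=True)
print(blurry_result, crashed, fixed_result)
```

The design point is that a dropped camera frame is a recoverable event for a low-priority stream, so counting it and moving on is safer than raising an error that can take down unrelated work.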

A Software Fix

The fix was a simple software change. We pushed it to the factory and never saw the bug again. We fix software bugs all the time, but what I like so much about this one is how unlikely the final answer had seemed at the outset. On day 1 when the factory told us that the cover actuators weren’t moving, had someone suggested that the root cause was the new camera window, we wouldn’t have taken the idea seriously because there were so many other more likely culprits.

Although they can be the most frustrating and hardest to root-cause, problems that require engineers from different teams to collaborate to solve are the ones I enjoy most.

Form Auto’s Hackathon Roots

And, fun fact: the concept for Form Auto came out of one of our annual hackathons, during which Formlings and friends have a few days to work on projects they might not typically have the time and resources for (projects can be related to Formlabs or not at all). Here is a brief story of how Form Auto came to be.

What’s next?

There are so many other topics we want to write about and share. What else do you want to hear about? Email us at hardwareblog@formlabs.com. Looking forward to hearing from you and sharing more from behind the scenes at Formlabs.
