Training a geometry-based regression to predict CNC runtimes

Published in Paperless Parts Tech Blog

8 min read · Jun 21, 2024

CAMWorks simulating a CNC tool path generated by a VBA Macro

The Setup

In the summer of 2018, I was an intern at Paperless Parts. At the time, we were a 13-person company focused on invigorating and empowering the custom parts manufacturing market by introducing a semi-automated marketplace for buyers and job shops. The key to that business model was an easily configurable, automated quote engine. A shop would configure its rates and capabilities in Paperless and go “live” on the marketplace. Buyers would submit a request for quote to the Paperless marketplace and immediately receive quotes from every shop configured in Paperless. The model was similar to Xometry’s, but without any obfuscation between buyer and supplier: we wanted to empower manufacturers and grow their brands, not blindly source their jobs.

(Spoiler alert: this model proved untenable for a variety of reasons. Today, Paperless Parts is an estimating and quoting software company that partners with job shops to help them make smarter, faster, and more informed decisions when quoting their work.)

Enter me, a third-year mechanical engineering student, looking for an opportunity to explore career options and learn as much as I could in a summer. What better place than a startup in an industry I was not familiar with, working with tech I had never used before? Thanks, Dana, Scott, and Jason, for taking a chance on me. I was super excited to immerse myself in the problem space and technology.

The Problem

I was tasked with developing an algorithm that could take in a custom part’s geometry (as a boundary representation, or B-rep, CAD model) and estimate how long it would take to machine the part on a CNC mill. The solution had to be fully automated and performant, returning results in seconds without any human input. Oh, and the estimate had to be accurate enough that estimators and machinists would not scoff at it. The idea was that this runtime estimation model could be embedded in the marketplace and earn enough trust from estimators that they would rely on it to drive their configured, automated pricing engine on the Paperless marketplace.

The Approach

The path we decided to pursue was a linear regression model to predict runtimes. The existing Paperless geometric analysis, or “interrogation” as we call it, could already extract many attributes from a given CAD file: high-level ones like volume, area, and bounding box, and more complex ones like the orientation of a “good” first setup, a surface-based analysis of a machining plan, and rudimentary feature detection. All of these could serve as predictors for the linear regression.
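For illustration, here is a minimal sketch of how interrogation attributes might be turned into regression predictors. Everything in it is an assumption: the `PartGeometry` fields, the inch units, the 0.125 in default buffer, and the derived `removed_volume` and `fill_ratio` predictors are hypothetical, not the actual Paperless interrogation output. Stock is modeled as the bounding box grown by a constant buffer on every side, the same simplification used for the CAMWorks runs.

```python
from dataclasses import dataclass


@dataclass
class PartGeometry:
    """Hypothetical subset of interrogation output for one part (units: inches)."""
    volume: float                      # part volume, in^3
    surface_area: float                # total surface area, in^2
    bbox: tuple[float, float, float]   # bounding-box extents (x, y, z), in


def predictors(part: PartGeometry, stock_buffer: float = 0.125) -> dict[str, float]:
    """Turn raw geometry into regression predictors.

    Assumes the stock is the bounding box grown by `stock_buffer` on every side.
    """
    x, y, z = part.bbox
    bbox_volume = x * y * z
    stock_volume = (x + 2 * stock_buffer) * (y + 2 * stock_buffer) * (z + 2 * stock_buffer)
    return {
        "volume": part.volume,
        "surface_area": part.surface_area,
        "bbox_volume": bbox_volume,
        "removed_volume": stock_volume - part.volume,  # material the mill must cut away
        "fill_ratio": part.volume / bbox_volume,        # 1.0 for a solid rectangular block
    }
```

A solid 1 × 2 × 3 in block with a 0.5 in buffer, for example, has 24 in³ of stock and 18 in³ of material to remove.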

The more difficult piece was gathering truth data to train against. We considered a few possibilities, such as hiring a machinist to either estimate runtimes or actually program many parts. Ultimately, I decided to investigate programming the files myself and automating tool path generation as much as possible. I used CAMWorks on top of SolidWorks to generate tool paths. As someone new to manufacturing, without a real CNC machine in front of me to run, I wanted to rely on CAMWorks’ automatic tool path generation with as little manual override as possible.

What I found was that if I configured CAMWorks with a standard tool set, set the material to Aluminum 6061, added a stock buffer, and opted for Rest machining, CAMWorks would generate a machining plan in very few clicks that was good enough to produce a simulated geometry pretty dang close to the detailed model. There were some caveats I found and worked through:

CAMWorks automation relied on the orientation of the file to pick the first setup direction
It would not automatically generate tool paths to cut 3D surfaced areas (think non-prismatic features such as fillets or curved surfaces that would not be profiled)
It struggled with more than two setups; it only handled the top and bottom of the part well

Remember that this was me using CAMWorks to do as much work as possible with the smallest amount of human intervention; I’m sure that with just a dash more human interaction these things would have been easy to configure. Even more interaction from an experienced programmer is necessary to make a production-ready tool path, but more on that later.

It became clear there was a path to many data points: a VBA macro that could load a CAD file into SolidWorks, use CAMWorks to generate a machining plan, and simulate the tool path. The simulation gave us a runtime, and analyzing the simulated workpiece told us whether the tool path was feasible. Letting the macro run on hundreds of parts quickly gave us a dataset.
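Once the macro has logged a runtime and a feasibility check for each part, assembling the training set is essentially a join. A minimal pandas sketch, assuming hypothetical column names (`part_id`, `runtime_min`, `feasible`) rather than the real schema:

```python
import pandas as pd


def build_dataset(features: pd.DataFrame, sims: pd.DataFrame) -> pd.DataFrame:
    """Join per-part predictors with simulated CAMWorks results.

    Hypothetical schemas:
      features: part_id plus one column per predictor
      sims:     part_id, runtime_min, feasible (bool: did the simulated
                workpiece come close enough to the CAD model?)
    """
    # Inner join keeps only parts that have both predictors and a simulation;
    # validate= raises if any part_id appears twice on either side.
    merged = features.merge(sims, on="part_id", how="inner", validate="one_to_one")
    # Drop parts whose simulated tool path did not actually produce the part.
    return merged[merged["feasible"]].drop(columns="feasible").reset_index(drop=True)
```

The `validate="one_to_one"` check is a cheap guard against a part being simulated twice and silently weighting the regression.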

I used Python to analyze the dataset, mainly sklearn, numpy, and pandas. Fitting a linear regression was rather simple with these tools, but (maybe unsurprisingly) the harder part was deciding how many predictors to use, which ones, and how to bucket the data. I tuned these three levers by trial and error until I started seeing diminishing returns. The outcome was a combination of five predictive models: three decision tree classifiers and two linear regressions. Together they output a runtime prediction and a confidence level (low, medium, or high).
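The exact wiring of those five models isn’t spelled out, so the following is only a simplified, hypothetical arrangement in sklearn: one decision tree routes each part into a bucket, a per-bucket linear regression predicts runtime, and a second decision tree assigns the confidence label. The class name, the two-bucket split, and `max_depth=3` are all assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier


class BucketedRuntimeModel:
    """Hypothetical combination of decision tree classifiers and linear regressions.

    A decision tree routes each part to a bucket, that bucket's linear
    regression predicts the runtime, and a second decision tree assigns a
    confidence label ("low", "medium", "high").
    """

    def __init__(self, max_depth: int = 3):
        self.router = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
        self.confidence_clf = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
        self.regressors = {}  # bucket label -> fitted LinearRegression

    def fit(self, X, runtime, bucket, confidence):
        X = np.asarray(X, dtype=float)
        runtime = np.asarray(runtime, dtype=float)
        bucket = np.asarray(bucket)
        self.router.fit(X, bucket)
        self.confidence_clf.fit(X, confidence)
        # One linear regression per bucket, trained only on that bucket's parts.
        for label in np.unique(bucket):
            mask = bucket == label
            self.regressors[label] = LinearRegression().fit(X[mask], runtime[mask])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        buckets = self.router.predict(X)
        runtimes = np.empty(len(X))
        for label in np.unique(buckets):
            mask = buckets == label
            runtimes[mask] = self.regressors[label].predict(X[mask])
        return runtimes, self.confidence_clf.predict(X)
```

Training buckets could come from the CAMWorks runtimes themselves (for example, “quick” parts under 40 minutes versus longer ones), and confidence labels from how well earlier held-out predictions matched; both choices are assumptions here.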

The results were promising. When we had high confidence in a prediction, we got within a few minutes of the CAMWorks runtime for parts under 40 minutes, and within about ten minutes for longer parts. This gave us a median miss of roughly 20%. However, it was clear this was not something we could use to drive someone’s business completely hands-off. Initially, we let users see the predicted runtime and confidence and use them as they saw fit; most used the prediction as a benchmark while examining whether they could lean on it in any way.

“Quick Part” Linear Regression: CAMWorks Runtime vs. Predicted Runtime
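One reasonable way to compute a median miss percentage, assumed here rather than taken from the original analysis, is the median of the absolute error divided by the CAMWorks runtime. A minimal numpy sketch:

```python
import numpy as np


def median_miss_pct(actual, predicted) -> float:
    """Median of |predicted - actual| / actual, expressed as a percentage.

    `actual` holds the CAMWorks (truth) runtimes and must be nonzero.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.median(np.abs(predicted - actual) / actual) * 100)
```

Using the median rather than the mean keeps a handful of wildly missed parts from dominating the headline number, which is also why it can look better than the typical experience of a user quoting a single part.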

To extrapolate the runtime prediction to materials other than Aluminum 6061, we assumed we could scale the aluminum runtime linearly by material. We asked some industry experts to help identify ballpark scale factors between common material families.
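The linear material scaling amounts to a single multiplier per material. A minimal sketch; the material keys and factors below are placeholders, not the ballpark values the experts suggested:

```python
# Placeholder multipliers relative to Aluminum 6061. Illustrative only;
# these are NOT the factors Paperless Parts used.
MATERIAL_SCALE = {
    "aluminum_6061": 1.0,
    "steel_1018": 2.0,
    "stainless_304": 3.0,
}


def scaled_runtime(al6061_runtime_min: float, material: str) -> float:
    """Linearly scale an Aluminum 6061 runtime prediction to another material."""
    try:
        factor = MATERIAL_SCALE[material]
    except KeyError:
        raise ValueError(f"no scale factor for material {material!r}") from None
    return al6061_runtime_min * factor
```

Every operation on the part gets the same multiplier, regardless of tool, feature, or how much of the runtime is cutting versus non-cutting motion, which is the weakness of this form.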

The feedback we got in practice was that the model tended to considerably overestimate runtimes, and not in a predictable way. It was clear the model had overfit to the truth data, which was not realistic to begin with. Eventually, we decided the estimates in our product were doing more harm than good, and we chose to sunset the feature.

The Takeaways

The entire project was a bit of a presumptive swing. Lots of things had to go right for it to add customer value, and they did not. However, we learned a lot about the problem space and experienced first-hand why predicting machining runtimes is a much, much harder problem than we initially hoped. Along the way we made a lot of assumptions, many of them wrong, but the most valuable takeaway was realizing why each of them was wrong:

A linear regression is too simple a predictive model to adequately predict runtimes. Too many of the variables in play do not scale runtime linearly.
Assuming that a predictive model built solely on Aluminum 6061 could be extrapolated to other material families by scaling was another oversimplification.
Assuming that the stock will always be the bounding box plus a constant buffer is far-fetched. In reality, stock comes in many different shapes: some standard, some pre-cut or pre-turned, and sometimes the workpiece goes through other operations, such as water-jet cutting, turning, or casting, before it reaches the CNC mill.
Ignoring multi-setup parts and 3D surfaced areas was a major oversimplification. Too many geometries contain regions and features that require surfacing operations. A runtime predictor for two-setup, prismatic parts is too narrow for the breadth of use cases we were targeting.
There is no “correct” runtime. We made a model that could somewhat effectively predict what CAMWorks generates out of the box, but that is not the runtime that estimator X, programmer Y, or operator Z will achieve. In fact, all of their runtimes are going to be different! Runtime is subjective. It depends on the person, shop, CAM software, tools, machine, etc. (If you want to get micro-level, it can even depend on how busy the shop is, how the programmer’s day is going, the state of the economy; the list goes on.) Scott Sawyer wrote a great article that expands on this:

Why Automated Estimation of Machined Parts is Hard

You cannot ignore variables. We chose to assume all parts would be made from 6061 with the same tool set, with no way of capturing critical tolerances, threads, and other requirements. Those variables can easily halve or double your runtime.
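To see why a linear form struggles when variables multiply runtime rather than add to it, consider a purely synthetic example: runtime is defined as the product of a normalized removed volume and a “difficulty” multiplier standing in for material, tolerances, and threads. The random forest is only a stand-in for “some nonlinear model”, not something Paperless Parts adopted. For two independent uniform features, even the best possible linear fit of their product explains only about 6/7 (≈ 0.86) of its variance. Both models are scored on a held-out split, since in-sample scores would reward overfitting.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
removed_volume = rng.uniform(0.0, 1.0, n)  # synthetic, normalized
difficulty = rng.uniform(0.0, 1.0, n)      # synthetic stand-in for material/tolerance effects
# Synthetic runtime (minutes): the two variables multiply rather than add.
runtime = 60.0 * removed_volume * difficulty

X = np.column_stack([removed_volume, difficulty])
X_train, X_test, y_train, y_test = train_test_split(X, runtime, test_size=0.2, random_state=0)

linear = LinearRegression().fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Score on held-out parts only.
linear_r2 = r2_score(y_test, linear.predict(X_test))
forest_r2 = r2_score(y_test, forest.predict(X_test))
print(f"linear R^2 = {linear_r2:.3f}, random forest R^2 = {forest_r2:.3f}")
```

A better model class does not fix the deeper issue, though: if the truth data itself is unrealistic, any model will faithfully learn the wrong thing.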

The Future

This will not be the last effort by Paperless Parts to build an assistive tool for estimating CNC runtimes. The findings from this project continue to fuel whatever solution we venture towards next. Given the incorrect assumptions above, I would expect our solution to have the following attributes:

Highly configurable: the outcome should not be the runtime; it should be the user’s runtime. It should take into account the user’s preferences and biases, and allow them to override any method or assumption.
No obfuscation: the solution should provide mechanisms for the user to know exactly how the value they are seeing was generated. A black box solution will never garner trust from users.
Iterative: first provide a solution for prismatic (2.5D) parts, then 3-axis, then 4-axis, then 5-axis and so on until all desired operation orders have a solution.
Balanced: the solution should exist in a happy medium between a quick, rough estimate and fully programming the part. The user should decide how much effort to put in to achieve a result on the scale they are comfortable with.
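A hypothetical sketch of the first two attributes, configurability and no obfuscation: the estimate is returned together with every step that produced it, and the user can pin any step to their own value. The step names, fields, and defaults are illustrative assumptions, not a Paperless Parts design.

```python
from dataclasses import dataclass, field


@dataclass
class ShopAssumptions:
    """Hypothetical user-editable assumptions; the defaults are placeholders."""
    material_factor: float = 1.0  # runtime multiplier vs. the baseline material
    tool_changes: int = 0
    tool_change_seconds: float = 8.0
    overrides: dict[str, float] = field(default_factory=dict)  # step name -> user-pinned minutes


def explainable_estimate(cut_minutes: float, shop: ShopAssumptions) -> tuple[float, list[tuple[str, float]]]:
    """Return the total runtime in minutes and the itemized steps behind it."""
    steps = [
        ("baseline cutting time", cut_minutes),
        ("material adjustment", cut_minutes * (shop.material_factor - 1.0)),
        ("tool changes", shop.tool_changes * shop.tool_change_seconds / 60.0),
    ]
    # Any step the user has pinned replaces the computed value, and the
    # returned breakdown shows the value actually used.
    steps = [(name, shop.overrides.get(name, minutes)) for name, minutes in steps]
    total = sum(minutes for _, minutes in steps)
    return total, steps
```

Returning the itemized steps alongside the total is what lets a user see exactly where a number came from and disagree with one piece without discarding the whole estimate.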