Additive Manufacturing Melt Pool Physics Prediction Using Physical Simulation Data

Yue Zhuang
Published in Analytics Vidhya · 10 min read · Mar 6, 2020

Authors: Cangcheng Tang, Shiyu Liu, Yue Zhuang

Problem Introduction

Additive Manufacturing (AM) is a relatively new manufacturing process that exhibits many favorable characteristics not possible with subtractive methods. Part quality, however, cannot be well controlled without full-scale physics simulation. To reduce the complexity of the physics simulation, we aim to provide fast predictions of the melt pool physics, enabling process planning and control based on the simulation data.

There are two problems to solve in this project. In the first, we predict the temperature at a specific point (given its x, y, z coordinates) from the laser power and laser speed. We also predict the melt pool length, width, and depth from the same two inputs.

In the second problem, we make the same predictions but consider three extra variables: laser angle, laser direction, and edge distance. Specifically, there are five equally spaced laser angles ranging from 10 to 90 degrees, two directions, and edge distances ranging from 0.06 to 1.6.

Exploratory Data Analysis

Task 1

Predicting Temperature

For task one, we first examined how temperature correlates with the covariates. From the correlation matrix, the target variable seems to be negatively correlated with all our features.

But this conflicts with our intuition, as temperature should be positively correlated with power. So we held speed fixed and examined the relationship between temperature and power, and the result was just as expected: the two are positively correlated. Likewise, holding power fixed, temperature and speed are negatively correlated. The visualizations of this relationship can be seen below; each line represents a fixed speed or a fixed power.

This aligns with our intuition: the more power used for heating, the higher the temperature should be; and the faster the laser moves, the less time it spends heating the Ti64, so the lower the temperature should be.
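The "freeze one variable" check above can be sketched with a toy table: the overall correlation of temperature with power can look negative when power and speed rise together, yet within each fixed speed the correlation is positive. Column names and values here are illustrative, not the project's dataset.

```python
import pandas as pd

# Toy data in which higher power settings happen to pair with higher speeds,
# so the marginal power-temperature correlation is negative even though,
# at any fixed speed, more power means a higher temperature.
df = pd.DataFrame({
    "power":       [100, 150, 200, 250, 300, 350],
    "speed":       [0.5, 0.5, 1.0, 1.0, 1.5, 1.5],
    "temperature": [900, 1000, 700, 800, 500, 600],
})

overall = df["power"].corr(df["temperature"])          # confounded: negative
per_speed = {s: g["power"].corr(g["temperature"])      # conditional: positive
             for s, g in df.groupby("speed")}

print(overall)
print(per_speed)
```

Grouping by the frozen variable is the tabular equivalent of the per-line plots described above.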

To analyze the relationship between temperature and the x, y, z coordinates, we used a 3D scatter plot for visualization.

We can see from the plot that in the top right corner, where X and Y are large and Z is small, the temperature is relatively high, while where X and Z are large and Y is small, the temperature is relatively low.

Predicting Melt Pool Dimensions

Similarly, we calculated the correlation matrix for the features in the melt pool dimensions data, as shown below. We can see that Pool Length and Laser Power have similar correlations with the other features, as do Pool Width and Pool Depth.

Task 2

Predicting Temperature

Analysis from task 1 yields similar results in task 2, so for this more complicated problem we focus on the additional features.

First, we drew boxplots of temperature against laser angle and laser direction. The temperatures in the plots below are on a log scale, to mitigate the effect of outliers. From the graphs, when the laser moves away from the edge, Ti64's temperature is generally higher. As for angles, the 10-degree and 50-degree settings have slightly lower temperatures than the others.

Another important feature is edge distance, so we made a scatter plot of the average log temperature conditional on it. The graph below shows that the farther a point is from the edge, the higher its temperature.
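The conditional-average plot boils down to a groupby-mean on the log temperature; a minimal sketch (column names are illustrative):

```python
import numpy as np
import pandas as pd

# Average log-temperature conditional on edge distance. In the article's
# data this average rises with distance from the edge; the toy values
# below mimic that trend.
df = pd.DataFrame({
    "edge_distance": [0.06, 0.06, 0.4, 0.4, 1.6, 1.6],
    "temperature":   [700.0, 750.0, 900.0, 950.0, 1200.0, 1250.0],
})
df["log_temp"] = np.log(df["temperature"])

avg_log_temp = df.groupby("edge_distance")["log_temp"].mean()
print(avg_log_temp)  # one row per distinct edge distance
```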

Method and Models

Why combine ML/DL with simulation?

In traditional approaches to analyzing a dynamic heat-transfer process, we usually start from thermal theory to build a mathematical model, then apply numerical PDE methods to set up a simulation, and finally implement that simulation in code.

Additive Manufacturing (AM) process simulation has attracted strong interest in recent years because additively manufactured parts still suffer from tolerance problems, manufacturing defects, and subpar strength and fatigue life, which keeps them from being used as production parts. Physical simulations provide reliable and cost-effective predictions of part distortion, residual stresses/strains, microstructure contents, and grain morphology, and can effectively guide product design and the manufacturing process toward improved part quality.

However, due to the multi-scale nature of the problem, certain microstructure simulations are difficult to scale up to part-level prediction. In this project, we combine data analytics with physical simulations and rules to speed up some of the computationally intensive AM microstructure simulations, namely the metallurgical phase transformation simulations, by building an ML model on simulation data to predict melt pool size and the temperature field. With further programming that combines physical rules from experimental diagrams, the ML model can then be used to predict microstructure properties.

Advantages of ML/DL:

  • Large dataset: we have a large training dataset from which an ML/DL model can benefit.
  • Reusability: once a model is built and trained, it can be reused in alternative scenarios.
  • Basic approach: an ML/DL model gives a quick general view of how the printer behaves, and printing accuracy can then be refined with further simulation, saving time and effort.

Data Leakage Problem

  • The training and validation data tend to share the same feature values, while the testing data does not. This can produce a misleadingly high score on the validation set, so we carefully created our own training and validation split from the original dataset.
  • To limit the impact of possible data leakage, we prefer customized parameter tuning over GridSearchCV for this problem.
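One plausible way to build such a leakage-free split (an assumption about the authors' procedure, not their code) is a group-aware split, so that all rows from one simulation file land entirely in training or entirely in validation:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Group-aware split: every row carries a file_id, and GroupShuffleSplit
# guarantees no file_id appears on both sides of the split. This avoids the
# leakage a plain row-wise split (or GridSearchCV's default CV) would allow
# when rows from the same simulation share feature values.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.normal(size=100)
file_id = np.repeat(np.arange(10), 10)   # 10 rows per simulated "CSV file"

splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, val_idx = next(splitter.split(X, y, groups=file_id))

print(len(train_idx), len(val_idx))
```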

Basic Model for Case 1:

In Case 1, no edges are involved in the printing process, so the heating is quite predictable given the fixed path the laser has traced. We should expect a model with high accuracy.

Model1:

In model 1, we aim to predict the temperature at a point given the laser's power and speed as well as the coordinates of the point.

Baseline model: Linear Regression gets an R2 score of 0.384.

Neural Network: we tuned and trained a DNN to predict temperature. The architecture is four dense layers with 2, 4, 8, and 16 neurons, ReLU activations, and he_normal initialization. The R² reached 0.9936 for temperature prediction, but tuning the hyperparameters and architecture took too long, so we moved on to tree-based models.
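A minimal sketch of that architecture follows. The authors trained it in a deep-learning framework with he_normal initialization; scikit-learn's `MLPRegressor` is substituted here only to keep the example dependency-free (it does not expose that initializer), and the synthetic inputs stand in for the real (x, y, z, power, speed) columns.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Four hidden layers of 2, 4, 8, 16 units with ReLU, as described in the text.
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 5))   # stand-ins for x, y, z, power, speed
y = X @ np.array([1.0, -0.5, 0.3, 2.0, -1.0]) + 0.01 * rng.normal(size=500)

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(2, 4, 8, 16), activation="relu",
                 max_iter=2000, random_state=0),
)
model.fit(X, y)
print(model.score(X, y))   # in-sample R² on the toy data
```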

Tree Model: we applied both a basic random forest and a gradient boosting (CatBoost) model to the dataset. The basic random forest gets an R² score of 0.993.

For the gradient boosting algorithm, we used all the data for training and parameter tuning, and R² reached 0.9990 on the validation dataset. Gradient boosting performs best among the three approaches.

Predicted temperature versus real temperature for model 1 task 1

Model2:

In model 2, we predict the size of the melt pool. Due to the limitations of the coordinate data in model 1, we cannot simply threshold at a single melting temperature, so other methods are needed.

Baseline Model: Linear Regression gets R2 scores for (length, width, depth) = (0.999, 0.965, 0.964).

Neural Network: we tuned and trained a DNN to predict length, width, and depth at the same time. The architecture is eight dense layers with 8, 9, 10, 16, 16, 32, 64, and 128 neurons, ReLU activations, and he_normal initialization. The R² reached 0.9995, 0.9987, and 0.9978 for melt length, width, and depth predictions.
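Predicting the three dimensions "at the same time" means one network with a three-dimensional output. A hedged sketch with the layer widths from the text (again using `MLPRegressor` in place of the original framework, so he_normal initialization is not reproduced; inputs and targets are synthetic):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy multi-output target: all three pool dimensions grow with power/speed,
# loosely mimicking the physics (more energy per unit length -> bigger pool).
rng = np.random.default_rng(0)
X = rng.uniform(low=[100.0, 0.5], high=[400.0, 2.0], size=(800, 2))
ratio = X[:, 0] / X[:, 1]
Y = np.column_stack([0.010 * ratio,      # toy "length"
                     0.005 * ratio,      # toy "width"
                     0.004 * ratio])     # toy "depth"
Y += 0.001 * rng.normal(size=Y.shape)

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(8, 9, 10, 16, 16, 32, 64, 128),
                 activation="relu", max_iter=3000, random_state=0),
)
model.fit(X, Y)
print(model.score(X, Y))   # R² averaged over the three outputs
```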

Tree Model: we applied both a basic random forest and a gradient boosting (CatBoost) model to the dataset. Random forest gets R² scores for (length, width, depth) = (0.999, 0.994, 0.992).

For the gradient boosting algorithm, we used all the data for training and parameter tuning, and R² reached 0.9993, 0.9999, and 0.9998 for melt length, width, and depth on the validation dataset. Gradient boosting again performs best among the three approaches.

Melt length, width and depth prediction versus real values

Basic Model for Case 2:

In Case 2, when the laser is near the edge of the material, heat conduction depends not only on the Case 1 features but also on the edge information. We expect that a simple model may not work well here, so feature engineering and parameter tuning are needed.

Model1:

Similar to model 1 in Case 1, we predict the temperature at a point. Based on what we learned in Case 1, we focus on tree models here.

For the gradient boosting algorithm, we used part of the data for training and parameter tuning. Specifically, we randomly chose 200 CSV files for training and testing, because training one CatBoost model takes a large amount of time in this setting. We tried a few hyperparameter settings and selected the one with the highest validation R², which reached 0.9949. Gradient boosting performs best among the three approaches.
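Subsampling the simulation files before training can be sketched as below; the directory layout, file names, and split fraction are illustrative assumptions, not the project's actual pipeline.

```python
import random
import tempfile
from pathlib import Path

def sample_files(data_dir, n_files, train_frac=0.8, seed=0):
    """Randomly pick n_files CSVs from data_dir, then split them whole-file
    into training and validation lists (no file is shared between the two)."""
    files = sorted(Path(data_dir).glob("*.csv"))
    rng = random.Random(seed)
    chosen = rng.sample(files, min(n_files, len(files)))
    n_train = int(train_frac * len(chosen))
    return chosen[:n_train], chosen[n_train:]

# Demo on a throwaway directory containing 10 fake simulation files.
demo_dir = tempfile.mkdtemp()
for i in range(10):
    Path(demo_dir, f"sim_{i}.csv").write_text("x,y,z,temp\n")

train_files, val_files = sample_files(demo_dir, n_files=5)
print(len(train_files), len(val_files))  # 4 1
```

Splitting at the file level, rather than the row level, keeps the split consistent with the leakage concern raised earlier.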

Predicted temperature versus real temperature for model 1 task 2

Since slightly poorer predictions were observed at higher temperatures, we used a bootstrapping method to obtain more samples at high temperature values. The predictions at high real temperatures improved, but at the cost of a slightly lower overall R² (0.9992). A tradeoff therefore exists between high overall accuracy and better prediction of the high-temperature observations.
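The bootstrapping idea amounts to resampling the high-temperature rows with replacement so the model sees them more often. A minimal sketch; the threshold and resampling factor are illustrative choices, not the authors' values:

```python
import numpy as np

def oversample_high_temp(X, y, threshold, factor, seed=0):
    """Append `factor * n_hot` bootstrap copies of the rows whose target
    exceeds `threshold`, drawn with replacement."""
    rng = np.random.default_rng(seed)
    hot = np.flatnonzero(y > threshold)
    extra = rng.choice(hot, size=factor * len(hot), replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]

# 10 rows, 2 of them "hot"; tripling the hot rows shifts the class balance.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([300.0] * 8 + [2000.0, 2500.0])
X_bs, y_bs = oversample_high_temp(X, y, threshold=1000.0, factor=3)
print((y_bs > 1000.0).mean())  # hot fraction rises from 0.2 to 0.5
```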

Predicted temperature versus real temperature with bootstrapped training data

Model2:

Similar to model2 in Case 1, we will predict the size of the melting pool. Based on knowledge from Case 1, still, we would focus on Tree models. For the gradient boosting algorithm, we used all data for training and parameter tuning process and r2 reached .9928, .9845, .9986 for melt length, width and depth, respectively in the validation dataset. The gradient boosting algorithm continues to perform well.

Melt length, width and depth prediction versus real values for model 2 task 2

Enhancement of Model 2

By applying PDE finite element analysis and thermal theories to feature engineering.

Basic formulas for PDE and numerical PDE:
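The equation image here did not survive extraction. Consistent with the F(t, x, y, z) notation below, it is presumably the standard heat equation with a source term (a reconstruction, with α the thermal diffusivity):

```latex
\frac{\partial u}{\partial t}
  = \alpha \left(
      \frac{\partial^{2} u}{\partial x^{2}}
    + \frac{\partial^{2} u}{\partial y^{2}}
    + \frac{\partial^{2} u}{\partial z^{2}}
    \right)
  + F(t, x, y, z)
```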

where F(t, x, y, z) denotes the heat source in this dynamic system (in our model, a moving point).

Using finite element analysis and finite difference methods, we arrive at the following equation for the simplest one-dimensional case without a heat source:
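The finite-difference equation image is also missing; for the 1-D source-free case described, the standard explicit (forward-time, centered-space) scheme in the u(j, n) notation used below would read (a reconstruction):

```latex
u(j,\,n+1) = u(j,\,n)
  + \frac{\alpha\,\Delta t}{\Delta x^{2}}
    \bigl[\, u(j+1,\,n) - 2\,u(j,\,n) + u(j-1,\,n) \,\bigr]
```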

where u(j, n) denotes the temperature of point j at time step n.

The insights we gained from these equations are that (i) a point's temperature depends on the heat it gains from the source, and (ii) it depends on the temperatures of its neighbors.

Insights for Case 2 with edges (what affects the temperature):

At the borders, heat transfer does not follow the general case, since air is a poor conductor of heat.
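The relation shown in the original at this point is missing; one reading consistent with the surrounding argument (an assumption, not recovered from the original) is that air's thermal conductivity is far smaller than Ti64's, so the edge behaves as a nearly insulated boundary:

```latex
k_{\mathrm{air}} \ll k_{\mathrm{Ti64}}
\quad\Longrightarrow\quad
\left.\frac{\partial u}{\partial n}\right|_{\mathrm{edge}} \approx 0
```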

Intuitively, when the heat reaches the border, it cannot spread into the air as easily as it spreads through the printing material, which makes the border temperature rise.

The direction of the laser also matters in this case: when the laser passes back over a region, the material is heated a second time, and the heat already stored inside raises the temperature further.

Feature Engineering

Based on the insights from the PDE and thermal theories, we expect the following features to improve the models' predictions:

  1. The inverse of the distance to the laser point
  2. The inverse of the distance to the edge
  3. A measure of heat that cannot easily spread out (accumulated volume), roughly: accumulated volume = edge_dis³/sin(alpha)
  4. The area of the border (accumulated surface), roughly: accumulated surface = edge_dis²/sin(alpha)
  5. The inverse of sin and the distance
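The features above can be computed in a few lines. Variable names here are illustrative, and item 5 is read as the inverse of sin(alpha) (the distance inverses already appear as items 1 and 2):

```python
import numpy as np

def engineer_features(edge_dis, laser_dis, alpha_deg):
    """Engineered features from the list above: inverse distances, and the
    volume/surface/inverse-sin terms built from edge distance and laser
    angle (alpha in degrees)."""
    sin_a = np.sin(np.deg2rad(alpha_deg))
    return {
        "inv_laser_dis": 1.0 / laser_dis,
        "inv_edge_dis":  1.0 / edge_dis,
        "volume":        edge_dis ** 3 / sin_a,  # heat that cannot spread out
        "surface":       edge_dis ** 2 / sin_a,  # border area
        "inv_sin":       1.0 / sin_a,
    }

feats = engineer_features(edge_dis=np.array([0.1, 1.0]),
                          laser_dis=np.array([0.5, 2.0]),
                          alpha_deg=np.array([30.0, 90.0]))
print(feats["volume"])   # [0.002 1.   ]
```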

Here we used a subset of the training data and a simple random forest to test the usefulness of the generated features.

We found that Volume, Surface, Distance, and Inverse sin (even roughly estimated) are four important features for this model, bringing a 6.9% increase in the R² score, a substantial improvement given that the original model already had an R² of 0.868.

We also believe that generating more informative features based on physical and mathematical understanding would be quite helpful. The same should hold for model 2 in Case 2.

Enhanced Model

When predicting the test data in Case 2 model 2, it turned out that data leakage had made our validation score much higher than it should be. To avoid this, we regenerated our validation data with a complete split between validation and testing. The prediction on this validation data is unsatisfactory, with R² scores of (0.900, 0.878, 0.958).

Melt length, width and depth predictions versus real values for model 2 task 2, after accounting for data leakage and before feature engineering

By introducing the generated features into our dataset, we greatly improve the melt pool size prediction, with R² scores of (0.968, 0.932, 0.968).

Melt length, width and depth predictions versus real values for model 2 task 2, after accounting for data leakage and after feature engineering

Future Work

  • Develop better approaches to tune hyperparameters
  • More feature engineering based on solid physics knowledge
  • Try more metrics to evaluate the model

Acknowledgment

Thanks to Dassault Systèmes for providing the data sources and funding for the 2020 Brown Datathon.
