Can Reinforcement Learning Generalize Beyond Its Training? Part 2
Training With a Digital Twin
Part 1 described a reinforcement learning system that finds the optimal control settings for a reflow oven, which solders electronic components to a circuit board (Figures 1 and 2). Part 2 presents the details of the oven simulator used to accelerate the training process.
The oven’s moving belt transports the product (i.e., the circuit board) through multiple heating zones. This process heats the product according to a temperature-time target profile required to produce reliable solder connections (Figure 3).
A physical oven needs up to 40 minutes to stabilize its temperature after the heater settings change, plus another 5 minutes to pass the product through the oven, so an oven simulator is used to speed up training. The simulator emulates a single pass of the product through the oven in a few seconds.
The oven simulator has eight heating zones, each with a control for setting the temperature of the zone’s heater (Figure 4). After each pass, the simulator provides the temperature readings of the product recorded as it traveled through the oven.
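As a rough illustration of the bookkeeping behind a single pass, the sketch below maps elapsed time to the heating zone the product is in and lists the heater setpoint it encounters at each time step. The zone length, belt speed, and function names are hypothetical values chosen so that one pass takes 5 minutes; they are not taken from the author's simulator.

```python
# Illustrative assumptions only: these dimensions are not from the article.
NUM_ZONES = 8             # the article's simulator has eight heating zones
ZONE_LENGTH_MM = 300      # hypothetical length of each zone
BELT_SPEED_MM_PER_S = 8   # hypothetical belt speed (2400 mm in 300 s)


def zone_at(time_s):
    """Return the 0-based zone index at time_s, or None when outside the oven."""
    position_mm = BELT_SPEED_MM_PER_S * time_s
    index = int(position_mm // ZONE_LENGTH_MM)
    return index if 0 <= index < NUM_ZONES else None


def heater_temps_seen(setpoints_c, dt_s=1.0):
    """List (time, heater setpoint) pairs encountered during one pass."""
    if len(setpoints_c) != NUM_ZONES:
        raise ValueError("expected one setpoint per zone")
    samples = []
    step = 0
    while True:
        time_s = step * dt_s  # multiply rather than accumulate to avoid drift
        zone = zone_at(time_s)
        if zone is None:      # the product has left the last zone
            break
        samples.append((time_s, setpoints_c[zone]))
        step += 1
    return samples
```

A full simulator would feed each sampled heater temperature into the heat flow model to obtain the product temperature readings; this sketch covers only the zone lookup.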
Simulation model
The heating process is modeled using the finite-difference method. With this method, the product and the oven’s heaters are each divided into many discrete elements, as illustrated in Figure 5.
The conductive and convective heat flow between the elements is illustrated in Figure 6.
Simulation heat flow equations
The following equations define the conductive and convective heat flow between the elements: [Crank, 1975, p. 141][1] and [Lienhard, 2020, pp. 13, 22][2]
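To make the idea concrete, here is a minimal sketch of one explicit finite-difference time step for a one-dimensional column of elements. It uses the standard textbook scheme, not necessarily the author's exact equations: the first element exchanges heat with the heated air by convection, interior elements exchange heat with their neighbors by conduction, and the last element is assumed to be insulated. The function name and the parameters `r` (the factor alpha * dt / dx**2) and `biot` (h * dx / k) are assumptions made for this illustration.

```python
def step_temperatures(temps, air_temp, r, biot):
    """Advance a 1D column of element temperatures by one time step.

    temps[0]  : surface element exposed to heated air (convection)
    temps[-1] : back element, assumed insulated in this sketch
    r         : alpha * dt / dx**2 (dimensionless)
    biot      : h * dx / k (dimensionless)
    """
    n = len(temps)
    if n < 2:
        raise ValueError("need at least two elements")
    new = list(temps)
    # Surface element: conduction from its neighbor plus convection from the air.
    new[0] = temps[0] + 2.0 * r * (
        (temps[1] - temps[0]) + biot * (air_temp - temps[0])
    )
    # Interior elements: conduction to and from both neighbors.
    for i in range(1, n - 1):
        new[i] = temps[i] + r * (temps[i - 1] - 2.0 * temps[i] + temps[i + 1])
    # Insulated back element: conduction from its single neighbor only.
    new[-1] = temps[-1] + 2.0 * r * (temps[-2] - temps[-1])
    return new


# Usage: a five-element column starting at 25 °C in 200 °C air.
column = [25.0] * 5
for _ in range(2000):
    column = step_temperatures(column, air_temp=200.0, r=0.25, biot=0.1)
```

Repeating the step drives every element toward the air temperature, with the surface element heating first and the insulated element last.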
Model stability criteria
The simulation model is stable when the stability factor r, defined by Equation 5, satisfies r ≤ 0.5. [Crank, 1975, pp. 138, 145][1] and [Lienhard, 2020, p. 18][2]
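The check can be sketched in a few lines. This assumes the stability factor takes the usual form for a one-dimensional explicit scheme, r = alpha * dt / dx**2, where alpha is the thermal diffusivity, dt the time step, and dx the element size. The helper names and the sample values are hypothetical.

```python
def stability_factor(alpha, dt, dx):
    """Return r = alpha * dt / dx**2 (assumed form of the stability factor)."""
    return alpha * dt / dx ** 2


def is_stable(alpha, dt, dx, r_max=0.5):
    """True when the explicit scheme meets the r <= r_max criterion."""
    return stability_factor(alpha, dt, dx) <= r_max


def max_stable_dt(alpha, dx, r_max=0.5):
    """Largest time step that keeps r at or below r_max for a given dx."""
    return r_max * dx ** 2 / alpha


# Illustrative values only: alpha in m^2/s, dx in m, dt in s.
alpha, dx = 1e-6, 1e-3
print(is_stable(alpha, dt=0.25, dx=dx))  # r is about 0.25
print(is_stable(alpha, dt=1.0, dx=dx))   # r is about 1.0
```

Because r grows with dt and shrinks with the square of dx, halving the element size requires cutting the time step to a quarter to keep the same r.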
Algorithm
A PDF of the original paper for this article is available here.
All images, unless otherwise noted, are by the author.