Can Reinforcement Learning Generalize Beyond Its Training? Part 2

Training With a Digital Twin

3 min read · Feb 14, 2023

Photo by Marc Sendra Martorell on Unsplash

Part 1 described a reinforcement learning system that finds the optimal control settings for a reflow oven, which solders electronic components to a circuit board (Figures 1 and 2). Part 2 presents the details of the oven simulator used to accelerate the training process.

Figure 1: **Reflow oven** (image via Adobe under license to John Morrow)

Figure 2: **Product on belt** (image via Adobe under license to John Morrow)

The oven’s moving belt transports the product (i.e., the circuit board) through multiple heating zones. This process heats the product according to a temperature-time target profile required to produce reliable solder connections (Figure 3).

Figure 3: **Temperature-time profile (blue: target profile, red: actual product profile)**

A physical oven needs considerable time to stabilize its temperature after the heater settings are changed (up to 40 minutes), plus another 5 minutes to pass the product through the oven. To speed up the process, an oven simulator is used instead. The simulator emulates a single pass of the product through the oven in a few seconds, compared to the minutes required by a physical oven.

The oven simulator has eight heating zones, each with a control for setting the temperature of the zone’s heater (Figure 4). After each pass, the simulator provides the temperature readings of the product recorded as it traveled through the oven.

Simulation model

The heating process is modeled using the finite-difference method. With this method, the product and the oven’s heaters are each divided into a series of discrete elements, as illustrated in Figure 5.

Figure 5: **Finite-difference discrete elements**

The conductive and convective heat flow between the elements is illustrated in Figure 6.

Figure 6: **Heat flow between elements**

Simulation heat flow equations

The following equations define the conductive and convective heat flow between the elements: [Crank, 1975, p. 141][1] and [Lienhard, 2020, pp. 13, 22][2]

¹ This applies to top heaters only. With both top and bottom heaters active, the convective heat becomes 2 · ∆Qcv.
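As a rough illustration of the kind of relations involved, the sketch below uses the standard textbook forms of Fourier's law for conduction and Newton's law of cooling for convection. These forms, and all function and parameter names, are assumptions made for illustration; they may differ in detail from the equations in the original paper.

```python
def conduction_heat(k, area, dx, t_neighbor, t_element, dt):
    """Heat conducted into an element from one neighbor during dt (joules).

    Assumed form (Fourier's law): dQ = k * A * (T_neighbor - T_element) / dx * dt
    k: thermal conductivity, area: contact area, dx: element spacing.
    """
    return k * area * (t_neighbor - t_element) / dx * dt


def convection_heat(h, surface_area, t_heater, t_element, dt, both_sides=False):
    """Heat convected into an element from the heater during dt (joules).

    Assumed form (Newton's law of cooling): dQ = h * A * (T_heater - T_element) * dt
    With top and bottom heaters active, the convective heat is doubled.
    """
    dq = h * surface_area * (t_heater - t_element) * dt
    return 2.0 * dq if both_sides else dq


def temperature_rise(dq_total, rho, c, volume):
    """Temperature change of an element that absorbs dq_total joules.

    rho: density, c: specific heat capacity, volume: element volume.
    """
    return dq_total / (rho * c * volume)
```

In a time step, the heat terms for an element would be summed (conduction from the left and right neighbors plus convection from the heater above it) and passed to `temperature_rise` to update that element's temperature.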

Model stability criteria

The simulation model is stable when the stability factor r satisfies r ≤ 0.5. Equation 5 defines the stability factor. [Crank, 1975, pp. 138, 145][1] and [Lienhard, 2020, p. 18][2]
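For an explicit one-dimensional finite-difference scheme, the stability factor is conventionally written r = α · Δt / Δx², where α = k / (ρ · c) is the thermal diffusivity. Assuming that conventional form (the symbols and function names below are illustrative, not taken from the article), a quick check of a candidate time step might look like this:

```python
def stability_factor(k, rho, c, dx, dt):
    """Return r = alpha * dt / dx**2, with alpha = k / (rho * c).

    Assumes the conventional explicit 1-D form of the stability factor.
    """
    alpha = k / (rho * c)  # thermal diffusivity
    return alpha * dt / dx**2


def max_stable_dt(k, rho, c, dx, r_max=0.5):
    """Largest time step that keeps r at or below r_max for element size dx."""
    alpha = k / (rho * c)
    return r_max * dx**2 / alpha


# Illustrative values only; not material properties from the article.
r = stability_factor(k=1.0, rho=1000.0, c=1000.0, dx=0.01, dt=10.0)
is_stable = r <= 0.5
```

Because r grows with Δt and shrinks with Δx², refining the element size by a factor of two requires a time step four times smaller to stay within the same limit.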

Algorithm

² The product array includes a dummy element on the left and the right to account for conduction end effects.

³ The ovenx array facilitates the product starting and ending outside the oven as the product array moves across the ovenx array through a sequence of time steps.
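The two footnotes describe a product array with dummy end elements sliding across a longer ovenx array. A minimal sketch of that idea is shown below. It is not the author's algorithm: the function name, the default parameters, the lumped convection coefficient `h_coeff`, the ambient padding length, and the choice to make each dummy element mirror its neighbor (an insulated-end assumption) are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_pass(zone_temps, n_product=10, cells_per_zone=20, r=0.25,
                  h_coeff=0.02, ambient=25.0, steps_per_shift=5):
    """Simulate one pass; return the mean product temperature per time step.

    r: stability factor for conduction between product elements (<= 0.5).
    h_coeff: hypothetical dimensionless convection coefficient per time step.
    """
    n_pad = n_product  # ambient cells so the product starts and ends outside
    # ovenx: ambient padding, heater temperature per oven cell, ambient padding
    ovenx = np.concatenate([
        np.full(n_pad, ambient),
        np.repeat(np.asarray(zone_temps, dtype=float), cells_per_zone),
        np.full(n_pad, ambient),
    ])
    # product: real elements plus one dummy element at each end
    product = np.full(n_product + 2, ambient)
    readings = []
    for offset in range(len(ovenx) - n_product + 1):
        # oven cells currently above the real product elements
        heater = ovenx[offset:offset + n_product]
        for _ in range(steps_per_shift):
            # dummy elements copy their neighbors (assumed insulated ends)
            product[0] = product[1]
            product[-1] = product[-2]
            interior = product[1:-1]
            conduction = r * (product[:-2] - 2.0 * interior + product[2:])
            convection = h_coeff * (heater - interior)
            product[1:-1] = interior + conduction + convection
            readings.append(product[1:-1].mean())
    return np.array(readings)
```

Calling `simulate_pass` with a list of eight zone temperatures returns a temperature-time trace that could be compared against a target profile. With the default coefficients, 1 − 2r − h_coeff stays positive, so each update is a weighted average of existing temperatures and the trace cannot overshoot the hottest heater.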

A PDF of the original paper for this article is available here.

All images, unless otherwise noted, are by the author.

References

[1] J. Crank, The Mathematics of Diffusion, 2nd ed. Oxford, England: Oxford University Press, 1975. Also available as PDF.

[2] J. H. Lienhard, IV and J. H. Lienhard, V, A Heat Transfer Textbook, 5th ed. Cambridge, MA: Phlogiston Press, 2020, version 5.10. [Online]. Available as PDF.

Written by John Morrow