LIBXSMM Brings Deep-learning “Lessons Learned” to Many HPC Applications

Extreme performance HPC examples

Figure 1: LOH.1 benchmark example mesh and material regions (Image courtesy ISC’16)

Optimized matrix operations tie deep-learning to HPC via LIBXSMM

Lowering precision does translate to faster time to solution.

Figure 2: Exemplary illustration of EDGE’s fourth-order solution for the ninth receiver and quantity $u$ of the LOH.1 wave propagation benchmark. Plots a) and b) show a comparison to the reference, using double-precision arithmetic. Out of the eight fused, identical solutions of the setting, only the first one is shown. Plots c) and d) show a comparison of the almost identical single- and double-precision results, obtained when using a single forward simulation. Due to the low misfits, shown in d), the FP32 and FP64 solutions are visually indistinguishable in the raw receiver plot c). (Image courtesy Intel)

Extreme sparse matrix performance

Figure 3: Incorporating multiple seismic sources into the solver using fused simulations (Image courtesy UCSD)
Figure 4: Dark gray: non-fused simulation; light gray: fused simulation (Image courtesy Intel)



