How I Unbricked an NVIDIA BlueField-2 DPU

Levente Csikor
Published in CodeX
12 min read · Dec 13, 2023

This guide explains how to recover a bricked NVIDIA BlueField-2 DPU, in particular one that becomes unresponsive partway through bfb-install. Have you ever run into a situation like this?

bfb-install crashes before it reaches the “Installing OS” step.

Did you suspect that Secure Boot was the cause, given where the installation appears to hang? Did you try to disable Secure Boot, only to find that the BIOS/UEFI is password-protected and you do not know the password? If so, join me on this journey of reviving a BlueField-2 under exactly those circumstances.

The full experience with a brick

Have you ever inserted a BlueField-2 DPU into a server and found that the boot process became unusually slow? Perhaps you saw error codes on the BIOS POST screen, such as EC 92 (commonly indicating a PCI error), and began to suspect a hardware fault in the BlueField.

Unwilling to dismiss the issue, given its potential financial impact, you kept the server powered on, hoping it would get past the POST screen. Miraculously, it did, but only after an agonizing 5–10 minutes. Linux then began to boot and ran into delays of its own, eventually revealing a significant problem with…

Levente Csikor
Writer for CodeX

Researcher with a PhD degree in Computer Science. He writes about tools and experiences to boost your research, and occasionally orthogonal stuff. (cslev.vip)