I upgraded my Ubiquiti EdgeRouter X yesterday. This turned out to be… somewhat more involved than I had imagined.
It began normally: I downloaded the new firmware (v188.8.131.5227989) from Ubiquiti’s site and uploaded it to the ER-X. It chugs away, but then “Upload failed”. Hm, OK. I try again. “Upload failed”. Weird. There’s no indication that an upgrade has actually begun. I figure, let’s reboot and try again. Five minutes after the reboot, I realise it’s not coming back.
I’ve been doing all of this over WiFi. I have two APs, and the link between them goes via the ER-X. WiFi quickly begins to go downhill. OK, I’ll get a network cable and connect directly to the ER-X. I go to get the USB-C to Ethernet adapter out of my bag. Not there. Hm. In the cupboard? Nop. Anywhere else I can think of? Nop. (It’s doubtless in Munich, where I work in the week.) I could run out and get one, but it’s a public holiday and all of the shops are closed.
Does the cheapo Chromebook I recently bought have an Ethernet socket? Nop. My wife’s laptop? Nop. I reluctantly grab the 8-year-old Windows 7 laptop from the cupboard, last used about 5 years ago. It’s old, slow, dirty, clunky and holds all of about 10 minutes of charge, but it does have an Ethernet socket. After it spends half an hour running CHKDSK (…), I connect it to the ER-X. A link light. OK, that’s a good start. No DHCP. Assign a static IP, ping it. Nothing. I do a power-on reset (hold the reset button, power on, keep held until eth4 stops blinking). Nothing, not on its original IP and not on its factory default IP. I notice that about once a minute, the link light goes off and the lights for it and the other three ports (which have nothing connected to them) blink very briefly. Oh dear — that looks like a reboot loop.
Fortunately, my neighbours’ WiFi, to which I (legitimately) have access, is just about in range. I use it to do some searching: Ubiquiti appears to provide no official method to deal with a bricking this hard. Various folks, though, mention that the ER-X has an undocumented serial port, although they caution that it uses 3.3 V logic and suggest using an FTDI cable. I have a couple of those, but they too are in Munich (and I can’t buy another because public holiday). I ponder what I do have. I realise that I have a bunch of doorbell wire and that a Raspberry Pi also uses 3.3 V logic and has a serial port.
Some time later:
That’s the Pi in the foreground, connected to the grotty Win7 laptop by ethernet, with RX, TX and GND going to the ER-X (with its cover off) in the background/upper centre of the picture.
I didn’t have ribbon cable and didn’t really want to do any soldering, so the connections are made by making tiny loops at the end of the wire.
I use raspi-config to disable the Linux serial console but leave the serial port enabled. The RasPi doesn’t have minicom, but it does have screen, so I run screen /dev/ttyS0 57600,cs8,-parenb,-cstopb,-hupcl — 57.6k, 8N1. Turn on the ER-X and presto: serial output. It’s booting and running fine. And now it’s not in a 60s reboot loop. I get a login prompt. Assuming it’s entirely reset, I try the default user/password ubnt/ubnt. Login failed. I try my actual username/password. Login failed. I try various other combinations. Login failed. Hm.
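For anyone unfamiliar with that option string, it can be read flag by flag. Here is a small Python sketch, assuming the flags carry their usual stty meanings (cs8 is eight data bits, -parenb is parity off, -cstopb is one stop bit). The function name describe_serial is made up for illustration.

```python
def describe_serial(settings):
    """Turn a stty-style settings string into the usual shorthand, e.g. 8N1."""
    baud, *flags = settings.split(",")
    flags = set(flags)

    # csN selects N data bits; fall back to 8 if no csN flag is given.
    data_bits = next(
        (int(f[2]) for f in ("cs5", "cs6", "cs7", "cs8") if f in flags), 8
    )

    # parenb enables parity; parodd then picks odd over even.
    if "parenb" not in flags:
        parity = "N"
    elif "parodd" in flags:
        parity = "O"
    else:
        parity = "E"

    # cstopb means two stop bits; its absence (or -cstopb) means one.
    stop_bits = 2 if "cstopb" in flags else 1

    return "%s %d%s%d" % (baud, data_bits, parity, stop_bits)


summary = describe_serial("57600,cs8,-parenb,-cstopb,-hupcl")
print(summary)  # 57600 8N1
```

The -hupcl flag plays no part in the framing. In stty terms it only means the modem control lines are not dropped when the port is closed.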
Part of the boot messages presented an option to do other things than just boot the default image. One of them was getting stuff via TFTP, but Ubiquiti don’t supply TFTP images for the ER-X. Another was a boot menu. I reboot again and select option 4, for the boot menu.
4: System Enter Boot Command Line Interface.
U-Boot 1.1.3 (Nov 2 2015 - 16:39:31)
MT7621 # ?
? - alias for 'help'
bootm - boot application image from memory
cp - memory copy
erase - erase SPI FLASH memory
go - start application at address 'addr'
help - print online help
i2ccmd - read/write data to eeprom via I2C Interface
loadb - load binary file over serial line (kermit mode)
md - memory display
mdio - Ralink PHY register R/W command !!
mm - memory modify (auto-incrementing)
nand - nand command
nm - memory modify (constant address)
printenv- print environment variables
reset - Perform RESET of the CPU
saveenv - save environment variables to persistent storage
setenv - set environment variables
spi - spi command
tftpboot- boot image via network using TFTP protocol
ubntw - ubntw command
version - print monitor version
OK. printenv shows me that there’s a variable named bootargs that’s set to console=ttyS1,57600n8 ubi.mtd=7 root=ubi0_0 rootfstype=ubifs rootsqimg=squashfs.img rootsqwdir=w rw. Those look like Linux kernel parameters. I run setenv bootargs console=ttyS1,57600n8 ubi.mtd=7 root=ubi0_0 rootfstype=ubifs rootsqimg=squashfs.img rootsqwdir=w rw single init=/bin/bash. This works and printenv shows the change.
Part of the previous boot messages also contained Booting image at bfd40000. The bootm command appears to want an address, so I try bootm 0xbfd40000, and it boots and drops me in a bash shell. /etc/shadow doesn’t mention a ubnt user and root doesn’t have a password. I run setpasswd and /etc/shadow reflects the change. I reboot, let it boot normally, and now I can log in as root.
Poking around a bit, everything looks basically fine. There are a few errors on boot, but no smoking gun. Still no 60s reboot loop. I decide to try pinging it again, but I only have one ethernet cable handy, which means disconnecting the RasPi. I detach from screen but leave it running, then connect the ethernet cable to the ER-X. Still no ping response, and it’s in a reboot loop again. I connect back to the RasPi and now we have this:
CPU 0 Unable to handle kernel paging request at virtual address 000001a6, epc == 802da348, ra == 8029860c
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.14-UBNT #1
task: 80467610 ti: 80456000 task.ti: 80456000
$ 0 : 00000000 00000000 00000001 8fc1e042
$ 4 : 8f4f9080 00000000 00000000 00000800
$ 8 : ffffffff 00000001 0000ffff 00000007
$12 : 00000007 0000000e 803d3228 80457d38
$16 : 8f4f9080 804e31b0 8fdd4000 804e0000
$20 : 0000004e 804e0000 00000010 804e0000
$24 : 00000000 80005e7c
$28 : 80456000 80457d18 804e0000 8029860c
Hi : 00000000
Lo : 00000000
epc : 802da348 eth_type_trans+0xe0/0x13c
ra : 8029860c ei_receive+0xe0/0x504
Status: 1100fc03 KERNEL EXL IE
Cause : 00800008
BadVA : 000001a6
PrId : 0001992f (MIPS 1004Kc)
Modules linked in:
Process swapper/0 (pid: 0, threadinfo=80456000, task=80467610, tls=00000000)
Stack : 8fdd4000 80298bf4 00000000 00000001 00000002 00000000 00000006 804589c0
8fdd447c 8fdd4480 00000000 00000002 00010000 80464e40 8046535c 804b3434
804c8980 8003062c 00000100 80458680 804589c0 00000001 00000000 80458680
00000100 80458680 0000000a 804c8980 8046535c 804b35b8 00000000 80030e64
00000000 80083394 00000001 00000024 804589c0 8007eb90 ffff8c92 00200000
Code: 7ca25204 080b68ba ac820068 <94aa01a6> 94670002 94a901a4 94680000 94a201a8 94660004
---[ end trace e08821f15f2ecfff ]---
Kernel panic - not syncing: Fatal exception in interrupt
Rebooting in 60 seconds..hrtimer: interrupt took 0 ns
hrtimer: interrupt took 0 ns
Hrmf. At least I now understand why I only sometimes have the 60s reboot loop: it’s panicking in the ethernet interrupt handler. No ethernet cable, no interrupt, no panic.
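For the curious, the faulting instruction (the word in angle brackets on the Code: line) can be decoded by hand. Here is a minimal Python sketch, assuming the standard MIPS32 I-type layout of a 6-bit opcode, two 5-bit register fields and a 16-bit immediate. The function name and the tiny opcode table are mine, and this is nowhere near a full disassembler.

```python
# Names for the few MIPS32 load opcodes needed here (illustrative subset only).
LOAD_OPCODES = {0x20: "lb", 0x21: "lh", 0x23: "lw", 0x24: "lbu", 0x25: "lhu"}


def decode_itype(word):
    """Split a 32-bit I-type word into (mnemonic, base_reg, dest_reg, offset)."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F   # base register
    rt = (word >> 16) & 0x1F   # destination register
    imm = word & 0xFFFF
    if imm & 0x8000:           # the offset is a signed 16-bit value
        imm -= 0x10000
    return LOAD_OPCODES.get(opcode, "op%#x" % opcode), rs, rt, imm


mnemonic, base, dest, offset = decode_itype(0x94AA01A6)

# Register $5 reads 00000000 in the dump (second value on the "$ 4" row).
base_value = 0x00000000
fault_addr = (base_value + offset) & 0xFFFFFFFF

print("%s $%d, %#x($%d) -> address %#010x" % (mnemonic, dest, offset, base, fault_addr))
```

The result is a 16-bit load from offset 0x1a6 off a base register that holds zero, which matches the BadVA of 000001a6. Under the usual MIPS calling convention $5 carries a function's second argument, so this looks like a null pointer being passed down into eth_type_trans, though that last step is my reading rather than something the dump states outright.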
I decide to try running the CLI-based upgrade procedure documented by Ubiquiti. show version works, df -h works. show system image storage doesn’t, showing Platform script /usr/bin/ubnt-upgrade. is missing. Taking a look at /usr/bin/ubnt-upgrade, it’s a shell script that’s calling /usr/sbin/ubnt-hal-e getBoardIdE and using the output of that to decide whether to run ubnt-upgrade.e50 or ubnt-upgrade.e51. ubnt-hal-e getBoardIdE is producing no output.
After some more poking around (and noticing the apparent lack of ethernet interfaces, if I remember right, as well as the lack of /lib/modules/3.10.14), I start to suspect that the userspace (squashfs) has been upgraded but the kernel is still the old one. From the boot messages, it’s running kernel 3.10.14-UBNT, but it looks like it should be running 3.10.107-UBNT. OK, how to get a new kernel on there? Back to the boot menu.
I can boot an image that’s in memory — I did that before to enable login. I need to get a new kernel into memory. One of the other options in that boot menu: loadb — load binary file over serial line (kermit mode). I have a serial line. I didn’t expect to ever be hearing about Kermit again, but I know what it is. The RasPi still doesn’t have internet access, but I go to the RasPi repo from the laptop, download the debs for ckermit and minicom and SCP them to the RasPi. Fortunately, they don’t need any dependencies and dpkg -i installs them without complaint. I also unpack the Ubiquiti firmware upgrade, extract vmlinux.tmp and SCP it to the RasPi. Connect to the ER-X with minicom instead of screen, get into the boot menu, run loadb. Ctrl-A, S in Minicom, select vmlinux.tmp, a bunch of messages flash and nothing is apparently sent. The messages disappear too quickly for me to see. Hrmf.
In another shell on the RasPi, I attach to the process with strace and repeat the procedure. It turns out that Kermit is complaining about the lack of carrier, because it’s stuck in the 80s when that was a thing. After some vigorous swearing and some more searching, I add this to ~/.kermrc:
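# Serial device and speed, matching the ER-X console
set line /dev/ttyS0
set speed 57600
# Don't wait for a carrier-detect signal; a three-wire link has none
set carrier-watch off
# No line-turnaround handshake and no flow control
set handshake none
set flow-control none
# Binary transfer, file names sent unmodified
set file type bin
set file name lit
# Packet length (bytes) and sliding-window size (packets)
set rec pack 1000
set send pack 1000
set window 5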
…and try again. (I assume that the carrier-watch line is the critical one, but the others seem sensible too.) It works, but it’s transferring a 1.7MiB kernel at 57.6k, so it takes about 5 minutes. Once it’s done, the U-Boot output shows the address it’s stored at (0x80100000 or so). bootm 0x80100000 and… it boots! I perform a small dance (not pictured).
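The five-minute figure is about what the line rate predicts. Here is a back-of-the-envelope check in Python, assuming 8N1 framing (one start bit, eight data bits, one stop bit, so ten bits on the wire per byte) and ignoring Kermit's own packet overhead. The function name is just for illustration.

```python
def transfer_seconds(size_bytes, baud, bits_per_byte=10):
    """Best-case time to push size_bytes down a serial line at the given baud rate."""
    return size_bytes * bits_per_byte / baud


kernel_bytes = int(1.7 * 1024 * 1024)      # roughly 1.7 MiB
seconds = transfer_seconds(kernel_bytes, 57600)

print("%.0f s, about %.1f minutes" % (seconds, seconds / 60))
```

That works out to a little over five minutes before any protocol overhead, so the observed time is roughly as fast as this link can go.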
On login, ubnt-hal-e getBoardIdE now works. There were fewer errors among the boot messages too. It looks like it’s working, but just loading a kernel into memory isn’t going to help me long-term. Running the upgrade procedure requires me to have the firmware image on the device. It’d take a small age, but I’m prepared to transfer this via the serial link. Only, the ubnt image doesn’t have rx, ry, rz or even kermit. There appears to be no way to get an image to the running Linux install. But wait! It’s working now, right? I should be able to talk to it via Ethernet! I swap the cable over, configure the laptop’s IP to be 192.168.1.2 and lo, a ping response from 192.168.1.1!
I SCP the firmware image to the ER-X. Via ssh, it now accepts the ubnt/ubnt login and the image ends up in /home/ubnt. I run add system image /home/ubnt/ER-e50.v184.108.40.20627989.tar. It complains about a lack of space. I delete the previous system image. Still a lack of space. Looking at the output of df, the lack of space is because… there’s a firmware image in /home/ubnt. A repeat round of vigorous swearing ensues, but from the OpenWrt page, I realise that I can just put the image in RAM. I run mount -o remount,size=240M tmpfs /tmp, mv the image to /tmp, and the disk space error is gone. The upgrade completes successfully (although it complains about the name being duplicated with an existing image).
I reboot the router, which comes up perfectly and responds. I use the web interface to restore my backup copy of its config, mount it back on the wall in the cellar with everything connected and go to bed feeling like the kind of hero about whom songs are written.