How I fixed a server with a dead NFS mount

It was a regular Monday afternoon. I was trying not to fall asleep after lunch and to hide from boring daily work by doing not-so-boring maintenance tasks. In fact, I had the last batch of servers that needed to be added to Zabbix monitoring. These servers were running ancient RHEL 5 and were a little bit tumbledown. I hit all of my favorite bugs, like “you need to move /dev/mem to /dev/mem.dmi if you don’t want to see a segmentation fault on the subscription-manager attach command” or “you forgot about this machine last year and you don’t have free space in /var”. Nothing unusual. But then…

Installing any rpm package, whether with yum or with the rpm command, just hung. Well, I know this machine is old, so I can wait for a while. Nope: I waited for almost an hour and nothing happened. Interrupting the command (Ctrl-C) did nothing. A quick check with ps -ef showed a lot of hung cron jobs, and the culprit was the daily run of updatedb. I know this smell: yes, there was an NFS mount on the system, and the remote NFS server had gone away. Yes, I know, the cure is to reboot the server, but I don’t like straight paths, so I decided to fix the issue without a reboot.
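If you want to confirm the same diagnosis without poking the suspect mountpoint (any stat() on it may hang your shell too), two read-only checks can help. This is a minimal sketch, not what I actually ran that day: the function names list_nfs_mounts and list_d_state are hypothetical. It relies on /proc/mounts being generated from the kernel’s mount table, so reading it does not contact the NFS server, and on the fact that processes blocked on an unreachable NFS server usually sit in uninterruptible sleep (state D), which is why Ctrl-C has no effect on them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting a dead NFS mount without touching it.

# Print NFS mountpoints from /proc/mounts (field 2 = mountpoint, field 3 = fs type).
# Reading this file never blocks on the remote server.
list_nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' /proc/mounts
}

# Print processes in uninterruptible sleep (STAT starting with D).
# Tasks waiting on an unreachable NFS server typically end up here.
list_d_state() {
    ps -eo pid=,stat=,args= | awk '$2 ~ /^D/'
}
```

On a box like mine, a long list of D-state updatedb processes next to an NFS entry in /proc/mounts is a strong hint that the remote server is gone.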

First, I needed to kill all processes that had open file handles on the dead mount. Of course, fuser and lsof can’t help here because they just hang. So I stopped the cron daemon and killed all updatedb jobs with “ps -ef | grep updatedb | awk '{print $2}' | xargs kill -9”. Then I checked the output of “ps -ef” and killed all suspicious processes.
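Instead of eyeballing ps -ef, you can let /proc tell you which processes hold the mount. fuser and lsof tend to stat() the files they inspect, which blocks on a dead server; reading the /proc/PID/cwd, /proc/PID/root and /proc/PID/fd/* symlinks with readlink only returns the recorded path string. A minimal sketch, assuming a Linux /proc and bash; the function name pids_using_mount is hypothetical:

```shell
#!/usr/bin/env bash
# Print PIDs whose working directory, root or any open file descriptor
# points at or below the given mountpoint. Only readlink is used on the
# /proc symlinks, so the (possibly dead) NFS filesystem is not stat()ed.
pids_using_mount() {
    local mnt=${1%/} pid link target
    for pid in /proc/[0-9]*; do
        pid=${pid#/proc/}
        for link in /proc/"$pid"/cwd /proc/"$pid"/root /proc/"$pid"/fd/*; do
            # Unreadable links (other users, exited processes) are skipped.
            target=$(readlink "$link" 2>/dev/null) || continue
            if [[ $target == "$mnt" || $target == "$mnt"/* ]]; then
                echo "$pid"
                break   # one match is enough; avoid duplicate PIDs
            fi
        done
    done
}
```

Usage would be something like `pids_using_mount /opt/mnt | xargs -r kill -9` as root. For the updatedb jobs alone, `pkill -9 -x updatedb` does the same as my ps/grep/awk pipeline without also matching the grep process itself. Keep in mind that a process stuck in state D may only die once the kernel lets it out of that wait.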

Next issue: “umount -f /opt/mnt” kept returning “device or resource busy” and did not work. Well, I’ve been using Linux for more than 15 years and I was sure “umount -f” was the only solution, but let me check man umount just in case. Surprise!

  • -l, --lazy
     Lazy unmount. Detach the filesystem from the file hierarchy now, and clean up all references to this filesystem as soon as it is not busy anymore. (Requires kernel 2.4.11 or later.)
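Put together, the unmount step looks roughly like the sketch below: try the forced unmount first (the man page describes -f as meant for an unreachable NFS system), and fall back to a lazy one when that fails with “busy”. This is a hedged sketch rather than a tested recipe; the helper names is_mounted and detach_dead_mount are hypothetical, and the umount calls need root. It checks /proc/mounts instead of the mountpoint itself, because even a plain stat() on a dead NFS mount can hang.

```shell
#!/usr/bin/env bash
# Succeed if the exact path appears as a mountpoint in /proc/mounts.
# Field 2 is the mountpoint; reading the file never touches the NFS server.
is_mounted() {
    awk -v m="$1" '$2 == m { found = 1 } END { exit !found }' /proc/mounts
}

# Detach a hung NFS mount without rebooting (run as root).
detach_dead_mount() {
    local mnt=$1
    if ! is_mounted "$mnt"; then
        echo "$mnt is not mounted" >&2
        return 1
    fi
    # Forced unmount; with open files left it usually fails with EBUSY.
    umount -f "$mnt" 2>/dev/null && return 0
    # Lazy unmount: detach from the hierarchy now, clean up when no longer busy.
    umount -l "$mnt"
}
```

For the server in this story the call would be `detach_dead_mount /opt/mnt`. The lazy variant works even while something still holds the mount, which is exactly the situation where “umount -f” refused.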

One “umount -l /opt/mnt” later, my stubbornness was satisfied. I fixed the machine without a reboot and learned the same lesson again: you can’t always remember everything and you don’t know everything, but you can always RTFM.


Story by Hristofor Pamyatnih.