Thanks, Slav, for this awesome post. I’ve used the scripts with fast.ai instances with great savings!
I’m now experimenting with the Amazon deep learning AMIs and am trying to run the same script (start_spot.sh) against a detached 512 GB EBS volume. It worked on the first try, but on subsequent launches the instance gets stuck at ‘status check 1/2’ with the following system log:
[8087359.575786] EXT4-fs (xvdf1): ext4_check_descriptors: Checksum for group 0 failed (43049!=6681)
[8087359.580882] EXT4-fs (xvdf1): group descriptors corrupted!
mount: mount /dev/xvdf1 on /permaroot failed: Structure needs cleaning
mkdir: cannot create directory '/permaroot/old-root': Read-only file system
pivot_root: failed to change root from `.' to `./old-root': No such file or directory
Moving mounted file system old-root/dev to /dev.
mount: special device ./old-root/dev does not exist
Moving mounted file system old-root/proc to /proc.
mount: special device ./old-root/proc does not exist
Moving mounted file system old-root/sys to /sys.
mount: special device ./old-root/sys does not exist
Moving mounted file system old-root/run to /run.
mount: special device ./old-root/run does not exist
chroot: failed to run command '/sbin/init': No such file or directory
Is this something you’ve faced before?