Why Consumer NAS Devices Are a Bad Idea for Long-Term Data Storage or Backup

Lars Meinel
7 min read · Apr 23, 2019


In recent months I experienced two cases of failing NAS hard drives that led to (almost) catastrophic data loss. I believe the causes are inherent to the concept of consumer NAS appliances as set-and-forget solutions. In this article I will share my observations and opinions to help spread awareness, so you will not experience the same issues.

Who this post is for

If you’re anything like me, storing your personal data safely is very important to you. You are afraid of data loss and of important documents or valuable memories such as photos or videos being gone forever. Hence, you want to take precautions to avoid that. If you have a copy of all your photos on a single ten-year-old, $50 external hard drive and you’re fine with that, this blog post is probably not for you. Though I sincerely recommend changing something about that. 😉

Trying to make it better

To keep their personal data safe, many people use a cloud backup service such as Backblaze, SpiderOak or Carbonite to back up data directly from their PCs. Others, perhaps because limited upload bandwidth prevents this or because they have privacy concerns about the cloud provider, implement at least a local-only backup solution.

NAS (Network-Attached Storage) appliances make sense for this use case. They are available on the network, which allows backups from multiple computers and users. Also, the always-on character of these devices allows automatic additional backups to external drives, other NAS units or even to the cloud, which would be ideal (see the 3–2–1 backup strategy).

Backing up to a NAS feels safe, since multi-bay devices let you introduce redundancy for the case that one of your drives fails, which it will do at some point! With redundant storage across multiple physical drives, aka RAID (Redundant Array of Independent Disks), you expect to be protected from drive failure. RAID 1 keeps a duplicate of your data on a second drive, which also improves read speeds. For arrays of more than two drives, RAID 5 and RAID 6 calculate parity to tolerate one or two failed drives, respectively. While these mechanisms are well proven in data centers, they are likely to fail in personal NAS boxes.
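
If you want to see this mechanism in action without dedicated hardware, you can simulate a mirror with loop devices on any Linux machine. This is a rough sketch: it needs root, the mdadm package, and assumes /dev/loop0, /dev/loop1 and /dev/md0 are free on your system.

    # Create two small image files that stand in for physical disks
    truncate -s 100M disk0.img disk1.img
    losetup /dev/loop0 disk0.img
    losetup /dev/loop1 disk1.img

    # Combine them into a RAID 1 mirror
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/loop0 /dev/loop1

    # Simulate a failure of one member: the array degrades but keeps working
    mdadm --manage /dev/md0 --fail /dev/loop0
    cat /proc/mdstat

The degraded array stays readable with one member failed, which is exactly the promise consumer NAS boxes are sold on.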

The problem(s) with consumer NAS appliances

Consumer-grade NAS boxes are used as set-and-forget solutions, and consumers trust the redundancy promise. This is an inherent danger to your data. Let me explain why the redundancy mechanisms are likely to fail:

One part of the problem is relying on the self-diagnosis of your drives. This function, called S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology), is built into every modern drive and measures various attributes, e.g. the number of bad sectors or spin-up retries. If one of the several dozen measurements crosses a critical threshold, the drive reports a S.M.A.R.T. failure. The web management interface of your NAS will show this error or even notify you. In this case you should immediately replace the defective drive and let the NAS restore your data from the redundant drive(s). However, sometimes S.M.A.R.T. reports too late or not at all: a study by Backblaze shows that about 23% of their drives failed without any S.M.A.R.T. warning.

This is exactly what happened to me recently. I had been using a Western Digital My Book Live Duo (2 x 3 TB in RAID 1) as a data archive, i.e. all my photos and videos have a backup on the MBLD. Once the local storage in my computer is full, I delete the original files on my PC but keep the backups. At some point, the NAS started to behave strangely: it would not react for a while, transfers were very slow, or it stopped working completely and had to be hard-rebooted by unplugging the power cable.

[Screenshot: MBLD web UI showing no drive fault]

The web interface showed no problem, but I was curious and double-checked by logging into the Linux console over SSH (a hidden feature of my particular model, not recommended for the average user). Et voilà: the kernel dmesg log reported EXT4-fs errors, and an rsync job that I triggered to secure some data was extremely slow and showed multiple I/O errors. With smartctl I checked both drives for errors and found /dev/sda to be the faulty one:
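
The checks themselves look roughly like this (a sketch; device names and tool availability vary by NAS model):

    # Kernel log: look for filesystem and I/O errors
    dmesg | grep -iE 'ext4|i/o error'

    # Overall health verdict: this can still say PASSED on a dying drive
    smartctl -H /dev/sda

    # Full report: self-test log plus raw attribute values
    smartctl -a /dev/sda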

Surprisingly, S.M.A.R.T. still showed PASSED, even though one self-test had already failed at 10% and the important S.M.A.R.T. attributes 5 (Reallocated Sector Count), 197 (Current Pending Sector Count) and 198 (Uncorrectable Sector Count) reported values above zero. According to the Backblaze study, the probability of the drive still being operational in this state is 0.05%. Hence, the drive was clearly dead.
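
To pull out just these three attributes on your own drive (a minimal sketch, using the names smartctl prints for them; the device name is an example):

    smartctl -A /dev/sda | grep -E 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable'

On a healthy drive, all three raw values should be zero.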

What makes S.M.A.R.T. reporting in NAS devices almost useless

Making matters worse, a few minutes after I checked smartctl, the MBLD stopped responding and never rebooted. If I hadn’t checked for errors before the device became unresponsive, I wouldn’t have known why. And even if I had concluded that it must have been a drive failure, I would not have known which drive to replace. The average user, however, will probably never see the fancy web management interface reporting a disk failure, nor be asked to replace the faulty drive.

The reason for this is that in many NAS appliances the operating system is stored on the HDD array itself. When one of the drives fails, the complete system is affected: it may no longer be able to boot or to start the server behind the web interface. If the OS runs from a separate flash or SSD drive, the system will continue normally, report a disk failure (in the 77% of cases where S.M.A.R.T. actually detects the problem) and let the user replace the drive. Most NAS models have a built-in flash drive, but it is only used to restore the OS, e.g. when you install new drives.
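
If your NAS gives you shell access, you can check where its operating system actually lives (a sketch; these tools may be missing from stripped-down firmware):

    # Which block device backs the root filesystem?
    findmnt /

    # Overview of drives, partitions, RAID devices and their mount points
    lsblk -o NAME,SIZE,TYPE,MOUNTPOINT

If the root filesystem sits on an md device built from the data drives, a single failing drive can take the whole system down with it.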

Warning the unconcerned user

Something similar happened to a colleague of mine some months ago. The NAS he had deployed for his family (a dual-bay Netgear ReadyNAS in RAID 1) stopped responding at one point. He brought it into the office and asked us to check for problems. We extracted both drives and mounted the RAID members separately on a Linux machine, soon learning that both drives were defective. Sadly, we were not able to recover any data by copying it to another drive. The only remaining option would have been to send the drives to a professional recovery service. If you’re interested in what these guys can do, watch this video.
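
For reference, inspecting and mounting extracted RAID 1 members on a Linux machine goes roughly like this (a sketch; device names are examples, and everything should happen read-only to avoid further damage):

    # Identify the RAID metadata on an extracted member
    mdadm --examine /dev/sdb1

    # Assemble a degraded array from a single mirror member, read-only
    mdadm --assemble --run --readonly /dev/md0 /dev/sdb1

    # Mount read-only and copy off whatever is still readable
    mkdir -p /mnt/rescue
    mount -o ro /dev/md0 /mnt/rescue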

However, if the root cause is surface degradation, which was likely in our case, even a data recovery service wouldn’t have been able to recover the data in the bad sectors. These are exactly the kinds of worries you want to avoid by using RAID redundancy in the first place.

My colleague mentioned that the device had started behaving oddly a while before, but he was unconcerned and suspected only a minor network issue. Most likely, one of the drives had already started to produce errors. It can take several minutes to read a single bad sector, which makes the drive feel extremely slow and further damages the surface. But the ReadyNAS let him ignore the issue, probably for months, before the second drive failed as well. This is not surprising, as both drives were exactly the same age and subject to the same wear and tear. Whether S.M.A.R.T. failed to report in this case or the report simply never reached the user remains unknown.

What can be done to mitigate these dangers?

If you are using your NAS just to serve media on your network (such as movies of which you still have the original DVD or Blu-ray discs), you are fine.
However, if you have important data on your device:

  1. Stay alert! Your data is probably less safe than on an external HDD somewhere in your cupboard that you seldom use. If your NAS is acting up, check for drive failures from the terminal and interpret the S.M.A.R.T. attributes manually, as shown above. You might even consider replacing a drive ahead of schedule.
  2. Additionally, make sure your NAS is able to inform you in case of a drive failure (a monitoring sketch follows this list). I recommend not using the HDD array for the system partition at all. My next NAS will actually be a custom-built server running FreeNAS or OMV, so that I can keep the system and data partitions on separate drives.
  3. Of course, you should also implement the 3–2–1 backup strategy and keep an additional local and an offsite backup of your data. This lets you restore most of your data in case the redundancy mechanisms fail.
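
As a starting point for the second item, the smartd daemon from smartmontools can schedule regular self-tests and send mail when something goes wrong. This is a sketch of /etc/smartd.conf; the mail address is a placeholder, and your device must be able to run smartd and send mail at all:

    # Monitor all attributes (-a), run a short self-test every night at 02:00
    # and a long one every Saturday at 03:00, and mail a warning on problems
    /dev/sda -a -s (S/../.././02|L/../../6/03) -m admin@example.com
    /dev/sdb -a -s (S/../.././02|L/../../6/03) -m admin@example.com

A custom FreeNAS or OMV build ships this kind of monitoring out of the box; on a closed consumer appliance you usually cannot add it yourself, which is part of the problem.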

I hope this article helps you keep your data and memories safe, or at least provides some useful information about storage options. If you have any questions, additions, comments or similar experiences, please leave a response below.
