First, a note on the picture: it shows a really old hard drive that has nothing to do with this post. Why? I don’t want the post to be about any specific model or brand.
At the beginning of September, we added a new drive to our servers. Everything looked rosy. We loaded some initial data onto the drive, and then it started collecting and serving everyday data from our beta customers.
Imagine our surprise when one day, approximately one month later, the drive suddenly became read-only. We use Linux on our servers, and Linux does its best to protect data if it notices any hardware problems. And there they were: our hard drive’s SMART (also written as S.M.A.R.T.) monitoring showed the lines below (there were more, of course, but these caught our eye):
| ID | Attribute | Value |
|---|---|---|
| 5 | Reallocated Sectors Count | 3 |
| 197 | Current Pending Sector Count | 160 |
| 198 | (Offline) Uncorrectable Sector Count | 162 |
SMART 5 – Reallocated Sectors Count: On its own, this would be okay. It is rare on such a new drive, but it happens. A high number, though, is at least a warning that the drive could fail soon.
SMART 197 – Current Pending Sector Count: this is a critical parameter. It indicates the current count of unstable sectors (waiting for remapping).
SMART 198 – (Offline) Uncorrectable Sector Count: the total count of uncorrectable errors when reading/writing a sector. A rise in the value of this attribute indicates defects of the disk surface and/or problems in the mechanical subsystem.
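To make the three attributes above concrete, here is a minimal Python sketch that flags them when their raw value is non-zero. It assumes the text comes from something shaped like the attribute table printed by `smartctl -A` (ten whitespace-separated columns, raw value last). The function names and the `SAMPLE` text are made up for illustration: only the three raw values come from this post, and the other columns are placeholders.

```python
# IDs of the three SMART attributes discussed in this post.
WATCHED = {
    5: "Reallocated Sectors Count",
    197: "Current Pending Sector Count",
    198: "(Offline) Uncorrectable Sector Count",
}


def parse_raw_values(attribute_table: str) -> dict[int, int]:
    """Map attribute ID to raw value for rows shaped like `smartctl -A` output."""
    values = {}
    for line in attribute_table.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw = fields[9]
        if raw.isdigit():
            values[int(fields[0])] = int(raw)
    return values


def warnings_for(values: dict[int, int]) -> list[str]:
    """Return one message per watched attribute whose raw value is above zero."""
    return [
        f"{WATCHED[attr_id]} (ID {attr_id}) = {values[attr_id]}"
        for attr_id in sorted(WATCHED)
        if values.get(attr_id, 0) > 0
    ]


# Illustrative input: only the last column (3, 160, 162) is from this post,
# every other column is a placeholder.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       3
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       160
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       162
"""

if __name__ == "__main__":
    for message in warnings_for(parse_raw_values(SAMPLE)):
        print(message)
```

Treating any non-zero value as a warning is a deliberately cautious choice for this sketch. A real monitoring setup would more likely track how the numbers change over time.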
Based on these errors, Linux made the right call and switched the filesystem to read-only mode to prevent any further damage.
Our storage system is designed to prevent data loss if a single hard drive (or several) fails. In our case, one of the underlying raw data files also became unreadable. Luckily, this is handled automagically by Storadera storage systems, and the customer was and still is able to read the data at any time.
If everything is fine, why the blog post, you may ask? Well, if this hard drive had ended up somewhere else, there could have been data loss after only one month of usage. Hard disk drive failures do happen. Many people and companies are not backing up everything they need. A drive being new does not guarantee that the data will stay intact.
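A filesystem that has silently turned read-only is easy to miss, so here is a small sketch of one way to spot it. It assumes the mount table is available as text in the layout Linux uses in `/proc/mounts` (device, mount point, type, comma-separated options, and two trailing numbers). The function name and the sample devices and paths are hypothetical.

```python
from pathlib import Path


def read_only_mounts(mounts_text: str) -> list[str]:
    """Return the mount points whose option list contains `ro`."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expect at least: device, mount point, filesystem type, options.
        if len(fields) < 4:
            continue
        mount_point = fields[1]
        options = fields[3].split(",")
        if "ro" in options:
            result.append(mount_point)
    return result


# Hypothetical mount table used only to illustrate the format.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /data ext4 ro,relatime 0 0
"""

if __name__ == "__main__":
    # On a Linux machine, read the live mount table instead of the sample.
    print(read_only_mounts(Path("/proc/mounts").read_text()))
```

Some filesystems are mounted read-only on purpose, so a hit here is only a hint to go and check the kernel log and the drive's SMART data, not proof of a failing disk.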
Please, back up your important data!
Take a look at a list of applications that can help you start backing up.