
That's extremely unlikely. Could it have been the controller instead? Which HGST drives?


I've frequently had drives in a RAID fail in rapid succession. If you buy a bunch of identical drives at the same time and put them in a RAID, then you can end up with:

* They were manufactured in the same batch, maybe even one right after another on the same line.

* As they were transported from manufacturer to OEM to you, they were exposed to the same environmental conditions, right down to vibrations, humidity, and ambient EM environment.

* As you use them, they continue to be exposed to the same environmental conditions, including power supply fluctuations and power inductively coupled into places it doesn't belong.

* They see the same usage patterns. Depending on the RAID specifics, that might go right down to the same disk locations seeing the same read and write volume.

It's then not surprising if they fail at about the same time.

For the last machine I put together that needed high availability, I intentionally bought drives from two different brands for the mirror, to maximize the likelihood that they fail at very different times.

Many years ago (c. 2003) the group I was working in inherited a massive 6U storage server with an insane number of 10k SCSI drives (this was before SAS was a thing). We named it "hurricane" for the sound it made. After a few weeks of using it, the first drive failed. It rebuilt to a hot spare and we ordered and eventually installed a replacement. A few weeks later, another drive failed, and this time before it could finish rebuilding, two more drives in the RAID failed and its contents were lost (but we had a good backup). We never used it again. For a while I used it as a coffee table, but then someone convinced me that was too tacky, and it got e-wasted.


> It's then not surprising if they fail at about the same time.

It is, but in a different way. It is a testament to the depth and precision of manufacturing process control that two insanely complex machines will behave nearly identically for years, up to the point of failing at about the same time, if they've been made in the same batch and exposed to about the same environment and usage patterns over those years. You'd expect any number of random factors to cause one drive to fail way before the other, but no: not only is there very little variation between drives in a batch, but tiny variations in usage are damped down instead of amplified.

It truly is amazing.


> If you buy a bunch of identical drives at the same time and put them in a RAID

When setting up a new machine with ZFS, I intentionally buy drives from as many different brands and models as possible to spread the manufacturing defect risk.


Not extremely unlikely if they were identical drives from the same manufacturing batch. It's good practise to use diverse manufacturers, or at least diverse batches, when adding disks to a RAID array for just this reason.


It's not unreasonable to believe that if you pick two identical products off the same shelf at the same time (as one would logically do when purchasing 2 of a single item), that the two products were manufactured at similar times and in similar conditions.

Your model isn't exactly bad, but it rests on an assumption you haven't accounted for, one that, to be fair, is frequently left unstated. The assumption is that the drives' defects are independent of one another. That is a poor assumption when the drives were manufactured back to back.
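
To put rough numbers on why the independence assumption matters, here is a minimal sketch. It assumes a toy model that is not from this thread: with some probability the whole batch is "bad" and each drive in it fails with a high probability during a given window, otherwise each drive fails with a low probability. All three parameter values are made up for illustration.

```python
# Toy model (an assumption for illustration, not measured data):
# a batch is "bad" with probability q; drives from a bad batch fail
# within the window with probability p_bad, others with p_good.

def marginal_failure(q: float, p_bad: float, p_good: float) -> float:
    """Chance that a single drive fails within the window."""
    return q * p_bad + (1 - q) * p_good

def both_fail_independent(p: float) -> float:
    """Chance both drives fail if their failures were independent."""
    return p * p

def both_fail_same_batch(q: float, p_bad: float, p_good: float) -> float:
    """Chance both drives fail when they share a batch (good or bad)."""
    return q * p_bad ** 2 + (1 - q) * p_good ** 2

# Hypothetical numbers, chosen only to show the effect.
q, p_bad, p_good = 0.05, 0.5, 0.01

p_single = marginal_failure(q, p_bad, p_good)
naive = both_fail_independent(p_single)
same_batch = both_fail_same_batch(q, p_bad, p_good)

print(f"single drive fails:        {p_single:.4f}")
print(f"both fail (independent):   {naive:.6f}")
print(f"both fail (same batch):    {same_batch:.6f}")
print(f"ratio same-batch / naive:  {same_batch / naive:.1f}x")
```

Each drive looks equally reliable on its own in both calculations; only the chance of losing both changes. With these made-up inputs the same-batch figure comes out roughly ten times the naive one, and mixing batches or brands is what moves you back toward the independent case.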


https://news.ycombinator.com/item?id=32026606 Hacker News went down a while back because of the 40k-hour bug. Both the primary and backup servers were placed into service at the same time, with SSDs that hit an overflow after ~40k hours.
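
A quick sketch of why this kind of bug defeats redundancy: if failure is triggered by accumulated power-on time, the gap between two drives failing is just the gap between their in-service dates. The dates below are hypothetical, and 40,000 is taken from the "~40k hours" figure above rather than from any firmware documentation.

```python
from datetime import datetime, timedelta

# Approximate threshold taken from the "~40k hours" figure above.
FAILURE_HOURS = 40_000

def predicted_failure(in_service: datetime) -> datetime:
    """When a continuously powered drive reaches the hour threshold."""
    return in_service + timedelta(hours=FAILURE_HOURS)

# Hypothetical in-service times: backup racked six hours after primary.
primary_start = datetime(2018, 1, 1, 9, 0)
backup_start = datetime(2018, 1, 1, 15, 0)

gap = predicted_failure(backup_start) - predicted_failure(primary_start)
years = FAILURE_HOURS / (24 * 365.25)

print(f"threshold is about {years:.2f} years of power-on time")
print(f"primary fails: {predicted_failure(primary_start)}")
print(f"backup fails:  {predicted_failure(backup_start)}")
print(f"gap between failures: {gap}")
```

Years of uptime buy no extra margin here: the backup fails exactly as long after the primary as it was installed after it.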


The drives themselves were toast. My hypothesis was a short in the RAID controller, or something similar, leading to an overcurrent in the drives.

I wasn’t using them in a RAID configuration, but they were attached to a RAID controller.


Or it could be that the tolerances and environmental factors were so tightly matched between the drives.



