People who don't use tape, and who don't understand the differences between tape and disk, have been saying it's dying for years.
The most common incorrect assertion is that drives don't fail THAT often, so why not just back up to drives and put them on a shelf? But to be generally safe, all data would have to be on at least two drives, which changes the cost noticeably.
Older, used LTO drives are actually quite affordable, so I use LTO-4 and LTO-5 tapes to back up all sorts of things for both myself and for my clients. It's surprisingly easy and inexpensive, and pulling files back off is as simple as running pax.
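If anyone wants to see what that looks like in practice, here's a minimal sketch driving pax and mt from Python (the /dev/nst0 device node, the mt utility, and the directory name are assumptions; adjust for your own drive):

```python
# Minimal sketch: write a directory to tape with pax and restore it,
# driven from Python. Assumes a non-rewinding SCSI tape device at
# /dev/nst0 and the mt utility for positioning (both assumptions).
import subprocess

TAPE = "/dev/nst0"          # adjust for your drive
ARCHIVE_DIR = "photos2023"  # hypothetical directory to back up

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Rewind, then write the directory as a ustar-format pax archive.
run(["mt", "-f", TAPE, "rewind"])
run(["pax", "-w", "-x", "ustar", "-f", TAPE, ARCHIVE_DIR])

# Later: rewind and list the archive...
run(["mt", "-f", TAPE, "rewind"])
run(["pax", "-f", TAPE])        # list contents
# ...and rewind again before extracting into the current directory.
run(["mt", "-f", TAPE, "rewind"])
run(["pax", "-r", "-f", TAPE])  # extract files
```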
One important thing: at least a few years back, we realised that HDDs fail at an alarming rate (>80%) if you disconnect them and leave them for a few years. Either that, or someone brought a gamma ray source through our lab. Keeping the disks hot was a different story, though.
I am not aware of any large-scale study, but poor HDD reliability in cold storage corresponds to my experience too.
I have archived data (almost a couple hundred TB in double copies) on more than 60 HDDs, about half from WD and half from Seagate, from several HDD generations, i.e. with capacities of 2 TB, 3 TB, 4 TB, 6 TB and 8 TB, most being from the 4 up to 8 TB generations. All HDDs were more expensive WD and Seagate models (i.e. with longer warranty times), not their cheapest consumer HDDs.
All files were checksummed for error detection. When the HDD content was transferred to tapes after some 3 to 6 years since the initial archival, almost all HDDs had a few errors, which sometimes were not reported by the drives even when they were detected by the file checksums.
Only the fact that all files were stored on at least 2 HDDs prevented data loss, because even when both HDDs had errors, the errors were in different locations.
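For what it's worth, the checksumming side of this needs nothing exotic; a sketch along these lines (paths and manifest name are made up) is enough to catch the kind of silent errors described above:

```python
# Sketch of a checksum manifest for an archive disk: build it once at
# archival time, then re-run in verify mode years later to catch silent
# corruption the drive never reported. Paths are hypothetical.
import hashlib, json, os, sys

def sha256(path, bufsize=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def build(root, manifest):
    sums = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            p = os.path.join(dirpath, name)
            sums[os.path.relpath(p, root)] = sha256(p)
    with open(manifest, "w") as f:
        json.dump(sums, f, indent=2)

def verify(root, manifest):
    with open(manifest) as f:
        sums = json.load(f)
    return [rel for rel, digest in sums.items()
            if sha256(os.path.join(root, rel)) != digest]

if __name__ == "__main__":
    mode, root, manifest = sys.argv[1:4]   # e.g. verify /mnt/archive sums.json
    if mode == "build":
        build(root, manifest)
    else:
        print("corrupted files:", verify(root, manifest) or "none")
```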
I've never used tape on a regular basis but when I have I've had terrible experiences with low end tapes. I got kicked out of the computer center at my Alma Mater because I created a bunch of newsgroups. They (said that they) handed me my files on a SunTape that seemed to be empty when I tried to restore it later.
A few years later I got a QIC tape for my 486 PC and had similar experiences. There was even the time I tried to restore a several kb configuration file from a Tivoli tape robot and was quoted that it would take 18 hours and figured I could recreate it in much less time.
If I did it all the time I'm sure I would do better, but 2 out of 3 times or so I had tape backups fail on me, contrast that to almost always being able to read HDDs, even if there is some bit rot.
Most revive after being left in the fridge for some time. The bits and pieces contract at different rates, so cooling (or, I must assume, heating) is enough to get them unstuck.
Many drives I did that with lived happy afterlives after being revived, some having been brought back from the dead more than once.
The biggest expense in TAPE storage has always been the tape-changing robot. (And if you ever worked with Sun HW: the software was super expensive, the UI/UX of the backup software (Legato? I can't recall the name) was a PoS (not point of sale), and I hated it...)
However, with that said, the price/GB/TB for tape is very cheap...
What would be a good idea is to use tape-duplication robots to copy one tape to N tapes on a regular cadence (like every 2, 3, or 5 years), where you read and copy to new media so that the medium (the cheap tape) is cycled through...
I have a bunch of HDDs starting from more than 15 years ago. I was recently checking and all of them are readable. Given the total number of drives, I would have noticed an 80% failure rate.
They're not used as a backup; rather, I start with a data disk, and as prices go down and my needs grow, the drive gets replaced with a higher-capacity one. And there is an extra set of drives (a copy of those); I always back them up.
Is the 80% sampling a variety of manufacturers and batches, or a single batch of drives that were all ordered at the same time from the same manufacturer? If the latter, I'd suspect it's more of a realization that you got a flawed batch of drives (which is a real risk to manage, but not universal).
It's a variety of sizes and companies spanning 8 years. And we were talking about 100ish drives. Basically we realised our entire lab backup was useless.
My issue with tape is that unless your operation is big enough to justify a robotic tape library as a budget line item with support contracts and all, then you're down to paying someone (or eating the cost yourself) to physically swap tapes which is much more expensive and boring than deploying a ZFS server with gobs of disk that once configured, just sits in the rack and does its job quietly.
I think you’re dead on. At smaller scales, it’s cheaper just to keep more disks running. But once you get to scale, tape is great for archival work.
The trick I’ve always found is to figure out where you are on that inflection point. And it’s hard. Is 1PB enough to justify tape? (Which seems like a crazy question to me - I remember having megabyte sized tapes).
When I did the calculations a few years ago, break-even was somewhere around 150 TB, and I don’t think it’s changed too much since then. This is just considering the cost of drives and media. Obviously, the real inflection point is going to be different depending on all sorts of factors that may be specific to your situation and your priorities.
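For anyone who wants to redo the back-of-the-envelope math with their own numbers, here's the shape of it (every price below is an illustrative assumption, not a quote):

```python
# Back-of-the-envelope break-even between HDD-only and tape archival.
# Every price here is an illustrative assumption, not a market quote.
DRIVE_COST = 2500.0    # one LTO tape drive (assumed)
TAPE_PER_TB = 6.0      # tape media, $/TB (assumed)
HDD_PER_TB = 22.0      # hard disks, $/TB (assumed)

def hdd_cost(tb):
    return HDD_PER_TB * tb

def tape_cost(tb):
    return DRIVE_COST + TAPE_PER_TB * tb

# Break-even where DRIVE_COST + TAPE_PER_TB * x == HDD_PER_TB * x
break_even_tb = DRIVE_COST / (HDD_PER_TB - TAPE_PER_TB)
print(f"break-even around {break_even_tb:.0f} TB")   # ~156 TB with these numbers

for tb in (50, 150, 500):
    print(f"{tb} TB: ${hdd_cost(tb):,.0f} on HDD vs ${tape_cost(tb):,.0f} on tape")
```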
Usability isn’t something you can ignore, but it’s not like hard drives are perfect either—do you buy a big NAS / SAN setup and plug drives into it? Will it get full? And tape has the advantage that it’s completely immune, out of the box, to ransomware.
I think there are four cases that really scream for tape.
1. Data hoarders, who just want to store as much data as possible for the cost. There’s an r/DataHoarder subreddit if you’re curious about these people.
2. Archivists, who want to store lots of data long-term. Tape is a lot easier to store. I recommend that archivists standardize on a specific generation of tape for as long as possible and don’t mix generations (don’t mix LTO4 and LTO5, for example, despite the fact that the drives are advertised as working with both).
3. Companies with recordkeeping requirements, like SOX (Sarbanes-Oxley). Tape is just really good for that. It has a way of surviving problems in your IT department.
4. Companies with enough data that they can put a line-item on the budget for backups, and justify the operational cost of tape—support contracts, keeping staff on hand who know how to use tape, that kind of thing.
Of these, the data hoarders are going to use the 150 TB break-even point just because they want as much data as possible. Everybody else is going to make decisions based on other factors, like staffing or compliance. There are a number of gotchas, like problems with mixing tape generations, the prospect of using robotic tape libraries, and support contracts, that make the tradeoff much more nuanced.
Yes, considering the additional cost of a tape drive versus the price difference per stored TB between tapes and HDDs gives an intersection point somewhere between 100 TB and 300 TB.
Taking into account that for long term archiving it is necessary to store 2 or 3 copies of the data reduces proportionally the threshold above which tape is preferable.
The storage cost per TB includes not only the purchase price but also the lifetime of the media, e.g. a HDD model with a warranty of 2 years cannot be trusted to store data for a longer time.
Tape is guaranteed for 30 years, but the storage time is normally reduced to about 10 years by the risk that the corresponding tape drives may become obsolete.
Yeah. It gets even more complicated, because tape drives and tape media have different failure rates.
At the places where I used tape, we used a more efficient encoding for archival tape backups. Rather than storing 2 or 3 copies, we used forward error correction with something like 30% overhead. This, then, gets even more complicated to evaluate, because it multiplies how “hungry” your backup / archive system is for ingesting data in order to remain efficient & still write data out to tape by whatever deadlines you have set. If you store 3 copies of data on LTO-8, then you write data in blocks of 12TB, with one copy on three tapes. If you use forward error correction, you might do something like write out 96TB at once on 11 tapes. You use less than half as many tapes in the long run, but you need to feed the machine faster in order to meet deadlines.
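To make the tape-count arithmetic concrete, here's a toy comparison (the 12 TB LTO-8 native capacity is real; the 8-data-plus-3-parity grouping is just an assumed example of that kind of FEC overhead, not a description of any particular system):

```python
# Toy comparison: 3 full copies vs. a striped layout with parity tapes.
# LTO-8 native capacity is 12 TB; the 8-data + 3-parity grouping is an
# assumed example of roughly 30-40% FEC overhead, not a real product.
import math

TAPE_TB = 12

def tapes_for_copies(data_tb, copies=3):
    return copies * math.ceil(data_tb / TAPE_TB)

def tapes_for_fec(data_tb, data_tapes=8, parity_tapes=3):
    group_tb = data_tapes * TAPE_TB            # 96 TB of payload per group
    groups = math.ceil(data_tb / group_tb)
    return groups * (data_tapes + parity_tapes)

for archive_tb in (96, 500, 2000):
    print(f"{archive_tb} TB: {tapes_for_copies(archive_tb)} tapes as 3 copies, "
          f"{tapes_for_fec(archive_tb)} tapes with FEC")
```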
While it is true that more complex error correction methods can reduce a lot the amount of tapes needed for storage, the simple approach of making 2 or 3 copies remains necessary in many cases, because storing the copies in different geographical locations is typically the only method that can protect an archive against a calamity that may destroy an entire storage center.
It is possible to use RAID-5/RAID-6-like encoding schemes that can survive the complete destruction of 1 or even 2 storage centers while using less tapes than with simple copies, but such encoding schemes can be used only by very large organizations, which use more than 3 separate storage centers.
Yes, the needs for archives (sole copy) and backups (additional copies of live data) are different.
> It is possible to use RAID-5/RAID-6-like encoding schemes that can survive the complete destruction of 1 or even 2 storage centers while using less tapes than with simple copies, but such encoding schemes can be used only by very large organizations, which use more than 3 separate storage centers.
Scenarios where two storage centers are destroyed—that’s extreme. The most paranoid scenarios I’m normally willing to entertain are along the lines of one data center burns to the ground in a generator fire, and somebody drives a truck full of backup tapes into a ditch and they’re all covered with mud and sand.
Tapes have a high enough failure rate that you benefit from forward error correction and you benefit from planning to handle individual tape failures. This includes stuff like the tape leader breaking, somebody losing a tape, damage during transport, water damage in storage, etc.
There’s a layered approach here, where you plan for different disasters at different levels of the stack. Each layer exposes some certain failure rate to the layer above it, and deals with some certain failure rate at the layer below it. When I think of backups, I often imagine a top-level data storage system that has a geographically distributed hot backup, and then an offline cold backup. This lets you survive complete destruction of one data center, or lets you survive a catastrophic software bug that destroys data (and a bunch of tapes are damaged on top of that). Pretty good baseline, IMO.
Another big case besides SOX compliance is medical records generally or imagery specifically.
Basically anywhere where you have a lot of data that has to be retained indefinitely for regulatory compliance or practical reasons is a great case for tape. But yeah, the robotic library and the service costs are pretty high until you hit a huge amount of data.
Source: supported a couple of large medical installations for a couple years many years back. Can confirm that you're dealing with a lot of mechanical complexity per GB until you get to an absolutely enormous amount of data. I genuinely can't imagine breaking even with a robotic library you can't climb into.
But presuming this isn't data with low-latency access requirements (since tape is pretty useless for that, so we wouldn't be making the comparison), what's the inflection point where it becomes worth the CapEx to justify even having your own "nearline" + archival storage cluster at all, vs. just using Somebody Else's Computer, i.e. an object-storage or backup service provider?
To me, 1PB is also where I'd draw that line. Which I would interpret as it never really being worth it to go to local drives for these storage modalities: you start on cloud storage, then move to local tapes once you're big enough.
(Heck, AFAIK the origin storage for Netflix is still S3. Possibly not because it's the lowest-OpEx option, though, but rather because their video rendering pipeline is itself on AWS, so that's just where the data naturally ends up at the end of that pipeline — and it'd cost more to ship it all elsewhere than to just serve it from where it is. They do have their self-hosted CDN cache nodes to reduce those serving costs, though.)
On the other hand, with either tape or hard drives, you can leave it on a shelf for 10 years and the data has a decent chance of still being intact. Proper procedure would dictate more frequent maintenance, but if for whatever reason it gets neglected, there's graceful degradation. With AWS, if you don't pay your bills for a few months, your data goes poof. Other companies might have more friendly policies, but they also might go out of business in that span of time.
I think someone else mentioned in this very comments section that hard drives "rot" while spun down — not the platters, but the grease in their spindle motor bearings or something gets un-good, so that when you go to use them again, they die the first time you plug them in. So you don't want to use offlined HDDs for archival storage.
(Offlined SSDs would probably be fine, if those ever became competitively affordable per GB. And disk packs (https://en.wikipedia.org/wiki/Disk_pack) would work too, given that they're just the [stable] platters, not the [unstable] mechanism; they would work, if anyone still made these, and if you could still get [or repair] a mechanism to feed them into, come restore time. For archival purposes, these were basically outmoded by LTO tape, as for those, "the mechanism" is at least standardized and you can likely find a working one to read your old tape decades later.)
Even LTO tape is kind of scary to "leave on a shelf" for decades, though, if that shelf isn't itself in some kind of lead-lined bunker, given that stray EM can gradually demagnetize it. If you're keeping your tapes in an attic — or in a basement in an area with radon — then you'd better have encoded the files on there as a parity set!
I think, right now, the long-term archival media of choice is optical, e.g. https://www.verbatim.com/subcat/optical-media/m-disc/. All you need to really guarantee that that'll survive 50 years, is a cool, dry warehouse that won't ever get flooded or burnt down or bombed [or go out of business!] — something like https://www.deepstore.com/.
But if you're dealing with personal data rather than giant gobs of commercial data, and you really want your photo album to survive the next 50 years, then honestly the only cost-efficient archival strategy right now is to keep it onlined, e.g. on a NAS running in RAID5. That way, as disks in the system inevitably begin to die or suffer readback checksum failures, monitoring in the system can alert you of that, and you can reactively replace the "rotting" parts of the physical substrate, while the digital data itself remains intact. (Companies with LTO tape libraries do the same by having a couple redundant copies of each tape; having their system periodically online tapes to checksum them; and if any fail, they erase and overwrite the bad-checksum tape from a good-checksum copy of the same data — as the tape itself hasn't gone bad, just the data on it has.)
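A minimal sketch of that "keep it online and let monitoring tell you when to swap disks" approach, assuming a ZFS pool named tank (the pool name and the alerting hook are made up; mdadm and btrfs have equivalent scrub commands):

```python
# Sketch: periodically scrub a ZFS pool and flag anything unhealthy,
# so rotting disks get noticed and replaced while the data is intact.
# Pool name "tank" and the alerting hook are assumptions.
import subprocess

POOL = "tank"

def scrub_and_check(pool):
    subprocess.run(["zpool", "scrub", pool], check=True)   # kick off a scrub
    status = subprocess.run(["zpool", "status", "-x", pool],
                            capture_output=True, text=True, check=True).stdout
    if "is healthy" not in status:
        alert(status)

def alert(details):
    # stand-in for mail/Slack/whatever monitoring you actually use
    print("POOL NEEDS ATTENTION:\n", details)

if __name__ == "__main__":
    scrub_and_check(POOL)   # run this from cron, e.g. monthly
```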
Paying an object-storage or backup service provider is just paying someone else to do that same active bitrot-preventative maintenance for you that you'd otherwise be doing yourself. (And they have the scale to take advantage of shifting canonical-fallback storage to an optical-disk-in-a-cave-somewhere format, which reduces their long-term "coldline" storage costs.)
Instead, you're just left with the need to do the much rarer "active maintenance" of moving between object-storage providers as they "bit-rot" — i.e. go out of business. As there are programs that auto-sync between cloud storage providers, this is IMHO a lot less work. Especially if you're redundantly archiving to multiple services to begin with; then there's no rush to get things copied over when any one service announces its shutdown.
That’s a really good point. For us (near that 150-300TB inflection point for archival storage), it made more sense to put the data on S3 glacier. First off, the data is originally transferred through S3, but mainly, glacier hits the same archival requirements as tape, at a pretty compelling cost.
Yeah, but that ZFS server is likely in the same rack that's going to get knocked out when the data center gets sucked into a tornado, or whatever. Whether it makes sense to have someone on staff swapping tapes, rotating backups, shipping them offsite, and keeping records of what exactly is where (not to mention handling AND TESTING restores) really depends on what your data's worth. For most places, it's cheap insurance.
More realistically you are deploying two of the servers; redundancy is not backup. And now you have two servers to administer.
The whole point of something like tape is having an off-line copy of your data, ideally in a separate physical location. A second server with a bunch of disk in the same location connected to the same network isn't that, and will not save you from ransomware or natural disaster.
If you use even crude tools such as clusterssh, managing a bunch of machines isn’t linearly harder than managing one.
While tape renders itself easily to be offline storage (it’s offline as soon as you eject it after all), you can have that with remote servers that pull data to back it up instead of receiving pushed data. If no server can push data to any other server, only pull from them, a ransomware attack becomes a lot harder.
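A minimal sketch of the pull-only arrangement (hostnames, paths, and the ssh user are made up; the point is just that the backup host initiates the transfer and the production servers have no credentials to reach it):

```python
# Sketch of a pull-only backup: the backup server reaches out to each
# production host and pulls data over rsync; nothing on the production
# side can push to (or delete from) the backup store. Hosts, paths and
# the ssh user are made up.
import datetime
import os
import subprocess

SOURCES = {
    "web1.example.com": "/srv/www/",
    "db1.example.com": "/var/backups/dumps/",
}
DEST_ROOT = "/backups"

def pull(host, remote_path):
    stamp = datetime.date.today().isoformat()
    dest = os.path.join(DEST_ROOT, host, stamp) + "/"
    os.makedirs(dest, exist_ok=True)
    subprocess.run(
        ["rsync", "-a", f"backup@{host}:{remote_path}", dest],
        check=True,
    )

for host, path in SOURCES.items():
    pull(host, path)
```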
Also, while tape in a warehouse (or on the desk) is offline, tapes in the robot are no more difficult to destroy than hard disks. They are just slower.
It's also the software to run the blasted thing. As soon as you get into tape, shit gets enterprise-y real quick. There are open-source tools to manage tape collections, but it's not fun.
LTO tape libraries are fairly cheap to pick up second hand; it's the cost of getting the newer drives that hurts.
There are tape libraries as small as 3U with 25 slots. That’s a capacity of only 300 TB with LTO-8. It’s not hard to justify if you’re working with stuff like video.
15 years ago I worked for a tape backup company that got bought out by a company making disk backups. Before they bought us out, the CEO was constantly saying "tape is dead" in interviews, but once there was money for him to make in tape he stopped saying that. I remember someone asking him about this, and he responded with something like "amazing how money can make you change your tune".
I remember a Gmail outage circa 2008 where they said they were restoring from tape, and people found the idea comically retrograde even back then. But I started at a company with large Commvault tape archives in 2011 and they mentioned that it was just more economical at scale for the workload (the backup team had some sweet gear that was apparently purchased with a kickback program on their LTO(4?) tape purchases; perhaps that assisted the engineering decisions). However, by the end of that decade, they had migrated to mostly using large drive arrays with tape used only for offsite, so my impression was that tape was indeed a shrinking, niche method of data storage.
Fantastic talk, and I'm just 12 minutes in on a 1h14m video.
Take the Cartesian product, in fact, of those factors. I want location isolation, isolation from application layer problems, isolation from storage layer problems, isolation from media failure.
Edit:
They do RAID4 on the tape level. 5 tapes, one of them the parity tape.
When that fails: Reconstruction at sub-tape level. "We don't really have data loss."
"We make the backups as complicated and take as long as they need. The restores have to be quick and automatic. Recovery should be stupid, fast and simple."
Retrieval (recovery) from disk is easier and faster, thanks to the ability to randomly read and write. Tapes are sequential: to get to data that sits at the end, one has to go through all the tape that comes before it.
The maximum tape seek time is around a couple of minutes. The average seek time is about half of that, i.e. around 1 minute (these example values are valid for LTO-7).
These seek times are seldom a problem. If one kept on HDDs as much data as is normally kept on tapes, i.e. at least a few hundred terabytes, one would face similarly slow access times, because one would have to plug and unplug external HDDs, unless the storage belonged to a very large company that could afford the high cost of keeping many HDDs online.
As an individual user, I keep in my computer, on an SSD, an index with the content of all tapes. When I need some file, I get it in at most 5 minutes, which includes starting the tape drive, going to a cabinet and taking the tape from a shelf, inserting it in the drive and waiting for the seek and copy operations.
This is perfectly acceptable, because I do not need files from the tapes every day. Storing the data on external HDDs would not reduce the time wasted with manual operations, it would be much less reliable and it would be much more expensive for great amounts of data.
The sequential transfer speed of tapes is greater than that of HDDs. Therefore, if after a seek you copy large files, e.g. a movie in BluRay format, the whole seek + copy operation takes less than from a HDD.
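The index itself can be almost embarrassingly simple. A sketch of the sort of catalog I mean, as a SQLite file on the SSD (the schema is just an illustration):

```python
# Sketch of a tape catalog kept on an SSD: which file lives on which
# tape, where, and with what checksum. The schema is illustrative.
import sqlite3

con = sqlite3.connect("tape_catalog.db")
con.execute("""
CREATE TABLE IF NOT EXISTS files (
    path      TEXT PRIMARY KEY,   -- path as stored in the tape archive
    tape      TEXT NOT NULL,      -- tape label, e.g. 'LTO7-0042'
    file_no   INTEGER NOT NULL,   -- tape file mark number, for positioning
    size      INTEGER NOT NULL,
    sha256    TEXT NOT NULL
)""")

def record(path, tape, file_no, size, sha256):
    con.execute("INSERT OR REPLACE INTO files VALUES (?,?,?,?,?)",
                (path, tape, file_no, size, sha256))
    con.commit()

def locate(pattern):
    return con.execute(
        "SELECT path, tape, file_no FROM files WHERE path LIKE ?",
        (pattern,)).fetchall()

# e.g. which tape do I need to fetch from the cabinet?
record("movies/holiday_2019.mkv", "LTO7-0042", 3, 31_457_280_000, "ab12...")
print(locate("%holiday_2019%"))
```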
That detail is abstracted away from the user, though: tapes from LTO-5 onward (which is pretty old) can be addressed like a normal file system, and if you are doing backup things, the contents of the tape are held separately from the data itself.
As an off-line archival backup medium, access times really are not a concern in the real world.
But when you are recovering files, it's rarely a random-IO type of deal.
Even then, you generally just dump to the nearline/tape cache and fiddle with the data there.
A decent 25-drive tape library will easily saturate a 100-gig network link, and it's perfectly possible to add more drives to get more IO.
Tape as part of tiered storage is something that is really powerful. Yes, flash is cheap, but not cheap enough to be used as long-term backup (i.e. legal hold or general archive).
Keep your expensive fast storage near your clients, the cheaper less fast, but more voluminous a stage away, and then dump out to tape after n weeks with no access.
That’s just not correct. Tapes can be used for either purpose, and with a decent backup software managing them, restoring specific parts of the backup is easy—you just need to wait for it to seek the tape.
This is incorrect. At the last place I worked with tape, the poor tape guy was constantly going back for some "deleted a single file and needed it back" situation.
Once, we had to set up a hollow Sharepoint and restore a Sharepoint backup just to get someone's deleted file.
Now, do I think this is a good idea? No. Frankly, people who cause these kinds of things need to see "This cost X hours to recover, stop doing that" as feedback.
I have wanted to use a tape drive for years, for storing backups. Partly because tapes are cheap and numerous, partly for the aesthetic, honestly. LTO-5 is indeed quite cheap now, you've inspired me to give it a go.
My 1980 dissertation research on the scattering properties of Jupiter's upper atmosphere is in my attic, stored on a 2400 foot reel of 9-track tape, 1600 bits per inch, NRZI, if I remember right.
Maybe 40 Mbytes of data. It's archived with the write-ring removed.
Somehow, I don't think I'll try to rebuild the scattering matrix for Jovian cloud particles...
Those bad boys (9-track tapes) could easily hold 40MB which was a few hundred times bigger than floppy discs of the 8-bit age and would add up to quite a few compact cassettes. (As expensive as it was, I was so happy to switch to floppy disk for my TRS-80 Color Computer because restoring stuff I saved on a cassette was always hit or miss.)
Mainframes in 1968 were handling much bigger data sets than you could handle with a micro until 1990 or so.
There is no way you could fit 40MB into an audio tape, even when using the fastest fast loader [1]. 40MB is 3.2e8 bits, which means you'd need a transfer rate of ~60kbps even with a 90-minute tape. 8 bit computers could do 1-2kbps at best.
Makes me a little sad to think of it gradually succumbing to bit rot, even though I know everything does and I don't mind that some of my own data is going the same way. If you want to preserve it I bet it wouldn't be too hard to rustle up volunteers to try to retrieve and transfer it.
I have always wanted to put a modern LTO head on one of those giant reel to reel tapes. Probably wouldn’t work right, but if it did it would be awesome.
For me it's crazy how tape vendors always advertise "compressed capacity" which is 2.5 times the actual capacity. Imagine a laptop containing a 2TB SSD getting advertised as containing a "5TB SSD"...
A lot of business data will be encrypted by policy before it gets anywhere near the (perhaps offsite) backup system. That means any compression the tape drive offers will be useless.
Why not compress the data before encrypting it? Seems like a really easy to implement solution that would speed up backup and recovery and cost almost nothing to implement.
Yes of course, but that still means the compression offered by the tape drive is ineffective, even though that compression is assumed as part of the advertised tape capacity.
The worst part of compressing at the drive is that you’ll need to transfer 2.5x-ish more data than you’d need if you compressed prior to transferring.
In the end, it’s encrypted, compressed, decrypted, and decompressed multiple times on the way from the platter (or flash) to the tape recording head. It feels kind of dumb, really, but I’ll agree that if the computer doesn’t have the private key of the drive, stealing the data on tape will be a lot harder.
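To illustrate the compress-before-encrypt ordering being discussed, here's a sketch using the third-party cryptography package (key handling is reduced to one in-memory key purely for brevity):

```python
# Sketch: compress first, then encrypt, so the tape drive's compression
# (useless on ciphertext) no longer matters. Uses the third-party
# 'cryptography' package; key management is deliberately oversimplified.
import zlib
from cryptography.fernet import Fernet

key = Fernet.generate_key()      # in real life: from your KMS, not generated here
f = Fernet(key)

def pack(plaintext: bytes) -> bytes:
    return f.encrypt(zlib.compress(plaintext, 9))

def unpack(blob: bytes) -> bytes:
    return zlib.decompress(f.decrypt(blob))

data = b"2024-01-01 00:00:01 everything ok\n" * 100_000
blob = pack(data)
print(len(data), "->", len(blob), "bytes on the way to tape")
assert unpack(blob) == data
```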
Even last generation tape drives have hardware encryption built in. And it doesn't make sense to me that they would encrypt, then compress. That would completely invalidate the compression for absolutely no benefit.
Nobody trusts encryption built into tape drives for the same reason nobody trusts encryption built into hard drives.
Even things like Windows Bitlocker and LUKS/dmcrypt on Linux totally ignore the drives ability to do any encryption, and do all encryption using the CPU before the drive sees the data.
Bitlocker/LUKS could easily just calculate the encryption keys and send them to the drive and trust it to encrypt/decrypt data for them... but they don't.
It would be more accurate to say that all those who understand how encryption works (which excludes most people in managerial positions, even if they have the power to decide what kind of encryption is used in their companies) cannot trust any kind of encryption embedded in a hardware device whenever it has to be used for protecting really important data.
The reason is that all encryption is based on keeping the encrypted data and the encryption key in separate places. Whoever can access the encrypted data must not have any way to access the encryption key.
Whenever you give your encryption key to a hardware device, or worse, when the hardware device also generates itself the encryption key, it becomes impossible to ensure that the attacker will not be able to access the encryption key.
It is impossible to know how the encryption keys are stored inside a hardware device and how and when they are erased and how easy or how difficult it will be in the future for an attacker to retrieve them.
It is impossible to believe any marketing claim of the vendor of a hardware encryption device about how tamper-resistant the device is, because such claims have very frequently been proven to be lies (even when the claims come from the largest companies, e.g. Microsoft and many others like it) and it is too difficult to distinguish truth from lies in such cases.
The only reliable means of encryption are in software, under complete end user control (or equivalently, in a custom FPGA).
Surely if the device automatically decrypts data as you read from it without having to supply a key or passcode of some sort then you're not protecting it against anything! It may automatically manage/store the actual keys used to encrypt the data, but there must be some mechanism to ensure decryption is not permitted until you supply some external form of credential (which ideally is cryptographically required to access the keys - if not then obviously there's a risk the device's controller code could be reverse engineered to bypass the credential check).
Considering that industry standard software like backup exec has built-in support for hardware encryption (which is actually part of the LTO standard), I would need to see a source for "nobody trusts encryption built into tape drives"
Using the encryption provided by the drive seems like a terrible idea. What happens in 10 years when you want to retrieve the data and that company has gone out of business?
It's functionality exposed by the drive as part of the tape standard, not something unique to vendor of the drive. Any LTO-aware software given the same key would be able to decrypt the data using any drive that can read that generation of tape.
That's probably one of the items on the list of reasons there are so few vendors in the space, LTO drives are essentially fungible and don't do anything interesting across vendors.
At present it's usually sent to the tape drive which does hardware data compression, hardware encryption, and then writes to the tape so it can be sent off site.
However, something close to your point is true. If they compress the data and then encrypt, the ciphertext will indeed be smaller than the original data.
The encryption used by e.g. IBM tape drives is two layered - strong/slow and weak/fast. The Asymmetric key encryption used is the "strong" encryption, but it's far too slow to be used to encrypt the whole tape. So IBM encrypts the tape data with the weak/fast symmetric algorithm and then encrypts the key for that with the strong/slow asymmetric one.
If you compress the symmetrically encrypted data, you will indeed gain some capacity. Not as much as you would compressing the raw data, but a visible amount, because the symmetric algorithm doesn't have the property of indistinguishability.
That property is not needed for tape storage because anyone who has the tape can safely assume it contains a ciphertext. The tape has header information saying whether it's encrypted or not.
If you read the links you posted, you will find out that it's not "most" cryptosystems, it's some. Some applications just don't need indistinguishability and LTO Tapes are one of them.
All that said, IBM does compress first then encrypt to make the most of the data compression, but the resulting ciphertext isn't completely random.
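The two-layer scheme described above is essentially envelope encryption. A rough sketch of the pattern, again with the cryptography package (this illustrates the idea, not IBM's actual on-tape format):

```python
# Sketch of envelope encryption: bulk data is encrypted with a fast
# symmetric key (AES-GCM), and only that small data key is encrypted
# with the slow asymmetric key (RSA-OAEP). Illustrates the pattern
# described above, not any vendor's actual on-tape format.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Long-lived asymmetric key pair (in practice held by the key manager).
rsa_key = rsa.generate_private_key(public_exponent=65537, key_size=3072)

OAEP = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

def wrap_and_encrypt(data: bytes):
    data_key = AESGCM.generate_key(bit_length=256)   # fast symmetric key
    nonce = os.urandom(12)
    ciphertext = AESGCM(data_key).encrypt(nonce, data, None)
    wrapped_key = rsa_key.public_key().encrypt(data_key, OAEP)
    return wrapped_key, nonce, ciphertext

def decrypt(wrapped_key, nonce, ciphertext):
    data_key = rsa_key.decrypt(wrapped_key, OAEP)
    return AESGCM(data_key).decrypt(nonce, ciphertext, None)

w, n, c = wrap_and_encrypt(b"block of archive data" * 1000)
assert decrypt(w, n, c) == b"block of archive data" * 1000
```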
While you are technically correct - the best type of correct - the encrypted data should be indistinguishable from random data. If it’s not, you need better cryptography, as pointed out by the sibling comments.
Actually, it's not that simple. Read my response above.
You can't really say indistinguishability makes one cryptosystem better than another just because it prevents distinguishing attacks, because for some applications it's both not needed and computationally expensive.
That's because of different expectations. For a tape drive, you don't care about latency, power efficiency, random access or write amplification, all that matters is getting as much data as possible on a tape for the lowest amount of investment into tapes. So the tapes are advertised with the amount of net data that you can fit onto it, assuming a standard distribution of compressible vs non-compressible data.
For a laptop SSD, all of these matter. You can't do much compression because compression consumes a lot of power and latency upon access, some compression schemes make random access harder (i.e. compression schemes without an index where you have to scan through intermediate checkpoints or, in the worst case, sequentially through the entire media), and it may lead to write amplification as well if you need to add a piece of data in the middle of a file (basically the issue with shingle HDDs). As a result, no compression possible, and so SSDs are advertised with the raw capacity (or, in fact, they are underadvertised because SSDs need spare block capacity to account for wear).
But it completely disregards that most files you'd want to back up are already compressed! Even if I were to turn on aggressive compression for my SSD despite all the inconveniences, it wouldn't change much. I'll have to agree with the OP: storage capacity marketing for tape is weird if not an outright scam.
A lot of tape usage is in mainframe and traditional big iron systems where data is bulk unloaded from databases in a fairly raw format, and it’s convenient to let the tape infrastructure handle both compression and encryption.
I had an LTO-4 tape (max native capacity 800 GB) last year that took over 5 TB of data. The data was ASCII log files from a server that had been running for years. Lots of repetition (timestamps, everything-ok messages, you-have-mail messages) and the drive compression ate it for breakfast.
Tape drives like LTOs use some fairly good hardware compression... believe it or not, it's significantly better than the average zip program. Depending on the data sent to the tape, it can achieve 50-60% reduction in size.
I administer a very, very large tape library on a daily basis, with roughly 30 PB of video in our backups. LTO tape is simply more economical when you're at that amount of data. We looked into replacing it with object storage, but the object storage solution cost more than double what it cost to buy an LTO-9 tape library plus a fast spinning-disk storage pool for nearline archives.
Tape has come a long way since the 800 BPI (bits per inch) NRZI (Non-return zero insert) and 6250 BPI PE (Phase encoded) 2400' tapes I worked with in the 70's.
The tape sort algorithm read tapes backwards to avoid the delay of rewinding. Pretty cool.
If you put a ping pong ball atop the blast from the vacuum pumps, it would hover magically in the air.
If a tape got worn out from use, you cut off the first 100' and pasted a new silver Load Point Marker.
The first computer class I had was on TRS-80s. For a while we did projects that started and ended in the same class, so storage was not even on our radar. Then the proctor brought out a cassette deck and a cable and we were saving our programs on an audio cassette. Wat.
Don't underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.
As a hobby, I've looked into second-hand LTO tape backup for my 71 TB NAS but the drive alone is pricey, and if it dies, how can I restore my data without forking over too much money?
> if it dies, how can I restore my data without forking over too much money?
In the unlikely event your NAS dies at the same time as your LTO tape drive, then you buy a replacement drive, restore the tapes, and sell the drive again on eBay, recouping most of your costs.
Fun anecdote... Many years ago, we had to send a large dataset from the east coast to the west coast. I think we had a 10 or 100 Mbps WAN connection (this was the 90s). It was faster to air-freight the entire storage device than to send it over the wire (which we did).
The amazing thing is that, at current drive and major cloud egress pricing, it is still cheaper to buy, ship, and throw away a disk than to send data over the network.
AWS charges 5 cents/GB at the highest tier. That’s $50/TiB (roughly). 18 TiB out from AWS costs $900. You can buy a nice WD Red Pro 18 TiB drive from Amazon for $298, and shipping it next day is not particularly expensive.
Of course, you can’t load data onto that drive from AWS, and you won’t pay anywhere near $900 to send 18 TiB over a network from any reasonable provider.
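Spelling out the arithmetic (the $0.05/GB figure and the $298 drive are the ones quoted above; the shipping estimate is an assumption):

```python
# The egress-vs-ship-a-drive arithmetic from the comment above.
EGRESS_PER_GB = 0.05      # AWS top-tier internet egress, $/GB
DRIVE_TIB = 18
DRIVE_PRICE = 298.0       # WD Red Pro 18 TiB, as quoted above
SHIPPING = 40.0           # assumed next-day shipping cost

egress_cost = DRIVE_TIB * 1024 * EGRESS_PER_GB
print(f"egress for {DRIVE_TIB} TiB: ${egress_cost:,.0f}")          # ~$920
print(f"buy and ship a drive:    ${DRIVE_PRICE + SHIPPING:,.0f}")  # ~$340
```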
I think AWS has a similar service where they send a van to your private datacenter that has a bunch of storage servers in it to help you migrate to S3. You plug the van in the network and start uploading.
It's still the case! Upload speeds (especially on consumer-grade Internet connections) haven't kept pace, and AWS will send you box for you to copy your data into that you mail back to them. If you have a lot of data, they'll send you a shipping container!
Around the year 2001 my employer at the time was doing a Exchange mailbox migration from a datacenter in Connecticut to one in Atlanta. They paid for me to fly up there (coach fare), pick up the data on an external USB drive, fly back with the drive in my carry-on luggage, drive to the data center and plug it in.
I knew someone who had to stop by a secure data center on the way home and transport a backup tape to a safety deposit box a block or two away. Not the offsite backups I would have asked for, but that's what they did.
Current LTO drives need to be fed at hundreds of megabytes per second for writing. This is well beyond what 1-gig Ethernet can do. Does your NAS support that?
(You can occasionally get away with slightly lower feed speed -- the drive will write "invisible empty data" to keep the motors running and tape movement going -- but that cuts into capacity of the media in unpredictable ways. And eventually the drive decides it is facing data starvation and stops writing. The result of that is a few seconds of tape repositioning to restart the writes.)
ISTR that tape drive speed divided into the capacity was pretty constant across multiple LTO generations. About three hours to fill a tape, I think.
We used to run "Dell" storage and tape (rebranded ADIC changer and Quantum/IBM drives). A tape changer would autoload tapes under command of the backup software. It would run overnight, automatically changing the tapes two or three times, as required, without issues.
Restore was another problem, mostly the fault of the software.
You would tend to put the tape drive in the NAS, so backup from hosts to NAS over probably slow 1G ethernet, and backup from NAS disk(s) to tape periodically. Not writing from NAS to tape over ethernet, or from hosts direct to tape on the NAS.
Used 10G nics are pretty inexpensive these days, if you really want to write network to tape.
128 EB, of which only a tiny fraction is utilized. I imagine most tapes only ever get written to (sometimes more than once), but never read from. We're supposed to restore from backup periodically to ensure the process is working, but I've never seen EVERY backup being tested.
It's interesting that they fell off the exponential curve in 2018.
The conventional wisdom is that the tape industry continuously backports hard disk media innovations that are a few years old, so the end game is that there will be a period after HD innovation ends that allows end-game tape to create an insurmountable advantage over the end-game HDDs.
Put another way, if HDDs stop getting bigger this year, then they'll end up competing with tape drives that continue improving density at current trajectories until ~ 2028, and choosing between 2023 HDDs vs. 2028 tape for archival storage will be a no-brainer.
Having said that, the rapid drop off in tape unit shipments combined with the stagnation of total shipped capacity suggests conventional wisdom might be wrong.
I wonder if HDD density improvements have already stopped, so now we're seeing the endgame.
But the increased density comes with reduced IO, similar to HDD. Most of the "density innovation" of LTO-9 etc comes with either just a longer piece of tape, or increasing the number of read/write heads along the same distance of tape.
That's why, while YoY growth of tape is (checks blog...) 5%, that's lower than global HDD sales growth even this year. This is a marketing piece from the LTO consortium.
Lastly, HDDs do way, way better in the DC, operationally. From a much wider environment/humidity envelope of operation, to IO scaling linearly across the data corpus, to siteops management, to having more than literally 1 vendor, etc. That's why there are only two hyperscalers left deploying new tape, and even there it's dwindling or being deprecated.
Now HDDs have their own problems, and their density scaling with hamr/mamr/smr hasn't gone great, but no one is really asking for less IO from devices given power cycling drives tends to reduce IO just as effectively as fewer tape drives, and for far less money.
“That's why there are only two hyperscalers left deploying new tape, and even there it's dwindling or being deprecated.” Do you have a source for that?
The annoying thing is that the drives are so expensive.
What's in a LTO drive that costs so much to make, and why can't it cost the same as a hard disk? When I think of it a hard disk seems rather more complicated.
It’s probably economies of scale - not a lot of tape drives being manufactured these days and most are for enterprise usage, not cost-sensitive end users.
NRE: non-recurring engineering expenses. Every product needs to be researched and designed before manufacturing. Take all of those costs and divide by the number sold (which in the case of tape drives is very low).
As IBM seems to be the last man standing in the tape storage market, the TCO price of tape is not a market price against competitors, but against HDD storage.
The TCO of tape only makes sense at scale, and is much harder to assess than HDDs, as tape cost includes media, drives and libraries, each of which have different life times. Tape libraries can be used across many tape generations.
> As IBM seems to be the last man standing in the tape storage market
I wonder if IBM is going to introduce a new generation of 3592, or is LTO the “last man standing” too?
The latest IBM 3592 drive generation (20 TB uncompressed) came out in 2018. At the time, their roadmap had future generations promising 30, 40, 80 TB capacity, but I’m thinking, five years on, if they were going to deliver any of that, surely they would have by now?
They're expensive because they aren't made for home users. Tape drives are made in far smaller volumes than say hard drives, yet have most of the same R&D costs.
I seem to recall articles 15-20 years ago about solid state 3D storage using lasers. Spinning HDDs are limited to 2D. Tape is 3D but requires cumbersome tape movement.
Whatever happened to that area of research? Did SSDs and flash drives kill off that idea?
Samsung's 8th generation Flash NAND (which is used to create Samsung SSDs) is 236x layers thick. Effectively forming a 3d mesh (albeit the mesh is created one-layer-at-a-time, but any 3d structure probably would be done like this).
That's how modern SSDs achieve so much density. Alas, the machinery to etch silicon chips in this way remains very expensive, so I bet that Tape and Hard Drives remain the price-per-TB king for the foreseeable future. I'm pretty sure SSDs have won the density crown however.
------------
IIRC, BluRays achieved 4x layers through focusing the laser, reaching 100GB per disk back when BluRay research was still popular (15 years ago or whatever). I don't think it ever was a popular format, but it existed as a crude 3d layer.
I guess Hard Drives are like 8x platters or more, achieving a degree of 3D as well.
But I presume you're talking more about technologies that achieved hundreds of layers (Tape, thanks to "rolling up", or SSDs/modern Flash with 200+ layers, etc. etc.)
> Commence station security log, Stardate 47282.5 - At the request of Commander Sisko, I will hereafter be recording a daily log of law enforcement affairs. The reason for this exercise is beyond my comprehension, except perhaps that Humans have a compulsion to keep records and lists and files. So many in fact, that they have to invent new ways to store them microscopically. Otherwise their records would overrun all known civilization. My own very adequate memory not being good enough for Starfleet, I am pleased to put my voice to this official record of this day. Everything's under control. End log.
Correction, it is backwards compatible for one tape generation. (So your -9 drive will work with -8 tapes with no problem, but you would be out of luck for -7)
That being said though, tape generations advance infrequently enough that this would never really be a problem. And if it is a problem, the secondary market is absolutely loaded with older gen drives going all the way back to LTO 1 if you really need it.
I hope they keep going. I used to work with big robotic tape libraries (ADSM software, can't find that it even exists anymore), and they were something to behold.
I know that for now, tape wins versus optical (Blu-Ray Recordable etc.) in terms of cost.
Do people expect that to always be the case, or is anybody expecting tape to have been supplanted by something else (optical or other) 20 or 30 years from now?
Or is there just something about the physics of tape technology that we can't even imagine anything else in the medium-future that could even begin to compete with it for archival needs?
(Knowing nothing about the area, just as a consumer I would have expected that with the advent of 2-layer DVD's and then 4-layer Blu-Rays, that before too long we'd be getting 1,000-layer discs and essentially moving into 3D storage cylinders by now or something that would have left tape in the dust...)
Optical vs. Tape is only a discussion if you intend to archive (and not read!) your data for years. This is a very uncommon workload, and thus the size of the optical market is much smaller than necessary to get the COGS down.
The vast majority of even backup data is legally required to either be readable in a very short amount of time (say one week) or is required to be deleted within 90 days. Neither of these are ideal for optical (or tape btw, thus why its YoY growth is only 5% in this LTO marketing piece)
>The vast majority of even backup data is legally required to either be readable in a very short amount of time (say one week) or is required to be deleted within 90 days. Neither of these are ideal for optical
Wouldn't an autoloader solve the former, and a fire solve the latter?
You are assuming the data is compacted such that the 90day data is grouped by expiration date. This is not how production deletion works at any reasonable scale where cost of media matters.
In reality, the data is all mixed up because data gets backed up usually around the time it's written, and gets deleted in a completely different pattern. The art of compaction is very important, and things like optical and tape make inaccurate compaction extremely expensive due to the reduced IO.
And even if we pretend you do have perfect compaction, you still don't have nearly the IO that HDD would provide. Considering hypothetically you have perfect compaction, that also means you have the perfectly smallest live data set, and thus the HDD premium is even smaller for wildly better throughput in disaster.
Are there any tapes that offer "read and write simultaneously" functionality?
A common usecase for tapes seems to be to store a big archive of data. Imagine having some API to my archive, which says for each record/file stored whether I want to keep it stored, read it, or discard it.
Over time, many tapes start to be only half full of data I want stored (the rest has been discarded). It makes sense to read all the important stored data, consolidate it, and rewrite it to a new tape, perhaps with some new data.
For this usecase, you only need half as many tape drives if a drive has the ability to read data while rewriting a new data stream.
I don't know about the drives themselves, but a company I worked for once used large, robotic arrays with 16 drives each. Typically they ran 15 drives writing (backups) and left one idle for reads (restores) as needed. The tapes were on the order of a kilometer long, so considering the time for a robot to pick the tape and seek to the proper spot, latency is pretty high in the best of circumstances (no one else is using the one reader drive when you request it).
But the drives that are doing writing... Either they are writing onto an entirely new tape, or they're writing over an existing tape. If it's an existing tape, you need to make sure that everything on the tape is not needed anymore. Yet commonly 75% of the data on the tape isn't needed anymore (retention period expired), but some might still be needed. Unless you have a hard drive somewhere with another copy of the data, you need to read every tape before rewriting it.
I suspect your use case is strictly for backups, whereas my use case is more archival (where all copies of the data are on tape, and there are no copies on hard drives).
These tape libraries can also defragment tapes. For this all the data that needs to preserved is read from the tape and written to another tape.
Even for backups you have a consolidation/defragmentation task running every so often that frees up space on the tapes and only keeps the backups you specified in your retention policy.
While that isn't impossible, nobody will do that. Disks are divided into sectors, with "a lot" of empty space between each sector; random variance makes different sectors take up different amounts of space on the disk (minimizing this is important to disks, but the nature of physical media is that you can never fully eliminate it). With a tape you don't have sectors, and so no blank space. As such, as soon as you decide to delete one record, the entire rest of the tape needs to be rewritten.
There have been people who put a filesystem on a tape with sectors - they could then do this, but since tapes are always sequential access it isn't practical, just an interesting hack to show off.
Log structured file systems solve this problem[1]. They can be extended to tape libraries in various ways. Tapes are still rewritten, but not nearly as often as once per deletion.
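Applied to tapes, the same idea boils down to segment-style compaction: deletes are metadata-only, and a tape is rewritten only once its live fraction drops below some threshold. A toy sketch (thresholds and data structures are assumptions, and it ignores the case where the survivors don't fit on one tape):

```python
# Toy sketch of log-structured compaction applied to tapes: deletes are
# metadata-only, and a tape is rewritten only when its live fraction
# drops below a threshold. Survivors get consolidated onto a fresh tape.
# (Ignores the case where the survivors don't fit on one tape.)
TAPE_CAPACITY_TB = 12.0
COMPACT_BELOW = 0.5          # rewrite tapes that are < 50% live (assumed)

class Tape:
    def __init__(self, label):
        self.label = label
        self.records = {}    # record_id -> size in TB, live records only

    @property
    def live_tb(self):
        return sum(self.records.values())

def delete(tapes, record_id):
    for t in tapes:
        t.records.pop(record_id, None)       # no tape I/O happens here

def tapes_to_compact(tapes):
    return [t for t in tapes
            if 0 < t.live_tb < COMPACT_BELOW * TAPE_CAPACITY_TB]

def compact(tapes, new_label):
    fresh = Tape(new_label)
    for t in tapes_to_compact(tapes):
        fresh.records.update(t.records)      # read survivors, write to fresh tape
        t.records.clear()                    # old tape can be erased / reused
    return fresh
```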
After a few years, like 5 to 10, you will want to rewrite the data to a new tape. Even LTO tapes will age and go bad after a while. You don't want to be the one who finds out the hard way how long LTO tape actually lasts.
And there is no avoiding reading all the data on the tape when rewriting to another tape. The media is cheap after all. And therefore "discard" is largely pointless, unless it's 90% or more of the data.
It would be interesting to see how much data you could fit onto something the same size as an ordinary dual reel audio cassette, or even microcassette, with modern technology.
LTO-8 is 12TB of raw storage on 960 meters of tape.
Wikipedia says that a C90 cassette tape was 135 meters.
Now ignoring all the other differences (width, serpentine winding, chemistry, etc.), that gives us roughly 1.7 TB of data if a C90 (90-minute) cassette tape had the rough bits-per-inch of a modern LTO-8 tape.
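The scaling, for anyone who wants to fiddle with the assumptions:

```python
# Length-only scaling of LTO-8 capacity onto a C90 cassette.
# Ignores tape width, track count, coating, etc., as noted above.
LTO8_TB = 12.0
LTO8_LENGTH_M = 960.0
C90_LENGTH_M = 135.0

c90_equivalent_tb = LTO8_TB * (C90_LENGTH_M / LTO8_LENGTH_M)
print(f"~{c90_equivalent_tb:.1f} TB")   # ~1.7 TB
```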
Through the miracles of science, I discovered that ultrafines and microfines (from an unknown source in the datacenter) were exactly the particles needed to permanently ruin the heads of about a sum total of a dozen AIT-2 drives of a dual drive Compaq SSL2020. Every week or so, a drive would start throwing errors and eventually be knocked offline.
Oh the joys of tape barcoding and offsite tape vaulting.
The lock on the office data safe, a large FireKing, was an interesting disk lock.
People don’t understand the reliability of tape. I would never trust a hard drive for archival storage. Also, offline backups are an amazing insurance against ransomware. Just let a company like IronMountain pick up the tapes, it works well.
However it feels irrelevant for a company who exclusively operates in the cloud.
On the other hand, in anecdote land, in 1.5 decades of use, I never had a tape successfully restore without some kind of failure, corruption or other problem.
Tapes can burn. Give me multi-location online/nearline disk storage any day.
We had nearlines that were basically archive targets, storing about 6 months to a year's worth of versioned data. We also dumped to tape at the same time (one copy for data interchange with other companies, and one for our actual backup).
We were saved multiple times from both tape fuckery (mostly the robot being sad) and nearlines going whoopsey (they had reduced redundancy, and a whole bunch more nodes to allow for the size).
Every time I look into tape I find the drives are prohibitively expensive. An LTO-8 drive is US$3000+. Unless you have hundreds of TBs it's not worth it.
That's because you aren't the target for tape backup - big enterprises with thousands of terabytes of data are. They spend hundreds of thousands of dollars on big automated tape libraries, and it's just a number on a budget spreadsheet.
I understand that but it's frustrating. The only reason drives are as expensive as they are is due to a lack of competition. There is no way they cost anywhere near $3000 to build.
Cost to an end user is more than just the actual cost of components. There's a decent amount of vendors that actually make tape drives, so if it truly were much cheaper, someone would undercut to get business.
Given these are intended for enterprise usage, there is an expectation of quality, service, and support associated that you ARE paying for with that cost.
LTO5 tapes store 1600GB. Glancing at ebay, drives seem to be $200, tapes are $30. (+the server with a SAS card in ofc). It's always going to depend on your scale.