This is great, but I’m disappointed Backblaze B2 isn’t included. That seems like an oversight unless someone can point out how B2 doesn’t hold its own with these options in a glaring way. There are tradeoffs, but B2 seems to be very competitive overall.
B2 is cheaper than every option here for both storage ($0.005/GB) and egress ($0.01/GB).[1] Their transaction pricing is also cheaper.[2] Despite being cheaper, it’s still hot storage, so you can immediately download buckets, in whole or in part. I’ve personally used it to back up (and restore) terabytes of data for over a year. I doubt it has an SLA like GCP or AWS, but DigitalOcean doesn’t either, yet it’s listed here. I find the B2 API documentation to be very readable as well.[3]
I’ve used AWS S3, Glacier and GCP Nearline, Coldline. I can’t think of a specific thing that has disappointed me about B2, and the reliability has been excellent. The nature of my work is that I have very large datasets, and B2 becomes extremely competitive when you’re backing up tens of terabytes or more.
I really wanted to like B2. I love the docs and super-clean interface. But B2 was just too "weird" when it came to uploading an object. You can't replace an object key, so you always get a new object key when uploading. This is unlike any normal object store, where you can just upload to an existing key.
Also, if you upload an object with the same name (e.g. myphoto.png) it creates a new version, and there's no way to stop this. I don't want or need a new version!
I have a feeling this restriction is in place because of the underlying "vault" implementation.
B2 is great as a backup store. I use it for backup.
I wouldn't use it for non-backup object storage, as it's in a single data center. S3 and Google Cloud are completely different; I'm not sure of the Google specifics, but S3 has data replicated across three AZs.
I don't know about that. In my case I'm backing up a backup, so I'd have to lose both backups to lose anything. B2 is just for the offsite copy of the backup. Like taking a tape home sometimes...
In fairness, backing up is the last place I want to hand over redundancy responsibility. Better to consider B2 as one part of a 3-2-1 backup strategy than as the be-all and end-all.
If you upload an object with the same 'name' to S3, you also get a new version.
When you say 'key', do you mean the opaque object ID granted to objects in B2, which object stores like S3 don't have at all? I don't understand the rant here. S3 only operates by 'name' (key).
I have had no problem integrating B2 side-by-side with S3 and GCP in the products I have written. Their high-level models are largely compatible.
Ah, fair enough. I have oddly enough never in my life written to the same key twice, so I hadn't noticed it was off by default for S3.
Just to clarify your statement about atomicity: All writes to S3 and B2 are 'atomic' (and B2 can also verify the hash and reject on failure for an extra layer of security). The difference you mention is just that you can superficially "disable" versioning for S3, so that each key only stores one object version at a given time.
The only difference to the upload when S3 versioning is disabled is what happens during the metadata update: With versioning, a version is appended. Without, the version is replaced.
For B2, simulating disabled versioning is two operations: Upload a new object, and delete the old one. As long as the object is only referenced by name, this will also be atomic.
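On the S3 side, suspending versioning (so a key keeps only its current version going forward) is roughly a one-liner with boto3; the bucket name here is a placeholder and credentials come from the usual boto3 config:

    # Sketch: toggle S3 versioning to "Suspended" so a PUT to an existing key
    # replaces the current version instead of appending a new one.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_versioning(
        Bucket="example-bucket",
        VersioningConfiguration={"Status": "Suspended"},
    )

    # With versioning suspended, this PUT overwrites the current version of the key.
    s3.put_object(Bucket="example-bucket", Key="myphoto.png", Body=b"...")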
B2 doesn't have metadata version numbers, but instead uses globally unique file identifiers to refer to specific versions. When you list file versions, you're presented with file IDs rather than version numbers.
It's only slightly different from the name+version approach, but I personally like the concept of unique identifiers better. It feels nicer.
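And a rough sketch of the upload-then-delete-old-versions approach against B2's native API, using requests. The b2api/v2 endpoint and field names follow my reading of the B2 docs, so treat them as assumptions and double-check before relying on them:

    # Sketch: after uploading a new version of a file, delete the older versions
    # so the bucket behaves as if versioning were disabled. B2 identifies each
    # version by its fileId. (Pagination via nextFileName/nextFileId is omitted.)
    import requests

    API_URL = "https://api.backblazeb2.com"

    def authorize(key_id: str, app_key: str):
        r = requests.get(f"{API_URL}/b2api/v2/b2_authorize_account", auth=(key_id, app_key))
        r.raise_for_status()
        data = r.json()
        return data["apiUrl"], data["authorizationToken"]

    def delete_old_versions(api_url: str, token: str, bucket_id: str, file_name: str):
        # List every version of this file name.
        r = requests.post(
            f"{api_url}/b2api/v2/b2_list_file_versions",
            headers={"Authorization": token},
            json={"bucketId": bucket_id, "startFileName": file_name, "prefix": file_name},
        )
        r.raise_for_status()
        versions = [f for f in r.json()["files"] if f["fileName"] == file_name]
        # Keep the newest upload, delete the rest by (fileName, fileId).
        for old in sorted(versions, key=lambda f: f["uploadTimestamp"], reverse=True)[1:]:
            requests.post(
                f"{api_url}/b2api/v2/b2_delete_file_version",
                headers={"Authorization": token},
                json={"fileName": old["fileName"], "fileId": old["fileId"]},
            ).raise_for_status()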
I'm on DigitalOcean, and I love it, but I use B2 for backup because it just naturally seems safer to have the backups with a different company. It is great for that. I found the object upload a bit weird / different, but I appreciated how it stops you from shooting yourself in the foot by default, rather than the other way around.
None of the other competitors here provide multi-region durability "out of the box" (although S3 does offer cross-region replication as an option), so you might consider distributing your data between two object stores if you need multi-region redundancy (AWS us-east-1 and Backblaze, for example). This ensures not only geographic redundancy, but also vendor redundancy (not to mention that B2 is cheaper than S3).
Not sure what you mean by "out of the box". Both Google Cloud Storage and Amazon S3 have cross-region replication built in. Something not "in the box" would imply you have to build it yourself, or buy a vendor product, to get it.
Google Cloud Storage's multi-regional support (asynchronously replicated to two or more geographic locations) is even easier than AWS: You simply specify, on bucket creation, whether the bucket should be regional or multi-regional.
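For what it's worth, with the google-cloud-storage Python client that choice looks roughly like this; the bucket name is a placeholder, and the property and class names are from my reading of the client docs, so verify them before use:

    # Sketch: create a multi-regional GCS bucket by picking a multi-region
    # location and storage class at creation time.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("example-multiregional-bucket")
    bucket.location = "US"                    # multi-region location ("US", "EU", ...)
    bucket.storage_class = "MULTI_REGIONAL"   # vs. "REGIONAL" for a single region
    client.create_bucket(bucket)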
The "geographic resiliency" chart is pretty inaccurate for the major providers, especially folks like Google cloud that have multi regional offerings with 99.95% availability & a lot more than "1" for redundancy.
I'd also add: unless you are going to include a median/mean latency number, many options will look absurdly good even though we know they're a horrible idea, e.g. https://en.wikipedia.org/wiki/GmailFS. I also think "unknown" availability looks a lot like a durability issue.
Try comparing performance: latency, bandwidth up/down, and scalability in the face of many concurrent requests.
And then do that with respect to compute resources in various locations.
I was really excited about DO Spaces. I compared every major object storage offering (OVH, B2, Wasabi, S3, Azure). DO Spaces came out well ahead. I did dozens of hours of research. I was a customer (and I still am). But I am less excited now.
Basically, there are loads of issues with rejected requests because of rate limiting (it returns a lot of 503 "slow down" responses). I don't recall ever receiving that from S3. You can check the forums for more in-depth discussion.
The good part: This is a solvable problem, and I hope they relax these limits very soon.
Another great thing: their API is 99% compatible with S3. In fact, the official recommendation is to use the AWS SDKs on the server, which is what I am doing!
S3 will sometimes return a "503 temporary error" response if you start writing lots of files per second. From my understanding, if they see that your bucket has a constant high write rate they'll make some configuration changes in the background to accommodate the higher write rate.
That being said, last month I wrote over 60 million files to S3 and the number of failed writes was tiny (solved by simply retrying the write).
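For the retries, a minimal sketch with boto3's built-in retry settings looks like this; note that the "adaptive" retry mode is only in newer botocore versions (older ones only honour max_attempts), and the bucket/key are placeholders:

    # Sketch: let botocore retry throttling errors like "SlowDown" automatically
    # instead of failing the write.
    import boto3
    from botocore.config import Config

    s3 = boto3.client(
        "s3",
        config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
    )

    s3.put_object(Bucket="example-bucket", Key="some/key", Body=b"payload")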
I've not used DO's Spaces yet; I'd love to know if they guarantee read-after-write consistency. I know S3 has some issues with that depending on the use case. Maybe it's time for another look at Spaces.
Yes, DO Spaces is very strict about both GET and PUT. I did a requests-per-second benchmark (literally just fetching a URL a bunch of times). I got about ~180 requests per second, after which all requests failed.
Amazon S3 is much better in the sense that there IS dynamic scaling if it notices spikes.
In defense of DO, they are newer and their business model is "cheap, cheap, cheap", so they can't compete at the same level.
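For reference, the kind of crude requests-per-second check described above can be done in a few lines; the URL, object, and concurrency here are placeholders:

    # Crude requests-per-second check: hammer one URL with a thread pool and
    # count successes vs. 503s.
    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests

    URL = "https://example-bucket.nyc3.digitaloceanspaces.com/test-object"
    TOTAL, CONCURRENCY = 1000, 20

    def fetch(_):
        return requests.get(URL, timeout=10).status_code

    start = time.time()
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        codes = list(pool.map(fetch, range(TOTAL)))
    elapsed = time.time() - start

    print(f"{TOTAL / elapsed:.0f} req/s, "
          f"{codes.count(200)} ok, {sum(c == 503 for c in codes)} x 503")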
S3 will automatically shard your bucket evenly across your sub-namespace if you use pre-shardable names whenever possible.
This sounds complex but is actually fairly easy to do. For example, if you are not dealing with randomly generated IDs, try using Base64-encoded key names rather than the key names themselves. (A more limited character set is helpful, though.)
A better name than user_f38c9123 is f38c9123_user, which allows a statistically even shard size across the full range of 16 possible values for the first (hex) character. As more performance is needed, S3 will automatically shard on the second character too, giving 16x16 (256) possible shards, etc.
Also, using a more limited character set such as hexadecimal (that is, [0-9a-f]) or just numeric digits for the first characters of a filename will shard more evenly than the full alphanumeric range ([A-Za-z0-9], etc.).
Jeff Barr had a blog post on this a while ago... here it is:
"By the way: two or three prefix characters in your hash are really all you need: here’s why. If we target conservative targets of 100 operations per second and 20 million stored objects per partition, a four character hex hash partition set in a bucket or sub-bucket namespace could theoretically grow to support millions of operations per second and over a trillion unique keys before we’d need a fifth character in the hash."
We actually do exactly this in Userify (blatant plug, SSH key management, sudo, etc https://userify.com) by just switching the ID type to the end of the string: company_[shortuuid] becomes [shortuuid]_company. It makes full bucket scans for a single keyname a bit easier and faster if you use different buckets for each type of data, but you actually will get better sharding overall by mixing all of your data types together in a single bucket. The trade-off is worth it for the general case.
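As an illustration of the key-naming trick discussed above, here is a small sketch that moves the high-entropy part of the key to the front; the function names are made up for the example:

    # Sketch: put the random/hashed part of the key first so S3 can partition
    # on the leading characters.
    import hashlib

    def make_key(record_type: str, record_id: str) -> str:
        """Turn e.g. ('user', 'f38c9123') into 'f38c9123_user'."""
        return f"{record_id}_{record_type}"

    def hashed_key(record_type: str, record_id: str, prefix_len: int = 4) -> str:
        """For non-random IDs, prepend a short hex hash so the first characters
        are evenly distributed across [0-9a-f]."""
        digest = hashlib.md5(record_id.encode()).hexdigest()[:prefix_len]
        return f"{digest}_{record_type}_{record_id}"

    print(make_key("user", "f38c9123"))   # f38c9123_user
    print(hashed_key("user", "alice"))    # prints a key like 'XXXX_user_alice' (4 hex chars)

    # Back-of-the-envelope from the Jeff Barr quote: a 4-character hex prefix gives
    # 16**4 = 65,536 partitions; at ~100 ops/s and ~20M objects per partition that
    # is ~6.5M ops/s and ~1.3 trillion keys before a 5th character is needed.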
The example is 200GB storage with 2000GB of data transfer (out) every month. That's a LOT of data going out every month, so I am guessing the scenario is something like hosting a photo library that lots of people download from every month.
If, however, you are just using the service as online storage to hold <100GB of data as backup (i.e. mainly transfer in), then S3 turns out way cheaper than DO.
Not knocking either service; I actually use both, for different use cases.
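To put rough numbers on the two scenarios above: the prices in this sketch are assumptions based on approximate list prices around the time of this thread, so check the current pricing pages before relying on them.

    # Rough cost sketch for the 200 GB stored / 2,000 GB egress example above.
    S3_STORAGE_PER_GB = 0.023     # USD/GB-month, S3 Standard first 50 TB (assumed)
    S3_EGRESS_PER_GB = 0.09       # USD/GB, first 10 TB/month tier (assumed)

    DO_BASE = 5.00                # USD/month, assumed to include 250 GB storage + 1 TB egress
    DO_EXTRA_EGRESS_PER_GB = 0.01 # USD/GB beyond the included 1 TB (assumed)

    stored_gb, egress_gb = 200, 2000

    s3_cost = stored_gb * S3_STORAGE_PER_GB + egress_gb * S3_EGRESS_PER_GB
    do_cost = DO_BASE + max(0, egress_gb - 1000) * DO_EXTRA_EGRESS_PER_GB

    print(f"S3: ${s3_cost:.2f}/month")   # ~ $184.60
    print(f"DO: ${do_cost:.2f}/month")   # ~ $15.00

    # Flip it to ~100 GB stored with negligible egress (backup-only) and S3 comes
    # out around $2.30/month vs. DO's $5 flat fee, which is the point made above.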
It's difficult to use with speculative numbers too. I recently compared Imgix and Cloudinary for use in front of an origin server and the numbers that we projected with made Imgix look 3x more expensive. When we put it against real usage, Imgix was actually 1/4 the price of Cloudinary.
It's difficult, sometimes, to guess at numbers that may or may not play to a specific service strength. A lot of times there's something hidden in the details that you're missing as well. In our case, we didn't notice that Cloudinary didn't have an overage rate so if you go over any of the capped limits on a particular plan you have to move up to the next plan level. It was a very unpleasant surprise. We didn't notice it to compare because it simply...didn't appear anywhere.
Without looking at the numbers in the article, I think the use case of storing data you never access is more of an edge case? Sure, a meager 10x down-to-up ratio (100GB up/month, distributed to 10 locations/people) will be very unfavourable to Amazon.
But what's the use case for uploading 100GB a month and then... not deleting it (so you keep paying for storage) and not accessing it?
So $5 is way cheaper? If you need any of that data down the line, transferring it out will wipe out any savings you made with AWS over the past year or years.
I think it would be good if people didn't always pick these services based on price, since that's how you end up with one monopoly service and no alternatives.
Support alternatives even if they are a bit more expensive.
Agreed. The price difference in a lower usage case is not meaningful. I would choose based on ease of use, reliability and functionality.
The main thing stopping me from using DigitalOcean is AWS RDS, which is amazing. If they could bring out a similar solution with backups for MySQL and PostgreSQL, that would be great. :) I could then run my apps in Docker.
Well, I actually agree with you. In my first post above my reply, I did say that I actually use both services based on my use cases. Both excellent, and work well (Oh, and I also use the Minio open source solution for one particular project too). Horses for courses.
Cool site, but I noticed an error: this compares AWS S3 single-region pricing ($0.024/GB) to GCP GCS multi-region pricing ($0.026/GB) rather than GCP GCS single-region pricing ($0.020/GB). Hopefully the creator/author will correct the discrepancy... Disclosure: I'm a pricing dweeb at Google Cloud.
The bandwidth egress charges on AWS (and GCP/Azure) are way too high. It almost seems cartel-like.
Bandwidth costs have dropped by a huge factor over the past few years, but none of this has been passed on.
I really hope Backblaze and/or DO manage to cause the big three some hurt on this and get them to reduce prices significantly; 7c/GB is really high these days.
What are bandwidth costs? This comes up again and again: people see HE.net / Cogent / Level3 / Hibernia / NTT or whoever offering 10-gig handoff IP transit at a carrier-neutral colo for X per month and somehow determine the true cost from that. I suspect the reality is that much of the cost is invested in routing equipment both internally and externally, evolving SDN and all the goodness it affords, being able to offer an SLA per instance in terms of the Mbit/s that can be pushed, being able to mitigate DDoS, hiring top network engineers and researchers, and being fully redundant across multiple providers and protected circuits rather than just a lonely handoff (or, in GCP's case, building out the network themselves with multiple SLA tiers), so they can mitigate network issues quicker than a PagerDuty alert at a bandwidth blender. Granted, I'd say there's still probably cushioning in there, but judging network costs by comparing to a 10-gig handoff at a random carrier-neutral colo, or a blender, or OVH/Hetzner commodity servers, or Linode/Vultr/DO, seems unfair in my opinion.
I totally understand that, but the price hasn't dropped in, what, a decade? My home internet connection (which includes a shedload of last-mile costs for laying fibre to my apartment) has gone from 2 Mbit/s to 1 Gbit/s (for less money in real terms) in a similar timespan.
I'm not expecting it to be at the cost of a 10gig cogent connection at LINX, but some drop makes sense. DO/Backblaze pricing seems to be more on the money.
Keep in mind these providers also charge on top for a lot of other networking services (VPCs, NATs, etc.), which can often really add up, so I'd expect some of the SDN capex to be absorbed by that.
I met a network engineer from Amazon about a year ago and brought up the prices, since I had heard about them being expensive here on Hacker News. He laughed about it and said it was one of their most profitable departments, with something like a 90% margin.
The larger the provider is the more they are able to negotiate a lower rate from transit providers. The larger a provider is the more peering they get which reduces the total amount of bandwidth they pay for in general.
Then when you consider the cost of the equipment vs the cost of the throughput you will see that as a network becomes larger, network equipment isn't the main cost driving factor, nor are network engineers.
Simply put Amazon is over charging customers by 10x on bandwidth fees.
If we are able to sell bandwidth to customers at $0.01 per GB profitably, which accounts for paying for transit, network equipment, and network engineers while still turning a profit for reinvestment, then AWS should be offering you a price that is 10x less than $0.01 per GB, because their bandwidth costs should be significantly better than ours.
I work for a company that has to provision hosted products for customers across all the clouds, and the one that has been impressing me the most lately is Azure. The load balancers are also the gateways (sound network topology), so there is no need for elastic IPs, NAT gateways, or proxy protocol. The other thing I like about Azure is that they have storage classes that automatically replicate across regions. The automatic storage encryption is a bit of an issue, but I know they were working on it, last I checked.
You can’t beat the offerings of AWS, but there are definitely some compliance scenarios that are easier to fulfill on Azure.
We rarely get customer requests for Google Cloud. Seems like it’s mostly Azure and AWS (at least at the enterprise level).
You don't need any of that for load balancing on any cloud. Google Cloud has the best load balancer, with a global anycast address that will route to the nearest DC with instances and free capacity.
For automatic cross-region replication for storage, are you talking about Azure's GRS class? That's the same as GCP's multi-regional class, and AWS allows you to set up entire-bucket replication to anywhere else in a few clicks.
Pricing has recently changed to remove egress costs completely. The speeds were OK, nothing special. But this is most likely better used for long-term data storage? With a CDN in front it would work well as a large media store for video/image delivery.
This is a rather simplistic comparison. These object storage services have several tiers and features that you need to take into account, like zone vs. regional replication, strongly consistent listings, bandwidth and access depending on where your compute is, integrations like notifications and functions, etc.
That being said, the clouds are great if your compute is co-located in the same place because the transfer fees are waived. Otherwise DO or B2 are probably better options for less usage or more neutral network locations and egress.
What is the durability of objects stored with DO? S3 offers eleven 9s of durability.
Likewise what is the replication story? Can you get event notifications when objects are uploaded/deleted? Is there versioning? Static website hosting? Lifecycle management?
I asked DO about this, and their answer was that 1) data are replicated on-site (in the same data center) but are not replicated to another data center; and 2) as of now, Spaces does not provide lifecycle management.
As a side note, I think lifecycle management is very useful for backups: a server pushes its backups to the object store but cannot overwrite or delete previous backups. This is useful if the server is hacked...
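For comparison, a minimal sketch of an S3-style lifecycle rule with boto3 (the bucket name, prefix, and retention periods are placeholders), since DO says Spaces doesn't offer this yet:

    # Sketch: expire backups after 90 days and clean up old noncurrent versions.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="example-backups",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-old-backups",
                    "Filter": {"Prefix": "backups/"},
                    "Status": "Enabled",
                    "Expiration": {"Days": 90},
                    "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
                }
            ]
        },
    )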
Sia [1] could be even cheaper, coming out at $1.10 for a TB [2]. Although this price might not be forever, and I'm still not sure about the reliability.
In theory it should be more reliable, as it's decentralised and your data gets split among multiple servers around the world. The question remains what if the Sia network itself stops being profitable and people all exit at the same time. Although the same could be said for Amazon?
Sia will actually soon be adding a backend to Minio too [3].
The only thing that has stopped me using Sia is you have to have the blockchain running on the machine.
I started getting excited about Sia over the summer. They've updated now, but they used to have an exceedingly ambitious project roadmap here: https://trello.com/b/Io1dDyuI/sia-feature-roadmap. Checking back a few months later made me lose confidence in the project, even though I think it's a really cool idea.
I see it being used as a base layer for a Glacier-type product, which is still useful because it might give everyone a cheap-ish way to store media, but I wish it could be something more. As far as I know, there's no way to coordinate the sharing geographically, and furthermore the bandwidth is pretty bad, which I think might be due to bandwidth not being counted in the pricing models? There would be huge money in creating a blockchain like Sia that created a more granular marketplace for distributing shards and that factored geography/bandwidth into the pricing. The killer app for this technology is a public market for servers (specifically media hosting). Imagine if Netflix could continually shift their distribution platforms around the world as offices emptied and people turned off their PCs.
Cool thanks! I presumed that it didn't meet some sort of criteria that was being discussed here. I just wanted to make sure there were no issues with the service itself per se.
The main cost seems to be bandwidth (speeds). If you've ever tried Spaces, you know it's a real slog uploading and downloading (even with good client-side speeds). Dropbox has similar specs to Spaces and does it for only $10. OneDrive has the same thing at $7 for 1TB. And there's a Chinese company, Tencent, that gives you 10TB for free (except it's slow and it's all in Chinese).
The problem I have with this article (a very brief table) is that the author is comparing two "enterprise" solutions to a consumer solution. With "enterprise" solutions, you get guaranteed uptime and speeds. With Spaces, and the rest I've mentioned, you don't get any of that; only that your data will still be there as long as you keep paying.
Spaces may be slow (I've never tried it), but it's still a different offering than Dropbox/OneDrive, which are not designed for massive public access. For example, Dropbox has a traffic limit of 200GB/day, which can't be increased.
Is there any open source S3 compatible software? I know Riak Cloud Storage (http://docs.basho.com/riak/cs/2.1.1/), but I think it’s not maintained anymore.
Minio and Ceph both fit this description, with the latter also offering block and file access. Swift is close, but has a slightly different native API (there's an adapter, but I'm not sure of its status). Scality has open-sourced their implementation as Zenko. There's also LeoFS (which, despite the name, is not a filesystem).
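For a feel of the compatibility, here's a quick sketch of pointing boto3 at a self-hosted Minio instance; the endpoint, region, and credentials are assumptions for a local `minio server` setup:

    # Sketch: use the standard AWS SDK against a local Minio server.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="http://localhost:9000",
        region_name="us-east-1",
        aws_access_key_id="minio-access-key",
        aws_secret_access_key="minio-secret-key",
    )

    s3.create_bucket(Bucket="test-bucket")
    s3.put_object(Bucket="test-bucket", Key="hello.txt", Body=b"hello")
    print([o["Key"] for o in s3.list_objects_v2(Bucket="test-bucket").get("Contents", [])])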
There is pithos [1] from Exoscale [2], which powers their production system. A new version of the open source project with API signature v4 and other updates is due out soon.
That's libelously untrue. You might think Ceph or Swift are better, that's fine, but it's no toy. I've seen a few toy S3 implementations. I even called out Zettar on my blog back in the day. If it's not scalable or highly available it's a toy. If it's unstable it's a toy. If it's super-duper slow it's a toy. Minio is none of these things, plus it has features like erasure coding and encryption that are mature enough to be backed by real support. Best ever? I'm not saying that, but it's no toy.
Disclaimer: I have no affiliation with Minio. In fact, they're very nearly a competitor with the project I work on. However, Minio is full of ex-Gluster people and I consider many of them friends.
S3 is an abstraction that gives you a limited amount of operations with limited guarantees (e.g. eventual consistency, can only replace whole files). What you get in return is easier scalability and performance.
If you use an S3 API to store files (like Minio does), you give up power and gain nothing, so you are better off using NFS, Samba, WebDAV, FTP, etc.
Additionally, Minio doesn't seem to sync files to the filesystem, so you can't be sure a file is actually stored after a PUT operation (AWS S3 and Swift have eventual consistency, and Ceph has stronger guarantees).
That's curious, because the OpenStack Swift docs say "Objects are stored as binary files on the filesystem with metadata stored in the file’s extended attributes (xattrs)".
As far as I know, the advantage comes from exposing a coarse-grained API over the network, which is generally more efficient and reliable than doing many small operations. It would be hard to implement Minio's distributed mode using the filesystem API.
Also missing the SoftLayer/IBM S3-compatible offering. We use it very lightly (our IoT firmware builds from CI are going up there) and pay nothing, their free limits are quite generous.
Spaces provides a RESTful XML API for programmatically managing the data you store through the use of standard HTTP requests. The API is interoperable with Amazon's AWS S3 API, allowing you to interact with the service using the tools you already know.
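For example, a boto3 client can be pointed at a Spaces endpoint; the region/endpoint here assume the nyc3 region, and the keys and bucket name are placeholders:

    # Sketch: reuse the AWS SDK against DigitalOcean Spaces by overriding the endpoint.
    import boto3

    spaces = boto3.client(
        "s3",
        region_name="nyc3",
        endpoint_url="https://nyc3.digitaloceanspaces.com",
        aws_access_key_id="SPACES_KEY",
        aws_secret_access_key="SPACES_SECRET",
    )

    spaces.put_object(Bucket="my-space", Key="myphoto.png", Body=b"...")
    print(spaces.list_objects_v2(Bucket="my-space").get("Contents", []))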
"unlimited" - It looks like if you actually try to use unlimited you get limited, it's closer to a Dropbox competitor than an s3 competitor
> If your use case creates an unreasonable burden on our infrastructure, we reserve the right to limit your egress traffic and/or ask you to switch to our Legacy pricing plan.
> Wasabi’s hot cloud storage service is not designed to be used to serve up (for example) web pages at a rate where the downloaded data far exceeds the stored data or any other use case where a small amount of data is served up a large amount of times
Oh, well that puts a kink in that. They had an example of 1PB costing $90,000 to download from S3; on their old plan it would be $40,000 (quick math). I think if the company was that large, S3 comes across as a much better offering.
At $1/GB, it's the most expensive option mentioned on this page by far. Even with GCP's pricey storage, you would have to download your data 10x for this to make sense (or 88x with DO), and that's before considering cross-cloud/transit costs.
Also since you're the cofounder of nodechef, you should add a disclaimer when mentioning your own product.
You can negotiate lower S3 prices if you're a heavy user. I assume the other companies will as well. By heavy I mean exceeding the lowest price tiers significantly.
_________________________________
1. https://www.backblaze.com/b2/cloud-storage-pricing.html
2. https://www.backblaze.com/b2/b2-transactions-price.html
3. https://www.backblaze.com/b2/docs/