This is great, but I’m disappointed Backblaze B2 isn’t included. That seems like an oversight unless someone can point out how B2 doesn’t hold its own with these options in a glaring way. There are tradeoffs, but B2 seems to be very competitive overall.
B2 is cheaper than every option here for both storage ($0.005/GB) and egress ($0.01/GB).[1] Their transaction pricing is also cheaper.[2] Despite being cheaper, it’s still hot storage, so you can immediately download buckets, in whole or in part. I’ve personally used it to back up (and restore) terabytes of data for over a year. I doubt it has an SLA like GCP or AWS, but DigitalOcean doesn’t either, yet it’s listed here. I find the B2 API documentation to be very readable as well.[3]
I’ve used AWS S3, Glacier and GCP Nearline, Coldline. I can’t think of a specific thing that has disappointed me about B2, and the reliability has been excellent. The nature of my work is that I have very large datasets, and B2 becomes extremely competitive when you’re backing up tens of terabytes or more.
I really wanted to like B2. I love the docs and super-clean interface. But B2 was just too "weird" when it came to uploading an object. You can't replace an object key, so you always get a new object key when uploading. This is unlike any normal object store, where you can just upload to an existing key.
Also, if you upload an object with the same name (e.g. myphoto.png) it creates a new version, and there's no way to stop this. I don't want or need a new version!
I have a feeling this restriction is in place because of the underlying "vault" implementation.
B2 is great as a backup store. I use it for backup.
I wouldn't use it for non-backup object storage, as it's in a single data center. S3 and Google Cloud are completely different; I'm not sure of the Google specifics, but S3 has data replicated across three AZs.
I don't know about that. In my case I'm backing up a backup, so I'd have to lose both backups to lose anything. B2 is just for the offsite copy of the backup. Like taking a tape home sometimes...
In fairness, backing up is the last place I want to hand over redundancy responsibility. Better to consider B2 as one part of a 3-2-1 backup strategy than as the be-all and end-all.
If you upload an object with the same 'name' to S3, you also get a new version.
When you say 'key', do you mean the opaque object ID granted to objects in B2, which object stores like S3 don't have at all? I don't understand the rant here. S3 only operates by 'name' (key).
I have had no problem integrating B2 side-by-side with S3 and GCP in the products I have written. Their high-level models are largely compatible.
Ah, fair enough. I have oddly enough never in my life written to the same key twice, so I hadn't noticed it was off by default for S3.
Just to clarify your statement about atomicity: All writes to S3 and B2 are 'atomic' (and B2 can also verify the hash and reject on failure for an extra layer of security). The difference you mention is just that you can superficially "disable" versioning for S3, so that each key only stores one object version at a given time.
The only difference to the upload when S3 versioning is disabled is what happens during the metadata update: With versioning, a version is appended. Without, the version is replaced.
For B2, simulating disabled versioning is two operations: Upload a new object, and delete the old one. As long as the object is only referenced by name, this will also be atomic.
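On the S3 side, suspending versioning (so a key keeps only its current version going forward) is roughly a one-liner with boto3; the bucket name here is a placeholder and credentials come from the usual boto3 config:

    # Sketch: toggle S3 versioning to "Suspended" so a PUT to an existing key
    # replaces the current version instead of appending a new one.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_versioning(
        Bucket="example-bucket",
        VersioningConfiguration={"Status": "Suspended"},
    )

    # With versioning suspended, this PUT overwrites the current version of the key.
    s3.put_object(Bucket="example-bucket", Key="myphoto.png", Body=b"...")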
B2 doesn't have metadata version numbers, but instead uses globally unique file identifiers to refer to specific versions. When you list file versions, you're presented with file IDs rather than version numbers.
It's only slightly different from the name+version approach, but I personally like the concept of unique identifiers better. It feels nicer.
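And a rough sketch of the upload-then-delete-old-versions approach against B2's native API, using requests. The b2api/v2 endpoint and field names follow my reading of the B2 docs, so treat them as assumptions and double-check before relying on them:

    # Sketch: after uploading a new version of a file, delete the older versions
    # so the bucket behaves as if versioning were disabled. B2 identifies each
    # version by its fileId. (Pagination via nextFileName/nextFileId is omitted.)
    import requests

    API_URL = "https://api.backblazeb2.com"

    def authorize(key_id: str, app_key: str):
        r = requests.get(f"{API_URL}/b2api/v2/b2_authorize_account", auth=(key_id, app_key))
        r.raise_for_status()
        data = r.json()
        return data["apiUrl"], data["authorizationToken"]

    def delete_old_versions(api_url: str, token: str, bucket_id: str, file_name: str):
        # List every version of this file name.
        r = requests.post(
            f"{api_url}/b2api/v2/b2_list_file_versions",
            headers={"Authorization": token},
            json={"bucketId": bucket_id, "startFileName": file_name, "prefix": file_name},
        )
        r.raise_for_status()
        versions = [f for f in r.json()["files"] if f["fileName"] == file_name]
        # Keep the newest upload, delete the rest by (fileName, fileId).
        for old in sorted(versions, key=lambda f: f["uploadTimestamp"], reverse=True)[1:]:
            requests.post(
                f"{api_url}/b2api/v2/b2_delete_file_version",
                headers={"Authorization": token},
                json={"fileName": old["fileName"], "fileId": old["fileId"]},
            ).raise_for_status()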
I'm on DigitalOcean, and I love it, but I use B2 for backup because it just naturally seems safer to have the backups with a different company. It is great for that. I found the object upload a bit weird / different, but I appreciated how it stops you from shooting yourself in the foot by default, rather than the other way around.
None of the other competitors here provide multi-region durability "out of the box" (although S3 does offer cross-region replication as an option), so you might consider distributing your data between two object stores if you need multi-region redundancy (AWS us-east-1 and Backblaze, for example). This ensures not only geographic redundancy, but also vendor redundancy (not to mention that B2 is cheaper than S3).
Not sure what you mean by "out of the box". Both Google Cloud Storage and Amazon S3 have cross-region replication built in. Something not "in the box" would imply you have to build it yourself, or buy a vendor product, to get it.
Google Cloud Storage's multi-regional support (asynchronously replicated to two or more geographic locations) is even easier than AWS: You simply specify, on bucket creation, whether the bucket should be regional or multi-regional.
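For what it's worth, with the google-cloud-storage Python client that choice looks roughly like this; the bucket name is a placeholder, and the property and class names are from my reading of the client docs, so verify them before use:

    # Sketch: create a multi-regional GCS bucket by picking a multi-region
    # location and storage class at creation time.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("example-multiregional-bucket")
    bucket.location = "US"                    # multi-region location ("US", "EU", ...)
    bucket.storage_class = "MULTI_REGIONAL"   # vs. "REGIONAL" for a single region
    client.create_bucket(bucket)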
The "geographic resiliency" chart is pretty inaccurate for the major providers, especially folks like Google cloud that have multi regional offerings with 99.95% availability & a lot more than "1" for redundancy.
I'd also add: unless you are going to include a median/mean latency number, many options will look absurdly good even though we know they're a horrible idea, e.g. https://en.wikipedia.org/wiki/GmailFS. I also think "unknown" availability looks a lot like a durability issue.
Try comparing performance: latency, bandwidth up/down, and scalability in the face of many concurrent requests.
And then do that with respect to compute resources in various locations.
I was really excited about DO Spaces. I compared every major object storage offering (OVH, B2, Wasabi, S3, Azure). DO Spaces came out well ahead. I did dozens of hours of research. I was a customer (and I still am). But I am less excited now.
Basically, there are loads of issues with rejected requests because of rate limiting (it returns a lot of 503 "slow down" responses). I don't recall ever receiving that from S3. You can check the forums for more in-depth discussion.
The good part: This is a solvable problem, and I hope they relax these limits very soon.
Another great thing: their API is 99% compatible with S3. In fact, the official recommendation is to use the AWS SDKs on the server, which is what I am doing!
S3 will sometimes return a "503 temporary error" response if you start writing lots of files per second. From my understanding, if they see that your bucket has a constant high write rate they'll make some configuration changes in the background to accommodate the higher write rate.
That being said, last month I wrote over 60 million files to S3 and the number of failed writes was tiny (solved by simply retrying the write).
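For the retries, a minimal sketch with boto3's built-in retry settings looks like this; note that the "adaptive" retry mode is only in newer botocore versions (older ones only honour max_attempts), and the bucket/key are placeholders:

    # Sketch: let botocore retry throttling errors like "SlowDown" automatically
    # instead of failing the write.
    import boto3
    from botocore.config import Config

    s3 = boto3.client(
        "s3",
        config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
    )

    s3.put_object(Bucket="example-bucket", Key="some/key", Body=b"payload")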
I've not used DO's Spaces yet; I'd love to know if they guarantee read-after-write consistency. I know S3 has some issues with that depending on the use case. Maybe it's time for another look at Spaces.
Yes, DO Spaces is very strict about both GET and PUT. I did a requests-per-second benchmark (literally just fetching a URL a bunch of times). I got about ~180 requests per second, after which all requests failed.
Amazon S3 is much better in the sense that there IS dynamic scaling if it notices spikes.
In defense of DO, they are newer and their business model is "cheap, cheap, cheap", so they can't compete at the same level.
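For reference, the kind of crude requests-per-second check described above can be done in a few lines; the URL, object, and concurrency here are placeholders:

    # Crude requests-per-second check: hammer one URL with a thread pool and
    # count successes vs. 503s.
    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests

    URL = "https://example-bucket.nyc3.digitaloceanspaces.com/test-object"
    TOTAL, CONCURRENCY = 1000, 20

    def fetch(_):
        return requests.get(URL, timeout=10).status_code

    start = time.time()
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        codes = list(pool.map(fetch, range(TOTAL)))
    elapsed = time.time() - start

    print(f"{TOTAL / elapsed:.0f} req/s, "
          f"{codes.count(200)} ok, {sum(c == 503 for c in codes)} x 503")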
S3 will automatically shard your bucket evenly across your sub-namespace if you use pre-shardable names whenever possible.
This sounds complex but is actually fairly easy to do. For example, if you are not dealing with randomly generated IDs, try using Base64-encoded key names rather than the key names themselves. (A more limited character set is helpful, though.)
A better name than user_f38c9123 is f38c9123_user, which allows a statistically even shard size across the full range of 16 possible values for the first (hex) character. As more performance is needed, S3 will automatically shard on the second character too, giving 16x16 (256) possible shards, etc.
Also, using a more limited character set such as hexadecimal (that is, [0-9a-f]) or just numeric digits for the first characters of a filename will shard more evenly than the full alphanumeric range ([A-Za-z0-9], etc.).
Jeff Barr had a blog post on this a while ago... here it is:
"By the way: two or three prefix characters in your hash are really all you need: here’s why. If we target conservative targets of 100 operations per second and 20 million stored objects per partition, a four character hex hash partition set in a bucket or sub-bucket namespace could theoretically grow to support millions of operations per second and over a trillion unique keys before we’d need a fifth character in the hash."
We actually do exactly this in Userify (blatant plug, SSH key management, sudo, etc https://userify.com) by just switching the ID type to the end of the string: company_[shortuuid] becomes [shortuuid]_company. It makes full bucket scans for a single keyname a bit easier and faster if you use different buckets for each type of data, but you actually will get better sharding overall by mixing all of your data types together in a single bucket. The trade-off is worth it for the general case.
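As an illustration of the key-naming trick discussed above, here is a small sketch that moves the high-entropy part of the key to the front; the function names are made up for the example:

    # Sketch: put the random/hashed part of the key first so S3 can partition
    # on the leading characters.
    import hashlib

    def make_key(record_type: str, record_id: str) -> str:
        """Turn e.g. ('user', 'f38c9123') into 'f38c9123_user'."""
        return f"{record_id}_{record_type}"

    def hashed_key(record_type: str, record_id: str, prefix_len: int = 4) -> str:
        """For non-random IDs, prepend a short hex hash so the first characters
        are evenly distributed across [0-9a-f]."""
        digest = hashlib.md5(record_id.encode()).hexdigest()[:prefix_len]
        return f"{digest}_{record_type}_{record_id}"

    print(make_key("user", "f38c9123"))   # f38c9123_user
    print(hashed_key("user", "alice"))    # prints a key like 'XXXX_user_alice' (4 hex chars)

    # Back-of-the-envelope from the Jeff Barr quote: a 4-character hex prefix gives
    # 16**4 = 65,536 partitions; at ~100 ops/s and ~20M objects per partition that
    # is ~6.5M ops/s and ~1.3 trillion keys before a 5th character is needed.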
The example is 200GB storage with 2000GB of data transfer (out) every month. That's a LOT of data going out every month, so I am guessing the scenario is something like hosting a photo library that lots of people download from every month.
If, however, you are just using the service as online storage to hold <100GB of data as backup (i.e. mainly transfer in), then S3 turns out way cheaper than DO.
Not knocking either service; I actually use both, for different use cases.
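To put rough numbers on the two scenarios above: the prices in this sketch are assumptions based on approximate list prices around the time of this thread, so check the current pricing pages before relying on them.

    # Rough cost sketch for the 200 GB stored / 2,000 GB egress example above.
    S3_STORAGE_PER_GB = 0.023     # USD/GB-month, S3 Standard first 50 TB (assumed)
    S3_EGRESS_PER_GB = 0.09       # USD/GB, first 10 TB/month tier (assumed)

    DO_BASE = 5.00                # USD/month, assumed to include 250 GB storage + 1 TB egress
    DO_EXTRA_EGRESS_PER_GB = 0.01 # USD/GB beyond the included 1 TB (assumed)

    stored_gb, egress_gb = 200, 2000

    s3_cost = stored_gb * S3_STORAGE_PER_GB + egress_gb * S3_EGRESS_PER_GB
    do_cost = DO_BASE + max(0, egress_gb - 1000) * DO_EXTRA_EGRESS_PER_GB

    print(f"S3: ${s3_cost:.2f}/month")   # ~ $184.60
    print(f"DO: ${do_cost:.2f}/month")   # ~ $15.00

    # Flip it to ~100 GB stored with negligible egress (backup-only) and S3 comes
    # out around $2.30/month vs. DO's $5 flat fee, which is the point made above.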
It's difficult to use with speculative numbers too. I recently compared Imgix and Cloudinary for use in front of an origin server and the numbers that we projected with made Imgix look 3x more expensive. When we put it against real usage, Imgix was actually 1/4 the price of Cloudinary.
It's difficult, sometimes, to guess at numbers that may or may not play to a specific service strength. A lot of times there's something hidden in the details that you're missing as well. In our case, we didn't notice that Cloudinary didn't have an overage rate so if you go over any of the capped limits on a particular plan you have to move up to the next plan level. It was a very unpleasant surprise. We didn't notice it to compare because it simply...didn't appear anywhere.
Without looking at the numbers in the article, I think the use case of storing data you never access is more of an edge case? Sure, a meager 10x down-to-up ratio (100GB up/month, distributed to 10 locations/people) will be very unfavourable to Amazon.
But what's the use case for uploading 100GB a month and then... not deleting it (so you keep paying for storage) and not accessing it?
So $5 is way cheaper? If you need any of that data down the line, transferring it out will wipe out any savings you made with AWS over the past year or years.
I think it would be good if people didn't always pick these services based on price, since that's how you end up with one monopoly service and no alternatives.
Support alternatives even if they are a bit more expensive.
Agreed. The price difference in a lower usage case is not meaningful. I would choose based on ease of use, reliability and functionality.
The main thing stopping me from using DigitalOcean is AWS RDS, which is amazing. If they could bring out a similar solution with backups for MySQL and PostgreSQL, that would be great. :) I could then run my apps in Docker.
Well, I actually agree with you. In my first post above my reply, I did say that I actually use both services based on my use cases. Both excellent, and work well (Oh, and I also use the Minio open source solution for one particular project too). Horses for courses.
Cool site, but I noticed an error: this compares AWS S3 single-region pricing ($0.024/GB) to GCP GCS multi-region pricing ($0.026/GB) rather than GCP GCS single-region pricing ($0.020/GB). Hopefully the creator/author will correct the discrepancy... Disclosure: I'm a pricing dweeb at Google Cloud.
The bandwidth egress charges on AWS (and GCP/Azure) are way too high. It almost seems cartel-like.
Bandwidth costs have dropped by a huge factor over the past few years, but none of this has been passed on.
I really hope Backblaze and/or DO manage to cause the big three some hurt on this and get them to reduce prices significantly; 7c/GB is really high these days.
What are bandwidth costs? This comes up again and again: people see HE.net / Cogent / Level3 / Hibernia / NTT or whoever offering 10-gig handoff IP transit at a carrier-neutral colo for X per month and somehow determine the true cost from that. I suspect the reality is that much of the cost is invested in routing equipment both internally and externally, evolving SDN and all the goodness it affords, being able to offer an SLA per instance in terms of the Mbit/s that can be pushed, being able to mitigate DDoS, hiring top network engineers and researchers, and being fully redundant across multiple providers and protected circuits rather than just a lonely handoff (or, in GCP's case, building out the network themselves with multiple SLA tiers), so they can mitigate network issues quicker than a PagerDuty alert at a bandwidth blender. Granted, I'd say there's still probably cushioning in there, but judging network costs by comparing to a 10-gig handoff at a random carrier-neutral colo, or a blender, or OVH/Hetzner commodity servers, or Linode/Vultr/DO, seems unfair in my opinion.
I totally understand that, but the price hasn't dropped in, what, a decade? My home internet connection (which includes a shedload of last-mile costs for laying fibre to my apartment) has gone from 2 Mbit/s to 1 Gbit/s (for less money in real terms) in a similar timespan.
I'm not expecting it to be at the cost of a 10gig cogent connection at LINX, but some drop makes sense. DO/Backblaze pricing seems to be more on the money.
Keep in mind these providers also charge on top for a lot of other networking services (VPCs, NATs, etc.), which can often really add up, so I'd expect some of the SDN capex to be absorbed by that.
I met a network engineer from Amazon about a year ago and brought up the prices, since I had heard about them being expensive here on Hacker News. He laughed about it and said it was one of their most profitable departments, with something like a 90% margin.
The larger the provider is the more they are able to negotiate a lower rate from transit providers. The larger a provider is the more peering they get which reduces the total amount of bandwidth they pay for in general.
Then when you consider the cost of the equipment vs the cost of the throughput you will see that as a network becomes larger, network equipment isn't the main cost driving factor, nor are network engineers.
Simply put Amazon is over charging customers by 10x on bandwidth fees.
If we are able to sell bandwidth to customers at $0.01 per GB profitably, which accounts for paying for transit, network equipment, and network engineers while still turning a profit for reinvestment, then AWS should be offering you a price that is 10x less than $0.01 per GB, because their bandwidth costs should be significantly better than ours.
I work for a company that has to provision hosted products for customers across all the clouds, and the one that has been impressing me the most lately is Azure. The load balancers are also the gateways (sound network topology), so there is no need for elastic IPs, NAT gateways, or proxy protocol. The other thing I like about Azure is that they have storage classes that automatically replicate across regions. The automatic storage encryption is a bit of an issue, but I know they were working on it, last I checked.
You can’t beat the offerings of AWS, but there are definitely some compliance scenarios that are easier to fulfill on Azure.
We rarely get customer requests for Google Cloud. Seems like it’s mostly Azure and AWS (at least at the enterprise level).
You don't need any of that for load balancing on any cloud. Google Cloud has the best load balancer, with a global anycast address that will route to the nearest DC with instances and free capacity.
For automatic cross-region replication for storage, are you talking about Azure's GRS class? That's the same as GCP's multi-regional class, and AWS allows you to set up entire-bucket replication to anywhere else in a few clicks.
Pricing has recently changed to remove egress costs completely. The speeds were OK, nothing special. But this is most likely better used for long-term data storage? With a CDN in front it would work well as a large media store for video/image delivery.
This is a rather simplistic comparison. These object storage services have several tiers and features that you need to take into account, like zone vs. regional replication, strongly consistent listings, bandwidth and access depending on where your compute is, integrations like notifications and functions, etc.
That being said, the clouds are great if your compute is co-located in the same place because the transfer fees are waived. Otherwise DO or B2 are probably better options for less usage or more neutral network locations and egress.
What is the durability of objects stored with DO? S3 offers eleven 9s of durability.
Likewise what is the replication story? Can you get event notifications when objects are uploaded/deleted? Is there versioning? Static website hosting? Lifecycle management?
I asked DO about this, and their answer was that 1) data are replicated on-site (in the same data center) but are not replicated to another data center; and 2) as of now, Spaces does not provide lifecycle management.
As a side note, I think lifecycle management is very useful for backups: a server pushes its backups to the object store but cannot overwrite or delete previous backups. This is useful if the server is hacked...
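For comparison, a minimal sketch of an S3-style lifecycle rule with boto3 (the bucket name, prefix, and retention periods are placeholders), since DO says Spaces doesn't offer this yet:

    # Sketch: expire backups after 90 days and clean up old noncurrent versions.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="example-backups",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-old-backups",
                    "Filter": {"Prefix": "backups/"},
                    "Status": "Enabled",
                    "Expiration": {"Days": 90},
                    "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
                }
            ]
        },
    )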
Sia [1] could be even cheaper, coming out at $1.10 for a TB [2]. Although this price might not be forever, and I'm still not sure about the reliability.
In theory it should be more reliable, as it's decentralised and your data gets split among multiple servers around the world. The question remains what if the Sia network itself stops being profitable and people all exit at the same time. Although the same could be said for Amazon?
Sia will actually soon be adding a backend to Minio too [3].
The only thing that has stopped me using Sia is you have to have the blockchain running on the machine.
I started getting excited about Sia over the summer. They've updated now, but they used to have an exceedingly ambitious project roadmap here: https://trello.com/b/Io1dDyuI/sia-feature-roadmap. Checking back a few months later made me lose confidence in the project, even though I think it's a really cool idea.
I see it being used as a base layer for a Glacier-type product, which is still useful because it might give everyone a cheap-ish way to store media, but I wish it could be something more. As far as I know, there's no way to coordinate the sharing geographically, and furthermore the bandwidth is pretty bad, which I think might be due to bandwidth not being counted in the pricing models? There would be huge money in creating a blockchain like Sia that created a more granular marketplace for distributing shards and that factored geography/bandwidth into the pricing. The killer app for this technology is a public market for servers (specifically media hosting). Imagine if Netflix could continually shift their distribution platforms around the world as offices emptied and people turned off their PCs.
Cool thanks! I presumed that it didn't meet some sort of criteria that was being discussed here. I just wanted to make sure there were no issues with the service itself per se.
The main cost seems to be bandwidth (speeds). If you've ever tried Spaces, you know it's a real slog uploading and downloading (even with good client-side speeds). Dropbox has similar specs to Spaces and does it for only $10. OneDrive has the same thing at $7 for 1TB. And there's a Chinese company, Tencent, that gives you 10TB for free (except it's slow and it's all in Chinese).
The problem I have with this article (a very brief table) is that the author is comparing two "enterprise" solutions to a consumer solution. With "enterprise" solutions, you get guaranteed uptime and speeds. With Spaces, and the rest I've mentioned, you don't get any of that; only that your data will still be there as long as you keep paying.
Spaces may be slow (I've never tried it), but it's still a different offering than Dropbox/OneDrive, which are not designed for massive public access. For example, Dropbox has a traffic limit of 200GB/day, which can't be increased.
Is there any open source S3 compatible software? I know Riak Cloud Storage (http://docs.basho.com/riak/cs/2.1.1/), but I think it’s not maintained anymore.
Minio and Ceph both fit this description, with the latter also offering block and file access. Swift is close, but has a slightly different native API (there's an adapter, but I'm not sure of its status). Scality has open-sourced their implementation as Zenko. There's also LeoFS (which, despite the name, is not a filesystem).
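For a feel of the compatibility, here's a quick sketch of pointing boto3 at a self-hosted Minio instance; the endpoint, region, and credentials are assumptions for a local `minio server` setup:

    # Sketch: use the standard AWS SDK against a local Minio server.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="http://localhost:9000",
        region_name="us-east-1",
        aws_access_key_id="minio-access-key",
        aws_secret_access_key="minio-secret-key",
    )

    s3.create_bucket(Bucket="test-bucket")
    s3.put_object(Bucket="test-bucket", Key="hello.txt", Body=b"hello")
    print([o["Key"] for o in s3.list_objects_v2(Bucket="test-bucket").get("Contents", [])])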
There is pithos [1] from Exoscale [2], which powers their production system. A new version of the open source project with API signature v4 and other updates is due out soon.
That's libelously untrue. You might think Ceph or Swift are better, that's fine, but it's no toy. I've seen a few toy S3 implementations. I even called out Zettar on my blog back in the day. If it's not scalable or highly available it's a toy. If it's unstable it's a toy. If it's super-duper slow it's a toy. Minio is none of these things, plus it has features like erasure coding and encryption that are mature enough to be backed by real support. Best ever? I'm not saying that, but it's no toy.
Disclaimer: I have no affiliation with Minio. In fact, they're very nearly a competitor with the project I work on. However, Minio is full of ex-Gluster people and I consider many of them friends.
S3 is an abstraction that gives you a limited amount of operations with limited guarantees (e.g. eventual consistency, can only replace whole files). What you get in return is easier scalability and performance.
If you use an S3 API to store files (like Minio does), you give up power and gain nothing, so you are better off using NFS, Samba, WebDAV, FTP, etc.
Additionally, Minio doesn't seem to sync files to the filesystem, so you can't be sure a file is actually stored after a PUT operation (AWS S3 and Swift have eventual consistency, and Ceph has stronger guarantees).
That's curious, because the OpenStack Swift docs say "Objects are stored as binary files on the filesystem with metadata stored in the file’s extended attributes (xattrs)".
As far as I know, the advantage comes from exposing a coarse-grained API over the network, which is generally more efficient and reliable than doing many small operations. It would be hard to implement Minio's distributed mode using the filesystem API.
Also missing the SoftLayer/IBM S3-compatible offering. We use it very lightly (our IoT firmware builds from CI are going up there) and pay nothing, their free limits are quite generous.
Spaces provides a RESTful XML API for programmatically managing the data you store through the use of standard HTTP requests. The API is interoperable with Amazon's AWS S3 API, allowing you to interact with the service using the tools you already know.
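For example, a boto3 client can be pointed at a Spaces endpoint; the region/endpoint here assume the nyc3 region, and the keys and bucket name are placeholders:

    # Sketch: reuse the AWS SDK against DigitalOcean Spaces by overriding the endpoint.
    import boto3

    spaces = boto3.client(
        "s3",
        region_name="nyc3",
        endpoint_url="https://nyc3.digitaloceanspaces.com",
        aws_access_key_id="SPACES_KEY",
        aws_secret_access_key="SPACES_SECRET",
    )

    spaces.put_object(Bucket="my-space", Key="myphoto.png", Body=b"...")
    print(spaces.list_objects_v2(Bucket="my-space").get("Contents", []))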
"unlimited" - It looks like if you actually try to use unlimited you get limited, it's closer to a Dropbox competitor than an s3 competitor
> If your use case creates an unreasonable burden on our infrastructure, we reserve the right to limit your egress traffic and/or ask you to switch to our Legacy pricing plan.
> Wasabi’s hot cloud storage service is not designed to be used to serve up (for example) web pages at a rate where the downloaded data far exceeds the stored data or any other use case where a small amount of data is served up a large amount of times
Oh, well that puts a kink in that. They had an example of 1PB costing $90,000 to download from S3; on their old plan it would be $40,000 (quick math). I think if the company was that large, S3 comes across as a much better offering.
At $1/GB, it's the most expensive option mentioned on this page by far. Even with GCP's pricey storage, you would have to download your data 10x for this to make sense (or 88x with DO), and that's before considering cross-cloud/transit costs.
Also since you're the cofounder of nodechef, you should add a disclaimer when mentioning your own product.
You can negotiate lower S3 prices if you're a heavy user. I assume the other companies will as well. By heavy I mean exceeding the lowest price tiers significantly.
_________________________________
1. https://www.backblaze.com/b2/cloud-storage-pricing.html
2. https://www.backblaze.com/b2/b2-transactions-price.html
3. https://www.backblaze.com/b2/docs/