Amazon Elastic File System – Production-Ready in Three Regions (amazon.com)
232 points by jeffbarr on June 29, 2016 | 119 comments



YES

Finally we can start using standard POSIX semantics for things.

Yes, people might say it's primitive, but currently all I can see is people trying to re-create shared file systems over protocols not designed for it, cough cough HTTP.

Finally I can have a shared home, with a shared environment.

A single read-only binary source (great for Docker, by the way). Also great for ensuring one version of scripts, without having to Ansible everything.


Congratulations, you've won a single point of failure!


It's a single point of failure that's highly available across multiple availability zones. I'm not sure I would even call it a single point of failure when it's running in multiple datacenters. If you need geo-redundancy, rsync it to another region nightly.
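
A minimal sketch of what that nightly sync could look like, assuming the EFS mount lives at /mnt/efs and a standby host in the other region is reachable as backup-host (both placeholders):

    # Hypothetical nightly job: mirror the EFS mount to an instance
    # in another region. Paths and host name are placeholders.
    import subprocess

    subprocess.run(
        ["rsync", "-a", "--delete",
         "/mnt/efs/",                       # local EFS mount (assumed path)
         "backup-host:/mnt/efs-replica/"],  # instance in the other region
        check=True,
    )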


HAHAHA. YES! So much this. There are legit use cases for this, but there are many use cases that will cause you much pain.


I suspect you're being downvoted because you've described every piece of technology ever created.


Or the

"HAHAHA. YES! So much this."

part. Though I can't down vote so I wouldn't know.


I just think that some people are getting excited about reviving the SAN. I don't think everyone realises or is familiar with the pain that running NFS on distributed or non-distributed infrastructure can cause when used for the wrong use cases... Yes, my comment was a bit of a joke too.


At 10x the price of S3.


You are correct. However, it's not for the same things.

S3 is high latency, HTTP semantics, no locking mechanism.

EFS is a low-latency, high-throughput shared file system with a standard POSIX interface. Everything can write to standard files. Not many things can write to S3 effectively.


That's not a valid comparison for many applications: if you need POSIX filesystem semantics, it doesn't matter how much S3 costs because it cannot do the job, any more than you'd say S3 costs more than a USB key.


This is really cool but also kind of a shame. I feel like it will encourage bad behavior.

I can see people using this to deploy code by deploying onto the shared volume so that all their instances get an "instant update". Which is great when it works, but lord help you when EFS goes down and every app server you have is hung on a broken NFS connection.


I think that deploying onto a shared volume is exactly the workload that EFS encourages. That things will stop functioning can be said about any (AWS) technology. "Lord help you when (S3|RDS|SQS|etc.) goes down and every (thing) you have is hung up on a broken (thing) connection."

The rational argument here is that if uptime is important to you, the solution is to utilize multiple regions, of which EFS is now in 4. You are using AWS because of the SLA and the assumption that when something goes wrong there are legions of technical folks trying to fix it as soon as possible.


The difference is that all of those services are accessed via an API, which means your client can do clever things when it fails, like timeout, give a fallback, find an equivalent resource in another zone or region, etc.

With EFS, it's exposed as NFS, which hooks in much deeper. If it goes down, there isn't anything you can do to work around the problem, unless you start hacking your own file system kernel modules.


Can't you measure and timeout on the file-descriptor writes too?


Do you want to start wrapping your system calls to read files in code that forks to periodically check for a hung system call?


It's worse than that, actually. With hard mounts (the default), the only way to interrupt is with SIGKILL, which if my memory serves me correctly, is a process-wide signal. So you'd have to do all NFS I/O in a separate child process.

With soft mounts (does EFS support this?) you can exchange SIGKILL for religiously checking return values for all I/O syscalls, retrying partial writes and whatnot.
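
For illustration, the kind of return-value checking and partial-write retrying that soft mounts push onto you looks roughly like this (a sketch; the retry policy is made up):

    import errno, os, time

    def careful_write(fd, data, retries=5):
        """Write all of data to fd, retrying partial writes and EINTR,
        which a soft NFS mount can surface as short writes or EIO."""
        view = memoryview(data)
        while view:
            try:
                written = os.write(fd, view)
            except InterruptedError:          # EINTR: just retry
                continue
            except OSError as e:
                if e.errno == errno.EIO and retries > 0:
                    retries -= 1
                    time.sleep(1)             # back off and retry (arbitrary policy)
                    continue
                raise
            view = view[written:]             # handle partial writes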


Can't you just use soft mounts? Especially since in the scenario you described you can use read only mounts.


It depends on the nature of the failure. If you're doing something which triggers hard failures – i.e. where a read() call on a socket would unblock with an error – soft mounts will eventually recover. The problem is that, in my experience anyway, the vast majority of failures aren't that clean – things like a server which processes packets but never responds, a network connection which drops packets but doesn't change the link status, etc. – and in those cases soft mounts behave no better than hard mounts. There also used to be kernel bugs in *BSD, Darwin, and Linux where the client could deadlock under heavy activity, which were hard to reproduce and get fixed.

In all of those cases, anything which tries to access something on the NFS mount will block in the kernel (i.e. “kill -9“ won't work) and the mount cannot be unmounted normally.

I wrote https://github.com/acdha/mountstatus a while back – if memory serves, 2004 or so – because we found that on Linux a lazy unmount would still work in this case, so you could have a process monitor the mount status (fork() a child to check the mount, alert if it doesn't get a response within a set interval) and a watchdog could respond to an alert by issuing a "umount -l" and remounting. That doesn't fix the blocked process, but it's less disruptive than rebooting, and new processes won't block just because they tried to access that mount.
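
The probe-with-a-timeout idea looks roughly like this (a sketch of the approach, not the actual mountstatus code; the mount point is a placeholder):

    # Fork a child to probe the mount; if it doesn't answer in time,
    # assume the mount is wedged. /mnt/efs is a placeholder path.
    import multiprocessing, os, sys

    def probe(path):
        os.statvfs(path)        # blocks indefinitely if the NFS server is wedged
        sys.exit(0)

    def mount_is_healthy(path="/mnt/efs", timeout=15):
        p = multiprocessing.Process(target=probe, args=(path,))
        p.start()
        p.join(timeout)
        if p.is_alive():        # probe hung: likely a dead mount
            p.terminate()       # won't help if it's stuck in the kernel, but the parent keeps going
            return False
        return p.exitcode == 0

    if not mount_is_healthy():
        print("mount looks wedged; a watchdog could now try 'umount -l' and remount")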


Assuming all assets get stored on S3, the other common writing scenario is logging. How would multiple machines writing to the same log file work?


Generally, you wouldn't write to the same log file. Prepending or appending the instance name to the filename would be enough to make it unique.
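
For example, something as simple as this (the path is a placeholder, and hostname is just one way to get a per-instance suffix):

    # Hypothetical: give each instance its own log file on the shared mount.
    import socket, logging

    log_path = "/mnt/efs/logs/app-%s.log" % socket.gethostname()
    logging.basicConfig(filename=log_path, level=logging.INFO)
    logging.info("this instance only ever appends to its own file")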


It wouldn't. NFS is not fully POSIX compliant, largely due to not being cache coherent. In particular, O_APPEND is not supported.


Isn't the whole point of AWS that you can blame AWS engineers for downtime instead of your own (in this case) code deployment thingy?


I'm not sure if you're being sarcastic or not, but no, it isn't. It's to provide building blocks for infrastructure. Down time is still on you. Only a lazy engineer blames their tools for their failures.


    > Down time is still on you
If you can say "It's not our fault, it's Amazon's", there are plenty of boards and customers who are fine with that, in my experience.


Yep. "Nobody ever got fired for buying IBM" applies here.


Not sarcastic: aws is not a tool, it's a service. You're exchanging dollars for SLA. The SLA is built in: you're sometimes (rarely) down, and someone else gets called about it.


Your customers will blame you; you don't want that to happen. Even if Amazon pays you some damages, you may lose trust or your whole business in the meantime.


With a major AWS outage, customers will think "the Internet is having trouble today." Half the sites they visit will be down or broken in some way.


I think this perspective really depends on who you are supporting. This is my experience:

1) Actual customers: They want you to fix it. They don't care whose fault it is; they are dependent on you.

2) Internal customers: They want someone to blame. They don't really care that it's down, they just want to point the finger at someone when their report goes out late.


"Down time is still on you" - well put. And it is actually great when you can only blame yourself because then you can fix the issue right away. If someone other is to blame, you need to a) convince that person that there is a problem and b) wait for the issue to rise to the top of their queue. Which can be painful.


But in the case of Amazon they will likely know about the problem before you would have anyway. Sure, their problems tend to have a much larger scale, but their ops teams are pretty good.


Just because it's your fault, that doesn't mean that you can quickly and easily fix the problem yourself. Amazon has a team of people to keep EFS up and running, and AWS owns the entire storage, server, and network stack. When your own NFS server starts timing out, you may have only a few people to diagnose the problem.


Do you think that's still true in a "serverless" architecture?


Surprise! Serverless runs on... gasp servers!



Hrm, so for the site 'whoownsMYavailability', the answer is someone else?


I have to admit that is the first use case that I thought of when I read this. No more messing around with AMIs and launch config whenever an update is deployed.

You bring up a good point about EFS being a point of failure though. What use case do you think EFS is good for? It seems even worse for data storage since I/O can easily become a bottleneck.


> What use case do you think the EFS is good for

Out of band big data/data pipeline processing. Something where you want an easy way to sync your data but can handle extended downtimes.


Or scenarios where you can use EFS to stage data 'for something else' (i.e. shared image / upload content that stages to a CDN, etc.)

Keep in mind (and this comes from hard experience in 'traditional' NFS web server architecture) - if you mount everything on an NFS volume, you ensure that

1) If something goes wrong on that NFS mount, everything goes wrong. (bad code deploy? All nodes are down!)

2) If you rely on an NFS mount to store everything (e.g. trust keystores for JVMs,etc.) your entire infrastructure is dependent on the I/O capabilities of that NFS mount.

3) No matter how clever you are (or how much you trust NFS clients/versions) you will deal with file locking if you are doing a fair amount of read/write from multiple nodes to a single NFS mount.

Short story - EFS will make some of the 'hard' things with distributed nodes possible, but don't make the easy things impossible to troubleshoot.


The instance can fetch the upgrade from the NFS mount and also work using the previous version when NFS is down. I see no problem here.


But that's the same with any component. How is that different from SNS et al?

NFS is actually multi-homed, so there is no real reason why it can't be HA/clustered, apart from the block store and the underlying file system.


> how is that different to SNS et al?

It's not really, and you should write your software to account for SNS outages, etc.

That being said, this is presented as NFS. NFS has a nasty habit of freezing your system if it breaks. If SNS breaks, you get errors and timeouts in your logs, but can keep going. If NFS breaks, you pretty much just sit around waiting for it to come back.


If people want to deploy that way, I suppose that using rsync to update a local clone, and running from that, would minimize this problem.


Most apps/languages load everything at startup.


I think it is worth mentioning that Azure offers something similar with the Samba / CIFS / SMB protocol. [1]

This is the underlying tech that powered their Docker volume plugin as well [2]

I'm using it in production for serving up small images to some web servers, and I'm currently playing around with the Docker stuff.

So far, so good. I'm impressed.

[1] https://azure.microsoft.com/en-us/documentation/articles/sto...

[2] https://azure.microsoft.com/en-us/blog/persistent-docker-vol...

edit: footnotes


A notable difference is that Azure file shares have a 5 TB limit for the whole share, and a 1 TB limit on the size of any given file [1], while EFS has no limit on the size of the file system, and a 52 TB limit on the size of any given file. [2]

Disclosure: I work for AWS.

[1] https://azure.microsoft.com/en-us/documentation/articles/sto...

[2] http://docs.aws.amazon.com/efs/latest/ug/limits.html


We use a columnar database and we insert data into it periodically (every 30s to 90s) using Kinesis as middleware. Files are immutable and each insert creates [count of columns] * [count of tables] files. The database also periodically merges the batch files in the background.

Since EFS was not available, I was trying to develop a FUSE filesystem that writes data to both S3 and the local filesystem (EBS) for durability, and reads only from the local filesystem for performance. Even though it's cheaper than EFS, it's slow since each file needs to be sent to S3 individually. Our requirements are not that demanding because the files are immutable, there are not many files (I heard that EFS doesn't work well with many small files), and there is no need for concurrent access. I think I will give EFS a shot since it also fits our use case.
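
The idea was roughly this kind of dual write, minus the FUSE layer (a sketch, not our actual code; the bucket and directory names are placeholders):

    # Write each immutable file to local EBS (for fast reads) and to S3
    # (for durability). Bucket and local root are placeholders.
    import boto3, os, shutil

    s3 = boto3.client("s3")
    BUCKET = "my-backup-bucket"     # hypothetical
    LOCAL_ROOT = "/data"            # local EBS-backed directory

    def put_file(relative_path, src):
        local_path = os.path.join(LOCAL_ROOT, relative_path)
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        shutil.copyfile(src, local_path)                 # local copy serves all reads
        s3.upload_file(src, BUCKET, relative_path)       # durable copy in S3

    def get_file(relative_path):
        return open(os.path.join(LOCAL_ROOT, relative_path), "rb")  # reads never hit S3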


If the files are immutable, isn't S3 a much better fit? S3's main weakness is that you can't modify part of a file, which makes it unsuitable for a filesystem, but it looks like it satisfies your use case.


They use S3. Their problem likely is that they write many small files to S3, and repeat it every minute or so. That's a combination S3 isn't designed/optimized for.


Maybe, but unfortunately the database doesn't support a "backup store" that would let us use S3 directly; it just writes the data to a specified directory in a filesystem. We would have to either fork the database or implement an "S3-backed" filesystem, and we picked the latter. Unfortunately, FUSE performance is not that good for reads, and writing data to S3 is somewhat expensive.


If you don't mind me asking, what DB is this?


We're implementing Clickhouse (http://clickhouse.yandex) to our open-source analytics platform Rakam. (https://github.com/rakam-io/rakam)


Why not have a service watch for filesystem changes and then upload to S3?
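
Something like this, roughly, using the third-party watchdog package (the bucket and directory are placeholders):

    # Watch a directory and push newly created files to S3. Note the
    # upload happens after the file is written, i.e. asynchronously.
    import boto3
    from watchdog.observers import Observer
    from watchdog.events import FileSystemEventHandler

    s3 = boto3.client("s3")

    class UploadHandler(FileSystemEventHandler):
        def on_created(self, event):
            if not event.is_directory:
                key = event.src_path.lstrip("/")
                s3.upload_file(event.src_path, "my-bucket", key)

    observer = Observer()
    observer.schedule(UploadHandler(), "/data", recursive=True)
    observer.start()
    observer.join()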


AFAIK, there is no synchronous way to do it and if I do it asynchronously, partial data loss is possible.


I am excited by EFS, but during the beta it didn't perform as well as I would have liked. I had a small WordPress install that I wanted to make elastic, in that I could have multiple web servers and all of the data was stored apart from the AMI. After testing my setup I found that latency was way too high and the performance of my site was terrible. I'm guessing it's because the amount of data was really low (100MB) and the requests were infrequent.

I ended up going with a shared volume, which works fine and performance is great, but I can only attach the volume to a single instance at a time, which prevents me from running multiple instances. That said, it looks like there have been some performance tweaks to EFS, so maybe it will be better this time.


We addressed this use case during the preview. Could you try it again now?


Was it not possible to use the S3 plugin for WP to offload all the content files to S3 + CloudFront?


I think you might be going about solving this problem in a suboptimal way. WordPress data consists of the DB + asset files. The DB gets addressed outside of this, and the asset files either go in S3 or are synced (rsync, etc.).


The price is a bit high: $0.30/GB/month vs. $0.10/GB/month for SSD EBS, and even less for local storage. S3 is even cheaper at $0.03/GB/month. So EFS is 10 times more expensive than S3.

On the I/O front, EBS (gp2) and (st1) are also several times cheaper.

So most of EFS's advantages are in flexibility and sharing the same file system across many instances. This may be great, e.g. for sharing read-only binaries, but a large-scale data pipeline can likely be done much more cheaply using different technologies.
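
Back-of-the-envelope, using the per-GB prices quoted above and a hypothetical 500 GB dataset:

    # Monthly storage cost at the quoted per-GB-month prices,
    # for a hypothetical 500 GB dataset.
    gb = 500
    prices = {"EFS": 0.30, "EBS gp2": 0.10, "S3": 0.03}   # $/GB-month
    for name, per_gb in prices.items():
        print(f"{name}: ${gb * per_gb:,.2f}/month")
    # EFS: $150.00/month, EBS gp2: $50.00/month, S3: $15.00/month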


You are right for small scale things.

I'll have to double-check my math, but you're only paying for what you use, not what you provision, which is one big thing.

Also, if you have > 3 machines working from the same dataset (i.e. docker image/video/large binaryblob) there is an instant saving, without factoring in provisioned vs. usable space with EBS.

But your original point is correct: as a 1:1 replacement for a properly sized EBS mount, it's more expensive.


Nitpick

"large binaryblob" is literally "large binary binary large object"


Haha, point taken.


I think EFS is meant for large read-heavy patterns, at least that's what we used it for. For example, I help run an e-commerce site that has a lot of content (JS, CSS, images, etc.) that we serve up from 10 dedicated servers as well as 30 general servers. When we release new content we have to update each of the servers and make sure that everything got installed properly everywhere. With EFS, we just have to update one thing and everyone gets it automatically.


Why don't you use S3 and Cloudfront to serve all the assets?


Maybe their platform wasn't designed around using S3 and retrofitting that into the platform is a significant task compared to using EFS as the backing store with CloudFront mediating client access?


Exactly! The code base is going on 15 years old and was built with no knowledge of S3 or its ilk. For that reason each WWW server has to have a local copy of all the assets (or at least a significant portion).


> On I/O front also EBS (gp2) and (st1) are several times cheaper.

Isn't I/O for EFS free?


You don't pay extra for I/O on EFS or on some types of EBS (like gp2, mentioned here). However, for some applications you optimize your storage to get the most I/O per dollar. In those terms, EFS comes with some I/O capacity and so does EBS. The point is that EBS is way cheaper for that use case.


Pricing looks competitive: https://aws.amazon.com/efs/pricing/ I wonder if this helps those who perceive S3 as vendor lock-in, since it replicates a traditional file system.


Does it? It looks significantly more expensive than, say, rsync.net.


We've been in the EFS preview for quite a while now, and we are incredibly pleased with it. We do a few different things using EFS, including using it as an output for some of our Spark jobs.

It's really a top notch product; couldn't be happier.


Cool! If you are looking at shared filesystems on EC2, you might also want to take a look at ObjectiveFS[1]. Works in all regions and on GCS, and you can mount your filesystem securely (end-to-end encryption) between regions.

[1]: https://objectivefs.com


ObjectiveFS looks awesome and I've been wanting to use it, but using non-S3 storage is locked to "enterprise, I can't afford it". :(

But I also can't afford S3's bandwidth pricing.. I could, however afford to run an S3-API compatible service myself with bandwidth pricing I could afford.

But then you wouldn't let me use it because then I'd be "enterprise".


Great that you have been wanting to use ObjectiveFS. We are happy to talk with you about your non-S3 storage use case and see if we can find a plan that works for you.


Can EFS be mounted from the Lambda environment?


Not at present. I would be interested in learning about use cases for this. Post here or find me online, as you wish.


One use case would be one that came up a while back here on HN -- The person had a large binary blob that they needed to load up every time their Lambda function ran. If they could mount EFS from Lambda, they could put the large binary blob there.

Depending on how they wrote their code, if the Lambda could seek to the right part of the file then this would work, but if it has to load the entire file into RAM anyway, then it wouldn't gain them much except perhaps a slightly easier way to get the file into RAM.


Couldn't you just download the portion of the one file you want from S3? Unless EFS is significantly faster...

From a caching perspective, you wouldn't seem to gain much in Lambda. You can cache in memory/tmpfs until your container eventually dies, and then you likely aren't on the same machine, so there's no EFS-level caching to take advantage of.


Right, that was my point. You most likely have to load the whole thing into memory, but in the rare case that you can seek directly to the right spot in the file, EFS might have an advantage for you in Lambda.


Yeah, I just meant that S3 has partial downloads, so you can still seek.

EFS/NFS makes a lot of sense I think when you have a lot of random/unpredictable access across large numbers of files. You could do the same thing with FUSE/S3 but across lots of smaller files/accesses I would think the overhead adds up faster.
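
For reference, a ranged GET with boto3 looks roughly like this (bucket, key, and byte range are placeholders):

    # Fetch only part of an S3 object instead of downloading the whole file.
    import boto3

    s3 = boto3.client("s3")
    resp = s3.get_object(
        Bucket="my-bucket",
        Key="large-binary-blob.bin",
        Range="bytes=1048576-2097151",   # second MiB of the object
    )
    chunk = resp["Body"].read()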


You mention HPC and Big Data processing as use cases for EFS; Lambda is also very well suited for big processing tasks you want to execute quickly with a large amount of concurrency. You often need persistence across executions, for example when breaking tasks into multi-function workflows, and EFS would seem to simplify this over manually using S3 for it.


I would like to have a media pipeline where media goes through a QC check, thumbnail and preview generation, and some other processing steps which are independent of each other. My media can range from 1-15 GB, so if I can store it in EFS, which can be mounted in Lambda, I can use various Lambda functions which simply read from a common storage without downloading from S3 every time.


I want to run the Elm compiler in a Lambda, and maintain persistent environments so that subsequent builds can benefit from keeping the libraries and intermediate files available.


To store small files (10 to 1000 KB) uploaded by users in a web app, what is the best tool, S3 or EFS?


I have an application where users upload hundreds of small (~10kB) files every day. We quickly realized that S3 was not an option, since you get charged per request. As a workaround we batched the file uploads by archiving uploads each hour, and uploading a single .tar.gz to S3 at the end of the hour. The aggregated data gets processed offline later.

We have N upload endpoints (as part of an autoscaling group), so it's a bit of a pain to reaggregate the data since each server uploads its own .tgz to S3. I'm very happy to have EFS now! It makes scaling out our upload endpoints much simpler, as they can just dump their data into a common directory (which can still get archived and persisted to S3 later).
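
A rough sketch of that hourly archive-and-upload step (paths and bucket name are placeholders, not our actual code):

    # Once an hour: tar up the small uploads and push one object to S3.
    import boto3, datetime, os, socket, tarfile

    UPLOAD_DIR = "/var/uploads"        # where users' small files land (placeholder)
    BUCKET = "my-upload-archive"       # placeholder

    def archive_hour():
        stamp = datetime.datetime.utcnow().strftime("%Y%m%dT%H")
        archive = f"/tmp/{socket.gethostname()}-{stamp}.tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(UPLOAD_DIR, arcname=stamp)     # one .tar.gz per server per hour
        boto3.client("s3").upload_file(archive, BUCKET, os.path.basename(archive))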


Object storage (i.e. S3 + CloudFront) is still the correct way to store immutable files. That said, I know this will be used when the options are to rewrite filesystem access against an abstraction (i.e. fs/s3/azure/etc.) or just use NFS. People already use s3fs-fuse [1], which is dubious at best.

[1]: https://github.com/s3fs-fuse/s3fs-fuse


Would you say that the main choice criteria is immutability?

Immutable files => Object storage => S3

Mutable files => File storage with POSIX semantics => EFS/NFS


A benefit (usually) of S3 is that since it's operating over HTTP, you don't need any additional things to make the files accessible. In the case of EFS you would need an HTTP server running. Maintaining an HTTP server is extra work and extra configuration, but it might give you additional benefits.

Thinking of the small file sizes, EFS would probably work a bit faster (since with S3 you'd have some HTTP overhead), but I doubt the difference would be significant unless users upload hundreds of files at once.

The EFS advantage in this case would be that you'd no longer need an S3 library to do it, since it's POSIX compliant. Given that most languages have simple, solid libraries for interacting with S3, it's not a huge difference, but still :)

All in all, I'd go with S3 unless there are some special requirements that are hard to satisfy using current S3 features.
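
i.e. the difference in application code is roughly this (bucket, paths, and payload are placeholders):

    import boto3

    data = b"...user upload..."

    # S3: needs an SDK call (plus HTTP/CloudFront in front for serving)
    boto3.client("s3").put_object(Bucket="my-bucket", Key="uploads/file1", Body=data)

    # EFS: plain POSIX file I/O on the mounted filesystem
    with open("/mnt/efs/uploads/file1", "wb") as f:
        f.write(data)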


Any idea what they are using for NFS server software?


Can you expose the EFS file systems to CloudFront? For a future migration (local to cloud) I would like a backend application running on EC2 to use standard path names, but public access would be through CloudFront. It would be easier to migrate existing apps to the cloud with that approach, and then slowly work on changing the apps to use S3 (for cost reasons).


Why hasn't anyone mentioned ObjectiveFS? When EFS was in preview just a month or so ago it was absolutely abysmal with performance for basic smaller web stuff.

OFS feels nearly like an attached disk, and uses S3. You only pay per mount. Their support has been awesome and very personal.

It provides a fuse-based full POSIX filesystem backed by S3 and can be mounted from multiple places in multiple regions. Some basic things like snapshotting they said are coming soon too.

I gave up on EFS and even though it's finally out of preview, I think I still am going to prefer OFS based on what I've seen so far...


I don't have a lot of experience with pNFS but it's supposed to be an extension to NFS 4.1, which Amazon EFS is supporting. However, pNFS is not mentioned in this blog post, their website or the official docs.

That would be an interesting feature.


Finally! My client was unable to use AWS for lack of a real POSIX fs.


I wonder if one can run a PostgreSQL database on top of this. That would make it relatively easy to have a master without downtime and be way cheaper than RDS.


It claims to present itself as a full filesystem, so I don't see why it wouldn't. However, you're probably going to run into pretty hardcore performance issues, because the bandwidth Amazon provides is generally on the stingy side, and under a terabyte they're downright miserly, with a 10 GB filesystem only able to provide 0.5 MB/s sustained. This product is problematic at best.


Also, it is a full-fledged NFS-mounted file system.

I would be very wary of running a database from that.


It sounds like a disaster waiting to happen. The latency of NFS is just too high (Even if it is very fast for NFS) for it to be safe for a database.


I was thinking more about the risk of locking not working as intended on NFS (http://0pointer.de/blog/projects/locking2). See also the section on the Nightmare File System in the UNIX-Haters Handbook (https://en.wikipedia.org/wiki/The_Unix-Haters_Handbook).

There are workarounds for the stateless mess of NFS in modern Linuxes, but if you accidentally access an NFS-mounted database from multiple AWS instances, you might get into big trouble, especially when the machines in question run different OSes (e.g. Windows and Linux) or OS versions.


We're running Oracle on NFS volumes, and some MySQL databases too, with no problems whatsoever.

It'll never be as fast as local disk but the flexibility it provides is very nice.


That's normal NFS. With Amazon's offering, if you have a 10-gigabyte database, you can only get 500 KB/sec of bandwidth:

http://docs.aws.amazon.com/efs/latest/ug/performance.html
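
If I'm reading that performance page right, baseline throughput scales with the amount of data stored (roughly 50 MB/s per TB), which is where that number comes from:

    # Rough baseline throughput at a few filesystem sizes,
    # assuming ~50 MB/s of baseline per TB stored.
    baseline_mb_per_s_per_tb = 50
    for size_gb in (10, 100, 1024):
        baseline = baseline_mb_per_s_per_tb * size_gb / 1024
        print(f"{size_gb} GB stored -> ~{baseline:.1f} MB/s baseline")
    # 10 GB -> ~0.5 MB/s, 100 GB -> ~4.9 MB/s, 1024 GB -> ~50.0 MB/s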


Huh. well that's kinda shitty.

Our NetApp can saturate 10GBe without even trying.


How exactly is it "without downtime"?


It's essentially NFS, so yes.


Are there any third party performance metrics yet? Read/write ops per second, and read/write bytes per second from one or many machines?

Would love to see this in comparison to a 10Gbps (or 1Gbps) attached spinning-rust or SSD NFS drive. That would really help me understand what tradeoffs are happening here.


I really want to be able to access EFS file systems from Lambda. Has anyone figured out a way to access NFS?


Can EFS be used as a volume for ECS tasks?


Here's a post we wrote earlier in the year about EFS and ECS: https://aws.amazon.com/blogs/compute/using-amazon-efs-to-per...


In the blog post, EFS is mounted by the EC2 instances from the start, but I expected it to be dynamic, depending on the tasks scheduled on the instance. I don't want to couple an instance to EFS volumes that will only be used if certain containers happen to get scheduled on it. I would like to have a general-purpose cluster and let the scheduler instruct the EC2 instance to mount an EFS volume dynamically when a scheduled task needs it.


Thanks!


If you use the Convoy NFS volume driver with EFS, your containers can use any NFS server (including EFS) as highly available persistent volume storage: https://github.com/rancher/convoy


Glad to finally see this in production, wish it was available across VPC peering, VPNs, etc.


This is pretty awesome.

Even though we don't use AWS much due to its lack of IPv6 support, we've wanted a giant bottomless FS for various purposes. It'd be easy to set that up here and export it over a virtual network.


Does EFS provide better throughput and latency than S3?


Seems like no Windows support?


That's a good question. Originally it was Linux-only, as the EFS client was locked to NFS 4.0 and the Windows Server NFS client was 4.1 (why 4.1 is not backwards compatible, I don't know). However, the announcement mentioned using the PowerShell cmdlets to attach EFS volumes, and in fact the PowerShell help pages have specific commands for working with EFS. So my guess is that it is now available for Windows and Jeff didn't do a very good job of bringing that to light.


The EC2 example shows mounting with nfsvers=4.1, so I'm guessing it got resolved by bumping the version supported.


Still no support for incremental snapshots though right?


This feels rather... primitive for a network filesystem introduced in 2016. Where are the snapshots? Where are the subvolumes? Where's the ability to send/receive volumes/subvolumes to filesystems in other availability zones? There are also I/O limits if you have a smaller filesystem with frequently accessed data, an inability to mount on machines through VPN gateways, and a lot of other seriously rough edges.

It's an interesting start to a project, but the public features feel very much lacking.


It seems like this is more of a glue layer between a more interesting backend service (EFS) and clients, but I don't know that I'd expect Amazon to do something like write a native client, versus just probably improving the NFS client behavior on Windows/Linux/...

The neat trick would be if NFS 4.X grew support for communicating interesting operations like snapshotting.



