Great article - would have been useful to read before starting out on the journey of making rclone mount (mount your cloud storage via FUSE)!
After a lot of iterating we eventually came up with the VFS layer in rclone, which adapts S3 (or any other similar storage system like Google Cloud Storage, Azure Blob, OpenStack Swift, Oracle Object Storage, etc.) into a POSIX-ish file system. The actual rclone mount code is quite a thin layer on top of this.
The VFS layer has various levels of compatibility. The lowest, "off", just does directory caching. In this mode, as the article states, you can't read and write to a file simultaneously, you can't write to the middle of a file, and you can only write files sequentially. Surprisingly, quite a lot of things work OK with these limitations. The next level up is "writes" - this supports nearly all the POSIX features that applications want, like being able to read and write to the same file at the same time, write to the middle of the file, etc. The cost for that, though, is a local copy of the file which is uploaded asynchronously when it is closed.
Here are some docs for the VFS caching modes - these mirror the limitations in the article nicely!

https://rclone.org/commands/rclone_mount/#vfs-file-caching
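To make the difference concrete, the mode is selected with the --vfs-cache-mode flag (the remote and mount point below are placeholders):

    # "off" (the default): directory caching only - sequential writes,
    # no simultaneous read+write on the same file
    rclone mount remote:bucket /mnt/bucket --vfs-cache-mode off

    # "writes": near-full POSIX semantics, at the cost of a local copy
    # which is uploaded asynchronously when the file is closed
    rclone mount remote:bucket /mnt/bucket --vfs-cache-mode writes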
By default S3 doesn't have real directories either. This means you can't have a directory with no files in it, and directories don't have valid metadata (like modification time). You can create zero length files ending in / which are known as directory markers and a lot of tools (including rclone) support these. Not being able to have empty directories isn't too much of a problem normally, as the VFS layer fakes them and most apps then write something into their empty directories pretty quickly.
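Creating a directory marker is just an ordinary zero length PUT. A minimal sketch with the AWS SDK for Go v2 (bucket and key names are placeholders):

    package main

    import (
        "context"
        "log"
        "strings"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/service/s3"
    )

    func main() {
        cfg, err := config.LoadDefaultConfig(context.TODO())
        if err != nil {
            log.Fatal(err)
        }
        client := s3.NewFromConfig(cfg)

        // A directory marker: a zero length object whose key ends in "/".
        _, err = client.PutObject(context.TODO(), &s3.PutObjectInput{
            Bucket: aws.String("my-bucket"),  // placeholder
            Key:    aws.String("empty-dir/"), // the trailing slash is the marker
            Body:   strings.NewReader(""),
        })
        if err != nil {
            log.Fatal(err)
        }
    }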
So it is really quite a lot of work to convert something which looks like S3 into something which looks like a POSIX file system. There is a whole lot of smoke and mirrors behind the scenes when things like renaming an open file happen, and other nasty corner cases like that.
Rclone's lower level move/sync/copy commands don't bother though and use the S3 API pretty much as-is.
If I could change one thing about S3's API, it would be an option to read the metadata with the listings. Rclone stores modification times of files as metadata on the object, and there isn't a bulk way of reading these; you have to HEAD each object. Alternatively, a way of setting the Last-Modified on an object when you upload it would do too.
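To make the cost concrete, here is roughly what that per-object HEAD looks like with the AWS SDK for Go v2, assuming rclone's "mtime" metadata key (seconds since the epoch); the bucket and key are placeholders:

    package main

    import (
        "context"
        "fmt"
        "log"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/service/s3"
    )

    func main() {
        cfg, err := config.LoadDefaultConfig(context.TODO())
        if err != nil {
            log.Fatal(err)
        }
        client := s3.NewFromConfig(cfg)

        // One round trip per object - listings don't return user metadata,
        // so every modification time costs an extra HEAD request.
        out, err := client.HeadObject(context.TODO(), &s3.HeadObjectInput{
            Bucket: aws.String("my-bucket"),        // placeholder
            Key:    aws.String("path/to/file.txt"), // placeholder
        })
        if err != nil {
            log.Fatal(err)
        }
        // The SDK strips the "x-amz-meta-" prefix from user metadata keys.
        fmt.Println("mtime:", out.Metadata["mtime"])
    }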
> If I could change one thing about S3's API, it would be an option to read the metadata with the listings. Rclone stores modification times of files as metadata on the object, and there isn't a bulk way of reading these; you have to HEAD each object. Alternatively, a way of setting the Last-Modified on an object when you upload it would do too.
I wonder if you couldn't hack this in by storing the metadata in the key name itself? Obviously with the key length limit of 1024 you would be limited in how much metadata you could store, but it's still quite a lot of space, even taking into account the file path. You could use a delimiter that would be invalid in a normalized path, like '//', for example: /path/to/file.txt//mtime=1710066090
You would still be able to fetch "directories" via prefixes and direct files by using '<filename>//' as the prefix.
This kind of formatting would probably make it pretty incompatible with other software though.
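A little sketch of what that encoding could look like in Go (the '//' delimiter and the 'mtime=' field are just the convention suggested above, not anything rclone does):

    package main

    import (
        "fmt"
        "strconv"
        "strings"
        "time"
    )

    const sep = "//" // "//" never appears in a normalized path, so it is safe as a delimiter

    // encodeKey appends the modification time to the object key.
    func encodeKey(path string, mtime time.Time) string {
        return fmt.Sprintf("%s%smtime=%d", path, sep, mtime.Unix())
    }

    // decodeKey splits the key back into the path and the modification time.
    func decodeKey(key string) (path string, mtime time.Time, ok bool) {
        i := strings.LastIndex(key, sep)
        if i < 0 || !strings.HasPrefix(key[i+len(sep):], "mtime=") {
            return key, time.Time{}, false
        }
        secs, err := strconv.ParseInt(key[i+len(sep)+len("mtime="):], 10, 64)
        if err != nil {
            return key, time.Time{}, false
        }
        return key[:i], time.Unix(secs, 0).UTC(), true
    }

    func main() {
        k := encodeKey("/path/to/file.txt", time.Unix(1710066090, 0))
        fmt.Println(k) // /path/to/file.txt//mtime=1710066090
        fmt.Println(decodeKey(k))
    }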
I think that is a nice idea - maybe something we could implement in an overlay backend. However, people really like the fact that the objects they upload with rclone arrive on S3 with the filenames they had originally, so I think the incompatibility with other software would make it unattractive for most users.
> If I could change one thing about S3's API, it would be an option to read the metadata with the listings.
Agree. In MinIO (disclaimer: I work there) we added a "secret" parameter (metadata=true) to include metadata and tags in listings if the user has the appropriate permissions. Of course, it being an extension, it is not really something that you can reliably use. But rclone can of course always try it and use it if available :)
> You can create zero length files ending in /
Yeah. Though you could also consider "shared prefixes" in listings as directories in their own right. That of course makes directories "stateless" and unable to exist if there are no objects in them - which has pros and cons.
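That is what listing with a delimiter gives you: the "shared prefixes" come back alongside the objects and can be presented as directories. A rough sketch with the AWS SDK for Go v2 (bucket and prefix are placeholders):

    package main

    import (
        "context"
        "fmt"
        "log"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/service/s3"
    )

    func main() {
        cfg, err := config.LoadDefaultConfig(context.TODO())
        if err != nil {
            log.Fatal(err)
        }
        client := s3.NewFromConfig(cfg)

        p := s3.NewListObjectsV2Paginator(client, &s3.ListObjectsV2Input{
            Bucket:    aws.String("my-bucket"), // placeholder
            Prefix:    aws.String("path/to/"),  // placeholder
            Delimiter: aws.String("/"),         // group keys on "/" like directories
        })
        for p.HasMorePages() {
            page, err := p.NextPage(context.TODO())
            if err != nil {
                log.Fatal(err)
            }
            // The shared prefixes play the role of subdirectories...
            for _, cp := range page.CommonPrefixes {
                fmt.Println("dir: ", aws.ToString(cp.Prefix))
            }
            // ...and the objects at this level are the files.
            for _, obj := range page.Contents {
                fmt.Println("file:", aws.ToString(obj.Key))
            }
        }
    }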
> Alternatively, a way of setting the Last-Modified on an object when you upload it would do too.
Yes, that places severe limitations on clients. However, it does make the "server" time the reference. But we have to deal with the same limitation for client-side replication/mirroring.
My personal biggest complaint is that there isn't a `HeadObjectVersions` that returns version information for a single object. `ListObjectVersions` is always going to be a "cluster-wide" operation, since you cannot know if the given prefix is actually a prefix or an object key. AWS recently added "GetObjectAttributes" - but it doesn't add version information, which would have fit in nicely there.
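For comparison, the workaround today is to list by prefix and then filter for the exact key, paying for a listing either way. A rough sketch with the AWS SDK for Go v2 (names are placeholders, pagination elided):

    package main

    import (
        "context"
        "fmt"
        "log"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/service/s3"
    )

    func main() {
        cfg, err := config.LoadDefaultConfig(context.TODO())
        if err != nil {
            log.Fatal(err)
        }
        client := s3.NewFromConfig(cfg)

        key := "path/to/file.txt" // placeholder
        out, err := client.ListObjectVersions(context.TODO(), &s3.ListObjectVersionsInput{
            Bucket: aws.String("my-bucket"), // placeholder
            Prefix: aws.String(key),         // the prefix may match more than this one key...
        })
        if err != nil {
            log.Fatal(err)
        }
        for _, v := range out.Versions {
            if aws.ToString(v.Key) != key {
                continue // ...e.g. "path/to/file.txt.bak", so filter exact matches
            }
            // (Handling of truncated results via NextKeyMarker elided for brevity.)
            fmt.Println(aws.ToString(v.VersionId), aws.ToTime(v.LastModified))
        }
    }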
> Agree. In MinIO (disclaimer: I work there) we added a "secret" parameter (metadata=true) to include metadata and tags in listings if the user has the appropriate permissions. Of course, it being an extension, it is not really something that you can reliably use. But rclone can of course always try it and use it if available :)
Is this "secret" parameter documented somewhere? Sounds very useful :-) Rclone knows when it is talking to Minio so we could easily wedge that in.
> My personal biggest complaint is that there isn't a `HeadObjectVersions` that returns version information for a single object. `ListObjectVersions` is always going to be a "cluster-wide" operation, since you cannot know if the given prefix is actually a prefix or an object key
Yes, it is annoying having to do a List just to figure out which object version is being referred to. (Rclone has this problem when using --s3-versions.)