They differ when multiple clients read and write at the same time. HopsFS can use the metadata service to synchronize clients with each other at low latency (on the order of a millisecond), but ObjectiveFS has to use S3 for synchronization, which has much higher latency (> 20 ms).
We chatted about this with the founders of ObjectiveFS before creating JuiceFS; they did NOT recommend using ObjectiveFS for big data workloads with Hadoop/Spark. That's why we started building JuiceFS in 2016.
I'd say that the main difference from ObjectiveFS is metadata operations. From the documentation of ObjectiveFS:
`doesn’t fully support regular file system semantics or consistency guarantees (e.g. atomic rename of directories, mutual exclusion of open exclusive, append to file requires rewriting the whole file and no hard links).`
HopsFS does provide strongly consistent metadata operations like atomic directory rename, which is essential if you are running frameworks like Apache Spark.
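For context on why that matters: Hadoop-style output committers make a job's results visible with a single directory rename, so the commit is all-or-nothing only if rename is atomic. Here is a minimal sketch of that pattern using the standard Hadoop FileSystem API (the paths are hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CommitByRename {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Hypothetical paths: tasks write all output under a temporary directory...
        Path attempt = new Path("/jobs/job42/_temporary/attempt_0");
        Path committed = new Path("/jobs/job42/output");

        // ...then the driver publishes everything with one metadata operation.
        // On a filesystem without atomic rename, readers can observe a
        // half-renamed directory, and a failed commit can leave partial output.
        if (!fs.rename(attempt, committed)) {
            throw new IllegalStateException("commit failed; output left in temporary directory");
        }
    }
}
```

A single metadata operation on a consistent metadata service is what makes this cheap and safe; plain S3 has no rename at all, so emulating it means copying every object under the directory.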
That quote from the ObjectiveFS documentation [1] is out of context: it was describing limitations of s3fs, not ObjectiveFS. My understanding is that because ObjectiveFS is a log-structured filesystem that uses S3 as its underlying storage, it doesn't have those limitations.
From a quick read of the ObjectiveFS docs, it doesn't appear to be a distributed filesystem. It appears to be a single service (log-structured storage on a server) that is backed by S3. Am I wrong? (HopsFS is a distributed hierarchical FS.)
If I understand the distinction you're making, then yes, you're wrong. The really beautiful thing about ObjectiveFS is that it's distributed, but the user doesn't really have to be concerned about that. A user can mount the same S3-backed ObjectiveFS filesystem on multiple machines, and they will somehow coordinate their reads and writes to that one S3 bucket, without having to communicate with any central component besides S3 itself and the ObjectiveFS licensing server.
That's not what I meant by a distributed file system. Is the metadata layer of ObjectiveFS distributed, i.e., scale-out? Can you add nodes to increase the metadata layer's capacity and throughput so it can scale to handle millions of ops/sec? That's what I meant by a distributed file system (not just client-server with a single metadata server).
There is no metadata server. The clients access S3 directly and keep their own caches of both metadata and data. I don't know how the clients coordinate their writes and invalidate their caches, but somehow they do. From my perspective as a user, it just works.
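For what it's worth, here's one way clients could coordinate through S3 alone, with no metadata server. This is purely an illustrative sketch, not ObjectiveFS's actual (unpublished) protocol: writers append immutable log segments under ascending keys, and each client polls LIST to discover segments it hasn't replayed yet, invalidating its caches as it goes (AWS SDK for Java v2; the bucket name and key layout are made up):

```java
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.S3Object;

public class LogPoller {
    public static void main(String[] args) {
        S3Client s3 = S3Client.create();

        // Highest log-segment key this client has already replayed.
        String lastSeen = "log/00000000";

        ListObjectsV2Request req = ListObjectsV2Request.builder()
                .bucket("my-ofs-bucket")   // hypothetical bucket name
                .prefix("log/")
                .startAfter(lastSeen)      // only segments we haven't seen yet
                .build();

        for (S3Object segment : s3.listObjectsV2(req).contents()) {
            // Replaying each new segment would update the local metadata/data
            // caches. A real implementation would also need a way to serialize
            // concurrent writers (e.g., conditional PUTs or a lock object),
            // which is the hard part this sketch leaves out.
            System.out.println("replay " + segment.key());
        }
    }
}
```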
ObjectiveFS has been around for several years. How is HopsFS better?