Am I the only one surprised to read that this database relies on periodic flushing (every 30s by default) with no manual syncs at all? I guess it’s metrics so 30s of data loss is fine? I dunno about that. Data loss is usually due to a power failure, and the metrics collected right before a power failure are important.
I was surprised by this too. I asked the author about it on Twitter [0]. At the very least it seems like fsync is something you can opt into in their configuration, even if it's not the default.
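To make the distinction concrete, here's a tiny Python sketch of the durability ladder being discussed (the file name and metric line are made up, not this database's actual format):

```python
import os

# Append a metric line with increasing durability guarantees.
# (File name and line format are illustrative only.)
with open("metrics.log", "ab") as f:
    f.write(b"cpu_usage 0.42 1700000000\n")  # sits in a user-space buffer
    f.flush()             # buffer -> kernel page cache; still lost on power failure
    os.fsync(f.fileno())  # page cache -> disk; survives power failure
```

A periodic flush every 30s means everything written since the last fsync lives only in memory, and that window is exactly what vanishes when the power goes.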
A few years ago I was doing some consulting for a medical tech startup run by an ex-doctor. They were thinking of using mongodb until I explained how it had a reputation for losing data. I'll never forget the look of horror, disgust and confusion on his face. He turned to me and said "A database that forgets things!? Why would anyone want that??".
I still don't have an answer for him. It sounds just as strange to me too.
MongoDB is one of the most successful open-source databases of all time. The parent company is publicly listed and worth $15BN, 3x more than Elastic, to put that in perspective.
This reflection [1] came from the founders of RethinkDB, a competitor of MongoDB at the time:
"It turned out that correctness, simplicity of the interface, and consistency are the wrong metrics of goodness for most users. The majority of users wanted these three trade-offs instead:
- A use case. We set out to build a good database system, but users wanted a good way to do X (e.g. a good way to store JSON documents from hapi, a good way to store and analyze logs, a good way to create reports, etc.).
- Timely arrival. They wanted the product to actually exist when they needed it, not three years later.
- Palpable speed [...]. MongoDB mastered these workloads brilliantly, while we fought the losing battle of educating the market."
MongoDB narrowed things down to a specific use case and became the best for that use case. This comes with trade-offs. MongoDB was probably not the best database for healthcare back in the day, but that's OK; it did the job very well for other use cases and industries. Over time they fixed the data-loss issues and became more stable. Essentially, they made developers feel like superheroes, kept improving the product, and eventually grabbed a massive market share.
> MongoDB is one of the most successful open-source databases of all time.
It used to be open source. It's not anymore.
> The parent company is a listed company and worth $15BN, 3x more than Elastic to put some perspective.
That's purely a capitalistic argument and makes no difference to whether the product is any good. For example, there's plenty of "churches" that are richer than MongoDB Inc. and absolutely abhorrent and evil.
> This comes with trade-offs.
The only thing that required the trade-off of data loss was cheating in benchmarks in order to hoodwink naive potential users into using their dangerous product. MongoDB Inc. has always preferred to lie to their users. It is not a database company; it's a marketing company with a product they label as a database. And that's a smart way to make money, sure, because of vendor lock-in, but it's not a smart way to gain trust.
Cold comfort to all the companies who believed mongo’s marketing claims and then lost data because of their shoddy engineering. Or the users who had their data stolen because mongo shipped with insecure defaults. (Not entirely mongo’s fault, but they deserve some of the blame).
As engineers we bear responsibility for how our work impacts society. Mongodb may have made their investors a lot of money, but they did sloppy work and didn’t do right by their customers. That’s not a success in my book.
At least given the append-only nature, the data loss should be bounded.
For some reason, many databases that overwrite data support disabling journaling and/or fsync. E.g. SQLite has "PRAGMA journal_mode = OFF". You can lose the entire database from an ill-timed crash, if one important page gets written but another doesn't. To SQLite's credit, it's not the default, and the documentation is explicit about this:
> If the application crashes in the middle of a transaction when the OFF journaling mode is set, then the database file will very likely go corrupt.
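For anyone curious, a minimal Python sketch of the pragma in question (the database path and table are arbitrary):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.db")
conn = sqlite3.connect(path)

# The dangerous mode from the docs quoted above: no rollback journal at all.
# An ill-timed crash can corrupt the whole file, not just drop a transaction.
conn.execute("PRAGMA journal_mode = OFF")

# Safer choice: switch to WAL; SQLite reports back the mode now in effect.
mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
print(mode)  # -> wal

conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()
conn.close()
```

Note that setting a journal mode returns the mode actually in effect, which is worth checking: SQLite will silently keep the old mode if the new one can't be applied.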
It supports different commit modes (see [1]): nosync, async, and sync. You can choose the more expensive but safer async or sync modes if you're willing to tolerate higher commit latency.
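As a rough sketch of what those three modes typically mean (the names and behavior here are generalized, not this database's actual implementation; see the linked docs for its real semantics):

```python
import os
from enum import Enum

class CommitMode(Enum):
    NOSYNC = "nosync"  # never fsync: fastest, recent commits may vanish on power loss
    ASYNC = "async"    # fsync on a timer in the background: bounded loss window
    SYNC = "sync"      # fsync on every commit: slowest, most durable

def commit(f, payload: bytes, mode: CommitMode) -> None:
    f.write(payload)
    f.flush()  # user-space buffer -> kernel page cache
    if mode is CommitMode.SYNC:
        os.fsync(f.fileno())  # page cache -> disk before acknowledging the commit
    # NOSYNC relies on the OS writing back eventually; ASYNC would have a
    # background thread call os.fsync periodically (omitted for brevity).

with open("wal.log", "ab") as f:
    commit(f, b"set k=v\n", CommitMode.SYNC)
```

The latency cost of sync mode is one fsync per commit, which on spinning disks or cloud block storage can dominate the entire write path; that's the trade the parent comment is describing.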