Redpanda is much leaner and scales much better for low-latency use cases. It ...

drinker · 2025-02-20T04:09:18

Those are all good points and pros for Redpanda vs Kafka, but my question still stands. Isn't Redpanda designed for high-volume scale, similar to Kafka's use cases, rather than the low-volume workloads discussed in the article?

rockwotj · 2025-02-20T05:42:06

When the founder started Redpanda, it was designed to be two things:

* easy to use
* more efficient and lower latency than Kafka, without the big resources Kafka needs

The efficiency really matters at scale and for low latency, yes, but the simplicity of deployment and use is also a huge win.

munksbeer · 2025-02-20T11:21:08

In Kafka, if you require the highest durability for messages, you configure multiple nodes on different hosts, and probably data centres, and you require acks=all. I'd say this is the thing that pushes latency up, rather than the code execution of Kafka itself.

How does Redpanda compare under those constraints?

rockwotj · 2025-02-21T01:33:10

Oh, if you care about durability in Kafka vs Redpanda, see https://www.redpanda.com/blog/why-fsync-is-needed-for-data-s... By default, acks=all does not fsync before acknowledging the write, so it's still not safe. We use Raft, a proven replication protocol, for the data path (rather than Kafka's custom ISR protocol), and we fsync by default for safety (although if you're fine with relaxed durability like in Kafka, you can enable that too: https://www.redpanda.com/blog/write-caching-performance-benc...).

As for Redpanda vs Kafka latency in multi-AZ setups, the big win in Redpanda is that tail latencies are kept low (we use a variety of techniques to do this). Here are some numbers: https://www.redpanda.com/blog/kafka-kraft-vs-redpanda-perfor...

Cross-AZ latency is mostly in the single-digit milliseconds (ref: https://www.bitsand.cloud/posts/cross-az-latencies/), and a JVM GC pause can easily take just as long, which can drive up those tail latencies.
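
To make the acks=all vs. GC-pause point concrete, here is a deliberately simplified Python model; it is not how Kafka or Redpanda is implemented. It assumes each replica (leader included) finishes the write independently and ignores fetch intervals, fsync cost, and ISR shrinking (a follower paused long enough, on the order of `replica.lag.time.max.ms`, would eventually be dropped from the ISR). Kafka's acks=all waits for every replica currently in the in-sync replica set, while a Raft-style protocol commits once a majority has the write. The function name `commit_latency_ms` and the timings are hypothetical.

```python
def commit_latency_ms(replica_write_ms, required_acks):
    """Return when the write is acknowledged in this toy model.

    replica_write_ms holds, per replica (leader included), the time in ms at
    which that replica has the write. The write is acknowledged as soon as
    the fastest `required_acks` replicas have it.
    """
    if not 1 <= required_acks <= len(replica_write_ms):
        raise ValueError("required_acks must be between 1 and the replica count")
    return sorted(replica_write_ms)[required_acks - 1]


# Hypothetical RF=3 timings in ms: the leader writes locally, one follower
# answers after a cross-AZ round trip, the other is stuck in a GC pause.
timings = [0.5, 2.0, 150.0]

# acks=all with all three replicas in sync: wait for every replica.
all_replicas = commit_latency_ms(timings, required_acks=len(timings))

# Raft-style majority quorum: 2 of 3 replicas, leader included.
majority = commit_latency_ms(timings, required_acks=len(timings) // 2 + 1)

print(all_replicas, majority)  # 150.0 2.0
```

Under this model a single slow replica sits on the commit path for acks=all but not for a majority quorum. It only shows where a pause can hurt, not why the benchmarks linked above come out the way they do.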

enether · 2025-02-22T18:32:29

It's pretty safe. Kafka replicates to 3 nodes (no fsync) before the request is completed. What are the odds of all 3 nodes (running in different data centers) failing at the same time?

rockwotj · 2025-02-23T04:49:54

Generally if you care about safety then “pretty safe” doesn’t cut it.

enether · 2025-02-23T10:01:26

It's just my polite way of saying it's safe enough for most use cases and that you're wrong.

The fsync thing is complete FUD by Redpanda. They later introduce write caching[1] and call it an innovation[2]. I notice you also work for them.

Nevertheless, those that are super concerned with safety usually run with an RF of 5 (e.g. banks). And you can configure Kafka to fsync as often as you want[3].

[1] https://www.redpanda.com/blog/write-caching-performance-benc...
[2] https://www.linkedin.com/posts/timfox_oh-you-cant-make-this-...
[3] https://kafka.apache.org/documentation/#brokerconfigs_log.fl...

deniscoady · 2025-02-23T20:54:48

Disclaimer: I currently work for Redpanda.

> It's just my polite way of saying it's safe enough for most use cases and that you're wrong.

Low-volume data can be some of the most valuable data on the planet. Think SEC reporting (EDGAR), law changes (Federal Register), court judgements (PACER), new cybersecurity vulnerabilities (CVEs), etc. Missing one record can be detrimental if it's the one record that matters.

Does everyone need durability by default? Probably not, but Redpanda users get it for free because there is a product philosophy of default-safe behavior that aligns with user expectations. Most folks don't even know how this stuff works, so why not protect them when possible?

> The fsync thing is complete FUD by Redpanda.

You want durability? Pay the `fsync()` cost. Otherwise, recognize that acknowledgement and durability are decoupled and that the data is sitting in unsafe, volatile memory for a bit.
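
A minimal Python sketch of that trade-off, assuming a single local append-only file as a stand-in for a log segment; `append_record` and `segment.log` are hypothetical names, and this is not how Kafka or Redpanda structure their storage. The only difference between the two modes is whether `os.fsync` runs before the acknowledgement is returned.

```python
import os
import tempfile


def append_record(path, record, fsync=True):
    """Append one record to a log file, then return an acknowledgement.

    fsync=True: os.fsync forces the data to stable storage before the ack.
    fsync=False: the ack may be returned while the data still sits only in
    the OS page cache (volatile memory), so a power loss can drop it.
    Note: a fully durable *new* file would also need its parent directory
    fsynced; this sketch skips that step.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        data = record + b"\n"
        written = 0
        while written < len(data):  # os.write may write only part of the buffer
            written += os.write(fd, data[written:])
        if fsync:
            os.fsync(fd)  # the cost you pay for durability
    finally:
        os.close(fd)
    return "ack"


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "segment.log")
    append_record(log_path, b"durable", fsync=True)
    append_record(log_path, b"fast", fsync=False)
    with open(log_path, "rb") as f:
        contents = f.read()

print(contents)  # b'durable\nfast\n'
```

Reading the file back shows both records either way; the difference only appears if the machine loses power before the OS flushes its page cache, which is the window where an acknowledged but un-fsynced write can be lost.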

> They later introduce write caching[1] and call it an innovation[2].

There are legitimate cases where customers don't care about durability and want the fastest possible system. We heard from these folks and responded with a feature they can selectively opt in to for that behavior, _knowing the risks_. Again, the idea is to be safer by default and allow folks to opt in to riskier behaviors.

> those that are super concerned with safety usually run with an RF of 5 (e.g. banks)

Going above RF=3 does not guarantee "more nines", since you also need more independent server racks, independent power supplies or UPSs, etc.; otherwise you're just pigeonholing yourself. That greatly drives up costs. Disks and durability are just cheaper and simpler. Worst case, you pull the drives and recover the data from them: not fun and not easy, but possible, unlike with in-memory copies.
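
The "more nines" argument can be made concrete with a toy probability model in Python. It is an assumption-laden sketch, not data from any real deployment: it treats the chance that one replica loses its un-flushed data in a given window (`p_node`) as independent across replicas, plus a separate chance (`p_correlated`) that a single shared fault (a power feed, rack, or UPS) hits all replicas at once. The function `p_all_replicas_lost` and both probability values are hypothetical.

```python
def p_all_replicas_lost(rf, p_node, p_correlated):
    """Probability that every replica loses un-flushed data in the same window.

    Toy model: with probability p_correlated a shared cause takes out all
    replicas at once; otherwise each of the rf replicas fails independently
    with probability p_node.
    """
    return p_correlated + (1 - p_correlated) * p_node ** rf


# Hypothetical per-window probabilities, chosen only for illustration.
P_NODE = 1e-3
P_CORRELATED = 1e-6

for rf in (3, 5):
    independent = p_all_replicas_lost(rf, P_NODE, 0.0)
    shared = p_all_replicas_lost(rf, P_NODE, P_CORRELATED)
    print(f"RF={rf}: independent only={independent:.1e}, with shared fault={shared:.1e}")
```

With these made-up numbers, going from RF=3 to RF=5 shrinks the independent term from about 1e-9 to about 1e-15, but the total stays at roughly 1e-6 because the shared-fault term dominates. Extra replicas only buy nines if their failure domains are actually independent.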

> And you can configure Kafka to fsync as often as you want[3]

Absolutely! But nobody changes the default, which is the issue: new users' expectations are not aligned with the actual behavior. The same thing happened during the early MongoDB days. Either there needs to be better documentation and education so people understand what the durability guarantees actually are, or the defaults need to change.

enether · 2025-02-23T21:59:29

I agree that data can be valuable and even one record loss can be catastrophic.

I agree that there needs to be better documentation.

I just don't agree that losing 3 replicas, each living in a different DC, at once is a realistic concern. Those truly concerned about this issue would do one of two things: run RF>3 (yes, it costs more) or set up a disaster recovery strategy (e.g. run in multiple regions; yes, that costs more too).

Because, truth be told, losing 3 AZs at once is a disaster. And even if you had durably persisted to disk, all 3 disks may have become corrupt anyway.

agallego · 2025-02-23T14:04:01

It is not FUD. It is deterministic and reproducible on your laptop. Out of all the banks I work with, only a handful of use cases use RF=5. Defaults matter, because most people do not change them.

enether · 2025-02-23T22:00:17

Defaults do matter, in principle. But I think this particular risk is overblown; see my other reply for my thoughts.