Redpanda is much leaner and scales much better for low-latency use cases. It uses kernel-bypass and zero-copy mechanisms to deliver low latency. Being written in C++ means it can fit into a much smaller footprint than Apache Kafka for a similar workload.
Those are all good points in favor of Redpanda over Kafka, but my question still stands. Isn't Redpanda designed for high-volume scale, similar to the use cases for Kafka, rather than the low-volume workloads discussed in the article?
In Kafka, if you require the highest durability for messages, you configure multiple nodes on different hosts, and probably in different data centres, and you require acks=all. I'd say this is what pushes latency up, rather than the code execution of Kafka itself.
How does Redpanda compare under those constraints?
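For reference, the "highest durability" setup described above can be sketched in terms of the standard Kafka setting names (`acks`, `enable.idempotence`, `min.insync.replicas`). The values and the helper `acked_copies` are illustrative assumptions, not anything from the article. Note that "copies" here means replicas that have received the record, not replicas that have fsynced it.

```python
# Illustrative settings only; the names are standard Kafka configs,
# the values are assumptions chosen for this sketch.
PRODUCER = {"acks": "all", "enable.idempotence": "true"}
TOPIC = {"min.insync.replicas": 2}
REPLICATION_FACTOR = 3


def acked_copies(producer, topic, rf):
    """Minimum number of replicas that hold a record when the producer is acked.

    Hypothetical helper: with acks=all the ack waits for every in-sync replica,
    and min.insync.replicas is the lower bound on how many of those there are.
    """
    acks = str(producer["acks"])
    if acks in ("all", "-1"):
        return min(int(topic["min.insync.replicas"]), rf)
    return int(acks)  # acks=0 -> no copy guaranteed, acks=1 -> leader only


if __name__ == "__main__":
    print(acked_copies(PRODUCER, TOPIC, REPLICATION_FACTOR))
```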
It's pretty safe. Kafka replicates to 3 nodes (no fsync) before the request is completed. What are the odds of all 3 nodes (running in different data centers) failing at the same time?
It's just my polite way of saying it's safe enough for most use cases and that you're wrong.
The fsync thing is complete FUD by Redpanda. They later introduced write caching[1] and called it an innovation[2]. I notice you also work for them.
Nevertheless, those that are super concerned with safety usually run with an RF of 5 (e.g., banks).
And you can configure Kafka to fsync as often as you want[3].
> It's just my polite way of saying it's safe enough for most use cases and that you're wrong.
Low-volume data can be some of the most valuable data on the planet. Think SEC reporting (EDGAR), law changes (Federal Register), court judgements (PACER), new cybersecurity vulnerabilities (CVEs), etc. Missing one record can be detrimental if it's the one record that matters.
Does everyone need durability by default? Probably not, but Redpanda users get it for free because there is a product philosophy of default-safe behavior that aligns with user expectations - most folks don't even know how this stuff works, why not protect them when possible?
> The fsync thing is complete FUD by Redpanda.
You want durability? Pay the `fsync()` cost. Otherwise recognize that acknowledgement and durability are decoupled and that the data is sitting in unsafe volatile memory for a bit.
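To make the `fsync()` cost concrete, here is a minimal Python sketch (not taken from Kafka or Redpanda) that appends the same records with and without a sync after each write. The function name, file names, and record sizes are arbitrary assumptions. Actual timings depend heavily on the disk and filesystem, so no numbers are claimed here.

```python
import os
import tempfile
import time


def timed_appends(path, records, sync_each):
    """Append records to path, optionally fsyncing after each one.

    Returns the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    with open(path, "ab") as f:
        for rec in records:
            f.write(rec)
            if sync_each:
                f.flush()             # push Python's userspace buffer to the OS
                os.fsync(f.fileno())  # ask the OS to write its cache to the device
    return time.perf_counter() - start


if __name__ == "__main__":
    records = [b"x" * 1024] * 200
    with tempfile.TemporaryDirectory() as d:
        buffered = timed_appends(os.path.join(d, "buffered.log"), records, False)
        synced = timed_appends(os.path.join(d, "synced.log"), records, True)
    print(f"buffered: {buffered:.4f}s  fsync per record: {synced:.4f}s")
```

Without the sync, a successful `write` only means the bytes reached the OS page cache, which is the "acknowledged but not yet durable" window being argued about.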
> They later introduced write caching[1] and called it an innovation[2].
There are legitimate cases where customers don't care about durability and want the fastest possible system. We heard from these folks and responded with a feature they can selectively opt in to _knowing the risks_. Again, the idea is to be safer by default and allow folks to opt in to riskier behaviors.
> those that are super concerned with safety usually run with an RF of 5 (e.g., banks)
Going above RF=3 does not guarantee "more nines" since you need more independent server racks, independent power supplies or UPSs, etc., otherwise you're just pigeonholing yourself. This greatly drives up costs. Disk-based durability is just cheaper and simpler. Worst case, you pull the drives and recover the data from them: not fun and not easy, but possible, unlike with in-memory copies.
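The independence point can be shown with back-of-the-envelope arithmetic. The sketch below is a toy model, not a real reliability analysis: `p_node` and `p_shared` are made-up probabilities, and the function name is hypothetical. It assumes each replica fails independently with probability `p_node`, plus a shared-domain event (same rack, same power feed) that takes out every replica at once with probability `p_shared`.

```python
def p_lose_all_replicas(rf, p_node, p_shared):
    """Probability that all rf replicas are lost in the same window.

    p_node:   per-replica independent failure probability (hypothetical)
    p_shared: probability of one event killing every replica (hypothetical)
    """
    return p_shared + (1 - p_shared) * p_node ** rf


if __name__ == "__main__":
    for p_shared in (0.0, 1e-6):
        for rf in (3, 5):
            p = p_lose_all_replicas(rf, p_node=1e-3, p_shared=p_shared)
            print(f"p_shared={p_shared:g} RF={rf}: {p:.3e}")
```

Under these assumed numbers, raising RF from 3 to 5 helps enormously when failures are truly independent, and barely at all once a shared failure domain dominates.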
> And you can configure Kafka to fsync as often as you want[3]
Absolutely! But nobody changes the default, which is the issue: expectations of new users are not aligned with actual behavior. The same thing happened in the early MongoDB days. Either there needs to be better documentation/education so people understand what the durability guarantees actually are, or the defaults need to change.
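For anyone wondering what changing the default would look like: Kafka's broker settings `log.flush.interval.messages` and `log.flush.interval.ms` control how often the log is explicitly flushed to disk (topic-level equivalents are `flush.messages` and `flush.ms`). The sketch below just renders a `server.properties` fragment. The helper name and the specific values are illustrative assumptions, not recommendations.

```python
def render_properties(settings):
    """Render a dict as Java-style .properties lines (hypothetical helper)."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


# Flush after every single message: strongest durability, highest latency.
FSYNC_EVERY_MESSAGE = {"log.flush.interval.messages": 1}

# Flush on a message-count or time bound; the numbers are made up.
FSYNC_PERIODIC = {
    "log.flush.interval.messages": 10000,
    "log.flush.interval.ms": 1000,
}

if __name__ == "__main__":
    print(render_properties(FSYNC_EVERY_MESSAGE))
    print(render_properties(FSYNC_PERIODIC))
```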
I agree that data can be valuable and even one record loss can be catastrophic.
I agree that there needs to be better documentation.
I just don't agree that losing 3 replicas, each living in a different DC, all at once is a realistic concern. The ones that would truly be concerned about this issue would do one of two things: run RF>3 (yes, it costs more) or set up some disaster recovery strategy (e.g., run in multiple regions; yes, that costs more).
Because truth be told - losing 3 AZs at once is a disaster. And even if you durably persisted to disk - all 3 disks may have become corrupt anyway.
It is not FUD. It is deterministic. Reproducible on your laptop. Out of all the banks I work with only a handful of use cases use rf=5. Defaults matter, because most people do not change them.