Looks good! Do you have any benchmarks against Debezium for CDC?


Not yet, but very soon. A few benefits of PeerDB vs. Debezium include: 1/ easier to set up and work with, with no dependence on Kafka, ZooKeeper, or Kafka Connect; 2/ a managed experience for CDC from PostgreSQL through our enterprise & hosted offerings; 3/ performance-wise, with the optimizations we are doing (parallelized initial loads, parallelized reading of slots, a leaner CDC signature on the target), I'm expecting PeerDB to be better, though I'm not sure by how much. Stay tuned for a future post on this :)
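
To give a flavor of the parallelized initial loads: the standard Postgres building block for this is snapshot export, where one transaction exports its snapshot and worker sessions attach to it, so each worker reads a disjoint slice of a table while all of them see the same consistent view. A rough sketch (the snapshot ID and the id-range split are purely illustrative, and the exact implementation inside PeerDB may differ):

    -- Coordinator session: open a repeatable-read transaction and export its snapshot.
    -- The snapshot stays valid only while this transaction remains open.
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT pg_export_snapshot();   -- e.g. returns '00000003-0000001B-1'

    -- Each worker session: attach to the same snapshot, then read its own slice.
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SET TRANSACTION SNAPSHOT '00000003-0000001B-1';
    SELECT * FROM public.orders WHERE id >= 1 AND id < 1000000;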


Congrats on your launch; it's great to see so much activity in the CDC field. A few comments on the comparison to Debezium (I've been its project lead for many years):

> no dependence on Kafka, ZooKeeper, or Kafka Connect

That's not required with Debezium either when using Debezium Server (and even with Kafka, ZooKeeper is obsolete nowadays anyway).

> managed experience for CDC from PostgreSQL through our enterprise & hosted offerings

There are several hosted offerings based on Debezium (one example being what we do at decodable.co).

> performance-wise, with the optimizations we are doing

I'd love to learn more about this. Debezium also supports parallel snapshotting, but I'm not clear on what exactly you mean by parallelized reading of slots and the CDC impact on targets. Looking forward to reading your blog post :)


Thanks for the reply above! Useful feedback and input for us.

> Looks like Debezium Server doesn't require Kafka or ZK for setup. However, it supports only message queues as sinks (targets). So to stream CDC from Postgres to a DWH, one needs to a) set up and manage messaging infra as part of their stack to capture CDC changes, and b) write and manage something that reads from the message queue and replays the changes to the target store (e.g. Snowflake, BigQuery). With PeerDB, you can skip these two steps: CREATE MIRROR can have targets that are queues, DWHs, or databases.
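
To make that last point concrete, here is a rough sketch of the shape this takes through PeerDB's SQL interface (peer names, connection values, and the table mapping are illustrative placeholders, and the exact option spelling may differ from the docs):

    -- Peers describe the source and target connections (values are placeholders)
    CREATE PEER source_pg FROM POSTGRES WITH (
      host = 'source.db.example.com', port = 5432,
      user = 'replicator', password = '...', database = 'prod'
    );

    -- Assuming a Snowflake peer named target_sf is defined the same way,
    -- a single MIRROR streams CDC straight into the warehouse, with no queue in between
    CREATE MIRROR orders_cdc FROM source_pg TO target_sf
      WITH TABLE MAPPING (public.orders:public.orders, public.customers:public.customers);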

> That is true; however, the ones we tried aren't very simple to work with. For example, Confluent was quite tricky: one has to set up a sink (to message queues), use another connector to move those changes to Snowflake, and use something else to normalize those changes into the final table. Overall, the number of moving parts was quite high. decodable.co might give a better experience, will give it a shot! :)

> On parallel snapshotting: very interesting, it looks like that's a recent feature Debezium added (March 2023); I missed that one. Regarding parallelized reading of slots, Postgres lets you read a single slot concurrently across two connections using two separate publications, where each publication filters a set of tables. Same here, we are also excited about the benchmarks and will keep you posted! :)
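
For anyone curious about the publication-splitting idea, the Postgres building blocks look roughly like this (table names are illustrative, and this sketch gives each reader its own slot and publication; the exact way we wire this up in PeerDB may differ):

    -- Split the tables across two publications
    CREATE PUBLICATION pub_set_a FOR TABLE public.orders, public.order_items;
    CREATE PUBLICATION pub_set_b FOR TABLE public.customers, public.payments;

    -- One logical replication slot per reader, using the built-in pgoutput plugin
    SELECT pg_create_logical_replication_slot('slot_a', 'pgoutput');
    SELECT pg_create_logical_replication_slot('slot_b', 'pgoutput');

    -- Each reader then streams over its own replication connection, e.g.:
    --   START_REPLICATION SLOT slot_a LOGICAL 0/0 (proto_version '1', publication_names 'pub_set_a')
    --   START_REPLICATION SLOT slot_b LOGICAL 0/0 (proto_version '1', publication_names 'pub_set_b')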

Thanks again!



