A while back I was on a team building a non-critical, low-volume application.
It basically just involves people sending a message (there is more to it than that).
The consultants said we had to use Kafka because the messages could come in really fast.
I said we should stick with Postgres.
No, they said, we really need Kafka to be able to handle this.
Then I went and spun up Postgres on my work laptop (nothing special) and got a loaner to act as a client.
I simulated about 300% more traffic than we had any chance of getting, and it worked fine (though it did tax my poor work laptop).
No, they said, we could not risk it; with Kafka we would be safe.
I took it to management, and Kafka won, because buzzword.
Now of course we have to write a process to feed the data into Postgres. After all, it's what everything else depends on.
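For what it's worth, the simulation doesn't have to be anything fancy; roughly this kind of thing (a sketch, not the actual test; the table, target rate, and connection details are all made up):

```python
# Rough load-generator sketch (hypothetical "messages" table, made-up rates).
# Requires: pip install psycopg2-binary, and a reachable Postgres instance.
import time
import psycopg2

TARGET_PER_SEC = 500  # well above the real expected traffic

conn = psycopg2.connect("dbname=loadtest user=test host=192.168.1.10")
conn.autocommit = True  # commit per message, like a real client would

with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS messages (
            id         bigserial PRIMARY KEY,
            sender     text NOT NULL,
            body       text NOT NULL,
            created_at timestamptz NOT NULL DEFAULT now()
        )
    """)

sent = 0
start = time.monotonic()
with conn.cursor() as cur:
    while time.monotonic() - start < 60:  # one-minute burst
        cur.execute(
            "INSERT INTO messages (sender, body) VALUES (%s, %s)",
            ("loadgen", "hello " + str(sent)),
        )
        sent += 1
        # crude pacing toward the target rate
        expected = (time.monotonic() - start) * TARGET_PER_SEC
        if sent > expected:
            time.sleep((sent - expected) / TARGET_PER_SEC)

print(f"inserted {sent} rows in {time.monotonic() - start:.1f}s")
```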
People tend to not realize how big "at scale" problems really are. Instead, anything at the edge of their experience becomes "at scale," and they reach for the tools they've read you're supposed to use in those situations. It makes sense; people don't know what they don't know.
And thus we have a world where people have business needs that could be powered by my low end laptop but solutions inspired by the megacorps.
The GHz aspect of Moore's law died over a decade ago, and I suppose it's fair to say most other scaling has slowed down too, but if you've got a job that is embarrassingly parallel, which a lot of these "big data" jobs are, and you're not paying attention, you'll badly underestimate how much progress there has been in the server space over the last 10 years even so. What was "big data" in 2011 can easily be "spin up a single 32-core instance with 1TB of RAM for a couple of hours" in 2021. And that's for "big data" beyond what my laptop comfortably handles.
I've been slowly wandering into a data science role, and I've been dealing with teams who are all kinds of concerned about whether or not we can handle the sheer, overwhelming volume of their (summary) data. "Well, let's see, how much data are we talking about?" "Oh, gosh, we could generate 200 or 300 megabytes a day." (Of uncompressed JSON.) Well, you know, if I have to bust out a second Raspberry Pi I'll be sure to charge it to your team.
The funny thing is that some of these teams have enough experience that they ought to know better. They are legitimately running cloud services with dozens of large nodes continually running at high utilization and chewing through gigabytes of whatever per second. In their own worlds they would absolutely know that a couple hundred megabytes is nothing. They'll often have known places in their stack where they burn through a few hundred megabytes in internal API calls or something, unnecessarily, and it will barely rise to the level of a P3 bug, quite legitimately so. But when they start thinking in terms of (someone else's) databases, it's like they haven't updated their sense of size since 2005.
You are not thinking enterprisey enough. Everything is Big Scale if you add enough layers, because all these overheads add up, to which the solution is of course more layers.
Were you advocating an endpoint + DB, or different apps directly writing into a shared DB? The latter is not really a good idea, for numerous reasons. Kafka is a (potentially overkill) replacement for REST or whatever, not for your DB.
You don’t want stream replication for scale, you want it for availability and durability. The number of times replication has saved my ass is too damn high.
And you want Kafka (or something like it) the moment you need two processes handling updates, which again you probably want for availability reasons, so a crash doesn't stop the stream.
You also don't catch deletes or updates with this setup. But there are a million DB-to-Kafka connectors that read the binlog or pretend to be a replica to handle this.
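On the Postgres side, "pretend to be a replica" usually means a logical replication client. A rough sketch with psycopg2 and the wal2json output plugin (the plugin has to be installed server-side; the slot name and connection details here are made up for illustration):

```python
# Minimal change-data-capture sketch: stream inserts/updates/deletes from
# Postgres via logical replication (assumes wal2json is installed server-side).
import psycopg2
import psycopg2.extras

conn = psycopg2.connect(
    "dbname=app user=replicator host=localhost",
    connection_factory=psycopg2.extras.LogicalReplicationConnection,
)
cur = conn.cursor()

# Create the slot once; comment this out on subsequent runs.
cur.create_replication_slot("cdc_slot", output_plugin="wal2json")
cur.start_replication(slot_name="cdc_slot", decode=True)

def handle(msg):
    # msg.payload is a JSON description of the change; from here you would
    # forward it to Kafka, another table, or wherever it needs to go.
    print(msg.payload)
    msg.cursor.send_feedback(flush_lsn=msg.data_start)  # acknowledge progress

cur.consume_stream(handle)  # blocks, invoking handle() per change
```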
>My answer to anyone who asks for kafka: Show me that you can't do what you need with a beefy Postgres.
Sorry, that's just a clickbait-y statement. I love Postgres, but try handling 100-500k rps of data coming in from various sources, reading and writing to it.
You are going to get bottlenecked on how many connections you can handle, and you will end up throwing pgBouncers on top of it.
Eventually you will run out of disk and start throwing more in.
Then you end up in VACUUM hell, all while having a single point of failure.
While I agree Kafka has its own issues, it is an amazing tool for a real scale problem.
I think anotherhue would agree that half a million write requests per second counts as a valid answer to "you can't do what you need with a beefy Postgres," but that is also a minority of situations.
It's just hard to know what people mean when they say "most people don't need to do this." I was sitting wondering about a similar scale (200-1000 rps), where I've had issues with scaling rabbitmq, and have been thinking about whether kafka might help.
Without context provided, you might think: "oh, here's somebody with kafka and postgres experience, saying that postgres has some other super powers I hadn't learned about yet. Maybe I need to go learn me some more postgres and see how it's possible."
It would be helpful for folks to provide generalized measures of scale. "Right tool for the job," sure, but in the case of postgres, it often feels like there are a lot of incredible capabilities lurking.
I don't know what's normal for day-to-day software engineers anymore. Was the parent comment describing 100-500 rps really "a minority of situations?" I'm sure it is for most businesses. But is it "the minority of situations" that software engineers are actively trying to solve in 2021? I have no clue.
Note superyesh was talking about 100 to 500 thousand requests per second. Your overall question stands, but the scale superyesh was talking about is very different and I am quite confident superyesh's scale is definitely in the minority.
Oops, yes, was omitting the intended "k", totally skewing the scale of infrastructure my comment was intending to describe. Very funny, ironic. Unfortunately I can no longer edit that comment.
I’m not sure if you’re omitting the k in your numbers, or missed it in the other comment? Do you mean 100-500 and 200-1000, or 100 000-500 000 and 200 000-1 000 000?
i love postgresql, but i would not use it to replace a rabbitmq instance -- one is an RDBMS, the other is a queue/event system.
"oh but psql can pretend to be kafka/rabbitmq!" -- sure, but then you need to add tooling to it, create libraries to handle it, and handle all the edge cases.
with rmq/kafka, there are already a bunch of tools to handle the exact case of a queue/event system.
I think having ad hoc query capabilities on your job queue/log (not possible with rabbit, and only possible with Kafka by running extra software like KSQL, and even then at the equivalent of a full-table-scan cost) is a benefit of using postgres that should not be overlooked. For monitoring and debugging message broker behavior, SQL is a very powerful tool.
I say this as someone squarely in the "bigger than can work with an RDBMS" message processing space, too: until you are at that level, you need less tooling and get a lot out of the box (e.g. read replicas, backups, very powerful why-is-it-on-fire diagnostics informed by decades of experience), and you generally get higher reliability with postgres as a broker.
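Concretely, the ad hoc debugging looks like ordinary SQL against the queue table. A sketch against a hypothetical jobs table (the schema and column names are made up; psycopg2 is just one way to run the queries):

```python
# Ad hoc monitoring of a hypothetical Postgres-backed job queue.
import psycopg2

conn = psycopg2.connect("dbname=app user=ops host=localhost")

with conn, conn.cursor() as cur:
    # Backlog by status: how many jobs are pending/running/failed, and how old?
    cur.execute("""
        SELECT status, count(*), min(created_at) AS oldest
        FROM jobs
        GROUP BY status
        ORDER BY count(*) DESC
    """)
    for status, count, oldest in cur.fetchall():
        print(status, count, oldest)

    # Which job types have been waiting the longest? (the kind of question
    # that is painful to answer from a broker without extra tooling)
    cur.execute("""
        SELECT job_type, count(*) AS pending,
               now() - min(created_at) AS max_wait
        FROM jobs
        WHERE status = 'pending'
        GROUP BY job_type
        ORDER BY max_wait DESC
        LIMIT 10
    """)
    for row in cur.fetchall():
        print(row)
```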
Yeah, once you do have to scale a relational database you're in for a world of pain. Band-aid after band-aid... I very much prefer to just start with Kafka already. At the very least you'll have a buffer to help you gain some time when the database struggles.
Kafka has its use case. Databases have theirs. You can make a DB do what Kafka does, but you also add in the programming overhead of getting the DB semantics correct to make an event system. When I see people saying "let's put the DB into Kafka" I make the exact same argument: you will spend more time making Kafka act like a database and getting the semantics right. Kafka is more of a data/event transportation system. A DB is an at-rest data store that lets you manipulate the data. Use them to their strengths or get crushed by weird edge cases.
I'd argue that having the event system semantics layered on top of a sql database is a big benefit when you have an immature product, since you have an incredibly powerful escape hatch to jump in and fire off a few queries to fix problems. Kafka's visibility for debugging is pretty poor in my experience.
My issues, typically, with layering an event system on top of a db are replication and ownership of that event. Kafka makes some very nice guarantees about making a best attempt to ensure only one process works on something at a time inside of a consumer group. You have to build a system in the db using locks and other things that are poor substitutes.
If you are having trouble debugging Kafka, you could use a connector to put the data into a database/file to debug, or a db streamer. You can also use the built-in CLI tools to scroll along. I have had very good luck with both of those for finding out what is going wrong. Also, Kafka will basically by default keep all the messages for the past 7 days, so you can play them back if you need to by moving the consumer offsets. If you are trying to use Kafka like a db and change messages on the fly, you will have a bad time of it. Kafka is meant to be a "here is something that happened, and some data" log. Changing that data after the fact would, in the Kafka world, be another event. In some types of systems that is a very desirable property (such as audit-heavy cultures, banks, medical, etc). Now, Kafka can also be a real pain if you are debugging and mess up a schema or produce a message that does not fit the consumer. Getting rid of that takes some painful CLI trickery, whereas in a db it is a delete call.
Also, Kafka is meant for distributed, event-based worker systems (typically some sort of microservice-style system). If you are early on, you are more than likely not building that yet. Just dumping something into a list in a table and polling on that list is a very effective way to go when you are early on, or maybe even forever. But once you add in replication and/or multiple clients looking at that same task list table, you will start to see the issues quickly.
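The simple version of that polling pattern looks roughly like this (a sketch; table and column names are made up, and FOR UPDATE SKIP LOCKED is what keeps two workers from grabbing the same row):

```python
# Minimal "table as a task list" worker: poll for unclaimed rows and lock them
# so concurrent workers don't process the same task (sketch; names made up).
import time
import psycopg2

conn = psycopg2.connect("dbname=app user=worker host=localhost")

def do_work(payload):
    print("processing", payload)  # your actual handler goes here

def claim_and_process():
    with conn, conn.cursor() as cur:  # one transaction per batch
        cur.execute("""
            SELECT id, payload
            FROM tasks
            WHERE status = 'pending'
            ORDER BY id
            LIMIT 10
            FOR UPDATE SKIP LOCKED
        """)
        rows = cur.fetchall()
        for task_id, payload in rows:
            do_work(payload)
            cur.execute(
                "UPDATE tasks SET status = 'done', finished_at = now() WHERE id = %s",
                (task_id,),
            )
        return len(rows)

while True:
    if claim_and_process() == 0:
        time.sleep(1)  # nothing pending, back off
```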
Use an event system like a db system and, yes, it will feel broken. Also vice versa. You can do it, but as you are finding out, those edge cases are a pain and make you feel like "bah, it's easier to do it this way." In some cases, yes. In your case you have a bad state in your event data and you are cleaning it up with some db calls. But what if instead you had an event that did that for you?
I do not disagree. Kafka is kind of a monster to configure fully and correctly. Once they get rid of ZooKeeper it may be nicer to spin up, more of a "set it and forget it" sort of thing like the others.
Just because you can do it with Postgres doesn't mean it is the best tool for the job.
Sometimes the restrictions placed on the user are just as important. Kafka presents a specific interface to the user that causes users to build their applications in a certain way.
While you can replicate almost all functionality of Kafka with Postgres (except for performance, but hardly anybody needs as much of it), we all know what we end up with when we set up Postgres and use it to integrate applications with each other.
If developers had discipline they could, of course, create tables with appendable logs of data, marked with a partition, that consumers could process from with basically the same guarantees as with Kafka.
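Sketched out, that looks roughly like the following: an append-only log table plus a per-group offsets table (all names made up, and this is only an approximation of Kafka's topic/partition/offset model, not an equivalent):

```python
# Rough Kafka-ish shape in Postgres: an append-only log plus per-group offsets.
# All table/column names here are made up for illustration.
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=app user=app host=localhost")

with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS event_log (
            id           bigserial PRIMARY KEY,   -- acts as the offset
            partition_id int   NOT NULL,          -- producer-chosen, e.g. hash(key) % N
            msg_key      text,
            payload      jsonb NOT NULL,
            created_at   timestamptz NOT NULL DEFAULT now()
        );
        CREATE TABLE IF NOT EXISTS consumer_offsets (
            group_name   text   NOT NULL,
            partition_id int    NOT NULL,
            last_id      bigint NOT NULL DEFAULT 0,
            PRIMARY KEY (group_name, partition_id)
        );
    """)

def produce(partition_id, key, payload):
    # Appending is just an INSERT. Caveat: sequence order is not strictly
    # commit order under concurrency, which is one of the details that
    # requires the "discipline" mentioned above.
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO event_log (partition_id, msg_key, payload) VALUES (%s, %s, %s)",
            (partition_id, key, Json(payload)),
        )

produce(0, "user-42", {"type": "signup"})
```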
We work with a client who has requested a Kafka cluster. They can't really say why, or what they need it for, but they now have a very large cluster which doesn't do much. I know why they want it; it's the same reason they "need" Kubernetes. So far they use it as a sort of message bus.
It's not that there's anything wrong with Kafka; it's a very good product and extremely robust. Same with Kubernetes: it has its uses and I can't fault anyone for having it as a consideration.
My problem is when people ignore how capable modern servers are, and when developers don't see the risks in building these highly complex systems when something much simpler would solve the same problem, only cheaper and safer.
I've also had much more success with "queues in postgres" than "queues in kafka". I'm sure the use cases exist where kafka works better, but you have to architect it so carefully, and there are so many ways to trip up. Whereas most of your team probably already understands an RDBMS, and you're probably already running one reliably.
I had a great time with Kafka for prototyping: being able to push data from a number of places, have multiple consumers able to connect, go back and forth through time, and add and remove independent consumer groups. It ran very reliably in pre-production too, for years.
But for a production-grade version of the system I'm going with SQL and, where needed, IaC-defined SQS.
I see posts like this a lot, and it makes me wonder what the heck you were using Kafka for that Postgres could handle, yet you had dozens of clusters? I question whether you actually ever used Kafka or just operated it. Sure, anyone can follow the "build a queue on a database" pattern, but it falls over at the throughputs that justify Kafka. If you have a bunch of trivial 10tps workloads, of course a distributed system is overkill.
They didn’t say that the Kafka clusters they personally ran could have been handled with Postgres instead.
They first gave their credentials by mentioning their experience.
Then they basically said ”given what I know about Kafka, with my experience, I require other people who ask for it to show me that they really need it before I accommodate them - often a beefy Postgres is enough”.
Not those other tools, because you generally can't achieve Postgres functionality with those other tools.
Postgres can pretend to be Redis, RabbitMQ, and Kafka, but redis, RabbitMQ, and Kafka would have a hard time pretending to be Postgres.
Postgres has the best database query language man has invented so far (AFAIK), well-reasoned persistence and semantics, as of recently partitioning features to boot, and lots of addons to support different use cases. Despite all this, Postgres is mostly Boring Technology (tm), easily available, and very actively developed in the open, with an enterprise base that does consulting first and usually upstreams improvements after some time (2nd Quadrant, EDB, Citus, TimescaleDB).
The other tools win on simplicity for some people (I'd take managing a PostgreSQL cluster over RMQ or Kafka any day), but for other things, especially feature-wise, Postgres (and its amalgamation of mostly-good-enough to great features) wins IMO.
> My answer to anyone who asks for kafka: Show me that you can't do what you need with a beefy Postgres.
...
You could probably hack any database to perform any task, but why would you? Use the right tool for the right task, not one tool for all tasks.
If the right tool is a relational database, then use Postgres/$OTHER-DATABASE
If the right tool is a distributed, partitioned, and replicated commit log service, then use Kafka/$OTHER-COMMIT-LOG
Not sure why people get so emotional about the technologies they know best. Sure, you could hack Postgres to be a replicated commit log, but I'm sure it'll be easier to just throw in Kafka instead.
Something I find peculiar about this truth is that Postgres itself is based on a write-ahead log structure for sequencing statements, replication, and audit logging.
It feels like two different lenses onto the same reality.
Can you kinda high level the setup/processes for making Postgres a replacement for Kafka? I've not attempted such a thing before, and wonder about things like expiration/autodeletion, etc. Does it need to be vacuumed often, and is that a problem?
Honest question, how do you expire content in Postgres? Every time I start to use it for ephemeral data I start to wonder if I should have used TimescaleDB or if I should be using something else...
You likely already know that TimescaleDB is an extension to PostgreSQL, so you get everything you'd get with PostgreSQL plus the added goodies of TimescaleDB. All that said, you can drop (or detach) data partitions in PostgreSQL (however you decide to partition...) Does that not do the trick for your use case, though? https://www.postgresql.org/docs/10/ddl-partitioning.html
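A minimal sketch of that approach, assuming a made-up events table partitioned by day (real setups would usually automate partition creation with something like pg_partman):

```python
# Expiring data by dropping time-range partitions (hypothetical table names).
import datetime
import psycopg2

conn = psycopg2.connect("dbname=app user=app host=localhost")

with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            id         bigserial,
            payload    jsonb,
            created_at timestamptz NOT NULL DEFAULT now()
        ) PARTITION BY RANGE (created_at)
    """)
    # One partition per day (in practice you'd create these ahead of time).
    today = datetime.date.today()
    for offset in range(0, 3):
        day = today + datetime.timedelta(days=offset)
        nxt = day + datetime.timedelta(days=1)
        cur.execute(
            f"CREATE TABLE IF NOT EXISTS events_{day:%Y%m%d} "
            f"PARTITION OF events FOR VALUES FROM ('{day}') TO ('{nxt}')"
        )

# "Expiration" is then just dropping whole partitions that fall outside the
# retention window: cheap, and no dead tuples for VACUUM to chew through.
cutoff = datetime.date.today() - datetime.timedelta(days=7)
with conn, conn.cursor() as cur:
    cur.execute(f"DROP TABLE IF EXISTS events_{cutoff:%Y%m%d}")
```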
Maybe something like https://github.com/citusdata/pg_cron or if it's not a rolling time window, just trigger functions that get called whenever some expression in SQL is true.
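For example, with pg_cron the retention job can live in the database itself; a tiny sketch (it assumes the extension is installed and preloaded on the server, and the table and retention window are made up):

```python
# Scheduling a nightly retention DELETE inside Postgres with pg_cron
# (extension must be installed and preloaded; table/window are made up).
import psycopg2

conn = psycopg2.connect("dbname=app user=app host=localhost")
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS pg_cron")
    # Every night at 03:00, delete anything older than 7 days.
    cur.execute(
        "SELECT cron.schedule(%s, %s)",
        ("0 3 * * *",
         "DELETE FROM events WHERE created_at < now() - interval '7 days'"),
    )
```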
using a sql db for push/pop semantics feels like using a hammer to squash a bug... how would you model queues & partitions with ordering guarantees with pg?
With transactions, and stored procedures if that helps ;). Redis also seems well suited to the use cases I've seen for Kafka. Kafka must have capabilities beyond those use cases, and I've sometimes wondered what they are.
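To make the "transactions" part concrete, a consumer loop might look roughly like this; a sketch that assumes a hypothetical event_log table with a bigserial id and a consumer_offsets table keyed by (group, partition), with the read and the offset bump done in one transaction so a crash never skips or double-commits a batch:

```python
# Consumer sketch: process events in id order for one partition, and move the
# group's offset forward in the same transaction (hypothetical table names).
import time
import psycopg2

conn = psycopg2.connect("dbname=app user=consumer host=localhost")
GROUP, PARTITION = "email-sender", 0

def handle(payload):
    print("got", payload)  # your handler goes here

def poll_once(batch_size=100):
    with conn, conn.cursor() as cur:  # one transaction: read, process, commit offset
        # FOR UPDATE on the offset row also serializes consumers in the same
        # group on this partition, loosely mimicking "one consumer per
        # partition per group".
        cur.execute(
            """SELECT last_id FROM consumer_offsets
               WHERE group_name = %s AND partition_id = %s FOR UPDATE""",
            (GROUP, PARTITION),
        )
        row = cur.fetchone()
        last_id = row[0] if row else 0
        cur.execute(
            """SELECT id, payload FROM event_log
               WHERE partition_id = %s AND id > %s
               ORDER BY id LIMIT %s""",
            (PARTITION, last_id, batch_size),
        )
        events = cur.fetchall()
        for event_id, payload in events:
            handle(payload)
            last_id = event_id
        cur.execute(
            """INSERT INTO consumer_offsets (group_name, partition_id, last_id)
               VALUES (%s, %s, %s)
               ON CONFLICT (group_name, partition_id)
               DO UPDATE SET last_id = EXCLUDED.last_id""",
            (GROUP, PARTITION, last_id),
        )
        return len(events)

while True:
    if poll_once() == 0:
        time.sleep(1)  # nothing new, back off
```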
It depends. Are your reads also writes (e.g. simulating a job queue with a "claimed" state update or a "select for update ... skip locked")? If not, 500m should be no sweat, even on crappy hardware with minimal tuning. If so, things get a little trickier, and you may have to introduce other tools or a clustering solution to get that throughput with postgres.
My answer to anyone who asks for kafka: Show me that you can't do what you need with a beefy Postgres.