We've been using the Ubicloud runner for a while at PeerDB[1]. Great value, and especially the ARM runners have helped get our CI costs down. The team is really responsive and added ARM runner support within a few weeks of us requesting it.
Author of the blog here: I'm a huge proponent of Rust, but in this project, while interfacing with BigQuery and Snowflake, Go proved to be the right choice. There weren't official drivers for these in Rust, and onboarding new devs in Go was generally easier. I also personally think async in Rust has a few more releases to go before I would consider it stable.
Not all of our code is in Go. PeerDB has multiple components (the workers, the UI, and the query layer), written in Rust, Go, and TypeScript.
While it would certainly be possible to package them into individual binaries, I found it significantly easier to define the stack in a Docker Compose file with the requisite environment setup.
Author of the blog here, curious what a better alternative would be in this context. The channel has to be passed around for the producer and consumer to interface with each other. Are there better patterns for this?
Not the parent, but I personally dislike it when Go libraries use channels in their public APIs, as it forces a specific concurrency model on the consumer; in particular, channels are quite slow, being protected by an internal mutex, so you're always paying for the overhead no matter if you need it or not.
You also have to be very careful about managing the channel lifecycle. If you're not pulling (selecting from) the channel, the library will be permanently stuck. So you must now have a way to tell the library to stop sending, and it must cancel any in-flight send operations if you call producer.Stop() or whatever. In my experience libraries often have bugs in their channel code. It's far too easy to get deadlocks with channels that have interdependencies, and you have to be very careful about buffered versus unbuffered channels, as they behave differently.
A better API, in my opinion, is to offer a callback or single-method interface. Then the implementer of that callback or interface can choose to use channels internally if they desire, or they can use something else. You get the same backpressure support since you can treat it as synchronous.
After all, a channel's send interface is essentially just:
    type Channel[T any] interface {
        Send(T)
    }
But a "chan T" doesn't offer this flexibility.
My rule of thumb for channels is that they're goroutine glue, not an API primitive. Build APIs out of interfaces, not channels. The only thing that uses channels should be the one that's controlling the goroutines, because it's the thing that orchestrates them.
That said, it's not a hard rule. There are places where channels may have their place in a public API, though I'm not sure I can think of any examples off-hand.
I think it is a matter of preference. For me personally I use raw channels and goroutines all day every day and I really like using them. Channels are a core primitive in golang so I think it is worth getting familiar with them.
As you say being able to select is really nice too.
> as it forces a specific concurrency model on the consumer
I find the argument above unconvincing: when your program is in Go, you've already picked a side; the concurrency model in question has already been chosen by the user.
We're not talking about some arbitrary concurrency model; we're talking about channel-based synchronization and communication in Go. If you don't want that and consider it an issue, you shouldn't be using Go in the first place.
Looks like the channel field is private in CDCRecordStream, but exposed by GetRecords. The callers mostly loop over Record objects. [1]
If I wanted to encapsulate iterating over a channel of Records, maybe it would be something like Go's io.Pipe function [2], which returns a PipeReader and PipeWriter? Except that it would work on Records rather than byte streams.
I don't have enough context to know if the extra encapsulation is a good idea in this case, though.
Because then you can either consume or produce, not both at the same time: you're either reading from a stream of data or writing to it. Using goroutines to separate these lets you do both concurrently, acting as soon as data is available on the channel or you receive the signal to stop.
To get higher throughput we would need one goroutine to pull from the replication slot while the other is pushing to the target. The idea is to keep the Postgres connection useful and reading the slot while also pushing to the target asynchronously.
If you're on Linux and pg15, you can install it with https://github.com/tembo-io/trunk. It is only tested on pg15 but we are actively working to increase OS and Postgres version support. `trunk install pgmq` should install with the pg_config on your path by default.
Hi there, I’m Kaushik, one of the co-founders of PeerDB. PeerDB doesn’t handle schema changes today.
For CDC, the change stream does give us events when schema changes happen; we would have to replay them on the destination. Schema changes on the destination are not supported; the general recommendation is to build new tables / views and let PeerDB manage the destination table.
For streaming the results of a query, as long as the query itself can still execute (say a few columns were added, or untouched columns were edited), the mirror job will continue to run. If that's not the case, some manual intervention will be needed to account for the schema changes.
Thanks for the question, this is a requested feature and on our roadmap.
Calling out limitations like this in the documentation would go a long way in building confidence in the project. Better yet, if there's an example of how to deal with "day-2" operational concerns like this.
Simply looking at the docs on these two pages, it's unclear to me whether there's a way to update the mirror definition when a schema change occurs, or if I need to drop and recreate the mirror (and what the effects of that are on the destination):
Thanks for the feedback, and I agree on making these missing features more visible in our documentation! We did document it here - https://docs.peerdb.io/usecases/Real-time%20CDC/postgres-to-... but will make it more visible soon, i.e. in the streaming query, CDC, and CREATE MIRROR docs. We were thinking of something along the lines of ALTER MIRROR, or a new OPTION in CREATE MIRROR that would automatically pick up schema changes. The exact spec is not yet finalized.
A lot of the power that Flutter has comes from hot reload; Dart + JIT powers this. I noticed that SwiftUI does this too. I wonder if SwiftUI composites and paints onto an underlying CALayer just like Flutter does.
[1] https://github.com/PeerDB-io/peerdb