We've been using the Ubicloud runner for a while at PeerDB[1]. Great value, and especially the ARM runners have helped get our CI costs down. The team is really responsive and added ARM runner support within a few weeks of us requesting it.
Author of the blog here: I'm a huge proponent of Rust, but in this project, while interfacing with BigQuery and Snowflake, Go proved to be the right choice. There weren't official drivers for these in Rust, and onboarding new devs in Go was generally easier. I also personally think async in Rust has a few more releases to go before I would consider it stable.
Not all of our code is in Go. PeerDB has multiple components (the workers, the UI, and the query layer), written in Rust, Go, and TypeScript.
While it would certainly be possible to package them into individual binaries, I found it significantly easier to define the stack in a Docker Compose file with the requisite environment setup.
Author of the blog here, curious what a better alternative would be in this context. The channel has to be passed around for the producer and consumer to interface with each other. Are there better patterns for this?
Not the parent, but I personally dislike it when Go libraries use channels in their public APIs, as it forces a specific concurrency model on the consumer; in particular, channels are quite slow, being protected by an internal mutex, so you're always paying for the overhead no matter if you need it or not.
You also have to be very careful about managing the channel lifecycle. If you're not pulling (selecting from) the channel, the library will be permanently stuck. So you must now have a way to tell the library to stop sending, and it must cancel any in-flight send operations if you call producer.Stop() or whatever. In my experience libraries often have bugs in their channel code. It's far too easy to get deadlocks with channels that have interdependencies, and you have to be very careful about buffered versus unbuffered channels, as they behave differently.
A better API, in my opinion, is to offer a callback or single-method interface. Then the implementer of that callback or interface can choose to use channels internally if they desire, or they can use something else. You get the same backpressure support since you can treat it as synchronous.
After all, a channel's send interface is essentially just:
    type Channel[T any] interface {
        Send(T)
    }
But a "chan T" doesn't offer this flexibility.
My rule of thumb for channels is that they're goroutine glue, not an API primitive. Build APIs out of interfaces, not channels. The only thing that uses channels should be the one that's controlling the goroutines, because it's the thing that orchestrates them.
That said, it's not a hard rule. There are places where channels may have their place in a public API, though I'm not sure I can think of any examples off-hand.
I think it is a matter of preference. For me personally I use raw channels and goroutines all day every day and I really like using them. Channels are a core primitive in golang so I think it is worth getting familiar with them.
As you say being able to select is really nice too.
> as it forces a specific concurrency model on the consumer
I find the argument above unconvincing: when your program is in Go, you've already picked a side; the concurrency model in question has already been chosen by the user.
We're not talking about some arbitrary concurrency model; we're talking about channel-based synchronization and communication in Go. If you don't want that and consider it an issue, you shouldn't be using Go in the first place.
Looks like the channel field is private in CDCRecordStream, but exposed by GetRecords. The callers mostly loop over Record objects. [1]
If I wanted to encapsulate iterating over a channel of Records, maybe it would be something like Go's io.Pipe function [2], which returns a PipeReader and PipeWriter? Except that it would work on Records rather than byte streams.
I don't have enough context to know if the extra encapsulation is a good idea in this case, though.
Because then you can either consume or produce, not both at the same time: you're either reading from a stream of data or writing to it. Using goroutines to separate these lets you do both concurrently, acting as soon as data is available on the channel or you receive the signal to stop.
To get higher throughput we would need one goroutine to pull from the replication slot while the other is pushing to the target. The idea is to keep the Postgres connection useful and reading the slot while also pushing to the target asynchronously.
If you're on Linux and pg15, you can install it with https://github.com/tembo-io/trunk. It is only tested on pg15 but we are actively working to increase OS and Postgres version support. `trunk install pgmq` should install with the pg_config on your path by default.
Hi there, I’m Kaushik, one of the co-founders of PeerDB. PeerDB doesn’t handle schema changes today.
For CDC, the change stream does give us events when schema changes happen; we would have to replay them on the destination. Schema changes on the destination are not supported; the general recommendation is to build new tables / views and let PeerDB manage the destination table.
For streaming the results of a query, as long as the query itself can still execute (say a few columns were added, or untouched columns were edited), the mirror job will continue to run. If that's not the case, some manual intervention will be needed to account for the schema changes.
Thanks for the question, this is a requested feature and on our roadmap.
Calling out limitations like this in the documentation would go a long way in building confidence in the project. Better yet, if there's an example of how to deal with "day-2" operational concerns like this.
Simply looking at the docs on these two pages, it's unclear to me whether there's a way to update the mirror definition when a schema change occurs, or if I need to drop and recreate the mirror (and what the effects of that are on the destination):
Thanks for the feedback, and I agree on making these missing features more visible in our documentation! We did document it here - https://docs.peerdb.io/usecases/Real-time%20CDC/postgres-to-... but will make it more visible soon, i.e. in the streaming query, CDC, and CREATE MIRROR docs. We were thinking of something along the lines of ALTER MIRROR, or a new OPTION in CREATE MIRROR that would automatically pick up schema changes. The exact spec is not yet finalized.
A lot of the power that Flutter has comes from hot reload; Dart + JIT powers this. I noticed that SwiftUI does this too. I wonder if SwiftUI composites and paints onto an underlying CALayer just like Flutter does.
[1] https://github.com/PeerDB-io/peerdb