I've discovered that, by default, Cloud Run only allocates one core per HTTP request, and only while the request is executing [0].
So your app runs at 0 CPU when there are no requests in progress, and can't use more than one process in the same container. That means you can't have a container with nginx+rails: only nginx gets a core to execute on, rails gets none, and the request times out.
Maybe you should ensure your app is not trying to use a second process?
Only allocating CPU while a request is in progress, and thus only paying for that time, is Cloud Run's defining feature, so that part makes sense.
But the problems with having multiple processes are weird. I'd have expected patterns where the HTTP server blocks while waiting for a helper process to work. That sounds like a rather annoying limitation, and I saw no mention of it in the documentation.
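For what it's worth, the pattern I had in mind looks roughly like the sketch below: the handler blocks on a helper process, so all of the work happens while the request is still in flight. The names `run_helper` and `handle_request` and the child command are made up for illustration; none of this comes from the Cloud Run docs.

```python
import subprocess
import sys


def run_helper(payload: str) -> str:
    """Run a short-lived helper process and block until it exits."""
    # Hypothetical helper: a child Python process that upper-cases its argument.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1].upper())", payload],
        capture_output=True,
        text=True,
        check=True,
        timeout=10,
    )
    return result.stdout.strip()


def handle_request(body: str) -> tuple[int, str]:
    """Stand-in for an HTTP handler: it does not return until the helper is done."""
    try:
        return 200, run_helper(body)
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return 500, "helper failed"
```

If something like this times out on Cloud Run, that would point at a real multi-process limitation rather than a misattribution.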
Can you double check if there is really such a limitation and not a misattribution?
Maybe our ORM or Postgres connector runs a process. We ourselves are not directly running multiple processes.
This feels like a leaky abstraction that defeats the purpose of container-as-a-service, especially when the intended audience is hobbyists and small teams who don't have a networking SRE, much less the time to keep track of fragile infrastructure assumptions.
If we have to stick with GCP, then it seems GCE VMs are a safer bet.
He actually called, and held regular "defense councils" (with generals, etc.) to deal with the Covid situation.
Was it because they are less compromised by industry than public health officials? Or just because it's popular with voters? Your guess is as good as mine.
If you can afford a one-off second of latency on your SQL queries, then using logical replication with pgbouncer seems way easier:
- Set up logical replication between the old and the new server (limitations exist on what is replicated, read the docs)
- PAUSE the pgbouncer (virtual) database. Your app will hang, but not disconnect from pgbouncer
- Copy the sequences from the old to new server. Sequences are not replicated with logical replication
- RESUME the pgbouncer virtual database.
You're done. If everything is automated, your app will see a temporary increase in SQL latency, but it will keep its TCP connections, so there is virtually no outage.
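The PAUSE / copy sequences / RESUME part can be sketched roughly as below. This is only an outline under some assumptions: `pgbouncer_admin`, `read_sequences` and `run_on_new` are hypothetical callables you would supply yourself (for example, `read_sequences` could query the `pg_sequences` view on the old server), and pointing pgbouncer at the new server is not shown.

```python
from typing import Callable, Mapping, Optional


def setval_statements(last_values: Mapping[str, Optional[int]]) -> list[str]:
    """Build one setval() call per sequence from {qualified_name: last_value}."""
    statements = []
    for name, value in sorted(last_values.items()):
        if value is None:
            continue  # sequence was never used on the old server
        quoted = name.replace("'", "''")  # escape quotes inside the SQL string literal
        statements.append(f"SELECT setval('{quoted}', {int(value)}, true);")
    return statements


def cutover(
    database: str,
    pgbouncer_admin: Callable[[str], None],
    read_sequences: Callable[[], Mapping[str, Optional[int]]],
    run_on_new: Callable[[str], None],
) -> None:
    """PAUSE the pgbouncer database, copy sequences, then RESUME."""
    pgbouncer_admin(f"PAUSE {database};")
    try:
        for statement in setval_statements(read_sequences()):
            run_on_new(statement)
        # Not shown: repointing the pgbouncer database at the new server.
    finally:
        # Always RESUME, so the app is never left hanging if a step fails.
        pgbouncer_admin(f"RESUME {database};")
```

Putting RESUME in a `finally` is a design choice here: if the sequence copy fails, clients go back to the old server instead of staying paused.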
You can temporarily reduce query timeout to a smaller setting as part of the automated failover. The long running transactions will fail but you can minimize the window where you can't talk to postgres
Not really, new connections will block as it's pausing. But you won't be able to shut down Postgres until those long queries complete. Perhaps I was not super clear, but what I'm trying to say is that PAUSE is not instantaneous.
Yeah, what I'm saying is that you can only pause as fast as your slowest in-flight query. So if you have a diverse set of query patterns, you could be waiting for a really small percentage of slow queries to wrap up.
To be fair to this page, this was used to migrate versions of postgres __prior__ to the introduction of logical replication. Logical replication makes this significantly easier (i.e. you no longer need the triggers).
Exactly this. The OP’s approach reminded me so much of the days of Slony, and I wondered why a simpler approach with logical replication would not just suffice.
Rather than pgbouncer, I did this in the actual application code once (write to both databases at the same time, once everything is in sync and you’re confident the new server works well, fail over to the new one only), but it depends upon how much control you can exercise over the application code.
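A toy version of that dual-write idea, to make it concrete. It uses in-memory `sqlite3` connections purely as stand-ins for the old and new Postgres servers, and the `DualWriter` class is a hypothetical name, not what I actually ran in production.

```python
import sqlite3


class DualWriter:
    """Send every write to both databases until fail_over(), then only to the new one."""

    def __init__(self, old: sqlite3.Connection, new: sqlite3.Connection) -> None:
        self.new = new
        self.targets = [old, new]  # databases that receive writes
        self.active = old          # database that serves reads

    def write(self, sql: str, params: tuple = ()) -> None:
        # Naive on purpose: the writes are not atomic across databases, so a
        # failure between them leaves the two out of sync until reconciled.
        for conn in self.targets:
            conn.execute(sql, params)
            conn.commit()

    def read(self, sql: str, params: tuple = ()) -> list:
        return self.active.execute(sql, params).fetchall()

    def fail_over(self) -> None:
        """Switch reads and writes to the new database only."""
        self.targets = [self.new]
        self.active = self.new
```

The hard part in real life is everything this skips: backfilling existing rows, and verifying the two sides actually match before calling `fail_over()`.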
Any approach that is based on triggers makes me shiver, however.
This is precisely the migration I'm planning on doing in the next few weeks with pglogical under the hood for replication. Seems like the atomic switch is much easier than any sort of problem that could stem from conflict or data duplication errors while in a bi-directional replication strategy.
Yep, you can also prepare the new database by using a snapshot of the primary's volume, and use pg_rewind to get them in sync. Altogether the tooling can make migrations super easy with minimal downtime.
I use pgbouncer and had no idea it supported logical replication. I can't find anything about it in the docs. Do you have something you can link me to so I can read more?
Which is only possible if you are using a version of postgres that is new enough and isn't restricted, as some versions on RDS are. Which explains the whole original post.
[0] https://cloud.google.com/run/docs/configuring/cpu-allocation