Also reading your reply I get the impression that "sync mechanism between redis and postgress" was the bottleneck. Wondering if you can add some details around it and also was this something that can't be fixed by fine tuning redis config, rather than completely removing it from your stack.
There were many problems but at the core of it, this was us having redis write huge amounts of data to disk very frequently causing this.
We could not reduce the frequency (product would not allow) and we couldn't find a way to make the writes reliably fast.
I like to believe there exists a possible way of handling this, but point being, our team had no time to find out, how redis works internally and have confidence that the new way won't bring up new surprises.
Also reading your reply I get the impression that "sync mechanism between redis and postgress" was the bottleneck. Wondering if you can add some details around it and also was this something that can't be fixed by fine tuning redis config, rather than completely removing it from your stack.