Postgres can have write-amplification issues for update-heavy workloads where at least one of the updated fields is covered by an index. Famously, this is what triggered Uber's switch to MySQL. The impact is noticeable in one of my projects as well, and that project isn't even that large (a few dozen million rows).
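To illustrate the mechanism with a made-up schema (table and column names are purely for illustration): an UPDATE that changes any indexed column has to create a new entry in every index on the table for the new row version, while an UPDATE that only touches unindexed columns can often be done as a HOT (heap-only tuple) update and skip the index writes entirely.

    -- Hypothetical table, for illustration only.
    CREATE TABLE events (
        id         bigserial PRIMARY KEY,
        payload    text,
        updated_at timestamptz
    );
    CREATE INDEX events_updated_at_idx ON events (updated_at);

    -- Changes an indexed column: both indexes get a new entry for the new
    -- row version, on top of the new heap tuple and the WAL for all of it.
    -- This is the write amplification.
    UPDATE events SET updated_at = now() WHERE id = 1;

    -- Changes only an unindexed column: eligible for a HOT update, which
    -- skips the index writes (provided there is room on the same page).
    UPDATE events SET payload = 'updated' WHERE id = 1;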
Not trying to doubt your experience, just collecting datapoints for improvements. I'd appreciate it if you could describe what your problems around write amplification are. Is it the on-disk bloat, the volume of writes, or the WAL volume?
This is a great question. I have to say I haven't profiled Postgres to that level of granularity, so I am making an assumption about why my updates are slow. What I notice is relatively slow UPDATE performance, with high total disk writes and a write-ahead log that seems large. The bloat on disk isn't an issue for me (or, rather, I have started using lower fillfactors to reduce the variability in DB size as the rows are updated).
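For reference, this is roughly what the fillfactor change looks like (table name and value here are illustrative, not my actual schema): a lower fillfactor leaves free space in each heap page, so updated row versions can stay on the same page and qualify as HOT updates.

    -- Leave roughly 30% of each heap page free for updated row versions.
    ALTER TABLE my_table SET (fillfactor = 70);

    -- Only newly written pages honor the new setting; existing pages are
    -- repacked only by a table rewrite, e.g.:
    VACUUM FULL my_table;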
Do you have a favored resource that you can direct me to that would help me identify the factors that you cited and ultimately allow me to make more nuanced criticisms (or to find a fixable error in my approach, which would of course be the ideal result)?
> I have to say I haven't profiled Postgres to that level of granularity, so I am making an assumption about why my updates are slow. What I notice is relatively slow UPDATE performance, with high total disk writes and a write-ahead log that seems large.
Hm, those could be quite independent things. Could you check what your checkpoint_segments (pre-9.5) / max_wal_size (9.5+) and checkpoint_timeout settings are? Having very frequent checkpoints will trigger a lot of superfluous writes: both the checkpointing itself and an increased percentage of full-page writes (to prevent torn pages).
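For example, something like this shows the relevant settings in one go (checkpoint_segments only exists before 9.5 and max_wal_size from 9.5 on, so one of those two rows will simply be missing):

    -- Current checkpoint-related settings.
    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN ('checkpoint_segments',            -- < 9.5
                   'max_wal_size',                   -- >= 9.5
                   'checkpoint_timeout',
                   'checkpoint_completion_target',
                   'full_page_writes');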
When you say your write-ahead log is large, do you mean the pg_xlog directory, or the amount of writes to it?
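If it helps to distinguish the two, something along these lines works on a pre-10 server (where the directory is still called pg_xlog; pg_ls_dir needs superuser, and the 16 MB figure assumes the default WAL segment size):

    -- Rough on-disk size of pg_xlog: count the WAL segment files.
    SELECT count(*) AS segments,
           pg_size_pretty(count(*) * 16 * 1024 * 1024) AS approx_size
    FROM pg_ls_dir('pg_xlog') AS f
    WHERE f ~ '^[0-9A-F]{24}$';

    -- Amount of WAL being generated: sample this twice and diff it
    -- (pg_xlog_location_diff() gives the byte difference between samples).
    SELECT pg_current_xlog_insert_location();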
Another common reason for slowness is foreign keys where you haven't indexed both sides. E.g. if there's a foreign key referencing the table you're updating, and the referencing columns aren't indexed, you'll end up with sequential scans...
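A made-up example of that pattern (names are illustrative):

    -- child.parent_id references parent(id). When a parent row is deleted,
    -- or its referenced key is updated, the foreign key check has to find
    -- the referencing rows in child; without an index on parent_id that is
    -- a sequential scan of child for every such parent row.
    CREATE TABLE parent (id bigint PRIMARY KEY, val text);
    CREATE TABLE child  (id        bigint PRIMARY KEY,
                         parent_id bigint REFERENCES parent (id));

    -- The fix: index the referencing side as well.
    CREATE INDEX child_parent_id_idx ON child (parent_id);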
> Do you have a favored resource that you can direct me to that would help me identify the factors that you cited and ultimately allow me to make more nuanced criticisms (or to find a fixable error in my approach, which would of course be the ideal result)?
I'd strongly suggest enabling log_checkpoints - that'll log information about every checkpoint, allowing at least some diagnosis.
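For instance (ALTER SYSTEM exists from 9.4 on; on older versions set it in postgresql.conf instead):

    -- log_checkpoints only needs a configuration reload, not a restart.
    ALTER SYSTEM SET log_checkpoints = on;
    SELECT pg_reload_conf();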
Another useful thing would be to run pg_xlogdump --stats; that'd give you information about what takes up space in your WAL.
I'd also look at EXPLAIN (ANALYZE, BUFFERS) one_of_your_updates; - that'll tell you a bit more about the timing, including trigger runtime.
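Something along these lines, with one of your real statements substituted in (table, column and WHERE clause here are placeholders):

    -- BUFFERS adds how many pages were hit/read/dirtied/written; the plan
    -- also shows per-trigger timing, e.g. for foreign key checks.
    BEGIN;
    EXPLAIN (ANALYZE, BUFFERS)
    UPDATE my_table SET some_column = some_column + 1 WHERE id = 42;
    ROLLBACK;  -- EXPLAIN ANALYZE really executes the UPDATE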