
It's great to see this type of evolution in database performance; maybe this will become the standard someday.

The only thing missing is a comparable project for MySQL.




Not trying to be a snoot here, but what is the point of using MySQL today? I can't think of anything it does better than Postgres.


Interoperating with code and data files that were made for/by MySQL.


Code I can understand, data a bit less. I'm just thinking that no existing MySQL program is going to rely on the performance characteristics of this flash-based database system, since it wasn't a thing when the program was written.

If you could change the shape of your tables to suit this approach, then surely you could adopt Postgres.


Postgres can have write-amplification issues for update-heavy loads where at least one of the updated fields is covered by an index: the updated row gets a new physical location, so every index on the table has to be updated, not just the one covering the changed column. Famously, this triggered Uber to switch to MySQL. Its impact is noticeable in one of my projects as well, and that project isn't even that large (a few dozen million rows).
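
If you want to check whether this is biting you, pg_stat_user_tables tracks HOT vs. non-HOT updates (a minimal sketch; the table name is a placeholder):

    -- When an updated column is covered by an index, the update cannot be
    -- HOT (heap-only tuple), so every index on the table gets a new entry.
    -- A low n_tup_hot_upd relative to n_tup_upd hints at write amplification.
    SELECT relname, n_tup_upd, n_tup_hot_upd
    FROM pg_stat_user_tables
    WHERE relname = 'my_table';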


Not trying to doubt your experience, just collecting datapoints for improvements. I'd appreciate it if you could describe what your problems around write amplification are. Is it

- heap bloat

- index bloat

- CPU overhead of index manipulations

- write-ahead log overhead of updates

- increased overhead / frequency of VACUUM

- total amount of disk writes

Thanks!

Edit: formatting.


This is a great question. I have to say I haven't profiled Postgres to that level of granularity, so I am making an assumption about why my updates are slow. What I notice is relatively slow UPDATE performance, with high total disk writes and a write-ahead log that seems large. The bloat on disk isn't an issue for me (or, rather, I have started using lower fillfactors to reduce the variability in DB size as the rows are updated).
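
For reference, the fillfactor tweak I mean looks roughly like this (table name and value are placeholders):

    -- Leave 30% of each heap page free so updated row versions can stay
    -- on the same page, which keeps updates HOT where possible.
    ALTER TABLE my_table SET (fillfactor = 70);
    -- Only affects newly written pages; rewrite the table to apply it everywhere.
    VACUUM FULL my_table;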

Do you have a favored resource that you can direct me to that would help me identify the factors that you cited and ultimately allow me to make more nuanced criticisms (or to find a fixable error in my approach, which would of course be the ideal result)?


> I have to say I haven't profiled Postgres to that level of granularity, so I am making an assumption about why my updates are slow. What I notice is relatively slow UPDATE performance, with high total disk writes and a write-ahead log that seems large.

Hm, those could be quite independent things. Could you check what your checkpoint_segments (pre-9.5) / max_wal_size and checkpoint_timeout settings are? Having very frequent checkpoints will trigger a lot of superfluous writes: both the checkpointing itself and an increased percentage of full-page writes (to prevent torn pages).
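
A quick way to inspect those from psql (assuming 9.5 or later, where max_wal_size replaced checkpoint_segments):

    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN ('max_wal_size', 'checkpoint_timeout', 'full_page_writes');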

When you say your write-ahead log is large, do you mean the size of the pg_xlog directory, or the amount of writes to it?
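
If it's the directory, something like this shows the on-disk size ($PGDATA being your data directory; the directory was renamed pg_wal in version 10):

    # Total size of the write-ahead log directory
    du -sh "$PGDATA/pg_xlog"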

Another common reason for slowness can be foreign keys, if you haven't indexed both sides of them. E.g. if there's a foreign key referencing the table you're updating, and the referencing columns aren't indexed, you'll end up with seqscans...
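
The fix is an index on the referencing side, sketched here with hypothetical parent/child tables:

    -- child.parent_id references parent(id); Postgres doesn't create this
    -- index automatically, so FK checks against child would otherwise seqscan.
    CREATE INDEX child_parent_id_idx ON child (parent_id);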

> Do you have a favored resource that you can direct me to that would help me identify the factors that you cited and ultimately allow me to make more nuanced criticisms (or to find a fixable error in my approach, which would of course be the ideal result)?

I'd strongly suggest enabling log_checkpoints - that'll log information about every checkpoint, allowing at least some diagnosis.
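
That can be turned on without a restart (assuming 9.4+ for ALTER SYSTEM):

    ALTER SYSTEM SET log_checkpoints = on;
    SELECT pg_reload_conf();  -- a reload is enough, no restart needed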

Another useful thing would be to run pg_xlogdump --stats; that'd give you information about what takes up space in your WAL.
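
Roughly like this; the segment file names are placeholders, substitute ones from your pg_xlog directory:

    # Per-record-type statistics over the given range of WAL segments
    pg_xlogdump --stats 000000010000000000000010 000000010000000000000015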

I'd also look at EXPLAIN (ANALYZE, BUFFERS) one_of_your_updates; that'll tell you a bit more about the timing, including trigger runtime.
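
For example (the UPDATE itself is hypothetical):

    -- BUFFERS adds block counts: how many pages were hit, read, dirtied,
    -- and written while executing the statement.
    EXPLAIN (ANALYZE, BUFFERS)
    UPDATE my_table SET counter = counter + 1 WHERE id = 42;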

Edit: brevity



