
That gp3 volume is extremely slow compared to a $100 NVMe drive. If each txn does a heap update, index update, WAL write, and heap read, that's 4 IOs per txn right there (well, not for sequential IDs, because you don't need to flush the heap/index pages on every update). The volume gets 16k IOPS max, so the 2600-3400 txn/s result is somewhat close to its capabilities assuming multiple IOs per txn. It's a little hard to find hard numbers, but gp3 latency is approximately 1 ms? That's going to limit you on WAL writes since they're synchronous. An NVMe drive that does, say, 20k read and 50k write IOPS at qd1 has 50 us read / 20 us write latency, and a database should be more of a qd32 workload, so hundreds of thousands to millions of IOPS.
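Back-of-envelope version of those ceilings, in Python (using the numbers assumed above, not measurements):

  iops_limit = 16_000      # gp3 max provisioned IOPS
  ios_per_txn = 4          # heap update + index update + WAL write + heap read
  print(iops_limit / ios_per_txn)    # ~4000 txn/s ceiling from IOPS alone

  gp3_write_latency = 1e-3           # ~1 ms, rough guess
  print(1 / gp3_write_latency)       # ~1000 synchronous WAL flushes/s per connection (ignoring group commit)

  nvme_write_latency = 20e-6         # ~20 us at qd1
  print(1 / nvme_write_latency)      # ~50,000 flushes/s per connection on the NVMe drive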

It's a single core, so no parallelism in the db itself, and it has a fraction of the RAM my phone has, so the slow IO is even more pronounced.

The basic implications of different keys and the detailed look at the cache internals are valid and interesting, but the hardware is nothing like a server you'd want to run a database on, so the benchmark numbers themselves aren't very interesting. An iPhone is probably beefier in every way.




agree - network storage is slower than local NVMe - the choice was intentional for two reasons:

1) the percentages would be different but the basic implications should hold true even with NVMe and 96 cores, as long as we scaled up the data size and workload

2) in addition to making it a bit easier to demonstrate what we'd expect to see (there's not really anything surprising here), i chose this setup because it's cheap, so anyone else could replicate the exact results or play around with the scripts and try variations without having to spend much money. for example, someone on twitter was curious about uuidv7 in a text field - it would be easy & cheap to try that out and see what happens - you could also go to bigger hardware and local NVMe, or change the client and row counts
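a rough sketch of what that text-column variation might look like (just an illustration, not one of the post's scripts - assumes psycopg2 and the third-party uuid6 package for uuid7() generation):

  import psycopg2
  from uuid6 import uuid7                  # pip install psycopg2-binary uuid6

  conn = psycopg2.connect("dbname=test")   # placeholder connection string
  cur = conn.cursor()
  cur.execute("create table if not exists users_text (id text primary key, payload text)")
  for _ in range(10_000):
      # store the uuidv7 as text instead of the native uuid type
      cur.execute("insert into users_text values (%s, %s)", (str(uuid7()), "x" * 100))
  conn.commit()
  cur.execute("select pg_size_pretty(pg_total_relation_size('users_text'))")
  print(cur.fetchone()[0])                 # compare against the uuid-typed table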

20 years ago when i wanted to benchmark oracle RAC, i had to go out and buy dual-attach firewire drives, and that was a hack because who wants to spend their personal vacation money on an old EMC CLARiiON storage array from eBay [i might have personally bought an old sun server or two though!]

size results should be independent of hardware setup, but the perf results are specific to this setup, which is why the post includes detailed specs and scripts for transparency

also, FWIW, most production databases these days run with some kind of high availability, which puts the network in the persistence path anyway - so even when the database is on local NVMe, it's not uncommon to have a hot standby or patroni or something with sync replication
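e.g. with a plain postgres sync standby, the relevant settings look roughly like this (illustrative values, not the benchmark config):

  # postgresql.conf on the primary
  synchronous_standby_names = 'ANY 1 (standby1)'   # commit waits for at least one listed standby
  synchronous_commit = on                          # each commit's WAL must be acked over the network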


Yeah, the high-level information is good, and the buffer cache analysis is super neat. I haven't seen that kind of thing elsewhere. It's a great article for explaining why performance differences exist. My list of gripes is probably more about Amazon marketing suggesting that something is big or high-performance or scalable when it's... not.

If you're something like a bank, you need synchronous replication, but a lot of use cases would probably be fine with async and a couple ms of RPO. Then again, most people probably don't need more than a few thousand writes/second anyway. As for banks, I worked on storage arrays at IBM ~10 years ago, and I think our synchronous replication was sub-100 us, but I can't remember anymore.



