You can generate the UUID in your application and insert it; UUIDv7 works in any UUID column in Postgres. Native UUIDv7 generation in Postgres is coming in v17: https://commitfest.postgresql.org/43/4388/.
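For example, a minimal sketch (table name, column, and the literal UUIDv7 value are just illustrative; the value would normally be generated by your application and bound as a parameter):

CREATE TABLE events (
    id      uuid PRIMARY KEY,  -- accepts any UUID version, including UUIDv7
    payload jsonb NOT NULL
);

INSERT INTO events (id, payload)
VALUES ('018f6d2e-7a4c-7cc3-b7a1-3f5e9d2c4b01', '{"kind": "signup"}');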
The random insertion locations of UUIDv4 cause bloat (at least index bloat), because they force leaf splits to make room for new entries, so you end up with a lot more partially filled leaves. With sequential insertion (sequences, UUIDv7) the leaves get filled and, once a leaf is full, you append a new one; you don't go back and revisit old leaves.
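If you want to see that effect on a real index, the stock pgstattuple contrib extension reports leaf density and fragmentation (the index name here is illustrative):

CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Low avg_leaf_density / high leaf_fragmentation after heavy random inserts
-- is the "lots of partial leaves" bloat described above.
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('events_pkey');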
I love how the general tech blogosphere / Twitter is slowly coming around to the idea that random UUIDs are a terrible PK. I tried convincing people of this at my last job, to little interest. Maybe now that “thought leaders” are saying it, people will take notice.
Aside from maybe size and generation speed, I don't really see why a UUID is any worse than any other synthetic identifier. I don't really see generating a UUID being a limiting factor in DBs if they're designed for it, so what's your reasoning?
The problem is not "UUID" but "random UUID". The randomness makes it hard to index and query efficiently, because there is no locality. UUIDv7 is time-sortable, and so has locality. For details, see for example https://uuid7.com/ - the main reason (for me) is this part: "Since UUIDv7 is time-sortable, database indexing mechanisms can better optimize the storage and retrieval processes...".
I don't know. The leaking of information via time-based identifiers always strikes me as an annoying trade-off to work around an implementation detail in certain database engines. Sometimes you have to do it, but it shouldn't be a general rule.
UUIDs are globally unique (unique with such a high probability that in practice there are no collisions, if you follow the spec). But they are not cryptographically safe. So the time part is not a "leak" in this sense. If you need something cryptographically safe, then you would need to encrypt the UUID.
The time part is absolutely a disclosure of information. That can be fine, or it may not be. In the latter case that's called a leak. You need to consider that every time you decide to use a time-based identifier like UUIDv7.
UUIDv4s contain no information, so you cannot accidentally leak anything through them.
UUIDs have nothing to do with cryptography. "Cryptographically safe" has no meaning when it comes to identifiers. Encrypting information-carrying identifiers like UUIDv7 is not a good way to hide the information, because you cannot rotate the key if/when it leaks. The moment your key leaks, all information in the identifier is public again.
There are only three ways to mitigate the risk of information-carrying identifiers:
* Don't have them. Use fully random identifiers instead.
* Don't share them. Use a different identifier in public (including in URLs).
* Carefully consider the information carried in the identifier to decide if the information can be made public.
If anything returns or uses the generated UUID where it is visible to the end user - which is typical, say in a UI or an API - then you can tell when that thing was created, which could be a problem from an information-leak standpoint.
At worst you pay cache coherency costs on the CPU for your indexes. That can add up, but on systems that are not write-heavy I suspect it won't matter much in practice. Assuming a relational database model, even if you're inserting and then querying a bunch of rows at the same time in the same index/table, the rowid is usually what sorts rows on disk, so you'll get locality for the bulk of your data regardless.
That being said, unless you have a specific reason to use a UUID over the standard incrementing integer, I wouldn't, if for no other reason than that dealing with UUIDs is visually complicated.
Depends on the database. Microsoft SQL Server typically uses the PK as the clustering index, which dictates the order the table data is stored on disk. If your row PK is random, you're going to have write latency and a fragmented index.
Even in non-clustered indexes (relevant to pretty much all SQL implementations not just MS SQL Server) random keys can cause the index to balloon over time due to many page splits, requiring extra space or extra maintenance rounds (rearranging the indexes to optimise them occasionally, if the relevant data grows rapidly or changes often).
The effect is just worsened by clustering, as done in SQL Server for significant (or at least measurable) benefits with many data patterns: your base data is bigger because of the space wasted by excess page splits, as are your supplementary indexes, so those need reorganising more often too.
A common answer is to use an int (or maybe bigint) for your internal keys and a UUID for anything external, so you get the benefits of a UUID for external use (practically zero chance of collision in distributed systems, if appropriately random, and no potential information leak in some security contexts) but the efficiency of the integer otherwise. Or use a partially ordered UUID format, which balances the compromises slightly differently (benefit: a single value, dropping some of the UUID issues while keeping some of the benefits; detriment: retaining some of the disadvantages of UUIDs and potentially reducing some of the benefits).
The most deployed database these days is probably SQLite if you count semi-embedded contexts (i.e. mobile apps). Excluding that, it does look like MySQL is still on top, but Postgres is catching up fast. I completely forgot InnoDB uses clustered indices by default. Regardless, I was referring to most by implementations, not deployment numbers.
The index itself is also a b-tree (unless it is fully in memory). You also have to consider writes and reads in this index; locality matters there too. Sure, not as much as with the data itself, but I have seen cases where the index lookup (with randomly distributed UUIDs) was the bottleneck, and not reading the rows, even though the rows were 1 KB on average.
I could maybe see how it helps with indexing performance, but there's typically not going to be correlation in terms of lookups based on record creation. Are you referring to indexing performance as locality, or something else?
> there's typically not going to be correlation in terms of lookups based on record creation
Why would you think that? You think it's equally likely to need to look up a sales order from ten years ago vs one from two weeks ago? Are GitHub issues from ten years ago read as often as the ones created yesterday? Most systems have a long tail of historic data that's mainly kept as a reference.
SELECT * FROM forum_post WHERE created_at > $DATE AND created_at < $DATE
If you were using UUIDv7, you could even use that to get the created_at [0], although you'd likely lose the indexing unless you made a functional index on it, like this:
CREATE INDEX foo_ctime_idx ON foo(extract_timestamp_from_uuid_v7(uuid_v7_col))
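That helper isn't built into Postgres; a rough sketch of how it could be defined, relying on the fact that a UUIDv7's first 48 bits are the Unix timestamp in milliseconds, might look like this:

-- Hypothetical helper assumed by the index above: pull the embedded
-- millisecond timestamp out of the first 48 bits of a UUIDv7.
CREATE FUNCTION extract_timestamp_from_uuid_v7(u uuid)
RETURNS timestamptz
LANGUAGE sql IMMUTABLE STRICT AS $$
    SELECT to_timestamp(
        ('x' || substring(replace(u::text, '-', '') FROM 1 FOR 12))::bit(48)::bigint / 1000.0
    );
$$;

(It has to be IMMUTABLE for the functional index, and range queries then need to use the same expression to hit that index.)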
First, if you can model your data using a natural key (singular or composite), and it doesn't unduly bloat the key, do so. This isn't always possible, of course (you wouldn't want to use an email address as a PK, as tempting as it may seem, since they're subject to change by users), but in many cases it is. Even better, for pre-existing data that's unlikely to change, like SKUs, you could pre-sort the data so you avoid any random page costs.
Barring that, design your API in such a way that you can use an incrementing integer without fear of leaking anything. At the very least, you can create an association table that maps an external ID (preferably UUIDv7 or ULID, as they're k-sortable) to the internal integer ID.
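A rough sketch of that mapping pattern, with names purely illustrative and the public ID generated in the application:

-- Internal tables key everything off the compact bigint id; the mapping
-- table is the only place the externally visible identifier lives.
CREATE TABLE account (
    id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email text NOT NULL
);

CREATE TABLE account_public_id (
    public_id  uuid   PRIMARY KEY,  -- application-generated UUIDv7 (or a ULID stored as uuid)
    account_id bigint NOT NULL UNIQUE REFERENCES account (id)
);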
If you can't do either, use UUIDv7 or ULID. I prefer UUIDv7, especially since Postgres will natively support generating them in an upcoming release (and because you can store them in the RDBMS's native binary format), but either will do.
Step 1: Generate UUIDs using a highly predictable pattern
Step 2: Use the UUID as a security key - like saving a private file at files.example.com/12345678-1234-5678-1234-123456781234/private-file.pdf and assuming nobody will be able to download it without knowing the UUID
Step 3: Attacker predicts the UUID and downloads the private file.
Obviously the real solution here is to not use UUIDs as security keys - but dumbasses shoot towards their feet all the time, and UUIDv4 makes them hit less often.
By that logic serial-type ids are unsafe for use too, and much more vulnerable.
I was kind of expecting some claim that actually exploits the inclusion of a hardware identifier, rather than "don't use predictable numbers as secrets".
I'm aware of the contents of UUIDs, I just thought we were somehow past absolute bottom of the barrel "well this might be misused by someone who clearly has no idea what they're doing" before claiming things have caused "pwning".
Entering URLs wrong can also cause phishing breaches; that doesn't mean we stop using URLs, does it?
Everyone, even experienced engineers, can make mistakes. Calling that 'bottom of the barrel' isn't helpful.
Especially in a larger team you can't expect everyone to know fine details of every other part of the system. If you look up UUIDs you will (correctly) get the impression that they're mostly random, except some types aren't.
I would not expect every engineer to know to inspect the UUID and figure out its type and to know that this could have real consequences for guessability by external actors.
If your team is small enough that everyone can hold the entire system in their head at once, that's great, but that excludes many real world projects.
> If you look up UUIDs you will (correctly) get the impression that they're mostly random, except some types aren't.
Either we have very different ideas of what "mostly" means or you don't know what the different versions of UUID include as well as you think you do.
If we're considering the 5 published versions: 1 is built from random data; 2 are built from time & host data; and 2 are built from a namespace and a "name".
If we consider the 3 proposed additional versions: 1 is built from time & host; 1 is built from time & random data; and 1 is built from custom data.
So I personally wouldn't consider "most" to mean either 1 of 5 (20%) or 2 of 7 (28.6%).
> I would not expect every engineer to know to inspect the UUID and figure out its type and to know that this could have real consequences for guessability by external actors.
I would expect someone who is writing software and encounters UUIDs to take the few minutes needed to understand the basics of what goes into the different versions, and to ensure they're being used correctly.
> If your team is small enough that everyone can hold the entire system in their head at once, that's great, but that excludes many real world projects.
They don't need to "hold the entire system in their head". They just need to (a) have the most basic understanding of what a UUID contains, and then (b) use it appropriately.
Yes and anyone who doesn't meet your expectations is 'bottom of the barrel' in your book, you've made it clear.
Sorry mate but not everyone gets to always work with the best of the best. I went through this exact exercise in the past and avoided v1 IDs because I suspected there is a risk they become externally exposed down the line, perhaps even years later when I'm gone.
I suppose you might decide otherwise and then blame others for incompetence instead if down the line someone failed to comprehend UUIDs like you so effortlessly do.
> anyone who doesn't meet your expectations is 'bottom of the barrel' in your book
Please try re-reading what I wrote because you clearly didn't comprehend it the first time.
I never called anyone bottom of the barrel.
I said the reason not to use them is "bottom of the barrel", as in, it's a poor excuse for not using something, based on blatant misuse. Like saying "people mis-type URLs all the time, we should get rid of them" or "people get electrical shocks when they stick utensils in power outlets, we should stop using electricity"
uuid_generate_v1mc() from the uuid-ossp extension uses a randomly generated MAC, so there's no leakage and the entropy is increased. The timestamp portion is a feature IMO, not a bug: it prevents index and page thrashing in Postgres. (It also doubles as a "created_at" field, but that's getting "tricky".)
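For reference, trying it out is just (uuid-ossp ships as a standard contrib extension):

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- Version-1 layout (time-based), but with a random multicast MAC instead of
-- the host's real hardware address.
SELECT uuid_generate_v1mc();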
That's often the issue: people assume attackers won't go all the way because it's too cumbersome. "Reasonable" has a very imprecise definition.
> On a recent Code Assisted Penetration Test (CAPT) our team identified a vulnerability in a client's web application which allowed the consultant to bypass authentication and take over legitimate user accounts. The issue stemmed from the use of UUID version 1 in password reset tokens instead of the secure version 4 counterpart.
What about when those 6 bytes are randomly generated?
RFC 4122 does allow the MAC address in a version-1 (or 2) UUID to be replaced by a random 48-bit node ID, either because the node does not have a MAC address, or because it is not desirable to expose it.
Then it wouldn’t have that issue - as long as it isn’t stable/reused for too long.
The big problem - how can you tell for sure that is what is happening, and how would you catch a reversion if the underlying library changes behavior? Assuming you’re making a call into someone else’s stuff anyway.
A big challenge I've seen with UUID implementations is dependence on some kind of hidden system state that causes issues, like a MAC or node-ID file somewhere in a VM that gets cloned, resulting in duplicates where 'duplicates should be impossible'.
It's not just the writes. If you're using a non-k-sortable identifier (like UUIDv4) as the PK, you're throwing away every linear read-ahead the kernel is doing for you, because the odds of what you need being on the same page are nil. When you're paying for IOPS, like in a cloud, that's even worse.
You’re also causing huge amounts of WAL bloat (unless you’re running ZFS and can thus safely disable full-page writes) [1].
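(For what it's worth, full_page_writes is an ordinary Postgres setting; checking or changing it looks roughly like the following, and it should only be disabled when the filesystem already guarantees torn-page-free writes, e.g. ZFS.)

SHOW full_page_writes;

-- Requires superuser (or equivalent privilege); takes effect after a config reload.
ALTER SYSTEM SET full_page_writes = off;
SELECT pg_reload_conf();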
And on a system with a clustered index (InnoDB, SQL Server), the performance and space bloat is even worse.
Thousands? In many scenarios I have worked with, the number is millions or billions, but I do "data engineering".
Want to tag all traffic on your website? 1 billion visits a month = ~375 visits a second.
Some of the accounting systems I have worked with were raking in datasets ~300GB at a time that were almost purely compressed transactions.
Some of the companies I have worked for in retail have thousands of stores and a pretty constant volume, getting to that rowcount for some of their customer tracking stuff would easily blow that out.
I have some vendor webhooks that spike to above this number for 30 minutes at a time, multiple times a day. They use UUIDs for the event id you are supposed to use for event deduplication.
On one project I worked on, someone picked UUIDs for tracking AAA telco records, then decided to make them the primary key (you can guess how well that worked out).
There are lots of uses for thousands per second, usually in some sort of logging or tracking application where you have lots of processes and users.
UUIDs are nice for exposing them outside, such as:
1) exposing in URLs (you can't scrape all data by iterating over IDs)
2) passing them around between microservices/systems (less confusion where an ID comes from when debugging/doing tech support, because you can check the ID in a few tables and be 100% sure that's exactly what you are looking for, because IDs are globally unique, unlike bigints)
3) useful in situations where data from several servers or DB shards is eventually aggregated in one place (for example, for analytics) - with bigints you'd have collisions
...which means you need a round trip to the database before doing anything else. That's latency you can avoid with client-side predictable (or predetermined) UUIDs.
Whether you should do anything else before the data has been persisted is a totally different discussion.
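The difference in practice, as a sketch (table, columns, and the UUID literal are illustrative):

-- Server-assigned key: the caller only learns the id after the round trip.
INSERT INTO orders (customer_id, total)
VALUES (42, 99.50)
RETURNING id;

-- Client-generated key: the id is known before the write, so related records,
-- log lines, or downstream messages can reference it immediately.
INSERT INTO orders (id, customer_id, total)
VALUES ('018f7c3a-9b21-7d3e-8f4a-2c6b1e0d5a77', 42, 99.50);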
Unless you’re doing some wild async YOLO, you’re already waiting for the DB to send back success; adding a max of 8 bytes into that is highly likely to fit into the same TCP packet and thus add little to no overhead.
If you are doing async writes, as you alluded to, why bother with RDBMS in the first place? ACID is out the window.
- insert-then-retrieve-ID can easily result in duplicate records in some edge cases, and won't necessarily be easy to fix either; the inserted record doesn't have a global ID until it's inserted.
Can this generally be fixed using good transaction semantics? Yes, usually. But it's expensive, and in many cases you'll have to default to failing writes instead of eventually consistent behavior.
- CRDT type behavior works better when things have a known valid unique ID from the get go. insert/update/ignore can happen quickly and easily without two way communication and in bulk, and edge cases have more easily modelable ‘eventual consistency’.
- generating unique IDs in the database forces serialization of certain processes in the database, which can cause scaling issues and high latency.
For instance, using the DB to create unique request IDs for web or API requests? Asking for problems.
Generating UUIDs for them at request time, and then putting those IDs where needed when later correlation/tracking is desirable? Much better.
Same can apply for other object ID creation, when there aren’t other natural keys that need to be checked first.
If you mean an actual end-user client, and not your application server, don't do that (assuming you want to use UUIDv7). Client-generated IDs mean clients can lie in the timestamp bits. Only use client-generated IDs as an idempotency key.
This [0] is what I used months ago to test this very thing (it's missing UUIDv7, but I've done more casual tests and agree with the findings); it's not quite bare metal, but Proxmox in my homelab is unlikely to suffer much, if any, CPU steal.
Results here [1].
Note that this used \COPY instead of INSERT so it bypasses a good bit of logic normally present, but the difference remains stark. Random page hits on a B+tree are always going to suck.
c7a servers have dedicated CPUs. Personally I like seeing benchmarks on AWS instances because anyone can reproduce them without needing to buy the same hardware. The virtualization overhead is basically nil; what’s not to like?
That gp3 volume is extremely slow compared to a $100 NVMe drive. If each txn does a heap update, index update, WAL write, and heap read, that's 4 IOs per txn right there (well, not for sequential IDs, because you don't need to flush heap/index on every update). The volume gets 16k IOPS max, so at roughly 4 IOs per txn that's a ceiling around 4k txn/s, which puts the observed 2600-3400 txn/s somewhat close to its capabilities. It's a little hard to find info, but the latency of a gp3 volume is approximately 1 ms? That's going to limit you on WAL writes, since they're synchronous. An NVMe drive that does say 20k read and 50k write IOPS at qd1 has 50 us read / 20 us write latency. A database should be more of a qd32 workload, so hundreds of thousands to millions of IOPS.
It's a single core, so no parallelism in the db itself. There's a fraction of the RAM my phone has, so that slow IO is more pronounced.
The basic implications of different keys and detailed look at the cache internals are valid and interesting, but the hardware is nothing like a server you'd want to run a database on, so the benchmark isn't very interesting. An iPhone is probably beefier in every way.
Agree - network storage is slower than local NVMe - but the choice was intentional, for two reasons:
1) The percentages would be different, but the basic implications should hold true even with NVMe and 96 cores, as long as we scaled up the data size and workload.
2) In addition to making it a bit easier to demonstrate what we'd expect to see (there's not really anything surprising here), we chose this setup because it's cheap, so anyone else could replicate the exact results or play around with the scripts and try variations without having to spend much money. For example, someone on Twitter was curious about UUIDv7 in a text field - it would be easy and cheap to try that out and see what happens. You could also easily go to bigger hardware and local NVMe, changing client and row counts.
20 years ago, when I wanted to benchmark Oracle RAC, I had to go out and buy dual-attach FireWire drives, and that was a hack, because who wants to spend their personal vacation money on an old EMC CLARiiON storage array from eBay? [I might have personally bought an old Sun server or two though!]
The size results should be independent of the hardware setup, but the perf results are specific to this setup, which is why the post includes detailed specs and scripts for transparency.
Also, FWIW, most production use cases for databases these days include some kind of high availability, which means network involvement in the persistence path - so even when the database is on local NVMe, it's not uncommon to have a hot standby or Patroni or something with sync replication.
Yeah the high level information is good, and the buffer cache analysis is super neat. I haven't seen that kind of thing elsewhere. It's a great article to explain why performances differences exist. My list of gripes is probably more about Amazon marketing suggesting that something is big or high-performance or scalable when it's... not.
If you're something like a bank, you need synchronous replication, but a lot of use-cases would probably be fine with async with a couple ms RPO. Then again most people probably don't need more than a few thousand writes/second anyway. For banks, I worked on storage arrays at IBM ~10 years ago, and I think our synchronous replication was sub 100 us, but can't remember anymore.
> One of the major differences between C7a instances and the previous generations of instances, such as C6a instances, is their vCPU to physical processor core mapping. Every vCPU on a C7a instance is a physical CPU core. This means there is no Simultaneous Multi-Threading (SMT). By contrast, every vCPU on prior generations such as C6a instances is a thread of a CPU core.
So your "vCPU" is not an SMT thread sharing a core with some other "vCPU". Of course you're still sharing the rest of the CPU, so things like memory bandwidth and presumably upper-level cache, and you might be affected by thermal load from other cores? I don't know whether that kind of thing applies in server contexts.
It is, however, sharing the memory bandwidth and L3 cache. Which is often a big factor for things like inserting millions of table rows. Benchmarks should be run on an idle bare-metal machine, or you can't really compare results.
FWIW, this was why the benchmark was executed on two different servers, and repeated three times on each server. The blog includes a graph of TPS at 5-second intervals for all 24 runs and the results are very tight and consistent. I think that specific graph gives reasonable confidence in the reliability of these numbers.
Yeah, agree. I had ninja edited that in there but not in time I guess. And as I posted elsewhere, the disk they're using is very slow. And ideally your access patterns are more similar to what the other commenter had with COPY, but that's a whole different issue.
AWS will happily sell you bargain bin performance at enterprise prices if you don't understand what you're buying. There's a reason they're almost a $2T company.
My understanding is that vCPUs are mapped to hyperthreads or (in this case) physical cores (note: not whole CPU packages). As some noted there’s shared L3 cache, but AFAIK this would not be “steal” as the kernel presents it.
Nanoid's performance is likely to be on par with UUIDv4, if not worse. While it's smaller as ASCII text, Postgres and MySQL (when coerced) both store UUIDs as 16 binary bytes, well below the 21 characters of Nanoid.
And in any case, the primary driver of the poor performance is the lack of k-sortability and the resulting page spread. The only alternative to UUIDv? that would do well is ULID, as it's lexicographically sortable.
DigitalOcean and AWS RDS both support the https://www.postgresql.org/docs/16/uuid-ossp.html extension but it mentions v5 is the highest supported UUID version.