A UUID v4 is a 128-bit number, but 4 bits are reserved to specify the version and 2 more bits are reserved to specify the variant, which leaves 122 bits of randomness. That means it can take on about 5.3 x 10^36 possible values. Following the birthday math, you'd have to generate about 103 trillion UUIDs to have a one-in-a-billion chance of a collision.
The heuristic commonly used for things that matter (e.g. high-security/high-assurance systems) is that the probability of collision should be less than 2^-32, assuming uniform distribution[0]. From this you can compute that the largest set of keys that can be used with UUIDv4 while satisfying this constraint is roughly 2^45.5, on the order of 50 trillion.
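As a rough sanity check on those figures, here is a small Python sketch using the standard birthday approximation p ≈ n²/(2N), which is accurate while p stays well below 1 (the function names are just for illustration):

```python
import math

N = 2 ** 122  # distinct UUIDv4 values (128 bits minus version/variant bits)

def collision_probability(n: int) -> float:
    """Birthday approximation: p ~= n^2 / (2N), valid while p << 1."""
    return n * n / (2 * N)

def max_keys(p: float) -> float:
    """Largest n that keeps the collision probability below p."""
    return math.sqrt(2 * p * N)

print(collision_probability(103 * 10**12))  # ~1e-9, i.e. about one in a billion
print(f"{max_keys(2 ** -32):.2e}")          # ~4.97e13, roughly 50 trillion
```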
This is a pretty high limit that will work for most applications. Some large data models can exceed this number of records, so you can't naively use probabilistic UUIDs in those cases, e.g. assigning one to every unique record. In data models that approach the ~50T limit, UUID-like identifiers are typically generated deterministically to avoid the issue entirely.
[0] There have been many cases of UUIDv4 systems breaking in the wild because people naively or accidentally use weak entropy sources. This turns out to be a recurring footgun such that use of UUIDv4 is prohibited in some applications because you can't rely on people to implement it properly.
UUID v4 has 122 random bits giving 2^122 possible values (~5.3×10^36). Using the birthday paradox, you'd need to generate about 2^61 UUIDs (~2.3×10^18) for a 50% collision probability, which is well beyond any practical system's capacity.
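For what it's worth, the exact 50% point comes out at about 1.18 x 2^61; a quick sketch using the exponential form of the birthday bound, n ≈ sqrt(-2N ln(1 - p)):

```python
import math

N = 2 ** 122  # distinct UUIDv4 values

# Birthday bound: reaching collision probability p takes n ~= sqrt(-2N ln(1 - p)) draws.
n_half = math.sqrt(-2 * N * math.log(1 - 0.5))
print(f"{n_half:.3e}")   # ~2.7e18
print(n_half / 2 ** 61)  # ~1.18, i.e. roughly 2^61 UUIDs for a 50% collision chance
```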
> ...which is well beyond any practical system's capacity.
Well beyond a single server, but not beyond single systems. Large analytical data models run into the tens of exabytes in practical systems already. This isn't hypothetical, and probabilistic identifiers become inadvisable[0] in those systems.
Not everything is a web app, and UUIDv4 does not scale infinitely.
[0] Technically you can use probabilistic identifiers if you widen them beyond 128 bits, but at that scale the compactness of unique identifiers has a large impact on performance and cost, so 128-bit deterministic identifiers are preferable.
I can’t see teams adopting Unison (or similar languages) without a way to store code in Git.
Maybe the editor can load text and do structured editing. Maybe the runtime can send functions across the network. Great. But not using Git for storage and review is just too alien for most teams to even consider.
Flink SQL is quite limited compared to Feldera/DBSP or Frank’s Materialize.com, and has some correctness limitations: it’s “eventually consistent”, but until the input stream stops, the output of streaming joins is unlikely to ever actually be correct. https://www.scattered-thoughts.net/writing/internal-consiste...
There has to be some change in the code, and they will not share the same semantics (and perhaps won't work when retractions/deletions also appear whilst streaming). And let's not even get into the leaky abstractions needed for good performance (watermarks et al.).
My stance has always been to lean on the available tools to free up time to work on the more interesting problems that deliver value to the organisation / company. Has been a good strategy to date.
Sadly, the current environment does not reflect that in my experience. There is a vicious focus on keeping profit margins steady at all costs while slashing spend on tooling, which forces re-work of already-solved problems. :/
At some point the music is going to stop and it's not going to be pretty I suspect. :(
Organizations that don't trust their engineers to work towards delivering value (by using better tooling, efficiency-increasing automation, etc.) don't improve and are just accepting the current status quo.
That's why you need to keep an eye out and get a sense of whether management understands this or not. Plan to leave, as your contributions will not be rewarded the way they deserve in this type of organization.