Hacker News | new | past | comments | ask | show | jobs | submit | rienbdj's comments

Nice visual introduction to combinators.


How many values can a UUID v4 take?

How many do you have to generate before a collision becomes a remote possibility?


A UUID v4 is a 128-bit number, but 4 bits are reserved to specify the version and 2 more bits are reserved to specify the variant, which leaves 122 bits of randomness. That means it can take on about 5.3 × 10^36 possible values. Following the birthday math, you'd have to generate about 103 trillion UUIDs to have a one-in-a-billion chance of a collision.
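The birthday math is easy to sanity-check with a short script (a sketch using the standard approximation p ≈ 1 − exp(−n²/2N), where N = 2^122):

```python
import math

N = 2 ** 122  # possible UUID v4 values (122 random bits)

def collision_probability(n: int) -> float:
    """Approximate probability of at least one collision among n UUIDs."""
    return 1 - math.exp(-n * n / (2 * N))

def uuids_for_probability(p: float) -> float:
    """Invert the approximation: how many UUIDs for collision probability p."""
    return math.sqrt(2 * N * math.log(1 / (1 - p)))

print(f"{collision_probability(103_000_000_000_000):.2e}")  # ~1e-9
print(f"{uuids_for_probability(0.5):.2e}")                  # ~2.7e18
```

The first line confirms the 103-trillion figure corresponds to roughly a one-in-a-billion collision chance; the second gives the count needed for even odds.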


The question becomes: how good is your randomness?

If your random source is not uniformly distributed, you might get duplicates from bias.

If your random source is set up wrong and you get the same seed multiple times, you might get duplicates from that.

If your random source is good, the birthday math should hold.
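The repeated-seed failure mode is easy to demonstrate (an illustration only: the stdlib's uuid.uuid4() draws from os.urandom and does not have this problem, so the seedable generator below is a hypothetical stand-in for a badly configured one):

```python
import random
import uuid

def uuid4_from(rng: random.Random) -> uuid.UUID:
    # Build a v4-style UUID from a caller-supplied RNG.
    # (uuid.uuid4() itself uses os.urandom; this mimics a broken setup.)
    return uuid.UUID(int=rng.getrandbits(128), version=4)

# Two processes that accidentally start from the same seed...
a = uuid4_from(random.Random(42))
b = uuid4_from(random.Random(42))
print(a == b)  # True: identical "random" UUIDs, a guaranteed collision
```

With identical seeds, the "122 bits of randomness" collapse to zero and the birthday math no longer protects you.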


UUIDv4 has 2^122 values.

The heuristic commonly used for things that matter (e.g. high-security/high-assurance systems) is that the probability of collision should be less than 2^-32 assuming uniform distribution[0]. From this you can compute that the largest set of keys that can be used with UUIDv4 while satisfying this constraint is roughly 2^45.5, i.e. around 50 trillion.

This is a pretty high limit that will work for most applications. Some large data models can exceed this number of records, so you can't use probabilistic UUIDs naively in those cases, e.g. one for every unique record. In data models that approach this limit, UUID-like identifiers are typically generated deterministically to avoid the issue entirely.

[0] There have been many cases of UUIDv4 systems breaking in the wild because people naively or accidentally use weak entropy sources. This turns out to be a recurring footgun such that use of UUIDv4 is prohibited in some applications because you can't rely on people to implement it properly.


UUID v4 has 122 random bits, giving 2^122 possible values (~5.3×10^36). Using the birthday paradox, you'd need to generate on the order of 2^61 UUIDs (~2.7×10^18 for a 50% collision probability), which is well beyond any practical system's capacity.


> ...which is well beyond any practical system's capacity.

Well beyond a single server but not single systems. Large analytical data models run into the tens of exabytes in practical systems already. It isn't hypothetical and probabilistic identifiers become inadvisable[0] in those systems.

Not everything is a web app and UUIDv4 does not scale infinitely.

[0] Technically you can use probabilistic identifiers if you widen them beyond 128-bits, but at that scale the compactness of unique identifiers has a large impact on performance and cost, so 128-bit deterministic identifiers are preferable.


A UUID v4 has 122 bits of payload.

Depends on what you consider “a remote possibility” to be (the birthday attack Wikipedia page has a table for various p and powers of 2).


I was hoping for a software solution. The content would be easily readable as a plain HTML page.


Reminds me of

I’m Feeling Lucky -> bad result -> Google search is useless


1. I would say that nobody did that, so you are making up a straw man

2. The Copilot or ChatGPT or Claude "Ask" buttons should then be renamed to "I'm feeling lucky". And that would be the only button available.


Yeah, except Feeling Lucky is the only button you can press, and people blame you if they don't get lucky.


Export a requirements.txt from uv and analyze that?


I can’t see teams adopting Unison (or similar languages) without a way to store code in Git.

Maybe the editor can load text and do structured editing. Maybe the runtime can send functions across the network. Great. But not using Git for storage and review is just too alien for most teams to even consider.


Seems more like activism than a company to me.


business is just market activism


A new McSharry post! Excellent

Last I checked, VMware had moved away from Differential Datalog?


The Differential Datalog team founded Feldera: https://www.feldera.com/

They switched from differential Datalog to differential SQL, I think because they realized Datalog is a really tough sell.


They did, and their product is great.

It is the only database/query engine that allows you to use the same SQL for both batch and streaming (with UDFs).

I have made an accessible version of a subset of Differential Dataflow (DBSP) in Python right here: https://github.com/brurucy/pydbsp

DBSP is so expressive that I have implemented a fully incremental dynamic datalog engine as a DBSP program.

Think of SQL/Datalog where the query can change in runtime, and the changes themselves (program diffs) are incrementally computed: https://github.com/brurucy/pydbsp/blob/master/notebooks/data...
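To give a flavor of the Z-set idea underlying DBSP (my own minimal sketch, not the pydbsp API): relations are maps from tuples to integer weights, deltas combine by weight addition, and join is bilinear, so the change to a join can be computed from the changes to its inputs as Δ(A ⋈ B) = ΔA ⋈ B + A ⋈ ΔB + ΔA ⋈ ΔB.

```python
from collections import defaultdict

def add(a, b):
    # Z-set sum: add weights, drop entries that cancel to zero.
    out = defaultdict(int, a)
    for k, w in b.items():
        out[k] += w
    return {k: w for k, w in out.items() if w != 0}

def join(a, b):
    # Join on the first field of each pair; weights multiply.
    out = defaultdict(int)
    for (ka, va), wa in a.items():
        for (kb, vb), wb in b.items():
            if ka == kb:
                out[(ka, va, vb)] += wa * wb
    return dict(out)

def incremental_join(a, da, b, db):
    # Bilinearity: delta(A ⋈ B) = dA ⋈ B + A ⋈ dB + dA ⋈ dB
    return add(add(join(da, b), join(a, db)), join(da, db))

A = {("k", 1): 1}            # relation A with one row
B = {("k", "x"): 1}          # relation B with one row
dA = {("k", 2): 1}           # insertion into A (weight +1)
dB = {("k", "x"): -1}        # deletion from B (weight -1)

delta = incremental_join(A, dA, B, dB)
full = join(add(A, dA), add(B, dB))
print(add(join(A, B), delta) == full)  # True: old join + delta == new join
```

Negative weights give retractions for free, which is exactly what makes deletions work in the streaming setting without special-casing.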


> It is the only database/query engine that allows you to use the same SQL for both batch and streaming (with UDFs).

Flink SQL also checks that box.


Flink SQL is quite limited compared to Feldera/DBSP or Frank’s Materialize.com, and has some correctness limitations: it’s “eventually consistent” but until you stop the data it’s unlikely to ever be actually correct when working with streaming joins. https://www.scattered-thoughts.net/writing/internal-consiste...


Not true.

There has to be some change in the code, and they will not share the same semantics (and it perhaps won't work when retractions/deletions also appear whilst streaming). And let's not even get to the leaky abstractions needed for good performance (watermarks et al.).


Companies typically prefer growing revenue to cutting costs.

It will depend on whether there are many projects with a good outlook sitting around.


My stance has always been to lean on the available tools to free up time to work on the more interesting problems that deliver value to the organisation / company. Has been a good strategy to date.

Sadly, the current environment does not reflect that in my experience. There is a vicious focus on keeping profit margins at a steady rate at all costs while slashing spend on tooling which requires re-work on solved problems. :/

At some point the music is going to stop and it's not going to be pretty I suspect. :(


Organizations that don't trust their engineers to work towards delivering value (by using better tooling, efficiency-increasing automation, etc.) don't improve; they're accepting the status quo.

That's why you need to keep an eye out and sense whether management understands this or not. Plan to leave, as your contributions will not be rewarded the way such contributions deserve in that type of organization.


The git model is great but the interface is not carefully designed.

I think if the interface were better, fewer people would be copy pasting git commands.


There are UIs for git. People seem to be embarrassed about using them, but I've always liked them.

