A lot of your comment went over my head, but I have modeled relational data in a way conducive to being stored in a Merkle tree. The trick being that every entity in the system ended up having two IDs. A hash ID, identifying this specific version of the object, and an entity ID (probably a UUID or OID), which remained constant as new versions were added. In a situation where people can have friends that are also people, they are friends with the person long-term, not just a specific version of the friend, so in that case you'd use the entity IDs. Though you might also include a reference to the specific version of the person at the point in time at which they became friends, in which case they would necessarily reference an older version of that person. If you friend yourself, you're actually friending that person a moment ago.
A list of all current entities, by hash, is stored higher up the tree. Whether it's better that objects themselves store their entity ID or if that's a separate data structure mapping entity to hash IDs depends on the situation.
On second reading I guess your comment was actually about how to come up with content-based IDs for objects. I guess my point was that in the real world you don't usually need to do that, because if object identity besides its content is important you can just give it an arbitrary ID. How often does the problem of differentiating between a graph containing identical Alices vs one with a single self-friending Alice actually come up? Is there any way around it other than numbering the Alices?
> ...The trick being that every entity in the system ended up having two IDs...
I think we agree that this is a partial solution. Adding a temporal dimension and referencing immutable single-versions only can break cycles by making them unconstructable in the first place. But once an object refers to foreign entity IDs, hash IDs become "polluted" with surrogate values.
> How often does the problem of differentiating between a graph containing identical Alices vs one with a single self-friending Alice actually come up? Is there any way around it other than numbering the Alices?
I think it would come up with forks that have some identical edits, especially if those edits are in different orders. In that case, surrogate keygen state would get out of sync (whether it's counters or UUID state). Either we pessimize merges, or we need some way to recover.
I think allowing for "identical Alices" is probably necessary in practice, but an interface should have a warning of some sort about this. (Maybe ask the Alices more questions until you can differentiate them? Get prepared in case one of the alices comes back and wants a new I.D. card and you don't want to easily enable fraud.) Likewise when merging those warnings should be brought to the fore, along with a menu of resolutions at extremes for the user to decide between.
A list of all current entities, by hash, is stored higher up the tree. Whether it's better that objects themselves store their entity ID or if that's a separate data structure mapping entity to hash IDs depends on the situation.
On second reading I guess your comment was actually about how to come up with content-based IDs for objects. I guess my point was that in the real world you don't usually need to do that, because if object identity besides its content is important you can just give it an arbitrary ID. How often does the problem of differentiating between a graph containing identical Alices vs one with a single self-friending Alice actually come up? Is there any way around it other than numbering the Alices?