> So, why does it have to be every keystroke? We can batch them locally.
This approach would absolutely work. Thanks for bringing it up! The problem is that it has a cost I'd rather not pay.
Batching only works if you don't send anything until the end of the batch, and that gives you an awful tradeoff - the more "live" your editing system is, the bigger the overhead. It's really nice seeing someone type live. Seeing their changes "pop in" after 5 seconds doesn't feel great; it doesn't feel current, and I think it's too slow for pair programming over Zoom and things like that. Batching also makes the DAG of changes much more complicated, because there's a much higher chance that changes are concurrent when there's that much latency. And that in turn makes other aspects of the system do a lot more work to merge.
With hashing alone, you don't need to batch changes like that. Instead, you can just store a hash on disk every (for example) 100 changes, and regenerate the hashes for the changes in the middle when needed. (Though you need some other tricks so you know where to look for a given hash.)
Respect where it's due, that would 100% solve it. I'm just looking for a way we can delete data without needing to give up the liveness.
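A minimal sketch of that checkpoint-and-regenerate idea, assuming a SHA-256 hash chain and an interval of 100 changes (the names and layout here are illustrative, not details of the actual system):

```python
import hashlib

CHECKPOINT_EVERY = 100  # assumed interval, following the "every 100 changes" example above

def chain_hash(prev_hash: bytes, change: bytes) -> bytes:
    """Hash of a single change, chained onto the previous hash."""
    return hashlib.sha256(prev_hash + change).digest()

def build_checkpoints(changes: list[bytes]) -> dict[int, bytes]:
    """Walk the change log once, keeping only every Nth hash on disk."""
    checkpoints = {}
    h = b"\x00" * 32  # genesis hash
    for i, change in enumerate(changes):
        h = chain_hash(h, change)
        if (i + 1) % CHECKPOINT_EVERY == 0:
            checkpoints[i] = h
    return checkpoints

def hash_at(index: int, changes: list[bytes], checkpoints: dict[int, bytes]) -> bytes:
    """Regenerate the hash of an arbitrary change by replaying from the nearest
    earlier checkpoint, instead of persisting every intermediate hash."""
    start = (index // CHECKPOINT_EVERY) * CHECKPOINT_EVERY - 1
    h = checkpoints.get(start, b"\x00" * 32)
    for change in changes[start + 1 : index + 1]:
        h = chain_hash(h, change)
    return h
```

The tradeoff is a little recompute on lookup in exchange for not persisting a hash per keystroke.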
> We should also permit the author of an operation to rollback the operation.
This doesn't completely solve the problem of hashes and guessability. Say in a 5-second batch I only type one character. We store some metadata, the character, and the hash of the metadata and character. Then in the next 5 seconds I issue a rollback operation to delete that character. Even if the character is deleted from durable storage, anyone holding the hash can still figure out what the character was by brute-forcing candidates and checking them against the stored hash for that batch.
I think that's fixable by adding junk data to every batch. I just bet there are ways to avoid doing that.
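To make the guessability point concrete, here is a toy sketch; the batch layout, field names, and use of SHA-256 are assumptions for illustration, not details of the real system:

```python
import hashlib
import secrets

def batch_hash(metadata: bytes, content: bytes, junk: bytes = b"") -> bytes:
    # Hypothetical batch layout; the real format isn't specified in the thread.
    return hashlib.sha256(metadata + content + junk).digest()

# A batch containing a single typed character. The character is later
# "deleted" from durable storage, but the metadata and hash remain.
metadata = b'{"author": "alice", "seq": 42}'
stored_hash = batch_hash(metadata, b"x")

# Anyone who still has the metadata and the hash can recover the deleted
# character by brute force: try every candidate and compare hashes.
recovered = next(
    chr(c) for c in range(128)
    if batch_hash(metadata, chr(c).encode()) == stored_hash
)
print(recovered)  # prints "x"

# Mixing unpredictable junk (a per-batch random nonce) into the hashed data
# defeats the brute force, provided the nonce is discarded along with the
# deleted content.
nonce = secrets.token_bytes(16)
salted_hash = batch_hash(metadata, b"x", junk=nonce)
```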
> Are you using a rolling hash (such as the one employed in the Rabin-Karp string search algorithm)? You can then store the hash at checkpoints, and the intermediate hashes between checkpoints can be regenerated using the rolling hash.
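For readers who haven't met the term, here is a minimal sketch of a Rabin-Karp style rolling hash; the base and modulus are arbitrary textbook choices, not values from any real implementation. The point is that the hash of a window can be updated in constant time as the window slides, which is what makes checkpoint-and-regenerate schemes cheap.

```python
BASE = 257
MOD = (1 << 61) - 1  # a large prime

def full_hash(window: bytes) -> int:
    """Polynomial hash of the whole window, computed from scratch."""
    h = 0
    for b in window:
        h = (h * BASE + b) % MOD
    return h

def roll(h: int, outgoing: int, incoming: int, window_len: int) -> int:
    """Slide the window one byte: remove `outgoing`, append `incoming`, in O(1)."""
    h = (h - outgoing * pow(BASE, window_len - 1, MOD)) % MOD
    return (h * BASE + incoming) % MOD

data = b"collaborative editing"
w = 4
h = full_hash(data[:w])
for i in range(1, len(data) - w + 1):
    h = roll(h, data[i - 1], data[i + w - 1], w)
    assert h == full_hash(data[i : i + w])  # rolled hash matches the from-scratch hash
```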