Basically implement double entry accounting.
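As a rough illustration of the double-entry idea applied to data: history is never edited, every transaction must balance, and a correction is a new offsetting entry. This is a minimal sketch; the `Ledger` and `Posting` names and the sign convention are hypothetical, not taken from any particular system.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Posting:
    account: str
    amount: Decimal  # assumed convention: positive = debit, negative = credit


class Ledger:
    """Append-only ledger: recorded transactions are never edited or removed."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, tuple[Posting, ...]]] = []

    def __len__(self) -> int:
        return len(self._entries)

    def record(self, memo: str, *postings: Posting) -> int:
        # Double-entry invariant: every transaction's postings sum to zero.
        if sum((p.amount for p in postings), Decimal(0)) != 0:
            raise ValueError(f"unbalanced transaction: {memo}")
        self._entries.append((memo, postings))
        return len(self._entries) - 1

    def reverse(self, index: int, memo: str) -> int:
        # A correction is a new, offsetting transaction; the original stays visible.
        _, postings = self._entries[index]
        return self.record(memo, *(Posting(p.account, -p.amount) for p in postings))

    def balance(self, account: str) -> Decimal:
        # Current state is derived by summing the full history.
        return sum(
            (p.amount for _, ps in self._entries for p in ps if p.account == account),
            Decimal(0),
        )
```

Recording a sale and then reversing it leaves the cash balance at zero while both transactions remain in the ledger, so the audit trail is preserved.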

I think ideally you want something like what Rich Hickey describes when he talks about Datomic: an append-only database, where you can see the previous values for a row along with schema changes.
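A minimal sketch of an append-only fact log with "as of" queries, loosely modelled on Datomic's facts of entity, attribute, value, transaction and an added/retracted flag. It is not Datomic's API: `FactLog`, `transact`, `as_of` and `history` are hypothetical names, retraction is simplified to "attribute has no value", and schema changes are not modelled.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Fact:
    tx: int          # monotonically increasing transaction id
    entity: str
    attribute: str
    value: Any
    added: bool      # False marks a retraction, which is itself a new fact


class FactLog:
    def __init__(self) -> None:
        self._facts: list[Fact] = []  # only ever appended to
        self._tx = 0

    def transact(self, entity: str, attribute: str, value: Any, added: bool = True) -> int:
        self._tx += 1
        self._facts.append(Fact(self._tx, entity, attribute, value, added))
        return self._tx

    def as_of(self, entity: str, attribute: str, tx: int) -> Optional[Any]:
        # Replay facts up to and including tx; the latest assertion wins
        # unless a later retraction (within tx) cleared it.
        current = None
        for f in self._facts:
            if f.tx > tx:
                break
            if f.entity == entity and f.attribute == attribute:
                current = f.value if f.added else None
        return current

    def history(self, entity: str, attribute: str) -> list[tuple[int, Any, bool]]:
        # Every past value is still available, in transaction order.
        return [(f.tx, f.value, f.added) for f in self._facts
                if f.entity == entity and f.attribute == attribute]
```

Updating a user's email twice leaves both values queryable at their respective transactions, which is exactly the property that makes erasure requests awkward.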




I worry about GDPR with respect to Kafka and other append-only databases.


You can encrypt events and throw away the keys if the data must be made inaccessible. Of course, it adds complexity, but it's already being done.
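A minimal sketch of the "encrypt and throw away the key" approach (often called crypto-shredding), assuming one key per data subject kept in a separate, mutable key store while the event log stays append-only. The keystream construction (HMAC-SHA256 in counter mode) is used only to keep the example standard-library-only; it provides no authentication and is not a substitute for a vetted AEAD such as AES-GCM. `ShreddableLog` and its methods are hypothetical.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy PRF-based stream cipher: HMAC-SHA256(key, nonce || counter) blocks.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


class ShreddableLog:
    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}                  # mutable key store
        self._events: list[tuple[str, bytes, bytes]] = []  # append-only (subject, nonce, ciphertext)

    def __len__(self) -> int:
        return len(self._events)

    def append(self, subject: str, payload: bytes) -> None:
        key = self._keys.setdefault(subject, os.urandom(32))
        nonce = os.urandom(16)  # fresh nonce per event so keystreams never repeat
        self._events.append((subject, nonce, _xor(payload, _keystream(key, nonce, len(payload)))))

    def read(self, subject: str) -> list[bytes]:
        key = self._keys.get(subject)
        if key is None:
            return []  # key shredded: ciphertext remains but cannot be decrypted
        return [_xor(ct, _keystream(key, nonce, len(ct)))
                for s, nonce, ct in self._events if s == subject]

    def forget(self, subject: str) -> None:
        # Only the key is deleted; the append-only log is never touched.
        self._keys.pop(subject, None)
```

Note that the key store itself must support real, fine-grained deletion, so the hard problem moves there rather than disappearing; dropping a Python reference also does not scrub the key bytes from memory.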


This does not work for many data models, for both technical and economic reasons (e.g. it can increase costs by multiple orders of magnitude). Many real systems would require hundreds of millions of active encryption keys, encrypting data that is smaller than the encryption block size.
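Some back-of-the-envelope arithmetic makes the block-size point concrete. This sketch assumes AES (16-byte blocks), either in CBC mode with PKCS#7 padding and a random 16-byte IV stored per value, or in GCM with a 12-byte nonce and 16-byte tag, plus 256-bit keys; other schemes have different overheads, and it ignores the cost of key lookup and key management entirely.

```python
AES_BLOCK = 16  # AES block size in bytes


def cbc_size(n: int, iv: int = 16) -> int:
    # PKCS#7 always adds 1..16 bytes, rounding up to the next full block,
    # and a per-value IV is stored alongside the ciphertext.
    return iv + AES_BLOCK * (n // AES_BLOCK + 1)


def gcm_size(n: int, nonce: int = 12, tag: int = 16) -> int:
    # GCM needs no padding but stores a nonce and an authentication tag per value.
    return nonce + n + tag


def key_material_bytes(n_keys: int, key_len: int = 32) -> int:
    # Raw key bytes only; ignores indexes, key wrapping, and KMS overhead.
    return n_keys * key_len


for n in (4, 8, 16, 64):
    print(f"{n:>3} B plaintext -> CBC {cbc_size(n)} B, GCM {gcm_size(n)} B")
print(key_material_bytes(100_000_000))  # 100M subjects, one key each
```

For a 4-byte field both layouts come out at 32 bytes, an 8x expansion, and 100 million 256-bit keys are 3.2 GB of raw key material before any indexing or key-service overhead.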

Every database architecture that exists today is designed with the deep assumption that scalable fine-grained deletion will never be required, largely because we don't have good computer science for how to do it. As experienced database operations people know, if you are required to do this kind of delete, the sane approach, assuming you have ample excess capacity, is to completely rebuild your storage with the data to be deleted filtered out; it is often much faster than editing the existing storage. For some large-scale systems, there is no plausible solution.
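A minimal sketch of the rebuild-and-filter approach for a single segment stored as JSON lines. The file layout, the `subject` field and `rebuild_without` are hypothetical; a real engine would also rebuild indexes and coordinate with concurrent writers. It needs free space for a full second copy (the "ample excess capacity" caveat), and the old file's blocks may persist on the physical medium until they are overwritten.

```python
import json
import os
import tempfile


def rebuild_without(path: str, subjects_to_delete: set[str]) -> int:
    """Copy every record except those of deleted subjects into a new file,
    then atomically swap it in. Returns the number of records dropped."""
    dropped = 0
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new copy next to the original so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".rebuild")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out, open(path, encoding="utf-8") as src:
            for line in src:
                if json.loads(line).get("subject") in subjects_to_delete:
                    dropped += 1
                    continue
                out.write(line)
            out.flush()
            os.fsync(out.fileno())  # make sure the new copy is durable before swapping
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)  # leave the original untouched on failure
        raise
    return dropped
```

Rewriting sequentially like this is usually much cheaper than locating and editing individual records in place, which matches the observation above.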

This is an interesting computer science challenge -- a database kernel designed for efficient deletion -- and one I've thought about a lot over the last year while dealing with GDPR compliance. Such a kernel definitely doesn't exist today, and encryption doesn't help.


> Every database architecture that exists today is designed with the deep assumption that scalable fine-grained deletion will never be required, largely because we don't have good computer science for how to do it.

Can you explain this? Are you talking about something different from deleting individual rows?


A "row" is a logical abstraction. It doesn't exist as a physical thing in many database systems, especially modern ones.
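To illustrate why a "row" can be purely logical, here is a toy columnar layout, the general idea behind column-oriented storage: each column is its own array, and a row is just the values that share an index. `ColumnStore` is hypothetical, and real column stores add compression and immutable segments that make in-place removal far more expensive than this.

```python
from typing import Any, Iterable


class ColumnStore:
    """Toy columnar layout: each column is its own array, and a 'row' is
    only the set of values that share an index across those arrays."""

    def __init__(self, columns: Iterable[str]) -> None:
        self.columns: dict[str, list[Any]] = {name: [] for name in columns}

    def insert(self, **values: Any) -> None:
        # Missing columns are stored as None.
        for name, col in self.columns.items():
            col.append(values.get(name))

    def row(self, i: int) -> dict[str, Any]:
        # A logical row is reassembled on read; it is never stored contiguously.
        return {name: col[i] for name, col in self.columns.items()}

    def delete_physically(self, i: int) -> None:
        # Physically removing one row edits every column array and shifts the
        # position of every later row (O(n) per column with Python lists).
        for col in self.columns.values():
            del col[i]
```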

Furthermore, "delete" is commonly defined as "will not be returned in a future selection operation" -- there is no implication that any data is physically deleted and permanently inaccessible. Databases avoid physical deletion for very good technical reasons: doing so supports features and performance that everyone is accustomed to in a database.
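To show how a logical delete leaves data in storage, here is a toy log-structured merge (LSM) store, the general design behind engines such as RocksDB and Cassandra, heavily simplified: in-memory dicts stand in for sorted on-disk files, and there are no levels, write-ahead log, or snapshots. `ToyLSM` and its methods are hypothetical.

```python
from typing import Any, Optional

TOMBSTONE = object()  # sentinel meaning "deleted"; distinct from any real value


class ToyLSM:
    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable: dict[str, Any] = {}
        self.segments: list[dict[str, Any]] = []  # immutable once flushed, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: Any) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: the segment is written once and never edited in place.
            self.segments.append(dict(self.memtable))
            self.memtable = {}

    def delete(self, key: str) -> None:
        # A delete is just another write, of a tombstone.
        self.put(key, TOMBSTONE)

    def get(self, key: str) -> Optional[Any]:
        # Newest data wins: check the memtable, then segments newest to oldest.
        for layer in [self.memtable, *reversed(self.segments)]:
            if key in layer:
                value = layer[key]
                return None if value is TOMBSTONE else value
        return None

    def stored_values(self, key: str) -> list[Any]:
        # Values still physically present in flushed segments, deleted or not.
        return [s[key] for s in self.segments if s.get(key, TOMBSTONE) is not TOMBSTONE]

    def compact(self) -> None:
        # Full compaction: merge all segments; only now do shadowed values
        # and tombstones disappear from the flushed data.
        merged: dict[str, Any] = {}
        for seg in self.segments:
            merged.update(seg)
        self.segments = [{k: v for k, v in merged.items() if v is not TOMBSTONE}]
```

After `delete("u1")`, reads return `None` immediately, but the old value stays in an earlier segment until a compaction rewrites it. Immutable segments are what make writes cheap, which is why physical removal is deferred.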



