Hacker News new | past | comments | ask | show | jobs | submit login

Dolt might be good but never underestimate the power of Type 2 Slowly Changing Dimension tables [1]. For example, if you had an SSIS package that took CSV and imported them into a database, and one day you noticed it accidently rounded the value incorrectly, you could fix the data and retain traceability of the data which was there originally.

E.g., SSIS package writes row of data: https://imgur.com/DClXAi5

Then a few months later (on 2020-08-15) we identify that trans_value was imported incorrectly so we update it: https://imgur.com/wdQJWm4

Then whenever we SELECT from the table we always ensure we are extracting "today's" version of the data:

    select * from table where TODAY between effective_from and effective_to
[1] https://en.wikipedia.org/wiki/Slowly_changing_dimension



I definitely agree, just tossing in the superset concept that Dolt and Type 2 SCD involve - temporal databases [1].

I think the idea of a "diff" applied to datasets is quite awesome, but even then, we kind of do that with databases today with data comparison tools - it's just most of them are not time aware, rather they are used to compare data between two instances of the data in different databases, not at two points in time in the same database.

[1] https://en.wikipedia.org/wiki/Temporal_database


The commit log in Dolt is edit history. (When did someone import or edit the data? Who made the change?) It's not about when things happened.

To keep track of when things happened, you would still need date columns to handle it. But at least you don't need to handle two-dimensional history for auditing purposes. So, in your example, I think the "effective" date columns wouldn't be needed.

They have ways to query how the dataset appeared at some time in the past. However, with both data changes and schema changes being mixed together in a single commit log, I could see this being troublesome.

I suppose writing code to convert old edit history to the new schema would still be possible, similar to how git allows you to create a new branch by rewriting an existing one.


If all you want is AS OF semantics, then SCD2 is a great match. Used it a ton in application development myself.

Dolt actually makes branch and merge possible, totally different beast.


Can't something like that work as SQL:2011 temporal tables?

Postgres equivalent solution in this [0] rather complex but great tutorial

[0]: https://clarkdave.net/2015/02/historical-records-with-postgr...




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: