Dolt might be good but never underestimate the power of Type 2 Slowly Changing D...

sixdimensional · on March 6, 2021

I definitely agree, just tossing in the broader concept that both Dolt and Type 2 SCD fall under: temporal databases [1].

I think the idea of a "diff" applied to datasets is quite awesome, but we already do something similar with databases today using data comparison tools. Most of those tools just aren't time-aware: they compare two instances of the data in different databases, not the same database at two points in time.

[1] https://en.wikipedia.org/wiki/Temporal_database
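
To make the "two points in time in the same database" idea concrete, here is a minimal sketch of a time-aware diff. It assumes Type 2 SCD style rows with hypothetical `valid_from`/`valid_to` columns; the table, names, and data are made up for illustration and are not taken from Dolt or any comparison tool.

```python
from datetime import date

# Hypothetical Type 2 SCD rows: (key, value, valid_from, valid_to).
# valid_to is None for the version that is still current.
ROWS = [
    ("alice", "NYC", date(2020, 1, 1), date(2020, 6, 1)),
    ("alice", "SF", date(2020, 6, 1), None),
    ("bob", "LA", date(2020, 3, 1), None),
]


def snapshot(rows, at):
    """Return {key: value} for the versions valid at `at`.

    Validity is treated as a half-open interval [valid_from, valid_to).
    """
    return {
        key: value
        for key, value, start, end in rows
        if start <= at and (end is None or at < end)
    }


def diff(rows, t1, t2):
    """Compare the same table at two points in time."""
    old, new = snapshot(rows, t1), snapshot(rows, t2)
    return {
        "added": {k: new[k] for k in new.keys() - old.keys()},
        "removed": {k: old[k] for k in old.keys() - new.keys()},
        "changed": {
            k: (old[k], new[k])
            for k in old.keys() & new.keys()
            if old[k] != new[k]
        },
    }
```

Calling `diff(ROWS, date(2020, 2, 1), date(2020, 7, 1))` on this sample data reports `bob` as added and `alice` as changed from `"NYC"` to `"SF"`.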

skybrian · on March 7, 2021

The commit log in Dolt is edit history. (When did someone import or edit the data? Who made the change?) It's not about when things happened.

To keep track of when things happened, you would still need date columns. But at least you don't need to handle two-dimensional history for auditing purposes. So, in your example, I think the "effective" date columns wouldn't be needed.

Dolt has ways to query how the dataset appeared at some time in the past. However, with data changes and schema changes mixed together in a single commit log, I could see this being troublesome.

I suppose writing code to convert old edit history to the new schema would still be possible, similar to how git allows you to create a new branch by rewriting an existing one.
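
A rough sketch of what that rewrite could look like, using a plain in-memory model rather than Dolt's actual API. The commit log format, the `migrate_row` helper, and the schema change itself (splitting `name` into `first` and `last`) are all assumptions made up for illustration.

```python
# Hypothetical commit log: each commit is (message, rows), where rows is a
# list of dicts in whatever schema was current when the commit was made.
OLD_HISTORY = [
    ("import", [{"name": "Ada Lovelace"}]),
    ("add row", [{"name": "Ada Lovelace"}, {"name": "Alan Turing"}]),
]


def migrate_row(row):
    """Assumed schema change: split `name` into `first` and `last`."""
    first, _, last = row["name"].partition(" ")
    return {"first": first, "last": last}


def rewrite_history(history, migrate):
    """Build a new history with the same commits, rows in the new schema.

    The old history is left untouched, much like rewriting onto a new
    branch in git leaves the original branch in place.
    """
    return [
        (message, [migrate(row) for row in rows])
        for message, rows in history
    ]


NEW_HISTORY = rewrite_history(OLD_HISTORY, migrate_row)
```

Every past commit in `NEW_HISTORY` can then be queried with the new schema, at the cost of the rewritten log no longer matching the original commit for commit.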

zachmu · on March 7, 2021

If all you want is AS OF semantics, then SCD2 is a great match. Used it a ton in application development myself.
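
For anyone who hasn't used it, a minimal sketch of AS OF lookups on an SCD2 table, using Python's built-in `sqlite3`. The `customer_history` table, its column names, and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer_history (
        customer_id    INTEGER NOT NULL,
        city           TEXT NOT NULL,
        effective_from TEXT NOT NULL,  -- ISO dates sort correctly as text
        effective_to   TEXT            -- NULL marks the current version
    )
    """
)
conn.executemany(
    "INSERT INTO customer_history VALUES (?, ?, ?, ?)",
    [
        (1, "NYC", "2020-01-01", "2020-06-01"),
        (1, "SF", "2020-06-01", None),
    ],
)


def city_as_of(conn, customer_id, as_of):
    """Return the city that was effective on `as_of`, or None if no version applies."""
    row = conn.execute(
        """
        SELECT city FROM customer_history
        WHERE customer_id = ?
          AND effective_from <= ?
          AND (effective_to IS NULL OR ? < effective_to)
        """,
        (customer_id, as_of, as_of),
    ).fetchone()
    return row[0] if row else None
```

Here `city_as_of(conn, 1, "2020-03-15")` returns `"NYC"`, while any date from `"2020-06-01"` onward returns `"SF"`. Each version closes exactly where the next one opens, so the half-open interval keeps the two from overlapping.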

Dolt actually makes branch and merge possible, totally different beast.

antman · on March 7, 2021

Couldn't something like that be done with SQL:2011 temporal tables?

There's an equivalent Postgres solution in this rather complex but great tutorial [0].

[0]: https://clarkdave.net/2015/02/historical-records-with-postgr...