
Differential Dataflow is going to cause a huge upset in several high value areas, including incremental compilation, database view maintenance, UI view maintenance (subsumes React), and differential database streaming: https://github.com/TimelyDataflow/differential-dataflow



Ooh, I had no idea this had a name and that people were working on it. Super promising, I agree, though I think a lot of it depends on the expressiveness of the functional primitives, the practical efficiency of partial updates, and the amount of caching that actually gets used in updates when the program is complex.

The big idea that convinced me of its worth is the interface being "build your derived state from scratch". The user never modifies anything after the fact; it's all versioned "source data" and immutable derived state.
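
To make the interface concrete, here's a tiny plain-Rust sketch (not Differential Dataflow itself; every name here is made up): source data is a versioned, immutable snapshot, and derived state only ever comes out of a pure function over it.

    // Plain Rust sketch (not Differential Dataflow): source data is a
    // versioned, immutable snapshot, and derived state is only ever the
    // result of a pure function over it. All names here are invented.
    #[derive(Clone)]
    struct Versioned<T> {
        version: u64,
        data: T, // never mutated; a new snapshot gets a new version
    }

    // Derived state is (conceptually) rebuilt from scratch on every new
    // version; an incremental engine would reuse prior work under the hood.
    fn derive(source: &Versioned<Vec<i64>>) -> Versioned<i64> {
        Versioned { version: source.version, data: source.data.iter().sum() }
    }

    fn main() {
        let v1 = Versioned { version: 1, data: vec![1, 2, 3] };
        let total = derive(&v1);
        println!("derived {} at version {}", total.data, total.version);
    }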

That's the magic that makes Docker and Redux work, in addition to the aforementioned React.


What's the main difference between this and event sourcing?


Not the OP, but as I understand this project, the idea is that you can represent a pipeline that looks like it's operating on an entire collection of data, but in reality can operate on a delta.

That means that if you have a million items in your input, and you add another million to it, your pipeline only needs to process the newly added million, even though the output will cover two million. In this case, the input is a collection that may be modified at any point, not a linear sequence of "events".
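
Roughly what that looks like with the Rust differential-dataflow crate, following the InputSession/to_collection/map/inspect pattern from its README (treat the details as a sketch, and assume timely and differential-dataflow as dependencies); batch sizes are shrunk so the printed diffs stay readable:

    // Hedged sketch with the differential-dataflow crate (API as in its
    // README; details approximate). The dataflow is written against the
    // "whole" collection, but only newly inserted records are actually
    // processed; inspect prints (record, time, diff) tuples for the delta.
    use differential_dataflow::input::InputSession;

    fn main() {
        timely::execute_from_args(std::env::args(), move |worker| {
            // (timestamp, record, diff) types chosen for the example
            let mut input = InputSession::<u32, u64, isize>::new();

            worker.dataflow(|scope| {
                input
                    .to_collection(scope)
                    .map(|x| x * 2) // logic phrased over the whole collection
                    .inspect(|x| println!("{:?}", x)); // (data, time, diff)
            });

            // First batch: each record is processed once.
            input.advance_to(0);
            for x in 0..5u64 {
                input.insert(x);
            }

            // Second batch: only these new records flow through the
            // operators, even though the logical output now covers both.
            input.advance_to(1);
            for x in 5..10u64 {
                input.insert(x);
            }
            // Dropping the input at the end of this closure lets the
            // worker drain the remaining work before returning.
        })
        .expect("computation terminated abnormally");
    }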

I've designed a similar system to ingest data that comes in the form of snapshots. The pipeline has many discrete steps and starts by splitting up the snapshot (e.g. a big monolithic JSON file) into smaller parts, which get parsed, transformed, split, joined, etc. When a modified file comes in, the pipeline only performs actions on modified items; as data trickles through the pipeline, subsequent steps only run if a step modifies an item (relative to current state). It significantly reduces processing.
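
Here's a simplified Rust sketch of that "only forward items whose output actually changed" rule; it's not the real system, and all the names are illustrative:

    // Simplified sketch of "only forward items whose output changed"
    // (not the actual system described above; all names illustrative).
    use std::collections::HashMap;

    // Run one pipeline step incrementally: transform each changed input
    // item, and return only the outputs that differ from current state.
    fn run_step(
        changed: &HashMap<String, String>,      // items modified upstream
        state: &mut HashMap<String, String>,    // this step's current output
        step: impl Fn(&str) -> String,          // the step's transformation
    ) -> HashMap<String, String> {
        let mut downstream = HashMap::new();
        for (key, value) in changed {
            let new_value = step(value);
            if state.get(key) != Some(&new_value) {
                state.insert(key.clone(), new_value.clone());
                downstream.insert(key.clone(), new_value);
            }
        }
        downstream
    }

    fn main() {
        // Pretend the snapshot was already split into keyed items and
        // diffed against the previous snapshot; two items changed.
        let changed = HashMap::from([
            ("user/1".to_string(), "alice".to_string()),
            ("user/2".to_string(), "bob".to_string()),
        ]);

        let mut state = HashMap::new();
        let transformed = run_step(&changed, &mut state, |s| s.to_uppercase());
        // Only items whose output changed reach the next step.
        println!("{} items passed downstream", transformed.len());
    }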

Pachyderm works on a similar principle, but using files in a virtual file system.


Differential Dataflow solves the second hardest problem in computer science - caching - by calculating the optimal reactions to an event that will maintain the client's view. Of course, this requires tracking client queries and state cursors.
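
For a flavor of what that looks like in code, a hedged sketch with the differential-dataflow crate (operator names as I recall the docs, so approximate): the maintained view is a count per event kind, and a new event produces only a retraction of the stale row and an assertion of the updated one, not a recomputation of the whole view.

    // Hedged sketch with the differential-dataflow crate (details
    // approximate). The maintained view is a count per key; when a new
    // event arrives, the output is just the retraction of the stale row
    // (diff -1) and the assertion of the new row (diff +1).
    use differential_dataflow::input::InputSession;
    use differential_dataflow::operators::Count;

    fn main() {
        timely::execute_from_args(std::env::args(), move |worker| {
            let mut events = InputSession::<u32, String, isize>::new();

            worker.dataflow(|scope| {
                events
                    .to_collection(scope)
                    .count() // the client's view: (event_kind, count)
                    .inspect(|x| println!("view change: {:?}", x));
            });

            // Events that establish the initial view.
            events.advance_to(0);
            events.insert("clicks".to_string());
            events.insert("clicks".to_string());
            events.insert("views".to_string());

            // One more event: the engine emits only the change,
            // ("clicks", 2) retracted and ("clicks", 3) asserted.
            events.advance_to(1);
            events.insert("clicks".to_string());
        })
        .expect("computation terminated abnormally");
    }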

For more background, look at Adapton and Jane Street's incremental DOM implementations: https://github.com/janestreet/incr_dom



