
Exactly. What Calvin (and FaunaDB, I think) does is allow you to run an arbitrary function, which can take input, on the database. That function can do whatever it wants, including any number of reads, writes, and arbitrary logic based on the reads. But critically, it can't talk back to the calling client, except to send the final result.
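
To make the shape of that concrete, here's a minimal sketch in Python, assuming a hypothetical submit_transaction API (the names are mine, not FaunaDB's actual interface):

    # Hypothetical API: the whole function ships to the database and runs
    # there as one atomic unit. Nothing inside it can talk back to the
    # client; only the return value comes back.
    def transfer(txn, src, dst, amount):
        src_balance = txn.read("accounts", src)    # any number of reads
        if src_balance < amount:                   # arbitrary logic based on the reads
            return {"ok": False, "reason": "insufficient funds"}
        dst_balance = txn.read("accounts", dst)
        txn.write("accounts", src, src_balance - amount)
        txn.write("accounts", dst, dst_balance + amount)
        return {"ok": True}                        # the only message the client sees

    result = db.submit_transaction(transfer, "alice", "bob", 100)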

This allows you to implement the pattern you describe, which I agree is common, but in a dramatically simpler way.

Having the database, which is not really a single thing but a swarm of computers spread across the globe, separated by unreliable links, all trying to stay in consensus, pause its work to hear back from the client just seems ... well, it seems miraculous that it can work at all.

The Calvin way is so much easier, it seems like there must be some very good reason that it's not what CockroachDB does. But I've never heard what that reason is.




Calvin has been an elegant protocol to work with in practice, and has pretty radically simplified FaunaDB's implementation of transactions compared to classic 2PC. Writes are committed in one global communication exchange, read isolation is pretty straightforward, and not requiring transaction recovery cuts out a significant amount of complexity which tends to be overlooked.
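
For anyone who hasn't read the paper, here's the structure as a toy sketch (assumed names, simplified well past the real protocol): transactions are sequenced into a replicated log in that single global exchange, and then every replica executes the log deterministically, which is why no per-transaction 2PC or recovery is needed.

    # Toy illustration of Calvin's two phases; not Fauna's implementation.
    def commit_batch(consensus, transactions):
        # Phase 1: one consensus round appends the batch to the global log.
        return consensus.append(transactions)

    def apply_log(replica, log):
        # Phase 2: every replica executes the same log in the same order.
        # Execution must be deterministic (no wall clocks, no randomness),
        # so replicas converge with no further coordination.
        for txn in log:
            replica.execute(txn)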

In talking with others, my best guess as to why we've seen relatively few implementations of it in the wild is that it is just less well understood compared to 2PC, so misconceptions propagate. The original paper focuses on how it works, rather than how to apply it in detail to generic transaction processing, which perhaps is a shame in hindsight considering that is where most of the confusion lies, IMHO.

For example, there is no reason stemming from Calvin that FaunaDB cannot provide full SQL-style session transactions. We chose not to implement them because they aren't a good fit for the core use-cases the system currently targets. Specifically, interactive transactions are too chatty for client-server interactions over the internet, where link latency dominates an application's perceived speed. Instead, FaunaDB's interface encourages packing as much logic into as few requests as possible. (But I suppose that's a topic for another comment thread.)
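
To make the latency argument concrete, a sketch of the difference, assuming a hypothetical client API (each comment marks a full WAN round trip; withdraw_if_sufficient is an illustrative server-side function holding the same logic):

    # Interactive style: three round trips, each paying full link latency.
    balance = session.query("SELECT balance FROM accounts WHERE id = 1")  # trip 1
    if balance >= 100:
        session.execute(
            "UPDATE accounts SET balance = balance - 100 WHERE id = 1")   # trip 2
    session.commit()                                                      # trip 3

    # Packed style: the same logic ships to the server in one request.
    result = db.submit_transaction(withdraw_if_sufficient,
                                   account=1, amount=100)                 # one trip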


Would that include transactions for which the reads can query the whole database as opposed to a predetermined set of rows?


The SQL support? Yes, even that.


> That function can do whatever it wants, including any number of reads, writes, and arbitrary logic based on the reads. But critically, it can't talk back to the calling client, except to send the final result.

If that function can still do everything the client did, and the client still has to wait for the transaction, then you are only eliminating interactive communication overhead, not actually improving or simplifying anything fundamental. There is still consensus and coordination happening during that wait, and all of this is still fundamentally incompatible with computers spread across the globe communicating over unreliable links.

What would actually be a big improvement is eliminating the waiting for coordination, but that would require some change in the programming model [1].
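
A toy example of the style [1] argues for: updates that commute, like a grow-only counter, can be accepted locally and merged later with no waiting at all. The catch, and the programming-model change, is that non-monotone logic (e.g. "check the balance, then withdraw") doesn't fit this shape without reintroducing coordination.

    # Grow-only counter: each replica tracks its own count; merging takes
    # the per-replica max, so merges commute and need no coordination.
    def merge(a, b):
        return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}

    def value(counts):
        return sum(counts.values())

    # Two replicas increment independently, then merge in either order.
    assert value(merge({"r1": 2}, {"r2": 3})) == 5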

> The Calvin way is so much easier, it seems like there must be some very good reason that it's not what CockroachDB does.

It's just not "so much easier", that's the reason.

[1] https://arxiv.org/pdf/1901.01930.pdf


Nobody in industry understood how to apply Calvin until we did at Fauna. That is the only reason; the rest is engineering path dependence.


If you don't mind sharing, I'm curious to learn more about what the main challenge was in applying Calvin.

Thanks!


See @freels’ reply above.

I think it is fair to say that the Calvin paper is visibly incomplete and expresses some constraints in a way that makes them seem insurmountable when they are not; specifically, they are only constraints within the log, but do not constrain the database experience overall.

Applying Calvin to traditional RDBMS workloads was a very unlikely creative exercise because it required questioning these explicit constraints.

The Spanner paper also leaves a lot unexplained, but it is less obvious until you are too far down the path to turn back. After all, it worked for Google. Calvin did not have that real-world proof. Combine that with the pessimism of the paper itself and nobody was willing to pick it up.


Wouldn't you have to implement a DSL and all the parts related to it for that to work? There are also things like serialization cost. What if I have a large local in-memory structure I want to base my query results off of? Funnily enough, this is kind of similar to the problem solved by things like Apache Beam.


No, you'd need a complete programming language. But SQL is basically one already; most variants are Turing-complete.

You'd also have to provide the programming environment with concepts like cursors so it could page through data efficiently.
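
Roughly this shape, assuming a hypothetical cursor API, where the server-side function pages through a large result set instead of materializing it all at once:

    # Hypothetical cursor API inside a server-side transaction function.
    def sum_amounts(txn):
        total = 0
        cursor = txn.open_cursor("events", page_size=1000)
        while True:
            page = cursor.next_page()   # fetch the next batch of rows
            if not page:
                break
            total += sum(row["amount"] for row in page)
        return total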


Every procedural layer I've ever used bolted onto SQL (PL/SQL, T-SQL) has been absolute goddamned agony to use.

SQL is a good (if dated) language for relational access and manipulation, but awful for procedural scripting.



