Azure DevOps has an additional requirement that Git clients support a protocol feature called "multi-ack". We don't support it yet, and we didn't think we'd need it.
Rather than blocking our roll-out on implementing multi-ack, we just disabled this for Azure DevOps for now. We do have a fallback as long as the user isn't using shallow clones.
Yes indeed! But this doesn't apply to a startup in the Explore phase, where you don't need replication, which is how we ran for a long time. This is the phase where this architecture is the most useful for product iteration.
But you're right, once you start using replication in the Expand phase, there certainly are engineering challenges, but they're all solvable. It might help that in Common Lisp we can hot-reload code, which makes some migrations a lot easier.
We used an existing library called bknr.datastore to handle this part, so we didn't have to reinvent the wheel :) I mentioned that at the end of the blog post, but I wanted to build up the idea for people who have no prior knowledge about how such things work.
To clarify, as I think some people have misunderstood: we used an existing library called bknr.datastore to handle the "database" part of the in-memory store, so we didn't have to invent too much. Our only innovation here was during the Expand phase, where we put that datastore behind Raft replication.
We do use Preset for metrics and dashboards, and obviously Preset isn't going to talk to our in-memory database.
So we do have a separate MySQL database where we just push analytics data (e.g. a row each time an event of interest happens). We never read from this database, and the data is schemaless JSON.
Preset then queries from this database for our metrics purposes.
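Concretely, the write path is just fire-and-forget: serialize the event as JSON and INSERT it. A minimal sketch of that idea (using cl-dbi and cl-json here; the table, column, and function names are made up and not our actual code):

    ;; Sketch: push an analytics event into a write-only MySQL table with a
    ;; single JSON column, e.g. created as:
    ;;   CREATE TABLE events (id BIGINT AUTO_INCREMENT PRIMARY KEY,
    ;;                        created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    ;;                        payload JSON);
    ;; Table/column names and the choice of cl-dbi/cl-json are illustrative.
    (defvar *analytics-conn*
      (dbi:connect :mysql
                   :database-name "analytics"
                   :username "analytics"
                   :password (uiop:getenv "ANALYTICS_DB_PASSWORD")))

    (defun push-analytics-event (event-name &rest properties)
      "Record EVENT-NAME plus PROPERTIES (a plist) as one schemaless JSON row."
      (let ((payload (cl-json:encode-json-plist-to-string
                      (list* :event event-name properties))))
        (dbi:execute
         (dbi:prepare *analytics-conn*
                      "INSERT INTO events (payload) VALUES (?)")
         (list payload))))

    ;; e.g. (push-analytics-event "screenshot-accepted" :channel "web" :user-id 42)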
Raft does do persistence and crash recovery, at least of the transaction logs.
What you need from your side (and there are libraries that already do this):
a) A mechanism to snapshot all the data
b) An easy in-memory mechanism to create indexes on fields--not strictly needed, but it definitely makes things a lot easier to work with (see the sketch below).
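For a sense of what (a) looks like in practice, here's a rough sketch with bknr.datastore (the directory, class, and slot names are made up):

    ;; Rough sketch using bknr.datastore's documented API; names are illustrative.
    (make-instance 'bknr.datastore:mp-store
                   :directory #p"/var/lib/example-store/"   ; transaction log + snapshots live here
                   :subsystems (list (make-instance 'bknr.datastore:store-object-subsystem)))

    ;; A persistent class: slot writes are recorded in the transaction log,
    ;; so the object graph can be replayed after a restart or crash.
    (defclass company (bknr.datastore:store-object)
      ((name :initarg :name :reader company-name))
      (:metaclass bknr.datastore:persistent-class))

    ;; make-instance on a persistent class is itself logged as a transaction.
    (make-instance 'company :name "foobar")

    ;; (a) Snapshot: serialize all live objects to disk and truncate the log.
    (bknr.datastore:snapshot)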
Bespoke data structures are just simple classes, so if you're familiar with traversing simple objects in the language of your choice, you're all set. You might be overestimating the benefits of a query engine (and I say this having worked at multiple places that used MySQL extensively, and having built heavily scaled software on MySQL in the past).
> Raft does do persistence and crash recovery, at least of the transaction logs.
It simply does not. The paper that definitionally is Raft doesn't tell you how to interact with durable storage. The Raft protocol handles crash recovery insofar as it allows one or more nodes to rebuild state after a crash, but Raft doesn't talk about serialization or WAL or any of the other things you inevitably have to do for reliability and performance. It gives you a way to go from some existing state to the state of the leader (even if that means downloading the full state from scratch), but it doesn't give you a way to go from a pile of bits on a disk to that existing state.
If you have a library that implements Raft and gives you those things, that's not Raft giving you things. And that library could just be SQLite.
> You might be over-estimating the benefits of a query engine
No, I'm not. It's great to describe the data I want and get, say, an array of strings back without having to crawl some Btrees by hand.
> The paper that definitionally is Raft doesn't tell you how to interact with durable storage.
That's being a bit pedantic. Yeah, I did mean that any respectable library implementing Raft would handle all of this correctly.
> without having to crawl some Btrees by hand.
This is not how I query an index. First, we don't even use Btrees: most of the time it's just hash tables, and otherwise a simpler form of binary search tree. But in both cases it's completely abstracted away in the library I'm using. So if I'm searching for companies with a given name, in my code it looks like '(company-with-name "foobar")'. If I'm looking for users that belong to a specific company, it'll look like '(users-for-company company)'.
So I still think you're overestimating the benefits of a query engine.
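To illustrate (not our actual schema; the index types are my best guess at the bknr.indices ones, and the class/slot names are made up), those query functions are just index readers generated from slot options:

    ;; Sketch: index readers come from slot-level index declarations
    ;; (bknr.datastore + bknr.indices). Names and index types are illustrative.
    (defclass company (bknr.datastore:store-object)
      ((name :initarg :name
             :reader company-name
             ;; unique index keyed on the name string
             :index-type bknr.indices:string-unique-index
             :index-reader company-with-name
             :index-values all-companies))
      (:metaclass bknr.datastore:persistent-class))

    (defclass user (bknr.datastore:store-object)
      ((company :initarg :company
                :reader user-company
                ;; non-unique index: maps a company to the list of its users
                :index-type bknr.indices:hash-index
                :index-reader users-for-company))
      (:metaclass bknr.datastore:persistent-class))

    ;; Usage:
    ;;   (company-with-name "foobar")      ; => a COMPANY object or NIL
    ;;   (users-for-company some-company)  ; => a list of USER objects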
This is fascinating, thanks for the data! I agree with the other reply to this: I probably should've said that it's easy to get a machine with 100s of GB of RAM instead of saying it's "cheap".
> It's quite odd that an argument grounded on performance claims
I probably did a bad job then, because everything in the blog post was meant to be about developer productivity, not performance. (I come from a developer productivity background. I'm decent at performance stuff, but it's not what excites me, since for most companies my size, performance is not critical as long as it scales.)
Using upload-pack allowed us to remove that constraint, since even in a shallow clone we can still get the commit graph via SSH from the remote.
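As a rough stand-in for what that looks like (the real code speaks the upload-pack protocol as a Git client; here I'm just shelling out to git over SSH to show the shape of the idea, and the host/repo paths are made up):

    ;; Sketch: read the remote's commit graph over SSH without a full local clone.
    ;; The real implementation talks to git-upload-pack directly; this stand-in
    ;; shells out to git on the remote. Host and repository path are made up.
    (defun remote-commit-graph (host repo-path)
      "Return a list of (commit parent...) SHA lists from the remote repository."
      (let ((output (uiop:run-program
                     (list "ssh" host
                           (format nil "git -C ~A rev-list --parents --all" repo-path))
                     :output :string)))
        (loop for line in (uiop:split-string output :separator '(#\Newline))
              unless (string= line "")
                collect (uiop:split-string line :separator '(#\Space)))))

    ;; e.g. (remote-commit-graph "git@git.example.com" "/repos/foo.git")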