Litestream live replication has been moved to the LiteFS project (github.com/benbjohnson)
90 points by hardwaresofton on Oct 14, 2022 | 50 comments



In a previous post about LiteFS [1], the creator Ben commented on how clients could maintain a monotonically consistent view of the database [2] even in the presence of replication lag and updates made by the client. I think this is a pretty good (TM) strategy that should work well for a majority of applications.

> > To improve latency, we're aiming at a scale-out model that works similarly to Fly Postgres. That's to say: writes get forwarded to the primary and all read requests get served from their local copies.

> How can you ensure that a client that just performed a forwarded write will be able to read that back on their local replica on subsequent reads?

> LiteFS provides a transaction ID that applications can use to determine replication lag. If the replica is behind the TXID, it can either wait or it can forward to the primary to ensure consistency.

[1]: https://news.ycombinator.com/item?id=32925734#32928974

[2]: I think this is a reasonable statement, but may not be industry standard terminology.
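
A rough Go sketch of what that "wait or forward" check could look like from the application side (the header name, the readTXID helper, and forwardToPrimary below are all made up for illustration, not LiteFS's actual API):

    package main

    import (
        "net/http"
        "strconv"
        "time"
    )

    // readTXID returns the local replica's current replication position. With
    // LiteFS this could be parsed from the position file that sits next to the
    // database in the FUSE mount (an assumption to verify against the docs);
    // it's left abstract here.
    func readTXID(dbPath string) (uint64, error) { return 0, nil }

    // forwardToPrimary proxies the request to the current primary node.
    // Hypothetical helper, e.g. a small reverse proxy.
    func forwardToPrimary(w http.ResponseWriter, r *http.Request) {}

    // serveRead implements the "wait or forward" logic from the quote: if the
    // local replica is behind the TXID the client last wrote at, either wait
    // briefly for it to catch up or forward the read to the primary.
    func serveRead(w http.ResponseWriter, r *http.Request) {
        // The client echoes back the TXID it was handed after its last write.
        // The header name is made up for this example.
        want, _ := strconv.ParseUint(r.Header.Get("X-Txid"), 16, 64)

        deadline := time.Now().Add(200 * time.Millisecond)
        for {
            cur, err := readTXID("/litefs/app.db")
            if err == nil && cur >= want {
                break // replica has caught up; safe to serve the read locally
            }
            if time.Now().After(deadline) {
                forwardToPrimary(w, r) // give up waiting; keep the read consistent
                return
            }
            time.Sleep(10 * time.Millisecond)
        }
        // ... run the read-only query against the local SQLite replica ...
    }

    func main() {
        http.HandleFunc("/", serveRead)
        http.ListenAndServe(":8080", nil)
    }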


A shameless plug here: I've been working on a project of my own called [Marmot](https://maxpert.github.io/marmot/) with the intention of making SQLite replication masterless and eventually consistent. The project is in very early stages, and I've been using it on one of my sites to replicate a cache built on SQLite. It builds on top of NATS (I love NATS) and distributes changes over JetStream.


> …NATS (I love NATS)…

https://nats.io/about/: "[NATS] enables applications to securely communicate across any combination of cloud vendors, on-premise, edge, web and mobile, and devices. […] The NATS Server acts as a central nervous system for building distributed applications."


Shameless plug of my mvSQLite [1] project here! It's basically another distributed SQLite (that is API-compatible with libsqlite3), but with support for everything expected from a proper distributed database: synchronous replication, strictly serializable transactions, + scalable reads and writes w/ multiple concurrent writers.

[1] https://github.com/losfair/mvsqlite


I really wanted to give this a try but the lack of WAL support has prevented me from using it. With the recent addition of WAL support[1] in litefs, would it be possible to add the same to mvsqlite too?

[1] https://github.com/superfly/litefs/pull/120


What are you trying to achieve with WAL mode? Is it some kind of application compatibility issue?

mvSQLite doesn't use SQLite's journaling mechanism at all (you can safely set journal_mode=off, although SQLite won't be able to do an explicit ROLLBACK in that case).
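
For reference, a minimal Go sketch of turning the journal off via the usual pragma (note: mattn/go-sqlite3 normally bundles its own SQLite, so actually wiring up mvSQLite's libsqlite3 replacement is a separate step not shown here):

    package main

    import (
        "database/sql"
        "log"

        _ "github.com/mattn/go-sqlite3" // any SQLite driver; see caveat above
    )

    func main() {
        db, err := sql.Open("sqlite3", "app.db")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // Per the parent comment, mvSQLite doesn't use SQLite's journal at all,
        // so it should be safe to disable it entirely.
        if _, err := db.Exec("PRAGMA journal_mode = OFF"); err != nil {
            log.Fatal(err)
        }
    }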


Yeah, I am trying to plug mvsqlite into a binary-only app which uses WAL mode. Changing to any other journaling mode just causes the app not to start at all.


> Distributed, MVCC SQLite that runs on top of FoundationDB.

FYI to anyone here, FoundationDB is fucking awesome for something like this.

Question @losfair: Did you find the Rust bindings for FDB to be very good? The Go bindings are OK, but are pretty out-of-date with some cool new features on the HEAD of the FDB source repo.


I found foundationdb-rs pretty good & robust. Haven't tried any FDB 7.x features (tenants, etc.) though.


mvSQLite looks great, though I’m curious how you’d implement a schema migration given the locking properties of SQLite combined with the transaction limits of fdb.

I imagine you'd get an FDB transaction time limit error, preventing any schema migrations over non-trivial amounts of data.


We found a way around the FDB transaction time limit. An mvSQLite transaction is not directly mapped to an FDB transaction (details in the readme and wiki).


I've read it - there's still a time limit, so long schema migrations would still be an issue. Even without FoundationDB, long schema migrations are a problem.


Online DDL is now a WIP feature. The first stage will allow converting the DB into read-only mode and running arbitrarily large schema migrations concurrently; eventually it will support fully concurrent DDL by replaying logs.


That sounds very promising. Excellent work with the project!


litefs and litestream are interesting, but neither supports confirmation that a transaction is durably replicated OR that failover won't cause data loss. until that point it just seems like a sequence of experiments.


Litestream/LiteFS author here. I agree that synchronous replication is important and we have plans to implement it in LiteFS. Because LiteFS supports a loose membership model, quorum-based acknowledgement doesn't really work as well since the quorum can change. We have some other pieces to put into place before synchronous replication can work well.

However, I disagree that it's just a "sequence of experiments". There are a lot of applications that can benefit from asynchronous replication. Synchronous acknowledgement of writes can impose a high cost on throughput so many folks use async replication, even in systems like Postgres.


Perhaps semi-sync replication could be a good middle ground too. That way you can have 1 replica extremely close to the master (same datacenter / building), while other async replicas are potentially further away / have slower commit times.


synchronous replication to replicas is not necessary for data durability and doesn’t have to be a huge drag on throughput. for example, you can achieve high throughput by pipelining uncommitted transactions and then tracking when they are durably committed to backing store (like s3) for when to ack back to clients. and when dealing with failover, you can use the central store for determining place in ledger rather than whatever is on the replica that happens to get leadership.
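
A rough Go sketch of that "pipeline now, ack later" idea (all names are made up; this is not how Litestream/LiteFS work today): transactions are applied locally right away, but each client's acknowledgement is parked until a background uploader reports that the commit's position has been flushed to the object store.

    package main

    import "sync"

    // durabilityTracker parks client acks until their commit position has been
    // confirmed durable in the backing store (e.g. S3). Illustrative only.
    type durabilityTracker struct {
        mu      sync.Mutex
        durable uint64 // highest position confirmed durable
        waiters map[uint64][]chan struct{}
    }

    // waitDurable returns a channel that is closed once position pos is durable.
    func (t *durabilityTracker) waitDurable(pos uint64) <-chan struct{} {
        t.mu.Lock()
        defer t.mu.Unlock()
        ch := make(chan struct{})
        if pos <= t.durable {
            close(ch) // already durable; ack immediately
            return ch
        }
        if t.waiters == nil {
            t.waiters = map[uint64][]chan struct{}{}
        }
        t.waiters[pos] = append(t.waiters[pos], ch)
        return ch
    }

    // advance is called by the uploader once everything up to pos has been
    // written to the object store; it releases all parked acks.
    func (t *durabilityTracker) advance(pos uint64) {
        t.mu.Lock()
        defer t.mu.Unlock()
        t.durable = pos
        for p, chans := range t.waiters {
            if p <= pos {
                for _, ch := range chans {
                    close(ch)
                }
                delete(t.waiters, p)
            }
        }
    }

    func main() {
        t := &durabilityTracker{}
        done := t.waitDurable(42) // transaction applied locally at position 42
        go t.advance(42)          // uploader later confirms the S3 flush up to 42
        <-done                    // only now is it safe to ack the client
    }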


Maybe the inherent overhead of synchronous replication is more about latency than throughput.


I wonder if fly.io & co have built home grown solutions for this?


Well, they're in the best (only?) spot to do it, since they own the platform that the end-programmer's code runs on. SQLite is well built and extensible, but it's hard to extend without the cooperation of the end-programmer, so to speak.

SQLite is unfortunately (?) hard to modify from an external process; even though it's built to be very extensible, it's often not something you can just... turn on, if that makes sense, since SQLite lives in the address space of the executing program.

You end up with stuff like hacking LD_PRELOAD[0].

Note: Litestream (Ben) was essentially acquihired by Fly.io (which should explain all their recent SQLite content!).

[0]: https://github.com/cventers/sqlite3-preload


Fly.io bought Litestream


The live replication (as it used to work in Litestream before the LiteFS move, without Consul) would have been perfect for our use case with Seafowl (I played around with Litestream before that but had to settle on PostgreSQL for the sample multi-node deployment [0]):

- rare writes that get directed to a single instance (e.g. using Fly.io's replay header), frequent reads (potentially at edge locations)

- no need to deploy a PostgreSQL cluster and set up logical replication

- SQLite database stored in object storage, reader replicas can boot up using the object storage copy and then get kept in sync by pulling data from the writer

- delay in replication is fine

LiteFS is probably going to be a great solution here since we're mainly using Fly.io and it has built-in support for it [1], but are there any alternatives that don't require Consul, still look like an SQLite database to the client, and can work off an HTTP connection to the primary, so that we don't have to require our users to deploy to Fly?

[0] https://seafowl.io/docs/guides/scaling-multiple-nodes

[1] https://fly.io/docs/litefs/getting-started/


LiteFS author here. You can also set up LiteFS to have a single, static leader instead of using Consul if you don't want that dependency. I need to write up some docs but there's some info on the PR itself: https://github.com/superfly/litefs/pull/47


Just to follow up on this, I got it working, thanks for the help! (for anyone else interested in all the details of getting LiteFS to run with Docker Compose + a Rust SQLite client: https://github.com/splitgraph/seafowl/tree/main/examples/lit...)


Ah, sweet (and thanks for building Litestream/LiteFS)! This should work great for us, will definitely try to get a PoC going with this.


https://github.com/backtrace-labs/verneuil doesn't even need an HTTP connection between the writer and readers: readers only interact with the object store.


Did anyone see an explanation of why (i.e. what does the FUSE-based approach gain that the Litestream one didn't have)?


I'm sure Ben will pop in with an explanation soon, but my understanding is that this is mainly about safety.

The way Litestream was doing replication required programmers to be extremely careful not to accidentally attempt a write to a replicated database copy - doing so would corrupt that copy, potentially in non-obvious ways. There was no practical mechanism for protecting people from making this mistake.

The FUSE approach for LiteFS has more control, so can do a better job of protecting people from this kind of mistake.


LiteFS author here. That's a good explanation, Simon. Preventing writes on the replica is a nice benefit to FUSE versus an external process.

Another benefit of the control is that we can maintain a rolling checksum of the entire database at each transaction so we're able to verify integrity when replicating. That's also what allows us to do asynchronous replication across a loose membership of nodes since we can easily detect if a node diverges.
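
For the curious, a toy Go illustration of why a database-wide rolling checksum can be kept up to date in O(1) per page write: XOR together per-page checksums, remove the old page's contribution, and add the new one. (LiteFS's actual checksum format is its own; this is only meant to show the incremental-update idea.)

    package main

    import (
        "encoding/binary"
        "fmt"
        "hash/crc64"
    )

    var table = crc64.MakeTable(crc64.ISO)

    // pageChecksum hashes a page together with its page number so identical
    // pages at different positions don't cancel out under XOR.
    func pageChecksum(pgno uint32, data []byte) uint64 {
        var buf [4]byte
        binary.BigEndian.PutUint32(buf[:], pgno)
        h := crc64.New(table)
        h.Write(buf[:])
        h.Write(data)
        return h.Sum64()
    }

    // db holds the rolling checksum: the XOR of every page's checksum. On a
    // page write we XOR out the old page's contribution and XOR in the new
    // one, so the database-wide checksum costs O(1) per write, not O(database).
    type db struct {
        pages    map[uint32][]byte
        checksum uint64
    }

    func (d *db) writePage(pgno uint32, data []byte) {
        if old, ok := d.pages[pgno]; ok {
            d.checksum ^= pageChecksum(pgno, old) // remove old contribution
        }
        cp := append([]byte(nil), data...)
        d.pages[pgno] = cp
        d.checksum ^= pageChecksum(pgno, cp) // add new contribution
    }

    func main() {
        d := &db{pages: map[uint32][]byte{}}
        d.writePage(1, []byte("hello"))
        d.writePage(2, []byte("world"))
        d.writePage(1, []byte("hello!")) // rewrite page 1; checksum updates incrementally
        fmt.Printf("%016x\n", d.checksum)
    }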


Why not use https://github.com/rqlite/rqlite ?

You have to write to the SQLite via the rqlite HTTP API, but it will replicate the data to N nodes (at least 20) via Raft, and then others can read the SQLite replica files directly (read-only); file permissions prevent accidental writes.
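
For reference, writes through rqlite look roughly like this in Go (the /db/execute endpoint and JSON payload shape are from memory of rqlite's docs, so double-check there; reads can then go through /db/query or, as above, the replica files directly):

    package main

    import (
        "bytes"
        "fmt"
        "net/http"
    )

    func main() {
        // A parameterized statement, JSON-encoded the way rqlite expects
        // (verify the exact shape against the rqlite docs).
        stmts := []byte(`[["INSERT INTO foo(name) VALUES(?)", "fiona"]]`)

        // Point this at the leader; depending on version/config, other nodes
        // either redirect or forward writes for you.
        resp, err := http.Post("http://localhost:4001/db/execute",
            "application/json", bytes.NewReader(stmts))
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        fmt.Println(resp.Status)
    }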


rqlite is a nice project, but you kind of covered it:

> You have to write to the SQLite via the rqlite HTTP API

Also, requiring determinism (i.e. no RANDOM()) is something I don't think I really want to worry about. There are a few tradeoffs with rqlite (and dqlite too, to be fair) that just don't seem to be quite worth it (especially compared to just running Postgres).

I think people are realizing that having one far away writer is actually fine -- 90%+ of the traffic you're trying to serve fast is read queries.


Actually, rqlite release 7.7.0[1] adds support for RANDOM(). Support for timestamp functions will be added in an upcoming release. It does this by statement rewriting[2] before sending the SQL to the Raft log.

[1] https://www.philipotoole.com/rqlite-7-7-0-released/

[2] https://github.com/rqlite/rqlite/blob/master/DOC/NON_DETERMI...
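
A deliberately naive Go illustration of the idea (rqlite rewrites the parsed statement using its SQL parser rather than doing string surgery, so don't read this as its implementation): the leader evaluates the non-deterministic function once, and the literal result is what goes into the Raft log, so every node applies identical SQL.

    package main

    import (
        "fmt"
        "math/rand"
        "strings"
    )

    // rewriteNonDeterministic replaces RANDOM() with a concrete value before
    // the statement is appended to the Raft log, so all replicas apply the
    // same SQL. (Toy version: real code works on the parsed statement.)
    func rewriteNonDeterministic(stmt string) string {
        val := rand.Int63() // evaluated once, on the leader
        return strings.ReplaceAll(stmt, "RANDOM()", fmt.Sprintf("%d", val))
    }

    func main() {
        fmt.Println(rewriteNonDeterministic("INSERT INTO t(x) VALUES (RANDOM())"))
    }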


And benbjohnson wrote the SQL parser[1] rqlite uses to do all this. So you see, the man is everywhere. :-)

[1] https://github.com/rqlite/sql


TIL, thank you for pointing this out -- maybe it's time to re-evaluate my stance on rqlite & dqlite


And this is why choosing a good explicit name for a project doesn't matter too much. Litestream was SQLite Streaming replication and now that's exactly what it doesn't do.

Requirements and features change with time, don't fret about names too much.


It's still streaming replication, but to S3/other-storage for backups, not high availability.


.. which was the only feature set when litestream was released. Replication was added later and has now been removed.


What's the overhead of using FUSE to implement LiteFS?

Is there an issue with OS compatibility? FUSE tends to require OS hooks, last I checked, and that can be somewhat hairy to deal with.


> What's the overhead of using FUSE to implement LiteFS?

It's a tricky question to answer. Most of the noticeable overhead is on the write side. Initial benchmarks of overhead that I've seen locally are about 250µs for the write(2) and fsync(2) calls. It's closer to 100µs for read(2) calls. There are additional writes made behind the scenes as well for storing in a replication format for the other nodes too.

However, on the read side some of that is moot. For many databases, most reads will hit the OS page cache, and a fetch from there seems to be closer to 4µs. If you're running a moderately sized database (e.g. 1GB) on even a modest VM (e.g. 256MB RAM) then most of your hot pages will be in the OS page cache, so you shouldn't notice much overhead on the read side.

LiteFS is targeted at read-heavy workloads. If you need high write throughput of thousands of writes per second then LiteFS probably isn't a good fit.

> Is there an issue with OS compatibility?

LiteFS is Linux right now. We'll be supporting other operating systems via a SQLite VFS extension in the future. macOS has poor FUSE support right now and I'm not sure where Windows and BSD stand with their support for FUSE or FUSE-like systems.


> We'll be supporting other operating systems via a SQLite VFS extension in the future.

Are there advantages to a FUSE-based approach over a VFS-based approach?


FUSE is a better developer experience since the end user doesn't have to worry about loading the extension every time they use a SQLite driver or the sqlite3 CLI. LiteFS also runs an HTTP API server to communicate between nodes, and if you run multiple processes with a VFS then you have to determine which one should start up that server. Those are the main advantages. Otherwise they're similar file-system-level interfaces.


> FUSE is a better developer experience since the end user doesn't have to worry about loading the extension every time they use a SQLite driver or the sqlite3 CLI.

For the CLI that's a fair concern, but from an "I'm writing an application that links against libsqlite3 and/or includes sqlite3.c" I feel like the opposite is true: I'll have a much easier time configuring the VFS in my app than fiddling with FUSE mounts.

> LiteFS runs an HTTP API server to communicate between nodes as well and if you run multiple processes with a VFS then you have to determine which one should start up that server.

Wouldn't that be a similar problem as working out which process should be the writer? I'm admittedly not too familiar with either Litestream or LiteFS, so that might be a dumb question.


What is the status of FUSE on macOS?


It's not great. tv42 (who maintains the FUSE implementation we use) commented on it recently[1]. macOS support will be via a VFS extension in the future.

[1]: https://github.com/superfly/litefs/issues/119#issuecomment-1...


Any chance this works together with Kubernetes?


LiteFS author here. I haven't tested it on Kubernetes yet but it is meant to be deployed anywhere. The only dependency is Consul although you could get around that by using a static leader[1].

[1]: https://github.com/superfly/litefs/pull/47


How do you turn a local-first, file-system-based database into a cloud software product with vendor lock-in?


How is there vendor lock-in? The LiteFS & Litestream code is all open source under Apache 2.


i think both fly.io and litestream were projects that spoke sweet words to the average developer wanting to build features without the infra headache, and what made sqlite so appealing was its simplicity. litestream kept that simplicity too. but i don't think the avg developer wants to spin up their own LiteFS. At a glance at the repo, i have no idea how to deploy this thing. It would have served fly and litestream users better to have kept replication within litestream.

just as no one using postgres on RDS will ever leave RDS, not b/c RDS is so much better than its competitors but because the hurdle is too great and the migration so risky. right now, fly is the only one who lessens the burden of using LiteFS, and as long as they're the only one, the average developer is essentially locked in.



