Litestream live replication has been moved to the LiteFS project (github.com/benbjohnson)
90 points by hardwaresofton on Oct 14, 2022 | 50 comments



In a previous post about LiteFS [1], the creator Ben commented on how clients could maintain a monotonically consistent view of the database [2] even in the presence of replication lag and updates made by the client. I think this is a pretty good (TM) strategy that should work well for a majority of applications.

> > To improve latency, we're aiming at a scale-out model that works similarly to Fly Postgres. That's to say: writes get forwarded to the primary and all read requests get served from their local copies.

> How can you ensure that a client that just performed a forwarded write will be able to read that back on their local replica on subsequent reads?

> LiteFS provides a transaction ID that applications can use to determine replication lag. If the replica is behind the TXID, it can either wait or it can forward to the primary to ensure consistency.

[1]: https://news.ycombinator.com/item?id=32925734#32928974

[2]: I think this is a reasonable statement, but may not be industry standard terminology.
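
A rough Go sketch of what that "wait or forward" check could look like from the application side (the header name, the readTXID helper, and forwardToPrimary below are all made up for illustration, not LiteFS's actual API):

    package main

    import (
        "net/http"
        "strconv"
        "time"
    )

    // readTXID returns the local replica's current replication position. With
    // LiteFS this could be parsed from the position file that sits next to the
    // database in the FUSE mount (an assumption to verify against the docs);
    // it's left abstract here.
    func readTXID(dbPath string) (uint64, error) { return 0, nil }

    // forwardToPrimary proxies the request to the current primary node.
    // Hypothetical helper, e.g. a small reverse proxy.
    func forwardToPrimary(w http.ResponseWriter, r *http.Request) {}

    // serveRead implements the "wait or forward" logic from the quote: if the
    // local replica is behind the TXID the client last wrote at, either wait
    // briefly for it to catch up or forward the read to the primary.
    func serveRead(w http.ResponseWriter, r *http.Request) {
        // The client echoes back the TXID it was handed after its last write.
        // The header name is made up for this example.
        want, _ := strconv.ParseUint(r.Header.Get("X-Txid"), 16, 64)

        deadline := time.Now().Add(200 * time.Millisecond)
        for {
            cur, err := readTXID("/litefs/app.db")
            if err == nil && cur >= want {
                break // replica has caught up; safe to serve the read locally
            }
            if time.Now().After(deadline) {
                forwardToPrimary(w, r) // give up waiting; keep the read consistent
                return
            }
            time.Sleep(10 * time.Millisecond)
        }
        // ... run the read-only query against the local SQLite replica ...
    }

    func main() {
        http.HandleFunc("/", serveRead)
        http.ListenAndServe(":8080", nil)
    }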


A shameless plug here: I've been working on a project of my own called [Marmot](https://maxpert.github.io/marmot/) with the intention of making SQLite replication masterless and eventually consistent. The project is in very early stages, and I've been using it on one of my sites to replicate a cache built on SQLite. It builds on top of NATS (I love NATS) and distributes changes over JetStream.


> …NATS (I love NATS)…

https://nats.io/about/: "[NATS] enables applications to securely communicate across any combination of cloud vendors, on-premise, edge, web and mobile, and devices. […] The NATS Server acts as a central nervous system for building distributed applications."


Shameless plug of my mvSQLite [1] project here! It's basically another distributed SQLite (that is API-compatible with libsqlite3), but with support for everything expected from a proper distributed database: synchronous replication, strictly serializable transactions, + scalable reads and writes w/ multiple concurrent writers.

[1] https://github.com/losfair/mvsqlite


I really wanted to give this a try but the lack of WAL support has prevented me from using it. With the recent addition of WAL support[1] in litefs, would it be possible to add the same to mvsqlite too?

[1] https://github.com/superfly/litefs/pull/120


What are you trying to achieve with WAL mode? Is it some kind of application compatibility issue?

mvSQLite doesn't use SQLite's journaling mechanism at all (you can safely set journal_mode=off, although SQLite won't be able to do an explicit ROLLBACK in that case).
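
For reference, a minimal Go sketch of turning the journal off via the usual pragma (note: mattn/go-sqlite3 normally bundles its own SQLite, so actually wiring up mvSQLite's libsqlite3 replacement is a separate step not shown here):

    package main

    import (
        "database/sql"
        "log"

        _ "github.com/mattn/go-sqlite3" // any SQLite driver; see caveat above
    )

    func main() {
        db, err := sql.Open("sqlite3", "app.db")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // Per the parent comment, mvSQLite doesn't use SQLite's journal at all,
        // so it should be safe to disable it entirely.
        if _, err := db.Exec("PRAGMA journal_mode = OFF"); err != nil {
            log.Fatal(err)
        }
    }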


Yeah, I am trying to plug mvsqlite into a binary-only app which uses WAL mode. Changing to any other journaling mode just causes the app not to start at all.


> Distributed, MVCC SQLite that runs on top of FoundationDB.

FYI to anyone here, FoundationDB is fucking awesome for something like this.

Question @losfair: Did you find the Rust bindings for FDB to be very good? The Go bindings are OK, but are pretty out-of-date with some cool new features on the HEAD of the FDB source repo.


I found foundationdb-rs pretty good & robust. Haven't tried any FDB 7.x features (tenants, etc.) though.


mvSQLite looks great, though I’m curious how you’d implement a schema migration given the locking properties of SQLite combined with the transaction limits of fdb.

I imagine you'd get an FDB transaction time limit error, preventing any schema migrations over non-trivial amounts of data.


We found a way around the FDB transaction time limit. An mvSQLite transaction is not directly mapped to an FDB transaction (details in the readme and wiki).


I've read it - there's still a time limit, so long schema migrations would still be an issue. Even without FoundationDB, long schema migrations are a problem.


Online DDL is now a WIP feature. The first stage will allow converting the DB into read-only mode and running arbitrarily large schema migrations concurrently; eventually it will support fully concurrent DDL by replaying logs.


That sounds very promising. Excellent work with the project!


litefs and litestream are interesting, but neither supports confirmation that a transaction is durably replicated OR that failover won't cause data loss. until that point it just seems like a sequence of experiments.


Litestream/LiteFS author here. I agree that synchronous replication is important and we have plans to implement it in LiteFS. Because LiteFS supports a loose membership model, quorum-based acknowledgement doesn't really work as well since the quorum can change. We have some other pieces to put into place before synchronous replication can work well.

However, I disagree that it's just a "sequence of experiments". There are a lot of applications that can benefit from asynchronous replication. Synchronous acknowledgement of writes can impose a high cost on throughput so many folks use async replication, even in systems like Postgres.


Perhaps semi-sync replication could be a good middle ground too. That way you can have 1 replica extremely close to the master (same datacenter / building), while other async replicas are potentially further away / have slower commit times.


synchronous replication to replicas is not necessary for data durability and doesn’t have to be a huge drag on throughput. for example, you can achieve high throughput by pipelining uncommitted transactions and then tracking when they are durably committed to backing store (like s3) for when to ack back to clients. and when dealing with failover, you can use the central store for determining place in ledger rather than whatever is on the replica that happens to get leadership.
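
A rough Go sketch of that "pipeline now, ack later" idea (all names are made up; this is not how Litestream/LiteFS work today): transactions are applied locally right away, but each client's acknowledgement is parked until a background uploader reports that the commit's position has been flushed to the object store.

    package main

    import "sync"

    // durabilityTracker parks client acks until their commit position has been
    // confirmed durable in the backing store (e.g. S3). Illustrative only.
    type durabilityTracker struct {
        mu      sync.Mutex
        durable uint64 // highest position confirmed durable
        waiters map[uint64][]chan struct{}
    }

    // waitDurable returns a channel that is closed once position pos is durable.
    func (t *durabilityTracker) waitDurable(pos uint64) <-chan struct{} {
        t.mu.Lock()
        defer t.mu.Unlock()
        ch := make(chan struct{})
        if pos <= t.durable {
            close(ch) // already durable; ack immediately
            return ch
        }
        if t.waiters == nil {
            t.waiters = map[uint64][]chan struct{}{}
        }
        t.waiters[pos] = append(t.waiters[pos], ch)
        return ch
    }

    // advance is called by the uploader once everything up to pos has been
    // written to the object store; it releases all parked acks.
    func (t *durabilityTracker) advance(pos uint64) {
        t.mu.Lock()
        defer t.mu.Unlock()
        t.durable = pos
        for p, chans := range t.waiters {
            if p <= pos {
                for _, ch := range chans {
                    close(ch)
                }
                delete(t.waiters, p)
            }
        }
    }

    func main() {
        t := &durabilityTracker{}
        done := t.waitDurable(42) // transaction applied locally at position 42
        go t.advance(42)          // uploader later confirms the S3 flush up to 42
        <-done                    // only now is it safe to ack the client
    }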


Maybe the inherent overhead of synchronous replication is more about latency than throughput.


I wonder if fly.io & co have built home grown solutions for this?


Well, they're in the best (only?) spot to do it, since they own the platform that the end-programmer's code runs on. SQLite is well built and extensible, but it's hard to extend without the cooperation of the end-programmer, so to speak.

SQLite is unfortunately (?) hard to modify from an external process; even though it's built to be very extensible, it's often not something you can just... turn on, if that makes sense, since SQLite lives in the address space of the executing program.

You end up with stuff like hacking LD_PRELOAD[0].

Note: Litestream (Ben) was essentially acquihired by Fly.io (which should explain all their recent SQLite content!).

[0]: https://github.com/cventers/sqlite3-preload


Fly.io bought Litestream


The live replication (as it used to work in Litestream before the LiteFS move, without Consul) would have been perfect for our use case with Seafowl (I played around with Litestream before that but had to settle on PostgreSQL for the sample multi-node deployment [0]):

- rare writes that get directed to a single instance (e.g. using Fly.io's replay header), frequent reads (potentially at edge locations)

- no need to deploy a PostgreSQL cluster and set up logical replication

- SQLite database stored in object storage, reader replicas can boot up using the object storage copy and then get kept in sync by pulling data from the writer

- delay in replication is fine

LiteFS is probably going to be a great solution here since we're mainly using Fly.io and it has built-in support for it [1], but are there any alternatives that don't require Consul, still look like an SQLite database to the client, and can work off an HTTP connection to the primary, so that we don't have to require our users to deploy to Fly?

[0] https://seafowl.io/docs/guides/scaling-multiple-nodes

[1] https://fly.io/docs/litefs/getting-started/


LiteFS author here. You can also set up LiteFS to have a single, static leader instead of using Consul if you don't want that dependency. I need to write up some docs but there's some info on the PR itself: https://github.com/superfly/litefs/pull/47


Just to follow up on this, I got it working, thanks for the help! (for anyone else interested in all the details of getting LiteFS to run with Docker Compose + a Rust SQLite client: https://github.com/splitgraph/seafowl/tree/main/examples/lit...)


Ah, sweet (and thanks for building Litestream/LiteFS)! This should work great for us, will definitely try to get a PoC going with this.


https://github.com/backtrace-labs/verneuil doesn't even need an HTTP connection between the writer and readers: readers only interact with the object store.


Did anyone see an explanation of why (i.e. what does the FUSE-based approach gain that the Litestream one didn't have)?


I'm sure Ben will pop in with an explanation soon, but my understanding is that this is mainly about safety.

The way Litestream was doing replication required programmers to be extremely careful not to accidentally attempt a write to a replicated database copy - doing so would corrupt that copy, potentially in non-obvious ways. There was no practical mechanism for protecting people from making this mistake.

The FUSE approach for LiteFS has more control, so can do a better job of protecting people from this kind of mistake.


LiteFS author here. That's a good explanation, Simon. Preventing writes on the replica is a nice benefit to FUSE versus an external process.

Another benefit of the control is that we can maintain a rolling checksum of the entire database at each transaction so we're able to verify integrity when replicating. That's also what allows us to do asynchronous replication across a loose membership of nodes since we can easily detect if a node diverges.
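
For the curious, a toy Go illustration of why a database-wide rolling checksum can be kept up to date in O(1) per page write: XOR together per-page checksums, remove the old page's contribution, and add the new one. (LiteFS's actual checksum format is its own; this is only meant to show the incremental-update idea.)

    package main

    import (
        "encoding/binary"
        "fmt"
        "hash/crc64"
    )

    var table = crc64.MakeTable(crc64.ISO)

    // pageChecksum hashes a page together with its page number so identical
    // pages at different positions don't cancel out under XOR.
    func pageChecksum(pgno uint32, data []byte) uint64 {
        var buf [4]byte
        binary.BigEndian.PutUint32(buf[:], pgno)
        h := crc64.New(table)
        h.Write(buf[:])
        h.Write(data)
        return h.Sum64()
    }

    // db holds the rolling checksum: the XOR of every page's checksum. On a
    // page write we XOR out the old page's contribution and XOR in the new
    // one, so the database-wide checksum costs O(1) per write, not O(database).
    type db struct {
        pages    map[uint32][]byte
        checksum uint64
    }

    func (d *db) writePage(pgno uint32, data []byte) {
        if old, ok := d.pages[pgno]; ok {
            d.checksum ^= pageChecksum(pgno, old) // remove old contribution
        }
        cp := append([]byte(nil), data...)
        d.pages[pgno] = cp
        d.checksum ^= pageChecksum(pgno, cp) // add new contribution
    }

    func main() {
        d := &db{pages: map[uint32][]byte{}}
        d.writePage(1, []byte("hello"))
        d.writePage(2, []byte("world"))
        d.writePage(1, []byte("hello!")) // rewrite page 1; checksum updates incrementally
        fmt.Printf("%016x\n", d.checksum)
    }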


Why not use https://github.com/rqlite/rqlite ?

You have to write to the SQLite via the rqlite HTTP API, but it will replicate the data to N nodes (at least 20) via Raft, and then others can read the SQLite replica files directly (read-only); file permissions prevent accidental writes.
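
For reference, writes through rqlite look roughly like this in Go (the /db/execute endpoint and JSON payload shape are from memory of rqlite's docs, so double-check there; reads can then go through /db/query or, as above, the replica files directly):

    package main

    import (
        "bytes"
        "fmt"
        "net/http"
    )

    func main() {
        // A parameterized statement, JSON-encoded the way rqlite expects
        // (verify the exact shape against the rqlite docs).
        stmts := []byte(`[["INSERT INTO foo(name) VALUES(?)", "fiona"]]`)

        // Point this at the leader; depending on version/config, other nodes
        // either redirect or forward writes for you.
        resp, err := http.Post("http://localhost:4001/db/execute",
            "application/json", bytes.NewReader(stmts))
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        fmt.Println(resp.Status)
    }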


rqlite is a nice project, but you kind of covered it:

> You have to write to the SQLite via the rqlite HTTP API

Also, requiring determinism (i.e. no RANDOM()) is something I don't think I really want to worry about. There are a few tradeoffs with rqlite (and dqlite too, to be fair) that just don't seem to be quite worth it (especially compared to just running Postgres).

I think people are realizing that having one far away writer is actually fine -- 90%+ of the traffic you're trying to serve fast is read queries.


Actually, rqlite release 7.7.0[1] adds support for RANDOM(). Support for timestamp functions will be added in an upcoming release. It does this by statement rewriting[2] before sending the SQL to the Raft log.

[1] https://www.philipotoole.com/rqlite-7-7-0-released/

[2] https://github.com/rqlite/rqlite/blob/master/DOC/NON_DETERMI...
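
A deliberately naive Go illustration of the idea (rqlite rewrites the parsed statement using its SQL parser rather than doing string surgery, so don't read this as its implementation): the leader evaluates the non-deterministic function once, and the literal result is what goes into the Raft log, so every node applies identical SQL.

    package main

    import (
        "fmt"
        "math/rand"
        "strings"
    )

    // rewriteNonDeterministic replaces RANDOM() with a concrete value before
    // the statement is appended to the Raft log, so all replicas apply the
    // same SQL. (Toy version: real code works on the parsed statement.)
    func rewriteNonDeterministic(stmt string) string {
        val := rand.Int63() // evaluated once, on the leader
        return strings.ReplaceAll(stmt, "RANDOM()", fmt.Sprintf("%d", val))
    }

    func main() {
        fmt.Println(rewriteNonDeterministic("INSERT INTO t(x) VALUES (RANDOM())"))
    }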


And benbjohnson wrote the SQL parser[1] rqlite uses to do all this. So you see, the man is everywhere. :-)

[1] https://github.com/rqlite/sql


TIL, thank you for pointing this out -- maybe it's time to re-evaluate my stance on rqlite & dqlite


And this is why choosing a good explicit name for a project doesn't matter too much. Litestream was SQLite Streaming replication and now that's exactly what it doesn't do.

Requirements and features change with time, don't fret about names too much.


It's still streaming replication, but to S3/other-storage for backups, not high availability.


.. which was the only feature set when litestream was released. Replication was added later and has now been removed.


What's the overhead of using FUSE to implement LiteFS?

Is there an issue with OS compatibility? FUSE tends to require OS hooks, last I checked, and that can be somewhat hairy to deal with.


> What's the overhead of using FUSE to implement LiteFS?

It's a tricky question to answer. Most of the noticeable overhead is on the write side. Initial benchmarks of overhead that I've seen locally are about 250µs for the write(2) and fsync(2) calls. It's closer to 100µs for read(2) calls. There are additional writes made behind the scenes as well for storing in a replication format for the other nodes too.

However, on the read side some of that is moot. For many databases, most reads will hit the OS page cache, and a fetch from there seems to be closer to 4µs. If you're running a moderately sized database (e.g. 1GB) on even a modest VM (e.g. 256MB RAM) then most of your hot pages will be in the OS page cache, so you shouldn't notice much overhead on the read side.

LiteFS is targeted at read-heavy workloads. If you need high write throughput of thousands of writes per second then LiteFS probably isn't a good fit.

> Is there an issue with OS compatibility?

LiteFS is Linux right now. We'll be supporting other operating systems via a SQLite VFS extension in the future. macOS has poor FUSE support right now and I'm not sure where Windows and BSD stand with their support for FUSE or FUSE-like systems.


> We'll be supporting other operating systems via a SQLite VFS extension in the future.

Are there advantages to a FUSE-based approach over a VFS-based approach?


FUSE is a better developer experience since the end user doesn't have to worry about loading the extension every time they use a SQLite driver or the sqlite3 CLI. LiteFS also runs an HTTP API server to communicate between nodes, and if you run multiple processes with a VFS then you have to determine which one should start up that server. Those are the main advantages. Otherwise they're similar file-system-level interfaces.


> FUSE is a better developer experience since the end user doesn't have to worry about loading the extension every time they use a SQLite driver or the sqlite3 CLI.

For the CLI that's a fair concern, but from an "I'm writing an application that links against libsqlite3 and/or includes sqlite3.c" I feel like the opposite is true: I'll have a much easier time configuring the VFS in my app than fiddling with FUSE mounts.

> LiteFS runs an HTTP API server to communicate between nodes as well and if you run multiple processes with a VFS then you have to determine which one should start up that server.

Wouldn't that be a similar problem as working out which process should be the writer? I'm admittedly not too familiar with either Litestream or LiteFS, so that might be a dumb question.


What is the status of FUSE on macOS?


It's not great. tv42 (who maintains the FUSE implementation we use) commented on it recently[1]. macOS support will be via a VFS extension in the future.

[1]: https://github.com/superfly/litefs/issues/119#issuecomment-1...


Any chance this works together with Kubernetes?


LiteFS author here. I haven't tested it on Kubernetes yet but it is meant to be deployed anywhere. The only dependency is Consul although you could get around that by using a static leader[1].

[1]: https://github.com/superfly/litefs/pull/47


How do you turn a local-first, file-system-based database into a cloud software product with vendor lock-in?


How is there vendor lock-in? The LiteFS & Litestream code is all open source under Apache 2.


i think both fly.io and litestream were projects that spoke sweet words to the average developer wanting to build features without the infra headache, and what made sqlite so appealing was its simplicity. litestream kept that simplicity too. but i don't think the avg developer wants to spin up their own LiteFS. At a glance at the repo, i have no idea how to deploy this thing. It would have served fly and litestream users better to have kept replication within litestream.

just as no one using postgres on RDS will ever leave RDS, not b/c RDS is so much better than its competitors but because the hurdle is too great and the migration so risky. right now, fly is the only one who lessens the burden of using LiteFS, and as long as they're the only one, the average developer is essentially locked in.



