The issue I found with WITH RECURSIVE queries is that they're incredibly inefficient for anything but trees. I've looked around, and there doesn't seem to be any way to store a global list of visited nodes. This means that when traversing a graph, the recursive query will follow every path between two nodes.
I'm pretty sure you could also maintain a temp table and use some kind of "insert...where...returning" construct to squeeze that into a recursive query.
At moderate overhead, you could also return all seen nodes, with a flag identifying them as such, as part of your intermediate data at each recursive step.
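Something like this, say (a rough sketch, assuming a hypothetical edges(parent, child) table; the path column is the "seen so far" data carried along each branch):

    WITH RECURSIVE walk(id, path) AS (
      SELECT 1, ',1,'                                   -- start from root node 1
      UNION ALL
      SELECT e.child, w.path || e.child || ','
      FROM edges e JOIN walk w ON e.parent = w.id
      WHERE instr(w.path, ',' || e.child || ',') = 0    -- skip nodes already seen on this path
    )
    SELECT DISTINCT id FROM walk;

Note this only dedupes along each individual path, so a DAG can still get expanded once per distinct route to a node; it mostly just guards against cycles.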
The Postgres query optimizer struggles with recursive queries even when they're well suited to the problem, though. Are they actually efficient in SQLite even for trees?
Of course it's many orders of magnitude slower than keeping it all in in-memory maps and doing the traversal there, but fast enough not to be a limiting factor.
Traversing a medium depth DAG with a million nodes to find orphaned nodes takes less than a second on average hardware.
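For the curious, the orphan check can look roughly like this (a sketch, assuming hypothetical nodes(id) and edges(parent, child) tables plus a roots table marking the live heads):

    WITH RECURSIVE live(id) AS (
      SELECT id FROM roots
      UNION    -- UNION (not UNION ALL) discards duplicate rows, so each node is queued once
      SELECT e.child FROM edges e JOIN live l ON e.parent = l.id
    )
    SELECT n.id FROM nodes n WHERE n.id NOT IN (SELECT id FROM live);

An index on edges(parent) is what keeps the join cheap.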
One thing to be aware of is that SQLite has lots of tuning options, and they are all set to very conservative values by default.
E.g. synchronous defaults to FULL, which means SQLite flushes all the way to disk at every commit. The default cache size is also tiny.
With a bit of tuning you can get quite decent performance out of SQLite while still having full ACID guarantees, or very good performance for cases where you can compromise on the ACID stuff.
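For reference, the usual knobs are along these lines (values illustrative, not recommendations):

    PRAGMA journal_mode = WAL;     -- write-ahead logging: readers no longer block the writer
    PRAGMA synchronous = NORMAL;   -- fsync less aggressively than the FULL default
    PRAGMA cache_size = -64000;    -- ~64 MB of page cache (negative values are in KiB)
    PRAGMA temp_store = MEMORY;    -- keep temp tables and indices in RAM

WAL with synchronous=NORMAL stays consistent after a crash but can lose the last few commits; keep synchronous=FULL if you need full durability.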
I have not yet found a situation where NoSQL databases like LevelDB offer an order-of-magnitude advantage over SQLite, and SQLite is so much more powerful and robust...
> Traversing a medium depth DAG with a million nodes to find orphaned nodes takes less than a second on average hardware.
Unless you have an abnormally high edge count, that sounds super slow to me. Even accounting for metadata overhead and disk page slop, you're only reading and processing tens of megabytes, and every algorithm in sight is linear. I'd be surprised if you couldn't get a 2-5x speedup by reading the whole table into RAM in your favorite compiled/JITted language and just traversing it there.
> I have not yet found a situation where NoSQL databases like LevelDB offer an order-of-magnitude advantage over SQLite, and SQLite is so much more powerful and robust...
I have no skin in that game, but wouldn't some of the NoSQL solutions perform significantly better under heavily concurrent insertions and the other workloads they were designed for?
> I'm pretty sure you could also maintain a temp table and use some kind of "insert...where...returning" construct to squeeze that into a recursive query.
I'm not sure this is possible in SQLite; as far as I know, the WITH clause is limited to SELECT statements.
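What you can do instead (a sketch, again assuming a hypothetical edges(parent, child) table) is keep the visited set in a temp table and drive the expansion from application code:

    CREATE TEMP TABLE visited(id INTEGER PRIMARY KEY);
    INSERT OR IGNORE INTO visited VALUES (1);   -- seed with the root
    -- Re-run this until changes() reports that 0 rows were inserted:
    INSERT OR IGNORE INTO visited
    SELECT e.child FROM edges e JOIN visited v ON e.parent = v.id;

INSERT OR IGNORE against the primary key gives you exactly the global "already visited" check that the recursive CTE lacks. Each pass rescans every visited node; a separate frontier table would avoid that.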
> Are they actually efficient in sqlite even for trees?
Recursive common table expressions work by adding returned rows to a queue and then performing the recursive select statement independently on each row in the queue until it's empty.
You can use WITH RECURSIVE to traverse a tree by adding the root node to the queue and recursively visiting adjacent rows until the queue is empty. This works correctly and quickly because trees have only a single path between nodes. If you try the same query on a DAG, though, it will return every path to a given node, and you then have to do a GROUP BY outside the recursive query to pick, say, the shortest one. In the worst case, on a graph with many paths between nodes, this method is exponentially slower than a standard BFS.
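Concretely, a minimal walk looks like this (a sketch, assuming a hypothetical edges(parent, child) table):

    WITH RECURSIVE reachable(id) AS (
      SELECT 1                      -- root node id
      UNION ALL
      SELECT e.child FROM edges e JOIN reachable r ON e.parent = r.id
    )
    SELECT id FROM reachable;

On a tree, each node enters the queue exactly once. On a DAG, it enters once per distinct path to it, which is where the exponential worst case comes from. Switching UNION ALL to UNION makes SQLite discard duplicate rows, at the cost of keeping all previously generated rows around to check against.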
I did something similar recently: a block store for a Rust implementation of IPFS, which models a directed acyclic graph of content-addressed nodes.
https://github.com/actyx/ipfs-sqlite-block-store
I found that performance is pretty decent if you do almost everything inside SQLite using WITH RECURSIVE.
The documentation has some really great examples for WITH RECURSIVE. https://sqlite.org/lang_with.html