SQLite 2022 Recap (sqlite.org)
138 points by nalgeon on Jan 1, 2023 | 40 comments



A pretty boring and stable year in SQLite land, which is just how I like it.

The JSON -> and ->> operators introduced in 3.38 are my personal favorites, as these really help in implementing my "ship early, then maybe iterate as you're starting to understand what you're doing" philosophy:

1. In the initial app.sqlite database, have a few tables, all with just a `TEXT` field containing serialized JSON, mapping 1:1 to in-app objects;

2. Once a semblance of a schema materializes, use -> and ->> to create `VIEW`s with actual field names and corresponding data types, and update in-app SELECT queries to use those. At this point, it's also safe to start communicating database details to other developers requiring read access to the app data;

3. As needed, convert those `VIEW`s to actual `TABLE`s, so INSERT/UPDATE queries can be converted as well, and developers-that-are-not-me can start updating app data.

The interesting part here is that step (3) is actually not required for, like, 60% of successful apps, and (of course) for 100% of failed apps, saving hundreds of hours of upfront schema/database development time. Basically, you get 'NoSQL/YOLO' benefits for initial development, while still being able to communicate actual database details once things get serious...


As a counterpoint, I find that nutting out the schema upfront is an incredibly helpful process for defining the functionality of the app. Once you have a strict schema that models the application well, nearly everything else just falls into place. Strong foundation.


I worked at a startup that did the whole "every table schema is `id, data`, where data is a JSON blob with all the data" thing. I loved it at first. For initial development, it felt so freeing and so productive.

Then one of the devs left and I inherited his code, and started to realize exactly how much a rigid schema helps make an app more robust and more maintainable. I will never, ever take that shortcut again. The production outages/downtime caused by database issues that never should have been issues in the first place (like a corner-case row with missing data, or rampant defensive nil checks spread across every function that touches any field of a record, since any field at any time might be inconsistent or nil) contributed to killing the company by pissing off customers.

Elixir/Phoenix does help a lot though because I still treat most rows like JSON objects, but under the hood it's a normal rigid postgres schema. Best of both worlds IMHO.


With JSON blobs in simple tables a very large part of designing a database schema becomes designing a JSON schema, with different opportunities to make mistakes but (potentially) the same strictness.


How do you enforce the JSON schema?


We are talking about the internal database of some application: its schema is something that should be designed well, not enforced defensively like a schema for validating undependable inputs.


Define the schema with a data class, and you have tools like protobuf/thrift. And I find it really unnecessary to map every attribute to a SQL column.


In the application layer
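Beyond app-layer validation, SQLite itself can at least reject malformed JSON at the table level via a CHECK constraint (a minimal sketch; table and field names are invented, and this checks well-formedness only, not the shape of the data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        id   INTEGER PRIMARY KEY,
        data TEXT NOT NULL CHECK (json_valid(data))
    )
""")

# Well-formed JSON is accepted...
conn.execute("INSERT INTO events (data) VALUES (?)", ('{"type": "click"}',))

# ...while malformed JSON fails the CHECK constraint at insert time.
try:
    conn.execute("INSERT INTO events (data) VALUES (?)", ("not json",))
except sqlite3.IntegrityError:
    print("rejected malformed JSON")
```

Field-level guarantees (required keys, value types) still have to live in the application layer or in more elaborate CHECK expressions.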


Nutting?




Thinking. British English idiom. Using your nut - your head.


Tends to mean "ejaculating" in American English, for the record.


What's the difference?


Similarly, one of my most important projects makes a lot of API calls returning serialized JSON. Those calls are expensive, so I have, over time, tried many complicated caching mechanisms.

These days though it's _so_much_simpler_and_cleaner_ to just wrap the call in a decorator that caches the request to sqlite3 and only makes the call if the cache is stale.

I don't worry about parsing the results or doing any of the heavy lifting right away - just cache the json.

SQLite is so good at querying those blobs, and is so fast, that it's just not worth munging them. Nice.

And using something like datasette to prototype queries for some of the more complicated tree structures is a breeze.
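The decorator pattern described above can be sketched like this (all names and the one-hour staleness window are my own choices, not the commenter's actual code):

```python
import functools
import json
import sqlite3
import time

conn = sqlite3.connect(":memory:")  # swap for a real path to persist
conn.execute("""
    CREATE TABLE IF NOT EXISTS cache (
        key        TEXT PRIMARY KEY,
        fetched_at REAL,
        body       TEXT
    )
""")

def cached_json(max_age=3600):
    """Cache a function's JSON-serializable result in SQLite."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            key = fn.__name__ + ":" + json.dumps(args)
            row = conn.execute(
                "SELECT fetched_at, body FROM cache WHERE key = ?",
                (key,)).fetchone()
            if row and time.time() - row[0] < max_age:
                return json.loads(row[1])  # fresh cache hit
            result = fn(*args)             # stale or missing: call through
            conn.execute(
                "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
                (key, time.time(), json.dumps(result)))
            conn.commit()
            return result
        return wrapper
    return decorator

@cached_json(max_age=3600)
def fetch_user(user_id):
    # stand-in for an expensive API call returning parsed JSON
    return {"id": user_id, "name": "example"}

print(fetch_user(42))
```

The raw JSON lands in the `body` column, so it stays queryable with SQLite's JSON functions without any parsing step up front.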


Thanks for this explanation.

I've been hacking together one-off Python scripts to parse out bits of saved API responses, flatten the data to CSVs, and load them into SQLite tables. Looks like I can skip all of this and go straight to querying the raw JSON text.
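For instance, json_each() can flatten arrays inside stored responses with no CSV step at all (the response shape here is invented for illustration; json_extract works on SQLite 3.9+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (body TEXT)")
conn.execute("INSERT INTO responses VALUES (?)", (
    '{"items": [{"sku": "a1", "qty": 2}, {"sku": "b2", "qty": 5}]}',))

# json_each() is a table-valued function: each array element becomes a row.
rows = conn.execute("""
    SELECT json_extract(value, '$.sku'),
           json_extract(value, '$.qty')
    FROM responses, json_each(responses.body, '$.items')
""").fetchall()
print(rows)
```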


Finally! I've had good experience following similar process with Postgres (start with obvious key fields and a json "data" then promote components of "data" to columns; maybe use expression indexes to experiment with optimizations). Good to know this is now possible in sqlite without ugly function calls.


This seems awful but maybe I'm underestimating the number of failed apps you're dealing with. And at that rate, maybe the app ideas should be vetted a bit more.


The stable release of the WebAssembly and JavaScript APIs strikes me as particularly monumental. Long live SQLite!

https://sqlite.org/wasm/doc/trunk/index.md


I don't understand the use case? From the docs [https://sqlite.org/wasm/doc/trunk/demo-123.md], the database disappears on page reload, on top of requiring workers for longer-running processes.


Even without persistence it's absurdly useful. In 2023 loading even a 50MB+ database file into a browser is feasible (that's only 10 heavy React webpage loads) and now you can run SQL queries directly against it in the browser.

Plenty of interesting databases fit into less than a MB even.

I've been using SQLite in WebAssembly for my Datasette Lite project - a Python server-side web app running entirely in the browser: https://simonwillison.net/2022/May/4/datasette-lite/ - here's an article showing how that can be useful: https://simonwillison.net/2022/Aug/21/scotrail/

It's also available in Observable notebooks, which is really handy. Here's a project I built on top of that: https://simonwillison.net/2022/Nov/20/tracking-mastodon/ - notebook here: https://observablehq.com/@simonw/mastodon-users-and-statuses...


Additionally, SQLite in WebAssembly with a HTTP-Range-request based virtual file system greatly simplifies interactive visualisation of large datasets https://observablehq.com/@mjbo/sqljs-httpvfs (my own demo)


That's a really cool demo notebook.


Oh wow, thanks for all the links!


The intention is to make use of the new "Origin Private File System" API, which provides the website with a sandboxed, block-level file system on the user's device that can be used for efficient reads and writes. It will be possible for WASM SQLite, or any other DB engine ported to WASM, to have full ACID compliance.

The SQLite team have been working with browser developers to ensure the new API is sufficient to enable all this.

Honestly, and I keep going on about it, SQLite in the browser via WASM is the missing piece to make "offline first" PWAs a serious contender when deciding an architecture for an app.

2023 is going to be the year of SQLite in the browser.

https://webkit.org/blog/12257/the-file-system-access-api-wit...

https://sqlite.org/wasm/doc/trunk/persistence.md#opfs

https://chromestatus.com/feature/5702777582911488


Unfortunately, unlike with apps, the system doesn't support backing up local storage managed by the browser, so if you get a new phone, you lose your data.


I think the idea is to still sync to a server, but support offline (and low latency!) use too.

It's "offline first", not "offline only".


Very interesting, seems like I have a lot of reading to do, thanks!


You can still persist them; you just have to wire it up yourself. Using workers is a best practice for any CPU-bound task, so that's not a drawback by itself in my mind.

It's good for single-page applications. Many datasets are relatively small -- pushing them to the client is reasonable. In exchange, you get zero-latency querying and can build very responsive UIs that can use SQL, versus writing REST APIs or GraphQL APIs.

Taken to an extreme, it permits publishing datasets that can be queried with no ongoing server-side expenses.

A wild example: Datasette is a Python service that makes SQLite databases queryable via the web. It turns out that since you can compile Python and SQLite to WASM, you can run Datasette entirely in the user's browser [1]. The startup time is brutal, because it's literally simulating the `pip install`, but a purpose-built SPA wouldn't have this problem.

[1]: https://lite.datasette.io/?url=https%3A%2F%2Fcongress-legisl...


Thanks for the explanation. The author of Datasette himself answered, which is a pretty cool thing I like about HN =)


Under persistent storage options:

https://sqlite.org/wasm/doc/trunk/persistence.md


It would be nice if OPFS (Origin Private File System) allowed reading/writing an actual SQLite file on the user's disk.

At the moment, as I understand it, the OPFS virtual disk is completely isolated from the user's disk.

This means you cannot just lightly query a 1GB file without first copying the 1GB from the user's filesystem to the OPFS.

Any writes mean you must then copy the 1GB SQLite db file from OPFS back to the user's local filesystem too.

Is this correct?


The original plan from Google for the file system access API was to allow read/write of real files, but both Mozilla and Apple said that was too dangerous, and the OPFS was created as a compromise.

Yeah, it's going to be confusing for users when they actually want to use one of these files outside of the application that created it. But it's better than nothing!


This does seem strange, as you can read and write whole files to/from the user's file system once they give your app permission.

Adding APIs to read/write parts of a file is the next logical step. But it looks like it is limited to the isolated OPFS only.


Well, it makes sense that the browser can't work outside of the OPFS boundary, but shouldn't there be a way to talk to the OPFS from outside the browser?


Full outer join is my fave update.

I kinda wish SQLite had more functions though.


You can add as many functions as you like:

https://www.sqlite.org/appfunc.html
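Most language bindings expose this. In Python's built-in sqlite3 module it's create_function (the "slugify" function here is my own example, not a built-in):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (title TEXT)")
conn.execute("INSERT INTO posts VALUES ('Hello World')")

def slugify(text):
    return text.lower().replace(" ", "-")

# Register a one-argument SQL function; deterministic=True lets the
# query planner cache and reuse results (requires SQLite 3.8.3+).
conn.create_function("slugify", 1, slugify, deterministic=True)

print(conn.execute("SELECT slugify(title) FROM posts").fetchone()[0])
# -> hello-world
```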


There is something like this: https://news.ycombinator.com/item?id=26683832

But I would prefer something SQLite officially maintains, rather than maintaining anything myself. I'd also rather not use libraries from random people, unfortunately. Working at a fintech makes me paranoid...


Unfortunately, I don't think that can be done natively (like defining a function before using it in a query via the CLI). Only by writing and compiling some C.


> Only by writing and compiling some C

Or Rust - https://ricardoanderegg.com/posts/extending-sqlite-with-rust...



