The important point to keep in mind is that SQLite will read JSON5, but it never writes it. The JSON that SQLite generates is canonical JSON that is fully compliant with the original JSON spec.
It turns out that there is a lot of "JSON" data in the wild that is not pure and proper JSON, but instead includes some of the extensions of JSON5. The point of this enhancement is to enable SQLite to read and process most of that wild JSON.
This feature was requested by multiple important users of SQLite.
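To see the round trip concretely, here is a minimal sketch using Python's stdlib sqlite3 (assuming the linked SQLite is recent enough to include the JSON5 extension; older builds will reject the input):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # JSON5 in: unquoted keys, a comment, trailing commas...
    row = conn.execute("SELECT json('{a: 1, /* note */ b: [2, 3,], }')").fetchone()
    # ...canonical JSON out
    print(row[0])   # {"a":1,"b":[2,3]}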
Off topic: would you mind sharing any info on potential timing of begin-concurrent-pnu-wal2 branch being merged into main (or consideration of forking sqlite to have a "client/server" version)?
While I don't care for JSON5, it's hardly "virtually any sequence of text"; it hews much closer to JavaScript than to the very limited subset (ish) that is JSON.
"Virtually any sequence" is not the same as adopting a specification five years after it's been published and adopted by industry: https://spec.json5.org/
I appreciate that SQLite can't write the format, because those changes are human affordances.
> > Single (//...) and multi-line (/*...*/) comments are allowed.
We can now have different parsers with pragmas that specify different behaviour depending on whether those pragmas are recognized or not.
In case anyone was wondering, the history is that comments were considered an anti-feature by Douglas Crockford, the creator of JSON:
> I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability. I know that the lack of comments makes some people sad, but it shouldn't.
> Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.
And for a data interchange format this is also 100% reasonable.
The problem is that people have started using JSON for configuration files and the like, which IMHO has always been – and continues to be – the wrong tool for the job.
I think the point is that your config parser should be using yaml or toml, though in fairness neither of those existed in the early 2000s when JSON was being developed/discovered/adopted— the 800lb gorilla in that space at the time was XML.
And XML remained dominant for a long time— for example, the original Google Maps from 2005 received its server responses as XML blobs, and API v2 even exposed the relevant parsing functionality as the GXml JavaScript class. By around 2007, it was all JSONp I think, and GXml was deprecated and removed in API v3 and v4 respectively.
Is JSON5 a thing people use? I see it's from 2012 but it's the first I've heard of it. It looks fairly sensible; I'd take it for trailing commas alone. And comments!
No, VSCode uses jsonc, not JSON5. They are very different, e.g. allowing trailing commas is an explicit design goal of JSON5, whereas in jsonc they are frowned upon (generates a warning, and was illegal at one point). jsonc is basically JSON + comments and nothing more, IIRC.
Tried several "json replacement" formats (bson, cbor, msgpack, ion...) and json5 won by being the smallest after compression (on my data) while also having the nice bonus of retaining human-readability.
How does it affect speed of parsing in JavaScript? I kinda thought the whole reason this deeply-mediocre format caught on in the first place was it was especially natural & fast to serialize/deserialize in JavaScript, on account of being a subset of that language. (XML also had the "fast" going for it thanks to browser APIs, but not so much the "natural")
Most of the "popular" publicly available formats at the time were singularly worse, even ignoring commonly limited or inconvenient language support.
SOAP? ASN.1? plists? CSV? uuencode? I'll still take JSON over all of them, especially when it comes to sending shit to the browser (plists might be workable with a library isolating you from it, but it is way too capable for server-to-browser communications, or even S2S for that matter; not all languages expose a URL or an OSet type).
> it was especially natural & fast to serialize/deserialize in JavaScript, on account of being a subset of that language.
That is certainly a factor, specifically that you could parse it "natively", initially via eval, and relatively quickly[0] via built-in JSON support (for more safety as the eval methods needed a few tricks to avoid full RCE).
But another factor was almost certainly that it's simple to parse and the data model is a lowest common denominator for pretty much every dynamically typed language. And you didn't need to waste time on schemas and codegen, which at the time was a breath of fresh air.
> XML also had the "fast" going for it thanks to browser APIs, but not so much the "natural"
XML never had "fast" going for it in any situation: the browser XML APIs are horrible, and you had to implement whatever serialization format you wanted in JavaScript on top of that, so it was even slower (especially at a time when JS was mostly interpreted).
[0] compared to the time it started being used: Crockford invented / extracted JSON in 2001, but services started using JSON with the rise of webapps / ajax in the mid aughts, and all of Firefox, Chrome, and Safari added native json support mid-2009
> And you didn't need to waste time on schemas and codegen, which at the time was a breath of fresh air.
Everyone just wrote their own custom clients in code instead :) But it was a gradual process, so it's harder to notice the pain compared to some generator slamming a load of code in your project. *cough* gRPC *cough*
> I kinda thought the whole reason this deeply-mediocre format caught on in the first place was it was especially natural & fast to serialize/deserialize in JavaScript, on account of being a subset of that language
No, people couldn't just eval random strings, especially the ones containing potentially malicious user input. They started writing parsers like any other language, then came the global JSON.parse and .stringify per spec.
I can't say for sure but I doubt JSON being a JS subset helped at all.
> No, people couldn't just eval random strings, especially the ones containing potentially malicious user input.
hehehehehe
> I can't say for sure but I doubt JSON being a JS subset helped at all.
Oh it very much did, both because it Just Worked in a browser context, and because the semantics fit dynamic languages very nicely, and those were quite popular when it broke through (it was pretty much the peak of Rails' popularity).
Recurring issue since forever:
"separator" vs. "terminator": both can work.
For data rows, I prefer termination, as it makes missing columns explicit.
But being optional here, it means nothing.
I guess it facilitates low effort scripting that doesn't have to treat the last element differently.
Just noticed today that the Python formatter "black" does this; I really disagree. Of course I realise that by noticing its changes and arguing about them I am missing the whole point of using it.
Type validation for one. The reduced chance that a client has put some backwards format in there like MM-DD-YYYY (or DD-MM-YYYY for that matter) or just an invalid date completely.
You might as well ask what native numeric or boolean support offer over just jamming stuff in a string in an agreed format. Some might argue that dates are a compound value so differ from atomic types like a number, but they are wrong IMO as a datetime can be treated as a simple numeric[1] with the compound display being just that – a display issue.
Others will point out that JS doesn't have a native date/datetime/time type, but JSON is used for a lot more than persisting JS structures at this point.
--
[1] caveat: this stops being true if you have a time portion with a timezone property
Maybe it is "just us", maybe it is part of the general shitness you point out, maybe it is the lack of literal representation (even VB and relatives had one) other than an ISO8601 string, but dates don't feel native like simpler types, arrays, objects, functions, …
> So what I'm reading here is that you don't need types to be supported natively in order to serialize to JSON.
Yes. But not absolutely needing something does not mean it isn't (or wouldn't be) exceptionally useful to have.
What it gets you is not having to deal with breakage when someone doesn't know the difference between ISO8601 and RFC3339 or, in fact, `date` strings. Shuffle all that mess off down the stack to where someone else has made one decision, once, rather than having to relitigate it every time.
JSON will never have datetime support, since Javascript does not have datetime literals (and that's a good thing given how horrible the Date object is).
Probably more importantly: all of that, and still no proper datetimes in SQLite.
Also, even more so, no domains (for custom datatypes).
> Probably more importantly: all of that, and still no proper datetimes in SQLite.
Home Assistant recently did a ton of changes to work around the issues caused by this.
The short story is that they stopped storing timestamps as 'timestamp' datatypes and started storing them as unix times in numeric columns. Since timestamps turn into strings in SQLite, this was a huge improvement for storage space, performance, etc.
The problem is that this change also affects databases which have a real datetime datatype. So PostgreSQL, which internally stores timestamps as unix times, is now being told to store a numeric value. To treat it as a timestamp you have to convert it while querying. Since I used PostgreSQL for my Home Assistant installation, this feels like a giant step backwards for me.
I wish that they had used this change as an opportunity to refactor the database code a bit so that they could store timestamps as numeric for SQLite, but use a real timestamp datatype for MySQL and PostgreSQL. I'm sure that this isn't a simple thing to do though.
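Not arguing the design either way, but for reference the query-time conversion is a one-liner on each side: datetime(col, 'unixepoch') in SQLite, to_timestamp(col) in PostgreSQL. A sketch with a made-up column name:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE states (last_changed_ts REAL)")   # unix time as numeric
    conn.execute("INSERT INTO states VALUES (1672531200.0)")
    print(conn.execute(
        "SELECT datetime(last_changed_ts, 'unixepoch') FROM states").fetchone()[0])
    # 2023-01-01 00:00:00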
Generally speaking, DateTime/TimeStamp fields are treated differently between databases, either in practice or in purpose, much of the time. When migrating from one database to another, this is almost always an issue.
I don't know what "appropriate" or even "beyond the scope" mean in this context, but keeping in mind that datetime data needing to be stored is a fact of life that's not going away, I'd say yes, it does belong in JSON.
Wrong, there's no datetime type there. You can only use an integer for it, in a more space-saving manner, but that means you have to convert and invoke calculation functions. It's error-prone.
Comments were considered an anti-feature by Douglas Crockford, the creator of JSON:
> I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability. I know that the lack of comments makes some people sad, but it shouldn't.
> Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.
I always hated this logic, the "I know better than you". You can still add pragmas trivially in JSON; this doesn't prevent that problem, it just makes it more convoluted, at the expense of a powerful documentation feature.
> I always hated this logic, the "I know better than you".
Feel free to create your own data interchange file format. (Perhaps with Blackjack and hookers. :)
> You can still add pragmas trivially in JSON, this doesn't prevent that problem, it just makes it more convoluted at the expense of a powerful documentation feature.
Computers/systems do not need comments to exchange data between themselves, and that is the problem JSON is trying to solve: data exchange.
If you're using it for configuration and other human-centric tasks, and wish to communicate human-centric things in the file, you're trying to fit a square peg into a round hole. Use some other format that was designed with those things in mind.
If you're saying the designers of a hammer didn't think of being able to handle Robertson screws, you're unfairly critiquing the tool.
> Feel free to create your own data interchange file format.
This is such a lazy argument to make. We are both allowed to criticize flaws in data exchange formats. As far as what JSON is trying to solve, it's far beyond that, and the creator knew this (which is why he went out of his way to fight it). If you as a human have ever debugged anything dealing with JSON handling, you've already proven that comments can have value.
This really never made sense; I find adding a property like __pragma: nicer than comments. It has the same problem as he stated, but leaving out comments is not going to prevent that; it was a bad idea IMHO. I use hjson just for the comments; it's excellent.
It always made perfect sense, given JSON was invented as a data exchange format. It was never intended to be written out by hand.
> I find adding property like __pragma: nicer than comments. It has the same problem as he stated
It does not, because that is data which every JSON parser will at least parse if not interpret.
A comment is something a parser has no reason to yield, which means smuggling metadata in comments makes the document itself inconsistent between parsers which do and parsers which don't[0] expose comments.
And this was not a fancy Crockford made up: smuggling processing instructions in comments was ubiquitous in Java at the time, as well as in HTML/XML, where it literally still is a thing: https://en.wikipedia.org/wiki/Conditional_comment.
[0] and possibly can't: how do you deserialize a comment in the middle of an object to JS or Ruby?
It's a shame because it's a rather trivial thing to filter out. Personally I wish they would have left out the "//" single-line commenting to avoid the whitespace dependency.
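For illustration, a sketch of such a filter in Python (my own code, not a vetted library; strings are copied verbatim so a "//" inside a URL survives):

    def strip_comments(text):
        out, i, n = [], 0, len(text)
        while i < n:
            c = text[i]
            if c == '"':                       # copy string literals verbatim
                out.append(c)
                i += 1
                while i < n and text[i] != '"':
                    if text[i] == '\\' and i + 1 < n:
                        out.append(text[i])
                        i += 1
                    out.append(text[i])
                    i += 1
                if i < n:
                    out.append('"')
                    i += 1
            elif text.startswith('//', i):     # line comment: drop to end of line
                while i < n and text[i] != '\n':
                    i += 1
            elif text.startswith('/*', i):     # block comment: drop through '*/'
                end = text.find('*/', i + 2)
                i = n if end == -1 else end + 2
            else:
                out.append(c)
                i += 1
        return ''.join(out)

    # the comments disappear, the "//" inside the string does not
    print(strip_comments('{ /* config */ "url": "http://example.com" } // trailing'))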
Filtering them out is indeed trivial, but then they're not stable / preserved, and so why bother writing them?
Ensuring that comments and their locations in the text are preserved by processors is quite difficult if not impossible. Reformatting a JSON text alone can "break" comments by not necessarily placing them where they belong. Any schema transformation means comments must be dropped.
You can write a parser that generates an AST node for comments and then pretty prints that AST. So it’s not impossible, but it does require that your parser gives you the option to not just drop comments on the floor.
    {
      // start
      x: "y",
      // the following is foo
      foo: "hey",
      // the following is bar
      bar: "there"
      // end
    }
When parsing it, should I attach each comment to the following name or the previous name?
If I reformat to change indentation, should I change the indentation of the comments too? How would the comment that reads "the following is bar" be re-indented?
What if there are duplicate names? JSON does allow duplicates. If I attach comments to preceding (or following) names, then while parsing I find a dup... what should I do with the preceding comment and the new name?
You make the comment a child node of the object literal node, don’t associate it with the following key/value pair at all. When you pretty print it, it will have the same relative order with the following kv pair as it had originally (and you cannot reorder json keys safely because the handling of duplicate keys is implementation-defined).
As far as indentation goes, if I were writing a pretty printer, I’d make indentation handled by the object/array node not by the comment/kv pair node. So that doesn’t matter: the comment will start printing at whatever the current column is just like the kv pairs would.
Anyways, among other tools, prettier has already solved this problem.
> You make the comment a child node of the object literal node, don’t associate it with the following key/value pair at all. When you pretty print it, it will have the same relative order with the following kv pair as it had originally (and you cannot reorder json keys safely because the handling of duplicate keys is implementation-defined).
Many implementations parse objects into hash tables and lose duplicates and even order of appearance of name/value pairs, and all of that is allowed. Such implementations will not be able to keep relative order of comments.
Don’t use that sort of implementation for a pretty printer is my point. I’m not saying you can pretty print json that has been parsed by a generic parser, but that you can write a parser that can pretty print arbitrary json.
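For the curious, here's a toy version of that approach in Python, covering only a tiny JSONC-like subset (unquoted keys, string values, // comments): comments become sibling nodes of the key/value pairs inside the object node and are re-emitted in source order. Entirely my own sketch, not prettier's implementation:

    import re

    TOKEN = re.compile(r'\s*(//[^\n]*|[{}:,]|"(?:[^"\\]|\\.)*"|[A-Za-z_]\w*)')

    def tokenize(text):
        pos, tokens = 0, []
        while pos < len(text):
            m = TOKEN.match(text, pos)
            if not m:
                break
            tokens.append(m.group(1))
            pos = m.end()
        return tokens

    def parse_object(tokens, i=0):
        # Children keep source order and hold both ("comment", text)
        # and ("pair", key, value) nodes.
        assert tokens[i] == "{"
        i += 1
        children = []
        while tokens[i] != "}":
            if tokens[i].startswith("//"):
                children.append(("comment", tokens[i]))
                i += 1
            else:
                children.append(("pair", tokens[i], tokens[i + 2]))
                i += 3
                if tokens[i] == ",":
                    i += 1
        return children, i + 1

    def pretty(children, indent="  "):
        lines = ["{"]
        pairs = sum(1 for c in children if c[0] == "pair")
        emitted = 0
        for node in children:
            if node[0] == "comment":
                lines.append(indent + node[1])
            else:
                emitted += 1
                comma = "," if emitted < pairs else ""
                lines.append(f"{indent}{node[1]}: {node[2]}{comma}")
        lines.append("}")
        return "\n".join(lines)

    src = '{\n// start\nx: "y",\n// the following is foo\nfoo: "hey"\n// end\n}'
    print(pretty(parse_object(tokenize(src))[0]))

Indentation and trailing commas here are decided by the printer, not recovered from the source, which matches the point above about the object/array node owning the layout.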
I fail to see how this is relevant. Either your code feeds JSON to SQL, which remains supported, or your code fetches an SQL-generated value from SQL, in which case, as literally stated right before the JSON5 listing:
> JSON text generated by [JSON] routines will always be strictly conforming to the canonical definition of JSON.
Is there an unavoidable reason why SQLite won't allow foreign keys to be added with ALTER TABLE, only at table creation? The whole "select * into temporary table and create new table with FK constraint" seems so verbose / convoluted, my layman's mind can't comprehend why one can't simply
The ALTER TABLE docs have a general explanation, kinda.
They don't store the parsed representation of the schema, since that would lock the table to a specific version. They store the create command itself, and each time you open a DB the schema is generated from that. Allows for flexible upgrades to the schema system without the need for migrations, and lets you use the same DB file with multiple sqlite versions.
The downside is that alter table commands are just edits to the create table string, which when accounting for all the different versions is fairly difficult and risky, so alter operations are limited.
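A quick way to see this from Python's sqlite3 (a sketch; table and column names are made up): the stored CREATE text is literally rewritten by the ALTER.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("ALTER TABLE t ADD COLUMN age INTEGER")
    # sqlite_master stores the (now edited) CREATE statement as text
    print(conn.execute("SELECT sql FROM sqlite_master WHERE name = 't'").fetchone()[0])
    # CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)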
Is there something special about ALTER DDLs that make it impossible to store them and play them back along with CREATE DDls to derive the final schema?
But also FKs remain half-assed and seen with some disdain: it's 2023, support was added (a bit under) 14 years ago, and you still need to activate foreign keys every time you open the database file. There is still no way to enable FKs by default for a database file, let alone prevent disabling it.
”You can set it before compiling” is such a foreign concept to the real world that it's almost meaningless; if that is the only way to change the behavior, it's pretty crazy.
Many applications use SQLite as an embedded library; that's how it's intended to be used anyway.
It doesn't really jibe all that well with things like Python though, where you typically don't do that. But it's certainly not a "foreign concept" or "almost meaningless".
For e.g. Python you can still just set the desired parameters when connecting, which should usually be handled by the database connector/driver for you.
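For example, with Python's stdlib sqlite3 (a sketch; the wrapper is my own naming):

    import sqlite3

    def connect(path):
        conn = sqlite3.connect(path)
        # foreign key enforcement is off by default and must be enabled per connection
        conn.execute("PRAGMA foreign_keys = ON")
        return conn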
How are people using sqlite within a multi-threaded, asynchronous runtime: are you using a synchronization lock? SQLite seems to "kinda, sorta" support multi-threading, requiring special configuration and caveats. Is there a good reference?
1. Use WAL mode - this allows reading while practically never being blocked by writes.
2. Use application-level locks (in addition to the SQLite locking system). The queuing behavior from application-level locks can work better than the retry mechanism provided by SQLite.
3. Use connection pooling with a separate connection per thread - one write connection and multiple read connections.
Ideally, all of the above would be covered by a library, and not up to the app developer.
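A rough sketch of that combination with Python's stdlib sqlite3 (names, timeouts, and structure are mine, not a vetted recipe):

    import sqlite3
    import threading

    DB_PATH = "app.db"   # hypothetical database file

    def _connect():
        conn = sqlite3.connect(DB_PATH, timeout=5.0, check_same_thread=False)
        conn.execute("PRAGMA journal_mode=WAL")   # 1. readers don't block on the writer
        conn.execute("PRAGMA busy_timeout=5000")  # wait instead of failing with SQLITE_BUSY
        return conn

    _write_conn = _connect()                      # 3. a single write connection...
    _write_lock = threading.Lock()                # 2. ...guarded by an application-level lock
    _local = threading.local()                    # 3. ...plus one read connection per thread

    def write(sql, params=()):
        with _write_lock:
            with _write_conn:                     # commits (or rolls back) the statement
                _write_conn.execute(sql, params)

    def read(sql, params=()):
        if not hasattr(_local, "conn"):
            _local.conn = _connect()
        return _local.conn.execute(sql, params).fetchall()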
“To accelerate searching the WAL, SQLite creates a WAL index in shared memory. This improves the performance of read transactions, but the use of shared memory requires that all readers must be on the same machine [and OS instance]. Thus, WAL mode does not work on a network filesystem.”
“It is not possible to change the page size after entering WAL mode.”
“In addition, WAL mode comes with the added complexity of checkpoint operations and additional files to store the WAL and the WAL index.”
SQLite does not guarantee atomic commits across databases attached with ATTACH DATABASE in WAL mode:
“Transactions involving multiple attached databases are atomic, assuming that the main database is not ":memory:" and the journal_mode is not WAL. If the main database is ":memory:" or if the journal_mode is WAL, then transactions continue to be atomic within each individual database file. But if the host computer crashes in the middle of a COMMIT where two or more database files are updated, some of those files might get the changes where others might not.”
Yeah, the WAL can also grow very large with the wrong checkpoint configuration, making read operations increasingly slow, which in turn makes checkpointing slower, and it grows out of proportion.
Inefficient queries, no reader gaps and a bad manual checkpointing system can bring the system down.
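For reference, the knobs being discussed (a sketch; the autocheckpoint value shown is just the default):

    import sqlite3

    conn = sqlite3.connect("app.db")                  # hypothetical database
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA wal_autocheckpoint=1000")    # auto-checkpoint roughly every 1000 pages
    # ...and after a write burst, force the WAL back down to zero length:
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")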
That's neat, I haven't worked with SQLite for a few years but this changes it.
Of course you can still fall into the trap of reaching your checkpoints faster than the first WAL file needs to write back to the database, but I think that's a less critical problem in practice.
Wait, what's JSON5??
> Object keys may be unquoted identifiers.
> Objects may have a single trailing comma.
> Arrays may have a single trailing comma.
> Strings may be single quoted.
> Strings may span multiple lines by escaping new line characters.
> Strings may include new character escapes.
> Numbers may be hexadecimal.
> Numbers may have a leading or trailing decimal point.
> Numbers may be "Infinity", "-Infinity", and "NaN".
> Numbers may begin with an explicit plus sign.
> Single (//...) and multi-line (/*...*/) comments are allowed.
> Additional white space characters are allowed.
Oh crap, the levees have broken!