This might be throwing a lit match into a gasoline refinery, but why not opt for XML in some circumstances?
Between its strong schema support and WSDL's coverage of internet standards like SOAP web services, XML covers a lot of ground that JSON doesn't necessarily have without add-ons.
I say this knowing it's an unfashionable opinion and that XML has its own weaknesses, but in the spirit of using web standards and LoC-approved "archivable formats", IMO there is still a place for XML in many serialization strategies across the computing landscape.
JSON is perfect for serializing data between client and server, or in progressive web apps running in JavaScript. It is quite serviceable elsewhere too, such as in microservice REST APIs, but in other areas of the landscape (middleware, database record excerpts, desktop settings, data transfer files) JSON is not much better than XML, and sometimes slightly worse.
XML cannot be parsed into nested maps/dictionaries/lists/arrays without guidance from a type or a restricted XML structure.
JSON can. It also maps pretty seamlessly to types/classes in most languages without annotations, attributes, or other serialization guides.
JSON also has explicit indicators for lists vs. subdocuments vs. values for keys, which XML does not: XML tags can repeat, can have subtags, and then there are tag attributes on top of that. A JSON document can also be a list, while an XML document must be a tree with a single root.
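A minimal Python sketch of the asymmetry (element and key names are invented for the example):

```python
import json
import xml.etree.ElementTree as ET

# JSON parses straight into nested dicts/lists with no extra guidance:
doc = json.loads('{"users": [{"name": "ada"}, {"name": "bob"}]}')
print(doc["users"][0]["name"])  # -> ada

# The equivalent XML gives you a tree of elements; whether <user> is a
# list item or a singleton subdocument isn't encoded in the syntax:
root = ET.fromstring("<users><user><name>ada</name></user></users>")
# One <user> today -- but is that a list of one, or a single record?
print(root.find("user/name").text)  # -> ada
```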
XML may be acceptable for documents. But seeing as how XHTML was a complete dud, I doubt it is useful even for that.
And we didn't even need to get into the needless complexity of validation, namespaces, and other junk.
> And we didn't even need to get into the needless complexity of validation, namespaces, and other junk.
So, that’s why we’re adding all of this “junk” back into JSON? Transformers, XPath for JSON, validation, schemas, namespaces (JSON-LD, JSON prefixes): it’s all there.
History repeating itself (and here’s the important part) because this complexity is needed. Not every application will need every complication, but every complication is needed by some application.
No junk has been added to JSON - the specification hasn't changed to accommodate those features.
Unless you need to use the feature, you don't need to know anything about it, which is a huge benefit for the majority. XML almost encourages programmers to use unnecessary features.
When an application domain chooses to add a feature (say JSON-LD) then there are advantages to that mixture over XML. Where XML is better, it is often chosen instead.
That depends entirely on which parser you’re using. People have wanted comments so badly there are parsing libraries (and proposed revisions to JSON) that include comments. And sometimes those comments are used to provide processing directives.
> Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.
> XML cannot be parsed into nested maps/dictionaries/lists/arrays without guidance from a type or a restricted xml structure.
? Using XML without a schema is slightly worse than JSON, because the content of each node is just "text". XML with a schema is far more powerful, partly because of its richer type system. JSON dictionaries are most of the time used to encode structs, but XML Schema has `complexType` and `sequence` for exactly that.
I've been using XML with strongly-typed schemas for serialization for the last couple of years and couldn't be happier. I have ~100 classes in the schema, yet I've needed a true dictionary like 2 or 3 times.
> And we didn't even need to get into the needless complexity of validation, namespaces, and other junk.
Validation is junk? Isn't it valuable to know that 1) if your schema requires a certain element, and 2) if the document has passed validation, then navigating to that element and parsing it according to its schema type won't throw a run-time exception?
Namespaces are junk? They serve the same purpose as in programming languages. How else would you put two elements of the same name but of different semantics (coming from different sources) into the same document? You can fake this in JSON "by convention", but in XML it's standardized.
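For what it's worth, namespace disambiguation works out of the box even in Python's stdlib parser (the URIs and element names below are invented for the example):

```python
import xml.etree.ElementTree as ET

# Two <id> elements with the same local name but different semantics,
# distinguished by namespace rather than by ad-hoc convention:
xml_doc = """
<order xmlns:shop="http://example.com/shop"
       xmlns:pay="http://example.com/payments">
  <shop:id>ORD-1</shop:id>
  <pay:id>TXN-9</pay:id>
</order>"""

root = ET.fromstring(xml_doc)
ns = {"shop": "http://example.com/shop",
      "pay": "http://example.com/payments"}
print(root.find("shop:id", ns).text)  # -> ORD-1
print(root.find("pay:id", ns).text)   # -> TXN-9
```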
XML is a perfectly serviceable data exchange format. The parsers and serializers work great when used properly. It's nice to have schema.
But I think people just got sick of XML because it was abused so badly with "web services", SOAP, WSDL, and all those horrible technologies from the early aughts. Over-complicated balls of mud that made people miserable.
Apple's plist format might be the weirdest abuse of XML as far as I can tell. The SOAP envelopes and shit like that were horrible but plist is plain weird.
Everyone abused XML some way or another. JSON is not that "abusable" I'd say.
XML is a beast to parse. It's slow to parse and verbose, yet it doesn't even give you a human-friendly text format. It's got a number of weird features inherited from SGML. Every parser needs a quirks mode, since nobody can write good schemas or schema parsers.
XML is a really bad interchange format. It's OK for a document markup language, and that's where it survives.
When I was doing XML/Java stuff 10 years ago, you'd take your XSD and generate domain classes as a build step. It was more complicated, but it was also 100% reliable because the tools were all rock solid. Written by the guy who made Jenkins.
Many languages have libraries built in that do something reasonable with JSON. Usually you just make a class or struct, instantiate it, and then generate JSON, no need to have a separate compile step. When going the other direction, I usually just format the JSON, copy that into my code, then fix the compile errors.
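In Python, for instance, that no-compile-step round trip is just the stdlib (class and field names here are invented for the example):

```python
import json
from dataclasses import dataclass, asdict

# A plain struct serializes to JSON with no code generation step:
@dataclass
class Settings:
    theme: str
    font_size: int

s = Settings(theme="dark", font_size=12)
payload = json.dumps(asdict(s))
print(payload)  # -> {"theme": "dark", "font_size": 12}

# Going the other direction, the dict keys line up with the constructor:
restored = Settings(**json.loads(payload))
assert restored == s
```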
XML has all that tooling because it needs it. JSON is a lot more straightforward, is more compact, and is faster to parse and (probably) generate.
If you're going to go through the effort of a compile step, you should probably just use a binary protocol, which will get you even better performance and getting documentation out of the box (e.g. protocol buffers schemas are very readable).
I see absolutely no reason to use XML these days as a data format, but it's still a reasonable choice as a markup format (you know, what the M stands for).
> Many languages have libraries built in that do something reasonable with JSON.
What about cross-language? In C# I define a class containing a `DateTime` field, export the schema with xsd, and generate classes for Java with xjc, and get back a field of (an equivalent of) `DateTime` type. Doing what you suggest with JSON, I'd get a "string". Thanks but no thanks.
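The same type-erasure shows up in Python (the field name below is invented; `default=str` is one common workaround, not the only one):

```python
import json
from datetime import datetime

stamp = datetime(2023, 5, 1, 12, 30)

# json.dumps can't serialize datetime natively; falling back to str()
# emits an ISO-ish string:
payload = json.dumps({"created": stamp}, default=str)

# On the way back you just get a string -- the type information is
# gone, and the receiver has to know to re-parse it:
decoded = json.loads(payload)
print(type(decoded["created"]))  # -> <class 'str'>
restored = datetime.fromisoformat(decoded["created"])
assert restored == stamp
```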
> If you're going to go through the effort of a compile step, you should probably just use a binary protocol, […] I see absolutely no reason to use XML these days as a data format,
In our product we use a relational db (SQLServer) combined with XML. Each table has a structured part which is put into relational columns, plus an extensions part that is put into a "Data" XML column for semi-structured data. SQLServer supports XQuery so we can query the semi-structured data from SQL when needed.
This wouldn't fly with a binary format.
EDIT: yes, SQLServer also supports JSON, but has special optimizations for XML (e.g., it can understand schema types, it supports XML indexes which "shred" XML to a more efficient binary representation based on schema, etc.)
If you're going to reach for automation, why not just use a binary format like protocol buffers, flat buffers, capn proto, etc? You get the tooling and a ton of performance for free.
JSON is great because you don't need tooling. XML is great because it's expressive. You don't need expressiveness for a data format, but it works great as a markup language.
Syntax aside, I think the original mistake is IDLs, schemas, and other attempts at formalism.
WSDL, SOAP, and all their precursors were attempted in spite of Postel's Law.
Repeating myself:
Back when I was doing electronic medical records, my two-person team ran circles around our (much larger) partners by abandoning the schema tool stack. We were able to detect, debug, correct interchange problems and deploy fixes in near realtime. Whereas our partners would take days.
Just "screen scrap" inbound messages, use templates to generate outbound messages.
I'd dummy up working payloads using tools like SoapUI. Convert those known-good "reference" payloads into templates. (At the time, I preferred Velocity.) Version everything. To troubleshoot, rerun the reference messages and diff the captured results. Massage until working.
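The template half of that workflow is trivial in any language; a Python sketch with `string.Template` (the envelope and field names are invented, not the actual payloads):

```python
from string import Template

# A known-good "reference" payload with the variable parts templated
# out -- in practice this would come from a tool like SoapUI:
reference = Template("""\
<Envelope>
  <PatientId>$patient_id</PatientId>
  <VisitDate>$visit_date</VisitDate>
</Envelope>""")

# Generating an outbound message is just substitution: no schema,
# no code generation, and a diff against the captured reference
# output catches any drift.
msg = reference.substitute(patient_id="P-42", visit_date="2009-03-01")
print(msg)
```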
Our partners, and everyone I've told since, just couldn't grok this approach. No, no, no, we need schemas, code generators, etc.
There's a separate HN post about Square using DSLs to implement OpenAPI endpoints. That's maybe 1/4th of the way to our own home made solution.
I personally like XML a lot for rich text (I like HTML better than TeX) and layout (like in JSX for React), and it's not horrible if you want a readable representation for a tree, but I can't imagine using it for any other purpose.
JSON is exactly designed for object serialization. XML can be used for that purpose but it's awkward and requires a lot of unnecessary decisions (what becomes a tag? what becomes an attribute? how do you represent null separately from the empty string?) which just have an easy answer in JSON. And I can't think of any advantage XML has to make up for that flaw. Sure, XML can have schemas, but so can JSON.
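The null-vs-empty-string point in particular is visible in two lines of Python (key names invented for the example):

```python
import json
import xml.etree.ElementTree as ET

# JSON distinguishes null from the empty string for free:
doc = json.loads('{"middle_name": null, "suffix": ""}')
assert doc["middle_name"] is None
assert doc["suffix"] == ""

# In XML, an empty element's text comes back as None in ElementTree,
# so "absent" vs. "empty" needs a convention (e.g. xsi:nil) on top:
elem = ET.fromstring("<middle_name/>")
print(elem.text)  # -> None: empty? missing? the syntax alone can't say
```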
I will agree that JSON is horrible for config files for humans to edit, but XML is quite possibly even worse at that. I don't really like YAML, either. TOML isn't bad, but I actually rather like JSON5 for config files - it's very readable for everyone who can read JSON, and fixes all the design decisions making it hard for humans to read and edit.
One of the biggest advantages for XML are attributes and namespaces. I miss these in JSON.
As AtlasBarfed mentioned, JSON has a native map and list structure in its syntax, which is sorely missed in XML. You have to rely on an XML Schema to know that some tag is expected to represent a map or list.
JSON with attributes and namespaces would be my ideal world.
Why do you want those? Attributes and namespaces just make in memory representation complicated. They're quite useful for markup, but I don't really know why you'd want them in a data format.
Use JSON or a binary protocol for data, XML for markup.
To be fair, there were a lot of very good ideas for a 2.x XML that solved a lot of the complexity. The problem was that none of the tools would be upgraded to support it.
You'd basically have to create a new independent format to have proper compatibility once you introduce breaking changes.
- by removing DTDs, remove the concepts of notations and all external resource resolution from the core spec. Also, no possibility of entity-expansion attacks.
- by removing DTDs, remove validation from the core spec.
- merge namespaces into the core specification. At the same time, make them mandatory
- merge the concept of qualified names into the core specification
- by making namespaces mandatory, all the variations of how namespaces get exposed can be eliminated
- merge the info-set definition into the core specification
- by describing XML items and how they relate, implementations can understand what data is relevant at a particular point while parsing the document.
- Merge xml:id into the core specification.
You also had some other fun outlier concepts:
- Eliminate prefixes from infoset. This is mostly a breaking change for XPath and XML Schema.
- Add an explicit qualified name token (possibly recycling the entity declaration). This would allow the above specs to have their functionality restored, although likely with a new format.
- Accept qualified names without prefixes, such as via a {uri}:{localName} syntax.
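That last `{uri}localName` syntax isn't hypothetical, incidentally: Python's stdlib parser already normalizes qualified names into exactly that shape internally (the URI below is invented for the example):

```python
import xml.etree.ElementTree as ET

# ElementTree resolves prefixes away and exposes qualified names
# in Clark notation, i.e. {uri}localName:
root = ET.fromstring(
    '<shop:item xmlns:shop="http://example.com/shop"/>'
)
print(root.tag)  # -> {http://example.com/shop}item
```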