This might be throwing a lit match into a gasoline refinery, but why not opt for XML in some circumstances?
Between its strong schema support and WSDL's coverage of internet standards like SOAP web services, XML covers a lot of ground that JSON doesn't necessarily have without add-ons.
I say this knowing it's an unfashionable opinion and that XML has its own weaknesses, but in the spirit of using web standards and LoC-approved "archivable formats", IMO there is still a place for XML in many serialization strategies across the computing landscape.
JSON is perfect for serializing data between client and server, or in progressive web apps running in JavaScript. It is quite serviceable elsewhere too, such as in microservice REST APIs, but in other areas of the landscape (middleware, database record excerpts, desktop settings, data transfer files) JSON is not much better than XML, and sometimes slightly worse.
XML cannot be parsed into nested maps/dictionaries/lists/arrays without guidance from a type or a restricted XML structure.
JSON can. It also maps pretty seamlessly to types/classes in most languages without annotations, attributes, or other serialization guides.
JSON also has explicit indicators for lists vs. subdocuments vs. values for keys, which XML does not: XML tags can repeat, can have subtags, and then there are tag attributes on top of that. A JSON document can also be a list, while an XML document must be a tree with a single root.
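A minimal Python sketch of the asymmetry (element and key names are invented for the example):

```python
import json
import xml.etree.ElementTree as ET

# JSON parses straight into nested dicts/lists with no extra guidance:
doc = json.loads('{"users": [{"name": "ada"}, {"name": "bob"}]}')
print(doc["users"][0]["name"])  # -> ada

# The equivalent XML gives you a tree of elements; whether <user> is a
# list item or a singleton subdocument isn't encoded in the syntax:
root = ET.fromstring("<users><user><name>ada</name></user></users>")
# One <user> today -- but is that a list of one, or a single record?
print(root.find("user/name").text)  # -> ada
```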
XML may be acceptable for documents. But seeing as how XHTML was a complete dud, I doubt it is useful even for that.
And we didn't even need to get into the needless complexity of validation, namespaces, and other junk.
> And we didn't even need to get into the needless complexity of validation, namespaces, and other junk.
So, that’s why we’re adding all of this “junk” back into JSON? Transformers, XPath for JSON, validation, schemas, namespaces (JSON-LD, JSON prefixes): it’s all there.
History repeating itself (and here’s the important part) because this complexity is needed. Not every application will need every complication, but every complication is needed by some application.
No junk has been added to JSON - the specification hasn't changed to accommodate those features.
Unless you need to use the feature, you don't need to know anything about it, which is a huge benefit for the majority. XML almost encourages programmers to use unnecessary features.
When an application domain chooses to add a feature (say JSON-LD) then there are advantages to that mixture over XML. Where XML is better, it is often chosen instead.
That depends entirely on which parser you’re using. People have wanted comments so badly there are parsing libraries (and proposed revisions to JSON) that include comments. And sometimes those comments are used to provide processing directives.
> Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.
> XML cannot be parsed into nested maps/dictionaries/lists/arrays without guidance from a type or a restricted xml structure.
? Using XML without a schema is slightly worse than JSON, because the content of each node is just "text". XML with a schema is far more powerful, partly because of its richer type system. JSON dictionaries are most of the time used to encode structs, but XML Schema has `complexType` and `sequence` for exactly that.
I've been using XML with strongly-typed schemas for serialization for the last couple of years and couldn't be happier. I have ~100 classes in the schema, yet I've needed a true dictionary like 2 or 3 times.
> And we didn't even need to get into the needless complexity of validation, namespaces, and other junk.
Validation is junk? Isn't it valuable to know that 1) if your schema requires a certain element, and 2) if the document has passed validation, then navigating to that element and parsing it according to its schema type won't throw a run-time exception?
Namespaces are junk? They serve the same purpose as in programming languages. How else would you put two elements of the same name but of different semantics (coming from different sources) into the same document? You can fake this in JSON "by convention", but in XML it's standardized.
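For what it's worth, namespace disambiguation works out of the box even in Python's stdlib parser (the URIs and element names below are invented for the example):

```python
import xml.etree.ElementTree as ET

# Two <id> elements with the same local name but different semantics,
# distinguished by namespace rather than by ad-hoc convention:
xml_doc = """
<order xmlns:shop="http://example.com/shop"
       xmlns:pay="http://example.com/payments">
  <shop:id>ORD-1</shop:id>
  <pay:id>TXN-9</pay:id>
</order>"""

root = ET.fromstring(xml_doc)
ns = {"shop": "http://example.com/shop",
      "pay": "http://example.com/payments"}
print(root.find("shop:id", ns).text)  # -> ORD-1
print(root.find("pay:id", ns).text)   # -> TXN-9
```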
XML is a perfectly serviceable data exchange format. The parsers and serializers work great when used properly. It's nice to have schema.
But I think people just got sick of XML because it was abused so badly with "web services", SOAP, WSDL, and all those horrible technologies from the early aughts. Over-complicated balls of mud that made people miserable.
Apple's plist format might be the weirdest abuse of XML as far as I can tell. The SOAP envelopes and shit like that were horrible but plist is plain weird.
Everyone abused XML some way or another. JSON is not that "abusable" I'd say.
XML is a beast to parse. It's slow to parse and verbose, yet it doesn't even give you a human-friendly text format. It's got a number of weird features inherited from SGML. Every parser needs a quirks mode, since nobody can write good schemas or schema parsers.
XML is a really bad interchange format. It's OK for a document markup language, and that's where it survives.
When I was doing XML/Java stuff 10 years ago, you'd take your XSD and generate domain classes as a build step. It was more complicated, but it was also 100% reliable because the tools were all rock solid. Written by the guy who made Jenkins.
Many languages have libraries built in that do something reasonable with JSON. Usually you just make a class or struct, instantiate it, and then generate JSON, no need to have a separate compile step. When going the other direction, I usually just format the JSON, copy that into my code, then fix the compile errors.
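In Python, for instance, that no-compile-step round trip is just the stdlib (class and field names here are invented for the example):

```python
import json
from dataclasses import dataclass, asdict

# A plain struct serializes to JSON with no code generation step:
@dataclass
class Settings:
    theme: str
    font_size: int

s = Settings(theme="dark", font_size=12)
payload = json.dumps(asdict(s))
print(payload)  # -> {"theme": "dark", "font_size": 12}

# Going the other direction, the dict keys line up with the constructor:
restored = Settings(**json.loads(payload))
assert restored == s
```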
XML has all that tooling because it needs it. JSON is a lot more straightforward, is more compact, and is faster to parse and (probably) generate.
If you're going to go through the effort of a compile step, you should probably just use a binary protocol, which will get you even better performance and getting documentation out of the box (e.g. protocol buffers schemas are very readable).
I see absolutely no reason to use XML these days as a data format, but it's still a reasonable choice as a markup format (you know, what the M stands for).
> Many languages have libraries built in that do something reasonable with JSON.
What about cross-language? In C# I define a class containing a `DateTime` field, export the schema with xsd, and generate classes for Java with xjc, and get back a field of (an equivalent of) `DateTime` type. Doing what you suggest with JSON, I'd get a "string". Thanks but no thanks.
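The same type-erasure shows up in Python (the field name below is invented; `default=str` is one common workaround, not the only one):

```python
import json
from datetime import datetime

stamp = datetime(2023, 5, 1, 12, 30)

# json.dumps can't serialize datetime natively; falling back to str()
# emits an ISO-ish string:
payload = json.dumps({"created": stamp}, default=str)

# On the way back you just get a string -- the type information is
# gone, and the receiver has to know to re-parse it:
decoded = json.loads(payload)
print(type(decoded["created"]))  # -> <class 'str'>
restored = datetime.fromisoformat(decoded["created"])
assert restored == stamp
```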
> If you're going to go through the effort of a compile step, you should probably just use a binary protocol, […] I see absolutely no reason to use XML these days as a data format,
In our product we use a relational db (SQLServer) combined with XML. Each table has a structured part which is put into relational columns, plus an extensions part that is put into a "Data" XML column for semi-structured data. SQLServer supports XQuery so we can query the semi-structured data from SQL when needed.
This wouldn't fly with a binary format.
EDIT: yes, SQLServer also supports JSON, but has special optimizations for XML (e.g., it can understand schema types, it supports XML indexes which "shred" XML to a more efficient binary representation based on schema, etc.)
If you're going to reach for automation, why not just use a binary format like protocol buffers, flat buffers, capn proto, etc? You get the tooling and a ton of performance for free.
JSON is great because you don't need tooling. XML is great because it's expressive. You don't need expressiveness for a data format, but it works great as a markup language.
Syntax aside, I think the original mistake is IDLs, schemas, and other attempts at formalism.
WSDL, SOAP, and all their precursors were attempted in spite of Postel's Law.
Repeating myself:
Back when I was doing electronic medical records, my two-person team ran circles around our (much larger) partners by abandoning the schema tool stack. We were able to detect, debug, correct interchange problems and deploy fixes in near realtime. Whereas our partners would take days.
Just "screen scrap" inbound messages, use templates to generate outbound messages.
I'd dummy up working payloads using tools like SoapUI. Convert those known-good "reference" payloads into templates. (At the time, I preferred Velocity.) Version everything. To troubleshoot, rerun the reference messages and diff the captured results. Massage until working.
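The template half of that workflow is trivial in any language; a Python sketch with `string.Template` (the envelope and field names are invented, not the actual payloads):

```python
from string import Template

# A known-good "reference" payload with the variable parts templated
# out -- in practice this would come from a tool like SoapUI:
reference = Template("""\
<Envelope>
  <PatientId>$patient_id</PatientId>
  <VisitDate>$visit_date</VisitDate>
</Envelope>""")

# Generating an outbound message is just substitution: no schema,
# no code generation, and a diff against the captured reference
# output catches any drift.
msg = reference.substitute(patient_id="P-42", visit_date="2009-03-01")
print(msg)
```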
Our partners, and everyone I've told since, just couldn't grok this approach. No, no, no, we need schemas, code generators, etc.
There's a separate HN post about Square using DSLs to implement OpenAPI endpoints. That's maybe 1/4th of the way to our own home made solution.
I personally like XML a lot for rich text (I like HTML better than TeX) and layout (like in JSX for React), and it's not horrible if you want a readable representation for a tree, but I can't imagine using it for any other purpose.
JSON is exactly designed for object serialization. XML can be used for that purpose but it's awkward and requires a lot of unnecessary decisions (what becomes a tag? what becomes an attribute? how do you represent null separately from the empty string?) which just have an easy answer in JSON. And I can't think of any advantage XML has to make up for that flaw. Sure, XML can have schemas, but so can JSON.
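The null-vs-empty-string point in particular is visible in two lines of Python (key names invented for the example):

```python
import json
import xml.etree.ElementTree as ET

# JSON distinguishes null from the empty string for free:
doc = json.loads('{"middle_name": null, "suffix": ""}')
assert doc["middle_name"] is None
assert doc["suffix"] == ""

# In XML, an empty element's text comes back as None in ElementTree,
# so "absent" vs. "empty" needs a convention (e.g. xsi:nil) on top:
elem = ET.fromstring("<middle_name/>")
print(elem.text)  # -> None: empty? missing? the syntax alone can't say
```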
I will agree that JSON is horrible for config files for humans to edit, but XML is quite possibly even worse at that. I don't really like YAML, either. TOML isn't bad, but I actually rather like JSON5 for config files - it's very readable for everyone who can read JSON, and fixes all the design decisions making it hard for humans to read and edit.
One of the biggest advantages for XML are attributes and namespaces. I miss these in JSON.
As AtlasBarfed mentioned, JSON has a native map and list structure in its syntax, which is sorely missed in XML. You have to rely on an XML Schema to know that some tag is expected to represent a map or list.
JSON with attributes and namespaces would be my ideal world.
Why do you want those? Attributes and namespaces just make in memory representation complicated. They're quite useful for markup, but I don't really know why you'd want them in a data format.
Use JSON or a binary protocol for data, XML for markup.
To be fair, there were a lot of very good ideas for a 2.x XML that solved a lot of the complexity. The problem was that none of the tools would be upgraded to support it.
You'd basically have to create a new independent format to have proper compatibility once you introduce breaking changes.
- by removing DTDs, remove the concepts of notations and all external resource resolution from the core spec. Also, no possibility of entity-expansion attacks.
- by removing DTDs, remove validation from the core spec.
- merge namespaces into the core specification. At the same time, make them mandatory
- merge the concept of qualified names into the core specification
- by making namespaces mandatory, all the variations of how namespaces get exposed can be eliminated
- merge the info-set definition into the core specification
- by describing XML items and how they relate, implementations can understand what data is relevant at a particular point while parsing the document.
- Merge xml:id into the core specification.
You also had some other fun outlier concepts:
- Eliminate prefixes from infoset. This is mostly a breaking change for XPath and XML Schema.
- Add an explicit qualified name token (possibly recycling the entity declaration). This would allow the above specs to have their functionality restored, although likely with a new format.
- Accept qualified names without prefixes, such as via a {uri}:{localName} syntax.
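That last `{uri}localName` syntax isn't hypothetical, incidentally: Python's stdlib parser already normalizes qualified names into exactly that shape internally (the URI below is invented for the example):

```python
import xml.etree.ElementTree as ET

# ElementTree resolves prefixes away and exposes qualified names
# in Clark notation, i.e. {uri}localName:
root = ET.fromstring(
    '<shop:item xmlns:shop="http://example.com/shop"/>'
)
print(root.tag)  # -> {http://example.com/shop}item
```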