OpenAPI v3.1 and JSON Schema 2019-09 (apisyouwonthate.com)
88 points by BerislavLopac on Feb 29, 2020 | 45 comments



In all honesty, aren't JSON Schema and OpenAPI more or less reinventing WSDL/SOAP/RPC? It feels like we've come full circle and replaced <> with {} (and lost a lot of mature XML tooling along the way).


The biggest difference between OpenAPI and SOAP(/WSDL), I think, is that OpenAPI is descriptive whereas SOAP is prescriptive. OpenAPI is supposed to be able to describe pretty much any sort of API, as long as it's over HTTP, while SOAP wants you to design your API to fit the SOAP model. In that sense I see SOAP as being closer to gRPC than to OpenAPI.

I'd also echo the sentiment that iterating on good ideas is useful and not always just reinventing the wheel.


Except in practice, the only accurate and complete OpenAPI specs I see are either generated from code annotations or used to generate the service code. It's a particular form of worse-is-better: overall you end up doing the same work as in the SOAP/WSDL days if you want to finish the job, but because in the beginning you can just hammer out undocumented code, it feels much easier. Regrettably, many developers stop at 80% and produce OpenAPI specs that have the appearance of completeness without actually being complete.


My preferred approach is to use the OpenAPI document as a spec, i.e. to drive the implementation rather than just document it. I wrote a little Python framework [1] to help me put that approach into practice.

[1] https://pyotr.readthedocs.io
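
Not the framework itself, but a minimal sketch of the spec-first idea, assuming a hand-written (and here made-up) OpenAPI 3.1 fragment and the standard jsonschema package to check an implementation's output against the schema the spec declares:

    import jsonschema

    # Hypothetical, hand-written OpenAPI 3.1 fragment: the spec comes first.
    SPEC = {
        "openapi": "3.1.0",
        "info": {"title": "Pets", "version": "1.0.0"},
        "paths": {
            "/pets/{id}": {
                "get": {
                    "responses": {
                        "200": {
                            "content": {
                                "application/json": {
                                    "schema": {
                                        "type": "object",
                                        "required": ["id", "name"],
                                        "properties": {
                                            "id": {"type": "integer"},
                                            "name": {"type": "string"},
                                        },
                                    }
                                }
                            }
                        }
                    }
                }
            }
        },
    }

    def check_response(path, method, status, body):
        # Pull the response schema out of the spec and validate the
        # implementation's output against it; raises ValidationError on drift.
        schema = (SPEC["paths"][path][method]["responses"][status]
                  ["content"]["application/json"]["schema"])
        jsonschema.validate(instance=body, schema=schema)

    check_response("/pets/{id}", "get", "200", {"id": 1, "name": "Rex"})  # passes
    # check_response("/pets/{id}", "get", "200", {"id": "1"})  # would raise ValidationError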


Defusedxml lists a number of XML "Attack vectors: billion laughs / exponential entity expansion, quadratic blowup entity expansion, external entity expansion (remote), external entity expansion (local file), DTD retrieval" https://pypi.org/project/defusedxml/
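
For concreteness, the first of those looks roughly like this (a toy version of the payload), and defusedxml refuses to expand it:

    from defusedxml import EntitiesForbidden
    from defusedxml.ElementTree import fromstring

    # A toy "billion laughs" document: nested entity definitions that
    # expand combinatorially when a naive parser resolves them.
    BOMB = """<?xml version="1.0"?>
    <!DOCTYPE lolz [
      <!ENTITY lol "lol">
      <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
      <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
    ]>
    <lolz>&lol3;</lolz>"""

    try:
        fromstring(BOMB)
    except EntitiesForbidden:
        print("defusedxml refuses to parse documents that define entities")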

Are there similar vulnerabilities in JSON parsers (that would then also need to be monkeypatched)?


Sure, but the page itself says that it's mostly based on some uncommon features. Since we are converging on the feature list of XML/WSDL with JSON, what makes you think that some JSON (I'm using JSON as a placeholder for the next big thing in the JSON space) parsing libraries won't have bugs? Isn't that the danger of the added complexity?

So, most likely some of those uncommon features won't be added to JSON. But why didn't "we" as a community "just" invent an "XML light", or hide those dangerous but occasionally used features behind better defaults?

So, over the years all I've seen is some mumbo jumbo about how XML is too complicated, too noisy, and too big (at least I used to say that), but what if that complexity came from decades of usage, from real requirements? Why wouldn't we arrive at the same situation 10 or 20 years from now? And if size matters so much, why do we ship megabytes of code for simple websites? And why didn't we choose Protocol Buffers over JSON?

And what makes you think that the next generation won't think that JSON is too complicated (because naturally, complexity will increase with new requirements and features)? So I suppose we'll see a JSON replacement with different brackets ("TOML Schema"?) in the future. Or maybe it's time for a binary representation again, just to be replaced by the next "editor-friendly and human-readable" format.

It all feels like we as an industry took a step backward for years (instead of improving the existing tools and specs), just to head toward the same level of complexity, all the while throwing away lots and lots of mature XML-processing tooling.

P.S.: The same probably goes for ASN.1 vs. XML Schemas... So now we are in the third generation of schema-based data representation: ASN.1 -> XML -> JSON.

P.P.S.: I'm not arguing that we should all use XML now; I'm reflecting on the historical course and what might be ahead of us. Clearly, XML is declining steadily and has "lost".


Yeah, expressing complex types with only primitives in a portable way is still, unfortunately, a challenge. For example, how do we encode a datetime without ISO 8601 and a schema definition? Or how do we encode complex numbers like "1j+1", or quantities with units like "1.01 USD/meter"?

Fortunately, we can use XSD with RDFS and RDFS with JSON-LD.
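
A rough sketch of what that looks like (the property IRI is just an example): a JSON-LD @context types a plain JSON string as xsd:dateTime, so the datatype travels with the data rather than living only in out-of-band convention.

    # Illustrative JSON-LD document, written as a Python dict.
    # "created" is mapped to an IRI and typed as xsd:dateTime via the context.
    doc = {
        "@context": {
            "xsd": "http://www.w3.org/2001/XMLSchema#",
            "created": {
                "@id": "http://purl.org/dc/terms/created",
                "@type": "xsd:dateTime",
            },
        },
        "created": "2020-02-29T12:00:00Z",
    }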


Obviously there aren't similar issues with parsing JSON. On the other hand, if you ingest XML you simply don't want to use a full implementation of a validating XML parser, because making that work is a bunch of pointless busywork. And these vulnerabilities are only relevant for validating XML parsers.


XML entity expansion has nothing to do with XML validation and is an attack vector whether you validate the XML doc or not.


It's honestly very disappointing that there is still no non-JS replacement for in-browser XSLT. And in-browser XSLT wasn't that good to begin with.


There is a JSPath implementation written in Haskell. PostgREST uses it for JWT configuration.


Yes they do. Years back, my boss told me about project requests (we do API management): every new project, in terms of requirements, was "soooooo, it's like SOAP..."


LDP (Linked Data Platform) and Solid (Social Linked Data), along with JSON-LD, are the newer W3C specs for HTTP APIs.

For one, pagination is a native feature of LDP.


JSON-LD is just the same old RDF data model that the W3C has promoted unsuccessfully for the last two decades, repackaged with trendy JSON syntax. It's almost as unreadable as the XML serialization and still puts very little value on the table. It shines, however, with people who like over-engineering instead of delivering a product.


It's astounding how often people make claims like this.

There is a whole lot of RDF Linked Data; and it links together without needing ad-hoc implementations of schema-specific relations.

I'll just link to the Linked Open Data Cloud again, for yet another hater that's probably never done anything for data interoperability: https://lod-cloud.net/

That's a success.


I think one of the major reasons JSON took this role from XML is that it has a tighter syntax and easier-to-understand rules, so serialization, hydration, and deserialization became easier to optimize over time. I know that over the wire JSON packs more tightly than XML too, and that used to matter a lot more than it might now, depending on the end users of your app. I think these validation specifications, though, are all trying to solve the same problem: describing your API to users without human interaction. It turns out to be a tough problem.


Yes, JSON doesn't have many rules. JSON Schema, on the other hand, adds a lot of complexity again.


In my testing, after compression, JSON and XML were essentially identical in size for the same payload.


Clumsy XML parsers and clumsy language choices?

Strictly equivalent information in the formal sense doesn't have to mean it's as easily or as well handled in current idioms?


The only thing I miss is comments.


Then you may be happy to learn that OpenAPI documents can be written in YAML (since OpenAPI 2.0).


Attributes are also super nice for extending existing data. Doing so in JSON is a nightmare.
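
A rough sketch of the pain point (the shapes are made up): an XML attribute can be bolted onto an existing element without disturbing readers of its text content, while in JSON the equivalent usually means changing the shape of the value.

    # Before: clients read data["price"] as a plain number.
    before = {"price": 1.01}

    # After "extending" it with a currency, the scalar has to become an
    # object, which breaks every existing consumer of data["price"].
    after = {"price": {"value": 1.01, "currency": "USD"}}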


If there were comments, people would start putting data in comments.


But for that you'd have to use a custom parser. If you use a custom parser, you could put data in comments today anyway. I don't understand that argument.


JSON5 supports comments: https://json5.org/


But who supports JSON5?


You can use JSON5 for comments and compile it to JSON.
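
A rough sketch of that workflow, assuming the third-party json5 package (its API mirrors the stdlib json module):

    import json
    import json5  # third-party package; API mirrors the stdlib json module

    SOURCE = """{
      // comments and trailing commas survive in the JSON5 source...
      name: 'demo',
      port: 8080,
    }"""

    # ...and are stripped when compiling down to plain JSON for consumers.
    print(json.dumps(json5.loads(SOURCE)))  # {"name": "demo", "port": 8080}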


You can use YAML, hjson, or jsonnet for the same purpose. What sets JSON5 apart?


YAML, hjson, and jsonnet do not validate existing JSON blobs. You can "use" ROT13 for the same purpose, but it's incidental to what you are trying to do.

I would plead that everyone stop using YAML. It's terrible at everything.

jsonnet is a template language for a serialization format. Who would choose that nightmare?

ECMAScript isn't big on extending the language orthogonally, so hjson is eventually going to be superseded by some ES* release.

This is a niche concern that has an optimal path. Go with a validation schema designed for applying to a serialization format, which has widespread library support.


>YAML ... do not validate existing JSON blobs

YAML 1.2 is a strict superset of JSON. What do you mean by "validate" here?
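
For instance, PyYAML (which targets YAML 1.1, so not quite 1.2, but close enough for typical documents) will load a JSON blob as-is:

    import yaml  # PyYAML implements YAML 1.1; typical JSON still parses fine

    print(yaml.safe_load('{"name": "demo", "tags": ["a", "b"], "count": 3}'))
    # {'name': 'demo', 'tags': ['a', 'b'], 'count': 3}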

>I would plead that everyone stop using YAML. It's terrible at everything.

Fair enough.

>jsonnet is a template language for a serialization format. Who would choose that nightmare?

People who are tired of Jinja2 and YAML but don't want to jump to a general purpose programming language?

>widespread library support

JSON5 is implemented in JavaScript, Go, and C#. Not sure how "widespread" that is. Rust, C, Python, Lua, Java, and Haskell are missing out on the fun.


When I searched for "JSON5 [language]", I found JSON5 implementations in/for Rust, C, Python, Java, and Haskell on the first page of search results.

I like YAML, but some of the syntax conveniences are gotchas: 'no' must be quoted, for example.
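
For example, with PyYAML (which follows YAML 1.1 boolean resolution):

    import yaml

    # YAML 1.1 treats unquoted no/yes/on/off as booleans, so country
    # codes like "no" (Norway) silently turn into False.
    print(yaml.safe_load("country: no"))    # {'country': False}
    print(yaml.safe_load("country: 'no'"))  # {'country': 'no'}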


JSON is a lot friendlier up front though, and hopefully you don't need to go all-in. But the rabbit hole can get pretty deep with both JSON and XML validation, meta-schemas and -DTDs. Depending on what you're trying to do, some of the newer approaches like Cue might fit the bill better or at least feel less like a convoluted mess.


In fact, if you want to specify some kind of rigid schema (and with SOAP/XML Schema it's usually needlessly rigid), you are reinventing either ASN.1 or Sun RPC's XDR, only with larger overhead...


There are a number of problems with JSON and these schemas based upon it. And a huge chunk of the problems would be solved with a better, stronger type system. That's why I've spent the past couple of years building a replacement [1], [2], and an RPC layer and service selection mechanism that I haven't spec'd out yet.

We need a binary encoding because it's more efficient in code, time, and power. But we also need a text encoding because we're humans, and eyeballing binary data is terrible. I've made them both [3] [4], and they're 100% type compatible. The idea is that machines communicate using binary, and translate to/from text only when a human needs to inspect or modify it.

The thing slowing me down at the moment is the coding side of things ([5] [6] [7] [8]), because it's tricky getting Go to play ball with its reflect mechanism. Once I have a functioning gob replacement, I can move up to the next layer.

I'm probably a month away from releasing v1.0 of CBE and CTE, after which it'll be another few months for Streamux, and then the layers on top of that should go pretty quickly because I'm not bit bashing anymore.

[1] https://github.com/kstenerud/concise-encoding#concise-encodi...

[2] https://github.com/kstenerud/streamux/#streamux

[3] https://github.com/kstenerud/concise-encoding/blob/master/ct...

[4] https://github.com/kstenerud/concise-encoding/blob/master/cb...

[5] https://github.com/kstenerud/go-cbe

[6] https://github.com/kstenerud/go-cte

[7] https://github.com/kstenerud/go-compact-time

[8] https://github.com/kstenerud/go-compact-float


Great project. Regarding the stricter type system: we are currently working on a stricter JSON Schema variant called TypeSchema, which basically provides a strong type system following TypeScript. With it you can use all the tools that currently work with JSON Schema, while also making it easy to generate code or build a binary representation. You can take a look at it here: https://typeschema.org/


Out of curiosity, why support references, like YAML?


References are necessary in order to support cyclic data. They're also useful for repetitive data.
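
For example, a format without references simply can't round-trip a cycle; a quick sketch with Python's json module:

    import json

    a = {"name": "a"}
    a["self"] = a  # a cyclic structure

    try:
        json.dumps(a)
    except ValueError as err:
        print(err)  # "Circular reference detected"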


How does your offering compare to capnproto?


It's similar in a number of ways. Both communication protocols are asynchronous, but Cap'n Proto isn't inherently bidirectional (where commands can be sent both ways). Cap'n Proto data is schema-based, whereas Concise Encoding is ad hoc.

Then again, Streamux is VERY low level, and I haven't written the higher-level RPC mechanism yet (so it won't do a lot of things the Cap'n Proto RPC layer does without some help). I'll be stealing a lot of high-level ideas once I get to that level, but the primary thing is that Streamux is a communication layer designed to be built upon, since not everything requires the same heavy-duty RPC treatment. Streamux will fit just fine in a tiny I2C environment or full-on TCP or whatever comms channel you're using, because you can tune it to use as much or as little of the space/time/range resources as you need. And endpoints can negotiate these without even needing round trips.

As for the binary format itself, Cap'n Proto relies on schemas for decoding the data, whereas CBE encodes the basic data type so that it can be decoded without a schema (like JSON, XML, etc.), meaning you can encode ad-hoc data without needing to specify how it's decoded. Cap'n Proto only has void, bool, int (signed/unsigned), float, binary, list, and map types. Concise Encoding has nil, bool, int (signed/unsigned), float (binary and decimal), string, date, URI, binary, list, map, markup, comments, metadata, and references. References are important because they handle cyclic and repeating data in an elegant manner. Cap'n Proto doesn't support float compression.

That's off the top of my head. Cap'n Proto has a lot of good ideas and looks very nice for its niche, but the biggest issues I have are the need for a schema, the smaller set of types, and the lack of a twin text format (and this is by design, because it's competing with Protobuf, not JSON). However, it supports mmap, so it can be blindingly fast for certain applications that Concise Encoding won't even try to compete in.

I just realized that I haven't added Capn'proto to my features matrix: https://github.com/kstenerud/concise-encoding#comparison-to-...


> In all honesty, aren't JSON Schema and OpenAPI more or less reinventing WSDL/SOAP/RPC?

It's a bit like WSDL (a schema on top of a schema-less format), but not at all like SOAP or RPC (which are more command-oriented).

What's your point? So what if it's reinventing something?

Schemas are useful. Instead of writing docs, validators, and language-specific type definitions, you can write a single OAS spec file and get all three.


This is great and +1 to Phil Sturgeon for pushing for this fix. He wrote a great blog post about it a while back which pointed me to a library[1] that helped my team manage the problems with OpenAPI schemas. Nice to see this situation finally in a better place.

1. https://github.com/openapi-contrib/openapi-schema-to-json-sc...


This is amazing news and will help propel the modern API economy forward in so many subtle ways.

Don't underestimate the second and third-order effects that interoperability enables.

(this is also really great news for GraphQL fans...)


Not super familiar with the topic, how does this affect GraphQL?


OpenAPI and GraphQL both offer strongly typed, formally described APIs. As such, they interop seamlessly with each other.


I'd sure love some convergence between JSON Hyper-Schema and OpenAPI... I prefer Hyper-Schema, but it's seen nowhere near the adoption that Swagger/OpenAPI has, meaning it's hard to choose it for newer projects even though I think it's more extensible and expressive than OpenAPI (a result of OpenAPI being a little more opinionated).




