
> They definitely are not, as displayed by the fact that binary lengths are the root cause of a huge number of security flaws. JSON mostly avoids that.

I assume you're mainly referring to buffer overflows, which are a problem with text-based formats too. See for example the series of overflow vulnerabilities in IIS's HTTP parser, which led to some of the most disruptive worms in history, like Code Red. Really this is more of a problem with memory-unsafe languages than serialization formats.

> Let's look at floating point numbers: with a binary format you can transmit NaN, Infinity, -Infinity, and -0

Depending on the use case, being able to encode these values may be a requirement, in which case binary is no worse than text.

> You can also create two NaN numbers that do not have the same binary representation.

This is specific to IEEE 754, not all binary representations have this issue. Text-based formats also have far more pervasive problems with lacking a canonical representation, so it's hard to count this as a point against binary.
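As a concrete aside, a minimal C sketch (not from the thread) of what the IEEE 754 quirk looks like: on a typical glibc/x86-64 setup both values below are NaN yet carry different bit patterns; whether nan() actually stores the tag in the payload is implementation-defined.

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* Both are NaN, but IEEE 754 leaves the mantissa payload free, so their
           encodings can differ (glibc stores the tag string in the payload). */
        double a = nan("");
        double b = nan("0x1");
        uint64_t ba, bb;
        memcpy(&ba, &a, sizeof ba);
        memcpy(&bb, &b, sizeof bb);
        printf("isnan: %d %d, identical bits: %d\n", isnan(a), isnan(b), (int)(ba == bb));
        return 0;
    }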

> JSON is one of many competitors within an ecology of programming, including binary formats, and yet JSON currently dominates large parts of that ecology.

This is just an appeal to popularity fallacy.



> I assume you're mainly referring to buffer overflows,

That is the most visible security issue, but there are many others, e.g. reading excess data is a security flaw (à la Heartbleed).

> Really this is more of a problem with memory-unsafe languages than serialization formats

JSON is often used to communicate with unsafe languages. Are you suggesting a binary format is better for use with a language that is memory-unsafe? Or are you implying we should use Rust so that we can use a binary format?

> This is specific to IEEE 754, not all binary representations have this issue.

So now we use some other (unspecified) binary format for floating point numbers?

> This is just an appeal to popularity fallacy.

Bullshit. JSON didn't become popular because it was popular. Developers have chosen to use JSON because it served a purpose for them within their particular ecology. It became popular into the headwinds of XML and other formats.

There are plenty of unpopular binary formats that developers have had experience with and that they choose not to use. I personally have enough experience with a variety of spec'd and custom binary formats to know when I would use one.

Binary formats most definitely have their place (embedded, high throughput at scale, severe bandwidth restrictions, strict typing, languages with poor string handling, naturally binary data). But for a large percentage of software projects, JSON works and it works well.


I see both arguments.

XML became popular because of its human readability compared to binary and because it allowed disparate systems to cooperate. JSON is even more readable and allowed the whole XSLT thing to be ignored, making a dev's life easier, and it really took off with Node.

Binary is not gone, but you don't see a lot of it in the 'web world', because everyone in that space uses JSON, almost exclusively.

Having written many parsers, binary is far and away the easiest, provided your language/environment supports it. That is, binary in JS is a pain, because you don't have native ints and floats. In JS, JSON is the base atomic data structure, so it makes complete sense that it's used... But in C/C++ and co., JSON is harder and introduces a lot of overhead/quirks which simply don't exist in binary (provided you cover the buffer overruns and the like).
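For what it's worth, a minimal sketch of what fixed binary parsing looks like in C, assuming a made-up wire format (4-byte little-endian length, 2-byte type, then payload); the two bounds checks are the "cover the buffer overruns" part:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical wire format: 4-byte LE length, 2-byte LE type, then payload. */
    struct msg {
        uint32_t len;
        uint16_t type;
        const uint8_t *payload;
    };

    /* Returns 0 on success, -1 if the buffer is truncated or the length field lies. */
    static int parse_msg(const uint8_t *buf, size_t buflen, struct msg *out)
    {
        if (buflen < 6)
            return -1;                          /* header must fit */
        out->len  = (uint32_t)buf[0] | (uint32_t)buf[1] << 8
                  | (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;
        out->type = (uint16_t)(buf[4] | buf[5] << 8);
        if (out->len > buflen - 6)
            return -1;                          /* bounds check before touching the payload */
        out->payload = buf + 6;
        return 0;
    }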


> XML became popular because of its human readability compared to binary and because it allowed disparate systems to cooperate. JSON is even more readable and allowed the whole XSLT thing to be ignored, making a dev's life easier, and it really took off with Node.

XML was simply never designed to represent structured data. It was meant to represent document markup in a way that was both simpler and more extensible than SGML.

If there's one thing the industry always seems to do, it's embrace some technology hammer as the solution for every problem. Before XML, some people were trying to trade around relational database dumps as data interchange formats, because relational databases were the golden hammer.

JSON is being abused by being stretched beyond its sweet spot as well, but there aren't necessarily industry consortiums pushing bad ideas like there were with SOAP and WS-*.


JSON has JSON Object Signing and Encryption (JOSE), compare that to XML-Dsig and XML-Enc. It has OpenAPI/Swagger, which is close to SOAP/WSDL. It has JSON-Schema, which is close to XML-Schema. It has OpenID Connect, which is literally SAML, but with JSON instead of XML.

If you're missing anything out of WS-* in JSON, you can be sure somebody is working on a spec for it.


Erlang and Elixir are two more memory-safe languages that comfortably handle binary. See Armstrong, Programming Erlang, Ch. 5, "Advanced Bit Syntax Examples", with partial parsers for MPEG, COFF, and IPv4 on pages 83-89.


There is no functional difference between an array of unsigned bytes (binary) and an array of signed bytes (char data). The only difference is that when you send binary, 0 is now a valid value instead of a string terminator, so you must prepend the size because you can no longer parse until you find a NUL byte. It is always safer to know the size you must allocate ahead of time instead of dynamically growing a buffer until the text stream is terminated.
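A minimal sketch of that length-prefix style in C, assuming a made-up 4-byte big-endian prefix: the allocation is exact and known up front, though you still have to decide how large a length you're willing to honor.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_FRAME (1u << 20)  /* arbitrary sanity cap: a bogus length must not exhaust memory */

    /* Read one length-prefixed frame: 4-byte big-endian length, then that many
       bytes. Caller frees the result. Returns NULL on error. */
    static uint8_t *read_frame(FILE *in, uint32_t *out_len)
    {
        uint8_t hdr[4];
        if (fread(hdr, 1, 4, in) != 4)
            return NULL;
        uint32_t len = (uint32_t)hdr[0] << 24 | (uint32_t)hdr[1] << 16
                     | (uint32_t)hdr[2] << 8  | (uint32_t)hdr[3];
        if (len > MAX_FRAME)
            return NULL;                       /* the known size still has to be bounded */
        uint8_t *buf = malloc(len ? len : 1);  /* single, exact allocation */
        if (!buf)
            return NULL;
        if (fread(buf, 1, len, in) != len) {
            free(buf);
            return NULL;
        }
        *out_len = len;
        return buf;
    }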


There are plenty of examples of binary formats where you do not know buffer sizes until you've received all the data, and where bad assumptions while parsing the data can cause a buffer overflow.

Decompression and PNG libraries, for example, have caused massive security impact across the industry because of reuse in different products. Font handling, compressed bitmaps, and Windows cursor parsing have also been sources of issues.

Mozilla didn't just invest in Rust because parsing HTML and JSON is hard. It's all hard.


“It is always safer to know the size you must allocate ahead of time instead of dynamically growing a buffer until the text stream is terminated.”

And then you go on to give examples of how this is true while saying I'm wrong? Any time you have an unknown payload you have to determine how long you're going to wait, how much you're going to accept, how much you're going to buffer, etc., before it becomes a drain on the system.


You can still reserve one value as a control character and have non-text streams that don't need data frames prefixed by size.

These also make it easier to recover the data after corruption, or to re-sync the receiver, since you have a control byte you can sync on.
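A rough C sketch of that idea, borrowing PPP-style byte stuffing (simplified; the byte values here are illustrative): the reserved end byte both delimits frames without a length prefix and gives the receiver a point to re-sync on after corruption.

    #include <stddef.h>
    #include <stdint.h>

    enum { FRAME_END = 0x7E, FRAME_ESC = 0x7D, ESC_XOR = 0x20 };

    /* Feed one received byte at a time. Returns the completed frame length when
       FRAME_END arrives, 0 otherwise. After corruption the receiver just drops
       bytes until the next FRAME_END and is back in sync. */
    static size_t frame_feed(uint8_t b, uint8_t *frame, size_t cap, size_t *pos, int *esc)
    {
        if (b == FRAME_END) {
            size_t n = *pos;
            *pos = 0;
            *esc = 0;
            return n;
        }
        if (b == FRAME_ESC) {
            *esc = 1;               /* next byte was XORed with ESC_XOR by the sender */
            return 0;
        }
        if (*esc) {
            b ^= ESC_XOR;
            *esc = 0;
        }
        if (*pos < cap)
            frame[(*pos)++] = b;    /* overlong frames are truncated, never overflowed */
        return 0;
    }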


Yes, in fact, this is most common in video streaming formats. These types of streams are more commonly downloaded than uploaded; when they're uploaded, the server has to be careful not to exhaust too many resources parsing variable-length messages.


It's also fairly common if you have always-on streams without any signalling, where you can connect to the stream at any time, like with a UART.


JSON over HTTP can communicate its size with an HTTP header.
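For example (illustrative request; the Content-Length value matches the body's byte count):

    POST /items HTTP/1.1
    Host: api.example.com
    Content-Type: application/json
    Content-Length: 26

    {"name":"example","id":42}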


There's no reason strings can't be sent like binary if you do a size header first. The problem is trying to send binary like a string, where your data might have 0s that could be interpreted by the receiver as string terminators. Typically base64 is used to address this issue.


Even if it's compressed?


>This is specific to IEEE 754, not all binary representations have this issue.

Also, sometimes it is a feature. The payload of a NaN value can be user-defined, and some programs use it [1]. The string "NaN" drops information that might be useful to some programs; it just doesn't affect as many of them as null -> "null" does.

[1] https://github.com/WebKit/webkit/blob/master/Source/JavaScri...
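A minimal C sketch of what a user-defined NaN payload means in practice (illustrative only; JavaScriptCore's actual NaN-boxing scheme is more involved, and arithmetic on the value may canonicalize the NaN, so this only survives plain copies):

    #include <stdint.h>
    #include <string.h>

    /* Stash a payload in the low 51 mantissa bits of a quiet double NaN. */
    static double box_payload(uint64_t payload)
    {
        uint64_t bits = 0x7FF8000000000000ull           /* exponent all ones + quiet bit */
                      | (payload & 0x0007FFFFFFFFFFFFull);
        double d;
        memcpy(&d, &bits, sizeof d);
        return d;                                       /* isnan(d) is true */
    }

    static uint64_t unbox_payload(double d)
    {
        uint64_t bits;
        memcpy(&bits, &d, sizeof bits);
        return bits & 0x0007FFFFFFFFFFFFull;            /* serializing this as "NaN" would lose it */
    }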



