
What about the 2^63 corner-case?

Consider this JSON: {"key": 9223372036854775807}. With most parsers it never fails.

But... some JSON parsers (including JS's eval) parse it to 9223372036854776000 and continue on their merry way.
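A quick way to reproduce it in Node.js (a sketch; the rounding here is standard IEEE 754 binary64 behavior, not anything parser-specific):

    // JSON.parse coerces the literal to a double, silently rounding
    // 2^63 - 1 up to the nearest representable value, 2^63.
    const parsed = JSON.parse('{"key": 9223372036854775807}');
    console.log(parsed.key);              // 9223372036854776000
    console.log(parsed.key === 2 ** 63);  // true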

The problem isn't user-provided JSON here. The problem is user-provided data (or computer-provided data) that's inside the JSON.

rachelbythebay's take (http://rachelbythebay.com/w/2019/07/21/reliability/):

> On the other hand, if you only need 53 bits of your 64 bit numbers, and enjoy blowing CPU on ridiculously inefficient marshaling and unmarshaling steps, hey, it's your funeral.



> some JSON parsers (including JS's eval) parse it to 9223372036854776000 and continue on their merry way

This is correct behavior though...? Every number in JSON is implicitly a double-precision float. JSON doesn’t distinguish other number types.

If you want that big a string of digits in JSON, put it in a string.

Edit: let me make a more precise statement since several people seem to have a problem with the one above:

Every number that you send to a typical JavaScript JSON parser is implicitly a double-precision float, and it is correct behavior for a JavaScript JSON parser to treat a long string of digits as a double-precision float, even if that results in lost precision.

The JSON specification itself punts on the precise semantic meaning of numbers, leaving it up to producers and consumers of the JSON to coordinate their number interpretation.
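For example, a minimal sketch of the put-it-in-a-string approach (both ends have to agree on the convention):

    // Send the 64-bit integer as a JSON string; convert with BigInt on receipt.
    const obj = JSON.parse('{"key": "9223372036854775807"}');
    const id = BigInt(obj.key);
    console.log(id);       // 9223372036854775807n, no precision lost
    console.log(id + 1n);  // 9223372036854775808n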


Every number in JavaScript is a double; JSON itself does not specify.


Update: The IETF version is a bit more explicit. https://tools.ietf.org/html/rfc8259#section-6

> This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.

> Note that when such software is used, numbers that are integers and are in the range [-(2^53)+1, (2^53)-1] are interoperable in the sense that implementations will agree exactly on their numeric values.

Or to paraphrase: if you pretend that every JSON number is a JavaScript number (double-precision float), you will generally be fine. If you don’t, you are responsible for the inevitable interoperability problems you’ll have with most current JSON parsers.
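In JavaScript terms, that interoperable range is exactly the "safe integer" range:

    console.log(Number.MAX_SAFE_INTEGER);            // 9007199254740991 = 2^53 - 1
    console.log(Number.isSafeInteger(2 ** 53 - 1));  // true
    console.log(Number.isSafeInteger(2 ** 53));      // false
    console.log(2 ** 53 === 2 ** 53 + 1);            // true: adjacent integers collide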


“JavaScript Object” is right there in the name.


Java is also in the name JavaScript, but we know how much that has to do with it.


Indeed, we've all read that history. And we all know how much JavaScript has to do with "JavaScript Object Notation", too, right? Basically everything?


Other than shared syntax, JSON is its own thing.


>Every number in JSON is implicitly a double-precision float

Is it? I was under the impression every number in JSON is implicitly arbitrary precision.


You can write an arbitrary precision number in there, say 4.203984572938457290834572098345787564e+20, but if the vast majority of JSON parsers interpret it as a double-precision float, there’s not much point.

If you control both the producer and consumer of the serialized data, you can of course do whatever you like. But I would recommend people who want more extensive data types use something other than JSON.
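The round-trip loss is easy to check with a stock JSON.parse/JSON.stringify:

    // The digits survive as text, but not once they become a Number.
    const text = '4.203984572938457290834572098345787564e+20';
    const roundTripped = JSON.stringify(JSON.parse(text));
    console.log(roundTripped === text);  // false: the tail digits are gone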


> If you control both the producer and consumer of the serialized data, you can of course do whatever you like.

Don't forget all the unintended intermediate producers and consumers due to microservices or even otherwise well-written tools that convert to float64 internally.
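Something like this hypothetical pass-through step is all it takes:

    // An intermediary that "just" re-serializes the payload corrupts an int64 field.
    const upstream = '{"id": 9223372036854775807}';
    const forwarded = JSON.stringify(JSON.parse(upstream));
    console.log(forwarded);  // {"id":9223372036854776000}, changed in transit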


In what situation would that create a problem that isn't noticed immediately during testing?


You're writing the server for a procedural space game where the coordinates are stored as numbers. As the ships get far from (0,0,0) people start to report that the graphics get "jumpy" and gameplay rules break frequently.

Or, your server keeps track of how long it's been running, in nanoseconds. When the server stays up for months, people start noticing weird precision issues and hard to replicate bugs around time.
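The nanosecond case is easy to put numbers on (a sketch, assuming the counter travels as a plain JSON number):

    // 2^53 ns is about 104 days; past that, single ticks stop registering.
    console.log(2 ** 53 / 1e9 / 86400);      // ≈ 104.25 days
    const uptimeNs = 2 ** 53;
    console.log(uptimeNs + 1 === uptimeNs);  // true: the nanosecond is lost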


When the tests 1) fail to experiment with numeric values greater than 2^53 or less than -2^53, and 2) fail to carefully check the results even in the situations where they did experiment with such values.


How often do you encounter situations where your values start off below 2^53 and then later grow above 2^53? How often do you deal with integers greater than 53 bits where the app doesn't immediately fail when the exact value is not correct (i.e. unique IDs or crypto keys)? I feel like these are edge cases which would rarely come into play in most software.


Edge cases which rarely come into play are the ones you should be most afraid of. Things that happen often get caught before they cause trouble. Things that never happen are fine. Things that happen rarely are what get you.

For a made-up example of how this could get you, let’s imagine that you have a message service that gives each message a unique ID. Some bright soul decided to give this ID a nice structure and make it a 64-bit value where the top 32 bits are an incrementing integer per user, and the bottom 32 bits are the user’s ID, assigned by a global incrementing integer.

Everything works great in testing and you deploy and the VC money is rolling in and then some of your very prolific users go past 2 million messages and suddenly messages are getting mixed up and you’re leaking private info because your access checking code happens to get the real 64-bit value but your message retrieval code puts the ID in a JSON number.
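To make that concrete (hypothetical makeId, same bit layout as described above):

    // Top 32 bits: per-user message counter; bottom 32 bits: user ID.
    const makeId = (counter, userId) => (BigInt(counter) << 32n) | BigInt(userId);
    const a = makeId(2 ** 22, 1);          // a prolific user's message...
    const b = makeId(2 ** 22, 2);          // ...and a different user's message
    console.log(a === b);                  // false: distinct 64-bit IDs
    console.log(Number(a) === Number(b));  // true: identical after a trip through a double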

Now you might respond, but that ID scheme is dumb, don’t do that. And you may be right! But dumb things happen. It’s unwise to leave land mines lying around in your software just because they only detonate when someone does something dumb.


Here's some stuff where it would bite me in my work (a small demonstration follows the list):

- I sometimes use random 64 bit integers in my data (usually as cheaper synthetic guid-like keys).

- I sometimes use CRC-64 hashes.

- I sometimes use 2^63 as a sentinel.
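For the random-key case, distinct 64-bit values really do collide after a trip through a double (keys below chosen to demonstrate the collision, not actually random):

    // Two distinct 64-bit integers that round to the same double, 2^63.
    const a = 9223372036854775807n;  // 2^63 - 1
    const b = 9223372036854775555n;  // another 64-bit key
    console.log(a === b);                  // false
    console.log(Number(a) === Number(b));  // true: both round to 2^63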



