Java has a specific hack for this which I discovered by accident a few years ago. Normally, given (primitive) double values d0 and d1, boxing them does not affect equality testing - i.e. d0 == d1 if and only if Double.valueOf(d0).equals(Double.valueOf(d1)). However, if d0 and d1 are both NaN, then the boxed versions ARE considered equal while d0 == d1 is false.
This inconsistency infuriated me when I discovered it but the Javadoc for Double.equals explicitly states that this anomaly is there to "allow hash tables to work properly".
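As an aside (not from the Javadoc): JavaScript's Map makes the same trade-off. Map keys are compared with SameValueZero, so a NaN key can be found again even though NaN is never ==-equal to itself:

```javascript
// NaN is never ===-equal to itself under IEEE754 comparison...
console.log(NaN === NaN); // false

// ...but Map key lookup uses SameValueZero, the same
// "hash tables should work" choice that Double.equals makes:
const m = new Map();
m.set(NaN, "found");
console.log(m.get(NaN)); // "found"
```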
If I get floats from $somewhere, I might want to index them into a hash data structure without creating a huge sparse array.
If you assume float numbers have "identity" such that, whatever the precision, "x == x" always holds (other than NaN), then this is a perfectly valid thing to want.
Say I'm in a dynamic language and have a dictionary in which an object of any type whatsoever can be a key. Why would I arbitrarily disallow floating-point objects, if a character, string, lexical closure, database connection, file handle, ..., or lists or arrays of these can all be hash keys?
A hash table of floating-point values can be used for, say, memoizing a function whose argument (or arguments) are floating-point.
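For instance, a minimal memoizer along these lines (a JavaScript sketch; `memoize` and `slowSqrt` are hypothetical names, not from the thread):

```javascript
// Cache results of an expensive unary function in a float-keyed Map.
// Map.has/get use SameValueZero, so even NaN arguments would be cached.
function memoize(fn) {
  const cache = new Map();
  return (x) => {
    if (!cache.has(x)) cache.set(x, fn(x));
    return cache.get(x);
  };
}

const slowSqrt = memoize(Math.sqrt);
console.log(slowSqrt(2.25)); // 1.5 (computed)
console.log(slowSqrt(2.25)); // 1.5 (served from the cache)
```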
A compiler could use a floating-point-keyed hash table for deduplicating identical floating-point constants. Say that constants are stored in some static area, and referenced by address: it's wasteful to repeat those constants. Some constant defining mechanisms (like #define in C) proliferate copies of a constant as a repeated subexpression.
Because all those other things are either identity types (handles, closures, etc) or are value types where the only possible way to represent them is with a single "canonical form."
But IEEE754 allows not only NaNs, but also "negative zero" and denormals. Floating-point numbers, in other words, allow for multiple different bit-encodings that represent mathematically equal, but not identical, number values. There are non-canonical forms of the "same" numbers. And programming-language runtimes don't do anything to prevent CPUs from returning these non-canonical numbers, nor do they massage them back into canonical forms upon receiving them. They just end up blindly holding these non-canonical numbers: numbers which the CPU reports as equal if asked, but whose bit patterns compare as unequal.
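JavaScript shows this concisely: `===` follows the IEEE754 comparison the CPU provides, while `Object.is` distinguishes the underlying bit patterns:

```javascript
// IEEE754 comparison says 0 and -0 are equal...
console.log(0 === -0);         // true
// ...but they are distinct bit patterns, which Object.is exposes:
console.log(Object.is(0, -0)); // false
// The sign survives arithmetic, so the two zeros behave differently:
console.log(1 / 0);  // Infinity
console.log(1 / -0); // -Infinity
```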
> A compiler could use a floating-point-keyed hash table for deduplicating identical floating-point constants.
Bad example. A compiler wouldn't want a hash table whose key type is a floating-point number, because that would imply that the hash table is operating using the IEEE754 definition of equality vis-à-vis key-presence collision checking.
Rather, a compiler would use a bitstring key type, where the keys are the bit patterns representing either the target ISA's native FP-register encodings of the given floating-point numbers; or the abstract/formal bit-packed form of those floating-point numbers according to IEEE754.
The difference between the two is that the latter uses bitstring collation (ordering and equality), not floating-point collation.
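A sketch of that idea in JavaScript (`bitsKey` and `intern` are made-up names for illustration): key the constant pool off the raw IEEE754 bit pattern rather than floating-point equality.

```javascript
// Produce a bitstring key: the 64-bit IEEE754 encoding as a hex string.
function bitsKey(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);
  return view.getBigUint64(0).toString(16).padStart(16, "0");
}

// Constant pool keyed by bit pattern, not by FP comparison.
const pool = new Map();
function intern(x) {
  const k = bitsKey(x);
  if (!pool.has(k)) pool.set(k, x);
  return pool.get(k);
}

// 0 and -0 are ==-equal, but get distinct pool slots, as a compiler would want:
console.log(bitsKey(0));  // "0000000000000000"
console.log(bitsKey(-0)); // "8000000000000000"
```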
> value types where the only possible way to represent them is with a single "canonical form."
> But IEEE754 allows not only NaNs, but also "negative zero" and denormals
Unicode strings also have normalization forms. Not to make an argument, just a reminder.
True — but Unicode strings that are not bit-equal aren't collation-equal either, so that doesn't usually matter. A non-canonicalized Unicode string won't "find" its canonical counterpart in a hashtable.
IEEE754 floats (and doubles) are kind of unique among scalar types, in that the answer to "are these equal" in pretty much every programming language is delegated directly to the CPU to answer; and, in obeying IEEE754 semantics, the CPU's definition of equality for FP numbers makes things equal that aren't bit-representation-equal.
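Both points can be checked in a couple of lines of JavaScript: a non-normalized Unicode string misses its canonical counterpart in a Map, and only finds it after explicit normalization:

```javascript
// Two encodings of "é": precomposed vs. "e" + combining acute accent.
const nfc = "\u00e9";
const nfd = "e\u0301";
console.log(nfc === nfd); // false: not collation-equal as raw strings

const m = new Map([[nfc, "hit"]]);
console.log(m.get(nfd));                  // undefined: no "find" without normalizing
console.log(m.get(nfd.normalize("NFC"))); // "hit": found after canonicalization
```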
Having a generic container that handles all types of values except one, purely for policy reasons, is rarely a good idea. If floating-point instabilities are a concern (and they are), the Map level is not the place to address them. That would be a naive crutch.
Why would you ever want a number to be stored in a floating point format - that is a much better question.
Most of our numeric values don’t even correspond to the domain of floating point. For business and programming goals, -0 is complete technical nonsense. NaN too; it should just raise an exception (there are quiet and signalling NaNs, and everyone defaults to quiet). And so should non-low-level integer overflow, really, unless there is a software fallback. Limiting intermediate calculations to a low fixed number of digits is nonsense. Accounting for constant rounding/formatting errors is also a nuisance.
A built-in, first-class, hardware-enabled, base-10 fixed-point type could be the answer. But there’s almost always either bigint-only, which is barely used even in a standard library, or a serious performance hit which nobody wants. There would still be issues, but much fewer in practice.
Floating point is a niche type suitable for “multimedia”, NNs and maybe a few other special contexts. Floats are the default only because everyone on the hardware-to-runtime spectrum traditionally believes that ordinary numbers are not their responsibility.
Memoizing computations for which you know the input(s) come from a small range of numbers?
IEEE floating points are 32 bit / 64 bit, so it's not in principle any more insane than using (u)int32 / (u)int64 as keys. It's not like the key space is unbounded or even hard to estimate - it's just not a common thing to see in the wild, and I guess most devs rarely have a reason to consider how many values fit between two floating point numbers.
Another one which probably shouldn't exist is a map with boolean keys.
Edit: It might be nice if a language compiler detects such bastardizations and spits out a proposal for a better alternative structure or approach, perhaps even stubbornly refusing to proceed to compile such a shitty idea.
Golang already kind of does a spiritual form of this by disallowing unused variables.
This would undoubtedly be similarly controversial. Ego ruins all.
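On the earlier point about how many values fit between two floating-point numbers: for positive doubles, the ordering of values matches the ordering of their bit patterns, so the count is just a subtraction (a quick sketch):

```javascript
// Read a double's 64-bit IEEE754 encoding as an unsigned BigInt.
function bits(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);
  return view.getBigUint64(0);
}

// Positive doubles sort the same way as their bit patterns, so the
// number of representable doubles in [1.0, 2.0) is the bit-pattern gap:
console.log(bits(2.0) - bits(1.0)); // 4503599627370496n, i.e. 2n ** 52n
```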
That... still requires the foresight to see that edge case, and being in a situation where you're not colliding your float hashes with random integers?
Externally hashing outside of the hash table forces ugly complications on you. What if two distinct floating-point keys map to the same external key? The values of your hash table then have to be lists or arrays so you can keep all colliding entries. You've then wastefully built chained hashing around a hash table!
The same approach I suggested elsewhere in this thread will work for this case, too. Convert floats to their string representations and key off that. Equality ambiguity problem solved.
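A sketch of that approach (`put`/`get` are hypothetical wrapper names). One note: stringification makes NaN a findable key, but it folds -0 into 0 and passes rounding error straight through:

```javascript
// Key the map off the decimal string form of each float.
const m = new Map();
const put = (x, v) => m.set(String(x), v);
const get = (x) => m.get(String(x));

put(NaN, "nan is now a findable key");
console.log(get(NaN)); // "nan is now a findable key"

// Caveats of string keys:
console.log(String(-0));        // "0" - collapses into positive zero
console.log(String(0.1 + 0.2)); // "0.30000000000000004" - rounding survives
```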
Javascript does a ton of things that are extremely questionable, so it's not a good source to say you should do something. Rather, it's a better argument (though still flawed) to say that because it's done all the time in Javascript, it's evidence that you shouldn't do it.
I'm surprised this is done in JS. If you're doing any kind of calculation with floats (and use actual non-integer numbers, not just ints stored as floats) then it's generally difficult to get the same exact result every time on every system, due to rounding errors. That's why, to my knowledge, any strict equality check on floats is generally frowned upon. (Exceptions: you know all the floats are actually non-huge ints; you're comparing with a value the variable was explicitly set to at some time before.)
JavaScript conceptually only has floats (ignoring big integers, which isn't what most people use for numbers), so any map of numbers to values is using a float key. You refer to that, but yeah they're all floats.
> it's generally difficult to get the same exact result every time on every system, due to rounding errors
Floating point arithmetic is 100% entirely deterministic - you don't just get different values randomly.
ECMAScript does not have maps where keys can be integers and neither can keys be floats.
Everything is a pointer to a value, and otherwise reflected with a symbol or string (which in turn is a unique symbol). An object doesn't have object[1] because it is actually object[reference("1")]
This way boxing and unboxing of arrays is much cheaper, due to the arrays not needing a resizing of their cells once datatypes change.
(TypedArrays are optimized in a different manner, but we are speaking about the NaN use case, which implies either an Array or Object)
There are dozens of talks about e.g. v8's hidden classes on YouTube that show off this very mechanism.
A nitpick: You're setting object properties on a Map object, which only works "accidentally" but does not make use of the map at all[1]. So in this case, it likely works because the numbers are converted to strings before being used as keys.
But in general, using float keys does work - but can fail in unexpected ways due to rounding errors. E.g.:
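The example seems to have been dropped in formatting; a reconstruction consistent with the explanation that follows:

```javascript
const m = new Map();
m.set(0.3, "expected");

// Looking the value up with a computed key silently misses:
console.log(m.get(0.1 + 0.2)); // undefined
console.log(0.1 + 0.2);        // 0.30000000000000004
```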
That's because 0.1 + 0.2 == 0.30000000000000004 in IEEE floats, which is != 0.3. [2]
So using float keys which are derived from calculations and may contain non-integer numbers is a bad idea, because unless you have very good knowledge about floating point math, you can not easily predict what exact value the result of a calculation will be.
From what I got, it's a very stupid interaction of different JavaScript language features.
Every object in JS has basic map functionality. Since version 1 of the language, you could always write things like:
var o = new Object();
o["foo"] = "bar";
o["foo"] // --> "bar"
However, this functionality is relatively limited: It only supports strings (and symbols) as keys and it interacts badly with other methods and properties which are part of the object. Hence why an explicit "Map" type was added to the language much later.
The problem is, Maps are also objects, so they still inherit the "old" map functionality in addition to the new functionality. Those two systems are completely independent, even though they act on the same object. So writing
mymap["foo"] = 1
and
mymap.set("foo", 1)
both store the mapping "foo" -> 1, but in completely different places. Only the second one will actually put it into the store of the map, while the first one will just add it as a generic object property.
You can see it when trying to retrieve the value again:
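The snippet appears to have been lost; a reconstruction of what it presumably showed:

```javascript
const mymap = new Map();

mymap["foo"] = 1;              // stored as a plain object property
console.log(mymap.get("foo")); // undefined: the Map's own store was never touched

mymap.set("foo", 1);           // stored in the Map's key/value store
console.log(mymap.get("foo")); // 1
console.log(mymap["foo"]);     // 1, but read from the property, not the Map
```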
Yes the square brackets were just a typo. But maps do support float keys - you can try it yourself!
map = new Map();
map.set(1.2, 3.4);
console.log(typeof(map.keys().next().value));
That says the key is a number, and it's a floating point number. So what did they mean by "ECMAScript does not have maps where keys can be integers and neither can keys be floats"?
I think the GP was wrong there. JS maps absolutely do support float keys, it's just generally a bad idea to use them if you don't restrict yourself to integers.
JavaScript maps absolutely can use numbers (or any type) as keys, but that's not how its API works. Square brackets access object properties, but map entries are not stored like that, and that last `typeof` is `"undefined"` in every runtime I tried (not `"number"`). Try `map['set']` to see another example.
We're certainly in agreement that cookiengineer's "ECMAScript does not have maps where keys can be integers and neither can keys be floats" assertion is dead wrong.
You’re being pedantic. Nobody is talking about internal representations, they’re saying it’s literally possible to key into a map with a float, which is true.
It's done in JavaScript because there is no integer type. (Ok, ok, nowadays there is bigint, but historically.) So you do all int-like calculations with number (which is IEEE double), but make sure to only do arithmetic where the result is still an integer. That's why you end up using a double as a numeric key. And I think a double can basically hold a 53-bit integer exactly.
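Concretely, a double's 52-bit mantissa plus the implicit leading bit gives exact integers up to 2^53:

```javascript
// The largest integer a double can represent with full exactness:
console.log(Number.MAX_SAFE_INTEGER === 2 ** 53 - 1); // true
// Past 2^53, consecutive integers are no longer distinguishable:
console.log(2 ** 53 === 2 ** 53 + 1); // true: precision has run out
```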
This is definitely not my area of expertise, but my understanding was that the mantissa was actually the source of inexactness in floats? I know the exponent bits (11 bits in a double) are pretty close to actual ints (biased rather than two's complement), and you could always do things in a C-like union, but I wasn't aware you could directly treat the mantissa like an int
Ok, that's different then. Using integers as keys should not be a problem, even if they are stored as floats. Problems start if you use non-integers or IEEE special values as keys.
Note that the contract for the equals() method requires that it always return true when an object is compared with itself. So the designers had to decide whether to break the NaN semantics or the equals() contract. I could imagine that honoring equals() was the solution with the least expected breakage for downstream users.