This is something I don't think folks really appreciate until they run into it in their own code. Back in the early days of microcomputers when everyone was rolling their own floating point code, some packages would very slowly diverge as things went back and forth between decimal and binary. In particular was an example of a BASIC program that was printing out a series of numbers for an accounting program to paper tape, and then later reading them back in when additional data was added. That round trip ended up in variations of a couple of pennies every trip through the system.
IEEE 754 Floating point numbers should only be used where you can tolerate rounding errors. Even if you have enough precision to do base conversions, actually doing math will frequently introduce rounding errors.
You can either be very careful to avoid the many cases where this can happen, choose a different number format (fixed-point, decimal floating point, numerator-denominator, arbitrary precision mantissa binary floating point, which one depends on your problem domain), or use units that let you work in integers (equivalent to fixed-point, but easier to explain).
The article is specifically about round-tripping between base-10 and base-2 floating point because the precision of floating point is a very real issue if you're doing conversions. This is e.g. one of the reasons why decimal fixed point is preferable for many types of operations.
Fair enough that the title leaves it ambiguous, but the ambiguity is resolved in the first sentence of the post, and further clarified in a lot of detail over the next paragraph and a half.
The whole setup is a problem that shouldn't exist, because you're using 7-bit floats for an inappropriate range. When are you going to run into 7-bit floats as the default? If you're not thinking about it, I suppose this could happen if you scale it up, but it's less surprising to say "4862057400958023983457609283094859 can't be represented exactly in floats".
It's a pretty click-baity title, it should be hedged to 'native floating-point' or something.
In the second section the author clearly shoots down the title, while again choosing to repeat that it 'isn't enough', having again as a subheading:
>Seven Bits Looks Like Enough But It Isn’t
When that section clearly demonstrates that it would be enough.
This statement really should be hedged.. You can just straight-up do binary-encoded decimal and change every pair of decimal digits into 7 bits and back again (bin 0000000 for decimal 00 through bin 1100011 for decimal 99, see reference [1] for chart), end of story and never suffer any degradation.
I realize this isn't the scheme that is chosen, but that is a choice. Both the title and the quoted sub-heading imply otherwise; i.e. that it isn't enough bits. As shown above clearly it is.
[[[
Aside: in this scheme you EVEN have some kind of 28% larger space you can make use of somehow for checksumming or the like!
Naive example: when encoding any value that happens to contain digits smaller than 28 you can also encode parity information:
when encoding, if the two digits you're encoding happen to be digits 00-27 you ALSO encode the checksum value 1 by choosing to add decimal 100 (to arrive at bin 1100100 through 1111111 i.e. decimal 100 through decimal 127) or checksum value 0 by passing on doing so (i.e. keep bin 0000000 through 0011011 i.e. decimal 00 through 27).
When decoding, when you decode and encounter decimal 100 through decimal 127, subtract 100 from it before recording, and also record a checksum 1; when encountering decimal 00 through decimal 27 record it but also record a checksum value 0.
This means that in 28% of cases you gain 1 bit of checksumming information while perfectly encoding two binary digits. (If your stream doesn't happen to contain any decimal digits 00 through 27 then you just don't gain any checksumming information.)
You can use these checksum bits however you like! You can't be guaranteed to have them, but you have a good chance of having them, starting with a 27% chance of having a parity bit in the case of even a single pair of digits!
I'm not an expert on checksums, and the above is just a naive example, but the point is this: not ONLY is 7 bits enough to encode 2 decimal digits: it's even enough to add a parity bit in 27% of input cases.
What you seem to be missing is that the post is specifically about conversion between floating points encoded in decimal and floating point encoded in binary. BCD are decimal digits encoded as binary, they're not binary-based floating point.
It is made clear from the very first sentence of the article that it is about representations that will round-trip between number bases, not whether or not it is possible to encode decimal floating point numbers with two digit accuracy in 7 bits.
You're reading something into the title that isn't there. The author says "7 bits aren't enough for 2-digit accuracy" and you're reading it as "7 bits aren't enough to represent 2-digit numbers." The post is about converting base 10 floating point numbers (with 2 digits accuracy) to base 2 floating point numbers and back, not representation of integers.
I don't think he's reading too much into the title. He equally well showed that it's possible to encode 2 base-10 digits of accuracy in base-2 fixed point numbers and back. Nowhere does the title mention floating point numbers, which is just one arbitrary (but popular) format among a number of possibilities to encode numbers with chosen accuracy.
fair enough; maybe it's too far outside my area of expertise. OTOH I never expect round-trip conversions to be perfect and couldn't care less whether there are small errors, and would use a different representation if I did.
Your comment is interesting, but I still downvoted you for irrelevant pedanticism.
A title isn't 'click-baity' just because it's shorter than you think it should be. I think the title was completely fine and the contents of the article were what the title led me to expect, without actually already fully knowing the issue.
Unless you exclusively deal with remainders divisible by powers of 2, any digits to the right of the decimal point should be seen as an approximation. Don't confuse the way it's displayed with the way it's considered - your code should keep its numbers in terms of the smallest unit you need integrity at. (In other words, calculate your billing in floating cents, not dollars.)
Why do you discriminate against the numbers on the left of the decimal point? They are an approximation too.
They are precise in fixed point representations, but floating point has a single bit as the integer part of the mantissa (and even this one can gather error on some sequences of operations).
I think you're getting something mixed up. Until you get to numbers so big that n+1 == n, the numbers on the left of the decimal point are represented exactly. They have the exact same precision as fixed point.
Ignore how the mantissa looks by itself. Once the exponent gets involved, everything that influences the left of the decimal point is an integer. That's why you use the same base for the mantissa and the exponent. The exponent losslessly shifts part of the mantissa into the integer range. Any particular floating point number is exactly equivalent to a fixed point number.
My point is that differentiating the integral and fractional parts of a floating point makes no sense. Some operations lose a constant amount of precision, others lose more precision the more different the numbers are. None of them act differently on the integral and fractional parts of the number.
Now, of course, there's a fixed point representation for each possible floating point number. But operations on fixed point numbers act differently on the integral and fractional parts of the numbers (in part because it's impossible to have a pair of fixed point numbers of the same type, but so different that n+1==n).
>But operations on fixed point numbers act differently on the integral and fractional parts of the numbers
Not really. Not in any way different from floating point. Can you explain this point? The efficient way of doing fixed point is to store it as a single scaled number, which is also how floating point stores the mantissa. You could split it into two numbers, but you could also split a mantissa into multiple numbers if you really wanted, it doesn't really change the math.
>(in part because it's impossible to have a pair of fixed point numbers of the same type, but so different that n+1==n).
I'd disagree there. You can have fixed point denominated in 100s as easily as you can have it denominated in .01s.
>Some operations lose a constant amount of precision, others lose more precision the more different the numbers are. None of them act differently on the integral and fractional parts of the number.
Precision is only 'lost' compared to other floating point operations. A long double won't ever have less precision than a 32.32 fixed point number, unless you're using numbers that wouldn't fit in 32.32 in the first place.
In no way do you have "a single bit" of useful precision. Anything that can corrupt the upper bits of a floating point number can corrupt the upper bits of a fixed point number.
Or just don't use floats at all. In many cases fixed-point makes more sense.
The only problem with that is that there's no support in most programming languages for either and no hardware-support for decimal floating points in common hardware.
Floating point numbers are a pretty generic concept. But there are infinite possibilities of fixed point ones, with completely different qualities. It's just not viable to implement them in hardware.
I especially said hardware for decimal floats and didn't include fixed point. You don't need special hardware for fixed point. Addition and subtraction are the same as if you would do it with integers, for multiplication and division you also need to shift additionally.
With floating point you also have endless possibilities how to define them. Next to the standard IEEE ones you for instance have 80 bit ones you had to use on intel CPUs. With fixed point arithmetic you don't really have standards but common ones like Q16.16
I wanted to go back to the source, or at least as far as I could go (I didn't look to see if any of Von Neumann's papers covered it :) ). I know D. Goldberg's appendix covers this (pages 218-219), for the binary to decimal direction, for single-precision floating-point. But I wanted a simple example as well, and I. Goldberg's paper had that.
It's quite interesting how much of our data is decimal floating point (any ascii representation like in css, html, json, javascript, svg, when we enter data somewhere), yet we convert it to binary floating point back and forth all the time.