Hacker News new | past | comments | ask | show | jobs | submit login
Should I Use Signed or Unsigned Ints? (Part 2) (robertelder.org)
61 points by robertelder on Sept 1, 2015 | hide | past | favorite | 44 comments



One quick side story. Years ago, when I worked in automobile systems programming, some coworkers had to bring up a CAN bus data extraction system quickly (less than a week) that worked with all of our programming tools. We used Java, C#, C, Python and C++ to capture, process and display the data to end users.

To save time (due to the short schedule), they decided to store all numbers as 64 bit doubles. They reasoned that 64 bit doubles were big enough to hold any number that might come off the system controllers (some where intel, others motorola) and would simplify the design so that they could write it quickly.

Some time later, we discovered an infrequent bug where occasionally, some of the CAN variables were correct, but at other times they failed sanity checks (way outside of the expected values for the specific sensor).

Long story short, the issue turned-out to be that 64 bit double assumption. Sometimes there was enough space to store a large 64 bit int (if the int was 52 bits or less (IIRC)), at other times the int was too large to fit. The sign (+/-) and exponent take up 12 bits or so.

The moral of the story is to use the right size (and sign) for each variable and don't assume that one size will fit all. We had to go back and correct all the bad values. That took awhile. In hindsight, they should have been given more time to design the system.


Floating point numbers are almost always a mistake. Classic howlers:

"Well, we can represent almost all the values we care about." Until the day you can't.

"We'll represent time with a double, with the fractional part being part of a day, and the day being an integer day number from when Somebody Famous Did Something." Until you realize that the 10AM meeting you want to put in your scheduler can't be exactly represented.

"We'll put our account balances in there as floats because unlike everyone else who has done this, we know what we're doing." But you don't. And that noise you heard was a million accountants crying out in pain.

You wouldn't represent a pointer as "some fraction of the amount of memory the machine has, from 0.0 to 0.9999", which is my usual retort when someone wants to smuggle a scalar or structural value in a float.


"Floating point numbers are almost always a mistake."

This. The problem is that it's almost impossible to internalize this without having first-hand experience - experience which usually takes years to accumulate. By coincidence, over the last month I had to explain the problems with floating point numbers to non-technical people on two disjoint occasions, and failed miserably each time, because the technical details are too complicated and the examples too obscure. The 'store account balance in double, let's put the remainder on our own account' parable is the only example that sort of works but then they look at you blank-faced and say 'we're not a bank'.

Just venting I guess. It's the simple things in mapping real-world things to bits that trip me up the most - why is it so hard to represent 'numbers' in computers? Why is it so hard to represent 'date' in computers? Why is it so hard to represent 'locations' in computers?


I find it interesting that people don't realize that decimal notation has exactly the same issues as floating point.

You can't store 1/10 of a £ in a binary floating point number exactly so don't use it for money.

Well, you can't store 1/3 of a £ in a decimal number exactly either.

The only difference is that people have got so used to decimal that we don't even notice the issue any more, and choose prices and have systems to work around the issue. It's so ingrained we don't even know we are doing it.

So don't think that decimals are any better at storing money amounts than binary floating point, they are not.

however all of the history and workarounds that we've been using for many years before computers for this issue also work if you store them in computers the same way.

That's why you use decimals, it's because they replicate the representation issues we are used to already so we don't even notice they are there, it's not that they are fundementally better in any way.


I think I disagree (depending on what you mean exactly). You say

"That's why you use decimals, it's because they replicate the representation issues we are used to already so we don't even notice they are there, it's not that they are fundamentally better in any way."

So decimals are better, because they allow us to represent money amounts in the exact same way we do in real life, which is what we're modeling in software.

1/10 = 0.1, an amount you want to be able to represent in a computer. 1/3 = 0.33..., an amount (when talking about money) you don't want to represent (when we're talking about maintaining balances).

Unless you're saying "decimal as a concept is not philosophically or morally better than floating point", which is a position to which I suspect pretty much everybody subscribes.


I think gp's point is that 0.33... is "an amount (when talking about money) you don't want to represent (when we're talking about maintaining balances)" precisely because you, and everybody else, is so used to decimal that it disappears.

There's nothing intrinsic about 1/10 that makes it "rounder" than 1/3, except as an artifact of the numbering system.

Aliens with three hands with three fingers each, whose societies got used to base 9, would think 1/3 looks much "rounder", and 1/10 is the one you'd never want to represent.


No. It's not that a gum costs 5 cents because it's the only fraction of a dollar that can be represented in cents, it's because pricing is inherently done in one-cent increments. If there was a need to price single items at a lower-than-a-cent cost, or at prices between one-cent increments, and if the need for that outweighed the disadvantages, there would be a way to do it.

Other example: an apple doesn't cost 1/3 of a dollar, it costs 33 cents. The need for pricing things as a fraction is simply not there. I doubt most people would see 20 cents as '1/5 of a dollar' in the first place.

This might be a difference between people thinking in metric and thinking in imperial, too. (I'm metric, fwiw).

(before someone says 'I buy resistors that cost less than a cent', please show me a place that sells them individually)


Decimal floating point is almost never used though. What's used is decimal fixed point, which is simple to understand and doesn't blow up in bizarre ways when moving up to larger numbers (or mix numbers with different magnitudes).


My understanding was that they used doubles as int52's, and the troubles they had came when that int52 overflowed and corrupted the exponent of the double.

As opposed to the usual double problems.


A similar story about OCaml's garbage collector from https://twitter.com/mcclure111/status/637758978093547521:

  So the GC stats package in #OCaml has a record of how many words have been
  allocated since startup and it's a float

  it has been explained to me OCaml GC recording bytes in floats is because
  float can store up to 2^52 integer but int only goes up to 2^30.


OCaml uses a tag bit to distinguish ints from pointers, so (unboxed) ints are only 31 bits:

https://realworldocaml.org/v1/en/html/memory-representation-...

Floating point numbers are normally boxed (except for float arrays, float-only records, and some cases of local use).


To save time (due to the short schedule), they decided to store all numbers as 64 bit doubles.

JavaScript influence?


On a team that writes a CAN bus interface? God I hope not...


Lucky for the drivers, it is not for the code that _goes_ in the car.


If the Toyota debacle is any indication, the code that goes in the car is not pretty either.


Could be Lua, too.


It really shouldn't be this false dichotomy. We could have richer specifications for the bounds of numbers and the rules used to convert their formats. It's some kind of failure of imagination to not have that when arithmetic bugs traceable back to "implementation does not match representation" are so common.


Ada got this right. For example, you can do

  type Day_Of_Month is range 1 .. 31;


What happens when you try to pass an invalid number ? An exception is thrown ?


Yes.


Making this efficient at the machine level gets hard. Hardware engineers really want to do 32 or 64 bit add/subtract/etc. datapaths, and handling range types in pipelines is hard, so it's up to software to do anything more complicated. Suddenly you're talking half a dozen instructions to do an ADD and programmers no longer want to use your language.

Efficiency is a convenient god when you're not willing to pay for being correct.


One of the most annoying problems in C and C++ (and Objective-C/C++). Most of the time we deal with quantities, sizes, distances, coordinates, identifiers. Except for the latter, all others can end up in the negative space even when we think they shouldn't. Having said that, identifiers and enumerated codes should also live in the signed realm as they can get converted to other types, again, in ways we can't always predict (and result in bugs).


I wish C++ had a better way of representing quantities, something like Pascal's or Ada's ranged integer types. I started to write a C++ template class to represent strongly-typed ints. The class grew to 100+ LOC to handle all the expect operator overloads. Then I gave up. :) C++11 gave us strongly-typed enums and the means to specify the enum's underlying size (like enum class Color : unsigned short). Why can't they give us strongly-typed ints with something like int class Fahrenheit?


One thing about two's complement representation is the representation is asymmetric, you have one more negative number, i.e. 0x8000 for a 16 bit signed number is negative (unsigned) 0x8000. So a function like absolute value if it returns a signed integer cannot handle all possible input values. I think if you do any numeric processing using fixed point arithmetic you simply need to know the range is asymmetric for two's complement rather than trying to pick a scheme like always using signed or unsigned.


There is no "two's complement" in C. The representation of signed integers is implementation-defined. This is one reason why C has undefined behavior.

Imho it does not make sense anymore, since practically all hardware uses two's complement these days. If the standard would declare two's complement for signed integers, then shift-left would always have defined behavior, for example.


> In my case, I was designing a spec for a CPU that was intended to be as minimal as possible. The CPU would physically implement signed or unsigned atomic math operations, not both. The compiler would support the missing one (signed or unsigned) using emulation in terms of the one that was physically implemented.

I find that part very strange. Here are a couple of observations:

* Unsigned wrapping binary addition and two's complement addition are identical as long as you don't try to report carry or overflow.

* Unsigned wrapping binary subtraction and two's complement subtraction are identical as long as you don't try to report borrow or underflow. If the goal is absolute simplicity, subtraction isn't needed at all because both cases are efficiently handled by two's complement negation (which is the same thing as calculating 0-x in an unsigned wrapping sense) followed by addition.

* Unsigned wrapping binary non-widening multiplication (the usual case) and two's complement multiplication are identical. This follows from the distributive law: an n-bit two's complement negative number -x is represented as 2^n - x. If you multiply -x (a two's complement number interpreted as unsigned) by y, you end up with (2^n - x) * y, which is y2^n - xy. The first term doesn't affect the result, because its low bits are all zero.

Widening multiplication is extremely useful, but implementing the unsigned version in hardware should make it fairly simple to emulate the signed version. x86 implements both.

Division is nasty, especially because number theorists and C programmers disagree on what it means when negative numbers are involved. Really simple CPUs don't implement division at all, though.

Finally, CPU vendors who actually implement undefined behavior are, in my opinion, nuts. Give the CPU a nice clean spec that fully defines what happens and stick to it.


I found this odd too. I'd add that comparison and right shift also differ, but signed right shift does have obvious semantics: sign extension. I'd also argue that signed division has obvious semantics: round towards negative infinity, which matches the semantics of right shift with sign extension. I'm saddened that none of C/C++/x86 have native support for this kind of division, it's _always_ what I want from signed division.


>CPU vendors who actually implement undefined behavior are, in my opinion, nuts. Give the CPU a nice clean spec that fully defines what happens and stick to it.

So, basically none of the IEEE floats engines in AMD, Intel, ARM, etc....


This is why two's complement became dominant. It simplifies CPU design.


Too many problems with unsigned. Stick with signed.

https://google-styleguide.googlecode.com/svn/trunk/cppguide....


The argument 'google says so' is addressed by the author in part [1]: Overall their argument for avoiding unsigned integers and sample code seems surprisingly weak for a company of the caliber of Google. The most important reason to use unsigned instead of signed is not self-documentation, it is to avoid undefined behaviour.

[1] http://blog.robertelder.org/signed-or-unsigned/


The book "Expert C Programming" also says stick with signed, use unsigned for bitfields or binary masks. This post says unsigned for bitwise or modulo, so basically unsigned when expecting the value to be bits in a machine instead of a number.


That's funny since signed over/underflow isn't defined, while unsigned is.


Just because it's defined didn't mean it's expected. Unsigned overflow can happen with seemingly innocent subtraction of small values.


>You should assume that an int is at least 32 bits, but don't assume that it has more than 32 bits.

I thought the spec[0][1] only guarantees int to be at least 16 bits? Am I missing something here?

[0] http://www.cplusplus.com/doc/tutorial/variables/

[1] http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf


True, but in practice I think the only place you'd find a 16-bit int today is with Arduino, but that is only because it is based on an archaic architecture.

All ARM and Intel chips have 32-bit ints.


Or rather, too many problems with C/C++'s treatment of unsigned.


Honest question: Would it be possible to create a "friendly" version of signed that is actually unsigned underneath? After all, it's just bit patterns.


On modern computers, both unsigned/signed ints are just bit patterns and it's only some operations that differ. add/sub/mul/left-shift are the same, but cmp/div/right-shift differ. Look up twos-complement representation for more information.


Left shift also differs. Raw shifting left can have the affect of turning a negative number positive by setting the sign bit to 0, so there's a sign maintaining left shift as well. Also, Left/Right shift of negative numbers have the opposite effect as they do with positive, with left shifting a negative number dividing and right shifting multiplying.


You are mistaken, left shift is the same for both signed and unsigned and any instance of a left shift turning a signed int into a positive number is just normal underflow.


fwrapv?


if you ever have to implement a softfloat on this desert island, you're going to want unsigned ints.


Use whichever works best for a particular variable. If you don't know, you haven't thought about it.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: