What would be refreshing would be a C/C++ compiler that did away with the intermediate step of linking and built the whole program as a unit. LTO wouldn't even have to be a thing if the compiler could see the entire program in the first place. It would still have to save some build products so that incremental builds are possible, but not as object files: the compiler would need metadata recording the origin and dependencies of all the generated code so it could replace the right things.

External libs are most often linked dynamically these days, so they don't need to be built from source, and eliminating the linker doesn't pose a problem for non-open-source dependencies. And if that's not enough, letting the compiler also consume object files could cover legacy use cases or edge cases where you must statically link against a binary.






SQLite3 just concatenation everything together into one compilation unit. So, more people have been using this than probably know about it.

https://sqlite.org/amalgamation.html
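
For anyone who hasn't tried this, the technique is essentially a "unity build": every translation unit is pulled into one file, so the compiler sees the whole program at once. A minimal sketch (the file names here are made up, not SQLite's actual layout):

    /* whole_program.c -- unity-build sketch; parser.c, vm.c and main.c are
       hypothetical source files included as text, so the compiler compiles
       and optimizes the entire program as a single translation unit. */
    #include "parser.c"
    #include "vm.c"
    #include "main.c"

    /* build everything in one go, e.g.:  cc -O2 whole_program.c -o app */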


I totally see the point of this, but still, you have to admit this is pretty funny:

> Developers sometimes experience trouble debugging the quarter-million line amalgamation source file because some debuggers are only able to handle source code line numbers less than 32,768 [...] To circumvent this limitation, the amalgamation is also available in a split form, consisting of files "sqlite3-1.c", "sqlite3-2.c", and so forth, where each file is less than 32,768 lines in length


That would imply that such debuggers are storing line numbers as not just 16-bit numbers (which is probably sensible, considering that source files longer than that are uncommon), but as signed 16-bit numbers. I can't fathom a situation where line numbers would ever be negative.

Cue the C/C++ should-I-prefer-signed-or-unsigned-integers debate

It's not that uncommon of a convention to strictly use signed numbers unless doing bit manipulation, e.g. the Google C++ Style Guide.

Notably, unsigned integers have defined behavior on overflow (they wrap around). This means compilers can do less optimization on unsigned integers. For example, they can't assume that x + 1 > x for unsigned ints, but are free to assume it for plain signed ints.

That is just another reason to stick with signed ints unless there is a very specific behavior you rely on.
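
To make the difference concrete, here's a rough sketch (the function names are my own, nothing standard); with optimizations on, GCC and Clang will typically fold the signed version to a constant because signed overflow is undefined, while the unsigned version has to account for wrap-around:

    #include <limits.h>

    /* Signed: overflow is UB, so the compiler may assume x + 1 > x always
       holds and reduce this to `return 1;`. */
    int check_signed(int x) {
        return x + 1 > x;
    }

    /* Unsigned: overflow wraps modulo 2^N, so the comparison must be kept;
       check_unsigned(UINT_MAX) really does return 0. */
    int check_unsigned(unsigned x) {
        return x + 1 > x;
    }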


> For example, they can't assume that x + 1 > x for unsigned ints, but are free to assume it for plain signed ints.

No they ain't:

    julia> x = typemax(Int16)
    32767
    
    julia> x + Int16(1) > x
    false
Integers are integers, and can roll over regardless of whether or not they're signed. Avoiding rollover is not a reason to stick with signed integers; indeed, rollover is a very good reason to avoid using signed integers unless you're specifically prepared to handle unexpectedly-negative values.

It depends on the language. I linked a set of C++ guidelines, and for C++ they are correct: signed integer overflow is undefined behaviour. Some languages do define it, e.g. Rust, and even in C++ it might appear to work, but it is still undefined and should be strongly avoided.

That's what I'm saying, though: rollovers can happen regardless of whether the integer is signed or unsigned. x + 1 > x is never a safe assumption for integers of the same fixed width, no matter if they're i16 or u16. Whether it's specifically acknowledged as defined or undefined behavior doesn't really change that fundamental property of fixed-width integer addition.

(As an aside: I'm personally fond of languages that let you specify what to do if an integer arithmetic result doesn't fit. Zig, for example, has separate operators for rollover v. saturation v. panicking/UB, which is handy. Pretty sure C++ has equivalents in its standard library.)
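
In C you can get that kind of explicit control today with compiler builtins rather than dedicated operators (and I believe C++26 adds std::add_sat and friends for the saturating case). A rough sketch using the GCC/Clang builtin:

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        int x = INT_MAX;
        int wrapped;

        /* Checked add: reports overflow via the return value instead of
           invoking UB (GCC/Clang builtin). */
        if (__builtin_add_overflow(x, 1, &wrapped))
            puts("overflow detected");

        /* Saturating add, done by hand: clamp instead of wrapping. */
        int saturated = (x > INT_MAX - 1) ? INT_MAX : x + 1;
        printf("%d\n", saturated); /* prints 2147483647 */

        return 0;
    }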


Maybe somewhere some line offset is stored as an i16? (I don't understand why that would be, but...)

The __LINE__ macro defaults to "int". That then gets handed to the debugger.

The __LINE__ macro, like all other macros, is expanded during the preprocessing of the source code and is not handed to the debugger in any way.

Yes... And debuggers that implement line numbers generally work by taking that information from the preprocessing stage. And the #line directive and __LINE__ macro were implemented _for debuggers_ when originally created. They were made to be handed over to the debugger.

If you simply compile and run, the debugger won't have __LINE__, no. But it also won't have line numbers, at all. So you might have missed a bit of context to this discussion - how are line numbers implemented in a debugger that does so, without access to the source?


No, the debugger does not get involved in preprocessing. When you write "a = __LINE__;", it expands to "a = 10;" (or whatever number) and is compiled, and the debugger has no knowledge of it. Debugging information, including the mapping of positions in the code to positions in the source, is generated by the compiler and embedded directly into the generated binary or an external file, from which the debugger reads it.

The __LINE__ macro is passed to the debugger only if the program itself outputs its value, and the "debugger" is a human reading that output :)
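
A tiny example of the distinction (hypothetical file, nothing from the thread): after preprocessing there is no trace of __LINE__ left, only an integer literal, and the file/line mapping a debugger uses comes from the debug info the compiler emits (e.g. with -g), not from the macro:

    /* line_demo.c -- hypothetical example */
    #include <stdio.h>

    int main(void) {
        int a = __LINE__;  /* the preprocessor replaces this with the literal 5;
                              neither the compiler proper nor the debugger ever
                              sees the token __LINE__ */
        printf("%d\n", a);
        return 0;
    }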


*concatenates

Apologies for the typo. And now it is too late to edit the post.


[flagged]


> Secondly, if you think any compiler is meaningfully doing anything optimal ("whole program analysis") on a TU scale greater than say ~50kloc (ie ~10 files) relative to compiling individually you're dreaming.

That's wrong. GCC generates summaries of function properties and propagates them up and down the call tree, which for LTO is done in a distributed way. It does much more than mere inlining, including advanced analyses like points-to analysis.

https://gcc.gnu.org/onlinedocs/gccint/IPA.html https://gcc.gnu.org/onlinedocs/gccint/IPA-passes.html

It scales to millions of lines of code because it's partitioned.
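
As a concrete (made-up) illustration of what those IPA passes buy you across translation units:

    /* lib.c -- hypothetical */
    int answer(void) { return 42; }

    /* main.c -- hypothetical */
    int answer(void);
    int main(void) { return answer(); }

    /* Built with LTO:
     *   gcc -O2 -flto -c lib.c main.c
     *   gcc -O2 -flto lib.o main.o -o app
     * the link-time IPA passes see both TUs, and main() typically collapses
     * to "return 42" with the call inlined away -- something a plain non-LTO
     * build of the separate objects cannot do. */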


> if you think any compiler is meaningfully doing anything optimal ("whole program analysis") on a TU scale greater than say ~50kloc (ie ~10 files) relative to compiling individually you're dreaming.

You can build the Linux kernel with LTO: simply diff the LTO vs non-LTO outputs and it will be obvious you're wrong.


SQLite3 may be a counter-example:

https://sqlite.org/amalgamation.html



