Hacker News
GCC's new fortification level: The gains and costs (redhat.com)
129 points by pjmlp on Sept 18, 2022 | hide | past | favorite | 40 comments


"Strictly speaking, the C standards prohibit using a pointer to an object after its lifetime ends. It should neither be read nor dereferenced. In this context, it is a bug in the application. However, this idiom is commonly used by developers to prevent making redundant copies."

I thought that hack was dead and buried. It won't work in debug modes where buffers are zeroed in "free".

Fear of copying time is usually misplaced. The PDP-11 is gone. Unless you're copying megabytes, the copy time of recently accessed in-cache data is very small.


This hack of using freed memory is not what (at least as far as I can tell) the issue was about. The code calls realloc and then compares the old pointer value to the new pointer value to avoid recalculating some offsets, and somehow gcc decides that this is undefined behavior and refuses to equate them; click the "Autogen" link in the article to go to the issue where they are discussing what happened. Like, it is "a bug in the application", but it only seems to be a bug if you have extremely detailed knowledge of undefined pointer behavior, not merely because someone was making egregious implementation assumptions about the allocator.


I'm reminded of https://www.foonathan.net/2022/08/malloc-interface/#problem-.... While initially I was skeptical of the author's criticism of C's allocation APIs, this bug seems to be one that could've been avoided by using the proposed try_expand() instead of realloc().

Of course ideally we'd move away from provenance-based UB and Rust's third-class aliased mutability, to simpler conservative aliasing semantics by default, and allow programmers to opt into non-general optimizations by manually saving unchanging values like vector::size() into locals, or have the optimizer and programmer interactively explore code invariants and optimizations, making the optimizer a performance-focused pair programmer rather than a black box.


What a stupid "smart" compiler.


I think it's an optimization that compiler writers insist on and developers absolutely do not want: when the compiler detects UB, any code that depends on it can be 'safely' deleted.

It's the sort of arrogant attitude that's driving people away from C/C++.


What is driving people away from C and C++ are the CVEs that have stayed the same for the last 40 years, despite everyone repeating that one only needs to be a good enough developer and use the tools to avoid them.

The abuse of UB is how those 1980s C and C++ compilers, whose lousy generated code was easily outperformed by Assembly coders on 8 and 16 bit home computers, finally improved their code generation quality despite the lack of strong type information available to the compiler.

So here we are, trying to escape bad decisions from the past.


There are two target groups for a C/C++ compiler:

1) embedded or OS hackers: they want code generation to be predictable and dislike surprising optimizations. To them, undefined behavior is behavior outside the standard that is specific to certain compilers and that they expect not to change suddenly.

2) application and especially HPC developers. They want the compiler to exploit every trick in the literature to improve performance. In return, they are aware that undefined behavior cannot be relied on.

Any compiler that is popular with both crowds would have to strike a compromise.


This is a misunderstanding of how compilers work. Compilers don't actively go out of their way to erase code they think should be skipped; rather, as a consequence of multiple optimizations, code may end up being compiled differently than intended.

    int Foo(Bar *bar) {
      if (bar == nullptr) {
        println("Bar is null!");
      }
      return bar->ComputeFoo();
    }

E.g., one pass may say "the println code is unreachable unless a null pointer is dereferenced", and the next may say "the only code that is reachable is `return bar->ComputeFoo()`", so just compile the function as that.

Imagine instead of a println the code is actually some large body of code, and that the function is inlined into another where bar is null. In that case you'd want the compiler to avoid compiling the code, but it can't without those passes.


Compilers for languages like Ada or Eiffel will usually disregard optimization algorithms that might go into "this might crash your airplane" kind of territory, whereas in C and C++ land whatever, as long as we get a few more μs, who cares.


Indeed, the mindset is the key issue. In JVM land, where I learned most of my compiler ideas, the assumption is "optimization is not observable". JVMs move heaven and earth to make it appear as if they were simple interpreters just running the bytecodes one by one. The C/C++ world is the only one I can think of where this is manifestly not the case: optimizations are routinely observable and an endless source of confusion and frustration for developers. In the presence of even a single bug in the application, suddenly UB results in the veritable gates of hell opening and the entire lower world of the guts of abstractions comes spilling out. Woe to those who would try to debug at this level, for they fight demons of every form.


It's absolutely an optimisation that people trying to achieve maximum performance from C/C++ expect. It's impossible to have the same degree of performance without such optimisations, hence why the new language Zig has even more undefined behaviour than C.


It seems that the semantics of undefined behavior are elusive to many people in the industry, so allow me to clarify some misconceptions.

1. Zig does not have more undefined behavior than C. Much like there are two kinds of complexity, accidental and essential, C has two kinds of undefined behavior: accidental and essential. Accidental UB is stupid shit like "if your file doesn't end with a newline, UB occurs". Essential UB is things like: if the memory of a local variable is changed via /proc/mem by another process while a function is evaluated, UB occurs. Essential UB allows the basic, essential optimisations that everyone expects every language & compiler to be able to perform. C has a large amount of accidental UB; Zig has none.

2. A Zig application decides what to do when a safety check triggers by overriding the panic handler. Zig's default panic handler crashes with a helpful stack trace. This is a killer feature.

3. I see a lot of people talking ignorantly about safety critical applications. Let's talk about Level A Clearance. This is software that is licensed to run on airplanes and other safety critical components in the United States. Here's how it works: you have to test every error condition and every branch at the machine code layer. This makes a simpler language such as C or Zig better suited than a language with hidden control flow such as C++ or Rust, because hidden control flow causes problems for testing every branch at the machine code level. Furthermore, such components are redundant, so that when one fails, the readings of the others are used. So, crashing or otherwise indicating a faulty reading is absolutely what you want safety-critical software to do, as opposed to giving a well-defined, incorrect reading due to, for example, an integer overflow, which can happen in "safe" Rust.


From the Zig language reference: Zig has many instances of undefined behavior. If undefined behavior is detected at compile-time, Zig emits a compile error and refuses to continue. Most undefined behavior that cannot be detected at compile-time can be detected at runtime. In these cases, Zig has safety checks. [...] When a safety check fails, Zig crashes with a stack trace


> Zig crashes with a stack trace

I hope the people behind Zig understand that in critical applications that's totally unacceptable.


What exactly is the language supposed to do then? Let's say that you caused an integer overflow, or tried to perform an out of bounds access in a dynamic array. What's the program supposed to do if not crash?

If your point is that some software shouldn't crash, then yes, for sure. But that's on you to not make programming errors in your code.

In fact Zig does help you create software that doesn't crash like not many other programming languages do, for example by not having language features that rely on implicit memory allocations. This gives you the opportunity to always have a fallback strategy if a memory allocation fails.


You can use memcmp to check whether the pointer has not changed.

   void *oldptr = malloc(73);
   void *newptr = realloc(oldptr, 42);

   if (memcmp(&oldptr, &newptr, sizeof oldptr) == 0) {
     // not changed
   }
The value of oldptr is indeterminate if it is used as a pointer; it can still be accessed as an array of bytes.


Is a cast to uintptr_t also OK ?

  void *oldptr = malloc(73);
  uintptr_t savedptr = (uintptr_t)oldptr;
  void *newptr = realloc(oldptr, 42);
  if (savedptr == (uintptr_t)newptr) {
    // not changed
  }


That is OK, at least in the sense that there's no undefined behaviour.

I'm not sure it's also OK, because the memcmp suggestion you are replying to seems a bit suspect.

There's still the problem that comparing the uintptr_t's is not guaranteed to yield the same result as the pointers they were cast from. But that's merely implementation-specific behaviour, not undefined.


I think it's fine on the assumption that a uintptr_t converted from a pointer cannot be a trap representation, which is the case on mainstream platforms.

(An unsigned type can only have trap patterns if it has padding bits. Every combination of the value bits is a valid value according to the pure binary encoding.)


AFAIU the problem was that the application in question kept using oldptr, e.g.:

   int *oldptr = (int *)malloc(42);
   int *newptr = (int *)realloc(oldptr, 73);

   if (oldptr == newptr) {
     newptr[70] = 10; // ok because the "object" newptr can contain 73 ints
     oldptr[70] = 10; // SIGABRT here because the "object" oldptr can only contain 42 ints even though the memory block oldptr points to can hold 73
   }


Wrong. You need to save away the old pointer and pass it as realloc's first argument. memcmp is also wrong; compare the values, so that the compiler knows about it.

    void *ptr = malloc(73);

    void *old = ptr;
    void *ptr = realloc(ptr, 42);
    bool realloced = old != ptr;


I don't quite see what you're getting at, but I do know you can't declare ptr twice in the same scope in C; did you miss a curly brace somewhere to open a new scope, or else want to use different names?

Not comparing the values is the point. Your code uses the pointer that was passed to realloc, and that is undefined behavior according to ISO C.

The game you're playing with the identifiers is pointless; there is no difference between what you're trying to do and just:

   void *oldptr = malloc(73);
   void *newptr = realloc(oldptr, 42);
   bool realloced = oldptr != newptr; // undefined behavior
That's what the article is referring to, and that I'm specifically addressing with the memcmp. Accessing the pointer as an array of bytes doesn't use its value as a pointer. Bytes cannot be indeterminate; they are not allowed to have trap representations.

(There could be a false negative: the address didn't change, but the pointer bit pattern did. On 64 bit systems, the C library could easily put a tag into the upper bits of the pointer, and have realloc change the tag even if the address is the same.)

The problem described in the article is that the compiler generated a false negative even when the pointer didn't change, due to the undefined behavior. The idea is something like that since oldptr was passed to realloc, it is garbage. The newptr is good, and we need not compare garbage to non-garbage; we can just declare them to be unequal.

That's what you might get if you follow your advice of "compare the values, so that the compiler knows about it".


Very cool

From the previous article linked at the footer of this one:

- https://developers.redhat.com/blog/2021/04/16/broadening-com...

It says that this is available in LLVM

  GCC support for __builtin_dynamic_object_size or equivalent functionality is in progress. At the moment this is available only when building applications with LLVM. There are some unspecified corner cases with __builtin_dynamic_object_size that may result in avoidable performance overheads. We hope to iron those out with the GCC implementation and feed it back into LLVM, thus making both implementations consistent and performant.
Does this mean I can set this flag in Clang and it'll work?


Yes.


Which Linux distributions are planning to use this new fortification level, and in what circumstances?

I can imagine enabling this in programs that directly interact with the internet, e.g., web browsers, email clients, and resolver libraries. I can imagine Debian, Ubuntu, and Red Hat enabling it. It would especially make sense for Tails, at least in some cases. But that doesn't mean it will happen. I'd love to hear more.


OpenSUSE already enables it across the distribution; they were the first to do it. I believe Gentoo either already has or is in the process of doing it. I'll make a proposal for Fedora once I have a better idea of the code size and performance impact.


> I believe Gentoo either already has or is in the process of doing it.

There's a bug with the patch to enable it, tracking issues in other packages, so they seem to be in the process.


Android usually takes advantage of these kinds of fortification levels (though the Clang equivalents), SELinux is also enabled by default, syscalls are whitelisted via seccomp, hardware memory tagging is now done when available, and for poor folks like Termux, using anything not declared as a public NDK API might kill the process.


This link shows a blank page with red text “Sorry, you need to enable JavaScript to visit this website.” But when I scroll on mobile, a little extra space appears due to the url bar scrolling out of view, and in that space, I can see a glimpse of the full article itself rendering just fine, only it's unusable because of this anti-disable-Javascript overlay blocking the view :(


Looks fine in a text-only browser.

For folks who prefer reading text using large, complex, graphical browsers released by organisations that seek to profit directly or indirectly from the proliferation of online advertising^1

   curl https://developers.redhat.com/articles/2022/09/17/gccs-new-fortification-level|sed -n '/./{/article-content/,/<\/div>/{/<\/div>/,$d;p;};}' > 1.html
   firefox ./1.html
1. Apple, Microsoft, Mozilla, Google, Brave, etc.


From https://www.economist.com/business/2022/09/18/the-300bn-goog... (currently on HN front page)

"The most surprising new adman is Apple. The iPhone-maker used to rail against intrusive digital advertising. Now it sells many ads of its own. As sales of smartphones plateau, the company is looking for new ways to monetise the 1.8bn devices, from smartphones to smart earphones, it already has in circulation. So far it is only dabbling in ads and does not report sales figures. But Bloomberg reported recently that Apple's ad business was already generating sales of $4bn a year, making it about as big an ad platform as Twitter. Apple executives believe there is much more to be had."


Even with javascript enabled it looks awful on mobile.


Unfortunately not much in the way of performance measurements :(


It's on my TODO list. Watch out for Fedora change proposals for (hopefully) Fedora 38.


What if you develop inside of a Fedora 38 Docker container?

  FROM quay.io/fedora/fedora:38
  RUN dnf install -y <bunch of tools>
Got any useful tips or flags to enable that're bleeding edge?


It's not a runtime flag; you'll have to patch redhat-rpm-config to use _FORTIFY_SOURCE=3 instead of 2 and then build packages with it.

Of course if you only want to build your application with _FORTIFY_SOURCE=3, you can do it right away even on Fedora 36. The Fedora change will be to build the distribution (or at least a subset of packages) with _FORTIFY_SOURCE=3.


What would be the impact of making _FORTIFY_SOURCE=3 a Fedora-specific Make default (which doesn't apply to CMake or Ninja because those are different defaults, which are for GCC not LLVM/Clang anyway)?


We had bounds checking GCC twenty five years ago.

https://stuff.mit.edu/afs/sipb/project/bounds/src/gcc-2.7.2/...

I played with that; it worked.

In the 90's I reached for Bruce Perens' Electric Fence; that did a good job for me.

We've had Valgrind for some two decades now?

Call me unexcited ...


And we have had lint since 1979.

What we're missing is people actually using these tools in any meaningful way.

Instead we have to go for hardware memory tagging and sandboxing, because no amount of advocacy changes their ways.


To be fair, the overhead of bounds checking GCC was rather large. Valgrind is better, and there's no excuse for not using it in your test suite, but I doubt Valgrind could realistically be used in production. -D_FORTIFY_SOURCE (=2) is used in production everywhere, assuming you're running some Red Hat or SUSE derived distro.



