20 years ago, making a C compiler that provided sane behaviour and better guaran...

uecker · 2024-12-11T17:20:27 1733937627

This is certainly not true. Many programmers also learned to the use tools available to write reasonably safe code in C. I do not personally find this problematic.

lmm · 2024-12-12T01:15:55 1733966155

> Many programmers also learned to the use tools available to write reasonably safe code in C.

And then someone compiled their code with a new compiler and got a security bug. This happens consistently. Every C programmer thinks their code is reasonably safe until someone finds a security bug in it. Many still think so afterwards.

uecker · 2024-12-12T02:21:11 1733970071

There are couple of cases where compiler optimizations caused security issues, but that this happens all the time is a huge exaggeration. And many of the practically relevant cases can be avoided by using tools such as UBSan. The actual practical issue in C is people getting their pointer arithmetic wrong, which can also be avoided by having safe abstractions for buffer and string handling.

The other fallacy is that these issue then suddenly would disappear when using Rust, which is also not the case. Because the programmer cutting corners in C or prioritizing performance over safety will also use Rust "unsafe" carelessly.

Rust has a clear advantage for temporal memory safety. But it is also possible to have a clear strategy about what data structure owns what other object in C.

lmm · 2024-12-12T04:44:41 1733978681

> And many of the practically relevant cases can be avoided by using tools such as UBSan.

"can be", but aren't.

> The other fallacy is that these issue then suddenly would disappear when using Rust, which is also not the case. Because the programmer cutting corners in C or prioritizing performance over safety will also use Rust "unsafe" carelessly.

The vast majority of these programmers aren't making a deliberate choice at all though. They pick C because they heard it's fast, they write it in the way that the language nudges them towards, or the way that they see done in libraries and examples, and they end up with unsafe code. Sure, someone can deliberately choose unsafe in Rust, but defaults matter.

> it is also possible to have a clear strategy about what data structure owns what other object in C.

Is it though? How can one distinguish a codebase that does from a codebase that doesn't? Other than the expensive static analysis tool mentioned elsewhere in the thread (at which point you're not really writing "C"), I've never seen a way that worked and was distinguishable from the ways that don't work.

uecker · 2024-12-12T18:58:13 1734029893

> > And many of the practically relevant cases can be avoided by using tools such as UBSan.

> "can be", but aren't.

It is a possible option when one needs improved safety, and IMHO often the better option than using Rust.

> > The other fallacy is that these issue then suddenly would disappear when using Rust, which is also not the case. Because the programmer cutting corners in C or prioritizing performance over safety will also use Rust "unsafe" carelessly.

> The vast majority of these programmers aren't making a deliberate choice at all > though. They pick C because they heard it's fast, they write it in the way that the > language nudges them towards, or the way that they see done in libraries and examples, > and they end up with unsafe code. Sure, someone can deliberately choose unsafe in > Rust, but defaults matter.

The choice of handcoding some low-level string manipulation is similar to the choice of using unsafe rust. One can do it or not. There is certainly a better security culture in Rust at this time, but it is unclear to what extend this will be true in the long run. Also C security culture improves too and Rust culture will certainly deteriorate when usage spreads from highly motivated early adopters to the masses.

> > it is also possible to have a clear strategy about what data structure owns what other object in C.

> Is it though? How can one distinguish a codebase that does from a > codebase that doesn't?

This leads to the argument that it is trivial to see unsafe code in Rust because it is marked "unsafe" and just a small amount of code while in C you would need to look at everything. But this largely a theoretical argument: In practice you need to do some quality control for all code anyway, because memory safety is just a small piece of overall the puzzle. (and even for memory safety, you also need to look at the code surrounding code in RUst.) In practice, it is not hard to recognize the C code which is dangerous, it is the one where pointer arithmetic and string manipulation is not encapsulated in safe interfaces and it is the code where ownership of pointers is not clear.

>Other than the expensive static analysis tool mentioned elsewhere in the thread (at which point you're not really writing "C"), I've never seen a way that worked and was distinguishable from the ways that don't work.

I see some very high quality C code with barely any memory safety problems. Expensive static analysis can be used when no mistakes are acceptable, but then you should also formally verify the unsafe code in Rust.

lmm · 2024-12-13T03:57:28 1734062248

> The choice of handcoding some low-level string manipulation is similar to the choice of using unsafe rust. One can do it or not.

But most of the time programmers don't make a conscious choice at all. So opt-out unsafety versus opt-in unsafety is a huge difference.

> In practice you need to do some quality control for all code anyway, because memory safety is just a small piece of overall the puzzle.

Memory safety is literally more than half of real-world security issues.

> In practice, it is not hard to recognize the C code which is dangerous

> I see some very high quality C code with barely any memory safety problems

I hear a lot of C people saying this sort of thing, but they never make it concrete - there's no list of which popular open-source libraries are dangerous and which are not, it's only after a vulnerability is discovered that we hear "oh, that project always had poor quality code". If I pick a random library to maybe use in my project (even big-name ones e.g. libpq or libtiff), no-one can ever actually answer whether that's high quality C code or low quality C code, or give me a simple algorithm that I can actually apply without having to read a load of code and make a subjective judgement. Whereas I don't have to read or judge anything or even properly know rust to do "how much of this rust code is unsafe".

uecker · 2024-12-14T21:23:03 1734211383

> > The choice of handcoding some low-level string manipulation is similar to the choice of using unsafe rust. One can do it or not.

> But most of the time programmers don't make a conscious choice at all. So opt-out unsafety versus opt-in unsafety is a huge difference.

I don't think so. A programmer being careless will be careless with Rust "unsafe" too.

Don't get me wrong, I think marking code without guaranteed memory safety is a good idea. I just don't think it is a fundamental game changer.

> > In practice you need to do some quality control for all code anyway, because memory safety is just a small piece of overall the puzzle.

> Memory safety is literally more than half of real-world security issues.

https://www.horizon3.ai/attack-research/attack-blogs/analysi...

But I think even this is likely overstating it by looking at CVEs and not real world impact.

> > > In practice, it is not hard to recognize the C code which is dangerous

> > I see some very high quality C code with barely any memory safety problems

> I hear a lot of C people saying this sort of thing, but they never make it > concrete - there's no list of which popular open-source libraries are dangerous > and which are not, it's only after a vulnerability is discovered that we hear > "oh, that project always had poor quality code". If I pick a random library > to maybe use in my project (even big-name ones e.g. libpq or libtiff), no-one > can ever actually answer whether that's high quality C code or low quality C code > or give me a simple algorithm that I can actually apply without having to read > a load of code and make a subjective judgement. Whereas I don't have to read or > judge anything or even properly know rust to do "how much of this rust code is unsafe".

So you look at all the 300 unmaintained dependencies a typical Rust projects pulls in via cargo and look at all the "unsafe" blocks to screen it? Seriously, the issue is lack of open-source man power and this will hit Rust very hard once the ecosystem gets larger and this goes even more beyond the highly motivated first adopters. I would be more tempted to buy this argument if Rust would have no "unsafe" and I could pull in arbitrary code from anywhere and be safe. And this idea existed before with managed languages... Safe Java in the browser and so. Also sounded plausible but was similarly highly exaggerated as the Rust story.

lmm · 2024-12-15T01:01:36 1734224496

> A programmer being careless will be careless with Rust "unsafe" too.

Programmers will be careless, sure, but you can't really use unsafe without going out of your way to. Like, no-one is going to write "unsafe { *arr.get_unchecked(index) }" instead of "arr[index]" when they're not thinking about it.

> So you look at all the 300 unmaintained dependencies a typical Rust projects pulls in via cargo and look at all the "unsafe" blocks to screen it?

No, of course not, I run "cargo geiger" and let the computer do it.

I think unmaintained dependencies are less likely, and easier to check, in the Rust world. Ultimately what defines the attack surface is the number of lines of code, not how they're packaged, and C's approach tends to lead to linking in giant do-everything frameworks (e.g. people will link to GLib or APR when they just wanted some string manipulation functions or a hash table, which means you then have to audit the whole framework to audit that program's dependencies. And while the framework might look well-maintained, that doesn't mean that the part your program is using is), reimplementing or copy-pasting common functions because they're not worth adding a dependency for (which is higher risk, and means that well-known bugs can keep reappearing, because there's no central place to fix it once and for all), or both. And C's limited dependency management means that people often resort to vendoring, so even if your dependency is being maintained, those bugfixes may not be making their way into your program.

> And this idea existed before with managed languages... Safe Java in the browser and so. Also sounded plausible but was similarly highly exaggerated as the Rust story.

Java has quietly worked. It didn't succeed in the browser or on the open-source or consumer-facing desktop for reasons that had nothing to do with safety (in some cases they had to do with the perception of safety), but backend processing or corporate internal apps are a lot safer than they used to be, without really having to change much.

quotemstr · 2024-12-11T18:29:12 1733941752

> safe code in C

You're like a Japanese holdout in the 60s refusing to leave his bunker long after the war is over.

C lost. Memory safety is a huge boon for security. Human beings, even the best of them, cannot consistently write correct C code. (Look at OpenBSD.) You can keep fighting the war your side has already lost or you can move on.

uecker · 2024-12-11T19:17:15 1733944635

Well, memory safety is great but it seems Rust programmers also manage to create memory safety issues just fine:

https://rustsec.org/advisories/RUSTSEC-2024-0401.html https://rustsec.org/advisories/RUSTSEC-2024-0400.html https://rustsec.org/advisories/RUSTSEC-2024-0377.html https://rustsec.org/advisories/RUSTSEC-2024-0374.html etc.

whytevuhuni · 2024-12-11T20:05:51 1733947551

I think the first one, stack overflow, is technically not a memory safety issue, just denial-of-service on resource exhaustion. Stack overflow is well defined as far as I know.

The other three are definitely memory safety issues.

ryao · 2024-12-11T21:25:24 1733952324

I would consider a stack overflow to be a memory safety issue. The C++ language authors likely would too. C++ famously refused to support variable length stack allocated arrays because of memory safety concerns. In specific, they were worried that code at runtime would make an array so big so big that it would jump the OS guard page, allowing access to unallocated memory that of course is not noticed ahead of time during development. This is probably easy to do unintentionally if you have more stack variables after an enormous stack allocated array and touch them before you touch the array. The alternative is to force developers to use compiler extensions such as alloca(). That makes it easy to pass pointers outside of the stack frame where they are valid and is a definite safety issue. The C++ nitpicking over variable length arrays is silly since it gives us a status quo where C++ developers use alloca() anyway, but it shows that stack overflows are considered a memory safety issue.

whytevuhuni · 2024-12-11T21:55:05 1733954105

In the general case, I think you might be right, although it's a bit mitigated by the fact that Rust does not have support for variable length arrays, alloca, or anything that uses them, in the standard library. As you said though, it's certainly possible.

I was more referring to that specific linked advisory, which is unlikely to use either VLAs or alloca. In that case, where stack overflow would be caused by recursion, a guard frame will always be enough to catch it, and will result in a safe abort [0].

[0] https://github.com/rust-lang/rust/pull/31333

ryao · 2024-12-12T01:30:38 1733967038

I cited the complaints against VLAs as support for stack overflows being a memory safety issue. I did not mean to imply that Rust supported them.

quotemstr · 2024-12-11T20:06:54 1733947614

C++ is a better unsafe language than unsafe Rust, IMHO. The thing about the social dynamic of Rust, though, is that it keeps unsafe code to a minimum.

uecker · 2024-12-12T02:40:55 1733971255

This may be true, but the minimum unsafe code still seems not that small. Maybe I just had bad luck, but one of the first things I looked at more closely was an implementation of a matrix transpose in Rust (as an example of something relevant to my problem domain) and that directly used unsafe Rust to be reasonably fast and then already had a CVE. This was a revealing experience because was just the same type of bug you might have had in similar C code, but in a language where countless people insist that this "can not happen".

uecker · 2024-12-12T02:04:49 1733969089

I agree that one shouldn't have been included. My favorite ones aren't included here anyway, e.g. how a Rust programmer managed to create a safety issue in a matrix transpose or how the messed up str::repeat in their standard library.

And don't get me wrong. I think Rust is as safer language in C. Just the idea that C is completely unsafe and it is impossible even for experts to write reasonable safe code while it is completely impossible in Rust to create an issue is just a lot of nonsense. In reality, it is possible to screw up in both languages and people do this, and reality is that safety in Rust is only somewhat better when compared to C with good security practices. But this is not how it is presented. I also think the difference will become even smaller when C safety continues to improve as it did in the last years due to better tooling while Rust is being picked up by average programmers under time pressure who will use "unsafe" just as carelessly as they carelessly hand-roll pointer arithmetic in C today.

ryao · 2024-12-11T21:02:58 1733950978

Use a sound static analyzer like astree and you can produce memory safe C code:

https://www.absint.com/astree/index.htm

Note that the key word here is sound. The more common static analyzers are unsound tools that will miss cases. Sound tools do not, but few people know of them, they are rare and they are typically proprietary and expensive.

quotemstr · 2024-12-11T22:40:30 1733956830

Sure. I'm also a big fan of what Microsoft has done with SAL. And of course you have formally proven C, as used in seL4. I'd say that the contortions you have to go through to write code with these systems takes you out of the domain of "C" and into a domain of a different, safer language merely resembling C. Such a language might be a fine tool! But it's not arbitrary C.

uecker · 2024-12-12T03:08:45 1733972925

Note that my original comment above was "reasonably safe" and not "perfectly memory safe". You can formally prove something with a lot of effort, but you can also come reasonably close for practical purposes with a lot less effort and more commonly available tools.

You are right that "arbitrary C" is not safe while safe Rust is safe, but this is mostly begging the question. The question is what can you do with the language depending on your requirements. If you need safe C this doable with enough effort, if you need reasonably safe C this is even practical in most projects, and all this should be compared to Rust as used in a similar situation which very well may include use of unsafe Rust or C libraries which may also limit the safety.

ryao · 2024-12-12T06:00:36 1733983236

It is C. It is just written with the aid of formal methods. It would be nice if all software were written that way. That said, if you want another language “resembling C”, there is always WUFFS:

https://github.com/google/wuffs

The output of the WUFFS compiler certainly resembles C because it is C.

lmm · 2024-12-15T01:05:08 1734224708

> It is C. It is just written with the aid of formal methods.

It is not C in the sense that many of the usual reasons to use C no longer apply. E.g. a common reason to use C is the availability of libraries, but most popular libraries will not pass that analyser so you can't use them if you're depending on that analyser. E.g. a common reason to use C is standard tooling for e.g. automated refactoring, but will those standard tools preserve analyser-passing? Probably not.