I'm currently involved with a system, written in C, which has been going for 30 years: GAP - https://www.gap-system.org
While I write a lot of C, I immediately disagree with the idea that C has a "simple (yet expressive) abstract machine model". Every so often we find a bug which has been present for over a decade, because some new compiler has added a (perfectly legal by the standard) optimisation which breaks some old code.
Picking one example: in the "old days", it was very common (and important for efficiency) to freely cast memory between char, int, double, etc. For many years this was fine, then all compilers started keeping better track of aliasing and lots of old code broke.
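To make that concrete, here's roughly the pattern that used to be everywhere and is now formally undefined under strict aliasing, next to the memcpy form that compilers treat as the portable way to say the same thing (function names are just for illustration):

    #include <stdint.h>
    #include <string.h>

    /* The "old days" idiom: reinterpret the bytes of a float as a 32-bit int.
       Reading through the cast pointer violates strict aliasing, and modern
       compilers are allowed to optimise on the assumption it never happens. */
    uint32_t float_bits_unsafe(float f) {
        return *(uint32_t *)&f;        /* undefined behaviour */
    }

    /* The portable replacement: copy the bytes. Compilers recognise the
       pattern and typically emit a single register move for it. */
    uint32_t float_bits(float f) {
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        return u;
    }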
Also, while POSIX is a nice base, it stops you using Windows, and also almost every non-trivial program ends up with a bunch of autoconf (which has to be updated every so often) to handle differences between linux/BSD/Mac.
Also, definitely don't distribute code where you use flags like '-pedantic', as it can lead to your code breaking on future compilers which tighten up the rules.
Oh, I had a fun one like this just this morning. I have (C++) code spanning back to the early 1990s, although some of it was written (as C) a decade before that. It relies on checking the floating point unit status register for a bunch of its error checking. Turns out that at some point, casts from floating point to integer types started being implemented (by the compiler) using the cvttss2si instruction. And when the floating point value is too large to be represented in the integer, the FPU flag is set to 'invalid'. Which (I assume, from how everything else was implemented) didn't use to happen with whatever instruction(s) were used before SSE. And in my code, this only happens very rarely (basically it needs a combination of rare, rare and very rare circumstances to happen - too tedious to explain, not that anyone cares), so I only got bitten by it this week - probably 15 years after this started happening? And yeah, the case is technically undefined behaviour, not that I'm enough of a language lawyer to know. If it hadn't been for some kind soul on SO pointing this out I wouldn't even have looked in this direction, because the last time this combination happened, it worked (and yeah, it turns out that the program where 'it works' was compiled 20 years ago...)
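For the curious, here's a minimal reproduction of the effect, as far as I can tell (assuming x86-64 with GCC/Clang; the out-of-range conversion is itself undefined behaviour per the standard, the flag test just shows what happens in practice):

    /* Compile with: cc -std=c11 demo.c -lm */
    #include <fenv.h>
    #include <stdio.h>

    int main(void) {
        volatile float huge = 1e30f;   /* volatile so the conversion happens at run time */
        feclearexcept(FE_ALL_EXCEPT);

        /* Too large for int: typically compiled to cvttss2si on x86-64,
           which raises the 'invalid' floating-point exception. */
        volatile int truncated = (int)huge;

        if (fetestexcept(FE_INVALID))
            puts("FE_INVALID raised by the out-of-range conversion");

        (void)truncated;
        return 0;
    }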
Ugh, just venting, but it helps to know that there are others out there suffering through this :)
The changing meanings of compiler flags are a personal pet-peeve, as it leads to strange situations where you have to add both "-Wall" AND "-Wextra".
Still, in programs that I expect to be used on wildly different systems, I tend to enable all the common warning flags in development builds, and be more conservative in the production/deployed version.
> Also, while POSIX is a nice base, it stops you using Windows, and also almost every non-trivial program ends up with a bunch of autoconf (which has to be updated every so often) to handle differences between linux/BSD/Mac.
I agree the lack of compliant POSIX on Windows is annoying. However, unless you rely on 3rd-party libraries, you can use #ifdef to write OS-specific code without autoconf.
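Something along these lines, for example (sleep_ms is just an illustrative helper; Sleep and usleep are the real platform calls):

    #ifdef _WIN32
    #include <windows.h>
    #else
    #include <unistd.h>
    #endif

    /* Pick the platform API at compile time instead of via a configure script. */
    static void sleep_ms(unsigned ms) {
    #ifdef _WIN32
        Sleep(ms);                         /* Win32 */
    #else
        usleep((useconds_t)ms * 1000);     /* POSIX */
    #endif
    }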
You compile your code with -pedantic, it works, and then you distribute the source. A user gets that code, compiles it, it works, and they integrate it into their product. Later, that user upgrades or changes their compiler and your code doesn't build anymore because there's a new warning. Now they have to patch your build.
You are right, I mis-remembered what the flag did, sorry.
I've seen projects with -pedantic -Werror, which are particularly annoying (-Werror in general, to be honest; I understand why people might want it for CI of course).
This article certainly rings a bell, as I started rewriting my personal projects in C in the last year, precisely because I wanted to make them decades-proof. I still use the same vimscripts I wrote in the early 2000s, and I want the same thing for all my tooling and apps.
I'm not sure it makes sense professionally, though, as most codebases won't survive a decade: after three years, the dev team will turn over, and the new team will want to rewrite everything from scratch. Or start rewriting parts of the existing system in a new language, until it ultimately eats it up. It may be related to the kind of companies I work with, though (very early stage startups).
Regarding interfaces, I think the author could have gone a step further. There is actually a standard and portable interface system: HTML/JS/CSS. If you write a dependency-free web app using things like web components and other standard techs, you know it will stand the test of time, and it actually matches all the reasons why the author wants to use C: a standard and multiple implementations.
If you're in a web startup, software won't last 3 years, the next team will systematically rewrite.
If you're in the banking, logistics, or defense sector, it's very likely the software will go for a decade, as long as it's not killed in the first or second year for being a pet project (the initial manager left) with no customers.
> If you're in a web startup, software won't last 3 years, the next team will systematically rewrite.
I have an old man rant about that actually... that rewrite is typically unnecessary if you actually use discipline when developing and learn how to read code.
I once took on a CakePHP 2 app and another developer asked me how in the world I got into, and understood, the framework so quickly. My secret? I read the CakePHP 2 source code. So few developers ever learn how to do that well.
> that rewrite is typically unnecessary if you actually use discipline when developing and learn how to read code
"But developers that can exercise discipline and know how to read (and modify) code instead of rewriting cost so much money..." is what you'll typically hear in response to this.
It's cheaper (and often faster) to have cheaper, less disciplined, less experienced developers rewrite something multiple times than it is to have more expensive, more disciplined, more experienced developers write something and maintain it. It's also harder to keep the more experienced developers, because most developers I work with start looking for another job when their project goes into maintenance.
The typical "we never have enough time/money to do it right the first time but we always have to make the time/money to do it twice" situation.
> It's cheaper (and often faster) to have cheaper, less disciplined, less experienced developers rewrite something multiple times than it is to have more expensive, more disciplined, more experienced developers write something and maintain it.
I can't believe this. I've seen the sheer difference in speed and maintainability a single solid web developer can deliver in a framework they are familiar with versus teams of more Jr developers who spin their wheels for weeks. Rewriting when you don't even understand the starting point is always a waste of money.
> It's also harder to keep the more experienced developers, because most developers I work with start looking for another job when their project goes into maintenance.
This certainly resonates though. I've been that developer more than once.
> I can't believe this. I've seen the sheer difference in speed and maintainability a single solid web developer can deliver in a framework they are familiar with versus teams of more Jr developers who spin their wheels for weeks. Rewriting when you don't even understand the starting point is always a waste of money.
I agree completely.
It can be quite shocking just how much damage a poor developer can do to small to medium companies. I know of 1 company that's holding on for dear life right now because they lost their biggest client due to a very poor developer they had employed. I told them 6 months before this all happened to get rid of him, but they didn't. And Corona is just making it that much harder for them to find new work.
"I'm not sure it makes sense professionally, though, as most codebase won't survive a decade"
That is highly context sensitive. For example CAD packages are generally decades old.
They can't be rewritten from scratch. There is too much code. Too much of it is domain specific. The features can't change or else customer projects worth billions might suddenly go tits up when they migrate to a newer version (customers don't migrate to newer version very often though).
So, if there is some domain specific use case, worth millions to the software vendor and potentially billions to clients then stability is far more critical than keeping the codebase "modern".
This is mostly good advice. I don't love configure scripts, I don't agree with the heavy reliance on POSIX if you intend to be compatible with Windows, and I don't love the fact that the author recommends third party data structure libraries that they haven't actually used. For container libraries in C, you really have to use them to get a feel for their usability (this sounds like a tautology but it's not.)
I disagree strongly with one recommendation. This is just an example, but it holds for larger API design in general:
> we could add a fallback to reading /dev/random. [...] However, in this case, the increased portability would require a change in interface. Since fopen() or fread() on /dev/random could fail, our function would need to return bool.
No, definitely not. It is dangerous to expect the application to sanely handle the case of randomness being unavailable when it is never going to occur in practice. On all POSIX platforms, /dev/random exists and will block until sufficient entropy is available. Something would have to go seriously wrong for this to fail. This is so rare that any error handling code for it will never be tested. The most likely outcome of forcing the caller to handle it is that the return value is ignored or improperly handled and the buffer is used uninitialized, leading to a security vulnerability.
My recommendation instead would be to error check your fopen() and fread() calls within get_random_bytes(), and print an error and abort() if they fail. This way if someone's system is improperly configured and /dev/random doesn't work the program will just crash. Same goes for macOS's SecRandomCopyBytes() and Windows' half a dozen calls to use an HCRYPTPROV. This way you still return void and there is no danger of callers improperly handling errors.
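A minimal sketch of that approach, assuming a get_random_bytes(void *buf, size_t len) signature (the signature is my guess, not the article's):

    #include <stdio.h>
    #include <stdlib.h>

    /* Returns void: if /dev/random is unusable, crash loudly instead of
       making every caller handle an error that should never happen. */
    void get_random_bytes(void *buf, size_t len) {
        FILE *f = fopen("/dev/random", "rb");
        if (f == NULL) {
            fprintf(stderr, "get_random_bytes: cannot open /dev/random\n");
            abort();
        }
        if (fread(buf, 1, len, f) != len) {
            fprintf(stderr, "get_random_bytes: short read from /dev/random\n");
            abort();
        }
        fclose(f);
    }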
In general, unless you're writing safety-critical software, it's fine for your code (or even library code) to abort() in these sorts of exceptional situations when there is no reasonable or safe way to handle the error. If someone truly wants to handle the error, they can just not use your API and do it manually.
The author lists a number of languages considered stable, C being one of them because of widespread support and portability. Java isn't portable for example because it depends on the JVM (and I know GraalVM is a thing but will you still be able to use it in ten years?).
The argument against Java is weak... You can take the latest JVM and run any jar from 1999.
Also, recent Java versions have jlink, which assembles a custom runtime image to ship with your application, so it doesn't require a separate JVM installation; you don't need GraalVM for that.
From the title, I was hoping to hear about software systems that have powered infrastructure for decades, but unfortunately it was more of an analysis of which programming language to choose.
Hm, decades is not that much, most enterprise code fits into that.
But how about 200 years?
It is about people. Documentation, a paper trail of why some decisions were made, archiving build tools, VMs, dependency source code...
Also, C, POSIX and Motif are a terrible choice because of their fragmentation. Java is very boring, but compiling and debugging 20-year-old code is very common.
That forceinline definition is just the tip of the iceberg. It's so hard to define in a way that works with different versions of GCC, -Werror, instrumentation, MSVC, and profiling. If you care about portability, consider just not caring and using static. Too much special-casing code can actually make it harder for people in weird environments to use your code, since something is going to break it, and reading past the ifdef soup becomes the biggest obstacle.
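For illustration, this is roughly the shape that soup takes (FORCE_INLINE is a made-up name, and this is far from exhaustive):

    /* The sort of special-casing a "portable" forceinline grows into. */
    #if defined(_MSC_VER)
    #  define FORCE_INLINE __forceinline
    #elif defined(__GNUC__) || defined(__clang__)
    #  define FORCE_INLINE static inline __attribute__((always_inline))
    #else
    #  define FORCE_INLINE static inline   /* hope for the best */
    #endif

    /* The alternative: just write it and let the compiler decide. */
    static int add(int a, int b) { return a + b; }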
I'm currently "betting" on Go for making a back-end (just a REST API + sqlite database) that will last a decade; I'm betting on the tooling to stay backwards compatible or with minimal changes in the codebase; I'm betting on the readability of my own code for the next decade, and I'm betting on the language + tools to continue to be developed whilst sticking to their original goals.
I can't tell if you want generics or not? I've been thinking about the topic for a while and sort of think Go doesn't need 'em. It's less a technical reason and more a cultural one. The language shipped without them, and that absence is itself a feature. So why deprecate that feature and make a lite version of Java/C#?
A decent chunk of new Go developers are former Java developers. (I'm not being disparaging here, I came over to Go from Java!) I think the idea of "accept interfaces, return structs" doesn't sink in very well and isn't as readily apparent as it could be.
When I have a function that takes a struct, and I need it to take a different struct for the same argument due to whatever, that to me screams "make an interface", but I hadn't been programming Java very long before my team started using Go - I've now been writing Go professionally longer than I have Java.
> I'm betting on the tooling to stay backwards compatible or with minimal changes in the codebase
This is actually why I'm pretty bearish on things like RoR, Laravel, et al.
The sheer speed at which they go to a new version that breaks BC is actively making the web less secure. I've lost count of how many times I've found a new client with this software that's been working for years but suddenly broke, only to realize it's on an OS that's EOL, using a version of the framework that's EOL and a version of the language that's EOL. And now it's my job to bring it up to speed.
And typically the hardest part of that? The 3rd party dependencies that are either abandoned and don't support the newer versions of anything, or have moved onto Python 3 and no longer support Python 2.
It's why I vastly prefer something like asp.net core. I know in 5-10 years the code will probably just work with the latest version, and if there's an incompatibility, it's going to tend to be relatively small.
The same reason why my default stack is still Java/Kotlin with Spring and Hibernate. A stable environment, stable runtime that is guaranteed to not change too much and has a culture of backwards compatibility.
How are you liking Java with all the versions that are dropping? Are you still on 8? Corretto? Or are you keeping up? And do you see the value in the features that are dropping?
Java is moving a lot faster now with the new 6 month release cadence but you don't need to follow every release. Sticking to LTS releases is fine for projects that require longevity.
Some projects are still using 8 and will be until their EOL date. New projects start on the latest LTS release (11).
java.time is really nice, so I use Java 11 (2018) on my servers. If not using java.time, Java 8 (2014) or later is fine. But I use Clojure, not raw Java. I haven't had any issues with upgrading. If a server works with 8, it works fine with 11 (in my experience).
Seems like good advice. I'd add another one that seems completely obvious, but some sloppy developers ignore it: avoid undefined behaviour. If you're going to work with C, you need to know about undefined behaviour and take it seriously.
If it were that easy, there would already be a subset of C without undefined behavior specified, and you would be able to automatically check your code against it.
My point was only that C programmers should be keenly aware of the pitfalls of undefined behaviour, rather than blithely ignoring it. I've been surprised by the sloppiness of some developers on this point.
> a subset of C without undefined behavior
There are various projects out there that let you produce C code guaranteed to be free of undefined behaviour, but they're not 'quick fix' solutions, so they're not widely used.
I've been actively wondering about generating some efficient and portable C code, and for this project it wouldn't be super-complicated, but undefined behavior is the one thing that keeps me away. C++ and Rust and C# and many other languages all add wonderful things, with side effects on portability, clarity, learning curve, language stability, etc. - wonderful things that I don't always want in a twenty-year-stable system.
Anyway, thank you for these, I'm definitely going to look further here.
What does "keenly aware" even mean? For example: any time I add or subtract two signed ints, undefined behavior can happen. Now what. Must I pepper the code with bounds checks (which are prone to UB too if not done carefully)?
Anyway, any complicated thing that can be easily ignored, inevitably will be.
Keeping the threat of undefined behaviour in mind, and taking steps accordingly, rather than complacently ignoring it. C is a highly unsafe language, and the programmer shouldn't forget this.
> any complicated thing that can be easily ignored, inevitably will be.
The demonstrable inability of C programmers to write correct code is a strong argument against the widespread use of C. Even old languages like Ada show that you can use a language much safer than C and still achieve solid performance. Languages like Rust are making further progress on having safety, performance, and programmer-convenience, all at once.
If you use an ultra-safe language like verified SPARK Ada, the language doesn't even allow you to, say, forget to check whether a denominator is zero, or to forget to protect against out-of-bounds array access.
> Must I pepper the code with bounds checks (which are prone to UB too if not done carefully)?
Not necessarily; a tool can help check for undefined behaviour. Static analysers, GCC flags, and tools like Valgrind can automatically check for out-of-bounds array access, divide-by-zero, or attempts to dereference NULL. [0] Adding your own runtime assertions isn't a crazy idea though, especially for dev builds. If this were the norm in C programming we'd have fewer security vulnerabilities.
C lacks the kind of runtime checks that are 'always on' in languages like Java and C# (out-of-bounds, divide-by-zero, etc). That's not because such checks don't apply to C code, it's because of the minimalist C design philosophy. You have the option to add your own checks, or use tools to do so automatically, but if you develop without any checks anywhere you should expect to have more bugs. Java added them for a reason.
The C++ language has a somewhat different design philosophy, but it's the same reason its std::array class template has both a runtime-checked at() member function and an unchecked operator[]. It would be against the design philosophy to force you to pay the runtime overhead for checks, but it gives you the option.
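In C, the poor man's version of that opt-in check is an assert in an accessor - active in dev builds, compiled out in release builds where NDEBUG is defined (get_at is just a made-up example):

    #include <assert.h>
    #include <stddef.h>

    /* Checked in debug builds, zero-cost in release builds with NDEBUG -
       the same opt-in idea as at() vs operator[]. */
    static double get_at(const double *buf, size_t len, size_t i) {
        assert(i < len && "index out of bounds");
        return buf[i];
    }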
"Design philosophy"...oh please! C was designed for transistor- and memory- scarce microcomputers. Nowadays there is defacto supercomputer in every phone and runtime bounds checks are cheap. Moreover, allowing CPU to know the size of memory chunk pointed to could enable optimization which would make the code actually faster (not even talking about security benefits). But you C programmers insist tooth an nail against that...
> For example checking for signed overflow must be done carefully:
Right, but we're talking about a simple bounds check. There should be no need for any arithmetic, just comparison.
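And for the signed-overflow worry upthread, the usual idiom is also comparison-based: test against the limits before you add, so the check itself can't overflow. A sketch:

    #include <limits.h>
    #include <stdbool.h>

    /* Decide whether a + b would overflow *before* adding. The subtractions
       cannot themselves overflow because of the sign tests, so the check is
       UB-free. (GCC and Clang also offer __builtin_add_overflow for this.) */
    static bool add_would_overflow(int a, int b) {
        if (b > 0) return a > INT_MAX - b;
        if (b < 0) return a < INT_MIN - b;
        return false;
    }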
> "Design philosophy"...oh please! C was designed for transistor- and memory- scarce microcomputers.
Right. Hence its design philosophy.
> Nowadays there is a de facto supercomputer in every phone and runtime bounds checks are cheap.
Cheap, but perhaps not cheap enough to dismiss entirely. Bounds checking costs a few percent of performance [0], enough to put some people off in some domains such as in the kernel.
It's a pity C makes it difficult to automate just about any kind of check. Checking whether a pointer overruns a buffer that was returned by malloc, for instance, requires quite a bit of cleverness, as the system has to track the size of the allocated block.
You have to rely on optional compiler features, elaborate static analysis tools (often proprietary and expensive), and dynamic analysis tools like Valgrind. Ada on the other hand enables all sorts of runtime checks by default, but it's easy to switch them all off if you're sure.
> CPU to know the size of memory chunk pointed to could enable optimization which would make the code actually faster (not even talking about security benefits)
What kind of optimisation do you have in mind? Pre-caching?
> But you C programmers insist tooth and nail against that...
'Fat pointers' of this sort have been tried with the C language [1] but I can't see the committee adding them to the standard. Part of C's virtue is that it's extremely slow moving.
I'm not advocating continued widespread use of C though. I hope safe-but-fast languages like Rust do well. We all pay a price for the problems associated with C and, perhaps to a lesser extent, C++. For what it's worth I haven't written serious C or C++ code for a long time.
> What kind of optimisation do you have in mind? Pre-caching?
All kinds of branch prediction. If the CPU knew it was iterating over a fat pointer, it could safely internally unroll the loop or even vectorize the operation. There are so many easy tricks based on runtime information used by JIT languages, yet not implemented in silicon (or only speculatively) because of C legacy.
> If the CPU knew it was iterating over a fat pointer, it could safely internally unroll the loop or even vectorize the operation.
That should already be possible with the C custom of passing an array's size alongside the array pointer. As I understand it modern CPUs have very sophisticated loop-detection for doing precisely this kind of thing.
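That is, the plain (pointer, length) convention, which gives the loop a known trip count the compiler and the CPU can work with:

    #include <stddef.h>

    /* The length travels next to the pointer, so the loop bound is explicit
       and the compiler is free to unroll or vectorise it. */
    static double sum(const double *a, size_t n) {
        double total = 0.0;
        for (size_t i = 0; i < n; i++)
            total += a[i];
        return total;
    }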
Now we're back to square one: the C compiler has to resort to UB to optimize many kinds of "for" loops, including basic stuff like extracting the iteration count - examples in today's discussion. It then tries to emit something that CPU speculative prediction unit can successfully pick up. It is so hard for programmer to unequivocally signal the intent like "we're working with this here chunk of memory malloc'd with this size and nothing ever beyond it and let me know if this is violated". I don't understand why this state of affairs is not universally considered sordid but it's handwaved "just be aware of UB and pass the size along and it will be well"...
> the C compiler has to resort to UB to optimize many kinds of "for" loops
The compiler is always permitted to assume the absence of UB. This is the case no matter what your C code is doing, and you're never further than one expression away from UB.
> It is so hard for programmer to unequivocally signal the intent like "we're working with this here chunk of memory malloc'd with this size and nothing ever beyond it and let me know if this is violated".
Indeed, and C pretty much stands alone here. In just about every other language, there's some automated means of keeping track of the size of an allocated block. Even C++ has this in std::array. The only exceptions I can think of are assembly, if that counts, and Forth.
> I don't understand why this state of affairs is not universally considered sordid but it's handwaved "just be aware of UB and pass the size along and it will be well"...
You're not alone in thinking this is a particularly reckless design decision in C. The great Walter Bright, designer of the D language and also a Hacker News regular, wrote a short article on this in 2009, called C's Biggest Mistake. [0] He even suggested a fix: adding the option of fat pointers to C. I don't think the committee is going to adopt it though.
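You can of course roll your own fat pointer today as a convention - a struct carrying the pointer and its length - it's just not something the language or the standard library does for you. A rough sketch (names are illustrative):

    #include <stddef.h>

    /* A hand-rolled "fat pointer": pointer and length travel together. */
    typedef struct {
        char  *ptr;
        size_t len;
    } slice;

    static int slice_get(slice s, size_t i, char *out) {
        if (i >= s.len)
            return -1;          /* report the violation instead of overrunning */
        *out = s.ptr[i];
        return 0;
    }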
I recall reading somewhere a snarky take on this: A foundational principle of the C programming language is that the programmer is always right, even when they are wrong.
So now you have to support all browsers: Firefox, Safari, Chrome and Edge. Plus some old stuff, because this customer still has a CentOS 4 workstation running and another has a few Windows XP PCs that are mission critical.
I don't know if Motif is better at that, but I wouldn't bet on web-apps personally.
Is Motif actually available on modern Linux systems? And is there a Windows port as well?
I find it difficult to believe that Motif is actually that portable.
Web apps are only as portable as the browser features they use, and the browsers available for the platform. A primarily backend-rendered app with minimal JavaScript is much more portable than the average SPA.
One day, maybe when I am retired, I am going to develop a programming-language-agnostic algorithm-specification language with which you can generate code for programming languages ;-). A kind of Mathematica, but for software.
I have to admit that I was a little joking about this. But I do think it is possible to specify an implementation of how a complex operation can be achieved by combining more primitive operations. In digital computers, data is usually represented by some encoding in bits, and operations are defined on those representations. Think, for example, of an operation for adding two numbers in a certain representation, producing a result in a certain representation. The most primitive addition operations a computer offers are usually addition modulo some power of 2. But with these, we can implement addition for much larger numbers (also using other kinds of operations and/or intermediate storage).
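For instance, a wider addition built out of the machine's add-modulo-2^32 primitive might look like this in C (a rough sketch, with numbers stored as little-endian arrays of 32-bit limbs):

    #include <stdint.h>
    #include <stddef.h>

    /* Wider addition built from the word-sized add-modulo-2^32 primitive.
       Numbers are arrays of 32-bit limbs, least significant limb first.
       Returns the carry out of the top limb. */
    static uint32_t add_wide(const uint32_t *a, const uint32_t *b,
                             uint32_t *result, size_t limbs) {
        uint32_t carry = 0;
        for (size_t i = 0; i < limbs; i++) {
            uint64_t s = (uint64_t)a[i] + b[i] + carry;  /* fits in 64 bits */
            result[i] = (uint32_t)s;                     /* the mod-2^32 step */
            carry     = (uint32_t)(s >> 32);
        }
        return carry;
    }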