These days I think the sane option is to just add a static assert that the machine is little endian and move on with your life. Unless you're writing glibc or something do you really care about supporting ancient IBM mainframes?
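A minimal sketch of such an assert, assuming a GCC/Clang toolchain that predefines __BYTE_ORDER__ (other compilers need a different check):

    /* Refuse to build on anything that isn't little-endian.
     * __BYTE_ORDER__ / __ORDER_LITTLE_ENDIAN__ are GCC/Clang predefines. */
    _Static_assert(__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__,
                   "this code assumes a little-endian target");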
Plenty of binary formats out there containing big endian integers. An example is the format I'm dealing with right now: ELF files. Their endianness matches that of the ELF's target architecture.
Apparently they designed it that way in order to ensure all values in the ELF are naturally encoded in memory when it is processed on the architecture it is intended to run on. So if you're writing a program loader or something you can just directly read the integers out of the ELF's data structures and call it a day.
Processing arbitrary ELF inputs requires adapting to their endianness though.
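A rough sketch of what that adaptation looks like, assuming glibc's <elf.h> and <endian.h>; read_half16 is just an illustrative name:

    #include <elf.h>      /* EI_DATA, ELFDATA2LSB */
    #include <endian.h>   /* le16toh, be16toh (glibc; BSDs put these in <sys/endian.h>) */
    #include <stdint.h>

    /* Decode a 16-bit ELF field according to the byte order declared in
     * the file's e_ident bytes, not the host's. */
    static uint16_t read_half16(const unsigned char *e_ident, uint16_t raw)
    {
        return (e_ident[EI_DATA] == ELFDATA2LSB) ? le16toh(raw) : be16toh(raw);
    }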
The fun really starts if you have a CPU using big endian and a bus using little endian.
Back in the late 1990s, we moved from Motorola 68k / SBus to Power/PCI. To make the transition easy, we kept using big endian for the CPU. However, all available networking chips only supported PCI / little endian at that point. For DMA descriptor addresses and chip registers, one had to remember to use little endian.
I feel like we're at a point where you should assume little endian serialization and treat anything big endian as a slow path you don't care about. There's no real reason for any blob, stream, or socket to use big endian for anything afaict.
If some legacy system still serializes big endian data then call bswap and call it a day.
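Roughly this, as a sketch; __builtin_bswap32 is the GCC/Clang spelling (MSVC calls it _byteswap_ulong):

    #include <stdint.h>

    /* Normalize a big-endian 32-bit field from a legacy format on read;
     * on a big-endian host this is a no-op. */
    static uint32_t from_be32(uint32_t be)
    {
    #if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
        return __builtin_bswap32(be);   /* GCC/Clang built-in */
    #else
        return be;
    #endif
    }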
The internet is big-endian, and generally data sent over the wire is converted to/from BE. For example the numbers in IP or TCP headers are big-endian, and any RFC that defines a protocol including binary data will generally go with big-endian numbers.
I believe this dates from Bolt Beranek and Newman basing the IMP on a BE architecture. Similarly, computers tend to be LE these days because that's what the "winning" PC architecture (x86) uses.
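For concreteness, this is the familiar conversion at the socket boundary (a sketch assuming POSIX sockets; fill_addr is a made-up helper):

    #include <sys/socket.h>
    #include <netinet/in.h>   /* sockaddr_in, htons, htonl, INADDR_ANY */
    #include <string.h>

    /* Port and address fields in sockaddr_in are stored in network byte
     * order (big-endian); htons/htonl swap on little-endian hosts and
     * are no-ops on big-endian ones. */
    static void fill_addr(struct sockaddr_in *addr, unsigned short port)
    {
        memset(addr, 0, sizeof *addr);
        addr->sin_family = AF_INET;
        addr->sin_port = htons(port);
        addr->sin_addr.s_addr = htonl(INADDR_ANY);
    }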
The low-level parts of the network are big-endian because they date from a time when a lot of networking was done on big-endian machines. Most modern protocols and data encodings above UDP/TCP are explicitly little-endian because x86 and most modern ARM are little-endian. I can't remember the last time I had to write a protocol codec that was big-endian; that was common in the 1990s, but that was a long time ago. Even for protocols that explicitly support both big- and little-endian encodings, I never see an actual big-endian encoding in the wild and some implementations don't bother to support them even though they are part of the standard, with seemingly little consequence.
There are vestiges of big-endian in the lower layers of the network but that is a historical artifact from when many UNIX servers were big-endian. It makes no sense to do new development with big-endian formats, and in practice it has become quite rare as one would reasonably expect.
Is it though? Because my experience is very different than GP’s: git uses network byte order for its binary files, msgpack and cbor use network byte order, websocket uses network byte order, …
> any RFC that defines a protocol including binary data will generally go with big-endian numbers
I'm not sure this is true. And if it is true it really shouldn't be. There are effectively no modern big endian CPUs. If designing a new protocol there is, afaict, zero benefit to serializing anything as big endian.
It's unfortunate that TCP headers and networking are big endian. It's a historical artifact.
Converting data to/from BE is a waste. I've designed and implemented a variety of simple communication protocols. They all define the wire format to be LE. Works great, zero issues, zero regrets.
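A sketch of the kind of helper such an LE wire format needs (put_le32 is an illustrative name); writing the bytes explicitly keeps it correct regardless of host byte order:

    #include <stdint.h>

    /* Serialize a 32-bit value as four little-endian bytes, independent
     * of the host's endianness. */
    static void put_le32(uint8_t *out, uint32_t v)
    {
        out[0] = (uint8_t)(v);
        out[1] = (uint8_t)(v >> 8);
        out[2] = (uint8_t)(v >> 16);
        out[3] = (uint8_t)(v >> 24);
    }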
> There are effectively no modern big endian CPUs.
POWER9, Power10 and s390x/Telum/etc. all say hi. The first two in particular have a little endian mode and most Linuces run them little, but they all can run big, and on z/OS, AIX and IBM i, must do so.
I imagine you'll say effectively no one cares about them, but they do exist, are used in shipping systems you can buy today, and are fully supported.
Yeah those are a teeny tiny fraction of CPUs on the market. Little Endian should be the default and the rare big endian CPU gets to run the slow path.
Almost no code anyone here will write will run on those chips. It’s not something most programmers need to worry about. And those that do can easily add support where it’s necessary.
The point is that big endian is an extreme outlier.
If you're writing an implementation of one of those "early protocols", sure. If not, call a well-known library, let it do whatever bit twiddling it needs to, and get on with what you were actually doing.
Also recommending octal is sadistic!