
These days I think the sane option is to just add a static assert that the machine is little endian and move on with your life. Unless you're writing glibc or something, do you really care about supporting ancient IBM mainframes?
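
Something like this, using GCC/Clang's predefined macros (not standard C, so treat it as a sketch for those compilers):

    /* Refuse to build on big endian targets.
       __BYTE_ORDER__ is a GCC/Clang extension, not ISO C. */
    #include <assert.h>

    static_assert(__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__,
                  "this code assumes a little endian target");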

Also, recommending octal is sadistic!




Big Endian is also called Network Order because some networking protocols use it. And of course, UTF-16 BE is a thing.

There is a non-trivial chance that you will have to deal with BE data regardless of whether your machine is LE or BE.
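
And you don't even need to detect the host order to deal with it. A byte-by-byte decoder like this hypothetical helper returns the same result on LE and BE machines:

    #include <stdint.h>

    /* Decode a 32-bit big endian field from a buffer.
       Shifts make this independent of the host's byte order. */
    static uint32_t read_be32(const uint8_t *p) {
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
             | ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
    }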


> some networking protocols

Pretty low-key way to refer to pretty much all layer 1-3 IETF protocols :D


It's like Tim Berners-Lee being referred to as "Web Developer" :)

https://imgur.com/kX5oBk6


Yeah but that's a known order. You don't have to detect it.


Plenty of binary formats out there contain big endian integers. An example is the format I'm dealing with right now: ELF files. Their endianness matches that of the ELF's target architecture.

Apparently they designed it that way in order to ensure all values in the ELF are naturally encoded in memory when it is processed on the architecture it is intended to run on. So if you're writing a program loader or something you can just directly read the integers out of the ELF's data structures and call it a day.

Processing arbitrary ELF inputs requires adapting to their endianness though.
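
Roughly, you check e_ident[EI_DATA] first and branch on it. A sketch, assuming Linux's <elf.h>:

    #include <elf.h>

    /* The byte at e_ident[EI_DATA] declares the file's encoding;
       read it before decoding any multi-byte ELF field. */
    static const char *elf_byte_order(const unsigned char *e_ident) {
        switch (e_ident[EI_DATA]) {
            case ELFDATA2LSB: return "little endian";
            case ELFDATA2MSB: return "big endian";
            default:          return "unknown";
        }
    }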


The fun really starts if you have a CPU using big endian and a bus using little endian...

Back in the late 1990s, we moved from Motorola 68k / sbus to Power/PCI. To make the transition easy, we kept using big endian for the CPU. However, all available networking chips only supported PCI / little endian at that point. For DMA descriptor addresses and chip registers, one had to remember to use little endian.


> add a static assert that the machine is little endian and move on with your life

It's not clear how it would free you from interpreting BE data from incoming streams/blobs.



I feel like we're at a point where you should assume little endian serialization and treat anything big endian as a slow path you don't care about. There's no real reason for any blob, stream, or socket to use big endian for anything afaict.

If some legacy system still serializes big endian data then call bswap and call it a day.
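
Something like this hypothetical from_be32, built on the GCC/Clang bswap builtin:

    #include <stdint.h>

    /* Legacy big endian field -> host order.
       On the common LE hosts this is one bswap instruction;
       on a BE host it's a no-op. */
    static uint32_t from_be32(uint32_t wire) {
    #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
        return __builtin_bswap32(wire);
    #else
        return wire;
    #endif
    }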


The internet is big-endian, and generally data sent over the wire is converted to/from BE. For example the numbers in IP or TCP headers are big-endian, and any RFC that defines a protocol including binary data will generally go with big-endian numbers.
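
That conversion is exactly what the POSIX ntohs/ntohl helpers are for, e.g.:

    #include <arpa/inet.h>  /* ntohs, ntohl */
    #include <stdint.h>

    /* A TCP port arrives in network (big endian) order.
       Port 80 is bytes 00 50 on the wire, which an LE host
       reads as 0x5000 until ntohs swaps it back. */
    uint16_t wire_port_to_host(uint16_t wire_port) {
        return ntohs(wire_port);
    }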

I believe this dates from Bolt Beranek and Newman basing the IMP on a BE architecture. Similarly, computers tend to be LE these days because that's what the "winning" PC architecture (x86) uses.


The low-level parts of the network are big-endian because they date from a time when a lot of networking was done on big-endian machines. Most modern protocols and data encodings above UDP/TCP are explicitly little-endian because x86 and most modern ARM are little-endian. I can't remember the last time I had to write a protocol codec that was big-endian; that was common in the 1990s, but that was a long time ago. Even for protocols that explicitly support both big- and little-endian encodings, I never see an actual big-endian encoding in the wild and some implementations don't bother to support them even though they are part of the standard, with seemingly little consequence.

There are vestiges of big-endian in the lower layers of the network but that is a historical artifact from when many UNIX servers were big-endian. It makes no sense to do new development with big-endian formats, and in practice it has become quite rare as one would reasonably expect.


No idea why you’re getting downvoted. Everything you’ve written is correct.


Is it though? Because my experience is very different than GP’s: git uses network byte order for its binary files, msgpack and cbor use network byte order, websocket uses network byte order, …


Yeah I'd say it should be true but there are plenty of modern protocols that still inexplicably use big endian.

For your own protocols there's no need to deal with big endian though.


> any RFC that defines a protocol including binary data will generally go with big-endian numbers

I'm not sure this is true. And if it is true it really shouldn't be. There are effectively no modern big endian CPUs. If designing a new protocol there is, afaict, zero benefit to serializing anything as big endian.

It's unfortunate that TCP headers and networking are big endian. It's a historical artifact.

Converting data to/from BE is a waste. I've designed and implemented a variety of simple communication protocols. They all define the wire format to be LE. Works great, zero issues, zero regrets.
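
The encoder can still be written portably, byte by byte, so even the odd BE host produces the same LE bytes. A sketch:

    #include <stdint.h>

    /* Emit a 32-bit value as little endian wire bytes,
       regardless of host byte order. */
    static void write_le32(uint8_t *out, uint32_t v) {
        out[0] = (uint8_t) v;
        out[1] = (uint8_t)(v >> 8);
        out[2] = (uint8_t)(v >> 16);
        out[3] = (uint8_t)(v >> 24);
    }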


> There are effectively no modern big endian CPUs.

POWER9, Power10 and s390x/Telum/etc. all say hi. The first two in particular have a little endian mode and most Linuces run them little, but they all can run big, and on z/OS, AIX and IBM i, must do so.

I imagine you'll say effectively no one cares about them, but they do exist, are used in shipping systems you can buy today, and are fully supported.


Yeah those are a teeny tiny fraction of CPUs on the market. Little Endian should be the default and the rare big endian CPU gets to run the slow path.

Almost no code anyone here will write will run on those chips. It’s not something almost any programmer needs to worry about. And those that do can easily add support where it’s necessary.

The point is that big endian is an extreme outlier.


Only the early protocols below the application layer are BE. A lot of the later stuff switched to LE.


Yes, those “early protocols” carry everything. Until applications stop opening sockets, this problem doesn’t go away.


If you're writing an implementation of one of those "early protocols", sure. If not, call a well-known library, let it do whatever bit twiddling it needs to, and get on with what you were actually doing.


But the payload isn't BE


AFAIK quite a number of protocols and file formats use BE, with no sign of becoming legacy even in the distant future.


You do realize that most of the networking stack is big-endian, right?


BE MIPS is still alive; much recent MikroTik hardware is BE MIPS.


That's your bad decision; it shouldn't be other people's problem.


I write programs for my calculator, which is big endian


Which calculator? And does it have a C compiler?


It’s the Casio CG50. It uses SuperH, which is supported by GCC, and there is an unofficial SDK (well, there are actually two of them).

The CPU is technically bi-endian, but the mode is selected by a pin, and that pin is hardwired to big endian.

Most C code just works; sometimes there are endianness bugs when porting things, but they’re usually not hard to fix.


Good for you I guess?


It’s an example of somewhere big endian is used other than an IBM mainframe, even if it’s equally niche.

I’m pretty sure there are examples that are way less niche, though I don’t want to look into it at the moment.


or older Macs

(but mostly, network byte order is big endian)



