I feel like I shouldn't love x86 encoding, but there is something charming about this. Probably echoing its 8-bit predecessors. It seems like it's designed for tiny memory environments (embedded, bootstrapping, etc.) where you don't mind taking a hit for memory access.
Linux initializes all general-purpose registers (other than the stack pointer) to zero at process entry. It's not documented AFAIK, but it should be reliable: the kernel has to init them to some value anyway to avoid leaking kernel state. So you can get away with:
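A minimal sketch of the kind of shortcut this allows, assuming a statically linked, non-PIE x86-64 executable entered directly at `_start`. The message, labels, and build commands are my own illustration, not from the thread:

```nasm
; Assumed build: nasm -f elf64 hello.asm && ld -o hello hello.o
bits 64
global _start

section .text
msg:     db "Hello, world!", 10
msg_len  equ $ - msg

_start:
    mov al, 1            ; rax = 1 (sys_write); relies on rax being 0 at entry
    mov edi, eax         ; rdi = 1 (stdout)
    mov esi, msg         ; rsi = address of msg; needs a load address below 4 GiB
    mov dl, msg_len      ; rdx = length; relies on rdx being 0 at entry
    syscall
    mov al, 60           ; rax = 60 (sys_exit); assumes write succeeded, so rax holds a small count
    xor edi, edi         ; exit status 0
    syscall
```

Each `mov al, imm8` / `mov dl, imm8` is 2 bytes, against 5 bytes for the `mov r32, imm32` form. The trick only holds while the upper bits of the register really are zero.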
The load address stays constant unless there's some magic GNU extension header to enable ASLR. If we could get the code loaded below 64K, we could save another byte by using SI instead of ESI. However, this doesn't work by default: you'd have to run 'echo 0 > /proc/sys/vm/mmap_min_addr' as root first.
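To make the one-byte saving concrete, here is a sketch of the two encodings in 64-bit mode. Both addresses are hypothetical placeholders. The 16-bit form also depends on the upper bits of RSI already being zero, since it only writes the low 16 bits:

```nasm
; Assumed build: nasm -f bin enc.asm -o enc.bin, then inspect with a hex dump
bits 64
    mov esi, 0x400078    ; BE 78 00 40 00   (5 bytes)
    mov si, 0x8078       ; 66 BE 78 80      (4 bytes, only valid if the code sits below 64K)
```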
Yes, but only in 32-bit mode. Not that it matters, except for the hypothetical future processor or Linux kernel that is no longer compatible with that :)
You can't push a value once and pop it twice, that's not how a stack works! You're popping something else off the stack. So why does this even work?
Linux passes your program arguments on the stack, with argc on top. So when you don't pass any arguments, argc just HAPPENS to be 1. Which you then pop into rdi. Gross!
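A small sketch that makes the argc-on-top-of-stack layout visible, assuming x86-64 Linux and a static binary entered at `_start`. The file name and build commands are illustrative:

```nasm
; Assumed build: nasm -f elf64 argc.asm && ld -o argc argc.o
bits 64
global _start

section .text
_start:
    pop rdi              ; rdi = argc, the first word on the initial stack
    mov eax, 60          ; sys_exit
    syscall              ; exit status = argc (low 8 bits)
```

Running `./argc; echo $?` prints 1, and `./argc a b; echo $?` prints 3, so any code that relies on the popped value being 1 breaks as soon as arguments are passed.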
Thanks, that makes total sense. I was so focused on the ELF part that I didn't even consider optimizing the initial assembly further. Will fix it and edit the article.
Here's a tiny DOS COM file that does it in 18 bytes:
;; 18 bytes
DB 'HELLO_WOIY<$' ; executes as machine code, returning SP to original position without overwriting return address
mov dx, si ; mov dx,0100h MS-DOS (all versions), FreeDOS 1.0, many other DOSes
xchg ax, bp ; mov ah,9 MS-DOS 4.0 and later, and FreeDOS 1.0
int 21h
ret
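For anyone wondering how the string can run as code, here is a sketch of the same 18 bytes written out as instructions. NASM syntax is my assumption (the listing above does not name an assembler), and the startup register values are the ones the listing's own comments rely on:

```nasm
; Assumed build: nasm -f bin hello.asm -o hello.com
org 100h
    dec ax               ; 'H' = 48h
    inc bp               ; 'E' = 45h
    dec sp               ; 'L' = 4Ch
    dec sp               ; 'L' = 4Ch, SP is now 2 below the return address
    dec di               ; 'O' = 4Fh
    pop di               ; '_' = 5Fh, SP back at the return address
    push di              ; 'W' = 57h, writes below the return address, not over it
    dec di               ; 'O' = 4Fh
    dec cx               ; 'I' = 49h
    pop cx               ; 'Y' = 59h, SP back at its original position
    cmp al, '$'          ; '<' '$' = 3Ch 24h, the terminator doubles as the immediate
    mov dx, si           ; DX = 0100h, start of the string (SI = 0100h at startup)
    xchg ax, bp          ; AH = 09h from BP's startup value
    int 21h              ; DOS print string at DS:DX up to '$'
    ret                  ; return via the 0000h word on the stack
```

The string is 12 bytes and the four real instructions add 6, which gives the 18-byte total.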
COM files for CP/M and DOS really are a no-nonsense executable format.
I'm a bit disappointed that Linux (or BSD, macOS, etc.) doesn't support them (or similar) out of the box, though Windows will sort of run them via ntvdm.
My favorite language for implementing short Hello World programs in is HQ9+ [1].
Joking aside, this page [2] used to be a great tutorial on writing small ELF binaries, but I'm not sure whether it will still work in 64-bit land. It proved very helpful for writing a 4K intro back in 1999.
These challenges are funny - they remind me of the old days. Back in the DOS/Windows days, we used to have the .com format, which was perfect for tiny programs. One could even write a program of less than 10 bytes that could actually do something!
We've come a long way since then, and it's as if, at some point, nobody cared about optimizing executable size anymore.
I learned to write COM programs at some point but quickly unlearned it. There were some spots where you could use them and not .bat files, but outside of that it’s a lot of work.
.model small
.code
org 100h
start:
int 19h ; Bootstrap loader
end start
More "correct":
.model small
.code
org 100h
start:
db 0EAh ; Jump to Power On Self Test - Cold Boot
dw 0,0FFFFh
end start
Even more "correct":
.model small
.code
org 100h
start:
mov ah,0Dh
int 21h ; DOS Services ah=function 0Dh
; flush disk buffers to disk
sti ; Enable interrupts
hlt ; Halt processor
mov al,0FEh
out 64h,al ; port 64h, kybd cntrlr functn
; al = 0FEh, pulse CPU reset
end start
Great example, a two-byte reboot utility. From the times when we could turn off the computer with a push of a button without fearing a global catastrophe...
Yeah, I thought something like this was possible, but (correct me if I'm wrong) this (ab)uses the ELF header and puts data in there, which goes against my requirement:
> It should be a ‘proper’ executable binary according to the spec
nasm will optimize this to the equivalent "mov eax, 1", that's 5 bytes, but still:
would be much smaller. Second line: You already have the value 1 in eax, so a "mov edi, eax" (two bytes) would suffice. Etc. etc.
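A sketch of the encodings being compared, in 64-bit mode. The push/pop pair is one common way to load a small constant in 3 bytes and is my illustration, not necessarily the snippet the comment originally contained:

```nasm
; Assumed build: nasm -f bin sizes.asm -o sizes.bin, then inspect with a hex dump
bits 64
    mov rax, 1           ; 48 C7 C0 01 00 00 00 (7 bytes) if encoded literally;
                         ; nasm's default optimization emits the mov eax form instead
    mov eax, 1           ; B8 01 00 00 00       (5 bytes)
    push 1               ; 6A 01
    pop rax              ; 58                   (3 bytes for the pair)
    mov edi, eax         ; 89 C7                (2 bytes, copies the 1 already in eax)
```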