
I was able to shave off one additional byte with this:

  ...
  xor rax, rax       ; = 0
  inc rax            ; = 1 - syscall: sys_write
  mov rdi, rax       ; copy 1 - file descriptor: stdout
  lea rsi, [rel msg] ; pointer to message
  mov rdx, 14        ; message length
  syscall
  ...

  $ nasm -f bin -o elf elf.asm; wc -c elf; ./elf
  166 elf
  Hello, World!

So I guess NASM already optimizes this quite well.

However, using the stack-based instructions as xpasky hinted at:

  ...
  push 1             ; syscall: sys_write
  pop rax
  pop rdi       ; copy 1 - file descriptor: stdout
  lea rsi, [rel msg] ; pointer to message
  push 14            ; message length
  pop rdx
  syscall
  ...

I get down to 159 bytes! I updated the article to reflect that.


That second snippet is pretty funny:

  push 1
  pop rax
  pop rdi
You can't push a value once and pop it twice, that's not how a stack works! You're popping something else off the stack. So why does this even work?

Linux passes your program arguments on the stack, with argc on top. So when you don't pass any arguments, argc just HAPPENS to be 1. Which you then pop into rdi. Gross!
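
To make that concrete, here is a rough sketch of what sits on top of the stack when a Linux x86-64 process starts with no arguments, following the initial process stack layout in the System V ABI. The offsets are relative to rsp at entry, and the comments are my annotation, not part of the original program.

  ; at entry (./elf, no arguments):
  ;   [rsp]      argc = 1
  ;   [rsp+8]    argv[0] -> "./elf"
  ;   [rsp+16]   NULL (end of argv)
  ;   [rsp+24]   envp[0], ...
  push 1   ; stack is now: 1, argc, argv[0], ...
  pop rax  ; rax = 1 (the value just pushed)
  pop rdi  ; rdi = argc, which is 1 only when run without arguments

Run it as ./elf foo and argc is 2, so the write goes to fd 2 (stderr) instead. With more arguments it targets a descriptor that is probably not open, and the write fails.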


Of course, you are completely right. That was an oversight from wanting to correct my mistake as quickly as possible.

With that fixed, is there any reason not to use push here?


Yes, because:

  push 1       ; 6A 01 (2 bytes)
  pop rdi      ; 5F    (1 byte)
is longer than a simple:

  mov edi, eax ; 89 C7 (2 bytes)


I think your statement might only apply to 32-bit code (one of the constraints mentioned early in the blog post was 64-bit).

But even if it were 32-bit, we wouldn't have to copy a 1, since the syscall number for sys_write would be 4 instead of 1.

I get the same total size with both variants in 64-bit mode.

  push 1
  pop rax
  mov rdi, rax
Here `mov rdi, rax` assembles to 48 89 C7 (3 bytes),

which seems to be the same size as

  push 1
  pop rax
  push 1
  pop rdi
where the second `push 1` / `pop rdi` pair assembles to 6A 01 5F (3 bytes).


That's because you're using `mov rdi, rax` again. You keep changing `edi, eax` to `rdi, rax`. Why?

The default operand size in 64-bit mode is, for most instructions, still 32 bits. So `mov edi, eax` encodes the same in 32- and 64-bit mode.

`mov rdi, rax` needs an extra REX prefix byte [1]. That's the 48 you're seeing above, and you don't need it here.

[1] https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefi...
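
To spell out the comparison, here is a small listing of both complete variants with the bytes I would expect NASM to emit in 64-bit mode (my annotation, worth checking with `ndisasm -b 64`). The 32-bit move is enough because a write to a 32-bit register zero-extends into the full 64-bit register, so rdi still ends up as 1.

  bits 64
  ; variant A: 5 bytes
  push 1         ; 6A 01
  pop rax        ; 58
  mov edi, eax   ; 89 C7     - no REX prefix, rdi = zero-extended eax
  ; variant B: 6 bytes
  push 1         ; 6A 01
  pop rax        ; 58
  push 1         ; 6A 01
  pop rdi        ; 5F
  ; for reference: 3 bytes
  mov rdi, rax   ; 48 89 C7  - 48 is the REX.W prefix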


Okay, I didn't know that, thanks for the background. I wonder why the assembler doesn't optimize this, though.

I noticed that I could then also shave off one more byte by using `lea esi, [rel msg]` instead of `lea rsi, [rel msg]`.
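
For reference, a sketch of the two encodings as I understand them. The displacement bytes depend on where msg ends up, so they are shown as xx. One caveat that is my assumption about the setup rather than something stated in the article: the 32-bit form truncates the address, so it only works if msg is loaded below 4 GiB.

  bits 64
  lea rsi, [rel msg]   ; 48 8D 35 xx xx xx xx  (7 bytes)
  lea esi, [rel msg]   ; 8D 35 xx xx xx xx     (6 bytes), rsi = low 32 bits of the address
  msg: db "Hello, World!", 10

That truncation is presumably also why the assembler won't do the shortening on its own: `mov rdi, rax` and `lea rsi, ...` ask for a full 64-bit result, and the 32-bit forms are only equivalent when the upper bits are known to be zero.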


Should be

  ...
  push 1             ; syscall: sys_write
  pop rax
  push 1
  pop rdi

of course.



