Toy decompiler for x86-64 written in Python [pdf]

amelius · on Dec 10, 2016

Can we use this to port arbitrary closed-source x86-64 binaries to a different platform (e.g. WebAssembly)?

ceronman · on Dec 10, 2016

Fully static recompilation is not possible in general, even for simpler CPUs like the 6502 (NES, Apple I, etc.). There are parts of the program, such as the targets of indirect jumps or self-modifying code, that you can't know until you actually run it. Forget about a complex x86-64 program.
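To make the "you will never know until you run it" point concrete, here is a minimal sketch of a static pass over a few 6502 instructions. The function name `static_jump_targets` and the tiny opcode table are assumptions made for illustration; a real tool would need the full instruction set. It can resolve `JMP $nnnn`, but for `JMP ($nnnn)` the destination is read from memory at run time, so the best a static pass can do is mark it unknown.

```python
# Instruction lengths for the handful of 6502 opcodes used in this sketch.
# (Hypothetical subset: LDA #imm, STA abs, JMP abs, JMP indirect, NOP.)
LENGTHS = {0xA9: 2, 0x8D: 3, 0x4C: 3, 0x6C: 3, 0xEA: 1}

def static_jump_targets(code):
    """Linear sweep over `code`, returning (offset, target) pairs.

    `target` is an int for JMP absolute, or None for JMP indirect, where
    the destination is stored in memory and only known at run time.
    """
    targets = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == 0x4C:    # JMP $nnnn: target is encoded in the instruction
            targets.append((pc, code[pc + 1] | (code[pc + 2] << 8)))
        elif op == 0x6C:  # JMP ($nnnn): target is whatever memory holds later
            targets.append((pc, None))
        pc += LENGTHS[op]
    return targets

program = bytes([
    0xA9, 0x34,        # LDA #$34
    0x4C, 0x05, 0x00,  # JMP $0005
    0x6C, 0x10, 0x00,  # JMP ($0010)
])
print(static_jump_targets(program))
```

The second jump is where a purely static translator gets stuck: whatever ends up at $0010 might be actual code the analysis never saw.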

Apart from that, the complexity of all the system calls made by the program is also a huge challenge, as those don't always have an equivalent in the new system and are really hard to match at such a low level. It is, however, not impossible. Emulators such as Dolphin do that, combined with dynamic recompilation.

_euac · on Dec 10, 2016

Decompiling isn't the hard part; providing all of the infrastructure the code depends on is.

amelius · on Dec 11, 2016

What happens if you also decompile (and recompile) that infrastructure?

_euac · on Dec 11, 2016

You'll eventually hit a roadblock. It's turtles all the way down. If you translate libraries your program depends on, then you'll hit the OS API. If you manage to translate the OS, you'll have to emulate all the hardware below it.

Technically it's not impossible. I guess someone could create something that translates all I/O operations into something that reads from the browser storage and renders to the screen, but the amount of work would be pretty insane. It's been done for smaller machines already; just look at any of the JavaScript-based emulators like JSNES and so on.

krat0sprakhar · on Dec 10, 2016

Can anyone comment on how hard it is to go from here to writing your own NES emulator?

khedoros1 · on Dec 11, 2016

NES emulator: Look up the ROM header format (iNES is still popular and common). That'll tell you how to find the entry point for the code. Find a document describing the memory map. That'll tell you where RAM, ROM, IO ports, etc go in the address space. Find an m6502 opcode reference. It'll also describe the registers and addressing modes in the CPU.
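As a rough sketch of that first step, the 16-byte iNES header can be parsed like this. The names `INesHeader`, `parse_ines_header` and `reset_vector` are made up for the example. The reset-vector helper assumes the simplest cartridge layout (mapper 0), where the end of PRG ROM is mapped to the top of the address space, so the entry point stored at $FFFC-$FFFD is the fourth- and third-from-last bytes of PRG ROM.

```python
import struct
from typing import NamedTuple

class INesHeader(NamedTuple):
    prg_rom_size: int   # bytes of program ROM
    chr_rom_size: int   # bytes of character (graphics) ROM
    mapper: int         # mapper number assembled from two nibbles
    has_trainer: bool   # 512-byte trainer sits between header and PRG ROM

def parse_ines_header(data: bytes) -> INesHeader:
    """Parse the first 16 bytes of an iNES file."""
    if len(data) < 16 or data[:4] != b"NES\x1a":
        raise ValueError("not an iNES file")
    prg_banks, chr_banks, flags6, flags7 = struct.unpack_from("4B", data, 4)
    return INesHeader(
        prg_rom_size=prg_banks * 16 * 1024,   # counted in 16 KB units
        chr_rom_size=chr_banks * 8 * 1024,    # counted in 8 KB units
        mapper=(flags7 & 0xF0) | (flags6 >> 4),
        has_trainer=bool(flags6 & 0x04),
    )

def reset_vector(prg_rom: bytes) -> int:
    """Entry point for a mapper-0 ROM (assumption: no bank switching)."""
    return prg_rom[-4] | (prg_rom[-3] << 8)

# Synthetic header: 2 PRG banks, 1 CHR bank, mapper 0.
header = parse_ines_header(b"NES\x1a" + bytes([2, 1, 0x01, 0x00]) + bytes(8))
print(header)
```

PRG ROM starts right after the header, or 512 bytes later if `has_trainer` is set.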

At that point, you've got enough to read the ROM into memory and start interpreting opcodes, even if you implement them one at a time as they come up. The CPU is just going to be reading and writing data to its registers and the memory map, so it's pretty straightforward.
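The "implement them one at a time as they come up" approach might look something like the sketch below. The `CPU` class and its method names are hypothetical, memory is a flat 64 KB `bytearray` rather than the real NES memory map, and status flags are left out entirely (a real `LDA` also updates the N and Z flags). Unknown opcodes raise, which tells you what to implement next.

```python
class CPU:
    """Toy 6502 core: accumulator and program counter only."""

    def __init__(self, memory: bytearray, pc: int):
        self.mem = memory
        self.pc = pc
        self.a = 0

    def fetch(self) -> int:
        """Read the byte at PC and advance PC."""
        value = self.mem[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return value

    def fetch16(self) -> int:
        """Read a little-endian 16-bit operand."""
        lo = self.fetch()
        hi = self.fetch()
        return lo | (hi << 8)

    def step(self) -> None:
        """Execute one instruction."""
        opcode = self.fetch()
        if opcode == 0xA9:      # LDA #imm
            self.a = self.fetch()
        elif opcode == 0x8D:    # STA abs
            self.mem[self.fetch16()] = self.a
        elif opcode == 0x4C:    # JMP abs
            self.pc = self.fetch16()
        elif opcode == 0xEA:    # NOP
            pass
        else:
            addr = (self.pc - 1) & 0xFFFF
            raise NotImplementedError(f"opcode ${opcode:02X} at ${addr:04X}")

mem = bytearray(0x10000)
mem[0x8000:0x8005] = bytes([0xA9, 0x42,         # LDA #$42
                            0x8D, 0x00, 0x02])  # STA $0200
cpu = CPU(mem, 0x8000)
cpu.step()
cpu.step()
print(hex(mem[0x0200]))
```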

Pretty quickly, you'll run into a point where the code expects a result from some bit of external hardware (usually reading from 0x2002 to query for the VBlank signal). That's the domain of the Picture Processing Unit. Start with simulating the VBlank flag, and go from there. The PPU is fun, and if you've gotten this far, you'll be in a good position to figure things out without a guided tutorial. There's a whole ton of documentation out there, with explanations of every aspect of the NES's hardware.
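A first pass at "simulating the VBlank flag" could be as small as this sketch. `PPUStub` and `Bus` are invented names, and everything else the PPU does is ignored. The two behaviours it does model are that bit 7 of the status register at 0x2002 is the VBlank flag, and that reading the register clears that bit.

```python
class PPUStub:
    """Stand-in PPU that only models the VBlank bit of the status register."""
    VBLANK = 0x80

    def __init__(self):
        self.status = 0

    def enter_vblank(self) -> None:
        # An emulator would call this once per frame at the right scanline.
        self.status |= self.VBLANK

    def read_status(self) -> int:
        # Reading the status register returns the flag, then clears it.
        value = self.status
        self.status &= ~self.VBLANK & 0xFF
        return value

class Bus:
    """Routes CPU reads to RAM or to the PPU stub."""

    def __init__(self, ppu: PPUStub):
        self.ram = bytearray(0x800)   # 2 KB of internal RAM
        self.ppu = ppu

    def read(self, addr: int) -> int:
        if addr < 0x2000:                        # RAM, mirrored every 2 KB
            return self.ram[addr & 0x07FF]
        if addr < 0x4000 and (addr & 7) == 2:    # 0x2002 and its mirrors
            return self.ppu.read_status()
        raise NotImplementedError(f"read from ${addr:04X}")

ppu = PPUStub()
bus = Bus(ppu)
ppu.enter_vblank()
print(hex(bus.read(0x2002)))   # flag set on the first read
print(hex(bus.read(0x2002)))   # cleared by that read
```

With this in place, the typical "spin until VBlank" loop at the start of a game will terminate once `enter_vblank` is called.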

Retr0spectrum · on Dec 10, 2016

There isn't much of a skill intersection really, other than the relatively simple process of decoding opcodes.

If you want to write a NES emulator, I'd suggest starting with writing a CHIP-8 emulator.
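For a sense of how small the opcode-decoding part is on CHIP-8, here is a sketch that decodes four of its instructions. The `decode` function and the mnemonic strings are just illustrative choices. Every CHIP-8 instruction is two bytes, big-endian, and the operands are nibble-aligned fields of that 16-bit word.

```python
def decode(memory: bytes, pc: int) -> str:
    """Decode the CHIP-8 instruction at `pc` into a readable string."""
    opcode = (memory[pc] << 8) | memory[pc + 1]   # two bytes, big-endian
    top = opcode >> 12          # high nibble selects the instruction family
    x = (opcode >> 8) & 0xF     # register index
    nn = opcode & 0xFF          # 8-bit immediate
    nnn = opcode & 0xFFF        # 12-bit address
    if opcode == 0x00E0:
        return "CLS"                        # clear the display
    if top == 0x1:
        return f"JP {nnn:#05x}"             # jump to address NNN
    if top == 0x6:
        return f"LD V{x:X}, {nn:#04x}"      # VX = NN
    if top == 0x7:
        return f"ADD V{x:X}, {nn:#04x}"     # VX += NN
    raise NotImplementedError(f"opcode {opcode:#06x}")

rom = bytes([0x00, 0xE0, 0x6A, 0x0F, 0x7A, 0x01, 0x12, 0x00])
for pc in range(0, len(rom), 2):
    print(decode(rom, pc))
```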

zython · on Dec 10, 2016

86 upvotes over 9 hours and no comments yet, maybe I'll post the first one.

I must say that I enjoyed the read, but the introduction was a little bit too short for my taste. Opinions?

justifier · on Dec 10, 2016

this is one flaw with forum style discussion formats

by the time anyone gets through reading the 31 page document, let alone follow along and run the code, the mob has moved on and you are left alone with questions and comments

i had a similar thought when this was posted: https://news.ycombinator.com/item?id=12901660#12903516 ; that was a 50 page high concept pdf with complicated math

i'd be interested in a Read HN style offering with a link and a posted future date

the time is set in the future for when discussion will occur

giving interested participants and spectators time to take in the material and any supplemental research