Hacker News
A Comprehensive Super Mario Bros. Disassembly (gist.github.com)
388 points by shubhamjain on Oct 16, 2017 | 86 comments



https://gist.github.com/1wErt3r/4048722#file-smbdis-asm-L601...

I think this is where the real gems start. The biggest contribution that SMB had was the "physics engine", to retrofit a modern term. The friction, the jumping, the inertia. If you compare it with the primitive physics in Donkey Kong or Mario Brothers, you can really grasp the groundbreaking novelty that was SMB. You can change direction in mid-air, but not too much. When you run, you skid if you try to run in the opposite direction. The height of your jumps is affected by your running speed.

It's all of these little details combined, barely noticed in tandem, which made the game new and fun.


I once read that the way SMB was able to pull off the physics engine on such limited hardware was that it used lookup tables for physics instead of actually calculating velocity. My assembly-fu is weak but it looks like your link is to the section that contains all the lookup tables. I think JumpMForceData for example is a series of offsets for each successive frame after you hit the jump button.

https://gist.github.com/1wErt3r/4048722#file-smbdis-asm-L611...

This line shows the calculation that makes use of the JumpMForceData.


When I wrote Atari 800 Donkey Kong, I used 16-bit precision arithmetic for the motion, with pretty good results. All the jumping was done with a horizontal velocity, a vertical velocity and a vertical acceleration. A jump is just setting Mario's vertical velocity to some value, and you need to do floor collisions (but you want those anyway, so other objects can interact with floors).

There was a funky "bounce" at the edge of the screen in the arcade version that I captured pretty well by reversing X and Y velocity on a wall collision; this could not have been done well with a lookup table. It wasn't that much code, and the simple parameter space made playability tuning really easy.


You can also fine-tune the feel of a jump when you're directly editing a handful of values vs trying to find a function that describes your desired results.


How so? "Trying to find a function" is editing a handful of parameters to a polynomial+exponential model -- the same thing.


When you edit one coefficient of a polynomial, you change its behavior everywhere. When you change one value in a lookup table, you're only changing the value for one point in time.


I'd say a lookup table is more easily edited than a parameter in a function (provided that the function has fewer parameters than the lookup table has entries).


If they used tables, it was probably to account for a bunch of edge cases and to fine-tune the jumps to feel right and controllable. Counter-intuitively, plain inertial physics doesn't seem to feel very good in platform games. But inertial physics is pretty cheap to implement on the 6502 using fixed-point arithmetic.


Lookup tables were indeed a common technique used by games in the past.


And present, too, right? It's not the same reason as it would have been in the 80s, but today in performance critical code it is not uncommon to reduce the number of conditionals for better CPU pipelining, and lookup tables are a very common tool for this.


> it is not uncommon to reduce the number of conditionals for better CPU pipelining, and lookup tables are a very common tool for this.

On modern CPUs, data dependencies such as lookup table accesses often cause pipeline stalls — i.e., worse pipelining.

L1 cache is at a premium as well, you rarely want to waste it to access LUTs.

You can compute a lot in the 12 cycles an L2 hit (L1 miss) costs. In theory up to 32 * 12 = 384 floating point operations.


To be fair, replacing a series of ALU ops with a lookup table doesn't usually add a "data dependency" - the data dependency probably already existed, but perhaps flowed through registers rather than memory.

What adding a lookup table can do is to add the load-latency to the dependency chain involving the calculation, which seems to be what you are talking about here. For an L1 hit that's usually 4 or 5 cycles, and for L2 hits and beyond it's worse, as you point out. How much that actually matters depends on whether the code is latency-bound and the involved lookup is on the critical path: in many cases where there is enough ILP it won't be (a general rule is that in most code, most instructions are not on a critical dependency chain).

If the involved method isn't that hot then L1 misses (like your example) or worse are definitely a possibility. On the other hand, in that case performance isn't that critical by definition. If the method is really hot, e.g., in a tight(ish) loop, then you are mostly going to be getting L1 hits.

The comparison with 384 FOPs seems a bit off: I guess you are talking about some 32-FOP-per-cycle SIMD implementation (AVX512?) - but the assumption of data dependencies kind of rules that out: one would assume it's scalar code here. If it's vectorization, then the whole equation changes!


> To be fair, replacing a series of ALU ops with a lookup table doesn't usually add a "data dependency"

If it's not vectorizable, LUT result is often used for indirect jump/call (like large switch statement) or memory access (say, a histogram etc.).

> What adding a lookup table can do is to add the load-latency to the dependency chain involving the calculation, which seems to be what you are talking about here.

Yeah, I used a bit of sloppy terminology. Loads can affect performance system-wide. To be a win, the function the LUT replaces needs to be something pretty heavy, while the LUT itself needs to be small (at least <16 kB, preferably <1 kB).

> If the method is really hot, e.g., in a tight(ish) loop, then you are mostly going to be getting L1 hits.

That really depends. It's generally good to keep the L1 footprint small. There are just 512 64-byte L1 cache lines. The hyperthread shares L1 as well. There can be other hot loops nearby that could also benefit from a hot L1. It's very easy to start to spill to L2 (and further). Microbenchmarks often miss "system" level issues.

> The comparison with 384 FOPs seems a bit off

384 was for the extreme vectorization case, 12 x 2 x 8 FMACs (AVX). Most vendors count FMACs nowadays as two FOPs...

> If it's vectorization, then the whole equation changes

Well, isn't that where the performance wins are and what you need to do to extract maximum performance from that hot loop? A good truly parallel vector gather implementation could make (small) LUTs very interesting performance wise.


> If it's not vectorizable, LUT result is often used for indirect jump/call (like large switch statement) or memory access (say, a histogram etc.).

You've lost me. The parent comments were specifically talking about using LUTs to replace calculation, and particular calculations involving branches. So basically rather than some ALU ops + possibly some branches, you use a LUT and fewer or zero ALU ops, and fewer or zero branches.

No one is talking about the context of a LUT of function pointers being used for a switch statement.

A histogram is generally a read/write table in memory and doesn't have much to do with a (usually read-only) LUT - unless I missed what you're getting at there.

> Yeah, I used a bit of sloppy terminology. Loads can affect performance system-wide. To be a win, the function the LUT replaces needs to be something pretty heavy, while the LUT itself needs to be small (at least <16 kB, preferably <1 kB).

Definitely. Useful LUTs are often a few dozen bytes: the JumpMForceData one referred to above is only 8 bytes!

> Microbenchmarks often miss "system" level issues.

Definitely.

But it's like a reverse myth now: at some point (apparently?) everyone loved LUTs - but now it's popular to just dismiss any LUT use with: "yeah but a cache miss takes yyy cycles which will make a LUT terrible!" or "microbenchmarks can't capture the true cost of LUTs!".

Now the latter is certainly true, but you can certainly put reasonable bounds on the cost. The key observation is that the "deeper" the miss (a miss to DRAM being the deepest, ignoring swap), the lower the implied frequency at which the LUT-using method was being called anyway. If the method is always missing to DRAM, the LUT entries are being evicted before the next invocation (that hits the same line), which must take a while. Conversely, when the LUT method is very hot (a high frequency of calls), the opportunity for the LUT to stay in cache is great.

You can even analyze this more formally: basically looking at the cost-benefit of every line of cache used by the LUT: if the LUT didn't use that line, what would be the benefit to the surrounding code? For any non-trivial program this gets hard to do exactly, but certainly with integration benchmarks and performance counters you can make some reasonable tests. Associativity makes everything tougher and less linear though...

> Well, isn't that where the performance wins are and what you need to do to extract maximum performance from that hot loop?

Sure, if it can be vectorized. My point there was that you were discussing the latency of L2 misses rather than the throughput, which implies that the operations on consecutive elements were dependent (otherwise it would be the throughput of 1 or 2 per cycle that would be important). So you just have to keep it apples to apples: if you assume independent ops, you can perhaps vectorize, but then the LUT comparison is a throughput one (as is the scalar alternative); if the code is dependent and non-vectorizable, then the LUT latency becomes more important.

> A good truly parallel vector gather implementation could make (small) LUTs very interesting performance wise.

Anything that can be vectorized usually adds another factor of 4 or 8 to performance, making it much harder for LUTs to come out on top, since the gather implementations on x86 anyway just use the same scalar load ports and are thus still limited to 2/cycle, without any "smarts" for identical or overlapping elements.

Sometimes you can use pshufb to do what amounts to 16 or 32 parallel lookups in a 16-element table of bytes. If your LUT can be made that small, it works great.


I think the biggest difference is that modern games tend to be written for variable frame rates, stuffing floating point time deltas through equations.

A lookup table meshes much better with fixed frame rate gameplay, either with one entry per frame, or quantizing countdown timers of how many frames to wait to go to the next state.


Actually, modern (and not so modern) physics engines as used in games generally use a fixed time delta for each step, and just iterate faster/slower to keep the simulation in sync[1]. This is done for many reasons, but predominantly numerical stability.

[1] not the full story


I have a strong feeling that Super Mario Maker always has the same physics engine going, but just has four different lookup tables that it switches between depending on the level theme. Anyone want to partially disassemble it for comparison?


Super Mario Maker uses "New SMB" physics for all themes. It doesn't actually mimic the exact physics of each game (aside from small tweaks like not letting you wall-jump or carry shells depending on the theme). They did this because newer players found it super confusing to go back to the older physics.

From Takashi Tezuka:

> “In the end we used the New Super Mario Bros. U system for all of the game styles. There was quite a lot of discussion about this within the team. Staff who had strong attachment to the original games expressed a strong desire to see implemented the same system they remembered. However, when players who are used to the modern Mario physics tried playing with the old physics, they found it much more difficult than they remembered."

People on reddit have done some pretty extensive breakdowns of the physics of different Mario versions vs. SMM:

- https://imgur.com/XC38rcX

- https://www.reddit.com/r/MarioMaker/comments/4iqa5s/super_ma...


Yes, I recall a Forza Motorsport physics guy saying they used a simple lookup table for the chart which holds the curve for the limit of grip on a tyre. Beats the crap out of calculating it every time.


Thank you. I wrote a long post but then I decided not to post it...

Always remember to view past "Tricks" from the era they were written in.

Lookup tables, in the 90's, for example, were used EVERYWHERE. People always used them for sin/cos/tan functions. Fixed-point math was very common as well. Nowadays it may seem like "magic", but it's not. It was just "the way" to get things done.

I'm not downplaying the skill of the programmers of those eras; I'm just keeping it in perspective. Many "tricks" weren't invented for that one game, they were commonplace for all programmers using those platforms.

I used to use compiled sprites all the time in the 90's. They seem like magic today, but they were just another kind of drawing to us. In Allegro 4, you could even draw one by simply calling a "make compiled sprite" function and making sure to not draw it outside of the clipping rectangle (since they can't be clipped). That was it. You build it with a builder function, and then you call draw on it. But it "seems" insane nowadays to convert the bits of a bitmap into raw machine code that draws those bits to save some CPU cycles.


I was going to respond to an earlier comment of yours, decided I couldn't quite figure out what to ask, and so kept reading. But after this comment I think my question is a bit clearer: please write longer posts or start a blog with stories of the kind of work you were doing > 10 years ago. It's fascinating, and you're a good writer!


I may be missing something, but how is that not "actually calculating velocity"?


That is, instead of using y velocity = acceleration * time and then adding in gravity, they made something like this, as I understand it:

speeds = [0, 1, 2, 4, 8, 4, 2, 1, 0]

And then you keep track of how many frames it's been since you jumped, and then just do

yVelocity = speeds[ticksSinceJump]

No math needed at all.


The second game, melee, definitely had a more traditional engine running it, too many ~bugs~ fantastic features to have been taking advantage of lookup tables.


Are you thinking of Super Smash Bros.? This is Super Mario Bros. for the NES, over a decade older :)


I actually wonder if a version of Smash Bros could have worked on NES with lookup tables for physics? The existing games are 3d but the physics are all 2d.


Yes I read super and bros and didn't read the middle.


I'll plug the book http://www.game-feel.com/ here because it has a 28-page chapter devoted to the physics of SMB.


Not platformers I realize, but lots of outer space games with actual acceleration/velocity calculations preceded SMB. Off the top of my head: Lunar Lander (1979), Asteroids (1979), Defender (1981), Gravitar (1982), Sinistar (1982). Just want to keep the history straight. Several of these modeled gravity as well.


Used to have a Gravitar clone for DOS. It was one of the harder games I've ever played :)


I wrote this earlier on another forum but I'll repost it here:

I've seen Shigeru Miyamoto speak at several game developer conferences over the years. He's absolutely brilliant, a really nice guy, and there's so much to learn by studying his work and listening to him talk. Will Wright calls him the Steven Spielberg of games.

At one of his earlier talks, he explained that he starts designing games by thinking about how you touch, manipulate and interact with the input device in the real world, instead of thinking about the software and models inside the virtual world of the computer first. The instantaneous response of Mario 64 and how you can run and jump around is a great example of that.

Shigeru Miyamoto GDC 1999 Keynote (Full): https://www.youtube.com/watch?v=LC2Pf5F2acI

At a later talk about how he designed the Wii, he said that he now starts designing games by thinking about what kind of expression he wants it to evoke on the player's faces, and how to make the players themselves entertain the other people in the room who aren't even playing the game themselves. That's why the Wii has so many great party games, like Wii Sports. Then he showed a video of a little girl sitting in her grandfather's lap playing a game -- http://youtu.be/SY3a4dCBQYs?t=12m29s , with a delighted expression on her face. The grandfather was delighted and entertained by watching his granddaughter enjoy the game.

This photo -- https://i.imgur.com/zSbOYbk.jpg -- perfectly illustrates exactly what he means!

Shigeru Miyamoto 2007 GDC Keynote - Part 1: https://www.youtube.com/watch?v=En9OXg7lZoE

Shigeru Miyamoto 2007 GDC Keynote - Part 2: https://www.youtube.com/watch?v=jer1KCPTcdE

Shigeru Miyamoto 2007 GDC Keynote - Part 3: https://www.youtube.com/watch?v=SY3a4dCBQYs

Shigeru Miyamoto 2007 GDC Keynote - Part 4: https://www.youtube.com/watch?v=jqBee2YlDPg

Shigeru Miyamoto 2007 GDC Keynote - Part 5: https://www.youtube.com/watch?v=WI3DB3tYiOw

Shigeru Miyamoto 2007 GDC Keynote - Part 6: https://www.youtube.com/watch?v=XvwYBSkzevw

Shigeru Miyamoto Keynote GDC 07 - Wife-o-meter: https://www.youtube.com/watch?v=6GMybmWHzfU


The Miyamoto approach of starting with a desired emotion and working backward toward a design is profound. This is radically different from most things I've read which involve cramming emotion into existing designs. This changes everything for me. Thanks for sharing.


Ben Fry, the creator of Processing, created a beautiful visualization of the disassembled machine code of Super Mario Bros with arrows representing jump instructions:

http://benfry.com/dismap/mario-large2.jpg

via: http://benfry.com/dismap/

I've always loved this as a visualization of machine code. It's simultaneously amazing how complex it is and yet how simple it is when you consider that it represents the entire game all in one graphic. This is from 2007 I believe. I love that 10 years later the machine code is now fully annotated and understood.


Wow, that is amazingly complex! It's beautiful, in a way.

I tried to see if there was something very central, called from lots of places, but there doesn't seem to be any such place. There are a few, but not something singular that stands out as completely dominant.

Long jumps get a more dominant visual appearance than short jumps. Would be interesting to see what it would look like with the address printed in size proportional to eg how many places call it, or how CPU intensive that subroutine is, etc.


Also check out the other example dismap shows: Excitebike! Much fewer and longer subroutines. Interesting, I wonder if that in turn is mostly a result of developer coding style differences, or a result of game mechanics being different.


Check out the section under "DemoActionData": this is where it stores (and plays) the demo you see when you don't push Start and Mario runs around of his own volition.

It just simulates player input and runs it through the regular game engine. (The alternative, playing a recorded video, would have been laughably data intensive.)


The same thing goes for Super Mario 64. The popular TASer pannenkoek actually explored whether it was possible to manipulate demo-Mario's starting position in such a way that he collects a star with the demo input. (This was for the purposes of special "A-Button Challenge" speedruns, where pressing the A button must be kept to a minimum; since the demo input isn't actual player input, it isn't counted.) Sadly, I think there was no conceivable way to do it: manipulating the starting position is possible, but not in a way that leads to collecting a star.

https://youtu.be/-0emgkIEobI


That's an amazing YouTube channel. Here [1] the creator describes, in a video over seven minutes long, the intricacies of Mario falling asleep.

[1] https://www.youtube.com/watch?v=7OtW-LLZ2OA


That's nothing! He has two videos on walls, floors and ceilings, each longer than half an hour (and both of them extremely interesting)

Part 1: https://www.youtube.com/watch?v=UnU7DJXiMAQ

Part 2: https://www.youtube.com/watch?v=f1kbABTyeo8


I remember using this disassembly many years ago when writing a little NES emulator. Having a reference available for a popular game is incredibly useful.

Here's one of my favorite parts: https://gist.github.com/1wErt3r/4048722#file-smbdis-asm-L942

The byte here is for the BIT instruction, but why is it just a lonely byte? Well, the BIT instruction in this case also consumes the two following bytes. When the game processes that instruction, the `ldy #$04` is swallowed up as part of the BIT instruction, effectively skipping over it. IIRC this was a pretty common trick among 6502 programmers. It allows you to jump ahead over the next (2-byte) instruction with just a single byte!


You might also be interested in the disassembly of the first Pokémon games:

https://github.com/pret/pokered

And some other pokemon games:

https://github.com/pret/pokered#see-also


I loved this article regarding the algorithm used for capturing pokemon: http://www.dragonflycave.com/mechanics/gen-i-capturing


Good gosh, that was a long, fun read.


There's also a Legend of Zelda disassembly, but it's not as nice:

https://github.com/camthesaxman/zeldasource

It's a much larger game, though.

I haven't checked, but I'm assuming that a lot of the giant chunks of statically-defined data are just graphics and audio (unlike many games, LoZ stored graphics interspersed with the program code and copied data over to an in-cartridge RAM chip, instead of storing the complete graphics data in a ROM chip.)


> unlike many games, LoZ stored graphics interspersed with the program code and copied data over to an in-cartridge RAM chip, instead of storing the complete graphics data in a ROM chip

This is because Zelda 1 was a port from the Famicom Disk System—no memory-mapped ROM chip to rely on, so you've got to load everything you're going to use to RAM. (Also like this: Metroid.)

I believe this is why both LoZ's and Metroid's maps are built out of individual "screens" with a "pause to transition" effect between them: in the FDS version, the game would be reading the new map from disk, and there'd (sometimes, if the load took long enough) be a loading screen involved. (You can see the screen for LoZ here: http://tcrf.net/The_Legend_of_Zelda/Console_Differences#Load...)


Cool, I never dug in to see why they were that way, but those were both test cases when I built an NES emulator.


How I love the sleek smooth razor sharp columns of three letter 6502 opcodes. The right edge of columns of opcodes in other instruction sets look so rough and jagged like sandpaper in comparison. That's what I've always hated about x86 code. It looks rough and torn.


Nice observation. It's been 35+ years since I last did any 6502, and this morning teenage-me looked at those opcodes and smiled a little.


Master list of SMB glitches makes a nice companion to this. Sorry in advance if it results in anyone staying up well past their bedtime ;)

https://www.mariowiki.com/List_of_Super_Mario_Bros._glitches


It'd be great if someone who knows the disassembly well enough could explain some of these glitches, for example the Minus World[0]. I guess the logic (or bug) that activates it appears somewhere in the disassembly.

[0] https://www.mariowiki.com/Minus_World



I love that this is just a gist. Like 'hey just needed to copy and paste this for a minute'


In case anyone else is trying to re-assemble this into a working game, here's one way to do it :)

https://github.com/x3ro/super-mario-bros-build


This seems like a really impressive effort to make sense of all this!

Would the original game have been written in assembly? And if so, would the source have looked similar to this?

Having never touched assembly language (aside from learning some very basic cracking many years ago swapping JE for JNE in the serial check routine, haha), it seems like a true dark art to me, so I’m really curious to know!


Yes, all old NES games were written in 6502 assembly (the assembly language of the NES's 6502-family processor), and even most games into the Super Nintendo and Game Boy days were written in assembly language.

The source would've looked very similar to this, although I can assume the original labels would've been in Japanese. The difficulty in creating a disassembly like this isn't converting the machine code back into assembly, which can be done rather simply, but instead re-adding all the label names, which are lost when the game is built. It's quite the undertaking, and the author must know the complete game back to front.


For another interesting read there's the guy that disassembled Robotron, and traced the code out by hand across 512 printed pages of assembly and fixed 2 long-standing bugs.

http://www.robotron2084guidebook.com/technical/christianging...


I imagine the original labels would have been in English. The assembly source code I've seen for Japanese games has variables and labels in English with Japanese comments in Shift-JIS. I would guess the choice was forced because the assembler, linker, debugger or other tools did not support Shift-JIS properly. Often labels are restricted to 6 bytes which would be 3 Japanese characters. Perhaps such a limit was also a factor.


The first console game I can think of that was written in C was Sonic Spinball. Compared to other Sonic games, the jank was palpable. But the dev team was way behind schedule and behind the 8 ball; switching to C helped them crank out the engine much faster.


Yes I believe on the Genesis most games were written in assembly as well. Sonic Spinball was written in a “high level language” (i.e. C) and so it only runs at 30 FPS instead of 60.


Yes, the original NES games were written in assembler. However, the source probably didn't look quite like this. I'd guess the assembler of the time didn't have as many features or things like long label names.

Here's some actual Atari 7800 (a less popular console from the same generation) code that was found on disks in a dumpster when some Atari offices closed. They both use a 6502-based CPU but have very different sound/graphics chips. I'd bet the NES code looked a lot more like this - https://github.com/OpenSourcedGames/Atari-7800


> Would the original game have been written in assembly? And if so, would the source have looked similar to this?

Yes, and probably somewhat. The programmers were Japanese, so variable names would almost certainly be different, and the assembler this code was written for is actually for the CPU in the Super Famicom/SNES, so although I'm not sure when the assembler was written, it certainly wasn't around in 1984 when this game was being written. I think that the notation style is based on Nintendo's system development documentation, though.

The NES would actually be a good place to look at some assembly, at least to get a basic idea of how it works. There aren't many operations, and they're pretty easy to understand. There are only a few registers, and no layers of historic cruft on top of it. The same (well, very close) CPU was used in a lot of computers from the same era: https://en.wikipedia.org/wiki/MOS_Technology_6502#Computers_...


> the assembler this code was written for is actually for the CPU in the Super Famicom/SNES

Super Mario Bros is an NES game.


I know that; I played that game for years before the SNES came out, but the assembler supports the Ricoh 5A22 that was used in the SNES. The Ricoh 2A03 used in the NES is basically a subset of the SNES' processor.

I was commenting on the fact that since the assembler itself supports the 16-bit variants of the processor family, it couldn't be the same assembler that would have been used to build the original NES code, so the exact assembly language used might not be a precise match either.


Yes, the original game would have been written in 6502 assembly, and probably would have looked something like this. The label names and defines would have been different, since the ones in this file are the interpretation of the person who did the disassembly.


I also have a makefile and game genie code generator for this on github: https://github.com/nwoeanhinnogaehr/smb-assembler


What is the goal of a project like this?

I'm absolutely not meaning that as a pejorative. I am just curious if there's an actual goal here. Is the goal to make a faithful reproduction, historical reference, a hacking challenge, or? Is it just curiosity?

I see lots of links in the thread, many for different games. Curiosity is as valid a reason as any other, but is there some sort of end result trying to be had? The various links don't actually seem to enumerate this very well, unless I missed it.


It has many uses really. It's a nice resource for learning assembly, especially 6502 assembly; it's a good reference for creating NES games, whether those are original titles or simply romhacks; it's also a great reference for building an emulator for the platform or a level editor for the game. And of course, disassembling a game and documenting the source code is a great hacking challenge that will leave you confident that you know asm. So, all the reasons that you listed, I suppose.


I can also understand a 'just because' answer. I was just curious if there's a greater goal. I'm not a gamer but I do kind of pay attention to some aspects.


I love stuff like this. Seeing the disassembly somehow adds to the nostalgia for a childhood pastime.

What's this endlessloop for: https://gist.github.com/1wErt3r/4048722#file-smbdis-asm-L712

(Yes, please ;)


You can see that just above the endless loop, the code will "enable NMIs". I'm not sure about the nomenclature here (because normally NMI stands for non-maskable interrupts, meaning you can't disable them) but basically the game at this point becomes event (interrupt) driven, probably from the vertical retrace interrupt or another kind of timer interrupt. When no event is being handled, the CPU idles within this endless loop.


https://wiki.nesdev.com/w/index.php/PPU_registers#Controller... states that bit 7 of the register at $2000 (which they call PPUCTRL, and this disassembly calls PPU_CTRL_REG1) will "generate an NMI at the start of the vertical blanking interval"


For people interested in the SMB disassembly, you may also be interested in this one: http://bisqwit.iki.fi/jutut/megamansource/ NES MegaMan disassembly, with some comments


Looks like data for the various songs here:

https://gist.github.com/1wErt3r/4048722#file-smbdis-asm-L160...

I'd love to see the process of extracting actual audio from that.


The NES had a memory-mapped APU[1], so the game just sets sound registers to play the appropriate notes, and ticks down a timer until it's time to switch to the next note: https://gist.github.com/1wErt3r/4048722#file-smbdis-asm-L156...

[1] https://wiki.nesdev.com/w/index.php/APU


There are tools that can extract audio from SNES ROMs; I'm sure there's something similar for the NES.


Lots of retro game reverse engineering stuff on HN lately; please continue this trend :-)


I'm not familiar with the history of this work, so I'll just ask: is this the assembly code that was written by the original dev team, or were the comments reverse-engineered from a disassembly of the game's machine code?


It's a "disassembly", so it's the work of somebody converting machine code back into assembly, and studying it long and hard to add comments, break out numbers into sensibly named constants, and choose decent names for loop labels. The original dev team was almost certainly not involved.


Thanks for the clarification. Given what you've said and the size of the thing, this is an awesome piece of work. The level of detail and perseverance required is enormously impressive.


The entire game in 16,000 lines of assembly code. :-)


I love how the ‘author’ expresses his thanks for everyone and his dog, except for the people at Nintendo who actually wrote the code.


Is the gameplay logic in a different file or did i miss something?


It's all in there. Probably start with PlayerCtrlRoutine.


This is insane.


Tu tu tu turuturu turuturutururur...


Is there a version of this somewhere that people have changed it to C or some other higher level language for easier skimming? I’d love to see that.

(Yes, I know it was written in ASM).



