Unlike a disassembly of code that is written in a higher-level language like C, these old NES and Gameboy games were written in the assembly language you're seeing disassembled. Obviously there are no comments and labels, but the actual logic and the intent of the original developer is clearly communicated.
This allows for a really special type of awesome when you're working on fan translations and ROM hacks. You actually have the opportunity to analyze the work of the developers you idolized when you were younger, and celebrate their clever hacks or curse them for their spaghetti code, 15 to 25 years later. Furthermore, you can contribute your own clever hacks to the code base.
What's always amazed me about the 8-bit and 16-bit console days is the relatively high quality of the code. It might be ugly, but it's relatively defect free for the most part. My understanding is that Nintendo did significant QA testing before providing their "Seal of Approval" and allowing the game to be shipped.
I know there are a few glitches and bugs, but it's still amazing to me that I can't think of a single 8 or 16 bit game published on Nintendo or Sega's consoles that had game killing bugs during normal gameplay. They didn't lock-up, reboot the system, glitch out, etc. Even when monkeying around with the weird glitches, the games stayed pretty rock solid.
I know today's games are incredibly more complex, but I don't think they go through the rather rigorous gameplay testing of those old days.
(then again, it was every kid's dream to be a gameplay tester, until they found out it was a minimum wage job that involved following a specific script to test a sprite collision bug on level 5 of Yo! Noid for 8 hours straight)
That seems pretty small by the standards of today's mega-sized binaries. But for an assembly program that seems very big. It is impressive that a large part of my childhood was written in assembly.
There was an interesting talk on this at Linux.conf.au a couple of years ago, trying to figure out why modern binaries are so large. http://www.youtube.com/watch?v=Nbv9L-WIu0s
> Bloat: How and Why UNIX Grew Up (and Out) - Rusty Russell, Matt Evans
> The 'ls' binary on the original release of Unix (version 6) was 4920 bytes long. Thirty six years later, 'ls' on Ubuntu is 105776 bytes. Is this the laziness of modern coders? Increasing features? Does 'cat' really now do 313 times more stuff, or is there something else going on?
OT: I find it slightly disturbing to read people's Google+ conversations that were imported to extend the YouTube thread. I hope YouTube/Google isn't really buying Twitch, despite the return that would mean for the Twitch investors.
It depends on the exact proficiency of the programmers involved and the compiler/platform/etc., but it's not unusual to have an order of magnitude difference or more between the same application written in Asm vs. a high-level language like C.
People have tried to rewrite standard *nix utilities in Asm; one example is Asmutils[1][2] which certainly illustrates the "order of magnitude or more" size difference. If you're curious about extreme size optimisation there are many cool examples of that in the demoscene, like 96KB games[3] and all the 4k/64k intros.
Must do, the art is all very compact. 2 bits per pixel - so 4 colour palette or 2-bit grayscale in the first game. All you need is black, dark grey, light grey and white and you have Pokemon Red! Music is also pretty small, as it's synthesised you don't need to store samples, you can loop repeated sections to save space and so on.
This is the entire overworld for Kanto, for instance.
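To make the "2 bits per pixel" point concrete, here is a rough Python sketch of decoding one Game Boy tile. It assumes the usual 2bpp layout (16 bytes per 8x8 tile, two bytes per row, first byte the low bit plane, second the high bit plane, bit 7 the leftmost pixel). The function names and the ASCII shades are my own, purely for illustration.

```python
def decode_tile(tile_bytes):
    """Decode one 16-byte 2bpp tile into 8 rows of 8 colour indices (0-3)."""
    if len(tile_bytes) != 16:
        raise ValueError("a 2bpp tile is exactly 16 bytes")
    rows = []
    for y in range(8):
        low, high = tile_bytes[2 * y], tile_bytes[2 * y + 1]
        row = []
        for x in range(8):
            bit = 7 - x  # bit 7 is the leftmost pixel
            row.append((((high >> bit) & 1) << 1) | ((low >> bit) & 1))
        rows.append(row)
    return rows


# Arbitrary display choice (an assumption): index 0 is lightest, 3 is darkest.
SHADES = " .o#"


def render(tile_bytes):
    """Return the tile as 8 lines of ASCII art."""
    return "\n".join(
        "".join(SHADES[p] for p in row) for row in decode_tile(tile_bytes)
    )
```

So a whole 8x8 tile costs 16 bytes, and a map is then just a grid of one-byte tile indices on top of that.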
And as the person below also mentioned.. No, the credits for Pokemon are extensive. There may only have been four coders, but there were level designers, sprite designers, etc, etc.
This is a pretty common technique in video games, generally referred to as "data driven design". Basically, put as many of the runtime decisions as possible into data values that designers can tweak without having to ask an engineer each time.
Final Fantasy for the NES seems to be designed the same way. A lot of the spells are actually "call function A with strength N, call function B with strength M and flag X". And with a large enough vocabulary of functions and flags, you get fire spells, healing spells, instant death spells, protection spells, etc.
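A toy version of that idea in Python, in case it helps. Everything here (effect names, spell names, numbers) is made up for illustration and is not taken from the actual game data; the point is only that a spell becomes a list of rows and the code is a handful of small functions.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    hp: int
    max_hp: int
    status: set = field(default_factory=set)


def damage(target, strength, flags):
    target.hp = max(0, target.hp - strength)


def heal(target, strength, flags):
    target.hp = min(target.max_hp, target.hp + strength)


def inflict(target, strength, flags):
    target.status |= flags


# The small "vocabulary" of functions that the data can refer to.
EFFECTS = {"damage": damage, "heal": heal, "inflict": inflict}

# Each spell is pure data: a list of (effect, strength, flags) rows.
# Hypothetical values, not the real game's tables.
SPELLS = {
    "FIRE": [("damage", 20, frozenset())],
    "CURE": [("heal", 16, frozenset())],
    "VENOM": [("damage", 5, frozenset()), ("inflict", 0, frozenset({"poison"}))],
}


def cast(spell_name, target):
    """Apply every effect row of the named spell to the target."""
    for effect, strength, flags in SPELLS[spell_name]:
        EFFECTS[effect](target, strength, flags)
```

Adding a new spell means adding a row to SPELLS, which is exactly the part a designer can tweak without touching the code.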
It's also common in embedded systems since it makes for extremely compact code. The data basically gets used as "instructions" for an application-specific virtual machine/interpreter (which may or may not be Turing-complete - most usually aren't.)
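Here is a minimal sketch of what "data as instructions" can look like, in Python. The opcodes and their encoding are invented for this example. There are no jumps or loops, so every script terminates, which is one way such an interpreter ends up not being Turing-complete.

```python
# Hypothetical opcodes for a tiny event-script interpreter.
OP_END, OP_MOVE, OP_WAIT, OP_SAY = 0x00, 0x01, 0x02, 0x03

# Direction byte -> (dx, dy); an assumed encoding.
DIRS = {0: (0, -1), 1: (0, 1), 2: (-1, 0), 3: (1, 0)}


def run(script, texts):
    """Interpret a byte string of commands and return the final state."""
    state = {"x": 0, "y": 0, "frames": 0, "log": []}
    pc = 0
    while pc < len(script):
        op = script[pc]
        if op == OP_END:
            return state
        if op == OP_MOVE:  # operands: direction, steps
            dx, dy = DIRS[script[pc + 1]]
            steps = script[pc + 2]
            state["x"] += dx * steps
            state["y"] += dy * steps
            pc += 3
        elif op == OP_WAIT:  # operand: frame count
            state["frames"] += script[pc + 1]
            pc += 2
        elif op == OP_SAY:  # operand: index into the text table
            state["log"].append(texts[script[pc + 1]])
            pc += 2
        else:
            raise ValueError(f"unknown opcode {op:#04x} at offset {pc}")
    raise ValueError("script ran off the end without OP_END")
```

An eleven-byte script like bytes([OP_SAY, 0, OP_MOVE, 3, 2, OP_MOVE, 1, 1, OP_WAIT, 30, OP_END]) says a line, walks two steps right and one down, then waits 30 frames.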
Would you care to comment on what tools are popular for disassembling these ROMs? I think it's a pretty neat hobby. Is there anywhere you'd suggest to read more about it?
This made me wonder if it's possible to automatically reverse engineer a small binary file into human readable (and understandable) source code.
Assuming you know the language and compiler used (and all of its quirks and optimizations), and considering that human written programs aren't so random and their patterns are most likely predictable, I think it should be possible though not at all trivial.
Are there any projects attempting this?
Labels and comments go a long way to making an assembly project readable. I don't know how an automatic tool could interpret the human intention behind a label.
It's not that difficult, but I think the main obstacle is gathering and representing the collection of knowledge in a useful form.
There was a disassembler called Sourcer that would annotate code with a set of predetermined comments based on its knowledge of the PC hardware. For example a sequence of instructions that enabled the interrupt controller by setting a specific bit in its register would be identified and get the comment "enable interrupt controller". I seem to remember IDA can do the same thing, although it's been a while since I last used it.
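The general idea of annotating from hardware knowledge can be sketched in a few lines of Python. This is not how Sourcer or IDA actually store their rules; the instruction tuples and the rule table are my own simplification. The two port/value pairs are standard PC ones: port 20h is the master interrupt controller's command port and 21h is its mask register.

```python
# (port, value) -> comment. A tiny, hand-written rule table (an assumption,
# not Sourcer's real database).
KNOWN_PORT_WRITES = {
    (0x20, 0x20): "send end-of-interrupt to the interrupt controller",
    (0x21, 0x00): "unmask all IRQs on the interrupt controller",
}


def fmt(operand):
    """Format an operand: integers as hex with an 'h' suffix, names as-is."""
    return f"{operand:02X}h" if isinstance(operand, int) else str(operand)


def annotate(instructions):
    """instructions: list of (mnemonic, dest, src) tuples; returns text lines."""
    lines = []
    al = None  # last constant loaded into AL, if known
    for mnemonic, dest, src in instructions:
        comment = ""
        if mnemonic == "mov" and dest == "al":
            al = src if isinstance(src, int) else None
        elif mnemonic == "out" and src == "al" and isinstance(dest, int):
            comment = KNOWN_PORT_WRITES.get((dest, al), "")
        elif dest == "al":
            al = None  # anything else writing AL invalidates the tracked value
        text = f"{mnemonic} {fmt(dest)}, {fmt(src)}"
        lines.append(f"{text:<20}; {comment}" if comment else text)
    return lines
```

The hard part, as said above, is not this loop but collecting enough rules for it to be useful.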
The Hex-Rays Decompiler attempts to do this for analysis purposes (that is, the resulting code is intended to be human-readable, but not necessarily compiled): https://www.hex-rays.com/products/decompiler
Unfortunately, I've not sprung for a license, so I can't comment on its efficacy.
Same thing is being done for Rollercoaster Tycoon 2 right now, though it patches the original .exe to use a .dll file that overrides the original ASM addresses and calls with C functions. I really, really like the maintainers' gradual replacement approach.
Wow, thank you so much for mentioning this project (never heard of it before). I absolutely loved RC2 when I was younger and if I can get this to run with the GOG version I will certainly do my share to bring it back as open source.
Many famous Nintendo games have been publicly disassembled before, and Nintendo is either unaware or turns a blind eye. The only different thing about this disassembly is that it is hosted on GitHub, where it is much more likely to be seen by people that aren't already interested in ROM hacking and retroprogramming.
I knew that the Gameboys were Zilog machines, but I never really put any thought into game development. I imagined they used a higher level language during actual development and ran the code through something that would result in the assembly for use in whatever. Pretty cool stuff.
General purpose high level languages[1] didn't become practical in commercial game development[2] for even what we would consider to be "non-critical" code until Doom, the PlayStation era for consoles, and the Game Boy Advance era for handhelds. Pretty much everything produced for the systems before them was done completely in assembly, and the practice remained relatively common for a while afterward.
This is partially because C compilers of the time weren't that great, partially because "everything is a critical section" when you're pushing the limits of extremely limited hardware, and partially because C's virtual machine model does not fit the average 8-bit CPU architecture at all, making it pretty much impossible to this day to compile regular C code to (say) 6502 machine code that is anywhere close to optimal.[3]
[1] Ok, many games had custom bytecode interpreters, like SCUMM or the typical RPG's textbox language, but that's not quite the same thing. They could have used FORTH more (were there any games that used FORTH?), but I guess even in the days of HP calculators, people hated RPN.
[2] I guess I'm ignoring the many games done in BASIC for the 8-bit micros, but there's a clear difference in quality between these games and the average NES action game.
[3] There is a C compiler for various 8-bit architectures called cc65, but it chokes on anything resembling modern C. To get anywhere close to the performance of even naively written assembly, you have to hold its hand with machine-specific annotations (like "stick this variable in the zero page," or "make every variable static because the machine has no stack-relative address modes") to the point that it's easier to just write in assembly from the start.
Actually Pokémon Crystal itself, the rom we were discussing, has an extensive higher-level scripting language built-in for events, messages and movement. It still looks very much like assembly, but it is built up of Pokémon commands instead of machine code.
Not truly zilog z80. The GB processor lacks the IX/IY registers and associated addressing modes of the z80, and it adds a few instructions of its own. It's closer to the i8080. The GB did use the z80 assembler syntax (thankfully!).
Some people still do z80 development. The community around TI graphing calculators has lots of people still doing wild stuff in z80 assembly. It's a fun platform.
I flagged this story because I don't like to see blatant copyright violations on HN. Instead, I would have loved to see a tool that automatically disassembles a user-provided ROM to the equivalent of this repo.
I did see the commit log. I believe the project could instead check in symbol names, data structures, and comments that tag addresses in the ROM. You would also want to group those symbols into logical translation units. The copyright status on this repo may still be murky (are the comments and symbol names derived works?) but much better than the current situation where the copyrighted ROM can be rebuilt from the repo.
Edit: I believe you can do something similar with IDA database dumps. The database doesn't contain the original executable image but instead contains a log of the IDA commands applied to the imported image. Another user can import their executable image and "replay" the IDA commands in the database.
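Roughly what I have in mind, as a Python sketch. The annotation format, the label names and the function are all hypothetical; the repo would hold only the address table (and optionally a hash of the image it was written for), and the bytes come from the ROM the user supplies.

```python
import hashlib


def label_rom(rom, annotations, expected_sha1=None):
    """Apply {address: (label, length)} annotations to a user-supplied ROM.

    Returns assembly-like text lines. No ROM bytes live in the annotations.
    """
    if expected_sha1 is not None and hashlib.sha1(rom).hexdigest() != expected_sha1:
        raise ValueError("ROM does not match the image the annotations were written for")
    lines = []
    for address in sorted(annotations):
        label, length = annotations[address]
        chunk = rom[address:address + length]
        lines.append(f"{label}: ; {address:#06x}")
        lines.append("    db " + ", ".join(f"${b:02x}" for b in chunk))
    return lines


# Hypothetical annotation table, the only thing that would be checked in.
ANNOTATIONS = {
    0x0000: ("Header", 2),
    0x0004: ("PlayerName", 3),
}
```

A real tool would emit instructions rather than raw db lines, but the split between checked-in knowledge and user-provided bytes is the same.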
These people put a lot of hard work into something people find interesting. I'm glad this is on hacker news because I'm interested in how these games were written and developed.
Heavens no, it's much easier to complain and criticize other people's work.
I had a feeling "being useful" is something people like OP don't actually do, and checking his online presence only proved me right. This is making me abnormally furious.
I have not but other open source projects have similar issues when dealing with copyrighted files. OpenJailbreak, for example, downloads copyrighted firmwares and then applies patches locally. They did not want to host modified Apple binaries in their open source project. They even went the extra mile and developed a library that can partially extract zip files served over HTTP so the entire firmware would not be downloaded. [0]
I agree that a lot of hard work has gone into the ROM reversing. I also cannot dispute that it is fascinating! I just wish the developers could have gone about the project in a manner that would not besmirch the project with copyright concerns.
Right, but all you're arguing for is a shift of legal responsibility on a project that's unlikely to be bothered legally, which will in turn add another meta layer of development for the people involved (let's build a patcher).
It's illegal, yes.
It's also:
1) a massive PR blow to sue fans who have taken an interest
2) a legal money-sink on decades old property that isn't producing less profit as a result.
For all you know, this github could have generated revenue for Nintendo through the revitalization of interest in the franchise using nostalgia against current day software developers...