This repo is the home of MetroBoy, GateBoy, and Plait.

GateBoy

GateBoy screenshot

GateBoy is a gate-level simulation of the original Game Boy hardware that was reverse-engineered from die shots of the original DMG-01 chip. It includes all the standard cells on the chip, minus the audio (too slow). It does not currently simulate the CPU at the gate level - the CPU is made of custom logic and is a bit too blurry for me to decipher. GateBoy's CPU is instead my current best guess at how it might be implemented given the constraints implied by the rest of the chip.

Precompiled builds with test ROMS and Plait data are here - https://github.com/aappleby/MetroBoy/releases/tag/GateBoy_v0.1.1

GateBoy runs at around 6 to 8 frames per second in "fast mode" on a modern 4-ish GHz processor. That's quite horrible compared to an emulator, but pretty impressive for something that's simulating a few billion gates per second on a single core.

I owe a huge amount of thanks to Furrtek for his original die traces and schematics that served as a Rosetta Stone for getting the whole translation started. I've noted in the codebase where I found errors in the schematics - some have been reported back to Furrtek but there are still a lot of discrepancies.

Big thanks are also owed to Gekkio for his Mooneye emulator and tests that helped bootstrap GateBoy, and for the flash cart he designed that I used to build many, many additional tests.

GateBoy FAQ

How is this simulation connected to the Furrtek schematics?

- Furrtek assigned every gate on the die and in the schematic a 4-character code like "ASUR" or "BALY". Each of those gates has a corresponding line in the GateBoy source code. Lines in the source are tagged like this - `/*#p08.ASUR*/` - this means that gate ASUR (which happens to be a 2-mux) is on page 8 of the schematics, and the '#' indicates that I've manually traced the gate to verify that the schematic is correct.
- Here's a chunk of the unmodified die shot with ASUR in the middle -
    
    ![](<https://s3-us-west-2.amazonaws.com/secure.notion-static.com/60b97384-9fa9-446c-83fa-2b6189987d33/ASUR_context1.png>)
    
- Here's the same chunk with Furrtek's annotations -
    
    ![](<https://s3-us-west-2.amazonaws.com/secure.notion-static.com/b1ac9795-8465-4372-9ea6-1cd5e27cc9bb/ASUR_context2.png>)
    
- And here's a closeup showing the three inputs coming into the top "rungs" of the cell, and the output at the bottom -
    
    ![](<https://s3-us-west-2.amazonaws.com/secure.notion-static.com/d1d0a7bf-4aad-403a-85e6-adf45af7fd05/ASUR_traced.png>)
    
- which corresponds to this ASUR in Furrtek's schematic -
    
    ![](<https://s3-us-west-2.amazonaws.com/secure.notion-static.com/25285a0a-8a50-4cc8-9fa5-3609c944f745/ASUR_schematic.png>)
    
- which in turn gets translated to this ASUR in GateBoy's code -
    
    ![](<https://s3-us-west-2.amazonaws.com/secure.notion-static.com/abf1d75a-f9d8-403e-8e37-641ec0780e14/ASUR_code.png>)
    
- Repeat that a few thousand times, spend a year-ish debugging, and you get GateBoy. To give you a sense of scale, here's the whole die with a red dot covering ASUR - there are currently 2674 active cells in GateBoy, and another thousand-ish in the audio hardware that aren't being simulated -
    
    ![](<https://s3-us-west-2.amazonaws.com/secure.notion-static.com/0b5c2ebe-2e88-4ad3-ae1b-a6587140dd7a/ASUR_die.png>)

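
The tag format described above is regular enough to parse mechanically, for example when cross-referencing source lines against schematic pages. The sketch below is a hypothetical helper (`GateTag` and `parse_gate_tag` are not part of GateBoy), and it assumes every tag has exactly the shape of `/*#p08.ASUR*/`: an optional `#`, then `p` and a page number, then a dot and a four-letter uppercase gate name.

```cpp
#include <optional>
#include <string_view>

// HYPOTHETICAL helper, not part of GateBoy.
struct GateTag {
  bool verified;          // '#' present: gate was manually traced against the die
  int page;               // schematic page number, e.g. 8
  std::string_view name;  // four-letter Furrtek cell name, e.g. "ASUR"
};

constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }
constexpr bool is_upper(char c) { return c >= 'A' && c <= 'Z'; }

// Parses tags like "/*#p08.ASUR*/" (the '#' is optional).
// Returns std::nullopt for anything that doesn't match that shape exactly.
constexpr std::optional<GateTag> parse_gate_tag(std::string_view s) {
  if (s.size() < 4 || s.substr(0, 2) != "/*" || s.substr(s.size() - 2) != "*/")
    return std::nullopt;
  s = s.substr(2, s.size() - 4);  // drop the comment delimiters

  GateTag tag{false, 0, {}};
  if (!s.empty() && s[0] == '#') {
    tag.verified = true;
    s.remove_prefix(1);
  }

  // Expect 'p', one or more digits, then '.'.
  if (s.empty() || s[0] != 'p') return std::nullopt;
  s.remove_prefix(1);
  std::string_view::size_type digits = 0;
  while (digits < s.size() && is_digit(s[digits])) {
    tag.page = tag.page * 10 + (s[digits] - '0');
    ++digits;
  }
  if (digits == 0 || digits >= s.size() || s[digits] != '.') return std::nullopt;
  s.remove_prefix(digits + 1);

  // Expect exactly four uppercase letters for the gate name.
  if (s.size() != 4) return std::nullopt;
  for (char c : s) {
    if (!is_upper(c)) return std::nullopt;
  }
  tag.name = s;
  return tag;
}
```

With this sketch, `parse_gate_tag("/*#p08.ASUR*/")` yields page 8, name `ASUR`, marked as traced; the same tag without the `#` yields the same gate marked as not yet traced.
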
How is this simulation tested?

- GateBoy has a fairly comprehensive test suite that runs all of [the Mooneye tests](<https://github.com/Gekkio/mooneye-gb/tree/master/tests>), as well as a large suite of "micro-tests" that execute in a small number of cycles.
- GateBoy can also do automated render tests (used for [Mealybug's test suite](<https://github.com/mattcurrie/mealybug-tearoom-tests>)), but those are currently disabled.
- There are probably a few plain old code bugs remaining as well. Right now one of the early screens in Zelda is doing something funny with the grass tiles...

Is GateBoy a perfect simulation of a Game Boy?

- Actually no, for complicated reasons. The Game Boy chip has a handful of logic gates that operate [independently of the master clock](<https://en.wikipedia.org/wiki/Asynchronous_circuit>) and whose exact behavior depends on things like [gate delay](<https://en.wikipedia.org/wiki/Propagation_delay>). These gates create [glitches](<https://en.wikipedia.org/wiki/Glitch>) that depend heavily on the physical placement of the gates, the silicon process used to make them, and other weird things like temperature and voltage.
- For example, there's a glitch in the external address bus logic that causes internal bus addresses like `0xFF20` to appear on the external bus even though the logic should prevent that. Due to gate delays, not all of the inputs to gate `LOXO` (page 8 in Furrtek's schematics) arrive at the same time. This causes `LOXO` to produce a glitch pulse that in turn causes latch `ALOR` to make a copy of one bit of the internal bus address. `ALOR` then drives that bit onto the external bus (through a few more gates) where it can be seen with an oscilloscope or logic analyzer.

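
The mechanism behind a glitch like this can be shown with a toy discrete-time model. The circuit below is hypothetical and deliberately much simpler than `LOXO`: one signal reaches an AND gate both directly and through an inverter that is assumed to take one extra time step. Logically the output is always 0, but when the input rises, the stale inverter output lets a one-step pulse through, and a transparent latch enabled by that pulse captures a data bit it should never have seen. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// HYPOTHETICAL toy circuit, not LOXO: y = a AND (NOT a), where the inverter
// is assumed to be one time step slower than the direct wire. Each vector
// element is the signal's value at one discrete time step.
std::vector<bool> hazard_output(const std::vector<bool>& a) {
  std::vector<bool> y(a.size(), false);
  bool not_a = true;  // inverter output; assumes a was 0 before t = 0
  for (std::size_t t = 0; t < a.size(); ++t) {
    y[t] = a[t] && not_a;  // the AND sees the new a but the stale NOT a
    not_a = !a[t];         // the inverter output catches up one step later
  }
  return y;
}

// Transparent latch: follows `data` while `enable` is high, holds otherwise.
std::vector<bool> transparent_latch(const std::vector<bool>& enable,
                                    const std::vector<bool>& data,
                                    bool initial) {
  assert(data.size() >= enable.size());
  std::vector<bool> q(enable.size(), false);
  bool state = initial;
  for (std::size_t t = 0; t < enable.size(); ++t) {
    if (enable[t]) state = data[t];
    q[t] = state;
  }
  return q;
}
```

A simulator that evaluates the same gates with zero delay computes `a AND NOT a` as 0 at every step, so the latch never fires. This illustrates why delay-dependent glitches like the one on `ALOR` don't fall out of a gate-by-gate simulation automatically.
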
Wait, if glitches don't show up in the schematics then how did you figure that one out?

- In this case we can deduce what's going on because we can see the side-effect of the glitch on the external bus and there aren't that many possible ways that address signal could've gotten there.
- Other internal glitches are harder to figure out because they don't affect external circuits - they just show up as "something does not match the simulation". There are probably 4-5 glitches that need to be tracked down somehow before the simulation is "perfect", but I'm not going to block the release of GateBoy until I find them.

Why is GateBoy so slow?

- GateBoy simulates every logic gate on the DMG chip, one gate at a time. Adding two 8-bit values isn't simulated as `a = b + c;` - it's simulated as eight 1-bit adders and eight 1-bit registers and all the control logic that goes along with them.
- In debug builds, every gate also includes a bunch of error checking to verify that gates aren't read before they're updated, that buses aren't floating, that the simulation always stabilizes, and other things like that.
- GateBoy also simulates every clock *phase*, not just individual clock cycles. While you may have read that the Game Boy runs at 1 MHz, this is not quite correct. The 4.19 MHz clock crystal feeds a set of gates `AFUR+ALEF+APUK+ADYK` that produce four 1 MHz clocks that are out of phase with each other. Those clocks are then combined by additional logic to create sub-clocks of various patterns and frequencies whose edges can lie on either the positive or negative edges of the 4.19 MHz master clock. Since each 1 MHz cycle spans four master-clock cycles, each with a rising and a falling edge, it's more accurate to say that the Game Boy has a 1 MHz, 8-phase clock. In GateBoy we give each phase a letter (A through H), and every sub-clock name has a suffix like this - `BALY_xBCDEFGH` - which indicates that the clock generated by gate BALY is high on phases B through H.
- Even with heroic optimization and all the error checking turned off in "fast mode", we still only hit 6-8 fps on a modern CPU.

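
To make the cost concrete, here is a sketch of what "eight 1-bit adders" means in software, plus a decoder for the phase-suffix naming convention. Both are illustrative only: `full_adder`, `add8`, and `phase_mask` are hypothetical names, the adder leaves out the registers and control logic mentioned above, and the decoder assumes each of the eight suffix positions holds either its own phase letter (clock high) or `x` (clock low), extrapolated from the `BALY_xBCDEFGH` example.

```cpp
#include <cstdint>

// HYPOTHETICAL illustration, not GateBoy's real adder.
struct SumCarry {
  bool sum;
  bool carry;
};

// One-bit full adder expressed as individual gates.
constexpr SumCarry full_adder(bool a, bool b, bool cin) {
  bool a_xor_b = a != b;                      // XOR gate
  bool sum = a_xor_b != cin;                  // XOR gate
  bool carry = (a && b) || (a_xor_b && cin);  // two ANDs and an OR
  return {sum, carry};
}

struct Add8Result {
  std::uint8_t sum;
  bool carry;
};

// 8-bit ripple-carry adder: eight full adders, each carry feeding the next bit.
constexpr Add8Result add8(std::uint8_t a, std::uint8_t b) {
  std::uint8_t sum = 0;
  bool carry = false;
  for (int i = 0; i < 8; ++i) {
    SumCarry bit = full_adder(((a >> i) & 1) != 0, ((b >> i) & 1) != 0, carry);
    sum = static_cast<std::uint8_t>(sum | (bit.sum << i));
    carry = bit.carry;
  }
  return {sum, carry};
}

// Decodes a phase suffix like "xBCDEFGH" into a mask: bit 0 = phase A ...
// bit 7 = phase H. Assumes at least 8 characters, each either its phase
// letter (clock high on that phase) or 'x' (clock low).
constexpr std::uint8_t phase_mask(const char* suffix) {
  std::uint8_t mask = 0;
  for (int i = 0; i < 8; ++i) {
    if (suffix[i] == 'A' + i) mask = static_cast<std::uint8_t>(mask | (1 << i));
  }
  return mask;
}

static_assert(add8(200, 100).sum == 44 && add8(200, 100).carry, "300 wraps to 44 with carry");
static_assert(phase_mask("xBCDEFGH") == 0xFE, "high on phases B through H");
```

Where `a = b + c;` compiles to a single add instruction, `add8` evaluates 40 gate operations (two XORs, two ANDs, and one OR per bit) in a serial chain, before counting any of the surrounding register or control logic.
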
Why is GateBoy so fast?

- Aha, now we're asking interesting questions. Simulating hardware in software is usually thousands of times slower than realtime - GateBoy is "only" 8x slower than realtime. How does it do that?
- GateBoy is designed so that most of the simulated gates optimize down to a single instruction or less after the compiler's optimization pass - one "and" gate turns into one "and" instruction, chains of "not" gates get optimized out, etcetera.
- Normally this would require a huge amount of simulation infrastructure to ensure that the simulation doesn't diverge from the "real world" circuit. GateBoy doesn't do this. Instead, GateBoy does all its error checking by adding additional flags to each wire and register and verifying (in debug builds) that every gate is evaluated in the correct order, so that the result is the same as if every gate were evaluated in parallel. There are a few workarounds to deal with asynchronous logic, but they are minor.
- The flags are positioned so that they don't interfere with the usual one-instruction-per-gate operations, and in "fast mode" builds the flags are disabled and everything optimizes down quite tightly.

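
A minimal sketch of the flag idea, assuming a one-byte wire where bit 0 carries the logic value and bit 1 is a debug-only "evaluated this pass" flag. The `Wire` type, the helper functions, and the `GATEBOY_SKETCH_FAST_MODE` macro are all hypothetical and are not GateBoy's actual code. Because each gate operates on the whole byte, `and2` is still a single AND in fast mode, and in debug builds that same AND carries the flag along with the value.

```cpp
#include <cassert>
#include <cstdint>

// HYPOTHETICAL sketch, not GateBoy's real types.
// Bit 0 holds the logic value; bit 1 is a debug-only "evaluated this pass" flag.
struct Wire {
  std::uint8_t state;
};

constexpr std::uint8_t kValue = 0x01;
constexpr std::uint8_t kEvaluated = 0x02;

#ifdef GATEBOY_SKETCH_FAST_MODE
// Fast mode: no flags and no checks, so every gate is plain bit arithmetic.
inline void check_ready(Wire) {}
inline Wire drive(bool v) { return Wire{static_cast<std::uint8_t>(v)}; }
#else
// Debug mode: reading a wire whose driver hasn't run yet this pass means the
// evaluation order would not match gates settling in parallel.
inline void check_ready(Wire w) {
  assert((w.state & kEvaluated) && "wire read before it was evaluated");
}
inline Wire drive(bool v) {
  return Wire{static_cast<std::uint8_t>(static_cast<std::uint8_t>(v) | kEvaluated)};
}
#endif

inline bool value(Wire w) { return (w.state & kValue) != 0; }

// Start of a new pass: keep the value, clear the debug flag.
inline Wire clear_flag(Wire w) { return Wire{static_cast<std::uint8_t>(w.state & kValue)}; }

// One AND on the whole byte: ANDs the values in bit 0, and since both inputs
// were checked, the evaluated flag stays set on the result.
inline Wire and2(Wire a, Wire b) {
  check_ready(a);
  check_ready(b);
  return Wire{static_cast<std::uint8_t>(a.state & b.state)};
}

// One XOR flips the value bit and leaves the flag bit untouched.
inline Wire not1(Wire a) {
  check_ready(a);
  return Wire{static_cast<std::uint8_t>(a.state ^ kValue)};
}
```

In this sketch, if a gate reads a wire that hasn't been driven yet in the current pass, the debug build trips the assertion, so an evaluation-order mistake is caught instead of silently diverging from the parallel hardware.
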
Wouldn't it be even faster to write this in Verilog and then simulate it in Verilator or something?

- You would think so, and I have translated small portions of GateBoy into Verilog and simulated them in Verilator just to prove that GateBoy's simulation strategy does produce correct results. However, the Verilated code is still around 5-10x slower than GateBoy compiled in "fast mode".

How do I build and run GateBoy?

All the code is cross-platform and has been tested under Windows 10, Windows 11, Debian, Ubuntu, and WSL-G. Clone the repo and don't forget to run `git submodule init` and `git submodule update` to pull down the support libraries (SDL2, glm, imgui, and json).

On Windows, open `MetroBoy.sln` in Visual Studio Community 2019 and build and run as usual. If you get an "SDL2.dll not found" error, you can either install SDL2 globally, manually copy `SDL2.dll` into the same folder as the executables, or change "Working Directory" in Project Properties -> Configuration Properties -> Debugging to `$(SolutionDir)`.