Ripping the data off an Atari cartridge

A little while ago, I read Racing the Beam: The Atari Video Computer System about the Atari 2600. I quite enjoyed it. It’s fascinating to read about the herculean efforts required to write software for such limited hardware. One throwaway line caught my attention: since RAM was so expensive in those days, the game cartridges only had ROM chips to store the software. This means you couldn’t save your games, so each level would give you a completion code which you’d write down and then input later to jump back to the level.

Ever since watching Ben Eater’s Hacking a TV Censoring Device video, I’d wanted to try ripping the data off a ROM chip. An Atari cartridge seemed like the perfect intro project: Atari cartridges are presumably well documented because the console was so successful, and the technical requirements like clock speed or voltage should be child’s play for a modern microcontroller. So I poked my head into a local vintage game store and grabbed a cartridge. I ended up choosing a cartridge containing the BASIC programming language (ATARI CXL4002) because:

  1. Hopefully it would be easy to tell if I’d successfully ripped the data, because the BASIC keywords (VAR, IF, FOR, etc.) should be clearly visible in it
  2. It seemed cool to examine the code of a BASIC interpreter that fits into 8 KB of ROM

So I journeyed off!

Note: I am now well aware that the cartridge I picked up is actually for the Atari 400 home computer, and not the Atari 2600 console. The Atari 400/800 computers came after the 2600, and were proper computers. This makes sense, considering BASIC is a more logical fit for a computer than for a game console.

Part 1: Breaking and entering

(Images: cartridge front; cartridge PCB)

First I disassembled the cartridge to access the PCB inside. For those not in the know, a PCB or “Printed Circuit Board” is the green board you’ll find in most electronics. It doubles as both 1) a surface to mount electrical components (chips, capacitors, resistors, etc.) and 2) a way to connect those components together via “traces”, or metal tracks. You could replace a PCB with wires connecting the components together, but PCBs offer a few benefits:

  • You can automate PCB manufacturing, because they’re literally printed, but connecting components via wires probably can only be done by hand.
  • PCBs are more durable than wiring. Imagine shaking your computer and then opening it up to figure out which wire came loose. Not fun.
  • PCBs hold their traces (electrical wires) in rigid place, which prevents wires from changing positions and electrically interfering with each other.

I generally think about a PCB as just a solid-state version of components and wires, but that’s a personal mental shortcut.

(Images: PCB with chips removed; PCB and chips, back)

One thing I found interesting after opening it up was that the ROM chips were socketed. The sockets can be seen in the image above as the black rectangular outlines. Chip sockets are designed to make inserting and removing the chips easy, and it’s a bit unusual to find a commercial PCB with socketed chips. Once I realized this, the chips simply pulled out with a tug. At first I thought the cartridge might have been modded by a previous owner. However, the solder (connective material) on the underside of the PCB looks too pristine to be hand-modded. I’ve decided that the socketing was probably just a manufacturing byproduct of the time. Probably the PCBs were manufactured in one plant and the ROM chips in another. It would be easier to push the chip into a socket when the PCB and chips were later combined, rather than going through another round of soldering to combine them. I have no hard evidence for this, but it seems a reasonable-enough guess.

I figured I’d be able to read the ROM chip type or name off the outside of the chip and look up the data sheet for it. The data sheet should give me all the details on how to read data from the chip, such as the pinout (which pins do what) and the clock speed. I was able to determine the manufacturer of the ROM (Synertek) by looking up a chart of old chip manufacturer logos. However, I couldn’t figure out more about the chips based on the cryptic serial numbers on the chip. If you can figure them out, let me know! My best guess is that the electronics industry was seeing a massive boom at this time, and churning out ROM chips. This led to temporary product lines and secondary manufacturers, which is why it’s hard to trace the chip back to a “family.” For example, the 6502 microprocessor (also used in the Atari) was second-sourced by Synertek.

Anyway, I put the project on the back burner for the time being. I couldn’t seem to find the pinout of the chips, and didn’t want to try and guess-and-check pins in case I sent too much voltage through the wrong pin and fried it.

Part 2: A New Hope

Cue the changing of the seasons, where winter snow melts into verdant spring greens and rain.

I didn’t want to let the project sit on the shelf forever, so I decided to pivot and attempt the first rite of passage when learning circuitry: making an LED light up. This was its own humbling multi-day journey, not least because one of the wires I was using in my circuit came broken (thanks Adafruit) and I didn’t think to test it. But one YouTube tutorial and multimeter debugging session later, I had my LED lighting up! I love the multimeter, especially the diode setting. I’ve come to associate the “connected” beep with positive emotion through Pavlovian conditioning. It’s like the opposite of my response to red-text compiler errors.

I was still thinking about how I could find the ROM chip pinout. Eventually, after watching a video on circuit fault finding, I realized that I didn’t need the chip’s documented pinout at all! Rather, I could derive it by finding the pinout of the Atari cartridge and tracing the PCB lines to the chip sockets. Unlike the ROM chips, all Atari cartridges must have the same pinout to work with the Atari, so the pinout must be well documented.

This struck me as an interesting parallel between software and hardware hacking. In software, because your computer needs to know how to run the software, you have all the information locally to also analyze how the software works. Similarly, everything you need to understand the hardware is sitting right in front of you. The hardware can try to obscure things, but ultimately it can’t lie, if it wants to also function.

But there was a small problem with my pinout theory: the pins of my cartridge didn’t match the pinout of an Atari 400 cartridge! The Atari cartridge pinout has 15 pins on each side, whereas mine had 13 (front) and 12 (back).

Cartridge slot (present on all machines; Left Cartridge/Cartridge A on 800):
     A  B  C  D  E  F  H  J  K  L  M  N  P  R  S    Edge Connector
     -  -  -  -  -  -  -  -  -  -  -  -  -  -  -    15/30 (15x2P 30P)
     -  -  -  -  -  -  -  -  -  -  -  -  -  -  -    0.100" contact pitch 
     1                                         15
 1. /S4 Select $8000-$9FFF              A. RD4 RAM Deselect $8000-$9FFF
                                           except 400: Not Connected
 2. A3 Address bus line 3               B. Vss GND Ground 
 3. A2 Address bus line 2               C. A4 Address bus line 4
 4. A1 Address bus line 1               D. A5 Address bus line 5
 5. A0 Address bus line 0               E. A6 Address bus line 6 
 6. D4 Data bus line 4                  F. A7 Address bus line 7
 7. D5 Data bus line 5                  H. A8 Address bus line 8
 8. D2 Data bus line 2                  J. A9 Address bus line 9
 9. D1 Data bus line 1                  K. A12 Address bus line 12
10. D0 Data bus line 0                  L. D3 Data bus line 3
11. D6 Data bus line 6                  M. D7 Data bus line 7
12. /S5 Select $A000-$BFFF              N. A11 Address bus line 11
13. Vcc +5V                             P. A10 Address bus line 10
14. RD5 RAM Deselect $A000-$BFFF        R. 400/800/1200XL: R/W Early
    except 400: Not Connected              600XL/800XL/XE: R/W Read/Write
15. /CCTL Cartridge Control $D5xx       S. 400/800: RASTIME
                                                       Row Address Strobe Time
                                           XL/XE: BPhi2 Buffered Phase 2 Clock

(taken from here)

I did a reasonable amount of research to determine if there were different cartridge types for the Atari 400/800. Maybe there was an A and a B version or something? I even searched through the 300-page original Atari manual for clues. No luck. At this point, I was worried I had a weird cartridge version that wasn’t well documented. Or maybe the cartridge was modded after all. I was stuck.

However, I looked up the same BASIC cartridge in Google Images and saw identical PCBs, which gave me some reassurance that the issue wasn’t specific to my cartridge. I eventually figured out that the front and back pins lined up with the standard connector and simply left gaps where pins were missing (15 positions per side in total). So the cartridge would connect to the slot as usual, except that pins 1, 15, A, R, and S would not be connected to anything. It seems weird that they’re not tied to anything, not even ground, but life goes on.

Interlude: The orange thingy

I was also curious what the orange component was in the top left (see earlier image). It looked visually like a thin sheet of metal wrapped around a core, which would indicate a capacitor, but I wasn’t sure. It could also be a resistor or (less likely) an inductor. I used a multimeter to measure the resistance through the component, but the meter indicated immeasurably high resistance (an open circuit). Uh oh. I measured it some more with similar results. I was worried that the cartridge had come to me already broken. Curiously, at one point between cups of tea, I did see a brief numerical reading on resistance before it went back to open circuit. However, I wasn’t able to recreate it. As a final thought, I wondered if I had switched the leads when I set the probe down to drink tea, so I switched them again and saw another spike! Success!

Those of you who are more electrically savvy have probably already guessed that the multimeter was passing some current through the component to measure resistance. As it did that, it also charged up the capacitor. Once the capacitor was fully charged, which took only a moment, no more current flowed in the circuit and the measured resistance went to infinity. By switching the leads, current would briefly flow the other way and reverse the capacitor’s charge, before again reaching equilibrium. So the device was a capacitor, and it still worked!
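To put a rough number on that brief blip, here is a small sketch of the standard RC charging formula. The component values are my own guesses for illustration (a 0.1 uF decoupling capacitor and 1 megaohm of meter source resistance); I did not measure either.

```python
import math

def time_to_charge(resistance_ohms, capacitance_farads, fraction=0.99):
    """Seconds for an RC circuit to charge to `fraction` of the applied voltage."""
    tau = resistance_ohms * capacitance_farads  # RC time constant
    return -tau * math.log(1.0 - fraction)

# Assumed values, not measurements: 1 megaohm source resistance, 0.1 uF capacitor.
seconds = time_to_charge(1e6, 0.1e-6)
print(f"{seconds:.2f} s to reach 99% charge")
```

With those assumed values the reading would settle in under half a second, which fits a number flashing briefly on the display before the meter shows an open circuit again.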

I did a bit of research and found the capacitor was probably used for decoupling the power supply. Here’s a good video explaining the idea further. This guess is backed up by the fact that the two traces connected to it are wider than all the others, which probably means they’re used for power and ground (confirmed by the pinout above).

Part 3: Meet Jack the Ripper

(Jack the Atari Ripper, in all its glory)

Now that I knew the pinout, I simply needed to connect the pins to a microcontroller to write the address in and read the data out. I had a Raspberry Pi Pico on hand, so I used that. The chips need +5 V power, which can be provided by the Pico. I tried to coordinate wire colors with functionality (blue is address, yellow is data) but I quickly ran out of wires in those colors and improvised. I then wrote a simple script to loop through the address range and save the output data to a binary file.

import time
from machine import Pin

# NOTE: the GPIO numbers here are placeholders; change them to match your wiring.
ADDR_PINS = [Pin(n, Pin.OUT) for n in range(0, 13)]   # A0..A12 (13 address lines)
DATA_PINS = [Pin(n, Pin.IN) for n in range(13, 21)]   # D0..D7 (8 data lines)


def read_word(addr):
    # Drive each address line with the matching bit of addr
    for bit, pin in enumerate(ADDR_PINS):
        pin.value((addr >> bit) & 1)
    time.sleep(0.0001)  # give the ROM outputs time to settle

    # Assemble the byte from the data lines, with D0 as the least significant bit
    word = 0
    for bit, pin in enumerate(DATA_PINS):
        word |= pin.value() << bit

    print("Address: ", hex(addr), " value: ", word)

    return word


def read_addr_range(start, end, filename):
    # Read every address in [start, end) and write one byte per address
    with open(filename, 'wb') as f:
        for addr in range(start, end):
            word = read_word(addr)
            f.write(word.to_bytes(1, 'little'))


def main():
    # 0x2000 addresses = 8 KB
    read_addr_range(0x0000, 0x2000, 'atari-rip.bin')


main()

All there was left to do was to connect it up and run it. At this point, I took a quick detour and tried to see if I could generate PCB schematics from a photo of the PCB and an image filter, with mixed results. It was a fun challenge and something I’d like to come back to, but I didn’t want to get too distracted from the project at hand.

(Images: photo of the PCB; output of the image filter)

Back to the data. At first I wasn’t getting any data out, but after double-checking pin numbering and connectivity via the multimeter, suddenly data was flashing across the screen:

00000000: a5ca d004 a508 d045 a2ff 9ad8 aee7 02ac  .......E........
00000010: e802 8680 8481 a900 8592 85ca c88a a282  ................
00000020: 9500 e894 00e8 e092 90f6 a286 a001 207f  .............. .
00000030: a8a2 8ca0 0320 7fa8 a900 a891 8491 8ac8  ..... ..........
00000040: a980 918a c8a9 0391 8aa9 0a85 c920 f8b8  ............. ..
00000050: 2041 bd20 72bd a592 f003 2099 bd20 57bd   A. r..... .. W.
00000060: a5ca d09c a2ff 9a20 51da a95d 85c2 2092  ....... Q..].. .
00000070: ba20 f4a9 d0ea a900 85f2 859f 8594 85a6  . ..............
00000080: 85b3 85b0 85b1 a584 85ad a585 85ae 20a1  .............. .
00000090: db20 9fa1 20c8 a2a5 d510 0285 a620 a1db  . .. ........ ..
000000a0: a4f2 84a8 b1f3 c99b d007 24a6 30b2 4c89  ..........$.0.L.
000000b0: a1a5 9485 a720 c8a2 20a1 dba9 a4a0 afa2  ..... .. .......
000000c0: 0220 62a4 86f2 a5af 20c8 a220 a1db 20c3  . b..... .. .. .
000000d0: a190 35a4 9fb1 f3c9 9bd0 06c8 91f3 88a9  ..5.............
000000e0: 2009 8091 f3a9 4005 a685 a6a4 a884 f2a2   .....@.........
000000f0: 0386 a7e8 8694 a937 20c8 a2a4 f2b1 f3e6  .......7 .......
00000100: f2c9 9bd0 f320 c8a2 a594 a4a7 9180 a4f2  ..... ..........
00000110: 88b1 f3c9 9bd0 9aa0 02a5 9491 8020 a2a9  ............. ..
00000120: a900 b003 20dd a938 e594 f020 b013 49ff  .... ..8... ..I.
00000130: a8c8 a28a 207f a8a5 9785 8aa5 9885 8bd0  .... ...........
...(skipping some data)...
000004a0: 02c5 aad0 ddb1 9530 03c8 d0f9 38b0 dac7  .......0....8...
000004b0: a752 45cd caa7 4441 54c1 f3a6 494e 5055  .RE...DAT...INPU
000004c0: d4bc a643 4f4c 4fd2 32a7 4c49 53d4 23a7  ...COLO.2.LIS.#.
000004d0: 454e 5445 d2bf a64c 45d4 93a7 49c6 d1a6  ENTE...LE...I...
000004e0: 464f d2e9 a64e 4558 d4bc a647 4f54 cfbc  FO...NEX...GOT..
000004f0: a647 4f20 54cf bca6 474f 5355 c2bc a654  .GO T...GOSU...T
00000500: 5241 d0bd a642 59c5 bda6 434f 4ed4 5fa7  RA...BY...CON._.
00000510: 434f cd20 a743 4c4f 53c5 bda6 434c d2bd  CO. .CLOS...CL..
00000520: a644 45c7 5fa7 4449 cdbd a645 4ec4 bda6  .DE._.DI...EN...
00000530: 4e45 d719 a74f 5045 ce23 a74c 4f41 c423  NE...OPE.#.LOA.#
00000540: a753 4156 c540 a753 5441 5455 d349 a74e  .SAV.@.STATU.I.N
00000550: 4f54 c549 a750 4f49 4ed4 17a7 5849 cf62  OT.I.POIN...XI.b
00000560: a74f ce5c a750 4f4b c5fb a650 5249 4ed4  .O.\.POK...PRIN.
00000570: bda6 5241 c4f4 a652 4541 c4ee a652 4553  ..RA...REA...RES

(nb: the format is <address>: <raw bytes in hex> <ascii representation>)
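If you want to produce the same view from your own dump, here is a minimal sketch of a formatter that mimics this layout (bytes grouped in pairs, 16 per line, non-printable bytes shown as dots). The function names are mine; the dumps in this post were not necessarily produced this way.

```python
def hexdump_line(offset, chunk):
    """Format up to 16 bytes as '<address>: <hex in 2-byte groups>  <ascii>'."""
    hex_digits = chunk.hex()
    groups = " ".join(hex_digits[i:i + 4] for i in range(0, len(hex_digits), 4))
    # Printable ASCII is shown as-is, everything else becomes a dot
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
    # 8 groups of 4 hex digits plus 7 separating spaces = 39 columns
    return f"{offset:08x}: {groups:<39}  {text}"


def hexdump(data):
    """Format a whole bytes object, 16 bytes per line."""
    return "\n".join(
        hexdump_line(i, data[i:i + 16]) for i in range(0, len(data), 16)
    )


print(hexdump(bytes.fromhex("a75245cdcaa7444154c1f3a6494e5055")))
```

The last line feeds in the 16 bytes from offset 0x04b0 of the dump above, just with the address column starting at zero.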

Whoa! We’re definitely getting valid data, as verified by the strings of legible text (“INPU..COLO..STATU..PRIN”). Awesome! I did have a couple of concerns at this point. For one, the text strings all seem to be missing the last letter for some reason. It’s unclear if those are just BASIC abbreviations or corrupted data. Another concern is that data that probably should be zeroed out is not:

00000000: 0800 0000 0000 0000 0000 0000 0000 0000  ................
00000010: 0808 0808 0808 0000 0000 0000 0000 0000  ................
00000020: 0000 0000 0008 0808 0808 0800 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0808 0808 0808  ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0008  ................
00000050: 0808 0808 0800 0000 0000 0000 0000 0000  ................
00000060: 0000 0000 0808 0808 0808 0000 0000 0000  ................
00000070: 0000 0000 0000 0000 0008 0808 0808 0808  ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0808  ................
00000090: 0808 0808 0800 0000 0000 0000 0000 0000  ................
000000a0: 0000 0000 0808 0809 0808 0000 0000 0000  ................
000000b0: 0000 0000 0000 0000 0008 0808 0808 0800  ................
000000c0: 0000 0000 0000 0000 0000 0000 0000 0808  ................
000000d0: 0808 0808 0000 0000 0000 0000 0000 0000  ................

This data is taken from the second ROM chip, where the addresses 0x0000 to 0x1000 should all presumably be zero. However, you can see clumps of 08 and even a couple 09s.

There are a few possible explanations:

  • There is an issue with my wire setup, where signals on wires that run close together are electrically coupling into each other (called “cross-talk”)
  • These values are meant to be in the ROM, to help differentiate between “intentionally” blank data and no signal or accidentally blank data
  • These values are meant to combine with the values from the other ROM, to create final values. Perhaps it’s a simple form of DRM?

I was suspicious that it was my wire setup. The consistency of 08 indicated that a single data line, the one for bit 3 (value 0x08, which is D3 in the script’s numbering), could be accidentally going high. There were a few 09s as well, so perhaps the D0 line was also seeing some noise. An easy enough test was to move the wires around and see if I saw the same data or not. If not, it was probably noise.

My suspicions were increased when I noticed that the wires I used for those two data lines happened to be much longer than the other wires (I only had a limited supply of wires so some are longer than others). I tried holding the wires to the side, so that they did not run parallel to the other wires, and re-ran the script.
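As a quick sanity check on which data lines a stray byte implicates, here is a tiny sketch that lists the set bits of a value, using the same bit order as the script (D0 is bit 0, D7 is bit 7). The helper name is mine.

```python
def high_data_lines(value):
    """Return the names of the data lines (D0..D7) that are high in a byte."""
    return [f"D{bit}" for bit in range(8) if (value >> bit) & 1]


print(high_data_lines(0x08))  # ['D3']
print(high_data_lines(0x09))  # ['D0', 'D3']
```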

Here’s the output of data that (probably) should be zeroed out:

00001000: a5a5 a5a5 a5a5 a5a5 a5a5 a5a5 a5a5 a5a5  ................
00001010: a5a5 a5a5 a5a5 a5a5 a5a5 a5a5 a5a5 a5a5  ................
00001020: a5a5 a5a5 a5a5 a5a5 a5a5 a5a5 a5a5 a5a5  ................
00001030: a4a5 a4a5 a4a5 a4a5 a5a5 a5a5 a5a5 a5a5  ................
00001040: a5a5 a4a5 a4a5 a4a5 a4a5 a4a5 a4a5 a5a5  ................
00001050: a5a5 a5a5 a5a5 a4a5 a4a5 a4a4 a4a4 a4a4  ................
00001060: a4a4 a4a5 a4a5 a5a5 a5a5 a4a5 a4a5 a4a4  ................
00001070: a4a4 a4a4 a4a4 a4a4 a4a5 a4a5 a4a5 a4a5  ................
00001080: a4a5 a4a4 a4a4 a4a4 a4a4 a4a4 a4a4 a4a5  ................
00001090: a4a5 a4a5 a4a5 a4a4 a4a4 a4a4 a4a4 a4a4  ................
000010a0: a4a4 a4a4 a4a5 a4a5 a4a5 a4a4 a4a4 a4a4  ................
000010b0: a4a4 a4a4 a4a4 a4a4 a4a4 a4a5 a4a5 a4a5  ................
000010c0: a4a4 a4a4 a4a4 a4a4 a4a4 8484 8484 8484  ................
000010d0: 8484 a4a5 a4a4 a4a4 a4a4 a4a4 8484 8484  ................
000010e0: 8484 8484 8484 8485 8484 8484 8484 8484  ................
000010f0: 8484 8484 8484 8484 8484 8484 8484 8484  ................
00001100: 8484 8484 8484 8484 8484 8484 8484 8484  ................
00001110: 8484 8484 8484 8484 8484 8484 8484 8484  ................
00001120: 8484 8484 8484 8484 8484 8484 8484 8484  ................
00001130: 8484 8484 8484 8484 8484 8484 8484 8484  ................
00001140: 8484 8484 8484 8484 8080 8080 8080 8080  ................
00001150: 8080 8080 8080 8080 8080 8080 8080 8080  ................
00001160: 8080 8080 8080 8080 8080 8080 8080 8080  ................
00001170: 8080 8080 8080 8080 8080 8080 8080 8080  ................
00001180: 8080 8080 8080 8080 8080 8080 8080 8080  ................
00001190: 8080 8080 8080 8080 8080 8080 8080 8080  ................
000011a0: 8080 8080 8080 8080 8080 8080 8080 8080  ................
000011b0: 8080 8080 8080 8080 8080 8080 8080 8080  ................
000011c0: 8080 8080 8080 8080 8080 8080 8080 8080  ................
000011d0: 8080 8080 8080 8080 8080 8080 8080 8080  ................
000011e0: 8080 8080 8080 8080 8080 8080 8080 8080  ................
000011f0: 8080 8080 8080 8080 8080 8080 8080 8080  ................
00001200: 8080 8080 8080 8080 8080 0000 0000 0000  ................
00001210: 0000 0080 8080 8080 8080 8080 0000 0000  ................
00001220: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001230: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001240: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001250: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001260: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001270: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001280: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001290: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000012a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000012b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000012c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000012d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000012e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000012f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001300: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001310: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001320: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001330: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001340: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001350: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001360: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001370: 0000 0000 0000 0000 0000 0000 0000 0000  ................

Well, honestly I don’t know if it’s better. It’s at least notable that the readings changed just from moving the wires, which real ROM contents would never do, so the noise hypothesis looks right. It’s possible this won’t be an issue for our actual data, because the strength of the valid signals could outweigh minor noise. We can inspect our data for now and circle back later if we need to.
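One way to make the "move the wires and compare" test less of an eyeball exercise is to diff two reads of the same address range and count disagreements per data line. This is a sketch of that idea, not something I ran as part of the original rip; the function name is made up.

```python
def bit_flip_counts(dump_a, dump_b):
    """Count, per data line, how many bytes differ between two reads of one range.

    Index 0 of the result is D0 and index 7 is D7.
    """
    counts = [0] * 8
    for a, b in zip(dump_a, dump_b):
        diff = a ^ b  # a bit is set wherever the two reads disagree
        for bit in range(8):
            counts[bit] += (diff >> bit) & 1
    return counts


# Toy example: the second read picked up stray 08 and 09 values.
first_read = bytes([0x00, 0x00, 0x00])
second_read = bytes([0x00, 0x08, 0x09])
print(bit_flip_counts(first_read, second_read))  # [1, 0, 0, 2, 0, 0, 0, 0]
```

If the disagreements pile up on one or two lines, those wires are the ones to look at. A clean setup should give all zeros no matter how often you re-read.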

Part 4: Trust…but verify

At this point, I decided to cheat a bit and download a dump from another user for comparison. I mainly needed a sanity check that the values I was reading weren’t skewed by the noise. In particular, the strings (“DAT”, “INPU”, “COLO”) missing their last letter concerned me.

I downloaded a dump from the Atari forums and compared them side-by-side:

(Left is others’ data, right is my data.)

The data looks good! Their dump does seem to contain a header that is not present on my cartridge, but once the header is removed, the data matches up perfectly.

So what is this header? I was surprised that my own dump of the cartridge didn’t contain one. Even with the constrained storage of the time (8 KB), I expected to see some sort of marker indicating the start of valid data on the cartridge.

I searched around for a description of the different Atari file types (and corresponding headers). I stumbled on this post on the Atari forums discussing some of the file types. The author even wrote a Ghidra extension to parse the binary files! Well, if the code can parse Atari binary files, the code knows the structure of the different Atari binary files, so we can just look directly in the code:

(taken from the Ghidra Atari extension GitHub)

I’m guessing we’re mainly interested in the Cartridge file type. We can look at how the code parses it:

(also taken from the Ghidra Atari extension GitHub)

Okay, okay. Looks like the code first checks for the presence of a magic number (“CART”), then reads the next few bytes to see what type of cartridge it is. Seems simple enough. Can we manually create a custom header for our Atari rip? Now that we know the right terms to search, we can look up the Atari Cartridge file specification:

This file describes types of cartridge images supported by the Atari800
emulator.

There are:
- raw images - files that contain only dump of cartridge memory
- CART files - images with additional 16-byte header, which contains type
  of the cartridge

[...]

The format is:
 first 4 bytes containing 'C' 'A' 'R' 'T'.
 next 4 bytes containing cartridge type in MSB format (see the table below).
 next 4 bytes containing cartridge checksum in MSB format (ROM only).
 next 4 bytes are currently unused (zero).
 followed immediately with the ROM data: 4, 8, 16, 32, 40, 64, 128, 256, 512
 or 1024 kilobytes.

(from the Atari 800 emulator repo)

So we want our header to have the structure

[CART][TYPE][CKSM]0000

and since the code specifies type 0001 for an 8 KB cartridge, and never uses the checksum, our header looks like

00000000: 4341 5254 0000 0001 0000 0000 0000 0000 CART............

So, I manually prepended the bytes to the file, and loaded it in Ghidra. I was a little sceptical it would be so easy, but Ghidra accepted it without a problem. Success!
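For anyone who would rather not hand-edit bytes, here is a small sketch that builds the same 16-byte header following the layout quoted above. The function and constant names are my own, and the checksum is left at zero just as in my hand-made header.

```python
import struct

CART_TYPE_8K = 1  # the type value used above for an 8 KB cartridge


def add_cart_header(rom, cart_type=CART_TYPE_8K, checksum=0):
    """Prepend the CART header: magic, type, checksum, unused (big-endian)."""
    header = struct.pack(">4sIII", b"CART", cart_type, checksum, 0)
    return header + rom


def strip_cart_header(image):
    """Return the raw ROM bytes, dropping the 16-byte header if it is present."""
    return image[16:] if image[:4] == b"CART" else image


print(add_cart_header(b"").hex())
```

The second function is handy for the comparison in Part 4: strip the header from someone else’s dump, then compare the result byte for byte with your own raw rip.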

Part 5: Inspecting the binary

Ghidra is a popular disassembler/decompiler tool, which can take a binary file and reconstruct an approximation of the source code. It automates much of the work for you, but the last 20% still needs to be done by hand. It can also automatically scan the binary file for things such as string tables or debug information.

I spent some time in Ghidra trying to make sense of the codebase. However, there were a couple issues at play:

  • Atari BASIC was actually written in assembly, which means decompilation to C may not be the cleanest
  • Because the interpreter had to fit in 8 KB, the code is tightly packed and, as a side effect, hard to follow

Which is to say, it was kind of a jumbled mess. I decided to search around online to see if anyone had already done this work. I stumbled on the Atari BASIC Source Book, which contains a ton of information on the interpreter’s structure. It even has a section dedicated to every table in the binary file, including one of particular note:

Statement Name Table ($A4AF). The first two bytes in each entry point to the information in the Statement Syntax Table for this statement. The rest of the entry is the name of the statement name in ATASCII. Since name lengths vary, the last character of the statement name has the most significant bit turned on to indicate the end of the entry. The value of the Statement Name Token is derived from the relative (from zero) entry number of the statement name in this table.

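If you don’t recognize the address $A4AF: the cartridge ROM is mapped at $A000-$BFFF (see the /S5 line in the pinout earlier), so $A4AF is offset 0x04AF in our dump, which is right where we saw FO...NEX...GOT..!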

000004a0: 02c5 aad0 ddb1 9530 03c8 d0f9 38b0 dac7  .......0....8...
000004b0: a752 45cd caa7 4441 54c1 f3a6 494e 5055  .RE...DAT...INPU
000004c0: d4bc a643 4f4c 4fd2 32a7 4c49 53d4 23a7  ...COLO.2.LIS.#.
000004d0: 454e 5445 d2bf a64c 45d4 93a7 49c6 d1a6  ENTE...LE...I...
000004e0: 464f d2e9 a64e 4558 d4bc a647 4f54 cfbc  FO...NEX...GOT..
000004f0: a647 4f20 54cf bca6 474f 5355 c2bc a654  .GO T...GOSU...T
00000500: 5241 d0bd a642 59c5 bda6 434f 4ed4 5fa7  RA...BY...CON._.
00000510: 434f cd20 a743 4c4f 53c5 bda6 434c d2bd  CO. .CLOS...CL..
00000520: a644 45c7 5fa7 4449 cdbd a645 4ec4 bda6  .DE._.DI...EN...
00000530: 4e45 d719 a74f 5045 ce23 a74c 4f41 c423  NE...OPE.#.LOA.#
00000540: a753 4156 c540 a753 5441 5455 d349 a74e  .SAV.@.STATU.I.N
00000550: 4f54 c549 a750 4f49 4ed4 17a7 5849 cf62  OT.I.POIN...XI.b
00000560: a74f ce5c a750 4f4b c5fb a650 5249 4ed4  .O.\.POK...PRIN.
00000570: bda6 5241 c4f4 a652 4541 c4ee a652 4553  ..RA...REA...RES

So that’s the mystery solved! The strings DO have the last letter, but its most significant bit is also set to indicate the end of the word, which pushes it out of the printable ASCII range. In hindsight, what I should have done is XOR the data I had with the string I was expecting, and I would have seen the pattern immediately. A good technique to file away for next time.
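Here is what that XOR check looks like on the first keyword, as a minimal sketch (the helper name is mine). The three bytes come straight from offset 0x04b1 of the dump.

```python
def xor_bytes(data, expected):
    """XOR two byte strings position by position."""
    return bytes(d ^ e for d, e in zip(data, expected))


raw = bytes.fromhex("5245cd")  # bytes at offset 0x04b1 in the dump
print(xor_bytes(raw, b"REM").hex())  # 000080
```

Every byte matches except the last, where only the top bit (0x80) differs: exactly the end-of-word marker.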

We can grab the full table of tokens now:

Index: 42951, String: REM
Index: 42954, String: DATA
Index: 42739, String: INPUT
Index: 42684, String: COLOR
Index: 42802, String: LIST
Index: 42787, String: ENTER
Index: 42687, String: LET
Index: 42899, String: IF
Index: 42705, String: FOR
Index: 42729, String: NEXT
Index: 42684, String: GOTO
Index: 42684, String: GO TO
Index: 42684, String: GOSUB
Index: 42684, String: TRAP
Index: 42685, String: BYE
Index: 42685, String: CONT
Index: 42847, String: COM
Index: 42784, String: CLOSE
Index: 42685, String: CLR
Index: 42685, String: DEG
Index: 42847, String: DIM
Index: 42685, String: END
Index: 42685, String: NEW
Index: 42777, String: OPEN
Index: 42787, String: LOAD
Index: 42787, String: SAVE
Index: 42816, String: STATUS
Index: 42825, String: NOTE
Index: 42825, String: POINT
Index: 42775, String: XIO
Index: 42850, String: ON
Index: 42844, String: POKE
Index: 42747, String: PRINT
Index: 42685, String: RAD
Index: 42740, String: READ
Index: 42734, String: RESTORE
Index: 42685, String: RETURN
Index: 42790, String: RUN
Index: 42685, String: STOP
Index: 42685, String: POP
Index: 42747, String: ?
Index: 42727, String: GET
Index: 42681, String: PUT
Index: 42684, String: GRAPHICS
Index: 42844, String: PLOT
Index: 42844, String: POSITION
Index: 42685, String: DOS
Index: 42844, String: DRAWTO
Index: 42842, String: SETCOLOR
Index: 42721, String: LOCATE
Index: 42840, String: SOUND
Index: 42751, String: LPRINT
Index: 42685, String: CSAVE
Index: 42685, String: CLOAD
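A listing like the one above can be produced by walking the table as the Source Book describes it: a two-byte pointer, then a name whose final byte has bit 7 set. The sketch below is a reconstruction under those assumptions rather than the exact script I used. The function name and the `count` parameter are mine, and I read the pointer as little-endian because that is what makes the numbers line up.

```python
ROM_BASE = 0xA000  # the 8 KB cartridge is mapped at $A000-$BFFF


def parse_statement_names(rom, table_addr=0xA4AF, count=3):
    """Return (pointer, name) pairs for the first `count` table entries."""
    pos = table_addr - ROM_BASE  # convert the Atari address to a dump offset
    entries = []
    for _ in range(count):
        # Two-byte pointer into the Statement Syntax Table, low byte first
        pointer = rom[pos] | (rom[pos + 1] << 8)
        pos += 2
        letters = []
        while True:
            byte = rom[pos]
            pos += 1
            letters.append(chr(byte & 0x7F))  # strip the end-of-name marker bit
            if byte & 0x80:  # bit 7 set means this was the last character
                break
        entries.append((pointer, "".join(letters)))
    return entries


# First three entries, copied from the dump starting at offset 0x04af
sample = bytes(0x04AF) + bytes.fromhex("c7a75245cdcaa7444154c1f3a6494e5055d4")
for pointer, name in parse_statement_names(sample):
    print(f"Index: {pointer}, String: {name}")
```

On the sample bytes this prints the same first three lines as the listing (42951 REM, 42954 DATA, 42739 INPUT), which suggests the “Index” column is really the syntax-table pointer ($A7C7 is 42951 in decimal).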

I decided to end the project here. I have achieved my initial goal: to retrieve the code from the cartridge, as confirmed by seeing the interpreter’s keyword table. I could spend more time digging through the Ghidra output to understand how the interpreter works, but the Atari BASIC Source Book has already done the work for me. Plus, my goal was always to improve my electrical skills, not my software skills. Who knows, I might try dumping a GBA game next…

There were more turns than I was expecting, but it taught me a lot and I’m proud to put it back on my shelf with a checkmark next to it. If you want to play around with the dump yourself, you can find it on GitHub here.

Afterthoughts

They say when you try something for the first time, halve it in size, and then halve it again. This was a great first project: retrieving the data off a simple cartridge/ROM chip whose pinout is well documented, with existing rips I could compare my data against. The chips were even socketed, so I didn’t need to de-solder them. Even so, it took a lot of perseverance, thinking, and luck. I’m currently riding the high of a completed project, but if I had picked something more complex, I probably would not have finished it. And yet, most of the skills I’ve learned in this project would work just as well in a more complex one. Maybe tutorial levels aren’t so bad after all?

It’s difficult to switch my brain from seeing a circuit board as a singular entity to seeing it as a composition of smaller, atomic electrical components to manipulate and rearrange. I have experience jumping between multiple levels in software, but hardware is new to me. So it’s a good mental flexibility exercise.

It’s also hard to build a mental model from the ground up. When working in a new domain, there is no reference for “correct” vs “incorrect”. Should my cartridge have 15 pins on each side? If it doesn’t, is it a special version of the cartridge, or for a different machine, or a flawed PCB? Everything is new. You have to stake a few knowledge posts in the ground, and then use those as reference to learn a few more things, and later you come back and re-learn your original reference points in the additional context of new info. It’s both an amazing and amusing feeling to re-learn the same idea multiple times, with increasing levels of familiarity and depth.

So, that’s that. On to the next project!