Preface
Continuing from where we left off in post #3 on Smarty and the Nasty Gluttons.. this post is about the floppy disk bootblock used in the online ADF version of the game.
The online ADF game bootblock
I had never coded a TVD (trace vector decoder) or even cracked one. Also, my experience with the "weird stuff" one can do with the mc68000 prefetch, or with instructions like STOP and RESET, was very thin. Demo coding had little use for those, and my demos hardly had anything worth protecting from someone taking a peek.. Anyway, this was another field yet to be explored, and I knew(!) there are folks who find such stuff "a nice surprise" in a game. So, the opportunity to code and release "a protection scheme", or an attempt at one, came when we got closer to releasing Smarty and the Nasty Gluttons. I kept saying to my fellow game developers that there are folks who will hardly ever play the game but will disasm, train, fix etc. the heck out of it. Adding even a small non-obvious "road block" would definitely touch a soft spot for some of them..
I had my plan - a lot did not materialize, though:
- must have a TVD - a bucket list item
- must have use of seldom used opcodes like STOP, RESET, RTR, etc
- must have obfuscated stuff and somehow using Amiga's custom chips/hardware peculiarities
- must have checksums
- must have encryption depending on the bootblock that is spread all around the game files
- must have self-modifying code
- must have code that makes no sense but is still required..
- must work on all game dev group Amigas (includes CPUs up to mc68040 - in the end Jope @EAB did huge work testing with many.. many.. Amiga setups)
- should have AR (Action Replay) protection
I started with experimenting with RESET + Memory Overlay. Got it working on my A600 (up to 68020) but never on my A3000 (with 68040). There was even a short discussion started by me about it in EAB. Sadly, I had to drop this. It would have been so cool just to have it ;-)
Next in testing was experimenting with the mc68000 prefetch and self-modifying code. It turned out to work nicely. Then I targeted the STOP instruction and releasing it with a Copper-generated IRQ. Again it turned out to work nicely.
I thought about AR protection a bit, but the only "good" mechanism I knew of was using an odd address in A7, and that did not really fit the rest of the plan. I needed the stack and IRQs to work. Also, I had no AR hardware to use for testing. I found one on the web but sadly I have no Amiga model that it requires. Another item to drop..
Then came the TVD.. I now have huge respect for the folks in the 1980s and early 1990s who implemented and cracked these weirdo TVD protections with the tools of the time.. I mean, I coded/tested/debugged the one and only in Smarty on REAL HARDWARE and getting it to run was an absolute nightmare ;-) Once I finally managed to get the execution path recorded and a mini-sized emulator written to do the trace vector encoding part, I never changed it anymore. Remember, this was my first ever TVD and I had no experience of cracking one. Obviously I had looked at a few in the past and read the excellent article on Flashtro about "Basic TVD Cracking" (kudos to WayneKerr).. still, I consider myself a total newbie on anything cracking & protecting related.
Anyway, at some point we just ran out of time (no pun intended for a game that has been under development since the early 1990s), which meant I dropped checksums, blitter decoding of stuff, copper orchestrated stuff (e.g., blitter) and actually anything that would have required modifications to the actual game code. We were play testing, fixing bugs and fine tuning so much that all this checksum crap etc. just did not make it into the plan anymore. I really regret it but.. it was more important to get the game out. That's the reason the "protection attempt" is only in the bootblock and nothing in the actual game is protected. Sorry.
Let's start looking into the actual code.. against my usual principles the code snippets etc are taken from FS-UAE. Stepping through all this weird stuff is just too easy with it. Taking screen captures using Amiga and transferring them to a modern social-media-enabled platform is a bit tedious.. and I am currently slightly short of soothing liqueurs to ease the work as well. And don't laugh.. while the stuff here is trivial'ish I really, really, enjoyed the journey except for some parts of the TVD debugging.
I have highlighted a few important areas in the bootblock hexdump. The hexdump in red with yellow background ($5cd0-$5cef) is the copper list, which is also runnable mc68000 code. Actually, our copper list is the privilege violation exception handler.
The hexdump in black with yellow background (the zero long word at $5fdc) is a stop mark for the loop relocating the bootblock into low memory, and the following word in red background ($00cc at $5fe0) is the initial USP pointer. The hexdump in green background (starting at $5fe2) is the remainder of the non-system cache killing code (remember what I said about the TVD code execution path recording..). These three are all related in the bootblock code.
00005C40 444F 5300 605A 247E 7FFF 7F7F 4BFA 001A DOS.`Z$~....K...
00005C50 43FA FFF2 4CD1 0030 740A 95CA 264F 4EAE C...L..0t...&ON.
00005C60 FFE2 58AF 0002 4E73 49F9 00DF 89A0 2945 ..X...NsI.....)E
00005C70 66FA 13C5 00BF ED01 429A 9BC9 24D9 66FC f.......B...$.f.
00005C80 21E3 0010 3659 4E63 7000 7200 4E7B 0801 !...6YNcp.r.N{..
00005C90 4E7A 1002 4E91 0695 FFFD B203 06A5 001D Nz..N...........
00005CA0 A7E1 3F0D 2F15 3F02 0000 006C 2978 0068 ..?./.?....l)x.h
00005CB0 66FA 4E71 4E73 4C97 0301 4C90 0801 B153 f.NqNsL...L....S
00005CC0 D25D 4841 3009 B540 4890 0201 B151 4E73 .]HA0..@H....QNs
00005CD0 4C97 0302 4841 0342 D29D 0542 0C83 00FE L...HA.B...B....
00005CE0 009C B399 4E77 00FE 0180 0228 FFFF FFFE ....Nw.....(....
00005CF0 0D04 00B0 66E0 397C 8290 66F6 06AF C372 ....f.9|..f....r
00005D00 4576 FFF2 760A 7A92 66FA 7C08 247C 0000 Ev..v.z.f.|.$|..
00005D10 00B0 4E62 A166 4E71 4C90 66EA 2768 47A0 ..Nb.fNqL.f.'hG.
........ quite a bit of encrypted code follows..
00005FC0 24EC 58B4 75BE 4D40 D8C7 5CF2 CEC6 BF51 $.X.u.M@..\....Q
00005FD0 7AC0 4A25 DA0D 0000 0000 5000 0000 0000 z.J%......P.....
00005FE0 00CC 4A81 6A02 F4F8 4E7B 0002 4E7B 0808 ..J.j...N{..N{..
00005FF0 4E75 434F 4F50 4552 2F43 4F4F 5045 522F NuCOOPER/COOPER/
00006000 84A2 A262 62F2 C204 A2B2 F2CA 04A2 D292 ...bb...........
00006010 3204 2232 AAF2 EA04 9204 2272 8204 CA4A 2."2......"r...J
00006020 82A2 9A04 AC4C 044A A22A 6282 042A A2A2 .....L.J.*b..*..
00006030 B204 2292 2204 A2EA A5AD B007 C0DE DBAD .."."...........
The bootblock code starts here.. D2 is initialized to $0a (decimal 10) as we need it for code decrypting. A5 points at the code to be run in supervisor mode. A1 points at the bootblock just after the DOS Type.
00005C4C 4bfa 001a LEA.L (PC,$001a) == $00005c68,A5
00005C50 43fa fff2 LEA.L (PC,$fff2) == $00005c44,A1
00005C54 4cd1 0030 MOVEM.L (A1),D4-D5
00005C58 740a MOVE.L #$0000000a,D2
00005C5A 95ca SUBA.L A2,A2
00005C5C 264f MOVEA.L A7,A3
00005C5E 4eae ffe2 JSR (A6, -$001e) == $00000658
This is the Illegal Instruction exception handler used by the non-system cache killing code to skip the 4-byte offending instruction. We saved the current USP to A3 so that we can later fetch the JSR return address (i.e., $5c62) as the exception handler address and put it into address $10. The neat thing here is that if the CPU is an mc68000 the write really goes into the exception vector at address $10. On better CPUs the exception vector may be relocated using VBR to whatever address; however, there the code we have for killing caches etc. does not trigger an illegal instruction exception either..
00005C62 58af 0002 ADD.L #$00000004,(A7, $0002) == $000018b4
00005C66 4e73 RTE
Take note here.. A4 is loaded with $df89a0, which we use to refer to the hardware registers with an offset of $6660. The value is selected on purpose and later we will see why. Here these three lines disable all relevant interrupts.
00005C68 49f9 00df 89a0 LEA.L $00df89a0,A4
00005C6E 2945 66fa MOVE.L D5,(A4, $66fa) == $00dff09a
00005C72 13c5 00bf ed01 MOVE.B D5,$00bfed01
Relocate the bootblock code starting from $5c44, i.e. just after the bootblock DOS Type, to memory address $4. We clear the bytes at $0 to $3 as they serve as the initial context for the TVD. Take note of "SUBA.L A1,A5", which will init A5 to $24, i.e. the address of the Trace exception vector. In the non-relocated code A5 would point at $5c64. The copying continues until the first long-word-aligned zero long word shows up in the bootblock.
00005C78 429a CLR.L (A2)+
00005C7A 9bc9 SUBA.L A1,A5
00005C7C 24d9 MOVE.L (A1)+,(A2)+
00005C7E 66fc BNE.B #$fffffffc == $00005c7c (F)
00005C80 21e3 0010 MOVE.L -(A3),$00000010
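The relocation loop above can be mimicked with a few lines of Python. This is only a sketch on a toy image, not the real bootblock: the function name `relocate` and the toy bytes are made up for illustration, and only the first two long words and the $00cc word are taken from the hexdump.

```python
import struct

def relocate(image, src_off=4):
    """Mimic CLR.L (A2)+ followed by the MOVE.L (A1)+,(A2)+ / BNE.B loop."""
    low_mem = bytearray(4)                      # CLR.L (A2)+ zeroes $0-$3
    while True:
        (long_word,) = struct.unpack_from(">L", image, src_off)
        low_mem += struct.pack(">L", long_word)  # the zero long is copied too
        src_off += 4
        if long_word == 0:                      # BNE.B falls through on zero
            return bytes(low_mem), src_off

# Toy image: "DOS\0", two payload longs, the zero stop mark, then trailing data.
toy = b"DOS\x00" + bytes.fromhex("605a247e 7fff7f7f 00000000 00cc4a81")
low_mem, next_off = relocate(toy)

# After the loop the source pointer sits just past the stop mark.
initial_usp = struct.unpack_from(">H", toy, next_off)[0]

# SUBA.L A1,A5 happens before the copy: $5c68 - $5c44 = $24 (trace vector).
a5 = 0x5C68 - 0x5C44
print(hex(a5), hex(initial_usp), low_mem.hex())
```

Because the source starts 4 bytes into the bootblock and the destination starts at $4, every relocated address is simply the original address minus $5c40.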
Get the initial USP ($cc); A1 now points at the remainder of the non-system cache killing code.
00005C84 3659 MOVEA.W (A1)+,A3
00005C86 4e63 MVR2USP.L A3
00005C88 7000 MOVE.L #$00000000,D0
00005C8A 7200 MOVE.L #$00000000,D1
00005C8C 4e7b 0801 [ MOVEC D0,VBR ]
00005C8E 0801 4e7a BTST.L #$4e7a,D1
00005C92 1002 MOVE.B D2,D0
00005C94 4e91 JSR (A1)
The following two lines modify the "relocated code" (we are still running non-relocated code here) at addresses $20 (equals to $5c60) and $24 (equals to $5c64) generating the privilege violation and trace exception handler addresses.
00005C96 0695 fffd b203 ADD.L #$fffdb203,(A5)
00005C9C 06a5 001d a7e1 ADD.L #$001da7e1,-(A5)
$00000020 will contain $00000090 i.e., code also located at $5cd0.
$00000024 will contain $00000076 i.e., code also located at $5cb6.
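The two vector values can be checked with plain 32-bit wrap-around arithmetic. A small Python sketch follows; the long words $fffe258af-style inputs are read straight from the hexdump at $5c60 and $5c64, and the helper name `add_l` is just an illustration.

```python
MASK32 = 0xFFFFFFFF

def add_l(value, immediate):
    """32-bit ADD.L with wrap-around, like the CPU does it."""
    return (value + immediate) & MASK32

# Long word at $5c64 (relocated: $24) is $00024e73, the ADDQ/RTE code bytes.
trace_vector = add_l(0x00024E73, 0xFFFDB203)

# Long word at $5c60 (relocated: $20) is $ffe258af, also plain code bytes.
priv_vector = add_l(0xFFE258AF, 0x001DA7E1)

print(hex(priv_vector), hex(trace_vector))
```

So the bytes that were executed as code a moment ago double as the "encrypted" exception vectors.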
The next three instructions create a Format 0 stack frame that looks like we were returning from a privilege violation exception handler. The status word comes from D2 ($000a): it has the supervisor and trace bits clear, so the RTE a bit later moves the CPU to user mode with tracing off (of the condition codes, $0a sets N and V). Code execution then resumes at the relocated address $90, i.e. the privilege violation exception handler. A detail here is that the exception handler is then running in user mode, not in supervisor mode, and this matters when it comes to the selection of stack pointers.
00005CA2 3f0d MOVE.W A5,-(A7)
00005CA4 2f15 MOVE.L (A5),-(A7)
00005CA6 3f02 MOVE.W D2,-(A7)
In the relocated bootblock code $5ca8 is at address $68, i.e. the Level 2 interrupt autovector. It contains the value $6c, meaning the code for the interrupt handler is at $6c. The same value is also adequate to clear the pending IRQs that triggered the Level 2 IRQ. We write the entire content of $68, i.e. $0000006c, into $dff09a but only the bits for $dff09c take effect.
00005CA8 0000 006c OR.B #$6c,D0
00005CAC 2978 0068 66fa MOVE.L $00000068,(A4, $66fa) == $0000c33a
00005CB2 4e71 NOP
00005CB4 4e73 RTE
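Why the vector value doubles as an interrupt acknowledge can be sketched in Python. The helper names are hypothetical; the register offsets ($09a INTENA, $09c INTREQ), the SET/CLR convention of bit 15 and PORTS being bit 3 are standard Amiga custom chip facts.

```python
CUSTOM_BASE = 0xDFF000
REG_NAMES = {0x09A: "INTENA", 0x09C: "INTREQ"}
PORTS_BIT = 1 << 3

def split_long_write(address, value):
    """A long write hits two consecutive 16-bit custom registers."""
    high, low = (value >> 16) & 0xFFFF, value & 0xFFFF
    return [(REG_NAMES[address - CUSTOM_BASE], high),
            (REG_NAMES[address + 2 - CUSTOM_BASE], low)]

def set_clr_effect(word):
    """Bit 15 selects 'set' (1) or 'clear' (0) for the bits given in 14-0."""
    return ("set" if word & 0x8000 else "clear", word & 0x7FFF)

writes = split_long_write(0xDF89A0 + 0x66FA, 0x0000006C)
intena_effect = set_clr_effect(writes[0][1])   # clears no bits at all
intreq_effect = set_clr_effect(writes[1][1])   # clears the bits in $006c

print(writes, intena_effect, intreq_effect)
```

The high word is a "clear nothing" write to INTENA, while the low word clears, among others, the PORTS request bit in INTREQ.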
The RTE causes a jump to address $90. The stack used now is the USP, which points at $cc. Note the loading of D1/A0/A1 with values from the stack and later the "EOR.L D1,(A1)+", which modifies the instruction that caused the exception just before exiting the handler. In this first case, though, we have a fabricated stack frame and thus modify a different place in memory.. see below.
Also, the "ADD.L (A5)+,D1" starts to calculate a checksum over the bootblock code. Note that the "RTR" pops also the CCR but the supervisor portion of the status register is unaffected.
00000090 4c97 0302 MOVEM.W (A7),D1/A0-A1
00000094 4841 SWAP.W D1
00000096 0342 BCHG.L D1,D2
00000098 d29d ADD.L (A5)+,D1
0000009A 0542 BCHG.L D2,D2
0000009C 0c83 00fe 009c CMP.L #$00fe009c,D3
000000A2 b399 EOR.L D1,(A1)+
000000A4 4e77 RTR
What we have at $cc is a piece of code, which is actually used later to load A2 with a wanted value. The handler loads the value $000000b0 into A1 (remember the sign extending properties of "movem.w" and loading a word into an address register in general). Address $b0 is also the address where code execution continues after the "RTR".
000000CC 247c 0000 00b0 MOVEA.L #$000000b0,A2
Also, the above handler will now modify code at address $b0, which originally looks like:
000000B0 0d04 BTST.L D6,D4
000000B2 00b0 66e0 397c 8290 OR.L #$66e0397c,(A0, A0.W*2, $ffffff90)
000000BA 66f6 BNE.B #$fffffff6 == $000000b2 (F)
After that "EOR.L D1,(A1)+" the code changes to:
000000B0 2978 0020 66e0 MOVE.L $00000020,(A4, $66e0) == $00dff080
000000B6 397c 8290 66f6 MOVE.W #$8290,(A4, $66f6) == $00dff096
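The patch value can be reproduced by following D1 through the handler. A Python sketch, assuming A5 still holds $20 at this point (so the "ADD.L (A5)+,D1" adds the freshly generated vector value $90); the helper name is made up.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(word):
    """MOVEM.W sign-extends each loaded word to 32 bits."""
    return (word - 0x10000 if word & 0x8000 else word) & MASK32

stack_words = [0x247C, 0x0000, 0x00B0]         # words at $cc, $ce, $d0

d1 = sign_extend16(stack_words[0])             # MOVEM.W (A7),D1/A0-A1
a1 = sign_extend16(stack_words[2])
d1 = ((d1 << 16) | (d1 >> 16)) & MASK32        # SWAP D1
d1 = (d1 + 0x00000090) & MASK32                # ADD.L (A5)+,D1, ($20) = $90

patched = 0x0D0400B0 ^ d1                      # EOR.L D1,(A1)+ at $b0
print(hex(a1), hex(d1), hex(patched))
```

The result $29780020 is exactly the first two words of "MOVE.L $00000020,(A4, $66e0)"; the third word $66e0 was already in place.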
When the above code is executed the privilege violation exception handler address is used as the COP1 address and the relevant DMAs are started. The privilege violation exception handler at $90 is also a valid copper list. The significant piece there is the triggering of a Level 2 IRQ (write of $b399 to $dff09c). This will be useful later.
00000090: 4c97 0302 ; VP 4c, VE 03; HP 96, HE 02; BFD 0
00000094: 4841 0342 ; VP 48, VE 03; HP 40, HE 42; BFD 0
00000098: d29d 0542 ; VP d2, VE 05; HP 9c, HE 42; BFD 0
0000009c: 0c83 00fe ; VP 0c, VE 00; HP 82, HE fe; BFD 0
000000a0: 009c b399 ; INTREQ := 0xb399
000000a4: 4e77 00fe ; VP 4e, VE 00; HP 76, HE fe; BFD 0
000000a8: 0180 0228 ; COLOR00 := 0x0228
000000ac: ffff fffe ; VP ff, VE 7f; HP fe, HE fe; BFD 1
; End of Copperlist
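To see how the same bytes read as a copper list, here is a minimal Python decoder sketch. It only uses the basic copper rule (bit 0 of the first word clear means MOVE, otherwise bit 0 of the second word picks WAIT or SKIP); the function name is hypothetical.

```python
def decode_copper(words):
    """Classify copper instruction pairs as MOVE, WAIT or SKIP."""
    decoded = []
    for ir1, ir2 in zip(words[0::2], words[1::2]):
        if ir1 & 1 == 0:
            decoded.append(("MOVE", ir1 & 0x1FE, ir2))   # register, value
        elif ir2 & 1 == 0:
            decoded.append(("WAIT", ir1, ir2))
        else:
            decoded.append(("SKIP", ir1, ir2))
    return decoded

# The 32 bytes at $90-$af, i.e. the privilege violation handler.
handler_words = [0x4C97, 0x0302, 0x4841, 0x0342, 0xD29D, 0x0542, 0x0C83, 0x00FE,
                 0x009C, 0xB399, 0x4E77, 0x00FE, 0x0180, 0x0228, 0xFFFF, 0xFFFE]

listing = decode_copper(handler_words)
moves = [(reg, val) for kind, reg, val in listing if kind == "MOVE"]
print(moves)
```

Only two pairs decode as MOVEs: the INTREQ write (with bit 15 and the PORTS bit 3 set) and the COLOR00 write; everything else is a WAIT.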
The next four lines of funky code play around with the prefetch of the mc680x0. After the recent "RTR" the USP points at $d2. The "ADD.L" here will therefore change the code at address $c4 i.e. the code immediately following the "ADD.L".
000000BC 06af c372 4576 fff2 ADD.L #$c3724576,(A7, -$000e) == $000000c4
000000C4 760a MOVE.L #$0000000a,D3
000000C6 7a92 MOVE.L #$ffffff92,D5
000000C8 66fa BNE.B #$fffffffa == $000000c4 (F)
000000CA 7c08 MOVE.L #$00000008,D6
000000CC 247c 0000 00b0 MOVEA.L #$000000b0,A2
However, due to prefetch the CPU has already loaded the "old code", which sets registers D3 and D5. The "BNE.B" above causes looping back to just modified code at $c4. Now the selected offset for A4 register makes sense. It was the "BNE.B" instruction opcode. The code essentially initializes registers for the TVD and enables a "PORTS" Level 2 IRQ (that will be triggered from the copper list).
000000C4 397c c008 66fa MOVE.W #$c008,(A4, $66fa) == $00dff09a
000000CA 7c08 MOVE.L #$00000008,D6
000000CC 247c 0000 00b0 MOVEA.L #$000000b0,A2
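The numbers behind this trick are easy to verify. A Python sketch of the arithmetic only (it does not model the prefetch itself):

```python
MASK32 = 0xFFFFFFFF

old_code = 0x760A7A92                        # MOVEQ #$0a,D3 / MOVEQ #$92,D5
new_code = (old_code + 0xC3724576) & MASK32  # ADD.L #$c3724576,(A7, -$000e)

bne_word = 0x66FA                            # BNE.B opcode at $c8
disp = bne_word & 0xFF
disp = disp - 0x100 if disp & 0x80 else disp # 8-bit signed displacement
branch_target = 0xC8 + 2 + disp              # relative to the opcode address + 2

# The very same word is the displacement of MOVE.W #$c008,(A4, $66fa).
register_address = 0xDF89A0 + bne_word

print(hex(new_code), hex(branch_target), hex(register_address))
```

So the choice of $df89a0 for A4 is what lets the unmodified BNE.B word serve as the INTENA displacement once the two MOVEQs in front of it have turned into "MOVE.W #$c008".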
The following instruction looks legitimate and actually is. However, it will trigger a privilege violation exception and not really change the USP at all.. but having $b0 in A2 is important later.
000000D2 4e62 MVR2USP.L A2
000000D4 a166 ILLEGAL
000000D6 4e71 NOP
Once the privilege violation handler returns back to $d2 the code has changed a bit and remember that we return from the handler using "RTR" so we remain in supervisor mode..
000000D2 4e72 a110 STOP #$a110
..and the CPU stops until an IRQ takes place with priority/level higher than 1. This means we just sit here until the copper triggers next Level 2 IRQ. This code really has no particular useful meaning but it was just something I had to have to justify the copper list in a code thingy ;-) Well, actually, the "STOP" writes the status register and here it will keep the CPU in supervisor and enable tracing! The next instruction ("NOP") will then kick-off our TVD..
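What the STOP operand does to the status register can be spelled out with a small Python sketch. The bit positions are the standard mc68000 status register layout; the function name is made up.

```python
def decode_sr(sr):
    """Split a 68000 status register value into its fields."""
    flags = "".join(name for name, bit in zip("XNZVC", (16, 8, 4, 2, 1)) if sr & bit)
    return {
        "trace": bool(sr & 0x8000),
        "supervisor": bool(sr & 0x2000),
        "ipl": (sr >> 8) & 7,       # interrupt priority mask
        "ccr": flags,
    }

stop_sr = decode_sr(0xA110)
# An interrupt gets through only if its level is above the mask.
wakes_on_level2 = 2 > stop_sr["ipl"]
print(stop_sr, wakes_on_level2)
```

With the mask at 1, the copper-triggered Level 2 PORTS interrupt is enough to release the STOP, and the trace bit is already set when execution continues.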
The trace exception handler is located at address $76. A few notes.. The bootblock is always in the lower 32K of RAM, thus we can make use of 16 bit addressing when assigning values to address registers. For example, the first "MOVEM.W" will always load A0 with address $0, and in that location we maintain the TVD context for the previously decrypted instruction. We want to decrypt the next instruction but also re-encrypt what we already executed. The word at $0 is the "previous XOR key" and the word at $2 is the "previously decrypted instruction address". Initially both were set to zero.
Here D0 is not used for anything. A0 is the TVD context address i.e. $0 and A1 is the address of the next instruction to execute. D1 will accumulate the checksum over the bootblock and D2 is "salt" that gets modified here and there in the decrypted code while TVD runs.
00000076 4c97 0301 MOVEM.W (A7),D0/A0-A1
Re-encrypt the previously executed instruction.
0000007A 4c90 0801 MOVEM.W (A0),D0/A3
0000007E b153 EOR.W D0,(A3)
Update the code checksum. Note that we changed to ".W" and this is to avoid the checksum calculation pointer (A5) reaching the code we decrypt with non-TVD decrypters.. those decrypters are protected by the TVD.. uhh.. I was lazy as the crossing point would have required some extra care to work properly. Most of my TVD stuff was trial'n'error and at this point the error rate started to get too high.
00000080 d25d ADD.W (A5)+,D1
00000082 4841 SWAP.W D1
The actual "decryption XOR key" is a mix of the D2 ("salt") and the address of the next instruction.. before decrypting the next instruction both the calculated "decryption XOR key" and the instruction address get stored into the TVD context location. This lame'ish key algorithm was selected because we needed to allow encrypting and decrypting code that has conditional branches. So, if the D2 ("salt") remains constant within a code block that branches around we can deterministically calculate the "decryption XOR key". As can be seen later the D2 gets modified only in carefully selected spots..
00000084 3009 MOVE.W A1,D0
00000086 b540 EOR.W D2,D0
00000088 4890 0201 MOVEM.W D0/A1,(A0)
0000008C b151 EOR.W D0,(A1)
0000008E 4e73 RTE
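The decrypt/re-encrypt dance of the handler can be sketched in Python and tried on the first two encrypted words from the hexdump ($4c90 at $d8 and $66ea at $da). Loud assumptions: the class and method names are hypothetical, the checksum update is left out, and the salt value $0c0a plus X = 1 at the first traced instruction are not stated in the text; they were worked out by hand from the listings above.

```python
class TraceDecoder:
    """Toy model of the trace handler at $76 (checksum part omitted)."""

    def __init__(self, memory):
        self.mem = memory        # {address: 16-bit word}
        self.prev_key = 0        # word kept at $0
        self.prev_addr = 0       # word kept at $2

    def on_trace(self, next_pc, salt):
        if self.prev_key:                                # XOR with 0 is a no-op
            self.mem[self.prev_addr] ^= self.prev_key    # EOR.W D0,(A3)
        key = (next_pc ^ salt) & 0xFFFF                  # MOVE.W A1,D0 / EOR.W D2,D0
        self.prev_key, self.prev_addr = key, next_pc     # MOVEM.W D0/A1,(A0)
        self.mem[next_pc] ^= key                         # EOR.W D0,(A1)
        return self.mem[next_pc]

mem = {0xD8: 0x4C90, 0xDA: 0x66EA}
tvd = TraceDecoder(mem)

salt = 0x0C0A                                # assumed D2.W when the TVD starts
first = tvd.on_trace(0xD8, salt)             # opcode about to run at $d8
salt = (0 - salt - 1) & 0xFFFF               # NEGX.W D2 with X = 1 (assumed)
second = tvd.on_trace(0xDA, salt)            # opcode about to run at $da
print(hex(first), hex(second), hex(mem[0xD8]))
```

Under these assumptions the two words come out as $4042 (NEGX.W D2) and $95c5 (SUBA.L D5,A2), matching the decrypted listing, and the word at $d8 is back to its encrypted form once the second instruction is exposed.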
The following code is what gets decrypted by the TVD. The actual execution path varies depending on your Amiga's memory configuration as the memlist check for autoconfigured memory is also part of the code that is protected with the TVD. At the beginning D3 is $0000000a, D5 is $ffffff92 and A2 is $000000b0.
000000D8 4042 NEGX.W D2
000000DA 95c5 SUBA.L D5,A2
A2 points at $11e now, which is the start of the "normal encrypted code" outside the TVD protected code. The TVD protected code contains two decryption loops to decrypt the minimal FFS file loader etc. The one below is the decrypter #1 and it uses the previous code checksum as the decryption key.
000000DC d441 ADD.W D1,D2
000000DE b39a EOR.L D1,(A2)+
000000E0 5303 SUB.B #$00000001,D3
000000E2 6afa BPL.B #$fffffffa == $000000de (T)
000000E4 d705 ADDX.B D5,D3
000000E6 4681 NOT.L D1
000000E8 d441 ADD.W D1,D2
The next "ADDA.L" is important.. at the very beginning of the boot the system passed the ExecBase address in A6 to the bootblock code. A2 points at address $14a after the first decrypter and tada.. once the memlist code below subtracts D6 (8) from the sum, that is also the offset ($142) of the memlist structure in ExecBase, which covers the autoconfigured memory.
Furthermore, D3 was $000000ff after the first decrypter and adding D5 to it gives us $92 (note, X-flag was 1 when the "ADDX.B" was executed), which is the decrypter #2 loop count.
000000EA ddca ADDA.L A2,A6
000000EC b39a EOR.L D1,(A2)+
000000EE 51cb fffc DBF .W D3,#$fffc == $000000ec (F)
000000F2 4042 NEGX.W D2
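The register juggling around the two decrypter loops can be replayed in Python. This is a sketch of the counters and pointers only (no actual decryption); the variable names are mine, and the MemList offset of $142 is taken from the standard exec includes rather than from the text.

```python
MASK32 = 0xFFFFFFFF

d5 = 0xFFFFFF92
a2 = (0xB0 - d5) & MASK32            # SUBA.L D5,A2
first_loop_start = a2

d3 = 0x0A
passes = 0
x_flag = False
while True:                          # EOR.L D1,(A2)+ / SUBQ.B #1,D3 / BPL.B
    a2 += 4
    passes += 1
    x_flag = d3 == 0                 # the byte subtract borrows only from 0
    d3 = (d3 - 1) & 0xFF
    if d3 & 0x80:                    # BPL.B falls through on a negative result
        break

d3 = (d3 + (d5 & 0xFF) + int(x_flag)) & 0xFF   # ADDX.B D5,D3
second_loop_passes = d3 + 1                    # DBF runs count + 1 times

memlist_offset = a2 - 8              # ADDA.L A2,A6 ... later SUBA.W D6,A6 (D6 = 8)
print(hex(first_loop_start), passes, hex(a2), hex(d3), hex(memlist_offset))
```

The end address of the first loop is thus reused twice: as the start of the second encrypted area and, minus 8, as the ExecBase offset of the memory list.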
After the second decrypter follows (a slightly bugged as reported by Ross @EAB ;-) memlist code that finds out the autoconfigured and ranger RAM. We will use 512K as a RAM disk during the game.
000000F4 9cc6 SUBA.W D6,A6
000000F6 4846 SWAP.W D6
000000F8 2c56 MOVEA.L (A6),A6
000000FA 2816 MOVE.L (A6),D4
000000FC 6718 BEQ.B #$00000018 == $00000116 (F)
000000FE 282e 0014 MOVE.L (A6, $0014) == $000008d6,D4
00000102 2a2e 0018 MOVE.L (A6, $0018) == $000008da,D5
00000106 b886 CMP.L D6,D4
00000108 6402 BCC.B #$00000002 == $0000
0000010A 2806 MOVE.L D6,D4
0000010C 4244 CLR.W D4
0000010E da83 ADD.L D3,D5
00000110 4245 CLR.W D5
00000112 9a84 SUB.L D4,D5
00000114 6fe2 BLE.B #$ffffffe2 == $000000f8 (F)
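My reading of the memlist loop, as a hedged Python sketch: it walks the memory headers, clamps the lower bound to $80000, rounds both bounds to 64K and takes the first region with a positive size. The function name, the tuple input and the `None` result for "nothing found" are simplifications of mine; the real code leaves its result in D4/D5 and works on 32-bit registers.

```python
def pick_ramdisk(mem_headers, floor=0x80000):
    """mem_headers: list of (mh_Lower, mh_Upper) pairs in list order."""
    for lower, upper in mem_headers:
        start = max(lower, floor) & ~0xFFFF          # CMP.L/BCC.B/MOVE.L, CLR.W D4
        end = (upper + 0xFFFF) & 0xFFFFFFFF & ~0xFFFF  # ADD.L D3,D5 / CLR.W D5
        size = end - start                           # SUB.L D4,D5
        if size > 0:                                 # BLE.B tries the next header
            return start, size
    return None                                      # simplification, see above

half_meg_chip = pick_ramdisk([(0x400, 0x80000)])
one_meg_chip = pick_ramdisk([(0x400, 0x100000)])
ranger = pick_ramdisk([(0xC00020, 0xC80000), (0x400, 0x80000)])
print(half_meg_chip, one_meg_chip, ranger)
```

With these example layouts a plain 512K chip machine yields nothing, while 1MB of chip RAM or 512K of ranger RAM at $c00000 both give a 512K region.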
We are done with decrypting and memlist stuff now. Time to set up the stacks and exit the TVD by clearing the trace bit in the status register and, after setting the SSP to $300, returning to user mode.
00000116 46fc 2700 MV2SR.W #$2700
0000011A 4ff8 0300 LEA.L $00000300,A7
0000011E 46c6 MV2SR.W D6
00000120 4ff8 0400 LEA.L $00000400,A7
The last tweaks of obfuscation before calling the file loader.. D3 is $0000ffff and the code below will turn D6 into $00080080, which we use to stop the copper DMA and disable the "PORTS" Level 2 IRQ. Finally we store the found memory location & size of the RAM disk, held in D4 and D5, into addresses $8c and $90. Those locations are used by the main game engine.
00000124 ea0b LSR.B #$00000005,D3
00000126 07c6 BSET.L D3,D6
00000128 3946 66f6 MOVE.W D6,(A4, $66f6) == $00dff096
0000012C 2946 66fa MOVE.L D6,(A4, $66fa) == $00dff09a
00000130 397c 9500 66fe MOVE.W #$9500,(A4, $66fe) == $00dff09e
00000136 397c 4489 66de MOVE.W #$4489,(A4, $66de)
0000013C 3446 MOVEA.W D6,A2
0000013E 48d2 003e MOVEM.L D1-D5,(A2)
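The bit fiddling above, replayed as a Python sketch (register values as given in the text, variable names mine):

```python
d3 = 0x0000FFFF
d6 = 0x00080000

d3 = (d3 & 0xFFFFFF00) | ((d3 & 0xFF) >> 5)   # LSR.B #5,D3 touches the low byte only
d6 |= 1 << (d3 % 32)                          # BSET D3,D6 uses the bit number modulo 32

dmacon_word = d6 & 0xFFFF                     # MOVE.W D6 -> $dff096, bit 7 = copper DMA
intena_word = (d6 >> 16) & 0xFFFF             # MOVE.L D6 -> $dff09a gets the high word

a2 = d6 & 0xFFFF                              # MOVEA.W D6,A2 (positive, so no sign fill)
stored_at = {a2 + 4 * i: f"D{n}" for i, n in enumerate(range(1, 6))}  # MOVEM.L D1-D5,(A2)

print(hex(d3), hex(d6), {hex(k): v for k, v in stored_at.items()})
```

Both written words have bit 15 clear, so they switch the copper DMA and the PORTS interrupt off, and the same value then doubles as the base address for the register dump.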
And call the FFS file loader, load a file called "DOS" (which is the second stage loader and the main game engine) into address $5000 and execute it from there. I won't go through the loader. At this point it is uninteresting. Just a quick note that the loader is tailored to load a file named "DOS", i.e. the hash values needed to locate specific structures in FFS are precalculated etc. The reason for the imaginative naming of the second stage loader is that originally I intended to use the DOS Type at address $0 as the file name (i.e. the string "DOS\0").
00000148 610e BSR.B #$0000000e == $00000158
0000014A 6604 BNE.B #$00000004 == $00000150 (T)
0000014C 4ef8 5000 JMP $00005000
00000150 396c 6666 67e0 MOVE.W (A4, $6666) == $00dff006,(A4, $67e0) == $00dff180
00000156 60f8 BT .B #$fffffff8 == $00000150 (T)
Phew.. that was it. Again, useless stuff but it was worth the journey for me ;-)