Wednesday, July 21, 2021

Smarty and the Nasty Gluttons - Part 4 (bootblock for the online ADF game)

Preface

Continuing from where we left off in post #3 on Smarty and the Nasty Gluttons.. this post is about the floppy disk bootblock used in the online ADF version of the game.


The online ADF game bootblock

I had never coded a TVD (trace vector decoder) or even cracked one. Also, my experience with the "weird stuff" one can do with mc68000 prefetch and with instructions like STOP and RESET was very thin. Demo coding had little use for those, and my demos hardly had anything worth protecting from someone taking a peek.. Anyway, it was another field yet to be explored, and I knew(!) there are folks who find such stuff "a nice surprise" in a game. So, the opportunity to code and release "a protection scheme", or an attempt at one, came when we got closer to releasing Smarty and the Nasty Gluttons. I kept saying to my fellow game developers that there are folks who will hardly ever play the game but will disasm, train, fix etc the heck out of it. Adding even a small non-obvious "road block" would definitely touch a soft spot for some of them..

I had my plan - a lot did not materialize, though:

  • must have a TVD - a bucket list item
  • must have use of seldom used opcodes like STOP, RESET, RTR, etc
  • must have obfuscated stuff and somehow using Amiga's custom chips/hardware peculiarities
  • must have checksums
  • must have encryption depending on the bootblock that is spread all around the game files
  • must have self-modifying code
  • must have code that makes no sense but is still required..
  • must work on all game dev group Amigas (includes CPUs up to mc68040 - at the end Jope @EAB did huge work testing with many.. many.. Amiga setups)
  • should have AR protection

I started with experimenting with RESET + Memory Overlay. Got it working on my A600 (up to 68020) but never on my A3000 (with 68040). There was even a short discussion started by me about it in EAB. Sadly, I had to drop this. It would have been so cool just to have it ;-)

Next up was experimenting with the mc68000 prefetch and self-modifying code. It turned out to work nicely. Then I targeted the STOP instruction and releasing it with a Copper generated IRQ. Again, it turned out to work nicely.

I thought about AR protection a bit, but the only "good" mechanism I knew of was using an odd address in A7, and that did not really fit the rest of the plan: I needed the stack and IRQs to work. Also, I had no AR hardware to test with. I found one on the web but sadly I have no Amiga model that it requires. Another item to drop..

Then came the TVD.. I now have huge respect for the folks in the 1980s and early 1990s who implemented and cracked these weirdo TVD protections with the tools of the time.. I mean, I coded/tested/debugged the one and only TVD in Smarty on REAL HARDWARE and getting it to run was an absolute nightmare ;-) Once I finally managed to get the execution path recorded and a mini sized emulator written to do the trace vector encoding part, I never changed it again. Remember, this was my first ever TVD and I had no experience of cracking one. Obviously I had looked at a few in the past and read the excellent article on Flashtro about "Basic TVD Cracking" (kudos to WayneKerr).. still, I consider myself a total newbie on anything cracking & protecting related foo.

Anyway, at some point we just ran out of time (no pun intended for a game that has been under development since the early 1990s), which meant I dropped checksums, blitter decoding of stuff, copper orchestrated stuff (e.g., blitter) and actually anything that would have required modifications to the actual game code. We were play testing, fixing bugs and fine tuning so much that all this checksum crap etc just did not make it into the plan anymore. I really regret it but.. it was more important to get the game out. That's the reason the "protection attempt" is only in the bootblock and nothing in the actual game is protected. Sorry.

Let's start looking into the actual code.. against my usual principles the code snippets etc are taken from FS-UAE. Stepping through all this weird stuff is just too easy with it. Taking screen captures on an Amiga and transferring them to a modern social-media-enabled platform is a bit tedious.. and I am currently slightly short of soothing liqueurs to ease the work as well. And don't laugh.. while the stuff here is trivial'ish I really, really enjoyed the journey, except for some parts of the TVD debugging.

I have highlighted a few important areas in the bootblock hexdump. The hexdump in red with yellow background (the bytes from $5cd0 to $5cef) is the copper list, which is also runnable mc68000 code. Actually, our copper list is the privilege violation exception handler.

The hexdump in black with yellow background (the zero long word at $5fdc) is a stop mark for the loop relocating the bootblock into low memory, and the following word in red background ($00cc at $5fe0) is the initial USP pointer. The hexdump in green background (from $5fe2 up to the RTS at $5ff0) is the remainder of the non-system cache killing code (remember what I said about the TVD code execution path recording..) These three are all related in the bootblock code.

00005C40 444F 5300 605A 247E 7FFF 7F7F 4BFA 001A  DOS.`Z$~....K...
00005C50 43FA FFF2 4CD1 0030 740A 95CA 264F 4EAE  C...L..0t...&ON.
00005C60 FFE2 58AF 0002 4E73 49F9 00DF 89A0 2945  ..X...NsI.....)E
00005C70 66FA 13C5 00BF ED01 429A 9BC9 24D9 66FC  f.......B...$.f.
00005C80 21E3 0010 3659 4E63 7000 7200 4E7B 0801  !...6YNcp.r.N{..
00005C90 4E7A 1002 4E91 0695 FFFD B203 06A5 001D  Nz..N...........
00005CA0 A7E1 3F0D 2F15 3F02 0000 006C 2978 0068  ..?./.?....l)x.h
00005CB0 66FA 4E71 4E73 4C97 0301 4C90 0801 B153  f.NqNsL...L....S
00005CC0 D25D 4841 3009 B540 4890 0201 B151 4E73  .]HA0..@H....QNs
00005CD0 4C97 0302 4841 0342 D29D 0542 0C83 00FE  L...HA.B...B....
00005CE0 009C B399 4E77 00FE 0180 0228 FFFF FFFE  ....Nw.....(....
00005CF0 0D04 00B0 66E0 397C 8290 66F6 06AF C372  ....f.9|..f....r
00005D00 4576 FFF2 760A 7A92 66FA 7C08 247C 0000  Ev..v.z.f.|.$|..
00005D10 00B0 4E62 A166 4E71 4C90 66EA 2768 47A0  ..Nb.fNqL.f.'hG.
........ quite a bit of encrypted code follows..
00005FC0 24EC 58B4 75BE 4D40 D8C7 5CF2 CEC6 BF51  $.X.u.M@..\....Q
00005FD0 7AC0 4A25 DA0D 0000 0000 5000 0000 0000  z.J%......P.....
00005FE0 00CC 4A81 6A02 F4F8 4E7B 0002 4E7B 0808  ..J.j...N{..N{..
00005FF0 4E75 434F 4F50 4552 2F43 4F4F 5045 522F  NuCOOPER/COOPER/
00006000 84A2 A262 62F2 C204 A2B2 F2CA 04A2 D292  ...bb...........
00006010 3204 2232 AAF2 EA04 9204 2272 8204 CA4A  2."2......"r...J
00006020 82A2 9A04 AC4C 044A A22A 6282 042A A2A2  .....L.J.*b..*..
00006030 B204 2292 2204 A2EA A5AD B007 C0DE DBAD  .."."...........

The bootblock code starts here.. D2 is initialized to $0a (see the MOVE.L below) as we need it for code decrypting. A5 points at the code to be run in Supervisor mode. A1 points at the bootblock just after the DOS Type.

00005C4C 4bfa 001a           LEA.L (PC,$001a) == $00005c68,A5
00005C50 43fa fff2           LEA.L (PC,$fff2) == $00005c44,A1
00005C54 4cd1 0030           MOVEM.L (A1),D4-D5
00005C58 740a                MOVE.L #$0000000a,D2
00005C5A 95ca                SUBA.L A2,A2
00005C5C 264f                MOVEA.L A7,A3
00005C5E 4eae ffe2           JSR (A6, -$001e) == $00000658

This is the Illegal Instruction exception handler, used by the non-system cache killing code to skip the 4 byte long offending instruction. We saved the current USP to A3 so that we can later fetch the JSR return address (i.e., $5c62) as the exception handler address and put it into address $10. The neat thing here is that if the CPU is an mc68000 the write really goes into the exception vector at address $10. On better CPUs the exception vector table may be relocated using VBR to whatever address, however, there the code we have for killing caches etc does not trigger an illegal instruction exception either..

00005C62 58af 0002           ADD.L #$00000004,(A7, $0002) == $000018b4
00005C66 4e73                RTE 

Take note here.. A4 is loaded with $df89a0, which we use to refer to hardware registers with an offset of $6660 (i.e., $dff000 is at (A4, $6660)). The value is selected on purpose and later we will see why. These three lines disable all relevant interrupts.

00005C68 49f9 00df 89a0      LEA.L $00df89a0,A4
00005C6E 2945 66fa           MOVE.L D5,(A4, $66fa) == $00dff09a
00005C72 13c5 00bf ed01      MOVE.B D5,$00bfed01
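
As a quick sanity check of the A4 base value, here is a small Python sketch that resolves the displacements used in the listings of this post against $df89a0. The helper name custom_reg is made up for illustration; only the base and the displacements come from the bootblock.

```python
A4_BASE = 0xDF89A0  # value the LEA at $5c68 puts into A4

def custom_reg(disp16):
    """Effective address of (A4, d16); the 16-bit displacement is sign-extended."""
    disp = disp16 - 0x10000 if disp16 & 0x8000 else disp16
    return (A4_BASE + disp) & 0xFFFFFFFF

# Displacements that appear in the bootblock listings
for disp in (0x6660, 0x66DE, 0x66E0, 0x66F6, 0x66FA, 0x66FE):
    print(f"(A4, ${disp:04x}) -> ${custom_reg(disp):06x}")
```

So $66fa lands on $dff09a (INTENA), $66f6 on $dff096 (DMACON) and $66e0 on $dff080 (COP1LC).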

Relocate the bootblock code starting from $5c44, i.e., just after the bootblock DOS Type, to memory address $4. We clear the bytes at $0 to $3 as they serve as the initial context for the TVD. Take note of the "SUBA.L A1,A5", which will init A5 to $24 ($5c68 - $5c44), i.e., the address of the Trace exception vector. In the non-relocated code that location corresponds to $5c64. The copying continues until the first 4-byte aligned zero long word shows up in the bootblock.

00005C78 429a                CLR.L (A2)+
00005C7A 9bc9                SUBA.L A1,A5
00005C7C 24d9                MOVE.L (A1)+,(A2)+
00005C7E 66fc                BNE.B #$fffffffc == $00005c7c (F)
00005C80 21e3 0010           MOVE.L -(A3),$00000010

Get the initial USP ($cc).. now A1 points at the remainder of the non-system cache killing code.

00005C84 3659                MOVEA.W (A1)+,A3
00005C86 4e63                MVR2USP.L A3
00005C88 7000                MOVE.L #$00000000,D0
00005C8A 7200                MOVE.L #$00000000,D1
00005C8C 4e7b 0801           [ MOVEC D0,VBR ]
00005C8E 0801 4e7a           BTST.L #$4e7a,D1
00005C92 1002                MOVE.B D2,D0
00005C94 4e91                JSR (A1)

The following two lines modify the "relocated code" (we are still running the non-relocated code here) at addresses $20 (corresponds to $5c60) and $24 (corresponds to $5c64), generating the privilege violation and trace exception handler addresses.

00005C96 0695 fffd b203      ADD.L #$fffdb203,(A5)
00005C9C 06a5 001d a7e1      ADD.L #$001da7e1,-(A5)

$00000020 will contain $00000090 i.e., code also located at $5cd0.
$00000024 will contain $00000076 i.e., code also located at $5cb6.
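
The two vector values can be double-checked with a few lines of Python. This is just a sketch of the 32-bit arithmetic: the original long words are read from the hexdump at $5c60 and $5c64, and the helper name add_l is made up for illustration.

```python
MASK32 = 0xFFFFFFFF
BOOT_BASE = 0x5C40  # address of the bootblock in the hexdump above

def add_l(value, imm):
    """ADD.L with 32-bit wrap-around."""
    return (value + imm) & MASK32

# Long words found at $5c64 (relocated $24) and $5c60 (relocated $20)
trace_vec = add_l(0x00024E73, 0xFFFDB203)  # ADD.L #$fffdb203,(A5)
priv_vec = add_l(0xFFE258AF, 0x001DA7E1)   # ADD.L #$001da7e1,-(A5)

print(f"$24 -> ${trace_vec:08x}, same code at ${trace_vec + BOOT_BASE:04x}")
print(f"$20 -> ${priv_vec:08x}, same code at ${priv_vec + BOOT_BASE:04x}")
```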

The next three instructions create a Format 0 stack frame that looks like we were returning from a privilege violation exception handler. D2 ends up as the status register word: with the $000a loaded at the start that sets the N and V condition codes and moves the CPU to user mode (with no trace bit set). Code execution will resume at the relocated address $90, i.e., the privilege violation exception handler, once we execute the RTE instruction a bit later. A detail here is that the exception handler will then run in user mode instead of supervisor mode, and this matters when it comes to which stack pointer is in use.

00005CA2 3f0d                MOVE.W A5,-(A7)
00005CA4 2f15                MOVE.L (A5),-(A7)
00005CA6 3f02                MOVE.W D2,-(A7)

In the relocated bootblock code $5ca8 is at address $68, i.e., the Level 2 interrupt vector. It contains the value $6c, meaning the code for the interrupt handler is at $6c. The same value is also adequate for clearing the pending IRQs that trigger the Level 2 IRQ. We write the entire content of $68, i.e., $0000006c, into $dff09a but only the bits that land in $dff09c take effect.

00005CA8 0000 006c           OR.B #$6c,D0
00005CAC 2978 0068 66fa      MOVE.L $00000068,(A4, $66fa) == $0000c33a
00005CB2 4e71                NOP 
00005CB4 4e73                RTE
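
To illustrate why only the $dff09c half of that long write matters, here is a Python sketch that splits the 32-bit write into the two 16-bit register writes and decodes the INTREQ word. The helper name and the small bit table are mine; the bit names are the standard INTREQ ones, limited to the bits present in $006c.

```python
def split_long_write(addr, value):
    """Model a 32-bit CPU write as two adjacent 16-bit custom register writes."""
    return [(addr, (value >> 16) & 0xFFFF), (addr + 2, value & 0xFFFF)]

# INTREQ/INTENA bit names for the bits set in $006c
INT_BITS = {2: "SOFT", 3: "PORTS", 5: "VERTB", 6: "BLIT"}

writes = split_long_write(0xDFF09A, 0x0000006C)
intena_value = writes[0][1]   # $0000: SET/CLR clear and no bits selected, so a no-op
intreq_value = writes[1][1]   # $006c: SET/CLR clear, so the selected bits get cleared
cleared = [name for bit, name in INT_BITS.items() if intreq_value & (1 << bit)]

for addr, word in writes:
    print(f"${addr:06x} <- ${word:04x}")
print("INTREQ bits cleared:", cleared)
```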
 

The RTE causes a jump to address $90. The stack in use is now the USP, which points at $cc. Note the loading of D1/A0/A1 with values from the stack and later the "EOR.L D1,(A1)+", which modifies the instruction that caused the exception just before exiting the handler. In this case, though, we have a fabricated stack frame and thus we modify a different place in memory.. see below.

Also, the "ADD.L (A5)+,D1" starts calculating a checksum over the bootblock code. Note that the "RTR" also pops the CCR but the supervisor portion of the status register is unaffected.

00000090 4c97 0302           MOVEM.W (A7),D1/A0-A1
00000094 4841                SWAP.W D1
00000096 0342                BCHG.L D1,D2
00000098 d29d                ADD.L (A5)+,D1
0000009A 0542                BCHG.L D2,D2
0000009C 0c83 00fe 009c      CMP.L #$00fe009c,D3
000000A2 b399                EOR.L D1,(A1)+
000000A4 4e77                RTR
 

What we have at $cc is a piece of code, which is actually used later to load A2 with a wanted value. The handler loads the value $000000b0 into A1 (remember the sign extending properties of "movem.w" and loading a word into an address register in general).  Address $b0 is also the address where code execution continues after the "RTR".

000000CC 247c 0000 00b0      MOVEA.L #$000000b0,A2

Also, the above handler will now modify code at address $b0, which originally looks like:

000000B0 0d04                BTST.L D6,D4
000000B2 00b0 66e0 397c 8290 OR.L #$66e0397c,(A0, A0.W*2, $ffffff90)
000000BA 66f6                BNE.B #$fffffff6 == $000000b2 (F)

After that "EOR.L D1,(A1)+" the code changes to:

000000B0 2978 0020 66e0      MOVE.L $00000020,(A4, $66e0) == $00dff080
000000B6 397c 8290 66f6      MOVE.W #$8290,(A4, $66f6) == $00dff096

When the above code is executed the privilege violation exception handler address is used as the COP1 address and the relevant DMAs are started. The privilege violation exception handler at $90 is also a valid copper list. The significant piece there is the triggering of a Level 2 IRQ (write of $b399 to $dff09c).  This will be useful later.

00000090: 4c97 0302        VP 4c, VE 03; HP 96, HE 02; BFD 0
00000094: 4841 0342        ; VP 48, VE 03; HP 40, HE 42; BFD 0
00000098: d29d 0542        ; VP d2, VE 05; HP 9c, HE 42; BFD 0
0000009c: 0c83 00fe        ; VP 0c, VE 00; HP 82, HE fe; BFD 0
000000a0: 009c b399        ; INTREQ := 0xb399
000000a4: 4e77 00fe        ; VP 4e, VE 00; HP 76, HE fe; BFD 0
000000a8: 0180 0228        ; COLOR00 := 0x0228
000000ac: ffff fffe        ; VP ff, VE 7f; HP fe, HE fe; BFD 1
                           ; End of Copperlist

The next four lines of funky code play around with the prefetch of the mc680x0. After the recent "RTR" the USP points at $d2. The "ADD.L" here will therefore change the code at address $c4 i.e. the code immediately following the "ADD.L".

000000BC 06af c372 4576 fff2 ADD.L #$c3724576,(A7, -$000e) == $000000c4
000000C4 760a                MOVE.L #$0000000a,D3
000000C6 7a92                MOVE.L #$ffffff92,D5
000000C8 66fa                BNE.B #$fffffffa == $000000c4 (F)
000000CA 7c08                MOVE.L #$00000008,D6
000000CC 247c 0000 00b0      MOVEA.L #$000000b0,A2

However, due to prefetch the CPU has already loaded the "old code", which sets registers D3 and D5. The "BNE.B" above loops back to the just modified code at $c4. Now the selected offset for the A4 register makes sense: $66fa is also the "BNE.B" instruction opcode. The code essentially initializes registers for the TVD and enables the "PORTS" Level 2 IRQ (which will be triggered from the copper list).

000000C4 397c c008 66fa      MOVE.W #$c008,(A4, $66fa) == $00dff09a
000000CA 7c08                MOVE.L #$00000008,D6
000000CC 247c 0000 00b0      MOVEA.L #$000000b0,A2
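
The self-modification is plain 32-bit addition, which a short Python sketch can confirm. The values are taken from the listings above; the variable names are only for illustration.

```python
MASK32 = 0xFFFFFFFF

usp = 0xD2
target = usp - 0x0E                      # (A7, -$000e) -> $c4
old_code = 0x760A7A92                    # the two MOVE.L #imm opcodes at $c4/$c6
patched = (old_code + 0xC3724576) & MASK32

# The untouched BNE.B opcode at $c8 becomes the displacement word
words = [patched >> 16, patched & 0xFFFF, 0x66FA]
print(f"${target:02x}:", " ".join(f"{w:04x}" for w in words))
```

The three words 397c c008 66fa decode as "MOVE.W #$c008,(A4, $66fa)", i.e., the old branch opcode is reused as the displacement that reaches $dff09a.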

The following instruction looks legitimate and actually is. However, it will trigger a privilege violation exception and not really change the USP at all.. but having $b0 in A2 is important later.

000000D2 4e62                MVR2USP.L A2
000000D4 a166                ILLEGAL
000000D6 4e71                NOP

Once the privilege violation handler returns to $d2 the code has changed a bit.. and remember that we return from the handler using "RTR" so we remain in supervisor mode..

000000D2 4e72 a110           STOP #$a110
000000D6 4e71                NOP
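
Here is the same kind of Python sketch for this second pass through the handler, followed by a decode of the STOP operand. Two things are assumptions on my side rather than something the listing states: that the status register saved in the exception frame was $0010 (user mode, X set), and that A5 points at $24 after the first pass. With those values the EOR reproduces the bytes shown above.

```python
MASK32 = 0xFFFFFFFF

def swap(value):
    """SWAP: exchange the upper and lower 16 bits of a 32-bit register."""
    return ((value << 16) | (value >> 16)) & MASK32

saved_sr = 0x0010                  # assumption: SR word in the exception frame
d1 = swap(saved_sr)                # MOVEM.W (A7) then SWAP.W D1
d1 = (d1 + 0x00000076) & MASK32    # ADD.L (A5)+,D1 with A5=$24 (trace vector, $76)
patched = 0x4E62A166 ^ d1          # EOR.L D1,(A1)+ on the long word at $d2

stop_sr = patched & 0xFFFF         # operand of the resulting STOP
flags = {
    "trace": bool(stop_sr & 0x8000),
    "supervisor": bool(stop_sr & 0x2000),
    "ipl": (stop_sr >> 8) & 7,
    "x": bool(stop_sr & 0x0010),
}
print(f"${patched:08x}", flags)
```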

..and the CPU stops until an IRQ takes place with priority/level higher than 1. This means we just sit here until the copper triggers the next Level 2 IRQ. This code really has no particular useful meaning but it was just something I had to have to justify the copper list in a code thingy ;-) Well, actually, the "STOP" writes the status register and here it will keep the CPU in supervisor mode and enable tracing! The next instruction ("NOP") will then kick off our TVD..

The trace exception handler is located at address $76. A few notes.. The bootblock is always in the lower 32K of RAM, thus we can make use of 16 bit addressing when assigning values to address registers. For example the first "MOVEM.W" will always load A0 with address $0, and in that location we maintain the TVD context for the previously decrypted instruction. We want to decrypt the next instruction but also re-encrypt what we already executed. The word at $0 is the "previous XOR key" and the word at $2 is the "previously decrypted instruction address". Initially both were initialized to zero.

Here D0 is not used for anything. A0 is the TVD context address i.e. $0 and A1 is the address of the next instruction to execute. D1 will accumulate the checksum over the bootblock and D2 is "salt" that gets modified here and there in the decrypted code while TVD runs. 

00000076 4c97 0301           MOVEM.W (A7),D0/A0-A1

Re-encrypt the previously executed instruction.

0000007A 4c90 0801           MOVEM.W (A0),D0/A3
0000007E b153                EOR.W D0,(A3)

Update the code checksum. Note that we changed to ".W" and this is to avoid the checksum calculation pointer (A5) reaching the code we decrypt with non-TVD decrypters.. those decrypters are protected by the TVD.. uhh.. I was lazy as the crossing point would have required some extra care to work properly. Most of my TVD stuff was trial'n'error and at this point the error rate started to get too high.

00000080 d25d                ADD.W (A5)+,D1
00000082 4841                SWAP.W D1

The actual "decryption XOR key" is a mix of the D2 ("salt") and the address of the next instruction.. before decrypting the next instruction both the calculated "decryption XOR key" and the instruction address get stored into the TVD context location.  This lame'ish key algorithm was selected because we needed to allow encrypting and decrypting code that has conditional branches. So, if the D2 ("salt") remains constant within a code block that branches around we can deterministically calculate the "decryption XOR key". As can be seen later the D2 gets modified only in carefully selected spots..

00000084 3009                MOVE.W A1,D0
00000086 b540                EOR.W D2,D0
00000088 4890 0201           MOVEM.W D0/A1,(A0)
0000008C b151                EOR.W D0,(A1)
0000008E 4e73                RTE
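
To make the re-encrypt/decrypt dance easier to follow, here is a minimal Python model of one pass through the trace handler. It is a sketch under stated assumptions, not the tool used to build the bootblock: memory is a dict of 16-bit words keyed by even addresses, the checksum update is left out, and the salt value $1234 is hypothetical and held constant (the real D2 changes along the way).

```python
def trace_step(mem, next_pc, salt):
    """One pass through the trace handler at $76 (checksum update left out).

    mem maps even byte addresses to 16-bit words; mem[0] and mem[2] hold the
    TVD context (previous XOR key, previous instruction address).
    """
    prev_key, prev_addr = mem[0x0], mem[0x2]
    mem[prev_addr] ^= prev_key          # EOR.W D0,(A3): re-encrypt the previous word
    key = (next_pc ^ salt) & 0xFFFF     # MOVE.W A1,D0 / EOR.W D2,D0
    mem[0x0], mem[0x2] = key, next_pc   # MOVEM.W D0/A1,(A0): save the new context
    mem[next_pc] ^= key                 # EOR.W D0,(A1): decrypt the next word


SALT = 0x1234                           # hypothetical D2 value, not the real one
plain = {0xD8: 0x4042, 0xDA: 0x95C5}    # the opcode words listed at $d8 and $da
mem = {0x0: 0, 0x2: 0}                  # context starts out as zero
mem.update({addr: word ^ addr ^ SALT for addr, word in plain.items()})
encrypted = dict(mem)

trace_step(mem, 0xD8, SALT)
after_first = dict(mem)
trace_step(mem, 0xDA, SALT)

print(f"after step 1: $d8={after_first[0xD8]:04x} $da={after_first[0xDA]:04x}")
print(f"after step 2: $d8={mem[0xD8]:04x} $da={mem[0xDA]:04x}")
```

At any moment only one opcode word is in the clear, and the very first pass is harmless because the zero context makes the re-encrypt step XOR address $0 with a zero key.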
 

The following code is what gets decrypted by the TVD. The actual execution path varies depending on your Amiga's memory configuration as the memlist check for autoconfigured memory is also part of the code that is protected with the TVD. At the beginning D3 is $0000000a, D5 is $ffffff92 and A2 is $000000b0.

000000D8 4042                NEGX.W D2
000000DA 95c5                SUBA.L D5,A2

A2 points at $11e now, which is the start of the "normal encrypted code" outside the TVD protected code. The TVD protected code contains two decryption loops to decrypt the minimal FFS file loader etc. The one below is the decrypter #1 and it uses the previous code checksum as the decryption key.

000000DC d441                ADD.W D1,D2
000000DE b39a                EOR.L D1,(A2)+
000000E0 5303                SUB.B #$00000001,D3
000000E2 6afa                BPL.B #$fffffffa == $000000de (T)
000000E4 d705                ADDX.B D5,D3
000000E6 4681                NOT.L D1
000000E8 d441                ADD.W D1,D2

The next "ADDA.L" is important.. at the very beginning of the boot the system passed the ExecBase address in A6 to the bootblock code. A2 points at address $14a after the first decrypter and tada.. that's also the index to the memlist structure in the ExecBase for autoconfigured memory.

Furthermore, D3 was $000000ff after the first decrypter and adding D5 to it gives us $92 (note, X-flag was 1 when the "ADDX.B" was executed), which is the decrypter #2 loop count.

000000EA ddca                ADDA.L A2,A6
000000EC b39a                EOR.L D1,(A2)+
000000EE 51cb fffc           DBF .W D3,#$fffc == $000000ec (F)
000000F2 4042                NEGX.W D2
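
The pointer and counter bookkeeping of the two decrypters can be replayed in Python. This sketch only tracks A2, D3 and the X-flag (the actual XOR with D1 is left out), starting from the register values given above.

```python
MASK32 = 0xFFFFFFFF

a2 = (0xB0 - 0xFFFFFF92) & MASK32   # SUBA.L D5,A2
start1 = a2
d3 = 0x0A

# Decrypter #1: EOR.L D1,(A2)+ / SUB.B #1,D3 / BPL.B
loops1 = 0
while True:
    a2 += 4
    loops1 += 1
    x_flag = 1 if d3 == 0 else 0    # SUB.B borrows only when going from 0 to $ff
    d3 = (d3 - 1) & 0xFF
    if d3 & 0x80:                   # BPL falls through once the result is negative
        break
start2 = a2

d3 = (d3 + 0x92 + x_flag) & 0xFF    # ADDX.B D5,D3 (low byte of D5 is $92)
loops2 = d3 + 1                     # DBF runs count+1 times
end2 = start2 + 4 * loops2

print(f"decrypter #1: ${start1:03x}..${start2:03x} ({loops1} long words)")
print(f"decrypter #2: ${start2:03x}..${end2:03x} ({loops2} long words)")
```

So the first loop covers 11 long words from $11e and leaves A2 at $14a, and the second one runs $93 times from there.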

After the second decrypter comes the (slightly bugged, as reported by Ross @EAB ;-) memlist code that finds the autoconfigured and ranger RAM. We will use 512K of it as a RAM disk during the game.

000000F4 9cc6                SUBA.W D6,A6
000000F6 4846                SWAP.W D6
000000F8 2c56                MOVEA.L (A6),A6
000000FA 2816                MOVE.L (A6),D4
000000FC 6718                BEQ.B #$00000018 == $00000116 (F)
000000FE 282e 0014           MOVE.L (A6, $0014) == $000008d6,D4
00000102 2a2e 0018           MOVE.L (A6, $0018) == $000008da,D5
00000106 b886                CMP.L D6,D4
00000108 6402                BCC.B #$00000002 == $0000
0000010A 2806                MOVE.L D6,D4
0000010C 4244                CLR.W D4
0000010E da83                ADD.L D3,D5
00000110 4245                CLR.W D5
00000112 9a84                SUB.L D4,D5
00000114 6fe2                BLE.B #$ffffffe2 == $000000f8 (F)

We are done with decrypting and the memlist stuff now. Time to set up the stacks and exit the TVD by clearing the trace bit in the status register and, after setting the SSP to $300, returning to user mode.

00000116 46fc 2700           MV2SR.W #$2700
0000011A 4ff8 0300           LEA.L $00000300,A7
0000011E 46c6                MV2SR.W D6
00000120 4ff8 0400           LEA.L $00000400,A7

The last tweaks of obfuscation before calling the file loader.. D3 is $0000ffff and the code below will turn D6 into $00080080, which we use to stop the copper DMA and disable the "PORTS" Level 2 IRQ. Finally we store the found memory location & size of the RAM disk, held in D4 and D5, into addresses $8c and $90. Those locations are used by the main game engine.

00000124 ea0b                LSR.B #$00000005,D3
00000126 07c6                BSET.L D3,D6
00000128 3946 66f6           MOVE.W D6,(A4, $66f6) == $00dff096
0000012C 2946 66fa           MOVE.L D6,(A4, $66fa) == $00dff09a
00000130 397c 9500 66fe      MOVE.W #$9500,(A4, $66fe) == $00dff09e
00000136 397c 4489 66de      MOVE.W #$4489,(A4, $66de)
0000013C 3446                MOVEA.W D6,A2
0000013E 48d2 003e           MOVEM.L D1-D5,(A2)
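
The bit fiddling above is easy to verify with a small Python sketch. It assumes D6 holds $00080000 at this point (the 8 loaded earlier, after the "SWAP.W D6") and D3 holds $0000ffff as stated; the variable names are only for illustration.

```python
d3 = 0x0000FFFF
d6 = 0x00080000                          # assumption: D6 after the SWAP.W D6

d3 = (d3 & ~0xFF) | ((d3 & 0xFF) >> 5)   # LSR.B #5,D3 only touches the low byte
d6 |= 1 << (d3 % 32)                     # BSET.L D3,D6 takes the bit number modulo 32

a2 = d6 & 0xFFFF                         # MOVEA.W D6,A2 (positive word, so no sign issue)
# MOVEM.L D1-D5,(A2) stores five long words in register order
store_map = {f"D{n}": a2 + 4 * i for i, n in enumerate(range(1, 6))}

print(f"D3=${d3:08x} D6=${d6:08x}")
print({reg: f"${addr:02x}" for reg, addr in store_map.items()})
```

The low word $0080 goes to DMACON (copper DMA off), and the long write to $dff09a puts $0008 into INTENA ("PORTS" off) and $0080 into INTREQ.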

And call the FFS file loader, load a file called "DOS" (which is the second stage loader and the main game engine) into address $5000 and execute it from there. I won't go through the loader. At this point it is uninteresting. Just a quick note that the loader is tailored to load a file named "DOS", i.e., the hash values used to locate specific structures in FFS are precalculated etc. The reason for the imaginative naming of the second stage loader is that originally I intended to use the DOS Type at address $0 as the file name (i.e. the string "DOS\0").

00000148 610e                BSR.B #$0000000e == $00000158
0000014A 6604                BNE.B #$00000004 == $00000150 (T)
0000014C 4ef8 5000           JMP $00005000
00000150 396c 6666 67e0      MOVE.W (A4, $6666) == $00dff006,(A4, $67e0) == $00dff180
00000156 60f8                BT .B #$fffffff8 == $00000150 (T)

Phew.. that was it. Again, useless stuff but it was worth the journey for me ;-)
