Sunday, July 18, 2021

Smarty and the Nasty Gluttons - Part 3 (bootblock for the boxed game)

Preface

Continuing from where we left off in post #2 on Smarty and the Nasty Gluttons.. this post is about the floppy disk bootblock used in the boxed version of the game. There will be a follow-up post going through the bootblock of the online ADF version once I get around to writing it. That one is a bit involved: it contains encrypted data and a protection. We received reports from folks using FPGA-based Amiga emulators that the game does not boot.. well.. the ADF version bootblock requires exact emulation of the mc680x0 hardware and obviously those FPGA emulators do not have such ;-) The boxed game bootblock is not encrypted or obfuscated in any way.


The boxed game bootblock

This post contains a lot of source code, as it is easier to explain certain things directly from the code. Since the whole release process of a boxed Amiga game in the 2020s is an "act of love", or a serious hobby at best, the fine line between what makes sense and what does not is meaningless. Therefore, the bootblock contains geek things purely for "because we can" and "I had to try it" reasons.

Unlike the ADF version bootblock, the boxed game bootblock does not contain a minimal FFS file loader for loading the second stage file loader program. Instead, the floppy disk layout is mastered (by hand) so that all the data sectors of the AmigaDOS file containing the second stage loader start immediately after the bootblock sectors. So when the bootblock code rereads itself from the floppy (using trackdisk.device), it purposely reads past the bootblock to also get the compressed second stage loader into memory. The bootblock has the S405 decompressor in it, so it can then decompress the second stage FFS file loader and jump into it.

The other geek or useful things added to the bootblock include:

  • Booting and playing the game from any disk drive, not just DF0:. Furthermore, this works on any Kickstart, not just on 2.x or newer. There is some hack'ish example code for that.
  • The bootblock content starts from offset 4, not the standard 12. This involves generating bootblock content such that the checksum is exactly what we want it to be ;-) There is example code showing how to generate a bootblock whose checksum has a predefined value.
  • Using code embedded into perfectly readable ASCII text messages.


Generating bootblock checksums with a desired content

To fully understand the checksum code below, look at the bootblock source itself, which comes later in this post. The main idea is that we first calculate the normal bootblock checksum and take the difference between it and the "desired checksum value". That difference is then placed somewhere in the bootblock and the checksum is recalculated. Now the checksum becomes the value we wanted it to be. I must thank Ross @EAB for getting me to think about this. His superb one-disk trainer for Smarty and the Nasty Gluttons (which, btw, has a superb trainer intro as well ;-) had this thing. That man is full of crazy ideas!

The code below does the magic. It assumes the "desired checksum" is already in the place of the checksum field of the bootblock.

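        section boot,code_c
j:      ; "sta" is the start of the bootblock code
        lea sta+4(pc),a1        ; A1 = checksum field
        move.l (a1),d2          ; D2 = desired checksum
        move.l d2,d3
        clr.l (a1)              ; checksum field must be 0 while summing
        not.l d2                ; D2 = sum the block has to reach
        bsr.b calcBootCRC       ; D0 = current sum ("crc" is still 0)
        sub.l d0,d2
        bcc.b .ok
        subq.l #1,d2            ; borrow -> end-around correction
.ok:
        move.l d2,crc           ; store the bias
        bsr.b calcBootCRC
        not.l d0                ; D0 = checksum, now the desired value
        move.l d0,d2
        ; This checksum calculation is for verification purposes
        bsr.b calcBootCRC
        not.l d0
        move.l d2,(a1)          ; put the checksum back in its field
        rts
calcBootCRC:
        lea sta(pc),a0
        moveq #0,d0
        move.w #(1024/4)-1,d1   ; 256 longwords
.loop:
        add.l (a0)+,d0
        bcc.b .skip
        addq.l #1,d0            ; add the carry back in
.skip:
        dbf d1,.loop
        rts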
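
For readers who do not read 68k assembly fluently, the same biasing idea can be sketched in Python. This is only a model of the arithmetic, not the tool used for mastering: the function names, the filler bytes and the test block are made up for illustration, and only the "DOS" type, the "*Sma" checksum and the bias longword at offset 100 follow the bootblock described in this post.

```python
import struct

MASK = 0xFFFFFFFF

def boot_sum(block) -> int:
    """Sum all big-endian longwords with end-around carry (like calcBootCRC)."""
    total = 0
    for (word,) in struct.iter_unpack(">I", block):
        total += word
        if total > MASK:              # carry out of bit 31: wrap it around
            total = (total & MASK) + 1
    return total

def bias_for_checksum(block: bytearray, desired: int, bias_offset: int) -> int:
    """Store a bias longword so that the block's checksum becomes `desired`."""
    struct.pack_into(">I", block, 4, 0)             # clear the checksum field
    struct.pack_into(">I", block, bias_offset, 0)   # clear the bias field
    target = ~desired & MASK                        # sum the block must reach
    partial = boot_sum(block)
    bias = (target - partial) & MASK
    if target < partial:                            # borrow: end-around fix
        bias = (bias - 1) & MASK
    struct.pack_into(">I", block, bias_offset, bias)
    struct.pack_into(">I", block, 4, desired)
    return bias

# Hypothetical 1024-byte block; the filler at offset 124 is just rts + nop.
block = bytearray(1024)
block[0:4] = b"DOS\0"
block[124:128] = bytes([0x4E, 0x75, 0x4E, 0x71])
desired = struct.unpack(">I", b"*Sma")[0]
bias_for_checksum(block, desired, bias_offset=100)
```

With the checksum field filled in, a valid bootblock sums to $FFFFFFFF, which is an easy property to check after biasing.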


The boxed game bootblock source in full

The following lengthy piece of code is the actual boxed game version bootblock. I have put some comments in here and there. First the defines.. I am still stuck in the old Seka days when it comes to assembly coding... sorry. If there are bugs in the bootblock code.. oh boy.. this code is already out in the wild as part of the physical game disk of the boxed version.

;-----------------------------------------------------------------
;
LVOFindResident equ -96
LVOSupervisor equ -30
LVODoIO equ -456
LVOAllocMem equ -198
LVOAllocAbs equ -204
LVOOpenDevice equ -444
LVOCloseDevice equ -450
LVODisable equ -120
IOSTD_SIZE equ 56
IO_UNIT equ 24
MP_SIZE equ 34
MN_REPLYPORT equ 14
MN_LENGTH equ 18
LN_TYPE equ 8
NT_REPLYMSG equ 7
MH_UPPER equ 24
MH_LOWER equ 20
MAXLOCMEM equ 62
LOADSIZE equ (18+2)*512

The bootblock itself. Note that the copyright message starts immediately after the DOS type at offset 4. You need the above checksum calculation code to fabricate "*Sma" as the bootblock checksum. Furthermore, as you can see, there is no "visible" code at offset 12; instead the ASCII "and the.." starts from there. The geek thing is that "an" is $616e in hex, which translates to a "BSR.B go" in the bootblock binary. Since "bsr" pushes the return address onto the stack, we pop it later and use that address as a base address for variables.

sta: dc.b "DOS",0
dc.b "*Smarty and the Nasty Gluttons*"
dc.b " (c) 2020 Eero Tunkelo."
dc.b " HardCopy release (c) 2020 by Bitmap Soft."
crc: dc.l 0    ; used for biasing the real checksum
tdname: dc.b "trackdisk.device",0
dc.b 'SCX'
; A4=area for variables
go: move.l (sp)+,a4
move.l a1,a2 ; save IORequest
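
The "an" trick can be double-checked with a few lines of Python. This is a sketch that rebuilds the data part of the bootblock from the dc.b/dc.l lines above and decodes the two bytes at offset 12 as a 68k BSR.B; the helper name decode_bsr_b is made up for this example.

```python
# Bootblock data as listed in the source, up to (but excluding) the "go" label.
layout = (
    b"DOS\0"
    + b"*Smarty and the Nasty Gluttons*"
    + b" (c) 2020 Eero Tunkelo."
    + b" HardCopy release (c) 2020 by Bitmap Soft."
    + bytes(4)                      # crc: dc.l 0
    + b"trackdisk.device\0"
    + b"SCX"
)
go_offset = len(layout)             # first byte of the code at "go"

def decode_bsr_b(block: bytes, pc: int) -> tuple[int, int]:
    """Decode a BSR.B at pc; return (branch target, pushed return address)."""
    opcode, disp = block[pc], block[pc + 1]
    if opcode != 0x61 or disp in (0x00, 0xFF):
        raise ValueError("not a BSR with an 8-bit displacement")
    if disp >= 0x80:                # the 8-bit displacement is signed
        disp -= 0x100
    return pc + 2 + disp, pc + 2

target, return_addr = decode_bsr_b(layout, 12)
```

Here target comes out equal to go_offset, and return_addr is the offset that ends up in A4. This also shows why the three 'SCX' bytes are there: they pad the data so that "go" lands exactly where the displacement "n" ($6e) points.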

Next we reread the bootblock into CHIPRAM at address $30000. This address is "high enough" in memory not to mess with system stuff and "low enough" to be available in Amigas with just 256K of CHIPRAM (I know this is ridiculous, but our friend Ross insisted on stuff like this and those non-existent 256K Fast RAM expansions). The LOADSIZE is way more than the 2 sectors needed for the bootblock - remember the trick of loading the second stage file loader code at this step.

; Read start program into CHIP..
lea $30000,a3
move.w #2,$1c(a2)     ; io_Command = CMD_READ
move.l #LOADSIZE,$24(a2)   ; io_Length
move.l a3,$28(a2)     ; io_Data
clr.l $2c(a2)     ; io_Offset
move.l a2,a1
jsr LVODoIO(a6)
tst.l d0
bne.b errorExit
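
To put numbers on the read above, here is a small arithmetic sketch. The constants come from the listing; the helper name locate is made up. The $604 offset is taken from the "lea $604(a3),a0" later in the source. The post does not spell out what occupies the sector right after the bootblock or the first 4 bytes of the following sector, so the sketch only reports where that offset falls (the decompressor comments do mention a skipped ID).

```python
SECTOR = 512
BOOTBLOCK_SECTORS = 2
LOADSIZE = (18 + 2) * SECTOR        # same expression as in the source
LOAD_ADDR = 0x30000

def locate(offset: int) -> tuple[int, int]:
    """Return (sector index, byte offset inside that sector)."""
    return divmod(offset, SECTOR)

extra_sectors = LOADSIZE // SECTOR - BOOTBLOCK_SECTORS
end_addr = LOAD_ADDR + LOADSIZE
packed_sector, packed_byte = locate(0x604)
```

So one DoIO() brings in 18 sectors beyond the bootblock, the buffer ends at $32800, and the compressed data pointer handed to the decompressor sits 4 bytes into sector 3 of the disk.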

The next routine finds out which disk drive was used to boot the game disk. This code comes from an EAB thread, and specifically from Stingray (greetings, fellow Scoopexian). The basic idea is to iterate through all trackdisk.device units and find the one whose io_Unit pointer matches the io_Unit in the IORequest passed to the bootblock. This is somewhat hack'ish but seems to work just fine.

findBootDrive:
; Find boot drive - from EAB & Stingray
moveq #IOSTD_SIZE+MP_SIZE-1,d6
.clear:
clr.b 0(a4,d6.w)
dbf d6,.clear
;
moveq #4-1,d6         ; device number & count
move.l IO_UNIT(a2),d7
lea IOSTD_SIZE(a4),a5 ; A5 = Message
move.l a5,MN_REPLYPORT(a4)     ; IOExtReq.io_Message.mn_ReplyPort
;move.w #IOSTD_SIZE,MN_LENGTH(a4)
;move.b #NT_REPLYMSG,LN_TYPE(a4)
.findLoop:
move.l d6,d0 ; Unit
moveq #0,d1 ; Flags
lea tdname(pc),a0 ;
move.l a4,a1 ; A4 = IOStdReq
jsr LVOOpenDevice(a6)
tst.l d0
bne.b .noDevice
move.l IO_UNIT(a4),d5
move.l a4,a1
jsr LVOCloseDevice(a6)
cmp.l d5,d7
.noDevice:
dbeq d6,.findLoop
tst.w d6
bpl.b bootDriveFound
errorExit:
moveq #-1,d0
rts
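
The control flow of that dbeq loop is easy to misread, so here is a Python model of it. This is only a sketch: the dictionary stands in for OpenDevice() and the pointer values are invented.

```python
from typing import Mapping

def find_boot_drive(boot_unit_ptr: int, unit_ptrs: Mapping[int, int]) -> int:
    """Return the trackdisk unit whose io_Unit matches, or -1 if none does.

    unit_ptrs maps unit number -> io_Unit pointer for drives that open
    successfully; a missing unit models a failing OpenDevice().
    """
    for unit in range(3, -1, -1):           # same order as the loop: 3..0
        io_unit = unit_ptrs.get(unit)       # "OpenDevice" + read IO_UNIT
        if io_unit is None:
            continue                        # no such drive
        if io_unit == boot_unit_ptr:        # cmp.l d5,d7
            return unit
    return -1                               # D6.w went negative: errorExit

# Hypothetical pointers for a machine with DF0: and DF1:, booted from DF1:.
drives = {0: 0x0001F2A0, 1: 0x0001F8C4}
boot_unit = find_boot_drive(0x0001F8C4, drives)
```

The returned unit number is what the assembly keeps in D6 and later stores at $94 for the second stage loader.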

Now we have found the boot drive. The next piece of code does the cache disabling. Again, it is a variation on, and a result of, code found in the relevant EAB threads. Kudos to Keir and Ross, with a bit of my own on top (the illegal instruction handler stuff). We do not use the typical (and more sensible) CacheControl() provided by Kickstart 2.x and newer. Instead, we have code that works (and has been tested) on any Kickstart and 680x0. I bet there is an accelerator out there which will cause this code to break due to some magic done in the accelerator firmware... Anyway, I had a perfectly good reason for what I did. The main game developer Sami Karjalainen has an Amiga 2000 (Kickstart ROM 1.3) with a GVP 68030 turbo board. So when Sami boots the game on his A2000, it runs on Kickstart 1.3 and a 68030. Obviously the game had to work on his machine as well.

Furthermore, his GVP was a source of additional headaches. It generated spurious Level 7 IRQs, which we had not accounted for.. it took quite a bit of debugging to find this out. Once I had figured it out, the numerous EAB threads about the very same topic immediately came back to me as a flashback. Sigh..

bootDriveFound:
; Turn disk motor off
move.l a2,a1
move.w #9,$1c(a1)     ; io_Command = TD_MOTOR
clr.l $24(a1)     ; io_Length = 0 -> motor off
jsr LVODoIO(a6)
jsr LVODisable(a6)
; Cache code
lea disableCaches(pc),a5
move.l sp,a2
jsr LVOSupervisor(a6)
; Illegal instruction handler: a CPU that lacks the movec in
; question lands here and we skip that 4-byte instruction
addq.l #4,2(sp)
rte
disableCaches:
move.l -(a2),$10.w    ; JSR return address is the handler
moveq #0,d0                   ;
moveq #0,d1
dc.l $4e7b0801 ; movec d0,vbr
dc.l $4e7a1002 ; movec cacr,d1
tst.l d1
bpl.b .skip
dc.w $f478 ; cpusha dc
.skip: dc.l $4e7b0002 ; movec d0,cacr
dc.l $4e7b0808 ; movec d0,pcr
; Jump to code in fixed address..
jmp findMemory-sta(a3)
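
The movec instructions are written as dc.l constants so the listing does not depend on the assembler's CPU mode. As a cross-check, here is a small Python decoder for exactly those encodings. It is a sketch covering only the control registers listed in its table, and the function name decode_movec is made up.

```python
# Control register numbers from the 680x0 MOVEC encoding (subset only).
CONTROL_REGS = {0x000: "sfc", 0x001: "dfc", 0x002: "cacr",
                0x800: "usp", 0x801: "vbr", 0x808: "pcr"}

def decode_movec(longword: int) -> str:
    """Decode a 32-bit MOVEC encoding into assembler text."""
    opword, ext = longword >> 16, longword & 0xFFFF
    if opword not in (0x4E7A, 0x4E7B):
        raise ValueError("not a MOVEC instruction")
    reg = ("a" if ext & 0x8000 else "d") + str((ext >> 12) & 7)
    ctrl = CONTROL_REGS[ext & 0x0FFF]
    # $4E7A moves control register -> general register, $4E7B the reverse.
    if opword == 0x4E7A:
        return f"movec {ctrl},{reg}"
    return f"movec {reg},{ctrl}"

decoded = [decode_movec(x)
           for x in (0x4E7B0801, 0x4E7A1002, 0x4E7B0002, 0x4E7B0808)]
```

Every one of these encodings is 4 bytes long, which is why the illegal instruction handler can simply add 4 to the stacked PC and carry on with the next instruction.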

At this point all caches and similar fancy features are turned off and the VBR is set to 0. We are also executing the code from a fixed address, somewhere around that $30000.

I have a track record of embedding bugs into my memlist code that finds all autoconfiguring + ranger memory. The original Smarty and the Nasty Gluttons code from the 1990s had a bug, the "cracked" game preview release had a different bug, and finally the ADF version of the game had a bug ;-) I hope I finally got this right.. I presume Ross @EAB (if he ever reads this post) will point out the bugs.. as he did the previous times.

; Search exec memlist for 512K RAM disk.. any mem
findMemory:
lea $142(a6),a0 ; memlist
moveq #8,d0
swap d0 ; D0=$80000
moveq #0,d1
subq.w #1,d1 ; D1=$ffff
moveq #0,d4 ; mem found flag
; Check for 256K CHIP
cmp.l MAXLOCMEM(a6),d0
bhi.b .noMemFound
.memLoop:
move.l (a0),a0
tst.l (a0)
beq.b .noMemFound
movem.l MH_LOWER(a0),d2/d3
add.l d1,d3
clr.w d3
sub.l d0,d3
; loop if 256K or 512K of CHIP
ble.b .memLoop
; loop if 256K mem expansion
clr.w d2
cmp.l d2,d3
blo.b .memLoop
.done:
move.l d0,d4
.noMemFound:
movem.l d3/d4/d6,$8c.w ; $8C = ptr to min 512K RAM disk
; $90 = amount of RAM disk
; $94 = boot drive
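
Since this routine has bitten me before, here is how I read its logic, written out as a Python sketch. The region lists are invented examples, the function name is made up, and the sketch ignores the signed 32-bit corner cases of the "ble" test. When nothing qualifies, the assembly leaves a stale value in D3; the sketch returns None for the pointer instead.

```python
CHUNK = 0x80000                      # 512K, the amount the loader asks for

def find_ram_disk(max_loc_mem, regions):
    """Return (start, size) of a 512K area, or (None, 0) if none qualifies.

    regions: (mh_Lower, mh_Upper) pairs in exec MemList order.
    """
    if max_loc_mem < CHUNK:          # less than 512K of CHIP: give up
        return None, 0
    for lower, upper in regions:
        top = (upper + 0xFFFF) & ~0xFFFF     # round mh_Upper up to 64K
        start = top - CHUNK                  # candidate: last 512K of region
        if start <= 0:                       # region ends at or below 512K
            continue
        if start < (lower & ~0xFFFF):        # region is smaller than 512K
            continue
        return start, CHUNK
    return None, 0

# Hypothetical A500 with 512K CHIP plus 512K "slow" RAM at $C00000.
a500 = find_ram_disk(0x80000, [(0xC00020, 0xC80000), (0x400, 0x80000)])
```

Note that the candidate area is always the topmost 512K of a region, so on a machine with 1MB of CHIPRAM and nothing else the RAM disk ends up at $80000.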

If there is not enough RAM, or only 256K of CHIPRAM is found, the second stage file loader code will inform the player about it.

Now we are at the final steps.. the rest of the code relocates both stacks and calls the decompressor. Execution of the second stage file loader starts at $5000. That's it. For your convenience the S405 decompression routine is also included in full..

; Stacks..
lea $400.w,a7 ; SSP
move.l a7,a2    ; decompressor work space 
move.w d0,sr ; back to user mode
lea $300.w,a7 ; USP
lea $dff096,a0
move.w #$0180,(a0) ; #$0180,$dff096
move.w d0,$180-$96(a0)
        ; decompress and execute..
lea $604(a3),a0
lea $5000.w,a1
;lea ($5000-TABLE_SIZE).w,a2
pea (a1)

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Version: 5d
;
; This file is released into the public domain for commercial
; or non-commercial usage with no restrictions placed upon it.
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

LUTSIZE         equ 15*8
HALFLUTSIZE     equ LUTSIZE/2
PRESYMS         equ 16
LTMSYMS         equ 512
HGHSYMS         equ 256
LOWSYMS         equ 256
rsreset ; DO NOT REARRANGE
PRETABLE rs.b (LUTSIZE+PRESYMS*2)
LTMTABLE rs.b (LUTSIZE+LTMSYMS*2)
HGHTABLE rs.b (LUTSIZE+HGHSYMS*2)
LOWTABLE rs.b (LUTSIZE+LOWSYMS*2)
OVLTABLE rs.b    256
TABLE_SIZE rs.b 0
MTF equ LOWTABLE+HGHSYMS

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Parameters:
;  A0 - src - a ptr to the compressed data. 
;  A1 - dst - a ptr to the destination memory.
;  A2 = ptr to work area (TABLE_SIZE bytes)
;
; Returns:
;  none
;
; Notes:
;  No checks if source and destination memory areas
;  overlap. In such case a crash is very probable.
;  Runtime RAM usage is 2816 ($b00) bytes, relocatable using A2.
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

decrunch: addq.w #2,a0
;
; A0 = ptr to crunched data (ID skipped) + 2
; A1 = ptr to destination mem
; A2 = ptr to work area
;
blockLoop: tst.w -(a0)
bne.b .cont
rts
;
; Initialize the bitshifter
.cont: move.l (a0)+,d7
swap d7
moveq #0,d6
;
; A0 = src 
; A1 = dst
; a2 = tmp
; D6 = _bc
; D7 = _bb
;
decodeTrees:
lea     MTF(a2),a3 ; MTF table
move.l  a2,a4 ; A4 = PRETABLE
lea     LUTSIZE+PRESYMS*2(a2),a5 ; leaf depths
;
; Init MTF table M while reading 16 leaf depths.
;
moveq #PRESYMS,d3
moveq #29,d4
moveq #0,d5 ; max leaf depth is 15
.getb3: ;
move.b  d5,(a3)+        ; init MTF
move.l d7,d0
lsr.l d4,d0
move.b  d0,(a5)+
moveq #3,d1
bsr.b getB
addq.w #1,d5
cmp.w d5,d3
bne.b .getb3
; Build pretree..
; D3 = num symbols = 16
; A4 = ptr to PRETABLE
bsr.w buildDecodingTables ; PRETREE

;; Build LTMTABLE
lsl.w #5,d3
; D3 = 512
; A4 = A2+LTMTABLE
;lea     LTMTABLE(a2),a4
bsr.b buildDecodingTree
lsr.w #1,d3
; D3 = 256
; A4 = A2+HGHTABLE
bsr.b buildDecodingTree
;
; D3 = 256
; A4 = A2+LOWTABLE
bsr.b buildDecodingTree
;
; D3 = 256
; D5 = oldLength (PMR)      no init required
; A4 = oldOffset (PMR)      no init required
; A5 = oldOffsetLong (PMR)  no init required
;
mainLoop:
lea LTMTABLE(a2),a3     ; LTMTABLE
bsr.b getSyml
sub.w d3,d2
decodeLiteral: ;
; Symbol > 256 => PMR or match
bgt.b matchFound
;
; Symbol = 256 => End of Block
beq.b blockLoop
;
; Symbol < 256 => Literal
move.b d2,(a1)+
bra.b mainLoop
;
matchFound:
subq.w #1,d2
ble.b copyLoop ; Symbol = 257 => PMR 
                        ; D2 = match_length-1
move.w d2,d5 ; D5 = PMR oldLength
decodeOffset:
lea HGHTABLE(a2),a3
bsr.b getSyml
move.w d2,d4
bne.b .notoldoffsetlong
move.w a5,d4
bra.b oldOffsetLong
.notoldoffsetlong:
bclr #7,d4
beq.b oldOffsetLong
;
lsl.w #8,d4
;
lea LOWTABLE(a2),a3
bsr.b getSyml
move.b d2,d4
;
move.w d4,a5     ; A5 = PMR oldOffsetLong
oldOffsetLong: move.w d4,a4     ; A4 = PMR oldOffset
copyLoop: move.w d5,d0     ; D5 = PMR oldLength
;
; D0 = matchLength-1
; A4 = offset
move.l a1,a3
sub.l a4,a3
;
.copy: move.b (a3)+,(a1)+
dbf d0,.copy
bra.b mainLoop
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Description:
;  Decodes the next symbol from the given huffman tree.
;
; Parameters:
;  a3 = ptr to (huffman) tables..
;
; Returns:
;  D1 = cnt (num bits needed for symbol..)
;  D2 = symbol
;
; Trashes:
;  D0,a3
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
getSyml:
.lut: cmp.l (a3)+,d7
bhi.b .lut
;
movem.w HALFLUTSIZE-4(a3),d0/d1
; D0 = base index
; D1 = number of bits to extract
move.l d7,d2
clr.w d2
rol.l d1,d2
sub.w d0,d2
add.w d2,d2
move.w  -4(a3,d2.w),d2
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Description:
;  Extract n bits from the compressed data stream.
;
; Parameters:
;  D1.w = num_bits_to_extract
;
; Returns:
;  Nothing.
;
; Trashes:
;  flags,D1
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
getB: cmp.w d1,d6
bge.b .getb
lsl.l d6,d7
move.w (a0)+,d7
sub.w d6,d1
moveq #16,d6
.getb: sub.w d1,d6
lsl.l d1,d7
return: rts

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Description:
;  Build huffman tree & decoding structures. Read symbols
;  from the compressed stream and then does inverse MTF
;  symbols. Used only to decode/build the pretree.
;
; Parameters:
;  D3 = num symbols
;  A4 = ptr to the tree
;
; Returns:
;  D3 = num of symbols
;  A4 = ptr to dest table
;
; Trashes:
;  D0,D1,D2,D4,D5,A3,A4,A5
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

buildDecodingTree:
move.w d3,d0
add.w d0,d0
lea LUTSIZE(a4,d0.w),a5
;moveq #0,d5
move.w d3,d5
.loop: move.l  a2,a3 ; PRETABLE
bsr.b getSyml ; trashes a3
; inverse MTF
lea MTF(a2),a3
add.w d2,a3
move.w d2,d0
move.b (a3),d2
bra.b .mtf1
;
.mtf: move.b -(a3),1(a3)
.mtf1: dbf d0,.mtf
move.b d2,(a3)
move.b d2,(a5)+
;addq.w #1,d5
;cmp.w d5,d3
subq.w #1,d5
bne.b .loop
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Description:
;  Build huffman tree & decoding structures. Assume that all
;  symbols are already loaded into memory.
;
; Parameters:
;  D3 = num symbols
;  A4 = ptr to the tree table
;
; Returns:
;  D3 = num of symbols
;  A4 = ptr to next table
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 
buildDecodingTables:
move.l  a4,a5
moveq   #HALFLUTSIZE/4,d0
; clear LUTTABLE
.clrLoop:
clr.l   (a5)+
subq.w #1,d0
bne.b .clrLoop
;
move.w  d3,d1
add.w   d1,d1
add.w d3,d1
lea     LUTSIZE(a4,d1.w),a3
; Read depths and count occurrences of each depth
; A3 = leaf depths
; A4 = tree/lut
move.w   d3,d2
;moveq #0,d0 ; D0=0
.countLoop:
move.b -(a3),d0
beq.b .zeroDepth
add.b d0,d0
add.b d0,d0
addq.w #1,2-4(a4,d0.w)       ; Count or Index
.zeroDepth: subq.w #1,d2
bne.b .countLoop
        ; count prefix and base
  move.l  a4,a5
moveq #15,d0
;moveq   #0,d2 ; prefix, D2=0
moveq #HALFLUTSIZE,d5 ; index or count
.indexLoop:
move.l  (a5)+,d4
beq.b   .zeroCount
add.w   d4,d2
movem.w d2/d5,-4(a5)
add.w   d4,d5
.zeroCount:
add.w   d2,d2
subq.w #1,d0
bne.b .indexLoop
;
; Sort symbols
;moveq #0,d0 ; D0=0
moveq #0,d1
.sortLoop:
move.b  0(a3,d0.w),d1
beq.b   .zeroIndex
add.b   d1,d1
add.b   d1,d1
move.w  2-4(a4,d1.w),d2
addq.w  #1,2-4(a4,d1.w)       ; Count or Index
add.w   d2,d2
move.w  d0,0(a4,d2.w)
.zeroIndex:
addq.w #1,d0
cmp.w d0,d3
bne.b .sortLoop
;
; Calculate LUT tables
move.l a4,a5
moveq   #1,d1
moveq #-1,d4
moveq   #0,d5
.lutLoop: ;
move.w (a4)+,d0 ; prefix
move.w (a4)+,d2 ; index
beq.b .zeroLut
lsl.l d1,d4
or.w d0,d4
subq.w #1,d4
ror.l d1,d4
move.l d4,(a5)+ ; code
;
sub.w d5,d2
sub.w d2,d0
movem.w d0/d1,HALFLUTSIZE-4(a5)
addq.w #2,d5
.zeroLut:
addq.w #1,d1
cmp.w #16,d1 ; this counter is wrong
bne.b .lutLoop ; should be max 15 times
move.l a3,a4
rts
;
    dc.b "Thanks to EAB for the idea of any-drive "
    dc.b "booting & loading and the non-system cache code. "
    dc.b "Greetings to Scoopex, Flashtro and Cave. "
cnop 0,4
end:
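
One step of the decompressor that may look cryptic in assembly is the inverse MTF (move-to-front) in buildDecodingTree. Here is a sketch of the same transform in Python. It models a single run with a fresh table of 16 entries, initialized to 0..15 as in the .getb3 loop; it is not a port of the S405 code, and the function name is made up.

```python
def inverse_mtf(indices, alphabet_size=16):
    """Decode move-to-front indices back into symbols."""
    table = list(range(alphabet_size))     # MTF table starts as 0,1,2,...
    out = []
    for index in indices:
        symbol = table.pop(index)          # take the symbol at that position
        table.insert(0, symbol)            # and move it to the front
        out.append(symbol)
    return out

depths = inverse_mtf([3, 0, 0, 1, 3])
```

The .mtf loop does the pop-and-insert by shifting the bytes in front of the symbol up by one and then writing the symbol to the first slot. Repeated leaf depths therefore show up as runs of index 0, which the pretree can encode cheaply.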
