Preface
Continuing from where we left off in post #2 on Smarty and the Nasty Gluttons, this post is about the floppy disk bootblock used in the boxed version of the game. There will be a follow-up post going through the online ADF version bootblock once I get around to it. That one is a bit involved: it contains encrypted data and a protection. We received reports from folks using FPGA-based Amiga emulators that the game does not boot.. well.. the ADF version bootblock requires exact emulation of the mc680x0 hardware, and obviously those FPGA emulators do not provide it ;-) The boxed game bootblock is not encrypted or obfuscated in any way.
The boxed game bootblock
This post contains a lot of source code, as it is easier to explain certain things directly from the code. Since the whole release process of a boxed Amiga game in the 2020s is an "act of love", or a serious hobby at best, the fine line between what makes sense and what does not is meaningless. Therefore the bootblock contains geek things purely for "because we can" and "I had to try it" reasons.
In comparison to the ADF version bootblock, the boxed game bootblock does not contain a minimal FFS file loader for loading the second stage file loader program. Instead, the floppy disk layout is mastered (by hand) so that all data sectors of the AmigaDOS file containing the second stage loader start immediately after the bootblock sectors. So when the bootblock code rereads itself from the floppy (using trackdisk.device), it purposely reads past the bootblock to also get the compressed second stage loader into memory. The bootblock has the S405 decompressor in it, so it can then decompress the second stage FFS file loader and jump into it.
The other geek or useful things added to the bootblock include:
- Booting and playing the game from any disk drive, not just DF0:. Furthermore, this works on any Kickstart, not just on 2.x or newer. There is some hackish example code for that.
- The bootblock content starts at offset 4, not the standard 12. This involves generating bootblock content such that the checksum is exactly what we want it to be ;-) There is example code showing how to generate a bootblock checksum with a predefined value.
- Using code embedded into perfectly readable ASCII text messages.
Generating bootblock checksums with a desired content
To fully understand the checksum code below, look at the bootblock source itself, which comes later in this post. The main idea is that we first calculate the normal bootblock checksum and subtract the "desired checksum value" from it. The difference is then placed somewhere in the bootblock (the crc longword in the source) and the checksum is recalculated. Now the checksum comes out as the value we wanted it to be. I must thank Ross @EAB for getting me to think about this. His superb one disk trainer (which, btw, has a superb trainer intro as well ;-) for Smarty and the Nasty Gluttons had this thing. That man is full of crazy ideas!
The code below does the magic. It assumes the "desired checksum" is already in the place of the checksum field of the bootblock.
        section boot,code_c
j:      ; "sta" is the start of the bootblock code
        lea     sta+4(pc),a1
        move.l  (a1),d2
        move.l  d2,d3
        clr.l   (a1)
        not.l   d2
        bsr.b   calcBootCRC
        sub.l   d0,d2
        bcc.b   .ok
        subq.l  #1,d2
.ok:
        move.l  d2,crc
        bsr.b   calcBootCRC
        not.l   d0
        move.l  d0,d2
        ; This checksum calculation is for verification purposes
        bsr.b   calcBootCRC
        not.l   d0
        move.l  d2,(a1)
        rts

calcBootCRC:
        lea     sta(pc),a0
        moveq   #0,d0
        move.w  #(1024/4)-1,d1
.loop:
        add.l   (a0)+,d0
        bcc.b   .skip
        addq.l  #1,d0
.skip:
        dbf     d1,.loop
        rts
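For readers who do not read 68k assembly fluently, the same biasing idea can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the names `ones_add`, `boot_sum` and `compute_bias`, the test block contents and the bias offset of 100 are all made up for illustration and are not part of the game's build tools. The only thing taken from the listing above is the arithmetic, a 32-bit sum with end-around carry over 256 longwords.

```python
import struct

MASK = 0xFFFFFFFF

def ones_add(a, b):
    """32-bit add with end-around carry (add.l / bcc / addq.l #1)."""
    s = a + b
    return (s & MASK) + (s >> 32)

def boot_sum(block):
    """Raw sum over all big-endian longwords, like calcBootCRC."""
    total = 0
    for (word,) in struct.iter_unpack(">I", block):
        total = ones_add(total, word)
    return total

def compute_bias(block, csum_off, bias_off, desired):
    """Longword to store at bias_off so the checksum field equals desired."""
    work = bytearray(block)
    struct.pack_into(">I", work, csum_off, 0)   # clr.l (a1)
    struct.pack_into(">I", work, bias_off, 0)   # bias field starts as zero
    target = ~desired & MASK                    # not.l d2
    bias = target - boot_sum(work)              # sub.l d0,d2
    if bias < 0:                                # borrow: subq.l #1,d2
        bias = (bias - 1) & MASK
    return bias

# Hypothetical 1024-byte block (assumption: bias longword at offset 100).
block = bytearray(1024)
block[0:35] = b"DOS\0*Smarty and the Nasty Gluttons*"
BIAS_OFF = 100
desired = struct.unpack_from(">I", block, 4)[0]          # "*Sma"
struct.pack_into(">I", block, BIAS_OFF,
                 compute_bias(block, 4, BIAS_OFF, desired))

# A valid bootblock sums to $FFFFFFFF with the checksum field in place.
assert boot_sum(block) == MASK
```

The final assert is the same condition a boot ROM checks: the whole block, text-as-checksum included, sums to $FFFFFFFF.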
The boxed game bootblock source in full
The following lengthy piece of code is the actual boxed game version bootblock. I have put some comments here and there. First the defines.. I am still stuck in the old Seka times when it comes to assembly coding... sorry. If there are bugs in the bootblock code.. oh boy.. this code is already out in the wild as part of the physical game disk of the boxed version.
;-----------------------------------------------------------------;
LVOFindResident equ -96
LVOSupervisor   equ -30
LVODoIO         equ -456
LVOAllocMem     equ -198
LVOAllocAbs     equ -204
LVOOpenDevice   equ -444
LVOCloseDevice  equ -450
LVODisable      equ -120
IOSTD_SIZE      equ 56
IO_UNIT         equ 24
MP_SIZE         equ 34
MN_REPLYPORT    equ 14
MN_LENGTH       equ 18
LN_TYPE         equ 8
NT_REPLYMSG     equ 7
MH_UPPER        equ 24
MH_LOWER        equ 20
MAXLOCMEM       equ 62
LOADSIZE        equ (18+2)*512
The bootblock itself. Note that the copyright message starts immediately after the DOS type, at offset 4. You need the checksum calculation code above to fabricate "*Sma" as the bootblock checksum. Furthermore, as you can see, there is no "visible" code at offset 12; instead the ASCII "and the.." starts there. The geek thing is that "an" is $616e in hex, which translates to a "BSR.B go" in the bootblock binary. Since "bsr" pushes the return address onto the stack, we pop it later and use that address as a base address for variables.
sta:    dc.b    "DOS",0
        dc.b    "*Smarty and the Nasty Gluttons*"
        dc.b    " (c) 2020 Eero Tunkelo."
        dc.b    " HardCopy release (c) 2020 by Bitmap Soft."
crc:    dc.l    0               ; used for biasing the real checksum
tdname: dc.b    "trackdisk.device",0
        dc.b    'SCX'
        ; A4=area for variables
go:     move.l  (sp)+,a4
        move.l  a1,a2           ; save IORequest
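The "text is code" trick can be checked by hand. The Python sketch below rebuilds the header bytes from the dc.b/dc.l lines above and decodes the two bytes at offset 12 as a 68k short branch. It assumes the assembler inserts no padding between the items (every label here already falls on an even offset), and the variable names are made up for illustration.

```python
# Header bytes exactly as declared in the listing above.
header = (
    b"DOS\0"
    b"*Smarty and the Nasty Gluttons*"
    b" (c) 2020 Eero Tunkelo."
    b" HardCopy release (c) 2020 by Bitmap Soft."
    + bytes(4)                  # crc: dc.l 0
    + b"trackdisk.device\0"     # tdname
    + b"SCX"                    # three filler bytes before "go"
)

ENTRY = 12                      # boot code entry point in a bootblock
opcode, disp = header[ENTRY], header[ENTRY + 1]

# $61 is BSR; a non-zero, non-$FF low byte is an 8-bit displacement.
is_short_bsr = opcode == 0x61 and disp not in (0x00, 0xFF)

# The displacement is relative to the instruction address + 2.
signed_disp = disp - 256 if disp >= 0x80 else disp
target = ENTRY + 2 + signed_disp
return_address = ENTRY + 2      # what "move.l (sp)+,a4" pops at "go"

assert header[ENTRY:ENTRY + 2] == b"an"
assert is_short_bsr
assert target == len(header)    # the branch lands right after 'SCX'
```

So the "an" of "and" branches over the rest of the text, and A4 ends up pointing at offset 14 of the loaded bootblock.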
Next we reread the bootblock into CHIPRAM at address $30000. This address is "high enough" in memory not to mess with system stuff and "low enough" to be available in Amigas with just 256K of CHIPRAM (I know this is ridiculous, but our friend Ross insisted on stuff like this and those non-existent 256K Fast RAM expansions). The LOADSIZE is way more than the 2 sectors needed for the bootblock: remember the trick of loading the second stage file loader code in this same step.
        ; Read start program into CHIP..
        lea     $30000,a3
        move.w  #2,$1c(a2)              ; io_Command = CMD_READ
        move.l  #LOADSIZE,$24(a2)       ; io_length
        move.l  a3,$28(a2)              ; io_Data
        clr.l   $2c(a2)                 ; io_Offset
        move.l  a2,a1
        jsr     LVODoIO(a6)
        tst.l   d0
        bne.b   errorExit
The next routine finds out which disk drive was used to boot the game disk. This code comes from an EAB thread, and specifically from Stingray (greetings, fellow Scoopexian). The basic idea is to iterate through all trackdisk.device units and find the one whose IO_UNIT matches the IO_UNIT of the IORequest passed to the bootblock. This is somewhat hackish but seems to work just fine.
findBootDrive:
        ; Find boot drive - from EAB & Stingray
        moveq   #IOSTD_SIZE+MP_SIZE-1,d6
.clear:
        clr.b   0(a4,d6.w)
        dbf     d6,.clear
        ;
        moveq   #4-1,d6                 ; device number & count
        move.l  IO_UNIT(a2),d7
        lea     IOSTD_SIZE(a4),a5       ; A5 = Message
        move.l  a5,MN_REPLYPORT(a4)     ; IOExtReq.io_Message.mn_ReplyPort
        ;move.w #IOSTD_SIZE,MN_LENGTH(a4)
        ;move.b #NT_REPLYMSG,LN_TYPE(a4)
.findLoop:
        move.l  d6,d0                   ; Unit
        moveq   #0,d1                   ; Flags
        lea     tdname(pc),a0           ;
        move.l  a4,a1                   ; A4 = IOStdReq
        jsr     LVOOpenDevice(a6)
        tst.l   d0
        bne.b   .noDevice
        move.l  IO_UNIT(a4),d5
        move.l  a4,a1
        jsr     LVOCloseDevice(a6)
        cmp.l   d5,d7
.noDevice:
        dbeq    d6,.findLoop
        tst.w   d6
        bpl.b   bootDriveFound
errorExit:
        moveq   #-1,d0
        rts
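Stripped of the exec plumbing, the loop above is a plain countdown search. The Python sketch below models it. The function `find_boot_drive`, the `open_unit` callback and the fake unit addresses are made-up stand-ins for OpenDevice()/CloseDevice() and the io_Unit pointers, not real API calls.

```python
def find_boot_drive(boot_unit, open_unit, num_units=4):
    """Return the unit number whose io_Unit equals boot_unit, or -1.

    open_unit(n) is a stand-in for OpenDevice() on trackdisk.device:
    it returns the io_Unit pointer, or None when the open fails.
    """
    for unit in range(num_units - 1, -1, -1):   # dbeq counts 3, 2, 1, 0
        io_unit = open_unit(unit)
        if io_unit is not None and io_unit == boot_unit:
            return unit                         # dbeq exits, d6 = unit
    return -1                                   # d6 ran out: errorExit

# Hypothetical machine with DF0: and DF2: connected.
units = {0: 0x1000, 2: 0x2000}
print(find_boot_drive(0x2000, units.get))       # booted from DF2:
```

The result plays the role of D6 in the listing, which is later stored at $94 for the second stage loader.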
Now we have found the boot drive. The next code does the cache disable stuff. Again, it is a variation on code found in the relevant EAB threads. Kudos to Keir and Ross, with a bit of my own (the illegal instruction handler stuff). We do not use the typical (and sensible) CacheControl() provided by Kickstart 2.x and newer. Instead, we have code that works (and has been tested) on any Kickstart and any 680x0. I bet there is an accelerator out there which will cause this code to break due to some magic done in the accelerator firmware... Anyway, I had a perfectly good reason for what I did. The main game developer, Sami Karjalainen, has an Amiga 2000 (Kickstart ROM 1.3) with a GVP 68030 turbo. So when the game is booted for the very first time, Sami's A2000 is running Kickstart 1.3 on a 68030. Obviously the game had to work on his machine as well.
Furthermore, his GVP was a source of additional headache. It generated spurious Level 7 IRQs, which we did not account for.. it took quite a bit of debugging to find this out. Once I had figured it out, the numerous EAB threads about the very same topic immediately came back to me as a flashback. Sigh..
bootDriveFound:
        ; Turn disk motor off
        move.l  a2,a1
        move.w  #9,$1c(a1)
        clr.l   $24(a1)
        jsr     LVODoIO(a6)
        jsr     LVODisable(a6)
        ; Cache code
        lea     disableCaches(pc),a5
        move.l  sp,a2
        jsr     LVOSupervisor(a6)
        ; Illegal instruction handler
        addq.l  #4,2(sp)
        rte
disableCaches:
        move.l  -(a2),$10.w     ; JSR return address is the handler
        moveq   #0,d0           ;
        moveq   #0,d1
        dc.l    $4e7b0801       ; movec d0,vbr
        dc.l    $4e7a1002       ; movec cacr,d1
        tst.l   d1
        bpl.b   .skip
        dc.w    $f478           ; cpusha dc
.skip:  dc.l    $4e7b0002       ; movec d0,cacr
        dc.l    $4e7b0808       ; movec d0,pcr
        ; Jump to code in fixed address..
        jmp     findMemory-sta(a3)
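The reason every MOVEC above is spelled as a 4-byte dc.l is the handler: it simply adds 4 to the saved PC, so each unsupported MOVEC is stepped over as one unit. The Python sketch below models which of the four get skipped per CPU. The table of control registers reflects my understanding of the 680x0 family, and the function name `run` is made up; treat it as a rough model and not as a CPU emulator (the guarded cpusha is left out on purpose).

```python
# MOVEC control register numbers used in the bootblock.
CACR, VBR, PCR = 0x002, 0x801, 0x808

# Which of those registers each CPU implements (model assumption).
CPU_REGS = {
    "68000": set(),
    "68010": {VBR},
    "68020": {VBR, CACR},
    "68030": {VBR, CACR},
    "68040": {VBR, CACR},
    "68060": {VBR, CACR, PCR},
}

# The four dc.l encoded MOVECs in bootblock order.
SEQUENCE = [0x4E7B0801, 0x4E7A1002, 0x4E7B0002, 0x4E7B0808]

def run(cpu):
    """Return (executed, skipped) opcode lists for the given CPU."""
    executed, skipped = [], []
    for insn in SEQUENCE:
        reg = insn & 0xFFF                  # control register field
        if reg in CPU_REGS[cpu]:
            executed.append(insn)
        else:
            # Illegal instruction: the handler does "addq.l #4,2(sp)"
            # and "rte", resuming at the next longword.
            skipped.append(insn)
    return executed, skipped

for cpu in CPU_REGS:
    done, skipped = run(cpu)
    print(cpu, "executes", len(done), "skips", len(skipped))
```

On a plain 68000 all four are skipped and D1 stays zero, so the bpl.b also steps over the cpusha.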
At this point we have all the caches and other fancy features turned off and VBR set to 0. We are also executing the code from the fixed address somewhere around that $30000.
I have a track record of embedding bugs into my memlist code that finds all autoconfiguring + ranger memory. The original Smarty and the Nasty Gluttons code from the 1990s had a bug, the "cracked" game preview release had a different bug, and finally the ADF version of the game had a bug ;-) I hope I finally got this right.. I presume Ross @EAB (if he ever reads this post) will point out the bugs.. as he did the previous times.
        ; Search exec memlist for 512K RAM disk.. any mem
findMemory:
        lea     $142(a6),a0     ; memlist
        moveq   #8,d0
        swap    d0              ; D0=$80000
        moveq   #0,d1
        subq.w  #1,d1           ; D1=$ffff
        moveq   #0,d4           ; mem found flag
        ; Check for 256K CHIP
        cmp.l   MAXLOCMEM(a6),d0
        bhi.b   .noMemFound
.memLoop:
        move.l  (a0),a0
        tst.l   (a0)
        beq.b   .noMemFound
        movem.l MH_LOWER(a0),d2/d3
        add.l   d1,d3
        clr.w   d3
        sub.l   d0,d3
        ; loop if 256K or 512K of CHIP
        ble.b   .memLoop
        ; loop if 256K mem expansion
        clr.w   d2
        cmp.l   d2,d3
        blo.b   .memLoop
.done:
        move.l  d0,d4
.noMemFound:
        movem.l d3/d4/d6,$8c.w  ; $8C = ptr to min 512K RAM disk
                                ; $90 = amount of RAM disk
                                ; $94 = boot drive
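Since this is the routine with the bug history, here is how I read its decision logic, written as a Python model. The function name `find_ram_disk` and the example memory maps are made up for illustration. One simplification: when nothing suitable is found the model returns None as the pointer, whereas the assembly stores whatever is left in D3 and relies on the zero size in D4.

```python
HALF_MEG = 0x80000

def find_ram_disk(max_loc_mem, headers):
    """Model of findMemory.

    max_loc_mem: ExecBase MaxLocMem (top of CHIP RAM).
    headers: (mh_Lower, mh_Upper) pairs in MemList order.
    Returns (ram_disk_ptr, ram_disk_size); size 0 means not found.
    """
    if max_loc_mem < HALF_MEG:              # only 256K of CHIP RAM
        return None, 0
    for lower, upper in headers:
        # Round the end up to 64K, then step back 512K.
        base = ((upper + 0xFFFF) & ~0xFFFF) - HALF_MEG
        if base <= 0:                       # region ends at or below 512K
            continue
        if base < (lower & ~0xFFFF):        # region is smaller than 512K
            continue
        return base, HALF_MEG
    return None, 0

# Hypothetical memory maps.
chip_512k = (0x400, 0x80000)
slow_512k = (0xC00020, 0xC80000)
print(find_ram_disk(0x80000, [slow_512k, chip_512k]))   # uses slow RAM
print(find_ram_disk(0x80000, [chip_512k]))              # nothing to spare
```

The first call picks the 512K at $C00000, the second finds nothing, which is the case where the second stage loader has to complain to the player.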
If there is not enough RAM, or only 256K of CHIPRAM is found, the second stage file loader code will inform the player about the fact.
Now we are at the final steps.. the rest of the code relocates both stacks and calls the decompressor. The code execution for the second stage file loader starts at $5000. That's it. For your convenience the S405 decompression routine is also included in full..
        ; Stacks..
        lea     $400.w,a7       ; SSP
        move.l  a7,a2           ; decompressor work space
        move.w  d0,sr           ; back to user mode
        lea     $300.w,a7       ; USP
        lea     $dff096,a0
        move.w  #$0180,(a0)     ; #$0180,$dff096
        move.w  d0,$180-$96(a0)
        ; decompress and execute..
        lea     $604(a3),a0
        lea     $5000.w,a1
        ;lea    ($5000-TABLE_SIZE).w,a2
        pea     (a1)

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Version: 5d
;
; This file is released into the public domain for commercial
; or non-commercial usage with no restrictions placed upon it.
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

LUTSIZE         equ 15*8
HALFLUTSIZE     equ LUTSIZE/2
PRESYMS         equ 16
LTMSYMS         equ 512
HGHSYMS         equ 256
LOWSYMS         equ 256

                rsreset                 ; DO NOT REARRANGE
PRETABLE        rs.b (LUTSIZE+PRESYMS*2)
LTMTABLE        rs.b (LUTSIZE+LTMSYMS*2)
HGHTABLE        rs.b (LUTSIZE+HGHSYMS*2)
LOWTABLE        rs.b (LUTSIZE+LOWSYMS*2)
OVLTABLE        rs.b 256
TABLE_SIZE      rs.b 0
MTF             equ LOWTABLE+HGHSYMS

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Parameters:
;  A0 - src - a ptr to the compressed data.
;  A1 - dst - a ptr to the destination memory.
;  A2 = ptr to work area (TABLE_SIZE bytes)
;
; Returns:
;  none
;
; Notes:
;  No checks if source and destination memory areas
;  overlap. In such case a crash is very probable.
;  Runtime RAM usage is 2816 ($b00) bytes, relocatable using A2.
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

decrunch:       addq.w  #2,a0
                ;
                ; A0 = ptr to crunched data (ID skipped) + 2
                ; A1 = ptr to destination mem
                ; A2 = ptr to work area
                ;
blockLoop:      tst.w   -(a0)
                bne.b   .cont
                rts
                ;
                ; Initialize the bitshifter
.cont:          move.l  (a0)+,d7
                swap    d7
                moveq   #0,d6
                ;
                ; A0 = src
                ; A1 = dst
                ; a2 = tmp
                ; D6 = _bc
                ; D7 = _bb
                ;
decodeTrees:
                lea     MTF(a2),a3              ; MTF table
                move.l  a2,a4                   ; A4 = PRETABLE
                lea     LUTSIZE+PRESYMS*2(a2),a5 ; leaf depths
                ;
                ; Init MTF table M while reading 16 leaf depths.
                ;
                moveq   #PRESYMS,d3
                moveq   #29,d4
                moveq   #0,d5                   ; max leaf depth is 15
.getb3:         ;
                move.b  d5,(a3)+                ; init MTF
                move.l  d7,d0
                lsr.l   d4,d0
                move.b  d0,(a5)+
                moveq   #3,d1
                bsr.b   getB
                addq.w  #1,d5
                cmp.w   d5,d3
                bne.b   .getb3
                ; Build pretree..
                ; D3 = num symbols = 16
                ; A4 = ptr to PRETABLE
                bsr.w   buildDecodingTables     ; PRETREE
                ;
                ; Build LTMTABLE
                lsl.w   #5,d3
                ; D3 = 512
                ; A4 = A2+LTMTABLE
                ;lea    LTMTABLE(a2),a4
                bsr.b   buildDecodingTree
                lsr.w   #1,d3
                ; D3 = 256
                ; A4 = A2+HGHTABLE
                bsr.b   buildDecodingTree
                ;
                ; D3 = 256
                ; A4 = A2+LOWTABLE
                bsr.b   buildDecodingTree
                ;
                ; D3 = 256
                ; D5 = oldLength (PMR) no init required
                ; A4 = oldOffset (PMR) no init required
                ; A5 = oldOffsetLong (PMR) no init required
                ;
mainLoop:
                lea     LTMTABLE(a2),a3         ; LTMTABLE
                bsr.b   getSyml
                sub.w   d3,d2
decodeLiteral:  ;
                ; Symbol > 256 => PMR or match
                bgt.b   matchFound
                ;
                ; Symbol = 256 => End of Block
                beq.b   blockLoop
                ;
                ; Symbol < 256 => Literal
                move.b  d2,(a1)+
                bra.b   mainLoop
                ;
matchFound:
                subq.w  #1,d2
                ble.b   copyLoop                ; Symbol = 257 => PMR
                ; D2 = match_length-1
                move.w  d2,d5                   ; D5 = PMR oldLength
decodeOffset:
                lea     HGHTABLE(a2),a3
                bsr.b   getSyml
                move.w  d2,d4
                bne.b   .notoldoffsetlong
                move.w  a5,d4
                bra.b   oldOffsetLong
.notoldoffsetlong:
                bclr    #7,d4
                beq.b   oldOffsetLong
                ;
                lsl.w   #8,d4
                ;
                lea     LOWTABLE(a2),a3
                bsr.b   getSyml
                move.b  d2,d4
                ;
                move.w  d4,a5                   ; A5 = PMR oldOffsetLong
oldOffsetLong:  move.w  d4,a4                   ; A4 = PMR oldOffset
copyLoop:       move.w  d5,d0                   ; D5 = PMR oldLength
                ;
                ; D3 = matchLength-1
                ; D4 = offset
                move.l  a1,a3
                sub.l   a4,a3
                ;
.copy:          move.b  (a3)+,(a1)+
                dbf     d0,.copy
                bra.b   mainLoop

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Description:
;  Decodes the next symbol from the given huffman tree.
;
; Parameters:
;  a3 = ptr to (huffman) tables..
;
; Returns:
;  D1 = cnt (num bits needed for symbol..)
;  D2 = symbol
;
; Trashes:
;  D0,a3
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

getSyml:
.lut:           cmp.l   (a3)+,d7
                bhi.b   .lut
                ;
                movem.w HALFLUTSIZE-4(a3),d0/d1
                ; D0 = base index
                ; D1 = number of bits to extract
                move.l  d7,d2
                clr.w   d2
                rol.l   d1,d2
                sub.w   d0,d2
                add.w   d2,d2
                move.w  -4(a3,d2.w),d2

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Description:
;  Extract n bits from the compressed data stream.
;
; Parameters:
;  D1.w = num_bits_to_extract
;
; Returns:
;  Nothing.
;
; Trashes:
;  flags,D1
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

getB:           cmp.w   d1,d6
                bge.b   .getb
                lsl.l   d6,d7
                move.w  (a0)+,d7
                sub.w   d6,d1
                moveq   #16,d6
.getb:          sub.w   d1,d6
                lsl.l   d1,d7
return:         rts

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Description:
;  Build huffman tree & decoding structures. Read symbols
;  from the compressed stream and then does inverse MTF
;  symbols. Used only to decode/build the pretree.
;
; Parameters:
;  D3 = num symbols
;  A4 = ptr to the tree
;
; Returns:
;  D3 = num of symbols
;  A4 = ptr to dest table
;
; Trashes:
;  D0,D1,D2,D4,D5,A3,A4,A5
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

buildDecodingTree:
                move.w  d3,d0
                add.w   d0,d0
                lea     LUTSIZE(a4,d0.w),a5
                ;moveq  #0,d5
                move.w  d3,d5
.loop:          move.l  a2,a3                   ; PRETABLE
                bsr.b   getSyml                 ; trashes a3
                ; inverse MTF
                lea     MTF(a2),a3
                add.w   d2,a3
                move.w  d2,d0
                move.b  (a3),d2
                bra.b   .mtf1
                ;
.mtf:           move.b  -(a3),1(a3)
.mtf1:          dbf     d0,.mtf
                move.b  d2,(a3)
                move.b  d2,(a5)+
                ;addq.w #1,d5
                ;cmp.w  d5,d3
                subq.w  #1,d5
                bne.b   .loop

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Description:
;  Build huffman tree & decoding structures. Assume that all
;  symbols are already loaded into memory.
;
; Parameters:
;  D3 = num symbols
;  A4 = ptr to the tree table
;
; Returns:
;  D3 = num of symbols
;  A4 = ptr to next table
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

buildDecodingTables:
                move.l  a4,a5
                moveq   #HALFLUTSIZE/4,d0
                ; clear LUTTABLE
.clrLoop:
                clr.l   (a5)+
                subq.w  #1,d0
                bne.b   .clrLoop
                ;
                move.w  d3,d1
                add.w   d1,d1
                add.w   d3,d1
                lea     LUTSIZE(a4,d1.w),a3
                ; Read depths and count occurrences of each depth
                ; A3 = leaf depths
                ; A4 = tree/lut
                move.w  d3,d2
                ;moveq  #0,d0                   ; D0=0
.countLoop:
                move.b  -(a3),d0
                beq.b   .zeroDepth
                add.b   d0,d0
                add.b   d0,d0
                addq.w  #1,2-4(a4,d0.w)         ; Count or Index
.zeroDepth:     subq.w  #1,d2
                bne.b   .countLoop
                ; count prefix and base
                move.l  a4,a5
                moveq   #15,d0
                ;moveq  #0,d2                   ; prefix, D2=0
                moveq   #HALFLUTSIZE,d5         ; index or count
.indexLoop:
                move.l  (a5)+,d4
                beq.b   .zeroCount
                add.w   d4,d2
                movem.w d2/d5,-4(a5)
                add.w   d4,d5
.zeroCount:
                add.w   d2,d2
                subq.w  #1,d0
                bne.b   .indexLoop
                ;
                ; Sort symbols
                ;moveq  #0,d0                   ; D0=0
                moveq   #0,d1
.sortLoop:
                move.b  0(a3,d0.w),d1
                beq.b   .zeroIndex
                add.b   d1,d1
                add.b   d1,d1
                move.w  2-4(a4,d1.w),d2
                addq.w  #1,2-4(a4,d1.w)         ; Count or Index
                add.w   d2,d2
                move.w  d0,0(a4,d2.w)
.zeroIndex:
                addq.w  #1,d0
                cmp.w   d0,d3
                bne.b   .sortLoop
                ;
                ; Calculate LUT tables
                move.l  a4,a5
                moveq   #1,d1
                moveq   #-1,d4
                moveq   #0,d5
.lutLoop:       ;
                move.w  (a4)+,d0                ; prefix
                move.w  (a4)+,d2                ; index
                beq.b   .zeroLut
                lsl.l   d1,d4
                or.w    d0,d4
                subq.w  #1,d4
                ror.l   d1,d4
                move.l  d4,(a5)+                ; code
                ;
                sub.w   d5,d2
                sub.w   d2,d0
                movem.w d0/d1,HALFLUTSIZE-4(a5)
                addq.w  #2,d5
.zeroLut:
                addq.w  #1,d1
                cmp.w   #16,d1                  ; this counter is wrong
                bne.b   .lutLoop                ; should be max 15 times
                move.l  a3,a4
                rts
                ;
                dc.b    "Thanks to EAB for the idea of any-drive "
                dc.b    "booting & loading and the non-system cache code. "
                dc.b    "Greetings to Scoopex, Flashtro and Cave. "
                cnop    0,4
end: