Wednesday, February 4, 2026

rABS with byte-size input in Z80

In my previous blog post I went through an rABS implementation that inputs one bit at a time during state renormalization. The mockup Python rABS encoder already supported varying input/output lengths during state renormalization, so I decided to experiment with a byte-wise decoder side as well. The motivational push for this came from Baze/3SC's recent rANS experiments ;)

On top of the nice byte-wise handling of the compressed input stream, I also managed to cut the decoder size a bit as a bonus. Let's cut the prologue short and look at the Z80 implementation. Nothing has really changed up to the point where we start calculating the new ANS state_x. (Note: I have swapped DE and HL, i.e. HL is now a pointer to the compressed data and DE holds the ANS state_x.)

; with 8 bits input the L_BIT_LOW becomes 0x0100, which allows
; byte-wise input from the compressed data stream during state_x
; renormalization.
L_BITS_         equ     8
L_BIT_LOW_      equ     0x10000 >> L_BITS_

                ...

_new_state:     ; new_state = d * Fs - Is + r
                ;           = d * Fs + (r - Is)


Some unwanted register mangling follows, partly because HL and DE are now swapped from the original design. On the other hand, the register assignment is now more LDD(R) friendly.


                ex      de,hl
                ld      e,a
                ld      d,b
                ld      a,l
                ld      l,b
                ;  H = (d = state_x // M)
                ;  L = 0
                ; DE = Fs
                ; BC = Is
                ;  A = r = state_x & (M - 1)
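The following Python model traces the five instructions above to check the shuffle. It is a sketch that assumes the entry state the comments imply: DE = state_x, A = Fs, and BC = Is with B = 0. HL is taken to be the compressed-data pointer, which was presumably pushed earlier, given the later pop hl. The register dictionary and the function name are illustrative only.

```python
# Hypothetical register-level model of the shuffle before _umulDxL_HL.
# Assumed entry state: DE = state_x, HL = data pointer (saved on the stack),
# A = Fs, B = 0, C = Is. All registers are 8-bit unsigned values.

def shuffle(state_x, ptr, fs, is_):
    r = {"A": fs, "B": 0, "C": is_,
         "D": state_x >> 8, "E": state_x & 0xFF,
         "H": ptr >> 8, "L": ptr & 0xFF}
    # ex de,hl: swap the DE and HL register pairs
    r["D"], r["E"], r["H"], r["L"] = r["H"], r["L"], r["D"], r["E"]
    r["E"] = r["A"]   # ld e,a -> E = Fs
    r["D"] = r["B"]   # ld d,b -> D = 0, so DE = Fs
    r["A"] = r["L"]   # ld a,l -> A = r = state_x & 0xFF
    r["L"] = r["B"]   # ld l,b -> L = 0, so HL = d << 8
    return r

regs = shuffle(0x1234, 0x8000, fs=0x40, is_=0x10)
print({k: hex(v) for k, v in regs.items()})
```

Because B is zero, a single register both forms DE = Fs and clears L. This leaves H = d ready to act as the 8-bit multiplier.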


Multiplication has not changed.


                ld      b,8
_umulDxL_HL:    add     hl,hl
                jr nc,  $+3
                add     hl,de
                djnz    _umulDxL_HL
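The loop is a classic 8x8 to 16-bit shift-and-add. H holds the multiplier d, and each add hl,hl shifts the next multiplier bit out into the carry while it also shifts the partial product left. The Python sketch below models this bit for bit; the function name is made up. It then checks all 65,536 combinations of 8-bit d and Fs, including that the C flag is clear when the loop exits, which the following sbc hl,bc relies on.

```python
# Hypothetical bit-exact model of the _umulDxL_HL loop.
# On entry HL = d << 8 (H = d, L = 0) and DE = Fs, both d and Fs in 0..255.

def umul_dxl_hl(d, fs):
    """Return (HL, C flag) after the 8 rounds of the loop."""
    hl = d << 8
    carry = False
    for _ in range(8):               # ld b,8 ... djnz (djnz leaves flags alone)
        hl <<= 1                     # add hl,hl
        carry = hl > 0xFFFF          # bit 15 shifted out into the C flag
        hl &= 0xFFFF
        if carry:                    # jr nc,$+3 falls through
            hl += fs                 # add hl,de
            carry = hl > 0xFFFF      # C flag produced by the 16-bit ADD
            hl &= 0xFFFF
    return hl, carry

# Exhaustive check: exact product and carry always clear on exit.
for d in range(256):
    for fs in range(256):
        assert umul_dxl_hl(d, fs) == (d * fs, False)
```

The partial product never grows into the bits still occupied by the remaining multiplier bits. This is why the ADD cannot overflow.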


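The carry flag is always clear at this point. Either the last 'jr nc' was taken (carry clear), or the final ADD was executed, and that ADD never overflows. So sbc hl,bc acts as a plain 16-bit subtraction here.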


                sbc     hl,bc
                ld      c,a
                add     hl,bc
                ex      de,hl
                pop     hl
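The sbc/add pair works in 16-bit modular arithmetic. The intermediate d * Fs - Is may briefly wrap below zero, for example when d = 1 and Is > Fs. This does no harm, because adding r wraps it back, and r is at least Is for the decoded symbol. Below is a small Python sketch that assumes 8-bit Fs, Is and r with Is <= r < Is + Fs. The function name is hypothetical.

```python
MASK16 = 0xFFFF

def new_state(d, fs, is_, r):
    """Model 'sbc hl,bc / ld c,a / add hl,bc' with HL = d * Fs on entry,
    C flag clear, BC = Is and A = r. Returns the new 16-bit state_x."""
    hl = (d * fs - is_) & MASK16   # sbc hl,bc: may wrap below zero...
    hl = (hl + r) & MASK16         # add hl,bc (C = r): ...and wraps back
    return hl

# d = 1, Fs = 10, Is = 200, r = 205: the intermediate wraps to 0xff42,
# yet the result equals 1 * 10 + (205 - 200) = 15.
print(hex((1 * 10 - 200) & MASK16), new_state(1, 10, 200, 205))  # 0xff42 15

# Spot-check against exact arithmetic over a grid of valid inputs.
for d in (1, 2, 128, 255):
    for fs in range(1, 256, 7):
        for is_ in range(0, 257 - fs, 13):
            for r in range(is_, is_ + fs, 5):
                assert new_state(d, fs, is_, r) == d * fs + (r - is_)
```

The final value always fits in 16 bits, since it is at most 255 * 255 + 254. So the temporary wraparound cancels out.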


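Next comes the renormalization loop, 'while (new_state_x < L_BIT_LOW)'. Now that our L_BIT_LOW == 256, there's a nice optimization available. We only need to check whether the high byte of the ANS state_x is zero (i.e. D, since state_x is kept in DE), because every value less than 256 lies in [0..255]. Since we input a whole byte at a time, there is no need for a loop. Or well, right: if state_x were 0 before renormalization, then yes, we would need to input 2 bytes. I guess that cannot ever happen, can it? ;) (It cannot. The previous round left state_x >= 0x100, so d >= 1. Since also Fs >= 1 and r >= Is for the decoded symbol, new_state_x = d * Fs + (r - Is) >= 1, and one input byte always lifts it to at least 0x100.)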


                ld      a,d
                and     a
                jr nz,  _end_while
                ld      d,e
                ld      e,(hl)
                dec     hl
_end_while:     pop     af
                ret
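To see the whole scheme end to end, here is a small Python round-trip model of byte-wise rABS. It is a sketch, not the author's mockup encoder. M = 256 and L_BIT_LOW = 0x100 follow from the code, since d and r are simply the high and low bytes of state_x. Several other details are assumptions:

- Symbol 0 owns [0, p) and symbol 1 owns [p, 256).
- The encoder starts from state 0x100.
- A Python list read from the end stands in for the backwards ld e,(hl) / dec hl reads.
- The function names are hypothetical.

The encoder uses the usual rABS step x = (x // Fs) * M + Is + x % Fs, after shifting out a byte whenever x >= Fs * 256.

```python
# Hypothetical byte-wise rABS round trip (not the author's Python mockup).
# M = 256, L_BIT_LOW = 0x100, state_x kept in [0x100, 0xFFFF].
# Assumed binary model: symbol 0 owns [0, p), symbol 1 owns [p, 256).

M = 256
L_BIT_LOW = 0x100

def interval(sym, p):
    """Return (Fs, Is) for the assumed two-symbol model, 1 <= p <= 255."""
    return (p, 0) if sym == 0 else (M - p, p)

def encode(bits, p):
    x, out = L_BIT_LOW, []
    for sym in reversed(bits):           # ANS is LIFO: encode backwards
        fs, is_ = interval(sym, p)
        while x >= fs * M:               # runs at most once with 8-bit output
            out.append(x & 0xFF)
            x >>= 8
        x = (x // fs) * M + is_ + x % fs
    return x, out                         # final state and the byte stream

def decode(x, stream, n, p):
    stream = list(stream)                 # consumed from the end, like dec hl
    bits = []
    for _ in range(n):
        d, r = x >> 8, x & 0xFF           # d = state_x // M, r = state_x & (M-1)
        sym = 0 if r < p else 1
        fs, is_ = interval(sym, p)
        x = d * fs + (r - is_)           # new_state = d * Fs + (r - Is)
        assert x >= 1                     # so one input byte always suffices
        if x >> 8 == 0:                   # high byte (D) is zero
            x = (x << 8) | stream.pop()  # ld d,e / ld e,(hl) / dec hl
        assert L_BIT_LOW <= x <= 0xFFFF
        bits.append(sym)
    return bits

msg = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0] * 5
state, data = encode(msg, p=200)
assert decode(state, data, len(msg), p=200) == msg
```

The assertions in decode encode the claims from the text. After the d * Fs + (r - Is) step, the state is never 0, and a single byte always brings it back into [0x100, 0xFFFF].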


More size optimizations would be possible if we dropped the fancy multiplication routine and replaced it with a brute-force add loop. The new ANS state_x calculation would then become something like the code below. But is a 6-byte saving worth a brute-force multiplication that takes up to 255 iterations instead of a fixed 8? Anyway, the Z80 source code is available on my GitHub. The "multiplication" loop setup looks suspicious, since A = d = state_x // M is used directly as the loop counter. However, A is always at least 1, because the renormalization during the previous round made sure that state_x >= 0x100.


                ...
_new_state:     ; new_state = d * Fs - Is + r
                ;           = d * Fs + (r - Is)
                ld      h,b
                ld      l,e      ; HL = r
                ld      e,a
                ld      a,d      ;  A = d = state_x // M
                ld      d,b      ; DE = Fs
                and     a        ; BC = Is
                sbc     hl,bc    ; HL = r - Is
_umul_HL:       add     hl,de
                dec     a
                jr nz,  _umul_HL
                ex      de,hl
                pop     hl
                ; while (new_state < L_BIT_LOW)
                ; L_BIT_LOW == 256
                or      d        ; A == 0 after the loop, so Z is set iff D == 0
                ...
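The brute-force version replaces the fixed 8 rounds with d rounds of add hl,de, i.e. anywhere from 1 to 255 iterations. The Python sketch below models the dec a / jr nz counter; the function name is hypothetical. It also shows what would happen if d were 0: the counter wraps and the loop runs 256 times, which is why the state_x >= 0x100 guarantee matters.

```python
# Hypothetical model of the brute-force _umul_HL add loop.
# HL starts as r - Is, DE = Fs, and A = d is the loop counter.

def umul_add_loop(d, fs, is_, r):
    """Return (HL, iterations) after the dec a / jr nz loop."""
    hl = (r - is_) & 0xFFFF          # and a / sbc hl,bc
    a, iterations = d, 0
    while True:
        hl = (hl + fs) & 0xFFFF      # add hl,de
        iterations += 1
        a = (a - 1) & 0xFF           # dec a (0 wraps to 255)
        if a == 0:                   # jr nz,_umul_HL falls through, A == 0
            break
    return hl, iterations

# Same result as the shift-and-add version, but d iterations instead of 8.
for d in (1, 100, 255):
    hl, n = umul_add_loop(d, 200, 56, 100)
    assert hl == d * 200 + (100 - 56) and n == d

# With d == 0 the counter would wrap and run 256 times.
print(umul_add_loop(0, 1, 0, 0))  # (256, 256)
```

This makes the trade-off explicit: 6 bytes smaller, but the runtime now grows linearly with d.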


