Tuesday, July 27, 2021

Blitter c2p for a 16 colour rotozoomer

Preface

I released a simple rotozoomer in a "Lure of the Temptress" crack intro for Flashtro (see the Pouet link). The original "concept" was done in summer 1994, but the intro only got finalised, and optimised the heck out of, during summer 2015 ;-) Some delivery process hiccups, I would say..

There's nothing new in rotozoomers and c2p routines, but there's one neat piece of blitter use in this one - at least I think so. The rotozoomer uses 4x4 graphics pixels, not copper as usual, and in 16 colours. The c2p used in the intro is done entirely in 4 interrupt-driven blitter passes. With the code in FASTMEM this easily allows a full screen (320x256) rotozoomer with zero precalculation at 50Hz on a plain mc68000, with 2 additional bitplanes switched on for a text writer and stuff. With the code in CHIPMEM I almost managed the full screen, but with the writer and scroller I think I had to cut, AFAIR, 4 or 8 chunky pixels from each row.

The blitter c2p is what I'll go through here. Unlike other neat tricks that rely on undocumented 7 bitplane features of OCS/ECS, this method works on all chipsets, although I can't think of a reason for someone to use it with AGA. I cannot take credit for inventing this method, as I recall at least a coder called CrazyCrack used a similar method before my stuff. Anyway, I just wanted to document this.. since I have not seen it written up before.

Maybe the next post will be about the rotozoomer itself, as I always had a hard time getting my head around it when described in "proper terms" etc. Back in the day I just realised I could do it using a unit circle and a single scaled sin()/cos() value as an affine "texture mapping" interpolation value.


Prerequisites

The blitter c2p uses two "linear" chunky buffers for 4x4 pixels - actually in RAM they are 4x1 pixels but more about that later. A pixel in a chunky buffer is 8 bits. The 4 bit colour information is repeated in both nibbles of the byte: 

 7       0 
+----+----+
|abcd|abcd|
+----+----+
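
As a small illustration of the byte layout above, here is a minimal Python sketch of the nibble doubling. The names `pack_chunky` and `CHUNKY_TABLE` are made up for this example; the intro itself presumably stores its texture data in this form already instead of converting at run time.

```python
def pack_chunky(colour: int) -> int:
    """Duplicate a 4-bit colour into both nibbles of a byte (abcd -> abcdabcd)."""
    if not 0 <= colour <= 15:
        raise ValueError("colour must fit in 4 bits")
    return (colour << 4) | colour  # same value as colour * 0x11


# Hypothetical lookup table: one doubled byte per colour index.
CHUNKY_TABLE = bytes(pack_chunky(c) for c in range(16))
```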

So, there are two chunky buffers: one for odd pixels and another for even pixels. The rotozoomer inner loop must therefore write into the two buffers in an interleaved manner. This is not really an issue if, for example, the inner loop unrolls the entire row of writes. Below is an example ASCII picture showing the buffers:

 7       0 7       0       7       0 7       0 7       0
+----+----+----+----+-...-+----+----+----+----+----+----+
|abcd|abcd|efgh|efgh|     |ijkl|ijkl|mnop|mnop|qrst|qrst| odd pixels
+----+----+----+----+-...-+----+----+----+----+----+----+

 7       0 7       0       7       0 7       0 7       0
+----+----+----+----+-...-+----+----+----+----+----+----+
|ABCD|ABCD|EFGH|EFGH|     |IJKL|IJKL|MNOP|MNOP|QRST|QRST| even pixels
+----+----+----+----+-...-+----+----+----+----+----+----+
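
The interleaved write can be sketched like this in Python. The function name `split_row` is hypothetical, and the ordering is an assumption read off the worked example further down: counting from 1, the 1st, 3rd, 5th... pixels of a row go to the odd buffer and the 2nd, 4th, 6th... to the even buffer.

```python
def split_row(colours):
    """Split one row of 4-bit colours into the odd and even chunky buffers.

    Assumption: the leftmost pixel of the row is an "odd" pixel, so the
    pixels alternate odd, even, odd, even... from left to right.
    """
    doubled = [(c << 4) | c for c in colours]  # repeat the colour in both nibbles
    odd = bytes(doubled[0::2])
    even = bytes(doubled[1::2])
    return odd, even


odd, even = split_row([0xF, 0xF, 0xC, 0x9, 0xD, 0x7])
print(odd.hex(), even.hex())
```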

The blitter is set up as follows:
  • Channels A, B and D are enabled.
  • BLTCDAT = $8888 (channel C DMA stays off, the preloaded data register acts as a static mask)
  • BLTBPTH & L point to the last word of the even pixels
  • BLTAPTH & L point to the last word of the odd pixels
  • BLTDPTH & L point to the last word of the respective bitplane buffer

BLTCON0 and BLTCON1 are written together as one longword (BLTCON0 in the upper word, BLTCON1 in the lower word):
  • Pass #1 - $7d283012 
  • Pass #2 - $6d282012 
  • Pass #3 - $5d281012 
  • Pass #4 - $4d280012 
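
To see what these longwords contain, the following Python sketch splits them into the relevant BLTCON0/BLTCON1 fields (A shift, channel enables and minterm in BLTCON0; B shift, EFE and DESC in BLTCON1). The function name `decode_bltcon` and the dictionary keys are made up for this example; only the fields used in this post are decoded.

```python
def decode_bltcon(longword: int) -> dict:
    """Decode a combined BLTCON0/BLTCON1 longword (BLTCON0 in the high word)."""
    con0 = (longword >> 16) & 0xFFFF
    con1 = longword & 0xFFFF
    return {
        "a_shift": con0 >> 12,          # ASH3-0
        "use_a": bool(con0 & 0x0800),   # USEA
        "use_b": bool(con0 & 0x0400),   # USEB
        "use_c": bool(con0 & 0x0200),   # USEC
        "use_d": bool(con0 & 0x0100),   # USED
        "minterm": con0 & 0x00FF,       # LF7-0
        "b_shift": con1 >> 12,          # BSH3-0
        "efe": bool(con1 & 0x0010),     # exclusive fill enable
        "desc": bool(con1 & 0x0002),    # descending mode
    }


for n, value in enumerate((0x7D283012, 0x6D282012, 0x5D281012, 0x4D280012), start=1):
    f = decode_bltcon(value)
    print(f"Pass #{n}: A<<{f['a_shift']} B<<{f['b_shift']} minterm=${f['minterm']:02x}")
```

Each pass lowers both shift amounts by one, and the A shift is always 4 more than the B shift.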

We can see that the blitter is used in "descending mode", i.e. the DESC bit in BLTCON1 is set. The EFE bit in BLTCON1 is also set, which means we also use AREA FILLING in exclusive mode.
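
Exclusive area fill can be thought of as a running XOR that sweeps each row from right to left. Below is a minimal Python sketch of that idea for a single row given as a string of bits. The name `exclusive_fill` is hypothetical, and the model assumes the fill carry starts at 0 (FCI clear) and ignores word boundaries.

```python
def exclusive_fill(bits: str) -> str:
    """Exclusive area fill over one row of '0'/'1' characters.

    Scans right to left. Every 1 bit toggles the fill carry, and the
    output bit is the carry after the toggle, so the left border of each
    filled region is dropped.
    """
    carry = 0
    out = []
    for ch in reversed(bits):
        carry ^= int(ch)
        out.append(str(carry))
    return "".join(reversed(out))


print(exclusive_fill("00100100"))
```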

Let's go through one blit, taking Pass #1 (which prepares bitplane 0) as the example, to see how the shifts and masks are done. But first, let's have a look at the minterm ($28 i.e. 00101000b). I always found it easier to use the Venn Diagram instead of the "blitter logic" stuff when figuring out minterms. We have bits 5 and 3 set. From the Venn Diagram we see those match (A and C, but not B) and (B and C, but not A) and no other combination. This means our logic operation is (A and not B and C) or (not A and B and C), i.e. a bit gets set in the destination iff exactly one of A and B is 1 while C=1, but not e.g. when (A and B and C)=1. Obviously no bit is set when C=0. We use the C channel as a mask for all our blits. So essentially the minterm means the logic operation "(A XOR B) AND C".

                         ______  0 ______
                        /      \  /      \
                       /        \/        \
                      /         /\         \
                     /   A     /  \     B   \
                    |    -    |    |    -    |
                    |         |  6 |         |
                    |         |    |         |
                    |       4 |____| 2       |
                    |        /|    |\        |
                    |       / |  7 | \       |
                     \     /   \  /   \     /
                      \   /  5  \/  3  \   /
                       \ |      /\      | /
                        \|_____/  \_____|/
                         |              |
                         |       1      |
                         |              |
                         |              |
                          \            /
                           \     C    /
                            \    -   /
                             \______/
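
The region numbers in the diagram are the minterm bit numbers: A contributes 4, B contributes 2 and C contributes 1. A short Python sketch can check that minterm $28 really is "(A XOR B) AND C" for all eight input combinations. The helper name `minterm_bit` is made up for this example.

```python
def minterm_bit(minterm: int, a: int, b: int, c: int) -> int:
    """Blitter logic for one bit: A, B and C select one of the 8 minterm bits."""
    return (minterm >> ((a << 2) | (b << 1) | c)) & 1


for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert minterm_bit(0x28, a, b, c) == ((a ^ b) & c)
```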

With BLTCON0 and BLTCON1 set to $7d283012, A (odd) is first shifted 7 bits to the left and B (even) 3 bits to the left. After that the logic operation "A XOR B" is applied and the result is masked with the static C channel value $8888. Let us assume:

  • qrst = 1101
  • QRST = 0111
  • mnop = 1100
  • MNOP = 1001
  • ijkl = 1111
  • IJKL = 1111

What we are after is linear bitplane 0 content looking like this, with the odd ("lower case") and even ("upper case") pixels interleaved and each chunky pixel 4 bits wide. Note the 3 pixel shift on the right edge, which can be "corrected" using a $dff102 shift of 3 pixels to the right..

 7       0 7       0       7       0 7       0 7       0
+----+----+----+----+-...-+----+----+----+----+----+----+
|....|....|....|....|     |1111|1000|0111|1111|1111|1...| bitplane 0
+----+----+----+----+-...-+----+----+----+----+----+----+

The above blitter setup applied to the odd and even chunky buffers yields the following after the shifts, mask, logic operation and area fill. Both pixel rows are shown after their left shift. Only the bits selected by the mask, i.e. the top bit of each nibble, take part in the XOR result.

 7       0 7       0       7       0 7       0 7       0
+----+----+----+----+-...-+----+----+----+----+----+----+
|defg|hefg|h...|....|     |l110|0110|0110|1110|1000|0000| odd pixels
|DABC|DEFG|HEFG|H...|     |1111|1100|1100|1011|1011|1000| even pixels
|1000|1000|1000|1000|     |1000|1000|1000|1000|1000|1000| Mask
|?000|?000|?000|?000|     |0000|1000|1000|0000|0000|1000| XOR result
+----+----+----+----+-...-+----+----+----+----+----+----+
|????|????|????|????|     |1111|1000|0111|1111|1111|1000| filled result
+----+----+----+----+-...-+----+----+----+----+----+----+
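
The table can be reproduced with a small Python model of one pass. This is only a sketch under simplifying assumptions: it handles a single row as one long bit string, shifts zeros in from the right, ignores word boundaries and starts the fill carry at 0. The names `shift_left` and `c2p_pass` are made up for this example. The input bytes are the last three odd and even pixels from above (ijkl, mnop, qrst and IJKL, MNOP, QRST).

```python
def shift_left(bits: str, n: int) -> str:
    """Shift a bit string n positions to the left, filling with zeros."""
    return bits[n:] + "0" * n


def c2p_pass(odd: bytes, even: bytes, a_shift: int, b_shift: int) -> str:
    """Model one blitter pass over one row: shift, (A XOR B) AND C, exclusive fill."""
    a = shift_left("".join(f"{x:08b}" for x in odd), a_shift)
    b = shift_left("".join(f"{x:08b}" for x in even), b_shift)
    mask = "1000" * (len(a) // 4)  # BLTCDAT = $8888 repeated over the row
    xored = "".join(
        str((int(x) ^ int(y)) & int(m)) for x, y, m in zip(a, b, mask)
    )
    # Exclusive area fill: running XOR from right to left.
    carry, out = 0, []
    for ch in reversed(xored):
        carry ^= int(ch)
        out.append(str(carry))
    return "".join(reversed(out))


odd = bytes([0xFF, 0xCC, 0xDD])   # ijkl=1111, mnop=1100, qrst=1101
even = bytes([0xFF, 0x99, 0x77])  # IJKL=1111, MNOP=1001, QRST=0111
print(c2p_pass(odd, even, 7, 3))  # Pass #1 shifts
```

The printed string is the "filled result" row of the table read from left to right.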

The same is repeated for the remaining 3 bitplanes, only with different shift amounts. The produced planar pixels are actually 4x1 in size. One can use bitplane modulos or reload the bitplane pointers to stretch the pixels in the y-direction to reach the 4x4 pixel size. I think I used the latter in my crack intro, because bitplanes 4 and 5 were used for the writer and the scroller. Actually halfbrite mode is used.. As I recall, in places where there is no writer or scroller the copper enables normal 4 bitplane mode to save some DMA. There are other nice features with this method. For example, there is no need to set up the destination channel pointer more than once, i.e. only at the beginning of the Pass #1 blit, and maybe some other stuff I have forgotten.
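
To check that the four shift pairs really extract the four colour bits, here is a Python sketch that runs all four passes over one row and reads the colours back from the resulting bitplanes. It makes the same simplifications as any single-row model: zeros shifted in from the right, no word boundaries, fill carry starting at 0 and an even number of pixels per row. The function names are made up for this example. Because of the 3 pixel shift, the bit for pixel j of each plane is sampled at bit position 4*j.

```python
def blit_pass(odd: bytes, even: bytes, plane: int) -> list:
    """One pass for bitplane `plane` (0..3): A shift 7-plane, B shift 3-plane."""
    a = "".join(f"{x:08b}" for x in odd)
    b = "".join(f"{x:08b}" for x in even)
    a = a[7 - plane:] + "0" * (7 - plane)
    b = b[3 - plane:] + "0" * (3 - plane)
    carry, out = 0, []
    for i in range(len(a) - 1, -1, -1):   # descending: right to left
        if i % 4 == 0:                    # $8888 mask: top bit of each nibble
            carry ^= int(a[i]) ^ int(b[i])
        out.append(carry)                 # exclusive fill output
    out.reverse()
    return out


def chunky_to_planar_row(colours: list) -> list:
    """Convert one row of 4-bit colours into 4 bitplane rows (lists of bits)."""
    doubled = [(c << 4) | c for c in colours]
    odd, even = bytes(doubled[0::2]), bytes(doubled[1::2])
    return [blit_pass(odd, even, p) for p in range(4)]


def read_back(planes: list, count: int) -> list:
    """Rebuild the colours by sampling every plane at bit 4*j for pixel j."""
    return [sum(planes[p][4 * j] << p for p in range(4)) for j in range(count)]


row = [0xF, 0xF, 0xC, 0x9, 0xD, 0x7, 0x0, 0xA]
planes = chunky_to_planar_row(row)
print(read_back(planes, len(row)) == row)
```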
