/******************************************************************************* 
Copyright(c) 2000 - 2002 Analog Devices. All Rights Reserved. 
Developed by Joint Development Software Application Team, IPDC, Bangalore, India 
for Blackfin DSPs  ( Micro Signal Architecture 1.0 specification). 
 
By using this module you agree to the terms of the Analog Devices License 
Agreement for DSP Software.  
******************************************************************************** 
Module name     : sadct.asm 
Label name      : __sadct 
Version         : 1.3 
Change History  : 
 
                Version     Date        Author        Comments 
                1.3         11/18/2002  Swarnalatha   Tested with VDSP++ 3.0 
                                                      compiler 6.2.2 on  
                                                      ADSP-21535 Rev.0.2 
                1.2         11/13/2002  Swarnalatha   Tested with VDSP++ 3.0 
                                                      on ADSP-21535 Rev.0.2 
                1.1         03/10/2002  Manoj         Modified to match 
                                                      silicon cycle count 
                1.0         07/20/2001  Manoj         Original  
 
Description     : This program performs the shape-adaptive DCT (SADCT) on an 8x8
                  block as prescribed in the MPEG-4 standard. It takes the input
                  data array x[] to be transformed in short format (as the
                  difference can have a 9-bit range), where the elements are

                     x00, x01 ...x07;
                     x10,x11 ....x17;
                        ........
                     x70,x71.....x77;

                  and the corresponding shape information in character format
                  (Alpha Map). Consider a 3x3 shape array (taken for ease of
                  demonstration). The following sequence of operations is
                  performed.
         
         
                  a) Perform column alignment as shown

                  [255 255  0 ;      column      [255 255 255 ;
                    0   0  255;     =======>       0  255  0  ;
                    0  255  0 ]      align         0   0   0  ]

                  so that the column-aligned data array is

                  xc=[x00  x01  x12 ;
                       0   x21   0  ;
                       0    0    0  ]
 
 
                  b) Perform a DCT of the appropriate length on each column of
                  xc, i.e. perform DCT(1) on column 1 of xc, DCT(2) on column 2
                  of xc and DCT(1) on column 3 of xc,
                  where DCT(N) is the NxN matrix with elements
                     DCT(N)[i][j] = K * cos(i*(j+0.5)*(pi/N)),
                     where i,j in [0, N) and K = sqrt(1/N) : i=0;
                                               = sqrt(2/N) : else.
                  N is the number of non-zero shape elements in the column on
                  which the SADCT is being performed. In this program, the DCT
                  coefficients are stored in an array and a direct matrix
                  multiplication is used to implement the DCT. Note that for
                  N > 6 a special flowgraph implementation of the DCT will be
                  optimal (considering the conditional branches and the DCT
                  complexity); this has not been incorporated in this program.
                  For DCT(8), Chen's DCT will suffice. If required, the
                  flowgraph of DCT(7) has to be integrated separately. Since
                  the cycle count is highly dependent on the shape array, the
                  user has to weigh code size against speed improvement before
                  adopting the flowgraph approach in an application. If most of
                  the elements in the shape array are non-zero, using the SADCT
                  instead of the normal DCT is not advisable because of the
                  computational load; the user has to choose between SNR loss
                  and cycle count increase based on the particular application.
                  In the implementation provided, two SADCT outputs are
                  computed simultaneously, at the cost of slightly more memory
                  for the coefficients.
                  The column SADCT transformed array XC is
 
                     [X00 X10 X20 ; 
                       0  X11  0  ; 
                       0   0   0  ] 
 
                  c) Perform row alignment as shown 
         
                      [255 255  255 ;    row     XR=[X00  X10  X20 ; 
                        0  255  0   ;   =====>       X11   0    0  ; 
                        0   0   0   ]   align         0    0    0  ]  
    
                  d) Perform a DCT of the appropriate length on each row of XR,
                  i.e. perform DCT(3) on row 1 of XR, DCT(1) on row 2 of XR and
                  skip row 3 as it has no non-zero elements.
         
                  A new technique that avoids the intermediate shape after the
                  top-column alignment is adopted here to save stack space and
                  to optimize the cycle count.
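
                  The alignment of steps a) and c) can be sketched in C as
                  below. This is only an illustration under stated assumptions,
                  not the way the assembly that follows is organised (it merges
                  alignment and DCT): DIM and column_align() are hypothetical
                  names, and the sketch works on the 3x3 example rather than
                  on the 8x8 block.

```c
#include <string.h>

#define DIM 3  // 3x3 as in the description; the module itself works on 8x8

// Pack the elements of each column whose shape entry is non-zero towards
// the top of that column and record how many were packed per column.
// (Hypothetical helper, for illustration only.)
static void column_align(const short in[DIM][DIM],
                         const unsigned char shape[DIM][DIM],
                         short packed[DIM][DIM], int len[DIM])
{
    memset(packed, 0, sizeof(short) * DIM * DIM);
    for (int c = 0; c < DIM; c++) {
        int n = 0;                       // next free slot in column c
        for (int r = 0; r < DIM; r++) {
            if (shape[r][c] != 0)
                packed[n++][c] = in[r][c];
        }
        len[c] = n;                      // DCT length for column c
    }
}
```

                  For the 3x3 shape shown in step a) this gives the column
                  lengths 1, 2 and 1. Row alignment is the same packing applied
                  along each row towards the left.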
 
Prototype       : void sadct(short in[], unsigned char shape[], short out[],  
                             short coeff[]); 
 
                     in     -> Address of the 8x8 data array  
                     shape  -> Address of the 8x8 shape array  
                     out    -> Address of the 8x8 output array 
                     coeff  -> Address of the coefficients 
 
Registers used  : A0, A1, R0-R7, I0-I3, B0-B3, M0-M3, L0-L3, P0-P5, LC0, LC1. 
 
Performance     : 
                Code Size   : 436 Bytes 
                Cycle count : 1904 Cycles for a lower triangular matrix  
                                   (including the diagonal) 
*******************************************************************************/ 
/*Create temporary storage of 1x8 fract16 elements on the stack to hold the
packed column/row elements for the SADCT*/
/*Create temporary storage of 2x8 bytes on the stack to hold the lengths of the
intermediate shape*/
 
.section L1_code; 
.global __sadct; 
.align 8; 
.extern __Coeff_offset; 
 
__sadct: 
                            //Initializations 
     
    P0 = R0;                //Address of the input array 
    P1 = R1;                //Address of the shape array 
    B0 = R1;       
    I0 = R2;                //Address of the output array 
    B3 = R2; 
    R1 = [SP+12]; 
     
    [--SP] = (R7:4,P5:3); 
    P4 = R0;                //Save the address of the input array 
    I2 = R1;                //Base address of the coeff. array 
    B2 = R1;                //Save the base of the coeff. array 
    L2 = 0; 
    I3.L = __Coeff_offset;  //Base of the offset array 
    I3.H = __Coeff_offset; 
    L3 = 0; 
    L0 = 0; 
    SP += -16;              //Temporary Storage allocated in stack. 
    I1 = SP;                //Pointer to temporary storage 
    B1 = SP;       
    L1 = 0;        
     
    SP += -20;              //To store the altered shape information 
    P5 = SP;                //Pointer to the base of column length 
     
    M1 = 16; 
    M3 = -12; 
//Column alignment and column DCT are done together to reduce the intermediate
//storage space from 64*2 to 8*2 bytes on the stack.
//The column SADCT output is written to the output array through I0.
     
    P3 = 32; 
    R7 = 0;                 //Set the outer loop counter for column operation 
     
    LSETUP($0LP_ST,$0LP_ST) LC0 = P3; 
                            //Clear the output array (32 words = 8x8 shorts)
$0LP_ST: 
        [I0++] = R7; 
     
    R5 = 2; 
    I0 = B3;           
COL_ALIGN_ST: 
    P3 = 8; 
    I1 = B1; 
    R6 = B1; 
    R2 = B1; 
     
    LSETUP($1LP_ST,$1LP_END) LC0 = P3; 
    P3 = 16; 
    R1 = W[P0++P3] (Z);     //Read the pixel value 
$1LP_ST: 
        R4 = R2+R5 (S) ||R0 = B[P1] (Z); 
                            //Read the first byte of shape  
        CC = R0 == 0; 
        R1 = W[P0++P3] (Z) || W[I1] = R1.L;      
        IF !CC R2 = R4; 
        I1 = R2; 
$1LP_END: 
        P1 += 8; 
     
    R0 = R2-R6(S); 
    R6 = R0>>1 || W[P5] = R0.L; 
                            //Save 2*length  
     
    P5 += 2; 
     
    P3 = R7;                //Point to the next column 
    P1 = B0;                //P1 is restored to start of Shape buffer 
    P4 += 2;                //point to the next column 
    P0 = P4;                //P0 is restored to the data buffer 
    P3 += 1; 
    P1 = P1+P3; 
     
    CC = R6 == 0; 
    IF CC JUMP COL_ALIGN_END; 
                            //If zero length, no SADCT for that column.  
     
     
    M0 = R0;                //2 * Length (L) of non-zero elements 
    L1 = R0;                //Set I1 as circular buffer 
    P2 = R6; 
    R0 = B3;                //Base of the output array
    R4 = 1; 
    R2 = R2-R2 (S) || I3 += M0; 
                            //Point to the right offset  
    R1 = R6+R4 (S) || R2.L = W[I3] || I3 -= M0; 
                            //Length+1, Fetch the offset. Restore I3  
    M2 = R2; 
    P3 = R1; 
    R3 = R7<<1; 
    I1 = B1;                //Point to the start of the temporary storage 
    R3 = R0+R3(S) || I2 += M2; 
    I0 = R3; 
     
     
    A1 = A0 = 0 || R0.L = W[I1++] || R1 = [I2++]; 
     
    LSETUP($2LP_ST,$2LP_END) LC1 = P3>>1; 
                            //Set Loop for (L+1)>>1  
$2LP_ST: 
        LSETUP($3LP_ST,$3LP_ST) LC0 = P2; 
                            //Set Loop for L  
     
$3LP_ST:    R2.H = (A1 += R0.L*R1.H),R2.L = (A0 += R0.L*R1.L) || R0.L = W[I1++] 
            || R1 = [I2++]; 
                            //Fetch a data and 2 coeff.  
        W[I0] = R2.L || I0 += M1; 
$2LP_END: 
        A1 = A0 = 0 || W[I0] = R2.H || I0 += M1; 
                            //Output is stored  
    I2 = B2;                //Restore pointer to the coeff. buffer 
    L1 = 0;                 //Clear the circular buffering of I1 
     
COL_ALIGN_END: 
    R7 += 1;                //Increment column counter
    CC = R7 <=  7 (IU); 
    IF CC JUMP COL_ALIGN_ST (BP); 
     
//End of column alignment and column DCT 
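/*
Reference sketch (not part of the original module): one way to model the
multiply-accumulate loops of the column DCT above in C. It assumes the default
signed-fraction (1.15) mode, i.e. each product is doubled to 1.31, summed in a
wide accumulator, then rounded and saturated to 16 bits on extraction; the
exact rounding behaviour on the processor depends on its rounding mode
setting. sat16(), round_q31_to_q15() and dct_q15() are hypothetical names, and
coef is an assumed row-major len x len table, not the paired-coefficient layout
addressed through __Coeff_offset.

```c
#include <stdint.h>

// Saturate a wide intermediate value to the signed 16-bit range.
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

// Add half an LSB and keep the high half (floor division by 65536,
// written without relying on right shifts of negative values).
static int64_t round_q31_to_q15(int64_t acc)
{
    int64_t r = acc + 0x8000;
    return (r >= 0) ? r / 65536 : -((-r + 65535) / 65536);
}

// out[k] = sum over j of x[j] * coef[k * len + j], all values in 1.15 format.
static void dct_q15(const int16_t *x, int len, const int16_t *coef,
                    int16_t *out)
{
    for (int k = 0; k < len; k++) {
        int64_t acc = 0;                 // stands in for the 40-bit accumulator
        for (int j = 0; j < len; j++)
            acc += ((int64_t)x[j] * coef[k * len + j]) * 2;  // 1.15 x 1.15 -> 1.31
        out[k] = sat16(round_q31_to_q15(acc));
    }
}
```

With len = 2 and coefficients of about 0.7071 (23170 in 1.15 format), the
inputs 100 and 200 give 212 and -71 under these assumptions.
*/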
 
//Row Alignment and Row DCT. 
     
    I1 = B1;                //Restore the temporary location pointer 
    R7 = 0;                 //Set the outer loop counter for row operation
    I0 = B3; 
    R4 = 2; 
     
ROW_ALIGN_ST: 
    P5 = SP;                //Restore the pointer to the intermediate shape 
                            //array 
     
    R0 = W[P5++] (X) ;      //Read the length 
    P3 = 8; 
    I1 = B1; 
    R2 = B1; 
    R6 = B1; 
     
    LSETUP($5LP_ST,$5LP_END) LC0 = P3; 
    P3 = 4; 
$5LP_ST: 
        R5 = R0-R4 (S) || R1.L = W[I0++] || R0 = W[P5--](X); 
                            //Read the column-transformed value
        R3 = PACK(R2.H,R2.L) || W[P5++P3] = R5.L; 
        CC = R5 < 0; 
        R3 = R3+R4 (S) || W[I1] = R1.L; 
        IF !CC R2 = R3;  
$5LP_END: 
        I1 = R2; 
     
     
    R6 = R2-R6; 
    R6 >>= 1;               //Count of nonzero elements 
     
    R0 = B3; 
    R3 = R7 << 4; 
    R3 = R0+R3; 
    I0 = R3; 
    R7 += 1;                //Increment row counter
    R3 += 16; 
     
    CC = R6 == 0; 
    IF CC JUMP ROW_ALIGN_END; 
                            //If zero length, no SADCT for that row.
     
    I1 = B1;                //Point to the start of the temporary storage 
    R0 = R6<<1 || NOP; 
    M0 = R0;                //2 * Length (L) of non-zero elements 
    L1 = R0;                //Set the temporary buffer as circular 
    R0 = 0; 
    R5 = R4 >> 1 || [I0++] = R0; 
    R2 = R2-R2 (S) || [I0++] = R0 || I3 += M0; 
                            //Point to the right offset  
    R1 = R6+R5 (S) || R2.L = W[I3] ; 
                            //Length+1, Fetch the offset.  
    M2 = R2; 
    [I0++] = R0|| I3 -= M0; //Clear I0. Restore I3 
    P3 = R1; 
    P2 = R6; 
    I2 += M2 || [I0++M3] = R0; 
                            //Point to the right coeff. location  
     
    A1 = A0 = 0 || R0.L = W[I1++] || R1 = [I2++]; 
    LSETUP($6LP_ST,$6LP_END) LC1 = P3>>1; 
                            //Set Loop for (L+1)>>1  
     
$6LP_ST: 
        LSETUP($7LP_ST,$7LP_ST) LC0 = P2; 
                            //Set Loop for L  
$7LP_ST: 
            R2.H = (A1 += R0.L*R1.H),R2.L = (A0 += R0.L*R1.L) || R0.L = W[I1++] 
            || R1 = [I2++]; 
                            //Fetch a data and 2 coeff.  
$6LP_END: 
        A1 = A0 = 0 || [I0++] = R2 ; 
                            //Output is stored  
    I2 = B2;                //Restore pointer to the coeff. buffer 
    L1 = 0; 
ROW_ALIGN_END: 
    I0 = R3;                //Point to the next row 
    CC = R7 <=  7 (IU); 
    IF CC JUMP ROW_ALIGN_ST (BP); 
//End of row alignment and row DCT 
     
    SP += 36; 
    (R7:4,P5:3) = [SP++]; 
    RTS; 
    NOP;                    //to avoid one stall if LINK or UNLINK happens to be 
                            //the next instruction after RTS in the memory. 
__sadct.end: