US 20050027969 A1 Abstract Instructions for performing SIMD instructions, including parallel absolute value and parallel conditional move instructions, as well as a method and circuit for saturating results of operations. The parallel absolute value instruction determines the absolute value of operands based on the sign bit of the operands. When a parallel conditional move instruction is executed, status indicators corresponding to an operand are compared to a condition code in a register to determine whether the condition is true for any of the status indicators; if the condition is true, the corresponding operand is moved to a specified register. A method and circuit for handling saturation of a result of an operation are also provided. When two m-bit operands are added, as in an addition, average, or subtraction operation, if an average instruction is executed, the m most significant bits are output; otherwise, the m least significant bits are output and the result is saturated if there is overflow and saturation is enabled.
Claims(29) 1. In a processor, a method for performing a parallel conditional move operation comprising:
a) comparing at least two sets of status indicators which correspond to at least two operands to a corresponding condition code specified in a register to determine whether the condition indicated by the condition code is true for any of the status indicators; and b) if the condition indicated by the condition code is true for any of the status indicators, moving the corresponding operand to a specified register. 2. The method of 3. The method of 4. A processor-readable storage medium storing an instruction that, when executed by a processor, causes the processor to perform a method for performing a parallel condition move operation, the method comprising:
a) comparing at least two sets of status indicators which correspond to at least two operands to a corresponding condition code specified in a register to determine whether the condition indicated by the condition code is true for any of the status indicators; and b) if the condition indicated by the condition code is true for any of the status indicators, moving the corresponding operand to a specified register. 5. The processor-readable storage medium of 6. The processor-readable storage medium of 7. In a processor, a method for performing a parallel absolute operation comprising:
a) determining the absolute value of at least two operands by employing one of the following approaches based on the sign bit of each of the at least two operands:
i) where the sign bit of an operand is 1 and at least one of the other bits is 1, the absolute value of the operand is the 2's complement of the operand;
ii) where the sign bit of the operand is 1 and each of the other bits is 0, the absolute value of the operand is the 1's complement of the operand; and
iii) where the sign bit of the operand is 0, the absolute value of the operand is the value of the operand, wherein overflow is set only when the operand is 0x80; and
b) placing the absolute value of each of the at least two operands in at least two registers specified to receive the absolute value of the at least two operands. 8. The method of 9. The method of 10. The method of 11. A processor-readable storage medium storing an instruction that, when executed by a processor, causes the processor to perform a method for performing a parallel absolute operation, the method comprising:
a) determining the absolute value of at least two operands by employing one of the following approaches based on the sign bit of each of the at least two operands:
i) where the sign bit of an operand is 1 and at least one of the other bits is 1, the absolute value of the operand is the 2's complement of the operand;
ii) where the sign bit of the operand is 1 and each of the other bits is 0, the absolute value of the operand is the 1's complement of the operand; and
iii) where the sign bit of the operand is 0, the absolute value of the operand is the value of the operand, wherein overflow is set only when the operand is 0x80; and
b) placing the absolute value of each of the at least two operands in at least two registers specified to receive the absolute value of the at least two operands. 12. The processor-readable storage medium of 13. The processor-readable storage medium of 14. The processor-readable storage medium of 15. In a processor, a method for saturating a result of a first operation comprising:
a) adding together two m-bit operands; b) outputting the m most significant bits as the result when an average operation is performed, otherwise outputting the m least significant bits as the result, wherein the result of the m least significant bits is saturated if there is overflow and if saturation is enabled; and c) placing the result in a specified register. 16. The method of 17. The method of 18. The method of 19. The method of 20. The method of 21. The method of 22. The method of 23. A circuit configured to saturate a result of a first operation comprising:
a) an m+1-bit adder for adding together two m-bit operands and outputting an m+1 bit result; b) coupled to the adder, a first multiplexer for outputting one of the following values:
i) the m least significant bits output by the adder; or
ii) the m most significant bits output by the adder, wherein the m most significant bits is output by the first multiplexer when an average instruction is executed;
c) a second multiplexer coupled to a third multiplexer for outputting a selected saturation value; and d) coupled to the adder, the third multiplexer for outputting one of the following values:
i) the output from the first multiplexer; or
ii) the output from the second multiplexer, wherein the output from the second multiplexer is output by the third multiplexer when there is overflow and saturation is enabled.
24. The circuit of 25. The circuit of 26. The circuit of 27. The circuit of 28. The circuit of 29. The circuit of Description This application claims the benefit of provisional United States patent application entitled “Digital Signal Coprocessor,” application No. 60/492,060, filed on Jul. 31, 2003. This invention relates to single instruction multiple data (“SIMD”) operations on packed data in a processor, particularly instructions causing a processor to determine an absolute value or perform a conditional move of operands or where the result may be saturated. Single instruction, multiple data (“SIMD”) style processing has been used to accelerate multimedia processing, including image processing and data compression. Instruction sets for processors often include SIMD instructions where multiple data elements are packed in a single wide register, with the individual data elements operated on in parallel. Using this approach, multiple operations can be performed with one instruction, thus improving performance. One example is INTEL's MMX (multimedia extension) instruction set. It would be advantageous to provide new SIMD instructions and supporting circuitry to further enhance multimedia processing, for instance, image segmentation or clipping. SIMD instructions, including parallel absolute value and parallel conditional move, for parallel processing of packed data are provided as well as a circuit for saturating the result of an operation. Other operations in the instruction set include parallel add, parallel subtract, parallel compare, parallel maximum, and parallel minimum. The operations indicated by the instructions are carried out in the arithmetic logic unit (“ALU”) of a processor. An instruction indicates, among other things, the operation and the data, in the form of a data word containing data elements, on which the operation is performed. 
Each data word contains several elements; the number of elements is determined by the mode of operation indicated by the instruction. For instance, when an 8-bit mode is specified, a 32-bit data word contains 4 8-bit data elements, or operands, while in 16-bit mode, the same 32-bit data word contains 2 16-bit operands. A parallel status flags (“PSF”) register stores the parallel status flags (PSFs) which monitor the status of data elements in a data word. PSFs indicate whether the result of an integer operation is zero, the sign of the result of an integer operation, whether there was a carry out from the ALU operation, and whether there was a 2's complement integer overflow result. The PSF register is updated whenever a SIMD instruction that updates PSF flags is performed. A parallel conditional test (“PTEST”) register contains a code which maps to a test condition. During parallel conditional move (“PCMOV”) instructions, status flags in the PSF register are compared to the test condition in the PTEST register and, if the flags and condition match, the suboperand corresponding to the flags in the PSF register is moved to a specified register. During parallel absolute value (“PABS”) instructions, the processor determines the absolute value of at least two operands and places the absolute value of the operands in specified registers. The absolute value is determined by using one of the following approaches based on the sign bit of each of the operands: 1) where the sign bit of an operand is 1 and at least one of the other bits is 1, the absolute value of the operand is the 2's complement of the operand; 2) where the sign bit of the operand is 1 and each of the other bits is 0, the absolute value of the operand is the 1's complement of the operand; and 3) where the sign bit of the operand is 0, the absolute value of the operand is the value of the operand. A method and circuit for handling saturation of a result of an operation are also provided. 
When two m-bit operands are added, as in an addition, average, or subtraction operation, if an average instruction is executed, the m most significant bits are output; otherwise, the m least significant bits are output and the result is saturated if there is overflow and saturation is enabled. In one embodiment, the DSE is controlled by a processor status word (“PSW”) register. In With respect to A parallel status flags (“PSF”) register is part of the DSE. PSFs are used to monitor the status of data elements in data words. The flags are as follows: Zero (“Z”) indicates if the result of an integer operation is zero; Sign (“S”) indicates the sign of the result of an integer operation; Carry (“CY”) indicates there was a carry out from the ALU operation; and Overflow (“OV”) indicates a 2's complement integer overflow result. The register has the following format:
The PSF register is updated whenever a SIMD instruction that updates PSF flags is performed. In 8-bit mode, computations on byte 0 (the least significant byte) affect PSF0, computations on byte 1 affect PSF1, etc. In 16-bit mode, computations on the lower half-word affect PSF1 while computations on the upper half-word affect PSF3; PSF0 and PSF2 are undefined. Other embodiments of the invention may feature different approaches to handling PSFs.
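The lane layout and flag mapping described above can be sketched in Python as follows. This is an illustrative model only; the helper names `split_lanes` and `psf_index` are assumptions introduced here and do not appear in the original.

```python
def split_lanes(word, half_word_mode):
    """Split a 32-bit data word into lanes, least significant lane first."""
    width = 16 if half_word_mode else 8
    mask = (1 << width) - 1
    return [(word >> shift) & mask for shift in range(0, 32, width)]


def psf_index(lane, half_word_mode):
    """Return the number of the PSF set that a lane updates.

    8-bit mode: byte n updates PSFn.
    16-bit mode: the lower half-word updates PSF1 and the upper
    half-word updates PSF3 (PSF0 and PSF2 are undefined).
    """
    return 2 * lane + 1 if half_word_mode else lane


# Byte lanes of an example word, least significant byte first.
print([hex(v) for v in split_lanes(0x11223344, half_word_mode=False)])
# PSF sets touched in 16-bit mode.
print([psf_index(lane, True) for lane in range(2)])
```

The same word therefore yields four operands in 8-bit mode and two in 16-bit mode, with only the odd-numbered flag sets meaningful in the latter case.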
The DSE also features a parallel condition test (“PTEST”) register. The PTEST register is used when a parallel conditional move (“PCMOV”) instruction is executed. As discussed in greater detail below, a PCMOV operation compares status flags in the PSF register against the test condition specified in the PTEST register; if the flags and the condition match, the suboperand is moved to a specified register. The PTEST register has the following format:
Each 4-bit condition code in the PTEST register maps to a test condition as follows:
Other embodiments of the invention may feature different approaches to handling condition codes and the PTEST register. SIMD instructions may be executed when the DSE is in SIMD mode (in other words, the SIMD bit discussed above is set to “1”). These instructions take 1 cycle to execute. SIMD instructions which may be executed by the processor described above include the following: a parallel absolute value (“PABS”) instruction, which determines the absolute value of an operand and places that value in a specified register; parallel add/subtract (“PADD/PSUB”) instructions that add or subtract operands and place the results in specified registers; a parallel average (“PAVG”) instruction that averages two values and places the result in a specified register; parallel max/min (“PMAX/PMIN”) instructions that compare two values and write the greater or lesser value into a specified register; a parallel integer compare (“PCMP”) instruction that compares two operands and modifies condition code flags in the parallel status flag register; and a parallel conditional move (“PCMOV”) instruction that compares status flags in the PSF register with the condition code in the PTEST register and, if the flags and code match, moves the operand to a specified register. The instructions and their actions may be summarized as follows:
As noted above, when the HSIMD bit in the PSW is set to “1,” 16-bit, or half-word, operations are used; otherwise, 8-bit, or byte, operations are employed. (The remainder of this discussion will address the use of 32-bit data words and 16- or 8-bit operations. This limitation is for explanatory purposes only. Other embodiments may use 64- or 128-bit data words and 32- or 64-bit operations, etc.) When the USIMD bit is set to “1,” PMIN and PMAX use unsigned operands. When the NSAT bit is set to “1,” the result should not be saturated. The following table shows which instructions are affected when certain PSW bits are set:
Sample opcodes for the instruction and updated settings in the PSF register following execution of each instruction are shown below:
The OV flag is set to zero after execution of a PAVG instruction because there is never overflow when this instruction is executed. The S flag is cleared to 0 after execution of a PABS instruction. Execution of a PCMOV instruction does not affect PSFs. Other embodiments may, of course, use different opcodes to identify each instruction. The PAVG instruction may be executed in 8- or 16-bit mode and may operate on signed or unsigned data. The USIMD PSW bit determines whether sign-extension is done before adding the operands. If the USIMD bit is set, the operands are zero-padded by one bit. If USIMD is not set, the operands are sign-extended by one bit. In 16-bit mode, the PAVG operation is as follows:
PSFs following execution of a PAVG instruction in 16-bit mode are as follows:
In 8-bit mode, the PAVG operation is as follows:
- rb[31:24]=({(USIMD?0:rb[31]), rb[31:24]}+{(USIMD?0:ra[31]), ra[31:24]})[8:1]
- rb[23:16]=({(USIMD?0:rb[23]), rb[23:16]}+{(USIMD?0:ra[23]), ra[23:16]})[8:1]
- rb[15:8]=({(USIMD?0:rb[15]), rb[15:8]}+{(USIMD?0:ra[15]), ra[15:8]})[8:1]
- rb[7:0]=({(USIMD?0:rb[7]), rb[7:0]}+{(USIMD?0:ra[7]), ra[7:0]})[8:1]
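The extend-add-select pattern of the PAVG operation can be modeled in Python as a minimal sketch. The function name `pavg` and its keyword parameters are assumptions made for illustration; the 16-bit case is assumed to follow the same pattern with 17-bit sums.

```python
def pavg(rb, ra, half_word_mode=False, usimd=False):
    """Model PAVG: extend each lane by one bit, add, keep bits [m:1]."""
    width = 16 if half_word_mode else 8
    mask = (1 << width) - 1
    ext_mask = (1 << (width + 1)) - 1
    result = 0
    for shift in range(0, 32, width):
        a = (ra >> shift) & mask
        b = (rb >> shift) & mask
        if not usimd:
            # Sign-extend by one bit: copy the top bit into bit position m.
            a |= (a >> (width - 1)) << width
            b |= (b >> (width - 1)) << width
        total = (a + b) & ext_mask          # (m+1)-bit sum
        result |= (total >> 1) << shift     # m most significant bits
    return result


# Signed average of -1 and -2 in the low byte rounds toward negative infinity.
print(hex(pavg(0x000000FF, 0x000000FE)))
```

Dropping bit 0 of the extended sum is what makes the result round down rather than toward zero for negative sums.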
Following execution of the PAVG operation in 8-bit mode, PSFs are as follows:
“rb” in the tables above refers to the final result of the instruction, not the input operand. The PAVG instruction always rounds down, not towards 0; negative numbers are rounded down towards negative infinity. Execution of the PAVG instruction provides the 8/16 most significant bits (“msbs”) of the result of a 9/17 bits PADD or PSUB operation. Each 8- or 16-bit operation updates the corresponding status flags in the PSF register. PADD instructions may be executed in either 16- or 8-bit mode on signed and unsigned numbers and will provide saturation if the NSAT bit is clear. (When the USIMD bit is “1,” the instructions treat the operands as unsigned operands. When the USIMD bit is “0,” the instructions treat the operands as signed operands.) In 16-bit mode, a PADD instruction operates as follows:
PSFs following execution of a PADD instruction in 16-bit mode are as follows:
The PADD instruction operates in 8-bit mode as follows:
PSFs following an 8-bit operation are as follows:
The “rb” in the above tables refers to the final result of the instruction, not the input operand. Each 8- or 16-bit operation updates the corresponding status flags in the PSF register. PSUB instructions may also be executed in 8-bit or 16-bit mode on signed and unsigned numbers and will provide saturation if the NSAT bit is clear. In 16-bit mode, the PSUB instruction operates as follows:
PSFs after execution of a PSUB instruction in 16-bit mode are as follows:
The PSUB instruction operates in 8-bit mode as follows:
Following execution of the instruction in 8-bit operation, PSFs are as follows:
The “rb” in the above tables refers to the final result of the instruction, not the input operand. Each 8- or 16-bit operation updates the corresponding status flags in the PSF register. Results may be saturated in both 8- and 16-bit mode PADD and PSUB operations (in both signed and unsigned mode). No saturation occurs for PAVG operations, since the average can never overflow, and consequently OV is always 0.
In 16-bit unsigned mode, saturation for the PADD instruction occurs as follows:
- If ((C==1) && (NSAT==0)) rb[31:16]=0xFFFF
- If ((C==1) && (NSAT==0)) rb[15:0]=0xFFFF
(Here, C represents the current carry value that will be written into the PSF register at the end of the instruction.)
In 8-bit unsigned mode, saturation for the PADD instruction occurs as follows:
- If ((C==1) && (NSAT==0)) rb[31:24]=0xFF
- If ((C==1) && (NSAT==0)) rb[23:16]=0xFF
- If ((C==1) && (NSAT==0)) rb[15:8]=0xFF
- If ((C==1) && (NSAT==0)) rb[7:0]=0xFF
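The unsigned PADD saturation rules can be sketched in Python as below. The function name `padd_unsigned` and the returned list of per-lane carries are illustrative assumptions, not part of the original.

```python
def padd_unsigned(rb, ra, half_word_mode=False, nsat=False):
    """Model unsigned PADD: clamp a lane to all ones on carry out."""
    width = 16 if half_word_mode else 8
    mask = (1 << width) - 1
    result, carries = 0, []
    for shift in range(0, 32, width):
        total = ((rb >> shift) & mask) + ((ra >> shift) & mask)
        carry = total >> width              # C for this lane
        lane = total & mask
        if carry == 1 and not nsat:
            lane = mask                     # 0xFF or 0xFFFF
        result |= lane << shift
        carries.append(carry)               # least significant lane first
    return result, carries


value, carries = padd_unsigned(0xF0102030, 0x20102030)
print(hex(value), carries)
```

With NSAT set, the same call returns the wrapped lane value instead of the clamped one.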
In 16-bit unsigned mode, saturation for the PSUB instruction occurs as follows:
- If ((C==0) && (NSAT==0)) rb[31:16]=0x0000
- If ((C==0) && (NSAT==0)) rb[15:0]=0x0000
In 8-bit unsigned mode, saturation for the PSUB instruction occurs as follows:
- If ((C==0) && (NSAT==0)) rb[31:24]=0x00
- If ((C==0) && (NSAT==0)) rb[23:16]=0x00
- If ((C==0) && (NSAT==0)) rb[15:8]=0x00
- If ((C==0) && (NSAT==0)) rb[7:0]=0x00
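A hedged Python sketch of the unsigned PSUB saturation rules follows. It assumes each lane computes rb minus ra as rb + ~ra + 1, so that a carry out of 0 signals a borrow; the operand order and the name `psub_unsigned` are assumptions, since the operation table is not reproduced in this text.

```python
def psub_unsigned(rb, ra, half_word_mode=False, nsat=False):
    """Model unsigned PSUB: clamp a lane to zero when a borrow occurs."""
    width = 16 if half_word_mode else 8
    mask = (1 << width) - 1
    result, carries = 0, []
    for shift in range(0, 32, width):
        b = (rb >> shift) & mask
        a = (ra >> shift) & mask
        total = b + (~a & mask) + 1         # b - a via 2's complement add
        carry = total >> width              # 1 = no borrow, 0 = borrow
        lane = total & mask
        if carry == 0 and not nsat:
            lane = 0
        result |= lane << shift
        carries.append(carry)               # least significant lane first
    return result, carries


value, carries = psub_unsigned(0x10FF0005, 0x20010005)
print(hex(value), carries)
```

Under this convention the C==0 test in the rules above is exactly the "result would be negative" case for unsigned operands.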
In 16-bit signed mode, saturation occurs as follows:
- If ((OV==1) && (NSAT==0) && (sum[31]==1)) rb[31:16]=0x7FFF
- If ((OV==1) && (NSAT==0) && (sum[31]==0)) rb[31:16]=0x8000
- If ((OV==1) && (NSAT==0) && (sum[15]==1)) rb[15:0]=0x7FFF
- If ((OV==1) && (NSAT==0) && (sum[15]==0)) rb[15:0]=0x8000
In 8-bit signed mode, saturation occurs as follows:
- If ((OV==1) && (NSAT==0) && (sum[31]==1)) rb[31:24]=0x7F
- If ((OV==1) && (NSAT==0) && (sum[31]==0)) rb[31:24]=0x80
- If ((OV==1) && (NSAT==0) && (sum[23]==1)) rb[23:16]=0x7F
- If ((OV==1) && (NSAT==0) && (sum[23]==0)) rb[23:16]=0x80
- If ((OV==1) && (NSAT==0) && (sum[15]==1)) rb[15:8]=0x7F
- If ((OV==1) && (NSAT==0) && (sum[15]==0)) rb[15:8]=0x80
- If ((OV==1) && (NSAT==0) && (sum[7]==1)) rb[7:0]=0x7F
- If ((OV==1) && (NSAT==0) && (sum[7]==0)) rb[7:0]=0x80
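The signed saturation rules can be illustrated for the PADD case with the Python sketch below. The name `padd_signed` is an assumption, and OV is computed here with the usual 2's complement test (both operands share a sign that differs from the sign of the sum), which is assumed to match the OV flag described in the text.

```python
def padd_signed(rb, ra, half_word_mode=False, nsat=False):
    """Model signed PADD: on overflow, clamp based on the sum's sign bit."""
    width = 16 if half_word_mode else 8
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    result, overflows = 0, []
    for shift in range(0, 32, width):
        a = (ra >> shift) & mask
        b = (rb >> shift) & mask
        total = (a + b) & mask
        # Overflow: operands have equal signs and the sum's sign differs.
        ov = bool(~(a ^ b) & (a ^ total) & sign)
        if ov and not nsat:
            # Sum sign bit 1 means the true result was too large: use the
            # maximum (0x7F / 0x7FFF); otherwise use the minimum (0x80 / 0x8000).
            total = (mask >> 1) if total & sign else sign
        result |= total << shift
        overflows.append(ov)                # least significant lane first
    return result, overflows


value, overflows = padd_signed(0x7F8001FF, 0x01FF0101)
print(hex(value), overflows)
```

The sign bit of the wrapped sum is always the opposite of the true result's sign when overflow occurs, which is why the rules select 0x7F when the sum bit is 1 and 0x80 when it is 0.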
If OV is 1, sum[ In Bits Cout[ The output The other input A second AND gate
PMIN and PMAX can operate in 8-bit or 16-bit mode with signed or unsigned data depending on the USIMD bit. In 16-bit mode, PMIN and PMAX instructions are executed as follows:
PMIN:
- rb[31:16]=MIN(rb[31:16],ra[31:16])
- rb[15:0]=MIN(rb[15:0],ra[15:0])
PMAX:
- rb[31:16]=MAX(rb[31:16],ra[31:16])
- rb[15:0]=MAX(rb[15:0],ra[15:0])
PSFs are updated as follows following execution of a PMIN or PMAX instruction in 16-bit mode:
In 8-bit mode, PMIN and PMAX instructions are executed as follows:
PMIN:
- rb[31:24]=MIN(rb[31:24],ra[31:24])
- rb[23:16]=MIN(rb[23:16],ra[23:16])
- rb[15:8]=MIN(rb[15:8],ra[15:8])
- rb[7:0]=MIN(rb[7:0],ra[7:0])
PMAX:
- rb[31:24]=MAX(rb[31:24],ra[31:24])
- rb[23:16]=MAX(rb[23:16],ra[23:16])
- rb[15:8]=MAX(rb[15:8],ra[15:8])
- rb[7:0]=MAX(rb[7:0],ra[7:0])
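A small Python model of the lane-wise PMIN and PMAX operations is given below as a sketch. The names `to_signed` and `pminmax` and the `want_max` switch are assumptions introduced for illustration.

```python
def to_signed(value, width):
    """Interpret a width-bit pattern as a 2's complement integer."""
    return value - (1 << width) if value >> (width - 1) else value


def pminmax(rb, ra, want_max, half_word_mode=False, usimd=False):
    """Model PMIN (want_max=False) or PMAX (want_max=True) per lane."""
    width = 16 if half_word_mode else 8
    mask = (1 << width) - 1
    pick = max if want_max else min
    result = 0
    for shift in range(0, 32, width):
        b = (rb >> shift) & mask
        a = (ra >> shift) & mask
        if usimd:
            lane = pick(b, a)               # compare raw bit patterns
        else:
            lane = pick(b, a, key=lambda v: to_signed(v, width))
        result |= lane << shift
    return result


# 0x80 is the smallest signed byte but larger than 0x7F when unsigned.
print(hex(pminmax(0x80010203, 0x7F030201, want_max=False)))
print(hex(pminmax(0x80010203, 0x7F030201, want_max=False, usimd=True)))
```

The top lane shows why the USIMD bit matters: the same bit patterns order differently under signed and unsigned interpretation.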
Following execution of the PMIN or PMAX instruction in 8-bit mode, the PSFs are as follows:
In the above tables, “rb” refers to the final result of the instruction, not the input operand. Each 8- or 16-bit operation updates the corresponding status flags in the PSF register. The PABS instruction may be executed in either 8- or 16-bit mode depending on the HSIMD PSW bit. The NSAT bit in the PSW does not affect the behavior of the PABS instruction. In 16-bit mode, the PABS instruction is executed as follows:
- rb[31:16]=ABS(ra[31:16])
- rb[15:0]=ABS(ra[15:0])
After execution of the PABS instruction in 16-bit mode, the PSFs are updated as follows:
In 8-bit mode, the PABS instruction is executed as follows:
- rb[31:24]=ABS(ra[31:24])
- rb[23:16]=ABS(ra[23:16])
- rb[15:8]=ABS(ra[15:8])
- rb[7:0]=ABS(ra[7:0])
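The three-way selection used to compute ABS in each lane (2's complement, 1's complement for the most negative value, or pass-through) can be sketched in Python as follows. The name `pabs` is an assumption, and flagging overflow for 0x8000 in 16-bit mode is assumed by analogy with the 0x80 case stated for 8-bit operands.

```python
def pabs(ra, half_word_mode=False):
    """Model PABS using the sign-bit based selection described in the text."""
    width = 16 if half_word_mode else 8
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    result, overflows = 0, []
    for shift in range(0, 32, width):
        a = (ra >> shift) & mask
        ov = False
        if not a & sign:
            lane = a                        # non-negative: pass through
        elif a & (sign - 1):
            lane = (~a + 1) & mask          # negative: 2's complement
        else:
            lane = ~a & mask                # most negative: 1's complement
            ov = True                       # true result is not representable
        result |= lane << shift
        overflows.append(ov)                # least significant lane first
    return result, overflows


value, overflows = pabs(0x80FF7F00)
print(hex(value), overflows)
```

Using the 1's complement for the most negative input yields the largest positive value (0x7F or 0x7FFF) rather than wrapping back to the same negative pattern.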
After execution of the PABS instruction in 8-bit mode, the PSFs are updated as follows:
In the above tables, “rb” refers to the final result of the instruction, not the input operand. Each 8- or 16-bit operation updates the corresponding status flags in the PSF register. The flags tables assume the PABS operation results in 0-ra in the adder. Therefore, overflow will only be set in one case, when the input is 0x80. This is the only instance where the true result of the PABS operation cannot be represented in the required number of bits. The PABS function behaves as follows as shown in
The PCMP instruction may be executed in 8- or 16-bit mode on signed or unsigned operands. In executing this instruction, a subtraction is performed without updating the destination register. Instead, the condition code flags in the PSF register are modified. In 16-bit mode, the PCMP operation is as follows:
- PSF3=CMP(rb[31:16],ra[31:16])
- PSF1=CMP(rb[15:0],ra[15:0])
Following execution of a PCMP instruction in 16-bit mode, PSFs are updated as follows:
In 8-bit mode, the PCMP operation is as follows:
- PSF3=CMP(rb[31:24],ra[31:24])
- PSF2=CMP(rb[23:16],ra[23:16])
- PSF1=CMP(rb[15:8],ra[15:8])
- PSF0=CMP(rb[7:0],ra[7:0])
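The flag-only subtraction performed by PCMP might be modeled in Python as shown below. The flag tables are not reproduced in this text, so the definitions used here are assumptions based on the general PSF descriptions: CMP(rb, ra) is taken to compute rb minus ra as rb + ~ra + 1, with Z, S, CY and OV derived from that sum. The name `pcmp` and the dictionary layout are illustrative.

```python
def pcmp(rb, ra, half_word_mode=False):
    """Model PCMP: return {PSF number: flags} without changing rb."""
    width = 16 if half_word_mode else 8
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    psf = {}
    for lane, shift in enumerate(range(0, 32, width)):
        b = (rb >> shift) & mask
        a = (ra >> shift) & mask
        total = b + (~a & mask) + 1         # b - a via 2's complement add
        diff = total & mask
        flags = {
            "Z": diff == 0,
            "S": bool(diff & sign),
            "CY": bool(total >> width),
            # Overflow: operand signs differ and the result's sign differs from b.
            "OV": bool((a ^ b) & (b ^ diff) & sign),
        }
        # 16-bit lanes update PSF1 and PSF3; byte lane n updates PSFn.
        psf[2 * lane + 1 if half_word_mode else lane] = flags
    return psf


for index, flags in sorted(pcmp(0x0580057F, 0x05017FFF).items()):
    print("PSF%d" % index, flags)
```

Because the destination register is untouched, the only observable effect is the set of flags, which a later PCMOV can test.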
The PSF register is updated as follows:
Each 8- or 16-bit operation updates the corresponding status flags in the PSF register. PCMOV instructions may be executed in either 16- or 8-bit mode. The instructions test the condition code in the PTEST register (discussed above) against the 4 sets of flags in the PSF register. If the specified condition is true, the corresponding 8 or 16 bits are moved. The PCMOV instruction operates in 16-bit mode as follows:
- If (PSF3==cnd(PTEST[3:0])) rb[31:16]=ra[31:16]
- If (PSF1==cnd(PTEST[3:0])) rb[15:0]=ra[15:0]
The PCMOV instruction operates in 8-bit mode as follows:
- If (PSF3==cnd(PTEST[3:0])) rb[31:24]=ra[31:24]
- If (PSF2==cnd(PTEST[3:0])) rb[23:16]=ra[23:16]
- If (PSF1==cnd(PTEST[3:0])) rb[15:8]=ra[15:8]
- If (PSF0==cnd(PTEST[3:0])) rb[7:0]=ra[7:0]
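The per-lane conditional move can be sketched in Python as follows. Since the PTEST condition-code table is not reproduced in this text, the condition is passed in as a predicate over one set of flags; that interface, the name `pcmov`, and the dictionary of flags keyed by PSF number are assumptions made for illustration.

```python
def pcmov(rb, ra, psf, condition, half_word_mode=False):
    """Model PCMOV: copy lanes of ra into rb where the condition holds.

    psf maps a PSF number to its flags; condition is a predicate that
    stands in for the test selected by the PTEST condition code.
    """
    width = 16 if half_word_mode else 8
    mask = (1 << width) - 1
    result = rb
    for lane, shift in enumerate(range(0, 32, width)):
        index = 2 * lane + 1 if half_word_mode else lane
        if condition(psf[index]):
            lane_mask = mask << shift
            result = (result & ~lane_mask) | (ra & lane_mask)
    return result


def is_zero(flags):
    return flags["Z"]


flags = {0: {"Z": True}, 1: {"Z": False}, 2: {"Z": True}, 3: {"Z": False}}
print(hex(pcmov(0x11223344, 0xAABBCCDD, flags, is_zero)))
```

Lanes whose flags fail the test keep the previous contents of rb, which is what lets several PCMOV instructions build up a result piecewise.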
To illustrate execution of a PCMOV instruction, in If 8-bit mode is specified (block 126), the PSF
The PCMOV instruction allows decisions on multiple data streams to be made in one cycle, for example, clipping in image processing. Suppose 8×8 mode is specified and the following transformation of each of the 4 8-bit results in register (“R”) 0 is desired:
- If x<−30 then 0→x
- If −30<=x<=+30 then c→x, where c is some constant
- If 30<x then 255→x
The above may be achieved in 4 cycles, with the result in R1, as shown below. Suppose
- PTEST=JG
- R1=c, c, c, c
- R2=0, 0, 0, 0
- R3=−30, −30, −30, −30
- R4=30, 30, 30, 30
- R5=255, 255, 255, 255
The following instructions are issued:
- PCMP R0, R3
- PCMOV R2, R1
- PCMP R4, R0
- PCMOV R5, R1
Note that PCMP x,y does y-x and JG jumps if y>x.
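The four-instruction clipping sequence can be simulated with the Python sketch below. Several points are assumptions rather than statements from the original: the first conditional move is taken to use the all-zero register R2 as its source, JG is modeled as the signed "greater than" test (Z clear and S equal to OV), and the helper names `pack`, `pcmp`, `jg`, `pcmov` and `clip` are illustrative.

```python
MASK = 0xFF


def pack(values):
    """Pack four byte values, lane 0 first, into a 32-bit word."""
    return sum((v & MASK) << (8 * i) for i, v in enumerate(values))


def pcmp(ra, rb):
    """PCMP ra, rb: per-byte rb - ra, returning (Z, S, OV) for each lane."""
    flags = []
    for shift in range(0, 32, 8):
        a, b = (ra >> shift) & MASK, (rb >> shift) & MASK
        diff = (b - a) & MASK
        ov = bool((a ^ b) & (b ^ diff) & 0x80)
        flags.append((diff == 0, bool(diff & 0x80), ov))
    return flags


def jg(flag):
    """Assumed JG test: signed greater than (Z clear and S equal to OV)."""
    z, s, ov = flag
    return not z and s == ov


def pcmov(ra, rb, flags, cond):
    """PCMOV ra, rb: copy byte lanes of ra into rb where cond holds."""
    for lane, flag in enumerate(flags):
        if cond(flag):
            lane_mask = MASK << (8 * lane)
            rb = (rb & ~lane_mask) | (ra & lane_mask)
    return rb


def clip(r0, c):
    """Run the four-instruction sequence and return the final R1."""
    r1, r2 = pack([c] * 4), pack([0] * 4)
    r3, r4, r5 = pack([-30] * 4), pack([30] * 4), pack([255] * 4)
    psf = pcmp(r0, r3)             # PCMP R0, R3: true where -30 > x
    r1 = pcmov(r2, r1, psf, jg)    # PCMOV R2, R1: write 0 into those lanes
    psf = pcmp(r4, r0)             # PCMP R4, R0: true where x > 30
    r1 = pcmov(r5, r1, psf, jg)    # PCMOV R5, R1: write 255 into those lanes
    return r1


print(hex(clip(pack([-100, -30, 0, 31]), c=7)))
```

R1 starts out holding c in every lane, so lanes that satisfy neither comparison keep c, and the two compare/move pairs overwrite only the out-of-range lanes.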