US 20060149938 A1
According to some embodiments, a value is retrieved from a location in an index register. A region in a register file may then be determined based at least in part on the value. Information may then be stored into the determined region of the register file.
1. A method, comprising:
retrieving a value from a location in an index register;
determining a region in a register file based at least in part on the value; and
storing information into the determined region of the register file.
2. The method of
3. The method of
describing, for an operand, the location in the index register.
4. The method of
5. The method of
retrieving from the index register a value for a source operand;
determining a source operand region in the register file; and
reading information from the source operand region in the register file.
6. The method of
7. The method of
8. The method of
9. The method of
describing, for an operand, the region in the register file.
10. The method of
11. The method of
12. The method of
13. The method of
14. The method of
determining if an instruction is associated with at least one of: (i) an immediate addressing mode, (ii) a register addressing mode, or (iii) a register-indirect-register addressing mode.
15. The method of
16. The method of
17. An apparatus, comprising:
a single instruction, multiple data execution engine;
a register file on the same die as the execution engine; and
an index file on the same die as the execution engine and the register file, the index file to store a value describing a region in the register file where information will be stored.
18. The apparatus of
an instruction mapping engine to (i) determine, for an operand of a machine code instruction, a portion of the register file based at least in part on the origin, wherein the determined portion is to store information for multiple execution channels of the execution engine, and (ii) arrange for the information to be stored into the register file in accordance with the determined region.
19. The apparatus of
20. A system, comprising:
an n-channel single instruction, multiple-data execution engine, n being an integer greater than 1;
a register file;
an index file to store an origin of a region in the register file where information will be stored by the execution engine; and
a graphics data unit.
21. The system of
an instruction mapping engine to (i) scatter data to non-contiguous areas of the register file, and (ii) gather data from non-contiguous areas of the register file.
22. The system of
To improve the performance of a processing system, a Single Instruction, Multiple Data (SIMD) instruction is simultaneously executed for multiple operands of data in a single instruction period. For example, an eight-channel SIMD execution engine might simultaneously execute an instruction for eight 32-bit operands of data, each operand being mapped to a unique compute channel of the SIMD execution engine. Moreover, one or more registers in a register file may be used by SIMD instructions, and each register may have fixed locations associated with execution channels (e.g., a number of eight-word registers could be provided for an eight-channel SIMD execution engine, each word in a register being assigned to a different execution channel). An ability to efficiently and flexibly access register information in different ways may further improve the performance of a SIMD execution engine.
Some embodiments described herein are associated with a “processing system.” As used herein, the phrase “processing system” may refer to any device that processes data. A processing system may, for example, be associated with a graphics engine that processes graphics data and/or other types of media information. In some cases, the performance of a processing system may be improved with the use of a SIMD execution engine. For example, a SIMD execution engine might simultaneously execute a single floating point SIMD instruction for multiple channels of data (e.g., to accelerate the transformation and/or rendering of three-dimensional geometric shapes). Other examples of processing systems include a Central Processing Unit (CPU) and a Digital Signal Processor (DSP). Note that any of the embodiments described herein may be associated with other types of processing systems, including a Multiple Instruction, Multiple Data (MIMD) execution engine.
add(8) R1 R3 R4
The “(8)” indicates that the instruction will be executed on operands for all eight execution channels. The “R1” is a destination operand (DEST), and “R3” and “R4” are source operands (SRC0 and SRC1, respectively). Thus, each of the eight single-byte data elements in R4 will be added to corresponding data elements in R3. The eight results are then stored in R1. In particular, the first byte of R4 will be added to the first byte of R3 and that result will be stored in the first byte of R1. Similarly, the second byte of R4 will be added to the second byte of R3 and that result will be stored in the second byte of R1, etc.
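The per-channel behavior of this instruction might be sketched in Python as follows. This is only an illustration, not the described hardware: the sample register contents, the helper name `add8`, and the wrap-around of byte results at 256 are assumptions that the description does not specify.

```python
# Hypothetical model: each register holds eight single-byte data elements,
# one per execution channel. The contents below are made-up sample data.
registers = {
    "R1": [0] * 8,
    "R3": [1, 2, 3, 4, 5, 6, 7, 8],
    "R4": [10, 20, 30, 40, 50, 60, 70, 80],
}

def add8(regs, dest, src0, src1):
    """Add src1 to src0 channel by channel and store the results in dest."""
    # Assumption: byte results wrap modulo 256; the description does not say.
    regs[dest] = [(a + b) & 0xFF for a, b in zip(regs[src0], regs[src1])]

# Models add(8) R1 R3 R4: DEST = R1, SRC0 = R3, SRC1 = R4.
add8(registers, "R1", "R3", "R4")
print(registers["R1"])  # [11, 22, 33, 44, 55, 66, 77, 88]
```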
In some applications, it may be helpful to access information in a register file in various ways. For example, in a graphics application it might sometimes be helpful to treat portions of the register file as a vector, a scalar, and/or an array of values. Such an approach may help reduce the amount of instruction and/or data moving, packing, unpacking, and/or shuffling and improve the performance of the system. Moreover, when a register file has a relatively large number of registers (e.g., one hundred registers), it might be helpful to let an application kernel maintain kernel data in a pre-determined register file location (e.g., in a manner similar to a software managed data cache).
At 402, a value is retrieved from a location in an index register. The value might indicate, for example, which of a number of different registers in a register file should be used as a source operand or a destination operand. Note that the appropriate location in the index register might be encoded in a machine code instruction, and that location's current value might have been determined and stored by an application at run-time.
At 404, a region in a register file is determined based at least in part on the value. For example, the region might simply be a particular register in the register file that will be used as an operand. Information may then be stored into (and/or retrieved from) the determined region of the register file at 406.
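The three steps at 402, 404, and 406 might be sketched as follows. The register file size, the index register contents, and the helper name `store_indirect` are hypothetical and are chosen only for illustration; here the region is simply a whole register.

```python
# Hypothetical sizes: five registers of eight byte-size elements each.
register_file = [[0] * 8 for _ in range(5)]

# Index register locations hold values written by an application at run time.
# The values below are sample data, not taken from the description.
index_register = {"L0": 4, "L1": 2}

def store_indirect(location, data):
    """Store data into the register selected by an index register location."""
    value = index_register[location]   # 402: retrieve the value
    region = register_file[value]      # 404: determine the region (one register)
    region[:] = data                   # 406: store information into the region
    return value

selected = store_indirect("L1", [3] * 8)
print(selected, register_file[selected])
```

Changing the value stored at `L1` redirects the same instruction to a different register without changing the instruction itself.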
add(8) [L1] R3 R4
The “(8)” indicates that the instruction will be executed on operands for all eight execution channels. The R3 and the R4 are source operands (SRC0 and SRC1, respectively) and indicate that information from those registers should be added together.
The brackets in “[L1]” indicate that this operand is being defined at least in part based on a value in an index register 530 (e.g., in accordance with a register-indirect-register addressing mode). In particular, a value at location “L1” of the index register 530 will indicate which register of the register file 520 should be the destination operand (DEST). In the example illustrated in
The index register 530 may be, for example, a dedicated storage area that is used only for indexing purposes. According to some embodiments, the index register 530 may also be used for other purposes. For example, a portion of the register file 520 might be designated as the index register 530 (e.g., the designation might be made by an instruction word or an architectural state register).
According to some embodiments, an index register might store multiple values associated with a single instruction. For example,
add(8) [L1] R3 [L0]
As before, R3 is source operand SRC0 and “L1” indicates that the value at location L1 in the index register 630 will define a destination operand DEST. In this case, source operand SRC1 is defined by a value at another location L0 in the index register. As illustrated in
Note that any combination of immediate, register, and/or register-indirect-register addressing may be applied to operands. For example, the execution engine 610 might execute:
Also note that the index register 630 does not need to be the same size as the registers in the register file 620. Similarly, the locations within the index register 630 may be of various sizes. Moreover, a value within the index register 630 might point to a register, a byte, a bit, or another type of location within the register file 620.
For example, the value stored in the index register 630 might simply be an integer from 0 through 4 indicating which of the five registers in the register file 620 should be used. According to some embodiments, the value in the index register 630 may define an origin of a region in the register file 620. For example, the value might represent a register identifier and a “sub-register identifier” indicating a location of a first data element within a register.
add(8) [L1] R3 R4
As before, R3 and R4 are source operands SRC0 and SRC1 and a value stored at location L1 of the index register 730 will be used to determine DEST. In this case, the value stored in the index register 730 represents an “origin” of RegNum.SubRegNum. The sub-register identifier might indicate, for example, an offset from the start of a register (e.g., expressed as a physical number of bits or bytes, or as a number of data elements). For example, the DEST region in
Note that the index register 730 may contain a complete region description, or part of a region description. That is, an index register 730 may contain a register description, in whole or in part, of the location of the operand in the register file 720. For example, the index register 730 may contain the exact integer location of a SIMD 8-wide register location of the operand. In another example, the index register 730 may contain a complete description of a region-based register which algorithmically maps 8 locations in the register file to the 8 channel positions of the operand. As a further example, an index may contain only a partial description of the mapping, which when combined with the remaining description either from the instruction word or from some other base description in a storage element, defines a complete mapping of registers to the 8-wide operand.
An origin might be defined in other ways. For example, the register file 720 may be considered as a contiguous 40-byte memory area. Moreover, a single 6-bit address origin could be stored in the index register 730 to represent any byte within the register file 720. Note that a single 6-bit address origin is able to point to any byte within a register file of up to 64 bytes. As another example, the register file 720 might be considered as a contiguous 320-bit memory area. In this case, a single 9-bit address origin could be stored in the index register 730.
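The 6-bit and 9-bit figures follow from the number of addressable units in the register file. As a small worked check (the function name `origin_address_bits` is hypothetical):

```python
def origin_address_bits(addressable_units):
    """Smallest number of bits that can address every unit (byte or bit)."""
    # The largest address is addressable_units - 1; count the bits it needs.
    return max(1, (addressable_units - 1).bit_length())

print(origin_address_bits(40))   # 6 -> any byte of a 40-byte register file
print(origin_address_bits(64))   # 6 -> six bits cover up to 64 bytes
print(origin_address_bits(320))  # 9 -> any bit of a 320-bit register file
```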
To provide additional flexibility,
At 804, it is arranged for information to be stored into (or retrieved from) the register file in accordance with the described region. For example, data from a first region might be compared to data in a second region, and a result might be stored in a third region on a per-channel basis.
The region descriptions of SRC0 and SRC1 include a register identifier and a sub-register identifier indicating a location of a first data element in the register file 920. With respect to DEST, an index register 930 will store a value in location L0 representing the register identifier and sub-register identifier (which, in the example illustrated in
Some or all of the region descriptions may include a “width” of the region. The width might indicate, for example, a number of data elements associated with the described region within a register row. For example, the DEST region illustrated in
Similarly, the SRC0 region is described as being four bytes wide (and therefore two rows or registers high) and the SRC1 region is described as being eight bytes wide (and therefore has a vertical height of one data element). Note that a single region may span different registers in the register file 920 (e.g., some of the DEST region illustrated in
Although some embodiments discussed herein describe a width of a region, according to other embodiments a vertical height of the region is instead described (in which case the width of the region may be inferred based on the total number of data elements). Moreover, note that overlapping register regions may be defined in the register file 920 (e.g., the region defined by SRC0 might partially or completely overlap the region defined by SRC1). In addition, although some examples discussed herein have two source operands and one destination operand, other types of instructions may be used. For example, an instruction might have one source operand and one destination operand, three source operands and two destination operands, etc.
According to some embodiments, a region origin (e.g., encoded in an instruction or stored in the index register 930) and width might result in a region “wrapping” to the next register in the register file 920. For example, a region of byte-size data elements having an origin of R2.6 and a width of eight would include the last two bytes of R2 along with the first six bytes of R3. Similarly, a region might wrap from the bottom of the register file 920 to the top (e.g., from R4 to R0).
The SIMD execution engine 910 may add each byte in the described SRC1 region to a corresponding byte in the described SRC0 region and store the results in the described DEST region in the register file 920. For example,
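The wrapping behavior might be modeled by treating the register file as one contiguous byte array. The sketch below assumes five eight-byte registers (R0 through R4), as in the example; the helper name `region_row` is hypothetical.

```python
REG_BYTES = 8   # assumption: eight-byte registers
NUM_REGS = 5    # assumption: registers R0 through R4

def region_row(reg, sub, width):
    """Return (register, byte) pairs for `width` contiguous byte-size elements."""
    start = reg * REG_BYTES + sub
    total = REG_BYTES * NUM_REGS
    cells = []
    for i in range(width):
        addr = (start + i) % total             # wrap from R4 back to R0
        cells.append(divmod(addr, REG_BYTES))  # (register number, byte offset)
    return cells

# Origin R2.6, width eight: two bytes of R2, then six bytes of R3.
print(region_row(2, 6, 8))
# Origin R4.6, width four: wraps from the bottom of the file to R0.
print(region_row(4, 6, 4))
```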
In this case, a horizontal stride of two has been described. As a result, each data element in a row is offset from its neighboring data element in that row by two bytes. For example, the data element associated with channel 5 of the execution engine is located at byte 3 of R2 and the data element associated with channel 6 is located at byte 5 of R2. In this way, a described region may not be contiguous in the register file 1120. Note that when a horizontal stride of one is described, the result would be a contiguous 4×2 array of bytes beginning at R1.1 in the two dimensional map of the register file 1120.
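The channel-to-byte mapping for this region might be computed as shown below. The sketch assumes byte-size data elements and that each new row begins one register below the previous row, which is consistent with the channel positions given above; the helper name `channel_locations` is hypothetical, and rows that run past the end of a register are not handled.

```python
def channel_locations(reg, sub, width, height, h_stride):
    """Map execution channels to (register, byte) locations for a region."""
    locations = []
    for row in range(height):
        for col in range(width):
            # Assumption: each row starts in the next register, same sub-register.
            locations.append((reg + row, sub + col * h_stride))
    return locations

# 4x2 region of bytes with origin R1.1 and a horizontal stride of two.
strided = channel_locations(1, 1, 4, 2, 2)
print(strided[5], strided[6])  # channels 5 and 6: (2, 3) (2, 5)

# A horizontal stride of one gives a contiguous 4x2 array starting at R1.1.
contiguous = channel_locations(1, 1, 4, 2, 1)
print(contiguous)
```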
The region described in
According to some embodiments, the value of a horizontal stride may be encoded in an instruction. For example, a 3-bit field might be used to describe the following eight potential horizontal stride values: 0, 1, 2, 4, 8, 16, 32, and 64. Moreover, a negative horizontal stride may be described according to some embodiments.
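A 3-bit stride field of this kind might be decoded with a simple lookup table. The assignment of field codes to stride values in ascending order is an assumption made for illustration; the description lists the eight values but not their codes.

```python
# Assumption: codes 0..7 map to the listed stride values in ascending order.
H_STRIDE_VALUES = (0, 1, 2, 4, 8, 16, 32, 64)

def decode_h_stride(field):
    """Translate a 3-bit instruction field into a horizontal stride."""
    if not 0 <= field <= 7:
        raise ValueError("horizontal stride field must fit in three bits")
    return H_STRIDE_VALUES[field]

def encode_h_stride(stride):
    """Translate a horizontal stride into its 3-bit field code."""
    # Raises ValueError for a stride that has no encoding (e.g., 3).
    return H_STRIDE_VALUES.index(stride)

print(decode_h_stride(0b010))  # 2
print(encode_h_stride(64))     # 7
```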
Note that a region may be described for data elements of various sizes. For example,
The region described in
According to some embodiments, an index register stores a single value describing an origin of a region. According to other embodiments, an index register may store multiple values to describe a region. For example,
In this example, multiple locations in the index register 1530 may each point to a register sub-region as defined by an immediate instruction field. For example, the horizontal dimension may be described by the immediate terms of the instruction word while the vertical dimension (e.g., the origin of each row) is described in the index register 1530. Such an embodiment may be associated with, for example, a one-dimensional field and/or a gathering of vector mode (e.g., in connection with a replicated scalar or a one-dimensional array).
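One way to picture this arrangement: the index register supplies an origin for each row, while the row width comes from the instruction. In the sketch below, the register file contents, the per-row origins, and the helper name `gather_rows` are all hypothetical sample values.

```python
REG_BYTES = 8  # assumption: eight-byte registers

# Sample register file: byte b of register r holds the value r * 10 + b.
register_file = [[reg * 10 + b for b in range(REG_BYTES)] for reg in range(5)]

# Hypothetical index register contents: one (register, sub-register) origin per row.
row_origins = [(0, 2), (3, 0), (1, 4)]

def gather_rows(regs, origins, width):
    """Read one row of `width` elements starting at each origin."""
    # The width plays the role of the immediate field in the instruction word.
    return [regs[reg][sub:sub + width] for reg, sub in origins]

rows = gather_rows(register_file, row_origins, 4)
print(rows)  # [[2, 3, 4, 5], [30, 31, 32, 33], [14, 15, 16, 17]]
```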
Note that rows of data elements defined in the index register 1530 do not need to be aligned to each other. For example,
Note that different types of descriptions may be provided for different instructions. For example, a first instruction might define a destination region as a 4×4 array while the next instruction defines a region as a 1×16 array. Moreover, different types of regions may be described for a single instruction.
Consider, for example, the register file 2120 illustrated in
In this example, regions are described for an operand in one of two ways:
SRC1 is two bytes wide, and therefore four data elements high, and begins in byte 17 of R2 (illustrated in
SRC0 is four bytes wide, and therefore two data elements high, and begins at R1.14 (based on the value stored in the index register 2130). Because the horizontal stride is zero, the value at location R1.14 (e.g., “2” as illustrated in
DEST is four words wide, and therefore two data elements high, and begins at R5.3 (based on the value stored at location A0.1 in the index register 2130). Thus, the execution channel will add the value “1” (the first data element of the SRC0 region) to the value “2” (the data element of the SRC1 region that will be used by the first four execution channels) and the result “3” is stored into bytes 3 and 4 of R5 (the first word-size data element of the DEST region).
The horizontal stride of DEST is three data elements, so the next data element is the word beginning at byte 9 of R5 (e.g., offset from byte 3 by three words), the element after that begins at byte 15 of R5 (shown broken across two rows in
The vertical stride of DEST is eighteen data elements, so the first data element of the second “row” of the DEST array begins at byte 7 of R6. The result stored in this DEST location is “6” representing the “3” from the fifth data element of SRC0 region added to the “3” from the SRC1 region which applies to execution channels 4 through 7.
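The DEST element locations in this example might be reproduced with the short calculation below. The 32-byte register size is an assumption inferred from the arithmetic (an origin of R5.3 plus eighteen words lands at byte 7 of R6 only if a register holds 32 bytes), and the helper name `dest_elements` is hypothetical.

```python
REG_BYTES = 32   # assumption inferred from the example, not stated in the text
WORD_BYTES = 2   # DEST data elements are words

def dest_elements(reg, sub, width, height, h_stride, v_stride):
    """Return the (register, byte) start of each word in a strided region."""
    origin = reg * REG_BYTES + sub
    cells = []
    for row in range(height):
        for col in range(width):
            # Both strides are expressed in data elements (words here).
            offset = (row * v_stride + col * h_stride) * WORD_BYTES
            cells.append(divmod(origin + offset, REG_BYTES))
    return cells

# Origin R5.3, four words wide, two rows, horizontal stride 3, vertical stride 18.
cells = dest_elements(5, 3, 4, 2, 3, 18)
print(cells[:4])  # [(5, 3), (5, 9), (5, 15), (5, 21)]
print(cells[4])   # (6, 7): first element of the second row
```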
According to some embodiments, an index register may store a value for each data element in a register region (e.g., in connection with a total gathering mode). For example,
Because information in the register files may be efficiently and flexibly accessed in different ways, the performance of a system may be improved. For example, machine code instructions may efficiently be used in connection with a replicated scalar, a vector of a replicated scalar, a replicated vector, a two-dimensional array, a sliding window, and/or a related list of one-dimensional arrays. As a result, the number of data move, packing, unpacking, and/or shuffling instructions may be reduced, which can improve the performance of an application or algorithm, such as one associated with a media kernel. Moreover, a register-indirect-register addressing mode of operation might help an application kernel maintain kernel data in a pre-determined register file location which may further improve performance of a system (especially when there are a relatively large number of registers in a register file).
Note that in some cases, restrictions might be placed on region descriptions. For example, a sub-register origin and/or a vertical stride might be permitted for source operands but not destination operands. Moreover, physical characteristics of a register file might limit region descriptions. For example, a relatively large register file might be implemented using embedded Random Access Memory (RAM), and the cost and power associated with the embedded RAM might depend on the number of read and write ports that are provided. Thus, the number of read and write ports (and the arrangement of the registers in the RAM) might restrict region descriptions.
The system 2300 may also include an instruction memory unit 2330 to store SIMD instructions and a data memory unit 2340 to store data (e.g., scalars and vectors associated with a two-dimensional image, a three-dimensional image, and/or a moving image). The instruction memory unit 2330 and the data memory unit 2340 may comprise, for example, RAM units. Note that the instruction memory unit 2330 and/or the data memory unit 2340 might be associated with separate instruction and data caches, a shared instruction and data cache, separate instruction and data caches backed by a common shared cache, or any other cache hierarchy. According to some embodiments, the system 2300 also includes a hard disk drive (e.g., to store and provide media information) and/or a non-volatile memory such as FLASH memory (e.g., to store and provide instructions and data).
The following illustrates various additional embodiments. These do not constitute a definition of all possible embodiments, and those skilled in the art will understand that many other embodiments are possible. Further, although the following embodiments are briefly described for clarity, those skilled in the art will understand how to make any changes, if necessary, to the above description to accommodate these and other embodiments and applications.
Although various ways of describing source and/or destination operands have been discussed, note that embodiments may use any subset or combination of such descriptions. For example, only source operands might be permitted to have a vertical stride.
According to some embodiments, a description of a register region is encoded in an instruction word for each of the instruction's operands. For example, the register number and sub-register number of the origin may be encoded. In some cases, the value in the instruction word may represent a different value in terms of the actual description. For example, three bits might be used to encode the width of a region, and “011” might represent a width of eight elements while “100” represents a width of sixteen elements.
In this way, a larger range of descriptions may be available as compared to simply encoding the actual value of the description in the instruction word.
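The two encodings given above (“011” for eight elements and “100” for sixteen) are consistent with a power-of-two scheme, which might be sketched as follows. The power-of-two rule and the helper name `decode_width` are assumptions; only the two example encodings come from the description.

```python
def decode_width(bits):
    """Decode a 3-bit width field, assuming width = 2 ** field value."""
    if len(bits) != 3 or set(bits) - {"0", "1"}:
        raise ValueError("width field must be three binary digits")
    return 1 << int(bits, 2)

print(decode_width("011"))  # 8
print(decode_width("100"))  # 16

# Under this assumption three bits span widths 1 through 128, whereas a
# directly encoded 3-bit value could only express 0 through 7.
print([decode_width(format(code, "03b")) for code in range(8)])
```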
Moreover, an instruction word might indicate whether an immediate or a register-indirect-register addressing mode should be used. The instruction may further include a portion that contains, depending on the addressing mode, one of: (i) a location in a register file (e.g., a register number and/or a sub-register) or (ii) a location in an index register (e.g., an index register number and/or index sub-register number).
As described herein, an index register may contain a value that represents an origin of a register region. According to some embodiments, the index register may include other values to describe the register region instead of, or in addition to, the origin. For example, the width, horizontal stride, or data type of a register region might be stored in an index register.
The several embodiments described herein are solely for the purpose of illustration. Persons skilled in the art will recognize from this description that other embodiments may be practiced with modifications and alterations limited only by the claims.