US 20080046681 A1
Abstract
A processor and method for processing matrix data. The processor includes M independent vector register files which are adapted to collectively store a matrix of L data elements. Each data element has B binary bits. The matrix has N rows and M columns, and L=N*M. Each column has K subcolumns. N≧2, M≧2, K≧2, and B≧1. Each row and each subcolumn is addressable. The processor does not duplicatively store the L data elements. The matrix includes a set of arrays such that each array is a row or subcolumn of the matrix. The processor may execute an instruction that performs an operation on a first array of the set of arrays, such that the operation is performed with selectivity with respect to the data elements of the first array.
Claims (17)
1. A method, comprising:
providing a processor comprising M independent vector register files and M address registers, wherein each address register of the M address registers is associated with a corresponding one of the M vector register files, and wherein each vector register file is independently addressable through its associated address register pointing to one of the N registers of said vector register file; and distributing a matrix of L data elements in M independent vector register files comprised by a processor resulting in the M vector register files collectively storing the matrix, each data element having B binary bits, said matrix having N rows and M columns, said L=N*M, each column having K subcolumns, said N≧2, said M≧2, said K≧2, said N=K*M, said B≧1, each row of said N rows being addressable, each subcolumn of said K subcolumns being addressable, wherein each of the M vector register files includes an array of N registers, wherein each of the N*M registers of the M vector register files is storing a data element of the L data elements, wherein the data elements of each subcolumn are stored in different vector register files, wherein the data elements of each row are stored in different vector register files. 2. The method of storing the data elements of each subcolumn in different relative register locations of different vector register files of the M vector register files; and storing the data elements of each row are stored in a same relative register location of the different vector register files. 3. The method of 1 or 0 such that the value of the multiplexor consists of the composite value of said binary bits. 4. The method of 5. The method of 6. The method of 7. The method of 8. The method of 1 or 0 such that the value of the multiplexor consists of the composite value of said binary bits, and wherein the values associated with the M multiplexors control said selectivity. 9. The method of 10. The method of 11. The method of 12. The method of 13. The method of 14. 
A processor, comprising M independent vector register files, M multiplexors respectively coupled to the M vector register files, and M address registers;
wherein each of the M vector register files includes an array of N registers; wherein each address register of the M address registers is associated with a corresponding one of the M vector register files; wherein the M vector register files collectively store a matrix of L data elements, each data element having B binary bits, said matrix having N rows and M columns, said L=N*M, said N=K*M, each column having K subcolumns, said N≧2, said M≧2, said K≧2, said N=K*M, said B≧1, each row of said N rows being addressable, each subcolumn of said K subcolumns being addressable, said processor not duplicatively storing said L data elements; and wherein each multiplexor of the M multiplexors has a different value and comprises a set of binary switches subject to each binary switch being on or off and respectively represented by a binary bit 1 or 0 such that the value of the multiplexor consists of the composite value of said binary bits; wherein each of the N*M registers of the M vector register files stores a data element of the L data elements; wherein each vector register file is independently addressable through its associated address register being adapted to point to one of the N registers of said vector register file; wherein the data elements of each subcolumn are stored in different vector register files; and wherein the data elements of each row are stored in different vector register files. 15. The processor of wherein the data elements of each subcolumn are stored in different relative register locations of the different vector register files; and wherein the data elements of each row are stored in a same relative register location of the different vector register files. 16. 
The processor of wherein the M multiplexors are configured to respond to a command to read a row of the matrix by mapping the data elements of the row from the M vector register files to the row of the matrix in accordance with a read-row mapping algorithm; and wherein the M multiplexors are configured to respond to a command to read a subcolumn of the matrix by reading the data elements of the subcolumn from the M vector register files to the subcolumn of the matrix in accordance with a read-subcolumn mapping algorithm.
17. The processor of wherein the M multiplexors are configured to respond to a command to write a row of the matrix by mapping the data elements of the row to the M vector register files in accordance with a write-row mapping algorithm; and wherein the M multiplexors are configured to respond to a command to write a subcolumn of the matrix by mapping the data elements of the subcolumn to the M vector register files in accordance with a write-subcolumn mapping algorithm.
Description
This application is a Continuation of Ser. No. 10/715,688, filed Nov. 18, 2003.
1. Technical Field
The present invention relates to logically addressing both rows and subcolumns of a matrix stored in a plurality of vector register files within a processor.
2. Related Art
A Single Instruction Multiple Data (SIMD) vector processing environment may be utilized for operations associated with vector and matrix mathematics. Such mathematics processing may relate to various multimedia applications such as graphics and digital video. A current problem associated with SIMD vector processing arises from a need to handle vector data flexibly. The vector data is currently handled as a single (horizontal) vector of multiple elements when operated upon in standard SIMD calculations. The rows of the matrix can therefore be accessed horizontally in a conventional manner.
However, it is often necessary to access the columns of the matrix as entities, which is problematic to accomplish with current technology. For example, it is common to generate a transpose of the matrix for accessing columns of the matrix, which requires a large number of move/copy instructions and also increases (i.e., at least doubles) the number of required registers. Accordingly, there is a need for an efficient processor and method for addressing rows and columns of a matrix used in SIMD vector processing.
The present invention provides a processor, comprising M independent vector register files, said M vector register files adapted to collectively store a matrix of L data elements, each data element having B binary bits, said matrix having N rows and M columns, said L=N*M, each column having K subcolumns, said N≧2, said M≧2, said K≧1, said B≧1, each row of said N rows being addressable, each subcolumn of said K subcolumns being addressable, said processor not adapted to duplicatively store said L data elements.
The present invention provides a method for processing matrix data, comprising: providing the processor; and providing M independent vector register files within the processor, said M vector register files collectively storing a matrix of L data elements, each data element having B binary bits, said matrix having N rows and M columns, said L=N*M, each column having K subcolumns, said N≧2, said M≧2, said K≧1, said B≧1, each row of said N rows being addressable, each subcolumn of said K subcolumns being addressable, said processor not duplicatively storing said L data elements.
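The storage constraints stated above (every row and every subcolumn addressable, no data element stored twice) can be illustrated with a small sketch. The skewed placement rule `location` below is an assumption chosen for illustration and is not taken from this text; the values M=4 and K=2, the relation N=K*M (so that each subcolumn holds M elements), and all function names are likewise hypothetical.

```python
# Illustrative sketch only: the skewed placement rule is an assumption,
# not the mapping disclosed in this document.
M = 4        # number of vector register files (matrix columns), assumed
K = 2        # subcolumns per column, assumed
N = K * M    # rows; each subcolumn then holds M elements

def location(row, col):
    """Return (file index, register index) holding matrix[row][col]."""
    return ((col + row) % M, row)

def store(matrix):
    """Distribute the matrix over M files of N registers, one copy per element."""
    files = [[None] * N for _ in range(M)]
    for r in range(N):
        for c in range(M):
            f, reg = location(r, c)
            files[f][reg] = matrix[r][c]
    return files

def subcolumn_addresses(col, k):
    """Register index each file's address register must hold for subcolumn k of col."""
    addr = [None] * M
    for r in range(k * M, (k + 1) * M):
        addr[(col + r) % M] = r
    return addr

def read_row(files, row):
    """All files use the same register index; the reorder undoes the skew."""
    fetched = [files[f][row] for f in range(M)]
    return [fetched[(c + row) % M] for c in range(M)]

def read_subcolumn(files, col, k):
    """Each file uses a different register index; one access per file."""
    addr = subcolumn_addresses(col, k)
    fetched = [files[f][addr[f]] for f in range(M)]
    return [fetched[(col + k * M + i) % M] for i in range(M)]

matrix = [[10 * r + c for c in range(M)] for r in range(N)]
files = store(matrix)
```

Under this assumed rule, a row occupies the same register index in M different files, while a subcolumn occupies M different register indices in M different files, so either can be fetched with a single access per file and no transposed copy is kept.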
The present invention provides a processor, comprising M independent vector register files, said M vector register files adapted to collectively store a matrix of L data elements, each data element having B binary bits, said matrix having N rows and M columns, said L=N*M, each column having K subcolumns, said N≧2, said M≧2, said K≧1, said B≧1, each row of said N rows being addressable, each subcolumn of said K subcolumns being addressable, said matrix including a set of arrays such that each array is a row or subcolumn of the matrix, said processor adapted to execute an instruction that performs an operation on a first array of the set of arrays, said operation being performed with selectivity with respect to the data elements of the first array. The present invention provides a method for processing matrix data, comprising: providing the processor; providing M independent vector register files within the processor, said M vector register files collectively storing a matrix of L data elements, each data element having B binary bits, said matrix having N rows and M columns, said L=N*M, each column having K subcolumns, said N≧2, said M≧2, said K≧1, said B≧1, each row of said N rows being addressable, each subcolumn of said K subcolumns being addressable, said matrix including a set of arrays such that each array is a row or subcolumn of the matrix; and executing an instruction by said processor, said instruction performing an operation on a first array of the set of arrays, said operation being performed with selectivity with respect to the data elements of the first array. The present invention advantageously provides an efficient processor and method for addressing rows and columns of a matrix used in SIMD vector processing. 
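One possible reading of an operation performed "with selectivity" on a row or subcolumn is sketched below: each of the M output positions is fed by a multiplexor whose select value picks one element of the source array. This interpretation, the function name `apply_selectively`, the optional `op` parameter, and the value M=4 are assumptions made for illustration, not the instruction format of this document.

```python
from itertools import product

M = 4  # number of multiplexors / elements per array, assumed

def apply_selectively(array, mux_values, op=lambda x: x):
    """Output position i receives op(array[mux_values[i]]).

    mux_values is a hypothetical tuple of M select values, one per multiplexor.
    """
    if len(array) != M or len(mux_values) != M:
        raise ValueError("expected M elements and M select values")
    if any(not 0 <= s < M for s in mux_values):
        raise ValueError("select values must lie in 0..M-1")
    return [op(array[s]) for s in mux_values]

row = [10, 20, 30, 40]
unchanged = apply_selectively(row, (0, 1, 2, 3))
reversed_row = apply_selectively(row, (3, 2, 1, 0))
broadcast = apply_selectively(row, (2, 2, 2, 2))
doubled_pair = apply_selectively(row, (0, 0, 1, 1), op=lambda x: 2 * x)

# With M = 4 multiplexors, each taking one of 4 select values,
# the number of distinct select settings is 4**4.
settings = sum(1 for _ in product(range(M), repeat=M))
```

If each multiplexor is allowed to select independently, the same mechanism covers reordering, broadcasting one element, and operating on a subset of elements.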
[Table residue: matrix columns and subcolumns mapped to registers R; contents not recoverable]
Instructions for moving and reorganizing data of the matrix
The data elements Rn[m] of the matrix
The first rule relates to the storing of a row of the matrix
The second rule relates to the storing of a subcolumn of the matrix
The multiplexors m
Each row of the matrix
As an example of reading a row, assume that the row to be read is associated with register R
As an example of reading a subcolumn, assume that the subcolumn to be read is associated with register R
The preceding examples illustrate that in order for the multiplexors m
Thus, the multiplexors m
The multiplexors m
The data elements of each row or subcolumn to be written, as selected by register Rn (n=0, 1, ..., 255), are distributed into the registers Yi[j] of the vector register files V
As an example of writing a row, assume that the row to be written is associated with register R
As an example of writing a subcolumn, assume that the subcolumn to be written is associated with register R
Thus, the multiplexors m
Although the embodiments described in
The examples illustrated in
The preceding relationships involving N, M, and K are merely illustrative and not limiting. The following alternative non-limiting relationships are included within the scope of the present invention. A first alternative relationship is that the subcolumns of a given column do not have a same (i.e., constant) number of data elements. A second alternative relationship is that the total number of binary bits in each subcolumn is unequal to the total number of binary bits in each row. A third alternative relationship is that at least two columns have a different number K of subcolumns. A fourth alternative relationship is that N mod K≠0.
A fifth alternative relationship is that there is no value of P satisfying N=2^P
The scope of the present invention also includes embodiments in which the B binary bits of each data element are configured to represent a floating point number, an integer, a bit string, or a character string.
Additionally, the present invention includes a processor having a plurality of vector register files. The plurality of vector register files is adapted to collectively store the matrix of L data elements. Note that the L data elements are not required to be stored duplicatively within the processor, because the rows and the subcolumns of the matrix are each individually addressable through use of vector register files in combination with address registers and multiplexors within the processor, as explained supra in conjunction with
In embodiments of the present invention, illustrated supra in conjunction with
While the matrix
In the first example relating to the instruction depicted by
In the second example relating to the instruction depicted by
In the third example relating to the instruction depicted by
The preceding examples are merely illustrative. Since there are 256 permutations (i.e., 4^4
There are many other operations, in addition to the operations illustrated in
While
While embodiments of the present invention have been described herein for purposes of illustration, many modifications and changes will become apparent to those skilled in the art. Accordingly, the appended claims are intended to encompass all such modifications and changes as fall within the true spirit and scope of this invention.