US 20040230632 A1
Abstract
A method and system for performing many different types of algorithms utilizes a single mathematical engine such that the mathematical engine is capable of utilizing the same multipliers for all of the algorithms. The mathematical engine includes a selectively controlled parallel output register, at least one selectively controlled memory, and a plurality of processing elements. The output register, the memory and the processing elements are selectively controlled depending upon the algorithm to be performed.
Claims (18)
1. A mathematical engine comprising:
a parallel output shift register receiving data to be processed; and a processor, including an adder tree using a plurality of arithmetic logic unit (ALU) circuits, for accepting the output of the shift register and for providing a data output; whereby the shift register includes a selectable initial position, to selectively output the data based upon the capacity of the processor. 2. The mathematical engine of 3. The mathematical engine of 4. The mathematical engine of 5. A calculation unit for performing a plurality of different types of calculations, the calculation unit comprising:
a parallel output shift register; a multiplexer, for receiving the output from said shift register and providing an output to an adder tree; the adder tree comprising a plurality of arithmetic logic units (ALUs); and a selection circuit for selectively enabling the shift register and the multiplexer to apply certain portions of the input data to the adder tree, to perform different calculations. 6. The calculation unit of 7. A mathematical engine for performing calculations, the mathematical engine including:
at least one input memory for storing input data; a selectable memory for receiving the input data from said at least one input memory and for providing a selectable output via a plurality of folds, wherein each fold comprises at least one position within the selectable memory; and a processor array having a plurality of processors for receiving an output from the selectable memory and selectively providing an output. 8. The mathematical engine of 9. The mathematical engine of an accumulation circuit for receiving and accumulating each output from the adder tree; whereby said enablement circuit further controls at least a portion of the adder tree, support the desired mathematical calculation. 10. A computation circuit for resolving complex functions; the computation circuit comprising:
a memory, for receiving input data for complex resolution; a store, for storing an operational factor for the complex function; a multiplexer, for receiving an input from the memory and the operational factor; a processing array circuit, for processing according to a number of bit locations stored by the memory, the processing array circuit including an output from the multiplexer and at least some of the input data; a complex adder tree receiving outputs from the processing array and providing an added output; and an accumulator circuit receiving an output from the adder tree and providing an accumulated complex output. 11. The computation circuit of 12. The computation circuit of 13. A computation circuit for resolving complex functions, comprising:
a memory, for receiving complex input data for resolution; a store for a twiddle factor; a multiplexer, for receiving the complex input data from the memory and the twiddle factor from the store; a processing array circuit, for processing data according to a number of bit locations stored by the memory, the processing array circuit including an output from the multiplexer and at least some of the input data; a complex adder tree receiving outputs from the processing array and providing an added output; and an accumulator for accounting said added outputs. 14. The computation circuit of 15. A method for electronically resolving complex functions, the method comprising:
providing input data for complex resolution from a first memory; providing an operational factor for the complex function from a second memory, and multiplexing the operational factor with data from the first memory to provide multiplexed data; and supplying a select portion of the multiplexed data to a processing array, as a parallel output, said select portion depending upon the complex function. 16. The method of 17. The method of providing a twiddle factor as the operational factor, for performing discrete Fourier transforms (DFTs); and selectively engaging at least a portion of the processing array, thereby controlling a data size processed by the processor array. 18. The method of
Description
[0001] This application claims priority from U.S. provisional application No. 60/413,164 filed Sep. 24, 2002, which is incorporated by reference as if fully set forth.
[0002] The present invention relates to the utilization of a mathematical engine to calculate the output of an array of complex multipliers. More specifically, the present invention is a computationally efficient mathematical engine which is accessible to perform a plurality of mathematical calculations.
[0003] Modern wireless communication systems generally require an immense number of mathematical calculations to perform signal processing. Such calculations are generally performed by processors and application specific integrated circuits (ASICs).
[0004] Standard ASIC design for a receiver requires the implementation and calculation of many algorithms, which often require many parallel multiplications in order to complete the calculations during the allotted time. These algorithms typically consist of many matrix-to-matrix and matrix-to-vector multiplications and many discrete Fourier transform (DFT) and fast Fourier transform (FFT) calculations.
Because multipliers take up a lot of room on an ASIC, it is desirable to devise a solution which is capable of applying the same multipliers across the several algorithms.
[0005] Certain common calculations may be used to support a variety of current wireless technologies such as WCDMA, WTT, CDMA2000, 802.1X, TDSCDMA, FDD, TDD, and also other future system architectures not presently contemplated. One such type of calculation which is commonly performed is the dot product multiplication. The dot-product calculation is a standard operation between two matrixes. For example, dot product calculations are required for performing channel estimation and data estimation. In a Wide Band TDD System such calculations may include calculation of the prime factor fast Fourier transform (FFT), multiplication of a matrix by another matrix, multiplication of a matrix by its complex conjugate transpose, and a multiplication of a matrix by a vector.
[0006] In general, several dot product calculations must be performed by a single communication device, and therefore the communication device must have adequate processing power to support the required calculations. Presently, each algorithm utilizes dedicated hardware to implement its own mathematical functions. It would be advantageous to develop a system which enables reuse of hardware to maximize the operational efficiency. Operational efficiency includes, but is not limited to, time of processing, area of silicon to perform the processing, and the power required by the silicon during processing.
[0007] According to the present invention, a mathematical engine is provided for performing multiple types of mathematical calculations, such that the hardware is utilized efficiently. The present invention includes a memory having a parallel output used to store one or more values which are selectively output in a parallel output of logically adjacent values.
In the case that the length of the stored value, such as a vector, exceeds the capacity of the computational section, the memory is addressed so as to provide portions of the vector, each referred to as a fold, in a logical sequence which permits completion of a mathematical execution on the full vector.
[0008] Different algorithmic results are generated by the selective use of enable signals which enable data transfer and proper mathematical calculations to effect control of the operation of the mathematical engine. This has the advantage of increasing the flexibility of the mathematical engine to perform different types of calculations, and also provides an economy of processor circuitry to reduce the amount of semiconductor real estate required when designing the signal processor.
[0009] FIG. 1 is a general block diagram of the mathematical engine of the present invention.
[0010] FIGS. 2A-2C show multiplication of the System Response Matrix (A
[0011] FIGS. 3A-3G show the mathematical engine of FIG. 1 performing the calculations required for the A
[0012] FIGS. 4A-4D show multiplication of the A
[0013] FIGS. 5A-5I show the mathematical engine of FIG. 1 performing the calculations required for the A
[0014] FIGS. 6A-6D show the mathematical engine of FIG. 1 performing the calculations required for a DFT.
[0015] FIGS. 7A-7C illustrate selective use of the input sources.
[0016] The present invention will be described with reference to the drawing figures wherein like numerals represent like elements throughout.
[0017] The present invention is a single mathematical engine for processing a plurality of separate and distinct algorithms. The mathematical engine is capable of utilizing the same hardware for all of the algorithms. Because multipliers require significant space on the ASIC, the present invention reduces the amount of required space for the ASIC.
The mathematical engine of the present invention is also very efficient in performing required calculations by using the hardware a higher percentage of the time. The efficiency of the mathematical engine is dependent on the sizes of the input matrixes and the number of processing elements.
[0018] In general, the mathematical engine has at least two inputs and one output. The inputs include a serial input and a parallel input, where the parallel input is as wide as the number of processing elements. The number of processing elements may be optimized to be an entire vector, a piece of a vector, or a vector of a matrix. The parallel and serial inputs can both be loaded into a shift register, or another type of serial access register, for different types of operations. The parallel output shift register is a memory which has a parallel output and which permits rapid output of the data stored. The parallel output of the shift register is multiplexed such that logically adjacent values, whose width is determined by the number of processing elements, can perform the function of a serial access register, can perform the function of a parallel output shift register having a selectable output, or can provide access to a secondary parallel input. The primary parallel input and the multiplexed parallel output of the shift register and secondary parallel input act as inputs to complex multipliers and an adder tree, which increases the efficiency of the calculations performed by the mathematical engine. This permits data to be moved into the registers as quickly as possible for each of the performed operations, and also permits re-organization of the data for internal steps of the operation to be performed efficiently.
[0019] In the preferred embodiment, the parallel output shift register outputs data from logically adjacent data values, so that the output is used to store a value which is selectively output in parallel to a computational section.
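As a rough illustration of the data path described in paragraphs [0018] and [0019], the following Python sketch models a single clock of the engine: a window of logically adjacent values is selected from the shift register, multiplied element by element with the parallel input, and reduced by an adder tree. The function names `select_window`, `adder_tree`, and `engine_clock`, and the example values, are hypothetical and do not come from the specification; this is a software analogy under those assumptions, not the disclosed circuit.

```python
from typing import List, Sequence


def select_window(register: Sequence[complex], start: int, width: int) -> List[complex]:
    """Return `width` logically adjacent register values beginning at `start`."""
    if start < 0 or start + width > len(register):
        raise ValueError("window falls outside the register")
    return list(register[start:start + width])


def adder_tree(values: Sequence[complex]) -> complex:
    """Sum values pairwise, level by level, in the manner of an adder tree."""
    level = list(values)
    if not level:
        return 0j
    while len(level) > 1:
        if len(level) % 2:
            level.append(0j)  # pad odd levels so every adder has two inputs
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def engine_clock(register: Sequence[complex], start: int,
                 parallel_input: Sequence[complex]) -> complex:
    """One clock: window from the register times the parallel input, then summed."""
    window = select_window(register, start, len(parallel_input))
    products = [a * b for a, b in zip(window, parallel_input)]
    return adder_tree(products)


# Hypothetical example: an 8-deep register and 4 processing elements.
register = [1, 2, 3, 4, 5, 6, 7, 8]
coefficients = [1, 1, 1, 1]
first = engine_clock(register, 0, coefficients)   # uses positions 0..3
second = engine_clock(register, 4, coefficients)  # uses positions 4..7
```

Changing only the `start` position reuses the same multipliers and adder tree on a different portion of the stored data, which is the sense in which the output of the register is selectable.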
In the case of the length of the stored vector exceeding the capacity of the computational section, the parallel output shift register is addressed so as to provide portions of the vector in a sequence which permits completion of a mathematical execution on the full vector. As a result, almost every clock results in a calculation, instead of multiple steps to prepare the data for the operation. When coupled with an output circuit of a given length, the parallel output shift register is a parallel output n-fold shift register, meaning that its memory store is n times the data width of the computational section (i.e. the number of the processing elements).
[0020] The adder tree feeds an accumulator, which enables many different matrix-to-matrix and matrix-to-vector multiplications and enables efficient calculations such as A
[0021] According to the present invention, the dot product calculations performed by the mathematical engine include, but are not limited to, a plurality of different types of multiplications required for channel estimation and data estimation such as: the prime factor FFT, the multiplication of a matrix by another matrix, the multiplication of a matrix by its complex conjugate transpose, and the multiplication of a matrix by a vector.
[0022] Referring to FIG. 1, a block schematic diagram of a mathematical engine
[0023] In operation, the output of the PPDIS
[0024] The output from the multiplexer
[0025] The adder tree
[0026] The mathematical engine
[0027] An example of the present invention implementing the A
[0028] FIG. 2A is a simple field matrix representation of the A
[0029] Since the A
[0030] In FIG. 3A the first fold of the first row of the A
[0031] The portions of the matrix that will be computed are highlighted in FIG. 2B. The entire rows are highlighted, but only the portions of the matrixes that are inside the dotted lines are used in the calculations. FIG. 2B shows the first row of the A
[0032] Since the A
[0033] FIG.
3D shows the multiplication of the first fold of the first row of the A
[0034] In FIG. 2C it can be seen how 16 zeros have been shifted into the left of the shift register
[0035] After every fold of the first valid A
[0036] As a second example, the A
[0037] For example, there are a total of 61 A
[0038] FIG. 4A is a diagram representing the A
[0039] The first step in calculating A
[0040] To start the step-by-step process of calculating A
[0041] FIG. 5A shows the first fold of the first A
[0042] The next step is to calculate the next valid row of the A
[0043] The next step is to start the calculations of the second A
[0044] Since the A
[0045] FIG. 5G shows one value of the r vector being shifted into the right side of the shift register while the first value put into the shift register is lost. FIG. 5H shows a second value of the r vector being shifted into the shift register while the second value put into the register is lost. FIG. 5I shows the first A
[0046] As a third example, the Steiner algorithm as performed by the mathematical engine of FIG. 1 will be described with reference to FIGS. 6A-6D. The figures show the process of using the mathematical engine to do a 3-point DFT for a 456-point Steiner. FIGS. 6A-6C show the calculation of the first DFT, each figure showing a different clock. As shown, the first three addresses of the DFT were already loaded into a memory serially so they can all be accessed at once in a parallel manner, through the PPDIS
[0047] In each of these three clocks, the first DFT inputs at addresses
[0048] In FIG. 6A the three points of Twiddle Set
[0049] FIG. 6D shows how the first point of the next DFT is calculated with the first twiddle factor set and the other two sets will follow in the following clocks. A 64-point DFT is done in a slightly different manner where the multiplication of each DFT set by the twiddle set takes four consecutive clocks that are accumulated together before storage.
[0050] In FIG.
6D the three points of Twiddle Set
[0051] Referring to FIGS. 7A-7C, it can be seen that selective enablement of inputs into the mathematical engine of the present invention permits the mathematical engine to perform A
[0052] Referring to FIG. 7B, for the A
[0053] As shown in FIG. 7C for the Steiner operation (FFT), the PPDIS provides the FFT input data set and the SPDIS provides the FFT twiddle factors. During the FFT operations, in an m-point FFT, the appropriate m-points of the data set are provided by the PPDIS to the complex multiplier array, while the appropriate FFT Twiddle factors are provided by the SPDIS.
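The DFT example of paragraphs [0046] to [0053] can be sketched in Python as follows: the m data points stay fixed at the multiplier inputs while a different twiddle set is applied on each clock, and the summed products give one DFT output point per clock. The function names `twiddle_set` and `dft_by_clocks` are hypothetical, and the twiddle definition exp(-2πi·k·n/m) is the conventional DFT form, assumed here because the specification text above does not spell it out. The four-clock accumulation mentioned for the 64-point case is not modeled.

```python
import cmath
from typing import List, Sequence


def twiddle_set(m: int, k: int) -> List[complex]:
    """Twiddle factors for output point k of an m-point DFT (assumed convention)."""
    return [cmath.exp(-2j * cmath.pi * k * n / m) for n in range(m)]


def dft_by_clocks(data: Sequence[complex]) -> List[complex]:
    """Compute an m-point DFT as m clocks of multiply-and-sum.

    `data` plays the role of the input set supplied by the PPDIS and each
    twiddle set plays the role of the factors supplied by the SPDIS.
    """
    m = len(data)
    outputs = []
    for k in range(m):  # one clock per twiddle set, one output point per clock
        twiddles = twiddle_set(m, k)
        products = [x * w for x, w in zip(data, twiddles)]  # multiplier array
        outputs.append(sum(products))                       # adder tree
    return outputs


# Hypothetical 3-point example.
spectrum = dft_by_clocks([1, 2, 3])
```

For the input `[1, 2, 3]`, the first output point is the plain sum of the inputs, since every factor in twiddle set 0 equals 1.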