|Publication number||US6692534 B1|
|Application number||US 09/392,070|
|Publication date||Feb 17, 2004|
|Filing date||Sep 8, 1999|
|Priority date||Sep 8, 1999|
|Publication number||09392070, 392070, US 6692534 B1, US 6692534B1, US-B1-6692534, US6692534 B1, US6692534B1|
|Inventors||Yong Wang, Allan Tzeng|
|Original Assignee||Sun Microsystems, Inc.|
1. Field of the Invention
The present invention relates to combination multiplication and addition operations in a computing environment. More particularly, the present invention relates to an apparatus for performing nonparallel and parallel multiplication/addition operations, and a method for operating that apparatus.
2. The Background Art
In modern computers, it is often true that multiplication operations must be performed in order to satisfy a given objective. Those multiplication operations are often performed in either of two ways, traditional and parallel.
This disclosure primarily concerns itself with traditional multiplication where two numbers such as those seen in FIG. 1A are split into upper and lower halves such as those seen in FIG. 1B. The upper and lower halves are then multiplied together in four operations, i.e., the upper half of the first number is multiplied with the upper and lower halves of the second number, and the lower half of the first number is multiplied with the upper and lower halves of the second number. The four results are then added together to form the final result of the multiplication.
Therefore, two numbers “A” and “B” (shown here as numbers 10 and 12, each being 32-bits wide in this example) being multiplied together will each be broken into 16-bit halves, resulting in an upper half 14 and a lower half 16 from original number 10, and an upper half and a lower half 20 from original number 12.
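The four-operation scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_multiply` is hypothetical, the operands are treated as unsigned, and the shifts that weight each half-product by its bit position are the usual arithmetic convention rather than something the text spells out.

```python
def split_multiply(a: int, b: int, width: int = 32) -> int:
    """Multiply two unsigned `width`-bit integers using half-width pieces."""
    half = width // 2
    mask = (1 << half) - 1

    # Split each operand into an upper and a lower half.
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask

    # Four half-width products, each shifted to the position of its halves.
    hi_hi = (a_hi * b_hi) << width
    hi_lo = (a_hi * b_lo) << half
    lo_hi = (a_lo * b_hi) << half
    lo_lo = a_lo * b_lo

    # Adding the four results gives the full product.
    return hi_hi + hi_lo + lo_hi + lo_lo
```

For example, `split_multiply(0x12345678, 0x9ABCDEF0)` gives the same value as multiplying the two numbers directly.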
Once the numbers are split apart into halves, an operation known as booth decoding is performed on one of the two halves as a first step in the multiplication of that number with the other number.
Booth decoding is a process by which overlapping groups of three bits each chosen from a first number are used to determine quantities of the second number which must be added together in order to produce a correct final multiplication result. In the example which follows, a first number and a second number are to be multiplied together. The second number is used as the key in booth decoding. In booth decoding when using the lower bits of the second number, a zero is added as the least significant bit of the lower half of the second number, resulting in three-bit groups such as booth groups 22, 24, 26, 28, 30, 32, 34, 36, and 38. Group 38 has two zeros added as most significant bits of the lower half, in order to properly complete that group.
When determining booth groups corresponding to the upper half of the second number, prior art systems again add a zero as the least significant bit of that upper half and use that zero when determining the rightmost booth group such as booth group 40. The remaining booth groups such as booth groups 42, 44, 46, 48, 50, 52, and 54 are then determined as previously described.
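As a rough illustration of the grouping described above, the following Python sketch forms the overlapping three-bit groups for one 16-bit half. The function name `booth_groups` and its parameters are hypothetical; `low_bit` is the bit placed below bit 0 (a zero in the prior art scheme) and `pad_msb` is the number of zeros added above the top bit.

```python
def booth_groups(bits: int, width: int = 16, low_bit: int = 0,
                 pad_msb: int = 0) -> list[str]:
    """Return overlapping three-bit groups, rightmost group first."""
    # Place `low_bit` below the least significant bit of the half.
    padded = (bits << 1) | low_bit
    total = width + 1 + pad_msb  # bits available after padding

    groups = []
    # Each group starts two bits above the previous one, so adjacent
    # groups share exactly one bit.
    for start in range(0, total - 2, 2):
        groups.append(format((padded >> start) & 0b111, "03b"))
    return groups
```

With `pad_msb=2` a 16-bit lower half yields nine groups, the ninth being completed by the two added zeros; with `pad_msb=0` the same half yields eight groups.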
In order to determine the nine booth groups associated with this lower half of the second number, and also properly use those nine booth groups, the prior art uses the apparatus depicted in FIG. 2.
Referring to FIG. 2, prior art system 56 includes booth recoder 58 which has an input for receiving a first number and an input for receiving a second number. Booth recoder 58 has nine outputs, each of those nine outputs providing partial products corresponding to the respective booth groups 22, 24, 26, 28, 30, 32, 34, 36, and 38. Eight of the nine outputs are provided in groups of four to two 4:2 compressors 60 and 62, the outputs of which are sent to a third 4:2 compressor 64.
The ninth output 66 of booth recoder 58 together with the two outputs 68 and 70 of 4:2 compressor 64 are provided to 3:2 compressor 72. The two outputs of 3:2 compressor 72 are then provided to adder 74 to perform the final addition operation required. Output 76 of adder 74 is the final result of the multiplication.
While prior art systems are suitable for their intended purpose, they require hardware which is unnecessary and which occupies valuable space that could otherwise be used for a different purpose. It would therefore be beneficial to provide a system which requires less hardware and therefore requires less space. This benefit and others are provided by the present invention.
The present invention provides an apparatus for booth decoding which stores the most significant bit of the lower half of the number used as the key for booth decoding. By using this stored bit to determine the rightmost booth group corresponding to the upper half of the key, booth decoding may be accomplished more quickly using an apparatus that is simpler and smaller than prior art assemblies.
FIGS. 1A, 1B and 1C show prior art binary groups involved in the booth decoding process.
FIG. 2 is a block diagram of a prior art apparatus for performing the function N=A*B.
FIG. 3 shows binary groups involved in the present invention booth decoding process.
FIG. 4 is a block diagram showing a present invention apparatus for performing the function N=A*B.
FIG. 5 is a block diagram of a booth decoder incorporating the present invention.
FIG. 6 is a block diagram showing the present invention apparatus for performing the function N=A*B+C.
FIG. 7 shows one example of new numbers formed from the original upper and lower halves of original number A.
FIG. 8 is a block diagram of an alternative embodiment of FIG. 6.
Those of ordinary skill in the art will realize that the following description of the present invention is illustrative only and not in any way limiting. Other embodiments of the invention will readily suggest themselves to such skilled persons who are familiar with this disclosure.
The present invention provides a method and apparatus for booth recoding which stores the most significant bit of the lower half bits of the number being used to create booth groups and uses that most significant bit when determining the rightmost booth group from the upper half bits of that same number. FIG. 3 shows binary groups involved in the present invention booth decoding process.
Referring to FIG. 3, the same lower half bits 18 and upper half bits 20 are used as was previously seen with respect to FIG. 1B. However, when determining booth groups with respect to upper half bits 20, rather than padding upper half bits 20 with a zero in the least significant bit position, the present invention instead stores the value of most significant bit 76 from lower half bits 18 and uses that value in the least significant bit position for upper half bits 20.
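The effect of carrying the stored bit into the upper half can be checked numerically. The sketch below is an illustration, not the patented circuit: the function names are hypothetical, the key B is assumed to be a signed 32-bit two's complement value, and the signed digit assigned to each group (low bit plus middle bit minus twice the high bit) is the standard radix-4 Booth rule rather than something quoted from the text.

```python
def booth_digit(group: int) -> int:
    """Signed digit for one three-bit group (standard radix-4 Booth rule)."""
    low, mid, high = group & 1, (group >> 1) & 1, (group >> 2) & 1
    return low + mid - 2 * high


def half_digits(half: int, low_bit: int, half_width: int = 16) -> list[int]:
    """Eight signed digits for one half, with `low_bit` placed below bit 0."""
    padded = (half << 1) | low_bit
    return [booth_digit((padded >> s) & 0b111)
            for s in range(0, half_width, 2)]


def recode_with_stored_msb(b: int) -> list[int]:
    """Sixteen digits for a 32-bit key, rightmost digit first."""
    lower, upper = b & 0xFFFF, (b >> 16) & 0xFFFF
    stored_msb = (lower >> 15) & 1  # the bit the text refers to as bit 76
    # The lower half still starts from a zero; the upper half starts
    # from the stored most significant bit of the lower half.
    return half_digits(lower, 0) + half_digits(upper, stored_msb)
```

Under these assumptions, summing `digit * 4**i` over the sixteen digits reproduces the signed value of B, so eight groups per half suffice and no ninth group is needed for the lower half.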
FIG. 4 is a block diagram showing a present invention apparatus for performing the function N=A*B.
Referring to FIG. 4, system 78 includes booth recoder 80, 4:2 compressors 82, 84, and 86, and adder 88. Each 4:2 compressor takes in four partial products of equal width and provides carry and sum outputs which represent those partial products added together. By adding those partial products together in stages, number A, or its higher or lower bits, is added together the proper number of times so that when the resulting upper and lower halves are put back together, the correct multiplication result is achieved.
Table 1 below shows the selection of each partial product according to the particular three-bit booth group.
Using Table 1 above, if a given booth group was “011”, the partial product chosen as the associated input to the respective compressor would be the binary number which is twice the value of the first number.
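Table 1 itself is not reproduced in this text. The Python sketch below uses the standard radix-4 Booth selection table as a stand-in, on the assumption that it matches Table 1; it is at least consistent with the "011" example above and with the selections 0, A, −A, 2A and −2A named later in the description. The names `BOOTH_TABLE` and `partial_product` are hypothetical.

```python
# Standard radix-4 Booth selection, assumed (not confirmed) to match Table 1.
# Key: three-bit booth group; value: multiple of the first number to select.
BOOTH_TABLE = {
    "000": 0,
    "001": 1,
    "010": 1,
    "011": 2,
    "100": -2,
    "101": -1,
    "110": -1,
    "111": 0,
}


def partial_product(group: str, a: int) -> int:
    """Return the multiple of `a` selected by a three-bit booth group."""
    return BOOTH_TABLE[group] * a
```

For instance, `partial_product("011", 5)` selects twice the first number, giving 10.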
For traditional multiplication, booth recoder 80 selects booth groups such as booth groups 22, 24, 26, etc. previously described in relation to FIG. 1C, and then provides the corresponding partial products to the inputs of 4:2 compressors 82 and 84. A key feature of the present invention, however, is that by using the value of bit 76 of FIG. 3, the present invention multiplication system is not required to have electronic circuits which produce a ninth booth group such as booth group 38 of FIG. 1C. Nor is the present invention apparatus required to have 3:2 compressor 72 as was required by the prior art.
All 4:2 compressors described herein are intended to compress four inputs into two outputs. They may comprise full adders as are known to those of ordinary skill in the art, or may instead comprise compressors such as are described in U.S. patent application Ser. No. 09/391,166, filed Sep. 8, 1999, now U.S. Pat. No. 6,532,485 entitled “METHOD AND APPARATUS FOR PERFORMING MULTIPLICATION/ADDITION OPERATIONS”, naming Yong Wang as inventor, assigned to Sun Microsystems, Inc., the entire application being incorporated herein by reference.
The outputs of 4:2 compressors 82 and 84 are provided as inputs to 4:2 compressor 86.
The outputs of 4:2 compressor 86 are provided to carry-sum adder 88.
The output of adder 88 is the 32 bit desired number N, the result of performing the operation N=A*B.
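The compressor arrangement of FIG. 4 can be modelled in software as follows. This is a behavioral sketch only: it assumes each 4:2 compressor is built from two stages of bitwise full adders (one of the options the text allows), it uses unbounded Python integers instead of fixed-width words, and the function names are hypothetical.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Bitwise full adders: returns (sum, carry) with sum + carry == x + y + z."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1  # carries move one bit left
    return s, c


def compress_4_2(w: int, x: int, y: int, z: int) -> tuple[int, int]:
    """Compress four inputs into two outputs whose total is unchanged."""
    s1, c1 = carry_save_add(w, x, y)
    return carry_save_add(s1, c1, z)


def reduce_eight(partial_products: list[int]) -> int:
    """Two first-level compressors feed a third, then a final adder."""
    s_a, c_a = compress_4_2(*partial_products[:4])   # like compressor 82
    s_b, c_b = compress_4_2(*partial_products[4:8])  # like compressor 84
    s, c = compress_4_2(s_a, c_a, s_b, c_b)          # like compressor 86
    return s + c                                     # like adder 88
```

Only the last step performs a carry-propagating addition; every earlier stage keeps its result as a separate sum word and carry word.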
There are many ways to determine booth groups from a given binary number. However, the heart of the invention is in providing the booth group which includes the MSB of the lower half of the number, together with the least significant two bits of the upper half of the number. The present invention stores the MSB of the lower half of the number in a flip-flop or other suitable storage means, making that bit of information easily available when the next booth group requiring that bit is determined.
FIG. 5 is a block diagram of a booth decoder incorporating the present invention.
Referring to FIG. 5, decoder 80 includes inputs 100 and 102, and outputs 104. Input 100 is used to provide a first number A to splitter 106, which splits first number A into upper and lower halves and stores those halves in registers 108 and 110 respectively. Input 102 is used to provide a second number B to splitter 112, which splits second number B into upper and lower halves which are stored in registers 114 and 116 respectively. Storage means 118 is provided to copy and contain the MSB of the lower half bits of second number B, for use when determining the first booth group pertaining to the upper half of number B. At the proper times, registers 114 and 116 and storage means 118 provide their outputs to booth group analyzers 120 and 122, which in turn provide their outputs to distribution circuit 124. Booth group analyzers 120 and 122 analyze the bit information contained in registers 114 and 116, and provide the proper booth groups to distribution circuit 124.
Distribution circuit 124 provides the proper partial products on outputs 104, the partial product for a given booth group being determined according to Table 1 as previously described. Thus, distribution circuit 124 will be responsible for sampling registers 108 and 110 at the appropriate times and providing a “0”, A, −A, 2A or −2A on respective ones of its outputs, based on the booth group which corresponds to that partial product output.
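The data flow of FIG. 5 can be approximated in Python as shown below. Several simplifications are assumptions of this sketch and not statements from the text: both operands are treated as signed 32-bit two's complement values, number A is kept whole rather than being sampled as halves, the selection table is the standard radix-4 Booth table standing in for Table 1, and all names are hypothetical.

```python
# Standard radix-4 Booth selection, assumed to correspond to Table 1.
SELECT = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
          0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def to_signed(value: int, width: int = 32) -> int:
    """Interpret a `width`-bit pattern as a two's complement integer."""
    return value - (1 << width) if (value >> (width - 1)) & 1 else value


def analyze(half: int, low_bit: int) -> list[int]:
    """Booth group analyzer: eight three-bit groups from one 16-bit half."""
    padded = (half << 1) | low_bit
    return [(padded >> start) & 0b111 for start in range(0, 16, 2)]


def booth_multiply(a: int, b: int) -> int:
    """Signed 32x32 multiply driven by sixteen booth groups of b."""
    b_lower, b_upper = b & 0xFFFF, (b >> 16) & 0xFFFF
    stored_msb = (b_lower >> 15) & 1  # plays the role of storage means 118
    groups = analyze(b_lower, 0) + analyze(b_upper, stored_msb)

    a_value = to_signed(a)
    # Distribution step: select 0, A, -A, 2A or -2A for each group and
    # weight it by the position of that group (two bits per group).
    return sum(SELECT[g] * a_value * 4 ** i for i, g in enumerate(groups))
```

For example, `booth_multiply(7, 0xFFFFFFFD)` treats the second operand as −3 and returns −21.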
FIG. 6 is a block diagram showing a present invention apparatus for performing the function N=A*B+C.
Referring to FIG. 6, system 160 includes booth recoder 162, 4:2 compressors 164, 166, 168, 170, 172, 174, 176, 178, and 180, and adder 182. For traditional operations, booth recoder 162 selects booth groups such as booth groups discussed herein, and then provides the corresponding partial products according to Table 1 above to the inputs of 4:2 compressors 164, 166, 168, and 170. The outputs of 4:2 compressors 164, 166, 168, and 170 are then provided as inputs to 4:2 compressors 172 and 174.
The outputs of 4:2 compressors 172 and 174 are provided to 4:2 compressors 176 and 178 respectively. Two other inputs to 4:2 compressor 176 are the 32 bits of number C (properly aligned), from the equation N=A*B+C, and an input tied to a binary zero. Two other inputs to 4:2 compressor 178 are the 32 bits of number C (properly aligned), from the equation N=A*B+C, and an input tied to a binary zero. The inputs that are tied to binary zero may alternatively be removed, and the circuits for the respective 4:2 compressors designed to imply a zero.
To have number C properly aligned means that C is provided to 4:2 compressors in the same manner as are other inputs. Thus, bit 0 of number C is provided to the rightmost bitwise compressor within 4:2 compressors 176 and 178, and bit 1 provided to the next left-oriented bitwise compressor and so on.
The outputs of 4:2 compressors 176 and 178 are then provided to 4:2 compressor 180. The output of 4:2 compressor 180 is then provided to carry-sum adder 182 for the final addition. The output of adder 182 is the 32 bit desired number N, the result of performing the operation N=A*B+C.
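The idea of folding number C into the compressor tree, with unused inputs tied to zero, can be sketched as follows. This is not the exact topology of FIG. 6: the sketch simply keeps applying 4:2 compression until two words remain. The 64-bit word size is an assumption chosen so that a full product fits (the text describes a 32-bit result), negative partial products are represented in two's complement, and all names are hypothetical.

```python
WIDTH = 64                 # assumed datapath width, not taken from the text
MASK = (1 << WIDTH) - 1


def compress_4_2(w: int, x: int, y: int, z: int) -> tuple[int, int]:
    """Return (sum, carry) with sum + carry == w + x + y + z modulo 2**WIDTH."""
    s1 = w ^ x ^ y
    c1 = (((w & x) | (w & y) | (x & y)) << 1) & MASK
    s2 = s1 ^ c1 ^ z
    c2 = (((s1 & c1) | (s1 & z) | (c1 & z)) << 1) & MASK
    return s2, c2


def multiply_add(partial_products: list[int], c: int) -> int:
    """Reduce the partial products plus addend c, then add once at the end."""
    # Two's complement encoding of every input word, including C.
    words = [p & MASK for p in partial_products] + [c & MASK]
    while len(words) > 2:
        group, words = words[:4], words[4:]
        group += [0] * (4 - len(group))   # spare inputs tied to binary zero
        words.extend(compress_4_2(*group))
    return sum(words) & MASK              # final carry-propagating addition
```

The result is a `WIDTH`-bit pattern, so a negative total appears in two's complement form.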
The apparatus of FIG. 6 is significantly faster and requires substantially less space to implement than the prior art apparatuses. Further, using the present invention apparatus in a manner slightly different from that just described allows the performance of the same computation N=A*B+C using a technique known as “parallel.” Thus, the apparatus of FIG. 6 may be used for both traditional and parallel computations, a feat not able to be accomplished with prior art apparatuses.
When performing traditional multiplication/adds as described above, the first booth group has a least significant bit of zero, and each succeeding booth group is chosen in overlapping three bit groups, using all bits in the original number B.
When performing parallel multiplication/adds, the first number is separated into two halves, an upper half and a lower half. The upper half has the high bits which were present in the original first number, combined with a number of least significant bits (LSBs) having zeros, the number of LSBs bringing the new upper half to be the same width as the original number.
The lower half is sign extended to be the same width as the original number. Thus, if the original number A is 32 bits wide, the lower half is sign extended from 16 bits to 32 bits.
FIG. 7 shows one example of a new number formed from the original upper and lower halves of original number A.
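A small Python sketch of how the two new numbers might be formed from a 32-bit number A is given below. The function name `parallel_operands` is hypothetical, and both results are returned as 32-bit patterns; FIG. 7 is not reproduced here, so the example values are illustrative only.

```python
def parallel_operands(a: int) -> tuple[int, int]:
    """Return (new_upper, new_lower) as 32-bit patterns built from `a`."""
    upper, lower = (a >> 16) & 0xFFFF, a & 0xFFFF

    # New upper half: the original high bits, with zeros in the 16 LSBs.
    new_upper = upper << 16

    # New lower half: the original low bits, sign extended from 16 to 32 bits.
    sign = (lower >> 15) & 1
    new_lower = lower | (0xFFFF0000 if sign else 0)

    return new_upper, new_lower
```

For instance, `parallel_operands(0x1234ABCD)` gives `0x12340000` and `0xFFFFABCD`, since bit 15 of the lower half is set.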
The new upper and lower halves are then processed using the FIG. 6 apparatus as previously described, using the upper and lower halves of the second number B to determine the booth groups for the new upper and lower halves of A. When determining the booth groups corresponding to the new upper half of A, a zero is added to the least significant bit position of the upper half of B, and two zeros are added to the most significant bit position of the upper half of B in the case of unsigned multiplication, or is alternatively sign extended two bits in the case of signed multiplication/add operations, as described previously.
Those of ordinary skill in the art having the benefit of this disclosure will readily recognize that the present invention may easily be adapted to systems having 8, 32, 64, 128 bits per number or more, by constructing 4:2 compressors with a number of bitwise compressors equal to the number of bits in each number A, B, and C, and by using suitable numbers of those 4:2 compressors in a tree form as shown in FIGS. 6 or 8, also depending on the number of bits involved in the original numbers A, B, and C.
While embodiments and applications of this invention have been shown and described, it would be apparent to those skilled in the art that many more modifications than mentioned above are possible without departing from the inventive concepts herein. The invention, therefore, is not to be restricted except in the spirit of the appended claims.
|U.S. Classification||708/100, 708/603, 708/629|
|International Classification||G06F7/533, G06F7/52|
|Oct 25, 1999||AS||Assignment|
Owner name: SUN MICROSYSTEMS, INC., CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WANG, YONG;TZENG, ALLAN;REEL/FRAME:010328/0372
Effective date: 19990907
|Aug 27, 2007||REMI||Maintenance fee reminder mailed|
|Feb 17, 2008||LAPS||Lapse for failure to pay maintenance fees|
|Apr 8, 2008||FP||Expired due to failure to pay maintenance fee|
Effective date: 20080217