US 6223195 B1
Abstract
An arithmetic unit for carrying out partial sums of products for transform operations such as the discrete cosine transform is provided. It includes a plurality of first units for calculating in parallel sums of and/or differences between a plurality of input variables, or sums of and/or differences between a plurality of values obtained by multiplying said plurality of input variables by a constant. The arithmetic unit also includes a processing unit having a plurality of shift units for shifting outputs from said plurality of first units by respectively predetermined numbers of digit-shifts, and a plurality of second units for calculating concurrently sums of outputs from said plurality of shift units. The arithmetic unit can be used, for example, as a high-speed discrete cosine unit, a high-speed Hartley transform unit, or a high-speed Hough transform unit.
Claims (18)
1. A computer system comprising:
a discrete cosine arithmetic unit for discrete cosine transform; and
a buffer memory connected to said discrete cosine arithmetic unit,
wherein said discrete cosine arithmetic unit includes:
a plurality of first units for calculating in parallel sums of a plurality of values obtained by multiplying a plurality of input variables by a constant and differences between said plurality of values; and
a plurality of shift units for shifting outputs from said plurality of first units by respectively predetermined number of digit-shifts;
a plurality of second units for calculating sums of outputs from said plurality of shift units.
2. A computer system according to claim 1, wherein said plurality of first units includes selecting means for selecting, for an i-th column and a j-th column of n−1=i+j in an n-point discrete cosine transform formula, results of calculation of sums of and differences between data of said i-th column and said j-th column immediately after said data are inputted thereto in case of a transform of said n-point discrete cosine transform formula and for selecting the inputted data of an i-th row and a j-th row itself in case of an inverse transform of n−1=i+j in said n-point discrete cosine transform formula, whereby hardware is shared between said transform and said inverse transform of said n-point discrete cosine transform formula.
3. A computer system according to claim 1, including a plurality of gate circuits, each of which is a circuit for transforming a sum in an initial data inputting stage into a subtraction to achieve 1−1=0, 1−0=1, 0−1=−1, and 0−0=0, and redundant binary adders for subsequent additions of outputs from said plurality of gate circuits.
4. A computer system according to claim 1, wherein a plurality of cosine coefficients multiplied by an appropriate fixed value of a matrix of rows and columns for discrete cosine transform are beforehand multiplied by said appropriate fixed value so that the number of non-zero coefficients after a recoding operation of said plurality of cosine coefficients of said matrix of rows and columns for discrete cosine transform is less than that of initial coefficients which would result if said plurality of cosine coefficients of said matrix of rows and columns for discrete cosine transform were not multiplied by said appropriate fixed value.
5. A computer system according to claim 1, further comprising:
a quantizing unit connected to said discrete cosine arithmetic unit; and
an encoding unit connected to said quantizing unit.
6. A computer system according to claim 5, further comprising:
a display unit connected to said buffer memory; and
a storage media connected to said encoding unit.
7. A computer system according to claim 1, further comprising:
a quantizing unit connected to said discrete cosine arithmetic unit; and
an encoding and decoding unit connected to said quantizing and inverse quantizing unit.
8. A computer system according to claim 7, further comprising:
a display unit connected to said buffer memory; and
a storage media connected to said encoding and decoding unit.
9. A computer system comprising:
a discrete cosine arithmetic unit for discrete cosine transform; and
a buffer memory connected to said discrete cosine arithmetic unit,
wherein said discrete cosine arithmetic unit includes:
a plurality of first units for pre-calculating in parallel sums of a plurality of values obtained by multiplying a plurality of input variables by a constant and/or differences between said plurality of values; and
a plurality of shift units for shifting outputs from said plurality of first units by respectively predetermined number of digit-shifts;
a plurality of second units for post-calculating sums of outputs from said plurality of shift units.
10. A computer system according to claim 9, further comprising means for beforehand calculating sums of and/or differences between a plurality of variables and values obtained by multiplying the plural variables by a constant, shifting calculated results by predetermined numbers of digit places by directly using constants resultant from the multiplications between a plurality of constants, and conducting a plurality of calculations, thereby achieving an addition for the results at a time.
11. A computer system according to claim 10, further including means for selecting, for i and j of n−1=i+j in an n-point discrete cosine transform formula, results of calculation of sums and differences between an i-th column and a j-th column immediately after data is inputted thereto in a case of transform and selecting the data itself in a case of inverse transform and means for classifying calculations thereafter into calculation of odd-numbered rows and calculation of even-numbered rows and selecting immediately before outputting results sums of and differences between the results of the groups of the even-numbered and odd-numbered rows and the results themselves, characterized in that hardware is shared between the transform and the inverse transform.
12. A computer system according to claim 10, characterized by further directly including a gate circuit for transforming an addition in an initial stage into a subtraction to achieve 1−1=0, 1−0=1, 0−1=−1, and 0−0=0 and redundant binary adders for subsequent additions.
13. A computer system according to claim 10, wherein appropriate coefficients are beforehand multiplied by a constant so that the number of non-zero coefficients after a recoding operation is less than that of the initial constants.
14. A computer system according to claim 9, further comprising a section to input and to output multimedia information of voice, image, code, and the like and a memory to buffer therein data of the multimedia information for conducting parallel input and output operations of data via the buffer memory, wherein operation of the data is performed by said discrete cosine high-speed arithmetic unit.
15. A computer system according to claim 14, wherein said data operations are performed in a realtime fashion to virtually provide a storage capacity which is larger than an actual storage capacity thereof by two to three orders of magnitude.
16. A computer system according to claim 9, further comprising means for beforehand calculating sums of and/or differences between a plurality of variables and values obtained by multiplying the plural variables by a constant, shifting calculated results by predetermined numbers of digit places by directly using a constant result determined as one constant by beforehand conducting calculation between a plurality of constants, and conducting a plurality of calculations, thereby achieving an addition for the results at a time.
17. A computer system comprising:
a discrete cosine arithmetic unit for inverse discrete cosine transform; and
a buffer memory connected to said discrete cosine arithmetic unit,
wherein said discrete cosine arithmetic unit includes:
a plurality of first units for calculating in parallel sums of a plurality of values obtained by multiplying a plurality of input variables by a constant and differences between said plurality of values; and
a plurality of shift units for shifting outputs from said plurality of first units by respectively predetermined number of digit-shifts;
a plurality of second units for calculating sums of outputs from said plurality of shift units.
18. A computer system comprising:
a discrete cosine arithmetic unit for inverse discrete cosine transform; and
a buffer memory connected to said discrete cosine arithmetic unit,
wherein said discrete cosine arithmetic unit includes:
a plurality of first units for pre-calculating in parallel sums of a plurality of values obtained by multiplying a plurality of input variables by a constant and/or differences between said plurality of values; and
a plurality of second units for post-calculating sums of outputs from said plurality of shift units.
Description
This application is a continuation of application Ser. No. 08/737,569, filed on Nov. 15, 1996, U.S. Pat. No. 6,029,185, the entire disclosure of which is a 371 of PCT/JP95/00953 filed May 18, 1995.
The present invention relates to an arithmetic unit of a computer system, and in particular, to a discrete cosine high-speed arithmetic unit suitable for calculating a sum of products using a plurality of constant function values and for compressing and decompressing data at a high speed. Moreover, the present invention relates to a high-speed Hartley transform arithmetic unit suitable for calculating a sum of products using a plurality of constant function values and thereby executing the Hartley transform processing, which is related to a Fourier transform, at a high speed. Additionally, the present invention relates to image processing, and in particular, to a Hough transform circuit for a Hough transform in which straight line components of an image are detected, the circuit being suitable for calculating a sum of products using a plurality of constant function values and executing the Hough transform processing at a high speed.
In voice and image processing, the discrete Fourier transform (DFT) and its variations such as the discrete cosine transform and the discrete Hartley transform have been widely employed. In these transform processes, a plurality of trigonometric functions are utilized, primarily to calculate sums of products between the trigonometric functions and data items. In general, the calculation cost of multiplication is higher than that of addition and subtraction. Consequently, several high-speed calculation algorithms have been devised in which the number of multiplications is reduced using relationships between trigonometric functions, e.g., the double-angle and half-angle formulas. These algorithms have been briefly described in pages 115 to 142 of the “Nikkei Electronics” No. 511 published on October 15, 1990.
In practice, the trigonometric functions are stored as constants in a memory. In particular, because the values have relatively few digits, a method has also been adopted in which the results of products between data items and trigonometric functions are stored in a memory. In addition, it is possible to utilize a known method in which each trigonometric function value is calculated by the CORDIC method using the principle of rotation of coordinates and/or by an approximate expression of the function.
In image processing, the Hough transform is often employed to detect straight lines in an image because it remains applicable even when the data contains noise. When the coordinates of an arbitrary pixel are expressed as (x,y), the Hough transform is defined as
$$R = x\cos\theta + y\sin\theta \qquad (1)$$
FIG. 23 shows the geometric relationship of the transform. R stands for the length of a perpendicular drawn from the origin of the coordinate system to a straight line passing through the pixel (x,y). Letter θ denotes the angle between the perpendicular and the positive direction of the x axis. In an actual application, for an arbitrary pixel, the angle θ takes a plurality of discrete values ranging from 0 to π, and R of expression (1) is calculated for each value of θ. R is also discretized, and its frequency of occurrence is accumulated in the form of votes over all pixels, so that the (R, θ) with the highest number of votes is detected as a straight line component.
Whether the trigonometric function values are stored as constants in a memory for later use or are calculated directly by, e.g., the CORDIC method, a considerable number of multiplications remains necessary even when a clever algorithm reduces their number. Furthermore, it is not practical to provide a multiplier for each of the multiplications, so a multiplier must be used sequentially. This hinders high-speed operation. Additionally, since a multiplier assumes an arbitrary input, a partial product is uselessly calculated for a digit place even when the value of the binary input data at that digit place is zero. When all of the results of products between data items and trigonometric function values are stored in the memory, the arithmetic unit can be easily designed, but the memory capacity is increased and hence the chip size becomes larger. Moreover, a large volume of memory is required to count the votes for the discrete (R, θ).
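The voting procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the usual parameterization R = x cos θ + y sin θ, computes it with ordinary floating-point multiplications (that is, the conventional approach, not the shift-and-add hardware of the invention), and rounds R to unit-wide bins. The function name `hough_votes` and the parameters `n_theta` and `r_step` are hypothetical.

```python
import math
from collections import Counter

def hough_votes(pixels, n_theta=16, r_step=1.0):
    """Accumulate votes over discretized (R, theta) cells.

    Assumption for illustration: R is binned by rounding R / r_step.
    """
    votes = Counter()
    for k in range(n_theta):
        theta = k * math.pi / n_theta      # discrete angles in [0, pi)
        c, s = math.cos(theta), math.sin(theta)
        for x, y in pixels:
            r = x * c + y * s              # R = x cos(theta) + y sin(theta)
            votes[(round(r / r_step), k)] += 1
    return votes

# Pixels on the vertical line x = 5: the perpendicular from the origin
# has length 5 and points along the positive x axis (theta index 0).
pixels = [(5, y) for y in range(10)]
votes = hough_votes(pixels)
best_cell, best_count = votes.most_common(1)[0]
```

Each pixel contributes one multiplication pair per direction, which is the cost the invention aims to remove.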
It is therefore an object of the present invention to provide a discrete cosine high-speed arithmetic unit, a high-speed Hartley transform arithmetic unit, and a high-speed Hough transform circuit that exploit the fact that each trigonometric function value is a constant. To minimize the number of non-zero coefficients in the binary expansion of each trigonometric function value, the value is recoded beforehand into a redundant binary representation over {−1,0,+1}. The resultant values are shifted such that pairs of non-zero coefficients are optimally grouped. For each digit position, the associated data pairs are added or subtracted according to the signs of the coefficients. The resultant values are then shifted to be aligned to a fixed position and inputted to a group of adders, which sum the partial products. In consequence, the arithmetic units and circuit above are efficiently configured in a compact structure and operate at a high speed. Since the number of non-zero coefficients in each constant is reduced and the pairs of non-zero coefficients are grouped for each digit position so that the additions are shared, the number of adders is decreased and the number of gate stages is also minimized.
FIG. 1 is a configuration diagram of a discrete cosine high-speed arithmetic unit of the present invention.
FIG. 2 is a DCT/IDCT calculation expressed in matrix form.
FIG. 3 is a table showing binary expansion values of seven cosine constants, values obtained by canonically recoding the expansion values, and values attained by multiplying the values by the square root of two.
FIG. 4 is a diagram showing the combinations of variable pairs that are added or subtracted beforehand in the anterior addition, and the shift input positions for the posterior addition.
FIG. 5 is a circuit diagram for the calculation of the one-dimensional DCT/IDCT of the present invention.
FIG. 6 is a one-digit circuit diagram for implementing x
FIG. 7 is a one-digit circuit diagram of a redundant binary adder employed according to the present invention.
FIG. 8 is an explanatory diagram of a method of achieving the two-dimensional DCT/IDCT by reducing the two-dimensional form into a one-dimensional form and a method of directly calculating the two-dimensional DCT/IDCT.
FIG. 9 is a table showing binary expansion values of cos α×cos β at two-dimensional points, i.e., 4×4 points, and recoded values thereof.
FIG. 10 is a DCT at two-dimensional 4×4 points in matrix representation.
FIG. 11 is an IDCT at two-dimensional 4×4 points in matrix representation.
FIG. 12 is a diagram showing a method in which the DCT/IDCT at two-dimensional 4×4 points is directly calculated without reducing the two-dimensional form into a one-dimensional form, together with the combinations of variable pairs that are added or subtracted beforehand in the anterior addition and the shift input positions for the posterior addition.
FIG. 13 is a circuit construction diagram of the present invention in which the DCT at two-dimensional 4×4 points is directly calculated without reducing the two-dimensional form into a one-dimensional form.
FIG. 14 is a configuration diagram of a chip in which a DCT/IDCT high-speed arithmetic unit of the present invention is incorporated.
FIG. 15 is a construction diagram of a high-speed Hartley transform arithmetic unit as an embodiment of the present invention.
FIG. 16 is an explanatory diagram showing a 16-point Hartley transform in matrix representation.
FIG. 17 is an explanatory diagram showing a re-arranged 16-point Hartley transform in matrix representation.
FIG. 18 is an explanatory diagram showing a state in which constants are developed into binary values.
FIG. 19 is a circuit block diagram showing a product sum circuit.
FIG. 20 is a circuit block diagram showing an initial stage of a butterfly arithmetic circuit.
FIG. 21 is a circuit block diagram showing the second and subsequent stages of the butterfly arithmetic circuit.
FIG. 22 is a construction diagram of a high-speed Hough transform circuit as an embodiment of the present invention.
FIG. 23 is an explanatory diagram showing a geometric relationship of a 16-directional Hough transform.
FIG. 24 is an explanatory diagram related to a binary expansion of constant values of cosine functions and grouping of common parts.
A description will first be given of an 8-point discrete cosine transform (abbreviated as DCT below). Assuming that input data and calculation data are respectively x
Moreover, the formula of the inverse DCT (abbreviated as IDCT below) is expressed as:
where,
Assuming g(i)=cos(πi/16), equation (1) can be represented in matrix form as shown in FIG.
Prior to conducting the DCT calculation, consider now the following formula of product sum:
Assuming:
in equation (4), the following relationship results:
where, a
Furthermore, if there exist a pair of coefficients a
For the product sum x In addition, appropriately using a relationship
called a canonical recode in which the i non-zero coefficients can be decreased to two non-zero coefficients, shift operations are conducted to increase the number of coefficient pairs satisfying equation (7) or (8).
Next, a description will be given of a method in which the amount of calculation of the DCT partial product sums is minimized according to these principles. First, FIG. 3 shows the binary expansion values of the seven cosine constants g(i) up to the 16-th digit, together with their canonical recode values (recoding reduces the number of non-zero coefficients from 59 to 42). Here, −1 is represented by an overline applied to 1. Additionally, since g( In the DCT calculation, g(
(rounded up at the 13-th digit below the decimal point) is calculated as follows. First, as shown in FIG. 4, u
is calculated as follows. First, as shown in FIG. 4, u In FIG. 5, it is only necessary to appropriately select the type of adders
Next, the IDCT calculation will be described. As shown in FIG. 2, the IDCT is attained by transposing the rows and columns of the DCT in the matrix representation. The difference between them is that the appearances of g(k) are grouped according to odd and even values of k in the DCT, whereas g(k) occurs for all values of k in the IDCT. However, while g(
Arranging the DCT/IDCT block diagram of FIG. 5 described above for each row of FIG. 2 in a parallel fashion for each calculation, there is provided the configuration diagram of FIG. 1 showing the discrete cosine high-speed arithmetic unit of the present invention. In short, eight original/calculation data items are simultaneously inputted to the arithmetic unit. In an anterior adder section
The DCT/IDCT described above is related to one-dimensional calculations and is primarily adopted for the compression/decompression of voices. To apply the arithmetic unit to the compression/decompression of a two-dimensional image expressed with coordinates (x,y), the image is decomposed into two one-dimensional elements associated with x-directional and y-directional scans. Namely, the results of the first one-dimensional elements related to the x-directional scan are provisionally stored in a random access memory (RAM). The rows and columns are then transposed and inputted to the second one-dimensional elements related to the y-directional scan for the calculation. The obtained results are those of the two-dimensional image.
In contrast to the conventional method described above, description will be given of a method of the present invention in which the calculation is directly achieved in the two-dimensional manner without decomposing the image into one-dimensional elements. For simplicity of explanation, description will be given of the two-dimensional case of 4×4 points.
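The two approaches contrasted above can be compared numerically for 4×4 points. The following is a minimal sketch, not the circuit of the invention: it assumes an unnormalized cosine kernel cos((2i+1)kπ/8) with all scale factors omitted, and the function names are hypothetical. It only illustrates that the buffered row-column method and the direct method based on products cos α·cos β give the same result.

```python
import math

N = 4  # 4x4 points, as in the simplified explanation

def c(i, k):
    # Cosine kernel of an unnormalized N-point DCT (scale factors omitted).
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))

def dct_1d(v):
    return [sum(v[i] * c(i, k) for i in range(N)) for k in range(N)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct_2d_row_column(block):
    # Pass 1: 1-D DCT along each row; the intermediate result is buffered.
    tmp = [dct_1d(row) for row in block]
    # Pass 2: transpose, 1-D DCT along each former column, transpose back.
    return transpose([dct_1d(col) for col in transpose(tmp)])

def dct_2d_direct(block):
    # Single pass using products cos(a) * cos(b); no intermediate buffer.
    return [[sum(block[i][j] * c(i, u) * c(j, v)
                 for i in range(N) for j in range(N))
             for v in range(N)] for u in range(N)]

block = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
a = dct_2d_row_column(block)
b = dct_2d_direct(block)
```

In the direct form the products c(i, u)·c(j, v) are constants, which is why they can be prepared in advance instead of being multiplied at run time.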
This will be easily expanded to a case of 8×8 points. In the two-dimensional case, constant values of cos α·cos β must be calculated. If the fixed values of cos α and cos β are acquired separately for the calculation, multiplications are required. However, in the case of 4×4 points, when the six combinations of the multiplied values are obtained in advance, the multiplications between the constants are unnecessary. Additionally, the storage of data in the RAM, which the conventional method requires because it decomposes the image into two one-dimensional elements, becomes unnecessary, and hence the processing speed is increased.
Assume f(i)=cos (πi/8). Then, the matrices of the two-dimensional DCT and IDCT for 4×4 points are as shown in FIGS. 10 and 11. The coefficient terms that can be handled by shifting are not needed for the explanation and hence are not shown. Incidentally, matrix F
According to the expression of the two-dimensional DCT of FIG. 10, it can be seen from FIG. 12 that the values are attained up to the 16-th digit below the floating decimal point and the pairs of additions are attained as (f(
FIG. 14 shows an example of a chip system
Description will now be given of a 16-point discrete Hartley transform (DHT). Assuming input data and calculation data to be x
Since equations (10) and (11) are of the same form when the multiplication by 1/16 (easily implemented by a shift operation) is ignored, only equation (10) will be described in the following paragraphs. Using the equation (13) shown below, equation (10) can be represented in matrix notation as shown in FIG.
Description will next be given of a method in which the amount of calculations of product sums is reduced in the DHT according to the principles of equations (4) to (9) described above. In the calculation of a DHT, the values of p, q, and square root of two appear as multiplication terms in groups as shown in FIG. Additionally, in relation to a row in which, for example, X w
is classified into three calculation groups of w First, in the calculation of
(rounded off at the 17-th digit below binary point), 2up+uq, uq−up, and uq+up are calculated in an anterior addition
First of all, the calculating circuit of each digit place of wk=xi−xj can be configured with the simple gate circuits shown in FIG. 20 without using any adder circuit, because the calculations are 0−0=0, 0−1=−1, 1−0=+1, and 1−1=0. The calculation circuit of each digit place of wk=xi+xj can be configured in the same way by considering wk=xi+xj=xi−(−xj). The value of −xj can be attained as {(inverse value of xj)+1} according to the two's complement representation. The inverse value of xj is represented by drawing an overline over xj. Furthermore, since the additions in the second and subsequent stages are in the redundant binary representation of {+1,0,−1}, a basic circuit
Moreover, in the Hartley transform circuit, the operation of x±y, a so-called butterfly operation, is often conducted. Therefore, the gate configuration is partially shared between the arithmetic circuits of x+y and x−y to resultantly obtain a butterfly arithmetic circuit
As a result of the description above, there is constructed a DHT arithmetic unit shown in FIG.
The flow of calculation steps is as shown in FIG. 15, which requires the following arithmetic units including
Next, description will be given of a method in which the amount of calculations of partial product sums is decreased according to the principles of equations (4) to (9) already described above. Description will be given of the Hough transform in a case in which the angle θ is divided into 16 directions. Only the eight directions in the range of angle 0 to π/2 will be described. Since the amount of calculations can be minimized in almost the same manner for the remaining eight directions in the range of angle π/2 to π, in which only the sign is partially changed, description thereof and a diagram related thereto are omitted.
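The remark that only the sign changes for the directions between π/2 and π can be checked directly. The short derivation below assumes the usual Hough parameterization R = x cos θ + y sin θ; the symbol θ' for the mirrored angle is introduced here for illustration only.

```latex
\begin{aligned}
\theta' &= \pi - \theta, \qquad 0 \le \theta \le \tfrac{\pi}{2},\\
R(\theta') &= x\cos(\pi-\theta) + y\sin(\pi-\theta)\\
           &= -x\cos\theta + y\sin\theta .
\end{aligned}
```

The constants cos θ and sin θ are therefore the same as for the first eight directions; only the sign applied to the x term differs.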
Prior to the realization of calculations of the Hough transform, consider first a formula of product sum represented by equation (14).
Assuming now the following equation results: In the equations above, however, there is assumed a condition of $c_{x,i}, c_{y,i} \in \{-1, 0, +1\}$. If there exists a pair of coefficients $c_{x,i}$ and $c_{y,p}$ for which $|c_{x,i}| = |c_{y,p}| = 1$, equation (17) results:
For the product sum of $x \cdot c_x + y \cdot c_y$, if there exist a plurality of pairs of coefficients which satisfy equation (17), the sums of and differences between x and $y \cdot 2^n$ (n being the shift of (p−i) digit places) are calculated according to the principle designated by equation (17). The resultant values are then shifted to their respective digit positions and added to each other, thereby decreasing the number of calculations of partial product sums. In addition, by appropriately employing the relationship of equation (18), called a canonical recoding, in which i non-zero coefficients can be reduced to two non-zero coefficients, shift operations are conducted to increase the number of coefficient pairs satisfying equation (17) as follows:
FIG. 24 shows the recoded results of eight cos values (rounded off at the 17-th digit below binary point). Moreover, according to the relationship between the equation of the Hough transform and equation (17), the common pairs of x and y for the addition and subtraction can be arranged as enclosed in a rectangle in FIG.
In a case of, for example, θ=3π/16, since R = x cos(3π/16)+y cos(5π/16), the calculation can be conducted in groups of x+y for the first, fourth, and 14-th digit places below binary point, x−2y for the eighth and 11-th digit places below binary point, and x+2y for the 16-th digit place below binary point in the anterior addition. In addition, these anterior addition groups can also be shared with other values of θ. In the conventional method based on multipliers, the values of cos(3π/16) and cos(5π/16) stored in a table are read out and multiplied by x and y, respectively. Consequently, the common anterior addition steps above have been impossible.
Values of R to be discretized are generated by an R decoder. Moreover, the R decoder is directly connected to a voting counter such that the value of the counter associated with the votes is decoded and an operation of +1 is carried out. The R decoder and voting counter are arranged for each θ. As a result of the description above, there is configured the Hough transform circuit shown in FIG.
In the multiplier-based system of the prior art, since 15 adders are necessary for one multiplier (in a 16-bit processing system), 8×(2×15+1)=248 adders are required. Additionally, when one addition is regarded as one stage, five stages of addition are used (under a condition that eight multipliers are adopted in parallel).
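The idea of recoding the constants into digits {−1,0,+1} and sharing the anterior additions can be sketched in software. The code below is a simplified illustration under stated assumptions, not the patented circuit: it uses the standard non-adjacent-form recoding, a 16-bit fraction, and pairs only digits that sit at the same digit position, whereas the text above also pairs digits at different positions through a shift (x ± y·2^n). The names `csd_digits` and `shift_add_product_sum` are hypothetical.

```python
import math

FRAC_BITS = 16  # assumed word length for this illustration

def csd_digits(n):
    """Signed digits in {-1, 0, +1} of a non-negative integer, LSB first,
    with no two adjacent non-zero digits (non-adjacent form)."""
    digits = []
    while n:
        if n & 1:
            d = 2 - (n & 3)    # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

def shift_add_product_sum(x, y, cx, cy):
    """Evaluate x*cx + y*cy for integer constants using shifts and adds."""
    dx, dy = csd_digits(cx), csd_digits(cy)
    width = max(len(dx), len(dy))
    dx += [0] * (width - len(dx))
    dy += [0] * (width - len(dy))
    s, d = x + y, x - y        # anterior additions, computed once and reused
    total = 0
    for i, (a, b) in enumerate(zip(dx, dy)):
        if a and b:
            term = a * (s if a == b else d)   # a*x + b*y with |a| = |b| = 1
        else:
            term = a * x + b * y              # at most one non-zero digit
        total += term << i                    # posterior shift and add
    return total

theta = 3 * math.pi / 16
cx = round(math.cos(theta) * 2**FRAC_BITS)
cy = round(math.sin(theta) * 2**FRAC_BITS)   # sin(3*pi/16) = cos(5*pi/16)
x, y = 37, 21
r_fixed = shift_add_product_sum(x, y, cx, cy)
```

Multiplications by the digits a and b only select a sign or zero, so in hardware they correspond to choosing between adding, subtracting, or skipping a term.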
According to the present invention, there are required in total only 24 adders including the circuit
As an application example of the present invention, there can be considered, for example, the recognition of items consisting primarily of straight line components, such as Chinese characters. Moreover, when the directions are fixed to about 16 directions, a utilization mode can be adopted in which the straight line components are first detected through a coarse detection step and then sieved for a fine detection. The sieved pixels can be further processed for a precise determination of the direction such that the operation proceeds efficiently to the subsequent work processes. For the fine determination of direction, the system conducts the calculation of equation (19) according to the addition theorem of trigonometric functions, where α is an angle with the precision of 16 divisions and β indicates an angle with a finer division. Consequently, the calculation of equation (19) is precisely achieved using multipliers in a method similar to the conventional method. Assume X=(x cos α+y sin α) and Y=(x sin α+y cos α). It can be appreciated that these values can be immediately calculated by the hardware of the present invention.
The present invention is not limited to the discrete cosine transform and Hartley transform, but can be expanded generally to trigonometric functions. Therefore, the present invention is applicable also to the discrete Fourier transform and its associated operations (such as the Wavelet transform). In addition, the present invention can be applied not only to general transforms using trigonometric functions, such as Hough transforms, but also to Radon transforms, which are obtained by generalizing Hough transforms. Moreover, the trigonometric functions can be expanded to general periodic functions.
Additionally, the application range can be expanded to a case in which either one of the operations is a product sum operation of constants. Dimensions can be increased from two dimensions to three or more dimensions, and the discretization points can be increased to more than eight.
The shift operation of the present invention is fixed as a predetermined operation. When the system to conduct the shift operation is configured with shifters, a variable construction is obtained and hence the application range is expanded. Although a large number of adders are employed, the basic circuit of an arbitrary digit place has a regular repetitive structure and hence the design scale can be easily increased.
According to the present invention, there is attained an advantage that the number of gate stages is considerably reduced and the calculation speed is increased in the DCT/IDCT, Hartley transform, and Hough transform. Moreover, since almost all of the DCT/IDCT hardware is shared, when the basic element is repeatedly used to hold (processing speed)×(area)=(constant) for the high-speed calculation, it is possible to minimize the chip area.