CA1281425C - Circuit for computing the quantized coefficient discrete cosine transform of digital signal samples - Google Patents

Circuit for computing the quantized coefficient discrete cosine transform of digital signal samples

Info

Publication number
CA1281425C
Authority
CA
Canada
Prior art keywords
circuit
output
subtracting
input
shifted
Legal status
Expired - Lifetime
Application number
CA000555741A
Other languages
French (fr)
Inventor
Benedetto Riolfo
Current Assignee
Telecom Italia SpA
Original Assignee
CSELT Centro Studi e Laboratori Telecomunicazioni SpA
Application filed by CSELT Centro Studi e Laboratori Telecomunicazioni SpA
Application granted
Publication of CA1281425C
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/147 Discrete orthonormal transforms, e.g. discrete cosine transform, discrete sine transform, and variations therefrom, e.g. modified discrete cosine transform, integer transforms approximating the discrete cosine transform
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • G06F7/533 Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even
    • G06F7/5334 Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even by using multiple bit scanning, i.e. by decoding groups of successive multiplier bits in order to select an appropriate precalculated multiple of the multiplicand as a partial product

Abstract

A circuit for computing a quantized-coefficient discrete cosine transform of digital signal samples, consisting of two parallel branches which perform multiplication and accumulation operations for the even and odd rows of a cosine transform coefficient matrix. Each branch includes: an input circuit in which the contributions of the opposing-index columns of the matrix may be added; a multiplication circuit which performs the multiplication operations for each matrix column by means of one addition and shifting operation for each matrix coefficient; and an accumulation circuit for the intermediate products of each matrix column.

Description

This invention relates to digital signal coding and, in particular, to a circuit for computing a quantized-coefficient discrete cosine transform of digital signal samples.
Digital transform coding of one-, two- and three-dimensional digital signal samples is widely used in many applications, e.g. video signal processing, which requires spectrum analysis, data compression and reduction of the original signal bandwidth.
Various types of transform coding are well known. They include the Hadamard or HCT (High Correlation Transform) types, which are based on extremely simple coefficients, and the Fourier transform, which requires complicated floating-point calculations. Yet other types include the K.L. (Karhunen-Loève) or Slant transforms, which are based on optimal energy distribution in the frequency spectrum. However, the discrete cosine transform, referred to hereunder as the DCT transform, provides the best compromise between effective representation of the transformed signal in the frequency spectrum and simplicity of construction in many applications, including video signal processing.
In the case of an N×N-base one-dimensional DCT transform, the major advantage lies in the recurrence of its N real coefficients.
A number of computational algorithms are used for the DCT transform; some are based on direct derivation of the cosine transform from the Fourier transform, while others exploit coefficient recurrence. These algorithms all serve to reduce the number of multiplications relative to the total number of operations to be carried out (addition, accumulation, addressing, normalizing, rounding-off and cut-off operations). This makes them particularly suitable for software applications, whose major objective is to reduce the number of micro-instruction cycles.
Of the several well-known computational algorithms for the N×N-base one-dimensional DCT transform, the one which provides the greatest reduction in the number of multiplications is the Fralick-Chen algorithm. This algorithm is described in "A fast computational algorithm for the discrete cosine transform", W. Chen, C.H. Smith, S.C. Fralick, IEEE Transactions on Communications, Vol. COM-25, No. 11, November 1977. The number of operations required by this algorithm is:
$\frac{3N}{2}(\log_2 N - 1) + 2$ additions; and $N\log_2 N - \frac{3N}{2} + 4$ multiplications.
For an N×N-base two-dimensional DCT transform, it is possible to exploit the separability (distributive) property of the transform and apply one-dimensional algorithms, such as the Fralick-Chen algorithm, in two orthogonal directions. In this way, the number of operations carried out is 2N times the number required for the one-dimensional case (N transforms along one direction plus N along the other).
There is, however, a 2-D transform computational algorithm which further reduces the number of operations. This algorithm is described in a paper by M. Vetterli, "Fast 2-D discrete cosine transform", IEEE ICASSP-1985. The number of operations required by this algorithm is:
$\frac{N^2}{2}\log_2 N + \frac{N^2}{3} - 2N + \frac{8}{3}$ multiplications; and $\frac{5N^2}{2}\log_2 N + \frac{N^2}{3} - 6N + \frac{62}{3}$ additions.
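The operation counts quoted above can be evaluated numerically. The sketch below is a plain transcription of the formulas, assuming base-2 logarithms and, for Vetterli, the published assignment in which the expression with the (N²/2)·log₂N term counts multiplications. The function names are illustrative only.

```python
import math


def chen_1d(n: int) -> tuple[float, float]:
    """Additions and multiplications of the Fralick-Chen N-point 1-D DCT."""
    lg = math.log2(n)
    additions = 3 * n / 2 * (lg - 1) + 2
    multiplications = n * lg - 3 * n / 2 + 4
    return additions, multiplications


def row_column_2d(n: int) -> tuple[float, float]:
    """2-D DCT as 2N one-dimensional transforms (N in each direction)."""
    additions, multiplications = chen_1d(n)
    return 2 * n * additions, 2 * n * multiplications


def vetterli_2d(n: int) -> tuple[float, float]:
    """Additions and multiplications of Vetterli's direct N x N 2-D DCT."""
    lg = math.log2(n)
    additions = 5 * n * n / 2 * lg + n * n / 3 - 6 * n + 62 / 3
    multiplications = n * n / 2 * lg + n * n / 3 - 2 * n + 8 / 3
    return additions, multiplications


if __name__ == "__main__":
    for n in (8, 16):
        rc_add, rc_mul = row_column_2d(n)
        v_add, v_mul = vetterli_2d(n)
        print(f"N={n}: row-column {rc_add:.0f} add / {rc_mul:.0f} mul, "
              f"Vetterli {v_add:.0f} add / {v_mul:.0f} mul")
```

For N = 8 this gives 416 additions and 256 multiplications for the row-column approach against 474 additions and 104 multiplications for Vetterli's algorithm: the multiplications fall sharply, while the total operation count drops from 672 to 578.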
Normally, however, the reduction in the number of operations provided by these algorithms is accompanied by a corresponding complication in the handling and re-ordering of intermediate product data, which produces serious memory-addressing problems in the design of computation circuits for these algorithms.
Moreover, the non-uniform distribution, within the computational circuits, of computational elements such as adders and multipliers, which have different propagation times, makes these circuits inefficient both with respect to reduction of the overall computation time and with respect to utilization of processing resources.
Irrespective of whether the circuits for these algorithms are designed with discrete or integrated components, the main problem remains the part of the circuit dedicated to multiplication, because of its circuit complexity, long computation time, large space occupation and power dissipation.
The best-known implementation of an N×N-bit multiplication converts it into a sequence of N elementary N-bit adding and shifting operations. This solution has been used in parallel-type multipliers with various circuit optimizations. It would not appear to be the most efficient solution for the DCT transform, even if a limited number of coefficients is used, given that the number of elementary operations to be performed is still high.
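For comparison, the conventional scheme can be modelled as an unsigned shift-and-add multiplier: each multiplier bit selects either the multiplicand or zero as a partial product, which is shifted into position and accumulated, so an N-bit multiplier always costs N add/shift steps. This is a generic illustration assuming an unsigned multiplier; the function name is hypothetical and nothing here is taken from the patent's circuits.

```python
def shift_add_multiply(a: int, b: int, nbits: int = 8) -> tuple[int, int]:
    """Multiply a by an unsigned nbits-wide multiplier b.

    Returns (product, steps); one add/shift step is spent per multiplier bit,
    whether or not that bit is set, as in one row of a parallel array multiplier.
    """
    if not 0 <= b < (1 << nbits):
        raise ValueError("b must fit in nbits unsigned bits")
    acc = 0
    steps = 0
    for i in range(nbits):
        partial = a if (b >> i) & 1 else 0   # partial product selected by bit i
        acc += partial << i                  # shift into position and accumulate
        steps += 1
    return acc, steps


if __name__ == "__main__":
    print(shift_add_multiply(13, 89))  # (1157, 8)
```

Every coefficient costs the full eight steps here, whereas the breakdown of Table 3 obtains each product from earlier ones with a single addition.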
An attempt could be made to simplify the multiplier structure by using conversion tables in ROM or PROM memories, or programmed logic arrays (PLA), which contain the results of the multiplications directly addressed by the operands. However, in some cases such structures cannot be used, because the large number of multiplication coefficients and operand representation bits would require an excessively large memory capacity.
The DCT transform computation circuits described herein solve these problems, as they require neither multiplication computations nor multipliers. At the same time, the circuits make it possible to reduce space requirements, computing time and power dissipation.
By selecting a suitable order for the coefficients of the transform matrix, and hence for the operations to be performed, and by virtue of the precision (number of bits) adopted for representing the coefficients, each multiplication is performed by means of one addition and shifting operation involving the input sample to be transformed and/or the results of previous multiplications. In this way, the entire transformation operation is performed by computing $2N^2(N-1)$ equivalent additions in the case of N=16, or $N^2(2N+1)$ equivalent additions in the case of N=8. Although the number of operations is thus drastically reduced, reordering the various intermediate products to be accumulated does not become more complicated, and the accumulation-memory addressing units also remain simple to build. This circuit solution lends itself to implementation in VLSI circuits.
Furthermore, the same circuit can be used to compute the DCT transform for any number of dimensions.
Accordingly, this invention provides a circuit for computing the discrete cosine transform of f(j) sample vectors, each of dimension N (0 ≤ j ≤ N-1), the transform having a square matrix of coefficients of dimension N×N, the coefficients repeating in absolute value in each column of the square matrix but in a different order and, in some cases, with a different sign, the circuit obtaining transformed F(k) sample vectors, each of dimension N (0 ≤ k ≤ N-1), and comprising two circuit branches in parallel, the first circuit branch for operations relating to coefficients in even rows of the square matrix and the second circuit branch for coefficients in odd rows, the two branches comprising:
(a) a first adding circuit and a first subtracting circuit, forming part of the first and the second branch respectively, which at their inputs receive pairs of samples of an f(j) vector having indices j and N-j-1 respectively, with j increasing sequentially from 0 to N/2-1;
(b) a first calculating circuit, forming part of the first branch, which, for each addition result received from the first adding circuit, calculates N/2 partial products of the result with the coefficients of the columns of the square matrix in even rows, in a sequential order of the coefficients which is fixed for all columns, producing each partial product through an addition and shifting operation which involves the previous partial products and/or the input datum;
(c) a second calculating circuit, forming part of the second branch, which, for each subtraction result received from the first subtracting circuit, calculates N/2 partial products of the result with the coefficients of the columns of the square matrix in odd rows, in a sequential order of the coefficients which is fixed for all columns, producing each partial product through an addition and shifting operation which involves the previous partial products and/or the input datum;
(d) a first and a second adding/subtracting circuit, forming part of the first and the second branch respectively, which add a partial product received from the first and the second calculating circuit respectively to the result of a previous addition or subtraction, or subtract it from that result, finding the sum in the case of a partial product referring to a positive coefficient, or the difference in the case of a negative coefficient;
(e) a first and a second set of memories, forming part of the first and the second branch respectively, each accumulating the N/2 partial results of the calculations performed by the first and the second adding/subtracting circuit respectively, the first set of memories accumulating partial results R(2k) relating to even rows of the square matrix, the second set of memories accumulating partial results R(2k+1) relating to the odd rows, the partial results being components of a transformed sample vector F(k) at the (N/2-1)th column; and
(f) a first addressing unit which generates first control signals for the first and the second calculating circuits, the first control signals determining the sequential order of the coefficients in a column, the order being fixed for all columns; which determines the addresses to be read in the first and the second set of memories for the partial results to be supplied as data to an input of the first and the second adding/subtracting circuit respectively, and for rewriting the updated partial results in the same position, the addresses having a sequence which varies with the column of the square matrix, so as to identify the partial results R(2k) and R(2k+1) whose indices correspond to the row of the square matrix containing the coefficient for which the first and the second calculating circuit compute the partial products; and which generates operation selection signals for the first and the second adding/subtracting circuit.
Embodiments of the invention are described, by way of example only, with reference to the drawings, in which:
Figure 1 is a block diagram of a computational circuit for a one-dimensional, DCT transform;
Figures 2, 3, 4 and 5 show embodiments of the ERM and ORM circuits of Figure 1, which perform multiplication operations; and
Figure 6 shows an embodiment of a computational circuit for a two-dimensional DCT transform using two circuits as shown in Figure 1.
Before describing the figures, a brief theoretical justification of the results obtained by the circuit is given. The one-dimensional DCT transform of the discrete function f(j), with j = 0, 1, ..., N-1, is defined as follows:

$$F(k) = \sqrt{\frac{2}{N}}\, c(k) \sum_{j=0}^{N-1} f(j) \cos\left[\frac{\pi}{2N}(2j+1)k\right], \qquad k = 0, 1, \ldots, N-1 \tag{1}$$

where c(k) = 1/√2 for k = 0 and c(k) = 1 for k = 1, 2, ..., N-1. The two-dimensional DCT transform of the discrete function f(i,j), with i, j = 0, 1, ..., N-1, is defined as follows:

$$F(k,x) = \frac{2}{N}\, c(k)\, d(x) \sum_{j=0}^{N-1} \cos\left[\frac{\pi}{2N}(2j+1)k\right] \sum_{i=0}^{N-1} f(i,j) \cos\left[\frac{\pi}{2N}(2i+1)x\right], \qquad k, x = 0, 1, \ldots, N-1 \tag{2}$$

where c(k) = 1/√2 for k = 0 and c(k) = 1 for k = 1, 2, ..., N-1, and d(x) = 1/√2 for x = 0 and d(x) = 1 for x = 1, 2, ..., N-1. DCT transform coefficients for the two cases of base N=16 and N=8 are shown in Tables 1 and 2 below; the coefficients are quantized at 8 bits, i.e. 7 mantissa bits and 1 sign bit:

Table 1 - DCT transform with base N=16

  64  64  64  64  64  64  64  64  64  64  64  64  64  64  64  64
  90  87  80  70  57  43  26   9  -9 -26 -43 -57 -70 -80 -87 -90
  89  75  50  18 -18 -50 -75 -89 -89 -75 -50 -18  18  50  75  89
  87  57   9 -43 -80 -90 -70 -26  26  70  90  80  43  -9 -57 -87
  84  35 -35 -84 -84 -35  35  84  84  35 -35 -84 -84 -35  35  84
  80   9 -70 -87 -26  57  90  43 -43 -90 -57  26  87  70  -9 -80
  75 -18 -89 -50  50  89  18 -75 -75  18  89  50 -50 -89 -18  75
  70 -43 -87   9  90  26 -80 -57  57  80 -26 -90  -9  87  43 -70
  64 -64 -64  64  64 -64 -64  64  64 -64 -64  64  64 -64 -64  64
  57 -80 -26  90  -9 -87  43  70 -70 -43  87   9 -90  26  80 -57
  50 -89  18  75 -75 -18  89 -50 -50  89 -18 -75  75  18 -89  50
  43 -90  57  26 -87  70   9 -80  80  -9 -70  87 -26 -57  90 -43
  35 -84  84 -35 -35  84 -84  35  35 -84  84 -35 -35  84 -84  35
  26 -70  90 -80  43   9 -57  87 -87  57  -9 -43  80 -90  70 -26
  18 -50  75 -89  89 -75  50 -18 -18  50 -75  89 -89  75 -50  18
   9 -26  43 -57  70 -80  87 -90  90 -87  80 -70  57 -43  26  -9

Table 2 - DCT transform with base N=8

   91   91   91   91   91   91   91   91
  126  106   71   25  -25  -71 -106 -126
  118   49  -49 -118 -118  -49   49  118
  106  -25 -126  -71   71  126   25 -106
   91  -91  -91   91   91  -91  -91   91
   71 -126   25  106 -106  -25  126  -71
   49 -118  118  -49  -49  118 -118   49
   25  -71  106 -126  126 -106   71  -25

It can be seen from Tables 1 and 2 that several positive or negative coefficient values recur, repeating by column in a different order and with a horizontal pseudo-mirror symmetry: in each row, the coefficients of columns j and N-j-1 are equal for the even rows and opposite for the odd rows. These recurrent coefficients can be exploited to halve the number of operations to be performed, by carrying out a preliminary addition or subtraction on opposite samples of the input sample vector f(j), with j = 0, 1, ..., N/2-1, in accordance with the following convention:
f(j) + f(N-j-1) for the even rows of the matrix
f(j) - f(N-j-1) for the odd rows of the matrix
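Tables 1 and 2 and the mirror symmetry behind the folding convention can be checked numerically. The sketch below assumes that the integer coefficients are round(256·√(2/N)·c(k)·cos[π(2j+1)k/2N]); this scaling (128 for N=8, about 90.5 for N=16) is inferred from the tabulated values rather than stated in the text, and the function names are hypothetical.

```python
import math


def quantized_dct(n: int) -> list[list[int]]:
    """Integer DCT coefficients, row k, column j, under the assumed 256*sqrt(2/N) scaling."""
    scale = 256 * math.sqrt(2 / n)
    matrix = []
    for k in range(n):
        ck = 1 / math.sqrt(2) if k == 0 else 1.0          # c(k) from equation (1)
        matrix.append([round(scale * ck * math.cos(math.pi * (2 * j + 1) * k / (2 * n)))
                       for j in range(n)])
    return matrix


def mirror_property_holds(c: list[list[int]]) -> bool:
    """Even rows are symmetric and odd rows antisymmetric about the centre of the row."""
    n = len(c)
    return all(c[k][j] == (1 if k % 2 == 0 else -1) * c[k][n - 1 - j]
               for k in range(n) for j in range(n))


if __name__ == "__main__":
    c16, c8 = quantized_dct(16), quantized_dct(8)
    print(c16[1])   # row 1 of Table 1
    print(c8[1])    # row 1 of Table 2
    print(mirror_property_holds(c16), mirror_property_holds(c8))   # True True
```

Because of this property, forming f(j) + f(N-j-1) or f(j) - f(N-j-1) first lets each row use only its first N/2 coefficients.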

Each coefficient of a column can be obtained from the preceding coefficients by means of a single addition and shift operation, keeping the odd matrix rows separate from the even rows at all times; the first coefficient of both the even and the odd rows is broken down into a sum of two powers of two.
A possible non-limiting example of the coefficient breakdown for N=16 is as follows:
Table 3 - Coefficient breakdown for N=16

Odd rows                Even rows
9 = 8 + 1               18 = 16 + 2
26 = 9×2 + 8            35 = 18×2 - 1
43 = 26×2 - 9           50 = 18 + 32
-70 = -9×8 + 2          64 = 32 + 32
80 = 64 + 16            75 = 50/2 + 50
57 = 26×2 + 80/16       84 = 50×2 - 16
87 = 43×2 + 1           89 = 50/2 + 64
90 = 26 + 64            64 = 32 + 32

A most important aspect of this breakdown is that performing a multiplication between an f(j) signal sample to be transformed and a matrix coefficient requires, at worst, one addition and two shifts (in the case of coefficient 57) between terms obtained in the previous steps. As a shift can be performed instantaneously through suitable operand justification, it can be shown that the entire transformation algorithm reduces to computing $N^3$ additions and $N^2(N-2)$ accumulations, in other words around $2N^2(N-1)$ equivalent additions.
Coefficient 70 is taken with a negative sign for reasons which will be clarified later.
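The Table 3 breakdown can be written as a short program of shifts and additions. In the sketch below, `x` stands for the folded datum entering each branch, each product is built from `x` and earlier products with one addition or subtraction, and the right shifts are exact because 50·x and 80·x are divisible by 2 and 16 respectively. Function names are hypothetical, and Python's unbounded integers do not model the circuit's finite register widths.

```python
ODD_SUCCESSION = [9, 26, 43, -70, 80, 57, 87, 90]    # Table 3, left-hand column
EVEN_SUCCESSION = [18, 35, 50, 64, 75, 84, 89, 64]   # Table 3, right-hand column


def odd_products(x: int) -> list[int]:
    """Products x*c for the odd-row succession, one add/subtract per product."""
    p9 = (x << 3) + x              # 9 = 8 + 1
    p26 = (p9 << 1) + (x << 3)     # 26 = 9*2 + 8
    p43 = (p26 << 1) - p9          # 43 = 26*2 - 9
    m70 = (x << 1) - (p9 << 3)     # -70 = -9*8 + 2 (negative sign fixed later by SMST2)
    p80 = (x << 6) + (x << 4)      # 80 = 64 + 16
    p57 = (p26 << 1) + (p80 >> 4)  # 57 = 26*2 + 80/16
    p87 = (p43 << 1) + x           # 87 = 43*2 + 1
    p90 = p26 + (x << 6)           # 90 = 26 + 64
    return [p9, p26, p43, m70, p80, p57, p87, p90]


def even_products(x: int) -> list[int]:
    """Products x*c for the even-row succession, one add/subtract per product."""
    p18 = (x << 4) + (x << 1)      # 18 = 16 + 2
    p35 = (p18 << 1) - x           # 35 = 18*2 - 1
    p50 = p18 + (x << 5)           # 50 = 18 + 32
    p64 = (x << 5) + (x << 5)      # 64 = 32 + 32
    p75 = (p50 >> 1) + p50         # 75 = 50/2 + 50
    p84 = (p50 << 1) - (x << 4)    # 84 = 50*2 - 16
    p89 = (p50 >> 1) + (x << 6)    # 89 = 50/2 + 64
    return [p18, p35, p50, p64, p75, p84, p89, p64]   # eighth slot repeats the 64 step


if __name__ == "__main__":
    for x in (-37, 0, 1, 113):
        assert odd_products(x) == [c * x for c in ODD_SUCCESSION]
        assert even_products(x) == [c * x for c in EVEN_SUCCESSION]
    print("Table 3 breakdown verified")
```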
Another non-limiting example of the coefficient breakdown, applied to base N=8, is as follows:

Table 4 - Coefficient breakdown for N=8

Odd rows                Even rows
7 = 8 - 1               7 = 8 - 1
25 = 32 - 7             49 = 7×8 - 7
71 = 64 + 7             91 = 49×2 - 7
106 = 25×2 + 7×8        91 = 49×2 - 7
126 = 128 - 2           118 = 91×2 - 64

This case requires an initial breakdown into multiples of 7, which are in turn broken down into power-of-2 factors. The total number of equivalent additions to be performed is $N^2(2N+1)$, which would seem to be greater than in the previous case of base N=16, but is in practice smaller because the value of N is lower.
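The N=8 breakdown of Table 4 can be sketched in the same way, with the auxiliary product 7·x computed first in each branch. The helper `equivalent_additions` simply transcribes the two totals stated in the text; all names are hypothetical and finite word lengths are not modelled.

```python
ODD_SUCCESSION_8 = [25, 71, 106, 126]   # Table 4, left-hand column after the auxiliary 7
EVEN_SUCCESSION_8 = [49, 91, 91, 118]   # Table 4, right-hand column after the auxiliary 7


def odd_products_8(x: int) -> list[int]:
    """Odd-row products for N=8, one add/subtract each."""
    p7 = (x << 3) - x               # auxiliary 7 = 8 - 1
    p25 = (x << 5) - p7             # 25 = 32 - 7
    p71 = (x << 6) + p7             # 71 = 64 + 7
    p106 = (p25 << 1) + (p7 << 3)   # 106 = 25*2 + 7*8
    p126 = (x << 7) - (x << 1)      # 126 = 128 - 2
    return [p25, p71, p106, p126]


def even_products_8(x: int) -> list[int]:
    """Even-row products for N=8, one add/subtract each."""
    p7 = (x << 3) - x               # auxiliary 7 = 8 - 1
    p49 = (p7 << 3) - p7            # 49 = 7*8 - 7
    p91 = (p49 << 1) - p7           # 91 = 49*2 - 7 (listed twice in Table 4)
    p118 = (p91 << 1) - (x << 6)    # 118 = 91*2 - 64
    return [p49, p91, p91, p118]


def equivalent_additions(n: int) -> int:
    """Totals quoted in the text: 2N^2(N-1) for N=16 and N^2(2N+1) for N=8."""
    if n == 16:
        return 2 * n * n * (n - 1)
    if n == 8:
        return n * n * (2 * n + 1)
    raise ValueError("only N=8 and N=16 are covered")


if __name__ == "__main__":
    print(odd_products_8(1), even_products_8(1))
    print(equivalent_additions(16), equivalent_additions(8))   # 7680 1088
```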
The transformation algorithm for the one-dimensional case requires computing the matrix product of a sequence of N-component column vectors (representing the sequence of input signals to be transformed) and the N×N transform coefficient matrix, each product yielding another N-component column vector.
In conventional circuits, this is generally carried out by multiplying all terms of a matrix row by the corresponding input vector components and accumulating the products, moving sequentially by row inside the matrix.
Instead, the inventor has used the relations between the various matrix coefficients defined above to develop a circuit of relatively simple structure which computes the matrix product with a different ordering of the intermediate products, moving by column inside the matrix. For each column, N partial contributions are computed and accumulated for the N column coefficients, but the N coefficients are always addressed in the same double succession, as shown in Table 3 or 4, in order to exploit the relationships defined above between the various coefficients. Only one half of the columns is considered, as the other half is a mirror image.
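The two orderings can be compared in software. The row-ordered loop finishes each F(k) before moving on; the column-ordered loop visits only the first N/2 columns, folds the opposite samples once per column, and spreads the products over N running accumulators kept as two halves, `mem2` for R(2k) and `mem3` for R(2k+1). The matrix builder reuses the 256·√(2/N) scaling inferred from Tables 1 and 2; this scaling and all names are assumptions.

```python
import math


def quantized_dct(n):
    scale = 256 * math.sqrt(2 / n)   # assumed scaling, inferred from Tables 1 and 2
    return [[round(scale * (1 / math.sqrt(2) if k == 0 else 1.0)
                   * math.cos(math.pi * (2 * j + 1) * k / (2 * n))) for j in range(n)]
            for k in range(n)]


def row_ordered(c, f):
    """Conventional order: one output component F(k) per matrix row."""
    return [sum(c[k][j] * f[j] for j in range(len(f))) for k in range(len(c))]


def column_ordered(c, f):
    """Column order: each folded input pair updates all N partial results."""
    n = len(f)
    mem2 = [0] * (n // 2)                   # R(2k), even rows
    mem3 = [0] * (n // 2)                   # R(2k+1), odd rows
    for j in range(n // 2):                 # the other half of the columns is a mirror image
        s = f[j] + f[n - 1 - j]             # datum for the even branch
        d = f[j] - f[n - 1 - j]             # datum for the odd branch
        for k in range(n // 2):
            mem2[k] += c[2 * k][j] * s
            mem3[k] += c[2 * k + 1][j] * d
    out = [0] * n
    out[0::2], out[1::2] = mem2, mem3       # interleave F(2k) and F(2k+1)
    return out


if __name__ == "__main__":
    c = quantized_dct(16)
    f = [12, -7, 30, 5, 0, -18, 22, 9, -3, 14, 7, -25, 16, 1, -9, 4]
    print(row_ordered(c, f) == column_ordered(c, f))   # True
```

The column order needs N accumulator words, one per output component, but each input pair is read only once.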
On the other hand, a memory with a capacity of N words is required for temporary storage of the intermediate accumulations, as the partial contributions to all F(k) components must be computed simultaneously for each f(j) input sample. Moreover, the two-dimensional transform requires a further N×N-word memory between the two orthogonal transformation processes, as the second transform step is applied to the N transposed vectors produced by the first one-dimensional step.
The size of each word is $(\log_2 N + N_b + N_s)$ bits, where $\log_2 N$ is 4 or 3 (for N=16 or N=8), $N_b$ is the precision of the coefficients (8 bits in this case), and $N_s$ is the precision of the input samples.
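As a rough check of this word size, the sketch below computes log₂N + N_b + N_s and compares it with a worst-case bound on an accumulated result. N_s = 8 is an assumed sample precision (the text leaves it open), and the bound N·127·2^(N_s-1) assumes two's-complement samples and 8-bit signed coefficients; the function names are hypothetical.

```python
import math


def word_bits(n: int, nb: int = 8, ns: int = 8) -> int:
    """Accumulator word size log2(N) + Nb + Ns, as given in the text."""
    return int(math.log2(n)) + nb + ns


def worst_case_fits(n: int, nb: int = 8, ns: int = 8) -> bool:
    """True if N * max|coefficient| * max|sample| fits in a signed word_bits(n)-bit word."""
    max_coef = (1 << (nb - 1)) - 1          # 127 for 8-bit signed coefficients
    max_sample = 1 << (ns - 1)              # 128 for 8-bit signed samples
    bound = n * max_coef * max_sample
    return bound < (1 << (word_bits(n, nb, ns) - 1))


if __name__ == "__main__":
    for n in (8, 16):
        print(n, word_bits(n), worst_case_fits(n))   # 8 19 True, then 16 20 True
```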
Reordering the various intermediate products to be accumulated permits easy address generation and management, so that no further complications arise in developing the circuit's internal addressing unit.
Figure 1 is a block diagram of a one-dimensional transform computational circuit for both N=16 and N=8.
MEM1 is a memory in which the N vector components of the f(j) samples to be transformed are stored. These components are addressed during reading by an addressing unit ADR1, so that output buses 1 and 2 simultaneously hold the components f(j) and f(N-j-1) respectively, with j increasing sequentially from 0 to N/2-1.
SOM1 is a conventional adding circuit for outputs 1 and 2 of MEM1, while SOT1 is a conventional subtracting circuit which subtracts output 2 from output 1 of MEM1.
RG1 and RG2 are two conventional registers for temporary storage of the outputs of SOM1 and SOT1 respectively.
ERM and ORM are two circuits capable of pseudo-multiplying the input-sample vector components by the coefficients of the transformation matrix in accordance with this invention. ERM is used for the even rows of the matrix and receives samples from output 3 of RG1, while ORM is used for the odd rows and receives samples from output 4 of RG2. Embodiments of ERM and ORM are considered below.
RG3 and RG4 are two conventional registers for temporary storage of the results of the operations carried out by circuits ERM and ORM, which are available on outputs 5 and 6 respectively. RG3 supplies data through output 8, and RG4 through output 9.
SMST1 and SMST2 are two conventional adding/subtracting circuits which add input 8 to input 10 and input 9 to input 11, or subtract input 8 from input 10 and input 9 from input 11. The choice of operation is controlled by circuit ADR2: the sum is performed for the positive coefficients of the transform, while the subtraction is performed for the negative coefficients.
MEM2 and MEM3 are two memories for accumulating the partial results R(k) of the additions/subtractions performed by SMST1 and SMST2.
MEM2 stores the N/2 partial results R(2k) relating to the contributions of the even coefficient-matrix rows, while MEM3 stores the N/2 partial results R(2k+1) of the odd rows (0 ≤ k ≤ N/2-1).

MEM2 and MEM3 are line-structured, with one line for each index k.
RG5 and RG6 are two registers for temporary storage of the partial results R(2k) and R(2k+1) which are read from memories MEM2 and MEM3 respectively. During transform computation, the various partial results are present at outputs 10 and 11 of RG5 and RG6. At the end of the computation, the N components of transform vector F(k) are present at outputs 12 and 13, separated into even-index components F(2k) at output 12 and odd-index components F(2k+1) at output 13.
ADR2 is an address-generating circuit which produces the read and write addresses for memories MEM2 and MEM3, the command signals for the operations performed by circuits ERM and ORM, the operation selection signals of SMST1 and SMST2, and the output selection signals of RG5 and RG6.
For each column of the transform coefficient matrix, ADR2 always supplies the same succession of command signals to circuits ERM and ORM, so that they perform the sequence of operations shown by way of example in Tables 3 and 4. The items which change as a function of the matrix column are the control of the operation performed by SMST1 and SMST2, and the addressing of memories MEM2 and MEM3 by ADR2 to read the partial results R(2k) and R(2k+1) of the appropriate index k. This is necessary because coefficients of equal absolute value are located in different positions in the various matrix columns.
The MEM2 and MEM3 address reading sequence generated by ADR2 can be deduced easily from Tables 1, 2, 3 and 4 on the basis of the following considerations.
The circuit shown in Figure 1 is divided into two branches, relating respectively to the even and odd rows of the coefficient matrix. These branches operate simultaneously; ADR2 thus simultaneously addresses pairs of partial results, one in MEM2 and one in MEM3.
For both cases, N=16 and N=8, the top left corner of the associated Table 1 or 2 is taken as the origin (column 0, row 0).
For each column m (0 ≤ m ≤ N/2-1), ADR2 addresses the positions in MEM2 and MEM3 of row index n (0 ≤ n ≤ N-1) such that the same succession of coefficients indicated in Table 3 or 4 is always produced.
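The addressing rule can be sketched as follows: for column m, each product of the fixed succession is routed to the row of matching parity whose coefficient has the same absolute value, and comparing the signs gives the add/subtract command for SMST1 or SMST2. Where a magnitude occurs twice in a column (64 in the even rows for N=16), the sketch assigns the occurrences to rows in increasing order. That tie-breaking rule, the matrix scaling and all names are assumptions, not details from the text.

```python
import math

ODD_16 = [9, 26, 43, -70, 80, 57, 87, 90]     # Table 3, left-hand succession
EVEN_16 = [18, 35, 50, 64, 75, 84, 89, 64]    # Table 3, right-hand succession


def quantized_dct(n):
    scale = 256 * math.sqrt(2 / n)   # assumed scaling, inferred from Tables 1 and 2
    return [[round(scale * (1 / math.sqrt(2) if k == 0 else 1.0)
                   * math.cos(math.pi * (2 * j + 1) * k / (2 * n))) for j in range(n)]
            for k in range(n)]


def address_sequence(c, m, parity, succession):
    """(memory line, subtract?) for each product of the succession in column m."""
    free_rows = list(range(parity, len(c), 2))    # rows handled by this branch
    seq = []
    for coef in succession:
        row = next(r for r in free_rows if abs(c[r][m]) == abs(coef))
        free_rows.remove(row)                     # a repeated magnitude goes to the next row
        subtract = (c[row][m] < 0) != (coef < 0)  # sign fix-up, e.g. for the -70 product
        seq.append((row // 2, subtract))          # line k of MEM2 (even) or MEM3 (odd)
    return seq


def transform(c, f):
    """Accumulate products into MEM2/MEM3 following the per-column address sequences."""
    n = len(f)
    mem2, mem3 = [0] * (n // 2), [0] * (n // 2)
    for m in range(n // 2):
        s, d = f[m] + f[n - 1 - m], f[m] - f[n - 1 - m]
        for mem, datum, parity, succ in ((mem2, s, 0, EVEN_16), (mem3, d, 1, ODD_16)):
            for coef, (line, subtract) in zip(succ, address_sequence(c, m, parity, succ)):
                product = coef * datum            # produced by ERM/ORM via shift-and-add
                mem[line] += -product if subtract else product
    return [mem2[k // 2] if k % 2 == 0 else mem3[k // 2] for k in range(n)]


if __name__ == "__main__":
    c = quantized_dct(16)
    print(address_sequence(c, 0, 1, ODD_16))
    f = list(range(-8, 8))
    print(transform(c, f) == [sum(c[k][j] * f[j] for j in range(16)) for k in range(16)])
```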


In these Tables, the left-hand succession (9, 26, ..., or 7, 25, 71, ...) relates to the odd rows, while the right-hand succession (18, 35, ..., or 7, 49, 91, ...) relates to the even rows.
Regarding the accumulations relating to the last ((N/2-1)th) column, the data at outputs 12 and 13 are the final results of the transformation operation, but in an unordered double sequence, i.e. with a variable index k which depends on the particular order of the coefficient sequence of Table 3 or 4 in the (N/2-1)th column of Table 1 or 2. Consequently, if the primary intention is to minimize the delay with which the circuit computes the transform, the last storage of results in MEM2 and MEM3 can be avoided, and these results can be made available on outputs 12 and 13 by means of a data-ready signal DR supplied by ADR2 at the beginning of the last series of calculations of SMST1 and SMST2 relating to the (N/2-1)th column. In this case, an external addressing unit downstream of the circuit shown in Figure 1 is given the task of ordering the results correctly.
Conversely, if it is more important to obtain a correctly ordered sequence of results (with k increasing linearly) at the outputs of the circuit shown in Figure 1, the computation results relating to the last column are still stored in MEM2 and MEM3. However, ADR2 then supplies the data-ready signal DR and addresses MEM2 and MEM3 in an alternating sequence to provide the ordered sequence of the N components of transform vector F(k) at outputs 12 and 13.
As far as address generation for MEM2 and MEM3 is concerned, ADR2 may consist of two counters, one for MEM2 and one for MEM3, which count in sequence from 0 to N/2-1 for each matrix column, followed by combinational logic which suitably changes the bit justification at the counter outputs in relation to the value of m (i.e. of the column).
Alternatively, ADR2 may consist of a matrix-structured memory which is addressed sequentially in increasing order, first by column and then by row, and in each position of which the pair of values used to address MEM2 and MEM3 is written.

Construction of either embodiment of ADR2 would not pose problems for those skilled in the art.
Though not shown in the figure, a conventional external synchronizing unit is provided to supply a clock signal to the various sequential circuits and to the ADR2 circuit. In addition, this unit provides ADR2 with a start signal for computing an input sample vector f(j).
In Figure 1, circuits MEM1 and ADR1 are not an essential part of the computational circuit for the DCT transform, but represent a possible example of how the double sequence of f(j) vector components on inputs 1 and 2 can be obtained.
If the circuit shown in Figure 1 is built as an integrated structure, it is advisable to keep these circuits outside the integrated circuit. This is partly because they might be part of existing circuits, and the techniques used will thus depend on the specific case in hand. Care should be taken, however, to obtain the particular sequence of f(j) vector components described above at inputs 1 and 2.
Figure 2 shows a first example of an embodiment of circuits ERM and ORM for N=16.
ERM consists of circuits MX1, MX2, SH1, SH2, SMST3 and RG7, while ORM consists of circuits MX3, MX4, SH3, SH4, SMST4, RG8, RG9 and RG10.
MX1 and MX2 are two conventional multiplexers, each of which receives output 3 of RG1 (Figure 1) and output 15 of RG7.
SH1 and SH2 are two circuits which shift the data applied to their input by a given, variable number of bit positions. SH1 receives the output of MX1, while SH2 receives the output of MX2. SH1 and SH2 are barrel-shifter circuits, each consisting of a bank of multiplexers which shift by a variable number of positions in a brief, fixed period (the average data propagation time through their structure). The number of shift positions is determined by the bit combination applied to their control input. These circuits are used instead of normal shift registers because of their greater shifting speed. The outputs of SH1 and SH2 are led to a conventional adding/subtracting circuit SMST3, which adds its inputs or subtracts the output of SH2 from the output of SH1. The output of SMST3 is output 5 of the ERM circuit, as well as the input of the conventional register RG7, which stores its input data upon receiving a loading signal through connection 7. The data output of RG7 is connected to the inputs of MX1 and MX2.
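A logarithmic barrel shifter of this kind can be modelled with one row of 2:1 multiplexers per bit of the shift amount: row i either passes the word unchanged or shifts it by 2^i positions. The sketch below assumes a 20-bit two's-complement datapath and a separate direction bit, with arithmetic (sign-preserving) right shifts so that divisions such as 50·x/2 keep their sign; the width, the direction encoding and the names are illustrative assumptions, not details given in the text.

```python
WIDTH = 20                       # hypothetical datapath width in bits
MASK = (1 << WIDTH) - 1


def to_signed(v: int) -> int:
    """Interpret the low WIDTH bits of v as a two's-complement number."""
    v &= MASK
    return v - (1 << WIDTH) if v >> (WIDTH - 1) else v


def barrel_shift(x: int, amount: int, right: bool = False) -> int:
    """Shift x by `amount` positions through log2 stages of 2:1 multiplexers."""
    if amount < 0:
        raise ValueError("amount must be non-negative; use `right` for direction")
    word = x & MASK
    stage = 0
    while amount >> stage:                       # one stage per control bit
        if (amount >> stage) & 1:                # control bit set: this stage shifts
            step = 1 << stage
            if right:
                word = (to_signed(word) >> step) & MASK   # arithmetic right shift
            else:
                word = (word << step) & MASK              # bits shifted out are lost
        stage += 1
    return to_signed(word)


if __name__ == "__main__":
    print(barrel_shift(5, 4), barrel_shift(-150, 1, right=True))   # 80 -75
```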
The structure of circuit ORM is similar to that of ERM, and consists of two multiplexers MX3 and MX4 followed by two barrel shifters SH3 and SH4, which supply an adding/subtracting circuit SMST4. The adding/subtracting circuit adds its inputs or subtracts the output of SH3 from the output of SH4. The output of SMST4 is output 6 of ORM, as well as the input of the conventional registers RG8, RG9 and RG10, which store their input data upon receiving a loading signal from connection 7.
MX3 receives output 4 of RG2 (Figure 1) and output 16 of RG8, while MX4 receives output 4 of RG2, output 17 of RG9 and output 18 of RG10.
The ADR2 circuit (Figure 1) supplies, through connection 7, the input selection signals of multiplexers MX1, MX2, MX3 and MX4; the bit combinations determining the extent of the shifts carried out by shifters SH1, SH2, SH3 and SH4; the loading signals for registers RG7, RG8, RG9 and RG10; and the addition/subtraction selection signals for SMST3 and SMST4. The design of the part of ADR2 which generates the signals on connection 7 is readily deduced from the following description of the ERM and ORM circuit operations.
For each datum on inputs 3 and 4, the ERM circuit multiplies the datum on input 3 by the coefficients in the right-hand column of Table 3, while ORM multiplies the datum on input 4 by the coefficients in the left-hand column, moving from the top to the bottom of the table. To multiply the datum on input 3 by the first coefficient (18), MX1 and MX2 are both switched to input 3. SH1 shifts to the left by four positions (multiplication by 16), while SH2 shifts to the left by one position (multiplication by 2). SMST3 then adds, and the result of the multiplication by 18 is passed to output 5; it is also temporarily stored in register RG7 and serves as the datum on input 15 for the multiplications by the second and third coefficients (35 and 50). For the second coefficient (35), MX1 is switched to input 15 from RG7, and MX2 to input 3. SH1 shifts to the left by one position, SH2 does not shift, and SMST3 finds the difference. For the third coefficient (50), MX1 is switched to input 15 and MX2 to input 3. SH1 does not shift, while SH2 shifts five positions to the left. SMST3 adds the two inputs and the result is stored in RG7 in place of the preceding result. The result is also made available at output 5.
For the fourth coefficient (64), MX1 and MX2 are switched to input 3. SH1 and SH2 shift five positions to the left, and SMST3 finds the sum. For the fifth coefficient (75), MX1 and MX2 are switched to input 15. SH1 shifts one position to the right (division by 2), while SH2 does not shift. SMST3 finds the sum of the inputs. For the sixth coefficient (84), MX1 is switched to input 15 and MX2 to input 3. SH1 shifts one position and SH2 four positions to the left. SMST3 finds the difference.
For the seventh coefficient (89), MX1 is switched to input 15 and MX2 to input 3. SH1 shifts 1 position to the right, SH2 shifts 6 positions to the left, and SMST3 finds the sum. For the eighth coefficient (64), the same operations are carried out as for the fourth coefficient.
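The ERM sequence above can be checked with a small software model of the Figure 2 branch. This is a sketch under stated assumptions, not the circuit's control logic. The step table, the `shift` helper and the labels `in3`/`rg7` (input 3 and input 15, that is, register RG7) are hypothetical names. A positive shift count stands for a left shift and a negative one for a right shift. SMST3 is assumed to subtract the SH2 path from the SH1 path, as claim 2 states.

```python
def shift(value, s):
    """Barrel shifter: left by s if s >= 0, arithmetic right by -s otherwise."""
    return value << s if s >= 0 else value >> -s

# (MX1 source, SH1 shift, MX2 source, SH2 shift, SMST3 op, load RG7?)
ERM_STEPS = [
    ("in3", 4, "in3", 1, "+", True),    # 18 = 16 + 2
    ("rg7", 1, "in3", 0, "-", False),   # 35 = 2*18 - 1
    ("rg7", 0, "in3", 5, "+", True),    # 50 = 18 + 32, overwrites RG7
    ("in3", 5, "in3", 5, "+", False),   # 64 = 32 + 32
    ("rg7", -1, "rg7", 0, "+", False),  # 75 = 50/2 + 50
    ("rg7", 1, "in3", 4, "-", False),   # 84 = 2*50 - 16
    ("rg7", -1, "in3", 6, "+", False),  # 89 = 50/2 + 64
    ("in3", 5, "in3", 5, "+", False),   # 64, same as the fourth step
]

def erm_products(x):
    """Products sent to output 5 for one datum x on input 3."""
    regs = {"in3": x, "rg7": 0}
    out = []
    for src1, s1, src2, s2, op, load in ERM_STEPS:
        a = shift(regs[src1], s1)           # MX1 -> SH1
        b = shift(regs[src2], s2)           # MX2 -> SH2
        r = a + b if op == "+" else a - b   # SMST3: SH1 plus or minus SH2
        if load:
            regs["rg7"] = r
        out.append(r)
    return out
```

The right shifts are exact for every integer x because they are applied only to 50x, which is even.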
To multiply the datum on input 4 by the first coefficient (9, left-hand column of Table 3), MX3 and MX4 are switched to input 4. SH3 does not shift, and SH4 shifts to the left by 3 positions (multiplication by 8). SMST4 finds the sum, and the result is stored in register RG8, where it serves as the datum for the subsequent multiplications by the second, third and fourth coefficients (26, 43 and 70). The result is also made available at output 6.
For the second coefficient (26), MX4 is switched to input 4 and MX3 to input 16. SH3 shifts 1 position to the left, and SH4 shifts to the left by 3 positions. SMST4 finds the sum, and the result is stored in register RG9. For the third coefficient (43), MX3 is switched to input 16 and MX4 to input 17. SH3 does not shift, while SH4 shifts one position to the left. SMST4 finds the difference, and the result is stored in register RG10. For the fourth coefficient (-70), MX4 is switched to input 4 and MX3 to input 16. SH3 shifts 3 positions to the left, SH4 one position to the left, and SMST4 finds the difference. The fourth coefficient is taken with a negative sign; otherwise it would be necessary to invert the inputs of SMST4, which would unnecessarily complicate the circuit. However, the correct sign of the product computed by ORM is restored by a suitable add or subtract command to SMST2 (Figure 1), without further burdening the circuit. For the fifth coefficient (80), MX3 and MX4 are switched to input 4. SH3 and SH4 shift to the left by 4 and 6 positions, respectively. SMST4 finds the sum, and the result is stored in register RG8 in place of the preceding result. For the sixth coefficient (57), MX3 is switched to input 16 and MX4 to input 17. SH3 shifts to the right by 4 positions (division by 16) and SH4 to the left by one position. SMST4 finds the sum. For the seventh coefficient (87), MX3 is switched to input 4 and MX4 to input 18. SH3 does not shift, SH4 shifts to the left by one position, and SMST4 finds the sum.
For the eighth coefficient (90), MX3 is switched to input 4 and MX4 to input 17. SH3 shifts to the left by 6 positions, SH4 does not shift, and SMST4 finds the sum.
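The ORM sequence can be modelled in the same way. This is again a hypothetical sketch. The labels `in4`, `rg8`, `rg9` and `rg10` stand for input 4 and for inputs 16, 17 and 18 (registers RG8, RG9 and RG10). SMST4 is assumed to subtract the SH3 path from the SH4 path, as claim 2 states, which is what yields -70 at the fourth step.

```python
def shift(value, s):
    """Barrel shifter: left by s if s >= 0, arithmetic right by -s otherwise."""
    return value << s if s >= 0 else value >> -s

# (MX3 source, SH3 shift, MX4 source, SH4 shift, SMST4 op, register loaded or None)
ORM_STEPS = [
    ("in4", 0, "in4", 3, "+", "rg8"),    # 9   = 1 + 8
    ("rg8", 1, "in4", 3, "+", "rg9"),    # 26  = 2*9 + 8
    ("rg8", 0, "rg9", 1, "-", "rg10"),   # 43  = 2*26 - 9
    ("rg8", 3, "in4", 1, "-", None),     # -70 = 2 - 8*9
    ("in4", 4, "in4", 6, "+", "rg8"),    # 80  = 16 + 64, overwrites RG8
    ("rg8", -4, "rg9", 1, "+", None),    # 57  = 80/16 + 2*26
    ("in4", 0, "rg10", 1, "+", None),    # 87  = 1 + 2*43
    ("in4", 6, "rg9", 0, "+", None),     # 90  = 64 + 26
]

def orm_products(x):
    """Products sent to output 6 for one datum x on input 4."""
    regs = {"in4": x, "rg8": 0, "rg9": 0, "rg10": 0}
    out = []
    for src3, s3, src4, s4, op, load in ORM_STEPS:
        a = shift(regs[src3], s3)           # MX3 -> SH3
        b = shift(regs[src4], s4)           # MX4 -> SH4
        r = b + a if op == "+" else b - a   # SMST4: SH4 plus or minus SH3
        if load:
            regs[load] = r
        out.append(r)
    return out
```

RG8 is overwritten with 80x at the fifth step, so the right shift by four positions at the sixth step is exact (80x/16 = 5x) for any integer x.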
The ERM and ORM circuits operate simultaneously. Each performs a generic multiplication by means of two shifts and an addition or subtraction, storing the result in one of the registers RG7, ..., RG10 when necessary.
Figure 3 shows a second example of an embodiment of the ERM and ORM circuits, for N=16.
ERM and ORM consist of pairs of adding or subtracting circuits and registers; each pair is dedicated to the product for a given coefficient of Table 3. Each register output is connected to an output multiplexer and, where necessary, to the input of another adding or subtracting circuit. The inputs of the adding or subtracting circuits are suitably justified in order to produce the various multiplications or divisions by powers of two appearing in Table 3.
The ADR2 circuit (Figure 1) controls only the multiplexers, producing two successions of product results in accordance with the successions shown in Table 3. ERM produces the product succession for the right-hand column coefficients of Table 3, while ORM produces the product succession for the left-hand column coefficients.

Considering the ERM circuit in greater detail, an adding circuit SM1 multiplies by the first coefficient (18) in the right-hand column of Table 3, and register RG11 memorizes the result. SM1 receives output 3 of register RG1 (Figure 1), shifted to the left by 4 positions at its first input and by one position at its second input (multiplications by 16 and 2).
A subtracting circuit ST1 multiplies by the second coefficient (35), and register RG13 memorizes the result. ST1 subtracts output 3, at its first input, from the output of RG11 shifted to the left by one position, at its second input.
An adding circuit SM2 multiplies by the third coefficient (50), and register RG12 memorizes the result. SM2 receives output 3 shifted to the left by 5 positions and the output of register RG11.
A register RG17 provides the product for the fourth and eighth coefficients (64): it simply receives output 3 shifted to the left by 6 positions.
An adding circuit SM4 multiplies by the fifth coefficient (75), and a register RG16 memorizes the result. SM4 receives the output of register RG12 shifted to the right by one position (division by 2) at its first input, and unshifted at its second input.
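A subtracting circuit ST2 multiplies by the sixth coefficient (84), and register RG14 memorizes the result. ST2 subtracts output 3 shifted to the left by 4 positions from the output of register RG12 shifted to the left by one position. An adding circuit SM3 multiplies by the seventh coefficient (89). SM3 receives output 3 shifted to the left by 6 positions and the output of register RG12 shifted to the right by one position (division by 2).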
The outputs of registers RG11, ..., RG17 are led to a 7-input multiplexer MX5. MX5 switches the output of RG17 to output 5 twice, once for the fourth coefficient and once for the eighth.
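In the Figure 3 arrangement, each product has its own adder and register, so the same arithmetic becomes a fixed dataflow. The following is a minimal sketch with a hypothetical function name. The ST2 connections are those claim 5 gives for the third subtracting circuit (output 3 shifted left by 4, subtracted from RG12 shifted left by 1). The SM3 result is held in a local variable because its register is not named here.

```python
def erm_fig3(x):
    """Hypothetical model of the Figure 3 ERM branch (N = 16)."""
    rg11 = (x << 4) + (x << 1)        # SM1: 18x
    rg13 = (rg11 << 1) - x            # ST1: 35x
    rg12 = (x << 5) + rg11            # SM2: 50x
    rg17 = x << 6                     # RG17: 64x, no adder needed
    rg16 = (rg12 >> 1) + rg12         # SM4: 75x
    rg14 = (rg12 << 1) - (x << 4)     # ST2: 84x (connections per claim 5)
    sm3 = (x << 6) + (rg12 >> 1)      # SM3: 89x
    # MX5 selection order follows the right-hand column of Table 3;
    # RG17 is selected twice (fourth and eighth coefficients).
    return [rg11, rg13, rg12, rg17, rg16, rg14, sm3, rg17]
```

The ADR2 circuit only needs to step MX5 through this fixed list, which is why addressing is simpler than in Figure 2.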
The ORM circuit comprises an adding circuit SM6, which multiplies by the first coefficient (9) in the left-hand column of Table 3, and a register RG19, which memorizes the result. SM6 receives output 4 shifted to the left by 3 positions at its first input, and unshifted at its second input.
An adding circuit SM7 multiplies by the second coefficient (26), and a register RG20 memorizes the result. SM7 receives output 4 shifted to the left by 3 positions, and the output of register RG19 shifted to the left by one position.
A subtracting circuit ST4 multiplies by the third coefficient (43), and a register RG23 memorizes the result. ST4 subtracts the output of register RG19 from the output of register RG20 shifted to the left by one position.
A subtracting circuit ST3 multiplies by the fourth coefficient (-70), and a register RG21 memorizes the result. ST3 subtracts the output of register RG19 shifted to the left by 3 positions from output 4 shifted to the left by one position. Here the coefficient sign makes no difference, as it does not influence the complexity of the circuit.
An adding circuit SM5 multiplies by the fifth coefficient (80), and a register RG18 memorizes the result. SM5 receives output 4 shifted to the left by 6 positions at its first input, and to the left by 4 positions at its second input.
An adding circuit SM9 multiplies by the sixth coefficient (57), and a register RG24 memorizes the result. SM9 receives the output of register RG18 shifted to the right by 4 positions (division by 16) and the output of register RG20 shifted to the left by one position.
An adding circuit SM10 multiplies by the seventh coefficient (87), and a register RG25 memorizes the result. SM10 receives output 4 unshifted and the output of register RG23 shifted to the left by one position.
An adding circuit SM8 multiplies by the eighth coefficient (90), and a register RG22 memorizes the result. SM8 receives output 4 shifted to the left by six positions, and the output of register RG20 unshifted.
The outputs of registers RG18, ..., RG25 are led to an 8-input multiplexer MX6.
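The ORM dataflow of Figure 3 can be modelled in the same style. The sketch assumes that SM9's right-shifted input is the 80x product held in RG18. This is the only choice that gives 80/16 + 2*26 = 57, and it is the connection claim 5 describes for the tenth adding circuit. The function name is hypothetical.

```python
def orm_fig3(x):
    """Hypothetical model of the Figure 3 ORM branch (N = 16)."""
    rg19 = (x << 3) + x               # SM6: 9x
    rg20 = (x << 3) + (rg19 << 1)     # SM7: 26x
    rg23 = (rg20 << 1) - rg19         # ST4: 43x
    rg21 = (x << 1) - (rg19 << 3)     # ST3: -70x
    rg18 = (x << 6) + (x << 4)        # SM5: 80x
    rg24 = (rg18 >> 4) + (rg20 << 1)  # SM9: 57x (RG18 input assumed)
    rg25 = x + (rg23 << 1)            # SM10: 87x
    rg22 = (x << 6) + rg20            # SM8: 90x
    # MX6 selection order follows the left-hand column of Table 3.
    return [rg19, rg20, rg23, rg21, rg18, rg24, rg25, rg22]
```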
As previously stated, the ADR2 circuit (Figure 1) controls only multiplexers MX5 and MX6, using the control signals on connection 7 to select a suitable succession of input connections to outputs 5 and 6. No special loading commands are needed for registers RG11, RG12, ..., RG25, since the data at their outputs only need to be stable and correct at the time they are taken by multiplexers MX5 and MX6. Furthermore, the data at outputs 3 and 4 of registers RG1 and RG2, respectively (Figure 1), remain fixed for the time required for circuits ERM and ORM to perform all calculations, i.e. the calculations relating to one column of the transform coefficient matrix. Consequently, it is sufficient to start register loading by means of a conventional clock signal, which is input to all circuit registers (not shown in the figure for simplicity). The various results of the operation then propagate through the cascade-connected register levels.
After the first clock pulse, the outputs of the first-level registers RG11 (ERM circuit) and RG19 (ORM circuit) hold a correct, stable datum. This datum is immediately communicated through multiplexers MX5 and MX6 to outputs 5 and 6, and so on for the following levels. MX5 and MX6 switching then follows the clock signal cadence.
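The statement that results simply ripple through the register levels, one clock pulse at a time, can be checked against the Figure 3 connections. The sketch below is a model that rests on several assumptions. Each result is labelled by the circuit that produces it (RG17 for the bare shifted input). Its level is taken as one plus the deepest register it reads. The t-th multiplexer selection is assumed to occur after the t-th clock pulse. The ST2 and SM9 dependencies follow claim 5. All names in the code are hypothetical.

```python
# Registered results each Figure 3 result reads (the held datum on
# outputs 3 and 4 is not listed: it is stable before the first clock pulse).
ERM_DEPS = {"SM1": [], "ST1": ["SM1"], "SM2": ["SM1"], "RG17": [],
            "SM4": ["SM2"], "ST2": ["SM2"], "SM3": ["SM2"]}
ORM_DEPS = {"SM6": [], "SM7": ["SM6"], "ST4": ["SM6", "SM7"], "ST3": ["SM6"],
            "SM5": [], "SM9": ["SM5", "SM7"], "SM10": ["ST4"], "SM8": ["SM7"]}

# Multiplexer selection order (Table 3 order; RG17 is selected twice).
ERM_ORDER = ["SM1", "ST1", "SM2", "RG17", "SM4", "ST2", "SM3", "RG17"]
ORM_ORDER = ["SM6", "SM7", "ST4", "ST3", "SM5", "SM9", "SM10", "SM8"]

def levels(deps):
    """Clock pulse after which each register holds a valid result."""
    memo = {}

    def level(name):
        if name not in memo:
            memo[name] = 1 + max((level(d) for d in deps[name]), default=0)
        return memo[name]

    return {name: level(name) for name in deps}

def schedule_ok(deps, order):
    """True if the t-th selection (after clock pulse t) reads only valid data."""
    lv = levels(deps)
    return all(lv[name] <= t for t, name in enumerate(order, start=1))
```

Under these assumptions, the deepest ERM results (level 3) are first needed at the fifth selection. The deepest ORM result, SM10 at level 4, is needed at the seventh. The selection order therefore never reads a register before it is valid.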
The embodiment shown in Figure 3 calls for a larger number of components (or of equivalent logic gates if integrated circuits are employed) than the example shown in Figure 2, but permits higher computing speeds and simplifies addressing by the ADR2 circuit.
Figure 4 shows a third embodiment of the circuits ERM and ORM, for N=8. As this circuit solution follows that shown in Figure 2, the reader is referred to the description of that circuit for general considerations regarding the components used, the operation and the control mode of the ADR2 circuit (Figure 1).
For each datum at inputs 3 and 4, the ERM circuit multiplies the datum by the coefficients in the right-hand column of Table 4, while ORM multiplies the datum by the coefficients in the left-hand column, proceeding from the top to the bottom of Table 4.
The ERM block consists of two multiplexers MX7 and MX8, a barrel shifter SH5 downstream of MX7, an adding/subtracting circuit SMST5 which subtracts the output of SH5 from the output of MX8, and two registers RG27 and RG28 which memorize output 5 of SMST5. At its inputs, MX7 receives output 3 and the output 20 of RG27, while the inputs of MX8 receive output 3, the output 20 of RG27 and the output 21 of RG28.

Only one barrel shifter is necessary because, to obtain the multiplications by powers of two of the first addend of the breakdowns shown in the right-hand column of Table 4, it is sufficient for the inputs of multiplexer MX8 to be suitably justified. The output 21 of register RG28 is sent to an input of MX8 shifted to the left by one position (multiplication by 2 of the products by 49 and 91). The output 20 of register RG27 and the output 3 of RG1 (Figure 1) are sent to MX8 shifted to the left by three positions (multiplication by 8 of the product by 7 and of the input datum).
To multiply the datum on input 3 by the first coefficient (7), MX7 and MX8 are switched to input 3 and SH5 does not shift. The output of SMST5 is stored in RG27; it is not accumulated in MEM2 (Figure 1) because it is used only inside ERM. For the second coefficient (49), MX7 and MX8 are switched to the output 20 of RG27 and SH5 does not shift. The output of SMST5 is stored in RG28, sent through output 5 and accumulated in MEM2 (Figure 1). For the third coefficient (91), MX7 is switched to the output 20 of RG27, MX8 is switched to the output 21 of RG28, and SH5 does not shift. The same cycle of operations is used for the fourth coefficient (91), and this time the result of SMST5 is stored in RG28. For the fifth coefficient (118), MX7 is switched to output 3, MX8 is switched to the output 21 of RG28, and SH5 shifts to the left by 6 positions (multiplication by 64).
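The following is a hypothetical model of the single-shifter ERM of Figure 4. The fixed justifications of MX8's inputs are represented as a table of left shifts. One assumption is needed for the arithmetic to work: RG28 is reloaded only after the fourth step, so both the third and fourth steps see 49x in RG28. All names in the code are illustrative.

```python
# MX8 inputs are hard-wired with fixed left shifts (the "justification").
MX8_JUSTIFY = {"in3": 3, "rg27": 3, "rg28": 1}

# (MX7 source, SH5 left shift, MX8 source, register loaded or None, to output 5?)
ERM8_STEPS = [
    ("in3", 0, "in3", "rg27", False),   # 7   = 8 - 1, auxiliary, kept inside ERM
    ("rg27", 0, "rg27", "rg28", True),  # 49  = 8*7 - 7
    ("rg27", 0, "rg28", None, True),    # 91  = 2*49 - 7
    ("rg27", 0, "rg28", "rg28", True),  # 91  again, now kept in RG28
    ("in3", 6, "rg28", None, True),     # 118 = 2*91 - 64
]

def erm8_products(x):
    """Products sent to output 5 for one datum x on input 3 (N = 8)."""
    regs = {"in3": x, "rg27": 0, "rg28": 0}
    out = []
    for src7, s5, src8, load, emit in ERM8_STEPS:
        subtrahend = regs[src7] << s5                # MX7 -> SH5
        minuend = regs[src8] << MX8_JUSTIFY[src8]    # MX8 with fixed justification
        r = minuend - subtrahend                     # SMST5
        if load:
            regs[load] = r
        if emit:
            out.append(r)
    return out
```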
The ORM circuit consists of two multiplexers MX9 and MX10, followed by two barrel shifters SH6 and SH7 which feed an adding/subtracting circuit SMST6, whose output 6 is stored in two registers RG29 and RG30. MX9 receives the output 4 of RG2 (Figure 1) and the output 22 of RG29, while MX10 receives output 4 and the output 23 of RG30.
To multiply the datum on input 4 by the first coefficient (7) in the left-hand column of Table 4, MX9 and MX10 are switched to the output 4 of RG2 (Figure 1), SH6 does not shift, SH7 shifts to the left by three positions, and SMST6 subtracts the output of SH6 from the output of SH7. The output of SMST6 is stored only in RG29 and is not accumulated in MEM2 (Figure 1), because it is used only inside ORM. For the second coefficient (25), MX9 is switched to the output 22 of RG29, MX10 is switched to output 4, SH6 does not shift, SH7 shifts to the left by five positions, and SMST6 subtracts the output of SH6 from the output of SH7; the result is stored in RG30. For the third coefficient (71), MX9 is switched to the output 22 of RG29, MX10 is switched to output 4, SH6 does not shift, SH7 shifts to the left by six positions, and SMST6 adds the outputs of SH6 and SH7. For the fourth coefficient (106), MX9 is switched to the output 22 of RG29, MX10 is switched to the output 23 of RG30, SH6 shifts to the left by 3 positions, SH7 by one position, and SMST6 adds the outputs of SH6 and SH7. For the fifth coefficient (126), MX9 and MX10 are switched to output 4, SH6 shifts to the left by one position and SH7 to the left by seven positions, and SMST6 subtracts the output of SH6 from the output of SH7.
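The Figure 4 ORM sequence in the same hypothetical style. The labels `in4`, `rg29` and `rg30` stand for output 4 and registers RG29 and RG30. SMST6 always forms the SH7 path plus or minus the SH6 path, as the text describes.

```python
# (MX9 source, SH6 shift, MX10 source, SH7 shift, op, register loaded, to output 6?)
ORM8_STEPS = [
    ("in4", 0, "in4", 3, "-", "rg29", False),   # 7   = 8 - 1, auxiliary
    ("rg29", 0, "in4", 5, "-", "rg30", True),   # 25  = 32 - 7
    ("rg29", 0, "in4", 6, "+", None, True),     # 71  = 64 + 7
    ("rg29", 3, "rg30", 1, "+", None, True),    # 106 = 8*7 + 2*25
    ("in4", 1, "in4", 7, "-", None, True),      # 126 = 128 - 2
]

def orm8_products(x):
    """Products sent to output 6 for one datum x on input 4 (N = 8)."""
    regs = {"in4": x, "rg29": 0, "rg30": 0}
    out = []
    for src9, s6, src10, s7, op, load, emit in ORM8_STEPS:
        a = regs[src9] << s6                 # MX9 -> SH6
        b = regs[src10] << s7                # MX10 -> SH7
        r = b + a if op == "+" else b - a    # SMST6: SH7 plus or minus SH6
        if load:
            regs[load] = r
        if emit:
            out.append(r)
    return out
```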
Figure 5 shows a fourth embodiment of circuits ERM and ORM, for N=8.
As this circuit solution follows that shown in Figure 3, the reader is referred to the description of that circuit for general considerations regarding the components used, the operation and the control mode of the ADR2 circuit (Figure 1).
For each datum at inputs 3 and 4, the ERM circuit multiplies the datum by the coefficients in the right-hand column of Table 4, while ORM multiplies the datum by the coefficients in the left-hand column of the Table.
In the ERM block, a subtracting circuit ST6 multiplies the datum on output 3 of RG1 (Figure 1) by the first coefficient (7) in the right-hand column of Table 4, and the result is stored in register RG31.
At its two inputs, ST6 receives output 3, which is shifted to the left by three positions at the minuend input.
A subtracting circuit ST7 multiplies by the second coefficient (49), and the result is stored in register RG32. At its two inputs, ST7 receives the output of RG31, which is shifted to the left by three positions at the minuend input.
A subtracting circuit ST8 multiplies by the third coefficient (91), and the result is stored in register RG33. ST8 subtracts the output of RG31 from the output of RG32 shifted to the left by one position.
A subtracting circuit ST9 multiplies by the fourth coefficient (118), and the result is stored in register RG34. ST9 subtracts output 3 shifted to the left by six positions from the output of RG33 shifted to the left by one position.
The outputs of registers RG32, RG33 and RG34 are sent to the inputs of a 3-input multiplexer MX11, which connects them in turn to output 5.
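The Figure 5 ERM chain can be written as straight-line code. This is a hypothetical sketch: each line corresponds to one subtracting circuit and its register, and the function returns the three values presented to MX11. Python integers stand in for the fixed-width datapath.

```python
def erm_fig5(x):
    """Hypothetical model of the Figure 5 ERM branch (N = 8)."""
    rg31 = (x << 3) - x              # ST6: 7x (used only inside ERM)
    rg32 = (rg31 << 3) - rg31        # ST7: 49x
    rg33 = (rg32 << 1) - rg31        # ST8: 91x
    rg34 = (rg33 << 1) - (x << 6)    # ST9: 118x
    return rg32, rg33, rg34          # inputs of MX11
```

Each product reuses an earlier one, so the deepest result (118x) sits four register levels from the input.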
In the ORM circuit, a subtracting circuit ST10 multiplies the datum on output 4 of RG2 (Figure 1) by the first coefficient (7) in the left-hand column of Table 4, and the result is stored in register RG35.
At its two inputs, ST10 receives output 4, which is shifted to the left by three positions at the minuend input.
A subtracting circuit ST12 multiplies by the second coefficient (25), and the result is stored in register RG37. ST12 subtracts the output of RG35 from output 4 shifted to the left by five positions.
An adding circuit SM12 multiplies by the third coefficient (71), and the result is stored in register RG38. SM12 adds output 4, shifted to the left by six positions, to the output of RG35.
An adding circuit SM13 multiplies by the fourth coefficient (106), and the result is stored in a register RG39. SM13 adds the output of RG35, shifted to the left by three positions, to the output of RG37 shifted to the left by one position.
A subtracting circuit ST11 multiplies by the fifth coefficient (126), and register RG36 stores the result. At its two inputs, ST11 receives output 4, which is shifted to the left by seven positions at the minuend input and by one position at the subtrahend input.
The outputs of registers RG36, RG37, RG38, and RG39 are sent to the inputs of a 4-input multiplexer MX12, and in turn are connected to output 6.
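The Figure 5 ORM connections written as straight-line code, again as a hypothetical sketch. The return tuple lists the MX12 inputs in coefficient order (25, 71, 106, 126), which is an illustrative choice, not a statement of the multiplexer's input numbering.

```python
def orm_fig5(x):
    """Hypothetical model of the Figure 5 ORM branch (N = 8)."""
    rg35 = (x << 3) - x                # ST10: 7x (used only inside ORM)
    rg37 = (x << 5) - rg35             # ST12: 25x
    rg38 = (x << 6) + rg35             # SM12: 71x
    rg39 = (rg35 << 3) + (rg37 << 1)   # SM13: 106x
    rg36 = (x << 7) - (x << 1)         # ST11: 126x
    return rg37, rg38, rg39, rg36      # MX12 inputs, in coefficient order
```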


The embodiment shown in Figure 5 uses a larger number of components (or of equivalent logic gates if integrated circuits are employed) than the example shown in Figure 4, but permits higher computing speeds and simplifies addressing by the ADR2 circuit.
The circuit shown in Figure 1 may be used to compute the n-dimensional DCT for any n, and in particular for the cases of greatest practical interest, i.e., n = 1, 2, 3.
In the one-dimensional case, the circuit shown in Figure 1 performs a series of calculations on an input sample vector. In the 2- and 3-dimensional cases, it can be shown that these operations are repeated 2N and 3N² times, respectively.
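The 2N and 3N² figures are consistent with the usual row-column decomposition of a separable transform. Each of the n passes applies a length-N one-dimensional transform to every line of the N x ... x N block along one axis, which is N^(n-1) lines. A minimal sketch (the function name is hypothetical):

```python
def one_d_transforms(n_dims: int, n: int) -> int:
    """Number of length-n 1-D transforms in a row-column n_dims-D DCT."""
    # n_dims passes, each over n ** (n_dims - 1) lines of the block.
    return n_dims * n ** (n_dims - 1)
```

For example, with N = 8, a 2-D block needs one_d_transforms(2, 8) = 16 = 2N one-dimensional transforms.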
Thus, the circuit shown in Figure 1 requires expansion to include memories for intermediate transforms and product accumulation, together with standard circuits for addressing these memories, which can also perform the functions of circuits MEM1 and ADR1 (Figure 1).
Advantageously, this invention may be implemented as a VLSI circuit.
Assuming a representation precision of 12 bits for each component of the input vector f(j) and 8 bits for the matrix of transform coefficients, the circuit's complexity can be evaluated for each of the variants described.
For the ERM and ORM circuits shown in Figure 2, the overall circuit requires around 3500 equivalent gates (for example, with HCMOS technology, each equivalent gate consists of four transistors), plus a 16x24-bit accumulation memory (MEM2 and MEM3 circuits, Figure 1). This embodiment provides an execution time per elementary operation of around 60 ns. For the ERM and ORM circuits shown in Figure 3, the overall structure requires around 5800 equivalent gates and an equal accumulation memory, with an execution time per elementary operation of around 20 ns.
The ERM and ORM circuits shown in Figures 4 and 5 require 2900 and 4000 equivalent gates, respectively, and an 8x23-bit accumulation memory. The execution times per elementary operation are again 60 ns and 20 ns, respectively.


Figure 6 shows an embodiment of a computational circuit for a two-dimensional DCT. It consists of two cascade-connected one-dimensional DCT computation circuits (as shown in Figure 1), indicated as DCT1 and DCT2, with an interposed memory MEM4 for temporary storage of the intermediate result vectors F(k). The ADR3 circuit generates read/write addresses for MEM4 and synchronizes the address generators ADR1 and ADR2 (Figure 1) of circuits DCT1 and DCT2. Memory MEM4 is matrix-structured to contain N vectors F(k), each of N components. It is read in a manner orthogonal to that in which it is written (read by columns and written by rows, or vice versa).
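The row-column scheme of Figure 6 can be sketched in software, with a list of lists standing in for MEM4. The function names are hypothetical and the coefficient matrix `m` is left as a parameter. The point of the sketch is that writing by rows and reading by columns yields M X M^T, the separable 2-D transform of the block X.

```python
def dct_1d(vec, m):
    """One 1-D pass (stand-in for DCT1 or DCT2): matrix m times vec."""
    n = len(vec)
    return [sum(m[k][j] * vec[j] for j in range(n)) for k in range(n)]

def dct_2d_row_column(block, m):
    """2-D transform as two 1-D passes with a transposing memory between them."""
    n = len(block)
    # DCT1 transforms each row; its results are written into MEM4 row by row.
    mem4 = [dct_1d(row, m) for row in block]
    # MEM4 is read by columns, orthogonally to the write order, and fed to DCT2.
    out_cols = [dct_1d([mem4[r][c] for r in range(n)], m) for c in range(n)]
    # out_cols[c][k] is coefficient (k, c); transpose back for [k][c] indexing.
    return [[out_cols[c][k] for c in range(n)] for k in range(n)]
```

For example, with the 2-point matrix [[1, 1], [1, -1]] and block [[1, 2], [3, 4]], the result is [[10, -2], [-4, 0]].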
The design of the circuit shown in Figure 6 would pose no problems for those skilled in the art.


Claims (14)

1. A circuit for computing the discrete cosine transform of f(j) sample vectors, each of dimension N (0 ≤ j ≤ N-1), the transform having a square matrix of coefficients of dimension N×N, the coefficients repeating in absolute value in each column of the square matrix, but with an order and, in some cases, a sign that differ, the circuit obtaining transformed F(k) sample vectors, each of dimension N (0 ≤ k ≤ N-1), and comprising two circuit branches in parallel, the first circuit branch for operations relating to coefficients in even rows of the square matrix and the second circuit branch for coefficients in odd rows, the two branches comprising:
(a) a first adding circuit and a first subtracting circuit, forming part of the first and the second branch, respectively, which at their inputs receive pairs of samples of an f(j) vector having indices j and N-j-1, with j increasing sequentially from 0 to N/2-1;
(b) a first calculating circuit, forming part of the first branch, which, for each addition result received from the first adding circuit, calculates N/2 partial products of the result with the coefficients of the columns of the square matrix in even rows, in a sequential order of the coefficients which is fixed for all columns, to produce each partial product through an addition and shifting operation which involves the previous partial products and/or the input datum;
(c) a second calculating circuit, forming part of the second branch, which, for each subtraction result received from the first subtracting circuit, calculates N/2 partial products of the result with the coefficients of the columns of the square matrix in odd rows, in a sequential order of the coefficients which is fixed for all columns, to produce each partial product through an addition and shifting operation which involves the previous partial products and/or the input datum;

(d) a first and a second adding/subtracting circuit, forming part of the first and the second branch, respectively, which combine the result of a previous addition or subtraction with a partial product received from the first and the second calculating circuit, respectively, finding the sum in the case of a partial product referring to a positive coefficient, or the difference in the case of a negative coefficient;
(e) a first set and a second set of memories, forming part of the first and the second branch, respectively, each accumulating the N/2 partial results of the calculations performed by the first and the second adding/subtracting circuit, respectively, the first set of memories accumulating partial results R(2k) relating to the even rows of the square matrix, the second set of memories accumulating partial results R(2k+1) relating to the odd rows, the partial results being the components of a transformed sample vector F(k) at the (N/2-1)th column; and
(f) a first addressing unit which generates first control signals for the first and the second calculating circuits, the first control signals determining the sequential order of the coefficients in a column, the order being fixed for all columns, and determining the addresses to be read in the first and the second set of memories for the partial results to be supplied as data to an input of the first and the second adding/subtracting circuit, respectively, and for rewriting the updated partial results in the same position, the addresses having a sequence which varies with the column of the square matrix, so as to identify the partial results R(2k) and R(2k+1) whose indices correspond to the row of the square matrix containing the coefficient for which the first and the second calculating circuits perform the partial products; the first addressing unit also generating operation selection signals for the first and the second adding/subtracting circuit.
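The even/odd split in claim 1 rests on the symmetry of the DCT-II basis, cos((2(N-1-j)+1)kπ/2N) = (-1)^k cos((2j+1)kπ/2N). Because of it, even rows need only f(j) + f(N-1-j) and odd rows only f(j) - f(N-1-j). The sketch below checks this on an integer coefficient matrix. The scale factor is an assumption: rounding 256 times the orthonormal DCT-II basis reproduces the coefficient magnitudes quoted in the description (18 to 90 for N = 16, 25 to 126 for N = 8), but the tables themselves are not reproduced here. For simplicity, the sketch multiplies directly rather than by shift-and-add. It also indexes the accumulators by row rather than through the addressing unit's column-dependent sequence.

```python
import math

def int_dct_matrix(n, scale=256):
    """Rounded, scaled orthonormal DCT-II matrix (the scale is an assumption)."""
    rows = []
    for k in range(n):
        ck = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        rows.append([round(scale * ck * math.cos((2 * j + 1) * k * math.pi / (2 * n)))
                     for j in range(n)])
    return rows

def dct_direct(f, m):
    """Full N x N matrix-vector product, for reference."""
    n = len(f)
    return [sum(m[k][j] * f[j] for j in range(n)) for k in range(n)]

def dct_even_odd(f, m):
    """Claim-1 structure: sums feed the even rows, differences the odd rows."""
    n = len(f)
    acc = [0] * n                      # accumulators R(0) ... R(N-1)
    for j in range(n // 2):            # j = 0 .. N/2 - 1
        s = f[j] + f[n - 1 - j]        # first adding circuit
        d = f[j] - f[n - 1 - j]        # first subtracting circuit
        for k in range(0, n, 2):       # first branch: even rows
            acc[k] += m[k][j] * s
        for k in range(1, n, 2):       # second branch: odd rows
            acc[k] += m[k][j] * d
    return acc
```

Each input pair thus costs N/2 products per branch instead of N per sample, which is the saving the two-branch structure exploits.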
2. A circuit according to claim 1 in which, for N=16, the first calculating circuit comprises:
(a) a first and a second multiplexer which, each at its first input, receives the output of the first adding circuit and, each at its second input, receives the output of a first register;
(b) a first and a second shifting unit which receive, respectively, the output of the first multiplexer and the output of the second multiplexer; and
(c) a third adding/subtracting circuit which receives the outputs of the first and the second shifting units, for subtracting the output of the second shifting unit from that of the first shifting unit, and the output of which is input to the first register and is the output of the first calculating circuit;
and in which the second calculating circuit comprises:
(d) a third and a fourth multiplexer which, each at its first input, receive the output of the first subtracting circuit, the third multiplexer receiving at its second input the output of a second register, the fourth multiplexer receiving at its third and its fourth input the outputs of a third and a fourth register;
(e) a third and a fourth shifting unit which receive, respectively, the outputs of the third and fourth multiplexers; and
(f) a fourth adding/subtracting circuit which receives the outputs of the third and fourth shifting units, for subtracting the output of the third shifting unit from that of the fourth shifting unit, and the output of which is input to the second, third and fourth registers and is the output of the second calculating circuit, the first, second, third and fourth registers being used for temporary storage of the previous partial products.
3. A circuit according to claim 2 in which the first control signals generated by the first addressing unit cause the following succession of operations to be performed by the first and the second calculating circuits: for the first partial product, the first, second, third and fourth multiplexers are each switched to its first input, the first, second, third and fourth shifting units shift to the left, the direction of multiplication, by four, one, zero and three positions, respectively, the third and the fourth adding/subtracting circuits find the sum, and the first and the second registers memorize the input datum;
for the second partial product, the first, second, third and fourth multiplexers are each switched to its second, first, second and first input, respectively, the first, second, third and fourth shifting units shift to the left by one, zero, one and three positions, respectively, the third and the fourth adding/subtracting circuits find the difference and the sum, respectively, and the third register memorizes the input data; for the third partial product, the first, second, third and fourth multiplexers are each switched to its second, first, second and third input, respectively, the first, second, third and fourth shifting units shift to the left by zero, five, zero and one position, respectively, the third adding/subtracting circuit finds the sum and the fourth adding/subtracting circuit finds the difference, and the first and fourth registers memorize the input data; for the fourth partial product, the first, second and fourth multiplexers are each switched to its first input, the third multiplexer to its second input, the first, second, third and fourth shifting units shift to the left by five, five, three and one position, respectively, the third adding/subtracting circuit finds the sum and the fourth adding/subtracting circuit finds the difference; for the fifth partial product, the first and second multiplexers are each switched to its second input, the third and fourth each to its first input, the first shifting unit shifts to the right, the direction of division, by one position, the third and fourth shifting units shift to the left by six and four positions, respectively, the third and the fourth adding/subtracting circuits find the sum, and the second register memorizes the input datum; for the sixth partial product, the first, second, third and fourth multiplexers are each switched to its second, first, second and third input, respectively, the first, second and fourth shifting units shift to the left by one, four and one position, respectively, the third shifting unit shifts to the right by four positions, the third adding/subtracting circuit finds the difference and the fourth adding/subtracting circuit finds the sum; for the seventh partial product, the first, second, third and fourth multiplexers are each switched to its second, first, first and fourth input, respectively, the first shifting unit shifts to the right by one position, the second and fourth shifting units shift to the left by six and one position, respectively, and the third and the fourth adding/subtracting circuits find the sum; for the eighth partial product, the first, second and third multiplexers are each switched to its first input, the fourth to its third input, the first, second and third shifting units shift to the left by five, five and six positions, respectively, and the third and fourth adding/subtracting circuits find the sum.
4. A circuit according to claim 1 in which, for N=16, the first and the second calculating circuits include six and eight adding or subtracting circuits, respectively, which produce an equal number of partial products through suitable justification of each of their two inputs; six and eight registers, respectively, which memorize the partial products; a further register having as input the input to the first calculating circuit; and multiplexers which receive the outputs of the registers, the first control signals generated by the addressing unit determining only the connection sequence between the inputs and the outputs of the multiplexers.
5. A circuit according to claim 4 in which the first calculating circuit comprises the following interconnection between the adding or subtracting circuits and the intercalated registers: the output of the first adding circuit, shifted to the left by one and four positions, is applied to a second adding circuit; the output of the first adding circuit, shifted to the left by five positions, and the output of the second adding circuit are applied to a third adding circuit; the output of the second adding circuit, shifted to the left by one position, and the output of the first adding circuit are applied to a second subtracting circuit; the output of the first adding circuit, shifted to the left by four positions, and the output of the third adding circuit, shifted to the left by one position, are applied to a third subtracting circuit; the output of the first adding circuit, shifted to the left by six positions, and the output of the third adding circuit, shifted to the right by one position, are applied to a fourth adding circuit; the output of the third adding circuit shifted to the right by one position is applied to a fifth adding circuit at one of its inputs; the further register receives the output of the first adding circuit shifted to the left by six positions; and in which the second calculating circuit comprises the following interconnection between the adding and subtracting circuits and the intercalated registers: the output of the first subtracting circuit, shifted to the left by six and four positions, respectively, and to the left by three and zero positions, respectively, is applied to a sixth and a seventh adding circuit; the output of the first subtracting circuit, shifted to the left by three positions, and the output of the seventh adding circuit, shifted to the left by one position, are applied to an eighth adding circuit; the output of the first subtracting circuit, shifted to the left by one position, and the output of the seventh adding circuit,
shifted to the left by three positions, are applied to a fourth subtracting circuit; the output of the seventh adding circuit, unshifted, and the output of the eighth adding circuit, shifted to the left by one position, are applied to the fifth subtracting circuit; the output of the first subtracting circuit, shifted to the left by six positions, and the output of the eighth adding circuit are applied to a ninth adding circuit; the output of the sixth adding circuit, shifted to the right by four positions, and the output of the eighth adding circuit, shifted to the left by one position, are applied to a tenth adding circuit; the output of the first subtracting circuit and the output of the fifth subtracting circuit, shifted to the left by one position, are applied to an eleventh adding circuit; and in which the multiplexers each connect their inputs to their output in the following order: for the first partial product, the output of the second and the output of the seventh adding circuit are connected; for the second partial product, the output of the second subtracting circuit and the output of the eighth adding circuit are connected; for the third partial product, the output of the third adding circuit and the output of the fifth subtracting circuit are connected; for the fourth partial product, the output of the further register and the output of the fourth subtracting circuit are connected; for the fifth partial product, the output of the fifth adding circuit and the output of the sixth adding circuit are connected; for the sixth partial product, the output of the third subtracting circuit and the output of the tenth adding circuit are connected;
for the seventh partial product, the outputs of the fourth and the eleventh adding circuits are connected; for the eighth partial product, the output of the further register and the output of the ninth adding circuit are connected.
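As a check on the interconnections of claim 5, the Python sketch below models each adding or subtracting circuit as an integer sum of shifted operands and reports the multiple of the input that reaches the multiplexers for each partial product. Several points are editorial assumptions, not statements of the claim: the second input of the fifth adding circuit is taken to be the unshifted output of the third adding circuit; every subtracting circuit is taken to subtract the smaller multiple from the larger, since the claim does not fix the operand order; and in the fifth subtracting circuit only the eighth adding circuit's output is shifted (giving 43y, whereas shifting both inputs would give 34y). The function names are hypothetical, and the comparison with round(64·√2·sin(mπ/32)) is an observation about the resulting numbers, not something the claim asserts.

```python
import math


def first_circuit(x: int) -> list[int]:
    """Shift-and-add network of the first calculating circuit (claim 5).

    x is the output of the first adding circuit; the list holds the value
    routed to the multiplexer for partial products 1..8.
    """
    a2 = (x << 1) + (x << 4)        # second adding circuit: 18x
    a3 = (x << 5) + a2              # third adding circuit: 50x
    s2 = (a2 << 1) - x              # second subtracting circuit: 35x
    s3 = (a3 << 1) - (x << 4)       # third subtracting circuit: 84x (operand order assumed)
    a4 = (x << 6) + (a3 >> 1)       # fourth adding circuit: 89x
    a5 = (a3 >> 1) + a3             # fifth adding circuit: 75x (second input assumed)
    reg = x << 6                    # further register: 64x
    return [a2, s2, a3, reg, a5, s3, a4, reg]


def second_circuit(y: int) -> list[int]:
    """Shift-and-add network of the second calculating circuit (claim 5).

    y is the output of the first subtracting circuit.
    """
    a6 = (y << 6) + (y << 4)        # sixth adding circuit: 80y
    a7 = (y << 3) + y               # seventh adding circuit: 9y
    a8 = (y << 3) + (a7 << 1)       # eighth adding circuit: 26y
    s4 = (a7 << 3) - (y << 1)       # fourth subtracting circuit: 70y (operand order assumed)
    s5 = (a8 << 1) - a7             # fifth subtracting circuit: 43y (only a8 shifted, assumed)
    a9 = (y << 6) + a8              # ninth adding circuit: 90y
    a10 = (a6 >> 4) + (a8 << 1)     # tenth adding circuit: 57y
    a11 = y + (s5 << 1)             # eleventh adding circuit: 87y
    return [a7, a8, s5, s4, a6, a10, a11, a9]


# Reference multipliers: 64*sqrt(2)*sin(m*pi/32), rounded to integers.
scale = 64 * math.sqrt(2)
even_ref = [round(scale * math.sin(2 * m * math.pi / 32)) for m in range(1, 8)]
odd_ref = [round(scale * math.sin((2 * m - 1) * math.pi / 32)) for m in range(1, 9)]

print(first_circuit(1))                             # [18, 35, 50, 64, 75, 84, 89, 64]
print(sorted(set(first_circuit(1))) == even_ref)    # True
print(sorted(second_circuit(1)) == odd_ref)         # True
```

Because the right shifts act on values that are always multiples of 2 (50x) and 16 (80y), they are exact for negative inputs too, so each network is a true constant multiplier.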
6. A circuit according to claim 1 in which, for N=8, the first calculating circuit includes:
(a) a fifth and a sixth multiplexer, each of which receives, at its first input, the output of the first adding circuit and, at its second input, the output of a fifth register; the sixth multiplexer receiving at its third input the output of a sixth register shifted to the left by one position, and receiving the signals at its first and second inputs shifted to the left by three positions;
(b) a fifth shifting unit which receives the output of the fifth multiplexer; and (c) a sixth subtracting circuit which subtracts the output of the fifth shifting unit from that of the sixth multiplexer, and the output of which is passed to the fifth and sixth registers and is the output of the first calculating circuit;
and in which the second calculating circuit includes:
(d) a seventh and an eighth multiplexer, each of which receives, at its first input, the output of the first subtracting circuit and, at its second input, the output of a seventh and an eighth register, respectively;
(e) a sixth and a seventh shifting unit, which receive the outputs of the seventh and the eighth multiplexer, respectively; and (f) a sixth adding/subtracting circuit, which receives the outputs of the sixth and seventh shifting units for subtracting the output of the sixth shifting unit from that of the seventh shifting unit, and whose output is passed to the seventh and eighth registers and forms the output of the second calculating circuit; the fifth, sixth, seventh and eighth registers being used for temporary storage of the previous partial product.
7. A circuit according to claim 6 in which the first control signals generated by the addressing unit cause the first and second calculating circuits to carry out the following succession of operations: for the first partial product, the fifth, sixth, seventh and eighth multiplexers are each switched to their first input, the fifth, sixth and seventh shifting units shift to the left by zero, zero and three positions, respectively, the sixth adding/subtracting circuit performs subtraction, and the fifth and seventh registers memorize the input datum; for the second partial product, the fifth, sixth and seventh multiplexers are each switched to their second input, the eighth multiplexer is switched to its first input, the fifth, sixth and seventh shifting units shift by zero, zero and five positions, respectively, the sixth adding/subtracting circuit performs subtraction, and the sixth and eighth registers memorize the input datum; for the third partial product, the fifth, sixth, seventh and eighth multiplexers are switched to their second, third, second and fifth inputs, respectively, the fifth, sixth and seventh shifting units shift to the left by zero, zero and six positions, respectively, and the sixth adding/subtracting circuit finds the sum; for the fourth partial product, the fifth, sixth, seventh and eighth multiplexers are switched to their second, third, second and second inputs, respectively, the fifth, sixth and seventh shifting units shift to the left by zero, three and one positions, respectively, the sixth adding/subtracting circuit finds the sum, and the seventh register memorizes the input datum; for the fifth partial product, the fifth, sixth, seventh and eighth multiplexers are switched to their first, third, first and first inputs, respectively, the fifth, sixth and seventh shifting units shift to the left by six, one and seven positions, respectively, and the sixth adding/subtracting circuit performs subtraction.
8. A circuit according to claim 1 in which, for N=8, the first and the second calculating circuits include, respectively, four and five adding or subtracting circuits, which produce an equal number of partial products through suitable justification of each of their two inputs; four and five registers, respectively, which memorize the results; and an output multiplexer, which receives the outputs of the registers, the first control signals generated by the addressing unit determining the sequence in which the inputs of the multiplexers are connected to their outputs.
9. A circuit according to claim 8 in which the first calculating circuit comprises the following interconnections between the adding or subtracting circuits and the intercalated registers: the output of the first adding circuit, shifted to the left by three positions, is applied to one input of a seventh subtracting circuit; the output of the seventh subtracting circuit, shifted by three positions, is applied to an eighth subtracting circuit;
the output of the seventh subtracting circuit and the output of the eighth subtracting circuit, shifted to the left by one position, are applied to a ninth subtracting circuit; the output of the first adding circuit, shifted to the left by six positions, and the output of the ninth subtracting circuit, shifted to the left by one position, are applied to a tenth subtracting circuit; and in which the second calculating circuit comprises the following interconnections between the adding or subtracting circuits and the intercalated registers: the output of the first subtracting circuit, shifted to the left by three and by zero positions, is applied to the two inputs of an eleventh subtracting circuit and, shifted to the left by seven and by one positions, to the two inputs of a twelfth subtracting circuit; the output of the first subtracting circuit, shifted to the left by five positions, and the output of the eleventh subtracting circuit are applied to a thirteenth subtracting circuit; the output of the first subtracting circuit, shifted to the left by six positions, and the output of the eleventh subtracting circuit are applied to a twelfth adding circuit; the output of the eleventh subtracting circuit, shifted to the left by three positions, and the output of the thirteenth subtracting circuit, shifted to the left by one position, are applied to a thirteenth adding circuit; the multiplexers each connecting its inputs to its output in the following order: for the first partial product, the outputs of the eighth and the thirteenth subtracting circuits are connected;
for the second partial product, the output of the ninth subtracting circuit and the output of the twelfth adding circuit are connected;
for the third partial product, the output of the ninth subtracting circuit and the output of the thirteenth adding circuit are connected; for the fourth partial product, the outputs of the tenth and the twelfth subtracting circuits are connected.
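The N=8 networks of claims 8 and 9 can be checked the same way. In this Python sketch (function names hypothetical), the unstated second inputs of the seventh and eighth subtracting circuits are assumed to be the first adding circuit's output and the seventh subtracting circuit's output, respectively; the operand shifted left by five positions into the thirteenth subtracting circuit is read as the first subtracting circuit's output, and the final stage as the "thirteenth adding circuit" named in the multiplexer sequence. Subtractions take the larger multiple first. Under those readings, the sum path yields round(128·cos(kπ/8)) and the difference path round(128·cos(kπ/16)) for odd k, which suggests coefficients with seven fractional bits; that interpretation is an inference, not a statement of the claim.

```python
import math


def first_circuit_n8(x: int) -> list[int]:
    """First calculating circuit of claim 9; x is the first adding circuit's output."""
    s7 = (x << 3) - x               # seventh subtracting circuit: 7x (second input assumed)
    s8 = (s7 << 3) - s7             # eighth subtracting circuit: 49x (second input assumed)
    s9 = (s8 << 1) - s7             # ninth subtracting circuit: 91x
    s10 = (s9 << 1) - (x << 6)      # tenth subtracting circuit: 118x
    return [s8, s9, s9, s10]        # multiplexer order, partial products 1..4


def second_circuit_n8(y: int) -> list[int]:
    """Second calculating circuit of claim 9; y is the first subtracting circuit's output."""
    s11 = (y << 3) - y              # eleventh subtracting circuit: 7y
    s12 = (y << 7) - (y << 1)       # twelfth subtracting circuit: 126y
    s13 = (y << 5) - s11            # thirteenth subtracting circuit: 25y (input read as y << 5)
    a12 = (y << 6) + s11            # twelfth adding circuit: 71y
    a13 = (s11 << 3) + (s13 << 1)   # thirteenth adding circuit: 106y
    return [s13, a12, a13, s12]     # multiplexer order, partial products 1..4


even_ref = sorted(round(128 * math.cos(k * math.pi / 8)) for k in (1, 2, 3))
odd_ref = sorted(round(128 * math.cos(k * math.pi / 16)) for k in (1, 3, 5, 7))

print(first_circuit_n8(1), even_ref)    # [49, 91, 91, 118] [49, 91, 118]
print(second_circuit_n8(1), odd_ref)    # [25, 71, 106, 126] [25, 71, 106, 126]
```

Each multiplier costs at most four adders or subtractors, which is the point of replacing general multipliers with fixed shift-and-add chains.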
10. A circuit according to claim 1 in which the first addressing unit supplies a data-ready signal at the beginning of the operations relating to the (N/2-1)th column; on the basis of this signal, the partial results R(k) at the outputs of the first and the second adding/subtracting circuits are made available at the outputs of the circuit as components of a transformed F(k) sample vector.
11. A circuit according to claim 1 in which the first addressing unit supplies, at the end of the calculations relating to the (N/2-1)th column, addresses to the first and second sets of memories for sequentially reading the R(k) partial results memorized in them and forwarding them to the outputs of the circuit as components of a transformed F(k) sample vector.
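Claims 10 and 11 imply that the partial results R(k) are built up one column at a time and are complete once the (N/2-1)th column has been processed. A minimal Python sketch of that accumulation follows, assuming the usual even/odd split of the DCT sum: the cosine for sample N-1-j equals (-1)^k times the cosine for sample j, so even k needs only f(j)+f(N-1-j) and odd k only f(j)-f(N-1-j). The coefficient table `C`, its scale `SCALE = 128` and the function names are hypothetical, and no normalization factors are applied.

```python
import math

N = 8
SCALE = 128  # assumed coefficient scale (7 fractional bits)

# Hypothetical quantized table: C[k][j] = round(SCALE * cos((2j+1) k pi / 2N)), j < N/2.
C = [[round(SCALE * math.cos((2 * j + 1) * k * math.pi / (2 * N))) for j in range(N // 2)]
     for k in range(N)]


def dct_by_columns(f: list[int]) -> list[int]:
    """Accumulate the unnormalized partial results R(k), one column j at a time."""
    assert len(f) == N
    R = [0] * N                                # running partial results R(k)
    for j in range(N // 2):                    # columns 0 .. N/2-1
        s = f[j] + f[N - 1 - j]                # role of the first adding circuit
        d = f[j] - f[N - 1 - j]                # role of the first subtracting circuit
        for k in range(N):
            # even k only needs the sum, odd k only the difference
            R[k] += C[k][j] * (s if k % 2 == 0 else d)
    return R                                   # complete after column N/2-1


def dct_reference(f: list[float]) -> list[float]:
    """Direct floating-point evaluation of the same unnormalized sum, times SCALE."""
    return [SCALE * sum(f[j] * math.cos((2 * j + 1) * k * math.pi / (2 * N)) for j in range(N))
            for k in range(N)]


f = [10, -3, 7, 0, 5, 12, -8, 4]
R = dct_by_columns(f)
ref = dct_reference(f)
bound = 0.5 * sum(abs(v) for v in f) + 1e-6   # worst-case coefficient-rounding error
print(R[0])                                    # 3456
print(all(abs(r - e) <= bound for r, e in zip(R, ref)))  # True
```

Each table entry is off by at most 0.5 from the exact scaled cosine, so the error in any R(k) is bounded by half the sum of the input magnitudes, which is what the final check confirms.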
12. A circuit according to claim 1 in which the data at the outputs of the first adding circuit, the first subtracting circuit, the first and the second calculating circuits and the first and second sets of memories are synchronized by the registers.
13. A circuit according to any one of claims 2, 3 and 6 in which the shifting units comprise barrel shifters.
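A barrel shifter shifts a word by any amount within one pass through a fixed number of stages. The Python sketch below is a behavioral model of the common logarithmic structure, in which stage i shifts by 2^i positions when bit i of the shift amount is set. The claim only names the component, so the word width, the logical (zero-fill) behavior and the function name `barrel_shift` are assumptions.

```python
def barrel_shift(value: int, amount: int, width: int = 16, left: bool = True) -> int:
    """Logical barrel shifter built from fixed-shift stages of 1, 2, 4, ... positions.

    Stage i shifts by 2**i positions when bit i of `amount` is set, so every
    amount from 0 to width-1 passes through the same number of stages.
    """
    if not 0 <= amount < width:
        raise ValueError("shift amount out of range")
    mask = (1 << width) - 1
    word = value & mask                        # keep the word `width` bits wide
    stage = 0
    while (1 << stage) < width:
        if (amount >> stage) & 1:              # control bit for this stage
            step = 1 << stage
            word = ((word << step) if left else (word >> step)) & mask
        stage += 1
    return word


print(barrel_shift(0b1011, 3))                 # 88
print(barrel_shift(0x8001, 1))                 # 2 (the top bit falls off the 16-bit word)
print(barrel_shift(0b1011000, 3, left=False))  # 11
```

In hardware each stage is a row of 2-to-1 multiplexers, so delay grows with log2(width) rather than with the shift amount, which suits the variable shifts the control signals select for each partial product.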
14. A circuit for computing a two-dimensional, discrete, cosine transform comprising one-dimensional circuits according to claim 1 in which the circuit includes a first one-dimensional circuit followed by a matrix-structured, intermediate memory and by a second one-dimensional circuit, and a second addressing unit for the intermediate memory which determines the writing of N vectors of transformed F(k) samples, calculated by the first one-dimensional circuit, and the reading, in a direction orthogonal to that of writing, of N vectors of f(j) samples to be transformed, each consisting of the (k-n)th component of the F(k) vectors present in the intermediate memory, for forwarding to the second one-dimensional circuit.
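Claim 14 describes the row-column method: a 1-D transform of every input vector, storage in an intermediate memory, orthogonal readout, and a second 1-D transform. A minimal floating-point Python sketch of that data flow, in which a list of lists stands in for the intermediate memory and the names are hypothetical; it uses the unnormalized DCT and checks the result against the direct double sum.

```python
import math


def dct_1d(f: list[float]) -> list[float]:
    """Unnormalized 1-D DCT: F(k) = sum_j f(j) cos((2j+1) k pi / 2N)."""
    n = len(f)
    return [sum(f[j] * math.cos((2 * j + 1) * k * math.pi / (2 * n)) for j in range(n))
            for k in range(n)]


def dct_2d_row_column(block: list[list[float]]) -> list[list[float]]:
    n = len(block)
    # First 1-D circuit: transform each input vector; write the N results row by row.
    memory = [dct_1d(row) for row in block]
    # Orthogonal readout: vector k gathers component k of every stored row.
    columns = [[memory[r][k] for r in range(n)] for k in range(n)]
    # Second 1-D circuit transforms each gathered vector.
    transformed = [dct_1d(col) for col in columns]
    # transformed[k][m] is coefficient (m, k); return it in row-major order.
    return [[transformed[k][m] for k in range(n)] for m in range(n)]


def dct_2d_direct(block: list[list[float]]) -> list[list[float]]:
    """Reference: the separable double sum evaluated directly."""
    n = len(block)

    def c(i: int, u: int) -> float:
        return math.cos((2 * i + 1) * u * math.pi / (2 * n))

    return [[sum(block[x][y] * c(x, u) * c(y, v) for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


block = [[(3 * x + 5 * y) % 11 - 5 for y in range(8)] for x in range(8)]
a = dct_2d_row_column(block)
b = dct_2d_direct(block)
print(max(abs(a[u][v] - b[u][v]) for u in range(8) for v in range(8)) < 1e-9)  # True
```

The separable form needs 2N one-dimensional transforms of length N instead of an N²-term sum per coefficient, which is why a single 1-D design can be reused twice around a transposing memory.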
CA000555741A 1987-01-20 1988-01-04 Circuit for computing the quantized coefficient discrete cosine transform of digital signal samples Expired - Lifetime CA1281425C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IT67032-A/87 1987-01-20
IT8767032A IT1207346B (en) 1987-01-20 1987-01-20 DISCREET DISCREET COSE COEFFI CIRCUIT FOR THE CALCULATION OF THE QUANTITIES OF NUMERICAL SIGNAL SAMPLES

Publications (1)

Publication Number Publication Date
CA1281425C (en) 1991-03-12

Family

ID=11299044

Family Applications (1)

Application Number Title Priority Date Filing Date
CA000555741A Expired - Lifetime CA1281425C (en) 1987-01-20 1988-01-04 Circuit for computing the quantized coefficient discrete cosine transform of digital signal samples

Country Status (6)

Country Link
US (1) US4849922A (en)
EP (1) EP0275979B1 (en)
JP (1) JPH0622033B2 (en)
CA (1) CA1281425C (en)
DE (1) DE3875979T2 (en)
IT (1) IT1207346B (en)

Families Citing this family (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6336180B1 (en) 1997-04-30 2002-01-01 Canon Kabushiki Kaisha Method, apparatus and system for managing virtual memory with virtual-physical mapping
FR2646046B1 (en) * 1989-04-18 1995-08-25 France Etat METHOD AND DEVICE FOR COMPRESSING IMAGE DATA BY MATHEMATICAL TRANSFORMATION WITH REDUCED COST OF IMPLEMENTATION, IN PARTICULAR FOR TRANSMISSION AT REDUCED THROUGHPUT OF IMAGE SEQUENCES
IT8921420V0 (en) * 1989-07-13 1989-07-13 Telettra Spa SYSTEM AND CIRCUIT FOR THE CALCULATION OF TWO-DIMENSIONAL DISCRETE TRANSFORMED.
US5053985A (en) * 1989-10-19 1991-10-01 Zoran Corporation Recycling dct/idct integrated circuit apparatus using a single multiplier/accumulator and a single random access memory
US5359549A (en) * 1989-12-01 1994-10-25 Ricoh Company, Ltd. Orthogonal transformation processor for compressing information
US5268853A (en) * 1989-12-01 1993-12-07 Ricoh Company, Ltd. Orthogonal transformation processor for compressing information
DE69225628T2 (en) * 1991-02-19 1998-11-26 Matsushita Electric Ind Co Ltd Orthogonal transformation device for video signal processing
US5257213A (en) * 1991-02-20 1993-10-26 Samsung Electronics Co., Ltd. Method and circuit for two-dimensional discrete cosine transform
JP2964172B2 (en) * 1991-03-08 1999-10-18 富士通株式会社 DCT matrix operation circuit
JP2866754B2 (en) * 1991-03-27 1999-03-08 三菱電機株式会社 Arithmetic processing unit
FR2683694A1 (en) * 1991-11-08 1993-05-14 Matra Communication Video signal coding device with time-based activity
EP0575675B1 (en) * 1992-06-26 1998-11-25 Discovision Associates Method and apparatus for transformation of signals from a frequency to a time domaine
US5394349A (en) * 1992-07-10 1995-02-28 Xing Technology Corporation Fast inverse discrete transform using subwords for decompression of information
US5339265A (en) * 1992-08-31 1994-08-16 University Of Maryland At College Park Optimal unified architectures for the real-time computation of time-recursive discrete sinusoidal transforms
JP2725544B2 (en) * 1992-11-12 1998-03-11 日本電気株式会社 DCT and inverse DCT operation device and operation method thereof
US5345408A (en) * 1993-04-19 1994-09-06 Gi Corporation Inverse discrete cosine transform processor
US5829007A (en) 1993-06-24 1998-10-27 Discovision Associates Technique for implementing a swing buffer in a memory array
AT402586B (en) * 1994-05-05 1997-06-25 Siemens Ag Oesterreich Method for carrying out the discrete cosine transform
GB2307072B (en) 1994-06-10 1998-05-13 Advanced Risc Mach Ltd Interoperability with multiple instruction sets
US5943502A (en) * 1994-12-09 1999-08-24 Neomagic Israel Ltd. Apparatus and method for fast 1D DCT
US5784011A (en) * 1996-06-14 1998-07-21 Lsi Logic Corporation Multiplier circuit for performing inverse quantization arithmetic
US5781239A (en) * 1996-06-20 1998-07-14 Lsi Logic Corporation System and method for performing an optimized inverse discrete cosine transform with improved efficiency
AUPO648397A0 (en) 1997-04-30 1997-05-22 Canon Information Systems Research Australia Pty Ltd Improvements in multiprocessor architecture operation
US6246396B1 (en) 1997-04-30 2001-06-12 Canon Kabushiki Kaisha Cached color conversion method and apparatus
AUPO647997A0 (en) 1997-04-30 1997-05-22 Canon Information Systems Research Australia Pty Ltd Memory controller architecture
US6414687B1 (en) 1997-04-30 2002-07-02 Canon Kabushiki Kaisha Register setting-micro programming system
US6674536B2 (en) 1997-04-30 2004-01-06 Canon Kabushiki Kaisha Multi-instruction stream processor
US6707463B1 (en) 1997-04-30 2004-03-16 Canon Kabushiki Kaisha Data normalization technique
US6356995B2 (en) * 1998-07-02 2002-03-12 Picoturbo, Inc. Microcode scalable processor
JP3934290B2 (en) * 1999-09-30 2007-06-20 株式会社東芝 Discrete cosine transform processing device, inverse discrete cosine transform processing device, discrete cosine transform processing device, and inverse discrete cosine transform processing device
AU2578001A (en) * 1999-12-10 2001-06-18 Broadcom Corporation Apparatus and method for reducing precision of data
US6895421B1 (en) 2000-10-06 2005-05-17 Intel Corporation Method and apparatus for effectively performing linear transformations
WO2002035380A1 (en) * 2000-10-23 2002-05-02 International Business Machines Corporation Faster transforms using scaled terms, early aborts, and precision refinements
US6859815B2 (en) * 2000-12-19 2005-02-22 Koninklijke Philips Electronics N.V. Approximate inverse discrete cosine transform for scalable computation complexity video and still image decoding
DE10311323A1 (en) * 2003-03-14 2004-09-30 Infineon Technologies Ag Device for synchronizing a mobile radio receiver to a frame structure of a received radio signal
CN107066234B (en) * 2017-04-21 2020-05-26 重庆邮电大学 Design method of quantum multiplier

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US4196448A (en) * 1978-05-15 1980-04-01 The United States Of America As Represented By The Secretary Of The Navy TV bandwidth reduction system using a hybrid discrete cosine DPCM
US4385363A (en) * 1978-12-15 1983-05-24 Compression Labs, Inc. Discrete cosine transformer
US4293920A (en) * 1979-09-04 1981-10-06 Merola Pasquale A Two-dimensional transform processor
US4449194A (en) * 1981-09-25 1984-05-15 Motorola Inc. Multiple point, discrete cosine processor
US4562484A (en) * 1983-08-19 1985-12-31 Advanced Micro Devices, Inc. Method and device for decoding two-dimensional facsimile signals
FR2561011B1 (en) * 1984-03-09 1986-09-12 Cit Alcatel PROCESSOR FOR CALCULATING A DISCRETE INVERSE COSINUS TRANSFORM
FR2561010B1 (en) * 1984-03-09 1986-09-12 Cit Alcatel PROCESSOR FOR CALCULATING A DISCRETE COSINUS TRANSFORM

Also Published As

Publication number Publication date
JPH0622033B2 (en) 1994-03-23
IT8767032A0 (en) 1987-01-20
EP0275979B1 (en) 1992-11-19
US4849922A (en) 1989-07-18
JPS63182773A (en) 1988-07-28
IT1207346B (en) 1989-05-17
DE3875979D1 (en) 1992-12-24
DE3875979T2 (en) 1993-04-29
EP0275979A3 (en) 1989-11-02
EP0275979A2 (en) 1988-07-27

Similar Documents

Publication Publication Date Title
CA1281425C (en) Circuit for computing the quantized coefficient discrete cosine transform of digital signal samples
US6073154A (en) Computing multidimensional DFTs in FPGA
Madisetti et al. A 100 MHz 2-D 8×8 DCT/IDCT processor for HDTV applications
US6366936B1 (en) Pipelined fast fourier transform (FFT) processor having convergent block floating point (CBFP) algorithm
US5105378A (en) High-radix divider
WO1994010638A1 (en) Scalable dimensionless array
US5831883A (en) Low energy consumption, high performance fast fourier transform
US3591787A (en) Division system and method
US11074041B2 (en) Method and system for elastic precision enhancement using dynamic shifting in neural networks
Lin et al. Scalable montgomery modular multiplication architecture with low-latency and low-memory bandwidth requirement
Kammoun et al. Hardware acceleration of approximate transform module for the versatile video coding standard
Lim et al. A serial-parallel architecture for two-dimensional discrete cosine and inverse discrete cosine transforms
Coelho et al. Computation of 2D 8× 8 DCT based on the Loeffler factorization using algebraic integer encoding
US3591784A (en) Real time digital fourier analyzer
Cardarilli et al. RNS applications in digital signal processing
Ruiz et al. Parallel-pipeline 8×8 forward 2-D ICT processor chip for image coding
Tawalbeh Radix-4 asic design of a scalable montgomery modular multiplier using encoding techniques
Bruguera et al. 2-D DCT using on-line arithmetic
Villalba et al. Radix-4 vectoring cordic algorithm and architectures
Li et al. Low power design of two-dimensional DCT
Ismail et al. High speed on-chip multiple cosine transform generator
Duspara et al. Discrete cosine transform hardware accelerator in parallel ultra-low power system
Parhami Modular reduction by multi-level table lookup
Mora et al. High-performance architecture for digital transform processing
Nair et al. An asynchronous double precision floating point multiplier

Legal Events

Date Code Title Description
MKLA Lapsed