Publication number: US3662161 A
Publication type: Grant
Publication date: May 9, 1972
Filing date: Nov 3, 1969
Priority date: Nov 3, 1969
Publication number: US 3662161 A, US 3662161A, US-A-3662161, US3662161 A, US3662161A
Inventors: Bergland Glenn D, Wilson Donald E
Original Assignee: Bell Telephone Labor Inc
Global highly parallel fast fourier transform processor
US 3662161 A
A fast Fourier transform processor and associated process wherein an input sequence of samples is broadcast to each of a plurality of parallel processing elements where sets of accumulated sums of products of these samples with appropriate trigonometric function values are maintained. These sets of accumulated sums are then individually Fourier transformed in parallel to form the Fourier coefficients corresponding to the original input sequence.
Description  (OCR text may contain errors)

United States Patent [15] 3,662,161
Bergland et al. May 9, 1972

GLOBAL HIGHLY PARALLEL FAST FOURIER TRANSFORM PROCESSOR

Inventors: Glenn D. Bergland, Morris Township, Morris County; Donald E. Wilson, Brookside, both of N.J.

Assignee: Bell Telephone Laboratories, Incorporated, Murray Hill, Berkeley Heights, N.J.

[22] Filed: Nov. 3, 1969
[21] Appl. No.: 873,587

[52] U.S. Cl.: ... 444/1
[51] Int. Cl.: G06f 7/38, G06f /34
[58] Field of Search: 235/156; 324/77 6; 340/155

[56] References Cited

UNITED STATES PATENTS
3,517,173 6/1970 Gilmartin, Jr. et al. 235/156
3,544,775 12/1970 Bergland et al. 235/151.31

OTHER PUBLICATIONS
Bergland, "FFT Hardware Implementations - An Overview," IEEE Trans. on Audio & Electroacoustics, June 1969, pp. 104-108.
Bergland-Wilson, "A FFT Algorithm for a Global, Highly Parallel Processor," IEEE Trans. on Audio & Electroacoustics, Vol. AU-17, No. 2, June 1969, pp. 125-127.
Pease, "Organization of Large Scale Fourier Processors," Journal of the Association for Computing Machinery, Vol. 16, No. 3, July 1969, pp. 474-482.

Primary Examiner: Malcolm A. Morrison
Assistant Examiner: David H. Malzahn
Attorney: R. J. Guenther and William L. Keefauver

[57] ABSTRACT

A fast Fourier transform processor and associated process wherein an input sequence of samples is broadcast to each of a plurality of parallel processing elements where sets of accumulated sums of products of these samples with appropriate trigonometric function values are maintained. These sets of accumulated sums are then individually Fourier transformed in parallel to form the Fourier coefficients corresponding to the original input sequence.







[Flow chart text from the drawings: ADD PRODUCT TO SUM IN RESPECTIVE k-th REGISTERS; PERFORM TWIDDLE MULTIPLICATION; REORDER; PERFORM $r_1$ $r_2$-POINT FOURIER TRANSFORMS]

This invention relates to methods and apparatus for signal processing. More particularly, this invention relates to methods and apparatus for the frequency analysis of data signals. Still more particularly, this invention relates to methods and apparatus for performing fast Fourier transforms using parallel processing techniques.

BACKGROUND OF THE INVENTION

Machine methods for frequency analysis and synthesis of signals using Fourier series and integral techniques have long been important areas of scientific and engineering investigation. In recent years there have been developed improved means and methods for performing such analyses and syntheses. Among these improved techniques are included those known collectively as fast Fourier transform (FFT) techniques. These FFT techniques originated in recent history with a paper titled "An Algorithm for the Machine Calculation of Complex Fourier Series," by J. W. Cooley and J. W. Tukey, Mathematics of Computation, Vol. 19, April 1965, pp. 297-301. The computational advantages demonstrated by this paper have spurred research in areas previously felt to be beyond economic feasibility. These advantages, often involving computational savings of time and machine complexity of an order of magnitude or more compared with classical techniques, flow largely from judicious groupings and reorganizations of summation techniques known in the prior art.

Numerous improvements and variations of the original FFT techniques have been developed since the publication of the original Cooley-Tukey paper, several of which were summarized in the June, 1967 and June, 1969 issues of the IEEE Transactions on Audio and Electroacoustics. Other important developments have been disclosed, for example, in U.S. patent applications by M. J. Gilmartin et al., Ser. No. 605,768, filed Dec. 29, 1966, now U.S. Pat. No. 3,517,173, issued June 23, 1970, and G. D. Bergland et al., Ser. No. 605,791, filed Dec. 29, 1966, now U.S. Pat. No. 3,544,775, issued Dec. 1, 1970. Another reference that should prove helpful in understanding the present invention in light of the prior art is one by R. Klahn and R. R. Shively, "FFT Shortcut to Fourier Analysis," Electronics, Vol. 41, No. 8, Apr. 15, 1968, pp. 124-129.

Many data processing applications require identical operations on multiple sets of data. Most present digital computers accomplish this through the use of a single processor that operates on one data set at a time. When the number of data sets is large, however, even the fastest of these computers is too slow to perform its task in a reasonable amount of time. The need for more efficient "bulk" processing has contributed to recent interest in multiprocessing with highly parallel processors. See, for example, S. H. Unger, "A Computer Oriented Towards Spatial Problems," Proc. IRE, Vol. 46, Oct. 1958, pp. 1744-1750; J. H. Holland, "A Universal Computer Capable of Executing an Arbitrary Number of Subprograms Simultaneously," Proc. Eastern Joint Comp. Conf., Boston, Mass., Dec. 1-3, 1959, p. 108; J. Gregory and R. McReynolds, "The SOLOMON Computer," IEEE Trans. on Electronic Computers, Vol. 12, Dec. 1963, pp. 774-781; W. T. Comfort, "Highly Parallel Machines," Proc. 1962 Workshop on Computer Organization, Spartan Books, Washington, D.C., 1963, p. 126.

Such machines obtain higher processing rates through the use of a large number of identical or similar processing units that operate simultaneously or in an overlapping mode, each on its own part of the overall task.

Other particularly useful parallel processing machines have been described in B. A. Crane et al., "Bulk Processing in Distributed Logic Memory," IEEE Trans. on Electronic Computers, Vol. 14, April 1965, pp. 186-196; J. A. Githens, "Highly-Parallel Calculating Arrays in Radar Data Processing," presented at the Workshop on the Development of New Computer Organizations, La Jolla, Calif., June 29, 1967; J. H. Huttenhoff and R. R. Shively, "Arithmetic Unit of Computing Element in a Global, Highly-Parallel Computer," IEEE Trans. on Electronic Computers, Vol. C-18, No. 8, August, 1969; and in B. A. Crane et al. U.S. Pat. Nos. 3,376,555 and 3,391,390 issued Apr. 2, 1968 and July 2, 1968, respectively.

BRIEF DESCRIPTION

The present invention recognizes and applies the principles of Fourier transform technology as implemented using selected aspects of the parallel data processing arts, with suitable extensions and modifications. In particular, means are provided in the present invention for decimating a sequence of input samples into a number of subsequences. Each of these subsequences is then processed in an independent computational element, thereby generating the Fourier transform for each subset taken alone. The parallel processing elements may take any one of several well-recognized forms, and each is under the control of a global control unit. Little or no communication or exchange of information is required between the several processing elements.

The results of the Fourier transformation on the subsequences are then integrated in a subsequent processing step to generate the desired transform corresponding to the initial input sequence.

By suitably designing the arrangement of each processing element and by controlling the flow of data to and from these elements, it is possible to reduce the number of fundamental arithmetic operations, thereby increasing the speed of Fourier transform data processing. Additionally, the algorithm associated with one aspect of the present invention permits simplification of the format of the generated results, thereby simplifying interaction with the system user and other portions of a comprehensive data processing facility. In particular, by suitable intermediate processing, it is possible to sequentially read the results from each successive element of a parallel processing array while retaining the proper order of Fourier coefficients.

BRIEF DESCRIPTION OF THE DRAWINGS

The several aspects of the present invention will be more fully described below in connection with the several figures wherein:

FIG. 1 shows a block diagram of a parallel data processing system suitable for performing fast Fourier transform analysis in accordance with the present invention;

FIG. 2 shows a computational element for use in the parallel processing array of FIG. 1;

FIGS. 3 through 9 show the flow of data and the storage patterns occurring during an FFT computation using the system of FIG. 1 for the special case where the number of processing elements is two;

FIG. 10 shows the formation of certain results for the general case where the number of computational elements is arbitrary;

FIG. 11 shows the number of complex multiplications required for various numbers of computational elements;

FIG. 12 shows the number of complex multiplications required for the special cases where the number of computational elements is 1 and 2; and

FIG. 13 is a flow chart corresponding to one embodiment of the present invention.

NOTATION AND NOMENCLATURE

The detailed descriptions to follow are presented largely in terms of algorithms. These algorithmic descriptions are the means used by those skilled in the data processing art to most effectively convey the substance and meaning of their work to others skilled in the data processing arts.

An algorithm is here, and generally, conceived to be a sequence of steps leading to a desired result. These steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It proves convenient at times, principally for reasons of common usage, to refer to these signals as samples, values, elements, terms, real-valued quantities, complex-valued quantities, number, or the like. It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities.

Further, the manipulations performed are often referred to in terms, such as adding, which are commonly associated with mental operations performed by a human being. This is not the sense in which terms such as adding are used here. No human operator is necessary (or desirable in most cases) in any of the operations described herein; the operations are machine operations.

Useful machines for performing part or all of the operations of the present invention include general purpose computers of the IBM 7090/94 class, various of the IBM System 360 class, the GE-600 class, or other similar machines. In all cases there should be borne in mind the distinction between the method operations in operating a computer and the (mathematical) method of computation itself. The present invention relates to method steps for operating a computer in processing electrical or other (e.g., mechanical, chemical) physical signals to generate other desired physical signals. The present invention also relates to apparatus for performing these operations.

The algorithms presented herein are not inherently related to any particular computer or other apparatus. In particular, various general purpose machines including those mentioned above, may be used, or it may prove more convenient to construct more specialized apparatus to perform the required method steps. The required structure for various of these machines will appear from the description given below in light of the state of presently existing knowledge in the field of parallel-processing computers.

DETAILED DESCRIPTION

Overview

FIG. 1 is a generalized block diagram of a parallel data processing system for use in computing Fourier coefficients. Shown there is a plurality of parallel processing elements 101-1 through 101-q, where q is an arbitrary positive integer. Also shown is an input source 110 for supplying sequences of N digital signals (derived by well-known sampling techniques from continuous signals, where appropriate) for which it is desired that Fourier coefficients be calculated. These input digital signals, which are typically representative of radar returns, speech or similar physical phenomena, are delivered to the processing elements 101-1 through 101-q (in a manner to be described below) by way of multipliers 120-1 through 120-q, respectively.

Also shown in FIG. 1 is an ensemble control unit 130 which exercises global control over the several processing elements 101-1 to 101-q. Control unit 130 is arranged to alter the sequence of operations in each of the processing elements by means of selecting or altering a stored or wired program in each of the processing elements or by any of several well-known techniques. Also shown in FIG. 1 is a conventional processing apparatus 140 which may, for example, be a general purpose data processing system having the usual arithmetic and logical capabilities represented by processor 145 and having provision for storing program data signals and other nonprogram data signals. The apparatus for storing these latter two classes of signals are designated by the identification numerals 150 and 160, respectively. This separation of the various aspects of the processing apparatus 140 is for convenience of discussion only. In actual implementations data processing apparatus 140 may, for example, take the form of any of several of the IBM System 360 class machines or other similar machines including G.E. 600-series machines.

Processing Elements

FIG. 2 shows a parallel processing element according to one embodiment of the present invention. Shown there is an arithmetic unit 210, a memory 220 and a control unit 230. The actual distribution of the circuitry to perform the required arithmetic operations will take any of several well-recognized configurations depending, for example, on such things as the required size of memory 220, the required speed of operation, and the number of parallel processing elements used. In particular, it is possible and may be desirable to integrate memory with the arithmetic and control aspects of a computational element as described, for example, in B. A. Crane, et al., "Bulk Processing in Distributed Logic Memory," IEEE Transactions on Electronic Computers, Vol. 14, April 1965, pp. 186-196; and in the Githens reference, supra. The processing element may also conveniently take the form described in the Huttenhoff and Shively reference, supra. This latter reference is also illustrative of a suitable parallel processing environment for the present invention, and is especially helpful with regard to describing the control of the several processing units.

The arithmetic unit 210 may of course assume a more specialized form. In particular, it may be convenient to use the modularized equipment arrangements described in pending U.S. patent applications by G. D. Bergland et al., Ser. No. 605,791 and M. J. Gilmartin, Jr. et al., Ser. No. 605,768, both filed Dec. 29, 1966 and assigned to the assignee of the present application. Also, in appropriate cases the memory 220 and control unit 230 shown in FIG. 2 may be integrated with the arithmetic unit in the arrangement of Bergland et al., or in accordance with computational elements of the type described in the Crane et al. journal reference, supra. In any event, a further disclosure of the details of computational elements 101-i, i = 1, 2, ..., q, is not essential to a full and complete understanding of the present invention.

FFT Fundamentals

The arithmetic units of processing elements 101-i, i = 1, 2, ..., q, may be arranged to perform any of the now well-known alternate versions of the fast Fourier transform. A useful description of the underlying principles of the FFT may be found in the Bergland et al. reference, supra. Only a brief description of the well-known aspects of the FFT will be given here.

By way of review, it is well to consider that the calculation of Fourier coefficients corresponding to a sequence of N input signals spaced over an interval of T seconds may be represented by

$$X(j) = \sum_{k=0}^{N-1} A(k)\, W^{jk}, \qquad j = 0, 1, \ldots, N-1, \tag{1}$$

where $W = e^{2\pi i/N}$ and A(k) represents the input sample sequence A(0), A(1), ..., A(N-1).
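As a point of reference, equation (1) can be evaluated directly. The sketch below is illustrative only: the function name `dft` is hypothetical, and the positive exponent convention $W = e^{2\pi i/N}$ is assumed from the definitions used elsewhere in this description.

```python
import cmath

def dft(samples):
    """Directly evaluate X(j) = sum_k A(k) * W**(j*k), assuming W = exp(2*pi*i/N)."""
    n = len(samples)
    result = []
    for j in range(n):
        total = 0j
        for k, a in enumerate(samples):
            # Reduce the exponent modulo N, since W**N = 1.
            total += a * cmath.exp(2j * cmath.pi * ((j * k) % n) / n)
        result.append(total)
    return result
```

This direct form needs on the order of N squared complex multiplications, which is the cost the FFT techniques discussed next are meant to reduce.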

The Cooley-Tukey algorithm reduces equation (1) to a recursive relation, the exact form of which depends on the properties of N. Starting with an initial list of numbers, say $b_0$ (the input sequence), subsequent lists $b_1, b_2, \ldots$ are calculated, the final one of which comprises the required Fourier coefficients. When N is taken equal to $2^m$, m lists will be calculated in accordance with one version of the algorithm. Each of the N entries in a list is identified by an m-digit binary integer. For example, one entry takes the form $b_p(j_0, \ldots, j_{p-1}, k_{m-p-1}, \ldots, k_0)$, where each of the j's and k's are binary integers.

For the special case mentioned, $N = 2^m$, the recursive relation takes the form

$$b_p(j_0, \ldots, j_{p-1}, k_{m-p-1}, \ldots, k_0) = \sum_{k_{m-p}=0}^{1} b_{p-1}(j_0, \ldots, j_{p-2}, k_{m-p}, \ldots, k_0)\, W^{(j_{p-1} 2^{p-1} + j_{p-2} 2^{p-2} + \cdots + j_0)\, k_{m-p}\, 2^{m-p}}. \tag{2}$$

The binary number specified by the argument $(j_0, \ldots, j_{p-1}, k_{m-p-1}, \ldots, k_0)$ can be written simply as k, i.e., $b_p(j_0, \ldots, k_0) = b_p(k)$. As may be seen from equation (2), a highly regular pattern is followed in calculating the terms in the current list. For purposes of constructing the list $b_1$, the list $b_0$ is considered to be partitioned into two equal-length sublists or parts. Each term $b_1(k)$ in the list $b_1$ is derived from two terms from the list $b_0$. One of these terms from $b_0$ is multiplied by the complex phasor $W^{(j_{p-1} 2^{p-1} + j_{p-2} 2^{p-2} + \cdots + j_0)\, k_{m-p}\, 2^{m-p}}$ where, as before, $W = e^{2\pi i/N}$. The first term of $b_1$ consists of the first term in $b_0$ plus the first term in the second half of $b_0$ multiplied by $W^0 = 1$. Likewise the second term of $b_1$ is made up of the second term in $b_0$ plus the second term in the second half of $b_0$ multiplied by the same complex value. This process is continued for all terms in the first half of $b_1$. To obtain the terms in the second half of $b_1$, a similar scheme is followed except that the terms in the second half are multiplied by $e^{\pi i} = -1 + i0$.

The Cooley-Tukey algorithm proceeds by transforming the list $b_1$ into a third list $b_2$, and so on. The same type of operations apply in each case, except that after each iteration the list is effectively partitioned into twice as many parts as before. Terms from selected parts of the then-current list are used to form the required terms for the list then being constructed.
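The list-by-list recursion can be sketched as follows for N a power of two. This is a minimal illustration under stated assumptions, not the circuitry of the patent: the names `bit_reverse` and `fft_by_lists` are hypothetical, the index of each list is assumed to carry $j_0$ in its most significant bit, and $W = e^{2\pi i/N}$ is assumed.

```python
import cmath

def bit_reverse(value, width):
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def fft_by_lists(samples):
    """Compute lists b_1 ... b_m from b_0, one binary summation per list."""
    n = len(samples)
    m = n.bit_length() - 1
    if n != 1 << m:
        raise ValueError("length must be a power of two")
    b = [complex(a) for a in samples]          # list b_0 (the input sequence)
    for p in range(1, m + 1):
        shift = m - p                          # bit position of k_{m-p}
        nxt = [0j] * n
        for idx in range(n):
            j_bits = idx >> shift              # (j_0, ..., j_{p-1}), j_0 most significant
            j_val = bit_reverse(j_bits, p)     # j_{p-1}*2^(p-1) + ... + j_0
            base = idx & ~(1 << shift)         # same index with k_{m-p} set to 0
            for k in (0, 1):
                exponent = ((j_val * k) << shift) % n
                phasor = cmath.exp(2j * cmath.pi * exponent / n)
                nxt[idx] += b[base | (k << shift)] * phasor
        b = nxt
    # The final list is indexed by (j_0, ..., j_{m-1}), i.e. in bit-reversed order.
    return [b[bit_reverse(j, m)] for j in range(n)]
```

The final reordering line reflects the usual bit-reversed storage of the coefficients in this form of the algorithm.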

The computational operations indicated in the various equations recited herein are each fundamental arithmetic operations which are performed using well-known apparatus. For example, adders are used to add, multipliers to multiply, and so forth. The generation of the complex exponential factors is accomplished using appropriate combinations of sine and cosine function values which may be stored for reference or which may be generated as they are required by well-known function generators.

Particularly simple and efficient generation of the required exponential factors may be effected using the techniques described in R. C. Singleton, "On Computing the Fast Fourier Transform," Comm. ACM, Vol. 10, pp. 647-654, October 1967.

Special Case of Two Processing Elements

A simplified version of the present invention may be understood from a consideration of FIG. 3 and the discussion of it given below. This view shows a portion of a parallel processor in accordance with FIG. 1 generally, but being restricted to the case where q = 2. Thus, in FIG. 3 there are but two parallel processing elements of the form shown generally in FIG. 2 and described above. These are designated 301-1 and 301-2, respectively.

The exemplary case to be considered is that where the input sequence A(k) contains but 16 input samples, i.e., N = 16. These samples are shown at the upper left portion of FIG. 3 as A(0), A(1), ..., A(15), reading from right to left. Multipliers 320-1 and 320-2 are arranged for this simple example to multiply the input sequences applied to them by plus 1 or minus 1 (in a manner to be described below) before passing them on to their respective processing elements 301-1 and 301-2.

The processing elements 301-1 and 301-2 are required to have only eight data storage locations designated 0 through 7. These are typically located in that portion of the computational element shown as memory 220 in FIG. 2. There may, of course, be other information storing portions of the computational elements corresponding to the arithmetic and control portions of a computational element e.g., those portions which are designated by the identification numerals 210 and 230 in FIG. 2. The computational elements shown in FIG. 3 are each arranged to calculate an eight-point Fourier transform.

In FIG. 3 (as in FIG. 1) the input sequence is shown as being broadcast to both (all) of the computational elements 301-1 and 301-2. In accordance with the algorithm of the present invention for the case N = 16, the first eight data elements in the sequence A(k), that is, A(0), A(1), ..., A(7), are weighted or multiplied by +1 and are stored in both of the computational elements as shown in FIG. 4. The remaining eight data signals A(8), A(9), ..., A(15) are then weighted by +1 and applied to element 301-1 and by -1 as they enter element 301-2. These numbers are successively algebraically added to the previous contents of memory locations to form the sums shown in FIG. 5.

It will be recognized that the contents stored in the corresponding locations of the computational elements 301-1 and 301-2 represent the (two-point) Fourier transform coefficients corresponding to the pairs of input data values A(0) and A(8), A(1) and A(9), ..., A(7) and A(15).
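The broadcast-and-accumulate step for two processing elements can be sketched as below. The function name `accumulate_two_elements` and the list-of-lists memory layout are assumptions made for illustration; only the +1/-1 weighting and the eight storage locations per element come from the description above.

```python
def accumulate_two_elements(samples):
    """Broadcast each sample to two elements, weighting the second half by +1 and -1."""
    half = len(samples) // 2
    # Two elements, each with `half` cleared storage locations (0 through half-1).
    elements = [[0] * half, [0] * half]
    for k, sample in enumerate(samples):
        k1, k0 = divmod(k, half)        # k1 selects first or second half, k0 the location
        weights = (1, 1) if k1 == 0 else (1, -1)
        for e in range(2):
            elements[e][k0] += weights[e] * sample
    return elements
```

For N = 16, location k0 of the first element ends up holding A(k0) + A(k0 + 8) and the same location of the second element holds A(k0) - A(k0 + 8), which are the two-point transform pairs described above.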

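Before proceeding to the completion of the desired 16-point transform, it is well to recall that the equations for one version of the FFT with $N = r_1 r_2$ are given by

$$X(j_1, j_0) = \sum_{k_0=0}^{r_2-1} \left[ \sum_{k_1=0}^{r_1-1} A(k_1, k_0)\, W^{j_0 k_1 r_2} \right] W^{j_0 k_0}\, W^{j_1 k_0 r_1}, \tag{3}$$

where $j = j_1 r_1 + j_0$ and $k = k_1 r_2 + k_0$; $j_0, k_1 = 0, 1, \ldots, r_1 - 1$; $j_1, k_0 = 0, 1, \ldots, r_2 - 1$; $W = e^{2\pi i/N}$; and the bracketed sum is designated $A_1(j_0, k_0)$. In the present case $r_1 = 2$ and $r_2 = 8$. Thus we have

$$X(j_1, j_0) = \sum_{k_0=0}^{7} \left[ \sum_{k_1=0}^{1} A(k_1, k_0)\, \left(e^{2\pi i/2}\right)^{j_0 k_1} \right] \left(e^{2\pi i/16}\right)^{j_0 k_0} \left(e^{2\pi i/8}\right)^{j_1 k_0}. \tag{4}$$

The expression in square brackets (for $k_0 = 0, 1, \ldots, 7$) is actually a sequence of eight two-point transforms and is precisely the result calculated by the method described above and stored as shown in FIG. 5. These terms are those designated $A_1(j_0, k_0)$ in connection with equation (3) above. The contents of memory shown in FIG. 5 may therefore be represented in the manner shown in FIG. 6. Equation (3) therefore reduces to

$$X(j_1, j_0) = \sum_{k_0=0}^{7} \left[ A_1(j_0, k_0)\, \left(e^{2\pi i/16}\right)^{j_0 k_0} \right] \left(e^{2\pi i/8}\right)^{j_1 k_0}. \tag{5}$$

It will be observed that the multiplication of the $A_1$ terms by the exponential factor $\left(e^{2\pi i/16}\right)^{j_0 k_0}$ to form the bracketed quantity in equation (5) is merely a rereferencing of $A_1$ terms in the familiar manner associated with the fast Fourier transform.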

This rereferencing is accomplished by well-known internal arithmetic operations by arithmetic unit 210 in FIG. 2 and the results are as shown in FIG. 7. It proves convenient to relabel the rereferenced $A_1$ terms as $\hat{A}_1$ so that equation (3) becomes, for the present case,

$$X(j_1, j_0) = \sum_{k_0=0}^{7} \hat{A}_1(j_0, k_0)\, \left(e^{2\pi i/8}\right)^{j_1 k_0}. \tag{6}$$

For convenience the $\hat{A}_1$ terms are referred to as shown in FIG. 8. It will be observed that the operations indicated in equation (6) are precisely those of a conventional Fourier transform taken over the two eight-point sets of $\hat{A}_1$ terms stored in the computational elements as shown in FIG. 8. These two eight-point Fourier transforms are performed identically in the two computational elements, with only the data sequences operated on being different. This transformation may profitably be performed using well-known FFT techniques. FIG. 9 shows the storage pattern of the results of the transformation of the contents stored in the memory portion of each of the computational elements shown in FIG. 8. The transformed quantities corresponding to the $\hat{A}_1$ terms are referred to as $A_2(j_0, j_1)$. It should be noted that $X(j_1, j_0) = A_2(j_0, j_1)$.

It should be noted that if in-place reordering of the rereferenced $A_1$ terms in each processing element is accomplished according to well-known techniques prior to transformation, the X terms may be read out from the entire system shown in FIG. 1 (with q = 2) by alternately reading first from computational element 101-1 and then from computational element 101-2 while proceeding in sequence through the respective memories. Thus, the sometimes confusing and difficult-to-implement reordering of the Fourier coefficients is readily simplified by the present organization. If the reordering of the rereferenced $A_1$ terms is not accomplished prior to transformation, the X terms will require the usual reordering.

Another useful feature of the present invention is that relating to the generation of the required exponential (W) signals required in forming, for example, the expressions shown in FIG. 7. Processing element 101-2 is shown to require eight different W terms, i.e., $W^i$, i = 0, 1, ..., 7. By noting that for arbitrary m and n, $W^{m+n} = W^m W^n$, it is clear that only $W^1$ need be stored explicitly on computational element 101-2. The remaining W terms may then be calculated by simple multiplication as required, e.g., $W^2$ may be calculated by multiplying $W^1$ by $W^1$, and so forth.
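A minimal sketch of generating the W terms from a single stored value by repeated multiplication follows. The function name `twiddle_powers` and its parameters are hypothetical, and $W = e^{2\pi i/N}$ is assumed.

```python
import cmath

def twiddle_powers(n, count):
    """Return [W**0, W**1, ..., W**(count-1)] using only the stored value W**1."""
    w = cmath.exp(2j * cmath.pi / n)    # the single explicitly stored term
    powers = [1 + 0j]
    for _ in range(count - 1):
        # W**(m+1) = W**m * W**1
        powers.append(powers[-1] * w)
    return powers
```

Each product adds a small rounding error to the next term, so long runs of this recurrence drift slowly from the exact values.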

Alternately, the techniques described in Singleton, supra, may be used. Still another simplified variation of these exponential function generating techniques is one described by G. Sande at the IEEE Workshop of FFT processing at Arden House, Oct. 1968. There, Sande proposed calculating subsequent values from previous ones according to $(W^{n+1} - W^n)/W^n = W - 1$, or $W^{n+1} = W^n (W - 1) + W^n$. It is said that this particular rearrangement improves roundoff errors in computation. It is often helpful, especially for large values of K, to occasionally insert more accurate, separately calculated W values to prevent the accumulation of roundoff errors.

The Algorithm Generally

With the above notation and the description of the special case of a two-processor algorithm as a background, the general algorithm will now be presented. Briefly stated, the algorithm (illustrated in FIG. for $N = r_1 r_2$) comprises the steps of

1. Dividing the input sequence $\{A(k)\} = \{A(k_1, k_0)\} = \{A(k_1 r_2 + k_0)\}$ for $k = 0, 1, \ldots, N-1$, $k_1 = 0, 1, \ldots, (r_1 - 1)$ and $k_0 = 0, 1, \ldots, (r_2 - 1)$, into $r_1$ subsequences with $k_1$ fixed within each sequence. That is, let the first subsequence include only the first $r_2$ elements, the second subsequence only the next $r_2$ elements, and so forth.

2. Multiplying the terms of the $k_1$-th subsequence by $W_1^{j_0 k_1}$, $j_0 = 0, 1, \ldots, (r_1 - 1)$, to form sequences of product terms $\{A(k_1, k_0)\, W_1^{j_0 k_1}\}$, $j_0 = 0, 1, \ldots, (r_1 - 1)$, for each $k_1 = 0, 1, \ldots, (r_1 - 1)$. Here, $W_1 = e^{2\pi i/r_1}$. The multipliers 701-1 through 701-$r_1$ are conveniently adjusted for each new $k_1$.

3. Within initially cleared registers in each of $r_1$ processing elements, combining corresponding product terms formed in step (2) for successive $k_1$'s to form

$$A_1(j_0, k_0) = \sum_{k_1=0}^{r_1-1} A(k_1, k_0)\, W_1^{j_0 k_1}.$$

Only $N = r_1 r_2$ storage locations are required to form the $A_1$ terms because the previous contents of each location may be updated by the appropriate product term formed in step (2), as $k_1$ varies over its range. That is, the $A_1$ terms may be summed as shown by accumulating partial sums.

4. Multiplying the $A_1(j_0, k_0)$ terms by $W^{j_0 k_0}$ for each $j_0$ and $k_0$ to form corresponding rereferenced terms $\hat{A}_1(j_0, k_0)$ and replacing the $A_1$ terms by the corresponding $\hat{A}_1$ terms.

5. Forming the Fourier coefficients of the sets of $\hat{A}_1$ terms formed in step (4) and storing these coefficients in the locations previously occupied by the $\hat{A}_1$ terms. Again, any standard FFT technique may be employed, or a non-FFT technique may be used.

6. Reading the Fourier coefficients corresponding to the original input sequence by reading successively terms stored in each computational element starting with the first. In each case terms read from each computational element are conveniently read from successive storage locations. The first coefficient is read from the first location in the first computational element, the next from the first location in the second computational element, and so forth.

It should be understood that the Fourier transformation performed at step (5) is conveniently, though not necessarily, performed using a fast Fourier technique. This method is particularly attractive in the present arrangement because no communication between computational elements is then required in performing step (5). When an FFT process is employed in step (5) an in-place reordering according to well-known techniques is conveniently effected prior to transformation so that no post-transformation reordering is required prior to readout.

The above procedures insure that the Fourier coefficients read from memory in accordance with step 6 will be in ascending order, i.e., no reordering of the coefficients generated is required.
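Steps 1 through 6 can be sketched end to end for a general $N = r_1 r_2$. This is an illustrative model under stated assumptions rather than the apparatus itself: the names `dft` and `parallel_fft` are hypothetical, each "processing element" is modeled as one Python list, $W = e^{2\pi i/N}$ is assumed, and a direct transform stands in for whatever technique an element would use in step 5.

```python
import cmath

def dft(seq):
    """Direct transform used inside each element (any FFT could be substituted)."""
    n = len(seq)
    return [sum(a * cmath.exp(2j * cmath.pi * ((j * k) % n) / n)
                for k, a in enumerate(seq)) for j in range(n)]

def parallel_fft(samples, r1):
    """Model of the r1-element algorithm; returns coefficients in ascending order."""
    n = len(samples)
    if n % r1:
        raise ValueError("r1 must divide the sequence length")
    r2 = n // r1

    def w(exponent):
        return cmath.exp(2j * cmath.pi * (exponent % n) / n)

    # Steps 1-3: element j0 accumulates A1(j0, k0) = sum_k1 A(k1*r2 + k0) * W1**(j0*k1),
    # where W1 = W**r2.
    a1 = [[sum(samples[k1 * r2 + k0] * w(j0 * k1 * r2) for k1 in range(r1))
           for k0 in range(r2)] for j0 in range(r1)]
    # Step 4: rereference (twiddle) each term by W**(j0*k0).
    a1_hat = [[a1[j0][k0] * w(j0 * k0) for k0 in range(r2)] for j0 in range(r1)]
    # Step 5: independent r2-point transforms, one per element.
    a2 = [dft(row) for row in a1_hat]
    # Step 6: read location j1 of each element in turn; X(j1*r1 + j0) = A2(j0, j1).
    return [a2[j0][j1] for j1 in range(r2) for j0 in range(r1)]
```

In this model the only data shared between elements is the broadcast input; steps 4 and 5 touch each element's list independently, which mirrors the absence of inter-element communication noted above.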

The fact that no communication is required between the various computational elements in performing step (5) of the algorithm and the fact that this step may be executed in parallel, greatly increases the number of operations that can be performed in a unit time interval.

The number of global complex multiplications M that are required for the case where $N = r_1 r_2 = 2^p\, 2^{m-p}$ is given by M = [expression garbled in the OCR text]. It is this expression which must be minimized to obtain the minimum number of computations required. It can readily be shown that with the constraint that a $2^{m-p}$-point transform is to be performed in each computational element, in general, the number of computations required decreases with an increasing number of computational elements. As shown in FIG. 11, however, the incremental decrease in the number of computations required decreases sharply as p increases. In particular, when p = 1 and p = 2 the $W_1$ terms assume only the values ±1 and ±i. In this special case only additions and conjugations (but no multiplications) are performed on the original input signals. Thus, the number of complex multiplications reduces to [expression garbled in the OCR text], which is plotted in FIG. 12. It is shown there that p = 2 (four computational elements) is a good choice which actually gives better results than p = 5 for up to 2^[exponent garbled in the OCR text] input data samples.

System Control

While the computational procedures required by the present invention are detailed above, precise circuit arrangements have not been included. Thus, for example, the particular logic circuitry required to gate the input samples to the appropriate registers has not been shown. Similarly, the details of the multipliers (e.g., 120-1 to 120-q, in FIG. 1) have been omitted. This has not been by oversight, but rather by design.

Particular practitioners may wish to use preexisting facilities to carry out one or more of the fundamental operations involved or to coordinate one or more of these operations. The following discussion, taken together with the statement of the algorithm and other materials above, is sufficient to enable one to practice the instant invention in any of a number of particular configurations. It will be assumed in this discussion that program memory 150 in FIG. 1 is arranged to store a control program corresponding to the flow chart given in FIG. 13. Processing unit 145 is arranged to execute this program with reference to data (including stored trigonometric values, where appropriate) stored in data memory 160 shown in FIG. 1. Ensemble control unit 130 is then responsive to the program execution to actually close the appropriate switches, clear the appropriate registers, and so forth, that may appear in particular logic configurations. In keeping with the statement of the algorithm in the last section above, N will again be assumed to be equal to $r_1 r_2$.

Referring to FIGS. 1 and 13 then, computer 140 causes all of the data storage registers in the processing elements 101-1 to 101-$r_1$ (recall $q = r_1$ here) to be cleared or reset to zero. An initial value of zero is then set for $k_0$ and $k_1$. These values in turn dictate the initial exponential signals to be supplied to (or be selected in) multipliers 120-1 to 120-$r_1$. These and later-required values may be stored in memory 160 or, as indicated parenthetically above, may be stored in a limited memory located at the multipliers. Alternatively, the required multiplying factors may be generated as required according to one of the techniques given earlier.
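The two options just mentioned, stored multiplier values versus values generated on demand, can be illustrated with a minimal Python sketch. The function names are hypothetical, and direct evaluation of the complex exponential merely stands in for whichever generation technique is used; the factor $(e^{2\pi i/r_1})^{j_0 k_1}$ is taken from claim 12.

```python
import cmath


def stored_multiplier_table(r1):
    """Precompute the r1 distinct values exp(2*pi*i*m/r1), m = 0..r1-1."""
    return [cmath.exp(2j * cmath.pi * m / r1) for m in range(r1)]


def multiplier_from_table(table, j0, k1):
    """Look up (e^{2 pi i / r1})^(j0*k1); the exponent is periodic modulo r1."""
    r1 = len(table)
    return table[(j0 * k1) % r1]


def multiplier_generated(r1, j0, k1):
    """Generate the same factor when it is needed instead of storing it."""
    return cmath.exp(2j * cmath.pi * j0 * k1 / r1)
```

Because the exponent is periodic modulo $r_1$, a table of only $r_1$ entries suffices in this sketch, which is one way to read the remark about a limited memory located at the multipliers.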

When the required multiplier signals are at hand at the multipliers 120-1 to 120-$r_1$, the input signals are read one at a time in serial form and presented to each of the multipliers. Each such signal is then multiplied by the multiplier signal and the result combined with the previous signals stored in the first register of the respective computational element. Because with $k_1 = 0$ the multiplier $(e^{2\pi i/r_1})^{j_0 k_1}$ is 1 for each $j_0$, the first input sample will be merely stored in the first register of each processing element. It will be recalled that these registers were initially cleared, so no prior nonzero sum remains to be combined with this input sample.

Next, $k_0$ is incremented and, since $r_2$ is assumed not to be equal to 1, another sample is read, multiplied by 1 and stored in the respective register in each processing element. This procedure is repeated until $k_0 = r_2 - 1$, i.e., until the first subsequence of $r_2$ input terms has been read. Then $k_1$ is incremented, the multiplier values adjusted accordingly, and the next input sample value is read (with $k_0$ being reset to 0).

Again the multiplications are performed by each of the multipliers on each input sample. Now, however, the multipliers are not all 1 as in the case of the first subsequence, i.e., when $k_1 = 0$. Thus, the multiplications are not degenerate and the respective products are added to the existing values stored in the first register of the respective processing elements.

This process is then repeated until the second subsequence of $r_2$ input values is read, multiplied, the products added to previous accumulated sums, and the new sums stored in appropriate registers in the processing elements. Then $k_1$ is incremented, the next $r_2$ elements are read and each of the partial sums augmented by the products formed. When the subsequence for $k_1 = r_1 - 1$ has been processed, the sums indicated in FIG. 10 are complete.
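The accumulation procedure described above can be sketched in Python as follows. This is an illustrative software model under stated assumptions, not the circuit itself: the function name `accumulate_sums` and the list-of-lists register layout are hypothetical, and the loop over `j0` runs sequentially here whereas the processing elements operate in parallel.

```python
import cmath


def accumulate_sums(samples, r1, r2):
    """Model of the broadcast-and-accumulate phase.

    registers[j0][k0] ends up holding the sum over k1 of
    A(k1*r2 + k0) * exp(2*pi*i*j0*k1/r1).
    """
    if len(samples) != r1 * r2:
        raise ValueError("expected exactly r1 * r2 input samples")
    # All registers start cleared, as in the description.
    registers = [[0j] * r2 for _ in range(r1)]
    for k1 in range(r1):              # which subsequence is being read
        for k0 in range(r2):          # position within that subsequence
            a = samples[k1 * r2 + k0]  # samples arrive serially
            for j0 in range(r1):      # one multiplier per processing element
                w = cmath.exp(2j * cmath.pi * j0 * k1 / r1)
                registers[j0][k0] += a * w
    return registers
```

For example, with $r_1 = 2$, $r_2 = 3$ and the samples 1 through 6, element 0 accumulates 5, 7, 9 and element 1 accumulates -3, -3, -3, up to floating-point rounding.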

Each of the accumulated sums in each of the processing element registers is then multiplied by the appropriate twiddle factor. This may be accomplished in a convenient manner by supplying the twiddle factors in the same manner as were the W multipliers, and actually performing the multiplication in the multipliers 120-1 to 120-$r_1$.
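A minimal sketch of the twiddle step, assuming the twiddle factor for register $k_0$ of element $j_0$ is $(e^{2\pi i/N})^{j_0 k_0}$ as recited in claim 12; the function name and register layout are hypothetical.

```python
import cmath


def apply_twiddles(registers):
    """Multiply register (j0, k0) by exp(2*pi*i*j0*k0/N), with N = r1*r2."""
    r1 = len(registers)
    r2 = len(registers[0])
    n = r1 * r2
    return [
        [registers[j0][k0] * cmath.exp(2j * cmath.pi * j0 * k0 / n)
         for k0 in range(r2)]
        for j0 in range(r1)
    ]
```

Element 0 (where $j_0 = 0$) and register 0 of every element (where $k_0 = 0$) are left unchanged, since their twiddle factor is 1.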

If an FFT is to be used in further processing the twiddled values, it may be convenient to perform a standard (digits reversed) reordering as mentioned earlier. This is shown (as a dotted block) in FIG. 13. The $r_2$-point Fourier transformation of the contents of each of the $r_1$ processing elements is then performed.
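A digit-reversed reordering of the kind referred to above might look like the following Python sketch. It assumes a single fixed radix and a register count that is an exact power of that radix; both function names are hypothetical, and the reordering actually required depends on the FFT used in each processing element.

```python
def digit_reverse(index, radix, num_digits):
    """Reverse the base-`radix` digits of `index` (num_digits digits wide)."""
    reversed_index = 0
    for _ in range(num_digits):
        index, digit = divmod(index, radix)
        reversed_index = reversed_index * radix + digit
    return reversed_index


def reorder_digit_reversed(values, radix):
    """Return `values` permuted into digit-reversed order.

    Assumes len(values) is an exact power of `radix`.
    """
    n = len(values)
    num_digits = 0
    size = 1
    while size < n:
        size *= radix
        num_digits += 1
    if size != n:
        raise ValueError("length must be a power of the radix")
    return [values[digit_reverse(i, radix, num_digits)] for i in range(n)]
```

With radix 2 and eight registers this reduces to the familiar bit-reversal order 0, 4, 2, 6, 1, 5, 3, 7.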

The results are then ready to be read in order from the respective processing elements unless an FFT was used without preprocessing reordering. In this case, the reordering of the results in each processing element may be performed prior to or concurrent with readout. In any event, the desired results are stored in the processing elements in a predictable order.
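The complete sequence (accumulate, twiddle, transform within each element, read out) can be checked numerically with the short Python sketch below. It is a software model under stated assumptions: the positive exponent follows claim 12 as corrected, a direct $r_2$-point DFT stands in for the FFT inside each element, and the function names and the readout mapping $j = j_1 r_1 + j_0$ are introduced here for illustration.

```python
import cmath


def direct_dft(values):
    """Reference DFT using the positive exponent exp(+2*pi*i*j*k/n)."""
    n = len(values)
    return [
        sum(values[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def global_parallel_dft(samples, r1, r2):
    """Model the r1 processing elements one after another."""
    n = r1 * r2
    if len(samples) != n:
        raise ValueError("expected exactly r1 * r2 input samples")
    out = [0j] * n
    for j0 in range(r1):  # one processing element per value of j0
        twiddled = []
        for k0 in range(r2):
            # Accumulated sum of products for register k0 of element j0.
            s = sum(
                samples[k1 * r2 + k0] * cmath.exp(2j * cmath.pi * j0 * k1 / r1)
                for k1 in range(r1)
            )
            # Twiddle factor (e^{2 pi i / N})^(j0*k0).
            twiddled.append(s * cmath.exp(2j * cmath.pi * j0 * k0 / n))
        # r2-point transform inside the element (direct DFT in place of an FFT).
        coeffs = direct_dft(twiddled)
        for j1 in range(r2):
            out[j1 * r1 + j0] = coeffs[j1]
    return out
```

In this model element $j_0$ holds the coefficients with indices $j_0, j_0 + r_1, j_0 + 2r_1, \ldots$, so the full result agrees with `direct_dft` applied to the whole input.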

While the above description has proceeded primarily in terms of a plurality of specially designed FFT processing elements (including circuitry of the type described in the Bergland et al. reference, supra, for example), it should be understood that these processing elements can conveniently take the form of general purpose data processors programmed to perform the required individual transformations and other indicated operations. Further, it is clear that the multipliers and processing elements described as separate elements may be suitably combined where appropriate.

What is claimed is:

1. Apparatus for generating data signals representing Fourier coefficients corresponding to a sequence of $N = r_1 r_2$ input data signals, where $r_1$ and $r_2$ are positive integers with $r_1 \geq 2$, comprising

means for generating trigonometric function values,

first means for forming $r_1$ sets of $r_2$ intermediate coefficients, each intermediate coefficient corresponding to the sum of the products of selected ones of said input data signals with selected trigonometric function values, and

second means for simultaneously generating $r_1$ sets of output data signals representative of the Fourier coefficients for each respective set of said intermediate coefficients.

2. Apparatus according to claim 1 wherein said first means comprises means for segmenting said sequence of input data signals into $r_1$ subsequences, each including $r_2$ data signals, and summing means for forming the sums of selected ones of said products.

3. Apparatus according to claim 2 wherein said summing means comprises $r_1$ sets of $r_2$ registers and means for adding a subsequent one of said products to a previously accumulated partial sum of said products, said registers being initially cleared.

4. Apparatus according to claim 3 wherein said first means further comprises means for forming the product of said sums and corresponding trigonometric function values, thereby to form rereferenced sums.

5. Apparatus according to claim 1 wherein said second means comprises $r_1$ fast Fourier transform processors for operating on each of said $r_1$ sets of intermediate coefficients.

6. Apparatus for generating data signals representing Fourier coefficients corresponding to a sequence of $N = r_1 r_2$ input data signals comprising

A. a source of trigonometric signals,

B. a plurality of multipliers for forming product signals representing the product of selected ones of said input data signals with selected trigonometric signals,

C. a plurality of processing elements, each including

1. means for forming sum-of-product signals representing accumulated sums of selected ones of said product signals,

2. means for forming signals representing the product of each of said sum-of-product signals and a corresponding trigonometric signal to form rereferenced signals,

D. means for forming signals representing the Fourier coefficients corresponding to said rereferenced signals.

7. Apparatus according to claim 6 further including in each of said processing elements means for reordering said rereferenced signals.

8. The machine method for generating signals representing the Fourier coefficients corresponding to a set of $N = r_1 r_2$ ordered input signals, $r_1$ and $r_2$ being positive integers with $r_1 \geq 2$, comprising the steps of

A. generating $r_1$ sets of $r_2$ intermediate signals representing the sum of products of said input samples with selected trigonometric function signals, and

B. simultaneously generating a set of signals representing the Fourier coefficients corresponding to each of said sets of intermediate signals.

9. The method of claim 8 further comprising the step of multiplying said intermediate signals by a rereferencing trigonometric function signal prior to performing step (B).

10. The method of claim 9 wherein said step (B) comprises the step of performing a fast Fourier transformation based on said intermediate signals.

11. The method of claim 10 further comprising the step of reordering the signals formed in accordance with the process of claim 9 prior to performing said fast Fourier transformation.

12. In a digital processor, the machine method of generating signals representing Fourier coefficients corresponding to a sequence of $N = r_1 r_2$ input signals $A(k) = A(k_1 r_2 + k_0)$, $k_1 = 0, 1, \ldots, r_1 - 1$; $k_0 = 0, 1, \ldots, r_2 - 1$, comprising the steps of

1. generating first product signals by multiplying each $A(k)$ by $(e^{2\pi i/r_1})^{j_0 k_1}$ for $j_0 = 0, 1, \ldots, r_1 - 1$,

2. forming sum signals by adding product signals formed in step 1 which correspond to the same value of $j_0$ and $k_0$,

3. forming second product signals by multiplying said sum signals by $(e^{2\pi i/N})^{j_0 k_0}$ for corresponding values of $j_0$ and $k_0$, and

4. generating $r_1$ sets of $r_2$ Fourier coefficients, each set corresponding to that set of $r_2$ of said second product signals corresponding to a fixed value of $j_0$.

13. The method of claim 12 wherein step 1 comprises the steps of

A. reading an input signal,

B. reading $r_1$ stored values of $(e^{2\pi i/r_1})^{j_0 k_1}$ for the input value read at step (A), for $j_0 = 0, 1, \ldots, r_1 - 1$,

C. generating a signal corresponding to the product of the signal read at step (A) with each of the signals read at step (B),

D. repeating steps (A), (B), and (C) for each input signal.

14. The method of claim 12 wherein step 1 comprises the steps of

A. reading an input signal,

B. generating values of $(e^{2\pi i/r_1})^{j_0 k_1}$ for the input value read at step (A), for $j_0 = 0, 1, \ldots, r_1 - 1$,

C. generating a signal corresponding to the product of the signal read at step (A) with each of the signals generated at step (B),

D. repeating steps (A), (B), and (C) for each input signal.

15. The method of claim 12 wherein step (2) comprises the steps of

A. clearing each of N registers $R(j_0, k_0)$, $j_0 = 0, 1, \ldots, r_1 - 1$, $k_0 = 0, 1, \ldots, r_2 - 1$,

B. adding each of said first product signals as it is formed to the contents of the register having corresponding values of $j_0$ and $k_0$.

16. The method of claim 12 wherein each of said sets of $r_2$ Fourier coefficients are generated in parallel at substantially the same time.

17. The method of claim 16 wherein at step (4) each of said sets of $r_2$ coefficients are generated by performing a fast Fourier transform based on those $r_2$ second product signals corresponding to a fixed value of $j_0$.

UNITED STATES PATENT OFFICE
CERTIFICATE OF CORRECTION

Patent No. 3,662,161 Dated May 9, 1972
Inventor(s) Glenn D. Bergland; Donald E. Wilson

It is certified that error appears in the above-identified patent and that said Letters Patent are hereby corrected as shown below:

Column 2, line 1, after "of" insert --a--; line 22, change "fourier" to --Fourier--; and line 67, change "description" to --descriptions--. Column 4, line 62, change 'b to I c o b Column 5, line 16, change i b l insert and line 39, change Column 6, Eq. (4), after to --A (j k Column 7, line 66, change H H t] k II 0 K r 1%)} to -{A(k r kO) line 71, change jOkln l to -W O l-; line 73, change "k )W to to W k )w Joki line 7 1 chan e "w' 2 and delete "Within" to --With--; line '12, after and line 21, change "FET" to --FFT-- and change "non-FET" to --non-FFT--.

-; line 7 1 change 7-01 i" to -70l-r Column 8, line 1, change "by" insert the-;

Column 9, line 24, change "r here" to --(recall q = r here)--; line 39,

O and line 69, change "W (recall q n "O 11! change W to 1 r line 33, after "for" delete "s"; and line 76, change "generating simultaneously" to --simultaneously generating--. Column 11, line 1, change "multiplicity" to --multiplying--;

Column 10, line 29, change "means of" to --means for--;

line 18, change "(e i/r 1) k" to --$(e^{2\pi i/r_1})^{j_0 k_1}$--; line 22, change "(e" to --$(e^{2\pi i/N})^{j_0 k_0}$--; and line 31, change "(e l)" to --$(e^{2\pi i/r_1})^{j_0 k_1}$--. Column 12, line 4, change "of" to --for--; and line 9, change "(e l)" to --$(e^{2\pi i/r_1})^{j_0 k_1}$--.

Signed and sealed this 12th day of December 1972.

(SEAL) Attest:

EDWARD M. FLETCHER, JR., Attesting Officer
ROBERT GOTTSCHALK, Commissioner of Patents

Patent Citations
Cited Patent | Filing date | Publication date | Applicant | Title
US3517173 * | Dec 29, 1966 | Jun 23, 1970 | Bell Telephone Labor Inc | Digital processor for performing fast fourier transforms
US3544775 * | Dec 29, 1966 | Dec 1, 1970 | Bell Telephone Labor Inc | Digital processor for calculating fourier coefficients
Non-Patent Citations
1 * Bergland and Wilson, "A FFT Algorithm for a Global, Highly Parallel Processor," IEEE Trans. on Audio & Electroacoustics, Vol. AU-17, No. 2, June 1969, pp. 125-127.
2 * Bergland, "FFT Hardware Implementations - An Overview," IEEE Trans. on Audio & Electroacoustics, June 1969, pp. 104-108.
3 * Pease, "Organization of Large Scale Fourier Processors," Journal of the Association for Computing Machinery, Vol. 16, No. 3, July 1969, pp. 474-482.
Referenced by
Citing Patent | Filing date | Publication date | Applicant | Title
US4092723 * | Oct 1, 1976 | May 30, 1978 | Thomson-Csf | Computer for computing a discrete fourier transform
US4241411 * | Nov 16, 1978 | Dec 23, 1980 | Probe Systems, Incorporated | FFT Parallel processor having mutually connected, multiple identical cards
US4563750 * | Mar 4, 1983 | Jan 7, 1986 | Clarke William L | Fast Fourier transform apparatus with data timing schedule decoupling
US4615027 * | Mar 7, 1983 | Sep 30, 1986 | Elektroakusztikai Gyar | Multiprocessor-type fast fourier-analyzer
US4665494 * | Dec 16, 1983 | May 12, 1987 | Victor Company Of Japan, Limited | Spectrum display device for audio signals
US4821224 * | Nov 3, 1986 | Apr 11, 1989 | Microelectronics Center Of N.C. | Method and apparatus for processing multi-dimensional data to obtain a Fourier transform
US6658441 * | Aug 2, 1999 | Dec 2, 2003 | Seung Pil Kim | Apparatus and method for recursive parallel and pipelined fast fourier transform
US6760741 * | Jun 5, 2000 | Jul 6, 2004 | Corage Ltd. | FFT pointer mechanism for FFT memory management
US6839728 * | Jun 22, 1999 | Jan 4, 2005 | Pts Corporation | Efficient complex multiplication and fast fourier transform (FFT) implementation on the manarray architecture
EP0250152A2 * | Jun 10, 1987 | Dec 23, 1987 | AT&T Corp. | High speed transform circuit
EP0262555A2 * | Sep 22, 1987 | Apr 6, 1988 | Deutsche Thomson-Brandt GmbH | Circuit for the discrete cosine transform
U.S. Classification: 708/404, 324/76.33, 324/76.21
International Classification: G06F17/14
Cooperative Classification: G06F17/142
European Classification: G06F17/14F2