US 2881257 A
April 7, 1959. E. S. WEIBEL. 2,881,257
SPECTRUM SYNTHESIZER. Filed Aug. 16, 1956. 2 Sheets.
[Drawings: 2 sheets, Figs. 1 to 4]
United States Patent 2,881,257
SPECTRUM SYNTHESIZER
Erich S. Weibel, Summit, N.J., assignor to Bell Telephone Laboratories, Incorporated, New York, N.Y., a corporation of New York
Application August 16, 1956, Serial No. 604,460
13 Claims. (Cl. 179-15.55)
This invention relates to narrow band transmission of signals such as speech signals by vocoder techniques, and more specifically to the artificial reconstruction of such signals from narrow band channel control signals. Its principal object is to improve the correctness of such reconstruction, and a similar but more specific object is to improve the realism, quality and naturalness of artificially reconstructed speech.
One approach to the problem of narrow band speech transmission is represented by the vocoder. In the so-called filter bank vocoder of Dudley Patent 2,151,091, March 21, 1939, an incoming speech wave is broken down by a bank of band-pass filters into a number of contiguous subbands, and a low frequency control current is derived for each such subband representing the fractional part of the speech energy contained within that subband. These control currents, which occupy much narrower spectrum bands than the voice currents from which they are derived, are transmitted to a receiver station along with a pitch control signal. The receiver station includes a like bank of band-pass filters and, in tandem with each one, a variable attenuator or a variable gain amplifier. A buzz source, under control of the pitch signal, and a hiss source without such control are provided.
The output of the buzz source or the hiss source, as the case may be in dependence on whether the sound being analyzed is voiced or unvoiced, is applied in parallel to all the filters of the bank while the attenuation or gain in tandem with each such filter is varied under control of the corresponding one of the transmitted control currents. The outputs of these several channels are then combined and the combination is applied to a sound reproducer.
Ideally, the several spectrum control signals are proportional to the coefficients of the Fourier expansion of the spectrum of the incoming speech sound. Because of limitations imposed by the properties of the analyzing filters, this ideal is not attained in practice. It has generally been believed that all that was necessary for substantially perfect quality of the reproduced speech was to attain, or at least approximate, this ideal. I have discovered, however, that even with substantially perfect control signals reconstitution through the agency of a bank of conventional band-pass filters is still defective and the resulting speech still degraded for the following reasons:
With any set of contiguous passbands the harmonic components of a speech wave of unstated pitch and of unstated harmonic distribution may lie anywhere with respect to the midband frequencies of the filters. Under one condition a major component may be precisely centered in one of these passbands, while under a slightly different condition in which the same harmonic component of the voice has been slightly shifted on the frequency scale, it falls at the crossover point of two adjacent filter characteristics.
On the assumption that the transmitted control currents are truly representative of the distribution of the harmonic speech components, the transmitted control currents which, in the receiver apparatus, modify the gain or attenuation in tandem with the several filters produce a correct result in the first situation but, with conventional filters, they produce an incorrect result in the second situation. In the first situation that filter in which the major speech component is centered is turned on strongly and those on either side of it are turned on more weakly. In the second situation the two filters which lie on either side of the crossover frequency are turned on with the same strength; that is, the gains of the two adjacent channels which lie immediately above the harmonic component in question and immediately below it are adjusted to the same level. The effective passband of two filters connected in parallel, energized alike and associated with equal gains, is determined by the sum of the transmission characteristics of these two filters. With ordinary band-pass filters whose transfer characteristics have flat or moderately rounded tops and vertical or steeply sloping sides, the resultant characteristic has a passband which is nearly twice as broad as that of either one alone. Hence in this condition the envelope of this portion of the spectrum of the reconstituted speech is nearly twice as wide as it should be. Hence, too, as the pitch, the harmonics and the formants of the voice shift on the frequency scale even by a small fraction, the widths of the formants of the reconstructed speech vary widely. I have traced a substantial part of the degradation which characterizes ordinary vocoder reconstituted speech to this origin.
Evidently, what is required in this situation is a set of synthesizing filters whose characteristics are such that the bandwidth of two adjacent ones, taken together, is the same as that of either one alone.
The present invention is based in part upon the discovery that this property, heretofore unknown, is inherent in a set of N filters that meet two requirements:
First, each filter has an amplitude-frequency characteristic composed of constituents of the form
$$\frac{\sin \pi x}{\pi x} \tag{1}$$
where x is a normalized frequency measured from the center of the passband; i.e., from the major peak of the constituent characteristic. Each such constituent has a major lobe centered at a specified point of the frequency scale and representing the principal portion of the passband of the resulting filter, and a series of minor lobes of successively opposite signs. The constituent characteristic thus crosses the axis of zero amplitude at each of a series of nulls. In particular, the major lobe being positive, the first two minor ones are negative, the second two are positive, and so on. Each such change of sign represents an abrupt change, of magnitude π radians or 180 degrees, in the phase of the contribution of that constituent to the output of the entire filter. The series of nulls, otherwise regular, is broken by a gap between the first two nulls, twice as wide as the spacing between any two other nulls of the series, and the major lobe is centered in this gap.
Second, the constituent characteristics of the successive filters of the set are so spaced apart on the frequency scale that the major lobe of each one falls at the frequency of the first nulls of its nearest neighbor constituents on either side, and hence at the frequency of higher order nulls of the characteristics of its more distant neighbors. The amplitude-frequency characteristics of the several filters realized from such constituents have the same properties.
With such a set of filters the effective bandwidth of any two adjacent ones, taken together, is the same as the bandwidth of any one alone. Hence, in operation, the widths of the formants of the reconstituted speech are substantially invariant with changes in the character, other than formant width, of the original speech.
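The contrast between conventional passbands and the proposed constituents can be checked numerically. The following is a minimal sketch, not part of the specification: it assumes idealized flat-topped passbands of unit width as the "conventional" case, uses NumPy's `np.sinc` (which is sin(πx)/(πx)) for the proposed constituents, and measures width at 3 decibels below the peak with a hypothetical helper `width_at_fraction`.

```python
import numpy as np

def width_at_fraction(h, y, fraction=1 / np.sqrt(2)):
    """Width of the region where y is at or above fraction * max(y)."""
    above = h[y >= fraction * y.max()]
    return above.max() - above.min()

# Normalized frequency grid with a step of 1e-4.
h = np.linspace(-3.0, 4.0, 70001)

# Assumed idealization of conventional filters: flat tops, vertical sides.
rect_a = ((h >= -0.5) & (h < 0.5)).astype(float)
rect_b = ((h >= 0.5) & (h < 1.5)).astype(float)

# Constituents of the form sin(pi x) / (pi x), spaced one unit apart.
sinc_a = np.sinc(h)
sinc_b = np.sinc(h - 1.0)

# Ratio of the width of two adjacent characteristics, summed, to one alone.
rect_ratio = width_at_fraction(h, rect_a + rect_b) / width_at_fraction(h, rect_a)
sinc_ratio = width_at_fraction(h, sinc_a + sinc_b) / width_at_fraction(h, sinc_a)

print(f"flat-topped pair: {rect_ratio:.2f}x the single-filter width")
print(f"sin(x)/x pair:    {sinc_ratio:.2f}x the single-filter width")
```

Under these assumptions the flat-topped pair comes out about twice as wide as a single filter, while the two-term sum of sin x / x constituents stays well below 1.5 times the single width; the agreement improves further as more neighboring terms are summed.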
To take full advantage of the present invention it is necessary to furnish the reconstitution apparatus with control signals which are truly representative of the spectrum of the original speech. The improved set of reconstruction filters cannot, of course, make up for defects in the analyzing apparatus. For example, the reconstruction apparatus must be furnished with a set of control signals in which any equality between the strengths of adjacent ones arises only by virtue of the momentary characteristics of the incoming speech itself and not by virtue of the construction of the analyzing filters. To this end it is preferred to employ at the analyzer a bank of filters which are much narrower and much more numerous than is customary, to sample their outputs in rapid succession, to combine the resulting samples and to derive improved control currents from this combination. A speech analyzer designed along these lines is described by James L. Flanagan in an article entitled "Automatic Extraction of Formant Frequencies From Continuous Speech," published in the Journal of the Acoustical Society of America for January 1956, volume 28, page 110.
Filters which for all practical purposes have the characteristics required for the synthesizing apparatus may be of any desired construction; e.g., they may be of the class known as transversal filters. Such a filter comprises a nonreflecting wave propagation device such as an electromagnetic transmission line, terminated at its far end to prevent any reflection, and having an appropriate number of lateral taps. An attenuator of appropriate magnitude is inserted in series with each of these taps and a phase inverter is inserted in series with some of them, in order to realize the negative signs of those portions [...]
Fig. 1 [...] band transmission apparatus in accordance with the invention;
Fig. 2 is a graph showing transmission characteristics of filters in accordance with the invention, and their resultant characteristic;
Fig. 3 is a graph employed in the explanation of the invention; and
Fig. 4 is a schematic circuit diagram showing a filter proportioned to produce a transmission characteristic as shown in Fig. 2.
Referring now to the drawings, the left-hand portion of Fig. 1 shows apparatus for deriving control signals of a sort which the apparatus shown at the right-hand portion of the figure can turn to account. Speech waves originating, for example, at a microphone 1 are delivered to the input points of a set of analyzing filters 2 connected in parallel. The passbands of the individual filters of this set are contiguous and together embrace the entire frequency band of the voice waves to be transmitted. As compared with the bank of filters of a conventional voice analyzer the filters of the present set are several times as numerous and the passband of each is several times as narrow. For example these filters may be ninety in number and the passband of each one may be of the order of 40 cycles per second. Thus the passband of the first may extend from 100 cycles per second to 140 cycles per second, that of the second from 140 cycles per second to 180 cycles per second and so to the ninetieth filter whose passband may extend from 3660 cycles per second to 3700 cycles per second. The cost of such a large group of filters is largely compensated by the fact that each may comprise merely a simple one-stage resonant circuit.
To the output of each analyzing filter is connected a rectifier 3 and to the rectifier is connected, in turn, a low-pass filter 4 whose high frequency cutoff may be adjusted to about [...] cycles per second thus to pass syllabic frequencies.
This arrangement provides, at the output points of the several low-pass filters 4, a plurality of low frequency control signals, representative of the spectrum of the incoming speech analyzed to a very fine scale. This group of control signals may be converted into a lesser number of coarser control signals by sampling them in rapid succession as by a commutator 5 of which the scanning element, here shown as a mechanical wiper arm 6, is rotated in a counterclockwise direction at a speed, for example, of 300 revolutions per second under control of a timing wave source 7. The electrical output of this wiper arm 6 is passed through a low-pass filter 8 whose cutoff is located at about 15,000 cycles per second. This filter is interposed merely to block components arising solely from the sampling operation. The signal, cleared by this filter 8 of sampling frequency components, is applied to the scanning element or wiper arm 9 of another commutator 10 of which each segment occupies an angle six times as large as does a segment of the first commutator 5. The wiper arm 9 of this second commutator 10 is synchronized with that of the first commutator as by driving it, in a clockwise direction, from the same timing wave source 7.
With fifteen segments on the second commutator 10 and ninety segments on the first commutator 5 the signal thus applied to each bar of the second commutator 10 is an average of the energies in six adjacent filter channels of the analyzing filter bank 2. Therefore conductors connected to the several bars of the second commutator 10 carry low frequency channel control currents which are more truly representative of the envelope of the spectrum of the incoming speech than can be derived with a coarse-grained analyzer such as that of the aforementioned Dudley patent. Each of these conductors may also carry high frequency components introduced by the second commutator 10. These may be readily eliminated by the interposition in each control channel of a low-pass filter 11. The high frequency cutoffs of these several filters may be adjusted to about [...] cycles per second. The outputs of these several filters are now control signals containing components of frequencies no higher than the syllabic frequencies of the speech.
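The reduction from ninety fine channels to fifteen coarse control signals can be illustrated in a few lines. This is a minimal sketch that idealizes the two commutators and their low-pass filters as a plain average over each run of six adjacent channels; the test spectrum and the names `fine_edges` and `coarse_controls` are hypothetical.

```python
import numpy as np

N_FINE, N_COARSE = 90, 15
GROUP = N_FINE // N_COARSE  # six fine channels feed each coarse channel

# Band edges of the ninety 40-cycle analyzing filters, 100 to 3700 cycles per second.
fine_edges = 100.0 + 40.0 * np.arange(N_FINE + 1)

def coarse_controls(fine_levels):
    """Average each run of six adjacent fine-channel levels into one coarse level."""
    fine_levels = np.asarray(fine_levels, dtype=float)
    return fine_levels.reshape(N_COARSE, GROUP).mean(axis=1)

# Hypothetical fine-channel levels: a single spectral peak near 700 cycles per second.
centers = 0.5 * (fine_edges[:-1] + fine_edges[1:])
fine_levels = np.exp(-0.5 * ((centers - 700.0) / 80.0) ** 2)

controls = coarse_controls(fine_levels)
print(controls.round(3))
```

The peak at 700 cycles per second falls in the third coarse band (580 to 820 cycles per second), so the third of the fifteen control values is the largest.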
In addition, the speech energy is applied to a pitch detector 12 which derives and delivers a control signal representative of the fundamental frequency of the speech or other wave being analyzed. The latter may be as shown in Fig. 3 of Riesz Patent 2,522,539, September 19, 1950 or otherwise as desired.
The pitch control signal and the fifteen channel control signals are transmitted in any desired fashion over an intervening medium, symbolized on the drawing merely by broken lines, to a receiver or synthesizer station. Here the pitch control signal operates and controls the frequency of a buzz source 20 and, when it falls below a preassigned low amplitude threshold, indicating the absence of a voiced sound at the transmitter, permits a relay 21 to be deenergized, thus to turn off the buzz source and permit a hiss source 22 to be turned on in its stead. The buzz source 20 and its controlled tuning, the hiss source 22, and the relay 21 which selects between them may be as shown in the aforementioned Riesz patent.
A bank of fifteen filters 23 is supplied in parallel with energy from the buzz source 20 or the hiss source 22 as the case may be, in dependence on the character of the pitch control signal. In tandem with the output terminals of the several filters are connected a like plurality of elements which introduce controllable amounts of gain or attenuation. These may be variable gain amplifiers 24 as shown, and the several channel control signals are applied to the gain control terminals of these amplifiers respectively.
The outputs of the several variable gain amplifiers 24 are additively combined as by application to a common conductor 25 which is in turn connected to a sound reproducer 26.
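The signal flow at the synthesizer (source selection by the relay, then gain control and additive combination) may be sketched as follows. This is only an illustration under stated assumptions: the function names, the threshold comparison, and the representation of each filter output as an array of samples are hypothetical and are not taken from the Riesz or Dudley apparatus.

```python
import numpy as np

def select_excitation(pitch_level, threshold, buzz, hiss):
    """Relay logic: use the buzz source while the pitch signal is at or above
    the threshold; fall back to the hiss source when it drops below."""
    return buzz if pitch_level >= threshold else hiss

def combine(filter_outputs, gains):
    """Scale each filter output by its channel control signal and sum them,
    as on the common conductor feeding the sound reproducer."""
    filter_outputs = np.asarray(filter_outputs, dtype=float)  # shape (channels, samples)
    gains = np.asarray(gains, dtype=float)                    # shape (channels,)
    return gains @ filter_outputs
```

For example, `combine([[1, 2], [3, 4]], [0.5, 2.0])` weights the first channel by 0.5 and the second by 2.0 before adding them sample by sample.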
As with the vocoder synthesizer of the aforementioned Dudley patent, the passbands of the synthesizer filters 23 embrace the entire frequency band occupied by the original speech; i.e., the range extending, for example, from 100 to 3700 cycles per second. With fifteen of them, each one is responsible for the reconstruction of a subband of 240 cycles per second. Thus the first of these filters 23-1, responsible for the band 100-340 cycles per second, is so proportioned as to have the major peak of its transmission characteristic located in the center of the first 240 cycle band; namely, at 220 cycles per second, the second, 23-2, to have its major peak centered 240 cycles higher on the frequency scale or at 460 cycles per second, and so on.
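The arithmetic behind these figures is short enough to write out. A minimal sketch, using only the band limits and filter count given in the text (the variable names are illustrative):

```python
LOW, HIGH, N_FILTERS = 100.0, 3700.0, 15  # cycles per second, number of filters

# Each synthesizer filter reconstructs an equal share of the band.
spacing = (HIGH - LOW) / N_FILTERS

# The major peak of each filter sits at the center of its subband.
centers = [LOW + spacing * (i + 0.5) for i in range(N_FILTERS)]

print(spacing)       # subband width in cycles per second
print(centers[:3])   # first few midband frequencies
```

This gives a spacing of 240 cycles per second and midband frequencies of 220, 460, 700 and so on, up to 3580 cycles per second for the fifteenth filter.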
Each of these filters, furthermore, is so proportioned that its amplitude-frequency characteristic is composed of constituents of the form
$$\frac{\sin \pi\left(\frac{f}{F}-n\right)}{\pi\left(\frac{f}{F}-n\right)} \tag{2}$$
wherein F is the spacing between the midband frequencies of adjacent filters, where f is any frequency, and n takes on successive integral values for the successive filters of the set. The filters are so proportioned, furthermore, that each of the several nulls of the amplitude characteristic of each one lies at the same frequency as the major peak of the characteristic of one of its neighbors; more specifically the major peaks of the several filters are spaced apart on the frequency scale precisely as are the nulls of the characteristic of any one of them. Hence F also represents the spacing between adjacent nulls of the characteristic of any one filter.
It is convenient, in discussing the filters of the invention, to employ a normalized frequency h, defined as
$$h = \frac{f}{F} \tag{3}$$
With this substitution Equation 2 becomes
$$\frac{\sin \pi(h-n)}{\pi(h-n)} \tag{4}$$
In Fig. 2 the curve A is a graph of the constituent characteristic of one such filter for which n=3; it is a graph of the curve
$$\frac{\sin \pi(h-3)}{\pi(h-3)} \tag{5}$$
and hence has its major peak at the point h=3 and nulls at points 0, 1, 2, 4, 5, 6, and so on, on the h axis. For convenience the amplitude scale is chosen to place the major peak at height unity. The curve B shows the constituent characteristic of a neighboring filter of the set, for which n=4. That is to say, it is a graph of the curve
$$\frac{\sin \pi(h-4)}{\pi(h-4)} \tag{6}$$
From considerations of symmetry it is evident that the major peaks are spaced apart on the h axis by unity. The falling shoulder of the curve A crosses the rising shoulder of the curve B at the h value three and one-half, i.e., midway between the h values of the two major peaks, and at a height 0.635 which differs but little from the height of 0.707, i.e., 3 decibels down from either major peak. At this crossover point considerations of symmetry again show that the width of each curve is unity.
The sum of the curve A and the curve B is plotted as the curve C. Its rising shoulder lies inside the rising shoulder of the curve A, and it passes through the peak of the curve A where the contribution to it from the curve B is zero. Similarly, its falling shoulder lies inside the falling shoulder of the curve B and it passes directly through the peak of the curve B where the contribution to it from the curve A is zero. Its peak is just twice as high as the point of crossover of curves A and B; that is, the height of its peak is 1.27.
Descending from this peak by 3 decibels brings us to a height of 0.91. At this height the width of the curve C differs only slightly from its width at height unity which is itself unity. It is thus evident from the graph that the width of the curve C differs but little from the width of curve A or that of curve B.
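The values read from the graph of Fig. 2 can be compared with a direct evaluation of the curves. A minimal sketch using NumPy's `np.sinc`, which computes sin(πx)/(πx); the function names `curve_a`, `curve_b` and `curve_c` are merely labels for the curves A, B and C:

```python
import numpy as np

def curve_a(h):
    """Constituent characteristic for n = 3."""
    return np.sinc(h - 3.0)

def curve_b(h):
    """Constituent characteristic for n = 4."""
    return np.sinc(h - 4.0)

def curve_c(h):
    """Sum of curves A and B."""
    return curve_a(h) + curve_b(h)

crossover = curve_a(3.5)   # height where A and B cross, midway between the peaks
peak_c = curve_c(3.5)      # height of curve C at the same point
nulls_a = curve_a(np.array([0.0, 1.0, 2.0, 4.0, 5.0, 6.0]))

print(f"crossover height: {crossover:.4f}")
print(f"height of C at h = 3.5: {peak_c:.4f}")
print(f"C at the peaks of A and B: {curve_c(3.0):.4f}, {curve_c(4.0):.4f}")
```

The crossover height evaluates to 2/π, about 0.637, and curve C reaches 4/π, about 1.27, at h = 3.5, in close agreement with the figures quoted from the graph; curve C passes through unity at both h = 3 and h = 4.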
The foregoing good approximation was arrived at by the addition of only two neighboring curves of the form
$$\frac{\sin \pi x}{\pi x} \tag{1}$$
namely the addition of curves A and B to produce the curve C. With the addition of a number of such curves, all similar in form and spaced apart on the frequency scale in the fashion above described, the approximation is still better. While this is difficult to show graphically it may readily be shown analytically as follows:
Referring to Fig. 3, consider a function G(h) of the independent variable h, having peaks and valleys, the width of each peak being in excess of unity. With this restriction this function may be expressed as a series of the following form
$$G(h) = \sum_{n=-\infty}^{\infty} G(n)\,\frac{\sin \pi(h-n)}{\pi(h-n)} \tag{7}$$
This is shown in "Probability and Information Theory With Applications to Radar" by P. M. Woodward (McGraw-Hill, 1953), page 34. In the above expression G(n) means the amplitude of the function G(h) at any one of a number n of discrete points of the h axis, termed sample points.
This means that G(h) can be calculated for any point h though its magnitude be given only for discrete integral values (h=n) of the abscissa.
It is plain from this theorem that no importance is attached to the relative locations of the peaks of the function G(h) with respect to the sampling points. They may fall on, or anywhere between, the sampling points. The representation of G(h) is therefore always of the same quality, no matter how it may be shifted upward or downward along the h axis.
In physical terms this means that with an infinitude of sin x / x filters, the behavior of each of which is reflected in one term of the foregoing series while the summation of their outputs is reflected in the summation sign of the series itself, the quality of the spectrum of the final output, and in particular the bandwidth of any of its peaks, is independent of the location of such peak on the frequency scale with respect to the midband frequency of any of the filters. The foregoing is nearly, though of course not absolutely, true with a limited though large number of sample points; physically, with a number such as fifteen of reconstruction filters.
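The independence of the reconstruction from the position of a peak relative to the sample points can be illustrated numerically. The sketch below is an illustration only: it assumes a hypothetical single-peak envelope, the square of a sin x / x curve of double width, chosen because it is band-limited in the sense the series requires, and it truncates the series to a finite number of terms.

```python
import numpy as np

def spectrum_envelope(h, center):
    """Hypothetical envelope G(h): one peak, wider than unity, located at `center`."""
    return np.sinc((h - center) / 2.0) ** 2

def sinc_series(h, center, n_terms=400):
    """Truncated series: sum over n of G(n) * sin(pi(h - n)) / (pi(h - n))."""
    n = np.arange(-n_terms, n_terms + 1)
    samples = spectrum_envelope(n, center)            # G(n) at integer sample points
    return np.sinc(h[:, None] - n[None, :]) @ samples

h = np.linspace(-4.0, 4.0, 801)

# Place the peak on a sample point, a quarter of the way, and midway between two.
errors = {
    center: float(np.max(np.abs(sinc_series(h, center) - spectrum_envelope(h, center))))
    for center in (0.0, 0.25, 0.5)
}
for center, err in errors.items():
    print(f"peak at h = {center}: max reconstruction error {err:.2e}")
```

For this test envelope the reconstruction error stays negligible whether the peak sits on a sample point or midway between two, which is the property the series is invoked to establish.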
The constituent characteristic given by Equation 4 is unsymmetrical with respect to the origin of frequencies except in the special case in which n is zero; that is to say, when the characteristic is centered at the origin. This characteristic cannot be realized; that is to say, no filter having this characteristic can be physically constructed. However, from this unsymmetrical, unrealizable characteristic there can be generated a symmetrical realizable characteristic by combining with the first constituent characteristic a second constituent characteristic which is defined by the same analytic relation but for negative frequencies. The function
$$\frac{\sin \pi(h+n)}{\pi(h+n)} \tag{8}$$
is such a related constituent function and accordingly, when it is added to the function given by Equation 4 and when the sum, an even function, is properly combined with an appropriate phase factor, an odd function, the result is a filter characteristic which is physically realizable. Thus
$$e^{-i\pi h}\left[\frac{\sin \pi(h-n)}{\pi(h-n)} + \frac{\sin \pi(h+n)}{\pi(h+n)}\right] \tag{9}$$
Another characteristic which is antisymmetric and therefore realizable is obtained by subtracting Equation 8 from Equation 4, instead of adding them as in Equation 9, and multiplying both terms by $i=\sqrt{-1}$, to give
$$i\,e^{-i\pi h}\left[\frac{\sin \pi(h-n)}{\pi(h-n)} - \frac{\sin \pi(h+n)}{\pi(h+n)}\right] \tag{10}$$
Here, the factor i, which is equal to $e^{i\pi/2}$, represents a fixed phase shift of π/2 radians or 90 degrees, and accordingly Equation 10 may be rewritten as
$$e^{-i\pi\left(h-\frac{1}{2}\right)}\left[\frac{\sin \pi(h-n)}{\pi(h-n)} - \frac{\sin \pi(h+n)}{\pi(h+n)}\right] \tag{10a}$$
It will be noted that the amplitude factors of Equations 9, 10, and 10a, if rewritten in absolute value form, would be everywhere positive and would be, in addition, even functions of frequency. Thus the well known conditions for physical realizability are satisfied.
A set of filters constructed to embody the characteristics of Equation 9, or a set constructed to embody the characteristics of Equation 10 serves even better than do the characteristics of equation 4; i.e., better than the unrealizable filters. The reason for this property is that the frequency spectrum of a wave arising from any physical source is itself symmetrical about the origin of frequencies. Thus symmetrical pairs of functions given by Equation 9 or by Equation 10, when properly weighted and summed as indicated in Equation 7, will exactly represent the wave spectrum of any signal encountered in practice.
Fig. 4 shows the configuration of a transversal filter which may readily be constructed to manifest a realizable characteristic, as given by Equation 9 or Equation 10. It comprises a wave propagation device such as an electromagnetic transmission line 30 having input terminals 31 and comprising inductors 32 connected in series and capacitors 33 connected in parallel and terminated, as by a resistor 34 connected to its right-hand terminals, to prevent reflection of energy introduced at its left-hand terminals 31 and propagated from left to right. It is provided with a number 2N of lateral taps, designated by successive integral values k. Each of these lateral taps is connected through a weighting network here shown for the sake of simplicity merely as a resistor 35, and either directly or through a phase reversing device 36 to a combining point 37.
The weighting factor for the kth tap of the nth filter is given, for the characteristic of Equation 9 by the formula
$$W = \cos\frac{n\pi k}{N} \tag{11}$$
and, for the characteristic of Equation 10 by the formula
$$W = \sin\frac{n\pi k}{N} \tag{12}$$
[...] negative weighting factors.
The design Formulae 11 and 12 which give the weighting factors for the several (2N) taps of the several (N) filters may be derived by following orthodox procedures which are well known in the field of filter design. These procedures include the determination, from the desired realizable characteristic, Equation 9 or 10, of the impulse response which corresponds to it in the time domain, and the synthesis of such impulse response from a plurality of component impulses, variously delayed and variously weighted. This procedure gives, for each of the various delayed impulses, the corresponding weight.
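The relation between sinusoidal tap weights and the resulting characteristic can be checked by evaluating the response of a tapped delay line directly. The sketch below rests on assumptions that are not stated in the text: taps numbered k = 0 to 2N-1, a uniform tap delay of 1/(2NF), and weights of the form cos(nπk/N) or sin(nπk/N). The function names are hypothetical.

```python
import numpy as np

N = 15  # number of filters in the set; each propagation device has 2N taps

def tap_weights(n, kind="cos"):
    """Assumed weights for the nth filter: cos or sin of (n * pi * k / N)."""
    k = np.arange(2 * N)                      # assumed tap numbering 0 .. 2N-1
    angle = np.pi * n * k / N
    return np.cos(angle) if kind == "cos" else np.sin(angle)

def response(weights, h):
    """Response sum_k W_k * exp(-i*pi*h*k/N) at normalized frequency h = f/F,
    assuming each tap adds a delay of 1/(2NF)."""
    k = np.arange(len(weights))
    h = np.atleast_1d(np.asarray(h, dtype=float))
    return np.exp(-1j * np.pi * h[:, None] * k[None, :] / N) @ weights

n = 3
integers = np.arange(0, N + 1)                # integral values of h from 0 to N
mag_cos = np.abs(response(tap_weights(n, "cos"), integers))
mag_sin = np.abs(response(tap_weights(n, "sin"), integers))

# Taps with negative weights are the ones that need a phase reversing device.
needs_inverter = tap_weights(n, "cos") < 0

print(mag_cos.round(6))
print(int(needs_inverter.sum()), "of", 2 * N, "taps inverted")
```

Under these assumptions the response of the third filter is largest at h = 3 and vanishes at every other integral value of h from 0 to N, for the cosine and the sine weighting alike, which is the peak-on-null spacing the set of filters requires.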
The signals thus derived from the several taps of any one propagation device are now combined at the point 37 to constitute the output of a particular one of these filters. This output is applied, as explained above, to a variable gain or attenuation device which is controlled as to the gain or attenuation which it introduces by a particular one of the incoming control signals. The outputs of these several devices are in turn combined additively and applied in common to a sound reproducer.
Modifications in detail and various other uses of the invention will suggest themselves to those skilled in the art.
What is claimed is:
1. Wave-synthesizing apparatus which comprises a plurality of filters each of which is proportioned to present an amplitude frequency characteristic comprising a major peak and a plurality of uniformly spaced nulls on each side of said peak, said characteristic thus changing sign at each null, said filters being so proportioned that their several major peaks are uniformly spaced on the frequency scale, each of said filters being further proportioned to have the nulls of its characteristic at the frequencies of the major peaks of the characteristics of its neighbors, means for energizing all of said filters together, means for modulating the outputs of said filters individually by control signals, means for combining the outputs of all of said filters as modulated, and means for reproducing said combined outputs as a wave.
2. In combination with apparatus as defined in claim 1, a source of oscillations, and means for applying the oscillations of said source to all of said filters together.
3. Apparatus as defined in claim 1 wherein said filters are connected in parallel.
4. In combination with apparatus as defined in claim 1, attenuation-varying means connected in tandem with each of said filters, and means for selectively controlling said attenuation-varying means.
5. In combination with apparatus as defined in claim 4, means for combining the outputs of said several filters as modified by said attenuation-varying means to form a composite signal.
6. In combination with apparatus as defined in claim 5, means for translating said composite signal into a sound wave.
7. Apparatus as defined in claim 1 wherein the amplitude-frequency characteristic of each of said filters is composed of constituents of the form
$$\frac{\sin \pi(h-n)}{\pi(h-n)}$$
wherein h = f/F is the normalized frequency, F is the spacing, on the frequency scale, between the midband frequencies of successive ones of said filters, and wherein n takes on successive integral values for the successive filters of the set.
8. Apparatus as defined in claim 1 wherein the transmission characteristic of each of said filters is of the form [...] F is the spacing, on the frequency scale, between the midband frequencies of successive ones of said filters, and wherein n takes on successive integral values for the successive filters of the set.
9. Apparatus as defined in claim 1 wherein the transmission characteristic of each of said filters is of the form [...] F is the spacing, on the frequency scale, between the midband frequencies of successive ones of said filters, and wherein n takes on successive integral values for the successive filters of the set.
10. Apparatus for synthesizing a wave having a spectrum which extends from a lower frequency of substantially zero to an upper frequency NF, which comprises a set of N filters designated by successive integral numbers n, each of said filters comprising a wave propagation device having a plurality 2N of lateral taps spaced uniformly along its length and designated by successive integral numbers k, means connected with each tap for assigning a specified relative magnitude to its output, means for inverting the phases of some of said outputs, the relative magnitude and the phase inversion for each tap being given by
$$W = \cos\frac{n\pi k}{N}$$
[...] means for modulating the outputs of said filters individually by control signals, means for combining the outputs of all of said filters as modulated, and means for reproducing said combined outputs as a wave.
11. Apparatus for synthesizing a wave having a spectrum which extends from a lower frequency of substantially zero to an upper frequency NF, which comprises a set of N filters designated by successive integral numbers n, each of said filters comprising a wave propagation device having a plurality 2N of lateral taps spaced uniformly along its length and designated by successive integral numbers k, means connected with each tap for assigning a specified relative magnitude to its output, means for inverting the phases of some of said outputs, the relative magnitude and the phase inversion for each tap being given by
$$W = \cos\frac{n\pi k}{N}$$
means for combining the outputs of all the taps of each propagation device, as thus modified, to form a filter output, means for energizing all of said filters together, means for modulating the outputs of said filters individually by control signals, means for combining the outputs of all of said filters as modulated, and means for reproducing said combined outputs as a wave.
12. Apparatus for synthesizing a wave having a spectrum which extends from a lower frequency of substantially zero to an upper frequency NF, which comprises a set of N filters designated by successive integral numbers n, each of said filters comprising a wave propagation device having a plurality 2N of lateral taps spaced uniformly along its length and designated by successive integral numbers k, means connected with each tap for assigning a specified relative magnitude to its output, means for inverting the phases of some of said outputs, the relative magnitude and the phase inversion for each tap being given by
$$W = \sin\frac{n\pi k}{N}$$
means for combining the outputs of all the taps of each propagation device, as thus modified, to form a filter output, means for energizing all of said filters together, means for modulating the outputs of said filters individually by control signals, means for combining the outputs of all of said filters as modulated, and means for reproducing said combined outputs as a wave.
13. Apparatus for synthesizing a wave which comprises a set of filters, each of said filters comprising a wave propagation device having a plurality of lateral taps spaced uniformly along its length, means connected with each tap for assigning a specified relative magnitude to its output, the relative magnitudes for the several taps of any one device varying sinusoidally along the length of said device, said variation differing from device to device in a fashion to space the midfrequencies of the passbands of the successive filters of the set apart along the frequency scale, means for combining the outputs of all the taps of each propagation device, as thus modified, to form a filter output, means for energizing all of said filters together, and means for utilizing the outputs of all of said filters.
Dudley Mar. 21, 1939 Bedford Feb. 1, 1944