United States Patent Office Patented
3,327,058 SPEECH WAVE ANALYZER Cecil H. Coker, Berkeley Heights, N.J., assignor to Bell Telephone Laboratories, Incorporated, New York, N.Y., a corporation of New York
Filed Nov. 8, 1963, Ser. No. 322,389
5 Claims. (Cl. 179-1)
This invention relates to the analysis and transmission of speech, and in particular to the analysis and transmission of speech in bandwidth compression systems.
In order to make more economical use of the frequency bandwidth of speech transmission channels, a number of bandwidth compression arrangements have been devised for transmitting the information content of a speech wave over a channel whose bandwidth is substantially narrower than that required for transmission of the speech wave itself. Bandwidth compression systems typically include at a transmitting terminal an analyzer for deriving from an incoming speech wave a group of narrow bandwidth control signals representative of selected information bearing characteristics of the speech wave, and at a receiving terminal a synthesizer for reconstructing from the control signals a replica of the original speech wave.
One well-known bandwidth compression system is the so-called "resonance vocoder," a specific form of which is described in J. L. Flanagan Patent 2,891,111, issued June 16, 1959. In a resonance vocoder, the distinctive information bearing characteristics which are represented by the control signals and which are reconstructed at the receiving terminal are the frequency locations of selected peaks or maxima in the speech amplitude spectrum. These spectral maxima correspond to the principal vocal tract resonances or formants; that is, they correspond to frequency regions of relatively effective acoustic transmission through a talker's vocal tract.
In order to reconstruct natural sounding, intelligible speech in a resonance vocoder, it is necessary to determine with accuracy the frequencies at which formants or maxima occur in the speech spectrum, since human listeners are able to detect variations in formant frequencies which are on the order of 50 cycles per second. One system for determining formant locations is described in the above-mentioned J. L. Flanagan patent, in which the spectrum of an incoming speech signal is divided into three frequency subbands, each of which embraces the characteristic frequency range of a particular formant. Each frequency subband is separated into its individual frequency components, and from these components the Flanagan system selects the frequency of the component having the maximum amplitude within a subband to represent the location of a formant occurring within that subband.
It is evident that in the Flanagan system the determination of formant locations is predicated upon the assumption that within each selected spectral subband the frequency of a single speech component having an amplitude larger than the amplitude of any other component accurately represents a formant location. In general, however, formants do not coincide exactly with speech frequency components, so that the frequency of a component having a maximum amplitude within a frequency subband is generally only an approximation to the true location of a formant.
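The maximum-amplitude selection described above can be illustrated with a minimal sketch. It is not taken from the Flanagan patent; the function name `pick_peak_frequency` and the sample subband values are hypothetical and chosen only for illustration.

```python
def pick_peak_frequency(frequencies, amplitudes):
    """Return the frequency of the largest-amplitude component in one subband."""
    if not amplitudes or len(frequencies) != len(amplitudes):
        raise ValueError("need equal-length, non-empty sequences")
    peak_index = max(range(len(amplitudes)), key=lambda i: amplitudes[i])
    return frequencies[peak_index]


# Hypothetical subband: four components spaced 100 cycles per second apart.
subband_frequencies = [400.0, 500.0, 600.0, 700.0]
subband_amplitudes = [0.3, 0.9, 0.8, 0.2]

peak = pick_peak_frequency(subband_frequencies, subband_amplitudes)
print(peak)  # 500.0
```

In this example the two largest components are nearly equal, which suggests a spectral maximum lying between 500 and 600 cycles per second, yet the selected value can only be one of the component frequencies.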
The present invention provides a system for accurately determining formant locations by recognizing that in general a formant is most likely to occur at an intermediate frequency lying between several components in the vicinity of a maximum in the speech spectrum. Because the components that occur in the vicinity of a spectral maximum usually have larger amplitudes than any other components within a particular frequency subband, the present invention interpolates a formant location among the frequencies of the components having the largest amplitudes within a subband. Further, since a formant does occasionally coincide with a single speech component having an amplitude substantially larger than the amplitude of any other component within the subband, the present invention also provides for this situation by selecting the frequency of such a component to represent the formant location.
The present invention determines whether a formant location is to be represented by the frequency of a single component or whether a formant location is to be interpolated among the frequencies of several speech components according to their relative amplitudes by first converting the absolute amplitudes of the speech components within a particular frequency subband into logarithmic values according to a predetermined scale. Then, from the logarithmic values thus obtained there is derived a variable threshold against which the logarithmic values of the speech component amplitudes are compared. If no more than a single logarithmic value exceeds the threshold, then a single predetermined frequency corresponding to the speech component having the largest amplitude is selected to represent the formant location. On the other hand, if more than one logarithmic value exceeds the threshold, then the formant location is represented by a frequency interpolated among the predetermined frequencies corresponding to the speech components having amplitudes whose logarithms exceed the threshold, the exact value of the interpolated frequency depending upon the relative magnitudes of the differences between the variable threshold and the logarithmic values of the speech components that exceed the threshold. The variable threshold against which the logarithmic values are compared is derived from the logarithmic values by setting the threshold at a level such that the sum of the differences between the threshold and the logarithmic values that exceed the threshold is equal to a predetermined constant.
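The threshold rule and interpolation just described can be sketched in software, although the patent itself describes circuitry. The sketch below rests on several assumptions that are not stated in the text: the logarithmic scale is taken to be decibels, the predetermined constant is taken to be 6 dB, and the interpolated frequency is taken to be the average of the component frequencies weighted by their excesses over the threshold. The names `variable_threshold` and `estimate_formant` are hypothetical.

```python
import math


def variable_threshold(log_values, constant):
    """Find T such that sum(max(v - T, 0) for v in log_values) == constant."""
    if not log_values or constant <= 0:
        raise ValueError("need at least one value and a positive constant")
    ordered = sorted(log_values, reverse=True)
    running_sum = 0.0
    for count, value in enumerate(ordered, start=1):
        running_sum += value
        # Threshold that would hold if exactly the top `count` values exceed it.
        threshold = (running_sum - constant) / count
        next_value = ordered[count] if count < len(ordered) else -math.inf
        if threshold >= next_value:
            return threshold


def estimate_formant(frequencies, amplitudes, constant_db=6.0):
    """Interpolate a formant frequency from the components above the threshold."""
    # Assumption: decibels stand in for the patent's "predetermined scale".
    log_values = [20.0 * math.log10(a) for a in amplitudes]
    threshold = variable_threshold(log_values, constant_db)
    excesses = [max(v - threshold, 0.0) for v in log_values]
    # Assumption: excess-weighted average as the interpolation rule.
    weighted = sum(f * e for f, e in zip(frequencies, excesses))
    return weighted / sum(excesses)


frequencies = [500.0, 600.0, 700.0, 800.0]
print(estimate_formant(frequencies, [1.0, 10.0, 10.0, 1.0]))   # 650.0
print(estimate_formant(frequencies, [1.0, 100.0, 1.0, 1.0]))   # 600.0
```

In the first call two equal components exceed the threshold, so the estimate falls midway between them. In the second call only one component exceeds the threshold, so its own frequency is returned, matching the single-component case described above.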
An important feature of the present invention is its ability to interpolate formant locations among speech component frequencies based on relative, rather than absolute, speech component amplitudes, thereby ensuring accurate determination of formant locations despite wide variations in absolute speech component amplitudes due to corresponding wide variations in overall speech energy. This feature of the present invention is attained through the conversion of absolute amplitudes into logarithmic values, and the interpolation of formant locations according to the relative magnitudes of logarithmic differences. Logarithmic differences correspond to ratios of absolute amplitudes; hence the present invention interpolates formant locations despite variations in overall speech energy.
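The independence from overall speech energy can be checked numerically. The following self-contained sketch assumes a decibel scale, a constant of 6 dB, and an excess-weighted average as the interpolation rule; none of these specifics come from the patent text, and the function name `interpolate_formant` is hypothetical. Scaling every amplitude by a common gain adds the same amount to every logarithmic value, so the threshold shifts by that amount and the excesses over it are unchanged.

```python
import math


def interpolate_formant(frequencies, amplitudes, constant_db=6.0):
    """Excess-weighted frequency estimate over a variable log threshold."""
    if not amplitudes:
        raise ValueError("need at least one component")
    logs = [20.0 * math.log10(a) for a in amplitudes]  # assumed dB scale
    ordered = sorted(logs, reverse=True)
    total = 0.0
    threshold = 0.0
    for count, value in enumerate(ordered, start=1):
        total += value
        # Level at which the top `count` excesses sum to constant_db.
        threshold = (total - constant_db) / count
        if count == len(ordered) or threshold >= ordered[count]:
            break
    excesses = [max(v - threshold, 0.0) for v in logs]
    return sum(f * e for f, e in zip(frequencies, excesses)) / sum(excesses)


frequencies = [400.0, 500.0, 600.0, 700.0]
quiet = [0.03, 0.09, 0.08, 0.02]
loud = [100.0 * a for a in quiet]  # same spectrum shape, 40 dB more energy

quiet_estimate = interpolate_formant(frequencies, quiet)
loud_estimate = interpolate_formant(frequencies, loud)
print(quiet_estimate, loud_estimate)
```

The two estimates agree to within floating-point rounding, since only the ratios between component amplitudes enter the result.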
The invention will be fully understood from the following description of illustrative embodiments thereof taken in connection with the appended drawings, in which:
FIG. 1 is a block diagram showing a complete bandwidth compression system embodying the principles of this invention;
FIG. 2 is a schematic diagram illustrating in detail the maximum value selector and interpolator of this invention;
FIG. 3A is a schematic diagram illustrating in detail certain components of the system shown in FIG. 1;
FIG. 3B is a graph of assistance in explaining the operation of a portion of the circuit shown in FIG. 3A;
FIG. 4 is a graph of assistance in explaining certain principles of the present invention; and
FIG. 5 is a block schematic diagram illustrating that the principles of this invention may be utilized in systems other than bandwidth compression systems.