|Publication number||US3327058 A|
|Publication date||Jun 20, 1967|
|Filing date||Nov 8, 1963|
|Priority date||Nov 8, 1963|
|Publication number||US 3327058 A, US 3327058A, US-A-3327058, US3327058 A, US3327058A|
|Inventors||Cecil H Coker|
|Original Assignee||Bell Telephone Labor Inc|
[Drawing sheets (3 sheets): C. H. Coker, Speech Wave Analyzer, filed Nov. 8, 1963, patented June 20, 1967. Recoverable labels: speech input signal, formant detectors, utilization device, output to low pass filter; graph axis "frequency in cycles per second" with first, second, and third formant ranges.]
United States Patent Office 3,327,058 SPEECH WAVE ANALYZER Cecil H. Coker, Berkeley Heights, N.J., assignor to Bell Telephone Laboratories, Incorporated, New York, N.Y., a corporation of New York. Filed Nov. 8, 1963, Ser. No. 322,389. Claims. (Cl. 179-1)
This invention relates to the analysis and transmission of speech, and in particular to the analysis and transmission of speech in bandwidth compression systems.
In order to make more economic use of the frequency bandwidth of speech transmission channels, a number of bandwidth compression arrangements have been devised for transmitting the information content of a speech wave over a channel whose bandwidth is substantially narrower than that required for transmission of the speech wave itself. Bandwidth compression systems typically include at a transmitting terminal an analyzer for deriving from an incoming speech wave a group of narrow bandwidth control signals representative of selected information bearing characteristics of the speech wave, and at a receiving terminal a synthesizer for reconstructing from the control signals a replica of the original speech wave.
One well-known bandwidth compression system is the so-called resonance vocoder, a specific form of which is described in J. L. Flanagan Patent 2,891,111, issued June 16, 1959. In a resonance vocoder, the distinctive information bearing characteristics which are represented by the control signals and which are reconstructed at the receiving terminal are the frequency locations of selected peaks or maxima in the speech amplitude spectrum. These spectral maxima correspond to the principal vocal tract resonances or formants; that is, they correspond to frequency regions of relatively effective acoustic transmission through a talker's vocal tract.
In order to reconstruct natural sounding, intelligible speech in a resonance vocoder, it is necessary to determine with accuracy the frequencies at which formants or maxima occur in the speech spectrum, since human listeners are able to detect variations in formant frequencies which are on the order of 50 cycles per second. One system for determining formant locations is described in the above-mentioned J. L. Flanagan patent, in which the spectrum of an incoming speech signal is divided into three frequency subbands, each of which embraces the characteristic frequency range of a particular formant. Each frequency subband is separated into its individual frequency components, and from these components the Flanagan system selects the frequency of the component having the maximum amplitude within a subband to represent the location of a formant occurring within that subband.
It is evident that in the Flanagan system the determination of formant locations is predicated upon the assumption that within each selected spectral subband the frequency of a single speech component having an amplitude larger than the amplitude of any other component accurately represents a formant location. In general, however, formants do not coincide exactly with speech frequency components, so that the frequency of a component having a maximum amplitude within a frequency subband is generally only an approximation to the true location of a formant.
The present invention provides a system for accurately determining formant locations by recognizing that in general a formant is most likely to occur at an intermediate frequency lying between several components in the vicinity of a maximum in the speech spectrum. Because the components that occur in the vicinity of a spectral maximum usually have larger amplitudes than any other components within a particular frequency subband, the present invention interpolates a formant location among the frequencies of the components having the largest amplitudes within a subband. Further, since a formant does occasionally coincide with a single speech component having an amplitude substantially larger than the amplitude of any other component within the subband, the present invention also provides for this situation by selecting the frequency of such a component to represent the formant location.
The present invention determines whether a formant location is to be represented by the frequency of a single component or is to be interpolated among the frequencies of several speech components according to their relative amplitudes. It does so by first converting the absolute amplitudes of the speech components within a particular frequency subband into logarithmic values according to a predetermined scale. Then, from the logarithmic values thus obtained there is derived a variable threshold against which the logarithmic values of the speech component amplitudes are compared. If no more than a single logarithmic value exceeds the threshold, then a single predetermined frequency corresponding to the speech component having the largest amplitude is selected to represent the formant location. On the other hand, if more than one logarithmic value exceeds the threshold, then the formant location is represented by a frequency interpolated among the predetermined frequencies corresponding to the speech components having amplitudes whose logarithms exceed the threshold, the exact value of the interpolated frequency depending upon the relative magnitudes of the differences between the variable threshold and the logarithmic values of the speech components that exceed the threshold. The variable threshold against which the logarithmic values are compared is derived from the logarithmic values by setting the threshold at a level such that the sum of the differences between the threshold and the logarithmic values that exceed the threshold is equal to a predetermined constant.
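The threshold rule described above can be illustrated numerically. The following is a minimal sketch, not the patented circuit: the function names, the use of decibel values, and the weighting of each frequency by its excess over the threshold are assumptions made for illustration, chosen so that the weights sum to the predetermined constant.

```python
def variable_threshold(log_values, constant):
    """Find the level t such that the excesses (value - t) of all values
    above t add up to `constant` (an illustrative reading of the rule)."""
    if not log_values:
        raise ValueError("at least one logarithmic value is required")
    ordered = sorted(log_values, reverse=True)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        threshold = (running - constant) / k
        # Accept when every remaining value lies at or below the threshold.
        if k == len(ordered) or threshold >= ordered[k]:
            return threshold


def interpolate_formant(frequencies, log_values, constant):
    """Weight each component frequency by its excess over the threshold."""
    threshold = variable_threshold(log_values, constant)
    excess = [max(0.0, value - threshold) for value in log_values]
    return sum(f * e for f, e in zip(frequencies, excess)) / constant


# Hypothetical component frequencies (cycles per second) and levels (dB).
frequencies = [500.0, 600.0, 700.0]
print(interpolate_formant(frequencies, [20.0, 30.0, 28.0], constant=6.0))
print(interpolate_formant(frequencies, [20.0, 30.0, 21.0], constant=6.0))
```

In the first call two values exceed the threshold, so the result lies between 600 and 700 cycles per second; in the second call only one value exceeds it, so the frequency of that single component is returned.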
An important feature of the present invention is its ability to interpolate formant locations among speech component frequencies based on relative, rather than absolute, speech component amplitudes, thereby insuring accurate determination of formant locations despite wide variations in absolute speech component amplitudes due to corresponding wide variations in overall speech energy. This feature of the present invention is attained through the conversion of absolute amplitudes into logarithmic values and the interpolation of formant locations according to the relative magnitudes of logarithmic differences. Because logarithmic differences correspond to ratios of absolute amplitudes, the interpolation is unaffected by variations in overall speech energy.
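The independence from overall speech energy follows from a standard property of logarithms. As a brief illustration, let $A_i$ and $A_j$ denote the amplitudes of two speech components and let $g$ denote a common gain factor applied to the whole speech wave; these symbols are introduced here for illustration only.

$$20\log_{10}(g A_i) - 20\log_{10}(g A_j) = 20\log_{10}\frac{A_i}{A_j}$$

The gain $g$ cancels, so the difference between two logarithmic values depends only on the ratio of the two amplitudes.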
The invention will be fully understood from the following description of illustrative embodiments thereof taken in connection with the appended drawings, in which:
FIG. 1 is a block diagram showing a complete bandwidth compression system embodying the principles of this invention;
FIG. 2 is a schematic diagram illustrating in detail the maximum value selector and interpolator of this invention;
FIG. 3A is a schematic diagram illustrating in detail certain components of the systems shown in FIG. 1;
FIG. 3B is a graph of assistance in explaining the operation of a portion of the circuit shown in FIG. 3A;
FIG. 4 is a graph of assistance in explaining certain principles of the present invention; and
FIG. 5 is a block schematic diagram illustrating that the principles of this invention may be utilized in systems other than bandwidth compression systems.
Referring first to FIG. 1, this drawing illustrates a complete speech communication system embodying the principles of the present invention. At the transmitter station, an incoming speech wave from source 10, which may be a conventional transducer for converting speech sounds into a corresponding electrical wave, is delivered simultaneously to formant detectors 11, 12, and 13 and to excitation detector 14. Formant detectors 11, 12, and 13 are similar in construction, but as described in detail below, each of these detectors is designed to obtain a relatively narrow band control signal indicative of the frequency location of a particular formant of the sound represented by the incoming speech wave. As shown in FIG. 1, detectors 11, 12, and 13 may be designed to obtain narrow band control signals respectively indicative of the first, second, and third formants. Examples of peaks in the spectrum of a typical voiced speech sound which correspond to these formants are illustrated in FIG. 4, in which the vertical lines represent the frequency components of the speech sound.
Excitation detector 14, which may be of any conventional design, derives from the incoming speech wave a group of narrow band excitation control signals indicative of the characteristics of the excitation that produced this speech wave; for example, detector 14 may be a conventional pitch detector that derives from the speech wave a pitch control signal representing the fundamental frequency of voiced portions of the speech wave and a voiced-unvoiced control signal indicating whether the speech wave at a given instant represents a voiced speech sound or an unvoiced speech sound. Alternatively, if desired, excitation detector 14 may comprise the pitch detector and voice amplitude detector, the unvoiced amplitude detector, and the unvoiced control signal generator shown and described in the copending application of E. E. David, Jr., and J. L. Flanagan, Ser. No. 235,703, filed Nov. 6, 1962, now issued as Patent 3,190,963 on June 22, 1965.
The narrow band control signals derived by detectors 11, 12, 13, and 14 represent selected characteristics of the original speech wave which convey sufficient speech information for the reconstruction of an intelligible replica of the original speech wave, but the collective bandwidth of these control signals is substantially smaller than the bandwidth of the original speech wave. Hence these control signals may be transmitted to a distant receiver station over a reduced bandwidth transmission medium, the medium being indicated in FIG. 1 by broken lines. At the receiver station the transmitted narrow band control signals are utilized in a speech synthesizer 15 to reconstruct an artificial speech wave that is a highly intelligible, natural sounding replica of the original speech wave. Speech synthesizer 15 may be any conventional formant vocoder synthesizer; for example, synthesizer 15 may be of the variety shown in the above-mentioned David, Jr.-Flanagan Patent 3,190,963. Synthesizer 15 produces a reconstructed speech wave having a spectrum whose peaks are adjusted to follow closely the locations of the formants of the original speech wave as indicated by the transmitted formant control signals. Speech sounds may be obtained by applying the reconstructed wave to a suitable transducer, for example, a conventional loudspeaker 16.
Referring back to formant detectors 11, 12, and 13 at the transmitter station, only detector 11 is shown and described in detail, since the detectors are structurally similar, except for certain design characteristics which depend upon the particular formant to be located, as described below. Within detector 11 the incoming speech wave is applied in parallel to a bank of n contiguous or overlapping bandpass filters 110-1 through 110-n, where the pass bands of the filters are respectively denoted $\Delta f_1$ through $\Delta f_n$, in cycles per second. The pass bands of filters 110-1 through 110-n collectively span a frequency subband of the speech spectrum which embraces the frequency range within which the first speech formant normally occurs, and similarly the pass bands of the corresponding banks of bandpass filters in detectors 12 and 13 collectively span frequency subbands of the speech spectrum which embrace the respective frequency ranges within which the second and third formants normally occur. Since the frequency ranges of the various formants overlap to some extent, it is evident that one or more of the bandpass filters in each of the detectors 11, 12, and 13 will have pass bands embracing the same frequencies, as indicated by the division of the frequency scale in FIG. 4. Further, although FIG. 1 illustrates only three formant detectors for determining the first three formants, which are the most important formants for speech intelligibility, it is to be understood that additional formant detectors may be provided if it is desired to determine the frequency locations of higher order formants.
Depending upon the frequency content of the incoming speech wave, there will be developed at the output terminals of a number of the filters 110-1 through 110-n alternating waves whose frequencies and amplitudes represent the frequencies and amplitudes of corresponding frequency components of the incoming speech wave within the frequency subband covered by the pass bands of filters 110-1 through 110-n. The output terminal of each filter 110-1 through 110-n is connected to the input terminal of a corresponding logarithmic amplifier 111-1 through 111-n, each of which is followed by a corresponding rectifier 112-1 through 112-n and a corresponding low pass filter 113-1 through 113-n, all connected in series. A detailed schematic of a suitable logarithmic amplifier and rectifier is shown in FIG. 3A and described below, but conventional logarithmic amplifiers and rectifiers may be employed if desired, while low pass filters 113-1 through 113-n may be of any well-known design. The logarithmic amplifier, rectifier, and low pass filter following each bandpass filter serve to derive from the alternating output signal of a preceding bandpass filter a unidirectional or narrow band signal having a magnitude that represents the logarithm of the instantaneous amplitude of the corresponding speech wave component indicated by a preceding sinusoidal output signal.
In this invention the voltages of the unidirectional output signals of low pass filters 113-1 through 113-n, which represent the amplitudes of the corresponding speech frequency components, are described in terms of decibels, but other logarithmic scales are equally suitable. A suitable scale factor for these unidirectional signals is 0.1 volt per decibel, where, as shown in FIG. 4, the decibel represents amplitude on a logarithmic scale according to the usual definition. By making the unidirectional signals represent the amplitudes of speech frequency components on a logarithmic or decibel scale, the maximum value selector and interpolator 114 in formant detector 11 is able to determine formant locations with a high degree of accuracy, as will be explained in the following detailed description of maximum value selector 114.
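The 0.1 volt per decibel scale factor can be illustrated with a short calculation. This is a sketch under stated assumptions: the function name and the reference amplitude are hypothetical, and the decibel is taken as 20 times the base-10 logarithm of an amplitude ratio, which is the usual definition for amplitudes.

```python
import math


def log_voltage(amplitude, reference=1.0, volts_per_db=0.1):
    """Map a component amplitude to a unidirectional voltage on a decibel scale.

    `reference` is a hypothetical 0 dB amplitude; only differences between
    voltages matter to the selector, so its value is arbitrary here.
    """
    if amplitude <= 0 or reference <= 0:
        raise ValueError("amplitudes must be positive")
    decibels = 20.0 * math.log10(amplitude / reference)
    return volts_per_db * decibels


# Two components whose amplitudes differ by a factor of 10 (20 dB)
# produce voltages that differ by about 2 volts at 0.1 volt per decibel.
print(log_voltage(10.0) - log_voltage(1.0))
```

Raising both amplitudes by the same factor shifts both voltages equally and leaves their difference unchanged.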
Turning now to FIG. 2, this drawing illustrates in detail the structure of the maximum value selector and interpolator employed in each of the formant detectors 11, 12, and 13. The output terminal of each of the n low pass filters preceding the maximum value selector and interpolator is connected to the base of a corresponding transistor $T_1$ through $T_n$, where the emitters of transistors $T_1$ through $T_n$ are each connected through an identical resistor $R_E$ to a common emitter bus 22. Transistors $T_1$ through $T_n$ may be NPN transistors of conventional design. The collectors of transistors $T_1$ through $T_n$ are each connected through a corresponding tap $P_1$ through $P_n$ of a plurality of series connected resistors $R_{c1}$ through $R_{cn}$ to a source of positive potential 21.
A constant current I from conventional current source 2 is delivered to the common emitter bus 22, and depending upon the relative magnitudes of the incoming unidirectional signals which produce base voltages $V_{B1}$ through $V_{Bn}$ at the bases of the corresponding transistors, the current I is directed to flow through one or more of the transistors and the associated resistors $R_{c1}$ through $R_{cn}$ to vary the magnitude of the output voltage V between points a and b. The output voltage V may be calibrated to assume a predetermined value $V_j$ corresponding to a formant centered on the pass band $\Delta f_j$ of bandpass filter 110-j by choosing $R_{cj}$ such that Equation 1 is satisfied, where it is to be understood that calibration of V is made in the order proceeding from $R_{c1}$ through $R_{cn}$. In this manner, the output voltage may be calibrated to assume n predetermined different values $V_1$ through $V_n$ corresponding to the n possibilities that a formant location coincides with a particular speech component whose frequency corresponds to the center frequency of the pass band of a particular bandpass filter. However, as explained in detail below, the output voltage produced by the maximum value selector and interpolator also assumes intermediate values between adjacent voltages $V_j$ and $V_{j+1}$ to represent formant locations that occur at intermediate points between filter center frequencies.
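One calibration relation consistent with this description can be written as follows. It is an illustrative assumption, not a quotation of Equation 1: it supposes that when transistor $T_j$ carries the entire current I, that current flows through resistors $R_{c1}$ through $R_{cj}$ on its way to source 21, with the summation index m introduced here for illustration.

$$V_j = I \sum_{m=1}^{j} R_{cm}$$

Under this assumption, $R_{c1}$ fixes $V_1$, and each subsequent resistor $R_{cj}$ fixes the step from $V_{j-1}$ to $V_j$, which agrees with the statement that calibration proceeds in order from $R_{c1}$ through $R_{cn}$.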
The maximum value selector and interpolator circuit shown in FIG. 2 operates in the following manner. When one of the unidirectional voltages applied to the base of its corresponding transistor is substantially more positive than any of the other unidirectional voltages, all of the current I passes through the corresponding transistor and the resistors $R_c$ that lie between the collector of the conducting transistor and the positive power supply 21 to produce an output voltage V whose magnitude has been calibrated to indicate a predetermined frequency location in accordance with Equation 1. This mode of operation may be understood from the following relationship between the common emitter bus voltage, denoted $V_k$, and the voltage across the ith transistor $T_i$, in the event that $T_i$ is conducting the current $I_i$,
$$V_k = V_{Bi} - I_i R_E \quad (2)$$
When $V_{Bi}$ is sufficiently large, $V_k$ is more positive than the base voltage applied to any other transistor, making the other transistors nonconductive and allowing all of the current I to flow through $T_i$, as stated above.
A graphic illustration of the operation of the maximum value selector and interpolator circuit in the situation where all of the current I flows through a single transistor is shown by the first formant in FIG. 4. It is observed in FIG. 4 that the first formant coincides with the fifth harmonic of the speech wave, and it is further observed that the fifth harmonic is about 8 decibels larger than the next largest harmonic, the sixth harmonic. In this situation there will be obtained a unidirectional voltage at the output terminal of one of the low pass filters having an amplitude which is correspondingly larger than the next largest unidirectional voltage, the magnitude of the difference between the largest and next largest unidirectional voltage being determined by the scale factor that relates voltage to decibels. For example, with a scale factor of 1/10 of a volt per decibel, the unidirectional voltage representing the fifth harmonic will be 8/10 of a volt larger than the unidirectional voltage representing the sixth harmonic. Assuming that the fifth harmonic coincides with the center frequency of the pass band of the fifth bandpass filter, 110-5, there will appear at the base of corresponding transistor $T_5$ a base voltage that is at least 8/10 of a volt larger than the base voltage appearing at the base of any other transistor. By choosing the resistance of each of the emitter resistors $R_E$ and the value of the current I so that the product $I R_E$ is less than 8/10 of a volt, for example, by choosing resistance and current values so that $I R_E = 0.6$ volt, it will be seen from Equation 2 that the common emitter bus voltage $V_k$ is more positive than any of the other base voltages, thereby making all of the other transistors nonconductive. As a result, the output voltage V will be equal to the value calibrated for filter 5, that is, $V = V_5$.
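The arithmetic of this example can be checked with a short sketch. The harmonic levels below are hypothetical values chosen only so that the fifth harmonic is 8 decibels above the sixth; the function name is likewise an assumption, and the transistors are treated as ideal, as in the relationship between the bus voltage and the base voltage of the conducting transistor.

```python
def single_component_selected(base_voltages, i_re):
    """Return True when only the transistor with the largest base voltage conducts.

    With all of the current in one transistor, the bus voltage sits `i_re`
    volts below the largest base voltage; every other base voltage must lie
    at or below that level for the other transistors to stay cut off.
    """
    ordered = sorted(base_voltages, reverse=True)
    threshold = ordered[0] - i_re
    return all(voltage <= threshold for voltage in ordered[1:])


VOLTS_PER_DB = 0.1
# Hypothetical levels in decibels for harmonics 1 through 6.
levels_db = [12.0, 15.0, 18.0, 20.0, 30.0, 22.0]
base_voltages = [VOLTS_PER_DB * level for level in levels_db]

print(single_component_selected(base_voltages, i_re=0.6))
```

With a 0.8 volt margin and a 0.6 volt product of current and emitter resistance, the bus voltage lies 0.2 volt above the sixth harmonic's base voltage, so only one transistor conducts.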
It is to be noted at this point that in the above example the output voltage V would have the same value even if the fifth harmonic did not coincide exactly with the center frequency of the pass band of the fifth bandpass filter. That is, so long as the fifth harmonic lay somewhere within the pass band of the fifth bandpass filter and the resulting unidirectional voltage exceeded the product $I R_E$, $T_5$ would conduct all of the current and the magnitude of the output voltage would be determined by Equation 1. However, it has been found that the error between the frequency represented by V and the frequency of the component passed by the corresponding bandpass filter is a tolerable one.
In the example given above it was assumed for purposes of illustration that a formant was characterized by a relatively sharp peak in the speech spectrum so that the formant coincided with a single frequency component having an amplitude substantially larger than the amplitude of any other component within the frequency range of a particular formant. The determination of a formant location by the maximum value selector and interpolator in this situation required only identification of the frequency component having the largest amplitude within the particular formant frequency range. In general, however, these conditions do not prevail since a formant is not necessarily centered upon a single, relatively large speech frequency component; rather, it is more usual for a formant to occur in the vicinity of two or more speech frequency components having relatively large amplitudes, as shown in FIG. 4 by the second and third formants. Accurate determination of a formant location in this situation requires interpolation between the frequencies of two or more speech components having the largest amplitudes within the frequency range of a particular formant. It is a particular feature of the maximum value selector and interpolator shown in FIG. 2 that the output voltage V indicates the occurrence of a formant between the frequencies of two or more speech components having the largest relative amplitudes within a particular formant frequency range by interpolating between the two or more largest components according to their relative amplitudes.
Referring back to FIG. 2, it is evident from Equation 2 that when two transistors $T_i$ and $T_j$ conduct respective currents $I_i$ and $I_j$, the common emitter bus voltage $V_k$ is given by
$$V_k = V_{Bi} - I_i R_E = V_{Bj} - I_j R_E \quad (3)$$
When the magnitudes of $V_{Bi}$ and $V_{Bj}$ become sufficiently large to make $V_k$ more positive than the base voltages applied to any other transistors, the other transistors are made nonconductive, and the sum of the respective currents $I_i$ and $I_j$ conducted by $T_i$ and $T_j$ is equal to the total current I, that is,
$$I = I_i + I_j \quad (4)$$
In this situation, Equation 3 may be rewritten
$$V_k = \tfrac{1}{2}\left(V_{Bi} + V_{Bj}\right) - \tfrac{1}{2} I R_E \quad (5)$$
that is, the magnitude of $V_k$ is dependent upon both the magnitudes of the base voltages $V_{Bi}$ and $V_{Bj}$ and the magnitude of the predetermined constant $I R_E$. The common emitter bus voltage $V_k$ thus serves as a variable threshold against which the magnitudes of all the base voltages are compared, the magnitudes of the two base voltages $V_{Bi}$ and $V_{Bj}$ exceeding the magnitude of $V_k$ by individual amounts $I_i R_E$ and $I_j R_E$ whose sum is equal to the predetermined constant $I R_E$. This is illustrated graphically in FIG. 4, where it is seen that the sum of the differences between $V_k$ and $V_{Bi}$ and $V_{Bj}$ is equal to
$$I_i R_E + I_j R_E = I R_E \quad (6)$$
From Equation 3 the currents $I_i$ and $I_j$ may be expressed in terms of the difference between the base voltages applied to $T_i$ and $T_j$ (Equation 7), and since $I_i = I - I_j$, the expression for the output voltage V given in Equation 8a may be rewritten as Equation 8b. Similarly, by substituting $I_j = I - I_i$ in Equation 8a, the output voltage V may also be expressed as Equation 8c.
In Equations 8b and 8c it is seen that the output voltage V interpolates between the condition in which $T_i$ conducts all of the current and the condition in which $T_j$ conducts all of the current. Thus the first term on the right-hand side of Equations 8b and 8c respectively represents $V_i$ and $V_j$, the magnitudes of V when all of the current I flows only through $T_i$ or only through $T_j$, and the second term on the right-hand side of Equation 8b represents the additional voltage due to the flow of current $I_j$ through $T_j$, while the second term on the right-hand side of Equation 8c represents the amount by which $V_j$ is reduced due to the flow of $I_i$ through $T_i$. The exact formant location represented by V is therefore seen to be dependent upon the relative amplitudes of the two largest speech components as reflected in the relative magnitudes of the corresponding base voltages $V_{Bi}$, $V_{Bj}$ that exceed the threshold $V_k$ and produce the resulting currents $I_i$, $I_j$, according to the equations above.
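For illustration, relations that play the roles described for Equations 7 and 8a through 8c can be derived from Equations 3 and 4. These are a reconstruction under stated assumptions, not a quotation of the patent's displayed equations: the case $i < j$ is assumed, and the current in each transistor $T_m$ is assumed to flow through collector resistors $R_{c1}$ through $R_{cm}$, so that $V_i = I \sum_{m=1}^{i} R_{cm}$ and $V_j = I \sum_{m=1}^{j} R_{cm}$.

From Equation 3, the current difference follows from the base voltage difference, and Equation 4 then gives each current separately:

$$I_i - I_j = \frac{V_{Bi} - V_{Bj}}{R_E}, \qquad I_i = \frac{1}{2}\left(I + \frac{V_{Bi} - V_{Bj}}{R_E}\right), \qquad I_j = \frac{1}{2}\left(I - \frac{V_{Bi} - V_{Bj}}{R_E}\right)$$

The output voltage is the sum of the drops produced by the two currents:

$$V = I_i \sum_{m=1}^{i} R_{cm} + I_j \sum_{m=1}^{j} R_{cm}$$

Substituting $I_i = I - I_j$, or alternatively $I_j = I - I_i$, gives the two interpolation forms:

$$V = V_i + I_j \sum_{m=i+1}^{j} R_{cm}, \qquad V = V_j - I_i \sum_{m=i+1}^{j} R_{cm}$$

In the first form the second term is the additional voltage due to $I_j$; in the second form it is the amount by which $V_j$ is reduced due to $I_i$.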
It is believed to be evident from the foregoing explanation that the maximum value selector and interpolator also interpolates among more than two relatively large frequency components. That is, the value of the output voltage V may be shown to be a function of the respective currents $I_i, I_j, I_k, \ldots, I_N$ flowing through a number of transistors $T_i, T_j, T_k, \ldots, T_N$ in FIG. 2, in an extension of the result shown in Equation 8a, as expressed in Equation 9. Thus in the situation where base voltages $V_{Bi}, V_{Bj}, V_{Bk}, \ldots, V_{BN}$ are respectively applied to transistors $T_i, T_j, T_k, \ldots, T_N$ to cause respective currents $I_i, I_j, I_k, \ldots, I_N$ to flow, the value of $V_k$ is given by
$$V_k = V_{Bi} - I_i R_E = V_{Bj} - I_j R_E = \cdots = V_{BN} - I_N R_E \quad (10a)$$
However, the relationship between $V_k$ and the base voltages is the same in the case of several conducting transistors as it is in the cases of one or two conducting transistors: when the magnitudes of $V_{Bi}, V_{Bj}, V_{Bk}, \ldots, V_{BN}$ become sufficiently large to make $V_k$ more positive than the base voltages applied to any of the other transistors, the other transistors are made nonconductive, and the sum of the respective currents $I_i, I_j, I_k, \ldots, I_N$ conducted by $T_i, T_j, T_k, \ldots, T_N$ is equal to the total current I,
$$I = I_i + I_j + I_k + \cdots + I_N \quad (10b)$$
The specific case of three base voltages $V_{Bi}$, $V_{Bj}$, $V_{Bk}$ greater than the variable threshold $V_k$ is illustrated graphically in FIG. 4, where it is seen that in the vicinity of the third formant the sum of the differences between $V_k$ and $V_{Bi}$, $V_{Bj}$, $V_{Bk}$ is equal to
$$I_i R_E + I_j R_E + I_k R_E = I R_E \quad (11)$$
Equation 11 therefore specifies the conditions under which the maximum value selector and interpolator apparatus of this invention interpolates a formant frequency location among the frequencies of any number of relatively large speech components within a frequency subband which have magnitudes greater than the variable threshold $V_k$, where the sum of the differences in magnitude between the amplitudes of the two or more relatively large components and the variable threshold is equal to a predetermined constant, and where the variable threshold exceeds the amplitude of any other component.
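The behavior with any number of conducting transistors can be sketched at the circuit level. This is a simplified model under stated assumptions, not a description of the actual apparatus: the transistors are treated as ideal (no base-emitter drop), the current in transistor m is assumed to flow through collector resistors 1 through m, and the function name and component values are hypothetical.

```python
def selector_output(base_voltages, collector_resistors, total_current, emitter_resistance):
    """Return (bus voltage, transistor currents, output voltage) for an
    idealized model of the maximum value selector and interpolator."""
    if not base_voltages or len(base_voltages) != len(collector_resistors):
        raise ValueError("need one collector resistor per base voltage")
    i_re = total_current * emitter_resistance

    # Bus voltage: the excesses of the conducting base voltages over it sum to I*RE.
    ordered = sorted(base_voltages, reverse=True)
    running = 0.0
    for k, voltage in enumerate(ordered, start=1):
        running += voltage
        bus_voltage = (running - i_re) / k
        if k == len(ordered) or bus_voltage >= ordered[k]:
            break

    # Each conducting transistor carries (V_B - V_k) / RE; the rest carry nothing.
    currents = [max(0.0, vb - bus_voltage) / emitter_resistance for vb in base_voltages]

    # Assumed topology: the current of transistor m passes through resistors 1..m.
    output_voltage = 0.0
    chain_resistance = 0.0
    for current, resistance in zip(currents, collector_resistors):
        chain_resistance += resistance
        output_voltage += current * chain_resistance
    return bus_voltage, currents, output_voltage


# Hypothetical values: I = 1 mA, RE = 600 ohms (I*RE = 0.6 V), 1 kilohm per tap.
v_k, currents, v_out = selector_output([2.0, 3.0, 2.8], [1000.0] * 3, 1e-3, 600.0)
print(v_k, currents, v_out)
```

In this example the second and third transistors share the current roughly two to one, so the output voltage falls between the calibrated values of 2 volts and 3 volts for those taps, closer to the value for the larger base voltage.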
It is important to observe that the accuracy achieved by the maximum value selector of this invention in determining formant locations is due in large part to the interpolation capability of the maximum value selector and interpolator since most formants occur at an intermediate point between several speech components. By converting absolute speech component amplitudes into logarithmic values, this accuracy is maintained despite wide variations in overall speech energy which produce corresponding wide variations in absolute speech component amplitudes, since the differences in logarithmic values which determine whether interpolation is performed according to Equation 3 correspond to ratios of absolute values.
The discussion above has distinguished between the cases in which the output voltage of the maximum value selector and interpolator may represent either the frequency of a single speech component having an amplitude that is larger than the amplitude of any other component by more than a predetermined amount or a frequency interpolated among several speech components having amplitudes that differ by less than the predetermined amount. However, it may be desirable to interpolate between two different indications of the same frequency component, which may be obtained by constructing bandpass filters to have partially overlapping pass bands with gentle cutoffs so that the same frequency component is represented by two alternating voltages of different amplitudes. In this way the output voltage of the maximum value selector and interpolator of this invention will indicate that a speech component does not coincide exactly with the center frequency of a particular bandpass filter, since the output voltage V in this case will represent an interpolation between two different indications of the same frequency component.
Turning now to FIG. 3A, this drawing illustrates a detailed schematic of a suitable logarithmic amplifier and a suitable rectifier which may be employed in the formant detector of this invention shown in FIG. 1. An incoming alternating voltage representative of a speech frequency component lying within the pass band of a preceding bandpass filter 110 is applied to the input point of the logarithmic amplifier 30, from which point complementary symmetry transistors T1 and T2 acting together as a high impedance current driver reproduce the applied voltage as an alternating controlled current at the common collector output point 301. The alternating controlled current is passed to ground through the two diodes D1 and D2, which have been selected for logarithmic forward current-voltage characteristics. By connecting the two diodes to present a forward diode characteristic as a load to current in either direction, there is obtained the combined logarithmic current-voltage characteristic shown in FIG. 3B. The overall effect of the current driver and its logarithmic nonlinear load impedance is to generate across the diodes D1, D2 an output voltage whose instantaneous amplitude is proportional to the logarithm of the sinusoidal input voltage amplitude. Transistors T3 and T4 form a high input impedance feedback amplifier that increases the amplitude of the logarithmic voltage developed across diodes D1, D2 sufficiently to overcome nonlinearities in diodes D3, D4 of the following full wave rectifier 31. From element 30 the amplified logarithmic voltage is delivered to element 31, in which the two diodes D3 and D4 and the two transistors T5 and T6 together form a transformerless full wave rectifier. Diodes D3 and D4 each work as a half wave rectifier, one for each half of the wave, while transistors T5 and T6 act as a difference amplifier amplifying on one half cycle the rectified output signal of D3 and on the other half cycle the rectified output signal of D4.
However, since the output signals of D3 and D4 are of opposite polarity, and since these diodes are connected to opposite polarity input terminals of the difference amplifier, each half cycle of the alternating input signal has the same polarity in the output signal of the difference amplifier. The output signal thus developed at the output terminal of rectifier 31 is a unidirectional voltage having an amplitude proportional to the logarithm of the amplitude of a speech frequency component represented by the alternating voltage applied to logarithmic amplifier 30. The unidirectional output signal of rectifier 31 is then passed to a low pass filter 113, as shown in FIG. 1, in order to remove unwanted high frequency components from the unidirectional signal.
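The logarithmic load and the full wave rectification can be illustrated with an idealized model. This sketch is an assumption-laden simplification and does not describe the actual circuit of FIG. 3A: the two diodes are modeled as an ideal antiparallel pair obeying the Shockley diode equation, for which the voltage is an inverse hyperbolic sine of the current, and the saturation current and thermal voltage are hypothetical values.

```python
import math


def diode_pair_voltage(current, saturation_current=1e-9, thermal_voltage=0.02585):
    """Voltage across two ideal diodes connected in antiparallel.

    For the pair, i = 2 * Is * sinh(v / Vt), so v = Vt * asinh(i / (2 * Is)).
    For |i| much larger than Is this approaches Vt * ln(|i| / Is) with the
    sign of i, which is a logarithmic characteristic for either polarity.
    """
    return thermal_voltage * math.asinh(current / (2.0 * saturation_current))


def full_wave_rectify(voltage):
    """Both half cycles appear with the same polarity at the output."""
    return abs(voltage)


# A tenfold increase in drive current adds a nearly fixed voltage step.
step = diode_pair_voltage(1e-3) - diode_pair_voltage(1e-4)
print(step, 0.02585 * math.log(10.0))
print(full_wave_rectify(diode_pair_voltage(-1e-3)))
```

The fixed step per decade of current is what makes the rectified and smoothed output proportional to the logarithm of the component amplitude.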
Although this invention has been described in terms of speech communication systems of the type shown in FIG. 1, it is to be understood that applications of the principles of this invention are not limited to these systems, but include such related fields as automatic speech recognition, speech processing, and automatic message recording and reproduction. Other applications of this invention are indicated in FIG. 5, in which formant detectors 5-1 through 5-n determine the locations of n selected formants of an incoming speech signal for use in a suitable utilization device 50. In addition, it is to be understood that the above-described embodiments are merely illustrative of the numerous arrangements which may be devised for the principles of this invention by those skilled in the art without departing from the spirit and scope of the invention.
What is claimed is:
1. A speech transmission system that comprises a transmitter station including:
a source of an incoming speech wave,
a plurality of formant detectors for deriving from said speech wave a corresponding plurality of formant control signals representative of the frequency locations of a corresponding selected plurality of formants of said speech wave, each of said formant detectors comprising a plurality of means for separating a selected subband of the amplitude spectrum of said speech wave into its individual speech frequency components to obtain a group of alternating voltages representative of the amplitudes and frequencies of said speech frequency components,
means for obtaining from said group of alternating voltages a corresponding group of unidirectional voltages having magnitudes proportional to the logarithms of the amplitudes of corresponding speech components,
means under the control of said group of unidirectional voltages for generating a control signal representative of the frequency location of a selected formant of said speech wave according to the relative magnitudes of said unidirectional voltages,
said means for generating a control signal including means for deriving from said unidirectional voltages a threshold having a magnitude proportional to the magnitudes of the largest of said unidirectional voltages so that the sum of the differences between said threshold and the magnitudes of said unidirectional voltages which exceed said threshold is equal to a predetermined constant, and
means for deriving from the magnitudes of said unidirectional voltages which exceed said threshold a control signal having a magnitude proportional to the relative magnitudes of said unidirectional voltages which exceed said threshold, and
an excitation detector for deriving from said speech wave a group of excitation control signals indicative of selected characteristics of the excitation characteristics of said speech wave,
a transmission medium for delivering said formant control signals and said excitation control signals to a receiver station, and
at said receiver station,
means for reconstructing from said formant control signals and said excitation signals a replica of said incoming speech wave at said transmitter station.
2. Apparatus for determining the frequency location of a selected formant of a speech wave from the relative amplitudes of the frequency components of said speech wave which lie within a frequency subband that embraces the frequency range in which said formant may appear which comprises a source of a speech wave,
means supplied with said speech wave for obtaining from said speech wave a group of unidirectional signals with magnitudes proportional to the logarithms of the amplitudes of the frequency components of said speech wave within said frequency subband, and
means under the control of said group of unidirectional signals for comparing the magnitudes of said unidirectional signals against a threshold derived from said magnitudes to generate a formant control signal with a magnitude proportional to the relative magnitudes of each of said unidirectional signals that exceeds said threshold, wherein the sum of the differences in magnitude between said threshold and each of said unidirectional signals is equal to a predetermined constant.
3. Apparatus as defined in claim 2 wherein said means for obtaining a group of unidirectional signals comprises a plurality of signal paths each including an input terminal connected to said source of a speech wave, an output terminal connected to said means under the control of said group of unidirectional signals, and
a bandpass filter, a logarithmic amplifier, a rectifier, and a low pass filter connected in series between said input terminal and said output terminal.
4. Apparatus for determining the frequency location of a selected formant of a speech wave as defined in claim 2 wherein said means under the control of said group of unidirectional signals comprises a source of current of constant magnitude,
a first plurality of resistors in one-to-one correspondence with said group of unidirectional signals, wherein each of said first plurality of resistors has an input terminal connected in parallel to said source of current, an output terminal, and a uniform selected resistance,
a plurality of transistors in one-to-one correspondence with said group of unidirectional signals and said first plurality of resistors, wherein each of said transistors is provided with an emitter terminal connected to the output terminal of the corresponding resistor in said first plurality of resistors, a base terminal, and a collector terminal,
a second plurality of resistors in one-to-one correspondence with said group of unidirectional signals and said plurality of transistors, wherein each of said second plurality of resistors is provided with a predetermined resistance, an input terminal, and an output terminal,
a source of positive potential having an input point,
means for connecting said second plurality of resistors in series with each other to form a resistance path having an output point connected to the input point of said source of positive potential and an input point,
means for connecting the collector terminal of each of said transistors to the input terminal of the corresponding one of said second plurality of resistors, and
means for applying each of said unidirectional signals to the base terminal of the corresponding one of said plurality of transistors to develop an output voltage between the input point of said source of positive potential and the input point of said resistance path.
5. Apparatus for determining the frequency locations of selected formants of a speech wave from the relative amplitudes of the frequency components of said speech wave which lie within frequency subbands that embrace the respective frequency ranges in which said formants may appear, which comprises a source of a speech wave,
a plurality of formant detectors connected in parallel to said source of a speech wave for deriving from said speech wave a corresponding plurality of output signals representative of the frequency location of a selected formant of said speech wave, each of said formant detectors including a plurality of contiguous bandpass filters provided with contiguous pass bands that span a frequency subband embracing the frequency range within which a preassigned formant normally occurs,
means for applying said speech wave in parallel to said bandpass filters to develop a group of alternating voltages representative of the amplitudes of the frequency components of said speech wave which lie within said subband,
a plurality of logarithmic amplifiers in one-to-one correspondence with said plurality of bandpass filters for deriving from said group of alternating voltages a corresponding group of logarithmic signals representative of the logarithms of the amplitudes of said alternating voltages,
a plurality of rectifiers in one-to-one correspondence with said plurality of logarithmic amplifiers for obtaining from said group of logarithmic signals a corresponding group of unidirectional voltages,
a plurality of low pass filters in one-to-one correspondence with said plurality of rectifiers for removing unwanted high frequency components from said group of unidirectional voltages to develop a corresponding group of narrow band control signals having magnitudes representative of the logarithms of the amplitudes of the frequency components of said speech wave, and
a maximum value selector and interpolator responsive to the magnitudes of said group of narrow band control signals for generating an output signal indicative of the frequency location of said preassigned formant, wherein said maximum value selector and interpolator includes means for deriving from said narrow band control signals a threshold having a magnitude proportional to the magnitudes of the largest of said narrow band control signals so that the sum of the differences in magnitude between said threshold and said narrow band control signals which exceed said threshold is equal to a predetermined constant, and
means for deriving from the magnitudes of said narrow band control signals which exceed said threshold an output signal having a magnitude representative of the relative magnitudes of said narrow band control signals which exceed said threshold.
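The threshold condition recited in claims 1, 2 and 5 (a threshold placed so that the excesses of the signals above it sum to a predetermined constant) can be illustrated with a short numerical sketch. The Python below is an illustration only and not the claimed circuit: the function names `find_threshold` and `interpolate_position`, the example levels, and the choice to weight each channel's excess by a channel center frequency are assumptions introduced here. The threshold is found by sorting the levels and testing, for each count k of largest levels, whether the threshold implied by those k levels lies at or above the next level down.

```python
def find_threshold(levels, excess_total):
    """Return T such that sum(max(v - T, 0) for v in levels) == excess_total.

    `levels` are the unidirectional (log-amplitude) signals and
    `excess_total` is the predetermined positive constant.
    """
    if not levels or excess_total <= 0:
        raise ValueError("need at least one level and a positive constant")
    ordered = sorted(levels, reverse=True)
    running = 0.0
    for k, level in enumerate(ordered, start=1):
        running += level
        # Threshold implied if exactly the k largest levels exceed it.
        threshold = (running - excess_total) / k
        next_level = ordered[k] if k < len(ordered) else float("-inf")
        if threshold >= next_level:
            return threshold
    raise AssertionError("unreachable: the last iteration always returns")


def interpolate_position(levels, positions, excess_total):
    """Weight each channel position by its excess over the threshold.

    `positions` are hypothetical channel center frequencies; channels at or
    below the threshold contribute nothing to the result.
    """
    threshold = find_threshold(levels, excess_total)
    excess = [max(v - threshold, 0.0) for v in levels]
    return sum(e * p for e, p in zip(excess, positions)) / sum(excess)


if __name__ == "__main__":
    levels = [1.0, 3.0, 2.5, 0.5]               # assumed log-amplitude signals
    centers = [500.0, 700.0, 900.0, 1100.0]     # assumed center frequencies, Hz
    print(find_threshold(levels, 1.0))
    print(interpolate_position(levels, centers, 1.0))
```

With the assumed levels and a constant of 1.0, the threshold settles at 2.25, only the two largest channels exceed it (by 0.75 and 0.25), and the weighted position is 750 Hz, between the two strongest channels and closer to the larger one.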
No references cited.
KATHLEEN H. CLAFFY, Primary Examiner.
R. MURRAY, Assistant Examiner.
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US3437757 *||Jun 15, 1966||Apr 8, 1969||Bell Telephone Labor Inc||Speech analysis system|
|US3439122 *||Jun 15, 1966||Apr 15, 1969||Bell Telephone Labor Inc||Speech analysis system|
|US3450989 *||Sep 28, 1965||Jun 17, 1969||Ibm||Frequency analyzer for detection of energy peaks|
|US3535549 *||Feb 14, 1967||Oct 20, 1970||Singer Co||Function generator|
|US3975587 *||Sep 13, 1974||Aug 17, 1976||International Telephone And Telegraph Corporation||Digital vocoder|
|US4829574 *||Feb 1, 1988||May 9, 1989||The University Of Melbourne||Signal processing|
|US4833711 *||Oct 19, 1983||May 23, 1989||Computer Basic Technology Research Assoc.||Speech recognition system with generation of logarithmic values of feature parameters|
|US4862503 *||Jan 19, 1988||Aug 29, 1989||Syracuse University||Voice parameter extractor using oral airflow|
|US5463716 *||Jan 18, 1994||Oct 31, 1995||Nec Corporation||Formant extraction on the basis of LPC information developed for individual partial bandwidths|
|US8737641 *||Nov 4, 2008||May 27, 2014||Mitsubishi Electric Corporation||Noise suppressor|
|US20090231011 *||May 31, 2007||Sep 17, 2009||Dirk Baldringer||Circuit assembly for distributing an input signal|
|US20110123045 *||Nov 4, 2008||May 26, 2011||Hirohisa Tasaki||Noise suppressor|
|US20110199144 *||Apr 22, 2011||Aug 18, 2011||Dirk Baldringer||Circuit arrangement for distorting an input signal|
|U.S. Classification||704/209, 327/350, 327/576|
|Cooperative Classification||H05K999/99, G10L19/02|