US3437757A - Speech analysis system - Google Patents

Info

Publication number
US3437757A
Authority
US
United States
Prior art keywords
formant
speech
speech wave
wave
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US557687A
Inventor
Cecil H Coker
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AT&T Corp
Original Assignee
Bell Telephone Laboratories Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bell Telephone Laboratories Inc
Application granted
Publication of US3437757A
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrophonic Musical Instruments (AREA)

Description

[Drawing sheets omitted. Sheet header: April 8, 1969, C. H. COKER, 3,437,757, SPEECH ANALYSIS SYSTEM, Filed June 15, 1966, 3 sheets. Legible labels: FIG. 1 (speech wave, selective filter, formant detector, previous formant information); FIG. 2A (input spectrum versus frequency); FIG. 2B (selective filter response versus frequency); FIG. 4 (formant processor and detector: signal processor, selective filter, formant detector, delay).]
3,437,757
SPEECH ANALYSIS SYSTEM
Cecil H. Coker, Berkeley Heights, N.J., assignor to Bell Telephone Laboratories, Incorporated, Murray Hill, Berkeley Heights, N.J., a corporation of New York
Filed June 15, 1966, Ser. No. 557,687
Int. Cl. H04b 1/66; G10l 1/06
U.S. Cl. 179-1 Claims
ABSTRACT OF THE DISCLOSURE
This invention pertains to the analysis of speech waves and, more particularly, to the analysis of speech waves in bandwidth compression systems.
In order to make more economical use of the frequency bandwidth of speech transmission channels, a number of bandwidth compression arrangements have been devised for transmitting the information content of a speech wave over a channel whose bandwidth is substantially narrower than that required for facsimile transmission of the speech wave itself. Bandwidth compression systems typically include, at a transmitting terminal, an analyzer for deriving from an incoming speech wave a group of narrow bandwidth control signals representative of selected information-bearing characteristics of the speech wave and, at a receiving terminal, a synthesizer for reconstructing from the control signals a replica of the original speech wave.
One well-known bandwidth compression system is the so-called resonance vocoder. In a resonance vocoder, the distinctive information-bearing characteristics represented by the control signals and reconstructed at the receiving terminal are the frequency locations of selected peaks or maxima in the speech amplitude spectrum. These selected maxima, termed formants, correspond to vocal tract resonances, that is, they correspond to frequency regions of relatively effective transmission through a talker's vocal tract. Generally, it is the maxima corresponding to the three principal vocal tract resonances which are selected.
In a typical resonance vocoder analyzer, for example, one of the filterbank, maximum value selector variety, the spectrum of an incoming speech wave is divided into three fixed frequency subbands, and each subband embraces a frequency range within which a particular formant normally occurs. From the speech frequency components lying within a subband there is derived a narrow band control signal representative of the frequency at which a formant peak occurs in that frequency subband of the spectrum. It is an empirical fact that these narrow band control signals, emanating from this type of vocoder, are subject to discontinuous changes to grossly incorrect values of the indicated formant frequencies. Characteristically, these abrupt changes are of short duration and are sometimes attributable to occasional faltering of the glottal excitation of the speaker. Statistically, these discontinuous changes are most likely in error since the nature of speech is such that changes in formant frequency are necessarily continuous with time due to the inertia of the vocal tract. Thus the filter-bank variety of vocoder fails to take into account this natural constraint inherent in speech signals. The above disadvantages partially stem from the fact that the filter-bank vocoder has no provision for comparing present indications of formant locations with previous indications, i.e., the vocoder lacks memory.
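The memoryless behavior described above can be illustrated with a minimal sketch. It assumes a spectrum already sampled in decibels at a few frequencies; the function name `filterbank_formant`, the subband limits, and the frame values are hypothetical and chosen only to show how a single disturbed frame produces a discontinuous jump in the indicated formant frequency.

```python
def filterbank_formant(spectrum_db, freqs_hz, band_hz):
    """Return the frequency of the largest component inside one fixed subband.

    Illustrative stand-in for a filter-bank, maximum-value-selector analyzer:
    each frame is treated on its own, with no memory of earlier frames.
    """
    lo, hi = band_hz
    in_band = [(a, f) for a, f in zip(spectrum_db, freqs_hz) if lo <= f <= hi]
    _, freq = max(in_band)  # largest amplitude wins
    return freq


# Hypothetical spectra (dB) for four successive frames of one subband.
freqs = [300, 400, 500, 600, 700, 800]
frames = [
    [20, 30, 24, 18, 22, 16],
    [20, 31, 24, 18, 22, 16],
    [20, 25, 22, 18, 27, 16],  # brief disturbance: a neighboring peak is larger
    [20, 30, 24, 18, 22, 16],
]
track = [filterbank_formant(frame, freqs, (250, 900)) for frame in frames]
print(track)  # [400, 400, 700, 400]
```

The third frame yields an abrupt excursion to 700 Hz and back, which is the kind of short-lived, grossly incorrect indication the text attributes to analyzers that lack memory.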
In the feedback, tracking-filter variety of resonant vocoder analyzer, on the other hand, strong use is made of the prior history of the speech signal. This type of vocoder is not sensitive to faltering of the glottal excitation of the speaker, but instead, has a tendency to make errors at the onset of voicing and then to perpetuate these errors in a groping search for the proper formant. Thus, while not subject to discontinuous changes in amplitude, there is a strong possibility that the analyzer will lock onto an improper frequency component and continue to do so for a considerable period of time.
In the present invention these shortcomings are overcome by selectively utilizing the advantages of both types of analyzers without incurring their concomitant serious disadvantages. In accordance with this invention, a formant detector is excited by a speech wave altered by a selective filter, responsive to control signals representative of the previous values of formant locations. At the onset of voicing, the selective filter is bypassed to eliminate the possibility of locking onto an improper speech frequency component. After a predetermined interval of time has elapsed and sufficient information is available to determine where the formants have been located in the past, this information is utilized to provide a reliable indication of the vicinity where the formants will be located in the future. Accordingly, the selective filter is controlled by these signals indicative of the past history of the speech signal and, responsive thereto, the filter characteristics are altered to accentuate or enhance the transmission characteristics of the filter in the probable formant region. Thus, abrupt and discontinuous changes in immediate formant location indications are highly unlikely since there is a predisposition to find a succeeding formant in the region where previous formants have been located. However, if the input speech signal exhibits a maximum value exceeding a predetermined threshold, the detector will select this new value of spectral component. This feature allows the analyzer to abandon a peak in the speech signal if it becomes absurdly small with respect to another peak. The criterion for what is absurdly small is a matter of choice depending on the selected threshold value. Preselected threshold values provide a graduated set of performance characteristics intermediate between vocoders of the filter-bank and tracking-filter varieties.
Furthermore, it has been found that the average error rate of the analyzer of this invention is lower than that found in either type of prior art analyzer.
The invention may be more fully understood from the following description of illustrative embodiments thereof, taken in connection with the appended drawing in which:
FIG. 1 is a block diagram of apparatus illustrative of the operation and principles of the present invention;
FIG. 2A is a graphical presentation which is of assistance in explaining the principles of operation of this invention;
FIG. 2B is a graphical presentation which is also of assistance in explaining the principles of operation of this invention;
FIG. 3 is a block diagram showing a complete speech transmission system embodying the principles of this invention;
FIG. 4 is a block diagram of a formant processor and detector utilized in the present invention; and
FIG. 5 illustrates a selective filter utilized in the present invention.
Explication of the present invention is facilitated by reference to FIGS. 1 and 2. FIG. 2A illustrates the spectral distribution of a speech signal in the vicinity of its first principal speech formant, indicated as F. Assuming for the moment that a speech signal is applied directly to formant detector 11 of FIG. 1, there would be a strong possibility that the adjoining peak shown in FIG. 2A would be selected as the first principal speech formant. This is because abnormal spectral weighting of the speech signal can cause the adjoining peak to have, temporarily, more energy than the formant F, and hence a detector of the filter-bank variety would select the adjoining peak as the formant. In accordance with the principles of the present invention, however, the signal applied to formant detector 11 is first processed by selective filter 12. Filter 12 is responsive to control signals which are indicative of the previous locations of the principal speech formant F. In response to these control signals the transmission characteristics of filter 12 are altered in order to accentuate transmissibility in the region of these previous formant locations. As shown in FIG. 2B, the selective filter response is enhanced in the region where previous formants have been located. Thus, there is a tendency for filter 12 to favor the transmission of spectral components in the vicinity of the formant peak F. It is to be noted, however, that it is not a binding constraint that the peak F be transmitted to formant detector 11. If the adjoining peak is larger by a predetermined threshold value, indicated as A in FIG. 2B, then the adjoining peak will be selected irrespective of the prefiltering accentuation of filter 12. Thus, as discussed above, this feature allows the system to abandon a peak if it becomes absurdly small as compared to another peak. Experience has indicated that the threshold value, A, is optimally 1/8 to 1/4 the average Q of the specific formant peak expressed in decibels.
The Q of a formant peak is defined in the same manner as that of a resonance curve, namely, the ratio of the center frequency of the peak to its bandwidth. Thus for a peak F whose Q ranges from three to five, corresponding to a range of ten to fourteen decibels, a threshold value of two decibels has been found appropriate.
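The arithmetic behind these figures can be checked with a short sketch. It assumes that Q is expressed in decibels as 20·log10(Q), which reproduces the quoted range of roughly ten to fourteen decibels for Q between three and five, and it assumes the threshold rule reads as one-eighth to one-quarter of that value. The function names are hypothetical, and `select_peak` is only one possible reading of the threshold decision.

```python
import math


def q_in_decibels(q):
    """Express a quality factor in decibels (assumed convention: 20*log10(Q))."""
    return 20.0 * math.log10(q)


def threshold_range_db(q):
    """Assumed threshold rule: one-eighth to one-quarter of Q in decibels."""
    q_db = q_in_decibels(q)
    return q_db / 8.0, q_db / 4.0


def select_peak(formant_db, adjoining_db, threshold_db):
    """Keep the tracked formant unless the adjoining peak exceeds it by more
    than the threshold."""
    return "adjoining" if adjoining_db - formant_db > threshold_db else "formant"


for q in (3.0, 5.0):
    lo, hi = threshold_range_db(q)
    print(f"Q={q}: {q_in_decibels(q):.1f} dB, threshold {lo:.1f} to {hi:.1f} dB")

print(select_peak(20.0, 21.0, 2.0))  # adjoining peak only 1 dB larger
print(select_peak(20.0, 23.0, 2.0))  # adjoining peak 3 dB larger
```

Under these assumptions the two-decibel threshold quoted in the text falls inside the computed range for both Q of three and Q of five.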
In the illustrative embodiment of this invention shown in FIG. 3, which incorporates the above principles, an incoming speech wave from source 10, which may be a conventional transducer for converting speech sounds into a corresponding electrical wave, is applied to equalizer 13. Equalizer 13, which may be of the type disclosed in the copending application of C. H. Coker, Ser. No. 322,390, filed Nov. 8, 1963, now Patent No. 3,327,057, serves to adjust the amplitudes of the frequency components of the speech wave in order to approximately optimize the operation of the formant detecting apparatus which follows. The equalized speech wave from equalizer 13 is simultaneously applied to formant processor and detector 14 and to delay element 15. Formant processor and detector 14 derives from a selected frequency subband of the equalized speech wave a first narrow band control signal representative of the location of the speech formant that normally occurs in that subband, for example, the first speech formant. Delay element 15 serves to delay the equalized speech wave from equalizer 13 by an amount sufficient to compensate for the delay, if any, introduced by formant detector 14 in the detection of a speech formant, and the delayed, equalized speech wave from delay element 15 is applied to the input terminal of formant suppressor 16.
Formant suppressor 16, which may have one of the alternative forms shown in the aforementioned copending application Ser. No. 322,390, is controlled by the narrow band control signal from processor 14 to suppress a formant peak by suppressing all of the frequency components in the vicinity of that formant peak in the incoming speech wave which correspond to the formant represented by the narrow band control signal. By suppressing all of the frequency components in the vicinity of a formant peak, before detecting the location of the next formant peak, the frequency subband within which the next formant peak normally occurs will not contain any large amplitude components from the vicinity of the suppressed formant peak. Since the detection of formant locations is often based upon the frequency location of the largest amplitude components within a particular frequency subband, the suppression of frequency components in the vicinity of one formant peak prevents these components from being mistakenly recognized as indicating the location of another formant.
The output signal of suppressor 16 is also applied to delay element 17 in order to delay the output signal of suppressor 16 by an amount of time sufficient to compensate for the delay, if any, introduced by formant processor and detector 18 in deriving a second narrow band control signal representative of another speech formant location. The second narrow band control signal from processor 18 is applied to the control terminal of formant suppressor 19, while the delayed output signal from suppressor 16 is applied to the input terminal of suppressor 19. Suppressor 19, which functions in a manner similar to suppressor 16, serves to suppress in the output signal of suppressor 16 the frequency components in the vicinity of that speech formant which correspond to the formant represented by the narrow band control signal developed by detector 18. The output signal developed by suppressor 19, and delivered to formant detector 21, therefore has two fewer formants than are found in the original speech wave, so that in the situation where the apparatus is designed to locate the three principal speech formants, the output signal of suppressor 19 contains only one principal formant. The output signal of suppressor 19 is passed to formant processor and detector 21, which derives from this output signal a third narrow band control signal representative of still another speech formant, for example, the third principal speech formant.
The narrow band control signals developed at the output terminals of processors 14, 18 and 21 may be utilized to reconstruct a replica of the original speech wave in a suitable synthesizer 22, where it is understood that synthesizer 22 is to be supplied with the usual additional control signals necessary to specify completely the speech characteristics. A suitable synthesizer is disclosed in H. L. Barney Patent 2,819,341, issued Jan. 7, 1958, and speech sounds may be reproduced from the reconstructed wave by a suitable transducer 23, for example, a loudspeaker of conventional design. It is important to note at this point, however, that in order for synthesizer 22 to reconstruct a replica of the speech wave having formants that occur at the same frequency locations as the formants of the original speech wave, it is necessary that the narrow band control signals from processors 14, 18 and 21 unambiguously identify particular formants of the original speech wave. If the three principal formants are ordered in terms of their relative locations on the frequency scale, then, for a given speech sound, the second formant occurs at higher frequencies than the first formant and lower frequencies than the third formant, and the third formant occurs at higher frequencies than the first and second formants. Although equalizer 13 may be constructed so that the first, second, and third narrow band control signals developed by detectors 14, 18 and 21, respectively, represent the first, second, and third principal speech formants in that order, it is contemplated that other types of equalizers may be employed. In that case, the first, second, and third narrow band control signals respectively developed by the detectors may not necessarily represent the first, second, and third principal formant locations in that order. 
In the latter event, it is necessary to distinguish between the three narrow band control signals so that the proper narrow band signal may be applied to the proper input point of synthesizer 22, and this may be accomplished by passing the three narrow band control signals through formant ordering circuit 24. Formant ordering circuit 24, which may be of the type disclosed in the aforementioned copending application Ser. No. 322,390, rearranges the narrow band control signals as necessary in order that the narrow band control signals representative of the formant locations appear in the desired sequence.
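The detect, suppress, detect cascade of FIG. 3 and the subsequent ordering step can be sketched as follows. This is a simplified illustration, not the disclosed circuitry: the spectrum is a hypothetical list of channel amplitudes in decibels, detection is a plain maximum search, suppression flattens a fixed number of neighboring channels, the delay elements are not modeled, and all function names are assumptions.

```python
def detect_peak(spectrum_db):
    """Index of the largest-amplitude channel (stand-in for a formant detector)."""
    return max(range(len(spectrum_db)), key=lambda i: spectrum_db[i])


def suppress(spectrum_db, center, width=1, floor_db=0.0):
    """Flatten the channels within `width` of `center` (stand-in for a suppressor)."""
    return [floor_db if abs(i - center) <= width else a
            for i, a in enumerate(spectrum_db)]


def three_formants(spectrum_db, freqs_hz):
    """Detect three peaks in cascade, then order them by frequency."""
    remaining = list(spectrum_db)
    found = []
    for stage in range(3):
        peak = detect_peak(remaining)
        found.append(freqs_hz[peak])
        if stage < 2:  # the third detector is not followed by a suppressor
            remaining = suppress(remaining, peak)
    return sorted(found)  # role played by the formant ordering circuit


# Hypothetical channel center frequencies (Hz) and amplitudes (dB).
freqs = [250, 500, 750, 1000, 1250, 1500, 1750, 2000, 2250, 2500, 2750]
spectrum = [10, 30, 14, 8, 12, 26, 11, 7, 9, 33, 12]
print(three_formants(spectrum, freqs))  # [500, 1500, 2500]
```

In this example the peaks are detected in the order 2500, 500, 1500 Hz, so the final sort is what places the control signals in first, second, third formant order.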
Simultaneously with the application of the incoming speech signal to equalizer 13, the speech signal is also applied to a decision making device, e.g., voiced-unvoiced detector 25 which may be of any well-known type. Detector 25 develops a signal indicative of the nature of the applied speech signal, e.g., whether it is voiced or unvoiced. Other criteria may, of course, be used, as desired. The signals developed by detector 25 are utilized to control formant processors and detectors 14, 18 and 21 via conductor 26, as discussed hereinafter. Information regarding the previous history of formant locations is provided by a feedback arrangement via conductor 27 to the formant processors and detectors. This information is developed at the appropriate output terminals of formant ordering circuit 24, as indicated in FIG. 3.
FIG. 4 shows a typical formant processor and detector 14, 18 or 21, utilized in the apparatus of FIG. 3. The input signals to the processor comprise a speech wave, a control signal via conductor 26 from voiced-unvoiced detector 25 and previous formant location information via conductor 27. The speech wave is applied to selective filter 12, which may be of the type shown in FIG. 5 and discussed hereinafter. Selective filter 12 accentuates or enhances the transmission of spectral components in the vicinity of previous formant locations in response to information from signal processor 28, which is responsive to the control signals appearing on conductor 27. In order to remove random fluctuations and irregularities, if any, in the formant information fed back from the output of formant ordering circuit 24 of FIG. 3, apparatus 28 is preferably an optimum prediction device, for example, an extrapolator, as shown and described in the copending application of C. H. Coker, Ser. No. 365,654, filed May 7, 1964, now Patent No. 3,349,180. Of course, if the control signals are not subject to fluctuation, apparatus 28 need not be used. The output signal emanating from processor 28 thus corresponds to an optimum average of the previous formant location information.
In order to avoid improper tracking by selective filter 12 upon the initiation of voicing of the input speech signal, which is usually accompanied by noise, delay element 29, responsive to the control signals emanating from detector 25, activates switch 31 via relay 32. Switch 31 in its normally inoperative position is connected in such a manner that selective filter 12 is bypassed. After voicing has commenced, a control signal applied from detector 25, via conductor 26, to element 29 is delayed by approximately ten milliseconds. This delay allows the control signal to develop into a reliable indication of past formant locations prior to operation of selective filter 12. Thus, after a delay of ten milliseconds, relay 32 is activated, operating switch 31, and connecting selective filter 12 directly to formant detector 11. In this mode of operation, selective filter 12 is continuously altering its transmission characteristics, responsive to the previous formant control information emanating from apparatus 28. Thus, there is a strong predisposition for formant detector 11 to select formant peaks occurring in the vicinity of previous formant locations. Formant detector 11 may be of any well-known construction; a suitable formant detector is described in the copending application of C. H. Coker, Ser. No. 322,389, now Patent No. 3,327,058. Thus, at the onset of voicing or some other predetermined criterion, selective filter 12 is bypassed to eliminate the possibility of locking onto an improper speech frequency component, therefore avoiding the disadvantage of typical tracking-filter analyzers. After a predetermined interval of time, on the order of milliseconds, selective filter 12, activated by previous formant control information, increases the probability of finding a formant in the vicinity of previous formants, thus avoiding the abrupt and discontinuous changes inherent in analyzers of the filter-bank variety.
A preferable form of selective filter 12 is illustrated in
FIG. 5. An incoming speech signal, for example, the signal from equalizer 13 or formant suppressors 16 and 19 of FIG. 3, is applied in parallel to a bank of contiguous or overlapping bandpass filters 41-1 through 41-n. The pass bands of filters 41-1 through 41-n span the entire frequency range of the incoming signal, so that there is developed at the output terminals of these filters a group of alternating signals representative of the frequency components of the incoming speech wave. Each bandpass filter is followed by a conventional logarithmic amplifier and detector 42-1 through 42-n, respectively. Elements 42-1 through 42-n develop from the group of alternating signals a corresponding group of unidirectional voltages proportional to the logarithms of the amplitudes of the components of the incoming speech wave. The group of unidirectional voltages from elements 42-1 through 42-n is combined in adders 43-1 through 43-n with the output signals of diode-resistor networks 33-1 through 33-n. Diode-resistor networks 33 are controlled through injection point selector 45 and current generator 47, which may be of the type described in the aforementioned copending application of C. H. Coker, Ser. No. 322,390, by an incoming control signal, from processor 28 of FIG. 4, representative of the location of previous formant peaks. At the output points of networks 33 there is developed a group of unidirectional voltages representing the vicinity in the frequency domain where previous formant peaks have been detected. A constant current is injected, via generator 47 and point selector 45, into a predetermined resistor-diode network 33-i, in response to the applied formant control information.
This injected current is distributed via the interconnecting diodes of network 33-i and its immediately adjacent networks, 33-(i-1) and 33-(i+1), to load resistors R(i-1), R(i), and R(i+1). The signal voltages developed across resistors R(i-1), R(i), and R(i+1) are combined in adders 43-(i-1), 43-i and 43-(i+1) with the signals emanating from the respective logarithmic amplifiers and detectors. Since no current is injected into the other load resistors, R(1) through R(i-2) and R(i+2) through R(n), the contribution to the respective adders to which they are connected is zero. Accordingly, only the spectral components in channels (i-1) through (i+1) are accentuated by the addition of signal voltages developed across the load resistors. The product of the current flowing in the selected load resistors and the resistive values thereof is selected to provide an accentuation, of the speech components in the vicinity of previous formant peaks, of the order of 1/8 to 1/4 of the Q of the formant peaks, as discussed above. The accentuated spectral components and their unaccentuated counterparts are conveyed via the output leads of adders 43-1 through 43-n to formant detector 11 of FIG. 4. Thus, there is a decided predisposition for formant detector 11 to locate a formant peak in the vicinity of previous formant locations.
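The combined behavior of the selective filter of FIG. 5 and the onset bypass of FIG. 4 can be sketched in a few lines. The sketch rests on several assumptions that are not in the disclosure: every frame is voiced and the first frame is the onset of voicing; the boost is applied equally to channels i-1, i, and i+1 (the diode-resistor network may distribute it unequally); the previous formant location is the raw previous detection rather than the smoothed output of processor 28; and the frame length, boost value, and function names are hypothetical.

```python
def accentuate(log_spectrum_db, prev_channel, boost_db):
    """Add boost_db to the previous formant channel and its two neighbors."""
    out = list(log_spectrum_db)
    for i in (prev_channel - 1, prev_channel, prev_channel + 1):
        if 0 <= i < len(out):
            out[i] += boost_db
    return out


def detect(log_spectrum_db):
    """Index of the largest channel (stand-in for formant detector 11)."""
    return max(range(len(log_spectrum_db)), key=lambda i: log_spectrum_db[i])


def track_formant(frames_db, frame_ms=5.0, bypass_ms=10.0, boost_db=2.0):
    """Track one formant channel per frame, bypassing the selective filter
    for the first bypass_ms after the (assumed) onset of voicing."""
    track, prev, elapsed = [], None, 0.0
    for frame in frames_db:
        if prev is None or elapsed < bypass_ms:
            chosen = detect(frame)  # filter bypassed
        else:
            chosen = detect(accentuate(frame, prev, boost_db))
        track.append(chosen)
        prev = chosen
        elapsed += frame_ms
    return track


# Hypothetical log spectra (dB) for six channels over four frames.
frames = [
    [10, 20, 12, 11, 9, 8],   # onset: bypassed
    [10, 21, 12, 11, 9, 8],   # still within 10 ms: bypassed
    [10, 19, 12, 11, 20, 8],  # rival peak 1 dB larger: tracked peak is kept
    [10, 15, 12, 11, 20, 8],  # rival peak 5 dB larger: tracked peak is abandoned
]
print(track_formant(frames))  # [1, 1, 1, 4]
```

The third frame shows the predisposition toward the previous location, since an unbiased detector would have jumped to channel 4, while the fourth frame shows the tracked peak being abandoned once the rival exceeds it by more than the boost.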
Although this invention has been described in terms of speech communication systems, it is to be understood that applications of the principles of this invention are not limited to these systems but include such related fields as automatic speech recognition, speech processing and automatic message recording and reproduction. In addition, it is to be understood that the above-described embodiments are merely illustrative of the numerous arrangements which may be devised for the principles of this invention by those skilled in the art without departing from the spirit and scope of the invention.
What is claimed is:
1. Apparatus for accentuating the detection of a selected formant of a speech wave which comprises:
a source of an incoming speech wave,
means supplied with said speech wave for obtaining a first group of unidirectional signals representative of the amplitudes of the frequency components of said speech wave,
a source of a control signal representative of the previous frequency locations of a selected formant of said speech wave,
means responsive to said control signal for generating a second group of unidirectional signals representative of the vicinity in the frequency domain of the previous formant locations of said selected formant represented by said control signal,
and means for combining said first and second groups of unidirectional signals.
2. Apparatus for increasing the probability of detecting a selected formant of a speech wave which comprises:
a source of an incoming speech wave,
means supplied with said speech wave for obtaining a first group of unidirectional signals representative of the amplitudes of the frequency components of said speech wave,
a source of a control signal representative of the previous frequency locations of a selected formant of said speech wave,
selector means responsive to said control signal and provided with a plurality of output points for delivering a constant current to a selected plurality of said output points in accordance with the magnitude of said control signal,
means provided with a plurality of input points each connected to a corresponding one of said plurality of output points of said selector means for generating a second group of unidirectional signals representative of the location in the frequency domain of the previous formants represented by said control signal,
and means for combining said first and second groups of unidirectional signals.
3. Apparatus for determining the frequency locations of selected formants of speech sounds which comprises:
a source of an incoming speech wave,
equalizer means supplied with said speech wave for enhancing the relative amplitudes of the high frequency components of said speech wave by predetermined relative amounts and for eliminating selected frequency components of said speech wave, thereby to develop an equalized speech wave,
first detector means in circuit relation with said equalizer means for deriving from said equalized speech wave a first control signal with a magnitude representative of the frequency location of a formant of said equalized speech wave,
first suppressor means under the control of said first control signal and responsive to said equalized speech wave for individually suppressing in said equalized speech wave each frequency component in the vicinity of the formant represented by said first control signal to obtain a first suppressed formant speech wave,
second detector means in circuit relation with said first suppressor means for deriving from said first suppressed formant speech wave a second control signal with a magnitude representative of the frequency location of a formant of said first suppressed formant speech wave,
second suppressor means under the control of said second control signal and supplied with said first suppressed formant speech wave for individually suppressing in said first suppressed formant speech wave each frequency component in the vicinity of the formant represented by said second control signal to obtain a second suppressed formant speech wave,
third detector means in circuit relation with said second suppressor means for deriving from said second suppressed formant speech wave a third control signal with a magnitude representative of the frequency location of a formant of said second suppressed formant speech wave,
formant ordering means supplied with said first, second, and third control signals for arranging said first, second, and third control signals in a predetermined order,
means responsive to said ordered control signals for altering the performance of said detector means in accordance with a predetermined schedule to increase the probability of detecting a subsequent formant in the vicinity of the previous formants represented by said control signals,
means responsive to said incoming speech wave for controlling the operation of said detector means in accordance with predetermined criteria,
and speech synthesizing means responsive to said first, second and third ordered control signals for reconstructing a replica of said incoming speech wave.
4. Apparatus as defined in claim 3 wherein said altering means comprises:
means for removing random fluctuations and irregularities in said control signals to establish a signal which corresponds to the optimum estimate of said control signals,
and selective filter means responsive to said optimum control signals for accentuating the transmission of the speech wave spectral components in the vicinity of previous formant locations established by said optimum control signals.
5. Apparatus as defined in claim 4 wherein said selective filter means comprises:
means supplied with said speech wave for obtaining a first group of unidirectional signals representative of the amplitudes of the frequency components of said speech wave,
means responsive to said control signals for generating a second group of unidirectional signals representative of the vicinity in the frequency domain of the previous formant locations of said selected formant represented by said control signals,
and means for combining said first and second groups of unidirectional signals.
6. Apparatus as defined in claim 3 wherein said controlling means comprises:
means for developing a signal representative of the voiced or unvoiced nature of said speech wave,
means for delaying said representative signal,
and means responsive to said delayed representative signal for energizing said altering means during voiced intervals of said speech wave.
7. Speech equalizer apparatus for determining the frequency locations of selected formants of a speech wave comprising:
selective filter means responsive to applied control signals representative of previous formant locations of said speech wave for enhancing the transmissibility of the spectral components of said speech wave in the vicinity of said previous formant locations,
detection means responsive to the signals of said selective filter means for determining the locations of the formants of said speech wave,
and means responsive to signals representative of the nature of said speech waves for applying said speech waves directly to said detection means for a predetermined interval of time.
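One possible reading of the bypass element of claim 7, sketched below, is that the selective filter is switched in only after the speech wave has been voiced for a fixed number of frames, so that the detector sees the unequalized speech wave at the start of each voiced interval. This interpretation, the frame-based timing, and the names `equalizer_enabled` and `delay_frames` are assumptions, not taken from the claim.

```python
def equalizer_enabled(voiced_flags, delay_frames=2):
    """Return, per frame, True when the selective filter is in circuit.

    The filter is bypassed during unvoiced frames and during the first
    `delay_frames` frames of each voiced run (assumed interpretation).
    """
    enabled = []
    run = 0  # number of consecutive voiced frames, including this one
    for voiced in voiced_flags:
        run = run + 1 if voiced else 0
        enabled.append(run > delay_frames)
    return enabled


flags = [False, True, True, True, True, False, True]
print(equalizer_enabled(flags))
```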
8. Apparatus as defined in claim 7 wherein said selective filter means comprises:
means supplied with said speech wave for obtaining a first group of unidirectional signals representative of the amplitudes of the frequency components of said speech wave,
means responsive to said control signals for generating a second group of unidirectional signals representative of the vicinity in the frequency domain of the previous formant locations of said selected formant represented by said control signals,
and means for combining said first and second groups of unidirectional signals.
9. Apparatus for determining the frequency locations [...] control signal and responsive to said equalized speech wave for individually suppressing in said equalized speech wave each frequency component in the vicinity of the formant represented by said first control signal to obtain a first suppressed formant speech wave,
second detector means in circuit relation with said first suppressor means for deriving from said first suppressed formant speech wave a second control signal representative of the frequency location of a formant of said first suppressed formant speech wave,
second suppressor means under the control of said second control signal and supplied with said first suppressed formant speech wave for individually suppressing in said first suppressed formant speech wave each frequency component in the vicinity of the formant represented by said second control signal to obtain a second suppressed formant speech wave,
third detector means in circuit relation with said second suppressor means for deriving from said second suppressed formant speech wave a third control signal representative of the frequency location of a formant of said second suppressed formant speech wave,
means responsive to said control signals for altering the performance of said detection means in accordance with a predetermined schedule to increase the probability of detecting a subsequent formant in the vicinity of the previous formants represented by said control signals,
means responsive to said incoming speech wave for controlling the operation of said detection means during predetermined intervals of said speech wave,
and speech synthesizing means responsive to said first, second and third control signals for reconstructing a replica of said incoming speech wave.
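The detect-and-suppress cascade recited in claim 9 can be sketched roughly as below. The sketch assumes the speech wave is represented by filter-bank channel amplitudes, that detection means picking the largest channel, and that suppression means zeroing every channel within a fixed distance of the detected formant. These are illustrative assumptions; the function names, channel layout, and amplitudes are hypothetical.

```python
def detect_formant(channel_hz, amplitudes):
    """Detector: frequency of the channel with the largest amplitude."""
    idx = max(range(len(amplitudes)), key=amplitudes.__getitem__)
    return channel_hz[idx]


def suppress_vicinity(channel_hz, amplitudes, formant_hz, width_hz=300.0):
    """Suppressor: zero each component within width_hz of the formant."""
    return [
        0.0 if abs(f - formant_hz) <= width_hz else a
        for f, a in zip(channel_hz, amplitudes)
    ]


def cascade(channel_hz, amplitudes, stages=3):
    """Three detectors separated by two suppressors, as in the claim."""
    found = []
    current = list(amplitudes)
    for stage in range(stages):
        formant = detect_formant(channel_hz, current)
        found.append(formant)
        if stage < stages - 1:  # no suppressor follows the last detector
            current = suppress_vicinity(channel_hz, current, formant)
    return found


channels = [250.0 * k for k in range(1, 13)]  # 250 Hz ... 3000 Hz
amplitudes = [0.3, 1.0, 0.6, 0.2, 0.3, 0.8, 0.4, 0.1, 0.2, 0.5, 0.2, 0.1]
print(cascade(channels, amplitudes))
```

Each stage works on the output of the previous suppressor, so a strong formant cannot be detected twice.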
10. Apparatus as defined in claim 9 wherein said altering means comprises:
means for compensating for random fluctuations and irregularities in said control signals,
and selective filter means responsive to said compensating means for emphasizing the transmission of the speech wave spectral components in the vicinity of previous formant locations established by said control signals.
References Cited
UNITED STATES PATENTS
2,857,465 10/1958 Schroeder.
3,078,345 2/1963 Campanella 179-15.55
3,327,057 6/1967 Coker.
3,327,058 6/1967 Coker.
KATHLEEN H. CLAFFY, Primary Examiner. R. P. TAYLOR, Assistant Examiner.
US. Cl. X.R.
US557687A 1966-06-15 1966-06-15 Speech analysis system Expired - Lifetime US3437757A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US55768766A 1966-06-15 1966-06-15

Publications (1)

Publication Number Publication Date
US3437757A true US3437757A (en) 1969-04-08

Family

ID=24226484

Family Applications (1)

Application Number Title Priority Date Filing Date
US557687A Expired - Lifetime US3437757A (en) 1966-06-15 1966-06-15 Speech analysis system

Country Status (1)

Country Link
US (1) US3437757A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3732868A (en) * 1970-03-25 1973-05-15 Philips Corp Device for the audible reproduction of a cardiogram with speech-like sounds
US3740476A (en) * 1971-07-09 1973-06-19 Bell Telephone Labor Inc Speech signal pitch detector using prediction error data
US3808370A (en) * 1972-08-09 1974-04-30 Rockland Systems Corp System using adaptive filter for determining characteristics of an input
US3989896A (en) * 1973-05-08 1976-11-02 Westinghouse Electric Corporation Method and apparatus for speech identification

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US2857465A (en) * 1955-11-21 1958-10-21 Bell Telephone Labor Inc Vocoder transmission system
US3078345A (en) * 1958-07-31 1963-02-19 Melpar Inc Speech compression systems
US3327058A (en) * 1963-11-08 1967-06-20 Bell Telephone Labor Inc Speech wave analyzer
US3327057A (en) * 1963-11-08 1967-06-20 Bell Telephone Labor Inc Speech analysis

Similar Documents

Publication Publication Date Title
US5749067A (en) Voice activity detector
US5161210A (en) Coder for incorporating an auxiliary information signal in a digital audio signal, decoder for recovering such signals from the combined signal, and record carrier having such combined signal recorded thereon
US7366294B2 (en) Communication system tonal component maintenance techniques
US6345246B1 (en) Apparatus and method for efficiently coding plural channels of an acoustic signal at low bit rates
US4074069A (en) Method and apparatus for judging voiced and unvoiced conditions of speech signal
US5511093A (en) Method for reducing data in a multi-channel data transmission
US5621854A (en) Method and apparatus for objective speech quality measurements of telecommunication equipment
EP0877355A2 (en) Speech coding
US6421802B1 (en) Method for masking defects in a stream of audio data
EP0856961A2 (en) Testing telecommunications apparatus
US6240388B1 (en) Audio data decoding device and audio data coding/decoding system
KR20010099764A (en) A method and device for adaptive bandwidth pitch search in coding wideband signals
US6873954B1 (en) Method and apparatus in a telecommunications system
US6011846A (en) Methods and apparatus for echo suppression
US5430826A (en) Voice-activated switch
US20050114119A1 (en) Method of and apparatus for enhancing dialog using formants
US3903366A (en) Application of simultaneous voice/unvoice excitation in a channel vocoder
US6205423B1 (en) Method for coding speech containing noise-like speech periods and/or having background noise
US20150071463A1 (en) Method and apparatus for filtering an audio signal
US3437757A (en) Speech analysis system
US6199036B1 (en) Tone detection using pitch period
US4219695A (en) Noise estimation system for use in speech analysis
US3327058A (en) Speech wave analyzer
US3405237A (en) Apparatus for determining the periodicity and aperiodicity of a complex wave
US3327057A (en) Speech analysis