Publication number: US3740476 A
Publication type: Grant
Publication date: Jun 19, 1973
Filing date: Jul 9, 1971
Priority date: Jul 9, 1971
Also published as: CA967285A1, DE2233872A1, DE2233872C2
Inventors: B Atal
Original Assignee: Bell Telephone Labor Inc
Speech signal pitch detector using prediction error data
Abstract
Pitch periods in a complex speech signal are determined by evaluating the error in predicting the value of a sample of the signal on the basis of past sample values, and by locating samples for which the prediction error is large. Advantageously, the prediction error signal is devoid of all formant structure, so that there is no chance of confusing pitch signal peaks with formant peaks. A voiced-unvoiced decision is obtained from the ratio of the mean-squared value of the speech signal to the mean-squared value of the prediction error signal.
Description  (OCR text may contain errors)

United States Patent
Atal
June 19, 1973

[54] SPEECH SIGNAL PITCH DETECTOR USING PREDICTION ERROR DATA
[75] Inventor: Bishnu Saroop Atal, Murray Hill
[73] Assignee: Bell Telephone Laboratories, Incorporated, Murray Hill, N.J.
[22] Filed: July 9, 1971
[21] Appl. No.: 161,173
[52] U.S. Cl.: 179/1 SA
[51] Int. Cl.: G10l 1/04
[58] Field of Search: 179/1 SA, 15.55 R; 325 33 A

[56] References Cited
UNITED STATES PATENTS
2,732,424  1/1956  Oliver  179/15.55 R
3,026,375  3/1962  Graham  179/1 SA
3,420,955  1/1969  Noll  179/1 SA
3,437,757  4/1969  Coker  179/1 SA
3,405,237  10/1968  David  179/1 SA

Primary Examiner: Kathleen H. Claffy
Assistant Examiner: Bradford Leaheey
Attorney: R. J. Guenther and William L. Keefauver

8 Claims, 2 Drawing Figures

[FIG. 1: block diagram. Labeled blocks: low-pass filter, sampler, clock, adaptive predictor, prediction parameter computer, subtractor network, low-pass filter, rectifier, peak picker, threshold detector (pitch pulses out), mean-square networks, divider, threshold detector (voiced/unvoiced signal). Input: speech signals.]

SPEECH SIGNAL PITCH DETECTOR USING PREDICTION ERROR DATA

This invention is concerned with the analysis of complex signals, and particularly with the determination of the fundamental frequency, or period, of a complex periodic signal, such as a voiced speech signal. Its principal objectives are to simplify the measurement of pitch frequency and to improve the reliability of the measure.

BACKGROUND OF THE INVENTION

A number of arrangements for reducing the channel capacity required for the transmission of complex signals, such as speech signals, have been proposed. One of the best known of these is the vocoder. More recently, techniques for removing inherent signal redundancy through the use of linear prediction techniques have been described. In all of these arrangements, a speech wave is analyzed to determine its significant characteristics, and coded information concerning these characteristics is transmitted instead of the speech signal itself. At a receiver station a synthetic speech signal is developed from the coded information.

In general, a different set of coded signal information is employed in each type of bandwidth compression system. However, virtually all employ one characteristic of the speech signal, namely, its pitch frequency. This characteristic denotes the fundamental frequency at which the vocal cords vibrate during the production of different voiced speech sounds. Most speech bandwidth compression systems also employ coded information to identify a speech signal as voiced or unvoiced. Some combine the two forms of information so that the pitch signal inherently specifies the voicing condition.

FIELD OF THE INVENTION.

A number of different proposals for automatically measuring and encoding the pitch characteristic of a speech signal are known and used in the art. Some rely on simple filtering, some on signal correlation, some on formant detection and tracking, and others on a transformation of the logarithm of the spectrum of a speech signal, the so-called cepstrum of the signal. All of these arrangements, however, operate on the speech signal itself and in one way or another strive to find peak values in the signal, or in a modification of it, which identify the pitch characteristic. Unfortunately, peaks due to formants, particularly the first formant of a speech signal, are often stronger than a peak developed to indicate pitch. If the two peaks are close together, it is difficult to determine which is which. Consequently, even the most sophisticated pitch detectors are subject to error and do not always correctly characterize the pitch frequency of a signal.

It is thus another object of this invention to capitalize on a unique property of a voiced speech signal to develop a measure of the pitch frequency of the signal that is unambiguous and which is entirely independent of the formant character of the speech signal.

SUMMARY OF THE INVENTION

Analysis of a complex speech signal to determine its pitch frequency is, in accordance with the invention, based on an analysis of the error between a predicted value of the speech signal based on its past sample values and its actual value at that moment. The time interval represented by the number of samples used to obtain the predicted value is typically 1 msec. Due to the short memory used in the prediction process, the predicted signal values represent, in large measure, the formant structure of the speech signal. The pitch analysis arrangement of the invention is particularly effective because, in developing a difference signal, i.e., the prediction error signal, the formant structure of the signal is removed from the input signal. Yet, since the pitch period in speech signals ranges typically from 3 msec to 20 msec, the prediction of the pitch structure, based on 1 msec of past speech, is completely negligible. Thus, pitch information is retained in the prediction error signal. Consequently, there is little or no interference from the formant structure and a peak picking operation is effective in developing a measure of the pitch character of the input signal.

A feature of the invention is the additional use of prediction error samples to develop a voiced-unvoiced signal indication. In accordance with the invention, a voicing decision is based on the ratio of the mean-squared value of input signal samples to the mean-squared value of corresponding prediction error samples.

This invention will be more fully understood from the following detailed description of an illustrative embodiment of it taken together with the attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block schematic diagram of a speech signal analysis system which illustrates the principles of the invention, and

FIG. 2 is an illustration of the waveform of a segment of a voiced speech signal, the positions of detected pitch pulses in the voiced speech signal, as shown by vertical lines, and a segment of unvoiced speech.

DETAILED DESCRIPTION

A signal analysis arrangement which illustrates the principles of the invention is shown in FIG. 1. Speech signals supplied from any desired source are delivered to the analyzer and passed through low-pass filter 10. Filter 10 typically has a cutoff frequency in the neighborhood of 5 kHz. The resultant signal is then sampled at a frequency of approximately 10 kHz in sampler 11 under control of signals from clock 12.

Speech samples, $s_n$, thus derived are supplied to storage unit 13 which maintains them in order, typically in blocks of 200 samples, i.e., $s_1, s_2, \ldots, s_{200}$. Blocks or frames of samples are periodically keyed out of storage unit 13, for example, under control of a signal from clock 12, and delivered to adaptive predictor 14, prediction parameter computer 15, and to subtractor network 16.

Adaptive predictor 14 operates on supplied signal samples to predict the present value of each sample on the basis of a weighted summation of a number of prior sample values. The prediction operation is carried out on a sample-by-sample basis and predictor 14 is periodically supplied with a new frame of samples from storage unit 13. An adaptive predictor suitable for use in the system of this invention is described in detail in a copending application of B. S. Atal, Ser. No. 753,408, filed Aug. 19, 1968, now U.S. Pat. No. 3,631,520.
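The weighted summation described above can be written compactly. The following is only a notational sketch: the symbol $p$ for the number of prior samples, the index $k$, and the hat notation for the predicted value are assumptions introduced here for illustration and do not appear in the source.

```latex
\hat{s}_n = \sum_{k=1}^{p} a_k \, s_{n-k}
```

Here $\hat{s}_n$ is the predicted value of sample $s_n$ and the $a_k$ are the weights applied to the prior samples. With a prediction memory of about 1 msec and a sampling frequency of approximately 10 kHz, $p$ would be on the order of 10.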

To accommodate the constantly changing character of the input speech signal, predictor 14 is controlled to adapt it to the current signal condition. It has been found sufficient to readjust the values of the parameters used to control the predictor at intervals comparable to those of a pitch period of the signal. Since the exact pitch interval is not available (although the pitch output signal of the system may be used in a feedback arrangement to approximate the interval of a later pitch period), readjustment of the parameter values at intervals corresponding approximately to the time of 200 samples is entirely satisfactory. This corresponds to a time interval of approximately 20 msec.
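The timing figures follow directly from the sampling frequency of approximately 10 kHz. As a short worked check (the symbols $T_{\text{frame}}$, $N_{\min}$ and $N_{\max}$ are labels assumed here for illustration):

```latex
T_{\text{frame}} = \frac{200\ \text{samples}}{10\,000\ \text{samples/sec}} = 20\ \text{msec}

N_{\min} = 3\ \text{msec} \times 10\ \text{kHz} = 30\ \text{samples}, \qquad
N_{\max} = 20\ \text{msec} \times 10\ \text{kHz} = 200\ \text{samples}
```

A 200-sample frame therefore spans the longest typical pitch period (20 msec) and several of the shortest (3 msec).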

Prediction parameter computer 15 thus operates on applied speech samples from unit 13 to develop a sequence of parameter signals $a_1, a_2, a_3, \ldots, a_n$, which are used periodically to adjust predictor 14. The parameter values are selected to minimize the mean-squared prediction error of the system. The relation of the parameter signals to the input signal, their development, and the manner in which they are used to control the predictor are explained in detail in the above-mentioned copending patent application. Parameter signals from computer 15 are developed well in advance of the time that a block of signals is processed in predictor 14 because of the delay inherent in the prediction operation. Typically, parameter control signals are developed within an interval corresponding to the time of approximately 60 samples.
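The source defers the actual parameter computation to U.S. Pat. No. 3,631,520, so the following is only a minimal sketch of one standard way to choose weights that minimize the mean-squared prediction error over a frame: the autocorrelation method solved by the Levinson-Durbin recursion. This is a swapped-in textbook technique, not necessarily the method of the referenced patent. The function names and the default order of 10 (about 1 msec at 10 kHz) are assumptions made for illustration.

```python
from typing import List


def autocorrelation(frame: List[float], max_lag: int) -> List[float]:
    """Return r[0..max_lag], where r[k] = sum over i of frame[i] * frame[i + k]."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def predictor_coefficients(frame: List[float], order: int = 10) -> List[float]:
    """Hypothetical helper: weights a[0..order-1] such that the prediction of
    s[n] is sum over k of a[k] * s[n - 1 - k].

    Uses the Levinson-Durbin recursion on the frame autocorrelation
    (an assumed method; the source does not specify one).
    """
    r = autocorrelation(frame, order)
    a = [0.0] * order
    err = r[0]                      # prediction error energy at order 0
    for i in range(order):
        if err <= 0.0:              # silent or perfectly predicted frame
            break
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err               # reflection coefficient for this order
        prev = a[:]
        a[i] = k
        for j in range(i):
            a[j] = prev[j] - k * prev[i - 1 - j]
        err *= (1.0 - k * k)
    return a
```

For example, a frame that decays geometrically by a factor of 0.9 per sample should yield a first weight close to 0.9 and remaining weights close to zero.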

Sample values developed by predictor 14 are subtracted in network 16 from the actual value of corresponding signal samples delivered from storage unit 13 to the subtractor. The resultant difference signal represents the error in predicting the value of the signal. It is accordingly called a prediction error signal. Evidently, appropriate delay is provided, for example, in the readout of samples from storage unit 13 or in their delivery to subtractor 16, to allow time for all predictor operations to be completed. Suffice it to say that all of the described operations are carried on in synchronism in a conventional manner.
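The subtraction performed in network 16 can be sketched in a few lines. This is an illustration under stated assumptions: the function name `prediction_error` and the convention that `coeffs[k]` weights the sample `k + 1` positions in the past are hypothetical, and samples before the start of the list are treated as zero.

```python
from typing import List


def prediction_error(samples: List[float], coeffs: List[float]) -> List[float]:
    """Return e[n] = s[n] - predicted s[n] for every sample in the list.

    The predicted value is a weighted sum of prior samples; samples that
    would fall before the start of the list are taken as zero (assumption).
    """
    errors = []
    for n in range(len(samples)):
        predicted = sum(coeffs[k] * samples[n - 1 - k]
                        for k in range(len(coeffs))
                        if n - 1 - k >= 0)
        errors.append(samples[n] - predicted)
    return errors
```

With a signal that decays by a factor of 0.9 per sample and the single weight 0.9, only the first sample is unpredictable, so the error is concentrated there and is essentially zero elsewhere.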

It is of importance to recognize that the values of signal samples are predicted largely on the basis of their formant constituency. Predicted signals, therefore, represent essentially the formant structure of the input signal. Since the predicted signal values are subtracted from actual signal values, the prediction error signal at the output of subtractor network 16 is essentially devoid of all formant information. Yet, the prediction error signal has been found to preserve, and indeed to denote, the pitch character of the applied signal.

Prediction error signals from subtractor 16 are passed through low-pass filter 17. Filter 17 is constructed with a relatively low cutoff frequency since the fundamental pitch of the applied signal generally is in the lower portion of the band. Elimination of higher frequency portions aids in isolating the pitch signal.
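The source does not give the cutoff frequency or construction of filter 17. Purely as a stand-in, the sketch below smooths the prediction error with a causal moving average, which is one very simple low-pass filter. The function name and the default width of 8 samples are assumptions, not values from the source.

```python
from typing import List


def moving_average(samples: List[float], width: int = 8) -> List[float]:
    """Causal moving-average low-pass filter (a hypothetical stand-in for
    filter 17). The first width - 1 outputs are averaged against implied
    leading zeros, so they ramp up from zero.
    """
    out = []
    acc = 0.0
    for n, x in enumerate(samples):
        acc += x
        if n >= width:
            acc -= samples[n - width]   # drop the sample leaving the window
        out.append(acc / width)
    return out
```

A constant input passes through unchanged once the window is full, while a rapidly alternating input is averaged toward zero.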

In accordance with the invention, the positions of individual pitch pulses in the applied signal are determined by locating the samples for which the prediction error is large. Samples delivered from filter 17 thus have amplitudes that are proportional to the difference between the applied signal sample and the predicted signal. It is necessary, therefore, only to seek the fundamental frequency of the prediction (error) signal. This may be done using any desired fundamental frequency detector 18 of any desired construction. A suitable detector includes a half-wave rectifier 19, employed to retain positive peaks only of the signal in order to simplify later operations. The rectified signal is delivered to peak picking network 20, which seeks the largest sample in each frame of signals. Such peak picking arrangements are well known to those skilled in the art and are frequently used in pitch detection arrangements, particularly those of the cepstrum type. Peak signals thus developed are passed through threshold detector 21, adjusted to a level selected to prevent minor peaks from reaching the output of the analyzer. The threshold is adjusted to accommodate the true fundamental frequency peaks determined, for example, from experience. The resulting sequence of pitch pulses is indicative of the fundamental frequency or period of the applied speech signal and may be used in any desired fashion.
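The rectifier, peak picker, and threshold detector chain can be sketched as follows. This is a simplified illustration, not the circuit of FIG. 1: the function name, the frame length, and the threshold value are assumptions, and in practice the threshold would be set from experience as the text notes.

```python
from typing import List


def pitch_pulses(error: List[float], frame_len: int = 200,
                 threshold: float = 0.5) -> List[int]:
    """Return sample indices of detected pitch pulses.

    Steps: half-wave rectify, pick the largest sample in each frame,
    keep the peak only if it exceeds the threshold.
    """
    rectified = [max(e, 0.0) for e in error]        # keep positive peaks only
    pulses = []
    for start in range(0, len(rectified), frame_len):
        frame = rectified[start:start + frame_len]
        peak_offset = max(range(len(frame)), key=frame.__getitem__)
        if frame[peak_offset] > threshold:          # suppress minor peaks
            pulses.append(start + peak_offset)
    return pulses
```

Negative excursions are discarded by the rectifier, so a large negative spike in a frame does not displace a smaller positive peak.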

Alternatively, as previously described in the art, the fundamental frequency detector may include an autocorrelator followed by a peak picker and a threshold detector.
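A minimal sketch of the autocorrelation alternative is given below. The lag range of 30 to 200 samples corresponds to pitch periods of 3 msec to 20 msec at a 10 kHz sampling frequency. The function name, the normalization by the zero-lag value, and the threshold of 0.3 are assumptions chosen for illustration.

```python
from typing import List, Optional


def autocorr_pitch_period(error: List[float], min_lag: int = 30,
                          max_lag: int = 200,
                          threshold: float = 0.3) -> Optional[int]:
    """Return the lag (in samples) of the largest normalized autocorrelation
    peak above the threshold, or None if no lag qualifies.
    """
    r0 = sum(e * e for e in error)
    if r0 == 0.0:
        return None
    best_lag, best_val = None, threshold
    last_lag = min(max_lag, len(error) - 1)
    for lag in range(min_lag, last_lag + 1):
        r = sum(error[i] * error[i + lag]
                for i in range(len(error) - lag)) / r0
        if r > best_val:                 # peak picker with threshold
            best_lag, best_val = lag, r
    return best_lag
```

For an error signal consisting of unit pulses spaced 80 samples apart, the strongest peak lies at a lag of 80 samples, i.e., a period of 8 msec at 10 kHz.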

FIG. 2 illustrates a typical interval of a speech signal. A voiced speech segment is shown in line A. Line B illustrates the sequence of pulses derived from fundamental frequency detector 18 as the output signal of the analyzer system. Line C of the figure illustrates a typical unvoiced segment of speech.

To assure that a clear distinction between voiced and unvoiced signal segments is available, the invention also produces a voiced-unvoiced decision signal. The decision is based on the ratio of the mean-squared value of speech samples to the mean-squared value of prediction error samples. It has been found that this ratio is considerably smaller for unvoiced speech sounds than for voiced speech sounds, typically by a factor of approximately 10.

Accordingly, speech samples from sampler 11 are delivered to mean-squared network 22 and prediction error samples from subtractor 16 are delivered to mean-squared network 23. Networks for deriving a signal proportional to the mean-squared value of a sequence of samples are well known in the art and are frequently used in acoustic signal processing apparatus. A typical network includes an arrangement for developing a signal proportional to the square of each signal sample, an adding network for summing a sequence of squared signal values, and a divider network for developing a signal proportional to the average, or mean value, of the summed squared signals.

Two signals proportional, respectively, to the mean-squared value of speech samples and the mean-squared value of prediction error samples are delivered to divider network 24 which produces as its output the quotient of the two signal values. The quotient signal is thereupon delivered to threshold detector 25, which is arranged to develop a first signal for quotient values greater than 10, as an indication of a voiced signal interval, and a second signal for quotients less than 10, as an indication of an unvoiced signal interval. Output signals from detector 25 may be used in any desired fashion to indicate the voicing character of the input signal.
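The voicing decision reduces to two mean-square computations, a division, and a comparison. The sketch below uses the threshold of 10 given in the text. The function names are hypothetical, and two details are assumptions: a quotient exactly equal to the threshold is classed as voiced (as in claim 7), and a zero error energy is also treated as voiced.

```python
from typing import List


def mean_square(samples: List[float]) -> float:
    """Square each sample, sum the squares, and divide by the count."""
    return sum(x * x for x in samples) / len(samples)


def is_voiced(speech: List[float], error: List[float],
              threshold: float = 10.0) -> bool:
    """Return True when mean-square(speech) / mean-square(error) reaches
    the threshold, indicating a voiced interval.
    """
    ms_error = mean_square(error)
    if ms_error == 0.0:
        return True      # assumption: perfectly predicted frame is voiced
    return mean_square(speech) / ms_error >= threshold
```

For instance, speech samples of amplitude 2.0 with error samples of amplitude 0.5 give a quotient of 16 and are classed as voiced, while speech of amplitude 1.0 with the same error gives a quotient of 4 and is classed as unvoiced.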

It will be evident to those skilled in the art that the fundamental frequency determination arrangement of the invention, together with the voicing decision arrangement, greatly enhances the reliability with which two important characteristics of a speech signal are determined. This increased reliability is due primarily to the virtual absence of formant structure in the signal at the time the pitch measurement is made. Furthermore, it will be apparent that the fundamental frequency detector of the invention is particularly applicable to use in a speech transmission system or a speech analysis system in which a linear prediction arrangement is used. In such cases, it is evident that the prediction error signal delivered to subtractor 16 may be derived from the predictor used in coding the speech signals.

Furthermore, it will be apparent that the voicing decision signal may be used in conjunction with other criteria, such as the spectral balance of low frequencies related to high frequencies, to make the voiced-unvoiced decision more reliable.

What is claimed is:

1. A signal analyzer for determining the fundamental period of a speech signal, which comprises,

adaptive predictor means supplied with samples of said speech signal for predicting the present value of each sample on the basis of a weighted summation of a number of prior sample values of said speech signal,

means for subtracting said predicted speech value from the actual speech value to develop a difference signal, and

means for determining the fundamental frequency of said difference signal as an indication of the fundamental period of said speech signal.

2. A signal analyzer as defined in claim 1, wherein said means for determining the fundamental frequency of said difference signal comprises,

means for determining the frequency of occurrence of difference signal maxima above a prescribed threshold.

3. A signal analyzer as defined in claim 1, wherein said means for determining the fundamental frequency of said difference signal comprises,

means for autocorrelating said difference signal for developing an autocorrelation signal representative of the periodic character of said difference signal, and

means for detecting the location of the peak value of said autocorrelation signal.

4. Apparatus for determining the fundamental period of a speech signal, which comprises,

means for developing an estimate of the present value of a speech signal on the basis of past values of said speech signal,

means for developing a signal representative of the difference between said signal estimate and the true present value of said speech signal, and

means for determining the fundamental frequency of said difference signal to develop a signal representative of the fundamental period of said speech signal.

5. Apparatus for determining the fundamental period of a speech signal, which comprises,

adaptive predictor means supplied with samples of said speech signal for developing an estimate of the momentary value of said speech signal from previously supplied samples,

means for developing a prediction error signal from the difference between said predicted signal estimate and the corresponding momentary value of samples of said speech signal,

means for identifying prediction error samples whose magnitudes are above a prescribed threshold, and

means for utilizing the frequency of occurrence of said identified error samples as a measure of the fundamental period of said speech signal.

6. Apparatus for analyzing the character of a speech signal, which comprises, in combination,

predictor means supplied with samples of a speech signal for developing an estimate of the momentary value of said signal from previously supplied samples,

means for developing prediction error signal samples from the difference between samples of said signal estimate and the corresponding momentary value of samples of said speech signal,

means for identifying prediction error samples whose magnitudes are above a prescribed threshold,

means for developing a first signal proportional to the mean-squared value of said speech samples,

means for developing a second signal proportional to the mean-squared value of corresponding ones of said error samples,

means for developing a signal proportional to the ratio of said first to said second mean-squared signals,

means for utilizing the frequency of occurrence of said identified threshold error samples as a measure of the fundamental period of said speech signal, and

means for utilizing said ratio of first and second mean-squared signals as a measure of the voicing characteristic of said speech signal.

7. Apparatus for analyzing the character of a speech signal as defined in claim 6, wherein,

values of said ratio of mean-squared signals equal to or greater than a prescribed threshold are used to classify said speech signal as voiced, and wherein values of said ratio of mean-squared signals less than said threshold are used to classify said speech signal as unvoiced.

8. In a pitch analysis arrangement for speech signals, the combination of,

means for developing a signal representative of the formant structure of an applied speech signal,

means for removing said formant representative signal from said speech signal to produce a signal essentially devoid of all formant information,

means for measuring the period of said formant devoid signal, and

means for determining the voicing character of said speech signal on the basis of the power in said speech signal and the power in said formant devoid signal.

Patent Citations
Cited Patent | Filing date | Publication date | Applicant | Title
US2732424 * | Apr 13, 1951 | Jan 24, 1956 | | oliver
US3026375 * | May 9, 1958 | Mar 20, 1962 | Bell Telephone Labor Inc | Transmission of quantized signals
US3405237 * | Jun 1, 1965 | Oct 8, 1968 | Bell Telephone Labor Inc | Apparatus for determining the periodicity and aperiodicity of a complex wave
US3420955 * | Nov 19, 1965 | Jan 7, 1969 | Bell Telephone Labor Inc | Automatic peak selector
US3437757 * | Jun 15, 1966 | Apr 8, 1969 | Bell Telephone Labor Inc | Speech analysis system
Classifications
U.S. Classification: 704/207, 704/219
International Classification: G10L25/90, G10L25/93
Cooperative Classification: G10L25/90, G10L25/93, H05K999/99
European Classification: G10L25/93, G10L25/90