Publication number: US3488446 A
Publication type: Grant
Publication date: Jan 6, 1970
Filing date: Oct 31, 1966
Priority date: Oct 31, 1966
Publication number: US 3488446 A, US 3488446A, US-A-3488446, US3488446 A, US3488446A
Inventors: Ralph L Miller
Original Assignee: Bell Telephone Labor Inc
Apparatus for deriving pitch information from a speech wave

[Drawing sheet 1 of 2 (R. L. Miller, Pat. No. 3,488,446, Jan. 6, 1970; filed Oct. 31, 1966). Figure content not recoverable from the OCR text.]

[Drawing sheet 2 of 2: FIG. 2, waveforms of amplitude versus time together with the SAMPLE 1 and SAMPLE 2 pulse trains; FIG. 3, block diagram showing peak rectifiers, sample and hold circuits, delays, clock, averager and LPF.]

United States Patent Office 3,488,446

APPARATUS FOR DERIVING PITCH INFORMATION FROM A SPEECH WAVE

Ralph L. Miller, Chatham, N.J., assignor to Bell Telephone Laboratories, Incorporated, Murray Hill, N.J., a corporation of New York

Filed Oct. 31, 1966, Ser. No. 590,582
Int. Cl. H04b 1/66
U.S. Cl. 179-4555
10 Claims

ABSTRACT OF THE DISCLOSURE

In the present invention, a signal proportional to the peak envelope wave of an applied speech wave is developed and algebraically combined, via a slicing action, with the speech wave. The resultant peak pulses developed by the slicing action may, after further processing, be used as excitation or control signals for a vocoder synthesizer during voiced portions of the speech wave.

This invention pertains to communication systems for transmitting the information content of a wide-band speech wave over a narrow-band channel and, more particularly, to apparatus for deriving pitch information from a speech wave.

Attempts to improve the performance of channel vocoder systems have been plagued by the so-called pitch problem, i.e., the problem of determining, with the accuracy demanded by the ear, whether a speech signal is voiced or unvoiced and, if voiced, the pitch. With the advent of the voice-excited vocoder (VEV) described, e.g., in Patent No. 3,030,450 issued to M. R. Schroeder on Apr. 17, 1962, it was thought that the need for determining the fundamental pitch of a speech signal had been obviated. In the VEV, the excitation signal for the vocoder synthesizer is obtained from the low-frequency components of the speech signal, i.e., the baseband, by a process called spectrum flattening. This process is described in the above-cited Schroeder patent and in Patent 3,139,487 issued to B. F. Logan et al. on June 30, 1964. The success of this method of excitation was immediate. The excitation signal in a VEV has, inherently, the correct periodicity: for an aperiodic input, the output of the spectrum flattener is also aperiodic, and, for a periodic input, the output will be periodic with the same periodicity.

With the development of the voice-excited vocoder, synthesized speech has lost much of its unnatural buzzy quality. However, with the modern demand for high-fidelity vocoders, the conventional VEV has not been found suitable in those particular applications requiring exacting standards.

The spectrum flattening process, when applied to the received baseband signal, produces a series of pulses for each pitch period of the speech signal. This series of pulses, when considered in terms of the impulse response of the synthesizing filter system, tends to fill in the whole pitch period of the synthesized speech with a series of peaks, in contradistinction to the simple damped sinusoid which is characteristic of voiced speech waves. The subjective effect of this series of peaks is to produce speech signals having, for the want of a better term, a very mushy quality. Measurements have indicated that this distortion is primarily due to phase variations in the harmonic spectral components. In addition, the spectrum flattening process tends to produce a buzz-like quality for unvoiced sounds; the sensitivity of the ear is such that an attempt to simulate what is essentially white noise with a series of quasi-random peak signals is bound to be detected.


In the copending application of mine, now issued as U.S. Patent 3,431,362, on Mar. 4, 1969, the quality and naturalness of synthesized speech is improved by incorporating two seemingly incompatible concepts, namely, pitch determination and voice excitation. The spectrum flattener of the conventional voice-excited vocoder is replaced by a pitch pulse generator and a voiced-unvoiced detector. Responsive to the received baseband signal, a sequence of excitation pulses is generated having a repetition rate corresponding to the fundamental frequency of the speech signal. Thus, one pulse per pitch period of the original speech wave is generated. This excitation signal, after appropriate filtering, closely resembles the damped sinusoid of voiced speech waves. The sequence of excitation pulses is selectively combined with random noise signals and the resultant signal is applied as an excitation function to the vocoder synthesizer channel control modulators.

Thus, the circle has come full turn. An accurate determination of whether a speech signal is voiced or unvoiced is a necessity both in channel vocoder systems and in high quality VEV systems. Even more importantly, a sequence of excitation or control pulses must be generated which has a repetition rate accurately corresponding to the fundamental frequency of the speech signal.

It is therefore an object of this invention to improve the naturalness of vocoder speech by generating a sequence of excitation pulses having a repetition rate corresponding to the fundamental frequency of a speech signal.

It is another object of this invention to determine at any given instant whether an incoming speech wave represents a voiced or unvoiced sound.

Yet another object of this invention is to insure that only one pulse per pitch period of an incoming speech wave is generated.

In accordance with the principles of the present invention, it is recognized that a basic characteristic of a voiced speech signal, even when the fundamental pitch component is not present, is that it will have an envelope wave which is related to the pitch period of the speaker's voice. It is further recognized that a speech wave will usually be unsymmetrical, i.e., it will have a much larger series of peaks for one polarity of the signal than for the other. In the present invention, two similar circuits are utilized to track both the positive and negative peaks of the speech envelope wave. Two tracking waves are thus obtained whose amplitudes correspond to the maximum peaks of the speech wave for each polarity of the speech signal. A determination is made as to which of the two tracking signals is of greater amplitude and the larger signal is selectively combined with a delayed version of the original speech signal. By adjusting the amplitude of the delayed speech wave to be slightly greater than the selected peak tracking signal, pulses are developed, via a slicing action, each time peaks of the speech wave exceed the amplitude of the peak tracking signal. These peak pulses after further processing may then be used as excitation signals or as excitation control signals for a vocoder synthesizer. Any pulses developed during an unvoiced period of the speech signal are meaningless. Thus, a determination must be made as to the voiced-unvoiced nature of the applied speech signal.

The problem of making an accurate voiced-unvoiced decision is particularly difficult when the applied voice signal has been band limited. For example, typical telephone signals are restricted to the range of 200 to 3500 cycles per second (c.p.s.). Much higher frequencies are usually used to make reliable voiced-unvoiced decisions. In accordance with the present invention, accurate determination is accomplished by the use of two parallel decision circuits. The axis-crossing rate for voiced sounds will nearly always be below the rate corresponding to an input frequency of 1600 c.p.s. One circuit is utilized to make a determination on this basis. Certain voiced sounds, however, have sufficiently high frequency content so as to give a false indication when this criterion is used. Accordingly, a determination is made by a second circuit as to the amplitude of the speech wave in the 200 to 1000 c.p.s. range. Voiced sounds normally have a larger absolute amplitude for components in this low frequency range. Thus a measure of low frequency energy content is used as a check on the decision of the axis-crossing circuit. The final voiced-unvoiced determination is used to control the transmission of the peak pulses which have been developed in a manner as described.

If the slicing action, discussed above, is adjusted to operate between 80 and 90 percent of the peak amplitude of the voiced wave, a great majority of different voiced waves will yield only a single pulse for each pitch period of the voiced wave. Occasionally, two peaks per pitch period may be developed for certain waves and, on very rare occasions, three or more pulses. In accordance with the principles of this invention, by utilizing a blanking interval following the first pulse of a given period, these undesired additional pulses may be eliminated. The most desirable blanking interval depends upon the pitch period of the voice of the person speaking. An average short period for a man's voice is of the order of 6 milliseconds and that for a woman's voice is approximately 3 milliseconds. Thus, practically all unwanted multiple pulses can be eliminated if a blanking interval of slightly less than 3 milliseconds is used. Automatic blanking is accomplished in the present invention by the use of two branch circuits. In a first branch, a variable blanking circuit, controlled by signals proportional to the average pitch of the speech wave, provides the desired blanking interval for the peak pulses. In a second branch, a fixed blanking circuit, adjusted, for example, to a predetermined interval of 3 milliseconds, develops a pulse train whose repetition rate is a measure of the average pitch rate. These pulses are integrated and the resultant signal, proportional to the pitch period, is then used to control the variable blanking circuit of the first branch.

Thus, by the practice of this invention one pulse per pitch period of an incoming speech signal is developed during voiced intervals of an applied speech wave. Synthesized speech may therefore be obtained with higher quality and greater naturalness than previously available in vocoder systems.

This invention may be more fully understood from the following detailed description of an illustrative embodiment thereof taken in connection with the drawings in which:

FIG. 1 is a schematic block diagram of apparatus for deriving pitch information from an applied speech wave in accordance with the principles of the present invention;

FIG. 2 is a graphical presentation of the operation of the peak detector of the apparatus of this invention;

FIG. 3 is a schematic block diagram of a peak follower circuit used in the present invention; and

FIG. 4 is an illustrative schematic diagram of a circuit used in this invention.

The apparatus of the present invention may be utilized in either a channel vocoder transmitter station or the receiver station of a voice-excited vocoder. The speech input 10, depicted in FIG. 1, therefore corresponds either to the applied speech signal at the input of a channel vocoder or the baseband excitation signal used in a VEV. Speech signals are applied in parallel to peak detector 20 and voiced-unvoiced detector 30. Peak detector 20 is used to develop one pulse per pitch period of the speech signal. Explication of its operation may be facilitated by reference to the waveforms of FIG. 2. Graph A (solid line) depicts a typical speech waveform. Over a period of time approximating a typical pitch period, for example, 10 milliseconds, a peak rectifier is charged to the largest amplitude of the wave that occurred during this predetermined period. The developed peak signal is represented by the broken line waveform associated with Graph A. At the termination of each predetermined period this maximum voltage signal is transferred to a sample and hold circuit where it is stored during the next period. A stepped wave is thus derived which exactly follows the largest peak during each predetermined sample period, but which lags the original speech wave by an average of one sample period. Graph B (solid line) represents this stepped wave. A second circuit configuration operating in the same manner but offset in time by half a sample period develops a similar step wave which is represented by the broken line of Graph B. These two peak tracking waves are combined to obtain an average representative peak signal as illustrated by Graph C. Thus, an average peak tracking wave is obtained which accurately follows the peaks of speech waves having a pitch period greater than 10 milliseconds and still accurately represents the peak envelope of speech signals for much shorter periods of time, for example, when the speech is that of a woman.
Simultaneously, a signal similar to that of waveform C is developed for the opposite polarity of the speech wave. A polarity decision circuit selects the average wave which has the greatest amplitude and this wave is applied, with a slightly delayed version of the original speech signal, to a comparison or slicer circuit. The slicer circuit compares the selected average peak tracking wave with the delayed original wave as indicated in Graph D. Both polarities of the speech wave and the corresponding tracking waves are shown for illustrative purposes. If the amplitude of the speech wave is slightly larger than the selected peak tracking wave a series of pulses, for example, as shown in Graph E, are developed. Generally these pulses will be separated by an interval corresponding to the fundamental pitch of the speech signal.
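
The stepped peak tracking waves of Graphs B and C may be illustrated by the following minimal sketch, offered only as an assumed digital analogue of the analog peak rectifier and sample and hold circuits. The function names `peak_hold` and `average_peak_track`, the use of discrete samples, and the expression of the sample period as a count of samples are assumptions made for illustration and are not part of the disclosure.

```python
def peak_hold(samples, period, offset=0):
    """Stepped peak wave for the positive polarity (assumed discrete model).

    At each clock tick the peak accumulated during the preceding window is
    transferred to the hold value and the running peak is discharged, so the
    output lags the input by about one window.
    """
    out = []
    held = 0.0      # value stored in the sample and hold
    running = 0.0   # charge on the peak rectifier for the current window
    for n, x in enumerate(samples):
        if (n - offset) % period == 0:    # clock tick for this follower
            held, running = running, 0.0  # transfer, then discharge
        running = max(running, x)         # only positive peaks accumulate
        out.append(held)
    return out


def average_peak_track(samples, period):
    """Average of two stepped waves offset by half a sample period."""
    first = peak_hold(samples, period, 0)
    second = peak_hold(samples, period, period // 2)
    return [(a + b) / 2 for a, b in zip(first, second)]


# Hypothetical input: one peak per 4-sample window, peaks of 1, 3 and 2.
speech = [0, 1, 0, 0, 0, 3, 0, 0, 0, 2, 0, 0]
track = average_peak_track(speech, period=4)
```

A tracking wave for the opposite polarity may be obtained under the same assumptions by applying the functions to the negated samples, for example `average_peak_track([-x for x in speech], 4)`.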

The operations represented by the graphical depiction of FIG. 2 are embodied in peak detector 20 of FIG. 1. The input 10 speech signal is applied to a bandpass filter 11, to limit the band of the signal to a predetermined frequency range, and then in parallel to two peak follower circuits 14 and 15. Follower circuit 14 develops a signal corresponding to the average amplitude of the negative peaks of the speech signal during predetermined intervals of time, as discussed above. Similarly, follower circuit 15 develops a corresponding signal for the positive peaks of the applied speech wave. The operation of a typical peak follower circuit is described in detail hereinafter. Clock circuit 16, which may be of any well-known type, develops periodic signals for activating follower circuits 14 and 15. The output signals of follower circuits 14 and 15 are applied to a slicer circuit 22 and polarity decision apparatus 21. Apparatus 21, which may comprise an operational amplifier connected as a difference amplifier, develops a signal indicative of which peak tracking signal is of greater magnitude. This indicative signal is applied to slicer circuit 22.

The band limited speech signal, delayed by network 13 and increased in magnitude by amplifier 19, is also applied to slicer circuit 22. The function of the slicer circuit is to algebraically combine the delayed and slightly amplified speech signal with the selected peak tracking signal of greater magnitude. Circuit 22 may comprise any form of combinational circuit well known to those skilled in the art. An operational amplifier connected to algebraically combine its input signals and half wave rectify the resulting signal has been found satisfactory. Appropriate logic circuits, of any well-known type, responsive to the signals developed by polarity decision apparatus 21 may be used for gating the selected peak tracking wave. Generally, the slicing action should be adjusted to operate between 80 and 90 percent of the peak amplitude of the speech wave. The resultant pulse signals, depicted in Graph E of FIG. 2, are generally separated by an interval corresponding to the pitch period of the speech wave. These pulses may be directly used as the excitation function in a baseband vocoder or, if so desired, they may be coded and transmitted to a channel vocoder receiver station. The pulses convey no information during unvoiced intervals of the speech wave. Accordingly, if the pulses are to be used as a direct excitation function they must be inhibited during unvoiced intervals. Gate 25, responsive to signals developed by voiced-unvoiced detector 30, performs this function. Gate 25 and the conductive paths connected thereto are shown in broken lines to indicate that this mode of operation is an alternative.
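
The slicing action described above may be sketched as follows, as a hedged illustration rather than a description of circuit 22 itself. The name `slicer`, the parameter `slice_level`, the sample-by-sample polarity decision, and the choice of amplifier gain as the reciprocal of the slicing level (so that a pulse appears when the speech exceeds that fraction of the tracked peak) are all assumptions. Both tracking waves are taken here as non-negative magnitudes.

```python
def slicer(delayed_speech, pos_track, neg_track, slice_level=0.85):
    """Develop analog peak pulses from delayed speech and two tracking waves.

    pos_track and neg_track are assumed to be non-negative magnitudes of the
    positive-peak and negative-peak tracking waves. slice_level is the
    assumed fraction of the tracked peak at which slicing occurs.
    """
    gain = 1.0 / slice_level  # speech made slightly larger than the track
    pulses = []
    for x, p, q in zip(delayed_speech, pos_track, neg_track):
        if p >= q:
            s, track = x, p    # positive peaks dominate
        else:
            s, track = -x, q   # negative peaks dominate: invert the speech
        # Algebraic combination followed by half wave rectification.
        pulses.append(max(0.0, gain * s - track))
    return pulses


# Hypothetical values: only the sample reaching the tracked peak of 1.0
# rises above the 85 percent slicing level.
pulses = slicer([0.0, 0.5, 1.0, 0.5, -0.3], [1.0] * 5, [0.4] * 5)
```

With a slicing level nearer 0.8 more of each peak passes the slicer, and nearer 0.9 less; the text above states only that the level should generally lie between these values.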

Generally, it may be desired to further process the analog pulses developed by slicer circuit 22. Thus, a pulse generator 26 is provided, responsive to these analog pulses, for generating digital pulse signals. Pulse generator 26 may be of any type well known to those skilled in the art. A standardized pulse is therefore developed whose leading edge may be accurately defined. These digital pulses are applied to AND gate 29 with the control signals emanating from voiced-unvoiced detector 30.

Voiced-unvoiced detector 30 develops a signal indicative of the voiced-unvoiced nature of the applied speech signal. The input speech wave is applied to an equalizer 12 which boosts the low frequency components of the wave to compensate for any degradation of such components. The equalized signal is applied to two circuit paths. A first path, which develops a signal representative of the magnitude of the speech wave, comprises a bandpass filter 17, a full wave rectifier 23 and a low-pass filter 27. Voiced sounds normally have a larger absolute amplitude for components in the low frequency range, for example, 200 to 1000 c.p.s. Accordingly, if the speech wave is voiced a signal will be developed at the output of filter 27 which exceeds a predetermined threshold. A second path, which comprises limiter 18 and axis-crossing detector 24, develops a signal proportional to the axis-crossing rate of the applied speech signal. The axis-crossing rate for voiced sound will generally fall below that rate corresponding to an input frequency of 1600 c.p.s. Accordingly, the magnitude of the output signal of detector 24 is a reliable indication of the voiced-unvoiced nature of the speech signal. Detector 24 may be any one of the diverse circuits known to those skilled in this art. Certain voiced sounds have sufficiently high frequency components to yield a false indication when based solely on the axis-crossing rate. The signals developed by the first-mentioned path provide a corrective check on the operation of the second path. The output signals of both paths are applied to a logic circuit 28. Logic circuit 28, which may be a conventional OR circuit, develops a signal indicative of a voiced condition whenever the magnitude of the output signal of detector 24 is below a predetermined threshold or when the output signal of low-pass filter 27 is above a predetermined threshold.
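
The two-path decision of detector 30 may be illustrated by the following minimal sketch, which is an assumed frame-based digital analogue and not the circuit of FIG. 1. The names `axis_crossings_per_second` and `is_voiced`, the threshold `min_low_band_level`, and the premise that the caller supplies a frame already limited to the 200 to 1000 c.p.s. band are assumptions. The crossing threshold uses the fact that a sinusoid of frequency f crosses the axis 2f times per second.

```python
import math


def axis_crossings_per_second(frame, sample_rate):
    """Count sign changes between successive samples, scaled to a rate."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings * sample_rate / (len(frame) - 1)


def is_voiced(frame, low_band_frame, sample_rate,
              max_crossing_hz=1600.0, min_low_band_level=0.1):
    """OR of the axis-crossing test and the low-band amplitude test.

    low_band_frame is assumed to be the same interval of speech after a
    200 to 1000 c.p.s. bandpass filter; min_low_band_level is a
    hypothetical threshold on its mean rectified amplitude.
    """
    rate = axis_crossings_per_second(frame, sample_rate)
    rate_says_voiced = rate < 2.0 * max_crossing_hz
    # Full wave rectification followed by averaging (a crude low-pass).
    level = sum(abs(x) for x in low_band_frame) / len(low_band_frame)
    return rate_says_voiced or level > min_low_band_level


# Hypothetical test tones sampled at 8000 samples per second.
fs = 8000
low_tone = [math.sin(2 * math.pi * 200 * n / fs + 0.3) for n in range(800)]
high_tone = [math.sin(2 * math.pi * 3000 * n / fs + 0.3) for n in range(800)]
silence = [0.0] * 800
```

Under these assumptions `is_voiced(low_tone, low_tone, fs)` indicates a voiced condition, `is_voiced(high_tone, silence, fs)` does not, and `is_voiced(high_tone, low_tone, fs)` does, the last case showing the low-band path overriding a high axis-crossing rate.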

This signal, which represents the presence of a voiced interval of the applied speech wave, is applied to AND gate 29 along with the pulses emanating from detector 20. In the alternative arrangement discussed above and indicated by the broken lines connecting detector 30 and gate 25, the representative signal of detector 30 is used to control the transmission of the analog pulses developed by slicer circuit 22. The pitch pulses developed by detector 20 and allowed to pass through AND gate 29, during voiced intervals of the speech signal, are applied to automatic blanking circuit 40.

Generally, if the slicer circuit 22 is adjusted to operate between 80 and 90 percent of the peak amplitude of the voiced wave only one pulse per pitch period will be developed. Occasionally, two or more pulses will be developed for each period depending upon the individual pitch period and the first formant frequency of the applied speech wave. By utilizing a blanking interval, following the first pulse of a given period, these unwanted extraneous pulses may be eliminated. The most desirable blanking interval is determined by the speaker's pitch period. An average short period for a man's voice is 6 milliseconds and that for a woman's voice is 3 milliseconds. Thus, a blanking interval of slightly less than 3 milliseconds will generally eliminate all the undesired multiple peaks.
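
The effect of a blanking interval may be shown by a short sketch operating on pulse arrival times. The function name `apply_blanking`, the representation of pulses as times in milliseconds, and the default of 2.8 milliseconds (chosen only as one value "slightly less than 3 milliseconds") are assumptions. The interval is measured from the last pulse that was passed, in the manner of a monostable circuit that is not retriggered while active.

```python
def apply_blanking(pulse_times_ms, blanking_ms=2.8):
    """Pass the first pulse of each period and delete pulses that follow
    within the blanking interval of the last passed pulse."""
    kept = []
    for t in pulse_times_ms:
        if not kept or t - kept[-1] >= blanking_ms:
            kept.append(t)
    return kept


# Hypothetical pulse times for a 6 millisecond pitch period with two
# extraneous pulses at 1.2 and 7.0 milliseconds.
clean = apply_blanking([0.0, 1.2, 6.0, 7.0, 12.0])
```

In this example the pulses at 0.0, 6.0 and 12.0 milliseconds are retained. A pulse train with a 3 millisecond period, such as `[0.0, 3.0, 6.0, 9.0]`, passes unchanged, which is why an interval slightly under 3 milliseconds does not delete genuine pitch pulses of the shorter period.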

In automatic blanking circuit 40 of the present invention, two parallel paths are provided for the pitch pulses transmitted by AND gate 29. A first path comprises variable blanking circuit 31 which automatically adjusts the blanking interval in accordance with the average pitch period, sensed by a second path which includes fixed blanking circuit 32. Both blanking circuits 31 and 32 may be monostable multivibrators. Circuit 32 preferably has a fixed delay of 3 milliseconds while the delay of circuit 31 is dependent upon the magnitude of the applied control signal. Any of the conventional and well-known circuits of this type may be utilized. Thus, the pitch pulses emanating from AND gate 29 are applied to fixed blanking circuit 32, are integrated by a low-pass filter 34 and applied to automatic blanking control circuit 35. By integrating the blanked pitch pulses over an interval of a few words, a reliable DC voltage proportional to the average pitch of the speech signal is obtained. If a man's voice is followed very quickly by a woman's voice, for example, as in a conference, then the adjustment for a long blanking interval required for the man's voice will have a deleterious effect on the woman's voice. The reverse situation, i.e., a rapid change from a woman's voice to a man's voice, would have little effect on the man's voice except for some small added roughness in the reproduced speech signal. Thus, if there is any doubt, the blanking interval should be rapidly decreased to that for a woman, i.e., 3 milliseconds. Such an arrangement may be embodied in the automatic blanking control circuit 35 by means of a bidirectional charging circuit. Such circuits are well known and may comprise any one of the multitudinous RC networks that have two different charging time constants dependent upon the magnitude of the applied signal, and a unitary discharge time constant.
A parallel combination of capacitor and resistor energized by two parallel paths each containing a poled diode and resistor of predetermined value has been found satisfactory. The diodes are selected to conduct in accordance with changes in polarity of the applied control signal. A typical circuit is illustratively shown in FIG. 4. If the pitch pulses change from a short period (3 milliseconds) to a longer period (6 milliseconds) the capacitor 48 charges with a time constant of approximately 2.5 seconds. If on the other hand the pitch pulses change from a long period to a short period, the charging time constant is approximately 0.01 second. This arrangement allows the voltage on the integrating device 48 to change rapidly in one direction but not in the other. The voltage across the capacitor may be sensed and used to control a bistable amplifier, for example, a summing amplifier that has a Zener diode in its feedback circuit. The magnitude of the control signal, which is indicative of the average pitch period of the voiced signal, is used to control the delay of variable blanking circuit 31. Only two states are used to indicate the desired blanking interval in the present invention. However, as is readily apparent to those skilled in this art, it is possible to make the blanking interval continuously variable, responsive to the magnitude of the control signal.
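
The asymmetric behavior of the control circuit may be illustrated by a first-order sketch that moves slowly toward a longer period and quickly toward a shorter one. This is an assumed model only: it tracks an estimated pitch period directly instead of the capacitor voltage of FIG. 4, and the names `track_period` and `blanking_interval_ms`, the switching point of 4.5 milliseconds, and the two output intervals are hypothetical values chosen for illustration. The time constants of 2.5 seconds and 0.01 second are those stated above.

```python
import math


def track_period(period_inputs_ms, dt_s, slow_tau_s=2.5, fast_tau_s=0.01,
                 start_ms=3.0):
    """One-pole tracker with direction-dependent time constant.

    Rising toward a longer period uses slow_tau_s; falling toward a
    shorter period uses fast_tau_s.
    """
    estimate = start_ms
    history = []
    for target in period_inputs_ms:
        tau = slow_tau_s if target > estimate else fast_tau_s
        alpha = 1.0 - math.exp(-dt_s / tau)  # exact step for a one-pole RC
        estimate += alpha * (target - estimate)
        history.append(estimate)
    return history


def blanking_interval_ms(estimate_ms, switch_ms=4.5, short_ms=3.0,
                         long_ms=6.0):
    """Two-state selection of the blanking interval (hypothetical values)."""
    return long_ms if estimate_ms > switch_ms else short_ms


# One second of a 6 ms period after a 3 ms start, in 10 ms steps.
rising = track_period([6.0] * 100, dt_s=0.01)
# Fifty milliseconds of a 3 ms period after a 6 ms start.
falling = track_period([3.0] * 5, dt_s=0.01, start_ms=6.0)
```

Under these assumptions the estimate has moved only about one third of the way from 3 toward 6 milliseconds after a full second, so the short interval is still selected, whereas the return from 6 to 3 milliseconds is essentially complete within 50 milliseconds.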

The automatic blanking arrangement will be recognized as a form of a forward-acting automatic pitch tracking system. Since the present pitch derivation system is particularly useful in obtaining pitch information from a speech wave which does not have the fundamental frequency component present, it operates on a basis of the wave period. If the vocoder were to be designed to derive the pitch information from the fundamental frequency component, the tracking arrangement could be used to operate a tracking filter. Because of its positive forward action it cannot get locked up on the wrong component as backward-acting devices can. Power hum also would not affect it as the peak information can be derived from a frequency band well above the power frequency components. Such an arrangement is shown in dotted lines in FIG. 1. The input speech signal is applied to a tracking filter 51, of any conventional type. Since the output signal of automatic blanking control circuit 35 is a reliable indication of pitch, it is used to control tracking filter 51. Reliance on the backward-acting systems of the prior art, which utilize frequency meters for developing a control signal, is therefore unnecessary. A signal representative of the fundamental frequency is thus developed by conventional frequency measuring apparatus 52.

Thus, available at the output of variable blanking circuit 31 is a sequence of pulses having a repetition rate corresponding to the fundamental frequency of the applied speech signal. These pulses, after appropriate filtering, closely resemble the damped sinusoid of voiced speech waves. They therefore may be used as the excitation signal in a voice-excited vocoder. In the conventional channel vocoder, on the other hand, these pulses may be further processed and then transmitted to the receiver synthesizer of the vocoder. Such processing may involve quantization and coding, if so desired.

FIG. 3 illustrates a peak follower circuit used in the apparatus depicted in FIG. 1. The speech signal from bandpass filter 11 of FIG. 1 is applied, in parallel via conductor 37, to two peak rectifiers 38 and 39. The peak rectifiers, which may be of any conventional type, allow only a charge of a predetermined polarity to accumulate on the storage elements thereof. Thus, for example, peak follower 14 of FIG. 1 will embody peak rectifiers, e.g., 38 and 39, which provide a measure of the amplitude of negative peaks of the speech signal. Clock 16 activates rectifiers 38 and 39 via delay elements 41 and 42, respectively. Delay elements 41 and 42 are interposed so as to insure that the sample and hold circuits, 44 and 45, are activated prior to the discharge of rectifiers 38 and 39, which are responsive to the signals emanating from elements 41 and 42, respectively. In addition, the pulses activating rectifier 38 and sample and hold circuit 44 are offset by half a sample period from the pulses activating rectifier 39 and sample and hold circuit 45, as depicted by Graphs F and G of FIG. 2. In this way, two peak tracking signals, indicated by the solid and broken lines of Graph B of FIG. 2, are developed as discussed above. These two peak tracking waves are combined and averaged in circuit 46 so as to develop a tracking signal indicated as waveform C of FIG. 2. A low-pass filter 47 smooths any discontinuous transitions. The filtered signal which, for example, represents the values of the peaks of the negative half of the input speech signal is applied to slicer circuit 22 of FIG. 1. Similarly, an identical configuration to that of FIG. 3 develops a positive peak tracking signal which is applied to slicer circuit 22. These two peak tracking signals are represented by the broken lines of Graph D of FIG. 2.

It is to be understood that the above-described arrangements are merely illustrative of applications of the principles of the invention. Numerous other arrangements may be devised by those skilled in the art without departing from the spirit and scope of the invention. For example, variable resonators, similar to those used in the well-known resonance vocoder, may be incorporated to reduce the peak factor of the excitation signal.

What is claimed is:

1. Pitch information derivation apparatus comprising:

means for developing a signal proportional to the peak envelope wave of an applied speech wave,

means for algebraically combining said proportional signal and said applied speech wave to develop a series of pitch pulses,

means for determining the voiced-unvoiced nature of said speech wave,

and means for selectively transmitting said pitch pulses during voiced intervals of said speech wave.

2. In a vocoder communication system, means for deriving pitch information from a speech wave which comprises:

means for deriving from said speech wave a first unidirectional signal proportional to the peak amplitudes of one polarity of said speech wave during predetermined intervals of time,

means for deriving from said speech wave a second unidirectional signal proportional to the peak amplitudes of the other polarity of said speech wave during predetermined intervals of time,

means for selecting from said first and second signals the signal of greater magnitude,

means for algebraically combining said selected signal and said speech wave to develop a series of pulses having a repetition rate corresponding to the pitch period of said speech wave,

means for determining the voiced-unvoiced nature of said speech wave,

and means responsive to said determining means for selectively transmitting said pulses.

3. A vocoder communication system as defined in claim 2 wherein said means for deriving a proportional unidirectional signal comprises:

first means for storing a signal proportional to the peak amplitude of said speech wave during first predetermined intervals of time,

second means for storing a signal proportional to the peak amplitude of said speech wave during second predetermined intervals of time,

and means for combining the stored signals of said first and second means to develop a composite signal proportional to the peak amplitude of said speech wave.

4. A vocoder communication system as defined in claim 2 wherein said algebraically combining means comprises:

means for subtracting from the amplitude envelope of said speech wave said selected signal.

5. A vocoder communication system as defined in claim 2 wherein said means for determining the voiced-unvoiced nature of said speech wave comprises:

first means for measuring the axis-crossing rate of said speech wave,

second means for measuring the energy content of the lower frequency components of said speech wave,

and logic means responsive to said first and second means for developing a signal indicative of the voiced-unvoiced nature of said speech signal.

6. A vocoder communication system as defined in claim 2 wherein said means for selectively transmitting said pulses comprises:

means responsive to said voiced-unvoiced determining means for transmitting said pulses during voiced intervals of said speech wave,

means responsive to said transmitted pulses for developing a signal proportional to the average pitch of said speech wave,

and means responsive to said proportional average pitch signal for inhibiting the transmission of said pulses for a predetermined interval of time immediately following the initiation of one of said pulses.

7. The method of deriving pitch information from an applied speech wave comprising the steps of:

developing a signal proportional to the peak amplitudes of one polarity of said speech wave,

algebraically combining said proportional signal and said speech wave to develop a series of peak pulses,

determining the voiced-unvoiced nature of said speech wave,

transmitting said peak pulses during voiced intervals of said speech wave,

and deleting each of said pulses occurring during predetermined intervals of time following the initiation of one of said pulses.

8. Pitch information derivation apparatus comprising:

means for developing a signal proportional to the peak envelope wave of an applied speech wave,

means for algebraically combining said proportional signal and said applied speech wave to develop a series of pitch pulses,

means for determining the voiced-unvoiced nature of said speech wave,

means for selectively transmitting said series of pitch pulses during voiced intervals of said speech wave,

means responsive to said transmitted series of pulses for determining the average pitch period of said speech wave,

and means responsive to said pitch period determining means for inhibiting said transmitted series of pulses for a predetermined interval of time.

9. The method of deriving pitch information from an applied speech signal comprising the steps of:

developing a signal proportional to the magnitude of the envelope wave of said speech signal,

subtracting said proportional signal from said speech signal to develop a series of peak pulses,

determining the voiced-unvoiced nature of said speech signal,

transmitting said peak pulses during voiced intervals of said speech wave,

and measuring the repetition rate of said transmitted pulses.

10. Pitch information derivation apparatus comprising:

means for developing a signal proportional to the peak envelope wave of an applied speech wave,

means for algebraically combining said proportional signal and said applied speech wave to develop a series of pitch pulses,

means for determining the voiced-unvoiced nature of said speech wave,

means for selectively transmitting said pitch pulses during voiced intervals of said speech wave,

means responsive to said transmitted pitch pulses for developing a control signal representative of the average fundamental pitch of said speech wave,

and means responsive to said control signal and said speech wave for determining the instantaneous fundamental pitch of said speech wave.

References Cited

UNITED STATES PATENTS

3,020,344  2/1962  Prestigiacomo  179-1
3,321,582  5/1967  Schroeder  179-1
3,381,091  4/1968  Sondhi  179-1

ROBERT L. GRIFFIN, Primary Examiner
W. S. FROMMER, Assistant Examiner

U.S. Cl. XR.

Patent Citations
Cited Patent | Filing date | Publication date | Applicant | Title
US3020344 * | Dec 27, 1960 | Feb 6, 1962 | Bell Telephone Labor Inc | Apparatus for deriving pitch information from a speech wave
US3321582 * | Dec 9, 1965 | May 23, 1967 | Bell Telephone Labor Inc | Wave analyzer
US3381091 * | Jun 1, 1965 | Apr 30, 1968 | Bell Telephone Labor Inc | Apparatus for determining the periodicity and aperiodicity of a complex wave
Referenced by
Citing Patent | Filing date | Publication date | Applicant | Title
US3600516 * | Jun 2, 1969 | Aug 17, 1971 | Ibm | Voicing detection and pitch extraction system
US4001505 * | Apr 3, 1975 | Jan 4, 1977 | Nippon Electric Company, Ltd. | Speech signal presence detector
US4783807 * | Aug 27, 1984 | Nov 8, 1988 | John Marley | System and method for sound recognition with feature selection synchronized to voice pitch
Classifications
U.S. Classification: 704/208
International Classification: G10L25/90
Cooperative Classification: H05K999/99, G10L25/90
European Classification: G10L25/90