|Publication number||US3102928 A|
|Publication date||Sep 3, 1963|
|Filing date||Dec 23, 1960|
|Priority date||Dec 23, 1960|
|Publication number||US 3102928 A, US 3102928A, US-A-3102928, US3102928 A, US3102928A|
|Inventors||Schroeder Manfred R|
|Original Assignee||Bell Telephone Labor Inc|
[Drawings, 3 sheets, FIGS. 1-6: M. R. Schroeder, Vocoder Excitation Generator. Recoverable labels: speech; low pass filter output; zero axis crossing output; period control signal for transmission; voltage controlled impulse generator output; all-pass filter output; commutator; D.C. supply voltage; to synthesizer modulators.]
United States Patent Office 3,102,928 Patented Sept. 3, 1963
VOCODER EXCITATION GENERATOR
Manfred R. Schroeder, Gillette, N.J., assignor to Bell Telephone Laboratories, Incorporated, New York, N.Y., a corporation of New York
Filed Dec. 23, 1960, Ser. No. 78,168
5 Claims. (Cl. 179-1)
This invention relates generally to the transmission of speech signals over narrow band media by vocoder techniques. More particularly, it relates to the method of and apparatus for generating a complex wave suitable for both periodic and aperiodic excitation of a vocoder synthesizer. Its principal object is to remove the need for voiced-unvoiced switching and the accompanying voiced signal analysis.
Many proposals have been made in the past for reducing the frequency band required for the transmission of speech signals by modifying the voice currents in various ways. Among such proposals a notable one is the channel vocoder of H. W. Dudley Patent 2,151,091 which issued March 21, 1939. In this system an input speech wave is applied to a number of different filters connected in parallel to determine its fundamental frequency or pitch, and the distribution of amplitudes among a number of frequency sub-bands into which the speech frequency range is divided. The result of this analysis is translated into a number of control currents each representative of the energy in one sub-band. In particular, one of these control signals represents the fundamental or pitch frequency of the voice. The control currents are transmitted to a receiver station and are there utilized to build up, from local energy sources in a speech synthesizer, an artificial speech wave having the characteristic pitch and amplitude-frequency distribution of the original impressed speech. The synthesizing apparatus at the receiver includes a buzz source and a hiss source to represent the source of voiced and unvoiced sounds, respectively. The incoming control signals, derived at the transmitter station, operate to switch the buzz source and the hiss source into action in alternation, as required, and to adjust the frequency of the buzz source, i.e., to tune it. This energy is applied to the synthesizer network which, in turn, is continuously adjusted by the control signals.
In another significant system, known as the resonance vocoder and described in H. W. Dudley Patent 2,243,527 which issued May 27, 1941 as well as elsewhere, a speech wave is divided into a small number, e.g., 3, of comparatively wide bands, each of which embraces a single group of harmonics or formants in which the speech energy tends to be concentrated. The resonance vocoder derives for each such band both a frequency control current and an amplitude control current. At a receiver station, these control currents, which occupy much narrower spectrum bands than the voice currents from which they are derived, energize the hiss source or a buzz source in dependence on whether the sound being analyzed is voiced or unvoiced.
In the analyzer portion of both the channel vocoder and the resonance vocoder, an indication both as to the distribution of power among the frequency sub-bands (of either kind) and an indication as to whether a particular sound is voiced or unvoiced is generated for transmission. In the case of the former, the fundamental vocal chord frequency or pitch is also transmitted. In the case of many voice signals, however, particularly when they have been subjected to a certain amount of inevitable frequency distortion due to transmission through band limited apparatus, the problem of relating the character of the hiss and buzz sources utilized at the receiver synthesizer to the signal is a complex and trying one.
The degree of naturalness and realism attained in the construction of artificial speech depends in large measure on the nature of the excitation signals employed in the synthesis process. Ideally, speech synthesis apparatus requires an excitation signal characterized by a flat spectrum of constant power density and of the proper type, i.e., discrete or continuous. In conventional vocoder apparatus of the sort referred to above, two separate excitation signals are employed; one, the buzz signal, is periodic in nature and the other, the hiss signal, is aperiodic in form. An auxiliary control signal operates to select as between the buzz energy and the hiss energy and a pitch signal adjusts the frequency of oscillation of the buzz signal. A decision element, with its inherent susceptibility to error, is required to make the selection.
In a band compression system described in an application of M. R. Schroeder, Serial No. 774,173, filed November 17, 1958, now Patent 3,030,450, dated April 17, 1962, the problem is effectively solved by a direct and unmodified transmission of a relatively broad band of low frequency components of a voice message and by the transmission of the higher frequency components only by vocoder techniques. At the synthesizer, a single excitation signal is generated that is highly correlated with the voice signal input of the analyzer. The single excitation signal assumes the role both of the hiss source and the buzz source of a conventional vocoder synthesizer. Correlation is preserved by deriving the excitation signal from the wide base band of uncoded voice signals by means of a nonlinear distortion network which effectively spreads the spectrum of the base band to embrace the frequency range of the high band signals. Periodicity of the excitation signal is thus automatically preserved together with irregular fluctuations of consecutive voice periods which contribute individuality to a voice. In conventional vocoder fashion the energy of the excitation signal is continuously adjusted by the high frequency control signals to produce a replica of the high band voice signals. The reconstructed signals are combined with the unmodified base band signals to produce a composite signal which may be delivered to a reproducer. Although separate hiss and buzz generators and the switching apparatus associated with them are avoided in this system, the over-all bandwidth required for transmission necessarily is somewhat greater than required in conventional vocoder apparatus to accommodate a coded pitch control signal.
In accordance with the present invention, the above-mentioned naturalness problem, involving suitable selection of hiss or buzz in accordance with the voiced-unvoiced nature of the signal, is avoided without resort to a departure from full vocoder operation. As a result, the entire frequency band of the input signal is subjected to narrow band vocoder transmission, i.e., a common wave form for excitation is developed without resort to an uncoded base band of signals. In conventional fashion, a low frequency control channel is employed to accommodate the necessary pitch signal information. Information relating to the voiced-unvoiced condition of the signal is an inherent part of the pitch control signal; means for separately determining this condition are not required.
At the synthesizer the narrow band pitch control signal is employed to adjust a locally generated excitation signal in accordance with the nature of the input signal, i.e., with its voiced-unvoiced condition and if voiced, then in accordance with its fundamental frequency. The effectiveness of this procedure is based, in part at least, on the realization that a wave form consisting of many equally strong harmonic components of random phases and having a low peak-factor sounds respectably like white, Gaussian noise when the fundamental period is varied in a random fashion whereas the same wave form subjected to periodic variations of the fundamental period maintains its quasi-periodic character. Hence, the same low peak-factor wave may be used both for periodic and aperiodic excitation. As used throughout this specification, the peak-factor of a wave is taken to be the range of the wave, e.g., its peak-to-peak voltage, divided by its r.m.s. value. A low peak-factor wave is, therefore, a wave whose peak-to-peak voltage range divided by its r.m.s. value is low.
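The peak-factor definition above lends itself to a quick numerical check. The following is a minimal sketch, not part of the specification: it builds one period of 31 equal-amplitude cosine harmonics, once with all phases zero (an impulse-like wave) and once with random phases, and prints the peak-to-peak range divided by the r.m.s. value for each. The function names, the 512 samples per period, and the random seed are illustrative assumptions.

```python
import math
import random

N_HARMONICS = 31  # harmonic count used as an example in the specification


def harmonic_wave(phases, samples_per_period=512):
    """One period of a sum of equal-amplitude cosine harmonics with given phases."""
    n = samples_per_period
    return [
        sum(math.cos(2 * math.pi * (k + 1) * i / n + ph) for k, ph in enumerate(phases))
        for i in range(n)
    ]


def peak_factor(wave):
    """Peak-to-peak range divided by r.m.s. value."""
    rms = math.sqrt(sum(v * v for v in wave) / len(wave))
    return (max(wave) - min(wave)) / rms


rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
zero_phase = harmonic_wave([0.0] * N_HARMONICS)
random_phase = harmonic_wave([rng.uniform(0.0, 2 * math.pi) for _ in range(N_HARMONICS)])

pf_zero = peak_factor(zero_phase)
pf_random = peak_factor(random_phase)
print("zero-phase peak factor:  ", pf_zero)
print("random-phase peak factor:", pf_random)
```

Both waves have the same amplitude spectrum and therefore the same r.m.s. value; only the phases, and hence the peak-to-peak range, differ.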
It is in accordance with the present invention, therefore, to utilize a pitch control signal developed at the analyzer to control the period length of a low peak-factor excitation signal generated at the synthesizer. All that is required is that the pitch period detector produce a random output for aperiodic inputs, i.e., for unvoiced sounds, and a relatively regular output for periodic inputs, i.e., for voiced sounds. The need for voiced-unvoiced decision and buzz-hiss switching is thus completely eliminated and, further, vocoder economy is maintained over the entire speech band.
The invention will be more fully apprehended in the following detailed description of preferred embodiments thereof taken in connection with the appended drawings, in which:
FIG. 1 is a block schematic diagram showing speech transmission apparatus illustrating the invention;
FIG. 2 is a set of curves helpful in the explanation of the invention;
FIG. 3 is a set of wave form diagrams illustrating the characteristics of a low peak-factor wave;
FIG. 4 is a block schematic diagram showing apparatus for generating an excitation energy signal in accordance with one embodiment of the invention;
FIG. 5 is a diagram of assistance in the exposition of the apparatus of FIG. 4; and
FIG. 6 is a schematic circuit diagram, partially in block schematic form, showing structural details of a preferred form of excitation energy generator in accordance with the invention.
Referring now to the drawings:
FIG. 1 shows a transmission system, including a speech analyzer at a transmitter station and a speech synthesizer at a receiver station, that employs the principles of the invention. Speech currents which may originate in a telephone transmitter instrument 10 are delivered in parallel to apparatus, shown in the upper portion of the figure, which derives a fundamental period control signal, and to conventional spectrum analyzer apparatus shown in a broken line box in the lower part of the figure.
The apparatus at the top of the figure serves to isolate the fundamental frequency of the speech signals and to develop for voiced speech a slowly varying D.C. signal whose magnitude is proportional to the fundamental frequency of the speech signal. During unvoiced speech intervals, the apparatus develops a random but always positive noise-like signal. Low level circuit noise randomly generated in the absence of a strong driving signal is common to most pitch measuring circuits, but is ordinarily treated as an unavoidable defect and, in fact, means are often taken to mask such noise. In the present invention, the inevitability of such circuit noise is recognized and turned to account in developing a control signal that indicates not only the fundamental frequency of speech currents developed by telephone transmitter 10, but also the voiced-unvoiced condition of the speech signals. The resulting signal, which may conveniently be called a period control signal, is transmitted in any convenient manner to the synthesizer at the receiver where it controls the period of the novel excitation signal generator in accordance with the invention.
Although any form of pitch determination apparatus well known in the art may be used, one entirely suitable one is shown by way of illustration in the upper left-hand portion of FIG. 1. It insures that a random noise-like signal is developed in non-voiced speech intervals. Speech currents from telephone transmitter 10 are first passed through a low pass filter 11 proportioned to pass components in the range of 50-250 cycles per second, thus to embrace the fundamental frequency of applied speech signals. The output of the filter is connected to an axis crossing counter 12 that supplies an indication in the form of a pulse, or the like, for each zero axis crossing of the applied wave. The pulse output of counter 12 is thus an indication of the frequency of the speech currents in the 50-250 cycle per second band. In unvoiced intervals, i.e., for fricative speech or the like, the counter 12 responds to random circuit noise and generates a sequence of aperiodic pulses. A threshold adjustment may be made to provide a satisfactory random pulse output for these intervals. Alternatively, a Gaussian noise generator 13 may be employed to supply, through low pass filter 14, proportioned to restrict the noise to a band of frequencies in the range, for example, 0-350 cycles per second, and switch 15, a continuous noise voltage at the input of counter 12. With a pass band in filter 14 of 350 cycles, approximately two hundred positive-going axis crossings are produced per second. This number is sufficient to cause a low peak-factor wave at the receiver to sound ostensibly like noise. By suitably adjusting the output level of noise generator 13, and if desired, by appropriate use of amplifying and attenuating elements (not shown), the magnitude of speech currents passed via filter 11 to the zero crossing counter 12 may be made considerably greater than the magnitude of the noise currents that stimulate a response of counter 12. For example, speech currents of approximately 10 volts peak-to-peak and noise currents of 1/50 volt peak-to-peak are values typically encountered in pitch measuring circuits. With added noise from generator 13, somewhat better maintenance of a constant ratio is obtained; one that insures that the threshold of the particular axis counting apparatus is exceeded both for voiced and unvoiced speech currents.
By employing a speech current to noise current ratio of the value given above, the slight shift in the position of zeros of speech currents (during voiced speech) due to the simultaneous presence of noise (and zeros associated with the noise) is held to a minimum. That is to say, during voiced speech, the zeros marked by counter 12 are slightly shifted in time by the added noise, but because of the wide difference in amplitudes of the two signals, the shift is virtually unnoticeable by a subscriber at the receiver terminal. Actually this shift is present in most pitch measuring apparatus due to random circuit noise. A single trip multivibrator 16 or the like is used to reshape the pulses formed by axis crossing counter 12, and a low pass filter 17, proportioned to pass components in the frequency range 0-25 cycles per second, is connected in the path to develop a slowly varying direct current whose magnitude is proportional to the fundamental frequency of voiced speech and to a random but positive function during unvoiced speech. A signal of this sort is used at the synthesizer to control the generation of an excitation wave suitable for the synthesis of both voiced and unvoiced speech.
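The axis-crossing measurement described above can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the circuit of FIG. 1: it marks positive-going zero crossings of a sampled wave and averages their rate over the whole window, which stands in loosely for the smoothing of filter 17. The sample rate, the 120 cycle per second test tone, and the function names are hypothetical.

```python
import math

SAMPLE_RATE = 8000  # samples per second; hypothetical value


def positive_going_crossings(samples):
    """Indices at which the wave passes from non-positive to positive."""
    return [i for i in range(1, len(samples)) if samples[i - 1] <= 0.0 < samples[i]]


def crossing_rate(samples, sample_rate):
    """Average number of positive-going zero crossings per second."""
    return len(positive_going_crossings(samples)) * sample_rate / len(samples)


# One second of a steady tone inside the 50-250 cycle band passed by filter 11.
tone = [math.sin(2 * math.pi * 120 * n / SAMPLE_RATE + 0.3) for n in range(SAMPLE_RATE)]
rate = crossing_rate(tone, SAMPLE_RATE)
print("estimated fundamental (cycles per second):", rate)
```

For a steady periodic input the crossing rate settles at the fundamental frequency; for a noise input the same count varies irregularly from one interval to the next, which is the property the period control signal relies on.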
The development of the period control signal may be more fully understood from a consideration of FIG. 2, wherein the curve A shows a representative speech signal for a voiced speech sound at the left-hand portion of the figure and for unvoiced, fricative speech at the right-hand portion. It will be observed that voiced speech is a highly periodic signal of average complexity, whereas unvoiced speech is an aperiodic signal of considerable complexity. Curve B shows a low frequency wave that results from passing the signal of curve A through a low pass filter (11). It faithfully preserves an indication of the fundamental presence of the wave of curve A but is considerably less complex, at least in the voiced speech portion. It remains quite complex in the unvoiced portion even though limited in frequency by the filter. Curve C shows the signals generated by axis crossing counter 12 that mark each positive-going excursion of the wave of curve B through the zero axis, and curve D shows the narrow band wave (approximately 0-25 cycles per second) that results from passing the marker pulses of curve C through a low pass filter (17). It will be observed that the wave is relatively smooth and quasi-periodic in the voiced interval, and fairly erratic in the unvoiced interval. It is virtually a constant D.C. for prolonged vowel sounds such as "ah" and a highly irregular signal for fricatives such as a "sh" sound. The signal represented by curve D is used in accordance with the invention for the control of an excitation generator at the synthesizer; it is accordingly transmitted by any desired means, indicated by conductor 18-19, to the receiver station.
Spectrum analyzer 20 (once again considering the apparatus of FIG. 1) comprises a bank of band pass filters 21 connected in parallel, each of which is proportioned to pass a pre-assigned sub-band of the voice frequency band of interest, while together passing the entire band. For the sake of illustration, ten such filters are indicated, the first two and the last one being shown. Each filter is followed by a detector 22 which is in turn followed by a low pass filter 23. The control current output of each of these several low pass filters 23 is thus a measure of the voice energy in that sub-band to which the corresponding low pass filter is connected.
These spectrum control currents are transmitted by any desired means, indicated by conductors 24 and 25, to a receiver station that includes the conventional synthesizing apparatus shown in the broken line box 30 at the lower right-hand portion of the figure. Synthesizer 30 comprises a number of filters 31 having their output terminals connected in parallel to a reproducer 32. The several filters are proportioned to exhibit transmission characteristics like those of the several analyzing filters 21. Modulation networks 33 are connected between conductors and filters 31. Each modulator 33 is supplied by way of conductor 34 with locally generated excitation energy, while its transmission characteristic is modulated by the narrow band currents derived at the analyzer station and transmitted over the intervening channels 24-25. Except for the exact nature of the excitation energy supplied to the several modulators, the spectrum reconstruction apparatus is conventional.
Turning now to the receiver apparatus responsible for the generation of the synthesizer excitation energy (shown in the upper right-hand portion of FIG. 1), period control information generated at the transmitter (curve D of FIG. 2) is supplied by way of conductor 19 to a voltage sensitive impulse generator 35. Such impulse generators are well known in the art and need no detailed explanation here; typically a free running multivibrator tuned by the control signal may be used. Suffice it to say that a train of uniform amplitude pulses is produced by the generator whose repetition rate is a function of the magnitude of the applied signal. A train of such pulses is shown by way of example in curve E of FIG. 2: during voiced speech portions, a substantially periodic train is produced whose period corresponds to the fundamental period of the original speech; during unvoiced speech portions an aperiodic train is produced.
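The behavior of a voltage sensitive impulse generator can be imitated in a few lines. The following is a minimal sketch, assuming a sampled control signal and a simple phase accumulator in place of the free running multivibrator; the function name, the `hz_per_unit` scale factor, and the sample rate are hypothetical, and the sketch assumes the pulse rate stays below the sample rate.

```python
def impulse_times(control, sample_rate, hz_per_unit=1.0):
    """Return pulse instants in seconds for a sampled control signal.

    A phase accumulator advances by (pulse rate / sample rate) at each sample
    and emits one pulse each time it completes a whole cycle, so the
    repetition rate follows the magnitude of the control signal.
    """
    phase = 0.0
    times = []
    for n, level in enumerate(control):
        phase += max(level, 0.0) * hz_per_unit / sample_rate
        if phase >= 1.0:
            phase -= 1.0
            times.append(n / sample_rate)
    return times


# A constant control level, as for a prolonged vowel, gives a periodic train.
pulses = impulse_times([100.0] * 8000, sample_rate=8000)
intervals = [b - a for a, b in zip(pulses, pulses[1:])]
print(len(pulses), "pulses; first intervals:", intervals[:3])
```

Feeding the same function an erratic control signal yields irregular pulse spacing, corresponding to the aperiodic train described for unvoiced intervals.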
This train is used, in accordance with the invention, to control the period length of a locally generated low peak-factor signal that is used in place of the conventional hiss and buzz signals. Thus the switching operation as between these two sources is completely dispensed with.
FIG. 3 shows, by way of example, two waves; both contain 31 harmonics of substantially equal amplitude but the wave of curve A has a high peak-factor; i.e., it represents an impulse typical of those that form, in sufficient numbers and of the appropriate frequency, the buzz excitation of conventional synthesizers. FIG. 3B, on the other hand, shows a low peak-factor wave that is used in the present invention both for hiss and for buzz excitation. Typically, its peak factor is approximately 4, or 3 to 4 times smaller than that of the band limited impulse of FIG. 3A.
Another important property of the low peak-factor wave is turned to account in the present invention, namely, that variations in the fundamental period of such a wave radically alter the sound of the wave. In particular, if the fundamental period of a low peak-factor wave is varied at an essentially periodic rate, the wave sounds like speech that is slightly inflected. If, to the contrary, the fundamental period is varied in a random fashion, the wave sounds respectably like Gaussian noise. Accordingly, the low peak-factor wave is eminently suitable both for periodic and aperiodic excitation. In contrast, the wave form of FIG. 3A that has an identical amplitude spectrum but a flat phase spectrum, i.e., a band limited impulse of high peak-factor, does not at all sound like a Gaussian noise when the fundamental period is similarly jittered at a random rate.
It occasionally happens, of course, that the train of pulses produced during voiced speech is slightly erratic, occasioned, for example, by a sudden change in the inflection rate of voiced speech. For these cases, the reconstructed voiced speech sounds slightly noise-like, but nevertheless, this occasional slight degradation is exchanged for a complete avoidance of all voiced-unvoiced decision errors and switching apparatus.
It is thus evident that the period control signal developed by impulse generator 35, i.e., the train of pulses shown in FIG. 2E, has the necessary form to control the period of a locally generated low peak-factor wave. In voiced speech periods, a virtually periodic train of pulses yields a slightly inflected low peak-factor wave that repeats at the fundamental period of the original speech. It is shaped in modulators 33, under control of the several spectrum control signals, to form reconstructions of the corresponding sub-bands of voice frequency information. In unvoiced speech intervals, the random train of pulses jitters the low peak-factor wave so that it resembles the form of Gaussian noise, and under control of the spectrum control signals reconstructs fricative speech. The train of pulses from impulse generator 35 is thereupon applied to the control input of wave form generator 36 whose output is the appropriate excitation signal.
A low peak-factor wave suitable for use in the practice of the invention may be generated in a number of ways. For example, the train of control pulses derived from impulse generator 35 may be passed through a passive pulse stretching all-pass filter. FIG. 4 shows a suitable circuit configuration. All-pass filter 40 may be a chain of all-pass sections of the type described by S. Darlington in 29 Bell System Technical Journal 94-104, January, 1950. It preferably has a linearly increasing delay characteristic and a quadratic phase characteristic as shown in the curves of FIG. 5. Such a network characteristic effectively transforms a sharp impulse into a low peak-factor wave. If the delay τ_max at the highest frequency of the speech band, e.g., 3000 cycles per second, is selected to be equal approximately to the reciprocal of the smallest fundamental frequency f_0(min), i.e., to be 1/f_0(min), then the highest frequency is delayed about one pitch period. Thus the applied impulse will be stretched effectively to fill even the longest pitch period interval.
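The effect of a linearly increasing delay can be checked numerically. The following is a minimal sketch, not a model of the Darlington sections: it assumes an ideal all-pass response whose delay rises linearly from zero to τ_max at 3000 cycles per second, integrates that delay to obtain the quadratic phase, and applies the phase to the harmonics of a periodic pulse train (with the D.C. term omitted). The 100 cycle per second value for f_0(min), the sample count, and all names are hypothetical.

```python
import math

F_MAX = 3000.0          # highest speech-band frequency named in the text
F0_MIN = 100.0          # lowest fundamental; a hypothetical value
TAU_MAX = 1.0 / F0_MIN  # delay at F_MAX, following tau_max = 1/f_0(min)


def allpass_phase(freq):
    """Phase in radians of an ideal network with linearly increasing delay."""
    # delay(f) = TAU_MAX * f / F_MAX, and phase = -2*pi * integral of delay df,
    # which gives a phase that is quadratic in frequency.
    return -math.pi * TAU_MAX * freq * freq / F_MAX


def one_period(f0, phase_fn, samples=512):
    """One period of a pulse train at f0 after the given phase response."""
    n_harm = int(F_MAX // f0)
    return [
        sum(math.cos(2 * math.pi * k * i / samples + phase_fn(k * f0))
            for k in range(1, n_harm + 1))
        for i in range(samples)
    ]


def peak_factor(wave):
    """Peak-to-peak range divided by r.m.s. value."""
    rms = math.sqrt(sum(v * v for v in wave) / len(wave))
    return (max(wave) - min(wave)) / rms


pf_impulse = peak_factor(one_period(F0_MIN, lambda f: 0.0))
pf_stretched = peak_factor(one_period(F0_MIN, allpass_phase))
print("impulse train peak factor:  ", pf_impulse)
print("stretched train peak factor:", pf_stretched)
```

With these assumed values the phase of harmonic k at the lowest fundamental works out to -πk²/30, so the energy of each impulse is spread across the whole pitch period instead of being concentrated at one instant.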
FIG. 2F illustrates the excitation signal derived from the train of period control pulses shown in curve 2E. For each applied pulse, there is produced an interval of low peak-factor signal information. In the voiced interval, the excitation wave virtually fills all of the period gaps. In the unvoiced interval, considerable overlap occurs that aids in imparting to the signal the character of noise.
A preferred form of excitation wave form generator in accordance with the invention is shown in FIG. 6. By its use, the inexactness of period match of the former apparatus is completely avoided. By its use, a low peak-factor wave, e.g., that of FIG. 3B, is stretched and compressed in time as required exactly to fill the gap between successive control pulses. To visualize this, it is helpful to think of the wave of FIG. 3B as being printed on a slightly stretched piece of elastic material. As required, the elastic may be further stretched to increase the period length without, however, radically altering the wave shape, or allowed to contract below the bias point at which printing took place, to decrease the period length.
Turning now to FIG. 6, the low peak-factor generator comprises a source of direct current energy, e.g., a battery 60, and a plurality of potentiometers 61 connected across the battery. Selected potentiometers are connected between the positive terminal 62 and an intermediate terminal, e.g., ground, and other selected potentiometers are connected between the negative terminal 63 and ground. The exact polarities and magnitudes of the several potentials connected to the commutator 65 may be found as follows. The magnitudes of the potentials are obtained by sampling at the Nyquist rate a wave form consisting of, for example, 31 cosine waves of equal amplitude but random phases. One particular choice of polarities which has been found to give an extremely low peak-factor wave form consisting of 31 cosine harmonics of equal magnitudes is as follows:
where N is the total number of harmonics.
The adjusted potentials available at the potentiometer arms are scanned in order, for example, by an electronic commutator 65, and the resulting series of positive and negative pulses of selected magnitudes and selected order are passed through a low pass filter 66, proportioned to pass frequencies in the range of 0-3000 cycles per second, to produce a wave essentially of the form shown in FIG. 2G. For a wave with 31 harmonics, 64 individual pulses are needed to form the wave. Since the wave may be, as a matter of convenience, symmetrical about the fundamental, only 32 potentiometers are required. Scanning takes place from 0-31 and without pause from 31-0. That is to say, for symmetric wave forms the number of independent samples is only one-half the number required to generate the general wave form of the same bandwidth.
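The halving of the number of potentiometers follows from the symmetry of a sum of cosines. The following is a minimal sketch under stated assumptions: the polarities are drawn at random as placeholders (the particular choice given in the specification is not reproduced here), and each commutator step is assumed to sample the wave at the centre of its interval, which is what makes a forward scan of 32 taps followed by a reverse scan reproduce all 64 samples. All names are hypothetical.

```python
import math
import random

N_HARMONICS = 31
N_STEPS = 64  # commutator steps per period, from the text

rng = random.Random(1)  # arbitrary seed
# Placeholder polarities (+1 or -1); NOT the choice given in the specification.
polarity = [rng.choice((1.0, -1.0)) for _ in range(N_HARMONICS)]


def tap_voltage(step):
    """Sample of the cosine-harmonic sum at the centre of one commutator step."""
    t = (step + 0.5) / N_STEPS  # the half-step offset is an assumption
    return sum(s * math.cos(2 * math.pi * (k + 1) * t) for k, s in enumerate(polarity))


full_scan = [tap_voltage(i) for i in range(N_STEPS)]  # all 64 samples
pots = full_scan[: N_STEPS // 2]                      # 32 independent settings
mirrored = pots + pots[::-1]                          # scan 0-31, then 31-0

worst = max(abs(a - b) for a, b in zip(full_scan, mirrored))
print("potentiometers needed:", len(pots))
print("largest mismatch between full and mirrored scan:", worst)
```

Because every component is a cosine with polarity +1 or -1, the wave is even about the start of the period, so the second half of the scan is the first half in reverse order.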
Electronic commutators are well known in the art; they find their counterpart in electromechanical commutators, one of which may also be used in the practice of the invention if desired. In either case, the commutator is stepped along at a rate proportional to the frequency of the period control pulses generated in impulse generator 35. If the derived wave form is to have a fundamental and 31 harmonics, 64 separate pulses must be generated in the interval between successive control pulses. The commutator is thus stepped along in 64 equal steps for each control pulse. In practice, the voltage controlled impulse oscillator 35 is adjusted to operate at a frequency 64 times greater than the fundamental frequency. Typically, a free running multivibrator is used, in which the natural frequency of the multivibrator, tuned by the slowly varying control signal (FIG. 2D) from the analyzer, is adjusted to operate at a rate 64 times that of the fundamental frequency indicated by the applied tuning voltage. An adjustment of the time-constant of the multivibrator is all that is required.
Since successive groups of 64 pulses each from impulse generator 35 are thus keyed to the fundamental frequency of the speech wave during voiced intervals and to a relatively random frequency during unvoiced intervals, the commutator 65 is stepped along at the proper rate to scan 64 individual taps in the required period and to fill unfailingly the period with a (the same) low peak-factor wave. As pointed out above, regular repetitions of the low peak-factor wave are a suitable buzz excitation source, whereas random jittered repetitions of the low peak-factor wave are a suitable hiss excitation source.
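The relation between the stepping rate and the pitch period is simple arithmetic, sketched below. This is an illustration only: the tap voltages are placeholders, and the function names and the two example fundamentals are hypothetical. It shows that the same 64 tap values are read out in either case and only the time scale changes, in the manner of the elastic analogy used earlier.

```python
N_STEPS = 64  # commutator steps per period, from the text


def step_rate(fundamental_hz):
    """Oscillator rate at which one full scan spans exactly one pitch period."""
    return N_STEPS * fundamental_hz


def scan(taps, fundamental_hz, start=0.0):
    """Return (time, voltage) pairs for one scan of the taps at the given pitch."""
    dt = 1.0 / step_rate(fundamental_hz)
    return [(start + i * dt, v) for i, v in enumerate(taps)]


taps = [float(i % 4) for i in range(N_STEPS)]  # placeholder tap voltages
slow = scan(taps, 100.0)  # 6400 steps per second, 10 ms per scan
fast = scan(taps, 200.0)  # 12800 steps per second, 5 ms per scan

print("step rate at 100 cycles per second:", step_rate(100.0))
print("last step time, slow vs fast:", slow[-1][0], fast[-1][0])
```

Doubling the fundamental halves every step time while leaving the sequence of voltages unchanged.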
Thus, in voiced speech intervals, the excitation signal retains the periodic nature of the original speech and yields at the output of synthesizer 30 a reconstruction of the voiced speech sound. A typical reconstruction is shown in the left-hand portion of FIG. 2G. In unvoiced speech intervals, the random sequence of pulses from voltage controlled oscillator 35 similarly steps the commutator along to scan the samples, but the resultant, illustrated at the right-hand portion of curve 2G, is subjectively (psychoacoustically) virtually indistinguishable from Gaussian noise. When shaped by the spectrum control signals in synthesizer 30, it yields a reconstruction of fricative speech.
While two specific embodiments of the invention have been selected for detailed description, the invention is not, of course, limited in its application to the embodiments described. Thus, although the commutator method of excitation generation described in connection with FIG. 6 specifies that the phase spectrum of the output is invariant when referred to harmonic numbers rather than to an absolute frequency scale, the all-pass filter system for excitation generation obtains satisfactory results with an invariant phase spectrum on a fixed frequency scale. In general, either approach is satisfactory, and falls within the scope of the present invention. As a variant of the all-pass filter approach, illustrated in FIGS. 4 and 5, since the synthesizer 30 of FIG. 1 is in itself a (slowly time-varying) linear filter, it follows that all-pass filter 40 may be positioned between the common conductor connecting the outputs of filters 31 and the reproducer 32, equally as well as between oscillator 35 and synthesizer 30 as indicated in FIG. 1. Actually, in practice, this implementation of the all-pass system is somewhat to be preferred because the modulators 33 are somewhat easier to design, e.g., using transistors, if clean pulses from oscillator 35 (FIG. 2E) are used to operate them instead of the complex wave of FIG. 2F.
What is claimed is:
1. In a system for the artificial reconstruction of speech, means for generating a complex wave composed of a plurality of different harmonic components whose phases are selected to establish a low peak-factor for said complex wave, means for continuously adjusting the fundamental period of said complex wave in accordance with the fundamental period of a speech wave during voiced speech intervals and in accordance with a random schedule during unvoiced speech intervals, a spectrum synthesizer under control of a plurality of spectrum control signals each representative of the speech energy falling within a plurality of frequency sub-bands of said speech wave, means for shaping said complex wave individually in each of a number of parallel paths in accordance with each of said spectrum control signals, respectively, and means for combining all of said complex waves as shaped in said parallel paths to form a reconstruction of said speech wave.
2. In a system for the artificial production of speech sounds, the combination which comprises: a speech analyzer station having, means for deriving from a speech signal a control signal that indicates the fundamental period of voiced speech signals and indicates unvoiced speech signals by aperiodic pulses, and means for deriving a plurality of spectrum control signals; a reproducer station having, means for locally generating a complex wave characterized by a number of strong harmonic components of random phases and a low peak-factor as an excitation signal, and a spectrum synthesizer; means for transmitting all of said control signals to said reproducer station; means for continuously adjusting the period of said complex wave under control of said control signal; means for applying said adjusted complex wave to said spectrum synthesizer; and means for controlling said spectrum synthesizer under control of said spectrum control signals.
3. An excitation generator for a speech synthesizer comprising, an impulse oscillator whose repetition rate is a function of the magnitude of an applied voltage, means for applying a pitch period signal to said oscillator to control its repetition rate, said pitch period signal comprising relatively constant voltage portions whose smooth variations are an indication of the fundamental periods of voiced speech signals and relatively erratic portions whose substantial variations are indications of aperiodic intervals of unvoiced speech signals, and means responsive to impulses from said oscillator for generating a wave whose period varies in synchrony with the repetition rate of said oscillator and whose peak-to-peak range as compared with its root mean square value is low, whereby the period of said low peak-factor wave varies in accordance with the fundamental period of voiced speech signals and in accordance with aperiodic intervals representing unvoiced speech signals.
4. An excitation generator for a speech synthesizer comprising, means for converting pitch period signals into sequences of pulses whose repetition rates are (1) proportional to the fundamental period of voiced speech signals and are (2) aperiodic for unvoiced speech signals, a linear all-pass network, said network having a substantially non-linear phase versus frequency characteristic and a linear delay versus frequency characteristic, means for connecting said all-pass network to receive said sequences of pulses, and means for utilizing signals passed through said all-pass network as excitation for the generation of synthetic speech.
5. An excitation generator for a speech synthesizer comprising, means for converting pitch period signals into sequences of pulses whose repetition rates are (1) proportional to the fundamental period of voiced speech signals and are (2) aperiodic for unvoiced speech signals, a spectrum synthesizer under control of a plurality of spectrum control signals each representative of the speech energy falling within a plurality of frequency sub-bands of said speech signal, means for shaping said sequences of pulses individually in accordance with each of said spectrum control signals, means for combining all of said shaped sequences of pulses, a linear filter, said filter having a substantially non-linear phase versus frequency characteristic and a linear delay versus frequency characteristic, and means for passing said combined sequences of pulses through said filter to form a reconstruction of said speech signal.
6. Apparatus for generating hiss and buzz excitation as required for the construction of artificial speech comprising, means for generating a complex wave composed of a plurality of different harmonic components of substantially equal magnitudes and random phases and characterized by a low peak-factor, said generating means comprising means for sampling in sequence a plurality of voltage sources of preselected polarities and magnitudes and means for passing said samples through a low-pass network, and means responsive to the fundamental pitch period of a speech signal during voiced speech intervals and responsive to the erratic periods of a noise signal during unvoiced speech intervals for continuously adjusting said sampling rate whereby a period length of said complex wave is made to correspond to said fundamental pitch period during voiced speech intervals and to erratic periods during unvoiced speech intervals.
7. In artificial speech synthesis apparatus, means for generating a complex wave that is relatively periodic during voiced portions of a speech signal and is relatively aperiodic during unvoiced portions of said speech signal that comprises: means for deriving a signal indicative of the fundamental component of voiced portions of said speech signal, means for deriving a signal indicative of unvoiced portions of said speech signal, a source of constant potential having a positive polarity terminal, a negative polarity terminal, and a terminal intermediate said positive and said negative terminals, a plurality of potentiometers, means for connecting said potentiometers respectively between either said positive terminal or said negative terminal and said intermediate terminal in accordance with a first preselected schedule, means for adjusting the movable tap of each of said potentiometers to supply at each of said taps a potential selected in accordance with a second preselected schedule, a commutator having a plurality of input terminals, a single output terminal, and means for successively connecting said input terminals to said output terminal at a rate established by an applied control signal, means for supplying said selected potentials respectively to the input terminals of said commutator, a low-pass filter, means for connecting said output terminal of said commutator to said low-pass filter, and means for altering said commutation rate in accordance with said signals indicative respectively of the voiced and unvoiced portions of said speech signal.
8. In a system for the artificial production of speech sounds, means for generating a complex wave composed of a plurality of different harmonic components with non-uniform phases characterized by a restricted range of amplitudes, means for generating a control signal indicative of the fundamental period of a speech signal, means for continuously adjusting the fundamental period of said complex wave under control of said period control signal, a source of a plurality of control signals representative, respectively, of the speech energy falling within a plurality of frequency sub-bands of said speech signal, spectrum synthesizer apparatus under control of said plurality of spectrum control signals, and means for applying said complex wave to said synthesizer.
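As an informal illustration of two ideas recited in the claims above, the following Python sketch compares the peak factor (peak-to-peak range divided by root mean square value, as in claim 3) of a harmonic wave with equal phases against one with phases quadratic in harmonic number, and then replays one stored period at commanded period lengths, steady for voiced intervals and erratic for unvoiced ones. Everything specific here is an assumption made for illustration: the phase law -pi*k*k/K (chosen only because a delay that is linear in frequency implies a phase that is quadratic in frequency), the number of harmonics, the table size, the period lengths, and all function names. The low-pass filter that follows the commutator is omitted.

```python
import math
import random


def harmonic_wave(phases, samples_per_period):
    """One period of a sum of equal-amplitude harmonics, one per phase entry."""
    return [
        sum(math.cos(2.0 * math.pi * k * n / samples_per_period + phase)
            for k, phase in enumerate(phases, start=1))
        for n in range(samples_per_period)
    ]


def peak_factor(wave):
    """Peak-to-peak range divided by the root-mean-square value."""
    rms = math.sqrt(sum(v * v for v in wave) / len(wave))
    return (max(wave) - min(wave)) / rms


def read_out(table, period_lengths):
    """Replay the stored period once per requested length (in samples),
    picking table entries in sequence, loosely like a commutator whose
    stepping rate is set by a control signal."""
    out = []
    for length in period_lengths:
        out.extend(table[(n * len(table)) // length] for n in range(length))
    return out


NUM_HARMONICS = 16  # assumed
TABLE_SIZE = 512    # assumed number of stored values per period

equal_phases = [0.0] * NUM_HARMONICS
# Assumed phase law: quadratic in harmonic number k.
quadratic_phases = [-math.pi * k * k / NUM_HARMONICS
                    for k in range(1, NUM_HARMONICS + 1)]

pulse_like = harmonic_wave(equal_phases, TABLE_SIZE)
low_peak = harmonic_wave(quadratic_phases, TABLE_SIZE)

rng = random.Random(0)  # fixed seed so the erratic schedule is repeatable
voiced_periods = [80] * 5                                    # steady pitch period
unvoiced_periods = [rng.randint(40, 120) for _ in range(5)]  # erratic periods
excitation = read_out(low_peak, voiced_periods + unvoiced_periods)

print(f"peak factor, equal phases:     {peak_factor(pulse_like):.2f}")
print(f"peak factor, quadratic phases: {peak_factor(low_peak):.2f}")
print(f"excitation length in samples:  {len(excitation)}")
```

The equal-phase wave concentrates its energy in a narrow pulse once per period, whereas the quadratic-phase wave spreads the same harmonic amplitudes over the whole period, which is the property the claims refer to as a low peak-factor.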
References Cited in the file of this patent
UNITED STATES PATENTS
2,635,146 Steinberg Apr. 14, 1953
2,908,761 Vilbig Oct. 13, 1959
2,928,901 Bogert Mar. 15, 1960
2,928,902 Vilbig Mar. 15, 1960
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US2635146 *||Dec 15, 1949||Apr 14, 1953||Bell Telephone Labor Inc||Speech analyzing and synthesizing communication system|
|US2908761 *||Oct 20, 1954||Oct 13, 1959||Bell Telephone Labor Inc||Voice pitch determination|
|US2928901 *||Apr 13, 1956||Mar 15, 1960||Bell Telephone Labor Inc||Transmission and reconstruction of artificial speech|
|US2928902 *||May 14, 1957||Mar 15, 1960||Friedrich Vilbig||Signal transmission|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US3381093 *||Aug 4, 1965||Apr 30, 1968||Bell Telephone Labor Inc||Speech coding using axis-crossing and amplitude signals|
|US3416080 *||Mar 2, 1965||Dec 10, 1968||Int Standard Electric Corp||Apparatus for the analysis of waveforms|
|US3431355 *||Apr 5, 1965||Mar 4, 1969||Ibm||Device for excitation controlled smoothing of the spectrum-channel signals of a vocoder|
|US3499991 *||Aug 1, 1967||Mar 10, 1970||Philco Ford Corp||Voice-excited vocoder|
|US3553372 *||Oct 18, 1966||Jan 5, 1971||Int Standard Electric Corp||Speech recognition apparatus|
|US3737842 *||Mar 30, 1966||Jun 5, 1973||Us Navy||Feature recognition techniques|
|US3743787 *||Aug 31, 1970||Jul 3, 1973||H Fujisaki||Speech signal transmission systems utilizing a non-linear circuit in the base band channel|
|US3860759 *||Apr 10, 1972||Jan 14, 1975||California Inst Of Techn||Seismic system with data compression|
|US4253374 *||Dec 7, 1978||Mar 3, 1981||Watterman Peter C||Method and apparatus for piano tuning and tempering|
|US4589131 *||Sep 23, 1982||May 13, 1986||Gretag Aktiengesellschaft||Voiced/unvoiced decision using sequential decisions|
|US4815135 *||Jul 9, 1985||Mar 21, 1989||Nec Corporation||Speech signal processor|
|US4937868 *||Jun 9, 1987||Jun 26, 1990||Nec Corporation||Speech analysis-synthesis system using sinusoidal waves|
|US5025221 *||Jun 3, 1981||Jun 18, 1991||Siemens Aktiengesellschaft||Method for measurement of attenuation and distortion by a test object|
|US5471527 *||Dec 2, 1993||Nov 28, 1995||Dsc Communications Corporation||Voice enhancement system and method|
|DE1258910B *||May 24, 1965||Jan 18, 1968||Ibm||Schaltungsanordnung zur Sprachverarbeitung nach dem Kanalvocoderprinzip|
|DE1271203B *||May 24, 1965||Jun 27, 1968||Ibm||Verfahren und Anordnung zur Gewinnung der Anregungsfunktion bei Kanalvocodern|
|U.S. Classification||704/268, 704/208|