Publication number: US 3071652 A (also written US 3071652A, US-A-3071652, US3071652 A, US3071652A)
Publication type: Grant
Publication date: Jan 1, 1963
Filing date: May 8, 1959
Priority date: May 8, 1959
Inventors: Manfred R. Schroeder
Original Assignee: Bell Telephone Laboratories, Inc.
Time domain vocoder
US 3071652 A
Description

United States Patent Office
3,071,652
TIME DOMAIN VOCODER
Manfred R. Schroeder, Murray Hill, N.J., assignor to Bell Telephone Laboratories, Incorporated, New York, N.Y., a corporation of New York
Filed May 8, 1959, Ser. No. 812,028
11 Claims. (Cl. 179-15.55)

This invention relates in general to the modification of signals to facilitate their transmission, and particularly to the reduction of their information rates and hence to the compression of their frequency bands. Its principal specific object is to compress the band of frequencies occupied by a telephone message wave. A more general object is to apply new principles to the band compression or other modification of a message wave.

One well known approach to the band compression problem, exemplified by Dudley Patent 2,151,091, March 21, 1939, is to divide the entire frequency band occupied by a complex message wave, e.g., a speech wave, into a number of contiguous constituent subbands, to determine the energy in each such subband, and to derive, for each subband, a control signal whose magnitude represents the subband energy. The analysis is performed with a bank of filters to all of which the speech wave is applied in parallel, while a rectifier connected to the output terminal of each filter delivers a signal representative of the energy passing through the filter. The resulting low frequency control signals, after transmission to a receiver station, control the synthesis of artificial speech.

The analysis carried out by the apparatus of the Dudley patent is essentially an analysis according to Fourier's theorem: It postulates that the speech wave may be represented, to whatever degree of precision may be required, by a harmonic series of components, each of which, by itself, is a pure sinusoid, and thus orthogonal to each of the others. To the extent that this postulate is untrue of the speech or other complex wave undergoing analysis, to that extent the control signals derived by the apparatus fail accurately to represent the original speech wave, and the final synthetic speech fails to duplicate it.

A further difficulty with any frequency-domain approach such as that of the Dudley patent is inherent in the characteristics of the filters that effect the breakdown of the voice into its constituent subbands. Both for physical realizability and to avoid excessive delays each of these filters must have a passband that is by no means negligible: it should occupy a fraction of 1/10 to 1&0 of the entire speech frequency band. Now, while the responses of the several filters of such a group are correct and satisfactory when the several components of the applied speech wave are centered in the several filter passbands, phase considerations make for unsatisfactory performance when the speech wave components lie at or close to the crossover points between adjacent subbands. Performance is still more unsatisfactory while either condition is in the course of changing to the other.

To avoid the difficulties inherent in the frequency-domain approach to voice analysis, both those of principle and those of instrumentation, various proposals have been made to carry out the analysis in the time domain without resort to filters. Both autocorrelation and cross-correlation techniques have been proposed. The autocorrelation technique suffers from the disadvantage that the autocorrelation function of a speech wave is inherently of a quadratic character. Unavoidably, this emphasizes components of large amplitudes at the expense of components of smaller amplitudes and thus, unless this quadratic character be removed, makes for distortion in the reconstructed speech. To remove it, however, presents serious difficulties.


To escape the quadratic distortion imposed by the autocorrelation approach it has already been proposed, notably by W. H. Huggins in "A Note on Autocorrelation Analysis of Speech Sounds," published in the Journal of the Acoustical Society of America for September 1954, vol. 26, page 790, to carry out a cross-correlation between the speech wave and a new wave consisting of pulses which have amplitudes that are all equal and independent of the original speech intensity but whose epochs are synchronized with the laryngeal pulses creating the original speech. Huggins suggests that the outcome of these operations will be a wave that is free of the foregoing quadratic distortion. Such a wave might then be utilized to control the synthesis of artificial speech through appropriate instrumentation not suggested by Huggins.

To derive the pulse train recommended by Huggins presents serious difficulties. It demands in principle that the commencement of each period of the voice wave be unambiguously identified and that a pulse be generated to mark its inception. This is a decision-making process which it is by no means always possible to carry out with certainty; and if the pulses occur at the wrong instants on the time scale, the result is serious distortion in the reconstructed speech.

Accordingly, it is a principal object of the invention to analyze a speech wave by carrying out a cross-correlation operation of the speech wave with a special reference wave that has two principal properties. First, and in order that the resulting cross-correlation signals shall be first order counterparts of the voice wave in contrast to quadratic counterparts, the reference wave is one of which the frequency spectrum is relatively flat; that is to say its variations with frequency are small compared with those of the spectrum of the speech wave itself. Second, and to avoid the decision-making process inherent in the generation of the pulse train of the Huggins monograph, a reference wave is chosen that is coherent with all, or at least most, of the periodicities of the voice wave, notably the periodicities of its principal formants and its fundamental or pitch period.

Many possibilities arise for a reference wave having these properties. A first one is the clipped counterpart of the speech wave itself or of a derivative of the speech wave. The clipped wave may be employed without further processing but, advantageously, the reference wave is itself the first derivative of this clipped wave, preferably after half wave rectification. Another possibility is a train of samples of the speech wave, each taken at the instant at which a peak of the speech wave occurs. Still another possibility is a train of pulses of uniform amplitudes occurring at the successive peaks of the speech wave, at its successive zero crossings, or at successive points of the time scale that mark some unambiguously identifiable property of the speech wave.

Once a reference wave having the required properties of spectral flatness and coherence with the speech wave has been selected, it may be generated or derived from the speech wave itself in any desired fashion by straightforward instrumentation in dependence on its other properties. Once the reference wave is thus generated, the cross-correlation operation between it and the original speech wave may be carried out in various ways, several such being described below in detail. The result is a group of control signals that are together representative of the cross-correlation.

These control signals may be transmitted to a receiver station where, because they are representative of the speech wave to the first order and without the quadratic distortion feature inherent in the autocorrelation analysis, they may be utilized directly in controlling the synthesis of artificial speech. Because the individual control signals of this group vary only at syllabic rates, the transmission requires a much narrower frequency band than does the transmission of the original speech.

The present invention envisions a further compression of the required transmission band, and that is obtained in the following fashion. The control signals, taken together, determine all of the frequency components required for the reconstruction of the original speech wave but contain little information as to their phases. Now it has long been an established fact that the human ear is entirely insensitive to small phase shifts among the components which together make up a complex sound and, indeed, that large phase shifts, provided they are not excessive, do not affect intelligibility. Hence, an artificial wave having the correct frequency components but bearing little resemblance in form to the original wave is indistinguishable, by the ear, from the original wave itself. One such artificial wave is that in which the phases of the several components have been so shifted that the wave itself is a symmetrical one. Such symmetry means that the second half of the wave is a mirror image of the first half. Accordingly, the invention provides for the development and transmission of the cross-correlation control signals for only one half of each speech wave period and for the generation, locally at the receiver station, of an artificial wave of which the first half is under control of the transmitted cross-correlation signals while the second half is generated without benefit of additional transmitted information and simply by a repetition of the first half on an inverted time scale. This time scale inversion is conveniently carried out with the aid of a wave propagation device or delay line that is terminated at one end for complete reflection without change of phase.
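The mirror-image construction described above can be sketched in a few lines. This is only an illustration on sampled data, not the reflecting delay line of the patent; the function name `mirror_period` and the sample values are hypothetical.

```python
def mirror_period(first_half):
    """Build one full period: the first half followed by its time-reversed copy."""
    first_half = list(first_half)
    return first_half + first_half[::-1]

# Hypothetical samples standing in for the transmitted half period.
half = [0.0, 0.9, 0.4, -0.2]
period = mirror_period(half)
```

Only the `half` list would need to be transmitted; the receiver produces the second half locally, so the full period reads the same forwards and backwards.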

The invention will be fully apprehended from the following detailed description of illustrative embodiments thereof taken in connection with the appended drawings in which:

FIG. 1 is a graph of the absolute magnitude $|x|$ of a quantity x plotted against its algebraic magnitude;

FIG. 2 is a schematic block diagram showing apparatus for analyzing a speech wave to derive cross-correlation control signals;

FIG. 3 is a schematic block diagram showing apparatus alternative to that of FIG. 2;

FIG. 4 is a schematic block diagram of apparatus for developing cross-correlation control signals in approximate form;

FIG. 5 is a schematic block diagram showing apparatus for synthesizing a wave from received cross-correlation control signals;

FIG. 6 is a waveform diagram showing three consecutive periods of a typical speech wave;

FIG. 7 is a graph showing the cross-correlation function of a speech wave with a reference wave selected to endow the correlation function with even symmetry;

FIG. 8 is a diagram of assistance in explaining the operation of the apparatus of FIG. 5; and

FIG. 9 shows two consecutive periods of a synthetic wave developed by the apparatus of FIG. 5.

Before entering upon a detailed description of the drawings it is desirable to discuss certain mathematical relations, some of which are instrumented by the apparatus shown in the drawings.

In correlation analysis a signal f(t) to be analyzed is compared, for each of a number of different values of lag $\tau$ by which it is delayed, either with a reference signal or with an undelayed version of the original signal. In particular, the cross-correlation $\varphi_{12}$ of two signals $f_1(t)$ and $f_2(t)$ is given, for any particular value of $\tau$, by

$$\varphi_{12}(\tau, t) = \frac{1}{T}\int_{t-T/2}^{t+T/2} f_1(t')\, f_2(t'-\tau)\, dt' \tag{1}$$

Autocorrelation $\varphi_{11}$ is similar, with the sole difference that one signal is a delayed version of the other. Thus,

$$\varphi_{11}(\tau, t) = \frac{1}{T}\int_{t-T/2}^{t+T/2} f(t')\, f(t'-\tau)\, dt' \tag{2}$$

In these expressions the integration extends over an interval T symmetrically disposed about the time t.

When the integration interval T is infinite, each of the foregoing expressions is independent of the time t and depends on the lag $\tau$ only. When, to the contrary, the interval T is not infinite, each of these expressions depends on the time t as indicated. When the time dependence is significant, the cross-correlation of (1) and the autocorrelation of (2) are of the so-called short term variety.
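As a rough discrete-time illustration of the short term correlation just defined, the integral over the interval T can be replaced by an average over a window of samples centred on the time of interest. The function name, the sampling rate, and the test tone below are assumptions made for this sketch only.

```python
import math

def short_term_xcorr(f1, f2, lag, center, window):
    """Average of f1[k] * f2[k - lag] over about `window` samples centred on `center`."""
    half = window // 2
    total, count = 0.0, 0
    for k in range(center - half, center + half + 1):
        if 0 <= k < len(f1) and 0 <= k - lag < len(f2):
            total += f1[k] * f2[k - lag]
            count += 1
    return total / count if count else 0.0

# Assumed test signal: a 100-cycle tone sampled 8000 times per second.
fs, f0 = 8000, 100
s = [math.sin(2 * math.pi * f0 * n / fs) for n in range(800)]
period = fs // f0                                         # 80 samples per period

r0 = short_term_xcorr(s, s, 0, 400, 160)                  # zero lag
r_half = short_term_xcorr(s, s, period // 2, 400, 160)    # half-period lag
r_full = short_term_xcorr(s, s, period, 400, 160)         # full-period lag
```

Passing the same list twice gives the short term autocorrelation; for the tone it is close to +0.5 at zero lag and at a full-period lag, and close to -0.5 at a half-period lag.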

In what follows we shall deal with short term autocorrelation and cross-correlation, in contrast to the long term correlation functions. For simplicity of notation dependence of the correlation functions on the time t is to be understood.

Further, for simplicity of notation and since each of the foregoing integrals represents a time average, the simpler notation for such time average, namely a superposed bar, will be employed.

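With this understanding the autocorrelation given by Equation 2 becomes, for a message wave s(t),

$$\varphi(\tau) = \overline{s(t)\, s(t-\tau)} \tag{3}$$

This expression bears a quadratic relation to the wave s(t). This quadratic character can be emphasized by reformulating Equation 3 in accordance with the well known relation

$$ab = \frac{(a+b)^2 - (a-b)^2}{4} \tag{4}$$

Thus,

$$\varphi(\tau) = \frac{\overline{[s(t)+s(t-\tau)]^2} - \overline{[s(t)-s(t-\tau)]^2}}{4} \tag{5}$$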

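The quadratic character of the autocorrelation function as given in (3) or (5) makes for certain difficulties in the instrumentation of this function for wave analysis. Consider, however, two further modifications: First, the replacement of each of the two squared terms in the numerator of (5) by its absolute value. This leads to

$$\frac{\overline{|s(t)+s(t-\tau)|} - \overline{|s(t)-s(t-\tau)|}}{4} \tag{6}$$

Second, the attenuation of the delayed wave by a factor $\varepsilon$, with the denominator changed to $2\varepsilon$ to match. This leads to

$$\lambda_\varepsilon(\tau) = \frac{\overline{|s(t)+\varepsilon s(t-\tau)|} - \overline{|s(t)-\varepsilon s(t-\tau)|}}{2\varepsilon} \tag{7}$$

wherein the substitution of the symbol $\lambda_\varepsilon$ for the symbol $\varphi$ embraces all of these changes together, and indicates that it depends on the magnitude of $\varepsilon$.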

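In the limit, as $\varepsilon$ approaches zero, the expression on the right remains formally the same but the left-hand side is now independent of the magnitude of $\varepsilon$. Thus,

$$\lambda(\tau) = \lim_{\varepsilon\to 0} \frac{\overline{|s(t)+\varepsilon s(t-\tau)|} - \overline{|s(t)-\varepsilon s(t-\tau)|}}{2\varepsilon} \tag{8}$$

Multiplying numerator and denominator of this expression by the delayed wave $s(t-\tau)$ yields

$$\lambda(\tau) = \overline{s(t-\tau)\,\lim_{\varepsilon\to 0} \frac{|s(t)+\varepsilon s(t-\tau)| - |s(t)-\varepsilon s(t-\tau)|}{2\varepsilon\, s(t-\tau)}} \tag{9}$$

(To divide an expression containing a limit by a finite quantity independent of the limit process inside the limit sign, and to multiply by the same quantity outside the limit sign, is well established to be a justifiable procedure.)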

The limit in the foregoing Expression 9 has the form of a differential quotient. In particular, it is the differential quotient of the function $|x|$ with respect to x at the point $x = s(t)$, for $x \neq 0$. As is immediately apparent from FIG. 1, the differential quotient of $|x|$, for values of x other than zero, is simply the algebraic sign of x.
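The step from the differential quotient to the algebraic sign can be checked directly. In the worked line below, h is a symbol introduced here only for illustration; it stands for the small increment $\varepsilon s(t-\tau)$, and x stands for s(t). For $x \neq 0$ and $0 < |h| < |x|$, both $x+h$ and $x-h$ carry the sign of x, so the absolute values can be removed by factoring out that sign:

```latex
\frac{|x+h| - |x-h|}{2h}
  = \frac{\operatorname{sgn}(x)\,\bigl[(x+h) - (x-h)\bigr]}{2h}
  = \operatorname{sgn}(x),
  \qquad 0 < |h| < |x| .
```

The quotient therefore already equals the sign of x for every sufficiently small increment, and the limit merely confirms that value.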

Reference to the Expression 9 itself shows that, for $x = s(t) = 0$, the entire expression vanishes, thus taking account of the single case excepted above. Thus, defining the sign function

$$\operatorname{sgn} x = \begin{cases} +1 & \text{for } x > 0 \\ 0 & \text{for } x = 0 \\ -1 & \text{for } x < 0 \end{cases} \tag{10}$$

(9) becomes

$$\lambda(\tau) = \overline{s(t-\tau)\,\operatorname{sgn} s(t)} \tag{11}$$

from which it appears that $\lambda(\tau)$ is in fact the cross-correlation of the wave s(t) with its own sign function. (It is, of course, of no importance which of the two factors is delayed with respect to the other.)

But the sign function sgn s(t) of a speech wave s(t) is simply a formal representation of the infinitely clipped counterpart of the same speech wave. Denoting the clipped counterpart of the wave s(t) by clp s(t), (11) becomes

$$\lambda(\tau) = \overline{s(t-\tau)\,\operatorname{clp} s(t)} \tag{12}$$

Inasmuch as the amplitudes of the clipped speech wave clp s(t), or of the sign wave sgn s(t), are always either +1, 0, or -1, its only significant features are its zeros and its spectrum. Its zeros coincide, on the time scale, with those of the original (undelayed) speech wave s(t). Hence, it is fully coherent with all the periodicities of the speech wave, notably those of its fundamental and those of its various formants. As for its spectrum, it is relatively flat on the frequency scale; at any rate, much flatter than the spectrum of s(t) itself. For this reason $\lambda(\tau)$ as defined in (11) or (12) bears a first order spectral resemblance to s(t), and the restrictions imposed by the quadratic character of the autocorrelation function of (3), (5), and (6) have been escaped. Moreover, it contains all the information necessary for the generation, after transmission to a receiver station, of an artificial wave that is fully as intelligible as the original speech wave and differs from the original speech wave in quality only to a minor extent.
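The contrast between the quadratic autocorrelation and the first order cross-correlation with the clipped wave can be seen numerically. The sketch below is an illustration only: the function names, the two-component test wave, and the lag are all assumptions, and the time average is taken over the whole record.

```python
import math

def sgn(x):
    """Sign function: +1, 0 or -1."""
    return (x > 0) - (x < 0)

def autocorrelation(s, lag):
    """Time average of s[n] * s[n - lag]."""
    products = [s[n] * s[n - lag] for n in range(lag, len(s))]
    return sum(products) / len(products)

def sign_correlation(s, lag):
    """Time average of s[n - lag] * sgn(s[n]), the clipped-wave correlation."""
    products = [s[n - lag] * sgn(s[n]) for n in range(lag, len(s))]
    return sum(products) / len(products)

# Assumed test wave and a copy three times larger in amplitude.
wave = [math.sin(0.1 * n) + 0.3 * math.sin(0.37 * n) for n in range(2000)]
loud = [3.0 * v for v in wave]
lag = 5

auto_ratio = autocorrelation(loud, lag) / autocorrelation(wave, lag)
sign_ratio = sign_correlation(loud, lag) / sign_correlation(wave, lag)
```

Tripling the amplitude multiplies the autocorrelation by nine but the clipped-wave correlation only by three, which is the first order behaviour the text relies on.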

It appears, therefore, that what is required, for the satisfactory time scale analysis of a speech wave, is to generate the cross-correlation function between the speech wave itself and a reference wave that (a) is coherent with the speech wave, and (b) has a flat spectrum. The question arises whether many waves other than the clipped wave may not exist which satisfy these requirements and which might, therefore, serve equally well as reference waves to be cross-correlated with the original speech wave. It turns out that there are many such. One such wave is the derivative of the clipped wave. This derivative wave has a positive peak for each rising zero of the clipped speech wave and a negative peak for each falling zero. It is thus coherent with the speech wave to precisely the same extent as is the clipped wave. The brief pulses of which it is composed make for a flat spectrum. At the price of a slight and unimportant reduction in coherence, its spectrum can be rendered flatter still by employment of a half wave rectifier that blocks all the negative pulses. Any reference wave that satisfies the requirements of spectral flatness and speech wave coherence may conveniently be designated c(t), wherefore Equations 11 and 12, thus generalized, become

$$\lambda(\tau) = \overline{s(t-\tau)\, c(t)} \tag{13}$$

At this point it is desirable to distinguish between coarse flatness and fine flatness. The term coarse flatness may be applied to the envelope of a spectrum while fine flatness refers to the density of its components. In the present example, blocking of all the negative pulses, as by a half wave rectifier, reduces the number of pulses occurring in each time interval by a factor 2, and this reduction in the time domain is reflected, in the frequency domain, as an increase in the number of significant spectral components. Thus, given the spectral components of the clipped wave, the spectrum of the half wave rectified pulse train has additional significant components. Thus the spectrum of the half wave rectified pulse train has a greater degree of fine scale flatness than does the spectrum of the clipped wave.
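The two zero-coherent reference waves discussed above, the derivative of the clipped wave and its half wave rectified version, can be sketched on sampled data as follows. This is an illustration under assumed names and an assumed test tone; a first difference stands in for the differentiator.

```python
import math

def clip(x):
    """Infinite clipper: +1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)

def zero_crossing_pulses(s, rising_only=True):
    """Differentiate the clipped wave; optionally keep only the rising-zero pulses."""
    clipped = [clip(v) for v in s]
    diff = [0] + [clipped[n] - clipped[n - 1] for n in range(1, len(clipped))]
    if rising_only:
        return [1 if d > 0 else 0 for d in diff]   # half wave rectified train
    return [clip(d) for d in diff]                 # +1 at rising, -1 at falling zeros

# Assumed test tone: three periods of 80 samples, offset so no sample is exactly zero.
s = [math.sin(2 * math.pi * (n + 0.5) / 80) for n in range(240)]
both = zero_crossing_pulses(s, rising_only=False)
rising = zero_crossing_pulses(s, rising_only=True)
rising_positions = [n for n, p in enumerate(rising) if p]
```

The rectified train keeps only the pulses at the rising zeros, roughly half of those in the unrectified train, and each surviving pulse still marks a zero of the original wave.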

In the case of the clipped speech wave itself and of its derivative, each of these reference waves is coherent with the original speech wave at the zeros of the latter. Presumably, coherence with the speech wave at successive points of the time scale that mark any unambiguously identifiable property of the speech wave would serve equally well. It has been found that a reference wave having a flat spectrum and being coherent with the original speech wave at the instants of its peaks, instead of its zeros, is equally suitable from the present standpoint and more suitable from a different standpoint, to be discussed below. A train of pulses of like amplitudes, each identifying a positive peak of the speech wave, constitutes a reference wave of this character. It may readily be generated by first differentiating the speech wave, thereby to provide a derivative wave of which the zeros coincide on the time scale with the peaks of the original wave, clipping the derivative wave, differentiating the clipped wave and rectifying the differentiated wave. Still better, from the standpoint of fine scale flatness, is to utilize each pulse of such a train to control the operation of a wave amplitude sampler. The output of the sampler is thus a train of pulses that coincide on the time scale with the peaks of the speech wave and of which the amplitudes are proportional to the amplitudes of the speech wave at its successive peaks.
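The peak-coherent chain just described (differentiate, clip, differentiate, rectify, then optionally sample) might be sketched on sampled data as below. The function names and the short test sequence are hypothetical, and first differences stand in for the differentiators.

```python
def peak_pulses(s):
    """Unit pulse at each positive peak: differentiate, clip, detect the falling edge."""
    d = [s[n + 1] - s[n] for n in range(len(s) - 1)]   # derivative wave
    c = [(v > 0) - (v < 0) for v in d]                 # clipped derivative
    pulses = [0] * len(s)
    for n in range(1, len(c)):
        if c[n - 1] > 0 and c[n] <= 0:                 # downward-going discontinuity
            pulses[n] = 1
    return pulses

def sample_at_pulses(s, pulses):
    """Sampler: pass the wave amplitude wherever a control pulse occurs."""
    return [v if p else 0 for v, p in zip(s, pulses)]

# Hypothetical wave with positive peaks of 3 and 5.
s = [0, 1, 3, 2, 0, -1, 0, 2, 5, 4, 1]
pulses = peak_pulses(s)
samples = sample_at_pulses(s, pulses)
```

The `pulses` list corresponds to the train of like amplitudes, and `samples` to the sampler output whose amplitudes follow the successive peaks.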

Still other reference waves having the required properties of spectral flatness and speech wave coherence are possible, some of which will be discussed below.

Referring now to the circuit diagrams, FIG. 2 shows a system for the development of a set of cross-correlation control signals by the instrumentation of Equation 11 or 12. A speech wave which may be derived, for example, from a microphone 1, is first band-limited as by a filter 2 to meet the requirements of the sampling theorem. It is then applied, as a wave s(t), to the input point of a wave propagation device or delay line 3 of which the output point is terminated in a matched impedance element 4 to prevent reflection. The delay device, which may comprise a plurality of like reactance networks connected in tandem, each having series inductance and shunt capacitance, is provided with a plurality of lateral taps that are numbered, in order, from 0 to n. Evidently, the wave s(t) reappears in succession at each of these lateral taps, and after a delay determined, in each case, by the length of the delay line 3 from its input point to that lateral tap as indicated on the drawing, for each tap, by the symbol $\tau$ with a subscript identifying the tap.

The energy paths extending from the 0th tap and from the nth tap are shown in full, similar energy paths extending from the other taps of the group being merely indicated. The signal appearing on the nth tap, having the waveform of the input signal s(t) but delayed by $\tau_n$, is evidently $s(t-\tau_n)$. This is applied to one input point of a modulator 5.

In a branch path the input wave s(t) is passed through an infinite clipper 6 of any desired construction whose operation, as indicated by its input-output characteristic, is to reduce all positive amplitudes of the input wave to a uniform positive amplitude of +1 and similarly to reduce all negative amplitudes of the input wave to a uniform negative amplitude of -1. The output of the clipper 6 is thus a clipped speech wave which, by comparison of Equation 11 with Equation 12, may be designated either sgn s(t) or clp s(t). With switches S1, S2 thrown to the positions in which they are shown, the output of the clipper 6 is applied directly to the other input point of the modulator 5. Intermodulation by the modulator of its two input signals results in the development of a complex modulation product wave which may be represented as

$$s(t-\tau_n)\,\operatorname{sgn} s(t)$$

In the modulation process each frequency component of the first factor is multiplied by every frequency component of the second factor, components of sum and difference frequencies being thus developed. The higher the frequency of the component, the more it tends to be cancelled out, and the lower its frequency, the more it tends to be preserved, in the time average. The entire complex of components contained in the product is now applied to a low pass filter 7 which, by preserving the low frequency components of the product and blocking its high frequency components, operates to smooth it; i.e., to form its time average, thus to develop, on the nth output terminal of the apparatus, the wave of Equation 11, evaluated for $\tau = \tau_n$; i.e., $\lambda(\tau_n)$. Similarly, by a like multiplication in a modulator 5' of the clipper output with the undelayed speech wave derived from the 0th tap of the delay line 3, the signal developed on the uppermost outgoing terminal is $\lambda(\tau_0)$.

With the switches S1 and S2 thrown to the positions indicated by the broken lines the direct path is broken and the tandem combination of a differentiator 8 and a half wave rectifier 9 is inserted between the output point of the clipper 6 and the second input point of the modulator 5. This results in the multiplication of the delayed speech wave $s(t-\tau)$ by a reference wave c(t) consisting of a train of pulses of like amplitudes, each coinciding with a positive-going zero crossing of the clipped speech wave. As explained above, the spectrum of such a pulse train has greater flatness, both coarse and fine, than does the spectrum of the clipped speech wave.

Provision of a similar modulator and low-pass filter for each of the remaining taps of the group and application of the output of the clipper 6 to the second input points of all of the modulators results in the development of a set of cross-correlation control signals as indicated at the right-hand portion of the drawing, one for each preselected value of the lag $\tau$. When supplemented by a period control signal as described below, they contain within themselves all of the information required for the satisfactory reconstruction of a speech wave that is, for all practical purposes, indistinguishable from the original speech wave.
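A discrete-time sketch of the FIG. 2 arrangement may make the signal flow concrete: one delayed copy of the wave per tap, multiplication by the clipped wave, and smoothing. Several things here are assumptions made for illustration: the function names, the tap delays expressed in samples, the test tone, and the one-pole smoother that stands in for the low-pass filter 7.

```python
import math

def sgn(x):
    """Infinite clipper characteristic: +1, 0 or -1."""
    return (x > 0) - (x < 0)

def analyze(s, tap_delays, alpha=0.005):
    """For each tap delay, smooth the product s[n - delay] * sgn(s[n])."""
    tracks = []
    for delay in tap_delays:
        y, track = 0.0, []
        for n in range(len(s)):
            delayed = s[n - delay] if n >= delay else 0.0   # tap of the delay line
            product = delayed * sgn(s[n])                   # modulator output
            y += alpha * (product - y)                      # one-pole smoothing
            track.append(y)
        tracks.append(track)
    return tracks

# Assumed test tone of 80 samples per period, offset so no sample is exactly zero.
period = 80
s = [math.sin(2 * math.pi * (n + 0.5) / period) for n in range(4000)]
taps = [0, period // 4, period // 2]
lam0, lam_quarter, lam_half = (track[-1] for track in analyze(s, taps))
```

For this tone the settled outputs lie near 2/pi at zero lag, near zero at a quarter-period lag, and near -2/pi at a half-period lag; the small residual ripple comes from the simple smoother chosen here.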

For synthesis of a satisfactory artificial speech wave from these cross-correlation signals they must be supplemented with a pitch signal that is indicative of the fundamental pitch of the speech wave. Certain advantages are attained when, in addition, the pitch signal is coherent with the fundamental periods of the speech wave. Apparatus for developing a suitable noncoherent pitch signal is disclosed in application of G. Raisbeck, Serial No. 463,467, filed October 20, 1954, now matured into Patent 2,908,761, granted October 13, 1959. Apparatus for developing a suitable period marker signal is disclosed in an application of B. P. Bogert and W. E. Kock, Serial No. 542,702, filed October 25, 1955, now matured into Patent 2,890,285, granted June 9, 1959, and also in an application of B. P. Bogert, Serial No. 578,097, filed April 13, 1956, now matured into Patent 2,928,901, granted March 15, 1960. While apparatus of this kind is adequate in principle, it sometimes presents difficulties of instrumentation in practice. To avoid such difficulties, and at the price of a slight increase in the total bandwidth of all the transmitted control signals, it is preferred to transmit, without further processing, a narrow subband of the speech wave itself embracing the fundamental frequencies of all speech waves that may be encountered in practice. To this end a band-pass filter 10 is provided, proportioned to pass only the lower portion of the speech range, extending to 350 cycles per second. The output of this filter 10 comprises a wave that is coherent with the major peaks of the successive periods of the speech wave, and it serves, in cooperation with the cross-correlation control signals derived in the fashion described above, to control the synthesis of an artificial speech wave.

The autocorrelation function of Equation 2 or Equation 3 is symmetric about the origin of time. This symmetry reflects the fact that it contains no information as to the phase relations among the components of the original speech wave from which it is derived.

In contrast, the cross-correlation function of Equation 1 is in general not symmetric about the origin of time. The same is true of the modified cross-correlation functions of Equations 11, 12 and 13. These can, however, be rendered approximately symmetric about the origin by a proper choice of the reference signal c(t); in particular, by choosing for the reference function one whose pulses coincide in time with the peaks of the speech wave in contrast to its zeros. With this choice the effect of the multiplication and averaging operations called for by Equation 13 is roughly to pile the peaks, that occur in succession on the t scale, one on top of another on the $\tau$ scale for $\tau = 0$, thus to generate on the $\tau$ scale a single peak that outweighs all the other peaks in amplitude. This results in imparting approximate symmetry to the cross-correlation function $\lambda(\tau)$. This symmetry has important consequences in making for further reductions in the frequency bandwidth or bit rate required for transmission, as will be fully explained below.

FIG. 3 shows an alternative to the apparatus of FIG. 2 that incorporates the feature of peak coherence of the reference wave. A speech wave derived, for example, from a microphone is applied, after band-limiting by a filter, as a wave s(t) to the input point of a delay line 3 like that of FIG. 2, similarly terminated in a matched impedance load 4 and similarly provided with a plurality of lateral taps that are similarly identified. The energy path from the nth lateral tap is shown in full, similar energy paths from the other taps of the group being merely indicated. The nth tap extends to the first input point of a modulator 15. The wave thus applied to this input point of the modulator is evidently $s(t-\tau_n)$.

A branch path from the microphone 1 extends to a differentiator 11 whose output is applied to an infinite clipper 12 which may be identical with the clipper 6 of FIG. 2. Because each positive peak of the band-limited speech wave s(t) is marked by a negative-going zero of its first derivative, the output of the clipper 12 is a rectangular wave that has downward-going discontinuities coincident with the positive peaks of the speech wave and upward-going discontinuities coincident with its negative peaks.

In accordance with this aspect of the invention a reference wave c(t) is generated having a pulse for each downward-going discontinuity of the clipped wave and hence for each upgoing peak of the speech wave. The generation of this reference wave may be instrumented in various ways, for example, with the aid of a differentiator followed by a half wave rectifier, poled to pass negative pulses and to block positive ones. Alternatively, a monostable or single trip multivibrator 13 may be provided, adjusted to respond, by delivering output pulses of a single preassigned polarity, only to input pulses of negative polarity.

Evidently, merely by poling the half wave rectifier to pass positive pulses and to block negative ones and by correspondingly modifying the single trip multivibrator to respond only to pulses of positive polarity, the output of this unit would comprise a train of pulses each of which is coincident on the time scale with one of the down-going peaks of the speech wave. Because the speech wave is not, in general, entirely symmetric about the time axis, the decay of the envelope of its up-going peaks may differ from the corresponding decay of the envelope of its down-going peaks. It is preferred to derive the pulse train from that set of speech wave peaks, up-going or down-going, that has the greater envelope decay.

It is also remarked, in passing, that the terms upgoing and down-going refer to the graphic representation of the speech wave, so that the correspondence between the polarities on the graph and the differential air pressures at the mouth of the speaker that they represent is arbitrary.

The output of this element, $c_1(t)$, may if desired be utilized directly and, provided the switch S3 is closed, applied to the second input point of the modulator 15. For the sake of gaining the additional fine flatness provided by amplitude variations among these pulses it is preferred to employ them as control pulses to operate a sampler 14 which thus delivers, for each such pulse, an amplitude sample of the speech wave applied to its input conduction terminal. The sequence of such samples, designated $c_2(t)$, is now applied, provided the switch S3 is open, to the second input point of the modulator 15.

As a refinement, and to accentuate the improvements that result from nonuniform amplitudes of the pulses of the reference wave, the speech wave may be predistorted before application to the conduction terminal of the sampler. The predistorting element 16, which expands the amplitude scale of the speech wave may, for example, have an input-output characteristic that obeys any odd power law; for example, the output may be proportional to the cube of the input.
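The effect of an odd power law on the peak amplitudes can be illustrated with a minimal sketch. The function name `predistort` and the peak values are hypothetical; the cube is the example given in the text.

```python
def predistort(x, power=3):
    """Expand the amplitude scale with an odd power law, preserving the sign."""
    if power % 2 == 0:
        raise ValueError("an odd power is needed to preserve the sign of the wave")
    return x ** power

# Hypothetical peak amplitudes presented to the sampler.
peaks = [0.5, 1.0, -2.0]
expanded = [predistort(p) for p in peaks]
```

A peak twice as large as its neighbour becomes eight times as large after cubing, which accentuates the nonuniformity of the reference pulses; an odd power keeps negative amplitudes negative.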

In the event that this amplitude expansion is employed at the input terminal of the sampler, to avoid overloading the modulator 15 and in order to avoid distortion of the ultimately reconstructed speech, an automatic gain control device 13 having a suitably long time constant is preferably included.

The output of the modulator 15 is passed through a low-pass filter 17 which operates to smooth or average it, thus to develop on the nth output terminal of the apparatus the correlation function of Equation 13 evaluated for τ = τ_n. Provision of a similar modulator and low-pass filter for each of the remaining taps of the group, and application of the reference wave c1(t) or c2(t) to the second input points of all of the modulators results in the development of a set of cross-correlation control signals as indicated at the right-hand portion of the drawing, one for each preselected value of the lag τ. Inasmuch as each of these control signals satisfies the requirements of spectral flatness and coherence, the same requirements are satisfied by the control signals taken as a group; and when as described in connection with FIG. 2 they are supplemented by a period control signal provided by a baseband filter 10, they contain within themselves all of the information required for the synthesis of a satisfactory artificial speech wave that is, for all practical purposes, indistinguishable from the original speech wave.
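The modulator-and-filter arrangement just described amounts to forming, for each tap, the product of the delayed speech wave with the reference wave and then averaging it. A minimal discrete-time sketch follows. It assumes integer sample lags and replaces the low-pass filter with a plain average over the whole record; the function name `cross_correlation_taps` is hypothetical.

```python
def cross_correlation_taps(s, c, lags):
    """Average of s(t - lag) * c(t) over the record, one value per lag.

    s    : speech samples (the wave fed to the delay line)
    c    : reference wave samples (pulse train or peak-sample train)
    lags : integer tap delays, in samples
    """
    n = len(s)
    out = []
    for lag in lags:
        # Modulator: product of the delayed speech wave and the reference wave.
        products = [s[t - lag] * c[t] for t in range(lag, n)]
        # Stand-in for the low-pass filter: average over the record.
        out.append(sum(products) / n)
    return out


speech = [0, 1, 0, 0, 0, 1, 0, 0]
pulses = [0, 1, 0, 0, 0, 1, 0, 0]   # one pulse per speech peak
taps = cross_correlation_taps(speech, pulses, lags=[0, 1, 4])
```

For this toy input the zero-lag tap is largest, the one-sample lag gives nothing, and the lag equal to the spacing of the peaks gives a smaller nonzero value because only one pulse pair overlaps within the record.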

Returning to Equation 7, it will be recalled that the cross-correlation functions of Equations 11, 12 and 13 were developed by passing to the limit at which ε approaches zero; i.e., by going from Equation 7 to Equation 8.

Valuable results, however, may be secured without passing to this limit, by holding ε to a magnitude that is a small quantity but by no means an infinitesimal one; i.e., by the instrumentation of Equation 7. It may, for example, have the value 1/10. FIG. 4 shows apparatus by which the approximate correlation function of Equation 7 is instrumented. With the switches S4 and S5 closed a speech wave originating, for example, in a microphone 1 passes, after band-limiting by a filter 2, through an attenuator 21 which reduces its amplitude from s(t) to εs(t). The speech wave thus attenuated is applied to a delay line 3 like that of FIGS. 2 and 3 and similarly terminated in a matched impedance load 4 and provided with a number of lateral taps designated in the same fashion as those of FIGS. 2 and 3. The energy path extending from the nth tap is shown in full, the others being understood. The wave appearing on the nth tap is evidently εs(t - τ_n). This delayed and attenuated wave is added to and subtracted from the original speech wave by an adder 22 and a subtractor 23. The resulting sum and difference waves are converted to their absolute magnitude counterparts by full wave rectifiers 24, 25 and a final subtractor 26 forms the difference between the outputs of the two rectifiers. Reference to Equation 7 shows immediately that the output of this final subtractor is equal to 2ε times the approximate correlation function evaluated at τ = τ_n. As above remarked, the factor 2ε is merely a scale factor. Since it is common to all of the correlation function outputs it can be disregarded, or compensated by an amplifier, as preferred. The resulting signal, after smoothing by a filter 27, passes to the nth output terminal of the apparatus.
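The add, subtract, rectify and subtract chain of FIG. 4 can be checked numerically. The sketch below is an illustration only: it uses integer sample lags, replaces the smoothing filter 27 with an average over the record, and uses hypothetical function names. The comparison with a sign-weighted average rests on the elementary identity |a + d| - |a - d| = 2·sign(a)·d, valid whenever |d| ≤ |a|; that comparison is this sketch's own check and is not a formula quoted from the patent.

```python
def fig4_tap_output(s, lag, eps):
    """Averaged |s(t) + eps*s(t-lag)| - |s(t) - eps*s(t-lag)| for one tap."""
    n = len(s)
    total = 0.0
    for t in range(lag, n):
        d = eps * s[t - lag]                     # attenuated, delayed wave
        total += abs(s[t] + d) - abs(s[t] - d)   # rectify both, then subtract
    return total / n


def sign_weighted_average(s, lag):
    """Average of sign(s(t)) * s(t-lag): what the chain yields, apart from 2*eps."""
    n = len(s)
    sign = lambda x: (x > 0) - (x < 0)
    return sum(sign(s[t]) * s[t - lag] for t in range(lag, n)) / n


speech = [1.0, -2.0, 3.0, -1.0, 2.0, -3.0]
eps = 0.1
approx = fig4_tap_output(speech, lag=1, eps=eps)
exact = 2 * eps * sign_weighted_average(speech, lag=1)
```

Here every attenuated sample is smaller in magnitude than the sample it is combined with, so the two results agree to rounding error. With a larger ε or a wave that passes near zero, the agreement is only approximate, which is consistent with the text's description of ε as small but not infinitesimal.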

Without significant change in the character of the wave represented by Equation 7, the fourth term may be omitted. This omission, however, reduces the magnitude of the numerator by a factor 2. Accordingly, to preserve the magnitude of the fraction, the denominator may likewise be reduced by a factor 2. These two alterations together lead to Equation 14. It will be noted that Equations 7 and 14 follow the two standard forms for a finite difference quotient.

This change from Equation 7 to Equation 14 results in certain obvious simplifications of instrumentation in FIG. 4.

The set of control signals appearing on the several conductors at the right-hand portion of FIG. 4, whether this figure be constructed to instrument Equation 7 or Equation 14, thus constitute an approximate representation of the cross-correlation function of Equation 11. As in the case of the other figures, they are supplemented by a period control signal which may be developed by a baseband filter 10.

If preferred, the operations of the apparatus of FIG. 4 may be applied to the first derivative of an incoming speech wave or to its second derivative, instead of to the unaltered speech wave itself. To this end, opening of either one of the switches S4, S5 introduces a single differentiator, 28 or 29, into the path from the microphone to the delay line 3, and so applies the first derivative of the speech wave to the apparatus, and opening both switches together introduces both the differentiators 28, 29 and so applies the second derivative of the speech wave to the apparatus. The resulting control signals are therefore representative of the approximate cross-correlation, with a suitable reference wave, of the first or the second derivative of the speech wave, as the case may be. After transmission of these control signals to a receiver station an artificial wave may be synthesized which reflects the significant features of this first or second derivative of the speech wave. The synthesized wave may then be integrated, once for the first derivative or twice for the second derivative, to recover a synthetic wave having the same spectral character as the speech wave.
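The pairing of differentiation at the transmitter with integration at the receiver can be illustrated in discrete time, where a first difference stands in for a differentiator and a running sum for an integrator. This is a simplified sketch, not the circuit of the patent: it assumes the wave starts from rest (zero initial condition), and the names `first_difference` and `running_sum` are hypothetical.

```python
def first_difference(x):
    """Discrete stand-in for a differentiator (assumes zero before the record)."""
    prev = 0
    out = []
    for v in x:
        out.append(v - prev)
        prev = v
    return out


def running_sum(d):
    """Discrete stand-in for an integrator (assumes zero initial condition)."""
    total = 0
    out = []
    for v in d:
        total += v
        out.append(total)
    return out


wave = [0, 3, 5, 4, 1]
once = running_sum(first_difference(wave))
twice = running_sum(running_sum(first_difference(first_difference(wave))))
```

One integration undoes one differentiation and two undo two, which mirrors the statement that the synthesized wave is integrated once or twice according to the derivative that was analyzed.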

The same refinement may be introduced in the apparatus of FIG. 1 or that of FIG. 3. To avoid complexity of drawings the circuit details have not been shown.

The cross-correlation control signals and the accompanying period control signal are now to be transmitted to a receiver station, there to control the synthesis of an artificial speech wave. If they are representative of an asymmetric cross-correlation function they should do so for each full speech wave period. (Three consecutive periods of a typical speech wave are shown in FIG. 6.) In this event the synthesizing apparatus may be of known character, for example, as described in the aforementioned application of W. E. Kock and B. P. Bogert. To cover the full speech wave period with sufficient detail for high quality of the synthetic speech wave requires a considerable number of such control signals. Thus, for example, the fundamental or pitch frequency of a bass voice may be as low as 100 cycles per second and this means a fundamental period of 10 milliseconds duration. For adequate reproduction of the harmonic and formant frequencies of such a voice up to and including 3000 cycles per second, the Nyquist sampling rate is 6000 samples per second, which gives a Nyquist period of 1/6 millisecond. The ratio, then, of the fundamental speech wave period to the Nyquist interval is 10 ms. ÷ 1/6 ms. = 60.
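The arithmetic of the preceding paragraph can be reproduced directly. The short sketch below uses exact fractions so that the 1/6 millisecond interval introduces no rounding; the function name `channels_for_full_period` is hypothetical.

```python
from fractions import Fraction


def channels_for_full_period(pitch_hz, top_hz):
    """Ratio of the fundamental period to the Nyquist interval."""
    period = Fraction(1, pitch_hz)              # seconds per pitch period
    nyquist_interval = Fraction(1, 2 * top_hz)  # seconds between samples
    return period / nyquist_interval


ratio = channels_for_full_period(pitch_hz=100, top_hz=3000)
```

For the bass voice of the example (100 cycles per second, components up to 3000 cycles per second) the ratio is 60.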

As indicated above, however, proper choice of the reference wave c(t), especially choice of a wave that is coherent with the peaks of the speech wave in contrast to its zeros, makes for symmetry of the cross-correlation function about the origin. Such a cross-correlation function, having even symmetry about the origin, is shown in FIG. 7, where the dots on the right-hand branch represent its discrete values as carried by the control signal channels; i.e., the values of the function at τ = 0, at τ = τ_1, and so on. In accordance with one aspect of the invention this symmetry is turned to account in a fashion which permits the reduction of the number of individual control signal channels by a factor 2, i.e., in the present example a reduction from 60 such channels to 30 such channels. Broadly speaking, the manner in which this result is achieved is as follows:

The cross-correlation being symmetric about the origin, it may be reproduced under control of the transmitted control signals, for one half of a fundamental speech wave period and, by a scanning process, converted into a wave in the time domain. The wave thus generated is now reproduced again on an inverted time scale after a delay of one half of the speech wave period, and without benefit of any additional transmitted information. The result of this step is to generate a time wave which is a mirror image of the first time wave and occupies the second half of the speech wave period. The result of these two steps, taken together, is a symmetrical time wave that occupies the full speech wave period. This wave is coherent with the original speech wave and has the same spectrum. Hence its intelligibility is no less than that of the original speech wave and its quality differs only slightly. When plotted as a graph its appearance may differ widely from that of the original speech wave which is normally far from symmetrical. This wide difference in appearance is a consequence of the suppression, in the synthesized wave, of all information as to the relative phases of the various components of which the original speech wave is constituted. Since it is a well established fact that the ear is largely insensitive to such phase shifts, at least within a single fundamental period, they are of no moment in the reproduction. Thus, the improvement in economy by a factor 2 that results from the use of the symmetric cross-correlation function is purchased at the price of only a negligible amount of degradation of the quality of the synthetic speech.

FIG. 5 shows apparatus by which the foregoing scheme is instrumented. The incoming baseband signal, after first passing through a delay equalizer 30, is applied to the input terminal of an energy source 31 which may comprise a clipper 32 and a single trip multivibrator 33 connected in tandem. The operation of this apparatus is to deliver a train of pulses, one for each up-going zero of the clipped baseband wave. Provided the baseband pitch signal is above a preassigned amplitude threshold it energizes a relay winding 34 thus to hold the tongue of the relay against its front fixed contact so that the pulses delivered by the single trip multivibrator 33 are applied to one input point of a modulator 35. When, as in the case of unvoiced speech, the baseband signal fails, the relay tongue falls to the back contact, thus to deliver to the modulator 35 the output of a noise source 36.

The several cross-correlation signals, after reaching the receiver station, appear on the several conductors shown at the left-hand margin of the figure. Each one is identified by the cross-correlation signal that it carries, and these are numbered in order from the signal for the lag τ_0 to the signal for the lag τ_N. Hence the signals that these conductors carry individually are proportional to the amplitudes of the dots in FIG. 7, so that the signals on all the conductors taken as a group constitute a space pattern that is at every point proportional to the right-hand portion of the cross-correlation curve of FIG. 7.

The energy paths from the third one of the incoming conductors and from the nth one are shown in full. The others are to be understood as similar. Thus, the signal for the lag τ_n is applied to the second input point of the nth modulator 35 of which the output terminal is connected to a lateral tap, designated τ_n, of a delay line 40 having lateral taps numbered in order from 0 to N. This delay line 40 may be of the same construction as those shown in the other figures and is similarly terminated at its right-hand end in a matched impedance load 41. Unlike the other delay lines, however, the delay line 40 of FIG. 5 is terminated at its left-hand end for complete reflection. When, as with the reference signal employed in FIG. 3, the symmetry of the cross-correlation function about the origin is even (FIG. 7), the reflection should be without change of phase, as by an open circuit as shown in FIG. 5. In contrast, in any case in which the cross-correlation function has odd symmetry about the origin, the reflection should include a phase inversion, as by a short circuit.

With this arrangement the pulse output of the nth modulator 35, after entering the delay line 40 at the nth lateral tap, travels in both directions. Traveling to the right it reaches the load 41 directly after a delay of τ_N - τ_n. Traveling to the left it reaches the open circuit terminals after a delay of τ_n whereupon it is reflected, travels to the right, and reaches the load 41 after a further delay of τ_N. Thus each pulse entering the delay line at the nth tap is reproduced in the load twice: once after a delay τ_N - τ_n and again after a delay τ_N + τ_n. As a result, for each pulse reproduced earlier than t = τ_N, a mirror image of this pulse is reproduced at a later time, symmetrically located on the time scale with respect to the time t = τ_N. The fixed delay τ_N is of no consequence.
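The double arrival of each pulse at the load can be mimicked with a small idealized model. The sketch below assumes that tap n sits n sample intervals from the reflecting end (so τ_n = n and τ_N = N), that the line is lossless, and that one pulse is injected at every tap at the same instant. The function name `load_waveform` and the `reflection` parameter are hypothetical; `reflection=+1` stands for the open circuit and `reflection=-1` for the short circuit with phase inversion.

```python
def load_waveform(tap_values, reflection=+1):
    """Pulse amplitudes reaching the load at times 0 .. 2N (in tap intervals).

    tap_values[n] is the amplitude injected at tap n. Each pulse arrives at
    the load directly after N - n intervals and, by way of the reflecting
    end, again after N + n intervals.
    """
    big_n = len(tap_values) - 1
    out = [0] * (2 * big_n + 1)
    for n, v in enumerate(tap_values):
        out[big_n - n] += v               # direct path to the load
        out[big_n + n] += reflection * v  # path via the reflecting end
    return out


even = load_waveform([4, 3, 1], reflection=+1)
odd = load_waveform([4, 3, 1], reflection=-1)
```

The open-circuit case gives a sequence with even symmetry about the middle instant t = τ_N, and the short-circuit case gives one with odd symmetry. In this idealized model the direct and reflected pulses from the zero-lag tap coincide at t = τ_N and simply add or cancel; how a physical line treats that tap is not specified in the text.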

FIG. 8 shows such a pulse pair for τ = τ_n and another such pulse pair for τ = τ_3. The two members of each pair are symmetrically disposed on the time scale about the point t = τ_N. In conformity with FIG. 7, the two pulses of the pair for τ = τ_3 are indicated as being of larger amplitude than the members of the pulse pair for τ = τ_n. The two selected values τ_3 and τ_n are only representative of the number indicated in FIG. 7 by the dots on the correlation function curve. A complete representation in the fashion of FIG. 8 would include a pulse pair for each such dot, the two members of each pair being of like amplitudes and symmetrically disposed about the point t = τ_N, and would have magnitudes and polarities in proportion to the magnitudes and polarities of the dots in FIG. 7.

The operation of the reflecting delay line 40 is thus to scan the group of incoming cross-correlation conductors, commencing at the highest numbered one and proceeding in one direction to the lowest numbered one, and then immediately to scan them again in the opposite direction from the lowest numbered one to the highest numbered one, thus to reproduce the cross-correlation appearing on each single one of these conductors as a pair of pulses. All of the pulses picked off the incoming conductors in the course of the first scan constitute a first time wave portion, and this is immediately followed by a second time wave portion, constituted of all of the pulses picked off in the course of the second scan. These two time wave portions constituted, as they are, of the pulses picked off in the course of the two successive scans, are now smoothed as by a low-pass filter 43 proportioned to have its cutoff frequency at about 3000 cycles per second. The pulse train as thus smoothed appears as in FIG. 9 and may now be applied directly to a reproducer 44 which delivers intelligible and natural sounding speech; and this despite the disparity in appearance between the synthetic wave of FIG. 9 and the original speech wave of FIG. 6. As above explained, this disparity in appearance results from the suppression, in the cross-correlation control signals, of all information designating the phase positions, within a fundamental speech wave period, of the several frequency components of which it is constituted. As above indicated, this phase information is of minimal importance to the ear.

While the invention has been described as applied to a speech wave, it will be readily apparent to those skilled in the art that it is of general application to message waves provided only that their statistics are such that the cross-correlation of the message wave with a suitable reference wave that has a relatively flat spectrum and is coherent with the message wave is a meaningful quantity.

What is claimed is:

1. Apparatus for analyzing a message wave and for developing control signals for use in the reconstitution of an artificial message wave which comprises means for generating, under control of said message wave, an auxiliary wave having a relatively flat frequency spectrum and being coherent with at least several of the periodicities of the message wave, means for delaying one of said message and auxiliary waves with respect to the other by each of a plurality of different lags, modulator means for developing, for each of said lags, a modulation product wave of said delayed wave by said undelayed wave, and filter means for smoothing said modulation product waves to provide a group of control signals that are together representative of the significant characteristics of said message wave.

2. Apparatus for analyzing a message wave and for developing control signals for use in the reconstitution of an artificial message wave which comprises means for generating, under control of said message wave, an auxiliary wave consisting of a train of pulses of alternately opposite polarities and coincident in time with all of the several zeroes of said message wave, a rectifier having an input terminal and an output terminal, said input terminal being coupled in tandem with said generating means, said rectifier thus acting to eliminate from said train all of the pulses of said train that are of one polarity without affecting the pulses that are of the other polarity, whereby said train as thus modified is characterized by a relatively flat frequency spectrum and by coherence with the several periodicities of the message wave, means for delaying said message wave with respect to said modified train by each of a plurality of different lags, a plurality of modulators, one for each of said lags, each modulator having two input terminals and an output terminal, connections extending from the output terminal of said rectifier to one terminal of each of said modulators, connections for applying the variously delayed message waves to the other input terminals of said modulators, one to each, said modulators thus acting to develop, for each of said lags, a modulation product wave of the delayed message wave by the modified pulse train, and filter means for smoothing said modulation product waves to provide a group of control signals that are together representative of the significant characteristics of said message wave.

3. Apparatus for analyzing a message wave characterized by up-going peaks alternating with down-going peaks and for developing control signals for use in the reconstitution of an artificial message wave which comprises means for generating, under control of said message wave, an auxiliary wave consisting of a train of pulses that are coincident in time with selected ones of the successive peaks of said message wave, whereby said train is characterized by a relatively flat frequency spectrum and by coherence with at least several of the periodicities of the message wave, means for delaying one of said message and auxiliary waves with respect to the other by each of a plurality of different lags, modulator means for developing, for each of said lags, a modulation product wave of said delayed wave by said undelayed wave, and filter means for smoothing said modulation product waves to provide a group of control signals that are together representative of the significant characteristics of said message wave.

4. Apparatus for analyzing a message wave characterized by up-going peaks alternating with down-going peaks and for developing control signals for use in the reconstitution of an artificial message wave which comprises means for generating, under control of said message wave, a train of pulses that are coincident in time with the successive peaks of said message wave that extend in one direction from the message wave axis, means for sampling the amplitudes of said message wave under control of the pulses of said train to provide an auxiliary wave consisting of a peak sample train, whereby said peak sample train is characterized by a relatively flat frequency spectrum and by coherence with the periodicities of the message wave, means for delaying one of said message and auxiliary waves with respect to the other by each of a plurality of different lags, modulator means for developing, for each of said lags, a modulation product wave of said delayed wave by said undelayed wave, and filter means for smoothing said modulation product waves to provide a group of control signals that are together representative of the significant characteristics of said message wave.

5. Apparatus for analyzing a message wave characterized by up-going peaks alternating with down-going peaks and for developing control signals for use in the reconstitution of an artificial message wave which comprises means for generating, under control of said message wave, a train of pulses that are coincident in time with the successive peaks of said message wave that extend in one direction from the message wave axis, means for expanding the amplitude scale of said message wave, means for sampling the expanded amplitudes of said message wave under control of the pulses of said train to provide an auxiliary wave consisting of said expanded peak sample train, whereby said auxiliary wave is characterized by a relatively flat frequency spectrum and by coherence with the periodicities of the message wave, means for delaying one of said message and auxiliary waves with respect to the other by each of a plurality of different lags, modulator means for developing, for each of said lags, a modulation product wave of said delayed wave by said undelayed wave, and filter means for smoothing said modulation product waves to provide a group of control signals that are together representative of the significant characteristics of said message wave.

6. Apparatus for synthesizing an artificial wave from a plurality of incoming control signals that are together representative of the cross-correlation function of an original message wave with a reference wave which comprises means for presenting said signals as an extended space pattern, means for scanning said space pattern from end to end in one direction to generate a first time wave portion, means for immediately re-scanning said space pattern from end to end in the opposite direction to generate a second time wave portion that is continuous with said first time wave portion, and means for reproducing said first and second time wave portions in sequence.

7. Apparatus for synthesizing an artificial wave from a plurality of incoming control signals that are together representative of the cross-correlation function between an original message wave and a reference wave, each of said control signals being individually representative of said cross-correlation for a single preassigned value of a lag τ, said values of τ increasing monotonically from the value zero for the first of said control signals to the value τ_N for the last of said control signals, which comprises an elongated wave propagation device having a first end terminated in a matched impedance load and a second end terminated for complete reflection, said device being provided with a plurality of lateral taps equal in number to said incoming control signals, means under control of an incoming pitch signal for generating a train of pulses that are coherent with said original message wave, a plurality of modulators equal in number to said incoming control signals, each of said modulators having two input points and an output point, means for applying said several incoming control signals to the first input points of said several modulators, one to each, means for applying said pulse train to the second input points of all of said modulators, connections from the output points of the several modulators to the several taps of said propagation device, one to each, and means for reproducing the wave appearing in said load impedance.

8. Apparatus for analyzing a message wave comprising consecutive fundamental periods each of which is divisible into a first half and a second half period to develop narrow band control signals, and for synthesizing an artificial message wave from said control signals, which comprises means for generating, under control of said message wave, an auxiliary wave consisting of a train of pulses that are coincident in time with peaks of said message wave, whereby said train is characterized by a relatively flat frequency spectrum and by coherence with the several periodicities of the message wave, and the cross correlation of said train with an entire message wave period is characterized by even symmetry, means for delaying one of said message and auxiliary waves with respect to the other by each of a plurality of different lags that together span the first half, only, of each fundamental message wave period, modulator means for developing, for each of said lags, a modulation product wave of said delayed wave by said undelayed wave, filter means for smoothing said modulation product waves to provide a group of control signals that are together representative of the correlation between the first halves of the message wave periods and the auxiliary wave, means for transmitting said control signals to a receiver station and, at said receiver station, means for reconstructing from said control signals a replica, for each message wave period, of said first half period correlation, means for thereupon generating an image, of reversed time-order, of said first half period correlation, and means for reproducing said first half period correlation and said image in immediate succession.

9. Apparatus for analyzing a message wave comprising consecutive fundamental periods each of which is divisible into a first and a second half period to develop narrow band control signals which comprises means for generating, under control of said message wave, an auxiliary wave consisting of a train of pulses that are coincident in time with peaks of said message wave, whereby said train is characterized by a relatively flat frequency spectrum and by coherence with the several periodicities of the message wave, and the cross correlation of said train with an entire message wave period is characterized by even symmetry, means for delaying one of said message and auxiliary waves with respect to the other by each of a plurality of different lags that together span the first half, only, of each fundamental message wave period, modulator means for developing, for each of said lags, a modulation product wave of said delayed wave by said undelayed wave, filter means for smoothing said modulation product waves to provide a group of control signals that are together representative of the correlation between the first halves of the message wave periods and the auxiliary wave, means for transmitting said control signals to a receiver station and, at said receiver station, means for synthesizing an artificial message wave from said control signals.

10. Apparatus for analyzing a periodic message wave to develop narrow band control signals and for synthesizing an artificial message wave from said control signals, which comprises means for developing, from each period of said message wave, a control signal constituted of a first half and a second half and having even symmetry, means for transmitting to a receiver station the first half, only, of each control signal and, at said receiver station, means for reconstructing from said control signals a replica, for each message wave period, of said first half control signal, means for thereupon generating an image, on an inverted time scale, of said first half control signal, and means for reproducing said first half control signal and said image in immediate succession.

11. In a system comprising a transmitter station and a receiver station, transmitter station apparatus for analyzing a message wave comprising consecutive fundamental periods each of which is divisible into a first and a second half period to develop narrow band control signals which comprises means for generating, under control of said message wave, an auxiliary wave consisting of a train of pulses that are coincident in time with peaks of said message wave, whereby said train is characterized by a relatively flat frequency spectrum and by coherence with the several periodicities of the message wave, and the cross correlation of said train with an entire message wave period is characterized by even symmetry, means for delaying one of said message and auxiliary waves with respect to the other by each of a plurality of different lags that together span the first half, only, of each fundamental message wave period, modulator means for developing, for each of said lags, a modulation product wave of said delayed wave by said undelayed wave, filter means for smoothing said modulation product waves to provide a group of control signals that are together representative of the correlation between the first halves of the message wave periods and the auxiliary wave, means for transmitting said control signals to a receiver station and, at said receiver station, means for synthesizing an artificial message wave from said control signals which comprises an elongated wave propagation device having a first end terminated in a matched impedance load and a second end terminated for complete reflection, said device being provided with a plurality of lateral taps equal in number to said incoming control signals, means under control of an incoming pitch signal for generating a train of pulses that are coherent with said original message wave, a plurality of modulators equal in number to said incoming control signals, each of said modulators having two input points and an output point, means for applying said several incoming control signals to the first input points of said several modulators, one to each, means for applying said locally generated pulse train to the second input points of all of said modulators, connections from the output points of the several modulators to the several taps of said propaga-

References Cited in the file of this patent

UNITED STATES PATENTS
Dudley, Mar. 19, 1940
Oliver, Jan. 24, 1956
Dudley et al., Nov. 20, 1956
Feldman et al., Nov. 4, 1958
Bogart et al., June 9, 1959
Edson et al., Sept. 29, 1959
Bogert, Mar. 15, 1960

Patent Citations
Cited Patent | Filing date | Publication date | Applicant | Title
US2194298 * | Dec 23, 1937 | Mar 19, 1940 | Bell Telephone Labor Inc | System for the artificial production of vocal or other sounds
US2732424 * | Apr 13, 1951 | Jan 24, 1956 | oliver
US2771509 * | May 25, 1953 | Nov 20, 1956 | Bell Telephone Labor Inc | Synthesis of speech from code signals
US2859405 * | Feb 17, 1956 | Nov 4, 1958 | Bell Telephone Labor Inc | Derivation of vocoder pitch signals
US2890285 * | Oct 25, 1955 | Jun 9, 1959 | Bell Telephone Labor Inc | Narrow band transmission of speech
US2906955 * | Feb 17, 1956 | Sep 29, 1959 | Bell Telephone Labor Inc | Derivation of vocoder pitch signals
US2928901 * | Apr 13, 1956 | Mar 15, 1960 | Bell Telephone Labor Inc | Transmission and reconstruction of artificial speech
Referenced by
Citing Patent | Filing date | Publication date | Applicant | Title
US3135917 * | Sep 11, 1961 | Jun 2, 1964 | Sanders Associates Inc | Frequency sensitive wave analyzer including frequency sensing phase shifting means
US3176305 * | Feb 1, 1962 | Mar 30, 1965 | Jersey Prod Res Co | Seismic signal processing system
US3394228 * | Jun 3, 1965 | Jul 23, 1968 | Bell Telephone Labor Inc | Apparatus for spectral scaling of speech
US3493684 * | Jun 15, 1966 | Feb 3, 1970 | Bell Telephone Labor Inc | Vocoder employing composite spectrum-channel and pitch analyzer
US3742146 * | Oct 20, 1970 | Jun 26, 1973 | Nat Res Dev | Vowel recognition apparatus
US3825685 * | May 5, 1972 | Jul 23, 1974 | Int Standard Corp | Helium environment vocoder
US3947638 * | Feb 18, 1975 | Mar 30, 1976 | The United States Of America As Represented By The Secretary Of The Army | Pitch analyzer using log-tapped delay line
US4034160 * | Mar 5, 1976 | Jul 5, 1977 | U.S. Philips Corporation | System for the transmission of speech signals
US4052563 * | Oct 7, 1975 | Oct 4, 1977 | Nippon Telegraph And Telephone Public Corporation | Multiplex speech transmission system with speech analysis-synthesis
US4477925 * | Dec 11, 1981 | Oct 16, 1984 | Ncr Corporation | Clipped speech-linear predictive coding speech processor
US4545065 * | Apr 28, 1982 | Oct 1, 1985 | Xsi General Partnership | Extrema coding signal processing method and apparatus
WO1985000686A1 * | Jul 23, 1984 | Feb 14, 1985 | Advanced Micro Devices Inc | Apparatus and methods for coding, decoding, analyzing and synthesizing a signal
Classifications
U.S. Classification: 704/218, 324/76.35, 375/260
International Classification: G10L19/02, G10L11/00
Cooperative Classification: H05K999/99, G10L25/00, G10L19/02
European Classification: G10L19/02, G10L25/00