Publication number: US3349183 A
Publication type: Grant
Publication date: Oct 24, 1967
Filing date: Oct 29, 1963
Priority date: Oct 29, 1963
Publication number: US 3349183 A, US 3349183A, US-A-3349183, US3349183 A, US3349183A
Inventors: Campanella Samuel Joseph
Original Assignee: Melpar Inc
Speech compression system transmitting only coefficients of polynomial representations of phonemes
US 3349183 A
Description  (OCR text may contain errors)

[Drawing sheets 2 to 5: figure content not recoverable from the OCR text.]

United States Patent Office
3,349,183
Patented Oct. 24, 1967

ABSTRACT OF THE DISCLOSURE

A speech bandwidth compression system in which analog control parameters are developed from an input speech waveform by a vocal tract analog analyzer and are processed to yield respective steady-state polynomial approximations of the control parameters over time intervals during which one or more of those parameters is continuous, each interval bounded by instants of discontinuity, the polynomial approximations being sampled and digitally encoded over time segments which are also digitally encoded and which generally, though not necessarily, coincide with the time intervals over which the polynomial approximations are produced. The digital format is transmitted to a speech regeneration station at which a substantial replica of the original speech waveform is reproduced by reconstitution from the steady-state polynomial approximations of the control parameters together with the information relating to the boundaries or intervals over which those steady-state approximations are valid.

The present invention relates generally to reduced bandwidth transmission systems, e.g. speech compression systems, and more particularly to a system wherein polynomial coefficients are utilized as the primary information source.

Mathematically, any function $X(t)$, continuous within the interval $T_a \le t \le T_b$, may be represented over said interval by the polynomial

$$a_0 + a_1 t + a_2 t^2 + a_3 t^3 + \dots + a_n t^n \tag{1}$$

where $a_i$ ($0 \le i \le n$) is the coefficient of the $i$th power of $t$. For most physical phenomena, and particularly for speech, where $t$ represents time, the coefficients above a certain order are relatively small, so that the occurrence can be accurately represented by a polynomial of zero through some finite order.

In the present invention, the values of a0, a1, a2, a3, or a4 are determined over an interval in which the occurrence being monitored is continuous, e.g. in speech a phoneme. These values, constant over each continuous interval, along with the duration of each continuous interval, are transmitted as low frequency information via a communication link to a receiver. At the receiver, a replica of the original signal is formed in response to the transmitted data and a priori information concerning the waveshape of each polynomial order as a function of time.

According to one embodiment of the invention, it is assumed that the continuous interval $T$ extends from $-T/2$ to $+T/2$, i.e. the time origin is in the center of the interval during which the function is continuous. With such an assumption, the function $X(t)$ can be written within the interval as Equation 2 (not legible in the OCR text), where $X_0$, $\dot{X}_0$, $\ddot{X}_0$, and $\dddot{X}_0$ are the polynomial coefficients, represented by $a_0$, $a_1$, $a_2$, and $a_3$ in Equation 1, $X_0$ being the value of the function at $t=0$, i.e., $X(0)$. It can be shown that the coefficients in Equation 2 can be estimated at time $t_k$ in response to the actual signal $X(t)$ being monitored, as given by relationships 3a-3d (not legible in the OCR text).

In relationships (3a-3d), the symbol (^) indicates the estimated value of the coefficient and the bar over a quantity indicates its average value from $-T/2$ to $t_k$.

Equations 3a-3d are not the most suitable to utilize from an apparatus standpoint to determine the estimator values because of the necessity for starting the integration, or averaging operation, at $t=-T/2$. To obviate this situation, the time reference is shifted so all integration begins at $t=0$. It can be shown that the estimator values can thus be represented at the end of the interval, where $t=T$, as:

$$\hat{X}_0 = \frac{1}{T}\int_0^T X(t)\,dt \tag{4a}$$

(Equations 4b to 4d are not legible in the OCR text.) Generalizing, it can be seen that, for the $n$th derivative:

$$\hat{X}_0^{(n)} = \frac{a_0}{T^{n+1}}\int_0^T X(t)\,dt + \frac{a_1}{T^{n+2}}\int_0^T t\,X(t)\,dt + \dots + \frac{a_n}{T^{2n+1}}\int_0^T t^n X(t)\,dt \tag{4e}$$

where $a_0, a_1, a_2 \dots a_n$ are constants and $n$ is any positive integer. By utilizing analog computer techniques, it is possible to solve Equations 4a-4d to derive the estimator values, which are derived at the transmitter and supplied to a receiver where the original signal is regenerated.

At the receiver, a signal in accordance with

$$X(t) = \hat{X}_0 + \hat{\dot{X}}_0\left(t-\frac{T}{2}\right) + \frac{\hat{\ddot{X}}_0}{2}\left[\left(t-\frac{T}{2}\right)^2-\frac{T^2}{12}\right] + \frac{\hat{\dddot{X}}_0}{6}\left[\left(t-\frac{T}{2}\right)^3-\frac{3T^2}{20}\left(t-\frac{T}{2}\right)\right] \tag{5}$$

is initiated each time there is an indication that a new continuous function is being received. The duration for which the function is continuous, $T$, and the estimator values $\hat{X}_0$, $\hat{\dot{X}}_0$, $\hat{\ddot{X}}_0$, and $\hat{\dddot{X}}_0$ are supplied to the regenerator to derive $X(t)$ as in Equation 5. Thus, the link between transmitter and receiver may have very low bandwidth since all of the transmitted information is constant within the interval for which the function is continuous. For speech, the continuous interval, the length of a phoneme, averages about 0.1 second so that a system having a bandwidth between 0 and about 20 c.p.s. is sufficient.

According to a further embodiment of the invention, the polynomial coefficients $a_0$, $a_1$, $a_2$, $a_3$, and $a_4$ are determined within each continuous interval $0 \le t \le T$ by use of the weighted summation of orthonormal polynomials. An appropriate and convenient orthonormal polynomial set is the Legendre set, defined by Equations 6a-6e (not legible in the OCR text).

Generalizing, it can be shown that:

$$f(t) = \sum_{n} a_n P_n(t) \tag{7}$$

where $n = 0, 1, 2, \dots$ and $a_n$ is as defined in Equation 1. It can be shown that for this relationship

$$a_n = \frac{1}{T}\int_0^T f(t)\,P_n(t)\,dt \tag{8}$$

In this embodiment, the values of $a_n$ are computed at the transmitter in accordance with Equations 6a-6d and 8. The functions 6a-6d are generated by analog computer circuitry employing real time integrators having their capacitors set in accordance with the continuous interval $T$. This requires the analyzed function, $f(t)$, to be delayed by the time $T$ prior to being fed to the computer circuitry. When $f(t)$ is initially supplied to the computer, it begins to generate the orthonormal polynomials of Equations 6a-6d, which are multiplied by $f(t)$ in accordance with Equation 8. After the computer has operated for $T$ seconds, its outputs are sampled to derive the polynomial coefficients $a_0 \dots a_4$, and its operation is terminated until the next continuous function is supplied to it.

At the receiver is provided a computing function generator, identical to that at the transmitter. The polynomials of Equations 6a-6d are generated for the time period T in response to an indication that a function of interest is to be processed. During the interval, each of the time varying orthonormal polynomials is multiplied by a constant value, its coefficient a0, a1, a2, .... In accordance with Equation 7, these products are summed to regenerate the original signal.

As with the first embodiment, the bandwidth of the transmitted signal is extremely low, comprising essentially the time of each continuous interval and each coefficient value, these being constants over the interval. The embodiment employing orthonormal polynomials, however, introduces a time delay that does not occur in the initially described estimator apparatus.
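The orthonormal-polynomial embodiment can be illustrated numerically. The following is only a sketch under stated assumptions: the polynomials are taken to be shifted Legendre polynomials scaled so that their mean square over 0 to T is one, the coefficients are taken as the time average of f(t) times each polynomial, and the analog integrators are replaced by Gauss-Legendre quadrature. The function names (`orthonormal_legendre`, `analyze`, `regenerate`) are hypothetical and do not come from the patent.

```python
import numpy as np
from numpy.polynomial import legendre as L


def orthonormal_legendre(n, t, T):
    """Shifted Legendre polynomial of order n on [0, T].

    Assumed normalisation: (1/T) * integral_0^T P_n(t)**2 dt == 1.
    """
    x = 2.0 * np.asarray(t, dtype=float) / T - 1.0  # map [0, T] onto [-1, 1]
    return np.sqrt(2 * n + 1) * L.Legendre.basis(n)(x)


def analyze(f, T, order=4, quad_points=32):
    """Transmitter side: one coefficient per polynomial order for one interval."""
    nodes, weights = L.leggauss(quad_points)
    t = 0.5 * T * (nodes + 1.0)
    # (1/T) * integral_0^T g(t) dt equals 0.5 * sum(weights * g(t)) at these nodes.
    return np.array([
        0.5 * np.sum(weights * f(t) * orthonormal_legendre(n, t, T))
        for n in range(order + 1)
    ])


def regenerate(coeffs, t, T):
    """Receiver side: weighted sum of the same polynomials."""
    return sum(a * orthonormal_legendre(n, t, T) for n, a in enumerate(coeffs))
```

For a signal that is itself a low-order polynomial over the interval, the regenerated curve matches the input to rounding error; only the coefficients and T need to cross the link.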

It is accordingly an object of the present invention to provide a new and improved low bandwidth transmission system.

It is another object of the present invention to provide a transmission system wherein the coefficients of a polynomial constitute the primary source of information.

A further object is to provide a system for analyzing continuous data to derive from it the coefficients of a polynomial.

An additional object of the invention is to provide a system for regenerating a continuous signal in response to the polynomial coefficients representing the signal.

Yet another object is to provide a system for deriving the polynomial coefficients representing continuous data by computing estimates of the coefficients or by utilizing orthonormal polynomials.

Still further objects of the invention are to provide systems for regenerating a signal in response to values representing coefficient estimators or by employing orthonormal polynomials.

An additional object is to provide a speech compression system wherein the information in each phoneme is primarily expressed as a set of polynomial coefficients and a time duration.

The above and still further objects, features and advantages of the present invention will become apparent upon consideration of the following detailed description of several specific embodiments thereof, especially when taken in conjunction with the accompanying drawings, wherein:

FIGURE 1 is a system block diagram;

FIGURE 2 is a circuit diagram of the phoneme boundary detecting network;

FIGURE 3 is a circuit diagram of the analyzing apparatus for one formant in the apparatus of FIGURE 1, according to one embodiment;

FIGURE 4 is a circuit diagram of the regenerator utilized with FIGURE 3;

FIGURE 5 is a circuit diagram of an orthonormal function generator, as employed in a second embodiment of the invention;

FIGURE 6 illustrates waveforms generated by the apparatus of FIGURE 5; and

FIGURE 7 is a block diagram to show how the function generator of FIGURE 5 is employed with regenerating apparatus.

The present invention is illustrated in conjunction with a speech compression system, where its primary utility is believed to lie. It is to be understood, however, that the principles are applicable to any suitable transmission system wherein it is desired to reduce bandwidth.

Reference is now made to FIGURE 1 wherein source 11 of speech signals is coupled to speech bandwidth compression system 12, preferably of the type disclosed in U.S. Patent 3,078,345. System 12 includes seven different compression units 13-19 for deriving low frequency information, less than 25 c.p.s., indicative of signal 11. Units 13, 18 and 15 derive voltages proportional to the centroids of the first F1, second F2 and third F3 formants, respectively, while units 16, 17 and 14 generate voltages proportional to the amplitude of the first, second and third formants. It is to be understood that formant ranges are defined in the usual convention as follows:

270 c.p.s. ≤ F1 ≤ 730 c.p.s.
840 c.p.s. ≤ F2 ≤ 2230 c.p.s.
2240 c.p.s. ≤ F3 ≤ 3010 c.p.s.

Hence, the outputs of formant trackers 13-18 represent the amplitude and spectral distribution of energy in speech signal 11 within the frequency bands of the three formants. Over each phoneme or frame in speech signal 11, there is derived from each of trackers 13-18 a continuous function of time. When a phoneme begins or ends, there is a sudden change in the outputs of trackers 13-18, so that the outputs of trackers 13-18 may be considered as comprising a plurality of time sequenced continuous functions between which exist discontinuities, as indicated by waveform 20.

In addition to formant trackers 13-18, system 12 includes pitch extractor 19. The output of extractor 19 is a constant amplitude negative voltage when an unvoiced utterance or a gap in speech occurs. For voiced speech signals, extractor 19 derives a positive signal proportional to the fundamental pitch frequency of signal 11. The output of extractor 19 is supplied to voiced, unvoiced silence detector 21, which derives a positive voltage when its input is negative for at least a predetermined time interval, approximately [...] milliseconds, sufficient to indicate a silent period in signal 11. The silent period signal deriving from detector 21 is fed as one input to frame start-stop decision circuit 22, the other input of which is supplied by rate detector 23.

Rate detector 23, responsive to the signals deriving from the first and second formant centroid trackers 13 and 18, generates a positive output whenever either tracker voltage exceeds a predetermined positive or negative rate of change. Thus, a positive pulse is derived from detector 23 at the beginning and end of each phoneme, as determined by either of the first or second formant centroid signals. The outputs of detectors 21 and 23 are combined in decision circuit 22, the output of which is a positive voltage over the duration of the phoneme being monitored. The occurrence of a silent period, as determined by detector 21, overrides the phoneme length indication derived from detector 23 so that the output of decision circuit 22 is a relatively accurate measure of phoneme duration. When a phoneme boundary occurs, decision circuit 22 generates a negative voltage, the voltage being a pulse between consecutive, adjacent phonemes and a rectangular wave during a gap of speech signal 11.

The waveform deriving from frame decision circuit 22 is applied in parallel as a control signal to coefficient analyzers 24-29. Analyzers 24-29 are also responsive to the low frequency waveforms deriving from formant trackers 13-18, respectively. In response to these signals, each of the coefficient analyzers derives a plurality of voltages representing the polynomial coefficients of the input function X(t) thereof, which coefficients are defined by Equation 1 as a0, a1, .... Coefficient analyzers 24-29 operate on their input signals from trackers 13-18 only over each phoneme, as determined by decision circuit 22, to generate the polynomial coefficients of the continuous time varying inputs thereto, in a manner described infra. When a phoneme terminates, analyzers 24-29 are restored to their initial conditions and await an indication that a new phoneme occurs.

The plural, coefficient representing, time varying outputs of each analyzer 24-29 are coupled to a bank of samplers and digital coders 31-36, controlled by end of frame decision circuit 37. In response to the negative going voltage deriving from frame start-stop decision network 22, circuit 37 generates a short duration pulse for activating samplers 31-36 at the end of each phoneme. When this occurs, the time varying analog outputs of analyzers 24-29 are sampled by networks 31-36, which derive parallel digital signals indicative of the sampled voltages. A further digital signal, indicative of each phoneme duration, is generated by time to digital coder 38. The time duration of each positive voltage swing deriving from decision circuit 22 is translated into an analog voltage by coder 38. The analog voltage is sampled at the end of each phoneme in response to the output of end of frame decision circuit 37 and a digital representation of the sampled voltage is generated by coder 38.

The digital outputs of coders 31-36 and 38 are transmitted over a conventional low bit rate digital transmission link 39 to receiver 41 where the speech signal is reconstituted. The digital signals anent each phoneme are preferably all transmitted in parallel to enable the receiver to simultaneously receive all information without concern about the time duration of each phoneme.

The digital signals from link 39 are supplied to appropriate channels in receiver 41, corresponding with those in the transmitter. The receiver includes six digital to analog converters 42-47 to derive signals proportional to the formant tracker polynomial coefficients. As a result, each of converters 42-47 generates a plurality of analog voltages that are replicas of the voltages sampled by coders 31-36. The outputs of converters 42-47 remain constant between adjacent transmission bursts through link 39. When a new burst occurs, the analog outputs of converters 42-47 suddenly change to represent its polynomial coefficients. A further converter 48 is provided to convert the time indicating digital signal into a variable amplitude analog voltage and a rectangular wave, the duration of which corresponds with the transmitted phoneme length. In response to the leading and trailing edges of the rectangular wave, frame start and stop detectors 51 and 52 respectively generate spaced pulses indicating the beginning and end of a phoneme at the receiver.

While a digital transmission link has been disclosed, it is to be understood that the analog voltages sampled from analyzers 24-29 at the end of each phoneme and the phoneme duration voltage could be directly transmitted.

The polynomial coefficient outputs of each converter 42-47 are supplied to regenerators 53-58 that are controlled by frame duration, start and stop circuits 48, 51 and 52. As disclosed infra, each of the regenerators 53-58 develops a separate time varying output signal that closely approximates the signals deriving from formant trackers 13-18, respectively. For the duration of each phoneme, as determined by networks 48, 51 and 52, regenerators 53-58 derive a plurality of time varying power terms, i.e. $t^0, t^1, t^2, \dots$, each of which is modified in amplitude by the constant coefficient representing signals from converters 42-47. The modified time varying power terms are linearly combined to provide the outputs of regenerators 53-58.

The outputs of formant regenerators 53-58 are coupled to synthesizer 59, of the type disclosed in U.S. Patent No. 3,078,345. Synthesizer 59 combines the inputs thereto to derive an approximate replica of original speech signal source 11, and the replica is supplied to speaker 61.

Reference is now made to FIGURE 2 of the drawings, wherein the phoneme duration circuitry designated by boxes 21-23 is illustrated. Rate detector 23 includes a pair of identical circuits 71 and 72, responsive to the signals from centroid formant trackers 13 and 18, respectively, and an OR gate 73 responsive to the outputs of circuits 71 and 72. Circuit 71 generates a positive output pulse whenever the input thereto exceeds a predetermined rate of change in the positive or negative direction and includes a differentiator comprising capacitor 74 and resistor 75. The differentiator output, across resistor 75, is fed in parallel to positive and negative amplitude sensing Schmitt triggers 76 and 77, each of which generates a positive short duration pulse when the positive and negative voltages respectively applied thereto exceed a predetermined level. The outputs of triggers 76 and 77 are linearly combined and then fed to OR gate 73.

In operation, let it be assumed that a phoneme begins with a sudden increase in the output voltage of tracker 13. The differentiator generates a voltage that is proportional to the rate of change and is sensed by Schmitt trigger 76. Because the change is sudden, the differentiated input to trigger 76 is sufficient to activate it and cause a pulse to be generated thereby. In response to the beginning of the same phoneme, the second formant voltage is assumed to change relatively slowly so its differentiator output is not of large enough amplitude to trigger its Schmitt trigger. In consequence, there is derived a pulse from network 71 but not from network 72 at the beginning of the phoneme of interest. The pulse is coupled through OR gate 73 to the input of bistable flip flop 78 and activates the flip flop so that it generates a positive voltage.

When the phoneme ends, it is assumed that sudden negative and positive transitions occur in the outputs of trackers 13 and 18, respectively. In consequence, the differentiators in networks 71 and 72 generate negative and positive voltages of relatively great amplitude. The negative voltage generated across resistor 75 in network 71 activates Schmitt trigger 77 to produce a positive pulse that is coupled to OR gate 73. Network 72 generates a pulse of the same polarity and at approximately the same time in response to its input. The pulses deriving from networks 71 and 72 are of sufficient length and close enough to each other that, when combined by OR gate 73, they appear as a single pulse, indicative of the phoneme end boundary. This pulse is coupled to flip flop 78 to return it to its low voltage output. Hence, there is derived from flip flop 78 a rectangular waveform, the duration of which equals that of the phoneme of interest.

The output of flip flop 78 is normally coupled through inhibit gate 79, the output of which corresponds with that of decision network 22. The inhibit terminal of gate 79 is responsive to silence detector 21 so that if the output of flip flop 78 is erroneously positive during a period between speech utterances, the output of network 22 nevertheless will be zero, as required.
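The boundary logic of FIGURE 2 can be mimicked on sampled data. The sketch below is a discrete-time stand-in, not the patented circuit: a finite difference replaces the RC differentiator, a magnitude threshold replaces the pair of Schmitt triggers, a toggled boolean replaces flip flop 78, and the final AND-NOT plays the role of inhibit gate 79. The names `rate_pulses` and `frame_gate` and the `threshold` parameter are hypothetical.

```python
import numpy as np


def rate_pulses(signal, dt, threshold):
    """True wherever the rate of change exceeds the threshold in either direction."""
    rate = np.diff(signal, prepend=signal[0]) / dt
    return np.abs(rate) > threshold


def frame_gate(first_centroid, second_centroid, silent, dt, threshold):
    """Boolean waveform that is True for the duration of each detected phoneme."""
    # OR gate: a boundary on either centroid track counts.
    pulses = (rate_pulses(first_centroid, dt, threshold)
              | rate_pulses(second_centroid, dt, threshold))
    # Only the leading edge of each merged pulse toggles the flip-flop.
    previous = np.concatenate(([False], pulses[:-1]))
    leading_edges = pulses & ~previous

    gate = np.zeros(len(pulses), dtype=bool)
    state = False
    for i, edge in enumerate(leading_edges):
        if edge:
            state = not state
        # Inhibit gate: silence forces the output low regardless of the flip-flop.
        gate[i] = state and not silent[i]
    return gate
```

Taking only the leading edge of the OR-ed pulses reflects the remark that two nearly simultaneous pulses should act as a single boundary indication.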

Silence detector 21 comprises diode 81 having its cathode connected to the output of pitch extractor 19 and its anode connected to the input of the integrator comprising resistor 82 and capacitor 83. The voltage across capacitor 83 is supplied to Schmitt trigger 84 that develops a positive voltage for inhibiting gate 79 [...] voltage of diode 81 goes negative to reduce the charge on capacitor 83. The time constant of the integrator comprising resistor 82 and capacitor 83 and the known [...] off diode 81 and the voltage at terminal 86 quickly restore the trigger 84 input to a level far above its firing voltage.

For silent periods, however, the cathode voltage of diode 81 remains negative for a substantial time period, long enough to charge capacitor 83 negatively to a level that activates trigger 84. When the silent period is over, diode 81 cuts off and the positive voltage across capacitor 83 is quickly restored because of the relatively low value of resistor 85.
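The behavior described for the silence detector, an output that goes active only after the pitch extractor voltage has stayed negative long enough, can be sketched with a simple run-length counter. This is an illustrative substitute for the diode, RC integrator and Schmitt trigger, not a model of their analog dynamics; `silence_flags` and `min_negative_samples` are hypothetical names, and the patent's actual time threshold is not legible in the source.

```python
def silence_flags(pitch_voltage, min_negative_samples):
    """Flag samples at which the input has been negative for a sustained run.

    pitch_voltage: sequence of sampled pitch-extractor voltages.
    min_negative_samples: assumed run length that counts as silence.
    """
    flags = []
    run = 0
    for v in pitch_voltage:
        # A positive (voiced) sample resets the run, like the fast recharge path.
        run = run + 1 if v < 0 else 0
        flags.append(run >= min_negative_samples)
    return flags
```

Short negative excursions, such as a brief unvoiced sound inside an utterance, never reach the run length and so do not inhibit the frame gate.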

It is thus seen that the output of gate 79 comprises a plurality of time sequenced rectangular waves, as indicated by waveform 90. When two phonemes occur consecutively, a negative going, short duration pulse 91 is derived between the positive voltage levels while a negative voltage level 92 is generated for the duration of a silence period. The end of frame decision circuit 37, connected to the output of gate 79, senses the negative going voltage swings in wave 90. This is accomplished merely by connecting a differentiator comprising capacitor 93 and resistor 94 in cascade with gate 79. The cathode of diode 95 is connected to the differentiator output so that only the negative going pulses developed across it, generated in response to the trailing edge of wave 90, are passed, the positive going pulses being blocked by the rectifier.

Reference is now made to FIGURE 3 of the drawings, wherein a preferred embodiment of one of the coefficient analyzers, e.g. 24, is disclosed. In response to the output voltage $X(t)$ of formant tracker 13, the apparatus of FIGURE 3 generates estimators $\hat{X}_0$, $\hat{\dot{X}}_0$, $\hat{\ddot{X}}_0$ and $\hat{\dddot{X}}_0$, as defined by Equations 3a-3d and as expressed by Equations 4a-4d.

In FIGURE 3, there are illustrated six cascaded analog computer type integrators 100-105, the first of which is responsive to the output of frame decision circuit 22.

When a new phoneme begins, a positive step voltage is [...] is being made. At the end of a phoneme, $t_k=T$, the phoneme duration. It is to be understood that the illustrated single chain of integrators 100-105 can be utilized in conjunction with a plurality of coefficient analyzers.

It is to be noted from Equations 4a-4d that each of the estimators is the sum of a plurality of integrals, each expressed as being proportional to [...] $X(t)$ signal, and the output of each multiplier 106-108 is coupled to a separate one of integrators 111-114. Because the integrators do not inherently average over a time interval, but merely accumulate charge in response to their inputs, it is necessary to multiply their outputs by $1/T$ to obtain the expression of Equation 8. This is accomplished by feeding the integrators 111-114 output voltages to dividers 115-118, respectively. The divisor inputs to dividers 115-118 are responsive to the $T$ output of integrator 101. In consequence, at the end of a phoneme the outputs of dividers 115-118 are analog voltages respectively proportional to:

$$\frac{1}{T}\int_0^T X(t)\,dt,\quad \frac{1}{T}\int_0^T t\,X(t)\,dt,\quad \frac{1}{T}\int_0^T t^2 X(t)\,dt \quad\text{and}\quad \frac{1}{T}\int_0^T t^3 X(t)\,dt$$

The first of these is the estimator $\hat{X}_0$ itself, so that no modification thereof is necessary. The quantity $\hat{X}_0$ approximately represents the zero order coefficient, $a_0$, in the polynomial $X(t)=a_0+a_1t+a_2t^2+a_3t^3$ that is indicative of the variation of $X(t)$ over the phoneme of interest. For $X(t)$ variations that are represented by exact mathematical functions, $\hat{X}_0=a_0$, $\hat{\dot{X}}_0=a_1$, $\hat{\ddot{X}}_0=a_2$, and $\hat{\dddot{X}}_0=a_3$.

To generate the quantity $\hat{\dot{X}}_0$, the outputs of dividers 115 and 116 are supplied to the negative (subtracting) and positive (adding) input terminals of amplifier 121 via dividers 124 and 125, respectively. The $T$ and $T^2$ outputs of integrators 100 and 101 are respectively supplied to the divisor inputs of dividers 124 and 125, the former also including a proportionality constant of 1/2. The gain of amplifier 121 is adjusted to have a value of 12 to introduce the common factor indicated in Equation 4b. Thus at the end of a phoneme, the voltage amplitude deriving from amplifier 121 is approximately the coefficient $a_1$:

$$\hat{\dot{X}}_0 = 12\left[\frac{1}{T^3}\int_0^T t\,X(t)\,dt - \frac{1}{2T^2}\int_0^T X(t)\,dt\right]$$

The positive input terminals of summation amplifier 123 are coupled to dividers 128 and 129, the dividend inputs of which are responsive to signals from dividers 116 and 118. The outputs of dividers 115 and 117 are supplied to the negative inputs of summer 123 via dividers 131 and 132. The divisor inputs to quotient networks 131, 128, 132 and 129, proportional to $T^3$, $T^4$, $T^5$ and $T^6$, are derived from the outputs of integrators 102-105. Dividers 131, 128 and 132 as well as amplifier 123 introduce the constant terms 1/20, 3/5, 3/2 and 16800 of Equation 4d, so that the amplifier output is a voltage

$$\hat{\dddot{X}}_0 = 16800\left[\frac{1}{T^7}\int_0^T t^3X(t)\,dt - \frac{3}{2T^6}\int_0^T t^2X(t)\,dt + \frac{3}{5T^5}\int_0^T t\,X(t)\,dt - \frac{1}{20T^4}\int_0^T X(t)\,dt\right]$$

As indicated supra, the outputs of amplifiers 121-123 are sampled at the end of a phoneme, just prior to discharging the integrators in the circuits. The sampled voltages represent the polynomial coefficients and are transmitted as nonvarying voltages over the phoneme duration, during which $X(t)$ may be considered as continuous.
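The divider and amplifier arithmetic described for FIGURE 3 can be checked numerically. The sketch below is an illustration under assumptions: Gauss-Legendre quadrature stands in for the analog integrators, and only the zero, first and third order estimators are computed, since the constants for the second order path are not legible in the source. The gains 12 and 16800 and the factors 1/2, 1/20, 3/5 and 3/2 are the ones named in the text; the function names are hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss


def averaged_moments(x_of_t, T, quad_points=32):
    """Stand-in for dividers 115-118: (1/T) * integral_0^T t**k X(t) dt, k = 0..3."""
    nodes, weights = leggauss(quad_points)
    t = 0.5 * T * (nodes + 1.0)
    x = x_of_t(t)
    # (1/T) * integral over [0, T] equals 0.5 * weighted sum at the mapped nodes.
    return [0.5 * np.sum(weights * t**k * x) for k in range(4)]


def estimators(x_of_t, T):
    """Return (zero order, first order, third order) estimator values."""
    m0, m1, m2, m3 = averaged_moments(x_of_t, T)
    x0 = m0                                          # divider 115 output, unmodified
    x1 = 12.0 * (m1 / T**2 - m0 / (2.0 * T))         # amplifier 121, gain 12
    x3 = 16800.0 * (m3 / T**6 - 1.5 * m2 / T**5      # amplifier 123, gain 16800
                    + 0.6 * m1 / T**4 - m0 / (20.0 * T**3))
    return x0, x1, x3
```

For `X(t) = t**3` over a unit interval the third order estimator evaluates to 6, the true third derivative, which is consistent with the stated gain of 16800.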

The nonvarying voltages transmitted through digital data link 39 are decoded and supplied to regenerator 42 for the first formant centroid. The apparatus of regenerator 42 is illustrated in FIGURE 4 as comprising polynomial term function generator 141, multipliers 142-144 and summing amplifier 145. It is to be understood that a single function generator 141 may be common to all of regenerators 53-58 and that it is illustrated as shown for simplicity.

Function generator 141 comprises a linear saw tooth generator 151, the sweep of which is initiated by a trigger pulse from frame start detector 51, and whose trailing edge is generated in response to a pulse from frame stop detector 52. Hence, the output of generator 151 is a voltage proportional in amplitude to the instant of time being examined from the beginning of a phoneme. At the end of a phoneme, the saw tooth attains a value representing T.

To change the time base of the regenerator to have the same origin as that of the transmitter, the values of t must be modified in accordance with phoneme length, T. It will be recalled from the introduction that in the original equation for X(tk), the time system origin t=0 is assumed to bisect the continuous or phoneme interval, T.

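The estimator values $\hat{X}_0$, $\hat{\dot{X}}_0$, $\hat{\ddot{X}}_0$ and $\hat{\dddot{X}}_0$ are computed upon this assumption. To provide the regenerator with a real time zero origin where the middle of the phoneme interval occurs at $t=T/2$, not $t=0$, as in the transmitter, it is necessary to modify the time varying terms $t$, $t^2$ and $t^3$ by constants proportional to $T$. In consequence, the first, second and third order time varying terms derived from generator 141 are represented as:

$$\left(t-\frac{T}{2}\right),\qquad \left(t-\frac{T}{2}\right)^2-\frac{T^2}{12}$$

and

$$\left(t-\frac{T}{2}\right)^3-\frac{3T^2}{20}\left(t-\frac{T}{2}\right)$$

not as $t$, $t^2$ and $t^3$.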

To derive these time varying terms, the analog phoneme duration voltage, $T$, generated by frame duration converter 48 is coupled in parallel to constant amplitude multiplier 152 and squarer 153. Multiplier 152 halves the $T$ voltage, and the halved voltage is combined with the saw tooth voltage $t$ in summation amplifier 155 to derive $(t-T/2)$. The output of amplifier 155 is squared in squarer 156 to derive the voltage $(t-T/2)^2$. To generate a signal level proportional to $(t-T/2)^3$, the outputs of amplifier 155 and squarer 156 are combined in signal multiplier 157. The $(t-T/2)$ output of amplifier 155 is also multiplied by the $3T^2/20$ signal deriving from circuit 154 in network 158, which generates the second term in the third order time varying polynomial.

The $(t-T/2)^2$ and $T^2/12$ outputs of circuits 156 and 153 are subtracted in summing amplifier 159, the output of which represents the time varying second order polynomial. To generate the third order polynomial, amplifier 161 subtracts the output of multiplier 158 from that of multiplier 157.

The time varying signals deriving from amplifiers 155, 159 and 161 thus represent the first, second and third order polynomial power terms. These terms are multiplied by the first, second and third order coefficients, respectively, and the products are summed with the zero order coefficient, as described above for multipliers 142-144 and summing amplifier 145.

To provide an indication that the regenerator 42 output is a replica of the $X(t)$ input to analyzer 24, consider the simple example where $X(t)$ is assumed to be $(t+1)$ over the phoneme duration $T$. It can be shown by substituting $(t+1)$ for $X(t)$ into Equations 4a-4d that

$$\hat{X}_0 = 1+\frac{T}{2} \tag{9}$$

$$\hat{\dot{X}}_0 = 1 \tag{10}$$

$$\hat{\ddot{X}}_0 = 0 \tag{11}$$

$$\hat{\dddot{X}}_0 = 0 \tag{12}$$

Substituting the values indicated by Equations 9-12 into regenerator Equation 5 gives:

$$X(t) = 1+\frac{T}{2}+\left(t-\frac{T}{2}\right)$$

Simplifying, $X(t)=1+t$, the originally assumed function $X(t)$. From this simple example, it is seen that the $\hat{X}_0$, $\hat{\dot{X}}_0$, $\hat{\ddot{X}}_0$ and $\hat{\dddot{X}}_0$ values represent the coefficients and that, for any function accurately represented by the sum of a polynomial, there is no theoretical error in the analyzing and regenerator apparatus. Since most functions of interest can be so represented, the system accuracy is quite good.
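The regenerator arithmetic and the (t+1) example can be reproduced in a few lines. This sketch makes two assumptions: the time varying terms are those produced by amplifiers 155, 159 and 161 as described for FIGURE 4, and the second and third order terms are scaled by 1/2 and 1/6 (the factorial scaling is inferred from the gain of 16800 and is not legible in the source). The function names are hypothetical.

```python
import numpy as np


def power_terms(t, T):
    """Time varying terms with the origin shifted to the start of the phoneme."""
    centered = t - T / 2.0                                   # amplifier 155
    second = centered**2 - T**2 / 12.0                       # amplifier 159
    third = centered**3 - (3.0 * T**2 / 20.0) * centered     # amplifier 161
    return centered, second, third


def regenerate(x0, x1, x2, x3, t, T):
    """Weighted sum of the terms; the 1/2 and 1/6 scalings are assumptions."""
    first, second, third = power_terms(t, T)
    return x0 + x1 * first + (x2 / 2.0) * second + (x3 / 6.0) * third


# Worked example from the text: X(t) = t + 1 over a phoneme of length T.
T = 0.1
t = np.linspace(0.0, T, 11)
replica = regenerate(1.0 + T / 2.0, 1.0, 0.0, 0.0, t, T)
```

With the zero order value 1 + T/2 and a unit first order value, `replica` equals `1 + t` at every sample, matching the simplification in the text.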

Reference is now made to FIGURE 5 of the drawings wherein an orthonormal polynomial function generator utilized in a second embodiment of the invention is illustrated. The same basic orthonormal function generator is common to analyzers 24-29 and regenerators 53-58 of FIGURE 1.

The nature of the function generator requires that computation occur after the phoneme of interest has terminated. This implies that the operation of the function generator for the phoneme of interest starts after that phoneme has terminated.

To accomplish this result at the transmitter, the amplitude varying, phoneme duration voltage generated and sampled by coder 38 is supplied in parallel to voltage controlled variable delay elements 181, one delay element being provided between each of trackers 13-18 and analyzers 24-29. Elements 181 delay, by a phoneme length T, the outputs of trackers 13-18 supplied to analyzers 24-29 (but not to detectors 21 and 23) and the rectangular wave output of decision circuit 22 coupled to analyzers 24-29 and end of frame decision circuit 37. Thus all of the input voltages to analyzers 24-29 occur after the phoneme of interest has terminated. Just after the phoneme ends, the time indicating variable voltage from coder 38 is coupled to motor 182. In response to this voltage, motor 182 is quickly rotated by an amount proportional to T, the rotation being completed prior to the beginning of the computation cycle.

At the receiver, the outputs of detectors 51 and 52 are applied to a saw tooth generator, such as shown by 151 in FIGURE 4, the output of which is sampled under the control of frame stop detector 51. The outputs of converters 42-48 are delayed in time under the control of the previously described circuit by variable delay elements 181. To provide the unit step voltage necessary to activate the function generator, output pulses from start and stop detectors 51 and 52 are fed to a flip-flop. The rectangular wave deriving from the flip-flop is applied through one of variable delay elements 181 and is indicated by the lead denoted H(t).

The apparatus of FIGURE 5 generates, over the duration of a phoneme, time varying voltages expressed by Equations 6a-6e. The curves for the functions defined by the equations, assuming a unit step function for P0(t) in the interval 0≤t≤T, are indicated in FIGURE 6. An observation of the curves on FIGURE 6 reveals that P0(t) is a constant voltage having unit amplitude while P1(t) is a straight line crossing the t axis at T/2. P2(t) is a parabola, the axis of which is on the line t=T/2. P3(t) is a cubic function crossing the t axis at T/2 while P4(t) is a quartic having its axis at t=T/2.
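The curve shapes described for FIGURE 6 match a set of Legendre polynomials shifted onto the interval 0≤t≤T. The following Python sketch is offered only as an illustration under that assumption: it takes P_n(t) = sqrt(2n+1)·L_n(2t/T − 1), where L_n is the standard Legendre polynomial, which may differ in sign or scaling from Equations 6a-6e. The function name `orthonormal_basis` is hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def orthonormal_basis(t, T, order=4):
    """Return an array whose row n holds P_n(t) sampled on 0 <= t <= T.

    Assumption (not taken from the patent equations): P_n is the Legendre
    polynomial L_n shifted to [0, T] and scaled by sqrt(2n + 1), so that
    (1/T) * integral_0^T P_m(t) P_n(t) dt equals 1 if m == n, else 0.
    """
    x = 2.0 * np.asarray(t, dtype=float) / T - 1.0  # map [0, T] onto [-1, 1]
    rows = []
    for n in range(order + 1):
        select = np.zeros(n + 1)
        select[n] = 1.0  # coefficient vector that picks out L_n alone
        rows.append(np.sqrt(2 * n + 1) * legendre.legval(x, select))
    return np.vstack(rows)
```

Under this assumption P0(t) is identically one, P1(t) and P3(t) vanish at t=T/2, and P2(t) and P4(t) are symmetric about t=T/2, in agreement with the description of FIGURE 6.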

At the transmitter, the integrated output of each centroid tracker 13-18 is multiplied by these curves and the resulting voltages are sampled at the end of the function generating cycle, i.e. at t=T for the function generator, to derive the coefficients. At the receiver, the time varying signals defined by Equations 6a-6e are generated in response to a unit step function input. They are multiplied by the transmitted coefficients to derive each term of the polynomial.
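The transmit and receive roles described above can be sketched numerically. The Python fragment below is a minimal illustration, not the patented circuit: it assumes a shifted, normalized Legendre basis and the usual inner-product projection A_n = (1/T)∫f(t)P_n(t)dt, which may differ in scaling from Equation 8. The names `basis`, `analyze` and `regenerate` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def basis(t, T, order=4):
    """Rows P_0..P_order on [0, T] (assumed shifted, normalized Legendre set)."""
    x = 2.0 * np.asarray(t, dtype=float) / T - 1.0
    return np.vstack([
        np.sqrt(2 * n + 1) * legendre.legval(x, [0] * n + [1])
        for n in range(order + 1)
    ])


def analyze(f, T, order=4, nodes=16):
    """Transmitter side: A_n = (1/T) * integral_0^T f(t) P_n(t) dt.

    The integral is evaluated with Gauss-Legendre quadrature instead of
    analog integrators.
    """
    x, w = legendre.leggauss(nodes)
    t = T * (x + 1.0) / 2.0
    return basis(t, T, order) @ (w / 2.0 * f(t))


def regenerate(A, t, T):
    """Receiver side: weighted sum A_0 P_0(t) + ... + A_k P_k(t)."""
    return np.asarray(A) @ basis(t, T, order=len(A) - 1)
```

For an input that is itself a polynomial of degree four or less over the phoneme, such as f(t)=1+t, the regenerated waveform coincides with the input to within rounding, which is the sense in which the text says there is no theoretical error.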

To generate the curves defined by Equations 6a-6e and illustrated in FIGURE 6, the function generator includes a plurality of cascaded integrators 183-186. Each of integrators 183-186 includes a separate variable capacitor 187-190 in the feedback loop of an analog computer operational amplifier. Each of the capacitors is driven by motor 182 to a value proportional to T/2. This causes the output of each integrator, over a phoneme, to be scaled by the constant factor 2/T. One input of integrators 184-186 is supplied by a rectangular voltage H(t) from one of the elements 181 having a positive value representing one during a phoneme computation interval. Input resistors 192-195 to integrators 184-186 are of appropriate values to introduce the required scaling factors set forth by Equations 6a-6e.

When the circuit of FIGURE 5 is utilized at the transmitter, the time averaged output of one formant tracker, e.g. tracker 13, over one phoneme, ∫f(t)dt, is supplied to the input of the amplifier in integrator 183 via integrator 202 (indicated in phantom) and resistor 195 from tracker 13. The output of integrator 183 at any time t is ∫(∫f(t)dt)dt, or the double integral of the signal deriving from tracker 13. The integration operation continues all down the line so that signals including the third, fourth and fifth integrals are derived from integrators 184-186, respectively. Thus, in considering the relationship between Equations 6a-6e and 8, it becomes apparent that the coefficient terms defined by the latter include fifth order integrals.

When the function generator is utilized at the receiver, the input to the amplifier of integrator 183 is the rectangular wave H(t), integrator 202 being omitted. As a result, the outputs of integrators 183-186 only represent terms up to the fourth integral of a constant, i.e., the output of integrator 186 includes a term varying as a function of t^4.
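The statement that four cascaded integrators driven by a unit step yield terms up to t^4 can be checked numerically. The Python sketch below is an idealized model under stated assumptions: each stage is a pure integrator with gain 2/T, the sign inversion of the operational amplifiers is ignored, and the additional inputs summed into the real integrators are omitted. The function name `cascade_step_response` is hypothetical.

```python
import numpy as np


def cascade_step_response(T, stages=4, samples=2001):
    """Feed a unit step H(t) through `stages` ideal integrators of gain 2/T.

    Returns the time axis and a list holding the output of each stage.
    """
    t = np.linspace(0.0, T, samples)
    dt = t[1] - t[0]
    signal = np.ones_like(t)  # rectangular wave H(t) of unit amplitude
    outputs = []
    for _ in range(stages):
        # running trapezoidal integral, starting from a discharged capacitor
        steps = (signal[1:] + signal[:-1]) * dt / 2.0
        area = np.concatenate(([0.0], np.cumsum(steps)))
        signal = (2.0 / T) * area
        outputs.append(signal)
    return t, outputs
```

In this idealized model the k-th stage output is (2t/T)^k divided by k factorial, so the fourth stage varies as t^4 and no higher power appears.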

To generate the P0(t) term, it is merely necessary to monitor the voltage applied to resistor 195. The P1(t) and P2(t) terms are derived by combining the outputs of integrators 183 and 184 with the rectangular wave H(t) in separate summing amplifiers 196 and 197. To provide the proper coefficients for P1(t) and P2(t), the ratio of feedback to input resistors for amplifier 196 is 1/√3 while that for amplifier 197 is 1/√5. The P3(t) term is generated by combining the outputs of integrators 183 and 185 with H(t) in summation amplifier 198. The P4(t) term is similarly derived by summing amplifier 199 in response to H(t) and the voltages from amplifiers 184 and 186. To provide for the ∫P2(t)dt term in Equation 6e, representing P4(t), the output of integrator 183 is applied as one of the inputs to integrator 186 via scaling resistor 201. To provide appropriate scaling for P3(t) and P4(t), the feedback resistors thereof have values 1/√7 and 1/√9 times the respective input resistors. It is to be noted that the outputs of amplifiers 197 and 199 are negative to properly represent the P2(t) and P4(t) terms. This phase reversal is inherent in the operation of each integrator and is desirable because of the ease with which H(t) is combined with the alternate polarity outputs of integrators 183-186. Prior to transmission, however, the transmitted coefficients are all converted to the same phase.

Reference is now made to FIGURE 7 of the drawings, a block diagram of the receiver regenerator, wherein the polynomial function generator illustrated in FIGURE 5 is shown by block 211. As indicated supra, phoneme duration T and rectangular waveform H(t) are supplied to generator 211 to cause it to derive the P0(t)-P4(t) waveforms indicated by Equations 6a-6e and FIGURE 6 over the length T of a phoneme.

The time varying polynomial representing potentials are coupled through variable resistors 212-216 to the input of summing amplifier 217. The values of resistors 212-216 are maintained constant over each phoneme duration at a value proportional to A0-A4, the coefficient values sampled at the end of each phoneme at the transmitter. This is accomplished by controlling the values of resistances 212-216 by motors in response to the outputs of digital to analog converters 222-226 to which are supplied signals indicative of A0-A4. Converting elements 222-226 are included in only one of the converter blocks on FIGURE 1, e.g. block 42. The P0(t)-P4(t) time varying outputs of generator 211 are thus multiplied by A0-A4 by means of resistors 212-216 to derive the a0, a1t, a2t^2 ... terms. These terms are added by amplifier 217, the output of which approximately represents f(t). It is to be noted that each of A0-A4 and P0(t)-P4(t) is assumed to be of normal polarity.

To provide a simple example of how this embodiment operates, consider that centroid tracker 13 generates a wave of constant unity amplitude over the entire phoneme duration. This causes the input to integrator 183 to vary as a linear function of time over the interval 0≤t≤T so that P0(t) may be considered to be of the form b0+b1t. The saw tooth wave is supplied to integrator 183, the output of which is proportional to c1t+c2t^2. The c0 term in P1(t) is inserted into the output of integrator 183 by the addition of H(t) in amplifier 196 so P1(t) is represented by a polynomial including the zero, first and second order terms.

Under the assumed conditions, the P2(t) output of amplifier 197 is of the form d0+d1t+d2t^2+d3t^3 because the output of integrator 183 is integrated by circuit 184 to derive terms proportional to t^2 and t^3. The t term is derived by supplying the constant H(t) voltage through integrator 184 via resistor 184 while the constant term is introduced by H(t) being supplied directly to the input of amplifier 197 via coupling resistor 201.

In a similar manner, the P3(t) and P4(t) outputs of amplifiers 198 and 199 are representative of fourth and fifth order polynomials, respectively, under the assumed conditions. The time varying polynomials at the terminals designated P0(t)-P4(t) are sampled at the end of the phoneme frame, when t=T, to derive the coefficients A0, A1, A2, A3 and A4 that are transmitted to the receiver and set into resistors 212-216, FIGURE 7. After the outputs of amplifiers 196-199 are sampled, capacitors 187-190 are discharged so a new function generator cycle can be initiated.

The values of the coefficients set into resistors 212-216 are multiplied by the respective polynomial wave shapes derived by generator 211 and illustrated in FIGURE 6. The coefficient values A1, A2, A3 and A4 modify the P1, P2, P3 and P4 signals such that all time varying terms in A1P1+A2P2+A3P3+A4P4 are zero over the entire frame of interest. Hence, the output of amplifier 217 is a constant for the entire phoneme duration.
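The constant-input example can be reproduced with a short numerical check. The Python sketch below assumes a shifted, normalized Legendre basis and an inner-product projection for the coefficients, neither of which is confirmed against Equations 6a-6e or 8; the names `basis` and `coefficients_of_constant` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def basis(t, T, order=4):
    """Rows P_0..P_order on [0, T] (assumed shifted, normalized Legendre set)."""
    x = 2.0 * np.asarray(t, dtype=float) / T - 1.0
    return np.vstack([
        np.sqrt(2 * n + 1) * legendre.legval(x, [0] * n + [1])
        for n in range(order + 1)
    ])


def coefficients_of_constant(level, T, order=4):
    """Project a constant tracker output onto the assumed basis.

    Uses A_n = (1/T) * integral_0^T level * P_n(t) dt, evaluated with
    8-point Gauss-Legendre quadrature.
    """
    x, w = legendre.leggauss(8)
    t = T * (x + 1.0) / 2.0
    return basis(t, T, order) @ (w / 2.0 * np.full_like(t, level))
```

With these assumptions a unit-amplitude input gives A0 equal to one and A1 through A4 equal to zero to within rounding, so the weighted sum formed by the summing amplifier reduces to the constant A0·P0(t).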

While I have described and illustrated several specific embodiments of my invention, it will be clear that variations of the details of construction which are specifically illustrated and described may be resorted to without departing from the true spirit and scope of the invention as defined in the appended claims.

I claim:

1. In a communication link, the combination comprising means responsive to a time varying signal for deriving constant valued approximations thereof only over time intervals during which said signal is substantially continuous, each time interval bounded by instants of discontinuity, said approximations comprising coefficients of terms of a polynomial function of time representative approximately of said signal, and means responsive to said constant valued approximations for regenerating said signal.

2. The combination according to claim 1 wherein said means for deriving and said means for regenerating each include separate orthonormal function generators, said deriving means function generators for producing an orthonormal set of polynomial functions of time wherein said coefficients are estimated as functions of the average value of said signal over at least a portion of a time interval during which said signal is substantially continuous, and said regenerating means function generators for reproducing said orthonormal set of polynomial functions for amplitude scaling by said coefficients.

3. A speech compression system comprising a source of speech, means responsive to said speech for deriving indications of the beginning and end of a phoneme, means responsive to said indications and said speech for computing the coefficients of the polynomial representing the speech variation over each phoneme, a receiver, and means for transmitting said coefficients and said indications to said receiver, said receiver including means responsive to said coefficients and indications for regenerating the signal deriving from the speech source.

4. The system of claim 3 wherein said means for transmitting includes means for digitally transmitting said coefficients.

5. The system of claim 3 wherein said means for deriving includes means for generating pulses in response to discontinuities in said speech at said beginning and end of a phoneme, and wherein said means for computing and said means for regenerating each include separate orthonormal function generators for producing an orthonormal polynomial set for combination with the computed coefficients to provide an approximate replica of said speech.

6. In combination, a source of signal continuous within a series of time intervals each bounded by periods of discontinuity of said signal, and represented within each said interval by a polynomial function of time of the form f(t)=a0+a1t+a2t^2+ ... +akt^k, and means responsive to said signal for computing the values of the coefficients a0, a1, a2, ... ak during each said time interval to provide steady-state approximations of said signal for respective ones of said time intervals, and means for digitally encoding said coefficient values and information relating to the respective time intervals during which said coefficient values were computed.

7. Apparatus for analyzing speech, comprising means responsive to incoming speech for deriving therefrom analog time-varying control parameters including frequency and amplitude of the speech formants,

means responsive to at least some of said control parameters for developing respective time dependent polynomial approximations thereto over time intervals during which said parameters are substantially continuous, bounded by instants of discontinuity of one or more of said parameters,

means for encoding said developed polynomial approximations in digital format for transmission thereof, and

time base extracting means for actuating said digital encoding means in accordance with time segments derived from at least one of said control parameters.

8. The combination according to claim 7 wherein is included means for digitally encoding information relating to said time segments for transmission with said digital format.

9. The combination according to claim 8 wherein said time segments coincide with said time interval.

10. The combination according to claim 7 wherein the respective polynomial approximations are coefficients of polynomial functions of time of the form X(t)=a0+a1t+a2t^2+ ... +ant^n, wherein X(t) is a respective control parameter of the speech being analyzed, a0, a1, a2, ... an are said coefficients, n is any positive integer greater than 2, and t represents time.

11. The combination according to claim 10 wherein said polynomial approximation developing means comprises means for estimating said polynomial coefficients.

12. A speech compression system comprising the speech analyzing apparatus of claim 8, and means for regenerating a substantial replica of the original speech from the transmitted digital data; said regenerating means including means responsive to the transmitted digital data for conversion of said digitally encoded polynomial approximations to analog signals corresponding to said polynomial approximations over respective ones of said time intervals,

means responsive to the last-named analog signals for reproducing respective ones of said control parameters therefrom, and

means responsive to reproduced control parameters for combination thereof to provide said substantial replica of the original speech.

13. The combination according to claim 8 wherein said polynomial approximation developing means comprises means for generating amplitude scaling coefficients for a set of orthonormal polynomials, wherein said coefficients are estimated as functions of the respective average values of said control parameters over at least a portion of a time interval during which said control parameters are substantially continuous.

14. The combination according to claim 13 wherein said polynomial approximation developing means further includes means for delaying the derived control parameters for a period of time corresponding to a respective one of said time intervals, and

means for applying said delayed parameters to said amplitude scaling coefficient generating means.

15. A speech compression system comprising the speech analyzing apparatus according to claim 14, and

means for regenerating a substantial replica of the original speech from the digital information transmitted from said analyzer, said regenerating means including function generating means for producing said set of orthonormal polynomials,

means for weighting said orthonormal polynomials with said amplitude scaling coefficients by multiplication thereof, and

means for summing the weighted orthonormal polynomials over respective ones of said time intervals.

References Cited

UNITED STATES PATENTS

3,234,332 2/1966 Belar 179-1
3,247,322 4/1966 Savage et al. 179-1
3,261,916 7/1966 Bakis 179-1

OTHER REFERENCES

H. I. Manley, Analysis-Synthesis of Connected Speech in Terms of Orthogonalized Exponentially Damped Sinusoids, in the Journal of the Acoustical Society of America, vol. 35, No. 4, April 1963, pp. 464-473.

JOHN W. CALDWELL, Primary Examiner.

I. T. STRATMAN, Assistant Examiner.

Patent Citations
Cited Patent | Filing date | Publication date | Applicant | Title
US3234332 * | Dec 1, 1961 | Feb 8, 1966 | Rca Corp | Acoustic apparatus and method for analyzing speech
US3247322 * | Dec 27, 1962 | Apr 19, 1966 | Allentown Res And Dev Company | Apparatus for automatic spoken phoneme identification
US3261916 * | Nov 16, 1962 | Jul 19, 1966 | Ibm | Adjustable recognition system
Referenced by
Citing Patent | Filing date | Publication date | Applicant | Title
US3403227 * | Oct 22, 1965 | Sep 24, 1968 | Page Comm Engineers Inc | Adaptive digital vocoder
US3571515 * | Jun 12, 1968 | Mar 16, 1971 | Ibm | Voice analysis and recovery system
US4829574 * | Feb 1, 1988 | May 9, 1989 | The University Of Melbourne | Signal processing
US5581654 * | May 25, 1994 | Dec 3, 1996 | Sony Corporation | Method and apparatus for information encoding and decoding
US5583967 * | Jun 16, 1993 | Dec 10, 1996 | Sony Corporation | Apparatus for compressing a digital input signal with signal spectrum-dependent and noise spectrum-dependent quantizing bit allocation
US5596679 * | Oct 26, 1994 | Jan 21, 1997 | Motorola, Inc. | Method and system for identifying spoken sounds in continuous speech by comparing classifier outputs
US5608713 * | Feb 8, 1995 | Mar 4, 1997 | Sony Corporation | Bit allocation of digital audio signal blocks by non-linear processing
US5638486 * | Oct 26, 1994 | Jun 10, 1997 | Motorola, Inc. | Method and system for continuous speech recognition using voting techniques
US5642111 * | Jan 19, 1994 | Jun 24, 1997 | Sony Corporation | High efficiency encoding or decoding method and device
US5706392 * | Jun 1, 1995 | Jan 6, 1998 | Rutgers, The State University Of New Jersey | Perceptual speech coder and method
US5752224 * | Jun 4, 1997 | May 12, 1998 | Sony Corporation | Information encoding method and apparatus, information decoding method and apparatus information transmission method and information recording medium
US5758316 * | Jun 13, 1995 | May 26, 1998 | Sony Corporation | Methods and apparatus for information encoding and decoding based upon tonal components of plural channels
US5781586 * | Jul 26, 1995 | Jul 14, 1998 | Sony Corporation | Method and apparatus for encoding the information, method and apparatus for decoding the information and information recording medium
US5819214 * | Feb 20, 1997 | Oct 6, 1998 | Sony Corporation | Length of a processing block is rendered variable responsive to input signals
US5832426 * | Dec 7, 1995 | Nov 3, 1998 | Sony Corporation | High efficiency audio encoding method and apparatus
US6128592 * | May 13, 1998 | Oct 3, 2000 | Sony Corporation | Signal processing apparatus and method, and transmission medium and recording medium therefor
US6647063 | Jul 26, 1995 | Nov 11, 2003 | Sony Corporation | Information encoding method and apparatus, information decoding method and apparatus and recording medium
USRE36559 * | May 18, 1994 | Feb 8, 2000 | Sony Corporation | Method and apparatus for encoding audio signals divided into a plurality of frequency bands
Classifications
U.S. Classification: 704/204, 704/215, 704/214, 704/207
International Classification: G10L11/00
Cooperative Classification: G10L25/00, H05K999/99
European Classification: G10L25/00