Publication number | USRE43099 E1 |

Publication type | Grant |

Application number | US 12/313,140 |

Publication date | Jan 10, 2012 |

Filing date | Nov 17, 2008 |

Priority date | Dec 19, 1996 |

Also published as | DE69703233D1, DE69703233T2, EP0852375A1, EP0852375B1, US5839098 |

Publication number | 12313140, 313140, US RE43099 E1, US RE43099E1, US-E1-RE43099, USRE43099 E1, USRE43099E1 |

Inventors | Rajiv Laroia, Boon-Lock Yeo |

Original Assignee | Alcatel Lucent |


US RE43099 E1

Abstract

Coding systems that provide a perceptually improved approximation of the short-term characteristics of speech signals compared to typical coding techniques such as linear predictive analysis while maintaining enhanced coding efficiency. The invention advantageously employs a non-linear transformation and/or a spectral warping process to enhance particular short-term spectral characteristic information for respective voiced intervals of a speech signal. The non-linear transformed and/or warped spectral characteristic information is then coded, such as by linear predictive analysis, to produce a corresponding coded speech signal. The use of the non-linear transformation and/or spectral warping operation of the particular spectral information advantageously causes more coding resources to be used for those spectral components that contribute more to the perceptible quality of the corresponding synthesized speech. It is possible to employ this coding technique in a variety of speech coding techniques including, for example, vocoder and analysis-by-synthesis coding systems.

Claims(38)

1. A method for coding a speech signal to generate a coded signal comprising:

generating a sequence of spectral magnitude values for a frame interval of said speech signal representing voiced speech, said spectral magnitude value sequence characterizing spectral components of a short-term frequency spectrum of said interval;

performing at least one of a non-linear transformation or spectral warping process on said sequence to produce an intermediate spectral value sequence having an enhanced characterization of at least one particular frequency range relative to another frequency range in the intermediate spectral sequence; and

coding said intermediate spectral value sequence to produce at least a portion of said coded signal for said interval of said speech signal.

2. The method of claim 1 wherein said coding step codes said processed spectral value sequence based on linear predictive analysis.

3. The method of claim 2 wherein said coding step comprises:

inverse transforming said intermediate spectral values into a time domain representation signal; and

generating linear predictive codes for said time domain representation signal.

4. The method of claim 1 wherein said step of performing non-linear transformation includes processing at least a portion of said spectral magnitude value sequence according to the expression [A(i)]^{N}, where A(i) represents the respective values in said sequence portion and the value N is not 0 or 1.

5. The method of claim 4 where the value N is a value less than 0 and not less than −1.

6. The method of claim 1, further comprising performing a spectral warping process on said sequence of spectral magnitude values, and wherein said coding step includes generating a warp code for said coded signal indicating a portion of said sequence warped by said warping process.

7. The method of claim 6 wherein said warp code is an index of an entry in a warping function codebook.

8. The method of claim 1 further comprising performing spectral warping on said sequence to produce an intermediate spectral value sequence having an enhanced characterization of at least one particular frequency range relative to another frequency range in the intermediate spectral sequence, wherein said step of performing spectral warping comprises increasing the number of values in a portion of said intermediate spectral value sequence characterizing a particular frequency range that would affect the perceptual quality of a corresponding speech signal synthesized from said coded signal.

9. The method of claim 8 wherein said step of performing spectral warping comprises decreasing the number of values in at least one other portion of said intermediate spectral value sequence characterizing another particular frequency range.

10. The method of claim 1 wherein the particular operation performed for said non-linear transformation or spectral warping process is based on a property of said speech signal.

11. The method of claim 10 wherein said property of said speech signal is a duration of a pitch period of said frame interval.

12. The method of claim 1 wherein the particular frequency range represented in the spectral magnitude value sequence that is warped by said warping process is selected based on the value magnitudes representing the signal energy for such frequency range.

13. The method of claim 1 wherein said coding step performs analysis-by-synthesis coding.

14. The method of claim 13 wherein said analysis-by-synthesis coding is code-excited linear prediction analysis.

15. The method of claim 1 wherein said step of generating said spectral magnitude value sequence characterizing said short-term frequency spectrum generates such sequence based on spectral components of at least one pitch period interval in said frame.

16. The method of claim 15 wherein said step of generating the sequence of spectral magnitude values comprises:

identifying a portion of said frame interval of said speech signal representing a pitch period;

performing a discrete Fourier transform of said identified portion of said frame interval to generate a sequence of spectral component values; and

determining respective magnitudes of said spectral component values to produce said spectral magnitude value sequence for said frame interval.

17. A method for decoding a coded speech signal, said coded signal including successive coded frame intervals of a speech signal, the decoding of a frame interval of said coded signal comprising the steps of:

generating an intermediate spectral value sequence for at least a portion of said interval representing voiced speech, said intermediate spectral value sequence characterizing spectral components of a short-term frequency spectrum of said interval and further having an enhanced characterization of at least one particular frequency range relative to another frequency range; and

processing said intermediate spectral value sequence with at least one of an inverse non-linear transformation or inverse spectral warping process to produce a sequence of spectral magnitude values characterizing the short-term frequency spectrum for the voiced portion of said interval.

18. The method of claim 17 wherein said short-term frequency spectrum represented in said intermediate spectral value sequence is a pitch period of voiced speech represented in said interval.

19. The method of claim 17 wherein said step of processing by inverse non-linear transformation includes processing at least a portion of said spectral magnitude value sequence according to the expression [Ā′(i)]^{N}, where Ā′(i) represents the respective values in said sequence portion and the value N is not 0 or 1, and wherein said expression performs an inverse transformation of a non-linear transformation used in coding said coded signal interval.

20. The method of claim 17, further comprising processing said intermediate spectral value sequence with an inverse spectral warping process, and receiving a warp code for said coded signal interval indicating a portion of said intermediate spectral value sequence warped during said coded signal interval.

21. The method of claim 20 wherein said warp code is an index of an entry in a warping function codebook.

22. The method of claim 17 further comprising processing said intermediate spectral value sequence with an inverse spectral warping process to produce a sequence of spectral magnitude values characterizing the short-term frequency spectrum for the voiced portion of said interval, wherein said step of processing by inverse warping said intermediate spectral value sequence comprises adjusting a number of spectral values in the intermediate spectral value sequence characterizing at least one particular frequency range in producing said spectral magnitude value sequence and wherein said spectral value adjustment corresponds to inverse warping used in coding said coded signal interval.

23. The method of claim 17 wherein the particular operation performed for said inverse non-linear transformation or spectral warping process is based on a property of said coded speech signal.

24. The method of claim 23 wherein said property of said speech signal is a duration of a pitch period in said coded speech signal interval.

25. The method of claim 17 wherein said generating step includes analysis-by-synthesis decoding.

26. The method of claim 25 wherein said analysis-by-synthesis decoding is based on code-excited linear prediction analysis and comprises receiving codes identifying a respective excitation codebook entry corresponding to said interval.

27. A coder for generating a coded signal based on a speech signal comprising:

a spectral transformer for generating a sequence of spectral magnitude values for a frame interval of said speech signal representing voiced speech, said spectral magnitude value sequence characterizing spectral components of a short-term frequency spectrum of said frame interval;

an encoder coupled to said spectral processor, said encoder for performing at least one of a non-linear transformation or a spectral warping process on said sequence to produce an intermediate spectral value sequence having an enhanced characterization of at least one particular frequency range relative to another frequency range in the intermediate spectral sequence; and

a spectral coder coupled to said encoder, said spectral coder for coding said intermediate spectral value sequence to produce at least a portion of said coded signal for said interval of said speech signal.

28. The coder of claim 27 wherein said spectral coder comprises:

an inverse transformer for inverse transforming said spectral parameters processed by said spectral processor into a time domain representation signal; and

a linear predictive code generator for generating linear predictive coefficients for said coded signal based on said time domain representation signal for said interval of said speech signal.

29. The coder of claim 27 wherein said spectral coder includes a vocoder.

30. The coder of claim 27 wherein said spectral coder includes an analysis-by-synthesis coder.

31. The coder of claim 30 wherein said analysis-by-synthesis coder is a code-excited linear prediction coder.

32. The coder of claim 27 wherein said spectral transformer for generating said spectral magnitude value sequence characterizing spectral components of a short-term frequency spectrum performs a transformation based on at least one pitch period represented in said interval.

33. The coder of claim 32 wherein said spectral transformer comprises:

a window processor and pitch detector for identifying an interval in said frame interval of said speech signal representing a pitch period; and

a discrete Fourier transformer coupled to said window processor, said discrete Fourier transformer for generating said spectral magnitude value sequence for said interval.

34. A coder for generating a coded signal from a speech signal comprising:

means for generating a sequence of spectral magnitude values for a frame interval of said speech signal representing voiced speech, said spectral magnitude value sequence characterizing spectral components of a short-term frequency spectrum of said interval;

means for performing at least one of a non-linear transformation or spectral warping process on said sequence to produce an intermediate spectral value sequence having an enhanced characterization of at least one particular frequency range relative to another frequency range in the intermediate spectral sequence; and

means for coding said intermediate spectral value sequence to produce at least a portion of said coded signal for said interval of said speech signal.

35. A decoder for decoding a coded speech signal, said coded signal including successive coded frame intervals of a speech signal, said decoder comprising:

a spectral decoder, said spectral decoder for generating an intermediate spectral value sequence for voiced speech represented in said frame interval of the coded signal, said intermediate spectral value sequence characterizing spectral components of a short-term frequency spectrum of said voiced speech and further having an enhanced characterization of at least one particular frequency range relative to another frequency range; and

inverse processor coupled to said spectral decoder, said inverse processor for processing said intermediate spectral value sequence with at least one of an inverse non-linear transformation or inverse spectral warping process to produce a sequence of spectral magnitude values characterizing a short-term frequency spectrum for the voiced portion of said interval.

36. The decoder of claim 35 wherein said spectral decoder includes an analysis-by-synthesis decoder.

37. The decoder of claim 35 wherein said analysis-by-synthesis decoder performs code-excited linear prediction analysis.

38. A decoder for decoding a coded speech signal, said coded signal including successive coded frame intervals of a speech signal, said decoder comprising:

means for generating an intermediate spectral value sequence for voiced speech represented in said frame interval of the coded signal, said intermediate spectral value sequence characterizing spectral components of a short-term speech spectrum of voiced speech represented in said interval and further having an enhanced characterization of at least one particular frequency range relative to another frequency range; and

means for processing said intermediate spectral value sequence with at least one of an inverse non-linear transformation or inverse spectral warping process to produce a sequence of spectral magnitude values characterizing said short-term frequency spectrum for the voiced portion of said interval.

Description

The invention relates generally to speech communication systems and more specifically to systems for encoding and decoding speech.

Digital speech communication systems including voice storage and voice response systems use speech coding and data compression techniques to reduce the bit rate needed for storage and transmission. Voiced speech is produced by a periodic excitation of the vocal tract by the vocal cords. As a consequence, a corresponding signal for voiced speech contains a succession of similar but evolving waveforms having a substantially common period, which is referred to as the pitch period. Typical speech coding systems take advantage of short-term redundancies within a pitch period interval to achieve data compression in a coded speech signal.

In a typical voice coder (vocoder) system, such as that described in U.S. Pat. No. 3,624,302, which is incorporated by reference herein, the speech signal is partitioned into successive fixed duration intervals of 10 msec. to 30 msec. and a set of coefficients is generated approximating the short-term frequency spectrum resulting from the short-term redundancies or correlation in each interval. These coefficients are generated by linear predictive analysis and referred to as linear predictive coefficients (LPC's). The LPC's represent a time-varying all-pole filter that models the vocal tract. The LPC's are useable for reproducing the original speech signal by employing an excitation signal referred to as a prediction residual. The prediction residual represents a component of the original speech signal that remains after removal of the short-term redundancy by linear predictive analysis.
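Linear predictive analysis of this kind is commonly carried out with the autocorrelation method and the Levinson-Durbin recursion. The following Python sketch is illustrative only and is not taken from the cited patent; the function names and the sign convention (each sample approximated as a weighted sum of the preceding P samples) are assumptions.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag] with r[k] = sum over n of x[n] * x[n + k]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for a[1..order].

    Assumed convention: s[n] is approximated by sum_k a[k] * s[n - k].
    Returns (coefficients, residual_energy). r[0] must be positive and
    the residual energy must stay positive at every stage.
    """
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]                       # prediction error energy so far
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient of stage i
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err
```

For an autocorrelation sequence that decays geometrically, such as 1, 0.5, 0.25, a second-order analysis yields a first coefficient of 0.5 and a second coefficient of zero, since a first-order predictor already captures all of the correlation.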

In vocoders, the prediction residual is typically modeled as white noise for unvoiced sounds and a periodic sequence of impulses for voiced speech. A synthesized speech signal can be generated by a vocoder synthesizer based on the modeled residual and the LPC's of the linear predictive filter modeling the vocal tract. Vocoders approximate the spectral information of an original speech signal and not the time-domain waveform of such a signal. Moreover, a speech signal synthesized from such codes often exhibits a perceptible synthetic quality that is, at times, difficult to understand.
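The synthesis model described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the synthesizer of any particular vocoder: the function names, the fixed random seed, and the filter convention y[n] = e[n] + sum_k a[k] * y[n - k] are all hypothetical choices made for the example.

```python
import random


def impulse_train(num_samples, pitch_period, gain=1.0):
    """Modeled residual for voiced speech: one impulse per pitch period."""
    return [gain if n % pitch_period == 0 else 0.0
            for n in range(num_samples)]


def white_noise(num_samples, gain=1.0, seed=0):
    """Modeled residual for unvoiced speech: Gaussian white noise."""
    rng = random.Random(seed)
    return [gain * rng.gauss(0.0, 1.0) for _ in range(num_samples)]


def all_pole_filter(excitation, a):
    """Filter the excitation through the all-pole model of the vocal tract.

    a[k - 1] holds the k-th predictor coefficient (assumed convention).
    """
    out = []
    for n, e in enumerate(excitation):
        y = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                y += ak * out[n - k]
        out.append(y)
    return out
```

A voiced frame would then be synthesized as `all_pole_filter(impulse_train(240, 40), lpc)` and an unvoiced frame as `all_pole_filter(white_noise(240), lpc)`, where `lpc` stands for the decoded coefficient list.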

Alternative known speech coding techniques having improved perceptual speech quality approximate the waveform of a speech signal. Conventional analysis-by-synthesis systems employ such a coding technique. Typical analysis-by-synthesis systems are able to achieve synthesized speech having acceptable perceptual quality. Such systems employ both linear predictive analysis for coding the short-term redundant characteristics of the pitch period as well as a long-term predictor (LTP) for coding long term pitch correlation in the prediction residual. In LTP's, characteristics of past pitch periods are used to provide an approximation of characteristics of a present pitch period. Typical LTP's have included an all-pole filter providing delayed feedback of past pitch-period characteristics, or a codebook of overlapping vectors of past pitch-period characteristics.

In particular analysis-by-synthesis systems, the prediction residual is modeled by an adaptive or stochastic codebook of noise signals. The optimum excitation is found by searching through the codebook of candidate excitation vectors for successive speech intervals referred to as frames. A code specifying the particular codebook entry of the found optimum excitation is then transmitted on a channel along with coded LPC's and the LTP parameters. These particular analysis-by-synthesis systems are referred to as code-excited linear prediction (CELP) systems. Exemplary CELP coders are described in greater detail in B. Atal and M. Schroeder, “Stochastic Coding of Speech Signals at Very Low Bit Rates”, Proceedings IEEE Int. Conf. Comm., p. 48.1 (May 1984); M. Schroeder and B. Atal, “Code-Excited Linear Predictive (CELP): High Quality Speech at Very Low Bit Rates”, Proc. IEEE Int. Conf. ASSP, pp. 937-940 (1985); and P. Kroon and E. Deprettere, “A Class of Analysis-by-Synthesis Predictive Coders for High-Quality Speech Coding at Rate Between 4.8 and 16 KB/s”, IEEE J. on Sel. Areas in Comm., SAC-6(2), pp. 353-363 (Feb. 1988), which are all incorporated by reference herein.
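The codebook search at the heart of such a system can be illustrated with a deliberately reduced Python sketch. It assumes a plain squared-error criterion with a per-vector optimal gain and omits the long-term predictor and any perceptual weighting that practical CELP coders use; the function names and the filter convention are hypothetical.

```python
def synthesize(excitation, a):
    """All-pole synthesis: y[n] = e[n] + sum_k a[k - 1] * y[n - k]."""
    out = []
    for n, e in enumerate(excitation):
        y = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                y += ak * out[n - k]
        out.append(y)
    return out


def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the entry that best matches target.

    Each candidate excitation is passed through the synthesis filter,
    scaled by its least-squares gain, and compared with the target frame.
    """
    best = None
    for index, vector in enumerate(codebook):
        y = synthesize(vector, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue                      # silent entry, nothing to scale
        gain = sum(t * v for t, v in zip(target, y)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best
```

Only the winning index and the quantized gain would need to be transmitted, since the decoder is assumed to hold an identical codebook.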

However, in vocoder and analysis-by-synthesis systems as well as other types of speech coding systems, there is a recognized need for methods of coding characteristics of the short-term frequency spectrum with enhanced perceptual accuracy.

As shown in the accompanying figure, the invention performs a non-linear transformation **301** and/or spectral warping process **302** on a sequence **303** of spectral magnitude values characterizing the short-term frequency spectrum of respective voiced speech frames prior to spectral coding **304** by, for example, linear predictive analysis. Spectral warping spreads or compresses particular frequency ranges represented in the spectral characterization sequence based on the effect such frequency ranges have on the perceptual quality of corresponding speech synthesized from the coded signal.

In particular, spectral warping spreads frequency ranges that substantially affect the perceptual quality of corresponding synthesized speech and compresses perceptually less significant frequency ranges. In a corresponding manner, the non-linear transformation performs a magnitude warping operation on the spectral magnitude values. Such transformation amplifies and/or attenuates spectral magnitude values to enhance the characterization of the perceptual quality of a corresponding synthesized speech signal.
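The claims express one such magnitude warping as raising each value A(i) to a power N, with N not equal to 0 or 1, and the decoder applying the inverse. A minimal Python sketch of that pair follows. The function names are illustrative assumptions, and the exponent in the usage note is chosen only because it falls in the range recited in claim 5.

```python
def power_law_transform(magnitudes, n):
    """Coder side: replace each spectral magnitude A(i) by A(i) ** n.

    For negative n every magnitude must be strictly positive.
    """
    if n in (0, 1):
        raise ValueError("n must not be 0 or 1")
    return [a ** n for a in magnitudes]


def inverse_power_law(values, n):
    """Decoder side: undo power_law_transform by raising to 1 / n."""
    if n in (0, 1):
        raise ValueError("n must not be 0 or 1")
    return [v ** (1.0 / n) for v in values]
```

With N = -0.5, the magnitudes 4.0, 1.0 and 0.25 map to 0.5, 1.0 and 2.0, so small magnitudes receive large intermediate values and large magnitudes receive small ones; applying `inverse_power_law` with the same N restores the original values.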

The invention is based on the realization that typical coding methods, including linear predictive analysis, perform coding of the short-term frequency spectrum of a speech signal with substantially equal coding resources used for respective frequency components whether such frequency components substantially affect the perceptual quality of a speech signal synthesized from the coded signal or otherwise. In other words, typical coding techniques do not perform coding of frequency components of the short-term frequency spectrum characterization based on the perceptual accuracy such frequency components produce in a corresponding synthesized speech signal.

In contrast, the present invention processes the spectral component values by spectral warping and/or non-linear transformation to produce a transformed and/or warped characterization that causes subsequent spectral coding, such as by linear predictive analysis, to provide more coding resources for perceptually more significant spectral components and less coding resources to those spectral components that are less perceptually significant. Accordingly, the resulting synthesized voiced speech produced from such a coded signal would have an improved perceptual quality while maintaining an advantageous coding efficiency relative to the coding process alone.

A corresponding decoder according to the invention employs a complementary inverse non-linear transformation and/or spectral warping process to obtain the corresponding approximation of the original short-term frequency spectrum of the respective frames of the speech signal with improved perceptual quality.

It is possible to employ the coding technique of the invention in a variety of spectral coding arrangements including, for example, vocoder and analysis-by-synthesis coding systems, or other techniques where linear prediction analysis has been used for characterizing the short-term frequency spectrum of a speech signal.

Additional features and advantages of the present invention will become more readily apparent from the following detailed description and accompanying drawings.

The invention advantageously employs processing of successive frames of a speech signal by performing a non-linear transformation and/or spectral warping process on spectral magnitude value sequences characterizing the short-term frequency spectrum of respective voiced speech frames prior to spectral coding by, for example, linear predictive analysis. As used herein, “short-term frequency spectrum” refers to spectral characteristics arising from the short-term correlation in the speech signal excluding the correlation resulting from the pitch periodicity. The short-term frequency spectrum is alternatively referred to as the short-time frequency spectrum in the art, and is described in greater detail in L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, sects. 6.0-6.1, pp. 250-282 (Prentice-Hall, New Jersey, 1978), which is incorporated by reference herein in its entirety.

Spectral warping spreads or compresses particular frequency ranges represented in the spectral magnitude value sequence based on the effect such frequency ranges have on the perceptual accuracy produced in corresponding speech synthesized from the coded signal. In a corresponding manner, the non-linear transformation performs a magnitude warping operation on the spectral magnitude values. Such transformation amplifies and/or attenuates the spectral magnitude values to enhance the characterization for producing an improved perceptual accuracy in corresponding synthesized speech.
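One way to picture spreading and compressing is as a resampling of the magnitude sequence: a spread range is represented by more output values than it had input values, and a compressed range by fewer. The Python sketch below illustrates that idea only. The two-band piecewise-linear warping function, the linear interpolation, and all names are assumptions made for the example and are not the warping functions of the invention.

```python
def piecewise_positions(length, breakpoint, n_low, n_high):
    """Source positions for a two-band warp of a sequence of `length` values.

    n_low output values cover source indices [0, breakpoint) and
    n_high output values (at least 2) cover [breakpoint, length - 1].
    """
    last = length - 1
    low = [breakpoint * i / n_low for i in range(n_low)]
    high = [breakpoint + (last - breakpoint) * i / (n_high - 1)
            for i in range(n_high)]
    return low + high


def warp_spectrum(magnitudes, positions):
    """Resample magnitudes at fractional indices by linear interpolation."""
    last = len(magnitudes) - 1
    out = []
    for p in positions:
        if not 0 <= p <= last:
            raise ValueError("position outside the input sequence")
        lo = int(p)
        hi = min(lo + 1, last)
        frac = p - lo
        out.append((1.0 - frac) * magnitudes[lo] + frac * magnitudes[hi])
    return out
```

For a nine-value sequence, `piecewise_positions(9, 4, 8, 3)` spends eight output values on the first four input values and three on the remaining five, so the low range is spread and the high range is compressed. A decoder would need to know which warping function was applied in order to undo it, which is the role the claims assign to the warp code.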

The invention is based on the realization that typical coders, including linear predictive coders, code frequency components of a voiced speech signal interval such that perceptually significant frequency components are coded using identical or similar resources to that used for coding perceptually less significant frequency components. In contrast, the invention processes the spectral magnitude values by spectral warping and/or non-linear transformation to produce a transformed and/or warped characterization having an enhanced characterization of at least one particular frequency range that causes the coder to provide more coding resources to perceptually more significant spectral components and less coding resources to those spectral components that are less perceptually significant. Accordingly, synthesized speech produced from such a coded speech signal has an improved perceptual quality relative to the coding process alone while maintaining an advantageous coding efficiency.

The invention is described below with regard to using linear predictive analysis for providing the spectral coding for illustration purposes only and is not intended to be a limitation of the invention. It is alternatively possible to employ numerous other spectral coding techniques that code the frequency components of the short-term frequency spectrum by methods other than coding based on a corresponding perceptual quality or accuracy that such components would have in corresponding synthesized speech. For instance, it is possible to use a spectral coder according to the invention that does not allocate coded signal bits or coding resources based on the perceptual quality of the respective spectral components.

The invention is useable in a variety of coder systems for encoding the short-term vocal tract characteristics of voiced speech including, for example, vocoders or analysis-by-synthesis systems such as CELP coders. Exemplary vocoder and CELP type coder and decoder systems employing the technique of the invention are illustrated in

For clarity of explanation, the illustrative embodiments of the invention are shown as including, among other things, individual function blocks. The functions these blocks represent may be provided through the use of either shared or dedicated hardware including hardware capable of executing software instructions. For example, such functions can be performed by digital signal processor (DSP) hardware, such as the Lucent DSP16 or DSP32C, and software performing the operations discussed below, which is not meant to be a limitation of the invention. It is also possible to use very large scale integration (VLSI) hardware components as well as hybrid DSP/VLSI arrangements in accordance with the invention.

An exemplary vocoder-type coder arrangement **1** according to the invention is depicted in **5** that produces a corresponding analog speech signal. This analog speech signal is bandlimited and converted into a sequence of pulse samples by filter and sampler circuit **10**. It is possible for the bandlimiting filter to remove frequency components of the speech signal above 4.0 kHz and for the sampling rate ƒs to be 8.0 kHz, as is typically used for processing speech signals. Each speech signal sample is then transformed into an amplitude representative sequence of digital codes S(n) by analog-to-digital converter **15**. The sequence S(n) is commonly referred to as digitized speech. The digitized speech S(n) is supplied to a short-term frequency spectrum processor **20**, which determines and codes the corresponding short-term spectral characteristics from the digitized speech S(n) according to the invention.

The processor **20** sequentially processes intervals of the sequence S(n) in frames or blocks corresponding to a substantially fixed duration of time such as in the range of 15 msec. to 70 msec. For instance, a 30 msec. frame duration for speech sampled at a rate of 8.0 kHz corresponds to a frame of 240 samples from the sequence S(n) and a frame rate of approximately 33 frames/sec. The processor **20** first determines if a sequence frame represents speech that is voiced or unvoiced. If the frame represents voiced speech, then the processor **20** determines spectral component values representing a short-term frequency spectrum for at least one pitch period in the frame. Numerous methods can be employed for producing the spectral component values representing the short-term frequency spectrum of the frame. An exemplary method is described in greater detail below with respect to

Nevertheless, in the encoder **20**, the spectral component values representing the short-term frequency spectrum of the frame are then processed by a non-linear transformation and/or spectral warping operation to produce a sequence of transformed and/or warped values or intermediate values according to the invention. A particular spectral warping operation is selected to enhance characterization of at least one particular frequency range of the frame of the speech signal relative to another spectral range. It is advantageous for the enhanced spectral range to be a range that substantially affects the perceptible quality of corresponding synthesized speech.

The processor **20** then determines autocorrelation coefficients corresponding to the transformed and/or warped spectral values. A spectral coding technique such as linear predictive analysis is then performed on the autocorrelation coefficients to produce a coefficient sequence, such as linear predictive coefficients (LPC's), that are quantized to produce the quantized coefficient sequence ά_{1}, ά_{2}, . . . , ά_{P} for the processed frame of the digitized speech signal S(n). The number of coefficients P corresponds to the order of the linear predictive analysis.
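A standard way to obtain autocorrelation coefficients from spectral magnitudes is the inverse discrete Fourier transform of the squared magnitudes (the Wiener-Khinchin relation). The Python sketch below applies it to a full-length, symmetric magnitude sequence. Whether the processor squares the intermediate values or uses them in some other way is not stated in this passage, so the squaring step, like the function names, is an assumption of the sketch.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes |X(k)|, k = 0..M-1, of the DFT of a real sequence x."""
    m = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / m)
                    for n in range(m)))
            for k in range(m)]


def autocorr_from_magnitudes(magnitudes, max_lag):
    """Circular autocorrelation r[0..max_lag] from DFT magnitudes.

    Assumes the magnitudes come from a real sequence, so the spectrum is
    symmetric and the inverse transform reduces to a cosine sum.
    """
    m = len(magnitudes)
    power = [a * a for a in magnitudes]
    return [sum(p * math.cos(2.0 * math.pi * k * lag / m)
                for k, p in enumerate(power)) / m
            for lag in range(max_lag + 1)]
```

For the four-sample sequence 1, 2, 3, 4 the result at lag 0 is 30, the energy of the sequence, and the values at lags 1 and 2 are the circular correlations 24 and 22. Coefficients r[0] through r[P] obtained this way are the input a P-th order linear predictive analysis requires.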

The quantized coefficient sequence ά_{1}, ά_{2}, . . . , ά_{P} is provided by the processor **20** to the channel coder **30**, which converts the quantized sequence into a form suitable for transmission over a transmission medium or storage in a storage medium. Exemplary conversions for transmission include conversion of the codes into electrical signals for transmitting over a wired or wireless transmission medium or light signals over an optical transmission medium. In a similar manner, exemplary conversions for storage include conversion of the codes into recordable signals for storage into a magnetic or optical data storage medium. Since LPC's are typically not readily amenable to quantization, it is possible for the LPC's to be transformed into an equivalent quantizable form such as conventional line spectral pair (LSP) or partial correlation (PARCOR) parameters for forming the quantized coefficient sequence ά_{1}, ά_{2}, . . . , ά_{P}.
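As an illustration of such a transformation, the Python sketch below converts predictor coefficients to reflection coefficients with the step-down recursion and then quantizes each one uniformly. Reflection coefficients are closely related to PARCOR parameters, although sign conventions differ between texts. The predictor convention, the uniform quantizer, and every name here are assumptions made for the example rather than details from the patent.

```python
def lpc_to_reflection(a):
    """Step-down recursion: predictor coefficients a[0..P-1] to k[0..P-1].

    Assumes s[n] is approximated by sum_k a[k - 1] * s[n - k] and that the
    corresponding filter is stable, so every |k| is below 1.
    """
    current = list(a)
    ks = [0.0] * len(a)
    for i in range(len(a), 0, -1):
        k = current[i - 1]
        if abs(k) >= 1.0:
            raise ValueError("unstable predictor")
        ks[i - 1] = k
        current = [(current[j] + k * current[i - 2 - j]) / (1.0 - k * k)
                   for j in range(i - 1)]
    return ks


def quantize_reflection(value, bits):
    """Map a value in (-1, 1) to an integer index with 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    return max(0, min(levels - 1, int((value + 1.0) / step)))


def dequantize_reflection(index, bits):
    """Return the midpoint of the quantization cell for the given index."""
    step = 2.0 / (2 ** bits)
    return -1.0 + (index + 0.5) * step
```

Because each reflection coefficient of a stable filter lies strictly between -1 and 1, a bounded quantizer of this kind cannot push the decoded filter into instability, which is one reason such parameters are preferred over raw LPC's for quantization.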

The remaining output signals of the processor **20** include a warp code signal W indicating the warping function, if any, used to warp the spectral component values representing the short-term frequency spectrum for the respective voiced speech frames. The processor **20** also produces other output signals typically generated in conventional speech coding systems including signals representing whether the processed speech frame includes voiced or unvoiced speech, a gain constant G for the processed frame and a signal X for the pitch period duration if the processed frame is voiced speech.

An exemplary configuration for the short-term frequency spectrum processor **20** according to the invention is shown in **40**. The N digital values for S(nj+i), i=1,2, . . . , N, for the j-th frame to be processed are provided to a pitch detector **50** and a window processor **55**. The use of the previously described non-overlapping frame intervals is for illustration purposes only and it should be readily understood that overlapping frame intervals are also useable in accordance with the invention.

The pitch detector **50** determines if a voiced component is represented in the frame of the speech signal, or if the frame contains entirely unvoiced speech. If the detector **50** detects a voiced speech component, it determines the corresponding pitch period. A pitch period indicates the number of digitized samples in one cycle of the substantially periodic voiced speech signal. Typically, a pitch period possesses a duration on the order of 3 msec. to 20 msec., which corresponds to 24 to 160 digital samples based on a sampling rate of 8.0 kHz.

Exemplary methods for determining if a frame contains a voiced speech component and for identifying pitch period intervals are described in the previously cited Digital Processing of Speech Signals book, sects. 4.8, 7.2, 8.10.1, pp. 150-157, 372-378, 447-450. It is possible to determine a pitch period interval by examining the long-term correlation in the speech frame and/or by performing linear predictive analysis on the speech frame and identifying the location of pitch impulses in the resulting prediction residual. The pitch detector **50** also determines the gain constant G based on the energy of the samples comprising the frame sequence being processed. The method used for such a determination is not critical to practicing the invention. An exemplary method for determining the gain constant G is also described in the previously cited Digital Processing of Speech Signals book, sect. 8.2, pp. 404-407.
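
One possible way to examine the long-term correlation in a frame is sketched below, under stated assumptions: the normalized autocorrelation is searched over lags of 24 to 160 samples, and the frame is declared voiced when the peak exceeds a threshold. The function name, the threshold of 0.5 and the normalization by frame energy are assumptions made for illustration and are not taken from the description or from the cited book.

```python
import math


def estimate_pitch_lag(frame, min_lag=24, max_lag=160, threshold=0.5):
    """Return (is_voiced, lag) based on the peak of the normalized autocorrelation."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return False, 0
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        # Correlation between the frame and a copy of itself delayed by `lag`.
        corr = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        corr /= energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_corr >= threshold, best_lag


# Synthetic test signal: a sinusoid repeating every 50 samples (hypothetical data).
frame = [math.sin(2.0 * math.pi * n / 50.0) for n in range(400)]
voiced, lag = estimate_pitch_lag(frame)
print(voiced, lag)
```

A practical detector would usually add safeguards against picking a multiple of the true pitch period; this sketch omits them.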

The window processor **55** determines a window function that is essentially a pitch period in duration based on a signal X indicating the pitch period determined by the pitch detector **50**. The window processor **55** multiplies the digital samples of the frame received from the partitioner **40** with the determined window function to obtain a sequence of digital values S_{j}(i), i=1, . . . , M, that is essentially a pitch period in duration, where M represents the number of non-zero samples obtained by the window function for the frame j being processed. Typically desirable window functions have gradual roll-offs. As a consequence, it is possible for the processor **55** to determine a window function that supports larger intervals than a pitch period to obtain the desired sequence S_{j}(i). Accordingly, although the digital values obtained from such a window function correspond to a duration longer than a pitch period, such an interval is still referred to as a pitch period interval in this description of the invention.

Moreover, it is advantageous to align the determined window function relative to the frame sequence of digitized speech samples for obtaining essentially a pitch period interval of samples from the beginning of a pitch period to the beginning of a next pitch period. It is possible for the pitch detector **50** to identify the beginnings of consecutive pitch period intervals by identifying respective pitch impulses occurring in a corresponding produced prediction residual using, for example, conventional linear predictive analysis on the speech frame interval.

The sequence S_{j}(i) produced by the window processor **55** for the frame j is provided to a spectral processor **60**. The spectral processor **60** generates the corresponding spectral magnitude values A(i), i=0, 1, . . . , K−1, of the short-term frequency spectrum of the pitch period speech sequence S_{j}(i) such as by performing a discrete Fourier transform (DFT) of the sequence and determining the magnitude of the resulting transformed coefficients. The number of spectral values K should be selected to provide a sufficient frequency resolution to adequately characterize the short-term frequency spectrum of the pitch period for coding. Larger values of K provide improved frequency resolution of the short-term frequency spectrum. Typically, values of K in the approximate range of 128 to 1024 provide sufficient frequency resolution. If the value K is greater than the number of samples M in the pitch period speech sequence S_{j}(i), then K−M zeros can be appended to the sequence S_{j}(i) prior to DFT processing.
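
The zero-padding and magnitude steps described above can be sketched as follows. This is a naive DFT written for clarity rather than speed, and the function name `spectral_magnitudes` is an illustrative assumption; a practical system would more likely use an FFT.

```python
import cmath


def spectral_magnitudes(samples, k):
    """Return the K magnitudes |DFT| of `samples` after appending K-M zeros."""
    if k < len(samples):
        raise ValueError("k must be at least the number of samples")
    padded = list(samples) + [0.0] * (k - len(samples))
    magnitudes = []
    for i in range(k):
        # i-th DFT coefficient of the zero-padded sequence.
        coefficient = sum(
            x * cmath.exp(-2j * cmath.pi * i * n / k) for n, x in enumerate(padded)
        )
        magnitudes.append(abs(coefficient))
    return magnitudes


# A single unit sample padded to K=4 has a flat magnitude spectrum.
flat = spectral_magnitudes([1.0], 4)
print(flat)
```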

The spectral magnitude sequence A(i) represents a sampled version of a continuous, i.e., non-discrete, short-term frequency spectrum A(z). However, the spectral magnitude sequence A(i) will alternatively be referred to as the short-term frequency spectrum for ease of explanation. A conventional DFT processor is useable to generate the desired spectral magnitude values A(i). However, phase components in addition to the desired magnitude components are typically produced by conventional DFT processors and are not required for this particular embodiment of the invention. Accordingly, since the phase component is not required according to the invention, other transforms that directly generate magnitude values are useable for the spectral processor **60**. Also, a fast Fourier transform (FFT) processor can be used for the spectral processor **60**. A plot of a short-term frequency spectrum A(z) represented by an exemplary sequence of spectral magnitude values A(i) for a pitch period of an exemplary speech signal is shown in

Moreover, the previously described method for producing the spectral magnitude value sequence A(i) characterizing the short-term frequency spectrum of the frame j is for illustration purposes only and is not meant as a limitation of the invention. It should be readily understood that numerous other techniques are useable for producing such a sequence characterizing the short-term frequency spectrum of the frame j.

Referring again to **60** is then provided to spectral warper **65**. The spectral warper **65** warps the sequence A(i) to generate a frequency warped sequence of spectral magnitude values A′(i). In producing the sequence, the warper **65** spreads, in frequency, respective spectral magnitude values for at least one frequency range that would enhance the perceptual quality of the corresponding synthesized speech. In a like manner, those spectral magnitude values characterizing a perceptually less significant frequency range are compressed. Such frequency spreading and compressing of the spectral magnitude values causes the subsequently performed linear predictive analysis to provide more of the available coding resources for the perceptually significant frequency ranges and less coding resources for the perceptually less significant frequency ranges.

_{1 }and Z_{2 }to Z_{3 }have relatively high energy and/or a plurality of relatively sharp magnitude peaks that would likely be perceptually significant in the corresponding synthesized speech. In contrast, frequency ranges Z_{1 }to Z_{2 }as well as Z_{3 }to ƒ_{s}/2 have relatively low energy and mostly gradual peaks that are perceptually less significant. Accordingly, the corresponding spectral magnitude values A(i) representing the spectrum A(z) of _{1}, Z_{2 }and Z_{3 }in _{1}, Z′_{2 }and Z′_{3 }in **65** spreads the perceptually more significant ranges of 0 to Z_{1 }and Z_{2 }to Z_{3 }to broader ranges 0 to Z′_{1 }and Z′_{2 }to Z′_{3}, and compresses the perceptually less significant ranges Z_{1 }to Z_{2 }and Z_{3 }to ƒ_{s}/2 in reduced ranges Z′_{1 }to Z′_{2 }and Z′_{3 }to ƒ_{s}/2.

An exemplary method for the spectral warper **65** for warping the spectral magnitude values A(i) representing the spectrum in **65** identifies four groups of magnitude values corresponding to the four frequency ranges identified as perceptually more or less significant as shown in _{1}(i), i=0, 1, . . . , a, for the frequency range 0 to Z_{1}; a second group containing magnitude values A_{2}(i), i=a+1, a+2, . . . , b, for the frequency range Z_{1 }to Z_{2}; a third group containing magnitude values A_{3}(i), i=b+1, b+2, . . . , c, for the frequency range Z_{2 }to Z_{3}; and a fourth group containing magnitude values A_{4}(i), i=c+1, c+2, . . . , K−1, for the frequency range Z_{3 }to ƒ_{s}/2. In the previous discussion, a frequency range u to v includes u but excludes v.

It is possible to compress the frequency ranges Z_{1 }to Z_{2 }and Z_{3 }to ƒ_{s}/2 represented by the second and fourth magnitude value groups A_{2}(i) and A_{4}(i) by reducing the number of magnitude values in such groups. For instance, three out of every four consecutive magnitude values can be discarded in such groups. Further, if such a compression technique were used, then the number of values used for such groups can be selected such that the number is a multiple of four. In the alternative, every four consecutive magnitude values in the sequence in such groups can be replaced by one value having a magnitude that is an average of the four values. Such techniques reduce the number of magnitude values for the second and fourth groups by a factor of four.
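
The two compression alternatives described above can be sketched as follows. The function names are illustrative, and keeping the first value of every four (rather than some other member of each group of four) is an assumption, since the description does not say which value is retained.

```python
def compress_keep_one(values, factor=4):
    """Discard all but the first of every `factor` consecutive magnitude values."""
    return values[::factor]


def compress_average(values, factor=4):
    """Replace every `factor` consecutive magnitude values by their average."""
    if len(values) % factor != 0:
        raise ValueError("group length should be a multiple of the factor")
    return [
        sum(values[i:i + factor]) / factor for i in range(0, len(values), factor)
    ]


group = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(compress_keep_one(group))   # one value kept per block of four
print(compress_average(group))    # block averages
```

Either variant reduces the number of magnitude values in the group by a factor of four.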

In a similar manner, it is possible to expand or spread the frequency ranges 0 to Z_{1 }and Z_{2 }to Z_{3 }represented by the first and third magnitude value groups A_{1}(i) and A_{3}(i) by increasing the number of magnitude values in such groups. For instance, the processor **65** can add a new magnitude value between every two consecutive values in such groups. As a consequence, the number of magnitude values representing the first and third groups would be doubled. Moreover, each added magnitude value can be equal to either of the neighboring magnitude values or based on some other relationship of the neighboring magnitude values. For example, it is possible to add a value that is an arithmetic mean of the two neighboring values using linear interpolation.
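
A minimal sketch of spreading by linear interpolation is given below. The function name is illustrative. So that the group length is exactly doubled, the value added after the last element simply repeats that element; this boundary handling is an assumption, consistent with the option of making an added value equal to a neighboring value.

```python
def spread_by_interpolation(values):
    """Double the number of magnitude values by inserting midpoints between neighbors."""
    n = len(values)
    spread = []
    for i, current in enumerate(values):
        # The last element has no right-hand neighbor, so it is paired with itself.
        neighbor = values[min(i + 1, n - 1)]
        spread.append(current)
        spread.append((current + neighbor) / 2.0)
    return spread


print(spread_by_interpolation([1.0, 3.0]))
```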

The warped spectral magnitude values A′(i), i=0, 1, . . . , K′−1, are obtained by concatenating the magnitude values in the four warped groups. The total number of warped spectral magnitude values K′ will likely differ from the original number of spectral magnitude values K. Further, it is possible to perform only compression of particular groups or only spreading of other groups to produce the warped spectral magnitude values A′(i) according to the invention.

The previously described warping method first performs the discrete Fourier transformation to generate a sequence of spectral magnitude values A(i) characterizing the short-term frequency spectrum of a digitized speech frame S_{j}(n), and then increases or decreases the number of spectral magnitude values characterizing particular frequency ranges in the sequence A(i) to produce the desired warped sequence A′(i). However, it is possible according to the invention to produce the warped sequence A′(i) directly from the discrete Fourier transformation by generating more spectral magnitude values for those frequency ranges to be emphasized and fewer spectral magnitude values for those frequency ranges to be de-emphasized.

Moreover, the previously described warping methods for spreading and compressing the spectral characterization of the short-term frequency spectrum in a voiced speech frame are based on piece-wise linear warping functions for illustration purposes only. It should be readily understood that the frequency warping can also be performed by other invertible warping functions. For instance, the particular warping process used for the spectral magnitude value sequence A(i) for respective voiced speech frame intervals can be chosen from a codebook of transforms. In such instance, the signal W is generated by the spectral warper **65** in

The warped sequence of spectral magnitude values A′(i) generated by the spectral warper **65** is provided to a non-linear transformer **70** which performs a non-linear transformation on each value in the sequence A′(i) to yield a transformed sequence A″(i). Exemplary non-linear transformations include the expression A″(i)=[A′(i)]^{N}, where N is a positive or negative integer or fraction that is not positive one. Accordingly, such a non-linear transformation amplifies or attenuates the spectral magnitude values based on the values of such magnitudes. For instance, when N=−1, A′(i) is transformed to A″(i)=1/A′(i) for each warped spectral magnitude value and effectively models the sequence A′(i) as an all-zero spectrum by processing with a subsequent linear predictive analyzer **85**.
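
The power-law transformation A″(i)=[A′(i)]^{N} can be sketched as below. The function name is illustrative, and the sketch assumes strictly positive magnitude values, since a zero magnitude cannot be raised to a negative power.

```python
def nonlinear_transform(warped, exponent):
    """Raise each warped magnitude value to the power N (N must not equal one)."""
    if exponent == 1:
        raise ValueError("an exponent of one leaves the spectrum unchanged")
    return [a ** exponent for a in warped]


# N = -1 yields the reciprocal spectrum; N = -1/2 also reduces the dynamic range.
print(nonlinear_transform([2.0, 4.0], -1))
print(nonlinear_transform([4.0, 16.0], -0.5))
```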

When the value N is negative, the linear predictive analysis of the transformed spectrum represented by the sequence A″(i) effectively provides an all-zero spectrum representation for the spectrum represented by the sequence A′(i). When the order of the linear predictive analysis is relatively small, such as less than 30, it is often advantageous to use a value N corresponding to −1/B, where B is greater than one to reduce the dynamic range of the spectrum. Such a reduction of the dynamic range of the spectrum effectively shortens its time response facilitating the subsequent modeling of the spectrum by an all-zero filter of smaller order. Although the non-linear transformation was previously described with a negative value N, it is alternatively possible to use a positive value N, that is not equal to one, to produce a corresponding all-pole spectrum representation according to the invention.

The previously described non-linear transformation is a fixed transformation and is typically known by a corresponding decoder for decoding the coded speech signal according to the invention. However, it is alternatively possible for the non-linear transformation to base the value N on a particular property of the current or previously processed speech frame such as, for example, the pitch period duration X that is provided in the coded signal received from the channel. The value N of the non-linear transformation can also be determined from a codebook of transformations. In such instance, the corresponding codebook index is included in the coded signal produced by the channel coder **30** of ^{N(i)}, where a different value N(i) can be used for different values i.

The transformed and warped sequence A″(i) generated by the transformer **70** provides a spectral representation having an enhanced characterization of at least one particular frequency range relative to another frequency range. The spectral magnitude values of the sequence A″(i) are squared by the squarer **75** to produce corresponding power spectral values which are provided to inverse discrete Fourier transform (IDFT) processor **80**. The IDFT processor **80** then generates up to K′ autocorrelation coefficients based on the squared spectral magnitude values A″(i), i=0,1, . . . , K′−1. It is possible to use an FFT to perform the IDFT of the processor **80**.
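
The squaring and IDFT steps can be sketched as follows. The sketch assumes that the K′ magnitude values sample the full DFT circle with the usual even symmetry, so that the inverse transform of the power values is real; only the real part is kept. The function name and the naive IDFT are illustrative choices, not part of the description.

```python
import cmath


def autocorrelation_from_magnitudes(magnitudes, count):
    """Square the magnitudes and inverse-DFT them to get `count` autocorrelation values."""
    k = len(magnitudes)
    if count > k:
        raise ValueError("at most len(magnitudes) coefficients are available")
    power = [a * a for a in magnitudes]  # squarer
    coefficients = []
    for lag in range(count):
        total = sum(
            p * cmath.exp(2j * cmath.pi * i * lag / k) for i, p in enumerate(power)
        )
        coefficients.append((total / k).real)
    return coefficients


# A flat spectrum corresponds to an autocorrelation that is non-zero only at lag 0.
r = autocorrelation_from_magnitudes([1.0, 1.0, 1.0, 1.0], 3)
print(r)
```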

The generated autocorrelation coefficients are then provided to a P-th order linear predictive analyzer **85** which generates P linear predictive coefficients (LPC's) corresponding to the transformed and warped spectral magnitude values A″(i). Then, the generated LPC's are quantized by a transformer/quantizer **90** to produce the coefficient sequence {acute over (α)}_{1}, {acute over (α)}_{2 }. . . {acute over (α)}_{P}. It is advantageous for the transformer/quantizer **90** to additionally transform the generated LPC's to a mathematically equivalent set of P values that are more amenable to quantization than typical LPC's prior to quantizing such values. The particular LPC transformation used by the processor **90** is not critical to practicing the invention and can include, for example, LPC transformations to conventional partial correlation (PARCOR) coefficients or line spectral pair (LSP) coefficients. The resulting coefficient sequence {acute over (α)}_{1}, {acute over (α)}_{2 }. . . {acute over (α)}_{P }represents the short-term frequency spectrum of the frame sequence being processed by the encoder **20**.
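
The description does not name the algorithm used by the linear predictive analyzer **85**. One common way to obtain P predictor coefficients from autocorrelation coefficients is the Levinson-Durbin recursion, sketched below as an assumption rather than as the method of the invention. The function name and the sign convention (a sample is predicted as a weighted sum of past samples) are also assumptions.

```python
def levinson_durbin(r, order):
    """Return (a_1..a_P, prediction error) with s[n] predicted as sum_k a_k * s[n-k]."""
    if len(r) <= order:
        raise ValueError("need order + 1 autocorrelation coefficients")
    if r[0] <= 0.0:
        raise ValueError("zero-lag autocorrelation must be positive")
    a = [0.0] * (order + 1)
    error = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        reflection = acc / error
        updated = a[:]
        updated[m] = reflection
        for k in range(1, m):
            updated[k] = a[k] - reflection * a[m - k]
        a = updated
        error *= 1.0 - reflection * reflection
    return a[1:], error


# Autocorrelation of a first-order process with coefficient 0.5 (hypothetical data).
coefficients, residual = levinson_durbin([1.0, 0.5, 0.25], 2)
print(coefficients, residual)
```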

The exemplary embodiment of the short-term frequency spectrum processor **20**, shown in **65** and non-linear transformer **70** in a particular order to achieve improved perceptual coding of the short-term frequency spectrum of voiced speech frames of a speech signal. However, such enhanced characterization is alternatively achievable using the spectral warper **65** and transformer **70**, individually or in a different order.

An exemplary decoder **100** for decoding coded signals for the respective speech frames generated by the coder **1** of **105**. The channel decoder **105** decodes the respective signals for the successive received speech frames encoded by the channel encoder **30** including the voiced/unvoiced status of the frame, the gain constant G, the signal W, the quantized coefficient sequence {acute over (α)}_{1}, {acute over (α)}_{2 }. . . {acute over (α)}_{P }and pitch period duration X if the frame contains voiced speech. The coefficient sequence {acute over (α)}_{1}, {acute over (α)}_{2 }. . . {acute over (α)}_{P }and signal W for a current speech frame being processed is provided to a short-term frequency spectrum decoder **110** which is described in greater detail below with regard to

The short-term frequency spectrum decoder **110** produces, for example, corresponding all-zero filter coefficients a_{1}, a_{2}, . . . a_{H }for the processed frame based on an inverse non-linear transformation and/or spectral warping process of the transformed and/or warped short-term frequency spectrum represented by the coefficient sequence {acute over (α)}_{1}, {acute over (α)}_{2 }. . . {acute over (α)}_{P}. The generated filter coefficients a_{1}, a_{2}, . . . a_{H }are then provided to form an all-zero synthesis filter **115** for characterizing the spectral envelope that shapes the spectrum of synthesized speech corresponding to the speech frame.

The filter **115** uses the coefficients a_{1}, a_{2, }. . . a_{H }to modify the spectrum of an excitation sequence for the speech frame being processed to produce a synthesized speech signal corresponding to the original speech signal of **120** for producing impulses separated by a pitch period duration. Also, a white noise generator **125**, such as a Gaussian white noise generator, can be used to generate the necessary excitation for the unvoiced portions of the synthesized speech signal. A switch **130** coupled to the impulse generator **120** and white noise generator **125** is controlled by the voiced/unvoiced status signal for applying the respective outputs to a signal amplifier **135** for constructing the proper sequence for the excitation sequence based on the received speech frame information. For each frame, the magnitude of the amplification of the excitation signal by the amplifier **135** is based on the gain constant G of the frame received from the channel decoder **105**.
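
The excitation path described above (impulse generator, white noise generator, switch and amplifier) can be sketched as follows. The function name, the placement of the first impulse at sample zero and the unit-variance Gaussian noise are assumptions made for illustration.

```python
import random


def excitation(num_samples, voiced, gain, pitch_period=0, seed=None):
    """Build one frame of excitation: a scaled impulse train or scaled white noise."""
    if voiced:
        if pitch_period <= 0:
            raise ValueError("a voiced frame needs a positive pitch period")
        # Impulses separated by one pitch period, scaled by the gain constant G.
        return [gain if n % pitch_period == 0 else 0.0 for n in range(num_samples)]
    rng = random.Random(seed)
    # Gaussian white noise scaled by the gain constant G.
    return [gain * rng.gauss(0.0, 1.0) for _ in range(num_samples)]


print(excitation(6, True, 2.0, pitch_period=3))
```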

An exemplary configuration for the short-term frequency spectrum decoder **110** according to the invention is illustrated in **20** of _{1}, {acute over (α)}_{2 }. . . {acute over (α)}_{P }corresponding to the transformed and quantized LPC's for the speech frame being processed is provided to an inverse transformer **150** that transforms the sequence back into the LPC's. More specifically, the inverse transformer **150** performs the inverse transformation to that performed by the transformer/quantizer **90** in the encoder **20** of **150** correspond to those signals generated by the LPC analyzer **85** in

The LPC's generated by the inverse transformer **150** are provided to a spectral processor **160**, such as a discrete Fourier transformer, which produces a corresponding intermediate value sequence of reciprocal spectral magnitude values representing the warped and transformed short-term frequency spectrum. The reciprocal sequence Ā″(i) of such values is then produced by processor **165** and corresponds to the transformed and warped spectrum represented in the sequence A″(i) produced by the non-linear transformer **70** in

Each of the spectral magnitude values Ā″(i) generated by the block **165** is then inverse non-linear transformed by the processor **170** to produce a spectrum sequence Ā′(i) that corresponds to the warped spectrum sequence A′(i) produced by the spectral warper **65** in **170** in **70** of **70**, then a square operation should be performed by the processor **170**.
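
For the power-law case A″(i)=[A′(i)]^{N}, the inverse non-linear transformation amounts to raising each decoded value to the power 1/N. A minimal sketch follows; the function name is illustrative, and positive magnitude values with a fixed, non-zero N known to the decoder are assumed.

```python
def inverse_nonlinear_transform(transformed, exponent):
    """Undo A'' = (A')**N by raising each decoded value to the power 1/N."""
    if exponent == 0:
        raise ValueError("exponent must be non-zero")
    return [value ** (1.0 / exponent) for value in transformed]


# If the encoder took a square root (N = 1/2), the decoder squares.
print(inverse_nonlinear_transform([2.0, 3.0], 0.5))
# If the encoder took reciprocals (N = -1), the decoder takes reciprocals again.
print(inverse_nonlinear_transform([0.5, 0.25], -1))
```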

The inverse transformed spectral magnitude value sequence Ā′(i) generated by the processor **170** is then provided to the inverse spectral warper **175** which produces a sequence of inverse spectral magnitude values Ā(i), i=0, 1, . . . , K″−1. The produced inverse spectral magnitude values Ā(i) correspond to the original short-term spectrum represented in the sequence A(i) produced by the DFT transformer **60** in **175** of **1** of

Although the previously described signal W indicates a respective codebook entry, it is alternatively possible for the signal W to indicate the particular employed spectral warping operation performed by the encoder for the short-term frequency spectrum of respective speech frames in another manner. Also, the warping signal W can be omitted if the employed warping function for a coded speech frame is based on a property of the speech frame such as, for example, the duration of the pitch period. In such a system, the signal X indicating the pitch period duration for the interval should also be provided to the inverse warper **175**.

In operation, if the spectral warper **65** of _{1 }to Z_{2 }during encoding of the speech signal as in the previously described example depicted in **175** processes the magnitude values representing that frequency range to reduce the number of magnitude values substantially back to their original proportion. Numerous techniques can be used to achieve such an inverse spectral warping operation. For instance, in order to reduce the number of spectral magnitude values characterizing a particular frequency range by one-half, the inverse warper **175** could remove every other spectral value in the sequence that characterizes that frequency range, or substitute an average value for adjacent value pairs in such sequence.
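
The two halving options mentioned above can be sketched as below. The function names are illustrative, and keeping the even-indexed values when dropping every other value is an assumption.

```python
def halve_by_dropping(values):
    """Remove every other spectral value, keeping those at even positions."""
    return values[::2]


def halve_by_averaging(values):
    """Substitute the average for each adjacent pair of spectral values."""
    if len(values) % 2 != 0:
        raise ValueError("an even number of values is expected")
    return [(values[i] + values[i + 1]) / 2.0 for i in range(0, len(values), 2)]


spread_group = [1.0, 2.0, 3.0, 3.0]
print(halve_by_dropping(spread_group))
print(halve_by_averaging(spread_group))
```

Dropping the odd-positioned values exactly undoes a spreading step that inserted one new value after each original value, whereas averaging only approximates the original group.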

Each of the K″ inverse warped and transformed magnitude values in the sequence Ā(i) is then squared by squarer **180** to produce a corresponding sequence of power spectral values. The reciprocal of each of the power spectral values is then generated by processor **185**. Such a representation is required for the subsequent generation of the desired relatively high order LPC all-zero synthesis filter coefficients a_{1}, a_{2}, . . . a_{H }that model the spectrum characterized by the sequence A(i). Since the coding method according to the invention often employs relatively high order modeling of the spectrum sequence Ā(i), it is more advantageous to generate an all-zero filter model rather than an all-pole model. Unstable predictive synthesis filters can be produced using truncated all-pole filter coefficients based on such relatively high order analysis. However, if an all-pole filter model is desired, then the processor **185** can be omitted from the decoder **110**.

The reciprocal sequence of power spectral values produced by the processor **185** is provided to IDFT processor **190** which generates up to K″ corresponding autocorrelation coefficients. It is possible to use an FFT to perform the IDFT of the processor **190**. The generated autocorrelation coefficients are then provided to an H-th order linear predictive analyzer **195** which generates the H linear predictive filter coefficients a_{1}, a_{2}, . . . a_{H }corresponding to an inverse transformed and inverse warped spectral characterization of the short-term frequency spectrum of the voiced speech frame being processed. Such generated filter coefficients are useable for forming an all-zero synthesis filter **115**, shown in

Although the exemplary short-term frequency spectrum decoder **110** in **170** and inverse warper **175**, individually or in a different order.

_{1 }and Z_{2 }to Z_{3 }more closely represent the original spectral magnitudes of

The method for encoding the short-term frequency spectrum of speech signals according to the invention has been described with respect to vocoder-type speech coders in **200** and decoder **300** according to the invention are depicted in **15** and short-term frequency spectrum coder **20**. Likewise, similar components in **110** and channel decoder **105**.

Referring to the CELP coder **200** of **5** is processed to produce digitized speech sequence S(n) by the filter and sampler **10** and A/D converter **15** as is previously described with respect to **20** which produces the encoded short-term frequency spectrum coefficient sequence {acute over (α)}_{1}, {acute over (α)}_{2 }. . . {acute over (α)}_{P }and warping signal W for successive frames of sequence S(n). The produced coefficient sequence {acute over (α)}_{1}, {acute over (α)}_{2 }. . . {acute over (α)}_{P }and warping signal W which characterize the short-term frequency spectrum of the respective speech frames are provided to the channel coder **30** for coding and transmission or storage on the channel. Such generation of the encoded short-term frequency spectrum coefficient sequence {acute over (α)}_{1}, {acute over (α)}_{2 }. . . {acute over (α)}_{P }and warping signal W is substantially identical to that previously described with respect to

The difference between the encoders **1** and **200** of **200** encodes the prediction residual based on long-term prediction analysis and codebook excitation entries while the coder **1** performs encoding of the prediction residual based on a relatively simple model of a periodic impulse train for voiced speech and white noise for unvoiced speech. The prediction residual is coded in **205** which generates corresponding long-term filter tap coefficients β_{1}, β_{2}, β_{3 }and delay H based on the respective frames of the sequence S(n). Exemplary pitch predictor analyzers are described in greater detail in B. S. Atal, “Predictive Coding of Speech at Low Bit Rates”, IEEE Trans. on Comm., vol. COM-30, pp. 600-614, (April 1982), which is incorporated by reference herein. The corresponding generated long-term filter tap coefficients β_{1}, β_{2}, β_{3 }and delay H for the respective frames are provided to the channel coder **30** for transmission or storage on the channel.

In addition, a stochastic codebook or code store **210** is employed which contains a fixed number, such as 1024, of random noise-like codeword sequences, each sequence including a series of random numbers. Each random number represents a series of pulses for a duration equivalent to the duration of a frame. Each codeword can be applied to a scaler **215** by a sequencer **220** and scaled by a constant G. The scaled codeword is used as excitation of a long-term predictive filter **225** and a short-term predictive filter **230** which in combination with signal combiner **227** generates a synthesized digital speech signal sequence **225** employs filter coefficients based on the long-term filter tap coefficients β_{1}, β_{2}, β_{3 }and delay H. Exemplary long-term predictive coders are described in greater detail in the previously cited “Predictive Coding of Speech at Low Bit Rates” article.

For each speech frame, the synthesis filter **230** uses the filter coefficients a_{1}, a_{2}, . . . a_{H }generated by the short-term frequency spectrum decoder **110** from the generated spectral coefficient sequence {acute over (α)}_{1}, {acute over (α)}_{2 }. . . {acute over (α)}_{P }and warping signal W generated by the encoder **20**. The operation of a suitable decoder for the decoder **110** is previously described with respect to **235**. The values of the error sequence are then squared by the squarer **240** and an average value based on the sequence is determined by an averager **245**.

Then, a peak picker **250** controls the sequencer **220** to sequence through the codewords in the codebook **210** to select an appropriate codeword and value for the gain G that produces a substantially minimum mean-squared error signal. The determined codebook index L and gain G are then provided to the channel coder **30** for coding and transmission or storage of the respective speech signal frame on the channel. In this manner, the system effectively selects a codeword excitation entry L and gain constant G that substantially reduces or minimizes the error or difference between the digitized speech S(n) and the corresponding synthesized speech sequence
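
The codebook search described above can be sketched as an exhaustive loop over the codewords. In this sketch, `synthesize` is a hypothetical callable standing in for the long-term and short-term predictive filters, and the gain for each codeword is chosen by a least-squares fit; the description does not state how the gain value is searched, so that choice is an assumption.

```python
def search_codebook(target, codebook, synthesize):
    """Return (index L, gain G, mean-squared error) of the best-matching codeword."""
    best = None
    for index, codeword in enumerate(codebook):
        synthesized = synthesize(codeword)
        energy = sum(v * v for v in synthesized)
        if energy == 0.0:
            continue  # a silent output cannot be scaled to match the target
        # Least-squares gain for this codeword (assumed selection rule).
        gain = sum(t * v for t, v in zip(target, synthesized)) / energy
        mse = sum((t - gain * v) ** 2 for t, v in zip(target, synthesized)) / len(target)
        if best is None or mse < best[2]:
            best = (index, gain, mse)
    return best


# Toy data: an identity "filter" and a three-entry codebook (hypothetical values).
toy_codebook = [[1.0, 0.0], [1.0, 2.0], [0.0, 1.0]]
result = search_codebook([2.0, 4.0], toy_codebook, lambda codeword: codeword)
print(result)
```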

The decoder **300** of **200** if **105** decodes the coded sequence received from or read from the channel. The other components of the decoder **300** substantially correspond to those components in the coder used to synthesize the digital code sequence S(n) based on the received codeword entry L and the gain constant G for the respective frames of the speech signal. Accordingly, the speech signal **200** of

Although several embodiments of the invention have been described in detail above, many modifications can be made without departing from the teaching thereof. All of such modifications are intended to be encompassed within the following claims. For example, although the previously described embodiments have employed LPC analysis to code the non-linear transformed and/or warped spectral parameters, such coding can be performed by numerous alternative techniques according to the invention. It is possible for such alternative techniques to include those techniques that code the frequency components of the short-term frequency spectrum by methods other than coding based on a corresponding perceptual quality or accuracy that such components would have in corresponding synthesized speech.

Patent Citations

Cited Patent | Filing date | Publication date | Applicant | Title |
---|---|---|---|---|

US3624302 | Oct 29, 1969 | Nov 30, 1971 | Bell Telephone Labor Inc | Speech analysis and synthesis by the use of the linear prediction of a speech wave |

US4220819 | Mar 30, 1979 | Sep 2, 1980 | Bell Telephone Laboratories, Incorporated | Residual excited predictive speech coding system |

US4472832 | Dec 1, 1981 | Sep 18, 1984 | At&T Bell Laboratories | Digital speech coder |

US4827517 | Dec 26, 1985 | May 2, 1989 | American Telephone And Telegraph Company, At&T Bell Laboratories | Digital speech processor using arbitrary excitation coding |

US5255339 | Jul 19, 1991 | Oct 19, 1993 | Motorola, Inc. | Low bit rate vocoder means and method |

US5267317 | Dec 14, 1992 | Nov 30, 1993 | At&T Bell Laboratories | Method and apparatus for smoothing pitch-cycle waveforms |

US5371853 | Oct 28, 1991 | Dec 6, 1994 | University Of Maryland At College Park | Method and system for CELP speech coding and codebook for use therewith |

US5481642 | Aug 8, 1994 | Jan 2, 1996 | At&T Corp. | Constrained-stochastic-excitation coding |

US5495556 | Jan 14, 1994 | Feb 27, 1996 | Nippon Telegraph And Telephone Corporation | Speech synthesizing method and apparatus therefor |

US5513297 | Jul 10, 1992 | Apr 30, 1996 | At&T Corp. | Selective application of speech coding techniques to input signal segments |

USRE32580 | Sep 18, 1986 | Jan 19, 1988 | American Telephone And Telegraph Company, At&T Bell Laboratories | Digital speech coder |

GB533363A | Title not available | |||

JPH086596A | Title not available | |||

JPH0455899A | Title not available | |||

JPH0816195A | Title not available | |||

JPH0844394A | Title not available | |||

JPH05197400A | Title not available | |||

JPH06138896A | Title not available | |||

JPH07111462A | Title not available | |||

JPH07147566A | Title not available | |||

JPH07295574A | Title not available | |||

JPH07295594A | Title not available | |||

JPH08147883A | Title not available | |||

JPH08147886A | Title not available | |||

JPH08166799A | Title not available | |||

JPH08220199A | Title not available | |||

WO1992010830A1 | Dec 4, 1991 | Jun 25, 1992 | Digital Voice Systems Inc | Methods for speech quantization and error correction |

Non-Patent Citations

Reference | ||
---|---|---|

1 | B. Atal et al., "Stochastic Coding of Speech Signals at Very Low Bit Rates", Proc IEEE Int. Conf. Comm., pp. 1610-1613 (May 1984). | |

2 | B. Atal, et al. "Stochastic Coding of Speech Signals at Very Low Bit Rates", Proc IEEE Int. Conf. Comm., p. 48.1 (May 1984). | |

3 | Hicks, et al., "Pitch Invariant frequency lowering with nonuniform spectral compression", International Conference On Acoustics, Speech and Signal Processing, vol. 1, pp. 121-124 (1981). | |

4 | Japan Appeal Examiner's Office Letter dated Apr. 14, 2010. | |

5 | Japan Appeal Examiner's Office Letter dated Mar. 7, 2011. | |

6 | Japan Examiner's Office Letter dated Dec. 18, 2008. | |

7 | Japan Examiner's Refusal Decision dated Jul. 28, 2009. | |

8 | L. R. Rabiner et al., Digital Processing of Speech Signals, pp. 150-157, sects. 6.0-6.1, pp. 250-282, pp. 372-378, pp. 404-407, and pp. 447-450 (Prentice-Hall, New Jersey, 1978). | |

9 | M. Schroeder et al., "Code-Excited Linear Predictive (CELP): High Quality Speech at Very Low Bit Rates", Proc. IEEE Int. Conf. ASSP., pp. 937-940 (1985). | |

10 | Nelson, "The Mellin-wavelet transform", International Conference On Acoustics, Speech, And Signal Processing, vol. 2, pp. 1101-1104 (1995). | |

11 | P. Kroon et al., "A Class of Analysis-by-Synthesis Predictive Coders for High-Quality Speech Coding at Rates Between 4.8 and 16 KB/s", IEEE J. on Sel. Areas in Comm., SAC-6(2), pp. 353-363 (Feb. 1988). | |

12 | Wu et al., "An investigation of sinusoidal speech coding", Proceedings Of Fourth International Symposium On Signal Processing And Its Applications, vol. 1, pp. 25-30 (Aug. 1996). | |

13 | Wu et al., "An investigation of Sinusoidal speech coding", Proceedings Of Fourth International Symposium on Signal Processing And Its Applications, vol. 1, pp. 9-12 (1996). |

Classifications

U.S. Classification | 704/203, 704/219, 704/220 |

International Classification | G10L19/06, G10L19/00, G10L19/04, H04B14/04, H03M7/30, G10L19/02 |

Cooperative Classification | G10L19/06, G10L19/0212 |

European Classification | G10L19/02T |

Legal Events

Date | Code | Event | Description |
---|---|---|---|

Oct 12, 2011 | AS | Assignment | Owner name: ALCATEL-LUCENT USA INC., NEW JERSEY Free format text: MERGER;ASSIGNOR:LUCENT TECHNOLOGIES INC.;REEL/FRAME:027047/0930 Effective date: 20081101 |

Oct 17, 2011 | AS | Assignment | Owner name: ALCATEL LUCENT, FRANCE Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ALCATEL-LUCENT USA INC.;REEL/FRAME:027069/0868 Effective date: 20111013 |

Jan 30, 2013 | AS | Assignment | Owner name: CREDIT SUISSE AG, NEW YORK Free format text: SECURITY AGREEMENT;ASSIGNOR:ALCATEL LUCENT;REEL/FRAME:029821/0001 Effective date: 20130130 |

Sep 30, 2014 | AS | Assignment | Owner name: ALCATEL LUCENT, FRANCE Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG;REEL/FRAME:033868/0001 Effective date: 20140819 |
