CA2600713A1 - Time warping frames inside the vocoder by modifying the residual

Info

Publication number: CA2600713A1
Application number: CA002600713A
Authority: CA
Inventors: Rohit Kapoor; Serafin Diaz Spindola
Original assignee: Individual
Current assignee: Qualcomm Inc
Priority date: 2005-03-11
Filing date: 2006-03-13
Publication date: 2006-09-21
Anticipated expiration: 2026-03-13
Also published as: NO20075180L; IL185935A0; KR20090119936A; RU2007137643A; CA2600713C; EP1856689A1; AU2006222963A1; US20060206334A1; JP2008533529A; TWI389099B; RU2371784C2; US8155965B2; WO2006099529A1; KR20070112832A; BRPI0607624B1; TW200638336A; MX2007011102A; JP5203923B2; IL185935A; BRPI0607624A2

Abstract

In one embodiment, the present invention comprises a vocoder having at least one input and at least one output, an encoder comprising a filter having at least one input operably connected to the input of the vocoder and at least one output, a decoder comprising a synthesizer having at least one input operably connected to the at least one output of the encoder, and at least one output operably connected to the at least one output of the vocoder, wherein the encoder comprises a memory and the encoder is adapted to execute instructions stored in the memory comprising classifying speech segments and encoding speech segments, and the decoder comprises a memory and the decoder is adapted to execute instructions stored in the memory comprising time-warping a residual speech signal to an expanded or compressed version of the residual speech signal.

Claims

1. A method of communicating speech, comprising:
time-warping a residual speech signal to an expanded or compressed version of said residual speech signal; and
synthesizing said time-warped residual speech signal.

2. The method of communicating speech according to claim 1, further comprising the steps of:
classifying speech segments; and
encoding said speech segments.

3. The method of communicating speech according to claim 2, wherein said step of encoding speech segments comprises using prototype pitch period, code-excited linear prediction, noise-excited linear prediction or 1/8 frame coding.

4. The method of communicating speech according to claim 2, further comprising the steps of:
sending said speech signal through a linear predictive coding filter, whereby short-term correlations in said speech signal are filtered out; and
outputting linear predictive coding coefficients and a residual signal.
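
For illustration only, the filtering step of claim 4 can be pictured with the sketch below. It assumes the autocorrelation method with a Levinson-Durbin recursion, which is a common way to obtain linear predictive coding coefficients but is not specified by the claim. The function names and the predictor order are hypothetical.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n - k], for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] so that
    x[n] is approximated by sum_k a[k] * x[n - k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate input (e.g. silence): keep remaining coefficients at zero
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:]


def lpc_residual(x, coeffs):
    """Subtract the short-term prediction from each sample."""
    residual = []
    for n in range(len(x)):
        pred = sum(c * x[n - 1 - k]
                   for k, c in enumerate(coeffs) if n - 1 - k >= 0)
        residual.append(x[n] - pred)
    return residual


def lpc_analyze(x, order):
    """Return (coefficients, residual), the two outputs named in claim 4."""
    coeffs = levinson_durbin(autocorrelation(x, order), order)
    return coeffs, lpc_residual(x, coeffs)
```

For a strongly correlated input, the residual carries much less energy than the input, which is the sense in which short-term correlations are filtered out.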

5. The method of communicating speech according to claim 2, wherein said step of classifying speech segments comprises categorizing speech frames as periodic, slightly periodic or noisy depending on whether the frames represent voiced, unvoiced or transient speech.

6. The method of communicating speech according to claim 2, wherein said encoding is code-excited linear prediction encoding.

7. The method of communicating speech according to claim 2, wherein said encoding is prototype pitch period encoding.

8. The method of communicating speech according to claim 2, wherein said encoding is noise-excited linear prediction encoding.

9. The method according to claim 6, wherein said step of time-warping comprises:
estimating a pitch period; and
adding or subtracting at least one said pitch period after receiving said residual signal.

10. The method according to claim 6, wherein said step of time warping comprises:
estimating pitch delay;
dividing a speech frame into pitch periods, wherein boundaries of said pitch periods are determined using said pitch delay at various points in said speech frame;
overlapping said pitch periods if said residual speech signal is decreased;
and adding said pitch periods if said residual speech signal is increased.

11. The method according to claim 7, wherein said step of time warping comprises the steps of:
estimating at least one pitch period;
interpolating said at least one pitch period;
adding said at least one pitch period when expanding said residual speech signal;
and subtracting said at least one pitch period when compressing said residual speech signal.
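
The interpolating, adding and subtracting steps of claim 11 might be sketched as follows. The sketch assumes that a previous and a current prototype pitch period are available and have already been aligned to a common length, and that interpolation is linear; none of these details is stated in the claim. The names `interpolate_periods`, `warped_residual`, `nominal_count` and `delta` are hypothetical.

```python
def interpolate_periods(prev_proto, cur_proto, count):
    """Generate `count` pitch periods that morph linearly from
    prev_proto toward cur_proto; the last one equals cur_proto."""
    if len(prev_proto) != len(cur_proto):
        raise ValueError("prototypes must have equal length in this sketch")
    periods = []
    for i in range(1, count + 1):
        w = i / count  # weight of the current prototype
        periods.append([(1 - w) * p + w * c
                        for p, c in zip(prev_proto, cur_proto)])
    return periods


def warped_residual(prev_proto, cur_proto, nominal_count, delta):
    """delta > 0 adds pitch periods (expansion);
    delta < 0 subtracts pitch periods (compression)."""
    count = max(1, nominal_count + delta)
    out = []
    for period in interpolate_periods(prev_proto, cur_proto, count):
        out.extend(period)
    return out
```

With a nominal count of four periods of four samples each, `delta=1` yields 20 samples and `delta=-1` yields 12, while the frame still ends on the current prototype.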

12. The method according to claim 8, wherein said step of encoding comprises encoding linear predictive coding information as gains of different parts of a speech segment.

13. The method according to claim 10, wherein said step of overlapping said pitch periods if said speech residual signal is decreased comprises:
segmenting an input sample sequence into blocks of samples;
removing segments of said residual signal at regular time intervals;
merging said removed segments; and
replacing said removed segments with a merged segment.

14. The method according to claim 10, wherein said step of estimating pitch delay comprises interpolating between a pitch delay of an end of a last frame and an end of a current frame.
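
As a rough sketch of claims 10 and 14 taken together, the pitch delay at any point in the frame can be interpolated between the delay at the end of the last frame and the delay at the end of the current frame, and the frame can then be walked one interpolated delay at a time to place pitch period boundaries. Linear interpolation, rounding to whole samples, and the function names are assumptions made for this example.

```python
def interpolated_delay(prev_end_delay, cur_end_delay, position, frame_len):
    """Pitch delay at sample `position` (0..frame_len), linearly
    interpolated between the two frame-end delays."""
    t = position / frame_len
    return (1 - t) * prev_end_delay + t * cur_end_delay


def pitch_period_boundaries(prev_end_delay, cur_end_delay, frame_len):
    """Return sample indices where successive pitch periods begin or end.
    Samples after the last boundary do not form a whole period and are
    left for the caller to handle."""
    bounds = [0]
    pos = 0
    while True:
        delay = interpolated_delay(prev_end_delay, cur_end_delay,
                                   pos, frame_len)
        step = max(1, round(delay))
        if pos + step > frame_len:
            break
        pos += step
        bounds.append(pos)
    return bounds
```

With a constant delay of 40 samples and a 160-sample frame, the boundaries fall at 0, 40, 80, 120 and 160.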

15. The method according to claim 10, wherein said step of adding said pitch periods comprises merging speech segments.

16. The method according to claim 10, wherein said step of adding said pitch periods if said residual speech signal is increased comprises adding an additional pitch period created from a first pitch segment and a second pitch period segment.

17. The method according to claim 12, wherein said gains are encoded for sets of speech samples.

18. The method according to claim 13, wherein said step of merging said removed segments comprises increasing a first pitch period segment's contribution and decreasing a second pitch period segment's contribution.
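
The merging described in claims 13 and 18 resembles an overlap-add with complementary weights. The sketch below assumes linear ramps and two adjacent pitch periods of equal length. The claim labels the segments "first" and "second"; the sketch uses the neutral, hypothetical names `fade_out` and `fade_in` for the segment whose contribution decreases and the one whose contribution increases.

```python
def crossfade(fade_out, fade_in):
    """Overlap-add two equal-length segments: the weight of `fade_in`
    rises across the segment while the weight of `fade_out` falls."""
    n = len(fade_out)
    if n != len(fade_in):
        raise ValueError("segments must have equal length")
    merged = []
    for i in range(n):
        w = (i + 1) / (n + 1)
        merged.append((1 - w) * fade_out[i] + w * fade_in[i])
    return merged


def compress_once(residual, start, period):
    """Remove the two pitch periods beginning at `start` and replace
    them with one merged period, shortening the residual by `period`."""
    first = residual[start:start + period]
    second = residual[start + period:start + 2 * period]
    if len(second) < period:
        raise ValueError("not enough samples for two pitch periods")
    return (residual[:start]
            + crossfade(first, second)
            + residual[start + 2 * period:])
```

Because the merged period starts close to the first removed period and ends close to the second, it joins the surrounding samples without an abrupt step when the signal is roughly periodic.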

19. The method according to claim 15, further comprising the step of selecting similar speech segments, wherein said similar speech segments are merged.

20. The method according to claim 15, further comprising the step of correlating speech segments, whereby similar speech segments are selected.
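
One way to read the correlating step of claim 20 is as a search for the offset at which a candidate segment best matches a reference segment. The sketch below assumes normalized cross-correlation as the similarity measure, which the claim does not specify; the function and parameter names are hypothetical.

```python
import math


def normalized_correlation(a, b):
    """Cosine similarity of two equal-length segments, in [-1, 1]."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def best_matching_offset(signal, template_start, length,
                         search_start, search_end):
    """Return (offset, score) of the segment in
    signal[search_start..search_end] most similar to the template."""
    template = signal[template_start:template_start + length]
    best_offset, best_score = None, -2.0
    for offset in range(search_start, search_end + 1):
        candidate = signal[offset:offset + length]
        if len(candidate) < length:
            break  # ran past the end of the signal
        score = normalized_correlation(template, candidate)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score
```

For a signal with a 20-sample period, searching offsets 10 to 30 against a template taken at offset 0 selects offset 20, one full period later.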

21. The method according to claim 16, wherein said step of adding an additional pitch period created from a first pitch segment and a second pitch period segment comprises adding said first and said second pitch segments such that said first pitch period segment's contribution increases and said second pitch period segment's contribution decreases.
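
The expansion described in claims 16 and 21 can be sketched as inserting one extra pitch period between two existing ones. In this sketch the extra period is a weighted sum in which the first segment's contribution increases and the second segment's contribution decreases, as the claim states; the linear weights and the function name are assumptions.

```python
def expand_once(residual, start, period):
    """Insert an additional pitch period after the pitch period that
    begins at `start`, lengthening the residual by `period` samples."""
    first = residual[start:start + period]
    second = residual[start + period:start + 2 * period]
    if len(second) < period:
        raise ValueError("not enough samples for two pitch periods")
    extra = []
    for i in range(period):
        w = (i + 1) / (period + 1)  # weight of the first segment, rising
        extra.append(w * first[i] + (1 - w) * second[i])
    # Output order: ..., first, extra, second, ...
    return residual[:start + period] + extra + residual[start + period:]
```

The extra period begins close to the start of the second segment, which is what naturally follows the first, and ends close to the end of the first segment, which is what naturally precedes the second, so both joins stay smooth for a roughly periodic residual.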

22. The method according to claim 17, further comprising the step of generating a residual signal by generating random values and then applying said gains to said random values.

23. The method according to claim 17, further comprising the step of representing said linear predictive coding information as 10 encoded gain values, wherein each encoded gain value represents 16 samples of speech.
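
Claims 22 and 23 together imply a frame of 10 × 16 = 160 residual samples built from random values scaled by per-segment gains. A minimal sketch follows, assuming uniformly distributed random values in [-1, 1] and gains that have already been decoded to linear scale; neither assumption comes from the claims, and the names are hypothetical.

```python
import random

GAINS_PER_FRAME = 10   # from claim 23
SAMPLES_PER_GAIN = 16  # from claim 23


def nelp_residual(gains, rng=None):
    """Build a noise residual: each gain scales its own run of
    SAMPLES_PER_GAIN random values."""
    if len(gains) != GAINS_PER_FRAME:
        raise ValueError("expected %d gains" % GAINS_PER_FRAME)
    rng = rng or random.Random()
    residual = []
    for gain in gains:
        for _ in range(SAMPLES_PER_GAIN):
            residual.append(gain * rng.uniform(-1.0, 1.0))
    return residual
```

Passing a seeded `random.Random` instance makes the output reproducible, which is convenient when testing.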

24. A vocoder having at least one input and at least one output, comprising:
an encoder comprising a filter having at least one input operably connected to the input of the vocoder and at least one output; and
a decoder comprising a synthesizer having at least one input operably connected to said at least one output of said encoder and at least one output operably connected to said at least one output of the vocoder.

25. The vocoder according to claim 24, wherein said decoder comprises:
a memory, wherein said decoder is adapted to execute software instructions stored in said memory comprising time-warping a residual speech signal to an expanded or compressed version of said residual signal.

26. The vocoder according to claim 24, wherein said encoder comprises:
a memory and said encoder is adapted to execute software instructions stored in said memory comprising classifying speech segments as 1/8 frame, prototype pitch period, code-excited linear prediction or noise-excited linear prediction.

27. The vocoder according to claim 26, wherein said decoder comprises:
a memory and said decoder is adapted to execute software instructions stored in said memory comprising time-warping a residual signal to an expanded or compressed version of said residual speech signal.

28. The vocoder according to claim 27, wherein said filter is a linear predictive coding filter which is adapted to:
filter out short-term correlations in a speech signal; and output linear predictive coding coefficients and a residual signal.

29. The vocoder according to claim 27, wherein said encoder comprises:
a memory and said encoder is adapted to execute software instructions stored in said memory comprising encoding said speech segments using code-excited linear prediction encoding.

30. The vocoder according to claim 27, wherein said encoder comprises:
a memory and said encoder is adapted to execute software instructions stored in said memory comprising encoding said speech segments using prototype pitch period encoding.

31. The vocoder according to claim 27, wherein said encoder comprises:
a memory and said encoder is adapted to execute software instructions stored in said memory comprising encoding said speech segments using noise-excited linear prediction encoding.

32. The vocoder according to claim 29, wherein said time-warping software instruction comprises estimating at least one pitch period; and adding or subtracting said at least one pitch period after receiving said residual signal.

33. The vocoder according to claim 29, wherein said time-warping software instruction comprises estimating pitch delay;
dividing a speech frame into pitch periods, wherein boundaries of said pitch periods are determined using said pitch delay at various points in said speech frame;
overlapping said pitch periods if said residual speech signal is decreased;
and adding said pitch periods if said residual speech signal is increased.

34. The vocoder according to claim 30, wherein said time-warping software instruction comprises estimating at least one pitch period;
interpolating said at least one pitch period;
adding said at least one pitch period when expanding said residual speech signal;
and subtracting said at least one pitch period when compressing said residual speech signal.

35. The vocoder according to claim 31, wherein said encoding said speech segments using noise-excited linear prediction encoding software instruction comprises encoding linear predictive coding information as gains of different parts of a speech segment.

36. The vocoder according to claim 33, wherein said overlapping said pitch periods if said speech residual signal is decreased instruction comprises segmenting an input sample sequence into blocks of samples;
removing segments of said residual signal at regular time intervals;
merging said removed segments; and replacing said removed segments with a merged segment.

37. The vocoder according to claim 33, wherein said estimating pitch delay instruction comprises interpolating between a pitch delay of an end of a last frame and an end of a current frame.

38. The vocoder according to claim 33, wherein said adding said pitch periods instruction comprises merging speech segments.

39. The vocoder according to claim 33, wherein said adding said pitch periods if said speech residual signal is increased instruction comprises adding an additional pitch period created from a first pitch segment and a second pitch period segment.

40. The vocoder according to claim 35, wherein said gains are encoded for sets of speech samples.

41. The vocoder according to claim 36, wherein said merging said removed segments instruction comprises increasing a first pitch period segment's contribution and decreasing a second pitch period segment's contribution.

42. The vocoder according to claim 38, further comprising the step of selecting similar speech segments, wherein said similar speech segments are merged.

43. The vocoder according to claim 38, wherein said time-warping instruction further comprises correlating speech segments, whereby similar speech segments are selected.

44. The vocoder according to claim 39, wherein said adding an additional pitch period created from a first pitch segment and a second pitch period segment instruction comprises adding said first and said second pitch segments such that said first pitch period segment's contribution increases and said second pitch period segment's contribution decreases.

45. The vocoder according to claim 40, wherein said time-warping instruction further comprises generating a residual speech signal by generating random values and then applying said gains to said random values.

46. The vocoder according to claim 40, wherein said time-warping instruction further comprises representing said linear predictive coding information as 10 encoded gain values, wherein each encoded gain value represents 16 samples of speech.