(12) United States Patent
Koishida et al.
(10) Patent No.: US 6,658,383 B2 (45) Date of Patent: Dec. 2, 2003
(54) METHOD FOR CODING SPEECH AND MUSIC SIGNALS
(75) Inventors: Kazuhito Koishida, Goleta, CA (US);
Vladimir Cuperman, Goleta, CA (US);
Amir H. Majidimehr, Woodinville, WA
(US); Allen Gersho, Goleta, CA (US)
(73) Assignee: Microsoft Corporation, Redmond, WA (US)
( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
(21) Appl. No.: 09/892,105
(22) Filed: Jun. 26, 2001
(65) Prior Publication Data
US 2003/0004711 A1 Jan. 2, 2003
(51) Int. Cl.7 G10L 19/02; G10L 19/04;
G10L 19/00; H04Q 1/20; H04B 14/06
FOREIGN PATENT DOCUMENTS
WO WO 9827543 6/1998
Lefebvre, et al., "High quality coding of wideband audio signals using transform coded excitation (TCX)," Apr. 1994, 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 1/193-1/196.*
Salami, et al., "A wideband codec at 16/24 kbit/s with 10 ms frames," Sep. 1997, 1997 Workshop on Speech Coding for Telecommunications, pp. 103-104.*
ITU-T, G.722.1 (09/99), Series G: Transmission Systems and Media, Digital Systems and Networks, Coding at 24 and 32 kbit/s for hands-free operation in systems with low frame loss.*
Saunders, J., "Real Time Discrimination of Broadcast Speech/Music," Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 993-996 (May 1996).
(List continued on next page.)
Primary Examiner—Marsha D. Banks-Harold
Assistant Examiner—V. Paul Harper
(74) Attorney, Agent, or Firm—Leydig, Voit & Mayer, Ltd.
The present invention provides a transform coding method efficient for music signals that is suitable for use in a hybrid codec, whereby a common Linear Predictive (LP) synthesis filter is employed for both speech and music signals. The LP synthesis filter switches between a speech excitation generator and a transform excitation generator, in accordance with the coding of a speech or music signal, respectively. For coding speech signals, the conventional CELP technique may be used, while a novel asymmetrical overlap-add transform technique is applied for coding music signals. In performing the common LP synthesis filtering, interpolation of the LP coefficients is conducted for signals in overlap-add operation regions. The invention enables smooth transitions when the decoder switches between speech and music decoding modes.
7 Claims, 11 Drawing Sheets
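The arrangement described in the abstract, in which one LP synthesis filter is fed by either a speech excitation generator or a transform excitation generator, can be illustrated with a minimal sketch. Everything named here (`lp_synthesis`, `interpolate_lpc`, `decode_frame`, the `generators` table, and the toy first-order coefficients) is hypothetical and chosen only for illustration. The linear interpolation of direct-form coefficients is likewise an assumption, since the abstract does not state the interpolation domain or weights.

```python
from typing import Callable, Dict, List, Sequence


def lp_synthesis(excitation: Sequence[float],
                 lpc: Sequence[float],
                 state: List[float]) -> List[float]:
    """All-pole filter 1/A(z), with A(z) = 1 + sum_k lpc[k-1] * z^-k.

    `state` holds the most recent outputs, newest first, and is updated
    in place so that filter memory carries over between frames.
    """
    out = []
    for e in excitation:
        y = e - sum(a * s for a, s in zip(lpc, state))
        state.insert(0, y)   # newest output goes to the front
        state.pop()          # drop the oldest output
        out.append(y)
    return out


def interpolate_lpc(prev_lpc: Sequence[float],
                    cur_lpc: Sequence[float],
                    weight: float) -> List[float]:
    """Linear blend of two coefficient sets (assumed scheme, not from the source)."""
    return [(1.0 - weight) * p + weight * c for p, c in zip(prev_lpc, cur_lpc)]


def decode_frame(mode: str,
                 generators: Dict[str, Callable[[], List[float]]],
                 lpc: Sequence[float],
                 state: List[float]) -> List[float]:
    """Pick the excitation generator for this frame, then run the common filter."""
    excitation = generators[mode]()
    return lp_synthesis(excitation, lpc, state)


# Toy usage: a first-order filter and stand-in excitation generators.
state = [0.0]
lpc = [-0.5]  # A(z) = 1 - 0.5 z^-1, so y[n] = e[n] + 0.5 * y[n-1]
generators = {
    "speech": lambda: [1.0, 0.0, 0.0],  # stand-in for a CELP excitation
    "music": lambda: [0.0, 0.0],        # stand-in for a transform excitation
}
speech_out = decode_frame("speech", generators, lpc, state)
music_out = decode_frame("music", generators, lpc, state)
blended = interpolate_lpc([0.0], [-1.0], 0.25)
```

Because both modes share the same filter memory, the second frame continues the decay started in the first rather than restarting from zero, which is the property that supports smooth mode transitions. Practical codecs often interpolate in a line spectral representation to keep the filter stable; the direct-form blend above is used only to keep the sketch short.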
Scheirer, E., et al., "Construction and Evaluation of A Robust Multifeature Speech/Music Discriminator," In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 1331-1334, (Apr. 1997).
Combescure, P., et al., "A 16, 24, 32 kbit/s Wideband Speech Codec Based on ATCELP," In Proceedings of IEEE International Conference On Acoustics, Speech, and Signal Processing, vol. 1, pp. 5-8 (Mar. 1999).
Ellis, D., et al., "Speech/Music Discrimination Based on Posterior Probability Features," In Proceedings of Eurospeech, 4 pages, Budapest (1999).
El Maleh, K., et al., "Speech/Music Discrimination for Multimedia Applications," In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. 2445-2448, (Jun. 2000).
Houtgast, T., et al., "The Modulation Transfer Function In Room Acoustics As A Predictor of Speech Intelligibility," Acustica, vol. 23, pp. 66-73 (1973).
Tzanetakis, G., et al., "Multifeature Audio Segmentation for Browsing and Annotation," Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, pp. 103-106 (Oct. 1999).
J. Schnitzler, J. Eggers, C. Erdmann and P. Vary, "Wideband Speech Coding Using Forward/Backward Adaptive Prediction with Mixed Time/Frequency Domain Excitation," in Proc. IEEE Workshop on Speech Coding, pp. 3-5, 1999.
B. Bessette, R. Salami, C. Laflamme and R. Lefebvre, "A Wideband Speech and Audio Codec at 16/24/32 kbit/s using Hybrid ACELP/TCX Techniques," in Proc. IEEE Workshop on Speech Coding, pp. 7-9, 1999.
S.A. Ramprashad, "A Multimode Transform Predictive Coder (MTPC) for Speech and Audio," in Proc. IEEE Workshop on Speech Coding, pp. 10-12, 1999.
L. Tancerel, R. Vesa, V.T. Ruoppila and R. Lefebvre, "Combined Speech and Audio Coding by Discrimination," in Proc. IEEE Workshop on Speech Coding, pp. 154-156, 2000.
J-H. Chen and D. Wang, "Transform Predictive Coding of Wideband Speech Signals," in Proc. International Conference on Acoustic, Speech, Signal Processing, pp. 275-278, 1996.
A. Ubale and A. Gersho, "Multi-Band CELP Wideband Speech Coder," Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Munich, pp. 1367-1370.
* cited by examiner
FIG. 23 High-level structure of hybrid speech/music encoder