SYSTEM AND METHOD FOR
MULTIRESOLUTION SCALABLE AUDIO
 Inventors: Scott N. Levine, Palo Alto; Tony S.
Verma, Stanford, both of Calif.
 Assignee: The Board of Trustees of the Leland Stanford Junior University, Palo Alto, Calif.
[ * ] Notice: This patent issued on a continued prosecution application filed under 37 CFR 1.53(d), and is subject to the twenty year patent term provisions of 35 U.S.C. 154(a)(2).
 Appl. No.: 7,995
 Filed: Jan. 16, 1998
Related U.S. Application Data
 Provisional application No. 60/035,576, Jan. 16, 1997.
 Int. Cl.6 G10H 1/12
 U.S. Cl. 84/603; 84/661; 84/DIG. 9;
704/209; 704/220; 704/268
 Field of Search 84/603, 621, 661,
84/683, 691, 699, 700, DIG. 9; 704/205-210,
 References Cited
U.S. PATENT DOCUMENTS
5,202,528 4/1993 Iwaooji 84/661 X
5,502,277 3/1996 Sakata 84/661
5,691,496 11/1997 Suzuki et al 84/661
N.J. Fliege et al., "Multi-Complementary Filter Bank", Hamburg University of Technology, ICASSP, 1993, pp. 1-4.
Anderson, "Speech Analysis and Coding Using A MultiResolution Sinusoidal Transform", Georgia Institute of Technology, 0-7803-3192-3/96 1996 IEEE, pp. 1037-1040.
An audio signal analyzer and encoder is based on a model that considers audio signals to be composed of deterministic or sinusoidal components, transient components representing the onset of notes or other events in an audio signal, and stochastic components. Deterministic components are represented as a series of overlapping sinusoidal waveforms. To generate the deterministic components, the input signal is divided into a set of frequency bands by a multicomplementary filter bank. The frequency band signals are oversampled so as to suppress cross-band aliasing energy in each band. Each frequency band is analyzed and encoded as a set of spectral components using a windowing time frame whose length is inversely proportional to the frequency range in that band. Low frequency bands are encoded using longer time frames than higher frequency bands. Transient components are represented by parameters denoting sinusoidal-shaped waveforms produced when the transient components are transformed into a real-valued frequency domain waveform. Stochastic or noise components are represented as a series of spectral envelopes. The parameters representing the three signal components compose a stream of compressed encoded audio data that can be further compressed so as to meet a specified transmission bandwidth limit by deleting the least significant bits of quantized parameter values, reducing the update rates of parameters, and/or deleting the parameters used to encode higher frequency bands until the bandwidth of the compressed audio data meets the bandwidth requirement. Signal quality degrades in a graduated manner with successive reductions in the transmitted data rate.
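The graduated bandwidth reduction described above can be illustrated with a minimal sketch. It is not the patented implementation. The `Band` record, the `bitrate` and `scale_to_budget` helpers, the floor values `min_bits` and `min_rate`, and the order in which the three reductions are tried (least significant bits first, then update rate, then deletion of the highest band) are all assumptions made for illustration; the abstract states only that the three reductions may be applied in combination until the bandwidth requirement is met.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Band:
    """Hypothetical per-band encoding settings (illustrative, not from the patent)."""
    params_per_frame: int    # quantized parameters sent per update
    bits_per_param: int      # quantizer word length in bits
    updates_per_sec: float   # parameter update rate


def bitrate(bands: List[Band]) -> float:
    """Total data rate in bits per second for the given bands."""
    return sum(b.params_per_frame * b.bits_per_param * b.updates_per_sec
               for b in bands)


def scale_to_budget(bands: List[Band], budget_bps: float,
                    min_bits: int = 4, min_rate: float = 10.0) -> List[Band]:
    """Reduce the data rate step by step until it fits budget_bps.

    Bands are assumed to be ordered from lowest to highest frequency.
    Each step degrades only the highest remaining band:
      1. delete one least significant bit per parameter,
      2. else halve the parameter update rate,
      3. else delete the band's parameters entirely.
    """
    bands = list(bands)
    while bands and bitrate(bands) > budget_bps:
        top = bands[-1]
        if top.bits_per_param > min_bits:
            bands[-1] = replace(top, bits_per_param=top.bits_per_param - 1)
        elif top.updates_per_sec / 2 >= min_rate:
            bands[-1] = replace(top, updates_per_sec=top.updates_per_sec / 2)
        else:
            bands.pop()
    return bands


# Example: three bands, low to high frequency, scaled to a 30 kbit/s budget.
full = [Band(20, 8, 50.0), Band(20, 8, 100.0), Band(20, 8, 200.0)]
scaled = scale_to_budget(full, budget_bps=30_000)
```

Because each step removes a small amount of data from the least critical band, the rate falls in small increments, which mirrors the graduated degradation in signal quality that the abstract describes.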
27 Claims, 6 Drawing Sheets