WO2004079923A2

WO2004079923A2 - Method and apparatus for audio compression

Info

Publication number: WO2004079923A2
Application number: PCT/US2004/004477
Authority: WO
Inventors: Victor D. Kolesnik; Boris D. Kudryashov; Sergey Petrov; Evgeny Ovsyannikov; Boris Trojanovsky; Andrey Trofimov
Original assignee: Xvd Corporation
Priority date: 2003-02-28
Filing date: 2004-02-17
Publication date: 2004-09-16
Also published as: US20050159941A1; US20040172239A1; US6965859B2; US7181404B2; WO2004079923A3

Abstract

A method and apparatus for audio compression receives an audio signal. Transform coding is applied to the audio signal to generate a sequence of transform frequency coefficients. The sequence of transform frequency coefficients is partitioned into a plurality of non-uniform width frequency ranges (301) and then zero value frequency coefficients are inserted at the boundaries of the non-uniform width frequency ranges (303). As a result, certain of the transform frequency coefficients that represent high frequencies are dropped (305).

Description

METHOD AND APPARATUS FOR AUDIO COMPRESSION

BACKGROUND OF THE INVENTION

Cross Reference to Related Applications

[0001] This application claims priority from U.S. Provisional Patent Application, Serial No. entitled "Method and Apparatus for Audio Compression" filed February 28, 2003.

Field of the Invention

[0002] The invention relates to the field of data compression. More specifically, the invention relates to audio compression.

Background of the Invention

[0003] To allow typical computing systems to process (e.g., store, transmit, etc.) audio signals, various techniques have been developed to reduce (compress) the amount of data required to represent an audio signal. In typical audio compression systems, the following steps are generally performed: (1) a segment or frame of an audio signal is transformed into a frequency domain; (2) transform coefficients representing (at least a portion of) the frequency domain are quantized into discrete values; and (3) the quantized values are converted (or coded) into a binary format. The encoded/compressed data can be output, stored, transmitted, and/or decoded/decompressed.

[0004] To achieve relatively high compression/low bit rates (e.g., 8 to 16 kbps) for various types of audio signals (e.g., speech, music, etc.), some compression techniques (e.g., CELP, ADPCM, etc.) limit the number of components in a segment (or frame) of an audio signal which is to be compressed. Unfortunately, such techniques typically do not take into account relatively substantial components of an audio signal. Thus, such techniques result in a relatively poor quality synthesized (decompressed) audio signal due to loss of information.

[0005] One method of audio compression that allows relatively high quality compression/decompression involves transform coding (e.g., discrete cosine transform, Fourier transform, etc.). Transform coding typically involves transforming an input audio signal using a transform method, such as low order discrete cosine transform (DCT). Typically, each transform coefficient of a portion (or frame) of an audio signal is quantized and encoded using any number of well-known coding techniques. Transform compression techniques, such as DCT, generally provide a relatively high quality synthesized signal, since a relatively high number of spectral components of an input audio signal are taken into consideration.

[0006] Most audio signal compression algorithms are based on transform coding. Some examples of transform coders include Dolby AC-2, AC-3, MPEG LII and LIII, ATRAC, Sony MiniDisc, and Ogg Vorbis I. These coders employ the modified discrete cosine transform (MDCT) with different frame lengths and overlap factors.

[0007] Increasing frame length leads to better frequency resolution. As a result, high compression ratios can be achieved for stationary audio signals by increasing frame length. However, transform frequency coefficient quantization errors are spread over the entire length of a frame. The pursuit of higher compression with larger frame length results in "echo", which appears when sound attacks are present in an audio signal input. This means that frame length, or frequency resolution, should vary depending on the input audio signals. In particular, the transform length should be shorter during sound attacks and longer for stationary signals. However, a sound attack may only occupy part of an entire signal bandwidth.

[0008] Large transform length also leads to large computational complexity. Both the number of computations and the dynamic range of transform coefficients increase if transform length increases, hence higher computational precision is required. Audio data representation and arithmetic operations must be performed with at least 24 bit precision if the frame is greater than or equal to 1024 samples, hence 16-bit digital signal processing cannot be used for encoding/decoding algorithms.

[0009] In addition, conventional MDCT provides identical frequency resolution over an entire signal, even though different frequency resolutions are appropriate for different frequency ranges. To accommodate the perceptual ability of the human ear, higher frequency resolution is needed for low-frequency ranges and lower frequency resolution is needed for high-frequency ranges.

[0010] Furthermore, the amplitude transfer function of conventional MDCT is not "flat" enough. There are significant irregularities near frequency range boundaries. These irregularities make it difficult to use MDCT coefficients for psycho-acoustic analysis of the audio signal and to compute bit allocation. Conventional audio codecs therefore compute an auxiliary spectrum (typically with an FFT, which is computationally expensive) for constructing a psycho-acoustic model (PAM).

BRIEF SUMMARY OF THE INVENTION

[0011] A method and apparatus for audio compression is described. According to one aspect of the invention, a method and apparatus for audio compression provides for receiving an audio signal, applying transform coding to the audio signal to generate a sequence of transform frequency coefficients, partitioning the sequence of transform frequency coefficients into a plurality of non-uniform width frequency ranges, inserting zero value frequency coefficients at the boundaries of the non-uniform width frequency ranges, and dropping certain of the transform frequency coefficients that represent high frequencies.

[0012] These and other aspects of the present invention will be better described with reference to the Detailed Description and the accompanying Figures.

BRIEF DESCRIPTION OF THE DRAWINGS

[0013] The invention may best be understood by referring to the following description and accompanying drawings that are used to illustrate embodiments of the invention. In the drawings:

[0014] Figure 1 is an exemplary diagram of an audio encoder with an adaptive non-uniform filterbank according to one embodiment of the invention.

[0015] Figure 2 is a block diagram of an exemplary adaptive non-uniform filterbank according to one embodiment of the invention.

[0016] Figure 3 is a flowchart for encoding an audio signal input according to one embodiment of the invention.

[0017] Figure 4 is a diagram illustrating exemplary zero value frequency coefficient stuffing according to one embodiment of the invention.

[0018] Figure 5 is a block diagram of an exemplary audio encoding unit with a non-uniform frequency range transfer function flattening filterbank and an adaptive sound attack based transform length varying filterbank according to one embodiment of the invention.

[0019] Figure 6 is a block diagram illustrating an exemplary audio decoder according to one embodiment of the invention.

[0020] Figure 7 is a block diagram of an exemplary inverse non-uniform filterbank according to one embodiment of the invention.

[0021] Figure 8 is a diagram illustrating removal of boundary frequency coefficients from frequency ranges according to one embodiment of the invention.

DETAILED DESCRIPTION OF THE INVENTION

[0022] In the following description, numerous specific details are set forth to provide a thorough understanding of the invention. However, it is understood that the invention may be practiced without these specific details. In other instances, well-known circuits, structures, standards, and techniques have not been shown in detail in order not to obscure the invention.

Overview

[0023] A method and apparatus for audio compression is described. According to one embodiment of the invention, a method and apparatus for audio compression generates frequency ranges of non-uniform width (i.e., the frequency ranges are not all represented by the same number of transform frequency coefficients) during encoding of an audio input signal. Each of these non-uniform frequency ranges is processed separately, thus reducing the computational complexity of processing the audio signal represented by the frequency ranges. Partitioning (logical or actual) a transformed audio signal input into non-uniform frequency ranges also enables utilization of different frequency resolutions based on the width of a frequency range.

[0024] According to another embodiment of the invention, transform frequency coefficients at the boundary of each of these frequency ranges are displaced with zero-value frequency coefficients (i.e., the frequency ranges are stuffed with zeroes at their boundaries). Stuffing zeroes at the boundaries of the frequency ranges provides for a flattened amplitude transfer function that can be used for quantizing, encoding, and psycho-acoustic model (PAM) computing.

[0025] In another embodiment of the invention, normalization and transforms are performed on a set of non-uniform width frequency ranges based on their width. Separately processing different width frequency ranges enables scalability and support of multiple sampling rates and multiple bit rates. Furthermore, separately processing each of a set of non-uniform frequency ranges enables modification of time resolution based on detection of a sound attack within a particular frequency range, independent of the other frequency ranges.

[0026] Decoding an audio signal that has been encoded as described above includes extracting frequency ranges from an encoded audio bitstream and processing the frequency ranges separately.

Encoding an Audio Signal

[0027] Figure 1 is an exemplary diagram of an audio encoder with an adaptive non-uniform filterbank according to one embodiment of the invention. In Figure 1, an adaptive non-uniform filterbank 101 is coupled with a PAM computing unit 105, a quantization unit 103, and a lossless coding unit 107. The adaptive non-uniform filterbank 101 is described at a high level in Figure 1 and will be described in more detail below. The adaptive non-uniform filterbank 101 receives an audio signal input. The adaptive non-uniform filterbank 101 processes the received audio signal input and generates indications of applied transform length, normalization coefficients, transform frequency coefficients, and block lengths of each frequency range.

[0028] The transform frequency coefficients are processed by the adaptive non-uniform filterbank 101 based on the width of their corresponding frequency range and multiplexed together before being transmitted to the quantization unit 103 and the PAM computing unit 105. The transform frequency coefficients can be sent to both the quantization unit 103 and the PAM computing unit 105 because the adaptive non-uniform filterbank 101 has performed zero stuffing on the transform frequency coefficients to flatten the amplitude transfer function. The block lengths sent to the PAM computing unit 105 and the quantization unit 103 indicate the width of each frequency range.

[0029] The normalization coefficients sent from the adaptive non-uniform filterbank 101 to the lossless coding unit 107 include a normalization coefficient for each of the non-uniform width frequency ranges generated by the adaptive non-uniform filterbank 101. In an alternative embodiment of the invention, the normalization coefficients are transmitted to the quantization unit 103 in addition to or instead of the lossless coding unit 107.

[0030] The adaptive non-uniform filterbank 101 also sends indications of applied transform length to the lossless coding unit 107. The indications of applied transform length indicate whether a short or long transform was performed on a frequency range. The adaptive non-uniform filterbank 101 adapts the length of the transform performed on a frequency range based on the presence of a sound attack within that frequency range.

[0031] Figure 2 is a block diagram of an exemplary adaptive non-uniform filterbank according to one embodiment of the invention. Figure 3 is a flowchart for encoding an audio signal input according to one embodiment of the invention. Figure 2 will be described with reference to Figure 3. In Figure 2, an adaptive non-uniform filterbank 202 includes a non-uniform frequency range transfer function flattening filterbank 201, an adaptive sound attack based transform length varying filterbank 203, and a sound attack based transform length decision unit 205.

[0032] The non-uniform frequency range transfer function flattening filterbank 201 is coupled with the adaptive sound attack based transform length varying filterbank 203. The sound attack based transform length decision unit 205 is also coupled with the adaptive sound attack based transform length varying filterbank 203. In Figure 2, the non-uniform frequency range transfer function flattening filterbank 201 and the sound attack based transform length decision unit 205 both receive an audio signal input. The sound attack based transform length decision unit 205 also (or instead) must receive the output of the non-uniform frequency range transfer function flattening filterbank 201 to make independent decisions for different subbands. The original time-domain signal is used to make decisions about the presence of sound attacks over the entire signal.

[0033] Referring to Figure 3 at block 301, the non-uniform frequency range transfer function flattening filterbank 201 of Figure 2 generates non-uniform frequency ranges of transform frequency coefficients from the audio input signal. At block 303, zero value frequency coefficients are stuffed at the boundaries of the frequency ranges. At block 305, the transform frequency coefficients that have been shifted beyond the last frequency range because of zero value frequency coefficient stuffing are dropped.

[0034] Figure 4 is a diagram illustrating exemplary zero value frequency coefficient stuffing according to one embodiment of the invention. In Figure 4, a line diagram indicates 320 transform frequency coefficients. The 320 transform frequency coefficients have been partitioned into 5 frequency ranges (also referred to as subbands). Frequency ranges 401, 403, 405, 407, and 409 respectively include transform frequency coefficients 1 - 32, 33 - 64, 65 - 128, 129 - 192, and 193 - 320. In alternative embodiments of the invention greater or fewer frequency ranges may be generated. Also, a greater or fewer number of transform frequency coefficients may be generated.
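The partitioning described for Figure 4 can be illustrated with a minimal Python sketch. The function name `partition` and the constant `WIDTHS` are hypothetical names chosen for this illustration; only the five widths (32, 32, 64, 64, and 128) come from the description above.

```python
from itertools import accumulate

# Widths of the five frequency ranges described for Figure 4.
WIDTHS = [32, 32, 64, 64, 128]

def partition(coeffs, widths=WIDTHS):
    """Split a flat coefficient sequence into consecutive ranges of the given widths."""
    if sum(widths) != len(coeffs):
        raise ValueError("widths must cover the whole coefficient sequence")
    ends = list(accumulate(widths))
    starts = [0] + ends[:-1]
    return [coeffs[s:e] for s, e in zip(starts, ends)]

# Label the coefficients 1..320 so the range limits can be read off directly.
ranges = partition(list(range(1, 321)))
bounds = [(r[0], r[-1]) for r in ranges]
```

With the coefficients labeled 1 to 320, `bounds` holds the first and last label of each range, which can be compared against the ranges 401 to 409 listed above.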

[0035] After zero value frequency coefficient stuffing, a different set of frequency ranges is generated. A frequency range 411 includes transform frequency coefficients 1 - 30 and two zero value frequency coefficients at the end of the frequency range 411. Frequency ranges 413, 415, and 417 each include two zero value frequency coefficients at their beginning and at their end. Between the boundary zero value frequency coefficients, the frequency ranges 413, 415, and 417 respectively include transform frequency coefficients 31 - 58, 59 - 118, and 119 - 178. The last frequency range 419 includes two zero value frequency coefficients at the beginning of the range and transform frequency coefficients 179 - 304. As illustrated by Figure 4, stuffing sixteen zero value frequency coefficients at the boundaries of the frequency ranges has resulted in the last sixteen transform frequency coefficients being shifted out of the last frequency range 419 and dropped. Typically, the frequency coefficients that are dropped represent frequencies that are not perceivable by the human ear. Although Figure 4 has been described with reference to stuffing two zero value frequency coefficients at the boundaries of frequency ranges, a lesser number or greater number of zero value frequency coefficients can be stuffed at the boundaries of frequency ranges.

[0036] As previously stated, displacing transform frequency coefficients at the boundaries of frequency ranges with zero value frequency coefficients flattens the amplitude transfer function for the represented audio signal. Flattening the transfer function enables the same transform coefficients to be used for PAM construction and for quantization and encoding.
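The zero stuffing of Figure 4 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation of the zero-stuffing step: the function name `zero_stuff` and its parameters are hypothetical, and the coefficients are replaced by their labels 1 to 320 so that the result can be compared with the ranges 411 to 419 described above.

```python
def zero_stuff(coeffs, widths=(32, 32, 64, 64, 128), pad=2):
    """Rebuild ranges of the same widths with `pad` zeros at each internal boundary.

    Coefficients are consumed in order. Whatever no longer fits in the last
    range is returned separately as the dropped high-frequency tail.
    """
    ranges, pos = [], 0
    last = len(widths) - 1
    for i, width in enumerate(widths):
        lead = pad if i > 0 else 0       # no leading zeros in the first range
        trail = pad if i < last else 0   # no trailing zeros in the last range
        keep = width - lead - trail      # real coefficients that still fit
        ranges.append([0] * lead + list(coeffs[pos:pos + keep]) + [0] * trail)
        pos += keep
    return ranges, list(coeffs[pos:])

# Labels 1..320 stand in for real coefficient values (an illustrative choice).
stuffed, dropped = zero_stuff(list(range(1, 321)))
```

The range widths are unchanged by the stuffing, so every inserted zero pushes one coefficient toward the high end, and the number of dropped coefficients equals the number of inserted zeros.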

[0037] Returning to Figure 3, normalization coefficients are generated based on the zero stuffed non-uniform frequency ranges at block 307. At block 309, a transform is performed on each frequency range based on the width of the frequency range. At block 311, the audio signal and transform frequency coefficients are analyzed for sound attacks and the transform length performed on frequency ranges is varied based on detection of a sound attack.

[0038] Referring to Figure 2, the sound attack based transform is performed by the adaptive sound attack based transform length varying filterbank 203. The sound attack based transform length decision unit 205 of Figure 2 determines if a sound attack is present in a particular frequency range and indicates to the adaptive sound attack based transform length varying filterbank 203 the appropriate transform length that should be applied.

[0039] The sound attack based transform length decision unit 205 is coupled with a lossless coding unit 211 and sends indications of applied transform lengths to the lossless coding unit 211. The adaptive sound attack based transform length varying filterbank 203 is coupled with a quantization unit 209 and a PAM computing unit 207. The adaptive sound attack based transform length varying filterbank 203 sends transform frequency coefficients and block length to the quantization unit 209 and the PAM computing unit 207.

[0040] The non-uniform frequency range transfer function flattening filterbank 201 is coupled with the lossless coding unit 211. The non-uniform frequency range transfer function flattening filterbank 201 generates normalization coefficients as described at block 307 in Figure 3 and sends these generated normalization coefficients to the lossless coding unit 211. In an alternative embodiment of the invention, the normalization coefficients are sent to the quantization unit 209.

[0041] Partitioning a signal into multiple frequency ranges and processing the multiple frequency ranges separately reduces the complexity of the encoded audio signal and enables flexibility of the algorithm.

[0042] Figure 5 is a block diagram of an exemplary audio encoding unit with a non-uniform frequency range transfer function flattening filterbank and an adaptive sound attack based transform length varying filterbank according to one embodiment of the invention. In Figure 5, a modified discrete cosine transform 640 (MDCT640) unit 501 receives 320 samples. Each time period, 320 samples are received by the MDCT640 unit 501 and combined with the previous 320 samples to generate a 640 sample frame. The MDCT640 unit 501 windows and transforms these 640 samples to obtain 320 transform frequency coefficients. The MDCT640 unit 501 then partitions the 320 transform frequency coefficients into frequency ranges of non-uniform width. These frequency ranges are sent to a zero-stuffing unit 503. The zero-stuffing unit 503 stuffs zero value frequency coefficients at the boundaries of the frequency ranges and drops those transform frequency coefficients shifted out of the last frequency range, as previously described.
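The framing performed by the MDCT640 unit 501 (320 new samples joined with the previous 320, windowed, and transformed into 320 coefficients) can be sketched in Python. Several details are assumptions made for this illustration: the sine window, the standard MDCT definition without any scaling factor, the direct O(N^2) evaluation, and the names `sine_window`, `mdct`, and `Mdct640`. The description does not state which window or normalization the unit uses.

```python
import math

N = 320  # new samples per time period; the MDCT frame holds 2 * N = 640 samples

def sine_window(length):
    # Assumed window shape; the description does not name the window.
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(frame):
    """Direct O(N^2) MDCT: 2N windowed samples in, N coefficients out."""
    two_n = len(frame)
    half = two_n // 2
    win = sine_window(two_n)
    shift = 0.5 + half / 2  # standard MDCT phase offset n0 = 1/2 + N/2
    return [
        sum(win[n] * frame[n] * math.cos(math.pi / half * (n + shift) * (k + 0.5))
            for n in range(two_n))
        for k in range(half)
    ]

class Mdct640:
    """Keeps the previous 320 samples and emits 320 coefficients per call."""

    def __init__(self):
        self.prev = [0.0] * N  # silent history before the first frame

    def push(self, samples):
        if len(samples) != N:
            raise ValueError("expected 320 samples")
        frame = self.prev + list(samples)  # 640-sample frame with 50% overlap
        self.prev = list(samples)
        return mdct(frame)

unit = Mdct640()
coeffs = unit.push([0.0] * N)  # first call pairs the input with the silent history
```

A practical encoder would use a fast algorithm instead of the direct sum; the direct form is used here only because it makes the 640-in, 320-out relationship easy to see.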

[0043] After zero-stuffing, the zero-stuffing unit 503 sends each frequency range to a different normalization unit. In Figure 5, the 320 transform frequency coefficients have been partitioned into 5 frequency ranges. Each of the frequency ranges is sent to a different one of normalization units 505A - 505E. The energy and dynamic range of transform frequency coefficients is different for different frequency ranges. Typically, the average energy in the first frequency range is 50-80 dB larger than for the last frequency range. Normalizing each frequency range separately enables further computations in each frequency range using relatively simple fixed-point arithmetic. Each of the normalization units 505A - 505E generates a normalization coefficient for its corresponding frequency range, which is sent to the next unit in the encoding process (e.g., the quantization unit). Each normalized frequency range then flows into one of a set of inverse MDCT (IMDCT) units. In Figure 5, the first frequency range flows into an IMDCT64 unit 507A and the second frequency range flows into an IMDCT64 unit 507B. The third and fourth frequency ranges respectively flow into IMDCT128 units 507C and 507D. The fifth frequency range flows into an IMDCT256 unit 507E. Each of the IMDCT units 507A - 507E applies an inverse DCT-IV transform, windowing, and overlapping with the previous normalized transform frequency coefficients to the received normalized transform frequency coefficients. Output from the IMDCT units 507A - 507E respectively flows into MDCT units 509A - 509E. Output from the IMDCT units 507A - 507E also flows into a sound attack based transform length decision unit 504.
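Per-range normalization can be sketched in Python as follows. The description does not give the formula used by the normalization units 505A - 505E, so the peak-magnitude scaling below is purely an assumption for illustration, as are the names `normalize_range` and `normalize_all`.

```python
def normalize_range(coeffs):
    """Scale one frequency range so that its largest magnitude becomes 1.0.

    Returns (normalized_coeffs, normalization_coefficient). Peak scaling is an
    assumed rule; the source does not specify how the coefficient is derived.
    """
    peak = max((abs(c) for c in coeffs), default=0.0)
    if peak == 0.0:
        return list(coeffs), 1.0  # all-zero range: nothing to scale
    return [c / peak for c in coeffs], peak

def normalize_all(ranges):
    """Normalize every range independently and collect one coefficient per range."""
    pairs = [normalize_range(r) for r in ranges]
    return [p[0] for p in pairs], [p[1] for p in pairs]

# Toy ranges with very different energy, standing in for low and high subbands.
example = [[1000.0, -4000.0], [0.5, 0.25], [0.0, 0.0]]
normalized, norm_coeffs = normalize_all(example)
```

Because each range is scaled on its own, ranges with very different energy end up in the same numeric interval, which is the property that makes fixed-point processing of each range practical.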

[0044] The sound attack based transform length decision unit 504 analyzes the raw 640 samples and the frequency ranges from the IMDCT units 507A - 507E to detect sound attacks over the entire frame and/or within each frequency range. Based on detection of a sound attack, the sound attack based transform length decision unit 504 indicates to the appropriate MDCT unit the transform length that should be performed on a certain frequency range. The sound attack based transform length decision unit 504 also indicates to a lossless encoding unit the length of transform performed.

[0045] To illustrate transform length varying based on sound attack detection, processing of the first frequency range received by the MDCT512/128 unit 509A will be explained. If a sound attack is not detected in the first frequency range, then a 256-sample-long transform is used. In other words, 8 outputs of 32 transform frequency coefficients each are combined to obtain a sequence of length 256. This sequence is coupled with the 256 previous samples to obtain an input frame for the length 512 MDCT transform performed by the MDCT512/128 unit 509A. The MDCT512/128 unit 509A will generate 256 transform frequency coefficients. If a sound attack is detected in the first frequency range, then the MDCT512/128 unit 509A is switched to short-length mode of functioning. First, a transitional frame of length 256+64=320 is transformed. After the transitional frame is transformed, short transforms of length 128 are applied to the first frequency range until a decision is made by the sound attack based transform length decision unit 504 to switch to long-length transform. Another transitional frame (of length 320) is used to switch from short-length to long-length mode. Although in one embodiment of the invention MDCT units perform short or long length transforms, alternative embodiments of the invention have a greater number of modes of transform length. By switching to short transform length mode, the time resolution can be made 4 times finer during sound attacks or dynamically changing signals in any frequency range.

[0046] The transform frequency coefficients generated by the MDCT units 509A - 509E are sent to a multiplexer 511. The multiplexer 511 orders the received transform frequency coefficients to form a sequence that will be quantized and losslessly encoded according to a PAM.

[0047] Assuming E₀ denotes the sampling frequency of an audio signal and the audio signal does not include sound attacks (i.e., all MDCT units are functioning in long-length mode), then the maximal frequency resolution for low frequencies is equal to E₀/2/320/8 Hz. For example, if E₀ = 44100 Hz, then the frequency resolution will be equal to 8.6 Hz for the first and second frequency ranges. For the third and fourth frequency ranges the frequency resolution will be equal to 17.2 Hz. For the fifth frequency range, the frequency resolution will be equal to 68.9 Hz.

[0048] The audio encoder described in the above figures can be applied to applications that require scalability, embedded functioning, and/or support of multiple sampling rates and multiple bit rates. For example, assume a 44.1 kHz audio signal input is partitioned into 5 frequency ranges (or subbands). The information transmitted to various users can be scaled to accommodate particular users. One set of users may receive all 5 frequency ranges whereas other users may only receive the first three frequency ranges (the lower frequency ranges). The two different sets of users are provided different bit-rates and different signal quality. The audio decoders of the set of users that receive only the lower frequency ranges reconstruct half of the time-domain samples, resulting in a 22.05 kHz signal sampling frequency. If a set of users only receives the first frequency range (lowest frequency), then the reconstructed signal can be reproduced with a sampling rate of 8 or 11.025 kHz.
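The frequency resolution arithmetic for long-length mode can be checked with a short Python sketch. The function name `frequency_resolution` and the table `FRAMES_PER_LONG_TRANSFORM` are hypothetical. The factor 8 for the first two frequency ranges is stated in the description; the factors 4 (third and fourth ranges) and 1 (fifth range) are inferred here from the quoted resolutions and should be read as assumptions.

```python
def frequency_resolution(sample_rate_hz, frames_per_long_transform, frame_coeffs=320):
    """Resolution in Hz: sample_rate / 2 / frame_coeffs / frames_per_long_transform."""
    return sample_rate_hz / 2 / frame_coeffs / frames_per_long_transform

# Number of first-stage frames combined by the long transform of each range.
# 8 is given in the text; 4 and 1 are inferred from the quoted resolutions.
FRAMES_PER_LONG_TRANSFORM = {1: 8, 2: 8, 3: 4, 4: 4, 5: 1}

resolutions = {
    rng: frequency_resolution(44100, frames)
    for rng, frames in FRAMES_PER_LONG_TRANSFORM.items()
}
```

For a 44100 Hz input, each of the 320 first-stage coefficients spans 44100/2/320 Hz, and the second-stage long transform divides that span by the number of frames it combines.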

Decoding a Zero Stuffed Length Varied Audio Signal

[0049] Decoding a zero stuffed length varied audio signal involves performing inverse operations of encoding described above.

[0050] Figure 6 is a block diagram illustrating an exemplary audio decoder according to one embodiment of the invention. A demultiplexer 601 receives a bitstream. The demultiplexer 601 is coupled with a lossless decoder and dequantizer 603 and an inverse non-uniform filterbank 605. The demultiplexer 601 extracts encoded data (quantized and encoded zero stuffed length varied transform frequency coefficients) and bit allocation from the received bitstream and sends them to the lossless decoder and dequantizer 603. The demultiplexer 601 also extracts frame length from the bitstream and sends the frame length to the lossless decoder and dequantizer 603 and the inverse non-uniform filterbank 605. The lossless decoder and dequantizer 603 uses the bit allocation and the frame length to decode and dequantize the encoded data received from the demultiplexer 601. The lossless decoder and dequantizer 603 outputs transform frequency coefficients and normalization coefficients to the inverse non-uniform filterbank 605. The inverse non-uniform filterbank 605 processes the transform frequency coefficients and the normalization coefficients to generate synthesized audio data.

[0051] Figure 7 is a block diagram of an exemplary inverse non-uniform filterbank according to one embodiment of the invention. A demultiplexer 701 is coupled with IMDCT units 703A - 703E. The IMDCT units 703A - 703D are IMDCT 512/128 units. The IMDCT unit 703E is an IMDCT 256/64 unit. The demultiplexer 701 receives transform frequency coefficients and demultiplexes the transform frequency coefficients into frequency ranges. Frequency ranges 1 - 5 respectively flow to IMDCT units 703A - 703E. All of the IMDCT units 703A - 703E also receive frame length. After the IMDCT units 703A - 703E perform inverse MDCT on the frequency range(s) that they have received, the outputs from the IMDCT units 703A - 703E respectively flow to MDCT units 705A - 705E. MDCT units 705A - 705B are MDCT64 units. MDCT units 705C - 705D are MDCT128 units. MDCT unit 705E is an MDCT256 unit. The MDCT units 705A - 705E are respectively coupled with de-normalization units 707A - 707E. Outputs from the MDCT units 705A - 705E respectively flow to the de-normalization units 707A - 707E. The de-normalization units 707A - 707E also receive normalization coefficients. The de-normalization units 707A - 707E de-normalize the transform frequency coefficients received from the MDCT units 705A - 705E using the normalization coefficients. The de-normalized transform frequency coefficients flow into a zero-removing unit 709. The zero-removing unit 709 modifies the frequency ranges by removing boundary frequency coefficients that were originally zero value frequency coefficients.

[0052] Figure 8 is a diagram illustrating removal of boundary frequency coefficients from frequency ranges according to one embodiment of the invention. In Figure 8, frequency ranges 801, 803, 805, 807, and 809 respectively include transform frequency coefficients 1 - 32, 33 - 64, 65 - 128, 129 - 192, and 193 - 320. In the example illustrated in Figure 8, the following transform frequency coefficients were originally zero value frequency coefficients: 31 - 34, 63 - 66, 127 - 130, and 191 - 194. After removal of boundary frequency coefficients, the resulting frequency ranges 811, 813, 815, 817, and 819 respectively include the following frequency coefficients: 1 - 32, 35, 36; 37 - 60, 65 - 72; 73 - 126, 131 - 140; 141 - 190, 195 - 208; and 209 - 304. In addition to transform frequency coefficients 209 - 304, the frequency range 819, which corresponds to the frequency range 809, also includes zero value frequency coefficients as the frequency coefficients 305 - 320.

[0053] Returning to Figure 7, the zero-removing unit 709 passes the modified frequency ranges to an IMDCT640 unit 711. After performing inverse MDCT on the frequency ranges, the IMDCT640 unit 711 outputs synthesized audio data.
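The zero removal performed on the decoder side can be sketched in Python. This is a simplified illustration, not the zero-removing unit 709 itself: the function name `remove_zeros` is hypothetical, two boundary positions per side are assumed as in the Figure 4 description, and the surviving coefficients are labeled 1 to 304 in order rather than by their position in the stuffed sequence as in the Figure 8 description.

```python
def remove_zeros(ranges, pad=2, total=320):
    """Drop the `pad` boundary positions of each range and zero-fill the tail."""
    last = len(ranges) - 1
    coeffs = []
    for i, r in enumerate(ranges):
        start = pad if i > 0 else 0                # first range has no leading zeros
        stop = len(r) - pad if i < last else len(r)  # last range has no trailing zeros
        coeffs.extend(r[start:stop])
    # The dropped high-frequency coefficients cannot be recovered; use zeros.
    return coeffs + [0] * (total - len(coeffs))

# Zero stuffed ranges laid out as in the Figure 4 description, with the
# labels 1..304 standing in for decoded coefficient values.
stuffed = [
    list(range(1, 31)) + [0, 0],
    [0, 0] + list(range(31, 59)) + [0, 0],
    [0, 0] + list(range(59, 119)) + [0, 0],
    [0, 0] + list(range(119, 179)) + [0, 0],
    [0, 0] + list(range(179, 305)),
]
restored = remove_zeros(stuffed)
```

The result is again a 320-coefficient sequence, with the 304 surviving coefficients in order followed by 16 zero value coefficients in the highest positions, which is the form an inverse transform of length 640 expects.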
[0054] The audio encoder and decoder described above include memories, processors, and/or ASICs. Such memories include a machine-readable medium on which is stored a set of instructions (i.e., software) embodying any one, or all, of the methodologies described herein. Software can reside, completely or at least partially, within this memory and/or within the processor and/or ASICs. For the purpose of this specification, the term "machine-readable medium" shall be taken to include any mechanism that provides (i.e., stores and/or transmits) information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes read only memory ("ROM"), random access memory ("RAM"), magnetic disk storage media, optical storage media, flash memory devices, electrical, optical, acoustical, or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.), etc.

Alternative Embodiments

[0055] While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described. For instance, while the flow diagrams show a particular order of operations performed by certain embodiments of the invention, it should be understood that such order is exemplary (e.g., alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, etc.). In addition, while embodiments of the invention have been described with reference to MDCT and IMDCT, alternative embodiments of the invention utilize other transform coding techniques.

[0056] Thus, the method and apparatus of the invention can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting on the invention.

Claims

CLAIMS We claim:

1. A method for audio compressing comprising: receiving an audio signal; applying transform coding to the audio signal to generate a sequence of transform frequency coefficients; partitioning the sequence of transform frequency coefficients into a plurality of non-uniform width frequency ranges; inserting zero value frequency coefficients at the boundaries of the non-uniform width frequency ranges; and dropping certain of the transform frequency coefficients that represent high frequencies.

2. The method of claim 1 further comprising separately applying a transform to each of the plurality of non-uniform width frequency ranges.

3. The method of claim 2 wherein application of the transform is in parallel.

4. The method of claim 1 further comprising varying length of transform operations applied to each of the plurality of non-uniform width frequency ranges.

5. The method of claim 1 wherein the number of dropped transform frequency coefficients is equal to the number of inserted zero value frequency coefficients.

6. The method of claim 1 further comprising: constructing a psycho-acoustic model with the plurality of non-uniform width frequency ranges with inserted zero value frequency coefficients; and quantizing the plurality of non-uniform width frequency ranges with inserted zero value frequency coefficients.

7. A method for audio compression comprising: generating a plurality of frequency coefficients representing an audio signal; grouping the plurality of frequency coefficients into frequency ranges of non-uniform width; determining if a sound attack occurs in any one of the non-uniform width frequency ranges; and performing transform length switching separately on each of the frequency ranges based on determining occurrence of a sound attack.

8. The method of claim 7 further comprising stuffing zeros at the boundaries of the non-uniform width frequency ranges and dropping certain of the plurality of frequency coefficients that represent higher end frequencies.

9. The method of claim 8 wherein stuffing zeros at the boundaries comprises: inserting zeros at the boundaries of the frequency ranges; and shifting those of the plurality of frequency coefficients that are displaced by the inserted zeros into the next frequency range.

10. The method of claim 7 further comprising separately performing transforms on each of the plurality of non-uniform width frequency ranges based on their width.

11. The method of claim 10 wherein the transforms are inverse modified discrete cosine transforms.

12. The method of claim 7 wherein the performed long and short transforms are modified discrete cosine transforms.

13. A method for audio compression comprising: applying a transform to a plurality of audio samples to generate a sequence of transform frequency coefficients; and partitioning the sequence of transform frequency coefficients into varying width frequency subbands with zero value frequency coefficients at the boundaries of the frequency subbands.

14. The method of claim 13 further comprising dropping a set of one or more transform frequency coefficients in the highest frequency subband.

15. The method of claim 14 wherein the number of dropped transform frequency coefficients corresponds to the number of zero value frequency coefficients stuffed at the boundaries of the frequency subbands.

16. The method of claim 13 further comprising: constructing a psycho-acoustic model with the varying width subbands; and quantizing the varying width subbands.

17. The method of claim 13 further comprising applying transforms of varying length to each of the varying width subbands.

18. A method for audio compression comprising: partitioning an audio input into a plurality of non-uniform frequency subbands, each of the plurality of non-uniform frequency subbands including a set of one or more frequency coefficients; displacing those of the set of frequency coefficients at the boundary of each subband with zeros; and dropping those of the set of frequency coefficients that fall outside of the plurality of frequency subbands after the displacing.

19. The method of claim 18 further comprising separately applying a transform to each of the plurality of non-uniform frequency subbands.

20. The method of claim 19 wherein application of the transform is in parallel.

21. The method of claim 18 further comprising varying length of transform operations applied to each of the plurality of non-uniform frequency subbands.

22. The method of claim 18 wherein the number of dropped frequency coefficients is equal to the number of inserted zeros.

23. The method of claim 18 further comprising: constructing a psycho-acoustic model with the plurality of non-uniform frequency subbands; and quantizing the plurality of non-uniform frequency subbands.

24. A method for audio compression comprising: generating a plurality of non-uniform frequency subbands, each of the plurality of non-uniform frequency subbands including a set of one or more frequency coefficients, from an audio input signal; displacing those of the set of frequency coefficients at the boundary of each non-uniform frequency subband with zeros; separately normalizing the non-uniform frequency subbands, including the zeros; varying transform length applied to each of the plurality of non-uniform frequency subbands based on the detection of a sound attack within the plurality of non-uniform frequency subbands; and multiplexing the plurality of non-uniform frequency subbands.

25. The method of claim 24 wherein inverse modified discrete cosine transform is applied to the plurality of non-uniform frequency subbands after normalizing.

26. The method of claim 24 wherein the varied transform is modified discrete cosine transform.

27. A method for audio decompression comprising: receiving a bitstream; extracting a sequence of transform frequency coefficients from the bitstream; demultiplexing the sequence of transform frequency coefficients into a plurality of frequency ranges; removing boundary transform frequency coefficients that were originally zeros from the plurality of frequency ranges; shifting the remaining transform frequency coefficients to fill in for the removed boundary transform frequency coefficients; and inserting zeros into vacancies in the higher range of the plurality of frequency ranges caused by said shifting.

28. The method of claim 27 further comprising applying inverse modified discrete cosine transform to the plurality of frequency ranges.

29. The method of claim 27 further comprising decoding and dequantizing the sequence of transform frequency coefficients.

30. A method for audio compression comprising: partitioning an audio signal into a plurality of non-uniform width frequency ranges, each of the plurality of non-uniform width frequency ranges including a set of one or more transform frequency coefficients; indicating the width of each of the plurality of frequency ranges; separately processing each of the plurality of non-uniform width frequency ranges; and encoding the plurality of frequency ranges and their width indications.

31. The method of claim 30 further comprising separately performing transform length switching on one of the plurality of frequency ranges based on detection of a sound attack within the one of the plurality of frequency ranges.

32. The method of claim 30 further comprising: stuffing zeros at the boundaries of the plurality of frequency ranges; shifting those transform frequency coefficients displaced by the stuffed zeros; and dropping those transform frequency coefficients that fall outside of the plurality of frequency ranges from said shifting.

33. The method of claim 30 wherein the processing comprises normalizing and transforming.

34. The method of claim 33 wherein the transforming is modified discrete cosine transforming.

35. An apparatus comprising: an adaptive non-uniform filterbank to represent an audio input with a number of transform frequency coefficients that is less than the audio input's number of samples; a quantization unit coupled with the adaptive non-uniform filterbank, the quantization unit to receive transform frequency coefficients from the adaptive non-uniform filterbank; and a lossless encoding unit coupled with the quantization unit, the lossless encoding unit to receive quantized transform coefficients from the quantization unit.

36. The apparatus of claim 35 wherein the adaptive non-uniform filterbank comprises: a non-uniform frequency range transform function flattening filterbank to partition a sequence of transform frequency coefficients generated from the audio input into frequency ranges of non-uniform width and to flatten a transfer function of the sequence of transform frequency coefficients; an adaptive sound attack based transform length varying filterbank coupled with the non-uniform frequency range transform function flattening filterbank; a sound attack detection unit coupled with the adaptive sound attack based transform length varying filterbank; and a multiplexer coupled with the adaptive sound attack based transform length varying filterbank.

37. The apparatus of claim 36 wherein the non-uniform frequency range transform function flattening filterbank comprises: a modified discrete cosine transform unit; a frequency range boundary zero stuffing unit coupled with the transform unit; and a plurality of parallel inverse modified discrete cosine transform units coupled with the frequency range boundary zero stuffing unit.

38. The apparatus of claim 36 wherein the adaptive sound attack based transform length varying filterbank comprises a plurality of parallel multi-length transform units.

39. The apparatus of claim 35 further comprising a psycho-acoustic model computing unit coupled with the adaptive non-uniform filterbank and the quantization unit.

40. An apparatus comprising: a non-uniform frequency range transform function flattening filterbank to receive an audio signal, to partition the audio signal into varying frequency ranges of frequency coefficients, and to perform zero bit stuffing at the boundaries of the frequency ranges and to drop certain high frequency coefficients; a sound attack detection unit coupled with the non-uniform frequency range transform function flattening filterbank, the sound attack detection unit to locate sound attacks within the audio signal; an adaptive sound attack based transform length varying filterbank coupled with the non-uniform frequency range transform function flattening filterbank and the sound attack detection unit, the adaptive sound attack based transform length varying filterbank to perform varying length transforms on the audio signal based on sound attack detection indicated by the sound attack detection unit; a multiplexer coupled with the adaptive sound attack based transform length varying filterbank; a quantization unit coupled with the multiplexer; a psycho-acoustic model (PAM) computing unit coupled with the multiplexer; and a lossless coding unit coupled with the quantization unit and the PAM computing unit, the lossless coding unit to losslessly code transform coefficients received from the quantization unit.

41. The apparatus of claim 40 wherein the non-uniform frequency range transform function flattening filterbank comprises: a modified discrete cosine transform unit; a frequency range boundary zero stuffing unit coupled with the transform unit; and a plurality of parallel inverse modified discrete cosine transform units coupled with the frequency range boundary zero stuffing unit.

42. The apparatus of claim 40 wherein the adaptive sound attack based transform length varying filterbank comprises a plurality of parallel multi-length transform units.

43. An audio decoder comprising: a demultiplexer to receive a bitstream and to extract a sequence of transform frequency coefficients; and an inverse adaptive non-uniform filterbank coupled with the demultiplexer, the inverse adaptive non-uniform filterbank to partition a sequence of transform frequency coefficients into a plurality of non-uniform width frequency ranges, to remove certain boundary transform frequency coefficients originally based on zeros, and to insert zeros for previously removed high range transform frequency coefficients.

44. The audio decoder of claim 43 wherein the inverse adaptive non-uniform filterbank includes: a plurality of parallel inverse modified discrete cosine transform units; a plurality of parallel modified discrete cosine transform units coupled with the plurality of parallel inverse modified discrete cosine transform units; a plurality of parallel de-normalization units coupled with the plurality of parallel modified discrete cosine transform units; a zero removing unit coupled with the plurality of de-normalization units; and an inverse modified discrete cosine transform unit coupled with the zero removing unit.

45. The audio decoder of claim 43 further comprising a decoder and dequantizer unit coupled with the demultiplexer.

46. A machine-readable medium having a set of instructions stored thereon, which when executed by a set of one or more processors causes the set of processors to perform the operations comprising: receiving an audio signal; applying transform coding to the audio signal to generate a sequence of transform frequency coefficients; partitioning the sequence of transform frequency coefficients into a plurality of non-uniform width frequency ranges; inserting zero value frequency coefficients at the boundaries of the non-uniform width frequency ranges; and dropping certain of the transform frequency coefficients that represent high frequencies.

47. The machine-readable medium of claim 46 further comprising separately applying a transform to each of the plurality of non-uniform width frequency ranges.

48. The machine-readable medium of claim 47 wherein application of the transform is in parallel.

49. The machine-readable medium of claim 46 further comprising varying length of transform operations applied to each of the plurality of non-uniform width frequency ranges.

50. The machine-readable medium of claim 46 wherein the number of dropped transform frequency coefficients is equal to the number of inserted zero value frequency coefficients.

51. The machine-readable medium of claim 46 further comprising: constructing a psycho-acoustic model with the plurality of non-uniform width frequency ranges with inserted zero value frequency coefficients; and quantizing the plurality of non-uniform width frequency ranges with inserted zero value frequency coefficients.

52. A machine-readable medium having a set of instructions stored thereon, which when executed by a set of one or more processors causes the set of processors to perform the operations comprising: generating a plurality of frequency coefficients representing an audio signal; grouping the plurality of frequency coefficients into frequency ranges of non-uniform width; determining if a sound attack occurs in any one of the non-uniform width frequency ranges; and performing short transforms on those non-uniform frequency ranges that have a sound attack and long transforms on those non-uniform frequency ranges that do not have a sound attack.

53. The machine-readable medium of claim 52 further comprising stuffing zeros at the boundaries of the non-uniform width frequency ranges and dropping certain of the plurality of frequency coefficients that represent higher end frequencies.

54. The machine-readable medium of claim 53 wherein stuffing zeros at the boundaries comprises: inserting zeros at the boundaries of the frequency ranges; and shifting those of the plurality of frequency coefficients that are displaced by the inserted zeros into the next frequency range.

55. The machine-readable medium of claim 52 further comprising separately performing transforms on each of the plurality of non-uniform width frequency ranges based on their width.

56. The machine-readable medium of claim 55 wherein the transforms are inverse modified discrete cosine transforms.

57. The machine-readable medium of claim 52 wherein the performed long and short transforms are modified discrete cosine transforms.

58. A machine-readable medium having a set of instructions stored thereon, which when executed by a set of one or more processors causes the set of processors to perform the operations comprising: applying a transform to a plurality of audio samples to generate a sequence of transform frequency coefficients; and partitioning the sequence of transform frequency coefficients into varying width frequency subbands with zero value frequency coefficients at the boundaries of the frequency subbands.

59. The machine-readable medium of claim 58 further comprising dropping a set of one or more transform frequency coefficients in the highest frequency subband.

60. The machine-readable medium of claim 59 wherein the number of dropped transform frequency coefficients corresponds to the number of zero value frequency coefficients stuffed at the boundaries of the frequency subbands.

61. The machine-readable medium of claim 58 further comprising: constructing a psycho-acoustic model with the varying width subbands; and quantizing the varying width subbands.

62. The machine-readable medium of claim 58 further comprising applying transforms of varying length to each of the varying width subbands.

63. A machine-readable medium having a set of instructions stored thereon, which when executed by a set of one or more processors causes the set of processors to perform the operations comprising: partitioning an audio input into a plurality of non-uniform frequency subbands, each of the plurality of non-uniform frequency subbands including a set of one or more frequency coefficients; displacing those of the set of frequency coefficients at the boundary of each subband with zeros; and dropping those of the set of frequency coefficients that fall outside of the plurality of frequency subbands after the displacing.

64. The machine-readable medium of claim 63 further comprising separately applying a transform to each of the plurality of non-uniform frequency subbands.

65. The machine-readable medium of claim 64 wherein application of the transform is in parallel.

66. The machine-readable medium of claim 63 further comprising varying length of transform operations applied to each of the plurality of non-uniform frequency subbands.

67. The machine-readable medium of claim 63 wherein the number of dropped frequency coefficients is equal to the number of inserted zeros.

68. The machine-readable medium of claim 63 further comprising: constructing a psycho-acoustic model with the plurality of non-uniform frequency subbands; and quantizing the plurality of non-uniform frequency subbands.

69. A machine-readable medium having a set of instructions stored thereon, which when executed by a set of one or more processors causes the set of processors to perform the operations comprising: generating a plurality of non-uniform frequency subbands, each of the plurality of non-uniform frequency subbands including a set of one or more frequency coefficients, from an audio input signal; displacing those of the set of frequency coefficients at the boundary of each non-uniform frequency subband with zeros; separately normalizing the non-uniform frequency subbands, including the zeros; varying transform length applied to each of the plurality of non-uniform frequency subbands based on the detection of a sound attack within the plurality of non-uniform frequency subbands; and multiplexing the plurality of non-uniform frequency subbands.

70. The machine-readable medium of claim 69 wherein inverse modified discrete cosine transform is applied to the plurality of non-uniform frequency subbands after normalizing.

71. The machine-readable medium of claim 69 wherein the varied transform is modified discrete cosine transform.

72. A machine-readable medium having a set of instructions stored thereon, which when executed by a set of one or more processors causes the set of processors to perform the operations comprising: receiving a bitstream; extracting a sequence of transform frequency coefficients from the bitstream; demultiplexing the sequence of transform frequency coefficients into a plurality of frequency ranges; removing boundary transform frequency coefficients that were originally zeros from the plurality of frequency ranges; shifting the remaining transform frequency coefficients to fill in for the removed boundary transform frequency coefficients; and inserting zeros into vacancies in the higher range of the plurality of frequency ranges caused by said shifting.

73. The machine-readable medium of claim 72 further comprising applying inverse modified discrete cosine transform to the plurality of frequency ranges.

74. The machine-readable medium of claim 72 further comprising decoding and dequantizing the sequence of transform frequency coefficients.

75. A machine-readable medium having a set of instructions stored thereon, which when executed by a set of one or more processors causes the set of processors to perform the operations comprising: partitioning an audio signal into a plurality of non-uniform width frequency ranges, each of the plurality of non-uniform width frequency ranges including a set of one or more transform frequency coefficients; indicating the width of each of the plurality of frequency ranges; separately processing each of the plurality of non-uniform width frequency ranges; and encoding the plurality of frequency ranges and their width indications.

76. The machine-readable medium of claim 75 further comprising separately performing transform length switching on one of the plurality of frequency ranges based on detection of a sound attack within the one of the plurality of frequency ranges.

77. The machine-readable medium of claim 75 further comprising: stuffing zeros at the boundaries of the plurality of frequency ranges; shifting those transform frequency coefficients displaced by the stuffed zeros; and dropping those transform frequency coefficients that fall outside of the plurality of frequency ranges from said shifting.

78. The machine-readable medium of claim 75 wherein the processing comprises normalizing and transforming.

79. The machine-readable medium of claim 78 wherein the transforming is modified discrete cosine transforming.
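
By way of illustration only, the zero stuffing recited in claims 1, 5, and 9 and the corresponding zero removal recited in claim 27 might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any embodiment: the function names (boundary_slots, stuff_zeros, unstuff_zeros), the parameter k (number of zeros per boundary), and the choice to place the zeros in the first k positions of every range above the lowest are hypothetical and are not taken from the specification.

```python
from itertools import accumulate


def boundary_slots(widths, k):
    """Return the output indices that hold stuffed zeros.

    Assumption (hypothetical layout): the first k slots of every range
    except the lowest one are reserved for zeros.
    """
    if any(k > w for w in widths[1:]):
        raise ValueError("k must not exceed the width of any upper range")
    interior = list(accumulate(widths))[:-1]  # start index of each upper range
    return {b + i for b in interior for i in range(k)}


def stuff_zeros(coeffs, widths, k=1):
    """Insert zeros at range boundaries, shift displaced coefficients upward,
    and drop the highest-frequency coefficients that no longer fit."""
    if sum(widths) != len(coeffs):
        raise ValueError("range widths must cover the whole coefficient sequence")
    zero_slots = boundary_slots(widths, k)
    src = iter(coeffs)
    # Non-zero slots are filled in order; the unread tail of src is dropped.
    return [0.0 if i in zero_slots else next(src) for i in range(len(coeffs))]


def unstuff_zeros(stuffed, widths, k=1):
    """Remove the boundary zeros, shift the remaining coefficients down,
    and fill the vacated high-frequency positions with zeros."""
    zero_slots = boundary_slots(widths, k)
    kept = [c for i, c in enumerate(stuffed) if i not in zero_slots]
    return kept + [0.0] * len(zero_slots)


if __name__ == "__main__":
    coeffs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    widths = [2, 2, 4]  # three non-uniform width frequency ranges
    stuffed = stuff_zeros(coeffs, widths, k=1)
    print(stuffed)
    print(unstuff_zeros(stuffed, widths, k=1))
```

In this sketch the sequence length is unchanged by the stuffing, so the number of dropped high-frequency coefficients equals the number of inserted zeros, and the decoder-side function restores every retained coefficient to its original index.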