US 20050159941 A1 Abstract A method and apparatus for audio compression receives an audio signal. Transform coding is applied to the audio signal to generate a sequence of transform frequency coefficients. The sequence of transform frequency coefficients is partitioned into a plurality of non-uniform width frequency ranges and then zero value frequency coefficients are inserted at the boundaries of the non-uniform width frequency ranges. As a result, certain of the transform frequency coefficients that represent high frequencies are dropped.
Claims(18) 1. A method for audio compression comprising:
generating a plurality of frequency coefficients representing an audio signal; grouping the plurality of frequency coefficients into frequency ranges of non-uniform width; determining if a sound attack occurs in any one of the non-uniform width frequency ranges; and performing transform length switching separately on each of the frequency ranges based on determining occurrence of a sound attack. 2. The method of 3. The method of insert zeros at the boundaries of the frequency ranges; and shifting those of the plurality of frequency coefficients that are displaced by the inserted zeros into the next frequency range. 4. The method of 5. The method of 6. The method of 7. A method for audio compression comprising:
generating a plurality of non-uniform frequency subbands, each of the plurality of non-uniform frequency subbands including a set of one or more frequency coefficients, from an audio input signal; displacing those of the set of frequency coefficients at the boundary of each non-uniform frequency subband with zeros; separately normalizing the non-uniform frequency subbands, including the zeros; varying transform length applied to each of the plurality of non-uniform frequency subbands based on the detection of a sound attack within the plurality of non-uniform frequency subbands; and multiplexing the plurality of non-uniform frequency subbands. 8. The method of 9. The method of 10. A machine-readable medium having a set of instructions stored thereon, which when executed by a set of one or more processors causes the set of processors to perform the operations comprising:
generating a plurality of frequency coefficients representing an audio signal; grouping the plurality of frequency coefficients into frequency ranges of non-uniform width; determining if a sound attack occurs in any one of the non-uniform width frequency ranges; and performing short transforms on those non-uniform frequency ranges that have a sound attack and long transforms on those non-uniform frequency ranges that do not have a sound attack. 11. The machine-readable medium of 12. The machine-readable medium of insert zeros at the boundaries of the frequency ranges; and shifting those of the plurality of frequency coefficients that are displaced by the inserted zeros into the next frequency range. 13. The machine-readable medium of 14. The machine-readable medium of 15. The machine-readable medium of 16. A machine-readable medium having a set of instructions stored thereon, which when executed by a set of one or more processors causes the set of processors to perform the operations comprising:
generating a plurality of non-uniform frequency subbands, each of the plurality of non-uniform frequency subbands including a set of one or more frequency coefficients, from an audio input signal; displacing those of the set of frequency coefficients at the boundary of each non-uniform frequency subband with zeros; separately normalizing the non-uniform frequency subbands, including the zeros; varying transform length applied to each of the plurality of non-uniform frequency subbands based on the detection of a sound attack within the plurality of non-uniform frequency subbands; and multiplexing the plurality of non-uniform frequency subbands. 17. The machine-readable medium of 18. The machine-readable medium of Description This is a divisional application of U.S. patent application Ser. No. 10/378,455, filed Mar. 3, 2003, which claims priority from U.S. Provisional Patent Application Ser. No. 60/450,943, filed Feb. 28, 2003. 1. Field of the Invention The invention relates to the field of data compression. More specifically, the invention relates to audio compression. 2. Background of the Invention To allow typical computing systems to process (e.g., store, transmit, etc.) audio signals, various techniques have been developed to reduce (compress) the amount of data representing an audio signal. In typical audio compression systems, the following steps are generally performed: (1) a segment or frame of an audio signal is transformed into a frequency domain; (2) transform coefficients representing (at least a portion of) the frequency domain are quantized into discrete values; and (3) the quantized values are converted (or coded) into a binary format. The encoded/compressed data can be output, stored, transmitted, and/or decoded/decompressed. To achieve relatively high compression/low bit rates (e.g., 8 to 16 kbps) for various types of audio signals (e.g., speech, music, etc.), some compression techniques (e.g., CELP, ADPCM, etc.) 
limit the number of components in a segment (or frame) of an audio signal which is to be compressed. Unfortunately, such techniques typically do not take into account relatively substantial components of an audio signal. Thus, such techniques result in a relatively poor quality synthesized (decompressed) audio signal due to loss of information. One method of audio compression that allows relatively high quality compression/decompression involves transform coding (e.g., discrete cosine transform, Fourier transform, etc.). Transform coding typically involves transforming an input audio signal using a transform method, such as low order discrete cosine transform (DCT). Typically, each transform coefficient of a portion (or frame) of an audio signal is quantized and encoded using any number of well-known coding techniques. Transform compression techniques, such as DCT, generally provide a relatively high quality synthesized signal, since they have a relatively high-energy compaction of spectral components of an input audio signal. Most audio signal compression algorithms are based on transform coding. Some examples of transform coders include Dolby AC-2, AC-3, MPEG LII and LIII, ATRAC, Sony MiniDisc, and Ogg Vorbis I. These coders employ modified discrete cosine transforms (MDCT) with different frame lengths and overlap factors. Increasing frame length leads to better frequency resolution. As a result, high compression ratios can be achieved for stationary audio signals by increasing frame length. However, transform frequency coefficient quantization errors are spread over the entire length of a frame. The pursuit of higher compression with larger frame length results in “echo”, which appears when sound attacks are present in an audio signal input. This means that frame length, or frequency resolution, should vary depending on the input audio signal. In particular, the transform length should be shorter during sound attacks and longer for stationary signals.
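The rule stated above (a shorter transform during sound attacks, a longer one for stationary signals) can be illustrated with a minimal sketch. The detector below is an assumption made for illustration, not the detection method of the specification: it splits a frame into sub-blocks and flags an attack when the energy of one sub-block exceeds a multiple of the energy of the preceding one. The names `detect_attack` and `choose_transform_length`, the sub-block count, and the threshold are hypothetical; the lengths 512 and 128 are taken from the MDCT512/128 unit named in the description.

```python
import math


def detect_attack(frame, num_blocks=8, ratio_threshold=8.0):
    """Hypothetical detector: True if any sub-block's energy jumps sharply
    above the energy of the sub-block that precedes it."""
    block_len = len(frame) // num_blocks
    energies = [
        sum(x * x for x in frame[i * block_len:(i + 1) * block_len])
        for i in range(num_blocks)
    ]
    eps = 1e-12  # avoids a zero reference when the previous sub-block is silent
    return any(
        cur > ratio_threshold * (prev + eps)
        for prev, cur in zip(energies, energies[1:])
    )


def choose_transform_length(frame, long_len=512, short_len=128):
    """Short transform when an attack is detected, long transform otherwise."""
    return short_len if detect_attack(frame) else long_len


# A stationary tone versus a frame in which sound starts abruptly mid-frame.
stationary = [math.sin(2 * math.pi * i / 32) for i in range(512)]
attack = [0.0] * 256 + [1.0] * 256
```

With these inputs, `choose_transform_length(stationary)` selects the long transform and `choose_transform_length(attack)` selects the short one. A practical detector would likely also need an absolute energy floor so that low-level noise following silence is not flagged.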
However, a sound attack may only occupy part of an entire signal bandwidth. Large transform length also leads to large computational complexity. Both the number of computations and the dynamic range of transform coefficients increase as transform length increases, so higher computational precision is required. Audio data representation and arithmetic operations must be performed with at least 24-bit precision if the frame length is greater than or equal to 1024 samples, hence 16-bit digital signal processing cannot be used for encoding/decoding algorithms. In addition, conventional MDCT provides identical frequency resolution over an entire signal, even though different frequency resolutions are appropriate for different frequency ranges. To accommodate the perceptual ability of the human ear, higher frequency resolution is needed for low-frequency ranges and lower frequency resolution is needed for high-frequency ranges. Furthermore, the amplitude transfer function of conventional MDCT is not “flat” enough. There are significant irregularities near frequency range boundaries. These irregularities make it difficult to use MDCT coefficients for psycho-acoustic analysis of the audio signal and to compute bit allocation. Conventional audio codecs compute an auxiliary spectrum (typically with FFT, which is computationally expensive) for constructing a psycho-acoustic model (PAM). A method and apparatus for audio compression is described.
According to one aspect of the invention, a method and apparatus for audio compression provides for receiving an audio signal, applying transform coding to the audio signal to generate a sequence of transform frequency coefficients, partitioning the sequence of transform frequency coefficients into a plurality of non-uniform width frequency ranges, inserting zero value frequency coefficients at the boundaries of the non-uniform width frequency ranges; and dropping certain of the transform frequency coefficients that represent high frequencies. These and other aspects of the present invention will be better described with reference to the Detailed Description and the accompanying Figures. The invention may best be understood by referring to the following description and accompanying drawings that are used to illustrate embodiments of the invention. In the drawings: In the following description, numerous specific details are set forth to provide a thorough understanding of the invention. However, it is understood that the invention may be practiced without these specific details. In other instances, well-known circuits, structures, standards, and techniques have not been shown in detail in order not to obscure the invention. Overview A method and apparatus for audio compression is described. According to one embodiment of the invention, a method and apparatus for audio compression generates frequency ranges of non-uniform width (i.e., the frequency ranges are not all represented by the same number of transform frequency coefficients) during encoding of an audio input signal. Each of these non-uniform frequency ranges is processed separately, thus reducing the computational complexity of processing the audio signal represented by the frequency ranges. Partitioning (logical or actual) a transformed audio signal input into non-uniform frequency ranges also enables utilization of different frequency resolutions based on the width of a frequency range. 
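The partitioning described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the filterbank of the specification: the function name `partition_nonuniform` and the example widths are hypothetical, chosen only so that low-frequency ranges are narrower than high-frequency ranges.

```python
def partition_nonuniform(coeffs, widths):
    """Split a sequence of transform frequency coefficients into consecutive
    frequency ranges whose widths (in coefficients) are given by `widths`."""
    if sum(widths) != len(coeffs):
        raise ValueError("widths must add up to the number of coefficients")
    ranges = []
    start = 0
    for width in widths:
        ranges.append(list(coeffs[start:start + width]))
        start += width
    return ranges


# 16 coefficients, lowest frequency first; the widths are illustrative only.
example_ranges = partition_nonuniform(list(range(16)), [2, 2, 4, 8])
```

Each element of `example_ranges` can then be handled on its own, which is what allows a different transform length or resolution to be applied to each range.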
According to another embodiment of the invention, transform frequency coefficients at the boundary of each of these frequency ranges are displaced with zero-value frequency coefficients (i.e., the frequency ranges are stuffed with zeroes at their boundaries). Stuffing zeroes at the boundaries of the frequency ranges provides for a flattened amplitude transfer function that can be used for quantizing, encoding, and psycho-acoustic model (PAM) computing. In another embodiment of the invention, normalization and transforms are performed on a set of non-uniform width frequency ranges based on their width. Separately processing different width frequency ranges enables scalability and support of multiple sampling rates and multiple bit rates. Furthermore, separately processing each of a set of non-uniform frequency ranges enables modification of time resolution based on detection of a sound attack within a particular frequency range, independent of the other frequency ranges. Decoding an audio signal that has been encoded as described above includes extracting frequency ranges from an encoded audio bitstream and processing the frequency ranges separately. Encoding an Audio Signal The transform frequency coefficients are processed by the adaptive non-uniform filterbank The normalization coefficients sent from the adaptive non-uniform filterbank The adaptive non-uniform filterbank The non-uniform frequency range transform function flattening filterbank Referring to After zero value frequency coefficient stuffing, a different set of frequency ranges are generated. A frequency range As previously stated, displacing transform frequency coefficients at the boundaries of frequency ranges with zero value frequency coefficients flattens the amplitude transfer function for the represented audio signal. Flattening the transfer function enables the same transform coefficients to be used for PAM construction and quantization and encoding. 
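Zero stuffing as described above can be sketched under some loudly stated assumptions. The text says that coefficients at range boundaries are displaced by zeros, that displaced coefficients shift into the next range, and that some high-frequency coefficients are dropped as a result. The sketch assumes that `pad` zeros are placed at both edges of every range and that the range widths stay fixed; the name `zero_stuff`, the parameter `pad`, and the example widths are hypothetical and are not taken from the specification.

```python
def zero_stuff(coeffs, widths, pad=1):
    """Fill fixed-width ranges with `pad` zeros at each edge (an assumption).

    Coefficients are consumed in order, lowest frequency first, so those
    displaced by the zeros move into the next range, and whatever does not
    fit at the top of the spectrum is returned as `dropped`.
    """
    coeffs = list(coeffs)
    stuffed = []
    pos = 0
    for width in widths:
        if width <= 2 * pad:
            raise ValueError("range too narrow for the requested padding")
        payload = width - 2 * pad           # slots left for real coefficients
        chunk = coeffs[pos:pos + payload]
        chunk += [0.0] * (payload - len(chunk))  # pad if input runs out
        stuffed.append([0.0] * pad + chunk + [0.0] * pad)
        pos += payload
    dropped = coeffs[pos:]                  # highest-frequency coefficients
    return stuffed, dropped


stuffed, dropped = zero_stuff([float(k) for k in range(1, 13)], [4, 8], pad=1)
```

Here twelve input coefficients are packed into ranges of width 4 and 8. Four slots are taken by boundary zeros, so the four highest-frequency coefficients end up in `dropped`, mirroring the statement in the abstract that certain high-frequency coefficients are dropped.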
Returning to Referring to The sound attack based transform length decision unit The non-uniform frequency range transfer function flattening filterbank Partitioning a signal into multiple frequency ranges and processing the multiple frequency ranges separately reduces the complexity of the encoded audio signal and enables flexibility of the algorithm. After zero-stuffing, the zero-stuffing unit The sound attack based transform length decision unit To illustrate transform length varying based on sound attack detection, processing of the first frequency range received by the MDCT512/128 unit The transform frequency coefficients generated by the MDCT units Assuming F The audio encoder described in the above figures can be applied to applications that require scalability, embedded functioning, and/or support of multiple sampling rates and multiple bit rates. For example, assume a 44.1 kHz audio signal input is partitioned into 5 frequency ranges (or subbands). The information transmitted to various users can be scaled to accommodate particular users. One set of users may receive all 5 frequency ranges whereas other users may only receive the first three frequency ranges (the lower frequency ranges). The two different sets of users are provided different bit-rates and different signal quality. The audio decoders of the set of users that receive only the lower frequency ranges reconstruct half of the time-domain samples, resulting in a 22.05 kHz signal sampling frequency. If a set of users only receive the 1 Decoding a Zero Stuffed Length Varied Audio Signal Decoding a zero stuffed length varied audio signal involves performing the inverse of the encoding operations described above. Returning to The audio encoder and decoder described above include memories, processors, and/or ASICs. Such memories include a machine-readable medium on which is stored a set of instructions (i.e., software) embodying any one, or all, of the methodologies described herein.
Software can reside, completely or at least partially, within this memory and/or within the processor and/or ASICs. For the purpose of this specification, the term “machine-readable medium” shall be taken to include any mechanism that provides (i.e., stores and/or transmits) information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes read only memory (“ROM”), random access memory (“RAM”), magnetic disk storage media, optical storage media, flash memory devices, electrical, optical, acoustical, or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.), etc. While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described. For instance, while the flow diagrams show a particular order of operations performed by certain embodiments of the invention, it should be understood that such order is exemplary (e.g., alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, etc.). In addition, while embodiments of the invention have been described with reference to MDCT and IMDCT, alternative embodiments of the invention utilize other transform coding techniques. Thus, the method and apparatus of the invention can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting on the invention.