CN100481736C

CN100481736C - Coding method for compressing coding of multiple audio track digital audio signal

Info

Publication number: CN100481736C
Application number: CNB2005100987122A
Authority: CN
Inventors: 游余立
Original assignee: GUANGSHENG DIGITAL TECHNOLOGY Co Ltd GUANGZHOU
Current assignee: Digital Rise Technology Co Ltd
Priority date: 2002-08-21
Filing date: 2002-08-21
Publication date: 2009-04-22
Anticipated expiration: 2022-08-21
Also published as: CN100452657C; CN1783727A; CN1477872A; CN1750407A; CN1750408A; CN100481734C; CN100481735C; CN1750404A; CN1750406A; CN100505554C; CN1750405A; CN1750403A; CN100435485C; CN100533990C; CN100481733C; CN100474780C; CN100477531C; CN1750409A; CN1233163C; CN1756087A

Abstract

This invention relates to a device and a method for compression encoding and decoding. The encoding device includes a frame length selector, a subband decomposition filter bank, a transient detector, a scale factor estimator, a bit allocator that adaptively distributes the bit resource determined by the target bit rate among the subbands, a subband quantizer, and a multiplexer. Audio signals encoded by this invention can be edited synchronously with video signals and can withstand more than ten generations of tandem coding, so as to meet the needs of both program distribution and transmission and greatly reduce the processes and equipment required for program distribution.

Description

Coding method for compression coding of multichannel audio signals

This application is a divisional application of the application filed on August 21, 2002, with application number 02130245.6.

Technical field

The present invention relates to apparatus and methods for encoding and decoding digital audio signals and, more specifically, to apparatus and methods for compression encoding and decoding of multichannel audio signals.

Background technology

Multichannel (including stereo) digital audio compression coding technology has been widely used in VCD, SVCD, DVD, satellite television, digital television, the Internet, and other fields. The main problem it addresses is that the bit rate needed to represent a multichannel digital audio signal is very high, while the channel capacity available to transmit or store it is very limited. For example, representing 5.1-channel surround sound in PCM at a 48 kHz sample rate with 24 bits per sample requires a bit rate of 6912 kbps (kilobits per second). Yet in applications with limited channel capacity, such as digital TV, the bit rate that can be allocated to digital audio is typically 384 kbps, and even in applications with looser channel capacity, such as DVD, the bit rate that can be allocated to digital audio is typically 384 kbps, 768 kbps, or 1536 kbps. At 384 kbps, digital audio compression coding must provide a compression ratio of up to 18.
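The figures in this example can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes that 5.1 channels are carried as six discrete PCM channels.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Uncompressed PCM bit rate in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# 5.1 surround is treated here as six discrete channels (assumption).
pcm_rate = pcm_bitrate_kbps(48_000, 24, 6)

# Compression ratio needed to fit the 384 kbps budget quoted in the text.
ratio_at_384 = pcm_rate / 384

print(pcm_rate, ratio_at_384)
```
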

Research and development of digital audio compression coding can be traced back to the early 1970s. After thirty years of development, the widely adopted technical framework has more or less settled into a frame length selector, a frequency or subband decomposer, a transient detector, a linear predictor, a bit allocator, a quantizer, an entropy coder, and a multiplexer.

For example, MPEG-2 AAC [reference 1] and MPEG-4 AAC [reference 2] divide the input PCM audio stream into frames of 1024 samples and then perform transient detection on each frame. If no transient is found in the frame, long-term linear prediction is (optionally) applied, followed by a frequency decomposition into 1024 subbands and then (optionally) short-term linear prediction on each subband signal. If a transient is found, the 1024 samples of the frame are further divided into 8 subframes of 128 samples each, a frequency decomposition into 128 subbands is performed on each, and the positions of the subframes containing the transient are sent to the multiplexer. Global bit allocation based on a model of the human auditory system follows, then nonlinear scalar quantization of the subband signals and Huffman coding of the quantization indices. Finally, the multiplexer packs the side information produced by the above steps and the Huffman codes representing the subband samples into a complete, frame-based compressed bitstream. The advantage of AAC is its high compression efficiency. However, the encoder is complex, and the quality of the decoded audio is not fully transparent.

As another example, the frame length selector of the DTS multichannel audio encoder [references 3, 4 and 5] can choose a frame length from 256, 512, 1024, 2048 and 4096 according to the sample rate and bit rate, and divides the input PCM audio stream into frames of that length. A frequency decomposition into 32 subbands follows, and each subband signal is then subband coded. Subband coding includes transient detection, linear prediction, global bit allocation based on a model of the human auditory system, scalar/vector quantization, and Huffman coding. Finally, the multiplexer packs the side information produced by the above steps and the quantization indices or Huffman codes representing the subband samples into a complete, frame-based compressed bitstream. The advantage of DTS is the good quality of the decoded audio, which many consider fully transparent at high bit rates (such as 1536 kbps). Its compression efficiency, however, is not high.

With the commercial broadcasting of digital television in Europe and North America in recent years, the distribution of multichannel audio programs as television sound has become a problem in urgent need of a solution. A main issue is that the facilities of current TV stations and recording studios support only stereo; upgrading them to multichannel means replacing almost all audio-related equipment. Compressing a multichannel program to a bit rate that stereo facilities can support avoids this problem. Compressing multichannel programs also makes it easier for TV stations and recording studios to transfer and share programs. But a compressed audio bitstream introduces a frame structure. If the audio frame length does not match the video frame length, editing the video bitstream at its frame boundaries will cut into the interior of an audio frame, destroying the audio frame structure and causing the decoder to fail. In addition, in the production and distribution of multichannel programs, the audio often has to be encoded and decoded repeatedly (tandem coding). This requires the compression technique to withstand at least ten generations of tandem coding without audible distortion.

Dolby E is an audio compression coding technique [reference 6] designed specifically for the above application. Its frame length is fixed at 1792, but it uses sample rate conversion to make the duration of one frame of Dolby E data equal to the frame duration of the common video frame rates (NTSC, PAL, and film), so that it can be edited synchronously with them. It also uses a high bit rate to ensure that it withstands more than ten generations of tandem coding without audible distortion. However, the compression efficiency of Dolby E is not high, so it is unsuitable as a compression technique for transmitting multichannel programs to end users (such as television sets). A TV station that produces a program with Dolby E must therefore decode it to PCM and then re-encode it into a bitstream of a high-compression-efficiency coding technique such as AC-3 [reference 7] or MPEG before it can be broadcast. Fig. 1 illustrates how a TV station distributes and transmits audio programs using Dolby E as the compression coding technique for program distribution and AC-3 as the compression coding technique for program transmission. As can be seen, this scheme has the following difficulties: 1) The audio distortion is large: the sample rate conversion of Dolby E itself introduces distortion, and transcoding from a Dolby E bitstream to an AC-3 bitstream introduces further distortion. 2) Programs that have already been broadcast are difficult to reuse: as shown in Fig. 1, to reuse a broadcast program, it must be decoded to PCM and encoded with Dolby E before it can be switched with other programs (such as advertisements). At broadcast time it must again go through transcoding from Dolby E decoding to AC-3 encoding. Because the bit rate used for final transmission is generally not high, the audio quality of a broadcast program after this chain of transcoding (AC-3 decoding, Dolby E encoding, Dolby E decoding, AC-3 encoding) is difficult or impossible to guarantee. 3) Because the input and output sample rates of Dolby E are not related by a simple multiple, its encoder and decoder are both very complex and expensive.

Summary of the invention

A first aspect of the present invention provides an efficient, high-fidelity encoder and encoding method for compression encoding of multichannel (including mono) audio signals. When the audio signal accompanies a video signal, this method satisfies the requirements of distributing multichannel digital audio programs and also the requirement of transmitting them at low bit rates (high compression efficiency). That is, it combines the functions of Dolby E and of transmission compression coding techniques such as AC-3.

The encoder of this aspect comprises: 1) a frame length selector, for selecting the audio frame length according to the sample rate of the audio signal, the bit rate, and the video frame rate (when the multichannel audio signal accompanies a video signal); 2) a subband decomposition filter bank, for decomposing the audio signal, input frame by frame, into a plurality of subband signals; 3) a transient detector, for dividing each input subband signal into transient segments and steady-state segments; 4) a bit allocator, for allocating the bit resource determined by the target bit rate to each subband segment; 5) a subband quantizer, for quantizing the subband signals segment by segment; 6) a multiplexer (MUX), for multiplexing and packing the quantization indices of the subbands and the associated side information into a complete, frame-based bitstream.

The encoding method of this aspect comprises: 1) selecting the audio frame length according to the sample rate of the audio signal, the bit rate, and the video frame rate (when the multichannel audio signal accompanies a video signal); 2) decomposing the audio signal, input frame by frame, into a plurality of subband signals with a subband decomposition filter bank; 3) dividing each subband signal into transient segments and steady-state segments; 4) allocating the bit resource determined by the target bit rate to each subband segment; 5) quantizing the subband signals segment by segment; 6) multiplexing and packing the quantization indices of the subband signals and the associated side information into a complete, frame-based bitstream.

A second aspect of the present invention provides a decoder and a decoding method for decoding an audio bitstream produced by the above encoder using the above encoding method. The decoder comprises: 1) a demultiplexer (DEMUX), for extracting from the encoded bitstream the quantization indices of the subband signals and the associated side information, such as the audio frame length, the subband segment boundaries, and the bit allocation; 2) a subband inverse quantizer, for reconstructing the subband signals segment by segment from their quantization indices according to the associated side information; 3) a subband synthesis filter bank, for reconstructing the audio signal from the subband signals.

The decoding method of this aspect comprises: 1) demultiplexing the quantization indices of the subband signals and the associated side information from the encoded audio bitstream; 2) reconstructing the subband signals segment by segment from their quantization indices according to the associated side information; 3) reconstructing the audio signal from the subband signals with a subband synthesis filter bank.

When the multichannel audio signal accompanies a video signal, the audio frame length selected by the invention has a simple multiple relation to the video frame length: either the video frame length is a positive integer multiple of the audio frame length, or the audio frame length is an integer multiple of the video frame length. In this way, the bitstream produced by the invention can be edited synchronously with the video program.

Further, when the multichannel audio signal accompanies a video signal, the number of subbands of the subband decomposition filter bank is chosen as a common factor of the audio frame lengths corresponding to the different video frame rates (PAL and film). Thus, when the frame rate of the video program changes, the encoder of the invention only needs to change the multiple relating the audio frame length to this common factor, not the number of subbands, to maintain the above multiple relation between audio and video frame lengths and so preserve synchronous editing with the video program. Because the number of subbands does not change, the delay lines of the encoder need not change either, so the encoder can adapt to a change in video frame rate without being reset.

The above restrictions on the number of subbands and the frame length do not prevent the invention from using a very short frame length to provide a low-delay operating mode.

The filters used by the subband decomposer and synthesizer of the invention are long, which ensures that the decoded audio signal transitions smoothly after the compressed data stream of the invention is edited at frame boundaries.

The transient detector of the encoding system divides the input subband signal into transient and steady-state segments. It has very high temporal resolution, which reduces the pre-echo effect often encountered in audio compression coding. All subsequent components of the invention process the input audio signal segment by segment.

The invention uses the correlation of audio signals across channels within the same subband (frequency) to perform long-term and short-term prediction. This exploits both the periodicity of the audio signal and the correlation between channels, and allows low-order predictors to be used to reduce computation.

The bit allocator of the invention makes only very limited use of the human auditory model in order to simplify bit allocation. This greatly reduces the computational complexity of bit allocation and also allows the allocation result to be expressed with a single parameter. On receiving this parameter, the decoder can very simply and exactly reconstruct the bit allocation used by the encoder. This saves the bit resource that other techniques spend on transmitting the bit allocation. Because of this saving, the gain in coding efficiency from this approach is probably comparable to that obtained with a very complex human auditory model.

For quantization, the strategy of this patent is to use vector quantization when the number of quantization levels is small and scalar quantization when it is large, in order to optimize compression efficiency, decoding complexity, and memory requirements.

Description of drawings

Fig. 1: Distribution and transmission scheme for digital television programs using Dolby audio compression techniques.

Fig. 2: Encoder of the first embodiment.

Fig. 3: Schematic diagram of transient detection according to the invention.

Fig. 4: Encoder of the second embodiment.

Fig. 5: Threshold of hearing of the human ear in a quiet environment (threshold in quiet).

Fig. 6: Encoder of the third embodiment.

Fig. 7: Encoder of the fourth embodiment.

Fig. 8: Same-channel linear prediction and quantization of the prediction error signal.

Fig. 9: Cross-channel linear prediction and quantization of the prediction error signal.

Fig. 10: Combined same-channel and cross-channel linear prediction, and quantization of the prediction error signal.

Fig. 11: Encoding flow chart.

Fig. 12: Decoding flow chart.

Fig. 13: Decoder.

Fig. 14: Reconstruction of subband samples when same-channel linear prediction is used.

Fig. 15: Reconstruction of subband samples when cross-channel linear prediction is used.

Fig. 16: Reconstruction of subband samples when same-channel and cross-channel linear prediction are used together.

Fig. 17: Schematic diagram of the distribution and transmission of digital television programs using the encoder of the invention.

Embodiment

The encoding and decoding methods of the invention process each channel almost entirely independently and identically, so the following description is based on a single channel unless otherwise stated.

Coding

[embodiment 1]

As shown in Fig. 2, the encoder of the invention consists of a frame length selector 1, a subband decomposition filter bank 2, a transient detector 3, a bit allocator 4, a subband quantizer 5, and a multiplexer 6, and compression-encodes the input audio signal. The audio signal may be the audio signal of digital television. The operating principle of the encoder is explained below through a detailed description of each component.

Frame length selector 1:

The role of frame length selector 1 is to divide the input audio signal into frames of a certain length. As for the choice of frame length, it is generally held that the longer the frame, the higher the compression efficiency, but also the larger the coding delay and the encoder's memory requirement. Therefore, a shorter frame length is generally chosen at high bit rates, and a somewhat longer frame length at low bit rates. Because the coding delay depends on both the frame length and the sample rate (a frame of N samples at sample rate f lasts N/f seconds), the sample rate must be considered when selecting the frame length.

When the audio signal accompanies a video signal, the choice of frame length must take into account, besides the above factors, certain special requirements of video sound. The most prominent of these, and one of the problems the invention sets out to solve, is that the compressed audio bitstream can be edited synchronously with the video signal. By "synchronous editing" the invention means that the valid edit points of the compressed audio bitstream and of the video bitstream are fully synchronized in time, so that after the audio and video bitstreams are edited simultaneously at any such edit point, neither bitstream is corrupted and the decoded audio and video signals both transition smoothly at the edit point.

The basic requirement for synchronous editing is that the duration of one frame of the audio bitstream and the duration of one frame of the video bitstream be related by a simple multiple. Otherwise, editing the video bitstream at a video frame boundary would cut into the interior of an audio frame, destroying the audio frame structure and causing the decoder to fail. We consider two cases. The first is where the video frame length is an integer multiple of, or equal to, the audio frame length. In this case editing can be done at every video frame boundary, and the edit point always falls on an audio frame boundary.

For a 25 frames/second (PAL) video signal, each frame lasts 0.04 seconds. For an audio signal sampled at 48 kHz, this corresponds to 1920 samples. Because 1920 = 3 × 5 × 2⁷, any audio frame length formed as a product of these factors can be edited synchronously with the 25 frames/second video signal.

For a 24 frames/second (film) video signal, the number of audio samples (48 kHz sample rate) corresponding to one frame is 2000 = 2⁴ × 5³. Likewise, any audio frame length formed as a product of these factors can be edited synchronously with the video signal.

Another constraint on the audio frame length is that it must be an integer multiple of the number of subbands of the subband decomposition filter bank that follows. If a combination of the factors of 2⁴ × 5, the greatest common factor of 1920 and 2000, is chosen as the number of subbands, synchronous editing with both 24 frames/second and 25 frames/second video signals can be achieved without changing the number of subbands. This is explained further in the description of the subband decomposition filter bank.
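The frame-length arithmetic above can be reproduced as follows. This is only a sketch with hypothetical helper names; it restates the numbers given in the text for a 48 kHz sample rate and does not describe how the frame length selector is implemented.

```python
from math import gcd

SAMPLE_RATE_HZ = 48_000


def samples_per_video_frame(sample_rate_hz: int, frames_per_second: int) -> int:
    """Number of audio samples spanned by one video frame (must be an integer)."""
    quotient, remainder = divmod(sample_rate_hz, frames_per_second)
    if remainder:
        raise ValueError("video frame does not span a whole number of samples")
    return quotient


def prime_factors(n: int) -> dict:
    """Prime factorization as {prime: exponent}, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


pal = samples_per_video_frame(SAMPLE_RATE_HZ, 25)    # PAL, 25 frames/second
film = samples_per_video_frame(SAMPLE_RATE_HZ, 24)   # film, 24 frames/second
common = gcd(pal, film)                              # largest shared factor

print(pal, prime_factors(pal))
print(film, prime_factors(film))
print(common, prime_factors(common))
```

Any divisor of `common` is a candidate number of subbands that divides both frame lengths.
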

Now consider the second case, where the audio frame length is an integer multiple (say V times) of the video frame length. To preserve the integrity of audio frames, an audio edit point can occur only once every V video frames. A compressed video bitstream often has an edit point only once every several frames, and this interval may be dynamic. If the maximum of this dynamic interval is W, the maximum edit interval that satisfies both the video and audio editing constraints is WV video frames. This obviously requires a large amount of storage. The method described above nevertheless still applies.

Subband decomposition filter bank 2

Subband decomposition filter bank 2 decomposes the audio signal passed from frame length selector 1 into M subband signals. Various filter banks [reference 8] can be used here, including but not limited to perfect-reconstruction and non-perfect-reconstruction filter banks, orthogonal and non-orthogonal filter banks, tree-structured filter banks, cosine-modulated filter banks, and wavelet packets. To ensure that the audio signal remains smooth after a bitstream encoded by the invention is edited, filter banks with long overlap are especially recommended.

Because of its low computational cost and simple design, the cosine-modulated filter bank is the preferred filter bank of the invention. Its k-th subband filter is [reference 8]:

$$h_k(n) = 2\,p_0(n)\cos\left(\frac{\pi}{M}(k + 0.5)\left(n - \frac{N}{2}\right) + \lambda_k\right) \qquad (1)$$

where n is the sample index, k is the subband index, M is the number of subbands, p_0(n) is the prototype filter, N is the length of the prototype filter, and

$$\lambda_k = (-1)^k \frac{\pi}{4}.$$
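As a rough illustration of formula (1), the sketch below generates the impulse responses of the M subband filters from a given prototype. The function names are hypothetical, and the sine-shaped prototype is only a placeholder chosen so that the code runs; the patent does not specify a prototype in this passage, and this placeholder is not claimed to give good or perfect reconstruction.

```python
import math


def cosine_modulated_bank(prototype, num_subbands):
    """Return a list of M impulse responses h_k(n) following formula (1)."""
    big_n = len(prototype)          # N: length of the prototype filter
    m = num_subbands                # M: number of subbands
    bank = []
    for k in range(m):
        lam = (-1) ** k * math.pi / 4                      # lambda_k
        h_k = [
            2.0 * prototype[n]
            * math.cos(math.pi / m * (k + 0.5) * (n - big_n / 2) + lam)
            for n in range(big_n)
        ]
        bank.append(h_k)
    return bank


def placeholder_prototype(length):
    """Hypothetical low-pass-like prototype (a sine window); an assumption."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


M = 4
bank = cosine_modulated_bank(placeholder_prototype(2 * M), M)
print(len(bank), len(bank[0]))
```
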

The invention places a restriction on the number of subbands M of subband decomposition filter bank 2: M must be a common factor of the audio frame lengths that frame length selector 1 selects for the different video frame rates. That is, every available audio frame length can be expressed as:

$$\mathrm{Fsize} = k \cdot M \qquad (2)$$

where Fsize is the audio frame length and k is a positive integer. Thus, when the video frame rate changes, frame length selector 1 only needs to select another k to obtain an audio frame length matching the new video frame rate while keeping the number of subbands M unchanged. Keeping the number of subbands unchanged means that the various delay lines of the encoder are unchanged. When the frame rate of the video signal changes, the encoder need not be reset; it only needs to adjust k in the above formula accordingly to adapt smoothly to the change.

For example, as mentioned in the description of frame length selector 1, for an audio signal sampled at 48 kHz, 2000 can serve as the audio frame length for a 24 frames/second video signal, and 1920 as the audio frame length for a 25 frames/second video signal. If the number of subbands M is chosen as 80, the greatest common factor of 2000 and 1920, then selecting k = 25 and k = 24 yields audio frame lengths of 2000 and 1920 respectively while the number of subbands stays unchanged. Of course, other common factors, such as 40 and 20, achieve the same effect. Table 1 lists the preferred audio frame lengths and filter bank subband numbers of the invention:

Table 1

The short frame lengths in the table above, such as 200, 240, 400 and 480, clearly provide a low-delay mode for this audio compression method. The restrictions the invention places on the audio frame length and the number of subbands therefore do not prevent it from having a low-delay mode.
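The relation between frame length, number of subbands and the multiplier k can be sketched as below. The helper name is hypothetical; the frame lengths 2000 and 1920 and the subband counts 80 and 40 are the ones quoted in the text.

```python
def frame_multiplier(frame_length: int, num_subbands: int) -> int:
    """Return k such that frame_length == k * num_subbands, or raise if none exists."""
    k, remainder = divmod(frame_length, num_subbands)
    if remainder:
        raise ValueError(
            f"{num_subbands} subbands do not divide a frame of {frame_length} samples"
        )
    return k


# Audio frame length per video frame rate at 48 kHz, as given in the text.
frame_length_for_fps = {24: 2000, 25: 1920}

# Switching the video frame rate changes only k; M stays fixed.
k_with_80 = {fps: frame_multiplier(n, 80) for fps, n in frame_length_for_fps.items()}
k_with_40 = {fps: frame_multiplier(n, 40) for fps, n in frame_length_for_fps.items()}

print(k_with_80)
print(k_with_40)
```
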

Transient detector 3

The subband signals output by subband decomposition filter bank 2 then enter transient detector 3. Transient detector 3 analyzes the transient behavior of each frame of a subband signal according to a certain detection measure, further divides the frame of the subband signal into transient segments and steady-state segments, and outputs the position information of each segment. It must be emphasized that these segment positions vary adaptively with the transient behavior. All subsequent processing of subband signals in the invention takes the segment (hereinafter "subband segment") as its basic unit.

Taking the subband signal shown in Fig. 3 as an example, detector 3 may determine that A is a steady-state segment, B is a transient segment, and C is a steady-state segment, and output the position information of each segment, such as the segment lengths 30, 20 and 50.

Measures used for transient detection include, but are not limited to, the energy of the subband signal, the logarithm of the energy, and the energy entropy. The detection technique can be simple threshold detection or something much more elaborate, including but not limited to the well-known K-means algorithm [reference 8].
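A minimal sketch of the simple threshold variant is given below. Everything specific in it is an assumption made for illustration: the block size, the use of the median block energy as the reference level, the threshold ratio, and the function name are not taken from the patent, which only states that an energy measure with a simple threshold may be used.

```python
from statistics import median


def segment_subband(samples, block=10, ratio=4.0):
    """Split one frame of a subband signal into (label, length) segments.

    A block is labelled "transient" when its energy exceeds `ratio` times the
    median block energy of the frame (assumed rule); adjacent blocks with the
    same label are merged into one segment.
    """
    energies = [
        sum(x * x for x in samples[i:i + block])
        for i in range(0, len(samples), block)
    ]
    reference = median(energies)
    segments = []
    for index, energy in enumerate(energies):
        label = "transient" if energy > ratio * reference else "steady"
        length = min(block, len(samples) - index * block)
        if segments and segments[-1][0] == label:
            segments[-1] = (label, segments[-1][1] + length)
        else:
            segments.append((label, length))
    return segments


# Synthetic frame of 100 subband samples with a loud burst in the middle.
frame = [0.1] * 30 + [1.0] * 20 + [0.1] * 50
print(segment_subband(frame))
```

The synthetic frame is built so that the detected segment lengths match the 30, 20 and 50 quoted for segments A, B and C.
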

Bit allocator 4:

Bit allocator 4 can allocate bits to each subband segment of each channel according to the signal-to-noise ratio (SNR) or the signal-to-mask ratio (SMR), using methods common in the art [references 1-9]. The result of bit allocation is the number of bits used to quantize each subband sample in each subband segment of each channel. This result is fed to subband quantizer 5 and multiplexer 6.

Subband quantizer 5

Subband quantizer 5 comprises a set of quantizers [reference 9] with different quantization methods and numbers of quantization levels. The selection and configuration of this set of quantizers greatly affects compression efficiency. Because the probability distribution of quantization indices is non-uniform (especially when the number of quantization levels is small), compression techniques generally use scalar quantization followed by entropy coding (such as Huffman coding). But entropy decoding is complex, and its computational load is large and uneven, which creates many difficulties for commercial implementation of decoders.

For this reason, the invention preferably uses vector quantization when the number of quantization levels is small and scalar quantization when it is large.

According to the bit allocation result, subband quantizer 5 selects a specific quantizer from the above set for each subband segment and uses it to quantize every subband sample in that segment.

The quantization of the subband samples in each subband segment proceeds in the following four steps (Fig. 2):

1) Estimate the scale factor: scale factor estimator 51 can use the maximum absolute value of all samples in the subband segment, the variance of all samples in the subband segment, or another quantity as the scale factor.

2) Quantize the scale factor: the scale factor itself must also be quantized so that it can be sent to the decoder. Because the sensitivity of the human ear to loudness decreases as loudness increases, the scale factor should be quantized nonlinearly, for example logarithmically. Taking the variance as an example, let the variance of the d-th segment of the k-th subband of the c-th channel be σ(c, k, d); scale factor quantizer 52 then selects the following quantization index for this scale factor:

$$s(c, k, d) = \frac{\log \sigma(c, k, d)}{\alpha} \qquad (2)$$

where α is the quantization step size.

3) Normalize the subband samples: subband sample normalizer 53 normalizes all samples in the subband segment using the scale factor reconstructed after quantization.

4) Quantize the subband samples: quantizer selector 54 selects a specific quantizer according to the number of bits supplied by bit allocator 4, and subband sample quantizer 55 then uses it to quantize the normalized subband samples, yielding the quantization index of each sample.
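The four steps can be sketched for a single subband segment as follows. This is an illustration under several assumptions that the text does not fix: the scale factor is taken as the peak magnitude (one of the options listed in step 1), the logarithm is natural, the index is rounded up to an integer so that normalized samples stay within [-1, 1], and the sample quantizer is a plain uniform scalar quantizer. The function names and the `decode_segment` helper, added only to check the round trip, are hypothetical.

```python
import math


def encode_segment(samples, bits, alpha=0.25):
    """Return (scale factor index, sample quantization indices) for one segment."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits per sample")
    # Step 1: scale factor estimated as the peak magnitude of the segment.
    sigma = max(max(abs(x) for x in samples), 1e-9)   # floor avoids log(0); assumption
    # Step 2: logarithmic quantization of the scale factor, log(sigma) / alpha,
    # rounded up to an integer index (rounding rule is an assumption).
    s = math.ceil(math.log(sigma) / alpha)
    sigma_hat = math.exp(alpha * s)                   # scale factor as the decoder sees it
    # Step 3: normalize with the reconstructed scale factor.
    normalized = [x / sigma_hat for x in samples]
    # Step 4: uniform scalar quantization with 2**bits - 1 levels.
    half = 2 ** (bits - 1) - 1
    indices = [round(v * half) for v in normalized]
    return s, indices


def decode_segment(s, indices, bits, alpha=0.25):
    """Inverse of encode_segment, used here only to check the round trip."""
    half = 2 ** (bits - 1) - 1
    sigma_hat = math.exp(alpha * s)
    return [i / half * sigma_hat for i in indices]


segment = [0.5, -0.25, 0.1, 0.0]
s_index, sample_indices = encode_segment(segment, bits=8)
restored = decode_segment(s_index, sample_indices, bits=8)
print(s_index, sample_indices)
```

Normalizing with the reconstructed scale factor rather than the original one keeps the encoder and decoder in step, which is the point of step 3.
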

Multiplexer (MUX) 6

Multiplexer 6 packs the following information, produced by the encoder components above, into a complete bitstream:

The audio frame length selected by frame length selector 1.

The segment position information output by transient detector 3.

The number of bits that bit allocator 4 allocates to each subband segment.

The quantization indices of the scale factors and of the subband samples produced by subband quantizer 5.

Multiplexer 6 can also pack other side information, including but not limited to the sample rate of the input audio signal, the speaker configuration, error correction codes, and time codes.

[embodiment 2]

The encoder of the second embodiment of the invention is shown in Fig. 4. Most of its components are the same as in embodiment 1; the difference lies in the distinctive bit allocation method of this embodiment. Specifically, unlike embodiment 1, this embodiment does not allocate bits to each subband segment according to the signal-to-noise ratio (SNR) or the signal-to-mask ratio (SMR), but according to the scale factor quantization index output by scale factor quantizer 52, using the following formula:

b(c, k, d) = f(α·s(c, k, d)) - θ(k) - β    (3)

where:

1) b(c, k, d) is the number of bits allocated to each sample of the current subband segment.

2) f(α·s(c, k, d)) is a strictly monotonically increasing function. It may preferably be set to f(α·s(c, k, d)) = [α·s(c, k, d)]^q, where 0 < q ≤ 2. More preferably, it may be set to f(α·s(c, k, d)) = α·s(c, k, d).

3) θ(k) may preferably be set to an approximation of the ear's threshold of hearing in a quiet environment (threshold in quiet), as shown in Figure 5 [references 1, 2 and 10]; more preferably, it may be set to zero to simplify the computation.

4) β is the bit adjustment factor.

Formula (3) shows that, once the bit adjustment factor β is fixed, the number of bits allocated to each sample of a subband segment depends entirely on the quantization index of its scale factor.

Clearly, the smaller β is, the more bits formula (3) allocates to each subband segment; the larger β is, the fewer bits it allocates. The task of bit allocation is therefore to find the minimum value of β such that the total number of bits used by all subband segments does not exceed the total number of bits that the given target bit rate allows for each frame of the audio signal.

Bit allocation can be globally optimal, that is, all channels share a single bit allocation adjustment factor. Suppose that, at a given target bit rate, the total number of bits available for each audio frame is Total_Bits (this is the total remaining after deducting the bits needed to transmit the various side information; the same is assumed from here on without further mention). The bit adjustment factor searcher 41 must then search over candidate β values to find the minimum β that satisfies the following condition:

\sum_{c} \sum_{k} \sum_{d} b(c, k, d) \cdot n(c, k, d) \leq \text{Total\_Bits}    (4)

where n(c, k, d) is the number of samples in the subband segment.

The β obtained in this way can then be used with formula (3) to allocate bits to every subband segment of all channels. Moreover, the encoder needs to transmit only this bit allocation adjustment factor to the decoder; from it and the scale factor quantization indices, the decoder can use formula (3) to reconstruct the bit allocation that the encoder used for every subband segment of all channels.

Bit allocation can also be locally optimal, for example with each channel having its own bit allocation adjustment factor. Suppose that, at a given target bit rate, the total number of bits assigned in some predetermined way to each audio frame of channel c is Total_Bits(c). The bit adjustment factor searcher 41 must then search over candidate β values to find, for that channel, the minimum β that satisfies the following condition:

\sum_{k} \sum_{d} b(c, k, d) \cdot n(c, k, d) \leq \text{Total\_Bits}(c)    (5)

In this case, the encoder must transmit one bit allocation adjustment factor per channel to the decoder. Clearly, this method extends directly to other ways of sharing bit allocation adjustment factors.

In summary, the bit allocation procedure of this embodiment is as follows:

1) The scale factor quantizers 52 of all channels send the scale factor quantization index of each subband segment to the bit adjustment factor searcher 41.

2) The bit adjustment factor searcher 41 uses formulas (3) and (4) to find the globally optimal bit allocation adjustment factor for the given bit rate and passes it to the multiplexer 6 and the bit allocator 42. Alternatively, it uses formulas (3) and (5) to find, for each channel, the bit allocation adjustment factor that is locally optimal for that channel at the given bit rate, and passes each one to the multiplexer 6 and to the corresponding bit allocator 42.

3) The bit allocator 42 allocates bits to each subband segment according to formula (3) and passes the result to the corresponding quantizer selector 54.
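
The procedure above can be sketched as follows, assuming f(x) = x and θ(k) = 0 (both of which the text allows). Rounding the result of formula (3), clamping it to [0, max_bits], and searching a finite list of candidate β values are assumptions made for this illustration; the function names are hypothetical.

```python
def allocate_bits(s_index, alpha, beta, max_bits=16):
    """Formula (3) with f(x) = x and theta(k) = 0.

    Rounding and clamping to [0, max_bits] are assumptions.
    """
    return max(0, min(max_bits, round(alpha * s_index - beta)))

def total_bits(segments, alpha, beta):
    """Left side of formula (4); segments holds (s_index, n_samples) pairs."""
    return sum(allocate_bits(s, alpha, beta) * n for s, n in segments)

def search_beta(segments, alpha, budget, candidates):
    """Return the smallest candidate beta that fits the budget, or None."""
    for beta in sorted(candidates):
        if total_bits(segments, alpha, beta) <= budget:
            return beta
    return None

# Encoder side: three segments of 8 samples each and a budget of 64 bits.
segments = [(10, 8), (6, 8), (2, 8)]
beta = search_beta(segments, alpha=1.0, budget=64, candidates=range(12))

# Decoder side: the same formula rebuilds the allocation from beta alone.
decoder_bits = [allocate_bits(s, 1.0, beta) for s, _ in segments]
```

For the per-channel variant of formula (5), the same search would be run once per channel with that channel's segments and Total_Bits(c) as the budget.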

The bit allocation of the present invention therefore makes only very limited use of a model of the human auditory system, in order to keep bit allocation simple. This greatly reduces the computational complexity of bit allocation and also allows the allocation result to be expressed by a single bit allocation adjustment factor. Only this adjustment factor needs to be included in the encoded bit stream. After receiving it, the decoder can use formula (3) to reconstruct exactly, and very simply, the bit allocation used by the encoder. This saves the bits that other techniques spend on transmitting the number of bits allocated to each subband segment. The bits saved can be used to transmit the quantization indices of the subband samples, which further improves audio quality.

[Embodiment 3]

The encoder of the third embodiment of the present invention is shown in Figure 6. Most of its components are the same as those of the other embodiments; the difference is that this embodiment adds a joint intensity encoder 7. The theoretical basis of joint intensity coding is that at high frequencies (for example above 7 kHz) the ear localizes sound mainly by its intensity. As shown in Figure 6, during encoding the joint intensity encoder 7 can sum the high-frequency subbands, output by the transient detector 3, of the left and right channels (or of other channels that can be joined). Only the quantization indices of the samples of this sum subband are transmitted (the channel carrying it is called the source channel of the jointly coded subband), together with the intensity indices of the joined subbands of the joined channels, which saves bits.

When joint intensity coding is used, the bit allocation for a jointly coded subband segment of the source channel must take into account the bit demand of the same subband segment in the other joined channels. Suppose the source channel is c and the set of other jointly coded channels is J. The scale factor that should be used when computing the bit allocation for the jointly coded subband segment of channel c is then:

s(c, k, d) = \max \{ s(c, k, d), \max_{j \in J} \{ s(j, k, d) \} \}    (6)
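
As a small illustration of formula (6), the sketch below takes the largest scale factor index among the source channel and the joined channels. The dictionary layout keyed by (channel, subband, segment) and the function name are assumptions made for this example.

```python
def joint_scale_index(s, source, joined, k, d):
    """Formula (6): s maps (channel, subband, segment) to a scale factor index."""
    return max([s[(source, k, d)]] + [s[(j, k, d)] for j in joined])

# Source channel 0 with joined channels 1 and 2, subband 5, segment 0.
indices = {(0, 5, 0): 7, (1, 5, 0): 9, (2, 5, 0): 4}
joint_index = joint_scale_index(indices, source=0, joined=[1, 2], k=5, d=0)
```

Using the maximum means the shared subband segment receives enough bits for the most demanding of the joined channels.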

If the frequency at which joint intensity coding starts is too low, the spatial image may narrow; in this embodiment, therefore, joint intensity coding is used only at low bit rates.

[Embodiment 4]

The encoder of the fourth embodiment of the present invention is shown in Figure 7. Most of its components are the same as those of the other embodiments; the difference is that this embodiment adds a cross-channel long-term and short-term linear predictor 8. For each segment of a subband signal, the invention searches its short-term and long-term autocorrelation within the same subband, as well as its cross-correlation with the signal of the same subband in other channels, to find the linear predictor 8 that minimizes the prediction error. Let x(c, k, n) be sample n of subband k of channel c. The linear prediction of channel c based on channel s is then

\hat{x}(c, k, n) = \sum_{m = m_{0}}^{A} a(m) x(s, k, n - m) + \sum_{m = 0}^{B} b(m) x(s, k, n - m - τ)    (7)

where a(m) and b(m) are the coefficients of the short-term and long-term prediction filters, respectively, and τ is the delay of the long-term prediction filter. When s = c, the prediction is based entirely on the same channel and m_{0} = 1; when s ≠ c, the prediction is cross-channel and m_{0} = 0.

After each subband sample x(c, k, n) has been predicted with formula (7), the corresponding prediction error is obtained:

e(c, k, n) = x(c, k, n) - \hat{x}(c, k, n)    (8)

The task of the encoder is to find the optimal set of prediction coefficients a(m) and b(m) and delay τ that minimize the total prediction error over the subband segment, for example the following squared error:

e^{2}(c, k) = \sum_{n} e^{2}(c, k, n) = \sum_{n} ( x(c, k, n) - \hat{x}(c, k, n) )^{2}    (9)

If the subband signal x(c, k, n) is strongly periodic, the variance of the prediction error can be much smaller than the variance of the subband signal x(c, k, n) itself. This means that the prediction gain of the linear prediction is high and the prediction is successful. In that case, the prediction error signal e(c, k, n) can be sent to the subband quantizer 5 in place of x(c, k, n). Otherwise, the subband signal x(c, k, n) is sent directly to the subband quantizer 5. The operation of the linear predictor 8 can therefore be summarized as follows:

1) Estimate the prediction coefficients and the prediction gain.

2) If the prediction gain is high, send the decision to use linear prediction for this subband segment, together with the prediction coefficients and delay, to the multiplexer 6. At the same time, generate the prediction error signal with formula (8) and pass it to the quantizer 5.

3) If the prediction gain is not high, send the decision not to use linear prediction for this subband segment to the multiplexer 6. At the same time, pass the samples x(c, k, n) of this subband segment to the quantizer 5.
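
The three steps above can be sketched for the simplest case, a one-tap same-channel predictor \hat{x}(n) = b·x(n - τ). The sketch works open loop on unquantized samples and measures the prediction gain as signal energy divided by error energy; the gain threshold, the one-tap restriction, and the function names are assumptions made for this illustration.

```python
def best_one_tap_predictor(x, min_lag, max_lag):
    """Search x_hat[n] = b * x[n - tau] for the least squared error (min_lag >= 1).

    Samples without history (n < tau) are predicted as zero.
    Returns (b, tau, error_energy); tau == 0 means no predictor was found.
    """
    best = (0.0, 0, float(sum(v * v for v in x)))
    for tau in range(min_lag, max_lag + 1):
        num = sum(x[n] * x[n - tau] for n in range(tau, len(x)))
        den = sum(x[n - tau] ** 2 for n in range(tau, len(x)))
        if den == 0:
            continue
        b = num / den                              # least-squares coefficient
        err = sum(x[n] ** 2 for n in range(tau))   # unpredicted leading samples
        err += sum((x[n] - b * x[n - tau]) ** 2 for n in range(tau, len(x)))
        if err < best[2]:
            best = (b, tau, err)
    return best

def decide_prediction(x, min_lag, max_lag, gain_threshold):
    """Steps 1) to 3): return (use_prediction, b, tau, gain)."""
    energy = float(sum(v * v for v in x))
    b, tau, err = best_one_tap_predictor(x, min_lag, max_lag)
    gain = energy / err if err > 0 else float("inf")
    return (tau > 0 and gain > gain_threshold, b, tau, gain)

# A signal with period 4 is predicted best at a lag of 4 samples.
periodic = [1, 2, 3, 4] * 4
decision = decide_prediction(periodic, min_lag=1, max_lag=6, gain_threshold=2.0)
```

When the first element of the result is true, the encoder would send the coefficient and delay to the multiplexer and quantize the prediction error instead of the samples.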

When same-channel prediction is used, the prediction error signal must be quantized in a closed loop [references 8, 9 and 11], as shown in Figure 8, to prevent quantization error from propagating during decoding. Because the decoder receives only the quantization indices of the prediction error output by the quantizer 5, it can only reconstruct the prediction error with the inverse quantizer 9 and then add it to the predicted value \hat{x}(c, k, n) to reconstruct the subband sample. The same-channel predictor 81 can likewise use only these reconstructed subband samples to predict future subband samples. In other words, the subband samples substituted into formula (7) are in fact reconstructed subband samples.
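
A minimal sketch of closed-loop quantization, assuming a one-tap same-channel predictor with delay tau >= 1 and a uniform quantizer with a fixed step (both assumptions; the function names are hypothetical). The encoder predicts from its own reconstructed samples, so the decoder, which sees only the indices, reproduces exactly the same reconstruction.

```python
def encode_closed_loop(x, b, tau, step):
    """Quantize prediction errors, predicting from reconstructed samples only."""
    recon, indices = [], []
    for n, sample in enumerate(x):
        pred = b * recon[n - tau] if n >= tau else 0.0
        index = round((sample - pred) / step)   # quantizer (uniform, assumed)
        indices.append(index)
        recon.append(pred + index * step)       # inverse quantizer plus prediction
    return indices, recon

def decode_closed_loop(indices, b, tau, step):
    """Rebuild the samples from the indices with the same predictor."""
    recon = []
    for n, index in enumerate(indices):
        pred = b * recon[n - tau] if n >= tau else 0.0
        recon.append(pred + index * step)
    return recon

signal = [0.3, 1.1, -0.7, 0.9, 1.4, -0.2]
indices, encoder_recon = encode_closed_loop(signal, b=0.5, tau=1, step=0.25)
decoder_recon = decode_closed_loop(indices, b=0.5, tau=1, step=0.25)
```

Because the loop is closed, the reconstruction error of every sample stays within half a quantizer step instead of accumulating over time.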

Similarly, in cross-channel prediction, x(s, k, n) must also be the reconstructed subband samples obtained after decoding, as shown in Figure 9, to prevent quantization error from propagating during decoding. Note that the predictor 82 in Figure 9 is a cross-channel predictor, because its input subband samples come from another channel.

The cross-channel predictor 82 and the same-channel predictor 81 can be used together, as shown in Figure 10. First, the cross-channel prediction error is computed:

e_{1}(c, k, n) = x(c, k, n) - \sum_{m = m_{0}}^{A_{1}} a_{1}(m) x(s, k, n - m) - \sum_{m = 0}^{B_{1}} b_{1}(m) x(s, k, n - m - τ_{1})    (10)

Then, same-channel prediction is applied to this error:

e_{2}(c, k, n) = e_{1}(c, k, n) - \sum_{m = m_{0}}^{A_{2}} a_{2}(m) e_{1}(c, k, n - m) - \sum_{m = 0}^{B_{2}} b_{2}(m) e_{1}(c, k, n - m - τ_{2})    (11)

When cross-channel prediction is used, the encoder must ensure that the source channel has already been decoded at decoding time. That is, the encoder must ensure that the channels are decoded in a realizable order: the first channel decoded can use only same-channel prediction; the second channel can use only same-channel prediction or cross-channel prediction from the previously decoded channel; and so on.

The encoder may search all realizable decoding orders to find the one with the largest prediction gain. It may also perform only a limited search to obtain a suboptimal solution.
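
An exhaustive order search could look like the sketch below. It assumes a precomputed table gain[(c, s)] holding the prediction gain (for example in dB) of channel c when predicted from source channel s, with s = c meaning same-channel prediction, and it scores an order by the sum of the best gains available to each channel. The table, the additive score, and the function names are assumptions made for this illustration.

```python
from itertools import permutations

def order_gain(order, gain):
    """Total gain of a decoding order when each channel picks its best source."""
    total, decoded = 0.0, []
    for c in order:
        sources = decoded + [c]   # earlier channels, or the channel itself
        total += max(gain[(c, s)] for s in sources)
        decoded.append(c)
    return total

def best_decoding_order(channels, gain):
    """Try every decoding order and keep the one with the largest total gain."""
    return max(permutations(channels), key=lambda order: order_gain(order, gain))

# Hypothetical gain table for three channels.
gain = {
    (0, 0): 1, (1, 1): 1, (2, 2): 1,
    (1, 0): 5, (0, 1): 2,
    (2, 0): 3, (0, 2): 0,
    (2, 1): 4, (1, 2): 0,
}
best_order = best_decoding_order([0, 1, 2], gain)
```

The number of orders grows factorially with the number of channels, which is why a limited search may be preferable.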

Prediction example 1: long-term or short-term prediction

Because the prediction coefficients are difficult to estimate when long-term and short-term prediction are used together, complexity can be reduced by using either long-term or short-term prediction alone:

\hat{x}(c, k, n) = \sum_{m = 0}^{B} b(m) x(s, k, n - m - τ)

Here the prediction is short-term when τ is small and long-term when τ is large.

When the formula above is applied with cross-channel and same-channel prediction used together, the residual of the cross-channel prediction is computed first:

e_{1}(c, k, n) = x(c, k, n) - \sum_{m = 0}^{B_{1}} b_{1}(m) x(s, k, n - m - τ_{1})

where the minimum allowed value of the delay τ_{1} is zero. Then, same-channel prediction is applied to this residual:

e_{2}(c, k, n) = e_{1}(c, k, n) - \sum_{m = 0}^{B_{2}} b_{2}(m) e_{1}(c, k, n - m - τ_{2})

where the minimum allowed value of the delay τ_{2} is 1.
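
The two-stage cascade of this example can be sketched with one tap per stage (B_1 = B_2 = 0). The sketch is open loop and ignores quantization, so it only illustrates the order of the stages and their inversion; in the actual codec the samples fed to the predictors would be reconstructed ones, as explained for closed-loop prediction. The function names are hypothetical.

```python
def cascade_residual(x, xs, b1, tau1, b2, tau2):
    """Cross-channel residual e1, then same-channel residual e2 (tau2 >= 1)."""
    e1 = [x[n] - (b1 * xs[n - tau1] if n >= tau1 else 0.0) for n in range(len(x))]
    e2 = [e1[n] - (b2 * e1[n - tau2] if n >= tau2 else 0.0) for n in range(len(x))]
    return e2

def cascade_reconstruct(e2, xs, b1, tau1, b2, tau2):
    """Invert the cascade: same-channel stage first, then cross-channel stage."""
    e1 = []
    for n, err in enumerate(e2):
        e1.append(err + (b2 * e1[n - tau2] if n >= tau2 else 0.0))
    return [e1[n] + (b1 * xs[n - tau1] if n >= tau1 else 0.0) for n in range(len(e1))]

target = [1.0, 2.0, 3.0, 4.0]
source = [0.5, 1.0, 1.5, 2.0]   # the target is exactly twice the source
residual = cascade_residual(target, source, b1=2.0, tau1=0, b2=0.5, tau2=1)
rebuilt = cascade_reconstruct(residual, source, b1=2.0, tau1=0, b2=0.5, tau2=1)
```

The cross-channel stage may use τ_1 = 0 because the source channel is already fully decoded, whereas the same-channel stage needs τ_2 >= 1 because it can only look at earlier samples.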

Prediction example 2: cross-channel prediction with limited search

To reduce the complexity of searching decoding orders for cross-channel prediction, this example first finds the optimal order of two channels. Each subsequent channel then searches only same-channel prediction and cross-channel prediction with one of these two channels as the source channel.

To reduce complexity further, the order search can be omitted altogether, with the first channel always serving as the source channel for the cross-channel prediction of all other channels.

Preferably, the encoder of the present invention may also apply cross-channel and/or same-channel prediction to the audio frame before it enters the subband analysis filter bank 2. That is, a cross-channel and/or same-channel predictor is additionally placed between the frame length selector 1 and the subband analysis filter bank 2; the prediction error is then fed to the subband analysis filter bank 2 and encoded by processing similar to that described above.

Encoding flow:

Each of the four embodiments above can form a complete encoder on its own. The best compression efficiency, however, is achieved when all of their functions are combined in a single encoder. Figure 11 shows the encoding flow when all functions of the four embodiments are merged. Joint intensity coding, cross-channel long-term and short-term prediction, and global bit allocation are each independently optional. When any of them is not selected, its block in Figure 11 simply passes its input data through unchanged. Of course, if the bit allocator of Figure 11 is not selected, another general bit allocator as discussed in Embodiment 1 must be introduced to perform the same function. The encoding steps of the invention (see Figure 11) are described below with reference to Figures 2, 4, 6, 7, 8, 9 and 10.

E1) Frame length selection: the frame length selector 1 receives the samples of the audio signal input on each channel and selects the audio frame length according to the sample rate of the audio signal, the target bit rate and, when the multichannel audio signal accompanies a video signal, the video frame rate. It then sends the frame length information to the other components of the encoder. Because the encoder and method of the invention operate frame by frame, every encoder component uses the frame length, directly or indirectly, in every step of the encoding flow. For clarity, Figure 11 does not show the paths along which the frame length information is passed. Once the frame length is determined, the frame length selector 1 also divides the samples of the input audio signal into frames of that length and feeds them frame by frame to the subband analysis filter bank 2. This step does not process the input audio signal itself.

E2) Subband analysis: the subband analysis filter bank 2 decomposes the audio signal of each channel into M subband signals.

E3) Transient detection: the transient detector 3 analyzes the transient behavior of each subband signal and divides it accordingly into transient segments and steady-state segments. The position information of each segment is then sent to the other components of the encoder. For reasons similar to those given in step E1, Figure 11 does not show the paths along which the position information of the transient/steady-state segments is passed. This step does not process the subband signals themselves.

E4) Joint intensity coding: after completing joint intensity coding, the joint intensity encoder 7 discards the samples of the joined subbands and sends only the intensity index of each of their subband segments to the bit allocator 4 (step E7) and the multiplexer 6 (step E9). The invention prefers this step at low bit rates. Omitting it does not affect the completeness of the encoder or method; it only lowers coding efficiency.

E5) Cross-channel long-term and short-term prediction: after completing linear prediction, the cross-channel long-term and short-term predictor 8 passes the decision whether to use prediction to the multiplexer 6 (step E9). If prediction is used, it also passes the delay and prediction coefficients of the prediction filter to the multiplexer 6 (step E9) and passes the prediction error signal, in place of the subband signal, to the subband quantizer 5. This step is preferred by the invention. Omitting it does not affect the completeness of the encoder or method; it only lowers coding efficiency.

For convenience of description, Figure 11 splits the function of the subband quantizer 5 into two steps: scale factor (E6) and vector/scalar quantization (E8).

E6) Scale factor: the subband quantizer 5 estimates and quantizes the scale factor of the subband signal (or of the prediction error signal if linear prediction has been chosen; both are referred to below as the subband signal), one transient/steady-state segment at a time. The quantization indices of the scale factors are then sent to the bit allocator 4 (step E7) and the multiplexer 6 (step E9). This step does not process the subband signals themselves.

E7) Bit allocation: the bit adjustment factor searcher 41 searches for the optimal bit allocation adjustment factor from the scale factor quantization indices supplied by step E6 and the intensity indices supplied by step E4 (if joint intensity coding is selected), and passes it to the multiplexer 6 (step E9). The bit allocator 42 then allocates bits to each subband segment according to formula (3) and passes the number of bits allocated to each subband segment to the quantizer selector 54 for use in vector/scalar quantization (step E8). This bit allocation method is the one preferred by the invention. If it is not used, another bit allocation method must be adopted to keep the encoder complete. This step does not process the subband signals themselves.

E8) Vector/scalar quantization: the quantizer selector 54 chooses a quantizer for each subband segment according to the number of bits allocated to it by the bit allocator 4 (step E7) and passes the choice to the subband sample quantizer 55. The subband sample quantizer 55 then quantizes the subband samples one subband segment at a time and passes their quantization indices to the multiplexer 6 (step E9).

E9) Multiplexing (MUX): the multiplexer 6 packs (multiplexes) the quantization indices of the subband samples and the following side information into a complete audio frame and outputs it: the audio frame length, the positions of the transient/steady-state segments, the intensity indices (if joint intensity coding is selected), the decisions whether prediction is used, the delays and coefficients of the prediction filters, the quantization indices of the scale factors, and the bit allocation adjustment factor. The multiplexer 6 may also pack (multiplex) and output other auxiliary data, such as the sampling frequency, speaker configuration, error correction codes, and time codes.

Decoding

The decoder and decoding method of the present invention are in essence the inverse of the encoder and encoding method. The decoding flow and each component of the decoder are described here with reference to Figures 12 and 13.

Decoding flow:

A bit stream produced by the encoding method and encoder of the invention is decoded to reconstruct the multichannel audio signal through the following main steps (Figure 12):

D1) Unpacking (DEMUX) the side information: the demultiplexer 110 unpacks the following side information:

The frame length.

The positions of all subband segments (transient/steady-state segments).

The bit allocation adjustment factor.

The scale factor quantization index of each subband segment.

For each subband segment, the decision whether cross-channel long-term and short-term prediction is used; if it is, the delay and coefficients of the prediction filter are also unpacked.

The intensity indices of the subband segments coded by joint intensity coding.

D2) Bit allocation: the bit allocator 42 allocates bits to each subband segment from the input bit allocation adjustment factor and the scale factor quantization index of each subband segment. This bit allocator is identical to the bit allocator 42 used in the encoder, so the reference numeral 42 is retained. This bit allocation method is the one preferred by the invention; if the encoder uses a different bit allocation method, this step must be skipped. In that case, an item usually has to be added to step D1: unpacking the bit allocation from the input bit stream.

D3) Unpacking the quantization indices of the subband samples: the demultiplexer 110 unpacks the quantization index of each subband sample from the input bit stream according to the number of bits allocated in step D2.

D4) Inverse quantization to reconstruct the subband samples: the subband sample inverse quantizer 120 reconstructs each subband sample from the scale factor quantization indices unpacked in step D1 and the subband sample quantization indices unpacked in step D3.

D5) Cross-channel long-term and short-term prediction: for each subband segment, if the prediction decision unpacked in step D1 is affirmative, the cross-channel long-term and short-term predictor 130 applies prediction to all samples of the segment. Otherwise, it leaves all samples of the segment unchanged. This step is preferred by the invention; if the encoder does not use cross-channel long-term and short-term prediction, this step must be skipped.

D6) Joint intensity decoding: for each subband segment coded by joint intensity coding, the joint intensity decoder 140 first copies the subband samples from the source subband segment into this segment. It then reconstructs the intensity factor from the intensity index unpacked in step D1 and uses it to modify the copied sample values. This step is preferred by the invention at low bit rates; if the encoder does not use joint intensity coding, this step must be skipped.

D7) Synthesis filter bank: the synthesis filter bank 150 synthesizes the subband samples to reconstruct the audio signal.

If the encoder of the invention applied cross-channel and/or same-channel prediction to the audio frame before it entered the subband analysis filter bank 2, the corresponding prediction must now be applied to the signal reconstructed by the synthesis filter bank 150 to reconstruct the audio signal.

Decoder:

The decoder of the present invention is shown in Figure 13. Its components are described below:

Demultiplexer (DEMUX) 110:

The demultiplexer 110 unpacks the data listed in decoding steps D1 and D3 from the compressed bit stream. It also unpacks other auxiliary data from the compressed bit stream, such as the sampling frequency, speaker configuration, error correction codes, and time codes.

Bit allocator 42:

This bit allocator is identical to the bit allocator 42 used in the encoder, so the reference numeral 42 is retained. Its function is to substitute the bit allocation adjustment factor and the scale factor quantization index of each subband segment into formula (3) to obtain the number of bits allocated to each sample of each subband segment.

Subband sample inverse quantizer 120:

The subband sample inverse quantizer 120 selects a quantizer according to the bit allocation of each subband segment. It then uses this quantizer and the scale factor of the subband segment to reconstruct the subband samples from their quantization indices.

Note that when a subband segment uses cross-channel long-term and short-term prediction, what the subband sample inverse quantizer 120 reconstructs is the prediction error signal of that segment, not the subband samples themselves.

Cross-channel long-term and short-term predictor 130:

For each subband segment, if the prediction decision unpacked in step D1 is affirmative, the cross-channel long-term and short-term predictor 130 applies prediction to all samples of the segment. Otherwise, it leaves all samples of the segment unchanged.

The predictor for same-channel prediction is shown in Figure 14. Its predictor 81 is identical to the predictor 81 used in the encoder, so the reference numeral 81 is retained.

The predictor for cross-channel prediction is shown in Figure 15. Its predictor 82 is identical to the predictor 82 used in the encoder, so the reference numeral 82 is retained.

When cross-channel and same-channel prediction are used together, same-channel prediction is performed first, followed by cross-channel prediction, as shown in Figure 16.

Joint intensity decoder 140:

For each subband segment coded by joint intensity coding, the joint intensity decoder 140 first copies the subband samples from the source subband segment into this segment. It then reconstructs the intensity factor from the intensity index and uses it to modify the copied sample values.

Subband synthesis filter bank 150:

The subband synthesis filter bank 150 synthesizes the subband samples to reconstruct the audio signal. It is the inverse of the subband analysis filter bank 2 in the encoder, and the two are designed together. That is, once the subband analysis filter bank 2 in the encoder is fixed, the subband synthesis filter bank 150 is also fully determined [reference 8].

Applications

The present invention concerns multichannel digital audio compression coding/decoding technology. Its advantages include high compression efficiency, a simple decoder, high fidelity of the decoded audio signal, and suitability for applications at high, medium and low bit rates. It is therefore fully suitable for audio-only applications such as digital audio broadcasting, and its decoder can be built as a standalone unit into audio-only equipment such as power amplifiers and Walkman-type portable players.

Because audio signals mostly occur as the soundtrack of video signals, and because the coding method of the invention has high compression efficiency, audio signals encoded by the invention can be edited in synchronization with the video signal and can withstand more than ten generations of tandem coding. In practical applications this gives the following advantages:

1) It satisfies the requirements of both program distribution and transmission.

2) The coding technique of the invention greatly simplifies the stages and equipment of program distribution. Taking DTV as an example, Figure 17 shows program distribution and transmission using this technique. Clearly, it is much simpler than the Dolby scheme (Figure 1).

3) Because repeated transcoding stages are eliminated, this technique greatly improves the fidelity of the program distribution process.

4) Because several of the encoders in Figure 1 are eliminated, this technique also greatly reduces the cost of program distribution.

List of references

[1] ISO/IEC 13818-7, 1997.

[2] ISO/IEC 14496-3, 1998.

[3] S. Smyth, M. Smyth, and W. P. Smith, “Multi-channel Predictive Subband Audio Coder Using Psychoacoustic Adaptive Bit Allocation In Frequency, Time, And Over The Multiple Channels,” US Patent 5956674.

[4] M. Smyth, “An overview of the Coherent Acoustics coding system,” http://www.dtsonline.com/whitepaper.pdf, 1999.

[5] S. Smyth, W. P. Smith, M. Smyth, M. Yan, and T. Jung, “DTS Coherent Acoustics Delivering High Quality Multichannel Sound to the Consumer,” AES 100th Convention, 1996.

[6] L. Fielder and C. Todd, “The design of a video friendly audio coding system for distribution applications,” AES 17th International Conference, pp. 86-92, 1999.

[7] C. Todd, G. Davidson, M. Davis, L. Fielder, B. Link, and S. Vernon, “AC-3, Flexible perceptual coding for audio transmission and storage,” 96th AES Convention, Amsterdam, 1994.

[8] P. P. Vaidyanathan, “Multirate systems and filter banks,” Prentice Hall, 1993.

[9] A. Gersho and R. M. Gray, “Vector quantization and signal compression,” Kluwer, 1992.

[10] B. C. J. Moore, “An introduction to the psychology of hearing,” Academic Press, 1997.

[11] A. M. Kondoz, “Digital Speech,” John Wiley & Sons, 1994.

Claims

1. A coding method for compression coding of a multichannel audio signal, comprising the following steps:

a) a frame length selection step of selecting a frame length according to the sample rate and the target bit rate of the audio signal, and dividing the audio signal into frames of that length;

b) a sub-band decomposition filtering step of decomposing the audio signal, input frame by frame, into a plurality of sub-band signals using a sub-band decomposition filter bank;

c) a transient detection step of dividing the sub-band signals into transient segments and steady-state segments, the length of each transient or steady-state segment varying adaptively with the transient and steady-state conditions;

d) a scale factor estimation step of estimating and quantizing the scale factor of each segment of the sub-band signals and outputting the quantization index of the scale factor;

e) a bit allocation step of adaptively allocating the number of bits determined by the target bit rate to the sub-band segments according to the scale factor quantization index of each sub-band segment;

f) a sub-band quantization step of quantizing the sub-band signals segment by segment according to the number of bits allocated to each sub-band segment;

g) a multiplexing step of multiplexing the quantization indices of the sub-band samples produced by the sub-band quantizer, together with side information comprising the frame length, the segment position information, the scale factor quantization indices, and the bit allocation information, into a complete bitstream organized in units of frames.

2. The coding method of claim 1, wherein, when the audio signal accompanies a video signal, the frame length selection step also selects the audio frame length according to the video frame rate, such that the audio frame length is a rational fraction of the video frame length or an integer multiple thereof.

3. The coding method of claim 2, wherein the number of sub-bands of the sub-band decomposition filter bank is a common factor of the audio frame lengths selected by the frame length selection step for the different video frame rates.

4. The coding method of claim 1, wherein the sub-band quantization step uses vector quantization when the number of quantization levels is small and scalar quantization when the number of quantization levels is large.

5. The coding method of claim 1, wherein the bit allocation step allocates bits to each sub-band sample according to the scale factor quantization index and a bit allocation adjustment factor.

6. The coding method of claim 5, wherein the number of bits b(c, k, d) that the bit allocation step allocates to the samples in the d-th segment of the k-th sub-band of the c-th channel is determined by the following formula:

b(c, k, d) = f(α·s(c, k, d)) - θ(k) - β

where f(·) is a strictly monotonically increasing function, s(c, k, d) is the quantization index of the scale factor of this sub-band segment, α is its quantization step size, θ(k) is an approximation curve of the threshold of human hearing in a quiet environment, and β is the bit allocation adjustment factor.

7. The coding method of claim 6, wherein

f(α·s(c, k, d)) = [α·s(c, k, d)]^q, where 0 < q ≤ 2.

8. The coding method of claim 7, wherein θ(k) = 0.

9. The coding method of any one of claims 1-8, wherein the bit allocation adjustment factor is either a single factor shared by all channels, or a separate factor for each channel, or a separate factor for each of several groups into which the channels are divided, provided that, with these bit allocation adjustment factors, the sum of the bits allocated to the sub-band samples plus the bits required to transmit the side information does not exceed the total number of bits that the target bit rate allows for each frame of the audio signal.

10. The coding method of any one of claims 1-8, further comprising a joint intensity coding step immediately following the transient detection step, wherein, for each jointly coded sub-band segment, the scale factor quantization index used in the bit allocation step must be the maximum of the scale factor quantization indices, in that sub-band segment, of all the channels that are jointly coded.
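
For illustration only, the bit allocation rule of claims 6 to 9 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names allocate_bits and fit_beta, the flattening of the (c, k, d) indices into a single list, the rounding down to whole bits, the clamping at zero, the requirement that θ be non-negative, and the bisection search for β starting from zero are all assumptions introduced here. A single β shared by all channels is used, which is one of the options named in claim 9.

```python
import math
from typing import List, Optional, Sequence


def allocate_bits(s: Sequence[int], alpha: float, beta: float,
                  q: float = 1.0,
                  theta: Optional[Sequence[float]] = None) -> List[int]:
    """Bits per sample for each sub-band segment: (alpha * s) ** q - theta - beta.

    Rounding down and clamping at zero are assumptions of this sketch.
    """
    if not 0.0 < q <= 2.0:
        raise ValueError("q must satisfy 0 < q <= 2")
    if theta is None:
        theta = [0.0] * len(s)  # theta(k) = 0, as in claim 8
    return [max(0, math.floor((alpha * sk) ** q - th - beta))
            for sk, th in zip(s, theta)]


def fit_beta(s: Sequence[int], samples: Sequence[int], alpha: float,
             frame_bits: int, side_bits: int, q: float = 1.0,
             theta: Optional[Sequence[float]] = None,
             iterations: int = 50) -> float:
    """Find a shared beta >= 0 such that sample bits plus side information
    fit within the per-frame bit budget (hypothetical bisection search)."""
    if side_bits > frame_bits:
        raise ValueError("side information alone exceeds the frame budget")

    def total(beta: float) -> int:
        bits = allocate_bits(s, alpha, beta, q, theta)
        return side_bits + sum(b * n for b, n in zip(bits, samples))

    lo = 0.0
    if total(lo) <= frame_bits:
        return lo  # no reduction needed
    # With theta >= 0, this beta drives every allocation to zero,
    # so total(hi) equals side_bits and fits the budget.
    hi = max((alpha * sk) ** q for sk in s)
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if total(mid) <= frame_bits:
            hi = mid  # mid fits: try a smaller beta
        else:
            lo = mid  # mid overshoots the budget: beta must grow
    return hi  # hi always satisfies the budget


# Hypothetical example values: three sub-band segments of 8 samples each.
indices = [10, 6, 2]
counts = [8, 8, 8]
beta = fit_beta(indices, counts, alpha=0.5, frame_bits=40, side_bits=8)
bits = allocate_bits(indices, alpha=0.5, beta=beta)
```

Because the total bit count never increases as β grows, a bisection search is sufficient to find a value that fits. With one β per channel or per channel group, the same search could be run separately on each group's share of the frame budget.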