US RE40691 E1
Abstract
An audio type signal is encoded. The signal is first divided into bands. For each band, a yardstick signal element is selected. The yardstick may be the signal element having the largest magnitude in the band, the second largest, closest to the median magnitude, or having some other selected magnitude. This magnitude is used for various purposes, including assigning bits to the different bands, and for establishing reconstruction levels within a band. The magnitude of non yardstick signal elements is also quantized. The encoded signal is also decoded. Apparatus for both encoding and decoding are also disclosed. The location of the yardstick element within its band may also be recorded and encoded, and used for efficiently allocating bits to non-yardstick signal elements. Split bands may be established, such that each split band includes a yardstick signal element and each full band includes a major and a minor yardstick signal element.
Claims (114)
1. A method for encoding a selected aspect of a signal that is defined by signal elements that are discrete in at least one dimension, said method comprising the steps of:
a. dividing the signal into at least one band, at least one of said at least one band(s) having a plurality of adjacent signal elements;
b. in at least one band, identifying a signal element having a magnitude with a preselected size relative to other signal elements in said at least one band(s) and designating said signal element as a “yardstick” signal element for said at least one band(s); and
c. encoding the location of at least one yardstick signal element(s) with respect to its position along said at least one dimension in which said signal elements are discrete within its respective band.
2. The method of
3. The method of
4. The method of
5. The method of
6. A method for decoding a code representing a selected aspect of a signal that is defined by signal elements that are discrete in at least one dimension, which code has been encoded by a method comprising the steps of:
a. dividing the signal into at least one band, at least one of said at least one band(s) having a plurality of adjacent signal elements;
b. in at least one band, identifying a signal element having a magnitude with a preselected size relative to other signal elements in said at least one band(s) and designating said signal element as a “yardstick” signal element for said at least one band(s);
c. encoding the location of at least one yardstick signal element(s) with respect to its position along said at least one dimension in which said signal elements are discrete within its respective band;
d. quantizing the magnitude(s) of said at least one yardstick signal element(s) for which the location was encoded; and
e. using a function of said encoded location(s) and magnitude(s) of said at least one yardstick signal element(s) to encode said selected aspect of said signal;
said method of decoding comprising the step of translating said code based on a function that is appropriately inversely related to said function of the location(s) and magnitude(s) used to encode said code.
7. An apparatus for encoding a selected aspect of a signal that is defined by signal elements that are discrete in at least one dimension, said apparatus comprising:
a. means for dividing the signal into at least one band, at least one of said at least one band(s) having a plurality of adjacent signal elements;
b. in at least one band, means for identifying a signal element having a magnitude with a preselected size relative to other signal elements in said at least one band(s) and means for designating said signal element as a “yardstick” signal element for said band;
c. means for encoding the location of at least one yardstick signal element(s) with respect to its position along said at least one dimension in which said signal elements are discrete within its respective band; and
d. means for quantizing the magnitude of said at least one yardstick signal element(s) for which the location was encoded.
8. An apparatus for decoding a code representing a selected aspect of a signal that is defined by signal elements that are discrete in at least one dimension, which code has been encoded by an apparatus comprising:
a. means for dividing the signal into at least one band, at least one of said at least one band(s) having a plurality of adjacent signal elements;
b. means for, in at least one band, identifying a signal element having a magnitude with a preselected size relative to other signal elements in said at least one band(s) and designating said signal element as a “yardstick” signal element for said at least one band(s);
c. means for encoding the location of at least one yardstick signal element(s) with respect to its position along said at least one dimension in which said signal elements are discrete within its respective band;
d. means for quantizing the magnitude of said at least one yardstick signal element(s) for which the location was encoded; and
e. means for using a function of said encoded location and magnitude of said at least one yardstick signal element(s) to encode said selected aspect of said signal;
said decoding apparatus comprising:
i. a yardstick location decoder; and
ii. a code translator that applies a translating rule that is appropriately inversely related to said function of the location and magnitude used to encode said selected aspect of said signal.
9. A method of encoding a signal defined by signal elements that are discrete in at least one dimension, the method comprising:
dividing at least some of the signal elements into a plurality of bands, at least one band having a plurality of adjacent signal elements; selecting a signal element from each of more than one of the bands, at least one of the selected signal elements being from one of the bands having a plurality of adjacent signal elements; and performing a transformation on the selected signal elements. 10. The method of
11. The method of
12. The method of
13. The method of
14. The method of
15. The method of
domain aliasing cancellation coefficients. 16. The method of
17. The method of
18. The method of
19. The method of
20. The method of
21. A method of encoding a signal defined by signal elements that are discrete in at least one dimension, the method comprising:
dividing at least some of the signal elements into a plurality of bands, at least one band having a plurality of adjacent signal elements; selecting a signal element from each of more than one of the bands, at least one of the selected signal elements being from one of the bands having a plurality of adjacent signal elements; processing the selected signal elements; and performing a transformation on the processed selected signal elements. 22. The method of
23. The method of
24. The method of
25. The method of
linear mapping. 26. The method of
27. The method of
28. The method of
29. The method of
30. The method of
31. The method of
domain aliasing cancellation coefficients. 32. The method of
33. The method of
34. The method of
35. The method of
36. The method of
37. A method of encoding a signal defined by signal elements that are discrete in at least one dimension, the signal elements comprising transform coefficients obtained using samples of the signal, the method comprising:
selecting a signal element from each of more than one of the bands, the selected signal element having a preselected size of magnitude relative to the other signal elements within one of the bands, at least one of the selected signal elements being from one of the bands having a plurality of adjacent signal elements; processing the selected signal elements, the processing including quantizing the magnitudes of the selected signal elements; and transforming the processed selected signal elements using a transformation that reduces the average number of bits needed to encode the processed selected signal elements. 38. The method of
39. A method of decoding, comprising:
receiving an encoded signal, the signal being defined by signal elements that are discrete in at least one dimension, the encoded signal of the type encoded by: selecting a signal element from each of more than one of the bands, at least one of the selected signal elements being from one of the bands having a plurality of adjacent signal elements; and performing a transformation on the selected signal elements; and decoding at least some of the received encoded signal, the decoding comprising performing an inverse transformation. 40. The method of
41. The method of
42. The method of
43. The method of
44. The method of
45. A method of decoding, comprising:
receiving an encoded signal, the signal being defined by signal elements that are discrete in at least one dimension, the encoded signal of the type encoded by: processing the selected signal elements; and performing a transformation on the processed selected signal elements; and decoding at least one of the received signal, the decoding comprising performing an inverse transformation. 46. The method of
47. The method of
48. The method of
49. The method of
linear mapping. 50. The method of
51. The method of
52. The method of
53. The method of
54. The method of
55. The method of
56. A method of encoding a signal defined by transform coefficients that are discrete in at least one dimension, the method comprising:
determining a division of at least some of the transform coefficients into a plurality of bands, at least one of the bands having a plurality of adjacent transform coefficients; and providing information describing the determined division. 57. The method of
58. The method of
59. The method of
60. The method of
61. The method of
62. The method of
63. The method of
64. The method of
65. The method of
66. The method of
67. The method of
68. The method of
domain aliasing cancellation coefficients. 69. The method of
70. The method of
71. A method of decoding, comprising:
receiving an encoded signal, the signal being defined by transform coefficients that are discrete in at least one dimension, the encoded signal of the type encoded by: encoding information describing the determined division; and decoding at least part of an encoded signal, the decoding comprising using the received encoded information describing the determined division. 72. The method of
73. The method of
74. The method of
75. The method of
76. The method of
77. The method of
78. The method of
domain aliasing cancellation coefficients. 79. The method of
80. The method of
81. A method of encoding a signal defined by signal elements that are discrete in at least one dimension, the method comprising:
selecting a signal element from each of more than one of the bands, at least one of the selected signal elements being from one of the bands having a plurality of signal elements; processing the selected signal elements; performing a transformation on the processed selected signal elements; encoding the transformed processed selected signal elements; and encoding information describing the dividing. 82. The method of
83. The method of
84. The method of
85. The method of
86. The method of
87. The method of
88. The method of
domain aliasing cancellation coefficients. 89. The method of
90. The method of
91. A method of decoding, comprising:
receiving an encoded signal, the signal being defined by signal elements that are discrete in at least one dimension, the encoded signal of the type encoded by: processing the selected signal elements; performing a transformation on the processed selected signal elements; encoding the transformed processed selected signal elements; and encoding information describing the dividing; and decoding at least some of the received encoded signal, the decoding comprising: using the information describing the dividing; and performing an inverse transformation. 92. The method of
93. The method of
94. The method of
95. The method of
96. The method of
97. The method of
domain aliasing cancellation coefficients. 98. The method of
99. The method of
100. A method of encoding an audio-type signal, the method comprising: sampling the audio-type signal to obtain discrete samples and constructing therefrom frames, each frame obtained by applying a window to the discrete samples; determining a set of transform coefficients from each of at least some of the frames; and for each of at least some of the sets of transform coefficients: dividing at least some of the transform coefficients into a plurality of bands, at least one band having a plurality of adjacent transform coefficients; selecting a transform coefficient from each of more than one of the bands, at least one of the selected transform coefficients being from one of the bands having a plurality of adjacent transform coefficients; processing the selected transform coefficients; and performing a transformation on the processed selected transform coefficients. 101. The method of
102. The method of
103. A method of encoding an audio-
type signal, the method comprising: sampling the audio-type signal to obtain discrete samples and constructing therefrom frames, each frame obtained by applying a window to the discrete samples; determining a set of transform coefficients from each of at least some of the frames; for each of at least some of the sets of transform coefficients: dividing at least some of the transform coefficients into a plurality of bands, at least one band having a plurality of adjacent transform coefficients; and encoding the dividing. 104. The method of
105. The method of
106. A method of decoding an audio-type signal, the method comprising: receiving an encoded audio-type signal, the encoded signal of the type encoded by: sampling the audio-type signal to obtain discrete samples and constructing therefrom frames, each frame obtained by applying a window to the discrete samples; determining a set of transform coefficients from each of at least some of the frames; for each of at least some of the sets of transform coefficients: dividing at least some of the transform coefficients into a plurality of bands, at least one band having a plurality of adjacent transform coefficients; processing the selected transform coefficients; and performing a transformation on the processed selected transform coefficients; and decoding the received encoded audio-type signal, the decoding comprising performing an inverse transformation. 107. The method of
108. The method of
109. The method of
110. A method of decoding an audio-
type signal, the method comprising: receiving an encoded audio-type signal, the encoded signal of the type encoded by: sampling the audio-type signal to obtain discrete samples and constructing therefrom frames, each frame obtained by applying a window to the discrete samples; determining a set of transform coefficients from each of at least some of the frames; for each of at least some of the sets of transform coefficients: encoding the dividing; and decoding the received encoded audio-type signal, the decoding comprising decoding the dividing. 111. The method of
112. The method of
113. A method of encoding an audio-type signal, the method comprising: sampling the audio-determining a set of transform coefficients from each of at least some of the frames; for each of at least some of the sets of transform coefficients: processing the selected transform coefficients; performing a transformation on the processed selected transform coefficients; and encoding the dividing.
114. A method of decoding an audio-type signal, the method comprising: receiving an encoded audio-type signal, the encoded signal of the type encoded by: sampling the audio-determining a set of transform coefficients from each of at least some of the frames; for each of at least some of the sets of transform coefficients: selecting a transform coefficient from each of more than one of the bands, at least one of the selected transform coefficients being from one of the bands having a plurality of adjacent transform coefficients; processing the selected transform coefficients; performing a transformation on the processed selected transform coefficients; and encoding the dividing; and decoding the encoded audio-type signal, the decoding comprising: performing an inverse transformation; and decoding the dividing.
Description
This is a continuation of application Ser. No. 07/879,635 filed on May 7, 1992, now U.S. Pat. No. 5,369,724, which is a continuation-in-part of Ser. No. 07/822,247, filed Jan. 17, 1992, now U.S. Pat. No. 5,394,508.
The present invention relates generally to the field of signal processing, and more specifically to data encoding and compression. The invention relates most specifically to a method and an apparatus for the encoding and compression of digital data representing audio signals or signals generally having the characteristics of audio signals.
Audio signals are ubiquitous. They are transmitted as radio signals and as part of television signals. Other signals, such as speech, share pertinent characteristics with audio signals, such as the importance of spectral domain representations. For many applications, it is beneficial to store and transmit audio type data encoded in a digital form, rather than in an analogue form. Such encoded data is stored on various types of digital media, including compact audio discs, digital audio tape, magnetic disks, computer memory, both random access (RAM) and read only (ROM), just to name a few.
It is beneficial to minimize the amount of digital data required to adequately characterize an audio-type analogue signal. Minimizing the amount of data results in minimizing the amount of physical storage media that is required, thus reducing the cost and increasing the convenience of whatever hardware is used in conjunction with the data. Minimizing the amount of data required to characterize a given temporal portion of an audio signal also permits faster transmission of a digital representation of the audio signal over any given communication channel. This also results in a cost saving, since compressed data representing the same temporal portion of an audio signal can be sent more quickly, relative to uncompressed data, or can be sent over a communications channel having a narrower bandwidth, both of which consequences are typically less costly.
The principles of digital audio signal processing are well known and set forth in a number of sources, including Watkinson, John, The Art of Digital Audio, Focal Press, London (1988). An analogue audio signal x(t) is shown schematically in FIG. Sampling the signal x(t) is shown schematically in FIG. The outline of a general method of digital signal processing is shown schematically in FIG. The transformation produces a set of amplitude coefficients of a variable other than time, typically frequency. The coefficients can be either real valued or complex valued. (If X(k) is complex valued, then the present invention can be applied to the real and imaginary parts of X(k) separately, or the magnitude and phase parts of X(k) separately, for example. For purposes of discussion, it will be assumed, however, that X(k) is real valued.) A typical plot of a portion of the signal x(n) transformed to X(k) is shown schematically in FIG. The transform is taken by applying the transformation function to a time-wise slice of the sampled analogue signal x(n).
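The step of transforming a time-wise slice of x(n) into real-valued coefficients X(k) can be illustrated with a minimal sketch. The Hann window, the unnormalized DCT-II, and the function names below are assumptions chosen for illustration only; the text does not fix a particular window or transformation at this point.

```python
import math

def hann_window(length):
    """Return a Hann window of the given length (an assumed window shape)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]

def dct_ii(frame):
    """Unnormalized DCT-II: X(k) = sum_n x(n) * cos(pi/N * (n + 1/2) * k)."""
    size = len(frame)
    return [
        sum(frame[n] * math.cos(math.pi / size * (n + 0.5) * k)
            for n in range(size))
        for k in range(size)
    ]

def transform_frame(samples, start, length):
    """Window one time-wise slice of x(n) and return real coefficients X(k)."""
    window = hann_window(length)
    frame = [samples[start + n] * window[n] for n in range(length)]
    return dct_ii(frame)

if __name__ == "__main__":
    # A short synthetic x(n): one cosine sampled at 32 points.
    x = [math.cos(2.0 * math.pi * 3 * n / 32) for n in range(32)]
    coefficients = transform_frame(x, start=0, length=16)
    print(len(coefficients))
```

A real-valued transform is used here because the discussion assumes X(k) is real valued; a complex transform would need the real and imaginary (or magnitude and phase) parts handled separately, as noted above.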
The slice (known as a “frame”) is selected by applying a window at Application of the transformation, indicated at As shown in An important task in coding signals is to allocate the fixed number of available bits to the specification of the amplitudes of the coefficients. The number of bits assigned to a coefficient, or any other signal element, is referred to herein as the “allocated number of bits” of that coefficient or signal element. This step is shown in relation to the other steps at Thus, a simple method of allocating the N available bits is to distribute them evenly among the C coefficients, so that each coefficient can be specified by N/C bits. (For discussion purposes, it is assumed that N/C is an integer.) Thus, considering the transformed signal X(k) as shown in There are various known methods for allocating the number of bits to each coefficient. However, all such known methods result in either a significant waste of bits, or a significant sacrifice in the precision of quantizing the coefficient values. One such method is described in a paper entitled “High-Quality Audio Transform Coding at 128 Kbits/s”. Davidson, G., Fielder, L., and Antill, M., of Dolby Laboratories, Inc., ICASSP, pp 1117-1120, Apr. 3-6. Albuquerque, N. Mex. (1990) (referred to herein as the “Dolby paper”) which is incorporated herein by reference. According to this method, the transform coefficients are grouped to form bands, with the widths of the bands determined by critical band analysis. Transform coefficients within one band are converted to a band block floating-point representation (exponent and mantissa). The exponents provide an estimate of the log-spectral envelope of the audio frame under examination, and are transmitted as side information to the decoder. The log-spectral envelope is used by a dynamic bit allocation routine, which derives step-size information for an adaptive coefficient quantizer. Each frame is allocated the same number of bits, N. 
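A band block floating-point representation of the kind summarized above can be sketched as follows. This is a generic illustration, not the format of the Dolby paper: the shared exponent is taken as the integral part of the base-2 logarithm of the band's peak magnitude, and `mantissa_bits` is a hypothetical fixed parameter, whereas the method described combines a fixed and a dynamically allocated component.

```python
import math

def block_floating_point(band, mantissa_bits):
    """Encode one band as a shared exponent plus signed integer mantissas."""
    peak = max(abs(c) for c in band)
    # Integral part of log2(peak): peak lies in [2**exponent, 2**(exponent + 1)).
    exponent = math.floor(math.log2(peak)) if peak > 0 else 0
    scale = 2.0 ** (exponent + 1)       # every |c| / scale is below 1
    levels = 2 ** (mantissa_bits - 1)   # one bit is spent on the sign
    mantissas = [
        max(-levels, min(levels - 1, round(c / scale * levels)))
        for c in band
    ]
    return exponent, mantissas

def reconstruct(exponent, mantissas, mantissa_bits):
    """Invert block_floating_point up to quantization error."""
    scale = 2.0 ** (exponent + 1)
    levels = 2 ** (mantissa_bits - 1)
    return [m / levels * scale for m in mantissas]

if __name__ == "__main__":
    exponent, mantissas = block_floating_point([7.0, -3.0, 1.2], mantissa_bits=4)
    print(exponent, mantissas)
    print(reconstruct(exponent, mantissas, mantissa_bits=4))
```

Only the exponent would be available to a bit allocation routine in such a scheme, so two bands whose peaks are 4.1 and 7.9 look identical to it; that coarseness is the limitation the surrounding discussion points to.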
The dynamic bit allocation routine uses only the exponent of the peak spectral amplitude in each band to increase quantizer resolution for psychoacoustically relevant bands. Each band's mantissa is quantized to a bit resolution defined by the sum of a coarse, fixed-bit component and a fine, dynamically-allocated component. The fixed bit component is typically established without regard to the particular frame, but rather with regard to the type of signal and the portion of the frame in question. For instance, lower frequency bands may generally receive more bits as a result of the fixed bit component. The dynamically allocated component is based on the peak exponent for the band. The log-spectral estimate data is multiplexed with the fixed and adaptive mantissa bits for transmission to the decoder. Thus the method makes a gross analysis of the maximum amplitude of a coefficient within a band of the signal, and uses this gross estimation to allocate the number of bits to that band. The gross estimate tells only the integral part of the power of 2 of the coefficient. For instance, if the coefficient is seven, the gross estimate determines that the maximum coefficient in the band is between 2 In addition to determining how many bits to allocate to each coefficient for encoding that coefficient's amplitude, an encoding method must also divide the entire amplitude range into a number of amplitude divisions shown at It is also useful to determine a masking level. The masking level relates to human perception of acoustic signals. For a given acoustic signal, it is possible to calculate approximately the level of signal distortion (for example, quantization noise) that will not be heard or perceived, because of the signal. This is useful in various applications. For example, some signal distortion can be tolerated without the human listener noticing it. The masking level can thus be used in allocating the available bits to different coefficients.
The entire basic process of digitizing an audio signal, and synthesizing an audio signal from the encoded digital data is shown schematically in FIG. The preceding steps, Eventually, the data is received by a receiver Thus, the several objects of the invention include, to provide a method and apparatus for coding and decoding digital audio-type signals: which permits efficient allocation of bits such that in general, fewer bits are used to specify coefficients of smaller magnitude than are used to specify larger coefficients; which provides for a quantization of the amplitude of the coefficients such that bands including larger coefficients are divided into reconstruction levels differently from bands including only smaller coefficients, such that both smaller and larger coefficients can be specified more accurately than if the same reconstruction levels were used for all coefficients; which permits accurate estimation of the masking level; which permits efficient allocation of bits based on the masking level; which robustly localizes errors to small portions of the digitized data, and, with respect to that data, limits the error to a small, known range; and that minimizes the need to redundantly encode coefficients, all allowing a highly efficient use of available bits.
In a first preferred embodiment, the invention is a method for encoding a selected aspect of a signal that is defined by signal elements that are discrete in at least one dimension, said method comprising the steps of: dividing the signal into at least one band, at least one of said at least one bands having a plurality of adjacent signal elements; in at least one band, identifying a signal element having a magnitude with a preselected size relative to other signal elements in said band and designating said signal element as a “yardstick” signal element for said band; and encoding the location of at least one yardstick signal element with respect to its position in said respective band.
In a second preferred embodiment, the invention is a method for decoding a code representing a selected aspect of a signal that is defined by signal elements that are discrete in at least one dimension, which has been encoded by a method comprising the steps of: dividing the signal into at least one band, at least one of said at least one bands having a plurality of adjacent signal elements; in at least one band, identifying a signal element having a magnitude with a preselected size relative to other signal elements in said band and designating said signal element as a “yardstick” signal element for said band; encoding the location of at least one yardstick signal element with respect to its position in said respective band; and using a function of said encoded location of said at least one yardstick signal element to encode said selected aspect of said signal; said method of decoding comprising the step of translating said encoded aspect of said signal based on a function of the location of said yardstick signal element that is appropriately inversely related to said function of the location used to encode said selected aspect of said signal. 
In a third preferred embodiment, the invention is an apparatus for encoding a selected aspect of a signal that is defined by signal elements that are discrete in at least one dimension, said apparatus comprising: means for dividing the signal into at least one band, at least one of said at least one bands having a plurality of adjacent signal elements; in at least one band, means for identifying a signal element having a magnitude with a preselected size relative to other signal elements in said band and means for designating said signal element as a “yardstick” signal element for said band; means for encoding the location of at least one yardstick signal element with respect to its position in said respective band; and means for quantizing the magnitude of said at least one yardstick signal element for which the location was encoded. In a fourth preferred embodiment, the invention is an apparatus for decoding a code representing a selected aspect of a signal that is defined by signal elements that are discrete in at least one dimension, which has been encoded by a method comprising the steps of: dividing the signal into at least one band, at least one of said at least one bands having a plurality of adjacent signal elements; in at least one band, identifying a signal element having a magnitude with a preselected size relative to other signal elements in said band and designating said signal element as a “yardstick” signal element for said band; encoding the location of at least one yardstick signal element with respect to its position in said respective band; and using a function of said encoded location of said at least one yardstick signal element to encode said selected aspect of said signal; said decoding apparatus comprising means for translating said encoded aspect of said signal based on a function of the location of said yardstick signal element that is appropriately inversely related to said functions of the location used to encode said selected aspect of 
said signal. In a fifth preferred embodiment, the invention is a method for encoding a selected signal element of a signal that is defined by signal elements that are discrete in at least one dimension, said method comprising the steps of: dividing the signal into a plurality of bands, at least one band having a plurality of adjacent signal elements; in each band, identifying a signal element having the greater magnitude of any signal element in said band, and designating said signal element as a “yardstick” signal element for said band; quantizing the magnitude of each yardstick signal element to a first degree of accuracy; and allocating to said selected signal element a signal element bit allocation that is a function of the quantized magnitudes of said yardstick signal elements, said signal element bit allocation chosen such that quantization of said selected signal element using said signal element bit allocation is to a second degree of accuracy, which is less than said first degree of accuracy. In a sixth preferred embodiment the invention is a method for encoding a selected signal element of a signal that is defined by signal elements that are discrete in at least one dimension, said method comprising the steps of: dividing the signal into a plurality of bands, at least one band having a plurality of adjacent signal elements, one of said bands including said selected signal element; in each band, identifying a signal element having the greatest magnitude of any signal element in said band, and designating said signal element as a “yardstick” signal element for said band; quantizing the magnitude of each yardstick signal element only one time; allocating to said selected signal element a signal element bit allocation that is a function of the quantized magnitudes of said yardstick signal elements. 
In a seventh preferred embodiment, the invention is a method of decoding a selected signal element that has been encoded by either of the preferred methods of the invention mentioned above, said method of decoding comprising the step of translating a codeword generated by the method of encoding based on a function of the quantized magnitudes of said yardstick signal elements that is appropriately inversely related to said function of the quantized magnitudes used to allocate bits to said selected signal element.
In an eighth preferred embodiment, the invention is an apparatus for encoding a selected signal element of a signal that is defined by signal elements that are discrete in at least one dimension, said apparatus comprising: means for dividing the signal into a plurality of bands, at least one band having a plurality of adjacent signal elements, one of said bands including said selected signal element; means for identifying, in each band, a signal element having the greatest magnitude of any signal element in said band, and designating said signal element as a “yardstick” signal element for said band; means for quantizing the magnitude of each yardstick signal element to a first degree of accuracy; means for allocating to said selected signal element a signal element bit allocation that is a function of the quantized magnitudes of said yardstick signal elements, said signal element bit allocation chosen such that quantization of said selected signal element using said signal element bit allocation is to a second degree of accuracy, which is less than said first degree of accuracy.
In a ninth preferred embodiment, the invention is an apparatus for decoding a codeword representing a selected signal element of a signal that has been encoded by a method of the invention mentioned above, the apparatus comprising means for translating said codeword based on a function of the quantized magnitudes of said yardstick signal elements that is appropriately inversely related to said function of the quantized magnitudes used to allocate bits to said selected signal element. A first preferred embodiment of the invention is a method of allocating bits to individual coefficients, for the encoding of the magnitude (i.e. the absolute value of the amplitude) of these coefficients. According to the method of the invention, an audio signal x(t) is obtained as in An important aspect of the method of the invention is the method by which the total number of bits N are allocated among the total number of coefficients, C. According to the method of the invention, the number of bits allocated is correlated closely to the amplitude of the coefficient to be encoded. The first step of the method is to divide the spectrum of transform coefficients in X(k) into a number B of bands, such as B equal sixteen or twenty-six. This step is indicated at If the number of frequency coefficients in each band is not uniform, then the pattern of the bandwidth of each band must be known or communicated to the decoding elements of the apparatus of the invention. The non-uniform pattern can be set, and stored in memory accessible by the decoder. 
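The division of the spectrum into bands according to a preset, possibly non-uniform pattern can be sketched as follows. The names `band_edges_from_widths` and `split_into_bands`, and the choice to describe the pattern as a list of band widths, are illustrative assumptions; the specification requires only that the pattern be known to both encoder and decoder.

```python
def band_edges_from_widths(widths):
    """Turn a list of band widths into cumulative edge indices."""
    edges = [0]
    for width in widths:
        edges.append(edges[-1] + width)
    return edges


def split_into_bands(coeffs, widths):
    """Split coefficients into bands using a width pattern shared with the decoder."""
    edges = band_edges_from_widths(widths)
    if edges[-1] != len(coeffs):
        raise ValueError("band widths must cover every coefficient exactly once")
    return [coeffs[a:b] for a, b in zip(edges[:-1], edges[1:])]


# Hypothetical non-uniform pattern: narrow bands first, one wider band last.
bands = split_into_bands(list(range(8)), [2, 2, 4])
```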
If, however, the bandwidth of the bands is varied “on-the-fly,” based on local characteristics, then the decoder must be made aware of these variations, typically, by an explicit message indicating the pattern As shown in It may be useful, although not necessary for the invention, to analyze the spectrum coefficients in a domain where the spectrum magnitudes are compressed through non-linear mapping such as raising each magnitude to a fractional power α, such as ½, or a logarithmic transformation. The human auditory system appears to perform some form of amplitude compression. Also, non-linear mapping such as amplitude compression tends to lead to a more uniform distribution of the amplitudes, so that a uniform quantizer is more efficient. Non-linear mapping followed by uniform quantization is an example of the well-known non-uniform quantization. This step of non-linear mapping is indicated at In each band of the exponentially scaled spectrum, the coefficient Cb The method of the invention entails several embodiments. According to each, the magnitude of the yardstick coefficients is used to allocate bits efficiently among the coefficients, and also to establish the number and placement of reconstruction levels. These various embodiments are discussed in detail below, and are indicated in The magnitude of each yardstick coefficient is quantized very accurately, in typical cases, more accurately than is the magnitude of non-yardstick coefficients. In some cases, this accurate rendering is manifest as using more bits to encode a yardstick coefficient (on average) than to encode a non-yardstick coefficient (on average). However, as is explained below with respect to a yardstick-only transformation step performed at step After quantization, the yardstick coefficients are encoded into codewords at The accurately quantized magnitudes of the yardstick coefficients are used to allocate bits among the remaining coefficients in the band.
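The statement above that non-linear mapping followed by uniform quantization amounts to non-uniform quantization can be illustrated with a short sketch. The function names and the step size of 0.25 are assumptions chosen for illustration; only the fractional power α = ½ is taken from the text.

```python
def compress(magnitude, alpha=0.5):
    """Map a non-negative magnitude into the compressed domain."""
    return magnitude ** alpha


def expand(value, alpha=0.5):
    """Undo compress(): map a compressed value back to the original domain."""
    return value ** (1.0 / alpha)


def quantize_uniform(value, step):
    """Uniform quantizer: index of the nearest multiple of step."""
    return round(value / step)


def dequantize_uniform(index, step):
    return index * step


step = 0.25  # hypothetical step size in the compressed domain
index = quantize_uniform(compress(9.0), step)
restored = expand(dequantize_uniform(index, step))

# Uniformly spaced levels in the compressed domain become progressively
# more widely spaced levels in the original domain.
original_domain_levels = [expand(dequantize_uniform(i, step)) for i in range(5)]
```

The widening spacing of `original_domain_levels` shows that small magnitudes are resolved more finely than large ones.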
Because, in this first discussed embodiment, each yardstick coefficient is the coefficient of greatest magnitude in the band of which it is a member, it is known that all of the other coefficients in the band have a magnitude less than or equal to that of the yardstick coefficient. Further, the magnitude of the yardstick coefficient is also known very precisely. Thus it is known how many coefficients must be coded in the band having the largest amplitude range, the next largest, the smallest, etc. Bits can be allocated efficiently among the bands based on this knowledge. There are many ways that the bits can be allocated. Two significant general methods are: to allocate bits to each band, and then to each coefficient within the band; or to allocate bits directly to each coefficient without previously allocating bits to each band. According to one embodiment of the first general method, initially, the number of bits allocated for each individual band are determined at For instance, as shown in It is also possible to adjust the estimate for the size of the band depending on the number of coefficients (also known as frequency samples) in the band. For instance, the more coefficients, the less likely it is that the average magnitude is equal to the magnitude of the yardstick coefficient. In any case, a rough estimate of the size of the band facilitates an appropriate allocation of bits to that band. Within each band, bits are allocated at As is mentioned above, rather than first allocating bits among the bands, and then allocating bits among the coefficients in each band, it is also possible to use the estimate of |X(k)| Due to the accurate quantization of the yardstick coefficients, the present invention results in a more appropriate allocation of bits to coefficients in each band than does the method described in the prior art Dolby paper. 
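One possible form of the first general method, in which bits are first allocated to each band, is sketched below. The weighting rule used here (number of coefficients multiplied by the base-two logarithm of one plus the yardstick magnitude) and the name `allocate_bits_to_bands` are assumptions made only for illustration; the specification leaves the particular rule open.

```python
import math


def allocate_bits_to_bands(yardstick_mags, band_sizes, total_bits):
    """Share total_bits among bands in proportion to a rough band-size estimate.

    Assumed estimate: coefficients in the band times log2(1 + yardstick magnitude).
    """
    weights = [n * math.log2(1.0 + y) for y, n in zip(yardstick_mags, band_sizes)]
    total = sum(weights)
    if total == 0:
        # All yardsticks are zero: fall back to an equal split.
        weights = [1.0] * len(weights)
        total = float(len(weights))
    shares = [total_bits * w / total for w in weights]
    bits = [int(s) for s in shares]
    # Hand out the bits lost to truncation, largest fractional share first.
    leftovers = total_bits - sum(bits)
    order = sorted(range(len(shares)), key=lambda i: shares[i] - bits[i], reverse=True)
    for i in order[:leftovers]:
        bits[i] += 1
    return bits


band_bits = allocate_bits_to_bands([7.0, 1.0], [4, 4], 16)
```

Because the rule depends only on the quantized yardstick magnitudes and the band sizes, a decoder holding the same values can repeat the computation without side information.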
Consider, for example, the two bands b Further according to the prior art method, yardstick coefficient Conversely, according to the method of the invention, because the yardstick coefficients are quantized very accurately, yardstick coefficient Comparison to the bit allocation of the method of the invention to the prior art method shows that the allocation according to the method of the invention is much more appropriate. For band b Once each coefficient has been allocated its allotment of bits at The reconstruction levels that would be assigned according to the method of the invention are quite different from those of the prior art, and, in fact, differ between the two bands. In the example, band b Comparison of the accuracy of the two methods shows that the method of the invention provides greater efficiency than does the prior art. For the coefficients in band b The placement of the boundaries between reconstruction levels and the assignment of reconstruction values to the reconstruction levels within the range can be varied to meet specific characteristics of the signal. If uniform reconstruction levels are assigned, they can be placed as shown in As in the case of uneven allocation of bits to coefficients in a band, if more than one reconstruction scheme can be applied by the encoder, then either a signal must be transmitted to the decoder along with the data pertaining to the quantized coefficients indicating which reconstruction scheme to use, or the decoder must be constructed so that in all situations, it reproduces the required distribution of reconstruction levels. This information would be transmitted or generated in a manner analogous to the manner in which the specific information pertaining to the number of coefficients per band would be transmitted or generated, as discussed above. 
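The establishment of uniform reconstruction levels within the range bounded by the yardstick can be sketched as follows. Placing each reconstruction value at the midpoint of its interval is an assumption; as noted above, the placement can be varied. The function names are hypothetical.

```python
def reconstruction_levels(yardstick_mag, bits):
    """Uniform levels over [0, yardstick_mag], one per codeword of the given length."""
    count = 2 ** bits
    step = yardstick_mag / count
    # Assumption: each reconstruction value sits at the middle of its interval.
    return [(i + 0.5) * step for i in range(count)]


def quantize_magnitude(mag, yardstick_mag, bits):
    """Index of the interval of [0, yardstick_mag] that contains mag."""
    count = 2 ** bits
    if yardstick_mag == 0:
        return 0
    return min(int(mag / yardstick_mag * count), count - 1)


wide = reconstruction_levels(8.0, 2)    # band with a large yardstick
narrow = reconstruction_levels(2.0, 2)  # band with a small yardstick
idx = quantize_magnitude(5.5, 8.0, 2)
```

With the same two bits, the band having the smaller yardstick receives levels spaced four times more closely, which is the benefit of knowing the range accurately.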
Rather than divide up the amplitude of the band evenly, it may be beneficial to divide it at The foregoing examples have implicitly assumed that the yardstick coefficient is greater than zero and that all of the other coefficients are greater than or equal to zero. Although this can happen, many situations will arise where either or both of these assumptions will not hold. In order to specify the sign of the non-yardstick coefficients, several methods are possible. The most basic is to expand the amplitude range of the band to a range having a magnitude of twice the magnitude of the yardstick coefficient, and to assign at Rather than an equal apportionment to positive and negative values, it is possible to assign either the positive or negative reconstruction levels more finely, as shown in FIG. The foregoing examples demonstrate that with very accurate quantization of the yardsticks, very accurate range information for a particular band can be established. Consequently, the reconstruction levels can be assigned to a particular band more appropriately, so that the reconstructed values are closer to the original values. The method of the prior art results in relatively larger ranges for any given band, and thus less appropriate assignment of reconstruction levels. The estimation of the masking level is also improved over the prior art with application of the method of the invention. Estimation of the masking level is based upon an estimation of the magnitude of the coefficients |X(k)|. As has been mentioned, in general, for each coefficient, the masking level is a measure of how much noise, such as quantization noise, is tolerable in the signal without it being noticeable by a human observer. In most applications, signals of larger amplitude can withstand more noise without the noise being noticed. Factors in addition to amplitude also figure into the masking level determination, such as frequency and the amplitudes of surrounding coefficients.
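The most basic treatment of sign described above, in which the range is expanded to twice the magnitude of the yardstick, can be sketched as follows. Equal apportionment between positive and negative values and midpoint reconstruction values are assumed; the function names are hypothetical.

```python
def signed_reconstruction_levels(yardstick_mag, bits):
    """Uniform levels over [-yardstick_mag, +yardstick_mag]."""
    count = 2 ** bits
    step = 2.0 * yardstick_mag / count
    return [-yardstick_mag + (i + 0.5) * step for i in range(count)]


def nearest_level_index(value, levels):
    """Index of the reconstruction level closest to value."""
    return min(range(len(levels)), key=lambda i: abs(levels[i] - value))


levels = signed_reconstruction_levels(4.0, 2)
idx = nearest_level_index(-0.8, levels)
```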
Thus, a better estimation of |X(k)| for any given coefficient results naturally in a better estimation of an appropriate masking level. The masking level is used to fine-tune the allocation of bits to a coefficient. If the coefficient is situated such that it can tolerate a relatively high amount of quantization noise, then the bit allocation takes this into account, and may reduce the number of bits that would be allocated to a specific coefficient (or band) as compared to the number that would have been applied if the masking level were not taken into account. After the coefficients are encoded according to the method of the invention, the stream of codewords is transmitted at At The decoder translates the codewords into quantization levels by applying an inverse of the steps conducted at the encoder. From the yardstick coefficients, the decoder has available the number of bands and the magnitudes of the yardsticks. Either from side information or from preset information, the number of non-yardstick coefficients in each band is also known. From the foregoing, the reconstruction levels (number and locations) can be established by the decoder by applying the same rule as was applied by the encoder to establish the bit allocations and reconstruction levels. If there is only one such rule, the decoder simply applies it. If there is more than one, the decoder chooses the appropriate one, either based on side information or on intrinsic characteristics of the yardstick coefficients. If the codewords have been applied to the reconstruction levels according to a simple ordered scheme, such as the binary representation of the position of the reconstruction level from lowest arithmetic value to highest, then that scheme is simply reversed to produce the reconstruction level. If a more complicated scheme is applied, such as application of a codebook, then that scheme or codebook must be accessible to the decoder.
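The simple ordered scheme mentioned above, in which each codeword is the binary representation of the position of the reconstruction level, can be sketched as follows. The names are hypothetical, and uniform midpoint levels over the range from zero to the yardstick magnitude are assumed as the rule shared by encoder and decoder.

```python
def encode_level_index(index, bits):
    """Codeword: binary position of the level, lowest level first (bits >= 1)."""
    return format(index, "0{}b".format(bits))


def decode_codeword(codeword, bits, yardstick_mag):
    """Rebuild the reconstruction value from the codeword and the yardstick.

    The decoder derives `bits` and the level spacing by applying the same rule
    as the encoder, so only the codeword itself has to be transmitted.
    """
    if len(codeword) != bits:
        raise ValueError("codeword length does not match the bit allocation")
    count = 2 ** bits
    step = yardstick_mag / count
    return (int(codeword, 2) + 0.5) * step


codeword = encode_level_index(2, 2)
value = decode_codeword(codeword, 2, 8.0)
```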
The end result is a set of quantized coefficients for each of the frequencies that were present in the spectrum X(k). These coefficients will not be exactly the same as the original, because some information has been lost by the quantization. However, due to the more efficient allocation of bits, better range division, and enhanced masking estimation, the quantized coefficients are closer to the original than would be requantized coefficients of the prior art. (However, reconstituted non-yardstick coefficients typically do not compare to the original non-yardstick coefficients as accurately as the reconstituted yardstick coefficients compared to the original yardstick coefficients.) After requantization, the effect of the operation of raising the frame to the fractional power α, such as ½, is undone at The foregoing discussion has assumed that only the magnitudes of the yardstick coefficients were encoded accurately at If the location of the yardstick coefficient had not been encoded, it would be necessary to encode its magnitude in the stream of all coefficients, for instance at step If, however, the location is coded originally at If at As has been mentioned above, a basic method to allocate bits within the band is to allocate an equal number of bits to each non-yardstick coefficient. However, in some cases, this cannot be done, for instance when the number of bits available is not an integer multiple of the number of non-yardstick coefficients. In this case, it is frequently beneficial to give more bits to the coefficients that are closest (in location within the band) to the yardstick coefficient, because experience has shown that for audio-type signals, adjacent coefficients are often closer to each other in magnitude than are distant coefficients. There are various other uses to which extra bits can be put. For instance, more preference can be given to coefficients lying to the left of the yardstick coefficient, i.e.
of a lower frequency than the yardstick coefficient. This is in consideration of the masking result. Typically, the impact of a specific frequency component on the masking function occurs with respect to a higher frequency region than the frequency in question. Therefore, giving preference to coefficients of lower frequency than the yardstick, (thus lying to the left of the yardstick on a conventional scale such as shown in Thus, accurately specifying the location of the yardstick coefficient within the band allows further more appropriate allocation of the bits among the various non-yardstick coefficients. With more appropriate allocation of bits per non-yardstick coefficient, the division of the bits into appropriate reconstruction levels, as discussed above, is further enhanced. Knowing the location of the yardstick coefficients also permits a better rough estimation of |X(k)| If the location of each yardstick coefficient has been specified, then it is possible without redundancy to go back to any yardsticks that have been encoded and enhance the accuracy of their coding if more bits are available than was assumed at the time of yardstick encoding. For instance, the particular band may have received a very large number of bits due to the very large yardstick, but may not require such a large number of bits to encode the other signal elements, due to a very small number of signal elements being in the band. If the locations are known, more bits can be allocated to specifying the amplitude of the yardstick coefficient after the first pass of allocation of bits to yardsticks. If the locations are not known, it cannot be done efficiently without redundancy. One way to further specify the magnitude of the yardstick would be to use the extra bits to encode the difference between the magnitude of the yardstick first encoded, and the original yardstick amplitude.
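The use of extra bits to encode the difference between the first-encoded yardstick magnitude and the original amplitude can be sketched as follows. The first pass is assumed here to be a uniform quantizer of a given step size, and the residual is assumed to be quantized uniformly with midpoint reconstruction; the function name is hypothetical.

```python
def refine_yardstick(original, step, extra_bits):
    """Return (coarse value, refined value) for a yardstick magnitude.

    First pass: round to the nearest multiple of step.
    Second pass: spend extra_bits on the residual, which lies in
    [-step / 2, +step / 2].
    """
    coarse = round(original / step) * step
    residual = original - coarse
    count = 2 ** extra_bits
    fine_step = step / count
    fine_index = min(int((residual + step / 2.0) / fine_step), count - 1)
    refined = coarse - step / 2.0 + (fine_index + 0.5) * fine_step
    return coarse, refined


coarse, refined = refine_yardstick(5.3, 1.0, 2)
```

In this example the two extra bits reduce the error of the reconstructed yardstick from about 0.3 to about 0.075.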
Because the decoding apparatus will be employing the same routines to determine how bits have been allocated as were used by the encoder, the decoder will automatically recognize the enhanced yardstick amplitude information properly. Additional coding efficiency and accuracy can be achieved by accurately specifying and encoding the sign of the yardstick coefficient (which corresponds to the phase of the signal components at that frequency). Only one additional bit per yardstick coefficient is necessary to encode its sign if X(k) is real-valued. Knowing the sign of the yardstick coefficient enhances the ability of the method to efficiently determine reconstruction levels within a given band. For instance, experience indicates that a band may often include more non-yardstick coefficients having the same sign as the yardstick coefficient. Therefore, it may be beneficial to provide one or two more reconstruction levels having that sign. Knowing the sign of the yardstick does not generally enhance estimation of the masking effect. The usefulness of the sign information varies depending upon which transform has been used. Another preferred embodiment of the method of the invention is particularly useful if the number of bands is relatively small. This embodiment entails a further division of each band in the spectrum X(k) into two split-bands at step The magnitudes of the minor yardstick coefficients are also quantized accurately at There are various ways to divide the entire frame into, for instance, sixteen bands. One is to divide the segment from the beginning into sixteen bands. The other is to divide the entire segment into two, and then divide each part into two, and so on, with information derived from the first division being more important than information derived from the second division. Using split bands thus provides a hierarchy of important information. 
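The division of a band into two split-bands, each having its own yardstick, can be sketched as follows. Splitting the band at its midpoint and the name `split_band_yardsticks` are assumptions for illustration; the band is assumed to contain at least two coefficients.

```python
def split_band_yardsticks(band):
    """Split a band into two halves and report the major and minor yardsticks.

    The major yardstick is the larger of the two split-band peaks and is
    therefore also the yardstick of the full band.
    """
    mid = len(band) // 2
    halves = [band[:mid], band[mid:]]
    peaks = [max(abs(x) for x in half) for half in halves]
    major_half = 0 if peaks[0] >= peaks[1] else 1
    return {
        "major": peaks[major_half],
        "minor": peaks[1 - major_half],
        "major_half": major_half,
    }


info = split_band_yardsticks([0.5, -3.0, 1.0, 0.2, 0.1, -0.4])
```

Applying the same split again to each half would give the hierarchy of divisions described above.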
The first division is more important than the second division, which is more important than the next division, etc. Thus it may be beneficial to preserve bits for the more important divisions. As has been mentioned above, it may be beneficial to apply a second transformation to the yardsticks before quantizing, coding and transmitting at step Thus, at step It is because of this potential yardstick-only transformation that it is not appropriate in all cases to conclude that according to the method of the invention, the higher accuracy to which the yardstick coefficients are encoded is the result of devoting more bits to each yardstick coefficient (on average) than to each non-yardstick coefficient (on average). This is because the application of the yardstick-only transformation may result in a significant reduction in the number of bits necessary to encode all of the yardstick coefficients and thus of any single yardstick coefficient (on average). Of course, this savings in bits is achieved due to an increase in computational requirements, both in encoding and decoding. In some applications, the bit savings will justify the computational burden. In others, it may not. Both will be apparent to those of ordinary skill in the art. If the yardsticks are twice transformed, they must be inverse transformed back into the frequency domain of X(k) at During the decoding steps of the method of the invention, the exact manner of translation at step For instance, an established algorithm may set the number of coefficients per band in the first half of the frame at sixteen and the number of coefficients per band in the second half at thirty-two. Further a rule might be established to allocate bits within a band evenly among coefficients, with any extra bits being given, one to each of the first coefficients in the band. 
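The example rules given above, namely sixteen coefficients per band in the first half of the frame, thirty-two in the second half, and an even allocation with any extra bits given one to each of the first coefficients, can be sketched as follows. The function names are hypothetical, and the frame length is assumed to divide evenly into such bands.

```python
def band_widths(frame_len):
    """Sixteen coefficients per band in the first half, thirty-two in the second."""
    half = frame_len // 2
    return [16] * (half // 16) + [32] * ((frame_len - half) // 32)


def allocate_evenly(num_coeffs, band_bits):
    """Even split; leftover bits go one each to the first coefficients in the band."""
    base, extra = divmod(band_bits, num_coeffs)
    return [base + 1 if i < extra else base for i in range(num_coeffs)]


widths = band_widths(128)
per_coeff = allocate_evenly(4, 10)
```

Since both rules are fixed in advance, the decoder can reproduce the layout and the allocation without any explicit message.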
If the sign of the yardstick coefficient is quantized, then each coefficient may be divided into reconstruction levels with one additional reconstruction level having a sign that is the same as the yardstick coefficient. In light of the foregoing detailed discussion of the method of the invention, the apparatus of the invention will be understood from TDAC type transformer In a preferred embodiment, band-wise bit allocator In another preferred embodiment of the apparatus, the band-wise bit allocator can also take information from the yardstick position quantizer In another embodiment of the apparatus of the invention, the bandwise bit allocator The receiver or decoder portion of the invention is shown schematically in FIG. Another preferred embodiment of the encoder omits the band-wise bit allocator and includes only a coefficient-wise bit allocator, which takes the estimate of |X(k)| The foregoing discussion of method and apparatus has assumed that the yardstick coefficients are the coefficients having the maximum absolute value of amplitude in the band. It is also beneficial to use a coefficient other than the maximum magnitude as the reference yardstick against which the others are measured. For instance, although it is believed that optimal results will be achieved using the maximum amplitude coefficient, beneficial results could be obtained by using a coefficient having an amplitude near to the greatest, such as the second or third greatest. Such a method is also within the contemplation of the invention and is intended to be covered by the attached claims. The reference yardstick may also be the coefficient having a magnitude that is closest among all of the magnitudes of other coefficients in the band to the middle or median coefficient in the band. 
A middle value yardstick is beneficial in cases where the statistical characteristics of the signal are such that the middle, or median value contains more information about the total energy in the signal than does the maximum value in a band. This would be the case if the typical signal is characterized by excursions within a steady range above and below a middle value. It would also be necessary to characterize or estimate a range for the magnitude of the excursions. For example, if the middle value of a band had a value of positive five, and it were known from the statistics of the type of signal that such signal values typically diverge from the median by only ±four units, the range would be set from positive one to positive nine, and reconstruction level would be established within the range. As before, the reconstruction levels can be evenly divided, or can be concentrated more around the middle value, or skewed toward either end of the range, depending upon statistical information about the particular class of signal. Similarly, the yardstick coefficient may be the coefficient having a magnitude that is closest to the average of all of the magnitudes of the other coefficients in the band. Such an average value is useful if the average value represents a better estimate of the energy in the band than any other value, for instance the maximum or the median values. The invention has been discussed above with respect to a signal that has been divided into a plurality of bands, and this is expected to be the application for which the invention provides the greatest benefits. However, the invention is also useful in connection with coding the amplitudes of a plurality of coefficients in only a single band. Application of the invention to a signal or signal component on only a single band follows the same principles as the application to multi-band signals discussed above. 
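The middle value yardstick and the worked example above (a middle value of positive five with excursions of ±four units, giving a range from positive one to positive nine) can be sketched as follows. Uniform midpoint levels are assumed, and the function names are hypothetical.

```python
import statistics


def median_yardstick(band):
    """Location and magnitude of the coefficient closest to the median magnitude."""
    mags = [abs(x) for x in band]
    target = statistics.median(mags)
    loc = min(range(len(mags)), key=lambda i: abs(mags[i] - target))
    return loc, mags[loc]


def levels_around(middle, spread, bits):
    """Uniform levels over [middle - spread, middle + spread]."""
    count = 2 ** bits
    low = middle - spread
    step = 2.0 * spread / count
    return [low + (i + 0.5) * step for i in range(count)]


loc, mag = median_yardstick([2.0, -9.0, 5.0, 4.0, 7.0])
levels = levels_around(5.0, 4.0, 2)
```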
The yardstick is selected, and quantized accurately, preferably although not necessarily encoding the location and the sign of the yardstick. The accurate quantization of the yardstick is used in conjunction with the number of available bits to establish reconstruction levels and to allocate bits among the non-yardstick coefficients. All of the considerations discussed above apply to the single band embodiment, except that the number of bits available for the band will be determined, and will not depend on the specifics of other bands, if any. The present invention has many benefits. The bits related to bit allocation, such as the magnitudes of the yardstick coefficients as well as their locations and signs, will be well protected. Thus, any error that occurs will be localized to one particular band and will not be any larger than the magnitude of the yardstick coefficient in each band. The yardstick coefficients will always be accurately represented. The yardstick amplitude information is not discarded as in some prior art methods, but is used very efficiently for its own direct use and for bit allocation. Relative to the method discussed in the Dolby paper, the invention uses the available bits more efficiently. In the Dolby method, the exponents of the peak spectral values for each band are encoded. Thus, a gross estimate of the amplitude of a band is first made. Subsequently, all of the coefficients, including the peak coefficient, are encoded and transmitted using a finer estimate of their magnitude. Thus, the accuracy of the peak amplitudes is the same as that of other coefficients in the same band. Further, the accuracy of the yardstick coefficients in the present invention ensures that accurate ranges are used for determining reconstruction levels, which allows more efficient use of available bits.
In addition to the foregoing specific implementations of the method and apparatus of the invention, additional variations are within the intended scope of the claims. It is possible to incorporate techniques that take into account the perceptual properties of human observers, in addition to the estimation of the masking level. Further, more than one frame at a time may be considered. For instance, in the special case of silence, bits can be taken away from the frame in which the silence occurs, and given to another. In less extreme cases, it may still be appropriate to devote fewer bits to one frame than another. The establishment of bands can be done “on-the-fly”, by including in a band sequential coefficients that are close to each other, and then beginning a new band upon a coefficient of significantly different magnitude. The method and apparatus of the invention can also be applied to any data that is encoded, for instance to two-dimensional signals. The data need not have been transformed. The invention can be applied to time domain samples x(n), except that in the case of audio, the results will not be as good as they would be if the data were transformed. Transformation is typically applied to data to exploit patterns within the data. However, transformation need not be applied and, in some cases, where the data tends toward randomness, it is not typically beneficial. In the case of time domain samples the coefficients will, in fact, be sampled signal elements having sampled amplitudes of the actual sampled signal, rather than some transformation thereof into another domain. The method of the invention is applied in the same fashion, excluding the transformation and inverse transformation steps. Similarly, the apparatus of the invention would in that case not require the forward and inverse transform operators. (It might, however, still be beneficial to perform the yardstick-only transformation.) Further, interaction between frames can also be implemented.
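The “on-the-fly” establishment of bands described above can be sketched as follows. The test for a coefficient of significantly different magnitude is assumed here to be a fixed ratio between the magnitudes of neighboring coefficients; the name `bands_on_the_fly` and the default ratio of four are hypothetical.

```python
def bands_on_the_fly(coeffs, ratio=4.0):
    """Group sequential coefficients of similar magnitude into bands.

    A new band begins whenever the larger of two neighboring magnitudes
    exceeds `ratio` times the smaller one.
    """
    if not coeffs:
        return []
    bands = [[coeffs[0]]]
    for prev, cur in zip(coeffs, coeffs[1:]):
        low, high = sorted((abs(prev), abs(cur)))
        if high > ratio * low:
            bands.append([cur])    # significantly different: start a new band
        else:
            bands[-1].append(cur)  # similar: extend the current band
    return bands


bands = bands_on_the_fly([1.0, 1.2, 0.9, 8.0, 7.5, 0.5])
```

Because the resulting pattern depends on the signal, the decoder would have to be informed of it, for instance by an explicit message.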
The foregoing discussion should be understood as illustrative and should not be considered to be limiting in any sense. While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the claims.