US 6658162 B1

Abstract

A method for compressing and decompressing image information. An encoder receives initial image information and transforms it using a linear transform to produce coefficients. These coefficients are then locally normalized using a neighborhood masking weighting factor, quantized, and coded to produce a compressed bitstream. The compressed bitstream is received at a decoder, and an inverse process is applied to reconstruct the image data from it. Alternatively, the neighborhood masking factor can be applied after quantization, in the rate-distortion optimization process.
Claims (10)

1. A method for compressing and decompressing image information, the method comprising:
receiving initial image information at an encoder;
transforming said initial information using a linear transform to produce coefficients;
normalizing said coefficients using a neighborhood masking weighting factor based upon averages of absolute values of neighboring coefficients raised to a predetermined power in a causal neighborhood within a moving window producing locally normalized coefficients;
quantizing said locally normalized coefficients;
coding said quantized locally normalized coefficients thereby producing a compressed bit stream;
receiving said compressed bit stream at a decoder; and
applying an inverse process to reconstruct said image data from said compressed bitstream.
2. The method as claimed in
3. The method as claimed in
4. The method as claimed in
5. The method as claimed in
6. The method as claimed in
7. A method for compressing and decompressing image information, the method comprising:
receiving initial image information at an encoder;
transforming said initial information using a linear transform to produce coefficients;
applying a nonlinear transducer function raising the coefficients to a power having a value in the approximate range of 0 to 1 to produce self-masking-compensated coefficients;
quantizing said coefficients;
coding said self-masking-compensated quantized coefficients based upon a distortion measure weighted by a neighborhood masking factor for each code-block, thereby producing a compressed bit stream of locally normalized coefficients;
receiving said compressed bit stream at a decoder; and
applying an inverse process to reconstruct said image data from said compressed bitstream.
8. The method as claimed in
9. The method as claimed in
10. The method as claimed in
Description

This application claims priority to U.S. Provisional Patent Application No. 60/141,642, filed Jun. 26, 1999.

This invention relates to methods for coding images, and more particularly to image coding methods that use the visual optimization techniques of self-contrast masking and neighborhood masking.

Image compression codes image information so that less data is required to reconstruct the image. When compressed image information is transmitted, less bandwidth is required. The compression of images is typically referred to as image coding, and the reconstruction as decoding.

One goal of image compression is the removal of statistical redundancy in the image data, since redundancy increases the required bandwidth. Compression techniques try to minimize the distortion of the image at a given transmission bit rate, or to minimize the bit rate for a given allowable distortion target. Another goal of image compression is the removal of perceptual irrelevancy. Aspects of the image that cannot be detected by the human visual system are irrelevant, and spending bits to code them wastes resources and bandwidth. Compression schemes should therefore take the properties of the human visual system into account when optimizing the coding.

One common visual optimization strategy for compression makes use of the contrast sensitivity function of the visual system. Human eyes are less sensitive to high-frequency errors, so the high-frequency components of images can be quantized more coarsely. DCT- and wavelet-based compression systems use this strategy widely, as demonstrated in U.S. Pat. No. 5,629,780, issued May 13, 1997; S.
Daly, Application of a Noise-Adaptive Contrast Sensitivity Function to Image Data Compression, …

The advantages of this technique become less noticeable for lower-resolution displays and closer viewing distances. Under those conditions the contrast sensitivity function (CSF) curve tends to be flat, so the high-frequency content cannot be quantized more coarsely without affecting perception.

Another perceptual phenomenon is visual masking: image content acting as a background signal masks artifacts locally. For example, in the wavelet transform domain a larger coefficient can tolerate a larger distortion than a smaller one, because the larger coefficient provides a stronger background signal that masks the distortion. U.S. Pat. No. 5,136,377, issued Aug. 4, 1992, and U.S. Pat. No. 4,725,885, issued Feb. 16, 1988, show early work with this phenomenon. These approaches basically scale the overall quantization step as a function of local image variance, which requires overhead to tell the decoder which quantizer was used for each local block. Another technique of this kind, found in U.S. Pat. No. 4,774,574, issued 1988, scales the individual coefficients in the zigzag scan of a DCT block as a function of the preceding coefficients. This avoids the overhead of specifying the quantizer. It exploits coefficient masking effects, in which low-frequency components mask high-frequency components. In wavelet applications, coefficient masking effects correspond to intra-band masking; in DCT applications, the "bands" are the individual coefficients, each of which has a narrow bandwidth. However, this approach has a potential problem: the nature of the DCT and of the zigzag scan order does not allow accurate modeling of the masking effect. It is now understood that the masking property of human vision primarily occurs within spatial frequency channels that are limited in radial frequency as well as in orientation.
This makes it possible to quantize more coarsely as a function of the activity in spatial frequency and spatial location. Nonuniform quantization can then exploit the visual masking effects without overtly adaptive techniques. An advantage of this approach is that the masking effects are approximately the same in each channel, so once the coefficients are normalized, the same masking procedure can be used in every channel without incurring any overhead. An example of this technique can be found in U.S. patent application Ser. No. 09/218,937, filed Dec. 22, 1998 and co-owned by the assignee of the present invention.

One method of exploiting this masking effect, hereinafter referred to as the self-contrast masking effect, passes the CSF-normalized transform coefficients through a nonlinear transducer function before a uniform quantization is applied. The result is a non-uniform quantization of the original coefficients. The decoder applies the inverse process between dequantization and the inverse wavelet transform. Another example of this type of technique is shown in U.S. Pat. No. 5,313,298, issued May 17, 1994, although it operates in the spatial domain rather than the frequency domain.

Another method of exploiting visual masking controls the contribution of individual code-blocks. This was proposed in the JPEG2000 context in High Performance Scalable Image Compression with EBCOT, by David Taubman, submitted to … However, in the post-compression rate-distortion optimization process of Taubman, the distortion metric takes the visual masking effect into account. In this step, the sub-bitstreams from each code-block are assembled in a rate-distortion-optimized order to form the final bitstream. The modified metric effectively controls the bit allocation among different code-blocks, taking advantage of the visual masking effect.
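The self-contrast masking scheme described above (a nonlinear transducer applied before a uniform quantizer, inverted at the decoder between dequantization and the inverse transform) can be sketched as follows. This is a minimal illustration, assuming a power-law transducer y = sign(x)|x|^α with α = 0.7 (one of the typical values given later in this description) and omitting the CSF normalization step; the function names are hypothetical. It shows that a uniform step in the transducer domain becomes a non-uniform step in the original coefficient domain.

```python
import numpy as np

ALPHA = 0.7  # typical transducer exponent mentioned in the text (0.7 or 0.8)

def transducer(x, alpha=ALPHA):
    """Power-law transducer y = sign(x) * |x|**alpha, with 0 < alpha < 1."""
    return np.sign(x) * np.abs(x) ** alpha

def inverse_transducer(y, alpha=ALPHA):
    """Inverse of the transducer: x = sign(y) * |y|**(1/alpha)."""
    return np.sign(y) * np.abs(y) ** (1.0 / alpha)

def encode(x, step, alpha=ALPHA):
    """Uniform quantization applied in the transducer domain."""
    return np.round(transducer(x, alpha) / step).astype(int)

def decode(q, step, alpha=ALPHA):
    """Dequantize, then undo the transducer (before the inverse wavelet transform)."""
    return inverse_transducer(q * step, alpha)

# Reconstruction levels for indices 0..7 with a unit step in the transducer domain.
levels = decode(np.arange(8), step=1.0)
gaps = np.diff(levels)  # spacing between adjacent levels in the original domain
```

The gaps between reconstruction levels widen with amplitude, so large coefficients are quantized more coarsely while small coefficients keep a fine step, which is the protection of low-amplitude coefficients the text relies on.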
The distortion of each coefficient is weighted by a visual masking factor that is generally a function of the neighboring coefficients in the same subband. This will be referred to as spatially extensive masking or neighborhood masking. It treats each coefficient value, V
and the masking strength function is
with A being the normalization factor. The weakness of this approach is that it only adjusts the truncation point of each code-block. This is a spatially coarser adjustment than the sample-by-sample compensation offered by the approach of U.S. patent application Ser. No. 09/218,937, mentioned previously. The bitstream order within each code-block, usually no smaller than 32×32, does not take any visual masking effect into account. In the article APIC: Adaptive Perceptual Image Coding Based on Subband Decomposition with Locally Adaptive Perceptual Weighting, published in … However, this estimate may not be accurate, as the coefficients are de-correlated. It also does not take advantage of spatially extensive, or neighborhood, masking.

Therefore, there is a need for a coding method that takes into account both the self-contrast masking and the neighborhood masking effects, without significantly increasing the overhead of the encoder or decoder.

One aspect of the invention is a method for compressing and decompressing image information. The method includes receiving initial image information at an encoder and transforming it using a linear transform to produce coefficients. These coefficients are then locally normalized with a neighborhood masking factor, and then quantized and coded to produce a compressed bitstream. The compressed bitstream is decoded at a decoder using an inverse process. An alternative embodiment uses self-masking-compensated coefficients prior to quantization and applies the neighborhood masking weighting factor during encoding, after quantization. Either embodiment can be combined with the contrast sensitivity function and the local luminance sensitivity of the human visual system.
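The alternative embodiment, in which the neighborhood masking factor enters only through the distortion measure used for rate-distortion optimization, can be illustrated with a small sketch. It assumes that a per-coefficient masking weight w is already available and that the weighted distortion of a code-block is Σ((y − ŷ)/w)², i.e. the squared error measured in the locally normalized domain z = y/w. The rate proxy, the approximation of truncation points by coarser uniform steps, the Lagrange multiplier value, and all function names are hypothetical choices made only to keep the example self-contained; a real JPEG2000-style coder would use the actual coded length of each truncation point.

```python
import numpy as np

def masked_distortion(y, y_hat, w):
    """Squared error in the locally normalized domain: sum(((y - y_hat) / w) ** 2)."""
    return float(np.sum(((y - y_hat) / w) ** 2))

def rate_proxy(q):
    """Hypothetical rate proxy: total number of magnitude bits in the indices."""
    return int(sum(int(abs(v)).bit_length() for v in q.ravel()))

def truncation_candidates(y, w, step, depths):
    """(rate, weighted distortion) for each depth p; truncation is approximated
    here by uniform quantization with step * 2**p (an assumption)."""
    points = []
    for p in depths:
        s = step * 2 ** p
        q = np.round(y / s).astype(int)
        points.append((rate_proxy(q), masked_distortion(y, q * s, w)))
    return points

def pick_point(points, lam):
    """Lagrangian selection for one code-block: minimize D + lam * R."""
    return min(points, key=lambda rd: rd[1] + lam * rd[0])

rng = np.random.default_rng(0)
y = rng.normal(scale=20.0, size=(8, 8))   # same coefficients in both code-blocks
w_smooth = np.ones_like(y)                # weak neighborhood masking
w_busy = np.full_like(y, 3.0)             # strong neighborhood masking
depths = range(6)

smooth = pick_point(truncation_candidates(y, w_smooth, 1.0, depths), lam=5.0)
busy = pick_point(truncation_candidates(y, w_busy, 1.0, depths), lam=5.0)
```

A uniform weight of 3 divides every distortion by 9, so the busy block behaves as if it faced a larger Lagrange multiplier, and its selected truncation point can only have equal or lower rate than the smooth block's. Because the decision is made per code-block, the adjustment is only as fine as the code-block size, which is the coarseness the text points out.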
For a more complete understanding of the present invention and for further advantages thereof, reference is now made to the following Detailed Description taken in conjunction with the accompanying Drawings, in which:

FIG. 1 shows one embodiment of an encoder and decoder process in accordance with the invention.
FIG. 2 shows a causal neighborhood in accordance with the invention.
FIG. 3 shows an alternate embodiment of an encoder and decoder process in accordance with the invention.

The masking property of human vision primarily occurs within spatial frequency channels. The approach described in U.S. patent application Ser. No. 09/218,937 exploits self-contrast masking by applying a nonlinear transducer function to the coefficients prior to uniform quantization. This essentially protects low-amplitude coefficients, while the distortion introduced by quantizing high-amplitude coefficients more coarsely is well masked by those coefficients themselves. However, this approach has several problems in wavelet- or DCT-based compression systems. The first problem results from the assumption that the wavelet/DCT band structure and filters are a good match to the visual system's underlying channels, which is generally not true. Although the wavelet structure models the visual system much better than the DCT, it still has a problem with the diagonal band because of its Cartesian separable construction. In the visual system, frequencies at a 45 degree orientation have very little masking effect on those at −45 degrees, but the diagonal band has no way of distinguishing the two. This gives rise to artifacts perpendicular to diagonal edges. The second problem also relates to diagonal edges. The horizontal (H) and vertical (V) bands encroach on the diagonal signals at multiples of the Nyquist/2 … To overcome the over-masking at diagonal edges, other properties of the human visual system (HVS) must be taken into account.
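The diagonal-band ambiguity can be checked numerically. The sketch below uses a one-level separable Haar transform purely for brevity (the description does not prescribe a filter); the function name and test frequency are illustrative assumptions. It builds a diagonal grating and its mirror image, whose stripes run along the other diagonal, and compares the energy landing in the HH (diagonal) subband. Mirroring an even-width image only permutes the HH coefficients and flips their sign, so the two energies agree even though the patterns are oriented 90 degrees apart.

```python
import numpy as np

def haar_hh(img):
    """HH (diagonal) subband of a one-level separable Haar transform.

    For each 2x2 block [[a, b], [c, d]] the coefficient is (a - b - c + d) / 2.
    """
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    return (a - b - c + d) / 2.0

n = 64                                   # even size, so mirroring maps 2x2 blocks onto 2x2 blocks
yy, xx = np.mgrid[0:n, 0:n]
omega = 2 * np.pi / 5                    # arbitrary test frequency
plus45 = np.cos(omega * (xx + yy))       # stripes along one diagonal
minus45 = plus45[:, ::-1]                # left-right mirror: stripes along the other diagonal

e_plus = float(np.sum(haar_hh(plus45) ** 2))
e_minus = float(np.sum(haar_hh(minus45) ** 2))
```

The same behavior is expected for other real-valued separable filters, since their magnitude response is unchanged when one frequency coordinate changes sign.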
One solution is to exploit the masking capability of complex regions, thereby allocating more bits to smooth regions or regions with simple edge structures. More specifically, a masking weighting factor can be derived for each coefficient. This factor may be derived as a function of the amplitudes of neighboring coefficients, as suggested by Taubman. An advantage of this strategy is its ability to distinguish between large-amplitude coefficients that lie in a region of simple edge structure and those in a complex region. This ensures good visual quality for simple edges on a smooth background, which is often critical to the overall perceived visual quality, especially for wavelet or DCT compression. The present invention therefore exploits both the self-contrast masking and the neighborhood masking effects of the HVS to maximize the perceived quality of the compressed images.

For purposes of discussion only, a fixed uniform quantizer or a fixed deadzone quantizer is assumed in the compression system. This is believed to be the most convenient way to achieve quality scalability of the compressed bitstream; however, the invention is not limited to these types of quantizers. With this assumption, the only way to account for the masking effect is to modify the original wavelet or DCT coefficients according to the HVS model prior to uniform quantization. Similarly, the discussion below assumes wavelet-based compression only to facilitate understanding; the invention can be applied to many transform-based coding systems, including DCT, wavelet, steerable pyramid, and cortex-transform based systems, among others.

FIG. 1 shows one embodiment of an encoder/decoder process in accordance with the invention.
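A fixed deadzone scalar quantizer of the kind assumed above can be written in a few lines. The sketch uses the common form q = sign(x)·⌊|x|/Δ⌋, which is the form used in JPEG 2000 Part 1, with reconstruction at an offset r inside each nonzero bin; the value r = 0.5 and the function names are assumptions. The last line illustrates why a fixed quantizer suits quality scalability: dropping one magnitude bit plane of the indices gives exactly the indices of the same quantizer with twice the step.

```python
import numpy as np

def deadzone_quantize(x, step):
    """q = sign(x) * floor(|x| / step); the zero bin spans (-step, step)."""
    return (np.sign(x) * np.floor(np.abs(x) / step)).astype(int)

def deadzone_dequantize(q, step, r=0.5):
    """Reconstruct at offset r inside each nonzero bin (r = 0.5 is the midpoint)."""
    return np.sign(q) * (np.abs(q) + r) * step

x = np.array([-3.7, -0.9, 0.2, 1.0, 2.5])
q = deadzone_quantize(x, step=1.0)
x_rec = deadzone_dequantize(q, step=1.0)

# Dropping the least-significant magnitude bit plane == quantizing with step 2.0.
coarse = np.sign(q) * (np.abs(q) >> 1)
```

Since the quantizer never changes, any visual masking has to be expressed by modifying the coefficients before they reach it, as the text states.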
Initial or original image information In step This step assumes that the signal with which each coefficient is associated lies on a common flat background. Under this assumption, {y To further exploit the neighborhood masking effect, the second step normalizes y where x̂ denotes the quantized version of x. The neighboring coefficients could be in the same subband, or they could come from different frequency subbands around the same spatial location. As mentioned previously, the second part of step For ease of discussion, the first step will be referred to as self-masking and the resulting values of y It must be noted that the first part of step As will be discussed in more detail later, FIG. 1 shows the implementation of one embodiment of the invention, wherein this step is performed prior to the uniform quantization step Continuing the discussion of the approach in FIG. 1, the process moves to step To make sure the inverse process is feasible, it is necessary to discuss the interrelation between the encoder and decoder. The decoder must perform the reverse process of the encoder as shown in FIG. At the encoder, the quantized versions of the neighboring coefficients, as available at the decoder, will be used. The neighborhood has to be causal in the sense that each coefficient x
where z is the locally normalized, self-masking-compensated coefficient, which is then subject to uniform quantization. In this transform, |φ| denotes the size of the causal neighborhood and a is the normalization factor. The causal neighborhood contains the coefficients in the same band that lie within an N×N window centered at the current coefficient and appear earlier than it in raster scan order. The causal neighborhood excludes the current coefficient itself so that the inverse process has an explicit solution; otherwise the weight would depend on the very value the decoder is trying to recover. α is a value between 0 and 1, typically 0.7 or 0.8. β is a positive value that, together with N and a, controls the degree of neighborhood masking. An example of the neighborhood is shown in FIG. β and N play important roles in differentiating coefficients around simple edges from those in complex areas: N controls the degree of averaging, and β controls the influence of the amplitude of each coefficient. β is preferably chosen to be less than 1; a typical value is 0.2. This protects coefficients around simple sharp edges, which typically have high values, because a small β suppresses the contribution of large coefficients around sharp edges to the masking factor. Quantized neighboring coefficients are used at the encoder to ensure that the encoder and decoder perform exactly the same operation to calculate w
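The normalization equation itself is missing from this copy. One form consistent with the definitions above and with claim 1 (an average, over a causal neighborhood, of neighboring absolute values raised to a power) is

$$z_i = \frac{\operatorname{sign}(x_i)\,|x_i|^{\alpha}}{1 + a\,\frac{1}{|\phi_i|}\sum_{k\in\phi_i} |\hat{x}_k|^{\beta}}$$

where the leading 1 in the denominator is an assumption that keeps the weight at least 1 and defined when the neighborhood is empty. The sketch below implements that assumed form for one subband: the encoder walks the band in raster order, computes each weight from already-reconstructed causal neighbors, and keeps its own reconstruction x̂ so that a decoder repeating the same operations derives identical weights without side information. α = 0.7 and β = 0.2 follow the typical values above; N = 5, a = 1.0, the rounding quantizer, and the function names are hypothetical.

```python
import numpy as np

def causal_offsets(N):
    """Offsets in an N x N window (N odd) that precede the centre in raster order."""
    h = N // 2
    return [(dr, dc) for dr in range(-h, 1) for dc in range(-h, h + 1)
            if dr < 0 or dc < 0]

def masking_weight(x_hat, r, c, offsets, a, beta):
    """Assumed form w = 1 + a * mean(|x_hat_k| ** beta) over in-bounds causal neighbours."""
    rows, cols = x_hat.shape
    vals = [abs(x_hat[r + dr, c + dc]) ** beta for dr, dc in offsets
            if 0 <= r + dr < rows and 0 <= c + dc < cols]
    return 1.0 + a * sum(vals) / len(vals) if vals else 1.0

def _reconstruct(qv, step, w, alpha):
    """Dequantize z, undo the local normalization (times w), then undo the power law."""
    y_hat = qv * step * w
    return np.sign(y_hat) * abs(y_hat) ** (1.0 / alpha)

def encode_band(x, step, alpha=0.7, beta=0.2, a=1.0, N=5):
    """Raster-scan encoder; returns indices and the encoder's own reconstruction."""
    offsets = causal_offsets(N)
    q = np.zeros(x.shape, dtype=int)
    x_hat = np.zeros(x.shape)
    for r in range(x.shape[0]):
        for c in range(x.shape[1]):
            w = masking_weight(x_hat, r, c, offsets, a, beta)
            z = np.sign(x[r, c]) * abs(x[r, c]) ** alpha / w
            q[r, c] = int(np.round(z / step))
            x_hat[r, c] = _reconstruct(q[r, c], step, w, alpha)
    return q, x_hat

def decode_band(q, step, alpha=0.7, beta=0.2, a=1.0, N=5):
    """Decoder: recomputes each weight from already-decoded neighbours."""
    offsets = causal_offsets(N)
    x_hat = np.zeros(q.shape)
    for r in range(q.shape[0]):
        for c in range(q.shape[1]):
            w = masking_weight(x_hat, r, c, offsets, a, beta)
            x_hat[r, c] = _reconstruct(q[r, c], step, w, alpha)
    return x_hat

rng = np.random.default_rng(1)
x = rng.normal(scale=30.0, size=(12, 12))   # stand-in for one wavelet subband
q, x_hat_enc = encode_band(x, step=1.0)
x_hat_dec = decode_band(q, step=1.0)
```

Because the encoder builds every weight from its own reconstruction rather than from the original coefficients, the decoder's output matches the encoder's reconstruction exactly.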
Unfortunately, for embedded coding the encoder cannot base the nonlinear transformation on the exact decompressed/quantized version of the coefficient x Nevertheless, in embedded coding the discrepancy of w As long as n is large enough with respect to the available bit rate at the decoder, the decoder will obtain the same quantized version of the neighboring coefficients. The compromise results in a coarser granularity of w As mentioned previously, an alternate embodiment implements the second part of visual masking differently. This embodiment is shown in FIG. For example, if the same power function is applied where y
It should be noted that the Taubman reference implements
In the above implementation, the second step is used to adjust the truncation points of each code-block, which is a coarse adjustment.

Experiments have shown that applying the invention significantly improves image quality compared with the self-contrast masking approach of U.S. patent application Ser. No. 09/218,937 and the neighborhood masking approach of the Taubman reference. It preserves low-amplitude texture quality while maintaining good quality at sharp edges.

The invention can be combined with other properties of the HVS, including local luminance sensitivity and the contrast sensitivity function (CSF). The original coefficient x

The concept of neighborhood masking through a measure of local frequency activity can also be extended to DCT-based coding systems such as JPEG, MPEG and H.26x. The neighboring coefficients in these systems are the coefficients of the same band in neighboring blocks. In particular, the DCT coefficients of 8×8 blocks can be reorganized into a structure similar to a wavelet subband structure: each DCT coefficient is regarded as a local frequency component, and the coefficients from the same location in each block are grouped together to form a subband. This reorganization allows scalable, or bit-plane embedded, coding, and the proposed visual masking scheme can be applied to the reorganized subbands.

Thus, although a particular embodiment of a method and structure for coding and decoding image information using the HVS model has been described, it is not intended that such specific references be considered as limitations upon the scope of this invention except insofar as set forth in the following claims.
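The regrouping of block-DCT coefficients into subbands described above can be sketched with a reshape and a transpose. The sketch assumes the 8×8 DCT of every block has already been computed and stored in place (the transform itself is omitted), and that the image dimensions are multiples of the block size; the function names are hypothetical.

```python
import numpy as np

def blocks_to_subbands(coeffs, B=8):
    """Group coefficient (u, v) of every B x B block into one subband.

    coeffs: (H, W) array holding each block's DCT in place; H and W are
    assumed to be multiples of B. Returns shape (B, B, H // B, W // B), where
    out[u, v] is the subband of frequency (u, v) laid out in block order.
    """
    H, W = coeffs.shape
    # [block_row, u, block_col, v] -> [u, v, block_row, block_col]
    return coeffs.reshape(H // B, B, W // B, B).transpose(1, 3, 0, 2)

def subbands_to_blocks(sub):
    """Inverse regrouping back to the in-place block-DCT layout."""
    B, _, n_rows, n_cols = sub.shape
    return sub.transpose(2, 0, 3, 1).reshape(n_rows * B, n_cols * B)

# Stand-in for block-DCT output (the DCT itself is not computed here).
coeffs = np.arange(16 * 24, dtype=float).reshape(16, 24)
sub = blocks_to_subbands(coeffs)
```

Within each sub[u, v], horizontally and vertically adjacent entries are the same-frequency coefficients of adjacent blocks, so a causal-neighborhood weighting of the kind described above can be run on each sub[u, v] as if it were a wavelet subband.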