METHOD FOR EFFICIENT RATE CONTROL
CROSS-REFERENCE TO RELATED APPLICATIONS
The present application is related to U.S. application Ser. No. 09/223,073 (D.78922/PCW), filed December 1998, by Paul W. Jones, et al., titled "METHOD AND APPARATUS FOR VISUALLY OPTIMIZED COMPRESSION PARAMETERS"; and U.S. application Ser. No. 09/222,190 (D.78923/PCW), filed December 1998, by Chris W. Honsinger, et al., titled "METHOD AND APPARATUS FOR VISUALLY OPTIMIZED RATE CONTROL".
FIELD OF THE INVENTION
The current invention relates generally to the field of image processing, and, in particular, to a method of controlling the rate and quality of compressed images.
BACKGROUND OF THE INVENTION
The advent of the Internet coupled with dramatic improvements in software and printing technologies is causing a migration from local printing presses to the semiautomated high-speed and high-volume shops that can be located virtually anywhere in the world. Printer manufacturers, in order to succeed in this new paradigm, are required to make faster and cheaper printers while maintaining the high levels of image quality required by various applications. Consequently, there is an ever increasing need for image compression, not just for storage considerations, but also for enabling the digital circuitry to keep up with the extremely high data rates dictated by the printing process.
These factors impose a strict requirement on the robustness and speed of the compression technology used in these systems. Many printing applications have a strict requirement on the minimum compressed data rate in order to enable the high-speed printing process, while at the same time they also require very high quality decompressed images. Unfortunately, these two requirements are often in direct conflict with one another. In fact, a key factor in the design of many high-speed printing systems is the development of a compression strategy that reduces the amount of compressed data to or below a desired compressed file size (or equivalently, the desired data rate), while still achieving a minimum (or better) level of quality. The current invention deals with the development of such a compression strategy.
A popular technique for the compression of continuous tone images is the JPEG international compression standard. (Digital compression and coding of continuous-tone still images—Part 1: Requirements and Guidelines (JPEG), ISO/IEC International Standard 10918-1, ITU-T Rec. T.81, 1993, or W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, New York, 1993). Briefly, when using JPEG compression, the digital image is formatted into 8x8 blocks of pixel values, and a linear decorrelating transformation known as the discrete cosine transform (DCT) is applied to each block to generate 8x8 blocks of DCT coefficients. The DCT coefficients are then normalized and quantized using a frequency-dependent uniform scalar quantizer.
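The two steps described above can be illustrated with a minimal sketch. This is not a JPEG codec; it is a naive textbook 2-D DCT-II with orthonormal scaling, followed by uniform scalar quantization with per-coefficient step sizes, written here only to make the transform-then-quantize structure concrete:

```python
import math

def dct2(block):
    """Naive 8x8 2-D DCT-II (orthonormal normalization), the decorrelating
    transform used conceptually in JPEG. O(N^4); for illustration only."""
    N = 8
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            cu = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
            cv = math.sqrt(1.0 / N) if v == 0 else math.sqrt(2.0 / N)
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = cu * cv * s
    return out

def quantize(coeffs, qtable):
    """Frequency-dependent uniform scalar quantization: each coefficient is
    divided by its own step size from the Q-table and rounded."""
    return [[round(coeffs[u][v] / qtable[u][v]) for v in range(8)]
            for u in range(8)]
```

For a flat block (all pixel values equal), only the DC coefficient is nonzero, so nearly every quantized value is zero, which is exactly what makes the subsequent runlength coding effective.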
In the JPEG standard, the user can specify a different quantizer step size for each coefficient. This allows the user to control the resulting distortion due to quantization in each coefficient. The quantizer step sizes may be designed based on the relative perceptual importance of the various DCT coefficients or according to other criteria depending on the
application. The 64 quantizer step sizes corresponding to the 64 DCT coefficients in each 8x8 block are specified by the elements of an 8x8 user-defined array, called the quantization table or "Q-table". The Q-table is the main component in the JPEG system for controlling the compressed file size and the resulting decompressed image quality.
Each block of the quantized transform coefficients is ordered into a one-dimensional vector using a pre-defined zig-zag scan that rearranges the quantized coefficients in order of roughly decreasing energy. This usually results in long runs of zero quantized values that can be efficiently encoded by runlength coding. Each nonzero quantized value and the number of zero values preceding it are encoded as a runlength/amplitude pair using a minimum redundancy coding scheme such as Huffman coding. The binary coded transform coefficients, along with an image header containing information such as the Q-table specification, the Huffman table specification, and other image-related data, are either stored in a memory device or transmitted over a channel.
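The zig-zag ordering and the run/amplitude pairing described above can be sketched as follows. This is a simplified illustration, not the full JPEG entropy-coding stage (the Huffman coding of the pairs and the DC prediction are omitted); the `(0, 0)` end-of-block convention mirrors the standard's EOB marker:

```python
def zigzag_order(n=8):
    """(row, col) index pairs in JPEG zig-zag scan order: traverse
    anti-diagonals, alternating direction on odd/even diagonals."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def runlength_pairs(quantized, order):
    """Encode the AC coefficients as (zero-run, amplitude) pairs,
    terminated by a (0, 0) end-of-block marker."""
    acs = [quantized[r][c] for r, c in order][1:]  # skip the DC term
    pairs, run = [], 0
    for v in acs:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append((0, 0))  # end-of-block
    return pairs
```

A mostly-zero quantized block collapses to just a few pairs, which is the source of JPEG's coding efficiency on smooth image regions.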
As mentioned previously, the ability to trade off image quality for compressed file size in JPEG is accomplished by manipulating the elements of the Q-table. In general, each of the 64 components of the Q-table can be manipulated independently of one another to achieve the desired image quality and file size (or equivalently, the desired compression ratio or bit rate). However, in most applications, it is customary to simply scale all of the elements of a basic Q-table with a single constant. For example, multiplying all elements of a given Q-table by a scale factor larger than unity would result in a coarser quantization for each coefficient and hence a lower image quality, but at the same time, a smaller file size is achieved. On the other hand, multiplication by a scale smaller than unity would result in a finer quantization, higher image quality, and a larger file size. This scaling strategy for trading image quality for compressed file size is advocated by many developers of JPEG compression products, including the Independent JPEG Group (IJG), whose free software is probably the most widely used tool for JPEG compression. A current version of the software is available at the time of this writing from ftp://ftp.uu.net/graphics/jpeg/. The IJG implementation scales a basic Q-table by using a parameter known as the "IJG quality setting", which converts a value between 1 and 100 to a multiplicative scaling factor.
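The quality-to-scale mapping can be sketched as below. The piecewise formula follows the convention used in the IJG reference software (a hyperbolic branch below quality 50 and a linear branch above it), but readers should treat the exact constants as an assumption and consult the IJG sources for the authoritative version:

```python
def ijg_scale_factor(quality):
    """Map an IJG-style quality setting (1-100) to a percentage scaling
    factor: quality 50 -> 100% (basic table unchanged), lower quality ->
    larger steps, higher quality -> smaller steps."""
    quality = max(1, min(100, quality))
    return 5000 // quality if quality < 50 else 200 - 2 * quality

def scale_qtable(base, quality):
    """Scale every element of a basic Q-table by the single factor derived
    from the quality setting, clamping entries to the valid range 1-255."""
    s = ijg_scale_factor(quality)
    return [[min(255, max(1, (q * s + 50) // 100)) for q in row]
            for row in base]
```

Note the clamping: at very low qualities the scaled steps saturate at 255, so the file size stops shrinking even as the quality setting decreases further.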
In many applications, an image needs to be JPEG-compressed to a pre-specified file size. This problem is sometimes referred to as "rate control". In the prior art, JPEG rate control is typically accomplished by compressing the image multiple times until the desired file size is achieved. First, the image is compressed using a basic Q-table, e.g., one that has been designed for that application or one of the example tables given in the JPEG standard specifications. If the target file size is not achieved, the components of the Q-table are appropriately scaled based on a predefined strategy, and the image is compressed again with the scaled Q-table. This process is repeated until the compressed file size is within an acceptable range of the target file size. One strategy that can be used to determine the Q-table scaling at each iteration is described in the prior art by Tzou (IEEE Transactions on Circuits And Systems For Video Technology, Vol. 1, No. 2, June 1991, pages 184-196). Although this and other similar strategies often provide rapid convergence (usually three or fewer iterations), they usually require the pre-calculation of an operational rate-distortion (R-D) curve for the class of imagery used in that particular application. This curve is constructed by: (i) compressing
many representative images using different scaled versions of a basic Q-table; (ii) averaging the resulting file sizes for each Q-table scale to obtain the so-called "control points" on the curve; and (iii) using piecewise-polynomials to approximate all points between the control points by interpolation to obtain a plot of the average file size against the scaling parameter.
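The iterative prior-art loop described above can be sketched as follows. The `compress` callable stands in for a real JPEG codec returning a byte count, and the proportional scale update is a deliberately simple stand-in for the Tzou-style R-D-curve-guided update, not a reproduction of it:

```python
def rate_control(image, base_qtable, target_size, compress,
                 tol=0.02, max_iters=5):
    """Iterative rate control: rescale the Q-table and recompress until
    the file size is within `tol` of `target_size`. `compress(image,
    qtable)` is an assumed codec interface returning the byte count."""
    scale = 1.0
    for _ in range(max_iters):
        qtable = [[max(1, round(q * scale)) for q in row]
                  for row in base_qtable]
        size = compress(image, qtable)
        if abs(size - target_size) <= tol * target_size:
            break
        # Larger steps -> coarser quantization -> smaller files, so grow
        # the scale when the file is too big (simple proportional update).
        scale *= size / target_size
    return qtable, size
```

Each iteration costs a full compression pass over the image, which is precisely the expense the sampled-block method of the present invention avoids.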
A major drawback of such rate control strategies as practiced in the prior art is the amount of computation involved in the application of multiple compression cycles. The current invention overcomes this drawback by compressing only a small set of randomly chosen image pixels with multiple Q-tables. The resulting operational R-D curve is then used to select an appropriate Q-table that achieves the target file size when compressing the entire image. This method is extremely fast, requiring only a fraction of the system resources used in the prior art, while also being accurate in predicting the resulting bit rate, thus making it ideal for high-speed printers.
SUMMARY OF THE INVENTION
The present invention is aimed at providing an extremely fast method for calculating the rate-distortion (R-D) characteristics of an image by using pseudo-randomly sampled blocks.
According to a preferred embodiment of this invention, a plurality of Q-tables, corresponding to different levels of visual quality, are generated, and each Q-table is indexed with a quality parameter. For each original image to be compressed, a small set of image pixel blocks is randomly chosen and compressed with the plurality of Q-tables. The resulting set of control points, which denote compressed file size as a function of the quality parameter, are referred to as the rate-distortion (R-D) characteristics of the image and are stored in a lookup table. This lookup table, or alternatively an interpolated version of it using piecewise polynomials, is then used to make rate-distortion tradeoffs in the system.
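The construction of the lookup table from a pseudo-random sample of blocks can be sketched as follows. The names here (`compress_block`, `sample_frac`) are illustrative assumptions, not interfaces defined by the invention; the key idea is that the sampled total is scaled up by the sampling ratio to predict the full-image file size for each quality parameter:

```python
import random

def build_rd_lut(blocks, qtables, compress_block, sample_frac=0.05, seed=7):
    """Estimate the image's R-D characteristics from a pseudo-random
    sample of its pixel blocks. `qtables` maps quality parameter ->
    Q-table; `compress_block(block, qtable)` is an assumed per-block
    codec returning a compressed size. Returns {quality: predicted
    full-image file size}."""
    rng = random.Random(seed)          # fixed seed: reproducible sampling
    n = max(1, int(len(blocks) * sample_frac))
    sample = rng.sample(blocks, n)
    lut = {}
    for quality, qtable in qtables.items():
        sampled_size = sum(compress_block(b, qtable) for b in sample)
        lut[quality] = sampled_size * len(blocks) / n  # scale to full image
    return lut
```

Because only a small fraction of the blocks is compressed (here 5%), building the entire set of control points costs far less than a single full-image compression pass per Q-table.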
These and other aspects, objects, features and advantages of the present invention will be more clearly understood and appreciated from a review of the following detailed description of the preferred embodiments and appended claims, and by reference to the accompanying drawings.
ADVANTAGES OF THE PRESENT INVENTION
The present invention offers the advantage of providing an extremely fast method for calculating the rate-distortion (R-D) characteristics of an image by using pseudo-randomly sampled blocks.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a system level block diagram of the preferred embodiment of the invention;
FIG. 2 is a schematic of the pseudo-random block selection strategy and the calculation of the operational R-D characteristics;
FIG. 3 is a plot of the predicted bit rate resulting from the JPEG compression of only the randomly selected blocks versus the actual bit rate resulting from the JPEG compression of the entire image with the same Q-table;
FIG. 4 depicts a situation where the target viewing distance is less than the minimum viewing distance needed by the user;
FIG. 5 depicts a situation where the target viewing distance falls between the two levels of quality specified by the user; and,
FIG. 6 depicts a situation where the target viewing distance is greater than the maximum allowed by the user.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
FIG. 1 depicts a system level block diagram of the preferred embodiment of the present invention. We first describe the general function of the various blocks in FIG. 1 and then describe each individual block in more detail. While a high-speed printer is used as the example application in our preferred embodiment, it is understood that the methods described herein can be applied to digital cameras, software applications, satellite transmission systems or any imaging apparatus requiring a quality controlled use of compression.
Referring to FIG. 1, a digital image (10) is sent as input to a pseudo-random block selection module (12). The output of this module is a small set of image pixel blocks that have been pseudo-randomly chosen. The computational complexity of obtaining an operational R-D curve for the digital image (10) is significantly reduced by compressing only the blocks in the output of module (12) instead of the entire image. The digital image (10) is also sent to a frame buffer (8) to allow later access to the image data when compressing the full image.
In a preferred embodiment of this invention, response characteristics of the human visual system (HVS) are used to derive a plurality of Q-tables. In generating the appropriate Q-table values from an HVS model, parameters for the viewing conditions and display rendering are required. These parameters are input to a Q-table generation module (18) that constructs a set of Q-tables (20) based on a wide range of assumed threshold viewing distances. The set of Q-tables (20) and their corresponding quality parameters are stored in memory as a lookup table (LUT), where each Q-table is indexed by its associated quality parameter. These Q-tables, when used in compressing an image, will create a wide range of compressed file sizes. In a preferred embodiment of the present invention, the Daly visual system model described in U.S. Pat. No. 4,780,761 (also refer to "Application of a noise-adaptive contrast sensitivity function to image data compression," Optical Engineering, Volume 19, Number 8, pages 979-987, August 1990) is used in conjunction with a range of threshold viewing distances extending from 0.5 inches to 20.0 inches and spaced at 0.5 inch intervals. The quality parameter used to index the ith Q-table, Qi, is the corresponding threshold viewing distance Di.
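To make the distance-indexed Q-table idea concrete, the sketch below derives step sizes that grow with spatial frequency and viewing distance. This is emphatically not the Daly model referenced above: the exponential sensitivity falloff, the constants, and the frequency conversion are all simplifying assumptions chosen only to show the shape of the mapping from viewing distance to Q-table:

```python
import math

def qtable_for_viewing_distance(distance_in, dpi=300, base_step=4.0):
    """Illustrative only: build an 8x8 Q-table for a given threshold
    viewing distance (inches). Higher DCT frequencies map to more
    cycles/degree at larger distances, where an assumed contrast
    sensitivity falloff permits coarser quantization."""
    table = [[0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            # Radial DCT frequency in cycles/degree for this distance/dpi.
            f = (math.hypot(u, v) / 16.0) * dpi * distance_in * math.pi / 180.0
            sensitivity = math.exp(-0.1 * f)   # assumed falloff shape
            step = round(base_step / max(sensitivity, 1e-4))
            table[u][v] = min(255, max(1, step))
    return table
```

Generating one such table per candidate distance (e.g., 0.5 in to 20.0 in in 0.5 in steps, as in the text) yields the set of Q-tables (20), each indexed by its distance.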
The randomly selected blocks are sent as input to the operational R-D calculator module (14), where they are JPEG-compressed with the set of Q-tables (20). For each Q-table Qi, the compressed sizes of the randomly selected blocks are combined to produce an estimated compressed file size Ri. While file size is used as the example in our preferred embodiment, it is understood that the methods described herein can be applied to bit rate or compression ratio to the same effect. The resulting set of data pairs (quality parameter, file size) constitutes the control points for constructing the operational R-D characteristics of the input image (10). A continuous operational R-D curve can be constructed by using piecewise polynomials to approximate the points between the control points. Alternatively, the discrete set of data pairs can be saved in a lookup table (LUT) and subsequently used to approximate the missing points in the R-D curve. In a preferred embodiment of the present invention, the operational R-D curve is