
HYBRID SOFTWARE/HARDWARE VIDEO
DECODER FOR PERSONAL COMPUTER

BACKGROUND OF THE INVENTION

This invention relates to video decoding on personal computers.

MPEG video has become widely accepted as a standard for video. The original protocol, MPEG1, is in widespread use, and a new, higher-quality standard, MPEG2, is being introduced. Typically, MPEG1 decoding performed on personal computers is done using software, as software decoding is much less expensive than hardware decoding, which requires a dedicated video decoder board. Today's high-speed processors (e.g., 90+ MHz Pentiums) make such software decoders possible. But at 30 frames per second, such decoders are forced to resort to approximating some of the MPEG1 decoding steps (e.g., dequantizing, IDCT, motion compensation), as they cannot otherwise decode quickly enough to keep up with the incoming video. The result is noticeably degraded video quality.

Limited reliance has been placed on the graphics coprocessor chip (sometimes referred to as a graphics accelerator chip) in MPEG video decoding. The graphics coprocessor's role has been to convert the decoded video from YUV to RGB format and to scale the images to a desired size.

MPEG2 decoding will require about 4 times the computing resources required for MPEG1, making it likely that software decoding (with RGB transformation and scaling done by the accelerator chip) will not be feasible. This suggests that it will be necessary to use hardware decoders, e.g., dedicated video boards or chips, to handle MPEG2 decoding.

SUMMARY OF THE INVENTION

The invention provides a software/hardware hybrid decoder that takes advantage of processing capabilities of graphics coprocessors to perform the motion compensation portion of video decoding. The invention should make it possible to decode MPEG1 with full accuracy on today's PCs (e.g., 90-150 MHz Pentiums) and MPEG2 on the next generation PCs (e.g., Pentium MMX or Pentium Pro MMX), without the added cost of a dedicated hardware video decoder. Preferably, motion compensation is performed by bit block transfer, or bit BLT, operations on the graphics coprocessor. The bit BLT operations may be used to add pixels in the reference and error blocks, and to interpolate between reference blocks to provide subpixel resolution for motion vectors.
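The two kinds of bit BLT arithmetic named above (averaging for interpolation, and adding reference pixels to error pixels) can be illustrated in software. The following is only a sketch, not the coprocessor's actual operation: the function names are hypothetical, blocks are modeled as lists of rows, the half-up rounding in the average is an assumption (the text says only that the sums are divided by two), and the 0 to 255 clipping range is one of the ranges the description mentions.

```python
def blt_average(src, dst):
    """Add-and-divide-by-two BLT: each output pixel is the mean of two inputs.

    Rounding half up is an assumption; the source only says "divided by two".
    """
    return [[(s + d + 1) // 2 for s, d in zip(srow, drow)]
            for srow, drow in zip(src, dst)]


def half_pel_x(block):
    """Half-pixel interpolation in X.

    `block` has one more column than the desired output (e.g. 8 rows by
    9 columns). Block A is the left-aligned window and block B is the same
    window shifted one pixel over; the result is their average.
    """
    a = [row[:-1] for row in block]
    b = [row[1:] for row in block]
    return blt_average(a, b)


def blt_add_clip(reference, error, lo=0, hi=255):
    """Straight add of unsigned reference pixels and signed error pixels,
    with the sum clipped to the range [lo, hi]."""
    return [[min(hi, max(lo, r + e)) for r, e in zip(rrow, erow)]
            for rrow, erow in zip(reference, error)]
```

For a B frame block, the same `blt_average` would be applied once more to the forward and backward reference blocks before `blt_add_clip` adds the error block.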

We have found that about 40% of computational resources required for MPEG decoding are consumed in motion compensation. By moving that 40% of the computations to the graphics coprocessor, where the computations can be performed with bit BLT operations that require little increase in chip complexity, the invention achieves greatly increased video decoding capability at relatively little increase in PC cost.

In general, the invention features decoding a series of frames of motion-compensated video data using a personal computer that includes a central processor and a graphics coprocessor, wherein the software executing on the central processor extracts motion vectors from the video data and decompresses the video data, and the graphics coprocessor carries out the motion compensation.

In preferred embodiments, the software may also transfer frames of video data to the graphics coprocessor, which uses the motion vectors to retrieve motion compensation reference blocks from the frames of video data. The decompression performed by the software may include Huffman decoding and RLE decoding. The software may perform the inverse DCT transform of the video data.

Other features of the invention will be apparent from the following description of preferred embodiments, and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of the video decoding process of a preferred embodiment of the invention.

FIG. 2 is a diagram showing the 8x8 pixel blocks making up each macro block of an MPEG encoded image.

FIG. 3 is a diagram illustrating a half-pixel interpolation in the X direction in computing a reference block for motion compensation.

FIG. 4 is a diagram illustrating the processing required to average a backward and forward reference block and to add an error block to a reference block.

DESCRIPTION OF THE PREFERRED
EMBODIMENTS

The decoding process of the preferred embodiment is shown in FIG. 1. The incoming compressed MPEG video stream is processed by software 10 running on the PC's main processor (e.g., a 150 MHz Pentium for MPEG1). The video packets are parsed, Huffman decoded (12), run length (RLE) decoded (14), and dequantized (16), to produce decompressed macro blocks (dequantizing also includes "dezigzagging", to remove the diagonal pixel ordering used by MPEG to improve RLE compression). The decompressed blocks are then inverse transformed (IDCT 22), which transforms the spatial frequency coefficients to pixels, and stored in the graphics memory associated with the graphics coprocessor.
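The run-length and "dezigzagging" steps in this software stage can be sketched as follows. This is a simplified illustration under stated assumptions, not the decoder's actual code: the (run, level) pair representation, the function names, and the per-position `quant` multiplier are hypothetical, and real MPEG dequantization has additional scaling and rounding rules that are omitted here.

```python
N = 8  # block dimension

# Zigzag scan order: positions are visited one anti-diagonal at a time,
# alternating direction on each diagonal.
ZIGZAG = sorted(
    ((r, c) for r in range(N) for c in range(N)),
    key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
)


def rle_expand(pairs, length=N * N):
    """Expand (run, level) pairs: `run` zeros followed by one `level` value.

    Positions after the last pair are filled with zeros.
    """
    out = []
    for run, level in pairs:
        out.extend([0] * run)
        out.append(level)
    if len(out) > length:
        raise ValueError("run-length data overflows the block")
    out.extend([0] * (length - len(out)))
    return out


def dezigzag_dequantize(scan, quant):
    """Place zigzag-ordered values back into an 8x8 block, scaling each one
    by the (hypothetical) quantizer entry for its position."""
    block = [[0] * N for _ in range(N)]
    for value, (r, c) in zip(scan, ZIGZAG):
        block[r][c] = value * quant[r][c]
    return block
```

The resulting 8x8 coefficient block is what the inverse transform (IDCT 22) would then convert to pixels.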

Each macro block (FIG. 2) consists of four luminance blocks Y1, Y2, Y3, Y4, and two chrominance blocks U, V. Each block is an 8x8 array of DCT (discrete cosine transform) coefficients representing (in frequency domain form) either the luminance/chrominance at that location in the current frame (intra (I) frames) or the difference (or error) between the luminance/chrominance at that location in the current frame and a reference location in a reference frame(s). There are two types of difference frames: predicted (P) frames, in which the coefficients represent differences between blocks in the current frame and reference blocks in a prior frame; and bidirectional (B) frames, in which the coefficients represent differences between blocks in the current frame and reference blocks in either a future frame, a prior frame, or both a future and a prior frame. For both P and B frames, there are associated motion vectors that identify the reference blocks in the prior and future frames (frames are sent out of order, so that "future" reference frames arrive before the B frames that reference them). The motion vectors are processed (18) to compute the addresses of the reference blocks, and the addresses are supplied to the graphics coprocessor.
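One way the motion vector processing (18) could turn a vector into a reference block address is sketched below. The memory layout is an assumption made for illustration: one byte per pixel, rows stored contiguously with a fixed `stride`, and motion vector components expressed in half-pixel units. The function and parameter names are hypothetical.

```python
def reference_block_address(base, stride, x, y, mv_x, mv_y, size=8):
    """Return (address, width, height) of the pixels to fetch for one block.

    base    -- address of the reference frame's top-left pixel (assumed layout)
    stride  -- bytes per row of the reference frame
    x, y    -- top-left pixel position of the current block
    mv_x/y  -- motion vector components in half-pixel units
    """
    # Split each component into a whole-pixel offset and a half-pixel flag.
    # Python's >> floors toward negative infinity, so -3 becomes -2 plus a half.
    int_x, half_x = mv_x >> 1, mv_x & 1
    int_y, half_y = mv_y >> 1, mv_y & 1

    address = base + (y + int_y) * stride + (x + int_x)

    # A half-pixel offset needs one extra column and/or row for interpolation.
    return address, size + half_x, size + half_y
```

With `size=8`, a vector with a half-pixel component in X only asks for a block 9 pixels wide and 8 high, and one with half-pixel components in both directions asks for a 9x9 block.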

The graphics coprocessor 30 uses the reference blocks (which it reads from the reference frames using the supplied block addresses) to motion compensate (32) the decompressed, inverse transformed blocks (for P and B frames). The coprocessor also performs the linear transformation necessary to convert the YUV blocks to RGB form (34), scales the resulting frames as prescribed by the user (36), and provides an output for the PC's display monitor 38.
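The linear YUV to RGB transformation (34) can be illustrated for a single pixel. The text does not give the coefficients, so the values below are an assumption: the commonly used full-range BT.601 (JFIF-style) matrix, with U and V centered on 128. The function names are hypothetical.

```python
def clamp8(value):
    """Round to the nearest integer and clip to the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))


def yuv_to_rgb(y, u, v):
    """Convert one 8-bit YUV pixel to RGB.

    Coefficients are the full-range BT.601 values (an assumption; the
    source does not specify which matrix the coprocessor applies).
    """
    d = u - 128  # blue-difference chroma, centered on zero
    e = v - 128  # red-difference chroma, centered on zero
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return clamp8(r), clamp8(g), clamp8(b)
```

Because the chroma terms vanish when U and V are both 128, any pixel with neutral chroma maps to a gray level equal to its luminance.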

To increase the accuracy of motion estimation, MPEG motion vectors have a one-half pixel resolution, which is implemented by using as the reference block an interpolated block assumed to lie one-half pixel in either, or both, the X or Y directions from an actual block. Performing that interpolation requires that either an 8x9, 9x8, or 9x9 block be processed to produce the interpolated reference block. Each pixel in the interpolated block is the average of either two or four pixels. The interpolation is performed (40) using a series of bit block transfer (bit BLT) operations in which pixels from one 8x8 block are added to pixels of the 8x8 block one pixel over, and the sums are divided by two. Alternatively, if available in the graphics coprocessor, the interpolation can be performed using a scaling bit BLT, by supplying the scaling bit BLT with either the 9x8, 8x9, or 9x9 input block, and requesting an 8x8 output block. FIG. 3 illustrates the operation for the simple case in which the reference block R is the average of two 8x8 blocks A, B, offset from one another by one pixel in the X direction.

In the case of B frames, the reference blocks from the prior and future frame are averaged (42), using the same bit BLT operation (add and divide by two) used for interpolation.

These interpolation and averaging operations provide the reference blocks that are added (44) to the error blocks produced by the inverse transform operation (IDCT). This addition is also performed using a bit BLT operation. This particular bit BLT operation is not one conventionally found in graphics coprocessor chips, but it could be added at little increase in chip complexity. Pixels of the source block are added to pixels of the destination block and the resulting sums (after appropriate clipping) are written over the corresponding pixels of the destination block. The pixels representing the error terms are signed numbers, whereas the pixels representing the reference block are unsigned numbers. The bit BLT operation must, therefore, add a signed number to an unsigned number, and provide appropriate clipping of the result (e.g., clipping if it exceeds an acceptable range, which could be the full 0 to 255 range provided by 8 bits, or a smaller range such as 16 to 240, to allow values outside those limits to be used for other purposes).

FIG. 4 shows the bit BLT operations required to handle the two reference blocks used in motion compensating a B frame block. The add-and-divide-by-two operation could be implemented in at least two ways. The graphics coprocessor could be designed to read both blocks and perform the half-pixel interpolation operation simultaneously. Alternatively, it could read one reference block, write it to a temporary location, and then read the second reference block, add it to the first block and divide by two, and write the result to the temporary location.

FIG. 4 also shows the bit BLT operations required to add the reference block (the averaged blocks in the case of B frames) to the error block. The reference data is the source block, and the error block the destination block. The addition of the source and destination blocks must be a straight add (no division by two), and since there is no divide by two, and since one value is signed (those from the error block), the result must be clipped as noted elsewhere.

Preferably the bit BLT operations are performed in one or a small number of batch operations, in which a list of the bit BLT operations are executed. Such batch processing can perform the bit BLT operations more efficiently than is possible if isolated bit BLT operations are performed. Batch processing is made possible by providing sufficient memory in which to store the lists of bit BLT operations needing execution.

Other embodiments are within the following claims. For example, although it could require an appreciable increase in chip complexity (because of multiplication steps required), and thus not achieve as dramatic gains in price/performance as the preferred embodiment, the IDCT operation could also be moved to the graphics coprocessor. Such a configuration could be quite practical, and of significant value, if the graphics coprocessor provided with the personal computer had built-in fast multiply capability, such as may be the case in three-dimensional graphics coprocessors.

The block size referred to throughout the discussion of the preferred embodiment is 8x8, but other sizes could be used (e.g., a macro block, 16x16, could be processed at once). For frames in which many adjoining blocks receive the same motion compensation, a large number of blocks (even approach an entire frame in size) could efficiently be processed in a single bit BLT operation.

If the invention is applied to MPEG2, it would probably be preferable to use the next generation processor (e.g., Pentium MMX or Pentium Pro MMX).

What is claimed is:

1. A method of decoding a series of frames of motion compensated video data using a personal computer that includes a central processor and a graphics coprocessor, the method comprising the steps of:

executing a stored program on the central processor to carry out at least the following steps: extracting motion vectors from the video data, and decompressing the video data, and

operating the graphics coprocessor to carry out at least the following step: motion compensating the video data based on the motion vectors using bit BLT operations.

2. The method of claim 1 wherein the bit BLT operations comprise adding the pixels of a source block to the pixels of a destination block to create sum pixels, and replacing the pixels of the destination block with the sum pixels.

3. The method of claim 2 wherein one of the source and destination pixels is an unsigned number and the other is a signed number.

4. The method of claim 3 wherein the bit BLT operations comprise adding the pixels of a source block to the pixels of a destination block to create sum pixels, dividing the sum pixels by a constant to create interpolated pixels, and replacing the pixels of the destination block with the interpolated pixels.

5. The method of claim 1 wherein the step of decompressing the video data by the central processor includes decompressing the video data using RLE decoding.

6. A method of decoding a series of frames of motion-compensated video data using a personal computer that includes a central processor and a graphics coprocessor, the method comprising the steps of:

executing a stored program by the central processor to carry out at least the following steps: extracting motion vectors from the video data, and decompressing the video data,

and operating the graphics coprocessor to carry out at least the following step: motion compensating the video data using bit BLT operations, including interpolating to determine an interpolated reference block, and wherein the interpolating is performed using bit BLT operations.

7. A method of decoding a series of frames of motion-compensated video data using a personal computer that includes a central processor and a graphics coprocessor, the method comprising the steps of:

executing a stored program on the central processor to carry out at least the following steps:

extracting motion vectors from the video data, and

decompressing the video data, and

operating the graphics coprocessor to carry out at least the following steps:

motion compensating the video data based on motion vectors extracted by the central processor through executing the stored program; and

using the motion vectors to retrieve motion compensation reference blocks from frames of video data.

8. The method of claim 7 wherein the step performed by the central processor of partially decompressing the video data comprises Huffman decoding.

9. The method of claim 8 wherein the step performed by the central processor of partially decompressing the video data further comprises RLE decoding.

10. The method of claim 9 wherein the steps performed by the stored program executed on the central processor further comprise forming the inverse transform of the video data to transform the data from spatial frequency coefficients to pixels.

11. The method of claim 7 wherein the steps performed by operating the graphics coprocessor further comprise interpolating to determine an interpolated reference block, and wherein the interpolating is performed using bit BLT operations.

12. The method of claim 8 wherein the bit BLT operations comprise adding the pixels of a source block to the pixels of a destination block to create sum pixels, dividing the sum pixels by a constant to create interpolated pixels, and replacing the pixels of the destination block with the interpolated pixels.

13. A personal computer including a main processor wherein the personal computer further comprises:

software that, when read by the main processor, causes the main processor to extract motion vectors from video data, and decompressing the video data, and

a graphics coprocessor that performs motion compensation on the video data based on the motion vectors extracted by the central processor through executing the stored program; and

that uses the motion vectors to retrieve motion compensation reference blocks from frames of video data.

14. The personal computer of claim 13 wherein the software further comprises programming instructions that cause the main processor to execute Huffman decoding and RLE decoding, and wherein the video data includes MPEG video data.

15. The personal computer of claim 13 wherein the software further comprises programming instructions that cause the main processor to form the inverse transform of the video data to transform the video data from spatial frequency coefficients to pixels.

16. The personal computer of claim 13 wherein the motion compensation performed by the graphics coprocessor is performed using bit BLT operations.

17. The personal computer of claim 13 wherein the graphics coprocessor further comprises means for interpolating to determine an interpolated reference block, and wherein the interpolating is performed using bit BLT operations.
