WO1998011729A1 - Compression and decompression scheme performed on shared workstation memory by media coprocessor - Google Patents

Compression and decompression scheme performed on shared workstation memory by media coprocessor

Info

Publication number
WO1998011729A1
WO1998011729A1 (PCT/US1997/012437)
Authority
WO
WIPO (PCT)
Prior art keywords
data
memory
compression
system memory
transferring
Prior art date
Application number
PCT/US1997/012437
Other languages
French (fr)
Inventor
Mark W. Troeller
Michael L. Fuccio
Henry P. Moreton
Bent Hagemark
Te-Li Lau
Original Assignee
Silicon Graphics, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Silicon Graphics, Inc. filed Critical Silicon Graphics, Inc.
Priority to AU36027/97A (AU3602797A)
Priority to DE69720477T (DE69720477T2)
Priority to EP97932621A (EP0925687B1)
Priority to JP10513639A (JP2001500686A)
Priority to CA002259513A (CA2259513C)
Publication of WO1998011729A1

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/423 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding


Abstract

The present invention relates to a general purpose circuit that maximizes the computing power of a Unix workstation or other computer system for processing image or other data in accordance with a selected one or ones of several alternative compression and decompression algorithms. The present invention dynamically allocates system memory for storage of both compressed and uncompressed data and ensures adequate compression and decompression rates.

Description

COMPRESSION AND DECOMPRESSION SCHEME PERFORMED ON SHARED WORKSTATION MEMORY BY MEDIA COPROCESSOR
TECHNICAL FIELD
The present invention relates to data compression and decompression and more particularly, to compression and decompression of digital information in a real-time computer system environment.
BACKGROUND ART
In computer systems, large amounts of data, such as operating systems, application programs, and graphical and video information in digital form, must be stored or manipulated. To reduce both the bandwidth required to transfer the data and the storage requirements, the data is often compressed. Before the computer system may operate on compressed data, it must be decompressed. Since most operating system and application data is relatively static, rates of compression and decompression are not critical. However, since graphical and video image data is generated and/or displayed in real time, and since such data is generated in voluminous quantities, performance requirements are higher.
As picture telephony technology is implemented in conjunction with the internet or other computer based networks, the demand to compress and transmit image data in real time will increase. However, not only will such image data need to be compressed, it is also desirable to permit the computer system to enhance or manipulate image data concurrently with the compression or decompression process. Such enhancement may include adding time and date information, blending, adding overlays or performing some other programmable function.
Image data, which may include audio information, may be generated by video cameras or other known devices as a stream of bits, or bit stream. Image data is generated in real-time and is transferred from one device to another in the computer system or over a network. Due to the large amounts of data associated with image data, compression systems, such as JPEG, MPEG, M-JPEG, H.261 and others, have been adopted to define methods of compressing the bit stream. As is well known, compression systems are implemented because some devices, such as fixed disks, CD-ROM drives, as well as most network protocols require compressed video image data before delivering or accepting a real-time video stream.
Compression systems reduce spatial and temporal redundancy in a sequence of pictures. Most standard compression systems work on individual blocks of data so the tokens corresponding to patterns in one block may differ from the tokens of another block. This variability creates complexity in coding and decoding the data. Further, the variability makes it difficult to design a single hardware embodiment capable of coding and decoding the different algorithms used by the various compression systems.
Accordingly, it is common for a computer system to include more than one hardware and/or software compression system. Further, the variability requires that the various components of the image capture and display system be compatible so as to prevent creation of artifacts or inclusion of other visible errors into the image.
One method that avoids many of the problems associated with hardware compression is known as software compression. Software compression systems operate as a software driver that intercepts the bit stream coming from, for example, an image source, such as a camera or CD-ROM, stores the data in system memory and invokes the system's central processor (CPU) to compress the data before sending it to a destination device, such as the disk drive. When the data is later accessed, the software engine must first transfer the compressed data to memory and, using the CPU, run a decompression algorithm on the data before it may be displayed or otherwise manipulated by the CPU. One advantage of software compression/decompression is that there is complete versatility as to the selection of the appropriate compression or decompression algorithm. However, as is well known in the art, software compression systems are slow and tend to consume large amounts of the central processor's computational time. Indeed, the demands of some compression algorithms are so great that most MPEG software compression drivers are unable to compress full-size video images in real time.
Hardware compression engines are common and are available as board level devices. For example, U.S. Patent No. 5,357,614, which issued on October 18, 1994 and is entitled DATA COMPRESSION CONTROLLER, discloses a compression engine that compresses/decompresses data as it is transferred to or from a storage device such as a tape drive. This compression engine uses a proprietary compression algorithm and is not capable of decompressing data compressed according to other standards. As is typical with such compression engines, the board must also include expensive local memory, a local processor and/or control and interface logic.
Many image generating devices, such as video cameras, are provided with a resident compression engine so that the output bit stream is compressed before it is transferred to a display device, which must be provided with a complementary decompression engine, or directly to storage. As will be readily appreciated, if the hardware engine resident on the video camera is not compatible with the display device, it will be unnecessarily expensive. As will be further appreciated, replicating compression and decompression engines among the peripherals is expensive and unnecessarily redundant since each device must include its own engine and associated memory and control logic, especially if multiple compression algorithms must be supported.
Another significant problem with image generating devices having resident compression is that the compressed data loses its character as image data. This loss of character prevents the CPU from operating on or processing the data in real time. By way of example, if data is compressed before it is stored in system memory, it is difficult to add time stamps or index markers to selected image frames or to eliminate redundant images before storing to a storage device. Alternatively, some computer systems include a compression coprocessor as a compromise between pure software compression systems and dedicated hardware compression systems. Such coprocessors perform the compression and decompression algorithms otherwise run by the CPU. However, such coprocessors generally require substantial amounts of dedicated memory; it is not uncommon for such coprocessors to have up to 4 Mbytes of fast and expensive dedicated memory. If the coprocessor does not have adequate memory available, it is possible that the coprocessor may not be able to properly compress or decompress images in accordance with many available algorithms. As will be readily appreciated, providing substantial dedicated memory in the coprocessor, which may be sparingly utilized, will increase system cost. Further, if the CPU must process image data in some manner, such as, by way of example, adding overlays, changing shading or performing color space conversion, there will be unnecessary data transfers between system memory and the coprocessor's memory which may unnecessarily increase the load on the system's bus and reduce the throughput rate.
What is required is a system that compresses video and image data independently of the requirements of the image capture device or the display device. What is also required is a system that provides the processing power to perform billions of arithmetic operations per second across a wide range of real world applications so that the image data may be manipulated before it is compressed and stored (or transmitted over a network) or decompressed and displayed.
SUMMARY OF THE INVENTION
The preceding and other shortcomings of the prior art systems are addressed and overcome by the present invention. The present invention relates to a general purpose circuit that maximizes the computing power of a Unix workstation or other computer system for processing image data in real time while providing the ability to compress or decompress the image data. The present invention dynamically allocates system memory for storage of both compressed and uncompressed data and ensures adequate compression and decompression rates. The circuit of the present invention converts between various image protocols, color spaces and different signal domains, such as frequency and time, with little impact on system performance. The performance requirements of such operations are met by a novel design that enhances the flow of data through the computer system without requiring significant processing resources from the central CPU. Thus, new and existing image processing tasks are provided without adding memory circuit elements dedicated to such tasks. Further, the resources of the present invention may be shared across multiple software applications such as texture generation, data compression or other image processing functions since image data may be readily processed in system memory.
The present invention manipulates video and image data in system memory with a video, imaging and compression (VIC) engine that consists of a DMA (direct memory access) controller to move data between system memory and the VIC engine, a media signal processor that performs the integer, logical and mathematical operations necessary for signal processing, and a bit stream processor that manipulates the tokens used in compression and de-compression algorithms. The combination of using the workstation memory for storage of image data and the VIC engine is very powerful and versatile in that image data can be stored, retrieved, and manipulated by the CPU, as opposed to prior art computer systems that merely record and display image data in a manner that emulates a television or a photograph. Specifically, since the VIC engine off-loads the compression or decompression tasks, the workstation is capable of over an additional billion instructions per second for general purpose pixel manipulation. Accordingly, the present invention permits a wide range of flexibility in processing data without the addition of expensive multiple dedicated coprocessors or add-on peripherals such as rendering and blending or video capture and display devices. Further, the workstation and its available memory are flexible enough to support many compression and decompression algorithms regardless of memory requirements. The present invention advantageously uses the large bandwidth system bus of the workstation for transferring image data to or from system memory, which serves as a frame buffer, a Z buffer or texture memory, and permits economical sharing of low-level functional elements, such as the arithmetic block, logical block, and control flow block of the VIC engine, among many peripherals and processes involving multiple memory-to-memory activities that may not necessarily involve a peripheral.
Another advantage of the present invention is that the CPU may manipulate the image data before it is compressed by the VIC engine. When it is desired to display the compressed stored data, the data is transferred from the storage device to system memory and made available to the VIC engine. Under control of the DMA controller, data is transferred to the VIC engine, decompressed and transferred back to system memory where the CPU may further process the data before it is sent to the display device.
Using the shared system memory to implement the VIC engine functions requires that dedicated areas of system memory be mapped for use by the VIC engine so as to prevent other areas of system memory from being overwritten by the VIC engine. Memory mapping in the preferred embodiment is controlled by the host CPU and dynamically programmed into a table memory provided in the VIC engine. While it is desirable to provide up to 4 Mbytes of system memory for use by the VIC engine, the system memory need not be contiguous.
Accordingly, the present invention provides a system that permits the manipulation of video and image data rather than just displaying image data in a manner that merely emulates a television or a photograph. The present invention frees up the workstation to sort or manipulate image data, to perform content recognition, as well as to compress or decompress the image data in real time. The foregoing and additional features and advantages of this invention will become further apparent from the detailed description and accompanying drawing figures that follow.
BRIEF DESCRIPTION OF THE DRAWINGS
FIGURE 1 illustrates a simplified block diagram of one embodiment of the present invention.
FIGURE 2 illustrates a simplified block diagram of one embodiment of the video, imaging and compression (VIC) engine.
FIGURE 3 illustrates on-chip memory allocation for decoding data in accordance with a known decompression algorithm.
FIGURE 4 represents a timing chart showing the concurrent operation of the processors of the VIC engine in a decode operation.
FIGURE 5 illustrates memory allocation for encoding data in accordance with a known compression algorithm.
FIGURE 6 represents a timing chart showing the concurrent operation of the processors of the VIC engine in an encoding operation.
DETAILED DESCRIPTION OF THE INVENTION
In the following description of the preferred embodiment, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration a specific embodiment in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
Referring to the drawings more particularly by reference numbers, Figure 1 shows a preferred embodiment of a computer system 100 of the present invention having a central processing unit (CPU) 102, system memory 104, which is preferably a block of between 64 megabytes and one gigabyte of high speed RAM, and a system controller 106 that controls the transfer of data and instructions between CPU 102, system memory 104 and a plurality of peripherals such as a graphics display, a disk drive and other input/output (I/O) devices (not shown). Graphics interface 108 couples system controller 106 to a write-only display device while I/O interface 110 couples system controller 106 to a plurality of SCSI, Ethernet, PCI, video or audio peripherals such as a video capture card or a CD-ROM. In addition, computer system 100 also includes a video, imaging and compression (VIC) engine 112 that is coupled to both CPU 102 and system controller 106 by a high speed system bus 114.
VIC engine 112, shown in greater detail in Figure 2, contains four major functional blocks: a media signal processor 200, a bit stream processor circuit 202, a DMA controller 204 and a host interface 206. Processors 200 and 202, together with the controller 204 and the interface 206, accelerate standard compression and decompression algorithms.
In accordance with one aspect of the invention, video or image data from an input device such as a video camera or a CD-ROM (or any other uncompressed data) is transferred to a portion of system memory 104. Subsequently, VIC engine 112 may initiate a block transfer using dedicated DMA controller 204 to transfer the data from system memory 104 to one of the three dedicated memories 222-226. The VIC engine compresses the data and stores the compressed data to a second of the three memories 222-226. While data in the first memory is being compressed, additional data may be transferred to a third of the three memories. Upon completion of the compression operation, the data in the second of the memories is transferred back to a different portion of system memory 104 by the DMA controller 204. The compressed data may then be transferred from system memory 104 to a disk or other storage device (not shown). Compression of data in the third memory is then initiated, with compressed data being stored in the first memory. The DMA controller 204 concurrently transfers data to the second of the memories. This process is continued until all data in system memory is compressed. Advantageously, the CPU may manipulate the image data before it is compressed by VIC engine 112.
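By way of illustration only, the following C sketch models the three-buffer rotation described above: while one on-chip memory is being compressed, the previously compressed block is drained back to system memory and a third memory is filled with the next block. The helper names (dma_fill, compress, dma_drain) and the 2 Kbyte buffer size are assumptions for the sketch, not part of the disclosed hardware, and pipeline start-up, flush and output placement are deliberately simplified.

    #include <stddef.h>
    #include <string.h>

    #define BLOCK 2048   /* assumed size of one on-chip memory */

    /* Stand-ins for the DMA controller and the compression pass. */
    static void dma_fill(unsigned char *dst, const unsigned char *src)  { memcpy(dst, src, BLOCK); }
    static void compress(unsigned char *buf)                            { (void)buf; }
    static void dma_drain(unsigned char *dst, const unsigned char *src) { memcpy(dst, src, BLOCK); }

    void pipeline(const unsigned char *sysmem_in, unsigned char *sysmem_out, size_t nblocks)
    {
        static unsigned char mem222[BLOCK], mem224[BLOCK], mem226[BLOCK];
        unsigned char *fill = mem222, *work = mem224, *drain = mem226;

        for (size_t i = 0; i < nblocks; i++) {
            dma_fill(fill, sysmem_in + i * BLOCK);     /* next block: system memory -> on-chip */
            compress(work);                            /* compress the previously filled block */
            dma_drain(sysmem_out + i * BLOCK, drain);  /* write back the block compressed last */

            unsigned char *t = drain;                  /* rotate the three buffer roles */
            drain = work;
            work  = fill;
            fill  = t;
        }
    }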
When it is desired to display or further process the compressed data, the data is transferred from the storage device to system memory and made available to VIC engine 112. Under DMA control, data is transferred to one of the memories 222-226, decompressed and transferred back to system memory where the CPU may process the data before transfer to the display or other peripheral device.
VIC engine 112 may operate either as a slave on system bus 114 responding to bus transactions initiated by either CPU 102 or system controller 106 or as a bus master. VIC engine 112 may also request mastership of system bus 114 for performing pipelined writes and reads or DMA transfers to or from system memory 104.
In one preferred embodiment, media signal processor 200 is an implementation of the prior art Silicon Graphics MSP architecture that performs a cosine transformation on visible pixels in which the YCrCb value is converted into the frequency domain. Media signal processor 200 includes a scalar unit processor 210 and a vector unit processor 212, both of which operate at 66 MHz. Scalar unit processor 210 and vector unit processor 212 are coupled to a common 4K-byte instruction memory 214 by a 64-bit wide bus 216.
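For reference only, the following is a textbook, non-optimized, floating-point form of the 8-point forward discrete cosine transform of the kind the media signal processor accelerates. An actual implementation would use fixed-point arithmetic and a factored fast transform; this sketch is not taken from the patent.

    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* One-dimensional 8-point DCT-II: X[k] = (C(k)/2) * sum x[n] cos((2n+1)k*pi/16),
     * with C(0) = 1/sqrt(2) and C(k) = 1 otherwise.  A 2-D 8x8 transform applies
     * this to every row and then every column of the block. */
    static void dct8(const double x[8], double X[8])
    {
        for (int k = 0; k < 8; k++) {
            double c = (k == 0) ? sqrt(0.5) : 1.0;
            double s = 0.0;
            for (int n = 0; n < 8; n++)
                s += x[n] * cos((2 * n + 1) * k * M_PI / 16.0);
            X[k] = 0.5 * c * s;
        }
    }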
Scalar unit processor 210 performs control flow operations (such as jump and branch instructions), scalar integer arithmetic operations, logical operations and local memory loads and stores. Scalar unit processor 210 is coupled by a three-state bus 211 to vector unit processor 212, host interface 206, DMA controller 204 and bit stream processor circuit 202. Scalar unit processor 210 is also coupled to memories 222, 224 and 226 by a 128-bit wide data path 213 and output path 215 via crossbar switch 228. Crossbar switch 228 permits a one-to-one connection between processors 210 and 212 and memories 222-226, ensuring minimal capacitive bus loading. Vector unit processor 212 is a single instruction multiple data (SIMD) processor that performs math operations and, in particular, performs the cosine transformation so that visible pixels, having a YCrCb value, are converted to the frequency domain for compression or from the frequency domain to the spatial domain for decompression. Vector unit processor 212 is coupled to memories 222, 224 and 226 by an internal, 128-bit wide data path 218 and output path 220 via crossbar switch 228. Data paths 218 and 220 may be sliced into eight sixteen-bit segments for the purpose of performing integer mathematical operations. No branch instructions are included in the instruction set of vector unit processor 212, as such functions are performed by scalar unit processor 210.
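As an illustration of the sliced data path (and not of the vector unit's actual instruction set), the following plain C treats one 128-bit data-path word as eight 16-bit lanes and applies the same integer operation to every lane, which is what a SIMD unit does in a single instruction.

    #include <stdint.h>

    typedef struct { int16_t lane[8]; } vec128;   /* 8 x 16 bits = 128 bits */

    /* Element-wise add across all eight lanes, one result per lane. */
    static vec128 vec_add(vec128 a, vec128 b)
    {
        vec128 r;
        for (int i = 0; i < 8; i++)
            r.lane[i] = (int16_t)(a.lane[i] + b.lane[i]);
        return r;
    }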
Bit stream processor circuit 202 is a programmable device tailored for processing bit streams of compressed data. It is capable of performing entropy encoding, which requires variable length lookups for compression algorithms, as well as handling additional protocol, such as bit stuffing, header and preamble generation, etc., as may be required by a particular compression standard. Bit stream processor 230 is preferably a RISC core with a load and store architecture. It has an instruction set stored in instruction memory 232 comprising register to register operations (i.e., arithmetic operations), instruction stream control (i.e., jumps and branches) and memory to register transfer of data. In addition, bit stream processor 230 has instructions that are specific to manipulating arbitrarily aligned "tokens", or artifacts of the compression process, in a bit stream of data. Further still, bit stream processor 230 has instructions that can perform the table lookup operations necessary to decode variable length tokens in a bit stream. The lookup tables are stored in table memory 234 and are programmable to support MPEG-1, MPEG-2, H.261, JPEG or proprietary algorithms. The lookup tables are further programmable by CPU 102 to dynamically map areas of system memory 104 available for use as a frame buffer.
DMA controller 204 provides flexible address generation so that VIC engine 112 can access regions of system memory 104 to acquire blocks of data that need to be either compressed or decompressed. DMA controller 204 consists of two DMA channels that media signal processor 200, bit stream processor circuit 202 or CPU 102 may access. Each channel consists of a DMA state machine, control registers, and a descriptor memory. Descriptors are used to define the starting addresses of DMA controller 204 transfers and further define the mode (i.e., Read/Write or Y/C split) and the span and stride settings.
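A hypothetical C rendering of such a descriptor and of span/stride address generation is shown below. The patent names the fields (starting address, mode, span, stride) but does not specify their widths or encoding, so everything about this layout is an assumption.

    #include <stdint.h>

    struct dma_descriptor {
        uint32_t start;      /* starting address in the VIC engine's address view       */
        uint32_t span;       /* bytes moved per line (e.g. one row of a pixel block)    */
        uint32_t stride;     /* address step between the starts of successive lines     */
        uint32_t lines;      /* number of lines in the transfer                         */
        uint8_t  write;      /* 1 = VIC engine -> system memory, 0 = the reverse        */
        uint8_t  yc_split;   /* 1 = luma and chroma use separate (Y/C split) addressing */
    };

    /* Report each (address, length) burst a span/stride descriptor generates. */
    static void walk_descriptor(const struct dma_descriptor *d,
                                void (*emit)(uint32_t addr, uint32_t len))
    {
        for (uint32_t line = 0; line < d->lines; line++)
            emit(d->start + line * d->stride, d->span);
    }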
DMA controller 204 may be used to fill system memory 104 with data from the VIC engine 112 or to transfer data from system memory 104 to memories 214, 222-226 and/or 232. For DMA transactions to or from system memory 104, DMA controller 204 decomposes descriptor requests into a respective physical UNIX system memory address and byte count. A lookup table in table memory 234 converts between the contiguous memory address space of the media signal processor instruction memory 214, the bit stream processor circuit instruction memory 232 and the corresponding memory address space of system memory 104.
System memory 104 appears to VIC engine 112 as a region of four megabytes of contiguous addressable memory space allocated for use by the processors 200 and 202. This allocated system memory is grouped in physically contiguous 64K byte blocks.
These blocks can be located anywhere in system memory 104. The lookup table maps the location requested in the 4 megabyte region (as seen by VIC engine 112) into one of 64 different 64K byte pages of physical system memory 104. Each lookup table entry also contains bits that indicate whether VIC engine 112 is allowed to write to the selected block and whether that block is mapped at all. VIC engine 112 can interrupt CPU 102 and halt if a write violation occurs or a block is not mapped, to protect system memory 104 from corruption.
The lookup table can be re-programmed by CPU 102 as the tasks assigned to VIC engine 112 change. This feature allows different 64K blocks of system memory 104 to be mapped into the VIC engine 112 viewport containing the lookup table without actually moving any data in system memory 104. This feature is useful for quickly switching between processes on computer system 100 that wish to share the VIC engine resource. As one skilled in the art will appreciate, if a local dedicated memory connected to the VIC engine were in use, it would most likely need to be saved and restored for each process switch.
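A minimal sketch of that translation is given below, assuming a hypothetical entry layout (physical page base plus "mapped" and "writable" bits) and a hypothetical fault path. The 64-entry / 64 Kbyte / 4 Mbyte arithmetic is taken from the description above; everything else is illustrative.

    #include <stdint.h>
    #include <stdbool.h>

    #define VIEWPORT_PAGES 64                 /* 64 blocks x 64 Kbytes = 4 Mbyte viewport  */
    #define PAGE_SIZE      (64u * 1024u)

    struct map_entry {
        uint32_t phys_base;                   /* base of the 64K block in physical memory */
        bool     mapped;
        bool     writable;
    };

    static struct map_entry lookup_table[VIEWPORT_PAGES];   /* programmed by the host CPU */

    /* Translate an offset inside the 4 Mbyte viewport to a physical address.
     * Returns false on an unmapped block or on a write to a read-only block,
     * which is where the engine would halt and interrupt the CPU. */
    static bool translate(uint32_t offset, bool is_write, uint32_t *phys)
    {
        uint32_t page = offset / PAGE_SIZE;
        if (page >= VIEWPORT_PAGES || !lookup_table[page].mapped)
            return false;
        if (is_write && !lookup_table[page].writable)
            return false;
        *phys = lookup_table[page].phys_base + (offset % PAGE_SIZE);
        return true;
    }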
Host interface 206 couples VIC engine 112 with the 64-bit system processor bus 114 of computer system 100. DMA arbiter 240 performs arbitration to allow VIC engine 112 to initiate transactions on system bus 114 without any intermediary overhead. Further, VIC engine 112 is able to respond as a slave to system processor bus 114 transactions or to request control of the system processor bus 114 to perform block pipelined writes and reads.
In accordance with another aspect of the invention, data is transferred to system memory 104 where CPU 102 may manipulate the data under control of an application program before the data is transferred to one of memories 222-226. VIC engine 112 compresses the transferred data in accordance with a selected compression standard (i.e., JPEG, MPEG, M-JPEG, H.261, etc.). Specifically, VIC engine 112 operates on the data loaded in a first of the three memories and stores the encoded data in the second memory, while the next frame of data is being loaded into the third of the three memories 222-226. After compression, the encoded data is transferred from VIC engine 112 to system memory 104 where CPU 102 may concatenate the compressed data with index tags or other information that will assist in future sorting of the compressed images. As will be appreciated by one skilled in the art, the compressed data may be retained in a local storage device or transferred across a network.
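Purely as an illustration of the kind of index tag CPU 102 might prepend to each compressed frame in system memory, a hypothetical record layout is shown below. The patent mentions index tags but does not define any format, so every field here is an assumption.

    #include <stdint.h>

    /* Hypothetical per-frame header written by the CPU ahead of the compressed
     * bit stream so that frames can later be sorted or searched without
     * decompressing them. */
    struct frame_record {
        uint32_t frame_number;
        uint32_t timestamp_ms;      /* capture time                          */
        uint32_t codec;             /* e.g. JPEG, MPEG-1, MPEG-2 or H.261    */
        uint32_t compressed_bytes;  /* length of the bit stream that follows */
    };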
When compressed data is to be recovered, the data is first loaded into system memory 104 where CPU 102 may manipulate or process the compressed images. The compressed data is transferred to memories 222-226 by DMA controller 204. DMA controller 204 is also responsible for transferring data to the VIC engine 112 for the operation of the appropriate decompression algorithm, and for transferring the uncompressed data back to system memory 104. Advantageously, CPU 102 may further manipulate the image before routing the image to the display device.
In the preferred embodiment, memories 222-226 together comprise about 6K bytes of static RAM, each memory being organized as 128 by 128 bits. Memories 222-226 are dual port memory blocks with access controlled by ports 242, 244 and 246, respectively. Vector unit processor 212 and scalar unit processor 210 have deterministic access to memories 222-226 that is guaranteed by DMA arbiter 240 and buses 248 and 256. Host interface 206, bit stream processor circuit 202 and DMA controller 204 access memories 222-226 over DMA bus 250.
As is well known in the art, JPEG compression is a standardized image compression mechanism for the archival and transmission of still images such as digitized photographs, graphs, etc. JPEG compression can achieve compression ratios ranging from, for example, 4:1 to 20:1 for a full color photograph, so an image consisting of 2 Mbytes of data may be compressed to about 100 Kbytes.
Referring now to Figure 3, data flow through VIC engine 112 is depicted, by way of example, for decoding JPEG images. Specifically, an instruction sequence for implementing an algorithm for JPEG compression or decompression is loaded into memories 214 and 232 (which are shown in Figure 2). When a still image loaded into memory 104 is to be decompressed, a minimum coded unit, a unit of granularity determined by the picture size, the picture generation rate and the cycle time available for processing the unit, is transferred by DMA controller 204 to a first in, first out (FIFO) buffer 302. For the case of JPEG decoding or encoding, data is transferred at the minimum coded unit level, where a minimum coded unit comprises 256 bytes. For the case of MPEG-2 decoding, data is transferred in macroblocks, where a macroblock comprises 128 bytes. DMA controller 204 transfers data to FIFO 302, associated with bit stream processor circuit 202, 64 bytes at a time. DMA bus arbiter 240 arbitrates between bus contentions for accessing system memory 104 and responding to bit stream processor circuit 202 requirements.
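These granularities imply a fixed number of 64-byte bursts per coded unit (four for a JPEG minimum coded unit, two for an MPEG-2 macroblock). The fragment below is a simplified software model of that chunking, with invented names; it assumes the FIFO is drained between bursts.

```c
#include <stdint.h>
#include <string.h>

#define FIFO_BURST      64   /* bytes moved into FIFO 302 per transfer   */
#define JPEG_MCU_BYTES  256  /* minimum coded unit size for JPEG         */
#define MPEG2_MB_BYTES  128  /* macroblock size for MPEG-2               */

/* Move one coded unit from system memory into the FIFO, 64 bytes at a
 * time; a JPEG MCU therefore takes 4 bursts and an MPEG-2 macroblock 2. */
static void dma_fill_fifo(uint8_t fifo[FIFO_BURST], const uint8_t *sys_mem,
                          unsigned unit_bytes)
{
    for (unsigned off = 0; off < unit_bytes; off += FIFO_BURST)
        memcpy(fifo, sys_mem + off, FIFO_BURST);  /* FIFO drained between bursts */
}
```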
FIFO buffer 302 is a 64-byte buffer that is emptied by bit stream processor 230 32 bits at a time. Bit stream processor 230 is used to detect "markers" in compressed bit streams. These markers identify variable length coded segments within the compressed bit streams. Bit stream processor 230 can decode these variable length coded segments to reconstruct 8x8 blocks of discrete cosine transform (DCT) coefficients, which are further processed by scalar unit processor 210 and vector unit processor 212. Bit stream processor 230 performance is driven both by the number of bits to be decoded and by the number of tokens. DMA controller 204 stores decompressed data from a write buffer 304 in memory 222 before performing a block transfer to system memory 104.
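A minimal sketch of the kind of marker scan performed on a JPEG-style bitstream is shown below; it reflects the JPEG convention that 0xFF followed by 0x00 is a stuffed data byte while 0xFF followed by any other code is a marker, and it is an illustration rather than the bit stream processor's actual microcode.

```c
#include <stdint.h>
#include <stddef.h>

/* Scan a compressed JPEG-style bitstream for the next marker.
 * Returns the offset of the marker code byte, or -1 if none is found.   */
static long find_next_marker(const uint8_t *stream, size_t length)
{
    for (size_t i = 0; i + 1 < length; i++) {
        if (stream[i] == 0xFF && stream[i + 1] != 0x00)
            return (long)(i + 1);   /* the variable length segment follows */
    }
    return -1;
}
```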
During the decode process, bit stream processor 230 decodes the bitstream corresponding to an i-th+1 minimum coded unit and loads the decoded bitstream into write buffer 304. Write buffer 304 comprises an intermediate 4-byte register from which the decoded bitstream is transferred to memory 222 under control of DMA controller 204. While bit stream processor 230 is processing the i-th+1 minimum coded unit, the i-th minimum coded unit, which is stored in memory 224, is available for processing by media signal processor 200. Media signal processor 200 must have deterministic access to the data in memory 224, so its memory access is accorded the highest priority. Results of the i-th-1 minimum coded unit decoded by the media signal processor are stored to memory 226 and transferred by DMA controller 204 to system memory 104. It is possible, in alternative embodiments, for the decoded i-th-1 minimum coded unit to be stored in an unused portion of memory 222 when using the JPEG algorithm.
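The three-deep pipeline described above can be summarised by the following C sketch, in which the three local memories rotate through fill, process and drain roles; the stage functions stand in for bit stream processor 230, media signal processor 200 and DMA controller 204 and are purely illustrative stubs.

```c
#include <stdio.h>

/* Stub stage routines standing in for the hardware units (illustration only). */
static void bsp_decode_unit(int index, void *bank)  { printf("BSP decodes unit %d\n", index); (void)bank; }
static void msp_process_unit(void *bank)            { printf("MSP processes a unit\n");      (void)bank; }
static void dma_store_unit(const void *bank, int i) { printf("DMA stores unit %d\n", i);     (void)bank; }

/* Rotate three local banks so the i-th+1, i-th and i-th-1 coded units
 * are being decoded, processed and stored during the same cycle.         */
void decode_frame(int unit_count, void *banks[3])
{
    if (unit_count <= 0)
        return;
    bsp_decode_unit(0, banks[0]);           /* prime the pipeline with unit 0       */
    for (int i = 0; i < unit_count; i++) {
        void *fill    = banks[(i + 1) % 3]; /* filled by the bit stream processor   */
        void *process = banks[i % 3];       /* worked on by the media signal processor */
        void *drain   = banks[(i + 2) % 3]; /* previous result awaiting DMA         */

        if (i + 1 < unit_count)
            bsp_decode_unit(i + 1, fill);
        msp_process_unit(process);
        if (i > 0)
            dma_store_unit(drain, i - 1);
    }
    dma_store_unit(banks[(unit_count - 1) % 3], unit_count - 1);  /* drain last result */
}
```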
The decode registration diagram of Figure 4 shows the process for 4:2:2 level decoding of a JPEG bitstream. The decode cycle duration in the preferred embodiment is 800 cycles, as shown at 402. During the decode cycle: 1) DMA controller 204 transfers data to or from system memory 104, as generally shown at 406; 2) media signal processor 200 decodes the i-th macroblock, as shown at 410; and 3) bit stream processor circuit 202 works on conditioning the i-th+1 minimum coded unit, as shown at 404. Specifically, with this scheme, bit stream processor circuit 202 begins the process by decoding the minimum coded unit's header information, as indicated at 412. Meanwhile, DMA controller 204 first zeros out the i-th+1 memory 222, as shown at 414, and transfers the i-th-1 minimum coded unit to system memory 104, as shown at 416.
As required, DMA controller 204 also responds to the bit stream processor's data requirements, as shown at 408, while the bit stream processor decodes the compressed minimum coded units of data. As shown at 416, media signal processor 200 updates the decode information, such as Q matrix information. Media signal processor 200 then performs an inverse quantization of the data in memory 224, as indicated at 418. Upon completion, media signal processor 200 performs the inverse discrete cosine transform of the data and stores the result in memory 226. Memories 222-226 and the parallel operation of DMA controller 204 with bit stream processor circuit 202 and media signal processor 200 compensate for the latency of the system memory. The parallel operation of bit stream processor circuit 202 and media signal processor 200 applies multi-tasking to the computational and decision making tasks so as to accomplish the compression or decompression process with relatively inexpensive processors operating at relatively slow 66 MHz clock rates. Advantageously, the decode operation is completed with minimum memory requirements for memories 222-226 in a manner that compensates for any latency associated with obtaining data from system memory 104. In this manner, data will always be available for processing by the scalar and/or vector units 210 and 212 of media signal processor 200.
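For reference, the inverse quantization step amounts to an element-wise multiply of each 8x8 block of coefficients by the corresponding Q matrix entries, as in the baseline JPEG formulation sketched below; the routine is illustrative only.

```c
#include <stdint.h>

/* Inverse quantization of one 8x8 block: each quantized DCT coefficient
 * is multiplied by the matching Q matrix entry to recover the scaled
 * frequency domain value; an inverse DCT would then return the block to
 * the spatial domain (stored in memory 226 in the description above).   */
static void inverse_quantize_block(const int16_t quantized[64],
                                   const uint16_t q_matrix[64],
                                   int32_t coefficients[64])
{
    for (int i = 0; i < 64; i++)
        coefficients[i] = (int32_t)quantized[i] * (int32_t)q_matrix[i];
}
```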
Encoding of JPEG data is described in conjunction with Figures 5 and 6, where DMA controller 204 loads a portion of memory 222 with the i-th+1 4:2:2 macro control block and a corresponding quantization table. While memory 222 is being loaded, media signal processor 200 performs quantization and a discrete cosine transform on four 8x8 blocks of data and saves the results in memory 226. Concurrently, bit stream processor circuit 202 Huffman codes the i-th-1 data in memory 226 and writes the result to write buffer 304. DMA controller 204 then transfers the compressed data from write buffer 304 to system memory 104.
As shown in Figure 6, DMA controller 204 loads the i-th+1 macro control block at 610 when it is not attending to requests from write buffer 304, as indicated at 604. Media signal processor 200 performs the discrete cosine transform on the i-th macro control block up to 612. Thereafter, media signal processor 200 performs quantization on the i-th macro control block and transfers the result to memory 226, as indicated at 614. Concurrently, bit stream processor circuit 202 performs Huffman coding and bitstream packing on the data in memory 226. Under ideal conditions, such operations will be completed as indicated at 618. However, since media signal processor 200 has priority access to memories 222-226, the time to complete the bit stream processor circuit's operation will most likely need to be de-rated, as indicated at 620, due to memory contention.
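Bitstream packing of the resulting Huffman codes can be illustrated with a simple most-significant-bit-first packer such as the one below; the code/length representation is an assumption and the sketch omits JPEG byte stuffing.

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal MSB-first bit packer of the kind used when Huffman codes are
 * emitted into a write buffer; 'code' holds 'length' significant bits
 * (1 to 24) and complete bytes are flushed to the output as they fill.  */
typedef struct {
    uint8_t *out;        /* destination buffer            */
    size_t   byte_pos;   /* next byte position to write   */
    uint32_t bit_buf;    /* pending bits                  */
    int      bit_count;  /* number of pending bits        */
} bit_packer;

static void pack_bits(bit_packer *p, uint32_t code, int length)
{
    p->bit_buf = (p->bit_buf << length) | (code & ((1u << length) - 1u));
    p->bit_count += length;
    while (p->bit_count >= 8) {
        p->bit_count -= 8;
        p->out[p->byte_pos++] = (uint8_t)(p->bit_buf >> p->bit_count);
    }
}
```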
With respect to other compression algorithms, it is known in the art that MPEG (MPEG-1 and MPEG-2) is a recognized standard for compressing the image and audio portions of moving pictures. MPEG adds adaptive quantization at the minimum coded unit (16 x 16 pixel area) layer. However, it is difficult to edit an MPEG sequence on a frame-by-frame basis since each frame is intimately tied to the ones around it. Accordingly, in a minimum coded unit encode or decode application, data is first loaded, by way of example, into memory 222 together with a backward predictor and a forward predictor. Memory 224 is used for the resultant decoded or encoded data while the next frame of data is loaded by DMA controller 204 into memory 226.
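A sketch of the role the backward and forward predictors play is given below for a bidirectionally predicted block; the rounded averaging follows the usual MPEG formulation, while the flat 16x16 buffer layout is an assumption made for illustration.

```c
#include <stdint.h>

/* Reconstruct a bidirectionally predicted 16x16 luminance block by
 * averaging the forward and backward predictors and adding the decoded
 * residual, clamping the result to the 8-bit pixel range.               */
static void reconstruct_b_block(const uint8_t forward[256],
                                const uint8_t backward[256],
                                const int16_t residual[256],
                                uint8_t out[256])
{
    for (int i = 0; i < 256; i++) {
        int pred  = (forward[i] + backward[i] + 1) >> 1;  /* rounded average */
        int pixel = pred + residual[i];
        if (pixel < 0)   pixel = 0;
        if (pixel > 255) pixel = 255;
        out[i] = (uint8_t)pixel;
    }
}
```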
One skilled in the art will appreciate that the versatility of the architecture of VIC engine 112 affords the computational power to compress or decompress data in accordance with many known compression or decompression algorithms. Since data is initially routed to system memory, VIC engine 112 does not require substantial dedicated memory; it need only operate at a rate sufficient to minimize the amount of system memory 104 consumed. In the preferred embodiment, about 4 Mbytes of system memory 104 is allocated for up to six frames of uncompressed image data, assuming a 640x480 pixel display. Additional system memory 104 is allocated for the compressed image data, which in the worst case may amount to about 160 Kbytes for each frame, assuming a lightly compressed rate of 4:1. Higher compression rates would accordingly reduce memory requirements.
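The memory figures quoted above can be checked with a few lines of arithmetic; the short program below assumes 4:2:2 sampling at two bytes per pixel, which is consistent with the 640x480 example (about 600 Kbytes per frame, roughly 3.5 Mbytes for six frames, and roughly 150 Kbytes per frame at 4:1).

```c
#include <stdio.h>

int main(void)
{
    const unsigned width = 640, height = 480;
    const unsigned bytes_per_pixel = 2;   /* assumed 4:2:2 YCrCb sampling */
    const unsigned frames = 6;

    unsigned frame_bytes = width * height * bytes_per_pixel;
    unsigned six_frames  = frames * frame_bytes;
    unsigned at_4_to_1   = frame_bytes / 4;

    printf("one frame: %u bytes, six frames: %u bytes, 4:1 frame: %u bytes\n",
           frame_bytes, six_frames, at_4_to_1);
    return 0;
}
```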
While certain exemplary preferred embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention. Further, it is to be understood that this invention shall not be limited to the specific construction and arrangements shown and described since various modifications or changes may occur to those of ordinary skill in the art without departing from the spirit and scope of the invention as claimed.

Claims

1. In a computer system having system memory, a system controller for controlling access to system memory, a central processing unit for manipulating data stored in said memory, and an engine for compressing and decompressing data stored in said system memory comprising: a first processor for performing cosine transformation of YCrCb values representing visible pixels into the frequency domain, said central processor configuring said first processor to compress or decompress said data in accordance with a selected compression or decompression algorithm; a second processor for entropy encoding or decoding a bit stream of frequency domain data, said central processor configuring said second processor to compress or decompress said data in accordance with said selected compression or decompression algorithm; a bank of memory, associated with said first and second processors, divided into at least three independently addressable memory banks; and a DMA controller, associated with said first and second processors, for transferring said data between system memory and said bank of memory; wherein said DMA controller is adapted to transfer a first portion of said data from system memory to a first of said independently addressable memory banks to initiate a compression or decompression of said data, said DMA controller further adapted to transfer data to a second of said independently addressable memory banks while said first and second processors compress or decompress said data in said first independently addressable memory bank, and said DMA controller further adapted to transfer compressed or decompressed data from a third of said independently addressable memory banks to said system memory.
2. The DMA controller of claim 1 further comprising means for acquiring mastership of said system bus and transferring a unit of granularity of data to one of said independently addressable banks of memory.
3. The compression and decompression engine of claim 1 or 2 further comprising arbiter means and bus means for ensuring said first processor has highest priority access to each of said independently addressable banks of memory.
4. The invention of claim 1, 2 or 3 wherein the compressed or decompressed data provided by said first and second processors is stored in said third independently addressable memory bank.
5. The invention of claim 1, 2, 3 or 4 wherein said data comprises uncompressed real-time video data stored to a page of system memory, said DMA controller further comprising means for locating said page of real-time video data and transferring said real-time video data to said independently addressable banks of memory for compression and for transferring compressed data from said independently addressable banks of memory to a second page of said system memory in real-time.
6. The invention of claim 5, wherein said central processor unit may point said compression/decompression engine to one of a plurality of pages of system memory for compression or decompression.
7. The invention of claim 5, wherein the rate of compression and decompression is sufficient for real-time compression and decompression of video data.
8. The invention of claim 6, wherein said compression and decompression of real-time video data in accordance with said selected compression or decompression algorithm further comprises means for converting said video data from the spatial domain to the frequency domain for compression and from the frequency domain to the spatial domain for decompression.
9. In a computer system having a central processor and a system memory for storing data, a compression and decompression coprocessor comprising: a local data memory having sufficient storage space for storing only a portion of the data in said system memory; processor means, associated with said local data memory for processing the data resident in said local memory in accordance with a compression or decompression algorithm; means for transferring a portion of said data from said system memory to said local data memory at a rate sufficient to maintain operation of said processor means and for transferring processed data from said local data memory to said system memory; means associated with said data transferring means for identifying the location of said unprocessed data in said system memory and the location for storing processed data in said system memory.
10. The invention of claim 9 wherein said system memory comprises storage for the equivalent of up to six frames of uncompressed video data captured in real-time.
11. The invention of claim 10 wherein said system memory further comprises dynamically allocable storage for processed data.
12. The invention of claim 10 wherein said central processor selectively operates on said uncompressed video data stored in said system memory prior to the transfer of data from system memory to local memory.
13. In a computer system, a method for controlling data flow in compressing and decompressing image data with a compression engine comprising the steps of: collecting image data in a region of system memory associated with said computer system; configuring said engine with a selected compression algorithm; transferring a portion of said image data comprising at least one unit of granularity of data to a first local memory associated with said engine; concurrently compressing the transferred portion of said image data and transferring an additional portion of said image data to a second local memory associated with said engine; saving the compressed portion of said image data in said first local memory in a third local memory; transferring the compressed portion of said image data in said third local memory to a second region of system memory; concurrently compressing the transferred portion of said image data in said second local memory and transferring an additional portion of said image data to said third local memory associated with said engine; saving the compressed portion of said image data in said second local memory in said first local memory; transferring the compressed portion of said image data in said first local memory to said second region of system memory; concurrently compressing the transferred portion of said image data in said third local memory and transferring an additional portion of said image data to a first local memory; saving the compressed portion of said image data in said third local memory in said second local memory; transferring the compressed portion of said image data to a second region of system memory; and repeating the above sequence of steps so that said image data in said region of system memory is transferred from system memory to said local memories, compressed and transferred back to said system memory.
14. In a computer system, a method for utilizing system memory in the collection of data from a data source for compression or decompression with a compression/decompression coprocessor comprising the steps of: collecting data in a plurality of pages in system memory; transferring a portion of said data in one of said pages to a local memory associated with said coprocessor; concurrently transforming said data in accordance with a selected compression or decompression algorithm while transferring a next portion of said data to said local memory; and transferring the transformed data from said local memory to a corresponding one of a second plurality of pages in system memory.
15. The method of claim 14 further comprising the step of configuring said coprocessor to selectively perform a compression or a decompression algorithm with a CPU associated with said computer system.
16. The method of claim 15 wherein said configuring step further comprises the steps of: transferring a portion of said data in a different one of said pages to said local memory associated with said coprocessor; concurrently transforming said data in accordance with the selected compression or decompression algorithm while transferring a next portion of said data to said local memory; and transferring the transformed data from said local memory to one of the pages comprising said second plurality of pages in system memory whereby said coprocessor may be time shared by one or more data sources.
PCT/US1997/012437 1996-09-13 1997-07-03 Compression and decompression scheme performed on shared workstation memory by media coprocessor WO1998011729A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
AU36027/97A AU3602797A (en) 1996-09-13 1997-07-03 Compression and decompression scheme performed on shared workstation memory by media coprocessor
DE69720477T DE69720477T2 (en) 1996-09-13 1997-07-03 COMPRESSION AND DECOMPRESSION SCHEME EXECUTED BY A MEDIA COPROCESSOR ON A COMMON STORAGE OF A WORKSTATION
EP97932621A EP0925687B1 (en) 1996-09-13 1997-07-03 Compression and decompression scheme performed on shared workstation memory by media coprocessor
JP10513639A JP2001500686A (en) 1996-09-13 1997-07-03 Compression and decompression scheme performed on shared workstation memory by media coprocessor
CA002259513A CA2259513C (en) 1996-09-13 1997-07-03 Compression and decompression scheme performed on shared workstation memory by media coprocessor

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US08/713,599 US5768445A (en) 1996-09-13 1996-09-13 Compression and decompression scheme performed on shared workstation memory by media coprocessor
US08/713,599 1996-09-13

Publications (1)

Publication Number Publication Date
WO1998011729A1 true WO1998011729A1 (en) 1998-03-19

Family

ID=24866752

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/012437 WO1998011729A1 (en) 1996-09-13 1997-07-03 Compression and decompression scheme performed on shared workstation memory by media coprocessor

Country Status (7)

Country Link
US (1) US5768445A (en)
EP (1) EP0925687B1 (en)
JP (1) JP2001500686A (en)
AU (1) AU3602797A (en)
CA (1) CA2259513C (en)
DE (1) DE69720477T2 (en)
WO (1) WO1998011729A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999063447A1 (en) * 1998-06-01 1999-12-09 Advanced Micro Devices, Inc. Compression and decompression of serial port data and status using direct memory access

Families Citing this family (66)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6002411A (en) * 1994-11-16 1999-12-14 Interactive Silicon, Inc. Integrated video and memory controller with data processing and graphical processing capabilities
IT1285258B1 (en) * 1996-02-26 1998-06-03 Cselt Centro Studi Lab Telecom HANDLING DEVICE FOR COMPRESSED VIDEO SEQUENCES.
US6192073B1 (en) * 1996-08-19 2001-02-20 Samsung Electronics Co., Ltd. Methods and apparatus for processing video data
TW360823B (en) * 1996-09-30 1999-06-11 Hitachi Ltd Data processor and graphic processor
US5986709A (en) * 1996-11-18 1999-11-16 Samsung Electronics Co., Ltd. Adaptive lossy IDCT for multitasking environment
JP3104643B2 (en) * 1997-05-07 2000-10-30 株式会社セガ・エンタープライゼス Image processing apparatus and image processing method
US6188394B1 (en) * 1998-08-28 2001-02-13 Ati Technologies, Inc. Method and apparatus for video graphics antialiasing
US6388989B1 (en) * 1998-06-29 2002-05-14 Cisco Technology Method and apparatus for preventing memory overrun in a data transmission system
US6624761B2 (en) 1998-12-11 2003-09-23 Realtime Data, Llc Content independent data compression method and system
WO2000046978A2 (en) * 1999-02-04 2000-08-10 Quvis, Inc. Scaleable resolution motion image recording and storage system
US6604158B1 (en) 1999-03-11 2003-08-05 Realtime Data, Llc System and methods for accelerated data storage and retrieval
US6601104B1 (en) 1999-03-11 2003-07-29 Realtime Data Llc System and methods for accelerated data storage and retrieval
US7437483B1 (en) * 1999-03-24 2008-10-14 Microsoft Corporation System and method for transferring a compressed data file to a peripheral device
US6172625B1 (en) * 1999-07-06 2001-01-09 Motorola, Inc. Disambiguation method and apparatus, and dictionary data compression techniques
US20010047473A1 (en) 2000-02-03 2001-11-29 Realtime Data, Llc Systems and methods for computer initialization
AU2001243463A1 (en) * 2000-03-10 2001-09-24 Arc International Plc Memory interface and method of interfacing between functional entities
US8692695B2 (en) 2000-10-03 2014-04-08 Realtime Data, Llc Methods for encoding and decoding data
US7417568B2 (en) 2000-10-03 2008-08-26 Realtime Data Llc System and method for data feed acceleration and encryption
US9143546B2 (en) 2000-10-03 2015-09-22 Realtime Data Llc System and method for data feed acceleration and encryption
US7386046B2 (en) 2001-02-13 2008-06-10 Realtime Data Llc Bandwidth sensitive data compression and decompression
US6822654B1 (en) 2001-12-31 2004-11-23 Apple Computer, Inc. Memory controller chipset
US6697076B1 (en) 2001-12-31 2004-02-24 Apple Computer, Inc. Method and apparatus for address re-mapping
US7681013B1 (en) 2001-12-31 2010-03-16 Apple Inc. Method for variable length decoding using multiple configurable look-up tables
US6931511B1 (en) 2001-12-31 2005-08-16 Apple Computer, Inc. Parallel vector table look-up with replicated index element vector
US7015921B1 (en) 2001-12-31 2006-03-21 Apple Computer, Inc. Method and apparatus for memory access
US6573846B1 (en) 2001-12-31 2003-06-03 Apple Computer, Inc. Method and apparatus for variable length decoding and encoding of video streams
US7055018B1 (en) 2001-12-31 2006-05-30 Apple Computer, Inc. Apparatus for parallel vector table look-up
US7034849B1 (en) 2001-12-31 2006-04-25 Apple Computer, Inc. Method and apparatus for image blending
US7114058B1 (en) 2001-12-31 2006-09-26 Apple Computer, Inc. Method and apparatus for forming and dispatching instruction groups based on priority comparisons
US7305540B1 (en) 2001-12-31 2007-12-04 Apple Inc. Method and apparatus for data processing
US6693643B1 (en) 2001-12-31 2004-02-17 Apple Computer, Inc. Method and apparatus for color space conversion
US7467287B1 (en) 2001-12-31 2008-12-16 Apple Inc. Method and apparatus for vector table look-up
US6877020B1 (en) 2001-12-31 2005-04-05 Apple Computer, Inc. Method and apparatus for matrix transposition
US7558947B1 (en) 2001-12-31 2009-07-07 Apple Inc. Method and apparatus for computing vector absolute differences
US6985853B2 (en) * 2002-02-28 2006-01-10 Broadcom Corporation Compressed audio stream data decoder memory sharing techniques
US7154502B2 (en) * 2002-03-19 2006-12-26 3D Labs, Inc. Ltd. 3D graphics with optional memory write before texturing
ITMI20022003A1 (en) * 2002-09-20 2004-03-21 Atmel Corp APPARATUS AND METHOD FOR DYNAMIC DECOMPRESSION OF PROGRAMS.
US6707397B1 (en) 2002-10-24 2004-03-16 Apple Computer, Inc. Methods and apparatus for variable length codeword concatenation
US6707398B1 (en) 2002-10-24 2004-03-16 Apple Computer, Inc. Methods and apparatuses for packing bitstreams
US6781529B1 (en) 2002-10-24 2004-08-24 Apple Computer, Inc. Methods and apparatuses for variable length encoding
US6781528B1 (en) 2002-10-24 2004-08-24 Apple Computer, Inc. Vector handling capable processor and run length encoding
US9614772B1 (en) 2003-10-20 2017-04-04 F5 Networks, Inc. System and method for directing network traffic in tunneling applications
US8024483B1 (en) 2004-10-01 2011-09-20 F5 Networks, Inc. Selective compression for network connections
US7783781B1 (en) 2005-08-05 2010-08-24 F5 Networks, Inc. Adaptive compression
US8533308B1 (en) 2005-08-12 2013-09-10 F5 Networks, Inc. Network traffic management through protocol-configurable transaction processing
US7536533B2 (en) * 2005-09-30 2009-05-19 Silicon Laboratories Inc. MCU based motor controller with pre-load register and DMA controller
US8275909B1 (en) 2005-12-07 2012-09-25 F5 Networks, Inc. Adaptive compression
US7882084B1 (en) 2005-12-30 2011-02-01 F5 Networks, Inc. Compression of data transmitted over a network
US8565088B1 (en) 2006-02-01 2013-10-22 F5 Networks, Inc. Selectively enabling packet concatenation based on a transaction boundary
US7873065B1 (en) 2006-02-01 2011-01-18 F5 Networks, Inc. Selectively enabling network packet concatenation based on metrics
JP4929073B2 (en) * 2006-07-07 2012-05-09 キヤノン株式会社 Multi-function printer
US9356824B1 (en) 2006-09-29 2016-05-31 F5 Networks, Inc. Transparently cached network resources
US8417833B1 (en) 2006-11-29 2013-04-09 F5 Networks, Inc. Metacodec for optimizing network data compression based on comparison of write and read rates
US9106606B1 (en) 2007-02-05 2015-08-11 F5 Networks, Inc. Method, intermediate device and computer program code for maintaining persistency
KR100793286B1 (en) * 2007-05-02 2008-01-10 주식회사 코아로직 Digital video codec using small size buffer memory, and method for controlling the same
US20080291209A1 (en) * 2007-05-25 2008-11-27 Nvidia Corporation Encoding Multi-media Signals
US8755515B1 (en) 2008-09-29 2014-06-17 Wai Wu Parallel signal processing system and method
US8367460B2 (en) 2010-06-22 2013-02-05 Micron Technology, Inc. Horizontally oriented and vertically stacked memory cells
US9026568B2 (en) 2012-03-30 2015-05-05 Altera Corporation Data compression for direct memory access transfers
US20140108704A1 (en) * 2012-10-16 2014-04-17 Delphi Technologies, Inc. Data decompression method for a controller equipped with limited ram
US9053121B2 (en) 2013-01-10 2015-06-09 International Business Machines Corporation Real-time identification of data candidates for classification based compression
US9792350B2 (en) 2013-01-10 2017-10-17 International Business Machines Corporation Real-time classification of data into data compression domains
US9564918B2 (en) 2013-01-10 2017-02-07 International Business Machines Corporation Real-time reduction of CPU overhead for data compression
US9686536B2 (en) * 2013-05-20 2017-06-20 Advanced Micro Devices, Inc. Method and apparatus for aggregation and streaming of monitoring data
CN104572655B (en) * 2013-10-12 2019-04-12 腾讯科技(北京)有限公司 The method, apparatus and system of data processing
US11184456B1 (en) * 2019-06-18 2021-11-23 Xcelastream, Inc. Shared resource for transformation of data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1993019431A1 (en) * 1992-03-20 1993-09-30 Maxys Circuit Technology Ltd. Parallel vector processor architecture
WO1994010641A1 (en) * 1992-11-02 1994-05-11 The 3Do Company Audio/video computer architecture
EP0686939A1 (en) * 1994-06-10 1995-12-13 Hitachi, Ltd. Image display apparatus

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5212742A (en) * 1991-05-24 1993-05-18 Apple Computer, Inc. Method and apparatus for encoding/decoding image data
US5373327A (en) * 1993-02-25 1994-12-13 Hewlett-Packard Company Detection, correction and display of illegal color information in a digital video signal

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1993019431A1 (en) * 1992-03-20 1993-09-30 Maxys Circuit Technology Ltd. Parallel vector processor architecture
WO1994010641A1 (en) * 1992-11-02 1994-05-11 The 3Do Company Audio/video computer architecture
EP0686939A1 (en) * 1994-06-10 1995-12-13 Hitachi, Ltd. Image display apparatus

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"HIGH-SPEED DIRECT MEMORY ACCESS CONTROL METHOD", IBM TECHNICAL DISCLOSURE BULLETIN, vol. 33, no. 8, 1 January 1991 (1991-01-01), pages 368 - 370, XP000109222 *
BACHSTEIN W ET AL: "SINGLE-CHIP ERLEDIGT MULTIMEDIA", ELEKTRONIK, vol. 45, no. 17, 20 August 1996 (1996-08-20), pages 58 - 62, XP000633205 *
ONOYE T ET AL: "HDTV LEVEL MPEG2 VIDEO DECODER VLSI", 1995 IEEE TENCON. IEEE REGION TEN INTERNATIONAL CONFERENCE ON MICROELECTRONICS AND VLSI, HONG KONG, NOV. 6 - 10, 1995, 6 November 1995 (1995-11-06), INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, pages 468 - 471, XP000585824 *
WOOBIN LEE ET AL: "REAL-TIME MPEG VIDEO CODEC ON A SINGLE-CHIP MULTIPROCESSOR", PROCEEDINGS OF THE SPIE, vol. 2187, 1 January 1994 (1994-01-01), pages 32 - 42, XP000571385 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999063447A1 (en) * 1998-06-01 1999-12-09 Advanced Micro Devices, Inc. Compression and decompression of serial port data and status using direct memory access
US6385670B1 (en) 1998-06-01 2002-05-07 Advanced Micro Devices, Inc. Data compression or decompressions during DMA transfer between a source and a destination by independently controlling the incrementing of a source and a destination address registers

Also Published As

Publication number Publication date
JP2001500686A (en) 2001-01-16
AU3602797A (en) 1998-04-02
EP0925687B1 (en) 2003-04-02
EP0925687A1 (en) 1999-06-30
US5768445A (en) 1998-06-16
CA2259513A1 (en) 1998-03-19
DE69720477D1 (en) 2003-05-15
DE69720477T2 (en) 2003-12-18
CA2259513C (en) 2006-11-21

Similar Documents

Publication Publication Date Title
EP0925687B1 (en) Compression and decompression scheme performed on shared workstation memory by media coprocessor
JP4426099B2 (en) Multiprocessor device having shared memory
JP3806936B2 (en) Image compression coprocessor having data flow control and multiple processing units
Rathnam et al. An architectural overview of the programmable multimedia processor, TM-1
US7230633B2 (en) Method and apparatus for image blending
US7403564B2 (en) System and method for multiple channel video transcoding
US6018353A (en) Three-dimensional graphics accelerator with an improved vertex buffer for more efficient vertex processing
US7054964B2 (en) Method and system for bit-based data access
US7885336B2 (en) Programmable shader-based motion compensation apparatus and method
JP4101253B2 (en) Compression device and method
US5309528A (en) Image digitizer including pixel engine
US7055018B1 (en) Apparatus for parallel vector table look-up
US6822654B1 (en) Memory controller chipset
KR100818034B1 (en) ON THE FLY DATA TRANSFER BETWEEN RGB AND YCrCb COLOR SPACES FOR DCT INTERFACE
US6820087B1 (en) Method and apparatus for initializing data structures to accelerate variable length decode
US6313766B1 (en) Method and apparatus for accelerating software decode of variable length encoded information
JP4227218B2 (en) Dynamic memory management device and control method thereof
AU739533B2 (en) Graphics processor architecture
JP4298006B2 (en) Image processor and image processing method thereof
US6070002A (en) System software for use in a graphics computer system having a shared system memory
KR100295304B1 (en) Multimedia computer with integrated circuit memory
JP3327900B2 (en) Data processing device
AU728882B2 (en) Compression
AU760297B2 (en) Memory controller architecture
US6987545B1 (en) Apparatus for assisting video compression in a computer system

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU CA JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 1997932621

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2259513

Country of ref document: CA

Ref country code: CA

Ref document number: 2259513

Kind code of ref document: A

Format of ref document f/p: F

ENP Entry into the national phase

Ref country code: JP

Ref document number: 1998 513639

Kind code of ref document: A

Format of ref document f/p: F

WWP Wipo information: published in national office

Ref document number: 1997932621

Country of ref document: EP

WWG Wipo information: grant in national office

Ref document number: 1997932621

Country of ref document: EP