United States Patent
Van Maren et al.
Patent Number: 4,870,415  Date of Patent: Sep. 26, 1989
 DATA COMPRESSION SYSTEM WITH EXPANSION PROTECTION
 Inventors: David J. Van Maren, Fort Collins;
Jeff J. Kato, Greeley, both of Colo.
 Assignee: Hewlett-Packard Company, Palo Alto, Calif.
 Appl. No.: 109,969
 Filed: Oct. 19, 1987
 Int. Cl.4 H03M 7/30
 U.S. Cl. 341/94; 341/51;
 Field of Search 340/347 DD; 358/260,
358/261, 257, 263, 280; 382/56; 235/310; 364/926.1, 926.2, 926.3, 926.6; 341/51, 55, 63,
71, 87, 94, 95
 References Cited
U.S. PATENT DOCUMENTS
4,145,686 3/1979 McMurray et al 340/347 DD
4,486,784 12/1984 Abraham et al 340/347 DD
4,654,877 3/1987 Shimoni et al 382/56
4,719,514 1/1988 Kurahayashi et al 358/260
4,772,955 9/1988 Kurahayashi et al 358/257
Primary Examiner—William M. Shoop, Jr.
Assistant Examiner—Brian K. Young
Attorney, Agent, or Firm—William W. Cochran, II
A data compression system implementing expansion protection employs one or more pairs of FIFOs to compare the lengths of raw and processed versions of a block of received data. The shorter version is transmitted so that the data transmitted by the data compression system is at most negligibly expanded relative to the system input. A code is inserted in the output stream to indicate the beginning of the transmission of a raw data block so that a receiving or retrieving system can determine whether the data following needs to be decompressed or not. Further codes can be injected to indicate a switch from raw data to processed data in the output of the compression system.
6 Claims, 2 Drawing Sheets
DATA COMPRESSION SYSTEM WITH EXPANSION PROTECTION
BACKGROUND OF THE INVENTION
The present invention relates to signal processing and, more particularly, to data compression.
Data compression is the reversible re-encoding of information into a more compact expression. This more compact expression permits information to be stored and/or communicated more efficiently, generally saving both time and expense. A typical encoding scheme, e.g., based on ASCII, encodes alphanumeric characters and other symbols into binary sequences. A major class of compression schemes encodes symbol combinations using binary sequences not otherwise used to encode individual symbols. Compression is effected to the degree that the symbol combinations represented in the encoding scheme are encountered in a given text or other file. By analogy with bilingual dictionaries used to translate between human languages, the device that embodies the mapping of uncompressed code into compressed code is commonly referred to as a "dictionary".
The present invention is primarily applicable to dictionary-based compression schemes, which are part of a larger class of sequential compression schemes. These are contrasted with non-sequential schemes which examine an entire file before determining the encoding to be used. Other sequential compression schemes, such as run-length limited (RLL) compression, can be used in conjunction with adaptive schemes.
Generally, the usefulness of a dictionary-based compression scheme is dependent on the frequency with which the symbol-combination entries in the dictionary are matched as a given file is being compressed. A dictionary optimized for one file type is unlikely to be optimized for another. For example, a dictionary which includes a large number of symbol combinations likely to be found in newspaper text files is unlikely to compress effectively data base files, spreadsheet files, bit-mapped graphics files, computer-aided design files, Musical Instrument Digital Interface (MIDI) files, etc.
Thus, a strategy using a single fixed dictionary might be best tied to a single application program. A more sophisticated strategy can incorporate means for identifying file types and selecting among a predetermined set of dictionaries accordingly. Even the more sophisticated fixed dictionary schemes are limited by the requirement that a file to be compressed must be matched to one of a limited number of dictionaries. Furthermore, there is no widely accepted standard for identifying file types, essentially limiting multiple dictionary schemes to specific applications or manufacturers.
Adaptive compression schemes are known in which the dictionary used to compress a given file is developed as that file is being compressed. Entries are made into a dictionary as symbol combinations are encountered in the file. The entries are used on subsequent occurrences of an encoded combination. Compression is effected to the extent that the symbol combinations occurring most frequently in the file are encountered as the dictionary is developing. Systems incorporating adaptive compression schemes can include means for clearing the dictionary between files so that the dictionary can be adapted on a file-by-file basis.
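The way an adaptive dictionary grows while a file is being compressed can be illustrated with a minimal sketch. It follows the general LZW-style approach associated with the Welch reference; it is not taken from this patent, and the function name `lzw_compress`, the 256-entry starting dictionary, and the unbounded dictionary size are assumptions made for illustration.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode `data` as a list of integer codes using an adaptive dictionary.

    Illustrative sketch only: the dictionary is never cleared or size-limited,
    and codes are returned as Python integers rather than packed into bits.
    """
    # Assumption: start with one entry per single byte value (codes 0..255).
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            # Known combination: keep extending the current match.
            current = candidate
        else:
            # Emit the code for the longest known match, then record the
            # newly encountered combination for later occurrences.
            codes.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes
```

For the input `b"ABABABA"` this sketch emits four codes for seven symbols, because the combinations "AB" and "ABA" are entered into the dictionary on first sight and reused on later occurrences.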
Adaptive compression systems and methods are disclosed in U.S. Pat. No. 4,464,650 to Eastman et al. and U.S. Pat. No. 4,558,302 to Welch. These references explain further the use of dictionaries in both adaptive and non-adaptive compression strategies. Further pertinent references to compression strategies include: G. Held, "Data Compression: Techniques and Applications, Hardware and Software Considerations", Wiley, 1983; R. G. Gallager, "Variations on a Theme by Huffman", IEEE Transactions on Information Theory, Vol. IT-24, No. 6, pp. 668-674, November 1978; J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression", IEEE Transactions on Information Theory, Vol. IT-23, No. 3, pp. 337-343, May 1977; J. Ziv and A. Lempel, "Compression of Individual Sequences via Variable-Rate Coding", IEEE Transactions on Information Theory, Vol. IT-24, No. 5, pp. 530-536, September 1978; and T. A. Welch, "A Technique for High Performance Data Compression", IEEE Computer, June 1984.
A disadvantage of such adaptive compression techniques is that in some cases they can expand rather than compress the data. In fact, expansion is the rule rather than the exception when an adaptive compression scheme is used to compress a file which has already been compressed by that scheme. As data compression becomes more widely employed, the chances of data expansion due to an attempted compression of a previously compressed file increase. For example, an application program can include a dedicated compression scheme so that files created by the program can be stored efficiently on a hard disk drive. Likewise, a tape drive system for backing up a hard disk can include a data compression scheme in hardware for more efficient archiving of the hard disk drive. In this situation, attempting data compression during archiving can result in data expansion rather than contraction.
As data compression becomes more common, this counterproductive scenario becomes less the exception and more the rule. If data compression is to be implemented in hardware so that it operates irrespective of the type of data being compressed, it becomes necessary to protect against unintended data expansion. Of course, this protection must not interfere with the process of decompression that must occur upon the reception or retrieval of compressed data.
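The expansion that results from compressing previously compressed data can be observed with a short experiment. The sketch below uses Python's standard `zlib` module (DEFLATE) purely as a stand-in compressor; it is not the adaptive scheme contemplated by this patent, and the sample text and the helper name `compressed_sizes` are assumptions made for illustration.

```python
import zlib

# Hypothetical sample input: ordinary prose, which compresses well once.
TEXT = (
    b"Data compression is the reversible re-encoding of information into "
    b"a more compact expression. Adaptive schemes build the dictionary "
    b"while the file is being compressed, so entries reflect the symbol "
    b"combinations that actually occur in that file. A tape drive that "
    b"compresses in hardware may nevertheless receive files that an "
    b"application program has already compressed before storing them."
)


def compressed_sizes(data: bytes) -> tuple[int, int, int]:
    """Return the lengths of the raw, once-compressed, and twice-compressed data."""
    once = zlib.compress(data)    # first pass: removes most redundancy
    twice = zlib.compress(once)   # second pass: little redundancy is left
    return len(data), len(once), len(twice)


raw_len, once_len, twice_len = compressed_sizes(TEXT)
print(raw_len, once_len, twice_len)
```

With input of this kind the second pass is expected to add framing overhead without finding further redundancy, so the twice-compressed length exceeds the once-compressed length.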
SUMMARY OF THE INVENTION
The present invention provides expansion protection in a data compression system by selecting the shorter of (1) the "raw" data as received by the system, and (2) the "processed" data as processed by an incorporated data compressor. A data compression system in accordance with the present invention includes a data compressor, a control function, and one or more pairs of buffers. Each pair of buffers includes a buffer for raw data and a buffer for processed data. When the raw data buffer is first to fill, data is transmitted from the processed data buffer, and vice-versa.
As just indicated, when a processed data buffer is first to fill, the contents of the raw data buffer are to be transferred. Prior to this transmission, a code can be injected to indicate that the data following is raw data. Upon reception or retrieval of the data file, this code can be used by a decompression system to determine when decompression is required and when it is not. Additional codes can be used to indicate the resumption or non-resumption of data compression.
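The selection between raw and processed data, together with the injected code, can be sketched in software. This is a simplified block-by-block analogue under stated assumptions, not the FIFO-based hardware described here: `zlib` stands in for the data compressor, each block is compressed independently, and the marker values `RAW` and `PROCESSED`, the `BLOCK_SIZE`, and the 3-byte header layout are all hypothetical.

```python
import struct
import zlib

RAW, PROCESSED = 0x00, 0x01  # hypothetical marker codes
BLOCK_SIZE = 512             # hypothetical block length in bytes


def protect(data: bytes) -> bytes:
    """Emit, for each block, whichever of the raw or processed version is shorter."""
    out = bytearray()
    for start in range(0, len(data), BLOCK_SIZE):
        raw = data[start:start + BLOCK_SIZE]
        processed = zlib.compress(raw)  # stand-in compressor (assumption)
        if len(processed) < len(raw):
            flag, payload = PROCESSED, processed
        else:
            flag, payload = RAW, raw
        # Header: 1-byte marker code followed by a 2-byte payload length.
        out += struct.pack(">BH", flag, len(payload)) + payload
    return bytes(out)


def restore(stream: bytes) -> bytes:
    """Use the marker code to decide whether each block needs decompression."""
    out = bytearray()
    pos = 0
    while pos < len(stream):
        flag, length = struct.unpack_from(">BH", stream, pos)
        pos += 3
        payload = stream[pos:pos + length]
        pos += length
        out += zlib.decompress(payload) if flag == PROCESSED else payload
    return bytes(out)
```

Because each block's payload is never longer than its raw form, the output of this sketch exceeds the input by at most the 3-byte header per block, which corresponds to the "at most negligibly expanded" behavior described in the abstract.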
Accordingly, the present invention provides for expansion protection with minimal performance over