This invention provides a method and apparatus for detecting common spans within one or more data blocks by partitioning the blocks (FIG. 4) into subblocks and searching the group of subblocks (FIG. 12) (or their corresponding hashes (FIG. 13)) for duplicates. Blocks can be partitioned into subblocks using a variety of methods, including methods that place subblock boundaries at fixed positions (FIG. 3), methods that place subblock boundaries at data-dependent positions (FIG. 3), and methods that yield multiple overlapping subblocks (FIG. 6). By comparing the hashes of subblocks, common spans of one or more blocks can be identified without ever having to compare the blocks or subblocks themselves (FIG. 13). This leads to several applications including an incremental backup system that backs up changes rather than changed files (FIG. 25), a utility that determines the similarities and differences between two files (FIG. 13), a file system that stores each unique subblock at most once...
Claims
1. A method for organizing a block b of digital data for storage, communication, or comparison, comprising the step of:
- partitioning said block b into a plurality of subblocks at at least one position k|k+1 within said block,
- for which b[k-A+1 . . . k+B] satisfies a predetermined constraint, and
- wherein A and B are natural numbers.
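By way of non-limiting illustration, the partitioning step of claim 1 may be sketched in Python as follows. The choice of constraint (CRC-32 of the window divisible by 16), the values A = 3 and B = 1, and the names `constraint`, `boundaries`, and `partition` are assumptions made for this example only and are not taken from the claims.

```python
import zlib

def constraint(window, modulus=16):
    # Assumed constraint for illustration: CRC-32 of the window is divisible by modulus.
    return zlib.crc32(window) % modulus == 0

def boundaries(b, A=3, B=1):
    # k counts the bytes to the left of the boundary k|k+1, so the claim's
    # 1-based span b[k-A+1 ... k+B] corresponds to the 0-based slice b[k-A:k+B].
    return [k for k in range(A, len(b) - B + 1)
            if k < len(b) and constraint(b[k - A:k + B])]

def partition(b, A=3, B=1):
    # Cut the block at every position whose surrounding window satisfies the constraint.
    edges = [0] + boundaries(b, A, B) + [len(b)]
    return [b[i:j] for i, j in zip(edges, edges[1:])]

if __name__ == "__main__":
    data = bytes(range(256)) * 4
    print([len(s) for s in partition(data)])
```

Because each boundary depends only on the A + B bytes around it, inserting bytes at the front of a block shifts the existing boundaries along with the data rather than moving them relative to it.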
2. The method of claim 1, wherein the constraint comprises the hash of at least a portion of b[k-A+1 . . . k+B].
3. The method of claim 1, further comprising the step of:
- locating the nearest subblock boundary on a side of a position p|p+1 within said block, said locating step comprising the step of:
- evaluating whether said predetermined constraint is satisfied at each position k|k+1 for increasing or decreasing k,
- wherein k starts with the value p.
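The locating step of claim 3 may, for example, be sketched as a scan outward from p. The CRC-32 constraint, the default values A = 3 and B = 1, and the names `constraint` and `nearest_boundary` are hypothetical and serve only to make the sketch runnable.

```python
import zlib

def constraint(window, modulus=16):
    # Assumed constraint for illustration: CRC-32 of the window is divisible by modulus.
    return zlib.crc32(window) % modulus == 0

def nearest_boundary(b, p, step=1, A=3, B=1):
    """Return the first k in p, p+step, p+2*step, ... at which k|k+1 is a boundary.

    step=+1 scans for increasing k, step=-1 for decreasing k.
    Returns None when no boundary exists on that side of p.
    """
    if step not in (1, -1):
        raise ValueError("step must be +1 or -1")
    k = p
    while 0 < k < len(b):
        # The window b[k-A:k+B] must lie wholly inside the block.
        fits = k - A >= 0 and k + B <= len(b)
        if fits and constraint(b[k - A:k + B]):
            return k
        k += step
    return None
```

Only the windows between p and the boundary found are evaluated, so the nearest boundary can be located without partitioning the whole block.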
4. The method of claim 1, wherein at least one bound is imposed on the size of at least one of said plurality of subblocks.
5. The method of claim 1, wherein additional subblocks are formed from at least one group of subblocks.
6. The method of claim 1, wherein an additional hierarchy of subblocks is formed from at least one group of contiguous subblocks.
7. The method of claim 1, further comprising the step of:
- calculating the hash of each of at least one of said plurality of subblocks.
8. The method of claim 1, further comprising the step of:
- forming a projection of said block, being an ordered or unordered collection of elements, wherein each element consists of a subblock, an identity of a subblock, or a reference of a subblock.
9. The method of claim 1, wherein said subblocks are compared by comparing the hashes of said subblocks.
10. The method of claim 1, wherein subsets of identical subblocks within a group of one or more subblocks are found by inserting each subblock, an identity of each subblock, a reference of each subblock, or a hash of each subblock into a data structure.
11. A method for comparing one or more blocks, comprising the steps of:
- organizing a block b of digital data for the purpose of comparison, comprising the step of:
- partitioning said block b into a plurality of subblocks at at least one position k|k+1 within said block;
- for which b[k-A+1 . . . k+B] satisfies a predetermined constraint; and
- wherein A and B are natural numbers,
- forming a projection of each said block, being a collection of elements, wherein each element comprises a selected one of a subblock, an identity of a subblock, and a reference of a subblock, and
- comparing the elements of said projections of said blocks.
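One possible reading of the comparison of claim 11 is sketched below: each block is projected onto the ordered list of its subblock hashes, and the hashes of one projection are inserted into a set that is then probed with the other. SHA-256 as the identity, the CRC-32 boundary constraint, and the names `partition`, `projection`, and `shared_subblocks` are assumptions of this example.

```python
import hashlib
import zlib

def partition(b, A=3, B=1, modulus=16):
    # Assumed data-dependent partitioning: cut at k where CRC-32 of b[k-A:k+B]
    # is divisible by modulus.
    cuts = [k for k in range(A, len(b) - B + 1)
            if k < len(b) and zlib.crc32(b[k - A:k + B]) % modulus == 0]
    edges = [0] + cuts + [len(b)]
    return [b[i:j] for i, j in zip(edges, edges[1:])]

def projection(b):
    # Ordered collection whose elements are subblock identities (hashes).
    return [hashlib.sha256(s).digest() for s in partition(b)]

def shared_subblocks(x, y):
    # Insert the identities of y's subblocks into a set, then probe it with
    # the identities of x's subblocks; the subblocks are never compared directly.
    seen = set(projection(y))
    return [s for s in partition(x) if hashlib.sha256(s).digest() in seen]
```

A set is used here because only membership is needed; a dictionary from hash to position would serve equally well where the location of each common span is required.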
12. A method for representing one or more blocks comprising a collection of subblocks and block representatives which are mapped to lists of entries which identify subblocks; said method comprising the step of modifying one of said blocks including the steps of:
- partitioning said block into a plurality of subblocks at at least one position k|k+1 within said block, for which b[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers,
- adding to said collection of subblocks zero or more subblocks which are not already in said collection, and
- updating said subblock list associated with said modified block.
13. A method for representing one or more blocks comprising a collection of subblocks and block representatives which are mapped to lists of entries which identify subblocks; said method comprising the step of modifying one of said blocks including the steps of:
- partitioning said block into a plurality of subblocks at at least one position k|k+1 within said block, for which b[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers,
- removing from said collection of subblocks zero or more subblocks, and
- updating said subblock list associated with said modified block.
14. A method for representing one or more blocks comprising a collection of subblocks and block representatives which are mapped to lists of entries which identify subblocks; said method comprising the step of modifying one of said blocks including the steps of:
- partitioning said block into a plurality of subblocks at at least one position k|k+1 within said block, for which b[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers,
- adding to said collection of subblocks zero or more subblocks that are not already in the collection,
- removing from said collection of subblocks zero or more subblocks, and
- updating said subblock list associated with said modified block.
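The representation of claims 12 to 14 may be illustrated by a small in-memory store in which a dictionary holds each unique subblock once and each block name maps to a list of subblock hashes. The class `SubblockStore`, its method names, the use of SHA-256, and the policy of discarding unreferenced subblocks on every write are all assumptions of this sketch.

```python
import hashlib
import zlib

def partition(b, A=3, B=1, modulus=16):
    # Assumed data-dependent partitioning (CRC-32 of b[k-A:k+B] divisible by modulus).
    cuts = [k for k in range(A, len(b) - B + 1)
            if k < len(b) and zlib.crc32(b[k - A:k + B]) % modulus == 0]
    edges = [0] + cuts + [len(b)]
    return [b[i:j] for i, j in zip(edges, edges[1:])]

class SubblockStore:
    def __init__(self):
        self.subblocks = {}  # hash -> subblock contents (the collection of subblocks)
        self.blocks = {}     # block name -> list of hashes (the subblock list)

    def write(self, name, content):
        ids = []
        for s in partition(content):
            h = hashlib.sha256(s).digest()
            self.subblocks.setdefault(h, s)  # add only if not already present
            ids.append(h)
        self.blocks[name] = ids              # update the list for the modified block
        # Remove subblocks that no block list refers to any longer.
        live = {h for lst in self.blocks.values() for h in lst}
        for h in list(self.subblocks):
            if h not in live:
                del self.subblocks[h]

    def read(self, name):
        return b"".join(self.subblocks[h] for h in self.blocks[name])
```

Writing two blocks that share long spans stores the shared subblocks once, and rewriting a block only adds the subblocks that are new to the collection.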
15. A method for an entity E1 to communicate a block X to E2 where E1 possesses the knowledge that E2 possesses a group Y of subblocks Y_1 . . . Y_m, comprising the steps of:
- partitioning said block X into a plurality of subblocks X_1 . . . X_n at at least one position k|k+1 within said block, for which X[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers, and
- transmitting from E1 to E2 the contents of zero or more subblocks in X_1 and the remaining subblocks as references to subblocks in Y_1 . . . Y_m, and to subblocks transmitted.
16. A method for an entity E1 to communicate one or more subblocks of a group X of subblocks X_1 . . . X_n to E2 where E1 possesses the knowledge that E2 possesses a block Y, comprising the steps of:
- partitioning said block Y into a plurality of subblocks Y_1 . . . Y_m at at least one position k|k+1 within said block, for which Y[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers, and
- transmitting from E1 to E2 the contents of zero or more subblocks in X, and the remaining subblocks as references to subblocks in Y, and to subblocks already transmitted.
17. A method for an entity E1 to communicate a block X to E2 where E1 possesses the knowledge that E2 possesses a block Y, comprising the steps of:
- partitioning said block X into a plurality of subblocks X_1 . . . X_n at at least one position k|k+1 within said block, for which X[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers,
- partitioning said block Y into a plurality of subblocks Y_1 . . . Y_m at at least one position k|k+1 within said block, for which Y[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers, and
- transmitting from E1 to E2 the contents of zero or more subblocks in X, and the remaining subblocks as references to subblocks in Y, and to subblocks already transmitted.
18. A method for constructing a block D from a block X and a group Y of subblocks Y_1 . . . Y_m such that X can be constructed from Y and D, comprising the steps of:
- partitioning said block X into a plurality of subblocks X_1 . . . X_n at at least one position k|k+1 within said block, for which X[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers, and
- constructing D from a selected at least one of:
- the contents of zero or more subblocks in X,
- references to zero or more subblocks in Y, and
- references to zero or more subblocks in D.
19. A method for constructing a block D from a group X of subblocks X_1 . . . X_n and a block Y such that X can be constructed from Y and D, comprising the steps of:
- partitioning said block Y into a plurality of subblocks Y_1 . . . Y_m at at least one position k|k+1 within said block, for which Y[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers, and
- constructing D from a selected at least one of:
- the contents of zero or more subblocks in X,
- references to zero or more subblocks in Y, and
- references to zero or more subblocks in D.
20. A method for constructing a block D from a block X and a block Y such that X can be constructed from Y and D, comprising the steps of:
- partitioning said block X into a plurality of subblocks X_1 . . . X_n at at least one position k|k+1 within said block, for which X[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers,
- partitioning said block Y into a plurality of subblocks Y_1 . . . Y_m at at least one position k|k+1 within said block, for which Y[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers, and
- constructing D from a selected at least one of:
- the contents of zero or more subblocks in X,
- references to zero or more subblocks in Y, and
- references to zero or more subblocks in D.
21. A method for constructing a block D from a block X and a projection Y, said projection comprising a collection of elements wherein each of said elements comprises a subblock in Y, an identity of a subblock in Y, or a reference of a subblock in Y, such that X can be constructed from Y and D, comprising the steps of:
- partitioning said block X into a plurality of subblocks X_1 . . . X_n at at least one position k|k+1 within said block, for which X[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers, and
- constructing D from a selected at least one of:
- the contents of zero or more subblocks in X,
- references to zero or more subblocks in Y, and
- references to zero or more subblocks in D.
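The construction of D recited in claims 18 to 21 may be sketched as follows, with D represented as a list of tagged entries. The entry tags `"data"`, `"ref_y"`, and `"ref_d"`, the function name `make_delta`, the SHA-256 identity, and the CRC-32 boundary constraint are hypothetical choices made for this example.

```python
import hashlib
import zlib

def partition(b, A=3, B=1, modulus=16):
    # Assumed data-dependent partitioning (CRC-32 of b[k-A:k+B] divisible by modulus).
    cuts = [k for k in range(A, len(b) - B + 1)
            if k < len(b) and zlib.crc32(b[k - A:k + B]) % modulus == 0]
    edges = [0] + cuts + [len(b)]
    return [b[i:j] for i, j in zip(edges, edges[1:])]

def make_delta(x, y):
    # Index the subblocks of Y by hash, keeping the first position of each.
    y_index = {}
    for i, s in enumerate(partition(y)):
        y_index.setdefault(hashlib.sha256(s).digest(), i)
    d_index = {}  # hash -> position in D of a literal already emitted
    delta = []
    for s in partition(x):
        h = hashlib.sha256(s).digest()
        if h in y_index:
            delta.append(("ref_y", y_index[h]))  # reference to a subblock in Y
        elif h in d_index:
            delta.append(("ref_d", d_index[h]))  # reference to a subblock in D
        else:
            d_index[h] = len(delta)
            delta.append(("data", s))            # contents of a subblock in X
    return delta
```

When X differs from Y by a small local edit, most entries of D are short references, and only the subblocks touched by the edit are carried as contents.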
22. A method for constructing a block X from a block Y and a block D, comprising the steps of:
- partitioning said block Y into a plurality of subblocks Y_1 . . . Y_m at at least one position k|k+1 within said block, for which Y[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers, and
- constructing X from D and Y by constructing the subblocks of X based on a selected at least one of:
- subblocks contained within D,
- references in D to subblocks in Y, and
- references in D to subblocks in D.
23. A method for constructing a group X of subblocks X_1 . . . X_n from a block Y and a block D, comprising the steps of:
- partitioning said block Y into a plurality of subblocks Y_1 . . . Y_m at at least one position k|k+1 within said block, for which Y[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers, and
- constructing X_1 . . . X_n from D and Y based on a selected at least one of:
- subblocks contained within D,
- references in D to subblocks in Y, and
- references in D to subblocks in D.
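The reconstruction of claims 22 and 23 may be illustrated as follows, assuming D is a list of tagged entries in which `"data"` carries subblock contents, `"ref_y"` carries an index into the subblocks of Y, and `"ref_d"` carries the index of an earlier `"data"` entry of D. This entry format and the names `apply_delta` and `partition` are assumptions of the sketch rather than a format defined by the claims.

```python
import zlib

def partition(b, A=3, B=1, modulus=16):
    # Assumed data-dependent partitioning (CRC-32 of b[k-A:k+B] divisible by modulus).
    cuts = [k for k in range(A, len(b) - B + 1)
            if k < len(b) and zlib.crc32(b[k - A:k + B]) % modulus == 0]
    edges = [0] + cuts + [len(b)]
    return [b[i:j] for i, j in zip(edges, edges[1:])]

def apply_delta(y, delta):
    ys = partition(y)  # Y must be partitioned exactly as it was when D was built
    subblocks = []
    for kind, value in delta:
        if kind == "data":
            subblocks.append(value)            # subblock contained within D
        elif kind == "ref_y":
            subblocks.append(ys[value])        # reference in D to a subblock in Y
        elif kind == "ref_d":
            ref_kind, ref_value = delta[value]
            if ref_kind != "data":
                raise ValueError("ref_d must point at a data entry")
            subblocks.append(ref_value)        # reference in D to a subblock in D
        else:
            raise ValueError("unknown entry kind: %r" % (kind,))
    return b"".join(subblocks)
```

Both sides must use the same A, B, and constraint, since the references in D are only meaningful against an identical partitioning of Y.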
24. A method for communicating a data block X from one entity E1 to another entity E2, comprising the steps of:
- partitioning said block X into a plurality of subblocks X_1 . . . X_n at at least one position k|k+1 within said block, for which X[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers,
- transmitting from E1 to E2 an identity of at least one subblock,
- transmitting from E2 to E1 information communicating the presence or absence of subblocks at E2, and
- transmitting from E1 to E2 at least the subblocks identified as not being present at E2.
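The three transmissions of claim 24 may be simulated in a single process as shown below. Each function stands for one message or for the final assembly at E2; the function names, the use of SHA-256 digests as subblock identities, and the dictionary standing in for E2's store are assumptions of this sketch.

```python
import hashlib
import zlib

def partition(b, A=3, B=1, modulus=16):
    # Assumed data-dependent partitioning (CRC-32 of b[k-A:k+B] divisible by modulus).
    cuts = [k for k in range(A, len(b) - B + 1)
            if k < len(b) and zlib.crc32(b[k - A:k + B]) % modulus == 0]
    edges = [0] + cuts + [len(b)]
    return [b[i:j] for i, j in zip(edges, edges[1:])]

def ident(s):
    return hashlib.sha256(s).digest()

def e1_offer(x):
    # Message 1 (E1 to E2): identities of the subblocks of X, in order.
    return [ident(s) for s in partition(x)]

def e2_report_absent(offer, e2_store):
    # Message 2 (E2 to E1): the offered identities that E2 does not hold.
    return [h for h in dict.fromkeys(offer) if h not in e2_store]

def e1_send(x, absent):
    # Message 3 (E1 to E2): contents of the subblocks reported absent.
    by_id = {ident(s): s for s in partition(x)}
    return {h: by_id[h] for h in absent}

def e2_assemble(offer, e2_store, received):
    # E2 adds the received subblocks to its store and rebuilds X from the offer.
    e2_store.update(received)
    return b"".join(e2_store[h] for h in offer)
```

The contents of a subblock cross the link only when E2 reports it absent, so the traffic is roughly the list of identities plus the subblocks that are new to E2.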
25. A method for communicating a block X from one entity E1 to another entity E2, comprising the steps of:
- partitioning said block X into a plurality of subblocks X_1 . . . X_n at at least one position k|k+1 within said block, for which X[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers,
- transmitting from E2 to E1 information communicating the presence or absence at E2 of members of a group Y of subblocks Y_1 . . . Y_m, and
- transmitting from E1 to E2 the contents of zero or more subblocks in X, and the remaining subblocks as references to subblocks in Y_1 . . . Y_m and to subblocks already transmitted.
26. A method for an entity E2 to communicate to an entity E1 the fact that E2 possesses a block Y, comprising the steps of:
- partitioning said block Y into a plurality of subblocks Y_1 . . . Y_m at at least one position k|k+1 within said block, for which Y[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers, and
- transmitting from E2 to E1 references of the subblocks Y_1 . . . Y_m.
27. A method for an entity E1 to communicate a subblock X_i to an entity E2, comprising the steps of:
- partitioning said block X into a plurality of subblocks X_1 . . . X_n at at least one position k|k+1 within said block, for which X[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers,
- transmitting from E2 to E1 an identity of X_i,
- transmitting X_i from E1 to E2.
28. An apparatus for organizing a block b of digital data for storage, communication, or comparison, comprising
- means for partitioning said block b into a plurality of subblocks at at least one position k|k+1 within said block, for which b[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers.
29. The apparatus of claim 28, in which the constraint comprises the hash of some or all of b[k-A+1 . . . k+B].
30. The apparatus of claim 28, further comprising
- means for locating the nearest subblock boundary on a side of a position p|p+1 within said block, said means for locating comprising:
- means for evaluating whether said predetermined constraint is satisfied at each position k|k+1 for increasing or decreasing k,
- wherein k starts with the value p.