
Patents

This invention provides a method and apparatus for detecting common spans within one or more data blocks by partitioning the blocks (FIG. 4) into subblocks and searching the group of subblocks (FIG. 12) (or their corresponding hashes (FIG. 13)) for duplicates. Blocks can be partitioned into subblocks using a variety of methods, including methods that place subblock boundaries at fixed positions (FIG. 3), methods that place subblock boundaries at data-dependent positions (FIG. 3), and methods that yield multiple overlapping subblocks (FIG. 6). By comparing the hashes of subblocks, common spans of one or more blocks can be identified without ever having to compare the blocks or subblocks themselves (FIG. 13). This leads to several applications including an incremental backup system that backs up changes rather than changed files (FIG. 25), a utility that determines the similarities and differences between two files (FIG. 13), a file system that stores each unique subblock at most once...

Inventor: Ross Neil Williams
Current U.S. Classification: 341/51; 341/67; 707/E17.01
International Classification: H03M 7/00


Citations

Cited Patent | Filing date | Issue date | Original Assignee | Title
US4698628 | Oct 4, 1985 | Oct 6, 1987 | Siemens Aktiengesellschaft | Method and apparatus for transmission of data with data reduction
US5235623 | Nov 14, 1990 | Aug 10, 1993 | NEC Corporation | Adaptive transform coding by selecting optimum block lengths according to variations between successive blocks
US5479654 | Mar 30, 1993 | Dec 26, 1995 | Squibb Data Systems, Inc. | Apparatus and method for reconstructing a file from a difference signature and an original file

Claims

1. A method for organizing a block b of digital data for storage, communication, or comparison, comprising the step of:

partitioning said block b into a plurality of subblocks at at least one position k|k+1 within said block,
for which b[k-A+1 . . . k+B] satisfies a predetermined constraint, and
wherein A and B are natural numbers.

2. The method of claim 1, wherein the constraint comprises the hash of at least a portion of b[k-A+1 . . . k+B].
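The partitioning of claims 1 and 2 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the choice of CRC-32 as the hash, the test "hash modulo `divisor` equals zero" as the predetermined constraint, the default values of `A`, `B`, and `divisor`, and the function names are all assumptions made for illustration. A 0-based cut offset `c` corresponds to the claim's position k|k+1 with k = c, so the window b[k-A+1 . . . k+B] becomes the slice `b[c - A:c + B]`.

```python
import zlib

def boundaries(b: bytes, A: int = 3, B: int = 1, divisor: int = 16) -> list[int]:
    """Return every cut offset c whose surrounding window (A bytes before the
    cut, B bytes after it) satisfies the assumed hash constraint."""
    return [
        c
        for c in range(A, len(b) - B + 1)
        if zlib.crc32(b[c - A:c + B]) % divisor == 0
    ]

def partition(b: bytes, A: int = 3, B: int = 1, divisor: int = 16) -> list[bytes]:
    """Split b into subblocks at the data-dependent boundaries."""
    edges = [0] + boundaries(b, A, B, divisor) + [len(b)]
    return [b[i:j] for i, j in zip(edges, edges[1:])]

# Deterministic sample data, and the same data with one byte inserted in front.
data = bytes((i * 37 + (i >> 3)) % 251 for i in range(2000))
shifted = b"!" + data
```

Because each boundary depends only on the A+B bytes around it, inserting a byte at the front of `data` moves every boundary by exactly one offset instead of changing the content of every later subblock, which is what happens with boundaries at fixed positions.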

3. The method of claim 1, further comprising the step of:

locating the nearest subblock boundary on a side of a position p|p+1 within said block, said locating step comprising the step of:
evaluating whether said predetermined constraint is satisfied at each position k|k+1 for increasing or decreasing k,
wherein k starts with the value p.
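The locating step of claim 3 might be sketched as follows, under the same illustrative assumptions as any hash-based constraint: CRC-32 modulo a `divisor` stands in for the predetermined constraint, and `satisfies`, `nearest_boundary`, and their parameters are hypothetical names. The scan starts at p and moves in one direction until the constraint holds, so the boundary nearest to an arbitrary position can be found without partitioning the whole block.

```python
import zlib
from typing import Optional

def satisfies(b: bytes, c: int, A: int, B: int, divisor: int) -> bool:
    """Assumed constraint: CRC-32 of the window around cut c is 0 mod divisor."""
    return zlib.crc32(b[c - A:c + B]) % divisor == 0

def nearest_boundary(b: bytes, p: int, forward: bool = True,
                     A: int = 3, B: int = 1, divisor: int = 16) -> Optional[int]:
    """Return the first cut offset at or after p (forward) or at or before p
    (backward) that satisfies the constraint, or None if there is none."""
    lo, hi = A, len(b) - B          # offsets where a full window exists
    step = 1 if forward else -1
    c = max(p, lo) if forward else min(p, hi)
    while lo <= c <= hi:
        if satisfies(b, c, A, B, divisor):
            return c
        c += step
    return None
```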

4. The method of claim 1, wherein at least one bound is imposed on the size of at least one of said plurality of subblocks.
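One way to impose the size bounds of claim 4 is a greedy scan that ignores constraint hits falling too close to the previous boundary and forces a boundary once a subblock reaches a maximum size. The sketch below is an assumption-laden illustration: `partition_bounded`, `min_size`, `max_size`, and the CRC-32 constraint are hypothetical and are not taken from the patent.

```python
import zlib

def partition_bounded(b: bytes, min_size: int = 4, max_size: int = 64,
                      A: int = 3, B: int = 1, divisor: int = 16) -> list[bytes]:
    """Partition b at data-dependent boundaries, keeping every subblock
    (except possibly the last) between min_size and max_size bytes."""
    parts, start = [], 0
    for c in range(1, len(b)):
        size = c - start
        # Assumed constraint on the window of A bytes before and B bytes after c.
        hit = (A <= c <= len(b) - B
               and zlib.crc32(b[c - A:c + B]) % divisor == 0)
        if size >= max_size or (hit and size >= min_size):
            parts.append(b[start:c])
            start = c
    parts.append(b[start:])  # trailing subblock; may be shorter than min_size
    return parts
```

The lower bound avoids a flood of tiny subblocks when the constraint happens to fire often, and the upper bound keeps highly regular data (where the constraint may never fire) from producing one enormous subblock.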

5. The method of claim 1, wherein additional subblocks are formed from at least one group of subblocks.

6. The method of claim 1, wherein an additional hierarchy of subblocks is formed from at least one group of contiguous subblocks.

7. The method of claim 1, further comprising the step of:

calculating the hash of each of at least one of said plurality of subblocks.

8. The method of claim 1, further comprising the step of:

forming a projection of said block, being an ordered or unordered collection of elements, wherein each element consists of a subblock, an identity of a subblock, or a reference of a subblock.

9. The method of claim 1, wherein said subblocks are compared by comparing the hashes of said subblocks.

10. The method of claim 1, wherein subsets of identical subblocks within a group of one or more subblocks are found by inserting each subblock, an identity of each subblock, a reference of each subblock, or a hash of each subblock into a data structure.
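The data-structure approach of claims 9 and 10 can be sketched by inserting a hash of each subblock into a dictionary and collecting the positions that share a key. SHA-256 and the name `duplicate_groups` are assumptions chosen for illustration, and the sketch treats equal hashes as equal subblocks, which holds only up to the (negligible) chance of a collision.

```python
import hashlib
from collections import defaultdict

def duplicate_groups(subblocks: list[bytes]) -> list[list[int]]:
    """Group the indices of identical subblocks by comparing hashes only."""
    index: dict[bytes, list[int]] = defaultdict(list)
    for i, s in enumerate(subblocks):
        index[hashlib.sha256(s).digest()].append(i)
    # Keep only the hashes that were seen more than once.
    return [positions for positions in index.values() if len(positions) > 1]

groups = duplicate_groups([b"ab", b"cd", b"ab", b"ef", b"cd", b"ab"])
# groups == [[0, 2, 5], [1, 4]]
```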

11. A method for comparing one or more blocks, comprising the steps of:

organizing a block b of digital data for the purpose of comparison, comprising the step of:
partitioning said block b into a plurality of subblocks at at least one position k|k+1 within said block;
for which b[k-A+1 . . . k+B] satisfies a predetermined constraint; and
wherein A and B are natural numbers,
forming a projection of each said block, being a collection of elements, wherein each element comprises a selected one of a subblock, an identity of a subblock, and a reference of a subblock, and
comparing the elements of said projections of said blocks.

12. A method for representing one or more blocks comprising a collection of subblocks and block representatives which are mapped to lists of entries which identify subblocks; said method comprising the step of modifying one of said blocks including the steps of:

partitioning said block into a plurality of subblocks at at least one position k|k+1 within said block, for which b[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers,
adding to said collection of subblocks zero or more subblocks which are not already in said collection, and
updating said subblock list associated with said modified block.

13. A method for representing one or more blocks comprising a collection of subblocks and block representatives which are mapped to lists of entries which identify subblocks; said method comprising the step of modifying one of said blocks including the steps of:

partitioning said block into a plurality of subblocks at at least one position k|k+1 within said block, for which b[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers,
removing from said collection of subblocks zero or more subblocks, and
updating said subblock list associated with said modified block.

14. A method for representing one or more blocks comprising a collection of subblocks and block representatives which are mapped to lists of entries which identify subblocks; said method comprising the step of modifying one of said blocks including the steps of:

partitioning said block into a plurality of subblocks at at least one position k|k+1 within said block, for which b[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers,
adding to said collection of subblocks zero or more subblocks that are not already in the collection,
removing from said collection of subblocks zero or more subblocks, and
updating said subblock list associated with said modified block.
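The representation of claims 12 to 14 (a collection of unique subblocks plus, for each block, a list of entries identifying its subblocks) could look roughly like the sketch below. The class name `SubblockStore`, the use of SHA-256 digests as the list entries, and the reference counting used to decide when a subblock may be removed are all assumptions made for illustration. Partitioning is assumed to have been done already, so `put` receives the subblocks directly.

```python
import hashlib

class SubblockStore:
    """Stores each unique subblock once; blocks are lists of subblock hashes."""

    def __init__(self) -> None:
        self.subblocks: dict[bytes, bytes] = {}    # hash -> subblock contents
        self.refcount: dict[bytes, int] = {}       # hash -> number of list entries
        self.blocks: dict[str, list[bytes]] = {}   # block name -> subblock list

    def put(self, name: str, parts: list[bytes]) -> None:
        """Create or modify a block: add new subblocks, drop unused ones,
        then update the block's subblock list."""
        hashes = [hashlib.sha256(p).digest() for p in parts]
        for h, p in zip(hashes, parts):
            self.subblocks.setdefault(h, p)        # add only if not already present
            self.refcount[h] = self.refcount.get(h, 0) + 1
        for h in self.blocks.get(name, []):        # release the old list, if any
            self.refcount[h] -= 1
            if self.refcount[h] == 0:
                del self.refcount[h], self.subblocks[h]
        self.blocks[name] = hashes

    def get(self, name: str) -> bytes:
        """Rebuild a block from its subblock list."""
        return b"".join(self.subblocks[h] for h in self.blocks[name])
```

New references are counted before the old list is released, so a subblock that appears in both the old and the new version of a block is never deleted and re-added.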

15. A method for an entity E1 to communicate a block X to E2 where E1 possesses the knowledge that E2 possesses a group Y of subblocks Y_1 . . . Y_m, comprising the steps of:

partitioning said block X into a plurality of subblocks X_1 . . . X_n at at least one position k|k+1 within said block, for which X[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers, and
transmitting from E1 to E2 the contents of zero or more subblocks in X_1 and the remaining subblocks as references to subblocks in Y_1 . . . Y_m, and to subblocks transmitted.

16. A method for an entity E1 to communicate one or more subblocks of a group X of subblocks X_1 . . . X_n to E2 where E1 possesses the knowledge that E2 possesses a block Y, comprising the steps of:

partitioning said block Y into a plurality of subblocks Y_1 . . . Y_m at at least one position k|k+1 within said block, for which Y[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers, and
transmitting from E1 to E2 the contents of zero or more subblocks in X, and the remaining subblocks as references to subblocks in Y, and to subblocks already transmitted.

17. A method for an entity E1 to communicate a block X to E2 where E1 possesses the knowledge that E2 possesses a block Y, comprising the steps of:

partitioning said block X into a plurality of subblocks X_1 . . . X_n at at least one position k|k+1 within said block, for which X[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers,
partitioning said block Y into a plurality of subblocks Y_1 . . . Y_m at at least one position k|k+1 within said block, for which Y[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers, and
transmitting from E1 to E2 the contents of zero or more subblocks in X, and the remaining subblocks as references to subblocks in Y, and to subblocks already transmitted.

18. A method for constructing a block D from a block X and a group Y of subblocks Y_1 . . . Y_m such that X can be constructed from Y and D, comprising the steps of:

partitioning said block X into a plurality of subblocks X_1 . . . X_n at at least one position k|k+1 within said block, for which X[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers, and
constructing D from a selected at least one of:
the contents of zero or more subblocks in X,
references to zero or more subblocks in Y, and
references to zero or more subblocks in D.

19. A method for constructing a block D from a group X of subblocks X_1 . . . X_n and a block Y such that X can be constructed from Y and D, comprising the steps of:

partitioning said block Y into a plurality of subblocks Y_1 . . . Y_m at at least one position k|k+1 within said block, for which Y[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers, and
constructing D from a selected at least one of:
the contents of zero or more subblocks in X,
references to zero or more subblocks in Y, and
references to zero or more subblocks in D.

20. A method for constructing a block D from a block X and a block Y such that X can be constructed from Y and D, comprising the steps of:

partitioning said block X into a plurality of subblocks X_1 . . . X_n at at least one position k|k+1 within said block, for which X[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers,
partitioning said block Y into a plurality of subblocks Y_1 . . . Y_m at at least one position k|k+1 within said block, for which Y[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers, and
constructing D from a selected at least one of:
the contents of zero or more subblocks in X,
references to zero or more subblocks in Y, and
references to zero or more subblocks in D.

21. A method for constructing a block D from a block X and a projection Y, said projection comprising a collection of elements wherein each of said elements comprises a subblock in Y, an identity of a subblock in Y, or a reference of a subblock in Y, such that X can be constructed from Y and D, comprising the steps of:

partitioning said block X into a plurality of subblocks X_1 . . . X_n at at least one position k|k+1 within said block, for which X[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers, and
constructing D from a selected at least one of:
the contents of zero or more subblocks in X,
references to zero or more subblocks in Y, and
references to zero or more subblocks in D.

22. A method for constructing a block X from a block Y and a block D, comprising the steps of:

partitioning said block Y into a plurality of subblocks Y_1 . . . Y_m at at least one position k|k+1 within said block, for which Y[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers, and
constructing X from D and Y by constructing the subblocks of X based on a selected at least one of:
subblocks contained within D,
references in D to subblocks in Y, and
references in D to subblocks in D.

23. A method for constructing a group X of subblocks X_1 . . . X_n from a block Y and a block D, comprising the steps of:

partitioning said block Y into a plurality of subblocks Y_1 . . . Y_m at at least one position k|k+1 within said block, for which Y[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers, and
constructing X_1 . . . X_n from D and Y based on a selected at least one of:
subblocks contained within D,
references in D to subblocks in Y, and
references in D to subblocks in D.
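Claims 18 to 23 describe building a difference block D from X and Y, and rebuilding X from Y and D. The round trip can be sketched as follows, assuming X and Y have already been partitioned into subblocks. The tuple encoding of D (`"raw"`, `"Y"`, `"D"` entries), the SHA-256 hash, and the function names are hypothetical choices for illustration and are not the patent's format.

```python
import hashlib

def build_delta(x_parts: list[bytes], y_parts: list[bytes]) -> list[tuple]:
    """Encode X as raw subblocks, references into Y, and references into D."""
    y_index: dict[bytes, int] = {}
    for i, s in enumerate(y_parts):
        y_index.setdefault(hashlib.sha256(s).digest(), i)
    d_index: dict[bytes, int] = {}   # hash -> position in D of a raw subblock
    delta: list[tuple] = []
    for s in x_parts:
        h = hashlib.sha256(s).digest()
        if h in y_index:
            delta.append(("Y", y_index[h]))      # reference to a subblock in Y
        elif h in d_index:
            delta.append(("D", d_index[h]))      # reference to a subblock in D
        else:
            d_index[h] = len(delta)
            delta.append(("raw", s))             # contents of a subblock in X
    return delta

def apply_delta(delta: list[tuple], y_parts: list[bytes]) -> list[bytes]:
    """Rebuild the subblocks of X from D and the subblocks of Y."""
    out: list[bytes] = []
    for kind, value in delta:
        if kind == "raw":
            out.append(value)
        elif kind == "Y":
            out.append(y_parts[value])
        else:                                    # "D": earlier entry of the delta
            out.append(out[value])
    return out

y = [b"aa", b"bb", b"cc"]
x = [b"bb", b"new", b"aa", b"new", b"zz"]
d = build_delta(x, y)
```

In this example only `b"new"` and `b"zz"` travel as raw contents; every other subblock of X is expressed as a reference.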

24. A method for communicating a data block X from one entity E1 to another entity E2, comprising the steps of:

partitioning said block X into a plurality of subblocks X_1 . . . X_n at at least one position k|k+1 within said block, for which X[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers,
transmitting from E1 to E2 an identity of at least one subblock,
transmitting from E2 to E1 information communicating the presence or absence of subblocks at E2, and
transmitting from E1 to E2 at least the subblocks identified as not being present at E2.
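The three-message exchange of claim 24 can be simulated in a single process. In the sketch below, E2's holdings are modelled as a dictionary from SHA-256 digest to subblock contents; that model, the digest as the "identity" of a subblock, and the function name `communicate` are assumptions for illustration only, and X is assumed to be partitioned already.

```python
import hashlib

def communicate(x_parts: list[bytes],
                receiver_store: dict[bytes, bytes]) -> tuple[bytes, list[bytes]]:
    """Simulate sending block X from E1 to E2; return the block as rebuilt
    at E2 and the subblocks whose contents actually had to be sent."""
    # Message 1 (E1 -> E2): the identity of each subblock of X.
    identities = [hashlib.sha256(s).digest() for s in x_parts]
    # Message 2 (E2 -> E1): which of those subblocks are absent at E2.
    absent = {h for h in identities if h not in receiver_store}
    # Message 3 (E1 -> E2): contents of the absent subblocks, once each.
    payload: dict[bytes, bytes] = {}
    for h, s in zip(identities, x_parts):
        if h in absent:
            payload.setdefault(h, s)
    receiver_store.update(payload)
    rebuilt = b"".join(receiver_store[h] for h in identities)
    return rebuilt, list(payload.values())

store = {hashlib.sha256(b"aa").digest(): b"aa"}   # E2 already holds b"aa"
block, sent = communicate([b"aa", b"bb", b"aa", b"bb"], store)
```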

25. A method for communicating a block X from one entity E1 to another entity E2, comprising the steps of:

partitioning said block X into a plurality of subblocks X_1 . . . X_n at at least one position k|k+1 within said block, for which X[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers,
transmitting from E2 to E1 information communicating the presence or absence at E2 of members of a group Y of subblocks Y_1 . . . Y_m, and
transmitting from E1 to E2 the contents of zero or more subblocks in X, and the remaining subblocks as references to subblocks in Y_1 . . . Y_m and to subblocks already transmitted.

26. A method for an entity E2 to communicate to an entity E1 the fact that E2 possesses a block Y, comprising the steps of:

partitioning said block Y into a plurality of subblocks Y_1 . . . Y_m at at least one position k|k+1 within said block, for which Y[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers, and
transmitting from E2 to E1 references of the subblocks Y_1 . . . Y_m.

27. A method for an entity E1 to communicate a subblock X_i to an entity E2, comprising the steps of:

partitioning said block X into a plurality of subblocks X_1 . . . X_n at at least one position k|k+1 within said block, for which X[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers,
transmitting from E2 to E1 an identity of X_i, and
transmitting X_i from E1 to E2.

28. An apparatus for organizing a block b of digital data for storage, communication, or comparison, comprising

means for partitioning said block b into a plurality of subblocks at at least one position k|k+1 within said block, for which b[k-A+1 . . . k+B] satisfies a predetermined constraint, and wherein A and B are natural numbers.

29. The apparatus of claim 28, in which the constraint comprises the hash of some or all of b[k-A+1 . . . k+B].

30. The apparatus of claim 28, further comprising

means for locating the nearest subblock boundary on a side of a position p|p+1 within said block, said means for locating comprising:
means for evaluating whether said predetermined constraint is satisfied at each position k|k+1 for increasing or decreasing k,
wherein k starts with the value p.