|Publication number||US8190742 B2|
|Application number||US 11/411,386|
|Publication date||May 29, 2012|
|Filing date||Apr 25, 2006|
|Priority date||Apr 25, 2006|
|Also published as||EP2011042A2, EP2011042B1, US8447864, US20070250519, US20120239815, WO2007127246A2, WO2007127246A3|
|Inventors||Samuel A. Fineberg, Kave Eshghi, Pankaj Mehra, Mark Lillibridge|
|Original Assignee||Hewlett-Packard Development Company, L.P.|
The present invention is related to data storage systems and, in particular, to distributed, differential electronic-data storage systems that do not distribute data objects across multiple component storage systems and that employ compression-enhancing data-object routing methods that route data objects to component storage systems in order to achieve large data-compression ratios for stored data objects.
Since the 1960s, the computer hardware and software industries have provided a relentless and spectacular increase in the capabilities and functionalities of computer-based data processing systems. For example, contemporary office workers are typically equipped with modern personal computers (“PCs”) that surpass, in processor speeds, memory sizes, and mass-storage capacities, supercomputers of only 20 years ago. Networking technologies allow PCs to be interlinked with one another and with powerful servers and other computational resources to provide extremely high-bandwidth interconnection between computer users, access by users to vast computational resources, and immense capacities for data storage and retrieval. Today, large and complex business organizations can easily implement highly interconnected, paperless work environments using relatively inexpensive, commercially available computer hardware and software products. However, as the capabilities of computer hardware and software have increased, the rate and amount of data generated and computationally managed in business, commercial, and even home environments have rapidly increased. Computer users may receive hundreds of emails each day, many including photographs, video clips, and complex, multi-media documents. Moreover, many computer users routinely generate large numbers of text documents, multi-media presentations, and other types of data. Much of this data needs to be managed and stored for subsequent retrieval. Recent legislation mandates, for example, reliable storage of emails and other electronic communications generated and received in certain business environments for lengthy periods of time, spanning decades.
Although it is possible to purchase ever-larger mass-storage devices and ever-increasing numbers of servers to manage backup and archiving of electronic data on the mass-storage devices, the expense, management overhead, and administrative overhead of storing and managing the large amounts of electronic data may quickly reach a point of commercial and economic impracticality. For these and other reasons, computer users, business and research organizations, vendors of computer systems and computer software, and various governmental organizations have all recognized the need for improved, more cost-effective methods and systems for backing up and archiving electronic data.
One embodiment of the present invention provides a distributed, differential electronic-data storage system that includes client computers, component data-storage systems, and a routing component. Client computers direct data objects to component data-storage systems within the distributed, differential electronic-data storage system. Component data-storage systems provide data storage for the distributed, differential electronic-data storage system. The routing component directs data objects, received from the client computers, through logical bins to component data-storage systems by a compression-enhancing routing method.
Various embodiments of the present invention include a variety of different types of distributed, differential electronic-data storage systems in which stored data objects are fully contained within individual component storage systems. In these various embodiments of distributed, differential electronic-data storage systems, data objects may be routed to component storage systems through logical bins in order to increase the flexibility and robustness of the distributed, differential electronic-data storage systems. The various distributed, differential electronic-data storage systems of the present invention employ compression-enhancing data-object routing methods that direct data objects to those component data-storage systems in which the data objects can be stored with best compression. Compression-enhancing routing methods include content-based compression-enhancing routing methods and query-based compression-enhancing routing methods. Query-based compression-enhancing routing methods further include trial-storage-based query methods, similarity-key-based query methods, and hash-list-based query methods. In a first subsection, below, a general architecture for distributed, differential electronic-data storage systems that represent embodiments of the present invention is provided. In a second subsection, bin-based indirect data-object routing is discussed. In a third subsection, differential-data-storage compression and differential-data-storage metrics used for evaluating the efficiency of differential data-storage systems are described. In a fourth subsection, an overview of compression-enhancing routing is provided. In a fifth subsection, content-based compression-enhancing routing methods are discussed. Finally, in a sixth subsection, query-based compression-enhancing routing methods are discussed.
Non-distributed Data Storage Systems
In a first approach to backing up and archiving data, a user may invest in multiple disk drives for the PC, and store backup and archival copies of important data objects on a disk drive allocated for backup and archiving. In slightly more sophisticated systems, a user may employ two or more disk drives within a PC and operating-system features to implement an automated mirroring process by which an exact, mirror copy of a working disk drive is maintained on a separate, mirror disk drive. However, these techniques are inadequate in many commercial and even home situations. First, even when multiple disk drives are employed, theft of, or significant damage to, the PC may nonetheless lead to irretrievable loss of data. Moreover, as operating systems and application programs continue to evolve, the data objects routinely generated by users have tended to become larger and more complex, and are generated at ever-increasing rates. Therefore, a PC often lacks sufficient mass-storage capacity for long-term archiving. Finally, localized strategies for backing up and archiving data generally involve significant management and administrative overhead, as a result of which users often neglect to properly maintain backed-up and archived data, and frequently fail to continuously back up and archive data that they may subsequently need. Commercial and governmental organizations cannot generally rely on individual users and employees to administer data backups and data archiving.
For all of the reasons discussed above, computer users within commercial and governmental organizations, and even certain sophisticated home users of PCs, generally centralize important backup and archiving tasks and policies on servers or larger computers to which the users' PCs are interconnected via computer networks.
Networked computer systems with servers dedicated to backup and archiving tasks are far more reliable than localized backup and archiving techniques discussed with reference to
Distributed Electronic Data Archiving
In order to overcome many of the problems of localized backup and archiving, discussed above with reference to
Each component data-storage system, such as component data-storage system 306, in the distributed, differential electronic-data backup and archiving system comprises one or more computer systems, such as computer systems 344 and 345 in component data-storage system 306. Each computer system has attached mass-storage devices, including attached mass-storage devices 346 and 347 connected to computer systems 344 and 345, respectively. Multiple computer systems with separate, attached mass-storage devices allow for mirroring of data stored in each component data-storage system to increase both availability and reliability of the data store.
Although the component organization shown in
A distributed electronic-data backup and archiving system addresses many of the problems associated with PC-based backup and archiving and ad hoc backup and archiving in networked systems, discussed above with respect to
As discussed above, logical bins represent logical targets for routing data objects for storage. Logical bins may be known only to a routing method carried out on portal computers or on client computers, and may exist only as a software abstraction to isolate routing-method implementations from other software components involved in management, configuration, and monitoring of a distributed, differential electronic-data storage system. Logical bins (324-338 in
As data objects continue to be stored to a distributed, differential electronic-data storage system, particularly when the distributed, differential electronic-data storage system is employed for data archiving and other such purposes that involve relatively large numbers of storage operations and significantly fewer data-object deletion operations, a given component data-storage system may begin to approach maximum storage capacity. As the component data-storage system more closely approaches maximum storage capacity, storage operations may become more costly in both time and processing cycles. At some point, the component data-storage system may not be able to sufficiently rapidly store additional data objects, or may lack sufficient remaining data storage to store additional data objects. A component data-storage system may also begin to exhibit sporadic error conditions, or may begin to fail altogether.
For these and a variety of other reasons, it may be desirable to reallocate the bins through which data objects are directed to an over-utilized or failing component data-storage system to one or more newly added or currently under-utilized component data-storage systems.
On the other hand, data objects may be routed through a single logical bin to multiple component data-storage systems.
Many other types of mappings between logical bins and component data-storage systems are possible.
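The bin indirection described above can be sketched as a small lookup table. The following Python is only an illustration under stated assumptions: the class name `BinTable`, the store labels `cs1` through `cs5`, and the round-robin assignment policy are hypothetical and do not come from the specification. The point it shows is that a routing method needs to know only bin indices, so bins can be re-pointed at other component data-storage systems without changing the routing method.

```python
class BinTable:
    """Hypothetical lookup table from logical bins to component stores."""

    def __init__(self, num_bins, stores):
        # Assumed initial policy: assign bins to stores round-robin.
        self.bin_to_stores = {
            b: [stores[b % len(stores)]] for b in range(num_bins)
        }

    def route(self, bin_index):
        """Return the component store(s) currently behind a logical bin."""
        return list(self.bin_to_stores[bin_index])

    def reallocate(self, old_store, new_stores):
        """Point every bin that targets old_store at one of new_stores."""
        moved = sorted(b for b, s in self.bin_to_stores.items() if old_store in s)
        for i, b in enumerate(moved):
            target = new_stores[i % len(new_stores)]
            self.bin_to_stores[b] = [
                target if s == old_store else s for s in self.bin_to_stores[b]
            ]
        return moved


# Six bins over three stores; then the bins of an over-utilized store "cs1"
# are reallocated to two newly added stores.
table = BinTable(num_bins=6, stores=["cs1", "cs2", "cs3"])
moved_bins = table.reallocate("cs1", ["cs4", "cs5"])
```

Because each bin maps to a list of stores, the same structure could also represent a single bin that fans out to several component data-storage systems; how an object would then be assigned among them is left open here.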
In general, an individual data object may contain a significant amount of redundant information, and may therefore be compressed individually.
where O1 and O2 are the two data objects; and
Many other co-compression metrics are possible, including co-compression metrics with values that range from 0 to 1,
Certain distributed, differential electronic-data storage systems may achieve increasingly greater levels of differential-data-storage compression per object when storing more than two data objects, while others may achieve only the pair-wise compression levels discussed above with reference to
While routing of similar data objects to the same component data-storage system is desirable for maximizing the data compression of a distributed, differential electronic-data storage system, overall data-storage efficiency is increased by relatively uniformly distributing data objects across all of the component data-storage systems. In other words, when each component data-storage system stores an approximately equal volume of data, the overall storage capacity of the distributed, differential electronic-data storage system can be most efficiently used. Otherwise, certain of the component data-storage systems may be filled to maximum capacity while others of the component data-storage systems may remain idle, requiring expensive data-redistribution operations or the equally expensive and inefficient addition of further component data-storage systems in order to increase capacity of the distributed, differential electronic-data storage system, even though certain of the component data-storage systems are not storing data. Thus, as shown in
In many distributed, differential electronic-data storage systems, it is not necessary that all similar data structures are successfully routed to a single component data-storage system, and it is also not necessary that data be stored in a way that guarantees absolute, uniform distribution of data across all the component data-storage systems. Instead, quality of routing may range from random assignment of data objects to component data-storage systems, regardless of similarity between data objects, to ideal collocation of all similar data objects, and may range from non-uniform distribution of data within a distributed data-storage system to an ideal, uniform distribution in which each component data-storage system stores the same volume of data, within the granularity of a minimum data object size. In general, as with most computational systems, there are processing-overhead, communications-overhead, and memory-usage tradeoffs among various approaches to routing, and the closer a routing system approaches ideal uniform data distribution and ideal similar-data-object collocation, the greater the amount of processing, memory, and communications resources that may be needed to execute the routing system. In many cases, it is desirable to somewhat relax distribution and collocation requirements in order to increase the speed and efficiency by which data objects are routed. The various embodiments of the present invention represent a favorable balance between routing speed and computational efficiency versus uniformity of data distribution and the degree to which similar data objects are collocated.
It should be noted that, in general, data objects are supplied to a distributed, differential electronic-data storage system serially, one-by-one, so that the distributed, differential electronic-data storage system needs to route data objects to component data-storage systems without the benefit of global information with respect to the data objects that are eventually stored within the distributed, differential electronic-data storage system. Moreover, as additional data objects are stored, and already stored data objects are deleted, the data state of a distributed, differential electronic-data storage system varies dynamically, often in a relatively unpredictable fashion. Therefore, strategies for routing data to achieve uniformity of data distribution and collocation of similar data objects are often unavoidably non-optimal. Furthermore, because routing may represent a significant bottleneck with respect to data-object exchange between a distributed, differential electronic-data storage system and accessing host computer systems, router efficiency and routing speed may be limiting factors in overall system performance. It should also be noted that data-object similarity may be measured in many different ways, subgroups of which are relevant to different compression techniques and differential-store strategies employed by different distributed, differential electronic-data storage systems. The method and system embodiments of the present invention assume the similarity between two data objects to be correlated with the number of identical, shared subsequences of data units contained within the two data objects.
Assuming data objects to be sequentially ordered, linear arrays of data units, method and system embodiments of the present invention process the data objects in order to first generate a digitally-encoded value, or similarity key, such as a large integer, that is generally much smaller than the data object, in order to represent or characterize the data object. Then, in a second step, method and system embodiments of the present invention, typically using modulo arithmetic, generate a component data-system index or address for directing the data object represented or characterized by the digitally encoded value to a particular component data-storage system or group of data-storage systems.
Next, as shown in
Next, as shown in
The generalized routing method discussed above with reference to
Two particular routing schemes, representing particular fixed parameter values, are of particular interest. In the max-chunk method, offset is equal in value to width, so that the successive windows form a series of consecutive chunks along the linear-array representation of the data object. In this method, the maximum hash value generated from any particular chunk may be selected as the value characteristic of the data object, and a component data-storage address may be computed based on this maximum hash value. Alternatively, the minimum hash value may be selected, or some other value may be computed from the hash values generated from the chunks. In the n-gram routing method, offset is equal to “1.” Thus, hash values are generated for each successive window displaced from the preceding window by one data unit. The n data units within each window, where n is equal to the width of the window, are considered to be an n-gram, and the n-gram hash therefore computes a characteristic value based on examining all possible n-grams within the data object.
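The two schemes can be sketched in Python as two parameter settings of one windowed-hash routine, followed by the modulo step that maps the characteristic value to a bin or component index. This is a minimal sketch under stated assumptions, not the specification's implementation: SHA-256 truncated to 64 bits stands in for whatever hash function an embodiment uses, data units are taken to be bytes, a trailing partial window is ignored, and all function names are hypothetical.

```python
import hashlib


def window_hashes(data: bytes, width: int, offset: int):
    """Hash each full window of `width` bytes, stepping by `offset` bytes."""
    for start in range(0, len(data) - width + 1, offset):
        digest = hashlib.sha256(data[start:start + width]).digest()
        # Assumption: keep 64 bits of the digest as the window's hash value.
        yield int.from_bytes(digest[:8], "big")


def similarity_key(data: bytes, width: int, offset: int) -> int:
    """Characteristic value: the maximum window hash (0 if no window fits)."""
    return max(window_hashes(data, width, offset), default=0)


def max_chunk_key(data: bytes, width: int) -> int:
    """Max-chunk method: offset equals width, so windows are consecutive chunks."""
    return similarity_key(data, width, offset=width)


def n_gram_key(data: bytes, n: int) -> int:
    """n-gram method: offset is 1, so every n-gram in the object is hashed."""
    return similarity_key(data, n, offset=1)


def route(key: int, num_targets: int) -> int:
    """Map a similarity key to a target index using modulo arithmetic."""
    return key % num_targets


doc = b"abcdefgh"
chunk_key = max_chunk_key(doc, width=4)
gram_key = n_gram_key(doc, n=4)
target = route(gram_key, num_targets=8)
```

One consequence of taking the maximum over all n-grams is that prepending or appending data never lowers the n-gram key, since every window of the original object is still present; the max-chunk key has no such guarantee, because an insertion shifts chunk boundaries.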
In a family of distributed, differential electronic-data storage systems, objects are first chunked, and then stored as a set of chunks. One possible co-compression metric for differential-data-storage compression achievable by storing two data objects O1 and O2 in a chunk-based distributed, differential electronic-data storage system is:
where the function hList produces a list of hashes for a set of chunks;
the function cks produces a set of chunks for a data object; and
the function CountOf returns the number of elements in a set.
This compression metric ranges from 0, for perfect compression, to 1 when no differential-data-storage compression is obtained for the two objects.
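As an illustration only, the Python below computes one chunk-hash metric that has the endpoints stated above: 0 when the two objects share every chunk and 1 when they share none. It should not be read as the formula of the specification. The particular ratio used (one minus shared hashes over distinct hashes), the fixed-size chunking in `cks`, the use of SHA-256 in `h_list`, and the treatment of two empty objects are all assumptions made for this sketch; the helper names mirror the functions cks, hList, and CountOf named in the text.

```python
import hashlib


def cks(obj: bytes, size: int = 4) -> set:
    """Set of chunks for a data object (assumed: fixed-size chunking)."""
    return {obj[i:i + size] for i in range(0, len(obj), size)}


def h_list(chunks: set) -> set:
    """Hashes for a set of chunks (assumed: SHA-256, kept as a set)."""
    return {hashlib.sha256(c).hexdigest() for c in chunks}


def count_of(items: set) -> int:
    """Number of elements in a set."""
    return len(items)


def co_compression(o1: bytes, o2: bytes) -> float:
    """Assumed metric: 1 - shared chunk hashes / distinct chunk hashes."""
    h1, h2 = h_list(cks(o1)), h_list(cks(o2))
    distinct = count_of(h1 | h2)
    if distinct == 0:
        return 0.0  # assumption: two empty objects compress perfectly
    return 1.0 - count_of(h1 & h2) / distinct


same = co_compression(b"aaaabbbb", b"aaaabbbb")
disjoint = co_compression(b"aaaabbbb", b"ccccdddd")
partial = co_compression(b"aaaabbbb", b"aaaacccc")
```

In the partial case the objects share one of three distinct chunks, so the sketch yields a value between the two endpoints.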
Next, in step 1808, the routing method sets the best-received-comparison-metric variable bcm to some large value, such as maxInt, and sets the best-component-store variable bcs to some null value. A compression metric in which greatest compression is represented by 0 and lower levels of compression are represented by numerically increasing values is employed in the described embodiment, although other types of compression metrics may be used in alternative embodiments. Then, in the for-loop of steps 1810-1816, the routing method queries each component data-storage system c successively selected from the n component data-storage systems selected in step 1806. In step 1811, the currently considered component data-storage system c is queried for the compression that can be achieved by the component data-storage system c in storing the data object obj. If the compression achievable by the currently considered component data-storage system c is greater than that indicated by the current value stored in variable bcm, as determined in step 1812, then, in step 1813, the variable bcm is updated to the compression level achievable by the currently considered component data-storage system c and the variable bcs is set to c. Otherwise, if the compression achievable by component data-storage system c is equal to that indicated by the current value of variable bcm, as determined in step 1814, then a tie-breaking procedure may be invoked, in step 1815. A tie-breaking procedure may involve an additional, more detailed query, or may involve some arbitrary tie-breaking process. One arbitrary tie-breaking process is to eliminate steps 1814 and 1815 and select the first component data-storage system that reports the maximum achievable compression rate obtained by querying the n component data-storage systems. If there are additional component data-storage systems to query, as determined in step 1816, then control flows back to step 1811.
Otherwise, if the maximum compression achievable, represented by the metric stored in variable bcm, is greater than the threshold compression level t, as determined in step 1818, then the component data-storage system indicated by the value of the variable bcs is returned. Otherwise, a content-based or other non-query-based routing method is undertaken, in step 1806, in order to determine the component data-storage system to which to route data object obj.
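The query loop and threshold test described in these steps can be sketched as follows. This is a simplified sketch, not the specification's implementation: the function and parameter names are hypothetical, `query` stands in for whatever request a component data-storage system answers with a compression metric, ties go to the first system queried (the arbitrary tie-break mentioned above), and infinity is used as the initial large value in place of maxInt. As in the described embodiment, a smaller metric means greater compression, so "greater compression than the threshold" becomes a less-than comparison on the metric.

```python
import math


def query_based_route(obj, candidates, query, threshold, fallback):
    """Return the candidate store offering the best compression for obj.

    query(c, obj) returns a metric where 0 is the greatest compression.
    fallback(obj) is used when no candidate beats the threshold metric.
    """
    bcm = math.inf  # best compression metric received so far
    bcs = None      # component store that reported bcm
    for c in candidates:
        metric = query(c, obj)
        if metric < bcm:
            # Strictly better compression; equal metrics keep the earlier store.
            bcm, bcs = metric, c
    if bcs is not None and bcm < threshold:
        return bcs
    # Otherwise fall back to a content-based or other non-query-based method.
    return fallback(obj)


# Hypothetical replies from three component stores for one data object.
replies = {"cs1": 0.8, "cs2": 0.3, "cs3": 0.3}
best = query_based_route(
    b"object", ["cs1", "cs2", "cs3"],
    query=lambda c, obj: replies[c],
    threshold=0.5,
    fallback=lambda obj: "content-routed",
)
strict = query_based_route(
    b"object", ["cs1", "cs2", "cs3"],
    query=lambda c, obj: replies[c],
    threshold=0.2,
    fallback=lambda obj: "content-routed",
)
```

Here `cs2` and `cs3` tie, and the first one queried is chosen; with the stricter threshold no candidate qualifies and the fallback result is returned.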
In alternative embodiments of chunk-based query-based compression-enhancing routing methods, some subset of hashes for the chunks within a data object may be sent to candidate component data-storage systems, such as a relatively small number of initial chunks, or the initial part of a list of hash values ordered by numerical value. Alternative compression metrics may be returned.
Although the present invention has been described in terms of a particular embodiment, it is not intended that the invention be limited to this embodiment. Modifications within the spirit of the invention will be apparent to those skilled in the art. For example, an essentially limitless number of different implementations of the query-based compression-enhancing routing methods of the present invention are possible, the implementations programmable in a variety of different programming languages, using alternative control flow, data structures, and modular organizations, and targeted for execution on any number of different hardware platforms supporting various different operating systems. A wide variety of distributed, differential electronic-data storage systems that employ binning-based routing and compression-enhancing routing methods are also possible. Although a variety of different metrics have been provided, above, for evaluating the efficiency of distributed, differential electronic-data storage systems, a large number of alternative differential-data-storage-compression evaluating metrics may be devised. In query-based compression-enhancing routing, a single data object can be routed, or multiple, discrete data objects can be combined together for common routing. Alternatively, a single data object can be decomposed into smaller, component data objects that can each be separately routed. While, in the above-discussed implementation, component data-storage systems return compression metrics in response to queries, component data-storage systems can alternatively return the memory required to store a data object or some other value or combination of values that allow a query-based compression-enhancing routing method to determine the level of compression achievable by storing the data object in the component data-storage system.
The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that the specific details are not required in order to practice the invention. The foregoing descriptions of specific embodiments of the present invention are presented for purpose of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously many modifications and variations are possible in view of the above teachings. The embodiments are shown and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents:
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US5408653||Apr 15, 1992||Apr 18, 1995||International Business Machines Corporation||Efficient data base access using a shared electronic store in a multi-system environment with shared disks|
|US5574902||May 2, 1994||Nov 12, 1996||International Business Machines Corporation||Efficient destaging of updated local cache pages for a transaction in a multisystem and multiprocess database management system with a high-speed shared electronic store|
|US5638509||Jun 13, 1996||Jun 10, 1997||Exabyte Corporation||Data storage and protection system|
|US5990810||Feb 15, 1996||Nov 23, 1999||Williams; Ross Neil||Method for partitioning a block of data into subblocks and for storing and communcating such subblocks|
|US6141053 *||Jan 3, 1997||Oct 31, 2000||Saukkonen; Jukka I.||Method of optimizing bandwidth for transmitting compressed video data streams|
|US6513050||Feb 10, 1999||Jan 28, 2003||Connected Place Limited||Method of producing a checkpoint which describes a box file and a method of generating a difference file defining differences between an updated file and a base file|
|US6651140||Sep 1, 2000||Nov 18, 2003||Sun Microsystems, Inc.||Caching pattern and method for caching in an object-oriented programming environment|
|US6839680||Sep 30, 1999||Jan 4, 2005||Fujitsu Limited||Internet profiling|
|US6938005||Dec 21, 2000||Aug 30, 2005||Intel Corporation||Digital content distribution|
|US6961009||Oct 18, 2004||Nov 1, 2005||Nbt Technology, Inc.||Content-based segmentation scheme for data compression in storage and transmission including hierarchical segment representation|
|US7082548||Feb 27, 2001||Jul 25, 2006||Fujitsu Limited||Backup system and duplicating apparatus|
|US7085883 *||Oct 30, 2002||Aug 1, 2006||Intransa, Inc.||Method and apparatus for migrating volumes and virtual disks|
|US7269689||Jun 17, 2004||Sep 11, 2007||Hewlett-Packard Development Company, L.P.||System and method for sharing storage resources between multiple files|
|US7536291 *||Nov 8, 2005||May 19, 2009||Commvault Systems, Inc.||System and method to support simulated storage operations|
|US7558801 *||Jun 8, 2005||Jul 7, 2009||Getzinger Thomas W||Distributing limited storage among a collection of media objects|
|US20010010070||Feb 15, 2001||Jul 26, 2001||Crockett Robert Nelson||System and method for dynamically resynchronizing backup data|
|US20020103975 *||Jan 26, 2001||Aug 1, 2002||Dawkins William Price||System and method for time weighted access frequency based caching for memory controllers|
|US20020156912||Feb 15, 2001||Oct 24, 2002||Hurst John T.||Programming content distribution|
|US20030101449||Jan 9, 2002||May 29, 2003||Isaac Bentolila||System and method for behavioral model clustering in television usage, targeted advertising via model clustering, and preference programming based on behavioral model clusters|
|US20030110263 *||Oct 23, 2002||Jun 12, 2003||Avraham Shillo||Managing storage resources attached to a data network|
|US20030140051||Aug 16, 2002||Jul 24, 2003||Hitachi, Ltd.||System and method for virtualizing a distributed network storage as a single-view file system|
|US20030223638 *||May 31, 2002||Dec 4, 2003||Intel Corporation||Methods and systems to index and retrieve pixel data|
|US20040054700 *||Aug 18, 2003||Mar 18, 2004||Fujitsu Limited||Backup method and system by differential compression, and differential compression method|
|US20040162953||Feb 18, 2004||Aug 19, 2004||Kabushiki Kaisha Toshiba||Storage apparatus and area allocation method|
|US20050091234||Oct 23, 2003||Apr 28, 2005||International Business Machines Corporation||System and method for dividing data into predominantly fixed-sized chunks so that duplicate data chunks may be identified|
|US20060059173||Sep 15, 2004||Mar 16, 2006||Michael Hirsch||Systems and methods for efficient data searching, storage and reduction|
|US20060059207||Jul 29, 2005||Mar 16, 2006||Diligent Technologies Corporation||Systems and methods for searching of storage data with reduced bandwidth requirements|
|US20060155735 *||Jan 7, 2005||Jul 13, 2006||Microsoft Corporation||Image server|
|US20060293859||Apr 12, 2006||Dec 28, 2006||Venture Gain L.L.C.||Analysis of transcriptomic data using similarity based modeling|
|US20070220197||Sep 7, 2005||Sep 20, 2007||M-Systems Flash Disk Pioneers, Ltd.||Method of managing copy operations in flash memories|
|US20070250519||Apr 25, 2006||Oct 25, 2007||Fineberg Samuel A||Distributed differential store with non-distributed objects and compression-enhancing data-object routing|
|US20070250670||Apr 25, 2006||Oct 25, 2007||Fineberg Samuel A||Content-based, compression-enhancing routing in distributed, differential electronic-data storage systems|
|US20080126176||Jun 29, 2007||May 29, 2008||France Telecom||User-profile based web page recommendation system and user-profile based web page recommendation method|
|US20090112945||Oct 24, 2008||Apr 30, 2009||Peter Thomas Camble||Data processing apparatus and method of processing data|
|US20090112946||Oct 27, 2008||Apr 30, 2009||Kevin Lloyd Jones||Data processing apparatus and method of processing data|
|US20090113167||Oct 22, 2008||Apr 30, 2009||Peter Thomas Camble||Data processing apparatus and method of processing data|
|US20100161554||Dec 22, 2009||Jun 24, 2010||Google Inc.||Asynchronous distributed de-duplication for replicated content addressable storage clusters|
|US20100198792||Oct 25, 2007||Aug 5, 2010||Peter Thomas Camble||Data processing apparatus and method of processing data|
|US20100198832||Oct 25, 2007||Aug 5, 2010||Kevin Loyd Jones||Data processing apparatus and method of processing data|
|US20100205163||Feb 10, 2009||Aug 12, 2010||Kave Eshghi||System and method for segmenting a data stream|
|US20100235372||Oct 25, 2007||Sep 16, 2010||Peter Thomas Camble||Data processing apparatus and method of processing data|
|US20100235485||Mar 16, 2009||Sep 16, 2010||Mark David Lillibridge||Parallel processing of input data to locate landmarks for chunks|
|US20100246709||Mar 27, 2009||Sep 30, 2010||Mark David Lillibridge||Producing chunks from input data using a plurality of processing elements|
|US20100280997||Apr 30, 2009||Nov 4, 2010||Mark David Lillibridge||Copying a differential data store into temporary storage media in response to a request|
|US20100281077||Apr 30, 2009||Nov 4, 2010||Mark David Lillibridge||Batching requests for accessing differential data stores|
|WO2006030326A1||Jan 27, 2005||Mar 23, 2006||3Dhistech Kft.||Method and computer program product for the storage ensuring fast retrieval and efficient transfer of interrelated high-volume 3d information|
|WO2006094365A1||Mar 10, 2006||Sep 14, 2006||Rocksoft Limited||Method for storing data with reduced redundancy using data clusters|
|WO2006094366A1||Mar 10, 2006||Sep 14, 2006||Rocksoft Limited||Method for indexing in a reduced-redundancy storage system|
|WO2006094367A1||Mar 10, 2006||Sep 14, 2006||Rocksoft Limited||Method for detecting the presence of subblocks in a reduced-redundancy storage system|
|WO2007127248A2||Apr 25, 2007||Nov 8, 2007||Hewlett-Packard Development Company, L.P.||Content-based, compression-enhancing routing in distributed, differential electronic-data storage systems|
|WO2009054828A1||Oct 25, 2007||Apr 30, 2009||Hewlett-Packard Development Company, L.P.||Data processing apparatus and method of processing data|
|WO2009131585A1||Apr 25, 2008||Oct 29, 2009||Hewlett-Packard Development Company, L.P.||Data processing apparatus and method of processing data|
|1||Andrejko et al.: User Characteristics Acquisition from Logs with Semantics, Slovak University of Technology in Bratislava, 2007 (8 pages).|
|2||Anthony Ha: Facebook investor backs Chattertrap, a personal assistant for content, Jun. 28, 2010 (6 pages).|
|3||Baoyao, Zhou; ""Intelligent Web Usage Mining"" Nanyang Technological University, Division of Information Systems, School of Computer Engineering, 2004 (94 pages).|
|4||Baynote Inc.: The Collective Intelligence Platform, Online, http://www.baynote.com/technology/platform/ 2010 (1 page).|
|5||Brin, Sergey, et al., "Copy Detection Mechanisms for Digital Documents", Department of Computer Science, Stanford University, Oct. 31, 1994, p. 1-12.|
|6||Clattertrap; Online http://www.clattertrap.com; Jul. 20, 2010 (1 page).|
|7||Claypool et al.; "Implicit Interest Indicators", Computer Science Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA, 2001 (8 pages).|
|8||EMC: "Centera Content Addressed Storage Product Description Guide" Internet Citation, [Online] 2002 . . . .|
|9||Eshghi et al., "Jumbo Store: Providing Efficient Incremental Upload and Versioning for a Utility Rendering Service," 2007 (16 pages).|
|10||Fenstermacher et al.; "Mining Client-Side Activity for Personalization" Department of Management Information Systems, Eller College of Business and Public Administration, University of Arizona, Jun. 2002 (8 pages).|
|11||Hongjun Lu et al: Extending a Web Browser with Client-Side Mining, Hong Kong University of Science and Technology Department of Computer Science, Clear Water Bay, Kowloon, Hong Kong, China, 2003 (12 pages).|
|12||Hottolink Inc.; "Recognize" Online, http://www.hottolink.co.jp/english/reconize.html, 2009 (2 pages).|
|13||HSNW: SRI defense technology spawns civilian application: published Jun. 29, 2010 (2 pages).|
|14||L.L. You, C. Karamanolis: "Evaluation of Efficient Archival Storage Techniques". Proceedings of the 21st IEEE/12th NASA Goddard Conference on Mass Storage Systems . . . .|
|15||Manber, Udi, "Finding Similar Files in a Large File System," Department of Computer Science, University of Arizona, TR 93-33, Oct. 1993, (11 pages).|
|16||Muthitacharoen Athicha, et al., "A Low-Bandwidth Network File System," Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP '01), Oct. 2001.|
|17||Rabin, M.O., "Fingerprinting by Random Polynomials," Technical Report, Center for Research in Computing Technology, Harvard University, 1981, Report TR-15-81 (14 pages).|
|18||Sendhikumar et al.; "Personalized ontology for web search personalization" Anna University, Chennai, India , 2008 (7 pages).|
|19||Shahabi et al.; "Yoda an Accurate and Scalable Web based Recommendation System?" University of Southern California, Los Angeles, Sep. 2001 (14 pages).|
|20||Shahabi et al.; A Framework for Efficient and Anonymous Web Usage Mining Based on Client-Side Tracking, University of Southern California, Los Angeles, 2002 (48 pages).|
|21||Shankar et al.; "Personalized Web Search Based on Client Side Ontology", CS 498: B.Tech Project,10. IIT Kanpur, India 2010 (9 pages).|
|22||U.S. Appl. No. 10/870,783, Non-Final Rejection dated Dec. 15, 2006, pp. 1-4 and attachments.|
|23||U.S. Appl. No. 10/870,783, Notice of Allowance dated Jun. 13, 2007 (7 pages).|
|24||U.S. Appl. No. 11/411,467, Examiner's Answer dated May 11, 2010 (pp. 1-11 and attachment).|
|25||U.S. Appl. No. 11/411,467, Final Rejection dated Aug. 11, 2009 (pp. 1-11 and attachment).|
|26||U.S. Appl. No. 11/411,467, Non-Final Rejection dated Jan. 27, 2009 (pp. 1-9 and attachments).|
|27||U.S. Appl. No. 12/432,804, Non-Final Rejection dated Apr. 8, 2011, pp. 1-16 and attachment.|
|28||Why WUBAT? Website User Behavior & Analysis Tool, Wubat, Online, http://www.wubat.com/ (3 pages).|
|29||You L L et al: "Deep Store: An Archival Storage System Architecture", Data Engineering, 2005. ICDE 2005. Proceedings. 21st International Conference on Tokyo, Japan . . . .|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US8554743 *||Dec 8, 2009||Oct 8, 2013||International Business Machines Corporation||Optimization of a computing environment in which data management operations are performed|
|US8818964||Sep 13, 2012||Aug 26, 2014||International Business Machines Corporation||Optimization of a computing environment in which data management operations are performed|
|US9141621||Apr 30, 2009||Sep 22, 2015||Hewlett-Packard Development Company, L.P.||Copying a differential data store into temporary storage media in response to a request|
|US9672218||Feb 2, 2012||Jun 6, 2017||Hewlett Packard Enterprise Development Lp||Systems and methods for data chunk deduplication|
|US20110138154 *||Dec 8, 2009||Jun 9, 2011||International Business Machines Corporation||Optimization of a Computing Environment in which Data Management Operations are Performed|
|US20120143715 *||Oct 26, 2009||Jun 7, 2012||Kave Eshghi||Sparse index bidding and auction based storage|
|U.S. Classification||709/226, 709/201, 709/229|
|Cooperative Classification||H03M7/30, G11B27/11, G11B27/002, G11B2220/412|
|European Classification||H03M7/30, G11B27/11, G11B27/00A|
|Aug 4, 2006||AS||Assignment|
Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:FINEBERG, SAMUEL A.;ESHGHI, KAVE;MEHRA, PANKAJ;AND OTHERS;REEL/FRAME:018159/0838;SIGNING DATES FROM 20060607 TO 20060724
|Oct 27, 2015||FPAY||Fee payment|
Year of fee payment: 4
|Nov 9, 2015||AS||Assignment|
Owner name: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP, TEXAS
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.;REEL/FRAME:037079/0001
Effective date: 20151027