Publication number: US20090287665 A1
Publication type: Application
Application number: US 12/511,653
Publication date: Nov 19, 2009
Filing date: Jul 29, 2009
Priority date: Dec 22, 2006
Also published as: CA2706013A1, CA2706013C, EP2115634A2, EP2115634A4, US7882098, US7937365, US8234249, US8615523, US9639529, US20080228771, US20080243796, US20080249996, US20110179039, US20120271832, US20140114940, WO2008080143A2, WO2008080143A3, WO2008080143B1
Inventors: Anand Prahlad, Srinivas Kavuri, Rajiv Kottomtharayil, Arun Prasad Amarendran, Brian Brockway, Marcus S. Muller, Andreas May
Original Assignee: Anand Prahlad, Srinivas Kavuri, Rajiv Kottomtharayil, Arun Prasad Amarendran, Brian Brockway, Marcus S. Muller, Andreas May
Method and system for searching stored data
Abstract
A complete document management system is disclosed. Systems and methods for managing data associated with a data storage component coupled to multiple computers over a network are disclosed. Additionally, systems and methods for accessing documents available through a network, wherein the documents are stored on one or more data storage devices coupled to the network, are disclosed.
Claims (28)
1. A computing system for managing data associated with a data storage component, wherein the data storage component is coupled to multiple computers over a network, the computing system comprising:
a processor;
a memory;
a data storage management component for managing primary copies of data stored within the data storage component and managing secondary copies of the primary copies of the data stored within the data storage component, wherein the secondary copies include copies having two or more storage formats, the two or more storage formats being different than a native format of the primary copies of the data;
a content indexing component for creating or updating at least one index of the stored data managed by the data storage management component, wherein the at least one index includes a first set of information resulting from indexing the primary copies of the data and a second set of information resulting from indexing the secondary copies of the data; and
a web-based search component for searching for stored data, wherein the search component is configured to search the first and second sets of information included in the at least one index for content within the primary copies and the secondary copies based on a single query.
2. The computing system of claim 1, further comprising a metabase associated with the data storage management component, the metabase storing metadata referring to the stored data managed by the data storage management component.
3. The computing system of claim 1, wherein the search component is configured to search for a user-specified parameter in the at least one index.
4. The computing system of claim 1, further comprising a data security component configured to permit access only to stored data satisfying a predefined security level.
5. The computing system of claim 1, further comprising a data security component configured to disallow access to stored data having a predefined security level.
6. The computing system of claim 1, wherein the web-based search component provides an interface for receiving a search parameter from a user, and wherein the search parameter specifies a client or volume for searching.
7. The computing system of claim 1, wherein the secondary copies comprise backup copies and archive copies.
8. A method performed by a computing system having a processor and memory for managing data associated with a data storage component, wherein the data storage component is coupled to multiple computers over a network, the method comprising:
managing primary copies of data stored within the data storage component and secondary copies of the primary copies of the data stored within the data storage component, wherein the secondary copies include copies having two or more storage formats, the two or more storage formats being different than a native format of the primary copies of the data;
creating or updating, by the computing system, at least one index of the stored data managed by the data storage management component, wherein the at least one index includes a first set of information about content within the primary copies of the data and a second set of information about content within the secondary copies of the data; and
searching the at least one index for content within the primary copies and the secondary copies based on a single query.
9. The method of claim 8, further comprising creating or updating, by the computing system, a metabase associated with the data storage component, the metabase storing metadata referring to the data.
10. The method of claim 8, further comprising performing storage operations on the data stored within the data storage component based on a set of preferences or other criteria.
11. The method of claim 8, wherein the secondary copies comprise backup copies and archive copies.
12. A computer-readable storage medium having computer-executable instructions that, when executed by a computing system having a processor and memory, cause the computing system to perform a method of managing data associated with a data storage component, wherein the data storage component is coupled to multiple computers over a network, the method comprising:
managing primary copies of data stored within the data storage component and secondary copies of the primary copies of the data stored within the data storage component, wherein the secondary copies include copies having two or more storage formats, the two or more storage formats being different than a native format of the primary copies of the data;
creating or updating, by the computing system, at least one index of the stored data managed by the data storage management component, wherein the at least one index includes a first set of information about content within the primary copies of the data and a second set of information about content within the secondary copies of the data; and
searching the at least one index for content within the primary copies and the secondary copies based on a single query.
13. A computing system for searching data stored in data storage media, the computing system comprising:
a processor;
a memory;
an index component, wherein the index component is configured to:
create an index entry, in a single index for data stored in the data storage media, to include information associated with a first production copy of a first electronic document, the first production copy having a first native format;
create an index entry in the single index to include information associated with a second production copy of a second electronic document, the second production copy having a second native format, wherein the second native format is different than the first native format;
create an index entry in the single index to include information associated with a first secondary copy of the first electronic document, the first secondary copy having a first non-native format; and
create an index entry in the single index to include information associated with a second secondary copy of the second electronic document, the second secondary copy having a second non-native format, wherein the second non-native format is different than the first non-native format; and
a search component, wherein the search component is configured to:
receive a request to query the single index, wherein the request includes specific search criteria;
query the single index to locate information associated with the production copies and secondary copies that satisfies the specific search criteria, wherein the querying includes querying the single index for information associated with the first production copy, the second production copy, the first secondary copy, and the second secondary copy; and
present a result of the query.
14. The computing system of claim 13, wherein the production copies are stored within hard disks associated with the first electronic document or the second electronic document and the secondary copies are stored within magnetic tapes located off site from the hard disks.
15. The computing system of claim 13, further comprising:
a pruning component, wherein the pruning component is configured to remove the first production copy of the first electronic document and the second production copy of the second electronic document;
wherein the presented result of the query includes information identifying the first production copy of the first electronic document or the second production copy of the second electronic document.
16. The computing system of claim 13, wherein the information associated with the first production copy, the second production copy, the first secondary copy, or the second secondary copy includes information identifying a time of creation of the copy.
17. The computing system of claim 13, wherein the information associated with the first production copy, the second production copy, the first secondary copy, or the second secondary copy includes information identifying a location of creation of the copy.
18. The computing system of claim 13, wherein the search component includes a TCP/IP-based graphical user interface for receiving user input identifying the request, the user input identifying data to be used by the search component in the query.
19. The computing system of claim 13, wherein the search component is configured to receive search criteria input via a web browser.
20. The computing system of claim 13, further comprising:
a metabase, wherein the metabase is configured to store the information associated with the copies as metadata relating to the electronic documents.
21. The computing system of claim 13, wherein presenting a result of the query includes:
presenting a first copy of an electronic document, wherein presenting a first copy includes presenting a production copy having a format similar to the electronic document; and
presenting a second copy of the electronic document, wherein presenting the second copy includes:
retrieving a secondary copy having a format different than the format of the electronic document;
converting the secondary copy to a format similar to the format of the electronic document; and
presenting the converted secondary copy.
22. A method in a computing system having a processor and memory for searching data stored in data storage media, the method comprising:
building, by the computing system, an index of data stored in data storage media, wherein building the index includes:
creating an index entry to include information associated with a first production copy of a first electronic document, the first production copy having a first native format;
creating an index entry to include information associated with a second production copy of a second electronic document, the second production copy having a second native format, wherein the second native format is different than the first native format;
creating an index entry to include information associated with a first secondary copy of the first electronic document, the first secondary copy having a first non-native format; and
creating an index entry to include information associated with a second secondary copy of the second electronic document, the second secondary copy having a second non-native format, wherein the second non-native format is different than the first non-native format;
receiving from a user a request to query the built index, wherein the request includes one or more search criteria;
querying the built index to locate production copies and secondary copies that satisfy the request, wherein the querying includes querying the index of the first production copy, the second production copy, the first secondary copy, and the second secondary copy; and
presenting a result of the query to the user.
23. The method of claim 22, wherein creating the index entry to include information associated with the first secondary copy of the first electronic document includes creating the index entry to include the information associated with the first secondary copy before converting the first secondary copy to the non-native format.
24. A method in a computing system having a processor and memory for retrieving data stored across two or more types of data storage media, the method comprising:
for a first data set, performing, by the computing system:
creating a primary copy of the first data set;
storing the primary copy of the first data set within first data storage media, wherein the first data storage media is located at a first disk drive;
identifying information associated with the primary copy of the first data set;
generating an index entry relating the primary copy of the first data set with the identified information associated with the primary copy of the first data set;
updating a single index that tracks data stored in data storage media with the index entry associated with the primary copy of the first data set, wherein the data storage media includes the first data storage media;
transferring the primary copy of the first data set to create a secondary copy of the primary copy of the first data set;
storing the secondary copy of the primary copy of the first data set to the first data storage media;
identifying information associated with the secondary copy of the primary copy of the first data set;
generating an index entry relating the secondary copy of the primary copy of the first data set with the identified information associated with the secondary copy of the primary copy of the first data set;
updating the single index that tracks data stored in the data storage media with the index entry associated with the secondary copy of the primary copy of the first data set;
transferring the secondary copy of the primary copy of the first data set to create a first auxiliary copy, wherein the first auxiliary copy includes a data format different than a format of the primary copy of the first data set and a format of the secondary copy of the primary copy of the first data set;
storing the first auxiliary copy to first removable data storage media at a location different than a location of the first disk drive, wherein the data storage media includes the first removable data storage media;
identifying information associated with the first auxiliary copy;
generating an index entry relating the first auxiliary copy with the identified information associated with the first auxiliary copy; and
updating the single index of data stored across the data storage media with the index entry associated with the first auxiliary copy;
for a second data set, different than the first data set, performing by the computing system:
creating a primary copy of the second data set;
storing the primary copy of the second data set within second data storage media, wherein the second data storage media is located at a second disk drive;
identifying information associated with the primary copy of the second data set;
generating an index entry relating the primary copy of the second data set with the identified information associated with the primary copy of the second data set;
updating the single index that tracks data in the data storage media with the index entry associated with the primary copy of the second data set, wherein the data storage media includes the second data storage media;
transferring the primary copy of the second data set to create a secondary copy of the primary copy of the second data set;
storing the secondary copy of the primary copy of the second data set to the second data storage media;
identifying information associated with the secondary copy of the primary copy of the second data set;
generating an index entry relating the secondary copy of the primary copy of the second data set with the identified information associated with the secondary copy of the primary copy of the second data set;
updating the single index that tracks the data in the data storage media with the index entry associated with the secondary copy of the primary copy of the second data set;
transferring the secondary copy of the primary copy of the second data set to create a second auxiliary copy, wherein the second auxiliary copy includes a data format different than a format of the primary copy of the second data set and a format of the secondary copy of the primary copy of the second data set;
storing the second auxiliary copy to second removable data storage media at a location different than a location of the second disk drive, wherein the data storage media includes the second removable data storage media;
identifying information associated with the second auxiliary copy;
generating an index entry relating the second auxiliary copy with the identified information associated with the second auxiliary copy; and
updating the single index that tracks the data in the data storage media with the index entry associated with the second auxiliary copy;
receiving from a user a request to locate data from the first data set or the second data set, wherein the request includes information associated with the data;
querying the single index of data stored across the data storage media for the requested information; wherein querying includes searching the information associated with the primary copy of the first data set, the secondary copy of the primary copy of the first data set, the first auxiliary copy, the primary copy of the second data set, the secondary copy of the primary copy of the second data set, and the second auxiliary copy; and
presenting a result of the query to the user, wherein presenting the result of the query includes identifying one or more copies related to the requested information.
25. A method performed by a computing system having a processor and memory for providing search results, the method comprising:
receiving a search query from a user, wherein the search query includes one or more search criteria;
accessing one or more indices, wherein the one or more indices include:
a first set of index information generated from a first set of data items;
a second set of index information generated from a second set of data items, the second set of data items created as a result of a first data storage operation performed on the first set of data items; and
a third set of index information generated from a third set of data items, the third set of data items created as a result of a second data storage operation performed on either the first set of data items or the second set of data items;
determining, by the computing system, index information of the first, second, and/or third sets that satisfies the one or more search criteria;
determining, by the computing system, data items of the first, second, and/or third sets corresponding to the determined index information; and
providing search results corresponding to the data items of the first, second, and/or third sets for display to the user, such that the user may select a data item of the first, second, and/or third sets for display or retrieval.
26. The method of claim 25 wherein:
the first set of data items are stored by one or more computing devices;
the second set of index information is generated so as not to negatively impact the one or more computing devices; and
the third set of index information is generated so as not to negatively impact the one or more computing devices.
27. The method of claim 25 wherein some of the third set of data items are stored on one or more tapes at an off-site location, some of the search results correspond to the third set of data items stored on the one or more tapes, and wherein the method further comprises:
estimating a time to retrieve the third set of data items stored on the one or more tapes; and
providing the estimate in association with the search results, so that the user may ascertain the estimated time to retrieve the third set of data items.
28. The method of claim 25 wherein the one or more search criteria specify that search results corresponding to one or more of the first, second, and third sets of data items are to be provided.
Description
    CROSS-REFERENCE TO RELATED APPLICATIONS
  • [0001]
    This application is a continuation of U.S. patent application Ser. No. 11/931,034, filed Oct. 31, 2007, which claims the benefit of U.S. Provisional Application No. 60/871,735, filed Dec. 22, 2006, each of which is incorporated by reference herein in its entirety.
  • BACKGROUND
  • [0002]
    Data protection systems contain large amounts of data. This data includes personal data, such as financial data, customer/client/patient contact data, audio/visual data, and much more. Corporate computer systems often contain word processing documents, engineering diagrams, spreadsheets, business strategy presentations, and so on. With the proliferation of computer systems and the ease of creating content, the amount of content in an organization has expanded rapidly. Even small offices often have more information stored than any single employee can know about or locate.
  • [0003]
    Some data protection applications provide functions for actively searching for files within the organization based on a previously created index of the information available in each file. A user can then search for and retrieve documents based on a topic. Typical search software operates on a single index of keywords derived from the data that has been copied for protection purposes. It is typical for an organization to maintain many secondary copies of its data, and the various copies are typically stored in multiple formats on multiple devices. For example, when a current copy of data is made, previous copies are often maintained so that a historical archive is created. Thus, if the most recent copy does not have the desired data for a restore operation, an older copy may be used. With the existence of multiple copies on multiple devices spanning weeks, months, and even years, a search over this data can be complex and time consuming. A search over such a large amount of data can require separately searching content indices of all of the computer systems within an organization. This can put an unexpected load on already burdened systems and can require significant time on the part of a system operator.
  • [0004]
    Typical search systems also create problems when retrieval of the desired data is attempted. First, typical systems require that retrieval of the identified data be performed as a restore operation. The typical restore operation first identifies a secondary copy of the data in question on a secondary volume and copies the identified copy of the data back onto a production server (or other primary or working volume) and overwrites the existing data files. This can be inconvenient if it is desired to maintain the production copy or if it is merely desired to inspect the contents of a secondary data store. Second, typical systems are blind to the security rights of users and database operators. Typical systems do not have an integrated data rights security control that identifies the security privileges of the operator or user for whom the data is being restored and allows or denies the restore accordingly. Additionally, typical systems do not allow a user to promote and reapply search criteria throughout the data management system.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • [0005]
    FIG. 1 illustrates an example of a group of platforms and data types for searching.
  • [0006]
    FIG. 2 is a block diagram that illustrates a hierarchical data storage system.
  • [0007]
    FIG. 3 is a block diagram that illustrates components of a storage operations cell.
  • [0008]
    FIG. 4 is a block diagram that illustrates interaction between a global cell and data storage cells.
  • [0009]
    FIG. 5 is a block diagram that illustrates flow of data through the system.
  • [0010]
    FIG. 6 is a flow diagram that illustrates processing of a content indexing component of the system.
  • [0011]
    FIG. 7 is a flow diagram that illustrates processing of an index searching component of the system.
  • [0012]
    FIG. 8 illustrates a client selection interface for searching.
  • [0013]
    FIG. 9 illustrates a query construction interface for searching.
  • [0014]
    FIG. 10 illustrates a search summary.
  • [0015]
    FIG. 11 illustrates a results display in an interface for searching.
  • [0016]
    In the drawings, the same reference numbers and acronyms identify elements or acts with the same or similar functionality for ease of understanding and convenience.
  • DETAILED DESCRIPTION
  • [0017]
    The invention will now be described with respect to various examples. The following description provides specific details for a thorough understanding of, and enabling description for, these examples of the invention. However, one skilled in the art will understand that the invention may be practiced without these details. In other instances, well-known structures and functions have not been shown or described in detail to avoid unnecessarily obscuring the description of the examples of the invention.
  • [0018]
    The terminology used in the description presented below is intended to be interpreted in its broadest reasonable manner, even though it is being used in conjunction with a detailed description of certain specific examples of the invention. Certain terms may even be emphasized below; however, any terminology intended to be interpreted in any restricted manner will be overtly and specifically defined as such in this Detailed Description section.
  • [0019]
    FIG. 1 illustrates a summary example of a group of platforms and data types that can be searched. As illustrated and as described in more detail herein, a search can be performed over any platform, over any data type, and for documents created over any period of time. As illustrated, the system described herein can operate to archive and search data files including, for example, word processing documents 101, email correspondence 102, and database files 103. These files and documents can exist as online copies 105, backup copies 110, and archive copies 115. Thus, the systems and methods described herein can be used to search for and locate virtually any document that has ever existed on an institutional system, whether it currently exists or existed at any time in the past. These various data types and platform types can coexist in, and be operated on in, a hierarchical data storage system.
  • Suitable System
  • [0020]
    Referring to FIG. 2, a block diagram illustrates a hierarchical data storage system comprising two levels: a storage operations level 210 and a global level 250. The global level 250 may contain a global operations cell 260, which may contain a global manager 261 and a database 262. The storage operations level 210 may contain storage operations cells, such as cells 220 and 230. Cells 220 and 230 may perform specified data storage operations, or may perform varied data storage operations that depend on the needs of the system.
  • [0021]
    Cell 220 contains components used in data storage operations, such as a storage manager 221, a database 222, a client 223, and a primary storage database 224. Cell 230 may contain similar components, such as a storage manager 231, a database 232, a client 233, and a primary storage database 234. In this example, cell 230 also contains a media agent 235 and a secondary database 236. Both cells 220 and 230 communicate with global manager 261, providing information related to the data storage operations of their respective cells.
  • [0022]
    Referring to FIG. 3, a block diagram illustrating components of a storage operations cell is shown. Storage operations cells (such as cells 220 or 230 of FIG. 2) may contain some or all of the following components, depending on the use of the cell and the needs of the system. For example, cell 300 contains a storage manager 310, clients 320, multiple media agents 330, and multiple storage devices 340. Storage manager 310 controls media agents 330, which are responsible, at least in part, for transferring data to storage devices 340. Storage manager 310 includes a jobs agent 311, a management agent 312, a database 313, and an interface module 314. Storage manager 310 communicates with client 320. Client 320 accesses data to be stored by the system from database 322 via a data agent 321. The system uses media agents 330, which contain databases 331, to transfer and store data into storage devices 340.
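    As an illustration only, the relationship among the storage manager, client, media agents, and storage devices described above might be modeled as in the following minimal sketch. All class names, method names, and the round-robin choice of media agent are assumptions made for the example and are not taken from the specification.

```python
class StorageDevice:
    """Stands in for a storage device 340; holds stored data in memory."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}          # data_id -> bytes


class MediaAgent:
    """Transfers data to a storage device and records where it went."""
    def __init__(self, device):
        self.device = device
        self.database = {}        # data_id -> device name (stands in for database 331)

    def transfer(self, data_id, payload):
        self.device.blocks[data_id] = payload
        self.database[data_id] = self.device.name


class Client:
    """Reads data to be stored; the data agent is modeled as a plain dict lookup."""
    def __init__(self, source):
        self.source = source      # data_id -> bytes

    def read(self, data_id):
        return self.source[data_id]


class StorageManager:
    """Controls media agents; picks one per job in round-robin order (an assumption)."""
    def __init__(self, media_agents):
        self.media_agents = media_agents
        self._next = 0

    def store(self, client, data_id):
        agent = self.media_agents[self._next % len(self.media_agents)]
        self._next += 1
        agent.transfer(data_id, client.read(data_id))
        return agent


# Hypothetical usage: one client, two media agents, two devices.
disk, tape = StorageDevice("disk-1"), StorageDevice("tape-1")
manager = StorageManager([MediaAgent(disk), MediaAgent(tape)])
client = Client({"doc-1": b"hello", "doc-2": b"world"})
first = manager.store(client, "doc-1")
second = manager.store(client, "doc-2")
```

    In this sketch the storage manager never touches the payload itself; it only selects a media agent, which mirrors the division of control and data transfer described above.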
  • [0023]
    Cells 300 may include software and/or hardware components and modules used in data storage operations. The cells 300 may be transfer cells that function to transfer data during data store operations. The cells 300 may perform other storage operations in addition to operations used in data transfers. For example, cells 300 may perform creating, storing, retrieving, and/or migrating primary and secondary data copies. The data copies may include snapshot copies, secondary copies, hierarchical storage manager copies, archive copies, and so on. The cells 300 may also perform storage management functions that may push information to higher level cells, including global manager cells.
  • [0024]
    In some embodiments, the system can be configured to perform a storage operation based on one or more storage policies. A storage policy may be, for example, a data structure that includes a set of preferences or other criteria considered during storage operations. The storage policy may determine or define a storage location, a relationship between components, network pathways, accessible datapipes, retention schemes, compression or encryption requirements, preferred components, preferred storage devices or media, and so on. Storage policies may be stored in storage manager 310, 221, 231, or may be stored in global manager 261, as discussed above.
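    By way of a hedged example, a storage policy of the kind described above could be represented as a simple data structure whose fields are consulted when a storage operation is planned. The field names, the `plan_storage_operation` helper, and the sample values below are hypothetical; the specification does not prescribe a particular layout.

```python
from dataclasses import dataclass, field


@dataclass
class StoragePolicy:
    """Hypothetical storage policy: preferences consulted during a storage operation."""
    name: str
    storage_location: str                 # e.g. a device or path identifier
    retention_days: int                   # retention scheme, simplified to a day count
    compress: bool = False
    encrypt: bool = False
    preferred_media: list[str] = field(default_factory=list)


def plan_storage_operation(data_id: str, policy: StoragePolicy) -> dict:
    """Return a description of the operation the policy implies (no I/O is performed)."""
    steps = []
    if policy.compress:
        steps.append("compress")
    if policy.encrypt:
        steps.append("encrypt")
    steps.append("store")
    return {
        "data_id": data_id,
        "destination": policy.storage_location,
        "media": policy.preferred_media[0] if policy.preferred_media else "any",
        "steps": steps,
        "retain_days": policy.retention_days,
    }


# Hypothetical policy for archived email data.
policy = StoragePolicy("email-archive", "cell-230/secondary", 365,
                       compress=True, encrypt=True, preferred_media=["tape"])
plan = plan_storage_operation("mailbox-0001", policy)
```

    Keeping the policy as plain data, separate from the code that acts on it, is one way such a policy could be stored in either a storage manager or a global manager without change.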
  • [0025]
    Additionally or alternatively, the system may implement or utilize schedule policies. A schedule policy may specify when to perform storage operations, how often to perform storage operations, and so on. The schedule policy may also define the use of sub-clients, where one type of data (such as email data) is stored using one sub-client, and another type of data (such as database data) is stored using another sub-client. In these cases, storage operations related to specific data types (email, database, and so on) may be distributed between cells.
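    The following sketch shows one possible way to model a schedule policy with sub-clients and to distribute the sub-clients between cells. The `SchedulePolicy` class, the `next_run` calculation, and the round-robin `assign_to_cells` helper are assumptions made for illustration; the specification leaves these details open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SchedulePolicy:
    """Hypothetical schedule policy: when and how often a sub-client's data is stored."""
    sub_client: str          # e.g. "email" or "database"
    data_type: str
    first_run: datetime
    interval: timedelta

    def next_run(self, now: datetime) -> datetime:
        """Return the first scheduled run at or after `now`."""
        if now <= self.first_run:
            return self.first_run
        elapsed = now - self.first_run
        # Ceiling division on timedeltas: whole intervals needed to reach `now`.
        periods = -(-elapsed // self.interval)
        return self.first_run + periods * self.interval


def assign_to_cells(policies, cells):
    """Distribute sub-clients across cells in round-robin order (an assumed strategy)."""
    return {p.sub_client: cells[i % len(cells)] for i, p in enumerate(policies)}


policies = [
    SchedulePolicy("email", "email", datetime(2009, 1, 1), timedelta(days=1)),
    SchedulePolicy("database", "database", datetime(2009, 1, 1), timedelta(hours=6)),
]
assignment = assign_to_cells(policies, ["cell-220", "cell-230"])
next_email = policies[0].next_run(datetime(2009, 1, 3, 6, 0))
next_db = policies[1].next_run(datetime(2009, 1, 3, 6, 0))
```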
  • [0026]
    Referring to FIG. 4, a block diagram illustrating interaction between the global cell and data storage cells is shown. Global server 100, which may contain global load components, global filter components, and other components configured to determine actions based on received data storage information, may communicate with a database 420 and a user interface 410. Database 420 may store storage policies, schedule policies, received sample data, other storage operation information, and so on. User interface 410 may display system information to a user. Further details with respect to the user interface display are discussed below.
  • [0027]
    Global server 100 may push data to a management server 442. Server 442 communicates with a database 445 and clients 451, 452 and/or 453. Data storage servers 430 push data to the global server 100, and contain data agents 432 and can communicate with databases 435. These servers may communicate with clients 454, 455, and/or 456.
  • [0028]
    Global server 100 can be configured to perform actions (such as redistributing storage operations), and apply these actions to the data storage system via a management server. Global server 100 receives information used to determine the actions from the storage servers 430. In this example, the global server acts as a hub in the data storage system by sending information to modify data storage operations and monitoring the data storage operations to determine how to improve the operations.
  • Index Searching
  • [0029]
    The hierarchical storage system described herein can be used for searching multiple indices of content, retrieving the identified data in accordance with integrated data security policies, and applying the search criteria as a data management policy. Some or all of these functions can be performed via a simple interface accessed, e.g., from a web browser.
  • [0030]
    The content indices searched can be created by a content indexing system. Indices of this data can be created using any known technique including those described in the assignee's co-pending application Ser. No. 11/694,869 entitled “Method and System for Offline Indexing of Content and Classifying Stored Data” (Attorney Docket No. 60692-8046), the contents of which are herein incorporated by reference.
  • [0031]
    The content indexing system can create an index of an organization's content by examining files generated from routine secondary copy operations performed by the organization. The content indexing system can index content from current secondary copies of the system as well as older copies that contain data that may no longer be available on the organization's network. For example, the organization may have secondary copies dating back several years that contain older data that is no longer available, but may still be relevant to the organization. The content indexing system may associate additional properties with data that are not part of traditional indexing of content, such as the time the content was last available or user attributes associated with the content. For example, user attributes such as a project name with which a data file is associated may be stored.
  • [0032]
    Members of the organization can search the created index to locate content on a secondary storage device that is no longer online. For example, a user can search for content related to a project that was cancelled a year ago. In this way, content indexing is not affected by the availability of the system that is the original source of the content and users can find additional organization data that is not available in traditional content indexing systems.
  • [0033]
    In some embodiments, members of the organization can search for content within the organization independent of the content's source through a single, unified user interface, which may be available through a web browser. For example, members may search for content that originated on a variety of computer systems within the organization. Members may also search through any copy of the content including any primary, secondary, and/or tertiary or auxiliary copies of the content.
  • [0034]
    In some embodiments, the content indexing system searches for content based on availability information related to the content. For example, a user may search for content available during a specified time period, such as email received during a particular month. A user may also search specifically for content that is no longer available, such as searching for files deleted from the user's primary computer system. The user may perform a search based on the attributes described above, such as a search based on the time an item was deleted or based on a project with which the item was associated. A user may also search based on keywords associated with user attributes, such as searching for files that only an executive of the organization would have access to, or searching for files tagged as confidential.
  • [0035]
    FIG. 5 is a block diagram that illustrates the procedural flow of data, in one embodiment. Content is initially stored on a data server 505 that may be a user computer, data warehouse server, or other information store accessible via a network. The data is accessed by a secondary copy manager 510 to perform a regular copy of the data. Secondary copies of data are stored in a secondary copy data store 515 such as a network attached storage device or secondary copy server. The secondary copy data store 515 provides the data to the content indexing system 520 to perform the functions described above. As illustrated in the diagram, because the content indexing system 520 works with a copy of the data, the original data server 505 is not negatively impacted by the operations of the content indexing system 520. Search system 525 can operate on the data in the content indexing system 520 to provide search functionality for the data having been stored in the secondary copy data store 515.
  • [0036]
    FIGS. 6-7 are representative flow diagrams that depict processes used in some embodiments. These flow diagrams do not show all functions or exchanges of data, but instead they provide an understanding of commands and data exchanged under the system. Those skilled in the relevant art will recognize that some functions or exchange of commands and data may be repeated, varied, omitted, or supplemented, and other (less important) aspects not shown may be readily implemented.
  • [0037]
    FIG. 6 is a flow diagram that illustrates the processing of a content indexing component for later searching, according to one embodiment. The component is invoked when new content is available or additional content is ready to be added to the content index. In step 610, the component selects a copy of the data to be indexed. For example, the copy may be a secondary copy of the data or a data snapshot. In step 620, the component identifies content within the copy of the data. For example, the component may identify data files such as word processing documents, spreadsheets, and presentation slides within the secondary data store. In step 630, the component updates an index of content to make the content available for searching. For example, the component may add information such as the location of the content, keywords found within the content, and other supplemental information about the content that may be helpful for locating the content during a search. After step 630, these steps conclude.
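    The three steps of FIG. 6 can be sketched as a simple inverted-index update. This is a minimal sketch under stated assumptions, not the indexing method of the referenced application: the function `index_copy`, the set of indexable extensions, and the representation of a copy as a mapping from path to extracted text are all hypothetical.

```python
import re
from collections import defaultdict

INDEXABLE = (".doc", ".xls", ".ppt", ".txt")  # assumed set of content types


def index_copy(copy_files: dict, copy_id: str, index: dict) -> int:
    """Index one copy of the data.

    copy_files maps a path within the copy to its extracted text.
    index maps a keyword to a set of (copy_id, path) locations.
    Returns the number of content items indexed.
    """
    count = 0
    for path, text in copy_files.items():        # step 620: identify content
        if not path.lower().endswith(INDEXABLE):
            continue
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            index[word].add((copy_id, path))     # step 630: update the index
        count += 1
    return count


content_index = defaultdict(set)
secondary_copy = {                                # step 610: the selected copy
    "projects/apollo/plan.doc": "Apollo launch budget",
    "projects/apollo/logo.png": "",
    "finance/q3.xls": "Q3 budget summary",
}
indexed = index_copy(secondary_copy, "copy-2006-12", content_index)
```

    Because each index entry records the copy identifier as well as the path, content remains locatable even when the system that originally held it is no longer online.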
  • [0038]
    FIG. 7 is a flow diagram that illustrates the processing of an index searching component of the system, in one embodiment. In step 710, the component receives a search request specifying criteria for finding matching target content. For example, the search request may specify one or more keywords that will be found in matching documents. The search request may also specify boolean operators, regular expressions, and other common search parameters to identify relationships and precedence between terms within the search query. The search request may also specify data stores to be searched. The request may specify that the search is to include one or more of an original copy, a primary secondary copy, and secondary or auxiliary copies of the content. As described in more detail below, in some embodiments, a user may be provided with an interface by which to select one or more classes of data stores for search. In some embodiments, an interface may be provided by which a user can specify a security clearance and corresponding operators. For example, a user could form a search query for all documents on a certain class of data store having medium security or higher clearance.
  • [0039]
    In step 720, the component searches the content index to identify matching content items that are added to a set of search results. For example, the component may identify documents containing specified keywords or other criteria and add these to a list of search results. In step 730, the component selects a first or next search result. In decision step 740, if the search results indicate that the identified content is offline, then the component continues at step 750, else the component continues at step 760. For example, the content may be offline because it is on a tape that has been sent to an offsite storage location. In step 750, the component retrieves the archived content. Additionally or alternatively, the component may provide an estimate of the time required to retrieve the archived content and add this information to the selected search result. In step 760, the component provides the search results in response to the search query. For example, the user may receive the search results through a web browser interface that lists the search results or the search results may be provided to another component for additional processing through an application programming interface (API). After step 760, these steps conclude.
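    The flow of FIG. 7 can be sketched as follows. The sketch assumes a keyword index mapping each term to a set of paths and a hypothetical `catalog` holding per-item media metadata (`offline`, `recall_minutes`); it implements the alternative in which offline results are annotated with a retrieval estimate rather than retrieved immediately.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    path: str
    offline: bool
    retrieval_estimate_minutes: Optional[int] = None


def search(index: dict, catalog: dict, keywords: list) -> list:
    """Return results for items that contain every keyword.

    index maps keyword -> set of paths; catalog maps path -> media metadata.
    """
    # Step 720: intersect the posting sets of all keywords.
    postings = [index.get(k.lower(), set()) for k in keywords]
    matches = set.intersection(*postings) if postings else set()

    results = []
    for path in sorted(matches):                  # step 730: first or next result
        media = catalog[path]
        result = Result(path=path, offline=media["offline"])
        if result.offline:                        # decision step 740
            # Alternative to retrieval: attach the estimated retrieval time.
            result.retrieval_estimate_minutes = media["recall_minutes"]
        results.append(result)
    return results                                # step 760: provide the results


index = {"budget": {"a.doc", "b.xls"}, "apollo": {"a.doc"}}
catalog = {
    "a.doc": {"offline": True, "recall_minutes": 240},
    "b.xls": {"offline": False, "recall_minutes": 0},
}
hits = search(index, catalog, ["budget"])
```

    A caller such as a web interface could then show the estimate next to each offline item and let the user decide whether to request retrieval.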
  • Federated Search
  • [0040]
    The search described herein can include indices of data, where the data is a snapshot, primary copy, secondary copy, auxiliary copy, and so on. An organization may have several copies of data available on different types of media. Data may be available on, for example, a tape, on a secondary copy server, or through network attached storage.
  • [0041]
    The search capability can be extended to handle an end-user based search via a web interface, a user-based search (e.g., all files that can belong to “Bob” or that can be viewed by “Bob”), search results across several application types (e.g., file copies, Microsoft Exchange mailbox copies, Microsoft Exchange data agents, Microsoft Exchange public folders, etc.) and search results across multiple computers.
  • [0042]
    Using a graphical user interface, search criteria can be provided to specify data that is stored on any number and type of volumes and any type of data. An interface such as the interface 800 illustrated in FIG. 8 can be used to specify a search term 801 and one or more clients or volumes to search. As illustrated in FIG. 8, a list of available clients 805 can be presented. A set of controls 810 can be used to select one or more of the available clients. Selected clients can be displayed in region 815. Variations on this embodiment of the interface can be used to allow a user to select various volumes for the search. For example, the interface can allow a user to specify that the search is to be over the original copy, a primary secondary copy, and secondary or auxiliary copies of the content. The interface can also be configured to allow the user to specify that the search is to include file contents. An exemplary interface for allowing this option and receiving additional related parameters from a user can include an enabling check box 820 for searching in files, a search by field 825, a file name field 830, and a folder path 835 field.
  • [0043]
    The search criteria can also specify that the data be from any of multiple applications or of any type. An example of an interface for receiving additional search parameters is shown in FIG. 9. The search interface 900 can include fields for a search term 905, file name 906, file size 907, folder 908, modification date 909, email subject 910, email sender 911, email recipient 912, folder 913, date of receipt 914, and various advanced options such as client 915, iDA 916, owner 917, accessibility 918, sample 919, indexing time 920, and time zone 921.
  • [0044]
    Through the same interface or a separate interface, the user can also select the various types of application data to be searched. The graphical interface for performing the search can provide an efficient means for a user to enter search terms and perform that search over multiple volumes and data types. For example, the interface can provide check boxes or other population routines for identifying hardware or resources and display the list whereby a user can select specific volumes by name or address or whereby a user can select volumes by type or classification. Similarly, a user may be prompted to specify data types or classes.
  • [0045]
    In some embodiments, the search performed over multiple secondary copies and physical devices will be made with reference to metadata stored in one or more metabases or other forms of databases. A data collection agent may traverse a network file system and obtain certain characteristics and other attributes of data in the system. In some embodiments, such a database may be a collection of metadata and/or other information regarding the network data and may be referred to herein as a metabase. Generally, metadata refers to data or information about data, and may include, for example, data relating to storage operations or storage management, such as data locations, storage management components associated with data, storage devices used in performing storage operations, index data, data application type, or other data. Operations can be performed on this data using any known technique including those described in the assignee's co-pending application Ser. No. 11/564,119 entitled “Systems and Methods for Classifying and Transferring Information in a Storage Network” (Attorney Docket No. 60692-8029) the contents of which are herein incorporated by reference.
  • [0046]
    Current storage management systems employ a number of different methods to perform storage operations on electronic data. For example, data can be stored in primary storage as a primary copy or in secondary storage as various types of secondary copies, including a backup copy, a snapshot copy, a hierarchical storage management (“HSM”) copy, an archive copy, and other types of copies.
  • [0047]
    A primary copy of data is generally a production copy or other “live” version of the data which is used by a software application and is generally in the native format of that application. Primary copy data may be maintained in a local memory or other high-speed storage device that allows for relatively fast data access if necessary. Such primary copy data is typically intended for short term retention (e.g., several hours or days) before some or all of the data is stored as one or more secondary copies, for example to prevent loss of data in the event a problem occurred with the data stored in primary storage.
  • [0048]
    Secondary copies include point-in-time data and are typically intended for long-term retention (e.g., weeks, months or years depending on retention criteria, for example as specified in a storage policy as further described herein) before some or all of the data is moved to other storage or discarded. Secondary copies may be indexed so users can browse and restore the data at another point in time. After certain primary copy data is backed up, a pointer or other location indicia such as a stub may be placed in the primary copy to indicate the current location of that data.
  • [0049]
    One type of secondary copy is a backup copy. A backup copy is generally a point-in-time copy of the primary copy data stored in a backup format as opposed to in native application format. For example, a backup copy may be stored in a backup format that is optimized for compression and efficient long-term storage. Backup copies generally have relatively long retention periods and may be stored on media with slower retrieval times than other types of secondary copies and media. In some cases, backup copies may be stored at an offsite location.
  • [0050]
    Another form of secondary copy is a snapshot copy. From an end-user viewpoint, a snapshot may be thought of as an instant image of the primary copy data at a given point in time. A snapshot generally captures the directory structure of a primary copy volume at a particular moment in time, and also preserves file attributes and contents. In some embodiments, a snapshot may exist as a virtual file system, parallel to the actual file system. Users typically gain read-only access to the record of files and directories of the snapshot. By electing to restore primary copy data from a snapshot taken at a given point in time, users may also return the current file system to the prior state of the file system that existed when the snapshot was taken.
  • [0051]
    A snapshot may be created instantly, using a minimum of file space, but may still function as a conventional file system backup. A snapshot may not actually create another physical copy of all the data, but may simply create pointers that are able to map files and directories to specific disk blocks.
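    One common way to realize such pointer-based snapshots is copy-on-write, in which a block's original content is preserved only when that block is first overwritten. The toy model below is a sketch under that assumption; the `Volume` class and its methods are hypothetical and operate on an in-memory list rather than real disk blocks.

```python
class Volume:
    """Toy block device with copy-on-write snapshots (illustrative only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        """Create a snapshot; initially it stores no block data at all."""
        snap = {}  # block index -> preserved original content
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        """Preserve the old block in each snapshot that lacks it, then overwrite."""
        for snap in self.snapshots:
            snap.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap, index):
        """Read a block as it was when the snapshot was taken."""
        return snap.get(index, self.blocks[index])


vol = Volume(["a", "b", "c"])
snap = vol.snapshot()
vol.write(1, "B")
vol.write(1, "BB")
snapshot_view = [vol.read_snapshot(snap, i) for i in range(3)]
```

    After the two writes, the snapshot holds a single preserved block while still presenting the full original volume, which is why the space cost grows only with the amount of data changed.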
  • [0052]
    In some embodiments, once a snapshot has been taken, subsequent changes to the file system typically do not overwrite the blocks in use at the time of snapshot. Therefore, the initial snapshot may use only a small amount of disk space needed to record a mapping or other data structure representing or otherwise tracking the blocks that correspond to the current state of the file system. Additional disk space is usually only required when files and directories are actually modified later. Furthermore, when files are modified, typically only the pointers which map to blocks are copied, not the blocks themselves. In some embodiments, for example in the case of copy-on-write snapshots, when a block changes in primary storage, the block is copied to secondary storage before the block is overwritten in primary storage and the snapshot mapping of file system data is updated to reflect the changed block(s) at that particular point in time.
    An HSM copy is generally a copy of the primary copy data, but typically includes only a subset of the primary copy data that meets certain criteria and is usually stored in a format other than the native application format. For example, an HSM copy might include only that data from the primary copy that is larger than a given size threshold or older than a given age threshold and that is stored in a backup format. Often, HSM data is removed from the primary copy, and a stub is stored in the primary copy to indicate its new location. When a user requests access to the HSM data that has been removed or migrated, systems use the stub to locate the data and often make recovery of the data appear transparent even though the HSM data may be stored at a location different from the remaining primary copy data.
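    The stub mechanism described for HSM copies can be sketched with two dictionaries standing in for primary and secondary storage. The names `Stub`, `migrate`, and `read` are hypothetical, and the size threshold is the only migration criterion modeled.

```python
from dataclasses import dataclass


@dataclass
class Stub:
    """Placeholder left in primary storage pointing at migrated data."""
    location: str


def migrate(primary: dict, secondary: dict, size_threshold: int) -> list:
    """Move items larger than the threshold to secondary storage, leaving stubs."""
    moved = []
    for name, content in list(primary.items()):
        if isinstance(content, bytes) and len(content) > size_threshold:
            secondary[name] = content
            primary[name] = Stub(location=name)
            moved.append(name)
    return moved


def read(primary: dict, secondary: dict, name: str) -> bytes:
    """Return file content, following a stub transparently if one is present."""
    entry = primary[name]
    if isinstance(entry, Stub):
        return secondary[entry.location]
    return entry


primary = {"small.txt": b"hi", "big.bin": b"x" * 100}
secondary = {}
moved = migrate(primary, secondary, size_threshold=10)
```

    A caller of `read` gets the same bytes whether or not the item was migrated, which is the sense in which recall appears transparent.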
  • [0053]
    An archive copy is generally similar to an HSM copy, however, the data satisfying criteria for removal from the primary copy is generally completely removed with no stub left in the primary copy to indicate the new location (i.e., where it has been moved to). Archive copies of data are generally stored in a backup format or other non-native application format. In addition, archive copies are generally retained for very long periods of time (e.g., years) and in some cases are never deleted. Such archive copies may be made and kept for extended periods in order to meet compliance regulations or for other permanent storage applications.
  • [0054]
    In some embodiments, application data over its lifetime moves from more expensive quick access storage to less expensive slower access storage. This process of moving data through these various tiers of storage is sometimes referred to as information lifecycle management (“ILM”). This is the process by which data is “aged” from more expensive forms of secondary storage with faster access/restore times down through less expensive secondary storage with slower access/restore times, for example, as the data becomes less important or mission critical over time.
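    A simple age-based rule illustrates how data might be assigned to tiers under ILM. The tier names and day thresholds below are assumptions made for the example; the description does not specify them.

```python
from datetime import date

# Assumed tiers, ordered from fastest/most expensive to slowest/cheapest,
# each with the minimum age in days at which data moves onto it.
TIERS = [("disk", 0), ("nearline", 90), ("tape", 365)]


def tier_for(last_modified: date, today: date) -> str:
    """Pick the cheapest tier whose age threshold the data has reached."""
    age = (today - last_modified).days
    chosen = TIERS[0][0]
    for name, min_age in TIERS:
        if age >= min_age:
            chosen = name
    return chosen


today = date(2009, 7, 29)
recent = tier_for(date(2009, 7, 1), today)
older = tier_for(date(2009, 1, 1), today)
oldest = tier_for(date(2007, 1, 1), today)
```

    In practice the aging decision could also weigh importance or access frequency rather than age alone.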
  • [0055]
    With this arrangement, when a search over multiple secondary copies is to be performed, a system administrator or system process may simply consult the metabase for such information rather than iteratively access and analyze each data item in the network. This approach significantly reduces the amount of time required to obtain data object information by substantially reducing or eliminating the need to obtain information from the source data, and furthermore reduces or minimizes the involvement of network resources in this process, thereby reducing the processing burden on the host system.
  • [0056]
    In some embodiments, a query may be received by the system for certain information. This request may be processed and analyzed by a manager module or other system process that determines or otherwise identifies which metabase or metabases within the system likely include at least some of the requested information. For example, the query itself may suggest which metabases to search and/or the management module may consult an index that contains information regarding metabase content within the system. The identification process may include searching and identifying multiple computing devices within an enterprise or network that may contain information satisfying search criteria.
  • [0057]
    A processor can be configured to search metabases or other indices corresponding to multiple volumes and data stores to identify an appropriate data set that may potentially have information related to the query. This may involve performing iterative searches that examine results generated by previous searches and subsequently searching additional, previously unidentified metabases to find responsive information that may not have been found during the initial search. Thus, the initial metabase search may serve as a starting point for searching tasks that may be expanded based on returned or collected results. The returned results may be optionally analyzed for relevance, arranged, and placed in a format suitable for subsequent use (e.g., with another application), or suitable for viewing by a user and reported.
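    The iterative expansion described above can be sketched as a breadth-first traversal in which results from one metabase may point to further metabases. This is only a sketch: the record layout (a `text` field and an optional `see_also` list) and the function `federated_search` are hypothetical.

```python
from collections import deque


def federated_search(metabases: dict, start: list, matches) -> dict:
    """Search metabases iteratively, following references found in results.

    metabases maps a metabase name to a list of records; each record is a
    dict with a "text" field and an optional "see_also" list naming other
    metabases. matches is a predicate applied to each record.
    """
    queue = deque(start)
    visited = set()
    hits = {}
    while queue:
        name = queue.popleft()
        if name in visited or name not in metabases:
            continue
        visited.add(name)
        found = [r for r in metabases[name] if matches(r)]
        if found:
            hits[name] = found
        # Expand the search to metabases referenced by the results so far.
        for record in found:
            queue.extend(record.get("see_also", []))
    return hits


metabases = {
    "mail": [{"text": "apollo budget", "see_also": ["archive"]}, {"text": "lunch"}],
    "archive": [{"text": "apollo final report"}],
    "hr": [{"text": "apollo staffing"}],
}
hits = federated_search(metabases, ["mail"], lambda r: "apollo" in r["text"])
```

    In this example the `hr` metabase is never consulted because nothing in the results refers to it, showing how the initial search acts as a starting point that is widened only as results warrant.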
  • [0058]
    Once a search has been performed and at least one document or other discrete data item identified, a list of the identified documents or data items can be provided. An example interface 1000 for displaying the results of an email search is illustrated in FIG. 10. The interface 1000 can include a summary area 1005 with summary information as well as a search results section 1010.
  • [0059]
    In some further embodiments, the one or more identified documents can be retrieved without performing a restore of the data back to the production volume. Such a transfer may involve copying data objects and metadata from one data store and metabase to another, or in some embodiments, may involve migrating the data from its original location to a second location and leaving a pointer or other reference to the second location so the moved information may be quickly located from information present at the original location.
  • [0060]
    In some embodiments, a preview pane can be provided so that a user can view at least a portion of the contents of the identified file. One such preview pane 1100 is illustrated in FIG. 11. This preview can be provided before any restore or retrieve operation is executed. In some embodiments, the preview can be generated by reading the identified file from the original data store and displaying the contents on the screen. In other embodiments, the identified file can be copied to a local disk and the preview generated based on the file as it resides on the local disk. In some embodiments, the interface can display a portion of content 1105 from the data file returned by the search query and, in some further embodiments, prompt a user to refine the search. Data retrieval can also be performed using any known technique including those described in the assignee's co-pending application Ser. No. 11/694,890 entitled “System and Method for Data Retrieval, Including Secondary Copy Precedence Optimizations” (Attorney Docket No. 60692-8039), the contents of which are herein incorporated by reference.
  • Data Management Policy Integration
  • [0061]
    In some embodiments, the search criteria provided by a user as part of a search can later be applied as a data management policy. For example, a user could develop search terms that identify a certain set of data files. These search terms can then be stored as a data management policy which can then be applied at any other point in the data storage system. A data management policy created in this manner can be a data structure or other information source that includes a set of preferences and other storage criteria associated with performing a storage operation. The data management policy created based on user-supplied search criteria can also be used as part of a schedule policy.
  • [0062]
    A schedule policy may specify when to perform storage operations and how often, and may also specify performing certain storage operations on sub-clients of data and how to treat those sub-clients. Sub-clients may represent static or dynamic associations of portions of data of a volume and are typically mutually exclusive. Thus, a portion of data may be given a label and the association is stored as a static entity in an index, database or other storage location used by the system. Sub-clients may also be used as an effective administrative scheme of organizing data according to data type, department within the enterprise, storage preferences, etc. The search criteria provided by a user can be used as a file selector in connection with any schedule policy.
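    The reuse of saved search criteria as a file selector can be sketched as follows. The classes `SearchCriteria` and `SchedulePolicy`, and the choice of a file name pattern plus a keyword as the criteria, are hypothetical simplifications for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class SearchCriteria:
    """User-supplied criteria, reusable as a file selector."""
    name_pattern: str = "*"
    keyword: str = ""

    def selects(self, name: str, text: str) -> bool:
        """True if the file name matches the pattern and the text has the keyword."""
        return fnmatchcase(name, self.name_pattern) and self.keyword.lower() in text.lower()


@dataclass(frozen=True)
class SchedulePolicy:
    """Hypothetical schedule policy that reuses saved search criteria."""
    selector: SearchCriteria
    operation: str   # e.g. "archive"
    every_days: int  # how often the operation runs

    def due_items(self, files: dict) -> list:
        """Return the names of files the operation applies to on a run."""
        return sorted(n for n, t in files.items() if self.selector.selects(n, t))


criteria = SearchCriteria(name_pattern="*.doc", keyword="apollo")
policy = SchedulePolicy(selector=criteria, operation="archive", every_days=7)
files = {
    "plan.doc": "Apollo plan",
    "notes.doc": "misc",
    "apollo.xls": "Apollo numbers",
}
selected = policy.due_items(files)
```

    Because the criteria object is immutable, the same saved search can be attached to several policies without one policy altering what another selects.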
  • [0063]
    In some embodiments, the data management policy can include various storage preferences, for example, those expressed by a user preference or storage policy. As non-limiting examples, the data management policy can specify a storage location, relationships between system components, network pathway to utilize, retention policies, data characteristics, compression or encryption requirements, preferred system components to utilize in a storage operation, and other criteria relating to a storage operation. Thus, a storage policy may indicate that certain data is to be stored in a specific storage device, retained for a specified period of time before being aged to another tier of secondary storage, copied to secondary storage using a specified number of streams, etc. A storage policy and/or a schedule policy may be stored in a storage manager database or in other locations or components of the system.
  • Integrated Data Rights Security Control
  • [0064]
    Some organizations may have multiple levels of security according to which some users can access certain files while others cannot. For example, a high security user group can be defined and this group can be granted access to all documents created by the organization; a medium security group can be granted access to only certain classes of documents; a low security group can be granted access only to certain predefined documents.
  • [0065]
    The search interface described herein can be configured to be accessible by any type of user including a secondary copy administrator, an end user who does not have any administrative privileges, or a user of any security clearance. Additionally, the data files stored in the data management system can be tagged with security information. This information tag can be stored in a metabase or any other form of content index and can be used to leverage existing security schema. In embodiments in which a search is performed on one or more content indices, corresponding security tag information can be stored therein. Security information can include identification of various classes of users who are granted rights to access the document as well as identification of classes of users who are denied access rights.
  • [0066]
    In some embodiments, security information can be stored in the form of user tags. User tags are further described in the assignee's co-pending application Ser. No. 11/694,784 entitled “System and Method Regarding Security And Permissions” (Attorney Docket No. 60692.8042), the contents of which are herein incorporated by reference.
  • [0067]
    In some further embodiments, the search results can be filtered based on the user's security clearance or access privileges. After a user enters search parameters, data files matching those parameters may be identified, and a list of the identified files displayed to the user. If the user does not have the required security clearance or access privileges, the interface can be configured not to display the file.
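    Filtering of this kind can be sketched as a check of each result's security tags against the requesting user. The clearance ordering in `LEVELS`, the tag names (`required_level`, `allow`, `deny`), and the rule that an explicit denial takes precedence are assumptions made for the example.

```python
LEVELS = {"low": 1, "medium": 2, "high": 3}  # assumed clearance ordering


def visible_results(results: list, user_groups: set, user_level: str) -> list:
    """Drop results the user may not see.

    Each result is a dict with a "required_level" tag plus optional
    "allow" and "deny" sets of group names, mirroring tags that grant or
    deny access to classes of users.
    """
    shown = []
    for item in results:
        if user_groups & item.get("deny", set()):
            continue  # explicit denial wins
        allowed = item.get("allow")
        if allowed is not None and not (user_groups & allowed):
            continue  # user is not in any granted class
        if LEVELS[user_level] < LEVELS[item["required_level"]]:
            continue  # clearance too low
        shown.append(item)
    return shown


results = [
    {"name": "memo.doc", "required_level": "low"},
    {"name": "salaries.xls", "required_level": "high"},
    {"name": "merger.doc", "required_level": "medium", "deny": {"contractors"}},
]
staff_view = [r["name"] for r in visible_results(results, {"staff"}, "medium")]
contractor_view = [r["name"] for r in visible_results(results, {"contractors"}, "high")]
```

    Applying the filter before the result list is rendered means a user never learns of the existence of files outside their clearance.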
  • [0068]
    It is possible that a secondary copy administrator may not have sufficient security clearance to inspect a file that is being restored or retrieved. In such a circumstance, the administrator will not be allowed to preview the file or otherwise inspect its contents during the search process. The interface providing results may be configured to not display a preview of such a file. If a secondary copy administrator has sufficient security clearance, then a preview may be provided or the administrator may be allowed to make a local copy of the file.
  • [0069]
    If the secondary copy administrator does not have sufficient security clearance for a specific file or group or class of files, an interface may be provided through which the administrator may initiate a copy of that file directly from the secondary copy device to a directory or disk associated with a user who has sufficient security clearance. In some instances, the user associated with the file may be the owner of the file. If the secondary copy administrator or other user executing a search query has sufficient security clearance to inspect the contents of the one or more files identified in the search, a preview of the data file may be displayed.
  • System Embodiments
  • [0070]
    The following discussion provides a brief, general description of a suitable computing environment in which the invention can be implemented. Although not required, aspects of the invention are described in the general context of computer-executable instructions, such as routines executed by a general-purpose computer, e.g., a server computer, wireless device or personal computer. Those skilled in the relevant art will appreciate that the invention can be practiced with other communications, data processing, or computer system configurations, including: Internet appliances, hand-held devices (including personal digital assistants (PDAs)), wearable computers, all manner of cellular or mobile phones, multi-processor systems, microprocessor-based or programmable consumer electronics, set-top boxes, network PCs, mini-computers, mainframe computers, and the like. Indeed, the terms “computer,” “host,” and “host computer” are generally used interchangeably herein, and refer to any of the above devices and systems, as well as any data processor.
  • [0071]
    Aspects of the invention can be embodied in a special purpose computer or data processor that is specifically programmed, configured, or constructed to perform one or more of the computer-executable instructions explained in detail herein. Aspects of the invention can also be practiced in distributed computing environments where tasks or modules are performed by remote processing devices, which are linked through a communications network, such as a Local Area Network (LAN), Wide Area Network (WAN), or the Internet. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
  • [0072]
    Aspects of the invention may be stored or distributed on computer-readable media, including magnetically or optically readable computer discs, hard-wired or preprogrammed chips (e.g., EEPROM semiconductor chips), nanotechnology memory, biological memory, or other data storage media. Indeed, computer implemented instructions, data structures, screen displays, and other data under aspects of the invention may be distributed over the Internet or over other networks (including wireless networks), on a propagated signal on a propagation medium (e.g., an electromagnetic wave(s), a sound wave, etc.) over a period of time, or they may be provided on any analog or digital network (packet switched, circuit switched, or other scheme). Those skilled in the relevant art will recognize that portions of the invention reside on a server computer, while corresponding portions reside on a client computer such as a mobile or portable device, and thus, while certain hardware platforms are described herein, aspects of the invention are equally applicable to nodes on a network.
  • CONCLUSION
  • [0073]
    From the foregoing, it will be appreciated that specific embodiments of the system have been described herein for purposes of illustration, but that various modifications may be made without deviating from the spirit and scope of the invention. For example, although files have been described, other types of content such as user settings, application data, emails, and other data objects can all be indexed by the system. Accordingly, the invention is not limited except as by the appended claims.
  • [0074]
    Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense, as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to.” The word “coupled”, as generally used herein, refers to two or more elements that may be either directly connected, or connected by way of one or more intermediate elements. Additionally, the words “herein,” “above,” “below,” and words of similar import, when used in this application, shall refer to this application as a whole and not to any particular portions of this application. Where the context permits, words in the above Detailed Description using the singular or plural number may also include the plural or singular number respectively. The word “or,” in reference to a list of two or more items, covers all of the following interpretations of the word: any of the items in the list, all of the items in the list, and any combination of the items in the list.
  • [0075]
    The above detailed description of embodiments of the invention is not intended to be exhaustive or to limit the invention to the precise form disclosed above. While specific embodiments of, and examples for, the invention are described above for illustrative purposes, various equivalent modifications are possible within the scope of the invention, as those skilled in the relevant art will recognize. For example, while processes or blocks are presented in a given order, alternative embodiments may perform routines having steps, or employ systems having blocks, in a different order, and some processes or blocks may be deleted, moved, added, subdivided, combined, and/or modified. Each of these processes or blocks may be implemented in a variety of different ways. Also, while processes or blocks are at times shown as being performed in series, these processes or blocks may instead be performed in parallel, or may be performed at different times.
  • [0076]
    The teachings of the invention provided herein can be applied to other systems, not necessarily the system described above. The elements and acts of the various embodiments described above can be combined to provide further embodiments.
  • [0077]
    These and other changes can be made to the invention in light of the above Detailed Description. While the above description details certain embodiments of the invention and describes the best mode contemplated, no matter how detailed the above appears in text, the invention can be practiced in many ways. Details of the system may vary considerably in implementation, while still being encompassed by the invention disclosed herein. As noted above, particular terminology used when describing certain features or aspects of the invention should not be taken to imply that the terminology is being redefined herein to be restricted to any specific characteristics, features, or aspects of the invention with which that terminology is associated. In general, the terms used in the following claims should not be construed to limit the invention to the specific embodiments disclosed in the specification, unless the above Detailed Description section explicitly defines such terms. Accordingly, the actual scope of the invention encompasses not only the disclosed embodiments, but also all equivalent ways of practicing or implementing the invention under the claims.
  • [0078]
    While certain aspects of the invention are presented below in certain claim forms, the inventors contemplate the various aspects of the invention in any number of claim forms. For example, while only one aspect of the invention is recited as embodied in a computer-readable medium, other aspects may likewise be embodied in a computer-readable medium. Accordingly, the inventors reserve the right to add additional claims after filing the application to pursue such additional claim forms for other aspects of the invention.
Classifications
U.S. Classification: 1/1, 711/E12.001, 707/E17.044, 711/E12.103, 707/999.003, 707/999.102, 707/999.202
International Classification: G06F12/00, G06F17/30, G06F12/16, G06F7/00
Cooperative Classification: G06F17/30864, G06F17/30442, G06F17/30082, G06F17/30011
European Classification: G06F17/30W1, G06F17/30S4P3