|Publication number||US7139884 B2|
|Application number||US 10/454,953|
|Publication date||Nov 21, 2006|
|Filing date||Jun 5, 2003|
|Priority date||Jun 5, 2003|
|Also published as||US20040250162|
|Inventors||Donald R. Halley, Paul Douglas Koeller|
|Original Assignee||International Business Machines Corporation|
The present invention relates generally to the data processing field, and more particularly, relates to a method, apparatus and computer program product for implementing enhanced autonomic data backup using multiple backup devices.
As used in the following description and claims, the term library means a container similar to a directory that contains a list of objects that contain the actual data for backup. As used in the following description and claims, the term tape drives includes other forms of backup media, such as optical media, DASDs, and remote servers.
The amount of data that customers need to back up continues to increase. Unfortunately, the amount of time available to perform the backup, often referred to as the backup window, typically stays the same or even decreases. To complete their backup within the backup window, many customers purchase additional backup devices. These backup devices are most often tape libraries with multiple tape devices, although they could also be additional standalone tape devices or some type of optical device. The customer then uses the multiple devices concurrently to reduce the total amount of time required to perform the backup.
Tape drives have been used to store a backup copy of data objects onto removable tape media in various computer systems. For example, in the International Business Machines Corporation eServer iSeries Server computer system, the iSeries Server operating system provides users with commands that allow users to make a backup copy of data objects onto removable media, such as tape backup. The iSeries server operating system has provided two ways to support the use of the multiple tape devices: a serial format defined as serially backing up a single library to a single tape device and a parallel format defined as backing up a single library or even a single object to multiple tape devices in parallel.
For example, U.S. Pat. No. 6,154,852 to Amundson et al., issued Nov. 28, 2000 and assigned to the present assignee, discloses a method and apparatus for data backup and recovery. The data backup and recovery method uses a plurality of tape drives in parallel. A unique token is associated with each data object being saved to a tape media. While saving backup data to the plurality of tape drives, a dynamic load balancer dynamically balances the load between the plurality of tape drives. While recovering backup data from tape media, the unique token is utilized for processing tape media files in any order. Data segments of one or more objects are distributed across the parallel tape devices and are non-serial across the tape media files used. When recovering backup data from tape media, the same number or fewer tape drives than used during data saving can be used.
By making use of the serial and parallel formats, the iSeries Server operating system provided two alternative methods to utilize multiple tape devices to back up a given set of libraries. An inherent problem with backing up each library in parallel format is that, while the parallel format works well for very large libraries, it is inefficient for small libraries. For small libraries the parallel format backup method can be slower than backing up to a single tape device, because tape opens must be performed on each device and, at a minimum, some control information must be written to each tape. Furthermore, it can complicate recovery of a small library because the data is spread across multiple tape devices.
The method to back up each library in serial format makes use of the tape drives as they become available. For example, if libraries named A, B, C, . . . Z are to be backed up to three tape drives 1–3, that processing will concurrently back up library A to tape drive 1, library B to tape drive 2, and library C to tape drive 3; library D will use whichever tape drive finishes first, and each subsequent library will use the next available tape drive until all libraries have been backed up. The problem with this approach is that each library can be backed up only to a single tape drive. If the customer has one library that is very large, the total backup time might be gated by that one library. Worse yet, if the large library happens to be near the end of the list, the total backup time may be significantly lengthened. For example, consider the example above but assume that library Z is 1000 times larger than libraries A through Y. While libraries A through Y would complete quite quickly, the backup would not be complete until library Z was backed up to a single tape drive.
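The gating effect in the A through Z example can be reproduced with a small simulation. This is only an illustrative sketch: the function name is hypothetical, and it assumes that backup time is proportional to library size and that every drive runs at the same speed, neither of which is stated in the description.

```python
import heapq

def serial_backup_elapsed(library_sizes, drive_count):
    """Simulate serial-format backup: each library, in list order, goes to
    whichever drive becomes free first. Returns the total elapsed time,
    ASSUMING time is proportional to size and all drives are equally fast."""
    # Min-heap holding the time at which each drive next becomes free.
    free_at = [0.0] * drive_count
    heapq.heapify(free_at)
    for size in library_sizes:
        start = heapq.heappop(free_at)        # next available drive
        heapq.heappush(free_at, start + size)
    return max(free_at)

# Libraries A..Y each have size 1; library Z is 1000 times larger and is last.
sizes = [1.0] * 25 + [1000.0]
elapsed = serial_backup_elapsed(sizes, 3)     # 1008.0 time units
small_only = serial_backup_elapsed(sizes[:-1], 3)  # 9.0 time units for A..Y
```

Under these assumptions libraries A through Y finish within 9 time units, yet the whole backup takes 1008 because library Z occupies a single drive while the other two sit idle.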
To deal with the problems above, customers currently must resort to manually optimizing their backup. For example, they may use the parallel backup method but must create customized backup procedures to handle libraries that are too small to be backed up efficiently in parallel. They need to omit those small libraries from their general backup and instead run a separate backup to back up those small libraries in serial format. Alternatively, they may use the serial backup method and identify the large libraries. They then need to omit those large libraries from their general backup and instead run a separate backup to back up those large libraries in parallel format.
While the customer can ultimately optimize these approaches, doing so is time consuming and requires extensive planning and manual setup by the customer. In addition, the backup procedures need to be modified as new libraries are created or the sizes of libraries change. There are other problems with these approaches. For example, it is more difficult, or in some cases impossible, for users to synchronize their backup data across multiple libraries if they require multiple backup procedures to back up all the data. Finally, the tape drives may be used less efficiently because the tape drives completely stop between the separate backup procedures.
A need exists for an improved mechanism to back up libraries using available tape devices.
A principal object of the present invention is to provide a method, apparatus and computer program product for implementing enhanced autonomic data backup using multiple backup devices. Other important objects of the present invention are to provide such a method, apparatus and computer program product for implementing enhanced autonomic data backup using multiple backup devices substantially without negative effect and that overcome some of the disadvantages of prior art arrangements.
In brief, a method, apparatus and computer program product are provided for implementing enhanced autonomic data backup using multiple backup devices. A media definition object is defined for saving predefined user selections including a default backup format to be used, an order to process the libraries, a library exception size, and a maximum number of backup devices to be used serially. A backup procedure is started and user selections are identified utilizing the media definition object. A list of libraries for backup is generated responsive to the identified order to process the libraries. Each library in the generated list of libraries is processed to form at least one library queue of a serial device wait queue and a parallel device wait queue. A process IO procedure is called until backup completes for each library from the at least one library queue.
In accordance with features of the invention, the generated list of libraries for backup includes either a user specified order of the libraries or a size order of the libraries from largest to smallest. When the user specifies a library exception size, the library size of each library is compared to the library exception size; each library having a library size less than or equal to the library exception size is added to the serial device wait queue; and each library having a library size greater than the library exception size is added to the parallel device wait queue.
In accordance with features of the invention, the user selection of the maximum number of backup devices to be used serially is used to build a serial device list and a parallel device list of the multiple backup devices. When the maximum number of backup devices to be used serially is the total number of the multiple backup devices, then the multiple backup devices are first used for serial backups and then the multiple backup devices are used for the parallel backups. When the maximum number of backup devices to be used serially is less than the total number of the multiple backup devices, then serial and parallel backups are provided concurrently.
The present invention together with the above and other objects and advantages may best be understood from the following detailed description of the preferred embodiments of the invention illustrated in the drawings, wherein:
In accordance with features of the invention, an enhanced media definition object 140 of the preferred embodiment is provided to solve the problems of conventional data backup methods. The media definition object 140 of the preferred embodiment is used to implement optimal autonomic backup of a given set of libraries using either of two alternative methods: backing up each library in parallel format or in serial format.
A conventional media definition object allows the user to specify the list of tape devices that are available to be used, the list of tape volumes to use on each device, the minimum number of tape devices that must be available to start the backup, and a maximum number of tape devices to use for the backup.
In accordance with features of the invention, additional fields in the enhanced media definition object 140 allow the user to specify a default backup format 150 to be used, an order in which to process the libraries or library processing order 152, a library exception size 154, and a number of tape drives to be used serially 156. Using the new fields 150, 152, 154, and 156, the data backup program 132 of the preferred embodiment can optimize the use of multiple devices 118 to perform the data backup. The method of the preferred embodiment frees users from having to manually plan and set up their backups. The method of the preferred embodiment also optimizes the backup each time it is run. For example, if new libraries are added or libraries significantly increase in size, they are handled automatically without requiring users to customize their backup procedures.
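One way to picture the enhanced media definition object 140 is as a record carrying the conventional device list plus the four new fields. The sketch below is purely illustrative: the class, enum, and attribute names are hypothetical, and the default values (serial format, size order, no exception size, all devices usable serially) are assumptions rather than anything the description specifies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

class BackupFormat(Enum):
    SERIAL = "serial"
    PARALLEL = "parallel"

class LibraryOrder(Enum):
    USER_SPECIFIED = "user"
    SIZE_DESCENDING = "size"   # largest to smallest

@dataclass
class MediaDefinition:
    """Hypothetical stand-in for media definition object 140."""
    devices: List[str]                                          # tape devices 118
    default_format: BackupFormat = BackupFormat.SERIAL          # field 150
    library_order: LibraryOrder = LibraryOrder.SIZE_DESCENDING  # field 152
    exception_size: Optional[int] = None                        # field 154
    serial_device_count: Optional[int] = None                   # field 156

    def __post_init__(self):
        # ASSUMPTION: when unset, every device may be used serially.
        if self.serial_device_count is None:
            self.serial_device_count = len(self.devices)
        if not 0 <= self.serial_device_count <= len(self.devices):
            raise ValueError("serial_device_count exceeds the device list")

# Device names below are made up for the example.
definition = MediaDefinition(devices=["TAP01", "TAP02", "TAP03"],
                             exception_size=10**9,
                             serial_device_count=2)
```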
The tape devices 118 can be specified either as standalone tape devices or as the names of tape media library devices that contain multiple tape devices. A unique collaborative file ID 160 is associated with each piece of data written to tape media for each data backup. A unique load group ID 162 is appended to the collaborative file ID 160 during recovery processing.
Various commercially available processors could be used for computer system 100, for example, an IBM personal computer or similar workstation can be used. An example of a specific computer system on which the invention may be implemented is the International Business Machines Corp. iSeries Server computer system. Central processor unit 102 is suitably programmed to execute the flowcharts of
The default backup format 150 defines the default backup format used for each library, which is either parallel format or serial format. When the user selects serial format as the default, each library is backed up in serial format using the specified tape devices as they become available.
The library processing order 152 defines the order to be used when processing the list of libraries. The library processing order 152 enables processing the libraries in a particular library order specified by the user or in a size order. In some cases users have reasons why they want libraries backed up in some specific order. For example, there may be order dependencies between libraries or they may want their most critical libraries backed up first on the tape.
The library processing order 152 also enables processing the libraries in size order from largest to smallest. For many customers, specifying a default format of serial format, which uses tape devices as they become available, together with this size sort order is sufficient to substantially optimize their backup. This is because the largest library is the first one sent to a tape device, so it can be backed up while subsequent smaller libraries are backed up to the other available tape drives.
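The benefit of the largest-to-smallest order can be seen in a toy comparison. The sketch below is not the patented procedure; it assumes backup time is proportional to library size, that all drives are equally fast, and the library sizes are invented for illustration.

```python
def elapsed_time(sizes, drive_count):
    """Assign each library, in the given order, to the drive that frees up
    first, and return the time at which the last drive finishes.
    ASSUMES time is proportional to size and drives are identical."""
    loads = [0] * drive_count
    for size in sizes:
        drive = loads.index(min(loads))   # drive that becomes available first
        loads[drive] += size
    return max(loads)

user_order = [2, 2, 2, 2, 2, 2, 6]             # large library listed last
size_order = sorted(user_order, reverse=True)  # largest to smallest

in_user_order = elapsed_time(user_order, 3)    # 10
in_size_order = elapsed_time(size_order, 3)    # 6
```

With the large library last, two drives sit idle while it finishes; sending it first lets the small libraries fill the other two drives in the meantime.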
The library exception size 154 defines a library size to be used when determining which backup format to use. If the default format is serial then this value specifies that libraries larger than the specified exception size 154 should be backed up in parallel format. If the default format is parallel then this value specifies that libraries smaller than the specified exception size 154 should be backed up in serial format.
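The exception-size rule can be sketched as a routine that sorts libraries into the serial and parallel device wait queues. The function and variable names are hypothetical. The handling of a library exactly equal to the exception size follows the summary (less than or equal goes to the serial queue), and falling back to the default format for every library when no exception size is given is an assumption.

```python
from collections import deque

def build_wait_queues(libraries, default_format, exception_size=None):
    """libraries: iterable of (name, size) pairs, already in processing order.
    Returns (serial_queue, parallel_queue) of library names."""
    serial_queue, parallel_queue = deque(), deque()
    for name, size in libraries:
        if exception_size is None:
            # ASSUMPTION: with no exception size, every library uses the default.
            fmt = default_format
        elif size <= exception_size:
            fmt = "serial"      # small library: one tape device
        else:
            fmt = "parallel"    # large library: spread across devices
        (serial_queue if fmt == "serial" else parallel_queue).append(name)
    return serial_queue, parallel_queue

# Sizes are arbitrary illustrative units.
libs = [("PAYROLL", 500), ("ARCHIVE", 90000), ("CONFIG", 10)]
serial_q, parallel_q = build_wait_queues(libs, "serial", exception_size=1000)
```

Because the comparison is made per library each time the backup runs, a library that grows past the exception size moves to the parallel queue without any change to the backup procedure.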
The number of tape devices to be used serially 156 defines the maximum number of tape drives to be used serially as they become available. The user can set the number of tape devices to be used serially 156 equal to the total number of tape drives 1-N, 118. In that case, all of the specified tape drives 1-N, 118 are first used, as they become available, to back up the libraries that qualify for backup in serial format. When the serial format backup completes, all of the tape drives 1-N, 118 are then used to back up the libraries that qualify to be backed up in parallel format.
The user can specify a number of tape devices to be used serially 156 that is a subset of the total number of specified tape drives. This enables the specified number 156 of tape drives to be used as available to back up libraries that qualify to be backed up in a serial format. Concurrently with the serial backups, the remaining tape drives will be used to back up the libraries that qualify to be backed up in a parallel format.
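The two cases for the number of tape devices to be used serially 156 can be sketched as follows. The function name and the `concurrent` flag are hypothetical, as are the choices to take the first devices in the list for serial use and to require at least one serial device; the description does not say which drives are chosen.

```python
def build_device_lists(devices, serial_count):
    """Split the tape devices into a serial device list and a parallel
    device list. Returns (serial_devices, parallel_devices, concurrent)."""
    if not 1 <= serial_count <= len(devices):
        raise ValueError("serial_count must be between 1 and the device count")
    # ASSUMPTION: the first `serial_count` devices are the serial ones.
    serial_devices = list(devices[:serial_count])
    if serial_count == len(devices):
        # Every drive runs serial backups first, then all drives are
        # reused for the parallel backups.
        return serial_devices, list(devices), False
    # A subset runs serial backups while the remaining drives run the
    # parallel backups at the same time.
    return serial_devices, list(devices[serial_count:]), True

drives = ["TAP01", "TAP02", "TAP03", "TAP04"]   # made-up device names
all_serial = build_device_lists(drives, 4)
split = build_device_lists(drives, 1)
```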
A sequence of program instructions or a logical assembly of one or more interrelated modules defined by the recorded program means 504, 506, 508, 510 directs the computer system 100 for implementing enhanced autonomic data backup using multiple backup devices of the preferred embodiment.
While the present invention has been described with reference to the details of the embodiments of the invention shown in the drawing, these details are not intended to limit the scope of the invention as claimed in the appended claims.
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US5502811 *||Sep 29, 1993||Mar 26, 1996||International Business Machines Corporation||System and method for striping data to magnetic tape units|
|US6154852 *||Jun 10, 1998||Nov 28, 2000||International Business Machines Corporation||Method and apparatus for data backup and recovery|
|US6735636 *||Jun 28, 2000||May 11, 2004||Sepaton, Inc.||Device, system, and method of intelligently splitting information in an I/O system|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US20080155319 *||Oct 28, 2006||Jun 26, 2008||Robert Duncan||Methods and systems for managing removable media|
|U.S. Classification||711/161, 711/162, 710/112, 714/E11.121, 710/57, 707/999.204, 707/999.202, 714/6.3|
|International Classification||G06F12/16, H04L1/22|
|Cooperative Classification||Y10S707/99955, Y10S707/99953, G06F11/1458|
|Jun 5, 2003||AS||Assignment|
Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HALLEY, DONALD R.;KOELLER, PAUL DOUGLAS;REEL/FRAME:014147/0668
Effective date: 20030530
|Jun 28, 2010||REMI||Maintenance fee reminder mailed|
|Nov 21, 2010||LAPS||Lapse for failure to pay maintenance fees|
|Jan 11, 2011||FP||Expired due to failure to pay maintenance fee|
Effective date: 20101121