|Publication number||US20090013213 A1|
|Application number||US 12/167,249|
|Publication date||Jan 8, 2009|
|Filing date||Jul 3, 2008|
|Priority date||Jul 3, 2007|
|Also published as||US7913075, US20090013168|
|Inventors||Dean Kalman, Jeffrey MacFarland|
|Original Assignee||Adaptec, Inc.|
This application claims the benefit of (1) U.S. Provisional Application No. 60/947,851, filed on Jul. 3, 2007, and entitled “Systems and Methods for Automatic Storage Initiators Grouping in a Multi-Path Storage Environment;” (2) U.S. Provisional Application No. 60/947,878, filed on Jul. 3, 2007, and entitled “Systems and Methods for Server-Wide Initiator Grouping in a Multi-Path Storage Environment;” (3) U.S. Provisional Patent Application No. 60/947,881, filed on Jul. 3, 2007, and entitled “Systems and Methods for Intelligent Disk Rebuild;” (4) U.S. Provisional Patent Application No. 60/947,884, filed on Jul. 3, 2007, and entitled “Systems and Methods for Logical Grouping of San Storage Zones;” and (5) U.S. Provisional Patent Application No. 60/947,886, filed on Jul. 3, 2007, and entitled “Systems and Methods for Automatic Provisioning of Storage and Operating System Installation,” the disclosures of which are incorporated herein by reference.
Embodiments of this invention generally relate to replacing a failed disk drive that is part of a RAID drive group, rebuilding the replacement disk drive, and creating logical groupings of SAN storage.
When a drive that is part of a RAID 1, RAID 5, or RAID 6 drive group fails, the failing drive must be replaced. Once the failing drive is replaced, the RAID controller goes through a process called rebuild. For RAID 1, this involves a copy operation from the surviving drive to the replacement drive. For RAID 5 and RAID 6, it involves reconstruction of the data or parity from the surviving drives onto the replacement drive.
Currently, storage is allocated from individual storage enclosures. When provisioning storage in a SAN environment, the user must therefore keep track of the location, capabilities, reliability, and access control characteristics of each individual storage enclosure.
It is in view of these issues that embodiments of the invention arise.
Broadly speaking, embodiments of the invention provide methods and systems for intelligent rebuilding of the replaced disk drive after disk failure, and creating SAN storage zones to logically group a plurality of storage devices.
With the increase in disk drive sizes, rebuild times are becoming exorbitantly long, taking many hours or days. Long rebuild times are a detriment because they impact overall RAID controller performance and, in addition, leave user data exposed without protection. If, for example, a second drive fails while a RAID 5 drive group is rebuilding, the drive group will go offline and the data on that drive group will be lost. Speeding up rebuild times is therefore an essential requirement going forward. In one embodiment, rebuild times are sped up by using a persistent log that tracks host writes. The log is configured to keep track of what areas of the drive group have been written by the host since the drive group was constructed. As a result, there is no need to reconstruct an unwritten area, since there is no data to reconstruct.
In another embodiment, a method of rebuilding a replacement drive used in a RAID group of drives is disclosed. The method includes tracking data modification operations continuously during use of the drives and saving the tracked data modifications to a log in persistent storage, where the tracked data modifications are associated with stripe data present on the drives. The method further includes rebuilding a failed one of the drives onto a replacement drive. The rebuilding is facilitated by referencing the log from the persistent storage, which enables reading only portions of stripe data from the surviving drives and omitting reading of portions of the drives where no data was written. Thus, the rebuilding writes only the written stripe data to the replacement drive.
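The write tracking described above can be sketched in Python as follows. This is purely an illustrative sketch, not the disclosed implementation: the class name `WriteTrackingLog`, the JSON file format, and the 64 KiB stripe size are all assumptions made for the example.

```python
import json
import os

STRIPE_SIZE = 64 * 1024  # assumed stripe size of 64 KiB (illustrative)

class WriteTrackingLog:
    """Persistent log of which stripes the host has written since the
    drive group was constructed (illustrative sketch only)."""

    def __init__(self, path):
        self.path = path
        self.written = set()
        # Reload prior state so tracking survives a controller restart.
        if os.path.exists(path):
            with open(path) as f:
                self.written = set(json.load(f))

    def record_write(self, byte_offset, length):
        # Mark every stripe touched by this host write, then persist.
        first = byte_offset // STRIPE_SIZE
        last = (byte_offset + length - 1) // STRIPE_SIZE
        self.written.update(range(first, last + 1))
        with open(self.path, "w") as f:
            json.dump(sorted(self.written), f)

    def stripes_to_rebuild(self):
        # Only stripes that were ever written need reconstruction;
        # unwritten stripes hold no data to reconstruct.
        return sorted(self.written)
```

A rebuild can then iterate over `stripes_to_rebuild()` instead of every stripe in the group, which is the source of the time savings when the drive group is sparsely written.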
In another embodiment, storage zones are defined. Logical groupings of SAN storage are established based on location or other characteristics, instead of on individual storage enclosures within a SAN. For example, a storage zone can consist of all the storage located within one computer rack, the storage contained within a building, or storage with particular characteristics, such as performance, cost, or reliability. Initiator permissions and policy are then associated with each created storage zone. One benefit of zoning is that it simplifies storage administration, allocation, and use. Thus, SAN storage can be allocated by “logical grouping” and not by individual storage enclosures.
In yet another embodiment, a method of creating storage area network zones is disclosed. The method includes identifying a plurality of storage devices and assigning each of the plurality of storage devices to a logical group, where the logical group is identified by its characteristics. The plurality of storage devices is then presented as part of the logical group, without regard to enclosure identifications. Access and control properties are then assigned to the logical group, and these properties provide access to the plurality of storage devices. Administration is carried out for the logical group, instead of for the physical characteristics of individual enclosures. Thus, SAN grouping becomes easy to carry out, and administration is simplified.
Other aspects of the invention will become more apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the present invention.
Embodiments of the invention provide methods and systems for intelligent rebuilding of the replaced disk drive after disk failure and creating SAN storage zones to logically group a plurality of storage devices.
In iSCSI (Internet Small Computer Systems Interface) compliant Storage Area Networks, the SCSI commands are sent in IP packets. Use of IP packets to send SCSI commands to the disk arrays enables implementation of a SAN over an existing Ethernet. Leveraging the IP network for implementing SAN also permits use of IP and Ethernet features, such as sorting out packet routes and alternate paths for sending the packets.
iSCSI is a protocol that allows clients (called initiators) to send SCSI commands (CDBs) to SCSI storage devices (targets) on remote servers. This Storage Area Network (SAN) protocol allows organizations to consolidate storage into data center storage arrays while providing hosts (such as database and web servers) with the illusion of locally-attached disks. Unlike Fibre Channel, which requires special-purpose cabling, iSCSI can be run over long distances using existing network infrastructure.
In iSCSI, therefore, there are two main functional entities: initiators and targets. Initiators are machines that need to access data; targets are machines that provide the data. A target could be a RAID array or another computer system. Targets handle iSCSI requests from initiators. Target machines may include hot-standby machines with “mirrored” storage. If the active machine fails, the standby machine takes over to provide the iSCSI service, and when the failed machine returns, it re-synchronizes with the standby machine and then takes back the iSCSI service.
With the increase in disk drive sizes, rebuild times are becoming exorbitantly long, taking many hours or days. Long rebuild times are a detriment because they impact overall RAID controller performance and, in addition, leave the customer's data exposed and possibly unprotected. If, for example, a second drive fails while a RAID 5 drive group is rebuilding, the drive group will go offline and the data on that drive group will be lost. Speeding up rebuild times is therefore an essential requirement going forward. Embodiments of the present invention typically provide a faster rebuild of the replacement drive.
The main performance-limiting issues with disk storage relate to the slow mechanical components that are used for positioning and transferring data. Since a RAID drive group has many drives in it, an opportunity presents itself to improve performance by using the hardware in all these drives in parallel. For example, if we need to read a large file, instead of pulling it all from a single hard disk, it is much faster to chop it up into pieces, store some of the pieces on each of the drives in the group, and then use all the disks to read back the file when needed. This technique of chopping up pieces of files is called striping.
Striping can be done at the byte level, or in blocks. Byte-level striping means that the file is broken into “byte-sized pieces”. The first byte of the file is sent to the first drive, then the second to the second drive, and so on. Sometimes byte-level striping is done as a sector of 512 bytes. Block-level striping means that each file is split into blocks of a certain size and those are distributed to the various drives. The size of the blocks used is also called the stripe size (or block size, or several other names), and can be selected from a variety of choices when the drive group is set up.
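The block-level striping just described can be illustrated with a short Python sketch. The function name `stripe_blocks` and the in-memory representation are assumptions made for the example; a real controller operates on block devices, not byte strings.

```python
def stripe_blocks(data: bytes, num_drives: int, stripe_size: int):
    """Distribute data across drives using block-level striping:
    split the data into stripe-size blocks and deal them out
    round-robin to the drives (illustrative sketch only)."""
    drives = [[] for _ in range(num_drives)]
    blocks = [data[i:i + stripe_size]
              for i in range(0, len(data), stripe_size)]
    for idx, block in enumerate(blocks):
        # Block 0 goes to drive 0, block 1 to drive 1, and so on,
        # wrapping back to drive 0 after the last drive.
        drives[idx % num_drives].append(block)
    return drives
```

Because consecutive blocks land on different drives, a large file can later be read back from all drives in parallel, which is the performance benefit striping provides.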
The advantages of the present invention are numerous. Most notably, the systems and methods described herein provide a faster way of rebuilding the replaced disk in a RAID group: data modification operations (e.g., writes, deletes, updates) and the associated striping information are tracked continuously, and the replacement drive is rebuilt by reading only the written portions of stripes from one or more surviving disk drives in the RAID array.
In one embodiment, the disk rebuild time is improved by the use of a persistent write-operations tracking module. The persistent write-operations tracking module keeps track of what areas of the disk group have been written by the host since the drive group was constructed. The tracking information is stored in a persistent tracking log. With the information contained in the persistent tracking log, a replaced disk drive can be rebuilt quickly by selectively reading only the written parts (e.g., striping information) of one or more surviving disk drives. There is no need to reconstruct an unwritten area, since there is no data to reconstruct. A simplified example using a RAID 1 drive group is shown in
The persistent tracking log is used to track the stripes that have been written.
When the rebuild algorithm starts, it looks at the persistent log and determines which stripes need to be rebuilt. In this example illustrated by
In one embodiment, the persistent tracking log is maintained by the RAID controller. In another embodiment, the persistent tracking log may be maintained by any component of the computing system with which the RAID array is in communication, so long as the persistent tracking log can be retrieved at a later time to rebuild the replacement drive. The persistent tracking log, in one embodiment, is stored in a relational database. In another embodiment, the persistent tracking log is stored in non-volatile memory, such as a disk drive, ROM, Flash memory, or any similar storage media.
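For the RAID 1 case, the selective rebuild that consults the persistent tracking log can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: drives are modeled as mutable in-memory byte buffers, and `written_stripes` stands in for the set of stripe indices recorded in the log.

```python
def rebuild_raid1(surviving, replacement, written_stripes, stripe_size):
    """Rebuild a replaced RAID 1 member by copying only the stripes
    that the persistent tracking log recorded as written
    (illustrative sketch; buffers stand in for block devices)."""
    for stripe in sorted(written_stripes):
        start = stripe * stripe_size
        # Copy just this written stripe from the mirror; unwritten
        # stripes are skipped entirely, which shortens the rebuild.
        replacement[start:start + stripe_size] = \
            surviving[start:start + stripe_size]
    return len(written_stripes)  # stripes actually reconstructed
```

If only a small fraction of the stripes were ever written, the loop touches only that fraction, rather than copying the full capacity of the surviving drive.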
In accordance with another embodiment, methods and systems for creating SAN storage zones to logically group a plurality of storage devices are provided. The advantages provided by this embodiment are numerous. Most notably, the systems and methods described herein eliminate the need for the user to keep track of the storage characteristics and location of each individual storage enclosure.
Instead, a logical group is created consisting of a plurality of storage enclosures that may be located at different locations and have different storage characteristics. The logical group of storage enclosures is then made available to the user as a single storage enclosure. The administrator of the logical group may modify the characteristics of the logical group by adding or removing one or more storage enclosures, or by changing the locations of one or more storage enclosures in the logical group.
In one embodiment, the storage enclosures in a logical group are hidden from the user. Hence, any change (e.g., adding or removing enclosures, changing location, etc.) in the structure of logical groups does not affect overall system configuration and usage. Therefore, the logical grouping of the storage enclosures simplifies the management of the Storage Area Network (SAN) and permits efficient storage, configuration and privilege management.
With the creation of the storage zone, i.e., the logical grouping of the storage enclosures, SAN storage is no longer viewed at the enclosure level. The storage enclosures are logically grouped together to meet customers' unique requirements for administrating, provisioning, and usage of the storage enclosures.
The storage administrator defines the storage zone by creating a logical group and adding the selected storage enclosures to the logical group. Access control properties are then defined, and permissions are granted to individual storage initiators, e.g., iSCSI (Internet Small Computer Systems Interface), Fibre Channel (FC), or SAS initiators. Initiator permissions can be unique for each initiator within a storage zone. In one embodiment, logical groups of initiators can also be defined and added to a particular storage zone.
In one embodiment, the SAN administrator(s) defines grouping properties for each of the physical and logical storages coupled to the SAN appliances. The SAN appliance, as described herein, is a box including slots for a plurality of server blades, RAID disk arrays, and SAN control and management software to control and manage the server blades, RAID, data buses, and other necessary components of the SAN. The properties may include the location of the storage, names of special characteristics, capabilities, and type of the storage. In one embodiment, each property is structured in a tree format. For example, under a node named “Location” in the property tree, a node named “Building 23” is created. Under the “Building 23” node, a child node named “Server Room A” may be created. More sibling and child nodes may be created to properly identify a location. The properties may be stored anywhere in the SAN, so long as the appliance in which the zone grouping is being created can read the properties.
One or more zone grouping rules are then created and stored in the SAN. A zone grouping rule may define a set of properties that, if matched, triggers creation of a zone group. A zone grouping rule may be set to be active or inactive. The appliance discovers all the storages that are coupled to it and retrieves the properties associated with each storage. Then, based on one or more active zone grouping rules, the appliance attempts to match the properties of the storages. If a matching rule is satisfied, the appliance creates a zone group of the storages whose properties match those defined by the one or more zone grouping rules. The zone groups are then permanently stored in the appliance. The SAN administrator may edit the zone groups if a change in a group is necessary.
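The rule-matching step above can be sketched in Python. This is an illustrative sketch only: the function name `create_zone_groups` is hypothetical, and the patent's tree-structured properties are flattened here into plain dictionaries for brevity.

```python
def create_zone_groups(enclosures, rules):
    """Create zone groups by matching active zone-grouping rules
    against discovered enclosure properties (illustrative sketch).
    `enclosures` maps enclosure name -> property dict; each rule has
    an 'active' flag and a 'match' dict of required property values."""
    zone_groups = {}
    for rule_name, rule in rules.items():
        if not rule.get("active", False):
            continue  # inactive rules never trigger group creation
        members = sorted(
            name for name, props in enclosures.items()
            if all(props.get(k) == v for k, v in rule["match"].items())
        )
        if members:  # a satisfied rule triggers creation of a zone group
            zone_groups[rule_name] = members
    return zone_groups
```

For example, a rule matching `{"location": "Building 23"}` would sweep every discovered enclosure whose location property is “Building 23” into one zone group, regardless of which physical enclosure or rack each belongs to.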
A set of default group properties is provided, and one or more default group properties are attached to a newly created zone group. The zone group rule indicates which default group properties are to be used for a newly created group. The group properties may include permissions and privilege grants to one or more storage initiators.
In one embodiment, storage zones may be created by grouping the storage enclosures based on location. In another embodiment, storage zones may be created by grouping the storage enclosures based on reliability characteristics of the storage enclosures. In yet another embodiment, a zone group may be created based on any physical or logical characteristics, so long as those characteristics are defined in the properties of the storage enclosures and one or more zone group rules are defined to use them to create zone groups.
By providing a layer of abstraction over the storage initiators and storage enclosures, initiator storage allocation does not require involvement of the Storage Area Network (SAN) administrator. The storage initiators work with the storage zones, not with the physical storage enclosures. Furthermore, more storage enclosures can be seamlessly added to a storage zone without impacting the availability of the storage interface to the initiators or users, and without a need to create access control properties for the newly added storage enclosure. Similarly, new storage initiators may be added to a storage zone without impacting the usage of the physical storage enclosures in the storage zone.
Since, from a usage viewpoint, a storage zone is treated the same as a physical storage enclosure, a unique set of permissions may be associated with the storage zone, similar to associating access control properties with a physical storage enclosure. The logical grouping of SAN storage therefore greatly simplifies the administration and use of the storage enclosures.
With the above embodiments in mind, it should be understood that the invention may employ various hardware and software implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. Further, the manipulations performed are often referred to in terms, such as producing, identifying, determining, or comparing.
Any of the operations described herein that form part of the invention are useful machine operations. The invention also relates to a device or an apparatus for performing these operations. The apparatus may be specially constructed for the required purposes, such as the carrier network discussed above, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
The programming modules, page modules, and subsystems described in this document can be implemented using a programming language such as Flash, JAVA, C++, C, C#, Visual Basic, JavaScript, PHP, XML, HTML, etc., or a combination of programming languages. Commonly available application programming interfaces (APIs), such as HTTP APIs, XML APIs, and parsers, are used in the implementation of the programming modules. As would be known to those skilled in the art, the components and functionality described above and elsewhere in this document may be implemented on any desktop operating system that provides support for a display screen, such as different versions of Microsoft Windows, Apple Mac, Unix/X-Windows, Linux, etc., using any programming language suitable for desktop software development.
The programming modules and ancillary software components, including the configuration file or files, along with the setup files required for installation and related functionality as described in this document, are stored on a computer readable medium. Any computer medium, such as a flash drive, a CD-ROM disk, an optical disk, a floppy disk, a hard drive, a shared drive, or any storage suitable for providing downloads to connected computers, could be used for storing the programming modules and ancillary software components. It would be known to a person skilled in the art that any storage medium could be used for storing these software components, so long as the storage medium can be read by a computer system.
The invention may be practiced with other computer system configurations, including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments, where tasks are performed by remote processing devices that are linked through a network.
As used herein, a storage area network (SAN) is an architecture to attach remote computer storage devices (such as disk arrays, tape libraries and optical jukeboxes) to servers in such a way that, to the operating system, the devices appear as locally attached.
The invention can also be embodied as computer readable code on a computer readable medium. The computer readable medium is any data storage device that can store data, which can thereafter be read by a computer system. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, DVDs, Flash, magnetic tapes, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over a network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
While this invention has been described in terms of several preferred embodiments, it will be appreciated that those skilled in the art, upon reading the specification and studying the drawings, will realize various alterations, additions, permutations, and equivalents thereof. It is therefore intended that the present invention include all such alterations, additions, permutations, and equivalents as fall within the true spirit and scope of the claims.
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US5557770 *||Mar 24, 1993||Sep 17, 1996||International Business Machines Corporation||Disk storage apparatus and method for converting random writes to sequential writes while retaining physical clustering on disk|
|US5787242 *||Dec 29, 1995||Jul 28, 1998||Symbios Logic Inc.||Method and apparatus for treatment of deferred write data for a dead raid device|
|US7577866 *||Jun 27, 2005||Aug 18, 2009||Emc Corporation||Techniques for fault tolerant data storage|
|US20030005354 *||Jun 28, 2001||Jan 2, 2003||International Business Machines Corporation||System and method for servicing requests to a storage array|
|US20030120863 *||Dec 9, 2002||Jun 26, 2003||Lee Edward K.||Self-healing log-structured RAID|
|US20030188101 *||Mar 29, 2002||Oct 2, 2003||International Business Machines Corporation||Partial mirroring during expansion thereby eliminating the need to track the progress of stripes updated during expansion|
|US20040068612 *||Oct 8, 2002||Apr 8, 2004||Stolowitz Michael C.||Raid controller disk write mask|
|US20060041793 *||Aug 17, 2004||Feb 23, 2006||Dell Products L.P.||System, method and software for enhanced raid rebuild|
|US20060069947 *||Dec 10, 2004||Mar 30, 2006||Fujitsu Limited||Apparatus, method and program for the control of storage|
|US20060161805 *||Jan 14, 2005||Jul 20, 2006||Charlie Tseng||Apparatus, system, and method for differential rebuilding of a reactivated offline RAID member disk|
|US20070028044 *||Dec 19, 2005||Feb 1, 2007||Lsi Logic Corporation||Methods and structure for improved import/export of raid level 6 volumes|
|US20070294565 *||Apr 28, 2006||Dec 20, 2007||Network Appliance, Inc.||Simplified parity disk generation in a redundant array of inexpensive disks|
|US20080005382 *||Jun 14, 2006||Jan 3, 2008||Hitachi, Ltd.||System and method for resource allocation in fault tolerant storage system|
|US20080040553 *||Aug 11, 2006||Feb 14, 2008||Ash Kevin J||Method and system for grouping tracks for destaging on raid arrays|
|US20080133969 *||Nov 30, 2006||Jun 5, 2008||Lsi Logic Corporation||Raid5 error recovery logic|
|US20080195808 *||Jun 11, 2007||Aug 14, 2008||Via Technologies, Inc.||Data migration systems and methods for independent storage device expansion and adaptation|
|US20080250269 *||Apr 5, 2007||Oct 9, 2008||Jacob Cherian||System and Method for Improving Rebuild Speed Using Data in Disk Block|
|US20080256420 *||Apr 12, 2007||Oct 16, 2008||International Business Machines Corporation||Error checking addressable blocks in storage|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US7877626 *||Dec 31, 2007||Jan 25, 2011||Datadirect Networks, Inc.||Method and system for disk storage devices rebuild in a data storage system|
|US8095828 *||Aug 31, 2009||Jan 10, 2012||Symantec Corporation||Using a data storage system for cluster I/O failure determination|
|US9047220 *||Jul 23, 2012||Jun 2, 2015||Hitachi, Ltd.||Storage system and data management method|
|US9087019 *||Jan 27, 2012||Jul 21, 2015||Promise Technology, Inc.||Disk storage system with rebuild sequence and method of operation thereof|
|US9110591||Apr 22, 2011||Aug 18, 2015||Hewlett-Packard Development Company, L.P.||Memory resource provisioning using SAS zoning|
|US20130198563 *||Jan 27, 2012||Aug 1, 2013||Promise Technology, Inc.||Disk storage system with rebuild sequence and method of operation thereof|
|US20140025990 *||Jul 23, 2012||Jan 23, 2014||Hitachi, Ltd.||Storage system and data management method|
|U.S. Classification||714/20, 714/E11.113, 711/E12.001, 711/114|
|International Classification||G06F12/00, G06F11/14|
|Jul 23, 2008||AS||Assignment|
Owner name: ADAPTEC, INC., CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KALMAN, DEAN;MACFARLAND, JEFFREY;REEL/FRAME:021279/0288
Effective date: 20080708