|Publication number||US8078718 B1|
|Application number||US 10/753,653|
|Publication date||Dec 13, 2011|
|Filing date||Jan 7, 2004|
|Priority date||Jan 7, 2004|
|Inventors||Chaitanya Nulkar, Jeffrey A. Kemp, James R. Grier, Jose Mathew|
|Original Assignee||Network Appliance, Inc.|
At least one embodiment of the present invention pertains to storage systems, and more particularly, to a method and apparatus for testing a storage system head in a clustered failover configuration.
A file server is a network-connected processing system that stores and manages shared files in a set of storage devices (e.g., disk drives) on behalf of one or more clients. The disks within a file system are typically organized as one or more groups of Redundant Array of Independent/Inexpensive Disks (RAID). One configuration in which file servers can be used is a network attached storage (NAS) configuration. In a NAS configuration, a file server can be implemented in the form of an appliance that attaches to a network, such as a local area network (LAN) or a corporate intranet. An example of such an appliance is any of the Filer products made by Network Appliance, Inc. in Sunnyvale, Calif.
Another specialized type of network is a storage area network (SAN). A SAN is a highly efficient network of interconnected, shared storage devices. Such devices are also made by Network Appliance, Inc. One difference between NAS and SAN is that in a SAN, the storage appliance provides a remote host with block-level access to stored data, whereas in a NAS configuration, the file server normally provides clients with only file-level access to stored data.
A simple example of a NAS network configuration is shown in
In this context, a “head” (as in filer head 2) means all of the electronics, firmware and/or software (the “intelligence”) that is used to control access to a set of mass storage devices; it does not include the mass storage devices themselves. In a file server, the head normally is where all of the “intelligence” of the file server resides. Note that a “head” in this context is not the same as, and is not to be confused with, the magnetic or optical head that is used to physically read or write data from or to the mass storage medium. The network 3 can be essentially any type of computer network, such as a local area network (LAN), a wide area network (WAN), metropolitan area network (MAN), or the Internet.
Filers are often used for data backup and recovery applications. In these applications, it is desirable to protect against as many potential failure scenarios as possible. One possible failure scenario is the failure of a filer head. One approach which has been used to protect against the possibility of a filer head failure is known as clustered failover (CFO). CFO involves the use of two or more redundant filer heads, each having “ownership” of a separate set of mass storage devices. CFO refers to a capability in which two or more interconnected heads are both active at the same time, such that if one head fails or is taken out of service, that condition is immediately detected by the other head, which automatically assumes the functionality of the inoperative head as well as continuing to service its own client requests. A file server “cluster” is defined to include at least two file server heads connected to at least two separate volumes of disks.
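The takeover behavior described above can be sketched at a high level. The following is a minimal, hypothetical model (the class and method names are illustrative, not from the patent): each head monitors its partner, and on a detected failure assumes ownership of the partner's volumes while continuing to serve its own.

```python
class Head:
    """Hypothetical model of one filer head in a CFO pair."""

    def __init__(self, name, volumes):
        self.name = name
        self.volumes = set(volumes)   # volumes this head "owns"
        self.alive = True
        self.partner = None

    def heartbeat_ok(self):
        # In a real cluster this would probe the interconnect;
        # here the partner's state stands in for the probe result.
        return self.partner.alive

    def check_partner(self):
        # If the partner has failed, take over its volumes while
        # continuing to serve our own (the CFO takeover step).
        if not self.heartbeat_ok():
            self.volumes |= self.partner.volumes
            self.partner.volumes = set()

# Two heads, each owning a separate set of volumes.
a = Head("head-a", {"vol1", "vol2"})
b = Head("head-b", {"vol3"})
a.partner, b.partner = b, a

b.alive = False      # simulate a head failure
a.check_partner()    # surviving head assumes the failed head's volumes
```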
In a CFO configuration it is desirable for one head to have the ability to perform diagnostics on the other head (or heads), to assess its operational status. Moreover, it is desirable to have the ability to perform such diagnostics without taking the head under test out of its normal operational mode.
A first storage server head and a second storage server head are operated and are configured redundantly to provide a host with access to a plurality of mass storage devices. A diagnostic process is executed in the first storage server head to assess operational status of the second storage server head while the second storage server head is in a mode for providing the host with access to the plurality of mass storage devices.
Other aspects of the invention will be apparent from the accompanying figures and from the detailed description which follows.
The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:
A method and apparatus for testing a head in a storage system that contains multiple heads configured for CFO are described. Note that in this description, references to “one embodiment” or “an embodiment” mean that the feature being referred to is included in at least one embodiment of the present invention. Further, separate references to “one embodiment” or “an embodiment” in this description do not necessarily refer to the same embodiment; however, such embodiments are also not mutually exclusive unless so stated, and except as will be readily apparent to those skilled in the art from the description. For example, a feature, structure, act, etc. described in one embodiment may also be included in other embodiments. Thus, the present invention can include a variety of combinations and/or integrations of the embodiments described herein.
As described in greater detail below, a standalone storage system according to certain embodiments includes two heads connected by a passive backplane and configured as a CFO pair. Each head includes an operating system kernel and a separate diagnostic kernel. During a cluster interconnect test, one head runs the diagnostic kernel to assess the operational status of the other head and the connection between the two heads, while the head under test (HUT) runs its operating system kernel and remains available to serve clients. The test is used by the diagnostic kernel to determine, among other things, whether the HUT is properly running its operating system. The diagnostic kernel uses this knowledge to avoid testing hardware shared by both heads that may be in use by the operating system.
Connected to the backplane 51 are several individual disk drives 23, redundant power supplies 52 and associated cooling modules 53, and two heads 64. For purposes of this description, it can be assumed that the heads 64 are configured to operate in CFO mode, such that each of the heads 64 owns a separate subset of the disk drives 23. Connecting the heads 64 to the backplane 51 is advantageous, because (among other reasons) it eliminates the need for cables or wires to connect the heads 64. Note that although the system 71 includes two heads 64, the system 71 can operate as a standalone system with only one head 64.
The PBCs 94 are connected to the processor 91 through the Fibre Channel adapter 93 and can be connected to the passive backplane 51 through standard pin-and-socket type connectors (not shown) mounted on the circuit board 80 and on the backplane 51. The PBCs 94 are connected to the Fibre Channel adapter 93 in a loop configuration. In operation, each PBC 94 can communicate (through the backplane 51) separately with two or more disk drives installed within the same chassis. Normally, each PBC 94 is responsible for a different subset of the disk drives within the chassis. Each PBC 94 provides loop resiliency with respect to the disk drives for which it is responsible, to protect against a disk drive failure. In other words, in the event a disk drive fails, the associated PBC 94 will simply bypass the failed disk drive. Examples of PBCs with such functionality are the HDMP-0480 and HDMP-0452 from Agilent Technologies in Palo Alto, Calif., and the VSC7127 from Vitesse Semiconductor Corporation in Camarillo, Calif.
The head 64 also includes a number (three in the illustrated embodiment) of IC Ethernet adapters 95. In the illustrated embodiment, two of the Ethernet adapters 95 are coupled to external connectors to allow them to be connected to devices outside the chassis for network communication (e.g., to clients and/or a management station). The third Ethernet adapter 95A is connected only to the backplane 51 and is used only for head-to-head communication, as described further below.
The head 64 further includes (mounted on the circuit board 80) a standard RJ-45 connector 96 which is coupled to the processor 91 through a standard RS-232 transceiver 97. This connector-transceiver pair 96 and 97 allows an external terminal operated by a network administrator to be connected to the head 64, for purposes of remotely monitoring or configuring the head 64 or other administrative purposes.
The single-board head 64 also includes (mounted on the circuit board 80) at least one non-volatile memory 98 (e.g., Flash memory or the like), which stores information such as boot firmware, a boot image, test software and the like. The test software includes a diagnostic kernel which is used to run diagnostics on the other head 64 and the head-to-head interconnect, as described further below.
The head 64 further includes a number of Fibre Channel connectors 102 to allow connection of the head 64 to external components. One of the Fibre Channel connectors 102 is coupled directly to the Fibre Channel adapter 93, while another Fibre Channel connector 102A is coupled to the Fibre Channel adapter 93 through one of the PBCs 94. Fibre Channel connector 102A can be used to connect the head 64 to an external disk shelf. Although the head 64 allows the enclosure to be used as a standalone file server without any external disk drives, it may nonetheless be desirable in some cases to connect one or more external shelves to the enclosure to provide additional storage capacity. The head 64 also includes a connector 99 to allow testing of the single-board head 64 in accordance with JTAG (IEEE 1149.1) protocols.
In certain embodiments, the processor 91 in the head 64 is programmed (by instructions and data stored in memory 92 and/or in memory 98) so that the enclosure is operable as both a NAS filer (using file-level accesses to stored data) and a SAN storage system (using block-level accesses to stored data) at the same time, i.e., to operate as a "unified" storage device, sometimes referred to as a fabric attached storage (FAS) device. In other embodiments, the single-board head 64 is programmed so that the enclosure is operable as either a NAS file server or a SAN storage system, but not both at the same time, where the mode of operation can be determined after deployment according to a selection by a user (e.g., a network administrator). In other embodiments of the invention, the single-board head 64 is programmed so that the enclosure can operate only as a NAS file server or, in still other embodiments, only as a SAN storage system.
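The post-deployment mode selection described above can be sketched as a small validation routine. This is a hypothetical illustration (the mode names and function are not from the patent): the administrator's requested mode is checked against the set of modes the head's firmware supports.

```python
VALID_MODES = {"nas", "san", "fas"}  # "fas" = unified NAS + SAN operation

def select_mode(requested, supported):
    """Hypothetical post-deployment mode selection: an administrator
    picks an operating mode from those this head supports."""
    requested = requested.lower()
    if requested not in VALID_MODES:
        raise ValueError(f"unknown mode: {requested}")
    if requested not in supported:
        raise ValueError(f"mode not supported by this head: {requested}")
    return requested

# A head whose firmware supports NAS and SAN (but not unified FAS)
# accepts either of those modes and rejects the third.
mode = select_mode("NAS", {"nas", "san"})
```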
As noted above, the heads 64 in the storage system 71 may be programmed to operate as a CFO (redundant) pair. In the illustrated embodiment, the heads 64 communicate with each other only via the passive backplane 51. In certain embodiments, the heads 64 communicate through the backplane 51 using M-VIA (emulated Virtual Interface Architecture) over Gigabit Ethernet protocol. In other embodiments, however, other protocols may be used instead for communication between the heads 64.
Referring now to
During the diagnostic test, the OS kernel 55 in the head that is not under test (the “initiating head”) and the diagnostic kernel 56 in the HUT are quiescent (as indicated by the dashed lines in
Below the file system layer 61 is the OS kernel 55. In accordance with the invention, the OS kernel 55 includes an M-VIA sublayer 68, to allow communication between the heads 64 (via the backplane 51) using M-VIA. The operating system 34 also includes the diagnostic kernel 56, which in certain embodiments is a stripped down version of the OS kernel 55, but without the file system 61, and with the added functionality described below.
Below the OS kernel 55, on the network side the operating system 34 includes a network access layer 64 and, at the lowest level, a media access layer 65. The network access layer 64 implements any of various protocols used to communicate with client devices, such as network file system (NFS), common Internet file system (CIFS) and/or hypertext transport protocol (HTTP). The media access layer 65 includes one or more drivers which implement the protocols used to communicate over the network, such as Ethernet. In accordance with the invention, the media access layer 65 includes a Gigabit Ethernet (GbE) sublayer 69, to allow communication between the heads 64 (via the backplane 51).
Below the kernel layer 62 on the storage device side, the operating system 34 includes a storage access layer 66 and, at the lowest level, a driver layer 67. The storage access layer 66 implements a disk storage protocol such as RAID, while the driver layer 67 implements a lower-level storage device access protocol, such as Fibre Channel or SCSI.
The test performed by the diagnostic kernel 56 is carried out using the Ethernet port 95A (
The test provides the following functions:
1) verifies that the cluster interface is connected and a network link between the two heads is present;
2) verifies that the data path between cluster partners is functional;
3) verifies packet integrity over the cluster interconnect and provides error detection;
4) detects whether any errors originate in the HUT or the initiating head; and
5) informs diagnostics whether the operating system is running on the HUT.
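The five checks above can be summarized in a result structure. The following is a hypothetical sketch (field and class names are illustrative, assuming the test yields one finding per check):

```python
from dataclasses import dataclass

@dataclass
class InterconnectTestResult:
    """Hypothetical summary of the cluster-interconnect test,
    one field per check in the list above."""
    link_present: bool      # 1) network link between the two heads detected
    data_path_ok: bool      # 2) data path between cluster partners functional
    packets_intact: bool    # 3) packet integrity verified over the interconnect
    errors_in_hut: bool     # 4) detected errors traced to the HUT (vs. initiator)
    hut_os_running: bool    # 5) operating system confirmed running on the HUT

    def passed(self):
        # The test succeeds only when the link is up, data flows,
        # and every packet came back uncorrupted.
        return self.link_present and self.data_path_ok and self.packets_intact

ok = InterconnectTestResult(True, True, True, False, True)
bad = InterconnectTestResult(True, False, False, True, True)
```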
Next, at block 705 the initiating head is set to “promiscuous” mode (i.e., to receive all packets communicated via the head-to-head port 95A). At block 706 the diagnostic kernel 56 attempts to detect a link to the HUT, to verify connectivity with the HUT. If a link is detected, the process proceeds to block 707; otherwise, the diagnostic kernel 56 generates a report at block 711 indicating the absence of a link. At block 708 the diagnostic kernel 56 transmits all of the above-mentioned send buffers. The send buffers are chained into the transmit descriptor ring and, in certain embodiments, are sent by direct memory access (DMA), 128 buffers at a time. At block 709 the diagnostic kernel 56 then checks whether all of the diagnostic packets have been received back from the HUT. If all of the diagnostic packets have not been returned, the diagnostic kernel 56 generates a report indicating this as an error condition at block 712. If all packets have been returned, the diagnostic kernel 56 then examines the contents of the returned packets at block 709 to determine whether the contents match the contents of the diagnostic packets that were sent. If the contents do not match, the diagnostic kernel 56 generates a report indicating this as an error condition at block 712. If all packets were received back from the HUT and the contents of all packets were verified, the diagnostic kernel 56 generates a report indicating the test was successful.
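The transmit-and-compare flow of blocks 705 through 712 can be sketched as follows. This is a hypothetical simplification (the `send`/`receive` callables stand in for the Ethernet driver, and the packet count and sizes are illustrative, not the 128-buffer DMA chaining the patent describes): the initiating head sends a batch of diagnostic packets, then verifies that every one is returned with identical contents.

```python
import os

def run_loopback_test(send, receive, n_packets=16, payload_len=64):
    """Hypothetical sketch of the diagnostic loopback test: transmit
    diagnostic packets over the head-to-head port and verify that
    every packet is echoed back with matching contents."""
    sent = [os.urandom(payload_len) for _ in range(n_packets)]
    for pkt in sent:
        send(pkt)
    returned = [receive() for _ in range(n_packets)]
    if sum(1 for p in returned if p is not None) != n_packets:
        return "error: not all diagnostic packets returned"
    if returned != sent:
        return "error: returned packet contents do not match"
    return "success"

# With an ideal echo (the HUT returns packets unchanged, in order),
# the test reports success.
echo_queue = []
result = run_loopback_test(echo_queue.append, lambda: echo_queue.pop(0))
```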
In the HUT, when the M-VIA sublayer 68 detects receipt of a packet containing the predetermined header pattern, it recognizes the packet as a diagnostic packet and simply sends the packet back to the initiating head (via the backplane), without passing the packet to the kernel layer 62 or allowing processing of the packet. If the operating system 34 is not running on the HUT, any diagnostic packets transmitted by the initiating head will not be returned.
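The HUT-side echo behavior can be sketched as a small dispatch routine. This is a hypothetical illustration (the header bytes and function names are invented; the patent says only that a predetermined header pattern marks diagnostic packets): marked packets are echoed back over the backplane, and everything else continues up the stack as normal.

```python
DIAG_HEADER = b"\xde\xad\xbe\xef"  # illustrative marker; the patent specifies
                                   # only "a predetermined header pattern"

def handle_packet(packet, send_back, deliver_up):
    """Hypothetical sketch of the M-VIA sublayer's receive path on the
    HUT: diagnostic packets are echoed straight back to the initiating
    head and are never passed up to the kernel layer."""
    if packet.startswith(DIAG_HEADER):
        send_back(packet)      # echo to the initiating head via the backplane
    else:
        deliver_up(packet)     # normal traffic continues up the stack

echoed, delivered = [], []
handle_packet(DIAG_HEADER + b"diagnostic payload", echoed.append, delivered.append)
handle_packet(b"ordinary client request", echoed.append, delivered.append)
```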
In certain embodiments, for each test the diagnostic kernel 56 gathers and reports to the user the following parameters regarding the transmitted diagnostic packets: total number of bytes, total number of frames, number of collisions, number of late collisions, number of excessive collisions, number of FCS errors, number of abort errors, number of bad frames, number of runt frames, and number of long frames. Similarly, in certain embodiments for each test the diagnostic kernel 56 gathers and reports to the user the following parameters regarding the received diagnostic packets: total number of bytes, total number of frames, number of multicast frames, number of broadcast frames, number of bad frames, number of runt frames, number of long frames, number of FCS errors, number of length errors, number of code errors and number of alignment errors.
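The transmit-side counters listed above can be sketched as a report structure. This is a hypothetical illustration (field names are paraphrases of the parameters the patent lists; only the byte/frame tallies are exercised here):

```python
from dataclasses import dataclass

@dataclass
class TxStats:
    """Hypothetical transmit-side counters matching the parameters
    the diagnostic kernel reports per test."""
    total_bytes: int = 0
    total_frames: int = 0
    collisions: int = 0
    late_collisions: int = 0
    excessive_collisions: int = 0
    fcs_errors: int = 0
    abort_errors: int = 0
    bad_frames: int = 0
    runt_frames: int = 0
    long_frames: int = 0

    def record(self, frame):
        # Tally each transmitted frame; error counters would be
        # updated by the driver as faults are observed.
        self.total_bytes += len(frame)
        self.total_frames += 1

stats = TxStats()
for frame in (b"\x00" * 64, b"\x00" * 128):
    stats.record(frame)
```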
Among other advantages, the above-described technique enables one head in a clustered system to test the cluster interconnect without affecting the other head. If the other head is serving data, it can continue to serve data while the test runs. The technique further enables the diagnostic kernel to know if the other head is running its operating system and to execute its tests with that knowledge. The test further serves as a vehicle for the operating system to communicate to the diagnostic kernel on the other head.
Note that the diagnostic techniques described above can also be applied in various other contexts. For example, these techniques can be applied in a system with modular, standalone heads which need not be implemented each on a single circuit board. The heads may be in separate enclosures from each other and/or from the mass storage devices. Further, these techniques can be applied in essentially any type of storage system which uses two or more heads, not just in a NAS environment; in particular, note that these techniques can also be applied in a SAN environment.
Thus, a method and apparatus for testing a head in a storage system that contains multiple heads configured for CFO have been described. Although the present invention has been described with reference to specific exemplary embodiments, it will be recognized that the invention is not limited to the embodiments described, but can be practiced with modification and alteration within the spirit and scope of the appended claims. Accordingly, the specification and drawings are to be regarded in an illustrative sense rather than a restrictive sense.
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US6609213 *||Aug 10, 2000||Aug 19, 2003||Dell Products, L.P.||Cluster-based system and method of recovery from server failures|
|US6732289 *||Aug 31, 2000||May 4, 2004||Sun Microsystems, Inc.||Fault tolerant data storage system|
|US7043663 *||Jun 28, 2002||May 9, 2006||Xiotech Corporation||System and method to monitor and isolate faults in a storage area network|
|US7268690 *||Feb 28, 2003||Sep 11, 2007||Cisco Technology, Inc.||Industrial ethernet switch|
|US20030105850 *||May 23, 2002||Jun 5, 2003||Yoogin Lean||Methods and systems for automatically configuring network monitoring system|
|US20030140107 *||Jun 22, 2001||Jul 24, 2003||Babak Rezvani||Systems and methods for virtually representing devices at remote sites|
|US20050154841 *||Mar 7, 2005||Jul 14, 2005||Gautham Sastri||Data storage system for a multi-client network and method of managing such system|
|U.S. Classification||709/224, 714/4.5|
|International Classification||G06F15/16, G06F11/00|
|Cooperative Classification||G06F11/2294, G06F11/2015, G06F11/2082, G06F11/2038, G06F11/2046|
|European Classification||G06F11/20P6, G06F11/22R|
|May 17, 2004||AS||Assignment|
Owner name: NETWORK APPLIANCE, INC., CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:NULKAR, CHAITANYA;KEMP, JEFFREY A.;GRIER, JAMES R.;AND OTHERS;SIGNING DATES FROM 20040316 TO 20040405;REEL/FRAME:015330/0609
|Jun 15, 2015||FPAY||Fee payment|
Year of fee payment: 4