WO2002075535A1 - Method for aggregating a plurality of links to simulate a unitary connection


Info

Publication number
WO2002075535A1
Authority
WO
WIPO (PCT)
Prior art keywords
links
link
data
programmable hardware
fibre channel
Application number
PCT/US2002/000337
Other languages
French (fr)
Inventor
Jeffrey J. Nelson
Robert Grant
Stephen Trevitt
Original Assignee
Mcdata Corporation
Application filed by Mcdata Corporation
Priority to EP02707404A (published as EP1379946A4)
Publication of WO2002075535A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/24 Multipath
    • H04L45/245 Link aggregation, e.g. trunking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/42 Loop networks
    • H04L12/427 Loop networks with decentralised control
    • H04L12/433 Loop networks with decentralised control with asynchronous transmission, e.g. token ring, register insertion
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L25/00 Baseband systems
    • H04L25/02 Details; arrangements for supplying electrical power along data transmission lines
    • H04L25/14 Channel dividing arrangements, i.e. in which a single bit stream is divided between several baseband channels and reassembled at the receiver
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/35 Switches specially adapted for specific applications
    • H04L49/356 Switches specially adapted for specific applications for storage area networks
    • H04L49/357 Fibre channel switches
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Definitions

  • Fibre Channel is an industry-standard, high-speed serial data transfer interface used to connect systems and storage in point-to-point or switched topologies.
  • FC-AL technology, developed with storage connectivity in mind, is a recent enhancement that also supports copper media and loops containing up to 126 devices, or nodes.
  • fibre channel is a switched protocol that allows concurrent communication among workstations, super computers and various peripherals. The total network bandwidth provided by fibre channel may be on the order of a terabit per second.
  • Fibre channel is capable of transmitting frames along links (also, "lines” or "lanes") at rates exceeding 1 gigabit per second in at least two directions simultaneously.
  • Fibre Channel may be considered a channel-network hybrid.
  • a Fibre Channel system contains sufficient network features to provide connectivity, distance and protocol multiplexing, and enough channel features to retain simplicity, repeatable performance and reliable delivery.
  • Fibre channel allows for an active, intelligent interconnection scheme, known as a "fabric,” as well as fibre channel switches to connect nodes.
  • the F/C fabric includes a plurality of fabric-ports (F_ports) that provide for interconnection and frame transfer between a plurality of node-ports (N_ports) attached to associated devices that may include workstations, super computers and/or peripherals.
  • F_ports fabric-ports
  • N_ports node-ports
  • a fabric has the capability of routing frames based on information contained within the frames.
  • the N_port transmits and receives data to and from the fabric. Transmission is isolated from the control protocol so that different topologies (e.g., point-to-point links, rings, multidrop buses, and crosspoint switches) can be implemented.
  • Fibre Channel, a highly reliable gigabit interconnect technology, allows concurrent communications among workstations, mainframes, servers, data storage systems, and other peripherals.
  • F/C technology not only provides interconnect systems for multiple topologies that can scale to a total system bandwidth on the order of a terabit per second, but also can deliver a high level of reliability and throughput. Switches, hubs, storage systems, storage devices, and adapters designed for the F/C environment are available now.
  • Following a lengthy review of existing equipment and standards, the Fibre Channel standards group realized that it would be useful for channels and networks to share the same fiber.
  • fiber or “fibre” are used synonymously, and include both optical and copper cables.
  • a Fibre Channel protocol was developed and adopted, and continues to be developed, as an American National Standard for Information Systems through the American National Standards Institute ("ANSI"). See Fibre Channel Physical and Signaling Interface, Revision 4.2, American National Standard for Information Systems (ANSI) (1993) for a detailed discussion of the fibre channel standards, which is incorporated by reference into this document.
  • ANSI American National Standards Institute
  • Fibre Channel's current maximum data rate at 1.0625 Gb/sec is 100 MB/sec (200 MB/sec full-duplex) after accounting for overhead.
  • In addition to strong channel characteristics, Fibre Channel provides powerful networking capabilities, allowing switches and hubs to interconnect systems and storage into tightly knit clusters. The clusters are capable of providing high levels of performance for file service, database management, or general-purpose computing. Because Fibre Channel is able to span up to 10 kilometers between nodes, F/C allows very high-speed movement of data between systems that are widely separated from one another.
  • the F/C standard defines a layered protocol architecture consisting of five layers, the highest layer defining mappings from other communication protocols onto the F/C fabric.
  • the network behind the servers links one or more servers to one or more storage systems.
  • Each storage system may be RAID ("Redundant Array of Inexpensive Disks"), tape backup, tape library, CD-ROM library, or JBOD ("Just a Bunch of Disks").
  • Fibre Channel networks have proven robust and resilient, and include at least these features: shared storage among systems; scalable networking; high performance; fast data access and backup.
  • legacy storage systems are interfaced using a Fibre Channel to SCSI bridge.
  • Fibre Channel standards include network features that provide required connectivity, distance, and protocol multiplexing.
  • F/C also supports traditional channel features for simplicity, repeatable performance, and guaranteed delivery.
  • a class 1 transfer requires circuit switching, i.e., reserved data paths through the network switch, and generally involves the transfer of more than one frame, frequently numerous frames, between two identified network elements.
  • a class 2 transfer requires allocation of a path through the network switch for each transfer of a single frame from one network element to another.
  • Frame switching for class 2 transfers is more difficult to implement than class 1 circuit switching because frame switching requires a memory mechanism for temporarily storing incoming frames in a source queue prior to their routing to a destination port, or a destination queue at a central destination port.
  • a memory mechanism typically includes numerous input/output connections with associated support circuitry and queuing logic. Additional complexity and hardware is required when channels carrying data at different bit rates are to be interfaced.
  • At least one standard in connection with Fibre Channel technology imposes the requirement to maintain guaranteed in-order delivery of data frames across connecting links, regardless of cable distances ("Distance Standard"). As indicated, the Distance Standard cannot be satisfied using SCSI technology.
  • Known striping methods for transmitting data frames across links include byte striping and word striping. Both have disadvantages in the Fibre Channel environment because of the high-speed requirements for data movement and transfer. Both byte striping and word striping require not only multiple links, but also that links remain open during transmission of data. As indicated, in an environment demanding significantly accelerated speeds of data movement, not all links will remain "open”; not all lanes consistently and continually will deliver frames at an appointed or expected point in proper sequence.
  • the present invention eliminates the problems associated with byte and word striping by employing frame striping.
  • with frame striping, the plurality of links may be viewed or perceived as one virtual link; the links may be aggregated to simulate a unitary connection among the nodes. This eliminates the adverse consequences of variable link characteristics, including different cable lengths.
  • the present invention will continue to stripe data frames across the remaining links.
  • Inter-Element Links (IEL's)
  • Inter-Switch Links (ISL's), as they are sometimes referred to, between entities in a network system have, until now, proven to be a significant limiting factor to successful in-order data delivery in connection with the Distance Standard.
  • the fabric must be reinitialized and new routing paths configured.
  • a method for aggregating links to simulate a unitary connection among one or more nodes in a fibre channel system includes providing means for striping data frames across the links. Striping data frames includes transmitting data frames in their entirety across individual links.
  • Programmable hardware mechanisms are connected to the links, as well as to the nodes.
  • the nodes may include by way of example, and not of limitation, fibre channel switches.
  • a programmable hardware mechanism may include a link controller connected to at least the links.
  • the hardware mechanisms hold a program.
  • the program includes at least an algorithm that provides at least a sequence of instructions for collecting information about each of the links.
  • the information includes the time required for a representative pattern of data to be transmitted and received across the links.
  • the algorithm therefore enables the hardware mechanism to calculate the length of links within the system.
  • the collected information may be tabulated into a table of link length information for each link.
  • variable link characteristics include, without limitation, different link lengths.
  • the present invention includes the programmable hardware mechanism that is operatively coupled to devices connected to the links.
  • the program stored in the programmable hardware mechanism collects information about the variable link characteristics to be processed by the program.
  • the programmable hardware mechanism also may include a link controller.
  • the link controller is connectable to the links.
  • the present invention may include a queue scheduler that is connected to at least the link controller and to the links. Queue schedulers and buffers may also be included for routing the collected information.
  • At least one objective of the hardware mechanisms is to reallocate bandwidth among the plurality of links to overcome problems of bandwidth oversubscription as well as undersubscription.
  • the programmable hardware mechanism also tabulates additional information for ensuring in-order delivery of the data frames across the plurality of links.
  • the present invention also will guarantee in-order delivery of data from point-to-point even though, paradoxically, each frame may not arrive at each delivery point in sequence.
  • the present invention aggregates the links to obviate the need for sequential delivery of data frames at each point.
  • Yet another advantage of the present invention is a method for selectively transmitting frames across a fibre channel fabric that is easy to use and to practice, and is cost effective.
  • Figure 1 is a schematic diagram showing one of many ways a number of devices, including a Fibre Channel switch, may be interconnected in a Fibre Channel network;
  • Figure 2 is a schematic representation of a variable-length frame communicated through a fiber optic switch as contemplated by the Fibre Channel industry standard;
  • Figure 3 is a schematic block diagram showing six nodes connected to four links in a representative fibre channel system;
  • Figure 4 is a schematic block flow diagram showing one way in which the method for aggregating a plurality of links to simulate a unitary connection among one or more nodes in a fibre channel system may be implemented; and
  • Figure 5 is a schematic block flow diagram showing one way in which the method for aggregating a plurality of links to simulate a unitary connection among one or more nodes in a fibre channel system may be implemented on receipt of data frames.
  • the present invention provides a method for aggregating a plurality of links to simulate a unitary connection among one or more nodes in a fibre channel system, providing in-order delivery of data frames across the plurality of links without requiring reinitialization of the fabric in a fibre channel system due to variations in link characteristics.
  • In Figure 1, a schematic and block diagram is shown illustrating in general a representative fibre channel fabric 10 that includes a switch 12.
  • Fabric 10 also may include a device called a JBOD ("Just a Bunch of Disks") 14, a disk array 16, one or more servers 18a and 18b, an SCSI bridge 20 to an SCSI RAID ("Redundant Array of Inexpensive Disks") 22, as well as a Fibre Channel RAID 24 (collectively, "Devices 14-24").
  • a switch 12 enables a Fibre Channel system to transmit and receive extraordinary amounts of data at great speed.
  • a "frame” or “data frame” 26 is the smallest individual packet of data that is sent and received on a link, and includes a presumed configuration of an aggregation of data bits into data frames 26a, b as exemplified in Figure 2.
  • the present invention provides a method for aggregating a plurality of links 28 to appear as a single virtual link (not shown) either between switch 12a and switch 12b, or between Devices 14-24 and a switch 12.
  • Plurality of links 28 is also labeled for clarity in Figure 3 as "L1-L4.”
  • the present invention trains a fibre channel system to consider plurality of links 28 as a single link for purposes of passing data frames 26 across links 28.
  • the present invention thus compensates for different link characteristics, including at least differences in the lengths of links L1-L4, particularly differences in the length of links 28 between nodes 30 and 32 in fabric 10.
  • the present invention determines the length differentials of links 28 in part by calculating the amount of time required for a data frame 26 to cross links 28.
  • the present invention causes a fibre channel system to "see" the four or more links shown in Figure 3 as a single virtual link for purposes of passing data frames 26 across a series of links L1-L4.
  • At least one advantage of the present invention is that the method allows for hardware-based load balancing across plurality of links 28, while achieving and maintaining requirements of fibre channel standards requiring in-order, guaranteed delivery of data frames 26 across plurality of links 28 regardless of the cable distances.
  • the hardware-based implementation of the method of the present invention automatically adjusts link characteristics in connection with or with respect to cable or other port level failures without disrupting fabric 10, and without requiring reinitialization of fabric 10.
  • links 28 connecting nodes 30 and 32 in fabric 10 must be substantially similar in length.
  • If links 28 are not substantially similar in length, the length differential engenders alignment problems within the system that cause delays, frequently called "jitter."
  • Frame striping is employed to assist in reallocating data traffic across links 28, overcoming the limitation of word or byte striping, which requires the same number of links as there are words; this problem has become more pronounced as groups of four links, as shown in Figure 3, have become more standard in the field.
  • ISL's in the form of plurality of links 28 assign both source nodes and destination nodes across ISL's without regard to, or knowledge of, potential bandwidth utilization.
  • Figure 3 shows a hypothetical six nodes 30, individually labeled A1-F1, connected to switch 12a, also labeled SW1 for clarity. Nodes A1-F1 are connected through switch 12a across plurality of links 28 to switch 12b, also labeled SW2 for clarity. Switch 12b is connected to nodes 32, individually labeled A2-F2.
  • servers 18a or 18b as shown in Figure 1 are attached to SW1; a storage device such as F/C RAID 24 is attached to SW2; further assuming that each server required 50 MB for each direction, and further assuming that the links had a capacity of 100 MB, the load would be split equally between only two storage devices for a total of 300 MB (6x50) in each direction. Accordingly, the four ISL's in plurality of links 28 between switches 12a and 12b provide
  • bandwidth means the rate at which a communications system can transmit data or, more technically, the range of frequencies that an electronic system can transmit. High bandwidth allows fast transmission or the transmission of many signals at once, a criterion of significant importance in the high-speed transmission through fibre channel systems.
  • Tables 1-3 demonstrate that with only 100 MB of capacity available on each ISL serving nodes A1-F1, as shown in Figure 3, link 1 ("L1") is oversubscribed, thus causing system performance degradation.
  • the present invention solves the foregoing problems and limitations by providing a method for frame striping across links 28.
  • the method provides structural elements within internal switching elements to hunt for available paths across links 28, including the conventional four ISL configurations shown in Figure 3.
  • the method of the present invention causes a plurality of links 28, in a conventional configuration of four ISL's, to appear to system software as a single "virtual" ISL.
  • the method of the present invention includes a hardware mechanism 34 that is programmed by software management to adjust for link characteristic differences across links 28, to make it appear to the system that plurality of links 28 is but a single link (not shown) for purposes of passing data frames 26 across links 28.
  • Hardware mechanisms 34 associated with the present invention provide load balancing across links 28, and guarantee in-order delivery of data frames 26 between, for example, a source port and a destination port, or as represented in Figure 3, between nodes A1 and B2.
  • one or more algorithms associated with the software are configurable to detect, or may dynamically detect, variable link characteristics.
  • the one or more algorithms may be executed to calculate lengths of links 28, such as L1-L4, as shown in Figure 3.
  • the algorithm and hardware mechanism 34 send patterns of signals and data during transmit and receive functions. When such a pattern is sent, a counter (not shown) in hardware mechanism 34 is started. When the pattern is received back at the transmitting source, the counter stops. Cable length, therefore, may be mathematically determined from the time to transmit and receive, and the cable lengths also may be compared to identify at least one variable link characteristic, namely link length.
  • link-to-link gap time may be established by the software associated with hardware mechanism 34.
  • hardware mechanism 34 is a link controller 34.
  • Link controller 34 does not transmit consecutive frames between the same SRC/DST port pairs until the first transmitted frame 26 has traveled far enough down a link L1-L4 to guarantee that it will be received at a receiving node 32, for example F2 as shown in Figure 3, before a second frame 26a is received. This may result in link-to-link inter-frame gap ("IFG”) time.
  • IFG link-to-link inter-frame gap
  • a transmit queue controller 36 receives data frames 26 from internal switching elements 38 and schedules data frames 26 for transmission across links 28.
  • link controller 34 calculates the length of each link L1-L4. Length calculations are accomplished by sending patterns across links 28, and by providing hardware mechanism 34 to transmit and receive transmission loop-backs well known to those skilled in the art.
  • the transmission loop-back value permits establishment of a table of values, created by the software within hardware mechanism 34 for each link L1-L4, which thus identifies the cable length in clock increments.
  • the term "clock" as used in this document means the circuit that generates a series of evenly spaced pulses. All switching activity occurs while the clock is sending out pulses. Between pulses, the devices are allowed to stabilize. The count being maintained by the clock expires when the head of a data frame 26 is received at a remote node 32.
  • the software in hardware mechanism 34 thus calculates differences in comparison with every other link 28 in the group. The information resulting from those calculations is maintained in the transmit queue scheduler 36 as shown in Figure 4.
  • frames 26 are received by a transmit buffer memory 40 from internal switching element 38. Transmit queue scheduler 36 copies the SRC/DST address information from a frame 26, and a queue entry is established. As links 28 become part of transmit queue scheduler 36, transmit queue scheduler 36 maintains the status of which SRC/DST frames have been transmitted across links 28, and also identifies which links L1-L4 frames 26 have been transmitted across. As subsequent data frames 26 are received in transmit buffer memory 40, transmit queue scheduler 36 compares their SRC/DST data against current frames 26 being transmitted across other links 28. If no SRC/DST matches are made, frames 26 may be immediately transferred to an available link L1-L4 among plurality of links 28.
  • the software associated with hardware mechanism 34 performs one or more calculations with respect to the link length difference between the link 28 on which the last frame 26 matching the SRC/DST pair was transmitted and the link 28 that is currently available. If it can be guaranteed that the previously transmitted frame 26 will be received at a remote node 32 before the following frame 26, the following frame can be transmitted immediately; otherwise, it must be queued until it can be transmitted to arrive in order.
  • link controllers 34a through 34n are provided for receiving frames 26.
  • Each link controller 34a-n is allocated one or more buffers 50 within the shared receive buffer memory 44.
  • Link controllers 34a-n will sort and compare received frames 26.
  • Information accumulated by link controllers 34a-n is combined with the number of the buffer 50 that contains data frame 26. The information is transmitted to the central queue manager 46. Because the transmit logic of software associated with hardware mechanism 34 guarantees in-order delivery of data frames 26 to remote switches, the system algorithm may employ first-in-first-out queue information. As connections are made to internal switching elements 38, any buffer 50 can transmit data frames 26 across links 28.
  • central queue manager 46 requests connection to the physical destination port; when a connection is established, it passes the buffer 50 number to a reader 42a-42n for transmission to internal switch 38, and passes buffer 50 back to link controllers 34a-34n for buffer management and link control.
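The bullet stating that Fibre Channel's 1.0625 Gb/sec signaling yields about 100 MB/sec per direction after overhead can be checked with simple arithmetic. The sketch below assumes 8b/10b line coding (10 transmitted bits per data byte) and the FC-PH framing sizes: a 4-byte SOF, 24-byte header, 4-byte CRC, 4-byte EOF, a maximum 2112-byte payload, and a minimum of six 4-byte idles between frames. These framing constants are assumptions supplied here, not figures from this document.

```python
# Minimal sketch; the 8b/10b coding and frame sizes below are assumptions.
LINE_RATE_BAUD = 1_062_500_000   # full-speed Fibre Channel signaling rate (1.0625 Gbaud)
BITS_PER_BYTE_ON_WIRE = 10       # 8b/10b: each data byte is sent as a 10-bit code

MAX_PAYLOAD = 2112               # assumed maximum data-field size, bytes
FRAME_OVERHEAD = 4 + 24 + 4 + 4  # assumed SOF + header + CRC + EOF, bytes
MIN_IDLE_GAP = 6 * 4             # assumed minimum of six 4-byte idles between frames

def raw_mb_per_s(line_rate_baud: int) -> float:
    """Byte rate after 8b/10b decoding, in MB/s (1 MB = 10**6 bytes)."""
    return line_rate_baud / BITS_PER_BYTE_ON_WIRE / 1_000_000

def payload_mb_per_s(line_rate_baud: int) -> float:
    """Upper bound on user-data rate when every frame carries a full payload."""
    efficiency = MAX_PAYLOAD / (MAX_PAYLOAD + FRAME_OVERHEAD + MIN_IDLE_GAP)
    return raw_mb_per_s(line_rate_baud) * efficiency

print(raw_mb_per_s(LINE_RATE_BAUD))                 # 106.25
print(round(payload_mb_per_s(LINE_RATE_BAUD), 1))   # 103.3
```

Under these assumptions even full-size frames top out near 103 MB/s per direction; smaller frames and additional primitive signals push real throughput lower, which is consistent with the rounded 100 MB/sec (200 MB/sec full-duplex) figure.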
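The Figure 3 example (six servers A1-F1 at 50 MB each per direction, four ISLs of 100 MB each) can be tallied to show why static source-to-link assignment oversubscribes L1 while frame striping does not. Because Tables 1-3 are not reproduced in this text, the static routing below is a hypothetical assignment chosen only to illustrate the effect, and the striped case is idealized as a perfectly even spread of whole frames.

```python
CAPACITY_MB_S = 100                       # per-ISL capacity, from the example
LINKS = ("L1", "L2", "L3", "L4")
DEMAND_MB_S = {node: 50 for node in ("A1", "B1", "C1", "D1", "E1", "F1")}

# Hypothetical static routing: each source is pinned to one ISL without any
# knowledge of bandwidth use (not taken from the original Tables 1-3).
STATIC_ROUTE = {"A1": "L1", "B1": "L1", "C1": "L1", "D1": "L2", "E1": "L3", "F1": "L4"}

def static_load(route: dict[str, str], demand: dict[str, int]) -> dict[str, float]:
    """Sum each node's demand onto the single ISL it is pinned to."""
    load = dict.fromkeys(LINKS, 0.0)
    for node, link in route.items():
        load[link] += demand[node]
    return load

def striped_load(demand: dict[str, int]) -> dict[str, float]:
    """Idealized frame striping: whole frames spread evenly over every link."""
    share = sum(demand.values()) / len(LINKS)
    return dict.fromkeys(LINKS, share)

def oversubscribed(load: dict[str, float]) -> list[str]:
    return [link for link, mb in load.items() if mb > CAPACITY_MB_S]

print(static_load(STATIC_ROUTE, DEMAND_MB_S))   # {'L1': 150.0, 'L2': 50.0, 'L3': 50.0, 'L4': 50.0}
print(oversubscribed(static_load(STATIC_ROUTE, DEMAND_MB_S)))  # ['L1']
print(oversubscribed(striped_load(DEMAND_MB_S)))               # []
```

The 300 MB total fits comfortably within the 400 MB aggregate capacity; only the static pinning causes the overload on L1.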
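The loop-back measurement described in the bullets above has three steps: a counter starts when a pattern is sent and stops when the pattern returns, the result is tabulated per link in clock increments, and each link is compared against every other link in the group. The sketch below models those steps. The clock period, the fixed far-end loop-back turnaround, the counter readings, and the roughly 5 ns/m fibre propagation delay are all hypothetical values chosen for illustration.

```python
from itertools import permutations

# Every constant and reading below is hypothetical, for illustration only.
CLOCK_PERIOD_NS = 10.0        # assumed 100 MHz counter clock
LOOPBACK_LATENCY_TICKS = 20   # assumed fixed turnaround delay in the far-end loop-back
NS_PER_METER = 5.0            # approximate propagation delay in optical fibre

def one_way_ticks(round_trip_ticks: int) -> float:
    """One-way link delay, in clock increments, from a loop-back counter reading."""
    return (round_trip_ticks - LOOPBACK_LATENCY_TICKS) / 2

def link_table(readings: dict[str, int]) -> dict[str, float]:
    """Per-link delay table, one entry per link in the group."""
    return {link: one_way_ticks(count) for link, count in readings.items()}

def length_meters(ticks: float) -> float:
    """Convert a one-way delay in clock increments to an estimated cable length."""
    return ticks * CLOCK_PERIOD_NS / NS_PER_METER

def pairwise_skew(table: dict[str, float]) -> dict[tuple[str, str], float]:
    """Signed delay difference of each link against every other link in the group."""
    return {(a, b): table[a] - table[b] for a, b in permutations(table, 2)}

readings = {"L1": 220, "L2": 420, "L3": 2020, "L4": 10020}   # hypothetical counts
table = link_table(readings)
for link, ticks in table.items():
    print(link, ticks, "ticks ~", length_meters(ticks), "m")
skew = pairwise_skew(table)
print(skew[("L4", "L1")])   # 4900.0: L4 is 4900 ticks slower than L1
```

Storing the signed differences for every ordered pair lets a scheduler look up the skew between any two links directly instead of recomputing it for each frame.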
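The transmit-side rule is as follows. A frame whose SRC/DST pair has no frame in flight may go out on any available link immediately. Otherwise it may leave only when it is guaranteed to reach the remote node after the previous frame of that pair. This rule can be modeled with per-link one-way delays such as those in the link-length table. The class and method names and the one-tick guard gap are hypothetical. The model compares only frame-head arrival times and ignores link busy periods, so it illustrates the ordering condition rather than a complete scheduler.

```python
from dataclasses import dataclass

@dataclass
class InFlight:
    link: str
    arrival_tick: int   # tick at which the frame head reaches the remote node

class TransmitScheduler:
    """Hypothetical sketch of per-SRC/DST in-order scheduling over striped links."""

    def __init__(self, delay_ticks: dict[str, int], guard_ticks: int = 1):
        self.delay = delay_ticks      # one-way delay per link, from the length table
        self.guard = guard_ticks      # assumed minimum link-to-link gap at the receiver
        self.last: dict[tuple[str, str], InFlight] = {}

    def earliest_send(self, src: str, dst: str, link: str, now: int) -> int:
        prev = self.last.get((src, dst))
        if prev is None:
            return now                # no SRC/DST match: transmit immediately
        # The new frame must reach the remote node after the previous one did.
        return max(now, prev.arrival_tick + self.guard - self.delay[link])

    def send(self, src: str, dst: str, free_links: list[str], now: int) -> tuple[str, int]:
        # Among the free links, choose the one on which the frame may leave soonest.
        link = min(free_links, key=lambda lk: (self.earliest_send(src, dst, lk, now), lk))
        start = self.earliest_send(src, dst, link, now)   # start > now means "queued"
        self.last[(src, dst)] = InFlight(link, start + self.delay[link])
        return link, start

sched = TransmitScheduler({"L1": 100, "L2": 200, "L3": 1000, "L4": 5000})
print(sched.send("A1", "B2", ["L1"], now=0))          # ('L1', 0)
print(sched.send("A1", "B2", ["L4"], now=10))         # ('L4', 10), arrives at 5010
print(sched.send("A1", "B2", ["L1", "L2"], now=20))   # ('L2', 4811), arrives at 5011
print(sched.send("C1", "D2", ["L1"], now=20))         # ('L1', 20), different pair
```

In the third call the frame is held until tick 4811 so that it reaches the remote node just after the frame already travelling on the long link L4, matching the "queued until it can be transmitted to arrive in order" behavior.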
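On the receive side, each link controller owns buffers in a shared receive buffer memory. It reports the buffer number holding each frame to the central queue manager. The central queue manager can use plain FIFO order, since the transmit side already guarantees ordering, and returns each buffer to its link controller after forwarding. The sketch below models that flow. The per-destination FIFOs, the class and method names, and the string frame identifiers are assumptions made for illustration.

```python
from collections import defaultdict, deque

class ReceivePath:
    """Hypothetical sketch of the receive side: link controllers, a shared
    receive buffer memory, and a central queue manager with FIFO queues."""

    def __init__(self, buffers_per_link: dict[str, list[int]]):
        # Buffer numbers each link controller owns in the shared memory.
        self.free = {link: deque(bufs) for link, bufs in buffers_per_link.items()}
        self.memory: dict[int, str] = {}      # buffer number -> stored frame
        self.owner: dict[int, str] = {}       # buffer number -> owning link controller
        self.queues = defaultdict(deque)      # destination port -> FIFO of buffer numbers

    def receive(self, link: str, frame: str, dst: str) -> int:
        """Link controller stores a frame and reports its buffer number."""
        buf = self.free[link].popleft()       # IndexError if the link has no free buffer
        self.memory[buf] = frame
        self.owner[buf] = link
        self.queues[dst].append(buf)          # plain FIFO: transmit side already ordered frames
        return buf

    def forward(self, dst: str) -> list[str]:
        """Once a connection to port dst exists, drain its FIFO toward the switch."""
        sent = []
        while self.queues[dst]:
            buf = self.queues[dst].popleft()
            sent.append(self.memory.pop(buf))
            self.free[self.owner.pop(buf)].append(buf)   # return buffer to its controller
        return sent

rx = ReceivePath({"L1": [0, 1], "L2": [2, 3]})
rx.receive("L1", "frame-1", dst="B2")
rx.receive("L2", "frame-2", dst="B2")
rx.receive("L1", "frame-3", dst="C2")
print(rx.forward("B2"))     # ['frame-1', 'frame-2']
print(list(rx.free["L1"]))  # [0]
```

Keeping one FIFO per destination port means that frames for a destination whose connection is not yet available do not block frames for other ports. This is a design choice of the sketch; the text itself only specifies first-in-first-out queue information.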

Abstract

A method and system (12a, 12b) for aggregating a plurality of links (L1, L2, L3, L4) to simulate a unitary connection among one or more nodes in a fibre channel system includes means for striping data frames across the links (28). One or more programmable hardware mechanisms (12a, 12b), operatively connectable to the links and to nodes (A1, A2, B1, B2) in the fabric, are provided. A program for collecting information about variable link characteristics is included. The programmable hardware mechanisms provide in-order delivery of data frames across the links (L1, L2, L3, L4) despite the variable link characteristics.

Description

METHOD FOR AGGREGATING A PLURALITY OF LINKS TO SIMULATE A UNITARY CONNECTION
BACKGROUND OF THE INVENTION
Field Of The Invention
This invention pertains generally to a method for aggregating a plurality of links to simulate a unitary connection among one or more nodes in a fibre channel system. This invention is particularly, but not exclusively, useful for providing in-order delivery of data frames across the plurality of links without requiring reinitialization of the fabric in a fibre channel system due to variations in link characteristics.
Relevant Background
The information explosion of recent decades has, in part, driven requirements for enhanced computer performance that have increased significantly, if not exponentially. Consequently, demand for high-performance communications for server-to-storage and server-to-server networking has increased. Performance improvements in hardware entities, including storage, processors, and workstations, along with the move to distributed architectures such as client-server, have increased the demand for data-intensive and high-speed networking applications. The interconnections between and among these systems, and their input/output devices, require enhanced levels of performance in reliability, speed, and distance. Simultaneously, demands for more robust, highly available, disaster-tolerant computing resources, with ever-increasing speed and memory capabilities, continue unabated. To satisfy such demands, the computer industry has worked to overcome performance problems often attributable to conventional I/O ("input/output") device subsystems. Mainframes, supercomputers, mass storage systems, workstations and very high resolution display subsystems frequently are connected to facilitate file and print sharing. Because of the demand for increased speed across such systems, networks and channels conventionally used for connections introduce communication clogging, aptly called "bottlenecks," especially if data is in the large file formats typical of graphically based applications. Efforts to satisfy enhanced performance demands have been, in part, directed to providing storage interconnect solutions that address performance and reliability requirements of modern storage systems. At least three technologies are directed to solving those problems: SCSI ("Small Computer Systems Interface"); SSA ("Serial Storage Architecture"), a technology advanced primarily by IBM; and Fibre Channel ("F/C"), a high-performance interconnect technology.
Two prevalent types of data communication connections exist between processors, and between a processor and peripherals. A "channel" provides a direct or switched point-to-point connection between communicating devices. The primary task of the channels is to transport data at the highest possible data rate, with the least amount of delay. Channels typically perform simple error correction in hardware. A "network", by contrast, is an aggregation of distributed nodes. A "node" as used in this document is either an individual computer or another machine in a network (workstations, mass storage units, etc.) with a protocol that supports interaction among the nodes. Typically, each node is capable of recognizing error conditions on the network, and provides the error management required to recover from error conditions. Protocols, of course, are analogous to various languages and dialects used in human speech; to the extent that a node can "understand" which protocol is used, all nodes in a system can "speak the same language." Fibre Channel systems typically are routed using a protocol known as the FCP Protocol, which like protocols in general, includes a data transmission convention encompassing timing, control, formatting, and data representation.
SCSI is an "intelligent" and parallel I/O bus on which various peripheral devices and controllers can exchange information. Although designed approximately 15 years ago, SCSI remains in use. The first SCSI standard, now known as SCSI-1, was adopted in 1986 and originally designed to accommodate up to eight devices at speeds of 5 MB/sec. SCSI standards and technology have been refined and extended frequently, providing ever faster data transfer rates up to 40 MB/sec. SCSI performance has doubled approximately every five years since the original standard was released, and the number of devices permitted on a single bus, for example, has been increased to 16. In addition, backward compatibility has been enhanced, enabling newer devices to coexist on a bus with older devices. Significant problems associated with SCSI remain, however, including limitations caused by bus speed, bus length, reliability, cost, and device count. Bus length, originally limited to six meters, is even more constrained by newer standards requiring faster transfer rates and higher device populations, limitations that are only partially cured by expensive differential cabling or extenders. Accordingly, industry designers now seek to solve limitations inherent in SCSI by employing serial device interfaces. Featuring data transfer rates as high as 200 MB/sec, serial interfaces use point-to-point interconnections rather than busses. Serial designs also decrease cable complexity, simplify electrical requirements, and increase reliability. Two solutions have been considered: Serial Storage Architecture ("SSA") and what has become known as Fibre Channel technology, including the Fibre Channel Arbitrated Loop ("FC-AL").
Serial Storage Architecture is a high-speed serial interface designed to connect data storage devices, subsystems, servers and workstations. SSA was developed and is promoted as an industry standard by IBM; formal standardization processes began in 1992. Currently, SSA is undergoing approval processes as an ANSI standard. Although the basic transfer rate through an SSA port is only 20 MB/sec, SSA is dual ported and full-duplex, resulting in a maximum aggregate transfer speed of up to 80 MB/sec. SSA connections are carried over thin, shielded, four-wire (two differential pairs) cables, which are less expensive and more flexible than the typical 50- and 68-conductor SCSI cables. Currently, IBM is the only major disk drive manufacturer shipping SSA drives; there has been little industry-wide support for SSA. That is not true of Fibre Channel, which has achieved wide industry support.
Fibre Channel is an industry-standard, high-speed serial data transfer interface used to connect systems and storage in point-to-point or switched topologies. FC-AL technology, developed with storage connectivity in mind, is a recent enhancement that also supports copper media and loops containing up to 126 devices, or nodes. Briefly, fibre channel is a switched protocol that allows concurrent communication among workstations, supercomputers and various peripherals. The total network bandwidth provided by fibre channel may be on the order of a terabit per second. Fibre channel is capable of transmitting frames along links (also, "lines" or "lanes") at rates exceeding 1 gigabit per second in at least two directions simultaneously. F/C technology also is able to transport commands and data according to existing protocols such as Internet protocol ("IP"), high performance parallel interface ("HIPPI"), intelligent peripheral interface ("IPI"), and, as indicated, SCSI, over and across both optical fiber and copper cable. Fibre Channel may be considered a channel-network hybrid. A Fibre Channel system contains sufficient network features to provide connectivity, distance and protocol multiplexing, and enough channel features to retain simplicity, repeatable performance and reliable delivery. Fibre channel allows for an active, intelligent interconnection scheme, known as a "fabric," as well as fibre channel switches to connect nodes.
The F/C fabric includes a plurality of fabric-ports (F_ports) that provide for interconnection and frame transfer between a plurality of node-ports (N_ports) attached to associated devices that may include workstations, supercomputers and/or peripherals. A fabric has the capability of routing frames based on information contained within the frames. The N_port transmits and receives data to and from the fabric. Transmission is isolated from the control protocol so that different topologies (e.g., point-to-point links, rings, multidrop buses, and crosspoint switches) can be implemented. Fibre Channel, a highly reliable gigabit interconnect technology, allows concurrent communications among workstations, mainframes, servers, data storage systems, and other peripherals. F/C technology not only provides interconnect systems for multiple topologies that can scale to a total system bandwidth on the order of a terabit per second, but also can deliver a high level of reliability and throughput. Switches, hubs, storage systems, storage devices, and adapters designed for the F/C environment are available now.
Following a lengthy review of existing equipment and standards, the Fibre Channel standards group realized that it would be useful for channels and networks to share the same fiber. (The terms "fiber" or "fibre" are used synonymously, and include both optical and copper cables.) A Fibre Channel protocol was developed and adopted, and continues to be developed, as the American National Standard for Information Systems ("ANSI"). See Fibre Channel Physical and Signaling Interface, Revision 4.2, American National Standard for Information Systems (ANSI) (1993) for a detailed discussion of the fibre channel standards, which is incorporated by reference into this document.
Current standards for F/C support bandwidth of 133 Mb/sec, 266 Mb/sec, 532 Mb/sec, 1.0625 Gb/sec, and 2 Gb/sec (proposed) at distances of up to ten kilometers. Fibre Channel's current maximum data rate at 1.0625 Gb/sec is 100 MB/sec (200 MB/sec full-duplex) after accounting for overhead.
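The gap between the 1.0625 Gb/sec line rate and the roughly 100 MB/sec data rate follows largely from the 8b/10b transmission code used by the Fibre Channel physical layer, which sends each 8-bit byte as a 10-bit transmission character. The short Python sketch below reproduces that arithmetic. It is illustrative only: it does not model frame headers, CRCs, or the primitives sent between frames, which account for the remaining difference down to about 100 MB/sec.

```python
# Sketch: 8b/10b arithmetic for a 1.0625 Gb/sec Fibre Channel link.
LINE_RATE_BAUD = 1.0625e9   # serial line rate, transmission bits per second
LINE_BITS_PER_BYTE = 10     # 8b/10b: each data byte travels as 10 line bits


def payload_bytes_per_sec(line_rate_baud: float) -> float:
    """Upper bound on data bytes per second in one direction.

    Ignores framing overhead (headers, CRC, idles between frames).
    """
    return line_rate_baud / LINE_BITS_PER_BYTE


one_way = payload_bytes_per_sec(LINE_RATE_BAUD)
print(f"one direction: {one_way / 1e6:.2f} MB/sec")      # 106.25 MB/sec
print(f"full duplex:   {2 * one_way / 1e6:.2f} MB/sec")  # 212.50 MB/sec
```

Subtracting framing overhead from these upper bounds gives the approximately 100 MB/sec (200 MB/sec full-duplex) figures stated above.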
In addition to strong channel characteristics, Fibre Channel provides powerful networking capabilities, allowing switches and hubs to interconnect systems and storage into tightly-knit clusters. The clusters are capable of providing high levels of performance for file service, database management, or general purpose computing. Because Fibre Channel is able to span up to 10 kilometers between nodes, F/C allows very high-speed movement of data between systems that are greatly separated from one another.
Also, the F/C standard defines a layered protocol architecture consisting of five layers, the highest layer defining mappings from other communication protocols onto the F/C fabric. The network behind the servers links one or more servers to one or more storage systems. Each storage system may be RAID ("Redundant Array of Inexpensive Disks"), tape backup, tape library, CD-ROM library, or JBOD ("Just a Bunch of Disks"). Fibre Channel networks have proven robust and resilient, and include at least these features: shared storage among systems; scalable networking; high performance; fast data access and backup. In a Fibre Channel network, legacy storage systems are interfaced using a Fibre Channel to SCSI bridge. Fibre Channel standards include network features that provide required connectivity, distance, and protocol multiplexing. F/C also supports traditional channel features for simplicity, repeatable performance, and guaranteed delivery.
The Fibre Channel industry standards also provide for several different types, or classes, of data transfers. A class 1 transfer requires circuit switching, i.e., reserved data paths through the network switch, and generally involves the transfer of more than one frame, frequently numerous frames, between two identified network elements. In contrast, a class 2 transfer requires allocation of a path through the network switch for each transfer of a single frame from one network element to another. Frame switching for class 2 transfers is more difficult to implement than class 1 circuit switching because frame switching requires a memory mechanism for temporarily storing incoming frames in a source queue prior to their routing to a destination port, or in a destination queue at a central destination port. A memory mechanism typically includes numerous input/output connections with associated support circuitry and queuing logic. Additional complexity and hardware are required when channels carrying data at different bit rates are to be interfaced.
At least one standard in connection with Fibre Channel technology imposes the requirement to maintain guaranteed in-order delivery of data frames across connecting links, regardless of cable distances ("Distance Standard"). As indicated, the Distance Standard cannot be satisfied using SCSI technology. Known striping methods for transmitting data frames across links include byte striping and word striping. Both have disadvantages in the Fibre Channel environment because of the high-speed requirements for data movement and transfer. Both byte striping and word striping require not only multiple links, but also that links remain open during transmission of data. As indicated, in an environment demanding significantly accelerated speeds of data movement, not all links will remain "open"; not all lanes consistently and continually will deliver frames at an appointed or expected point in proper sequence. The result has been described as a bottleneck, the inability of each successive frame to pass across each link in a prescribed or desired order or sequence. To achieve the objective of sequential, in-order delivery of data frames across connecting links, existing methods and apparatus require that all cables and channels be similar in length. Otherwise, alignment problems attributable to delayed sequencing occur. Those skilled in the art sometimes refer to delayed sequencing of data in the form of frames as "jitter." Existing technologies are unable to provide sufficient error management to overcome the problems of clogging, bottlenecks, and jitter.
The present invention eliminates the problems associated with byte and word striping by employing frame striping instead. By directing successive data frames across the links connecting entities in a F/C environment, load balancing is achieved across all links. To the software associated with F/C technology, the striped links may be perceived as one virtual link; the links are aggregated to simulate a unitary connection among the nodes. This eliminates the adverse consequences of variable link characteristics, including different cable lengths. Considering the practical problems that affect operation of a F/C network, if one F/C link is cut or disabled, the present invention will continue to stripe data frames across the remaining links. Thus, unlike a conventional SCSI system, a disruption on one link will not affect operation of the system as a whole; the present invention quickly reallocates traffic across the remaining links. Inter-Element Links ("IEL's"), or Inter-Switch Links ("ISL's") as they are sometimes called, between entities in a network system have, until now, proven to be a significant limiting factor to successful in-order data delivery in connection with the Distance Standard. Without the present invention, as link lengths change between points in the fabric, or between entities in the network, the fabric must be reinitialized and new routing paths configured.
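Frame striping, as opposed to byte or word striping, can be sketched in Python as follows. The round-robin link selection and the names `FrameStriper`, `mark_failed`, and `stripe` are assumptions made for illustration; the text states only that whole frames are distributed across the links and that a cut or disabled link is skipped. On its own, this sketch does not guarantee in-order delivery for frames sharing a source and destination; that requires the link-length-aware scheduling the invention adds.

```python
from dataclasses import dataclass, field


@dataclass
class FrameStriper:
    """Hypothetical sketch: stripe whole frames across a group of links.

    Each frame travels intact on exactly one link (frame striping), unlike
    byte or word striping, which spread a single frame over every link.
    """
    links: list[str]
    failed: set[str] = field(default_factory=set)
    _next: int = 0  # index of the next link to try (round-robin, assumed)

    def mark_failed(self, link: str) -> None:
        """Remove a cut or disabled link from further use."""
        self.failed.add(link)

    def pick_link(self) -> str:
        """Return the next healthy link in round-robin order."""
        for _ in range(len(self.links)):
            link = self.links[self._next]
            self._next = (self._next + 1) % len(self.links)
            if link not in self.failed:
                return link
        raise RuntimeError("no healthy links remain in the group")

    def stripe(self, frames: list[bytes]) -> list[tuple[str, bytes]]:
        """Assign each whole frame to one healthy link."""
        return [(self.pick_link(), frame) for frame in frames]


striper = FrameStriper(["L1", "L2", "L3", "L4"])
striper.mark_failed("L3")  # e.g., L3 is cut; traffic continues on the rest
assignment = striper.stripe([b"f0", b"f1", b"f2", b"f3"])
print([link for link, _ in assignment])  # ['L1', 'L2', 'L4', 'L1']
```

Skipping the failed link inside `pick_link` is what lets traffic continue across the remaining links without any change visible to the software that sees the group as a single link.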
Therefore, a previously unaddressed need exists in the industry for a new, useful and reliable method and apparatus for aggregating links in networks, particularly in a Fibre Channel environment. It would be of considerable advantage to provide a method and apparatus that aggregates a plurality of links to simulate a unitary connection among one or more nodes in a fibre channel system, thus enabling in-order delivery of data frames across the plurality of links without reinitializing the fabric in a fibre channel system due to variations in link characteristics.
SUMMARY OF THE INVENTION
In accordance with the present invention a method for aggregating links to simulate a unitary connection among one or more nodes in a fibre channel system is provided. According to the present invention, a method for aggregating a plurality of links to simulate a unitary connection among one or more nodes in a fibre channel system includes providing means for striping data frames across the links. Striping data frames includes transmitting data frames in their entirety across individual links.
Programmable hardware mechanisms are connected to the links, as well as to the nodes. The nodes may include by way of example, and not of limitation, fibre channel switches. A programmable hardware mechanism may include a link controller connected to at least the links.
The hardware mechanisms hold a program. The program includes at least an algorithm that provides at least a sequence of instructions for collecting information about each of the links. The information includes the time required for a representative pattern of data to be transmitted and received across the links. The algorithm therefore enables the hardware mechanism to calculate the length of links within the system. The collected information may be tabulated into a table of link length information for each link.
In-order delivery of data across the links is affected by variable link characteristics. Variable link characteristics include, without limitation, different link lengths. To overcome problems precluding in-order delivery of data frames across links due to variable link characteristics, the present invention includes the programmable hardware mechanism, which is operatively coupled to devices connected to the links. The program stored in the programmable hardware mechanism collects information about the variable link characteristics for processing by the program. The programmable hardware mechanism also may include a link controller, which is connectable to the links. In addition, the present invention may include a queue scheduler connected to at least the link controller and to the links, together with buffers, for routing the collected information. The combination of elements, and application of the method, of the present invention provides in-order data delivery across the links of a fibre channel system, regardless of intervening system disruptions caused by the link characteristics.
At least one objective of the hardware mechanisms is to reallocate bandwidth among the plurality of links to overcome problems of bandwidth over-subscription as well as under-subscription. The programmable hardware mechanism also tabulates additional information for ensuring in-order delivery of the data frames across the plurality of links.
The present invention also will guarantee in-order delivery of data from point-to-point even though, paradoxically, each frame may not arrive at each delivery point in sequence. The present invention aggregates the links to obviate the need for sequential delivery of data frames at each point. Yet another advantage of the present invention is a method for selectively transmitting frames across a fibre channel fabric that is easy to use and to practice, and is cost effective.
These advantages, and other objects and features, of such a method for aggregating a plurality of links to simulate a unitary connection among one or more nodes in a fibre channel system to provide in-order delivery of data frames across the plurality of links without reinitializing the fabric in a fibre channel system due to variations in link characteristics, will become apparent to those skilled in the art when read in conjunction with the following description, the accompanying drawing figures, and the appended claims.
As those skilled in the art will appreciate, the conception on which this disclosure is based readily may be used as a basis for designing other structures, methods, and systems for carrying out the purposes of the present invention. The claims, therefore, include such equivalent constructions to the extent the equivalent constructions do not depart from the spirit and scope of the present invention. Further, the abstract associated with this disclosure is neither intended to define the invention, which is measured by the claims, nor intended to be limiting as to the scope of the invention in any way.
The foregoing has outlined broadly the more important features of the invention to better understand the detailed description that follows, and to better understand the contribution of the present invention to the art. Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in application to the details of construction, and to the arrangements of the components, provided in the following description or drawing figures. The invention is capable of other embodiments, and of being practiced and carried out in various ways. Also, the phraseology and terminology employed in this disclosure are for purpose of description, and should not be regarded as limiting.
The novel features of this invention, and the invention itself, both as to structure and operation, are best understood from the accompanying drawing, considered in connection with the accompanying description of the drawing, in which similar reference characters refer to similar parts, and in which:
BRIEF DESCRIPTION OF THE DRAWING
Figure 1 is a schematic diagram showing one of many ways a number of devices, including a Fibre Channel switch, may be interconnected in a Fibre Channel network;
Figure 2 is a schematic representation of a variable-length frame communicated through a fiber optic switch as contemplated by the Fibre Channel industry standard;
Figure 3 is a schematic block diagram showing six nodes connected to four links in a representative fibre channel system;
Figure 4 is a schematic block flow diagram showing one way in which the method for aggregating a plurality of links to simulate a unitary connection among one or more nodes in a fibre channel system may be implemented; and
Figure 5 is a schematic block flow diagram showing one way in which the method for aggregating a plurality of links to simulate a unitary connection among one or more nodes in a fibre channel system may be implemented on receipt of data frames.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
Briefly, the present invention provides a method for aggregating a plurality of links to simulate a unitary connection among one or more nodes in a fibre channel system, providing in-order delivery of data frames across the plurality of links without requiring reinitialization of the fabric in a fibre channel system due to variations in link characteristics. Referring first to Figure 1, a schematic block diagram is shown illustrating in general a representative fibre channel fabric 10 that includes a switch 12. Fabric 10 also may include a device called a JBOD ("Just a Bunch of Disks") 14, a disk array 16, one or more servers 18a and 18b, a SCSI bridge 20 to a SCSI RAID ("Redundant Array of Inexpensive Disks") 22, as well as a Fibre Channel RAID 24 (collectively, "Devices 14-24"). One of the important devices in the F/C fabric or system is switch 12, which enables a Fibre Channel system to transmit and receive extraordinary amounts of data at great speed. As used in this document, and as shown in Figure 2, a "frame" or "data frame" 26 is the smallest individual packet of data that is sent and received on a link, and includes a presumed configuration of an aggregation of data bits into data frames 26a, 26b as exemplified in Figure 2.
As shown in Figure 3, the present invention provides a method for aggregating a plurality of links 28 to appear as a single virtual link (not shown) either between switch 12a and switch 12b, or between Devices 14-24 and a switch 12. Plurality of links 28 is also labeled for clarity in Figure 3 as "L1-L4." As indicated, the present invention trains a fibre channel system to consider plurality of links 28 as a single link for purposes of passing data frames 26 across links 28. The present invention thus compensates for different link characteristics, including at least differences in the lengths of links L1-L4, particularly differences in the length of links 28 between nodes 30 and 32 in fabric 10. The present invention determines the length differentials of links 28 in part by calculating the amount of time required for a data frame 26 to cross each of links 28. The present invention causes a fibre channel system to "see" the four or more links shown in Figure 3 as a single virtual link for purposes of passing data frames 26 across a series of links L1-L4.
At least one advantage of the present invention is that the method allows for hardware-based load balancing across plurality of links 28, while achieving and maintaining the requirements of fibre channel standards for in-order, guaranteed delivery of data frames 26 across plurality of links 28 regardless of cable distances. The hardware-based implementation of the method of the present invention, more fully described below, automatically adjusts for link characteristics in connection with cable or other port-level failures without disrupting fabric 10, and without requiring reinitialization of fabric 10. In a fibre channel environment lacking the advantages of the present invention, links 28 connecting nodes 30 and 32 in fabric 10 must be substantially similar in length. If links 28 are not substantially similar in length, the length differential engenders alignment problems within the system that cause delays, frequently called "jitter." Frame striping is employed to assist in reallocating data traffic across links 28, overcoming the limitation of word or byte striping, which requires the same number of links as there are words, a problem that has become more pronounced as groups of four links, as shown in Figure 3, have become more standard in the field.
As shown in Figure 3, source nodes and destination nodes are assigned across the ISL's formed by plurality of links 28 without regard to, or knowledge of, potential bandwidth utilization. For example, Figure 3 shows six hypothetical nodes 30, individually labeled A1-F1, connected to switch 12a, also labeled SW1 for clarity. Nodes A1-F1 are connected through switch 12a across plurality of links 28 to switch 12b, also labeled SW2 for clarity. Switch 12b is connected to nodes 32, individually labeled A2-F2. For purposes of explication, it is assumed that routes through fabric 10 are established at fabric initialization time. Suppose that, in the configuration shown in Figure 3, servers such as servers 18a and 18b of Figure 1 are attached to SW1 and a storage device such as F/C RAID 24 is attached to SW2; that each server requires 50 MB in each direction; and that each link has a capacity of 100 MB. The load would then be split equally between only two storage devices, for a total of 300 MB (6 x 50) in each direction. The four ISL's in plurality of links 28 between switches 12a and 12b, by contrast, provide 400 MB in each direction, so 100 MB, or 25% of the ISL capacity, remains available.
As used in this document, the term "bandwidth" means the rate at which a communications system can transmit data or, more technically, the range of frequencies that an electronic system can transmit. High bandwidth allows fast transmission or the transmission of many signals at once, a criterion of significant importance in high-speed transmission through fibre channel systems.
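The arithmetic behind the 25% figure can be checked directly. In this sketch, 25% is the 100 MB of unused ISL capacity expressed as a fraction of the 400 MB aggregate; the variable names are illustrative only.

```python
# Aggregate check for the Figure 3 example (per direction, in MB).
NODE_DEMAND_MB = 50    # each of nodes A1-F1
NODE_COUNT = 6
ISL_CAPACITY_MB = 100  # each of links L1-L4
ISL_COUNT = 4

demand = NODE_COUNT * NODE_DEMAND_MB      # total offered load
capacity = ISL_COUNT * ISL_CAPACITY_MB    # total ISL capacity
headroom = capacity - demand              # unused capacity
print(demand, capacity, headroom, f"{headroom / capacity:.0%}")  # 300 400 100 25%
```

An aggregate surplus like this says nothing about how the load falls on any single ISL, which is the problem the following tables address.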
Because data traffic patterns across links 28 are not known at the time of initialization, and because the rate and volume of traffic across links 28 depend on the applications executing on servers 18a and 18b, as shown in Figure 1, a condition aptly called a "bottleneck" may occur on one or more ISL's. Because of such a bottleneck, a F/C system may not have enough bandwidth across links 28. For example, as shown in Figure 3, each node A1-F1 has a 50 MB capacity, so collectively nodes A1-F1 have a total of 300 MB/s. In a storage configuration, therefore, the results shown in Tables 1-3 may follow. ISL link requirements would be as shown in Table 1.
TABLE 1
Based on Table 1, and on data traffic of 50 MB from each of nodes A1-F1 for a total throughput of 300 MB, the following bandwidths are required:
TABLE 2
It follows, therefore, that ISL link requirements would be:
TABLE 3
Tables 1-3 demonstrate that, with only 100 MB of capacity available on each ISL (L1-L4), as shown in Figure 3, link 1 ("L1") is oversubscribed, causing system performance degradation.
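Tables 1-3 are not reproduced in this text, but the effect they describe, spare aggregate capacity alongside an oversubscribed individual ISL, can be illustrated with a hypothetical static route map. The node-to-link assignment below is invented for illustration and is not taken from the tables; only the 50 MB per-node demand and the 100 MB per-link capacity come from the example above.

```python
from collections import defaultdict

ISL_CAPACITY_MB = 100

# Hypothetical routes fixed at fabric initialization (NOT the contents of
# Tables 1-3): source node -> ISL it is statically routed over.
routes = {"A1": "L1", "B1": "L1", "C1": "L1", "D1": "L2", "E1": "L3", "F1": "L4"}
demand_mb = {node: 50 for node in routes}  # each node drives 50 MB


def link_loads(routes: dict[str, str], demand: dict[str, int]) -> dict[str, int]:
    """Sum the traffic each ISL carries under a fixed route map."""
    loads: defaultdict[str, int] = defaultdict(int)
    for node, link in routes.items():
        loads[link] += demand[node]
    return dict(loads)


def oversubscribed(loads: dict[str, int], capacity: int) -> list[str]:
    """ISLs whose assigned load exceeds their capacity."""
    return sorted(link for link, load in loads.items() if load > capacity)


loads = link_loads(routes, demand_mb)
print(loads)                                   # {'L1': 150, 'L2': 50, 'L3': 50, 'L4': 50}
print(oversubscribed(loads, ISL_CAPACITY_MB))  # ['L1']
```

The total is still 300 MB against 400 MB of capacity, yet L1 is asked to carry 150 MB, which is the kind of static-routing bottleneck that striping frames across all four links avoids.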
If this problem arose in a conventional 4-ISL configuration of four links 28, as shown in Figure 3, two alternatives may be available for redistributing data traffic in a fibre channel environment. One alternative for redistributing bandwidth across the four links L1-L4 is to program the ISL's for an Error_Detect_Timeout_Value, typically two seconds, followed by reallocation of the routes across links 28. Another alternative is to reallocate the routes by creating a distribution plan for out-of-order delivery of frames 26. For example, if nodes A1 and B1 used L2 instead of L1, it might be possible to buffer frames 26 from A1 to B1 in SW1 on L1 so that delivery occurs after frames 26 have used L2.
Either alternative, however, accepts the likelihood of performance degradation because of the time delay of two seconds, or because a command to conduct out-of-order delivery of frames 26 for passage across links 28 would, as applications change among nodes 30 and 32, cause new bottlenecks to be introduced, thus compounding the problems sought to be overcome by the present invention.
Perhaps yet another alternative available under current technology to solve inadequate allocation of bandwidth would be to apply options from current 10 Gb/sec technologies, employing byte striping or word striping across links 28. Although this approach might supply adequate bandwidth, the solution is only temporary: byte striping or word striping introduces a potential single point of failure in the system as a whole. Additionally, byte and word striping require cable link matching to avoid improper byte/load alignment caused by differences in cable distance or length, particularly at metropolitan distances.
The present invention solves the foregoing problems and limitations by providing a method for frame striping across links 28. The method provides structural elements within internal switching elements to hunt for available paths across links 28, including the conventional four ISL configurations shown in Figure 3. The method of the present invention causes a plurality of links 28, in a conventional configuration of four ISL's, to appear to system software as a single "virtual" ISL.
As shown by cross-reference between Figures 3 and 4, the method of the present invention includes a hardware mechanism 34 that is programmed by software management to adjust for link characteristic differences across links 28, to make it appear to the system that plurality of links 28 is but a single link (not shown) for purposes of passing data frames 26 across links 28. Hardware mechanisms 34 associated with the present invention provide load balancing across links 28, and guarantee in-order delivery of data frames 26 between, for example, a source port and a destination port, or as represented in Figure 3, between nodes A1 and B2.
In a conventional configuration for ISL's, as shown in Figure 3, one or more algorithms associated with the software, and well known to those skilled in the art, are configurable to detect, or may dynamically detect, variable link characteristics. The one or more algorithms may be executed to calculate the lengths of links 28, such as L1-L4, as shown in Figure 3. The algorithm and hardware mechanism 34 send patterns of signals and data during transmit and receive functions. When such a pattern is sent, a counter (not shown) located in hardware mechanism 34 is started. When the pattern is received back at the transmitting source, the counter stops. Cable length, therefore, may be determined mathematically from the round-trip time, and the cable lengths may be compared to identify at least one variable link characteristic, namely link length. Once the time delay and length of each link L1-L4 have been determined, the software associated with hardware mechanism 34 may establish a link-to-link gap time.
In a preferred embodiment of the present invention, as shown in Figure 4, hardware mechanism 34 is a link controller 34. Link controller 34 does not transmit consecutive frames between the same SRC/DST (source/destination) port pair until the first transmitted frame 26 has traveled far enough down its link L1-L4 to guarantee that it will be received at a receiving node 32, for example F2 as shown in Figure 3, before a second frame 26a is received. This may result in a link-to-link inter-frame gap ("IFG") time. The IFG time depends on the individual links and variable link characteristics involved, and must be calculated only when link controller 34 determines that an existing data frame 26 is in flight across fabric 10 and a subsequent data frame 26 with the same SRC/DST pair must be transmitted.
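The loop-back length measurement described above reduces to halving the round-trip count and multiplying by the propagation speed. The Python sketch below assumes a 100 MHz counter clock, a propagation speed of roughly 2 x 10^8 m/s in optical fiber, and an optional fixed turnaround latency at the far end; none of these values, nor the name `cable_length_m`, come from the text.

```python
# Hypothetical parameters; neither value is specified in the text.
CLOCK_HZ = 100e6          # assumed counter clock: one tick every 10 ns
FIBER_M_PER_S = 2.0e8     # light in glass fiber travels roughly 0.2 m/ns


def cable_length_m(round_trip_ticks: int, turnaround_ticks: int = 0) -> float:
    """Estimate one-way cable length from a loop-back counter value.

    turnaround_ticks: fixed latency at the far end before the pattern is
    looped back; it is removed before the round trip is halved.
    """
    one_way_s = (round_trip_ticks - turnaround_ticks) / CLOCK_HZ / 2
    return one_way_s * FIBER_M_PER_S


# A 10 km link: 50 us each way, 100 us round trip = 10,000 ticks at 100 MHz.
print(f"{cable_length_m(10_000):.1f} m")   # 10000.0 m
print(f"{cable_length_m(400, 40):.1f} m")  # 360.0 m after 40 ticks of turnaround
```

In practice only the differences between links matter for ordering, so a constant turnaround latency that is the same on every link cancels out of the comparison.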
A transmit queue scheduler 36, as shown in Figure 4, receives data frames 26 from internal switching elements 38 and schedules data frames 26 for transmission across links 28.
As indicated, link controller 34 calculates the length of each link L1-L4. Length calculations are accomplished by sending patterns across links 28, with hardware mechanism 34 transmitting and receiving transmission loop-backs well known to those skilled in the art. The transmission loop-back values permit the software within hardware mechanism 34 to establish a table of values for each link L1-L4, which identifies the cable length in clock increments. The term "clock" as used in this document means the circuit that generates a series of evenly spaced pulses. All switching activity occurs while the clock is sending out pulses; between pulses, the devices are allowed to stabilize. The count maintained against the clock expires when the head of a data frame 26 is received at a remote node 32. The software in hardware mechanism 34 then calculates the difference between each link 28 and every other link 28 in the group. The information resulting from those calculations is maintained in transmit queue scheduler 36, as shown in Figure 4.
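The per-link table and the link-to-link comparison can be sketched as two small functions. The loop-back counts are hypothetical, and the sketch assumes the one-way delay is half the round-trip count (integer division, so an odd count is rounded down); the function names are illustrative.

```python
# Hypothetical loop-back results for links L1-L4, in clock ticks.
round_trip_ticks = {"L1": 200, "L2": 1200, "L3": 260, "L4": 10200}


def build_delay_table(rtt: dict[str, int]) -> dict[str, int]:
    """One-way delay per link, in clock ticks (half the loop-back count)."""
    return {link: ticks // 2 for link, ticks in rtt.items()}


def skew_table(delay: dict[str, int]) -> dict[tuple[str, str], int]:
    """For every ordered pair (a, b): how many ticks longer link b is than a."""
    return {(a, b): delay[b] - delay[a] for a in delay for b in delay if a != b}


delay = build_delay_table(round_trip_ticks)
skew = skew_table(delay)
print(delay)               # {'L1': 100, 'L2': 600, 'L3': 130, 'L4': 5100}
print(skew[("L1", "L4")])  # 5000
print(skew[("L4", "L1")])  # -5000
```

Keeping both orderings of each pair lets a scheduler look up the skew directly in either direction, at the cost of storing each difference twice.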
As also shown in Figure 4, frames 26 are received by a transmit buffer memory 40 from internal switching element 38. Transmit queue scheduler 36 copies the SRC/DST address information from each frame 26, and a queue entry is established. As links 28 become part of transmit queue scheduler 36, transmit queue scheduler 36 maintains the status of which SRC/DST frames have been transmitted across links 28, and also identifies the links L1-L4 across which frames 26 have been transmitted. As subsequent data frames 26 are received in transmit buffer memory 40, transmit queue scheduler 36 compares their SRC/DST data against the frames 26 currently being transmitted across other links 28. If no SRC/DST match is found, a frame 26 may be transferred immediately to an available link L1-L4 among plurality of links 28. If a match is found, the software associated with hardware mechanism 34 performs one or more calculations based on the length difference between the link 28 on which the last frame 26 with the matching SRC/DST pair was transmitted and the link 28 that is currently available. If it can be guaranteed that the previously transmitted frame 26 will be received at the remote switch before the following frame 26, the following frame can be transmitted immediately; otherwise, it must be queued until it can be transmitted to arrive in order.
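The match-and-check rule just described can be sketched as follows. The sketch assumes that delays are held per link in clock ticks, that every frame takes a fixed `frame_ticks` to serialize, and that in-order arrival means the head of a new frame must not reach the remote switch before the tail of the previous frame with the same SRC/DST pair. `TransmitScheduler`, `earliest_send`, `best_link`, and `send` are hypothetical names; a real scheduler would also hold queued frames and prune completed entries.

```python
from dataclasses import dataclass, field

Pair = tuple[int, int]  # (SRC port, DST port)


@dataclass
class TransmitScheduler:
    """Hypothetical sketch of a length-aware, in-order transmit check.

    delay: one-way propagation delay of each link, in clock ticks.
    frame_ticks: assumed time to serialize one frame onto any link.
    """
    delay: dict[str, int]
    frame_ticks: int
    # Per SRC/DST pair: tick at which the tail of its last frame arrives.
    last_arrival: dict[Pair, int] = field(default_factory=dict)

    def earliest_send(self, pair: Pair, link: str, now: int) -> int:
        """Earliest tick at which a frame for `pair` may start on `link`."""
        last = self.last_arrival.get(pair)
        if last is None:
            return now  # no SRC/DST match: the frame may go immediately
        # The new frame's head (start + delay) must not arrive before the
        # previous frame's tail; if an immediate start already satisfies
        # that, send now, otherwise wait.
        return max(now, last - self.delay[link])

    def best_link(self, pair: Pair, links: list[str], now: int) -> str:
        """Among the available links, pick the one that can start soonest."""
        return min(links, key=lambda link: self.earliest_send(pair, link, now))

    def send(self, pair: Pair, link: str, now: int) -> int:
        """Record a transmission on `link` and return its start tick."""
        start = self.earliest_send(pair, link, now)
        self.last_arrival[pair] = start + self.delay[link] + self.frame_ticks
        return start


sched = TransmitScheduler(delay={"L1": 100, "L2": 600, "L4": 5100}, frame_ticks=20)
print(sched.send((1, 2), "L4", now=0))                      # 0 (tail arrives at 5120)
print(sched.earliest_send((1, 2), "L1", now=10))            # 5020: wait on short L1
print(sched.best_link((1, 2), ["L1", "L2", "L4"], now=10))  # L4
print(sched.earliest_send((3, 4), "L1", now=10))            # 10: different pair
```

The example shows the trade-off the text describes: a frame following one on a long link either waits (here 5020 ticks on L1) or follows on a link of similar length, while frames of an unrelated SRC/DST pair are unaffected.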
As shown in Figure 5, at the data frame 26 receiving end of the plurality of links 28, one or more link controllers 34a through 34n are provided for receiving frames 26. Each link controller 34a-n is allocated one or more buffers 50 within the shared received buffer memory 44. Link controllers 34a-n sort and compare received frames 26. The information accumulated by link controllers 34a-n is combined with the number of the buffer 50 that contains the data frame 26, and this information is transmitted to central queue manager 46. Because the transmit logic of the software associated with hardware mechanism 34 guarantees in-order delivery of data frames 26 to remote switches, the system algorithm may employ first-in-first-out queue information. As connections are made to internal switching elements 38, any buffer 50 can transmit data frames 26 across links 28. Thus, as internal links become available, central queue manager 46 requests a connection to the physical destination port; when the connection is established, it passes the buffer 50 number to a reader 42a-42n for transmission to internal switch 38 and passes buffer 50 back to link controllers 34a-34n for buffer management and link control.
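The receive path can be sketched in Python as a single first-in-first-out queue of buffer numbers. This is an illustration under assumptions, not the disclosed hardware: the class and method names are hypothetical, string identifiers stand in for link controllers 34a-34n, the `port_connected` callback stands in for the connection request to the physical destination port, and using one FIFO for all destination ports (which can hold frames behind a busy port) is a simplification.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Optional, Tuple

@dataclass
class CentralQueueManager:
    """Hypothetical sketch of central queue manager 46 on the receive side."""
    # Free buffer numbers in the shared receive buffer memory, per link controller.
    free_buffers: Dict[str, List[int]]
    # Received frames in arrival order: (link controller, buffer number, destination port).
    fifo: Deque[Tuple[str, int, int]] = field(default_factory=deque)

    def receive(self, controller: str, dest_port: int) -> int:
        """A link controller stores a frame in one of its own buffers and
        reports the buffer number together with the frame's destination port."""
        buffer_no = self.free_buffers[controller].pop()
        self.fifo.append((controller, buffer_no, dest_port))
        return buffer_no

    def forward(self, port_connected: Callable[[int], bool]) -> Optional[Tuple[int, int]]:
        """Serve the oldest frame if its destination port can be connected.

        Returns (buffer number, destination port) as handed to a reader and
        gives the buffer back to its link controller; returns None if the
        queue is empty or the head frame must keep waiting."""
        if not self.fifo:
            return None
        controller, buffer_no, dest_port = self.fifo[0]
        if not port_connected(dest_port):
            return None
        self.fifo.popleft()
        self.free_buffers[controller].append(buffer_no)
        return buffer_no, dest_port
```

A FIFO is sufficient here only because, as the text notes, the transmit side already guarantees that frames arrive in order; the receive side therefore never reorders, it simply drains buffers in arrival order.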
While the method for aggregating a plurality of links to simulate a unitary connection among one or more nodes in a fibre channel system, as shown in drawing figures 1 through 5, is one embodiment of the present invention, it is not intended to be exclusive and is not a limitation of the present invention. While the particular method for aggregating a plurality of links as shown and disclosed in detail in this instrument is fully capable of obtaining the objects and providing the advantages stated, this disclosure is merely illustrative of the presently preferred embodiments of the invention, and no limitations are intended in connection with the details of construction, design or composition other than as provided and described in the appended claims.

Claims

1. A method for aggregating a plurality of links to simulate a unitary connection among one or more nodes in a fibre channel system, comprising: providing means for striping data frames across the plurality of links; equipping the system with at least one programmable hardware mechanism operatively connectable to the plurality of links and to the one or more nodes; installing in the at least one programmable hardware mechanism a program for collecting information about each of the plurality of links; tabulating the information for use by the at least one programmable hardware mechanism; and employing the at least one programmable hardware mechanism to provide in-order delivery of the data frames across the plurality of links.
2. The method of claim 1 wherein said providing means for striping data frames includes providing means for transmitting at least one complete data frame across one or more links among the plurality of links.
3. The method of claim 1 wherein said equipping the system with at least one programmable hardware mechanism includes providing a link controller operatively connectable to at least the plurality of links.
4. The method of claim 1 wherein the program installing step includes installing at least one algorithm for determining at least the length of the plurality of links.
5. The method of claim 1 wherein the information tabulating step includes creating a table of link length information for each link in the plurality of links.
6. The method of claim 1 wherein said employing at least one programmable hardware mechanism includes reallocating bandwidth among the plurality of links.
7. A method for in-order delivery of data across one or more links having variable link characteristics, comprising: disposing one or more devices operatively connectable to the one or more links; including a programmable hardware mechanism operatively couplable to the one or more devices; storing a program in the programmable hardware mechanism for collecting the variable link characteristics; striping one or more frames of data across the one or more links to the one or more devices; and executing the program to achieve in-order delivery of the data to the one or more devices.
8. The method of claim 7 wherein the disposing step includes supplying the one or more devices in a fibre channel system.
9. The method of claim 7 wherein the including step further includes transmitting the data by: including one or more link controllers operatively connectable to the one or more links; including one or more queue schedulers operatively connectable to the one or more link controllers and to the one or more links; including one or more internal switching elements operatively connectable to at least the one or more queue schedulers; and including one or more buffers operatively connectable to at least the one or more switching elements and to the one or more queue schedulers.
10. The method of claim 9 wherein the including step further comprises installing a link controller operatively connectable to at least the one or more links.
11. The method of claim 7 wherein the program storing step includes providing an algorithm capable of determining link lengths of the one or more links.
12. A system for adjusting link characteristics to provide in-order data delivery, comprising: a plurality of links; one or more nodes connectable to the plurality of links; at least one programmable hardware mechanism operatively connectable to the plurality of links and to the one or more nodes; a program installable in the at least one programmable hardware mechanism for collecting the link characteristics; means for frame striping the data across the plurality of links; and means for executing the at least one programmable hardware mechanism to eliminate system disruptions caused by the link characteristics.
13. The system of claim 12 wherein the plurality of links reside in a fibre channel fabric.
14. The system of claim 12 wherein the one or more nodes includes at least a fibre channel switch.
15. The system of claim 12 wherein the at least one programmable hardware mechanism includes one or more link controllers operatively connectable to the plurality of links.
16. The system of claim 12 wherein the program is installable in the one or more link controllers.
17. The system of claim 12 wherein the means for frame striping includes transmitting entire data frames through the plurality of links.
18. The system of claim 17 wherein the means for frame striping includes a controller for determining link-to-link interframe gap time.
19. The system of claim 12 wherein the means for executing the at least one programmable hardware mechanism includes an algorithm for determining link length differences.
20. The system of claim 19 wherein the algorithm further calculates time differentials between sending and receiving data across the plurality of links.
21. The system of claim 12 wherein the means for executing the at least one programmable hardware mechanism includes creating one or more tables of link characteristics.
PCT/US2002/000337 2001-03-14 2002-01-07 Method for aggregating a plurality of links to simulate a unitary connection WO2002075535A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP02707404A EP1379946A4 (en) 2001-03-14 2002-01-07 Method for aggregating a plurality of links to simulate a unitary connection

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/809,996 US6941252B2 (en) 2001-03-14 2001-03-14 Striping data frames across parallel fibre channel links
US09/809,996 2001-03-14

Publications (1)

Publication Number Publication Date
WO2002075535A1 2002-09-26

Family

ID=25202704

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2002/000337 WO2002075535A1 (en) 2001-03-14 2002-01-07 Method for aggregating a plurality of links to simulate a unitary connection

Country Status (3)

Country Link
US (1) US6941252B2 (en)
EP (3) EP2285055B1 (en)
WO (1) WO2002075535A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1779590A1 (en) * 2004-08-20 2007-05-02 Cisco Technology, Inc. Port aggregation for fibre channel interfaces
US7848253B2 (en) 1999-01-12 2010-12-07 Mcdata Corporation Method for scoring queued frames for selective transmission through a switch
US8190790B1 (en) 2011-05-31 2012-05-29 Hitachi, Ltd. Storage apparatus and method of controlling the same
US8412831B2 (en) 2009-08-03 2013-04-02 Brocade Communications Systems, Inc. Per priority TCP quality of service

Families Citing this family (15)

Publication number Priority date Publication date Assignee Title
US8271672B1 (en) * 2001-08-31 2012-09-18 Juniper Networks, Inc. Guaranteed bandwidth memory apparatus and method
WO2003032555A2 (en) 2001-10-05 2003-04-17 Aware, Inc. Systems and methods for multi-pair atm over dsl
US7698454B1 (en) * 2001-11-26 2010-04-13 Juniper Networks, Inc. Interfacing with streams of differing speeds
US7126970B2 (en) * 2001-12-20 2006-10-24 Tropic Networks Inc. Communication system with balanced transmission bandwidth
US7593336B2 (en) 2003-10-31 2009-09-22 Brocade Communications Systems, Inc. Logical ports in trunking
US7619974B2 (en) * 2003-10-31 2009-11-17 Brocade Communication Systems, Inc. Frame traffic balancing across trunk groups
US7400585B2 (en) * 2004-09-23 2008-07-15 International Business Machines Corporation Optimal interconnect utilization in a data processing network
US7729361B2 (en) * 2006-02-24 2010-06-01 Cisco Technology, Inc. Method and system for power-efficient adaptive link aggregation
US7548556B1 (en) * 2007-12-14 2009-06-16 Raptor Networks Technology, Inc. Secure communication through a network fabric
US8223803B2 (en) * 2008-02-07 2012-07-17 Infinera Corporation Programmable time division multiplexed switching
US8223633B2 (en) * 2008-10-03 2012-07-17 Brocade Communications Systems, Inc. Port trunking at a fabric boundary
CN102130911A (en) * 2011-03-01 2011-07-20 林定伟 Method for simulating network
US20130191569A1 (en) * 2012-01-25 2013-07-25 Qualcomm Incorporated Multi-lane high-speed interfaces for high speed synchronous serial interface (hsi), and related systems and methods
US9917728B2 (en) 2014-01-14 2018-03-13 Nant Holdings Ip, Llc Software-based fabric enablement
US10212101B2 (en) 2014-01-14 2019-02-19 Nant Holdings Ip, Llc Low level provisioning of network fabrics

Citations (10)

Publication number Priority date Publication date Assignee Title
US5418939A (en) * 1992-02-20 1995-05-23 International Business Machines Corporation Concurrent maintenance of degraded parallel/serial buses
US5544345A (en) * 1993-11-08 1996-08-06 International Business Machines Corporation Coherence controls for store-multiple shared data coordinated by cache directory entries in a shared electronic storage
US5586264A (en) * 1994-09-08 1996-12-17 Ibm Corporation Video optimized media streamer with cache management
US5768623A (en) * 1995-09-19 1998-06-16 International Business Machines Corporation System and method for sharing multiple storage arrays by dedicating adapters as primary controller and secondary controller for arrays reside in different host computers
US5790794A (en) * 1995-08-11 1998-08-04 Symbios, Inc. Video storage unit architecture
US5894481A (en) * 1996-09-11 1999-04-13 Mcdata Corporation Fiber channel switch employing distributed queuing
US5928327A (en) * 1996-08-08 1999-07-27 Wang; Pong-Sheng System and process for delivering digital data on demand
US5964886A (en) * 1998-05-12 1999-10-12 Sun Microsystems, Inc. Highly available cluster virtual disk system
US5999930A (en) * 1996-08-02 1999-12-07 Hewlett-Packard Company Method and apparatus for distributed control of a shared storage volume
US6029168A (en) * 1998-01-23 2000-02-22 Tricord Systems, Inc. Decentralized file mapping in a striped network file system in a distributed computing environment

Family Cites Families (31)

Publication number Priority date Publication date Assignee Title
US4577312A (en) * 1984-07-05 1986-03-18 At&T Bell Laboratories Arrangement for wideband transmission via a switched network
US4959540A (en) * 1989-05-15 1990-09-25 International Business Machines Corporation Optical clock system with optical time delay means
US5251210A (en) * 1991-11-01 1993-10-05 Ibm Corporation Method and apparatus for transforming low bandwidth telecommunications channels into a high bandwidth telecommunication channel
US5509122A (en) * 1992-02-20 1996-04-16 International Business Machines Corporation Configurable, recoverable parallel bus
US5357608A (en) * 1992-02-20 1994-10-18 International Business Machines Corporation Configurable, recoverable parallel bus
US5455830A (en) * 1992-02-20 1995-10-03 Gregg; Thomas A. Error detection and recovery in parallel/serial buses
US5455831A (en) * 1992-02-20 1995-10-03 International Business Machines Corporation Frame group transmission and reception for parallel/serial buses
US5267240A (en) * 1992-02-20 1993-11-30 International Business Machines Corporation Frame-group transmission and reception for parallel/serial buses
US5548623A (en) * 1992-02-20 1996-08-20 International Business Machines Corporation Null words for pacing serial links to driver and receiver speeds
SE470039B (en) * 1992-03-17 1993-10-25 Ellemtel Utvecklings Ab Ways to achieve link grouping in a packet selector
GB2267200B (en) * 1992-05-19 1995-10-25 Dowty Communications Ltd Packet transmission system
US5425020A (en) * 1993-11-04 1995-06-13 International Business Machines Corporation Skew measurement for receiving frame-groups
US5805924A (en) 1994-11-08 1998-09-08 Stoevhase; Bent Method and apparatus for configuring fabrics within a fibre channel system
US5581566A (en) * 1995-01-06 1996-12-03 The Regents Of The Univ. Of California Office Of Technology Transfer High-performance parallel interface to synchronous optical network gateway
US5822317A (en) * 1995-09-04 1998-10-13 Hitachi, Ltd. Packet multiplexing transmission apparatus
JP2785005B2 (en) * 1995-10-25 1998-08-13 株式会社超高速ネットワーク・コンピュータ技術研究所 Multiplexing / demultiplexing method in FC / ATM network interconversion equipment
US5793983A (en) 1996-01-22 1998-08-11 International Business Machines Corp. Input/output channel interface which automatically deallocates failed subchannel and re-segments data block for transmitting over a reassigned subchannel
US5798623A (en) * 1996-02-12 1998-08-25 Quantum Corporation Switch mode sine wave driver for polyphase brushless permanent magnet motor
GB9614814D0 (en) * 1996-07-15 1996-09-04 Plessey Telecomm Communication links for transmission of data in fixed size packets
US5793770A (en) * 1996-11-18 1998-08-11 The Regents Of The University Of California High-performance parallel interface to synchronous optical network gateway
US6236647B1 (en) * 1998-02-24 2001-05-22 Tantivy Communications, Inc. Dynamic frame size adjustment and selective reject on a multi-link channel to improve effective throughput and bit error rate
FI104671B (en) * 1997-07-14 2000-04-14 Nokia Networks Oy A switching fabric arrangement
US6094683A (en) * 1997-08-29 2000-07-25 Intel Corporation Link bundling in a network
US6002670A (en) * 1997-12-12 1999-12-14 Nortel Networks Corporation Optimization and recovery techniques in IMA networks
US6148004A (en) 1998-02-11 2000-11-14 Mcdata Corporation Method and apparatus for establishment of dynamic ESCON connections from fibre channel frames
US6160819A (en) * 1998-02-19 2000-12-12 Gte Internetworking Incorporated Method and apparatus for multiplexing bytes over parallel communications links using data slices
US6370579B1 (en) * 1998-10-21 2002-04-09 Genuity Inc. Method and apparatus for striping packets over parallel communication links
EP0996262A1 (en) * 1998-10-22 2000-04-26 Texas Instruments France Communication system with plurality of synchronised data links
IT1307016B1 (en) * 1999-01-27 2001-10-11 Cselt Centro Studi Lab Telecom PROCEDURE AND DEVICE FOR THE TRANSMISSION OF NUMERICAL SIGNALS.
US6222858B1 (en) * 1999-02-10 2001-04-24 Verizon Laboratories Inc. Method of inverse multiplexing for ATM
GB2350757A (en) * 1999-06-03 2000-12-06 Nokia Telecommunications Oy Delay compensation buffer

Non-Patent Citations (3)

Title
"Method to send striping data over one link in an optical network", IBM TECHNICAL DISCLOSURE BULLETIN, vol. 34, no. 8, January 1992 (1992-01-01), pages 142 - 144 *
DATABASE TDB [online] XP002950136, Database accession no. NN9201142 *
See also references of EP1379946A4 *

Cited By (6)

Publication number Priority date Publication date Assignee Title
US7848253B2 (en) 1999-01-12 2010-12-07 Mcdata Corporation Method for scoring queued frames for selective transmission through a switch
EP1779590A1 (en) * 2004-08-20 2007-05-02 Cisco Technology, Inc. Port aggregation for fibre channel interfaces
EP1779590A4 (en) * 2004-08-20 2010-07-14 Cisco Tech Inc Port aggregation for fibre channel interfaces
US8412831B2 (en) 2009-08-03 2013-04-02 Brocade Communications Systems, Inc. Per priority TCP quality of service
US8190790B1 (en) 2011-05-31 2012-05-29 Hitachi, Ltd. Storage apparatus and method of controlling the same
WO2012164610A1 (en) * 2011-05-31 2012-12-06 Hitachi, Ltd. Storage apparatus and method of controlling the same

Also Published As

Publication number Publication date
US20020161565A1 (en) 2002-10-31
US6941252B2 (en) 2005-09-06
EP1379946A1 (en) 2004-01-14
EP2285055A1 (en) 2011-02-16
EP1720294B1 (en) 2013-04-10
EP1720294A3 (en) 2007-05-16
EP1379946A4 (en) 2006-06-07
EP1720294A2 (en) 2006-11-08
EP2285055B1 (en) 2012-05-23

Similar Documents

Publication Publication Date Title
EP2285055B1 (en) Method for aggregating a plurality of links to simulate a unitary connection
US8014315B2 (en) Method for scoring queued frames for selective transmission through a switch
US6608819B1 (en) Method for scoring queued frames for selective transmission through a switch
US10223314B2 (en) PCI express connected network switch
US6988161B2 (en) Multiple port allocation and configurations for different port operation modes on a host
US8964754B2 (en) Backplane interface adapter with error control and redundant fabric
USRE44818E1 (en) Quality of service in virtual computing environments
US7606150B2 (en) Fibre channel switch
US20030026267A1 (en) Virtual channels in a network switch
US20020118692A1 (en) Ensuring proper packet ordering in a cut-through and early-forwarding network switch
US20030202520A1 (en) Scalable switch fabric system and apparatus for computer networks
US7194661B1 (en) Keep alive buffers (KABs)
US7719969B1 (en) System and method for assigning network device port address based on link rate
US10423333B2 (en) System and method for scalable processing of abort commands in a host bus adapter system
US8089971B1 (en) Method and system for transmitting flow control information
US20060013135A1 (en) Flow control in a switch
WO2006036468A1 (en) Method and system for optimizing data transfer in networks
US7907546B1 (en) Method and system for port negotiation
US11632334B2 (en) Communication apparatus and communication method
NETWORK Competitive Brief: Cisco vs. Brocade Director Architecture in FICON Environments

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2002707404

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2002707404

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP