Methods and Systems for Determining Network Integrity and Providing Improved Network Availability
Related Applications
This application claims the benefit of U.S. Prov. Pat. App. No. 60/459548, filed on
April 1, 2003, entitled "Method to Improve Ethernet Network Availability", and U.S.
Pat. App. 10/727,118, filed on December 2, 2003, entitled "Improving Network
Availability", the entire contents of both of which are incorporated herein by reference.
Field of the Invention
The invention relates to computer network communications and, in particular embodiments, to network management protocols.
Background of the Invention
In prior art Wide Area Network (WAN) architecture, communications paths (or links) are point-to-point. In such networks, the nodes (or host computers) communicate with each other directly. Reliable WAN link up/link down status mechanisms are well known at both the physical layer and the data link layer (layers 1 and 2 of the Open Systems Interconnect [OSI] Reference Model). These status mechanisms allow link faults to be determined within about tens of milliseconds to about one or two seconds.
However, in Local Area Network (LAN) architectures, status determination is not as readily available. In part because LAN network protocols, such as Ethernet, are connectionless and support multiple access, several problems arise. By way of example, an Ethernet LAN can be partitioned into multiple subnetworks or segments. A given node (such as, but not limited to, a host computer, load balancing device, or router) on such a LAN is not aware of any such segmentation. If a node faults, there is not necessarily any notification (e.g., a "loss of carrier" signal) to other nodes on its segment or to other segments. Additionally, there is generally no "keep alive" or "link up" check mechanism to determine whether the link or links to a particular node are working, or whether the node is still "listening" or has left the segment.
Accordingly, a convenient way of determining network integrity and readiness for LANs, such as, but not limited to, Ethernet LANs, is needed.
Summary of the Invention
The invention addresses the deficiencies in the prior art by, in one embodiment, providing a method and apparatus for improving local area network (LAN) availability by implementing a standards-based link up/link down status detection protocol on segment-to-segment communications paths. The status detection protocol employs, in one embodiment, the industry-standard Logical Link Control (LLC) Type 1 "test frame," described in IEEE Standard 802.2, to provide Ethernet status test messages and return responses. According to a feature, the status detection protocol of the invention provides continuous status information. Such continuous status information enables rapid routing table updates in the LAN (or attached WAN), and thus avoids inefficiencies due to routing to or through disabled or unavailable (down) nodes. In one preferred implementation, the status detection protocol of the invention operates on top of an existing Ethernet protocol in layer 2. According to another feature, the status detection protocol provides multiple access capabilities (a "multiaccess" protocol) and is compatible with Ethernet protocol generally.
In one aspect, a method according to the invention includes the steps of: periodically transmitting a test message over a plurality of communication links from a source node of a source network segment to a plurality of destination nodes, each of the plurality of destination nodes being in communication with a respective destination network segment; generating, for each of the plurality of destination nodes, a return message if the test message is received at the destination node; determining the status of each of the plurality of communication links in response to the return messages generated by the plurality of destination nodes; and providing the status of the plurality of communication links to each of the plurality of destination nodes that generates a return message. According to one configuration, the return messages include echo messages, generated in response to the test messages.
According to one embodiment, the test message includes an LLC Type 1 frame format message. However, any suitable, preferably compact, test message may be employed. In one configuration, the test message is transmitted at a rate of about once per
second. Preferably, prior to transmitting the test message, the method of the invention detects an initial state of the network, for example, by observing the routing table at a source or destination node on which the method is operating. The source and/or destination nodes may be, for example, a router, load balancer, firewall, special-purpose device, host computer, or other like device.
According to one feature, the method of the invention operates concurrently on a plurality of the nodes in a network segment to be protected. In a related embodiment, the method of the invention operates concurrently on all or substantially all of the nodes in the network segment. Each node then performs its own self-discovery of adjacency and determines the status of adjacent nodes and links. This information is then used to update an adjacency status table at each node with adjacency information seen from the perspective of that node. In an alternate embodiment, fewer than all of the nodes in the segment may utilize the method of the invention. However, preferably, more than one node uses the invention.
In one embodiment, status is determined by waiting a pre-determined period of time for a return acknowledgment message, such as an echo of the transmitted test frame. In an alternative embodiment, status is determined by detecting whether the source node receives at least a predetermined number of return messages from a respective destination node, in response to the source node transmitting a predetermined number of test messages. If the expected number of return messages is not received, the system determines that there is a fault with the communication link between the source node and destination node.
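The alternative embodiment above, which counts return messages against a predetermined number of test messages, can be sketched as follows. This is a minimal illustration, not the claimed implementation: the class name `ReplyWindow` and the parameters `window` and `min_replies` are hypothetical, and the specific counts are assumptions, since the text does not fix them.

```python
from collections import deque
from typing import Optional


class ReplyWindow:
    """Tracks outcomes of the last `window` test messages sent to one destination.

    `window` and `min_replies` are illustrative names for the "predetermined
    number" of test messages and return messages described in the text.
    """

    def __init__(self, window: int = 5, min_replies: int = 3) -> None:
        if not 0 < min_replies <= window:
            raise ValueError("require 0 < min_replies <= window")
        self.window = window
        self.min_replies = min_replies
        # Oldest outcomes fall off automatically once the window is full.
        self.outcomes: deque = deque(maxlen=window)

    def record(self, reply_received: bool) -> None:
        """Record whether a return message arrived for one test message."""
        self.outcomes.append(reply_received)

    def link_up(self) -> Optional[bool]:
        """Return None until a full window of test messages has been sent,
        then True if enough return messages were seen, else False (fault)."""
        if len(self.outcomes) < self.window:
            return None
        return sum(self.outcomes) >= self.min_replies
```

A source node would keep one such window per destination node, calling `record()` after each test message and treating a `False` from `link_up()` as a fault on that communication link.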
If the status of any node changes, as denoted by the failure to receive a return message from that node signifying either a node or a link failure, the sending node updates its local adjacency status table. The status changes may then be incorporated into the local RIB/routing table, which is then propagated to other routers on the network through standard mechanisms well-known in the art.
Because each router updates its adjacency status table each time the local message/response cycle is completed, reflecting the true state of all links, LAN efficiency is improved by avoiding routes through dead links or to unresponsive nodes. For example, a response wait period of approximately one second allows
router table updates approximately every few seconds, instead of the 5 to 10 minutes seen in the prior art.
One or more of the nodes performing the above status discovery process may be, in some embodiments, one of the hosts on the network, or a dedicated device configured to act as a router (as that term and function is known in the art) with the added functionality necessary to implement the presently-disclosed methods. Alternately, one or more of the status-discovering nodes may be a specially-adapted hardware and/or software device dedicated to this function.
In an alternate embodiment, the local node may update its copy of the network routing table directly upon determining that a node on the network (or network segment) has not responded to the test message. The modified routing table may then be advertised and propagated to all other routers on the network.
Brief Description of the Drawings
The methods and systems of the invention may be better understood by referencing the following drawings, where like reference designations refer to like components or steps.
Figure 1 is a high-level block diagram of a Local Area Network (LAN) configured in accordance with one embodiment of the invention.
Figure 2 is a flowchart of a method for determining status and increasing LAN availability, according to one embodiment of the invention.
Description of the Illustrative Embodiments
Figure 1 is a block diagram of an illustrative LAN 100. The LAN 100 includes two network segments 112 and 114. The network segments 112 and 114 may be, for example, Ethernet networks, although the invention is equally applicable to other network protocols, and is not limited to a particular network protocol. The network segment 112 includes a plurality of communication links 120a-120d between nodes 125a-125c. Similarly, the network segment 114 includes a plurality of communication links 122a-122e for communicatively interconnecting the nodes 124a-124e. The nodes 125a-125e and 124a-124e may be, for example, host computers,
routers, load balancers, firewalls, or any other suitable network device. In the particular illustrative embodiment of Figure 1, the devices 125d, 124a and 124c are routers. The routers 125d and 124c can communicate with each other via the channels 130 and 132, thus enabling communications between the network segments 112 and 114.
The router 125d, in one exemplary embodiment, may be configured to act as one of the status-discovering nodes for the segment 114. As such, the router 125d sends messages to all external (to segment 112) nodes 124a-124e, one node at a time, to see if the communication channels to them (e.g., channels 130, 132, and 122a-122e) are operational. These messages may be LLC Type 1 test frames, although any short test messages with a regular and predefined format may be used. The Logical Link Control (LLC) layer is the higher of the two data link layer sub-layers defined by the IEEE in its Ethernet standards. The LLC sub-layer handles error control, flow control, framing, and MAC-sub-layer addressing. The most prevalent LLC protocol is IEEE Standard 802.2, which includes both connectionless and connection-oriented variants.
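For illustration, an LLC Type 1 test frame of the kind mentioned above might be assembled and its echo checked roughly as follows. This is a sketch under stated assumptions, not the disclosed implementation: it assumes an IEEE 802.3 length-encapsulated frame, the null SAP (0x00) for both DSAP and SSAP, and the 802.2 TEST control value 0xE3 with the poll/final bit at 0x10; the function names are hypothetical, and actually transmitting the bytes (for example through a raw socket) is outside the sketch.

```python
import struct

LLC_TEST = 0xE3      # 802.2 TEST control field with the poll/final bit clear
POLL_FINAL = 0x10    # poll/final bit position in an unnumbered control field
NULL_SAP = 0x00      # assumption: address the null service access point
MIN_PAYLOAD = 46     # minimum 802.3 payload in bytes (FCS not included)


def build_test_frame(dst_mac: bytes, src_mac: bytes, data: bytes = b"") -> bytes:
    """Return an 802.3 frame (without FCS) carrying an LLC TEST command."""
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    llc = struct.pack("!BBB", NULL_SAP, NULL_SAP, LLC_TEST) + data
    if len(llc) > 1500:
        raise ValueError("payload too large for an 802.3 frame")
    # The 802.3 length field counts the LLC header and data, not the padding.
    header = dst_mac + src_mac + struct.pack("!H", len(llc))
    padding = b"\x00" * max(0, MIN_PAYLOAD - len(llc))
    return header + llc + padding


def is_test_response(frame: bytes, expected_data: bytes) -> bool:
    """True if `frame` is an LLC TEST response echoing `expected_data`."""
    if len(frame) < 17:
        return False
    (length,) = struct.unpack("!H", frame[12:14])
    if length < 3 or length > 1500 or 14 + length > len(frame):
        return False
    _dsap, ssap, control = frame[14:17]
    data = frame[17:14 + length]
    is_test = (control & ~POLL_FINAL & 0xFF) == LLC_TEST
    is_response = bool(ssap & 0x01)  # low-order SSAP bit marks a response
    return is_test and is_response and data == expected_data
```

Carrying a short identifier in `data` lets the source node match each return message to the test message that produced it.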
To reduce intra-segment traffic, test frames may not be sent to locally attached nodes (e.g., nodes 125a-125c). For example, in one embodiment, only nodes outside of the network segment 112 (referred to herein as "destination" nodes) may be sent messages.
Return messages are generated by the destination nodes and sent back to the source node (i.e., the status-discovering node) for collection and matching to transmitted test messages. The return message may be a simple echo of the test message or a different, confirming message may be sent. Either way, the presence of a return message acknowledging (in some sense) the transmitted message provides a complete, end-to-end test of path continuity and therefore its status.
One advantage of using the LLC Type 1 test message is that it is purely a Layer 2 approach that does not propagate any overhead to Layer 3 or above in the protocol stack. Accordingly, the low overhead on the source and destination nodes makes for low round-trip delay and hence improved link fault detection timeliness.
Note that this statusing approach differs from the link integrity test used to determine the health of a link as far back as 10Base-T Ethernet. As described in Chapter 2 of the Cisco Press Internetworking Technology Handbook (accessed September 20, 2002), online at:
http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/index.htm
10Base-T was also the first Ethernet version to include a link integrity test to determine the health of the link. Immediately after power-up, the physical medium attachment (PMA) sublayer transmits a normal link pulse (NLP) to tell the NIC at the other end of the link that this NIC wants to establish an active link connection:
If the NIC at the other end of the link is also powered up, it responds with its own NLP.
If the NIC at the other end of the link is not powered up, this NIC continues sending an NLP about once every 16 ms until it receives a response.
The link is activated only after both NICs are capable of exchanging valid NLPs.
It is applicant's understanding that the 10Base-T integrity check is only used at initial power-up, to establish the link between the Network Interface Cards (NICs) in two hosts. The statusing mechanism herein described, by contrast, operates continuously to keep track of segment host status. In some exemplary embodiments, the status test message is sent approximately once per second to keep status information current.
Figure 2 is a flowchart 200 depicting a process for improving network availability according to one illustrative embodiment of the invention. The process begins on power-up of a status-detecting node, 210. Initially, each status-detecting node performs a discovery step 215 to identify its nearest (adjacent) network neighbors outside of the status-detecting node's own network segment and the status of those network neighbors, using any conventional mechanism. Alternatively, a status-detecting node
may refer to initial status and adjacency information supplied to it in a local configuration file.
Next, the status-detecting node begins sending test messages 220 to each nearest neighbor not within the status-detecting node's segment (e.g., where the status-detecting node is 125d, not within the network segment 112). After each message, the status-detecting node waits a pre-determined time (on the order of about 500 milliseconds) for a response, 230. Test 240 is a binary test on the reply received: if the reply matches the expected message (branch 242), then the channel or path is up and working. The status of that connection is then marked as "up" 244 in the local adjacency status table.
In some embodiments, the local adjacency status table is a separate table in the local routing information base (RIB); it may also be separate and distinct from the RIB. However, according to the illustrative embodiment, the adjacency status table is not a part of the local routing table when that term is used as implying a distinction from the RIB.
If the return message is not as expected or does not arrive at all within the predetermined wait time, branch 246 is taken and the link path status is marked as "down" in step 248.
In a preferred embodiment, the pre-determined wait time is specified in a configuration table (or file) supplied to the status discovery process or coded into software as a default value of, for example, one second. This link-specific wait time may be adjusted (not shown) according to the (known) speed of each link and the actual round-trip time (RTT) through mechanisms well-known to those of ordinary skill in the art. Thus, for distant (long) links operating at slow speeds, the discovery process will increase the link-specific wait time during the initial discovery. In particular, the method does not mark a link as "down" until it first verifies the RTT wait time by finding (and marking) the link as "up," as depicted by the secondary test 270.
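The text leaves the RTT-based adjustment to well-known mechanisms. One such mechanism, swapped in here purely for illustration, is the smoothed round-trip estimator used for TCP retransmission timers (RFC 6298). The sketch below combines it with the rule that a link is not marked "down" until it has first been seen "up". The class name, the doubling of the wait during initial discovery, and the clamp limits are all assumptions rather than details from the disclosure.

```python
class LinkTimer:
    """Per-link wait time, adapted from measured round-trip times (RTT)."""

    def __init__(self, default_wait: float = 1.0,
                 min_wait: float = 0.05, max_wait: float = 10.0) -> None:
        self.wait = default_wait      # current wait before declaring no reply
        self.min_wait = min_wait      # assumed lower clamp, in seconds
        self.max_wait = max_wait      # assumed upper clamp, in seconds
        self.srtt = None              # smoothed RTT
        self.rttvar = None            # RTT variation
        self.verified = False         # True once the link has been seen "up"

    def on_reply(self, rtt: float) -> str:
        """Fold a measured RTT into the estimate and report the link as up."""
        if self.srtt is None:
            self.srtt, self.rttvar = rtt, rtt / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt
        candidate = self.srtt + 4 * self.rttvar
        self.wait = min(max(candidate, self.min_wait), self.max_wait)
        self.verified = True
        return "up"

    def on_timeout(self) -> str:
        """Handle a missing reply. Before the link has ever been seen up, the
        wait is lengthened instead of marking the link down (secondary test)."""
        if not self.verified:
            self.wait = min(self.wait * 2, self.max_wait)
            return "unknown"
        return "down"
```

With this arrangement a slow, distant link that misses the default wait during initial discovery gets a longer wait on the next attempt, and only a link whose wait time has been verified by at least one reply can be marked "down".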
In marking the link down in the adjacency status table, there may be several degrees of "down" indicated. The link may be down because it is overly congested, i.e., when
no replies are received in the wait period for several tries. Alternately, the link may be marked down because the destination node is itself down or congested. Furthermore, the link may be down because the network or a segment thereof is down, as signaled through, for example, a routine routing table update. This information may be included by using different symbols for the different states or by encoding the information using two or more bits through methods well-known in the art.
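As one possible reading of the two-bit encoding mentioned above, the degrees of "down" could be represented as shown below. The state names and bit assignments are hypothetical; the text only requires that the states be distinguishable.

```python
from enum import IntEnum
from typing import Iterable


class LinkState(IntEnum):
    """Illustrative two-bit link states; the bit values are assumptions."""
    UP = 0b00
    DOWN_LINK_CONGESTED = 0b01   # no replies within the wait period for several tries
    DOWN_NODE = 0b10             # destination node itself down or congested
    DOWN_NETWORK = 0b11          # network or segment reported down


def pack_states(states: Iterable[LinkState]) -> int:
    """Pack states into one integer, two bits per link, link 0 in the low bits."""
    packed = 0
    for index, state in enumerate(states):
        packed |= int(state) << (2 * index)
    return packed


def unpack_state(packed: int, index: int) -> LinkState:
    """Recover the state of link `index` from a packed integer."""
    return LinkState((packed >> (2 * index)) & 0b11)
```

Packing keeps the adjacency status table compact while still letting a consumer of the table tell a congested link apart from a failed node or segment.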
The updated path status from either step 244 or 248 is then used to update the local node's adjacency status table 250, which in turn forces a Routing Information Base (RIB) update, 255. The process waits approximately one second, 260, before sending a test message to the next host in step 220, repeating the cycle indefinitely or until commanded to cease or power-down. (As noted above, in some embodiments, the wait time is dynamically adjusted to reflect the actual RTT to each node).
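The cycle of Figure 2 described above can be summarized in a short sketch. It is not the disclosed implementation: the transport is abstracted behind a caller-supplied `probe` function (hypothetical) that sends one test message and reports whether a matching reply arrived within the wait time, and `on_rib_update`, `max_rounds`, and the injectable `sleep` are illustrative names added so the loop can be exercised without a network.

```python
import time
from typing import Callable, Dict, Optional, Sequence


def run_status_cycle(
    neighbors: Sequence[str],
    probe: Callable[[str, float], bool],
    adjacency: Dict[str, str],
    on_rib_update: Callable[[str, str], None],
    reply_wait: float = 0.5,
    recycle_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    max_rounds: Optional[int] = None,
) -> None:
    """Probe each out-of-segment neighbor in turn and record its status.

    The loop runs until `max_rounds` passes are done, or indefinitely if
    `max_rounds` is None.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for neighbor in neighbors:
            # Steps 220-240: send a test message, wait, and check the reply.
            reply_ok = probe(neighbor, reply_wait)
            # Steps 244 / 248: mark the path "up" or "down".
            status = "up" if reply_ok else "down"
            # Step 250: update the local adjacency status table.
            adjacency[neighbor] = status
            # Step 255: the table update forces a RIB update.
            on_rib_update(neighbor, status)
            # Step 260: wait before testing the next host.
            sleep(recycle_wait)
        rounds += 1
```

Passing the probe and the RIB hook as functions keeps the sketch independent of any particular frame format or routing software.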
The wait durations described above are examples only. Longer or shorter wait times 230 (before declaring a lack of response message as a link "down" indicator) and 260 (recycle time between messages) are also usable. The length of the wait determines the degree to which message traffic overhead (from the test messages and their responses) impacts the overall network's performance. Longer waits (especially at recycle step 260) decrease message overhead, but at the cost of additional latency before status updates reach the routing table and can be propagated through the network.
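The trade-off between overhead and latency can be made concrete with a rough calculation. The sketch below assumes that neighbors are tested one at a time in round-robin order, that each test and reply is a minimum-size 64-byte Ethernet frame, and that each frame costs a further 20 bytes on the wire (preamble, start delimiter, and inter-frame gap). The function name and these figures are assumptions for illustration only.

```python
from typing import Tuple


def probe_tradeoff(n_neighbors: int, recycle_wait_s: float,
                   reply_wait_s: float, frame_bytes: int = 64) -> Tuple[float, float]:
    """Return (upper bound on test traffic in bits/s, worst-case seconds
    between successive status refreshes of any one neighbor)."""
    wire_bits = (frame_bytes + 20) * 8
    # At most one test message and one reply per recycle period.
    overhead_bps = 2 * wire_bits / recycle_wait_s
    # Each neighbor waits for every other neighbor's reply wait and recycle wait.
    worst_refresh_s = n_neighbors * (recycle_wait_s + reply_wait_s)
    return overhead_bps, worst_refresh_s


# Five neighbors, a 1 s recycle time and a 0.5 s reply wait:
# at most 1344 bits/s of test traffic, each neighbor refreshed within 7.5 s.
print(probe_tradeoff(5, 1.0, 0.5))  # (1344.0, 7.5)
```

Doubling the recycle time halves the traffic bound but lengthens the refresh interval, which is the latency cost noted above.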
The illustrative method may be practiced by a single node, by a plurality of nodes, or by some or all nodes in a segment or network. When multiple nodes each act as independent status discoverers, very rapid RIB/routing table updates will result as nodes, links, or paths come up or go down. In such a scenario, link state information may be updated on the order of once every five or ten seconds, a significant improvement over prior methods of monitoring link status.
While particular embodiments of the invention have been shown and described, changes and modifications may be made without departing from the scope of the invention. By way of example, the illustrative steps of the invention are described above in a particular order. However, they may be performed in other orders within the scope of the invention. Additionally, the methodology of the invention may be performed in hardware, software or any combination thereof. Additionally, the methods and
systems of the invention may be embodied in software, firmware, and/or microcode operating on a computer or computers of any type.