CA2117619C - Method and apparatus for fault tolerant connection of a computing system to local area networks - Google Patents

Method and apparatus for fault tolerant connection of a computing system to local area networks

Info

Publication number
CA2117619C
CA2117619C CA002117619A CA2117619A CA2117619C CA 2117619 C CA2117619 C CA 2117619C CA 002117619 A CA002117619 A CA 002117619A CA 2117619 A CA2117619 A CA 2117619A CA 2117619 C CA2117619 C CA 2117619C
Authority
CA
Canada
Prior art keywords
pair
network
data
communicating
ports
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CA002117619A
Other languages
French (fr)
Other versions
CA2117619A1 (en)
Inventor
Kevin J. Rowett
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tandem Computers Inc
Original Assignee
Tandem Computers Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tandem Computers Inc filed Critical Tandem Computers Inc
Publication of CA2117619A1 publication Critical patent/CA2117619A1/en
Application granted granted Critical
Publication of CA2117619C publication Critical patent/CA2117619C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2002Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where interconnections or communication control functionality are redundant
    • G06F11/2005Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where interconnections or communication control functionality are redundant using redundant communication controllers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2002Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where interconnections or communication control functionality are redundant
    • G06F11/2007Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where interconnections or communication control functionality are redundant using redundant communication media
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2046Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant where the redundant components share persistent storage
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2017Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where memory access, memory control or I/O control functionality is redundant
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2038Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with a single idle spare processing component
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2097Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements maintaining the standby controller/processing unit updated

Abstract

A computing system, having an input/output bus for communicating data thereon, is connected to a network by a pair of network controller devices. Each of the network controller devices, in turn, connects to a corresponding one of a pair of multi-ported network repeater elements which are, in turn, connected to one another by a pair of network links. At least one workstation is connected to each of the network repeaters. One of the network controllers is initially selected as a primary data communicating path from the computing system to the network. The network controllers periodically transmit messages to one another, and if a failure to receive those messages implicates the network controller selected as the primary, the selection of the primary controller will be switched to the other.

Description

METHOD AND APPARATUS FOR FAULT TOLERANT CONNECTION OF
A COMPUTING SYSTEM TO LOCAL AREA NETWORKS
BACKGROUND OF THE INVENTION
The present invention is related generally to computing system network configurations, and more particularly to a network configuration that is tolerant of faults that may occur.
Fault tolerant computing systems have long been known. In environments in which high availability of computing tasks is desired, computing systems have included various fault tolerant designs, such as redundant parts, elements or modules, data checking and correcting schemes, and the like. An example of a fault tolerant computing system may be found in U.S. Pat. No. 4,228,496, which teaches a multiprocessing system having fault tolerant capability, for example in the form of redundant facilities to limit single points of failure.
One use of such high availability computing systems is as a "server" in various network configurations, such as, for example, local area network (LAN) designs. As a server, the computing system stores and maintains the various resources (e.g., application programs, data, etc.) used by the stations (e.g., personal computer (PC) systems) connected to the network. It was soon realized, however, that even though the server unit may be structured to provide high availability, the network itself limited that availability in that the network contained single points of failure. For example, cables can break, or inadvertently be removed, eliminating a communication path of the network and, more likely than not, isolating one or more stations from the server.
Solutions to this problem have included providing fully redundant networks. However, this solution has required the application programs to be extensively rewritten, thereby effectively eliminating the use of off-the-shelf application programs.
Thus, it is apparent that there is a need for extending the fault tolerant high availability provided by multi-processor systems, such as that of the aforementioned U.S. Pat. No. 4,228,496, to the network with which it may be used and connected. This extension of fault tolerance should be done in such a way that applications run on workstations or the network need no, or little, revision or modification.
SUMMARY OF THE INVENTION
The present invention is generally directed to a fault tolerant network configuration that provides high availability in a manner that does not require modification of the applications that may be run on the network. The invention, in a more specific aspect, provides a fault tolerant connection of a fault tolerant computing system to a network configuration, extending the fault tolerance of the computing system to the network.
According to the network configuration of the present invention, a computing system, preferably multi-processor based like that taught by the aforementioned Pat. No. 4,228,496, includes a pair of network controllers coupled to the input/output (I/O) busses of at least one of the processor units of the system. The pair of network controllers, in turn, each connect to a corresponding one of a pair of multi-ported network access devices (preferably, a concentrator or star-wired repeater using the 10BASET wiring specification of IEEE Standard 802.3). Other stations connect to the network preferably by two network links, each connecting the station to one of the access devices, and the network access devices themselves are preferably communicatively connected to one another by a pair of network links.
The computing system (i.e., the server) to which the network controller pair connects will select one of the network controllers as a "primary" communication path to the network, designating the other network controller as a "back-up." All communications between the computing system and the network are, therefore, conducted through the primary network controller so selected.
The two network controllers periodically send message packets (termed "heartbeat" messages) to one another, through the network, signalling each other that the sender, and connecting paths, are still in working order. Failure of one of the network controllers to receive a heartbeat message from the other will cause the computing system to mark that other controller as "unpaired," i.e., unavailable for use. If the unavailable controller happens to be the selected primary network controller, the computing system will switch selection to designate the prior backup controller as the primary, and log the fact that the prior primary controller is no longer available.
Thus, these heartbeat messages perform two important functions: first, they operate to establish that each LAN controller 26, 28 is capable of transmitting and receiving message packets; and, second, they establish that the network is configured in a manner that allows the LAN controllers 26, 28 to function as a pair, one being the primary, the other the backup.
The network configuration of the present invention provides a high availability networking system that can reconfigure itself in the face of a fault in a manner totally transparent to applications running on the network/workstations.
According to a broad aspect of the invention there is provided a method for fault tolerant connection of a processing means to a network, the processing means having an input/output bus means for communicating data, the method comprising the steps of:
connecting to the input/output bus means first and second network controller means;
providing a pair of hub means each having a plurality of ports at which data is received or transmitted, a one of the plurality of ports of each of the pair of hub means being coupled to corresponding ones of the first and second network controller means for receiving data therefrom or transmitting data thereto, another of the plurality of ports of a one of the pair of hub means being coupled to another of the plurality of ports of the other one of the pair of hub means;
connecting at least one station means to another one of the plurality of ports of each of the pair of hub means; and
selecting a one of the first or second network controller means as a primary data communication path between the processing means and the pair of hub means.
According to another broad aspect of the invention there is provided a fault tolerant connection of a processing system to a network, the processing system having an input/output bus means for communicating data, the connection comprising:
first and second network controller means connected to the input/output bus means;
a pair of forwarding means each having a plurality of ports at which data is received or transmitted, a one of the plurality of ports of each of the pair of forwarding means being coupled to corresponding ones of the first and second network controller means for communicating data therebetween, another of the plurality of ports of a one of the pair of forwarding means being coupled to another of the plurality of ports of the other one of the pair of forwarding means;
at least one station means connected to a port of each of the pair of forwarding means; and
means for selecting a one of the first or second network controller means as a primary data communication path from the processing means to the pair of forwarding means.
According to another broad aspect of the invention there is provided a method of coupling a computing system to a data communicating network, the computing system having at least a pair of processor units each having an input/output bus for communicating data thereon;
providing at least a pair of network control devices each coupled to the input/output bus of the pair of processor units;
providing a pair of access devices each having multiple data ports for communicating data;
coupling a one of the pair of network control devices to a one of the multiple data ports of a corresponding one of the pair of access devices, and coupling the other of the pair of network control devices to a one of the multiple data ports of the other of the pair of access devices;
connecting each of the access devices to the data communicating network for communicating data therebetween; and
selecting one of the pair of network control devices for data communication between the computing system and the data communicating network.
According to another broad aspect of the invention there is provided a fault tolerant connection between a computing system and a network that includes at least one data communicating station, the computing system having an input/output bus for communicating data, the fault tolerant connection comprising:
first and second network control devices each coupled to the input/output bus for communicating data therewith, each of the first and second network control devices operating to periodically communicate a message to the other of the first and second network control devices, and to report to the computing system failure to receive the message;


a first network access device communicatively coupled to the first network control device, and a second network access device communicatively coupled to the second network control device;
at least a pair of data communication paths coupled between the first and second network access devices, a one of the pair of communication paths being selected for communicating data between the first and second network access devices; and
a pair of data paths each respectively coupling the first and second access devices to the data communicating station, the data communicating station including a multiplex circuit operating to select one of the access devices for data communication.
These and other aspects and advantages of the present invention will become evident to those skilled in the art upon a reading of the following detailed description of the invention, which should be taken in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram illustration of a simplified network configuration structured according to the present invention;
Figure 2 is a state diagram maintained by the computing system of Fig. 1 to track the status of the network controllers that connect the computing system to the network;

Fig. 3 is a simplified block diagram showing the interface elements used to connect the workstation shown in Fig. 1 to the access devices; and
Fig. 4 is a block diagram illustrating use of the invention in a more complex computing system.

DESCRIPTION OF THE PREFERRED EMBODIMENT
Referring to the figures, and for the moment principally to Fig. 1, there is illustrated in simplified form a preferred embodiment of the invention in a local area network (LAN) configuration. Designated with the reference numeral 10, the network formed includes a computing system 12 operating as a network server for the network. The network shown implements an Ethernet protocol, although other network protocols (e.g., FDDI, token ring, etc.) can also be used.
Communications carried on by the network 10 use, therefore, message packets, as is conventional, and include the identification of the packet's originating station (source address), destination station (destination address) and such other information as may be provided for by the particular protocol used by the network.
Preferably, the computing system 12 is of a fault tolerant, multi-processor design, such as that illustrated in the aforementioned '496 patent, although those skilled in this art will recognize that a unitary processor system may also be used, except that it would present a single point of failure and therefore limit the fault tolerant capabilities of the overall system. As illustrated, the processing system 12 includes processor units 14 (14a, 14b). The processor units are connected to one another for interprocessor communications by a bus structure 16 (in the form of a pair of separate buses 16a, 16b). In addition, each processor unit 14 includes an input/output (I/O) bus 20 (20a, 20b) for each of the processors 14a, 14b, respectively.
Typically, particularly if operating as a server, the processing system 12 will include mass storage, for example, disk units 22, which are controlled by a dual-ported disk controller 24. The disk controller 24, in turn, is coupled to each of the I/O buses 20a, 20b.
Also connected to the I/O buses 20a, 20b are dual-ported LAN controllers 26, 28. In turn, the LAN controllers 26, 28 respectively connect the processing system 12 to a pair of LAN access devices 32, 34. An acceptable LAN controller is the 3615 LAN controller manufactured by the assignee herein, Tandem Computers Incorporated of Cupertino, California. Such a LAN controller is what is termed an "intelligent" device in that it is a microprocessor-based machine.
As indicated, the LAN access devices 32, 34 are each preferably a concentrator or star-wired repeater unit using 10BASET wiring. An acceptable access device is an ACCESS/ONE (a registered trademark of Ungermann-Bass Corporation) ASC 2000 cabinet containing two 10BASET concentrator cards (P/N ASM 32012), manufactured by Ungermann-Bass Corporation of Santa Clara, California. The LAN access devices are hereinafter referred to as LAN "hubs" 32, 34, although it will be evident to those skilled in this art that other types of media access devices (e.g., bridges) may also be used without departing from the teaching of the present invention.
Thus, the LAN controller 26 connects to one of the multiple ports of the LAN hub 32 via a link 36a, while the LAN controller 28 connects to the LAN hub 34 by a link 36b. A pair of the ports of each LAN hub 32, 34 (e.g., the backbone connection ports) communicatively connect the two LAN hubs 32, 34 to each other by links 38. Also, a port of each of the LAN hubs 32, 34 is connected by links 42a, 42b to a workstation 44.
Each of the LAN controllers 26, 28 is assigned three media access control (MAC) addresses. One address is unique to the specific controller. Message packets containing this unique address as a destination address will be received only by the identified LAN controller. The second address, a "pair-wise" address, identifies the pair of LAN controllers 26, 28, so that network message packets containing such a pair-wise address as the destination address will be received by both LAN controllers 26, 28. The third address is a group MAC address. Message packets sent to this group MAC address will be received by all units or stations of the group identified by the group MAC address, including the LAN controllers 26, 28 if they are assigned to that group.
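As a rough illustration of this three-address scheme, the sketch below shows how a controller might decide whether to accept a packet based on its destination address. It is a hedged sketch only; the class name, method, and example address values are hypothetical and are not taken from the patent.

```python
# Hypothetical sketch of the three-address filtering described above.
# A controller accepts a packet whose destination matches its unique
# address, the pair-wise address shared with its sibling, or a group
# MAC address it has been assigned to.

class LanController:
    def __init__(self, unique_addr, pair_addr, group_addrs):
        self.unique_addr = unique_addr        # specific to this controller
        self.pair_addr = pair_addr            # shared by both controllers of the pair
        self.group_addrs = set(group_addrs)   # group MAC addresses joined

    def should_accept(self, dest_addr):
        return (dest_addr == self.unique_addr
                or dest_addr == self.pair_addr
                or dest_addr in self.group_addrs)

# Example (addresses invented for illustration):
ctlr_26 = LanController("02:00:00:00:00:26", "02:00:00:00:0a:0b", ["01:00:5e:00:00:01"])
ctlr_28 = LanController("02:00:00:00:00:28", "02:00:00:00:0a:0b", ["01:00:5e:00:00:01"])

# A packet sent to the pair-wise address is accepted by both controllers.
assert ctlr_26.should_accept("02:00:00:00:0a:0b")
assert ctlr_28.should_accept("02:00:00:00:0a:0b")
```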
Each processor unit 14 runs under an operating system (OS) 50 that has available an associated input/output process (IOP) device driver 52 through which communications between the processor unit 14 and the LAN controllers 26, 28 are handled (via an I/O channel 54 that forms the interface for the processor unit 14 to the I/O bus 20). The IOPs 52 handle communications through the LAN controllers 26, 28.
Each LAN controller will run the MAC layer protocol necessary to communicate with the LAN hubs 32, 34, and the network. The pair of LAN controllers 26, 28 will periodically send so-called "heartbeat" message packets to the group MAC address, which will be received by all other devices on the network, including the other LAN controller of the pair. Thus, for example, the LAN controller 26 will periodically send a heartbeat message packet that is received by its sibling LAN controller 28. The heartbeat message packet is an indication by one LAN controller to its sibling that it is still in operating condition and that its connection to the network 10 (and to the sibling LAN controller) continues to exist. The LAN controllers 26, 28 will report the absence of any such heartbeat message packets to the IOP 52, allowing the IOP 52 to maintain a log, in the form of a software state machine (Fig. 2), of the status of the LAN controllers it controls.
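A minimal sketch of this heartbeat behaviour, under the assumption that each controller runs a simple periodic loop, might look as follows; the interval, timeout, and all names here are illustrative assumptions rather than values taken from the patent.

```python
import time

# Hypothetical sketch: each controller periodically sends a heartbeat packet
# to the group MAC address and reports to its IOP when no heartbeat has been
# seen from its sibling within a timeout.

HEARTBEAT_INTERVAL = 0.5   # seconds (assumed)
HEARTBEAT_TIMEOUT = 2.0    # seconds (assumed)

class HeartbeatMonitor:
    def __init__(self, my_addr, sibling_addr, group_addr, send_packet, report_to_iop):
        self.my_addr = my_addr
        self.sibling_addr = sibling_addr
        self.group_addr = group_addr
        self.send_packet = send_packet        # callable(dest, source, payload)
        self.report_to_iop = report_to_iop    # callable(event_string)
        self.last_seen = time.monotonic()
        self.next_send = time.monotonic()

    def tick(self):
        """Called periodically by the controller's firmware loop."""
        now = time.monotonic()
        if now >= self.next_send:
            # Heartbeat goes to the group MAC address, so the sibling sees it.
            self.send_packet(self.group_addr, self.my_addr, b"HEARTBEAT")
            self.next_send = now + HEARTBEAT_INTERVAL
        if now - self.last_seen > HEARTBEAT_TIMEOUT:
            self.report_to_iop("no heartbeat from sibling")

    def on_packet(self, source_addr, payload):
        """Called for every packet the controller receives from the network."""
        if source_addr == self.sibling_addr and payload == b"HEARTBEAT":
            self.last_seen = time.monotonic()
```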
The heartbeat message packets will contain, as a source address, the unique identification address of the LAN controller 26, 28 from which the packet originates. The receiving LAN controller will periodically report the receipt of heartbeat messages from its sibling, as well as reporting other LAN traffic. If a particular LAN controller (e.g., LAN controller 28) does not hear from its sibling (e.g., LAN controller 26), that absence of messaging is reported to the associated IOP 52. If the silent LAN controller was the designated primary, the associated IOP 52 will switch the designations. The LAN controller formerly designated as the backup will now become the primary, and the former (but now silent) primary LAN controller will be designated as "unpaired," indicating that for some reason or other, it or its network connections are not functioning properly.
Since two processors 14 connect to the pair of LAN controllers 26, 28, they must decide among themselves which will control the LAN controllers, i.e., which will be the "primary" processor unit. The other becomes a backup, and will effect control only if the first becomes inoperative (see the '496 patent).
Then, the IOP 52 of the primary processor unit 14 will select one of the LAN controllers (e.g., LAN controller 26), more or less on an arbitrary basis, at least at the outset, as the primary communication path between the processor unit 14 and the network 10. All communications between the processor unit 14 and the network 10 (under control of the IOP 52) will, therefore, be conducted through the selected "primary" LAN controller 26. The other of the pair of LAN controllers (e.g., LAN controller 28) operates as a backup. It may receive message packets from the network, but does not transmit those message packets on to the identified processor unit 14.
The status of the LAN controllers 26, 28 as to which is primary and which is backup (i.e., available) is maintained in a software state machine by the IOP 52. The state diagram of that IOP-maintained state machine is illustrated in Fig. 2, showing the five separate states that identify the different status conditions that can be assumed by the pair of LAN controllers 26, 28 (i.e., which is primary, which is backup, and which is unavailable, as indicated by the "unpaired" designation). The individual states are identified as follows:
1. Controller 26 is primary; controller 28 is backup.
2. Controller 26 is backup; controller 28 is primary.
3. Controller 26 is primary; controller 28 is unpaired.
4. Controller 26 is unpaired; controller 28 is primary.
5. Both down.

As indicated above, the term "unpaired" is meant to indicate that the LAN controller is unavailable as a backup. That unavailability may result from a malfunction of the controller itself, or from a missing link connection such as, for example, link 36a in the case of LAN controller 26.
A LAN controller 26, 28 selected as primary will be the one to which the IOP will conduct all outgoing message packet traffic for communication to the network 10, and will be the one from which it will receive network message traffic. The IOP 52 does not communicate with the network through the backup (or, obviously, the unpaired) LAN controller.
There are six different events that can occur to cause the IOP-maintained state machine to change states. Those events are:
1. A heartbeat message is reported to the IOP 52 as being received by LAN controller 28 and sent by LAN controller 26. (In the tables below, this is represented as "HB msg 26→28.")
2. A heartbeat message is reported to the IOP 52 as being received by the LAN controller 26 and sent by the LAN controller 28 ("HB msg 28→26").
3. LAN controller 26 reports receiving MAC message packet traffic, but no heartbeat message packets from LAN controller 28 ("'26' rpts MAC").
4. LAN controller 28 reports receiving MAC message packet traffic, but no heartbeat message packets from LAN controller 26 ("'28' rpts MAC").
5. LAN controller 26 reports silence, i.e., no heartbeat or MAC message packet traffic is being received ("'26' rpts silence").
6. LAN controller 28 reports silence ("'28' rpts silence").
There are three actions that may be taken in the state machine as a result of the occurrence of an event as set forth above. Those actions are:
1. Switch (i.e., select the prior backup LAN controller as the primary).
2. Ignore the occurrence of the event.

3. Report the occurrence of the event.
The particular action taken, of course, depends upon the state of the IOP-maintained state machine. Tables 1-5, below, set forth, for each of the six events that can occur, the action taken and, depending upon the event and the present state, the new state entered by the state machine. The left hand column of each table lists the events; the center column identifies the action taken upon the occurrence of the event; and the right hand column of each table indicates the result of the action (i.e., remaining in the same state or changing to a new state).
Table 1: State (1). CTLR 26 primary; CTLR 28 backup
Event                      Action        New State
(1) HB msg 26→28           (2) Ignore    (1) - Remain in current state
(2) HB msg 28→26           (2) Ignore    (1) - Remain in current state
(3) "26" rpts MAC          (3) Report    (3) - 26 prim; 28 unpaired
(4) "28" rpts MAC          (1) Switch    (4) - 26 unpaired; 28 prim
(5) "26" rpts silence      (1) Switch    (4) - 26 unpaired; 28 prim
(6) "28" rpts silence      (3) Report    (3) - 26 prim; 28 unpaired

Table 2: State (2). CTLR 26 backup; CTLR 28 primary
Event                      Action        New State
(1) HB msg 26→28           (2) Ignore    (2) - Remain in current state
(2) HB msg 28→26           (2) Ignore    (2) - Remain in current state
(3) "26" rpts MAC          (1) Switch    (3) - 26 prim; 28 unpaired
(4) "28" rpts MAC          (3) Report    (4) - 26 unpaired; 28 prim
(5) "26" rpts silence      (3) Report    (4) - 26 unpaired; 28 prim
(6) "28" rpts silence      (1) Switch    (3) - 26 prim; 28 unpaired

Table 3: State (3). CTLR 26 primary; CTLR 28 unpaired
Event                      Action        New State
(1) HB msg 26→28           (3) Report    (1) - 26 prim; 28 backup
(2) HB msg 28→26           (3) Report    (1) - 26 prim; 28 backup
(3) "26" rpts MAC          (2) Ignore    (3) - Remain in current state
(4) "28" rpts MAC          (2) Ignore    (3) - Remain in current state
(5) "26" rpts silence      (3) Report    (5) - Both down
(6) "28" rpts silence      (2) Ignore    (3) - Remain in current state

Table 4: State (4). CTLR 26 unpaired; CTLR 28 primary
Event                      Action        New State
(1) HB msg 26→28           (3) Report    (2) - 26 backup; 28 prim
(2) HB msg 28→26           (3) Report    (2) - 26 backup; 28 prim
(3) "26" rpts MAC          (2) Ignore    (4) - Remain in current state
(4) "28" rpts MAC          (2) Ignore    (4) - Remain in current state
(5) "26" rpts silence      (2) Ignore    (4) - Remain in current state
(6) "28" rpts silence      (3) Report    (5) - Both down

Table 5: State (5). Both controllers down
Event                      Action        New State
(1) HB msg 26→28           (3) Report    (1) - 26 prim; 28 backup
(2) HB msg 28→26           (3) Report    (2) - 26 backup; 28 prim
(3) "26" rpts MAC          (3) Report    (3) - 26 prim; 28 unpaired
(4) "28" rpts MAC          (3) Report    (4) - 26 unpaired; 28 prim
(5) "26" rpts silence      (2) Ignore    (5) - Remain in current state
(6) "28" rpts silence      (2) Ignore    (5) - Remain in current state

Thus, for example, Table 1 illustrates the events that can occur while the state machine is in state 1: LAN controller 26 is selected as the primary, and LAN controller 28 is the backup. If event 1 occurs (LAN controller 28 receives heartbeat message packets from LAN controller 26), the event is ignored (action 2) and the state machine remains in state 1. However, assuming that we are still in state 1, and event 3 occurs (LAN controller 26 reports to the IOP that it is receiving only MAC message packets, no heartbeat message packets), this event indicates that LAN controller 28 may be unavailable for one or more of the reasons outlined above. Accordingly, the state machine moves from state 1 to state 3, in which LAN controller 26 remains the primary, but LAN controller 28 is now designated as "unpaired."
Continuing, and as a further example of the LAN controller status-keeping operation of the IOP-maintained state machine, refer now to Table 3, which illustrates state 3 of the state diagram (Fig. 2): LAN controller 26 is still the primary data path between the processing system 12 and the LAN hubs 32, 34, but the LAN controller 28 is no longer a backup, and is designated unpaired. Assume now that either one of events 1 or 2 occurs: LAN controller 28 begins reporting receipt of heartbeat message packets from LAN controller 26 (event 1), or LAN controller 26 reports receiving heartbeat messages from the LAN controller 28 (event 2). Either occurrence indicates that the LAN controller 28 and/or its network connections are functioning properly. As Table 3 shows, the action taken upon either of these occurrences is to report them to the controlling IOP 52, causing a change in state to state 1. (LAN controller 26 continues to be the primary connection to the network 10, but the status of the LAN controller 28 changes from unpaired to backup.) The remaining events of any of the Tables 1-5 can be similarly interpreted, with reference to the six events identified above that can occur, and the actions that are taken as a result of the occurrence of those events, depending upon the particular state of the state machine.
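The table-driven behaviour summarized above lends itself to a compact implementation. The following Python sketch encodes Tables 1-5 as a transition map; it is an illustrative reading of the tables only, and the names, event strings, and data representation are assumptions rather than the patent's implementation.

```python
# Hypothetical sketch of the IOP-maintained state machine of Tables 1-5.
# States 1-5 and the (action, next_state) entries follow the tables above.

SWITCH, IGNORE, REPORT = "switch", "ignore", "report"

TRANSITIONS = {
    # State 1: 26 primary, 28 backup
    (1, "hb_26_to_28"): (IGNORE, 1), (1, "hb_28_to_26"): (IGNORE, 1),
    (1, "26_rpts_mac"): (REPORT, 3), (1, "28_rpts_mac"): (SWITCH, 4),
    (1, "26_rpts_silence"): (SWITCH, 4), (1, "28_rpts_silence"): (REPORT, 3),
    # State 2: 26 backup, 28 primary
    (2, "hb_26_to_28"): (IGNORE, 2), (2, "hb_28_to_26"): (IGNORE, 2),
    (2, "26_rpts_mac"): (SWITCH, 3), (2, "28_rpts_mac"): (REPORT, 4),
    (2, "26_rpts_silence"): (REPORT, 4), (2, "28_rpts_silence"): (SWITCH, 3),
    # State 3: 26 primary, 28 unpaired
    (3, "hb_26_to_28"): (REPORT, 1), (3, "hb_28_to_26"): (REPORT, 1),
    (3, "26_rpts_mac"): (IGNORE, 3), (3, "28_rpts_mac"): (IGNORE, 3),
    (3, "26_rpts_silence"): (REPORT, 5), (3, "28_rpts_silence"): (IGNORE, 3),
    # State 4: 26 unpaired, 28 primary
    (4, "hb_26_to_28"): (REPORT, 2), (4, "hb_28_to_26"): (REPORT, 2),
    (4, "26_rpts_mac"): (IGNORE, 4), (4, "28_rpts_mac"): (IGNORE, 4),
    (4, "26_rpts_silence"): (IGNORE, 4), (4, "28_rpts_silence"): (REPORT, 5),
    # State 5: both controllers down
    (5, "hb_26_to_28"): (REPORT, 1), (5, "hb_28_to_26"): (REPORT, 2),
    (5, "26_rpts_mac"): (REPORT, 3), (5, "28_rpts_mac"): (REPORT, 4),
    (5, "26_rpts_silence"): (IGNORE, 5), (5, "28_rpts_silence"): (IGNORE, 5),
}

class IopStateMachine:
    def __init__(self, initial_state=1):
        self.state = initial_state

    def handle(self, event):
        # Look up the table row for the current state and event, move to the
        # new state, and return the action (switch / ignore / report).
        action, next_state = TRANSITIONS[(self.state, event)]
        self.state = next_state
        return action

# Example: controller 28 reports only MAC traffic while 26 is primary.
sm = IopStateMachine()
print(sm.handle("28_rpts_mac"), sm.state)   # -> switch 4
```

In this sketch a call such as handle("28_rpts_mac") from state 1 returns the "switch" action and leaves the machine in state 4, matching row 4 of Table 1.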
Note that as a result of the continuous status tracking capability provided by the IOP-maintained state machine, the IOP 52 is able to reconfigure which LAN controller 26, 28 will be the primary connection to the network 10, and which will be the backup, as situations change. And, as will become evident below, this is all done in a manner that is transparent to the rest of the network. For example, assume that after initial selection of the LAN controller 26 as the primary, and LAN controller 28 as the backup, the controlling IOP 52 notes that either LAN controller 28 begins reporting receipt of only MAC message packet traffic (i.e., no heartbeat message packets are being received from LAN controller 26 - occurrence 4), or the LAN controller 26 reports silence (occurrence 5). As the state diagram (Fig. 2) and Table 1 indicate, either of these occurrences will cause the state machine to move from state 1 to state 4, in which LAN controller 26 is designated as unpaired, and LAN controller 28 now becomes the primary connection to the network 10. The IOP 52 will see to it that all further communications with the network 10 are conducted through the LAN controller 28.
The transparency of this change in network connection results from the pair-wise addresses used by the LAN controllers 26, 28. All message packets directed to the processing system 12 will use this pair-wise address which, as explained above, is the same for both LAN controllers. Thus, when the workstation 44 sends a message packet to the computing system 12, the destination address in that packet will identify both the LAN controllers 26, 28, and both will receive that packet. Only the LAN controller 26, 28 designated as the primary will forward that message packet on to the processor unit 14 (i.e., the IOP 52 running on the processor 14). Thus, the computing system 12 was able to reconfigure itself to account for a fault occurring somewhere in its prior primary connection to the network 10 without assistance from, or even the knowledge of, the other stations on the network (e.g., workstation 44).
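The forwarding rule that makes the switch-over transparent can be sketched as follows. This is only an illustration of the idea described above; the pair-wise address value and all names are invented for the example.

```python
# Hypothetical sketch: both LAN controllers receive a packet addressed to
# their shared pair-wise MAC address, but only the controller currently
# designated primary forwards it to the processor's IOP.

PAIR_ADDR = "02:00:00:00:0a:0b"   # assumed pair-wise address

class PairedController:
    def __init__(self, name, deliver_to_iop):
        self.name = name
        self.is_primary = False
        self.deliver_to_iop = deliver_to_iop   # callable(packet)

    def on_packet(self, dest_addr, packet):
        if dest_addr != PAIR_ADDR:
            return
        # Both controllers see the packet; only the primary passes it on.
        if self.is_primary:
            self.deliver_to_iop(packet)

received = []
ctlr_26 = PairedController("26", received.append)
ctlr_28 = PairedController("28", received.append)
ctlr_26.is_primary = True

for c in (ctlr_26, ctlr_28):
    c.on_packet(PAIR_ADDR, "request from workstation 44")

assert received == ["request from workstation 44"]  # delivered exactly once

# If the IOP later swaps the primary designation, stations keep using the
# same pair-wise address and never notice the change.
```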
The LAN hub units 32, 34 are preferably designed to implement IEEE Standard 802.1 and, in particular, the "Spanning Tree and Learning Bridge Algorithm" described by that Standard. This algorithm gives the LAN hub units 32, 34 the capability of learning the configuration of the network and the location of the stations on the network, including the LAN controllers 26, 28, by the traffic to and from those stations. The LAN hubs 32, 34 will begin recognizing message packet traffic to and from the LAN controller 28 and will reconfigure themselves to forward the traffic for the processing system 12 to that LAN controller.

In addition, the preferred embodiment of the invention utilizes two links 38 to interconnect the LAN hubs 32, 34 themselves. The Spanning Tree and Learning Bridge Algorithm described in IEEE Standard 802.1 will select a one of the two links 38 as a primary, reserving the other as the backup. Should the primary fail, the algorithm will then select the backup.
The fault tolerant philosophy incorporated in the present invention is extended to the connection of the workstation 44 to the network 10 by means of the connection of the workstation 44 to each of the two LAN hubs 32, 34, providing an alternate communication path should one go down.
The interface for the dual-link connection of the workstation 44 to the network 10 is illustrated in Fig. 3 and designated with the reference numeral 60. As shown, the network interface 60 includes a multiplexer (MUX) 62 and conventional LAN control logic 64. In actual implementation the MUX 62 is an electro-mechanical relay, operating to select one of the two network links 42, which it connects to the LAN control logic 64. The MUX 62 operates under control of a select (SEL) signal from the central processing unit (CPU) 68 of the workstation 44. The selected link 42 becomes the "primary" communication link for the workstation 44 to the network 10.
The LAN control logic 64 is a conventional network controller chip, such as the Ultrachip Ethernet controller sold under the part number 83C790QF, manufactured by and available from Standard Microsystems Corporation of Hauppauge, New York. Such network controller devices typically include pulse detection circuitry to monitor the LAN link to which the device is attached to determine the integrity of the link. As is conventional, the LAN hubs 32, 34 are designed to periodically transmit link integrity pulses from each of their ports when there is no message packet traffic. The link integrity pulses provide an indication that the link itself (and the port connecting to the link) is in good order. Absence of any such link integrity pulses will cause the LAN control logic 64 to report that absence to the CPU 68.
Preferably, the port connections of the network interface 60 to the network links 42 are also 10BASET type wiring connections, although conventional AUI connections may also be used. The 10BASET wiring connection is preferred because it provides a quicker report time for missing link integrity pulses.
Initially, the CPU 68 will select one of the links 42 as the primary link, and connect that link to the LAN control logic 64 through the MUX 62. Thereafter, all traffic between the workstation 44 and the network 10 will be across only the selected link 42 (e.g., 42a). As indicated above, the CPU 68 will periodically receive reports concerning the integrity of the primary link from the LAN control logic 64, as determined from the receipt (or absence) of link integrity pulses from the LAN hubs 32, 34. If the report identifies missing link integrity pulses on the primary link, the CPU 68 will switch the MUX 62 to connect the link 42 that was formerly the backup to the LAN control logic 64.
The link initially not selected as the primary (e.g., 42b) is termed the backup link. Periodically (e.g., every one-half second) the CPU 68 will select the backup link 42b and connect it to the LAN control logic 64 to determine if that link is communicating the link integrity pulses transmitted by the LAN hub 34 (Fig. 1). If link integrity pulses are reported to the CPU 68 as absent, the CPU 68 determines that for some reason the link 42b is unavailable.
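A simplified sketch of this workstation-side selection logic is given below, assuming the LAN control logic's pulse reports are exposed as a simple boolean query; all names and the probing arrangement are illustrative assumptions, not the patent's implementation.

```python
# Hypothetical sketch of the workstation-side link selection: the CPU drives
# a mux that connects one of two links (42a, 42b) to the LAN control logic,
# fails over when integrity pulses disappear on the primary link, and
# periodically probes the backup link.

class WorkstationLinkSelector:
    def __init__(self, link_has_pulses):
        # link_has_pulses: callable(link_name) -> bool, standing in for the
        # LAN control logic's report of link integrity pulses on a link.
        self.link_has_pulses = link_has_pulses
        self.primary = "42a"
        self.backup = "42b"
        self.backup_available = True

    def check_primary(self):
        """Run whenever the control logic reports on the selected link."""
        if not self.link_has_pulses(self.primary):
            # Primary link lost its integrity pulses: switch the mux to the
            # former backup link.
            self.primary, self.backup = self.backup, self.primary

    def probe_backup(self):
        """Run periodically (e.g. every half second) against the backup link."""
        self.backup_available = self.link_has_pulses(self.backup)

# Example: link 42a goes quiet, so the workstation fails over to 42b.
status = {"42a": False, "42b": True}
sel = WorkstationLinkSelector(lambda link: status[link])
sel.check_primary()
assert sel.primary == "42b"
```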
Returning for a moment to Fig. 1, and noting that the LAN controllers 26, 28 are connected to two processor units 14, it will be appreciated that each processor unit 14 has an IOP 52 for each pair of LAN controllers 26, 28. If a processor unit 14 were connected to an additional pair, an additional IOP 52 would be necessary. This is illustrated in Fig. 4, which shows the processing system 12 as a more generalized multi-processing system having up to N processors. In the context of the multiprocessor system of the '496 patent, N is preferably limited to 16.
Thus, as Fig. 4 illustrates, in addition to the processor units 14a and 14b there is also processor unit 14n, and an additional LAN controller pair 26', 28' is connected between the I/O buses 20b, 20n of the processors 14b, 14n. The LAN controllers 26', 28' are connected, by network links 36a', 36b', to the LAN hub devices 32, 34, respectively, in the same manner that the LAN controllers 26, 28 are connected.
The processor unit 14n runs under an operating system (OS) 50n, with an input/output process (IOP) device driver 52n for operating the LAN controllers 26', 28' (through the I/O channel 54n of the processor unit 14n). Now, however, the processor unit 14b has two separate IOP device drivers: an IOP 52b for controlling input/output communication with the network 10 through the LAN controllers 26, 28, and a separate IOP device driver 52b' for controlling communication through the second LAN controller pair 26', 28'.
It will be apparent to those skilled in this art that although the invention has been discussed in terms of a single workstation 44, in reality there would be a number of stations of various types. Preferably, each of those stations would also be connected to the LAN hubs 32, 34 in the same manner as is shown for workstation 44, i.e., by separate link connections. Further, although it would be preferable that the invention be implemented with a processing system 12 as described (with each pair of LAN controllers 26, 28 connected between two processor units 14), a computing system could eschew the fault tolerant capability provided by a multiprocessor system and have only a single I/O bus to which single-ported LAN controllers connect the processor to the network. Although the single processor presents a single point of failure, the connection to the network through a pair of LAN controllers and the configuration of the network itself of the present invention still provides a fault tolerant network design.
In addition, although the preferred embodiment of the invention is to use "intelligent" LAN controllers to interconnect the computing system 12 to the network 10, that need not necessarily be the case. It will be recognized by those skilled in this art that many of the operations that would be performed by such intelligence of the LAN controller could be handled by the IOP running on the computing system, although at the cost of computing time of the computing system.


Claims (22)

1. A method for fault tolerant connection of a processing means to a network, the processing means having an input/output bus means for communicating data, the method comprising the steps of:
connecting to the input/output bus means first and second network controller means;
providing a pair of hub means each having a plurality of ports at which data is received or transmitted, a one of the plurality ports of each of the pair of hub means being coupled to corresponding ones of the first and second network controller means for receiving data therefrom or transmitting data thereto, another of the plurality of ports of a one of the pair of hub means being coupled to another of the plurality of ports of the other one of the pair of hub means;
connecting at least one station means to another one of the plurality of ports of each of the pair of hub means;
and selecting a one of the first or second network controller means as a primary data communication path between the processing means and the pair of hub means.
2. The method of claim 1, including the step of transmitting from the first and second network controller means for receipt by at least the second and first network controller means, respectively, message data.
3. The method of claim 2, wherein the transmitting step occurs periodically.
4. The method of claim 3, wherein the absence of receipt of the message data from the first or second controller means by the second or first controller means, respectively, is reported to the processor means.
5. The method of claim 4, including the step of selecting another of the first or second network controller means as a primary data communication path when absence of receipt of the message data from the prior selected one of the first or second network controller means is reported to the processor means.
6. The method of claim 1, wherein a further pair of the plurality of ports of each of the hub means is coupled to the other to form a corresponding pair of hub-to-hub connections for communicating data therebetween, and including the step of selecting a one of the pair of hub-to-hub connections as a primary data communicating path between the pair of hub means.
7. A fault tolerant connection of a processing system to a network, the processing system having an input/output bus means for communicating data, the connection comprising:
first and second network controller means connected to the input/output bus means;
a pair of forwarding means each having a plurality of ports at which data is received or transmitted, a one of the plurality ports of each of the pair of forwarding means being coupled to corresponding ones of the first and second network controller means for communicating data therebetween, another of the plurality of ports of a one of the pair of forwarding means being coupled to another of the plurality of ports of the other one of the pair of forwarding means;
at least one station means connected to a port of each of the pair of forwarding means; and means for selecting a one of the first or second network controller means as a primary data communication path from the processing means to the pair of forwarding means.
8. The fault tolerant connection of claim 7, the one station means including means for selecting a connection to a one of the ports of each of the pair of forwarding means as a primary path for data communication to and from the one station means.
9. The fault tolerant connection of claim 7, including means for causing the first and second network controller means to transmit message data to be received by at least the second and first network controller means, respectively.
10. The fault tolerant connection of claim 9, wherein the first and second network controller means report absence of receipt of message data to the processing means, and wherein the processing means selects another of the first or second network controller means as a primary data communication path when absence of receipt of the message data from the prior selected one of the first or second network controller means is reported to the processor means.
11. The fault tolerant connection of claim 7, wherein the first and second network controller means periodically transmit message data for receipt by the second and first network controller means, respectively, and wherein absence of receipt of message data by the first or second network controller means selected as the primary communications path results in selection of another of the first or second network controller means as the primary data communication path.
12. The fault tolerant connection of claim 7, including a peripheral device and a peripheral controller means connecting the peripheral device to the input/output bus means.
13. The fault tolerant connection of claim 12, wherein the peripheral device is a data storage means.
14. A method of coupling a computing system to a data communicating network, the computing system having at least a pair of processor units each having an input/output bus for communicating data thereon;
providing at least a pair of network control devices each coupled to the input/output bus of the pair of processor units;
providing a pair of access devices each having multiple data ports for communicating data;
coupling a one of the pair of network control devices to a one of the multiple data ports of a corresponding one of the pair of access devices, and coupling the other of the pair of network control devices to a one of the multiple data ports of the other of the pair of access devices;

connecting each of the access devices to the data communicating network for communicating data therebetween; and selecting one of the pair of network control devices for data communication between the computing system and the data communicating network.
15. The method of claim 14, including the step of selecting one of the pair of access devices for data communication with the data communicating network.
16. The method of claim 15, including the steps of providing a pair of communication paths between the pair of access devices, and selecting one of the communication paths for communicating data between the pair of access devices.
17. The method of claim 14, wherein data is communicated by the data communicating network using a message containing a destination address indicative of a recipient of the message, and including the step of providing each of the pair of network control devices with an address that identifies both of the pair of network controllers as the recipient the message containing the address as the destination address.
18. The method of claim 14, including the step of each of the pair of network control devices sending a periodic message to the other of the pair of network control devices.
19. The method of claim 18, wherein the sending step is for at least the purpose of identifying to the other of the pair of network control devices that each of the pair of network control devices continues to operate.
20. The method of claim 18, including the step of the other of the pair of network control devices reporting failure to receive the periodic message to the computing system.
21. The method of claim 18, including the steps of selecting a one of the pair of network control devices to communicate data between the computing system and the pair of access devices, and selecting the other one of the pair of network control devices in absence of the periodic message from the one of the pair of network control devices.
22. A fault tolerant connection between a computing system and a network that includes at least one data communicating station, the computing system having an input/output bus for communicating data, the fault tolerant connection comprising:
first and second network control devices each coupled to the input/output bus for communicating data therewith, each of the first and second network control devices operating to periodically communicate a message to the other of the first and second network control devices, and to report to the computing system failure to receive the message;

a first network access device communicatively coupled to the first network control device, and a second network access device communicatively coupled to the second network control device;
at least a pair of data communication paths coupled between the first and second network access devices, a one of the pair of communication paths being selected for communicating data between the first and second network access devices; and a pair of data paths each respectively coupling the first and second access devices to the data communicating station, the data communicating station including a multiplex circuit operating to select one of the access devices for data communication.
CA002117619A 1993-10-15 1994-09-01 Method and apparatus for fault tolerant connection of a computing system to local area networks Expired - Fee Related CA2117619C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US08/137,436 US5448723A (en) 1993-10-15 1993-10-15 Method and apparatus for fault tolerant connection of a computing system to local area networks
US08/137,436 1993-10-15

Publications (2)

Publication Number Publication Date
CA2117619A1 CA2117619A1 (en) 1995-04-16
CA2117619C true CA2117619C (en) 1999-05-11

Family

ID=22477427

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002117619A Expired - Fee Related CA2117619C (en) 1993-10-15 1994-09-01 Method and apparatus for fault tolerant connection of a computing system to local area networks

Country Status (6)

Country Link
US (1) US5448723A (en)
EP (1) EP0649092B1 (en)
JP (1) JP2583023B2 (en)
AU (1) AU685497B2 (en)
CA (1) CA2117619C (en)
DE (1) DE69414219T2 (en)

Families Citing this family (67)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5625825A (en) * 1993-10-21 1997-04-29 Lsi Logic Corporation Random number generating apparatus for an interface unit of a carrier sense with multiple access and collision detect (CSMA/CD) ethernet data network
WO1995015529A1 (en) * 1993-12-01 1995-06-08 Marathon Technologies Corporation Fault resilient/fault tolerant computing
JP3599364B2 (en) * 1993-12-15 2004-12-08 富士通株式会社 Network equipment
US5560033A (en) * 1994-08-29 1996-09-24 Lucent Technologies Inc. System for providing automatic power control for highly available n+k processors
JP3375245B2 (en) * 1995-04-26 2003-02-10 インターナショナル・ビジネス・マシーンズ・コーポレーション Device for fault-tolerant multimedia program distribution
US5822512A (en) * 1995-05-19 1998-10-13 Compaq Computer Corporartion Switching control in a fault tolerant system
US5713017A (en) * 1995-06-07 1998-01-27 International Business Machines Corporation Dual counter consistency control for fault tolerant network file servers
US6005884A (en) * 1995-11-06 1999-12-21 Ems Technologies, Inc. Distributed architecture for a wireless data communications system
GB2308040A (en) * 1995-12-09 1997-06-11 Northern Telecom Ltd Telecommunications system
DE19548445A1 (en) * 1995-12-22 1997-07-03 Siemens Ag Archive system for power plant process data
US6199172B1 (en) * 1996-02-06 2001-03-06 Cabletron Systems, Inc. Method and apparatus for testing the responsiveness of a network device
US6141769A (en) 1996-05-16 2000-10-31 Resilience Corporation Triple modular redundant computer system and associated method
US7148786B2 (en) * 1996-09-30 2006-12-12 Terumo Cardiovascular Systems Corporation Network communication and message protocol for a medical perfusion system
DE19643092C2 (en) * 1996-10-18 1998-07-30 Elan Schaltelemente Gmbh Field data bus system
US6195717B1 (en) 1997-05-13 2001-02-27 Micron Electronics, Inc. Method of expanding bus loading capacity
US6243838B1 (en) 1997-05-13 2001-06-05 Micron Electronics, Inc. Method for automatically reporting a system failure in a server
US6163849A (en) * 1997-05-13 2000-12-19 Micron Electronics, Inc. Method of powering up or powering down a server to a maintenance state
US6269412B1 (en) 1997-05-13 2001-07-31 Micron Technology, Inc. Apparatus for recording information system events
US6269417B1 (en) 1997-05-13 2001-07-31 Micron Technology, Inc. Method for determining and displaying the physical slot number of an expansion bus device
US6363497B1 (en) 1997-05-13 2002-03-26 Micron Technology, Inc. System for clustering software applications
US6253334B1 (en) 1997-05-13 2001-06-26 Micron Electronics, Inc. Three bus server architecture with a legacy PCI bus and mirrored I/O PCI buses
US6338150B1 (en) * 1997-05-13 2002-01-08 Micron Technology, Inc. Diagnostic and managing distributed processor system
US6179486B1 (en) 1997-05-13 2001-01-30 Micron Electronics, Inc. Method for hot add of a mass storage adapter on a system including a dynamically loaded adapter driver
US6292905B1 (en) * 1997-05-13 2001-09-18 Micron Technology, Inc. Method for providing a fault tolerant network using distributed server processes to remap clustered network resources to other servers during server failure
US6170028B1 (en) 1997-05-13 2001-01-02 Micron Electronics, Inc. Method for hot swapping a programmable network adapter by using a programmable processor to selectively disabling and enabling power thereto upon receiving respective control signals
US6202160B1 (en) 1997-05-13 2001-03-13 Micron Electronics, Inc. System for independent powering of a computer system
US6249885B1 (en) 1997-05-13 2001-06-19 Karl S. Johnson Method for managing environmental conditions of a distributed processor system
US6330690B1 (en) 1997-05-13 2001-12-11 Micron Electronics, Inc. Method of resetting a server
US6324608B1 (en) 1997-05-13 2001-11-27 Micron Electronics Method for hot swapping of network components
US6243773B1 (en) 1997-05-13 2001-06-05 Micron Electronics, Inc. Configuration management system for hot adding and hot replacing devices
US6266721B1 (en) 1997-05-13 2001-07-24 Micron Electronics, Inc. System architecture for remote access and control of environmental management
US6249828B1 (en) 1997-05-13 2001-06-19 Micron Electronics, Inc. Method for the hot swap of a mass storage adapter on a system including a statically loaded adapter driver
US6499073B1 (en) 1997-05-13 2002-12-24 Micron Electronics, Inc. System using programmable processor for selectively enabling or disabling power to adapter in response to respective request signals
US6163853A (en) 1997-05-13 2000-12-19 Micron Electronics, Inc. Method for communicating a software-generated pulse waveform between two servers in a network
US5892928A (en) * 1997-05-13 1999-04-06 Micron Electronics, Inc. Method for the hot add of a network adapter on a system including a dynamically loaded adapter driver
US6145098A (en) * 1997-05-13 2000-11-07 Micron Electronics, Inc. System for displaying system status
US6202111B1 (en) 1997-05-13 2001-03-13 Micron Electronics, Inc. Method for the hot add of a network adapter on a system including a statically loaded adapter driver
US6418492B1 (en) 1997-05-13 2002-07-09 Micron Electronics Method for computer implemented hot-swap and hot-add
US6249834B1 (en) 1997-05-13 2001-06-19 Micron Technology, Inc. System for expanding PCI bus loading capacity
US6247080B1 (en) 1997-05-13 2001-06-12 Micron Electronics, Inc. Method for the hot add of devices
US6219734B1 (en) 1997-05-13 2001-04-17 Micron Electronics, Inc. Method for the hot add of a mass storage adapter on a system including a statically loaded adapter driver
US6173346B1 (en) 1997-05-13 2001-01-09 Micron Electronics, Inc. Method for hot swapping a programmable storage adapter using a programmable processor for selectively enabling or disabling power to adapter slot in response to respective request signals
US6192434B1 (en) 1997-05-13 2001-02-20 Micron Electronics, Inc. System for hot swapping a programmable adapter by using a programmable processor to selectively disabling and enabling power thereto upon receiving respective control signals
US6304929B1 (en) 1997-05-13 2001-10-16 Micron Electronics, Inc. Method for hot swapping a programmable adapter by using a programmable processor to selectively disabling and enabling power thereto upon receiving respective control signals
US6112249A (en) 1997-05-30 2000-08-29 International Business Machines Corporation Non-disruptively rerouting network communications from a secondary network path to a primary path
US5983371A (en) * 1997-07-11 1999-11-09 Marathon Technologies Corporation Active failure detection
US6175490B1 (en) 1997-10-01 2001-01-16 Micron Electronics, Inc. Fault tolerant computer system
US6263387B1 (en) 1997-10-01 2001-07-17 Micron Electronics, Inc. System for automatically configuring a server after hot add of a device
US6199173B1 (en) 1997-10-01 2001-03-06 Micron Electronics, Inc. Method for mapping environmental resources to memory for program access
US6138179A (en) * 1997-10-01 2000-10-24 Micron Electronics, Inc. System for automatically partitioning and formatting a primary hard disk for installing software in which selection of extended partition size is not related to size of hard disk
US6212585B1 (en) 1997-10-01 2001-04-03 Micron Electronics, Inc. Method of automatically configuring a server after hot add of a device
US6205503B1 (en) 1998-07-17 2001-03-20 Mallikarjunan Mahalingam Method for the hot swap and add of input/output platforms and devices
US6223234B1 (en) 1998-07-17 2001-04-24 Micron Electronics, Inc. Apparatus for the hot swap and add of input/output platforms and devices
US6078957A (en) * 1998-11-20 2000-06-20 Network Alchemy, Inc. Method and apparatus for a TCP/IP load balancing and failover process in an internet protocol (IP) network clustering system
US6768720B1 (en) * 1999-09-30 2004-07-27 Conexant Systems, Inc. Verification of link integrity of a home local area network
US6658595B1 (en) * 1999-10-19 2003-12-02 Cisco Technology, Inc. Method and system for asymmetrically maintaining system operability
WO2001086445A1 (en) * 2000-05-11 2001-11-15 Patmos International Corporation Connectionist topology computer/server
US6604030B1 (en) * 2000-06-06 2003-08-05 Ozuna Holdings Incorporated Single fault impervious integrated control and monitoring system
JP2001352335A (en) 2000-06-07 2001-12-21 NEC Corp LAN duplicate system and LAN duplication method used for it
US7020709B1 (en) * 2000-06-30 2006-03-28 Intel Corporation System and method for fault tolerant stream splitting
US7318107B1 (en) 2000-06-30 2008-01-08 Intel Corporation System and method for automatic stream fail-over
EP1482720A1 (en) * 2003-05-28 2004-12-01 Ricoh Company, Ltd. Image processing apparatus and computer product
US9491084B2 (en) * 2004-06-17 2016-11-08 Hewlett Packard Enterprise Development Lp Monitoring path connectivity between teamed network resources of a computer system and a core network
US7724642B2 (en) * 2004-08-06 2010-05-25 Logic Controls, Inc. Method and apparatus for continuous operation of a point-of-sale system during a single point-of-failure
US7315963B2 (en) * 2004-08-10 2008-01-01 International Business Machines Corporation System and method for detecting errors in a network
JP2007257180A (en) * 2006-03-22 2007-10-04 Hitachi Ltd Network node, switch, and network fault recovery method
US8605573B2 (en) * 2008-06-26 2013-12-10 Shore Microsystems Inc. Autolearning network link protection device

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4228496A (en) * 1976-09-07 1980-10-14 Tandem Computers Incorporated Multiprocessor system
US4527270A (en) * 1983-05-04 1985-07-02 Allen-Bradley Company Communications network with stations that detect and automatically bypass faults
US4627048A (en) * 1984-10-09 1986-12-02 At&T Bell Laboratories Routing address bit selection in a packet switching network
US5034966A (en) * 1987-03-09 1991-07-23 Hochstein Peter A Redundant and fault tolerant communication link
US5084816A (en) * 1987-11-25 1992-01-28 Bell Communications Research, Inc. Real time fault tolerant transaction processing system
GB2219172B (en) * 1988-03-30 1992-07-08 Plessey Co Plc A data path checking system
JPH01284035A (en) * 1988-05-10 1989-11-15 Toshiba Corp Data transmission equipment
JP2591128B2 (en) * 1988-12-20 1997-03-19 NEC Corporation Communication line switching method
US5016244A (en) * 1989-09-08 1991-05-14 Honeywell Inc. Method for controlling failover between redundant network interface modules
US5115235A (en) * 1989-09-25 1992-05-19 Cabletron Systems, Inc. Flexible module interconnect system
US5091847A (en) * 1989-10-03 1992-02-25 Grumman Aerospace Corporation Fault tolerant interface station
US5311593A (en) * 1992-05-13 1994-05-10 Chipcom Corporation Security system for a network concentrator

Also Published As

Publication number Publication date
JPH07235933A (en) 1995-09-05
CA2117619A1 (en) 1995-04-16
JP2583023B2 (en) 1997-02-19
AU7581694A (en) 1995-05-04
DE69414219T2 (en) 1999-04-22
AU685497B2 (en) 1998-01-22
EP0649092A1 (en) 1995-04-19
US5448723A (en) 1995-09-05
DE69414219D1 (en) 1998-12-03
EP0649092B1 (en) 1998-10-28

Similar Documents

Publication Publication Date Title
CA2117619C (en) Method and apparatus for fault tolerant connection of a computing system to local area networks
US6173411B1 (en) Method and system for fault-tolerant network connection switchover
US6202170B1 (en) Equipment protection system
JPH11127129A (en) Line fault notification system for terminal device
JP2806614B2 (en) Dual signal repeater structure system
JP4340731B2 (en) Network fault monitoring processing system and method
JPH07131484A (en) Junction circuit nondisconnection detour system
US7210069B2 (en) Failure recovery in a multiprocessor configuration
EP1377905B1 (en) Method and apparatus for arbitrating transactions between domains in a computer system
Schmitter et al. The basic fault-tolerant system
Cisco Fault Tolerance
Cisco Fault Tolerance
Cisco Fault Tolerance
Cisco Fault Tolerance
Cisco Fault Tolerance
Cisco Fault Tolerance
JP2001344125A (en) Dual node system
JPH07319836A (en) Fault monitoring system
KR960003784B1 (en) Interconnection and its operation of processor unit communication
KR960010879B1 (en) Bus duplexing control of multiple processor
JPH10303966A (en) Failure detection system of redundant configuration system of inter-network device
KR940002145B1 (en) Level 3-3 network unit operating apparatus for signal repeating system
KR930002775B1 (en) Duplex structure signal transfer point system for common channel signalling system No. 7
CN116347489A (en) Service processing method based on data center
JPH03261244A (en) LAN control system

Legal Events

Date Code Title Description
EEER Examination request
MKLA Lapsed