CA2117844C - Data network switch - Google Patents

Data network switch

Info

Publication number
CA2117844C
Authority
CA
Canada
Prior art keywords
data
switch
data network
network switch
possible solutions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CA002117844A
Other languages
French (fr)
Other versions
CA2117844A1 (en)
Inventor
Andrew Timothy Pepper
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
General Datacomm Inc
Original Assignee
General Datacomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by General Datacomm Inc
Publication of CA2117844A1
Application granted
Publication of CA2117844C
Anticipated expiration
Expired - Fee Related

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/10 Packet switching elements characterised by the switching fabric construction
    • H04L49/104 Asynchronous transfer mode [ATM] switching fabrics
    • H04L49/105 ATM switching elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/40 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q SELECTING
    • H04Q11/00 Selecting arrangements for multiplex systems
    • H04Q11/04 Selecting arrangements for multiplex systems for time-division multiplexing
    • H04Q11/0428 Integrated services digital network, i.e. systems for transmission of different types of digitised signals, e.g. speech, data, telecentral, television signals
    • H04Q11/0478 Provisions for broadband connections
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/54 Store-and-forward switching systems
    • H04L12/56 Packet switching systems
    • H04L12/5601 Transfer mode dependent, e.g. ATM
    • H04L2012/5625 Operations, administration and maintenance [OAM]
    • H04L2012/5627 Fault tolerance and recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/54 Store-and-forward switching systems
    • H04L12/56 Packet switching systems
    • H04L12/5601 Transfer mode dependent, e.g. ATM
    • H04L2012/5628 Testing

Abstract

A data network switch for switching data between a plurality of data links comprises a programmable controller for adapting the data switching in accordance with the state of the switch and the data links therewith, the controller comprising detector means for detecting a problem in the operation of the switch, memory means storing information on possible problems, their causes and possible solutions, and control means for:

(a) determining from the information stored in the memory means possible causes of the problem according to the nature of the problem detected, and for determining from the possible causes a ranked series of possible solutions;

(b) carrying out the first of the possible solutions;

(c) checking the output of the detector means after a predetermined period of time, and, if the problem is still present, carrying out the next of the possible solutions; and

(d) repeating step (c) until the problem is no longer present or no further possible solutions are left.

Description

This invention relates to a data network switch, for example a network switch for use in a packet data network operating to the X.25 protocol, or an asynchronous transfer mode (ATM) switch.
Essential features of the operation of such switches are high reliability and tolerance of faults.
Providing redundant hardware is only a part of fault tolerance. It is also necessary to determine the nature of any problem that occurs with the main hardware and to make the best use of the redundant hardware in the circumstances.
Published UK patent application No 2271041 discloses a switch in which a control means monitors the operation of the switch and adapts the switching in accordance with a stored list of instructions upon detection of a predetermined condition. This may be, for example, time or data traffic conditions or a fault in one of the data links.
As an example, where the switch contains a main communications controller and a spare controller, the switch may be programmed to use the spare controller if the main controller fails. However, there are several possible disadvantages with this. For example, it may be that the controller is not itself faulty, but the link to it has failed. There is also no provision for dealing with the problem of the spare controller also failing. If the main controller (or its link) recovers, should communications be returned to it from the spare? Should the Network Management System be notified of the switch-over from one controller or link to the other?
The answers to these questions will depend on the situation. For example, if the spare controller uses an expensive "pay on usage" public data circuit to provide a back-up for the main "leased line", it will be desirable to return to the main line as soon as possible. But if the service carried is such as to demand high reliability, for example video coverage for remote surgery, it will be necessary to delay switching back to the main line until there is no risk of further disruption to the service.
The present invention provides a network switch which can be configured to provide heuristic solutions to management problems within the network components.
According to the invention there is provided a data network switch for switching data between a plurality of data links, the switch comprising a programmable controller for adapting the data switching in accordance with the state of the switch and the data links therewith, the controller comprising detector means for detecting a problem in the operation of the switch, memory means storing information on possible problems, their causes and possible solutions, and control means for:
(a) determining from the information stored in the memory means possible causes of the problem according to the nature of the problem detected, and for determining from the possible causes a ranked series of possible solutions;
(b) carrying out the first of the possible solutions;
(c) checking the output of the detector means after a predetermined period of time, and, if the problem is still present, carrying out the next of the possible solutions; and
(d) repeating step (c) until the problem is no longer present or no further possible solutions are left.
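Steps (a) to (d) above can be sketched as a short loop. This is a minimal Python illustration and not the patented implementation: the names `resolve_problem`, `problem_present` and `wait`, the dictionary of ranked solutions and the default period are all assumptions made for the example. Step (a) is reduced to a table lookup here.

```python
from typing import Callable, Dict, List, Optional


def resolve_problem(
    problem: str,
    ranked_solutions: Dict[str, List[Callable[[], None]]],
    problem_present: Callable[[str], bool],
    wait: Callable[[float], None],
    period_s: float = 5.0,
) -> Optional[int]:
    """Try solutions in rank order; return the index of the one that worked, else None."""
    # Step (a), simplified: the ranked series is looked up from stored information.
    for index, apply_solution in enumerate(ranked_solutions.get(problem, [])):
        apply_solution()                  # steps (b) and (c): carry out the next solution
        wait(period_s)                    # let the predetermined period elapse
        if not problem_present(problem):  # step (c): re-check the detector output
            return index
    return None                           # step (d): no further solutions are left


# Hypothetical usage: the first solution has no effect, the second clears the fault.
state = {"slot_down": True}


def swap_slot() -> None:
    pass  # assumed to be ineffective in this demo


def swap_crosspoint() -> None:
    state["slot_down"] = False


result = resolve_problem(
    "slot_down",
    {"slot_down": [swap_slot, swap_crosspoint]},
    problem_present=lambda name: state[name],
    wait=lambda seconds: None,  # a real controller would actually wait here
)
print(result)
```

Passing the wait function in as a parameter keeps the sketch testable without real delays. A `None` result corresponds to the case in which all possible solutions have been exhausted.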
Examples of problems which may be experienced with a data network switch are:
the physical failure of an external link;
an external link becoming noisy or unreliable because of a faulty connection, for example;
hardware failures within the switch, for example the slot controller, the connections thereto, or the switch fabric;
software failures.
Preferably, the programmable controller is also arranged to adapt the data switching in accordance with fulfilment of any one of a plurality of predetermined conditions. The programmable controller preferably comprises a stored program defining the conditions and arranged to monitor continuously or repeatedly fulfilment of the conditions.
The switch may be an X.25 switch of the general type disclosed in published UK patent application 2271041, an ATM switch of the general type disclosed in published UK patent application 2273224, or a data network switch operating under another protocol.
In the accompanying drawings:
Figure 1 is a block diagram of a conventional ATM switch;
Figure 2 is a block diagram showing the main control functions of the switch in accordance with an exemplary embodiment of the invention; and
Figure 3 is a flow diagram showing the process of problem solving undertaken by the switch of the invention.
In a typical data network switch, as illustrated by Figure 1, the switch fabric 1 consists of a cross-point switch having eight inputs and eight outputs, for example. It will be appreciated that a greater or smaller number of connections is possible. Each input/output pair 2 from the cross-point switch 1 is connected to a link controller card 3, referred to as a slot controller, controlling two or four external lines 4 via a line interface module (LIM) 5. In addition to the active slot controllers 3, there is a "slot 0" controller whose function is to monitor the status of the other controllers by sending out "health check cells" to the other slot controllers at regular intervals. The controllers respond to these cells, if working correctly; the absence of a response indicates a problem with the slot.
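The health check described above can be pictured with a small Python sketch. It is an illustration only: the function name `poll_slots` and the idea of modelling each slot controller as a callable that returns whether it answered are assumptions, and real health check cells would be sent over the switch fabric at regular intervals.

```python
from typing import Callable, Dict, List


def poll_slots(responders: Dict[int, Callable[[], bool]]) -> List[int]:
    """Send one health check to every slot and return the slots that stayed silent."""
    # Each callable stands in for "send a health check cell and see whether a reply arrives".
    return [slot for slot, respond in sorted(responders.items()) if not respond()]


# Hypothetical usage: slot 2 fails to answer the health check.
slots = {
    1: lambda: True,
    2: lambda: False,
    3: lambda: True,
}
silent = poll_slots(slots)
print(silent)
```

In a fuller model the slot 0 controller would repeat this poll on a timer and record the result for each slot, so that a missing response becomes visible as a status change.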
Information by which the operation of the switch is controlled is stored in a Managed Information Base (MIB), into which data relating to the status of individual switch components is written and from which configuration data for the components is read. Thus, each of the functional components of the switch communicates with the MIB.
In the switch of the present invention, illustrated in Figure 2, a controller separate from the switch controller, the Problem Solving Intelligence or PSI, monitors the state of the MIB and reacts to the detection of a problem by determining from its own Rules data store the likely causes of the problem and the possible solutions. Each of the functional components of the switch, for example main and standby links, looks to information in the MIB to determine its correct operating state.
Thus, if the information stored in the MIB indicates that the link should be active instead of inactive, the link is brought into use.
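The role of the MIB as the shared point of contact between the PSI and the switch components can be sketched as follows. This is a toy Python model under stated assumptions: the classes `Mib` and `Link`, the key names such as `standby.desired`, and the `sync` method are hypothetical and are not taken from the patent.

```python
from typing import Any, Dict


class Mib:
    """Toy stand-in for the Managed Information Base: a shared key/value store."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}

    def write(self, key: str, value: Any) -> None:
        self._values[key] = value

    def read(self, key: str, default: Any = None) -> Any:
        return self._values.get(key, default)


class Link:
    """A functional component that takes its operating state from the MIB."""

    def __init__(self, name: str, mib: Mib) -> None:
        self.name = name
        self.mib = mib
        self.active = False

    def sync(self) -> None:
        # Read the configured (desired) state and report the resulting status back.
        desired = self.mib.read(f"{self.name}.desired", "inactive")
        self.active = desired == "active"
        self.mib.write(f"{self.name}.status", "active" if self.active else "inactive")


# Hypothetical usage: writing to the MIB brings the standby link into use.
mib = Mib()
standby = Link("standby", mib)
standby.sync()
before = standby.active
mib.write("standby.desired", "active")
standby.sync()
print(before, standby.active)
```

The point of the design is that a corrective action never has to call a component directly: it only changes data in the MIB, and the component adopts the new state when it next consults the MIB.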
The PSI constantly monitors the information stored in the MIB to detect whether a problem has arisen, for example that a particular slot controller has not responded to the health check cells sent to it by the slot 0 controller. If it detects such a problem, it refers to the rules stored in the Rules data store to determine likely causes, for example that the slot controller has failed, the connections to it have failed, the LIM has failed, the switch fabric itself has failed or perhaps that the interface to slot 0 has itself failed. These likely causes are then used to determine appropriate possible solutions, for example switch to another link or another controller, again using the stored rules, the solutions being ranked according to which is most likely to succeed having regard to the possible causes identified. The PSI then writes to the MIB to try the solutions, in turn, at the same time initiating a monitoring process which watches the information written into the MIB to see whether the action taken has been effective.
If one trial solution does not return the switch to its desired operational state within a predetermined period of time, the PSI writes data to the MIB which will bring about the next most likely of the possible solutions.
The process is repeated until the problem is solved, or all the possible solutions have been attempted, when a warning to the operator can be generated. The process is illustrated by the flowsheet of Figure 3.
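One way to picture the ranking of solutions from likely causes is the Python sketch below. The description does not say how the ranking is computed, so everything here is an assumption made for illustration: the integer weights, the rule tables `CAUSE_WEIGHTS` and `SOLUTIONS_FOR_CAUSE`, the solution names, and the choice to score each solution by the summed weight of the causes it addresses.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical rules: likely causes of a problem, with illustrative integer weights.
CAUSE_WEIGHTS: Dict[str, Dict[str, int]] = {
    "no_health_check_response": {
        "slot_controller_failed": 5,
        "lim_failed": 2,
        "connections_failed": 2,
        "switch_fabric_failed": 1,
    }
}

# Hypothetical rules: solutions that address each cause.
SOLUTIONS_FOR_CAUSE: Dict[str, List[str]] = {
    "slot_controller_failed": ["use_spare_controller"],
    "lim_failed": ["use_standby_link"],
    "connections_failed": ["use_standby_link"],
    "switch_fabric_failed": ["use_alternate_crosspoint"],
}


def rank_solutions(problem: str) -> List[str]:
    """Rank solutions by the total weight of the likely causes each one addresses."""
    scores: Dict[str, int] = defaultdict(int)
    for cause, weight in CAUSE_WEIGHTS.get(problem, {}).items():
        for solution in SOLUTIONS_FOR_CAUSE.get(cause, []):
            scores[solution] += weight
    # Highest score first; ties are broken alphabetically so the order is stable.
    return sorted(scores, key=lambda name: (-scores[name], name))


ranking = rank_solutions("no_health_check_response")
print(ranking)
```

With these example weights, switching to the standby link ranks second because it addresses two separate causes, even though neither of them is the single most likely one.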
The PSI comprises a script-based language in which certain statements are particularly adapted to the monitoring and problem seeking functions. For example:
every
Syntax: every(condition) statement block
The every statement sets up the conditions for the statement block to be executed. Unlike a conventional if statement, each every statement generates a small task that watches for its particular event to occur.
For example, the following program:
    name main
    timercell onesec
    onesec=1
    every (onesec==0)
    {
        onesec=10
        cursor 1,60
        print dv2system.currentTime
        refresh
    }
    wait
sets up the every so that every time onesec reaches 0 the code inside the curly brackets is executed. The wait statement pauses the procedure, waiting for an event to occur. Multiple every statements can be placed in a procedure and the PSI interpreter will look for any of the conditions to be true to execute the corresponding statement block.
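The behaviour of every can be approximated in Python to make the semantics concrete. This sketch is not the PSI interpreter: the `Watcher` class and its `poll` method are hypothetical, and the assumption that a timer cell such as onesec counts down by one on each tick is made only so that the example has something to watch.

```python
from typing import Callable, Dict, List, Tuple


class Watcher:
    """Holds (condition, block) pairs and runs each block whose condition is true."""

    def __init__(self) -> None:
        self._rules: List[Tuple[Callable[[], bool], Callable[[], None]]] = []

    def every(self, condition: Callable[[], bool], block: Callable[[], None]) -> None:
        self._rules.append((condition, block))

    def poll(self) -> int:
        """Check all registered conditions once; return how many blocks ran."""
        fired = 0
        for condition, block in self._rules:
            if condition():
                block()
                fired += 1
        return fired


# Hypothetical re-creation of the onesec example.
state: Dict[str, int] = {"onesec": 1, "tick": 0}
fired_at: List[int] = []


def on_zero() -> None:
    state["onesec"] = 10            # reload the timer, as in the statement block
    fired_at.append(state["tick"])  # stands in for the cursor/print/refresh lines


watcher = Watcher()
watcher.every(lambda: state["onesec"] == 0, on_zero)

for _ in range(12):
    state["tick"] += 1
    state["onesec"] -= 1  # assumed: the timer cell counts down once per tick
    watcher.poll()

print(fired_at)
```

Unlike an if statement, the registration happens once and the condition is re-evaluated on every poll, which is the property the description attributes to every.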
spawn
Syntax: spawn filename, procedure name
The spawn statement runs the procedure procedure name from the file filename as a background task simultaneously with the original task.
try
Syntax: try action, test, period
The try statement spawns two tasks, one with the name action, the other with the name test. Both spawned tasks are run in the background. When executing try statements, PSI will monitor the way both the action and test procedures stop. If abort is used to stop the action procedure, then PSI will recall the test procedure.
If stop is used to stop the test procedure, then PSI assumes that the problem is now resolved, and the rest of the current statement block is skipped until the next close curly bracket is met.
If stop is used to stop the action procedure, then PSI will wait for timeout seconds for test to return "successful", i.e. to terminate using a stop.
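The outcome rules for try can be illustrated with a simplified Python model. It is a sketch under several assumptions: the two procedures run one after the other instead of as background tasks, an action is modelled as a callable returning "stop" or "abort", a test is modelled as a callable that reports whether it ended with a clean stop within the period, and an aborted action is taken to mean that the test is not awaited. The function name `run_try_block` is hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Action = Callable[[], str]      # returns "stop" (clean finish) or "abort"
Test = Callable[[float], bool]  # True if the test ended with a clean stop within the period


def run_try_block(tries: List[Tuple[str, Action, Test]], period_s: float) -> Optional[str]:
    """Run try statements in order; return the name of the action that resolved the problem."""
    for name, action, test in tries:
        if action() == "abort":
            continue     # assumed: the action failed, so its test is not awaited
        if test(period_s):
            return name  # test stopped cleanly: the rest of the block is skipped
    return None          # every try was attempted without success


# Hypothetical usage mirroring a block with two try statements.
calls: List[str] = []


def swapslot() -> str:
    calls.append("swapslot")
    return "stop"


def swapxp() -> str:
    calls.append("swapxp")
    return "stop"


resolved_by = run_try_block(
    [
        ("swapslot", swapslot, lambda period: False),  # slot still down after the period
        ("swapxp", swapxp, lambda period: True),       # cross-point swap clears the fault
    ],
    period_s=10.0,
)
print(resolved_by, calls)
```

Returning as soon as a test succeeds corresponds to skipping to the close curly bracket of the current statement block.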
wait
The wait statement suspends the currently running procedure. This is generally used after a number of every or when statements have been executed to wait for the specified events to occur.
stop
This statement performs a clean stop of a procedure. Any resource allocated to the procedure is shut down. If the procedure was generated as the test procedure in a try statement, then the original procedure skips the rest of the block. For example, in the program:
    name main
    every (.slotsdown)
    {
        try swapslot,testslot
        try swapxp,testxp
    }
    stop
the first try statement runs the procedure testslot to test to see if the slot has come back into operation. If the testslot procedure uses a stop statement to terminate, then PSI will skip to the end of the code block.

Claims (8)

THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE
PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:
1. A data network switch for switching data between a plurality of data links, the switch comprising a programmable controller for adapting the data switching in accordance with the state of the switch and the data links therewith, the controller comprising detector means for detecting a problem in the operation of the switch, memory means storing information on possible problems, their causes and possible solutions, and control means for:

(a) determining from the information stored in the memory means possible causes of the problem according to the nature of the problem detected, and for determining from the possible causes a ranked series of possible solutions;

(b) carrying out the first of the possible solutions;

(c) checking the output of the detector means after a predetermined period of time, and, if the problem is still present, carrying out the next of the possible solutions; and

(d) repeating step (c) until the problem is no longer present or no further possible solutions are left.
2. A data network switch according to Claim 1, wherein the programmable controller is also arranged to adapt the data switching in accordance with fulfilment of any one of a plurality of predetermined conditions.
3. A data network switch according to Claim 2, wherein the programmable controller comprises a stored program defining the conditions and arranged to monitor continuously or repeatedly fulfilment of the conditions.
4. A data network switch according to Claim 1, arranged to switch data in packet form.
5. A data network switch according to Claim 1, wherein at least one of the conditions is a time of day.
6. A data network switch according to Claim 1, wherein the control means is arranged to output an operator warning signal when no further possible solutions are left in step (d).
7. A data network switch according to Claim 1 which is an ATM cell switch.
8. A data network switch according to Claim 1, comprising a Managed Information Base (MIB) receiving and storing information relating to the operation of all the functional components of the switch, wherein the detector means is arranged to monitor the information in the MIB to detect problems in the operation of the switch.
CA002117844A 1993-10-13 1994-10-11 Data network switch Expired - Fee Related CA2117844C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB9321165A GB2282935B (en) 1993-10-13 1993-10-13 Data network switch
GB9321165.4 1993-10-13

Publications (2)

Publication Number Publication Date
CA2117844A1 CA2117844A1 (en) 1995-04-14
CA2117844C true CA2117844C (en) 2002-01-01

Family

ID=10743508

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002117844A Expired - Fee Related CA2117844C (en) 1993-10-13 1994-10-11 Data network switch

Country Status (3)

Country Link
US (1) US5461609A (en)
CA (1) CA2117844C (en)
GB (1) GB2282935B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5956342A (en) 1995-07-19 1999-09-21 Fujitsu Network Communications, Inc. Priority arbitration for point-to-point and multipoint transmission
US5913037A (en) * 1996-07-03 1999-06-15 Compaq Computer Corporation Dynamic management information base manager
JP3794151B2 (en) * 1998-02-16 2006-07-05 株式会社日立製作所 Information processing apparatus having crossbar switch and crossbar switch control method
DE69934852T2 (en) * 1998-09-11 2007-10-18 Hitachi, Ltd. IP packet communication apparatus
US6507863B2 (en) * 1999-01-27 2003-01-14 International Business Machines Corporation Dynamic multicast routing facility for a distributed computing environment
US6615259B1 (en) 1999-05-20 2003-09-02 International Business Machines Corporation Method and apparatus for scanning a web site in a distributed data processing system for problem determination
US6906998B1 (en) * 1999-08-13 2005-06-14 Nortel Networks Limited Switching device interfaces
US6868057B1 (en) * 1999-12-08 2005-03-15 Lucent Technologies Inc. Automatic protection switch decision engine
JP2001249828A (en) * 1999-12-28 2001-09-14 Toshiba Lsi System Support Kk Information processor, computer readable storage medium in which failure analysis program is stored, failure analyzing method and application program development support system
US7583590B2 (en) * 2005-07-15 2009-09-01 Telefonaktiebolaget L M Ericsson (Publ) Router and method for protocol process migration
US8422358B2 (en) 2010-11-23 2013-04-16 International Business Machines Corporation Best-path evaluation based on reliability of network interface layers

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4412323A (en) * 1980-06-03 1983-10-25 Rockwell International Corporation Muldem with improved monitoring and control system
US4561090A (en) * 1983-05-18 1985-12-24 At&T Bell Laboratories Integrated self-checking packet switch node
US5138615A (en) * 1989-06-22 1992-08-11 Digital Equipment Corporation Reconfiguration system and method for high-speed mesh connected local area network

Also Published As

Publication number Publication date
GB2282935B (en) 1998-01-07
GB9321165D0 (en) 1993-12-01
CA2117844A1 (en) 1995-04-14
GB2282935A (en) 1995-04-19
US5461609A (en) 1995-10-24


Legal Events

Date Code Title Description
EEER Examination request
MKLA Lapsed