|Publication number||US4634110 A|
|Application number||US 06/518,291|
|Publication date||Jan 6, 1987|
|Filing date||Jul 28, 1983|
|Priority date||Jul 28, 1983|
|Publication number||06518291, 518291, US 4634110 A, US 4634110A, US-A-4634110, US4634110 A, US4634110A|
|Inventors||Paul M. Julich, Jeffrey B. Pearce|
|Original Assignee||Harris Corporation|
The present invention relates in general to redundant multi-processor data handling and signal processing systems and, in particular, to a scheme for carrying out fault diagnostic testing and controlling the functional insertion and removal of the redundant units within such a system in accordance with the results of such fault diagnostic tests.
With continuing advances in the power and versatility of data processing networks and with improvements in circuit miniaturization, applications of such networks to a variety of data handling and information processing systems have also expanded. Microprocessor-based control systems continue to replace a number of system control functions that were formerly performed manually or through cumbersome mechanical/hydraulic linkage configurations. One environment where weight and space restrictions make microprocessor-based control particularly attractive is an airborne control system (e.g. spacecraft/aircraft avionics, weapons delivery, sensor-response system). Because of the critical nature of a significant number of control functions involved (inherent in the nature of the system being controlled), redundancy (backup availability) and fault diagnostics constitute an essential ingredient in the utility and operational success of such a network as a substitute for mechanical/hydraulic control. Moreover, redundancy itself is usually both quantitatively and qualitatively structured in an effort to provide the sought-after fail-safe capability of the system.
For example, in a spacecraft environment, which is extremely remote from a ground service/maintenance facility, high (triple or greater) redundancy is commonly employed to ensure continuous system operation. In non-spacecraft airborne vehicles, a hybrid redundancy approach, where a prime electrical flight control network is augmented by a mechanical/hydraulic link, or vice versa, may be employed.
In a multi-processor redundancy system, fault testing and redundancy management have often incorporated a voting scheme for failure detection and/or selection of which redundant system is to be placed on-line. In a high redundancy network, voting among master controllers, for example, is capable of detecting a failure, i.e. a mismatch among the master controllers, and typically follows an odd man-out rule to identify and exclude a faulty controller. The penalty for such an approach is the considerable cost and hardware (added weight and space) which must be borne by the network. In a dual redundancy network, on the other hand, such a voting technique can only detect a failure, but cannot identify which controller is the faulty unit so that reconfiguration of the network cannot occur. As a result, voting cannot be relied upon as a primary fault tolerance mechanism in a dual redundancy network. A second problem with conventional redundancy management systems is the cascading of faults, i.e. a single fault may cause faults in two or more units due to the coupling introduced by the redundancy management equipment.
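The difference between voting in a high redundancy network and in a dual redundancy network can be illustrated with a minimal sketch. The function name `vote` and the representation of controller outputs as a simple list are assumptions made for illustration only; they are not taken from the disclosure.

```python
from collections import Counter


def vote(outputs):
    """Compare redundant controller outputs.

    Returns (fault_detected, index_of_faulty_unit). The index is None
    when no fault is seen or when the faulty unit cannot be identified.
    """
    counts = Counter(outputs)
    if len(counts) == 1:
        return (False, None)  # all units agree
    value, n = counts.most_common(1)[0]
    if n > len(outputs) / 2:
        # A strict majority exists, so a single dissenter is the odd man out.
        dissenters = [i for i, v in enumerate(outputs) if v != value]
        return (True, dissenters[0] if len(dissenters) == 1 else None)
    return (True, None)  # mismatch detected, but no majority to arbitrate
```

With three units, `vote([5, 5, 7])` both detects the mismatch and points at the third unit. With two units, `vote([5, 7])` detects the mismatch but cannot say which unit is at fault, which is the limitation described above.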
In accordance with the present invention the above-mentioned drawbacks of conventional redundancy networks, including their associated fault diagnostics, are overcome by a fault detection and redundancy management system that reduces hardware requirements while providing accurate fault detection and network auto-configuration and preventing the cascading effect of faults as needed. To this end, the present invention employs a dual redundancy based network architecture in which the principal control components (master units) are multiprocessor-configured and programmed to repetitively carry out intra- and inter-unit performance tests as an a priori requirement for network command capability. These performance tests are carried out by independent devices in a prescribed sequence to define the fault detection procedure. As a first step in this procedure, each processor in a master unit performs a thorough self-test of its own functional capability and the capability of the master unit as seen from the processor's vantage point. This may include, but is not limited to, a test of internal memory, ALU and data parity. Secondly, if a processor has determined that it has passed all of these internal procedures and external checks appear acceptable, it must then successfully inform a designated "chief" processor via an interprocessor handshake. This arrangement is fail-safe because any failure to execute the correct procedure will result in a redundancy management action. This interprocessor handshake is effected by causing each processor in the master unit to set a flag in shared memory during a prescribed time interval. These flags are read by the chief processor to determine whether to enable an associated bus controller for the next succeeding time interval, and once these flags have been read they are reset by the chief processor, as each processor is required to refresh the handshake flag during successive repetitive time intervals. 
If the flags indicate that each processor is capable of successfully performing its intended processing operations, the chief processor then raises respective MASTER REQUEST and CPU RESET INHIBIT flags in preselected locations in memory, to advise the bus controller of the health of the processors of the master unit. The CPU RESET INHIBIT flag is employed to prevent resetting of the processors during initialization.
The chief processor in a master unit also performs a number of additional checks to determine the ability of the master unit to perform as bus master (take command of the network). These tests include two checks to verify synchronization of the bus controller with the chief processor. A clock/interrupt handshake is used to verify that a repetitive interrupt check and a data input interrupt, which is generated by the bus controller during a polling sequence, occur in the proper order. A loss of synchronization test is carried out to ensure that the processing units and their associated bus controller are operating during the same time window. Should the chief processor detect an anomaly in these tests, it stops enabling the bus controller and fails to set the CPU RESET INHIBIT flag to cause the master unit in question to go off-line, perform a CPU reset and attempt to once again gain "MASTER" status.
The bus controller is enabled by a dedicated flag that is placed in memory by the chief processor - the MASTER REQUEST flag (referenced above). In addition, a CPU RESET INHIBIT flag is employed. In normal operation, both flags will be set periodically by the chief processor and read and reset periodically by the bus controller. In the event of a fault both flags would normally fail to be set. In this circumstance the bus controller will drop off-line and a reset of all units in the affected master unit will occur. (This constitutes an attempt to correct the fault.) During initialization following a CPU reset, the CPU RESET INHIBIT flag is set by the chief processor without the MASTER REQUEST flag being set. This enables the master unit to initialize fully before further redundancy management action is taken. To maintain, or be eligible for, bus mastership, the chief processor in a master controller must refresh both of these flags at a prescribed repetition rate. Failure to set the MASTER REQUEST flag prevents the impacted bus controller from initiating commands on the network bus. Failure to set the CPU RESET INHIBIT flag causes all processors within the master unit to be reset by the bus controller.
If the chief processor of the master unit currently acting as the bus master fails to set either the MASTER REQUEST flag or the CPU RESET INHIBIT flag, the bus controller associated with the master unit will cease polling remote devices on the network bus. The bus controller of the redundant (backup) master unit will detect silence on the network bus and, provided that its MASTER REQUEST and CPU RESET INHIBIT flags are set, will then take command of the network bus. This procedure allows command of the bus to be transferred without direct communication (e.g. voting) between master units.
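The flag-driven transfer of bus command described above can be sketched as follows. This is a simplified model under stated assumptions: the class `BusController`, its method names, and the `bus_silent` argument are hypothetical, and one call to `service` stands in for one flag-refresh period.

```python
class BusController:
    """Hypothetical model of one master unit's bus controller."""

    def __init__(self, in_command=False):
        self.in_command = in_command
        self.master_request = False
        self.cpu_reset_inhibit = False
        self.cpu_resets = 0  # count of processor resets issued

    def refresh_flags(self, master_request=True, cpu_reset_inhibit=True):
        """Stands in for the chief processor setting the flags in memory."""
        self.master_request = master_request
        self.cpu_reset_inhibit = cpu_reset_inhibit

    def service(self, bus_silent):
        """Read and reset both flags, then act on what was read."""
        request, inhibit = self.master_request, self.cpu_reset_inhibit
        self.master_request = self.cpu_reset_inhibit = False
        if not inhibit:
            self.cpu_resets += 1     # reset all processors of this master unit
        if not (request and inhibit):
            self.in_command = False  # cease polling; drop off-line
        elif bus_silent:
            self.in_command = True   # eligible and bus is quiet: take command
        return self.in_command
```

In this model a standby unit whose flags are both set takes command only when it sees a silent bus, so no message is exchanged between the two master units. Setting only CPU RESET INHIBIT, as during initialization, avoids a reset but leaves the unit ineligible for command.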
While the above described procedure provides automatic reconfiguration of the network in response to a single failure, because of the multiple independent processor configuration of the architecture, it is possible to program the system to tolerate more than a single failure if the network can be permitted to operate in a degraded mode, depending upon the network application and the impact of the failure(s). In this circumstance, the operation-control routine of the chief processor may be established to simply log a failure from another processor that does not critically affect network operation, for the purpose of maintenance or operator (pilot) information, and thereafter continue to set the MASTER REQUEST and CPU RESET INHIBIT flags. Because the processor which detects a failure during a self-health test does not carry out an interprocessor handshake, it is latched out of the system until a CPU RESET occurs. During this time, a preestablished fault recovery procedure stored in local memory is carried out by the faulty processor, and the system continues to operate in the acceptable degraded mode without the faulty processor.
Applications which do not require redundant master units may also provide a fault tolerant operation with degraded performance in response to a fault. For this purpose, each processor is programmed to assume responsibility for setting the MASTER REQUEST and CPU RESET INHIBIT flags. A processor which detects a failure in its own operation will disable its interrupts and execute the above-mentioned preestablished fault recovery procedure while awaiting a CPU RESET. As long as at least one processor is operational, it will set the MASTER REQUEST and CPU RESET INHIBIT flags so that the system will continue to operate, but without the benefit of those tasks assigned to the faulty processor(s). The bus controller may be programmed to periodically output the processor status flags to a remote unit, so as to provide an indication of processor status.
FIG. 1 is a block diagram of a dual redundancy data handling/signal processing network;
FIG. 2 is a block diagram of the architecture of a master unit employed in the network of FIG. 1;
FIG. 3 is a block diagram of the architecture of a remote (local) unit of the network of FIG. 1;
FIG. 4 is a timing diagram of the constituent portions of an individual one of a series of repetitive frames that govern the operation of the network of FIG. 1;
FIG. 5 is a state diagram of a bus master unit as related to command of the network bus;
FIG. 6 is a flow chart of the redundancy management/fault diagnostic routine carried out by an individual processor of a master unit;
FIG. 7 is a flow chart of the redundancy management/fault diagnostic routine carried out by the chief processor of a master unit; and
FIG. 8 is a flow chart of the redundancy management/fault diagnostic routine carried out by the bus controller of a master unit.
Before describing, in detail, the particular improved fault detection and redundancy management system in accordance with the present invention, it should be observed that the present invention resides primarily in a novel structural combination of conventional computer circuits and not in the particular detailed configurations thereof. Accordingly, the structure, control and arrangement of such circuits have been illustrated in the drawings by readily understandable block representations and schematic diagrams, which show only those specific details that are pertinent to the present invention, in order not to obscure the disclosure with structural details which will be readily apparent to those skilled in the art having the benefit of the description herein. In addition, various portions of an electronic data processing system have been appropriately consolidated and simplified in order to emphasize those portions that are most pertinent to the present invention. Thus, the block diagram illustrations of the Figures do not necessarily represent the mechanical structural arrangement of the exemplary system, but are primarily intended to illustrate the major structural components of the system in a convenient functional grouping, whereby the present invention can be more readily understood.
Referring to FIG. 1 there is shown a block diagram of a dual redundancy data handling/signal processing network, the various units of which are distributed along a pair of identical communication buses 10A and 10B which serve as the basic signal transmission link for all the units of the network. As such, each bus 10A and 10B contains a set of address, data and control links through which messages are transferred among the bus users. As noted briefly above, an exemplary environment in which the network architecture of FIG. 1 may be advantageously employed is an airborne (e.g. helicopter) flight control system for performing a number of flight-critical and mission-related tasks such as automated flight control, navigation, and engine/weapons monitoring. It should be observed, however, that such network architecture and the task assignments implemented therein are not limited to this or any other environment and, accordingly, should not be so construed. Instead, both the dual redundancy network and the fault detection and redundancy management system incorporated therein have utility in a number of applications. An airborne control system is described as an exemplary environment because of the weight and size limitations, the success-dependent nature of the tasks carried out by the network and the critical need to continuously provide system operation, regardless of the occurrence of subsystem anomalies. Given an airborne environment, each of buses 10A and 10B may be a military standard (MIL-STD) 1553B bus to which respective sets of master units 11A, 11B, local units 12A and 12B and remote units 13A, 13B are interfaced.
Each of master units 11A, 11B is the basic component through which all data handling/signal processing and operation of the network is effected, and each provides a redundant processing capability (with respect to the other master). The local units 12A, 12B and the remote units 13A, 13B constitute a dual redundant set of interfaces between the master units 11A, 11B and external flight control sensors and actuators (not shown).
The internal architecture of a master unit is shown in FIG. 2 as comprising a bus controller or bus interface unit (BIU) 21, through which messages to and from the master unit are interfaced with each of buses 10A and 10B, a set of shared memory units 22 and 23 and a plurality of (four in the example shown) processors 24-27 and associated processor memories 34-37. Communications internal to the master unit are carried out over an internal shared bus 30. In the set of memory units, memory unit 22 is a shared random access memory (RAM) which provides input/output (I/O) buffer storage for data transfers to and from buses 10A and 10B via bus controller 21 and for message transfers among internal processors 24-27. Memory unit 23 is a shared nonvolatile memory for storing critical configuration and control parameter data which cannot be lost in the event of a system failure or shut down. Each of memory units 22 and 23 also serves as a shared data storage area within the master unit having specified addresses to and from which intra-master unit communications take place. Namely, this shared memory set serves as a "bulletin board" for data that may be required by processors 24-27 or bus controller 21 within the master unit. Thus, results of computations by processors 24-27 are placed in shared memory where they can be read by any unit in the master via intra-master bus 30. This bulletin board approach to storing the results of processor actions (flight control computations, sensor output evaluations, etc.) simplifies the housekeeping required for message transfers among processors 24-27 and bus controller 21.
Access to bus 30 is effected in accordance with a preestablished hardware arbitration scheme among the components of the master unit to prevent conflicting memory access requests and collisions on the bus and to prevent any unit on the bus from tying up the bus, as is conventionally employed in multi-processor systems. For this purpose bus 30 is a distributed bus structure, with the arbitration scheme daisy-chained among users of the bus. Any unit which requests and is given access to the bus is permitted only a prescribed communication interval for message transmission, after which it relinquishes the bus to the next requesting unit. This prevents the bus from being monopolized by one unit and thereby prevents cascading of faults. Again, as noted previously, the circuits and bus structure of which the dual redundancy network embodying the present invention is configured employ conventional components and communication protocol, and details of the same will not be described except where necessary to provide full appreciation of the invention.
As shown in FIG. 2 one of the processors 24-27, here processor 24, is designated as a "chief" processor which is used to coordinate the fault detection and redundancy management activity between the processors and the bus controller in accordance with the present invention. It is to be noted that any of processors 24-27 may be designated to perform this function, as it is established by software. Moreover, the number of processors and associated memories is not limited to four but may be any number consistent with the required speed of the system and the principles of the invention. The incorporation of processors 24-27 and associated memories 34-37 with a pair of shared memories is based upon the number of flight/mission control functions to be carried out in the selected (e.g. helicopter) airborne environment used in the present example.
As described above, each master unit communicates over bus 10 with associated I/O controllers (the local and remote units) via bus controller 21. Bus controller 21 independently handles the communications over bus 10 via the assigned protocol (such as MIL-STD-1553B in the present example) for both transmission and reception of messages. During transmission from the master unit, bus controller 21 accesses preassigned buffer storage in either of memories 22 and 23 and assembles the accessed data into the format of bus 10. During reception an incoming message on bus 10 is captured by the bus controller 21 and the information is placed in specified locations in shared memory 22/23. This relieves processors 24-27 of communication bus housekeeping chores, so that they can be maximally used for high speed computations, and it isolates failures in the processors from failures in the bus controller.
Local units 12A, 12B and remote units 13A, 13B interface data/control transfers between the master units 11A, 11B and sensor/actuator elements of the aircraft and are comprised of a set of I/O, interface and storage elements shown in FIG. 3. As in a master unit, each local and remote unit contains a bus interface unit 41 through which messages to and from the bus 10 are coupled to and from buffer storage in local memory 43. Local memory 43 has an associated memory controller 42 and is coupled over bus 45 to I/O controller 44 and bus interface unit 41. I/O controller 44 contains digital-analog and analog-digital converter circuitry, as well as buffer storage units and signal amplifier circuitry, for interfacing signals between the flight action components of the aircraft and the local memory 43. The designations "local" and "remote" are used simply from a standpoint of proximity of these units to the hardware housing the master units in the aircraft. For example, when the master unit rack bay is adjacent to the cockpit part of the helicopter, the forward sensors are serviced by a "local" unit, whereas the tail rotor is serviced by a "remote" unit. Also, the number of such units is not limited to the four shown, but may be any number consistent with the demands of the environment and the specified network bus.
As pointed out above, operation of the redundancy network is controlled by one of the master units 11A, 11B, with the other master, if healthy, being in a standby state and carrying out exactly the same computations in its internal processors so that it is continuously available as a replacement for the command master in the event of a fault. The units of the network are interrupt-driven and synchronized by a repetitive timing signal scheme shown in FIG. 4. Within each master a basic system clock is employed to generate a SYNCHRONIZE COMMAND on bus 10 at the beginning of each of a sequence of prescribed time intervals, termed frames, each of which is comprised of a plurality of shorter intervals, termed superframes, so that a superframe is repeated at some multiple of the repetition rate of a frame. For example in a helicopter environment, a SYNCHRONIZE COMMAND may be issued at a 30 Hz repetition rate, with superframe interrupts (SIs) repeated at a 180 Hz rate. The SYNCHRONIZE COMMAND from each of master units 11A and 11B is conveyed over each of buses 10A and 10B in alternating succession, so that any unit along bus 10 that cannot receive messages on one bus (10A or 10B) will receive the SYNCHRONIZE COMMAND on the other bus (at a 15 Hz rate which is sufficient to maintain adequate synchronization among the units).
Within the master unit proper, synchronization on intra-master bus 30 is controlled by a superframe interrupt (SI) which is generated at the beginning of each superframe (at a 180 Hz rate). This superframe interrupt (SI) is used to identify the beginning of each superframe, with six superframes occurring within a frame at the interrupt repetition ratio (180 Hz/30 Hz=6 superframes/frame). It should be observed, however, that the timing scheme employed herein is not limited to these frequencies or ratios but may be tailored according to the needs of the environment and signal processing components employed. Through the use of the superframe interrupts, which occur at a multiple of the basic SYNCHRONIZE COMMAND rate, tasks can be easily scheduled to run at submultiples of the superframe interrupt rate. For example, sensor data required every other superframe could be processed in the master unit at a 90 Hz rate.
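The timing relationships of this example can be checked with a few lines of arithmetic. The constant and function names below are illustrative assumptions; only the 30 Hz and 180 Hz figures come from the description.

```python
FRAME_RATE_HZ = 30        # SYNCHRONIZE COMMAND repetition rate
SUPERFRAME_RATE_HZ = 180  # superframe interrupt (SI) repetition rate


def superframes_per_frame():
    """Number of superframe interrupts within one frame."""
    return SUPERFRAME_RATE_HZ // FRAME_RATE_HZ


def superframe_period_ms():
    """Length of one superframe in milliseconds."""
    return 1000.0 / SUPERFRAME_RATE_HZ


def task_rate_hz(every_n):
    """Effective rate of a task that runs once every `every_n` superframes."""
    return SUPERFRAME_RATE_HZ / every_n


def task_due(superframe_count, every_n):
    """True when a task scheduled every `every_n` superframes should run."""
    return superframe_count % every_n == 0
```

Here `superframes_per_frame()` gives 6 and `task_rate_hz(2)` gives 90.0, matching the figures in the text, while `superframe_period_ms()` is roughly 5.56 msec.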
Prior to describing the fault detection and redundancy management scheme incorporated in the network architecture described above, an explanation of the basic network operation will be presented in order that the impact of the invention on such network may be more readily appreciated. In the description to follow the operational sequence will be understood to relate to both redundant systems A and B of the network although only a single system will be described.
In the environment under consideration, flight critical data is continuously and repetitively being made available for processing, is updated, and control signals are generated and refined to carry out aircraft navigation, control surface adjustment, engine control, etc. This data is multiplexed (in time division multiplexed (TDM) message packets) over the network bus 10 among the local and remote units and the master unit. The master unit has stored in memory a primary control (executive) program that dictates the data handling/signal processing sequence to be executed. (The details of the various routines within the executive task program are not required for an understanding of the present invention and, accordingly, will not be described here. It should be noted, however, that critical deterministic processing tasks are guaranteed execution time by virtue of the superframe interrupt timing scheme shown in FIG. 4.)
More particularly, at the beginning of each superframe, a superframe interrupt (SI) causes the master unit's bus controller 21 to access a polling list stored in dedicated memory which identifies I/O operations that must be performed during that particular superframe. In accordance with this list, bus controller 21 will write inputs (from interface units (local/remote units) supplied over bus 10) into memory 22/23 via bus 30, or will read out data stored in memory 22/23 and assemble messages containing this data for transmission over bus 10 to the appropriate interface device (local unit 12 or remote unit 13). Upon completion of the input messages for a given superframe, bus controller 21 generates a data input interrupt (DII) (which can occur at any point in the polling list) to which the processors in the master unit respond by processing new data that has been written into shared memory during the I/O window between the superframe interrupt (SI) and the data input interrupt (DII). When the items of the current list have been processed in processors 24-27 and the results have been placed in their preassigned locations in memories 22/23, the master unit (being interrupt driven as noted above) waits until the next superframe interrupt, whereupon the above process is repeated, using the polling list for the next superframe. It should be noted here that the various data processing tasks that are required to be carried out for proper operation and control of the environment of interest (e.g. helicopter flight/mission control) are appropriately subdivided among the superframes so that they are guaranteed to run to completion once started. In other words, loading allocations within the task list are effected on a worst-case basis, to permit sufficient time in a superframe for the processors of the master unit to complete execution of their assigned tasks.
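The per-superframe polling sequence can be sketched as below. This is a simplified, synchronous model: the tuple layout of a polling-list entry, the function `run_superframe`, and the callbacks `read_remote` and `on_dii` are all assumptions for illustration, and the processors' response to the DII is modeled as a direct function call rather than as concurrent execution.

```python
def run_superframe(polling_list, shared_memory, read_remote, on_dii):
    """Work through one superframe's polling list.

    Each entry is (op, unit, key) with op equal to "in" or "out".
    Returns the list of (unit, key, value) messages sent on the bus.
    If the list holds no inputs, no DII is raised in this sketch.
    """
    sent = []
    remaining_inputs = sum(1 for op, _unit, _key in polling_list if op == "in")
    for op, unit, key in polling_list:
        if op == "in":
            # Write the input received from the interface unit to shared memory.
            shared_memory[key] = read_remote(unit, key)
            remaining_inputs -= 1
            if remaining_inputs == 0:
                on_dii(shared_memory)  # data input interrupt: inputs complete
        else:
            # Read out stored data and assemble an outgoing message.
            sent.append((unit, key, shared_memory.get(key)))
    return sent
```

Because the DII fires as soon as the last input has been written, it can fall at any point in the list, and outputs listed after it pick up whatever results the processors have placed in shared memory.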
The length of a frame and the number of superframes per frame, as well as the speed of the processing, will govern the number of tasks that may be performed within the context of operation of the particular control function of interest. In the present example of an airborne flight/mission control network, the approximately 5.5 msec time window per superframe within a frame window of 33.33 msec has been found to be sufficient for performing such tasks and also adequate to allow for additional statistical processing (among the background tasks, which need not be completed in a specific time frame) that is not critical to the successful operation of the environment (aircraft) being controlled.
In addition to ensuring that task allocation among the processing intervals (superframes/frame) is adequate to generate successful system operation when the network is performing properly, there must also be built into the network the capability of effecting replacement of a network component upon the detection of a fault. Since data handling/signal processing functions are performed by the master units, it is imperative that the performance of the master unit be monitored continuously and, when a prescribed fault tolerance is exceeded, that the redundant master unit be substituted immediately. As pointed out above, this is accomplished according to the present invention by a new and improved fault detection and redundancy management scheme pursuant to which the health of the bus master is checked by multiple independent processors through a mechanism of requiring the internal makeup of each master to examine itself and its view of the health of the master unit and report on that check.
As described above, for controlling the operation of a helicopter, data I/O and signal processing tasks are executed deterministically according to a predetermined executive schedule in the master unit. Since the health of a master unit is, by its very nature, critically deterministic, fault detection is also scheduled to be responsive to superframe interrupts (SIs). In accordance with the invention, the master unit carries out this self-evaluation by requiring each processor in the unit both to perform specified tasks and to report its ability to communicate. This latter function is vitally important since, even when a processor can process data, if it cannot transfer the results of that processing to the required locations in shared memory (the communication mechanism in the master unit), then the processor is effectively useless.
Referring now to FIG. 6, showing a flow chart of the self diagnostic/handshake procedure of an individual processor, the above procedure is initiated in response to a superframe interrupt (SI) at step 601. In accordance with a preferred embodiment of the invention, step 601 corresponds to an SI every other superframe (here, the odd superframes) in response to which the chief processor takes action (in the even superframes, as will be described below). At step 602, the processor carries out self diagnostics to determine the processor's view of system health. If, as a result of these tests, the system appears healthy (step 603-YES), the processor is required to set an interprocessor handshake flag (step 604) in memory 22. Not only does this flag indicate the health of the processor, but it also equates with the ability of the processor to communicate and may therefore be recognized as an interprocessor handshake. The routine then exits at step 605. If the result of step 603 is NO, the routine exits at step 605 with no interprocessor handshake having taken place for that particular processor.
As noted previously, if the processor detects a failure in its own operation (in the event of data or program dependent errors such as memory parity, illegal instruction, etc.), that processor will disable its own interrupts, attempt to log the error for maintenance purposes, and execute a no-operation loop, so that step 604 is not executed and an interprocessor handshake is not conveyed to memory 22.
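The per-processor routine of FIG. 6, together with the latch-out behavior just described, can be sketched as follows. The class `Processor`, its attribute names, and the use of a Python exception to stand in for a hardware trap (memory parity, illegal instruction) are assumptions for illustration; shared memory 22 is modeled as a dictionary of handshake flags.

```python
class Processor:
    """Hypothetical model of one non-chief processor in a master unit."""

    def __init__(self, proc_id, self_test):
        self.proc_id = proc_id
        self.self_test = self_test      # callable returning True when healthy
        self.interrupts_enabled = True
        self.error_log = []

    def on_superframe_interrupt(self, handshake_flags):
        """Steps 601-605: run diagnostics and set the handshake if healthy."""
        if not self.interrupts_enabled:
            return False                # latched out until a CPU reset
        try:
            healthy = bool(self.self_test())     # step 602
        except Exception as exc:        # stands in for a hardware error trap
            self.interrupts_enabled = False      # disable own interrupts
            self.error_log.append(repr(exc))     # log for maintenance
            return False
        if healthy:                               # step 603
            handshake_flags[self.proc_id] = True  # step 604
        return healthy                            # step 605
```

A processor that latches itself out never reaches step 604 again, so its flag stays clear on every later check until a reset re-creates it in this model.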
One of the processors, here processor 24, assigned the task of being the chief processor, has stored in PROM 34 the task of reading memory 22 for the presence of all such flags, including its own. This is shown in FIG. 7, which is a flow chart of the routine carried out by chief processor 24. As illustrated therein, chief processor 24 responds to a superframe interrupt (step 701) and, depending upon the superframe of interest (step 702), will either carry out diagnostic tests and report on these tests (raising the MASTER REQUEST and CPU RESET INHIBIT flags) if the superframe is an odd superframe, or it will repeat (refresh) the MASTER REQUEST and CPU RESET INHIBIT flags of the previous superframe, if the superframe is even (step 709). After processor 24 has executed its own self diagnostics test (step 703) and examined memory 22 for interprocessor handshakes (step 704), it inquires as to the status of the system, i.e. whether the system is being initialized (step 705). If the system is being initialized, the CPU RESET INHIBIT flag is set in step 706 and the routine exits at step 710. (This flag prevents a lock-up reset loop during system start-up.) If the mode is other than initialization, processor 24 checks system health at step 707. Namely, it checks to see if all interprocessor handshakes have been set or, if a degraded level of system performance is tolerable, whether those handshakes critical to this tolerable level of performance have been set. If the system is healthy (completely or tolerably), processor 24 raises the MASTER REQUEST and CPU RESET INHIBIT flags in step 708 and exits at step 710. If the system is not healthy, neither flag is set. As mentioned previously, the repeated raising of the MASTER REQUEST flag every superframe is an a priori requirement for bus command capability by the master unit.
Also, the chief processor 24 resets all handshake flags immediately after reading them (step 704), since each processor must refresh its handshake flag every other superframe for the master to be considered healthy.
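By way of illustration only, the routine of FIG. 7 may be sketched in Python as follows. The names `SharedMemory`, `chief_superframe`, `critical` and `previous` are hypothetical and do not appear in the disclosure. The sketch also assumes that superframes are numbered so that odd and even values alternate, and that only the CPU RESET INHIBIT flag is raised during initialization.

```python
from dataclasses import dataclass, field


@dataclass
class SharedMemory:
    """Hypothetical stand-in for the flag locations in shared memory 22."""
    handshakes: dict = field(default_factory=dict)  # processor number -> flag
    master_request: bool = False
    cpu_reset_inhibit: bool = False


def chief_superframe(mem, superframe, initializing, critical, previous):
    """Run one pass of the FIG. 7 routine.

    Returns the (MASTER REQUEST, CPU RESET INHIBIT) pair raised this
    superframe. `previous` is the pair raised in the preceding superframe.
    """
    if superframe % 2 == 0:
        # Even superframe (step 709): refresh the previous flags unchanged.
        flags = previous
    else:
        # Step 704: read every handshake flag, then reset it at once so that
        # a stale flag cannot pass for a fresh one.
        seen = dict(mem.handshakes)
        for pid in mem.handshakes:
            mem.handshakes[pid] = False
        if initializing:
            # Step 706 (assumption: MASTER REQUEST stays low during start-up).
            flags = (False, True)
        elif all(seen.get(pid, False) for pid in critical):
            # Step 708: every handshake that matters has been set.
            flags = (True, True)
        else:
            # Unhealthy system: neither flag is set.
            flags = (False, False)
    mem.master_request, mem.cpu_reset_inhibit = flags
    return flags
```

Passing only a subset of processors in `critical` models the case in which a degraded level of performance is tolerable.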
As noted above, the interprocessor handshake is generated by a processor after the processor has successfully completed a series of self-tests. The specifics of such tests, per se, are not required for an understanding of the present invention and will not be described in detail here. As examples of such tests, however, each processor may examine the performance of its own ALU (arithmetic logic unit) by executing a representative subset of that processor's arithmetic and logical instructions and comparing the results of the execution to specified values stored in the processor's associated PROM. Such tests may include the use of all the processor's registers so as to fully examine the health of the processor.
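An ALU test of the kind described above might be sketched as follows. The operand values, the 16-bit word width, the `REFERENCE` table (standing in for expected values held in PROM) and the `execute` hook are all assumptions chosen for illustration.

```python
import operator

# Hypothetical reference table: (operation, operand a, operand b, expected).
REFERENCE = [
    (operator.add, 0x1234, 0x0F0F, 0x2143),
    (operator.sub, 0x1234, 0x0F0F, 0x0325),
    (operator.and_, 0x1234, 0x0F0F, 0x0204),
    (operator.or_, 0x1234, 0x0F0F, 0x1F3F),
    (operator.xor, 0x1234, 0x0F0F, 0x1D3B),
]


def alu_self_test(execute=lambda op, a, b: op(a, b)):
    """Run each reference operation and compare against the stored value.

    `execute` models the processor's ALU; a test harness can substitute a
    faulty implementation to confirm that the test detects it.
    """
    return all(
        (execute(op, a, b) & 0xFFFF) == expected
        for op, a, b, expected in REFERENCE
    )


def report_handshake(handshakes, pid, execute=lambda op, a, b: op(a, b)):
    """Set the handshake flag for processor `pid` only if the test passes."""
    if alu_self_test(execute):
        handshakes[pid] = True
    return handshakes.get(pid, False)
```

A processor whose test fails simply never sets its flag, which is how the chief processor learns of the failure.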
Another useful test is a memory evaluation (write-read test), which may be performed by writing specified reference data patterns in real time to preestablished locations in each of non-volatile memory 23, shared memory 22 and local memory (34-37) and reading out the contents of those locations to verify the results. Such a test is principally intended to check interfaces between the processors and memory.
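The write-read test may be sketched as below. The particular data patterns and the function name `write_read_test` are hypothetical, and restoring the prior contents of each location is an added assumption, since the disclosure speaks only of preestablished test locations.

```python
# Hypothetical reference patterns: all zeros, all ones, alternating bits.
PATTERNS = (0x0000, 0xFFFF, 0xAAAA, 0x5555)


def write_read_test(memory, addresses, patterns=PATTERNS):
    """Write each pattern to each test location and read it back.

    `memory` is any indexable, writable object. Returns True only if every
    pattern reads back unchanged at every address.
    """
    for addr in addresses:
        saved = memory[addr]
        try:
            for pattern in patterns:
                memory[addr] = pattern
                if memory[addr] != pattern:
                    return False
        finally:
            # Assumption: put back whatever the location held before the test.
            memory[addr] = saved
    return True
```

The same routine can be pointed at models of non-volatile memory 23, shared memory 22 and local memory in turn, so that a failure identifies which interface is suspect.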
In addition to the tests exemplified above, the processors may also carry out parity checks as a check on memory transactions; moreover, further self-diagnostics, such as checks of clock/interrupt generation and task overflow, may be employed to indicate the health of the processor. As noted above, the specific tests, per se, and the details thereof, may be selected from a variety of self-diagnostic functions employed in current data processors. What is important, however, is that there be provided an indication of the results of tests (and the ability of the processor to communicate) by the setting of the handshake (and health) flag by the processor in shared memory 22. Namely, a dual function is achieved by the flag--an indication of the processor's computation capability (intraprocessor signaling) and its data transfer capability (interprocessor signaling).
As noted above, when the chief processor 24 has successfully read all handshake flags in memory 22, it raises a MASTER REQUEST flag in a specified location in memory 22, which is read by bus controller 21. It may also perform a number of global master unit checks to further determine the ability of the master unit to take command of the bus. Such global checks may include inquiries for verifying synchronization of bus controller 21 with chief processor 24. One such check is to simply observe whether the superframe interrupt (180 Hz repetition rate) precedes the data input interrupt (DII) for each superframe, namely, that SI and DII interrupts occur in the proper order. A second global check (a loss of sync test) may be used to determine that the processors 24-27 and bus controller 21 are operating within the same superframe. This is accomplished through the use of software counters for the chief processor 24 and bus controller 21 and comparing in the chief processor the count values of each. Each software counter is reset to zero and counts through five (for the six superframes per frame) before being reset.
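The two global checks may be sketched as follows. The class and function names are hypothetical, and representing interrupt arrival as timestamps is an assumption made to keep the example small.

```python
SUPERFRAMES_PER_FRAME = 6  # each counter runs 0 through 5, then resets


class SuperframeCounter:
    """Software counter kept for the chief processor or the bus controller."""

    def __init__(self):
        self.count = 0

    def tick(self):
        """Advance by one superframe, wrapping to zero after five."""
        self.count = (self.count + 1) % SUPERFRAMES_PER_FRAME
        return self.count


def loss_of_sync(chief_counter, bus_counter):
    """True if the two units are not operating within the same superframe."""
    return chief_counter.count != bus_counter.count


def interrupts_in_order(si_time, dii_time):
    """True if the superframe interrupt preceded the data input interrupt."""
    return si_time < dii_time


def global_checks_pass(chief_counter, bus_counter, si_time, dii_time):
    """Both checks must pass before MASTER REQUEST may be raised."""
    return (not loss_of_sync(chief_counter, bus_counter)
            and interrupts_in_order(si_time, dii_time))
```

Because the counters wrap every six superframes, a unit that slips by exactly one full frame would go unnoticed by this check alone; the sketch does not attempt to address that case.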
Should chief processor 24 detect an anomaly in any of the above required checks, it will cease raising the MASTER REQUEST flag. When bus controller 21 fails to see the MASTER REQUEST flag raised, it ceases polling on bus 10, to allow the current backup master unit, if healthy itself, to take command of the bus 10. In order for the backup master to be considered healthy, its chief processor 24 must have raised (and continue to raise every superframe) the MASTER REQUEST flag and a CPU RESET INHIBIT flag in dedicated locations in memory 22.
If the MASTER REQUEST flag has not been set in memory 22 (i.e. is not refreshed for each superframe, as noted previously), the associated bus controller 21 is prevented from communicating over bus 10 to the local and remote units, so that the particular master unit is ineligible to take command of bus 10. Similarly, if the CPU RESET INHIBIT flag is not refreshed in memory 22 every superframe, bus controller 21 immediately resets each of processors 24-27 in the master unit. Thus, if chief processor 24 of the current bus master unit fails to refresh either of the above flags each superframe, bus controller 21 will no longer poll the local and remote units 12 and 13 along bus 10.
Transfer of command of bus 10 between master units 11A and 11B is initiated when the bus controller 21 of the backup master senses inactivity over bus 10. Provided that its own MASTER REQUEST and CPU RESET INHIBIT flags are refreshed, bus controller 21 then proceeds to take command of the bus without any direct communication from the other master unit.
The above action is illustrated in FIG. 5, which shows a state diagram for a bus master. If a master unit is healthy and in command of the bus, its status is that of bus master 51. In the event of a failure, upon which the other master unit takes over command of the bus, the faulty master unit goes off line 52 and carries out internal procedures to try to regain its master status. Once again healthy, the master unit acts as a "hot back-up" slave 53 for the current bus master. If the current bus master fails, then the "hot back-up" slave master unit assumes command of the bus 51.
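The state diagram of FIG. 5 may be expressed as a small transition function. The enumeration values reuse the reference numerals 51, 52 and 53; the names are hypothetical. The transition from hot back-up to off line on loss of health is an assumption, inferred from the requirement that a backup master be healthy.

```python
from enum import Enum


class MasterState(Enum):
    BUS_MASTER = 51   # healthy and in command of the bus
    OFF_LINE = 52     # faulty, attempting internal recovery
    HOT_BACKUP = 53   # healthy, standing by for the current bus master


def next_state(state, healthy, other_master_failed):
    """Return the state of a master unit after one evaluation."""
    if not healthy:
        # Any unhealthy unit is off line (assumed also for a failed backup).
        return MasterState.OFF_LINE
    if state is MasterState.OFF_LINE:
        # A recovered unit rejoins as hot back-up, not as bus master.
        return MasterState.HOT_BACKUP
    if state is MasterState.HOT_BACKUP and other_master_failed:
        # The backup assumes command when the current bus master fails.
        return MasterState.BUS_MASTER
    return state
```

Note that a recovered unit never returns directly to bus master status; it must first pass through the hot back-up state.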
Referring to FIG. 8, a flow chart of the above routine carried out by the bus controller 21 is shown. Bus controller 21 responds to a superframe interrupt at step 801 and checks the status of the CPU RESET INHIBIT flag at step 802. If the chief processor has failed to raise the CPU RESET INHIBIT flag, bus controller 21 issues a CPU reset to each of the processors in the master unit (step 808) and then waits for the next superframe interrupt (step 809). Otherwise, bus controller 21 next checks to see if the MASTER REQUEST flag has been raised (step 803). If not, the bus controller monitors transmission link 10 for activity, issues data input interrupts (DIIs) as required (step 807) and waits for the next SI. If the chief processor has raised the MASTER REQUEST flag, bus controller 21 checks to see if it is currently master of the bus (step 804). If its master unit is currently in command of the bus, it polls the various I/O units (step 806), issues DIIs and then waits for the next SI (steps 806, 809).
As pointed out previously, the mechanism for effecting master unit swapover is monitoring silence on bus 10. If a master unit is not currently in command of the bus, it is conducting redundant (backup) data processing operations that will ensure network operational success in the event of a failure of the other master unit. This mechanism is effected by monitoring the bus for inactivity (silence) via a time out inquiry (step 805). If a prescribed time out interval during which no activity on bus 10 is detected is exceeded, the backup master unit immediately assumes command of the bus (bus controller issues I/O polls (step 806)).
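The decisions of FIG. 8, including the silence time-out of step 805, may be sketched as a single function evaluated once per superframe. The function name, the action labels and the units of `silent_time` are hypothetical. The sketch assumes that a backup which has not yet timed out simply continues to monitor the bus as in step 807.

```python
def bus_controller_superframe(cpu_reset_inhibit, master_request,
                              is_master, silent_time, timeout):
    """Return (actions, is_master) for one superframe interrupt (step 801)."""
    if not cpu_reset_inhibit:
        # Step 808: the chief processor failed to raise CPU RESET INHIBIT.
        return ("reset_cpus",), False
    if not master_request:
        # Step 807: not eligible for command; listen and issue DIIs only.
        return ("monitor_bus", "issue_dii"), False
    if not is_master and silent_time <= timeout:
        # Step 805: the other master is still active on the bus
        # (assumption: keep monitoring as in step 807).
        return ("monitor_bus", "issue_dii"), False
    # Step 806: already in command, or the bus has been silent too long.
    return ("poll_io", "issue_dii"), True
```

No message passes between the two master units at any point; the only input from the other unit is the measured silence on the bus.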
In addition to controlling swapover between master units, bus controller 21 also carries out redundancy management between buses 10A and 10B. A series of bus-health tests are conducted during each message transfer and, if any test is failed, bus controller 21 will communicate over the opposite bus when the next communication to the local or remote device affected by the failure takes place. These tests include bus parity, proper sync detection, and communication time-outs that require each polled device to respond to bus controller 21 within a prescribed time interval. For this purpose bus controller 21 contains a message time out counter (not shown) which is reset upon each message transmission. A comparator compares the contents of this counter, which are incremented continuously, with a reference value. If there is no message return within the time interval corresponding to the reference value, a time-out error signal is generated. In response to this error signal bus controller 21 conducts its next poll to the affected device over the other of buses 10A and 10B.
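The message time-out mechanism may be sketched as follows. The function `poll`, the per-device table `bus_for_device` and the tick-based model of the counter are hypothetical; the counter and comparator of the disclosure are hardware elements that are not shown.

```python
OTHER_BUS = {"10A": "10B", "10B": "10A"}


def poll(bus_for_device, device, response_ticks, reference):
    """Poll one device and apply the message time-out test.

    `response_ticks` is the tick at which the device replies, or None if it
    never replies. Returns True if the reply arrives before the counter
    reaches `reference`; otherwise the device's next poll is moved to the
    other bus and False is returned.
    """
    count = 0  # the counter is reset upon each message transmission
    while count < reference:  # comparator: counter against reference value
        if response_ticks is not None and count >= response_ticks:
            return True
        count += 1  # the counter is incremented continuously
    # Time-out error: conduct the next poll to this device on the other bus.
    bus_for_device[device] = OTHER_BUS[bus_for_device[device]]
    return False
```

Keeping the bus selection per device reflects the statement that only communication with the affected device moves to the opposite bus.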
Bus controller 21 may include a counter (not shown) for comparing the number of data words that have been requested in a communication to a local or remote device with the actual number of words that are received in a response message. In response to a discrepancy between the number of words requested and the actual number received, bus controller 21 switches over to the other one of buses 10A and 10B on the next communication.
Bus controller 21 may also carry out a parity check or a signal format check on the information received over bus 10. In response to an error, the data is discarded and bus controller 21 switches over to the other one of buses 10A and 10B on the next transaction. Bus controller 21 will also switch over to the other bus if it detects a garbled transmission from a local or remote device. In other words, bus controller 21 continuously monitors the fidelity of the transmissions on the bus link over which bus controller 21 is communicating to local and remote devices. When an anomaly occurs, bus controller 21 proceeds to carry out its next transmission over the redundant bus during the next succeeding time slot assigned to that local or remote device.
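The word-count and parity checks may be combined in a sketch such as the following. The use of odd parity, the representation of a response as (word, parity bit) pairs, and the discarding of data on a word-count mismatch are assumptions; the disclosure states only that data is discarded on a parity or format error.

```python
def parity_ok(word, parity_bit):
    """Odd parity (assumed): data bits plus parity bit hold an odd number of ones."""
    return (bin(word).count("1") + parity_bit) % 2 == 1


def check_response(requested_words, response):
    """Validate a response message from a local or remote device.

    `response` is a list of (word, parity_bit) pairs. Returns
    (data, switch_bus): `data` is the list of words, or None if the message
    is rejected; `switch_bus` is True when the next transaction with this
    device should use the other of buses 10A and 10B.
    """
    if len(response) != requested_words:
        # Discrepancy between words requested and words received.
        return None, True
    if not all(parity_ok(word, bit) for word, bit in response):
        # Parity error: discard the data and switch buses.
        return None, True
    return [word for word, _ in response], False
```

A caller would apply the returned `switch_bus` value to the affected device's next assigned time slot.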
Should a catastrophic failure occur on both buses, namely, the bus master cannot properly communicate to any local or remote unit on either bus, the bus controller 21 will relinquish command of the bus to the backup master unit as shown in the state diagram of FIG. 5.
As will be appreciated from the foregoing description of the present invention, drawbacks of conventional redundancy networks, including their associated fault diagnostics, are overcome by a fault detection and redundancy management system that reduces hardware requirements while providing accurate fault detection and network auto-configuration as required. Because the master units are multiprocessor-configured and programmed to repetitively carry out intra- and inter-unit performance tests as an a priori requirement for network command capability, the health and communication capability of each master unit can be checked on a continuous basis while avoiding the cascading of faults that may occur.
While we have shown and described an embodiment in accordance with the present invention, it is understood that the same is not limited thereto but is susceptible of numerous changes and modifications as known to a person skilled in the art, and we therefore do not wish to be limited to the details shown and described herein but intend to cover all such changes and modifications as are obvious to one of ordinary skill in the art.
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US4710926 *||Dec 27, 1985||Dec 1, 1987||American Telephone And Telegraph Company, At&T Bell Laboratories||Fault recovery in a distributed processing system|
|US4718002 *||Jun 5, 1985||Jan 5, 1988||Tandem Computers Incorporated||Method for multiprocessor communications|
|US4775930 *||Nov 12, 1985||Oct 4, 1988||Westinghouse Electric Corp.||Electronic key check for ensuring proper cradles insertion by respective processing board|
|US4792955 *||Aug 21, 1986||Dec 20, 1988||Intel Corporation||Apparatus for on-line checking and reconfiguration of integrated circuit chips|
|US4805085 *||Jun 6, 1988||Feb 14, 1989||Sony Corporation||Digital control system for electronic apparatus|
|US4933838 *||Jun 3, 1987||Jun 12, 1990||The Boeing Company||Segmentable parallel bus for multiprocessor computer systems|
|US4943919 *||Oct 17, 1988||Jul 24, 1990||The Boeing Company||Central maintenance computer system and fault data handling method|
|US4965879 *||Oct 13, 1988||Oct 23, 1990||United Technologies Corporation||X-wing fly-by-wire vehicle management system|
|US4979108 *||Dec 26, 1989||Dec 18, 1990||Ag Communication Systems Corporation||Task synchronization arrangement and method for remote duplex processors|
|US5233543 *||May 9, 1991||Aug 3, 1993||Asea Brown Boveri Ab||Device for generating a current corresponding to a quantity supplied to the device|
|US5289589 *||Sep 10, 1990||Feb 22, 1994||International Business Machines Corporation||Automated storage library having redundant SCSI bus system|
|US5343477 *||Sep 12, 1991||Aug 30, 1994||Omron Corporation||Data processing system with data transmission failure recovery measures|
|US5408647 *||Oct 2, 1992||Apr 18, 1995||Compaq Computer Corporation||Automatic logical CPU assignment of physical CPUs|
|US5408649 *||Apr 30, 1993||Apr 18, 1995||Quotron Systems, Inc.||Distributed data access system including a plurality of database access processors with one-for-N redundancy|
|US5428769 *||Mar 31, 1992||Jun 27, 1995||The Dow Chemical Company||Process control interface system having triply redundant remote field units|
|US5463733 *||Aug 8, 1994||Oct 31, 1995||International Business Machines Corporation||Failure recovery apparatus and method for distributed processing shared resource control|
|US5487149 *||Dec 29, 1994||Jan 23, 1996||Hyundai Electronics Industries Co., Inc.||Common control redundancy switch method for switching a faulty active common control unit with an inactive spare common control unit|
|US5491788 *||Sep 10, 1993||Feb 13, 1996||Compaq Computer Corp.||Method of booting a multiprocessor computer where execution is transferring from a first processor to a second processor based on the first processor having had a critical error|
|US5491794 *||Jan 23, 1995||Feb 13, 1996||Thomson Consumer Electronics, S.A.||Fault protection using microprocessor power up reset|
|US5513312 *||Dec 15, 1993||Apr 30, 1996||Siemens Aktiengesellschaft||Method for system-prompted fault clearance of equipment in communcation systems|
|US5524209 *||Feb 27, 1995||Jun 4, 1996||Parker; Robert F.||System and method for controlling the competition between processors, in an at-compatible multiprocessor array, to initialize a test sequence|
|US5533188 *||Oct 19, 1992||Jul 2, 1996||The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration||Fault-tolerant processing system|
|US5544347 *||Apr 23, 1993||Aug 6, 1996||Emc Corporation||Data storage system controlled remote data mirroring with respectively maintained data indices|
|US5560570 *||Jun 7, 1994||Oct 1, 1996||Sextant Avionique||Automatic piloting device for aerodynes|
|US5586249 *||Jul 12, 1995||Dec 17, 1996||Fujitsu Limited||Control information backup system|
|US5600784 *||Mar 16, 1995||Feb 4, 1997||Marathon Technologies Corporation||Fault resilient/fault tolerant computing|
|US5615403 *||Oct 2, 1995||Mar 25, 1997||Marathon Technologies Corporation||Method for executing I/O request by I/O processor after receiving trapped memory address directed to I/O device from all processors concurrently executing same program|
|US5621884 *||Jan 18, 1995||Apr 15, 1997||Quotron Systems, Inc.||Distributed data access system including a plurality of database access processors with one-for-N redundancy|
|US5627962 *||Dec 30, 1994||May 6, 1997||Compaq Computer Corporation||Circuit for reassigning the power-on processor in a multiprocessing system|
|US5627965 *||Oct 27, 1994||May 6, 1997||Integrated Micro Products, Ltd.||Method and apparatus for reducing the effects of hardware faults in a computer system employing multiple central processing modules|
|US5640508 *||Oct 24, 1994||Jun 17, 1997||Hitachi, Ltd.||Fault detecting apparatus for a microprocessor system|
|US5644700 *||Oct 5, 1994||Jul 1, 1997||Unisys Corporation||Method for operating redundant master I/O controllers|
|US5687308 *||Jun 7, 1995||Nov 11, 1997||Tandem Computers Incorporated||Method to improve tolerance of non-homogeneous power outages|
|US5689513 *||Mar 29, 1996||Nov 18, 1997||Fujitsu Limited||Data transmission system having a backup testing facility|
|US5692120 *||May 30, 1995||Nov 25, 1997||International Business Machines Corporation||Failure recovery apparatus and method for distributed processing shared resource control|
|US5751574 *||Sep 10, 1996||May 12, 1998||Siemens Aktiengesellschaft||Method for loading software in communication systems with non-redundant, decentralized equipment|
|US5790397 *||Sep 17, 1996||Aug 4, 1998||Marathon Technologies Corporation||Fault resilient/fault tolerant computing|
|US5862315 *||May 12, 1997||Jan 19, 1999||The Dow Chemical Company||Process control interface system having triply redundant remote field units|
|US5889940 *||Jan 25, 1997||Mar 30, 1999||Sun Microsystems, Inc.||System and method for reducing the effects of hardware faults in a computer system employing multiple central processing modules|
|US5896523 *||Jun 4, 1997||Apr 20, 1999||Marathon Technologies Corporation||Loosely-coupled, synchronized execution|
|US5915649 *||Aug 23, 1996||Jun 29, 1999||Mcdonnell Douglas Helicopter Company||Roadable helicopter|
|US5928368 *||Jun 23, 1994||Jul 27, 1999||Tandem Computers Incorporated||Method and apparatus for fault-tolerant multiprocessing system recovery from power failure or drop-outs|
|US5951683 *||Dec 19, 1997||Sep 14, 1999||Fujitsu Limited||Multiprocessor system and its control method|
|US5956474 *||Dec 18, 1996||Sep 21, 1999||Marathon Technologies Corporation||Fault resilient/fault tolerant computing|
|US5970226 *||Jan 26, 1995||Oct 19, 1999||The Dow Chemical Company||Method of non-intrusive testing for a process control interface system having triply redundant remote field units|
|US5983364 *||May 12, 1997||Nov 9, 1999||System Soft Corporation||System and method for diagnosing computer faults|
|US6038684 *||Mar 22, 1999||Mar 14, 2000||Sun Microsystems, Inc.||System and method for diagnosing errors in a multiprocessor system|
|US6038685 *||Sep 22, 1997||Mar 14, 2000||Marathon Technologies Corporation||Fault resilient/fault tolerant computing|
|US6047392 *||Mar 22, 1999||Apr 4, 2000||Sun Microsystems, Inc.||System and method for tracking dirty memory|
|US6049893 *||Mar 22, 1999||Apr 11, 2000||Sun Microsystems, Inc.||System and method for synchronously resetting a plurality of microprocessors|
|US6061600 *||May 9, 1997||May 9, 2000||I/O Control Corporation||Backup control mechanism in a distributed control network|
|US6061809 *||Dec 19, 1996||May 9, 2000||The Dow Chemical Company||Process control interface system having triply redundant remote field units|
|US6092218 *||Mar 22, 1999||Jul 18, 2000||Sun Microsystems, Inc.||System and method for self-referential accesses in a multiprocessor computer|
|US6134672 *||Mar 22, 1999||Oct 17, 2000||Sun Microsystems, Inc.||System and method for iterative copying of read/write memory|
|US6134679 *||Mar 22, 1999||Oct 17, 2000||Sun Microsystems, Inc.||System and method for accessing devices in a computer system|
|US6141710 *||Dec 15, 1998||Oct 31, 2000||Daimlerchrysler Corporation||Interfacing vehicle data bus to intelligent transportation system (ITS) data bus via a gateway module|
|US6141766 *||Mar 22, 1999||Oct 31, 2000||Sun Microsystems, Inc.||System and method for providing synchronous clock signals in a computer|
|US6170068||Mar 22, 1999||Jan 2, 2001||Sun Microsystems, Inc.||System and method for preserving the state of a device across a reset event|
|US6173416||Mar 22, 1999||Jan 9, 2001||Sun Microsystems, Inc.||System and method for detecting errors using CPU signature|
|US6178522||Aug 25, 1998||Jan 23, 2001||Alliedsignal Inc.||Method and apparatus for managing redundant computer-based systems for fault tolerant computing|
|US6181929 *||May 19, 1997||Jan 30, 2001||Motorola, Inc.||Method for switching cell site controllers|
|US6205565||May 19, 1998||Mar 20, 2001||Marathon Technologies Corporation||Fault resilient/fault tolerant computing|
|US6272386 *||Mar 27, 1998||Aug 7, 2001||Honeywell International Inc||Systems and methods for minimizing peer-to-peer control disruption during fail-over in a system of redundant controllers|
|US6279119||Nov 13, 1998||Aug 21, 2001||Marathon Technologies Corporation||Fault resilient/fault tolerant computing|
|US6298289 *||Apr 24, 1999||Oct 2, 2001||The Boeing Company||Integrated spacecraft control system and method|
|US6327668 *||Jun 30, 1998||Dec 4, 2001||Sun Microsystems, Inc.||Determinism in a multiprocessor computer system and monitor and processor therefor|
|US6330570 *||Feb 26, 1999||Dec 11, 2001||Hewlett-Packard Company||Data backup system|
|US6438586 *||Sep 30, 1996||Aug 20, 2002||Emc Corporation||File transfer utility which employs an intermediate data storage system|
|US6446201 *||Sep 14, 1999||Sep 3, 2002||Hartmann & Braun Gmbh & Co. Kg||Method and system of sending reset signals only to slaves requiring reinitialization by a bus master|
|US6449729 *||Feb 12, 1999||Sep 10, 2002||Compaq Information Technologies Group, L.P.||Computer system for dynamically scaling busses during operation|
|US6473869||Aug 10, 2001||Oct 29, 2002||Marathon Technologies Corporation||Fault resilient/fault tolerant computing|
|US6519704||Mar 22, 1999||Feb 11, 2003||Sun Microsystems, Inc.||System and method for driving a signal to an unbuffered integrated circuit|
|US6532546||Mar 15, 2002||Mar 11, 2003||Compaq Information Technologies Group, L.P.||Computer system for dynamically scaling busses during operation|
|US6539337 *||Jun 15, 2000||Mar 25, 2003||Innovative Technology Licensing, Llc||Embedded diagnostic system and method|
|US6577914 *||Aug 10, 1999||Jun 10, 2003||Advanced Micro Devices, Inc.||Method and apparatus for dynamic model building based on machine disturbances for run-to-run control of semiconductor devices|
|US6611860||Nov 17, 1999||Aug 26, 2003||I/O Controls Corporation||Control network with matrix architecture|
|US6615376||Nov 19, 1999||Sep 2, 2003||X/Net Associates, Inc.||Method and system for external notification and/or resolution of software errors|
|US6708231 *||Aug 12, 1999||Mar 16, 2004||Mitsumi Electric Co., Ltd.||Method and system for performing a peripheral firmware update|
|US6732202||Nov 17, 1999||May 4, 2004||I/O Controls Corporation||Network node with plug-in identification module|
|US6754762 *||Mar 5, 2001||Jun 22, 2004||Honeywell International Inc.||Redundant bus switching|
|US6807151 *||Mar 27, 2000||Oct 19, 2004||At&T Corp||Apparatus and method for group-wise detection of failure condition|
|US6886109 *||May 18, 2001||Apr 26, 2005||Hewlett-Packard Development Company, L.P.||Method and apparatus for expediting system initialization|
|US6966015||Mar 22, 2001||Nov 15, 2005||Micromuse, Ltd.||Method and system for reducing false alarms in network fault management systems|
|US6970961 *||Jan 2, 2001||Nov 29, 2005||Juniper Networks, Inc.||Reliable and redundant control signals in a multi-master system|
|US7039554 *||Jun 6, 2003||May 2, 2006||Pratt & Whitney Canada Corp.||Method and system for trend detection and analysis|
|US7155704 *||Sep 17, 2001||Dec 26, 2006||Sun Microsystems, Inc.||Determinism in a multiprocessor computer system and monitor and processor therefor|
|US7216063 *||Apr 13, 2005||May 8, 2007||Pratt & Whitney Canada Corp.||Method and apparatus for comparing a data set to a baseline value|
|US7321845 *||Apr 13, 2005||Jan 22, 2008||Pratt & Whitney Canada Corp.||Method and system for removing very low frequency noise from a time-based data set|
|US7383191||Nov 28, 2000||Jun 3, 2008||International Business Machines Corporation||Method and system for predicting causes of network service outages using time domain correlation|
|US7398299||Aug 22, 2003||Jul 8, 2008||I/O Controls Corporation||Control network with matrix architecture|
|US7472320||Feb 24, 2004||Dec 30, 2008||International Business Machines Corporation||Autonomous self-monitoring and corrective operation of an integrated circuit|
|US7584420||Oct 5, 2004||Sep 1, 2009||Lockheed Martin Corporation||Graphical authoring and editing of mark-up language sequences|
|US7697443 *||Apr 13, 2010||International Business Machines Corporation||Locating hardware faults in a parallel computer|
|US7796527||Apr 13, 2006||Sep 14, 2010||International Business Machines Corporation||Computer hardware fault administration|
|US7801702 *||Nov 30, 2004||Sep 21, 2010||Lockheed Martin Corporation||Enhanced diagnostic fault detection and isolation|
|US7823062||Oct 26, 2010||Lockheed Martin Corporation||Interactive electronic technical manual system with database insertion and retrieval|
|US7831866||Aug 2, 2007||Nov 9, 2010||International Business Machines Corporation||Link failure detection in a parallel computer|
|US7898937 *||Mar 1, 2011||Cisco Technology, Inc.||Voting to establish a new network master device after a network failover|
|US7908052 *||Aug 7, 2001||Mar 15, 2011||Thales||Maintenance system for an equipment set|
|US7979766||Sep 8, 2004||Jul 12, 2011||Centre For Development Of Telematics||Architecture for a message bus|
|US7996732 *||Aug 9, 2011||Denso Corporation||Program-execution monitoring method, system, and program|
|US8290602 *||Dec 15, 2009||Oct 16, 2012||Converteam Technology Ltd.||Electronic system with component redundancy, and control chain for a motor implementing such system|
|US8369969 *||Jun 10, 2010||Feb 5, 2013||Mitsubishi Electric Corporation||Distributed significant control monitoring system and device with transmission synchronization|
|US8516444||Feb 23, 2006||Aug 20, 2013||International Business Machines Corporation||Debugging a high performance computing program|
|US8560885 *||Sep 16, 2010||Oct 15, 2013||The Boeing Company||Dynamic redundancy management|
|US8645582||Sep 13, 2006||Feb 4, 2014||I/O Controls Corporation||Network node with plug-in identification module|
|US8688241 *||Sep 28, 2012||Apr 1, 2014||Mitsubishi Electric Corporation||Distributed control system for monitoring a significant control|
|US8769349||Oct 17, 2011||Jul 1, 2014||Cisco Technology, Inc.||Managing network devices based on predictions of events|
|US8805595 *||Jan 17, 2008||Aug 12, 2014||General Electric Company||Wind turbine arranged for independent operation of its components and related method and computer program|
|US8813037||Feb 28, 2013||Aug 19, 2014||International Business Machines Corporation||Debugging a high performance computing program|
|US8997000||Oct 20, 2011||Mar 31, 2015||Cisco Technology, Inc.||Integrated view of network management data|
|US9081653||Mar 14, 2013||Jul 14, 2015||Flextronics Ap, Llc||Duplicated processing in vehicles|
|US9081748 *||Jun 23, 2013||Jul 14, 2015||The Boeing Company||Dynamic redundancy management|
|US9225554 *||Jul 7, 2011||Dec 29, 2015||Cisco Technology, Inc.||Device-health-based dynamic configuration of network management systems suited for network operations|
|US9330230||Apr 19, 2007||May 3, 2016||International Business Machines Corporation||Validating a cabling topology in a distributed computing system|
|US20020010880 *||Sep 17, 2001||Jan 24, 2002||Sun Microsystems, Inc.||Determinism in a multiprocessor computer system and monitor and processor therefor|
|US20020170002 *||Mar 22, 2001||Nov 14, 2002||Steinberg Louis A.||Method and system for reducing false alarms in network fault management systems|
|US20020174381 *||May 18, 2001||Nov 21, 2002||Olarig Sompong P.||Method and apparatus for expediting system initialization|
|US20040148139 *||Jun 6, 2003||Jul 29, 2004||Nguyen Phuc Luong||Method and system for trend detection and analysis|
|US20040255186 *||May 27, 2003||Dec 16, 2004||Lucent Technologies, Inc.||Methods and apparatus for failure detection and recovery in redundant systems|
|US20050125565 *||May 3, 2004||Jun 9, 2005||I/O Controls Corporation||Network node with plug-in identification module|
|US20050183007 *||Oct 5, 2004||Aug 18, 2005||Lockheed Martin Corporation||Graphical authoring and editing of mark-up language sequences|
|US20050209821 *||Apr 13, 2005||Sep 22, 2005||Nguyen Phuc L||Method and system for removing very low frequency noise from a time-based data set|
|US20050209823 *||Apr 13, 2005||Sep 22, 2005||Nguyen Phuc L||Method and apparatus for comparing a data set to a baseline value|
|US20050223288 *||Nov 30, 2004||Oct 6, 2005||Lockheed Martin Corporation||Diagnostic fault detection and isolation|
|US20050223290 *||Nov 30, 2004||Oct 6, 2005||Berbaum Richard D||Enhanced diagnostic fault detection and isolation|
|US20050240555 *||Dec 23, 2004||Oct 27, 2005||Lockheed Martin Corporation||Interactive electronic technical manual system integrated with the system under test|
|US20060085692 *||Oct 5, 2005||Apr 20, 2006||Lockheed Martin Corp.||Bus fault detection and isolation|
|US20060120181 *||Oct 4, 2005||Jun 8, 2006||Lockheed Martin Corp.||Fault detection and isolation with analysis of built-in-test results|
|US20060155425 *||Aug 7, 2001||Jul 13, 2006||Peter Howlett||Maintenance system for an equipment set|
|US20070115808 *||Sep 13, 2006||May 24, 2007||Jeffrey Ying||Network node with plug-in identification module|
|US20070234294 *||Feb 23, 2006||Oct 4, 2007||International Business Machines Corporation||Debugging a high performance computing program|
|US20070242611 *||Apr 13, 2006||Oct 18, 2007||Archer Charles J||Computer Hardware Fault Diagnosis|
|US20070242685 *||Apr 13, 2006||Oct 18, 2007||Archer Charles J||Locating Hardware Faults in a Parallel Computer|
|US20070260909 *||Apr 13, 2006||Nov 8, 2007||Archer Charles J||Computer Hardware Fault Administration|
|US20080010563 *||Jun 14, 2007||Jan 10, 2008||Denso Corporation||Program-execution monitoring method, system, and program|
|US20080052281 *||Aug 23, 2006||Feb 28, 2008||Lockheed Martin Corporation||Database insertion and retrieval system and method|
|US20080120282 *||Nov 21, 2006||May 22, 2008||Lockheed Martin Corporation||Interactive electronic technical manual system with database insertion and retrieval|
|US20080137528 *||Dec 6, 2006||Jun 12, 2008||Cisco Technology, Inc.||Voting to establish a new network master device after a network failover|
|US20080189225 *||Mar 24, 2008||Aug 7, 2008||David Herring||Method and System for Predicting Causes of Network Service Outages Using Time Domain Correlation|
|US20080215355 *||Mar 24, 2008||Sep 4, 2008||David Herring||Method and System for Predicting Causes of Network Service Outages Using Time Domain Correlation|
|US20080215913 *||Jan 19, 2006||Sep 4, 2008||Yokogawa Electric Corporation||Information Processing System and Information Processing Method|
|US20080259816 *||Apr 19, 2007||Oct 23, 2008||Archer Charles J||Validating a Cabling Topology in a Distributed Computing System|
|US20080308635 *||Jun 18, 2008||Dec 18, 2008||Poulin Jeffrey S||Automated postal voting system and method|
|US20090037773 *||Aug 2, 2007||Feb 5, 2009||Archer Charles J||Link Failure Detection in a Parallel Computer|
|US20090187282 *||Jul 23, 2009||Detlef Menke||Wind turbine arranged for independent operation of its components and related method and computer program|
|US20100115357 *||Sep 8, 2004||May 6, 2010||Centre For Development Of Telematics||Novel Architecture for a Message Bus|
|US20100168877 *||Dec 15, 2009||Jul 1, 2010||Converteam Ltd||Electronic system with component redundancy, and control chain for a motor implementing such system|
|US20100318197 *||Dec 16, 2010||Mitsubishi Electric Corporation||Control system|
|US20120191826 *||Jul 7, 2011||Jul 26, 2012||Rony Gotesdyner||Device-Health-Based Dynamic Configuration of Network Management Systems Suited for Network Operations|
|US20130030551 *||Sep 28, 2012||Jan 31, 2013||Mitsubishi Electric Corporation||Control system|
|US20130283031 *||Jun 23, 2013||Oct 24, 2013||The Boeing Company||Dynamic redundancy management|
|US20130318263 *||May 24, 2012||Nov 28, 2013||Infineon Technologies Ag||System and Method to Transmit Data over a Bus System|
|CN102023900A *||Dec 6, 2010||Apr 20, 2011||中国航空工业集团公司第六三一研究所||Two-channel fault logical arbitration method and system thereof|
|CN102023900B||Dec 6, 2010||Nov 21, 2012||中国航空工业集团公司第六三一研究所||Two-channel fault logic arbitration method and system thereof|
|CN103176870A *||Mar 21, 2013||Jun 26, 2013||中国铁道科学研究院||Multi-mode information interaction redundancy safety computer platform|
|CN103176870B *||Mar 21, 2013||Dec 3, 2014||中国铁道科学研究院||Multi-mode information interaction redundancy safety computer platform|
|EP0327029A2 *||Jan 31, 1989||Aug 9, 1989||ESG Elektronik-System-Gesellschaft mbH||Fire control device|
|WO1995015529A1 *||Nov 15, 1994||Jun 8, 1995||Marathon Technologies Corporation||Fault resilient/fault tolerant computing|
|WO2000030232A1 *||Nov 19, 1999||May 25, 2000||X/Net Associates, Inc.||Method and system for external notification and/or resolution of software errors|
|U.S. Classification||714/11, 700/81, 714/E11.071, 714/E11.145, 700/82, 700/2|
|International Classification||G06F11/22, G06F11/20|
|Cooperative Classification||G06F11/20, G06F11/22|
|European Classification||G06F11/20, G06F11/22|
|Jul 28, 1983||AS||Assignment||Owner name: HARRIS CORPORATION, MELBOURNE, FL. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST.;ASSIGNORS:JULICH, PAUL M.;PEARCE, JEFFREY B.;REEL/FRAME:004158/0664. Effective date: 19830726. Owner name: HARRIS CORPORATION, FLORIDA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:JULICH, PAUL M.;PEARCE, JEFFREY B.;REEL/FRAME:004158/0664. Effective date: 19830726|
|Jul 3, 1990||FPAY||Fee payment||Year of fee payment: 4|
|Jul 1, 1994||FPAY||Fee payment||Year of fee payment: 8|
|Jul 2, 1998||FPAY||Fee payment||Year of fee payment: 12|