CA2434494C - Fault-tolerant computer system, re-synchronization method thereof and re-synchronization program thereof - Google Patents
- Publication number
- CA2434494C
- Authority
- CA
- Canada
- Prior art keywords
- fault
- processor
- synchronization
- computing module
- access
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
All of the following fall under G: PHYSICS > G06: COMPUTING; CALCULATING OR COUNTING > G06F: ELECTRIC DIGITAL DATA PROCESSING > G06F11/00: Error detection; Error correction; Monitoring > G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance > G06F11/16: Error detection or correction of the data by redundancy in hardware.
- G06F11/18: Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits
- G06F11/1687: Temporal synchronisation or re-synchronisation of redundant processing components at event level, e.g. by interrupt or result of polling (within G06F11/1675: Temporal synchronisation or re-synchronisation of redundant processing components)
- G06F11/1641: Error detection by comparing the output of redundant processing systems where the comparison is not performed by the redundant processing components (within G06F11/1629: Error detection by comparing the output of redundant processing systems)
- G06F11/1679: Temporal synchronisation or re-synchronisation of redundant processing components at clock signal level (within G06F11/1675)
- G06F11/1683: Temporal synchronisation or re-synchronisation of redundant processing components at instruction level (within G06F11/1675)
- G06F11/184: Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits by voting, the voting not being performed by the redundant components where the redundant components implement processing functionality (within G06F11/18 > G06F11/183: ... by voting, the voting not being performed by the redundant components)
Abstract
A lock-step synchronism fault-tolerant computer system includes a plurality of computing modules, each having a processor and a memory, which process the same instruction string in synchronization with each other. When disagreement in the state of access to an external bus is detected among the processors of the computing modules and no fault is detected in the system including the computing modules, an interruption is notified to all of the processors. Synchronization among the computing modules is then recovered by adjusting the timing of the response to an access that each processor executes in response to the interruption.
Description
FAULT-TOLERANT COMPUTER SYSTEM, RE-SYNCHRONIZATION METHOD THEREOF AND RE-SYNCHRONIZATION PROGRAM THEREOF
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a lock-step synchronism fault-tolerant computer system in which a plurality of computing modules process the same instruction string in exactly the same manner, in clock synchronization with each other. More particularly, it relates to a fault-tolerant computer system and a high-speed re-synchronization control method that speed up re-synchronization processing when a synchronism fault occurs among the computing modules (that is, when lock-step is lost).
2. Description of the Related Art
In a conventional lock-step synchronism fault-tolerant computer system, when one of a plurality of computing modules executing the same instruction string at the same time is detected to produce output different from that of the other computing modules, owing to a failure or some other external or internal factor, the following countermeasures are taken. Hereafter, a computing module detected to be failing to operate in synchronization with the other computing modules is referred to as a computing module in a step-out state.
Specifically, the computing module that has lost lock-step is first cut off from the operational state. Depending on the cause of the step-out, the module is then replaced if necessary; if replacement is not necessary, re-initialization processing or the like is performed as needed, and the module is integrated back into the operational state.
In a conventional system, at the time of this re-integration into the operational state, all the memory data held by the computing modules in the operational state are copied into the memory of the computing module being re-integrated, regardless of whether the step-out module has been replaced. This is necessary so that the re-integrated module can synchronize with the modules that continued operating and perform the same processing again.
Consequently, in a conventional system, when a computing module in the step-out state is integrated into the operational state again after replacement, re-initialization appropriate to the part that caused the step-out, and so on, the computing modules in the operational state are halted for a long period of time.
In other words, the conventional lock-step synchronism fault-tolerant computer system has the problem that the operation of the entire fault-tolerant computer system is halted for a long time (generally 3 to 5 seconds, or even on the order of minutes) while a computing module in the step-out state undergoes re-integration processing.
This is because, to integrate a computing module in the step-out state into the operational state, the entire memory contents are always copied from the computing modules that continued operating into the computing module being re-integrated.
If a normal computing module continued operating during the copy, its memory contents could change while they were being copied, and the copy would not be consistent. To avoid this, the computing modules in the operational state are temporarily stopped so that their memory contents cannot be updated.
Because the memory capacity of a computing module today reaches several gigabytes, copying the entire memory region takes a long time.
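The cost of the full copy can be estimated with simple arithmetic: copy time is roughly memory capacity divided by effective copy bandwidth. The sketch below is illustrative only; the bandwidth value is an assumption rather than a figure from this patent, and `copy_time_seconds` is a hypothetical helper.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def copy_time_seconds(capacity_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Estimate the duration of a full memory copy at a given effective bandwidth.

    Both inputs are illustrative assumptions; the source states only that
    capacities reach several gigabytes and that halts of 3 to 5 seconds are typical.
    """
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return capacity_bytes / bandwidth_bytes_per_s


# Hypothetical figures: 4 GiB of memory copied at an assumed 1 GB/s effective rate.
t = copy_time_seconds(4 * GIB, 1e9)
print(f"{t:.1f} s")  # 4.3 s
```

Under this assumed rate the copy alone takes about 4.3 s, the same order as the halt times quoted above. A faster copy path shortens the halt, but it still grows linearly with memory size, which is why avoiding the copy altogether is attractive.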
In a lock-step synchronism fault-tolerant computer system, a step-out state among the computing modules can occur for various reasons.
The first is a fixed failure within a computing module. In this case, the failed computing module must be replaced, and when the replacement module is integrated into the operational system, all the data in the memory of a computing module in the operational state must be copied to it.
Besides such a fixed failure, a step-out state may also occur when the computing modules, although operating normally, run at slightly different timing because of manufacturing variations among their component units, or because of an automatically correctable intermittent memory failure caused by α-rays or the like.
In these cases, since no fixed failure has occurred in the computing module itself, the module fundamentally needs no replacement; by re-synchronizing its processing with that of the other computing modules in operation and re-integrating it, the entire fault-tolerant computer system can be restored to its normal operating state.
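To make this second class of causes concrete, the toy model below assumes that each memory access takes one cycle and that correcting an intermittent memory error stalls the affected module for a few extra cycles. Under those assumptions the modules read identical data but reach the bus at different cycles, which is a step-out with no fixed failure. `Module`, `run` and all cycle counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    # step index -> extra stall cycles, e.g. to correct a single-bit memory error
    # (hypothetical figures; real correction latency is hardware specific)
    correction_delay: dict[int, int]


def run(module: Module, trace: list[int], memory: dict[int, int]) -> tuple[list[int], list[int]]:
    """Execute the same read trace; return (bus access cycle numbers, values read)."""
    cycle = 0
    times: list[int] = []
    values: list[int] = []
    for step, addr in enumerate(trace):
        # one cycle per access, plus any stall spent correcting the data
        cycle += 1 + module.correction_delay.get(step, 0)
        times.append(cycle)
        values.append(memory[addr])  # corrected data equals the stored data
    return times, values


memory = {0: 10, 1: 20, 2: 30}
trace = [0, 1, 2, 1]
modules = [Module("A", {}), Module("B", {}), Module("C", {1: 3})]  # C stalls on step 1
results = {m.name: run(m, trace, memory) for m in modules}
for name, (times, values) in results.items():
    print(name, times, values)
# A [1, 2, 3, 4] [10, 20, 30, 20]
# B [1, 2, 3, 4] [10, 20, 30, 20]
# C [1, 5, 6, 7] [10, 20, 30, 20]
```

The data read by module C is identical to that of A and B; only its timing differs, so replacing C would be unnecessary.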
SUMMARY OF THE INVENTION
An object of an embodiment of the present invention is to provide a fault-tolerant computer system, a re-synchronization method thereof and a re-synchronization program thereof that allow a computing module which has lost lock-step for a cause other than a fixed failure to be re-integrated into the operational state faster than in a conventional system, thereby drastically reducing the time for which system operation is temporarily halted by the integration processing.
Another object of an embodiment of the present invention is to provide a fault-tolerant computer system, a re-synchronization method thereof and a re-synchronization program thereof that improve system availability by reducing the time required for the above-described re-integration processing.
According to the first aspect of the invention, there is provided a lock-step synchronism fault-tolerant computer system including a plurality of computing modules, each having a processor and a memory, in which the computing modules process the same instruction string in synchronization with each other, wherein, when disagreement is detected in the state of access to an external bus among the processors of the computing modules and no fault is detected in the system including the computing modules, synchronization among the computing modules is recovered by adjusting the timing of the response to an access that each processor executes, as a synchronization control instruction, in response to an interruption.
In the preferred construction, the fault-tolerant computer system further comprises: a fault detector which monitors the existence or non-existence of a fault in the entire system;
a bus monitor which monitors accesses by the processor in each computing module to the external bus and, when it detects disagreement in output among the computing modules while no fault is detected by the fault detector, notifies each processor of an interruption; and a synchronization controller which re-synchronizes the computing modules by adjusting the timing of the response to the access from each processor that is caused by the interruption.
In another preferred construction, when the bus monitor detects disagreement in output among the computing modules while no fault is detected by the fault detector, it interrupts each processor with a predetermined task for re-synchronizing the computing modules, namely a task that executes an access to a predetermined resource in the synchronization controller, and the synchronization controller transmits a response to all the computing modules simultaneously once it has received accesses to the resource from all the processors.
In another preferred construction, a plurality of pairs of the bus monitor, the fault detector and the synchronization controller are provided.
In another preferred construction, the bus monitor, the fault detector and the synchronization controller are provided in a peripheral device control unit which controls a peripheral device and is connected to the external bus in each computing module through a PCI bridge.
According to the second aspect of the invention, there is provided a re-synchronization method in a lock-step synchronism fault-tolerant computer system including a plurality of computing modules, each having a processor and a memory, in which the computing modules process the same instruction string in synchronization with each other, comprising the steps of: when disagreement is detected in the state of access to an external bus among the processors of the computing modules and no fault is detected in the system including the computing modules, generating an interruption to all of the processors; and causing each processor to execute a synchronization control instruction to adjust the timing of the response to an access from each processor, thereby causing the computing modules to resume operation in synchronization.
In the preferred construction, the re-synchronization method further comprises the steps of: detecting the existence or non-existence of a fault in the entire system including the computing modules; monitoring accesses by the processor in each computing module to the external bus; when disagreement in output is detected among the computing modules and no fault is detected in the system, notifying each processor of an interruption; and causing each processor to execute the clock synchronization control instruction to adjust the timing of the response to an access from each processor, thereby causing the computing modules to resume operation in synchronization.
In another preferred construction, the re-synchronization method further comprises the steps of: when disagreement in output is detected among the computing modules and no fault is detected in the system, interrupting each processor with a predetermined task for re-synchronizing the computing modules, namely a task that executes an access to a predetermined resource;
queuing the accesses to the resource from the processors; and responding to the accesses from all the computing modules simultaneously once the accesses from all the processors have been received.
According to another aspect of the invention, there is provided a re-synchronization program for executing re-synchronization processing of a lock-step synchronism fault-tolerant computer system including a plurality of computing modules, each having a processor and a memory, in which the computing modules process the same instruction string in synchronization with each other, the program comprising the functions of: when disagreement is detected in the state of access to an external bus among the processors of the computing modules and no fault is detected in the system including the computing modules, generating an interruption to all of the processors; and causing each processor to execute a clock synchronization control instruction to adjust the timing of the response to an access from each processor, thereby causing the computing modules to resume operation in synchronization.
In the preferred construction, the re-synchronization program further comprises the functions of: detecting the existence or non-existence of a fault in the entire system including the computing modules; monitoring accesses by the processor in each computing module to the external bus; when disagreement in output is detected among the computing modules and no fault is detected in the system, notifying each processor of an interruption; and causing each processor to execute the synchronization control instruction to adjust the timing of the response to an access from each processor, thereby causing the computing modules to resume operation in synchronization.
In another preferred construction, the re-synchronization program further comprises the functions of: when disagreement in output is detected among the computing modules and no fault is detected in the system, interrupting each processor with a predetermined task for re-synchronizing the computing modules, namely a task that executes an access to a predetermined resource;
queuing the accesses to the resource from the processors; and responding to the accesses from all the computing modules simultaneously once the accesses from all the processors have been received.
According to another aspect of the invention, there is provided a lock-step synchronism fault-tolerant computer system including a plurality of computing modules, each having a processor and a memory, in which the computing modules process the same instruction string in synchronization with each other, comprising: a fault detector which monitors the existence or non-existence of a fault in the entire system; a bus monitor which monitors accesses by the processor in each said computing module to the external bus and, when it detects disagreement in output among the computing modules while no fault is detected by said fault detector, notifies each said processor of an interruption; and a synchronization controller which re-synchronizes the computing modules by adjusting the timing of the response to the access from each said processor that is caused by said interruption; wherein, when said bus monitor detects disagreement in output among the computing modules while no fault is detected by said fault detector, it interrupts each said processor with a predetermined task for re-synchronizing the computing modules, namely a task that executes an access to a predetermined resource in said synchronization controller, and said synchronization controller transmits a response to all the computing modules simultaneously once it has received accesses to said resource from all the processors.
According to a further aspect of the invention, there is provided a re-synchronization method in a lock-step synchronism fault-tolerant computer system including a plurality of computing modules, each having a processor and a memory, in which the computing modules process the same instruction string in synchronization with each other, comprising the steps of: detecting the existence or non-existence of a fault in the entire system including each said computing module; monitoring accesses by the processor in each said computing module to the external bus; when disagreement in output is detected among the computing modules and no fault is detected in the system, notifying each said processor of an interruption; causing each said processor to execute a clock synchronization control instruction to adjust the timing of the response to an access from each processor, thereby causing the computing modules to resume operation in synchronization; when disagreement in output is detected among the computing modules and no fault is detected in the system, interrupting each said processor with a predetermined task for re-synchronizing the computing modules, namely a task that executes an access to a predetermined resource; and queuing the accesses to said resource from the processors, and responding to said accesses from all the computing modules simultaneously once the accesses from all said processors have been received.
According to a still further aspect of the invention, there is provided a computer readable medium having computer readable code embodied therein for executing re-synchronization processing of a lock-step synchronism fault-tolerant computer system including a plurality of computing modules, each having a processor and a memory, in which the computing modules process the same instruction string in synchronization with each other, said computer readable code, when executed, performing the functions of: detecting the existence or non-existence of a fault in the entire system including each said computing module; monitoring accesses by the processor in each said computing module to the external bus; when disagreement in output is detected among the computing modules and no fault is detected in the system, notifying each said processor of an interruption; causing each said processor to execute the synchronization control instruction to adjust the timing of the response to an access from each processor, thereby causing the computing modules to resume operation in synchronization; when disagreement in output is detected among the computing modules and no fault is detected in the system, interrupting each said processor with a predetermined task for re-synchronizing the computing modules, namely a task that executes an access to a predetermined resource; and queuing the accesses to said resource from the processors, and responding to said accesses from all the computing modules simultaneously once the accesses from all said processors have been received.
Other objects, features and advantages of the present invention will become clear from the detailed description given herebelow.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be understood more fully from the detailed description given herebelow and from the accompanying drawings of the preferred embodiment of the invention, which, however, should not be taken to be limitative to the invention, but are for explanation and understanding only.
In the drawings:
Fig. 1 is a block diagram showing a structure of a fault-tolerant computer system according to a first mode of implementation of the present invention;
Fig. 2 is a diagram for use in explaining the contents of re-synchronization processing of the fault-tolerant computer system;
Fig. 3 is a block diagram showing a structure of a fault-tolerant computer system according to a second mode of implementation of the present invention; and
Fig. 4 is a block diagram showing a structure of a fault-tolerant computer system according to a third mode of implementation of the present invention.
DESCRIPTION OF THE PREFERRED EMBODIMENT
The preferred embodiment of the present invention will be discussed hereinafter in detail with reference to the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be obvious, however, to those skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures are not shown in detail in order not to unnecessarily obscure the present invention.
Modes of implementation of the present invention will be described below in detail with reference to the drawings. Fig. 1 is a block diagram showing a structure of a lock-step synchronism fault-tolerant computer system according to a first mode of implementation of the present invention.
With reference to Fig. 1, the fault-tolerant computer system according to the present mode of implementation includes a plurality of computing modules 100, 200 and 300, which process the same instruction string in clock synchronization with each other. The fault-tolerant computer system compares the processing results of the computing modules, so that even when one computing module fails, processing can be continued by the remaining computing modules.
The respective computing modules 100, 200 and 300 include a plurality of processors 101 and 102, 201 and 202, and 301 and 302, processor external buses 103, 203 and 303, memories 104, 204 and 304 and memory control units 105, 205 and 305, respectively.
In addition, the computing modules 100, 200 and 300 are connected to peripheral device control units 400 and 500 for controlling a peripheral device through the memory control units 105, 205 and 305 and interface signal lines 600, 601, 602, 610, 611 and 612.
The above-described fault-tolerant computer system further includes a bus monitor 700, a fault detecting unit 702 and a synchronization control unit 701.
The bus monitor 700 monitors an access of a processor of each computing module to the external bus.
The bus monitor 700 is connected to the processor external buses 103, 203 and 303 of the respective computing modules 100, 200 and 300 through interface signal lines 710, 711 and 712.
The fault detecting unit 702 monitors existence/non-existence of a fault in the entire system including the respective computing modules.
The synchronization control unit 701, which is connected to each computing module, adjusts timing of a response to an access from each computing module to cause each computing module to resume operation in clock synchronization. The synchronization control unit 701 is connected to the memory control units 105, 205 and 305 of the respective computing modules 100, 200 and 300 through interface signal lines 730, 731 and 732.
Next, the operation of the fault-tolerant computer system structured as above according to the present mode of implementation will be described.
The fault detecting unit 702 monitors existence/non-existence of a fixed fault in the entire fault-tolerant computer system including the respective computing modules 100, 200 and 300 and the peripheral device control units 400 and 500. Then, the fault detecting unit 702 notifies a monitoring result to the bus monitor 700.
The bus monitor 700, which is connected to the processor external buses 103, 203 and 303 of the computing modules 100, 200 and 300 through the interface signal lines 710, 711 and 712, compares the external access control signals of the processors 101, 102, 201, 202, 301 and 302 to check whether they access the external buses 103, 203 and 303 at the same timing, in clock synchronization with each other.
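The comparison performed by the bus monitor 700 can be pictured as a cycle-by-cycle equality check across the traces of external accesses. The sketch below assumes each trace records, per clock cycle, either `None` (no bus access) or a `(kind, address)` pair; this encoding of the access control signals and the name `first_disagreement` are hypothetical simplifications, not the patent's signal format.

```python
from typing import Optional, Sequence

# Per clock cycle: None when the processor is not driving the external bus,
# otherwise a (kind, address) pair. Hypothetical encoding for illustration.
Access = Optional[tuple[str, int]]


def first_disagreement(traces: Sequence[Sequence[Access]]) -> Optional[int]:
    """Return the first cycle at which the traced bus accesses differ, or None."""
    length = max((len(t) for t in traces), default=0)
    for cycle in range(length):
        # a trace that has ended is treated as idle (None) on later cycles
        seen = {t[cycle] if cycle < len(t) else None for t in traces}
        if len(seen) > 1:
            return cycle
    return None


a = [None, ("R", 0x100), None, ("W", 0x200)]
b = list(a)                                # in lock-step with a
c = [None, None, ("R", 0x100), None]       # same read, one cycle late
print(first_disagreement([a, b, c]))  # 1
print(first_disagreement([a, b]))     # None
```

Flagging the first differing cycle, rather than comparing results after the fact, is what lets the monitor react before the modules' states drift apart.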
If, through this monitoring, the bus monitor 700 detects that any of the processors 101, 102, 201, 202, 301 and 302 is operating at timing different from that of the others, and the fault detecting unit 702 has detected no fixed fault anywhere in the fault-tolerant computer system, the bus monitor 700 determines that the step-out was not caused by a fault.
This result is notified to all the computing modules 100, 200 and 300 through the interface signal lines 710, 711 and 712, generating an interruption to each processor. At the same time, the bus monitor 700 shifts to a break mode for monitoring the external buses 103, 203 and 303 of the processors.
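The bus monitor's decision can be summarized as a three-way rule: agreement means no action; disagreement without a detected fault triggers an interruption to every processor plus the switch to break mode. The sketch below is an assumption-laden model: `BusMonitor`, `Action` and the processor labels are hypothetical, and the handling of a step-out with a detected fault is only a placeholder, since this section does not describe it (the conventional cut-off path is one possibility).

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Action(Enum):
    CONTINUE = auto()        # all modules agree: nothing to do
    RESYNC = auto()          # step-out without a fault: interrupt all processors
    FAULT_HANDLING = auto()  # step-out with a detected fault: outside this sketch


@dataclass
class BusMonitor:
    processors: list[str]
    break_mode: bool = False
    interrupts_sent: list[str] = field(default_factory=list)

    def on_compare(self, disagreement: bool, fault_detected: bool) -> Action:
        if not disagreement:
            return Action.CONTINUE
        if fault_detected:
            # A fixed fault was reported by the fault detecting unit;
            # re-synchronization is not attempted here.
            return Action.FAULT_HANDLING
        # Step-out not caused by a fault: interrupt every processor in every
        # module and, at the same time, switch to break mode.
        self.interrupts_sent.extend(self.processors)
        self.break_mode = True
        return Action.RESYNC


monitor = BusMonitor(["101", "102", "201", "202", "301", "302"])
print(monitor.on_compare(disagreement=True, fault_detected=False).name)  # RESYNC
print(monitor.break_mode)  # True
```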
Here, the bus monitor 700 monitors all accesses to the external buses 103, 203 and 303, including memory accesses from the processors, and when it detects a lack of synchronization in operation among the computing modules, it instantaneously interrupts all the processors 101, 102, 201, 202, 301 and 302 to suspend their processing. Therefore, at the time the interruption is generated, the contents of the memories 104, 204 and 304 in the computing modules 100, 200 and 300 all coincide with each other.
The specific operation of the fault-tolerant computer system according to the present mode of implementation will now be described with reference to Fig. 2.
When the bus monitor 700 detects a lack of synchronization in operation among the computing modules (Step 201 in Fig. 2), it notifies the detection through the interface signal lines 710, 711 and 712, generating an interruption to each processor.
In the corresponding interruption processing, all the processors 101, 102, 201, 202, 301 and 302 place a synchronization control task, whose purpose is to restore clock-synchronized operation among the computing modules 100, 200 and 300, at the head of the ready queue as the highest-priority task (Step 202 in Fig. 2).
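Step 202 amounts to inserting one task at the head of the OS ready queue so that the scheduler dispatches it next. The sketch below models the ready queue as a Python `deque`; the handler name `on_resync_interrupt` and the task names are hypothetical, and a real OS scheduler with priority levels would be more involved.

```python
from collections import deque


def on_resync_interrupt(ready_queue: deque, sync_task: str) -> None:
    """Interrupt handler: put the synchronization control task at the head
    of the ready queue so the scheduler dispatches it before any other task."""
    ready_queue.appendleft(sync_task)


ready = deque(["app_task_1", "app_task_2"])  # ordinary tasks already waiting
on_resync_interrupt(ready, "sync_control_task")
next_task = ready.popleft()  # the scheduler takes the task at the head
print(next_task)  # sync_control_task
```

Because every processor runs the same handler on the same interruption, each one dispatches the synchronization control task next, regardless of which application task it was about to run.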
The synchronization control task executes an instruction that accesses a specially designated resource in the synchronization control unit 701. When the OS subsequently moves the synchronization control task to the running state, the task executes the instruction to access the designated resource in the synchronization control unit 701 (Step 203 in Fig. 2).
At this point, the access to the prescribed resource from a computing module in the step-out state and the accesses to the prescribed resource from the other computing modules in the lock-step state naturally reach the synchronization control unit 701 with a time delay between them.
Upon detecting an access from one of the computing modules 100, 200 and 300 to the specially prescribed internal resource, the synchronization control unit 701, starting from the first such access, refrains from returning a response to the accessing computing module and waits for the accesses from all the other computing modules to arrive (Step 204 in Fig. 2). When the accesses from all the computing modules 100, 200 and 300 have arrived, it returns a response to all of them simultaneously.
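The response-withholding behaviour of Step 204 amounts to a barrier over the computing modules. The sketch below models the synchronization control unit 701 as an event-driven state machine; the class `SyncControlUnit`, its method `on_access`, and the use of the reference numerals 100, 200 and 300 as module IDs are illustrative assumptions, not the hardware design.

```python
from typing import Iterable, List, Set


class SyncControlUnit:
    """Sketch of Step 204: responses to accesses on the prescribed resource
    are withheld until every computing module has accessed it, and are then
    released to all modules at once."""

    def __init__(self, module_ids: Iterable[int]) -> None:
        self.module_ids: Set[int] = set(module_ids)
        self.pending: Set[int] = set()  # modules whose access awaits a response

    def on_access(self, module_id: int) -> List[int]:
        """Record an access and return the modules to respond to now.
        An empty list means the response is still being withheld."""
        if module_id not in self.module_ids:
            raise ValueError(f"unknown computing module {module_id}")
        self.pending.add(module_id)
        if self.pending != self.module_ids:
            return []
        released = sorted(self.pending)
        self.pending.clear()  # ready for the next re-synchronization
        return released


unit = SyncControlUnit([100, 200, 300])
print(unit.on_access(100))  # []  (first arrivals wait)
print(unit.on_access(200))  # []
print(unit.on_access(300))  # [100, 200, 300]  (last arrival releases everyone)
```

Because every module receives its response in the same release, the modules leave Step 204 together regardless of which one arrived late.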
Upon receiving the response from the synchronization control unit 701, all the processors in the respective computing modules 100, 200 and 300 end the execution of the synchronization control task (Step 205 in Fig. 2). Thereafter, all the processors continue ordinary program operation (Step 206 in Fig. 2).
The operation described in the foregoing enables the computing modules 100, 200 and 300 to continue their operation in clock synchronization with each other again. At this time, as described above, since re-synchronization processing is executed before the contents of the memories 104, 204 and 304 in the computing modules 100, 200 and 300 lose coincidence, all the computing modules 100, 200 and 300 can again execute the same instruction string at the same timing once clock-synchronized operation restarts. This eliminates the memory copy that a conventional fault-tolerant computer system requires for re-synchronization, thereby enabling high-speed re-synchronization processing.
Fig. 3 is a block diagram showing a structure of a fault-tolerant computer system according to a second mode of implementation of the present invention.
With reference to Fig. 3, the fault-tolerant computer system according to the present mode of implementation is structured to include a plurality of computing modules 100 and 200, each having a processor and a memory, and a plurality of peripheral device control units 400 and 500, each having a PCI bridge 703. The computing modules 100 and 200 process the same instruction string in clock synchronization with each other. The fault-tolerant computer system compares the processing results of the computing modules. Even when one computing module has a failure, the processing can be continued by the remaining computing modules. In addition, the peripheral device control units 400 and 500 are multiplexed by software control so that, even when one peripheral device control unit develops a fault, processing can be continued using the other peripheral device control unit.
The peripheral device control unit 400 includes: the PCI bridge 703, which is connected through a PCI bus to the memory control units 105 and 205 in the respective computing modules 100 and 200 and establishes connection with a peripheral device; a bus monitor 700 for monitoring the access of each processor in each of the computing modules 100 and 200 to an external bus; a fault detecting unit 702 for monitoring existence/non-existence of a fault in the entire fault-tolerant computer system including the computing modules 100 and 200; and a synchronization control unit 701, connected to each computing module through the PCI bridge 703, for adjusting the timing of the response to an access from each computing module to recover clock synchronization of the computing modules.
Although not illustrated in the figure, the peripheral device control unit 500 also has the above-described respective components similarly to the peripheral device control unit 400.
The lock-step synchronism fault-tolerant computer system structured according to the present mode of implementation ordinarily monitors the clock-synchronized operation of the computing modules 100 and 200 and controls a peripheral device by using the peripheral device control unit 400. When a failure occurs in the peripheral device control unit 400, the system switches to the peripheral device control unit 500 and conducts the same processing with it.
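The software-controlled switch between the two peripheral device control units can be sketched as a simple preference-ordered failover. This is a minimal model under stated assumptions: the class `PeripheralControlPair` and its methods are hypothetical, and how a unit failure is actually detected (for example by the fault detecting unit 702) is outside the sketch.

```python
class PeripheralControlPair:
    """Sketch: software-multiplexed peripheral device control units.
    Unit 400 is used normally; on its failure, unit 500 takes over."""

    def __init__(self, primary: int = 400, standby: int = 500) -> None:
        self.units = [primary, standby]  # order of preference
        self.failed: set = set()

    def active(self) -> int:
        """Return the unit currently used to control the peripheral device."""
        for unit in self.units:
            if unit not in self.failed:
                return unit
        raise RuntimeError("no healthy peripheral device control unit")

    def report_failure(self, unit: int) -> None:
        """Mark a unit as failed so later requests use the other one."""
        self.failed.add(unit)


pair = PeripheralControlPair()
print(pair.active())  # 400
pair.report_failure(400)
print(pair.active())  # 500
```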
In the present mode of implementation, the instruction to access the prescribed resource in the synchronization control unit 701 shown in Fig. 2 (Step 203 in Fig. 2) is realized as a read instruction to a register in the synchronization control unit 701 of the peripheral device control unit 400. The read instruction is transmitted to the synchronization control unit 701 through the PCI buses 800 and 801 and the PCI bridge 703, and its response is returned to each of the computing modules 100 and 200 along the same route.
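The blocking register read can be modelled in software with a thread per computing module and a barrier standing in for the synchronization control unit 701. This is only an analogy under stated assumptions: `SyncRegister`, `module_task` and `run` are hypothetical names, the register value is assumed to be 0, and real modules would stall on an outstanding PCI read rather than on `threading.Barrier`.

```python
import threading
import time


class SyncRegister:
    """Hypothetical model of the register in synchronization control unit 701:
    each read blocks until every computing module has issued its read, and
    then all reads complete together."""

    def __init__(self, n_modules: int) -> None:
        # A timeout guards against a module that never arrives; on expiry
        # wait() raises threading.BrokenBarrierError.
        self._barrier = threading.Barrier(n_modules, timeout=5.0)

    def read(self) -> int:
        self._barrier.wait()
        return 0  # register contents are irrelevant here; assumed 0


def module_task(name, reg, delay_s, log, lock):
    time.sleep(delay_s)  # a stepped-out module arrives late
    with lock:
        log.append(("read_issued", name))  # Step 203
    reg.read()
    with lock:
        log.append(("read_done", name))  # Step 205


def run(delays):
    reg = SyncRegister(len(delays))
    log, lock = [], threading.Lock()
    threads = [threading.Thread(target=module_task, args=(n, reg, d, log, lock))
               for n, d in delays.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


events = run({"module_100": 0.0, "module_200": 0.05})
# Every read is issued before any read completes.
print([kind for kind, _ in events])
# ['read_issued', 'read_issued', 'read_done', 'read_done']
```

The ordering holds because each thread logs its read before waiting, and the barrier releases no thread until all have reached it.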
Content of the re-synchronization processing in the present mode of implementation is the same as that shown in Fig. 2.
In addition, although the present mode of implementation shows an embodiment with two computing modules, a structure with three computing modules, as in the first mode of implementation illustrated in Fig. 1, or with four or more modules functions in the same manner.
Fig. 4 is a block diagram showing a structure of a fault-tolerant computer system according to a third mode of implementation of the present invention.
Shown in the present mode of implementation is a structure in which a bus monitor 700 is connected to computing modules 100 and 200 through a PCI bridge 703.
In the present mode of implementation, the external buses 103 and 203 of the respective processors are monitored by means of signals (using the PCI bus protocol) transmitted to the bus monitor 700 through the memory control units 105 and 205 of the respective computing modules 100 and 200, the PCI buses 800 and 801, and the PCI bridge 703. In addition, interruptions from the bus monitor 700 to each computing module are transmitted along the reverse of this route.
Content of the re-synchronization processing in the present mode of implementation is the same as that shown in Fig. 2.
Although in the second and third modes of implementation the respective computing modules, the bus monitor and the like are connected using PCI, these components may instead be connected using an interface of another standard such as PCI-X, or a proprietary interface not standardized for general purposes, without affecting any of the effects of the present invention.
In the fault-tolerant computer system of the present invention, the function of each unit for executing re-synchronization processing can be realized not only by hardware but also by loading a re-synchronization processing program 1000 which executes the function of each of the above-described units into a memory of a computer processing device to control the computer processing device. The re-synchronization processing program 1000 is stored in a magnetic disk, a semiconductor memory or other recording medium and loaded from the recording medium into the computer processing device to control operation of the computer processing device, thereby realizing each of the above-described functions.
Although the present invention has been described with respect to the preferred modes of implementation in the foregoing, the present invention is not necessarily limited to the above-described modes of implementation and can be realized in various forms within the scope of its technical idea.
Although each of the above-described modes of implementation shows a structure in which each computing module has two processors, a structure with one processor or with three or more processors functions in exactly the same manner.
In addition, although each of the modes of implementation shows a case where the respective processors share one external bus, the effects of the present invention are not affected by, for example, a structure in which a plurality of processors are connected to a memory control unit in a star topology, or a structure in which the processors forming one computing module are physically mounted on a plurality of boards.
As described in the foregoing, the present invention attains the following effects.
The first effect is that a computing module in a fault-tolerant computer system that falls out of the lock-step state for a cause other than a fixed failure can be restored to the lock-step state in an extremely short period of time.
The reason is that at the initial stage of a step-out, before the memory contents of the computing modules have diverged, the bus monitor generates an interruption to each processor so that a task executing an instruction string for controlling re-synchronization runs with priority, thereby recovering synchronization without copying memory.
The second effect is improved availability of the fault-tolerant computer system. The reason is that the halt time of the entire system can be drastically reduced by significantly shortening the time needed for re-integration when lock-step is lost.
Although the invention has been illustrated and described with respect to an exemplary embodiment thereof, it should be understood by those skilled in the art that the foregoing and various other changes, omissions and additions may be made therein and thereto without departing from the spirit and scope of the present invention. Therefore, the present invention should not be understood as limited to the specific embodiment set out above, but as including all possible embodiments which can be embodied within the scope encompassed by, and equivalents of, the features set out in the appended claims.
The reason is that, in order to integrate a computing module in the step-out state into the operational state, all the memory contents are always copied from the computing modules continuing with operation into the computing module to be re-integrated.
If operation of a normal computing module continues during the copying, its memory contents may also change during the copy, so that the copy cannot be performed properly. To avoid this, the computing module in the operational state is temporarily stopped to prevent its memory contents from being updated.
Since the memory capacity of a computing module today reaches several gigabytes, copying the entire memory region requires a long period of time.
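A back-of-the-envelope estimate shows why the copy dominates the halt. Because the operational module is stopped for the whole copy, the halt is roughly memory size divided by copy bandwidth. The 4 GiB size and 200 MiB/s bandwidth below are illustrative assumptions only, not figures from this patent, and setup and verification overheads are ignored.

```python
GIB = 2 ** 30
MIB = 2 ** 20


def copy_halt_seconds(memory_bytes: int, copy_bandwidth_bytes_per_s: float) -> float:
    """Approximate system halt for a full-memory copy: the operational module
    is stopped for the whole copy, so the halt is roughly size / bandwidth."""
    return memory_bytes / copy_bandwidth_bytes_per_s


# Illustrative figures only: 4 GiB of memory over a link assumed to
# sustain 200 MiB/s.
print(copy_halt_seconds(4 * GIB, 200 * MIB))  # 20.48
```

The halt grows linearly with memory size, whereas the re-synchronization described in this patent involves no memory copy at all.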
In a lock-step synchronism fault-tolerant computer system, a step-out state among computing modules occurs due to various causes.
The first case is a fixed failure occurring within a computing module. In this case, the failed computing module must be replaced, and when the replacement module is integrated into the operational system, all the data in the memory of a computing module in the operational state needs to be copied.
In a lock-step synchronism fault-tolerant computer system, a step-out state may also occur, apart from the above-described fixed failure, because computing modules operate at different timing due to manufacturing variations among the units in a computing module even though its operation is normal, or because of an automatically correctable intermittent memory failure caused by α-rays or the like.
In these cases, since no fixed failure has occurred in the computing module itself, the module fundamentally needs no replacement; by re-synchronizing its processing with that of the other computing modules in operation and re-integrating the computing module in question, the entire fault-tolerant computer system can be restored to a normal operation state.
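The distinction drawn above between a fixed failure and the other step-out causes can be summarized as a small decision function. The enum members and step labels are hypothetical names chosen for this sketch; the steps themselves paraphrase the background text (replacement plus full memory copy with the operational module stopped, versus re-synchronization without replacement).

```python
from enum import Enum, auto
from typing import List


class StepOutCause(Enum):
    FIXED_FAILURE = auto()         # hard fault inside a computing module
    TIMING_VARIATION = auto()      # manufacturing differences, module otherwise normal
    CORRECTED_SOFT_ERROR = auto()  # e.g. auto-corrected memory error from alpha rays


def recovery_plan(cause: StepOutCause) -> List[str]:
    """Map a step-out cause to the recovery steps described in the text."""
    if cause is StepOutCause.FIXED_FAILURE:
        return ["replace module", "stop operational module",
                "copy entire memory", "re-integrate module"]
    # No fixed failure: the module stays in place and is re-synchronized.
    return ["re-synchronize with operational modules", "re-integrate module"]


print(recovery_plan(StepOutCause.TIMING_VARIATION))
```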
SUMMARY OF THE INVENTION
An object of an embodiment of the present invention is to provide a fault-tolerant computer system, a re-synchronization method thereof and a re-synchronization program thereof which enable a computing module whose lock-step is lost for a cause other than a fixed failure to be re-integrated into the operational state faster than in a conventional system, thereby drastically reducing the temporary halt of system operation caused by the integration processing.
Another object of an embodiment of the present invention is to provide a fault-tolerant computer system, a re-synchronization method thereof and a re-synchronization program thereof which improve the availability of the system by reducing the time required for the above-described re-integration processing.
According to the first aspect of the invention, there is provided a lock-step synchronism fault-tolerant computer system including a plurality of computing modules, each having a processor and a memory, in which the computing modules process the same instruction string in synchronization with each other, wherein, when disagreement is detected in the state of access to an external bus among the respective processors of the computing modules and no fault is detected in the system including the computing modules, synchronization among the computing modules is recovered by adjusting the timing of the response to an access that each processor executes, as a synchronization control instruction, upon an interruption.
In the preferred construction, the fault-tolerant computer system further comprises a fault detector which monitors existence/non-existence of a fault in the entire system;
a bus monitor which monitors an access of the processor in each computing module to the external bus and, when detecting disagreement in output among the respective computing modules while no fault is detected by the fault detector, notifies each processor of an interruption; and a synchronization controller which re-synchronizes the computing modules by adjusting the timing of the response to the access from each processor that is caused by the interruption.
In another preferred construction, the bus monitor, when detecting disagreement in output among the respective computing modules while no fault is detected by the fault detector, interrupts each processor with a predetermined task, namely a task of executing an access to a predetermined resource in the synchronization controller, to re-synchronize the computing modules, and the synchronization controller transmits a response to all the computing modules simultaneously when it has received accesses to the resource from all the processors.
In another preferred construction, a plurality of sets, each consisting of the bus monitor, the fault detector and the synchronization controller, are provided.
In another preferred construction, the bus monitor, the fault detector and the synchronization controller are provided in a peripheral device control unit which controls a peripheral device and is connected to the external bus of each computing module through a PCI bridge.
According to the second aspect of the invention, there is provided a re-synchronization method in a lock-step synchronism fault-tolerant computer system including a plurality of computing modules, each having a processor and a memory, in which the computing modules process the same instruction string in synchronization with each other, comprising the steps of: when disagreement is detected in the state of access to an external bus among the respective processors of the computing modules and no fault is detected in the system including the computing modules, generating an interruption to all of the processors; and causing each processor to execute a synchronization control instruction to adjust the timing of the response to an access from each processor, thereby causing the computing modules to resume operation in synchronization.
In the preferred construction, the re-synchronization method further comprises the steps of detecting existence/non-existence of a fault in the entire system including the computing modules; monitoring an access of the processor in each computing module to the external bus; when disagreement in output is detected among the respective computing modules and no fault is detected in the system, notifying each processor of an interruption; and causing each processor to execute the clock synchronization control instruction to adjust the timing of the response to an access from each processor, thereby causing the computing modules to resume operation in synchronization.
In another preferred construction, the re-synchronization method further comprises the steps of: when disagreement in output is detected among the respective computing modules and no fault is detected in the system, interrupting each processor with a predetermined task for re-synchronizing the respective computing modules, namely a task of executing an access to a predetermined resource;
queuing access to the resource from each processor, and responding to the accesses from all the computing modules simultaneously when all the accesses from the processors are received.
According to another aspect of the invention, there is provided a re-synchronization program for executing re-synchronization processing of a lock-step synchronism fault-tolerant computer system including a plurality of computing modules, each having a processor and a memory, in which the computing modules process the same instruction string in synchronization with each other, the program comprising the functions of: when disagreement is detected in the state of access to an external bus among the respective processors of the computing modules and no fault is detected in the system including the computing modules, generating an interruption to all of the processors; and causing each processor to execute a clock synchronization control instruction to adjust the timing of the response to an access from each processor, thereby causing the computing modules to resume operation in synchronization.
In the preferred construction, the re-synchronization program further comprises the functions of detecting existence/non-existence of a fault in the entire system including the computing modules; monitoring an access of the processor in each computing module to the external bus; when disagreement in output is detected among the respective computing modules and no fault is detected in the system, notifying each processor of an interruption; and causing each processor to execute the synchronization control instruction to adjust the timing of the response to an access from each processor, thereby causing the computing modules to resume operation in synchronization.
In another preferred construction, the re-synchronization program further comprises the functions of: when disagreement in output is detected among the respective computing modules and no fault is detected in the system, interrupting each processor with a predetermined task for re-synchronizing the respective computing modules, namely a task of executing an access to a predetermined resource;
queuing access to the resource from each processor, and responding to the accesses from all the computing modules simultaneously when all the accesses from the processors are received.
According to another aspect of the invention, there is provided a lock-step synchronism fault-tolerant computer system including a plurality of computing modules having a processor and a memory in which each computing module processes the same instruction string in synchronization with each other, comprising: a fault detector which monitors existence/non-existence of a fault in the entire system; a bus monitor which monitors an access of the processor in each said computing module to the external bus and, when detecting disagreement in output among the respective computing modules, if no fault is detected by said fault detector, notifies each said processor of an interruption; and a synchronization controller which re-synchronizes each computing module by adjusting the timing of the response to an access from each said processor which is caused by said interruption; wherein said bus monitor, when detecting disagreement in output among the respective computing modules, if no fault is detected by said fault detector, interrupts each said processor with a predetermined task, which is a task of executing an access to a predetermined resource in said synchronization controller, to re-synchronize the computing modules, and said synchronization controller transmits a response to all the computing modules simultaneously when receiving accesses to said resource from all the processors.
According to a further aspect of the invention, there is provided a re-synchronization method in a lock-step synchronism fault-tolerant computer system including a plurality of computing modules having a processor and a memory in which each computing module processes the same instruction string in synchronization with each other, comprising the steps of: detecting existence/non-existence of a fault in the entire system including each said computing module; monitoring an access of the processor in each said computing module to the external bus; when detecting disagreement in output among the respective computing modules, if no fault is detected in the system, notifying each said processor of an interruption; causing each said processor to execute a clock synchronization control instruction to adjust the timing of the response to an access from each processor, thereby causing each computing module to resume operation in synchronization; when detecting disagreement in output among the respective computing modules, if no fault is detected in the system, interrupting each said processor with a predetermined task for re-synchronizing the respective computing modules, which is a task of executing an access to a predetermined resource; and queuing access to said resource from each processor, and responding to said accesses from all the computing modules simultaneously when all the accesses from said processors are received.
According to a still further aspect of the invention, there is provided a computer readable medium having computer readable code embodied therein for executing re-synchronization processing of a lock-step synchronism fault-tolerant computer system including a plurality of computing modules having a processor and a memory in which each computing module processes the same instruction string in synchronization with each other, said computer readable code, when executed, performing the functions of: detecting existence/non-existence of a fault in the entire system including each said computing module; monitoring an access of the processor in each said computing module to the external bus; when detecting disagreement in output among the respective computing modules, if no fault is detected in the system, notifying each said processor of an interruption; causing each said processor to execute the synchronization control instruction to adjust the timing of the response to an access from each processor, thereby causing each computing module to resume operation in synchronization; when detecting disagreement in output among the respective computing modules, if no fault is detected in the system, interrupting each said processor with a predetermined task for re-synchronizing the respective computing modules, which is a task of executing an access to a predetermined resource; and queuing access to said resource from each processor, and responding to said accesses from all the computing modules simultaneously when all the accesses from said processors are received.
Other objects, features and advantages of the present invention will become clear from the detailed description given herebelow.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be understood more fully from the detailed description given herebelow and from the accompanying drawings of the preferred embodiment of the invention, which, however, should not be taken to be limitative to the invention, but are for explanation and understanding only.
In the drawings:
Fig. 1 is a block diagram showing a structure of a fault-tolerant computer system according to a first mode of implementation of the present invention;
Fig. 2 is a diagram for use in explaining the contents of re-synchronization processing of the fault-tolerant computer system;
Fig. 3 is a block diagram showing a structure of a fault-tolerant computer system according to a second mode of implementation of the present invention; and Fig. 4 is a block diagram showing a structure of a fault-tolerant computer system according to a third mode of implementation of the present invention.
DESCRIPTION OF THE PREFERRED EMBODIMENT
The preferred embodiment of the present invention will be discussed hereinafter in detail with reference to the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be obvious, however, to those skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures are not shown in detail in order not to obscure the present invention unnecessarily.
Modes of implementation of the present invention will be described in detail with reference to the drawings in the following. Fig. 1 is a block diagram showing a structure of a lock-step synchronism fault-tolerant computer system according to a first mode of implementation of the present invention.
With reference to Fig. 1, the fault-tolerant computer system according to the present mode of implementation includes a plurality of computing modules 100, 200 and 300, each of which processes the same instruction string in clock synchronization with the others. The fault-tolerant computer system compares the processing results of the computing modules. Even when one computing module has a failure, the processing can be continued by the remaining computing modules.
The respective computing modules 100, 200 and 300 include a plurality of processors 101 and 102, 201 and 202, and 301 and 302, processor external buses 103, 203 and 303, memories 104, 204 and 304 and memory control units 105, 205 and 305, respectively.
In addition, the computing modules 100, 200 and 300 are connected to peripheral device control units 400 and 500 for controlling a peripheral device through the memory control units 105, 205 and 305 and interface signal lines 600, 601, 602, 610, 611 and 612.
The above-described fault-tolerant computer system further includes a bus monitor 700, a fault detecting unit 702 and a synchronization control unit 701.
The bus monitor 700 monitors an access of a processor of each computing module to the external bus.
The bus monitor 700 is connected to the processor external buses 103, 203 and 303 of the respective computing modules 100, 200 and 300 through interface signal lines 710, 711 and 712.
The fault detecting unit 702 monitors existence/non-existence of a fault in the entire system including the respective computing modules.
The synchronization control unit 701, which is connected to each computing module, adjusts timing of a response to an access from each computing module to cause each computing module to resume operation in clock synchronization. The synchronization control unit 701 is connected to the memory control units 105, 205 and 305 of the respective computing modules 100, 200 and 300 through interface signal lines 730, 731 and 732.
Next, description will be made of operation of thus structured fault-tolerant computer system according to the present mode of implementation.
The fault detecting unit 702 monitors existence/non-existence of a fixed fault in the entire fault-tolerant computer system including the respective computing modules 100, 200 and 300 and the peripheral device control units 400 and 500. Then, the fault detecting unit 702 notifies a monitoring result to the bus monitor 700.
The bus monitor 700, which is connected to the processor external buses 103, 203 and 303 of the respective computing modules 100, 200 and 300 through the interface signal lines 710, 711 and 712, compares the external access control signals of the respective processors 101, 102, 201, 202, 301 and 302 to monitor whether they access the external buses 103, 203 and 303 in clock synchronization with each other, at the same timing.
If, through the above-described monitoring operation, the bus monitor 700 detects that any of the processors 101, 102, 201, 202, 301 and 302 is operating at a timing different from that of the others, and the fault detecting unit 702 detects no fixed fault anywhere in the fault-tolerant computer system, the bus monitor 700 determines that the step-out is not caused by a fault.
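The comparison and classification performed by the bus monitor 700 can be sketched for a single bus cycle. The sketch assumes one snapshot of external access control signals per computing module, represented by a hypothetical tuple (access asserted, address, write flag); with several processors per module, one such comparison would apply per corresponding processor position. The function name and returned action labels are also hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical snapshot of one computing module's external access control
# signals in a single bus cycle: (access_asserted, address, is_write).
BusSignals = Tuple[bool, int, bool]


def bus_monitor_decision(signals_by_module: Dict[int, BusSignals],
                         fixed_fault_detected: bool) -> str:
    """Compare the modules' bus signals for one cycle and choose an action."""
    if len(set(signals_by_module.values())) <= 1:
        return "lock_step"  # every module did the same thing this cycle
    if fixed_fault_detected:
        return "fault_handling"  # ordinary fault path, outside this sketch
    return "interrupt_all_processors"  # non-fault step-out: start Step 201


same = (True, 0x1000, False)
print(bus_monitor_decision({100: same, 200: same, 300: same}, False))  # lock_step
print(bus_monitor_decision({100: same, 200: same, 300: (False, 0, False)}, False))
# interrupt_all_processors
```

Checking the fault detector only after a mismatch mirrors the text: a mismatch alone triggers re-synchronization only when no fixed fault is reported.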
The result is notified to all the computing modules 100, 200 and 300 through the interface signal lines 710, 711 and 712 to generate an interruption to each processor In addition, the bus monitor 700 at the same time shifts to a break mode of monitoring the external buses 103, 203 and 303 of the processors.
Here, the bus monitor 700 monitors all the accesses to the external buses 103, 203 and 303 including a memory access from the processor and when detecting lack of synchronization in operation among the computing modules, instantaneously interrupts all the processors 101, 102, 201, 202, 301 and 302 to interrupt the processing, so that at the time of the interruption is generated, the contents of the memories 104, 204 and 304 in the respective computing modules 100, 200 and 300 are all coincident with each other.
In the following, description will be made of -w5-specific contents of operation of the fault-tolerant computer system according to the present mode of implementation with reference to Fig. 2.
When the bus monitor 700 detects lack of synchronization in operation among the computing modules (Step 201 in Fig. 2), the detection is notified through the interface signal lines 710, 711 and 712 to generate an interruption to each processor.
All the processors 101, 102, 201, 202, 301 and 302 are at the relevant interruption processing and queue a synchronization control task intended to obtain re-synchronization of clock synchronization operation among the respective computing modules 100, 200 and 300 to the top of a ready queue as a highest-priority task (Step 202 in Fig. 2).
The synchronization control task has a function of executing an instruction to access a resource specially prescribed in the synchronization control unit 701. Thereafter, when the above-described synchronization control task is shifted to an execution state by an OS, the task executes the instruction to access the prescribed resource in the synchronization control unit 701 (Step 203 in Fig. 2).
At this time point, an access to the prescribed resource from a computing module in the step-out state and an access to the prescribed resource from other computing modules in the lock-step state are naturally transmitted to the synchronization control unit '701 with a time delay.
Upon detecting an access from the computing modules 100, 200 and 300 to the internal resource specially prescribed, the synchronization control unit 701, when the access is the first, refrains from returning a response to the relevant computing module and waits for accesses from all of the other computing modules to come (Step 204 in Fig. 2). When the accesses from all the computing modules 100, 200 and 300 are transmitted, return a response to the accesses simultaneously to all the computing modules 100, 200 and 300.
In response to the response from the synchronization control unit 701, all the processors in the respective computing modules 100, 200 and 300 end the execution of the synchronization control task (Step 205 in Fig. 2). Thereafter, all the processors continue ordinary program operation (Step 206 in Fig. 2).
The operation described above allows the computing modules 100, 200 and 300 to resume operating in clock synchronization with one another. Because, as described above, the re-synchronization processing is executed before the contents of the memories 104, 204 and 304 in the computing modules 100, 200 and 300 lose coincidence, all the computing modules can again execute the same instruction string at the same timing once clock-synchronized operation restarts. This eliminates the memory copying that a conventional fault-tolerant computer system requires for re-synchronization, so re-synchronization processing completes at high speed.
Fig. 3 is a block diagram showing a structure of a fault-tolerant computer system according to a second mode of implementation of the present invention.
With reference to Fig. 3, the fault-tolerant computer system according to the present mode of implementation includes a plurality of computing modules 100 and 200, each having a processor and a memory, and a plurality of peripheral device control units 400 and 500, each having a PCI bridge 703. The computing modules 100 and 200 process the same instruction string in clock synchronization with each other, and the fault-tolerant computer system compares the processing results of the computing modules. Even when one computing module fails, processing can continue on the remaining computing modules. In addition, the peripheral device control units 400 and 500 are multiplexed under software control so that, even when one peripheral device control unit develops a fault, processing can continue using the other.
The peripheral device control unit 400 includes: the PCI bridge 703, which is connected through a PCI bus to the memory control units 105 and 205 of the computing modules 100 and 200 and provides the connection to peripheral devices; a bus monitor 700, which monitors the accesses of each processor in the computing modules 100 and 200 to its external bus; a fault detecting unit 702, which monitors whether a fault exists anywhere in the fault-tolerant computer system, including the computing modules 100 and 200; and a synchronization control unit 701, which is connected to each computing module through the PCI bridge 703 and adjusts the timing of its responses to accesses from the computing modules in order to restore their clock synchronization.
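The interplay between the bus monitor 700 and the fault detecting unit 702 reduces to a small decision rule, the one stated in claim 1: a disagreement in output with no detected fault triggers re-synchronization. The following is a hedged Python sketch; `Action`, `bus_monitor_decide` and the byte-string representation of bus outputs are assumptions, and the handling of a disagreement that coincides with a detected fault is outside what this section describes.

```python
from enum import Enum


class Action(Enum):
    STAY_IN_LOCKSTEP = "stay"    # all modules issued identical bus accesses
    RESYNC_INTERRUPT = "resync"  # outputs disagree, no fault: interrupt every processor
    FAULT_PATH = "fault"         # outputs disagree and a fault was detected (not modelled)


def bus_monitor_decide(outputs: dict[int, bytes], fault_detected: bool) -> Action:
    """Hypothetical decision rule for bus monitor 700.

    `outputs` maps a computing-module ID to the external-bus access it just
    issued; `fault_detected` is the current verdict of fault detecting unit 702.
    """
    if len(set(outputs.values())) <= 1:
        return Action.STAY_IN_LOCKSTEP
    if fault_detected:
        return Action.FAULT_PATH
    return Action.RESYNC_INTERRUPT


assert bus_monitor_decide({100: b"\x10", 200: b"\x10"}, False) is Action.STAY_IN_LOCKSTEP
assert bus_monitor_decide({100: b"\x10", 200: b"\x11"}, False) is Action.RESYNC_INTERRUPT
```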
Although not illustrated in the figure, the peripheral device control unit 500 also has the above-described respective components similarly to the peripheral device control unit 400.
In normal operation, the lock-step fault-tolerant computer system of the present mode of implementation monitors the clock-synchronized operation of the computing modules 100 and 200 and controls peripheral devices using the peripheral device control unit 400. When a failure occurs in the peripheral device control unit 400, the system switches to the peripheral device control unit 500 and continues the same processing.
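The following is a minimal Python sketch of the software-controlled switch-over between the peripheral device control units 400 and 500. `PeripheralPath`, `report_failure` and `route` are hypothetical names, and how a failure of unit 400 is actually detected is not specified here.

```python
class PeripheralPath:
    """Hypothetical software-controlled duplexing of units 400 and 500."""

    def __init__(self, primary: int = 400, standby: int = 500) -> None:
        self.active = primary
        self.standby = standby
        self.failed: set[int] = set()

    def report_failure(self, unit: int) -> None:
        self.failed.add(unit)
        if unit != self.active:
            return  # a failed standby is recorded but changes nothing now
        if self.standby in self.failed:
            raise RuntimeError("no healthy peripheral device control unit left")
        # Continue the same processing through the standby unit.
        self.active, self.standby = self.standby, self.active

    def route(self) -> int:
        """Unit through which peripheral accesses are currently issued."""
        return self.active


path = PeripheralPath()
path.report_failure(400)
assert path.route() == 500
```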
In the present mode of implementation, the instruction that accesses the prescribed resource in the synchronization control unit 701 (Step 203 in Fig. 2) is realized as a read instruction targeting a register in the synchronization control unit 701 of the peripheral device control unit 400. The read is transmitted to the synchronization control unit 701 over the PCI buses 800 and 801 and through the PCI bridge 703, and the response returns to each of the computing modules 100 and 200 over the same route.
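From the processors' point of view, Steps 203 to 206 look like a register read that does not complete until every module has issued it. The sketch below models that behaviour with Python's `threading.Barrier`, a swapped-in software analogue of the register in the synchronization control unit 701 rather than the patent's hardware mechanism; `run_resync` and the skew values are illustrative assumptions.

```python
import threading
import time


def run_resync(skews: dict[int, float]) -> tuple[dict[int, float], dict[int, float]]:
    """Simulate Steps 203-206 for modules keyed by ID with given arrival skews (s)."""
    # The barrier stands in for the register in synchronization control unit 701:
    # a read of it is not answered until every module has issued one.
    sync_register = threading.Barrier(len(skews))
    arrivals: dict[int, float] = {}
    resumes: dict[int, float] = {}

    def module(module_id: int, skew_s: float) -> None:
        time.sleep(skew_s)                      # step-out module reaches the task later
        arrivals[module_id] = time.monotonic()  # Step 203: issue the read
        sync_register.wait()                    # Step 204: no response yet, processor stalls
        resumes[module_id] = time.monotonic()   # Steps 205/206: task ends, work resumes

    threads = [threading.Thread(target=module, args=(m, s)) for m, s in skews.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return arrivals, resumes


arrivals, resumes = run_resync({100: 0.0, 200: 0.05})
# No module resumes before the last one has issued its read.
assert min(resumes.values()) >= max(arrivals.values())
```

The final check states the property the protocol relies on: no module resumes before the slowest module has issued its read. A software barrier cannot reproduce the cycle-exact simultaneous response that the hardware provides.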
Content of the re-synchronization processing in the present mode of implementation is the same as that shown in Fig. 2.
Although the present mode of implementation shows an embodiment with two computing modules, a structure with three computing modules, as in the first mode of implementation illustrated in Fig. 1, or with four or more computing modules functions in the same manner.
Fig. 4 is a block diagram showing a structure of a fault-tolerant computer system according to a third mode of implementation of the present invention.
The present mode of implementation shows a structure in which the bus monitor 700 is connected to the computing modules 100 and 200 through the PCI bridge 703.
In the present mode of implementation, the external buses 103 and 203 of the processors are monitored by means of signals (following the PCI bus protocol) transmitted to the bus monitor 700 through the memory control units 105 and 205 of the computing modules 100 and 200, the PCI buses 800 and 801 and the PCI bridge 703. Interrupts from the bus monitor 700 to each computing module travel the same route in reverse.
Content of the re-synchronization processing in the present mode of implementation is the same as that shown in Fig. 2.
Although the second and third modes of implementation connect the computing modules, the bus monitor and the other components using PCI, these connections may instead use another standard interface such as PCI-X, or a proprietary interface not standardized for general use, without affecting any of the effects of the present invention.
In the fault-tolerant computer system of the present invention, the function of each unit for executing re-synchronization processing can be realized not only by hardware but also by loading a re-synchronization processing program 1000 which executes the function of each of the above-described units into a memory of a computer processing device to control the computer processing device. The re-synchronization processing program 1000 is stored in a magnetic disk, a semiconductor memory or other recording medium and loaded from the recording medium into the computer processing device to control operation of the computer processing device, thereby realizing each of the above-described functions.
Although the present invention has been described above with respect to its preferred modes of implementation, it is not necessarily limited to them and can be realized in various forms within the scope of its technical idea.
Although each of the above-described modes of implementation shows each computing module with two processors, a structure with one processor or with three or more processors functions in exactly the same manner.
In addition, although each mode of implementation shows the processors sharing one external bus, neither a structure in which a plurality of processors are connected to a memory control unit in a star topology nor a structure in which the processors forming one computing module are physically mounted on a plurality of boards affects the effects of the present invention.
As described in the foregoing, the present invention attains the following effects.
The first effect is that a computing module in the fault-tolerant computer system that falls out of the lock-step state for a reason other than a permanent (fixed) failure can be restored to the lock-step state in an extremely short time.
The reason is that, at the initial stage of a step-out, while the memories of the computing modules still agree with one another, the bus monitor interrupts the processors so that they preferentially execute a task running the instruction string that controls re-synchronization; this recovers synchronization without copying memory.
The second effect is improved availability of the fault-tolerant computer system. The reason is that significantly speeding up re-integration after lock-step is lost drastically reduces the time during which the entire system is halted.
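The availability argument can be made concrete with the standard steady-state formula A = MTBF / (MTBF + MTTR), treating each step-out as a failure and the re-integration time as the repair time. The patent gives no timing figures, so the values below are purely hypothetical.

```python
def availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR), both in hours."""
    return mtbf_h / (mtbf_h + mttr_h)


# Hypothetical figures: one step-out every 1,000 hours of operation.
MTBF_H = 1_000.0
with_memory_copy = availability(MTBF_H, 60 / 3600)       # assume 60 s to copy memory
with_barrier_resync = availability(MTBF_H, 1e-3 / 3600)  # assume 1 ms re-synchronization

print(f"memory copy:    {with_memory_copy:.8f}")
print(f"barrier resync: {with_barrier_resync:.8f}")
```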
Although the invention has been illustrated and described with respect to exemplary embodiments thereof, it should be understood by those skilled in the art that the foregoing and various other changes, omissions and additions may be made therein and thereto without departing from the spirit and scope of the present invention. Therefore, the present invention should not be understood as limited to the specific embodiments set out above but as including all possible embodiments that can be embodied within the scope encompassed by, and equivalents of, the features set out in the appended claims.
Claims (5)
1. A lock-step synchronism fault-tolerant computer system including a plurality of computing modules having a processor and a memory in which each computing module processes the same instruction string in synchronization with each other, comprising:
a fault detector which monitors existence/non-existence of a fault in the entire system;
a bus monitor which monitors an access of the processor in each said computing module to the external bus and, when detecting disagreement in output among the respective computing modules, if no fault is detected by said fault detector, notifies each said processor of an interruption; and a synchronization controller which re-synchronizes each computing module by adjusting timing of the response to an access from each said processor which is caused by said interruption;
wherein said bus monitor, when detecting disagreement in output among the respective computing modules, if no fault is detected by said fault detector, interrupts each said processor with a predetermined task, which is a task of executing an access to a predetermined resource in said synchronization controller, to re-synchronize the computing modules, and said synchronization controller transmits a response to all the computing modules simultaneously when receiving accesses to said resource from all the processors.
2. The fault-tolerant computer system as set forth in claim 1, wherein a plurality of pairs of said bus monitor, said fault detector and said synchronization controller are provided.
3. The fault-tolerant computer system as set forth in claim 1, wherein said bus monitor, said fault detector and said synchronization controller are provided in a peripheral device control unit which controls a peripheral device and is connected to the external bus in said computing module through a PCI bridge.
4. A re-synchronization method in a lock-step synchronism fault-tolerant computer system including a plurality of computing modules having a processor and a memory in which each computing module processes the same instruction string in synchronization with each other, comprising the steps of:
detecting existence/non-existence of a fault in the entire system including each said computing module;
monitoring an access of the processor in each said computing module to the external bus;
when detecting disagreement in output among the respective computing modules, if no fault is detected in the system, notifying each said processor of an interruption;
causing each said processor to execute a clock synchronization control instruction to adjust timing of the response to an access from each processor, thereby causing each computing module to resume operation in synchronization;
when detecting disagreement in output among the respective computing modules, if no fault is detected in the system, interrupting each said processor with a predetermined task for re-synchronizing the respective computing modules, which is a task of executing an access to a predetermined resource; and queuing the accesses to said resource from each processor, and responding to said accesses from all the computing modules simultaneously when all the accesses from said processors are received.
5. A computer readable medium having computer readable code embodied therein for executing re-synchronization processing of a lock-step synchronism fault-tolerant computer system including a plurality of computing modules having a processor and a memory in which each computing module processes the same instruction string in synchronization with each other, said computer readable code when executed, performing the functions of:
detecting existence/non-existence of a fault in the entire system including each said computing module;
monitoring an access of the processor in each said computing module to the external bus;
when detecting disagreement in output among the respective computing modules, if no fault is detected in the system, notifying each said processor of an interruption;
causing each said processor to execute the synchronization control instruction to adjust timing of the response to an access from each processor, thereby causing each computing module to resume operation in synchronization;
when detecting disagreement in output among the respective computing modules, if no fault is detected in the system, interrupting each said processor with a predetermined task for re-synchronizing the respective computing modules, which is a task of executing an access to a predetermined resource; and queuing the accesses to said resource from each processor, and responding to said accesses from all the computing modules simultaneously when all the accesses from said processors are received.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2002-204305 | 2002-07-12 | ||
JP2002204305A (JP3982353B2) | 2002-07-12 | 2002-07-12 | Fault tolerant computer apparatus, resynchronization method and resynchronization program |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2434494A1 | 2004-01-12 |
CA2434494C | 2008-11-25 |
Family ID: 29728536
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CA002434494A (CA2434494C) | Expired - Fee Related | 2002-07-12 | 2003-07-07 | Fault-tolerant computer system, re-synchronization method thereof and re-synchronization program thereof |
Country Status (10)
Country | Link |
---|---|
US (1) | US7225355B2 (en) |
EP (1) | EP1380952B1 (en) |
JP (1) | JP3982353B2 (en) |
KR (1) | KR100566339B1 (en) |
CN (1) | CN1326042C (en) |
AU (1) | AU2003208108A1 (en) |
CA (1) | CA2434494C (en) |
DE (1) | DE60301702T2 (en) |
ES (1) | ES2247459T3 (en) |
TW (1) | TWI226983B (en) |
- 2002-07-12: JP application JP2002204305A (JP3982353B2), Expired - Fee Related
- 2003-07-01: TW application TW092117919A (TWI226983B), IP Right Cessation
- 2003-07-03: AU application AU2003208108A (AU2003208108A1), Abandoned
- 2003-07-07: CA application CA002434494A (CA2434494C), Expired - Fee Related
- 2003-07-08: US application US10/614,000 (US7225355B2), Expired - Fee Related
- 2003-07-10: ES application ES03015796T (ES2247459T3), Expired - Lifetime
- 2003-07-10: DE application DE60301702T (DE60301702T2), Expired - Lifetime
- 2003-07-10: EP application EP03015796A (EP1380952B1), Expired - Fee Related
- 2003-07-11: KR application KR1020030047086A (KR100566339B1), IP Right Cessation
- 2003-07-14: CN application CNB031472990A (CN1326042C), Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
CN1326042C (en) | 2007-07-11 |
DE60301702T2 (en) | 2006-07-06 |
KR100566339B1 (en) | 2006-03-31 |
TWI226983B (en) | 2005-01-21 |
DE60301702D1 (en) | 2005-11-03 |
EP1380952B1 (en) | 2005-09-28 |
US7225355B2 (en) | 2007-05-29 |
EP1380952A1 (en) | 2004-01-14 |
CA2434494A1 (en) | 2004-01-12 |
KR20040007322A (en) | 2004-01-24 |
ES2247459T3 (en) | 2006-03-01 |
CN1495611A (en) | 2004-05-12 |
JP3982353B2 (en) | 2007-09-26 |
AU2003208108A1 (en) | 2004-01-29 |
TW200401187A (en) | 2004-01-16 |
US20040010789A1 (en) | 2004-01-15 |
JP2004046611A (en) | 2004-02-12 |
Similar Documents
Publication | Title |
---|---|
CA2434494C (en) | Fault-tolerant computer system, re-synchronization method thereof and re-synchronization program thereof |
US7107484B2 (en) | Fault-tolerant computer system, re-synchronization method thereof and re-synchronization program thereof |
US7519856B2 (en) | Fault tolerant system and controller, operation method, and operation program used in the fault tolerant system |
US7493517B2 (en) | Fault tolerant computer system and a synchronization method for the same |
US7987385B2 (en) | Method for high integrity and high availability computer processing |
CA2530018A1 (en) | Securing time for identifying cause of asynchronism in fault-tolerant computer |
US6519710B1 (en) | System for accessing shared memory by two processors executing same sequence of operation steps wherein one processor operates a set of time later than the other |
AU2005246990A1 (en) | Fault tolerant computer system and interrupt control method for the same |
JP2008046942A (en) | Fault tolerant computer and transaction synchronous control method therefor |
CA2694198C (en) | High integrity and high availability computer processing module |
CA2435001C (en) | Fault-tolerant computer system, re-synchronization method thereof and re-synchronization program thereof |
JP3774826B2 (en) | Information processing device |
JP2001175545A (en) | Server system, fault diagnosing method, and recording medium |
JPH08278950A (en) | Multiplexed computer system and fault restoring method |
JPS63251840A (en) | Control method for detection of multi-processor abnormality |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | EEER | Examination request | |
 | MKLA | Lapsed | Effective date: 2013-07-09 |