[54] APPARATUS FOR SYNCHRONIZING
OPERATOR INITIATED COMMANDS WITH
A FAILOVER PROCESS IN A DISTRIBUTED
PROCESSING SYSTEM
[75] Inventors: Robert F. Bartfai, West Shokan; John
Divirgilio, Middletown; John W.
Doxtader, Hurley; Peter J. LeVangia,
Kingston; Laura J. Merritt,
Wappingers Falls; Nicholas P. Rash,
Poughkeepsie; Kevin J. Reilly,
Kingston, all of N.Y.
[73] Assignee: International Business Machines Corporation, Armonk, N.Y.
[ * ] Notice: This patent is subject to a terminal disclaimer.
[21] Appl. No.: 08/827,133 [22] Filed: Mar. 27, 1997
[51] Int. Cl.7 G06F 11/00; G06F 11/22
[52] U.S. Cl. 714/4; 714/12; 709/248;
709/400; 710/61
[58] Field of Search 395/182.02, 182.05,
395/182.09, 182.1, 182.11, 182.13, 182.21, 183.17, 200.78, 553, 561, 595, 670, 671, 672, 881; 371/5.4, 47.1; 340/825.14; 714/4, 7, 11, 12, 13; 709/248, 400; 710/61
[56] References Cited
U.S. PATENT DOCUMENTS
4,590,554 5/1986 Glazer et al 395/182.11
5,136,498 8/1992 McLaughlin et al 364/184
5,247,655 9/1993 Khan et al 711/106
5,404,544 4/1995 Crayford 395/750
5,408,645 4/1995 Ikeda et al 395/575
5,408,649 4/1995 Beshears et al 395/182.08
5,426,774 6/1995 Banerjee et al 714/16
5,463,763 10/1995 Kubo 714/4
5,473,599 12/1995 Li et al 370/16
5,485,465 1/1996 Liu et al 714/4
5,544,077 8/1996 Hershey 702/58
5,751,955 5/1998 Sonnier et al 395/200.19
5,764,903 6/1998 Yu 395/200.38
5,875,290 2/1999 Bartfai et al 714/13
FOREIGN PATENT DOCUMENTS
0 609 051 A1 3/1994 European Pat. Off G06F 12/08
OTHER PUBLICATIONS
Groetsch, W. & Brand, T., Unix fault tolerance with the queue and count design; Conference pp. 233-243; Sep. 1989.
"Supervisor Recovery in Ring Networks", A. Goyal and R. Nelson, IBM Technical Disclosure Bulletin, vol. 27, No. 8, Jan. 1985, pp. 4715-4717.
Primary Examiner—Dieu-Minh T. Le
Attorney, Agent, or Firm—Floyd A. Gonzalez; David A.
Fox; Cantor Colburn LLP
[57] ABSTRACT
An apparatus for synchronizing operator commands with a failover process in a distributed system having a control workstation and a plurality of nodes. One of the nodes of the distributed system is designated a primary node and one of the nodes is designated a backup node. The backup node includes a backup daemon for performing a failover process if the primary node fails such that the backup node becomes the primary node. Shell scripts send a command string to be synchronized with the operation of the backup daemon from the control workstation to the backup node. The backup daemon is then checked to determine if the backup daemon is sleeping, and, in the event the backup daemon is sleeping, commands derived from the command string are enqueued in a work queue for processing by the backup daemon. The backup daemon is then awakened such that the derived commands in the work queue are processed. In the event that the backup daemon is busy, commands derived from the command string are failed, thereby synchronizing the derived commands with the processing of the backup daemon.
12 Claims, 4 Drawing Sheets
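The synchronization described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class name BackupDaemon, the methods submit, wait_until_idle and stop, the semicolon-separated command string format, and the handler callback are all assumptions introduced for illustration. The sketch shows only the core rule: a command string is accepted and its derived commands enqueued when the daemon is sleeping, after which the daemon is awakened; when the daemon is busy, the commands are failed.

```python
import threading
from collections import deque


class BackupDaemon:
    """Hypothetical model of a backup daemon with a work queue."""

    def __init__(self, handler):
        self._handler = handler          # called once per derived command
        self._cond = threading.Condition()
        self._queue = deque()            # the work queue
        self._sleeping = True            # True while the daemon is idle
        self._stopped = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, command_string):
        """Return True if the commands were enqueued, False if failed."""
        # Assumed format: derived commands are separated by semicolons.
        commands = [c.strip() for c in command_string.split(";") if c.strip()]
        with self._cond:
            if self._stopped or not self._sleeping:
                return False             # daemon busy: fail the commands
            self._queue.extend(commands)
            self._sleeping = False       # awaken the daemon
            self._cond.notify_all()
        return True

    def wait_until_idle(self, timeout=None):
        """Block until the daemon is sleeping again; True on success."""
        with self._cond:
            return self._cond.wait_for(lambda: self._sleeping, timeout)

    def stop(self):
        with self._cond:
            self._stopped = True
            self._cond.notify_all()
        self._thread.join()

    def _run(self):
        while True:
            with self._cond:
                while self._sleeping and not self._stopped:
                    self._cond.wait()
                if self._stopped:
                    return
                batch = list(self._queue)
                self._queue.clear()
            for command in batch:        # process outside the lock
                self._handler(command)
            with self._cond:
                self._sleeping = True    # go back to sleep
                self._cond.notify_all()
```

Because the sleeping check and the enqueue happen under one lock, a command can never slip into the queue while the daemon is in the middle of other work. In this sketch "busy" covers any non-sleeping state; a real daemon would also be busy while performing the failover itself.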