METHOD AND SYSTEM IN A COMPUTER
NETWORK FOR THE RELIABLE AND
CONSISTENT ORDERING OF CLIENT
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates to improved methods and systems for reliable information-retrieval and updating. In particular, the present invention relates to improved methods and systems for reliable information-retrieval and data (or information) updating in distributed computer networks. More particularly, the present invention relates to improved methods and systems for reliable information-retrieval and data (or information) updating in distributed computer networks via a consistent ordering of client requests.
2. Description of the Related Art
Computer networks allow users of data-processing systems to link with remote servers, and thus retrieve vast amounts of electronic information heretofore unavailable in an electronic medium. Computer networks are increasingly displacing more conventional means of information transmission, such as newspapers, magazines, and television. A computer network connects a set of machines and allows them to communicate with one another. In such a computer network, one machine may act as a gateway that connects the computer network to other networks. The gateway handles data transfer and the conversion of messages from a sending network to protocols utilized by a receiving network.
A set of computer networks can be combined to form a so-called "Internet" when connected with one another directly or indirectly via gateways. The term "Internet" is an abbreviation for "Inter-network," and refers commonly to the collection of networks and gateways that utilize the TCP/IP suite of protocols, which are well-known in the art of computer networking. TCP/IP is an acronym for "Transmission Control Protocol/Internet Protocol," a software protocol developed by the Department of Defense for communication between computers. In recent years, the Internet has allowed users to interact and share information over the networks. Because of such wide-spread information sharing, the Internet has thus generally evolved into an "open" system in which developers can design software applications for performing specialized operations or services, essentially without restriction.
Typical networked systems utilized widely today follow a client/server architecture. In network computing, a client is a process (i.e., roughly a program or task) that requests a service provided by another program, the server. The client process may utilize the requested service without having to "know" the working details of the other program or the requested service itself. In a client/server architecture, particularly a networked system, a client is usually a computer that accesses shared network resources provided by another computer (i.e., a server).
A client/server architecture acts as a framework for distributed applications that organize the flow of information between clients and servers. For example, in a stock-quotation service, a client sends requests to the server over the Internet, identifying a particular stock name. The server retrieves the desired information from one or more associated storage devices, and sends back an appropriate reply to the client over the network. This form of interaction between the client and the server is referred to as "information retrieval."
Similarly, some sets of clients may transmit requests to update the information stored at a server's storage devices. For example, a computer utilized within a stock market environment may send information to a server to refresh current stock prices. Alternatively, in the same stock market environment, a client may interact with a server in an electronic business framework to update relevant information stored at the server's storage device (e.g., a stock portfolio account). This form of interaction between the client and server is referred to as "information update" or "information updating." Generally, the client process may be active in a first computer system, and the server process may be active in a second computer system, thereby communicating with one another over a computer network, while providing distributed functionality and permitting multiple clients to take advantage of the information-gathering capabilities of the server.
Such client/server architectures are often configured as distributed computer networks in which distributed servers achieve reliability and high availability utilizing replication. In such systems, several processors or machines may be utilized to provide a service, with each machine replicating the state of the service. Such machines are referred to as "server replicas" or simply "replicas". A client may communicate with a subset of server replicas to obtain and update information. Such a subset may include all, some, or only one of the available replicas. A client may select the subset randomly or via pre-defined selection criteria. It is thus necessary that all server replicas maintain identical states in order to ensure a consistent view of the information manipulated by the service, as perceived by the same client or by different clients.
If a server replica fails, the remaining server replicas continue to operate, thereby ensuring uninterrupted service for the clients. A significant problem faced by designers in implementing replicated services is to ensure that replicas maintain identical states that reflect client transactions with the service. For example, two different clients may issue an update request to the same record in a database maintained by a replicated service. If the two requests are processed in different order by two or more different replicas, the values of the record may be inconsistent at different replicas.
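The divergence described above can be illustrated with a minimal sketch. The two update functions and the initial record value are hypothetical and chosen only to show that the same pair of updates, applied in different orders, may leave two replicas with different values.

```python
# Illustrative only: two replicas apply the same two updates in different orders.

def apply_updates(initial, updates):
    """Apply each update function to the record value, in the given order."""
    value = initial
    for update in updates:
        value = update(value)
    return value

def add_ten(value):      # hypothetical update issued by a first client
    return value + 10

def double(value):       # hypothetical update issued by a second client
    return value * 2

replica_a = apply_updates(100, [add_ten, double])   # (100 + 10) * 2
replica_b = apply_updates(100, [double, add_ten])   # (100 * 2) + 10
diverged = replica_a != replica_b                   # the record differs across replicas
```

A consistent ordering of requests, as discussed below, is intended to rule out exactly this situation.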
Ordered multicast protocols have attempted to address this problem. Ordered multicast protocols ensure that all server replicas receive the same messages from the network in the same order, thereby guaranteeing that the server replicas process identical requests and maintain identical states. A multicast protocol guarantees that all server replicas receive all client requests, even if the clients are communicating with only a subset of the servers. Commercial products that benefit from such protocols include "Web" servers, on-line transaction processing systems, stock market quotation services, general purpose file systems, electronic business servers, and so forth.
Multicast protocols do, however, impose a performance penalty because such protocols must be processed in a manner to ensure that client requests are delivered in the same order at all server replicas. An ordering protocol requires server replicas to exchange special-purpose messages, which in turn may consume network bandwidth and delay the delivery of a request to a server program until an agreement is reached regarding the delivery order. The server program cannot begin processing the request until the ordering protocol completes its operation. The resulting performance penalty translates into a decline in communication throughput and an increase in response time. A rapid response time is increasingly necessary for successful operation in interactive client/server systems, such as Web servers, electronic business transaction services, or interactive services. Rapid response times are increasingly difficult to achieve in such interactive client/server systems utilizing traditional multicast protocols.
Based on the foregoing, it can be appreciated that a need exists for an improved method and system for implementing an ordering protocol that ensures all replicas of a server receive client requests in the same order, while offering better response time than traditional multicast protocols. It is believed that the invention described herein addresses and solves these problems.
SUMMARY OF THE INVENTION
It is therefore an object of the invention to provide an improved method and system for reliable and consistent information-retrieval and information updating.
It is another object of the invention to provide an improved method and system for reliable and consistent information-retrieval and information updating in association with distributed computer networks.
It is still another object of the invention to provide an improved method and system in a distributed computer network for reliable and consistent information retrieval and information updating via an improved ordering of client requests to a group of replicated servers.
The above and other objects are achieved as is now described. A method and system are disclosed for the reliable and consistent delivery of client requests via an ordering protocol in a computer network having at least one client connected to one or more servers among a group of servers. Each server among the group of servers replicates a particular network service to ensure that the particular network service remains uninterrupted in the event of a server failure. A client's request to retrieve information can be directed to any server replica, which in turn executes the necessary retrieval operations and responds directly to the client.
However, a client's request to update information stored within the service state may be directed to a "distinguished server replica," chosen from the group of replicated servers to receive all of the client's update requests. The distinguished server replica, upon receiving a client request to update the service state, defines an order in which all replicated servers should receive and process the update request. The order sequences the update request within all previous and future update requests, such that all replicated servers can then process all updates in a consistent order. The distinguished server replica automatically transfers the request, along with its sequence order, to the remaining server replicas.
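By way of a non-authoritative sketch, the sequencing role of the distinguished server replica might be modeled as follows. The class name, the method name, and the use of in-memory lists to stand in for network links are assumptions made for illustration and are not part of the disclosure.

```python
import itertools

class DistinguishedReplica:
    """Hypothetical sequencer: stamps each update with the next sequence number."""

    def __init__(self, peer_inboxes):
        self._next_seq = itertools.count(1)   # 1, 2, 3, ...
        self.peer_inboxes = peer_inboxes      # lists standing in for links to other replicas

    def order_and_forward(self, update):
        """Assign an order to the update and forward (order, update) to every peer."""
        seq = next(self._next_seq)
        for inbox in self.peer_inboxes:
            inbox.append((seq, update))
        return seq

inbox_84, inbox_86 = [], []
replica_88 = DistinguishedReplica([inbox_84, inbox_86])
replica_88.order_and_forward("set price=10")
replica_88.order_and_forward("set price=12")
```

Because a single replica assigns every sequence number, each remaining replica in this sketch observes the same updates in the same order.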
The distinguished server replica immediately delivers the client request to the service program at the machine on which the distinguished server replica runs. This delivery occurs before the sequence information may have reached the remaining group of replicas. Delivery at the distinguished server replica occurs before all replicas receive the order in which the request should be executed, thereby producing a temporary inconsistency among the service states at different replicas. To avoid such inconsistencies from affecting the client or the server's state, the service program at the distinguished server replica executes operations necessary to carry out the update request in a "tentative mode." In a tentative mode, the updates computed by the service program are not permitted to become permanent. Old copies of the updated items are utilized to field any corresponding retrieval requests of corresponding data items. Tentative updates are thus never allowed to affect the results of retrieval requests.
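One possible way to picture the tentative mode is a store that keeps tentative updates apart from the permanent copies. The sketch below is offered under that assumption only; the class `TentativeStore`, its method names, and the sample stock symbol are hypothetical.

```python
class TentativeStore:
    """Hypothetical store: tentative updates stay invisible to reads until committed."""

    def __init__(self, data=None):
        self.committed = dict(data or {})   # permanent state, used for retrievals
        self.tentative = {}                 # updates executed in tentative mode

    def update(self, key, value):
        """Execute an update in tentative mode; the old copy is left untouched."""
        self.tentative[key] = value

    def read(self, key):
        """Retrievals are always answered from the old (committed) copy."""
        return self.committed.get(key)

    def commit(self):
        """Lift the tentative mode: tentative updates become permanent."""
        self.committed.update(self.tentative)
        self.tentative.clear()

store = TentativeStore({"IBM": 100})
store.update("IBM", 105)
before = store.read("IBM")   # still the old value
store.commit()
after = store.read("IBM")    # the update is now visible
```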
When a replicated server receives an update request from the distinguished replica, it immediately sends back an acknowledgment. The replicated server then executes the update request in a tentative mode according to the order defined by the distinguished replica. After the service program executes the update request in the tentative mode at the distinguished server replica, the service program forms a response to the client and stores this response in a buffer. The response remains in a buffer until all the other server replicas transmit acknowledgments that they did in fact receive the update request and the order of the update request from the distinguished server replica. At this point, the response can be released from the buffer and returned to the client. The tentative updates become permanent and affect responses to future retrieval requests. The distinguished replica then sends confirmation messages to the other replicas, which likewise convert the tentative updates to permanent updates.
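The buffering of the response until all acknowledgments arrive may be sketched as follows. This is a simplified illustration under stated assumptions: the class `PendingUpdate` and its fields are hypothetical, replicas are identified by the reference numerals 84 and 86, and failures and timeouts are not modeled.

```python
class PendingUpdate:
    """Hypothetical buffer entry held at the distinguished replica for one update."""

    def __init__(self, seq, response, replica_ids):
        self.seq = seq
        self.response = response              # already computed in tentative mode
        self.waiting_for = set(replica_ids)   # replicas that have not yet acknowledged

    def acknowledge(self, replica_id):
        """Record an acknowledgment; return the response once all have arrived, else None."""
        self.waiting_for.discard(replica_id)
        if self.waiting_for:
            return None
        return self.response

pending = PendingUpdate(seq=1, response="update applied", replica_ids=[84, 86])
first = pending.acknowledge(84)    # one acknowledgment is still outstanding
second = pending.acknowledge(86)   # all acknowledgments received; response released
```

In this sketch, the point at which `acknowledge` returns the response corresponds to the point at which the tentative updates would be made permanent and confirmation messages would be sent.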
BRIEF DESCRIPTION OF THE DRAWINGS
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objects, and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
FIG. 1 depicts a block diagram illustrative of a client/ replicated server architecture in accordance with a preferred embodiment of the present invention;
FIG. 2 illustrates a block diagram illustrative of interactions between a client and replicated servers during the execution of an information retrieval request, in accordance with a preferred embodiment of the present invention;
FIG. 3 depicts a block diagram illustrative of interactions between a client and replicated servers during the execution of an information update request, in accordance with a preferred embodiment of the present invention;
FIG. 4 illustrates a timing diagram of interactions between a client and replicated servers during the execution of an information update request in accordance with a preferred embodiment of the present invention;
FIG. 5 depicts a flowchart of operations illustrating a method for implementing an improved request ordering protocol via a distinguished server replica, in accordance with a preferred embodiment of the present invention; and
FIG. 6 illustrates a flowchart of operations illustrating a method for implementing an improved request ordering protocol via a general (non-distinguished) server replica, in accordance with a preferred embodiment of the present invention.
DETAILED DESCRIPTION OF PREFERRED
With reference now to the figures and in particular with reference to FIG. 1, there is depicted a detailed block diagram illustrating a client/replicated server architecture which may be implemented in accordance with a preferred embodiment of the present invention. Although the client and servers depicted in FIG. 1 represent processes generated from a high-level programming language (e.g., C++), which is interpreted and executed in a computer system at run-time (e.g., a workstation), it can be appreciated by those skilled in the art that such processes may be implemented in a variety of hardware devices, either programmed or dedicated.
In the client/replicated architecture depicted in FIG. 1, client 92 is connected to a remote service via a computer network. Active within client 92 is a first process, browser 72, which establishes the connections with remote services, presents information to users, and carries user requests to a remote service. Such browsers are often referred to in the art of computer networking as "web browsers." Any number of commercially or publicly available browsers may be utilized in accordance with a preferred embodiment of the present invention. For example, the Mosaic-brand browser available from the National Center for Supercomputing Applications (NCSA) in Urbana-Champaign, Ill., can be utilized with a preferred embodiment of the present invention. Other browsers, such as Netscape, the Lynx-brand browsers, or those available which provide the functionality specified under HTTP, can be utilized in accordance with a preferred embodiment of the present invention.
In the example depicted in FIG. 1, a server process executes service program components 94, 96 and 98 to satisfy client requests, and responds to the client in the form of HTTP responses 90. HTTP responses 90 may correspond to so-called "web pages," which can be represented utilizing Hyper-Text Markup Language (HTML) 94, or the outcome of a Common Gateway Interface (CGI) program 96 that permits the client program to request the execution of a specified program contained within the replicated service. This specified program may include a search engine which scans received information in the server for presentation to the user controlling the client. This specified program may alternatively be based on a transaction processing system that performs updates in a database according to the client's particular requests. Utilizing Common Gateway Interface (CGI) program 96 in association with HTTP responses 90, the server may notify the client of the results of that execution upon completion. Additionally, the client may direct the filling out of certain "forms" from the browser. This is provided by the "fill-in-forms" functionality (e.g., forms 98) which permits a user, via a client application program, to specify terms in which the server causes an application program to function (e.g., terms or keywords contained in the articles which are of interest to the user).
A client request may retrieve information from the program service (e.g., utilizing a GET operation of HTTP protocol 90). Additionally, a client request may update information stored by the program service (e.g., utilizing a PUT operation of HTTP protocol 90). The remote service executes applications at different server replicas 84, 86, and 88 to reduce the likelihood of service unavailability due to process, machine or communication failures. Such replicas are identical and run the same software required to implement a particular service program (e.g., components 94, 96 and 98). Client 92 can send information retrieval requests and information update requests to any replica among replicas 84, 86, and 88. Replicas 84 and 86, however, redirect update requests to replica 88, which is referred to as a "distinguished replica." Such forwarding operations can occur, for instance, utilizing the "See Other" status response of the HTTP protocol (code 303), or a similar communication facility. Browser 72 is responsible for establishing connections to replicas 84, 86 and 88, as appropriate, via a name resolution protocol, such as the Internet DNS system, well known in the art. Those skilled in the art can appreciate that browser 72 utilizes performance metrics (e.g., network proximity) to choose an available server with which to interact. The client or the users are not actually aware of the specific replication aspects related to the service.
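The redirection of update requests via the "See Other" status response may be sketched as follows. The sketch assumes that retrieval requests (GET) are served locally by any replica while update requests (PUT or POST) arriving at a non-distinguished replica are answered with code 303; the function name and the address of the distinguished replica are hypothetical.

```python
from http import HTTPStatus

# Hypothetical address of the distinguished replica (replica 88).
DISTINGUISHED_URL = "http://replica88.example/update"

UPDATE_METHODS = {"PUT", "POST"}

def route_request(method, is_distinguished):
    """Return (status, headers) for a request arriving at a replica."""
    if method in UPDATE_METHODS and not is_distinguished:
        # A non-distinguished replica redirects the client to the distinguished replica.
        return HTTPStatus.SEE_OTHER, {"Location": DISTINGUISHED_URL}
    # Retrievals at any replica, and updates at the distinguished replica, are served locally.
    return HTTPStatus.OK, {}

redirect_status, redirect_headers = route_request("PUT", is_distinguished=False)
local_status, _ = route_request("GET", is_distinguished=False)
```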
FIG. 2 illustrates a block diagram illustrative of interactions between client 92 and replicated servers 84, 86, and 88 during the execution of an information retrieval request, in accordance with a preferred embodiment of the present invention. In FIG. 2, user requests for information retrieval may be directed to any available replica. In particular, FIG. 2 depicts a situation in which a client may direct its request through message 24 to replica 84, through message 26 to replica 86, or through message 28 to replica 88. For information retrieval operations, the client sends only one of these messages, for example, to the nearest available server. The program service at the chosen server replica executes the necessary retrieval operations and sends back an appropriate response (message 34, 36, or 38).
FIG. 3 depicts a block diagram illustrative of interactions between client 92 and a group of replicated servers (i.e., 84, 86, and 88) during the execution of an information update request, in accordance with a preferred embodiment of the present invention. A user request for update 48 is sent directly to distinguished server replica 88 (e.g., via the POST operation in the HTTP protocol). Equivalently, a client may send the request to either replica 84 or 86, which in turn redirects the client to distinguished server replica 88 (e.g., via a "See Other" response of the standard HTTP protocol). Upon receiving request 48, distinguished server replica 88 is synchronized with replicas 84 and 86 utilizing messages 44, 46, 54, 56, 64, and 66, which belong to the ordering protocol described herein. Messages 44, 46, 54, 56, 64, and 66 serve to ensure that all replicas receive the update request and execute it in the same order with respect to previous and future update requests, thereby maintaining the consistency of the replicated server state. After distinguished server replica 88 executes the ordering protocol and steps necessary to carry out the client's request (e.g., see block 166 of FIG. 5 and FIG. 6 herein), distinguished server replica 88 sends back a reply 49 containing a response to the client's request.
FIG. 4 illustrates a timing diagram illustrating interactions between client 92 and a group of replicated servers (i.e., 84, 86, and 88) during the execution of an information update request in accordance with a preferred embodiment of the present invention. In the timing diagram depicted in FIG. 4, time proceeds from left to right, and angled arrows represent the messages illustrated previously in FIG. 3. Two important features of the present invention are illustrated in FIG. 4. Specifically, the execution of request 42 begins as soon as message 48 is available to distinguished server replica 88. Those skilled in the art can appreciate that this arrangement differs from traditional multicast ordering configurations in which the execution is delayed until after the distinguished replica 88 receives both messages 54 and 56, and replicas 84 and 86 receive messages 64 and 66, respectively. The purpose of such a delay in traditional multicast ordering protocols is to ensure that the distinguished server replica processes the request in the same order as the other replicas. In the present invention, however, request 42 proceeds in a "tentative mode" in which the effects of the update are stored in temporary storage. Execution of the "tentative mode" does not result in any permanent updates to the data structure of the service. Execution of the "tentative mode" also does not reflect on subsequent retrieval operations until the "tentative mode" is lifted. The "tentative mode" is lifted as soon as the distinguished server replica 88 receives both messages 54 and 56. When the "tentative mode" is finally lifted, the effects of the update become permanent and henceforth affect future retrieval operations.
Thus, it can be appreciated by those skilled in the art that the results of processing request 42 via the multicast ordering protocol described herein are available to the client sooner than delaying execution of the request until messages 54 and 56 are received as is the case with traditional multicast protocol configurations. Additionally, in traditional multicast protocol configurations, replicas 84 and 86 act to deliver the request to the service program only after they receive messages 64 and 66, respectively. According to the present invention described herein, however, the replicas begin execution of the request in tentative mode as soon as messages 54 and 56 are respectively received. Thus, the results of the update, according to the present invention described herein, are available sooner than is the case with traditional multicast protocol configurations, because the method and system described herein, in accordance with a preferred embodiment of the present invention described, avoids such delays.
Thus, according to the timing diagram depicted in FIG. 4, the service time for a client request 42 is represented as Ts, and the time it takes the ordering protocol to deliver the client request in the same order at all replicas is represented as Tm. Time Tm represents the time elapsed from the receipt of update request 42 from client 92 until the distinguished server replica 88 receives both messages 54 and 56, respectively. Given these parameters, the distinguished replica can produce a response to a client request in time max(Ts, Tm), i.e., the larger of the two values Ts and Tm, instead of (Ts+Tm), which is typically the case with traditional multicast ordering protocols. Based on these parameters, those skilled in the art can appreciate that up to a 50% reduction in response time during update operations can be achieved via a preferred embodiment of the present invention versus traditional multicast ordering protocols; the 50% figure is approached when Ts and Tm are equal. Note that the replicas abide by the order defined by the distinguished replica when serving the client request. The early delivery of the client request to the server program at the distinguished replica does not result in any perceived inconsistencies as far as the clients are concerned. The "tentative mode" of execution hides any temporary inconsistency that results from delivering the request at the distinguished replica 88 before replicas 84 and 86 acknowledge the receipt of the request and its processing order (i.e., messages 44 and 46).
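The relationship between the two response times can be checked with a short worked example. The function names and the sample values of Ts and Tm below are hypothetical; the formulas max(Ts, Tm) and (Ts+Tm) are those stated above.

```python
def traditional_response_time(ts, tm):
    """Ordering completes first, then the request is serviced: Ts + Tm."""
    return ts + tm

def tentative_response_time(ts, tm):
    """Service and ordering overlap in tentative mode: max(Ts, Tm)."""
    return max(ts, tm)

def reduction(ts, tm):
    """Fractional reduction in response time relative to the traditional protocol."""
    return 1 - tentative_response_time(ts, tm) / traditional_response_time(ts, tm)

equal_case = reduction(5, 5)    # Ts == Tm gives the largest saving
skewed_case = reduction(8, 2)   # a dominant Ts leaves less to overlap
```

The saving is largest when Ts and Tm are equal and shrinks as one of the two values dominates the other.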
FIG. 5 and FIG. 6 illustrate flowcharts of operations illustrating a method for implementing an improved ordering protocol, in accordance with a preferred embodiment of the present invention. FIG. 5 depicts a flowchart of operations illustrating a method for implementing an improved request ordering protocol via a distinguished server replica, in accordance with a preferred embodiment of the present invention. FIG. 6 illustrates a flowchart of operations illustrating a method for implementing an improved request ordering protocol via a general (non-distinguished) server replica, in accordance with a preferred embodiment of the present invention. The operations depicted in the flowcharts of FIG. 5 and FIG. 6 proceed in parallel as a request is being processed at the relevant server replica. Some of the operations described in FIG. 5 and FIG. 6 are common to both types of replicas, and are thus referenced and described with identical reference numerals and labels.
It can be appreciated by those skilled in the art that FIG. 5 and FIG. 6 present a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulation of physical quantities. Usually, although not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times by those skilled in the art, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities.
Further, the manipulations performed are often referred to in terms, such as adding or comparing, which are commonly associated with mental operations performed by a human operator. No such capability of a human operator is necessary or desirable in most cases in any of the operations described herein which form part of the present invention; the operations are machine operations. Useful machines for performing operations of a preferred embodiment of the present invention include data-processing systems such as general purpose digital computers or other similar devices. In all cases the distinction between the method operations in operating a computer and the method of computation itself should be borne in mind.
It is important to note that, while the present invention has been (and will continue to be) described in the context of a fully functional computer network, those skilled in the art will appreciate that the present invention is capable of being distributed as a program product in a variety of forms, and that the present invention applies equally regardless of the particular type of signal-bearing media utilized to actually carry out the distribution. Examples of signal-bearing media include: recordable-type media, such as floppy disks, hard disk drives and CD ROMs, and transmission-type media such as digital and analog communication links. Examples of transmission-type media include devices such as modems. A modem is a type of communications device that enables a computer to transmit information over a standard telephone line. Because a computer is digital (i.e., works with discrete electrical signals representative of binary 1 and binary 0) and a telephone line is analog (i.e., carries a signal that can have any of a large number of variations), modems can be utilized to convert digital to analog and vice-versa. The term "media" as utilized herein is a collective word for the physical material such as paper, disk, CD-ROM, tape and so forth, utilized for storing computer-based information. It is also important to note that, while FIG. 1 to FIG. 6 depict and describe a configuration in which three server replicas are utilized, any number of server replicas may also be utilized in accordance with a preferred embodiment of the present invention.
Thus, as depicted at block 140 in FIG. 5, a process is initiated for the reliable and consistent retrieval and updating of information according to client requests. As indicated at block 142, a group of server replicas is created within a computer network. Each server among the group of server replicas replicates a particular network service to ensure that the particular network service remains uninterrupted in the event of a server failure. Thereafter, as illustrated at block 144, a server replica is "distinguished," which can respond to client update requests. The computer network includes at least one client connected to the group of servers or a non-empty subset thereof. A request by a client to retrieve information can be directed to any server replica.
Next, as described at block 146, when a server replica receives a request, a test is performed to determine the type of desired request. If a client request to retrieve information is desired (e.g., message 24, 28, or 26 of FIG. 2), then as indicated at block 148, the client request to retrieve information is executed. Then, as depicted at block 150, the server replica produces a message containing the result of the operation and sends it back to the client (e.g., see