|Publication number||US6922791 B2|
|Application number||US 09/927,047|
|Publication date||Jul 26, 2005|
|Filing date||Aug 9, 2001|
|Priority date||Aug 9, 2001|
|Also published as||US7139930, US7281154, US20030051187, US20050268155, US20050268156|
|Inventors||Victor Mashayekhi, Jenwei Hsieh, Mohamad Reza Rooholamini|
|Original Assignee||Dell Products L.P.|
This invention relates in general to the field of computers, and more particularly, to a system and method for implementing failover policies in a cluster environment.
Clustering is a well-known concept that refers to a group of individual computer systems, or nodes, operating together as if they were a single system or single computing resource. One characteristic of a cluster is its scalability, or the ability to add (or delete) nodes from the cluster to meet the user's performance requirements. Another characteristic of a cluster is high availability (HA). High availability refers to the ability of a server to remain operational, or available to a user, even when one or more nodes in the cluster fail. High availability can be improved in a cluster by implementing “failover” procedures that enable the operations performed on a node to failover to, or be assumed by, another node in the cluster in the event that that node fails. Failover procedures must include a policy for selecting the node(s) to assume the tasks or applications running on the failed node. Some failover policies improve the high availability of a cluster system more than others. Thus, the implementation of a failover policy can be crucial when considering the overall high availability of a cluster.
Establishing a failover policy is a simple task when the cluster consists of only two nodes. When one node fails, the only possible solution is to failover all applications running on the failed node to the surviving node. When there are more than two nodes in the cluster, and multiple applications running on each node, however, the failover possibilities become numerous. For example, theoretically all applications running on the failed node can failover to any one of the remaining nodes, and similarly any of the several applications running on the failed node can failover to any one of the surviving nodes. It is apparent that as the number of nodes and number of applications running on each node increase, the failover possibilities increase dramatically. Establishing concrete failover policies is critical for any cluster environment, as the high availability of the cluster will depend on it.
Many cluster systems operating on Windows NT-based servers utilize Microsoft Cluster Services (MSCS) software to provide high availability for the cluster. In the Advanced Server product, MSCS provides failover capability for a two-node system: all applications running on the first node failover to the second node, and vice versa. This is considered to be the trivial case. In the DataCenter Server product, MSCS provides failover capability for up to a four-node system. MSCS multi-node failover is slightly more complex, but is encompassed within the prior art described more fully below.
Some known failover policies do exist that address failover in multiple node systems. One method, commonly referred to as cascading, establishes a circular list for all nodes in the cluster. For example, if there are four nodes (nodes 1-4) in the cluster, failure of node 1 will cause all applications running on node 1 to failover to node 2, failure of node 2 will cause all applications running on node 2 to failover to node 3, failure of node 3 will cause all applications running on node 3 to failover to node 4, and failure of node 4 will cause all applications running on node 4 to failover to node 1. The cascading failover policy can be represented graphically by the following illustration, where the direction of the arrow points to the node that will assume responsibility for all applications running on the failed node.
In the above failover policy, each node in the cluster may failover only to one single node that has been designated prior to the time of failure. Further, all applications running on any failed node must failover to the same designated surviving node.
Another known failover policy enables applications running on any given node to failover to any remaining node in the cluster, as is depicted by the following graph.
Although the applications of a failed node may theoretically failover to any surviving node, a single failover path must be chosen by a system administrator or the like when the cluster is established or at another time well in advance of the time of failure. Thus, although the possibility exists to select any node for failover purposes, the selection must take place in advance, and there is no way to dynamically assess the best suited node at the time the failure occurs.
Disadvantages of the above-described failover policies are many. First, for any given node, a failover node must be designated in advance. The obvious disadvantages of this are that failover nodes are either designated with complete disregard for the resources needed by the failed node and those available at the failover node (as in the cascading failover policy), or determined in a manner that cannot take into account changes in system resources that have occurred since the failover designations were made. For example, additional applications may be added to nodes, or user demands for any given application may increase over time, or even at any given time over the course of any given day. Further, in each of the failover policies described above, all applications running on a failed node failover to another single node. This may impact the high availability of the system if the resources of a failover node at the particular time needed are such that it cannot handle all applications, but could otherwise provide failover for certain ones of those applications.
Another known failover policy utilizes a separate “passive” node that is present in the cluster exclusively for the purpose of being the failover node for all active nodes in the cluster. As illustrated in the following graph, each node on the cluster that is actively running applications (nodes 1-3) fails over to node 4, which is not tasked with running any applications other than in the event of a failover.
The disadvantages described above also are present in this failover policy. A further disadvantage is that this failover policy designates only a single failover node for each node running applications in the cluster, and requires the presence of an otherwise idle node, which is an inefficient use of system resources.
It is apparent from the above discussion of known failover policies in a cluster environment that there presently is no known way to dynamically choose among several possible failover nodes at the time failure actually occurs. Thus, none of these known policies enables the system to select a failover node that necessarily will have adequate, or the most available, resources at the time the failure occurs. Further, there is no known method by which the applications running on a failed node may be allocated to different ones of the surviving nodes. A failover policy having one or more of the above features would be advantageous in that it would enable optimization in failover designations. Evaluating the resources available on each surviving node at the time of failure, and directing failover to nodes that are most capable of handling one or more of the applications of the failed node, would enable more efficient use of cluster resources and improve the high availability of the cluster.
Therefore, a need has arisen for failover policies for a cluster environment having more than two nodes, in which the applications running on a failed node may be dynamically allocated to one of several possible surviving nodes. Further, a need has arisen for such a failover policy wherein applications running on the failed node may failover to more than one of the several possible surviving nodes.
In accordance with the present disclosure, a failover method is provided for a computer system having at least three nodes operating as a cluster. The method includes the steps of, following failure of one of the nodes, determining the weight of at least two surviving nodes, determining which of the at least two surviving nodes has the lowest weight, and assigning applications running on the failed node to the surviving node having the lowest determined weight. According to one embodiment, in the weight determining step, the weight of every one of the surviving nodes is determined, and according to yet another embodiment, the weight is determined by evaluating available resources of the node. In yet another embodiment, the evaluating step further includes the steps of examining at least one performance indicator associated with the node, and using a predetermined method to determine from the at least one performance indicator the weight of the node. According to alternate embodiments, the performance indicator is an indicator of current CPU utilization of that node, an indicator of memory currently being used by that node, or both.
A failover method is also provided for a computer system having at least three nodes operating as a cluster, wherein the method includes the steps of determining the amount of resources needed by applications running on one of the nodes, and following failure of the one node, for each of surviving nodes n=1 to N until a failover node is assigned, determining the weight of surviving node n, determining from the weight of surviving node n whether surviving node n has available resources greater than that determined to be needed by the failed node, and if the surviving node n is determined to have sufficient available resources, then assigning node n as the failover node and failing over the applications running on the failed node to the failover node, or if the surviving node n is determined not to have sufficient available resources, then n=n+1. In one embodiment, the determining resources step further includes the step of determining the resources needed by each application running on the one node, and the method further includes the step of prioritizing the applications running on the node, and assigning a failover node for each prioritized application successively starting with the application having the highest priority.
Also provided is a failover method for a computer system having at least three nodes operating as a cluster, wherein the method includes the steps of determining the weight of each of the at least three nodes, ordering the at least three nodes according to their respective increasing weights from lowest to highest, creating a queue containing the ordered nodes, wherein the first node in the queue has the lowest weight, and following failure of one of the at least three nodes, assigning the first surviving node in the queue as a failover node, and failing over applications running on the failed node to the failover node.
Yet another failover method is disclosed for a computer system having at least three nodes operating as a cluster, wherein the method includes the steps of, following failure of one of the at least three nodes, determining the order in which surviving nodes joined the cluster, assigning a failover node according to the order in which the surviving nodes joined the cluster, and failing over all applications running on the failed node to the failover node. According to one embodiment, the failover method further includes the steps of determining the first surviving node to join the cluster, and assigning the first joined surviving node as the failover node. According to an alternate embodiment, the failover method further includes the steps of determining the last surviving node to join the cluster, and assigning the last joined surviving node as the failover node.
A failover method is also provided for a computer system having at least three nodes operating as a cluster, wherein the method includes the steps of detecting failure of one of the at least three nodes, determining a time of failure of the one node, assigning a failover node depending in part on the determined time, and assigning applications running on the failed node to the failover node. According to one embodiment, the method further includes the steps of, for at least one node in the cluster, determining a time period during which the node is heavily utilized, and preventing the at least one node from being assigned as a failover node during the determined time period. According to yet another embodiment, the method further includes the steps of, for at least the failed node, determining in advance of failure a time during which at least one application running on the failed node is heavily utilized, and following failure of the node, if failure occurs during the determined time during which the at least one application is heavily utilized, then assigning a failover node for the at least one application first.
Also provided is a cluster computer system including at least three nodes, wherein the at least three nodes are computer systems operating as a cluster. The cluster computer system is capable of implementing a failover policy in which, following failure of one of the at least three nodes, the weight of surviving possible failover nodes is determined, and a failover node is selected based on the determined weights. According to one embodiment, the cluster computer system is further capable of determining the weights by examining performance indicators of the surviving possible failover nodes. According to yet another embodiment, the weight of the node is determined by using a predetermined mathematical formula including values of the performance indicators, and the performance indicators include at least an indicator of current CPU utilization of that node and an indicator of the amount of memory currently being used by that node.
A cluster computer system is also provided having at least three nodes, wherein the at least three nodes are computer systems operating as a cluster. The cluster computer system is capable of determining a time of failover of one of the at least three nodes, and implementing a failover policy in which, following failover of the one node, a failover node is selected based in part on the determined time. According to one embodiment, the cluster computer system is further capable of determining a time period during which at least one of the nodes is heavily utilized, and preventing the at least one node from being assigned as a failover node during the determined time period. According to yet another embodiment, the cluster is further capable of determining a time during which at least one application on the one failed node is heavily utilized, and upon failure of the one node, if failure occurs during the determined time, then failing over the at least one application first.
A more complete understanding of the present invention and advantages thereof may be acquired by referring to the following description taken in conjunction with the accompanying drawings, in which like reference numbers indicate like features, and wherein:
Preferred embodiments of the present invention are illustrated in the Figures, like numerals being used to refer to like and corresponding parts of the various drawings.
The present disclosure introduces many possible failover policies that may be implemented in a cluster environment having more than two nodes to better utilize available resources of surviving nodes, and thereby, to provide improved high system availability of the cluster. It further introduces the ability to assess system resources at the time a node failure occurs, and also the ability to implement different failover policies for different nodes, or different applications on a node, within the same cluster environment.
The available resources of a node generally refers to the extent to which that node is capable of running additional applications. According to the present disclosure, one method for determining the available resources is to assess the “weight” of each possible failover node by examining its system performance indicators. Most operating systems have performance monitors or indicators that track the usage of system entities, such as memory, disk, network, and processor, amongst others. For example, using these indicators one can collect and view real-time data about memory, or generate logs to accommodate trouble-shooting or subsequent recovery in case of a problem (such as in the event of a memory leak). Alerts can be set to notify an administrator if a performance indicator reaches or surpasses a user-settable threshold. The performance data for a system is available to an application using an Application Programming Interface (API). Using the APIs, an application can observe and track its use of system entities and perform operations based on the value of the performance counters during its operation. These performance indicators can be accessed and used to assess possible failover nodes. For example, in a cluster, should the memory usage of applications running on a node be above 75% of that node's physical memory, it can be designated as not a suitable failover site for a failed node. A node that is showing only 25% memory usage, however, can be designated as a suitable failover site. The “weight” of a node may be determined by creating a mathematical expression based on evaluation of any number of selected performance indicators. For example, if CPU utilization and memory usage are selected performance indicators, and CPU utilization is considered more important, one could assign a weight of 0.9 to CPU utilization and 0.6 to memory usage. The overall weight of the node would be (0.9)*(value of the CPU utilization performance indicator)+(0.6)*(value of the memory usage performance indicator). The node having the lowest weight represents the node having the most available resources. In this manner, the possible failover node having the lowest weight is selected as the desired failover node.
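For illustration, the weight calculation just described can be sketched as follows. This is a minimal sketch assuming only the two indicators named above (CPU utilization and memory usage, expressed as percentages) and the illustrative weighting factors of 0.9 and 0.6; the node names and indicator readings are hypothetical.

```python
# Minimal sketch of the weight calculation described above. The weighting
# factors 0.9 and 0.6 come from the example in the text; node names and
# indicator readings are hypothetical.
CPU_FACTOR = 0.9
MEM_FACTOR = 0.6

def node_weight(cpu_utilization: float, memory_usage: float) -> float:
    """Combine the selected performance indicators into a single weight.
    A lower weight indicates more available resources."""
    return CPU_FACTOR * cpu_utilization + MEM_FACTOR * memory_usage

# Hypothetical readings collected from each surviving node's performance monitor.
surviving_nodes = {
    "node2": {"cpu": 40.0, "mem": 25.0},
    "node3": {"cpu": 85.0, "mem": 80.0},
    "node4": {"cpu": 55.0, "mem": 50.0},
}

weights = {name: node_weight(v["cpu"], v["mem"]) for name, v in surviving_nodes.items()}
failover_node = min(weights, key=weights.get)  # lowest weight = most available resources
print(weights, "->", failover_node)
```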
Thus, as shown in the accompanying figure, one possible weight-based failover policy proceeds as follows.
After failure of node n is detected, at step 251 the weight of nodes n+1 and n−1 are determined using one or more predetermined performance indicators and a predefined equation as described above. Once the weight of each node is ascertained, a determination is then made at step 252 as to whether node n+1 or node n−1 has the lowest weight. If node n+1 has the lowest weight, that node is established as the failover node (step 253), but if node n−1 has the lowest weight, node n−1 is established as the failover node (step 254). Thus, a more intelligent failover node can be chosen, thereby improving the high availability of the cluster system.
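A minimal sketch of this neighbor-based selection (steps 251 through 254) follows, assuming a ring of node names and a weigh() callable that returns a node's current weight (for example, the node_weight() sketch above applied to live indicator readings); all names and values are hypothetical.

```python
# Minimal sketch of steps 251-254: weigh the two neighbors of the failed node
# and fail over to whichever has the lower weight. Names and weights are
# hypothetical; weigh() stands in for the weight calculation described above.
def select_neighbor_failover(nodes, failed_index, weigh):
    prev_node = nodes[(failed_index - 1) % len(nodes)]
    next_node = nodes[(failed_index + 1) % len(nodes)]
    return prev_node if weigh(prev_node) <= weigh(next_node) else next_node

ring = ["node1", "node2", "node3", "node4"]
current_weights = {"node1": 60.0, "node2": 35.0, "node3": 90.0, "node4": 50.0}
# If node3 fails, its neighbors node2 and node4 are compared; node2 wins here.
print(select_neighbor_failover(ring, ring.index("node3"), current_weights.get))
```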
Another possible failover policy according to the present disclosure can be implemented in cluster environments having an even number of nodes, as shown in the accompanying figure.
Yet another failover policy can be implemented in a cluster environment in which any single node may failover to any surviving node. Any type of hash function that randomly selects a failover node from the available nodes can be used in this type of system.
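As one hypothetical illustration of such a hash-based selection, the failed node's name could be hashed to index into the list of surviving nodes; the particular hash function and node names below are assumptions, not part of the disclosure.

```python
# Hypothetical sketch: hash the failed node's name to pick one of the survivors.
import hashlib

def hash_select(failed_node: str, survivors: list) -> str:
    digest = hashlib.sha256(failed_node.encode()).hexdigest()
    return survivors[int(digest, 16) % len(survivors)]

print(hash_select("node2", ["node1", "node3", "node4"]))
```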
Other failover policies according to the present disclosure include First-in-First-out (FIFO) and Last-in-First-out (LIFO) policies, wherein a failover node is designated according to the order in which nodes joined the cluster. For example, according to a FIFO policy, if the order of joining the cluster is node 1, node 2, node 3, then node 4, then the first node to join the cluster (node 1) will be the failover node for all nodes. If node 1 itself is down (or is the one to fail first), then the surviving node that was the earliest to join the cluster will become the failover node. In a LIFO failover policy, the designated failover node is the last one to join the cluster. In the above example, this is node 4, and if node 4 has failed it would become node 3, and so on. The FIFO and LIFO policies are illustrated in the accompanying figures.
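A minimal sketch of the FIFO and LIFO policies follows, assuming join_order lists the nodes in the order they joined the cluster (earliest first); the node names are hypothetical.

```python
# Minimal sketch of FIFO and LIFO failover based on cluster join order.
def fifo_failover(join_order, failed_node):
    """First surviving node to have joined the cluster becomes the failover node."""
    return [n for n in join_order if n != failed_node][0]

def lifo_failover(join_order, failed_node):
    """Last surviving node to have joined the cluster becomes the failover node."""
    return [n for n in join_order if n != failed_node][-1]

join_order = ["node1", "node2", "node3", "node4"]
print(fifo_failover(join_order, "node1"))  # node1 failed, so node2 (next earliest joiner)
print(lifo_failover(join_order, "node4"))  # node4 failed, so node3 (next latest joiner)
```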
Yet another failover policy according to the present disclosure involves maintaining a prioritized queue of failover nodes. The prioritized queue is created by determining the weight of each node in the cluster, ordering the nodes by increasing weight from lowest to highest, and placing the ordered nodes in a queue so that the first node in the queue has the lowest weight. Following failure of one of the nodes, the first surviving node in the queue is assigned as the failover node, and the applications running on the failed node are failed over to that node.
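A minimal sketch of the prioritized-queue policy follows, assuming node weights have already been determined as described above; the weights and node names are hypothetical.

```python
# Minimal sketch of the prioritized queue: order nodes from lowest to highest
# weight, then fail over to the first surviving node in the queue.
def build_queue(node_weights):
    return sorted(node_weights, key=node_weights.get)

def queue_failover(queue, failed_node):
    return next(node for node in queue if node != failed_node)

weights = {"node1": 70.0, "node2": 30.0, "node3": 55.0, "node4": 90.0}
queue = build_queue(weights)           # ['node2', 'node3', 'node1', 'node4']
print(queue_failover(queue, "node2"))  # node2 itself failed, so node3 is chosen
```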
Other failover policies that can be established according to the present disclosure are Best Fit and First Fit policies. Under a Best Fit policy, the weights of the surviving nodes are determined at the time of failure, and the applications running on the failed node are failed over to the surviving node having the lowest weight, that is, the node with the most available resources. Under a First Fit policy, the surviving nodes are examined in turn, and the first surviving node determined to have available resources greater than those needed by the applications running on the failed node is assigned as the failover node.
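A minimal sketch of both selections follows, assuming each surviving node reports a weight (lower meaning more available resources) and that "sufficient resources" is expressed as a simple weight threshold; the threshold, weights, and node names are hypothetical.

```python
# Minimal sketch of Best Fit (lowest weight overall) and First Fit (first node
# whose weight indicates enough headroom). Values are hypothetical.
def best_fit(survivor_weights):
    return min(survivor_weights, key=survivor_weights.get)

def first_fit(survivor_weights, needed_threshold):
    for node, weight in survivor_weights.items():
        if weight <= needed_threshold:
            return node
    return None  # no single surviving node can absorb the failed node's load

survivors = {"node2": 55.0, "node3": 40.0, "node4": 70.0}
print(best_fit(survivors))         # node3: lowest weight of all survivors
print(first_fit(survivors, 60.0))  # node2: first survivor under the threshold
```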
Another failover policy that can be implemented involves determining in advance which node should serve as a failover node for which other nodes. This selection can be made based on the knowledge of what applications are designated to run on which nodes, and what resources each application requires. For example, a system administrator may set up the cluster such that node 1 and node 4 are file servers, and that both should be designated to failover to node 2. The system administrator may also designate more than one possible failover node, with the choice being made at the time of failure by assessing the weights of the designated nodes.
Yet another possible failover policy for multiple nodes in a cluster involves placing the nodes into subgroups, where a failed node is designated to failover to another node within the same subgroup. For example, if there are eight nodes in the cluster, the eight nodes may be broken down into two groups of four nodes each, or four groups of two nodes each. Each node within the group may be designated to failover to a single node within that group only, or to any other node within that group, with the designation being made based on the weight of the other nodes within the group at the time of failure, as described above.
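A minimal sketch of the subgroup policy for an eight-node cluster follows, assuming the groups are fixed in advance and that, within a group, the surviving member with the lowest weight is chosen at failure time; the groupings and weights are hypothetical.

```python
# Minimal sketch of subgroup failover: a failed node fails over only to another
# member of its own subgroup, here the lightest surviving member.
groups = {
    "group_a": ["node1", "node2", "node3", "node4"],
    "group_b": ["node5", "node6", "node7", "node8"],
}

def subgroup_failover(failed_node, groups, weigh):
    group = next(members for members in groups.values() if failed_node in members)
    candidates = [n for n in group if n != failed_node]
    return min(candidates, key=weigh)

weights = {"node1": 50, "node2": 30, "node3": 80, "node4": 60,
           "node5": 20, "node6": 70, "node7": 40, "node8": 90}
print(subgroup_failover("node3", groups, weights.get))  # node2, chosen within group_a
```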
Yet another failover policy according to the present disclosure assigns failover sites for different applications running on the failed node based on priority. Under this scheme, a priority value is assigned in advance to at least selected applications running on the node (see step 701 in FIG. 7). At the time failure is determined (step 702), the application having the highest priority is determined and failed over first, followed by the one having the second highest priority, and so on (step 703). Applications without a priority value, if any, can be assigned afterwards. This failover policy helps to ensure that those applications that are most important continue to function in the event of a failure, since any limitations on system resources of the failed node will affect these applications last. This scheme can be combined with any other failover policy described in this disclosure to determine which node each application should fail over to.
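A minimal sketch of the priority-ordered failover in steps 701 through 703 follows; place_application() stands in for whichever node-selection policy is combined with this scheme, and the application names and priority values are hypothetical.

```python
# Minimal sketch of priority-ordered failover: prioritized applications are
# failed over first, highest priority first, then unprioritized applications.
def failover_by_priority(applications, place_application):
    prioritized = sorted((a for a, p in applications.items() if p is not None),
                         key=lambda a: applications[a], reverse=True)
    unprioritized = [a for a, p in applications.items() if p is None]
    for app in prioritized + unprioritized:
        place_application(app)

apps = {"order_db": 10, "mail": 5, "reporting": None}  # hypothetical applications
failover_by_priority(apps, lambda app: print("failing over", app))
```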
Finally, a time-based failover scheme can be implemented in any cluster environment. In a time-based scheme, the failover node to which an application is transferred may depend on the time of day the failure occurs. For example, if a system administrator recognizes that a specific node that services email applications is heavily used in the first hours of the business day, applications on a node that fails during those hours would not be failed over to that specific node. This helps to minimize further system interruptions or failures, and helps to ensure that the significant uses of a given system at any given time are not interrupted. Further, should the node that services email applications fail during these critical hours, those applications should failover to another node immediately, and other less important applications running on the failover node can be temporarily suspended.
A time-based failover policy is illustrated generally in FIG. 8. At step 801, prior to the time of failure, the time(s) of day during which one or more nodes in the cluster are most heavily used is determined. According to one embodiment, it may also be determined at this time whether particular applications on the node are most heavily used during particular time(s) of the day (step 802). At step 803, failure of a node is determined, and at step 804 the time of failure is determined. Subsequently, possible surviving failover nodes are examined, and those that are designated as being heavily used during that particular time are removed as possible failover nodes, as shown in steps 805 and 806. When a possible failover node is found that is not excluded, it is assigned as the failover node (step 807). This may be done in several different ways, and in conjunction with implementing different ones of the failover policies described above. For example, possible failover nodes may be examined according to the best fit failover policy, but before applications are actually failed over an additional step is performed to see if the selected failover node is disqualified during that particular time of day. Alternatively, all excluded failover nodes may be determined first, implementing a best fit policy among the remaining nodes. Further, as described above, if it is also determined what time of day particular applications on a node are heavily used, at the time failure occurs the failed node may also be examined to determine if applications running on it are designated as heavily used during that time period. If so, that application(s) may be failed over first. Accordingly, by considering the time(s) of day during which nodes and/or applications running on nodes are heavily utilized, failover policies can be made more effective, improving the overall high availability of the system.
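A minimal sketch of the time-based exclusion in steps 804 through 807 follows, assuming each node may declare a busy window during which it must not be chosen, with a lowest-weight selection among the remaining candidates; the busy windows, weights, and node names are hypothetical.

```python
# Minimal sketch of time-based failover: nodes that are heavily used at the time
# of failure are excluded, then a best-fit choice is made among the rest.
from datetime import time

busy_hours = {
    "node2": (time(8, 0), time(11, 0)),  # e.g., an email server busy early in the day
}

def eligible(node, failure_time):
    window = busy_hours.get(node)
    return window is None or not (window[0] <= failure_time < window[1])

def time_based_failover(survivors, weights, failure_time):
    candidates = [n for n in survivors if eligible(n, failure_time)]
    return min(candidates, key=weights.get) if candidates else None

survivors = ["node2", "node3", "node4"]
weights = {"node2": 20.0, "node3": 45.0, "node4": 60.0}
print(time_based_failover(survivors, weights, time(9, 30)))  # node2 excluded -> node3
```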
Many of the above failover policies have been described as having all applications running on a failed node failover to a single surviving node. Oftentimes, however, it will be desirable to have different applications failover to different nodes. For example, it may be desirable to have a critical application running on a failed node failover to a surviving node having the most available resources (lowest weight), and other applications running on that node failover to a designated single node. Such a scheme helps to ensure that the most important application(s) running on a node are failed over in a manner that minimizes interruption. Any one of the above failover policies may be implemented for any application running on a node. A system administrator would determine which failover policy to use for which application at the time the system is established, or at the time the node is added to the cluster.
Although the present invention has been described in detail, it should be understood that various changes, substitutions and alterations can be made hereto without departing from the spirit and scope of the invention as defined by the appended claims.
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US5062055 *||May 29, 1990||Oct 29, 1991||Digital Equipment Corporation||Data processor performance advisor|
|US5459864 *||Feb 2, 1993||Oct 17, 1995||International Business Machines Corporation||Load balancing, error recovery, and reconfiguration control in a data movement subsystem with cooperating plural queue processors|
|US5612865||Jun 1, 1995||Mar 18, 1997||Ncr Corporation||Dynamic hashing method for optimal distribution of locks within a clustered system|
|US5696895||Jun 19, 1995||Dec 9, 1997||Compaq Computer Corporation||Fault tolerant multiple network servers|
|US5781716||Feb 19, 1997||Jul 14, 1998||Compaq Computer Corporation||Fault tolerant multiple network servers|
|US5784697 *||Mar 27, 1996||Jul 21, 1998||International Business Machines Corporation||Process assignment by nodal affinity in a multiprocessor system having non-uniform memory access storage architecture|
|US5852724||Jun 18, 1996||Dec 22, 1998||Veritas Software Corp.||System and method for "N" primary servers to fail over to "1" secondary server|
|US5915095||Aug 8, 1995||Jun 22, 1999||Ncr Corporation||Method and apparatus for balancing processing requests among a plurality of servers based on measurable characteristics of network node and common application|
|US5938732||Dec 9, 1996||Aug 17, 1999||Sun Microsystems, Inc.||Load balancing and failover of network services|
|US6067412 *||Aug 17, 1995||May 23, 2000||Microsoft Corporation||Automatic bottleneck detection by means of workload reconstruction from performance measurements|
|US6067545 *||Apr 15, 1998||May 23, 2000||Hewlett-Packard Company||Resource rebalancing in networked computer systems|
|US6078957||Nov 20, 1998||Jun 20, 2000||Network Alchemy, Inc.||Method and apparatus for a TCP/IP load balancing and failover process in an internet protocol (IP) network clustering system|
|US6078960||Jul 3, 1998||Jun 20, 2000||Acceleration Software International Corporation||Client-side load-balancing in client server network|
|US6115830||Mar 28, 1998||Sep 5, 2000||Compaq Computer Corporation||Failure recovery for process relationships in a single system image environment|
|US6119143||May 22, 1997||Sep 12, 2000||International Business Machines Corporation||Computer system and method for load balancing with selective control|
|US6138159||Jun 11, 1998||Oct 24, 2000||Phaal; Peter||Load direction mechanism|
|US6145089||Nov 10, 1997||Nov 7, 2000||Legato Systems, Inc.||Server fail-over system|
|US6202080 *||Dec 11, 1997||Mar 13, 2001||Nortel Networks Limited||Apparatus and method for computer job workload distribution|
|US6202149||Sep 30, 1998||Mar 13, 2001||Ncr Corporation||Automated application fail-over for coordinating applications with DBMS availability|
|US6578064 *||Oct 7, 1998||Jun 10, 2003||Hitachi, Ltd.||Distributed computing system|
|US6622259 *||Jul 14, 2000||Sep 16, 2003||International Business Machines Corporation||Non-disruptive migration of coordinator services in a distributed computer system|
|US20040205767 *||Aug 5, 2002||Oct 14, 2004||Jukka Partanen||Controlling processing networks|
|US20040236860 *||Jan 26, 2004||Nov 25, 2004||Gary Logston||Method and apparatus for balancing distributed applications|
|US20050010567 *||Jul 30, 2004||Jan 13, 2005||Barth Brian E.||Method and apparatus for dynamic information connection search engine|
|EP1024428A2||Jan 27, 2000||Aug 2, 2000||International Business Machines Corporation||Managing a clustered computer system|
|1||U.S. Appl. No. 09/637,093, "A Cluster-Based System and Method of Recovery from Server Failures," Nam Nguyen, et al., filed Aug. 10, 2000.|
|2||U.S. Appl. No. 09/770,523, "System and Method for Identifying Memory Modules Having a Failing or Defective Address," filed Jan. 26, 2001.|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US7139928 *||Oct 17, 2002||Nov 21, 2006||Cisco Technology, Inc.||Method and system for providing redundancy within a network element|
|US7287179 *||May 15, 2003||Oct 23, 2007||International Business Machines Corporation||Autonomic failover of grid-based services|
|US7302608 *||Mar 31, 2004||Nov 27, 2007||Google Inc.||Systems and methods for automatic repair and replacement of networked machines|
|US7334062 *||Jul 22, 2003||Feb 19, 2008||Symantec Operating Corporation||Technique to monitor application behavior and tune replication performance|
|US7346800 *||Jul 30, 2007||Mar 18, 2008||Hitachi, Ltd.||Fail over method through disk take over and computer system having failover function|
|US7409576 *||Sep 8, 2004||Aug 5, 2008||Hewlett-Packard Development Company, L.P.||High-availability cluster with proactive maintenance|
|US7433931||Nov 17, 2004||Oct 7, 2008||Raytheon Company||Scheduling in a high-performance computing (HPC) system|
|US7475274||Nov 17, 2004||Jan 6, 2009||Raytheon Company||Fault tolerance and recovery in a high-performance computing (HPC) system|
|US7478149 *||Jun 27, 2003||Jan 13, 2009||Symantec Operating Corporation||Business continuation policy for server consolidation environment|
|US7516353||May 16, 2008||Apr 7, 2009||Hitachi, Ltd.||Fall over method through disk take over and computer system having failover function|
|US7529822 *||May 31, 2002||May 5, 2009||Symantec Operating Corporation||Business continuation policy for server consolidation environment|
|US7543174 *||Sep 24, 2003||Jun 2, 2009||Symantec Operating Corporation||Providing high availability for an application by rapidly provisioning a node and failing over to the node|
|US7574620 *||Aug 2, 2006||Aug 11, 2009||Fujitsu Siemens Computers Gmbh||Method for operating an arrangement of a plurality of computers in the event of a computer failure|
|US7669235||Feb 23, 2010||Microsoft Corporation||Secure domain join for computing devices|
|US7684964||Sep 8, 2005||Mar 23, 2010||Microsoft Corporation||Model and system state synchronization|
|US7689676||Mar 30, 2010||Microsoft Corporation||Model-based policy application|
|US7711121||Nov 2, 2004||May 4, 2010||Microsoft Corporation||System and method for distributed management of shared computers|
|US7711977 *||Apr 15, 2004||May 4, 2010||Raytheon Company||System and method for detecting and managing HPC node failure|
|US7711983 *||Mar 30, 2007||May 4, 2010||Hitachi, Ltd.||Fail over method for computer system|
|US7739380||Nov 12, 2004||Jun 15, 2010||Microsoft Corporation||System and method for distributed management of shared computers|
|US7765501||Mar 25, 2004||Jul 27, 2010||Microsoft Corporation||Settings and constraints validation to enable design for operations|
|US7778422||Feb 27, 2004||Aug 17, 2010||Microsoft Corporation||Security associations for devices|
|US7792931||Sep 7, 2010||Microsoft Corporation||Model-based system provisioning|
|US7797147||Sep 14, 2010||Microsoft Corporation||Model-based system monitoring|
|US7814126||Oct 12, 2010||Microsoft Corporation||Using task sequences to manage devices|
|US7814491 *||Oct 12, 2010||Oracle America, Inc.||Method and apparatus for managing system resources using a container model|
|US7827136 *||Jun 27, 2003||Nov 2, 2010||Emc Corporation||Management for replication of data stored in a data storage environment including a system and method for failover protection of software agents operating in the environment|
|US7853835 *||Aug 20, 2008||Dec 14, 2010||Hitachi, Ltd.||Cluster system wherein failover reset signals are sent from nodes according to their priority|
|US7861109||Nov 30, 2007||Dec 28, 2010||Cisco Technology, Inc.||Method and system for optimized switchover of redundant forwarding engines|
|US7886041||Mar 1, 2004||Feb 8, 2011||Microsoft Corporation||Design time validation of systems|
|US7890543||Oct 24, 2003||Feb 15, 2011||Microsoft Corporation||Architecture for distributed computing system and automated design, deployment, and management of distributed applications|
|US7890951||Feb 15, 2011||Microsoft Corporation||Model-based provisioning of test environments|
|US7900206 *||Mar 31, 2004||Mar 1, 2011||Symantec Operating Corporation||Information technology process workflow for data centers|
|US7912940||Mar 22, 2011||Microsoft Corporation||Network system role determination|
|US7941309||Nov 2, 2005||May 10, 2011||Microsoft Corporation||Modeling IT operations/policies|
|US8041986||Oct 18, 2011||Hitachi, Ltd.||Take over method for computer system|
|US8069368||May 4, 2009||Nov 29, 2011||Hitachi, Ltd.||Failover method through disk takeover and computer system having failover function|
|US8095691 *||Jan 10, 2012||International Business Machines Corporation||Multi-node configuration of processor cards connected via processor fabrics|
|US8121026 *||Sep 29, 2009||Feb 21, 2012||Juniper Networks, Inc.||Systems and methods for routing data in a communications network|
|US8122089||Jun 29, 2007||Feb 21, 2012||Microsoft Corporation||High availability transport|
|US8122106||Oct 24, 2003||Feb 21, 2012||Microsoft Corporation||Integrating design, deployment, and management phases for systems|
|US8190714||Apr 15, 2004||May 29, 2012||Raytheon Company||System and method for computer cluster virtualization using dynamic boot images and virtual disk|
|US8209395||Jun 26, 2012||Raytheon Company||Scheduling in a high-performance computing (HPC) system|
|US8230256 *||Jun 6, 2008||Jul 24, 2012||Symantec Corporation||Method and apparatus for achieving high availability for an application in a computer cluster|
|US8244882||Aug 14, 2012||Raytheon Company||On-demand instantiation in a high-performance computing (HPC) system|
|US8296601||Sep 21, 2011||Oct 23, 2012||Hitachi, Ltd||Take over method for computer system|
|US8312319||Nov 13, 2012||Hitachi, Ltd.||Failover method through disk takeover and computer system having failover function|
|US8326990 *||Dec 4, 2012||Symantec Operating Corporation||Automated optimal workload balancing during failover in share-nothing database systems|
|US8335909||Apr 15, 2004||Dec 18, 2012||Raytheon Company||Coupling processors to each other for high performance computing (HPC)|
|US8336040||Dec 18, 2012||Raytheon Company||System and method for topology-aware job scheduling and backfilling in an HPC environment|
|US8458515||Nov 16, 2009||Jun 4, 2013||Symantec Corporation||Raid5 recovery in a high availability object based file system|
|US8489728||Apr 15, 2005||Jul 16, 2013||Microsoft Corporation||Model-based system monitoring|
|US8495323||Dec 7, 2010||Jul 23, 2013||Symantec Corporation||Method and system of providing exclusive and secure access to virtual storage objects in a virtual machine cluster|
|US8549513||Jun 29, 2005||Oct 1, 2013||Microsoft Corporation||Model-based virtual system provisioning|
|US8601314||Oct 5, 2012||Dec 3, 2013||Hitachi, Ltd.||Failover method through disk take over and computer system having failover function|
|US8645454||Dec 28, 2010||Feb 4, 2014||Canon Kabushiki Kaisha||Task allocation multiple nodes in a distributed computing system|
|US8782098||Sep 1, 2010||Jul 15, 2014||Microsoft Corporation||Using task sequences to manage devices|
|US8812530 *||Feb 4, 2010||Aug 19, 2014||Nec Corporation||Data processing device and data processing method|
|US8910175||Oct 11, 2013||Dec 9, 2014||Raytheon Company||System and method for topology-aware job scheduling and backfilling in an HPC environment|
|US8918673 *||Jun 14, 2012||Dec 23, 2014||Symantec Corporation||Systems and methods for proactively evaluating failover nodes prior to the occurrence of failover events|
|US8935563 *||Jun 15, 2012||Jan 13, 2015||Symantec Corporation||Systems and methods for facilitating substantially continuous availability of multi-tier applications within computer clusters|
|US8954786||Jul 28, 2011||Feb 10, 2015||Oracle International Corporation||Failover data replication to a preferred list of instances|
|US8984525||Oct 11, 2013||Mar 17, 2015||Raytheon Company||System and method for topology-aware job scheduling and backfilling in an HPC environment|
|US9037833||Dec 12, 2012||May 19, 2015||Raytheon Company||High performance computing (HPC) node having a plurality of switch coupled processors|
|US9178784||Apr 15, 2004||Nov 3, 2015||Raytheon Company||System and method for cluster management based on HPC architecture|
|US9189275||Dec 12, 2012||Nov 17, 2015||Raytheon Company||System and method for topology-aware job scheduling and backfilling in an HPC environment|
|US9189278||Oct 11, 2013||Nov 17, 2015||Raytheon Company||System and method for topology-aware job scheduling and backfilling in an HPC environment|
|US9280433||Jan 13, 2012||Mar 8, 2016||Microsoft Technology Licensing, Llc||Hardware diagnostics and software recovery on headless server appliances|
|US9317270||Sep 30, 2013||Apr 19, 2016||Microsoft Technology Licensing, Llc||Model-based virtual system provisioning|
|US9344494||Aug 30, 2011||May 17, 2016||Oracle International Corporation||Failover data replication with colocation of session state data|
|US9361344||Jun 2, 2015||Jun 7, 2016||Facebook, Inc.||System and method for distributed database query engines|
|US20020133601 *||Mar 16, 2001||Sep 19, 2002||Kennamer Walter J.||Failover of servers over which data is partitioned|
|US20040153708 *||Jun 27, 2003||Aug 5, 2004||Joshi Darshan B.||Business continuation policy for server consolidation environment|
|US20040153841 *||Jan 16, 2003||Aug 5, 2004||Silicon Graphics, Inc.||Failure hierarchy in a cluster filesystem|
|US20040243915 *||May 15, 2003||Dec 2, 2004||International Business Machines Corporation||Autonomic failover of grid-based services|
|US20040267716 *||Jun 25, 2003||Dec 30, 2004||Munisamy Prabu||Using task sequences to manage devices|
|US20050125557 *||Dec 8, 2003||Jun 9, 2005||Dell Products L.P.||Transaction transfer during a failover of a cluster controller|
|US20050234846 *||Apr 15, 2004||Oct 20, 2005||Raytheon Company||System and method for computer cluster virtualization using dynamic boot images and virtual disk|
|US20050235055 *||Apr 15, 2004||Oct 20, 2005||Raytheon Company||Graphical user interface for managing HPC clusters|
|US20050235092 *||Apr 15, 2004||Oct 20, 2005||Raytheon Company||High performance computing system and method|
|US20050235286 *||Apr 15, 2004||Oct 20, 2005||Raytheon Company||System and method for topology-aware job scheduling and backfilling in an HPC environment|
|US20050246569 *||Apr 15, 2004||Nov 3, 2005||Raytheon Company||System and method for detecting and managing HPC node failure|
|US20050246771 *||May 25, 2004||Nov 3, 2005||Microsoft Corporation||Secure domain join for computing devices|
|US20050251567 *||Apr 15, 2004||Nov 10, 2005||Raytheon Company||System and method for cluster management based on HPC architecture|
|US20050251783 *||Mar 25, 2004||Nov 10, 2005||Microsoft Corporation||Settings and constraints validation to enable design for operations|
|US20060015773 *||Jul 16, 2004||Jan 19, 2006||Dell Products L.P.||System and method for failure recovery and load balancing in a cluster network|
|US20060031248 *||Mar 10, 2005||Feb 9, 2006||Microsoft Corporation||Model-based system provisioning|
|US20060053337 *||Sep 8, 2004||Mar 9, 2006||Pomaranski Ken G||High-availability cluster with proactive maintenance|
|US20060069805 *||Jul 30, 2004||Mar 30, 2006||Microsoft Corporation||Network system role determination|
|US20060106931 *||Nov 17, 2004||May 18, 2006||Raytheon Company||Scheduling in a high-performance computing (HPC) system|
|US20060112297 *||Nov 17, 2004||May 25, 2006||Raytheon Company||Fault tolerance and recovery in a high-performance computing (HPC) system|
|US20060117208 *||Nov 17, 2004||Jun 1, 2006||Raytheon Company||On-demand instantiation in a high-performance computing (HPC) system|
|US20070013703 *||Jun 20, 2006||Jan 18, 2007||Babel S.R.L.||Device for state sharing high-reliability in a computer system|
|US20070038885 *||Aug 2, 2006||Feb 15, 2007||Klaus Hartung||Method for operating an arrangement of a plurality of computers in the event of a computer failure|
|US20070260913 *||Jul 30, 2007||Nov 8, 2007||Keisuke Hatasaki||Fail over method through disk take over and computer system having failover function|
|US20080068986 *||Nov 30, 2007||Mar 20, 2008||Maranhao Marcus A||Method and system for optimized switchover of redundant forwarding engines|
|US20080091746 *||Mar 30, 2007||Apr 17, 2008||Keisuke Hatasaki||Disaster recovery method for computer system|
|US20080114879 *||May 9, 2007||May 15, 2008||Microsoft Corporation||Deployment of configuration data within a server farm|
|US20080168310 *||Jan 5, 2007||Jul 10, 2008||Microsoft Corporation||Hardware diagnostics and software recovery on headless server appliances|
|US20080235533 *||May 16, 2008||Sep 25, 2008||Keisuke Hatasaki||Fall over method through disk take over and computer system having failover function|
|US20090024868 *||May 31, 2002||Jan 22, 2009||Joshi Darshan B||Business continuation policy for server consolidation environment|
|US20090031316 *||Oct 7, 2008||Jan 29, 2009||Raytheon Company||Scheduling in a High-Performance Computing (HPC) System|
|US20090089609 *||Aug 20, 2008||Apr 2, 2009||Tsunehiko Baba||Cluster system wherein failover reset signals are sent from nodes according to their priority|
|US20100014416 *||Sep 29, 2009||Jan 21, 2010||Juniper Networks, Inc.||Systems and methods for routing data in a communications network|
|US20100180148 *||Mar 26, 2010||Jul 15, 2010||Hitachi, Ltd.||Take over method for computer system|
|US20100205151 *||Feb 4, 2010||Aug 12, 2010||Takeshi Kuroide||Data processing device and data processing method|
|US20100268986 *||Jul 2, 2010||Oct 21, 2010||International Business Machines Corporation||Multi-node configuration of processor cards connected via processor fabrics|
|US20100333086 *||Sep 1, 2010||Dec 30, 2010||Microsoft Corporation||Using Task Sequences to Manage Devices|
|CN101799779B||Feb 8, 2010||Jun 18, 2014||日本电气株式会社||Data processing device and data processing method|
|EP2752779B1 *||Dec 19, 2013||Jun 29, 2016||Facebook, Inc.||System and method for distributed database query engines|
|U.S. Classification||714/4.11, 714/E11.073|
|International Classification||H04L29/14, G06F11/20|
|Cooperative Classification||G06F11/2046, H04L69/40, G06F11/2028, G06F11/2035|
|European Classification||H04L29/14, G06F11/20P2, G06F11/20P4, G06F11/20P2E|
|Aug 9, 2001||AS||Assignment|
Owner name: DELL PRODUCTS, L.P., TEXAS
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MASHAYEKHI, VICTOR;HSIEH, JENWEI;ROOHOLAMINI, REZA;REEL/FRAME:012067/0373
Effective date: 20010806
|Jun 14, 2005||AS||Assignment|
Owner name: DELL PRODUCTS L.P., TEXAS
Free format text: RE-RECORD TO CORRECT THE NAME OF THE THIRD ASSIGNOR, PREVIOUSLY RECORDED ON REEL 012067 FRAME 0373.;ASSIGNORS:MASHAYEKHI, VICTOR;HSIEH, JENWEI;ROOHOLAMINI, MOHAMAD REZA;REEL/FRAME:016332/0939
Effective date: 20010806
|Jan 26, 2009||FPAY||Fee payment|
Year of fee payment: 4
|Jan 28, 2013||FPAY||Fee payment|
Year of fee payment: 8
|Jan 2, 2014||AS||Assignment|
Owner name: BANK OF AMERICA, N.A., AS COLLATERAL AGENT, NORTH
Free format text: PATENT SECURITY AGREEMENT (TERM LOAN);ASSIGNORS:DELL INC.;APPASSURE SOFTWARE, INC.;ASAP SOFTWARE EXPRESS, INC.;AND OTHERS;REEL/FRAME:031899/0261
Effective date: 20131029
Owner name: BANK OF AMERICA, N.A., AS ADMINISTRATIVE AGENT, TE
Free format text: PATENT SECURITY AGREEMENT (ABL);ASSIGNORS:DELL INC.;APPASSURE SOFTWARE, INC.;ASAP SOFTWARE EXPRESS,INC.;AND OTHERS;REEL/FRAME:031898/0001
Effective date: 20131029
Owner name: BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS FI
Free format text: PATENT SECURITY AGREEMENT (NOTES);ASSIGNORS:APPASSURE SOFTWARE, INC.;ASAP SOFTWARE EXPRESS, INC.;BOOMI, INC.;AND OTHERS;REEL/FRAME:031897/0348
Effective date: 20131029