US 20080282267 A1
Techniques are disclosed for determining placements of application instances on computing resources in a computing system such that the application instances can be executed thereon. By way of example, a method for determining an application instance placement in a set of machines under one or more resource constraints includes the following steps. An estimate is computed of a value of a first metric that can be achieved by a current application instance placement and a current application load distribution. A new application instance placement and a new application load distribution are determined, wherein the new application instance placement and the new application load distribution optimize the first metric.
1. A method for determining an application instance placement in a set of machines under one or more resource constraints, the method comprising the steps of:
computing an estimate of a value of a first metric that can be achieved by a current application instance placement and a current application load distribution; and
determining a new application instance placement and a new application load distribution that optimize the first metric.
2. The method of
3. The method of
4. The method of
5. The method of
6. The method of
7. The method of
8. The method of
9. The method of
10. The method of
11. The method of
12. The method of
13. The method of
14. The method of
15. The method of
16. Apparatus for determining an application instance placement in a set of machines under one or more resource constraints, the apparatus comprising:
a memory; and
at least one processor coupled to the memory and operative to: (i) compute an estimate of a value of a first metric that can be achieved by a current application instance placement and a current application load distribution; and (ii) determine a new application instance placement and a new application load distribution that optimize the first metric.
17. An article of manufacture for determining an application instance placement in a set of machines under one or more resource constraints, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
computing an estimate of a value of a first metric that can be achieved by a current application instance placement and a current application load distribution; and
determining a new application instance placement and a new application load distribution that optimize the first metric.
This application is a continuation of pending U.S. application Ser. No. 11/473,818 filed on Jun. 23, 2006, the disclosure of which is incorporated herein by reference.
The present invention generally relates to computing systems and, more particularly, to techniques for determining placements of application instances on computing resources in a computing system such that the application instances can be executed thereon.
With the rapid growth of the Internet, many organizations increasingly rely on web (i.e., World Wide Web) applications to deliver critical services to their customers and partners. An “application” generally refers to software code (e.g., one or more programs) that performs one or more functions.
Over the course of a decade, web applications have evolved from the early Hypertext Transfer Protocol (HTTP) servers that only delivered static HyperText Markup Language (HTML) files to today's applications, which run in sophisticated distributed environments, e.g., Java 2 Enterprise Edition (J2EE), and provide diverse services such as online shopping, online banking, and web search. Modern Internet data centers may run thousands of machines to host a large number of different web applications. Many web applications are resource demanding and process client requests at a high rate. Previous studies have shown that web request rates are bursty and can fluctuate dramatically within short periods of time. Therefore, it is not cost-effective to overprovision data centers in order to handle the potential peak demands of all the applications.
To utilize system resources more effectively, modern web applications typically run on top of a middleware system and rely on it to dynamically allocate resources to meet the applications' performance goals. “Middleware” generally refers to the software layer that lies between the operating system and the applications. Some middleware systems use a clustering technology to improve scalability, availability and load balancing, by integrating multiple instances of the same application, and presenting them to the users as a single virtual application.
Principles of the invention provide techniques for determining placements of application instances on computing resources in a computing system such that the application instances can be executed thereon.
By way of example, in one aspect of the invention, a method for determining an application instance placement in a set of machines under one or more resource constraints includes the following steps. An estimate is computed of a value of a first metric that can be achieved by a current application instance placement and a current application load distribution. A new application instance placement and a new application load distribution are determined, wherein the new application instance placement and the new application load distribution optimize the first metric.
The determining step may further include the new application instance placement improving upon the first metric and the new load distribution improving upon a second metric. The determining step may further include shifting an application load, changing the application instance placement without pinning to determine a first candidate placement, changing the application instance placement with pinning to determine a second candidate placement, and selecting a best placement from the first candidate placement and the second candidate placement as the new application instance placement. The determining step may be performed multiple times.
The method may also include the step of balancing an application load across the set of machines.
The first metric may include a total number of satisfied demands, a total number of placement changes, or an extent to which an application load is balanced across the set of machines.
One of the one or more resource constraints may include a processing capacity or a memory capacity.
The second metric may include a degree of correlation between residual resources on each machine of the set of machines, or a number of underutilized application instances.
These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
Illustrative principles of the invention will be explained below in the context of an Internet-based/web application environment. However, it is to be understood that the present invention is not limited to such an environment. Rather, the invention is more generally applicable to any data processing environment in which it would be desirable to provide improved processing performance.
In the illustrative description below, the following problem is addressed. Given a set of machines (computing systems or servers) and a set of web applications with dynamically changing demands (e.g., the number of client requests for use of the application), an application placement controller decides how many instances to run for each application and where to put them (i.e., which machines to assign them to), while observing a variety of resource constraints. “Instances” of an application generally refer to identical copies of the application, but can also refer to different or even overlapping parts of the application. This problem is NP-hard (non-deterministic polynomial-time hard). Illustrative principles of the invention propose an online algorithm that uses heuristics to solve this problem efficiently. The algorithm allows multiple applications to share a single machine, and strives to maximize the total satisfied application demand, to minimize the number of application starts and stops, and to balance the load across machines. It is to be understood that reasonable extensions of the proposed algorithm can also optimize for other performance goals, for example, maximizing or minimizing user-specified utility functions.
Flow control and load balancing decide how to dynamically allocate resources to the running application instances. Illustrative principles of the invention address an equally important problem: given a set of machines with constrained resources and a set of web applications with dynamically changing demands, we determine how many instances to run for each application and on which machines to execute them.
We call this problem dynamic application placement. We assume that not every machine can run all the applications at the same time due to limited resources such as memory.
Application placement is orthogonal to flow control and load balancing, and the quality of a placement solution can have a profound impact on the performance of the entire system (i.e., the complete set of machines used for hosting applications).
We illustratively formulate the application placement problem as a variant of the Class Constrained Multiple-Knapsack Problem (see, e.g., H. Shachnai and T. Tamir, “Noah's bagels—some combinatorial aspects,” In Proc. 1st Int. Conf. on Fun with Algorithms, 1998; and H. Shachnai and T. Tamir, “On two class-constrained versions of the multiple knapsack problem,” Algorithmica, 29(3), pp. 442-467, 2001). Under multiple resource constraints (e.g., CPU and memory) and application constraints (e.g., the need for special hardware or software), an automated placement algorithm strives to produce placement solutions that optimize multiple objectives: (1) maximizing the total satisfied application demand, (2) minimizing the total number of application starts and stops, and (3) balancing the load across machines. It is to be understood that we can also optimize for other objective functions, for example, a user specified utility function.
The placement problem is NP-hard. In one embodiment, the invention provides an online heuristic algorithm that can produce high-quality solutions within 30 seconds for hard placement problems with thousands of machines and thousands of applications. This scalability is crucial for dynamic resource provisioning in large-scale enterprise data centers. Compared with existing algorithms, for systems with 100 machines or fewer, the proposed algorithm is up to 134 times faster, reduces the number of application starts and stops by up to a factor of 32, and satisfies up to 25% more application demands.
The remainder of the detailed description is organized as follows. Section I formulates the application placement problem. Section II describes an illustrative placement algorithm.
Inputs 204 to placement controller 202 include the current placement of applications on machines (matrix I), the resource capacity of each machine (CPU capacity vector Ω and memory capacity vector Γ), the projected resource demand of each application (CPU demand vector ω and memory demand vector γ), and the restrictions that specify whether a given application can run on a given machine (matrix R), e.g., some applications may require machines with special hardware or software. It is to be appreciated that such inputs are collected by auxiliary components. That is, placement sensor 205 generates and maintains current placement matrix I. Application demand estimator 206 generates and maintains the projected resource demand of each application (CPU demand vector ω and memory demand vector γ). Configuration database 207 maintains the resource capacity of each machine (CPU capacity vector Ω and memory capacity vector Γ).
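To make these inputs concrete, the following is a minimal Python sketch that bundles them and checks a candidate placement against the restriction matrix R and the memory capacities. The class PlacementInputs, the helper placement_is_feasible, and the 0/1 encoding of I and R (indexed as [m][n], application m on machine n) are hypothetical; the text does not prescribe a data layout. CPU capacity is not checked here because, in this model, it constrains the load distribution L rather than the placement itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlacementInputs:
    # Hypothetical container; I[m][n] and R[m][n] are 0/1 (application m, machine n).
    I: List[List[int]]         # current placement matrix
    R: List[List[int]]         # 1 if application m may run on machine n
    cpu_cap: List[float]       # Ω[n], CPU capacity of machine n
    mem_cap: List[float]       # Γ[n], memory capacity of machine n
    cpu_demand: List[float]    # ω[m], projected CPU demand of application m
    mem_demand: List[float]    # γ[m], memory needed by one instance of application m


def placement_is_feasible(inp: PlacementInputs, I: List[List[int]]) -> bool:
    """Return True if placement I respects R and every machine's memory capacity."""
    for n, capacity in enumerate(inp.mem_cap):
        used = 0.0
        for m, gamma in enumerate(inp.mem_demand):
            if I[m][n]:
                if not inp.R[m][n]:
                    return False  # application m is not allowed on machine n
                used += gamma     # memory is treated as load-independent
        if used > capacity:
            return False          # instances on machine n need more memory than it has
    return True
```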
Taking inputs 204, placement controller 202 generates outputs 208 including new placement matrix I and load distribution matrix L. That is, placement controller 202 computes a new placement solution (new matrix I) that optimizes certain objective functions, and then passes the solution to placement executor 209 to start and stop application instances accordingly. The placement executor schedules placement changes in such a way that they impose minimal disturbance on the running system. Every T minutes, the placement controller produces a new placement solution based on the current inputs. By way of example only, T=15 minutes may be a default configuration.
Estimating application demands is a non-trivial task. In one embodiment, we use online profiling and linear regression to dynamically estimate the average CPU cycles needed to process one web request for a given application. The product of the estimated CPU cycles per request and the projected request rate gives the CPU cycles needed by the application per second. However, it is to be understood that other known techniques for estimating application demand may be used.
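A minimal sketch of this estimate follows, assuming the regression is fitted through the origin (CPU cycles per second ≈ cycles per request × request rate) over paired samples of measured request rate and measured CPU consumption. The function names and the zero-intercept model are assumptions; the text states only that online profiling and linear regression are used.

```python
from typing import Sequence


def cycles_per_request(request_rates: Sequence[float],
                       cpu_cycles_per_sec: Sequence[float]) -> float:
    """Least-squares slope of CPU cycles/s against requests/s, fitted through the origin."""
    if not request_rates or len(request_rates) != len(cpu_cycles_per_sec):
        raise ValueError("need equally sized, non-empty samples")
    sxx = sum(x * x for x in request_rates)
    if sxx == 0:
        raise ValueError("all observed request rates are zero")
    sxy = sum(x * y for x, y in zip(request_rates, cpu_cycles_per_sec))
    return sxy / sxx


def projected_cpu_demand(cycles_per_req: float, projected_rate: float) -> float:
    """CPU cycles per second needed: estimated cycles per request times projected rate."""
    return cycles_per_req * projected_rate
```

For samples (10, 1e6), (20, 2e6), (30, 3e6) requests/s and cycles/s, the fitted slope is 1e5 cycles per request, so a projected rate of 50 requests/s yields a demand of 5e6 cycles/s.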
The remainder of this section presents the formal formulation of the illustrative placement problem. We first discuss the system resources and application demands considered in the placement problem. An application's demands for resources can be characterized as either load-dependent or load-independent. A running application instance's consumption of load-dependent resources depends on the request rate. Examples of such resources include CPU cycles and network bandwidth. A running application instance also consumes some load-independent resources regardless of the offered load, i.e., even if it processes no requests. An example of such resources is the process control block (PCB) maintained in the operating system kernel for each running program.
In this embodiment, for practical reasons, we treat memory as a load-independent resource, and conservatively estimate the memory usage to ensure that every running application has sufficient memory. It is assumed that the system includes a component that dynamically estimates the upper limit of an application's near-term memory usage based on a time series of its past memory usage. Because the memory usage estimation is updated dynamically, some load-dependent aspects of memory are indirectly considered by the placement controller.
We treat memory as a load-independent resource for several reasons. First, a significant amount of memory is consumed by an application instance even if it receives no requests. Second, memory consumption is often related to prior application usage rather than its current load. For example, even in the presence of a low load, memory usage may still be high as a result of data caching. Third, because an accurate projection of future memory usage is extremely difficult and many applications cannot run when the system is out of memory, it is more reasonable to be conservative in the estimation of memory usage, i.e., using the upper limit instead of the average.
Among many load-dependent and load-independent resources, we choose CPU and memory as the representative ones to be considered by the placement controller, because we observe that they are the most common bottleneck resources. For example, our experience shows that many business J2EE applications require on average 1-2 GB (gigabytes) of real memory to run. For brevity, the description of the algorithm only considers CPU and memory, but it is to be understood that the algorithm can consider other types of resources as well. For example, if the system is network-bound, we can use network bandwidth as the load-dependent resource, which introduces no changes to the algorithm.
Next, we present the formal formulation of the placement problem.
The outputs 208 of placement controller 202 are the updated placement matrix I and the load distribution matrix L. Placement executor 209 starts and stops application instances according to the difference between the old and new placement matrices. The load distribution matrix L is a byproduct. It helps verify the maximum total application demand that can be satisfied by the new placement matrix I. L may or may not be directly used by the placement executor or the request router. The request router may dynamically balance the load according to the real received demands rather than the load distribution matrix L computed based on the projected demands.
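The executor's start and stop lists can be derived directly from the old and new placement matrices. This sketch assumes I is a 0/1 matrix indexed as I[m][n] (at most one instance of an application per machine); the function name is hypothetical. The total number of placement changes, which the controller tries to minimize, is then len(starts) + len(stops).

```python
from typing import List, Tuple

Change = Tuple[int, int]  # (application m, machine n)


def placement_changes(old_I: List[List[int]],
                      new_I: List[List[int]]) -> Tuple[List[Change], List[Change]]:
    """Compare the old placement I* with the new placement I."""
    starts: List[Change] = []
    stops: List[Change] = []
    for m, (old_row, new_row) in enumerate(zip(old_I, new_I)):
        for n, (old, new) in enumerate(zip(old_row, new_row)):
            if new and not old:
                starts.append((m, n))  # instance must be started
            elif old and not new:
                stops.append((m, n))   # instance must be stopped
    return starts, stops
```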
Placement controller 202 strives to find a placement solution that maximizes the total satisfied application demand. Again, it is to be understood that this is just one example of the optimization goal. That is, principles of the invention may also be used to optimize for other objective functions instead of maximizing the total satisfied demand, for example, maximizing a user-specified utility function. In addition, the placement controller also tries to minimize the total number of application starts and stops, because placement changes disturb the running system and waste CPU cycles. In practice, many J2EE applications take a few minutes to start or stop, and take some additional time to warm up their data cache. The last optimization goal is to balance the load across machines. Ideally, the utilization of individual machines should stay close to the utilization ρ of the entire system, i.e., the total assigned load divided by the total CPU capacity:
ρ = (Σm Σn Lm,n) / (Σn Ωn)
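The balance goal can be checked numerically as below, taking the system utilization ρ to be the total assigned CPU load divided by the total CPU capacity. Measuring imbalance as the largest deviation of any machine's utilization from ρ is our own illustrative choice, and the function names are hypothetical; the text does not fix a particular balance metric.

```python
from typing import List


def system_utilization(L: List[List[float]], cpu_cap: List[float]) -> float:
    """ρ: total assigned load over total CPU capacity."""
    return sum(sum(row) for row in L) / sum(cpu_cap)


def machine_utilizations(L: List[List[float]], cpu_cap: List[float]) -> List[float]:
    """Per-machine utilization: load placed on machine n divided by Ω[n]."""
    return [sum(row[n] for row in L) / cap for n, cap in enumerate(cpu_cap)]


def max_imbalance(L: List[List[float]], cpu_cap: List[float]) -> float:
    """Largest deviation of any machine's utilization from ρ (illustrative metric)."""
    rho = system_utilization(L, cpu_cap)
    return max(abs(u - rho) for u in machine_utilizations(L, cpu_cap))
```

With L = [[6, 0], [2, 2]] and Ω = [10, 10], ρ is 0.5 while the two machines run at 0.8 and 0.2, so a balancing step would try to move load from the first machine to the second.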
As we are dealing with multiple optimization objectives, we prioritize them in the formal problem statement below. Let I* denote the old placement matrix, and I denote the new placement matrix:
As mentioned above, this optimization problem is a variant of the Class Constrained Multiple-Knapsack problem. It differs from the prior formulation mainly in that it also minimizes the number of placement changes. This problem is NP-hard. In the next section, we present an online heuristic algorithm for solving the optimization problem.
This section describes an illustrative embodiment of a placement algorithm, which can efficiently find high-quality placement solutions even under tight resource constraints.
The core of the place( ) function is a loop that incrementally optimizes the placement solution. Inside the loop, the algorithm first solves the max-flow problem (see, e.g., R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, editors, “Network Flows: Theory, Algorithms, and Applications,” Prentice Hall, New Jersey, 1993, ISBN 1000499012) in
Below, we first define some terms that will be used in the algorithm description (subsection A), and then generally describe key concepts of the algorithm (subsections B and C). Finally, we describe in detail the load-shifting subroutine (subsection D), the placement-changing subroutine (subsection E), and the full placement algorithm (subsection F) that invokes the two subroutines.
A. Definition of Terms
A machine is fully utilized if its residual CPU capacity is zero (Ω*n=0); otherwise, it is underutilized. An application instance is fully utilized if it runs on a fully utilized machine. An instance of application m running on an underutilized machine n is completely idle if it has no load (Lm,n=0); otherwise, it is underutilized. The load of an underutilized instance of application m can be increased if application m has a positive residual CPU demand (ω*m>0). Note that the definition of a machine's utilization is solely based on its CPU usage.
The CPU-memory ratio of a machine n is defined as its CPU capacity divided by its memory capacity, i.e., Ωn/Γn. Intuitively, it is harder to fully utilize the CPU of machines with a high CPU-memory ratio. The load-memory ratio of an instance of application m running on machine n is defined as the CPU load of this instance divided by its memory consumption, i.e., Lm,n/γm. Intuitively, application instances with a higher load-memory ratio are more useful.
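The definitions above can be expressed as small helpers. We assume the residual quantities are computed from the current I and L in the obvious way: Ω*n = Ωn − Σm Lm,n, Γ*n = Γn − Σm Im,n γm, and ω*m = ωm − Σn Lm,n. All names are hypothetical, and the tolerance eps for deciding that residual CPU is zero is our addition to cope with floating-point loads.

```python
from typing import List


def residual_cpu(L: List[List[float]], cpu_cap: List[float]) -> List[float]:
    """Ω*[n]: CPU capacity of machine n not yet assigned to any instance."""
    return [cap - sum(row[n] for row in L) for n, cap in enumerate(cpu_cap)]


def residual_mem(I: List[List[int]], mem_demand: List[float], mem_cap: List[float]) -> List[float]:
    """Γ*[n]: memory of machine n not consumed by running instances."""
    return [cap - sum(mem_demand[m] for m in range(len(I)) if I[m][n])
            for n, cap in enumerate(mem_cap)]


def residual_demand(L: List[List[float]], cpu_demand: List[float]) -> List[float]:
    """ω*[m]: CPU demand of application m not satisfied by L."""
    return [d - sum(L[m]) for m, d in enumerate(cpu_demand)]


def instance_state(m: int, n: int, L: List[List[float]], cpu_cap: List[float],
                   eps: float = 1e-9) -> str:
    """Classify a running instance of application m on machine n."""
    if residual_cpu(L, cpu_cap)[n] <= eps:
        return "fully utilized"  # its machine is fully utilized
    return "completely idle" if L[m][n] == 0 else "underutilized"


def cpu_memory_ratio(n: int, cpu_cap: List[float], mem_cap: List[float]) -> float:
    return cpu_cap[n] / mem_cap[n]      # Ω[n] / Γ[n]


def load_memory_ratio(m: int, n: int, L: List[List[float]], mem_demand: List[float]) -> float:
    return L[m][n] / mem_demand[m]      # L[m][n] / γ[m]
```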
B. Load Shifting
Solving the max-flow problem in
C. Placement Changing
The load_shifting( ) subroutine prepares the load distribution in a way that makes later placement changes easier. The placement_changing( ) subroutine further employs several heuristics to increase the total satisfied application demand, to reduce placement changes, and to reduce computation time.
D. Load-Shifting Subroutine
Given the current application demands, the placement algorithm solves a max-flow problem to derive the maximum total demand that can be satisfied by the current placement matrix I.
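One way to pose this max-flow problem is as a bipartite network: an edge from the source to each application m with capacity ωm, an edge from application m to machine n with unbounded capacity wherever Im,n = 1, and an edge from each machine n to the sink with capacity Ωn. The flow on each application-to-machine edge then gives Lm,n. This construction is our reading of the text, not a reproduction of its figure, and the sketch below uses the textbook Edmonds-Karp (breadth-first augmenting path) method on a dense capacity matrix for brevity; it is not tuned for systems with thousands of machines.

```python
from collections import deque
from typing import List, Tuple


def max_flow_load_distribution(I: List[List[int]], cpu_demand: List[float],
                               cpu_cap: List[float]) -> Tuple[float, List[List[float]]]:
    """Return (w, L): maximum satisfiable demand and one load distribution achieving it."""
    M, N = len(cpu_demand), len(cpu_cap)
    src, sink, size = 0, M + N + 1, M + N + 2  # node 1+m: application m; node 1+M+n: machine n
    unbounded = sum(cpu_demand) + sum(cpu_cap)  # exceeds any feasible flow
    cap = [[0.0] * size for _ in range(size)]
    for m in range(M):
        cap[src][1 + m] = cpu_demand[m]
        for n in range(N):
            if I[m][n]:
                cap[1 + m][1 + M + n] = unbounded
    for n in range(N):
        cap[1 + M + n][sink] = cpu_cap[n]

    flow = [[0.0] * size for _ in range(size)]
    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path in the residual graph.
        parent = [-1] * size
        parent[src] = src
        queue = deque([src])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(size):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            break  # no augmenting path left: the flow is maximum
        bottleneck, v = float("inf"), sink
        while v != src:
            bottleneck = min(bottleneck, cap[parent[v]][v] - flow[parent[v]][v])
            v = parent[v]
        v = sink
        while v != src:
            flow[parent[v]][v] += bottleneck
            flow[v][parent[v]] -= bottleneck  # reverse residual lets later paths reroute flow
            v = parent[v]
        total += bottleneck

    L = [[flow[1 + m][1 + M + n] for n in range(N)] for m in range(M)]
    return total, L
```

If the returned w is smaller than the total demand Σm ωm, the current placement cannot satisfy all demand and placement changes are worth considering.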
When the load distribution problem is formulated as this max-flow problem, the maximum volume of flow going from the source node to the sink node is the maximum total demand w that can be satisfied by the current placement matrix I. Efficient algorithms to solve max-flow problems are well known (see, e.g., R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, editors, “Network Flows: Theory, Algorithms, and Applications,” Prentice Hall, New Jersey, 1993, ISBN 1000499012). If w equals the total application demand, no placement changes are needed. Otherwise, some placement changes are made in order to satisfy more demand. Before doing so, the load distribution matrix L produced by solving the max-flow problem in
The task of load shifting is accomplished by solving the min-cost max-flow problem in
The load distribution matrix L produced by solving the min-cost max-flow problem in
E. Placement-Changing Subroutine
The placement-changing subroutine takes as input the current placement matrix I, the load distribution matrix L generated by the load-shifting subroutine, and the residual application demands not satisfied by L. It tries to increase the total satisfied application demand by making some placement changes, for instance, stopping idle application instances and starting useful ones. Again, note that the “placement changes” in the algorithm description are all hypothetical.
As shown in
In the rest of this subsection, we describe the three nested loops in more detail.
The Outermost Loop. Before entering the outermost loop, the algorithm first computes the residual CPU demand of each application. We refer to the applications with a positive residual CPU demand (i.e., ω*m>0) as residual applications. The algorithm inserts all the residual applications into a right-threaded AVL (Adelson-Velsky and Landis) tree called residual_app_tree. The applications in the tree are sorted in decreasing order of residual demand. As the algorithm progresses, the residual demand of applications may change, and the tree is updated accordingly. The algorithm also keeps track of the minimum memory requirement γmin of the applications in the tree,
γmin = min{γm : m in residual_app_tree}   (12)
where γm is the memory needed to run one instance of application m. The algorithm uses γmin to speed up the computation in the innermost loop. If a machine n's residual memory is smaller than γmin (i.e., Γ*n<γmin), the algorithm can immediately infer that this machine cannot accept any applications in the residual_app_tree.
The algorithm excludes fully utilized machines from the consideration of placement changes, and sorts the underutilized machines in decreasing order of CPU-memory ratio. Starting from the machine with the highest CPU-memory ratio, it enumerates each underutilized machine, and asks the intermediate loop to compute a placement solution for the machine. Because it is harder to fully utilize the CPU of machines with a high CPU-memory ratio, we prefer to process them first when we still have abundant options.
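The machine ordering used by the outermost loop can be sketched as follows, assuming L is stored as a list of rows indexed L[m][n]. The function name and the eps tolerance for "residual CPU is zero" are hypothetical.

```python
from typing import List


def machines_in_processing_order(L: List[List[float]], cpu_cap: List[float],
                                 mem_cap: List[float], eps: float = 1e-9) -> List[int]:
    """Underutilized machines, highest CPU-memory ratio Ω[n]/Γ[n] first."""
    residual = [cap - sum(row[n] for row in L) for n, cap in enumerate(cpu_cap)]
    underutilized = [n for n in range(len(cpu_cap)) if residual[n] > eps]  # skip fully utilized
    return sorted(underutilized, key=lambda n: cpu_cap[n] / mem_cap[n], reverse=True)
```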
The Intermediate Loop. Taking as input the residual_app_tree and a machine n given by the outermost loop, the intermediate loop computes a placement solution for machine n. Suppose machine n currently runs c not-pinned application instances. (Application instance pinning is described below.) We can stop any subset of the c instances and use the freed resources to run other applications. In total, there are 2^c cases to consider. We use a heuristic to reduce this number to c+1. Intuitively, we prefer to stop the less “useful” application instances, i.e., those with a low load-memory ratio (Lm,n/γm).
The algorithm first sorts the not-pinned application instances on machine n in increasing order of load-memory ratio. Let (M1, M2, . . . , Mc) denote this sorted list. The intermediate loop iterates over a variable j (0≦j≦c). In iteration j, it stops the j instances (M1, M2, . . . , Mj) on machine n while keeping the other running instances intact, and then asks the innermost loop to find appropriate applications to consume the residual resources of machine n that become available after stopping those j instances. As the intermediate loop varies the number of stopped instances from 0 to c, it collects c+1 placement solutions and picks as the final solution the one that leads to the highest CPU utilization of machine n.
We illustrate this through an example. Suppose machine n currently runs three not-pinned application instances (M1, M2, M3) sorted in increasing order of load-memory ratio. Intuitively, M3 is more useful than M2, and M2 is more useful than M1. The algorithm tries four placement solutions. In solution 1, it stops none of M1, M2, and M3. In solution 2, it stops M1 but keeps M2 and M3. In solution 3, it stops M1 and M2, but keeps M3. In solution 4, it stops M1, M2, and M3. For each solution, the innermost loop finds appropriate applications to consume machine n's residual resources that become available after stopping the applications. Among the four solutions, the algorithm picks the best one as the final solution.
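A sketch of the intermediate loop for one machine n follows, with the innermost loop passed in as a callable so the two levels stay separate. The callable interface, the dictionaries keyed by application, and the rule of keeping the plan with fewer stops when utilizations tie are our assumptions; the text says only that the loop keeps the solution with the highest CPU utilization of machine n.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def intermediate_loop(not_pinned: Sequence[int], load: Dict[int, float],
                      mem_need: Dict[int, float],
                      innermost: Callable[[List[int]], Tuple[float, object]]
                      ) -> Tuple[float, int, object]:
    """Try stopping the j least useful instances for j = 0..c; keep the best CPU utilization.

    load[m] is L[m][n] and mem_need[m] is γ[m] for the machine n being processed.
    innermost(stopped) returns (cpu_utilization, plan) after refilling machine n.
    """
    order = sorted(not_pinned, key=lambda m: load[m] / mem_need[m])  # increasing load-memory ratio
    best = None
    for j in range(len(order) + 1):         # c + 1 candidates instead of 2**c subsets
        utilization, plan = innermost(order[:j])
        if best is None or utilization > best[0]:
            best = (utilization, j, plan)   # strict '>' keeps the fewer-stops plan on ties
    return best
```

Keeping the first plan on ties favors fewer stopped instances, which fits the goal of minimizing placement changes.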
The Innermost Loop. The intermediate loop changes the number of applications to stop. The innermost loop uses machine n's residual resources to run some residual applications. Recall that the residual_app_tree is sorted in decreasing order of residual CPU demand. The innermost loop iterates over the residual applications, starting from the one with the largest residual demand. When an application m is under consideration, it checks two conditions: (1) if the restriction matrix R allows application m to run on machine n, and (2) if machine n has sufficient residual memory to host application m (i.e., γm≦Γ*n). If both conditions are satisfied, it places application m on machine n, and assigns as much load as possible to this instance until either machine n's CPU is fully utilized or application m has no residual demand. After this allocation, application m's residual demand changes, and the residual_app_tree is updated accordingly.
The algorithm loops over the residual applications until either: (1) all the residual applications have been considered once; or (2) machine n's CPU becomes fully utilized; or (3) machine n's residual memory is insufficient to host any residual application (i.e., Γ*n<γmin, see Equation 12). Typically, after hosting a few residual applications, machine n's residual memory quickly becomes too small to host more residual applications. Therefore, the third condition helps reduce computation time.
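The innermost loop can be sketched as below. A plain sort stands in for the right-threaded AVL tree, and γmin is recomputed over the applications that still have residual demand instead of being maintained incrementally; both are simplifications for brevity. Function and parameter names are hypothetical, allowed plays the role of the restriction matrix R, and the residual demands are copied so the caller's values are left unchanged.

```python
from typing import Dict, List, Sequence, Tuple


def innermost_loop(n: int, residual_demand: Dict[int, float], mem_need: Dict[int, float],
                   allowed: Sequence[Sequence[int]], free_cpu: float, free_mem: float):
    """Fill machine n with residual applications; return placements and leftovers."""
    demand = {m: d for m, d in residual_demand.items() if d > 0}  # residual applications
    placed: List[Tuple[int, float]] = []
    # Visit each residual application once, largest residual demand first.
    for m in sorted(demand, key=lambda a: demand[a], reverse=True):
        gamma_min = min((mem_need[a] for a in demand if demand[a] > 0), default=float("inf"))
        if free_cpu <= 0 or free_mem < gamma_min:
            break     # CPU fully utilized, or no residual application fits in memory
        if not allowed[m][n] or mem_need[m] > free_mem:
            continue  # restriction or memory check fails for this application
        load = min(free_cpu, demand[m])  # assign as much load as possible
        placed.append((m, load))
        free_cpu -= load
        free_mem -= mem_need[m]
        demand[m] -= load
    return placed, demand, free_cpu, free_mem
```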
F. Full Placement Algorithm
While the placement algorithm is outlined in
The placement algorithm incrementally optimizes the placement solution in multiple rounds. In one round, it first invokes the load-shifting subroutine and then invokes the placement-changing subroutine. It repeats for up to K rounds, but quits early if it sees no improvement in the total satisfied application demand after a round. The last step of the algorithm balances the load across machines. By way of example only, we use the load-balancing component from an existing algorithm (A. Karve, T. Kimbrel, G. Pacifici, M. Spreitzer, M. Steinder, M. Sviridenko, and A. Tantawi, “Dynamic Application Placement for Clustered Web Applications,” In the International World Wide Web Conference (WWW), May 2006). However, other existing load balancing techniques can be employed. Intuitively, when the algorithm has choices, it moves the new application instances (started by the placement-changing subroutine) among machines to balance the load, while keeping the total satisfied demand and the number of placement changes the same.
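The round structure can be sketched generically, with each subroutine passed in as a callable over an opaque state (for example, the pair of matrices I and L). The function signature is hypothetical, and discarding the last round's result when it brings no improvement is our assumption; the text says only that the algorithm quits early in that case.

```python
from typing import Callable, TypeVar

S = TypeVar("S")


def place(state: S, load_shifting: Callable[[S], S], placement_changing: Callable[[S], S],
          load_balancing: Callable[[S], S], satisfied_demand: Callable[[S], float],
          max_rounds: int) -> S:
    """Run up to max_rounds (K) rounds of load shifting followed by placement changing."""
    best = satisfied_demand(state)
    for _ in range(max_rounds):
        candidate = placement_changing(load_shifting(state))
        value = satisfied_demand(candidate)
        if value <= best:
            break                   # no improvement in this round: stop early
        state, best = candidate, value
    return load_balancing(state)    # final step: balance load across machines
```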
The placement algorithm deals with multiple optimization objectives. In addition to maximizing the total satisfied demand, it also strives to minimize placement changes, because they disturb the running system and waste CPU cycles. In practice, many J2EE applications take a few minutes to start or stop, and take some additional time to warm up their data cache. The heuristic for reducing unnecessary placement changes is not to stop application instances whose load (in the load distribution matrix L) is above a certain threshold. We refer to them as pinned instances. The intuition is that, even if we stop these instances on their hosting machines, it is likely that we will start instances of the same applications on other machines.
Each application m has its own pinning threshold wm pin. If the value of the threshold is too low, the algorithm may introduce many unnecessary placement changes. If it is too high, the total satisfied demand may be low due to insufficient placement changes. The algorithm computes the pinning thresholds for all the applications from the information gathered in a single dry-run invocation of the placement-changing subroutine. The dry run pins no application instances. After the dry run, the algorithm makes a second invocation of the placement-changing subroutine, and requires pinning the application instances whose load is higher than or equal to the pinning threshold of the corresponding application, i.e., Lm,n≧wm pin. The dry run and the second invocation use exactly the same inputs: the matrices I and L produced by the load-shifting subroutine. Between the two placement solutions produced by the dry run and the second invocation, the algorithm picks as the final solution the one that has a higher total satisfied demand. If the total satisfied demands are equal (e.g., both solutions satisfy all the demands), it picks the one that has fewer placement changes.
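The pinning rule and the final choice between the two candidate solutions can be sketched as follows. Representing a candidate as a (total satisfied demand, number of placement changes, placement) tuple is our assumption, as are the function names.

```python
from typing import List, Set, Tuple

Candidate = Tuple[float, int, object]  # (total satisfied demand, placement changes, placement)


def pinned_instances(L: List[List[float]], pin_threshold: List[float]) -> Set[Tuple[int, int]]:
    """Instances (m, n) the second invocation may not stop: L[m][n] >= w_pin[m]."""
    return {(m, n) for m, row in enumerate(L) for n, load in enumerate(row)
            if load > 0 and load >= pin_threshold[m]}  # idle instances are never pinned


def pick_final(dry_run: Candidate, pinned_run: Candidate) -> Candidate:
    """Higher satisfied demand wins; on a tie, fewer placement changes wins."""
    return min((dry_run, pinned_run), key=lambda c: (-c[0], c[1]))
```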
Next, we describe how to compute the pinning threshold wm pin for each application m from the information gathered in the dry run. Intuitively, if the dry run starts a new application instance, then we should not stop any instance of the same application whose load is higher than or equal to that of the new instance. This is because the dry run considers the new instance's load high enough that it is worthwhile to start a new instance. Let wm new denote the minimum load assigned to a new instance of application m in the dry run:
wm new = min{Lm,n : Im,n is a new instance started in the dry run}
Here Im,n represents a new instance of application m started on machine n in the dry run, and Lm,n is the load of this instance. In addition, the pinning threshold also depends on the largest residual demand w*max not satisfied in the dry run:
w*max = max{w*m : over all applications m}
Here w*m is the residual demand of application m after the dry run. We should not stop application instances whose load is higher than or equal to w*max: if we stopped them, they would immediately become applications that we try to find a place to run. The pinning threshold for application m is computed as follows:
wm pin = max(1, min(wm new, w*max))   (15)
Because we do not want to pin completely idle application instances, Equation 15 stipulates that the pinning threshold wm pin should be at least one CPU cycle per second.
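Putting these definitions together, the thresholds can be computed in one pass over the dry run's output. Treating wm new as infinite when the dry run starts no new instance of application m (the minimum over an empty set) is our assumption, so that only w*max limits the threshold in that case. Function and parameter names are hypothetical; loads are in CPU cycles per second, matching the one-cycle-per-second floor in Equation 15.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def pinning_thresholds(num_apps: int, new_instances: Iterable[Tuple[int, int]],
                       L_dry: Sequence[Sequence[float]],
                       residual_after_dry_run: Sequence[float]) -> List[float]:
    """w_pin[m] = max(1, min(w_new[m], w*_max)) from the dry run's results."""
    w_new = [math.inf] * num_apps        # no new instance of m: only w*_max applies
    for m, n in new_instances:           # (m, n): instance started by the dry run
        w_new[m] = min(w_new[m], L_dry[m][n])
    w_star_max = max(residual_after_dry_run, default=0.0)  # largest unsatisfied demand
    return [max(1.0, min(w_new[m], w_star_max)) for m in range(num_apps)]
```

When the dry run satisfies all demand (w*max = 0), every threshold falls to the floor of 1, so any instance carrying load is pinned and only idle instances remain candidates for stopping.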
It is to be appreciated that most of the computation time of the placement algorithm is spent on solving the max-flow problem and the min-cost max-flow problem in
Thus, the computing system shown in
As shown, computing system 1000 may be implemented in accordance with a processor 1002, a memory 1004, I/O devices 1006, and a network interface 1008, coupled via a computer bus 1010 or alternate connection arrangement.
It is to be appreciated that the term “processor” as used herein is intended to include any processing device, such as, for example, one that includes a CPU (central processing unit) and/or other processing circuitry. It is also to be understood that the term “processor” may refer to more than one processing device and that various elements associated with a processing device may be shared by other processing devices.
The term “memory” as used herein is intended to include memory associated with a processor or CPU, such as, for example, RAM, ROM, a fixed memory device (e.g., hard drive), a removable memory device (e.g., diskette), flash memory, etc.
In addition, the phrase “input/output devices” or “I/O devices” as used herein is intended to include, for example, one or more input devices (e.g., keyboard, mouse, etc.) for entering data to the processing unit, and/or one or more output devices (e.g., speaker, display, etc.) for presenting results associated with the processing unit. The graphical user interface of
Still further, the phrase “network interface” as used herein is intended to include, for example, one or more transceivers to permit the computing system of
Accordingly, software components including instructions or code for performing the methodologies described herein may be stored in one or more of the associated memory devices (e.g., ROM, fixed or removable memory) and, when ready to be utilized, loaded in part or in whole (e.g., into RAM) and executed by a CPU.
Accordingly, illustrative principles of the invention provide many advantages over existing approaches, for example:
Although illustrative embodiments of the present invention have been described herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various other changes and modifications may be made by one skilled in the art without departing from the scope or spirit of the invention.