US 20050010388 A1 Abstract An improved method and system for performing dynamic online multi-parameter optimization for autonomic computing systems are provided. With the method and system of the present invention, a simplex, i.e., a set of points in the parameter space that has been directly sampled, is maintained. The system's performance with regard to a particular utility value is measured for the particular setting of configuration parameters associated with each point in the simplex. A new sample point is determined using the geometric transformations of the simplex. The method and system provide mechanisms for limiting the size of the simplex that is generated through these geometric transformations so that the present invention may be implemented in noisy environments in which the same configuration settings may lead to different results with regard to the utility value. In addition, mechanisms are provided for resampling a current best point in the simplex to determine if the environment has changed. If a sufficiently different utility value is obtained from a previously sampled utility value for the point in the simplex, then rather than contracting, the simplex is expanded. If the difference between utility values is not sufficiently large, then contraction of the simplex is performed. In addition, in order to allow for both real and integer valued parameters in the simplex, a mechanism is provided by which invalid valued parameters that are generated by geometric transformations being performed on the simplex are mapped to a nearest valid value. Similarly, parameter values that violate constraints are mapped to values that satisfy constraints, taking care that the dimensionality of the simplex is not reduced.
Claims (35) 1. A method, in a data processing system, for determining configuration parameter value settings for a computing device to optimize an operational characteristic of the computing device, comprising:
obtaining a simplex of points, wherein each point in the simplex represents a set of configuration parameters for the computing device; performing a geometric transformation on the simplex of points to identify a new point to investigate; sampling the operational characteristic at the new point; determining if the operational characteristic associated with the new point is worse than a value of the operational characteristic for each point in the simplex of points; determining a set of points in the simplex that need to be resampled if the new point is worse than a value of the operational characteristic for each point in the simplex of points; resampling the operational characteristic at each of the points in the set of points; and determining a new simplex based on the resampled operational characteristic of points in the set of points. 2. The method of 3. The method of extending the simplex in a direction of the new point if the operational characteristic of the new point is better than values of the operational characteristic for each point in the simplex of points. 4. The method of assigning an upper threshold and a lower threshold to a size of the simplex; and limiting expansion or contraction of the simplex based on the assigned upper and lower thresholds. 5. The method of 6. The method of 7. The method of comparing a resampled operational characteristic value for the best point to a previous operational characteristic value for the best point; and determining whether to expand or contract the simplex based on a difference between the resampled operational characteristic value and the previous operational characteristic value. 8. The method of 9. The method of 10. The method of 11. The method of checking the dimensionality of the modified simplex obtained by expanding or contracting the simplex; and not performing the expansion or contraction if the modified simplex would have a different dimensionality from the simplex. 12. 
The method of converting configuration parameter values of the new point to one of integer and real values based on a value type for the configuration parameters; checking the converted configuration parameter values to determine if a dimensionality of the simplex is changed by the conversion of the configuration parameters; and setting the converted configuration parameter values of the new point that result in a change in the dimensionality of the simplex to converted configuration parameter values that do not reduce the dimensionality of the simplex. 13. The method of setting the converted configuration parameter values to converted configuration parameter values that equal the converted configuration parameter values minus a penalty value. 14. The method of 15. The method of using configuration parameter values of the best point in the simplex to configure the computing device if no improvement of the simplex is obtainable. 16. A computer program product in a computer readable medium for determining configuration parameter value settings for a computing device to optimize an operational characteristic of the computing device, comprising:
first instructions for obtaining a simplex of points, wherein each point in the simplex represents a set of configuration parameters for the computing device; second instructions for performing a geometric transformation on the simplex of points to identify a new point to investigate; third instructions for sampling the operational characteristic at the new point; fourth instructions for determining if the operational characteristic associated with the new point is worse than a value of the operational characteristic for each point in the simplex of points; fifth instructions for determining a set of points in the simplex that need to be resampled if the new point is worse than a value of the operational characteristic for each point in the simplex of points; sixth instructions for resampling the operational characteristic at each of the points in the set of points; and seventh instructions for determining a new simplex based on the resampled operational characteristic of points in the set of points. 17. The computer program product of 18. The computer program product of eighth instructions for extending the simplex in a direction of the new point if the operational characteristic of the new point is better than values of the operational characteristic for each point in the simplex of points. 19. The computer program product of eighth instructions for assigning an upper threshold and a lower threshold to a size of the simplex; and ninth instructions for limiting expansion or contraction of the simplex based on the assigned upper and lower thresholds. 20. The computer program product of 21. The computer program product of 22. 
The computer program product of instructions for comparing a resampled operational characteristic value for the best point to a previous operational characteristic value for the best point; and instructions for determining whether to expand or contract the simplex based on a difference between the resampled operational characteristic value and the previous operational characteristic value. 23. The computer program product of 24. The computer program product of 25. The computer program product of 26. The computer program product of eighth instructions for checking the dimensionality of the modified simplex obtained by expanding or contracting the simplex; and ninth instructions for not performing the expansion or contraction if the modified simplex would have a different dimensionality from the simplex. 27. The computer program product of eighth instructions for converting configuration parameter values of the new point to one of integer and real values based on a value type for the configuration parameters; ninth instructions for checking the converted configuration parameter values to determine if a dimensionality of the simplex is changed by the conversion of the configuration parameters; and tenth instructions for setting the converted configuration parameter values of the new point that result in a change in the dimensionality of the simplex to converted configuration parameter values that do not reduce the dimensionality of the simplex. 28. The computer program product of instructions for setting the converted configuration parameter values to converted configuration parameter values that equal the converted configuration parameter values minus a penalty value. 29. The computer program product of 30. The computer program product of eighth instructions for using configuration parameter values of the best point in the simplex to configure the computing device if no improvement of the simplex is obtainable. 31. 
An apparatus for determining configuration parameter value settings for a computing device to optimize an operational characteristic of the computing device, comprising:
means for obtaining a simplex of points, wherein each point in the simplex represents a set of configuration parameters for the computing device; means for performing a geometric transformation on the simplex of points to identify a new point to investigate; means for sampling the operational characteristic at the new point; means for determining if the operational characteristic associated with the new point is worse than a value of the operational characteristic for each point in the simplex of points; means for determining a set of points in the simplex that need to be resampled if the new point is worse than a value of the operational characteristic for each point in the simplex of points; means for resampling the operational characteristic at each of the points in the set of points; and means for determining a new simplex based on the resampled operational characteristic of points in the set of points. 32. A method of configuring a computing device by optimizing configuration parameter value settings, comprising:
obtaining a simplex of points, wherein each point in the simplex represents a set of configuration parameters for the computing device, and wherein each point has a corresponding operational characteristic value associated with the point; performing one or more geometric transformations on the simplex based on the operational characteristic values associated with the points of the simplex to identify new points to investigate; measuring a value of the operational characteristic based on a set of configuration parameters associated with the new points; and configuring the computing device based on values of a set of configuration parameters associated with a best point in a resulting simplex, wherein performing the one or more geometric transformations includes checking the new points obtained from performing the one or more geometric transformations to determine if one or more conditions are violated and wherein the conditions are set so as to compensate for dynamic and noisy operating environments of the computing device. 33. The method of applying an upper and lower limit on a size of the simplex; comparing new point configuration parameter values against the upper and lower limit; and adjusting the new point configuration parameter values based on the comparison. 34. The method of permitting both real and integer valued configuration parameter values; determining if new point configuration parameter values result in a reduction in dimensionality of the simplex; and adjusting the new point configuration parameter values based on the determination. 35. 
The method of resampling the operational characteristic at a best point in the simplex if a geometric transformation does not result in a new point whose configuration parameter values result in a better operational characteristic value; comparing a resampled operational characteristic value of the best point to an original operational characteristic value of the best point; and determining whether to expand or contract the simplex based on the comparison. Description This application is related to, and claims the benefit of priority to, U.S. Provisional Patent Application 60/486,306 filed on Jul. 11, 2003, which is hereby incorporated by reference. 1. Technical Field The present invention is directed to an improved computing system. More specifically, the present invention is directed to an improved method and system for dynamically determining configuration values for improved performance in an autonomic computing system based on geometrical simplex transformations in the underlying multi-dimensional parameter space. 2. Description of Related Art The success of service-oriented Information Technology, such as Autonomic Computing, On-demand eBusiness and eCommerce, depends critically on the ability to provide information, goods, and services in a fast, efficient and cost-effective fashion. Unfortunately, the increasing complexity of the computing systems necessary to provide these services is rapidly outstripping human ability for system operation. This is especially true when it comes to optimization of system parameters for these complex computing systems. The fundamental difficulties in real-time optimization of system parameters in large complex systems arise from a number of sources. In many situations, a good model of the system and the way the system interacts with the world is not available (or may be too expensive to obtain). 
The lack of such a system model prohibits the use of sophisticated analytical and simulation tools for online (i.e., real-time) or offline optimization of the system parameters. The problem is further compounded by the fact that there may be multiple parameters that have to be optimized simultaneously to improve system performance. Since a model of the system is not accessible, there is little understanding of the relative importance of the different system parameters (in terms of how each parameter affects the system's performance) and of the potential nonlinear interactions between the different parameters (in terms of their combined effect on the system's performance). In situations where a model of the system is not at hand, one widely adopted technique is to sparsely sample the multidimensional parameter space (say, in a regular grid-like manner) and adopt the parameter setting that provides the best performance among the sampled points. Unfortunately, due to the curse of dimensionality, the number of necessary samples increases exponentially with the number of parameters to be optimized. Thus, even for a small set of parameters, the cost and time needed for a reasonable sampling of the multidimensional parameter space may be prohibitive. Moreover, for these reasons, such sampling and optimization cannot be performed dynamically in real time. In addition, a system's behavior may be stochastic in nature and/or it may operate in a noisy and dynamic environment, such that similar system configuration parameters may result in very different overall performance measures or utility values. Thus, the ability to use historical data to infer a system model is seriously jeopardized, especially in a dynamic environment where the demand or load placed on the system is changing continuously over time.
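To make the exponential growth of grid sampling concrete, the short sketch below counts the points in a full regular grid. The choice of 10 candidate values per parameter is a hypothetical example, not a figure from this document.

```python
def grid_samples(values_per_parameter: int, num_parameters: int) -> int:
    """Points in a full regular grid: k candidate values for each of n parameters gives k**n."""
    return values_per_parameter ** num_parameters


# Hypothetical example: 10 candidate values per parameter.
for n in (2, 4, 8):
    print(f"{n} parameters -> {grid_samples(10, n):,} samples")
```

This prints 100, 10,000 and 100,000,000 samples respectively: each additional parameter multiplies the sampling cost by k, which is why exhaustive sampling is impractical for online tuning.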
In spite of all the above difficulties, it is the administrator's job to (re)configure the system parameters and improve the system's performance (as measured by a given metric) while the system is in operation. This calls for new methods and apparatus for dynamic, online, multi-parameter optimization that can automatically and quickly configure and tune system parameters without human intervention. The focus of such methods is not necessarily on determining the provably optimal parameter settings, but on finding reasonably good solutions reasonably quickly. Such methods are likely to play a fundamental role in Autonomic Computing, On-demand eBusiness and eCommerce systems, where there is a significant benefit in providing superior performance in unpredictable complex environments. Known mechanisms used to perform off-line multi-parameter optimization include the Direct Search methods and their variants (e.g., the simplex algorithm and pattern search). This class of methods is popular because (i) the methods tend to work well in practice, (ii) they can often avoid pitfalls that can afflict more elaborate methods, and (iii) they are simple and straightforward to implement, so they can be applied almost immediately to many nonlinear optimization problems. These methods do not need to explicitly calculate derivative or gradient information in the parameter space. Typically, these methods maintain a set of points (called the simplex) that is obtained by directly sampling the parameter space. In addition, these methods use a variety of techniques for steep descent (but not necessarily methods of steepest descent) to arrive at near-optimal solutions. Unfortunately, a direct application of the Direct Search method (and its variants) to automatically configure and optimize system parameters in Autonomic Computing systems is likely to fail for a number of reasons.
First, Direct Search methods (and their variants) do not work in dynamic environments, where the demand or the load on the system is changing continuously over time, and where the same parameter settings can provide different performance measures at different times. Direct Search methods were designed for static problems and have no built-in mechanism to handle dynamic environments. Second, Direct Search methods work only for deterministic problems where there is no noise either in measurements of the system's performance or in the system's dynamics. Direct Search methods make the fundamental assumption that the same parameter setting is always going to provide the same performance measure. In noisy or stochastic environments, where such an assumption is not valid, Direct Search methods can fail dramatically to find good solution regions quickly. Third, Direct Search methods make certain assumptions about the nature of the parameters being optimized. Typically, Direct Search methods (and their variants) are designed to handle problems with either all real-valued parameters or all integer-valued parameters. In most systems, parameters come in both flavors, and it is necessary to configure and tune both types of parameters simultaneously. In such scenarios, existing Direct Search methods and their variants can fail spectacularly, since they do not take the differences in the underlying granularity of the parameter space into account. Fourth, Direct Search methods and their variants cannot handle relational constraints between the parameters being optimized. In many problems of system configuration and optimization, there exist constraints that involve one or more parameters. For example, a set of constraints could indicate that:
$0 \le x_2 \le 1.0$ (constraint #3)
$0 \le x_3 \le 1.0$ (constraint #4)
where $x_1$, $x_2$ and $x_3$ are the system configuration parameters. Direct Search methods and their variants were designed for unconstrained problems and are highly inefficient at finding good parameter settings in constrained optimization problems. Thus, they have not been employed in online constrained optimization problems.
Finally, Direct Search methods and their variants suffer from a number of pathological failure modes that prevent their direct application in many types of optimization problems. For example, in problems with real-valued parameters, the size of the simplex can become infinitesimally small, limiting the Direct Search method's ability to track changes in the optimal parameter settings in dynamic environments. On the other hand, in problems with discrete or integer values, the simplex can easily get stuck in a rut where the Direct Search method is unable to decide on a new point to sample. This pathological failure mode limits the Direct Search method's ability to explore promising regions in parameter space. Therefore, it would be beneficial to have an improved system and method for performing dynamic online multi-parameter optimization for autonomic computing systems that does not suffer from the drawbacks of the Direct Search methods discussed above. SUMMARY OF THE INVENTION The present invention provides an improved method and system for performing dynamic online multi-parameter optimization for autonomic computing systems. With the method and system of the present invention, a simplex, i.e., a set of points in the parameter space that has been directly sampled, is maintained. For each point in the simplex, the system's performance with regard to a particular utility value, i.e., an operational characteristic, is measured for the particular setting of configuration parameters associated with that point. Using the mechanisms of the present invention, a new sample point is determined that will hopefully provide improved system performance with regard to the utility value. The new point is determined by applying geometric transformations to the points in the current simplex. These geometric transformations may include reflections, extensions, contractions, expansions and translations.
The present invention provides mechanisms for limiting the size of the simplex that is generated through these geometric transformations so that the present invention may be implemented in noisy environments in which the same configuration settings may lead to different results with regard to the utility value. In addition, the present invention includes a mechanism for resampling a current best point in the simplex to determine if the environment has changed. If a sufficiently different utility value is obtained from a previously sampled utility value for the point in the simplex, then rather than contracting, the simplex is expanded. If the difference between utility values is not sufficiently large, then contraction of the simplex is performed. In addition, in order to allow for both real and integer valued parameters in the simplex, the present invention provides a mechanism by which invalid valued parameters that are generated by geometric transformations being performed on the simplex are mapped to a nearest valid value. However, this mapping may reduce the dimensionality of the simplex. Thus, to avoid such a reduction, the present invention provides a mechanism for checking whether the dimensionality of the simplex would be changed by the execution of a particular geometric transformation prior to applying the geometric transformation. If a new point generated by the geometric transformation would result in a reduction in the dimensionality of the simplex, the current point that is the basis for the geometric transformation is perturbed by a small amount and the dimensionality check is performed again. Moreover, in order to handle constrained optimization problems, the present invention translates new points generated by geometric transformations that violate one or more constraints to the boundaries of the feasible region where all constraints are satisfied.
The mechanism of the present invention uses a gradient that is based on a penalty value that is proportional to the distance between an infeasible point and its corresponding feasible setting. This gradient is used to move away from the infeasible region to a feasible boundary point. These and other features and advantages of the present invention will be described in, or will become apparent to those of ordinary skill in the art in view of, the following detailed description of the preferred embodiments. The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein: The present invention provides a mechanism for determining optimum configuration parameters for autonomic computing systems, on-demand eBusiness and eCommerce systems, and the like. As such, the present invention is especially well suited for determining configuration parameters of server computing systems in distributed data processing environments. 
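The integer mapping, dimensionality check and constraint handling summarized above can be sketched together as follows. This is a minimal illustration under several assumptions not stated in the text: constraints are simple box bounds (the only form visible in the constraint example), bounds of integer-valued parameters are themselves integers, the penalty is a weight times the Euclidean distance to the clipped point, and the "small perturbation" is a deterministic nudge of one coordinate of the candidate point at a time. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = List[float]


def matrix_rank(rows: Sequence[Sequence[float]], tol: float = 1e-9) -> int:
    """Rank of a small dense matrix via Gaussian elimination with partial pivoting."""
    m = [list(map(float, r)) for r in rows]
    if not m:
        return 0
    ncols, rank = len(m[0]), 0
    for col in range(ncols):
        if rank == len(m):
            break
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # this column adds no new direction
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            for c in range(col, ncols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def simplex_dimension(simplex: Sequence[Point]) -> int:
    """Number of independent directions spanned by the edges P_i - P_0."""
    base = simplex[0]
    return matrix_rank([[a - b for a, b in zip(p, base)] for p in simplex[1:]])


def project_to_box(point: Point, lower: Point, upper: Point) -> Tuple[Point, float]:
    """Clip into lower <= x <= upper and return the distance moved (the violation)."""
    clipped = [min(max(x, lo), hi) for x, lo, hi in zip(point, lower, upper)]
    return clipped, math.dist(point, clipped)


def map_to_valid(point: Point, is_integer: Sequence[bool]) -> Point:
    """Round integer-valued parameters to the nearest integer (ties round to even)."""
    return [float(round(x)) if integer else x for x, integer in zip(point, is_integer)]


def admit_point(simplex, replace_idx, candidate, is_integer, lower, upper, step,
                penalty_weight=1.0):
    """Return (valid feasible point, penalty) that keeps the simplex full-dimensional."""
    n = len(candidate)
    # Try the candidate first, then nudge one coordinate at a time by +/- step[i].
    trials = [list(candidate)] + [
        [x + (sign * step[i] if j == i else 0.0) for j, x in enumerate(candidate)]
        for i in range(n) for sign in (1.0, -1.0)
    ]
    for trial in trials:
        clipped, violation = project_to_box(trial, lower, upper)
        valid = map_to_valid(clipped, is_integer)
        new_simplex = list(simplex)
        new_simplex[replace_idx] = valid
        if simplex_dimension(new_simplex) == n:
            # Penalty proportional to the distance to the feasible boundary point.
            return valid, penalty_weight * violation
    raise ValueError("no non-degenerate replacement found; try a larger step")


# x1 is real-valued, x2 is integer-valued; both are constrained to [0, 5].
tri = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
print(admit_point(tri, 2, [1.0, 0.2], [False, True], [0.0, 0.0], [5.0, 5.0], [1.0, 1.0]))
# ([1.0, 1.0], 0.0)
```

In the usage example, rounding $x_2 = 0.2$ to 0 would make the new vertex collinear with the other two, collapsing the triangle to a line; the sketch rejects that point and nudges the candidate until the simplex stays two-dimensional. A point outside the box is clipped to the boundary and carries a penalty that could be subtracted from its measured utility when ranking vertices.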
In the prior art, an administrator of a server may, among other approaches, make use of a static off-line analysis, such as Direct Search methods or their variants, in an attempt to achieve an optimum configuration for the server. For example, a vertex point may be established for a combination of a particular set of configuration parameters of a server, autonomic system, eBusiness or eCommerce system, or the like. These parameters may include, for example, with regard to the logging subsystem of the Gryphon system discussed hereafter, growth threshold, reclaimed space, suspend threshold, and the ratio of chunk size to message size; with regard to the Apache server discussed hereafter, max-client and keep-alive; and the like. At each vertex point, a utility value of interest is measured, or a function of the parameters is evaluated, in order to ascertain the utility value obtained by configuring the system using the corresponding parameter values at that vertex point. This utility value is the performance value that is to be optimized. For example, the utility value for a particular setting of the parameters may be a weighted linear function of the measured response time, latency, cleaning overhead, and variation of log space usage in the logging subsystem of the Gryphon system discussed hereafter, and the like. Thereafter, the geometric transformations of reflection and extension are performed to transform the simplex based on continued identification of vertex points that result in better utility values.
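A weighted linear utility of the kind described above can be sketched as follows. The metric names and weights are hypothetical placeholders; the sign convention (negative weights for metrics where lower is better, so that higher utility always means better performance) is an assumption of this sketch.

```python
from typing import Mapping


def weighted_utility(measurements: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted linear utility U = sum_k w_k * m_k over the weighted metrics.

    Metrics where lower is better (response time, latency, ...) get negative weights,
    so that a higher utility always means better performance.
    """
    missing = set(weights) - set(measurements)
    if missing:
        raise KeyError(f"missing measurements: {sorted(missing)}")
    return sum(w * measurements[name] for name, w in weights.items())


# Hypothetical metric names and weights for the logging-subsystem example.
weights = {
    "response_time_ms": -1.0,
    "latency_ms": -0.5,
    "cleaning_overhead_pct": -2.0,
    "log_space_variation": -0.1,
}
sample = {
    "response_time_ms": 120.0,
    "latency_ms": 40.0,
    "cleaning_overhead_pct": 3.0,
    "log_space_variation": 50.0,
}
print(weighted_utility(sample, weights))
```

Each vertex of the simplex would be scored by configuring the system with that vertex's parameters, collecting such measurements, and evaluating the function.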
Once no better utility value is obtainable via reflection and extension, contraction and shrinking may be performed to identify a vertex point that provides the optimum parameter settings for the system. In each iteration, let $P_h$ denote the vertex of the simplex with the worst utility value, $P_l$ the vertex with the best utility value, and $\bar{P}$ the centroid of the vertices other than $P_h$. The worst vertex is first reflected through the centroid:

$$P^* = (1 + a)\,\bar{P} - a\,P_h$$

where a is a positive constant called the reflection coefficient. The utility value at this new point $P^*$ is then measured. If $P^*$ is better than $P_l$, the simplex is extended in the same direction:

$$P^{**} = c\,P^* + (1 - c)\,\bar{P}$$

where c is the extension coefficient and is greater than 1. If the measured utility value at this new point $P^{**}$ is better than that of $P_l$, $P_h$ is replaced by $P^{**}$; otherwise, $P_h$ is replaced by $P^*$. If the reflected point is still worse than every point in the simplex, i.e., $P^*$ is worse than every $P_i$ with $i \neq h$, the simplex is contracted:

$$P^{**} = b\,P_{-} + (1 - b)\,\bar{P}$$

where $P_{-}$ is either $P_h$ or $P^*$, whichever has the better utility value, and b, with $0 < b < 1$, is the contraction coefficient. If $F(P^{**})$, the utility value at the contracted point, is worse than $F(P_{-})$, every vertex is shrunk toward the best point, $P_i \leftarrow (P_i + P_l)/2$; otherwise, $P_h$ is replaced by $P^{**}$. The algorithm then continues with the next iteration. These steps of the Direct Search algorithm are illustrated in the accompanying drawings. As discussed previously, while the Direct Search methodology works well for static off-line problems, it tends to fail when applied to dynamic on-line environments. The present invention solves the problems associated with the application of Direct Search methods to dynamic on-line environments by providing improvements to the Direct Search methodology that compensate for the dynamic and noisy nature of the on-line environment. The present invention modifies and extends Direct Search methods to overcome all of the limitations of known Direct Search methods and employs a new dynamic, online, multi-parameter optimization method for the self-configuration and self-tuning of Autonomic Computing systems. As in Direct Search methods, the basic objective is to sample new points in the hope of replacing the worst point in the simplex (i.e., the configuration setting with the worst utility value or performance measure) with a new point that has higher utility than the best point in the simplex. The position of the new point to be sampled is determined by applying geometric transformations to the points in the current simplex. For dynamic, online, multi-parameter optimization to work, it is necessary to determine when and when not to apply the various geometric transformations.
The present invention provides mechanisms for determining when to apply such geometric transformations. For example, if a reflection on the simplex provides a new point that returns a utility value higher than that of any point in the current simplex, then the next transformation (called extension) extends the simplex further in the direction of the new point in the hope of finding a new point that has even higher utility. Typically, in Direct Search methods, when all other transformations of the simplex have been exhausted and none has produced a point with higher utility or a better performance measure than the current best, the size of the simplex is reduced by contraction. The motivation here is that since the exploratory transformations outside the simplex failed to improve upon the current best solution, it is time to look inside the simplex to search for better solutions. This usually works fine in deterministic or static problems where any given point in the multi-dimensional parameter space returns one and only one utility value. Unfortunately, in noisy or dynamic environments, this type of contraction severely inhibits the Direct Search method's ability to continue the search for better solutions, as the method goes into a tailspin and contracts the simplex over and over again. In noisy environments, the simplex may contract to a point that is nowhere close to the optimal parameter settings. In dynamic environments, on the other hand, the simplex may contract to a point that no longer represents a good setting of parameters under the current conditions. Thus, in dynamic and/or noisy environments it is imperative to prevent the simplex from becoming too small, and thus unable to track changes in the environment, or, conversely, too big, and thus liable to miss regions of high utility inside the simplex.
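One simple way to keep the simplex size between a lower and an upper threshold is to scale the simplex about an anchor vertex and clamp the scale factor. The sketch below makes assumptions not found in the text: the size of the simplex is measured by its diameter (longest edge) against single scalar thresholds rather than per-parameter thresholds, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence

Point = List[float]


def diameter(simplex: Sequence[Point]) -> float:
    """Longest edge of the simplex, used here as its 'size'."""
    return max(math.dist(p, q) for p, q in combinations(simplex, 2))


def bounded_scale(simplex: Sequence[Point], anchor_idx: int, factor: float,
                  lower: float, upper: float) -> List[Point]:
    """Scale all vertices about simplex[anchor_idx] by `factor` (factor < 1 contracts,
    factor > 1 extends or expands), clamping the factor so that the resulting
    diameter stays within [lower, upper]."""
    if not 0 <= lower <= upper:
        raise ValueError("require 0 <= lower <= upper")
    d = diameter(simplex)
    if d == 0:
        raise ValueError("degenerate simplex: all vertices coincide")
    clamped = min(max(factor, lower / d), upper / d)
    anchor = simplex[anchor_idx]
    return [[a + clamped * (x - a) for x, a in zip(p, anchor)] for p in simplex]


tri = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]                  # diameter = sqrt(32), about 5.66
small = bounded_scale(tri, 0, 0.5, lower=4.0, upper=10.0)   # halving would give ~2.83
print(diameter(small))                                       # clamped to about 4.0
```

Because uniform scaling about one vertex multiplies every edge length by the same factor, clamping the factor is enough to keep the diameter inside the allowed band.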
The present invention provides a mechanism for assigning an upper and a lower threshold on the dimensions of the simplex that limit the size to which the simplex may be extended, expanded, or contracted. The upper and lower thresholds on the size of the simplex are based on domain knowledge (e.g., threshold values suggested by the system designer or system administrator based on his or her knowledge of the system) and can be decided upon in advance and stored as parameters of the methodology of the present invention. For example, the lower threshold may be determined by the lowest resolution of significance (or availability) for each of the parameters. Similarly, the region that includes the highest and lowest possible values of all the parameters may determine the upper threshold on the simplex size. With regard to the lower threshold on the size of the simplex, a threshold value may be provided that limits the amount of contraction of the simplex that is permitted. Thus, when contraction of the simplex is performed, a determination may be made as to whether the contraction would result in a simplex with one or more sides whose length is smaller than the lower threshold. In such a case, parameter values may be mapped to the closest points on a simplex boundary that meets the lower threshold requirement. To handle dynamic environments, the present invention extends Direct Search methods by allowing for the geometric transformation of expansion on the current simplex. Before committing to simplex contraction as a result of the other geometric transformations not resulting in a better utility value, the present invention re-samples a new set of points. For example, the current best point, the current n best points, or the like in the simplex could be resampled. As an example, a preferred embodiment of the present invention may resample only the current best point.
If a significant difference in the performance measure (or utility value) is found between the new and the old measurement, it is assumed that the environment has changed, and the simplex is expanded to track the change (unless the simplex size has reached the upper threshold). Thus, each point $P_i$ in the simplex (except the current best point $P_b$) is replaced by $P_b + m\,(P_i - P_b)$, where $m$ is the expansion coefficient, greater than 1.0. By preventing the simplex from contracting and forcing the sampling of new points, the present invention allows the simplex to climb uphill even if the underlying utility landscape changes over time. On the other hand, if the new and the old measurement do not differ by a significant amount, contraction of the simplex is allowed (unless the simplex size has reached the lower threshold). Whether a difference between the new and the old measurement is significant is determined through domain knowledge, and the system administrator can set the “significance” threshold in advance. Similarly, in noisy or stochastic environments (with white or colored noise), the present invention uses domain knowledge before deciding which geometric transformation to apply to the simplex. The implication is that the true utility value of one point in the simplex is considered different from that of another point only if the data, i.e., the measured utilities of the sampled points, suggest a statistically significant difference between the two measured values. Thus, if it is known that, in a noisy system, repeated measurements of the utility value for any particular configuration follow a normal distribution, then standard statistical tests can be applied to determine, with a certain confidence level, that the utility value at one vertex of the simplex is greater (or less) than the utility value at another vertex.
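The resample-then-expand-or-contract decision can be sketched as below. The text leaves the significance test to domain knowledge; this sketch swaps in Welch's two-sample t statistic compared against a fixed critical value `t_crit` (an assumption, not a confidence-level lookup). It expands every vertex away from the best vertex by the factor `m` and otherwise contracts toward it by an assumed coefficient `beta`. Size thresholds are omitted for brevity, and all names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Point = List[float]


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic comparing two samples (each needs >= 2 measurements)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    diff = statistics.mean(a) - statistics.mean(b)
    if se == 0.0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def expand_or_contract(simplex: List[Point], best: int,
                       old: Sequence[float], new: Sequence[float],
                       m: float = 2.0, beta: float = 0.5,
                       t_crit: float = 2.0) -> Tuple[List[Point], str]:
    """After resampling the best vertex b, map each vertex p to b + coef * (p - b):
    coef = m (> 1, expansion) if the utility at b shifted significantly,
    else coef = beta (< 1, contraction)."""
    changed = abs(welch_t(old, new)) > t_crit
    coef = m if changed else beta
    b = simplex[best]
    moved = [[bi + coef * (pi - bi) for bi, pi in zip(b, p)] for p in simplex]
    return moved, ("expanded" if changed else "contracted")
```

A drop in the best vertex's utility from about 10 to about 5 across four repeated measurements triggers expansion, while a shift well within the measurement noise leads to contraction.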
Additional information necessary to test for statistical significance (such as whether the noise is white or colored) can be acquired beforehand, and the level of significance can be set in advance based on the critical nature of the system and its environment. Once the points in the simplex are ranked in this way, all of the geometric transformations on the simplex, including reflection, extension, contraction, and expansion, can be applied as before to search for better parameter settings. In addition, the present invention provides mechanisms for allowing both real-valued and integer-valued parameters in the simplex. While the simplex can be defined as usual with real- and integer-valued parameters in each configuration, operations on the simplex must be defined more carefully, because geometric transformations on the current simplex may produce a new point with impossible or illegal parameter values. For example, if a system has real-valued parameters x This is a problem in that parameter x The present invention solves this problem by mapping the setting of the integer-valued parameter to the nearest integer or the nearest legal value. In this example, x While this mapping is simple to implement and works in general, it introduces another problem: the simplex may inadvertently be reduced by one or more dimensions. Consider three parameter configurations Y Note that x which is remapped to:
in order to respect the fact that x In problems with integer-valued parameters, before accepting any new point, the present invention checks that the dimensionality of the simplex remains unchanged. This is guaranteed by confirming the non-collinearity of the new point against all pairs of points in the current simplex. If the new point would reduce the simplex dimension, it is perturbed by a small random amount, and the collinearity check is performed again. Thus in the above example, Y in order to avoid the collinearity of x To handle constrained optimization problems, the present invention translates a new point that violates one or more constraints to the boundary of the feasible region, where all constraints are satisfied. However, a naive approach to this translation is liable to reduce the simplex dimension, so special attention is required when handling the constraints. Consider the problem with the following constraints:
$$0.0 \le x_1 \le 1.0 \qquad \text{(constraint \#2)}$$
$$0.0 \le x_2 \le 1.0 \qquad \text{(constraint \#3)}$$
$$0.0 \le x_3 \le 1.0 \qquad \text{(constraint \#4)}$$
The first point to note is that although there are three parameters, the present invention takes advantage of the constraints to simplify the parameter space to be searched. Since x Now consider a new point Y The present invention avoids this problem by not re-mapping the coordinates of Y Thus, the present invention improves upon known Direct Search methods by including mechanisms for limiting the size of the simplex generated through geometric transformations, so that the simplex remains large enough to track changes in the environment and small enough to identify regions of high utility within it. Moreover, the present invention provides a mechanism for permitting expansion, rather than contraction, of the simplex when it determines that the environment has changed. In addition, the present invention provides a mechanism for selecting the geometric transformations to apply based on whether differences in the utility values of simplex points are statistically significant. Furthermore, the present invention provides a mechanism for permitting the inclusion of real- and integer-valued parameters in the simplex while ensuring that geometric transformations on such a simplex neither produce invalid points nor reduce the dimensionality of the simplex. Also, the present invention provides a mechanism for ensuring that new points identified by the geometric transformations of the simplex do not violate established constraints and do not reduce the dimensionality of the simplex. In addition, the on-line multi-parameter optimization device may be implemented in the autonomic computing system that it configures, or it may be a device separate from that autonomic computing system.
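The integer mapping and dimensionality check summarized above can be sketched as follows, assuming parameters are stored as lists of floats and `integer_dims` names the integer-valued coordinates. Following the text, the new point is tested for collinearity against every pair of retained vertices and randomly perturbed if needed; the perturbation size `eps`, the ±1 step for all-integer points, and every function name are assumptions. The pairwise test is exactly a dimensionality check for a triangle in two dimensions; in higher dimensions a full affine-independence test would be needed.

```python
import random
from itertools import combinations
from typing import List, Optional, Set

Point = List[float]


def snap(point: Point, integer_dims: Set[int]) -> Point:
    """Map integer-valued parameters to the nearest integer (Python rounds halves to even)."""
    return [float(round(x)) if k in integer_dims else x for k, x in enumerate(point)]


def collinear(p: Point, q: Point, y: Point, tol: float = 1e-12) -> bool:
    """True if y lies on the line through p and q (Cauchy-Schwarz equality test)."""
    u = [qi - pi for pi, qi in zip(p, q)]
    v = [yi - pi for pi, yi in zip(p, y)]
    uu = sum(a * a for a in u)
    vv = sum(a * a for a in v)
    uv = sum(a * b for a, b in zip(u, v))
    return abs(uv * uv - uu * vv) <= tol * max(1.0, uu * vv)


def accept_point(candidate: Point, retained: List[Point], integer_dims: Set[int],
                 eps: float = 0.05, rng: Optional[random.Random] = None,
                 max_tries: int = 100) -> Point:
    """Snap the candidate, then perturb it until it is not collinear with any
    pair of retained vertices, so the simplex keeps its dimensionality."""
    rng = rng or random.Random(0)
    real_dims = [k for k in range(len(candidate)) if k not in integer_dims]
    y = snap(candidate, integer_dims)
    for _ in range(max_tries):
        if not any(collinear(p, q, y) for p, q in combinations(retained, 2)):
            return y
        if real_dims:
            for k in real_dims:
                y[k] += rng.uniform(-eps, eps)  # small random perturbation
        else:
            y[rng.randrange(len(y))] += rng.choice((-1.0, 1.0))  # stay on the integer grid
    raise RuntimeError("could not restore full dimensionality")
```

With retained vertices (0, 0) and (2, 2) and an integer second coordinate, the candidate (1.0, 0.6) snaps to (1.0, 1.0), which lies on the line through them, so its real-valued coordinate is nudged off that line.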
In a preferred embodiment, the on-line multi-parameter optimization device is integrated with the autonomic computing system and operates in concert with it. As shown in The controller Configuration parameter setting device The utility value measurement module The simplex geometrical transformation module The threshold and constraint storage module In operation, the controller The controller The simplex geometrical transformation module The simplex geometrical transformation module The operation then continues as described above, iterating until stopping criteria are met. At that time, the point in the simplex with the best utility value is selected as the optimal configuration parameter setting for the autonomic computing system. The configuration parameter setting device The historical data storage device The configuration parameter settings and their corresponding utility values stored in the historical data storage device Accordingly, blocks of the flowchart illustrations support combinations of means for performing the specified functions, combinations of steps for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that each block of the flowchart illustrations, and combinations of blocks in the flowchart illustrations, can be implemented by special purpose hardware-based computer systems which perform the specified functions or steps, or by combinations of special purpose hardware and computer instructions.
A determination is made as to whether the new point lies outside established thresholds for the size of the simplex (step If the new point is not worse than every other point in the simplex and not all geometric transformations other than contraction have been applied, a different geometric transformation is used to obtain a new point (step It should be appreciated that the utility value of interest depends on the particular implementation of the present invention and may be selected by an administrator as the value to be optimized. Moreover, the terms “better”, “best”, “worse” and “worst” are relative terms whose meaning depends on the particular utility values being optimized. For example, a “better” utility value with regard to response time is a lower value, i.e., 0.3 seconds is better than 0.5 seconds, whereas for the number of packets processed per cycle, a higher value is better than a lower one. Even though these terms are relative, one of ordinary skill in the art will readily recognize what constitutes “better” and “worse” for the particular utility values selected for optimization. Within steps Thus, the present invention provides a mechanism for dynamically optimizing autonomic computing systems by analyzing, on-line, the configuration parameters of an autonomic computing system and their resulting utility values to determine the optimal settings of these parameters. With the present invention, an autonomic computing system may be periodically reconfigured so that it operates optimally. One type of autonomic computing system for which the present invention may be utilized is the logging and recovery subsystem of the content-based publish-subscribe (pub-sub) system called Gryphon, available from International Business Machines Corporation.
Gryphon is deployed as a redundant overlay network of brokers for filtering and routing messages from publishers to subscribers. The Gryphon project has developed scalable algorithms for rapidly filtering messages through large numbers of overlapping filters and for selectively routing messages in a multi-hop network to those neighbors that are on a path toward matching subscribers. Recently, a guaranteed delivery (GD) service providing exactly-once delivery of messages to subscribers has been implemented in Gryphon. Informally, each publisher in the system is the source of an ordered event stream. Guaranteed delivery ensures that any subscriber who remains connected to the system sees a gapless filtered subsequence of this stream, starting from an initial point in time. A subsequence of the event stream is said to be gapless if, for any two adjacent events in it, no event of the original stream lies between them and matches the subscriber's filter. The guarantee must be honored in the presence of broker failures and link failures. More information about the Gryphon system and the logging and recovery subsystem may be found in Bagchi et al., “Design and Evaluation of a Logger-based Recovery Subsystem for Publish-Subscribe Middleware,” International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS 2002), San Diego, Calif., which is hereby incorporated by reference. The system configuration, the workload characteristics, and the failure characteristics of the brokers or of the links between them can all vary widely from one deployment of the Gryphon system to another. The logging and recovery subsystem (hereafter referred to as the “logger subsystem”) within Gryphon has several control parameters, and for any particular Gryphon deployment, substantial manual tuning and system knowledge are necessary to determine the settings that result in better performance.
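The gapless property defined above can be checked directly, as in the following sketch. It illustrates the definition only and is not Gryphon code; it assumes hashable, unique event identifiers and a caller-supplied `matches` predicate standing in for the subscriber's filter, and the function name is hypothetical.

```python
from typing import Callable, Hashable, Sequence


def is_gapless(stream: Sequence[Hashable], delivered: Sequence[Hashable],
               matches: Callable[[Hashable], bool]) -> bool:
    """Check that `delivered` is an ordered subsequence of `stream` and that no
    event of `stream` lying strictly between two adjacent delivered events
    matches the subscriber's filter."""
    position = {event: i for i, event in enumerate(stream)}  # assumes unique event ids
    idx = [position[e] for e in delivered]
    for a, b in zip(idx, idx[1:]):
        if a >= b:
            return False  # out of order or duplicated: not a subsequence
        if any(matches(stream[k]) for k in range(a + 1, b)):
            return False  # a matching event was skipped: a gap
    return True
```

For a stream 0..9 and a filter accepting even events, delivering 2, 4, 6 is gapless, whereas delivering 2, 6 skips the matching event 4.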
Naturally, such tuning assumes that the performance metrics of interest have been defined a priori. The present invention was applied to the logger subsystem of Gryphon with the purpose of autonomically tuning its control parameters for superior performance in failure-free conditions and under failure injection. In this application, three metrics are defined that capture important performance and resource utilization characteristics of the Gryphon system and of the logger subsystem. The four control parameters that have the most significant impact on the logger subsystem's performance under typical workload conditions are used in the optimization: growth threshold, reclaim, suspend threshold, and ratio of chunk size to message size. The optimization mechanism of the present invention searches the parameter space to find control parameter settings that improve performance in the Gryphon system. The growth threshold (g) is a control parameter that defines when a cleaner task is scheduled to run. For example, the cleaner task may be scheduled to run when the log size grows by more than g% between two consecutive measurements of the size of the log space. The reclaim control parameter (r) identifies the amount of log space that is reclaimed when the cleaner task is scheduled. For example, the cleaner task may reclaim r% of the most recent measurement of the log size. The suspend threshold (s) identifies when writes to the log are suspended.
Thus, for example, the cleaner task typically runs concurrently with normal writes to the log. However, if the log size grows by more than s% of the last sampled log size during cleaning, all further new writes (as opposed to cleaning writes) to the log are suspended. The ratio of chunk size to message size (z) measures the size of the chunks of log space being allocated and deallocated relative to the size of the messages sent by the publishing clients. That is, the logger subsystem manages the physical log space by allocating and deallocating disk space in units of a chunk size, which is a tunable parameter of the subsystem. There are relationships between the control parameters that must be respected for the logger subsystem to operate properly. For example, r must be greater than g so that the cleaner task can reclaim at least as much log space as the log has grown; otherwise, the log size grows without bound, leading to throttling of writes to the system. Similarly, s must be greater than g; if this condition is not met, normal writes to the system are suspended whenever the cleaner task is scheduled to run. These constraints are important because they reduce the size of the parameter space that must be explored. The effects of these control parameters on the logger subsystem are measured with three performance metrics that capture the essential performance and resource utilization characteristics of interest to a user of the Gryphon system: variation of log space usage, cleaning overhead, and latency. Variation of log space usage (v) is the ratio of the standard deviation of the disk space usage to the mean disk space usage. Since the cleaner task is scheduled only intermittently, the disk space used by the logger subsystem can vary over time.
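The parameter relationships, the cleaner triggers, and the log-space variation metric described above can be sketched as follows. `LoggerParams` and the helper functions are hypothetical names, not Gryphon's API, and the use of the population standard deviation for v is an assumption.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LoggerParams:
    g: int  # growth threshold, percent
    r: int  # reclaim, percent
    s: int  # suspend threshold, percent
    z: int  # ratio of chunk size to message size

    def feasible(self) -> bool:
        """Percentages in [0, 100], with r > g and s > g."""
        in_range = all(0 <= x <= 100 for x in (self.g, self.r, self.s))
        return in_range and self.r > self.g and self.s > self.g


def growth_pct(previous: float, current: float) -> float:
    """Percentage growth of the log size between two measurements."""
    return 100.0 * (current - previous) / previous


def schedule_cleaner(previous: float, current: float, p: LoggerParams) -> bool:
    """Run the cleaner when the log grew by more than g% between two samples."""
    return growth_pct(previous, current) > p.g


def reclaim_amount(current: float, p: LoggerParams) -> float:
    """Log space the cleaner reclaims: r% of the latest measured log size."""
    return current * p.r / 100.0


def suspend_new_writes(last_sample: float, current: float, p: LoggerParams) -> bool:
    """During cleaning, suspend new writes once the log grew by more than s%."""
    return growth_pct(last_sample, current) > p.s


def log_space_variation(usage: Sequence[float]) -> float:
    """v: standard deviation of disk-space usage divided by its mean."""
    return statistics.pstdev(usage) / statistics.mean(usage)
```

For instance, the setting g = 25, r = 27, s = 49 satisfies both r > g and s > g, while raising g to 30 with the same r violates r > g.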
A large variation would require overprovisioning of storage space in the system and would also result in oscillatory behavior of the system. Cleaning overhead (c) represents the overhead associated with cleaning the log space. Cleaning can be viewed as overhead that reduces the bandwidth available to normal writes; c is defined as the ratio of the number of puts due to cleaning to the total number of puts to the system. Latency (l) is the difference between the actual latency and the latency of the system under ideal conditions, i.e., when there is no overhead due to cleaning tasks. When the cleaner task is executing, normal writes contend with cleaning writes, which increases the latency of the normal writes. In particular, the overhead due to the logger subsystem results in a time delay between the initiation of a put to the Gryphon system and the time when both the corresponding write has been committed to stable storage and the call-back has been returned from the logger subsystem. From a system designer's or system administrator's point of view, these three metrics highlight the conflicting requirements for performance and resource utilization in the Gryphon system. To characterize the overall behavior of the Gryphon system for a particular setting of the four control parameters, a scalar penalty measure P is defined as a function of the three performance metrics:
In the application of the present invention to the logger subsystem, the three parameters g, r and s are restricted to integer values in the range 0% to 100%, subject to the two constraints mentioned earlier. For the control parameter z, a range of values between 64 and 1280 is used, assuming that typical messages range in size from 10 Bytes to 2000 Bytes, with the chunk size fixed at 128 Kbytes. The optimization system and method of the present invention were applied to the control parameters and metrics described above under these conditions, both without and with fault injection. The results of the application of the present invention are shown in
- Growth (g)=25%
- Reclaim (r)=27%
- Suspend (s)=49%
- Ratio Chunk/Message=119
The penalty values obtained in the above experiment are much higher than in the fault-free case. Also, the parameter settings obtained by the present invention at the end of the runs differ from those in the fault-free case. Hence, if the system were manually tuned under fault-free conditions, its performance would no longer be optimal if the runtime environment experienced failures. This underscores the need for the present invention. Another type of autonomic computing system for which the present invention may be utilized is the Apache v1.3 Web server. Apache v1.3 on Unix is structured as a pool of worker processes monitored by a master process. The master process monitors the health of the worker processes and manages their creation and destruction. The worker processes are responsible for handling communications with the Web clients as well as performing the work required to generate responses to the clients' requests. A worker process handles at most one connection at a time, and it continues to handle only that connection until the connection is terminated. Thus, the worker is idle between consecutive requests from its connected client. There are two main parameters that control the response time of the Apache Web server: MaxClients and KeepAlive Timeout. The MaxClients parameter limits the size of the worker pool, thereby limiting the processing capacity of the server. A higher MaxClients value allows Apache to process more client requests. However, if MaxClients is too large, excessive resource utilization degrades performance for all clients, i.e., response times become longer. The Apache “KeepAlive Timeout” tuning parameter controls the maximum time a worker process can remain in the “User Think” state before its client connection is closed.
If the KeepAlive Timeout is too large, CPU and memory are underutilized, since clients with requests to process cannot connect to the server, and those clients therefore experience long response times. Reducing the timeout value means that workers spend less time in the “User Think” state and more time in the “Busy” state. Hence, CPU utilization increases and response time decreases. If the timeout is too small, TCP connections are terminated prematurely, which reduces the benefit of persistent connections; the extra overhead can make user response times longer. The optimization system and method of the present invention were applied to control the MaxClients and KeepAlive Timeout parameters of the Apache Web server so as to minimize the response time of the system under simulated static and variable load conditions. As with the Gryphon system, the present invention was able to find parameter settings that resulted in better performance than the default parameter settings of Apache Web Server v1.3. Thus, the present invention provides an improved system and method for performing dynamic online multi-parameter optimization for autonomic computing systems that does not suffer from the drawbacks of known Direct Search methods. The present invention extends Direct Search methods with additional functionality that allows them to be applied to dynamic and noisy environments, such as eBusiness and eCommerce systems operating on-line on a network such as the Internet.
It is important to note that while the present invention has been described in the context of a fully functioning data processing system, those of ordinary skill in the art will appreciate that the processes of the present invention are capable of being distributed in the form of a computer readable medium of instructions and a variety of forms and that the present invention applies equally regardless of the particular type of signal bearing media actually used to carry out the distribution. Examples of computer readable media include recordable-type media, such as a floppy disk, a hard disk drive, a RAM, CD-ROMs, DVD-ROMs, and transmission-type media, such as digital and analog communications links, wired or wireless communications links using transmission forms, such as, for example, radio frequency and light wave transmissions. The computer readable media may take the form of coded formats that are decoded for actual use in a particular data processing system. The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.