« PreviousContinue »
U.S. Patent Mar. 20, 2012 Sheet 4 of5 US 8,140,548 B2
m F 691 SYSTEM MEMORY 620 D \\ (ROM) E / MONITOR BIOS Q 630 PROCESSING / 690 UNIT OUTPUT (RAM) 632 J VIDEO PERIPHERAL 696 — INTERFACE OPERATING INTERFACE / -6“ 4} 4? 4? M APPLICATION 35 695 PROGRAMS _ < SYSTEM Bus /' > —> OTHER PROGRAM 621 J SPEAKERS MODULES m 6)5o (660 670 P NoN_§EMovAB|_E REMOVABLE USER NETWORK LOCAL AREA NETWORK RSGRAM NON-VOL. MEMORY N°N'V°'-- INPUT INTERFACE <: 671 ATA E INTERFACE MEMORY INTERFACE 640/ INTERFACE AA II 672 IIEEEII 1 I'll“ Ell = ‘I: Aux 680 J 651-/ I / WIDE AREA / 641 655 © 599 NETWORK REMOTE CoMPuTER(S) 652 IIIII cI:I:I:I I:I:I:I:I |::I:| 0» I IIIIIIIIIIIIIIIIIIIIIIII E -iI OPERATING APPLICATION P2(T):|;:Mm PROGRAM \ ~J ||||| SYSTEM PROGRAMg MODULES DATA647 MIC 664 EYBOARD 662 — — REMOTE Q 631 661 APPLICATION FIG. 6 600\—'/‘ 663 MOUSE PROGRAMS
1 CONSTRAINED PHYSICAL DESIGN TUNING
The performance of database systems, particularly enterprise database systems, depends on an effective configuration of physical design structures, such as indexes, in the databases that compose those systems. Automatically configuring such physical design structures to increase the performance of the underlying database system, generally referred to as an automated physical design problem, has been recently researched.
The physical design problem statement is traditionally stated as:
Given a Workload W and a storage budget B, find the set of physical structures (that is, the configuration) that fits Within B and results in the lowest execution cost for W.
Thus, existing solutions to the automated physical design problem generally attempt to minimize execution costs of input workloads for a given a storage constraint. However, this model is not flexible enough to address several real-world situations. More particularly, a single storage constraint does not model many important situations in current database management system installations. What is needed is a generalized version of the physical design problem statement that accepts one or more complex constraints.
This Summary is provided to introduce a selection of representative concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used in any way that would limit the scope of the claimed subject matter.
Briefly, various aspects of the subject matter described herein are directed towards a technology by which a constraint language is provided to allow specifying one or more complex constraints for use in selecting configurations for use in physical database design tuning. The complex constraint (or constraints) are processed, e.g., in a search framework, to determine and output at least one configuration that meets the constraint for use in physical design tuning, e.g., a best configuration found before a stopping condition is met.
In one example implementation, a constraint language is provided, including grammar for asserting a function with respect to a value. For example, a new execution cost may be specified relative to an original execution cost of executing a workload. Also provided is a mechanism for the soft assertion of a function with respect to a value.
In one example implementation, a search framework processes a current configuration into candidate configurations, including by searching for candidate configurations from a current configuration based upon a complex constraint, iteratively evaluating a search space until a stopping condition is satisfied, using transformation rules to generate new candidate configurations, and selecting a best candidate configuration. Transformation rules and pruning rules are applied to efiiciently perform the search.
Other advantages may become apparent from the following detailed description when taken in conjunction with the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is illustrated by way of example and not limited in the accompanying figures in which like reference numerals indicate similar elements and in which:
Various aspects of the technology described herein are generally directed towards adding constraints, including more than a storage constraint, to provide solutions to the physical design problem. As will be understood, in part this is facilitated by a constraint language that is straightforward to use, yet is sufficiently powerful to express various scenarios. Further described is a search framework to incorporate such constraints into a transforrnation-based search strategy.
While some of the examples described herein are directed towards an example constraint language and technology that builds upon an existing transforrnation-based framework to effectively incorporate constraints in the search space, it is understood that these are only examples. Alternative languages and or frameworks may be implemented. As such, the present invention is not limited to any particular embodiments, aspects, concepts, structures, functionalities or examples described herein. Rather, any of the embodiments, aspects, concepts, structures, functionalities or examples described herein are non-limiting, and the present invention may be used various ways that provide benefits and advantages in computing and networking in general.
Turning to FIG. 1, there is shown a general block diagram representing starting with a problem specification and processing into results. In general, a user (designer, administrator or the like) provides a specification 102 using the constraint language described below, from which C++ code (in one example implementation) is produced. Initialization code 104 and constraints 106 are produced; the user (e.g., via custom code and/or a library) may also specify other constraints 108, as described below.
The C++ code produces a user-defined object code instance 110, e.g., via compilation. In association with a search framework 112 (described below with respect to FIG. 2), a constrained optimizer 114 uses the user-defined instance 110 to generate results for a database management system 116. The results include a deployment script 118 (e.g., in SQL), possibly along with text reports 120 or the like.
Turning to aspects of the constraint language, as a simple example, consider the following query:
sider further a narrow single-column index I N over (a). In this case, two I/Os are needed to respond to the query; one to locate the record-id of the qualifying tuple from the secondary index I N, and another to fetch the relevant tuple from the primary index.
In absolute terms, IC results in a better execution plan compared to that of I N. However, the execution plan that uses IN is only slightly less efficient to the one that uses IC (specially compared to the naive alternative that performs a sequential scan over table R), and at the same time incurs no overhead for updates on columns b, c, d, or e. If such updates are possible, it may make sense to “penalize” wide indexes such as IC from appearing in the final configuration. However, current techniques cannot explicitly model this requirement without resorting to artificial changes. For instance, this behavior may be simulated by introducing artificial UPDATE statements in the workload. This mechanism, however, is not general enough to capture other important scenarios described below. Note, however, that there are situations for which the covering index is the superior alternative (e.g., there could be no updates on table R by design). Indeed, an application that repeatedly and almost exclusively executes the above query can result in a fifty percent improvement when using the covering index IC rather than the narrow alternative IN. Deadlocks also may result when narrow indexes are used.
In general, there are other situations in which the traditional problem statement for physical design tuning is not sufficient. In many cases there is additional information that may be incorporated into the tuning process. It is often not possible to do so by only manipulating either the input workload or the storage constraint. For instance, it may be desired to tune a given workload for maximum performance under a storage constraint, while ensuring that no query degrades by more than ten percent with respect to the original configuration. Alternatively, it may be specified to enforce that the clustered index on a table T cannot be defined over certain columns of T that would introduce “hot-spots” (without specifying which of the remaining columns are to be chosen). As yet another example, in order to decrease contention during query processing, a tuning solution may want to prevent having any single column from a table appear in more than some number of indexes (e.g., three), as the more indexes in which a column appear, the more that contention is likely due to exclusive locks during updates. As can be readily appreciated, tuning based upon a single storage constraint does not handle these and other such situations in current DBMS installations.
Described herein is technology directed towards a generalized version of the physical design problem statement that accepts one or more complex constraints (that is, not simply a storage budget constraint) in the solution space, and provides expressiveness, so that it is straightforward to specify constraints with sufficient expressive power. Further, the technology facilitates effectiveness, in that constraints are able to effectively restrict the search process. Still further, specialization is provided, e.g., when there is a single storage constraint, the resulting configurations are close, in terms of quality, to those obtained by current physical design tools. The technology described herein addresses expressiveness, effectiveness and specialization. For simplicity herein, described are techniques to handle secondary indexes as the physical structures that define the search space, (although other structures such as corresponding to primary indexes, materialized views and other physical structures may similarly be used).
One example design provides a simple constraint language that covers various scenarios, along with a lower-level interface to specify more elaborate constraints as well as efficient ways to evaluate constraints. By way of examples of data types, functions and constants, the exemplified constraint language understands simple types such as numbers and strings, and also domain-specific ones. More particularly, data types that are relevant for physical design, such as database tables, columns, indexes and queries are natively handled. Sets of elements, comprising unordered homogeneous collections (e.g., workloads are sets of queries, and configurations are sets of indexes) are also supported. These sets may be accessed using either positional or associative array notation (e.g., W returns the second query in W, and W[“QLong”] returns the query in W whose identifier is QLong).
The language supports a rich set of functions over these data types. By way of example, the columns of a table T are obtained using cols(T), the expected size of an index I using size(I), and the expected cost of query q under configuration C using cost(q, C). Additional functions are described herein.
Further, other useful constants are referenced in the language. For example, W is used to denote the input workload, with the following constants to specify certain commonly used configurations:
C: denotes a desired configuration, for which constraints
are typically specified.
COrig: The configuration that is currently deployed in the database system.
CBase: The base configuration only contains those indexes originating from integrity constraints. Therefore, it is the worst possible configuration for SELECT queries in the workload, and the one with lowest UPDATE overhead.
CSelectBest: This configuration is the best possible one for SELECT queries in the workload; CSelectBest contains the indexes resulting from access-path requests generated while optimizing the input workload. Intuitively, indexes in this configuration are the most specific ones that can be used in some executionplan for a query in the workload. For instance, the two indexes in CSelectBest for a query: