PROGRAM PRODUCT FOR OPTIMIZING
PARALLEL PROCESSING OF DATABASE
This is a continuation of application Ser. No. 08/486,087, 5 filed Jun. 7, 1995 now abandoned which is a divisional of copending application Ser. No. 08/201,822 filed on Feb. 25, 1994.
FIELD OF THE INVENTION 10
This invention relates to processing queries in computer database systems and more particularly to a strategy for optimizing the execution of a database query.
BACKGROUND OF THE INVENTION 15
Database management systems (DBMS) support the definition, retrieval, and updating of data stored in a database. Arelational database management system (RDBMS) is a particular form of a database system where data is stored 20 in a tabularized form. The database tables consist of sets of rows which share common characteristics.
The database is physically stored as pages of data on non-volatile storage devices such as direct access storage devices (DASD). An index can also be stored on DASD 25 listing a directory for locating specific data which aids in the retrieval of that data.
Structured query languages, such as the SQL language, have been developed for relational database systems to access the data in a database. The relational database man- 30 agement systems implement the functions of the SQL language.
An example of a table stored in a relational database system is an EMPLOYEE_SALARY table which stores salary information for all employees of a company or 35 enterprise as follows:
The EMPLOYEE_SALARY table has columns for 45
employee number (emp no), employee name (emp name),
department number (dept no) and salary. Within the table,
each row lists an employee information set called a tuple comprising the employee number, name, department number and salary for a given employee. 50
Computer programmers write application programs to access and maintain the data in the database. The application programs are executed by the database management system. The application programs need to be processed by the central processing unit (CPU) of the computer system for 55 execution by the CPU. There are three stages in processing an application program—the precompilation, the compilation (also referred to as bind time) and the execution (also referred to as execution time or run time). During precompilation, all SQL statements are extracted from the go application program. The SQL statements are used to access the data from the database.
SQL statements specify what data is wanted but not how to get the data. The relational database management system determines the optimal method for accessing and retrieving 65 the requested data. A strategy is deemed optimal in terms of its minimizing the resource utilization costs. Typically, a
query optimizer process is performed, during bind time, to determine an optimal strategy. The optimal query processing strategy is transformed, also during bind time, into a query execution plan which consists of a series of steps executable by the CPU to access the requested data and satisfy the query. During execution time, the execution plan as constructed during bind time is executed. The query execution plans can be stored and executed or re-executed as needed.
The application program can include host variables which can change the execution of the embedded SQL statement. A host variable is a PL/I variable, a C variable, a FORTRAN variable, a COBOL data item, or an assembly language storage area referenced in a SQL statement. During execution time, when the query execution plan is activated and executed, values for the host variables are passed from the application program to the query execution plan.
An example of a SQL statement using a host variable is as follows:
Where DEPT_NO=:dept_no; The host variable in the preceding Select statement is ":dept no". The SQL Select statement performs the operation of retrieving the maximum salary for employees in the department specified by the host variable. An EMPLOYEE_SALARY table is accessed to satisfy the preceding query statement. One of the columns (attributes)
of the table is dept no (the employee's department
number). The MAX function identifies the maximum salary for the employees in the given department. The host variable
dept no allows the department number for the query to be
specified by the application program. During bind time, the value of the host value variable is unknown.
In order to satisfy the preceding query, the query access strategy needs to specify how the EMPLOYEE_SALARY table will be accessed and how the MAX function will be evaluated. The EMPLOYEE_SALARY table can be accessed either using a sequential scan of all the pages of records stored in the DASD or by using an index for the table.
In order to improve the RDBMS performance in evaluating and satisfying queries, it is desirable to exploit the inherent parallelism in multiple CPUs or I/O devices available in the computer system during execution. For example, when performing a sequential table scan of a table that is stored across multiple I/O devices, the table scans on the separate I/O devices can be performed at the same time to reduce I/O time by utilizing the concurrency of multiple asynchronous I/O operations on the devices. Parallelism can also be exploited by using multiple CPUs to evaluate the data according to criteria provided by a query, so that total CPU time is lowered. A more complex parallelism operation involves partitioning the query execution plan among CPUs and executing operations in parallel. The query optimizer needs to consider whether a parallel strategy should be invoked when determining the optimal strategy that minimizes CPU time and resource utilization costs.
A parallel execution plan expresses the degree and unit of parallelism. The degree of parallelism is the number of CPUs and I/O streams (processes) used for each unit of parallelism. The unit of parallelism refers to the group of operations (executable steps in an execution plan) assigned to the same process for execution where there is no blocking for operations within a unit. Blocking occurs when an operation has to wait for another operation to complete. For example, blocking occurs when an operation needs to wait for an intermediate result to be stored in a temporary table.