PARALLEL COMPUTATION PROCESSOR,
PARALLEL COMPUTATION CONTROL
METHOD AND PROGRAM THEREOF
FIELD OF THE INVENTION
The present invention relates to a parallel computation processor, a parallel computation control method and a program thereof and, more particularly, to a parallel computation processor that enables high-speed loop operation with little power consumption, a computation control method adopted in the same and a program thereof.
BACKGROUND OF THE INVENTION
A DSP (digital signal processor) is one of the major types of parallel computation processor. The DSP operates plural execution units in parallel by taking advantage of a characteristic of digital signal processing, "parallel operation", to execute a digital signal processing program at a high speed.
For example, the FIR filter (finite impulse response filter), which is a typical program for the digital signal processing, carries out computation as follows:
y(n) = a(0)x(n) + a(1)x(n-1) + a(2)x(n-2) + . . . + a(i)x(n-i)     (1)
Incidentally, y(n) denotes the filter output at time n, x(n-i) denotes the filter input at time n-i, and a(i) denotes the i-th filter coefficient. The computation represents a loop of the following three steps (①, ②, and ③) with respect to i=0, 1, 2, 3:
①: loading a(i) from a memory;
②: loading x(n-i) from a memory;
③: multiplying a(i) and x(n-i) together, and accumulating the product.
Accordingly, the DSP in general is provided with a couple of units for loading data from a memory and a unit for executing the multiplication and accumulation. These units are operated in parallel to carry out steps ①, ②, and ③ simultaneously, thus enabling high-speed computation of expression (1).
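The loop described above can be illustrated by the following minimal sketch, which evaluates expression (1) for a single output sample in sequential form. The function name fir_output and the treatment of samples before time 0 as zero are assumptions made for illustration only; they do not appear in the description. A DSP would carry out the three steps of each iteration in parallel rather than one after another.

```python
def fir_output(a, x, n):
    """Evaluate y(n) = a(0)x(n) + a(1)x(n-1) + ... for one output sample.

    a: list of filter coefficients a(i)
    x: list of input samples x(k), indexed by time k
    n: time index of the output sample
    """
    acc = 0
    for i in range(len(a)):
        coeff = a[i]                              # step 1: load a(i)
        # step 2: load x(n-i); inputs before time 0 are assumed to be zero
        sample = x[n - i] if n - i >= 0 else 0
        acc += coeff * sample                     # step 3: multiply and accumulate
    return acc


if __name__ == "__main__":
    coefficients = [1, 2, 3, 4]
    samples = [1, 2, 3, 4, 5]
    print(fir_output(coefficients, samples, 4))
```

Each pass through the loop body corresponds to one value of i, so a four-coefficient filter needs four iterations per output sample.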
It is expected that the number of units built into the DSP will increase in the future. However, a processor requires a larger amount of power as its computer resources increase. In order to reduce power consumption, it is necessary to limit computer resources other than the execution units, such as a bus for memory access, to a minimum.
For the purpose of reducing power consumption, there has been proposed a parallel computation processor as depicted in FIG. 1. In the processor, computer resources for instruction issue are reduced by making the number of simultaneously issued instructions smaller than the number of execution units.
With reference to FIG. 1 showing the configuration of the conventional parallel computation processor, the processor is equipped with a feature that enables the parallel issue of up to m instructions. The number m is less than the number of execution units, denoted by n. More specifically, the parallel computation processor comprises an instruction bus 201 for simultaneously fetching m instructions from a memory, m instruction registers 202 (IR1 to IRm) for storing the instructions, m instruction decoders 203 (ID1 to IDm) for concurrently decoding the m instructions read out of the registers 202, an instruction dispatcher 204 for simultaneously dispatching decoded instructions obtained by the decoders 203, n execution units 205 (E1 to En) for executing the instructions in parallel, and a general register file 206 for feeding input data to the execution units 205 as well as storing outputs therefrom. Incidentally, the instruction dispatcher 204 determines the number of instructions to dispatch in parallel according to the mutual data dependency of the instructions. The number may be any one of 1 to m.
The parallel computation processor operates as follows. First, the instruction bus 201 fetches m instructions from an instruction memory at once, and writes the instructions to the instruction registers 202. Subsequently, the instruction decoders 203 concurrently decode the m instructions read out of the registers 202, and feed the m decoding results to the instruction dispatcher 204. The instruction dispatcher 204 determines the number (a number between 1 and m) of instructions to dispatch in parallel based on dependency among the instructions, and simultaneously dispatches the determined number of instructions. The instructions are executed by any of the n execution units 205.
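The dispatch decision described above can be sketched as follows. The description states only that the number of instructions dispatched in parallel is determined from the dependency among them; the specific rule used here (an instruction is held back if it reads or writes a register written by an earlier instruction in the same fetched group) is an assumption for illustration, as are the names Instr and count_dispatchable. Other hazards, such as write-after-read, are ignored in this sketch.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical decoded instruction: one destination, several sources."""
    dest: str
    srcs: Tuple[str, ...]


def count_dispatchable(fetched):
    """Return how many leading instructions of the fetched group can be
    dispatched together under the assumed dependency rule."""
    written = set()   # registers written by instructions already accepted
    count = 0
    for ins in fetched:
        reads_pending = bool(written & set(ins.srcs))
        writes_pending = ins.dest in written
        if reads_pending or writes_pending:
            break      # dependency found: stop and dispatch what we have
        written.add(ins.dest)
        count += 1
    return count


if __name__ == "__main__":
    group = [
        Instr("r1", ("r2", "r3")),
        Instr("r4", ("r5",)),
        Instr("r6", ("r1",)),   # reads r1, written by the first instruction
    ]
    print(count_dispatchable(group))
```

Because nothing has been written when the first instruction is examined, a non-empty group always yields a count of at least 1, which matches the stated range of 1 to m.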
The parallel computation processor is suited to executing programs that contain many branches, which make parallel operation difficult, and control operations described by IF-ELSE statements and the like. However, as is clear from the fact that the number of dispatched instructions does not match the number of execution units, the processor does not deliver performance commensurate with the number of its execution units.
To solve the problem, for example, there has been disclosed a parallel computation processor in Japanese Patent Application Laid-Open No. HEI7-110769. This parallel computation processor achieves greater computational performance by dispatching as many instructions as there are execution units. The processor fetches instructions via an instruction bus while keeping the bandwidth of the bus narrow, and stores the instructions in a buffer so that as many instructions as there are execution units can be executed in parallel.
Incidentally, in the case where the DSP is employed as a parallel computation processor for digital signal processing, it is required to execute high-speed loop operation, in which a series of instructions is executed repeatedly, as the loop operation is typical of a digital signal processing program. In order to implement the high-speed loop operation, techniques such as the zero-overhead loop have been employed. The zero-overhead loop is a control method for eliminating overhead due to execution of a branch instruction. In the zero-overhead loop, specifying instructions are inserted at the beginning of the series of instructions constituting a loop to specify the number of steps in the loop and the loop repeat count, namely, how many times the loop will be repeated. When the last step in the loop has been executed, the execution sequence automatically returns to the address of the first step in the loop, thus eliminating overhead due to branches.
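The zero-overhead loop described above can be modeled by the following minimal sketch of a sequencer. The function name run_zero_overhead_loop and the representation of loop steps as strings are assumptions for illustration; actual hardware would hold the number of steps and the repeat count in dedicated registers. The point of the sketch is that the program counter wraps back to the first step without any branch instruction appearing in the executed sequence.

```python
def run_zero_overhead_loop(body, repeat_count):
    """Return the sequence of executed steps for a loop whose number of
    steps (len(body)) and repeat count are specified in advance."""
    if not body:
        return []
    trace = []
    loop_start = 0
    loop_end = len(body) - 1      # derived from the specified number of steps
    remaining = repeat_count
    pc = loop_start               # program counter within the loop
    while remaining > 0:
        trace.append(body[pc])    # execute the current step
        if pc == loop_end:
            pc = loop_start       # automatic return, no branch instruction
            remaining -= 1
        else:
            pc += 1
    return trace


if __name__ == "__main__":
    steps = ["load a(i)", "load x(n-i)", "multiply-accumulate"]
    executed = run_zero_overhead_loop(steps, 4)
    print(len(executed))
```

In this model the number of executed steps equals the number of steps in the loop multiplied by the repeat count, with no additional slots spent on branching.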
However, the above-mentioned parallel computation processor is not designed to execute the loop operation efficiently, and therefore structurally incurs overhead when the execution sequence branches from the instruction at the loop end back to the one at the loop beginning.
SUMMARY OF THE INVENTION
It is therefore an object of the present invention to provide a parallel computation processor that enables high-speed loop operation with little power consumption by means of