US 7003461 B2
Abstract
An adaptive codebook search (ACS) algorithm is based on a set of matrix operations suitable for data processing engines supporting a single instruction multiple data (SIMD) architecture. The result is a reduction in memory access and increased parallelism to produce an overall improvement in the computational efficiency of ACS processing.
Claims (21)
1. In a computer device for speech synthesis, a method for searching a codebook of excitation vectors to identify a selected excitation vector for CELP (code-excited linear prediction) coding comprising:
receiving an input speech signal;
computing a metric M_{i} based on the input speech signal and a signal synthesized by an excitation vector v_{i};
repeating the computing step for each excitation vector in the codebook; and
identifying a minimum metric (M_{min}) from among the computed M_{i}'s, the excitation vector associated with M_{min} being the selected excitation vector used to produce synthesized speech,
wherein the computing step includes computing a correlation quantity between a target vector signal and an impulse response comprising:
accessing elements R_{i} of a first vector (R) stored in a first area of a memory component of the computer device and representative of the target vector signal;
accessing elements I_{i} of a second vector (I) stored in a second area of the memory component and representative of the impulse response;
where s>1 and Frm is a framesize,
wherein the vectors F1 and F2 together are representative of the correlation quantity.
2. The method of
_{i }is defined by
where d is the correlation quantity and
φ is a covariance matrix of the impulse response.
3. The method of
4. The method of
^{s}-way SIMD (single instruction multiple data) instruction set.
5. The method of
^{s+1}-way SIMD (single instruction multiple data) instruction set.
6. The method of
^{s−1} MAC instructions.
7. The method of
^{t}-way SIMD (single instruction multiple data) instruction set, where t≠s.
8. The method of
2 includes loading the elements I_{(m−(2^{s}−1))} through I_{m} from the vector I into a first set of one or more registers in a central processing unit (CPU) of the computing device, wherein the elements I_{(m−(2^{s}−1))−(2^{s}−1)} through I_{(m−(2^{s}−1))+1} from the vector I will have been previously loaded into a second set of one or more registers in the CPU.
9. A computer program product suitable for execution on a data processing device for use in a speech synthesis system, the data processing device supporting SIMD (single instruction multiple data) instructions comprising:
computer readable media containing a computer program to select an excitation vector from a codebook containing a plurality of excitation vectors v,
the computer program comprising:
first computer program code to operate the data processing device to access from a first area of a memory component elements R_{i} of a vector R representative of a target vector signal;
second computer program code to operate the data processing device to access from a second area of the computer memory component elements I_{i} of a vector I representative of an impulse response;
third computer program code to operate the data processing device to access the excitation vectors v from the codebook, the codebook stored in a third area of the computer memory component;
fourth computer program code to operate the data processing device to compute a metric M_{i} based on an input speech signal and a signal synthesized from an excitation vector v_{i}, including computing a vector F2 which is a portion of a correlation vector d representative of a correlation between the target vector signal and the impulse response, where
s>1 and Frm is a framesize;
fifth computer program code to obtain the input speech signal; and
sixth computer program code to coordinate the first, second, third and fourth computer program codes to compute a metric for each excitation vector in the codebook and to identify a minimum metric therefrom, the excitation vector associated with the minimum metric being the selected excitation vector,
wherein the selected excitation vector can be used to synthesize speech.
10. The computer program product of
_{i }is defined by
where φ is a covariance matrix of the impulse response.
11. The computer program product of
1, where
wherein the vector F1 and the vector F2 together constitute the correlation vector d.
12. The computer program product of
13. The computer program product of
14. A speech codec device comprising:
an input component operable to receive a speech signal to produce an input speech signal;
a processing component supporting one or more single instruction multiple data (SIMD) instructions;
a data storage component coupled to the processing component for transferring data therebetween;
a first portion of the data storage component having stored therein a codebook of excitation vectors v;
a second portion of the data storage component having stored therein a vector R representative of a target vector signal generated based on the input speech signal;
a third portion of the data storage component having stored therein a vector I representative of an impulse response to a synthesis filter; and
computer program code stored in the data storage component comprising a code portion suitable for execution on the processing component to compute a metric M_{i}=
for an excitation vector v_{i}, where φ is a covariance matrix of the impulse response and d is a correlation vector representative of a correlation between the target vector signal and the impulse response, the correlation vector d comprising a vector F1 and a vector F2, wherein
where s>1 and Frm is a framesize,
the computer program code further computing a plurality of the metrics M_{i} and identifying a minimum one of the metrics M_{min}, wherein the excitation vector corresponding to M_{min} constitutes a selected excitation vector.
15. The device of
^{s} are related by a power of 2.
16. The device of
17. The device of
18. The device of
19. A speech synthesis device comprising:
means for receiving input speech to produce an input speech signal;
data processing means for performing single instruction multiple data (SIMD) operations, including a multiply and accumulate (MAC) operation;
memory means, in data communication with the data processing means, for storing a vector R representative of a target vector signal produced based on the input speech signal, a vector I representative of an impulse response to a synthesis filter, and a codebook of excitation vectors v; and
computer program code stored in the memory means comprising a code segment suitable for execution on the data processing means to compute a metric
for an excitation vector v_{i}, where φ is a covariance matrix of the impulse response and d is a correlation vector representative of a correlation between the target vector signal and the impulse response, the correlation vector d comprising a vector F1 and a vector F2, wherein
where Frm is a framesize.
20. The speech synthesis device of
21. The speech synthesis device of
Description
The present invention relates to speech processing in general, and more particularly to a speech encoding method and system based on code excited linear prediction (CELP).
Code-excited linear prediction (CELP) is a speech coding technique commonly used for producing high quality synthesized speech at low bit rates, i.e., 4.8 to 9.6 kilobits-per-second (kbps). This class of speech coding, also known as vector-excited linear prediction, utilizes a codebook of excitation vectors to excite the LPC filter
The ability to reduce the computational complexity without sacrificing voice quality is important in the digital communications environment. Thus, a need exists for improved CELP processing.
A method and system for speech synthesis includes an adaptive codebook search (ACS) process based on a set of matrix operations suited for data processing engines which support one or more SIMD (single instruction multiple data) instructions. A set of matrix operations was determined which recasts the conventional standard algorithm for ACS processing so that a SIMD implementation not only achieves improved computational efficiency but also reduces the number of memory accesses, thereby improving CPU (central processing unit) performance.
The optimum excitation signal is determined in the codebook search process
Referring to the general architectural diagram of a speech synthesis system
As shown in
The speech coder can utilize various storage technologies. A typical storage (memory) component
A signal converter
The speech synthesis system
The calculation which takes place in the codebook search process
As mentioned above, adaptive codebook search involves searching for a codebook entry that minimizes the mean square error (MSE) between the input speech signal and the synthesized speech. It can be shown (per the G.723.1 ITU specification) that the computation of MSE can be reduced to an equation whose “maximum” represents the best codebook entry to be selected:
v φ=H d=H R is the target vector signal, and H is the impulse response of the synthesis filter The quantity d represents the correlation between the target vector signal r and the impulse response H. The quantity d is defined by:
0≦j≦FrmSz. The quantity φ represents the covariance matrix of the impulse response:
For each excitation vector v The equation for d for a speech codec (coder/decoder) per the ITU (International Telecommunication Union) reference ‘C’ implementation is expressed as:
ImpRes is the impulse response buffer, and pitch is a constant. A typical scalar implementation of this expression is shown by the following C-language code fragment:
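A minimal sketch of such a scalar loop is given here in C. It assumes that the expression has the convolution form F[j] = Σ R[l]·I[j−l] for l = 0..j, that the data are 16-bit samples, and that saturate( ) clamps to the signed 16-bit range. The function name corr_scalar, the flat array arguments, and the 64-bit accumulator are illustrative choices and are not taken from the ITU reference code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an accumulator to the signed 16-bit range.
 * Stand-in for the saturate() helper mentioned in the text (assumed behavior). */
int16_t saturate(int64_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Scalar computation of the correlation quantity, assumed form:
 *   F[j] = saturate( sum over l = 0..j of R[l] * I[j - l] ),  0 <= j < frm
 * R is the target vector, I the impulse response, F the output buffer. */
void corr_scalar(const int16_t *R, const int16_t *I, int16_t *F, int frm)
{
    assert(R != NULL && I != NULL && F != NULL && frm >= 0);
    for (int j = 0; j < frm; j++) {
        int64_t acc = 0;                       /* wide accumulator avoids overflow */
        for (int l = 0; l <= j; l++)
            acc += (int64_t)R[l] * I[j - l];   /* one multiply-accumulate per tap */
        F[j] = saturate(acc);
    }
}
```

Each output element requires its own pass over R and I, so the inner loop reloads the same operands repeatedly; this is the memory traffic that a blocked arrangement aims to reduce.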
The ‘saturate( )’ function or some equivalent is commonly used to prevent overflow. A line-by-line statistical profiling of a conventional adaptive codebook search algorithm indicates that the foregoing implementation for computing the correlation quantity d consumes about one third of the total processing time in a speech codec. It was discovered that a decomposition of the expression:
Referring now to -
- I[ ] is the vector ImpRes[ ], where a vector element is referenced as I_{i},
- R[ ] is the vector RzBf[ ], where a vector element is referenced as R_{i}, and
- F[ ] is an output vector FltBuf[ ] to store the result of the operation and thus is representative of the correlation quantity d, where a vector element is referenced as F_{i}.
In accordance with the invention, the first four elements of F[ ] (F Another constituent component of elements F As can be seen in The matrix operations shown in Every four elements in F[ ] (e.g., F
In accordance with various implementations of the embodiments of the present invention these operations are implemented in a computer processing architecture that supports a SIMD instruction set. A commonly provided instruction is the “multiply and accumulate” (MAC) instruction, which performs the operation of multiplying two operands and summing the product to a third operand. A generic MAC instruction might be:
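The semantics of a MAC operation can be sketched in C as follows. This is an illustration only and does not correspond to any particular instruction set; the names mac and mac4, the lane count of four, and the 64-bit accumulators are assumptions made for the example. The loop in mac4 stands in for what a 4-way SIMD machine would do in a single instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4   /* assumed lane count, e.g. four 16-bit data per 64-bit register */

/* Scalar multiply-and-accumulate: returns acc + a * b. */
int64_t mac(int64_t acc, int16_t a, int16_t b)
{
    return acc + (int64_t)a * (int64_t)b;
}

/* Emulated 4-way MAC: each lane k performs acc[k] += a[k] * b[k].
 * On SIMD hardware the four lane operations would execute together. */
void mac4(int64_t acc[LANES], const int16_t a[LANES], const int16_t b[LANES])
{
    assert(acc != NULL && a != NULL && b != NULL);
    for (int k = 0; k < LANES; k++)
        acc[k] = mac(acc[k], a[k], b[k]);
}
```

For example, starting from accumulators {0, 1, 2, 3}, a call with a = {1, 2, 3, 4} and b = {5, 6, 7, 8} leaves {5, 13, 23, 35} in the accumulators.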
In a SIMD architecture, the MAC instruction performs the operation simultaneously on multiple sets of data. Typically, the registers used by a SIMD machine can store multiple data. For example, a 64-bit register (e.g., %1) can contain four 16-bit data (e.g., %1
Typically, a SIMD instruction set comprises a full complement of instructions for all math and logical operations, and for memory load and store operations. Specific instruction formats will vary from one manufacturer of processing unit to another. However, the same ideas of parallel operations are common among them.
The processing in In a step Similarly in subsequent MAC operations (steps In a step Next, various pointers are updated in a step Note that by setting the pointers ptrRend to the beginning of the vector R[ ] and ptrYnxt to the beginning of vector Ynxt[ ], the very first iteration through the foregoing steps produces the boundary condition computation shown in
The processing in Next, in a step Similar operations are performed in steps Registers are updated in a step A test is performed in a step
Referring to Similarly, the matrix operation shown in
The following assembly code fragment is provided merely to illustrate an example of an implementation of the processing shown in
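As a rough companion in C rather than assembly, the blocked computation can be pictured with the sketch below. It assumes the convolution form F[j] = Σ R[l]·I[j−l] for l = 0..j, a frame size that is a multiple of four, and a 4-way MAC emulated by the innermost loops. The names conv_scalar, conv_blocked, and LANES are hypothetical, and the arrangement of rows is one possible decomposition, not necessarily the one used in the figures.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4   /* assumed 4-way parallelism */

/* Scalar reference: F[j] = sum over l = 0..j of R[l] * I[j - l]. */
void conv_scalar(const int16_t *R, const int16_t *I, int64_t *F, int frm)
{
    for (int j = 0; j < frm; j++) {
        F[j] = 0;
        for (int l = 0; l <= j; l++)
            F[j] += (int64_t)R[l] * I[j - l];
    }
}

/* Blocked version: the four outputs F[m..m+3] are accumulated together,
 * so each R[l] is fetched once per block of four outputs instead of
 * once per output. */
void conv_blocked(const int16_t *R, const int16_t *I, int64_t *F, int frm)
{
    assert(frm % LANES == 0);
    for (int m = 0; m < frm; m += LANES) {
        int64_t acc[LANES] = {0, 0, 0, 0};

        /* Full rows: all four lanes have a valid impulse-response tap. */
        for (int l = 0; l <= m; l++)
            for (int k = 0; k < LANES; k++)          /* one emulated 4-way MAC */
                acc[k] += (int64_t)R[l] * I[m - l + k];

        /* Triangular rows: lanes with k < l - m have no valid tap
         * (the index into I would be negative), so they are skipped. */
        for (int l = m + 1; l < m + LANES; l++)
            for (int k = l - m; k < LANES; k++)
                acc[k] += (int64_t)R[l] * I[m - l + k];

        for (int k = 0; k < LANES; k++)
            F[m + k] = acc[k];
    }
}
```

Both functions produce identical results for the same inputs; the blocked form simply regroups the multiply-accumulate operations into rows of four so that each row maps onto one SIMD MAC.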
It can be seen that the generalized form shown in Conversely, if a SIMD architecture provides for 2-way parallelism, it can be appreciated that the matrix operations are nonetheless suited for 2-way parallel operations, albeit requiring two operations to perform. For example, operations using a 4×4 matrix (i.e.,
It is further noted that word size can determine the amount of parallelism attainable. Consider a 4-way SIMD, using 64-bit registers. A 16-bit data size results in a single MAC instruction per vector multiplication of a row in the matrix. However, an 8-bit data size would allow for two such multiplication operations to occur per MAC instruction. Conversely, a 32-bit data size would require two MAC instructions per matrix row.
It can be appreciated from the foregoing that varying degrees of parallelism and hence attainable performance gains can be achieved by a proper selection of SIMD parallelism and word size. The selection involves tradeoffs of available technology, system cost, performance goals such as speed, quality of synthesized speech, and the like. While such considerations may be particularly relevant to the specific implementation of the present invention, they are not germane to the invention itself.
The foregoing description of the present invention was presented using human speech as the source of analog signal being processed. It is noted that this is merely for convenience of explanation. It can be appreciated that any form of analog signal of bandwidth within the sampling capability of the system can be subject to the processing disclosed herein, and that the term “speech” can therefore be expanded to refer to any such analog signals.
It can be further appreciated that the specific arrangement which has been described is merely illustrative of one implementation of an embodiment according to the principles of the invention.
Numerous modifications may be made by those skilled in the art without departing from the true spirit and scope of the invention as set forth in the following claims.