US 5243685 A

Abstract

A method of breaking up a vocal signal into binary frames of a predetermined duration. The frames are grouped together in packets of successive frames, and a predictive filter is associated with each frame of a packet. Furthermore, the coefficients of each predictive filter are quantified by taking into account the stable or non-stable configuration of the vocal signal.
Claims (9)

1. A speech encoding method for the coding of very low bit rate vocoders, comprising the steps of:
cutting up a vocal signal into binary frames of a predetermined duration;
grouping together a predetermined number of frames in packets of successive frames;
quantifying the coefficients of a predetermined number of first predictive filters respectively associated with each frame in each packet;
quantifying the coefficients of at least one second predictive filter associated with a predetermined combination of frames;
selecting the predictive filter for which a predictive error is minimum; and
restoring said vocal signal as a speech signal as a function of the coefficients of said selected predictive filter.

2. A method according to claim 1, wherein the predetermined number of frames in a packet ranges from 2 to 4 inclusively.
3. A method according to any one of claims 1 or 2, wherein the number of combinations is four, eight or sixteen.
4. A method according to claim 3, wherein the choice of combinations is limited to four:
a first combination where the predictive filters are identical;
a second and a third combination where only two predictive filters are identical; and
a fourth combination where all three predictive filters are different.

5. A method according to claim 4, wherein, for each combination, the prediction coefficients and the energy of the prediction error are computed to select only the prediction coefficients for which the prediction error is minimal.
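The selection rule of claims 4 and 5 (evaluate the prediction error of each filter-sharing combination over the packet, and keep the combination whose total error is minimal) can be sketched as follows. This is an illustrative sketch only: `prediction_error` is a hypothetical stand-in for the error-energy computation of claims 5 and 6, and the frames are represented by placeholder values rather than real sampled signals.

```python
from typing import Callable, List, Sequence, Tuple

# The four combinations of claim 4, written as groups of frame indices
# (0-based) that share one predictive filter within a 3-frame packet.
COMBINATIONS: List[List[Tuple[int, ...]]] = [
    [(0, 1, 2)],          # first: one filter for all three frames
    [(0, 1), (2,)],       # second: frames 1 and 2 share a filter
    [(0,), (1, 2)],       # third: frames 2 and 3 share a filter
    [(0,), (1,), (2,)],   # fourth: three distinct filters
]

def select_combination(
    frames: Sequence[float],
    prediction_error: Callable[[Tuple[float, ...]], float],
) -> int:
    """Return the index (0..3) of the combination whose total error is minimal."""
    def total(groups: List[Tuple[int, ...]]) -> float:
        # The error of a combination is the sum of the errors of its groups.
        return sum(prediction_error(tuple(frames[i] for i in g)) for g in groups)
    return min(range(len(COMBINATIONS)), key=lambda c: total(COMBINATIONS[c]))
```

With a toy error measure, two similar frames followed by a dissimilar one select the second combination, while three dissimilar frames select the fourth.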
6. A method according to claim 5, wherein, for the computation of the prediction coefficients, a computation is made, in each frame, of the self-correlation coefficients R_{i,k} of the sampled vocal signal, and the algorithm of Leroux-Gueguen or of Schur is applied to determine the reflection coefficients of each predictive filter.

7. A method according to claim 6, wherein the reflection coefficients L_{i,j} of the filters are ten in number and are coded on a total length of 33 bits, irrespective of the combination.

8. A method according to claim 7, wherein the reflection coefficients L_{1} to L_{10} of the filters respectively have the following lengths:
(5,5,4,4,4,3,2,2,2,2) bits according to the first combination;
(5,4,4,3,3,2,2,2,2,0,0) bits and (3,2,2,1,0,0,0,0,0,0) bits according to the second and third combinations;
(4,4,3,3,3,2,2,0,0) bits for the coding of the intermediate frame (frame 2) according to the fourth combination; and
(3,2,2,1,1,0,0,0,0,0,0) bits for the other two frames (frames 1 and 3) according to the fourth combination.

9. A method according to claim 6, wherein the reflection coefficients of the filters are determined by the relationship:
##EQU1## wherein L_{i,j} represents the reflection coefficients and K_{i,j} represents the prediction coefficients.

Description

1. Field of the Invention

The present invention concerns a method and a device for coding predictive filters for very low bit rate vocoders.

2. Description of the Prior Art

The best known of the methods of digitization of speech at low bit rate is the LPC10, or "linear predictive coding, order 10", method. In this method, speech synthesis is achieved by the excitation of a filter through a periodic signal or a noise source, the function of this filter being to give the frequency spectrum of the signal a waveform close to that of the original speech signal. The major part of the bit rate, which is 2400 bits per second, is devoted to the transmission of the coefficients of the filter. To this end, the binary train is cut up into 22.5 millisecond frames comprising 54 bits, 41 of which are used to adapt the transfer function of the filter.

A known method of bit rate reduction consists in compressing the 41 bits associated with a filter into 10 to 12 bits representing the number of a pre-defined filter belonging to a dictionary of 2^{10} to 2^{12} filters.

Just as, in television, the reconstruction of a color image depends essentially on the quality of the luminance signal and not on that of the chrominance signal, which may consequently be transmitted with a lower definition, so too, in speech synthesis, it appears to be enough to reproduce only the contour of the energy of the vocal signal, while its timbre (voicing, spectral shape) is less important for its reconstruction. Consequently, in known speech synthesis methods, the process of searching for spectra, based on minimizing the distance between the spectra of the original speech (of the speaker) and the synthetic speech, is not wholly warranted.
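The LPC10 figures just quoted can be checked with a little arithmetic. The resulting packet-mode figure below is our own illustrative estimate, assuming the 33-bits-per-packet filter allocation of claims 7 and 8 while leaving the other 13 bits per frame unchanged; it is not a rate stated in the patent.

```python
# LPC10 figures from the text above.
FRAME_MS = 22.5        # frame duration in milliseconds
FRAME_BITS = 54        # bits per LPC10 frame
FILTER_BITS = 41       # bits devoted to the filter coefficients per frame

lpc10_rate = FRAME_BITS / (FRAME_MS / 1000.0)   # bits per second
assert round(lpc10_rate) == 2400                 # matches the stated bit rate

# Hypothetical packet mode: 33 bits of reflection coefficients per
# 3-frame packet (claims 7-8) instead of 41 bits per frame.
PACKET_FRAMES = 3
PACKET_FILTER_BITS = 33
packet_bits = (FRAME_BITS - FILTER_BITS) * PACKET_FRAMES + PACKET_FILTER_BITS
packet_rate = packet_bits / (PACKET_FRAMES * FRAME_MS / 1000.0)
print(round(packet_rate))   # illustrative estimate, about 1067 bits per second
```

The point of the arithmetic is only that cutting the filter budget from 123 bits to 33 bits per packet removes the bulk of the 2400 bit/s stream.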
For example, different renderings of the sound "A" pronounced by different speakers or recorded under different conditions may have a high spectral distance, but they will always remain "A"s that can be recognized as such; and if there is any ambiguity, in terms of a possible confusion with a neighboring sound, the listener can always make the correction himself from the context.

In fact, experience shows that by devoting no more than about 30 bits to the coefficients of the predictive filter instead of 41, the quality of restitution remains satisfactory, even if a trained listener may perceive a slight difference between sounds synthesized with predictive coefficients defined on 30 or on 41 bits. Furthermore, since the transmission is done at a distance, and since the intended listener is therefore not in a position to make out this difference, it appears to be enough for the listener to be capable of understanding the synthesized sound accurately. It also appears to be important that, in the stable parts of the signal (the vowels), the predictive filter should remain stable and be as close as possible to the original predictive filter. By contrast, in the unstable parts (such as transitions or unvoiced sounds), the transmitted predictor does not need to be a faithful copy of the original predictor.

It is an aim of the invention to overcome the above-mentioned drawbacks. To this end, an object of the invention is a method for the coding of predictive filters of very low bit rate vocoders of the type in which the vocal signal is cut up into binary frames of a determined duration, wherein the method consists in grouping together the frames in packets of successive frames, in associating a predictive filter with each frame contained in a packet, and in quantifying the coefficients of each predictive filter while taking account of the stable or non-stable configuration of the vocal signal.
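The first computational step named in claim 6 derives the reflection coefficients of a frame's predictive filter from its self-correlation coefficients, using the Leroux-Gueguen or Schur algorithm. Neither algorithm is reproduced in this text, so as a stand-in the sketch below uses the classic Levinson-Durbin recursion, which produces the same reflection coefficients and residual error energy from the same autocorrelation input; it is our own illustration, not code from the patent.

```python
from typing import List, Tuple

def reflection_coefficients(r: List[float], order: int) -> Tuple[List[float], float]:
    """Levinson-Durbin recursion: return the reflection coefficients
    k_1..k_order and the residual prediction-error energy, given the
    autocorrelation values r[0..order] of one frame."""
    a = [0.0] * (order + 1)        # direct-form predictor coefficients
    energy = r[0]                   # prediction-error energy, updated per order
    k_list: List[float] = []
    for p in range(1, order + 1):
        acc = r[p] + sum(a[j] * r[p - j] for j in range(1, p))
        k = -acc / energy           # reflection coefficient of order p
        k_list.append(k)
        new_a = a[:]
        new_a[p] = k
        for j in range(1, p):       # step-up update of the direct-form coefficients
            new_a[j] = a[j] + k * a[p - j]
        a = new_a
        energy *= (1.0 - k * k)     # each order shrinks the residual energy
    return k_list, energy
```

For an exactly first-order signal with r = [1.0, 0.5, 0.25], the first reflection coefficient is -0.5, the second vanishes, and the residual energy is 0.75.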
Other characteristics and advantages of the invention will appear from the following description, made with reference to the appended drawings, of which:

FIG. 1 is a block diagram of a prior art speech synthesizer;

FIG. 2 shows, in the form of tables, the four possible codings of the predictive filters of the vocoder according to the invention;

FIG. 3 is a flow chart used to illustrate the computation of the prediction error of the predictive filters applied by the invention;

FIG. 4 shows a graph of transformation of the reflection coefficients of the predictive filters;

FIG. 5 represents the relationship of quantification of the reflection coefficients of the filters transformed by the graph of FIG. 4;

FIG. 6 shows a device for the application of the method according to the invention.

The speech synthesizer shown in FIG. 1 includes, in a known way, a predictive filter 1 coupled by its input E
L_{i,j}, the graph of which is shown in FIG. 4, or again according to the relationships:

(relationships expressing L_{i,j} as a function of K_{i,j}), or again by application of the LSP coefficients computing method described by George S. Kang and Lawrence J. Fransen in the article "Application of Line Spectrum Pairs to Low Bit Rate Speech Encoder", Naval Research Laboratory, DC 20375, 1985.

At the fourth step, shown at 8, the coefficients L_{i,j} are converted back into the reflection coefficients K_{i,j}. These values K_{i,j} are then used to compute the prediction coefficients A_{p,j} by the recursion:

A_{p,j} = A_{p-1,j} + K_{i,p} A_{p-1,p-j} for p = 1, 2, . . . , 10, with

A_{p,p} = K_{i,p}.
Finally, at the last step, shown at 10, the energy of the prediction error is computed by the application of the following relationship: ##EQU2##

To complete the algorithm, it is enough then to test the four different configurations described above by interposing an additional step between the first and second steps of the method, said additional step taking account of the possible configurations, so as to finally retain only the configuration for which the total prediction error obtained (summed over the three frames) is minimal.

In the first configuration, the same filter is used for all three frames. Then, for the progress of steps 2 to 6, a single fictitious fourth filter is used. This fourth filter is computed from the coefficients R_j obtained by summing the self-correlation coefficients of the three frames, with j varying from 0 to 10. The total prediction error is then the error of this single filter. The coefficients L_1 to L_10 may then be quantified with, for example, 5, 5, 4, 4, 4, 3, 2, 2, 2, 2 bits respectively, giving 33 bits in all.

According to the second configuration, in which one and the same filter is used for the frames 1 and 2, the algorithm is performed with values of the self-correlation coefficients
R_j, where j successively takes the values of 1 to 10, summed over the first two frames, and with the coefficients of the third frame taken separately. The total prediction error is then the sum of the errors of the two filters. The fact of not transmitting the coefficients L_{i,j} of one of the first two frames makes it possible to code the remaining coefficients more finely.

In the third configuration, where the same filter is used for the frames 2 and 3, the same method as in the second configuration is used, in grouping together the coefficients R_j of the frames 2 and 3.

Finally, for the last configuration, where all the filters are different, it must be considered that the three frames are uncoupled and that the total error is equal to the sum of the three individual prediction errors.

The device for the implementation of the method, which is shown in FIG. 6, includes a device 1 for the computation of the self-correlation coefficients for each frame, coupled with delay elements formed by three frame memories 12.

It goes without saying that the invention is not restricted to the examples just described, and that it can take other alternative embodiments depending, notably, on the coefficients that are applied to the filters, which may be other than the coefficients L_{i,j}.
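The per-configuration bookkeeping described above (summing, term by term, the self-correlation coefficients of the frames that share one filter, then charging that group the residual energy of the single resulting filter) can be sketched as follows. This is an illustrative sketch only: `residual_energy` is a hypothetical stand-in for the ##EQU2## computation, which is not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple

def grouped_error(
    frame_acfs: Sequence[Sequence[float]],
    groups: List[Tuple[int, ...]],
    residual_energy: Callable[[List[float]], float],
) -> float:
    """Total prediction error for one filter-sharing configuration.

    frame_acfs: one autocorrelation sequence R[0..n] per frame of the packet.
    groups: frame-index groups sharing one filter, e.g. [(0, 1), (2,)].
    """
    total = 0.0
    n = len(frame_acfs[0])
    for group in groups:
        # Sum R_j over the frames that share one predictive filter.
        summed = [sum(frame_acfs[f][j] for f in group) for j in range(n)]
        # Charge the group the residual energy of the single shared filter.
        total += residual_energy(summed)
    return total
```

Calling this once per configuration and taking the minimum reproduces the configuration test interposed between the first and second steps of the method.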