CA2251509C - Feature extraction apparatus and method and pattern recognition apparatus and method - Google Patents

Feature extraction apparatus and method and pattern recognition apparatus and method

Info

Publication number
CA2251509C
CA2251509C (application CA002251509A)
Authority
CA
Canada
Prior art keywords
feature
vector
distribution
space
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CA002251509A
Other languages
French (fr)
Other versions
CA2251509A1 (en)
Inventor
Naoto Iwahashi
Bao Hongchang
Hitoshi Honda
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Corp
Original Assignee
Sony Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Corp filed Critical Sony Corp
Publication of CA2251509A1 publication Critical patent/CA2251509A1/en
Application granted granted Critical
Publication of CA2251509C publication Critical patent/CA2251509C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech

Abstract

It is intended to increase the recognition rate in speech recognition and image recognition. An observation vector as input data, which represents a certain point in the observation vector space, is mapped to a distribution having a spread in the feature vector space, and a feature distribution parameter representing the distribution is determined. Pattern recognition of the input data is performed based on the feature distribution parameter.

Description

FEATURE EXTRACTION APPARATUS AND METHOD AND
PATTERN RECOGNITION APPARATUS AND METHOD
The present invention relates to a feature extraction apparatus and method and a pattern recognition apparatus and method. In particular, the invention relates to a feature extraction apparatus and method and a pattern recognition apparatus and method which are suitable for use in a case where speech recognition is performed in a noise environment.
DESCRIPTION OF THE RELATED ART
Fig. 1 shows an example configuration of a conventional pattern recognition apparatus.
An observation vector as a pattern recognition object is input to a feature extraction section 101. The feature extraction section 101 determines, based on the observation vector, a feature vector that represents its feature quantity.
The feature vector thus determined is supplied to a discrimination section 102. Based on the feature vector supplied from the feature extraction section 101, the discrimination section 102 judges which of a predetermined number of classes the input observation vector belongs to.
For example, where the pattern recognition apparatus of Fig. 1 is a speech recognition apparatus, speech data of each time unit (hereinafter referred to as a frame where appropriate) is input to the feature extraction section 101 as an observation vector. The feature extraction section 101 acoustically analyzes the speech data as the observation vector, and thereby extracts a feature vector as a feature quantity of speech such as a power spectrum, cepstrum coefficients, or linear prediction coefficients. The feature vector is supplied to the discrimination section 102. The discrimination section 102 classifies the feature vector as one of a predetermined number of classes. A classification result is output as a recognition result of the speech data (observation vector).
Known methods by which the discrimination section 102 judges which one of a predetermined number of classes a feature vector belongs to include a method using a Mahalanobis discriminant function, a mixed normal distribution function, or a polynomial function; a method using a hidden Markov model (HMM); and a method using a neural network.
For example, the details of the above speech recognition techniques are disclosed in "Fundamentals of Speech Recognition (I) and (II)," co-authored by L. Rabiner and B.-H. Juang, translation supervised by Furui, NTT Advanced Technology Corp., 1995. As for general pattern recognition, detailed descriptions are made in, for example, R. Duda and P. Hart, "Pattern Classification and Scene Analysis," John Wiley & Sons, 1973.
Incidentally, when pattern recognition is performed, an observation vector (input pattern) as a pattern recognition object generally includes noise. For example, a voice as an observation vector that is input when speech recognition is performed includes noise of the environment of a user's speech (e.g., voices of other persons or noise of a car). To give another example, an image as an observation vector that is input when image recognition is performed includes noise of the photographing environment of the image (e.g., noise relating to weather conditions such as mist or rain, or noise due to lens aberrations of a camera for photographing the image).
Spectral subtraction is known as one of feature quantity (feature vector) extraction methods that are used in a case of recognizing voices in a noise environment.
In the spectral subtraction, an input before occurrence of a voice (i.e., an input before a speech portion) is employed as noise and an average spectrum of the noise is calculated.
Upon subsequent input of a voice, the noise average spectrum is subtracted from the voice and a feature vector is calculated by using a remaining component as a true voice component.
For example, the details of the spectral subtraction are disclosed in S. F. Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-27, No. 2, 1979; and P. Lockwood and J. Boudy, "Experiments with a Nonlinear Spectral Subtractor, Hidden Markov Models and the Projection, for Robust Speech Recognition in Cars," Speech Communication, Vol. 11, 1992.
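The spectral subtraction described above can be sketched roughly as follows. This is a minimal illustration, not the method of either cited paper: it assumes power spectra are already available as NumPy arrays, and all names are hypothetical. An average noise spectrum estimated from pre-speech frames is subtracted from each speech frame, and negative remainders are floored at a small positive value.

```python
import numpy as np

def spectral_subtraction(noisy_spectra, noise_spectra, floor=1e-10):
    """Subtract the average noise power spectrum from each frame's
    power spectrum, flooring negative results at a small positive value."""
    noise_mean = noise_spectra.mean(axis=0)   # average noise spectrum
    cleaned = noisy_spectra - noise_mean      # subtract per frequency bin
    return np.maximum(cleaned, floor)         # clamp to avoid negative power

# Toy example: 20 pre-speech (noise-only) frames, then 5 speech frames,
# each with 4 frequency bins
rng = np.random.default_rng(0)
noise = rng.uniform(0.5, 1.5, size=(20, 4))
voice = np.array([[3.0, 2.0, 1.0, 0.5]] * 5)
noisy = voice + rng.uniform(0.5, 1.5, size=(5, 4))
estimate = spectral_subtraction(noisy, noise)
```

Note that the returned spectrum is a single point estimate per frame; as the patent goes on to argue, it carries the average noise characteristics but not the noise variance.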
Incidentally, it can be considered that the feature extraction section 101 of the pattern recognition apparatus of Fig. 1 executes a process that an observation vector a representing a certain point in the observation vector space is mapped to (converted into) a feature vector y representing a corresponding point in the feature vector space as shown in Fig. 2.
Therefore, the feature vector y represents a certain point (corresponding to the observation vector a) in the feature vector space. In Fig. 2, each of the observation vector space and the feature vector space is drawn as a three-dimensional space.
In the spectral subtraction, an average noise component spectrum is subtracted from the observation vector a and then the feature vector y is calculated. However, since the feature vector y represents one point in the feature vector space as described above, the feature vector y does not reflect characteristics representing irregularity of the noise such as variance though it reflects the average characteristics of the noise.
Therefore, the feature vector y does not sufficiently reflect the features of the observation vector a, and hence it is difficult to obtain a high recognition rate with such a feature vector y.
The present invention has been made in view of the above circumstances, and an object of the invention is therefore to increase the recognition rate.
According to a first aspect of the invention, there is provided a feature extraction apparatus which extracts a feature quantity of input data, comprising calculating means for calculating a feature distribution parameter representing a distribution that is obtained when mapping of the input data is made to a space of a feature quantity of the input data.
According to a second aspect of the invention, there is provided a feature extraction method for extracting a feature quantity of input data, comprising the step of calculating a feature distribution parameter representing a distribution that is obtained when mapping of the input data is made to a space of a feature quantity of the input data.
According to a third aspect of the invention, there is provided a pattern recognition apparatus which recognizes a pattern of input data by classifying it as one of a predetermined number of classes, comprising calculating meansforcalculating a feature distribution parameter representing a distribution that is obtained when mapping of the input data is made to a space of a feature quantity of the input data; and classifying means for classifying the feature distribution parameter as one of the predetermined number of classes.
According to a fourth aspect of the invention, there is provided a pattern recognition method for recognizing a pattern of input data by classifying it as one of a predetermined number of classes, comprising the steps of calculating a feature distribution parameter representing a distribution that is obtained when mapping of the input data is made to a space of a feature quantity of the input data; and classifying the feature distribution parameter as one of the predetermined number of classes.
According to a fifth aspect of the invention, there is provided a pattern recognition apparatus which recognizes a pattern of input data by classifying it as one of a predetermined number of classes, comprising framing means for extracting parts of the input data at predetermined intervals, and outputting each extracted data as 1-frame data; feature extracting means receiving the 1-frame data of each extracted data, for outputting a feature distribution parameter representing a distribution that is obtained when mapping of the 1-frame data is made to a space of a feature quantity of the 1-frame data; and classifying means for classifying a series of feature distribution parameters as one of the predetermined number of classes.
According to a sixth aspect of the invention, there is provided a pattern recognition method for recognizing a pattern of input data by classifying it as one of a predetermined number of classes, comprising a framing step of extracting parts of the input data at predetermined intervals, and outputting each extracted data as 1-frame data; a feature extracting step of receiving the 1-frame data of each extracted data, and outputting a feature distribution parameter representing a distribution that is obtained when mapping of the 1-frame data is made to a space of a feature quantity of the 1-frame data;
and a classifying step of classifying a series of feature distribution parameters as one of the predetermined number of classes.
In the feature extraction apparatus according to the first aspect of the invention, the calculating means calculates a feature distribution parameter representing a distribution that is obtained when mapping of the input data is made to a space of a feature quantity of the input data.
In the feature extraction method according to the second aspect of the invention, a feature distribution parameter representing a distribution that is obtained when mapping of the input data is made to a space of a feature quantity of the input data is calculated.
In the pattern recognition apparatus according to the third aspect of the invention, the calculating means calculates a feature distribution parameter representing a distribution that is obtained when mapping of the input data is made to a space of a feature quantity of the input data, and the classifying means classifies the feature distribution parameter as one of the predetermined number of classes.
In the pattern recognition method according to the fourth aspect of the invention, a feature distribution parameter representing a distribution that is obtained when mapping of the input data is made to a space of a feature quantity of the input data is calculated, and the feature distribution parameter is classified as one of the predetermined number of classes.
In a pattern recognition apparatus according to the fifth aspect of the invention, which recognizes a pattern of input data by classifying it as one of a predetermined number of classes, parts of the input data are extracted at predetermined intervals, and each extracted data is output as 1-frame data. A feature distribution parameter representing a distribution that is obtained when mapping of the 1-frame data of each extracted data is made to a space of a feature quantity of the 1-frame data is output. Then, a series of feature distribution parameters is classified as one of the predetermined number of classes.
In a pattern recognition method according to the sixth aspect of the invention for recognizing a pattern of input data by classifying it as one of a predetermined number of classes, parts of the input data are extracted at predetermined intervals, and each extracted data is output as 1-frame data. A feature distribution parameter representing a distribution that is obtained when mapping of the 1-frame data of each extracted data is made to a space of a feature quantity of the 1-frame data is output. Then, a series of feature distribution parameters is classified as one of the predetermined number of classes.
A pattern recognition apparatus, comprising: means for inputting a digital speech signal containing a true voice component and noise; a framing section for separating parts of said speech signal at predetermined sampling time intervals and outputting a plurality (T) of frames, each frame representing an observation vector a(t) containing information regarding said part of said speech signal and said sampling time intervals; a feature extraction section for mapping said observation vector a(t) representing one point in the observation vector space to a spread of points in the feature vector space, said feature extraction section including: means for extracting, based on acoustic analysis of said observation vector a(t), a feature vector y(t) representative of a feature quantity of said speech signal; means for estimating a noise vector u(t); means for calculating a feature distribution parameter Z(t) = y(t) - u(t), (t = 1, 2, ..., T), representing a distribution of estimated values for said true voice component in the feature vector space; and a discrimination section including: a discriminant calculation unit for receiving said feature distribution parameter Z(t) and calculating, based on discriminant functions gk{Z(t)}, class probability values corresponding to "K" classes, each class corresponding to a word; and a decision unit for storing and comparing said class probability values and declaring the largest class probability value as a voice recognition result.
A pattern recognition method, comprising: inputting a digital speech signal containing a true voice component and noise; separating parts of said speech signal at predetermined sampling time intervals and outputting a plurality (T) of frames, each frame representing an observation vector a(t) containing information regarding said part of said speech signal and said sampling time intervals; extracting, based on acoustic analysis of said observation vector a(t), a feature vector y(t) representative of a feature quantity of said speech signal; estimating a noise vector u(t); calculating a feature distribution parameter Z(t) = y(t) - u(t), (t = 1, 2, ..., T), representing a distribution of estimated values for said true voice component in the feature vector space; computing, based on discriminant functions gk{Z(t)}, class probability values corresponding to "K" classes, each class corresponding to a word; and storing and comparing said class probability values and declaring the largest class probability value as a voice recognition result.
Fig. 1 is a block diagram showing an example configuration of a conventional pattern recognition apparatus;
Fig. 2 illustrates a process of a feature extraction section 101 shown in Fig. 1;
Fig. 3 is a block diagram showing an example configuration of a speech recognition apparatus according to an embodiment of the present invention;
Fig. 4 illustrates a process of a framing section 1 shown in Fig. 3;
Fig. 5 illustrates a process of a feature extraction section 2 shown in Fig. 3;
Fig. 6 is a block diagram showing an example configuration of the feature extraction section 2 shown in Fig. 3;
Figs. 7A and 7B show probability density functions of a noise power spectrum and a true voice power spectrum;
Fig. 8 is a block diagram showing an example configuration of a discrimination section 3 shown in Fig. 3;
Fig. 9 shows an HMM; and Fig. 10 is a block diagram showing another example configuration of the feature extraction section 2 shown in Fig. 3.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Fig. 3 shows an example configuration of a speech recognition apparatus according to an embodiment of the present invention.
Digital speech data as a recognition object is input to a framing section 1. For example, as shown in Fig. 4, the framing section 1 extracts parts of received speech data at predetermined time intervals (e.g., 10 ms; this operation is called framing) and outputs each extracted speech data as 1-frame data. Each 1-frame speech data that is output from the framing section 1 is supplied to a feature extraction section 2 in the form of an observation vector a having the respective time-series speech data constituting the frame as components.
In the following, an observation vector as speech data of a t-th frame is represented by a(t), where appropriate.
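The framing operation above can be sketched as follows. This is a minimal illustration with assumed values (25 ms frames extracted every 10 ms from a 16 kHz signal; the patent specifies only the 10 ms interval), and the function name is hypothetical.

```python
import numpy as np

def frame_signal(samples, frame_len, hop):
    """Split a 1-D signal into overlapping frames (observation vectors a(t)).
    Trailing samples that do not fill a complete frame are discarded."""
    n_frames = 1 + (len(samples) - frame_len) // hop
    return np.stack([samples[t * hop : t * hop + frame_len]
                     for t in range(n_frames)])

# One second of (silent) 16 kHz audio, framed at 10 ms intervals
fs = 16000
signal = np.zeros(fs)
frames = frame_signal(signal, frame_len=int(0.025 * fs), hop=int(0.010 * fs))
```

Each row of `frames` then plays the role of one observation vector a(t) fed to the feature extraction section.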
The feature extraction section 2 (calculating means) acoustically analyzes the speech data as the observation vector a that is supplied from the framing section 1 and thereby extracts a feature quantity from the speech data. For example, the feature extraction section 2 determines a power spectrum of the speech data as the observation vector a by Fourier-transforming it, and calculates a feature vector y having the respective frequency components of the power spectrum as components. The method of calculating a power spectrum is not limited to Fourier transform; a power spectrum can be determined by other methods such as a filter bank method.
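The Fourier-transform step can be illustrated as follows. This is only a sketch of the general technique, not the patent's implementation; the frame length and the pure-tone test frame are assumptions.

```python
import numpy as np

def power_spectrum(frame):
    """Feature vector y: squared magnitude of the FFT of one frame,
    keeping only the non-negative frequency bins."""
    spectrum = np.fft.rfft(frame)
    return np.abs(spectrum) ** 2

# A 64-sample frame containing a pure tone at frequency bin 5
frame = np.sin(2 * np.pi * 5 * np.arange(64) / 64)
y = power_spectrum(frame)
```

For a 64-sample frame this yields a 33-component feature vector (D = 33), with the energy of the tone concentrated in bin 5.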
Further, the feature extraction section 2 calculates, based on the above-calculated feature vector y, a parameter (hereinafter referred to as a feature distribution parameter) Z that represents a distribution, in the space of a feature quantity (i.e., feature vector space), obtained when a true voice included in the speech data as the observation vector a is mapped to points in the feature vector space, and supplies the parameter Z to a discrimination section 3.
That is, as shown in Fig. 5, the feature extraction section 2 calculates and outputs, as a feature distribution parameter, a parameter that represents a distribution having a spread in the feature vector space, obtained by mapping an observation vector a representing a certain point in the observation vector space to the feature vector space.
Although in Fig. 5 each of the observation vector space and the feature vector space is drawn as a three-dimensional space, the respective numbers of dimensions of the observation vector space and the feature vector space are not limited to three and even need not be the same.
The discrimination section 3 (classifying means) classifies each of the feature distribution parameters (a series of parameters) that are supplied from the feature extraction section 2 as one of a predetermined number of classes, and outputs a classification result as a recognition result of the input voice. For example, the discrimination section 3 stores discriminant functions to be used for judging which of the classes corresponding to a predetermined number K of words a discrimination object belongs to, and calculates values of the discriminant functions of the respective classes by using, as an argument, the feature distribution parameter that is supplied from the feature extraction section 2. A class (in this case, a word) having the largest function value is output as a recognition result of the voice as the observation vector a.
Next, the operation of the above apparatus will be described.
The framing section 1 frames input digital speech data as a recognition object. Observation vectors a of speech data of respective frames are sequentially supplied to the feature extraction section 2. The feature extraction section 2 determines a feature vector y by acoustically analyzing the speech data as the observation vector a that is supplied from the framing section 1. Further, based on the feature vector y thus determined, the feature extraction section 2 calculates a feature distribution parameter Z that represents a distribution in the feature vector space, and supplies it to the discrimination section 3. The discrimination section 3 calculates, by using the feature distribution parameter supplied from the feature extraction section 2, values of the discriminant functions of the respective classes corresponding to the predetermined number K of words, and outputs a class having the largest function value as a recognition result of the voice.
Since speech data as an observation vector a is converted into a feature distribution parameter Z that represents a distribution in the feature vector space (the space of a feature quantity of speech data) as described above, the feature distribution parameter Z reflects distribution characteristics of noise included in the speech data. Further, since the voice is recognized based on such a feature distribution parameter Z, the recognition rate can greatly be increased.
Fig. 6 shows an example configuration of the feature extraction section 2 shown in Fig. 3.
An observation vector a is supplied to a power spectrum analyzer 12. The power spectrum analyzer 12 Fourier-transforms the observation vector a according to, for instance, an FFT (fast Fourier transform) algorithm, and thereby determines (extracts), as a feature vector, a power spectrum that is a feature quantity of the voice. It is assumed here that an observation vector a as speech data of one frame is converted into a feature vector that consists of D components (i.e., a D-dimensional feature vector).
Now, a feature vector obtained from an observation vector a(t) of a t-th frame is represented by y(t). Further, a true voice component spectrum and a noise component spectrum of the feature vector y(t) are represented by x(t) and u(t), respectively. In this case, the component spectrum x(t) of the true voice is given by

x(t) = y(t) − u(t) ..... (1)

where it is assumed that the noise has irregular characteristics and that the speech data as the observation vector a(t) is the sum of the true voice component and the noise.
Since the noise u(t) has irregular characteristics, u(t) is a random variable and hence x(t), which is given by Equation (1), is also a random variable. Therefore, for example, if the noise power spectrum has the probability density function shown in Fig. 7A, the probability density function of the power spectrum of the true voice is given as shown in Fig. 7B according to Equation (1). The probability that the power spectrum of the true voice has a certain value is obtained by multiplying, by a normalization factor that makes the probability distribution of the true voice have an area of unity, the probability that the noise power spectrum has the value obtained by subtracting the above value of the power spectrum of the true voice from the power spectrum of the input voice (input signal).
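The relation between Figs. 7A and 7B can be checked numerically. The sketch below, which assumes a hypothetical normal noise density, samples the noise power u, forms x = y − u for a fixed observed power y, and keeps only the physically valid values (noise power between 0 and y); the retained samples follow exactly the renormalized true-voice distribution the text describes.

```python
import numpy as np

rng = np.random.default_rng(1)
y = 5.0                                            # fixed observed power
u = rng.normal(loc=2.0, scale=0.5, size=100_000)   # hypothetical noise power
u = u[(u >= 0) & (u <= y)]                         # power cannot be negative,
                                                   # nor exceed the observation
x = y - u                                          # candidate true-voice powers
```

The histogram of `x` is the mirror image of the retained noise density, rescaled to unit area, matching the construction in the text.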
Figs. 7A and 7B are drawn with the assumption that the number of components of each of u(t), x(t), and y(t) is one (D = 1).
Returning to Fig. 6, the feature vector y(t) obtained by the power spectrum analyzer 12 is supplied to a switch 13. The switch 13 selects one of terminals 13a and 13b under the control of a speech portion detection section 11.
The speech portion detection section 11 detects a speech portion (i.e., a period during which a user is speaking). For example, the details of a method of detecting a speech portion are disclosed in J.-C. Junqua, B. Mak, and B. Reaves, "A Robust Algorithm for Word Boundary Detection in the Presence of Noise," IEEE Transactions on Speech and Audio Processing, Vol. 2, No. 3, 1994.
A speech portion can be recognized in other ways, for example, by providing a proper button in the speech recognition apparatus and having a user manipulate the button while he is speaking.
The speech portion detection section 11 controls the switch 13 so that it selects the terminal 13b in speech portions and the terminal 13a in the other sections (hereinafter referred to as non-speech portions where appropriate).
Therefore, in a non-speech portion, the switch 13 selects the terminal 13a, whereby an output of the power spectrum analyzer 12 is supplied to a noise characteristics calculator 14 via the switch 13. The noise characteristics calculator 14 calculates noise characteristics in a speech portion based on the output of the power spectrum analyzer 12 in the non-speech portion.
In this example, the noise characteristics calculator 14 determines average values (an average vector) and variance (a variance matrix) of the noise, with the assumptions that the noise power spectrum u(t) in a certain speech portion has the same distribution as in the non-speech portion immediately preceding that speech portion and that the distribution is a normal distribution.
Specifically, assuming that the first frame of the speech portion is a No. 1 frame (t = 1), an average vector μ′ and a variance matrix Σ′ of the outputs y(−200) to y(−101) of the power spectrum analyzer 12 over 100 frames (from the frame preceding the speech portion by 200 frames to the frame preceding it by 101 frames) are determined as the noise characteristics in the speech portion.
The average vector μ′ and the variance matrix Σ′ can be determined according to

μ′(i) = (1/100) Σ_{t=−200}^{−101} y(t)(i)

Σ′(i, j) = (1/100) Σ_{t=−200}^{−101} (y(t)(i) − μ′(i)) (y(t)(j) − μ′(j)) ..... (2)

where μ′(i) represents an ith component of the average vector μ′ (i = 1, 2, ..., D), y(t)(i) represents an ith component of the feature vector of a t-th frame, and Σ′(i, j) represents an ith-row, jth-column component of the variance matrix Σ′ (j = 1, 2, ..., D).
Here, to reduce the amount of calculation, it is assumed that for the noise the components of the feature vector y have no mutual correlation. In this case, the components other than the diagonal components of the variance matrix Σ′ are zero, as expressed by

Σ′(i, j) = 0, i ≠ j ..... (3)

The noise characteristics calculator 14 determines the average vector μ′ and the variance matrix Σ′ as the noise characteristics in the above-described manner and supplies them to a feature distribution parameter calculator 15.
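A minimal numerical sketch of Equations (2) and (3) follows, assuming the 100 non-speech power-spectrum frames are collected in a NumPy array; all names are hypothetical.

```python
import numpy as np

def noise_characteristics(noise_frames):
    """Estimate the noise average vector (Equation (2)) and a diagonal
    variance matrix (Equation (3)) from non-speech power-spectrum frames."""
    mu = noise_frames.mean(axis=0)     # average vector over the 100 frames
    var = noise_frames.var(axis=0)     # per-component variance
    sigma = np.diag(var)               # off-diagonal components forced to 0
    return mu, sigma

# 100 non-speech frames, D = 8 frequency components (synthetic data)
rng = np.random.default_rng(2)
frames = rng.normal(loc=1.0, scale=0.2, size=(100, 8)) ** 2
mu_p, sigma_p = noise_characteristics(frames)
```

The diagonal form of `sigma_p` is exactly the no-correlation simplification of Equation (3).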
On the other hand, the switch 13 selects the terminal 13b in the speech portion, whereby an output of the power spectrum analyzer 12, that is, a feature vector y as speech data including a true voice and noise, is supplied to the feature distribution parameter calculator 15 via the switch 13. Based on the feature vector y that is supplied from the power spectrum analyzer 12 and the noise characteristics that are supplied from the noise characteristics calculator 14, the feature distribution parameter calculator 15 calculates a feature distribution parameter that represents a distribution of the power spectrum of the true voice (a distribution of estimated values).
That is, with an assumption that the power spectrum of the true voice has a normal distribution, the feature distribution parameter calculator 15 calculates, as a feature distribution parameter, an average vector ξ and a variance matrix Ψ of the distribution according to the following formulae:

ξ(t)(i) = E[x(t)(i)] = E[y(t)(i) − u(t)(i)]
 = ∫₀^{y(t)(i)} (y(t)(i) − u(t)(i)) P(u(t)(i)) du(t)(i) / ∫₀^{y(t)(i)} P(u(t)(i)) du(t)(i)
 = y(t)(i) − ∫₀^{y(t)(i)} u(t)(i) P(u(t)(i)) du(t)(i) / ∫₀^{y(t)(i)} P(u(t)(i)) du(t)(i) ..... (4)

If i = j,
Ψ(t)(i, j) = V[x(t)(i)] = E[(x(t)(i))²] − (E[x(t)(i)])² = E[(x(t)(i))²] − (ξ(t)(i))².
If i ≠ j,
Ψ(t)(i, j) = 0. ..... (5)

E[(x(t)(i))²] = E[(y(t)(i) − u(t)(i))²]
 = ∫₀^{y(t)(i)} (y(t)(i) − u(t)(i))² P(u(t)(i)) du(t)(i) / ∫₀^{y(t)(i)} P(u(t)(i)) du(t)(i)
 = (y(t)(i))² − 2y(t)(i) ∫₀^{y(t)(i)} u(t)(i) P(u(t)(i)) du(t)(i) / ∫₀^{y(t)(i)} P(u(t)(i)) du(t)(i)
   + ∫₀^{y(t)(i)} (u(t)(i))² P(u(t)(i)) du(t)(i) / ∫₀^{y(t)(i)} P(u(t)(i)) du(t)(i) ..... (6)

P(u(t)(i)) = (1 / √(2π Σ′(i, i))) exp{ −(u(t)(i) − μ′(i))² / (2 Σ′(i, i)) } ..... (7)

In the above formulae, ξ(t)(i) represents an ith component of the average vector ξ(t) of a t-th frame, E[·] means an average value of the variable in brackets "[]", and x(t)(i) represents an ith component of the power spectrum x(t) of the true voice of the t-th frame. Further, u(t)(i) represents an ith component of the noise power spectrum of the t-th frame, and P(u(t)(i)) represents a probability that the ith component of the noise power spectrum of the t-th frame is u(t)(i). In this example, since the noise distribution is assumed to be a normal distribution, P(u(t)(i)) is given by Equation (7).
Further, Ψ(t)(i, j) represents an ith-row, jth-column component of the variance matrix Ψ(t) of the t-th frame, and V[·] means the variance of the variable in brackets "[]".
In the above manner, the feature distribution parameter calculator 15 determines, for each frame, an average vector ξ and a variance matrix Ψ as a feature distribution parameter representing the distribution of the true voice in the feature vector space (i.e., a normal distribution, as the distribution of the true voice in the feature vector space is assumed to be).
Then, when the speech portion has finished, the switch 13 selects the terminal 13a, and the feature distribution parameter calculator 15 outputs the feature distribution parameters that have been determined for the respective frames in the speech portion to the discrimination section 3. That is, assuming that the speech portion consists of T frames and that the feature distribution parameter determined for each of the T frames is expressed as z(t) = {ξ(t), Ψ(t)} (t = 1, 2, ..., T), the feature distribution parameter calculator 15 supplies a feature distribution parameter (a series of parameters) Z = {z(1), z(2), ..., z(T)} to the discrimination section 3.
The feature extraction section 2 thereafter repeats similar processes.
Fig. 8 shows an example configuration of the discrimination section 3 shown in Fig. 3.
The feature distribution parameter Z that is supplied from the feature extraction section 2 (feature distribution parameter calculator 15) is supplied to K discriminant function calculation sections 211-21K. The discriminant function calculation section 21k stores a discriminant function gk(Z) for discrimination of the word corresponding to the kth class of the K classes (k = 1, 2, ..., K), and the discriminant function gk(Z) is calculated by using, as an argument, the feature distribution parameter Z that is supplied from the feature extraction section 2.
The discrimination section 3 determines a word as a class according to an HMM (hidden Markov model) method, for example.
In this embodiment, for example, an HMM shown in Fig. 9 is used. In this HMM, there are H states q1-qH, and only a self-transition and a transition to the right-adjacent state are permitted. The initial state is the leftmost state q1 and the final state is the rightmost state qH, and a state transition from the final state qH is prohibited. A model in which no transition occurs to states on the left of the current state is called a left-to-right model. A left-to-right model is generally employed in speech recognition.
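The left-to-right structure described above can be written as a transition matrix in which only the diagonal and the first superdiagonal are populated. The sketch below is purely illustrative: the state count H and the 0.5 transition values are arbitrary choices of mine, not values from the patent.

```python
import numpy as np

H = 4                      # number of states q1..qH (arbitrary for illustration)
A = np.zeros((H, H))       # A[i, j] = transition probability a(qi, qj)
for i in range(H - 1):
    A[i, i] = 0.5          # self-transition (illustrative value)
    A[i, i + 1] = 0.5      # transition to the right-adjacent state
A[H - 1, H - 1] = 1.0      # no transition out of the final state qH
pi = np.zeros(H)
pi[0] = 1.0                # transitions always start from the leftmost state q1
```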
Now, a model for discrimination of a kth class of the HMM is called a kth class model. For example, the kth class model is defined by a probability (initial state probability) πk(qh) that the initial state is a state qh, a probability (transition probability) ak(qi, qj) that a state qi is established at a certain time point (frame) t and a state transition to a state qj occurs at the next time point t+1, and a probability (output probability) bk(qi)(O) that a state qi outputs a feature vector O when a state transition occurs from the state qi (h = 1, 2, ..., H).
When a feature vector series O1, O2, ... is supplied, the class of a model having, for example, the highest probability (observation probability) that such a feature vector series is observed is selected as the recognition result for the feature vector series.
In this example, the observation probability is determined by using the discriminant function gk(Z). That is, the discriminant function gk(Z) is given by the following equation as a function for determining the probability that the feature distribution parameter (series) Z = {z1, z2, ..., zT} is observed in an optimum state series (i.e., an optimum manner of state transitions) for the feature distribution parameter (series) Z = {z1, z2, ..., zT}.
$$
g_k(Z) = \max_{q_1,\ldots,q_T}\; \pi_k(q_1)\,b'_k(q_1)(z_1)\,a_k(q_1,q_2)\,b'_k(q_2)(z_2)\cdots a_k(q_{T-1},q_T)\,b'_k(q_T)(z_T)
\qquad (8)
$$

In the above equation, b'k(qi)(zj) represents an output probability for an output having a distribution zj. In this embodiment, for example, an output probability bk(s)(Ot), which is the probability that each feature vector is output at a state transition, is expressed by a normal distribution function with the assumption that components in the feature vector space have no mutual correlation. In this case, when an input has a distribution zt, an output probability b'k(s)(zt) can be determined by the following equation, which includes a probability density function Pk(s)(x) that is defined by an average vector μk(s) and a variance matrix Σk(s), and a probability density function Pf(t)(x) that represents a distribution of a feature vector (in this embodiment, a power spectrum) of a t-th frame.
$$
b'_k(s)(z_t) = \int P_f(t)(x)\,b_k(s)(x)\,dx = \prod_{i=1}^{D} P_k(s)(i)\big(\xi(t)(i),\,\psi(t)(i,i)\big)
$$
$$
k = 1, 2, \ldots, K;\quad s = q_1, q_2, \ldots, q_H;\quad t = 1, 2, \ldots, T
\qquad (9)
$$

In Equation (9), the integration interval of the integral is the entire D-dimensional feature vector space (in this example, the power spectrum space).

In Equation (9), Pk(s)(i)(ξ(t)(i), ψ(t)(i,i)) is given by

$$
P_k(s)(i)\big(\xi(t)(i),\,\psi(t)(i,i)\big) =
\frac{1}{\sqrt{2\pi\big(\Sigma_k(s)(i,i)+\psi(t)(i,i)\big)}}\;
\exp\!\left(-\frac{\big(\mu_k(s)(i)-\xi(t)(i)\big)^2}{2\big(\Sigma_k(s)(i,i)+\psi(t)(i,i)\big)}\right)
\qquad (10)
$$

where μk(s)(i) represents an ith component of the average vector μk(s) and Σk(s)(i,i) represents an ith-row, ith-column component of the variance matrix Σk(s). The output probability of the kth class model is defined by the above equations.
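Because the components are assumed to have no mutual correlation, Equations (9) and (10) reduce to a product over the D feature dimensions of one-dimensional normal densities whose variances are the sums Σk(s)(i,i) + ψ(t)(i,i). A sketch under that assumption (array shapes and naming are my own):

```python
import numpy as np

def output_probability(xi, psi_diag, mu, sigma_diag):
    """b'_k(s)(z_t) per Equations (9)-(10): a product over the D feature
    dimensions of normal densities whose variance is the sum of the model
    variance Sigma_k(s)(i,i) and the feature-distribution variance psi(t)(i,i)."""
    var = sigma_diag + psi_diag                   # Sigma_k(s)(i,i) + psi(t)(i,i)
    dens = np.exp(-(mu - xi) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    return float(np.prod(dens))
```

Setting psi_diag to zero recovers the ordinary continuous-HMM output probability in which the variance of feature vectors is not taken into consideration.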
As mentioned above, the HMM is defined by the initial state probabilities πk(qh), the transition probabilities ak(qi, qj), and the output probabilities bk(qi)(O), which are determined in advance by using feature vectors that are calculated based on learning speech data.
Where the HMM shown in Fig. 9 is used, transitions start from the leftmost state q1. Therefore, the initial probability of only the state q1 is 1 and the initial probabilities of the other states are 0. As seen from Equations (9) and (10), if the terms ψ(t)(i,i) are 0, the output probability is equal to an output probability in a continuous HMM in which the variance of feature vectors is not taken into consideration.
An example of an HMM learning method is the Baum-Welch re-estimation method.
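The maximisation over state sequences in Equation (8) is conventionally evaluated with a Viterbi-style recursion in the log domain. The sketch below is a hypothetical illustration of that evaluation (the patent does not prescribe this routine, and the argument layout is my own convention):

```python
import numpy as np

def log_discriminant(log_pi, log_a, log_b):
    """log g_k(Z) of Equation (8): the log-probability of the best state
    sequence.  log_b[t, s] holds log b'_k(q_s)(z_t) for frame t, log_a the
    log transition probabilities, and log_pi the log initial probabilities."""
    T, H = log_b.shape
    delta = log_pi + log_b[0]                       # pi_k(q) * b'_k(q)(z_1)
    for t in range(1, T):
        # best predecessor state for each state, then emit z_t
        delta = np.max(delta[:, None] + log_a, axis=0) + log_b[t]
    return float(np.max(delta))
```

On a tiny two-state, two-frame example the result matches a brute-force maximum over all state sequences.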
The discriminant function calculation section 21k shown in Fig. 8 stores, for the kth class model, the discriminant function gk(Z) of Equation (8) that is defined by the initial state probabilities πk(qh), the transition probabilities ak(qi, qj), and the output probabilities bk(qi)(O), which have been determined in advance through learning. The discriminant function calculation section 21k calculates the discriminant function gk(Z) by using the feature distribution parameter Z that is supplied from the feature extraction section 2, and outputs the resulting function value (the above-described observation probability) gk(Z) to a decision section 22.
The decision section 22 determines the class to which the feature distribution parameter Z, that is, the input voice, belongs by applying, for example, the decision rule of the following formula to the function values gk(Z) that are supplied from the respective discriminant function calculation sections 211-21K (i.e., the input voice is classified as one of the classes).
$$
C(Z) = C_k, \quad \text{if } g_k(Z) = \max_i\{g_i(Z)\}
\qquad (11)
$$

where C(Z) is a function of a discrimination operation (process) for determining the class to which the feature distribution parameter Z belongs. The operation "max" on the right side of the second equation of Formula (11) means the maximum value of the function values gi(Z) following it (i = 1, 2, ..., K).
The decision section 22 determines a class according to Formula ( 11 ) and outputs it as a recognition result of the input voice.
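The decision rule of Formula (11) is a plain argmax over the K function values. As a minimal sketch (the dictionary of class scores is hypothetical):

```python
def classify(g_values):
    """Formula (11): return the class whose discriminant value g_k(Z)
    is the largest among the K candidates.  g_values maps class labels
    (e.g. words) to the observation probabilities from the calculation
    sections 211-21K."""
    return max(g_values, key=g_values.get)
```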
Fig. 10 shows another example configuration of the feature extraction section 2 shown in Fig. 3. The components in Fig. 10 having corresponding components in Fig. 6 are given the same reference symbols as the latter. That is, this feature extraction section 2 is configured basically in the same manner as that of Fig. 6 except that a noise buffer 31 and a feature distribution parameter calculator 32 are provided instead of the noise characteristics calculator 14 and the feature distribution parameter calculator 15, respectively.
In this example, the noise buffer 31 temporarily stores, as noise power spectra, outputs of the power spectrum analyzer 12 in a non-speech portion. For example, the noise buffer 31 stores, as noise power spectra, w(1), w(2), ..., w(100), which are respectively the outputs y(-200), y(-199), ..., y(-101) of the power spectrum analyzer 12 for the 100 frames that precede the speech portion by 200 frames to 101 frames, respectively.
The noise power spectra w(n) of 100 frames (n = 1, 2, ..., N; in this example, N = 100) are output to the feature distribution parameter calculator 32 when a speech portion has appeared.
When the speech portion has appeared and the feature distribution parameter calculator 32 has received the noise power spectra w(n) (n = 1, 2, ..., N) from the noise buffer 31, the feature distribution parameter calculator 32 calculates, for example, according to the following equations, an average vector ξ(t) and a variance matrix ψ(t) that define a distribution (assumed to be a normal distribution) of the power spectrum of the true voice (i.e., a distribution of estimated values of the power spectrum of the true voice).
$$
\xi(t)(i) = E[x(t)(i)] = \frac{1}{N}\sum_{n=1}^{N}\big(y(t)(i)-w(n)(i)\big)
$$
$$
\psi(t)(i,j) = \frac{1}{N}\sum_{n=1}^{N}\Big(\big(y(t)(i)-w(n)(i)-\xi(t)(i)\big)\times\big(y(t)(j)-w(n)(j)-\xi(t)(j)\big)\Big)
$$
$$
i = 1, 2, \ldots, D;\quad j = 1, 2, \ldots, D
\qquad (12)
$$
where w(n) (i) represents an ith component of an nth noise power spectrum w (n) (w (n) (j ) is defined similarly) .
The feature distribution parameter calculator 32 determines an average vector ξ(t) and a variance matrix ψ(t) for each frame in the above manner, and outputs a feature distribution parameter Z = {z1, z2, ..., zT} in the speech portion to the discrimination section 3 (a feature distribution parameter zt is a combination of ξ(t) and ψ(t)).
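Equations (12) amount to treating the N differences y(t) − w(n) as samples of the true voice and taking their sample mean and full covariance, cross terms included. A sketch of that computation (the array shapes are my own convention):

```python
import numpy as np

def params_from_noise_buffer(y_t, w):
    """Equation (12): average vector xi(t) and variance matrix psi(t) from
    the frame's power spectrum y_t (shape (D,)) and the buffered noise
    power spectra w (shape (N, D))."""
    x_est = y_t[None, :] - w        # N estimates of the true voice y - w(n)
    xi = x_est.mean(axis=0)         # xi(t)(i)
    d = x_est - xi                  # deviations from the mean
    psi = d.T @ d / w.shape[0]      # psi(t)(i, j), off-diagonal terms kept
    return xi, psi
```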
While in the case of Fig. 6 it is assumed that components of a noise power spectrum have no mutual correlation, in the case of Fig. 10 a feature distribution parameter is determined without employing such an assumption and hence a more accurate feature distribution parameter can be obtained.

Although in the above examples a power spectrum is used as a feature vector ( feature quantity) , a cepstrum, for example, can also be used as a feature vector.
Now assume that x'(t) represents a cepstrum of the true voice of a certain frame t and that its distribution (a distribution of estimated values of the cepstrum) is, for example, a normal distribution. An average vector ξ'(t) and a variance matrix ψ'(t) that define a probability density function P'(t)(x') representing a distribution of a feature vector (in this case, a cepstrum) x' of the t-th frame can be determined according to the following equations.
$$
\xi'(t)(i) = \frac{1}{N}\sum_{n=1}^{N} x'(t)(n)(i), \qquad i = 1, 2, \ldots, D
$$
$$
\psi'(t)(i,j) = \frac{1}{N}\sum_{n=1}^{N}\big(x'(t)(n)(i)-\xi'(t)(i)\big)\big(x'(t)(n)(j)-\xi'(t)(j)\big), \qquad i = 1, 2, \ldots, D;\; j = 1, 2, \ldots, D
\qquad (13)
$$

where ξ'(t)(i) represents an ith component of the average vector ξ'(t), ψ'(t)(i, j) is an ith-row, jth-column component of the variance matrix ψ'(t), and x'(t)(n)(i) is an ith component of a cepstrum x'(t)(n) that is given by the following equations.
$$
x'(t)(n) = C\,x^L(t)(n)
$$
$$
x^L(t)(n) = \big(x^L(t)(n)(1),\, x^L(t)(n)(2),\, \ldots,\, x^L(t)(n)(D)\big)
$$
$$
x^L(t)(n)(i) = \log\big(y(t)(i) - w(n)(i)\big)
\qquad (14)
$$

where i = 1, 2, ..., D. In the first equation of Equations (14), C is a DCT (discrete cosine transform) matrix.
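Equations (14) take the log of each noise-subtracted spectrum and apply a DCT. The patent only states that C is a DCT matrix, so the orthonormal DCT-II built explicitly below is one concrete choice of mine, not necessarily the one intended:

```python
import numpy as np

def cepstrum_estimates(y_t, w):
    """Equation (14): one cepstrum estimate x'(t)(n) per buffered noise
    spectrum w(n), as the DCT of log(y(t) - w(n)).  Returns shape (N, D)."""
    D = y_t.shape[0]
    i, j = np.meshgrid(np.arange(D), np.arange(D), indexing="ij")
    C = np.sqrt(2.0 / D) * np.cos(np.pi * (2 * j + 1) * i / (2.0 * D))
    C[0, :] = 1.0 / np.sqrt(D)       # orthonormal scaling of the first row
    xl = np.log(y_t[None, :] - w)    # x^L(t)(n)(i) = log(y(t)(i) - w(n)(i))
    return xl @ C.T                  # x'(t)(n) = C x^L(t)(n), one row per n
```

The per-frame mean and covariance of the returned rows then give ξ'(t) and ψ'(t) of Equations (13).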
Where a cepstrum is used as a feature vector, the feature extraction section 2 of Fig. 3 may determine an average vector ξ'(t) and a variance matrix ψ'(t) for each frame in the above manner, and output a feature distribution parameter Z' = {z1', z2', ..., zT'} in a speech portion to the discrimination section 3 (a feature distribution parameter zt' is a combination {ξ'(t), ψ'(t)}).
In this case, an output probability bk'(s)(zt'), which is used to calculate a discriminant function gk(Z') in the discrimination section 3, can be determined, as a probability representing a distribution in the cepstrum space, by the following equation, which includes a probability density function Pk'(s)(x') that is defined by an average vector μk'(s) and a variance matrix Σk'(s), and a probability density function P'(t)(x') that represents a distribution of a feature vector (in this case, a cepstrum) of a t-th frame.
$$
b'_k(s)(z_t') = \int P'(t)(x')\,P'_k(s)(x')\,dx'
= \frac{1}{\sqrt{(2\pi)^D\,\big|\psi'(t)+\Sigma'_k(s)\big|}}\;
\exp\!\left(-\tfrac{1}{2}\big(\xi'(t)-\mu'_k(s)\big)^{\mathrm T}\big(\psi'(t)+\Sigma'_k(s)\big)^{-1}\big(\xi'(t)-\mu'_k(s)\big)\right)
\qquad (15)
$$

In Equation (15), the integration interval of the integral is the entire D-dimensional feature vector space (in this case, the cepstrum space). The term (ξ'(t) − μk'(s))T is a transpose of the vector ξ'(t) − μk'(s).
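The integral of the product of two normal densities in Equation (15) has the closed form of a single normal density with summed covariance evaluated at the difference of the means. A direct sketch of that closed form (function and argument names are my own):

```python
import numpy as np

def cepstral_output_probability(xi_p, psi_p, mu_p, sigma_p):
    """Equation (15): integral of the product of two D-dimensional normal
    densities, i.e. a normal density with covariance psi'(t) + Sigma'_k(s)
    evaluated at the mean difference xi'(t) - mu'_k(s)."""
    S = psi_p + sigma_p                              # summed covariance matrix
    d = xi_p - mu_p                                  # mean difference vector
    D = d.shape[0]
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** D * np.linalg.det(S))
    return float(norm * np.exp(-0.5 * d @ np.linalg.solve(S, d)))
```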
Since, as described above, a feature distribution parameter is determined that reflects noise distribution characteristics and speech recognition is performed by using the thus-determined feature distribution parameter, the recognition rate can be increased.
Table 1 shows recognition rates for a speech recognition (word recognition) experiment utilizing the feature distribution parameter, conducted by using a cepstrum as the feature quantity of speech and an HMM method as the speech recognition algorithm of the discrimination section 3, together with recognition rates for a speech recognition experiment utilizing spectral subtraction.
Table 1: Recognition rate (%)

Speech input environment       SS method   Invention
Idling and background music        72          86
Running in city area               85          90
Running on expressway              57          69

In the above experiments, the number of recognition object words was 5,000, and the speaker was an unspecified person.
Speaking was performed in three kinds of environments: an environment in which the car was idling and background music was heard, an environment in which the car was running in a city area, and an environment in which the car was running on an expressway.
As seen from Table 1, in all of those environments, a higher recognition rate was obtained by the speech recognition utilizing the feature distribution parameter.
The speech recognition apparatus to which the invention is applied has been described above. This type of speech recognition apparatus can be applied to a car navigation apparatus capable of speech input and other various apparatuses.
In the above embodiment, a feature distribution parameter is determined which reflects distribution characteristics of noise. It is noted that the noise includes, for example, external noise in the speaking environment as well as characteristics of a communication line (when a voice that is transmitted via a telephone line or some other communication line is to be recognized).
For example, the invention can also be applied to learning for a particular speaker in a case of specific speaker recognition. In this case, the invention can increase the learning speed.
The invention can be applied not only to speech recognition but also to pattern recognition such as image recognition. For example, in the case of image recognition, the image recognition rate can be increased by using a feature distribution parameter that reflects distribution characteristics of noise such as the lens characteristics of the camera used for photographing images, weather conditions, and the like.
In the above embodiment, a feature distribution parameter that represents a distribution in the power spectrum space or the cepstrum space is determined. However, other spaces such as a space of linear prediction coefficients, a space of a difference between cepstrums of adjacent frames, and a zero-cross space can also be used as a space in which to determine a distribution.
In the above embodiment, a feature distribution parameter representing a distribution in a space of one (kind of) feature quantity of speech is determined. However, it is possible to determine feature distribution parameters in respective spaces of a plurality of feature quantities of speech. It is also possible to determine a feature distribution parameter in one or more of the spaces of a plurality of feature quantities of speech and perform speech recognition by using the feature distribution parameter thus determined together with feature vectors in the spaces of the remaining feature quantities.
In the above embodiment, a distribution of a feature vector (estimated values of a feature vector of a true voice) in the feature vector space is assumed to be a normal distribution, and a feature distribution parameter representing such a distribution is used. However, other distributions such as a logarithmic normal probability distribution, a discrete probability distribution, and a fuzzy distribution can also be used as a distribution to be represented by a feature distribution parameter.
Further, in the above embodiment, class discrimination in the discrimination section 3 is performed by using an HMM in which the output probability is represented by a normal distribution. However, it is possible to perform class discrimination in the discrimination section 3 in other ways, for example, by using an HMM in which the output probability is represented by a mixed normal probability distribution or a discrete distribution, or by using a normal probability distribution function, a logarithmic probability distribution function, a polynomial function, a neural network, or the like .
As described above, in the feature extraction apparatus and method according to the invention, a feature distribution parameter representing a distribution that is obtained when mapping of input data is made to a space of a feature quantity of the input data is calculated. Therefore, for example, when input data includes noise, a parameter that reflects distribution characteristics of the noise can be obtained.
In the pattern recognition apparatus and method according to the invention, a feature distribution parameter representing a distribution that is obtained when mapping of input data is made to a space of a feature quantity of the input data is calculated, and the feature distribution parameter is classified as one of a predetermined number of classes.
Therefore, for example, when input data includes noise, a parameter that reflects distribution characteristics of the noise can be obtained. This makes it possible to increase the recognition rate of the input data.

Claims (36)

1. A feature extracting apparatus, comprising:
means for inputting a digital speech signal containing a true voice component and noise;
a framing section for separating parts of said speech signal at predetermined sampling time intervals and outputting a plurality (T) of frames, each frame representing an observation vector a(t) containing information regarding said part of said speech signal and said sampling time intervals;
a feature extraction section for mapping said observation vector a(t) representing one point in the observation vector space to a spread of points in a feature vector space, said feature extraction section including:
- means for extracting, based on acoustic analysis of said observation vector a(t), a feature vector y(t) representative of a feature quantity of said speech signal;
- means for estimating a noise vector u(t); and
- means for calculating a feature distribution parameter Z(t) = y(t) - u(t), (t = 1, 2,...T), representing a distribution of estimated values for said true voice component in the feature vector space.

2. The apparatus of claim 1, wherein said sampling time intervals are equal.
3. The apparatus of claim 2, wherein said sampling time intervals are 20 ms each.
4. The apparatus of claim 1, wherein said feature quantity is a power spectrum and said noise has a noise power spectrum.
5. The apparatus of claim 4, wherein said noise power spectrum has an irregular distribution.
6. The apparatus of claim 4, wherein said noise power spectrum has a normal distribution during both a speech portion and the immediately preceding non-speech portion.
7. The apparatus of claim 6, wherein components of said feature vector y(t) have no mutual correlation, and wherein said feature extraction section comprises:
- a power spectrum analyser for applying a fast Fourier transform (FFT) to said observation vector a(t) to obtain a power spectrum of said feature vector y(t);
- a detection unit for identifying said speech portion and said non-speech portion in said speech signal; and - a switch, under the control of said detection unit, for selectively connecting said analyser to a first processor for averaging said noise power spectrum components during said non-speech portion, and to a second processor for receiving said average noise power spectrum and said power spectrum of said feature vector y(t) and outputting said feature distribution parameter Z(t).
8. The apparatus of claim 6, wherein said feature extraction section comprises a buffer for storing noise power spectra w(n) associated with frames contained in said non-speech portion, and a processor for receiving said noise power spectra w(n) and said power spectrum of said feature vector y(t) and calculating said feature distribution parameter Z(t).
9. The apparatus of any one of claims 7 or 8, wherein said feature distribution parameter Z(t) includes an average vector and a variance matrix.
10. The apparatus of claim 1, wherein said feature vector is a cepstrum vector and said feature vector space is a cepstrum space.
11. The apparatus of claim 1, wherein said feature vector space is one of a space of linear prediction coefficients, a space of a difference between cepstrums of adjacent frames, and a zero-cross space.
12. The apparatus of claim 1, wherein said calculating means calculates said feature distribution parameter Z(t) in a respective space of a plurality of feature quantities of said speech signal.
13. The apparatus of claim 1, wherein said feature distribution parameter Z(t) is calculated for a space associated with said feature quantity of said speech signal and further used for the spaces associated with the remaining feature quantities.
14. The apparatus of claim 1, wherein said distribution of estimated values is one of a normal probability distribution, a logarithmic normal probability distribution, a discrete probability distribution, and a fuzzy distribution.
15. A feature extracting method, comprising:
inputting a digital speech signal containing true voice component and noise;
separating a part of said speech signal at predetermined sampling time intervals and outputting a plurality (T) of frames, each frame representing an observation vector a(t) containing information regarding said part of said speech signal and said sampling time intervals;
extracting, based on acoustic analysis of said observation vector a(t), a feature vector y(t) representative of a feature quantity of said speech signal;
estimating a noise vector u(t); and calculating a feature distribution parameter Z(t) = y(t) - u(t), (t = 1, 2,...T), representing a distribution of estimated values for said true voice component in a feature vector space.
16. The method of claim 15, wherein said sampling time intervals are equal.
17. The method of claim 16, wherein said sampling time intervals are 20 ms each.
18. The method of claim 15, wherein said feature quantity is a power spectrum and said noise has a noise power spectrum.
19. The method of claim 18, wherein said noise power spectrum has an irregular distribution.
20. The method of claim 18, wherein said noise power spectrum has a normal distribution during both a speech portion and the immediately preceding non-speech portion.
21. The method of claim 20, wherein said feature vector y(t) components have no mutual correlation, and wherein said step of estimating is performed by averaging said noise power spectrum.
22. The method of claim 20, wherein said step of calculating is performed by storing noise power spectra w(n), (n = 1,2,..N), during said non-speech portion and computing during said speech portion an average of estimated true voice information contained in each feature vector y(t) component using said noise power spectra w(n).
23. The method of claim 15, wherein said feature distribution parameter Z(t) includes an average vector and a variance matrix.
24. The method of claim 15, wherein said feature vector is a cepstrum vector and said feature vector space is a cepstrum space.
25. The method of claim 15, wherein said feature vector space is one of a space of linear prediction coefficients, a space of a difference between cepstrums of adjacent frames, and a zero-cross space.
26. The method of claim 15, wherein said calculating means calculates said feature distribution parameter Z(t) in a respective space of a plurality of feature quantities of said speech signal.
27. The method of claim 16, wherein said feature distribution parameter Z(t) is calculated for a space associated with said feature quantity of said speech signal and further used for the spaces associated with the remaining feature quantities.
28. The method of claim 16, wherein said distribution of estimated values is one of a normal probability distribution, a logarithmic normal probability distribution, a discrete probability distribution, and a fuzzy distribution.
29. A pattern recognition apparatus, comprising:
means for inputting a digital speech signal containing a true voice component and noise;
a framing section for separating parts of said speech signal at predetermined sampling time intervals and outputting a plurality (T) of frames, each frame representing an observation vector a(t) containing information regarding said part of said speech signal and said sampling time intervals;
a feature extraction section for mapping said observation vector a(t) representing one point in the observation vector space to a spread of points in a feature vector space, said feature extraction section including:
- means for extracting, based on acoustic analysis of said observation vector a(t), a feature vector y(t) representative of a feature quantity of said speech signal;
- means for estimating a noise vector u(t);
- means for calculating a feature distribution parameter Z(t) = y(t) - u(t), (t = 1, 2,...T), representing a distribution of estimated values for said true voice component in the feature vector space; and a discrimination section including:
- a discriminant calculation unit for receiving said feature distribution parameter Z(t) and calculating, based on discriminant functions gk{Z(t)}, class probability values corresponding to "K" classes, each class corresponding to a word, and - a decision unit for storing and comparing said class probability values and declaring the largest class probability value as a voice recognition result.
30. The apparatus of claim 29, wherein said discrimination section operates based on Hidden Markov Model (HMM).
31. The apparatus of claim 29, wherein said discrimination section operates based on Baum-Welch re-estimation method.
32. The apparatus of claim 29 used for car navigation with speech input.
33. The apparatus of claim 29 used for image recognition.
34. A pattern recognition method, comprising:
inputting a digital speech signal containing true voice component and noise;
separating a part of said speech signal at predetermined sampling time intervals and outputting a plurality (T) of frames, each frame representing an observation vector a(t) containing information regarding said part of said speech signal and said sampling time intervals;
extracting, based on acoustic analysis of said observation vector a(t), a feature vector y(t) representative of a feature quantity of said speech signal;
estimating a noise vector u(t);
calculating a feature distribution parameter Z(t) = y(t) - u(t), (t = 1, 2,...T), representing a distribution of estimated values for said true voice component in a feature vector space;
computing, based on discriminant functions gk{Z(t)}, class probability values corresponding to "K" classes, each class corresponding to a word; and storing and comparing said class probability values and declaring the largest class probability value as a voice recognition result.
35. The method of claim 34, wherein said step of computing is performed based on Hidden Markov Model (HMM).
36. The method of claim 34, wherein said step of computing is performed based on Baum-Welch re-estimation method.
CA002251509A 1997-10-31 1998-10-26 Feature extraction apparatus and method and pattern recognition apparatus and method Expired - Fee Related CA2251509C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP30097997A JP3584458B2 (en) 1997-10-31 1997-10-31 Pattern recognition device and pattern recognition method
JPP09-300979 1997-10-31

Publications (2)

Publication Number Publication Date
CA2251509A1 CA2251509A1 (en) 1999-04-30
CA2251509C true CA2251509C (en) 2005-01-25

Country Status (10)

Country Link
US (3) US6910010B2 (en)
EP (1) EP0913810A3 (en)
JP (1) JP3584458B2 (en)
KR (1) KR19990037460A (en)
CN (1) CN1216380A (en)
AU (1) AU746511B2 (en)
BR (1) BR9804324A (en)
CA (1) CA2251509C (en)
SG (1) SG75886A1 (en)
TW (1) TW392130B (en)

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3584458B2 (en) * 1997-10-31 2004-11-04 ソニー株式会社 Pattern recognition device and pattern recognition method
JP2000259198A (en) 1999-03-04 2000-09-22 Sony Corp Device and method for recognizing pattern and providing medium
EP1132896A1 (en) * 2000-03-08 2001-09-12 Motorola, Inc. Frequency filtering method using a Wiener filter applied to noise reduction of acoustic signals
US7072833B2 (en) * 2000-06-02 2006-07-04 Canon Kabushiki Kaisha Speech processing system
JP4538705B2 (en) * 2000-08-02 2010-09-08 ソニー株式会社 Digital signal processing method, learning method and apparatus, and program storage medium
JP2002123285A (en) * 2000-10-13 2002-04-26 Sony Corp Speaker adaptation apparatus and speaker adaptation method, recording medium and speech recognizing device
US9269043B2 (en) 2002-03-12 2016-02-23 Knowm Tech, Llc Memristive neural processor utilizing anti-hebbian and hebbian technology
US9280748B2 (en) 2012-06-22 2016-03-08 Knowm Tech, Llc Methods and systems for Anti-Hebbian and Hebbian (AHaH) feature extraction of surface manifolds using
US7130776B2 (en) * 2002-03-25 2006-10-31 Lockheed Martin Corporation Method and computer program product for producing a pattern recognition training set
US7191175B2 (en) 2004-02-13 2007-03-13 Attenex Corporation System and method for arranging concept clusters in thematic neighborhood relationships in a two-dimensional visual display space
JP4529492B2 (en) * 2004-03-11 2010-08-25 株式会社デンソー Speech extraction method, speech extraction device, speech recognition device, and program
US8218880B2 (en) 2008-05-29 2012-07-10 Microsoft Corporation Linear laplacian discrimination for feature extraction
US8738367B2 (en) * 2009-03-18 2014-05-27 Nec Corporation Speech signal processing device
US8572084B2 (en) 2009-07-28 2013-10-29 Fti Consulting, Inc. System and method for displaying relationships between electronically stored information to provide classification suggestions via nearest neighbor
US8145483B2 (en) * 2009-08-05 2012-03-27 Tze Fen Li Speech recognition method for all languages without using samples
JP5523017B2 (en) * 2009-08-20 2014-06-18 キヤノン株式会社 Image processing apparatus and image processing method
CA2772082C (en) 2009-08-24 2019-01-15 William C. Knight Generating a reference set for use during document review
KR101137533B1 (en) * 2010-09-03 2012-04-20 경희대학교 산학협력단 Method for feature selection for pattern recognition and apparatus therof
US20120116764A1 (en) * 2010-11-09 2012-05-10 Tze Fen Li Speech recognition method on sentences in all languages
US20130090926A1 (en) * 2011-09-16 2013-04-11 Qualcomm Incorporated Mobile device context information using speech detection
US8918353B2 (en) 2012-02-22 2014-12-23 Knowmtech, Llc Methods and systems for feature extraction
US11237556B2 (en) 2012-06-22 2022-02-01 Knowm, Inc. Autonomous vehicle
CN104575498B (en) * 2015-01-30 2018-08-17 深圳市云之讯网络技术有限公司 Efficient voice recognition methods and system
JP6543844B2 (en) * 2015-08-27 2019-07-17 本田技研工業株式会社 Sound source identification device and sound source identification method
AU2017274558B2 (en) 2016-06-02 2021-11-11 Nuix North America Inc. Analyzing clusters of coded documents
ES2964982T3 (en) * 2016-12-06 2024-04-10 Nippon Telegraph & Telephone Signal feature extraction device, signal feature extraction method, and program
CN110197670B (en) * 2019-06-04 2022-06-07 大众问问(北京)信息科技有限公司 Audio noise reduction method and device and electronic equipment
CN111256806A (en) * 2020-01-20 2020-06-09 福州大学 Non-contact vibration frequency composition measuring method

Family Cites Families (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE2751184C2 (en) * 1977-11-16 1979-04-19 Carl Hepting & Co, Lederwaren- Und Guertelfabrik, Gmbh, 7000 Stuttgart Fitting for a suitcase or the like
US4718093A (en) * 1984-03-27 1988-01-05 Exxon Research And Engineering Company Speech recognition method including biased principal components
US4897878A (en) * 1985-08-26 1990-01-30 Itt Corporation Noise compensation in speech recognition apparatus
JP2776848B2 (en) * 1988-12-14 1998-07-16 Hitachi, Ltd. Denoising method and neural network learning method used for it
JPH02195400A (en) * 1989-01-24 1990-08-01 Canon Inc Speech recognition device
US5063603A (en) 1989-11-06 1991-11-05 David Sarnoff Research Center, Inc. Dynamic method for recognizing objects and image processing system therefor
US5210820A (en) * 1990-05-02 1993-05-11 Broadcast Data Systems Limited Partnership Signal recognition system and method
US5680481A (en) 1992-05-26 1997-10-21 Ricoh Corporation Facial feature extraction method and apparatus for a neural network acoustic and visual speech recognition system
JP2795058B2 (en) * 1992-06-03 1998-09-10 Matsushita Electric Industrial Co., Ltd. Time series signal processing device
IT1257073B (en) 1992-08-11 1996-01-05 Ist Trentino Di Cultura RECOGNITION SYSTEM, ESPECIALLY FOR THE RECOGNITION OF PEOPLE.
US5497447A (en) * 1993-03-08 1996-03-05 International Business Machines Corporation Speech coding apparatus having acoustic prototype vectors generated by tying to elementary models and clustering around reference vectors
US5522011A (en) * 1993-09-27 1996-05-28 International Business Machines Corporation Speech coding apparatus and method using classification rules
EP0681730A4 (en) * 1993-11-30 1997-12-17 At & T Corp Transmitted noise reduction in communications systems.
US5704004A (en) * 1993-12-01 1997-12-30 Industrial Technology Research Institute Apparatus and method for normalizing and categorizing linear prediction code vectors using Bayesian categorization technique
US5544250A (en) * 1994-07-18 1996-08-06 Motorola Noise suppression system and method therefor
JP2690027B2 (en) * 1994-10-05 1997-12-10 ATR Interpreting Telecommunications Research Laboratories Pattern recognition method and apparatus
US5706395A (en) * 1995-04-19 1998-01-06 Texas Instruments Incorporated Adaptive Wiener filtering using a dynamic suppression factor
KR0170317B1 (en) * 1995-07-13 1999-03-30 Kim Kwang-ho Voice recognition method using hidden Markov model having distortion density of observation vector
JP3536471B2 (en) * 1995-09-26 2004-06-07 Sony Corporation Identification device and identification method, and speech recognition device and speech recognition method
US5734796A (en) * 1995-09-29 1998-03-31 Ai Ware, Inc. Self-organization of pattern data with dimension reduction through learning of non-linear variance-constrained mapping
US5787394A (en) * 1995-12-13 1998-07-28 International Business Machines Corporation State-dependent speaker clustering for speaker adaptation
US6104833A (en) * 1996-01-09 2000-08-15 Fujitsu Limited Pattern recognizing apparatus and method
US5862519A (en) * 1996-04-02 1999-01-19 T-Netix, Inc. Blind clustering of data with application to speech processing systems
US5920644A (en) * 1996-06-06 1999-07-06 Fujitsu Limited Apparatus and method of recognizing pattern through feature selection by projecting feature vector on partial eigenspace
US6539115B2 (en) * 1997-02-12 2003-03-25 Fujitsu Limited Pattern recognition device for performing classification using a candidate table and method thereof
KR100434522B1 (en) * 1997-04-29 2004-07-16 Samsung Electronics Co., Ltd. Voice recognition method using time-base correlation, especially in relation to improving a voice recognition rate by using a time-base correlation without largely modifying a voice recognition system having a prior HMM scheme
US5960397A (en) * 1997-05-27 1999-09-28 At&T Corp System and method of recognizing an acoustic environment to adapt a set of based recognition models to the current acoustic environment for subsequent speech recognition
JP3584458B2 (en) * 1997-10-31 2004-11-04 Sony Corporation Pattern recognition device and pattern recognition method
KR20000033530A (en) * 1998-11-24 2000-06-15 Kim Young-hwan Car noise removing method using voice section detection and spectrum subtraction
KR20000040574A (en) * 1998-12-18 2000-07-05 Kim Young-hwan Method for voice recognition using Gaussian potential function network algorithm and learning vector quantization algorithm
KR100358006B1 (en) * 1999-07-27 2002-10-25 Information and Communications University Apparatus and method for searching song in melody database
KR100343223B1 (en) * 1999-12-07 2002-07-10 Yun Jong-yong Apparatus for eye and face detection and method thereof

Also Published As

Publication number Publication date
TW392130B (en) 2000-06-01
US7509256B2 (en) 2009-03-24
EP0913810A3 (en) 2000-04-12
US20050171772A1 (en) 2005-08-04
JP3584458B2 (en) 2004-11-04
US7117151B2 (en) 2006-10-03
AU8937398A (en) 1999-05-20
US20050171773A1 (en) 2005-08-04
KR19990037460A (en) 1999-05-25
BR9804324A (en) 1999-12-21
SG75886A1 (en) 2000-10-24
EP0913810A2 (en) 1999-05-06
CA2251509A1 (en) 1999-04-30
US20020010583A1 (en) 2002-01-24
JPH11133992A (en) 1999-05-21
AU746511B2 (en) 2002-05-02
US6910010B2 (en) 2005-06-21
CN1216380A (en) 1999-05-12

Similar Documents

Publication Publication Date Title
CA2251509C (en) Feature extraction apparatus and method and pattern recognition apparatus and method
EP1113419B1 (en) Model adaptive apparatus and model adaptive method, recording medium, and pattern recognition apparatus
EP1396845B1 (en) Method of iterative noise estimation in a recursive framework
EP0792503B1 (en) Signal conditioned minimum error rate training for continuous speech recognition
EP1355296B1 (en) Keyword detection in a speech signal
EP2148325B1 (en) Method for determining the presence of a wanted signal component
WO1997010587A9 (en) Signal conditioned minimum error rate training for continuous speech recognition
KR19990043998A (en) Pattern recognition system
JP3298858B2 (en) Partition-based similarity method for low-complexity speech recognizers
EP1023718B1 (en) Pattern recognition using multiple reference models
US20050010406A1 (en) Speech recognition apparatus, method and computer program product
US7280961B1 (en) Pattern recognizing device and method, and providing medium
US7702505B2 (en) Channel normalization apparatus and method for robust speech recognition
KR100614932B1 (en) Channel normalization apparatus and method for robust speech recognition
JP2001067094A (en) Voice recognizing device and its method
JPH11311998A (en) Feature extracting device, method therefor, pattern recognition device, method therefor and presentation medium
JP2002123285A (en) Speaker adaptation apparatus and speaker adaptation method, recording medium and speech recognizing device
Kimura et al. Practical speaker‐independent voice recognition using segmental features
JP2001022377A (en) Device and method for collating speaker accompanying register pattern renewal
JPH05289695A (en) Voice recognition system under noise

Legal Events

Date Code Title Description
EEER Examination request
MKLA Lapsed

Effective date: 20141027