WO2007070337A2 - Music detector for echo cancellation and noise reduction - Google Patents
Music detector for echo cancellation and noise reduction
- Publication number
- WO2007070337A2 (PCT/US2006/046720)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- circuit
- music
- signal
- set forth
- telephone
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0033—Recording/reproducing or transmission of music for electrophonic musical instruments
- G10H1/0041—Recording/reproducing or transmission of music for electrophonic musical instruments in coded form
- G10H1/0058—Transmission between separate instruments or between individual components of a musical system
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/046—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for differentiation between music and non-music signals, based on the identification of musical parameters, e.g. based on tempo detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/155—Musical effects
- G10H2210/265—Acoustic effect simulation, i.e. volume, spatial, resonance or reverberation effects added to a musical sound, usually by appropriate filtering or delays
- G10H2210/281—Reverberation or echo
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/171—Transmission of musical instrument data, control or status information; Transmission, remote access or control of music data for electrophonic musical instruments
- G10H2240/201—Physical layer or hardware aspects of transmission to or from an electrophonic musical instrument, e.g. voltage levels, bit streams, code words or symbols over a physical link connecting network nodes or instruments
- G10H2240/241—Telephone transmission, i.e. using twisted pair telephone lines or any type of telephone network
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/171—Transmission of musical instrument data, control or status information; Transmission, remote access or control of music data for electrophonic musical instruments
- G10H2240/201—Physical layer or hardware aspects of transmission to or from an electrophonic musical instrument, e.g. voltage levels, bit streams, code words or symbols over a physical link connecting network nodes or instruments
- G10H2240/241—Telephone transmission, i.e. using twisted pair telephone lines or any type of telephone network
- G10H2240/251—Mobile telephone transmission, i.e. transmitting, accessing or controlling music data wirelessly via a wireless or mobile telephone receiver, analog or digital, e.g. DECT GSM, UMTS
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/025—Envelope processing of music signals in, e.g. time domain, transform domain or cepstrum domain
- G10H2250/031—Spectrum envelope processing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0204—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
Definitions
- This invention relates to a telephone employing circuitry for echo cancellation and noise reduction and, in particular, to such circuitry that includes a music detector.
- telephone is a generic term for a communication device that utilizes, directly or indirectly, a dial tone from a licensed service provider.
- telephone includes desk telephones (see FIG. 1), cordless telephones (see FIG. 2), speakerphones (see FIG. 3), hands-free kits (see FIG. 4), and cellular telephones (see FIG. 5), among others.
- the invention is described in the context of telephones but has broader utility; e.g. communication devices that do not utilize a dial tone, such as radio frequency transceivers.
- the invention has broader application in the analysis of audio signals.
- Noise reduction circuitry is generally part of a non-linear processor.
- noise refers to any unwanted sound, whether the unwanted sound is periodic, purely random, or somewhere in-between.
- noise includes background music, voices of people other than the desired speaker, tire noise, wind noise, and so on.
- noise could include an echo of the speaker's voice.
- echo cancellation is treated separately in a telephone.
- Echo cancellation involves subtracting a simulated echo from an input signal.
- the simulated echo is created by filtering an output signal with an adaptive filter.
- the adaptive filter is programmed to represent either the near-end path (speaker to microphone) or the far end path (line out to line in) to create the simulated echo.
- Noise is subjective, somewhat like a weed. It depends upon what one wants or does not want. In this description, noise is unwanted sound from the perspective of a person trying to converse on a telephone. For example, in a vehicle, noise includes road noise, music from a radio, background conversation, and the sound from the speaker element in a hands-free kit.
- the desired signal is usually only the voice of the person speaking.
- Music is generally characterized by a finite amount of energy at all times, some music having a relatively constant envelope and some not. Most of the acoustic energy in music is below 8 kHz, although rock and hard rock are almost like white noise. The spectral content of music changes frequently, depending upon the rhythm of the music. Based on these characteristics, certain features are selected and several different algorithms are being investigated in the art for classifying sound. Examples are in the literature identified above. Possible methods for classifying audio signals include envelope detection, linear prediction analysis, zero crossing detection, Bark band spectral analysis, autocorrelation, silence ratio, tracking spectral peaks, and differential spectrum (changes in spectral content from instant to instant). Silence ratio is really an amplitude comparison. A signal is divided into time segments. A signal having an amplitude less than a threshold is silence. The ratio is the number of silent segments divided by the total number of segments. Speech signals have a higher silence ratio than music. Noise and non-speech are problems, as is picking the correct time interval.
- Another object of the invention is to provide a method for unambiguously distinguishing mainstream music genre from noise while requiring little computational power.
- a further object of the invention is to provide a method for unambiguously distinguishing mainstream music genre from noise in real time.
- spectral flatness is used to detect music and to distinguish music from noise.
- An audio signal is divided among exponentially related subband filters.
- the spectral flatness measure in each subband signal is determined and the measures are weighted and combined.
- the sum is compared with a threshold to determine the presence of music or noise. If music is detected, the noise estimation process in the noise reduction circuitry is turned off to avoid distorting the signal. If music is detected, residual echo suppression circuitry is also turned off to avoid inserting comfort noise.
- FIG. 1 is a perspective view of a desk telephone
- FIG. 2 is a perspective view of a cordless telephone
- FIG. 3 is a perspective view of a conference phone or a speakerphone
- FIG. 4 is a perspective view of a hands-free kit
- FIG. 5 is a perspective view of a cellular telephone
- FIG. 6 is a generic block diagram of audio processing circuitry in a telephone
- FIG. 7 is a more detailed block diagram of audio processing circuitry in a telephone
- FIG. 8 is a block diagram of a music detector constructed according to a preferred embodiment of the invention
- FIG. 9 is pseudo-code for calculating geometric mean according to one aspect of the invention.
- FIG. 10 is pseudo-code for calculating arithmetic mean according to one aspect of the invention.
- FIG. 11 is pseudo-code for calculating the ratio of the geometric mean to the arithmetic mean according to one aspect of the invention.
- FIG. 1 illustrates a desk telephone including base 10, keypad 11, display 13 and handset 14. As illustrated in FIG. 1, the telephone has speakerphone capability including speaker 15 and microphone 16.
- the cordless telephone illustrated in FIG. 2 is similar except that base 20 and handset 21 are coupled by radio frequency signals, instead of a cord, through antennas 23 and 24. Power for handset 21 is supplied by internal batteries (not shown) charged through terminals 26 and 27 in base 20 when the handset rests in cradle 29.
- FIG. 3 illustrates a conference phone or speakerphone such as found in business offices.
- Telephone 30 includes microphone 31 and speaker 32 in a sculptured case.
- Telephone 30 may include several microphones, such as microphones 34 and 35 to improve voice reception or to provide several inputs for echo rejection or noise rejection, as disclosed in U.S. Patent 5,138,651 (Sudo).
- FIG. 4 illustrates what is known as a hands-free kit for providing audio coupling to a cellular telephone, illustrated in FIG. 5.
- Hands-free kits come in a variety of implementations but generally include powered speaker 36 attached to plug 37, which fits an accessory outlet or a cigarette lighter socket in a vehicle.
- a hands-free kit also includes cable 38 terminating in plug 39.
- Plug 39 fits the headset socket on a cellular telephone, such as socket 41 (FIG. 5) in cellular telephone 42.
- Some kits use RF signals, like a cordless phone, to couple to a telephone.
- a hands-free kit also typically includes a volume control and some control switches, e.g. for going "off hook" to answer a call.
- a hands-free kit also typically includes a visor microphone (not shown) that plugs into the kit. Audio processing circuitry constructed according to the invention can be included in a hands-free kit or in a cellular telephone.
- FIG. 6 is a block diagram of the major components of a cellular telephone. Typically, the blocks correspond to integrated circuits implementing the indicated function. Microphone 51, speaker 52, and keypad 53 are coupled to signal processing circuit 54. Circuit 54 performs a plurality of functions and is known by several names in the art, differing by manufacturer. For example, Infineon calls circuit 54 a "single chip baseband IC." QualComm calls circuit 54 a "mobile station modem." The circuits from different manufacturers obviously differ in detail but, in general, the indicated functions are included.
- a cellular telephone includes both audio frequency and radio frequency circuits.
- Duplexer 55 couples antenna 56 to receive processor 57.
- Duplexer 55 couples antenna 56 to power amplifier 58 and isolates receive processor 57 from the power amplifier during transmission.
- Transmit processor 59 modulates a radio frequency signal with an audio signal from circuit 54.
- It is audio processor 60 that is modified to include the invention. How that modification takes place is more easily understood by considering the echo canceling and noise reduction portions of an audio processor in more detail.
- FIG. 7 is a detailed block diagram of a noise reduction and echo canceling circuit; e.g. see chapter 6 of Digital Signal Processing in Telecommunications by Shenoi, Prentice-Hall, 1995.
- a new voice signal entering microphone input 62 may or may not be accompanied by ambient noise or sounds from speaker output 68.
- the signals from input 62 are digitized in A/D converter 71 and coupled to summation network 72.
- There is, as yet, no signal from echo canceling circuit 73 and the data proceeds to non-linear processing circuit 74, which includes a music detector and other circuitry, such as a noise reduction circuit, a residual echo canceling circuit, and a center clipper.
- The output from non-linear processing circuit 74 is coupled to summation circuit 76, where comfort noise 75 is optionally added to the signal.
- the signal is then converted back to analog form by D/A converter 77, amplified in amplifier 78, and
- Circuit 73 reduces acoustic echo and circuit 81 reduces line echo as directed by control 80.
- the operation of these last two circuits is known per se in the art; e.g. as described in the above-identified text.
- FIG. 8 is a block diagram of a music detector for controlling at least a portion of the non-linear processor.
- the music detector is based upon a circuit that looks at the spectral amplitude (or energy) of samples of the signal and computes the ratio of the geometric mean to the arithmetic mean of the spectrum.
- a geometric mean is the nth root of the product of n samples.
- FIGS. 9, 10 and 11 illustrate the computation of SFM using exponent and mantissa format.
- the norm factor mentioned in FIG. 9 is the number of left shifts needed to scale a given number to the range [0.5,1.0].
- the input signal is filtered to divide the signal into
- the subbands are preferably octaval and are individually weighted to give more emphasis to lower frequencies.
- the following Table shows the octave spacing used in one embodiment of the invention.
- the first subband is a whole octave.
- the remaining subbands are split octave.
- the subband spacing was determined empirically by performing Monte Carlo simulations on a large database consisting of two hundred fifty-two music files and one hundred eighty-nine noise files.
- L refers to the bin number corresponding to the lower frequency boundary
- H refers to the bin number corresponding to the higher frequency boundary
- M is the number of spectral bins in each subband.
- the spectral flatness measure (SFM) in each subband is calculated using the following formula.
- SFM_i(n) is the spectral flatness measure for the i-th subband at time (n)
- L(i) and H(i) correspond to the lower and higher spectral bin numbers for the i-th subband
- M(i) is the number of bins in the i-th subband.
- a simpler classification scheme is used in the invention.
- a single test statistic is derived from the individual subband SFMs.
- the test statistic is derived from an exponentially weighted sum of subband SFMs, as shown in the following equation.
- α is the weighting factor
- q is the number of subbands
- SFM(i) is the SFM for the i-th subband.
- the weighting is chosen to emphasize low frequencies, i.e. the contribution of individual SFMs gradually decreases as frequency increases. This is because music, speech, and the noise spectrum share similar spectral characteristics at high frequencies.
- a weighting factor less than one (α < 1) suffices.
- a table could be used instead of calculating the weighting factor.
- the test statistic ⁇ is preferably median filtered to reduce spurious spikes in the
- ⁇ is the smoothing constant
- ⁇ ( ⁇ z) is the smoothed test statistics at time (n)
- y(n-1) is the test statistic at time (n-1).
- the smoothed test statistic is compared with a threshold to detect the presence of music. Specifically, if the smoothed test statistic is greater than the threshold, then the spectrum is relatively flat and background noise is present and musicDetect goes to a logic "false" or, for positive logic, a "0" (zero). If the smoothed test statistic is not greater than the threshold, then music is present and musicDetect is true or "1".
- the musicDetect signal is used by control 80 (FIG. 7) to prevent noise reduction circuitry in non-linear processor 74 from reducing noise when music is present.
- the invention thus provides a method for unambiguously distinguishing mainstream music genre from noise.
- the method does so efficiently, requiring little computational power, in part, due to the use of a pseudo floating-point operation in a fixed-point processor, and does so in real time.
- circuits 72 and 76 (FIG. 7) are called "summation" circuits with the understanding that a simple arithmetic process is being carried out, which can be either digital or analog, whether the process entails subtracting one signal from another signal or inverting (changing the sign of) one signal and then adding it to another signal.
- “summation” is defined herein as generic to addition and subtraction. Rather than dividing the spectrum into subbands and individually weighting the subbands, one could simply filter and analyze the lower portion of the spectrum, e.g. 300-1200 Hz. Rather than dividing the spectrum into octaval subbands, one could use exponentially related subbands. That is, the subbands can be related by other than a power of two; e.g. 1.5, 2.5, or 3.
- the system is not reliable using Bark bands (center frequencies of 570, 700, 840, 1000, 1170, 1370, 1600, 1850, 2150, 2500, 2900, 3400 Hz).
- the range covered is less than the frequency response of a telephone, roughly 50-3000 Hz. In systems having wider frequency response, a different set of octaves can be used. Rather than completely preventing noise reduction, a high on musicDetect could be used to reduce the effect of noise reduction circuitry, rather than shutting it off.
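The exponent and mantissa computation of the geometric mean mentioned above for FIGS. 9 to 11 can be illustrated with a short sketch. The pseudo-code of those figures is not reproduced in this text, so the following is only an assumption about how such a calculation might look. It uses Python's `math.frexp`, which returns a mantissa in [0.5, 1.0) and a power-of-two exponent, as a floating-point stand-in for the norm factor on a fixed-point processor; the function name `geometric_mean_exp_mantissa` is hypothetical.

```python
import math

def geometric_mean_exp_mantissa(values):
    """Geometric mean computed on (mantissa, exponent) pairs.

    Assumption: math.frexp stands in for the fixed-point normalisation
    step; it returns a mantissa in [0.5, 1.0) and the removed exponent.
    """
    if not values or any(v <= 0.0 for v in values):
        return 0.0
    mantissa, exponent = 1.0, 0
    for v in values:
        m, e = math.frexp(v)
        mantissa *= m
        exponent += e
        # Renormalise the running product so it never underflows.
        mantissa, shift = math.frexp(mantissa)
        exponent += shift
    # n-th root of mantissa * 2**exponent, taken in the log2 domain.
    return 2.0 ** ((math.log2(mantissa) + exponent) / len(values))
```

Keeping the running product normalised means that the product of many small spectral values does not underflow, which a direct product would do for inputs such as four samples of 1e-200.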
Abstract
An audio signal is divided among exponentially related subband filters. The spectral flatness measure (G(x)/A(x)) in each subband signal is determined and the measures are weighted (a_n * x_n) and combined (S). The sum is compared with a threshold to determine the presence of music or noise. If music is detected, the noise estimation process in the noise reduction circuitry is turned off to avoid distorting the signal. If music is detected, residual echo suppression circuitry is also turned off to avoid inserting comfort noise.
Description
MUSIC DETECTOR FOR ECHO CANCELLATION AND NOISE REDUCTION
BACKGROUND OF THE INVENTION
This invention relates to a telephone employing circuitry for echo cancellation and noise reduction and, in particular, to such circuitry that includes a music detector. As used herein, "telephone" is a generic term for a communication device that utilizes, directly or indirectly, a dial tone from a licensed service provider. As such, "telephone" includes desk telephones (see FIG. 1), cordless telephones (see FIG. 2), speakerphones (see FIG. 3), hands-free kits (see FIG. 4), and cellular telephones (see FIG. 5), among others. For the sake of simplicity, the invention is described in the context of telephones but has broader utility; e.g. communication devices that do not utilize a dial tone, such as radio frequency transceivers. Although described in the context of telephones, the invention has broader application in the analysis of audio signals.
While not universally followed, the prior art generally associates noise "suppression" with subtracting a signal from the signal of interest and associates noise "reduction" with attenuation or reduced gain. Noise reduction circuitry is generally part of a non-linear processor.
There are many sources of noise in a telephone system. Some noise is acoustic in origin while other noise is electronic, from the telephone network, for example. As used herein, "noise" refers to any unwanted sound, whether the unwanted sound is periodic, purely random, or somewhere in-between. As such, noise includes background music, voices of people other than the desired speaker, tire noise, wind noise, and so on. As thus broadly defined, noise could include an echo of the speaker's voice. However, echo cancellation is treated separately in a telephone.
There are two kinds of echoes in telephones, an acoustic echo from the path between an earphone or a speaker and a microphone and a line echo generated in the switched network for routing a call between stations. Echo cancellation involves subtracting a simulated echo from an input signal. The simulated echo is created by filtering an output signal with an adaptive filter. The adaptive filter is programmed to represent either the near-end path (speaker to microphone) or the far end path (line out to line in) to create the simulated echo.
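The subtraction of a simulated echo can be illustrated with a minimal sketch. It assumes a normalized least-mean-squares (NLMS) update for the adaptive filter, which the text does not specify; the function name `nlms_echo_cancel` and the parameters `taps`, `mu` and `eps` are hypothetical.

```python
def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract a simulated echo of far_end from mic, sample by sample.

    Assumption: NLMS adaptation; the source only says "adaptive filter".
    """
    w = [0.0] * taps        # adaptive filter coefficients (echo path estimate)
    history = [0.0] * taps  # most recent far-end samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        # Simulated echo: the output signal filtered by the adaptive filter.
        echo_hat = sum(wi * hi for wi, hi in zip(w, history))
        e = d - echo_hat    # input signal with the simulated echo removed
        power = sum(h * h for h in history) + eps
        # Move the coefficients toward the values that reduce the residual.
        w = [wi + mu * e * hi / power for wi, hi in zip(w, history)]
        out.append(e)
    return out
```

In this sketch the residual `e` is both the output of the canceller and the error that drives adaptation, so the filter converges toward the echo path only while the far-end signal is active.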
Noise is subjective, somewhat like a weed. It depends upon what one wants or does not want. In this description, noise is unwanted sound from the perspective of a person trying to converse on a telephone. For example, in a vehicle, noise includes road noise, music from a radio, background conversation, and the sound from the speaker element in a hands-free kit. The desired signal is usually only the voice of the person speaking.
If there is significant amount of background noise, it is usually desirable to reduce the background noise to improve intelligibility. On the other hand, a person may be at a musical concert and it may be desirable to allow the music to pass through the telephone network unaffected. To satisfy these contradictory conditions, one needs a special algorithm to distinguish between noise and music.
It is known in the art to distinguish music from speech; see, for example, Carey, Michael J. et al., Comparison of Features for Speech, Music Discrimination, IEEE publication 0-7803-5041-3/99 © 1999. It is also known to distinguish music, speech, and noise; see, for example, G. Lu & T. Hankinson, "A Technique towards Automatic Audio Classification and Retrieval," 1998 Fourth Signal International Conference on Signal Processing Proceedings (ISCP-98), Beijing, China 1998. Spectral flatness measure (SFM) is known in the art; see, for example, U.S. Patent 5,648,921 (Bayya et al.) and U.S. Patent 6,477,489 (Lockwood et al.). As used herein, SFM is defined differently from these two patents, which define SFM differently from each other. The differences are in form, not substance.
One of the main challenges in distinguishing music from noise is that the envelopes of both types of signal are relatively constant. Most known voice activity detectors measure the energy content of the envelope, which means that a voice activity detector will detect music as noise and will cause the noise reduction circuitry to reduce the background music, distorting the signal. It will also cause the non-linear processor to suppress the residual echo, which will then insert the comfort noise after suppressing the residual echo. This insertion of comfort noise can annoy a listener because the music will become intermittent. A similar effect can occur in echo canceling systems.
Music is generally characterized by a finite amount of energy at all times, some music having a relatively constant envelope and some not. Most of the acoustic energy in music is below 8 kHz, although rock and hard rock are almost like white noise. The spectral content of music changes frequently, depending upon the rhythm of the music. Based on these characteristics, certain features are selected and several different algorithms are being investigated in the art for classifying sound. Examples are in the literature identified above. Possible methods for classifying audio signals include envelope detection, linear prediction analysis, zero crossing detection, Bark band spectral analysis, autocorrelation, silence ratio, tracking spectral peaks, and differential spectrum (changes in spectral content from instant to instant). Silence ratio is really an amplitude comparison. A signal is divided into time segments. A signal having an amplitude less than a threshold is silence. The ratio is the number of silent segments divided by the total number of segments. Speech signals have a higher silence ratio than music. Noise and non-speech are problems, as is picking the correct time interval.
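The silence ratio described above can be sketched in a few lines. The sketch assumes that a segment counts as silent when its peak absolute amplitude is below the threshold; the text does not say whether peak or average amplitude is meant, and the names `silence_ratio`, `segment_len` and `threshold` are hypothetical.

```python
def silence_ratio(samples, segment_len, threshold):
    """Number of silent segments divided by the total number of segments.

    Assumption: a segment is silent when its peak absolute amplitude is
    below threshold. A trailing partial segment is ignored.
    """
    segments = [samples[i:i + segment_len]
                for i in range(0, len(samples) - segment_len + 1, segment_len)]
    if not segments:
        return 0.0
    silent = sum(1 for seg in segments
                 if max(abs(s) for s in seg) < threshold)
    return silent / len(segments)
```

The result depends strongly on `segment_len`, which reflects the difficulty of picking the correct time interval noted in the paragraph.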
Many of these methods are not robust enough to distinguish different genres of music unambiguously from noise. Some of the methods are not meant to be done in real time because of large computational requirements; e.g. requiring a wide data bus, large amounts of storage, or long execution time for analysis. Hence, it is desirable to provide a method that can unambiguously distinguish mainstream music genre with small computational requirements.
In view of the foregoing, it is therefore an object of the invention to provide a method for unambiguously distinguishing mainstream music genre from noise.
Another object of the invention is to provide a method for unambiguously distinguishing mainstream music genre from noise while requiring little computational power.
A further object of the invention is to provide a method for unambiguously distinguishing mainstream music genre from noise in real time.
SUMMARY OF THE INVENTION
The foregoing objects are achieved in this invention in which spectral flatness is used to detect music and to distinguish music from noise. An audio signal is divided among exponentially related subband filters. The spectral flatness measure in each subband signal is determined and the measures are weighted and combined. The sum is compared with a threshold to determine the presence of music or noise. If music is detected, the noise estimation process in the noise reduction circuitry is turned off to avoid distorting the signal. If music is detected, residual echo suppression circuitry is also turned off to avoid inserting comfort noise.
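The steps summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the input is taken to be a list of positive power-spectrum bins, the band edges are hypothetical octave-like bin ranges, the weights `alpha ** i` and the values of `alpha` and `threshold` are illustrative, and dividing by the sum of the weights is an added normalization. Spectral flatness is computed as the ratio of the geometric mean to the arithmetic mean, which approaches 1 for a flat (noise-like) spectrum and 0 for a peaky one.

```python
import math

def spectral_flatness(power_bins):
    """Ratio of geometric mean to arithmetic mean; 1.0 for a flat spectrum."""
    m = len(power_bins)
    arith = sum(power_bins) / m
    if arith <= 0.0 or min(power_bins) <= 0.0:
        return 0.0
    geo = math.exp(sum(math.log(p) for p in power_bins) / m)
    return geo / arith

def music_detect(power_spectrum, band_edges, alpha=0.8, threshold=0.5):
    """Return True when the weighted subband flatness suggests music.

    band_edges: hypothetical list of (low_bin, high_bin) pairs, inclusive.
    alpha, threshold: illustrative values, not taken from the source.
    """
    stat = 0.0
    weight_sum = 0.0
    for i, (lo, hi) in enumerate(band_edges):
        w = alpha ** i  # weights shrink as frequency increases
        stat += w * spectral_flatness(power_spectrum[lo:hi + 1])
        weight_sum += w
    stat /= weight_sum  # normalisation is an added assumption
    # A flat spectrum gives a large statistic, which indicates noise.
    return stat <= threshold
```

For example, with `band_edges = [(1, 2), (3, 6), (7, 14), (15, 30)]`, a spectrum of 32 equal bins is classified as noise, while a spectrum with a few isolated peaks is classified as music.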
BRIEF DESCRIPTION OF THE DRAWINGS
A more complete understanding of the invention can be obtained by considering the following detailed description in conjunction with the accompanying drawings, in which:
FIG. 1 is a perspective view of a desk telephone;
FIG. 2 is a perspective view of a cordless telephone;
FIG. 3 is a perspective view of a conference phone or a speakerphone;
FIG. 4 is a perspective view of a hands-free kit;
FIG. 5 is a perspective view of a cellular telephone;
FIG. 6 is a generic block diagram of audio processing circuitry in a telephone;
FIG. 7 is a more detailed block diagram of audio processing circuitry in a telephone;
FIG. 8 is a block diagram of a music detector constructed according to a preferred embodiment of the invention;
FIG. 9 is pseudo-code for calculating geometric mean according to one aspect of the invention;
FIG. 10 is pseudo-code for calculating arithmetic mean according to one aspect of the invention; and
FIG. 11 is pseudo-code for calculating the ratio of the geometric mean to the arithmetic mean according to one aspect of the invention.
Those of skill in the art recognize that, once an analog signal is converted to digital form, all subsequent operations can take place in one or more suitably programmed microprocessors. Reference to "signal," for example, does not necessarily mean a hardware implementation or an analog signal. Data in memory, even a single bit, can be a signal. In other words, a block diagram can be interpreted as hardware, software, e.g. a flow chart or an algorithm, or a mixture of hardware and software. Programming a microprocessor is well within the ability of those of ordinary skill in the art, either individually or in groups.
DETAILED DESCRIPTION OF THE INVENTION
This invention finds use in many applications where the electronics is essentially the same but the external appearance of the device may vary. FIG. 1 illustrates a desk telephone including base 10, keypad 11, display 13 and handset 14. As illustrated in FIG. 1, the telephone has speakerphone capability including speaker 15 and microphone 16. The cordless telephone illustrated in FIG. 2 is similar except that base 20 and handset 21 are coupled by radio frequency signals, instead of a cord, through antennas 23 and 24. Power for handset 21 is supplied by internal batteries (not shown) charged through terminals 26 and 27 in base 20 when the handset rests in cradle 29.
FIG. 3 illustrates a conference phone or speakerphone such as found in business offices. Telephone 30 includes microphone 31 and speaker 32 in a sculptured case. Telephone 30 may include several microphones, such as microphones 34 and 35 to improve voice reception or to provide several inputs for echo rejection or noise rejection, as disclosed in U.S. Patent 5,138,651 (Sudo).
FIG. 4 illustrates what is known as a hands-free kit for providing audio coupling to a cellular telephone, illustrated in FIG. 5. Hands-free kits come in a variety of implementations but generally include powered speaker 36 attached to plug 37, which fits an accessory outlet or a cigarette lighter socket in a vehicle. A hands-free kit also includes cable 38 terminating in plug 39. Plug 39 fits the headset socket on a cellular telephone, such as socket 41 (FIG. 5) in cellular telephone 42. Some kits use RF signals, like a cordless phone, to couple to a telephone. A hands-free kit also typically includes a volume control and some control switches, e.g. for going "off hook" to answer a call. A hands-free kit also typically includes a visor microphone (not shown) that plugs into the kit. Audio processing circuitry constructed according to the invention can be included in a hands-free kit or in a cellular telephone.
The various forms of telephone can all benefit from the invention. FIG. 6 is a block diagram of the major components of a cellular telephone. Typically, the blocks correspond to integrated circuits implementing the indicated function. Microphone 51, speaker 52, and keypad 53 are coupled to signal processing circuit 54. Circuit 54 performs a plurality of functions and is known by several names in the art, differing by manufacturer. For example, Infineon calls circuit 54 a "single chip baseband IC." Qualcomm calls circuit 54 a "mobile station modem." The circuits from different manufacturers obviously differ in detail but, in general, the indicated functions are included.
A cellular telephone includes both audio frequency and radio frequency circuits. Duplexer 55 couples antenna 56 to receive processor 57. Duplexer 55 couples antenna 56 to power amplifier 58 and isolates receive processor 57 from the power amplifier during transmission. Transmit processor 59 modulates a radio frequency signal with an audio signal from circuit 54. In non-cellular applications, such as speakerphones, there are no radio frequency circuits and signal processor 54 may be simplified somewhat. Problems of echo cancellation and noise remain and are handled in audio processor 60. It is audio processor 60 that is modified to include the invention. How that modification takes place is more easily understood by considering the echo canceling and noise reduction portions of an audio processor in more detail.
FIG. 7 is a detailed block diagram of a noise reduction and echo canceling circuit; e.g. see chapter 6 of Digital Signal Processing in Telecommunications by Shenoi, Prentice-Hall, 1995. The following describes signal flow through the transmit channel, from microphone input 62 to line output 64. The receive channel, from line input 66 to speaker output 68, works in the same way, except that the gain of a particular stage may be different from the gain of a corresponding stage in the transmit channel.
A new voice signal entering microphone input 62 may or may not be accompanied by ambient noise or sounds from speaker output 68. The signals from input 62 are digitized in A/D converter 71 and coupled to summation network 72. There is, as yet, no signal from echo canceling circuit 73 and the data proceeds to non-linear processing circuit 74, which includes a music detector and other circuitry, such as a noise reduction circuit, a residual echo canceling circuit, and a center clipper.
The output from non-linear processing circuit 74 is coupled to summation circuit 76, where comfort noise 75 is optionally added to the signal. The signal is then converted back to analog form by D/A converter 77, amplified in amplifier 78, and coupled to line output 64. Circuit 73 reduces acoustic echo and circuit 81 reduces line echo as directed by control 80. The operation of these last two circuits is known per se in the art; e.g. as described in the above-identified text.
FIG. 8 is a block diagram of a music detector for controlling at least a portion of the non-linear processor. The music detector is based upon a circuit that looks at the spectral amplitude (or energy) of samples of the signal and computes the ratio of the geometric mean to the arithmetic mean of the spectrum. A geometric mean is the nth root of the product of n samples. An arithmetic mean is the sum of n samples divided by n. As known in mathematics, this ratio is always less than one unless the data are equal. For example, $\sqrt[4]{2 \times 2 \times 2 \times 2} = (2+2+2+2)/4$ but $\sqrt[4]{1 \times 2 \times 3 \times 4} < (1+2+3+4)/4$. Equality, or perfect smoothness, is unattainable and so, in practice, the ratio is always less than one.
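The two numerical examples above can be checked with a short sketch. The function names are illustrative only, and the geometric mean is computed through logarithms in ordinary floating point, which is an assumption made for convenience and not the fixed-point method of the invention.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def arithmetic_mean(values):
    """Sum of n values divided by n."""
    return sum(values) / len(values)

def flatness(values):
    """Ratio of geometric mean to arithmetic mean; values must be positive."""
    return geometric_mean(values) / arithmetic_mean(values)

flat = flatness([2, 2, 2, 2])    # equal samples: ratio is 1 (up to rounding)
peaky = flatness([1, 2, 3, 4])   # unequal samples: ratio is about 0.885
```

The ratio falls further below one as the samples become more unequal, which is what makes it usable as a measure of spectral flatness.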
Because a geometric mean involves repeated multiplication, the precision of the root will be much less than the precision of the factors of the product if sixteen bit precision is used. On the other hand, increasing the number of bits of precision can significantly slow the calculation. This dilemma is solved according to another aspect of the invention by computing the geometric mean, arithmetic mean, and their ratio using floating-point notation (mantissa and exponent) in a 16-bit, fixed-point processor, referred to herein as a pseudo floating-point operation. The exponent is stored in a 16-bit memory location. The performance of the pseudo floating-point operation is equal to or better than conventional floating-point performance using processors of the same precision, e.g. 16 bits. Using the pseudo floating-point operation, the system is able to detect the presence of music correctly even if the signal level is very small (less than -45 dBFS). The steps in FIGS. 9, 10 and 11 illustrate the computation of SFM using exponent and mantissa format. The norm factor mentioned in FIG. 9 is the number of left shifts needed to scale a given number to the range [0.5,1.0].
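The idea of a norm factor and a mantissa/exponent product can be illustrated as follows. This is a minimal sketch and not the procedure of FIGS. 9, 10 and 11, which are not reproduced here. It assumes Q15 inputs (integers 1 to 32767 representing values between 0 and 1), uses the half-open range [0.5, 1.0) because 1.0 is not representable in Q15, and all function names are hypothetical.

```python
Q15_ONE = 1 << 15  # 1.0 in Q15 fixed point (16-bit word)

def normalize(q15):
    """Return (mantissa, norm_factor) for a Q15 value in (0, 1).

    norm_factor is the number of left shifts needed to bring the
    mantissa into [0.5, 1.0), i.e. into [16384, 32767] in Q15.
    """
    if not 0 < q15 < Q15_ONE:
        raise ValueError("expected a Q15 value strictly between 0 and 1")
    norm = 0
    while q15 < Q15_ONE // 2:
        q15 <<= 1
        norm += 1
    return q15, norm

def multiply(a, b):
    """Multiply two (mantissa, exponent) pairs.

    Each pair represents mantissa / 2**15 * 2**-exponent.
    """
    (ma, ea), (mb, eb) = a, b
    mantissa = (ma * mb) >> 15               # 16x16 product truncated back to Q15
    mantissa, extra = normalize(mantissa)    # needs at most one extra shift
    return mantissa, ea + eb + extra

def product(q15_values):
    """Product of a list of Q15 values, kept as a (mantissa, exponent) pair."""
    result = normalize(q15_values[0])
    for value in q15_values[1:]:
        result = multiply(result, normalize(value))
    return result

def to_float(pair):
    """Convert a (mantissa, exponent) pair to a float, for checking only."""
    mantissa, exponent = pair
    return mantissa / Q15_ONE * 2.0 ** (-exponent)

quiet = [33] * 32                 # 33/32768 is roughly 0.001, about -60 dBFS
pseudo = product(quiet)           # mantissa stays in 16 bits, exponent grows
exact = (33 / Q15_ONE) ** 32      # double-precision reference
```

A plain Q15 product of two such samples, `(33 * 33) >> 15`, already truncates to zero, whereas the mantissa/exponent pair keeps the mantissa at full 16-bit resolution for every factor.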
In general, in a musical piece, a singer is accompanied by musical instruments playing at different frequency ranges. Under these circumstances, a spectral flatness measure of the entire spectrum may not give a distinct, discriminating feature to distinguish the music from noise. In order to circumvent this problem, according to another aspect of the invention, the input signal is filtered to divide the signal into subbands. The subbands are preferably octaval and are individually weighted to give more emphasis to lower frequencies.
The following Table shows the octave spacing used in one embodiment of the invention. The first subband is a whole octave. The remaining subbands are split octave. The subband spacing was determined empirically by performing Monte Carlo simulations on a large database consisting of two hundred fifty-two music files and one hundred eighty-nine noise files. In the Table, L refers to the bin number corresponding to the lower frequency boundary, H refers to the bin number corresponding to the higher frequency boundary and M is the number of spectral bins in each subband.
Table
Subband No. (i) | Freq. (Hz) | L | H | M | α |
---|---|---|---|---|---|
1 | 500-1000 | 33 | 64 | 32 | 1.00 |
2 | 1000-1500 | 65 | 96 | 32 | 0.50 |
3 | 1500-2000 | 97 | 128 | 32 | 0.73 |
4 | 2000-2500 | 129 | 160 | 32 | 0.61 |
5 | 2500-3500 | 161 | 224 | 64 | 0.52 |
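The bin numbers in the Table are consistent with a uniform bin spacing of 15.625 Hz, which would result, for example, from a 512-point transform at an 8 kHz sampling rate. Neither figure is stated in the text, so both are assumptions in the following sketch, which recomputes L, H and M from the band edges. The helper name `bins_for_band` is hypothetical.

```python
BIN_HZ = 15.625  # assumed bin spacing in Hz (e.g. 8000 Hz / 512 points); not stated in the source

# (subband i, lower Hz, upper Hz, L, H, M) copied from the Table
SUBBANDS = [
    (1, 500, 1000, 33, 64, 32),
    (2, 1000, 1500, 65, 96, 32),
    (3, 1500, 2000, 97, 128, 32),
    (4, 2000, 2500, 129, 160, 32),
    (5, 2500, 3500, 161, 224, 64),
]

def bins_for_band(lower_hz, upper_hz, bin_hz=BIN_HZ):
    """Return (L, H, M) for a band that excludes lower_hz and includes upper_hz."""
    low = int(lower_hz / bin_hz) + 1
    high = int(upper_hz / bin_hz)
    return low, high, high - low + 1

# Every row of the Table is reproduced under the assumed bin spacing.
for i, lower_hz, upper_hz, low, high, count in SUBBANDS:
    assert bins_for_band(lower_hz, upper_hz) == (low, high, count)
```

Under this assumption each subband starts one bin above the upper boundary of the previous one, so the five subbands tile bins 33 through 224 without overlap.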
The spectral flatness measure (SFM) in each subband is calculated using the following formula.
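The equation itself is not reproduced in this text. A form consistent with the definition given earlier (the ratio of the geometric mean to the arithmetic mean) and with the quantities L, H and M of the Table would be the following, where X(n,k), the spectral amplitude (or energy) of bin k at time n, is a symbol assumed here for illustration:

```latex
\[
\mathrm{SFM}(n,i) =
\frac{\left( \prod_{k=L(i)}^{H(i)} X(n,k) \right)^{1/M(i)}}
     {\dfrac{1}{M(i)} \sum_{k=L(i)}^{H(i)} X(n,k)}
\]
```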
SFM(n,i) is the spectral flatness measure for the i-th subband at time n, L(i) and H(i) correspond to the lower and higher spectral bin numbers for the i-th subband, and M(i) is the number of bins in the i-th subband.
One can distinguish music and speech from noise using any one of the many N-feature set classification algorithms, such as a k-nearest-neighbor classifier, on the data for subband SFM. However, a simpler classification scheme is used in the invention. According to another aspect of the invention, a single test statistic is derived from the individual subband SFMs. The test statistic is derived from an exponentially weighted sum of subband SFMs, as shown in the following equation.
$\beta(n) = \sum_{i=1}^{q} \alpha^{(i-1)}\,\mathrm{SFM}(n,i)$
α is the weighting factor, q is the number of subbands, and SFM(n,i) is the SFM for the i-th subband. The weighting is chosen to emphasize low frequencies, i.e. the contribution of individual SFMs gradually decreases as frequency increases. This is because music, speech, and noise share similar spectral characteristics at high frequencies. A weighting factor less than one (<1) suffices. A table could be used instead of calculating the weighting factor.
The test statistic β is preferably median filtered to reduce spurious spikes in the SFM estimate. That is, λ(n) is the median of p successive values of β, where p is the size of the median filter. The test statistic is further smoothed by calculating a rolling average to reduce the variance of the statistic.
$\gamma(n) = \varepsilon\,\gamma(n-1) + (1-\varepsilon)\,\lambda(n)$
where ε is the smoothing constant, γ(n) is the smoothed test statistic at time n, and γ(n-1) is the smoothed test statistic at time n-1.
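The weighted sum, median filter and rolling average described above can be sketched as follows. The values of α, p and ε, the use of a causal window holding the most recent p values, and the initialization of the rolling average with the first median are all assumptions made for illustration; the text does not specify them.

```python
from collections import deque
from statistics import median

def weighted_statistic(sfm_per_band, alpha):
    """beta(n): sum over subbands i = 1..q of alpha**(i-1) * SFM(n, i)."""
    return sum(alpha ** i * sfm for i, sfm in enumerate(sfm_per_band))

class StatisticSmoother:
    """Median filter of size p followed by a first-order rolling average."""

    def __init__(self, p=5, eps=0.9):      # p and eps are hypothetical values
        self.window = deque(maxlen=p)      # most recent p values of beta (assumed causal)
        self.eps = eps
        self.gamma = None

    def update(self, beta):
        self.window.append(beta)
        lam = median(self.window)          # lambda(n): median-filtered statistic
        if self.gamma is None:
            self.gamma = lam               # assumed initialization
        else:
            # gamma(n) = eps * gamma(n-1) + (1 - eps) * lambda(n)
            self.gamma = self.eps * self.gamma + (1 - self.eps) * lam
        return self.gamma

smoother = StatisticSmoother(p=3, eps=0.5)
outputs = [smoother.update(beta) for beta in (0.2, 0.2, 5.0, 0.2)]
```

In this example the isolated spike of 5.0 never reaches the rolling average, because the median of any three successive values remains 0.2.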
Finally, the smoothed test statistic is compared with a threshold to detect the presence of music. Specifically, if the smoothed test statistic is greater than the threshold η, then the spectrum is relatively flat and background noise is present and musicDetect goes to a logic "false" or, for positive logic, a "0" (zero). If the smoothed test statistic is not greater than the threshold η, then music is present and musicDetect is true or "1". The musicDetect signal is used by control 80 (FIG. 7) to prevent noise reduction circuitry in non-linear processor 74 from reducing noise when music is present.
The invention thus provides a method for unambiguously distinguishing mainstream music genres from noise. The method does so efficiently, requiring little computational power, in part, due to the use of a pseudo floating-point operation in a fixed-point processor, and does so in real time.
Having thus described the invention, it will be apparent to those of skill in the art that various modifications can be made within the scope of the invention. For example, circuits 72 and 76 (FIG. 7) are called "summation" circuits with the understanding that a simple arithmetic process is being carried out, which can be either digital or analog, whether the process entails subtracting one signal from another signal or inverting (changing the sign of) one signal and then adding it to another signal. Stated another way, "summation" is defined herein as generic to addition and subtraction. Rather than dividing the spectrum into subbands and individually weighting the subbands, one could simply filter and analyze the lower portion of the spectrum, e.g. 300-1200 Hz. Rather than dividing the spectrum into octaval subbands, one could use exponentially related subbands. That is, the subbands can be related by other than a power of two; e.g. 1.5, 2.5, or 3. The system is not reliable using Bark bands (center frequencies of 570, 700, 840, 1000, 1170, 1370, 1600, 1850, 2150, 2500, 2900, 3400 Hz). The range covered is less than the frequency response of a telephone, roughly 50-3000 Hz. In systems having wider frequency response, a different set of octaves can be used. Rather than completely preventing noise reduction, a high on musicDetect could be used to reduce the effect of the noise reduction circuitry.
Claims
1. A method for detecting music in an analog signal also containing voice or noise, said method comprising the steps of: digitizing said analog signal by converting said analog signal into a plurality of samples indicating the magnitude of the analog signal at the time of the sample; dividing the signal into exponentially related subband signals; determining the spectral flatness measure of each subband signal; combining the spectral flatness measures; and comparing the combined spectral flatness measures with a threshold.
2. The method as set forth in claim 1 wherein said dividing step divides the signal into octavally related subband signals.
3. The method as set forth in claim 1 wherein said comparing step is followed by the step of indicating whether or not the analog signal contains music depending upon the outcome of said comparing step.
4. The method as set forth in claim 1 wherein said determining step is performed using pseudo floating-point operations in a fixed-point processor.
5. The method as set forth in claim 1 wherein the spectral flatness measure is defined as the ratio of the geometric mean of a group of samples to the arithmetic mean of the same group of samples.
6. The method as set forth in claim 1 and further including the step of: weighting the spectral flatness measure of each subband signal.
7. In a telephone including an audio frequency circuit having a first channel, a second channel, and a noise reduction circuit in one of said first channel and said second channel, the improvement comprising: a music detector in said audio frequency circuit for sensing a musical component in an audio signal and controlling said noise reduction circuit to prevent distortion to the audio signal; said music detector including: a fixed-point calculator for determining spectral flatness in pseudo floating-point operations; a circuit for comparing spectral flatness with a threshold and producing a flatness output signal; and a circuit for controlling said noise reduction circuit depending upon said flatness output signal.
8. The telephone as set forth in claim 7 wherein said music detector further includes band pass filters for dividing said audio signal into exponentially related bands and said fixed-point calculator determines spectral flatness in each band and produces a plurality of outputs.
9. The telephone as set forth in claim 8 and further including a summation circuit for combining said plurality of outputs into said flatness output signal.
10. The telephone as set forth in claim 9 and further including a circuit for averaging successive flatness output signals and for coupling the average to said circuit for comparing.
11. In a telephone including an audio frequency circuit having a first channel, a second channel, and at least one echo canceling circuit coupled between said first channel and said second channel, the improvement comprising: a music detector in said audio frequency circuit for sensing a musical component in an audio signal and controlling said echo canceling circuit to prevent intermittent music; said music detector including: a fixed-point calculator for determining spectral flatness in pseudo floating-point operations; a circuit for comparing spectral flatness with a threshold; and a circuit for controlling said noise reduction circuit depending upon the outcome of the comparison.
12. The telephone as set forth in claim 11 wherein said music detector further includes band pass filters for dividing said audio signal into exponentially related bands and said fixed-point calculator determines spectral flatness in each band and produces a plurality of outputs.
13. The telephone as set forth in claim 12 and further including a summation circuit for combining said plurality of outputs into said flatness output signal.
14. The telephone as set forth in claim 13 and further including a circuit for averaging successive flatness output signals and for coupling the average to said circuit for comparing.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/298,865 US8126706B2 (en) | 2005-12-09 | 2005-12-09 | Music detector for echo cancellation and noise reduction |
US11/298,865 | 2005-12-09 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2007070337A2 (en) | 2007-06-21 |
WO2007070337A3 (en) | 2011-05-26 |
Family
ID=38140529
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2006/046720 WO2007070337A2 (en) | 2005-12-09 | 2006-12-06 | Music detector for echo cancellation and noise reduction |
Country Status (2)
Country | Link |
---|---|
US (1) | US8126706B2 (en) |
WO (1) | WO2007070337A2 (en) |
Also Published As
Publication number | Publication date |
---|---|
US20070136053A1 (en) | 2007-06-14 |
US8126706B2 (en) | 2012-02-28 |
WO2007070337A3 (en) | 2011-05-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 06844965; Country of ref document: EP; Kind code of ref document: A2 |