WO2001033814A1 - Integrated voice processing system for packet networks - Google Patents


Info

Publication number
WO2001033814A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
signal
noise
processing blocks
background noise
Prior art date
Application number
PCT/US2000/030298
Other languages
French (fr)
Inventor
Richard C. Younce
Daniel J. Marchok
Charles W. K. Gritton
Ravi Chandran
Graham Rousell
Original Assignee
Tellabs Operations, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tellabs Operations, Inc. filed Critical Tellabs Operations, Inc.
Priority to AU13596/01A priority Critical patent/AU1359601A/en
Priority to CA002390200A priority patent/CA2390200A1/en
Publication of WO2001033814A1 publication Critical patent/WO2001033814A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/012 Comfort noise or silence coding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0364 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B3/00 Line transmission systems
    • H04B3/02 Details
    • H04B3/20 Reducing echo effects or singing; Opening or closing transmitting path; Conditioning for transmission in one direction or the other
    • H04B3/23 Reducing echo effects or singing; Opening or closing transmitting path; Conditioning for transmission in one direction or the other using a replica of transmitted signal in the time domain, e.g. echo cancellers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech

Definitions

  • the present invention is principally related to voice processing systems and, in particular, to a next generation voice processing system (NGVPS) designed specifically for voice-over-x systems and a wider class of voice processing applications.
  • NGVPS next generation voice processing system
  • Voice quality is critical to the success of voice-over-x (e.g., Voice-Over-IP) systems, which has led to complex, digital signal processor (DSP) intensive, voice processing solutions.
  • DSP digital signal processor
  • TDM time division multiplex
  • FIG. 1 shows a typical "black box" block diagram.
  • the following abbreviations are used in FIG. 1: NR: noise reduction; ALC: automatic level control; ENC: speech encoder;
  • FE far end speaker
  • EC echo canceller
  • SS silence suppressor
  • NS noise substitution
  • DEC speech decoder
  • NE near end speaker.
  • a transmitted voice signal 102 is processed by the echo canceller, and the pulse code modulated (PCM) output of the canceller is simply forwarded to the optional noise reduction unit, and then onto the auto level control unit, and then onto the codec, etc.
  • PCM pulse code modulated
  • FIG. 2 shows some of the individual elements within the subsystems in the voice-over-x DSP system of FIG. 1. Some examples give a feel for the problem; two of the subsystem elements that can lead to sub-optimal voice quality are examined here.
  • a non-linear processor is included within the echo cancellation block.
  • the NLP is a post-processor that eliminates the small amount of residual echo that is always present after the linear subtraction of the echo estimate.
  • One artifact of the NLP is that it can distort background noise signals.
  • Also shown in FIG. 2 are some of the components inside the noise reduction (NR) block.
  • the NR sub-system must generate a background noise estimate. If the NR block is not aware of the distortion introduced by the NLP, it will improperly identify the background noise, resulting in lower performance.
  • there is a background noise estimate function within the speech coder subsystem. This estimate is sent to the far end voice-over-x system when the near end speaker is silent. Both the NLP and the NR block would also adversely affect this noise estimate if their actions were not taken into account.
  • VAD voice activity detectors
  • the goal of the VAD is to accurately detect the presence of either NE or FE speech. If speech is present, then the associated processing of the ALC, NR, or speech coder is performed.
  • the echo canceller's double talk detector (DTD) is another form of VAD. It must detect both NE and FE speech and control the canceller so that it only adapts when NE speech is absent. Interaction between the elements such as the NLP, NR, or changes in the ALC can negatively affect the accuracy of the downstream VAD. For example, losses in the NLP or NR subsystems may falsely trigger the speech encoder to misinterpret voice as silence. This would cause the codec to clip the NE speech, which would degrade voice quality. Similar issues exist with regard to the VAD in the ALC block.
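The false-trigger risk described above can be sketched with a toy energy-based detector; the threshold, frame size, and the 12 dB loss below are illustrative assumptions, not values from the patent.

```python
import math

def rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def energy_vad(frame, threshold=0.01):
    """Toy energy-based VAD: report speech if frame RMS exceeds a fixed threshold."""
    return rms(frame) > threshold

# Low-level speech sits just above the detector's threshold.
speech = [0.02] * 160          # one 20 ms frame at 8 kHz sampling
assert energy_vad(speech)

# A 12 dB loss inserted by an upstream block (e.g. NLP or NR attenuation)
# pushes the same frame below the threshold: the downstream VAD now calls
# it silence, and the codec would clip the near-end speech.
attenuated = [s * 0.25 for s in speech]
assert not energy_vad(attenuated)
```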
  • the present invention provides a next-generation voice processing system (NGVPS) designed with the overall system in mind.
  • NGVPS next-generation voice processing system
  • Each voice-processing block has been opened up revealing common functions and inter-block dependencies.
  • the NGVPS also enhances some functions by using processing and signals that were previously available only to a single block.
  • the NGVPS provides the best overall voice performance.
  • This holistic approach provides new means for optimizing voice processing from an end-to-end systems approach. This will be an important factor in the success of the new network.
  • the system-wide, integrated voice processing approach of the present invention also creates opportunities for further enhancements by reordering the sub-blocks that make up the various blocks.
  • work has been conducted in the past on sub-band NLPs for echo cancellers.
  • the significant processing required to create the sub-bands has typically outweighed the resulting performance improvements.
  • a NR system typically divides the signal into sub-bands in order to perform its operations. Opening up these blocks facilitates a system in which the EC's NLP can be moved to the sub-band part of the NR system.
  • the performance improvement may be gained with very little additional processing.
  • the new public network concept, which is based on packet voice, requires this type of processing at each point of entry to and departure from the network.
  • Establishing a more integrated system, having the best performing processing elements at these points, is one of the objectives of the present invention.
  • the present invention may be applicable to voice band enhancement products or voice-over-x products. Additional applications that could benefit from the present invention include any other products carrying-out voice processing.
  • FIG. 1 is a block diagram of a voice processing system in accordance with prior art techniques;
  • FIG. 2 is a block diagram illustrating various blocks of the voice processing system of FIG. 1 in greater detail;
  • FIG. 3 is a block diagram of a voice processing system in accordance with the present invention;
  • FIG. 4 is a block diagram of an echo canceller and noise reduction circuit in accordance with prior art techniques and to which the present invention may be beneficially applied;
  • FIG. 5 is a block diagram of a noise injection system in accordance with one embodiment of the present invention.
  • FIG. 6 is a block diagram of a duo echo canceller system in accordance with another embodiment of the present invention.
  • A block diagram of an integrated voice-over-x DSP system in accordance with the present invention is shown in FIG. 3.
  • various features of the system can be implemented in hardware, software, or a combination of hardware and software.
  • some aspects of the system can be implemented in computer programs executing on programmable computers.
  • Each program can be implemented in a high level procedural or object-oriented programming language to communicate with a computer system.
  • each such computer program can be stored on a storage medium, such as read-only-memory (ROM) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage medium is read by the computer to perform the functions described above.
  • ROM read-only-memory
  • Speech signals (preferably in digital form) are represented by heavy solid lines; signal estimates, representative of various qualities of the voice signals, are illustrated using dashed lines; control signals are illustrated using solid lines; and algorithmic parameters, representative of internal values calculated by the various voice processing blocks, are illustrated using heavy dashed lines.
  • Transmitted voice signals 102 are provided to an echo canceller having an adder 304 and echo estimator 306.
  • the resulting signals are then passed to a noise reduction circuit 308 and a non-linear processor 310.
  • the echo canceller, noise reduction circuit 308 and NLP 310 form an integrated echo and noise reduction section.
  • the output of the NLP 310 is sent to an ALC 312 and then to buffering 314 and a speech encoder 316.
  • a centralized buffer (not shown) is preferred over separate buffers associated with particular voice processing blocks (e.g., the buffering 314 associated with the speech encoder 316). In this manner, the various voice processing operations may be sequentially performed on audio data stored in the buffer.
  • the centralized buffer has not been illustrated in FIG. 3.
  • circuitry and its derivatives are used throughout this description as a means of describing various functional elements shown in the figures. However, use of this term should not be construed as a limitation to the manner in which such elements may be implemented, i.e., as hardware circuits.
  • the various blocks within the control processing section of the integrated system receive inputs from and provide outputs to the various blocks in the transmit signal processing section. Such signals are well known to those having ordinary skill in the art and, where necessary, are discussed below.
  • a centralized voice activity detector 330 and a centralized noise estimator 332 are provided within the control processing section. As shown, these blocks are coupled to a residual estimator 334 (for assessing the amount of residual echo left in the transmit signal 102 after echo cancellation), a near end signal estimator 336, a near end gain controller 338 and a framing controller 340.
  • the centralized noise estimator 332, the residual estimator 334, the near end signal estimator 336, the near end gain controller 338 and the framing controller 340 are associated with the transmit signal processing section.
  • the control processing section also comprises a far end signal estimator 342 and a far end gain controller 344 associated with a receive signal processing section.
  • the receive signal processing section takes received audio signals 104 as input.
  • a lost packet handler 360 is provided to mitigate the effects of lost packets on the received audio.
  • the speech decoder 362 converts the received audio signal from a parameterized or other representative form to a continuous speech stream.
  • the received speech is then provided to an ALC 364. Note that the redundant blocks illustrated in FIG. 2 have been consolidated in the single control block in FIG. 3. Examples of consolidated and enhanced functions include the VADs and the background noise estimators.
  • VAD Voice Activity Detection circuitry
  • the NR sub-system needs to know when speech is absent so that it can update its estimate of the background noise. NR also needs to know when speech is present so that it can adjust gains and calculate signal powers.
  • the ALC block needs to know when speech is present so that it can get a good reading of the voice signal levels.
  • the echo canceller uses a form of VAD called a double talk detector (DTD) to reduce the influence of uncorrelated signals and thus improves its estimate of the echo.
  • DTD double talk detector
  • the speech encoder and accompanying silence suppressor use a VAD to detect silence, which triggers a reduction in the rate of transmitted packets (i.e., the codec outputs a description of the silence/background noise periodically).
  • the integrated approach creates a common VAD that reduces the complexity of the product and in turn, increases density and reduces cost.
  • the consolidated VAD performs more accurately than the individual VADs.
  • VAD voice activity detectors
  • Each block increases the likelihood that the subsequent blocks' VADs will misinterpret speech as silence or silence as speech. Additionally, the problem of cascading errors is avoided. Certain problem cases can cause a single block to perform incorrectly on a segment of speech or silence. In the multiple VAD case, this can have a cascading effect as the subsequent blocks' VADs trigger errantly.
  • the goal of the VAD is to accurately detect the presence of either NE or FE speech. If speech is present, then the associated processing of the ALC, NR, or speech coder is performed.
  • the echo canceller's double talk detector (DTD) is another form of VAD. It must detect both NE and FE speech and control the canceller so that it only adapts when NE speech is absent. Interaction between the elements such as the NLP, NR, or changes in the ALC can negatively affect the accuracy of the downstream VAD. For example, losses in the NLP, NR, or ALC subsystems may falsely trigger the speech encoder to misinterpret voice as silence. This would cause the codec to clip the NE speech, which would degrade voice quality.
  • a second factor in the VAD's performance enhancement is that it uses metrics from several of the blocks that would otherwise only be visible to a single block.
  • the consolidated VAD uses performance measures from the echo canceller block such as Echo Return Loss Enhancement (ERLE), along with typical VAD measures (e.g. RMS power and zero-crossings) for both transmit and receive voice signals.
  • ERLE Echo Return Loss Enhancement
  • the CVAD also uses the spectral properties and formant information from the noise reduction algorithm and speech encoder. The other speech encoder parameters are also used to help determine voice activity.
  • the encoder's pitch predictor provides a powerful indicator of the presence of voiced speech and is used to further improve the CVAD. Those having ordinary skill in the art are familiar with these metrics and their use in implementing VADs.
  • a third factor in the CVAD performance enhancement is that it controls all of the hold-over and voice states for each of the subsystems.
  • a hold-over function is commonly added to a VAD to improve the system's performance for unvoiced speech by preventing state changes until a predetermined period of time has expired.
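The hold-over behaviour described above can be sketched as a counter wrapped around a raw frame decision; the frame length, hang time, and helper names here are hypothetical, not from the patent.

```python
def make_holdover_vad(raw_vad, hang_frames=5):
    """Wrap a raw per-frame VAD with hold-over: after speech is detected,
    keep reporting speech for `hang_frames` frames so that trailing
    unvoiced sounds are not cut off by a premature state change."""
    counter = 0

    def vad(frame):
        nonlocal counter
        if raw_vad(frame):
            counter = hang_frames    # re-arm the hold-over on every speech frame
            return True
        if counter > 0:
            counter -= 1             # hold the speech state a little longer
            return True
        return False

    return vad

# Toy raw detector: any sample above 0.01 counts as speech.
raw = lambda frame: max(abs(s) for s in frame) > 0.01
vad = make_holdover_vad(raw, hang_frames=2)
assert vad([0.5] * 160)        # speech frame
assert vad([0.0] * 160)        # silence, but held over (1 of 2)
assert vad([0.0] * 160)        # silence, but held over (2 of 2)
assert not vad([0.0] * 160)    # hold-over expired: silence reported
```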
  • the use of multiple voice states is a VAD enhancement that is part of a proprietary adaptive noise cancellation (ANC) algorithm of Tellabs, which is used for noise reduction. Centralizing the control of these interacting enhancement functions prevents unstable inter-block interaction.
  • ANC adaptive noise cancellation
  • the speech presence sensitivity requirements of each block differ. For instance, if given a choice between having the speech coder not recognize silence or performing silence suppression procedures during low-level speech, the former would be the obvious choice. Thus, the speech coder requires high speech sensitivity.
  • Some of the other functions such as EC and NR can generally accommodate a less sensitive VAD, and benefit from a multi-level speech probability measure. For instance, the EC can slow the adaptation of its taps as the probability of speech presence measure approaches the DTD threshold.
  • a NR system can be established that uses a probability of speech presence measure to control the algorithm instead of a simple threshold.
  • the CVAD provides appropriate voice activity signals to the different blocks, even though the VAD processing itself is integrated. For instance, the CVAD would normally provide just a binary speech present or absent signal to the speech coder, while a multi-level or probability of speech presence measure is provided to the other blocks. These three CVAD factors combine to create a high performance VAD, which produces a powerful improvement in overall system performance.
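A rough sketch of this dual-output idea: one detector fuses two classic VAD metrics into a speech-presence probability, then exposes a binary flag to the coder and the raw measure to the other blocks. The weights, reference level, and function names are illustrative assumptions, not the patent's algorithm.

```python
def speech_probability(rms_level, zcr, rms_ref=0.01):
    """Blend an energy cue and a zero-crossing cue into one speech-presence
    probability. The weights and reference level are illustrative only."""
    p_energy = min(1.0, rms_level / (2.0 * rms_ref))
    p_voiced = max(0.0, 1.0 - 2.0 * zcr)   # voiced speech crosses zero rarely
    return 0.7 * p_energy + 0.3 * p_voiced

def cvad_outputs(p):
    """One detector, two views: a binary flag for the speech coder's silence
    suppressor, and the raw multi-level measure for EC adaptation and NR."""
    return {"coder_speech_flag": p >= 0.5, "speech_probability": p}

out = cvad_outputs(speech_probability(rms_level=0.02, zcr=0.05))
assert out["coder_speech_flag"] and 0.0 <= out["speech_probability"] <= 1.0
```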
  • the interaction between self-optimized processing blocks can result in sub-optimal overall performance. This is particularly pronounced for the EC function's NLP and the noise reduction function, and is especially poor when ERLE is poor, as is the case when the NLP is used without the EC. The result is an intermittent choppiness in the speech and background noise.
  • the NR subsystem is placed between the EC and the EC's NLP. This is important to speech quality, as the nonlinear nature of the NLP affects the NR system in a dramatic way.
  • the NLP can change the noise location and affect its level at various frequencies in a time varying fashion that is difficult to track in the NR system. This is because most of the NR system's noise estimates are performed during silence, but used during speech. This makes NR systems susceptible to time varying noise backgrounds, particularly with regard to spectral content.
  • the NLP with its associated noise injection process may have different background noise levels when speech is present compared to when speech is absent. This is effectively a time varying noise source, which would degrade NR performance in a typical voice processing system (VPS).
  • VPS voice processing system
  • the integrated system places the NR function between the EC and the NLP. It also uses a central noise and signal estimate as described in Section 4. The estimates are adjusted to compensate for the effect of the NR system in the control of the NLP.
  • the NR system reduces noise by a fixed factor during times of voice inactivity. It has been shown that improved NLP performance is realized when the NLP operates in the sub-band domain. However, sub-band NLPs are rarely used due to the cost of creating the sub-band signal, both in real dollars as well as processing power and delay. The NGVPS offers this sub-band option by further integrating the NLP into each of the NR system's sub-bands. These sub-bands are created as part of the noise reduction function. Hence, by integrating these two functions, performance can be gained without the added cost. The sub-band NLP further improves performance, although the integrated EC and NR approach out-performs the black-box approach even without this further enhancement.
  • the voice processing blocks include an echo canceller, noise reduction block and level adjustment block.
  • Each of those blocks makes a gain adjustment to the input signal. Normally this is done by each block independently.
  • a preferred implementation involves computing the adjustments individually in each block but then adjusting the signal once per the combined adjustment calculations in one central adjustment block, function or location.
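The single central adjustment can be sketched as follows: each block contributes a gain in dB, and the signal is rescaled exactly once with the combined value (the dB values below are hypothetical).

```python
def combined_gain(block_gains_db):
    """Each block (EC/NLP loss, NR attenuation, ALC) computes its own gain in
    dB, but only the combined linear value is ever applied to the signal."""
    total_db = sum(block_gains_db)
    return 10.0 ** (total_db / 20.0)

def apply_once(frame, block_gains_db):
    """Single central level adjustment instead of one rescale per block."""
    g = combined_gain(block_gains_db)
    return [s * g for s in frame]

frame = [0.1, -0.1, 0.2]
# NR attenuates 6 dB, ALC boosts 6 dB: the net gain is unity, so one central
# adjustment leaves the samples unchanged instead of rescaling them twice.
out = apply_once(frame, [-6.0, +6.0])
assert all(abs(a - b) < 1e-12 for a, b in zip(out, frame))
```

Applying the product of the gains once also avoids accumulating rounding error from repeated fixed-point rescaling in each block.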
  • Contrast once again the block diagram of an integrated voice-over-x DSP system as shown in FIG. 3 with the system shown in FIG. 2.
  • the multiple signal estimators of FIG. 2 have been consolidated into a single signal estimator in the control block.
  • the multiple noise estimators of FIG. 2 have been consolidated into a single noise estimator in the control block.
  • the signal estimator is very closely related to some parts of the consolidated VAD (CVAD) function and should perhaps be shown as part of the VAD.
  • This consolidated signal estimator includes both broadband and sub-band signal estimates. The majority of the processing power associated with creating the sub-band estimates is actually part of the NR process. Similarly, the majority of the processing power for the broadband estimate can be considered to be part of an ordinary VAD.
  • the various background noise estimates are consolidated into a single background noise estimate.
  • This background noise estimate is actually a set of estimates, some broadband and some sub-band, but is referred to in singular to avoid confusion with the unconsolidated estimates.
  • This estimate is derived from the transmit signal just after the near-end echo estimate is subtracted by the canceller.
  • the consolidated noise estimate serves as the background noise estimate to the NLP subsystem for background noise transparency (also known as comfort noise injection), the NR subsystem (for spectral subtraction of background noise), and the speech encoder (to send silence descriptors during silence). It is also shared by the VAD to help it avoid false triggers resulting from noise and to more accurately calculate the probability of speech being present.
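A minimal sketch of such a shared estimator, assuming a simple exponential smoother that is updated only during silence; the smoothing constant, class name, and interface are hypothetical, not from the patent.

```python
class SharedNoiseEstimator:
    """One exponentially smoothed background-noise level, updated only during
    silence and read by every consumer: the NLP (comfort noise level), the NR
    block (subtraction floor), and the coder (silence descriptors)."""

    def __init__(self, alpha=0.9):
        self.alpha = alpha
        self.level = 0.0

    def update(self, frame_power, speech_present):
        # Track noise only in silence so speech does not corrupt the estimate.
        if not speech_present:
            self.level = self.alpha * self.level + (1.0 - self.alpha) * frame_power
        return self.level

est = SharedNoiseEstimator(alpha=0.5)
est.update(0.04, speech_present=False)   # silence: estimate adapts
level_before = est.level
est.update(1.0, speech_present=True)     # speech frame: estimate is held
assert est.level == level_before
```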
  • the improved background noise estimate can be used in the NR, which, in turn, increases the amount of noise reduction and reduces any artifacts or distortion in the speech. Distorted speech is even more difficult to model in the codec, so it, in turn, would add more distortion.
  • the silence suppressor uses a version of the noise estimate, which has been modified to account for the effect of the NR system. This improves the accuracy of the silence suppressor and reduces the noise modulation.
  • the quality of the noise often distinguishes one VBE system from the next.
  • speech is active less than 50% of the time, in a given direction.
  • the analog signal is sampled 8000 times per second and converted to an 8-bit digital a-law or μ-law encoded signal.
  • Voice Processing Systems interface with this PCM encoded digital data stream.
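The companding step can be illustrated with the continuous-form μ-law curve; this is a simplified approximation, not the exact segmented G.711 code, and the 8-bit rounding below is likewise illustrative.

```python
import math

MU = 255.0

def ulaw_encode(x):
    """Continuous-form u-law compression of a sample in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def ulaw_decode(y):
    """Inverse of ulaw_encode."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def ulaw_quantize(x, bits=8):
    """Round the companded value to `bits` bits, as on the PCM highway."""
    levels = 2 ** (bits - 1)
    y = round(ulaw_encode(x) * levels) / levels
    return ulaw_decode(y)

# Companding keeps a mid-level sample accurate even with only 8 bits.
x = 0.3
assert abs(ulaw_quantize(x) - x) < 0.01
```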
  • An echo canceller is one such device that adapts to the impulse response of the near-end transmission facility and produces an echo estimate by multiplying this impulse response by the signal from the far end. This echo estimate is subtracted from the near-end signal, producing a signal which has the echo component removed. This process is not exact because of the quantization distortion of the a-law and μ-law encoding processes. This quantization distortion limits the echo return loss enhancement (ERLE) to approximately 33 dB even if all other processes are perfect.
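The estimate-and-subtract operation, and the ERLE figure it is judged by, can be sketched as below. This is a toy illustration: the one-tap echo path is hypothetical and the linear 8-bit rounding is only a crude stand-in for a-law/μ-law quantization distortion.

```python
import math

def erle_db(signal_in, residual):
    """Echo return loss enhancement: input power over residual power, in dB."""
    p_in = sum(s * s for s in signal_in) / len(signal_in)
    p_out = sum(s * s for s in residual) / len(residual)
    return 10.0 * math.log10(p_in / p_out)

def cancel(near_end, far_end, impulse_response):
    """Subtract the far-end signal convolved with the estimated echo-path
    impulse response from the near-end signal."""
    out = []
    for n, x in enumerate(near_end):
        estimate = sum(h * far_end[n - k]
                       for k, h in enumerate(impulse_response) if n - k >= 0)
        out.append(x - estimate)
    return out

far = [math.sin(0.3 * n) for n in range(200)]
# True echo: a one-tap path, then 8-bit rounding standing in (crudely)
# for quantization distortion on the PCM highway.
echo = [round(0.5 * x * 128) / 128 for x in far]
residual = cancel(echo, far, [0.5])
# The path model is exact, yet the quantization distortion bounds the ERLE.
assert erle_db(echo, residual) > 20.0
```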
  • ERLE echo return loss enhancement
  • NLP non-linear processor
  • the present invention contemplates how another aspect of a voice processing system, such as the noise reduction system element as a specific example, can be used during its otherwise "idle" time to provide virtually non-perceptible insertion of a derived noise signal into the gaps created during NLP operation. While it may be possible to design an NLP that removes significant non-linear "echo" artefacts (as may be found in the tail circuit of a mobile cellular telephony network, for example) without disturbing the background noise, the processing power required to do so effectively puts such a solution out of the reach of a practical system. The present invention limits or altogether circumvents any such onerous requirement by keeping the NLP basic and using otherwise spare processing power.
  • in FIG. 4 there is illustrated an exemplary echo canceller (EC) and noise reduction (NR) system in accordance with prior art techniques, to which the present invention may be applied as described below.
  • EC echo canceller
  • NR noise reduction
  • operation of the echo canceller filter, the NLP, and the noise reduction filter are well published and known to those of ordinary skill in the art, and therefore need not be described in substantial detail herein. Accordingly, the focus of the following discussion will be on the technique by which system elements and/or characteristics and/or resources, such as for example the readily accessible noise reduction processing aspect of the system, can be used to provide a dynamic spectrally and amplitude matched comfort noise injection signal for insertion into the gaps of signal created by the NLP in response to far-end speech.
  • the NLP will be operating when the far-end talker speaks (to prevent residual echo), and releasing when the near-end talker speaks.
  • the NLP is released, but the residual echo remaining after the echo canceller filter is likely to be below a disturbing level.
  • the noise reduction processor will be converging on the stationary content of the background noise, this being the part of a noise signal for which the amplitude and spectrum remain constant over some seconds.
  • the far-end talker will respond to the near-end talker and the echo canceller filter algorithm decides whether the NLP should be operated or not (low to medium near-end noise, or high near-end noise condition respectively). If the NLP is operated then the residual echo and any near-end noise will be muted, giving rise to a background noise modulation effect perceived by the far-end.
  • residual echo and any near-end noise might be compressed, scrambled, compressed and scrambled, clipped, or passed through unprocessed. From experience, perception of the modulation effect by the far-end user increases as delay over the telephone circuit increases (>40 ms round-trip delay). The overall effect is quite disturbing. Background noise modulation can be an issue wherever the speech path is interrupted, which is why the techniques described herein are equally useful in systems employing discontinuous transmission (DTX) methods and voice activity detectors (VAD).
  • DTX discontinuous transmission
  • VAD voice activity detectors
  • these estimates may be determined on a broadband or sub-band basis.
  • a sample of random, spectrally and amplitude matched noise is available to use, less the non-stationary elements that could cause a repeatable pattern during playback into the signal path.
  • the derived noise model can then be seamlessly (substantially unnoticed in the resultant audio) injected into the signal path following the NLP, whilst the NLP is operated.
  • the level of the noise injection may be partially based upon NLP parameters to accommodate various levels of muting or scrambling that might be taking place. Therefore the control for sampling the noise and injecting the noise is common to the NLP control line (not known in prior art systems) from the echo canceller filter shown in FIG. 4.
  • the term "injecting" refers to substituting a noise signal for an NLP output, as well as combining a noise signal with the NLP output.
  • Techniques for deriving the noise spectrum and amplitude generally appear in other system designs; however, among the differences between such other designs and the approach taken in the context of the herein-described embodiment of the present invention is that the system described herein makes alternative use of at least one aspect of a voice processor system.
  • resources associated with the noise reduction processor and system are used, during what is effectively an idle period for traditional noise reduction processors (e.g., when the NLP is operated), in a manner to improve the perceived quality of the communicated signal.
  • the noise reduction processor will be converging on the stationary element of the noise signal and then applying a filter function to remove a defined amount of the stationary noise from the signal.
  • the noise reduction filter is "frozen,” or in other words not updated or otherwise changed, so that the model is not lost while the NLP is in operation.
  • the noise reduction filter does not ordinarily function to provide noise reduction during this period of NLP operation, but then resumes operation once the NLP is no longer operated. In this way, as the noise spectrum and amplitude change, the filter processor can track the changes and efficiently reduce the noise.
  • the spectral and gain estimates maintained by the noise reduction filter, which are typically frozen as described above, are referenced and used in a new manner for the generation of a noise signal for injection into the communication signal at the appropriate intervals (e.g., during operation of the NLP).
  • One example approach for using such filter coefficients in this manner to generate a noise signal for injection is to use them to filter white noise that is internally generated.
  • This noise could be broadband noise that is then filtered by the set of sub-band weighting coefficients, or noise generated independently for each sub-band and weighted by that sub-band's coefficient.
  • the generated noise then has the same spectral characteristics as the true or actual background noise since the adaptive sub-band weighting coefficients converge to the spectral coefficients of that noise.
  • By using the gain estimate(s) to scale the spectrally matched noise, the model is able to more accurately match the background noise.
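The white-noise shaping just described can be sketched as follows. This is an illustrative reconstruction only: the function name, the FFT-based bin-to-band mapping, and the RMS normalization are assumptions, not the patented implementation; the text specifies only that internally generated white noise is weighted by the frozen sub-band coefficients and scaled by the gain estimate.

```python
import numpy as np

def synthesize_background_noise(band_weights, gain, n_samples, rng=None):
    """Shape internally generated white noise with the frozen sub-band
    weighting coefficients, then scale by the frozen gain estimate."""
    rng = np.random.default_rng() if rng is None else rng
    white = rng.standard_normal(n_samples)
    spectrum = np.fft.rfft(white)
    # Assign each FFT bin to one of the sub-bands and apply its weight.
    n_bins = spectrum.shape[0]
    band_idx = np.minimum((np.arange(n_bins) * len(band_weights)) // n_bins,
                          len(band_weights) - 1)
    shaped = np.fft.irfft(spectrum * np.asarray(band_weights)[band_idx], n_samples)
    shaped /= np.sqrt(np.mean(shaped ** 2)) + 1e-12  # normalize to unit RMS
    return gain * shaped
```

With unit weights the output is simply white noise at the requested RMS level; non-uniform weights tilt the spectrum toward the bands where the frozen model measured more noise.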
  • An example embodiment of this aspect of the present invention is illustrated in FIG. 5.
  • a transmitted voice signal is provided to an echo canceller 502 and non-linear processor 504.
  • the resulting signal is then sent to an adaptive noise estimator/reducer 506.
  • a control signal 510 indicative of the active/inactive state of the NLP 504 is sent to a noise reduction controller 508.
  • the noise reduction controller 508 provides a noise reduction control signal 512 to the adaptive noise estimator/reducer 506.
  • If the NLP 504 is inactive, the controller 508 configures the noise reduction control signal 512 to instruct the adaptive noise estimator/reducer 506 to allow the noise estimator to adapt and subtract a portion of the noise from the input voice signal. Conversely, if the NLP 504 is active, the controller 508 configures the noise reduction control signal 512 to instruct the adaptive noise estimator/reducer 506 to freeze the noise estimation process and generate synthesized background noise based on the current frozen background noise model. The synthesized noise is thereafter added to the input signal.
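The freeze/adapt control of FIG. 5 can be sketched as a simple state machine. The exponential-averaging update rule, the attribute names, and the adaptation rate below are illustrative assumptions; only the freeze-on-NLP / adapt-otherwise behavior comes from the description above.

```python
class AdaptiveNoiseEstimator:
    """Freeze/adapt control driven by the NLP state, as in FIG. 5."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha          # adaptation rate (assumed value)
        self.noise_power = 0.0      # stationary-noise model
        self.frozen = False

    def process(self, frame_power, nlp_active):
        if nlp_active:
            # NLP operating: freeze the model; the caller should inject
            # synthesized noise based on the frozen model.
            self.frozen = True
            return "inject"
        # NLP idle: resume tracking the stationary noise and reduce it.
        self.frozen = False
        self.noise_power += self.alpha * (frame_power - self.noise_power)
        return "reduce"
```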
  • Another way in which the integration of the NGVPS outperforms the current generation of VPSs is by synchronizing the entire system to fixed boundaries, preferably the codec frames, sub-frames or both.
  • this is accomplished by the framing control block 340 issuing at least one boundary control signal to the respective voice processing blocks, which control signal informs the blocks of the boundaries.
  • This provides enhanced performance for a number of blocks.
  • the ALC, NR, and EC functions of the NGVPS are all enhanced.
  • ALC is used to add gain to low-level voice signals when too much transmission loss is encountered or to reduce high-level speech signals, which may overdrive analog circuits at the other end of the network.
  • the intelligent block-to-block control coordinates the interaction of the automatic gain control and the speech coder.
  • Gain control changes are synchronized with the frame boundaries of the speech coder. This allows the NGVPS to hold the gain constant during the speech coder sub-frames and/or frames. By not changing the gain during sub-frames and/or frames, coder performance is enhanced. Reducing the variation of the signal level mid-frame improves the modeling of the speech by the encoder. Mid-frame level changes require a trade-off in the coder's non-gain speech parameters.
  • The codebook search, for example, would need to select an excitation vector which, when played out through the filter based on the LPC coefficients, would exhibit a sudden increase in volume. This does not fit the normal speech model very well and can dominate the selection of a codebook vector, causing the more subtle characteristics to be overlooked.
  • each frame and/or sub-frame of the coded speech contains a gain parameter. By synchronizing the ALC gain changes to these boundaries, the changes can be modeled in the gain parameter without a degrading effect on the selection of the other parameters.
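The boundary-synchronized gain behavior can be sketched as follows: the ALC gain is held constant over each coder frame and is only permitted to change where the coder's own gain parameter can absorb the change. The function name and list-based framing are illustrative assumptions.

```python
def apply_frame_synchronized_gain(samples, frame_len, frame_gains):
    """Hold the ALC gain constant within each coder frame; the gain
    changes only at frame boundaries (illustrative sketch)."""
    out = []
    for i, gain in enumerate(frame_gains):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        out.extend(s * gain for s in frame)
    return out
```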
  • the ALC algorithm is synchronized to the frames not only to coordinate its gain adjustment times, but also to take advantage of the data-blocking required for the codecs.
  • An important part of an ALC system is the ability to minimize clipping due to over-amplification. By synchronizing the ALC system to the data-blocks, the ALC system can look at the entire block for clipping, and incorporate that into its gain selection.
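The block-wide clipping check can be sketched as a gain limiter: before a gain is committed for a data block, the peak of the whole block is inspected so that the chosen gain cannot drive any sample past full scale. The function name and the 16-bit clip level are assumptions for illustration.

```python
def select_block_gain(block, desired_gain, clip_level=32767):
    """Limit the ALC gain so that no sample in the buffered data block
    would clip after amplification (illustrative sketch)."""
    peak = max(abs(s) for s in block)
    if peak == 0:
        return desired_gain
    return min(desired_gain, clip_level / peak)
```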
  • look-ahead is used to improve the VAD's performance. It is often difficult to recognize changes in voice activity until some time after they happen. By adding look-ahead to a VAD, its performance can be improved. Some codecs, such as G.729 and G.723.1, require look-ahead data to perform their functions. Again, by coordinating the data-blocks with the VAD function, the system VAD can use look-ahead without adding delay to the system.
  • Another feature of the present invention that can significantly enhance voice quality is the inclusion of a far end echo canceller.
  • Some of today's TDM carriers choose to cancel echo in both directions using a single network element.
  • These "duo" echo cancellers are most popular in wireless environments, where delay introduced in the wireless air interface creates the need to cancel echo in both directions, i.e., echo from both the PSTN and the wireless terminal.
  • an operator may similarly choose to deploy a duo canceller configuration, as the same condition exists.
  • the packet network, as used throughout this disclosure, is a specific example of a wider class of variable-delay networks to which the present invention is applicable.
  • the packet network with speech compression adds delay to connections that might otherwise not need a canceller, as in wireless applications.
  • FIG. 6 shows the duo layout comprising a near end echo canceller 602 and a far end echo canceller 604. Notice that the far end or packet switch echo canceller has the packet network in its endpath. Packet networks are notorious for dropped packets and significant delay variation. Both of these impairments can severely affect the performance of a canceller. In a standard voice-over-x implementation, the packet processor has some knowledge of the lost packets and changes in endpath delay. By sharing this information with the far end echo canceller and by subsequently using this information to intelligently control the canceller's behavior, the detrimental effects created by the packet network are minimized. In other words, voice quality is optimized. Some advanced TDM networks being created for the wireless world may also have changing endpath delay.
  • This advanced echo canceller has a couple of new features. First, it is synchronized to packet boundaries and can disable both coefficient update and echo cancellation on a packet-by-packet basis. When a packet is lost and has to be replaced using lost or errored packet substitution, the coefficients are frozen and echo cancellation is disabled. If echo cancellation were not disabled, subtracting out the estimated echo response would actually add echo. This would result because the substituted packet would be so different from the lost packet that subtracting the actual echo would effectively be adding the negative of the echo to this signal.
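The per-packet disable logic can be sketched as follows; the function signature and return convention are illustrative assumptions, not the patented implementation.

```python
def echo_cancel_packet(packet, echo_estimate, packet_lost):
    """Per-packet EC control: when the packet was replaced by lost-packet
    substitution, freeze adaptation and skip echo subtraction entirely,
    since subtracting an estimate of the lost packet's echo from the
    substituted packet would add echo rather than remove it."""
    if packet_lost:
        return list(packet), False          # pass through, coefficients frozen
    cancelled = [s - e for s, e in zip(packet, echo_estimate)]
    return cancelled, True                  # normal cancellation, adaptation allowed
```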
  • the packet substitution algorithm does not base the replacement packet on the previously received packets, but on the echo cancelled versions of these packets.
  • this AEC is integrated with a decoder that receives the same silence description (SID) information sent to the far-end.
  • the SID packets only contain spectral information, which the far end uses to filter randomly chosen excitation vectors.
  • the accuracy of the reconstructed signal at the far end is limited to the spectral characteristics conveyed by the SID information.
  • the far-end codec is part of the end-to-end system, as with the present invention, it is possible to synchronize the local random codebook excitation selection with that being used at the far-end.
  • Such synchronization may take advantage of any unused bits in the SID packets, which are usually the same size as the regular speech packets but only contain spectral information.
  • the unused bits corresponding to the codebook excitation are available for random number generator synchronization between the two ends. This allows the AEC to have access to the signal that is echoing back, even when DTX is active and comfort noise generation is taking place at the far-end in response to SID packets. Without this feature, the EC would not know what signal was being echoed back and would have to disable coefficient updates.
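The codebook synchronization described above can be sketched as a shared seed: both ends derive the comfort-noise excitation from the same seed carried in the otherwise-unused SID bits, so the near-end AEC can reconstruct exactly the signal the far end will play out. The uniform distribution and function name are assumptions for illustration.

```python
import random

def excitation_from_sid(seed, length):
    """Derive the comfort-noise excitation from a seed carried in unused
    SID bits; both ends calling this with the same seed reproduce the
    identical excitation sequence (illustrative sketch)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]
```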
  • a secondary issue with not having this feature is that any echoed noise would have to be left in the received signal.
  • this decoder is active even for non-SID packets. This helps to reduce the nonlinearity of the endpath by modeling the effect of the coder-decoder combination in one direction.
  • a last feature of the AEC is the ability for the echo cancellers at either end to move their respective h vectors (i.e., time domain transfer function) in response to changes in delay in their respective endpaths.
  • each end of the AEC maintains jitter buffers, which adjust in response to network conditions.
  • the EC receives information from its local jitter buffer and moves the effective locations of the h vector's coefficients in response to the buffer adjustments. Additionally, or alternatively, the EC also monitors its ERLE metric.
  • When the ERLE metric degrades, the EC knows to check whether the endpath delay has changed; if the delay has changed, the AEC adjusts the h vector's coefficient locations accordingly. In this way the AEC can accommodate delay changes that occur and are not under the NGVPS's control. These types of delay changes can occur due to adjustments in other network buffers. Furthermore, information regarding changes to delay characteristics determined at one end can be forwarded to the other end so that the effects of the changed delay can be accounted for at both ends.
  • the far end can send information regarding the change in delay to the near end so that it can begin to adjust its coefficients in anticipation of receiving the audio impacted by the changed delay.
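The h-vector repositioning described above can be sketched as a simple coefficient shift: when the jitter buffer reports a delay change of a known number of samples, the existing echo-path estimate is moved rather than re-converged. The function name and list representation are illustrative assumptions.

```python
def shift_h_vector(h, delay_change):
    """Move the echo-path estimate's coefficients by a known change in
    endpath delay (in samples) instead of re-converging from scratch.
    Coefficients shifted past either end of the vector are discarded."""
    n = len(h)
    shifted = [0.0] * n
    for i, coeff in enumerate(h):
        j = i + delay_change
        if 0 <= j < n:
            shifted[j] = coeff
    return shifted
```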

Abstract

A next-generation voice processing system (NGVPS) is provided. Voice-processing blocks within prior art systems have been opened up revealing common functions and interblock dependencies. By opening up and consolidating portions of these blocks, the NGVPS enhances the functionality of some functions by using processing and signals that were previously only available to a single block. By taking into account the interaction of these various sub-systems and elements, the NGVPS provides the best overall voice performance. This holistic approach provides new implementations for optimizing voice processing from an end-to-end systems approach.

Description

INTEGRATED VOICE PROCESSING SYSTEM FOR PACKET NETWORKS
Technical Field
The present invention is principally related to voice processing systems and, in particular, to a next generation voice processing system (NGVPS) designed specifically for voice-over-x systems and a wider class of voice processing applications.
Cross-Reference To Related Applications
The present application claims priority from U.S. Patent Application Serial No. 60/163,359 entitled "INTEGRATED VOICE PROCESSING SYSTEM FOR
COMMUNICATION NETWORKS" filed on November 3, 1999 and of U.S. Patent
Application Serial No. 60/224,398 "NOISE INJECTING SYSTEM" filed on August 10,
2000, both assigned to the same assignee of the present invention.
The teachings of U.S. Patent Nos. 5,721,730; 5,884,255; 5,561,668; 5,857,167 and 5,912,966 are hereby incorporated by reference.
Background Of The Invention
Voice quality is critical to the success of voice-over-x (e.g., Voice-Over-IP) systems, which has led to complex, digital signal processor (DSP) intensive, voice processing solutions. For the so-called new public network to be successful in large-scale voice deployment, it must meet or exceed the voice quality standards set by today's time division multiplex (TDM) network. These systems require a combination of virtually all known single source voice processing algorithms, which include but are not limited to the following: echo cancellation, adaptive level control, noise reduction, voice encoders and decoders (or codecs), acoustic coupling elimination and non-linear processing, voice activity detectors, double talk detection, signaling detection-relay-and-regeneration, silence suppression, discontinuous transmission, comfort noise generation and noise substitution, lost packet substitution reconstruction, and buffer and jitter control. The current generation of voice solutions for packet networks has addressed this complex need by obtaining and plugging together separate voice sub-systems. Suppliers of these systems have concentrated their efforts in obtaining and creating each of the various blocks and making the blocks work together from an input-output perspective. During the integration process, each of the functions has effectively been treated as a black box. As a result, the sub-systems have been optimized only with regard to their function and not with respect to the complete system. This has led to an overall sub-optimal design. The resulting systems have reduced voice quality and require more processing power than an integrated approach that has been optimized from a system perspective.
FIG. 1 shows a typical "black box" block diagram. The following abbreviations are used in FIG. 1: NR: noise reduction; ALC: automatic level control; ENC: speech encoder;
FE: far end speaker; EC: echo canceller; SS: silence suppressor; NS: noise substitution; DEC: speech decoder; and NE: near end speaker. As shown, a transmitted voice signal 102 is processed by the echo canceller, and the pulse code modulated (PCM) output of the canceller is simply forwarded to the optional noise reduction unit, and then onto the auto level control unit, and then onto the codec, etc. A similar path is provided for received voice signals 104.
The problem with this method of simply plugging together DSP boxes is that it does not take into account the interactions of the elements within the boxes. FIG. 2 shows some of the individual elements within the subsystems in the voice-over-x DSP system of FIG. 1. A feel for the problem can be attained by some examples; a couple of the subsystem elements that can lead to sub-optimal voice quality are examined here.
In typical fashion, a non-linear processor (NLP) is included within the echo cancellation block. The NLP is a post-processor that eliminates the small amount of residual echo that is always present after the linear subtraction of the echo estimate. One artifact of the NLP is that it can distort background noise signals. Also shown in FIG. 2 are some of the components inside the noise reduction (NR) block. The NR sub-system must generate a background noise estimate. If the NR block is not aware of the distortion introduced by the NLP, it will improperly identify the background noise resulting in lower performance. As also known in the art, there is a background noise estimate function within the speech coder subsystem. This estimate is sent to the far end voice-over-x system when the near end speaker is silent. Both the NLP and the NR block would also adversely affect this noise estimate if their actions were not taken into account.
Another interaction problem can occur with the voice activity detectors (VAD) shown in FIG. 2. The goal of the VAD is to accurately detect the presence of either NE or FE speech. If speech is present, then the associated processing of the ALC, NR, or speech coder is performed. The echo canceller's double talk detector (DTD) is another form of VAD. It must detect both NE and FE speech and control the canceller so that it only adapts when NE speech is absent. Interaction between the elements such as the NLP, NR, or changes in the ALC can negatively affect the accuracy of the downstream VAD. For example, losses in the NLP or NR subsystems may falsely trigger the speech encoder to misinterpret voice as silence. This would cause the codec to clip the NE speech, which would degrade voice quality. Similar issues exist with regard to the VAD in the ALC block.
Thus, a need exists for an improved voice processing system that does not suffer from the interactive shortcomings of prior art solutions.
Summary Of The Invention
The present invention provides a next-generation voice processing system (NGVPS) designed with the overall system in mind. Each voice-processing block has been opened up revealing common functions and inter-block dependencies. By opening up these blocks, the NGVPS also enhances the functionality of some functions by using processing and signals that were previously only available to a single block. By taking into account the interaction of these various sub-systems and elements, the NGVPS provides the best overall voice performance. This holistic approach provides new means for optimizing voice processing from an end-to-end systems approach. This will be an important factor in the success of the new network.
A more system-wide optimization approach is described herein. This approach takes into account the interaction of the various sub-systems and elements to provide the best overall voice performance. For the so-called new public network to be successful in large- scale voice deployment, it must meet and should exceed the voice quality standards set by today's TDM network. Therefore, optimizing voice processing from an end-to-end systems approach is a critical success factor in new network design.
The system-wide, integrated voice processing approach of the present invention also creates opportunities for further enhancements by reordering of the sub-blocks, which make up the various blocks. For example, work has been conducted in the past on sub-band NLPs for echo cancellers. However, the significant processing required to create the sub-bands has typically been an over-riding factor with respect to the performance improvements. However, a NR system typically divides the signal into sub-bands in order to perform its operations. Opening up these blocks facilitates a system in which the EC's NLP can be moved to the sub-band part of the NR system. Thus, the performance improvement may be gained with very little additional processing.
The new public network concept, which is based on packet voice, requires this type of processing at each point of entry and departure from the network. Establishing a more integrated system, having the best performing processing elements at these points, is one of the objectives of the present invention. The present invention may be applicable to voice band enhancement products or voice-over-x products. Additional applications that could benefit from the present invention include any other products carrying-out voice processing.
Brief Description Of The Drawings In the detailed description of presently preferred embodiments of the present invention which follows, reference will be made to the drawings comprised of the following figures, wherein like reference numerals refer to like elements in the various views and wherein:
FIG. 1 is a block diagram of a voice processing system in accordance with prior art techniques;
FIG. 2 is a block diagram illustrating various blocks of the voice processing system of FIG. 1 in greater detail;
FIG. 3 is a block diagram of a voice processing system in accordance with the present invention; FIG. 4 is a block diagram of an echo canceller and noise reduction circuit in accordance with prior art techniques and to which the present invention may be beneficially applied;
FIG. 5 is a block diagram of a noise injection system in accordance with one embodiment of the present invention; and FIG. 6 is a block diagram of a duo echo canceller system in accordance with another embodiment of the present invention.
Detailed Description of the Invention
1. An Integrated Approach Higher levels of voice quality can be achieved if the interactions of the elements within the boxes are considered and an integrated design approach is taken. The NGVPS system in effect opens these blocks, combining and enhancing common functions. This approach also eliminates inter-block dependencies. As a result of taking into account the interaction of these various sub-systems and elements, the NGVPS provides improved voice performance with less processing. In addition to improving common functions, the NGVPS enhances overall functionality by using processing and signals that were previously only available within a single block for multiple functions.
2. A Consolidated Multifunction Voice Activity Detector A block diagram of an integrated voice-over-x DSP system in accordance with the present invention is shown in FIG. 3. As those having ordinary skill in the art will recognize, various features of the system can be implemented in hardware, software, or a combination of hardware and software. For example, some aspects of the system can be implemented in computer programs executing on programmable computers. Each program can be implemented in a high level procedural or object-oriented programming language to communicate with a computer system. Furthermore, each such computer program can be stored on a storage medium, such as read-only-memory (ROM) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage medium is read by the computer to perform the functions described above. Note that there are a variety of signal types illustrated in FIG. 3. Speech signals (preferably in digital form) are represented by heavy solid lines; signal estimates, representative of various qualities of the voice signals, are illustrated using dashed lines; control signals are illustrated using solid lines; and algorithmic parameters, representative of internal values calculated by the various voice processing blocks, are illustrated using heavy dashed lines.
Transmitted voice signals 102 are provided to an echo canceller having an adder 304 and echo estimator 306. The resulting signals are then passed to a noise reduction circuit 308 and a non-linear processor 310. Collectively, the echo canceller, noise reduction circuit 308 and NLP 310 form an integrated echo and noise reduction section. The output of the NLP 310 is sent to an ALC 312 and then to buffering 314 and a speech encoder 316. It should be noted that a centralized buffer (not shown) is preferred over separate buffers associated with particular voice processing blocks (e.g., the buffering 314 associated with the speech encoder 316). In this manner, the various voice processing operations may be sequentially performed on audio data stored in the buffer. However, the centralized buffer has not been illustrated in FIG. 3 for the sake of simplicity. Similarly, the echo canceller functionality and the speech encoder 316 are preferably integrated, although they are shown as being separate in FIG. 3. The elements described above collectively form a transmit signal processing section of the overall integrated system, as shown in FIG. 3. Note that the term "circuitry" and its derivatives are used throughout this description as a means of describing various functional elements shown in the figures. However, use of this term should not be construed as a limitation to the manner in which such elements may be implemented, i.e., as hardware circuits.
The various blocks within the control processing section of the integrated system receive inputs from and provide outputs to the various blocks in the transmit signal processing section. Such signals are well known to those having ordinary skill in the art and, where necessary, are discussed below. Within the control processing section, a centralized voice activity detector 330 and a centralized noise estimator 332 are provided. As shown, these blocks are coupled to a residual estimator 334 (for assessing the amount of residual echo left in the transmit signal 102 after echo cancellation), a near end signal estimator 336, a near end gain controller 338 and a framing controller 340. As shown, the centralized noise estimator 332, the residual estimator 334, the near end signal estimator 336, the near end gain controller 338 and the framing controller 340 are associated with the transmit signal processing section. However, the control processing section also comprises a far end signal estimator 342 and a far end gain controller 344 associated with a receive signal processing section.
The receive signal processing section takes received audio signals 104 as input. A lost packet handler 360 is provided to mitigate the effects of lost packets on the received audio. The speech decoder 362 converts the received audio signal from a parameterized or other representative form to a continuous speech stream. The received speech is then provided to an ALC 364. Note that the redundant blocks illustrated in FIG. 2 have been consolidated in the single control block in FIG. 3. Examples of consolidated and enhanced functions include the VADs and the background noise estimators.
Almost all of the blocks in FIG. 2 have some form of Voice Activity Detection (VAD) circuitry built into them. The NR sub-system needs to know when speech is absent so that it can update its estimate of the background noise. NR also needs to know when speech is present so that it can adjust gains and calculate signal powers. The ALC block needs to know when speech is present so that it can get a good reading of the voice signal levels. The echo canceller uses a form of VAD called a double talk detector (DTD) to reduce the influence of uncorrelated signals and thus improve its estimate of the echo. The speech encoder and accompanying silence suppressor use a VAD to detect silence, which triggers a reduction in the rate of transmitted packets (i.e., during silence the codec outputs a description of the silence/background noise periodically). The integrated approach creates a common VAD that reduces the complexity of the product and, in turn, increases density and reduces cost. In addition, the consolidated VAD performs more accurately than the individual VADs.
Higher performance is the result of several factors. First, interaction problems that can occur when multiple voice activity detectors (VAD) are used can be avoided. Each block increases the likelihood that the subsequent blocks' VADs will misinterpret speech as silence or silence as speech. Additionally, the problem of cascading errors is avoided. Certain problem cases can cause a single block to perform incorrectly on a segment of speech or silence. In the multiple VAD case, this can have a cascading effect as the subsequent blocks' VADs trigger errantly.
The goal of the VAD is to accurately detect the presence of either NE or FE speech. If speech is present, then the associated processing of the ALC, NR, or speech coder is performed. The echo canceller's double talk detector (DTD) is another form of VAD. It must detect both NE and FE speech and control the canceller so that it only adapts when NE speech is absent. Interaction between the elements such as the NLP, NR, or changes in the ALC can negatively affect the accuracy of the downstream VAD. For example, losses in the NLP, NR, or ALC subsystems may falsely trigger the speech encoder to misinterpret voice as silence. This would cause the codec to clip the NE speech, which would degrade voice quality. Similarly losses in the NLP or NR subsystems could cause the VAD in the ALC block to perform errantly. Of course the loss in the NLP could likewise cause the NR subsystem to perform incorrectly, thereby suppressing voice. This problem would then cascade into all subsequent blocks. These problems are further accentuated by the various hold-over or hangover counters and the increased number of possible voice activity states in more sophisticated NR systems. A NR system can be established that uses a probability of speech presence measure to control the algorithm instead of a simple threshold.
A second factor in the VAD's performance enhancement is that it uses metrics from several of the blocks that would otherwise only be visible to a single block. The consolidated VAD (CVAD) uses performance measures from the echo canceller block such as Echo Return Loss Enhancement (ERLE), along with typical VAD measures (e.g. RMS power and zero-crossings) for both transmit and receive voice signals. The CVAD also uses the spectral properties and formant information from the noise reduction algorithm and speech encoder. The other speech encoder parameters are also used to help determine voice activity. The encoder's pitch predictor provides a powerful indicator of the presence of voiced speech and is used to further improve the CVAD. Those having ordinary skill in the art are familiar with these metrics and their use in implementing VADs.
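The metric fusion described above can be sketched as a single speech-presence probability computed from several block-level measures. The weights and the logistic mapping below are purely illustrative assumptions; the description specifies only which metrics (ERLE, RMS power, zero-crossings, and so on) feed the consolidated VAD, not how they are combined.

```python
import math

def speech_probability(rms_db, zero_crossing_rate, erle_db, noise_floor_db):
    """Fuse block-level metrics into one speech-presence probability
    (weights and mapping are illustrative, not from the patent)."""
    snr_db = rms_db - noise_floor_db
    # Higher SNR suggests speech; high zero-crossing rate and strong ERLE
    # (residual dominated by cancelled echo) argue against near-end speech.
    score = 0.4 * snr_db - 2.0 * zero_crossing_rate - 0.05 * erle_db
    return 1.0 / (1.0 + math.exp(-score))
```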
A third factor in the CVAD performance enhancement is that it controls all of the hold-over and voice states for each of the subsystems. A hold-over function is commonly added to a VAD to improve the system's performance for unvoiced speech by preventing state changes until a predetermined period of time has expired. The use of multiple voice states is a VAD enhancement that is part of a proprietary adaptive noise cancellation (ANC) algorithm of Tellabs, which is used for noise reduction. Centralizing the control of these interacting enhancement functions prevents unstable inter-block interaction. Hence, with the CVAD, both of these VAD enhancements can be optimized for each subsystem without having a detrimental effect on other sub-systems.
Similarly, the speech presence sensitivity requirements of each block differ. For instance, if given a choice between having the speech coder not recognize silence or performing silence suppression procedures during low-level speech, the former would be the obvious choice. Thus, the speech coder requires high speech sensitivity. Some of the other functions such as EC and NR can generally accommodate a less sensitive VAD, and benefit from a multi-level speech probability measure. For instance, the EC can slow the adaptation of its taps as the probability of speech presence measure approaches the DTD threshold. And as previously mentioned, a NR system can be established that uses a probability of speech presence measure to control the algorithm instead of a simple threshold.
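The adaptation slowing just described for the EC can be sketched as a step-size taper: the tap-update step shrinks as the speech-presence probability approaches the double-talk threshold and reaches zero at the threshold. The threshold value and maximum step size below are assumed values for illustration.

```python
def ec_adaptation_step(p_speech, dtd_threshold=0.6, mu_max=0.1):
    """Slow the EC tap adaptation as the speech-presence probability
    approaches the double-talk threshold (illustrative sketch)."""
    if p_speech >= dtd_threshold:
        return 0.0                    # double talk: halt adaptation
    return mu_max * (1.0 - p_speech / dtd_threshold)
```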
In order to accommodate the different speech presence sensitivity requirements, the CVAD provides appropriate voice activity signals to the different blocks, although the VAD processing is integrated. For instance, the CVAD would normally provide just a binary speech present or absent signal to the speech coder, while a multi-level or probability of speech presence measure is provided to the other blocks. These three CVAD factors combine to create a high performance VAD, which produces a powerful improvement in overall system performance.
3. Integrating EC and NR Functions
The interaction between self-optimized processing blocks can result in sub-optimal overall performance. This can be particularly pronounced for the EC function's NLP and the noise reduction function. The interaction is especially poor when ERLE is poor, which is the case when the NLP is used without the EC. The result is an intermittent choppiness in the speech and background noise.
By integrating the EC and NR functions together a significantly improved system is created. Integrating these two functions facilitates a reordering of the NLP and the NR subsystems. In the NGVPS, the NR subsystem is placed between the EC and the EC's NLP. This is important to speech quality, as the nonlinear nature of the NLP affects the NR system in a dramatic way. When the NLP is placed before the NR function, the NLP can change the noise location and affect its level at various frequencies in a time varying fashion that is difficult to track in the NR system. This is because most of the NR system's noise estimates are performed during silence, but used during speech. This makes NR systems susceptible to time varying noise backgrounds, particularly with regard to spectral content. Additionally, the NLP with its associated noise injection process may have different background noise levels when speech is present compared to when speech is absent. This is effectively a time varying noise source, which would degrade NR performance in a typical voice processing system (VPS).
The integrated system places the NR function between the EC and the NLP. It also uses a central noise and signal estimate as described in Section 4. The estimates are adjusted to compensate for the effect of the NR system in the control of the NLP. The NR system reduces noise by a fixed factor during times of voice inactivity. It has been shown that improved NLP performance is realized when the NLP operates in the sub-band domain. However, sub-band NLPs are rarely used due to the cost of creating the sub-band signal, both in real dollars as well as processing power and delay. However, the NGVPS offers this sub-band option, by further integrating the NLP into each of the NR system's sub-bands. These sub-bands are created as part of the noise reduction function. Hence, by integrating these two functions together, performance can be gained without the added cost. The sub-band NLP further improves performance. The integrated EC and NR approach out-performs the black-box approach even without this further enhancement.
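The reordered chain described above (EC subtraction first, then noise reduction, then the NLP) can be sketched minimally as follows; the function names, the fixed NR gain, and the NLP-as-mute simplification are illustrative assumptions, not the patented implementation:

```python
def process_transmit_frame(frame, echo_estimate, nr_gain, nlp_active):
    """Sketch of the integrated ordering: EC -> NR -> NLP.

    frame, echo_estimate: lists of samples; nr_gain: fixed NR attenuation
    applied here for simplicity; nlp_active: True when the NLP is operated.
    The NR sees the noise floor before the NLP's nonlinear action disturbs it.
    """
    residual = [s - e for s, e in zip(frame, echo_estimate)]  # EC subtraction first
    reduced = [nr_gain * s for s in residual]                 # NR before the NLP
    return [0.0] * len(reduced) if nlp_active else reduced    # NLP mutes last
```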
In one arrangement, the voice processing blocks include an echo canceller, a noise reduction block, and a level adjustment block. Each of those blocks makes a gain adjustment to the input signal. Normally this is done by each block independently. A preferred implementation involves computing the adjustments individually in each block but then adjusting the signal only once, according to the combined adjustment calculations, in one central adjustment block, function, or location.
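A minimal sketch of such a central adjustment follows; the function and parameter names are hypothetical, and each block is assumed to have already computed its own gain:

```python
def apply_combined_gain(frame, ec_gain, nr_gain, alc_gain):
    """Scale the signal once, using the product of the gains computed
    individually by the EC, NR, and level-control blocks, instead of
    letting each block rescale the samples itself."""
    g = ec_gain * nr_gain * alc_gain
    return [g * s for s in frame]
```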
4. Centralized Noise and Signal Estimates
Contrast once again the block diagram of an integrated voice-over-x DSP system as shown in FIG. 3 with the system shown in FIG. 2. The multiple signal estimators of FIG. 2 have been consolidated into a single signal estimator in the control block. Likewise, the multiple noise estimators of FIG. 2 have been consolidated into a single noise estimator in the control block.
The signal estimator is very closely related to some parts of the consolidated VAD (CVAD) function and should perhaps be shown as part of the VAD. This consolidated signal estimator includes both broadband and sub-band signal estimates. The majority of the processing power associated with creating the sub-band estimates is actually part of the NR process. Similarly, the majority of the processing power for the broadband estimate can be considered to be part of an ordinary VAD. These calculations can now be shared by the new high performance CVAD as well as the NR and ALC subsystems.
The various background noise estimates are consolidated into a single background noise estimate. This background noise estimate is actually a set of estimates, some broadband and some sub-band, but is referred to in the singular to avoid confusion with the unconsolidated estimates. This estimate is derived from the transmit signal just after the near-end echo estimate is subtracted by the canceller. The consolidated noise estimate serves as the background noise estimate to the NLP subsystem for background noise transparency (also known as comfort noise injection), the NR subsystem (for spectral subtraction of background noise), and the speech encoder (to send silence descriptors during silence). It is also shared by the VAD to help it avoid false triggers resulting from noise and to more accurately calculate the probability of speech being present. Using the signal out of the echo subtraction block improves the quality of this noise estimate, as the estimate is taken before performing other processing, which would corrupt the estimate. This improves the quality of the entire system. For example, the improved background noise estimate can be used in the NR, which, in turn, increases the amount of noise reduction and reduces any artifacts or distortion in the speech. Distorted speech is even more difficult to model in the codec, so it, in turn, would add more distortion. The silence suppressor uses a version of the noise estimate, which has been modified to account for the effect of the NR system. This improves the accuracy of the silence suppressor and reduces the noise modulation.
The quality of the noise often distinguishes one VBE system from the next. On average, speech is active less than 50% of the time in a given direction.
5. Consolidated Noise Injection
In telephony digital PCM systems, the analog signal is sampled 8000 times per second and converted to an 8-bit digital A-law or μ-law encoded signal. Voice processing systems interface with this PCM encoded digital data stream. An echo canceller is one such device that adapts to the impulse response of the near-end transmission facility and produces an echo estimate by multiplying this impulse response by the signal from the far end. This echo estimate is subtracted from the near-end signal, producing a signal which has the echo component removed. This process is not exact because of the quantization distortion of the A-law and μ-law encoding processes. This quantization distortion limits the echo return loss enhancement (ERLE) to approximately 33 dB even if all other processes are perfect. This still leaves a residual echo signal that is perceptible to the far-end talker. Historically, this problem is solved within the echo canceller design by passing the signal through a non-linear processor (NLP). The function of the NLP is to remove or attenuate the residual echo component of the signal so that it is no longer perceptible to the far-end talker.
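The ERLE figure cited above can be expressed directly as a power ratio in dB; the following sketch (function name assumed for illustration) computes it from a near-end signal and the residual left after subtracting the echo estimate:

```python
import math

def erle_db(near_end, residual):
    """Echo return loss enhancement in dB: the ratio of near-end signal
    power before cancellation to residual power after cancellation.
    As noted above, A-law/mu-law quantization distortion bounds this at
    roughly 33 dB even with an otherwise perfect canceller."""
    p_in = sum(s * s for s in near_end) / len(near_end)
    p_out = sum(s * s for s in residual) / len(residual)
    return 10.0 * math.log10(p_in / p_out)
```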
One issue with the use of NLPs is apparent where high non-linearity (from acoustic echo) and background noise are present. When the far-end user speaks, their voice energy drives the NLP to operate, thereby removing the residual echo. At the same time, however, the far-end user also hears the background noise muting, an effect known as background noise modulation. This is particularly obnoxious to the far-end speaker if there is a perceptible delay between the far-end and near-end telephones, since this modulation effect is not covered up by the sidetone applied to his own earpiece.
Historically, one solution to enhancing "background transparency" is to add "comfort noise" that matches the level of the idle channel noise when the center clipper is active. One approach for accomplishing this is described in United States Patent No. 5,157,653 issued in the name of Roland Genter, the teachings of which are hereby incorporated by this reference. This works in most instances, causing this noise modulation effect to be essentially non-perceptible to the far-end listener. The key, however, is the close spectral matching of the comfort noise to the idle channel noise, which requires additional processing power in any system.
The present invention contemplates how another aspect of a voice processing system, such as the noise reduction system element as a specific example, can be used during its otherwise "idle" time to provide virtually non-perceptible insertion of a derived noise signal into the gaps created during NLP operation. While it may be possible to design an NLP to remove significant non-linear "echo" artefacts (as may be found in the tail circuit of a mobile cellular telephony network, for example) without disturbing the background noise, the processing power required to achieve this effectively puts such a solution out of the reach of a practical system. The present invention limits or altogether circumvents any such onerous requirement by keeping the NLP basic and using otherwise spare processing power.
Referring now to FIG. 4, there is illustrated an exemplary echo canceller (EC) and noise reduction (NR) system in accordance with prior art techniques to which the present invention may be applied as described below. In general, operation of the echo canceller filter, the NLP, and the noise reduction filter are well published and known to those of ordinary skill in the art, and therefore need not be described in substantial detail herein. Accordingly, the focus of the following discussion will be on the technique by which system elements and/or characteristics and/or resources, such as for example the readily accessible noise reduction processing aspect of the system, can be used to provide a dynamic spectrally and amplitude matched comfort noise injection signal for insertion into the gaps of signal created by the NLP in response to far-end speech.
During a telephone call the NLP will be operating when the far-end talker speaks (to prevent residual echo), and releasing when the near-end talker speaks. During double-talk, speech is passing in both directions and the NLP is released, but the residual echo remaining after the echo canceller filter is likely to be below a disturbing level. In consideration of the near-end speech scenario, during this time the noise reduction processor will be converging on the stationary content of the background noise, this being the part of a noise signal for which the amplitude and spectrum remain constant over some seconds.
In the next instance the far-end talker will respond to the near-end talker, and the echo canceller filter algorithm decides whether the NLP should be operated or not (low to medium near-end noise, or high near-end noise condition, respectively). If the NLP is operated then the residual echo and any near-end noise will be muted, giving rise to a background noise modulation effect perceived by the far end. In an alternate (and, for claim construction, equivalent) embodiment, residual echo and any near-end noise might instead be compressed, scrambled, compressed and scrambled, clipped, or passed through unprocessed. From experience, perception of the modulation effect by the far-end user increases as delay over the telephone circuit increases (>40 ms round-trip delay). The overall effect is quite disturbing. Background noise modulation can be an issue wherever the speech path is interrupted, which is why the techniques described herein are equally useful in systems employing discontinuous transmission (DTX) methods and voice activity detectors (VADs).
Many voice-processing systems use a fixed spectrum noise injection system, which is quite suitable for wireline systems where the requirement is to match random circuit noise ("white" noise), which is of equal amplitude per frequency over the channel bandwidth. A problem occurs, however, because in nature the spectrum of acoustically derived background noise does not correspond to random noise, but is produced by music, background noise from traffic, car noise, or crowd noise (e.g., noise heard over a pay telephone in a restaurant). In many cases, the comfort noise injection is more obnoxious than having no noise injection. The desirable approach is to sample the noise during the speech gaps and derive a noise model of the stationary element for both amplitude and spectrum; in other words, a model comprising spectral and gain estimates. As known in the art, these estimates may be determined on a broadband or sub-band basis. By deriving the stationary element, a sample of random, spectrally and amplitude matched noise is available for use, less the non-stationary elements that could cause a repeatable pattern during playback into the signal path. The derived noise model can then be seamlessly (substantially unnoticed in the resultant audio) injected into the signal path following the NLP, whilst the NLP is operated. The level of the noise injection may be partially based upon NLP parameters to accommodate various levels of muting or scrambling that might be taking place. Therefore the control for sampling the noise and injecting the noise is common to the NLP control line (not known in prior art systems) from the echo canceller filter shown in FIG. 4. For purposes of claim construction, the term "injecting" refers to (means) substituting a noise signal for an NLP output, as well as combining a noise signal with the NLP output.
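One common way to extract the stationary element described above is a slow per-band exponential average updated only during speech gaps; the sketch below is illustrative (the smoothing constant and function name are assumptions, not taken from the disclosure):

```python
def update_noise_model(model, band_magnitudes, alpha=0.95):
    """Track the stationary noise element with a slow per-band exponential
    average taken during speech gaps; short-lived non-stationary events are
    largely averaged out. alpha is an illustrative smoothing constant.

    model: previous per-band estimates (None on first update);
    band_magnitudes: current per-sub-band magnitude measurements."""
    if model is None:
        return list(band_magnitudes)
    return [alpha * m + (1.0 - alpha) * b
            for m, b in zip(model, band_magnitudes)]
```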
Techniques for deriving the noise spectrum and amplitude generally appear in other system designs; however, among the differences between such other designs and the approach taken in the context of the herein-described embodiment of the present invention is that the system described herein makes alternative use of at least one aspect of a voice processor system. In particular, and in the context of the above-described and -illustrated EC and NR system, resources associated with the noise reduction processor and system are used, during what is effectively an idle period for traditional noise reduction processors (e.g., when the NLP is operated), in a manner to improve the perceived quality of the communicated signal.
Referring back to FIG. 4, ordinarily when there is a signal from the near-end, the noise reduction processor will be converging on the stationary element of the noise signal and then applying a filter function to remove a defined amount of the stationary noise from the signal. When the NLP is operated (to remove residual echo and background noise) the noise reduction filter is "frozen," or in other words not updated or otherwise changed, so that the model is not lost while the NLP is in operation. The noise reduction filter does not ordinarily function to provide noise reduction during this period of NLP operation, but then resumes operation once the NLP is no longer operated. In this way, as the noise spectrum and amplitude change over time, the filter processor can track the changes and efficiently reduce the noise.
In the context of the present invention, the spectral and gain estimates maintained by the noise reduction filter, which are typically frozen as described above, are referenced and used in a new manner for the generation of a noise signal for injection into the communication signal at the appropriate intervals (e.g., during operation of the NLP). One example approach for using such filter coefficients in this manner to generate a noise signal for injection is to use them to filter white noise that is internally generated. This noise could be broadband noise that is then filtered by each sub-band weighting coefficient, or noise generated independently for each sub-band and likewise weighted by that sub-band's coefficient. In either case, the generated noise then has the same spectral characteristics as the true or actual background noise since the adaptive sub-band weighting coefficients converge to the spectral coefficients of that noise. By using the gain estimate(s) to scale the spectrally matched noise, the model is able to more accurately match the background noise.
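The per-sub-band variant of this synthesis can be sketched as follows; the function name, Gaussian noise source, and per-band multiplicative weighting are illustrative assumptions (a real system would filter in the sub-band domain rather than simply scaling sample-by-sample):

```python
import random

def synthesize_comfort_noise(n_samples, band_coeffs, gain, seed=None):
    """Shape internally generated white noise with the frozen sub-band
    weighting coefficients, then scale by the gain estimate, yielding one
    spectrally matched noise stream per sub-band."""
    rng = random.Random(seed)
    return [[gain * c * rng.gauss(0.0, 1.0) for _ in range(n_samples)]
            for c in band_coeffs]
```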
In this way, at appropriate points during the conversation the noise reduction system effectively contributes to noise generation, but not at the same time that the noise reduction filter is operating to provide typical noise reduction. An example embodiment of this aspect of the present invention is illustrated in FIG. 5. In particular, a transmitted voice signal is provided to an echo canceller 502 and non-linear processor 504. The resulting signal is then sent to an adaptive noise estimator/reducer 506. Additionally, a control signal 510 indicative of the active/inactive state of the NLP 504 is sent to a noise reduction controller 508. In turn, the noise reduction controller 508 provides a noise reduction control signal 512 to the adaptive noise estimator/reducer 506. Thus, if the NLP 504 is inactive, the controller 508 configures the noise reduction control signal 512 to instruct the adaptive noise estimator/reducer 506 to allow the noise estimator to adapt and subtract a portion of the noise from the input voice signal. Conversely, if the NLP 504 is active, the controller 508 configures the noise reduction control signal 512 to instruct the adaptive noise estimator/reducer 506 to freeze the noise estimation process and generate synthesized background noise based on the current frozen background noise model. The synthesized noise is thereafter added to the input signal.
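The FIG. 5 control path can be reduced to a small state machine: the NLP state selects between adaptation and frozen-model synthesis. The class below is a hypothetical sketch of that logic only, not the disclosed controller 508:

```python
class NoiseReductionController:
    """Miniature analogue of the FIG. 5 control path: the NLP state
    (cf. signal 510) determines whether the noise model keeps adapting
    or is frozen for comfort-noise synthesis (cf. signal 512)."""

    def __init__(self):
        self.frozen_model = None

    def step(self, nlp_active, current_model):
        """Return the mode and the noise model to use this frame."""
        if nlp_active:
            if self.frozen_model is None:
                self.frozen_model = current_model  # freeze at NLP onset
            return ("inject", self.frozen_model)   # synthesize from frozen model
        self.frozen_model = None                   # NLP released: resume adapting
        return ("reduce", current_model)
```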
Tests have shown the resulting noise insertion system to have a good match in subjective listening tests and imperceptible operation in conversational tests for a wide range of program material. Even when there is a high content of non-stationary noise in the background noise, the loss of this detail in the returned signal to the far-end user is not considered disturbing, since that user is talking at this time and sensitivity to non-stationary noise is reduced. The far-end talker perceives disturbance in the stationary content most strongly, and the present invention can be used to resolve this issue. This same centralized system is used by the codec for its background noise estimate, which is used to generate its SID (silence description) packets when DTX (discontinuous transmission) or multi-rate transmission is active. The noise estimate used by the codec is able to take into account the NR, NLP, and noise injection levels and the noise spectra. These make DTX as unobtrusive as possible.
6. System Awareness and Optimization for Codec Frames and Packetization
Current voice processing systems (VPSs) synchronize the packetization engine to the speech frames generated by codecs. This provides a natural packetization while reusing the same buffering and signal delay for both purposes. This has been accomplished without breaking the black-box approach to building a system, because the frame output of the codec is simply incorporated into the packets.
Another way in which the integration of the NGVPS outperforms the current generation of VPSs is by synchronizing the entire system to fixed boundaries, preferably the codec frames, sub-frames, or both. Referring again to FIG. 3, this is accomplished by the framing control block 340 issuing at least one boundary control signal to the respective voice processing blocks, which control signal informs the blocks of the boundaries. This provides enhanced performance for a number of blocks. The ALC, NR, and EC functions of the NGVPS are all enhanced. ALC is used to add gain to low-level voice signals when too much transmission loss is encountered or to reduce high-level speech signals, which may overdrive analog circuits at the other end of the network. The intelligent block-to-block control coordinates the interaction of the automatic gain control and the speech coder. Gain control changes are synchronized with the frame boundaries of the speech coder. This allows the NGVPS to hold the gain constant during the speech coder sub-frames and/or frames. By not changing the gain during sub-frames and/or frames, coder performance is enhanced. Reducing the variation of the signal level mid-frame improves the modeling of the speech by the encoder. Mid-frame level changes require a trade-off in the coder's non-gain speech parameters. The codebook search, for example, would need to select an excitation vector which, when played out through the filter based on the LPC coefficients, would have a sudden increase in volume. This does not fit the normal speech model very well and can dominate the selection of a codebook vector, causing the more subtle characteristics to be overlooked. Depending on the particular coder, each frame and/or sub-frame of the coded speech contains a gain parameter. By synchronizing the ALC gain changes to these boundaries, the changes can be modeled in the gain parameter without the degenerating effect on the selection of the other parameters.
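The frame-synchronized gain behavior described above amounts to holding one gain value constant across each codec frame, changing it only at frame boundaries. A minimal sketch (function and parameter names are illustrative):

```python
def alc_apply(frames, frame_gains):
    """Apply one ALC gain per codec frame, held constant within the frame,
    so gain changes coincide only with frame boundaries and can be absorbed
    by the coder's per-frame gain parameter."""
    return [[g * s for s in frame] for frame, g in zip(frames, frame_gains)]
```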
The ALC algorithm is not only synchronized to the frames in order to coordinate its gain adjustment times, but also to take advantage of the data-blocking required for the codecs. An important part of an ALC system is the ability to minimize clipping due to over-amplification. By synchronizing the ALC system to the data-blocks, the ALC system can look at the entire block for clipping, and incorporate that into its gain selection.
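The block-wide clipping check might look like the following sketch, which caps the desired gain so that no sample in the buffered block would exceed full scale (the 16-bit full-scale value and function name are illustrative assumptions):

```python
def safe_block_gain(block, desired_gain, full_scale=32767.0):
    """Examine the entire buffered data-block before committing a gain,
    capping the gain so no sample in the block would clip."""
    peak = max(abs(s) for s in block)
    if peak == 0.0:
        return desired_gain  # silent block: any gain is safe
    return min(desired_gain, full_scale / peak)
```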
This same type of look-ahead is used to improve the VAD's performance. It is often difficult to recognize changes in voice activity until some time after they happen. By adding look-ahead to a VAD its performance can be improved. Some codecs such as G.729 and G.723.1 require look-ahead data to perform their functions. Again by coordinating the data-blocks with the VAD function, the system VAD can use look-ahead without adding delay to the system.
Many families of noise reduction algorithms, such as the NR algorithm currently being sold by Tellabs, operate on blocks of data at a time. The blocking up of data adds delay to these systems. Unfortunately, these systems are typically used in highly delay-sensitive applications. The NR algorithms are typically fast Fourier transform (FFT) based and require significant buffering. Wavelet-based algorithms and those requiring look-ahead would also require buffers of data and have similar delay implications. The NGVPS eliminates the additional buffering delay required in other systems by using the same data-blocking delays associated with the codecs to perform noise reduction. The current black-box systems do not have this level of synchronization between elements.
The system-wide awareness of the codec frame is also used to improve the operation of the EC. This, along with various other EC improvements, is explained in Section 7.
7. Network Adaptive Advanced Echo Canceller with Codec Integration
Another feature of the present invention that can significantly enhance voice quality is the inclusion of a far end echo canceller. Some of today's TDM carriers choose to cancel echo in both directions using a single network element. These "duo" echo cancellers are most popular in wireless environments, where delay introduced in the wireless air interface creates the need to cancel echo in both directions; i.e., echo from the PSTN and the wireless terminal. In a packet voice network, an operator may similarly choose to deploy a duo canceller configuration, as the same condition exists. (Note that the term "packet network", as used throughout this disclosure, is a specific example of a wider class of variable delay networks to which the present invention is applicable.) The packet network with speech compression adds delay to connections that might otherwise not need a canceller, as in wireless applications. FIG. 6 shows the duo layout comprising a near end echo canceller 602 and a far end echo canceller 604. Notice that the far end or packet switch echo canceller has the packet network in its endpath. Packet networks are notorious for dropped packets and significant delay variation. Both of these impairments can severely affect the performance of a canceller. In a standard voice-over-x implementation, the packet processor has some knowledge of the lost packets and changes in endpath delay. By sharing this information with the far end echo canceller and by subsequently using this information to intelligently control the canceller's behavior, the detrimental effects created by the packet network are minimized. In other words, voice quality is optimized. Some advanced TDM networks being created for the wireless world may also have changing endpath delay. This advanced echo canceller (AEC) has several new features. First, it is synchronized to packet boundaries and can disable both coefficient update and echo cancellation on a packet-by-packet basis.
When a packet is lost and has to be replaced using lost or errored packet substitution, the coefficients are frozen and echo cancellation is disabled. If echo cancellation were not disabled, subtracting out the estimated echo response would actually add echo. This would result because the substituted packet would be so different from the lost packet that subtracting the actual echo would effectively be adding the negative of the echo to this signal.
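The per-packet decision described above can be sketched as follows; the function name and return convention (samples plus an update-coefficients flag) are illustrative assumptions:

```python
def cancel_packet(packet, echo_estimate, packet_was_substituted):
    """Skip both echo subtraction and coefficient update for packets that
    were replaced by lost-packet substitution; subtracting the estimated
    echo from a substituted packet would add echo rather than remove it.

    Returns (output samples, update_coefficients flag)."""
    if packet_was_substituted:
        return list(packet), False  # pass through; freeze adaptation
    return [s - e for s, e in zip(packet, echo_estimate)], True
```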
In a more advanced version, the packet substitution algorithm does not base the replacement packet on the previously received packets, but on the echo cancelled versions of these packets.
Another feature of this AEC is that it is integrated with a decoder that receives the same silence description (SID) information sent to the far-end. This enables the near end EC to construct the signal being generated at the far-end. Normally, the SID packets only contain spectral information, which the far end uses to filter randomly chosen excitation vectors. As a result, the accuracy of the reconstructed signal at the far end is limited to the spectral characteristics conveyed by the SID information. However, when the far-end codec is part of the end-to-end system, as with the present invention, it is possible to synchronize the local random codebook excitation selection with that being used at the far-end. Such synchronization may take advantage of any unused bits in the SID packets, which are usually the same size as the regular speech packets but only contain spectral information. The unused bits corresponding to the codebook excitation are available for random number generator synchronization between the two ends. This allows the AEC to have access to the signal that is echoing back, even when DTX is active and comfort noise generation is taking place at the far-end in response to SID packets. Without this feature, the EC would not know what signal was being echoed back and would have to disable coefficient updates. A secondary issue with not having this feature is that any echoed noise would have to be left in the received signal. Preferably, this decoder is active even for non-SID packets. This helps to reduce the nonlinearity of the endpath by modeling the effect of the coder-decoder combination in one direction.
A last feature of the AEC is the ability for the echo cancellers at either end to move their respective h vectors (i.e., time domain transfer function) in response to changes in delay in their respective endpaths. As known in the art, such h vectors model the delay characteristics giving rise to echo conditions. In this regard, each end of the AEC maintains jitter buffers, which adjust in response to network conditions. At the end local to a given EC, the EC receives information from its local jitter buffer and moves the effective locations of the h vector's coefficients in response to the buffer adjustments. Additionally, or alternatively, the EC also monitors its ERLE metric. If the ERLE degrades past one or more thresholds, the EC knows to adjust its h vector's coefficient locations; if the delay has changed the AEC adjusts the h vector's coefficient locations accordingly. In this way the AEC can accommodate delay changes that occur and are not under the NGVPS's control. These types of delay changes can occur due to adjustments in other network buffers. Furthermore, information regarding changes to delay characteristics determined at one end can be forwarded to the other end so that the effects of the changed delay can be accounted for at both ends. For example, if the far end detects a change in delay characteristics having an effect on an echo path manifested at the near end, the far end can send information regarding the change in delay to the near end so that it can begin to adjust its coefficients in anticipation of receiving the audio impacted by the changed delay.
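Moving the h vector's coefficient locations in response to a reported delay change amounts to shifting the coefficient array by the delay difference instead of re-converging from scratch. A hypothetical sketch (sign convention and name are assumptions):

```python
def shift_h_vector(h, delay_change_samples):
    """Shift the echo-path model (h vector) coefficients when the jitter
    buffer reports an endpath delay change. Positive values mean the
    endpath delay grew, so the echo arrives later."""
    n, d = len(h), delay_change_samples
    if d >= 0:
        return [0.0] * d + h[:n - d]  # delay grew: shift coefficients later
    return h[-d:] + [0.0] * (-d)      # delay shrank: shift coefficients earlier
```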
These features are also applicable to certain TDM networks, particularly those in the wireless world, where speech compression and DTX can create many of the same problems that the AEC addresses for packet network applications.
While the foregoing detailed description sets forth presently preferred embodiments of the invention, it will be understood that many variations may be made to the embodiments disclosed herein without departing from the true spirit and scope of the invention. This true spirit and scope of the present invention is defined by the appended claims, to be interpreted in light of the foregoing specification.

Claims

What is claimed is:
1. In a communication system comprising a plurality of voice processing blocks used to process a transmitted voice signal, a method for controlling operation of the plurality of voice processing blocks, the method comprising steps of: providing a centralized frame controller coupled to the plurality of voice processing blocks; providing, by the centralized frame controller, at least one boundary control signal to the plurality of voice processing blocks, wherein operation of each of the plurality of voice processing blocks on the transmitted voice signal is dependent in part upon the at least one boundary control signal.
2. The method of claim 1, wherein the at least one boundary control signal is determined based on at least one of a frame boundary and a plurality of sub-frame boundaries corresponding to operation of a speech codec.
3. The method of claim 2, wherein the frame boundary and sub-frame boundaries correspond to the codec frame and sub-frame boundaries.
4. The method of claim 2, wherein the plurality of voice processing blocks comprises at least one automatic level control circuit.
5. The method of claim 4, wherein the at least one boundary control signal delineates periods of time, and wherein the at least one automatic level control circuit, in response to the at least one boundary control signal, maintains a gain factor at constant levels during each period of time.
6. The method of claim 5, wherein the at least one automatic level control circuit, for each period of time, determines the gain factor by analyzing a portion of the transmitted voice signal delimited by the period of time.
7. The method of claim 1, wherein the plurality of voice processing blocks comprises at least one noise reduction circuit.
8. The method of claim 7, wherein the at least one boundary control signal delineates periods of time, and wherein the at least one noise reduction circuit, in response to the at least one boundary control signal and for each period of time, performs noise reduction processing on a portion of the transmitted voice signal delineated by the period of time.
9. The method of claim 1, wherein the plurality of voice processing blocks comprises at least one echo canceller.
10. The method of claim 9, wherein the at least one boundary control signal delineates periods of time, and wherein the at least one echo canceller, in response to the at least one boundary control signal and for each period of time, performs echo cancellation processing on a portion of the transmitted voice signal delineated by the period of time.
11. An apparatus for processing a transmitted voice signal, comprising: a plurality of voice processing blocks that each operate upon the transmitted voice signal; and a centralized frame controller, coupled to each of the plurality of voice processing blocks, that provides at least one boundary control signal to the plurality of voice processing blocks, wherein operation of each of the plurality of voice processing blocks on the transmitted voice signal is dependent in part upon the at least one boundary control signal.
12. The apparatus of claim 11, further comprising a programmable processor coupled to a storage device, wherein the centralized frame controller is implemented via instructions executed by the programmable processor and stored in the storage device.
13. The apparatus of claim 11, wherein the centralized frame controller determines the at least one boundary control signal based on at least one of a frame boundary and a plurality of sub-frame boundaries corresponding to operation of a speech codec.
14. The apparatus of claim 11, wherein the plurality of voice processing blocks comprises at least one automatic level control circuit.
15. The apparatus of claim 14, wherein the at least one boundary control signal delineates periods of time, and wherein the at least one automatic level control circuit, in response to the at least one boundary control signal, maintains a gain factor at constant levels during each period of time.
16. The apparatus of claim 15, wherein the at least one automatic level control circuit, for each period of time, determines the gain factor by analyzing a portion of the transmitted voice signal delimited by the period of time.
17. The apparatus of claim 11, wherein the plurality of voice processing blocks comprises at least one noise reduction circuit.
18. The apparatus of claim 17, wherein the at least one boundary control signal delineates periods of time, and wherein the at least one noise reduction circuit, in response to the at least one boundary control signal and for each period of time, performs noise reduction processing on a portion of the transmitted voice signal delineated by the period of time.
19. The apparatus of claim 11, wherein the plurality of voice processing blocks comprises at least one echo canceller.
20. The apparatus of claim 19, wherein the at least one boundary control signal delineates periods of time, and wherein the at least one echo canceller, in response to the at least one boundary control signal and for each period of time, performs echo cancellation processing on a portion of the transmitted voice signal delineated by the period of time.
21. In a communication system comprising at least one echo canceller coupled to a variable-delay network, a method comprising steps of: determining, by a first echo canceller of the at least one echo canceller, delay characteristics related to at least one voice signal received via the variable-delay network; determining, by the first echo canceller, that delay characteristics corresponding to the at least one voice signal have changed; and modifying, by the first echo canceller, echo cancellation processing on the at least one voice signal in response to the changed delay characteristics.
22. The method of claim 21, further comprising a step of: prior to the step of determining that the delay characteristics have changed, determining that echo cancellation performance has degraded.
23. The method of claim 21, wherein the step of determining that the delay characteristics have changed further comprises inspecting a jitter buffer used to store the at least one voice signal.
24. The method of claim 21, wherein the at least one voice signal is comprised of a plurality of packets, the method further comprising steps of: correlating the delay characteristics with a portion of the plurality of packets, wherein the step of modifying further comprises discontinuing echo cancellation processing for the portion of the plurality of packets.
25. The method of claim 24, wherein the step of modifying further comprises substituting previously echo cancelled packets for missing packets of the plurality of packets.
26. The method of claim 21, wherein the step of modifying further comprises adjusting a time domain transfer function used to perform the echo cancellation processing.
27. The method of claim 21, wherein the step of determining that the delay characteristics have changed further comprises receiving information regarding changes to delay characteristics corresponding to a second echo canceller of the at least one echo canceller.
28. In a communication system comprising at least two echo cancellers, a method comprising steps of: determining, by a first echo canceller of the at least two echo cancellers, silence descriptor information related to a portion of a transmitted voice signal sent from the first echo canceller to a second echo canceller of the at least two echo cancellers; transmitting, by the first echo canceller to the second echo canceller, the silence descriptor information and excitation vector information; and reconstructing, by the second echo canceller, the portion of the transmitted voice signal based in part upon the silence descriptor information and the excitation vector information.
29. The method of claim 28, wherein the silence descriptor information comprises spectral information regarding the portion of the transmitted signal.
30. The method of claim 29, wherein the excitation vector information identifies a particular excitation vector that, when filtered according to the spectral information, provides an estimate of the portion of the transmitted signal.
31. The method of claim 28, further comprising steps of: receiving, by the first echo canceller from the second echo canceller, a received voice signal based in part upon the silence descriptor information and the excitation vector information; and modifying, by the first echo canceller, echo cancellation processing on the transmitted voice signal based on the received voice signal.
32. In a communication system comprising a plurality of voice processing blocks used to process at least one voice signal, a method for controlling operation of the plurality of voice processing blocks, the method comprising steps of: providing a centralized voice activity detector coupled to the plurality of voice processing blocks; performing, by the centralized voice activity detector, a first type of voice activity analysis on a transmitted voice signal of the at least one voice signal; and providing, by the centralized voice activity detector, at least one voice activity indication to the plurality of voice processing blocks in response to the first type of voice activity analysis, wherein operation of each of the plurality of voice processing blocks on the transmitted voice signal is dependent in part upon the at least one voice activity indication.
33. The method of claim 32, wherein the plurality of voice processing blocks comprises any combination of: a noise reduction circuit, an automatic level control circuit, an echo canceller, and a speech encoder.
34. The method of claim 32, wherein the at least one voice activity indication comprises at least two voice activity indications, and wherein the at least two voice activity indications are based on uniquely corresponding, non-identical voice activity thresholds.
35. The method of claim 34, wherein one of the at least two voice activity indications comprises a binary indication.
36. The method of claim 34, wherein one of the at least two voice activity indications comprises a probabilistic indication.
37. The method of claim 32, wherein the step of providing the at least one voice activity indication further comprises providing each of the at least one voice activity indication to a subset of the plurality of voice processing blocks.
38. The method of claim 32, wherein the step of performing further comprises performing a second type of voice processing analysis on a received voice signal of the at least one voice signal, and wherein the step of providing the at least one voice activity indication further comprises providing the at least one voice activity indication only in response to the second type of voice activity analysis.
39. The method of claim 38, wherein the step of providing the at least one voice activity indication further comprises providing the at least one voice activity indication in response to the first and the second type of voice activity analysis.
40. The method of claim 32, further comprising steps of: providing a centralized noise estimator coupled to the centralized voice activity detector and at least a portion of the plurality of voice processing blocks; performing, by the centralized noise estimator, at least one type of noise estimation analysis on the transmitted voice signal; and providing, by the centralized noise estimator, at least one noise estimate to the portion of the plurality of voice processing blocks in response to the at least one type of noise estimation analysis, wherein operation of each of the portion of the plurality of voice processing blocks on the transmitted voice signal is dependent in part upon the at least one noise estimate.
41. A computer-readable medium having stored thereon computer-executable instructions for performing the method of claim 32.
42. In a communication system comprising a plurality of voice processing blocks used to process a transmitted voice signal, a method for controlling operation of the plurality of voice processing blocks, the method comprising steps of: providing a centralized noise estimator coupled to the plurality of voice processing blocks; performing, by the centralized noise estimator, at least one type of noise estimation analysis on the transmitted voice signal; and providing, by the centralized noise estimator, at least one noise estimate to the plurality of voice processing blocks in response to the at least one type of noise estimation analysis, wherein operation of each of the plurality of voice processing blocks on the transmitted voice signal is dependent in part upon the at least one noise estimate.
43. The method of claim 42, wherein the plurality of voice processing blocks comprises any combination of: a noise reduction circuit, a non-linear processor, a voice activity detector, and a speech encoder.
44. The method of claim 42, wherein the at least one type of noise estimation analysis comprises broadband analysis and sub-band analysis, and wherein the at least one noise estimate comprises a broadband noise estimate based on the broadband analysis and a sub-band noise estimate based on the sub-band analysis.
45. The method of claim 42, wherein the transmitted voice signal is subjected to echo cancellation processing prior to the step of performing the at least one type of noise estimation analysis on the transmitted voice signal.
46. The method of claim 45, wherein the step of performing the at least one type of noise estimation analysis is performed before the transmitted voice signal is subject to non-linear processing.
47. The method of claim 42, further comprising steps of: providing a centralized voice activity detector coupled to the centralized noise estimator and at least a portion of the plurality of voice processing blocks; performing, by the centralized voice activity detector, voice activity analysis on the transmitted voice signal; and providing, by the centralized voice activity detector, at least one voice activity indication to the portion of the plurality of voice processing blocks in response to the voice activity analysis, wherein operation of each of the portion of the plurality of voice processing blocks on the transmitted voice signal is dependent in part upon the at least one voice activity indication.
48. A computer-readable medium having stored thereon computer-executable instructions for performing the method of claim 42.
49. An apparatus for processing at least one voice signal, comprising: a plurality of voice processing blocks that each operate upon the at least one voice signal; and a centralized voice activity detector, coupled to each of the plurality of voice processing blocks, that performs at least one type of voice activity analysis on the at least one voice signal and provides at least one voice activity indication to the plurality of voice processing blocks in response to the at least one type of voice activity analysis, wherein operation of each of the plurality of voice processing blocks on the at least one voice signal is dependent in part upon the at least one voice activity indication.
50. The apparatus of claim 49, wherein the plurality of voice processing blocks comprises any combination of: a noise reduction circuit, an automatic level control circuit, an echo canceller, and a speech encoder.
51. The apparatus of claim 49, wherein the at least one voice signal comprises a transmitted voice signal and the centralized voice activity detector performs a first type of voice activity analysis on the transmitted voice signal.
52. The apparatus of claim 51, wherein the at least one voice signal comprises a received voice signal and the centralized voice activity detector performs a second type of voice activity analysis on the received voice signal.
53. The apparatus of claim 49, wherein the at least one voice signal comprises a received voice signal and the centralized voice activity detector performs a second type of voice activity analysis on the received voice signal.
54. The apparatus of claim 49, further comprising a programmable processor coupled to a storage device, wherein the centralized voice activity detector is implemented via instructions executed by the programmable processor and stored in the storage device.
55. The apparatus of claim 49, further comprising: a centralized noise estimator, coupled to at least a portion of the plurality of voice processing blocks and the centralized voice activity detector, that performs at least one type of noise estimation analysis on the at least one voice signal and provides at least one noise estimate to the portion of the plurality of voice processing blocks in response to the at least one type of noise estimation analysis, wherein operation of each of the portion of the plurality of voice processing blocks on the at least one voice signal is dependent in part upon the at least one noise estimate.
56. An apparatus for processing at least one voice signal, comprising: a plurality of voice processing blocks that each operate upon the at least one voice signal; and a centralized noise estimator, coupled to each of the plurality of voice processing blocks, that performs at least one type of noise estimation analysis on the at least one voice signal and provides at least one noise estimate to the plurality of voice processing blocks in response to the at least one type of noise estimation analysis, wherein operation of each of the plurality of voice processing blocks on the at least one voice signal is dependent in part upon the at least one noise estimate.
57. The apparatus of claim 56, wherein the plurality of voice processing blocks comprises any combination of: a noise reduction circuit, a non-linear processor, a voice activity detector, and a speech encoder.
58. The apparatus of claim 56, further comprising a programmable processor coupled to a storage device, wherein the centralized noise estimator is implemented via instructions executed by the programmable processor and stored in the storage device.
59. The apparatus of claim 56, further comprising: a centralized voice activity detector, coupled to at least a portion of the plurality of voice processing blocks and the centralized noise estimator, that performs at least one type of voice activity analysis on the at least one voice signal and provides at least one voice activity indication to the portion of the plurality of voice processing blocks in response to the at least one type of voice activity analysis, wherein operation of each of the portion of the plurality of voice processing blocks on the at least one voice signal is dependent in part upon the at least one voice activity indication.
60. In a communication system comprising a plurality of voice processing blocks used to process a transmitted voice signal, a method for controlling operation of the plurality of voice processing blocks, the method comprising steps of: providing a centralized signal characteristic estimator coupled to the plurality of voice processing blocks; performing, by the centralized signal characteristic estimator, at least one type of signal characteristic estimation analysis on the transmitted voice signal; and providing, by the centralized signal characteristic estimator, at least one signal characteristic estimate to the plurality of voice processing blocks in response to the at least one type of signal characteristic estimation analysis, wherein operation of each of the plurality of voice processing blocks on the transmitted voice signal is dependent in part upon the at least one signal characteristic estimate.
61. The method of claim 60, wherein the plurality of voice processing blocks comprises any combination of: a noise reduction circuit, a non-linear processor, a voice activity detector, and a speech encoder.
62. The method of claim 60, wherein the at least one type of signal characteristic estimation analysis comprises broadband analysis and sub-band analysis, and wherein the at least one signal characteristic estimate comprises a broadband signal characteristic estimate based on the broadband analysis and a sub-band signal characteristic estimate based on the sub-band analysis.
63. The method of claim 60, wherein the transmitted voice signal is subjected to echo cancellation processing prior to the step of performing the at least one type of signal characteristic estimation analysis on the transmitted voice signal.
64. The method of claim 63, wherein the step of performing the at least one type of signal characteristic estimation analysis is performed before the transmitted voice signal is subject to non-linear processing.
65. The method of claim 60, further comprising steps of: providing a centralized voice activity detector coupled to the centralized signal characteristic estimator and at least a portion of the plurality of voice processing blocks; performing, by the centralized voice activity detector, voice activity analysis on the transmitted voice signal; and providing, by the centralized voice activity detector, at least one voice activity indication to the portion of the plurality of voice processing blocks in response to the voice activity analysis, wherein operation of each of the portion of the plurality of voice processing blocks on the transmitted voice signal is dependent in part upon the at least one voice activity indication.
66. A computer-readable medium having stored thereon computer-executable instructions for performing the method of claim 60.
67. In a communication system comprising a plurality of voice processing blocks used to process at least one voice signal, a method for combining the signal operations of the plurality of voice processing blocks, the method comprising steps of: computing the combined signal adjustment from at least substantially all of the voice processing blocks; and adjusting the input signal in response to said step of computing the combined signal adjustment.
68. A method of compensating for background noise modulation caused by operation of a non-linear processor on an audio signal, the method comprising steps of: determining a background noise model for the audio signal; reducing background noise in the audio signal based on the background noise model when the non-linear processor is not operating on the audio signal; and injecting synthesized background noise based on the background noise model into the audio signal when the non-linear processor is operating on the audio signal.
69. The method of claim 68 wherein the step of determining the background noise model further comprises adaptively determining filter coefficients representative of the background noise.
70. The method of claim 69, wherein the step of reducing the background noise further comprises steps of: generating the synthesized background noise based on the filter coefficients; and subtracting the synthesized background noise from the audio signal.
71. The method of claim 69, wherein the step of injecting the synthesized background noise further comprises steps of: discontinuing adaptive determination of the filter coefficients when the non-linear processor is operating on the audio signal to provide fixed filter coefficients; generating the synthesized background noise based on the fixed filter coefficients; and adding the synthesized background noise to the audio signal.
72. A computer-readable medium having stored thereon computer-executable instructions for performing the method of claim 68.
73. In a communication system comprising a non-linear processor coupled to a noise reduction circuit, a method for the noise reduction circuit to compensate for background noise modulation caused by operation of the non-linear processor on an audio signal, the method comprising steps of: receiving a first indication that the non-linear processor is not operating on the audio signal; reducing background noise in the audio signal in response to the first indication; receiving a second indication that the non-linear processor is operating on the audio signal; and injecting synthesized background noise into the audio signal in response to the second indication.
74. The method of claim 73, wherein the step of injecting the synthesized background noise into the audio signal further comprises steps of: discontinuing adaptive determination of a background noise model to provide a fixed background noise model; generating the synthesized background noise based on the fixed background noise model; and adding the synthesized background noise to the audio signal.
75. A computer-readable medium having stored thereon computer-executable instructions for performing the steps of claim 73.
76. An apparatus that compensates for background noise modulation caused by operation of a non-linear processor on an audio signal, the apparatus comprising: means for determining a background noise model for the audio signal when the non-linear processor is not operating on the audio signal; means for reducing background noise in the audio signal based on the background noise model when the non-linear processor is not operating on the audio signal; and means for injecting synthesized background noise based on the background noise model into the audio signal when the non-linear processor is operating on the audio signal.
77. The apparatus of claim 76, wherein the means for determining the background noise model further operate to adaptively determine filter coefficients representative of the background noise.
78. The apparatus of claim 77, wherein the means for reducing the background noise further comprise: means for generating the synthesized background noise based on the filter coefficients; and means for subtracting the synthesized background noise from the audio signal.
79. The apparatus of claim 77, wherein the means for determining the background noise further operate to discontinue adaptive determination of the filter coefficients when the non-linear processor is operating on the audio signal to provide fixed filter coefficients, and wherein the means for injecting the synthesized background noise further comprise: means for generating the synthesized background noise based on the fixed filter coefficients; and means for adding the synthesized background noise to the audio signal.
80. An apparatus that compensates for background noise modulation caused by operation of a non-linear processor on an audio signal, the apparatus comprising: a controller, coupled to the non-linear processor, that receives operating status information from the non-linear processor and that provides a noise reduction control signal as output; and a noise reduction circuit, coupled to the controller for receiving the noise reduction control signal and coupled to the non-linear processor for receiving the audio signal, that reduces background noise in the audio signal when the noise reduction control signal is asserted and that injects synthesized background noise into the audio signal when the noise reduction control signal is not asserted.
81. The apparatus of claim 80, wherein the noise reduction circuit further comprises: an adaptive filter, positioned to receive the audio signal, that adaptively determines filter coefficients representative of the background noise; and a generation circuit that takes as input the filter coefficients and provides synthesized background noise as output.
82. The apparatus of claim 81, wherein the noise reduction circuit further comprises: a combiner that subtracts the synthesized background noise from the audio signal when the noise reduction control signal is asserted.
83. The apparatus of claim 82, wherein the adaptive filter discontinues adaptive determination of the filter coefficients to provide fixed filter coefficients, the generation circuit provides the synthesized background noise based on the fixed filter coefficients, and the combiner adds the synthesized background noise to the audio signal when the noise reduction control signal is not asserted.
84. A method of compensating for background noise modulation caused by operation of a non-linear processor on an audio signal, the method comprising steps of: determining a background noise model for the audio signal; reducing background noise in the audio signal based on the background noise model; and injecting synthesized background noise based on at least one of: the background noise model and the state of the non-linear processor.
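Claims 32–36 describe a single centralized voice activity detector (VAD) that feeds both hard (binary) and soft (probabilistic) activity indications to several voice processing blocks. The following Python sketch illustrates that idea only; the function name, the SNR-based heuristic, and the thresholds are illustrative assumptions, not the algorithm disclosed in the patent:

```python
import numpy as np

def centralized_vad(frame, noise_power, binary_thresh=2.0):
    """Compute one shared pair of voice-activity indications per frame.

    A centralized detector like this would be consulted by the noise
    reduction circuit, automatic level control, echo canceller, and
    speech encoder, rather than each block running its own VAD.
    """
    frame_power = float(np.mean(np.asarray(frame, dtype=float) ** 2))
    snr = frame_power / max(noise_power, 1e-12)
    binary = snr > binary_thresh        # hard decision (cf. claim 35)
    prob = snr / (1.0 + snr)            # soft decision in [0, 1) (cf. claim 36)
    return binary, prob
```

Blocks with different sensitivity needs can then apply their own thresholds to the probabilistic indication while cheap consumers use the binary one, matching the "non-identical voice activity thresholds" of claim 34.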
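Claims 68–84 describe masking the background-noise modulation a non-linear processor (NLP) causes: while the NLP is idle, a background-noise model is adapted; while the NLP suppresses the signal, the model is frozen and matched synthesized noise is injected instead. A minimal single-band Python sketch of that control flow follows; all names are hypothetical, a scalar power estimate stands in for the adaptive filter coefficients of claims 69–71, and the noise-reduction step itself is elided:

```python
import numpy as np

def comfort_noise_pipeline(frames, nlp_active, alpha=0.9, seed=0):
    """Freeze-and-inject comfort noise around an NLP (cf. claims 68-71).

    frames     : list of sample arrays for successive frames
    nlp_active : list of booleans, True when the NLP mutes the frame
    alpha      : smoothing factor for the adaptive noise-power model
    """
    rng = np.random.default_rng(seed)
    noise_power = 1e-8            # adaptive background-noise model (scalar stand-in)
    out = []
    for frame, active in zip(frames, nlp_active):
        if active:
            # NLP is suppressing the signal: hold the model fixed and
            # inject synthesized noise at the tracked power level.
            synth = rng.standard_normal(len(frame)) * np.sqrt(noise_power)
            out.append(synth)
        else:
            # NLP idle: adapt the noise model on the incoming frame.
            noise_power = alpha * noise_power + (1 - alpha) * float(np.mean(frame ** 2))
            out.append(frame)     # a real system would also reduce noise here
    return out, noise_power
```

Because the injected noise is generated from the same model used during noise reduction, the listener hears a stable noise floor instead of the pumping effect the NLP would otherwise produce.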
PCT/US2000/030298 1999-11-03 2000-11-03 Integrated voice processing system for packet networks WO2001033814A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU13596/01A AU1359601A (en) 1999-11-03 2000-11-03 Integrated voice processing system for packet networks
CA002390200A CA2390200A1 (en) 1999-11-03 2000-11-03 Integrated voice processing system for packet networks

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US16335999P 1999-11-03 1999-11-03
US60/163,359 1999-11-03
US22439800P 2000-08-10 2000-08-10
US60/224,398 2000-08-10

Publications (1)

Publication Number Publication Date
WO2001033814A1 true WO2001033814A1 (en) 2001-05-10

Family

ID=26859581

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/030298 WO2001033814A1 (en) 1999-11-03 2000-11-03 Integrated voice processing system for packet networks

Country Status (4)

Country Link
US (6) US6526139B1 (en)
AU (1) AU1359601A (en)
CA (1) CA2390200A1 (en)
WO (1) WO2001033814A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1339205A2 (en) * 2002-02-22 2003-08-27 Broadcom Corporation Interaction between an echo canceller and a packet voice processing
WO2004002127A1 (en) * 2002-06-19 2003-12-31 Koninklijke Philips Electronics N.V. Non stationary echo canceller
WO2004049687A1 (en) 2002-11-25 2004-06-10 Intel Corporation Noise matching for echo cancellers
EP1298815A3 (en) * 2001-09-20 2004-07-28 Mitsubishi Denki Kabushiki Kaisha Echo processor generating pseudo background noise with high naturalness
EP1434416A3 (en) * 2002-12-23 2009-07-08 Broadcom Corporation Packet voice system with far-end echo cancellation
US7920697B2 (en) 1999-12-09 2011-04-05 Broadcom Corp. Interaction between echo canceller and packet voice processing
US9794417B2 (en) 2002-12-23 2017-10-17 Avago Technologies General Ip (Singapore) Pte. Ltd. Packet voice system with far-end echo cancellation

Families Citing this family (206)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6377919B1 (en) * 1996-02-06 2002-04-23 The Regents Of The University Of California System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech
EP1062737B8 (en) * 1998-01-15 2009-03-25 Siemens Enterprise Communications GmbH & Co. KG Method for setting up echo suppression devices in communication links with automatic machines
FI116643B (en) * 1999-11-15 2006-01-13 Nokia Corp Noise reduction
US7263074B2 (en) * 1999-12-09 2007-08-28 Broadcom Corporation Voice activity detection based on far-end and near-end statistics
US7058568B1 (en) * 2000-01-18 2006-06-06 Cisco Technology, Inc. Voice quality improvement for voip connections on low loss network
US6743302B2 (en) * 2000-01-28 2004-06-01 Henkel Corporation Dry-in-place zinc phosphating compositions including adhesion-promoting polymers
WO2001075863A1 (en) * 2000-03-31 2001-10-11 Telefonaktiebolaget Lm Ericsson (Publ) A method of transmitting voice information and an electronic communications device for transmission of voice information
US7246057B1 (en) * 2000-05-31 2007-07-17 Telefonaktiebolaget Lm Ericsson (Publ) System for handling variations in the reception of a speech signal consisting of packets
US6728672B1 (en) * 2000-06-30 2004-04-27 Nortel Networks Limited Speech packetizing based linguistic processing to improve voice quality
US6970511B1 (en) 2000-08-29 2005-11-29 Lucent Technologies Inc. Interpolator, a resampler employing the interpolator and method of interpolating a signal associated therewith
US6876699B1 (en) 2000-08-29 2005-04-05 Lucent Technologies Inc. Filter circuit for a bit pump and method of configuring the same
US6983047B1 (en) * 2000-08-29 2006-01-03 Lucent Technologies Inc. Echo canceling system for a bit pump and method of operating the same
US6973146B1 (en) 2000-08-29 2005-12-06 Lucent Technologies Inc. Resampler for a bit pump and method of resampling a signal associated therewith
US6799062B1 (en) * 2000-10-19 2004-09-28 Motorola Inc. Full-duplex hands-free transparency circuit and method therefor
US7016833B2 (en) * 2000-11-21 2006-03-21 The Regents Of The University Of California Speaker verification system using acoustic data and non-acoustic data
US7177808B2 (en) * 2000-11-29 2007-02-13 The United States Of America As Represented By The Secretary Of The Air Force Method for improving speaker identification by determining usable speech
US7489790B2 (en) 2000-12-05 2009-02-10 Ami Semiconductor, Inc. Digital automatic gain control
US6865162B1 (en) 2000-12-06 2005-03-08 Cisco Technology, Inc. Elimination of clipping associated with VAD-directed silence suppression
US6707869B1 (en) * 2000-12-28 2004-03-16 Nortel Networks Limited Signal-processing apparatus with a filter of flexible window design
US6985550B2 (en) * 2001-04-30 2006-01-10 Agere Systems Inc. Jitter control processor and a transceiver employing the same
US7161905B1 (en) 2001-05-03 2007-01-09 Cisco Technology, Inc. Method and system for managing time-sensitive packetized data streams at a receiver
US20020172350A1 (en) * 2001-05-15 2002-11-21 Edwards Brent W. Method for generating a final signal from a near-end signal and a far-end signal
US7103014B1 (en) * 2001-07-23 2006-09-05 Cirrus Logic, Inc. Systems and methods for improving sound quality in computer-network telephony application
GB2381702B (en) * 2001-11-02 2004-01-07 Motorola Inc Communication system, user equipment and method of performing a conference call therefor
US20030091162A1 (en) * 2001-11-14 2003-05-15 Christopher Haun Telephone data switching method and system
US20030185160A1 (en) * 2002-03-29 2003-10-02 Nms Communications Corporation Technique for use in an echo canceller for providing enhanced voice performance for mobile-to-mobile calls in a GSM environment through use of inter-canceller communication and co-ordination
US7489687B2 (en) * 2002-04-11 2009-02-10 Avaya. Inc. Emergency bandwidth allocation with an RSVP-like protocol
US7398209B2 (en) * 2002-06-03 2008-07-08 Voicebox Technologies, Inc. Systems and methods for responding to natural language speech utterance
US7693720B2 (en) * 2002-07-15 2010-04-06 Voicebox Technologies, Inc. Mobile systems and methods for responding to natural language speech utterance
KR20050021472A (en) * 2002-07-16 2005-03-07 코닌클리케 필립스 일렉트로닉스 엔.브이. Echo canceller with model mismatch compensation
JP4161628B2 (en) * 2002-07-19 2008-10-08 日本電気株式会社 Echo suppression method and apparatus
US7545926B2 (en) * 2006-05-04 2009-06-09 Sony Computer Entertainment Inc. Echo and noise cancellation
US7251213B2 (en) * 2002-09-17 2007-07-31 At&T Corp. Method for remote measurement of echo path delay
US8176154B2 (en) 2002-09-30 2012-05-08 Avaya Inc. Instantaneous user initiation voice quality feedback
US20040073690A1 (en) * 2002-09-30 2004-04-15 Neil Hepworth Voice over IP endpoint call admission
US7359979B2 (en) 2002-09-30 2008-04-15 Avaya Technology Corp. Packet prioritization and associated bandwidth and buffer management techniques for audio over IP
KR20040044217A (en) * 2002-11-19 2004-05-28 주식회사 인티스 Apparatus and Method for Voice Quality Enhancement in Digital Communications
US7242763B2 (en) * 2002-11-26 2007-07-10 Lucent Technologies Inc. Systems and methods for far-end noise reduction and near-end noise compensation in a mixed time-frequency domain compander to improve signal quality in communications systems
KR100463657B1 (en) * 2002-11-30 2004-12-29 삼성전자주식회사 Apparatus and method of voice region detection
US7420937B2 (en) 2002-12-23 2008-09-02 Broadcom Corporation Selectively adaptable far-end echo cancellation in a packet voice system
US7230955B1 (en) * 2002-12-27 2007-06-12 AT&T Corp. System and method for improved use of voice activity detection
US7272552B1 (en) 2002-12-27 2007-09-18 At&T Corp. Voice activity detection and silence suppression in a packet network
US7453900B2 (en) 2003-03-05 2008-11-18 Cisco Technology, Inc. System and method for monitoring noise associated with a communication link
DE10318598A1 (en) * 2003-04-24 2004-11-11 Grundig Aktiengesellschaft Method and device for echo cancellation in speech signals
US7848229B2 (en) * 2003-05-16 2010-12-07 Siemens Enterprise Communications, Inc. System and method for virtual channel selection in IP telephony systems
US20040234067A1 (en) * 2003-05-19 2004-11-25 Acoustic Technologies, Inc. Distributed VAD control system for telephone
US7149305B2 (en) * 2003-07-18 2006-12-12 Broadcom Corporation Combined sidetone and hybrid balance
US7353002B2 (en) * 2003-08-28 2008-04-01 Koninklijke Kpn N.V. Measuring a talking quality of a communication link in a network
US7443978B2 (en) * 2003-09-04 2008-10-28 Kabushiki Kaisha Toshiba Method and apparatus for audio coding with noise suppression
FR2861247B1 (en) * 2003-10-21 2006-01-27 Cit Alcatel TELEPHONY TERMINAL WITH QUALITY MANAGEMENT OF VOICE RESTITUTION DURING RECEPTION
US20050114118A1 (en) * 2003-11-24 2005-05-26 Jeff Peck Method and apparatus to reduce latency in an automated speech recognition system
FR2867649A1 (en) * 2003-12-10 2005-09-16 France Telecom OPTIMIZED MULTIPLE CODING METHOD
KR101035736B1 (en) * 2003-12-12 2011-05-20 삼성전자주식회사 Apparatus and method for cancelling residual echo in a wireless communication system
US7978827B1 (en) 2004-06-30 2011-07-12 Avaya Inc. Automatic configuration of call handling based on end-user needs and characteristics
US7917356B2 (en) 2004-09-16 2011-03-29 At&T Corporation Operating method for voice activity detection/silence suppression system
US8559466B2 (en) * 2004-09-28 2013-10-15 Intel Corporation Selecting discard packets in receiver for voice over packet network
WO2006040734A1 (en) * 2004-10-13 2006-04-20 Koninklijke Philips Electronics N.V. Echo cancellation
US8509703B2 (en) * 2004-12-22 2013-08-13 Broadcom Corporation Wireless telephone with multiple microphones and multiple description transmission
US20060147063A1 (en) * 2004-12-22 2006-07-06 Broadcom Corporation Echo cancellation in telephones with multiple microphones
US20060133621A1 (en) * 2004-12-22 2006-06-22 Broadcom Corporation Wireless telephone having multiple microphones
US8102872B2 (en) 2005-02-01 2012-01-24 Qualcomm Incorporated Method for discontinuous transmission and accurate reproduction of background noise information
ES2258918B1 (en) * 2005-02-21 2008-05-16 Ana Maria Hernandez Maestre MATTRESS OF INDEPENDENT AND REMOVABLE PARTS.
US7688817B2 (en) * 2005-04-15 2010-03-30 International Business Machines Corporation Real time transport protocol (RTP) processing component
US7640160B2 (en) * 2005-08-05 2009-12-29 Voicebox Technologies, Inc. Systems and methods for responding to natural language speech utterance
US7620549B2 (en) * 2005-08-10 2009-11-17 Voicebox Technologies, Inc. System and method of supporting adaptive misrecognition in conversational speech
US7949529B2 (en) * 2005-08-29 2011-05-24 Voicebox Technologies, Inc. Mobile systems and methods of supporting natural language human-machine interactions
US7610197B2 (en) * 2005-08-31 2009-10-27 Motorola, Inc. Method and apparatus for comfort noise generation in speech communication systems
WO2007027989A2 (en) * 2005-08-31 2007-03-08 Voicebox Technologies, Inc. Dynamic speech sharpening
GB2430853B (en) * 2005-09-30 2007-12-27 Motorola Inc Voice activity detector
US8738382B1 (en) * 2005-12-16 2014-05-27 Nvidia Corporation Audio feedback time shift filter system and method
US7729657B1 (en) 2006-04-12 2010-06-01 Abel Avellan Noise reduction system and method thereof
US8238817B1 (en) 2006-04-12 2012-08-07 Emc Satcom Technologies, Llc Noise reduction system and method thereof
DE602006005228D1 (en) * 2006-04-18 2009-04-02 Harman Becker Automotive Sys System and method for multi-channel echo cancellation
WO2007130765A2 (en) * 2006-05-04 2007-11-15 Sony Computer Entertainment Inc. Echo and noise cancellation
EP1855456B1 (en) * 2006-05-08 2009-10-14 Harman/Becker Automotive Systems GmbH Echo reduction in time-variant systems
DE602006007685D1 (en) * 2006-05-10 2009-08-20 Harman Becker Automotive Sys Compensation of multi-channel echoes by decorrelation
US8949120B1 (en) 2006-05-25 2015-02-03 Audience, Inc. Adaptive noise cancelation
JP2010506457A (en) * 2006-09-28 2010-02-25 クゥアルコム・インコーポレイテッド Method and apparatus for determining quality of service in a communication system
JP5038426B2 (en) * 2006-09-28 2012-10-03 クゥアルコム・インコーポレイテッド Method and apparatus for determining communication link quality
US8073681B2 (en) 2006-10-16 2011-12-06 Voicebox Technologies, Inc. System and method for a cooperative conversational voice user interface
FR2908004B1 (en) * 2006-10-26 2008-12-12 Parrot Sa ACOUSTIC ECHO REDUCTION CIRCUIT FOR HANDS-FREE DEVICE FOR USE WITH PORTABLE TELEPHONE
FR2908003B1 (en) * 2006-10-26 2009-04-03 Parrot Sa METHOD OF REDUCING RESIDUAL ACOUSTIC ECHO AFTER ECHO SUPPRESSION IN HANDS-FREE DEVICE
FR2908005B1 (en) * 2006-10-26 2009-04-03 Parrot Sa ACOUSTIC ECHO REDUCTION CIRCUIT FOR HANDS-FREE DEVICE FOR USE WITH PORTABLE TELEPHONE
EP1918910B1 (en) * 2006-10-31 2009-03-11 Harman Becker Automotive Systems GmbH Model-based enhancement of speech signals
ATE522078T1 (en) * 2006-12-18 2011-09-15 Harman Becker Automotive Sys LOW COMPLEXITY ECHO COMPENSATION
US7818176B2 (en) * 2007-02-06 2010-10-19 Voicebox Technologies, Inc. System and method for selecting and presenting advertisements based on natural language processing of voice-based input
US7617337B1 (en) 2007-02-06 2009-11-10 Avaya Inc. VoIP quality tradeoff system
US8654955B1 (en) 2007-03-14 2014-02-18 Clearone Communications, Inc. Portable conferencing device with videoconferencing option
US8019076B1 (en) 2007-03-14 2011-09-13 Clearone Communications, Inc. Portable speakerphone device and subsystem utilizing false doubletalk detection
US8406415B1 (en) 2007-03-14 2013-03-26 Clearone Communications, Inc. Privacy modes in an open-air multi-port conferencing device
US7912211B1 (en) 2007-03-14 2011-03-22 Clearone Communications, Inc. Portable speakerphone device and subsystem
US8077857B1 (en) 2007-03-14 2011-12-13 Clearone Communications, Inc. Portable speakerphone device with selective mixing
GB0705329D0 (en) * 2007-03-20 2007-04-25 Skype Ltd Method of transmitting data in a communication system
GB2448201A (en) * 2007-04-04 2008-10-08 Zarlink Semiconductor Inc Cancelling non-linear echo during full duplex communication in a hands free communication system.
US9191740B2 (en) * 2007-05-04 2015-11-17 Personics Holdings, Llc Method and apparatus for in-ear canal sound suppression
US10194032B2 (en) 2007-05-04 2019-01-29 Staton Techiya, Llc Method and apparatus for in-ear canal sound suppression
US11856375B2 (en) 2007-05-04 2023-12-26 Staton Techiya Llc Method and device for in-ear echo suppression
WO2008137870A1 (en) * 2007-05-04 2008-11-13 Personics Holdings Inc. Method and device for acoustic management control of multiple microphones
US11683643B2 (en) 2007-05-04 2023-06-20 Staton Techiya Llc Method and device for in ear canal echo suppression
US8526645B2 (en) 2007-05-04 2013-09-03 Personics Holdings Inc. Method and device for in ear canal echo suppression
EP1995940B1 (en) * 2007-05-22 2011-09-07 Harman Becker Automotive Systems GmbH Method and apparatus for processing at least two microphone signals to provide an output signal with reduced interference
US20090012786A1 (en) * 2007-07-06 2009-01-08 Texas Instruments Incorporated Adaptive Noise Cancellation
US20090043577A1 (en) * 2007-08-10 2009-02-12 Ditech Networks, Inc. Signal presence detection using bi-directional communication data
US7881459B2 (en) * 2007-08-15 2011-02-01 Motorola, Inc. Acoustic echo canceller using multi-band nonlinear processing
US7809129B2 (en) * 2007-08-31 2010-10-05 Motorola, Inc. Acoustic echo cancellation based on noise environment
US20090094026A1 (en) * 2007-10-03 2009-04-09 Binshi Cao Method of determining an estimated frame energy of a communication
US8428661B2 (en) * 2007-10-30 2013-04-23 Broadcom Corporation Speech intelligibility in telephones with multiple microphones
CN100555414C (en) * 2007-11-02 2009-10-28 华为技术有限公司 A kind of DTX decision method and device
US8290142B1 (en) 2007-11-12 2012-10-16 Clearone Communications, Inc. Echo cancellation in a portable conferencing device with externally-produced audio
US8140335B2 (en) 2007-12-11 2012-03-20 Voicebox Technologies, Inc. System and method for providing a natural language voice user interface in an integrated voice navigation services environment
JP4911010B2 (en) * 2007-12-11 2012-04-04 富士通株式会社 Packet capture device, packet capture method, and packet capture program
US8554551B2 (en) * 2008-01-28 2013-10-08 Qualcomm Incorporated Systems, methods, and apparatus for context replacement by audio level
US8290141B2 (en) * 2008-04-18 2012-10-16 Freescale Semiconductor, Inc. Techniques for comfort noise generation in a communication system
US9305548B2 (en) 2008-05-27 2016-04-05 Voicebox Technologies Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
US8589161B2 (en) 2008-05-27 2013-11-19 Voicebox Technologies, Inc. System and method for an integrated, multi-modal, multi-device natural language voice services environment
US7522877B1 (en) 2008-08-01 2009-04-21 Emc Satcom Technologies, Inc. Noise reduction system and method thereof
US8218751B2 (en) 2008-09-29 2012-07-10 Avaya Inc. Method and apparatus for identifying and eliminating the source of background noise in multi-party teleconferences
US8320553B2 (en) 2008-10-27 2012-11-27 Apple Inc. Enhanced echo cancellation
TWI414176B (en) * 2009-02-18 2013-11-01 Realtek Semiconductor Corp Communication apparatus with echo cancellations and method thereof
US8326637B2 (en) 2009-02-20 2012-12-04 Voicebox Technologies, Inc. System and method for processing multi-modal device interactions in a natural language voice services environment
EP2222091B1 (en) 2009-02-23 2013-04-24 Nuance Communications, Inc. Method for determining a set of filter coefficients for an acoustic echo compensation means
KR101251045B1 (en) * 2009-07-28 2013-04-04 한국전자통신연구원 Apparatus and method for audio signal discrimination
KR20120091068A (en) * 2009-10-19 2012-08-17 텔레폰악티에볼라겟엘엠에릭슨(펍) Detector and method for voice activity detection
US9171541B2 (en) * 2009-11-10 2015-10-27 Voicebox Technologies Corporation System and method for hybrid processing in a natural language voice services environment
US9502025B2 (en) 2009-11-10 2016-11-22 Voicebox Technologies Corporation System and method for providing a natural language content dedication service
CN102131014A (en) * 2010-01-13 2011-07-20 歌尔声学股份有限公司 Device and method for eliminating echo by combining time domain and frequency domain
US20110178800A1 (en) 2010-01-19 2011-07-21 Lloyd Watts Distortion Measurement for Noise Suppression System
US20110234200A1 (en) * 2010-03-24 2011-09-29 Kishan Shenoi Adaptive slip double buffer
US9343073B1 (en) * 2010-04-20 2016-05-17 Knowles Electronics, Llc Robust noise suppression system in adverse echo conditions
US9558755B1 (en) 2010-05-20 2017-01-31 Knowles Electronics, Llc Noise suppression assisted automatic speech recognition
US8447595B2 (en) * 2010-06-03 2013-05-21 Apple Inc. Echo-related decisions on automatic gain control of uplink speech signal in a communications device
US8411874B2 (en) 2010-06-30 2013-04-02 Google Inc. Removing noise from audio
EP2405634B1 (en) * 2010-07-09 2014-09-03 Google, Inc. Method of indicating presence of transient noise in a call and apparatus thereof
US8570103B2 (en) 2011-06-16 2013-10-29 Donald C. D. Chang Flexible multi-channel amplifiers via wavefront muxing techniques
US20120064759A1 (en) * 2010-09-09 2012-03-15 Spatial Digital Systems Retractable mobile power device module
JP5937611B2 (en) 2010-12-03 2016-06-22 シラス ロジック、インコーポレイテッド Monitoring and control of an adaptive noise canceller in personal audio devices
US8908877B2 (en) 2010-12-03 2014-12-09 Cirrus Logic, Inc. Ear-coupling detection and adjustment of adaptive response in noise-canceling in personal audio devices
US8650029B2 (en) 2011-02-25 2014-02-11 Microsoft Corporation Leveraging speech recognizer feedback for voice activity detection
US9824677B2 (en) 2011-06-03 2017-11-21 Cirrus Logic, Inc. Bandlimiting anti-noise in personal audio devices having adaptive noise cancellation (ANC)
US8948407B2 (en) 2011-06-03 2015-02-03 Cirrus Logic, Inc. Bandlimiting anti-noise in personal audio devices having adaptive noise cancellation (ANC)
US9214150B2 (en) 2011-06-03 2015-12-15 Cirrus Logic, Inc. Continuous adaptation of secondary path adaptive response in noise-canceling personal audio devices
US8958571B2 (en) * 2011-06-03 2015-02-17 Cirrus Logic, Inc. MIC covering detection in personal audio devices
US9318094B2 (en) 2011-06-03 2016-04-19 Cirrus Logic, Inc. Adaptive noise canceling architecture for a personal audio device
US9307321B1 (en) 2011-06-09 2016-04-05 Audience, Inc. Speaker distortion reduction
US9496886B2 (en) 2011-06-16 2016-11-15 Spatial Digital Systems, Inc. System for processing data streams
US9325821B1 (en) * 2011-09-30 2016-04-26 Cirrus Logic, Inc. Sidetone management in an adaptive noise canceling (ANC) system including secondary path modeling
GB2495927B (en) 2011-10-25 2015-07-15 Skype Jitter buffer
CN103187065B (en) 2011-12-30 2015-12-16 华为技术有限公司 The disposal route of voice data, device and system
US9014387B2 (en) 2012-04-26 2015-04-21 Cirrus Logic, Inc. Coordinated control of adaptive noise cancellation (ANC) among earspeaker channels
US9142205B2 (en) 2012-04-26 2015-09-22 Cirrus Logic, Inc. Leakage-modeling adaptive noise canceling for earspeakers
US9123321B2 (en) 2012-05-10 2015-09-01 Cirrus Logic, Inc. Sequenced adaptation of anti-noise generator response and secondary path response in an adaptive noise canceling system
US9319781B2 (en) 2012-05-10 2016-04-19 Cirrus Logic, Inc. Frequency and direction-dependent ambient sound handling in personal audio devices having adaptive noise cancellation (ANC)
US9082387B2 (en) 2012-05-10 2015-07-14 Cirrus Logic, Inc. Noise burst adaptation of secondary path adaptive response in noise-canceling personal audio devices
US9318090B2 (en) 2012-05-10 2016-04-19 Cirrus Logic, Inc. Downlink tone detection and adaptation of a secondary path response model in an adaptive noise canceling system
US9532139B1 (en) 2012-09-14 2016-12-27 Cirrus Logic, Inc. Dual-microphone frequency amplitude response self-calibration
US9640194B1 (en) 2012-10-04 2017-05-02 Knowles Electronics, Llc Noise suppression for speech processing based on machine-learning mask estimation
CN103888630A (en) 2012-12-20 2014-06-25 杜比实验室特许公司 Method used for controlling acoustic echo cancellation, and audio processing device
US9107010B2 (en) 2013-02-08 2015-08-11 Cirrus Logic, Inc. Ambient noise root mean square (RMS) detector
US9369798B1 (en) 2013-03-12 2016-06-14 Cirrus Logic, Inc. Internal dynamic range control in an adaptive noise cancellation (ANC) system
US9215749B2 (en) 2013-03-14 2015-12-15 Cirrus Logic, Inc. Reducing an acoustic intensity vector with adaptive noise cancellation with two error microphones
US9414150B2 (en) 2013-03-14 2016-08-09 Cirrus Logic, Inc. Low-latency multi-driver adaptive noise canceling (ANC) system for a personal audio device
US9635480B2 (en) 2013-03-15 2017-04-25 Cirrus Logic, Inc. Speaker impedance monitoring
US9467776B2 (en) 2013-03-15 2016-10-11 Cirrus Logic, Inc. Monitoring of speaker impedance to detect pressure applied between mobile device and ear
US9324311B1 (en) 2013-03-15 2016-04-26 Cirrus Logic, Inc. Robust adaptive noise canceling (ANC) in a personal audio device
US9208771B2 (en) 2013-03-15 2015-12-08 Cirrus Logic, Inc. Ambient noise-based adaptation of secondary path adaptive response in noise-canceling personal audio devices
US10206032B2 (en) 2013-04-10 2019-02-12 Cirrus Logic, Inc. Systems and methods for multi-mode adaptive noise cancellation for audio headsets
US9462376B2 (en) 2013-04-16 2016-10-04 Cirrus Logic, Inc. Systems and methods for hybrid adaptive noise cancellation
US9460701B2 (en) 2013-04-17 2016-10-04 Cirrus Logic, Inc. Systems and methods for adaptive noise cancellation by biasing anti-noise level
US9478210B2 (en) 2013-04-17 2016-10-25 Cirrus Logic, Inc. Systems and methods for hybrid adaptive noise cancellation
US9578432B1 (en) 2013-04-24 2017-02-21 Cirrus Logic, Inc. Metric and tool to evaluate secondary path design in adaptive noise cancellation systems
US9264808B2 (en) 2013-06-14 2016-02-16 Cirrus Logic, Inc. Systems and methods for detection and cancellation of narrow-band noise
US9536540B2 (en) 2013-07-19 2017-01-03 Knowles Electronics, Llc Speech signal separation and synthesis based on auditory scene analysis and speech modeling
US9270830B2 (en) * 2013-08-06 2016-02-23 Telefonaktiebolaget L M Ericsson (Publ) Echo canceller for VOIP networks
US9420114B2 (en) * 2013-08-06 2016-08-16 Telefonaktiebolaget Lm Ericsson (Publ) Echo canceller for VOIP networks
US9392364B1 (en) 2013-08-15 2016-07-12 Cirrus Logic, Inc. Virtual microphone for adaptive noise cancellation in personal audio devices
US9251806B2 (en) * 2013-09-05 2016-02-02 Intel Corporation Mobile phone with variable energy consuming speech recognition module
US9666176B2 (en) 2013-09-13 2017-05-30 Cirrus Logic, Inc. Systems and methods for adaptive noise cancellation by adaptively shaping internal white noise to train a secondary path
US9620101B1 (en) 2013-10-08 2017-04-11 Cirrus Logic, Inc. Systems and methods for maintaining playback fidelity in an audio system with adaptive noise cancellation
GB2519117A (en) * 2013-10-10 2015-04-15 Nokia Corp Speech processing
US9704472B2 (en) 2013-12-10 2017-07-11 Cirrus Logic, Inc. Systems and methods for sharing secondary path information between audio channels in an adaptive noise cancellation system
US10219071B2 (en) 2013-12-10 2019-02-26 Cirrus Logic, Inc. Systems and methods for bandlimiting anti-noise in personal audio devices having adaptive noise cancellation
US10382864B2 (en) 2013-12-10 2019-08-13 Cirrus Logic, Inc. Systems and methods for providing adaptive playback equalization in an audio device
US9369557B2 (en) 2014-03-05 2016-06-14 Cirrus Logic, Inc. Frequency-dependent sidetone calibration
US9479860B2 (en) 2014-03-07 2016-10-25 Cirrus Logic, Inc. Systems and methods for enhancing performance of audio transducer based on detection of transducer status
US9648410B1 (en) 2014-03-12 2017-05-09 Cirrus Logic, Inc. Control of audio output of headphone earbuds based on the environment around the headphone earbuds
US9319784B2 (en) 2014-04-14 2016-04-19 Cirrus Logic, Inc. Frequency-shaped noise-based adaptation of secondary path adaptive response in noise-canceling personal audio devices
US9609416B2 (en) 2014-06-09 2017-03-28 Cirrus Logic, Inc. Headphone responsive to optical signaling
US10181315B2 (en) 2014-06-13 2019-01-15 Cirrus Logic, Inc. Systems and methods for selectively enabling and disabling adaptation of an adaptive noise cancellation system
US9767784B2 (en) * 2014-07-09 2017-09-19 2236008 Ontario Inc. System and method for acoustic management
US20170237510A1 (en) * 2014-08-05 2017-08-17 Institut Fur Rundfunktechnik Gmbh Variable time offset in a single frequency network transmission system
DE112015003945T5 (en) 2014-08-28 2017-05-11 Knowles Electronics, Llc Multi-source noise reduction
US9478212B1 (en) 2014-09-03 2016-10-25 Cirrus Logic, Inc. Systems and methods for use of adaptive secondary path estimate to control equalization in an audio device
US9626703B2 (en) 2014-09-16 2017-04-18 Voicebox Technologies Corporation Voice commerce
US9898459B2 (en) 2014-09-16 2018-02-20 Voicebox Technologies Corporation Integration of domain information into state transitions of a finite state transducer for natural language processing
EP3207467A4 (en) 2014-10-15 2018-05-23 VoiceBox Technologies Corporation System and method for providing follow-up responses to prior natural language inputs of a user
GB2547063B (en) * 2014-10-30 2018-01-31 Imagination Tech Ltd Noise estimator
US10614799B2 (en) 2014-11-26 2020-04-07 Voicebox Technologies Corporation System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance
US10431214B2 (en) 2014-11-26 2019-10-01 Voicebox Technologies Corporation System and method of determining a domain and/or an action related to a natural language input
US9552805B2 (en) 2014-12-19 2017-01-24 Cirrus Logic, Inc. Systems and methods for performance and stability control for feedback adaptive noise cancellation
WO2017029550A1 (en) 2015-08-20 2017-02-23 Cirrus Logic International Semiconductor Ltd Feedback adaptive noise cancellation (anc) controller and method having a feedback response partially provided by a fixed-response filter
US9578415B1 (en) 2015-08-21 2017-02-21 Cirrus Logic, Inc. Hybrid adaptive noise cancellation system with filtered error microphone signal
US10013966B2 (en) 2016-03-15 2018-07-03 Cirrus Logic, Inc. Systems and methods for adaptive active noise cancellation for multiple-driver personal audio device
US10331784B2 (en) 2016-07-29 2019-06-25 Voicebox Technologies Corporation System and method of disambiguating natural language processing requests
US10192567B1 (en) * 2017-10-18 2019-01-29 Motorola Mobility Llc Echo cancellation and suppression in electronic device
US11232807B2 (en) 2018-04-27 2022-01-25 Dolby Laboratories Licensing Corporation Background noise estimation using gap confidence
JP7043344B2 (en) * 2018-05-17 2022-03-29 株式会社トランストロン Echo suppression device, echo suppression method and echo suppression program

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4514703A (en) * 1982-12-20 1985-04-30 Motorola, Inc. Automatic level control system
US4663884A (en) * 1982-10-30 1987-05-12 Walter Zeischegg Planter, especially for hydroculture
US5241543A (en) * 1989-01-25 1993-08-31 Hitachi, Ltd. Independent clocking local area network and nodes used for the same
US5307405A (en) * 1992-09-25 1994-04-26 Qualcomm Incorporated Network echo canceller
US5313498A (en) * 1991-06-13 1994-05-17 Nec Corporation Method and arrangement of echo elimination in digital telecommunications system
US5452341A (en) * 1990-11-01 1995-09-19 Voiceplex Corporation Integrated voice processing system
US5704003A (en) * 1995-09-19 1997-12-30 Lucent Technologies Inc. RCELP coder
US5796439A (en) * 1995-12-21 1998-08-18 Siemens Medical Systems, Inc. Video format conversion process and apparatus
WO1999003093A1 (en) * 1997-07-10 1999-01-21 Coherent Communications Systems Corp. Combined speech coder and echo canceler
US5999828A (en) * 1997-03-19 1999-12-07 Qualcomm Incorporated Multi-user wireless telephone having dual echo cancellers
US6023674A (en) * 1998-01-23 2000-02-08 Telefonaktiebolaget L M Ericsson Non-parametric voice activity detection
US6040860A (en) * 1994-09-30 2000-03-21 Matsushita Electric Industrial Co., Ltd. Imaging apparatus for supplying images with rich gradation across the entire luminance range for all subject types
US6084881A (en) * 1997-05-22 2000-07-04 Efficient Networks, Inc. Multiple mode xDSL interface
US6088365A (en) * 1998-01-29 2000-07-11 Generaldata Corp ATM switch voice server module having DSP array
US6148078A (en) * 1998-01-09 2000-11-14 Ericsson Inc. Methods and apparatus for controlling echo suppression in communications systems
US6151576A (en) * 1998-08-11 2000-11-21 Adobe Systems Incorporated Mixing digitized speech and text using reliability indices
US6160886A (en) * 1996-12-31 2000-12-12 Ericsson Inc. Methods and apparatus for improved echo suppression in communications systems

Family Cites Families (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4225919A (en) * 1978-06-30 1980-09-30 Motorola, Inc. Advanced data link controller
US4663675A (en) * 1984-05-04 1987-05-05 International Business Machines Corporation Apparatus and method for digital speech filing and retrieval
US5388092A (en) * 1989-06-27 1995-02-07 Nec Corporation Echo canceller for two-wire full duplex digital data transmission
US5029199A (en) * 1989-08-10 1991-07-02 Boston Technology Distributed control and storage for a large capacity messaging system
GB2256351B (en) * 1991-05-25 1995-07-05 Motorola Inc Enhancement of echo return loss
US5274705A (en) * 1991-09-24 1993-12-28 Tellabs Inc. Nonlinear processor for an echo canceller and method
US5301226A (en) * 1992-02-05 1994-04-05 Octel Communications Corporation Voice processing systems connected in a cluster
US5995539A (en) * 1993-03-17 1999-11-30 Miller; William J. Method and apparatus for signal transmission and reception
US5450490A (en) * 1994-03-31 1995-09-12 The Arbitron Company Apparatus and methods for including codes in audio signals and decoding
US5587998A (en) * 1995-03-03 1996-12-24 At&T Method and apparatus for reducing residual far-end echo in voice communication networks
US5646947A (en) * 1995-03-27 1997-07-08 Westinghouse Electric Corporation Mobile telephone single channel per carrier superframe lock subsystem
US5590121A (en) * 1995-03-30 1996-12-31 Lucent Technologies Inc. Method and apparatus for adaptive filtering
FI110826B (en) * 1995-06-08 2003-03-31 Nokia Corp Eliminating an acoustic echo in a digital mobile communication system
US5561668A (en) 1995-07-06 1996-10-01 Coherent Communications Systems Corp. Echo canceler with subband attenuation and noise injection control
US5668794A (en) * 1995-09-29 1997-09-16 Crystal Semiconductor Variable gain echo suppressor
US5835486A (en) 1996-07-11 1998-11-10 Dsc/Celcore, Inc. Multi-channel transcoder rate adapter having low delay and integral echo cancellation
US5884255A (en) 1996-07-16 1999-03-16 Coherent Communications Systems Corp. Speech detection system employing multiple determinants
WO1998006185A1 (en) 1996-08-01 1998-02-12 Northern Telecom Limited Echo cancelling system for digital telephony applications
US5920834A (en) * 1997-01-31 1999-07-06 Qualcomm Incorporated Echo canceller with talk state determination to control speech processor functional elements in a digital telephone system
US5884225A (en) * 1997-02-06 1999-03-16 Cargill Incorporated Predicting optimum harvest times of standing crops
US6078645A (en) * 1997-02-20 2000-06-20 Lucent Technologies Inc. Apparatus and method for monitoring full duplex data communications
US6035048A (en) * 1997-06-18 2000-03-07 Lucent Technologies Inc. Method and apparatus for reducing noise in speech and audio signals
US6163608A (en) * 1998-01-09 2000-12-19 Ericsson Inc. Methods and apparatus for providing comfort noise in communications systems
US6282176B1 (en) * 1998-03-20 2001-08-28 Cirrus Logic, Inc. Full-duplex speakerphone circuit including a supplementary echo suppressor
US6222910B1 (en) * 1998-05-29 2001-04-24 3Com Corporation System and method for connecting and interfacing a communications device to a telephone line via a telephone set
US6351731B1 (en) * 1998-08-21 2002-02-26 Polycom, Inc. Adaptive filter featuring spectral gain smoothing and variable noise multiplier for noise reduction, and method therefor
US6330533B2 (en) * 1998-08-24 2001-12-11 Conexant Systems, Inc. Speech encoder adaptively applying pitch preprocessing with warping of target signal
US6324170B1 (en) * 1998-09-10 2001-11-27 Nortel Networks Limited Echo controller with compensation for variable delay networks
US6658107B1 (en) * 1998-10-23 2003-12-02 Telefonaktiebolaget Lm Ericsson (Publ) Methods and apparatus for providing echo suppression using frequency domain nonlinear processing
US6208618B1 (en) * 1998-12-04 2001-03-27 Tellabs Operations, Inc. Method and apparatus for replacing lost PSTN data in a packet network
US6580696B1 (en) * 1999-03-15 2003-06-17 Cisco Systems, Inc. Multi-adaptation for a voice packet based
US7423983B1 (en) * 1999-09-20 2008-09-09 Broadcom Corporation Voice and data exchange over a packet based network
US6377637B1 (en) 2000-07-12 2002-04-23 Andrea Electronics Corporation Sub-band exponential smoothing noise canceling system


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KONDOZ ET AL.: "A high quality voice coder with integrated echo canceller and voice activity detector for VSAT systems", 3RD EUROPEAN CONFERENCE ON SATELLITE COMMUNICATIONS-ECSC-3, 1993, pages 196 - 200, XP002939217 *
LITTLE BERNHARD ET AL.: "Speech recognition for the Siemens EWSD public exchange", INTERACTIVE VOICE TECHNOLOGY (IVT) FOR TELECOMMUNICATIONS APPLICATIONS, 1998. IVT PROCEEDINGS IEEE 4TH WORKSHOP, 1998, pages 175 - 178, XP002939218 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6925174B2 (en) 1999-12-09 2005-08-02 Broadcom Corporation Interaction between echo canceller and packet voice processing
US7920697B2 (en) 1999-12-09 2011-04-05 Broadcom Corp. Interaction between echo canceller and packet voice processing
EP1298815A3 (en) * 2001-09-20 2004-07-28 Mitsubishi Denki Kabushiki Kaisha Echo processor generating pseudo background noise with high naturalness
US7092516B2 (en) 2001-09-20 2006-08-15 Mitsubishi Denki Kabushiki Kaisha Echo processor generating pseudo background noise with high naturalness
EP1339205A2 (en) * 2002-02-22 2003-08-27 Broadcom Corporation Interaction between an echo canceller and a packet voice processing
EP1339205A3 (en) * 2002-02-22 2003-11-19 Broadcom Corporation Interaction between an echo canceller and a packet voice processing
WO2004002127A1 (en) * 2002-06-19 2003-12-31 Koninklijke Philips Electronics N.V. Non stationary echo canceller
WO2004049687A1 (en) 2002-11-25 2004-06-10 Intel Corporation Noise matching for echo cancellers
US7627111B2 (en) 2002-11-25 2009-12-01 Intel Corporation Noise matching for echo cancellers
EP1434416A3 (en) * 2002-12-23 2009-07-08 Broadcom Corporation Packet voice system with far-end echo cancellation
US9794417B2 (en) 2002-12-23 2017-10-17 Avago Technologies General Ip (Singapore) Pte. Ltd. Packet voice system with far-end echo cancellation

Also Published As

Publication number Publication date
US20060098808A1 (en) 2006-05-11
AU1359601A (en) 2001-05-14
US7039181B2 (en) 2006-05-02
US7236586B2 (en) 2007-06-26
CA2390200A1 (en) 2001-05-10
US6522746B1 (en) 2003-02-18
US7003097B2 (en) 2006-02-21
US6526139B1 (en) 2003-02-25
US6526140B1 (en) 2003-02-25
US20030091182A1 (en) 2003-05-15
US20030053618A1 (en) 2003-03-20

Similar Documents

Publication Publication Date Title
US6526139B1 (en) Consolidated noise injection in a voice processing system
US7558729B1 (en) Music detection for enhancing echo cancellation and speech coding
US7539615B2 (en) Audio signal quality enhancement in a digital network
US8290141B2 (en) Techniques for comfort noise generation in a communication system
Gustafsson et al. A psychoacoustic approach to combined acoustic echo cancellation and noise reduction
JP4897173B2 (en) Noise suppression
US5912966A (en) Enhanced echo canceller for digital cellular application
US7564964B2 (en) Echo canceller
CN106571147B (en) Method for suppressing acoustic echo of network telephone
US20110075833A1 (en) Echo Canceller With Correlation Using Pre-Whitened Data Values Received By Downlink Codec
EP1869672A1 (en) Method and apparatus for modifying an encoded signal
CA2328006C (en) Linear predictive coding based acoustic echo cancellation
US8582754B2 (en) Method and system for echo cancellation in presence of streamed audio
JP2001251652A (en) Method for cooperatively reducing echo and/or noise
US7711107B1 (en) Perceptual masking of residual echo
Chandran et al. Compressed domain noise reduction and echo suppression for network speech enhancement
Lu et al. Pitch analysis-based acoustic echo cancellation over a nonlinear channel
Hua-Zhu et al. Research on VoIP Acoustic Echo Cancelation Algorithm Based on Speex
EP1944877B1 (en) Method of modifying a residual echo
KANG et al. A new post-filtering algorithm for residual acoustic echo cancellation in hands-free mobile application
Gnaba et al. Combined acoustic echo canceller for the GSM network
Sakhnov et al. Method for comfort noise generation and voice activity detection for use in echo cancellation system
Ghous et al. Modified Digital Filtering Algorithm to Enhance Perceptual Evaluation of Speech Quality (PESQ) of VoIP
MXPA98002468A (en) Echo cancelling system for digital telephony applications

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AU BA BB BG BR BZ CA CN CR CU CZ DM DZ EE GD GE HR HU ID IL IN IS JP KR LC LK LR LT LV MA MG MK MN MX NO NZ PL RO SG SI SK TR TT UA UZ VN YU ZA

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 2390200

Country of ref document: CA

122 EP: PCT application non-entry in European phase