Publication number: US20020010580 A1
Publication type: Application
Application number: US 09/249,108
Publication date: Jan 24, 2002
Filing date: Feb 12, 1999
Priority date: Feb 12, 1999
Also published as: US6381570
Publication number: 09249108, 249108, US 2002/0010580 A1, US 2002/010580 A1, US 20020010580 A1, US 20020010580A1, US 2002010580 A1, US 2002010580A1, US-A1-20020010580, US-A1-2002010580, US2002/0010580A1, US2002/010580A1, US20020010580 A1, US20020010580A1, US2002010580 A1, US2002010580A1
Inventors: Dunling Li, Zoran Mladenovic, Bogdan Kosanovic
Original Assignee: Dunling Li, Zoran Mladenovic, Bogdan Kosanovic
Signal dependent method for bandwidth savings in voice over packet networks
US 20020010580 A1
Abstract
A system and method for maintaining acceptable perceived sound quality while achieving desired bandwidth savings in voice-over packet networks that use signal energy level dependent thresholds. The method utilizes only the block energy of the input signal to discriminate between active signal, such as voice, facsimile tone, touch tone or dial tone, and background noise. The discrimination algorithm is adaptive to changes in signal energy levels. The method is designed to accommodate a large dynamic range of signal volumes and is reliable under different background noise conditions. The method includes a robust active signal and noise level estimation algorithm that prevents threshold divergence. A speech-smoothing scheme is used to prevent misclassifying weak active signals as background noise. The complexity of the bandwidth saving method is linear with respect to the input signal length. The discrimination can be adapted to accommodate differing traffic loads on the packet system by providing greater savings during high traffic and decreasing compression during low traffic conditions.
Claims(19)
We claim:
1. A method for establishing a noise/active signal threshold, comprising the steps of:
sampling the signal in blocks of a consistent length;
calculating the signal energy of each block;
extracting minimum and maximum energy vectors from the calculated block energies;
passing indicia of the block energy of signal blocks having an energy level below a low threshold to an estimator as noise;
passing indicia of the block energy of signal blocks having an energy level above a high threshold to an estimator as active signal;
estimating the energy level of signal blocks containing noise and the energy level of signal blocks containing active signal from said passed signal blocks;
establishing an active threshold for distinguishing noise from active signal based in part on said energy level estimations.
2. The method of claim 1, further comprising the step of:
discriminating said signal blocks into first signal blocks having an energy level below a low threshold, second signal blocks having an energy level above a high threshold, and third signal blocks having an energy level between said low threshold and said high threshold.
3. The method of claim 2, further including the step of:
eliminating third signal blocks having an energy level between said low and high threshold from said estimation.
4. The method of claim 1, further including the step of:
selecting an adaptive algorithm for adapting said established threshold based upon a determination of the signal envelope, wherein a first algorithm is applied when said envelope is indeterminate, a second algorithm is applied when said envelope is essentially constant and a third algorithm is applied when said envelope is varying but consistent.
5. The method of claim 4, wherein said signal is carried on a packet network, further including the step of:
sensing the channel traffic load in said packet network; and
dynamically reconfiguring said adaptive algorithm based upon said sensed traffic load.
6. The method of claim 1, further comprising the step of:
providing an output decision correlated to said signal blocks and indicative of the identification of the respective signal block as noise or active signal.
7. The method of claim 6, further comprising the step of:
smoothing said output decision.
8. The method of claim 7, wherein:
said smoothing step includes a delay of a change in said output decision from active signal to noise.
9. The method of claim 1, wherein:
said step of establishing an active threshold for distinguishing noise from active signal is based solely on said signal block energy level.
10. The method of claim 9, wherein:
said sampling, calculating, extracting, estimating and establishing are performed without noise reduction.
11. A system for establishing a noise/active signal threshold, comprising:
a signal sampler for sampling the signal in blocks of a consistent length;
a block energy calculator for calculating the signal energy of each block;
an extractor for extracting minimum and maximum energy vectors from the calculated block energies;
a marginal signal/noise discriminator for:
discriminating said signal blocks into first signal blocks having an energy level below a low threshold, second signal blocks having an energy level above a high threshold, and third signal blocks having an energy level between said low threshold and said high threshold;
passing indicia of the block energy of signal blocks having an energy level below a low threshold to an estimator as noise; and
passing indicia of the block energy of signal blocks having an energy level above a high threshold to an estimator as active signal;
an estimator for estimating the energy level of signal blocks containing noise and the energy level of signal blocks containing active signal from said passed signal blocks;
control logic for establishing an active threshold for distinguishing noise from active signal based in part on said energy level estimations.
12. The system of claim 11, further comprising:
means for eliminating said third signal blocks having an energy level between said low and high threshold from said estimation.
13. The system of claim 11, further comprising:
logic control means for selecting an adaptive algorithm for adapting said established threshold based upon a determination of the signal envelope, wherein a first algorithm is applied when said envelope is indeterminate, a second algorithm is applied when said envelope is essentially constant and a third algorithm is applied when said envelope is varying but consistent.
14. The system of claim 13, wherein said signal is carried on a packet network, further comprising:
a receiver for receiving indicia of the channel traffic load in said packet network; and
a reconfiguration element for dynamically reconfiguring said adaptive algorithm based upon said sensed traffic load.
15. The system of claim 11, further comprising:
means for providing an output decision correlated to said signal blocks, indicative of the identification of the respective signal block as noise or active signal.
16. The system of claim 15, further comprising:
a smoother for smoothing said output decision.
17. The system of claim 11, wherein:
said control logic establishes said active threshold for distinguishing noise from active signal based solely on said signal block energy level.
18. The system of claim 17, wherein:
said sampling, calculating, extracting, estimating and establishing are performed without noise reduction.
19. The system of claim 18, wherein said signal is carried on a packet network, further including:
a compressor/noise reducer for reducing said noise in said packet network signal based upon said output decision.
Description
BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] The invention relates to methods for conservation of bandwidth in a packet network. More specifically, the invention relates to methods for reducing the bandwidth consumption in voice-over packet networks by improved detection of active signals, background noise, and silence.

[0003] 2. Description of the Background Art

[0004] A system for bandwidth savings, known as time assignment speech interpolation (TASI), was introduced to increase the capacity of submarine telephone cables used in analog telephony. TASI was subsequently replaced with a similar digital system. Such schemes are commonly known as digital speech interpolation (DSI) systems. As multimode and variable-rate speech coding techniques have improved, several promising silence compression standards have been developed and issued to address the bandwidth saving problem. The algorithm standardized by the GSM for use in the Pan-European digital Cellular Mobile Telephone Service is an example of a voice activity detection (VAD) technique designed for the mobile environment. Another VAD algorithm for wireless applications is provided with the TIA/EIA/IS-127 Enhanced Variable Rate Codec standard. There are two silence compression standards from ITU: G.723.1 Annex A, and G.729 Annex B.

[0005] Although these standards for bandwidth savings are very effective, their complexity is very high. The complexity of these methods derives from the fact that they rely upon processing the spectral features of a signal, which requires an analysis of the frequency and/or spectrum of the signal to identify the characteristics of speech, voice, or other distinct signals. These methods require adaptive algorithms to reduce noise, band pass filters to isolate speech, and the like to identify accurately characteristics of the signal to detect voice from other sounds, signals, or noise.

[0006] Complex standards require complex algorithms and therefore require significant processing capabilities. The method of the present invention significantly reduces complexity and therefore can be implemented in high channel density wired telephony applications. The present invention is simple in terms of processing and memory requirements and results in excellent performance.

SUMMARY OF THE INVENTION

[0007] In voice-over packet applications, the speech signal is transmitted using data packets. The general telephone network limits the bandwidth of the speech signal to the 300 to 3,400 Hz range. In most speech codecs, the signal is sampled at 8 kHz, resulting in a maximum signal bandwidth of 4 kHz. Each sample is represented with 16 bits, resulting in a 128 kbps bit rate. To save on bandwidth, PCM and ADPCM codecs are widely used in telephony applications and are important in high channel density implementations of voice-over packet applications. For the purpose of bandwidth savings with PCM and ADPCM codecs, voice activity detection is used to distinguish silence from active signal. Silence packets are not transmitted during nonspeech intervals, effectively increasing the number of channels. In voice-over packet applications, the input speech level can vary from −50 dBm0 to 0 dBm0, the facsimile signal level can vary from −48 dBm0 to 0 dBm0, and the noise properties may change considerably during a conversation.
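As a worked check of the bit-rate figure in the paragraph above, using only the sampling rate and sample width stated there:

$$8000\ \tfrac{\text{samples}}{\text{s}} \times 16\ \tfrac{\text{bits}}{\text{sample}} = 128{,}000\ \tfrac{\text{bits}}{\text{s}} = 128\ \text{kbps}$$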

[0008] To detect signal activity accurately under different signal input and noise conditions, the energy threshold is adapted to the input signal and noise levels. Because of its adaptive function, the corresponding signal activity detection algorithm herein provides bandwidth savings with low complexity and low delay and performs well for a wide range of signal energy input levels and background noise environments as well as signal energy level changes. Because the bandwidth savings may change based on packet network traffic load, the algorithm is dynamically configurable to adjust the bandwidth savings percentages.

[0009] In development of voice-over packet network applications, a reliable bandwidth saving method is crucial to achieve a desirable balance between acceptable perceived sound quality and reduction in bandwidth requirements. Due to a variety of working conditions a number of challenges are imposed upon such a method. The bandwidth savings needs to be accomplished with both low delay and low complexity. The method must perform well for a wide range of input signal levels, must work in a variety of background noise environments, and must be robust in the presence of active signal and/or background noise level changes. Since the bandwidth requirements may change based on network factors such as load or traffic conditions or because of changing performance needs, the present invention is dynamically configurable to perform well under different requirements. It is common for the noise environment to alter in real-time, and the present invention dynamically adjusts through monitoring such changes to accomplish bandwidth savings and to perform well under a wide variety of conditions.

[0010] The present invention accomplishes efficient savings in bandwidth through a system for active signal (e.g., voice, facsimile, dialtone) and background noise detection and discrimination which utilizes block energy threshold adaptation, adaptive marginal signal/noise discrimination, state control logic, and active signal smoothing. The system distinguishes active signal (e.g., voice, speech, etc.) from background noise to allow for the compression or elimination of periods of silence or background noise. The system includes a state machine for logic control in establishing a dynamic adaptive threshold, below which the signal is identified as silence or background noise, and above which the signal is identified as active signal. The threshold is established by factors, including an active signal estimation technique from discrimination of noise below a first threshold and active signal above a second threshold. Signal between the thresholds cannot be discriminated and is therefore not used in the estimation to avoid loss of voice through misidentification as noise or silence. The system is efficient in detection of active signals and elimination of noise, while maintaining a safety margin to avoid degradation of voice quality by misidentification of low voice signals as background or silence.

[0011] The state machine, FIG. 2, includes the flow logic, FIG. 3, for updating the adaptive block energy threshold used for threshold detection, FIG. 1. There are three states in the state machine: learning state, converged state, and constant envelope state. Learning state is the initial and default state, where the system does not have any reliable estimates of noise or active signal energy levels. The state control logic 6 is in converged state when the current energy level threshold is acceptable and the noise and signal level estimations are reliable. When the input signal has an approximately constant envelope, the state machine is in the constant envelope state to distinguish facsimile from background noise in order to identify facsimile as active signal, not noise.

[0012] The system utilizes signal energy detection to establish and adjust the adaptive lower and upper thresholds. The signal is divided into blocks of a desired length, and signal features relating to the signal energy level are extracted for analysis to determine signal feature characteristics used to establish noise and active signal predictive thresholds. These established thresholds are used to discriminate the signal.

[0013] A signal from a source is first processed to determine the energy E(n) of the signal. The energy level is processed into energy vectors corresponding to discrete time intervals, for analysis. Each block is first processed by comparison with an initial set of thresholds within a marginal signal and noise discriminator, to discriminate initially between noise and signal. If below a first noise threshold, the block is classified as noise. If above a second voice threshold, the block is classified as active signal. Once discriminated, blocks below the noise threshold are used in noise level estimation, and blocks above the active signal threshold are used in active signal level estimation. Blocks between the thresholds are not used in level estimation. In this manner the present invention creates a clear separation between signal and noise.

[0014] These processed signal blocks are then used to create active estimates of the noise level and of the active signal level. The estimation is a continuous processing activity updated as further signal blocks are discriminated and made available to the estimator. In the exemplary embodiment, estimation is performed using a combination RMS/geometric averaging of block energies under the control of the marginal signal and noise discriminator. However, either RMS or geometric averaging alone could be used, as could other power estimation techniques, sample based or block based averaging. The method of both sampling and averaging can be varied through a change of factors such as time constants, frame size for block energy threshold detection, changing noise and/or signal thresholds, elimination of a discrimination gap between noise and signal, estimate noise/voice division, etc., still within the scope of the invention as herein taught.

[0015] The estimates of noise level and active signal level are later used to establish the adaptive thresholds that the threshold detector applies to the current signal block to determine whether the block is noise or voice. That determination forms the output decision used in compression for bandwidth savings.

[0016] The determined energy level E(n) of the signal is also supplied to a threshold detector to make the detection between noise and active signals. The current values of the adaptive thresholds within the detector, as established from the active estimates of noise signal and active signal level based upon the control of the state control logic, are used to classify an input block into “active signal” or “noise” comparing the corresponding block energy E(n) with the adaptive threshold. The threshold adaption is performed based upon a current one of several available algorithms selected by a state control logic based upon the dynamics of the signal estimation processing. Different threshold functions are applied to the detection based upon the reliability of these estimates and the consistency of the signal envelope.

[0017] Weak active signals, which may present intermittent low signal levels, can be misclassified as noise. In order to reduce misclassification, the output of the threshold detector is smoothed. By smoothing, short term active signal drops are not classified as noise and subsequently improperly compressed. The smoothed output of the threshold detector is used as the output decision of the system method. The smoothing mechanism is influenced by the traffic load configuration. In the exemplary embodiment, a hang-over period smoothing method is implemented. Alternative delay methods or smoothing algorithms can be implemented. However, the computational processing power needed to perform signal smoothing processing must be considered in implementing the present invention, which relies upon simplification for effective implementation.

[0018] The output decision is then used by the voice-over packet network communication system to implement the desired processing of the current packet for bandwidth savings by appropriate compression based upon the simplified active signal/noise discrimination of the present invention.

[0019] In energy-based signal activity detection, one of the difficulties is that a simple energy measure cannot distinguish low-level speech sounds (weak active signal) from background noise if the signal-to-noise ratio is not high enough. In the implementation of the preferred embodiment of the present invention as described below, the following assumptions have been made. However, these values can be adjusted to process signals according to desired design parameters while remaining within the inventive concept taught herein:

[0020] during natural conversation, within a long enough period of time, there will exist at least one silence frame (i.e., a signal frame that does not contain speech sounds) of a minimum duration;

[0021] during natural conversation, weak speech sounds should normally last only for short periods of time;

[0022] the short-term statistics (up to 1.5 seconds) of a noise are stationary or pseudo-stationary;

[0023] the block energy threshold should be a function of noise level, active signal level, and signal-to-noise ratio.

BRIEF DESCRIPTION OF THE DRAWINGS

[0024] FIG. 1 is an overall block diagram for the signal processing and threshold detection system of the present invention.

[0025] FIG. 2 is a block diagram illustrating the interaction of the states of the state control logic of the present invention.

[0026] FIG. 3 is a logic flow chart illustrating the threshold update process of the state control logic of the present invention.

[0027] FIG. 4 is a graph illustrating the coefficient K(Emax/Emin) for the learning state of the state control logic of the present invention.

[0028] FIG. 5 is a graph illustrating the coefficient K(Evoice/Enoise) for the converged state of the state control logic of the present invention.

DETAILED DESCRIPTION OF PREFERRED EXEMPLARY EMBODIMENTS

[0029] FIG. 1 is a block diagram illustrating an exemplary embodiment of the overall logic flow of the present invention. The signal from a source in a packet network passes through splitter 9 and is inputted into block 1 where the signal energy is calculated.

[0030] The signal energy is calculated using a block energy calculation technique where the input signal is partitioned into nonoverlapped 2.5 ms blocks. The 2.5 ms exemplary block size results in 20 samples/block when an 8 kHz sampling rate is used. The block energy is calculated as a sum of sample squares or with a root-mean-square algorithm. The calculation can be performed according to a standard signal energy algorithm such as:

$$E_b = \sum_{i=0}^{N-1} x(i)^2$$

[0031] for example, where: N=20 if 2.5 ms blocks are used and N=40 if 5 ms blocks are used.
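The block lengths quoted above follow from the 8 kHz sampling rate; as a worked check:

$$N = 8000\ \tfrac{\text{samples}}{\text{s}} \times 2.5\ \text{ms} = 20, \qquad N = 8000\ \tfrac{\text{samples}}{\text{s}} \times 5\ \text{ms} = 40$$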

[0032] Table I illustrates an exemplary typical result from the calculation of block energy. In the algorithm as implemented in an exemplary embodiment, the block length N=40 (samples of 5 ms), the threshold update period L=256 blocks (1.28 sec) and the update subperiod S=32 blocks (160 ms), the dimension of minimum/maximum energy vectors is D=8 (eight subperiods within a period or L/S). In the following example, shortened for the sake of illustration, N=5, L=12, and S=4, and therefore D=3.

TABLE I
Block #   Samples               Energy Value
1         −1, 3, 3, 1, 3        29
2         1, −2, −3, −2, 0      18
3         2, −2, 3, 0, −2       21
4         2, 0, −1, 1, 1        7
5         2, 4, 0, 3, −4        45
6         4, −3, −3, 3, 2       47
7         −4, −5, 3, −4, −3     75
8         1, −3, −1, −5, 4      52
9         0, −1, 0, −2, −1      6
10        −3, 0, 2, 0, 1        14
11        −3, −2, 2, 1, −1      19
12        0, 2, −5, 1, −5       55
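The block energy calculation behind Table I can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `block_energies` and the choice to drop a trailing partial block are assumptions made here, and the sample values are copied from Table I.

```python
from typing import List, Sequence


def block_energies(samples: Sequence[int], block_len: int) -> List[int]:
    """Sum of squared samples for each non-overlapping block of block_len samples."""
    if block_len <= 0:
        raise ValueError("block_len must be positive")
    # Assumption: a trailing partial block is ignored rather than zero-padded.
    usable = len(samples) - len(samples) % block_len
    return [
        sum(x * x for x in samples[start:start + block_len])
        for start in range(0, usable, block_len)
    ]


# Samples of Table I, twelve blocks of N=5 samples each.
TABLE_I_SAMPLES = [
    -1, 3, 3, 1, 3,
    1, -2, -3, -2, 0,
    2, -2, 3, 0, -2,
    2, 0, -1, 1, 1,
    2, 4, 0, 3, -4,
    4, -3, -3, 3, 2,
    -4, -5, 3, -4, -3,
    1, -3, -1, -5, 4,
    0, -1, 0, -2, -1,
    -3, 0, 2, 0, 1,
    -3, -2, 2, 1, -1,
    0, 2, -5, 1, -5,
]

TABLE_I_ENERGIES = block_energies(TABLE_I_SAMPLES, block_len=5)
```

With these inputs the computed energies match the Energy Value column of Table I.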

[0033] The calculated block energies are used to extract features from the input signal at block 2 of FIG. 1. Using the calculated block energies, the following features are extracted every 1.28 seconds:

[0034] 1. Minimum energy vector.

[0035] 2. Maximum energy vector.

[0036] 3. Minimum energy.

[0037] 4. Maximum energy.

[0038] The minimum and maximum energy vectors are obtained by partitioning a 1.28-second period into eight parts. For each part the minimum and maximum block energies are determined. The minimum and maximum energies are determined from the minimum and maximum energy vectors, respectively. In an exemplary embodiment, 5 ms block energy features are extracted for each threshold update period (1.28 seconds). Other block sizes and update periods can be used as appropriate for the signal, the desired compression, active signal quality and bandwidth savings. The threshold update period is partitioned into eight non-overlapped subperiod intervals j of 160 ms each (length N=32 blocks of 5 ms). Minimum and maximum energy vectors Evct_min and Evct_max are extracted as follows:

Evct_min(j)=min{E(n)}

[0039] and

Evct_max(j)=max{E(n)}

[0040] where: E(n) is 5 ms block energy, and

[0041] j = 0, 1, 2, …, 7 and n ∈ [jN, (j+1)N−1]

[0042] The minimum energy and maximum energy are the minimum or maximum 5 ms block energy during the whole threshold update period, i.e., Emin=min{Evct_min} and Emax=max{Evct_max}. The 2.5 ms block energy E(l) is extracted for the threshold detector 5, while the 2.5 ms block-based zero crossing rate is an optional feature that can be extracted for consideration in threshold determination by the state control logic 6. Because zero crossing rate is strongly affected by dc offset, a highpass filter should be used if the input signal has dc components. Block-based zero crossing rate can be extracted as follows:

$$\mathrm{ZCR} = \frac{1}{L} \sum_{l=1}^{L-1} [\,\ldots\,]$$

[0043] where L=20 is the block length. The summand of the ZCR expression is not legible in this text.
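Because the summand of the zero crossing rate expression does not survive in this text, the sketch below substitutes a common definition: count the adjacent sample pairs whose signs differ and divide by the block length L. That definition, the function name `zero_crossing_rate`, and the treatment of a zero sample as non-negative are all assumptions, not the patent's stated formula.

```python
from typing import Sequence


def zero_crossing_rate(block: Sequence[float]) -> float:
    """Number of sign changes between adjacent samples, divided by the block length."""
    length = len(block)
    if length == 0:
        raise ValueError("block must not be empty")
    # Assumption: a sample equal to zero is treated as non-negative.
    crossings = sum(
        1
        for prev, cur in zip(block, block[1:])
        if (prev >= 0) != (cur >= 0)
    )
    return crossings / length
```

As the paragraph above notes, any dc offset should be removed (for example with a highpass filter) before a measure like this is meaningful.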

[0044] Table II illustrates an exemplary feature extraction from the exemplary block energies illustrated in Table I.

TABLE II
Block #   Block Energy   Emin Vector   Emax Vector   Min Energy   Max Energy
1         29
2         18
3         21
4         7              7             29
5         45
6         47
7         75
8         52             45            75
9         6
10        14
11        19
12        55             6             55            6            75
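The feature extraction shown in Table II can be sketched as below. The function name `extract_features` and the requirement that the period divide evenly into subperiods are assumptions made for this illustration; the input values are the block energies of Table I with subperiod S=4.

```python
from typing import List, Sequence, Tuple


def extract_features(
    energies: Sequence[float], subperiod: int
) -> Tuple[List[float], List[float], float, float]:
    """Return (Evct_min, Evct_max, Emin, Emax) for one threshold update period."""
    if subperiod <= 0 or len(energies) == 0 or len(energies) % subperiod != 0:
        raise ValueError("energies must split evenly into non-empty subperiods")
    parts = [
        energies[start:start + subperiod]
        for start in range(0, len(energies), subperiod)
    ]
    evct_min = [min(part) for part in parts]  # one minimum per subperiod
    evct_max = [max(part) for part in parts]  # one maximum per subperiod
    # Emin and Emax are taken from the vectors, as the description states.
    return evct_min, evct_max, min(evct_min), max(evct_max)


TABLE_II_FEATURES = extract_features(
    [29, 18, 21, 7, 45, 47, 75, 52, 6, 14, 19, 55], subperiod=4
)
```

For the shortened example this yields the vectors and the minimum and maximum energies listed in Table II.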

Marginal Signal/Noise Discriminator

[0045] The purpose of the marginal signal and noise discriminator, block 3, is to keep a distance or gap between noise level and active signal level, so that overlapped parts of active signal and noise block energies can be eliminated before the subsequent noise and active signal energy estimations. The noise energy level estimate and the active signal energy level estimate are used by state control logic 6 during threshold establishment in the “converged state.” Establishing a region between a maximum noise level and a minimum active signal level is accomplished by maintaining two energy margins: one for noise, and the other for active signal. When block energy is below the noise margin, it is considered noise and used in noise level estimation. Similarly, when block energy is above the active signal margin, it is considered active signal and used in active signal level estimation. Otherwise, the block energy is not used in level estimation. The output of estimator 4 is used by state control logic 6 to select the current state based upon the signal envelope consistency and reliability. Therefore, the estimation of noise and active signal energy are independent of the output results of the bandwidth savings algorithm, and divergence due to misclassification can be avoided.

Signal/Noise Level Estimation

[0046] The signal and noise level estimation 4 is performed using the geometric averaging of block energies under the control of the marginal signal and noise discriminator. The outputs are active signal level and noise level. These outputs represent an ongoing adaptive estimate of the average noise and active signal levels of the processed signal and can be determined according to the exemplary method below:

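$$T_1 = E_{\min} + \tfrac{1}{32}\,(E_{\max} - E_{\min})$$

$$T_2 = 4\,E_{\min}$$

$$T_{noise} = \min\{\,2\min\{T_1, T_2\},\ -21\ \text{dBm0}\,\}$$

$$T_{voice} = \min\{\,\max\{\,\alpha\max\{T_1, T_2\},\ -65\ \text{dBm0}\,\},\ -17\ \text{dBm0}\,\}$$

[0047] $$\alpha = \begin{cases} 16, & E_{\max}/E_{\min} > 2^{13} \\ 4, & E_{\max}/E_{\min} \le 2^{13} \end{cases}$$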
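A minimal sketch of the margin computation follows. Several things here are assumptions rather than statements from the source: the ratio bound is read as 2^13, the dBm0 limits are passed in by the caller already converted to the same linear units as the block energies (the conversion depends on scaling that the text does not give), and the names `margin_thresholds`, `noise_cap`, `voice_floor` and `voice_cap` are hypothetical.

```python
from typing import Tuple


def margin_thresholds(
    e_min: float,
    e_max: float,
    noise_cap: float,    # assumed linear-energy equivalent of -21 dBm0
    voice_floor: float,  # assumed linear-energy equivalent of -65 dBm0
    voice_cap: float,    # assumed linear-energy equivalent of -17 dBm0
) -> Tuple[float, float]:
    """Return (T_noise, T_voice) from the period's minimum and maximum block energy."""
    t1 = e_min + (e_max - e_min) / 32.0
    t2 = 4.0 * e_min
    # Compare e_max against 2**13 * e_min to avoid dividing by a zero minimum.
    alpha = 16.0 if e_max > (2 ** 13) * e_min else 4.0
    t_noise = min(2.0 * min(t1, t2), noise_cap)
    t_voice = min(max(alpha * max(t1, t2), voice_floor), voice_cap)
    return t_noise, t_voice
```

For the Table II values (Emin=6, Emax=75) and caps wide enough not to bind, this gives T1=8.15625, T2=24, and margins of 16.3125 for noise and 96 for active signal.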

[0048] Both the noise and active signal (e.g., voice) thresholds are based on minimum and maximum block energy during one threshold updating period. Active signal and noise energy estimation is calculated by a geometric averaging as follows:

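$$E_x(n) = (1 - \alpha_x)\,E_x(n-1) + \alpha_x\,E(n)$$

[0049] where x is either voice or noise and α is adjusted for determination of voice or noise as follows:

$$\alpha_{voice} = \begin{cases} \tfrac{1}{64}, & E(n) > T_{voice} \\ 0, & E(n) \le T_{voice} \end{cases} \qquad \alpha_{noise} = \begin{cases} \tfrac{1}{32}, & E(n) < T_{noise} \\ 0, & E(n) \ge T_{noise} \end{cases}$$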

[0050] where E(n) is 5 ms block energy, k and l are the number of voice and noise blocks respectively, from the marginal signal and noise discriminator 3.
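The recursive level estimation, gated by the marginal discriminator, might be sketched like this. The class name `LevelEstimator` and the zero initial estimates are assumptions; the source does not state how the estimates are initialised.

```python
from dataclasses import dataclass


@dataclass
class LevelEstimator:
    """Running estimates of active signal and noise energy levels."""

    e_voice: float = 0.0  # assumed initial value, not given in the source
    e_noise: float = 0.0  # assumed initial value, not given in the source

    def update(self, energy: float, t_noise: float, t_voice: float) -> None:
        """Fold one block energy into whichever estimate its margin allows."""
        # alpha is 1/64 above the voice margin and 0 otherwise.
        alpha_voice = 1.0 / 64.0 if energy > t_voice else 0.0
        # alpha is 1/32 below the noise margin and 0 otherwise.
        alpha_noise = 1.0 / 32.0 if energy < t_noise else 0.0
        self.e_voice = (1.0 - alpha_voice) * self.e_voice + alpha_voice * energy
        self.e_noise = (1.0 - alpha_noise) * self.e_noise + alpha_noise * energy


estimator = LevelEstimator()
estimator.update(64.0, t_noise=10.0, t_voice=50.0)  # above voice margin
estimator.update(8.0, t_noise=10.0, t_voice=50.0)   # below noise margin
estimator.update(30.0, t_noise=10.0, t_voice=50.0)  # in the gap, ignored
```

Blocks that fall between the two margins leave both estimates unchanged, which is the separation the marginal discriminator is meant to enforce.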

State Control Logic

[0051] The purpose of control logic 6 is to perform the threshold adaptation. The threshold used for detection 5 is adaptive in the present invention, based upon a number of factors derived from the block energy calculation, including the discrimination 3 and estimation 4. The adaptation of the block energy threshold is necessary for the effective discrimination based upon the algorithm performance. The state control logic 6 performs the adaptation of the threshold through processing algorithms based upon the state of the logic.

[0052] State control logic 6 is designed as a state machine with the following states:

[0053] 1. Constant envelope. The method is in this state when the input signal has approximately constant envelope as determined by the input from the marginal signal/noise discriminator 3. For example, facsimile signals, dial tone, and stationary noise signals would have a constant envelope.

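[0054] Minimum and maximum energy vectors are used in state transition. Zero crossing rate is also used if available. The threshold function for constant envelope state is:

$$T = \begin{cases} -50\ \text{dBm0}, & E_{\max}/E_{\min}\ [\,\cdot\,]\ 2 \\ f_1(E_{\max}), & \text{otherwise} \end{cases}$$

where:

$$f_1(E_{\max}) = \begin{cases} 4\,E_{\max}, & E_{\max} < -51\ \text{dBm0} \\ -45\ \text{dBm0}, & -51\ \text{dBm0} \le E_{\max} < -48\ \text{dBm0} \\ 2\,E_{\max}, & E_{\max} \ge -48\ \text{dBm0} \end{cases}$$

Here [·] stands for a comparison operator (≤ or ≥) that is not legible in this text.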

[0055] 2. Learning. The method is in this state when the marginal signal/noise discriminator 3 does not have reliable estimates for the energy margins. The minimum and maximum energies are used to update the threshold as:

$$T = K\!\left(\frac{E_{\max}}{E_{\min}}\right) E_{\min}$$

[0056] where the coefficient K(Emax/Emin) is illustrated in FIG. 4.

[0057] The system of the present invention will always start in the learning state until converged or constant envelope state is identified. The system state control logic 6 will revert to the learning state when either constant envelope or converged state cannot be identified.

[0058] 3. Converged. The method is in this state when the marginal signal/noise discriminator 3 has reliable estimates for the energy margins. The converged state threshold update is based on background noise and signal-to-noise ratio. However, the estimations of noise energy and signal-to-noise ratio are based on signal activity decisions. To minimize unstable operation, a marginal signal and noise discriminator is used in noise and signal level estimation. The converged state threshold algorithm is a function of average voice energy (Evoice) and noise energy (Enoise). Evoice and Enoise are estimated according to the marginal signal and noise discriminator 3. The threshold function for the converged state is:

$$T = K\!\left(\frac{E_{voice}}{E_{noise}}\right) E_{noise}$$

[0059] where the coefficient K(Evoice/Enoise) is illustrated in FIG. 5. If (Evoice/Enoise)<4, then the learning state threshold function will be used to update the threshold in detector 5. To keep the threshold adaptation smooth, the following interpolation is used during converged state, where m is the number of the threshold update period:

$$T(m+1) = \tfrac{1}{2}\,T(m) + \tfrac{1}{2}\,T$$

[0060] The threshold is always bounded. The bounds depend on the traffic load.
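The converged-state update can be sketched as follows. The coefficient K is only given graphically in FIG. 5, so the sketch takes it as a caller-supplied function; the names `converged_threshold` and `smooth_threshold`, the `lower` and `upper` bound parameters, and applying the bounds after the interpolation are all assumptions.

```python
from typing import Callable


def converged_threshold(
    e_voice: float, e_noise: float, k: Callable[[float], float]
) -> float:
    """T = K(Evoice / Enoise) * Enoise, with K supplied by the caller."""
    if e_noise <= 0:
        raise ValueError("e_noise must be positive")
    return k(e_voice / e_noise) * e_noise


def smooth_threshold(
    t_prev: float, t_new: float, lower: float, upper: float
) -> float:
    """Average the previous and new thresholds, then clamp to the bounds."""
    t_next = 0.5 * t_prev + 0.5 * t_new
    # Assumption: the traffic-dependent bounds are applied after interpolation.
    return min(max(t_next, lower), upper)
```

A caller would evaluate `converged_threshold` once per threshold update period and pass the result to `smooth_threshold` together with the threshold from the previous period.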

Threshold Detector

[0061] State control logic 6 determines the thresholds used by threshold detector 5. The active signal level and noise level outputs of estimator 4 are one factor used by control logic 6 to establish detection thresholds for the threshold detector 5. Other factors can include zero crossing discrimination. The current value of noise and active signal thresholds in adaptive threshold detector, block 5, are used to classify a current input block into “active signal” or “noise” using the corresponding block energy for the current input block calculated in block energy calculation 1. The threshold values inputted to the threshold detector 5 are controlled by the state control logic 6 which determines the threshold function to be applied in the detector 5 based upon the state of control logic 6 determined by the estimation of signal estimator 4.

[0062] Threshold detector 5 performs a decision for the current block to detect active signal or noise and assigns a status follows: status = { { active signal } E ( k ) T noise E ( k ) < T

[0063] where T is adaptive 2.5 ms block energy threshold

[0064] An input frame is partitioned into non-overlapping 2.5 ms blocks (20 samples per block). A decision is made for each block based on the block energy. In an embodiment where an optional zero crossing rate is available, an additional threshold detection step is applied when the energy threshold detection classifies the current block as noise, as follows:

$$\text{status} = \begin{cases} \text{active signal}, & E(k) \ge T \ \text{or} \ ZCR(k) \ge T_{zcr} \\ \text{noise}, & \text{otherwise} \end{cases}$$

[0065] where Tzcr is a fixed zero crossing rate threshold, which, for example, can be chosen as 0.7. The purpose of the additional zero crossing rate detector is to minimize potential misclassification of a weak active signal as noise at the beginning of an active signal, such as the beginning of a conversation.
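The per-block decision of paragraphs [0062] to [0065] can be sketched as follows. Two details are assumptions, since they are not defined in this section: block energy is taken as the sum of squared samples, and the zero crossing rate is taken as the fraction of adjacent sample pairs whose signs differ, so that it lies between 0 and 1 and is comparable to the example threshold of 0.7. The function names are illustrative only.

```python
from typing import List, Optional, Sequence

BLOCK_SIZE = 20  # samples per 2.5 ms block, as stated in the text


def block_energy(block: Sequence[float]) -> float:
    """Assumed definition of E(k): sum of squared samples in the block."""
    return sum(x * x for x in block)


def zero_crossing_rate(block: Sequence[float]) -> float:
    """Assumed definition of ZCR(k): fraction of adjacent pairs that change sign."""
    crossings = sum(1 for a, b in zip(block, block[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(block) - 1)


def classify_frame(
    frame: Sequence[float],
    threshold: float,
    t_zcr: Optional[float] = None,
) -> List[str]:
    """Label each complete, non-overlapping block as 'active signal' or 'noise'."""
    statuses = []
    for start in range(0, len(frame) - BLOCK_SIZE + 1, BLOCK_SIZE):
        block = frame[start:start + BLOCK_SIZE]
        active = block_energy(block) >= threshold
        # The optional ZCR check runs only when the energy test says "noise".
        if not active and t_zcr is not None:
            active = zero_crossing_rate(block) >= t_zcr
        statuses.append("active signal" if active else "noise")
    return statuses
```

Passing `t_zcr=None` gives the energy-only decision of paragraph [0062]; passing `t_zcr=0.7` adds the second-chance test of paragraph [0064] for low-energy blocks.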

Active Signal Smoothing

[0066] In order to reduce the potential for misclassification of weak active signal as noise, the output of the threshold detector 5 is smoothed 7. Smoothing can be accomplished by providing a hang-over period, during which active signal detection continues to be indicated for a period of time after the signal has dropped below the active signal threshold. This has the advantage of avoiding drops or holes in voice transmission and can help to avoid chopping off the end of speech. Other methods of smoothing can also be implemented within the scope of the invention. The output of threshold detector 5, after smoothing, is used as the output decision 8 of the method. The smoothing mechanism is influenced by the traffic load configuration. Typically, the unsmoothed output of the detector can falsely indicate noise for short periods in the presence of a short-lived weak active signal. By smoothing the output, such short false noise detections can be significantly reduced. Under high traffic loads, it may be desirable to reduce the degree of smoothing to allow increased bandwidth savings with only slight potential degradation in voice quality. Under low traffic loads, it may be desirable to increase the degree of smoothing to achieve potentially greater voice quality at the cost of an acceptable reduction in bandwidth savings. The dynamic adaptability of the present invention allows the smoothing to be changed based upon traffic and signal detection.
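A hang-over period of the kind described in paragraph [0066] can be sketched as a simple countdown over the per-block decisions. This is one possible realization under stated assumptions: the hang-over is measured in whole blocks, and the parameter `hangover_blocks` is a hypothetical name whose value would be chosen according to the traffic load (smaller under high load, larger under low load).

```python
from typing import Iterable, List


def smooth_with_hangover(raw: Iterable[bool], hangover_blocks: int) -> List[bool]:
    """Hold the 'active' decision for hangover_blocks blocks after activity ends."""
    smoothed = []
    remaining = 0  # blocks of hang-over still available
    for is_active in raw:
        if is_active:
            remaining = hangover_blocks  # re-arm the hang-over on every active block
            smoothed.append(True)
        elif remaining > 0:
            remaining -= 1               # spend one block of hang-over
            smoothed.append(True)
        else:
            smoothed.append(False)
    return smoothed
```

With `hangover_blocks=0` the output equals the raw detector decisions, which corresponds to the maximum bandwidth saving and the least protection against clipped word endings.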

[0067] The output decision 8 is then supplied to the compression logic of the packet system in combination with the signal for the application of compression and/or noise elimination 11 as desired by the packet system. The portions of the signal classified as noise can be eliminated and the active signals passed or compressed as desired. The signal may need to be delayed 10 to adjust for the timing of the decision from the application of the method of the present invention.

[0068] In implementing the system of the present invention, the various parameters need to be adjusted to correspond to the signal, the equipment used in the packet network, and the desired tradeoff between compression and active signal transmission degradation. Any of the parameters (e.g., block size, sampling rate, threshold update period, hang-over period, minimum and maximum energy thresholds) as well as the algorithms can be changed to obtain different effects within the scope of the invention. The algorithms can be implemented, and the system and the packet network can be monitored. The parameters can then be adapted to achieve the desired bandwidth conservation. The compression can depend on traffic load, so that the parameters of the system are adjusted actively.

[0069] A further specific exemplary implementation of the present invention is described in the paper entitled Signal Dependent Bandwidth Saving Method in Voice-Over Packet Networks of Dunling Li, Zoran Mladenovic, and Bogdan Kosanovic, attached hereto and incorporated by reference herein.

[0070] Because many varying and different embodiments may be made within the scope of the inventive concept herein taught, and because many modifications may be made in the embodiments herein detailed in accordance with the descriptive requirements of the law, it is to be understood that the details herein are to be interpreted as illustrative and not as limiting.

Referenced by
Citing Patent | Filing date | Publication date | Applicant | Title
US7236929 * | Dec 3, 2001 | Jun 26, 2007 | Plantronics, Inc. | Echo suppression and speech detection techniques for telephony applications
US7996215 | Apr 13, 2011 | Aug 9, 2011 | Huawei Technologies Co., Ltd. | Method and apparatus for voice activity detection, and encoder
US8296133 | Nov 30, 2011 | Oct 23, 2012 | Huawei Technologies Co., Ltd. | Voice activity decision base on zero crossing rate and spectral sub-band energy
US8554547 | Jul 11, 2012 | Oct 8, 2013 | Huawei Technologies Co., Ltd. | Voice activity decision base on zero crossing rate and spectral sub-band energy
US20120197642 * | Apr 12, 2012 | Aug 2, 2012 | Huawei Technologies Co., Ltd. | Signal processing method, device, and system
US20120209604 * | Oct 18, 2010 | Aug 16, 2012 | Martin Sehlstedt | Method And Background Estimator For Voice Activity Detection
Classifications
U.S. Classification: 704/233, 704/E19.041
International Classification: G10L19/14, G10L21/02
Cooperative Classification: G10L2021/02168, G10L19/18
European Classification: G10L19/18
Legal Events
Date | Code | Event | Description
Sep 25, 2013 | FPAY | Fee payment | Year of fee payment: 12
Sep 22, 2009 | FPAY | Fee payment | Year of fee payment: 8
Sep 27, 2005 | FPAY | Fee payment | Year of fee payment: 4
Feb 12, 1999 | AS | Assignment | Owner name: TELOGY NETWORKS, INC., MARYLAND. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LI, DUNLING;MLADENOVIC, ZORAN;KOSANOVIC, BOGDAN;REEL/FRAME:009832/0802. Effective date: 19990210