Publication number | US7676043 B1 |
Publication type | Grant |
Application number | US 11/364,116 |
Publication date | Mar 9, 2010 |
Filing date | Feb 28, 2006 |
Priority date | Feb 28, 2005 |
Fee status | Paid |
Also published as | US7715573, US8036394 |
Publication number | 11364116, 364116, US 7676043 B1, US 7676043B1, US-B1-7676043, US7676043 B1, US7676043B1 |
Inventors | Ryo Tsutsui, Ivan Setiawan, Yoshihide Iwata |
Original Assignee | Texas Instruments Incorporated |
Export Citation | BiBTeX, EndNote, RefMan |
Patent Citations (9), Referenced by (19), Classifications (8), Legal Events (2) | |
External Links: USPTO, USPTO Assignment, Espacenet | |
This application claims priority from provisional applications Nos. 60/657,234, filed Feb. 28, 2005, 60/749,994, filed Dec. 13, 2005, and 60/756,099, filed Jan. 4, 2006. Co-assigned, copending patent application No. 60/660,372, filed Mar. 9, 2005 discloses related subject matter.
The present invention relates to digital signal processing, and more particularly to audio frequency bandwidth expansion.
Audio signals sometimes suffer from inferior sound quality. This is because their bandwidths have been limited due to the channel/media capacity of transfer/storage systems. For example, cut-off frequencies are set at about 20 kHz for CD, 16 kHz for MP3, 15 kHz for FM radio, and even lower for other audio systems whose data rate capability are poorer. At playback time, it is beneficial to recover high frequency components that have been discarded in such systems. This processing is equivalent to expanding an audio signal bandwidth, so it can be called bandwidth expansion (BWE); see
On the other hand, time domain processing for BWE has been proposed in which high frequency components are synthesized by using amplitude modulation (AM) and extracted by using a high-pass filter. This system performs the core part of high frequency synthesis in time domain and is time domain alias-free. Another property employed is to estimate the cut-off frequency of input signal, on which the modulation amount and the cut-off frequency of the high-pass filter can be determined in run-time depending on the input signal. BWE algorithms work most efficiently when the cut-off frequency is known beforehand. However, it varies depending on signal content, bit-rate, codec, and encoder used. It can vary even within a single stream along with time. Hence, a run-time cut-off frequency estimator, as shown in
Another bandwidth problem occurs at low frequencies: bass loudspeakers installed in electric appliances such as flat panel TV, mini-component, multimedia PC, portable media player, cell-phone, and so on cannot reproduce bass frequencies efficiently due to their limited dimensions relative to low frequency wavelengths. With such loudspeakers, the reproduction efficiency starts to degrade rapidly from about 100-300 Hz depending on the loudspeakers, and almost no sound is excited below 40-100 Hz; see
The present invention provides audio bandwidth expansion with adaptive cut-off frequency detection and/or a common expansion for stereo signals and/or even-odd harmonic generation for part of low frequency expansion.
1. Overview
Preferred embodiment methods include audio bandwidth extensions at high and/or low frequencies. Preferred embodiment high-frequency bandwidth expansion (BWE) methods include amplitude modulation and a high-pass filter for high frequency synthesis which reduces computation by making use of an intensity stereo processing in case of stereo signal input. Another BWE preferred embodiment estimates the level of high frequency components adaptively; this enables smooth transition in spectrum from original band-limited signals to synthesized high frequencies with a more natural sound quality.
Further preferred embodiments provide for the run-time creation of the high-pass filter coefficients, use of windowed sinc functions that requires low computation with much smaller look-up table size for ROM. This filter is designed to have linear phase, and thus is free from phase distortion. And the FIR filtering operation is done in frequency domain using the overlap-save method, which saves significant amount of computation. Some other operations including the AM operation are also converted to frequency domain processing so as to minimize the number of FFT operations.
In particular, a preferred embodiment method first identifies a cut-off frequency, as the candidate, with adaptive thresholding of the input power spectrum. The threshold is adaptively determined based on the signal level and the noise floor that is inherent in digital (i.e., quantized) signals. The use of the noise floor helps discriminate the presence of high frequencies in input signals. To verify the candidate cut-off frequency, the present invention then detects the spectrum envelope around the candidate. If no ‘drop-off’ is found in the spectrum envelope, the candidate will be treated as a false cut-off and thus discarded. In that case, the cut-off frequency will be identified as the Nyquist frequency F_{S}/2. All the processing is done in the decibel domain to emphasize the drop-off in spectra percentage and to estimate the cut-off frequency in a more robust manner.
Preferred embodiment systems perform preferred embodiment methods with any of several types of hardware: digital signal processors (DSPs), general purpose programmable processors, application specific circuits, or systems on a chip (SoC) which may have multiple processors such as combinations of DSPs, RISC processors, plus various specialized programmable accelerators; see
2. Single-channel AM-based BWE with Adaptive Signal Level Estimation
Preferred embodiment methods and devices provide for stereo BWE using a common extension signal. Thus, initially consider preferred embodiment BWE for a single channel system, this will be the baseline implementation for the preferred embodiment stereo-channel BWE. We adopt the AM-based BWE method due to its good sound quality and lower computation complexity.
F _{C} <F _{S}/2=F _{N}
where F_{N }denotes the Nyquist frequency. For example, typical sampling rates are F_{S}=44.1 or 48 kHz, so F_{N}=22.05 or 24 kHz; whereas, F_{C }may be about 16 kHz, such as in MP3.
In the figure, u_{1}(n) is output from the amplitude-modulation block AM (more precisely, cosine-modulation). Let the block AM be a point-wise multiplication with a time varying cosine weight:
u _{1}(n)=cos[2πf _{m} n/F _{S} ]x(n)
where f_{m }represents the frequency shift amount (known as a carrier frequency for AM) from the input signal. The behavior of this modulation can be graphically analyzed in the frequency domain. Let X(f) be the Fourier spectrum of x(n) defined as
X(f)=Σ_{−∞<n<∞} x(n) exp[−j2πfn]
and let U_{1}(f) be the Fourier spectrum of u_{1}(n) defined similarly. Then the modulation translates into:
U _{1}(f)=½X(f−f _{m} /F _{S})+½X(f+f _{m} /F _{S})
This shows that U_{1}(f) is composed of frequency-shifted versions of X(f). The top two panels of
As shown in
Before being added to x(n), the level of u_{2}(n) is adjusted using gain G(n), so that the band-expanded spectrum exhibits a smooth transition from the original spectrum through the synthesized high frequency spectrum (see the lower right panel in
In
Then G(n) is determined by
G(n)=2Σ_{i} |v _{H}(n−i)|/αΣ_{i} |v _{M}(n−i)|
where α factor compensates for the different frequency ranges in v_{H}(n) and v_{M}(n), and the factor of 2 is for canceling the ½ in the definition of U_{1}(f). Finally, we obtain the band-expanded output:
y(n)=x(n)+G(n)u _{2}(n)
From its definition, G(n) can be seen to be a rough estimation of the energy transition of |X(f)| for f in the interval F_{C}−f_{m}<f<F_{C}. This is because the definition of G(n) can be interpreted using Parseval's theorem as
Note that this is just for ease of understanding and is mathematically incorrect because Parseval's theorem applies in L^{2 }and not in L^{1}. For example, if the numerator integral gives a small value, it is likely that X(f) decreases as f increases in the interval F_{C}−f_{m}<f<F_{C}. Thus the definition tries to let G(n) be smaller so that the synthesized high frequency components get suppressed in the bandwidth expansion interval F_{C}<f<F_{C}+f_{m}.
3. BWE for Stereo
u _{1}(n)=cos[2πf _{m} n/F _{S}](x _{l}(n)+x _{r}(n))/2
Next, by high-pass filtering u_{1}(n) with HP_{C}(z), we obtain u_{2}(n), the high frequency components. The signal u_{2}(n) can be understood as a center channel signal for IS. We then apply the gains G_{l}(n) and G_{r}(n) to adjust the level of u_{2}(n) for left and right channels, respectively. Ideally, we separately compute G_{l}(n) and G_{r}(n) for the left and right channels, but the preferred embodiment methods provide further computation reduction and apply HP_{M}(z) only to the center channel while having HP_{H}(z) applied individually to left and right channels. That is, left channel input signal x_{l}(n) is filtered using high-pass filter HP_{H}(z) to yield v_{l,H}(n) and right channel input signal x_{r}(n) is filtered again using high-pass filter HP_{H}(z) to yield v_{r,H}(n); next, the center channel signal (x_{l}(n)+x_{r}(n))/2 is filtered using high-pass filter HP_{M}(z) to yield v_{M}(n). Then define the gains for the left and right channels:
G _{l}(n)=2Σ_{i} |v _{l,H}(n−i)|/αΣ_{i} |v _{M}(n−i)|
G _{r}(n)=2Σ_{i} |v _{r,H}(n−i)|/αΣ_{i} |v _{M}(n−i)|
Lastly, compute the left and right channel bandwidth-expanded outputs using the separate left and right channel gains with the HP_{C}-filtered, modulated center channel signal u_{2}(n):
y _{l}(n)=x _{l}(n)+G _{l}(n)u _{2}(n)
y _{r}(n)=x _{r}(n)+G _{r}(n)u _{2}(n)
The determination of F_{C }can be adaptive as described in the following section, and this provides a method to determine f_{m}, such as taking f_{m}=20 kHz−F_{C}.
4. Cut-off Frequency Estimation
Preferred embodiment methods estimate the cut-off frequency F_{C }of the input signal from the input signal, and then the modulation amount f_{m }and the cut-off frequency of the high-pass filter can be determined in run-time depending on the input signal. That is, the bandwidth expansion can adapt to the input signal bandwidth.
F _{C} =F _{S} k _{c} /N
The input sequence x(n) is assumed to be M-bit linear pulse code modulation (PCM), which is a very general and reasonable assumption in digital audio applications. The frequency spectrum of x(n) accordingly has the so-called noise floor originating from quantization error as shown in
Suppose that x(n) was obtained through quantization of the original signal u(n) in which q(n)=x(n)−u(n) is the quantization error, and the quantization step size is
Δ=2^{−M+1}
According to the classical quantization model, the quantization error variance is given by
E[q ^{2}]=Δ^{2}/12≡P _{q}
On the other hand, the quantization error can generally be considered as white noise. Let Q(k) be the N-point discrete Fourier transform (DFT) of q(n) defined by
Q(k)=1/NΣ _{0≦n≦N−1} q(n)e ^{−j2πnk/N}
Then, the expectation of the power spectrum will be constant as
E[|Q(k)|^{2} ]=P _{Q}
The constant P_{Q }gives the noise floor as shown in
From Parseval's theorem, the following relation holds:
Σ_{0≦k≦N−1} |Q(k)|^{2}=1/NΣ _{0≦n≦N−1} q(n)^{2}
By taking the expectation of this relation and using the foregoing, the noise floor is given by
P _{Q} =P _{q} /N=1/(32^{2M} N)
As shown in block diagram
Let x_{m}(n) be the input samples of the m-th frame as
x _{m}(n)=x(Nm+n) 0≦n≦N−1
Then, the frequency spectrum of the windowed m-th frame becomes
X _{m}(k)=1/NΣ _{0≦n≦N−1} w(n)x _{m}(n)e ^{−j2πnk/N}
where w(n) is the window function such as a Hann, Hamming, Blackman, et cetera, window.
Define the peak power spectrum of the m-th frame, P_{m}(k) for 0≦k≦N/2, as
P _{m}(k)=max{αP _{m−1}(k), |X _{m}(k)|^{2} +|X _{m}(−k)|^{2}}
where α is the decay rate of peak power per frame. Note that the periodicity X_{m}(k)=X_{m}(N+k) holds in the above definition. For simplicity, we will omit the subscript m in the peak power spectrum for the current frame in the following.
After the peak power spectrum is obtained, the candidate cut-off frequency k_{c}′ is identified as the highest frequency bin for which the peak power exceeds a threshold T:
P(k _{c}′)>T
The threshold T is adapted to both the signal level and the noise floor. The signal level is measured in mean peak power within the range [K_{1}, K_{2}] defined as
P _{X}=Σ_{K} _{ 1 } _{≦k≦K} _{ 2 } P(k)/(K _{2} −K _{1}+1)
The range is chosen such that P_{X }reflects the signal level in higher frequencies including possible cut-off frequencies. For example, [K_{1}, K_{2}]=[N/5, N/2]. The threshold T is then determined as the geometric mean of the mean peak power P_{X }and the noise floor P_{Q}:
T=√(P _{X} P _{Q})
In the decibel domain, this is equivalent to placing T at the midpoint between P_{X }and P_{Q }as
It must be noted that, even if there is no actual cut-off in P(k), the above method will identify a certain k_{c}′ as the candidate cut-off, which is actually a false cut-off frequency. Hence, whether there is the true cut-off at the candidate k_{c}′ or not should be examined.
In order to see if there is the actual cut-off at k_{c}′, the preferred embodiment method detects the envelope of P(k) separately for below k_{c}′ and for above k_{c}′. It uses linear approximation of the peak power spectrum in the decibel domain, as shown in
y=a _{L}(k _{c} ′−k)+b _{L}
and
y=a _{H}(k−k _{c}′)+b _{H}
The slopes a_{L}, a_{H }and the offsets b_{L}, b_{H }are derived by the simple two-point linear-interpolation. To obtain a_{L }and b_{L}, two reference points K_{L1 }and K_{L2 }are set as in
K _{L1} =k _{c} ′−N/16, K _{L2} =k _{c}′−3N/16
Then, the mean peak power is calculated for the two adjacent regions centered at the two reference points as
P _{L1}=(1/D _{L})Σ_{K} _{ L2 } _{−D} _{ L } _{/2≦k≦K} _{ L1 } _{+D} _{ L } _{/2−1} P(k)
P _{L2}=(1/D _{L})Σ_{K} _{ L2 } _{−D} _{L} _{/2≦k≦L} _{ L2 } _{+D} _{ L } _{/2−1} P(k)
where D_{L }is the width of the regions:
D _{L} =K _{L1} −K _{L2}
The linear-interpolation of the two representative points, (K_{L1}, P_{L1}) and (K_{L2}, P_{L2}), in the decibel domain gives
a _{L}=(
Similarly, for the envelope above k_{c}′, K_{H1 }and K_{H2 }are set appropriately, and
P _{H1}=(1/D _{H})Σ_{K} _{ H1 } _{−D} _{ H } _{/2≦k≦K} _{ H1 } _{+D} _{ H } _{/2−1} P(k)
P _{H2}=(1/D _{H})Σ_{K} _{ H2 } _{−D} _{ H } _{/2≦k≦K} _{ H2 } _{+D} _{ H } _{/2−1} P(k)
are computed, where
D _{H} =K _{H2} −K _{H1}
Example values are
K _{H1} =k _{c} ′+N/16, K _{H2} =k _{c} ′+N/8
With these values a_{H }and b_{H }can be computed by just switching L to H in the foregoing.
In the preferred embodiment method, the candidate cut-off frequency k_{c}′ is verified as
where k_{c }is the final estimation of the cut-off frequency, and
There are many other possible ways to verify the candidate cut-off frequency k_{c}′ using a_{L}, b_{L}, a_{H }and b_{H}. Another simple example is
b _{L} >
The high-pass filter coefficients H_{C}(z) is determined every time k_{c }(or F_{C}(n)) is updated. From the implementation view point, the filter coefficient creation has to be done with low computation complexity. The known approach precalculates and stores in a ROM a variety of IIR (or FIR) filter coefficients that correspond to the possible cut-off frequencies. If an IIR filter is used, H_{C}(z) will have non-linear phase response and the output u_{2}(n) will not be phase-aligned with the input signal x(n) even if we have the delay unit. This could cause perceptual distortion. On the other hand, FIR filters generally require longer tap length than IIR filters. Therefore huge amount of ROM size will be required to store FIR filter coefficients for variety of cut-off frequencies. To avoid these problems, the preferred embodiment design form H_{C}(z) with FIR that requires small amount of ROM size and low computational cost. With this design, the preferred embodiment system enables better sound quality than the known approach with IIR implementation for H_{C}(z) or much smaller ROM size than that with FIR implementation.
Our FIR filter design method is similar to that presented in cross-reference patent application No. 60/660,372, which is based on the well-known windowed sinc function. The impulse response h_{id} ^{(n)}(m) of the “ideal” high-pass filter with cut-off angular frequency at ω_{C}(n) at time n can be found by inverse Fourier transforming the ideal frequency-domain high-pass filter as follows:
h _{id} ^{(n)}(m)=(½π){∫_{−π≦ω≦−ω} _{ C } _{(n)} e ^{−j2πω} dω+∫ _{π≦ω≦ω} _{ C } _{(n)} e ^{−j2πω} dω}
so
Substituting ω_{C}(n)=2πF_{C}(n)/F_{S }gives
This “ideal” filter requires the infinite length for h_{id} ^{(n)}(m). In order to truncate the length to a finite number, window function is often used that reduces the Gibbs phenomenon. Let the window function be denoted w(m) and non-zero only in the range −L≦m≦L, then practical FIR high-pass filter coefficients with order-2L can be given as
For run-time calculation of these filter coefficients, we factor h^{(n)}(m) as
h ^{(n)}(m)=h _{w}(m)h _{S} ^{(n)}(m)
where
with h_{0} ^{(n)}=(1−k_{c}(n)/(N/2))w(0).
It is clear that h_{w}(m) is independent of the cut-off frequency and therefore time-invariant. It can be precalculated and stored in a ROM and then referenced for generating filter coefficients in run-time with any cut-off frequencies. The term h_{S} ^{(n)}(m) can be calculated with low computation using a recursive method as in the cross-referenced application. In particular, presume that
s _{1}(n)=sin [2πk _{c}(n)/N]
c _{1}(n)=cos [2πk _{c}(n)/N]
can be obtained by referring to a look-up table, then we can perform recursions for positive m:
h _{S} ^{(n)}(1)=s _{1}(n)
h _{S} ^{(n)}(2)=2c _{1}(n)h _{S} ^{(n)}(1)
h _{S} ^{(n)}(3)=2c _{1}(n)h _{S} ^{(n)}(2)−h _{S} ^{(n)}(1)
. . .
h _{S} ^{(n)}(m)=2c _{1}(n)h _{S} ^{(n)}(m−1)−h _{S} ^{(n)}(m−2)
and for negative m use h_{S} ^{(n)}(m)=−h_{S} ^{(n)}(−m).
The FIR filter derived above doesn't satisfy causality; that is, there exists m such that h^{(n)}(m)≠0 for −L≦m<0, whereas causality has to be satisfied for practical FIR implementations. To cope with this problem, we insert a delay in the FIR filtering in order to make it causal. That is,
u _{2}(n)=Σ_{−L≦m≦L} u _{1}(n−m−L)h ^{(n)}(m)
where u_{2}(n) is the output signal (see
6. BWE in Frequency Domain
FIR filtering is a convolution with the impulse response function; and convolution transforms into pointwise multiplication in the frequency domain. Consequently, a popular alternative formulation of FIR filtering includes first transform (e.g., FFT) a block of the input signal and the impulse response to the frequency domain, next multiply the transforms, and lastly, inverse transform (e.g, IFFT) the product back to the time domain.
x ^{(r)}(m)=x(Rr+m−N) 0≦m≦N−1
We assume x(m)=0 for m<0. Note that, for the FFT processing, N is chosen to be a power of 2, such as 256.
y ^{(r)}(m)=y(Rr+m−R) 0≦m≦R−1
In
Due to the frame-based processing, the cut-off frequency estimation can be done each frame, not for each input sample. Hence update of the high-pass filter becomes to be done less frequently. However, as is often the case, this causes no quality degradation because the input signal can be assumed to be stationary during a certain amount of duration, and the cut-off frequency is expected to change slowly.
For the r-th frame, the DFT (FFT implementation) of x^{(r) }is defined as:
X _{S} ^{(r)}(k)=Σ_{0≦m≦N−1} x ^{(r)}(m) exp [−j2πkm/N]
The DFT coefficients X_{S} ^{(r)}(k) will be used for high-frequency synthesis, and also the cut-off estimation after a simple conversion as explained in detail in the following.
The AM operation is applied to x^{(r)}(m) as described in section 2 above:
u _{1} ^{(r)}(m)=cos [2πF _{m} m/F _{S} ]x ^{(r)}(m)
Note that, in the following discussion regarding frequency domain conversion, a constraint will have to be fulfilled on the frequency-shift amount F_{m}. Let k_{m }be a bin number of frequency-shift amount, we have to satisfy k_{m}=NF_{m}/F_{S }is an integer since the bin number has to be integer. On the other hand, for use of FFT, the frame size N has to be power of 2. Hence, F_{m}=F_{S}/2^{integer}.
Also note that, the use of overlapped frames requires another condition to be satisfied on the output frame size R. The cosine weight in the modulation for overlapped input signals in successive frames has to be the same values. Otherwise the same input signal in different frames is weighted by different cosine weights, which causes perceptual distortion around output frame boundaries. Since
x ^{(r)}(m)=x(Rr+n−N)=x ^{(r−1)}(m+R),
we have to satisfy
cos [2πF _{m} m/F _{S}]=cos [2πF _{m}(m+R)/F _{S}]
This leads to
F _{m} =F _{S} I/R
where I is an integer value. This leads to R being 4 times an integer. This condition is not so strict for most of the applications. Overlap ratio of 50% (e.g, R=N/2) is often chosen for frequency domain processing, which satisfies R being 4 times an integer.
Then we convert the operation modulation to the frequency domain. Again with capitals denoting transforms of lower case:
U _{1} ^{(r)}(k)=½(X _{S} ^{(r)}(k−k _{m})+X _{S} ^{(r)}(k+k _{m}))
The equation indicates that, once we have obtained the DFT of the input frame, then the AM processing can be performed in frequency domain just by summing two DFT bin values.
Now apply the overlap-save method to implement the time domain FIR convolution at the end of section 5. Let the r-th frame of the output from the high-pass filter be u_{2} ^{(r)}(m) for 0≦m≦R−1. The sequence can be calculated using the overlap-save method as follows. First, let h^{(r)}(m) be the filter coefficients, which are obtained similarly to h^{(n)}(m) as described in section 5 above but for the r-th frame instead of time n. The length-(2L+1) sequence h^{(r)}(m) is extended to a periodic sequence with period N by padding with N−2L−1 0s. Note that we need 2L≦N−R to keep the convolution in a single block. Here we set 2L=N−R. After these, we can calculate FFT of h^{(r)}(m) and denote this H^{(r)}(k).
Now let V^{(r)}(k)=H^{(r)}(k) U_{1} ^{(r)}(k) for 0≦k≦N−1. Then, let v^{(r)}(m) denote the inverse FFT of V^{(r)}(k); extract u_{2} ^{(r)}(m) from v^{(r)}(m):
u _{2} ^{(r)}(m)=v ^{(r)}(m+L) for 0≦m≦R−1
By unframing the output frame u_{2} ^{(r)}(m) (see
Here we explain our method to calculate the DFT of the filter coefficients, H^{(r)}(k), which is required for the overlap-save method. Recall the formula of the filter coefficients for r-th frame, and after extending this to a periodic sequence with period N using zero padding, then we have
h ^{(r)}(m)=h _{w}(m)h _{S} ^{(r)}(m) for m=0, ±1, ±2, . . . , ±N/2
where
with h_{0} ^{(r)}=(1−k_{c}(r)/(N/2))w(0). Note we assume here that the cut-off frequency index for r-th frame, k_{c}(r), has already been obtained. Also note that h_{S} ^{(r)}(m) doesn't have to be zero-padded, because h_{w}(m) is zero-padded and that makes h^{(r)}(m) zero-padded.
It is well known that the time domain point-wise multiplication is equivalent to circular convolution of the DFT coefficients. Let H_{w}(k) and H_{S} ^{(r)}(k), respectively, be the DFT of h_{w}(m) and h_{S} ^{(r)}(m), then the product h^{(r)}(m)=h_{w}(m)h_{S} ^{(r)}(m) transforms to
where {circle around (X)} denotes the circular convolution and we assumed the periodicity on the DFT coefficients. Note that h_{w}(m) is the sum of δ(m) plus an odd function of m, thus H_{w}(k)=1+jH_{w,Im}(k) where H_{w,Im}(k) is a real sequence; namely, the discrete sine transform of h_{w}(m). Since H_{w,Im}(k) is independent of the cut-off frequency, it can be precalculated and stored in a ROM. As for H_{S} ^{(r)}(k), because h_{S} ^{(r)}(m) is just the sine function, we can write
H _{S} ^{(r)}(k)=h _{0} ^{(r)} +j(N/2)[δ(k−k _{C}(r))−δ(k+k _{C}(r))]
Thus the circular convolution can be simplified significantly. Since the DFT coefficients of real sequences are asymmetric in their imaginary parts about k=0, the following relations hold:
and similarly,
1{circle around (X)}j(N/2)[δ(k−k _{C}(r))−δ(k+k _{C}(r))]=0
Consequently,
H ^{(r)}(k)=h _{0} ^{(r)}+½[H _{w,Im}(k+k _{C}(r))−H _{w,Im}(k−k _{C}(r))]
Thus H^{(r)}(k) can be easily obtained by just adding look-up table values H_{w,Im}(k).
The order of the high-pass filter, which has been set at 2L=N−R in the preferred embodiment method, can be further examined. In general, we hope to design a long filter that has better cut-off characteristic. However, due to the behavior of circular convolution of the overlap-save method, illegally long order of filter results in time domain alias. See
Preceding section 4 provided the method that estimates frame-varying cut-off frequency k_{C}(r) in the system
X _{A} ^{(r)}(k)=1/NΣ _{0≦m≦N−1} w _{a}(m)x ^{(r)}(m)exp [−j2πmk/N]
In general, the analysis window function w_{a}(m) has to be used to suppress the sidelobes caused by the frame boundary discontinuity. However, direct implementation of FFT only for this purpose requires redundant computation, since we need another FFT that is used for X_{S} ^{(r)}(k). To cope with this problem, we propose an efficient method that calculates X_{A} ^{(r)}(k) from X_{S} ^{(r)}(k), which enables economy of computational cost. Based on our method, any kind of window function can be used for w_{a}(m), as long as it is derived from a summation of cosine sequences. This includes Hann, Hamming, Blackman, Blackman-Harris windows which are commonly expressed as the following formula:
w _{a}(m)=Σ_{0≦i≦M} a _{m }cos [2πmi/N]
For example, for the Hann window, M32 1, a_{0}=½ and a_{1}=½.
Comparison of X_{A} ^{(r)}(k) and X_{S} ^{(r)}(k) as DFTs leads to the following relation:
X _{A} ^{(r)}(k)=X _{A} ^{(r)}(k){circle around (X)}W _{a}(k)
where W_{a}(k) is the DFT of w_{a}(m). Using the expression of w_{a}(m) in terms of cosines and after simplification, we obtain
X _{A} ^{(r)}(k)=a _{0} X _{S} ^{(r)}(k)+½Σ_{1≦m≦M} a _{m}(X _{S} ^{(r)}(k−m)+X _{S} ^{(r)}(k+m))
Typically, M=1 for Hann and Hamming windows, M=2 for Blackman window and M=3 for Blackman-Harris window. Therefore the computational load of this relation is much lower than additional FFT that would be implemented just to obtain X_{A} ^{(r)}(k).
Since the preferred embodiment frequency domain method for BWE is much more complicated than that the time domain method, we summarize the steps of the procedure.
(1) Receive R input samples and associate an N-sample frame overlapped with the previous ones. The overlap length N−R has to be N−R=2L, where 2L is the order of high-pass filter H_{C}(z).
(2) The N sample input signal is processed with FFT to obtain X_{S} ^{(r)}(k).
(3) X_{S} ^{(r)}(k) is converted to X_{A} ^{(r)}(k), which is the short-time spectrum of the input signal with a cosine-derived window.
(4) Using X_{A} ^{(r)}(k), the cut-off frequency index k_{C}(r) is estimated. The estimation can be done based on either approach in section 4.
(5) X_{S} ^{(r)}(k) is also frequency-shifted by cosine modulation to yield U_{1} ^{(r)}(k)
(6) U_{1} ^{(r)}(k) is point-wisely multiplied with H^{(r)}(k) to yield U_{2} ^{(r)}(r,k), where H^{(r)}(k) is calculated as h_{0} ^{(r)}+½[H_{w,Im}(k+k_{C}(r))−H_{w,Im}(k−k_{C}(r))] using a lookup table for the H_{w,Im}(.) values.
(7) U_{2} ^{(r)}(r,k) is processed with IFFT to get u_{2} ^{(r)}(r,m), and the synthesized high frequency components u_{2}(n) is extracted as u_{2} ^{(r)}(r,n+L)
(8) The gain g(n) is determined as in section 3, and applied to the high frequency components u_{2}(n).
(9) The signal u_{2}(n) is added to delayed input signal, where the delay amount D is given by D=L.
7. Bass Expansion
The bass boost filter is intended to equalize the loudspeaker of interest for the higher bass frequencies f_{H}≦f≦f_{C}.
The preferred embodiment harmonics generator generates integral-order harmonics of the lower bass frequencies f_{L}≦f≦f_{H }with an effective combination of a full wave rectifier and a clipper.
h(n)=h _{e}(n)+Kh _{o}(n)
where K is a level-matching constant. The generated harmonics h(n) is passed to the output low-pass filter ‘LPF2’ to suppress extra harmonics that may lead to unpleasant noisy sound.
The peak detector ‘Peak’ works as an envelope estimator. Its output is used to eliminate dc (direct current) component of the full wave rectified signal, and to determine the clipping threshold. The following paragraphs describe the peak detection and the method of generating harmonics efficiently using the detected peak.
The peak detector detects peak absolute value of the input signal s(n) during each half-wave. A half-wave means a section between neighboring zero-crossings.
To generate even-order harmonics h_{e}(n), the preferred embodiments employs the full wave rectifier. Namely it calculates absolute value of the input signal s(n). An issue of using the full wave rectifier is that the output cannot be negative and thus it has a positive offset that may lead to unreasonably wide dynamic range. The offset could be eliminated by using a high-pass filter. However, the filter should have steep cut-off characteristics in order to cut the dc offset while passing generated bass (i.e., very low) frequencies. The filter order will then be relatively high, and the computation cost will be increased. Instead, the preferred embodiments, in a more direct way, subtracts an estimate of the offset as
h _{e}(n)=|s(n)|−αp(n)
where α is a scalar multiple. From the derivation in the following section, the value of α is set to 2/π.
The frequency characteristics of h_{e}(n) are analyzed for a sinusoidal input. Since the frequencies contained in s(n) and h_{e}(n) are very low compared to the sampling frequency, the characteristics may be derived in the continuous time domain.
Let f(t) be a periodic function of period 2π. Then, the Fourier series of f(t) is given by
f(t)=a _{0}+Σ_{0<k<∞}(a _{k }cos kt+b _{k }sin kt)
where the Fourier coefficients a_{k}, b_{k }are
a _{0}=∫_{−π<t<π} f(t)dt
a _{k}=∫_{−π<t<π} f(t)cos kt dt
b _{k}=∫_{−π<t<π} f(t)sin kt dt
Suppose that the unit sinusoidal function of the fundamental frequency, sin t, is fed to the foregoing full-wave rectifier with offset (h_{e}(n)=|s(n)|−αp(n)). Note that the peak is always equal to 1 for input sin t. Then, computing the Fourier coefficients for |sin t|−α gives
a _{0} ^{(e)}=2/π−α.
b _{k} ^{(e)}=0
Hence, the full wave rectifier generates even-order harmonics. To eliminate the dc offset, a_{0} ^{(e)}, α is set to 2/π. in the preferred embodiments. The frequency spectrum of h_{e}(n) is shown in
To generate higher harmonics of odd-order, the preferred embodiment clips the input signal s(n) at a certain threshold T(T>0) as follows:
The threshold T should follow the envelope of the input signal s(n) to generate harmonics efficiently. It is thus time-varying and denoted by T(n) hereinafter. In the present invention, from the derivation in the following section, the threshold is determined as
T(n)=βp(n)
where β=1/√2.
The Fourier coefficients of a unit sinusoidal, sin t, clipped with the threshold T=sin θ are given by
a _{k} ^{(o)}=0
b _{1} ^{(o)}=2(θ+sin θ cos θ)/π
Note that the clipping generates odd-order harmonics. The frequency spectrum of the clipped sinusoidal, h_{o}(n), is shown in
The similarity in the decay rate is suggested as follows. When the threshold parameter θ is set to θ=π/4, the magnitude of the k≠1, odd Fourier coefficients become
|b _{k} ^{(o)}|=2[1−(−1)^{(k−1)/2} /k]/π(k ^{2}−1)
Since the 1/k term is small compared to the principal term due to k≧3, the following approximation holds
2|b _{k} ^{(o)}|=4/π(k ^{2}−1) for k≠1, odd
On the other hand, from h_{e}(n) discussion
|a _{k} ^{(e)}|=4/π(1−k ^{2}) for k even, positive
Thus the expressions for |a_{k} ^{(e)}| and 2|b_{k} ^{(o)}| are identical except for the neglected term. Therefore, the frequency spectra of h_{e}(n) and 2h_{e}(n) decay in a similar manner with respect to k. In the preferred embodiments, the constant K in and β are so selected as K=2, β=sin π/4=1/√2.
8. Experimental Results of Stereo BWE
We implemented and tested the proposed method in the following steps: First, a stereo signal sampled at 44.1 kHz was low-pass filtered with cut-off frequency at 11.025 kHz (half the Nyquist frequency). This was used for an input signal to the proposed system. The frequency shift amount f, was chosen to be 5.5125 kHz. Therefore, the bandwidth of the output signal was set to about 16 kHz. We implemented the high-pass filters with an IIR structure.
9. Modifications
The preferred embodiments can be modified while retaining one or more of the features of adaptive high frequency signal level estimation, stereo bandwidth expansion with a common signal, cut-off frequency estimation with spectral curve fits, and bass expansion with both fundamental frequency illusion and frequency band equalization.
For example, the number of samples summed for the ratios defining the left and right channel gains can be varied from a few to thousands, the shift frequency can be roughly a target frequency (e.g., 20 kHz)—the cutoff frequency, the interpolation frequencies and size of averages for the cut-off verification could be varied, and the shape and amount of bass boost could be varied, and so forth.
Cited Patent | Filing date | Publication date | Applicant | Title |
---|---|---|---|---|
US4696035 * | Jul 28, 1986 | Sep 22, 1987 | Sgs Microelectronica S.P.A. | System for expanding the stereo base of stereophonic acoustic diffusion apparatus |
US5111346 * | Jul 18, 1989 | May 5, 1992 | Akai Electric Co., Ltd. | Fm audio record reproducing apparatus having circuitry for preventing discontinuities during channel switching or switching of read heads |
US5319713 * | Nov 12, 1992 | Jun 7, 1994 | Rocktron Corporation | Multi dimensional sound circuit |
US5377272 * | Aug 28, 1992 | Dec 27, 1994 | Thomson Consumer Electronics, Inc. | Switched signal processing circuit |
US5838800 * | Dec 11, 1995 | Nov 17, 1998 | Qsound Labs, Inc. | Apparatus for enhancing stereo effect with central sound image maintenance circuit |
US5872851 * | May 19, 1997 | Feb 16, 1999 | Harman Motive Incorporated | Dynamic stereophonic enchancement signal processing system |
US5892830 * | Dec 19, 1996 | Apr 6, 1999 | Srs Labs, Inc. | Stereo enhancement system |
US6711265 * | May 12, 2000 | Mar 23, 2004 | Thomson Licensing, S.A. | Centralizing of a spatially expanded stereophonic audio image |
US6947564 * | Dec 15, 1999 | Sep 20, 2005 | Thomson Licensing | Stereophonic spatial expansion circuit with tonal compensation and active matrixing |
Citing Patent | Filing date | Publication date | Applicant | Title |
---|---|---|---|---|
US8150050 * | Nov 26, 2007 | Apr 3, 2012 | Samsung Electronics Co., Ltd. | Bass enhancing apparatus and method |
US8160889 * | Apr 17, 2012 | Nuance Communications, Inc. | System for providing an acoustic signal with extended bandwidth | |
US8167826 * | Aug 5, 2009 | May 1, 2012 | Action Research Co., Ltd. | Vibration generating apparatus and method introducing hypersonic effect to activate fundamental brain network and heighten aesthetic sensibility |
US8600765 | Dec 27, 2012 | Dec 3, 2013 | Huawei Technologies Co., Ltd. | Signal classification method and device, and encoding and decoding methods and devices |
US8868414 * | Jan 18, 2012 | Oct 21, 2014 | Yamaha Corporation | Audio signal processing device with enhancement of low-pitch register of audio signal |
US8949116 * | Jan 28, 2011 | Feb 3, 2015 | Samsung Electronics Co., Ltd. | Signal processing method and apparatus for amplifying speech signals |
US9105263 | Jun 25, 2012 | Aug 11, 2015 | Huawei Technologies Co., Ltd. | Audio signal coding and decoding method and device |
US9391579 * | Sep 8, 2011 | Jul 12, 2016 | Dts, Inc. | Dynamic compensation of audio signals for improved perceived spectral imbalances |
US20080175409 * | Nov 26, 2007 | Jul 24, 2008 | Samsung Electronics Co., Ltd. | Bass enhancing apparatus and method |
US20080195392 * | Jan 17, 2008 | Aug 14, 2008 | Bernd Iser | System for providing an acoustic signal with extended bandwidth |
US20110152729 * | Aug 5, 2009 | Jun 23, 2011 | Tsutomu Oohashi | Vibration generating apparatus and method introducing hypersonic effect to activate fundamental brain network and heighten aesthetic sensibility |
US20110184731 * | Jul 28, 2011 | Samsung Electronics Co., Ltd. | Signal processing method and apparatus for amplifying speech signals | |
US20120063616 * | Sep 8, 2011 | Mar 15, 2012 | Martin Walsh | Dynamic compensation of audio signals for improved perceived spectral imbalances |
US20120191462 * | Jul 26, 2012 | Yamaha Corporation | Audio signal processing device with enhancement of low-pitch register of audio signal | |
CN102208188A * | Jul 13, 2011 | Oct 5, 2011 | 华为技术有限公司 | Audio signal encoding-decoding method and device |
CN102208188B | Jul 13, 2011 | Apr 17, 2013 | 华为技术有限公司 | Audio signal encoding-decoding method and device |
CN102800317A * | May 25, 2011 | Nov 28, 2012 | 华为技术有限公司 | Signal classification method and equipment, and encoding and decoding methods and equipment |
CN102800317B | May 25, 2011 | Sep 17, 2014 | 华为技术有限公司 | Signal classification method and equipment, and encoding and decoding methods and equipment |
EP2584560A1 * | Oct 21, 2011 | Apr 24, 2013 | Huawei Technologies Co., Ltd. | Signal classification method and device, and coding/decoding method and device |
U.S. Classification | 381/1, 381/2, 381/17 |
International Classification | H04R5/00 |
Cooperative Classification | G10L21/038, H04S1/002 |
European Classification | G10L21/038, H04S1/00A |
Date | Code | Event | Description |
---|---|---|---|
Apr 27, 2006 | AS | Assignment | Owner name: TEXAS INSTRUMENTS, INCORPORATED,TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:TSUTSUI, RYO;SETIAWAN, IVAN;IWATA, YOSHIHIDE;SIGNING DATES FROM 20060413 TO 20060417;REEL/FRAME:017539/0795 |
Mar 18, 2013 | FPAY | Fee payment | Year of fee payment: 4 |