US20080262856A1 - Method and system for enabling audio speed conversion - Google Patents
- Publication number
- US20080262856A1 US12/079,889
- Authority
- US
- United States
- Prior art keywords
- individual unit
- audio signal
- unit cycles
- average power
- cycles
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/04—Time compression or expansion
- G10L21/043—Time compression or expansion by changing speed
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/01—Correction of time axis
Definitions
- the present invention generally relates to audio speed conversion, and more particularly, to a method and system that enables audio speed conversion such as voice speed conversion.
- Speed conversion systems can be used to enable multiple speed operation (e.g., fast, slow, etc.) in video and/or audio reproduction systems, such as color television (CTV) systems, video tape recorders (VTRs), digital video/versatile disk (DVD) systems, compact disk (CD) players, hearing aids, telephone answering machines and the like.
- Conventional audio speed converters generally differentiate between a silence interval and a sound interval in an audio signal. Deleting the silence interval and compressing the sound interval results in an increased audio speed. Conversely, expanding the silence and sound intervals results in a decreased audio speed.
- Many conventional audio speed converters increase or decrease audio speed at a constant rate independent of the contents. Accordingly, these types of audio speed converters can not take full advantage of the silence and redundant intervals of an audio signal.
- a system for processing an audio signal comprises means for receiving the audio signal and dividing the received audio signal into one or more individual unit cycles and means for enabling an audio speed conversion operation by one of repeating and removing one or more of the individual unit cycles.
- a method for processing an audio signal comprises steps of receiving the audio signal, dividing the received audio signal into one or more individual unit cycles, and enabling an audio speed conversion operation by one of repeating and removing one or more of the individual unit cycles.
- FIG. 1 is an audio speed converter constructed according to principles of the present invention.
- FIG. 2 is a single unit cycle of an exemplary input audio signal according to principles of the present invention.
- FIG. 3 is a waveform illustrating an exemplary audio signal according to principles of the present invention.
- FIG. 4 is a waveform illustrating the periodicity of a sound interval of an exemplary audio signal according to principles of the present invention.
- FIG. 5 is a series of waveforms illustrating an example of detecting a sound interval and a pitch period according to principles of the present invention.
- FIG. 6 is a series of waveforms illustrating examples of audio signal compression and expansion according to principles of the present invention.
- an audio signal such as a digital voice signal is received and divided into one or more individual unit cycles.
- An audio speed conversion operation is enabled by repeating or removing one or more of the individual unit cycles.
- repeating one or more of the individual unit cycles decreases audio speed, and removing one or more of the individual unit cycles increases audio speed.
- the received audio signal is divided into one or more individual unit cycles in dependence upon a reference value such that an individual unit cycle starts at a first sample of the received audio signal that is equal to or greater than the reference value and ends at a last sample of the received audio signal that is less than the reference value.
- the method may also include a step of determining whether each of the one or more individual unit cycles corresponds to a silence interval. This determination may be made in dependence upon an average power value for each of the one or more individual unit cycles. According to a preferred embodiment, the average power value for each of the one or more individual unit cycles is determined in dependence upon an average amplitude value for each of the one or more individual unit cycles.
- the method may also include a step of detecting one or more pitch periods in the received audio signal, wherein each of the one or more pitch periods includes one or more of the individual unit cycles. This detection may be in dependence upon the average power value for each of the one or more individual unit cycles.
- An audio speed conversion system capable of performing the foregoing method is also provided herein.
- an audio speed converter 10 constructed according to principles of the present invention is shown.
- an audio speed converter 10 includes a zero crossing detector 11 which receives an input audio signal.
- the zero crossing detector 11 samples the input audio signal and compares the sampled values to a zero reference value. Sampled values that are greater than or equal to zero reference value correspond to a positive input signal, and sampled values less than the zero reference value correspond to a negative input signal.
- the input audio signal is divided into a series of single unit cycle waveforms.
- An absolute value calculator 12 receives the sampled values of the input audio signal from the zero crossing detector 11 , and computes the absolute value of each sample.
- An average power value (P) generator 13 receives the absolute values computed by the absolute value calculator 12 , and calculates an average power value (P) for each cycle of the input audio signal based on the absolute values.
- the average power value (P) is calculated on the basis of the average amplitude value. That is, the average power value (P) is equal to the sum of the sample values divided by the total number of samples in a cycle. In this manner, the average power value (P) is computed for each cycle of the input audio signal.
- a silence detector 14 receives the average power values (P) from the average power value (P) generator 13 and performs a comparison operation to determine whether or not each cycle corresponds to a silence interval. In particular, the silence detector 14 compares each average power value (P) with a reference threshold value.
- a silence redundancy detector 15 may be utilized in certain modes to calculate the duration of the silence intervals and expand or compress the silence interval in accordance with principles of the present invention. Further details regarding the expansion and compression of intervals will be provided later herein.
- a sound detector and pitch period detector 16 detects a sound interval in the input audio signal, and further detects the start of different pitch periods.
- a pitch redundancy detector 17 detects redundancies in pitch periods in accordance with principles of the present invention. Further details regarding the detection of sound intervals and pitch periods will be provided later herein.
- a control circuit 18 controls the general operation of the audio speed converter 10 .
- the control circuit 18 enables outputs from the audio converter 10 to be stored in an internal buffer memory 19 or an external storage device 20 such as a hard disk, a random access memory (RAM), an optical disk or other external memory.
- the control circuit 18 also enables outputs from the audio converter 10 to be transferred to an external device 21 such as a speaker or other device, and receives inputs regarding modes of operation.
- the audio speed converter 10 of FIG. 1 has three different modes of operation: a fast mode, a slow mode, and a standby mode.
- Further details regarding operation of the audio speed converter 10 constructed according to principles of the present invention will now be provided with reference to FIGS. 1 through 6.
- the zero crossing detector 11 of the audio speed converter 10 receives an input audio signal.
- the input audio signal is a 10 bit digital signal. It is contemplated, however, that input signals of other bit lengths may be accommodated in accordance with principles of the present invention.
- the zero crossing detector 11 samples the input audio signal and compares the sampled values to a zero reference value. According to a preferred embodiment, the zero reference value is 512. It is contemplated, however, that other zero reference values may be utilized in accordance with principles of the present invention.
- the input audio signal is divided into a series of single unit cycle waveforms.
- FIG. 2 a schematic diagram of a single cycle 30 of an exemplary input audio signal is shown.
- the dots represent exemplary points sampled by the zero crossing detector 11 of FIG. 1 and the numbers (i.e., 1000, 560, 470, 24) represent possible values of certain samples (assuming 10 bits of resolution).
- the zero crossing detector 11 uses a zero reference value of 512 in a preferred embodiment, which is one half a maximum value of 1024 (assuming 10 bits of resolution). Consequently, sampled values that are greater than or equal to 512 correspond to a positive input signal, and sampled values less than 512 correspond to a negative input signal.
- the input signal can be divided into a series of single unit cycle waveforms, such as the one shown in FIG. 2 .
- a single unit cycle of the input audio signal is measured from the first sample of the positive half-wave (value≧512) to the last sample of the negative half-wave (value<512).
- Such a cycle is the smallest unit of a signal that is eliminated or repeated by the audio speed converter 10 .
- the audio speed converter 10 of FIG. 1 only deletes or repeats complete unit cycles of the input audio signal. The advantage of this method is that signal deletion or insertion always takes place at zero crossing points, thus preventing any audible clicks in an output audio signal.
- the present invention advantageously provides output audio signals comprised of actual audio information without synthetic waveforms.
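The division into unit cycles described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's circuit: the function name split_unit_cycles, the list-of-integers input, and the choice to discard incomplete cycles at either end of the buffer are all hypothetical.

```python
def split_unit_cycles(samples, zero_ref=512):
    """Split 10-bit samples into complete unit cycles.

    A cycle starts at the first sample >= zero_ref that follows a sample
    < zero_ref, and ends at the last sample < zero_ref before the next start.
    Assumption: partial cycles at the buffer edges are dropped, because their
    true start or end cannot be known from the buffer alone.
    """
    # Indices where the signal crosses from the negative to the positive half-wave.
    starts = [
        i for i in range(1, len(samples))
        if samples[i] >= zero_ref and samples[i - 1] < zero_ref
    ]
    # Each cycle runs from one upward crossing to just before the next one.
    return [samples[a:b] for a, b in zip(starts, starts[1:])]


# Usage: two complete cycles are found; the edge samples are discarded.
signal = [400, 600, 1000, 560, 470, 24, 300, 700, 900, 200, 100, 530]
cycles = split_unit_cycles(signal)
```

Because every cut falls on an upward zero crossing, removing or repeating any returned cycle joins the signal only at zero crossing points, which is the property the text relies on to avoid clicks.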
- PICOLA: pointer interval control overlap and add
- the absolute value calculator 12 receives the sampled values of the input audio signal from the zero crossing detector 11 , and computes the absolute value of each sample.
- the average power value (P) calculator 13 receives the absolute values computed by the absolute value calculator 12 , and calculates an average power value (P) for each cycle of the input audio signal based on the absolute values.
- the average power value (P) is calculated on the basis of the average amplitude value. That is, the average power value (P) is equal to the sum of the sample values divided by the total number of samples in a cycle. In this manner, the average power value (P) is computed for each cycle of the input audio signal.
- the silence detector 14 receives the average power values (P) from the average power value (P) generator 13 and performs a comparison operation to determine whether or not each cycle corresponds to a silence interval. In particular, the silence detector 14 compares each average power value (P) with a reference threshold value P SIL, which may be set according to design choice. If P<P SIL, the corresponding cycle is identified as a silence interval, and if P≧P SIL, the corresponding cycle is identified as not being a silence interval (i.e., it contains recognizable sound). In situations where P<P SIL, the silence redundancy detector 15 may be utilized in certain modes to calculate the duration of the silence intervals and expand or compress the silence interval in accordance with principles of the present invention. Further details regarding this operation will now be provided.
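The per-cycle power calculation and the silence comparison can be sketched as follows. The function names are hypothetical, and two points are assumptions rather than statements from the text: the "absolute value" of a sample is taken as its distance from the zero reference (512 for 10-bit samples), and a cycle counts as silence when P is strictly below P SIL.

```python
def average_power(cycle, zero_ref=512):
    """Average amplitude of one unit cycle, used as its power value P.

    Assumption: the absolute value of a sample is its distance from
    zero_ref. The cycle must contain at least one sample.
    """
    return sum(abs(s - zero_ref) for s in cycle) / len(cycle)


def is_silence(cycle, p_sil, zero_ref=512):
    """Return True when the cycle's power P is below the threshold P SIL."""
    return average_power(cycle, zero_ref) < p_sil


# Usage: a low-amplitude cycle is classified as silence, a loud one is not.
quiet = [515, 509]
loud = [612, 412]
flags = [is_silence(quiet, p_sil=10), is_silence(loud, p_sil=10)]
```

Computing P over a whole unit cycle, rather than over a fixed-length frame, keeps the measurement aligned with the same boundaries that are later removed or repeated.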
- FIG. 3 a schematic diagram of a waveform 40 of an exemplary audio signal is shown.
- the waveform 40 of FIG. 3 may approximate the input audio signal to the audio speed converter 10 of FIG. 1 .
- the audio signal waveform 40 illustrates three different types of intervals: a silence interval, a quasi-sound interval, and a sound interval.
- a silence interval mainly contains background noise and is of very low amplitude, with a low and constant average power.
- the silence redundancy detector 15 can compress a silence interval by removing part of the silence interval. For example, in FIG. 3 if the silence interval T SIL is long, then an interval equal to T SIL − T TH can be removed.
- the threshold time T TH in FIG. 3 is a delay time that must elapse before compression of a silence interval can occur. In this manner, sounds (e.g., speech) represented by the audio signal can be better understood by a listener.
- the silence redundancy detector 15 can expand the silence interval by a predetermined time interval equal to T SIL-REF − T SIL.
- T SIL-REF limits the maximum expansion time of a silence interval. Moreover, this parameter causes the expansion of an originally long silence interval to be less than the expansion of an originally shorter interval. In this way, words spoken quickly can be better understood by a listener. If a silence interval is long enough so that the result of T SIL-REF − T SIL is negative, then expansion may not take place since there typically is no need to expand an already long silence interval.
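The silence compression and expansion rules can be expressed as simple arithmetic on durations. The sketch below is illustrative only: the function name, the mode strings, and the reading of a "long" silence as one with T SIL greater than T TH are assumptions. Durations may be in any consistent unit, such as samples or milliseconds.

```python
def adjusted_silence_duration(t_sil, mode, t_th, t_sil_ref):
    """Return the silence duration after compression or expansion.

    fast: remove (t_sil - t_th) when the silence outlasts the threshold t_th.
    slow: add (t_sil_ref - t_sil) when that difference is positive.
    Any other mode leaves the silence unchanged.
    """
    if mode == "fast":
        removed = max(0, t_sil - t_th)
        return t_sil - removed
    if mode == "slow":
        added = max(0, t_sil_ref - t_sil)
        return t_sil + added
    return t_sil


# Usage: a long silence is trimmed in fast mode, a short one is stretched in slow mode.
trimmed = adjusted_silence_duration(300, "fast", t_th=100, t_sil_ref=200)
stretched = adjusted_silence_duration(80, "slow", t_th=100, t_sil_ref=200)
```

Under these rules a shorter silence receives a larger expansion than a longer one, and a silence already longer than T SIL-REF is left alone, which matches the behavior described for T SIL-REF.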
- a quasi-sound interval exhibits greater amplitude than a silence interval, and is typically random in nature having frequent variations. Due to these frequent variations, a quasi-sound interval tends to exhibit a relatively low degree of periodicity (i.e., redundancy).
- a sound interval exhibits the largest amplitude of the three types of intervals, and has a periodic structure. Due to this periodicity, a sound interval exhibits some degree of redundancy. Quasi-sound intervals and sound intervals both may represent voice information.
- FIG. 4 a schematic diagram of a waveform 50 illustrating the periodicity of a sound interval of an exemplary audio signal is shown.
- the waveform 50 of FIG. 4 illustrates four pitch periods, T 1 through T 4 .
- a pitch period is defined by the periodicity (i.e., redundancy) in a sound interval of an audio signal. This redundancy in the sound interval can be used to increase audio speed.
- audio speed can be increased by removing the second and third pitch periods T 2 and T 3 from the waveform 50 .
- repeating the second and third pitch periods T 2 and T 3 in the waveform 50 decreases audio speed.
- When the silence detector 14 determines that P≧P SIL for a given cycle, that cycle is transferred to the sound detector and pitch period detector 16 for further processing.
- the sound detector and pitch period detector 16 detects a sound interval, such as the one shown in the waveform 40 of FIG. 3 , and further detects the start of pitch periods, such as the ones shown in the waveform 50 of FIG. 4 . Further details regarding this operation will now be provided.
- a waveform 60 shows an exemplary input audio signal having pitch periods T 1 through T 4 .
- Each pitch period includes one or more cycles.
- the pitch period T 1 includes cycles Cy 2 , Cy 3 and Cy 4 .
- the pitch period T 2 includes cycles Cy 5 , Cy 6 and Cy 7 .
- the pitch period T 3 includes cycles Cy 8 , Cy 9 and Cy 10 .
- the pitch period T 4 includes cycles Cy 11 , Cy 12 and Cy 13 .
- the number of cycles included in the pitch periods T 1 through T 4 is represented by the values N 1 through N 4 , respectively.
- a waveform 61 illustrates the average amplitude values corresponding to the different cycles.
- cycles Cy 1 through Cy 13 have average power values P 1 through P 13 , respectively.
- P SIL: silence threshold value
- the cycles Cy 2 , Cy 5 , Cy 8 and Cy 11 each represent the start of a given pitch period detected by the sound detector and pitch period detector 16 of FIG. 1 .
- This detection may be enabled via the average power values. That is, the average power values P 2 , P 5 , P 8 and P 11 corresponding to the cycles Cy 2 , Cy 5 , Cy 8 and Cy 11 are higher than the average power values of the other cycles. Accordingly, power (e.g., amplitude) value is a useful criterion for detecting the start of pitch periods.
- the present invention uses a reference value for detecting pitch periods wherein a reference value for one cycle depends on the average power value of a previous cycle.
- the reference value for a given cycle is set equal to the average power value of an immediately preceding cycle multiplied by a constant that is between 1 and 2. Therefore, assuming for example that the constant is 1.5, the power value P 2 is compared to 1.5 times the power value P 1 . Similarly, the power value P 3 is compared to 1.5 times the power value P 2 , and so on.
- the reference value used to detect pitch periods varies from cycle to cycle and exactly follows the dynamic change of an audio signal such as a voice signal. Therefore, according to principles of the present invention, if the average amplitude value of one cycle is greater than or equal to its reference value, then that cycle is identified as the start of a pitch period and a logic high signal is generated for output by the sound detector and pitch period detector 16 .
- This output signal of the sound detector and pitch period detector 16 is represented by a waveform 62 in FIG. 5 . The rising edge of this output signal may be used to set a memory address pointer to indicate the start of a pitch period.
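The adaptive reference used to find pitch period starts can be sketched as a comparison of each cycle's power with a scaled copy of the preceding cycle's power. The function name and the list-of-powers input are hypothetical, the constant 1.5 is the example value from the text, and skipping the first cycle (which has no predecessor) is an assumption.

```python
def pitch_period_starts(powers, k=1.5):
    """Return indices of cycles that start a pitch period.

    A cycle i starts a pitch period when its average power is at least
    k times the average power of cycle i - 1, with k between 1 and 2.
    Assumption: the first cycle is never flagged, since it has no reference.
    """
    return [
        i for i in range(1, len(powers))
        if powers[i] >= k * powers[i - 1]
    ]


# Usage: power peaks that recur every three cycles are flagged as starts.
powers = [10, 40, 20, 15, 45, 22, 16, 50, 25, 18]
starts = pitch_period_starts(powers)
```

Because the reference is recomputed from the previous cycle each time, the detector follows changes in overall loudness instead of relying on one fixed threshold.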
- a detected pitch period may be characterized by two parameters: its duration T and its total number of cycles N.
- the similarity between two successive pitch waveforms can be determined by comparing these parameters.
- the pitch redundancy detector 17 calculates a difference in duration between two successive pitch periods (e.g., T 1 and T 2 in FIG. 5 ) and compares the result to a reference value ΔT REF.
- the pitch redundancy detector 17 then calculates a difference in the number of cycles (e.g., N 1 and N 2 in FIG. 5 ) between the two successive pitch periods, and compares the result to another reference value ΔN REF.
- If both differences fall within their respective reference values, the two corresponding pitch periods are considered to be identical.
- the chance of identifying two identical pitch periods in a quasi-sound interval, such as the one shown in FIG. 3 is relatively low. However, the chance of identifying two identical pitch periods in a sound interval, such as the one shown in FIG. 3 , is higher.
- When the audio speed converter 10 of FIG. 1 is in the fast mode of operation, the second of two identical periods is removed from an audio signal. By doing this, the signal redundancy decreases and audio speed increases. Conversely, when the audio speed converter 10 of FIG. 1 is in the slow mode of operation, the second of two identical periods is repeated in an audio signal. By doing this, the signal redundancy increases and audio speed decreases.
- a waveform 70 illustrates a situation where no signal compression or expansion is performed. Accordingly, all four pitch periods having durations T 1 through T 4 , respectively, are included in an audio signal.
- a waveform 71 illustrates a situation where signal compression is performed. In particular, only the pitch periods having durations T 1 and T 3 are included in an audio signal, thereby decreasing signal redundancy. The waveform 71 may result when the audio speed converter 10 of FIG. 1 is in the fast mode of operation.
- a waveform 72 illustrates a situation where signal expansion is performed.
- the pitch period having duration T 2 is repeated in an audio signal, thereby increasing signal redundancy.
- the waveform 72 may result when the audio speed converter 10 of FIG. 1 is in the slow mode of operation.
- When the audio speed converter 10 is in the standby mode of operation, an input audio signal is simply looped through the audio speed converter 10 without any speed variation.
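The three modes can be sketched as one pass over the detected pitch periods. This is a hedged illustration, not the patent's implementation: the function name, the (duration, cycle count, payload) tuple layout, the inclusive comparison against the two reference values, and the choice to examine non-overlapping pairs of successive periods are all assumptions.

```python
def convert_periods(periods, mode, dt_ref, dn_ref):
    """Apply fast, slow, or standby processing to a list of pitch periods.

    periods: list of (duration, n_cycles, payload) tuples in signal order.
    Two successive periods are treated as identical when both their duration
    difference and their cycle-count difference are within the references.
    """
    out = []
    i = 0
    while i < len(periods):
        out.append(periods[i])
        if mode in ("fast", "slow") and i + 1 < len(periods):
            t1, n1, _ = periods[i]
            t2, n2, _ = periods[i + 1]
            if abs(t2 - t1) <= dt_ref and abs(n2 - n1) <= dn_ref:
                if mode == "slow":
                    # Keep the second period and play it once more.
                    out.extend([periods[i + 1], periods[i + 1]])
                # In fast mode the second period is simply not emitted.
                i += 2
                continue
        i += 1
    return out


# Usage: four similar periods; fast mode keeps every other one.
periods = [(30, 3, "T1"), (31, 3, "T2"), (30, 3, "T3"), (29, 3, "T4")]
fast = [p[2] for p in convert_periods(periods, "fast", dt_ref=2, dn_ref=0)]
slow = [p[2] for p in convert_periods(periods, "slow", dt_ref=2, dn_ref=0)]
same = [p[2] for p in convert_periods(periods, "standby", dt_ref=2, dn_ref=0)]
```

When all four periods are judged identical, the fast-mode result keeps T1 and T3, which agrees with the compression case described for waveform 71. The pairing strategy used for slow mode is only one possible reading of the text.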
- the control circuit 18 can calculate the audio speed at any given moment and provide the result to other devices, such as the internal buffer memory 19 , the external storage device 20 and/or the external device 21 .
- When the audio speed converter 10 is in the fast mode of operation, best results are obtained at a speed that is a maximum of twice the original speed. If the speed is higher, sounds such as speech become less understandable to a listener. Nevertheless, higher speeds may be used in applications such as a fast forward function of a video tape recorder (VTR) where a complete comprehension of the audio information is not required. In such cases, it may be necessary to increase the values of the reference parameters T TH, T SIL-REF, P SIL, ΔT REF and ΔN REF. When the audio speed converter 10 is in the slow mode of operation, best results are obtained at a speed that is not lower than half the original speed. While the present invention is particularly suitable for processing voice signals, the principles of the present invention may also be applied to the processing of audio signals in general, including audio signals such as music containing data other than and/or in addition to voice data.
- the present invention provides several advantages over conventional audio speed conversion devices. Exemplary features of the present invention are as follows:
Abstract
The present invention provides a method and system for processing an audio signal. According to an exemplary method, an audio signal such as a digital voice signal is received and divided into one or more individual unit cycles. An audio speed conversion operation is enabled by repeating or removing one or more of the individual unit cycles. In particular, repeating one or more of the individual unit cycles decreases audio speed, and removing one or more of the individual unit cycles increases audio speed.
Description
- This application is a continuation application under 37 C.F.R. 1.53(b) of copending patent application Ser. No. 10/343,615 filed Feb. 3, 2003, claims benefit under 35 U.S.C. § 365 of International Application PCT/IB01/01161 filed Jun. 29, 2001, which was published in accordance with PCT Article 21(2) on Feb. 14, 2002 in English, and claims benefit of U.S. provisional application Ser. No. 60/224,115 filed Aug. 9, 2000.
- The present invention generally relates to audio speed conversion, and more particularly, to a method and system that enables audio speed conversion such as voice speed conversion.
- Speed conversion systems can be used to enable multiple speed operation (e.g., fast, slow, etc.) in video and/or audio reproduction systems, such as color television (CTV) systems, video tape recorders (VTRs), digital video/versatile disk (DVD) systems, compact disk (CD) players, hearing aids, telephone answering machines and the like. Conventional audio speed converters generally differentiate between a silence interval and a sound interval in an audio signal. Deleting the silence interval and compressing the sound interval results in an increased audio speed. Conversely, expanding the silence and sound intervals results in a decreased audio speed. Many conventional audio speed converters increase or decrease audio speed at a constant rate independent of the contents. Accordingly, these types of audio speed converters can not take full advantage of the silence and redundant intervals of an audio signal.
- The process of removing or repeating intervals of an audio signal can be problematic since it often produces undesirable audible “clicks.” Additionally, the pitch of an audio signal should not be changed or transformed to other frequencies since the human ear tends to be quite sensitive to these changes. Known prior art algorithms such as the “pointer interval control overlap and add” (PICOLA) algorithm address these problems by multiplying an audio signal by a window function in an attempt to smooth the output signal and maintain the original pitch. This results in producing synthetic waveforms that were not part of the original audio signal. Moreover, the use of such algorithms typically requires utilization of fast digital signal processors (DSPs), which tend to be expensive. Accordingly, it is desirable to provide an audio speed converter which avoids the use of expensive digital signal processors (DSPs), and utilizes more cost-effective processing means such as small programmable logic devices (PLDs). The present invention addresses these and other problems.
- In accordance with an aspect of the invention, a system for processing an audio signal comprises means for receiving the audio signal and dividing the received audio signal into one or more individual unit cycles and means for enabling an audio speed conversion operation by one of repeating and removing one or more of the individual unit cycles.
- In accordance with another aspect of the invention, a method for processing an audio signal comprises steps of receiving the audio signal, dividing the received audio signal into one or more individual unit cycles, and enabling an audio speed conversion operation by one of repeating and removing one or more of the individual unit cycles.
- In the drawings:
- FIG. 1 is an audio speed converter constructed according to principles of the present invention;
- FIG. 2 is a single unit cycle of an exemplary input audio signal according to principles of the present invention;
- FIG. 3 is a waveform illustrating an exemplary audio signal according to principles of the present invention;
- FIG. 4 is a waveform illustrating the periodicity of a sound interval of an exemplary audio signal according to principles of the present invention;
- FIG. 5 is a series of waveforms illustrating an example of detecting a sound interval and a pitch period according to principles of the present invention; and
- FIG. 6 is a series of waveforms illustrating examples of audio signal compression and expansion according to principles of the present invention.
- The exemplifications set out herein illustrate preferred embodiments of the invention, and such exemplifications are not to be construed as limiting the scope of the invention in any manner.
- This application discloses a system and a method for processing an audio signal which provide advantages over conventional techniques. According to an exemplary system and an exemplary method, an audio signal such as a digital voice signal is received and divided into one or more individual unit cycles. An audio speed conversion operation is enabled by repeating or removing one or more of the individual unit cycles. In particular, repeating one or more of the individual unit cycles decreases audio speed, and removing one or more of the individual unit cycles increases audio speed. According to a preferred embodiment, the received audio signal is divided into one or more individual unit cycles in dependence upon a reference value such that an individual unit cycle starts at a first sample of the received audio signal that is equal to or greater than the reference value and ends at a last sample of the received audio signal that is less than the reference value.
- The method may also include a step of determining whether each of the one or more individual unit cycles corresponds to a silence interval. This determination may be made in dependence upon an average power value for each of the one or more individual unit cycles. According to a preferred embodiment, the average power value for each of the one or more individual unit cycles is determined in dependence upon an average amplitude value for each of the one or more individual unit cycles. The method may also include a step of detecting one or more pitch periods in the received audio signal, wherein each of the one or more pitch periods includes one or more of the individual unit cycles. This detection may be in dependence upon the average power value for each of the one or more individual unit cycles. An audio speed conversion system capable of performing the foregoing method is also provided herein.
- Referring now to the drawings, and more particularly to
FIG. 1 , anaudio speed converter 10 constructed according to principles of the present invention is shown. InFIG. 1 , anaudio speed converter 10 includes a zerocrossing detector 11 which receives an input audio signal. The zerocrossing detector 11 samples the input audio signal and compares the sampled values to a zero reference value. Sampled values that are greater than or equal to zero reference value correspond to a positive input signal, and sampled values less than the zero reference value correspond to a negative input signal. As will be discussed later herein, the input audio signal is divided into a series of single unit cycle waveforms. - An
absolute value calculator 12 receives the sampled values of the input audio signal from the zerocrossing detector 11, and computes the absolute value of each sample. An average power value (P)generator 13 receives the absolute values computed by theabsolute value calculator 12, and calculates an average power value (P) for each cycle of the input audio signal based on the absolute values. In accordance with principles of the present invention, it is important to calculate the average power value (P) of a single unit cycle waveform, and not of a single frame that contains a fixed number of samples, as is the case with many conventional audio speed converters. According to a preferred embodiment, the average power value (P) is calculated on the basis of the average amplitude value. That is, the average power value (P) is equal to the sum of the sample values divided by the total number of samples in a cycle. In this manner, the average power value (P) is computed for each cycle of the input audio signal. - A
silence detector 14 receives the average power values (P) from the average power value (P)generator 13 and performs a comparison operation to determine whether or not each cycle corresponds to a silence interval. In particular, thesilence detector 14 compares each average power value (P) with a reference threshold value. When one or more cycles corresponding to a silence interval are identified, asilence redundancy detector 15 may be utilized in certain modes to calculate the duration of the silence intervals and expand or compress the silence interval in accordance with principles of the present invention. Further details regarding the expansion and compression of intervals will be provided later herein. Alternatively, when one or more cycles not corresponding to a silence interval are identified, a sound detector andpitch period detector 16 detects a sound interval in the input audio signal, and further detects the start of different pitch periods. Apitch redundancy detector 17 detects redundancies in pitch periods in accordance with principles of the present invention. Further details regarding the detection of sound intervals and pitch periods will be provided later herein. - A
control circuit 18 controls the general operation of the audio speed converter 10. For example, the control circuit 18 enables outputs from the audio speed converter 10 to be stored in an internal buffer memory 19 or an external storage device 20 such as a hard disk, a random access memory (RAM), an optical disk or other external memory. The control circuit 18 also enables outputs from the audio speed converter 10 to be transferred to an external device 21 such as a speaker or other device, and receives inputs regarding modes of operation. As will be discussed later herein, the audio speed converter 10 of FIG. 1 has three different modes of operation: a fast mode, a slow mode, and a standby mode.
- Further details regarding operation of the
audio speed converter 10 constructed according to principles of the present invention will now be provided with reference to FIGS. 1 through 6.
- As previously indicated, in
FIG. 1 the zero crossing detector 11 of the audio speed converter 10 receives an input audio signal. According to a preferred embodiment, the input audio signal is a 10 bit digital signal. It is contemplated, however, that input signals of other bit lengths may be accommodated in accordance with principles of the present invention. The zero crossing detector 11 samples the input audio signal and compares the sampled values to a zero reference value. According to a preferred embodiment, the zero reference value is 512. It is contemplated, however, that other zero reference values may be utilized in accordance with principles of the present invention. As previously indicated, the input audio signal is divided into a series of single unit cycle waveforms.
- Referring now to
FIG. 2, a schematic diagram of a single cycle 30 of an exemplary input audio signal is shown. In FIG. 2, the dots represent exemplary points sampled by the zero crossing detector 11 of FIG. 1 and the numbers (i.e., 1000, 560, 470, 24) represent possible values of certain samples (assuming 10 bits of resolution). As previously indicated, the zero crossing detector 11 uses a zero reference value of 512 in a preferred embodiment, which is one half of the 1024 levels available with 10 bits of resolution. Consequently, sampled values that are greater than or equal to 512 correspond to a positive input signal, and sampled values less than 512 correspond to a negative input signal. By comparing the sampled values with a zero reference value, the input signal can be divided into a series of single unit cycle waveforms, such as the one shown in FIG. 2. According to principles of the present invention, a single unit cycle of the input audio signal is measured from the first sample of the positive half-wave (value ≧ 512) to the last sample of the negative half-wave (value < 512). Such a cycle is the smallest unit of a signal that is eliminated or repeated by the audio speed converter 10. As will be discussed later herein, the audio speed converter 10 of FIG. 1 only deletes or repeats complete unit cycles of the input audio signal. The advantage of this method is that signal deletion or insertion always takes place at zero crossing points, thus preventing any audible clicks in an output audio signal. In this way, the present invention advantageously provides output audio signals comprised of actual audio information without synthetic waveforms. By contrast, in the conventional “pointer interval control overlap and add” (PICOLA) algorithm, an input audio signal is multiplied by a window function, which produces synthetic waveforms that were not part of the original audio signal.
- Referring back to
FIG. 1, the absolute value calculator 12 receives the sampled values of the input audio signal from the zero crossing detector 11, and computes the absolute value of each sample. The average power value (P) generator 13 receives the absolute values computed by the absolute value calculator 12, and calculates an average power value (P) for each cycle of the input audio signal based on the absolute values. In accordance with principles of the present invention, it is important to calculate the average power value (P) of a single cycle waveform, and not of a single frame that contains a fixed number of samples, as is the case with many conventional audio speed converters. According to a preferred embodiment, the average power value (P) is calculated on the basis of the average amplitude value. That is, the average power value (P) is equal to the sum of the absolute sample values divided by the total number of samples in a cycle. In this manner, the average power value (P) is computed for each cycle of the input audio signal.
- The
silence detector 14 receives the average power values (P) from the average power value (P) generator 13 and performs a comparison operation to determine whether or not each cycle corresponds to a silence interval. In particular, the silence detector 14 compares each average power value (P) with a reference threshold value PSIL, which may be set according to design choice. If P < PSIL, the corresponding cycle is identified as a silence interval, and if P ≧ PSIL, the corresponding cycle is identified as not being a silence interval (i.e., it contains recognizable sound). In situations where P < PSIL, the silence redundancy detector 15 may be utilized in certain modes to calculate the duration of the silence intervals and expand or compress the silence interval in accordance with principles of the present invention. Further details regarding this operation will now be provided.
- Referring to
FIG. 3, a schematic diagram of a waveform 40 of an exemplary audio signal is shown. The waveform 40 of FIG. 3 may approximate the input audio signal to the audio speed converter 10 of FIG. 1. In FIG. 3, the audio signal waveform 40 illustrates three different types of intervals: a silence interval, a quasi-sound interval, and a sound interval. A silence interval mainly contains background noise and is of very low amplitude, with a low and constant average power. When the audio speed converter 10 of FIG. 1 is in the fast mode, the silence redundancy detector 15 can compress a silence interval by removing part of the silence interval. For example, in FIG. 3, if the duration TSIL of the silence interval exceeds the threshold time TTH, then an interval equal to TSIL−TTH can be removed. The threshold time TTH in FIG. 3 is a delay time that must elapse before compression of a silence interval can occur. In this manner, sounds (e.g., speech) represented by the audio signal can be better understood by a listener.
- Additionally, when the
audio speed converter 10 of FIG. 1 is in the slow mode, the silence redundancy detector 15 can expand the silence interval by a predetermined time interval equal to TSIL-REF−TSIL. The parameter TSIL-REF limits the maximum expansion time of a silence interval. Moreover, this parameter causes the expansion of an originally long silence interval to be less than the expansion of an originally shorter interval. In this way, words spoken quickly can be better understood by a listener. If a silence interval is long enough so that the result of TSIL-REF−TSIL is negative, then expansion may not take place since there typically is no need to expand an already long silence interval.
- As indicated by the
waveform 40 of FIG. 3, a quasi-sound interval exhibits greater amplitude than a silence interval, and is typically random in nature, having frequent variations. Due to these frequent variations, a quasi-sound interval tends to exhibit a relatively low degree of periodicity (i.e., redundancy). A sound interval exhibits the largest amplitude of the three types of intervals, and has a periodic structure. Due to this periodicity, a sound interval exhibits some degree of redundancy. Quasi-sound intervals and sound intervals both may represent voice information.
- Referring to
FIG. 4, a schematic diagram of a waveform 50 illustrating the periodicity of a sound interval of an exemplary audio signal is shown. In particular, the waveform 50 of FIG. 4 illustrates four pitch periods, T1 through T4. As indicated in FIG. 4, a pitch period is defined by the periodicity (i.e., redundancy) in a sound interval of an audio signal. This redundancy in the sound interval can be used to increase audio speed. For example, in FIG. 4 audio speed can be increased by removing the second and third pitch periods T2 and T3 from the waveform 50. Conversely, repeating the second and third pitch periods T2 and T3 in the waveform 50 decreases audio speed.
- Referring back to
FIG. 1, when the silence detector 14 determines that P ≧ PSIL for a given cycle, that cycle is transferred to the sound detector and pitch period detector 16 for further processing. In particular, the sound detector and pitch period detector 16 detects a sound interval, such as the one shown in the waveform 40 of FIG. 3, and further detects the start of pitch periods, such as the ones shown in the waveform 50 of FIG. 4. Further details regarding this operation will now be provided.
- Referring to
FIG. 5, a series of waveforms illustrating an example of detecting a sound interval and a pitch period according to principles of the present invention is shown. In FIG. 5, a waveform 60 shows an exemplary input audio signal having pitch periods T1 through T4. Each pitch period includes one or more cycles. For example, in FIG. 5 the pitch period T1 includes cycles Cy2, Cy3 and Cy4. The pitch period T2 includes cycles Cy5, Cy6 and Cy7. The pitch period T3 includes cycles Cy8, Cy9 and Cy10. The pitch period T4 includes cycles Cy11, Cy12 and Cy13. The number of cycles included in the pitch periods T1 through T4 is represented by the values N1 through N4, respectively. A waveform 61 illustrates the average amplitude values corresponding to the different cycles. In particular, cycles Cy1 through Cy13 have average power values P1 through P13, respectively. Note that all of the average power values P1 through P13 in FIG. 5 are above the silence threshold value PSIL, which is shown as a dotted line.
- As indicated by the
waveform 60, the cycles Cy2, Cy5, Cy8 and Cy11 each represent the start of a given pitch period detected by the sound detector and pitch period detector 16 of FIG. 1. This detection may be enabled via the average power values. That is, the average power values P2, P5, P8 and P11 corresponding to the cycles Cy2, Cy5, Cy8 and Cy11 are higher than the average power values of the other cycles. Accordingly, power (e.g., amplitude) value is a useful criterion for detecting the start of pitch periods. Since certain audio signals such as voice signals are dynamic in that their power values vary with time, a reference level (i.e., value) used to detect pitch periods should also vary with time and follow changes in the input audio signal. Therefore, the present invention uses a reference value for detecting pitch periods wherein a reference value for one cycle depends on the average power value of a previous cycle. According to a preferred embodiment, the reference value for a given cycle is set equal to the average power value of an immediately preceding cycle multiplied by a constant that is between 1 and 2. Therefore, assuming for example that the constant is 1.5, the power value P2 is compared to 1.5 times the power value P1. Similarly, the power value P3 is compared to 1.5 times the power value P2, and so on. In this manner, the reference value used to detect pitch periods varies from cycle to cycle and closely follows the dynamic change of an audio signal such as a voice signal. Therefore, according to principles of the present invention, if the average power value of one cycle is greater than or equal to its reference value, then that cycle is identified as the start of a pitch period and a logic high signal is generated for output by the sound detector and pitch period detector 16. This output signal of the sound detector and pitch period detector 16 is represented by a waveform 62 in FIG. 5.
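The cycle division, per-cycle average power and adaptive pitch-start test described above can be illustrated with a minimal Python sketch. It is not the disclosed circuitry: the function names `split_cycles`, `average_power` and `pitch_starts` are hypothetical, the absolute value is assumed to be taken relative to the zero reference value of 512, incomplete cycles at either end of the buffer are assumed to be discarded, and the first cycle, which has no predecessor, is assumed never to be flagged.

```python
from typing import List

ZERO_REF = 512  # zero reference value for 10-bit samples (preferred embodiment)
K = 1.5         # example constant between 1 and 2, as used in the text


def split_cycles(samples: List[int], zero_ref: int = ZERO_REF) -> List[List[int]]:
    """Split samples into complete unit cycles.

    A cycle starts at the first sample of a positive half-wave
    (value >= zero_ref) and ends at the last sample of the following
    negative half-wave (value < zero_ref). Assumption: partial cycles
    at the start and end of the buffer are dropped.
    """
    starts = [
        i for i in range(1, len(samples))
        if samples[i] >= zero_ref and samples[i - 1] < zero_ref
    ]
    return [samples[a:b] for a, b in zip(starts, starts[1:])]


def average_power(cycle: List[int], zero_ref: int = ZERO_REF) -> float:
    """Average amplitude of one cycle: sum of absolute values / sample count.

    Assumption: the absolute value is measured from the zero reference.
    """
    return sum(abs(s - zero_ref) for s in cycle) / len(cycle)


def pitch_starts(powers: List[float], k: float = K) -> List[int]:
    """Indices of cycles whose power is >= k times the preceding cycle's power."""
    return [i for i in range(1, len(powers)) if powers[i] >= k * powers[i - 1]]


if __name__ == "__main__":
    signal = [500, 600, 700, 400, 300, 520, 530, 100, 513]
    cycles = split_cycles(signal)
    powers = [average_power(c) for c in cycles]
    print(cycles, powers, pitch_starts(powers))
```

Because each reference value is derived from the immediately preceding cycle, no global threshold needs to be tuned for loud or quiet passages; only the constant `k` is chosen in advance.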
The rising edge of this output signal may be used to set a memory address pointer to indicate the start of a pitch period.
- A detected pitch period may be characterized by two parameters: its duration T and its total number of cycles N. The similarity between two successive pitch waveforms can be determined by comparing these parameters. In
FIG. 1, the pitch redundancy detector 17 calculates a difference in duration between two successive pitch periods (e.g., T1 and T2 in FIG. 5) and compares the result to a reference value ΔTREF. The pitch redundancy detector 17 then calculates a difference in the number of cycles (e.g., N1 and N2 in FIG. 5) between the two successive pitch periods, and compares the result to another reference value ΔNREF. According to a preferred embodiment, if the two conditions |T2−T1| ≦ ΔTREF and |N2−N1| ≦ ΔNREF are fulfilled, the two corresponding pitch periods are considered to be identical. The chance of identifying two identical pitch periods in a quasi-sound interval, such as the one shown in FIG. 3, is relatively low. However, the chance of identifying two identical pitch periods in a sound interval, such as the one shown in FIG. 3, is higher. When the audio speed converter 10 of FIG. 1 is in the fast mode of operation, the second of two identical periods is removed from an audio signal. By doing this, the signal redundancy decreases and audio speed increases. Conversely, when the audio speed converter 10 of FIG. 1 is in the slow mode of operation, the second of two identical periods is repeated in an audio signal. By doing this, the signal redundancy increases and audio speed decreases.
- Referring to
FIG. 6, a series of waveforms illustrating examples of audio signal compression and expansion according to principles of the present invention is shown. In FIG. 6, a waveform 70 illustrates a situation where no signal compression or expansion is performed. Accordingly, all four pitch periods having durations T1 through T4, respectively, are included in an audio signal. A waveform 71 illustrates a situation where signal compression is performed. In particular, only the pitch periods having durations T1 and T3 are included in an audio signal, thereby decreasing signal redundancy. The waveform 71 may result when the audio speed converter 10 of FIG. 1 is in the fast mode of operation. A waveform 72 illustrates a situation where signal expansion is performed. In particular, the pitch period having duration T2 is repeated in an audio signal, thereby increasing signal redundancy. The waveform 72 may result when the audio speed converter 10 of FIG. 1 is in the slow mode of operation. When the audio speed converter 10 is in the standby mode of operation, an input audio signal is simply looped through the audio speed converter 10 without any speed variation. When the audio speed converter 10 is in the fast or slow modes of operation, the number of deleted or repeated cycles is controlled by the control circuit 18. Therefore, the control circuit 18 can calculate the audio speed at any given moment and provide the result to other devices, such as the internal buffer memory 19, the external storage device 20 and/or the external device 21.
- Certain other attributes of the present invention have been identified. For example, when the
audio speed converter 10 is in the fast mode of operation, best results are obtained at speeds of up to twice the original speed. If the speed is higher, sounds such as speech become less understandable to a listener. Nevertheless, higher speeds may be used in applications such as a fast forward function of a video tape recorder (VTR) where a complete comprehension of the audio information is not required. In such cases, it may be necessary to increase the values of the reference parameters TTH, TSIL-REF, PSIL, ΔTREF and ΔNREF. When the audio speed converter 10 is in the slow mode of operation, best results are obtained at a speed that is not lower than half the original speed. While the present invention is particularly suitable for processing voice signals, the principles of the present invention may also be applied to the processing of audio signals in general, including audio signals such as music containing data other than and/or in addition to voice data.
- As described above, the present invention provides several advantages over conventional audio speed conversion devices. Exemplary features of the present invention are as follows:
- Deletion or insertion of parts of an audio signal always occurs at zero crossing points, thereby eliminating audible clicks.
- Simple and fast signal processing is enabled since no multiplication is required at the deletion or insertion points.
- An input voice signal is divided into variable-length cycles/frames, wherein each cycle/frame is equal to a variable number of signal samples depending on the frequency of the input audio signal.
- Elimination (i.e., removal) or insertion (i.e., repetition) of parts of an audio signal only takes place if two successive periods are found to be identical.
- Only part of a silence interval is deleted. The expansion of a silence interval decreases as its duration increases, so shorter silence intervals are expanded more than longer ones.
- No time or speed limit for the signal processing is imposed. This results in good quality audio reproduction. Conventional audio speed converters often eliminate or repeat a section of an audio signal depending on the overflow or underflow of a buffer memory. Also, they often have time and speed limits, which have to be fulfilled. This often results in losing complete sections of an audio signal.
- The resulting output signal, independent of the momentary speed, contains only parts of the original audio signal. No synthetically produced parts are included.
- The resulting audio speed is not constant. The rate of speed change depends on the parameters TTH, TSIL-REF, PSIL, ΔTREF, ΔNREF and the input signal. In the fast mode, an input signal that contains more silence intervals and more identical intervals will result in a faster output signal than an input signal having the same duration but opposite features. In the slow mode, the audio speed converter proceeds in a way that short silence intervals are expanded more than long silence intervals.
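The silence-interval rules and the pitch redundancy test summarized in the features above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the names `silence_change`, `periods_identical` and `convert` are hypothetical, durations are counted in samples, each pitch period is modeled as a list of unit cycles, and successive periods are assumed to be examined in non-overlapping pairs (the description leaves the pairing, and the number of cycles deleted or repeated, to the control circuit).

```python
from typing import List

Cycle = List[int]
Period = List[Cycle]  # a pitch period is a list of unit cycles


def silence_change(t_sil: int, mode: str, t_th: int, t_sil_ref: int) -> int:
    """Signed change in the duration of a silence interval.

    Fast mode removes TSIL - TTH once TSIL exceeds TTH; slow mode adds
    TSIL-REF - TSIL unless that difference is negative; standby changes nothing.
    """
    if mode == "fast":
        return -max(t_sil - t_th, 0)
    if mode == "slow":
        return max(t_sil_ref - t_sil, 0)
    return 0


def duration(period: Period) -> int:
    """Duration T of a pitch period, counted in samples (an assumption)."""
    return sum(len(cycle) for cycle in period)


def periods_identical(a: Period, b: Period, dt_ref: int, dn_ref: int) -> bool:
    """|T2 - T1| <= dT_REF and |N2 - N1| <= dN_REF, with N the cycle count."""
    return (abs(duration(b) - duration(a)) <= dt_ref
            and abs(len(b) - len(a)) <= dn_ref)


def convert(periods: List[Period], mode: str, dt_ref: int, dn_ref: int) -> List[Period]:
    """Drop (fast) or repeat (slow) the second period of each identical pair."""
    out: List[Period] = []
    i = 0
    while i < len(periods):
        out.append(periods[i])
        has_next = i + 1 < len(periods)
        if has_next and mode in ("fast", "slow") and periods_identical(
                periods[i], periods[i + 1], dt_ref, dn_ref):
            if mode == "slow":
                out.extend([periods[i + 1], periods[i + 1]])  # keep and repeat
            i += 2  # in fast mode the second period is simply skipped
            continue
        i += 1
    return out
```

Under these assumptions, four mutually identical pitch periods processed in the fast mode leave the first and third periods in the output, which matches the compression example of waveform 71 in FIG. 6. Since only whole periods made of whole cycles are dropped or repeated, every splice still falls at a zero crossing.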
- While this invention has been described as having a preferred design, the present invention can be further modified within the spirit and scope of this disclosure. This application is therefore intended to cover any variations, uses, or adaptations of the invention using its general principles. Further, this application is intended to cover such departures from the present disclosure as come within known or customary practice in the art to which this invention pertains and which fall within the limits of the appended claims.
Claims (24)
1. A system for processing an audio signal, comprising:
means for receiving said audio signal and dividing said received audio signal into one or more individual unit cycles;
means for enabling an audio speed conversion operation by one of repeating and removing one or more of said individual unit cycles;
means for detecting one or more pitch periods in said received audio signal, wherein each of said one or more pitch periods includes one or more of said individual unit cycles;
means for generating an average power value for each of said one or more individual unit cycles; and
wherein said detecting means detects said one or more pitch periods in said received audio signal in dependence upon said average power value for each of said one or more individual unit cycles.
2. The system of claim 1, wherein said receiving means divides said received audio signal into said one or more individual unit cycles in dependence upon a reference value such that an individual unit cycle starts at a first sample of said received audio signal that is equal to or greater than said reference value and ends at a last sample of said received audio signal that is less than said reference value.
3. The system of claim 1, wherein repeating one or more of said individual unit cycles decreases audio speed.
4. The system of claim 1, wherein removing one or more of said individual unit cycles increases audio speed.
5. The system of claim 1, wherein said received audio signal is a digital voice signal.
6. The system of claim 1, further comprising means for determining whether each of said one or more individual unit cycles corresponds to a silence interval in dependence upon said average power value for each of said one or more individual unit cycles.
7. The system of claim 1, wherein said generating means generates said average power value for each of said one or more individual unit cycles in dependence upon an average amplitude value for each of said one or more individual unit cycles.
8. An audio speed conversion system, comprising:
a signal detector for receiving an audio signal and dividing said received audio signal into one or more individual unit cycles;
circuitry for enabling an audio speed conversion operation by one of repeating and removing one or more of said individual unit cycles;
a pitch period detector for detecting one or more pitch periods in said received audio signal, wherein each of said one or more pitch periods includes one or more of said individual unit cycles;
an average power value generator for generating an average power value for each of said one or more individual unit cycles; and
wherein said pitch period detector detects said one or more pitch periods in said received audio signal in dependence upon said average power value for each of said one or more individual unit cycles.
9. The audio speed conversion system of claim 8, wherein said signal detector divides said received audio signal into said one or more individual unit cycles in dependence upon a reference value such that an individual unit cycle starts at a first sample of said received audio signal that is equal to or greater than said reference value and ends at a last sample of said received audio signal that is less than said reference value.
10. The audio speed conversion system of claim 8, wherein repeating one or more of said individual unit cycles decreases audio speed.
11. The audio speed conversion system of claim 8, wherein removing one or more of said individual unit cycles increases audio speed.
12. The audio speed conversion system of claim 8, wherein said received audio signal is a digital voice signal.
13. The audio speed conversion system of claim 8, further comprising a silence detector for determining whether each of said one or more individual unit cycles corresponds to a silence interval in dependence upon said average power value for each of said one or more individual unit cycles.
14. The audio speed conversion system of claim 8, wherein said average power value generator generates said average power value for each of said one or more individual unit cycles in dependence upon an average amplitude value for each of said one or more individual unit cycles.
15. The audio speed conversion system of claim 8, wherein said average power value generator generates said average power value for each of said one or more individual unit cycles in dependence upon an average amplitude value for each of said one or more individual unit cycles.
16. A method for processing an audio signal, comprising steps of:
receiving said audio signal;
dividing said received audio signal into one or more individual unit cycles;
enabling an audio speed conversion operation by one of repeating and removing one or more of said individual unit cycles;
detecting one or more pitch periods in said received audio signal, wherein each of said one or more pitch periods includes one or more of said individual unit cycles; and
wherein said step of detecting one or more pitch periods in said received audio signal is performed in dependence upon an average power value for each of said one or more individual unit cycles.
17. The method of claim 16, wherein said received audio signal is divided into said one or more individual unit cycles in dependence upon a reference value such that an individual unit cycle starts at a first sample of said received audio signal that is equal to or greater than said reference value and ends at a last sample of said received audio signal that is less than said reference value.
18. The method of claim 16, wherein repeating one or more of said individual unit cycles decreases audio speed.
19. The method of claim 16, wherein removing one or more of said individual unit cycles increases audio speed.
20. The method of claim 16, wherein said received audio signal is a digital voice signal.
21. The method of claim 16, further comprising a step of determining whether each of said one or more individual unit cycles corresponds to a silence interval.
22. The method of claim 21, wherein the step of determining whether each of said one or more individual unit cycles corresponds to a silence interval is performed in dependence upon an average power value for each of said one or more individual unit cycles.
23. The method of claim 22, wherein said average power value for each of said one or more individual unit cycles is determined in dependence upon an average amplitude value for each of said one or more individual unit cycles.
24. The method of claim 16, wherein said average power value for each of said one or more individual unit cycles is determined in dependence upon an average amplitude value for each of said one or more individual unit cycles.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/079,889 US20080262856A1 (en) | 2000-08-09 | 2008-03-28 | Method and system for enabling audio speed conversion |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US22411500P | 2000-08-09 | 2000-08-09 | |
US10/343,615 US7363232B2 (en) | 2000-08-09 | 2001-06-29 | Method and system for enabling audio speed conversion |
PCT/IB2001/001161 WO2002013185A1 (en) | 2000-08-09 | 2001-06-29 | Method and system for enabling audio speed conversion |
US12/079,889 US20080262856A1 (en) | 2000-08-09 | 2008-03-28 | Method and system for enabling audio speed conversion |
Related Parent Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/IB2001/001161 Continuation WO2002013185A1 (en) | 2000-08-09 | 2001-06-29 | Method and system for enabling audio speed conversion |
US10/343,615 Continuation US7363232B2 (en) | 2000-08-09 | 2001-06-29 | Method and system for enabling audio speed conversion |
Publications (1)
Publication Number | Publication Date |
---|---|
US20080262856A1 (en) | 2008-10-23 |
Family
ID=22839331
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/343,615 Expired - Lifetime US7363232B2 (en) | 2000-08-09 | 2001-06-29 | Method and system for enabling audio speed conversion |
US12/079,889 Abandoned US20080262856A1 (en) | 2000-08-09 | 2008-03-28 | Method and system for enabling audio speed conversion |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/343,615 Expired - Lifetime US7363232B2 (en) | 2000-08-09 | 2001-06-29 | Method and system for enabling audio speed conversion |
Country Status (9)
Country | Link |
---|---|
US (2) | US7363232B2 (en) |
EP (1) | EP1309965B1 (en) |
JP (1) | JP5367932B2 (en) |
KR (1) | KR100806155B1 (en) |
CN (1) | CN1211781C (en) |
AU (1) | AU2001267764A1 (en) |
DE (1) | DE60143662D1 (en) |
MX (1) | MXPA03001198A (en) |
WO (1) | WO2002013185A1 (en) |
2001
- 2001-06-29: JP application JP2002518457A, published as JP5367932B2 (not active; Expired - Fee Related)
- 2001-06-29: EP application EP01945551A, published as EP1309965B1 (not active; Expired - Lifetime)
- 2001-06-29: CN application CNB018139205A, published as CN1211781C (not active; Expired - Fee Related)
- 2001-06-29: WO application PCT/IB2001/001161, published as WO2002013185A1 (active; Application Filing)
- 2001-06-29: KR application KR1020037001765A, published as KR100806155B1 (not active; IP Right Cessation)
- 2001-06-29: US application US10/343,615, published as US7363232B2 (not active; Expired - Lifetime)
- 2001-06-29: MX application MXPA03001198A, published as MXPA03001198A (active; IP Right Grant)
- 2001-06-29: AU application AU2001267764A, published as AU2001267764A1 (not active; Abandoned)
- 2001-06-29: DE application DE60143662T, published as DE60143662D1 (not active; Expired - Lifetime)
2008
- 2008-03-28: US application US12/079,889, published as US20080262856A1 (not active; Abandoned)
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4631746A (en) * | 1983-02-14 | 1986-12-23 | Wang Laboratories, Inc. | Compression and expansion of digitized voice signals |
US5717818A (en) * | 1992-08-18 | 1998-02-10 | Hitachi, Ltd. | Audio signal storing apparatus having a function for converting speech speed |
US5517595A (en) * | 1994-02-08 | 1996-05-14 | At&T Corp. | Decomposition in noise and periodic signal waveforms in waveform interpolation |
US5583652A (en) * | 1994-04-28 | 1996-12-10 | International Business Machines Corporation | Synchronized, variable-speed playback of digitally recorded audio and video |
US6085157A (en) * | 1996-01-19 | 2000-07-04 | Matsushita Electric Industrial Co., Ltd. | Reproducing velocity converting apparatus with different speech velocity between voiced sound and unvoiced sound |
US5995925A (en) * | 1996-09-17 | 1999-11-30 | Nec Corporation | Voice speed converter |
US6205420B1 (en) * | 1997-03-14 | 2001-03-20 | Nippon Hoso Kyokai | Method and device for instantly changing the speed of a speech |
US6236970B1 (en) * | 1997-04-30 | 2001-05-22 | Nippon Hoso Kyokai | Adaptive speech rate conversion without extension of input data duration, using speech interval detection |
US6735738B1 (en) * | 1998-11-04 | 2004-05-11 | Fujitsu Limited | Method and device for reconstructing acoustic data and animation data in synchronization |
US7058569B2 (en) * | 2000-09-15 | 2006-06-06 | Nuance Communications, Inc. | Fast waveform synchronization for concentration and time-scale modification of speech |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070201498A1 (en) * | 2006-02-27 | 2007-08-30 | Masakiyo Tanaka | Fluctuation absorbing buffer apparatus and packet voice communication apparatus |
US20090190614A1 (en) * | 2008-01-24 | 2009-07-30 | Broadcom Corporation | Jitter buffer adaptation based on audio content |
US7852882B2 (en) * | 2008-01-24 | 2010-12-14 | Broadcom Corporation | Jitter buffer adaptation based on audio content |
US20110051957A1 (en) * | 2008-01-24 | 2011-03-03 | Broadcom Corporation | Jitter buffer adaptation based on audio content |
US8576884B2 (en) * | 2008-01-24 | 2013-11-05 | Broadcom Corporation | Jitter buffer adaptation based on audio content |
US20110046967A1 (en) * | 2009-08-21 | 2011-02-24 | Casio Computer Co., Ltd. | Data converting apparatus and data converting method |
US8484018B2 (en) * | 2009-08-21 | 2013-07-09 | Casio Computer Co., Ltd | Data converting apparatus and method that divides input data into plural frames and partially overlaps the divided frames to produce output data |
Also Published As
Publication number | Publication date |
---|---|
JP2004506243A (en) | 2004-02-26 |
JP5367932B2 (en) | 2013-12-11 |
AU2001267764A1 (en) | 2002-02-18 |
US7363232B2 (en) | 2008-04-22 |
EP1309965B1 (en) | 2010-12-15 |
CN1211781C (en) | 2005-07-20 |
KR100806155B1 (en) | 2008-02-22 |
MXPA03001198A (en) | 2003-06-30 |
US20040015345A1 (en) | 2004-01-22 |
KR20030018072A (en) | 2003-03-04 |
EP1309965A1 (en) | 2003-05-14 |
WO2002013185A1 (en) | 2002-02-14 |
CN1446349A (en) | 2003-10-01 |
DE60143662D1 (en) | 2011-01-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20080262856A1 (en) | | Method and system for enabling audio speed conversion |
US5611018A (en) | | System for controlling voice speed of an input signal |
EP0910065B1 (en) | | Speaking speed changing method and device |
JP4785328B2 (en) | | System and method enabling audio speed conversion |
JP3378672B2 (en) | | Speech speed converter |
JP3162945B2 (en) | | Video tape recorder |
GB2454470A (en) | | Controlling an audio signal by analysing samples between zero crossings of the signal |
US20070192089A1 (en) | | Apparatus and method for reproducing audio data |
JP3357742B2 (en) | | Speech speed converter |
JPH09147472A (en) | | Video and audio reproducing device |
JP3373933B2 (en) | | Speech speed converter |
JP3081469B2 (en) | | Speech speed converter |
JP2002258900A (en) | | Device and method for reproducing voice |
JPH09152889A (en) | | Speech speed transformer |
JPH09146587A (en) | | Speech speed changer |
JPH05303400A (en) | | Method and device for audio reproduction |
JPS6253093B2 (en) | | |
KR100194659B1 (en) | | Voice recording method of digital recorder |
JPH08292796A (en) | | Reproducing device |
JP3360370B2 (en) | | Waveform detector |
JP2877613B2 (en) | | Audio data recording device |
KR950004158A (en) | | Audio signal recording / playback method and apparatus |
JPS5821799A (en) | | Voice reproducer |
JPH08292789A (en) | | Speech speed changing device |
JPH04283471A (en) | | Silent processor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
| | AS | Assignment | Owner name: MAGNOLIA LICENSING LLC, TEXAS; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: THOMSON LICENSING S.A.S.; REEL/FRAME: 053570/0237; Effective date: 20200708 |