WO1994016443A1 - Display system facilitating computer assisted audio editing - Google Patents

Display system facilitating computer assisted audio editing

Info

Publication number
WO1994016443A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
waveform
audio data
sound
track
Prior art date
Application number
PCT/US1993/012677
Other languages
French (fr)
Inventor
Mark J. Norton
Original Assignee
Avid Technology, Inc.
Priority date
Filing date
Publication date
Application filed by Avid Technology, Inc.
Priority to GB9513392A (patent GB2289558B)
Publication of WO1994016443A1

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/34 Indicating arrangements
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B27/034 Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs

Definitions

  • the present invention is related to computerized multimedia editing systems. More particularly, the invention is related to display systems which facilitate audio editing in computerized multimedia editing systems.
  • a common problem in the production of a multimedia program is dialog and other sound editing. Audio tracks often are searched for desired words, sentences, or other sound effects (often called "clips"), and appropriate mark-in and mark-out points are selected. These "clips" must then be synchronized with video or other media with which they are associated in a multimedia program.
  • clips desired words, sentences, or other sound effects
  • an editor linearly searches (i.e., jogs) through the source tape until a word break is detected. This process is slow even for an experienced editor.
  • DSE-7000 from AKG Acoustics, Inc. of San Leandro, California
  • DDR-10 from Otari Corp. of Tokyo, Japan
  • Audio File Plus from AMC Industries, PLC of Bernley, Great Britain
  • Dyaxis from Studer Editech of Menlo Park, California
  • Waveframe 401 from Waveframe, Inc., of Sherman Oaks, California.
  • Video F/X Plus from Digital F/X of Mountain View, California; Studio from Matrox of Dorval, Quebec, Canada; Premier 2.0 from Adobe Systems, Inc., of Mountain View, California; EMC2 from Edit Machines Corporation of Washington, D.C.; Lightworks from OLE Partners, LTD., of London, England; and Picture Processor System III from Montage Group, Ltd., of New York, New York.
  • An audio waveform can be an amplitude or an energy (absolute value of the amplitude) plot. Unfortunately, a fair amount of experience is still needed to interpret these waveforms in order to take full advantage of their utility.
  • a display system which represents an audio track as a discrete waveform (i.e. each sample in the waveform may take on one of a discrete range of possible values) indicating the presence of sound energy above user-set thresholds.
  • the audio data is smoothed and applied to one or more thresholding functions.
  • the resulting discrete output is displayed, giving the editor an indication of where sufficient sound levels occur on the audio track.
  • Frequency analysis is also provided in one embodiment to allow detection of specific signals, rhythms and the like.
  • FIG. 1 is a block diagram of a computer system suitable for implementing a display system in accordance with the present invention
  • FIG. 2 illustrates sample graphics suitable for a display system in accordance with an embodiment of the present invention
  • FIG. 3 is a sample energy plot of the prior art compared to a binary waveform as may be displayed in accordance with the present invention
  • FIG. 4 is a graph illustrating how a thresholding function may be applied over a window of audio samples
  • FIG. 5 is a graph illustrating a typical result from sorting audio samples with respect to amplitude
  • FIG. 6 is a flow chart describing how the display of FIG. 2 can be generated in accordance with the present invention.
  • FIG. 7 is a sample energy plot of the prior art compared to a discrete waveform as may be displayed in accordance with the present invention.
  • FIG. 8 shows a sample waveform used to identify the location of music tracks on a compact disk
  • FIG. 9 shows a sample waveform used to identify sound effects on a Foley track
  • FIG. 10 shows a sample waveform used for frequency analysis
  • FIG. 11 shows a sample waveform used to identify the location of electronic slate beeps on a track.
  • Fig. 1 illustrates a suitable data processing system 10 with which the present invention may be implemented.
  • the data processing system 10 is a typical programmable digital computer, such as the Macintosh family of computers available from Apple Computer of Cupertino, California (preferably model Quadra 950), or a workstation available from Silicon Graphics, Inc., of Mountain View, California (preferably the Indigo model computer). It should be understood that many other data processing systems could be used to implement the present invention and that those specified are intended to be merely exemplary and illustrative.
  • Such a data processing system may be programmed using typical computer programming languages, such as C++ (on the Indigo) or ThinkC 5.0 (on the Macintosh), which may then be compiled into object code, readable by the data processing system 10, using a suitable compiler, as those familiar with this art would understand.
  • a suitable data processing system 10 includes a main unit 12 which includes a central processing unit (CPU) 14 which controls the operation of the computer and performs arithmetic and logical operations.
  • the data processing system also includes a random access memory 16, (in which the data is volatile) connected to the CPU via a bus 18.
  • the bus 18 also connects the CPU 14 to a display 20, such as a cathode ray tube (CRT) display or a liquid crystal display (LCD).
  • the data processing system 10 also includes a nonvolatile memory 22, such as a hard disk, or floppy disk drive. This disk drive is also connected to the CPU via bus 18.
  • An input device 24 such as a keyboard, mouse, track ball, graphic tablet or other mechanical user interface, enables a user of the system to input information into the computer.
  • the input device is connected to the CPU and memory via bus 18.
  • the data processing system 10 also has an input port 26 which enables various multimedia data to be input directly into the computer system.
  • Such an input usually includes, or may be connected to, an analog-to-digital converter (not shown) and other hardware subsystems which enable data such as video data and audio data to be directly input to the computer, sampled, and stored in memory 16 or disk 22.
  • the data processing system 10 may be programmed, for example, by using the computer languages described above, along with other computer languages, to enable audio editing. Many such systems are currently available.
  • the present invention provides a display system which facilitates such audio editing by representing the audio data to be edited as a binary waveform having user-selectable parameters.
  • Audio data is represented as a strip, or a track, 30.
  • a plurality of tracks can be displayed as desired, along with video or other multimedia data (indicated at 32).
  • Buttons 34 enable a user to select a track 30 for editing. The provision of such buttons is a familiar display and user interface technique in the art. A user may select such buttons and perform other editing functions using an input device which controls a cursor 36 on the display.
  • Such cursor control devices include the track ball, mouse, or graphics tablet as described above.
  • Each strip or track 32 represents a selected duration of time from the audio track.
  • the amount of time is typically user-selectable, and is selected on the basis of the resolution to which accuracy in editing is desired by the editor using the system.
  • five seconds of samples from an audio track are displayed.
  • the audio data for a track is mapped to the display space provided. That is, if the number of samples for the selected time period is greater than the number of pixels available to display the track, the data samples are averaged so as to provide one data sample per pixel in the strip 32.
  • these averaged samples were typically displayed to provide an audio waveform display, having amplitudes scaled to fit the vertical limits of the corresponding display strip 32.
  • a sample corresponding waveform is shown in Fig.
  • Fig. 4 is a graph representing five samples $S_{-2}$ through $S_{2}$, each having a corresponding amplitude or energy, taken in a given period of time. For each sample $S_n$ in the audio data displayed, the absolute values or squared values of the surrounding data samples $S_{-N}$ through $S_{N}$ are summed. This sum, or an average or root-average based on this sum, is then compared to a threshold which is user-selectable. Preferably, the root-mean-square of the audio data is used for increased accuracy.
  • the threshold is taken from a range of values corresponding to the range of possible values subjected to the threshold operation.
  • the amplitude of the audio signal is represented by a signed 16-bit number.
  • the possible threshold value therefore ranges between 0 and 32,000. It was experimentally determined that a suitable default threshold is 10,500. This default can then be adjusted by the user for a given audio track so that the resulting output corresponds to sounds heard while listening to the audio track.
  • a threshold may be provided globally, for all tracks, and clips within tracks, displayed. However, the same threshold may not always be applicable to all clips within a track because different audio sources with different levels are often edited together.
  • a threshold can be made an attribute of a track, or of a clip within a track, to provide more flexibility. In a computer system where tracks and clips are represented as objects such a modification may be readily made.
  • a threshold can also be associated with a master clip, i.e., source data, having the advantage of storing the threshold with the sound data to which it is applied, allowing for a more accurate determination of an appropriate threshold. Threshold adjustment therefore becomes a function on the media database and is not an attribute of the display.
  • Background noise levels and signal-to-noise ratios can also be computed. Auto-correlation and similar techniques can be used to separate the desired audio signal from unwanted noise. Once the background noise has been characterized and measured, the signal-to-noise ratio (S/N) can be computed. A threshold can be determined from this S/N ratio.
  • a threshold for a track or a clip may also be calculated based on the history of data samples for the track or the clip, allowing adaptation to transient shifts in background noise levels.
  • the sample values can be sorted by amplitude.
  • the resulting function of amplitude to number of samples tends to have two local maxima: one indicating a noise level, the other indicating a desired sound level. See Fig. 5.
  • the local minimum between these two levels may be used as an appropriate threshold.
  • a steady state room tone recording could be used as a source of expected background noise levels to improve the accuracy of such calculations. That is, steady state room tone samples may be added to the sorted audio samples, thus increasing the number of samples at the noise level.
  • the number of samples which are summed per displayed sample i.e., the size of the sample window
  • the number of samples which are summed per displayed sample should be odd, so that the sample window is centered over the current sample $S_0$.
  • a sample window is provided so that sporadic samples do not cause the binary waveform to change states too quickly. Rather, several samples in sequence must be either low or high in order to change the state of the binary waveform thereby providing some state momentum.
  • the number of samples considered may also be made user-selectable, allowing the user to control the level of state momentum in the averaging process.
  • Fig. 6 is a flow chart describing how this display is generated when a user selects a given audio track.
  • the user first selects the audio data to be viewed in step 50.
  • This audio data is usually available on the computer as an array of time-indexed 16-bit floating point words, wherein each word represents an instantaneous measurement of sound energy, sampled at 44.1 KHz.
  • the data may also be 16-bit integers which enable faster computation.
  • Audio data may be received at a number of different sampling rates; the sampling rate of 44.1 KHz is typical for compact disc audio data. Such sound information is typically received through a microphone which provides an analog signal.
  • the analog signal is converted to a digital signal using an analog-to-digital converter, as is well known in the art, which provides a word of digital data at a given sampling rate.
  • This data can be stored in a variety of different media, such as a floppy disk, hard disk or digital audio tape as time-indexed information.
  • the selected audio data is mapped to the display space in step 52, as described above, using procedures which are well known in the art. For example, a number of audio data samples can be averaged to provide a corresponding sample to be displayed for each pixel in the display space. As described in connection with Fig. 4, for each sample to be displayed for a pixel in the display space, the sum of samples within a sample window is calculated and a threshold is applied in step 54 to obtain a binary value. A representation of this binary value is then displayed (step 56).
  • the location of word breaks, or other breaks, in the sound on the audio track can be readily determined simply by viewing the display. For example, as shown in Fig. 3, where the binary waveform is zero, the person speaking the indicated sentence is pausing between words.
  • Using such a display for editing enables an editor to readily mark cut locations (mark-in and mark-out locations) in the audio track. How cuts are marked in such computerized multimedia editing systems involves techniques which are well-known in the art. In video editing systems, although editing granularity is at the video frame level, fairly accurate edits can be made on word break boundaries using this display mechanism.
  • new editing controls may also be provided. Some of these functions include going to a next word, going to the end of a word, selecting certain words, playing a word or a selection of words, or marking the start or end of a word as a cut location in the track. Such a display system may also be used for musical audio data. Given an appropriate threshold level, the binary waveform may be used to isolate volume peaks and crescendos in music. These and similar functions allow an editor to create multimedia programs based on the dialog or musical content of a sound track.
  • Binary waveforms may also be used to identify and locate the presence of a sound effect on a Foley track (for example, see Fig. 9) especially if the track includes large amounts of silence, greatly improving the ability to synchronize the sound effect with visual material. Sound effects may be quickly found, edited, and synchronized with other material. In stereo sound, synchronization may be repaired if lost when one track slips versus another.
  • This type of display may also be used to detect long pauses, which may be then used to identify and separate effects or music tracks captured from prerecorded sources such as records, tapes, compact discs and the like. (For example, see Fig. 8).
  • Such a display system facilitates the development of marketing and advertising multimedia programs by companies who have no personnel with experience in film editing.
  • the invention is not limited to generating merely a binary waveform, nor to amplitude data.
  • Two or more thresholds may also be provided to provide a discrete waveform, having a smaller range than a continuous waveform, but a broader range than a simple binary waveform.
  • Two thresholds provide a hysteresis type state behavior to -l i ⁇
  • discrete color values may be used to identify levels of sound over different thresholds. For example, black may be used to indicate silence, purple for low volume, blue for mid-volume and green for high volume.
  • a 16-bit continuous range of colors can completely represent a range of amplitudes from 0 to 32,000 represented by the sound data.
  • A sample of a display using three thresholds, with threshold level colors identified by patterns, is shown in Fig. 7.
  • a discrete waveform is shown at 60, with a corresponding energy plot at 62.
  • the threshold levels are indicated at 64, 65 and 66.
  • the waveform takes on one value, indicated by black in Fig. 7 at, for example, 68.
  • the waveform takes a second value, indicated by white in Fig. 7 at, for example, 70.
  • the waveform takes on a third value when the sound energy is between the second and third thresholds, as indicated by horizontal lines, for example, at 71. Otherwise the waveform takes on a fourth value indicated by diagonal lines, for example, at 72.
  • Frequency data may also be used for this display system.
  • Such frequency data can be obtained by applying a simple fast Fourier transform (FFT) with a limited frequency band to the audio data.
  • FFT simple fast Fourier transform
  • a threshold can be applied to the amplitudes of the different frequency bands to determine if sounds within the certain frequency band are present. Such information may then also be displayed.
  • a sample display is shown in Fig. 10,
  • certain sounds can be detected, such as electronic slate beeps (signals used to separate one tape or scene from another during video and audio recording sessions) (for example, see Fig. 11). If the frequency of the desired signal is known (in this example the presumption is 1.2 KHz), its occurrence in a track can be identified, such as shown in black in Fig. 11.
  • Frequency data may also be used to identify potential problem areas in a sound track. For example, repetitive background noise events, such as fans, light buzzing, etc. may be detected. Using both frequency and amplitude data, it is possible to avoid the loss of dialogue in noisy environments, when dialogue levels fall below the levels of background noise. It is then possible to differentiate further word breaks in this low level dialogue.
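The frequency-based detection described in the items above (for example, locating a 1.2 KHz slate beep and applying a threshold to its amplitude) can be sketched as follows. This is a minimal illustration, not the method of the invention: the text mentions a band-limited FFT, whereas this sketch swaps in the Goertzel algorithm, which measures signal strength at a single frequency. The function names, the 441-sample block size (10 ms at 44.1 KHz), and the threshold of 5,000 are assumptions made for the example.

```python
import math


def tone_amplitude(block, freq_hz, sample_rate=44100):
    """Estimate the amplitude of one frequency in a block (Goertzel algorithm)."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    coeff = 2.0 * math.cos(w)
    s1 = 0.0  # filter state one step back
    s2 = 0.0  # filter state two steps back
    for x in block:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2
    # For a sinusoid of amplitude A that completes whole cycles in the block,
    # the spectral magnitude is A * len(block) / 2, hence the scaling below.
    return 2.0 * math.sqrt(max(power, 0.0)) / len(block)


def detect_tone(samples, freq_hz=1200, threshold=5000,
                block_size=441, sample_rate=44100):
    """Return one flag per block: 1 where the tone exceeds the threshold.

    threshold and block_size are hypothetical defaults for illustration;
    a trailing partial block is ignored.
    """
    flags = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        block = samples[start:start + block_size]
        amp = tone_amplitude(block, freq_hz, sample_rate)
        flags.append(1 if amp > threshold else 0)
    return flags
```

Each flag could then be drawn in the track strip in the same way as a binary amplitude value, marking the blocks in which the chosen frequency is present.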

Abstract

A display system represents an audio track as a discrete waveform, wherein each sample in the waveform may take on one of a discrete range of possible values. The discrete waveform indicates the presence of sound energy above user-set thresholds. Audio data samples are smoothed and applied to one or more thresholding functions. The resulting discrete output is displayed by using graphics of either different size or different color. An editor using this display system in conjunction with a computerized editing system receives an indication of where sufficient sound levels occur on the audio track. This information may be used to locate breaks in sound, dialog and other sound effects, which simplifies audio editing and synchronization of audio with other media in a multimedia presentation. Similar thresholds may be applied to the results of frequency analysis to provide an indication of which frequencies are present in the audio's signal.

Description

DISPLAY SYSTEM FACILITATING COMPUTER ASSISTED AUDIO EDITING
Field of the Invention
The present invention is related to computerized multimedia editing systems. More particularly, the invention is related to display systems which facilitate audio editing in computerized multimedia editing systems.
Background of the Invention
A common problem in the production of a multimedia program is dialog and other sound editing. Audio tracks often are searched for desired words, sentences, or other sound effects, (often called "clips") and appropriate mark-in and mark-out points are selected. These "clips" must then be synchronized with video or other media with which they are associated in a multimedia program. In conventional, linear editing, relying on analog or digital source tape, an editor linearly searches (i.e., jogs) through the source tape until a word break is detected. This process is slow even for an experienced editor.
In computerized editing systems, such as a digital audio workstation available from Avid Technology, Inc. of Tewksbury, Massachusetts, and a digital video workstation (e.g., Media Composer or Media Suite Pro, also available from Avid Technology, Inc.) the audio editing process has been made somewhat easier by providing a representation of the audio waveform for the audio track being edited.
Other available digital audio workstations include: the DSE-7000, from AKG Acoustics, Inc. of San Leandro, California; the DDR-10 from Otari Corp. of Tokyo, Japan; the Audio File Plus, from AMC Industries, PLC of Bernley, Great Britain; Dyaxis from Studer Editech of Menlo Park, California; and Waveframe 401 from Waveframe, Inc., of Sherman Oaks, California. Other available digital video workstations which allow for audio editing include: Video F/X Plus from Digital F/X of Mountain View, California; Studio from Matrox of Dorval, Quebec, Canada; Premier 2.0 from Adobe Systems, Inc., of Mountain View, California; EMC2 from Edit Machines Corporation of Washington, D.C.; Lightworks from OLE Partners, LTD., of London, England; and Picture Processor System III from Montage Group, Ltd., of New York, New York.
An audio waveform can be an amplitude or an energy (absolute value of the amplitude) plot. Unfortunately, a fair amount of experience is still needed to interpret these waveforms in order to take full advantage of their utility.
Summary of the Invention
To facilitate editing, a display system was developed which represents an audio track as a discrete waveform (i.e., each sample in the waveform may take on one of a discrete range of possible values) indicating the presence of sound energy above user-set thresholds. The audio data is smoothed and applied to one or more thresholding functions. The resulting discrete output is displayed, giving the editor an indication of where sufficient sound levels occur on the audio track.
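As a rough illustration of applying more than one thresholding function, the sketch below maps already-smoothed energy values to a small set of discrete levels. The function name, the use of Python's `bisect` module, and the particular threshold values in the usage note are assumptions for the example; the color names follow the example palette mentioned elsewhere in this document (black for silence, purple, blue, and green for increasing volume).

```python
from bisect import bisect_right

# Example palette: level 0 is silence, higher levels are louder.
PALETTE = ["black", "purple", "blue", "green"]


def discrete_levels(energies, thresholds):
    """Map each smoothed energy value to a discrete level.

    The level is the number of thresholds that the value meets or exceeds,
    so k thresholds give k + 1 possible levels (one threshold gives a
    binary waveform).
    """
    ordered = sorted(thresholds)
    return [bisect_right(ordered, e) for e in energies]


def level_colors(levels, palette=PALETTE):
    """Translate discrete levels into display colors."""
    return [palette[level] for level in levels]
```

For instance, with the hypothetical thresholds 4,000, 10,500 and 20,000, an energy of 12,000 falls in level 2 and would be drawn in blue.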
Frequency analysis is also provided in one embodiment to allow detection of specific signals, rhythms and the like.
Brief Description of the Drawing
In the drawing,
FIG. 1 is a block diagram of a computer system suitable for implementing a display system in accordance with the present invention;
FIG. 2 illustrates sample graphics suitable for a display system in accordance with an embodiment of the present invention;
FIG. 3 is a sample energy plot of the prior art compared to a binary waveform as may be displayed in accordance with the present invention;
FIG. 4 is a graph illustrating how a thresholding function may be applied over a window of audio samples;
FIG. 5 is a graph illustrating a typical result from sorting audio samples with respect to amplitude;
FIG. 6 is a flow chart describing how the display of FIG. 2 can be generated in accordance with the present invention;
FIG. 7 is a sample energy plot of the prior art compared to a discrete waveform as may be displayed in accordance with the present invention;
FIG. 8 shows a sample waveform used to identify the location of music tracks on a compact disk;
FIG. 9 shows a sample waveform used to identify sound effects on a Foley track;
FIG. 10 shows a sample waveform used for frequency analysis; and
FIG. 11 shows a sample waveform used to identify the location of electronic slate beeps on a track.
Detailed Description
The invention will be more completely understood through a reading of the detailed description which follows, when taken in conjunction with the attached drawing.
Fig. 1 illustrates a suitable data processing system 10 with which the present invention may be implemented. The data processing system 10 is a typical programmable digital computer, such as the Macintosh family of computers available from Apple Computer of Cupertino, California (preferably model Quadra 950), or a workstation available from Silicon Graphics, Inc., of Mountain View, California (preferably the Indigo model computer). It should be understood that many other data processing systems could be used to implement the present invention and that those specified are intended to be merely exemplary and illustrative. Such a data processing system may be programmed using typical computer programming languages, such as C++ (on the Indigo) or ThinkC 5.0 (on the Macintosh), which may then be compiled into object code, readable by the data processing system 10, using a suitable compiler, as those familiar with this art would understand.
A suitable data processing system 10 includes a main unit 12 which includes a central processing unit (CPU) 14 which controls the operation of the computer and performs arithmetic and logical operations. The data processing system also includes a random access memory 16 (in which the data is volatile) connected to the CPU via a bus 18. The bus 18 also connects the CPU 14 to a display 20, such as a cathode ray tube (CRT) display or a liquid crystal display (LCD). The data processing system 10 also includes a nonvolatile memory 22, such as a hard disk or floppy disk drive. This disk drive is also connected to the CPU via bus 18. An input device 24, such as a keyboard, mouse, track ball, graphic tablet or other mechanical user interface, enables a user of the system to input information into the computer. The input device is connected to the CPU and memory via bus 18. The data processing system 10 also has an input port 26 which enables various multimedia data to be input directly into the computer system. Such an input usually includes, or may be connected to, an analog-to-digital converter (not shown) and other hardware subsystems which enable data such as video data and audio data to be directly input to the computer, sampled, and stored in memory 16 or disk 22.
The data processing system 10 may be programmed, for example, by using the computer languages described above, along with other computer languages, to enable audio editing. Many such systems are currently available. The present invention provides a display system which facilitates such audio editing by representing the audio data to be edited as a binary waveform having user-selectable parameters.
Referring now to Fig. 2, suitable graphics for such a display system are shown. Audio data is represented as a strip, or a track, 30. A plurality of tracks can be displayed as desired, along with video or other multimedia data (indicated at 32). Buttons 34 enable a user to select a track 30 for editing. The provision of such buttons is a familiar display and user interface technique in the art. A user may select such buttons and perform other editing functions using an input device which controls a cursor 36 on the display. Such cursor control devices include the track ball, mouse, or graphics tablet as described above.
Each strip or track 32 represents a selected duration of time from the audio track. The amount of time is typically user-selectable, and is selected on the basis of the editing accuracy desired by the editor using the system. In the example shown in Fig. 2, five seconds of samples from an audio track are displayed. The audio data for a track is mapped to the display space provided. That is, if the number of samples for the selected time period is greater than the number of pixels available to display the track, the data samples are averaged so as to provide one data sample per pixel in the strip 32. In previous computerized audio editing systems, these averaged samples were typically displayed to provide an audio waveform display, having amplitudes scaled to fit the vertical limits of the corresponding display strip 32. A sample corresponding waveform is shown in Fig. 3 at 40. In the present invention, such a waveform is converted into a binary waveform, such as shown at 42 in Fig. 3.
How the binary values are generated will now be described in connection with the graph of Fig. 4. Fig. 4 is a graph representing five samples $S_{-2}$ through $S_{2}$, each having a corresponding amplitude or energy, taken in a given period of time. For each sample $S_n$ in the audio data displayed, the absolute values or squared values of the surrounding data samples $S_{-N}$ through $S_{N}$ are summed. This sum, or an average or root-average based on this sum, is then compared to a threshold which is user-selectable. Preferably, the root-mean-square of the audio data is used for increased accuracy. The threshold is taken from a range of values corresponding to the range of possible values subjected to the threshold operation. In this embodiment, the amplitude of the audio signal is represented by a signed 16-bit number. The possible threshold value therefore ranges between 0 and 32,000. It was experimentally determined that a suitable default threshold is 10,500. This default can then be adjusted by the user for a given audio track so that the resulting output corresponds to sounds heard while listening to the audio track.
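The two steps described above, mapping samples to pixels and thresholding a windowed root-mean-square, can be sketched in a few lines. This is a minimal sketch under stated assumptions rather than the implementation of the invention: the function names are hypothetical, magnitudes (not signed values) are averaged per pixel so that positive and negative samples do not cancel, and the window is simply truncated at the ends of the data.

```python
import math


def map_to_pixels(samples, num_pixels):
    """Average sample magnitudes so there is at most one value per pixel."""
    if len(samples) <= num_pixels:
        return [float(abs(s)) for s in samples]
    values = []
    for p in range(num_pixels):
        start = p * len(samples) // num_pixels
        end = (p + 1) * len(samples) // num_pixels
        chunk = samples[start:end]
        values.append(sum(abs(s) for s in chunk) / len(chunk))
    return values


def binary_waveform(values, threshold=10500, half_width=2):
    """Threshold the RMS of a centered window of 2 * half_width + 1 values.

    half_width=2 gives the five-sample window of Fig. 4; 10,500 is the
    default threshold given in the text. Returns 1 where the windowed RMS
    exceeds the threshold and 0 elsewhere.
    """
    n = len(values)
    result = []
    for i in range(n):
        lo = max(0, i - half_width)          # truncate window at the start
        hi = min(n, i + half_width + 1)      # truncate window at the end
        window = values[lo:hi]
        rms = math.sqrt(sum(v * v for v in window) / len(window))
        result.append(1 if rms > threshold else 0)
    return result
```

With these defaults, a single isolated value of 20,000 surrounded by silence yields a windowed RMS of about 8,944 and so does not switch the output to 1, while two adjacent values of 20,000 do.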
A threshold may be provided globally, for all tracks, and clips within tracks, displayed. However, the same threshold may not always be applicable to all clips within a track because different audio sources with different levels are often edited together. Alternatively, a threshold can be made an attribute of a track, or of a clip within a track, to provide more flexibility. In a computer system where tracks and clips are represented as objects such a modification may be readily made. A threshold can also be associated with a master clip, i.e., source data, having the advantage of storing the threshold with the sound data to which it is applied, allowing for a more accurate determination of an appropriate threshold. Threshold adjustment therefore becomes a function on the media database and is not an attribute of the display. Background noise levels and signal-to-noise ratios can also be computed. Auto-correlation and similar techniques can be used to separate the desired audio signal from unwanted noise. Once the background noise has been characterized and measured, the signal-to-noise ratio (S/N) can be computed. A threshold can be determined from this S/N ratio.
A threshold for a track or a clip may also be calculated based on the history of data samples for the track or the clip, allowing adaptation to transient shifts in background noise levels. To do this, the sample values can be sorted by amplitude. The resulting function of amplitude to number of samples tends to have two local maxima: one indicating a noise level, the other indicating a desired sound level. See Fig. 5. The local minimum between these two levels may be used as an appropriate threshold. A steady state room tone recording could be used as a source of expected background noise levels to improve the accuracy of such calculations. That is, steady state room tone samples may be added to the sorted audio samples, thus increasing the number of samples at the noise level.
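One way to sketch this calculation is to build a histogram of sample magnitudes, find its two tallest local maxima, and take the emptiest bin between them as the threshold. The function name, the bin count of 32, and the full-scale value of 32,768 are assumptions for illustration; a practical version would probably smooth the histogram first, since this sketch treats any small bump as a local maximum.

```python
def valley_threshold(samples, num_bins=32, max_value=32768):
    """Pick a threshold at the histogram minimum between two peaks.

    Returns the center amplitude of the least-populated bin lying between
    the two tallest local maxima, or None if fewer than two maxima exist.
    num_bins and max_value are illustrative assumptions.
    """
    bin_width = max_value / num_bins
    counts = [0] * num_bins
    for s in samples:
        b = min(int(abs(s) / bin_width), num_bins - 1)
        counts[b] += 1

    # A bin is a local maximum if it is non-empty, strictly higher than its
    # left neighbor and at least as high as its right neighbor.
    peaks = []
    for i in range(num_bins):
        left = counts[i - 1] if i > 0 else -1
        right = counts[i + 1] if i < num_bins - 1 else -1
        if counts[i] > 0 and counts[i] > left and counts[i] >= right:
            peaks.append(i)
    if len(peaks) < 2:
        return None

    tallest = sorted(peaks, key=lambda i: counts[i], reverse=True)[:2]
    lo, hi = sorted(tallest)
    # If several bins tie for the minimum, the lowest-amplitude one is used.
    valley = min(range(lo, hi + 1), key=lambda i: counts[i])
    return (valley + 0.5) * bin_width
```

Room tone samples, where available, could simply be concatenated onto `samples` before the call, which raises the noise peak as the text suggests.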
The example shown in Fig. 4 is a five sample window (N=2) into the sound track. The number of samples which are summed per displayed sample (i.e., the size of the sample window) should be odd, so that the sample window is centered over the current sample S_n. A sample window is provided so that sporadic samples do not cause the binary waveform to change states too quickly. Rather, several samples in sequence must be either low or high in order to change the state of the binary waveform, thereby providing some state momentum. The number of samples considered may also be made user-selectable, allowing the user to control the level of state momentum in the averaging process. There is a slight delay introduced by this method which causes the binary waveform to change states a number of time units after the actual energy waveform changes. In general, this delay is not a problem, since there are tens of thousands of samples per second and the delay is correspondingly negligible. It has been found that a sample window of five samples provides a suitable smoothing filter for this purpose. A three sample window was found to be insufficient, particularly at higher time resolutions. The sample window should not be made too wide as it would tend to distort the waveform of the audio data.
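As a rough sketch of the windowed smoothing and thresholding just described, the following assumes that the window mean (rather than the raw sum) is compared against the threshold and that the window is simply truncated at the ends of the data. Both choices, and the name `binary_waveform`, are assumptions made for illustration.

```python
def binary_waveform(energy, threshold, n=2):
    """Return one 0/1 value per sample.

    A sample maps to 1 when the mean of the (2n+1)-sample window centered
    on it exceeds the threshold. n=2 gives the five sample window of Fig. 4.
    Assumption: near the ends, only the samples that exist are averaged.
    """
    out = []
    for i in range(len(energy)):
        window = energy[max(0, i - n): i + n + 1]
        out.append(1 if sum(window) / len(window) > threshold else 0)
    return out
```

With this sketch an isolated spike does not flip the state, while a sustained run of high samples does, which is the state momentum the text refers to.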
Fig. 6 is a flow chart describing how this display is generated when a user selects a given audio track. The user first selects the audio data to be viewed in step 50. This audio data is usually available on the computer as an array of time-indexed 16-bit floating point words, wherein each word represents an instantaneous measurement of sound energy, sampled at 44.1 KHz. The data may also be 16-bit integers which enable faster computation. Audio data may be received at a number of different sampling rates; the sampling rate of 44.1 KHz is typical for compact disc audio data. Such sound information is typically received through a microphone which provides an analog signal. The analog signal is converted to a digital signal using an analog-to-digital converter, as is well known in the art, which provides a word of digital data at a given sampling rate. This data can be stored in a variety of different media, such as a floppy disk, hard disk or digital audio tape as time-indexed information.
The selected audio data is mapped to the display space in step 52, as described above, using procedures which are well known in the art. For example, a number of audio data samples can be averaged to provide a corresponding sample to be displayed for each pixel in the display space. As described in connection with Fig. 4, for each sample to be displayed for a pixel in the display space, the sum of samples within a sample window is calculated and a threshold is applied in step 54 to obtain a binary value. A representation of this binary value is then displayed (step 56).
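The mapping of step 52 might look something like the sketch below, which averages sample magnitudes over roughly equal buckets, one bucket per pixel. The bucket boundaries, the use of absolute values, and the name `map_to_pixels` are assumptions; the specification only says that a number of samples can be averaged per pixel.

```python
def map_to_pixels(samples, num_pixels):
    """Reduce audio samples to one averaged magnitude per display pixel.

    Assumption: samples are split into num_pixels contiguous buckets of
    nearly equal size, and each bucket holds at least one sample when
    the input is non-empty.
    """
    if num_pixels <= 0:
        raise ValueError("num_pixels must be positive")
    n = len(samples)
    out = []
    for p in range(num_pixels):
        start = p * n // num_pixels
        end = max(start + 1, (p + 1) * n // num_pixels)
        bucket = samples[start:end]
        out.append(sum(abs(s) for s in bucket) / len(bucket) if bucket else 0.0)
    return out
```

The resulting per-pixel values would then be smoothed and thresholded (step 54) before display (step 56).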
When the data is filtered, and the resulting binary waveform is displayed, the location of word breaks, or other breaks, in the sound on the audio track can be readily determined simply by viewing the display. For example, as shown in Fig. 3, where the binary waveform is zero, the person speaking the indicated sentence is pausing between words. Using such a display for editing enables an editor to readily mark cut locations (mark-in and mark-out locations) in the audio track. How cuts are marked in such computerized multimedia editing systems involves techniques which are well-known in the art. In video editing systems, although editing granularity is at the video frame level, fairly accurate edits can be made on word break boundaries using this display mechanism.
Because the number of data samples displayed depends on the size of the window and the time resolution selected by the user, the granularity of the binary waveform also changes, so that it does not always indicate word breaks. At much higher resolutions, it has been found that syllables of words can be detected. At lower resolutions, breaks between sentences are detected, while word breaks are not. Such functionality is useful for editing because it allows high level selections to be made easily, then later, more fine level editing can be performed.
Given a binary waveform which indicates the presence of word or any sound breaks in a sound track, new editing controls may also be provided. Some of these functions include going to a next word, going to the end of a word, selecting certain words, playing a word or a selection of words, or marking the start or end of a word as a cut location in the track. Such a display system may also be used for musical audio data. Given an appropriate threshold level, the binary waveform may be used to isolate volume peaks and crescendos in music. These and similar functions allow an editor to create multimedia programs based on the dialog or musical content of a sound track.
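Editing controls of this kind could be built on a list of sound segments extracted from the binary waveform. The sketch below is one possible arrangement; the names `sound_segments` and `next_segment_start` are hypothetical, and segment ends are exclusive indices by assumption.

```python
def sound_segments(binary):
    """Return (start, end) index pairs, end exclusive, for each run of 1s."""
    segments = []
    start = None
    for i, b in enumerate(binary):
        if b and start is None:
            start = i                      # a word (or other sound) begins
        elif not b and start is not None:
            segments.append((start, i))    # the sound ends at a break
            start = None
    if start is not None:
        segments.append((start, len(binary)))
    return segments


def next_segment_start(segments, position):
    """Return the start of the first segment beginning after position, or None."""
    for start, _ in segments:
        if start > position:
            return start
    return None
```

A "go to next word" control would call `next_segment_start` with the current cursor position, and a "mark end of word" control would use the end index of the segment containing the cursor.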
Binary waveforms may also be used to identify and locate the presence of a sound effect on a Foley track (for example, see Fig. 9) especially if the track includes large amounts of silence, greatly improving the ability to synchronize the sound effect with visual material. Sound effects may be quickly found, edited, and synchronized with other material. In stereo sound, synchronization may be repaired if lost when one track slips versus another.
This type of display may also be used to detect long pauses, which may be then used to identify and separate effects or music tracks captured from prerecorded sources such as records, tapes, compact discs and the like. (For example, see Fig. 8).
By using such a display system an editor may readily visualize sound pieces, and the editing process is accelerated and simplified. Sentences, phrases, words, syllables, transient noises, speech patterns, and even silence such as dramatic pauses and sentence and phrase breaks, may be quickly located and isolated in a long audio track and extracted for appropriate use. Thus, less experience is required to generate high quality multimedia productions. Such a display system facilitates the development of marketing and advertising multimedia programs by companies who have no personnel with experience in film editing.
The invention is not limited to generating merely a binary waveform, nor to amplitude data.
Two or more thresholds may also be provided to provide a discrete waveform, having a smaller range than a continuous waveform, but a broader range than a simple binary waveform. Two thresholds provide a hysteresis type state behavior to the display. For more thresholds, discrete color values may be used to identify levels of sound over different thresholds. For example, black may be used to indicate silence, purple for low volume, blue for mid-volume and green for high volume. A 16-bit continuous range of colors can completely represent a range of amplitudes from 0 to 32,000 represented by the sound data.
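One reading of the hysteresis behavior obtained with two thresholds is sketched below: the waveform switches on only above the upper threshold and switches off only below the lower one. This interpretation, the initial off state, and the name `hysteresis_waveform` are assumptions, since the specification does not spell out the state rules.

```python
def hysteresis_waveform(energy, low, high):
    """Return a 0/1 waveform with two-threshold hysteresis.

    Assumption: the state starts at 0, rises to 1 when a value exceeds
    `high`, and falls back to 0 only when a value drops below `low`.
    Values between the two thresholds leave the state unchanged.
    """
    state = 0
    out = []
    for e in energy:
        if state == 0 and e > high:
            state = 1
        elif state == 1 and e < low:
            state = 0
        out.append(state)
    return out
```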
A sample of a display using three thresholds, with threshold level colors identified by patterns, is shown in Fig. 7. In this figure, a discrete waveform is shown at 60, with a corresponding energy plot at 62. The threshold levels are indicated at 64, 65 and 66. When the sound energy is below the first threshold 64, the waveform takes on one value, indicated by black in Fig. 7 at, for example, 68. When sound energy is above the first threshold 64 but below the second threshold 65, the waveform takes a second value, indicated by white in Fig. 7 at, for example, 70. Similarly, the waveform takes on a third value when the sound energy is between the second and third thresholds, as indicated by horizontal lines, for example, at 71. Otherwise the waveform takes on a fourth value indicated by diagonal lines, for example, at 72.
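The multi-threshold quantization of Fig. 7 can be illustrated with a short sketch. The function name `discrete_levels` and the treatment of a value exactly equal to a threshold (counted as above it) are assumptions; the color names are the ones given as an example in the description.

```python
from bisect import bisect_right

# Example colors from the description: silence, low, mid and high volume.
LEVEL_COLORS = ["black", "purple", "blue", "green"]


def discrete_levels(energy, thresholds):
    """Map each energy value to a level index from 0 to len(thresholds).

    The level is the number of thresholds at or below the value, so three
    thresholds yield the four values described for Fig. 7.
    """
    ordered = sorted(thresholds)
    return [bisect_right(ordered, e) for e in energy]
```

A display routine could then look up `LEVEL_COLORS[level]` for each displayed sample.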
Frequency data may also be used for this display system. Such frequency data can be obtained by applying a simple fast Fourier transform (FFT) with a limited frequency band to the audio data. A threshold can be applied to the amplitudes of the different frequency bands to determine if sounds within the certain frequency band are present. Such information may then also be displayed. A sample display is shown in Fig. 10.
Using frequency analysis, certain sounds can be detected, such as electronic slate beeps (signals used to separate one tape or scene from another during video and audio recording sessions) (for example, see Fig. 11). If the frequency of the desired signal is known (in this example the presumption is 1.2 KHz), its occurrence in a track can be identified, such as shown in black in Fig. 11.
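Detection of a tone at a known frequency, such as a 1.2 KHz slate beep, might be sketched as follows. Note that this example substitutes the Goertzel algorithm, which evaluates a single DFT bin, for the band-limited FFT mentioned in the description; the block size, the power threshold, and the function names are likewise assumptions chosen for illustration.

```python
import math


def goertzel_power(block, sample_rate, target_hz):
    """Return the squared magnitude of the DFT bin nearest target_hz."""
    n = len(block)
    k = round(n * target_hz / sample_rate)       # nearest bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in block:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def tone_present(samples, sample_rate, target_hz, block_size, threshold):
    """Return one 0/1 flag per full block: 1 where the tone power exceeds threshold."""
    flags = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        block = samples[start:start + block_size]
        power = goertzel_power(block, sample_rate, target_hz)
        flags.append(1 if power > threshold else 0)
    return flags
```

At 44.1 KHz, a block of 441 samples gives 100 Hz bin spacing, so 1.2 KHz falls exactly on bin 12. The resulting flags can be displayed in the same manner as the binary waveform.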
It is also possible to show selected frequencies or a range of frequencies, using colors to denote the various frequency bands. Such frequency data allows certain aspects of music to be viewed, including beat detection. In some compositions with strong drum or other rhythm sounds, it is possible to isolate or determine the tempo of the music using such frequency data.
Frequency data may also be used to identify potential problem areas in a sound track. For example, repetitive background noise events, such as fans, light buzzing, etc. may be detected. Using both frequency and amplitude data, it is possible to avoid the loss of dialogue in noisy environments, when dialogue levels fall below the levels of background noise. It is then possible to differentiate further word breaks in this low level dialogue.
Having now described a few embodiments of the invention, it should be apparent to those skilled in the art that the foregoing is merely illustrative and not limiting, having been presented by way of example only. Numerous modifications and other embodiments are within the scope of one of ordinary skill in the art and are contemplated as falling within the scope of the invention as defined by the appended claims.

Claims

1. A display system for facilitating computerized editing of audio data, comprising: means for selecting at least a portion of the audio data; means for generating a discrete waveform representative of the selected portion of audio data; and means for displaying the discrete waveform on a video display.
2. The display system of claim 1 wherein the selected audio data comprises a plurality of samples and the means for generating includes means for applying a smoothing operation to the selected audio data to obtain an averaged value for each sample and means for applying a threshold operation to each of the averaged values to obtain a binary value.
3. A method of displaying audio data for facilitating computerized editing of the audio data, the method comprising the steps of: selecting at least a portion of the audio data; generating a discrete waveform representative of the selected portion of the audio data; and displaying the discrete waveform on a video display.
4. The method of claim 3 wherein the selected audio data comprises a plurality of samples and the step of generating a discrete waveform includes the steps of applying a smoothing operation to the selected audio data to obtain an averaged value for each sample and applying a threshold operation to each of the averaged values to obtain a binary value.
PCT/US1993/012677 1992-12-31 1993-12-30 Display system facilitating computer assisted audio editing WO1994016443A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB9513392A GB2289558B (en) 1992-12-31 1993-12-30 Display system facilitating computer assisted audio editing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US07/997,716 1992-12-31
US07/997,716 US5634020A (en) 1992-12-31 1992-12-31 Apparatus and method for displaying audio data as a discrete waveform

Publications (1)

Publication Number Publication Date
WO1994016443A1 true WO1994016443A1 (en) 1994-07-21

Family

ID=25544309

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1993/012677 WO1994016443A1 (en) 1992-12-31 1993-12-30 Display system facilitating computer assisted audio editing

Country Status (3)

Country Link
US (1) US5634020A (en)
GB (1) GB2289558B (en)
WO (1) WO1994016443A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2306750A (en) * 1995-10-23 1997-05-07 Quantel Ltd An audio editing system
EP0811975A2 (en) * 1996-06-04 1997-12-10 Hitachi Denshi Kabushiki Kaisha Editing method for recorded information
WO1998006099A1 (en) * 1996-08-06 1998-02-12 Interval Research Corporation Time-based media processing system
EP0902431A2 (en) * 1997-09-12 1999-03-17 Philips Patentverwaltung GmbH System for editing of digital video and audio information
EP2779172A1 (en) * 2013-03-14 2014-09-17 Honeywell International Inc. System and method of audio information display on video playback timeline
WO2017001860A1 (en) * 2015-06-30 2017-01-05 British Broadcasting Corporation Audio-video content control

Families Citing this family (46)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6278743B1 (en) * 1996-11-12 2001-08-21 Zenith Electronics Corporation Non linear amplitude precorrection for HDTV transmitter
US6661430B1 (en) * 1996-11-15 2003-12-09 Picostar Llc Method and apparatus for copying an audiovisual segment
WO1998044717A2 (en) * 1997-04-01 1998-10-08 Medic Interactive, Inc. System for automated generation of media programs from a database of media elements
US6336093B2 (en) 1998-01-16 2002-01-01 Avid Technology, Inc. Apparatus and method using speech recognition and scripts to capture author and playback synchronized audio and video
GB2336022A (en) * 1998-04-03 1999-10-06 Discreet Logic Inc Edit processing audio-visual data
AU753622B2 (en) * 1998-08-12 2002-10-24 Allan Plaskett Method of, and system for, analysing events
US6631368B1 (en) * 1998-11-13 2003-10-07 Nortel Networks Limited Methods and apparatus for operating on non-text messages
US9171545B2 (en) * 1999-04-19 2015-10-27 At&T Intellectual Property Ii, L.P. Browsing and retrieval of full broadcast-quality video
US7877774B1 (en) 1999-04-19 2011-01-25 At&T Intellectual Property Ii, L.P. Browsing and retrieval of full broadcast-quality video
US6704671B1 (en) 1999-07-22 2004-03-09 Avid Technology, Inc. System and method of identifying the onset of a sonic event
CN100409358C (en) * 2000-09-08 2008-08-06 皇家菲利浦电子有限公司 Reproducing apparatus providing colored slider bar
US9419844B2 (en) 2001-09-11 2016-08-16 Ntech Properties, Inc. Method and system for generation of media
US20060015904A1 (en) 2000-09-08 2006-01-19 Dwight Marcus Method and apparatus for creation, distribution, assembly and verification of media
US7072908B2 (en) 2001-03-26 2006-07-04 Microsoft Corporation Methods and systems for synchronizing visualizations with audio streams
US7711123B2 (en) * 2001-04-13 2010-05-04 Dolby Laboratories Licensing Corporation Segmenting audio signals into auditory events
JP3886372B2 (en) * 2001-12-13 2007-02-28 松下電器産業株式会社 Acoustic inflection point extraction apparatus and method, acoustic reproduction apparatus and method, acoustic signal editing apparatus, acoustic inflection point extraction method program recording medium, acoustic reproduction method program recording medium, acoustic signal editing method program recording medium, acoustic inflection point extraction method Program, sound reproduction method program, sound signal editing method program
US8009966B2 (en) * 2002-11-01 2011-08-30 Synchro Arts Limited Methods and apparatus for use in sound replacement with automatic synchronization to images
AU2004254950A1 (en) 2003-06-24 2005-01-13 Ntech Properties, Inc. Method, system and apparatus for information delivery
GB2404300A (en) * 2003-07-25 2005-01-26 Autodesk Inc Compositing and temporally editing clips of image data
US7725828B1 (en) * 2003-10-15 2010-05-25 Apple Inc. Application of speed effects to a video presentation
JP2006145712A (en) * 2004-11-18 2006-06-08 Pioneer Electronic Corp Audio data interpolation system
US20080201092A1 (en) * 2005-08-22 2008-08-21 Matthew Sean Connolly Waveform Display Method And Apparatus
US9020326B2 (en) 2005-08-23 2015-04-28 At&T Intellectual Property Ii, L.P. System and method for content-based navigation of live and recorded TV and video programs
US9042703B2 (en) 2005-10-31 2015-05-26 At&T Intellectual Property Ii, L.P. System and method for content-based navigation of live and recorded TV and video programs
US9070408B2 (en) 2005-08-26 2015-06-30 Endless Analog, Inc Closed loop analog signal processor (“CLASP”) system
US7751916B2 (en) * 2005-08-26 2010-07-06 Endless Analog, Inc. Closed loop analog signal processor (“CLASP”) system
US8630727B2 (en) * 2005-08-26 2014-01-14 Endless Analog, Inc Closed loop analog signal processor (“CLASP”) system
US20080229200A1 (en) * 2007-03-16 2008-09-18 Fein Gene S Graphical Digital Audio Data Processing System
US8145704B2 (en) 2007-06-13 2012-03-27 Ntech Properties, Inc. Method and system for providing media programming
US20090082887A1 (en) * 2007-09-23 2009-03-26 International Business Machines Corporation Method and User Interface for Creating an Audio Recording Using a Document Paradigm
CA2721522C (en) 2008-04-18 2021-03-16 Erik Van De Pol System and method for condensed representation of long video sequences
US8856655B2 (en) * 2009-05-01 2014-10-07 Apple Inc. Media editing application with capability to focus on graphical composite elements in a media compositing area
US8612858B2 (en) 2009-05-01 2013-12-17 Apple Inc. Condensing graphical representations of media clips in a composite display area of a media-editing application
GB2474076B (en) * 2009-10-05 2014-03-26 Sonnox Ltd Audio repair methods and apparatus
EP3418917B1 (en) 2010-05-04 2022-08-17 Apple Inc. Methods and systems for synchronizing media
US8875025B2 (en) 2010-07-15 2014-10-28 Apple Inc. Media-editing application with media clips grouping capabilities
US10324605B2 (en) 2011-02-16 2019-06-18 Apple Inc. Media-editing application with novel editing tools
US10095367B1 (en) * 2010-10-15 2018-10-09 Tivo Solutions Inc. Time-based metadata management system for digital media
US8775480B2 (en) 2011-01-28 2014-07-08 Apple Inc. Media clip management
US8966367B2 (en) 2011-02-16 2015-02-24 Apple Inc. Anchor override for a media-editing application with an anchored timeline
US9997196B2 (en) 2011-02-16 2018-06-12 Apple Inc. Retiming media presentations
US11747972B2 (en) 2011-02-16 2023-09-05 Apple Inc. Media-editing application with novel editing tools
KR20130134195A (en) * 2012-05-30 2013-12-10 삼성전자주식회사 Apparatas and method fof high speed visualization of audio stream in a electronic device
FR3000155B1 (en) 2012-12-21 2015-09-25 Valeo Embrayages TORSION DAMPER FOR A TORQUE TRANSMISSION DEVICE OF A MOTOR VEHICLE
FR3024759B1 (en) 2014-08-08 2020-01-03 Valeo Embrayages SHOCK ABSORBER, PARTICULARLY FOR AN AUTOMOTIVE CLUTCH
US10762347B1 (en) * 2017-05-25 2020-09-01 David Andrew Caulkins Waveform generation and recognition system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4249218A (en) * 1978-11-01 1981-02-03 Minnesota Mining And Manufacturing Company Method and apparatus for editing digitally recorded audio signals
WO1988002958A1 (en) * 1986-10-16 1988-04-21 David Burton Control system
EP0322100A2 (en) * 1987-12-21 1989-06-28 International Business Machines Corporation System and method for processing digitized audio signals
WO1991003053A1 (en) * 1989-08-18 1991-03-07 Jemani Ltd. A method of and apparatus for assisting in editing recorded audio material
GB2235815A (en) * 1989-09-01 1991-03-13 Compact Video Group Inc Digital dialog editor
GB2245745A (en) * 1990-07-06 1992-01-08 Sony Corp Editing digital audio signals associated with video signals
US5151998A (en) * 1988-12-30 1992-09-29 Macromedia, Inc. sound editing system using control line for altering specified characteristic of adjacent segment of the stored waveform

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4067049A (en) * 1975-10-02 1978-01-03 Glen Glenn Sound Sound editing system
US4214278A (en) * 1978-11-30 1980-07-22 Scp Producer's Services Limited Editing system for videotape sound
US4251688A (en) * 1979-01-15 1981-02-17 Ana Maria Furner Audio-digital processing system for demultiplexing stereophonic/quadriphonic input audio signals into 4-to-72 output audio signals
US4375083A (en) * 1980-01-31 1983-02-22 Bell Telephone Laboratories, Incorporated Signal sequence editing method and apparatus with automatic time fitting of edited segments
US4757540A (en) * 1983-10-24 1988-07-12 E-Systems, Inc. Method for audio editing
US4641253A (en) * 1984-06-06 1987-02-03 Maximus, Inc. Process for synchronizing computer video with independent audio
US4746994A (en) * 1985-08-22 1988-05-24 Cinedco, California Limited Partnership Computer-based video editing system
DE3788038T2 (en) * 1986-11-20 1994-03-17 Matsushita Electric Ind Co Ltd Information editing device.
DE3739681A1 (en) * 1987-11-24 1989-06-08 Philips Patentverwaltung METHOD FOR DETERMINING START AND END POINT ISOLATED SPOKEN WORDS IN A VOICE SIGNAL AND ARRANGEMENT FOR IMPLEMENTING THE METHOD
US4956806A (en) * 1988-07-12 1990-09-11 International Business Machines Corporation Method and apparatus for editing source files of differing data formats using an edit tracking file
JPH02110658A (en) * 1988-10-19 1990-04-23 Hitachi Ltd Document editing device
US5204969A (en) * 1988-12-30 1993-04-20 Macromedia, Inc. Sound editing system using visually displayed control line for altering specified characteristic of adjacent segment of stored waveform
DE69028940T2 (en) * 1989-03-28 1997-02-20 Matsushita Electric Ind Co Ltd Device and method for data preparation
US5249289A (en) * 1989-09-28 1993-09-28 International Business Machines Corporation System and method for rebuilding edited digital audio files
US5121470A (en) * 1990-02-01 1992-06-09 Intellimetrics Instrument Corporation Automated interactive record system
EP0526064B1 (en) * 1991-08-02 1997-09-10 The Grass Valley Group, Inc. Video editing system operator interface for visualization and interactive control of video material
JP3252172B2 (en) * 1991-11-14 2002-01-28 カシオ計算機株式会社 Digital recorder
US5355450A (en) * 1992-04-10 1994-10-11 Avid Technology, Inc. Media composer with adjustable source material compression
JP3067801B2 (en) * 1992-04-10 2000-07-24 アヴィッド・テクノロジー・インコーポレーテッド Digital audio workstation providing digital storage and display of video information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"THREE-LEVEL AUDIO OBJECT DISPLAY FOR A PERSONAL COMPUTER AUDIO EDITOR", IBM TECHNICAL DISCLOSURE BULLETIN., vol. 30, no. 10, March 1988 (1988-03-01), NEW YORK US, pages 351 - 353, XP000104962 *
"VOLUME HISTORY DISPLAY FOR A PERSONAL COMPUTER AUDIO EDITOR", IBM TECHNICAL DISCLOSURE BULLETIN., vol. 30, no. 10, March 1988 (1988-03-01), NEW YORK US, pages 355 - 356, XP000104958 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2306750B (en) * 1995-10-23 1999-11-10 Quantel Ltd An audio editing system
GB2306750A (en) * 1995-10-23 1997-05-07 Quantel Ltd An audio editing system
EP0811975A2 (en) * 1996-06-04 1997-12-10 Hitachi Denshi Kabushiki Kaisha Editing method for recorded information
EP0811975A3 (en) * 1996-06-04 1999-05-19 Hitachi Denshi Kabushiki Kaisha Editing method for recorded information
WO1998006099A1 (en) * 1996-08-06 1998-02-12 Interval Research Corporation Time-based media processing system
US6243087B1 (en) 1996-08-06 2001-06-05 Interval Research Corporation Time-based media processing system
US5969716A (en) * 1996-08-06 1999-10-19 Interval Research Corporation Time-based media processing system
US6185538B1 (en) 1997-09-12 2001-02-06 Us Philips Corporation System for editing digital video and audio information
EP0902431A3 (en) * 1997-09-12 1999-08-11 Philips Patentverwaltung GmbH System for editing of digital video and audio information
EP0902431A2 (en) * 1997-09-12 1999-03-17 Philips Patentverwaltung GmbH System for editing of digital video and audio information
EP2779172A1 (en) * 2013-03-14 2014-09-17 Honeywell International Inc. System and method of audio information display on video playback timeline
CN104053064A (en) * 2013-03-14 2014-09-17 霍尼韦尔国际公司 System and method of audio information display on video playback timeline
US10809966B2 (en) 2013-03-14 2020-10-20 Honeywell International Inc. System and method of audio information display on video playback timeline
WO2017001860A1 (en) * 2015-06-30 2017-01-05 British Broadcasting Corporation Audio-video content control
GB2556737A (en) * 2015-06-30 2018-06-06 Rankine Simon Audio-video content control
US10701459B2 (en) 2015-06-30 2020-06-30 British Broadcasting Corporation Audio-video content control

Also Published As

Publication number Publication date
GB9513392D0 (en) 1995-09-27
US5634020A (en) 1997-05-27
GB2289558B (en) 1997-04-16
GB2289558A (en) 1995-11-22

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): GB

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase