|Publication number||USRE40107 E1|
|Application number||US 09/971,236|
|Publication date||Feb 26, 2008|
|Filing date||Oct 4, 2001|
|Priority date||Dec 22, 1989|
|Also published as||CA2071975A1, CA2071975C, DE69034016D1, DE69034016T2, EP0506877A1, EP0506877A4, EP0830026A2, EP0830026A3, EP0830026B1, EP1278379A2, EP1278379A3, US5045940, WO1991010323A1|
|Publication number||09971236, 971236, US RE40107 E1, US RE40107E1, US-E1-RE40107, USRE40107 E1, USRE40107E1|
|Inventors||Eric C. Peters, Stanley Rabinowitz|
|Original Assignee||Avid Technology, Inc.|
This application is a continuation reissue application of reissue application Ser. No. 08/418,862, filed on Apr. 7, 1995, which is a continuation of reissue application Ser. No. 08/116,905, filed on Sep. 3, 1993 for reissue of U.S. Pat. No. 5,045,940, which is incorporated herein by reference.
The application includes a microfiche appendix pursuant to 37 CFR 1.96(b) containing 1 microfiche with 44 frames.
The invention relates to displaying non-linear media data, i.e., digitized audio and video data.
Non-linear media data is audio and video data recorded on a linear medium, e.g., a VHS videotape cassette, and stored in digitized form on a computer storage device, e.g., a hard disk drive. Typically, linear audio data is recorded with a “pulse” equal to the speed at which the linear video data is recorded. That is, if the video data is recorded at 30 frames per second (fps), the accompanying audio data is likewise recorded at 30 fps. This is obviously the case where the audio and video data are recorded simultaneously on a single medium, e.g., a single videotape cassette. However, the recording of the audio data can be distinct from the recording of the video data, e.g. a soundtrack can be recorded in London and a film clip can be shot on location in the Sahara Desert, in which case the speed of recording the video and the audio may or may not be equal. In addition, the standard speeds for filming video and recording audio vary from country to country. For example, the standard speed for recording video in the United States is 30 fps, while the standard speed in Europe is 24 fps. Likewise, the audio sampling rate standard is 22 kHz, but 44 kHz is also used. Thus, in cases where the speeds are different, the two recordings (often referred to as media “channels”) must be effectively combined and displayed so that there are no visible or audible gaps or overlaps.
As noted above, the speed at which video data is recorded and displayed can vary, e.g., from 30 fps to 24 fps. Video data from different countries, however, can be successfully edited together and played at a single speed because the human eye cannot detect subtle variations in the number of frames per second. The human ear, however, can detect even subtle variations in the speed at which audio data is played. These variations appear as audible clicks, silences, or other distortions. Over time, differences between the speed at which the video data is displayed and the speed at which the audio data is played result in visibly and audibly mismatched video and audio data.
The invention is a data pipeline system which synchronizes the display of digitized audio and video data regardless of the speed at which the data was recorded on its linear medium. To do this, the video data is played at a constant speed, synchronized by the audio speed. The system includes a media file database (MFD) that contains a number of media files, each of which contains either digitized audio or digitized video media data. The system also includes a viewer module which synchronizes operations to transfer the media data from the media files into a number of ring buffers (preferably software). Each ring buffer is associated with a media channel being displayed. For example, there is a ring buffer which contains media data for the video channel, a ring buffer which contains media data for a first audio channel, and a ring buffer which contains media data for a second audio channel. The two audio channels are for stereo. The viewer module also synchronizes calls to a video accelerator card and a sound accelerator card so that the video data and audio data, which were recorded on a linear medium at differing pulse rates, are displayed at a constant rate without visible or audible gaps or overlaps.
Further, the invention uses a method of “staging” data in the storage buffers, particularly ring buffers, to coordinate buffer loading to encourage efficient use of the viewer module resources by not permitting the viewer to read excessive amounts of data at any one time, i.e., to read only enough data into any one ring buffer so that the amount of data in the ring buffer is approximately equivalent to the amount of data in the other ring buffers.
Other advantages and features will become apparent from the following description, and from the claims.
The pipeline system 10 also includes a viewer module 16 which synchronizes the transfer of media data from the media files 14 into three ring buffers 18a-18c which store the data or pointers to the data before it is displayed. For convenience, the terms “view” and “display” are used herein with respect to audio as well as video and should be understood to mean “play” when referring to audio. Each ring buffer is associated with a media channel. That is, ring buffer 18a contains media data for the video channel, ring buffer 18b contains media data for audio channel 1, and ring buffer 18c contains media data for audio channel 2. Upon a signal from the viewer module 16, a pointer to a frame of video data in the ring buffer 18a is transferred to a conventional video accelerator card 20, preferably a TRUEVISION® model NuVista, which displays the video data on a monitor 22. Likewise, upon a signal from the viewer module 16, a pointer to a frame of audio data in the ring buffers 18b and 18c is transferred to a conventional sound accelerator card 24, preferably a DIGIDESIGN® model SOUND ACCELERATOR, which plays the audio data through the monitor 22. The operation of the system and synchronization of the displaying of video data and playing of audio data is described below in connection with FIG. 2.
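By way of illustration only, a per-channel ring buffer of the kind described above might be sketched in C as follows. This is a minimal sketch under stated assumptions, not the implementation in the microfiche appendix: the names `ring_t`, `ring_init`, `ring_put`, `ring_get`, and `ring_bytes_free` and the capacity `RING_CAPACITY` are hypothetical, and the sketch stores bytes directly, whereas the ring buffers 18a-18c may instead hold pointers to the data.

```c
#include <stddef.h>

#define RING_CAPACITY 4096u  /* hypothetical capacity in bytes */

/* One ring buffer per media channel (video, audio 1, audio 2). */
typedef struct {
    unsigned char data[RING_CAPACITY];
    size_t head;  /* index of the next byte to write */
    size_t tail;  /* index of the next byte to read */
    size_t used;  /* number of bytes currently stored */
} ring_t;

void ring_init(ring_t *r)
{
    r->head = r->tail = r->used = 0;
}

/* Bytes that can still be written before the buffer is full. */
size_t ring_bytes_free(const ring_t *r)
{
    return RING_CAPACITY - r->used;
}

/* Store up to n bytes; returns the number actually stored. */
size_t ring_put(ring_t *r, const unsigned char *src, size_t n)
{
    size_t i;
    size_t room = ring_bytes_free(r);
    if (n > room)
        n = room;
    for (i = 0; i < n; i++) {
        r->data[r->head] = src[i];
        r->head = (r->head + 1) % RING_CAPACITY;  /* wrap around */
    }
    r->used += n;
    return n;
}

/* Remove up to n bytes in FIFO order; returns the number removed. */
size_t ring_get(ring_t *r, unsigned char *dst, size_t n)
{
    size_t i;
    if (n > r->used)
        n = r->used;
    for (i = 0; i < n; i++) {
        dst[i] = r->data[r->tail];
        r->tail = (r->tail + 1) % RING_CAPACITY;  /* wrap around */
    }
    r->used -= n;
    return n;
}
```

In this sketch the reader side (`ring_get`) stands in for the hand-off to the accelerator cards, and `ring_bytes_free` corresponds to the "bytes free" quantity that the viewer module inspects when deciding which channel to fill.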
In general, “staging” the data in the ring buffers 18 encourages efficient use of the viewer module resources by not permitting the viewer to read excessive amounts of data at any one time, i.e., to read only enough data into any one ring buffer so that the amount of data in the ring buffer is roughly equivalent to the amount of data in the other ring buffers (see steps 254, 260 and 266 in FIG. 3). The staging process also encourages efficient use of the file system (disks where media is stored) by permitting large efficient reads from the disk when there is time available.
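The staging rule can be illustrated with a short C sketch that caps a read so that one ring buffer does not get far ahead of another. The function name `staged_read_size` and the `slack` parameter are assumptions introduced here for illustration; the source says only that the amounts of data should be roughly equivalent and does not give a formula.

```c
#include <stddef.h>

/*
 * Limit a read into one ring buffer so that, after the read, it holds at
 * most `slack` bytes more than the ring buffer it is being balanced against.
 *
 *   wanted          bytes the caller would like to read
 *   this_buffered   bytes already held in the ring buffer being filled
 *   other_buffered  bytes held in the ring buffer used as the reference
 *   slack           tolerated imbalance (hypothetical tuning parameter)
 */
size_t staged_read_size(size_t wanted, size_t this_buffered,
                        size_t other_buffered, size_t slack)
{
    size_t limit = other_buffered + slack;  /* most this buffer may hold */
    size_t room = (limit > this_buffered) ? limit - this_buffered : 0;
    return (wanted < room) ? wanted : room;
}
```

For example, with 2,000 bytes buffered on one channel, 5,000 bytes on the other, and a slack of 735 bytes, a request for 10,000 bytes would be reduced to 3,735 bytes, while a request for 1,000 bytes would pass through unchanged.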
Before PLAY_AV is called the system preloads all the ring buffers with data. This considerably improves efficiency since this can be done before the time-critical operations occur.
First, PLAY_AV performs an error check, i.e., determines if the number of frames to be played is zero (step 200). If the number of frames to be displayed is not zero, then PLAY_AV performs further error checks, e.g., determines that the audio capability of the monitor 22 is initialized (by checking the value of the Boolean variable AUDIO_INIT_DONE) and that the audio data is to be used to synchronize the video data (by checking the value of the Boolean variable SOUND_SYNC) (step 202). Next, if there are no errors (step 202), then PLAY_AV determines which audio channels are in use (step 204), i.e., from which of the ring buffers 18 audio data will be read. Note that the embodiment described here uses two audio data channels, i.e., audio channel 1 and audio channel 2. Additional channels and hence additional buffers are of course possible. Fewer channels (i.e., 1) are also permitted.
Having determined which channels are in use, PLAY_AV next initializes variables for the frames of audio and video data to be displayed (step 206), i.e., assigns a value of 0 to the variables “nextAudio1”, “nextAudio2”, and “nextVideo”. In addition, PLAY_AV initializes the end of file marker for the video data (“videoEOF”) to FALSE (step 208) and also initializes the end of file markers for the audio data (“audio1EOF” and “audio2EOF”) (step 210). Specifically, if audio channel 1 is being used (step 204) and soundsync is TRUE (step 202), then audio1EOF equals FALSE. Otherwise, audio1EOF equals TRUE. Likewise, if audio channel 2 is being used (step 204) and soundsync is TRUE (step 202), then audio2EOF equals FALSE. Otherwise, audio2EOF equals TRUE.
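The initialization just described can be sketched in C as follows. The variable names are taken from the text, but the structure `play_state_t`, the function `init_play_state`, and its parameters are hypothetical, and `<stdbool.h>` is used for brevity although it would not have been available in the compiler named later in this document.

```c
#include <stdbool.h>

/* Playback state; field names follow the variables named in the text. */
typedef struct {
    long nextAudio1;
    long nextAudio2;
    long nextVideo;
    bool videoEOF;
    bool audio1EOF;
    bool audio2EOF;
} play_state_t;

/*
 * An audio channel is treated as already at end of file unless it is in
 * use and the audio is being used to synchronize the video (soundsync).
 */
void init_play_state(play_state_t *s, bool audio1_in_use,
                     bool audio2_in_use, bool soundsync)
{
    s->nextAudio1 = 0;
    s->nextAudio2 = 0;
    s->nextVideo = 0;
    s->videoEOF = false;
    s->audio1EOF = !(audio1_in_use && soundsync);
    s->audio2EOF = !(audio2_in_use && soundsync);
}
```

Marking an unused channel as being at end of file means that the later buffer checks skip it without needing a separate "in use" test.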
Once PLAY_AV has determined from which channels it will read data (steps 204-210), it begins an infinite (while TRUE) loop to read, transfer, and display media data. PLAY_AV does not exit the loop until the clip is exhausted, i.e., there are no more frames to display. At the beginning of each pass through the loop, PLAY_AV initializes several variables, including the maximum number of bytes it will read (“max_read”) and the number of audio channel 1 and audio channel 2 bytes yet to be played (“AbytesUnplayed1” and “AbytesUnplayed2”) (step 212). In addition, PLAY_AV initializes several variables that indicate whether it should “wait”, execute a “critical” read, or execute an “efficient” read (step 214) (each of which is described below in more detail), and also initializes a variable “fewest_buffers” to MAX_LONG (step 216), i.e., a number far larger than the number of bytes in a ring buffer.
Having initialized the loop variables, PLAY_AV next determines which of the ring buffers has the least amount of data in it, i.e., which ring buffer has fewer bytes free. PLAY_AV begins by checking ring buffer 18b (audio channel 1) as described below.
To determine what the state of ring buffer 18b (audio channel 1) is, PLAY_AV determines if audio1EOF is FALSE and ring buffer 18b has at least 735 bytes free (step 218). If so, PLAY_AV goes on to determine if the number of bytes free in ring buffer 18b is less than fewest_buffers (step 220) (which is always true initially since fewest_buffers was assigned MAX_LONG above). The action variable is then assigned a value of “read_AUDIO1” (step 222). The critical variable is assigned a value of TRUE if fewest_buffers is less than a predefined number (AUDIO_ALERT_BUFS), and is assigned a value of FALSE otherwise (step 224). And the efficient variable is assigned a value of TRUE if the number of bytes free in ring buffer 18b is greater than or equal to a predefined number (EFFICIENT_AUDIO_BYTES) and if the size of the next audio frame times the typical audio frame size is greater than or equal to EFFICIENT_AUDIO_BYTES (step 226). Otherwise, the efficient variable is assigned a value of FALSE.
To determine what the state of ring buffer 18c (audio channel 2) is, PLAY_AV determines if audio2EOF is FALSE and ring buffer 18c has at least 735 bytes free (step 228). If so, PLAY_AV goes on to determine if the number of bytes free in ring buffer 18c is less than fewest_buffers (step 230) (i.e., whether ring buffer 18c has fewer bytes than ring buffer 18b as determined above). If ring buffer 18c indeed contains fewer bytes, the action variable is assigned a value of “read_AUDIO2” (step 232). The critical variable is assigned a value of TRUE if fewest_buffers is less than a predefined number (AUDIO_ALERT_BUFS), and is assigned a value of FALSE otherwise (step 234). And the efficient variable is assigned a value of TRUE if the number of bytes free in ring buffer 18c is greater than or equal to a predefined number (EFFICIENT_AUDIO_BYTES) and if the size of the next audio frame times the typical audio frame size is greater than or equal to EFFICIENT_AUDIO_BYTES (step 236). Otherwise, the efficient variable is assigned a value of FALSE.
Finally, to determine what the state of ring buffer 18a (video channel) is, PLAY_AV determines if videoEOF is FALSE and ring buffer 18a has at least 1 byte free (step 238). If so, PLAY_AV goes on to determine if the number of bytes free in ring buffer 18a is less than fewest_buffers (step 240) (i.e., whether ring buffer 18a has fewer bytes than ring buffer 18c as determined above). If ring buffer 18a indeed contains fewer bytes, the action variable is assigned a value of “read_VIDEO” (step 242). The critical variable is assigned a value of TRUE if fewest_buffers is less than a predefined number (VIDEO_ALERT_BUFS), and is assigned a value of FALSE otherwise (step 244). And the efficient variable is assigned a value of TRUE if the number of bytes free in ring buffer 18a is greater than or equal to a predefined number (EFFICIENT_VIDEO_BUFS) and if the size of the next video frame is greater than or equal to EFFICIENT_VIDEO_BUFS (step 246). Otherwise, the efficient variable is assigned a value of FALSE.
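The channel-by-channel comparison of steps 218-246 can be pictured with the following simplified C sketch. It follows the text literally in comparing the number of bytes free against a running minimum, and it assumes that fewest_buffers is updated whenever a channel is selected, which the text implies but does not state. The types `action_t` and `channel_t`, the function `choose_action`, and the field names are hypothetical, and the efficient variable is left out for brevity.

```c
#include <limits.h>
#include <stdbool.h>

typedef enum { ACT_NONE, READ_AUDIO1, READ_AUDIO2, READ_VIDEO } action_t;

typedef struct {
    bool eof;         /* end of file marker for the channel */
    long bytes_free;  /* bytes free in the channel's ring buffer */
    long min_free;    /* 735 for audio, 1 for video, per the text */
    long alert;       /* stand-in for AUDIO_ALERT_BUFS / VIDEO_ALERT_BUFS */
    action_t action;  /* action to take if this channel is selected */
} channel_t;

/*
 * Examine the channels in order (audio 1, audio 2, video) and return the
 * action for the eligible channel with the smallest bytes_free value.
 * *critical reports whether that value is below the channel's alert level.
 */
action_t choose_action(const channel_t ch[], int n, bool *critical)
{
    long fewest_buffers = LONG_MAX;  /* plays the role of MAX_LONG */
    action_t action = ACT_NONE;
    int i;

    *critical = false;
    for (i = 0; i < n; i++) {
        if (ch[i].eof || ch[i].bytes_free < ch[i].min_free)
            continue;  /* channel finished or not enough room to read */
        if (ch[i].bytes_free < fewest_buffers) {
            fewest_buffers = ch[i].bytes_free;  /* assumed update */
            action = ch[i].action;
            *critical = fewest_buffers < ch[i].alert;
        }
    }
    return action;
}
```

Initializing the running minimum to the largest representable value guarantees that the first eligible channel is always selected, which matches the remark above that the first comparison is always true.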
Having determined, in steps 218-246, which channel and hence ring buffer has the fewest bytes and therefore should be filled, PLAY_AV executes either a critical read operation or an efficient read operation depending on the values assigned to the critical and efficient variables. In addition, the execution of the efficient read operation further depends on two factors: 1) whether there is an upcoming transition between clips, i.e., the end of the current clip is near and the viewer 16 will soon need to retrieve data from a different media file 14; and 2) whether the viewer is coming to an end of the ring buffer from which it is reading. If either of these factors is true, the efficient variable is also true. Thus, if the critical and efficient variables are both FALSE (step 248), PLAY_AV assigns the value of “wait” to the action variable and checks several other conditions to determine if some other value should be assigned to the action variable (step 250). (The conditions are reproduced below in Boolean notation for ease of understanding.)
if (!critical && !efficient)
{
    action = wait;
    if (!videoEOF && vbufsFree >= VID_MIN_READ &&
        ( (vbufsFree >= EFFICIENT_VIDEO_BUFS)
          || ( (nextVideoTA < EFFICIENT_VIDEO_BUFS) && (nextVideoTA > 0) ) ) )
        action = read_VIDEO;
    if (action == wait && !audio1EOF
        && (aring1.abytesFree >= 735)
        && ( (aring1.abytesFree >= EFFICIENT_AUDIO_BYTES)
             || ( ... && (nextAudio1TA > 0) ) ) )
        action = read_AUDIO1;
    if (action != read_VIDEO && !audio2EOF
        && (aring2.abytesFree > 735)
        && ( (aring2.abytesFree >= EFFICIENT_AUDIO_BYTES)
             || ( ... && (nextAudio2TA > 0) ) ) )
    {
        if (action == wait)
            action = read_AUDIO2;
        else /* action is read_AUDIO1 */
        {
            /*
             * Could do either A1 or A2.
             * Do the one with the most
             * empty ring buffer.
             */
            if (aring2.abytesFree > aring1.abytesFree)
                action = read_AUDIO2;
            /* if not, then action is already read_AUDIO1 */
        }
    }
} /* end analysis for non-critical, non-efficient reads */
Depending on the outcome of the analysis above, the action variable has one of three values: read_VIDEO, read_AUDIO1, or read_AUDIO2. In the case of read_VIDEO, PLAY_AV assigns to the variable “vidTrigger” a number of bytes to read from the media file 14 (step 252). However, if that number exceeds the number necessary to match the number of bytes contained in the audio channels, PLAY_AV adjusts the number downward (step 254) so that viewer resources are not tied up reading an excessive amount of video data. (See the discussion of staging above.) Finally, PLAY_AV retrieves the video bytes from the media file 14 and transfers them to the ring buffer 18a (step 256).
In the case of read_AUDIO1, PLAY_AV assigns to the variable max_read a number of bytes to read from the media file 14 (step 258). However, if that number exceeds the number of bytes contained in audio channel 2, PLAY_AV adjusts the number downward (step 260) so that viewer resources are not tied up reading an excessive amount of audio data. Finally, PLAY_AV retrieves the audio bytes from the media file 14 and transfers them to the ring buffer 18b (step 262).
In the case of read_AUDIO2, PLAY_AV assigns to the variable max_read a number of bytes to read from the media file 14 (step 264). However, if that number exceeds the number of bytes contained in audio channel 1, PLAY_AV adjusts the number downward (step 266) so that viewer resources are not tied up reading an excessive amount of audio data. Finally, PLAY_AV retrieves the audio bytes from the media file 14 and transfers them to the ring buffer 18c (step 268).
Having determined into which ring buffer to read data and done so (steps 218-268), PLAY_AV next checks several conditions which might cause the display to stop (step 270), e.g., the viewer reached the end of file for the video data, the viewer reached the end of file for the audio1 or audio2 data, or the user interrupted the display. Finally, PLAY_AV selects the current frame from one of the ring buffers (step 272), and sends a pointer to the frame to the appropriate hardware (step 274), i.e., the video accelerator card 20 or the sound accelerator card 24 depending on whether the frame is video or audio. The hardware plays the frame (step 276) and then interrupts the software (step 278), i.e., PLAY_AV, which then repeats the above-described process.
In order to ensure that the audio and video stay in synch, it is essential that the system read the correct number of audio bytes of data corresponding to the video frame being played. This is especially important where the audio track was digitized independently of the video track. To ensure synchronization, when any audio is digitized, the system stores away in the audio media file the number of video frames associated with that audio. Then, later, when a request for a certain number of frames of audio is made, the system can form a proportion against the original number of video frames and audio bytes to find the correct number of audio bytes needed to agree with the number of video frames in the current request.
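The proportion described above can be sketched in C as follows. The function names and parameters are hypothetical. Computing each request as the difference of two cumulative offsets is a design choice of this sketch, made so that rounding does not accumulate across successive requests; the source does not say how rounding is handled.

```c
#include <stdint.h>

/*
 * Byte offset into the audio data that corresponds to the start of a
 * given video frame, by simple proportion:
 *     offset / total_audio_bytes = frame / total_video_frames
 */
uint64_t audio_offset_for_frame(uint64_t total_audio_bytes,
                                uint64_t total_video_frames,
                                uint64_t frame)
{
    if (total_video_frames == 0)
        return 0;  /* nothing recorded; avoid dividing by zero */
    return total_audio_bytes * frame / total_video_frames;
}

/*
 * Number of audio bytes that accompany `count` video frames starting at
 * frame `first`, taken as the difference of two cumulative offsets.
 */
uint64_t audio_bytes_for_frames(uint64_t total_audio_bytes,
                                uint64_t total_video_frames,
                                uint64_t first, uint64_t count)
{
    return audio_offset_for_frame(total_audio_bytes, total_video_frames,
                                  first + count)
         - audio_offset_for_frame(total_audio_bytes, total_video_frames,
                                  first);
}
```

As an illustrative example, if an audio media file holds 22,050 bytes recorded against 30 video frames, a request for one frame yields 735 bytes. If 1,000 bytes were recorded against 3 frames, the three single-frame requests yield 333, 333, and 334 bytes, which sum to exactly 1,000.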
To ensure efficiency when playing video that has been captured at less than 30 frames per second, the system stores a capture mask with any video media file that has been captured at this lower rate. The capture mask consists of a sequence of 0's and 1's. There are m one-bits and a total of n bits all together, to indicate that only m video frames are present out of every n. When playing this video, the system successively rotates this capture mask one bit to the left. If the high order bit is a 1, this means this is a new frame and we play it. If the bit is a 0, this means this is a repeated frame and we need not play it. The capture mask always ends with a 1, so when it shifts into a word of all 0's, we reload the capture mask.
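The capture mask mechanism can be sketched in C as follows. This is an illustration under assumptions: a 32-bit word is assumed, the mask is assumed to be left-justified in that word and nonzero, and the names `capture_mask_t`, `capture_mask_init`, and `capture_mask_next` are hypothetical. The sketch tests the high order bit and then shifts left, reloading the mask once the working word has become all 0's, which is one reading of the description above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t mask;  /* capture mask, left-justified; last used bit is 1 */
    uint32_t work;  /* working copy, shifted as playback proceeds */
} capture_mask_t;

void capture_mask_init(capture_mask_t *c, uint32_t mask)
{
    c->mask = mask;
    c->work = mask;
}

/*
 * Advance by one frame slot. Returns true if the slot holds a new frame
 * that should be played, false if it is a repeat of the previous frame.
 */
bool capture_mask_next(capture_mask_t *c)
{
    bool is_new;

    if (c->work == 0)
        c->work = c->mask;  /* all bits consumed: reload the mask */
    is_new = (c->work & 0x80000000u) != 0;  /* test the high order bit */
    c->work <<= 1;                          /* move to the next slot */
    return is_new;
}
```

For example, a mask with the bit pattern 10101 in its five high order bits (0xA8000000) describes video in which 3 frames are present out of every 5. Successive calls then report new, repeat, new, repeat, new, after which the mask is reloaded and the pattern repeats.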
The attached microfiche appendix (incorporated herein by reference) embodies the viewer module 16 of FIG. 1. The programming language and compiler used are THINK C version 3.01 by Symantec Corporation, and the computer used is the Macintosh II running under Mac OS version 6.0.2.
Portions of the disclosure of this patent document, including the appendix, contain material which is subject to copyright protection and as to which copyright is claimed. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document as it appears in the Patent and Trademark Office files, but otherwise reserves all copyright rights whatsoever, for example, including but not restricted to the right to load the software on a computer system.
Other embodiments are within the following claims.
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US3609227||Sep 9, 1968||Sep 28, 1971||Ampex||Random access audio/video information retrieval system|
|US4536836 *||Nov 15, 1982||Aug 20, 1985||Storage Technology Corporation||Detection of sequential data stream|
|US4667286 *||Dec 20, 1984||May 19, 1987||Advanced Micro Devices, Inc.||Method and apparatus for transferring data between a disk and a central processing unit|
|US4698664||Mar 4, 1985||Oct 6, 1987||Apert-Herzog Corporation||Audio-visual monitoring system|
|US4882743||Aug 1, 1988||Nov 21, 1989||American Telephone And Telegraph||Multi-location video conference system|
|US4949169||Oct 27, 1989||Aug 14, 1990||International Business Machines Corporation||Audio-video data interface for a high speed communication link in a video-graphics display window environment|
|US4956768 *||Feb 24, 1988||Sep 11, 1990||Etat Francais, Centre National d'Etudes des Telecommunications||Wideband server, in particular for transmitting music or images|
|U.S. Classification||348/472, 725/92, 725/145, 348/484, 709/219, 725/151, 709/251, 725/148|
|International Classification||H04N7/24, H04N7/52, H04N21/43, H04N21/439, H04N21/434, H04N21/2368, H04N21/44, H04N21/654, H04N21/432, H04N21/234, H04N21/6332, H04N5/937, G11B27/034, G11B27/00, H04N5/93, H04N9/802, G11B27/10|
|Cooperative Classification||H04N21/2368, H04N5/937, H04N9/802, H04N21/4341, H04N21/44004, H04N21/23406, H04N21/6332, H04N21/4325, G11B27/034, H04N5/93, G11B27/005, G11B27/10, H04N21/654, H04N21/4398, H04N21/4307|
|European Classification||H04N21/234B, H04N21/2368, H04N21/654, H04N21/6332, H04N21/44B, H04N21/434A, H04N5/937, H04N21/439R, H04N21/43S2, H04N21/432P, G11B27/00V, H04N9/802, G11B27/10, H04N5/93, G11B27/034|