Publication number: US20050154987 A1
Publication type: Application
Application number: US 10/757,138
Publication date: Jul 14, 2005
Filing date: Jan 14, 2004
Priority date: Jan 14, 2004
Also published as: CN1910580A, CN100538698C, EP2107477A2, EP2107477A3, US7406409, US20050154973
Publication number: 10757138, 757138, US 2005/0154987 A1, US 2005/154987 A1, US 20050154987 A1, US 20050154987A1, US 2005154987 A1, US 2005154987A1, US-A1-20050154987, US-A1-2005154987, US2005/0154987A1, US2005/154987A1, US20050154987 A1, US20050154987A1, US2005154987 A1, US2005154987A1
Inventors: Isao Otsuka, Ajay Divakaran, Masaharu Ogawa, Kazuhiko Nakane
Original Assignee: Isao Otsuka, Ajay Divakaran, Masaharu Ogawa, Kazuhiko Nakane
System and method for recording and reproducing multimedia
US 20050154987 A1
Abstract
A system and method summarizes multimedia stored in a compressed multimedia file partitioned into a sequence of segments, where the content of the multimedia is, for example, video signals, audio signals, text, and binary data. An associated metadata file includes index information and an importance level for each segment. The importance information is continuous over a closed interval. An importance level threshold is selected in the closed interval, and only segments of the multimedia having a particular importance level greater than the importance level threshold are reproduced.
Claims(78)
1. A system for summarizing multimedia, comprising:
storage for storing a compressed multimedia file partitioned into a sequence of segments, and a metadata file including index information and an importance level information for each segment in the sequence, the importance level being continuous over a closed interval;
unit for selecting an importance level threshold in the closed interval; and
unit for reproducing, using the index information, only segments of the multimedia having a particular importance level greater than the importance level threshold.
2. The system of claim 1, in which the sequence of the segments is temporal, and the index information includes a start time and an end time of each segment.
3. The system of claim 1, in which the sequence of the segments is temporal, and the index information includes a frame number.
4. The system of claim 1, in which the multimedia is compressed.
5. The system of claim 1, in which the multimedia includes video and audio signals.
6. The system of claim 1, in which the importance level is contained in a file that is distinct from the multimedia file.
7. The system of claim 1, in which the importance level is a real number.
8. The system of claim 1, in which the multimedia comprises text and binary data.
9. The system of claim 1, in which the importance level threshold is expressed as a range of real number values.
10. The system of claim 1, in which the importance level threshold is expressed as a plurality of ranges of real number values.
11. The system of claim 1, in which the importance level threshold is viewer selected.
12. The system of claim 1, in which the importance level threshold is selected automatically.
13. The system of claim 1, in which only segments of the multimedia having a particular importance level less than the importance level threshold are reproduced.
14. The system of claim 1, in which the multimedia file includes a plurality of programs, and further comprising:
unit for reproducing only segments of the plurality of programs having a particular importance greater than the importance level threshold.
15. The system of claim 1, further comprising:
unit for specifying an abstraction ratio, the abstraction ratio representing the importance level threshold.
16. The system of claim 1, in which the segments are ordered according to the importance level, and further comprising:
unit for reproducing the segments in a descending order of the importance level.
17. The system of claim 1, in which the reproducing terminates after a predetermined amount of time.
18. The system of claim 1, further comprising:
recorder for recording the compressed multimedia and the metadata file on the storage.
19. The system of claim 1, in which only segments greater than a time threshold are reproduced.
20. The system of claim 19, in which the segments shorter than the time threshold are extended to satisfy the time threshold.
21. The system of claim 20, in which the extending is additive.
22. The system of claim 20, in which the extending is multiplicative.
23. The system of claim 1, further comprising:
unit for searching the multimedia to locate a particular segment to begin the reproducing.
24. The system of claim 1, in which the storage is an optical storage disk.
25. The system of claim 1, in which the storage is a magnetic storage device.
26. The system of claim 1, further comprising:
unit for extracting the importance level and the indexing information while decoding the multimedia file.
27. A method for summarizing multimedia, comprising:
storing a compressed multimedia file partitioned into a sequence of segments;
storing a metadata file including index information and an importance level for each segment in the sequence, the importance level being continuous over a closed interval;
selecting an importance level threshold in the closed interval; and
reproducing, using the index information, only segments of the multimedia having a particular importance level greater than the importance level threshold.
28. The method of claim 27, in which the sequence of the segments is temporal, and the index information includes a start time and an end time of each segment.
29. The method of claim 27, in which the sequence of the segments is temporal, and the index information includes a frame number.
30. The method of claim 27, further comprising:
compressing the multimedia.
31. The method of claim 27, in which the multimedia includes video and audio signals.
32. The method of claim 27, in which the importance level is contained in a file that is distinct from the multimedia file.
33. The method of claim 27, in which the importance level is a real number.
34. The method of claim 27, in which the multimedia comprises multiplexed video and audio signals.
35. The method of claim 27, in which the importance level threshold is expressed as a range of real number values.
36. The method of claim 27, in which the importance level threshold is expressed as a plurality of ranges of real number values.
37. The method of claim 27, in which the importance level threshold is viewer selected.
38. The method of claim 27, in which the importance level threshold is selected automatically.
39. The method of claim 27, in which only segments of the multimedia having a particular importance level less than the importance level threshold are reproduced.
40. The method of claim 27, in which the multimedia file includes a plurality of programs, and further comprising:
reproducing only segments of the plurality of programs having a particular importance level greater than the importance level threshold.
41. The method of claim 27, further comprising:
specifying an abstraction ratio, the abstraction ratio representing the importance level threshold.
42. The method of claim 27, in which the segments are ordered according to the importance level, and further comprising:
reproducing the segments in a descending order of the importance level.
43. The method of claim 27, in which the reproducing terminates after a predetermined amount of time.
44. The method of claim 27, further comprising:
recording the compressed multimedia and the metadata file on the storage.
45. The method of claim 27, in which only segments greater than a time threshold are reproduced.
46. The method of claim 45, in which the segments shorter than the time threshold are extended to satisfy the time threshold.
47. The method of claim 46, in which the extending is additive.
48. The method of claim 46, in which the extending is multiplicative.
49. The method of claim 27, further comprising:
searching the multimedia to locate a particular segment to begin the reproducing.
50. The method of claim 27, in which the multimedia file and the metadata file are stored on an optical storage disk.
51. The method of claim 27, in which the multimedia file and the metadata file are stored on a magnetic storage device.
52. The method of claim 27, further comprising:
extracting the importance level and the indexing information while decoding the multimedia file.
53. A computer readable medium, comprising:
a compressed multimedia file partitioned into a sequence of segments; and
a metadata file including index information and an importance level information for each segment in the sequence, the importance information being continuous over a closed interval, the compressed multimedia file and the metadata file, when read by a computer using the index information, cause the computer to reproduce only segments of the multimedia having a particular importance level greater than an importance level threshold.
54. The medium of claim 53, in which the sequence of the segments is temporal, and the index information includes a start time and an end time of each segment.
55. The medium of claim 53, in which the sequence of the segments is temporal, and the index information includes a frame number.
56. The medium of claim 53, in which the multimedia is compressed.
57. The medium of claim 53, in which the multimedia includes video and audio.
58. The medium of claim 53, in which the importance level information is contained in a file that is distinct from the multimedia file.
59. The medium of claim 53, in which the importance level is a real number.
60. The medium of claim 53, in which the multimedia comprises multiplexed video and audio signals.
61. The medium of claim 53, in which the segments are ordered according to the importance level.
62. The medium of claim 53 is an optical storage disk.
63. The medium of claim 53 is a magnetic storage device.
64. The medium of claim 53, further comprising:
flags for indicating a validity of the metadata.
65. A disc recorder, comprising:
recorder for recording an inputted video signal or audio signal on a predetermined recording medium;
unit for partitioning the video signal or audio signal into predetermined segments to extract a feature from the video signal or a feature from the audio signal for each segment; and
unit for generating metadata including feature data corresponding to the features and start positions of the segments,
wherein the recorder records the metadata on the recording medium in association with the segments.
66. The disc recorder according to claim 65, in which the predetermined recording medium further comprises:
a first directory for storing files corresponding to the metadata; and
a second directory for storing files corresponding to the segments.
67. The disc recorder according to claim 65, further comprising:
comparator for performing comparison between a value corresponding to the feature data and a predetermined threshold;
unit for searching the segments recorded on the recording medium for a segment that matches a result from the comparison; and
unit for reproducing video or audio corresponding to the segment retrieved by the unit for searching.
68. The disc recorder according to claim 67, in which the unit for searching searches for a segment that corresponds to the feature data having a value larger than the threshold as a result of the comparison by the comparator.
69. The disc recorder according to claim 67, in which the comparator performs comparison between a reproducing time of the video corresponding to the segment retrieved by the unit for searching and a predetermined threshold; and
in a case where the reproducing time has a value smaller than the predetermined threshold as a result of the comparison by the comparator, the apparatus for browsing video does not reproduce the video or audio corresponding to the retrieved segment.
70. The disc recorder according to claim 67, in which the comparator performs comparison between a reproducing time of the video corresponding to the segment retrieved by the searcher and a predetermined threshold; and
in a case where the reproducing time has a value smaller than the predetermined threshold as a result of the comparison by the comparator, the apparatus for browsing video adjusts the reproducing time such that the reproducing time of video or audio reproduced by including the video or audio corresponding to the segment becomes equal to or larger than the predetermined threshold.
71. A method for recording, comprising:
recording an inputted video signal or audio signal on a predetermined recording medium;
partitioning the video signal or audio signal into predetermined segments to extract a feature from the video signal or a feature from the audio signal for each segment;
generating metadata including feature data corresponding to the features and start positions of the segments; and
upon the recording, recording the metadata on the recording medium in association with the segments.
72. The method according to claim 71, further comprising:
comparing a value corresponding to the feature data to a predetermined threshold;
searching the segments recorded on the recording medium for a segment that matches a result from the comparing; and
reproducing video or audio corresponding to the segment retrieved by the searching.
73. A disc player comprising:
unit for extracting the feature data from the metadata recorded on a recording medium;
comparator for performing a comparison between a value corresponding to the feature data and a predetermined threshold;
unit for searching among the segments recorded on the recording medium for a segment corresponding to a result of the comparison; and
unit for reproducing video or audio corresponding to the segment searched by the unit for searching.
74. The disc player according to claim 73, wherein the unit for searching searches for the segment that corresponds to the feature data having a value larger than the predetermined threshold.
75. The disc player according to claim 73, wherein:
the comparator performs comparison between a reproducing time of the video corresponding to the segment searched by the unit for searching and another predetermined threshold; and
in a case where the reproducing time has a value smaller than said another predetermined threshold, the disc player does not reproduce the video or audio corresponding to the searched segment.
76. The disc player according to claim 73, wherein:
the comparator performs comparison between a reproducing time of the video corresponding to the segment searched by the searcher and another predetermined threshold; and
in a case where the reproducing time has a value smaller than said another predetermined threshold, the disc player adjusts the reproducing time to become equal to or larger than said another predetermined threshold.
77. A method for playing video, comprising:
extracting the feature data from the metadata recorded on a recording medium;
performing comparison between a value corresponding to the feature data and a predetermined threshold;
searching among the segments recorded on the recording medium for a segment corresponding to a result of the comparison; and
reproducing video or audio corresponding to the segment searched by the searching.
78. A recording medium comprising:
a first directory for storing files corresponding to a segment which is generated by partitioning an inputted audio signal and an inputted video signal;
a second directory for storing files corresponding to a metadata which is generated based on the inputted audio signal or the inputted video signal, and corresponding to each segment.
Description
FIELD OF THE INVENTION

This invention relates generally to processing multimedia, and more particularly to recording video signals, audio signals, text, and binary data on storage media, and for reproducing selected portions of the multimedia.

BACKGROUND OF THE INVENTION

In order to quickly review and analyze a video, for example a movie, a recorded sporting event or a news broadcast, a summary of the video can be generated. A number of techniques are known for summarizing uncompressed and compressed videos.

The conventional practice is to first segment the video into scenes or ‘shots’, and then to extract low and high level features. The low level features are usually based on syntactic characteristics such as color, motion, and audio components, while the high level features capture semantic information.

The features are then classified, and the shots can be further segmented according to the classified features. The segments can be converted to short image sequences, for example, one- or two-second ‘clips’ or ‘still’ frames, and labeled and indexed. Thus, the reviewer can quickly scan the summary to select portions of the video to play back in detail. Obviously, the problem with such summaries is that the playback can only be based on the features and classifications used to generate the summary.

In order to further assist the review, the segments can be subjectively rank ordered according to a relative importance. Thus, important events in the video, such as climactic scenes, or goal scoring opportunities can be quickly identified, see, Fujiwara et al. “Abstractive Description of Video Using Summary DS,” Point-illustrated Broadband+Mobile Standard MPEG Textbook, ASCII Corp., p. 177 FIGS. 5-24 Feb. 11, 2003, also “ISO/IEC 15938-5: 2002 Information technology—Multimedia content description interface—Part 5: Multimedia Description Schemes,” 2002. After an important video segment has been located, the viewer can use fast-forward or fast-reverse capabilities of the playback device to view segments of interest, see “DVR-7000 Instruction Manual,” Pioneer Co., Ltd., p. 49, 2001.

Another technique for summarizing a news video uses motion activity descriptors, see U.S. patent application Ser. No. 09/845,009, titled “Method for Summarizing a Video Using Motion Descriptors,” filed by Divakaran, et al., on Apr. 27, 2001. A technique for generating soccer highlights uses a combination of video and audio features, see U.S. patent application Ser. No. 10/046,790, titled “Summarizing Videos Using Motion Activity Descriptors Correlated with Audio Features,” filed by Cabasson, et al., on Jan. 15, 2002. Audio and video features can also be used to generate highlights for news, soccer, baseball and golf videos, see U.S. patent application Ser. No. 10/374,017, titled “Method and System for Extracting Sports Highlights from Audio Signals,” filed by Xiong, et al., on Feb. 25, 2003. Those techniques extract key segments of notable events from the video, such as a scoring opportunity or an introduction to a news story. The original video is thus represented by an abstract that includes the extracted key segments. The key segments can provide entry points into the original content and thus allow flexible and convenient navigation.

There are a number of problems with prior art video recording, summarization and playback. First, the summary is based on some preconceived notion of the extracted features, classifications, and importance, instead of those of the viewer. Second, if importance levels are used, the importance levels are usually quantized to a very small number of levels, for example, five or less. More often, only two levels are used, i.e., the interesting segments that are retained, and the rest of the video that is discarded.

In particular, the hierarchical description proposed in the MPEG-7 standard is very cumbersome if a fine quantization of the importance is used because the number of levels in the hierarchy becomes very large, which in turn requires management of too many levels.

The MPEG-7 description requires editing of the metadata whenever the content is edited. For example, if a segment is cut out of the original content, all the levels affected by the cut need to be modified. That can get cumbersome quickly as the number of editing operations increases.

The importance levels are highly subjective, and highly context dependent. That is, the importance levels for sports videos depend on the particular sports genre, and are totally inapplicable to movies and news programs. Further, the viewer has no control over the length of the summary to be generated.

The small number of subjective levels used by the prior art techniques makes it practically impossible for the viewer to edit and combine several different videos based on the summaries to generate a derivative video that reflects the interests of the viewer.

Therefore, there is a need to record and reproduce a video in a manner that can be controlled by the viewer. Furthermore, there is a need for specifying importance levels that are content independent, and not subjective. In addition, there is a need to provide more than a small number of discrete importance levels. Lastly, there is a need to enable the viewer to generate a summary of any length, depending on a viewer-selected level of importance.

SUMMARY OF THE INVENTION

A system and method summarizes multimedia stored in a compressed multimedia file partitioned into segments.

An associated metadata file includes index information and importance level information for each segment in the sequence. In a preferred embodiment, the files are stored on a storage medium such as a DVD.

The importance information is continuous over a closed interval. An importance level threshold, or range, is selected in the closed interval. The importance level can be viewer selected.

When the files are read, only segments of the multimedia having a particular importance level greater than the importance level threshold are reproduced.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a system for reproducing multimedia according to the invention;

FIG. 2 is a block diagram of a file structure for multimedia according to the invention;

FIG. 3 is a block diagram of a data structure of a metadata file according to the invention;

FIG. 4 is a block diagram of indexing the multimedia according to the invention using the metadata file;

FIG. 5 is a graph representing an abstractive reproduction according to the invention;

FIG. 6A is a graph of an alternative abstractive reproduction according to the invention;

FIG. 6B is a graphics image representing an abstraction ratio;

FIG. 7 is a block diagram of a system for recording compressed multimedia files and metadata files on a storage medium according to the invention;

FIG. 8 is a graph of an alternative abstractive reproduction according to the invention;

FIG. 9 is a graph of an alternative abstractive reproduction according to the invention;

FIG. 10 is a graph of an alternative abstractive reproduction according to the invention; and

FIG. 11 is a block diagram of a system for recording multimedia according to the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Reproducing System Structure

FIG. 1 shows a system 100 for reproducing multimedia, where the content of the multimedia is, for example, video signals, audio signals, text, and binary data. The system includes a storage media 1, such as a disc or tape, for persistently storing multimedia and metadata organized as files in directories. In the preferred embodiment, the multimedia is compressed using, e.g., MPEG and AC-3 standards. The multimedia has been segmented, classified, and indexed using known techniques. The indexing can be based on time or frame number, see U.S. Pat. No. 6,628,892, incorporated herein by reference.

The metadata includes index and importance information. As an advantage of the present invention, and in contrast with the prior art, the importance information is continuous over a closed interval, e.g., [0, 1] or [0, 100]. Therefore, the importance level is not expressed in terms of ‘goal’ or ‘head-line-news-time’, but rather as a real number, e.g., the importance is 0.567 or +73.64.

As an additional advantage, the continuous importance information is context and content independent, and not highly subjective as in the prior art. Both of these features enable a viewer to reproduce the multimedia to any desired length.
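The link between a continuous importance level and a summary of any desired length can be illustrated with a minimal sketch. It is not part of the described system: the `Segment` class, the function names, and the choice to search only the importance values that actually occur are assumptions made for illustration. It keeps a segment only when its importance is strictly greater than the threshold, as the text describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical segment record: times in seconds, importance in [0, 1]."""
    start: float
    end: float
    importance: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def summary_length(segments, threshold):
    """Total duration of the segments whose importance exceeds the threshold."""
    return sum(s.duration for s in segments if s.importance > threshold)


def threshold_for_length(segments, target_seconds):
    """Return the lowest occurring importance value that, used as a threshold,
    yields a summary no longer than target_seconds.

    Assumption: only importance values present in the data are tried, so the
    lowest-ranked segments are always dropped.
    """
    # Lower thresholds keep more material, so scan from the lowest upward
    # and stop at the first threshold whose summary fits the budget.
    for candidate in sorted({s.importance for s in segments}):
        if summary_length(segments, candidate) <= target_seconds:
            return candidate
    raise ValueError("no segments given")


if __name__ == "__main__":
    program = [
        Segment(0, 10, 0.2),
        Segment(10, 30, 0.9),
        Segment(30, 40, 0.5),
        Segment(40, 70, 0.7),
    ]
    print(threshold_for_length(program, 50))
```

Because the importance values are real numbers rather than a handful of discrete levels, a finer set of candidate thresholds, and therefore a finer choice of summary lengths, is available.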

The metadata can be binary or text, and if necessary, protected by encryption. The metadata can include file attributes such as dates, validity codes, file types, etc. The hierarchical file and directory structure for the multimedia and metadata is described with respect to FIG. 2.

As shown in FIG. 1, a reader drive 10 reads the multimedia and metadata files from the storage media 1. A read buffer 11 temporarily stores data read by the reader drive 10. A demultiplexer 12 acquires, sequentially, multimedia data from the read buffer, and separates the multimedia data into a video stream and an audio stream.

A video decoder 13 processes a video signal 17, and an audio decoder 14 processes the audio signal 18 for an output device, e.g., a television monitor 19.

A metadata analyzing section 15 acquires sequentially metadata from the read buffer 11. A reproduction control section 16, including a processor, controls the system 100. The functionality of the metadata analyzing section 15 can be implemented with software, and can be incorporated as part of the reproduction control section 16.

It should be noted that for any implementation described herein the multimedia files and the metadata files do not need to be recorded and reproduced concurrently. In fact, the metadata file can be analyzed independently to enable the viewer to quickly locate segments of interest in the multimedia files. In addition, the multimedia and the metadata can be multiplexed into a single file, and demultiplexed when read.

File and Directory Structure

FIG. 2 shows the hierarchical structure 200 of the files and directories stored on the media 1. A root directory 20 includes a multimedia directory 21 and a metadata directory 22. The multimedia directory 21 stores information management files 23, multimedia files 24, and backup files 25. The metadata directory 22 stores metadata files 26. It should be noted that other directory and file structures are possible. The data in the multimedia files 24 contains the multiplexed video and/or audio signals.

Note that the information management files 23 and/or the multimedia data files 24 can include flags indicating the presence, absence, or invalidity of the metadata.

Metadata Structure

FIG. 3 shows the hierarchical structure 300 of the metadata files 26. There are five levels A-E in the hierarchy, including metadata 30 at a highest level, followed by management information 31, general information 32, shot information 33, and index and importance information 34.

The metadata managing information 31 at level B includes a comprehensive description 31 a of the overall metadata 30, video object (VOB) metadata information search pointer entries 31 b, and associated VOB information entries 31 c. The associations do not need to be one-to-one; for instance, there can be multiple pointers 31 b for one information entry 31 c, or one information entry for multiple VOBs, or none at all.

At the next level C, each VOB information entry 31 c includes metadata general information 32 a, and video shot map information 32 b. The metadata general information 32 a can include program names, producer names, actor/actress/reporter/player names, an explanation of the content, broadcast date, time, and channel, and so forth. The exact correspondences are stored as a table in the general information entry 32 a.

At the next level D, for each video shot map information entry 32 b there is video shot map general information 33 a, and one or more video shot entries 33 b. As above, there does not need to be a one-to-one correspondence between these entries. The exact correspondences are stored as a table in the general information entry 33 a.

At the next level E, for each video shot entry 33 b, there are start time information 34 a, end time information 34 b, and an importance level 34 c. As stated above, frame numbers can also index the multimedia. The index information can be omitted if the index data can be obtained from the video shot reproducing time information 34 a. Any ranking system can be used for indicating the relative importance. As stated above, the importance level can be continuous and content independent. The importance level can be added manually or automatically.
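The five-level hierarchy of FIG. 3 could be modeled, for illustration only, with nested records as in the following sketch. The class and field names are hypothetical, the general information is reduced to a plain dictionary, and the search pointers are simplified to a mapping from a VOB number to a position in the entry list; the actual on-disc layout is not specified here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VideoShotEntry:
    """Level E: start time 34 a, end time 34 b, importance level 34 c."""
    start_time: float
    end_time: float
    importance: float


@dataclass
class VideoShotMap:
    """Level D: shot map general information 33 a and shot entries 33 b."""
    general_info: Dict[str, str] = field(default_factory=dict)
    shots: List[VideoShotEntry] = field(default_factory=list)


@dataclass
class VobInfoEntry:
    """Level C: metadata general information 32 a and shot map 32 b."""
    general_info: Dict[str, str] = field(default_factory=dict)
    shot_map: VideoShotMap = field(default_factory=VideoShotMap)


@dataclass
class Metadata:
    """Levels A and B: description 31 a, search pointers 31 b, entries 31 c."""
    description: str = ""
    # Assumed simplification: VOB number -> index into vob_entries.
    search_pointers: Dict[int, int] = field(default_factory=dict)
    vob_entries: List[VobInfoEntry] = field(default_factory=list)

    def shots_for_vob(self, vob_number: int) -> List[VideoShotEntry]:
        """Follow a search pointer to the shots of one VOB (empty if absent)."""
        index = self.search_pointers.get(vob_number)
        if index is None:
            return []
        return self.vob_entries[index].shot_map.shots
```

Several pointers may map to the same entry index, which mirrors the statement that the associations need not be one-to-one.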

Multimedia Indexing

FIG. 4 shows the relationship between the multimedia recorded and reproduced according to the invention, and the metadata. Program chain information 40 stored in the management information file 23 describes a sequence for reproducing multimedia of a multimedia data file 24. The chain information includes programs 41 based on a reproducing unit as defined by the program chain information 40. Cells 42 a-b are based on a reproducing unit as defined by the program 41. In digital versatile disk (DVD) type of media, a ‘cell’ is a data structure to represent a portion of a video program.

Video object information 43 a-b describes a reference destination of the actual video or audio data corresponding to the reproducing time information, i.e., presentation time, designated by the cell 42 described in the management information file 23.

Map tables 44 a-b are for offsetting the reproducing time information defined by the VOB information 43 and converting the same into actual video data or audio data address information. Video object units (VOBU) 45 a and 45 b describe the actual video or audio data in the multimedia data file 24. These data are multiplexed in a packet structure, together with the reproducing time information. The VOBUs are the smallest units for accessing and reproducing the multimedia. A VOBU includes one or more group-of-pictures (GOP) of the content.

Importance Threshold Based Reproduction

FIG. 5 shows the abstractive reproduction according to the invention, where the horizontal axis 51 defines time and the vertical axis 50 defines an importance level. As shown in FIG. 5, the importance level varies continuously over a closed interval 55, e.g., [0, 1] or [0, 100]. Also, as shown, the importance level threshold 53 can be varied 56 by the viewer over the interval 55.

The time is in terms of the video-shot start time information 34 a and the video-shot end time information 34 b of FIG. 3. The importance is in terms of the video-shot importance level 34 c. An example importance curve 52 is evaluated according to an importance threshold 53.

During a reproduction of the multimedia, portions of the multimedia that have an importance greater than the threshold 53 are reproduced 58 while portions that have an importance less than the threshold are skipped 59. The curve 54 indicates the portions that are included in the reproduction. The reproduction is accomplished using the reproducing control section 16 based on the metadata information obtained from the metadata analyzing section 15.
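The reproduce-or-skip decision described above can be sketched as follows. This is a hedged illustration, not the reproduction control section itself: shots are assumed to be `(start, end, importance)` tuples sorted by start time, and merging back-to-back kept shots into one interval is an added convenience that the text does not require.

```python
def select_segments(shots, threshold):
    """Return the (start, end) intervals to reproduce.

    shots: list of (start, end, importance) tuples sorted by start time.
    A shot is reproduced only if its importance is strictly greater than
    the threshold; all other shots are skipped.
    """
    kept = []
    for start, end, importance in shots:
        if importance <= threshold:
            continue  # skipped portion
        if kept and kept[-1][1] == start:
            # Adjacent kept shots are merged into one continuous interval.
            kept[-1] = (kept[-1][0], end)
        else:
            kept.append((start, end))
    return kept


if __name__ == "__main__":
    shots = [(0, 5, 0.1), (5, 12, 0.8), (12, 20, 0.6), (20, 30, 0.3), (30, 35, 0.9)]
    print(select_segments(shots, 0.5))
```

Raising the threshold shortens the list of intervals, and lowering it lengthens the list, which is how the viewer-controlled threshold changes the length of the reproduction.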

It should be noted that multiple continuous importance levels, or one or more importance level ranges can be specified so that only segments having a particular importance according to the real number values in the importance ranges are reproduced. Alternatively, only the least important segments can be reproduced.
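Selection by one or more importance ranges might look like the sketch below. The tuple layout and the use of inclusive range bounds are assumptions; the text only states that ranges of real number values can be specified.

```python
def select_by_ranges(shots, ranges):
    """Return (start, end) for shots whose importance lies in any range.

    shots:  list of (start, end, importance) tuples.
    ranges: list of (low, high) tuples, treated here as inclusive bounds.
    """
    return [
        (start, end)
        for start, end, importance in shots
        if any(low <= importance <= high for low, high in ranges)
    ]


if __name__ == "__main__":
    shots = [(0, 5, 0.1), (5, 12, 0.8), (12, 20, 0.6), (20, 30, 0.3), (30, 35, 0.9)]
    # Two ranges at once: the least important and the most important shots.
    print(select_by_ranges(shots, [(0.0, 0.2), (0.85, 1.0)]))
```

Passing a single low range, such as `[(0.0, 0.2)]`, reproduces only the least important segments.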

To reproduce a desired program, the information management file 23 is read by the reader drive 10. This allows one to determine that the program is configured as, e.g., two cells.

Each cell is described by a VOB number and index information, e.g., a start and end time. The time map table 44 a for the VOB1 information 43 a is used to convert each presentation time to a presentation time stamp (PTS), or address information in the VOB1 concerned, thus obtaining an actual VOBU 45.

Likewise, the cell-2 42 b is obtained as a group of VOBUs 45 b of VOB2 by using the time map table 44 b of the VOB2 information 43 b. In this example, the cell 42 b is indexed by the VOB 43 b using the time map table 44 b.
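
The role of a time map table can be illustrated with a simple lookup. The sketch below assumes a hypothetical table layout, one `(start_time, byte_address)` entry per VOBU sorted by time; the actual on-disc format is not specified here and is likely to differ.

```python
import bisect
from typing import List, Tuple

# Hypothetical time map: one (start_time_seconds, byte_address) entry per VOBU,
# sorted by start time.
TimeMap = List[Tuple[float, int]]


def vobu_index_for_time(time_map: TimeMap, t: float) -> int:
    """Return the index of the VOBU whose time span contains t."""
    starts = [start for start, _ in time_map]
    i = bisect.bisect_right(starts, t) - 1
    if i < 0:
        raise ValueError("time precedes the first VOBU")
    return i


def cell_to_addresses(time_map: TimeMap, cell_start: float, cell_end: float) -> List[int]:
    """Convert a cell's start and end times into the addresses of its VOBUs."""
    first = vobu_index_for_time(time_map, cell_start)
    last = vobu_index_for_time(time_map, cell_end)
    return [addr for _, addr in time_map[first:last + 1]]


vob1_map = [(0.0, 0), (0.5, 2048), (1.0, 6144), (1.5, 8192)]
print(cell_to_addresses(vob1_map, 0.6, 1.2))  # [2048, 6144]
```

Because the VOBU is the smallest accessible unit, the lookup rounds the cell boundaries outward to whole VOBUs.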

The data of the VOBUs 45 are provided sequentially for demultiplexing and decoding. The video signal 17 and the audio signal 18 are synchronized using the presentation time (PTM) and provided to the output device 19.

When the viewer selects a desired program, e.g., program 1 41, the cells 42 a-b that contain the configuration of the relevant program 41 can be found using the program chain information 40. The program chain information is thus used to find the corresponding VOB as well as the presentation time (PTM).

The metadata 26 described in FIG. 4 is used as follows, and as illustrated in FIG. 3. First, the metadata information management information 31 a is used to locate the metadata information search pointer 31 b corresponding to the desired VOB number. Then, the search pointer 31 b is used to locate the VOB metadata information 31 c. The VOB metadata includes video shot map information, which in turn includes the start time, stop time and importance level of each of the video shots. Thus, the VOB metadata is used to collect all the shots that have a presentation time (PTM) included in the range specified by the start time and end time of the cell, as well as their corresponding importance levels. Then, only those portions that exceed the desired importance level 53 are retained.
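
The lookup chain described above can be sketched with plain dictionaries standing in for the metadata structures. All names and the in-memory layout are hypothetical, and the sketch assumes that a shot belongs to a cell when it lies entirely within the cell's time range.

```python
from typing import Dict, List, Tuple

Shot = Tuple[float, float, float]  # (start PTM, end PTM, importance level)

# Hypothetical stand-ins for the search pointers 31 b and VOB metadata 31 c.
search_pointers: Dict[int, str] = {1: "vob1_meta", 2: "vob2_meta"}
vob_metadata: Dict[str, List[Shot]] = {
    "vob1_meta": [(0.0, 20.0, 0.3), (20.0, 45.0, 0.9), (45.0, 60.0, 0.6)],
    "vob2_meta": [(0.0, 30.0, 0.7)],
}


def shots_to_reproduce(vob_number: int, cell_start: float, cell_end: float,
                       threshold: float) -> List[Shot]:
    """Follow VOB number -> search pointer -> shot map, then filter the shots."""
    pointer = search_pointers[vob_number]   # locate the search pointer
    shot_map = vob_metadata[pointer]        # locate the VOB metadata
    # Collect the shots that fall within the cell's time range.
    in_cell = [s for s in shot_map if s[0] >= cell_start and s[1] <= cell_end]
    # Retain only the shots that exceed the desired importance level.
    return [s for s in in_cell if s[2] > threshold]


print(shots_to_reproduce(1, 0.0, 50.0, 0.5))  # [(20.0, 45.0, 0.9)]
```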

It should be noted that multiple programs can be selected for reproduction, and any number of techniques are possible to concatenate only the reproduced segments.

Alternative Abstractive Reproduction

FIG. 6A shows an alternative abstractive reproduction according to the invention, where the vertical axis 50 defines an importance level, the horizontal axis 51 defines time, and the continuous curve 52 indicates importance levels. Line 63 is an importance level threshold, and line 64 is a reproduction for only those segments that have a particular importance greater than the threshold. Other segments are skipped.

Abstraction Ratio

FIG. 6B shows an abstraction ratio 60. The abstraction ratio can vary, e.g., from 0% to 100%, i.e., over the entire interval 55. The abstraction ratio is shown as a graphics image superposed on an output image on the output device 19, which can be a playback device. A portion 61 is a current abstraction ratio that is user selectable. The threshold 63 is set according to the user selectable current abstraction ratio 61. The user can set the abstraction ratio using some input device, e.g., a keyboard or remote control 17 a, see FIG. 1. If the abstraction ratio is 100%, then the entire multimedia file is reproduced; a ratio of 50% reproduces only half of the file. The abstraction ratio can be changed during the reproduction. It should be noted that the graphics image can have other forms, for example, a sliding bar, or a numerical display in terms of the ratio or actual time. Alternatively, the abstraction ratio can be varied automatically by the metadata analyzing section 15 or the reproducing control section 16.
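
The text does not state how the threshold is derived from the abstraction ratio. One plausible mapping, sketched below purely as an assumption, picks the highest threshold for which the reproduced duration still reaches the requested fraction of the total duration. The function names are hypothetical, and a threshold of negative infinity is used to mean "reproduce everything".

```python
from typing import List, Tuple

Segment = Tuple[float, float, float]  # (start, end, importance)


def reproduced_duration(segments: List[Segment], threshold: float) -> float:
    """Total duration of the segments with importance greater than threshold."""
    return sum(end - start for start, end, imp in segments if imp > threshold)


def threshold_for_ratio(segments: List[Segment], ratio: float) -> float:
    """Highest threshold whose reproduction covers at least ratio of the total."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    total = sum(end - start for start, end, _ in segments)
    target = ratio * total
    # Try each distinct importance level from high to low, then "everything".
    candidates = sorted({imp for _, _, imp in segments}, reverse=True)
    candidates.append(float("-inf"))
    for threshold in candidates:
        if reproduced_duration(segments, threshold) >= target:
            return threshold
    return float("-inf")


segments = [(0.0, 10.0, 0.2), (10.0, 30.0, 0.8), (30.0, 40.0, 0.5), (40.0, 60.0, 0.9)]
print(threshold_for_ratio(segments, 0.5))  # 0.5
```

With a ratio of 50% the target is 30 seconds of a 60 second file; a threshold of 0.8 yields only 20 seconds, so the sketch falls back to 0.5, which yields 40 seconds. Because segments are indivisible in this sketch, the reproduced length can overshoot the requested ratio.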

It should be noted that pointers to the video segments can be sorted in a list according to a descending order of importance. Thus, it is possible to obtain a summary of any desired length by going down the list in the sorted order, including segments until a time length requirement is met.
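
The sorted-list approach can be sketched as follows. The function name `summarize` is hypothetical, and restoring chronological order before playback is an added assumption rather than something the text specifies.

```python
from typing import List, Tuple

Segment = Tuple[float, float, float]  # (start, end, importance)


def summarize(segments: List[Segment], target_length: float) -> List[Segment]:
    """Walk the segments in descending importance until the length is met."""
    ranked = sorted(segments, key=lambda s: s[2], reverse=True)
    chosen: List[Segment] = []
    total = 0.0
    for seg in ranked:
        if total >= target_length:
            break
        chosen.append(seg)
        total += seg[1] - seg[0]
    # Assumption: play the chosen segments back in chronological order.
    return sorted(chosen, key=lambda s: s[0])


segments = [(0.0, 10.0, 0.2), (10.0, 30.0, 0.8), (30.0, 40.0, 0.5), (40.0, 60.0, 0.9)]
print(summarize(segments, 35.0))  # [(10.0, 30.0, 0.8), (40.0, 60.0, 0.9)]
```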

Recording System Structure

FIG. 7 shows a block diagram of a system 700 for recording compressed multimedia files and metadata files on storage media 2, such as a disc or tape. The system includes a video encoder 71 and an audio encoder 72, which take as input video signals 78, audio signals 79, text, images, binary data, and the like. The outputs of the encoders are multiplexed 73 and stored temporarily in a write buffer 74 as multimedia data. The outputs are also passed to a metadata generating section 75, which also writes output to the write buffer.

A write drive 70 then writes the multimedia and the metadata to the storage media 2 as files under control of a recording control section 76, which includes a processor. The files can be written in a compressed format using standard multimedia compression techniques such as MPEG and AC-3. Encryption can also be used during the recording. It should be noted that the metadata generating section 75 can be implemented as software incorporated in recording control section 76.

The encoders extract features from the input signals 78-79, e.g., motion vectors, color histograms, audio frequencies, characteristics, and volumes, and speech related information. The extracted features are analyzed by the metadata generating section 75 to determine segments and their associated index information and importance levels.

It should be noted that, for any implementation, the multimedia files and the metadata files do not need to be generated concurrently. For example, the metadata can be generated at a later time, and metadata can be added incrementally over time.

Time Threshold Based Reproduction

FIG. 8 shows an alternative reproduction according to the invention in an abstract manner where the vertical axis 50 defines an importance level, the horizontal axis 51 defines time, and the continuous curve 52 indicates importance levels over time. Line 80 is a variable importance level threshold, and line 81 is a reproduction for only those segments that have a particular importance greater than the threshold. Other segments are skipped.

However, in this embodiment, a time threshold is also used. Only segments that have a particular importance level greater than the importance level threshold and maintain that importance level for an amount of time that is longer than the time threshold are reproduced. For example, the segment a1 to a2 is not reproduced, while the segment b1 to b2 is reproduced. This eliminates segments that are too short in time to enable the viewer to adequately comprehend the segment.
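
A minimal sketch of the combined importance and time thresholds is given below. It assumes the segments are contiguous and in chronological order, merges neighbouring above-threshold segments into runs, and then drops the runs that are too short. The function names are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float, float]  # (start, end, importance)
Run = Tuple[float, float]             # (start, end) of an above-threshold stretch


def runs_above(segments: List[Segment], importance_threshold: float) -> List[Run]:
    """Merge consecutive above-threshold segments into continuous runs."""
    runs: List[Run] = []
    for start, end, imp in segments:
        if imp <= importance_threshold:
            continue
        if runs and runs[-1][1] == start:
            runs[-1] = (runs[-1][0], end)  # extend the current run
        else:
            runs.append((start, end))      # begin a new run
    return runs


def filter_by_time(runs: List[Run], time_threshold: float) -> List[Run]:
    """Keep only runs that last longer than the time threshold."""
    return [(s, e) for s, e in runs if e - s > time_threshold]


segments = [
    (0.0, 10.0, 0.2), (10.0, 12.0, 0.8), (12.0, 20.0, 0.3),
    (20.0, 30.0, 0.7), (30.0, 45.0, 0.9), (45.0, 60.0, 0.1),
]
runs = runs_above(segments, 0.5)
print(runs)                       # [(10.0, 12.0), (20.0, 45.0)]
print(filter_by_time(runs, 5.0))  # [(20.0, 45.0)]
```

In this example the two second run plays the role of the segment a1 to a2 and is dropped, while the 25 second run corresponds to b1 to b2 and is reproduced.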

Time Threshold Based Reproduction with Additive Segment Extension

FIG. 9 shows an alternative reproduction 900 according to the invention in an abstract manner where the vertical axis 50 defines an importance level, the horizontal axis 51 defines time, and the curve 52 indicates importance levels over time. Line 90 is an importance level threshold, and line 91 is a reproduction for only those segments that have a particular importance greater than the threshold. Other segments are skipped, as before. In this implementation, as well as alternative implementations described below, the amount of extension can vary depending on the decisions made by the reproduction control section.

This embodiment also uses the time threshold as described above. However, in this case, segments that are shorter in time than the time threshold are not skipped. Instead, such segments are time extended to satisfy the time threshold requirement. This is done by adding portions of the multimedia file before, after, or before and after, the short segments, for example, segment c1 to a2. Thus, the short segments are increased in size to enable the viewer to adequately comprehend the short segment. It should be noted that a second time threshold can also be used, so that extremely short segments, e.g., single frames, are still skipped.
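
The extension of short segments can be sketched as follows. The sketch assumes the padding is split evenly before and after the segment (the text allows either side or both), clamps the result to the bounds of the file, and uses a hypothetical `min_length` parameter for the second time threshold. It does not handle extended segments that come to overlap their neighbours.

```python
from typing import List, Tuple

Run = Tuple[float, float]  # (start, end)


def extend_short_runs(runs: List[Run], time_threshold: float,
                      min_length: float = 0.0,
                      media_end: float = float("inf")) -> List[Run]:
    """Pad runs shorter than time_threshold; drop runs no longer than min_length."""
    out: List[Run] = []
    for start, end in runs:
        length = end - start
        if length <= min_length:
            continue  # second time threshold: extremely short runs are skipped
        if length < time_threshold:
            pad = (time_threshold - length) / 2.0
            new_start = max(0.0, start - pad)
            new_end = min(media_end, new_start + time_threshold)
            # If the end was clamped, shift the start back to keep the length.
            new_start = max(0.0, new_end - time_threshold)
            start, end = new_start, new_end
        out.append((start, end))
    return out


runs = [(10.0, 12.0), (20.0, 45.0), (50.0, 50.04)]
print(extend_short_runs(runs, time_threshold=5.0, min_length=0.1, media_end=60.0))
# [(8.5, 13.5), (20.0, 45.0)]
```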

Time Threshold Based Reproduction with Multiplicative Segment Extension

FIG. 10 shows an alternative reproduction according to the invention in an abstract manner where the vertical axis 50 defines an importance level, the horizontal axis 51 defines time, and the curve 52 indicates importance levels over time. Line 1000 is an importance level threshold, and line 101 is a reproduction for only those segments that have a particular importance greater than the threshold. Other segments are skipped.

This embodiment also uses the time threshold as described above. However, in this case, the time of the segments is increased by a predetermined amount d to increase the size of the reproduced segments that satisfy the time threshold. As above, the segments can be extended before, after, or before and after. We can also use a multiplication factor to achieve the same lengthening of the time of the segments.
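
Both forms of lengthening can be sketched briefly. The sketch assumes that the amount d is applied on each selected side, and that the multiplicative variant scales a segment about its center; neither detail is fixed by the text, and the function names are hypothetical.

```python
from typing import List, Tuple

Run = Tuple[float, float]  # (start, end)


def extend_by_amount(runs: List[Run], d: float,
                     before: bool = True, after: bool = True) -> List[Run]:
    """Lengthen each run by a fixed amount d on the selected side(s)."""
    out: List[Run] = []
    for start, end in runs:
        new_start = max(0.0, start - d) if before else start
        new_end = (end + d) if after else end
        out.append((new_start, new_end))
    return out


def extend_by_factor(runs: List[Run], factor: float) -> List[Run]:
    """Scale each run's length by factor, keeping its center fixed."""
    out: List[Run] = []
    for start, end in runs:
        extra = (factor - 1.0) * (end - start)
        out.append((max(0.0, start - extra / 2.0), end + extra / 2.0))
    return out


print(extend_by_amount([(20.0, 45.0)], 2.0))  # [(18.0, 47.0)]
print(extend_by_factor([(20.0, 45.0)], 1.5))  # [(13.75, 51.25)]
```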

Recording and Reproducing System Structure

FIG. 11 shows a block diagram of a system 1100 for recording and reproducing compressed multimedia files and metadata files stored on read/write storage media 3, such as a disc or tape.

A read/write drive 110 can write data to the read buffer 11 and read data from the write buffer 74. The demultiplexer 12 acquires, sequentially, multimedia from the read buffer, and separates the multimedia into a video stream and an audio stream. The video decoder 13 processes the video stream, and the audio decoder 14 processes the audio stream. However, in this case, the metadata generating section 75 also receives the outputs of the decoders 13-14 so that the reproduced multimedia can be persistently stored on the storage media 3 using a recording/reproducing control section 111.

It should be noted that the importance level, indexing information and other metadata can also be extracted from the video and/or audio data during the decoding phase using the metadata generating section 75.

Furthermore, the importance level, indexing information and other metadata can also be generated manually and inserted at a later stage.

It should be noted that any of the above implementations can include a search function, to enable the viewer to directly position to a particular portion of the multimedia based either on time, frame number, or importance. The search function can use ‘thumbnail’ segments, for example, a single frame or a small number of frames, to assist the viewer during the searching.

Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications may be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.

Classifications
U.S. Classification: 715/716, 707/E17.019
International Classification: G10L19/00, G06F15/00, H04N9/82, G11B27/00, H04N9/804, G06F17/30, G11B27/28, H04N5/91
Cooperative Classification: G06F17/30843, H04N21/44008, H04N9/8042, H04N21/84, G06F17/30858, G11B27/28, H04N21/4542, H04N21/4508, H04N9/8205, H04N21/4394, H04N21/42646, G06F17/30787, H04N21/4325
European Classification: H04N21/45M, H04N21/439D, H04N21/454B, H04N21/426D, H04N21/44D, H04N21/432P, H04N21/84, G06F17/30V1A, G06F17/30V4S, G06F17/30V9, H04N9/804B, H04N9/82N, G11B27/28
Legal Events
Date | Code | Event | Description
Jun 3, 2004 | AS | Assignment
Owner name: MITSUBISHI ELECTRIC CORPORATION, JAPAN
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:OGAWA, MASAHARU;NAKANE, KAZUHIKO;REEL/FRAME:015410/0375
Effective date: 20040303
Jan 14, 2004 | AS | Assignment
Owner name: MITSUBISHI ELECTRIC CORPORATION, JAPAN
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:OTSUKA, ISAO;REEL/FRAME:014935/0465
Effective date: 20040114
Owner name: MITSUBISHI ELECTRIC RESEARCH LABORATORIES, INC., M
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:DIVAKARAN, AJAY;REEL/FRAME:014935/0420