The present invention analyzes recorded video from a video camera to identify camera and object motion in the recorded video. Keyframes representative of clips of the recorded video are displayed on a user interface that allows a user to manipulate an order of the keyframes. Editing rules are then applied to the keyframes to intelligently splice together portions of the representative clips into a final output video.

Inventors: Andreas Girgensohn, John J. Doherty, Lynn D. Wilcox, John S. Boreczky, Patrick Chiu, Jonathan T. Foote
Original Assignee: Fuji Xerox Co., Ltd.
Primary Examiner: Robert Chevalier
Attorney: Fliesler Meyer LLP
Current U.S. Classification: 386/227; 386/278; 386/282; G9B/27.012; G9B/27.029; G9B/27.051
International Classification: H04N 5/93; H04N 5/76; G11B 27/00

Citations

Cited Patent | Filing date | Issue date | Original Assignee | Title
US6072542 | Nov 25, 1997 | Jun 6, 2000 | Fuji Xerox Co., Ltd.; Xerox Corporation | Automatic video segmentation using hidden markov model
US6748158 | Feb 1, 2000 | Jun 8, 2004 | Grass Valley (U.S.) Inc. | Method for classifying and searching video databases based on 3-D camera motion

Referenced by

Citing Patent | Filing date | Issue date | Original Assignee | Title
US7043075 | Jun 27, 2002 | May 9, 2006 | Koninklijke Philips Electronics N.V. | Computer vision system and method employing hierarchical object classification scheme
US7149974 | Apr 3, 2002 | Dec 12, 2006 | Fuji Xerox Co., Ltd. | Reduced representations of video sequences
US7274741 | Nov 1, 2002 | Sep 25, 2007 | Microsoft Corporation | Systems and methods for generating a comprehensive user attention model
US7400761 | Sep 30, 2003 | Jul 15, 2008 | Microsoft Corporation | Contrast-based image attention analysis framework
US7444018 | Aug 31, 2004 | Oct 28, 2008 | Microsoft Corporation | Method and apparatus for shot detection
US7471827 | Oct 16, 2003 | Dec 30, 2008 | Microsoft Corporation | Automatic browsing path generation to present image areas with high attention value as a function of space and time
US7565016 | Jan 15, 2007 | Jul 21, 2009 | Microsoft Corporation | Learning-based automatic commercial content detection
US7599918 | Dec 29, 2005 | Oct 6, 2009 | Microsoft Corporation | Dynamic search with implicit user intention mining
US7623677 | Jan 3, 2006 | Nov 24, 2009 | Fuji Xerox Co., Ltd. | Methods and interfaces for visualizing activity across video frames in an action keyframe
US7712017 | Feb 24, 2006 | May 4, 2010 | Fuji Xerox Co., Ltd. | Method, system and article of manufacture for linking a video to a scanned document
US7773813 | Oct 31, 2005 | Aug 10, 2010 | Microsoft Corporation | Capture-intention detection for video content analysis
US7783106 | Nov 12, 2004 | Aug 24, 2010 | Fuji Xerox Co., Ltd. | Video segmentation combining similarity analysis and classification
US7848598 | Sep 15, 2003 | Dec 7, 2010 | Fuji Xerox Co., Ltd. | Image retrieval processing to obtain static image data from video data
US7873905 | Jul 24, 2003 | Jan 18, 2011 | Fuji Xerox Co., Ltd. | Image processing system
US7890867 | Jun 7, 2006 | Feb 15, 2011 | Adobe Systems Incorporated | Video editing functions displayed on or near video sequences
US7954065 | Jun 29, 2007 | May 31, 2011 | Apple Inc. | Two-dimensional timeline display of media items
US7986372 | Aug 2, 2004 | Jul 26, 2011 | Microsoft Corporation | Systems and methods for smart media content thumbnail extraction
US7996771 | Jan 3, 2006 | Aug 9, 2011 | Fuji Xerox Co., Ltd. | Methods and interfaces for event timeline and logs of video streams
US8089563 | Jan 3, 2006 | Jan 3, 2012 | Fuji Xerox Co., Ltd. | Method and system for analyzing fixed-camera video via the selection, visualization, and interaction with storyboard keyframes
US8098730 | Apr 3, 2006 | Jan 17, 2012 | Microsoft Corporation | Generating a motion attention model
US8103966 | Feb 5, 2008 | Jan 24, 2012 | International Business Machines Corporation | System and method for visualization of time-based events
US8161389 | Oct 31, 2007 | Apr 17, 2012 | Adobe Systems Incorporated | Authoring tool sharable file format
US8171410 | May 29, 2009 | May 1, 2012 | Telcordia Technologies, Inc. | Method and system for generating and presenting mobile content summarization
US8180826 | Apr 14, 2006 | May 15, 2012 | Microsoft Corporation | Media sharing and authoring on the web
US8196032 | Nov 1, 2005 | Jun 5, 2012 | Microsoft Corporation | Template-based multimedia authoring and sharing
US8208792 | Sep 12, 2007 | Jun 26, 2012 | Panasonic Corporation | Content shooting apparatus for generating scene representation metadata

Claims

1. A video creation system for analyzing recorded video from a video camera to produce a final output video, the system comprising:

a camera motion detector, the camera motion detector segmenting respective takes of the recorded video into clips, the clips being classified according to camera motion detected in respective takes; and
a video splicer, the video splicer applying a plurality of editing rules to determine whether video frames adjacent to keyframes representative of respective clips are spliced together to create the final output video.

2. The system of claim 1 further comprising a digitizer, the digitizer digitizing the recorded video and segmenting the recorded video into takes corresponding to camera on and camera off transitions.

3. The system of claim 1 further comprising a user interface, the user interface displaying respective selected keyframes and permitting a user to change a sequence of the keyframes to any desired order.

4. The system of claim 3, wherein the user interface comprises a keyframe interface and a storyboard interface.

5. The system of claim 4, wherein a size of respective keyframes displayed in the keyframe interface is proportional to a duration of respective clips represented by the corresponding keyframe.
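
As an illustration only, the proportional sizing recited in claim 5 might be sketched as below. The function name `keyframe_sizes`, the 160-pixel base width, and the 4:3 aspect ratio are assumptions made for this sketch; the claim does not say whether "size" refers to width, height, or area, so scaling the width linearly is one possible reading.

```python
def keyframe_sizes(durations, base_width=160, aspect_w=4, aspect_h=3):
    """Return a (width, height) pair per clip, scaled by clip duration.

    Assumption: the longest clip is drawn at base_width pixels and every
    other keyframe width shrinks linearly with its clip's duration.
    """
    longest = max(durations)
    sizes = []
    for duration in durations:
        width = round(base_width * duration / longest)
        height = round(width * aspect_h / aspect_w)
        sizes.append((width, height))
    return sizes


# Example: a 10-second clip and a 5-second clip.
sizes = keyframe_sizes([10.0, 5.0])
```

Under these assumptions the 5-second clip is drawn at half the width of the 10-second clip.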

6. The system of claim 4, wherein a user can arrange respective keyframes displayed in the keyframe interface onto the storyboard interface in any desired time sequence.

7. The system of claim 3, wherein the user interface is a dynamic user interface, the dynamic user interface enabling a user to omit the clip being viewed from the final output video while the recorded video is playing.

8. The system of claim 7, wherein the dynamic user interface is used in conjunction with a static user interface.

9. The system of claim 3, wherein the user interface enables a user to include the clip being viewed in the final output video.

10. The system of claim 1 further comprising a keyframe selector, the keyframe selector enabling a user to select at least one keyframe representative of respective clips.

11. The system of claim 10, wherein a single keyframe is selected and displayed for clips classified as a still class having a static scene.

12. The system of claim 10, wherein multiple keyframes are selected and displayed for clips having object motion therein.

13. The system of claim 12, wherein multiple keyframes are selected using hierarchical agglomerative clustering to segment respective clips into homogeneous regions and choosing keyframes from respective regions.
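
A minimal sketch of the clustering step in claim 13 follows. It assumes each frame is already described by a numeric feature vector (for example, a color histogram), that only temporally adjacent clusters are merged so that each region stays contiguous, and that the keyframe of a region is the frame nearest the region's mean. None of these choices is stated in the claim, and the function names are hypothetical.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def dist(a, b):
    """Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def segment_regions(features, num_regions):
    """Bottom-up clustering: start with one cluster per frame and keep
    merging the most similar pair of adjacent clusters.

    Returns (start, end) frame index ranges, with end exclusive.
    """
    regions = [(i, i + 1) for i in range(len(features))]
    while len(regions) > num_regions:
        best = min(
            range(len(regions) - 1),
            key=lambda i: dist(
                mean(features[regions[i][0]:regions[i][1]]),
                mean(features[regions[i + 1][0]:regions[i + 1][1]]),
            ),
        )
        regions[best:best + 2] = [(regions[best][0], regions[best + 1][1])]
    return regions


def pick_keyframes(features, regions):
    """Choose, for each region, the frame closest to the region mean."""
    keyframes = []
    for start, end in regions:
        centre = mean(features[start:end])
        keyframes.append(
            min(range(start, end), key=lambda i: dist(features[i], centre))
        )
    return keyframes


# Example with one-dimensional features for six frames.
features = [[0.0], [1.0], [2.0], [50.0], [51.0], [90.0]]
regions = segment_regions(features, 3)
keyframes = pick_keyframes(features, regions)
```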

14. The system of claim 1, wherein the video splicer employs a constraint satisfaction system for applying the plurality of editing rules.

15. The system of claim 1, wherein an ergodic Hidden Markov Model is used to segment respective takes into clips based on camera motion classes detected in the take by the camera motion detector.

16. The system of claim 1, wherein respective clips are classified according to camera motion comprising a still class, a pan class, a tilt class, a zoom class, and a garbage class.
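
The segmentation in claims 15 and 16 can be illustrated with a small Viterbi decoder over a fully connected (ergodic) model whose states are the five camera motion classes. This is a sketch under stated assumptions, not the patented model: the per-frame log-likelihoods are taken as given, the initial state distribution is uniform, and the transition matrix uses a single hypothetical `stay_prob` with the remaining probability spread evenly over the other states.

```python
import math

CLASSES = ["still", "pan", "tilt", "zoom", "garbage"]


def viterbi_segment(frame_loglik, stay_prob=0.9):
    """Return the most likely class label for every frame.

    frame_loglik[t][k] is the log-likelihood of frame t under class k.
    """
    n = len(CLASSES)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n - 1))

    def trans(i, j):
        return log_stay if i == j else log_switch

    score = list(frame_loglik[0])  # uniform prior adds only a constant
    back = []
    for obs in frame_loglik[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + trans(i, j))
            new_score.append(score[best_i] + trans(best_i, j) + obs[j])
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [CLASSES[s] for s in path]


def labels_to_clips(labels):
    """Group runs of identical labels into (start, end, label) clips."""
    clips = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            clips.append((start, i, labels[start]))
            start = i
    return clips
```

Because switching states is penalized, a single noisy frame inside a long still passage tends not to split the passage into separate clips.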

17. The system of claim 1, wherein the editing rules comprise:

discarding respective clips having a length less than a minimum length, the minimum length being substantially equal to three seconds;
trimming respective clips having a length that exceeds a maximum length, the maximum length being substantially equal to ten seconds;
merging two clips, to be included in the final output video, selected from the same take if the two clips are separated by less than three seconds to avoid cutting between the two clips;
discarding clips having fast and non-linear camera motion and clips classified as a garbage class;
selecting a sub-clip near an end of a shot if the shot exceeds the maximum length;
discarding clips comprising a zoom, a pan, and a tilt having durations less than five seconds unless a still clip exists on either end of the zoom, the pan, and the tilt; and
selecting the shot if the shot has a minimum brightness above a predetermined brightness threshold.

18. The system of claim 17, wherein the predetermined brightness threshold is substantially equal to 30% brightness.
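
Several of the editing rules in claims 17 and 18 can be sketched as a simple filter-and-merge pass. The `Clip` record, its field names, and the order in which the rules are applied are assumptions for illustration; the numeric thresholds (three seconds, 30% brightness) come from the claims. Trimming to the maximum length and the rule for short pans, tilts, and zooms are left out of this sketch.

```python
from dataclasses import dataclass

MIN_LEN = 3.0      # seconds, minimum clip length (claim 17)
MERGE_GAP = 3.0    # seconds, merge clips closer than this (claim 17)
BRIGHTNESS = 0.30  # minimum brightness threshold (claim 18)


@dataclass
class Clip:
    take: int              # index of the take the clip came from
    start: float           # seconds from the start of the take
    end: float
    motion: str            # "still", "pan", "tilt", "zoom" or "garbage"
    min_brightness: float  # 0.0 to 1.0

    @property
    def length(self):
        return self.end - self.start


def apply_rules(clips):
    """Discard unsuitable clips, then merge near neighbours in a take."""
    kept = [
        c for c in clips
        if c.length >= MIN_LEN
        and c.motion != "garbage"
        and c.min_brightness > BRIGHTNESS
    ]
    kept.sort(key=lambda c: (c.take, c.start))

    merged = []
    for clip in kept:
        prev = merged[-1] if merged else None
        if (prev is not None and prev.take == clip.take
                and clip.start - prev.end < MERGE_GAP):
            # Bridge the short gap instead of cutting between the clips.
            motion = prev.motion if prev.motion == clip.motion else "mixed"
            merged[-1] = Clip(prev.take, prev.start, clip.end, motion,
                              min(prev.min_brightness, clip.min_brightness))
        else:
            merged.append(clip)
    return merged
```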

19. The system of claim 1, wherein the video splicer applies the editing rules to automatically determine a video in point and a video out point for respective clips.
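
One way to picture the automatic in and out points of claim 19 is a window placed around the clip's keyframe and clamped to the clip boundaries. The centering strategy and the function name `in_out_points` are assumptions; the ten-second cap is the maximum length recited in claim 17.

```python
def in_out_points(clip_start, clip_end, keyframe_time, max_len=10.0):
    """Return (in_point, out_point) in seconds for one clip.

    Clips no longer than max_len are used whole. Longer clips are cut to
    a max_len window centred on the keyframe, shifted as needed so the
    window stays inside the clip.
    """
    if clip_end - clip_start <= max_len:
        return clip_start, clip_end
    in_point = keyframe_time - max_len / 2
    in_point = max(clip_start, min(in_point, clip_end - max_len))
    return in_point, in_point + max_len
```

For a 30-second clip with its keyframe at 15 seconds, this sketch yields an in point of 10 seconds and an out point of 20 seconds.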

20. A method for creating custom videos from recorded video of a video camera, the method comprising the steps of:

detecting camera movement in respective takes of the recorded video;
segmenting respective takes into clips based on classes of camera movement detected;
displaying selected keyframes representative of respective clips on a user interface, the selection of respective keyframes based on the classes of camera movement detected; and
applying editing rules to the keyframes displayed in the user interface to choose sections of video around respective keyframes to splice together to create the final output video.

21. The method of claim 20 further comprising the step of digitizing the recorded video and segmenting the digitized recorded video into takes.

22. The method of claim 21, wherein respective takes are defined by digitized recorded video between a camera on transition and a camera off transition.
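
The take definition in claim 22 might be sketched as pairing each camera-on transition with the next camera-off transition. The event representation, a time-ordered list of `(time, kind)` tuples, is an assumption for this sketch; how the transitions are detected in the digitized video is outside its scope.

```python
def takes_from_transitions(events):
    """Pair camera-on and camera-off transitions into takes.

    events is a time-ordered list of (time_in_seconds, kind) tuples with
    kind equal to "on" or "off". A trailing "on" without a matching
    "off" is ignored.
    """
    takes = []
    on_time = None
    for time, kind in events:
        if kind == "on" and on_time is None:
            on_time = time
        elif kind == "off" and on_time is not None:
            takes.append((on_time, time))
            on_time = None
    return takes
```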

23. The method of claim 20, wherein an ergodic Hidden Markov Model segments respective takes into clips based on camera motion classes detected in the take by the camera motion detector.

24. The method of claim 23, wherein the camera motion classes comprise a still class, a pan class, a tilt class, a zoom class, and a garbage class.

25. The method of claim 20, wherein a single keyframe is selected for clips comprising camera motion classified as a still class having a static scene.

26. The method of claim 20, wherein multiple keyframes are selected for clips comprising camera motion respectively classified as a pan class, a tilt class, a zoom class, and a still class having object motion.

27. The method of claim 26, wherein multiple keyframes are selected using hierarchical agglomerative clustering to divide respective clips into homogeneous regions and choosing keyframes from respective regions.

28. The method of claim 20, wherein a size of respective selected keyframes displayed on the user interface is proportional to a duration of clips represented by respective selected keyframes.

29. The method of claim 28, wherein the user interface comprises a keyframe interface and a storyboard interface where a user can place respective selected keyframes displayed in the keyframe interface onto the storyboard interface in any desired sequence.

30. The method of claim 20, wherein the user interface comprises a first display interface and a second display interface.

31. The method of claim 30, wherein the static user interface displays selected keyframes and the dynamic user interface enables a user to dynamically select clips to be omitted from the final output video while the recorded video is playing.

32. The method of claim 20, wherein the step of applying editing rules further comprises the steps of:

discarding respective clips having a length less than a minimum length, the minimum length being substantially equal to three seconds;
trimming respective clips having a length that exceeds a maximum length, the maximum length being substantially equal to ten seconds;
merging two clips, to be included in the final output video, selected from the same take if the two clips are separated by less than three seconds to avoid cutting between the two clips;
discarding clips having fast and non-linear camera motion and clips classified as a garbage class;
selecting a sub-clip near an end of a shot if the shot exceeds the maximum length;
discarding clips comprising a zoom, a pan, and a tilt having durations less than five seconds unless a still clip exists on either end of the zoom, the pan, and the tilt; and
selecting the shot if the shot has a minimum brightness above a predetermined brightness threshold.

33. The method of claim 20, wherein the step of applying editing rules further comprises the step of automatically determining a video in point and a video out point for respective clips.

34. The method of claim 20, wherein the segmenting step segments respective takes into clips based, alternatively, on video quality rules.

35. The method of claim 34, wherein keyframes representative of clips having a highest video quality from respective takes are selected.