WO1999001830A1 - Interactive video interfaces - Google Patents

Interactive video interfaces

Info

Publication number
WO1999001830A1
Authority
WO
WIPO (PCT)
Prior art keywords
frames
image
interface
video
root
Prior art date
Application number
PCT/US1998/013606
Other languages
French (fr)
Inventor
Chil M. Goldberg
Nabil Madrane
Original Assignee
Obvious Technology, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Obvious Technology, Inc.
Priority to IL13379898A (patent IL133798A0)
Priority to EP98932999A (patent EP0992010A1)
Publication of WO1999001830A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B27/034Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/74Browsing; Visualisation therefor
    • G06F16/745Browsing; Visualisation therefor the internal structure of a single video sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • G06F16/786Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content using motion, e.g. object motion or camera motion
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/102Programmed access in sequence to addressed parts of tracks of operating record carriers
    • G11B27/105Programmed access in sequence to addressed parts of tracks of operating record carriers of operating discs
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/34Indicating arrangements 
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B2220/00Record carriers by type
    • G11B2220/20Disc-shaped record carriers
    • G11B2220/21Disc-shaped record carriers characterised in that the disc is of read-only, rewritable, or recordable type
    • G11B2220/213Read-only discs
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B2220/00Record carriers by type
    • G11B2220/20Disc-shaped record carriers
    • G11B2220/25Disc-shaped record carriers characterised in that the disc is based on a specific recording technology
    • G11B2220/2537Optical discs
    • G11B2220/2545CDs
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10STECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S715/00Data processing: presentation processing of document, operator interface processing, and screen saver display processing
    • Y10S715/976 3-D Icon

Definitions

  • the present invention relates to the field of interfaces for video information. More particularly, the present invention provides interactive interfaces for video information and tool-kits for use in creation of such interactive interfaces.
  • Video information is being produced at an ever-increasing rate and video sequences, especially short sequences, are increasingly being used, for example, in websites and on CD-ROM, and being created, for example, by domestic use of camcorders.
  • video information consists of a sequence of frames recorded at a fixed time interval.
  • the video information consists of 25 or 30 frames per second.
  • Each frame is meaningful since it corresponds to an image which can be viewed.
  • a frame may be made up of a number of interlaced fields, but this is not obligatory as is seen from more recently proposed video formats, such as those intended for high definition television.
  • Frames describe the temporal decomposition of the video image information.
  • Each frame contains image information structured in terms of lines and pixels, which represent the spatial decomposition of the video.
  • video information or “video sequences” refer to data representing a visual image recorded over a given time period, without reference to the length of that time period or the structure of the recorded information.
  • video sequence will be used to refer to any series of video frames, regardless of whether this series corresponds to a single camera shot (recorded between two cuts) or to a plurality of shots or scenes.
  • the coded content may, for example, identify the types of objects present in the video sequence, their properties/motion, the type of camera movements involved in the video sequence (pan, tracking shot, zoom, etc.), and other properties.
  • a "summary" of the coded document may be prepared, consisting of certain representative frames taken from the sequence, together with text information or icons indicating how the sequence has been coded.
  • the interface for interacting with the video database typically includes a computer input device enabling the user to specify objects or properties of interest. In response to the query, the computer determines which video sequences in the database correspond to the input search terms and displays the appropriate "summaries". The user then indicates whether or not a particular video sequence should be reproduced. Examples of products using this approach are described in the article "Advanced Imaging Product Survey: Photo, Document and Video" from the journal "Advanced Imaging", October 1994, which document is incorporated herein by this reference.
  • the video sequence is divided up into shorter series of frames based upon the scene changes or the semantic content of the video information.
  • a hierarchical structure may be defined. Index "summaries" may be produced for the different series of frames corresponding to nodes in the hierarchical structure.
  • the "summary" corresponding to a complete video sequence may be retrieved for display to the user who is then allowed to request display of "summaries" relating to sub-sections of the video sequence which are lower down in the hierarchical structure. If the user so wishes, a selected sequence or sub-section is reproduced on the display monitor.
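The hierarchical summary scheme described above can be pictured as a tree whose nodes each cover a range of frames. The following is a minimal sketch only; the class name `SummaryNode`, the function `drill_down` and the frame ranges are hypothetical and are not taken from EP-A-0 555 028 or from the present application.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SummaryNode:
    """One node of a hypothetical summary hierarchy (names are illustrative)."""
    label: str
    first_frame: int
    last_frame: int
    children: List["SummaryNode"] = field(default_factory=list)


def drill_down(node: SummaryNode, frame: int) -> List[str]:
    """Return the labels from `node` down to the deepest child containing `frame`."""
    path = [node.label]
    for child in node.children:
        if child.first_frame <= frame <= child.last_frame:
            path.extend(drill_down(child, frame))
            break
    return path


# Illustrative hierarchy: a sequence split into scenes, one scene split into shots.
sequence = SummaryNode("whole sequence", 0, 299, [
    SummaryNode("scene 1", 0, 149, [
        SummaryNode("shot 1a", 0, 74),
        SummaryNode("shot 1b", 75, 149),
    ]),
    SummaryNode("scene 2", 150, 299),
])

print(drill_down(sequence, 100))
```

Each step down the returned path corresponds to the user requesting the "summary" of a sub-section lower in the hierarchy.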
  • Such a scheme is described in EP-A-0 555 028 which is incorporated herein by this reference.
  • Another approach, derived from the field of video editing, consists of the "digital storyboard".
  • the video sequence is segmented into scenes and one or more representative frames from each scene is selected and displayed, usually accompanied by text information, side-by-side with representative frames from other segments.
  • the user now has both a visual overview of all the scenes and a direct visual access to individual scenes.
  • Each representative frame of the storyboard can be considered to be an icon.
  • Selection of the icon via a pointing device, typically a mouse-controlled cursor
  • Typical layouts for the storyboards are two-dimensional arrays or long one-dimensional strips. In the first case, the user scans the icons from the left to the right, line by line, whereas in the second case the user needs to move the strip across the screen.
  • Digital storyboards are typically created by a video editor who views the video sequence, segments the data into individual scenes and places each scene, with a descriptive comment, onto the storyboard.
  • many steps of this process can be automated. For example, different techniques for automatic detection of scene changes are discussed in the following documents, each of which is incorporated herein by reference:
  • In a "video icon", as illustrated in Fig. 1A, the scene is represented by a number of frames selected from the sequence, which are displayed as if they were stacked up one behind the other in the z-direction and viewed in perspective.
  • each individual frame is represented by a plane and the planes lie one behind the other with a slight offset.
  • the first frame of the stack is displayed in its entirety whereas underlying frames are partially occluded by the frames in front.
  • the envelope of the stack of frames has a parallelepiped shape.
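The stacked layout of a conventional video icon can be sketched as follows. This is an illustrative sketch only: the function name `stacked_icon_layout` and the offset values `dx` and `dy` are assumptions, and drawing the planes from back to front is one simple way (not necessarily the one used in the prior art) of letting the first frame appear in full while the frames behind it are partially occluded.

```python
def stacked_icon_layout(num_frames, dx=6, dy=-6):
    """Return (origins, draw_order) for a stack of frame planes.

    origins[i] is the screen offset of frame i relative to the front frame;
    each plane lies behind the previous one with a slight, constant offset,
    so the envelope of the stack is a parallelepiped.
    draw_order lists frame indices from the rearmost plane to the front plane.
    """
    origins = [(i * dx, i * dy) for i in range(num_frames)]
    draw_order = list(range(num_frames - 1, -1, -1))
    return origins, draw_order


origins, draw_order = stacked_icon_layout(3)
print(origins)
print(draw_order)
```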
  • Two special types of video icon have been proposed: "object based" video icons (Fig. 1B) and video icons containing a representation of camera movement.
  • objects of interest are isolated in the individual frames and, for at least some of the stacked frames, the only image information included in the video icon is the image information corresponding to the selected object.
  • the individual frames are represented as if they were transparent except in the regions containing the selected object.
  • Video icons containing an indication of camera movement may have, as illustrated in the example of Fig. 1C, a serpentine-shaped envelope corresponding to the case of side-to-side motion of the camera.
  • the video icons discussed above present the user with information concerning the content of the whole of a video sequence and serve as a selection tool allowing the user to access frames of the video sequence out of the usual order. In other words, these icons allow non-sequential access to the video sequence. Nevertheless, the ways in which the user can interact with the video sequence information are strictly limited. The user can select frames for playback in a non-sequential way but he has little or no means of obtaining a deeper level of information concerning the video sequence as a whole, short of watching a playback of the whole sequence.
  • the present invention provides a novel type of interface to video information which allows the user to access information concerning a video sequence in a highly versatile manner.
  • interactive video interfaces of the present invention enable a user to obtain deeper levels of information concerning an associated video sequence at positions in the sequence which are designated by the user as being of interest.
  • the present invention provides an interface to information concerning an associated video sequence, one such interface comprising: information defining a three-dimensional root image, the root image consisting of a plurality of basic frames selected from said video sequence, and/or a plurality of portions of video frames corresponding to selected objects represented in the video sequence, x and y directions in the root image corresponding to x and y directions in the video frames and the z direction in the root image corresponding to the time axis whereby the basic frames are spaced apart from one another in the z direction of the root image by distances corresponding to the time separation between the respective video frames; means for displaying views of the root image; means for designating a viewing position relative to said root image; and means for calculating image data representing said three-dimensional root image viewed from the designated viewing position, and for outputting said calculated image data to the displaying means.
  • customized user interfaces may be created for video sequences.
  • These interfaces comprise a displayable "root" image which directly represents the content and context of the image information in the video sequence and can be manipulated, either automatically or by the user, in order to display further image information, by designation of a viewing position with respect to the root image, the representation of the displayed image being changed in response to changes in the designated viewing position.
  • the representation of the displayed image changes dependent upon the designated viewing position as if the root image were a three-dimensional object.
  • the data necessary to form the displayed representation of the root image is calculated so as to provide the correct perspective view given the viewing angle, the distance separating the viewing position from the displayed quasi-object and whether the viewing position is above or below the displayed quasi-object.
  • the present invention can provide non-interactive interfaces to video sequences, in which the root image information is packaged with an associated script defining a routine for automatically displaying a sequence of different views of the root image and performing a set of manipulations on the displayed image, no user manipulation being permitted.
  • the full benefits of the invention are best seen in interactive interfaces where the viewing position of the root image is designated by the user, as follows. When the user first accesses the interface he is presented with a displayed image which represents the root image seen from a particular viewpoint (which may be a predetermined reference viewpoint). As he designates different viewing angles, the displayed image represents the root image seen from different perspectives.
  • the displayed image increases or reduces the size and, preferably, resolution of the displayed information, accessing image data from additional video frames, if need be.
  • the customized, interactive interfaces provided by the present invention involve displayed images, representing the respective associated video sequences, which, in some ways, could be considered to be a navigable environment or a manipulable object.
  • This environment or object is a quasi-three-dimensional entity.
  • the x and y dimensions of the environment/object correspond to true spatial dimensions (corresponding to the x and y directions in the associated video frames) whereas the z dimension of the environment/object corresponds to the time axis.
  • the user can select spatial and temporal information from a video sequence for access by designating a viewing position with respect to a video icon representing the video sequence.
  • Arbitrarily chosen oblique "viewing directions" are possible whereby the user simultaneously accesses image information corresponding to portions of a number of different frames in the video sequence.
  • As the user's viewing position relative to the video icon changes, the amount of a given frame which is visible to him, and the number and selection of frames which he can see, change correspondingly.
  • the interactive video interfaces of the present invention make use of a "root" image comprising a plurality of basic frames arranged to form a quasi-three dimensional object. It is preferred that the relative placement positions of the basic frames be arranged so as to indicate visually some underlying motion in the video sequence.
  • the envelope of the set of basic frames preferably does not have a parallelepiped shape but, instead, forms a "pipe" of rectangular section, bending in a way corresponding to the camera travel during filming of the video sequence.
  • the basic video frames making up the root image are chosen as a function of the amount of motion or change in the sequence.
  • successive basic frames should include background information overlapping by, say, 50%.
  • the root image corresponds to an "object-based video icon.”
  • certain basic frames included in the root image are not included therein in full; only those portions corresponding to selected objects are included.
  • certain basic frames may be included in full in the root image but may include "hot objects," that is, representations of objects selectable by the user.
  • the corresponding basic frames and, if necessary, additional frames
  • the root image allows the user to selectively isolate objects of interest in the video sequence and obtain at a glance a visual impression of the appearance and movement of the objects during the video sequence.
  • the interfaces of the present invention allow the user to select an arbitrary portion of the video sequence for playback.
  • the user designates a portion of the video sequence which is of interest, by designating a corresponding portion of the displayed image forming part of the interface to the video sequence. This portion of the video sequence is then played back.
  • the interface may include a displayed set of controls similar to those provided on a VCR in order to permit the user to select different modes for this playback, such as fast-forward, rewind, etc.
  • the displayed image forming part of the interface remains visible whilst the designated portion of the sequence is being played back.
  • This can be achieved in any number of ways, for example, by providing a second display device upon which the playback takes place, or by designating a "playback window" on the display screen, this playback window being offset with respect to the screen area used by the interface, or by any other suitable means.
  • the preferred embodiments of interfaces according to the invention also permit the user to designate an object of interest and to select a playback mode in which only image information concerning that selected object is included in the playback.
  • the user can select a single frame from the video sequence for display separately from the interactive displayed image generated by the interface.
  • the interfaces of the present invention allow the user to generate a displayed image corresponding to a distortion of the root image. More especially, the displayed image can correspond to the root image subjected to an
  • the present invention can provide user interfaces to "multi-threaded" video sequences, that is, video sequences consisting of numerous interrelated shorter segments such as are found, for example, in a video game where the user's choices change the scene which is displayed.
  • Interfaces to such multi-threaded video sequences can include frames of the different video segments in the root image, such that the root image has a branching structure. Alternatively, some or all of the different threads may not be visible in the root image but may become visible as a result of user manipulation.
  • a pointing device such as a mouse, or by touching a touch screen, etc.
  • image portions for these different threads may be added to the displayed image.
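A branching root image for a multi-threaded sequence could be represented as a tree of segments, some of which are hidden until the user reveals them. The sketch below is purely illustrative: the class `Segment`, its `visible` flag and the function `visible_frames` are hypothetical names, not structures defined in the application.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    """A hypothetical video segment (thread) in a branching root image."""
    name: str
    basic_frames: List[int]
    branches: List["Segment"] = field(default_factory=list)
    visible: bool = True  # hidden threads appear only after user manipulation


def visible_frames(segment: Segment) -> List[Tuple[str, int]]:
    """Collect (segment name, frame number) pairs for every visible thread."""
    if not segment.visible:
        return []
    frames = [(segment.name, f) for f in segment.basic_frames]
    for branch in segment.branches:
        frames.extend(visible_frames(branch))
    return frames


main = Segment("main", [0, 40], [
    Segment("left", [80, 120]),
    Segment("right", [80, 130], visible=False),
])
print(visible_frames(main))

# Simulate a user manipulation that reveals the hidden thread.
main.branches[1].visible = True
print(visible_frames(main))
```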
  • the root image for the video sequence concerned is associated with information defining how the corresponding displayed image will change in response to given types of user manipulation.
  • this associated information may define how many, or which additional frames are displayed when the user moves the viewing position closer up to the root image.
  • the associated information may identify which objects in the scene are "hot objects" and what image information will be displayed in relation to these hot objects when activated by the user.
  • the user who is interested in a particular video sequence may first download only certain components of the associated interface. First of all he downloads information for generating a displayed view of the root image, together with an associated application program (if he does not already have an appropriate "interface player" loaded in his computer).
  • the downloaded (or already-resident) application program includes basic routines for changing the perspective of the displayed image in response to changes in the viewing position designated by the user.
  • the application program is also adapted to consult any "associated information" (as mentioned above) which forms part of the interface and conditions the way in which the displayed image changes in response to certain predetermined user manipulations (such as "zoom-in" and "activate object"). If the interface does not contain any such "associated information" then the application program makes use of pre-set default parameters.
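The fallback from "associated information" to pre-set defaults can be sketched as a simple merge of settings. The parameter names `zoom_in_additional_frames` and `hot_objects`, and the default values, are assumptions made for illustration; the application does not specify them.

```python
# Hypothetical pre-set default parameters of the interface player.
DEFAULTS = {
    "zoom_in_additional_frames": 2,  # frames added per zoom-in step (assumed)
    "hot_objects": (),               # no selectable objects by default (assumed)
}


def resolve_behaviour(associated_info=None):
    """Return the parameters to use: defaults, overridden by any associated information."""
    params = dict(DEFAULTS)
    if associated_info:
        params.update(associated_info)
    return params


print(resolve_behaviour())
print(resolve_behaviour({"zoom_in_additional_frames": 5}))
```

Copying the defaults before updating keeps the pre-set values intact for the next interface that lacks associated information.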
  • the root image corresponds to a particular set of basic video frames and information designating relative placement positions thereof.
  • the root image information downloaded to the user may include just the data necessary to create a reference view of the root image or it may include the image data for the set of basic frames (in order to enable the changes in user viewing angle to be catered for without the need to download additional information).
  • this extra information can either be pre-packaged and supplied with the root image information or the extra information can be downloaded from the host website as and when it is needed.
  • the present invention also provides apparatus for creation of interfaces according to the present invention.
  • This may be dedicated hardware or, more preferably, a computer system programmed in accordance with specially designed computer programs.
  • the selection of basic frames for inclusion in the "root image" of the interface can be made automatically according to one of a number of different algorithms, such as choosing one frame every n frames, or choosing one frame every time the camera movement has displaced the background by m%, etc.
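The two selection rules mentioned above can be sketched as follows. This is a hedged illustration: the function names are hypothetical, and the per-frame background displacement values are assumed to be supplied by some separate motion-estimation step that is not shown here.

```python
def select_every_nth(num_frames, n):
    """Choose one basic frame every n frames, starting with frame 0."""
    return list(range(0, num_frames, n))


def select_by_background_shift(shift_percent, m):
    """Choose a basic frame each time the background has moved by at least m%.

    shift_percent[i] is the (assumed, externally measured) background
    displacement between frame i-1 and frame i, as a percentage of the
    frame size. Frame 0 is always selected.
    """
    selected = [0]
    accumulated = 0.0
    for i in range(1, len(shift_percent)):
        accumulated += shift_percent[i]
        if accumulated >= m:
            selected.append(i)
            accumulated = 0.0  # measure displacement from the new basic frame
    return selected


print(select_every_nth(10, 4))
print(select_by_background_shift([0, 20, 20, 20, 20, 20, 20], 50))
```

With m set to 50, successive basic frames chosen by the second rule share roughly half of their background.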
  • the relative placement positions of the basic frames in the root image can be set automatically taking into account the time separation between those frames and, if desired, other factors such as camera motion.
  • the presence of objects or people in the video sequence can be detected automatically according to one of the known algorithms (such as those discussed in the references cited above), and an "object oriented" root image can be created automatically.
  • the interface creation apparatus of the present invention has the capability of automatically processing video sequence information in order to produce a root image.
  • These embodiments include means for associating with the root image a standard set of routines for changing the representation of the displayed image in response to user manipulations.
  • the present invention provides a toolkit for use in creation of customized interfaces.
  • the toolkit enables a designer to tailor the configuration and content of the root image, as well as to specify which objects in the video sequence are "hot objects" and to control the way in which the displayed interface image will change in response to manipulation by an end user.
  • the toolkit enables the interface designer to determine which frames of the video sequence should be used as basic frames in the root image, and how many additional frames are added to the displayed image when the user designates a viewing position close to the root image.
  • Figure 1 illustrates various types of video icon, wherein Fig. 1A shows an ordinary video icon, Fig. 1B shows an object-based video icon and Fig. 1C shows a video icon including a representation of camera motion;
  • Figure 2 is a block diagram indicating the components of an interactive interface according to a first embodiment of the present invention;
  • Figure 3 is a diagram illustrating the content of the interface data file (FDI) used in the first embodiment of the invention;
  • Figure 4 is a diagram illustrating a reference view of a root image and three viewing positions designated by a user;
  • Figure 5 illustrates the displayed image in the case of the root image viewed from the different viewing positions of Fig. 4, wherein Fig. 5A represents the displayed image from viewing position A, Fig. 5B represents the displayed image from viewing position B, and Fig. 5C represents the displayed image from viewing position C;
  • Figure 6 illustrates displayed images based on more complex root images according to the present invention, in which Fig. 6A is derived from a root image visually representing motion and Fig. 6B is derived from a root image visually representing a zoom effect;
  • Figure 7 illustrates the effect of user selection of an object represented in the displayed image, in a second embodiment of interface according to the present invention;
  • Figure 8 illustrates a user manipulation of a root image to produce an "accordion effect";
  • Figure 9 illustrates a displayed image corresponding to a view of a branching root image associated with a multi-threaded scenario;
  • Figure 10 is a flow diagram indicating steps in a preferred process of designing an interface according to the present invention;
  • Figure 11 is a schematic representation of a preferred embodiment of an interface editor unit according to the present invention; and
  • Figure 12 is a schematic representation of a preferred embodiment of an interface viewer according to the present invention.
  • an interactive interface of the invention is associated with video sequences recorded on a CD-ROM.
  • a CD-ROM reader 1 is connected to a computer system including a central processor portion 2, a display screen 3, and a user-operable input device which, in this case, includes a keyboard 4 and a mouse 5.
  • When the user wishes to consult video sequences recorded on a CD-ROM 7, he places the CD-ROM 7 in the CD-ROM reader and activates CD-ROM accessing software provided in the central processor portion 2 or an associated memory or unit.
  • the CD-ROM has recorded thereon not only the video sequence image information 8 (in any convenient format), but also a respective interface data file (FDI) 10 for each video sequence, together with a video interface application program 11.
  • Respective scripts 12 are optionally associated with the interface data files.
  • the video interface application program 11 is operated by the central processor portion 2 of the computer system and the interface data file applicable to the video sequence selected by the user is processed in order to cause an interactive video icon (see, for example, Figs. 4 and 5) to be displayed on the display screen 3.
  • the user can then manipulate the displayed icon, by making use of the mouse or keyboard input devices, in order to explore the selected video sequence.
  • Fig. 4 illustrates a simple interactive video icon according to the present invention.
  • this video icon is represented on the display screen as a set of superposed images arranged within an envelope having the shape of a regular parallelepiped.
  • Each of the superposed images corresponds to a video frame selected from the video sequence, but these frames are offset from one another. It may be considered that the displayed image corresponds to a cuboid viewed from a particular viewing position (above and to the right, in this example).
  • This cuboid is a theoretical construct consisting of the set of selected video frames disposed such that their respective x and y axes correspond to the x and y axes of the cuboid and the z axis of the cuboid corresponds to the time axis.
  • the selected frames are spaced apart in the z direction in accordance with their respective time separations in the video sequence.
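The spacing rule above (z separation proportional to time separation) can be written down directly. In this sketch the function name `z_positions` and the scale factor `depth_per_second` are assumptions; the frame rate of 25 frames per second is one of the two rates mentioned earlier in the text.

```python
def z_positions(frame_numbers, fps=25.0, depth_per_second=100.0):
    """Place each selected frame along z in proportion to its time offset.

    frame_numbers: frame indices of the selected frames, in playback order.
    fps: frame rate of the video sequence.
    depth_per_second: assumed scale factor (z units per second of video).
    """
    t0 = frame_numbers[0] / fps
    return [(f / fps - t0) * depth_per_second for f in frame_numbers]


# Frames 0, 25 and 75 are 1 s and 2 s apart, so the second gap is twice as deep.
print(z_positions([0, 25, 75]))
```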
  • a position on the screen is designated by the letters A, B and C.
  • the displayed image is changed to the form shown in Fig. 5: Figs. 5A, 5B and 5C correspond to "viewing positions" A, B and C, respectively, of Fig. 4.
  • the image displayed to the user changes so as to provide a perspective view of the theoretical cuboid as seen from an angle corresponding to the viewing position designated by the user.
  • the above-mentioned cuboid is a special case of a "root image” according to the present invention.
  • This "root image” is derived from the video sequence and conveys information concerning both the image content of the selected sub-set of frames (called below, “basic frames”) and the relative "position” of that image information in time as well as space.
  • the "root image” is defined by information in the interface data file. The definition specifies which video frames are “basic frames” (for example, by storing the relevant frame numbers), as well as specifying the placement positions of the basic frames relative to one another within the root image.
  • the central processor portion 2 of the computer system calculates the image data required to generate the displayed image from the root image definition contained in the appropriate interface data file, image data of the basic frames (and, where required, additional frames) and the viewing position designated by the user, using standard ray-tracing techniques.
  • the data required to generate the displayed image is loaded into the video buffer and displayed on the display screen.
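The placement of basic frames along the time axis can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the patent: the function names, the `frame_rate` and `depth_per_second` parameters, and the use of a simple oblique projection in place of the ray-tracing mentioned above are all hypothetical choices made for brevity.

```python
def place_basic_frames(frame_numbers, frame_rate=25.0, depth_per_second=1.0):
    """Return (frame_number, z) pairs for the basic frames of a root image.

    z is proportional to the time offset from the first basic frame, so
    frames that are far apart in time are far apart in depth.
    frame_rate and depth_per_second are illustrative assumptions.
    """
    if not frame_numbers:
        return []
    ordered = sorted(frame_numbers)
    first = ordered[0]
    return [(n, (n - first) / frame_rate * depth_per_second) for n in ordered]


def project_point(x, y, z, view_dx, view_dy):
    """Oblique projection of a root-image point onto the screen.

    Each point is shifted in proportion to its depth z; (view_dx, view_dy)
    stands in for the designated viewing position. This is a simplification,
    not the ray-tracing described in the text.
    """
    return (x + view_dx * z, y + view_dy * z)


placements = place_basic_frames([0, 50, 75, 200])
# With the default 25 frames per second, frame 50 sits at depth 2.0.
```

Changing `view_dx` and `view_dy` moves deeper frames further across the screen, which gives a rough impression of the change of view between positions A, B and C.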
  • the image information in the area of interest should be enriched. This is achieved by including, in the displayed image, image data relating to additional video frames besides the basic video frames. Such a case is illustrated in Fig.5B, where the basic frames BF5 and BF6 are displayed together with additional frames AF1 and AF2.
  • the video interface application program causes closely spaced additional frames to be added to the displayed image.
  • successive video frames of the video sequence may be included in the displayed image.
  • image information corresponding to parts of the root image distant from the area of interest may be omitted from the displayed "close-up" image.
  • the interface data file includes data specifying how the choice should be made of additional frames to be added as the user "moves close up" to the displayed image. More preferably, this data defines rules governing the choice of how many, and which, additional frames should be used to enrich the displayed image as the designated viewing position changes. These rules can, for example, define a mathematical relationship between the number of displayed frames and the distance separating the designated viewing position and the displayed quasi-object. In preferred embodiments of the invention, the number of frames which are added to the display as the viewing position approaches the displayed quasi-object depends upon the amount of motion or change in the video sequence at that location.
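One possible form of such a rule is sketched below. The linear relationship, the parameter names (`near`, `far`, `max_extra`) and the convention that local activity is a value between 0 and 1 are assumptions made for illustration; the text only states that some mathematical relationship between distance, motion and frame count may be defined.

```python
def additional_frame_count(distance, activity, max_extra=8, near=1.0, far=10.0):
    """Number of additional frames to show between two basic frames.

    distance: distance from the viewing position to the quasi-object.
    activity: local amount of motion or change, clamped to 0..1.
    The linear rule and all default values are hypothetical.
    """
    if distance >= far:
        return 0
    # closeness is 0 at the far limit and 1 at (or inside) the near limit
    closeness = (far - max(distance, near)) / (far - near)
    activity = min(max(activity, 0.0), 1.0)
    return round(max_extra * closeness * activity)


def pick_additional_frames(start, end, count):
    """Evenly spaced frame numbers strictly between basic frames start and end."""
    gap = end - start
    count = min(count, max(gap - 1, 0))  # cannot add more frames than exist
    return [start + (i * gap) // (count + 1) for i in range(1, count + 1)]
```

With these defaults, a viewer at the near limit looking at a high-activity region gets the maximum number of extra frames, while a distant viewer gets none.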
  • the example illustrated in Fig. 4 is a simplification in which the displayed image corresponds to a root image having a simple, cuboid shape. However, according to the present invention, the root image may have a variety of different forms.
  • the relative placement positions of the basic frames may be selected such that the envelope of the root image has a shape which reflects motion in the corresponding video sequence (either camera motion, during tracking shots and the like, or motion of objects represented in the sequence) - see the corresponding interactive icon shown in Fig. 6A.
  • the dimensions of the basic frames in the root image may be scaled so as to visually represent a zoom effect occurring in the video sequence - see the corresponding interactive icon shown in Fig. 6B.
  • the interactive icon represented in Fig. 6B includes certain frames for which only a portion of the image information has been displayed. This corresponds to a case where an object of special interest has been selected. Such object selection can be made in various ways.
  • the root image may be designed such that, instead of including basic frames in full, only those portions of frames which represent a particular object are included. This involves a choice being made, at the time of design of the root image portion of the interface, concerning which objects are interesting. The designer can alternatively or additionally decide that the root image will include basic frames in full but that certain objects represented in the video sequence are to be "selectable” or "extractable” at user request. This feature will now be discussed with reference to Fig.7.
  • Figure 7A illustrates an initial view presented to a user when he consults the interface for a particular selected video sequence.
  • two people walk towards each other and their paths cross.
  • the designer of the interface has decided that the two people are objects that may be of interest to the end user. Accordingly, he has included, in the interface data file, information designating these objects as "extractable".
  • This designation information may correspond to x, y co-ordinate range information identifying the position of the object in each video frame (or a sub-set of frames). If the user expresses an interest in either of the two objects, for example, by designating a screen position corresponding to one of the objects (e.g.
  • interface application program controls the displayed image such that extraneous portions of the displayed frames disappear from the display, leaving only a representation of the two people and their motion, as shown in Fig.7B.
  • the objects of interest are “extracted” from their surroundings.
  • the "missing" or transparent portions of the displayed frames can be restored to the displayed image at the user's demand (e.g. by a further “click” of the mouse button).
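The extraction step can be illustrated with a small sketch, assuming that the designation information is a list of inclusive x, y coordinate ranges per frame and that a frame is a list of pixel rows. The function name and the use of `None` to mark a transparent pixel are hypothetical.

```python
def extract_objects(frame, boxes):
    """Return a copy of frame in which only designated objects remain visible.

    frame: list of rows, each a list of pixel values.
    boxes: list of (x0, y0, x1, y1) inclusive coordinate ranges, one per
           extractable object in this frame.
    Pixels outside every box become None, standing for "transparent".
    """
    def inside(x, y):
        return any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in boxes)

    return [
        [pixel if inside(x, y) else None for x, pixel in enumerate(row)]
        for y, row in enumerate(frame)
    ]


frame = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
extracted = extract_objects(frame, [(1, 0, 2, 1)])
```

Because the original frame is left untouched, restoring the "missing" portions at the user's demand only requires displaying the original frame again.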
  • interfaces may be designed such that particular "extractable" objects may be extracted simultaneously with some or all of the other extractable objects, or they may be extracted individually.
  • Sophisticated interfaces can incorporate object-extraction routines permitting the user to arbitrarily select objects visible in the displayed view of the root image, for extraction.
  • the user may use a pointing device to create a frame around an object visible in a displayed view of the root image and the application program then provides analysis routines permitting identification of the designated object in the other basic frames of the root image (and, if required, in additional frames) so as to cause display of that selected object as if it were located on transparent frames.
  • Preferred embodiments of interface according to the invention thus provide a so-called “accordion” effect, as illustrated in Fig.8.
  • the basic frames in the vicinity of the region of interest are spread so as to provide the user with a better view.
  • the function of displaying additional frames so as to increase detail is inhibited during the "accordion" effect.
  • Fig.9 illustrates an interactive video icon derived from a simple example of such a root image.
  • the designer may create secondary root images for the respective sub-sequences, these secondary root images being used to generate the displayed image only when the user designates a viewing position close to the video frame where the sub-sequence begins.
  • this is a logical choice since it is at the point where the video sub-sequence branches from the main sequence that user choices during playing of the game, or using of the educational software, change the experienced scenario.
  • VCR controls which permit the user to playback the video sequence with which the displayed video icon is associated.
  • the user can select for playback portions or frames within the sequence by, for example, "clicking" with the mouse button on the frames of interest as displayed in the interactive video icon.
  • the video playback can take place on a separate display screen or on a window defined on the display screen displaying the video icon.
  • a particular video sequence may be associated with an interface data file and a script.
  • the script is a routine defined by the interface designer which leads the user through the use of the interface.
  • the script can, for example, consist of a routine to cause an automatic demonstration of the different manipulations possible of the displayed quasi-object.
  • the user can alter the running of the script in the usual way, for example by pausing it, slowing it down, etc.
  • the script may, if desired, include additional text, sound or graphic information which can be reproduced in association with the displayed view of the root image either automatically or in response to operations performed by the end user.
  • Script functionality according to the present invention allows creation and editing of viewing scenarios that may subsequently be played, in part or in whole, automatically, or interactively with user inputs.
  • the user can cause the scenario to begin to play by itself and take the user through the scenario and any associated information by simply reading the scenario and changing the view.
  • the script may call for interaction by the user, such as to initiate a transaction.
  • the user may be asked to specify information, e.g. if he wants to purchase the video or any other items associated with what has been viewed.
  • the editor may leave visible tags which when activated by the user will cause some information to be displayed on the display device; e.g. associated text, graphics, video, or sound files which are played through the speakers of the display device. In certain cases these tags are attached to objects selected and extracted from the video sequence, such as so-called "hot objects" according to the present invention.
  • Fig. 10 is a flow diagram illustrating typical stages in the design of an interface according to the present invention, in the case where a designer is involved. It is to be understood that interfaces according to the present invention can also be generated entirely automatically. It will be noted that the designer's choices affect, notably, the content of the interface data file. It is to be understood, also, that not all of the steps illustrated in Fig. 10 are necessarily required - for example, steps concerning creation of secondary root images can be omitted in the case of a video sequence which is not multithreaded. Similarly, it may be desirable to include in the interface design process certain supplementary steps which are not shown in Fig.10.
  • the interface data file (as indicated in the example of Fig.3) information regarding the camera motion, cuts, etc. present in the video sequence.
  • this information can permit, for example, additional video frames to be added to the displayed image and positioned so as to provide a visual representation of the camera motion.
  • the information on the characteristics of the video sequence can be determined automatically (using known cut-detection techniques and the like) and/or may be specified by the interface designer.
  • the interface or sequence is accessed using such information applied according to a traditional method, such as standard database query language or through a browser via a channel or network; the interface data may be downloaded in its entirety or fetched on an as needed basis.
  • the present invention provides toolkits for use by designers wishing to create an interactive video interface according to the present invention.
  • These toolkits are preferably implemented as a computer program for running on a general purpose computer.
  • the toolkits present the designer with displayed menus and instructions to lead him through a process including steps such as the typical sequence illustrated in Fig. 10.
  • the designer first of all indicates for which video sequence he desires to create an interface, for example by typing in the name of a stored file containing the video sequence information.
  • the toolkit accesses this video sequence information for display in a window on the screen for consultation by the designer during the interface design process.
  • the designer may make his selection of basic frames/objects for the root image, extractable objects and the like by stepping slowly through the video sequence and, for example, using a mouse to place a cursor on frames or portions of frames which are of interest.
  • the toolkit logs the frame number (and x, y locations of regions in a frame, where appropriate) of the frames/frame portions indicated by the designer and associates this positional information with the appropriate parameter being defined.
  • the designer is presented with a displayed view of the root image for manipulation so that he may determine whether any changes to the interface data file are required.
  • Different versions of the application program can be associated with the interface data file (and script, if present) depending upon the interface functions which are to be supported. Thus, if no script is associated with the interface data file, the application program does not require routines handling the running of scripts. Similarly, if the interface data file does not permit an accordion effect to be performed by the end user then the application program does not need to include routines required for calculating display information for such effects. If the interface designer believes that the end user is likely already to have an application program suitable for running interfaces according to the present invention then he may choose not to package an application program with the interface data file or else to associate with the interface data file merely information which identifies a suitable version of application program for running this particular interface.
  • the present invention has been described above in connection with video sequences stored on CD-ROM. It is to be understood that the present invention can be realized in numerous other applications.
  • the content of the interface data file and the elements of the interface which are present at the same location as the end user can vary depending upon the application.
  • the user may first download via his telecommunications connection just the interface data file applicable to the sequence. If the user does not already have software suitable for handling manipulation of the interactive video icon then he will also download the corresponding application program. As the user manipulates the interactive video icon, any extra image information that he may require which has not already been downloaded can be downloaded in a dynamic fashion as required.
  • This process can be audited according to the present invention if desired.
  • the user's interaction with the interface can be audited, and he can interact with the transaction/audit functionality for example to supply any information required by a script which may then be recorded and stored.
  • the transaction/audit information can be stored and made available for (optionally) externally located auditing and transaction processing facilities/applications.
  • the auditing information can be transmitted at the end of a session whereas the transaction information may be performed on-line, i.e. the transaction information is submitted during the session. Real time transmission can also occur according to the present invention, however.
  • the interface data frame includes the image information. Some additional image information may also be provided.
  • Editors, readers and viewers according to the present invention can be implemented in hardware, hardware/software hybrid, or as software on a dedicated platform, a workstation, a personal computer, or any other hardware.
  • Different units implemented in software run on a CPU or graphics boards or other conventional hardware in a conventional manner, and the various storage devices can be general purpose computer storage devices such as magnetic disks, CD-ROMs, DVD, etc.
  • the editor connects to a database manager (101) and selects a video document and any other documents to be included in the interface by using a data chooser unit (102).
  • the database manager may be implemented in various ways; e.g., as a simple file structure or even as a complete multimedia database.
  • the data storage (100) contains the video data and any other information/documents required and can be implemented in various modes; e.g., in a simple stand-alone mode of operation it could be a CD-ROM or in a networked application it could be implemented as a bank of video servers.
  • the user operating through the user interaction unit (120) is first presented a list of available videos or uses a standard database query language to choose the desired video and then chooses any other documents required.
  • the creation of an interface using the editor is discussed below in three phases: (1) Analysis, (2) Visual layout and (3) Effects creation. 1. Analysis.
  • the video document chosen by the editor is first processed by the activity measure unit (103).
  • the activity measure unit is responsible for computing various parameters related to the motion and changes in the video. This unit typically will implement one of a number of known techniques for measuring changes, e.g., by calculating the statistics of the differences between frames, by tracking objects in motion, or by estimating camera motions by separating foreground and background portions of the image. In other implementations this unit may use motion vector information stored in an MPEG-encoded sequence to detect important frames of activity in the video document.
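As a rough illustration of the first technique mentioned (statistics of the differences between frames), the sketch below computes the mean absolute pixel difference between successive frames. It assumes greyscale frames stored as lists of rows of numbers; the function names are hypothetical and no particular published algorithm is implied.

```python
def mean_abs_difference(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def activity_per_frame(frames):
    """One activity value per frame; the first frame has no predecessor."""
    if not frames:
        return []
    return [0.0] + [
        mean_abs_difference(prev, cur) for prev, cur in zip(frames, frames[1:])
    ]


frames = [
    [[0, 0], [0, 0]],
    [[0, 4], [0, 0]],
    [[4, 4], [4, 4]],
]
activity = activity_per_frame(frames)
```

A large value signals an abrupt change such as a cut or fast motion, while values near zero indicate a static passage.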
  • the activity measures template store is optional but would contain templates which can be used to calculate the frame ranking measure and could be specified by the user through the user interaction unit.
  • the frame ranking measure is derived heuristically from these measures (e.g., by normalizing the values and taking an average of the parameters), and can be tailored for different kinds of sequences (traveling shots, single objects in motion, etc.) or applications.
  • the editor may choose a pre-defined set of parameters from the activity measures template store (108) to detect or highlight a specific kind of activity (rapid motion, abrupt changes, accelerations, etc.)
  • the frame ranking measures can be employed by the user acting through the user interaction unit on the frame selection unit (104) to select the frames to be included within the interface. For example, if 10 frames are to be included in the interface then in default mode the 10 frames corresponding to the 10 largest frame ranking measures are selected for inclusion in the interface. The user can then interactively de-select some of these frames and add other frames.
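The heuristic of normalizing the activity values, averaging them and keeping the highest-ranked frames can be sketched as follows. The measure names ("diff", "motion") and the min-max normalization are assumptions chosen for illustration; the text does not prescribe a particular normalization.

```python
def normalize(values):
    """Min-max normalize a list of numbers to the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def frame_ranking(measures):
    """Average the normalized activity measures frame by frame.

    measures: dict mapping a measure name to a list with one value per frame.
    """
    normalized = [normalize(vals) for vals in measures.values()]
    return [sum(col) / len(col) for col in zip(*normalized)]


def select_top_frames(scores, count):
    """Frame indices with the largest ranking measures, in temporal order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:count])


scores = frame_ranking({"diff": [0, 10, 5, 20], "motion": [1, 1, 3, 2]})
default_selection = select_top_frames(scores, 2)
```

The returned list corresponds to the default selection; interactive de-selection and addition by the user would then simply edit that list.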
  • the camera motion analysis unit (105) is an optional unit which typically will implement one of a number of known techniques for measuring camera motion parameters. This information can be used to determine what shape to give to the outer envelope of the interface as shown in Figure IC: a default shape, stored in the interface template store (116) can be chosen. This information may be optionally stored in the FDI file.
  • the object selection unit (106A) is responsible for selecting or detecting individual objects in the video document.
  • the editor may visually select and outline an object of interest in a given frame through the user interaction unit (120); in a semi-manual mode, the editor simply points at an object and chooses from the object templates store (107) features and associated algorithms to use for extracting and tracking the chosen object; in another mode the editor may choose one of a set of pre-defined templates of objects and known pattern matching techniques are used to detect whether any objects of interest are present. The user may even assign a name/identifier to the object and add the object to the object templates store (107). In this latter case searches for multiple occurrences of the same object can be initiated by the user.
  • the information regarding the properties of the object may be optionally stored in the FDI file.
  • the object extraction and tracking unit (106B) is now responsible for extracting the object of interest from the frame and then tracking it by using known tracking algorithms.
  • the algorithms used are either chosen by the user or by default. It is understood that the object selecting, detection, extraction, and tracking process may be highly interactive and that the user may be called upon or choose to intervene in the process a number of times.
  • the information about the presence and location of objects may be optionally stored in the FDI file.
  • the FDI file can be made available to an external program, for example when the interface editor is associated with an indexing program, the task of which is to attach indexes (identifiers) to the video documents, to portions thereof, or to objects located within the video document.
  • the user acting through the user interaction unit (120) on the interface creation unit (109) determines the visual layout of the interface. He can shape the outer envelope of the interface in any way that he desires; two examples are provided in Figures 6 and 9; in particular, multiple sequences can be concatenated and so implement branching effects representing alternatives to the user. Default shapes are stored in the interface template store (116). The user can also choose to vary the spacing of the frames seen on the interface; that is the distance between frames of the interface as perceived on the display unit. The user can also insert selections of the extracted and tracked objects from unit (106B) as illustrated in Figure 7B. In this case, the corresponding frames are rendered transparent except at the locations of the objects.
  • the different pieces of information generated by the units described above are gathered together by the interface creation unit (109) into an FDI file containing a description of the interface in terms of its layout, i.e. shape and structure, the image frame numbers and their positions, and, if available, the extracted features, the ranking of the frames and the camera motion information. This information is transmitted to the interface effects creation unit (117). 3. Effects Creation.
  • the editor can also specify three classes of interface features which serve to convey additional information to the user and which allow the user to interact with the interface.
  • the editor performs this specification through the interface effects creation unit (117).
  • the zooming effects creation unit (110) is used by the editor to specify which frames will be made visible, and also which will be rendered invisible to the user when he moves up closer to the interface (Fig. 5B) so as to view it from a new viewing position.
  • the choice of frames to add depends upon factors such as the distance of the viewing point from the interface, the degree of motion, the degree of scene change, the number of frames that can be made visible and optionally the frame ranking measures calculated by the activity measure unit (103).
  • the editor can choose to use one or more of the default zooming effect templates contained in the zooming effect templates store (113) and assign these in a differential manner to different parts of the interface; alternatively the editor can choose to modify these templates and apply them differentially to the interface.
  • the special effects creation unit (111) is used by the editor to create special visual effects on the interface.
  • One such example is the accordion effect illustrated in Fig. 8 where parts of the interface are compressed and other parts are expanded.
  • Another example is illustrated in Fig. 7A and 7B where the editor has designated an extractable object and which is then shown in its extracted form; in other words, the background is removed.
  • the editor creates the scripts by calling up templates from the specific effects templates store (114) and instantiating them by defining the positions where the special effect is to take place and by setting the appropriate parameters.
  • the script effects creation unit (113) allows the editor of the interface to build an interface viewing scenario that may subsequently be played, in part or in whole, automatically, or interactively with user inputs. For example, in a completely automatic mode when the user calls up the interface it begins to play by itself and takes the user through the interface and any associated information by simply reading the scenario and changing the view of the interface.
  • the script may call for the user to interact with the interface, e.g. to initiate a transaction. In this case the user may be asked to specify information, e.g. if he wants to purchase the video or any other items associated with the interface.
  • the editor may leave visible tags which when activated by the user will cause some information to be displayed on the display device; e.g.
  • the editor creates the scripts by calling up templates from the script effects templates store (115) and instantiating them by defining the tag and the locations of the information to be called up.
  • the interface effects creation unit (117) creates 4 files which are passed to the interface database manager (118) which will store these files either remotely or locally as the case may be: (1) The FDI file, completed by the special effect and script tags, text and graphics which have been added to the interface and which are directly visible to the user. (2) The zoom effect details, scripts and special effects.
  • the user/editor can view the interface under construction, according to the current set of parameters, templates and designer preferences, on the interface viewer unit (121) (presented in Figure 12 and described below), thus allowing the editor to interactively change its appearance and features.
  • the interface viewer unit is then employed to read and interact with the interface.
  • the storage units (201) are remotely located and accessed through the interface database manager (202) by way of a communication channel or network; depending upon the size and characteristics of the channel and the application the interface data may be loaded in its entirety or fetched on an as-needed basis.
  • the data are then stored in a local memory unit (203) which may be either a cache memory, a disk store or any other writable storage element.
  • the local memory unit (203) stores the 4 files created by the editor (see above) and in addition a transaction/audit file. In certain cases the applications programs are already resident in the interface viewer unit and so do not need to be transmitted.
  • the CPU unit (204) fetches the application program, deduces which actions need to be performed, and then fetches the relevant interface information contained in the local memory unit (203). Typically the CPU unit fetches the required application program for the user interaction unit (205), the navigation unit (206), and the transaction/audit unit (207); interface information is then read from the local memory unit (203) and passed to the interface renderer unit (208), which calculates how the interface is to appear or be rendered for viewing on the display device (209).
  • the user interacts with the interface through the user interaction unit (205) to the navigation unit (206) and all his actions are audited by the transaction/audit unit (207).
  • the user can interact with the transaction/audit unit (207) for example to supply any information required by the script which is then recorded and stored in the transaction/audit portion of the local memory unit (203).
  • this transaction/audit file or a portion thereof is transmitted by the interface database manager to the appropriate storage unit (201). This information is then available for (optionally) externally located auditing and transaction processing facilities/applications.
  • the auditing information is transmitted at the end of the session whereas the transaction information may be performed on-line, i.e. the transaction information is submitted during the session.
  • the interface renderer unit (208) calculates how the interface is to appear or be rendered for viewing on the display device (209).
  • the zoom effects unit (210) fetches the required application program, reads the zoom effect parameters stored in the local memory store (203), determines the frames to be dropped or added and supplies this information (including the additional frames if needed) to interface renderer unit (208) which then calculates how the interface is to appear or be rendered for viewing on the display device (209).
  • the video play effects unit (211) fetches the required application program, then reads the required video data from the local memory unit (203) and plays the video on a second display device (209) or in a new window if only one display device is available.
  • the special effects unit (212) fetches the required application program, reads the locations of the object and the corresponding frames are modified so as to be transparent wherever the objects do not occur; the new frames are passed to interface renderer unit (208) which then calculates how the interface is to appear or be rendered for viewing on the display device (209).
  • the frames are passed to the video effects unit (211) which then plays the video on a second display device (209) or in a new window if only one display device is available.
  • the special effects unit fetches the accordion effect parameters stored in the local memory store (203), determines the frames to be dropped or added, calculates the relative position of all the frames, and supplies this information (including the additional frames if needed) to the interface renderer unit (208), which then calculates how the interface is to appear or be rendered for viewing on the display device (209).
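The calculation of relative frame positions for the accordion effect might look like the sketch below, in which gaps next to the frame of interest are stretched and all other gaps are shrunk. The parameter names (`radius`, `expand`, `compress`) and their default values are hypothetical stand-ins for the stored accordion effect parameters.

```python
def accordion_positions(z_positions, focus_index, radius=1, expand=3.0, compress=0.5):
    """Recompute frame depths so that frames near focus_index are spread apart.

    z_positions: current depths of the displayed frames, in ascending order.
    A gap between two neighbouring frames is expanded when either of them
    lies within `radius` frames of the focus; every other gap is compressed.
    """
    new_z = [z_positions[0]] if z_positions else []
    for i in range(len(z_positions) - 1):
        gap = z_positions[i + 1] - z_positions[i]
        near = min(abs(i - focus_index), abs(i + 1 - focus_index)) < radius
        new_z.append(new_z[-1] + gap * (expand if near else compress))
    return new_z


spread = accordion_positions([0, 1, 2, 3, 4], focus_index=2)
# The two gaps touching frame 2 are tripled; the outer gaps are halved.
```

The resulting depths can be handed to the renderer in place of the original time-proportional depths while the effect is active.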
  • the script effects unit (214) fetches the required application program, reads the corresponding portion of the script and the related information required to carry out the portion of the script associated with the tag designated. If the interface is to be played in automatic mode then the script effects unit (214) fetches the entire script and all the related information required to carry out the script. When needed the zoom effects unit (210), the video play unit (211), and the special effects unit (212) may be called into play. If the script calls for user input such as required for carrying out a transaction, then a new window may be opened on the display device (or on a second display device) where the information is supplied and transmitted to the transaction/audit unit (207).
  • references above to user input or user selection processes cover the use of any input device whatsoever operable by the user including, but not limited to, a keyboard, a mouse (or other pointing device), a touch screen or panel, glove input devices, detectors of eye movements, voice actuated devices, etc.
  • references above to "displays" cover the use of numerous different devices such as, but not limited to, conventional monitor screens, liquid crystal displays, etc.
  • the respective root images each have a single characteristic feature, such as, giving a visual representation of motion, or giving a visual representation of zoom, or having a multi-threaded structure, etc. It is to be understood that a single root image can combine several of these features, as desired. Similarly, special effects such as object extraction, the accordion effect, etc. have been described separately. Again, it is to be understood that interfaces according to the invention can be designed to permit any desired combination of special effects.

Abstract

Interactive interfaces to video information provide a displayed view of a quasi-object called a root image. The root image consists of a plurality of basic frames selected from the video information, arranged such that their respective x and y directions are aligned with the x and y directions in the root image and the z direction in the root image corresponds to time, such that base frames are spaced apart in the z direction of the root image in accordance with their time separation. The displayed view of the root image changes in accordance with a designated viewing position, as if the root image were a three-dimensional object. The user can manipulate the displayed image by designating different viewing positions, selecting portions of the video information for playback and by special effects, such as cutting open the quasi-object for a better view. A toolkit permits interface designers to design such interfaces, notably so as to control the types of interaction which will be possible between the interface and an end user. Implementations of the interfaces including editors and viewers are also disclosed.

Description

INTERACTIVE VIDEO INTERFACES
The present invention relates to the field of interfaces for video information. More particularly, the present invention provides interactive interfaces for video information and tool-kits for use in creation of such interactive interfaces.
Background and Summary
Video information is being produced at an ever-increasing rate and video sequences, especially short sequences, are increasingly being used, for example, in websites and on CD-ROM, and being created, for example, by domestic use of camcorders. There is a growing need for tools enabling the indexing, handling and interaction with video data. It is particularly necessary for interfaces to be provided which enable a user to access video information selectively and to interact with that information, especially in a non-sequential way.
Conventionally, video information consists of a sequence of frames recorded at a fixed time interval. In the case of classic television signals, for example, the video information consists of 25 or 30 frames per second. Each frame is meaningful since it corresponds to an image which can be viewed. A frame may be made up of a number of interlaced fields, but this is not obligatory as is seen from more recently proposed video formats, such as those intended for high definition television. Frames describe the temporal decomposition of the video image information. Each frame contains image information structured in terms of lines and pixels, which represent the spatial decomposition of the video. In the present document, the terms "video information" or "video sequences" refer to data representing a visual image recorded over a given time period, without reference to the length of that time period or the structure of the recorded information. Thus, the term "video sequence" will be used to refer to any series of video frames, regardless of whether this series corresponds to a single camera shot (recorded between two cuts) or to a plurality of shots or scenes.
Traditionally, if a user desired to know what was the content of a particular video sequence he was obliged to watch as each frame, or a sub-sample of the frames, of the sequence was displayed successively in time. (For purposes of this document, the terms "he," "him," or "his" are used for convenience in place of she/he, her/him and hers/his, and are intended to be gender-neutral.) This approach is still widespread, and in applications where video data is accessed using a personal computer, the interface to the video often consists of a displayed window in which the video sequence is contained and a set of displayed controls similar to those found on a video tape recorder (allowing fast-forward, rewind, etc.).
Developments in the fields of video indexing and video editing have provided other forms of interface to video information. In the field of video indexing, it is necessary to code information contained in a video sequence in order to enable subsequent retrieval of the sequence from a database by reference to keywords or concepts. The coded content may, for example, identify the types of objects present in the video sequence, their properties/motion, the type of camera movements involved in the video sequence (pan, tracking shot, zoom, etc.), and other properties. A "summary" of the coded document may be prepared, consisting of certain representative frames taken from the sequence, together with text information or icons indicating how the sequence has been coded. The interface for interacting with the video database typically includes a computer input device enabling the user to specify objects or properties of interest and, in response to the query, the computer determines which video sequences in the database correspond to the input search terms and displays the appropriate "summaries". The user then indicates whether or not a particular video sequence should be reproduced. Examples of products using this approach are described in the article "Advanced Imaging Product Survey: Photo, Document and Video" from the journal "Advanced Imaging", October 1994, which document is incorporated herein by this reference.
In some video indexing schemes, the video sequence is divided up into shorter series of frames based upon the scene changes or the semantic content of the video information. A hierarchical structure may be defined. Index "summaries" may be produced for the different series of frames corresponding to nodes in the hierarchical structure. In such a case, at the time when a search is made, the "summary" corresponding to a complete video sequence may be retrieved for display to the user who is then allowed to request display of "summaries" relating to sub-sections of the video sequence which are lower down in the hierarchical structure. If the user so wishes, a selected sequence or sub-section is reproduced on the display monitor. Such a scheme is described in EP-A-0 555 028 which is incorporated herein by this reference.
A disadvantage of such traditional, indexing/searching interfaces to video sequences is that the dynamic quality of the video information is lost.
Another approach, derived from the field of video editing, consists of the "digital storyboard". The video sequence is segmented into scenes and one or more representative frames from each scene is selected and displayed, usually accompanied by text information, side-by-side with representative frames from other segments. The user now has both a visual overview of all the scenes and a direct visual access to individual scenes. Each representative frame of the storyboard can be considered to be an icon. Selection of the icon via a pointing device (typically a mouse-controlled cursor) causes the associated video sequence or sub-sequence to be reproduced. Typical layouts for the storyboards are two-dimensional arrays or long one-dimensional strips. In the first case, the user scans the icons from the left to the right, line by line, whereas in the second case the user needs to move the strip across the screen.
Digital storyboards are typically created by a video editor who views the video sequence, segments the data into individual scenes and places each scene, with a descriptive comment, onto the storyboard. As is well-known from technical literature, many steps of this process can be automated. For example, different techniques for automatic detection of scene changes are discussed in the following documents, each of which is incorporated herein by reference:
"A Real-time neural approach to scene cut detection" by Ardizzone et al, IS&T/SPIE - Storage & Retrieval for Image and Video Databases IV, San Jose, Ca.;
"Digital Video Segmentation" by Hampapur et al, ACM Multimedia '94 Proceedings, ACM Press;
"Extraction of News Articles based on Scene Cut Detection using DCT Clustering" by Ariki et al, International Conference on Image Processing, September 1996, Lausanne, Switzerland;
"Automatic partitioning of full-motion video" by HongJiang Zhang et al, Multimedia Systems (Springer-Verlag, 1993), 1, pages 10-28; and
EP-A-0 590 759.
Various methods for automatically detecting and tracking persons and objects in video sequences are considered in the following documents, each of which is incorporated herein by reference:
"Modeling, Analysis and Visualization of Nonrigid Object Motion", by T.S. Huang, Proc. of International Conf. on Pattern Recognition, Vol. 1, pp 361-364, Atlantic City, NJ, June 1990; and
"Segmentation of People in Motion" by Shio et al, Proc. IEEE, vol. 79, pp 325-332, 1991.
Techniques for automatically detecting different types of camera shot are described in "Global zoom/pan estimation and compensation for video compression" by Tse et al, Proc. ICASSP, Vol.4, pp 2725-2728, May 1991; and
"Differential estimation of the global motion parameters zoom and pan" by M. Hoetter, Signal Processing, Vol. 16, pp 249-265, 1989.
In the case of digital storyboards too, the dynamic quality of the video sequence is often lost or obscured. Some impression of the movement inherent in the video sequence can be preserved by selecting several frames to represent each scene, preferably frames which demonstrate the movement occurring in that scene. However, storyboard-type interfaces to video information remain awkward to use in view of the fact that multiple actions on the user's part are necessary in order to view and access data. Attempts have been made to create a single visual image which represents both the content of individual views making up a video sequence and preserves the context, that is, the time-varying nature of the video image information.
One such approach creates a "trace" consisting of a single frame having superimposed images taken from different frames of the video sequence, these images being offset one from the other due to motion occurring between the different frames from which the images were taken. Thus, for example, in the case of a video sequence representing a sprinter running, the corresponding "trace" will include multiple (probably overlapping) images of the sprinter, spaced in the direction in which the sprinter is running. Another approach of this kind generates a composite image, called a "salient still", representative of the video sequence - see "Salient Video Stills: Content and Context Preserved" by Teodosio et al, Proc. ACM Multimedia 93, California, August 1-6, 1993, pp 39-47, which article is incorporated herein by this reference in its entirety. Still another approach of this general type consists in creation of a "video icon", as described in the papers "Developing Power Tools for Video Indexing and Retrieval" by Zhang et al, SPIE, Vol.2185, pp 140-149; and "Video Representation tools using a unified object and perspective based approach" by the present inventors, IS&T/SPIE Conference on Storage and Retrieval for Image and Video Databases, San Jose, California, February 1995, which are incorporated herein by reference.
In a "video icon", as illustrated in Fig. 1A, the scene is represented by a number of frames selected from the sequence and which are displayed as if they were stacked up one behind the other in the z-direction and are viewed in perspective. In other words, each individual frame is represented by a plane and the planes lie one behind the other with a slight offset. Typically the first frame of the stack is displayed in its entirety whereas underlying frames are partially occluded by the frames in front. The envelope of the stack of frames has a parallelepiped shape. The use of a number of frames, even if they are partially occluded, gives the user a more complete view of the scene and, thus, a better visual understanding. Furthermore, with some such icons, the user can directly access any frame represented in the icon.
Two special types of video icon have been proposed, "object based" video icons and video icons containing a representation of camera movement. In an "object based" video icon, as illustrated in Fig. 1B, objects of interest are isolated in the individual frames and, for at least some of the stacked frames, the only image information included in the video icon is the image information corresponding to the selected object. In such a video icon, at least some of the individual frames are represented as if they were transparent except in the regions containing the selected object. Video icons containing an indication of camera movement may have, as illustrated in the example of Fig. 1C, a serpentine-shaped envelope corresponding to the case of side-to-side motion of the camera.
The video icons discussed above present the user with information concerning the content of the whole of a video sequence and serve as a selection tool allowing the user to access frames of the video sequence out of the usual order. In other words, these icons allow non-sequential access to the video sequence. Nevertheless, the ways in which the user can interact with the video sequence information are strictly limited. The user can select frames for playback in a non-sequential way but he has little or no means of obtaining a deeper level of information concerning the video sequence as a whole, short of watching a playback of the whole sequence.
The present invention provides a novel type of interface to video information which allows the user to access information concerning a video sequence in a highly versatile manner. In particular, interactive video interfaces of the present invention enable a user to obtain deeper levels of information concerning an associated video sequence at positions in the sequence which are designated by the user as being of interest.
The present invention provides an interface to information concerning an associated video sequence, one such interface comprising: information defining a three-dimensional root image, the root image consisting of a plurality of basic frames selected from said video sequence, and/or a plurality of portions of video frames corresponding to selected objects represented in the video sequence, x and y directions in the root image corresponding to x and y directions in the video frames and the z direction in the root image corresponding to the time axis whereby the basic frames are spaced apart from one another in the z direction of the root image by distances corresponding to the time separation between the respective video frames; means for displaying views of the root image; means for designating a viewing position relative to said root image; and means for calculating image data representing said three-dimensional root image viewed from the designated viewing position, and for outputting said calculated image data to the displaying means.
According to the present invention, customized user interfaces may be created for video sequences. These interfaces comprise a displayable "root" image which directly represents the content and context of the image information in the video sequence and can be manipulated, either automatically or by the user, in order to display further image information, by designation of a viewing position with respect to the root image, the representation of the displayed image being changed in response to changes in the designated viewing position. In a preferred embodiment of the present invention, the representation of the displayed image changes dependent upon the designated viewing position as if the root image were a three-dimensional object. In such preferred embodiments, as the designated viewing position changes, the data necessary to form the displayed representation of the root image is calculated so as to provide the correct perspective view given the viewing angle, the distance separating the viewing position from the displayed quasi-object and whether the viewing position is above or below the displayed quasi-object.
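The perspective calculation described above can be illustrated with a minimal sketch. It assumes a simple pinhole projection in which the viewing position is given by a yaw angle, a pitch angle and a distance from the front of the root image; the function names, the angle convention and the `focal` parameter are illustrative assumptions and are not taken from the specification.

```python
import math


def project_point(point, yaw, pitch, distance, focal=1.0):
    """Project a 3-D point of the root image onto the 2-D display plane.

    The point is rotated by the viewing angles (radians) and then divided
    by its depth, so that frames further along the time (z) axis look smaller.
    """
    x, y, z = point
    # Rotate about the vertical (y) axis: viewing from the left or right.
    cos_yaw, sin_yaw = math.cos(yaw), math.sin(yaw)
    x1 = cos_yaw * x + sin_yaw * z
    z1 = -sin_yaw * x + cos_yaw * z
    # Rotate about the horizontal (x) axis: viewing from above or below.
    cos_pitch, sin_pitch = math.cos(pitch), math.sin(pitch)
    y2 = cos_pitch * y - sin_pitch * z1
    z2 = sin_pitch * y + cos_pitch * z1
    depth = z2 + distance
    if depth <= 0:
        raise ValueError("point lies at or behind the viewing position")
    return (focal * x1 / depth, focal * y2 / depth)


def project_frame(width, height, z, yaw, pitch, distance):
    """Return the four projected corners of a frame centred on the z axis."""
    half_w, half_h = width / 2.0, height / 2.0
    corners = [
        (-half_w, -half_h, z),
        (half_w, -half_h, z),
        (half_w, half_h, z),
        (-half_w, half_h, z),
    ]
    return [project_point(c, yaw, pitch, distance) for c in corners]
```

Calling `project_frame` for each basic frame, from the rearmost to the frontmost, would give the quadrilaterals to be drawn for a given viewing position; occlusion and resolution changes are outside the scope of this sketch.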
In a reduced form, the present invention can provide non-interactive interfaces to video sequences, in which the root image information is packaged with an associated script defining a routine for automatically displaying a sequence of different views of the root image and performing a set of manipulations on the displayed image, no user manipulation being permitted. However, the full benefits of the invention are best seen in interactive interfaces where the viewing position of the root image is designated by the user, as follows. When the user first accesses the interface he is presented with a displayed image which represents the root image seen from a particular viewpoint (which may be a predetermined reference viewpoint). As he designates different viewing angles, the displayed image represents the root image seen from different perspectives. When the user designates viewing positions at greater or lesser distances from the root image, the displayed image increases or reduces the size and, preferably, resolution of the displayed information, accessing image data from additional video frames, if need be. The customized, interactive interfaces provided by the present invention involve displayed images, representing the respective associated video sequences, which, in some ways, could be considered to be a navigable environment or a manipulable object. This environment or object is a quasi-three-dimensional entity. The x and y dimensions of the environment/object correspond to true spatial dimensions (corresponding to the x and y directions in the associated video frames) whereas the z dimension of the environment/object corresponds to the time axis. These interfaces could be considered to constitute a development of the "video icons" discussed above, now rendered interactive and manipulable by the user.
With the interfaces provided by the present invention, the user can select spatial and temporal information from a video sequence for access by designating a viewing position with respect to a video icon representing the video sequence. Arbitrarily chosen oblique "viewing directions" are possible whereby the user simultaneously accesses image information corresponding to portions of a number of different frames in the video sequence. As the user's viewing position relative to the video icon changes, the amount of a given frame which is visible to him, and the number and selection of frames which he can see, changes correspondingly.
As mentioned above, the interactive video interfaces of the present invention make use of a "root" image comprising a plurality of basic frames arranged to form a quasi-three dimensional object. It is preferred that the relative placement positions of the basic frames be arranged so as to indicate visually some underlying motion in the video sequence. Thus, for example, if the video sequence corresponds to a travelling shot moving down a hallway and turning a corner, the envelope of the set of basic frames preferably does not have a parallelepiped shape but, instead, composes a "pipe" of rectangular section and bending in a way corresponding to the camera travel during filming of the video sequence.
In preferred embodiments of video interfaces according to the present invention, the basic video frames making up the root image are chosen as a function of the amount of motion or change in the sequence. For example, in the case of a video sequence corresponding to a travelling shot, in which the background information changes, it is preferable that successive basic frames should include background information overlapping by, say, 50%.
In certain embodiments of the present invention, the root image corresponds to an "object-based video icon." In other words, certain of the basic frames included in the root image are not included therein in full; only those portions corresponding to selected objects are included. Alternatively, or additionally, certain basic frames may be included in full in the root image but may include "hot objects," that is, representations of objects selectable by the user. In response to selection of such "hot objects" by the user, the corresponding basic frames (and, if necessary, additional frames) are then displayed as if they had become transparent at all portions thereof except the portion(s) where the selected object or objects are displayed. The presence of such selectable objects in the root image allows the user to selectively isolate objects of interest in the video sequence and obtain at a glance a visual impression of the appearance and movement of the objects during the video sequence.
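The "transparent except where the selected object is displayed" behaviour can be sketched as a per-pixel alpha mask. This is only an illustration: it assumes each frame is a grid of RGB tuples and that a boolean object mask of the same size is already available (for instance from one of the segmentation techniques cited earlier); the function name `isolate_object` is hypothetical.

```python
def isolate_object(frame, mask):
    """Return an RGBA copy of `frame` that is opaque only where `mask` is True.

    frame: list of rows, each row a list of (r, g, b) tuples.
    mask:  list of rows of booleans with the same shape as `frame`.
    """
    return [
        [(r, g, b, 255 if inside else 0) for (r, g, b), inside in zip(row, mask_row)]
        for row, mask_row in zip(frame, mask)
    ]
```

Applying this to each basic frame that contains the selected hot object, and then stacking the results along the time axis, would leave only the object visible through the stack.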
The interfaces of the present invention allow the user to select an arbitrary portion of the video sequence for playback. The user designates a portion of the video sequence which is of interest, by designating a corresponding portion of the displayed image forming part of the interface to the video sequence. This portion of the video sequence is then played back. The interface may include a displayed set of controls similar to those provided on a VCR in order to permit the user to select different modes for this playback, such as fast-forward, rewind, etc.
In preferred embodiments of interfaces according to the invention, the displayed image forming part of the interface remains visible whilst the designated portion of the sequence is being played back. This can be achieved in any number of ways, for example by providing a second display device upon which the playback takes place, or by designating a "playback window" on the display screen, this playback window being offset with respect to the screen area used by the interface, or by any other suitable means.
The preferred embodiments of interfaces according to the invention also permit the user to designate an object of interest and to select a playback mode in which only image information concerning that selected object is included in the playback.
Furthermore, the user can select a single frame from the video sequence for display separately from the interactive displayed image generated by the interface.
In preferred embodiments, the interfaces of the present invention allow the user to generate a displayed image corresponding to a distortion of the root image. More especially, the displayed image can correspond to the root image subjected to an "accordion effect", where the root image is "cracked open", for example, by bending around a bend line so as to "fan out" video frames in the vicinity of the opening point, or is modified by linearly spreading apart video frames at a point of interest. The accordion effect can also be applied repetitively or otherwise in a nested fashion according to the present invention.
The present invention can provide user interfaces to "multi-threaded" video sequences, that is, video sequences consisting of numerous interrelated shorter segments such as are found, for example, in a video game where the user's choices change the scene which is displayed. Interfaces to such multi-threaded video sequences can include frames of the different video segments in the root image, such that the root image has a branching structure. Alternatively, some or all of the different threads may not be visible in the root image but may become visible as a result of user manipulation. For example, if the user expresses an interest in a particular region of the video sequence by designating a portion of a displayed root image using a pointing device (such as a mouse, or by touching a touch screen, etc.) then if multiple different threads of the sequence start from the designated area, image portions for these different threads may be added to the displayed image.
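A branching root image of this kind can be modelled as a tree of video segments. The sketch below is illustrative only; the `Segment` class, its fields and the two helper functions are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """One thread segment of a multi-threaded video sequence."""
    name: str
    frame_count: int
    branches: List["Segment"] = field(default_factory=list)


def threads_from(segment):
    """Names of the threads that start where `segment` ends."""
    return [branch.name for branch in segment.branches]


def all_paths(segment):
    """Every route through the branching structure, as lists of segment names."""
    if not segment.branches:
        return [[segment.name]]
    return [
        [segment.name] + path
        for branch in segment.branches
        for path in all_paths(branch)
    ]
```

When the user designates a region of the displayed image, a viewer built along these lines could call `threads_from` on the corresponding segment to decide which additional image portions to add to the display.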
In preferred embodiments of interfaces according to the present invention, the root image for the video sequence concerned is associated with information defining how the corresponding displayed image will change in response to given types of user manipulation. Thus, for example, this associated information may define how many, or which additional frames are displayed when the user moves the viewing position closer up to the root image. Similarly, the associated information may identify which objects in the scene are "hot objects" and what image information will be displayed in relation to these hot objects when activated by the user.
Furthermore, different possibilities exist for delivering the components of the interface to the end user. In an application where video sequences are transmitted to a user over a telecommunications path, such as via the Internet, the user who is interested in a particular video sequence may first download only certain components of the associated interface. First of all he downloads information for generating a displayed view of the root image, together with an associated application program (if he does not already have an appropriate "interface player" loaded in his computer). The downloaded (or already-resident) application program includes basic routines for changing the perspective of the displayed image in response to changes in the viewing position designated by the user. The application program is also adapted to consult any "associated information" (as mentioned above) which forms part of the interface and conditions the way in which the displayed image changes in response to certain predetermined user manipulations (such as "zoom-in" and "activate object"). If the interface does not contain any such "associated information" then the application program makes use of pre-set default parameters.
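The fallback from "associated information" to pre-set defaults can be sketched as a simple merge of settings. The parameter names and default values below are purely hypothetical; the specification does not define a concrete format for the associated information.

```python
# Hypothetical default parameters used when an interface carries no
# "associated information" of its own.
DEFAULTS = {
    "zoom_in_extra_frames": 2,   # frames added when the viewer moves closer
    "hot_objects": (),           # identifiers of selectable objects
    "allow_playback": True,
}


def effective_settings(associated_info=None):
    """Combine an interface's associated information with the defaults."""
    settings = dict(DEFAULTS)
    if associated_info:
        unknown = set(associated_info) - set(DEFAULTS)
        if unknown:
            raise KeyError("unknown settings: " + ", ".join(sorted(unknown)))
        settings.update(associated_info)
    return settings
```

An interface player could call `effective_settings` once when an interface is loaded and then consult the result whenever the user zooms in or activates an object.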
The root image corresponds to a particular set of basic video frames and information designating relative placement positions thereof. The root image information downloaded to the user may include just the data necessary to create a reference view of the root image or it may include the image data for the set of basic frames (in order to enable the changes in user viewing angle to be catered for without the need to download additional information). In a case where the user performs a manipulation which requires display of video information which is not present in the root image (e.g. he "zooms in" such that data from additional frames is required), this extra information can either be pre-packaged and supplied with the root image information or the extra information can be downloaded from the host website as and when it is needed.
Similar possibilities exist in the case of interfaces provided on CD-ROM. In general, the root image and other associated information will be provided on the CD-ROM in addition to the full video sequence. However, it is to be understood that, for reasons of space saving, catalogues of video sequences could be made consisting solely of interfaces, without the corresponding full video sequences.
In addition to providing the interfaces themselves, the present invention also provides apparatus for creation of interfaces according to the present invention. This may be dedicated hardware or, more preferably, a computer system programmed in accordance with specially designed computer programs.
Various of the steps involved in creation of a customized interface according to the present invention can be automated. Thus, for example, the selection of basic frames for inclusion in the "root image" of the interface can be made automatically according to one of a number of different algorithms, such as choosing one frame every n frames, or choosing one frame every time the camera movement has displaced the background by m%, etc. Similarly, the relative placement positions of the basic frames in the root image can be set automatically taking into account the time separation between those frames and, if desired, other factors such as camera motion. Similarly, the presence of objects or people in the video sequence can be detected automatically according to one of the known algorithms (such as those discussed in the references cited above), and an "object oriented" root image can be created automatically. Thus, in some embodiments, the interface creation apparatus of the present invention has the capability of automatically processing video sequence information in order to produce a root image. These embodiments include means for associating with the root image a standard set of routines for changing the representation of the displayed image in response to user manipulations. However, it is often preferable actively to design the characteristics of interactive interfaces according to the invention, such that the ways in which the end user can interact with the video information are limited or channeled in preferred directions. This is particularly true in the case of video sequences which are advertisements or are used in educational software and the like.
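The two automatic selection rules mentioned above (one frame every n frames, or one frame each time the background has been displaced by m%) can be sketched as follows. The sketch assumes that a per-frame background displacement, expressed as a percentage of the frame width, has already been estimated by some global-motion technique; the function names and that input format are assumptions made for illustration.

```python
def select_every_n(num_frames, n):
    """Indices of basic frames when one frame is kept every n frames."""
    if n <= 0:
        raise ValueError("n must be positive")
    return list(range(0, num_frames, n))


def select_by_displacement(displacements, threshold_percent):
    """Indices of basic frames chosen by accumulated background displacement.

    displacements[i] is the background shift between frame i-1 and frame i,
    as a percentage of the frame width (displacements[0] is ignored).
    A new basic frame is chosen each time the accumulated shift since the
    last basic frame reaches `threshold_percent`.
    """
    if not displacements:
        return []
    selected = [0]
    accumulated = 0.0
    for i in range(1, len(displacements)):
        accumulated += abs(displacements[i])
        if accumulated >= threshold_percent:
            selected.append(i)
            accumulated = 0.0
    return selected
```

With a threshold of 50, successive basic frames chosen by `select_by_displacement` share roughly half of their background, which corresponds to the 50% overlap suggested earlier for travelling shots.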
Thus, the present invention provides a toolkit for use in creation of customized interfaces. In preferred embodiments, the toolkit enables a designer to tailor the configuration and content of the root image, as well as to specify which objects in the video sequence are "hot objects" and to control the way in which the displayed interface image will change in response to manipulation by an end user. Thus, among other things, the toolkit enables the interface designer to determine which frames of the video sequence should be used as basic frames in the root image, and how many additional frames are added to the displayed image when the user designates a viewing position close to the root image.
Brief Description of the Drawings
Further features and advantages of the present invention will become apparent from the following description of preferred embodiments thereof, given by way of example, and illustrated by the accompanying drawings, in which:
Figure 1 illustrates various types of video icon, wherein Fig. 1A shows an ordinary video icon, Fig. 1B shows an object-based video icon and Fig. 1C shows a video icon including a representation of camera motion;
Figure 2 is a block diagram indicating the components of an interactive interface according to a first embodiment of the present invention;
Figure 3 is a diagram illustrating the content of the interface data file (FDI) used in the first embodiment of the invention;
Figure 4 is a diagram illustrating a reference view of a root image and three viewing positions designated by a user;
Figure 5 illustrates the displayed image in the case of the root image viewed from the different viewing positions of Fig.4, wherein Fig.5A represents the displayed image from viewing position A, wherein Fig.5B represents the displayed image from viewing position B, and wherein Fig.5C represents the displayed image from viewing position C;
Figure 6 illustrates displayed images based on more complex root images according to the present invention, in which Fig.6A is derived from a root image visually representing motion and Fig.6B is derived from a root image visually representing a zoom effect;
Figure 7 illustrates the effect of user selection of an object represented in the displayed image, in a second embodiment of interface according to the present invention;
Figure 8 illustrates a user manipulation of a root image to produce an "accordion effect";
Figure 9 illustrates a displayed image corresponding to a view of a branching root image associated with a multi-threaded scenario;
Figure 10 is a flow diagram indicating steps in a preferred process of designing an interface according to the present invention;
Figure 11 is a schematic representation of a preferred embodiment of an interface editor unit according to the present invention; and
Figure 12 is a schematic representation of a preferred embodiment of an interface viewer according to the present invention.
Detailed Description
The components of an interactive interface according to a first preferred embodiment of the present invention will now be described with reference to Fig.2. In this example, an interactive interface of the invention is associated with video sequences recorded on a CD-ROM.
As shown in Fig.2, a CD-ROM reader 1 is connected to a computer system including a central processor portion 2, a display screen 3, and a user-operable input device which, in this case, includes a keyboard 4 and a mouse 5. When the user wishes to consult video sequences recorded on a CD-ROM 7, he places the CD-ROM 7 in the CD-ROM reader and activates CD-ROM accessing software provided in the central processor portion 2 or an associated memory or unit. According to the first embodiment of the invention, the CD-ROM has recorded thereon not only the video sequence image information 8 (in any convenient format), but also a respective interface data file (FDI) 10 for each video sequence, together with a video interface application program 11. The content of a typical data file is illustrated in Fig.3. Respective scripts 12 are optionally associated with the interface data files. When data on the CD-ROM is to be read, the video interface application program 11 is operated by the central processor portion 2 of the computer system and the interface data file applicable to the video sequence selected by the user is processed in order to cause an interactive video icon (see, for example, Figs. 4 and 5) to be displayed on the display screen 3. The user can then manipulate the displayed icon, by making use of the mouse or keyboard input devices, in order to explore the selected video sequence.
The types of manipulations of the interactive video icon which are available to the user will now be described with reference to Figs.4 to 9.
Fig. 4 illustrates a simple interactive video icon according to the present invention. In particular, this video icon is represented on the display screen as a set of superposed images arranged within an envelope having the shape of a regular parallelepiped. Each of the superposed images corresponds to a video frame selected from the video sequence, but these frames are offset from one another. It may be considered that the displayed image corresponds to a cuboid viewed from a particular viewing position (above and to the right, in this example). This cuboid is a theoretical construct consisting of the set of selected video frames disposed such that their respective x and y axes correspond to the x and y axes of the cuboid and the z axis of the cuboid corresponds to the time axis. Thus, in the theoretical construct cuboid, the selected frames are spaced apart in the z direction in accordance with their respective time separations in the video sequence.
When the user seeks to explore the video sequence via the interactive video icon displayed on the display screen, one of the basic operations he can perform is to designate a position on the screen as a viewing position relative to the displayed image (e.g. by "clicking" with the computer mouse). In Fig. 4, three such designated viewing positions are indicated by the letters A, B and C. In response to this operation by the user, the displayed image is changed to the form shown in Fig.5: Figs.5A, 5B and 5C correspond to "viewing positions" A, B and C, respectively, of Fig. 4. The image displayed to the user changes so as to provide a perspective view of the theoretical cuboid as seen from an angle corresponding to the viewing position designated by the user. The above-mentioned cuboid is a special case of a "root image" according to the present invention. This "root image" is derived from the video sequence and conveys information concerning both the image content of the selected sub-set of frames (called below, "basic frames") and the relative "position" of that image information in time as well as space. It is to be appreciated that the "root image" is defined by information in the interface data file. The definition specifies which video frames are "basic frames" (for example, by storing the relevant frame numbers), as well as specifying the placement positions of the basic frames relative to one another within the root image.
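The root image definition described above (frame numbers of the basic frames plus their relative placement positions) can be sketched as a small data structure. The following is a minimal illustration for the special case of the cuboid, assuming a simple record type; the names `BasicFrame`, `cuboid_root_image`, `frame_rate` and `depth_per_second` are hypothetical and are not taken from the interface data file format itself.

```python
from dataclasses import dataclass


@dataclass
class BasicFrame:
    """One basic frame of a root image (hypothetical record layout)."""
    frame_number: int      # index of the frame within the video sequence
    x_offset: float = 0.0  # placement of the frame within the root image
    y_offset: float = 0.0
    z: float = 0.0         # position along the time axis of the cuboid


def cuboid_root_image(frame_numbers, frame_rate=25.0, depth_per_second=10.0):
    """Place basic frames along z in proportion to their time separation."""
    if not frame_numbers:
        return []
    first = frame_numbers[0]
    return [
        BasicFrame(frame_number=n,
                   z=(n - first) / frame_rate * depth_per_second)
        for n in frame_numbers
    ]


# Four basic frames; unequal time gaps give unequal spacing along z.
root = cuboid_root_image([0, 50, 75, 200])
print([(f.frame_number, f.z) for f in root])
```

Non-zero `x_offset` and `y_offset` values would allow the envelope to depart from the cuboid shape, as in the more complex root images discussed later.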
The central processor portion 2 of the computer system calculates the image data required to generate the displayed image from the root image definition contained in the appropriate interface data file, image data of the basic frames (and, where required, additional frames) and the viewing position designated by the user, using standard ray-tracing techniques. The data required to generate the displayed image is loaded into the video buffer and displayed on the display screen. According to the present invention it is preferred that, when the user designates a viewing position close up to the interactive video icon, the image information in the area of interest should be enriched. This is achieved by including, in the displayed image, image data relating to additional video frames besides the basic video frames. Such a case is illustrated in Fig.5B, where the basic frames BF5 and BF6 are displayed together with additional frames AF1 and AF2. As the user-designated viewing position approaches closer and closer to the displayed image the video interface application program causes closely spaced additional frames to be added to the displayed image. Ultimately, successive video frames of the video sequence may be included in the displayed image. As is clear from Fig.5B, image information corresponding to parts of the root image distant from the area of interest may be omitted from the displayed "close-up" image.
Preferably, the interface data file includes data specifying how the choice should be made of additional frames to be added as the user "moves close up" to the displayed image. More preferably, this data defines rules governing the choice of how many, and which, additional frames should be used to enrich the displayed image as the designated viewing position changes. These rules can, for example, define a mathematical relationship between the number of displayed frames and the distance separating the designated viewing position and the displayed quasi-object. In preferred embodiments of the invention, the number of frames which are added to the display as the viewing position approaches the displayed quasi-object depends upon the amount of motion or change in the video sequence at that location. The example illustrated in Fig. 4 is a simplification in which the displayed image corresponds to a root image having a simple, cuboid shape. However, according to the present invention, the root image may have a variety of different forms.
For example, the relative placement positions of the basic frames may be selected such that the envelope of the root image has a shape which reflects motion in the corresponding video sequence (either camera motion, during tracking shots and the like, or motion of objects represented in the sequence) - see the corresponding interactive icon shown in Fig.6A. Similarly, the dimensions of the basic frames in the root image may be scaled so as to visually represent a zoom effect occurring in the video sequence - see the corresponding interactive icon shown in Fig.6B. It will be seen that the interactive icon represented in Fig.6B includes certain frames for which only a portion of the image information has been displayed. This corresponds to a case where an object of special interest has been selected. Such object selection can be made in various ways. If desired, the root image may be designed such that, instead of including basic frames in full, only those portions of frames which represent a particular object are included. This involves a choice being made, at the time of design of the root image portion of the interface, concerning which objects are interesting. The designer can alternatively or additionally decide that the root image will include basic frames in full but that certain objects represented in the video sequence are to be "selectable" or "extractable" at user request. This feature will now be discussed with reference to Fig.7.
Figure 7A illustrates an initial view presented to a user when he consults the interface for a particular selected video sequence. In this sequence two people walk towards each other and their paths cross. The designer of the interface has decided that the two people are objects that may be of interest to the end user. Accordingly, he has included, in the interface data file, information designating these objects as "extractable". This designation information may correspond to x, y co-ordinate range information identifying the position of the object in each video frame (or a sub-set of frames). If the user expresses an interest in either of the two objects, for example, by designating a screen position corresponding to one of the objects (e.g. by "clicking" on the left-hand person using the right-hand mouse button), then the interface application program controls the displayed image such that extraneous portions of the displayed frames disappear from the display, leaving only a representation of the two people and their motion, as shown in Fig.7B. Thus, the objects of interest are "extracted" from their surroundings. The "missing" or transparent portions of the displayed frames can be restored to the displayed image at the user's demand (e.g. by a further "click" of the mouse button). It is to be understood that, according to the present invention, interfaces may be designed such that particular "extractable" objects may be extracted simultaneously with some or all of the other extractable objects, or they may be extracted individually. Sophisticated interfaces according to the present invention can incorporate object-extraction routines permitting the user to arbitrarily select objects visible in the displayed view of the root image, for extraction.
Thus, for example, the user may use a pointing device to create a frame around an object visible in a displayed view of the root image and the application program then provides analysis routines permitting identification of the designated object in the other basic frames of the root image (and, if required, in additional frames) so as to cause display of that selected object as if it were located on transparent frames.
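The extraction of an object designated by x, y co-ordinate range information can be illustrated with a short sketch. It assumes, purely for illustration, that a frame is a list of pixel rows and that a transparent pixel is represented by `None`; the function name `extract_object` is hypothetical.

```python
def extract_object(frame, x_range, y_range, transparent=None):
    """Keep pixels inside the x/y co-ordinate range; blank out the rest.

    frame:   list of rows, each row a list of pixel values
    x_range: (first_column, last_column), inclusive
    y_range: (first_row, last_row), inclusive
    """
    x0, x1 = x_range
    y0, y1 = y_range
    return [
        [pixel if (x0 <= x <= x1 and y0 <= y <= y1) else transparent
         for x, pixel in enumerate(row)]
        for y, row in enumerate(frame)
    ]


frame = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
# The object occupies columns 1-2 of rows 0-1 in this frame.
extracted = extract_object(frame, x_range=(1, 2), y_range=(0, 1))
for row in extracted:
    print(row)
```

Because the original frame is left unmodified, restoring the "missing" portions at the user's demand amounts to displaying the original frame again.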
It may be desirable to allow the user to obtain a close-up view of a particular portion of the interactive video icon in a manner which does not correspond to a strict perspective view of the region concerned. Preferred embodiments of interface according to the invention thus provide a so-called "accordion" effect, as illustrated in Fig.8. When the user manipulates the icon by an "accordion" effect at a particular point, the basic frames in the vicinity of the region of interest are spread so as to provide the user with a better view. Further, preferably, the function of displaying additional frames so as to increase detail is inhibited during the "accordion" effect.
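One possible way of computing the "accordion" effect is sketched below. It is an assumption made for illustration, not a method stated in this description: the gaps between frames near the point of interest are stretched, and all gaps are then rescaled so that the overall depth of the icon is unchanged, which compresses the remaining parts. The names `accordion`, `focus`, `radius` and `stretch` are hypothetical.

```python
def accordion(z_positions, focus, radius, stretch=3.0):
    """Spread frames near `focus` while keeping the overall depth unchanged."""
    if len(z_positions) < 2:
        return list(z_positions)
    pairs = list(zip(z_positions, z_positions[1:]))
    gaps = [b - a for a, b in pairs]
    midpoints = [(a + b) / 2 for a, b in pairs]
    # Stretch only the gaps whose midpoint lies within `radius` of the focus.
    weighted = [gap * stretch if abs(mid - focus) <= radius else gap
                for gap, mid in zip(gaps, midpoints)]
    total = sum(weighted)
    scale = sum(gaps) / total if total else 1.0
    result = [z_positions[0]]
    for gap in weighted:
        result.append(result[-1] + gap * scale)
    return result


# Five evenly spaced frames; the user applies the effect around z = 20.
print(accordion([0, 10, 20, 30, 40], focus=20, radius=6))
```

In this example the two gaps adjacent to z = 20 grow from 10 to 15 units, while the outer gaps shrink from 10 to 5 units.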
In the case of "multi-threaded" video sequences, such as are traditionally found in video-based computer games and educational software and involve parallel video subsequences which are accessed alternatively depending upon the user's choices, these too can be the subject of interfaces according to the present invention. In such a case, the interface designer may choose to include frames from different parallel video subsequences in the interface's root image in order to give the user an idea of the different plot strands available to him in the video sequence. Fig.9 illustrates an interactive video icon derived from a simple example of such a root image. Alternatively, or additionally, the designer may create secondary root images for the respective sub-sequences, these secondary root images being used to generate the displayed image only when the user designates a viewing position close to the video frame where the sub-sequence begins. In the case of interfaces to such computer games or educational software, this is a logical choice since it is at the point where the video sub-sequence branches from the main sequence that user choices during playing of the game, or using of the educational software, change the experienced scenario.
Another manipulation which it is preferable to include in interfaces according to the invention is the traditional set of displayed VCR controls which permit the user to play back the video sequence with which the displayed video icon is associated. Furthermore, the user can select for playback portions or frames within the sequence by, for example, "clicking" with the mouse button on the frames of interest as displayed in the interactive video icon. The video playback can take place on a separate display screen or in a window defined on the display screen displaying the video icon.
As mentioned above, a particular video sequence may be associated with an interface data file and a script. The script is a routine defined by the interface designer which leads the user through the use of the interface. The script can, for example, consist of a routine to cause an automatic demonstration of the different manipulations possible of the displayed quasi-object. The user can alter the running of the script in the usual way, for example by pausing it, slowing it down, etc. The script may, if desired, include additional text, sound or graphic information which can be reproduced in association with the displayed view of the root image either automatically or in response to operations performed by the end user. Script functionality according to the present invention allows creation and editing of viewing scenarios that may subsequently be played, in part or in whole, automatically, or interactively with user inputs. For example, in a completely automatic mode, the user can cause the scenario to begin to play by itself and take the user through the scenario and any associated information by simply reading the scenario and changing the view. In other situations the script may call for interaction by the user, such as to initiate a transaction. In this case the user may be asked to specify information, e.g. if he wants to purchase the video or any other items associated with what has been viewed. In yet other situations the editor may leave visible tags which when activated by the user will cause some information to be displayed on the display device; e.g. associated text, graphics, video, or sound files which are played through the speakers of the display device. In certain cases these tags are attached to objects selected and extracted from the video sequence, such as so-called "hot objects" according to the present invention.
Fig. 10 is a flow diagram illustrating typical stages in the design of an interface according to the present invention, in the case where a designer is involved. It is to be understood that interfaces according to the present invention can also be generated entirely automatically. It will be noted that the designer's choices affect, notably, the content of the interface data file. It is to be understood, also, that not all of the steps illustrated in Fig. 10 are necessarily required - for example, steps concerning creation of secondary root images can be omitted in the case of a video sequence which is not multithreaded. Similarly, it may be desirable to include in the interface design process certain supplementary steps which are not shown in Fig.10. Thus, for example, it is often desirable to include in the interface data file (as indicated in the example of Fig.3) information regarding the camera motion, cuts, etc. present in the video sequence. During use of the interface, this information can permit, for example, additional video frames to be added to the displayed image and positioned so as to provide a visual representation of the camera motion. During the interface design process the information on the characteristics of the video sequence can be determined automatically (using known cut-detection techniques and the like) and/or may be specified by the interface designer. It may also be desirable to include in the interface data file information which allows the sequence, or scripting for it, to be indexed and retrieved. Preferably, the interface or sequence is accessed using such information applied according to a traditional method, such as standard database query language or through a browser via a channel or network; the interface data may be downloaded in its entirety or fetched on an as-needed basis.
The present invention provides toolkits for use by designers wishing to create an interactive video interface according to the present invention. These toolkits are preferably implemented as a computer program for running on a general purpose computer. The toolkits present the designer with displayed menus and instructions to lead him through a process including steps such as the typical sequence illustrated in Fig. 10. The designer first of all indicates for which video sequence he desires to create an interface, for example by typing in the name of a stored file containing the video sequence information. Preferably, the toolkit accesses this video sequence information for display in a window on the screen for consultation by the designer during the interface design process. In such preferred embodiments of the toolkit, the designer may make his selection of basic frames/objects for the root image, extractable objects and the like by stepping slowly through the video sequence and, for example, using a mouse to place a cursor on frames or portions of frames which are of interest. The toolkit logs the frame number (and x, y locations of regions in a frame, where appropriate) of the frames/frame portions indicated by the designer and associates this positional information with the appropriate parameter being defined. Preferably, at the end of the interface design process the designer is presented with a displayed view of the root image for manipulation so that he may determine whether any changes to the interface data file are required.
Different versions of the application program can be associated with the interface data file (and script, if present) depending upon the interface functions which are to be supported. Thus, if no script is associated with the interface data file, the application program does not require routines handling the running of scripts. Similarly, if the interface data file does not permit an accordion effect to be performed by the end user then the application program does not need to include routines required for calculating display information for such effects. If the interface designer believes that the end user is likely already to have an application program suitable for running interfaces according to the present invention then he may choose not to package an application program with the interface data file or else to associate with the interface data file merely information which identifies a suitable version of application program for running this particular interface.
The present invention has been described above in connection with video sequences stored on CD-ROM. It is to be understood that the present invention can be realized in numerous other applications. The content of the interface data file and the elements of the interface which are present at the same location as the end user can vary depending upon the application.
For example, in an application where a video sequence is provided at a web-site, the user may first download via his telecommunications connection just the interface data file applicable to the sequence. If the user does not already have software suitable for handling manipulation of the interactive video icon then he will also download the corresponding application program. As the user manipulates the interactive video icon, any extra image information that he may require which has not already been downloaded can be downloaded in a dynamic fashion as required.
This process can be audited according to the present invention if desired. The user's interaction with the interface can be audited, and he can interact with the transaction/audit functionality, for example to supply any information required by a script, which may then be recorded and stored. Depending upon the application, the transaction/audit information can be stored and made available to (optionally) externally located auditing and transaction processing facilities/applications. In a typical situation, the auditing information can be transmitted at the end of a session whereas the transaction may be performed on-line, i.e. the transaction information is submitted during the session. Real time transmission can also occur according to the present invention, however.
Another example is the case of a catalogue on CD-ROM including only interfaces rather than the associated video sequences, in order to save space. In such a case, rather than including a pointer to the image information of the basic frames of the root image, the interface data file includes the image information. Some additional image information may also be provided.
The following disclosure relates to a preferred implementation according to the present invention, with reference to Figs. 11 and 12.
A. Interface Editor Unit
Editors, readers and viewers according to the present invention can be implemented in hardware, hardware/software hybrid, or as software on a dedicated platform, a workstation, a personal computer, or any other hardware. Different units implemented in software run on a CPU or graphics boards or other conventional hardware in a conventional manner, and the various storage devices can be general purpose computer storage devices such as magnetic disks, CD-ROMs, DVD, etc.
With reference to Fig. 11, the editor connects to a database manager (101) and selects a video document and any other documents to be included in the interface by using a data chooser unit (102). The database manager may be implemented in various ways; e.g., as a simple file structure or even as a complete multimedia database. The data storage (100) contains the video data and any other information/documents required and can be implemented in various modes; e.g., in a simple stand-alone mode of operation it could be a CD-ROM or in a networked application it could be implemented as a bank of video servers. Typically the user operating through the user interaction unit (120) is first presented a list of available videos or uses a standard database query language to choose the desired video and then chooses any other documents required. The creation of an interface using the editor is discussed below in three phases: (1) Analysis, (2) Visual layout and (3) Effects creation.
1. Analysis.
The video document chosen by the editor is first processed by the activity measure unit (103). The activity measure unit is responsible for computing various parameters related to the motion and changes in the video. This unit typically will implement one of a number of known techniques for measuring changes, e.g., by calculating the statistics of the differences between frames, by tracking objects in motion, or by estimating camera motions by separating foreground and background portions of the image. In other implementations this unit may use motion vector information stored in an MPEG-encoded sequence to detect important frames of activity in the video document. The activity measures template store is optional but would contain templates which can be used to calculate the frame ranking measure and could be specified by the user through the user interaction unit.
These parameters are then used to calculate a frame ranking measure which ranks the different frames as to whether they should be included in the interface. The frame ranking measure is derived heuristically from these measures (e.g., by normalizing the values and taking an average of the parameters), and can be tailored for different kinds of sequences (traveling shots, single objects in motion, etc.) or applications. The editor may choose a pre-defined set of parameters from the activity measures template store (108) to detect or highlight a specific kind of activity (rapid motion, abrupt changes, accelerations, etc.).
The frame ranking measures can be employed by the user acting through the user interaction unit on the frame selection unit (104) to select the frames to be included within the interface. For example, if 10 frames are to be included in the interface then in default mode the 10 frames corresponding to the 10 largest frame ranking measures are selected for inclusion in the interface. The user can then interactively de-select some of these frames and add other frames.
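The analysis steps described above can be sketched end to end: an activity parameter computed from the differences between frames, a frame ranking measure obtained by normalizing the parameters and taking their average, and default selection of the highest-ranked frames. This is a minimal illustration only. Frames are assumed to be small lists of pixel rows, the `camera_motion` values are invented for the example, and all function names are hypothetical.

```python
def frame_difference_activity(frames):
    """Mean absolute pixel difference between each frame and its predecessor."""
    if not frames:
        return []
    activity = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(frames, frames[1:]):
        diffs = [abs(a - b)
                 for row_prev, row_cur in zip(prev, cur)
                 for a, b in zip(row_prev, row_cur)]
        activity.append(sum(diffs) / len(diffs))
    return activity


def normalize(values):
    """Scale a list of values linearly into the range [0, 1]."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def frame_ranking(parameters):
    """Average the normalized parameters frame by frame."""
    normalized = [normalize(values) for values in parameters]
    return [sum(column) / len(column) for column in zip(*normalized)]


def select_frames(ranking, count):
    """Return, in temporal order, the indices of the `count` best frames."""
    best = sorted(range(len(ranking)),
                  key=lambda i: ranking[i], reverse=True)[:count]
    return sorted(best)


frames = [
    [[0, 0], [0, 0]],
    [[0, 0], [0, 0]],
    [[10, 10], [0, 0]],
    [[10, 10], [4, 4]],
]
difference = frame_difference_activity(frames)
camera_motion = [1.0, 1.0, 3.0, 5.0]  # hypothetical second parameter
ranking = frame_ranking([difference, camera_motion])
selected = select_frames(ranking, 2)
print(difference, ranking, selected)
```

The user's interactive de-selection and addition of frames would then operate on the list returned by `select_frames`.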
The camera motion analysis unit (105) is an optional unit which typically will implement one of a number of known techniques for measuring camera motion parameters. This information can be used to determine what shape to give to the outer envelope of the interface as shown in Figure 1C; alternatively, a default shape, stored in the interface template store (116), can be chosen. This information may be optionally stored in the FDI file. The object selection unit (106A) is responsible for selecting or detecting individual objects in the video document. There are various modes possible: in a completely manual mode the editor may visually select and outline an object of interest in a given frame through the user interaction unit (120); in a semi-manual mode, the editor simply points at an object and chooses from the object templates store (107) features and associated algorithms to use for extracting and tracking the chosen object; in another mode the editor may choose one of a set of pre-defined templates of objects, and known pattern matching techniques are used to detect whether any objects of interest are present. The user may even assign a name/identifier to the object and add the object to the object templates store (107). In this latter case searches for multiple occurrences of the same object can be initiated by the user. The information regarding the properties of the object may be optionally stored in the FDI file.
The object extraction and tracking unit (106B) is then responsible for extracting the object of interest from the frame and tracking it by using known tracking algorithms. The algorithms used are either chosen by the user or selected by default. It is understood that the object selection, detection, extraction, and tracking process may be highly interactive and that the user may be called upon or choose to intervene in the process a number of times. The information about the presence and location of objects may be optionally stored in the FDI file.
In certain applications the FDI file can be made available to an external program, for example when the interface editor is associated with an indexing program, the task of which is to attach indexes (identifiers) to the video documents, to portions thereof, or to objects located within the video document.
2. Visual Layout.
The user acting through the user interaction unit (120) on the interface creation unit (109) determines the visual layout of the interface. He can shape the outer envelope of the interface in any way that he desires; two examples are provided in Figures 6 and 9. In particular, multiple sequences can be concatenated so as to implement branching effects representing alternatives to the user. Default shapes are stored in the interface template store (116). The user can also choose to vary the spacing of the frames seen on the interface; that is, the distance between frames of the interface as perceived on the display unit. The user can also insert selections of the extracted and tracked objects from unit (106B) as illustrated in Figure 7B. In this case, the corresponding frames are rendered transparent except at the locations of the objects.
The different pieces of information generated by the units described above are gathered together by the interface creation unit (109) into an FDI file containing a description of the interface in terms of its layout, i.e. shape and structure, the image frame numbers and their positions, and, if available, the extracted features, the ranking of the frames and the camera motion information. This information is transmitted to the interface effects creation unit (117).
3. Effects Creation.
The editor can also specify three classes of interface features which serve to convey additional information to the user and which allow the user to interact with the interface. The editor performs this specification through the interface effects creation unit (117). The zooming effects creation unit (110) is used by the editor to specify which frames will be made visible, and also which will be rendered invisible to the user when he moves up closer to the interface (Fig. 5B) so as to view it from a new viewing position. The choice of frames to add depends upon factors such as the distance of the viewing point from the interface, the degree of motion, the degree of scene change, the number of frames that can be made visible and optionally the frame ranking measures calculated by the activity measure unit (103). The editor can choose to use one or more of the default zooming effect templates contained in the zooming effect templates store (113) and assign these in a differential manner to different parts of the interface; alternatively the editor can choose to modify these templates and apply them differentially to the interface.
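A zooming rule of the kind described above can be sketched as a simple function of the viewing distance and the local degree of motion. The linear relationship, the parameter names (`near`, `far`, `max_frames`) and the convention that `activity` lies between 0 and 1 are assumptions made for illustration; an actual zooming effect template could define any other relationship.

```python
def additional_frame_count(distance, activity, max_frames=20,
                           near=1.0, far=10.0):
    """Number of additional frames to show between two basic frames.

    distance: distance from the viewing position to the interface
    activity: local degree of motion or scene change, clamped to [0, 1]
    """
    if distance >= far:
        return 0  # viewed from afar: basic frames only
    activity = max(0.0, min(1.0, activity))
    # closeness rises linearly from 0 at `far` to 1 at `near` (or closer)
    closeness = min(1.0, (far - distance) / (far - near))
    return round(max_frames * closeness * activity)


for distance in (10.0, 5.5, 1.0):
    print(distance, additional_frame_count(distance, activity=1.0))
```

Assigning different `max_frames` or `far` values to different parts of the interface would correspond to applying templates in a differential manner.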
The special effects creation unit (111) is used by the editor to create special visual effects on the interface. One such example is the accordion effect illustrated in Fig. 8 where parts of the interface are compressed and other parts are expanded. Another example is illustrated in Figs. 7A and 7B where the editor has designated an extractable object which is then shown in its extracted form; in other words, the background is removed. The editor creates these effects by calling up templates from the special effects templates store (114) and instantiating them by defining the positions where the special effect is to take place and by setting the appropriate parameters.
The script effects creation unit (112) allows the editor of the interface to build an interface viewing scenario that may subsequently be played, in part or in whole, automatically, or interactively with user inputs. For example, in a completely automatic mode when the user calls up the interface it begins to play by itself and takes the user through the interface and any associated information by simply reading the scenario and changing the view of the interface. In other situations the script may call for the user to interact with the interface, e.g. to initiate a transaction. In this case the user may be asked to specify information, e.g. if he wants to purchase the video or any other items associated with the interface. In yet other situations the editor may leave visible tags which when activated by the user will cause some information to be displayed on the display device; e.g. associated text, graphics, video, or sound files which are played through the speakers of the display device. In certain cases these tags are attached to objects selected and extracted from the video sequence by units 106A and 106B and become so-called "hot objects". The editor creates the scripts by calling up templates from the script effects templates store (115) and instantiating them by defining the tag and the locations of the information to be called up.
The interface effects creation unit (117) creates 4 files which are passed to the interface database manager (118), which will store these files either remotely or locally as the case may be:
(1) The FDI file, completed by the special effect and script tags, text and graphics which have been added to the interface and which are directly visible to the user.
(2) The zoom effect details, scripts and special effects.
(3) The application programs (optional) to view the interface; i.e., to allow the user to view the interface from different perspectives, traverse the interface, run the script and perform the special effects; or coded information which indicates which application program residing on the user's machine can be used to perform these operations.
(4) The video sequence and any other associated information (data) required for reading the interface.
These files are shown stored in storage unit (119) but depending upon the embodiment they may be physically located in the same storage device, in separate storage devices (as shown) either locally (as shown) or remotely.
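As a rough illustration of the four outputs enumerated above, the following Python sketch groups them into a single record. The class name `InterfaceBundle`, its field names and the method `files_to_transmit` are hypothetical and do not appear in the specification; the file names used with it are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InterfaceBundle:
    """Hypothetical grouping of the four files handed to the interface database manager (118)."""
    fdi_file: str                  # (1) FDI file with visible tags, text and graphics
    effects_file: str              # (2) zoom effect details, scripts and special effects
    viewer_program: Optional[str]  # (3) optional application program for viewing the interface
    video_data: str                # (4) video sequence and associated data

    def files_to_transmit(self, viewer_is_resident: bool) -> List[str]:
        """Return the files a viewer needs to receive.

        Assumption: when a suitable application program already resides on
        the user's machine, item (3) is not sent.
        """
        files = [self.fdi_file, self.effects_file, self.video_data]
        if self.viewer_program is not None and not viewer_is_resident:
            files.append(self.viewer_program)
        return files
```

Keeping item (3) optional mirrors the text, which allows either the program itself or merely an indication of which resident program to use.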
During the editing process, the user/editor can view the interface under construction, according to the current set of parameters, templates and designer preferences, on the interface viewer unit (121) (presented in Figure 12 and described below), thus allowing the editor to interactively change its appearance and features.
B. Interface Viewer Unit
Having chosen an interface through a traditional method, for example by using a database query language or by using a browser such as those used for viewing data on the Web, the user then employs the interface viewer unit to read and interact with the interface.
In a typical application the storage units (201) are remotely located and accessed through the interface database manager (202) by way of a communication channel or network; depending upon the size and characteristics of the channel and the application, the interface data may be loaded in its entirety or fetched on an as-needed basis.
The data are then stored in a local memory unit (203) which may be either a cache memory, a disk store or any other writable storage element. The local memory unit (203) stores the 4 files created by the editor (see above) and in addition a transaction/audit file. In certain cases the application programs are already resident in the interface viewer unit and so do not need to be transmitted.
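The two loading strategies mentioned above (loading the interface data in its entirety, or fetching it as needed) can be sketched as follows. This is only an illustration: the class `LocalMemoryUnit`, its methods and the `fetch_remote` callback are assumed names, with the callback standing in for the interface database manager (202).

```python
from typing import Callable, Dict, Iterable


class LocalMemoryUnit:
    """Hypothetical stand-in for the local memory unit (203)."""

    def __init__(self, fetch_remote: Callable[[str], bytes]):
        # fetch_remote plays the role of the interface database manager (202).
        self._fetch_remote = fetch_remote
        self._store: Dict[str, bytes] = {}

    def preload(self, names: Iterable[str]) -> None:
        """Load the named items up front (the 'in its entirety' strategy)."""
        for name in names:
            self._store[name] = self._fetch_remote(name)

    def get(self, name: str) -> bytes:
        """Return an item, fetching it only if it is not yet held locally."""
        if name not in self._store:
            self._store[name] = self._fetch_remote(name)
        return self._store[name]
```

Which strategy is preferable would depend, as the text says, on the size and characteristics of the channel and on the application.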
The CPU unit (204) fetches the application program, deduces which actions need to be performed, and then fetches the relevant interface information contained in the local memory unit (203). Typically the CPU unit fetches the required application program for the user interaction unit (205), the navigation unit (206), and the transaction/audit unit (207); interface information is then read from the local memory unit (203) and passed to the interface renderer unit (208), which calculates how the interface is to appear or be rendered for viewing on the display device (209).
The user interacts with the interface through the user interaction unit (205) to the navigation unit (206), and all his actions are audited by the transaction/audit unit (207). In addition, the user can interact with the transaction/audit unit (207), for example to supply any information required by the script, which is then recorded and stored in the transaction/audit portion of the local memory unit (203). Depending upon the application, this transaction/audit file or a portion thereof is transmitted by the interface database manager to the appropriate storage unit (201). This information is then available to auditing and transaction processing facilities/applications, which may optionally be located externally. In a typical situation, the auditing information is transmitted at the end of the session, whereas the transaction information may be processed on-line, i.e. submitted during the session.
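The different timing of audit and transaction data described above might be modelled along these lines. The sketch is illustrative only: `TransactionAuditUnit`, its method names and the `send` callback (standing in for transmission by the interface database manager) are assumptions, not elements named in the specification.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TransactionAuditUnit:
    """Hypothetical model of the transaction/audit unit (207)."""
    send: Callable[[dict], None]  # stand-in for transmission to a storage unit (201)
    audit_log: List[dict] = field(default_factory=list)

    def record_action(self, action: str) -> None:
        """Audit entries are held locally until the session ends."""
        self.audit_log.append({"kind": "audit", "action": action})

    def submit_transaction(self, details: dict) -> None:
        """Transaction information is submitted on-line, during the session."""
        self.send({"kind": "transaction", **details})

    def end_session(self) -> None:
        """Transmit the accumulated audit entries, then clear the local log."""
        for entry in self.audit_log:
            self.send(entry)
        self.audit_log.clear()
```

For example, a purchase entered by the user would be passed to `submit_transaction` immediately, while navigation actions recorded with `record_action` would only leave the viewer when `end_session` is called.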
Through the navigation unit (206) the user can choose the point of view from which to view the interface (or a portion of the interface). The interface renderer unit (208) then calculates how the interface is to appear or be rendered for viewing on the display device (209).
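One simple way to picture the renderer's calculation is a pinhole projection of a stack of frames whose depth along z is proportional to their time stamps. The specification does not prescribe this particular projection; the function names, the `z_per_second` scale and the `focal` parameter below are all assumptions made for the sketch.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]


def frame_depths(times_s: Sequence[float], z_per_second: float) -> List[float]:
    """Place each basic frame along z in proportion to its time stamp."""
    return [t * z_per_second for t in times_s]


def project_frame(width: float, height: float, z: float,
                  eye: Tuple[float, float, float], focal: float) -> List[Point2D]:
    """Project the four corners of a frame lying in the plane at depth z.

    eye is the designated viewing position (x, y, z), looking along +z.
    Frames that are further from the eye appear smaller and are pulled
    toward the eye's (x, y) position on the display.
    """
    ex, ey, ez = eye
    depth = z - ez
    if depth <= 0:
        raise ValueError("frame must lie in front of the viewing position")
    scale = focal / depth
    corners = [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]
    return [(ex + (x - ex) * scale, ey + (y - ey) * scale) for x, y in corners]
```

Drawing the frames from the largest depth to the smallest would then give the familiar view of a stack of frames receding in time; changing `eye` corresponds to the user choosing a new point of view.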
If the user chooses to zoom in or zoom out, then the zoom effects unit (210) fetches the required application program, reads the zoom effect parameters stored in the local memory unit (203), determines the frames to be dropped or added and supplies this information (including the additional frames if needed) to the interface renderer unit (208), which then calculates how the interface is to appear or be rendered for viewing on the display device (209).
If the user chooses to view part of the underlying video, then the video play effects unit (211) fetches the required application program, reads the required video data from the local memory unit (203) and plays the video on a second display device (209) or in a new window if only one display device is available.
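The step of determining which frames to drop or add can be sketched as below. The rule used here (halving the frame spacing for each zoom-in level and doubling it for each zoom-out level) is purely an assumption chosen for illustration; in the specification the behaviour is governed by stored zoom effect parameters. The function names are likewise hypothetical.

```python
from typing import List, Tuple


def frames_for_zoom(n_frames: int, base_step: int, zoom_level: int) -> List[int]:
    """Return the indices of the frames shown at a given zoom level.

    zoom_level 0 shows the basic frames (every base_step-th frame);
    positive levels halve the spacing, negative levels double it.
    """
    if zoom_level >= 0:
        step = max(1, base_step >> zoom_level)
    else:
        step = base_step << -zoom_level
    return list(range(0, n_frames, step))


def frames_delta(current: List[int], target: List[int]) -> Tuple[List[int], List[int]]:
    """Return (frames to add, frames to drop) when moving from current to target."""
    added = sorted(set(target) - set(current))
    dropped = sorted(set(current) - set(target))
    return added, dropped
```

The pair returned by `frames_delta` corresponds to the information that would be supplied to the interface renderer unit, with the added indices identifying the additional frames to be fetched.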
If the user chooses to interact with a hot pre-extracted object (created by the special effects unit), then the special effects unit (212) fetches the required application program, reads the locations of the object, and modifies the corresponding frames so as to be transparent wherever the object does not occur; the new frames are passed to the interface renderer unit (208), which then calculates how the interface is to appear or be rendered for viewing on the display device (209). In cases where the extracted object is to be played as a video, the frames are passed to the video play effects unit (211), which then plays the video on a second display device (209) or in a new window if only one display device is available. Similarly, if the user chooses to view an accordion effect, then the special effects unit fetches the accordion effect parameters stored in the local memory unit (203), determines the frames to be dropped or added, calculates the relative position of all the frames and supplies this information (including the additional frames if needed) to the interface renderer unit (208), which then calculates how the interface is to appear or be rendered for viewing on the display device (209).
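Making a frame transparent wherever the selected object does not occur amounts to attaching an alpha channel derived from the object's location mask. The following minimal sketch assumes a frame represented as rows of (r, g, b) tuples and a mask of booleans of the same shape; this representation and the function name are assumptions, not part of the specification.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def make_transparent_outside(frame: List[List[RGB]],
                             mask: List[List[bool]]) -> List[List[RGBA]]:
    """Return an RGBA copy of frame that is opaque only where mask is True."""
    if len(frame) != len(mask) or any(
        len(frow) != len(mrow) for frow, mrow in zip(frame, mask)
    ):
        raise ValueError("frame and mask must have the same shape")
    return [
        [(r, g, b, 255 if keep else 0) for (r, g, b), keep in zip(frow, mrow)]
        for frow, mrow in zip(frame, mask)
    ]
```

Applied to every frame in which the object appears, this would leave only the object visible, so that frames behind it in the stack show through.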
If the user designates a tag created by the script, then the script effects unit (214) fetches the required application program and reads the corresponding portion of the script and the related information required to carry out the portion of the script associated with the designated tag. If the interface is to be played in automatic mode, then the script effects unit (214) fetches the entire script and all the related information required to carry out the script. When needed, the zoom effects unit (210), the video play effects unit (211), and the special effects unit (212) may be called into play. If the script calls for user input, such as is required for carrying out a transaction, then a new window may be opened on the display device (or on a second display device) where the information is supplied and transmitted to the transaction/audit unit (207). In semi-automatic mode, control of the viewing of the interface is passed between the script effects unit (214) and the navigation unit (206) as instructed by the user through the user interaction unit (205).
Although the above-discussed preferred embodiments of the present invention present certain combinations of features, it is to be understood that the present invention is not limited to the details of these particular examples. Firstly, since image processing is performed on image data in digital form, it is to be understood that in the case where the video sequence consists of data in analogue form, an analogue-to-digital converter or the like will be used in order to provide image data in a form suitable for processing. It is to be understood that the present invention can be used to create interfaces to video sequences where the video data is in compressed form, encrypted, etc.
Secondly, references above to user input or user selection processes cover the use of any input device whatsoever operable by the user including, but not limited to, a keyboard, a mouse (or other pointing device), a touch screen or panel, glove input devices, detectors of eye movements, voice actuated devices, etc. Thirdly, references above to "displays" cover the use of numerous different devices such as, but not limited to, conventional monitor screens, liquid crystal displays, etc.
Furthermore, for ease of comprehension the above discussion describes interfaces according to the present invention in which the respective root images each have a single characteristic feature, such as giving a visual representation of motion, giving a visual representation of zoom, or having a multi-threaded structure. It is to be understood that a single root image can combine several of these features, as desired. Similarly, special effects such as object extraction, the accordion effect, etc. have been described separately. Again, it is to be understood that interfaces according to the invention can be designed to permit any desired combination of special effects.

Claims

What is claimed is:
1. An interface to an associated video sequence, the interface comprising: a) information defining a three-dimensional root image, the root image consisting of a plurality of basic frames selected from said video sequence, and/or a plurality of portions of video frames corresponding to selected objects represented in the video sequence, x and y directions in the root image corresponding to x and y directions in the video frames and the z direction in the root image corresponding to the time axis whereby the basic frames are spaced apart from one another in the z direction of the root image by distances corresponding to the time separation between the respective video frames; b) means for displaying views of the root image; c) means for designating a viewing position relative to said root image; and d) means for calculating image data representing said three-dimensional root image viewed from the designated viewing position, and for outputting said calculated image data to the displaying means.
2. An interactive interface according to claim 1, wherein the designating means is user-operable means for designating a viewing position relative to a displayed representation of the root image.
3. An interactive interface according to claim 1 wherein the means for calculating image data for display is adapted to include in the calculated output image data, dependent upon the designated viewing position, image data corresponding to portions of basic frames which are not visible in the reference view of the root image.
4. An interface according to claim 1 wherein the means for calculating image data for display is adapted to include in the calculated image data, dependent upon the distance between the designated viewing position and the root image, image data from frames of the video sequence additional to the basic frames.
5. An interactive interface according to claim 4 wherein the means for calculating image data for display is adapted to select for use in calculating the image for display additional frames chosen based upon criteria specified in additional information stored in association with the root image definition.
6. An interface according to claim 1, wherein the means for calculating image data for display is adapted to calculate output image data corresponding to a different number of frames and/or a displayed image of enlarged or reduced size, dependent upon the distance between the user-designated viewing position and the root image.
7. An interface according to any previous claim, wherein the video sequence includes image data representing one or more selected objects, the means for calculating image data for display being adapted, for each displayed frame containing a respective selected object, selectively to output image data causing display of only that image data which corresponds to the selected object(s), causing the remainder of the respective displayed frame to appear transparent.
8. An interface according to claim 7, wherein there is provided means for the user to select objects represented in the displayed image, and wherein the means for calculating image data for display is adapted to output image data causing portions of frames to appear transparent in response to the selection of objects by the user.
9. An interface according to any previous claim for a video sequence comprising a main sequence of video frames and at least one additional sub-sequence of video frames constituting an alternative path to or from a particular video frame in the main sequence, wherein the user can access image information relating to an alternative sub-sequence by designating a viewing position close to a point in the root image corresponding to said particular video frame, the means for calculating image data for display being adapted to graft on to the displayed view of the root image, at the branching point, a secondary root image representing said alternative sub-sequence.
10. An interactive interface according to claim 9, wherein by operation of the viewing position designating means the user can navigate through root images and secondary root images corresponding to the different possible scenarios contained in the video sequence.
11. Apparatus for creation of an interface to a video sequence, the apparatus comprising: a) means for accessing image information in digital form representing a video sequence; b) means for creating a root image representing the video sequence, the root image creation means comprising: i) means for selecting a sub-set of frames from the video sequence, or portions of said sub-set which correspond to objects represented in the video sequence, to serve as basic frames of the root image; and ii) means for setting the relative placement positions of the basic frames in the root image, and c) means for associating with the root image routines for changing the displayed view of the root image depending upon a designated viewing position relative to the root image.
12. Apparatus according to claim 11 and further comprising means for identification of objects represented in the image information of the video sequence and for designating objects as selectable by an end user.
13. Apparatus according to claim 11 wherein the means for setting the relative placement positions of the basic frames in the root image comprises means for accessing stored information representing a plurality of templates and means for inputting selection information designating one of the stored templates.
14. Apparatus according to claim 11 wherein the means for setting the relative placement positions of the basic frames in the root image comprises means for detecting motion in the video sequence and means for placing the basic frames within the root image in relative positions which provide a visual representation of said motion.
15. Apparatus according to claim 14 wherein the means for placing the basic frames within the root image is adapted to effect a progressive change in the dimensions of the basic frames in the root image in order to visually represent a zoom-in or zoom-out operation.
16. Apparatus according to claim 11, wherein the means for selecting a sub-set of frames from the video sequence to serve as basic frames of the root image is adapted to select frames as a function of the rate of change of background information in the image.
17. Apparatus according to claim 11, and comprising means for inputting parameters constraining the ways in which the displayed view of the root image can be changed depending upon a user-designated viewing position, the constraint parameters being assimilated into the routines associated with the root image by the associating means.
18. Apparatus according to claim 17 wherein the constraint parameter inputting means is adapted to input data identifying the rate at which additional frames should be included in a displayed view of the root image when a user-designated viewing position approaches the root image.
19. Apparatus according to claim 11, and comprising means for creating secondary root images corresponding to additional sub-sequences of video frames constituting alternative paths to or from a particular video frame in the main video sequence.
20. A process for creating an interface corresponding to a predetermined video sequence, comprising the steps of: a) retrieving the video sequence from a data store; b) analyzing data corresponding to at least some frames of the video sequence based upon at least one predetermined algorithm; c) selecting at least some frames from the video sequence based at least in part on frame ranking measure parameters stored in a frame ranking template store; d) arranging the selected frames to form a succession of frames defining at least in part the interface; and e) transferring data corresponding to said selected and arranged frames to an interface store.
21. A process according to claim 20 in which said step of selection is conducted automatically.
22. A process according to claim 20 in which said step of selection is conducted at least in part manually.
23. A process according to claim 20 in which said step of selection is based at least in part on the degree of motion of objects in the frames.
24. A process according to claim 20 in which said step of selection is based at least in part on estimating camera motion by separating foreground and background portions of the images.
25. A process according to claim 20 in which said step of selection is based at least in part on vector data in the digital representation of images in the frames.
26. A process according to claim 20 further comprising the steps of: a) analyzing data corresponding to at least some frames of the video sequence in order to evaluate objects within said frames; b) selecting at least one object from a plurality of the frames; c) selecting at least some frames from the video sequence based at least in part on said at least one object; d) tracking said at least one object through the selected frames; and e) arranging the selected frames based at least in part on said at least one object.
27. A process according to claim 26 in which said step of selecting said at least one object is conducted automatically.
28. A process according to claim 26 in which said step of selecting said at least one object is conducted at least in part manually.
29. A process according to claim 20 further comprising the steps of: a) arranging the selected frames to form a succession of frames defining at least in part the interface; b) selecting at least one additional frame to add to the succession of frames corresponding to a new viewing position based at least in part on certain predetermined factors; and c) selecting at least one frame to remove from the succession of frames corresponding to a new viewing position based at least in part on certain predetermined factors.
30. A process according to claim 29 in which said step of selecting at least one additional frame to add is conducted automatically.
31. A process according to claim 29 in which said step of selecting at least one additional frame to add is conducted at least in part manually.
32. A process according to claim 29 in which said step of selecting at least one additional frame to remove is conducted automatically.
33. A process according to claim 29 in which said step of selecting at least one additional frame to remove is conducted at least in part manually.
34. A process according to claim 20 further comprising the steps of: a) arranging the selected frames based at least in part on user specified criteria; and b) calculating the arrangement of the selected frames based at least in part on predetermined algorithms.
35. A process according to claim 20 further comprising the steps of: a) creating an interface data file which contains data corresponding at least in part to said interface; and b) storing said interface data file in a data store.
36. A process according to claim 20 comprising the further steps of: a) creating an effects detail file which contains data corresponding to the said selection and arrangement of the selected frames; and b) storing said effects detail file in a data store.
37. A process according to claim 20 comprising the further steps of: a) creating a video sequence file which contains data corresponding to the selected frames; and b) storing said video sequence file in a data store.
38. A process according to claim 20 comprising the further steps of: a) extracting a predetermined set of information from the said interface; b) creating a script file consisting at least in part of said predetermined set of information; and c) storing the said script file in a data store.
39. A method for processing an interface corresponding to a predetermined video sequence, comprising the steps of: a) retrieving the video sequence from a data store; b) analyzing data corresponding to at least some frames of the video sequence based upon at least one predetermined algorithm; c) selecting at least some frames from the video sequence based at least in part on frame ranking measure parameters stored in a frame ranking template store; d) arranging the selected frames to form a succession of frames defining at least in part the interface; and e) transferring data capable of generating at least one image corresponding to said succession of frames to a viewer; f) generating an image from a desired perspective using said data capable of generating at least one image; and g) displaying said image on a display device.
40. A method according to claim 39 further comprising the steps of: a) generating an image from a second desired perspective using said data capable of generating at least one image; and b) displaying said image on a display device.
41. A method according to claim 39 in which the step of generating an image from a desired perspective using said data capable of generating at least one image comprises the steps of: a) determining frames in the interface to be dropped or added; b) calculating the position of all frames relative to each other; and c) generating an image that renders said frames positioned appropriately relative to each other and that takes into account the predetermined perspective.
PCT/US1998/013606 1997-07-03 1998-06-30 Interactive video interfaces WO1999001830A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
IL13379898A IL133798A0 (en) 1997-07-03 1998-06-30 Interactive video interfaces
EP98932999A EP0992010A1 (en) 1997-07-03 1998-06-30 Interactive video interfaces

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US08/887,992 1997-07-03
US08/887,992 US5963203A (en) 1997-07-03 1997-07-03 Interactive video icon with designated viewing position

Publications (1)

Publication Number Publication Date
WO1999001830A1 (en) 1999-01-14

Family

ID=25392296

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1998/013606 WO1999001830A1 (en) 1997-07-03 1998-06-30 Interactive video interfaces

Country Status (4)

Country Link
US (3) US5963203A (en)
EP (1) EP0992010A1 (en)
IL (1) IL133798A0 (en)
WO (1) WO1999001830A1 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000051078A2 (en) * 1999-02-22 2000-08-31 Siemens Corporate Research, Inc. Method and apparatus for authoring and linking video documents
WO2001026377A1 (en) * 1999-10-04 2001-04-12 Obvious Technology, Inc. Network distribution and management of interactive video and multi-media containers
EP1922864A1 (en) * 2005-08-15 2008-05-21 Disney Enterprises, Inc. A system and method for automating the creation of customized multimedia content
EP1983418A1 (en) * 2007-04-16 2008-10-22 Fujitsu Limited Display device, display program storage medium, and display method
GB2477120A (en) * 2010-01-22 2011-07-27 Icescreen Ehf Media editing using poly-dimensional display of sequential image frames
USRE42728E1 (en) 1997-07-03 2011-09-20 Sony Corporation Network distribution and management of interactive video and multi-media containers
US8201073B2 (en) 2005-08-15 2012-06-12 Disney Enterprises, Inc. System and method for automating the creation of customized multimedia content
US8811801B2 (en) 2010-03-25 2014-08-19 Disney Enterprises, Inc. Continuous freeze-frame video effect system and method
EP2816563A1 (en) * 2013-06-18 2014-12-24 Nokia Corporation Video editing
EP2816564A1 (en) * 2013-06-21 2014-12-24 Nokia Corporation Method and apparatus for smart video rendering
WO2016207861A1 (en) * 2015-06-25 2016-12-29 Nokia Technologies Oy Method, apparatus, and computer program product for predictive customizations in self and neighborhood videos

Families Citing this family (173)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6735253B1 (en) 1997-05-16 2004-05-11 The Trustees Of Columbia University In The City Of New York Methods and architecture for indexing and editing compressed video over the world wide web
JP3116033B2 (en) * 1997-07-02 2000-12-11 一成 江良 Video data creation method and video data display method
US6292837B1 (en) * 1997-10-30 2001-09-18 Daniel Miller Apparatus and method for non-sequential image data transmission and display
US6052492A (en) * 1997-12-09 2000-04-18 Sun Microsystems, Inc. System and method for automatically generating an image to represent a video sequence
US6246790B1 (en) * 1997-12-29 2001-06-12 Cornell Research Foundation, Inc. Image indexing using color correlograms
US6380950B1 (en) * 1998-01-20 2002-04-30 Globalstreams, Inc. Low bandwidth television
JP3436688B2 (en) * 1998-06-12 2003-08-11 富士写真フイルム株式会社 Image playback device
DE69907829T2 (en) * 1998-09-03 2004-04-01 Ricoh Co., Ltd. Storage media with video or audio index information, management procedures and retrieval procedures for video or audio information and video retrieval system
WO2000017778A1 (en) * 1998-09-17 2000-03-30 Sony Corporation Image generating device and method
WO2000017777A1 (en) * 1998-09-17 2000-03-30 Sony Corporation Image display device and method
US7143434B1 (en) 1998-11-06 2006-11-28 Seungyup Paek Video description system and method
US6859799B1 (en) * 1998-11-30 2005-02-22 Gemstar Development Corporation Search engine for video and graphics
KR100347710B1 (en) * 1998-12-05 2002-10-25 엘지전자주식회사 Method and data structure for video browsing based on relation graph of characters
US6320600B1 (en) * 1998-12-15 2001-11-20 Cornell Research Foundation, Inc. Web-based video-editing method and system using a high-performance multimedia software library
US6674436B1 (en) * 1999-02-01 2004-01-06 Microsoft Corporation Methods and apparatus for improving the quality of displayed images through the use of display device and display condition information
US6571024B1 (en) * 1999-06-18 2003-05-27 Sarnoff Corporation Method and apparatus for multi-view three dimensional estimation
US6366407B2 (en) * 1999-07-12 2002-04-02 Eastman Kodak Company Lenticular image product with zoom image effect
JP2001043215A (en) * 1999-08-02 2001-02-16 Sony Corp Device and method for processing document and recording medium
US8464302B1 (en) 1999-08-03 2013-06-11 Videoshare, Llc Method and system for sharing video with advertisements over a network
US6681043B1 (en) * 1999-08-16 2004-01-20 University Of Washington Interactive video object processing environment which visually distinguishes segmented video object
US7996878B1 (en) 1999-08-31 2011-08-09 At&T Intellectual Property Ii, L.P. System and method for generating coded video sequences from still media
US7020351B1 (en) 1999-10-08 2006-03-28 Sarnoff Corporation Method and apparatus for enhancing and indexing video and audio signals
WO2001028238A2 (en) * 1999-10-08 2001-04-19 Sarnoff Corporation Method and apparatus for enhancing and indexing video and audio signals
US7181757B1 (en) * 1999-10-11 2007-02-20 Electronics And Telecommunications Research Institute Video summary description scheme and method and system of video summary description data generation for efficient overview and browsing
US6996775B1 (en) * 1999-10-29 2006-02-07 Verizon Laboratories Inc. Hypervideo: information retrieval using time-related multimedia:
US6757866B1 (en) * 1999-10-29 2004-06-29 Verizon Laboratories Inc. Hyper video: information retrieval using text from multimedia
US6569206B1 (en) * 1999-10-29 2003-05-27 Verizon Laboratories Inc. Facilitation of hypervideo by automatic IR techniques in response to user requests
US6873327B1 (en) * 2000-02-11 2005-03-29 Sony Corporation Method and system for automatically adding effects to still images
AU2001245575A1 (en) 2000-03-09 2001-09-17 Videoshare, Inc. Sharing a streaming video
US6901110B1 (en) 2000-03-10 2005-05-31 Obvious Technology Systems and methods for tracking objects in video sequences
US6925602B1 (en) * 2000-03-20 2005-08-02 Intel Corporation Facilitating access to digital video
JP4730571B2 (en) * 2000-05-01 2011-07-20 ソニー株式会社 Information processing apparatus and method, and program storage medium
WO2001090876A1 (en) * 2000-05-24 2001-11-29 Koninklijke Philips Electronics N.V. A method and apparatus for shorthand processing of medical images
US20040125877A1 (en) * 2000-07-17 2004-07-01 Shin-Fu Chang Method and system for indexing and content-based adaptive streaming of digital video content
US7193645B1 (en) 2000-07-27 2007-03-20 Pvi Virtual Media Services, Llc Video system and method of operating a video system
US6697523B1 (en) * 2000-08-09 2004-02-24 Mitsubishi Electric Research Laboratories, Inc. Method for summarizing a video using motion and color descriptors
US6810149B1 (en) 2000-08-17 2004-10-26 Eastman Kodak Company Method and system for cataloging images
KR101399240B1 (en) 2000-10-11 2014-06-02 유나이티드 비디오 프로퍼티즈, 인크. Systems and methods for delivering media content
EP1335302A4 (en) * 2000-10-20 2005-02-02 Sharp Kk Dynamic image content search information managing apparatus
US8711217B2 (en) 2000-10-24 2014-04-29 Objectvideo, Inc. Video surveillance system employing video primitives
US9892606B2 (en) * 2001-11-15 2018-02-13 Avigilon Fortress Corporation Video surveillance system employing video primitives
US8564661B2 (en) 2000-10-24 2013-10-22 Objectvideo, Inc. Video analytic rule detection system and method
US7868912B2 (en) * 2000-10-24 2011-01-11 Objectvideo, Inc. Video surveillance system employing video primitives
US20050146605A1 (en) * 2000-10-24 2005-07-07 Lipton Alan J. Video surveillance system employing video primitives
FR2818053B1 (en) * 2000-12-07 2003-01-10 Thomson Multimedia Sa ENCODING METHOD AND DEVICE FOR DISPLAYING A ZOOM OF AN MPEG2 CODED IMAGE
US6816174B2 (en) * 2000-12-18 2004-11-09 International Business Machines Corporation Method and apparatus for variable density scroll area
US6957389B2 (en) * 2001-04-09 2005-10-18 Microsoft Corp. Animation on-object user interface
US20020167546A1 (en) * 2001-05-10 2002-11-14 Kimbell Benjamin D. Picture stack
US20030210329A1 (en) * 2001-11-08 2003-11-13 Aagaard Kenneth Joseph Video system and methods for operating a video system
US7339992B2 (en) 2001-12-06 2008-03-04 The Trustees Of Columbia University In The City Of New York System and method for extracting text captions from video and generating video summaries
US6948127B1 (en) * 2001-12-10 2005-09-20 Cisco Technology, Inc. Interface for compressed video data analysis
US7199805B1 (en) 2002-05-28 2007-04-03 Apple Computer, Inc. Method and apparatus for titling
US7904812B2 (en) * 2002-10-11 2011-03-08 Web River Media, Inc. Browseable narrative architecture system and method
US20040139481A1 (en) * 2002-10-11 2004-07-15 Larry Atlas Browseable narrative architecture system and method
JP4114720B2 (en) * 2002-10-25 2008-07-09 株式会社ソニー・コンピュータエンタテインメント Image generation method and image generation apparatus
GB2402588B (en) * 2003-04-07 2006-07-26 Internet Pro Video Ltd Computer based system for selecting digital media frames
US20040233233A1 (en) * 2003-05-21 2004-11-25 Salkind Carole T. System and method for embedding interactive items in video and playing same in an interactive environment
US20070118812A1 (en) * 2003-07-15 2007-05-24 Kaleidescope, Inc. Masking for presenting differing display formats for media streams
JP4250540B2 (en) * 2004-01-30 2009-04-08 キヤノン株式会社 Layout adjustment method and apparatus, and layout adjustment program
CN100471220C (en) * 2004-02-17 2009-03-18 Nxp股份有限公司 Method of visualizing a large still picture on a small-size display.
US7983835B2 (en) 2004-11-03 2011-07-19 Lagassey Paul J Modular intelligent transportation system
JP4241647B2 (en) * 2005-03-04 2009-03-18 キヤノン株式会社 Layout control apparatus, layout control method, and layout control program
WO2006096612A2 (en) 2005-03-04 2006-09-14 The Trustees Of Columbia University In The City Of New York System and method for motion estimation and mode decision for low-complexity h.264 decoder
US8074248B2 (en) 2005-07-26 2011-12-06 Activevideo Networks, Inc. System and method for providing video content associated with a source image to a television in a communication network
JP4982065B2 (en) * 2005-09-26 2012-07-25 株式会社東芝 Video content display system, video content display method and program thereof
US20070124766A1 (en) * 2005-11-30 2007-05-31 Broadcom Corporation Video synthesizer
US7694885B1 (en) * 2006-01-26 2010-04-13 Adobe Systems Incorporated Indicating a tag with visual data
US7716157B1 (en) 2006-01-26 2010-05-11 Adobe Systems Incorporated Searching images with extracted objects
US8259995B1 (en) 2006-01-26 2012-09-04 Adobe Systems Incorporated Designating a tag icon
US7636450B1 (en) 2006-01-26 2009-12-22 Adobe Systems Incorporated Displaying detected objects to indicate grouping
US7319421B2 (en) * 2006-01-26 2008-01-15 Emerson Process Management Foldback free capacitance-to-digital modulator
US7813526B1 (en) 2006-01-26 2010-10-12 Adobe Systems Incorporated Normalizing detected objects
US7720258B1 (en) 2006-01-26 2010-05-18 Adobe Systems Incorporated Structured comparison of objects from similar images
US7706577B1 (en) 2006-01-26 2010-04-27 Adobe Systems Incorporated Exporting extracted faces
US7813557B1 (en) 2006-01-26 2010-10-12 Adobe Systems Incorporated Tagging detected objects
US7978936B1 (en) 2006-01-26 2011-07-12 Adobe Systems Incorporated Indicating a correspondence between an image and an object
JP4714056B2 (en) * 2006-03-23 2011-06-29 株式会社日立製作所 Media recognition system
US8020100B2 (en) * 2006-12-22 2011-09-13 Apple Inc. Fast creation of video segments
US8943410B2 (en) 2006-12-22 2015-01-27 Apple Inc. Modified media presentation during scrubbing
WO2008088741A2 (en) 2007-01-12 2008-07-24 Ictv, Inc. Interactive encoded content system including object models for viewing on a remote device
US9826197B2 (en) 2007-01-12 2017-11-21 Activevideo Networks, Inc. Providing television broadcasts over a managed network and interactive content over an unmanaged network to a client device
US8296662B2 (en) * 2007-02-05 2012-10-23 Brother Kogyo Kabushiki Kaisha Image display device
US8166045B1 (en) 2007-03-30 2012-04-24 Google Inc. Phrase extraction using subphrase scoring
US8166021B1 (en) 2007-03-30 2012-04-24 Google Inc. Query phrasification
US7693813B1 (en) 2007-03-30 2010-04-06 Google Inc. Index server architecture using tiered and sharded phrase posting lists
WO2008137432A2 (en) * 2007-05-01 2008-11-13 Dyyno Sharing of information and formatting information for transmission over a communication network
WO2009126785A2 (en) 2008-04-10 2009-10-15 The Trustees Of Columbia University In The City Of New York Systems and methods for image archaeology
WO2009155281A1 (en) 2008-06-17 2009-12-23 The Trustees Of Columbia University In The City Of New York System and method for dynamically and interactively searching media data
US8325796B2 (en) * 2008-09-11 2012-12-04 Google Inc. System and method for video coding using adaptive segmentation
US8311111B2 (en) * 2008-09-11 2012-11-13 Google Inc. System and method for decoding using parallel processing
US8326075B2 (en) 2008-09-11 2012-12-04 Google Inc. System and method for video encoding using adaptive loop filter
US20100070527A1 (en) * 2008-09-18 2010-03-18 Tianlong Chen System and method for managing video, image and activity data
US8755515B1 (en) 2008-09-29 2014-06-17 Wai Wu Parallel signal processing system and method
US20120330507A1 (en) * 2008-12-22 2012-12-27 Toyota Motor Engineering & Manufacturing North America, Inc. Interface for cycling through and selectively choosing a mode of a vehicle function
US8671069B2 (en) 2008-12-22 2014-03-11 The Trustees Of Columbia University, In The City Of New York Rapid image annotation via brain state decoding and visual pattern mining
US9299184B2 (en) * 2009-04-07 2016-03-29 Sony Computer Entertainment America Llc Simulating performance of virtual camera
JP5414357B2 (en) * 2009-05-20 2014-02-12 キヤノン株式会社 Imaging device and playback device
EP2302845B1 (en) 2009-09-23 2012-06-20 Google, Inc. Method and device for determining a jitter buffer level
US9652462B2 (en) * 2010-04-29 2017-05-16 Google Inc. Identifying responsive resources across still images and videos
CN102262439A (en) * 2010-05-24 2011-11-30 三星电子株式会社 Method and system for recording user interactions with a video sequence
US8477050B1 (en) 2010-09-16 2013-07-02 Google Inc. Apparatus and method for encoding using signal fragments for redundant transmission of data
WO2012050832A1 (en) 2010-09-28 2012-04-19 Google Inc. Systems and methods utilizing efficient video compression techniques for providing static image data
WO2012051528A2 (en) 2010-10-14 2012-04-19 Activevideo Networks, Inc. Streaming digital video between video devices using a cable television system
US8751565B1 (en) 2011-02-08 2014-06-10 Google Inc. Components for web-based configurable pipeline media processing
US8780971B1 (en) 2011-04-07 2014-07-15 Google, Inc. System and method of encoding using selectable loop filters
EP2695388B1 (en) 2011-04-07 2017-06-07 ActiveVideo Networks, Inc. Reduction of latency in video distribution networks using adaptive bit rates
US9154799B2 (en) 2011-04-07 2015-10-06 Google Inc. Encoding and decoding motion via image segmentation
US8780996B2 (en) 2011-04-07 2014-07-15 Google, Inc. System and method for encoding and decoding video data
US8781004B1 (en) 2011-04-07 2014-07-15 Google Inc. System and method for encoding video using variable loop filter
AU2012245285A1 (en) 2011-04-22 2013-11-21 Pepsico, Inc. Beverage dispensing system with social media capabilities
US8907957B2 (en) 2011-08-30 2014-12-09 Apple Inc. Automatic animation generation
US9164576B2 (en) 2011-09-13 2015-10-20 Apple Inc. Conformance protocol for heterogeneous abstractions for defining user interface behaviors
US8819567B2 (en) 2011-09-13 2014-08-26 Apple Inc. Defining and editing user interface behaviors
US8885706B2 (en) 2011-09-16 2014-11-11 Google Inc. Apparatus and methodology for a video codec system with noise reduction capability
WO2013067020A1 (en) 2011-11-01 2013-05-10 Stephen Lim Dispensing system and user interface
US9100657B1 (en) 2011-12-07 2015-08-04 Google Inc. Encoding time management in parallel real-time video encoding
US8805418B2 (en) 2011-12-23 2014-08-12 United Video Properties, Inc. Methods and systems for performing actions based on location-based rules
EP2815582B1 (en) 2012-01-09 2019-09-04 ActiveVideo Networks, Inc. Rendering of an interactive lean-backward user interface on a television
USD696265S1 (en) 2012-01-19 2013-12-24 Pepsico, Inc. Display screen with graphical user interface
USD696267S1 (en) 2012-01-19 2013-12-24 Pepsico, Inc. Display screen with graphical user interface
USD696264S1 (en) 2012-01-19 2013-12-24 Pepsico, Inc. Display screen with graphical user interface
USD702698S1 (en) 2012-01-19 2014-04-15 Pepsico, Inc. Display screen with graphical user interface
USD702247S1 (en) 2012-01-19 2014-04-08 Pepsico, Inc. Display screen with graphical user interface
USD702699S1 (en) 2012-01-19 2014-04-15 Pepsico, Inc. Display screen with graphical user interface
USD703681S1 (en) 2012-01-19 2014-04-29 Pepsico, Inc. Display screen with graphical user interface
USD696266S1 (en) 2012-01-19 2013-12-24 Pepsico, Inc. Display screen with graphical user interface
US9262670B2 (en) 2012-02-10 2016-02-16 Google Inc. Adaptive region of interest
US9094681B1 (en) 2012-02-28 2015-07-28 Google Inc. Adaptive segmentation
US9131073B1 (en) 2012-03-02 2015-09-08 Google Inc. Motion estimation aided noise reduction
US9800945B2 (en) 2012-04-03 2017-10-24 Activevideo Networks, Inc. Class-based intelligent multiplexing over unmanaged networks
US9123084B2 (en) 2012-04-12 2015-09-01 Activevideo Networks, Inc. Graphical application integration with MPEG objects
JP5959951B2 (en) * 2012-06-15 2016-08-02 キヤノン株式会社 Video processing apparatus, video processing method, and program
US9344729B1 (en) 2012-07-11 2016-05-17 Google Inc. Selective prediction signal filtering
KR101328199B1 (en) * 2012-11-05 2013-11-13 넥스트리밍(주) Method and terminal and recording medium for editing moving images
USD707701S1 (en) 2013-02-25 2014-06-24 Pepsico, Inc. Display screen with graphical user interface
USD701875S1 (en) 2013-02-25 2014-04-01 Pepsico, Inc. Display screen with graphical user interface
USD701876S1 (en) 2013-02-25 2014-04-01 Pepsico, Inc. Display screen with graphical user interface
USD707700S1 (en) 2013-02-25 2014-06-24 Pepsico, Inc. Display screen with graphical user interface
USD704728S1 (en) 2013-02-25 2014-05-13 Pepsico, Inc. Display screen with graphical user interface
US10275128B2 (en) 2013-03-15 2019-04-30 Activevideo Networks, Inc. Multiple-mode system and method for providing user selectable video content
US9501506B1 (en) 2013-03-15 2016-11-22 Google Inc. Indexing system
CN105144768B (en) * 2013-04-26 2019-05-21 英特尔Ip公司 Shared frequency spectrum in frequency spectrum share situation is redistributed
US9483568B1 (en) 2013-06-05 2016-11-01 Google Inc. Indexing system
US9326047B2 (en) 2013-06-06 2016-04-26 Activevideo Networks, Inc. Overlay rendering of user interface onto source video
US9219922B2 (en) * 2013-06-06 2015-12-22 Activevideo Networks, Inc. System and method for exploiting scene graph information in construction of an encoded video sequence
US9294785B2 (en) 2013-06-06 2016-03-22 Activevideo Networks, Inc. System and method for exploiting scene graph information in construction of an encoded video sequence
JP6232828B2 (en) * 2013-08-13 2017-11-22 日本電気株式会社 Still image providing device
US11425395B2 (en) 2013-08-20 2022-08-23 Google Llc Encoding and decoding using tiling
US9330171B1 (en) * 2013-10-17 2016-05-03 Google Inc. Video annotation using deep network architectures
USD741368S1 (en) * 2013-10-17 2015-10-20 Microsoft Corporation Display screen with transitional graphical user interface
WO2015161487A1 (en) * 2014-04-24 2015-10-29 Nokia Technologies Oy Apparatus, method, and computer program product for video enhanced photo browsing
US9788029B2 (en) 2014-04-25 2017-10-10 Activevideo Networks, Inc. Intelligent multiplexing using class-based, multi-dimensioned decision logic for managed networks
US9392272B1 (en) 2014-06-02 2016-07-12 Google Inc. Video coding using adaptive source variance based partitioning
US9578324B1 (en) 2014-06-27 2017-02-21 Google Inc. Video coding using statistical-based spatially differentiated partitioning
US10102613B2 (en) 2014-09-25 2018-10-16 Google Llc Frequency-domain denoising
USD760274S1 (en) * 2014-11-26 2016-06-28 Amazon Technologies, Inc. Display screen or portion thereof with an animated graphical user interface
USD761300S1 (en) * 2014-11-26 2016-07-12 Amazon Technologies, Inc. Display screen or portion thereof with an animated graphical user interface
USD761845S1 (en) * 2014-11-26 2016-07-19 Amazon Technologies, Inc. Display screen or portion thereof with an animated graphical user interface
EP3265999A4 (en) * 2015-03-01 2018-08-22 NEXTVR Inc. Methods and apparatus for 3d image rendering
JP1553726S (en) * 2015-10-30 2016-07-11
US9794574B2 (en) 2016-01-11 2017-10-17 Google Inc. Adaptive tile data size coding for video and image compression
US10542258B2 (en) 2016-01-25 2020-01-21 Google Llc Tile copying for video compression
GB2549723A (en) * 2016-04-26 2017-11-01 Nokia Technologies Oy A system and method for video editing in a virtual reality enviroment
US11449469B2 (en) * 2017-10-09 2022-09-20 Box, Inc. Embedded content object collaboration
US11082752B2 (en) * 2018-07-19 2021-08-03 Netflix, Inc. Shot-based view files for trick play mode in a network-based video delivery system
KR20210043679A (en) * 2018-08-21 2021-04-21 돌비 인터네셔널 에이비 Method, apparatus, and system for generation, transmission and processing of instant playback frames (IPF)
US10825231B2 (en) * 2018-12-10 2020-11-03 Arm Limited Methods of and apparatus for rendering frames for display using ray tracing
US11665312B1 (en) * 2018-12-27 2023-05-30 Snap Inc. Video reformatting recommendation
US10887542B1 (en) 2018-12-27 2021-01-05 Snap Inc. Video reformatting system
US11141656B1 (en) * 2019-03-29 2021-10-12 Amazon Technologies, Inc. Interface with video playback
USD929440S1 (en) 2019-04-19 2021-08-31 Pepsico, Inc. Display screen or portion thereof with animated graphical user interface
US11082755B2 (en) * 2019-09-18 2021-08-03 Adam Kunsberg Beat based editing
CN114359335A (en) * 2020-09-30 2022-04-15 华为技术有限公司 Target tracking method and electronic equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2758428A1 (en) * 1997-01-16 1998-07-17 Transcom Productions Ltd Interface device for associated video sequence

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5008853A (en) * 1987-12-02 1991-04-16 Xerox Corporation Representation of collaborative multi-user activities relative to shared structured data objects in a networked workstation environment
US5220657A (en) * 1987-12-02 1993-06-15 Xerox Corporation Updating local copy of shared data in a collaborative system
US6345288B1 (en) * 1989-08-31 2002-02-05 Onename Corporation Computer-based communication system and method using metadata defining a control-structure
US5237648A (en) * 1990-06-08 1993-08-17 Apple Computer, Inc. Apparatus and method for editing a video recording by selecting and displaying video clips
JPH05282379A (en) * 1992-02-06 1993-10-29 Internatl Business Mach Corp <Ibm> Method and device for retrieval of dynamic image
DE69322470T2 (en) * 1992-08-12 1999-07-15 Ibm System and method for localizing video segment transitions
EP0622930A3 (en) * 1993-03-19 1996-06-05 At & T Global Inf Solution Application sharing for computer collaboration system.
US5524195A (en) * 1993-05-24 1996-06-04 Sun Microsystems, Inc. Graphical user interface for interactive television with an animated agent
US5600775A (en) * 1994-08-26 1997-02-04 Emotion, Inc. Method and apparatus for annotating full motion video and other indexed data structures
US5596705A (en) * 1995-03-20 1997-01-21 International Business Machines Corporation System and method for linking and presenting movies with their underlying source information
US5729471A (en) * 1995-03-31 1998-03-17 The Regents Of The University Of California Machine dynamic selection of one video camera/image of a scene from multiple video cameras/images of the scene in accordance with a particular perspective on the scene, an object in the scene, or an event in the scene
JP3635710B2 (en) 1995-04-05 2005-04-06 ソニー株式会社 Method and apparatus for transmitting news material
US6181867B1 (en) 1995-06-07 2001-01-30 Intervu, Inc. Video storage and retrieval system
US5708845A (en) * 1995-09-29 1998-01-13 Wistendahl; Douglass A. System for mapping hot spots in media content for interactive digital media program
US5805118A (en) * 1995-12-22 1998-09-08 Research Foundation Of The State Of New York Display protocol specification with session configuration and multiple monitors
US5828370A (en) * 1996-07-01 1998-10-27 Thompson Consumer Electronics Inc. Video delivery system and method for displaying indexing slider bar on the subscriber video screen
AU3724497A (en) 1996-07-12 1998-02-09 Lava, Inc. Digital video system having a data base of coded data for digital audio and ideo information
US5963203A (en) 1997-07-03 1999-10-05 Obvious Technology, Inc. Interactive video icon with designated viewing position
WO1998047084A1 (en) 1997-04-17 1998-10-22 Sharp Kabushiki Kaisha A method and system for object-based video description and linking
WO1999046702A1 (en) 1998-03-13 1999-09-16 Siemens Corporate Research, Inc. Apparatus and method for collaborative dynamic video annotation

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
AKUTSU A ET AL: "VIDEO INDEXING USING MOTION VECTORS", PROCEEDINGS OF THE SPIE, vol. 1818, no. PART 03, 18 November 1992 (1992-11-18), pages 1522 - 1530, XP000671350 *
HIROTADA UEDA ET AL: "AUTOMATIC STRUCTURE VISUALIZATION FOR VIDEO EDITING", BRIDGES BETWEEN WORLDS, AMSTERDAM, APR. 24 - 29, 1993, 24 April 1993 (1993-04-24), ASHLUND S;MULLET K; HENDERSON A; HOLLNAGEL E; WHITE T, pages 137 - 141, XP000570441 *
KATAOKA R ET AL: "ARCHITECTURE AND STORAGE STRUCTURE OF AN INTERACTIVE MULTIMEDIA INFORMATION SYSTEM", IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, vol. E78-D, no. 11, 1 November 1995 (1995-11-01), pages 1354 - 1361, XP000553522 *
MADRANE N ET AL: "VIDEO REPRESENTATION TOOLS USING A UNIFIED OBJECT AND PERSPECTIVE BASED APPROACH", PROCEEDINGS OF THE SPIE, vol. 2420, 9 February 1995 (1995-02-09), pages 152 - 163, XP000571900 *
MASSEY M ET AL: "SALIENT STILLS: PROCESS AND PRACTICE", IBM SYSTEMS JOURNAL, vol. 35, no. 3/04, 1996, pages 557 - 573, XP000635088 *
TONOMURA Y ET AL: "CONTENT ORIENTED VISUAL INTERFACE USING VIDEO ICONS FOR VISUAL DATABASE SYSTEMS", JOURNAL OF VISUAL LANGUAGES AND COMPUTING, vol. 1, 1 January 1990 (1990-01-01), pages 183 - 198, XP000195706 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
USRE42728E1 (en) 1997-07-03 2011-09-20 Sony Corporation Network distribution and management of interactive video and multi-media containers
US6573907B1 (en) 1997-07-03 2003-06-03 Obvious Technology Network distribution and management of interactive video and multi-media containers
USRE45594E1 (en) 1997-07-03 2015-06-30 Sony Corporation Network distribution and management of interactive video and multi-media containers
WO2000051078A3 (en) * 1999-02-22 2002-02-07 Siemens Corp Res Inc Method and apparatus for authoring and linking video documents
WO2000051078A2 (en) * 1999-02-22 2000-08-31 Siemens Corporate Research, Inc. Method and apparatus for authoring and linking video documents
WO2001026377A1 (en) * 1999-10-04 2001-04-12 Obvious Technology, Inc. Network distribution and management of interactive video and multi-media containers
EP1922864A1 (en) * 2005-08-15 2008-05-21 Disney Enterprises, Inc. A system and method for automating the creation of customized multimedia content
EP1922864A4 (en) * 2005-08-15 2010-12-22 Disney Entpr Inc A system and method for automating the creation of customized multimedia content
US8201073B2 (en) 2005-08-15 2012-06-12 Disney Enterprises, Inc. System and method for automating the creation of customized multimedia content
EP1983418A1 (en) * 2007-04-16 2008-10-22 Fujitsu Limited Display device, display program storage medium, and display method
GB2477120A (en) * 2010-01-22 2011-07-27 Icescreen Ehf Media editing using poly-dimensional display of sequential image frames
US8811801B2 (en) 2010-03-25 2014-08-19 Disney Enterprises, Inc. Continuous freeze-frame video effect system and method
EP2816563A1 (en) * 2013-06-18 2014-12-24 Nokia Corporation Video editing
WO2014202486A1 (en) * 2013-06-18 2014-12-24 Nokia Corporation Video editing
EP2816564A1 (en) * 2013-06-21 2014-12-24 Nokia Corporation Method and apparatus for smart video rendering
US10347298B2 (en) 2013-06-21 2019-07-09 Nokia Technologies Oy Method and apparatus for smart video rendering
WO2016207861A1 (en) * 2015-06-25 2016-12-29 Nokia Technologies Oy Method, apparatus, and computer program product for predictive customizations in self and neighborhood videos

Also Published As

Publication number Publication date
USRE42728E1 (en) 2011-09-20
USRE45594E1 (en) 2015-06-30
US5963203A (en) 1999-10-05
IL133798A0 (en) 2001-04-30
EP0992010A1 (en) 2000-04-12

Similar Documents

Publication Publication Date Title
US5963203A (en) Interactive video icon with designated viewing position
USRE38401E1 (en) Interactive video icon with designated viewing position
US7181757B1 (en) Video summary description scheme and method and system of video summary description data generation for efficient overview and browsing
US10031649B2 (en) Automated content detection, analysis, visual synthesis and repurposing
EP2127368B1 (en) Concurrent presentation of video segments enabling rapid video file comprehension
KR100781623B1 (en) System and method for annotating multi-modal characteristics in multimedia documents
US6573907B1 (en) Network distribution and management of interactive video and multi-media containers
EP1960990B1 (en) Voice and video control of interactive electronically simulated environment
US7421455B2 (en) Video search and services
US6571054B1 (en) Method for creating and utilizing electronic image book and recording medium having recorded therein a program for implementing the method
KR100464076B1 (en) Video browsing system based on keyframe
KR101967036B1 (en) Methods, systems, and media for searching for video content
EP1006459A2 (en) Content-based video story browsing system
US20080159708A1 (en) Video Contents Display Apparatus, Video Contents Display Method, and Program Therefor
CN101398843B (en) Device and method for browsing video summary description data
JP3579111B2 (en) Information processing equipment
Lee et al. Implementation and analysis of several keyframe-based browsing interfaces to digital video
JP2001175380A (en) Information index display method and device
KR100319160B1 (en) How to search video and organize search data based on event section
US20010033302A1 (en) Video browser data magnifier
JP2001306579A (en) Device and method for retrieving information and computer-readable recording medium recorded with program for computer to execute the same method
JP3698523B2 (en) Application program starting method, recording medium recording the computer program, and computer system
EP1872272A1 (en) Method and apparatus for editing program search information
JP3751608B2 (en) Information processing device
Campanella et al. Interactive visualization of video content and associated description for semantic annotation

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase; Ref document number: 133798; Country of ref document: IL
AK Designated states; Kind code of ref document: A1; Designated state(s): IL JP
AL Designated countries for regional patents; Kind code of ref document: A1; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE
121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
WWE WIPO information: entry into national phase; Ref document number: 1998932999; Country of ref document: EP
NENP Non-entry into the national phase; Ref country code: JP; Ref document number: 1999507306; Format of ref document f/p: F
WWP WIPO information: published in national office; Ref document number: 1998932999; Country of ref document: EP
WWW WIPO information: withdrawn in national office; Ref document number: 1998932999; Country of ref document: EP