WO2006097907A2 - Video diary with event summary - Google Patents

Video diary with event summary

Info

Publication number
WO2006097907A2
Authority
WO
WIPO (PCT)
Prior art keywords
events
video
videos
event
Prior art date
Application number
PCT/IB2006/050839
Other languages
French (fr)
Other versions
WO2006097907A3 (en)
Inventor
Nevenka Dimitrova
Robert J. Turetsky
Lalitha Agnihotri
Original Assignee
Koninklijke Philips Electronics, N.V.
Priority date
Filing date
Publication date
Application filed by Koninklijke Philips Electronics, N.V.
Publication of WO2006097907A2
Publication of WO2006097907A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/71 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/73 Querying
    • G06F 16/738 Presentation of query results
    • G06F 16/739 Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/7834 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/7844 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data

Abstract

A video processing system is configured to process videos (150) related to activities of a user, to identify events (260) that facilitate the organization and storage of the videos (152, 154, 156) for subsequent recall. Preferably, the user wears one or more camera devices (110) that continuously record the activities of the user. Processing elements (120) analyze the recorded videos to recognize events, such as a greeting with another person. The recognized events (240, 250) are used to index or otherwise organize the videos to facilitate recollection (160), such as recalling 'people I met today', or answering queries such as 'when did I last speak to Wendy?' The topic of recorded conversations can also be used to characterize events, as well as the recognition of key words or phrases. A hierarchy of archives is used to organize events and corresponding videos on a daily, weekly, monthly, and yearly basis.

Description

VIDEO DIARY WITH EVENT SUMMARY
This invention relates to the field of consumer electronics, and in particular to a video processing system that is configured to record a user's life experiences and to organize these recordings for efficient search and retrieval.
Video cameras are becoming increasingly smaller, and "wearable" cameras are becoming increasingly popular. Jewelry-sized cameras are currently available, as well as eyeglass-mounted cameras. Similarly, many hand-held devices, such as cell-phones, are commonly equipped with video cameras, and security and surveillance cameras are becoming ubiquitous.
Memory devices are also becoming increasingly smaller. Solid-state memory devices are currently able to store over a gigabyte of data, and small hard-disk drives, such as those used in portable music players, are capable of storing dozens of gigabytes of data. Personal computers are available with hundreds of gigabytes of data storage, and storage nodes with thousands of gigabytes of data storage are also readily available.
Although the technologies exist for a user to continuously record videos corresponding to months or years of the user's life, the absence, until now, of a viable method and system for organizing these videos for subsequent recall and recollection has substantially diminished the practicality of using these technologies to create such a video diary.
Techniques are available for analyzing, characterizing, and summarizing video information from conventional sources, such as television programs, but these techniques generally rely upon known recording patterns and fairly static characterizations. For example, television programs may be characterized as "comedy", "drama", "news", "weather", and so on, with easy to recognize context shifts and program breaks. Such techniques are not well suited for characterizing and organizing 'free-flowing' video, such as the collection of videos from an always-on wearable camera, and such techniques are not well suited for maintaining long-term archives and summaries of events.
At the CARPE workshop co-located with ACM Multimedia Conference 2004 in New York, October 15, 2004, a number of papers were presented regarding techniques for capturing and organizing information regarding a person's life, and are summarized below. "Passive Capture and Ensuing Issues for a Personal Lifetime Store," by Gemmell, presents a system for integrating the input from cameras and sensors, such as time and location sensors. The time and location that each photograph is taken is stored in a database, so that a user can retrieve pictures based on time or location. The system also maintains a log of events, such as a record of telephone calls, a record of web-pages visited, a record of appointments in a calendar, and so on. If the user recalls an event associated with the time or location of pictures to be retrieved, the user consults the log for the time of the event, and thereby locates the pictures.
"Efficient Retrieval of Life Log Based on Context and Content" by Aizawa et al. presents a lifelog that contains input from substantially continuous video recordings, location and motion sensors, physiological sensors, documents, emails, and so on. Spatio- temporal sampling of the video recordings, such as sampling every N seconds or after every M meters of movement, is used to create summary frames and key frames that facilitate retrieval. Additionally, these key frames are analyzed to distinguish conversation scenes from non-conversation scenes, based, for example, on whether the scene contains a close-up face. Retrieval of the information can be based on time, location, and behavior, as well as annotations provided by words in documents, e-mails and so on.
"A Layered Interpretation of Human Interactions Captured by Ubiquitous Sensors," Takahashi et al. introduces a system architecture for capturing continuous video, including a layered model of interactions based on semantic levels. At the lowest level is the raw data collected by audio/visual equipment, location sensors, and the like. At a higher level, segments of raw data are created, and at a further level, characterized as elements of human behavior, such as "LOOK_AT", "TALK_TO", and so on. At a higher level, the individual behavior elements are grouped into social interactions, such as "GROUP_DISCUSSION", "TOGETHER_WITH", and so on. This structure facilitates the sharing of information, including the creation of summary videos that combine the collected information from multiple users, as well as identifying each user's interests, based, for example on the "LOOK_AT" and "TALK_TO" elements in the video. In like manner, an automated-guide system can be provided, wherein the guide provides suggestions based on prior interactions, such as exhibits associated with each "LOOK_AT" element.
Although the above articles address techniques for storing and organizing videos, a need exists for improved techniques for such storage and organization, to facilitate long term as well as short term retrieval and recollection. Of particular note, a need exists for a system that facilitates the retrieval of events in a person's life, recognizing that the identification of material that constitutes an "event" is dynamic, and that the significance of an event changes with time.
It is an object of this invention to provide a method and system for collecting and organizing videos related to a user's everyday experiences, to facilitate recollection of events in the user's life. It is a further object of this invention to provide a method and system for collecting and organizing videos to facilitate maintenance of a personal or business diary.
These objects, and others, are achieved by a video processing system that is configured to process videos related to activities of a user, to identify events that facilitate the organization and storage of the videos for subsequent recall. Preferably, the user wears one or more camera devices that continuously record the activities of the user. Processing elements analyze the recorded videos to recognize events, such as a greeting with another person. The recognized events are used to index or otherwise organize the videos to facilitate recollection, such as recalling "people I met today", or answering queries such as "when did I last speak to Wendy?" The topic of recorded conversations can also be used to characterize events, as well as the recognition of key words or phrases. A hierarchy of archives is used to organize events and corresponding videos on a daily, weekly, monthly, and yearly basis.
The invention is explained in further detail, and by way of example, with reference to the accompanying drawings wherein:
FIG. 1 illustrates a block diagram of an example video archiving system in accordance with this invention.
FIG. 2 illustrates an example flow diagram for creating and maintaining an event library in accordance with this invention.
FIG. 3 illustrates an example flow diagram for recognizing events in accordance with this invention.
Throughout the drawings, the same reference numeral refers to the same element, or an element that performs substantially the same function. The drawings are included for illustrative purposes and are not intended to limit the scope of the invention.
Using commonly available compression techniques, one gigabyte of memory is able to store about one hour's worth of high-quality video, or at least four hours of "teleconference-quality" video. This invention is based on the observation that, using existing technology, a continuous recording of an average person's activities for days at a time could be recorded on a portable recorder, and, with judicious event selection and repetition reduction, years of key events in the person's life could be stored for retrieval and recollection on a personal computer. As memory and processing technologies advance, a user will be able to maintain a life-long video diary on a portable viewing system using the techniques of this invention.
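By way of illustration only, the back-of-envelope arithmetic behind this observation can be written out as follows; the hours recorded per day and the amount of video kept after event selection are assumptions introduced here, not figures from this description.

```python
# Assumed figures: 1 GB per hour of high-quality video (per the text),
# at least 4 hours per GB at teleconference quality, 16 waking hours captured.
HOURS_PER_DAY_RECORDED = 16          # assumption
GB_PER_HOUR_HQ = 1.0                 # "one gigabyte ... about one hour"
GB_PER_HOUR_TELECONF = 0.25          # "at least four hours" per gigabyte

raw_day_hq = HOURS_PER_DAY_RECORDED * GB_PER_HOUR_HQ
raw_year_teleconf = 365 * HOURS_PER_DAY_RECORDED * GB_PER_HOUR_TELECONF
kept_year_hq = 365 * 0.5 * GB_PER_HOUR_HQ   # keep ~30 min/day of key events

print(f"Raw high-quality capture: {raw_day_hq:.0f} GB per day")
print(f"Raw teleconference-quality capture: {raw_year_teleconf / 1000:.1f} TB per year")
print(f"Key events only, high quality: {kept_year_hq:.0f} GB per year")
```

Under these assumptions, a year of selected key events fits comfortably on a personal computer, while unselected continuous capture does not, which is the motivation for the event selection described below.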
Techniques are continuing to be developed for summarizing and classifying video segments. For example, Mauro Barbieri, Nevenka Dimitrova, and Lalitha Agnihotri's "MOVIE-IN-A-MINUTE: AUTOMATICALLY GENERATED VIDEO PREVIEWS", published at the Pacific Rim Conference on Multimedia, in Tokyo, 1-3 December 2004, presents a technique for automatically creating a video preview of a movie. USP 6,754,389, "PROGRAM CLASSIFICATION USING OBJECT TRACKING", issued 22 June 2004 to Nevenka Dimitrova, Lalitha Agnihotri, and Gang Wei, presents a technique for classifying television programs based on the path or trajectory of objects within each scene, and is incorporated by reference herein.
Similarly, techniques are continuing to be developed for recognizing significant objects within video images. For example, Jun Fan, Nevenka Dimitrova, and Vasanth Philomin's "ONLINE FACE RECOGNITION SYSTEM FOR VIDEOS BASED ON MODIFIED PROBABILISTIC NEURAL NETWORKS WITH ADAPTIVE THRESHOLD", published at IEEE ICIP 2004, teaches an online learning technique for efficiently recognizing faces in video images, and adding new faces to the database "online". During continuous operation, the system "learns" new faces and remains open to memorizing new face models.
FIG. 1 illustrates a block diagram of an example video archiving system in accordance with this invention. The system includes a memory 150 that receives videos from one or more cameras 110, and an abstraction/summarization module 120 that is configured to organize the videos in the memory 150 so as to facilitate recall via an access and editing module 160 that is coupled to a playback device 170. To facilitate dynamic event recognition and characterization, an adaptive learning module 130 is also provided. The memory 150 is illustrated as containing three levels of memory, named for convenience as "immediate" 152, "short-term" 154, and "long-term" 156 memory. Fewer or more levels of memory may be used, and the different levels may be distinguished physically or logically, or a combination of both. Although a single block is used to represent the memory 150, the memory 150 may be distributed among multiple physical blocks. For example, the immediate memory 152 may be located in a portable device that is coupled directly to a user's camera 110; it may be integral to a camera 110; it may include portions of different users' cameras 110; and so on. In like manner, the short-term memory 154 may be partitioned between a portable device and a personal computer, and the long-term memory 156 may be located at a mass-storage facility that is accessed via the Internet. Similarly, a portion of the immediate memory 152 may be temporary memory that is used to download videos from a surveillance tape of a region that the user visited, or from another person's cell-phone, and so on.
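For illustration, one possible way to model the three memory tiers in software is sketched below; the class and field names are assumptions made for the sketch, not terms used in this description.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class VideoSegment:
    segment_id: str
    start: float                      # seconds since epoch
    end: float
    labels: List[str] = field(default_factory=list)

@dataclass
class DiaryMemory:
    """Three logical tiers; physically they may live on a camera, a PC,
    and a networked mass-storage facility, as described above."""
    immediate: List[VideoSegment] = field(default_factory=list)
    short_term: List[VideoSegment] = field(default_factory=list)
    long_term: List[VideoSegment] = field(default_factory=list)

    def promote(self, segment: VideoSegment, tier: str) -> None:
        # The abstraction/summarization module decides which segments move up.
        getattr(self, tier).append(segment)
```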
The abstraction and summarization module 120 is configured to analyze the videos in the immediate memory 152, to identify videos that should be placed in short-term memory 154, and to semantically index these stored videos for efficient retrieval. The module 120 is also optionally configured to further determine which videos should be placed in long-term memory 156, as detailed further below. The abstraction and summarization module 120 also provides features of a personal assistant device, by processing the videos to identify personal introductions, telephone numbers, addresses, and so on.
In conjunction with the adaptive learning module 130, discussed further below, the abstraction and summarization module 120 is configured to partition the continuous video stream into discrete events. To facilitate this partitioning, "break points" are identified within the video, based on changes in the visual or audio content, or based on other "break" cues. Events are defined to begin and end at break points, and may contain one or more intermediate break points. For example, during a ride in an automobile, the general content or form of a video stream will be fairly consistent. When the ride terminates, the content and form of the video stream will change substantially, and this visual transition will be identified by a break point. In like manner, the audio content of a video stream will often change substantially when a person enters a new environment, when a meeting commences, when the unexpected happens, and so on. Each substantial audio change can be used to identify break points in the video. If the system is coupled with a location determination device, changes in velocity or direction may also serve to identify potential break points in the video.
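As a simplified, hypothetical sketch of such break-point detection, the following routine flags frames where consecutive colour histograms differ strongly or where the audio energy jumps or drops sharply; the feature inputs and thresholds are illustrative assumptions, not values from this description.

```python
import numpy as np

def find_break_points(frame_hist_dist, audio_energy, vis_thresh=0.5, aud_ratio=3.0):
    """Return frame indices that look like event boundaries.

    frame_hist_dist[i]: assumed distance between colour histograms of frames i and i+1.
    audio_energy[i]: short-term audio energy aligned to frame i."""
    frame_hist_dist = np.asarray(frame_hist_dist, dtype=float)
    audio_energy = np.asarray(audio_energy, dtype=float)

    # A large visual change between consecutive frames marks a candidate break.
    breaks = set((np.where(frame_hist_dist > vis_thresh)[0] + 1).tolist())

    # A large jump or drop in audio energy (new room, meeting starts) does too.
    ratio = audio_energy[1:] / np.maximum(audio_energy[:-1], 1e-9)
    audio_breaks = np.where((ratio > aud_ratio) | (ratio < 1.0 / aud_ratio))[0] + 1
    breaks |= set(audio_breaks.tolist())

    return sorted(breaks)
```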
Nevenka Dimitrova, Jacquelyn Martino, Lalitha Agnihotri, and Herman Elenbaas's "SUPERHISTOGRAMS FOR VIDEO REPRESENTATION", IEEE ICIP 1999, Kobe, Japan, discloses the use of color histograms to identify related frames, and identifies breaks where the histograms differ substantially. A data structure is presented to identify related but non-contiguous frames, to accommodate temporal gaps in a common scene, such as a temporary departure from a meeting, using "families" of histograms. Additional techniques for defining breaks in video are presented in Nevenka Dimitrova, Lalitha Agnihotri, and Radu Jasinschi's "TEMPORAL VIDEO BOUNDARIES", at pages 61-90 of the VIDEO MINING BOOK, edited by A. Rosenfeld et al., and published by Kluwer, 2003.
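A minimal sketch of grouping frames into such histogram "families" might look as follows; the distance measure, running-mean update, and threshold are assumptions for illustration and are not taken from the cited papers.

```python
import numpy as np

def assign_to_families(histograms, threshold=0.2):
    """Group frames into histogram families so that related but non-contiguous
    frames (e.g. before and after briefly leaving a meeting) share a family."""
    families = []   # each entry: [running mean histogram, member count]
    labels = []
    for h in histograms:
        h = np.asarray(h, dtype=float)
        h = h / max(h.sum(), 1e-9)                 # normalize the histogram
        best, best_d = None, threshold
        for i, (mean, n) in enumerate(families):
            d = 0.5 * np.abs(h - mean).sum()       # in [0, 1] for normalized hists
            if d < best_d:
                best, best_d = i, d
        if best is None:
            families.append([h, 1])                # start a new family
            labels.append(len(families) - 1)
        else:
            mean, n = families[best]
            families[best] = [(mean * n + h) / (n + 1), n + 1]
            labels.append(best)
    return labels
```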
In a preferred embodiment of this invention, the abstraction and summarization module 120 is configured to provide multi-level hierarchical summaries based on the recorded events, including, for example, daily, weekly, monthly, and yearly summaries. Additionally, special-purpose summaries may be created, for example to create personal "life videos" to pass along to family and friends.
Generally, the daily summaries will correspond to the videos that are selected for inclusion in the short-term memory 154, and the other summaries will correspond to videos that are selected for inclusion in the long-term memory 156.
Preferably, the daily summary will include a table of contents that identifies the recorded events, the people encountered or referenced, as well as the topics of conversation, based on the audio content of the videos. User cues, such as "Hello, Wendy", or facial recognition techniques, are used to identify the people encountered. In a preferred embodiment of this system, the techniques presented in the aforementioned "ONLINE FACE RECOGNITION SYSTEM FOR VIDEOS BASED ON MODIFIED PROBABILISTIC NEURAL NETWORKS WITH ADAPTIVE THRESHOLD" paper are used to add new faces and new voices to the memory, and to reconcile the identities of the people involved in the conversations. Similarly, the system can be coupled to a dictionary to identify proper names, so that, for example, the text "I was speaking to Charles, yesterday... " can be used to identify people referenced in a conversation. In like manner, other cues, such as "This is Sally", or "I'd like you to meet Sally", can be used to capture images of newly-met people, to facilitate recollection at a later time. Also, the occurrence of numbers in a conversation can trigger savable events, such as identifying telephone numbers, addresses, appointment times, and so on.
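Purely as an illustration of such cue spotting, the following sketch scans a speech-to-text transcript for greeting phrases and telephone numbers; the regular expressions are rough stand-ins for the cues mentioned above, not the recognition method of this system.

```python
import re

# Rough stand-ins for the spoken cues mentioned above.
GREETING = re.compile(r"\b(?:hello|hi|this is|i'd like you to meet)[, ]+(\w+)",
                      re.IGNORECASE)
PHONE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")

def scan_transcript(text):
    """Spot people and phone numbers in a speech-to-text transcript."""
    return {
        "people": GREETING.findall(text),
        "phone_numbers": PHONE.findall(text),
    }

print(scan_transcript("Hello, Wendy. This is Sally. Call me at 555-867-5309."))
# -> {'people': ['Wendy', 'Sally'], 'phone_numbers': ['555-867-5309']}
```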
In addition to the text of the conversation, the tenor, duration, and tone of the conversation may be used to classify the conversation as "small talk", "greeting", "significant news", "argument", and so on.
Note that within a given video segment corresponding to a saved event, multiple classifications and indexing may occur. In like manner, to facilitate efficient retrieval, multiple levels of cross-indexing are preferably used, so that, for example, all encounters with a particular person can be efficiently recalled.
The abstraction module 120 may also be configured to recognize commands directed to this archiving system, such as "DIARY, TAKE NOTE ... ", or "DIARY, THIS MEETING IS IMPORTANT" to direct the system to include subsequent videos in the short term memory 154, or "DIARY, OFF" to direct the system not to include the subsequent videos in the short term memory 154. Similarly, "DIARY, DELETE LAST 15 MINUTES" can be used to remove recording information from the short term memory 154. In like manner, commands such as "DIARY, NEW EVENT", or "DIARY, NEW MEETING" can be used to facilitate the identification of segment breaks, as well as the classification of events.
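A hypothetical command dispatcher for these spoken cues might be structured as follows; the `diary` object and its methods are assumptions made for the sketch.

```python
def handle_diary_command(utterance, diary):
    """Map recognized 'DIARY, ...' utterances to archiving actions."""
    cmd = utterance.upper()
    if not cmd.startswith("DIARY"):
        return
    if "DIARY, OFF" in cmd:
        diary.save_to_short_term = False       # stop archiving subsequent video
    elif "TAKE NOTE" in cmd or "IMPORTANT" in cmd:
        diary.save_to_short_term = True        # force archiving of what follows
    elif "DELETE LAST" in cmd:
        digits = [tok for tok in cmd.split() if tok.isdigit()]
        if digits:
            diary.delete_recent(minutes=int(digits[0]))
    elif "NEW EVENT" in cmd or "NEW MEETING" in cmd:
        diary.mark_break_point()               # force a segment break here
```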
In addition to audio cues, visual cues may be used to identify or classify events. For example, the detection of text within the visual image may facilitate a determination of the location or venue of the video, including for example, the recognition of road signs, building names, skylines, and so on. In like manner, pattern recognition and learning techniques can be applied to recognize visual patterns corresponding to "home", "office", "traveling to work", "airport", and so on. Similarly, the immediate memory 152 may include information from other sources, such as a GPS receiver, that can be used to facilitate identification of events, such as "trip to New York", "at the Empire State Building", and so on.
Optionally, the abstraction module 120 may be configured to always save videos to the short term memory when the user is at a particular location, such as "in the Board room", and never save videos when the user is at another location, such as "in the bedroom". Other sources of information may be used to facilitate the identification and classification of events. For example, the module 120 may be coupled to a personal calendar, and key dates, such as birthdays, anniversaries, holidays, and so on, can effect a different prioritization for saving events. The calendar will also affect the movement of events from short-term memory 154 to long-term memory 156, as detailed further below. In like manner, if the abstraction and summarization module 120 detects a schedule event, such as "Let's meet next Wednesday at three", this scheduled event can be automatically added to the personal calendar.
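One simple way to express such location-based overrides is sketched below; the location names mirror the examples above, and the policy hook itself is an assumption, not part of this description.

```python
# In practice these sets would be user-configurable.
ALWAYS_SAVE_LOCATIONS = {"board room"}
NEVER_SAVE_LOCATIONS = {"bedroom"}

def save_decision(location, default=True):
    """Let the user's current location override the normal decision to
    archive a segment in short-term memory."""
    loc = location.lower()
    if loc in NEVER_SAVE_LOCATIONS:
        return False
    if loc in ALWAYS_SAVE_LOCATIONS:
        return True
    return default
```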
As with the memory 150, although the abstraction and summarization module 120 is illustrated as a single block in FIG. 1, it may include a number of components that are logically or physically independent from each other. For example, the aforementioned "DIARY, OFF" or "in the bedroom" controls may be located with the immediate memory 152 components, so that the videos are not recorded to the memory 152, for greater privacy. In like manner, the components used to determine which events are stored in long- term memory 156 may be located wherever the long-term memory 156 components are located.
As the use of this invention proliferates, third party vendors may offer summarization and abstraction components that can be added to the archiving system, including abstraction and storage components that are accessed via the Internet. For example, to replace the "holiday letter" that many families send each year, summarizing the significant events of the year, a third-party vendor may offer a "summarization and e- mailing service" that creates and distributes a video summary that includes videos from the video diaries of multiple family members. These and other configurations and enhancements will be evident to one of ordinary skill in the art in view of this disclosure.
The adaptive learning module 130 is configured to facilitate the recognition and characterization of events, including distinguishing between rare events and repetitive events. In contrast with conventional systems used to characterize video streams, such as television programs or videoconference meetings, the video archiving system of this invention cannot rely on a "supervised" structure of the images in the stream. For example, in a system that summarizes news stories, the system can expect each segment to begin with a scene break, each news story to begin with an image of the anchor person, each weather story to begin with an image of a map, and so on. In a supervised videoconference, the image of the current speaker is usually placed in the foreground, or a light appears at the current speaker's location, and so on. In a typical embodiment of this invention, on the other hand, a "free flow", or "unsupervised flow", of images will be collected, and the learning module 130 is configured to facilitate the segregation of the continuous flow of videos into discrete events by creating a library of labeled events and their characteristic features. To facilitate efficient storage and recall of the videos, the learning module 130 is also configured to further distinguish events as being rare or repetitive. Additional levels of distinction may also be provided, such as 'extremely rare', 'somewhat repetitive', and so on.
FIG. 2 illustrates an example flow diagram for creating and maintaining a dynamic event library in accordance with this invention, as would be included, for example, in the learning module 130 of FIG. 1.
At 210, features are extracted from the video information that is stored in the memory 150. Generally, these features are extracted from the information in the immediate memory 152. The extraction of features includes audio, visual, and textual analysis of the video information, as discussed above, optionally enhanced by ancillary information, such as location and time. At 220, the features are processed to identify clusters of events, such as "meeting people", "driving in car", "in kitchen", and so on; typically, this clustering is performed on a daily basis.
As illustrated at 230, each day 232, 234 will likely exhibit different clusters. However, over time, each cluster can generally be classified as being "rare" or "repetitive". A repetitive cluster, for example, might be "driving to work" or "eating breakfast", whereas rare clusters may include "at a beach" or "at a party". In a preferred embodiment, conventional unsupervised learning methods are used to define the clusters. Clusters that contain many sequences represent repetitive events, while clusters that contain few sequences represent rare events, which may be characterized as important, such as meeting the President, or unimportant, such as losing one's way through an unknown part of town. The system delineates repetitive vs. rare events, and provides the user the option of further distinguishing the events as important. In a preferred embodiment, known patterns of events are identified using statistical time-series methods, such as hierarchical temporal Hidden Markov Models (HMMs). Optionally, adaptive resonance theory is used to further distinguish long-term and short-term patterns of events. At 260, the clusters of events are identified by labels, such as the aforementioned "driving to work", "eating breakfast", "at a beach", and so on. In a preferred embodiment, the learning module 130 includes default labels and patterns for easily-identifiable and predictable events, such as views from an automobile, views of particular rooms (meeting room, bedroom, kitchen, etc.), cues for greeting people, and others, and provides a user interface to allow the user to label unrecognized events, change labels of specific events, further distinguish clustered events, and so on. Similarly, this user interface is also used to allow the user to identify particular images, such as "Sally", "John", "home-kitchen", "office-kitchen", so that subsequent videos can be suitably identified.
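As an illustration of this clustering and rare/repetitive labelling, the following sketch uses k-means as a stand-in for the "conventional unsupervised learning methods" mentioned above and tags clusters by size; the cluster count and size threshold are assumed parameters, and the HMM-based pattern analysis is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_and_rate_day(feature_vectors, n_clusters=8, repetitive_threshold=10):
    """Cluster one day's event feature vectors and tag each cluster as
    'repetitive' or 'rare' according to how many sequences fall into it."""
    X = np.asarray(feature_vectors, dtype=float)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    ids, counts = np.unique(labels, return_counts=True)
    rarity = {int(i): ("repetitive" if c >= repetitive_threshold else "rare")
              for i, c in zip(ids, counts)}
    return labels, rarity
```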
The labeled events and their characteristic features are stored in an event library, at 270, for subsequent recognition and classification of events, as further detailed with regard to FIG. 3.
FIG. 3 illustrates an example flow diagram for recognizing and classifying events in accordance with this invention, as would be used, for example, in the abstraction and summarization module 120 of FIG. 1.
At 310, features are extracted from the memory 150, typically the immediate memory 152 of FIG. 1. The extraction of features includes audio, visual, and textual analysis of the video information, as discussed above, optionally enhanced by ancillary information, such as location and time.
At 320, the extracted features are analyzed to determine if they correspond to one or more defined events in the event library 270, discussed above. If, at 330, the features correspond to an event, the occurrence of the event is stored, including the transfer of the video segments into the short-term memory 154 of FIG. 1, and the creation of a summary record that characterizes the event, such as "Meeting Sally in Manhattan", "On the beach with Bill", and so on. The summary record also includes an "importance" rating, to facilitate prioritization; generally, for example, rare events are rated more important than routine and/or repetitive events. In a preferred embodiment of this invention, an interface is provided to allow the user to adjust the importance ratings and/or the resultant prioritization.
If, at 330, the extracted features do not correspond to a defined event, the information from the memory 150 is passed to the adaptive learning module 130, at 340, for the potential creation of a newly defined event in the event library, at 350, using the techniques detailed above with regard to FIG. 2. In a preferred embodiment of this invention, the user is also provided the option of explicitly defining events, preferably via the learning module 130. If a newly defined event is created, this first occurrence of the event is stored and summarized at 360, discussed above.
If the identified event corresponds to one of a defined set of events that trigger other operations, such as storing a telephone number, or storing an image of a new acquaintance, these operations are subsequently performed, at 370.
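A minimal sketch of the library-matching step at 320/330 is given below, assuming each library entry stores a prototype feature vector; the distance metric and threshold are assumptions for illustration.

```python
import numpy as np

def match_event(features, event_library, max_distance=1.0):
    """Return the label of the closest event prototype, or None if nothing in
    the library is close enough (the 'no match' branch at 330/340)."""
    features = np.asarray(features, dtype=float)
    best_label, best_d = None, max_distance
    for label, prototype in event_library.items():
        d = float(np.linalg.norm(features - np.asarray(prototype, dtype=float)))
        if d < best_d:
            best_label, best_d = label, d
    return best_label
```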
In addition to storing and summarizing the daily events as illustrated in FIG. 3, the abstraction and summarization module 120 is also configured to prepare weekly, monthly, and yearly summaries of events, based on a prioritization of the events. In a preferred embodiment of this invention, the module 120 allows both automated and manual identification of events for inclusion in these summaries, as well as manual control of the parameters and procedures used for automated event identification and prioritization.
Video data mining is used to detect weekly, monthly, or yearly patterns, to facilitate the detection of memorable days and/or memorable events. High level semantic pointers are preferably used to index these memorable days and events, to facilitate retrieval. As discussed above, ancillary information, such as calendar information for birthdays, anniversaries, travel schedule, and so on, and GPS information for locations, can be used to identify memorable days and events as well. The aforementioned importance rating is also used to select memorable events. Generally, days and events are identified as being memorable when they include out-of-the-ordinary occurrences, such as rare events, new locations, new vistas, and so on.
As noted above, the user is provided the option of manually overriding the system's choice of memorable days or events. Optionally, the system includes a list of criteria and/or options for defining memorable events, such as a list that includes "location", "vista", "people", "calendar", etc., and the user selects which of these criteria should or should not be used to define memorable events.
Indexed archives are used to manage the storage of the daily, weekly, monthly, and yearly summaries, using archiving techniques common in the art. Generally, "meta summaries" are created at each level (e.g. daily level) to produce the summary at the next higher level (e.g. weekly level). In a preferred embodiment, to control the amount of storage required to maintain the video diary of this invention, a fixed number of daily summaries, with their associated stored videos, are maintained, with new daily summaries and associated videos replacing the oldest daily summaries and videos. In like manner, a fixed number of each of weekly and monthly summaries and videos are maintained, with new summaries and videos replacing the oldest summaries and videos. As noted above, any of a variety of techniques can be used for maintaining this hierarchy of saved information. For example, the selected daily videos may be copied to a memory reserved for weekly summaries, and selected weekly videos may be copied to a memory reserved for monthly summaries, and so on. Alternatively, pointers could be maintained to a single copy of each video segment, and the video segment is not permitted to be overwritten if any of the daily, weekly, monthly, etc. summaries contain a pointer to this segment. Note that the terms "delete" and "replace" are used herein in the "logical" sense, and do not necessarily imply a physical deletion or immediate overwriting of the material. A video segment is "deleted" from the system whenever a reference to the segment does not appear in any of the current summaries.
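The pointer-based, "logical" deletion described above can be illustrated with the following sketch, in which a segment's storage is reclaimed only when no current summary references it; the data layout is an assumption made for the sketch.

```python
def live_segments(summaries):
    """Collect every segment id still referenced by any daily, weekly,
    monthly, or yearly summary; each summary is assumed to be a list of ids."""
    referenced = set()
    for summary in summaries:
        referenced.update(summary)
    return referenced

def reclaim(storage, summaries):
    """A segment is 'deleted' only when no current summary points at it."""
    live = live_segments(summaries)
    for segment_id in list(storage):
        if segment_id not in live:
            del storage[segment_id]    # physical space may now be reclaimed

storage = {"s1": "...", "s2": "...", "s3": "..."}
reclaim(storage, summaries=[["s1"], ["s1", "s3"]])
print(sorted(storage))   # -> ['s1', 's3']
```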
The access and editing module 160 of FIG. 1 is configured to allow a user to browse the hierarchy of summaries, search for key words or phrases, sort by time or location, and so on, using data access techniques common in the art. In a preferred embodiment, each stored event includes a preview image or short clip, to facilitate selection of the stored videos.
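The browse, search, and sort behaviour described here is conventional data access; a toy sketch, with hypothetical event records and field names, of keyword search combined with sort-by-time, including the preview image that eases selection:

```python
from datetime import date

# Hypothetical stored-event index entries with the attributes the
# access module searches and sorts on.
events = [
    {"label": "staff meeting", "time": date(2006, 3, 16), "location": "office",
     "preview": "staff_meeting_0316.jpg"},
    {"label": "Mike's party",  "time": date(2006, 3, 17), "location": "Mike's house",
     "preview": "party_0317.jpg"},
]


def search(events, keyword=None, sort_key="time"):
    """Keyword search plus sort by time or location."""
    hits = [ev for ev in events
            if keyword is None or keyword.lower() in ev["label"].lower()]
    return sorted(hits, key=lambda ev: str(ev[sort_key]))


for ev in search(events, keyword="meeting"):
    print(ev["label"], ev["preview"])  # preview image facilitates selection
```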
Preferably, the access and editing module 160 allows a user to search and browse the video diary using a variety of query modes. For example, if the user is browsing the video, the module 160 allows queries such as: "Which building is this?", "Who is this person?", "Have I met this person before?", "Whose voice is this?", and so on. In response to such queries, the access module 160 compares the current image, audio, or video sequence to other images, audio, and video sequences within the memory 150. The query may also be in the form of a summary, such as "How often have I met Mary?", "What songs did we listen to at Mike's party?", "Who attended the staff meeting yesterday?", and so on.
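A query such as "Have I met this person before?" ultimately reduces to comparing the current face or voice signature against signatures stored with earlier events. The cosine-similarity matcher below is an illustrative stand-in, with invented signatures and threshold, for whatever recognition features the system actually stores:

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical index: person label -> stored recognition signature.
face_index = {
    "Mary":    [0.9, 0.2, 0.1],
    "Charles": [0.1, 0.8, 0.6],
}


def have_i_met(query_signature, index, threshold=0.95):
    """Answer 'Have I met this person before?' by nearest stored signature."""
    name, stored = max(index.items(), key=lambda kv: cosine(query_signature, kv[1]))
    sim = cosine(query_signature, stored)
    return (name, sim) if sim >= threshold else (None, sim)


print(have_i_met([0.88, 0.25, 0.12], face_index))  # ('Mary', ~0.998)
print(have_i_met([0.00, 0.10, 0.90], face_index))  # (None, ~0.68) -- no close match
```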
The module 160 also provides a variety of editing functions, ranging from straightforward copying, cutting, and deleting functions to automated tasks such as "Prepare a disk containing all of my meetings with Charles" or "Prepare a summary disk of all of our family reunions", and so on. Optionally, the module 160 allows a user to apply a variety of "director styles" in the creation of such composites and summaries.
The foregoing merely illustrates the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are thus within its spirit and scope. For example, although this invention is presented in the context of a single user, the system can be configured to communicate with similar systems, to provide for multi-user-based visual memories. The recordings of multiple users who attend a meeting, party, or other assembly, for example, can be merged to provide a common multiple-view recording of the event. These and other system configuration and optimization features will be evident to one of ordinary skill in the art in view of this disclosure, and are included within the scope of the following claims.
In interpreting these claims, it should be understood that:
a) the word "comprising" does not exclude the presence of other elements or acts than those listed in a given claim;
b) the word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements;
c) any reference signs in the claims do not limit their scope;
d) several "means" may be represented by the same item or hardware or software implemented structure or function;
e) each of the disclosed elements may be comprised of hardware portions (e.g., including discrete and integrated electronic circuitry), software portions (e.g., computer programming), and any combination thereof;
f) hardware portions may be comprised of one or both of analog and digital portions;
g) any of the disclosed devices or portions thereof may be combined together or separated into further portions unless specifically stated otherwise;
h) no specific sequence of acts is intended to be required unless specifically indicated; and
i) the term "plurality of" an element includes two or more of the claimed element, and does not imply any particular range of number of elements; that is, a plurality of elements can be as few as two elements.

Claims

CLAIMS:
1. A method comprising: collecting (110) videos corresponding to a user's activities, analyzing (120) the videos to identify events, storing the events in an event library (270) that is configured to facilitate retrieval of one or more of the videos corresponding to each event.
2. The method of claim 1, further including storing video segments of the videos corresponding to the events in a memory (150), prioritizing the events, and removing select video segments from the memory, based on the prioritizing of the corresponding event.
3. The method of claim 1, further including identifying (240, 250) one or more of the events as being either rare or repetitive, so as to further facilitate retrieval (160) of the one or more videos.
4. The method of claim 3, wherein storing the events (270) is based at least in part on whether each of the one or more events is rare or repetitive (240, 250).
5. The method of claim 1, further including identifying a time associated with one or more of the events, and wherein storing the events (270) is based at least in part on the time of each of the one or more events.
6. The method of claim 5, further including analyzing the events (120) in the event history to identify key events, and wherein storing the events (270) in the event library is further based on whether each event is a key event.
7. The method of claim 1, further including analyzing the events (120) in the event history to identify key events, and wherein storing the events (270) in the event library is further based on whether each event is a key event.
8. The method of claim 1, wherein analyzing the videos (120) includes: extracting features (210) from the videos, and defining clusters (220) based on the features.
9. The method of claim 8, wherein analyzing the videos (120) further includes: identifying the events (230) based on the clusters, and distinguishing one or more of the events as being a repetitive event (250).
10. The method of claim 9, wherein analyzing the videos (120) further includes distinguishing one or more of the events as being a rare event (240).
11. The method of claim 8, further including receiving input from a user to facilitate labeling (260) one or more of the events.
12. The method of claim 11, further including retrieving (160) the one or more videos corresponding to a select event, based on the labeling of the one or more events.
13. The method of claim 1, further including retrieving (160) the one or more videos corresponding to a select event.
14. The method of claim 1, wherein storing the events (270) includes: storing one or more of the videos in a short-term memory (154), and identifying a location in the short-term memory (154) corresponding to each of the events in the event library (270).
15. The method of claim 14, wherein storing the events (270) further includes: storing the one or more videos associated with select events of the events in a long-term memory (156), and identifying a location in the long-term memory (156) corresponding to each of the select events.
16. The method of claim 1, wherein analyzing the videos (120) includes analyzing audio information of the videos to facilitate a characterization of the events.
17. The method of claim 16, wherein the characterization of the events includes at least one of: personal-introduction events, numeric-reference events, and schedule events.
18. The method of claim 16, wherein the characterization of the events is based on at least one of: a topic of conversation in the audio information, a recognition of a person based on the audio information, and a recognition of key phrases in the audio information.
19. The method of claim 1, further including determining a location associated with the videos, and wherein analyzing the videos to identify events is based at least in part on the location associated with the videos.
20. The method of claim 1, further including applying adaptive learning techniques (130) to facilitate classifying each event.
21. The method of claim 1, further including recognizing people in the videos, and storing an identification of each of the people to facilitate retrieval of the one or more videos based on the identification.
22. The method of claim 21, further including storing recognition information related to faces in the video, to facilitate subsequent recognition of the people in the video.
23. The method of claim 21, further including characterizing one or more of the events based upon the identification of the people in the video.
24. The method of claim 1, further including characterizing one or more of the events based upon corresponding information from ancillary sources.
25. A video diary system comprising: a memory system (150) that is configured to contain: a video stream corresponding to a user's activities, and an event library (270), an abstraction module (120) that is configured to: identify video segments within the video stream corresponding to events, and store the events in the event library (270), and an access module (160) that is configured to facilitate retrieval of the video segment corresponding to each event in the event library (270).
26. The video diary system of claim 25, further including a summarization module (120) that is configured to: identify key events in the event library (270), and store the key events in a portion of the event library corresponding to a long-term memory (156) of the memory system (150).
27. The video diary system of claim 26, wherein the summarization module (120) is further configured to selectively delete events from the event library (270), and corresponding video segments from the memory system (150).
28. The video diary system of claim 27, wherein the abstraction module (120) is further configured to distinguish events as being rare events (240) and repetitive events (250), and the summarization module (120) selectively deletes events based on whether the events are repetitive events (250).
29. The video diary system of claim 25, wherein the abstraction module (120) is configured to identify video segments corresponding to events by defining clusters of features (220) within the video segments, and identifying the events (240, 250) based on the clusters of features.
30. The video diary system of claim 25, further including: an adaptive learning module (130) that is configured to define the events.
31. The video diary system of claim 25, wherein the abstraction module (120) is further configured to: identify people in the video segments, and store an identification of each of the people in the memory system, and the access module (160) is further configured to retrieve one or more of the video segments based on the identification.
32. The video diary system of claim 31, wherein the abstraction module (120) is further configured to store recognition information related to faces in the video segments, to facilitate subsequent identification of the people in the video segments.
33. The video diary system of claim 31, wherein the abstraction module (120) is further configured to characterize one or more of the events based upon the identification of the people in the video.
34. The video diary system of claim 25, wherein the abstraction module (120) is configured to characterize one or more of the events based on audio information contained in the one or more video segments.
35. The video diary system of claim 34, wherein the abstraction module (120) characterizes the one or more events as at least one of: personal-introduction events, numeric-reference events, and schedule events.
36. The video diary system of claim 34, wherein the abstraction module (120) characterizes the one or more events based on at least one of: a topic of conversation in the audio information, a recognition of a person based on the audio information, and a recognition of a key phrase in the audio information.
37. The video diary system of claim 25, wherein the abstraction module (120) is further configured to: identify a location associated with one or more events, and store the location in the event library, and the access module (160) is further configured to retrieve one or more of the video segments based on the location.
38. The video diary system of claim 25, wherein the access module (160) is configured to facilitate retrieval based on a user's query.
39. The video diary system of claim 25, further including: an editing module (160) that is configured to facilitate creation of a composite of video segments corresponding to related events.
PCT/IB2006/050839 2005-03-18 2006-03-17 Video diary with event summary WO2006097907A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US66305705P 2005-03-18 2005-03-18
US60/663,057 2005-03-18

Publications (2)

Publication Number Publication Date
WO2006097907A2 true WO2006097907A2 (en) 2006-09-21
WO2006097907A3 WO2006097907A3 (en) 2007-01-04

Family

ID=36710418

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2006/050839 WO2006097907A2 (en) 2005-03-18 2006-03-17 Video diary with event summary

Country Status (1)

Country Link
WO (1) WO2006097907A2 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030101104A1 (en) * 2001-11-28 2003-05-29 Koninklijke Philips Electronics N.V. System and method for retrieving information related to targeted subjects

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BABAGUCHI N ET AL: "Generation of personalized abstract of sports video" ICME 2001, IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, 22-25 AUG. 2001, PISCATAWAY, NJ, USA, IEEE, 22 August 2001 (2001-08-22), pages 619-622, XP010661913 ISBN: 0-7695-1198-8 *
FURTH B (ED.): "Handbook of multimedia computing" CRC PRESS, 1999, pages 520-521, XP002393546 Boca Raton, FL, USA ISBN: 0-8493-1825-4 *
GEORIS B ET AL: "IP-distributed computer-aided video-surveillance system" IEE SYMPOSIUM INTELLIGENT DISTRIBUTED SURVEILLANCE SYSTEMS (DIGEST NO.03/10062) IEE LONDON, UK, 2003, pages 18/1-5, XP002393544 *
LIENHART R: "Dynamic video summarization of home video" PROCEEDINGS OF SPIE, vol. 3972, December 1999 (1999-12), pages 378-389, XP002393545 *
TSENG B L ET AL: "Video summarization and personalization for pervasive mobile devices" PROCEEDINGS OF THE SPIE - THE INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING SPIE-INT. SOC. OPT. ENG USA, vol. 4676, 2001, pages 359-370, XP002393543 ISSN: 0277-786X *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9683858B2 (en) 2008-02-26 2017-06-20 Microsoft Technology Licensing, Llc Learning transportation modes from raw GPS data
US8972177B2 (en) 2008-02-26 2015-03-03 Microsoft Technology Licensing, Llc System for logging life experiences using geographic cues
US8966121B2 (en) 2008-03-03 2015-02-24 Microsoft Corporation Client-side management of domain name information
US8645283B2 (en) 2008-11-24 2014-02-04 Nokia Corporation Determination of event of interest
WO2010058064A1 (en) * 2008-11-24 2010-05-27 Nokia Corporation Determination of event of interest
US9063226B2 (en) 2009-01-14 2015-06-23 Microsoft Technology Licensing, Llc Detecting spatial outliers in a location entity dataset
US8275649B2 (en) 2009-09-18 2012-09-25 Microsoft Corporation Mining life pattern based on location history
US9009177B2 (en) 2009-09-25 2015-04-14 Microsoft Corporation Recommending points of interests in a region
US9501577B2 (en) 2009-09-25 2016-11-22 Microsoft Technology Licensing, Llc Recommending points of interests in a region
US8612134B2 (en) 2010-02-23 2013-12-17 Microsoft Corporation Mining correlation between locations using location history
US9261376B2 (en) 2010-02-24 2016-02-16 Microsoft Technology Licensing, Llc Route computation based on route-oriented vehicle trajectories
US10288433B2 (en) 2010-02-25 2019-05-14 Microsoft Technology Licensing, Llc Map-matching for low-sampling-rate GPS trajectories
US8719198B2 (en) 2010-05-04 2014-05-06 Microsoft Corporation Collaborative location and activity recommendations
US9593957B2 (en) 2010-06-04 2017-03-14 Microsoft Technology Licensing, Llc Searching similar trajectories by locations
US10571288B2 (en) 2010-06-04 2020-02-25 Microsoft Technology Licensing, Llc Searching similar trajectories by locations
US9754226B2 (en) 2011-12-13 2017-09-05 Microsoft Technology Licensing, Llc Urban computing of route-oriented vehicles
US9536146B2 (en) 2011-12-21 2017-01-03 Microsoft Technology Licensing, Llc Determine spatiotemporal causal interactions in data

Also Published As

Publication number Publication date
WO2006097907A3 (en) 2007-01-04

Similar Documents

Publication Publication Date Title
WO2006097907A2 (en) Video diary with event summary
US7739597B2 (en) Interactive media frame display
Lienhart Dynamic video summarization of home video
Hori et al. Context-based video retrieval system for the life-log applications
US6597859B1 (en) Method and apparatus for abstracting video data
US7466334B1 (en) Method and system for recording and indexing audio and video conference calls allowing topic-based notification and navigation of recordings
Ponceleon et al. Key to effective video retrieval: effective cataloging and browsing
EP1485821A2 (en) Adaptive environment system and method of providing an adaptive environment
KR100711948B1 (en) Personalized video classification and retrieval system
JP2011086315A (en) Method and device for audio/data/visual information selection
Wilcox et al. Annotation and segmentation for multimedia indexing and retrieval
US9665627B2 (en) Method and device for multimedia data retrieval
Otani et al. Video summarization using textual descriptions for authoring video blogs
Wang et al. VFerret: content-based similarity search tool for continuous archived video
Smeaton Indexing, browsing, and searching of digital video and digital audio information
Adami et al. The ToCAI description scheme for indexing and retrieval of multimedia documents
US7457811B2 (en) Precipitation/dissolution of stored programs and segments
US20050138036A1 (en) Method of Producing Personalized Data Storage Product
Smeaton et al. Interactive searching and browsing of video archives: Using text and using image matching
Hauptmann et al. Beyond the Informedia digital video library: video and audio analysis for remembering conversations
Dimitrova et al. Video scouting demonstration: smart content selection and recording
Petrelli et al. Retrieving amateur video from a small collection
TW202337202A (en) Multimedia data recording method and recording device
EP1820125A1 (en) Adaptation of time similarity threshold in associative content retrieval
Hampapur et al. Video Browsing Using Cooperative Visual and Linguistic Indices.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase in:

Ref country code: DE

NENP Non-entry into the national phase in:

Ref country code: RU

WWW Wipo information: withdrawn in national office

Country of ref document: RU

122 Ep: pct application non-entry in european phase

Ref document number: 06727676

Country of ref document: EP

Kind code of ref document: A2

WWW Wipo information: withdrawn in national office

Ref document number: 6727676

Country of ref document: EP
