WO2007005382A2 - Sensing, storing, indexing, and retrieving data leveraging measures of user activity, attention, and interest - Google Patents

Sensing, storing, indexing, and retrieving data leveraging measures of user activity, attention, and interest

Info

Publication number
WO2007005382A2
Authority
WO
WIPO (PCT)
Prior art keywords
data
user
data items
items
item
Prior art date
Application number
PCT/US2006/024847
Other languages
French (fr)
Other versions
WO2007005382A3 (en)
Inventor
Susan T. Dumais
Eric J. Horvitz
Original Assignee
Microsoft Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corporation
Priority to CN2006800226726A priority Critical patent/CN101501627B/en
Priority to JP2008520267A priority patent/JP5021640B2/en
Priority to EP06774032.4A priority patent/EP1897002B1/en
Priority to KR1020077030613A priority patent/KR101242369B1/en
Publication of WO2007005382A2 publication Critical patent/WO2007005382A2/en
Publication of WO2007005382A3 publication Critical patent/WO2007005382A3/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21 Monitoring or handling of messages
    • H04L51/234 Monitoring or handling of messages for tracking messages
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/535 Tracking the activity of the user
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99932 Access augmentation or optimizing

Definitions

  • Computer platforms provide many tools for storing and processing large and varying types of data sets. These can include word processing tools, data presentation tools, computer-aided graphics tools, electronic mail handling tools, calendar and scheduling tools, and numerous database manipulation tools. Given the various usages for data on the platform, applications have developed over time that are somewhat content centric. In other words, when data has been stored in the computer's database, the data is subsequently retrieved and/or manipulated in some manner based on the actual content of the stored data. In one specific example, an e-mail inbox can be searched for previously received e-mails based on a keyword that links a search tool to respective e-mails that are associated with the term, where the term is linked to the actual contents of stored e-mails.
  • any e-mail associated with this keyword would be retrieved and presented to the user, whereby the user would subsequently sift through the retrieved list for the desired e-mail associated with the term "John."
  • the specific e-mail the user is searching for may be retrieved in the resulting list of mail, a large number of e-mails may have to be subsequently searched in order to find the desired e-mail (e.g., thirty e-mails contain the term John).
  • e-mail processing described in the above example can be extended to include many types of data processing and file manipulation activities. For instance these can include indexing of stored data, presentation of stored data, searching for various types of stored data, ranking data, indexing data, and so forth.
  • Another aspect is that knowledge from a source generally cannot be applied without a description of the context of both a document's creator and its reader. Only an explicit representation of the two context frames allows for a (semi-automatic) translation between them; in the above examples, old knowledge can be adapted to modern standards and vocabulary, but similar problems may increasingly appear in the medium and long-term future, when all documents that are currently created and stored in digital form become "historic knowledge" themselves.
  • Metadata tags associated with files or applications can be employed to facilitate effective information storage of and/or access to information.
  • User activities or interactions with data such as associated with respective files or applications represent an especially interesting and effective type of metadata and are the focus of many applications.
  • their activity with the data can be monitored and weighted according to the type and intensity of the activity. For instance, if a user heavily interacts with a particular file by adding and removing text from the file on a frequent basis, a score or weight can be assigned to the file in metadata or other format to indicate such activity.
  • scores or weights can be assigned over a broad range of file usage activities and for a plurality of differing activities such as creating, opening, viewing, scrolling, editing, printing, annotating, saving, forwarding, and so forth.
  • the activity weights or patterns can then be associated with data items (e.g., tagged to a column in a database), subsections of items, or groups of items.
  • the activity weight can then later be employed with a data manipulation tool, such as a search utility, to refine a larger set of data items into a smaller or more manageable set of items.
  • For example, instead of merely searching a set of data items for a content-centric keyword, searching for information can be augmented via activity-enhanced clues to more efficiently retrieve desired data of interest (e.g., find all files that were forwarded to a particular user, find a subset of presentations that have been most heavily utilized for other applications, determine the paragraph that was last edited, and so forth).
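The activity-augmented search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation the application describes: the `DataItem` record, its `activity_weight` and `forwarded_to` fields, and the `search` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """Hypothetical record: content plus activity metadata tagged to the item."""
    name: str
    text: str
    activity_weight: float = 0.0          # assumed scale: higher means more interaction
    forwarded_to: set = field(default_factory=set)


def search(items, keyword, min_weight=0.0, forwarded_to=None):
    """Content-centric keyword match, then refined with activity clues."""
    hits = [i for i in items if keyword.lower() in i.text.lower()]
    hits = [i for i in hits if i.activity_weight >= min_weight]
    if forwarded_to is not None:
        hits = [i for i in hits if forwarded_to in i.forwarded_to]
    # Most heavily used items first.
    return sorted(hits, key=lambda i: i.activity_weight, reverse=True)


items = [
    DataItem("mail-1", "Lunch with John", 0.1),
    DataItem("mail-2", "John's budget draft", 0.9, {"alice"}),
    DataItem("mail-3", "Quarterly report", 0.8, {"alice"}),
]
by_keyword = search(items, "john")
refined = search(items, "john", min_weight=0.5, forwarded_to="alice")
```

The keyword alone returns both "John" messages; adding the activity clues narrows the result to the one message that was heavily used and forwarded.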
  • FIG. 1 is a schematic block diagram illustrating a data processing system that employs user activity or interaction data.
  • FIG. 2 illustrates an exemplary user interface that utilizes user activity or interaction data.
  • FIG. 3 is a flow diagram illustrating a user activity determination process.
  • FIG. 4 illustrates an example system of an information retrieval architecture that can be employed with user activity data processing.
  • FIG. 5 illustrates an example user model that can be employed with user activity or interaction data.
  • FIG. 6 is a system diagram illustrating access-based information retrieval in accordance with user activity or interaction data.
  • FIG. 7 illustrates retrieval service applications that can be employed with user activity or interaction data.
  • FIG. 8 is a schematic block diagram illustrating a suitable operating environment.
  • FIG. 9 is a schematic block diagram of a sample-computing environment.
  • Various components and processes are provided to enable data processing on multiple data types where user activity or interaction with data is determined and employed to further process the data in accordance with the activity.
  • the activity or interaction can be monitored and subsequently tagged to a data item (e.g., activity of file interactions assigned a weight and applied to a column in a database) to later be employed for searching, indexing, cataloging, ranking, or viewing of various data items (or item subsets) residing in a database.
  • a data manipulation system is provided.
  • the system includes one or more data items that are associated with one or more tags and indicate at least one user's interaction with the data items.
  • a manipulation tool (e.g., search tool) processes the data items to determine a subset of data items based at least in part on the user's interaction with the data items.
  • model "query," and the like are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution.
  • a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
  • an application running on a server and the server can be a component.
  • One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers. Also, these components can execute from various computer readable media having various data structures stored thereon.
  • the components may communicate via local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems via the signal).
  • the term "inference" refers generally to the process of reasoning about or inferring states of the system, environment, and/or user from a set of observations as captured via events and/or data. Inference can be employed to identify a specific context or action, or can generate a probability distribution over states, for example.
  • the inference can be probabilistic - that is, the computation of a probability distribution over states of interest based on a consideration of data and events.
  • Inference can also refer to techniques employed for composing higher-level events from a set of events and/or data. Such inference results in the construction of new events or actions from a set of observed events and/or stored event data, whether or not the events are correlated in close temporal proximity, and whether the events and data come from one or several event and data sources. Furthermore, inference can be based upon logical models or rules, whereby relationships between components or data are determined by an analysis of the data and drawing conclusions therefrom. For instance, by observing that one user interacts with a subset of other users over a network, it may be determined or inferred that this subset of users belongs to a desired social network of interest for the one user, as opposed to a plurality of other users with whom the user never or rarely interacts.
  • a system 100 illustrates a data processing architecture that employs user activity or interaction data to perform various computer-related tasks.
  • a monitor component 110 observes data interactions over time with one or more databases 120 that store one or more data items 130. Monitoring can occur via a background and/or foreground component (not shown) and is employed to determine when the data items 130 are interacted with by users.
  • this can include observing when files are opened or closed, edited, added to or deleted from, read or written to, cut, pasted, last edited, forwarded, replied to, sent to, last viewed, the amount of time viewed, time interacted with over a time horizon, attentional annotations indicating how long the item 130 or different subcomponents of the item were the focus of attention, and so forth.
  • a tag component 140 assigns weights or scores according to monitored data activities. These can be probabilistically assigned, if desired (or other weighting classification), and can reflect the amount or pattern of the user's interaction with a given piece of data or application. For instance, minimal file usage may generate a lower weight than an extensive editing of the file.
  • the determined weight information for activity is associated or tagged to one or more data items and is illustrated as activity data 150.
  • This association can occur within the database 120, for example by creating or modifying values in a column or columns of the database 120 to indicate the weighting or importance of a particular item identified in a row of the database 120 for the metadata tag of interest. It is to be appreciated that the columns are logical entities and could either be stored explicitly or computed dynamically at the time of usage.
  • Other type associations could include a metadata reference that is directly or indirectly assigned to one or more data items 130.
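The two storage options mentioned above (a weight column stored explicitly versus one computed dynamically at the time of usage) can be sketched with Python's built-in `sqlite3` module. The table names, column names, and per-activity weights below are assumptions made for illustration; the source does not specify a schema or weight values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, "
    "activity_weight REAL DEFAULT 0)"
)
conn.execute("CREATE TABLE events (item_id INTEGER, kind TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(1, "report.doc"), (2, "notes.txt")],
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "edit"), (1, "edit"), (1, "open"), (2, "open")],
)

# Assumed weights: an edit counts three times as much as an open.
KIND_WEIGHT = {"open": 1.0, "edit": 3.0}

# Option A: store the weight explicitly in a column of the item's row.
for item_id, kind in conn.execute("SELECT item_id, kind FROM events").fetchall():
    conn.execute(
        "UPDATE items SET activity_weight = activity_weight + ? WHERE id = ?",
        (KIND_WEIGHT[kind], item_id),
    )
stored = dict(conn.execute("SELECT name, activity_weight FROM items"))

# Option B: treat the column as a logical entity computed at query time.
dynamic = dict(conn.execute(
    "SELECT i.name, SUM(CASE e.kind WHEN 'edit' THEN 3.0 ELSE 1.0 END) "
    "FROM items i JOIN events e ON e.item_id = i.id GROUP BY i.id"
))
```

Storing the weight makes reads cheap but requires updates on every monitored event; computing it on demand keeps the raw event log as the single source of truth.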
  • the tags and data items 130 can be applied with one or more data manipulation tools which can employ user activity information derived from the tags to augment information storage (e.g., efficient index creation), information access (e.g., searching, filtering or ranking of items) and information presentation (e.g., to organize, arrange or present items) and so forth.
  • results from the data manipulation tool 160 are automatically generated and can include a reduced subset of data items from a larger set of the data items 130. It is noted that, as applied herein, the term subset can include all or a portion of the data items 130.
  • activity thresholds can be set-up within the tools 160 to include more or less of the data items 130 in the results 170.
  • content-based scores can be assigned for documents. For example, a score can be assigned based on the similarity of a user's query to the content of the document. Thus, when ranking, give more weight to terms that appear in documents or sections of a document that have been edited or the user has spent lots of time reading, for example.
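A ranking score of this kind can be sketched as below. The representation of a document as a list of `(text, attention)` regions and the use of attention as a simple multiplier are assumptions for illustration only; the source does not prescribe a scoring formula.

```python
def score(query_terms, regions):
    """Sum term matches per region, scaled by that region's attention weight.

    `regions` is a list of (text, attention) pairs. `attention` is an assumed
    multiplier: 1.0 for untouched text, larger for text the user edited or
    spent a long time reading.
    """
    total = 0.0
    for text, attention in regions:
        words = text.lower().split()
        matches = sum(words.count(t.lower()) for t in query_terms)
        total += matches * attention
    return total


doc_a = [("budget overview", 1.0), ("budget figures for travel", 3.0)]
doc_b = [("budget budget summary", 1.0), ("travel notes", 1.0)]
ranked = sorted(
    {"a": doc_a, "b": doc_b}.items(),
    key=lambda kv: score(["budget"], kv[1]),
    reverse=True,
)
```

Both documents mention "budget" twice, but document `a` ranks first because one of its matches falls in a heavily attended region.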
  • activity data 150 can be employed to identify regions or terms of interest. Thus, differential weighting of document regions can be utilized to: 1) Compress an index to preferentially include terms in regions of interest;
  • the systems and methods described herein support a plurality of data processing applications. This includes processing data items such as documents, files, email messages, calendar appointments, web pages, sub-sections within the data items, or cross-item abstractions, for example.
  • Tags applied to the data items can represent a location at which a user last accessed an item, or a location history of the times that a user has accessed or interacted with an item.
  • the tag represents a time a user last accessed an item, a total number of times that an item has been accessed, a frequency with which an item has been accessed within a period of time extending into the past, or a frequency with which an item has been accessed within one or more arbitrarily specified periods of time.
  • tags can be probabilistic indications of activity or interest.
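The kinds of tag listed above (last access time, total access count, access frequency within a chosen period, and access location history) can be derived from a single access log. The `AccessTag` class and its method names are hypothetical, and the dates are made-up sample data.

```python
from datetime import datetime, timedelta


class AccessTag:
    """Hypothetical tag holding one item's access history."""

    def __init__(self):
        self.accesses = []  # (timestamp, location) pairs

    def record(self, when, location=None):
        self.accesses.append((when, location))

    def last_accessed(self):
        return max(t for t, _ in self.accesses) if self.accesses else None

    def total(self):
        return len(self.accesses)

    def frequency(self, start, end):
        """Accesses per day within the half-open window [start, end)."""
        n = sum(1 for t, _ in self.accesses if start <= t < end)
        days = (end - start).total_seconds() / 86400
        return n / days

    def locations(self):
        return [loc for _, loc in self.accesses]


now = datetime(2006, 6, 30, 12, 0)
tag = AccessTag()
for days_ago in (40, 6, 3, 1):
    tag.record(now - timedelta(days=days_ago), location="office")
week_rate = tag.frequency(now - timedelta(days=7), now)
```

Keeping the raw log and deriving each measure on demand allows the "arbitrarily specified periods of time" mentioned in the text to be chosen at query time.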
  • Various processes include analysis of user activities with data items. This includes automatically tagging a quantity or nature of interaction that data items have received from computer users and employing the tags to further process the data items in accordance with future data activities.
  • the processes can include storing data within an attentional annotation associated with the data items in a separate database or within a data structure embedded in the data items.
  • indexing procedures can be provided that weight subcomponents of data items differently for retrieval depending on a status of annotations indicating attention or interaction with data items. This includes indexing procedures that overlook or delete information in data items depending on a status of annotations indicating attention or interaction with data items.
  • An index can be compressed by removing components that have not been attended to or interacted with by computer users or that are lesser attended to or interacted with components of data items.
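One simple way to compress an index in this manner is to skip sections whose attention score falls below a threshold when the index is built. The `(doc_id, text, attention)` triples, the `min_attention` parameter, and the scores are assumptions for the sketch.

```python
from collections import defaultdict


def build_index(sections, min_attention=0.0):
    """Inverted index over (doc_id, section_text, attention) triples.

    Sections with attention below `min_attention` are left out entirely,
    so their terms never enter the index.
    """
    index = defaultdict(set)
    for doc_id, text, attention in sections:
        if attention < min_attention:
            continue
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


sections = [
    ("d1", "project timeline", 0.9),
    ("d1", "legal boilerplate", 0.0),   # never attended to
    ("d2", "timeline risks", 0.4),
]
full = build_index(sections)
compressed = build_index(sections, min_attention=0.3)
```

The trade-off is recall: a term that appears only in unattended sections can no longer be found through the compressed index.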
  • a ranking score can be utilized for data retrieval, to yield more weight to terms or objects that appear in sections of a data item that a user has attended to or has interacted with.
  • This can include employing attentional annotations to automatically or semi-automatically generate queries based on regions that have been attended to or interacted within the past and/or present.
  • the attentional annotations can also be employed to provide differential access to items or differential display of items that have been attended to or interacted with in the past and/or present.
  • attentional annotations can be encoded not only as attention to data items themselves but also as attention to subcomponents of the items, where the annotation captures pointers or other indications of each of the subcomponents and the attention that each has received. For example, consider a large document such as a 211-page document. The document may have been opened, interacted with, and attended to 23 times, for example, which is captured as one type of attentional annotation for the document. However, other attentional annotations indicate that the user has repeatedly examined pages 4-6, 89-93, 123-124, and 198, for example, and just skimmed quickly over other pages of the document. Thus, each subcomponent can be listed, and the amount of attention that each portion of text has received can be encoded in the annotation.
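The 211-page example can be sketched as a small data structure that records item-level counts together with per-page-range attention. The class name, field names, and dwell times in seconds are all assumptions; only the page ranges, the page count, and the count of 23 openings come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class AttentionalAnnotation:
    """Hypothetical annotation: item-level counts plus per-subcomponent attention."""
    item: str
    page_count: int
    open_count: int = 0
    # Maps an inclusive (first_page, last_page) range to seconds of attention.
    page_attention: dict = field(default_factory=dict)

    def attention_for(self, page):
        return sum(
            secs for (lo, hi), secs in self.page_attention.items() if lo <= page <= hi
        )

    def attended_pages(self):
        return sorted(
            p for (lo, hi) in self.page_attention for p in range(lo, hi + 1)
        )


note = AttentionalAnnotation("report.pdf", page_count=211, open_count=23)
# Dwell times are made-up values for illustration.
note.page_attention.update(
    {(4, 6): 300, (89, 93): 540, (123, 124): 120, (198, 198): 60}
)
```

Only 11 of the 211 pages carry attention in this example, which is the kind of signal the indexing and ranking procedures described above could draw on.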
  • Fig. 2 illustrates an exemplary user interface 200 that utilizes user activity or interaction data.
  • a manipulation tool 210 (e.g., user interface applied to a database)
  • the tool 210 can include many features for processing data from one or more databases.
  • the tool 210 may include selections for enabling data searches, indexing or cataloging of data, ranking of data, and so forth.
  • Such data can include textual data such as XML data or ASCII data for example.
  • Other data includes image data, audio data, video data, graphics data, and/or presentation data such as contained in a series of slides, for example.
  • Substantially any data type or application can be employed including spreadsheets, Uniform Resource Locator (URL) information, Internet or Web data, and so forth.
  • Such data can be tagged such as in a column or as file metadata to indicate a score or a weight that is indicative of the past usage or interactions.
  • the manipulation tool can then search, retrieve, or process the tagged data to refine or determine more manageable subsets of data for users.
  • the output 220 from the tool 210 can be a file or an actual user interface display.
  • the output could be a display of returned results.
  • the returned information can be more global in nature as illustrated at 230. This may include highlighting or applying graphics to a file to indicate that one file or grouping of files have been selected because of their increased activity with the user.
  • the tool 210 may be applied to search for all files that have the keyword "computer" and have had at least one graphical image associated with the file in the past month. Searches can be crafted in a plurality of ways and can include content searching, activity-based searching, and/or combinations thereof.
  • three e-mails out of a set of ten e-mails may be highlighted in one color as having a higher activity score than the other seven e-mails which are delineated in a different color.
  • information within a returned file or data set can be highlighted or annotated to indicate usage activity (e.g., paragraph within a file selected with different font format to highlight usage areas within the document).
  • Fig. 3 illustrates a process 300 for determining and applying user activity or interaction data. While, for purposes of simplicity of explanation, the methodologies are shown and described as a series or number of acts, it is to be understood and appreciated that the subject process is not limited by the order of acts, as some acts may, in accordance with the subject process, occur in different orders and/or concurrently with other acts from that shown and described herein. For example, those skilled in the art will understand and appreciate that a methodology could alternatively be represented as a series of interrelated states or events, such as in a state diagram. Moreover, not all illustrated acts may be required to implement a methodology in accordance with the subject process.
  • data interactions with one or more local or remote databases are monitored.
  • Such monitoring can occur in a background and/or foreground application and is employed to determine when data or files are interacted with by users. For instance, this can include observing when files are opened or closed, edited, added to or deleted from, read or written to, and so forth.
  • various techniques can be employed to determine activities within the application. These can include monitoring how long a user dwells on a particular set or subset of data, what data has been modified or observed and so forth. Outside monitoring can also be associated with the application to determine a user's interaction with the data, file or application.
  • weights for the monitored activities are determined.
  • the weights can be probabilistically assigned and can reflect the amount of the user's interaction with a given piece of data or application. For instance, a very light perusal of a document may generate a lower weight than an extensive editing of the document.
  • the determined weight information for activity is associated or tagged to one or more data items. The association can occur within the confines of a database for example such as tagging a column or columns of the database to indicate the weighting or importance of a particular item identified in a row of the database.
  • tags and data items can be applied with a data processing application. This can include utilizing the activity information derived from the tags to augment searches for items, to index items, to arrange items, to rank items, to organize items and so forth.
  • results from the data processing application are generated. This can include explicit actions such as filtering a larger result set into a smaller subset or more subtle actions such as annotating a display to highlight the files or data that the user has interacted with more.
  • an example system 400 illustrates an information retrieval architecture that can be employed with user activity data processing.
  • the system 400 depicts a general diagram for personalizing search results; however, other forms of data manipulation can be performed as described above.
  • a personalization component 410 includes a user model 420 based on user activity as well as processing components (e.g., retrieval algorithms modified in accordance with the user model) for using the model to influence search results by modifying a query 430 and/or modifying results 440 returned from a search.
  • a user interface 450 generates the query 430 and receives modified or personalized results based upon a query modification 470 and/or results modification 460 provided by the personalization component 410.
  • query modification refers to both an alteration with respect to terms in the query 430 and alterations in an algorithm that matches the query 430 to documents in order to obtain the personalized results 440.
  • Modified queries and/or results 440 are returned from one or more local and/or remote search engines 480.
  • a global database 490 of user statistics may be maintained to facilitate updates to the user model 420.
  • the user model 420 and/or global statistics 490 can be associated with user activity or interaction data as previously described to facilitate data manipulation or processing.
  • query modification processes an initial input query and modifies or regenerates the query (via user model) to yield personalized results.
  • Relevance feedback is a two-cycle variation of this process, wherein a query generates results that lead to a modified query (using explicit or implicit judgments about the initial result set), which yields results that are personalized to a short-term model based on the query and result set. Longer-term user models can also be used in the context of relevance feedback.
  • query modifications also refer to alterations made in algorithm(s) employed to match the query to documents.
  • results modification takes a user's input as-is to generate a query to yield results, which are then modified (via the user model) to generate personalized results. It is noted that modification of results usually includes some form of re-ranking and/or selection from a larger set of alternatives, which can include a consideration or weighting from determined activity data. Modification of results can also include various types of agglomeration and summarization of all or a subset of results.
  • Methods for modifying results include statistical similarity match (in which users' interests and content are represented as vectors and matched to items), and category matching (in which the users' interests and content are represented and matched to items using a smaller set of descriptors).
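A statistical similarity match can be sketched with term-count vectors and cosine similarity. Cosine similarity is one common choice and is an assumption here, as the source does not name a specific measure; the `rerank` function and the sample profile are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rerank(results, user_profile):
    """Re-order result snippets by similarity to the user's interest vector."""
    return sorted(
        results,
        key=lambda r: cosine(Counter(r.lower().split()), user_profile),
        reverse=True,
    )


# Assumed profile: terms drawn from content the user has interacted with.
profile = Counter("python code python tutorial".split())
results = ["python snake habitat", "python code tutorial", "garden snake care"]
reranked = rerank(results, profile)
```

The query is left as-is; only the order of the returned results changes, which matches the results-modification path rather than query modification.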
  • a user model 500 is illustrated that can be employed with user activity or interaction data.
  • the user model 500 is employed to differentiate personalized searches from generalized searches and to facilitate rich data processing according to determined activity data.
  • One aspect in successful personalization is to build a model of the user that accurately reflects their interests and is easy to maintain and adapt to changes regarding long-term and short-term interests.
  • the user model can be obtained from a variety of sources, including but not limited to:
  • the user model 500 can be based on many different sources of information.
  • the model 500 can be sourced from a history or log of locations visited by a user over time, as monitored by devices such as the Global Positioning System (GPS).
  • the raw spatial information can be converted into textual city names, and zip codes for positions a user has paused or dwelled or incurred a loss of GPS signal, for example.
  • the locations where the user has paused or dwelled or incurred a loss of GPS signal can be identified and converted via a database of businesses and points of interest into textual labels. Other factors include logging the time of day or day of week to determine locations and points of interest.
  • components can be provided to manipulate parameters for controlling how a user's corpus of information, appointments, views of documents or files, activities, or locations can be grouped into subsets or weighted differentially in matching procedures for personalization based on type, age, or other combinations.
  • a retrieval algorithm could be limited to those aspects of the user's corpus that pertain to the query (e.g., documents that contain the query term or past interaction with data).
  • email may be analyzed from the previous 1 month, whereas web accesses from the previous 3 days, and the user's content created within the last year. It may be desirable that GPS location information is used from only today or other time period.
  • the parameters can be manipulated automatically to create subsets (e.g., via an optimization process that varies parameters and tests response from user or system) or users can vary one or more of these parameters via a user interface, wherein such settings can be a function of the nature of the query, the time of day, day of week, or other contextual or activity-based observations.
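The per-source time windows in the example above can be expressed as a small parameter table. The windows follow the figures given in the text, with "1 month" approximated as 30 days and "today" as the last 24 hours; the event tuple layout and the `select_corpus` name are assumptions.

```python
from datetime import datetime, timedelta

# Look-back window per source; values follow the example in the text.
WINDOWS = {
    "email": timedelta(days=30),     # "previous 1 month", approximated
    "web": timedelta(days=3),
    "created": timedelta(days=365),
    "gps": timedelta(days=1),        # "only today", approximated
}


def select_corpus(events, now, windows=WINDOWS):
    """Keep (source, timestamp, payload) events inside their source's window."""
    return [
        e for e in events
        if e[0] in windows and now - e[1] <= windows[e[0]]
    ]


now = datetime(2006, 6, 30)
events = [
    ("email", now - timedelta(days=10), "budget thread"),
    ("email", now - timedelta(days=45), "old thread"),
    ("web", now - timedelta(days=2), "news page"),
    ("web", now - timedelta(days=5), "stale page"),
    ("gps", now - timedelta(hours=3), "cafe"),
]
subset = select_corpus(events, now)
```

Because the windows are plain data, they could be adjusted by a user interface or by an automated optimization process, as the text suggests.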
  • Models can be derived for individuals or groups of individuals at 570 such as via collaborative filtering techniques that develop profiles by the analysis of similarities among individuals or groups of individuals. Similarity computations can be based on the content and/or usage of items. It is noted that modeling infrastructure and associated processing can reside on client, multiple clients, one or more servers, or combinations of servers and clients.
  • At 580, machine learning techniques can be applied to learn user characteristics and interests over time as well as how and when data is interacted with by users.
  • the learning models can include substantially any type of system such as statistical/mathematical models and processes for modeling users and determining preferences and interests including the use of Bayesian learning, which can generate Bayesian dependency models, such as Bayesian networks, naïve Bayesian classifiers, and/or other statistical classification methodology, including Support Vector Machines (SVMs), for example.
  • Other types of models or systems can include neural networks and Hidden Markov Models, for example.
  • Although elaborate reasoning models can be employed, it is to be appreciated that other approaches can also be utilized. For example, rather than a more thorough probabilistic approach, deterministic assumptions can also be employed (e.g., no recent searching for Z amount of time of a particular web site may imply by rule that the user is no longer interested in the respective information).
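The deterministic rule in the example above reduces to a single comparison. The function name is hypothetical, and the 30-day default stands in for the unspecified "Z amount of time".

```python
from datetime import datetime, timedelta


def still_interested(last_search, now, z=timedelta(days=30)):
    """Rule: no search of the site within `z` implies interest has lapsed.

    `z` stands in for the unspecified "Z amount of time"; 30 days is an
    arbitrary default chosen for the sketch.
    """
    return last_search is not None and now - last_search <= z


now = datetime(2006, 6, 30)
recent = still_interested(now - timedelta(days=5), now)
lapsed = still_interested(now - timedelta(days=90), now)
```

A rule like this is cheap and transparent, at the cost of ignoring the uncertainty a probabilistic model would capture.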
  • logical decisions can also be made regarding the status, location, context, interests,
  • the learning models can be trained from a user event data store (not shown) that collects or aggregates data from a plurality of different data sources.
  • Such sources can include various data acquisition components that record or log user event data (e.g., cell phone, acoustical activity recorded by microphone, Global Positioning System (GPS), electronic calendar, vision monitoring equipment, desktop activity, web site interaction and so forth).
  • the systems can be implemented in substantially any manner that supports personalized query and results processing.
  • the system could be implemented as a server, a server farm, within client application(s), or more generalized to include a web service(s) or other automated application(s) that interact with search functions such as user interfaces and search engines.
  • collaborative filter techniques applied at 570 of the user model 500 are described in more detail. These techniques can include employment of collaborative filters to analyze data and determine profiles for the user.
  • Collaborative filtering systems generally use a centralized database about user preferences to predict additional topics users may desire.
  • Collaborative filtering is applied with the user model 500 to process previous user activities from a group of users that may indicate preferences for a given user that predict likely or possible profiles for new users of a system.
  • Several algorithms including techniques based on correlation coefficients, vector-based similarity calculations, and statistical Bayesian methods can be employed.
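A correlation-based collaborative filter can be sketched as follows: the similarity between two users is the Pearson correlation over items both have scored, and a prediction is the similarity-weighted average of other users' scores. This is one simple memory-based variant, offered as an illustration rather than as the method the application uses; the user names and activity scores are invented.

```python
import math


def pearson(u, v):
    """Correlation coefficient over the items both users have scored."""
    shared = sorted(set(u) & set(v))
    if len(shared) < 2:
        return 0.0
    mu = sum(u[i] for i in shared) / len(shared)
    mv = sum(v[i] for i in shared) / len(shared)
    num = sum((u[i] - mu) * (v[i] - mv) for i in shared)
    den = math.sqrt(sum((u[i] - mu) ** 2 for i in shared)) * math.sqrt(
        sum((v[i] - mv) ** 2 for i in shared)
    )
    return num / den if den else 0.0


def predict(target, others, item):
    """Similarity-weighted average of positively correlated users' scores."""
    pairs = [(pearson(target, o), o[item]) for o in others if item in o]
    pairs = [(s, r) for s, r in pairs if s > 0]
    total = sum(s for s, _ in pairs)
    return sum(s * r for s, r in pairs) / total if total else None


# Invented activity scores per document (higher means more interaction).
alice = {"doc1": 5, "doc2": 1, "doc3": 4}
bob = {"doc1": 4, "doc2": 2, "doc3": 5, "doc4": 5}
carol = {"doc1": 1, "doc2": 5, "doc3": 2, "doc4": 1}
guess = predict(alice, [bob, carol], "doc4")
```

Here `alice` correlates positively with `bob` and negatively with `carol`, so the prediction for `doc4` follows `bob`'s score.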
  • a system 600 illustrates access-based information retrieval in accordance with user activity or interaction data.
  • the system 600 includes one or more information sources 610 that are accessed or considered by a user. These sources 610 can be similar or disparate in nature having dissimilar information content, whereby some of the information sources may represent local data locations such as files, folders, applications, images, audio files, appointments, email, and so forth, and other sources 610 may represent remote sources such as web information, for example.
  • a usage analyzer 614 passes this information to a content analyzer 620 (or monitor), which can operate as a background task locally on a client machine and/or remotely in accordance with a server, and which processes the accessed data via a filter 624 for parsing content from data and an automated indexer 630 that creates a content index 640 (or activity tags) of accessed data items.
  • the analyzer 620 creates representations of accessed data in the index
  • the content analyzer 620 may create a thumbnail representation of the web page and associate a hyperlink reference to the page and thumbnail as part of a metadata file.
  • the representation could be further tailored to reflect detailed patterns of user interaction with the page.
  • the analyzer 620 may extract the text or portions thereof, and associate a database link such as a file path as part of metadata.
  • the indexer 630 would then automatically create an index (or add to an existing index) having two items in the content index 640 - the thumbnail representation and text document representation including metadata.
  • filters analyze the content of and metadata associated with items.
  • the filter 624 extracts metadata such as filename, title, author, keywords, creation date, and so forth along with the words in the document. This is what is used to build the index 640.
  • the creation of thumbnails and the analysis of images could also be encapsulated in the filter 624, if desired.
  • Activity or interaction metadata can be employed that may contain other items such as user and/or implicit tags that describe the items stored in the content index 640.
  • the indexer 630 may also perform filter 624 functions (e.g., indexer associates metadata with filtered content).
  • a search component 650 is provided that receives a user query 654 for information items contained in the content index 640.
  • the search component 650 can be provided as part of a user interface that returns links and/or representations of accessed items at 660 to the user in response to the query 654.
  • the user may query for "items relating to last year's performance review," wherein the search component 650 extracts items from the index 640 such as emails, coworker evaluations, documents published in the last year, web page images, audio recordings and so forth relating to the context of the query 654.
  • an implicit query may be derived from the query 654 (e.g., whenever I get a phone call from this person, pull up the last five e-mails from this person).
  • accessed items can be presented in a plurality of differing formats designed to facilitate efficient and timely retrieval of information items that have been previously accessed.
  • the links and/or representations 660 may include other items of interest to the user such as providing information items that the user may want to see other than those items previously accessed (e.g., the system provides links to other content of interest based upon or inferred from the query at hand, such as, in addition to showing performance review items, optionally providing links to human resources describing review policies based on another index of content even though these items may or may not have been previously accessed by the user).
  • an event component can be provided (not shown) (e.g., background task that monitors user activities associated with usage analyzer 614).
  • the event component monitors user activities such as saving, reading, editing, copying, hovering on information, selecting information, manipulating information and/or deleting files, for example, and makes determinations with respect to user actions.
  • This can include sensors such as microphones, cameras, and other devices along with monitoring desktop activities to determine user actions or goals.
  • probabilistic models and/or logical decisions can be applied to determine events such as when a user has observed or contemplated information.
  • Logical and/or statistical models e.g., Bayesian dependency models, decision trees, Support Vector Machines
  • Fig. 7 illustrates various retrieval service applications 700.
  • explicit queries 710 and/or implicit queries 714 can be supported.
  • Explicit queries 710 are directed by the user to find information of interest (e.g., show all data references relating to a meeting or date).
  • Implicit queries 714 can in some cases be derived from the explicit query 710. For example, a user could have their desktop phone messages linked to their e-mail system or other message system. If a phone call were to arrive from selected individuals, the e-mail system could automatically retrieve e-mail relating to the individual via implicit query 714.
  • implicit queries 714 may be generated based upon reasoning processes associated with the user's current context or query (e.g., a query composed of important words in recently read paragraphs).
  • queries include providing additional selection options to edit or refine searches.
  • queries may be directed to a particular type of application or location (e.g., apply this query to mail folder only).
  • the context of an application can be considered when performing a query. For example, if a photo application is being used, then the query can be refined to only search for images.
  • item-centric integrations can be performed. This includes operating system actions that support interface actions such as mouse click functions, tagging items, updating metadata files, deleting items, editing items or content, and so forth.
  • file sharing can be performed.
  • the user may specify that one or more other users can inspect or have access to all or a subset of their query/index database (e.g., all users on my project team are permitted access to my project notes).
  • index scrubbing can occur. Over time, users may desire to remove one or more items from their index. In accordance with this activity, users can specify specific items to remove or specify general topic areas that can be automatically scrubbed by the system (e.g., remove thumbnails related to my birthday two years ago). Other actions could occur based upon logical or reasoning processes such as if an item were accessed fewer than a certain number of times in a predetermined period, then the item could be automatically removed if desired.
  • At 740, effective time computations are considered.
  • the date that's relevant or useful concerning a file is the date it was changed
  • the date for presenting mail is usually the date it was delivered (and thus approximately when the user saw it)
  • the useful date for an appointment is the date the appointment occurs.
  • all time information can be recorded and indexed, and the useful date information can be utilized for presentation of information.
  • various tasks can occur such as indexing the time mail was sent, the time it was updated (if that happened), the time the user accepted/declined, and the time the meeting occurred, for example.
  • typically one time is selected for display although more than one time can be provided.
  • certain data can be marked as having been previously observed by analyzing file elements associated with a file type.
  • a text document may contain a field indicating when a file was open or last edited.
  • for calendar appointments, merely creating an index from when the calendar entry was created is likely to be of minor benefit to people because sometimes meetings are created well in advance of the actual meeting date.
  • the actual meeting date as opposed to time of creation can be tracked.
  • This type of effective time consideration enables users to retrieve information in a manner more suited to memory recall.
  • the volatility of data is considered and processed. This type of processing involves indexing of data into a persistent form during intermittent operations. As can be appreciated, various automated background operations are possible.
  • an exemplary environment 810 for implementing various aspects described herein includes a computer 812.
  • the computer 812 includes a processing unit 814, a system memory 816, and a system bus 818.
  • the system bus 818 couples system components including, but not limited to, the system memory 816 to the processing unit 814.
  • the processing unit 814 can be any of various available processors. Dual microprocessors and other multiprocessor architectures also can be employed as the processing unit 814.
  • the system bus 818 can be any of several types of bus structure(s) including the memory bus or memory controller, a peripheral bus or external bus, and/or a local bus using any variety of available bus architectures including, but not limited to, 11-bit bus, Industrial Standard Architecture (ISA), Micro-Channel Architecture (MSA), Extended ISA (EISA), Intelligent Drive Electronics (IDE), VESA Local Bus (VLB), Peripheral Component Interconnect (PCI), Universal Serial Bus (USB), Advanced Graphics Port (AGP), Personal Computer Memory Card International Association bus (PCMCIA), and Small Computer Systems Interface (SCSI).
  • the system memory 816 includes volatile memory 820 and nonvolatile memory
  • The basic input/output system (BIOS), containing the basic routines to transfer information between elements within the computer 812, such as during start-up, is stored in nonvolatile memory 822.
  • nonvolatile memory 822 can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM), or flash memory.
  • Volatile memory 820 includes random access memory (RAM), which acts as external cache memory.
  • RAM is available in many forms such as synchronous RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).
  • Computer 812 also includes removable/non-removable, volatile/non-volatile computer storage media.
  • Fig. 8 illustrates, for example, a disk storage 824.
  • Disk storage 824 includes, but is not limited to, devices like a magnetic disk drive, floppy disk drive, tape drive, Jaz drive, Zip drive, LS-100 drive, flash memory card, or memory stick.
  • disk storage 824 can include storage media separately or in combination with other storage media including, but not limited to, an optical disk drive such as a compact disk ROM device (CD-ROM), CD recordable drive (CD-R Drive), CD rewritable drive (CD-RW Drive) or a digital versatile disk ROM drive (DVD-ROM).
  • a removable or non-removable interface
  • Fig. 8 describes software that acts as an intermediary between users and the basic computer resources described in suitable operating environment 810.
  • Such software includes an operating system 828.
  • Operating system 828 which can be stored on disk storage 824, acts to control and allocate resources of the computer system 812.
  • System applications 830 take advantage of the management of resources by operating system 828 through program modules 832 and program data 834 stored either in system memory 816 or on disk storage 824. It is to be appreciated that various components described herein can be implemented with various operating systems or combinations of operating systems.
  • a user enters commands or information into the computer 812 through input device(s) 836.
  • Input devices 836 include, but are not limited to, a pointing device such as a mouse, trackball, stylus, touch pad, keyboard, microphone, joystick, game pad, satellite dish, scanner, TV tuner card, digital camera, digital video camera, web camera, and the like. These and other input devices connect to the processing unit 814 through the system bus 818 via interface port(s) 838.
  • Interface port(s) 838 include, for example, a serial port, a parallel port, a game port, and a universal serial bus (USB).
  • Output device(s) 840 use some of the same type of ports as input device(s) 836.
  • a USB port may be used to provide input to computer 812, and to output information from computer 812 to an output device 840.
  • Output adapter 842 is provided to illustrate that there are some output devices 840 like monitors, speakers, and printers, among other output devices 840, that require special adapters.
  • the output adapters 842 include, by way of illustration and not limitation, video and sound cards that provide a means of connection between the output device 840 and the system bus 818. It should be noted that other devices and/or systems of devices provide both input and output capabilities such as remote computer(s) 844.
  • Computer 812 can operate in a networked environment using logical connections to one or more remote computers, such as remote computer(s) 844.
  • the remote computer(s) 844 can be a personal computer, a server, a router, a network PC, a workstation, a microprocessor based appliance, a peer device or other common network node and the like, and typically includes many or all of the elements described relative to computer 812. For purposes of brevity, only a memory storage device 846 is illustrated with remote computer(s) 844.
  • Remote computer(s) 844 is logically connected to computer 812 through a network interface 848 and then physically connected via communication connection 850.
  • Network interface 848 encompasses communication networks such as local-area networks (LAN) and wide-area networks (WAN).
  • LAN technologies include Fiber Distributed Data Interface (FDDI), Copper Distributed Data Interface (CDDI), Ethernet/IEEE 802.3, Token Ring/IEEE 802.5 and the like.
  • WAN technologies include, but are not limited to, point-to-point links, circuit switching networks like Integrated Services Digital Networks (ISDN) and variations thereon, packet switching networks, and Digital Subscriber Lines (DSL).
  • Communication connection(s) 850 refers to the hardware/software employed to connect the network interface 848 to the bus 818. While communication connection 850 is shown for illustrative clarity inside computer 812, it can also be external to computer 812.
  • the hardware/software necessary for connection to the network interface 848 includes, for exemplary purposes only, internal and external technologies such as, modems including regular telephone grade modems, cable modems and DSL modems, ISDN adapters, and Ethernet cards.
  • Fig. 9 is a schematic block diagram of a sample-computing environment 900 that can be employed.
  • the system 900 includes one or more client(s) 910.
  • the client(s) 910 can be hardware and/or software (e.g., threads, processes, computing devices).
  • the system 900 also includes one or more server(s) 930.
  • the server(s) 930 can also be hardware and/or software (e.g., threads, processes, computing devices).
  • the servers 930 can house threads to perform transformations by employing the components described herein, for example.
  • One possible communication between a client 910 and a server 930 may be in the form of a data packet adapted to be transmitted between two or more computer processes.
  • the system 900 includes a communication framework 950 that can be employed to facilitate communications between the client(s) 910 and the server(s) 930.
  • the client(s) 910 are operably connected to one or more client data store(s) 960 that can be employed to store information local to the client(s) 910.
  • the server(s) 930 are operably connected to one or more server data store(s) 940 that can be employed to store information local to the servers 930.
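
The effective time items in the list above (a file is dated by its last change, mail by its delivery, an appointment by when it occurs) can be illustrated with a minimal sketch. The mapping, the field names, and the item layout are assumptions made for illustration; the source does not define a data format.

```python
from datetime import datetime

# Hypothetical mapping from item type to the timestamp treated as its
# "effective time"; the field names are illustrative only.
EFFECTIVE_TIME_FIELD = {
    "file": "modified",        # date the file was changed
    "mail": "delivered",       # date the mail was delivered
    "appointment": "occurs",   # date the appointment takes place
}

def effective_time(item):
    """Pick the single timestamp used for display from all indexed times."""
    field = EFFECTIVE_TIME_FIELD[item["type"]]
    return item["times"][field]

# All times stay in the index; only one is selected for presentation.
meeting = {
    "type": "appointment",
    "times": {
        "created": datetime(2006, 1, 5),
        "accepted": datetime(2006, 1, 6),
        "occurs": datetime(2006, 3, 10),
    },
}
shown = effective_time(meeting)
```

Here the meeting is presented under its occurrence date rather than its creation date, which matches the point that meetings are often created well in advance.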

Abstract

Various components and processes are provided to enable data processing on multiple data types where aspects of the history of user activity, attention, interest, location, or other interaction with data are determined and used to enhance information storage and access. In one aspect, a data manipulation system is provided. The system includes one or more data items that are associated with one or more tags and indicate at least one user's interaction or activity with the data. A manipulation tool processes the data items to determine a subset of data based at least in part on the user's interaction with the data items. Methods are described for using the manipulation tool to weight terms in an index, to compress indexes, to influence the rank of items returned in a search, to generate additional queries for data items either automatically or with user direction, or for improved presentation of data items.

Description

Title: SENSING, STORING, INDEXING, AND RETRIEVING DATA
LEVERAGING MEASURES OF USER ACTIVITY, ATTENTION, AND INTEREST
BACKGROUND
[0001] Computer platforms provide many tools for storing and processing large and varying types of data sets. These can include word processing tools, data presentation tools, computer-aided graphics tools, electronic mail handling tools, calendar and scheduling tools, and numerous database manipulation tools. Given the various usages for data on the platform, applications have developed over time that are somewhat content centric. In other words, when data has been stored in the computer's database, the data is subsequently retrieved and/or manipulated in some manner based on the actual content of the stored data. In one specific example, an e-mail inbox can be searched for previously received e-mails based on a keyword that links a search tool to respective e-mails that are associated with the term, where the term is linked to the actual contents of stored e-mails. Thus, if a user were to search for the keyword "John," any e-mail associated with this keyword would be retrieved and presented to the user, whereby the user would subsequently sift through the retrieved list for the desired e-mail associated with the term "John." Although the specific e-mail the user is searching for may be retrieved in the resulting list of mail, a large number of e-mails may have to be subsequently searched in order to find the desired e-mail (e.g., thirty e-mails contain the term John). As can be appreciated, e-mail processing described in the above example can be extended to include many types of data processing and file manipulation activities. For instance, these can include indexing of stored data, presentation of stored data, searching for various types of stored data, ranking data, and so forth.
[0002] Relating to content-centric applications in general, one common view of a "finished" document that is to be retrieved, viewed, and employed by a reader is generally not sufficient to adequately support knowledge-intensive tasks. Thus, users or groups of users should also be able to add their own information to a knowledge source. In one example, a historian may want to add a detailed analysis to a chapter of a book. Another user may want to annotate a section of the book with experiences gathered from the analysis.
[0003] While practically all documents are available on or through the Web, its hypertext capabilities are currently not used as extensively to directly modify and annotate existing information (e.g., books, papers, web pages, and so forth). Rather, when content is deemed "completed" it is stored in some type of archive (e.g., a digital library), from which it is eventually retrieved as a monolithic entity, used for the production of yet more content. Moreover, the task of information retrieval is typically not integrated with the task of content development. Thus, the user has to retrieve documents they believe are required for a task and then base content development on the information found. While a new document search can always be initiated manually, it is a much more compelling view that content development and retrieval should be integrated. A system that continually scans and analyzes new text entered by a user should be able to search additional relevant information and present this to the user, who may then inspect the new data, integrate it, add cross-references, or reject the proposed sources, for example.
[0004] Another aspect is that knowledge from a source generally cannot be applied without a description of the context of both a document's creator and its reader. Only an explicit representation of the two context frames allows for a (semi-automatic) translation between them; in the above examples, old knowledge can be adapted to modern standards and vocabulary, but similar problems may increasingly appear in the medium and long-term future, when all documents that are currently created and stored in digital form become "historic knowledge" themselves.
[0005] Currently, users obtain documents through some type of indexing and ranking systems: web search engines for plain web pages, or some type of information retrieval systems for digital libraries (historically, these systems come from different roots, but modern implementations exhibit some overlap between these techniques). In either case, the systems usually return complete documents, be it web pages, papers, or whole books. This is one of the primary reasons behind the feeling of "information overload" shared by many users with a virtually endless source of information to process.
SUMMARY
[0006] The following presents a simplified summary in order to provide a basic understanding of some aspects described herein. This summary is not an extensive overview nor is it intended to identify key/critical elements or to delineate the scope of the various aspects described herein. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
[0007] In contrast to pure content-centric data processing applications, metadata tags associated with files or applications can be employed to facilitate effective storage of and/or access to information. User activities or interactions with data such as associated with respective files or applications represent an especially interesting and effective type of metadata and are the focus of many applications. As users process data over time, their activity with the data can be monitored and weighted according to the type and intensity of the activity. For instance, if a user heavily interacts with a particular file by adding and removing text from the file on a frequent basis, a score or weight can be assigned to the file in metadata or other format to indicate such activity.
[0008] In another instance, if a file is rarely interacted with (e.g., opened one time within a year), this relative inaction with the file can cause a lower weight to be assigned - indicating possibly the lower importance of the file to the user. As can be appreciated, scores or weights can be assigned over a broad range of file usage activities and for a plurality of differing activities such as creating, opening, viewing, scrolling, editing, printing, annotating, saving, forwarding, and so forth. The activity weights or patterns can then be associated with data items (e.g., tagged to a column in a database), subsections of items, or groups of items. The activity weight can then later be employed with a data manipulation tool such as a search utility for example, to refine a larger set of data items into a smaller or more manageable set of items. For example, instead of merely searching a set of data items for a content-centric keyword, searching for information can be augmented via activity enhanced clues to more efficiently retrieve desired data of interest (e.g., find all files that were forwarded to a particular user, find a subset of presentations that have been most heavily utilized for other applications, determine the paragraph that was last edited, and so forth).
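As a rough illustration of weighting activity by type and intensity, the following sketch sums a per-activity weight over a log of events. The numeric weights and the event format are assumptions chosen for the example; the description names the activity types but does not prescribe any values.

```python
from collections import defaultdict

# Hypothetical per-activity weights (not taken from the source).
ACTIVITY_WEIGHTS = {
    "open": 1.0,
    "view": 1.0,
    "scroll": 0.5,
    "edit": 3.0,
    "print": 2.0,
    "annotate": 3.0,
    "save": 1.5,
    "forward": 2.5,
}

def activity_scores(events):
    """Sum weighted activity per item.

    `events` is an iterable of (item_id, activity) pairs; activities that
    are not in the weight table contribute nothing.
    """
    scores = defaultdict(float)
    for item_id, activity in events:
        scores[item_id] += ACTIVITY_WEIGHTS.get(activity, 0.0)
    return dict(scores)

events = [
    ("report.doc", "open"),
    ("report.doc", "edit"),
    ("report.doc", "edit"),
    ("report.doc", "save"),
    ("old_notes.txt", "open"),
]
scores = activity_scores(events)
```

A heavily edited file ends up with a higher score than one that was only opened once, which is the ordering the paragraph above describes.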
[0009] To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings. These aspects are indicative of various ways which can be practiced, all of which are intended to be covered herein. Other advantages and novel features may become apparent from the following detailed description when considered in conjunction with the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] Fig. 1 is a schematic block diagram illustrating a data processing system that employs user activity or interaction data.
[0011] Fig. 2 illustrates an exemplary user interface that utilizes user activity or interaction data.
[0012] Fig. 3 illustrates a flow diagram illustrating a user activity determination and process.
[0013] Fig. 4 illustrates an example system of an information retrieval architecture that can be employed with user activity data processing.
[0014] Fig. 5 illustrates an example user model that can be employed with user activity or interaction data.
[0015] Fig. 6 is a system diagram that illustrates access-based information retrieval in accordance with user activity or interaction data.
[0016] Fig. 7 illustrates retrieval service applications that can be employed with user activity or interaction data.
[0017] Fig. 8 is a schematic block diagram illustrating a suitable operating environment.
[0018] Fig. 9 is a schematic block diagram of a sample-computing environment.
DETAILED DESCRIPTION
[0019] Various components and processes are provided to enable data processing on multiple data types where user activity or interaction with data is determined and employed to further process the data in accordance with the activity. For example, the activity or interaction can be monitored and subsequently tagged to a data item (e.g., activity of file interactions assigned a weight and applied to a column in a database) to later be employed for searching, indexing, cataloging, ranking, or viewing of various data items (or item subsets) residing in a database. In one particular aspect, a data manipulation system is provided. The system includes one or more data items that are associated with one or more tags and indicate at least one user's interaction with the data items. A manipulation tool (e.g., search tool) processes the data items to determine a subset of data items based at least in part on the user's interaction with the data items.
[0020] As used in this application, the terms "component," "system," "tag," "monitor," "model," "query," and the like are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a server and the server can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers. Also, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate via local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems via the signal).
[0021] As used herein, the term "inference" refers generally to the process of reasoning about or inferring states of the system, environment, and/or user from a set of observations as captured via events and/or data. Inference can be employed to identify a specific context or action, or can generate a probability distribution over states, for example. The inference can be probabilistic - that is, the computation of a probability distribution over states of interest based on a consideration of data and events. Inference can also refer to techniques employed for composing higher-level events from a set of events and/or data. Such inference results in the construction of new events or actions from a set of observed events and/or stored event data, whether or not the events are correlated in close temporal proximity, and whether the events and data come from one or several event and data sources. Furthermore, inference can be based upon logical models or rules, whereby relationships between components or data are determined by an analysis of the data and drawing conclusions therefrom. For instance, by observing that one user interacts with a subset of other users over a network, it may be determined or inferred that this subset of users belongs to a desired social network of interest for the one user as opposed to a plurality of other users who are never or rarely interacted with.
[0022] Referring initially to Fig. 1, a system 100 illustrates a data processing architecture that employs user activity or interaction data to perform various computer-related tasks. A monitor component 110 observes data interactions over time with one or more databases 120 that store one or more data items 130. Monitoring can occur via a background and/or foreground component (not shown) and can be employed to determine when the data items 130 are interacted with by users. For instance, this can include observing when files are opened or closed, edited, added to or deleted from, read or written to, cut, pasted, last edited, forwarded, replied to, sent to, last viewed, the amount of time viewed, time interacted with over a time horizon, attentional annotations indicating how long the item 130 or different subcomponents of the item was the focus of attention, and so forth.
[0023] When an application or file is open and a particular data item 130 is being acted upon from within an application, various techniques can be employed to determine activities within the application. These can include monitoring how long a user dwells on a particular set or subset of data, what data has been modified or observed, how often and over what time span the data has been operated upon and so forth. A tag component 140 assigns weights or scores according to monitored data activities. These can be probabilistically assigned, if desired (or other weighting classification), and can reflect the amount or pattern of the user's interaction with a given piece of data or application. For instance, minimal file usage may generate a lower weight than an extensive editing of the file. When the weights or scores have been determined, the determined weight information for activity is associated or tagged to one or more data items and is illustrated as activity data 150. This association can occur within the database 120, for example by creating or modifying values in a column or columns of the database 120 to indicate the weighting or importance of a particular item identified in a row of the database 120 for the metadata tag of interest. It is to be appreciated that the columns are logical entities and could either be stored explicitly or computed dynamically at the time of usage. Other types of associations could include a metadata reference that is directly or indirectly assigned to one or more data items 130.
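One way to picture tagging a weight to a column of the database is the sketch below, which uses Python's standard `sqlite3` module with an in-memory database. The table name, column names, and sample rows are hypothetical; the description only says that a weight can be stored in a column for the row identifying an item.

```python
import sqlite3

# In-memory database standing in for the database 120; the schema is an
# assumption made for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_items ("
    " item_id TEXT PRIMARY KEY,"
    " content TEXT,"
    " activity_weight REAL DEFAULT 0.0)"
)
conn.executemany(
    "INSERT INTO data_items (item_id, content) VALUES (?, ?)",
    [
        ("budget.xls", "Q3 budget figures"),
        ("memo.txt", "note about the budget meeting"),
        ("photo.jpg", "holiday photo"),
    ],
)

def tag_item(connection, item_id, weight):
    """Store an activity weight in the column for the row of one item."""
    connection.execute(
        "UPDATE data_items SET activity_weight = ? WHERE item_id = ?",
        (weight, item_id),
    )

tag_item(conn, "budget.xls", 0.9)   # heavily edited
tag_item(conn, "memo.txt", 0.1)     # opened once

# A content query whose results are ordered by the activity tag.
rows = conn.execute(
    "SELECT item_id FROM data_items WHERE content LIKE ? "
    "ORDER BY activity_weight DESC",
    ("%budget%",),
).fetchall()
ranked = [row[0] for row in rows]
```

Storing the weight explicitly is only one option; as the paragraph notes, the same logical column could instead be computed at query time from a log of interactions.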
[0024] The tags and data items 130 can be applied with one or more data manipulation tools 160, which can employ user activity information derived from the tags to augment information storage (e.g., efficient index creation), information access (e.g., searching, filtering or ranking of items), information presentation (e.g., to organize, arrange or present items), and so forth. At 170, results from the data manipulation tool 160 are automatically generated and can include a reduced subset of data items from a larger set of the data items 130. It is noted that, as applied herein, the term subset can include all or a portion of the data items 130. Also, activity thresholds can be set up within the tools 160 to include more or fewer of the data items 130 in the results 170.
[0025] In other aspects, content-based scores can be assigned for documents. For example, a score can be assigned based on the similarity of a user's query to the content of the document. Thus, when ranking, more weight can be given to terms that appear in documents or sections of a document that the user has edited or has spent a long time reading, for example. In yet another aspect of the system 100, activity data 150 can be employed to identify regions or terms of interest. Thus, differential weighting of document regions can be utilized to:
[0026] 1) Compress an index to preferentially include terms in regions of interest;
2) Differentially weight terms in regions of interest for ranking;
3) Differentially weight terms in regions of interest for relevance feedback;
4) Automatically or semi-automatically generate queries based on regions of current user focus; and/or
5) Differentially present (via highlighting or other techniques) items or regions of items of interest.
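Item 2) in the list above, differential weighting of terms for ranking, can be sketched as follows. The scoring rule (one point per query-term match, boosted by the attention recorded for the region containing the match) and the sample documents are assumptions for illustration only.

```python
def weighted_term_score(query_terms, regions):
    """Score a document given as a list of (region_text, attention_weight) pairs.

    Each query-term occurrence counts 1.0, plus the attention weight of the
    region it appears in (a hypothetical boosting rule).
    """
    wanted = {t.lower() for t in query_terms}
    score = 0.0
    for text, attention in regions:
        for token in text.lower().split():
            if token in wanted:
                score += 1.0 + attention
    return score


# Two documents containing the same query term in regions with different attention.
doc_a = [("budget forecast for next year", 2.0), ("appendix tables", 0.0)]
doc_b = [("meeting notes", 2.0), ("budget forecast draft", 0.0)]

ranked = sorted(
    [("a", doc_a), ("b", doc_b)],
    key=lambda d: weighted_term_score(["budget"], d[1]),
    reverse=True,
)
```

Both documents mention the query term once, but document "a" ranks first because the term falls in a region the user attended to.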
[0027] The systems and methods described herein support a plurality of data processing applications. This includes processing data items such as documents, files, email messages, calendar appointments, web pages, sub-sections within the data items, or cross-item abstractions, for example. Tags applied to the data items can represent a location at which a user last accessed an item, or a history of the locations and times at which a user has accessed or interacted with an item. A tag can also represent a time a user last accessed an item, a total number of times that an item has been accessed, a frequency with which an item has been accessed within a period of time extending into the past, or a frequency with which an item has been accessed within one or more arbitrarily specified periods of time. Other components can be provided that encode higher-order statistics of frequency of access over time. In one case, a viewer allows a user to retrieve items based on functions of one or more tags, allows a user to sort or filter retrieved items based on functions of one or more tags, or alternatively presents retrieved items based on functions of one or more tags. In another case, tags can be probabilistic indications of activity or interest.
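The time-based tags described in this paragraph (last access, total accesses, and frequency within a specified period) can be derived from a simple access log. The sketch below assumes the log is a list of `datetime` values; the function name and the returned field names are hypothetical.

```python
from datetime import datetime, timedelta


def access_tags(access_times, now, window):
    """Derive last-access, total-count, and windowed-frequency tags from an access log."""
    recent = [t for t in access_times if now - window <= t <= now]
    window_days = window.total_seconds() / 86400.0
    return {
        "last_access": max(access_times) if access_times else None,
        "total_accesses": len(access_times),
        "accesses_in_window": len(recent),
        "per_day_in_window": len(recent) / window_days,
    }


now = datetime(2006, 6, 26, 12, 0)
log = [now - timedelta(days=d) for d in (1, 2, 10, 40)]
item_tags = access_tags(log, now, timedelta(days=7))
```

Calling the function with several different windows would give the "one or more arbitrarily specified periods of time" mentioned above.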
[0028] Various processes include analysis of user activities with data items. This includes automatically tagging a quantity or nature of interaction that data items have received from computer users and employing the tags to further process the data items in accordance with future data activities. The processes can include storing data within an attentional annotation associated with the data items in a separate database or within a data structure embedded in the data items. Also, indexing procedures can be provided that weight subcomponents of data items differently for retrieval depending on a status of annotations indicating attention or interaction with data items. This includes indexing procedures that overlook or delete information in data items depending on a status of annotations indicating attention or interaction with data items. An index can be compressed by removing components of data items that computer users have not attended to or interacted with, or have attended to or interacted with less than other components.
In another aspect, a ranking score can be utilized for data retrieval to give more weight to terms or objects that appear in sections of a data item that a user has attended to or has interacted with. This can include employing attentional annotations to automatically or semi-automatically generate queries based on regions that have been attended to or interacted with in the past and/or present. The attentional annotations can also be employed to provide differential access to items or differential display of items that have been attended to or interacted with in the past and/or present.
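The index compression described above can be sketched as an inverted index that simply skips sections whose recorded attention falls below a cutoff. The `min_attention` cutoff, the whitespace tokenizer, and the sample data are assumptions for illustration.

```python
def build_compressed_index(documents, min_attention=1.0):
    """Build term -> set of document ids, indexing only sufficiently attended sections.

    documents: {doc_id: [(section_text, attention), ...]}
    """
    index = {}
    for doc_id, sections in documents.items():
        for text, attention in sections:
            if attention < min_attention:
                continue  # leave little-attended sections out of the index
            for token in set(text.lower().split()):
                index.setdefault(token, set()).add(doc_id)
    return index


docs = {
    "d1": [("quarterly revenue summary", 5.0), ("legal boilerplate", 0.0)],
    "d2": [("revenue appendix", 0.2)],
}
index = build_compressed_index(docs)
```

Only terms from the attended section of "d1" survive, so the index is smaller at the cost of not finding items through sections the user never looked at.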
[0029] It is noted that attentional annotations can be encoded not only as attention to data items themselves but also as attention to subcomponents of the items, where the annotation captures pointers or other indications of each of the subcomponents and the attention that each has received. For example, consider a large document such as a 211-page document. The document may have been opened, interacted with, and attended to 23 times, for example, which is captured as one type of attentional annotation for the document. However, other attentional annotations indicate that the user has repeatedly examined pages 4-6, 89-93, 123-124, and 198, for example, and just skimmed quickly over other pages of the document. Thus, each subcomponent can be listed, and the amount of attention that each portion of text has received can be encoded in the annotation.
[0030] Fig. 2 illustrates an exemplary user interface 200 that utilizes user activity or interaction data. In this example, a manipulation tool 210 (e.g., a user interface applied to a database) can be associated with an output or display 220. The tool 210 can include many features for processing data from one or more databases. For example, the tool 210 may include selections for enabling data searches, indexing or cataloging of data, ranking of data, and so forth. Such data can include textual data such as XML data or ASCII data, for example. Other data includes image data, audio data, video data, graphics data, and/or presentation data such as contained in a series of slides, for example. Substantially any data type or application can be employed, including spreadsheets, Uniform Resource Locator (URL) information, Internet or Web data, and so forth. As noted above with respect to Fig. 1, such data can be tagged, such as in a column or as file metadata, to indicate a score or a weight that is indicative of past usage or interactions. The manipulation tool can then search, retrieve, or process the tagged data to refine or determine more manageable subsets of data for users.
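One way to hold the tagged data that such a tool processes, including the page-level attention from the 211-page document example, is a small annotation record per item. The class and method names below are hypothetical, and the visit threshold is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class AttentionalAnnotation:
    """Item-level and subcomponent-level attention for one document (illustrative)."""
    open_count: int = 0
    page_visits: dict = field(default_factory=dict)  # page number -> visit count

    def record_open(self, pages_examined):
        """Record one session: the document was opened and these pages were examined."""
        self.open_count += 1
        for page in pages_examined:
            self.page_visits[page] = self.page_visits.get(page, 0) + 1

    def pages_of_interest(self, min_visits=2):
        """Pages examined at least min_visits times, in page order."""
        return sorted(p for p, n in self.page_visits.items() if n >= min_visits)


annotation = AttentionalAnnotation()
annotation.record_open([4, 5, 6, 198])
annotation.record_open([4, 5, 6, 89])
annotation.record_open([198])
```

Such a record could be stored in a separate database or embedded in the item's metadata, and a tool could use `pages_of_interest()` to decide which regions to highlight or index.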
[0031] The output 220 from the tool 210 can be a file or an actual user interface display. For instance, if the tool were employed as a search engine within a database, the output could be a display of returned results. The returned information can be more global in nature as illustrated at 230. This may include highlighting or applying graphics to a file to indicate that one file or grouping of files has been selected because of increased activity by the user. In an e-mail search tool, for example, the tool 210 may be applied to search for all files that have the keyword "computer" and have had at least one graphical image associated with the file in the past month. Searches can be crafted in a plurality of ways and can include content searching, activity-based searching, and/or combinations thereof. For instance, in this example, three e-mails out of a set of ten e-mails may be highlighted in one color as having a higher activity score than the other seven e-mails, which are delineated in a different color. In another aspect at 240, information within a returned file or data set can be highlighted or annotated to indicate usage activity (e.g., a paragraph within a file rendered in a different font format to highlight usage areas within the document).
[0032] Fig. 3 illustrates a process 300 for determining and applying user activity or interaction data. While, for purposes of simplicity of explanation, the methodologies are shown and described as a series or number of acts, it is to be understood and appreciated that the subject process is not limited by the order of acts, as some acts may, in accordance with the subject process, occur in different orders and/or concurrently with other acts from that shown and described herein. For example, those skilled in the art will understand and appreciate that a methodology could alternatively be represented as a series of interrelated states or events, such as in a state diagram. Moreover, not all illustrated acts may be required to implement a methodology in accordance with the subject process.
[0033] Proceeding to 310, data interactions with one or more local or remote databases are monitored. Such monitoring can occur in a background and/or foreground application and is employed to determine when data or files are interacted with by users. For instance, this can include observing when files are opened or closed, edited, added to or deleted from, read or written to, and so forth. When an application is open and particular data is being operated upon from within an application, various techniques can be employed to determine activities within the application. These can include monitoring how long a user dwells on a particular set or subset of data, what data has been modified or observed, and so forth. Outside monitoring can also be associated with the application to determine a user's interaction with the data, file or application. For instance, audio cues, automated facial recognition techniques, or explicit user instructions that a data set is highly relevant to the user may be employed.
[0034] At 320, weights for the monitored activities are determined. The weights can be probabilistically assigned and can reflect the amount of the user's interaction with a given piece of data or application. For instance, a very light perusal of a document may generate a lower weight than extensive editing of the document. At 330, the determined weight information for activity is associated or tagged to one or more data items. The association can occur within the confines of a database, for example by tagging a column or columns of the database to indicate the weighting or importance of a particular item identified in a row of the database. Other types of association could include a metadata reference that is directly or indirectly assigned to one or more data items. At 340, the tags and data items can be applied with a data processing application. This can include utilizing the activity information derived from the tags to augment searches for items, to index items, to arrange items, to rank items, to organize items, and so forth. At 350, results from the data processing application are generated. This can include explicit actions such as filtering a larger result set into a smaller subset, or more subtle actions such as annotating a display to highlight the files or data on the display that the user has interacted with more.
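The result-generation act at 350 can be sketched as a function that optionally filters items by an activity threshold and flags the most active ones for highlighting. The function name, the `highlight_top` parameter, and the sample scores are assumptions; how the activity tags were computed is outside this sketch.

```python
def generate_results(item_ids, activity_tags, threshold=None, highlight_top=3):
    """Order items by activity score, optionally filter, and flag the top few."""
    scored = [(activity_tags.get(i, 0.0), i) for i in item_ids]
    if threshold is not None:
        scored = [(s, i) for s, i in scored if s >= threshold]  # explicit filtering
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = {i for _, i in scored[:highlight_top]}                # subtle annotation
    return [{"item": i, "score": s, "highlight": i in top} for s, i in scored]


# Ten e-mails with hypothetical activity scores 0.0 through 9.0.
mail_scores = {f"mail{n}": float(n) for n in range(10)}
all_results = generate_results(list(mail_scores), mail_scores)
filtered = generate_results(list(mail_scores), mail_scores, threshold=5.0)
highlighted = [r["item"] for r in all_results if r["highlight"]]
```

With these scores, three of the ten e-mails are flagged for highlighting, and applying the threshold reduces the result set to a smaller subset.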
[0035] Referring to Fig. 4, an example system 400 illustrates an information retrieval architecture that can be employed with user activity data processing. The system 400 depicts a general diagram for personalizing search results; however, other forms of data manipulation can be performed as described above. A personalization component 410 includes a user model 420 based on user activity as well as processing components (e.g., retrieval algorithms modified in accordance with the user model) for using the model to influence search results by modifying a query 430 and/or modifying results 440 returned from a search. A user interface 450 generates the query 430 and receives modified or personalized results based upon a query modification 470 and/or results modification 460 provided by the personalization component 410. As utilized herein, the term "query modification" refers to both an alteration with respect to terms in the query 430 and alterations in an algorithm that matches the query 430 to documents in order to obtain the personalized results 440. Modified queries and/or results 440 are returned from one or more local and/or remote search engines 480. A global database 490 of user statistics may be maintained to facilitate updates to the user model 420. As can be appreciated, the user model 420 and/or global statistics 490 can be associated with user activity or interaction data as previously described to facilitate data manipulation or processing.
[0036] Generally, there are at least two approaches to adapting search results based on the user model 420. In one aspect, query modification processes an initial input query and modifies or regenerates the query (via the user model) to yield personalized results. Relevance feedback is a two-cycle variation of this process, wherein a query generates results that lead to a modified query (using explicit or implicit judgments about the initial result set), which yields results that are personalized to a short-term model based on the query and result set. Longer-term user models can also be used in the context of relevance feedback. Further, query modifications also refer to alterations made in the algorithm(s) employed to match the query to documents. In another aspect, results modification takes a user's input as-is to generate a query to yield results, which are then modified (via the user model) to generate personalized results. It is noted that modification of results usually includes some form of re-ranking and/or selection from a larger set of alternatives, which can include a consideration or weighting from determined activity data. Modification of results can also include various types of agglomeration and summarization of all or a subset of results.
[0037] Methods for modifying results include statistical similarity match (in which users' interests and content are represented as vectors and matched to items) and category matching (in which the users' interests and content are represented and matched to items using a smaller set of descriptors). The above processes of query modification or results modification can be combined, either independently or in an integrated process where dependencies are introduced among the two processes and leveraged.
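A statistical similarity match for results modification can be sketched with term-count vectors and cosine similarity. Cosine similarity is one common choice and is an assumption here, as are the whitespace tokenizer and the sample texts; the description does not name a specific similarity measure.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(results, user_profile_text):
    """Re-rank result snippets by similarity to a text summary of the user's interests."""
    profile = Counter(user_profile_text.lower().split())
    return sorted(
        results,
        key=lambda r: cosine(profile, Counter(r.lower().split())),
        reverse=True,
    )


results = ["jaguar car dealership prices", "jaguar habitat in the rainforest"]
reranked = rerank(results, "rainforest wildlife habitat conservation")
```

The query itself is left untouched; only the order of the returned results changes, which is the results-modification approach rather than query modification.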
[0038] Referring to Fig. 5, a user model 500 is illustrated that can be employed with user activity or interaction data. The user model 500 is employed to differentiate personalized searches from generalized searches and to facilitate rich data processing according to determined activity data. One aspect of successful personalization is to build a model of the user that accurately reflects their interests and is easy to maintain and adapt to changes regarding long-term and short-term interests. The user model can be obtained from a variety of sources, including but not limited to:
1) From a rich history of computing context at 510 which can be obtained from local, mobile, or remote sources (e.g., applications open, content of those applications, and detailed history of such interactions including locations).
2) From a rich index of content previously encountered at 520 (e.g., documents, web pages, email, Instant Messages, notes, calendar appointments, and so forth).
3) From monitoring client interactions at 530 including recent or frequent contacts, topics of interest derived from keywords, relationships in an organizational chart, appointments, and so forth.
4) From a history or log of previous web pages or local/remote data sites visited including a history of previous search queries at 540.
5) From a profile of user interests at 550 which can be specified explicitly or implicitly derived via background monitoring.
6) From demographic information at 560 (e.g., location, gender, age, background, job category, and so forth).
[0039] From the above examples, it can be appreciated that the user model 500 can be based on many different sources of information. For instance, the model 500 can be sourced from a history or log of locations visited by a user over time, as monitored by devices such as the Global Positioning System (GPS). When monitoring with a GPS, raw spatial information can be converted into textual city names and zip codes, for example for positions where a user has paused or dwelled or incurred a loss of GPS signal. The locations where the user has paused or dwelled or incurred a loss of GPS signal can be identified and converted via a database of businesses and points of interest into textual labels. Other factors include logging the time of day or day of week to determine locations and points of interest.
[0040] In other aspects, components can be provided to manipulate parameters for controlling how a user's corpus of information, appointments, views of documents or files, activities, or locations can be grouped into subsets or weighted differentially in matching procedures for personalization based on type, age, or other combinations. For example, a retrieval algorithm could be limited to those aspects of the user's corpus that pertain to the query (e.g., documents that contain the query term or past interaction with data). Similarly, email may be analyzed from the previous 1 month, whereas web accesses from the previous 3 days, and the user's content created within the last year. It may be desirable that GPS location information is used from only today or other time period. The parameters can be manipulated automatically to create subsets (e.g., via an optimization process that varies parameters and tests response from user or system) or users can vary one or more of these parameters via a user interface, wherein such settings can be a function of the nature of the query, the time of day, day of week, or other contextual or activity-based observations.
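The per-type time windows in this paragraph can be sketched as a small parameter table applied to the user's corpus before matching. The dictionary keys, the use of 30 days for "1 month", and 1 day for "today" are approximations assumed for this example.

```python
from datetime import datetime, timedelta

# Hypothetical window table following the example above.
WINDOWS = {
    "email": timedelta(days=30),      # previous 1 month (approximated as 30 days)
    "web": timedelta(days=3),         # previous 3 days
    "authored": timedelta(days=365),  # content created within the last year
    "gps": timedelta(days=1),         # "only today" (approximated as 24 hours)
}


def select_corpus(items, now, windows=WINDOWS):
    """Keep only items whose age is within the window configured for their type."""
    return [
        it for it in items
        if it["type"] in windows and now - it["when"] <= windows[it["type"]]
    ]


now = datetime(2006, 6, 26)
corpus = [
    {"id": "e1", "type": "email", "when": now - timedelta(days=10)},
    {"id": "e2", "type": "email", "when": now - timedelta(days=45)},
    {"id": "w1", "type": "web", "when": now - timedelta(days=2)},
    {"id": "w2", "type": "web", "when": now - timedelta(days=5)},
    {"id": "a1", "type": "authored", "when": now - timedelta(days=200)},
]
subset = select_corpus(corpus, now)
```

Passing a different `windows` table per query or per time of day would correspond to varying the parameters automatically or through a user interface.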
[0041] Models can be derived for individuals or groups of individuals at 570, such as via collaborative filtering techniques that develop profiles by the analysis of similarities among individuals or groups of individuals. Similarity computations can be based on the content and/or usage of items. It is noted that modeling infrastructure and associated processing can reside on a client, multiple clients, one or more servers, or combinations of servers and clients.
[0042] At 580, machine learning techniques can be applied to learn user characteristics and interests over time as well as how and when data is interacted with by users. The learning models can include substantially any type of system, such as statistical/mathematical models and processes for modeling users and determining preferences and interests, including the use of Bayesian learning, which can generate Bayesian dependency models, such as Bayesian networks, naïve Bayesian classifiers, and/or other statistical classification methodology, including Support Vector Machines (SVMs), for example. Other types of models or systems can include neural networks and Hidden Markov Models, for example. Although elaborate reasoning models can be employed, it is to be appreciated that other approaches can also be utilized. For example, rather than a more thorough probabilistic approach, deterministic assumptions can also be employed (e.g., no recent searching of a particular web site for Z amount of time may imply by rule that the user is no longer interested in the respective information). Thus, in addition to reasoning under uncertainty, logical decisions can also be made regarding the status, location, context, interests, focus, and so forth of the users.
[0043] The learning models can be trained from a user event data store (not shown) that collects or aggregates data from a plurality of different data sources. Such sources can include various data acquisition components that record or log user event data (e.g., cell phone, acoustical activity recorded by microphone, Global Positioning System (GPS), electronic calendar, vision monitoring equipment, desktop activity, web site interaction and so forth). It is noted that the systems can be implemented in substantially any manner that supports personalized query and results processing. For example, the system could be implemented as a server, a server farm, within client application(s), or more generalized to include a web service(s) or other automated application(s) that interact with search functions such as user interfaces and search engines.
[0044] Before proceeding, collaborative filter techniques applied at 570 of the user model 500 are described in more detail. These techniques can include employment of collaborative filters to analyze data and determine profiles for the user. Collaborative filtering systems generally use a centralized database about user preferences to predict additional topics users may desire. Collaborative filtering is applied with the user model 500 to process previous user activities from a group of users that may indicate preferences for a given user that predict likely or possible profiles for new users of a system. Several algorithms including techniques based on correlation coefficients, vector-based similarity calculations, and statistical Bayesian methods can be employed.
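A correlation-coefficient approach to collaborative filtering can be sketched as follows. The Pearson correlation over commonly rated topics and the similarity-weighted average used for prediction are one textbook formulation, assumed here for illustration; the user names and interest scores are invented sample data.

```python
import math


def pearson(a, b):
    """Pearson correlation over the topics both users have scores for."""
    common = [k for k in a if k in b]
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(a[k] for k in common) / n
    mean_b = sum(b[k] for k in common) / n
    cov = sum((a[k] - mean_a) * (b[k] - mean_b) for k in common)
    var_a = sum((a[k] - mean_a) ** 2 for k in common)
    var_b = sum((b[k] - mean_b) ** 2 for k in common)
    return cov / math.sqrt(var_a * var_b) if var_a and var_b else 0.0


def predict(target, others, topic):
    """Similarity-weighted average of positively correlated users' scores for a topic."""
    num = den = 0.0
    for other in others:
        if topic not in other:
            continue
        weight = pearson(target, other)
        if weight <= 0:
            continue  # ignore uncorrelated or negatively correlated users
        num += weight * other[topic]
        den += weight
    return num / den if den else None


alice = {"python": 5, "golf": 1, "cooking": 4}
bob = {"python": 5, "golf": 1, "cooking": 4, "travel": 5}
carol = {"python": 1, "golf": 5, "cooking": 2, "travel": 1}
travel_guess = predict(alice, [bob, carol], "travel")
```

Because the first user's known interests match the second user's and run opposite to the third's, the predicted interest in the unseen topic follows the second user.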
[0045] Referring to Fig. 6, a system 600 illustrates access-based information retrieval in accordance with user activity or interaction data. The system 600 includes one or more information sources 610 that are accessed or considered by a user. These sources 610 can be similar or disparate in nature, having dissimilar information content, whereby some of the information sources may represent local data locations such as files, folders, applications, images, audio files, appointments, email, and so forth, and other sources 610 may represent remote sources such as web information, for example. As the user accesses different types of information over time, a usage analyzer 614 passes this information to a content analyzer 620 (or monitor), which can operate as a background task locally on a client machine and/or remotely in accordance with a server. The content analyzer 620 processes the accessed data via a filter 624 for parsing content from data, and an automated indexer 630 creates a content index 640 (or activity tags) of accessed data items.
[0046] In general, the analyzer 620 creates representations of accessed data in the index 640. For example, if the user has accessed a web page, the content analyzer 620 may create a thumbnail representation of the web page and associate a hyperlink reference to the page and thumbnail as part of a metadata file. The representation could be further tailored to reflect detailed patterns of user interaction with the page. In another case, if the user then accessed a text document having images contained therein, the analyzer 620 may extract the text or portions thereof, and associate a database link such as a file path as part of metadata. The indexer 630 would then automatically create an index (or add to an existing index) having two items in the content index 640 - the thumbnail representation and text document representation including metadata. In general, filters analyze the content of and metadata associated with items. Thus, for a Word document, for example, the filter 624 extracts metadata such as filename, title, author, keywords, creation date, and so forth along with the words in the document. This is what is used to build the index 640. The creation of thumbnails and the analysis of images could also be encapsulated in the filter 624, if desired. Activity or interaction metadata can be employed that may contain other items such as user and/or implicit tags that describe the items stored in the content index 640. It is to be appreciated that the indexer 630 may also perform filter 624 functions (e.g., indexer associates metadata with filtered content).
[0047] A search component 650 is provided that receives a user query 654 for information items contained in the content index 640. The search component 650 can be provided as part of a user interface that returns links and/or representations of accessed items at 660 to the user in response to the query 654. For example, the user may query for "items relating to last year's performance review," wherein the search component 650 extracts items from the index 640 such as emails, coworker evaluations, documents published in the last year, web page images, audio recordings and so forth relating to the context of the query 654. In another example, an implicit query may be derived from the query 654 (e.g., whenever I get a phone call from this person, pull up the last five e-mails from this person).
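The filter, indexer, and search roles described for Fig. 6 can be sketched with a tiny in-memory inverted index. The function and class names, the whitespace tokenizer, and the rule that every query term must match are assumptions for illustration, not the components' actual behavior.

```python
def filter_item(path, raw_text, metadata):
    """Filter role: extract tokens and metadata from one accessed item."""
    return {"path": path, "metadata": dict(metadata), "tokens": raw_text.lower().split()}


class ContentIndex:
    """Indexer and search roles: term -> set of item paths, plus stored metadata."""

    def __init__(self):
        self.postings = {}
        self.items = {}

    def add(self, filtered):
        path = filtered["path"]
        self.items[path] = filtered["metadata"]
        meta_terms = {str(v).lower() for v in filtered["metadata"].values()}
        for token in set(filtered["tokens"]) | meta_terms:
            self.postings.setdefault(token, set()).add(path)

    def search(self, query):
        """Return paths of items containing every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in terms))


content_index = ContentIndex()
content_index.add(filter_item("docs/review.doc", "annual performance review notes", {"author": "Pat"}))
content_index.add(filter_item("mail/42", "lunch on friday", {"author": "Sam"}))
hits = content_index.search("performance review")
```

Metadata values are indexed alongside document words, so a search for an author name finds the item as well.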
[0048] As will be described in more detail below, accessed items can be presented in a plurality of differing formats designed to facilitate efficient and timely retrieval of information items that have been previously accessed. Also, the links and/or representations 660 may include other items of interest to the user, such as information items that the user may want to see other than those items previously accessed (e.g., the system provides links to other content of interest based upon or inferred from the query at hand; for instance, in addition to showing performance review items, it optionally provides links to human resources pages describing review policies based on another index of content, even though these items may or may not have been previously accessed by the user).
[0049] In one aspect, an event component can be provided (not shown) (e.g., background task that monitors user activities associated with usage analyzer 614). The event component monitors user activities such as saving, reading, editing, copying, hovering on information, selecting information, manipulating information and/or deleting files, for example, and makes determinations with respect to user actions. This can include sensors such as microphones, cameras, and other devices along with monitoring desktop activities to determine user actions or goals. In one example, probabilistic models and/or logical decisions can be applied to determine events such as when a user has observed or contemplated information. Logical and/or statistical models (e.g., Bayesian dependency models, decision trees, Support Vector Machines) can be constructed that consider the following example classes of evidence associated with patterns of user activity:
• Focus of attention: Selection and/or dwelling on items, dwelling on portions of a document or on specific subtext after scrolling through a document.
• Introspection: A pause after a period of activity or a significant slowing of the rate of interaction.
• Undesired information: Immediate closure of a document after a brief glance, attempts to return to a prior state after an information access action. These observations include undoing the effect of a recent action, including issuing an undo command, and deleting items.
• Domain-specific syntactic and semantic content: Consideration of special distinctions in content or structure of documents and how the user interacts with these features or items. These include domain-specific features associated with the task (e.g., considering the rate and frequency of email messages, and the age in time or number of messages of a subject heading, from the author of a message at a user's focus of attention).
As can be appreciated, the event component can be employed to trigger indexing of various types of information on the basis of user activity. A user's activity with information objects can also be utilized to improve information presentation.
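A purely logical (rule-based) version of the evidence classes listed above can be sketched as follows. The action names and the numeric thresholds are assumptions; the description gives no values and also contemplates probabilistic models in place of fixed rules.

```python
# Hypothetical thresholds in seconds; the description gives no numeric values.
BRIEF_GLANCE = 5.0
LONG_DWELL = 30.0


def classify_event(action, seconds=0.0):
    """Map one observed user action to an evidence class, or None if no rule fires."""
    if action in ("undo", "delete"):
        return "undesired_information"
    if action == "close" and seconds < BRIEF_GLANCE:
        return "undesired_information"   # closed after only a brief glance
    if action in ("select", "dwell") and seconds >= LONG_DWELL:
        return "focus_of_attention"
    if action == "pause" and seconds >= LONG_DWELL:
        return "introspection"
    return None


observed = [("dwell", 45.0), ("close", 2.0), ("pause", 60.0), ("scroll", 1.0)]
labels = [classify_event(action, secs) for action, secs in observed]
```

An event component could use such labels to decide when to trigger indexing, for example indexing an item only after a "focus_of_attention" observation.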
[0050] Fig. 7 illustrates various retrieval service applications 700. In one aspect, explicit queries 710 and/or implicit queries 714 can be supported. Explicit queries 710 are directed by the user to find information of interest (e.g., show all data references relating to a meeting or date). Implicit queries 714 can in some cases be derived from the explicit query 710. For example, a user could have their desktop phone messages linked to their e-mail system or other message system. If a phone call were to arrive from selected individuals, the e-mail system could automatically retrieve e-mail relating to the individual via implicit query 714. In another example, at a predetermined interval before an upcoming meeting, the user's calendar system could trigger queries to recall data from past meetings or information relating to individuals attending the upcoming meeting. Also, implicit queries 714 may be generated based upon reasoning processes associated with the user's current context or query (e.g., a query composed of important words in recently read paragraphs).
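The last example above, an implicit query composed of important words in recently read paragraphs, can be sketched with simple term counting. Treating "important" as "most frequent after removing stopwords" is an assumption, as is the small stopword list.

```python
from collections import Counter

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "for", "of", "and", "to", "in", "on", "is"}


def implicit_query(paragraphs, max_terms=3):
    """Return the most frequent non-stopword terms from recently read paragraphs."""
    counts = Counter(
        word
        for paragraph in paragraphs
        for word in paragraph.lower().split()
        if word.isalpha() and word not in STOPWORDS
    )
    return [word for word, _ in counts.most_common(max_terms)]


recently_read = [
    "the budget review covers the budget for travel",
    "budget travel approvals",
]
query_terms = implicit_query(recently_read, max_terms=2)
```

The resulting terms could then be submitted to the search component without the user typing a query.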
[0051] Proceeding to 716, other types of queries support context-sensitive queries. These types of queries include providing additional selection options to edit or refine searches. For example, queries may be directed to a particular type of application or location (e.g., apply this query to mail folder only). At 720, the context of an application can be considered when performing a query. For example, if a photo application is being used, then the query can be refined to only search for images. At 724, item-centric integrations can be performed. This includes operating system actions that support interface actions such as mouse click functions, tagging items, updating metadata files, deleting items, editing items or content, and so forth.
[0052] At 730, file sharing can be performed. For example, the user may specify that one or more other users can inspect or have access to all or a subset of their query/index database (e.g., all users on my project team are permitted access to my project notes). At 734, index scrubbing can occur. Over time, users may desire to remove one or more items from their index. In accordance with this activity, users can specify specific items to remove or specify general topic areas that can be automatically scrubbed by the system (e.g., remove thumbnails related to my birthday two years ago). Other actions could occur based upon logical or reasoning processes; for example, if an item were accessed fewer than a certain number of times in a predetermined period, then the item could be automatically removed if desired.
[0053] At 740, effective time computations are considered. As an example, the date that is relevant or useful concerning a file (during data presentation to a user) is the date it was changed, the date for presenting mail is usually the date it was delivered (and thus approximately when the user saw it), and the useful date for an appointment is the date the appointment occurs. It is noted that all time information is recorded and indexed and that useful date information can be utilized for presentation of information. Thus, for appointments, various tasks can occur such as indexing the time mail was sent, the time it was updated (if that happened), the time the user accepted/declined, and the time the meeting occurred, for example. However, typically one time is selected for display although more than one time can be provided.
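The effective time selection at 740 can be sketched as a lookup from item type to the timestamp field used for display, while all timestamps remain stored on the item. The field names and the fallback for unknown types are assumptions for illustration.

```python
from datetime import datetime

# Hypothetical mapping from item type to the timestamp field used for display.
EFFECTIVE_FIELD = {
    "file": "modified",       # the date the file was changed
    "mail": "delivered",      # the date the mail was delivered
    "appointment": "occurs",  # the date the appointment occurs
}


def effective_time(item):
    """Pick the single timestamp to display for an item; other times stay indexed."""
    field_name = EFFECTIVE_FIELD.get(item["type"], "created")
    return item["times"].get(field_name)


appointment = {
    "type": "appointment",
    "times": {
        "created": datetime(2006, 1, 5),
        "accepted": datetime(2006, 1, 6),
        "occurs": datetime(2006, 6, 26),
    },
}
shown = effective_time(appointment)
```

Here an appointment created months in advance is presented under the date the meeting occurs, not the date the entry was created.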
[0054] As noted above, certain data can be marked as having been previously observed by analyzing file elements associated with a file type. For example, a text document may contain a field indicating when a file was opened or last edited. With respect to calendar appointments, however, merely creating an index from when the calendar was created is likely to be of minor benefit to people because sometimes meetings are created well in advance of the actual meeting date. Thus, when indexing a calendar appointment, the actual meeting date as opposed to the time of creation can be tracked. This type of effective time consideration enables users to retrieve information in a manner more suited to memory recall. At 744, the volatility of data is considered and processed. This type of processing involves indexing of data into a persistent form during intermittent operations. As can be appreciated, various automated background operations are possible.
[0055] With reference to Fig. 8, an exemplary environment 810 for implementing various aspects described herein includes a computer 812. The computer 812 includes a processing unit 814, a system memory 816, and a system bus 818. The system bus 818 couples system components including, but not limited to, the system memory 816 to the processing unit 814. The processing unit 814 can be any of various available processors. Dual microprocessors and other multiprocessor architectures also can be employed as the processing unit 814.
[0056] The system bus 818 can be any of several types of bus structure(s) including the memory bus or memory controller, a peripheral bus or external bus, and/or a local bus using any variety of available bus architectures including, but not limited to, 11-bit bus, Industry Standard Architecture (ISA), Micro-Channel Architecture (MCA), Extended ISA (EISA), Intelligent Drive Electronics (IDE), VESA Local Bus (VLB), Peripheral Component Interconnect (PCI), Universal Serial Bus (USB), Accelerated Graphics Port (AGP), Personal Computer Memory Card International Association bus (PCMCIA), and Small Computer Systems Interface (SCSI).
[0057] The system memory 816 includes volatile memory 820 and nonvolatile memory 822. The basic input/output system (BIOS), containing the basic routines to transfer information between elements within the computer 812, such as during start-up, is stored in nonvolatile memory 822. By way of illustration, and not limitation, nonvolatile memory 822 can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory 820 includes random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).
[0058] Computer 812 also includes removable/non-removable, volatile/non-volatile computer storage media. Fig. 8 illustrates, for example, a disk storage 824. Disk storage 824 includes, but is not limited to, devices like a magnetic disk drive, floppy disk drive, tape drive, Jaz drive, Zip drive, LS-100 drive, flash memory card, or memory stick. In addition, disk storage 824 can include storage media separately or in combination with other storage media including, but not limited to, an optical disk drive such as a compact disk ROM device (CD-ROM), CD recordable drive (CD-R Drive), CD rewritable drive (CD-RW Drive) or a digital versatile disk ROM drive (DVD-ROM). To facilitate connection of the disk storage devices 824 to the system bus 818, a removable or non-removable interface is typically used such as interface 826.
[0059] It is to be appreciated that Fig. 8 describes software that acts as an intermediary between users and the basic computer resources described in suitable operating environment 810. Such software includes an operating system 828. Operating system 828, which can be stored on disk storage 824, acts to control and allocate resources of the computer system 812. System applications 830 take advantage of the management of resources by operating system 828 through program modules 832 and program data 834 stored either in system memory 816 or on disk storage 824. It is to be appreciated that various components described herein can be implemented with various operating systems or combinations of operating systems.
[0060] A user enters commands or information into the computer 812 through input device(s) 836. Input devices 836 include, but are not limited to, a pointing device such as a mouse, trackball, stylus, touch pad, keyboard, microphone, joystick, game pad, satellite dish, scanner, TV tuner card, digital camera, digital video camera, web camera, and the like. These and other input devices connect to the processing unit 814 through the system bus 818 via interface port(s) 838. Interface port(s) 838 include, for example, a serial port, a parallel port, a game port, and a universal serial bus (USB). Output device(s) 840 use some of the same type of ports as input device(s) 836. Thus, for example, a USB port may be used to provide input to computer 812, and to output information from computer 812 to an output device 840. Output adapter 842 is provided to illustrate that there are some output devices 840 like monitors, speakers, and printers, among other output devices 840, that require special adapters. The output adapters 842 include, by way of illustration and not limitation, video and sound cards that provide a means of connection between the output device 840 and the system bus 818.
It should be noted that other devices and/or systems of devices provide both input and output capabilities such as remote computer(s) 844.
[0061] Computer 812 can operate in a networked environment using logical connections to one or more remote computers, such as remote computer(s) 844. The remote computer(s) 844 can be a personal computer, a server, a router, a network PC, a workstation, a microprocessor based appliance, a peer device or other common network node and the like, and typically includes many or all of the elements described relative to computer 812. For purposes of brevity, only a memory storage device 846 is illustrated with remote computer(s) 844. Remote computer(s) 844 is logically connected to computer 812 through a network interface 848 and then physically connected via communication connection 850. Network interface 848 encompasses communication networks such as local-area networks (LAN) and wide-area networks (WAN). LAN technologies include Fiber Distributed Data Interface (FDDI), Copper Distributed Data Interface (CDDI), Ethernet/IEEE 802.3, Token Ring/IEEE 802.5 and the like. WAN technologies include, but are not limited to, point-to-point links, circuit switching networks like Integrated Services Digital Networks (ISDN) and variations thereon, packet switching networks, and Digital Subscriber Lines (DSL).
[0062] Communication connection(s) 850 refers to the hardware/software employed to connect the network interface 848 to the bus 818. While communication connection 850 is shown for illustrative clarity inside computer 812, it can also be external to computer 812. The hardware/software necessary for connection to the network interface 848 includes, for exemplary purposes only, internal and external technologies such as modems, including regular telephone grade modems, cable modems and DSL modems, ISDN adapters, and Ethernet cards.
[0063] Fig. 9 is a schematic block diagram of a sample-computing environment 900 that can be employed. The system 900 includes one or more client(s) 910. The client(s) 910 can be hardware and/or software (e.g., threads, processes, computing devices). The system 900 also includes one or more server(s) 930. The server(s) 930 can also be hardware and/or software (e.g., threads, processes, computing devices). The servers 930 can house threads to perform transformations by employing the components described herein, for example. One possible communication between a client 910 and a server 930 may be in the form of a data packet adapted to be transmitted between two or more computer processes. The system 900 includes a communication framework 950 that can be employed to facilitate communications between the client(s) 910 and the server(s) 930. The client(s) 910 are operably connected to one or more client data store(s) 960 that can be employed to store information local to the client(s) 910. Similarly, the server(s) 930 are operably connected to one or more server data store(s) 940 that can be employed to store information local to the servers 930.
[0064] What has been described above includes various exemplary aspects. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing these aspects, but one of ordinary skill in the art may recognize that many further combinations and permutations are possible. Accordingly, the aspects described herein are intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim.

Claims

What is claimed is:
1. A data manipulation system, comprising: one or more data items that are associated with one or more tags that indicate at least one user's interaction with or attention to the data items; and a manipulation tool that processes the data items to determine a subset of data items based at least in part on at least one user's interaction with the data items.
2. The system of claim 1, the data items include documents, files, email messages, calendar appointments, web pages, sub-sections within the data items, or cross-item abstractions.
3. The system of claim 1, where the tags represent a location that a user last accessed an item, or represent a location history of times that a user has accessed or interacted with an item.
4. The system of claim 1, where the tag represents a time a user last accessed an item, represents a total number of times that an item has been accessed, represents a frequency that an item has been accessed within a period of time extending into the past, or represents a frequency that an item has been accessed within one or more arbitrarily specified periods of time.
5. The system of claim 1, further comprising a component that encodes higher-order statistics of frequency of access over time.
6. The system of claim 1, further comprising a viewer that allows a user to retrieve items based on functions of one or more tags, a viewer that allows a user to sort or filter retrieved items based on functions of one or more tags, or a viewer that presents retrieved items based on functions of one or more tags.
7. The system of claim 1, further comprising tags that are probabilistic indications of activity or interest.
8. A computer readable medium having computer executable instructions stored thereon for executing the components of claim 1.
9. A method for analysis of user activities with data items, comprising: automatically tagging a quantity or nature of interaction that data items have received from computer users; and employing the tags to further process the data items in accordance with future data activities.
10. The method of claim 9, storing data within an attentional annotation associated with the data items in a separate database or within a data structure embedded in the data items.
11. The method of claim 10, the data items compose a computer readable storage medium.
12. The method of claim 11, the storage medium is a data item that includes text, graphics, and related data components.
13. The method of claim 9, further comprising providing indexing procedures that weight subcomponents of data items differently for retrieval depending on a status of annotations indicating attention or interaction with data items.
14. The method of claim 13, the indexing procedures overlook or delete information in data items depending on a status of annotations indicating attention or interaction with data items.
15. The method of claim 13, further comprising an index that is compressed by removing components that have not been attended to or interacted with by computer users.
16. The method of claim 15, further comprising an index that is compressed by removing components that are lesser attended to or interacted with components of data items.
17. The method of claim 15, further comprising providing a ranking score used for retrieval, to yield more weight to terms or objects that appear in sections of a data item that a user has attended to or has interacted with.
18. The method of claim 15, further comprising employing attentional annotations to automatically or semi-automatically generate queries based on regions that have been attended to or interacted with in the past and/or present, or employing attentional annotations to provide differential access to items or differential display of items that have been attended to or interacted with in the past and/or present.
19. A system for analyzing user activities with data items, comprising: means for determining user activities with respect to one or more data items; means for tagging the data items based at least in part on the user activities; and means for storing or retrieving data based in part on the tagged data items.
20. The system of claim 19, further comprising means for encoding attentional annotations for the data items and means for encoding attentional annotations for subcomponents within the data items, where the annotations capture pointers or indications of each of the subcomponents and resulting user attention that has been received for the subcomponents.
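
As an illustration of the kind of ranking score recited in claim 17, the following sketch gives more weight to query terms that appear in sections of a data item the user has attended to. The function name `attention_weighted_score`, the `(text, attended)` section representation, and the `attended_boost` factor are hypothetical choices for this example; the claims do not specify a scoring formula, so the simple term-count weighting here is an assumption.

```python
from collections import Counter


def attention_weighted_score(query_terms, sections, attended_boost=2.0):
    """Score one data item against a query.

    `sections` is a list of (text, attended) pairs, where `attended` is True
    if the user viewed or interacted with that section. Term matches in
    attended sections count `attended_boost` times as much as other matches.
    """
    wanted = {term.lower() for term in query_terms}
    score = 0.0
    for text, attended in sections:
        counts = Counter(text.lower().split())
        hits = sum(counts[term] for term in wanted)
        score += hits * (attended_boost if attended else 1.0)
    return score
```

Setting `attended_boost` to 1.0 reduces this to a plain term-count score, which makes it easy to compare rankings with and without the attention annotations.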
PCT/US2006/024847 2005-06-29 2006-06-27 Sensing, storing, indexing, and retrieving data leveraging measures of user activity, attention, and interest WO2007005382A2 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN2006800226726A CN101501627B (en) 2005-06-29 2006-06-27 Data manipulation system and method for analyzing user activity aiming at data items
JP2008520267A JP5021640B2 (en) 2005-06-29 2006-06-27 Detect, store, index, and search means for leveraging data on user activity, attention, and interests
EP06774032.4A EP1897002B1 (en) 2005-06-29 2006-06-27 Sensing, storing, indexing, and retrieving data leveraging measures of user activity, attention, and interest
KR1020077030613A KR101242369B1 (en) 2005-06-29 2006-06-27 Sensing, storing, indexing, and retrieving data leveraging measures of user activity, attention, and interest

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/172,121 2005-06-29
US11/172,121 US7693817B2 (en) 2005-06-29 2005-06-29 Sensing, storing, indexing, and retrieving data leveraging measures of user activity, attention, and interest

Publications (2)

Publication Number Publication Date
WO2007005382A2 true WO2007005382A2 (en) 2007-01-11
WO2007005382A3 WO2007005382A3 (en) 2009-04-16

Family

ID=37604960

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2006/024847 WO2007005382A2 (en) 2005-06-29 2006-06-27 Sensing, storing, indexing, and retrieving data leveraging measures of user activity, attention, and interest

Country Status (6)

Country Link
US (1) US7693817B2 (en)
EP (1) EP1897002B1 (en)
JP (1) JP5021640B2 (en)
KR (1) KR101242369B1 (en)
CN (1) CN101501627B (en)
WO (1) WO2007005382A2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2244195A3 (en) * 2009-04-22 2011-01-12 Palo Alto Research Center Incorporated System and method for implicit tagging of documents using search query data
WO2014162033A1 (en) * 2013-04-01 2014-10-09 Crambo Sa Method, mobile device, system and computer product for detecting and measuring the attention level of a user
WO2019112673A1 (en) * 2017-12-05 2019-06-13 Google Llc Identifying videos with inappropriate content by processing search logs
US20200272631A1 (en) * 2007-01-31 2020-08-27 Paypal, Inc. Selective presentation of data items
CN114579893A (en) * 2022-05-09 2022-06-03 山东大学 Continuous POI recommendation method and system

Families Citing this family (131)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7917468B2 (en) * 2005-08-01 2011-03-29 Seven Networks, Inc. Linking of personal information management data
US8468126B2 (en) 2005-08-01 2013-06-18 Seven Networks, Inc. Publishing data in an information community
US8131674B2 (en) 2004-06-25 2012-03-06 Apple Inc. Methods and systems for managing data
US20050289107A1 (en) * 2004-06-25 2005-12-29 Yan Arrouye Methods and systems for managing data
WO2006025145A1 (en) * 2004-08-31 2006-03-09 Access Co., Ltd. Markup language processing device, browser program, and markup language processing method
CA2500573A1 (en) * 2005-03-14 2006-09-14 Oculus Info Inc. Advances in nspace - system and method for information analysis
US7925995B2 (en) 2005-06-30 2011-04-12 Microsoft Corporation Integration of location logs, GPS signals, and spatial resources for identifying user activities, goals, and context
US20190268430A1 (en) 2005-08-01 2019-08-29 Seven Networks, Llc Targeted notification of content availability to a mobile device
US8645376B2 (en) 2008-05-02 2014-02-04 Salesforce.Com, Inc. Method and system for managing recent data in a mobile device linked to an on-demand service
US9135304B2 (en) * 2005-12-02 2015-09-15 Salesforce.Com, Inc. Methods and systems for optimizing text searches over structured data in a multi-tenant environment
US20070143376A1 (en) * 2005-12-16 2007-06-21 Mcintosh Robert Methods, systems, and computer program products for displaying at least one electronic media file on an electronic calendar based on information associated with the electronic calendar
US7996396B2 (en) * 2006-03-28 2011-08-09 A9.Com, Inc. Identifying the items most relevant to a current query based on user activity with respect to the results of similar queries
US8910060B2 (en) * 2006-06-22 2014-12-09 Rohit Chandra Method and apparatus for highlighting a portion of an internet document for collaboration and subsequent retrieval
US11429685B2 (en) 2006-06-22 2022-08-30 Rohit Chandra Sharing only a part of a web page—the part selected by a user
US11763344B2 (en) 2006-06-22 2023-09-19 Rohit Chandra SaaS for content curation without a browser add-on
US10866713B2 (en) 2006-06-22 2020-12-15 Rohit Chandra Highlighting on a personal digital assistant, mobile handset, eBook, or handheld device
US11288686B2 (en) 2006-06-22 2022-03-29 Rohit Chandra Identifying micro users interests: at a finer level of granularity
US11853374B2 (en) 2006-06-22 2023-12-26 Rohit Chandra Directly, automatically embedding a content portion
US10884585B2 (en) 2006-06-22 2021-01-05 Rohit Chandra User widget displaying portions of content
US10289294B2 (en) 2006-06-22 2019-05-14 Rohit Chandra Content selection widget for visitors of web pages
US11301532B2 (en) 2006-06-22 2022-04-12 Rohit Chandra Searching for user selected portions of content
US9292617B2 (en) 2013-03-14 2016-03-22 Rohit Chandra Method and apparatus for enabling content portion selection services for visitors to web pages
US10909197B2 (en) 2006-06-22 2021-02-02 Rohit Chandra Curation rank: content portion search
US8352573B2 (en) * 2006-06-23 2013-01-08 Rohit Chandra Method and apparatus for automatically embedding and emailing user-generated highlights
US20080005067A1 (en) * 2006-06-28 2008-01-03 Microsoft Corporation Context-based search, retrieval, and awareness
US20080065649A1 (en) * 2006-09-08 2008-03-13 Barry Smiler Method of associating independently-provided content with webpages
US8260252B2 (en) * 2006-10-02 2012-09-04 The Nielsen Company (Us), Llc Method and apparatus for collecting information about portable device usage
US8014726B1 (en) 2006-10-02 2011-09-06 The Nielsen Company (U.S.), Llc Method and system for collecting wireless information transparently and non-intrusively
US20080140642A1 (en) * 2006-10-10 2008-06-12 Bill Messing Automated user activity associated data collection and reporting for content/metadata selection and propagation service
US8661029B1 (en) 2006-11-02 2014-02-25 Google Inc. Modifying search result ranking based on implicit user feedback
US8484083B2 (en) * 2007-02-01 2013-07-09 Sri International Method and apparatus for targeting messages to users in a social network
US7685200B2 (en) * 2007-03-01 2010-03-23 Microsoft Corp Ranking and suggesting candidate objects
US8036926B2 (en) * 2007-03-12 2011-10-11 International Business Machines Corporation Techniques for selecting calendar events by examining content of user's recent e-mail activity
US8005823B1 (en) * 2007-03-28 2011-08-23 Amazon Technologies, Inc. Community search optimization
US9542394B2 (en) * 2007-06-14 2017-01-10 Excalibur Ip, Llc Method and system for media-based event generation
US8244660B2 (en) 2007-06-28 2012-08-14 Microsoft Corporation Open-world modeling
US9235848B1 (en) 2007-07-09 2016-01-12 Groupon, Inc. Implicitly associating metadata using user behavior
US8321556B1 (en) * 2007-07-09 2012-11-27 The Nielsen Company (Us), Llc Method and system for collecting data on a wireless device
CN101802787A (en) * 2007-08-20 2010-08-11 费斯布克公司 Targeting advertisements in a social network
US20090064282A1 (en) * 2007-08-29 2009-03-05 International Business Machines Corporation Method for organizing activities in activity-centric computing networks
FR2921503B1 (en) * 2007-09-20 2010-01-29 Alcatel Lucent AUTOMATIC CONTENT INDEXING DEVICE
US20090138314A1 (en) * 2007-11-05 2009-05-28 Michael Bruce Method and system for locating a workforce
US9123079B2 (en) 2007-11-05 2015-09-01 Facebook, Inc. Sponsored stories unit creation from organic activity stream
US8799068B2 (en) * 2007-11-05 2014-08-05 Facebook, Inc. Social advertisements and other informational messages on a social networking website, and advertising model for same
US9990652B2 (en) 2010-12-15 2018-06-05 Facebook, Inc. Targeting social advertising to friends of users who have interacted with an object associated with the advertising
US20120203831A1 (en) 2011-02-03 2012-08-09 Kent Schoen Sponsored Stories Unit Creation from Organic Activity Stream
US8423557B2 (en) * 2007-11-06 2013-04-16 International Business Machines Corporation Computer method and system for determining individual priorities of shared activities
US8862582B2 (en) * 2007-11-15 2014-10-14 At&T Intellectual Property I, L.P. System and method of organizing images
US8019772B2 (en) * 2007-12-05 2011-09-13 International Business Machines Corporation Computer method and apparatus for tag pre-search in social software
US20090187540A1 (en) * 2008-01-22 2009-07-23 Microsoft Corporation Prediction of informational interests
US8892552B1 (en) * 2008-03-11 2014-11-18 Google Inc. Dynamic specification of custom search engines at query-time, and applications thereof
US9268843B2 (en) * 2008-06-27 2016-02-23 Cbs Interactive Inc. Personalization engine for building a user profile
US8214346B2 (en) * 2008-06-27 2012-07-03 Cbs Interactive Inc. Personalization engine for classifying unstructured documents
JP5327784B2 (en) * 2008-07-30 2013-10-30 株式会社日立製作所 Computer system, information collection support device, and information collection support method
US9407942B2 (en) * 2008-10-03 2016-08-02 Finitiv Corporation System and method for indexing and annotation of video content
US9477672B2 (en) 2009-12-02 2016-10-25 Gartner, Inc. Implicit profile for use with recommendation engine and/or question router
US20100235231A1 (en) * 2009-01-30 2010-09-16 Cbs Interactive, Inc. Lead acquisition, promotion and inventory management system and method
US9342814B2 (en) * 2009-04-07 2016-05-17 Clearslide, Inc. Presentation access tracking system
US8661030B2 (en) 2009-04-09 2014-02-25 Microsoft Corporation Re-ranking top search results
US20100287152A1 (en) 2009-05-05 2010-11-11 Paul A. Lipari System, method and computer readable medium for web crawling
US10303722B2 (en) * 2009-05-05 2019-05-28 Oracle America, Inc. System and method for content selection for web page indexing
US9075883B2 (en) 2009-05-08 2015-07-07 The Nielsen Company (Us), Llc System and method for behavioural and contextual data analytics
US20100299305A1 (en) * 2009-05-22 2010-11-25 Microsoft Corporation Programming element modification recommendation
US8150843B2 (en) 2009-07-02 2012-04-03 International Business Machines Corporation Generating search results based on user feedback
US8498974B1 (en) 2009-08-31 2013-07-30 Google Inc. Refining search results
US8972391B1 (en) * 2009-10-02 2015-03-03 Google Inc. Recent interest based relevance scoring
CA2691326A1 (en) * 2010-01-28 2011-07-28 Ibm Canada Limited - Ibm Canada Limitee Integrated automatic user support and assistance
US10102278B2 (en) * 2010-02-03 2018-10-16 Gartner, Inc. Methods and systems for modifying a user profile for a recommendation algorithm and making recommendations based on user interactions with items
US8635267B2 (en) * 2010-03-09 2014-01-21 Cbs Interactive Inc. Systems and methods for generating user entertainment activity profiles
US9697500B2 (en) * 2010-05-04 2017-07-04 Microsoft Technology Licensing, Llc Presentation of information describing user activities with regard to resources
AU2010355789B2 (en) 2010-06-24 2016-05-12 Arbitron Mobile Oy Network server arrangement for processing non-parametric, multi-dimensional, spatial and temporal human behavior or technical observations measured pervasively, and related method for the same
US9623119B1 (en) 2010-06-29 2017-04-18 Google Inc. Accentuating search results
US8340685B2 (en) 2010-08-25 2012-12-25 The Nielsen Company (Us), Llc Methods, systems and apparatus to generate market segmentation data with anonymous location data
US9158775B1 (en) 2010-12-18 2015-10-13 Google Inc. Scoring stream items in real time
WO2012083540A1 (en) * 2010-12-23 2012-06-28 Nokia Corporation Method and apparatus for providing token-based classification of device information
CN102075574B (en) * 2011-01-07 2013-02-06 北京邮电大学 Web-based collaborative learning (WBCL) system and method
US8610788B2 (en) 2011-02-08 2013-12-17 International Business Machines Corporation Content storage management in cameras
US9449093B2 (en) * 2011-02-10 2016-09-20 Sri International System and method for improved search experience through implicit user interaction
US8666984B2 (en) * 2011-03-18 2014-03-04 Microsoft Corporation Unsupervised message clustering
US9477574B2 (en) 2011-05-12 2016-10-25 Microsoft Technology Licensing, Llc Collection of intranet activity data
US9652556B2 (en) 2011-10-05 2017-05-16 Google Inc. Search suggestions based on viewport content
US9032316B1 (en) 2011-10-05 2015-05-12 Google Inc. Value-based presentation of user-selectable computing actions
US8890827B1 (en) 2011-10-05 2014-11-18 Google Inc. Selected content refinement mechanisms
US10013152B2 (en) 2011-10-05 2018-07-03 Google Llc Content selection disambiguation
US8878785B1 (en) 2011-10-05 2014-11-04 Google Inc. Intent determination using geometric shape input
WO2013052866A2 (en) * 2011-10-05 2013-04-11 Google Inc. Semantic selection and purpose facilitation
US8825671B1 (en) 2011-10-05 2014-09-02 Google Inc. Referent determination from selected content
US10102100B2 (en) * 2011-11-29 2018-10-16 International Business Machines Corporation Optimizing automated interactions with computer software applications
US8635230B2 (en) * 2012-01-26 2014-01-21 International Business Machines Corporation Display of information in computing devices
US8938405B2 (en) 2012-01-30 2015-01-20 International Business Machines Corporation Classifying activity using probabilistic models
US8718927B2 (en) * 2012-03-12 2014-05-06 Strava, Inc. GPS data repair
WO2013180704A1 (en) * 2012-05-30 2013-12-05 Intel Corporation Determining a profile for a recommendation engine based on group interaction dynamics
CN104471571B (en) * 2012-07-11 2018-01-19 谢晚霞 To Web activities index, sequence and the system and method for analysis under event-driven framework
US20140040256A1 (en) 2012-08-06 2014-02-06 Aol Inc. Systems and methods for processing electronic content
US8825764B2 (en) * 2012-09-10 2014-09-02 Facebook, Inc. Determining user personality characteristics from social networking system communications and characteristics
US20140236964A1 (en) * 2013-02-19 2014-08-21 Lexisnexis, A Division Of Reed Elsevier Inc. Systems And Methods For Ranking A Plurality Of Documents Based On User Activity
US10600011B2 (en) 2013-03-05 2020-03-24 Gartner, Inc. Methods and systems for improving engagement with a recommendation engine that recommends items, peers, and services
US9524489B2 (en) * 2013-03-14 2016-12-20 Samsung Electronics Co., Ltd. Computing system with task transfer mechanism and method of operation thereof
KR102107810B1 (en) * 2013-03-19 2020-05-28 삼성전자주식회사 Display apparatus and displaying method for information regarding activity using the same
US9734208B1 (en) * 2013-05-13 2017-08-15 Audible, Inc. Knowledge sharing based on meeting information
WO2015035185A1 (en) 2013-09-06 2015-03-12 Apple Inc. Providing transit information
US20150006526A1 (en) * 2013-06-28 2015-01-01 Google Inc. Determining Locations of Interest to a User
CN104423824A (en) * 2013-09-02 2015-03-18 联想(北京)有限公司 Information processing method and device
US9298831B1 (en) 2013-12-13 2016-03-29 Google Inc. Approximating a user location
US9778817B2 (en) * 2013-12-31 2017-10-03 Findo, Inc. Tagging of images based on social network tags or comments
US20150237470A1 (en) * 2014-02-14 2015-08-20 Apple Inc. Personal Geofence
US20150339373A1 (en) * 2014-05-20 2015-11-26 Matthew Christian Carlson Graphical interface for relevance-based rendering of electronic messages from multiple accounts
US9615202B2 (en) 2014-05-30 2017-04-04 Apple Inc. Determining a significant user location for providing location-based services
CN104142990A (en) * 2014-07-28 2014-11-12 百度在线网络技术(北京)有限公司 Search method and device
JP6147242B2 (en) * 2014-12-19 2017-06-14 ヤフー株式会社 Prediction device, prediction method, and prediction program
US11030171B2 (en) * 2015-01-09 2021-06-08 Ariba, Inc. Elastic sharding of data in a multi-tenant cloud
US10552493B2 (en) 2015-02-04 2020-02-04 International Business Machines Corporation Gauging credibility of digital content items
US10643185B2 (en) 2016-06-10 2020-05-05 Apple Inc. Suggested locations for calendar events
US10552728B2 (en) 2016-07-29 2020-02-04 Splunk Inc. Automated anomaly detection for event-based system
US11314799B2 (en) 2016-07-29 2022-04-26 Splunk Inc. Event-based data intake and query system employing non-text machine data
US11227208B2 (en) 2016-07-29 2022-01-18 Splunk Inc. Automated data-generation for event-based system
US10956481B2 (en) * 2016-07-29 2021-03-23 Splunk Inc. Event-based correlation of non-text machine data
US20180232449A1 (en) * 2017-02-15 2018-08-16 International Business Machines Corporation Dynamic faceted search
US10529000B1 (en) * 2017-02-22 2020-01-07 Udemy, Inc. System and method for automatically tagging products for an e-commerce web application and providing product recommendations
US10467230B2 (en) 2017-02-24 2019-11-05 Microsoft Technology Licensing, Llc Collection and control of user activity information and activity user interface
US10732796B2 (en) * 2017-03-29 2020-08-04 Microsoft Technology Licensing, Llc Control of displayed activity information using navigational mnemonics
US10671245B2 (en) 2017-03-29 2020-06-02 Microsoft Technology Licensing, Llc Collection and control of user activity set data and activity set user interface
US10693748B2 (en) 2017-04-12 2020-06-23 Microsoft Technology Licensing, Llc Activity feed service
US10853220B2 (en) 2017-04-12 2020-12-01 Microsoft Technology Licensing, Llc Determining user engagement with software applications
US10889958B2 (en) * 2017-06-06 2021-01-12 Caterpillar Inc. Display system for machine
US11580088B2 (en) 2017-08-11 2023-02-14 Microsoft Technology Licensing, Llc Creation, management, and transfer of interaction representation sets
US20190205465A1 (en) * 2017-12-28 2019-07-04 Salesforce.Com, Inc. Determining document snippets for search results based on implicit user interactions
US11126630B2 (en) 2018-05-07 2021-09-21 Salesforce.Com, Inc. Ranking partial search query results based on implicit user interactions
US10831471B2 (en) 2018-07-19 2020-11-10 Microsoft Technology Licensing, Llc Source code file recommendation notification
US10459999B1 (en) * 2018-07-20 2019-10-29 Scrappycito, Llc System and method for concise display of query results via thumbnails with indicative images and differentiating terms
US11676015B2 (en) * 2020-05-28 2023-06-13 Salesforce, Inc. Personalized recommendations using a transformer neural network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5924096A (en) 1997-10-15 1999-07-13 Novell, Inc. Distributed database using indexed into tags to tracks events according to type, update cache, create virtual update log on demand
US20040205093A1 (en) 1999-12-01 2004-10-14 Jin Li Methods and systems for providing random access to structured media content
US6910029B1 (en) 2000-02-22 2005-06-21 International Business Machines Corporation System for weighted indexing of hierarchical documents

Family Cites Families (64)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US52963A (en) * 1866-03-06 Improvement in carts
US54174A (en) * 1866-04-24 Improved sad-iron
US40591A (en) * 1863-11-10 Improvement in gas-heating apparatus
US43231A (en) * 1864-06-21 Improved tire or hoop bender
US46401A (en) * 1865-02-14 Improved milling-machine
US44152A (en) * 1864-09-13 Improved tile-machine
US154476A (en) * 1874-08-25 Improvement in sulky-plows
US99817A (en) * 1870-02-15 of chicago
US34078A (en) * 1862-01-07 Improvement in scroll-saws
US83158A (en) * 1868-10-20 Frank a
US83025A (en) * 1868-10-13 Improved sofa-bedstead
US40590A (en) * 1863-11-10 Improvement in wrenches
US43232A (en) * 1864-06-21 Improvement in preserving fruits
US87525A (en) * 1869-03-02 George tefft
US52930A (en) * 1866-02-27 Improvement in wrenches
US78204A (en) * 1868-05-26 Improved lounge
US80156A (en) * 1868-07-21 James k
US32689A (en) * 1861-07-02 Improvement in projectiles for ordnance
US80155A (en) * 1868-07-21 brisk ell
US54130A (en) * 1866-04-24 Improvement in lever-power of windlasses
US5555376A (en) 1993-12-03 1996-09-10 Xerox Corporation Method for granting a user request having locational and contextual attributes consistent with user policies for devices having locational attributes consistent with the user request
US5493692A (en) 1993-12-03 1996-02-20 Xerox Corporation Selective delivery of electronic messages in a multiple computer system based on context and environment of a user
US5812865A (en) 1993-12-03 1998-09-22 Xerox Corporation Specifying and establishing communication data paths between particular media devices in multiple media device computing systems based on context of a user or users
US6092725A (en) 1997-01-24 2000-07-25 Symbol Technologies, Inc. Statistical sampling security methodology for self-scanning checkout system
US6035104A (en) 1996-06-28 2000-03-07 Data Link Systems Corp. Method and apparatus for managing electronic documents by alerting a subscriber at a destination other than the primary destination
US7040541B2 (en) 1996-09-05 2006-05-09 Symbol Technologies, Inc. Portable shopping and order fulfillment system
US6837436B2 (en) 1996-09-05 2005-01-04 Symbol Technologies, Inc. Consumer interactive shopping system
US6409086B1 (en) 1997-08-08 2002-06-25 Symbol Technologies, Inc. Terminal locking system
EP0938053B1 (en) 1998-02-20 2003-08-20 Hewlett-Packard Company, A Delaware Corporation Methods of refining descriptors
US6640214B1 (en) 1999-01-16 2003-10-28 Symbol Technologies, Inc. Portable electronic terminal and data processing system
US7010501B1 (en) 1998-05-29 2006-03-07 Symbol Technologies, Inc. Personal shopping system
JP3607093B2 (en) * 1998-09-10 2005-01-05 シャープ株式会社 Information management apparatus and recording medium on which program is recorded
AT406588B (en) * 1998-09-29 2000-06-26 Chemiefaser Lenzing Ag METHOD FOR PRODUCING CELLULOSIC FIBERS
US7055101B2 (en) 1998-12-18 2006-05-30 Tangis Corporation Thematic response to a computer user's context, such as by a wearable personal computer
US7107539B2 (en) 1998-12-18 2006-09-12 Tangis Corporation Thematic response to a computer user's context, such as by a wearable personal computer
US6801223B1 (en) 1998-12-18 2004-10-05 Tangis Corporation Managing interactions between computer users' context models
US7076737B2 (en) 1998-12-18 2006-07-11 Tangis Corporation Thematic response to a computer user's context, such as by a wearable personal computer
US6513046B1 (en) 1999-12-15 2003-01-28 Tangis Corporation Storing and recalling information to augment human memories
US6466232B1 (en) 1998-12-18 2002-10-15 Tangis Corporation Method and system for controlling presentation of information to a user based on the user's condition
US6842877B2 (en) 1998-12-18 2005-01-11 Tangis Corporation Contextual responses based on automated learning techniques
US6968333B2 (en) 2000-04-02 2005-11-22 Tangis Corporation Soliciting information based on a computer user's context
US7137069B2 (en) 1998-12-18 2006-11-14 Tangis Corporation Thematic response to a computer user's context, such as by a wearable personal computer
US7779015B2 (en) 1998-12-18 2010-08-17 Microsoft Corporation Logging and analyzing context attributes
US8181113B2 (en) 1998-12-18 2012-05-15 Microsoft Corporation Mediating conflicts in computer users context data
US6812937B1 (en) 1998-12-18 2004-11-02 Tangis Corporation Supplying enhanced computer user's context data
US6791580B1 (en) 1998-12-18 2004-09-14 Tangis Corporation Supplying notifications related to supply and consumption of user context data
US6747675B1 (en) 1998-12-18 2004-06-08 Tangis Corporation Mediating conflicts in computer user's context data
US20010030664A1 (en) 1999-08-16 2001-10-18 Shulman Leo A. Method and apparatus for configuring icon interactivity
US6353398B1 (en) 1999-10-22 2002-03-05 Himanshu S. Amin System for dynamically pushing information to a user utilizing global positioning system
WO2002033541A2 (en) 2000-10-16 2002-04-25 Tangis Corporation Dynamically determining appropriate computer interfaces
US20020044152A1 (en) 2000-10-16 2002-04-18 Abbott Kenneth H. Dynamic integration of computer generated and real world images
US20020054130A1 (en) 2000-10-16 2002-05-09 Abbott Kenneth H. Dynamically displaying current status of tasks
US7233933B2 (en) 2001-06-28 2007-06-19 Microsoft Corporation Methods and architecture for cross-device activity monitoring, reasoning, and visualization for providing status and forecasts of a users' presence and availability
JP2003044511A (en) * 2001-07-26 2003-02-14 Fuji Photo Film Co Ltd Display device, image pickup device, image retrieval device, and program
USD494584S1 (en) 2002-12-05 2004-08-17 Symbol Technologies, Inc. Mobile companion
JP2005018530A (en) * 2003-06-27 2005-01-20 Toshiba Corp Information processor, information processing program, and information processing method
JP2005031906A (en) * 2003-07-10 2005-02-03 Ntt Docomo Inc Information communication terminal device and process shared server device
US20050033777A1 (en) 2003-08-04 2005-02-10 Moraes Mark A. Tracking, recording and organizing changes to data in computer systems
US7831679B2 (en) 2003-10-15 2010-11-09 Microsoft Corporation Guiding sensing and preferences for context-sensitive services
US7165722B2 (en) * 2004-03-10 2007-01-23 Microsoft Corporation Method and system for communicating with identification tags
US7925995B2 (en) 2005-06-30 2011-04-12 Microsoft Corporation Integration of location logs, GPS signals, and spatial resources for identifying user activities, goals, and context
US7979252B2 (en) 2007-06-21 2011-07-12 Microsoft Corporation Selective sampling of user state based on expected utility
US7991718B2 (en) 2007-06-28 2011-08-02 Microsoft Corporation Method and apparatus for generating an inference about a destination of a trip using a combination of open-world modeling and closed world modeling
US9295346B2 (en) 2008-03-11 2016-03-29 Hallmark Cards, Incorporated Method of and apparatus for displaying merchandise

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP1897002A4

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200272631A1 (en) * 2007-01-31 2020-08-27 Paypal, Inc. Selective presentation of data items
EP2244195A3 (en) * 2009-04-22 2011-01-12 Palo Alto Research Center Incorporated System and method for implicit tagging of documents using search query data
WO2014162033A1 (en) * 2013-04-01 2014-10-09 Crambo Sa Method, mobile device, system and computer product for detecting and measuring the attention level of a user
US20160094670A1 (en) * 2013-04-01 2016-03-31 Nilo Garcia Manchado Method, mobile device, system and computer product for detecting and measuring the attention level of a user
WO2019112673A1 (en) * 2017-12-05 2019-06-13 Google Llc Identifying videos with inappropriate content by processing search logs
US11403337B2 (en) 2017-12-05 2022-08-02 Google Llc Identifying videos with inappropriate content by processing search logs
CN114579893A (en) * 2022-05-09 2022-06-03 山东大学 Continuous POI recommendation method and system

Also Published As

Publication number Publication date
EP1897002A2 (en) 2008-03-12
US7693817B2 (en) 2010-04-06
KR20080024157A (en) 2008-03-17
JP5021640B2 (en) 2012-09-12
KR101242369B1 (en) 2013-03-14
EP1897002B1 (en) 2018-05-16
CN101501627A (en) 2009-08-05
WO2007005382A3 (en) 2009-04-16
CN101501627B (en) 2011-12-21
EP1897002A4 (en) 2016-09-14
JP2009500747A (en) 2009-01-08
US20070016553A1 (en) 2007-01-18

Similar Documents

Publication Publication Date Title
US7693817B2 (en) Sensing, storing, indexing, and retrieving data leveraging measures of user activity, attention, and interest
US7225187B2 (en) Systems and methods for performing background queries from content and activity
US7162473B2 (en) Method and system for usage analyzer that determines user accessed sources, indexes data subsets, and associated metadata, processing implicit queries based on potential interest to users
JP5536022B2 (en) Systems, methods, and interfaces for providing personalized search and information access
US7716150B2 (en) Machine learning system for analyzing and establishing tagging trends based on convergence criteria
Johnson et al. Web content mining techniques: a survey
US7672909B2 (en) Machine learning system and method comprising segregator convergence and recognition components to determine the existence of possible tagging data trends and identify that predetermined convergence criteria have been met or establish criteria for taxonomy purpose then recognize items based on an aggregate of user tagging behavior
US8131684B2 (en) Adaptive archive data management
Chen et al. User intention modeling in web applications using data mining
US20190213407A1 (en) Automated Analysis System and Method for Analyzing at Least One of Scientific, Technological and Business Information
Barry Crabtree et al. Identifying and tracking changing interests
US8135669B2 (en) Information access with usage-driven metadata feedback
US20090322756A1 (en) Using visual techniques to manipulate data
Kang et al. Making sense of archived e‐mail: Exploring the Enron collection with NetLens
Aliakbary et al. Web page classification using social tags
Wable Information Retrieval in Business
CN113987146B (en) Dedicated intelligent question-answering system of electric power intranet
Mudgal et al. STATE OF THE ART CONTENT MINING USING SCAN TECHNOLOGY
Schutz Outer Context: A not yet Discovered Jewel for Document-Based Information Mining
Abolhassani et al. Web Page Classification Using Social Tags
Gonçalves et al. Understanding Stories about Personal Documents

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase (Ref document number: 200680022672.6; Country of ref document: CN)
121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWE WIPO information: entry into national phase (Ref document number: 2006774032; Country of ref document: EP)
WWE WIPO information: entry into national phase (Ref document number: 1020077030613; Country of ref document: KR)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2008520267; Country of ref document: JP; Kind code of ref document: A)