|Publication number||US7765212 B2|
|Application number||US 11/321,963|
|Publication date||Jul 27, 2010|
|Filing date||Dec 29, 2005|
|Priority date||Dec 29, 2005|
|Also published as||US20070156732|
|Publication number||11321963, 321963, US 7765212 B2, US 7765212B2, US-B2-7765212, US7765212 B2, US7765212B2|
|Inventors||Arungunram C. Surendran, Erin L. Renshaw, John C. Platt|
|Original Assignee||Microsoft Corporation|
Storage capacity on computing devices has increased tremendously over a relatively short period of time, thereby enabling users and businesses to create and store a substantial amount of data. For example, hard drive space on today's consumer computers is on the order of hundreds of gigabytes. Servers and other higher-level devices can be associated with a significantly greater amount of storage space. This growth in storage capacity is not limited to personal computers and servers, but has also reached the portable device space, such as portable telephones, personal digital assistants, portable media players, and other suitable hand-held devices.
The massive amount of storage space available to average consumers has enabled them to retain thousands if not millions of files. For example, photographs can be taken through use of a digital camera and then transferred and retained on a computing device. Thus, a computing device can effectively be utilized as a photograph album. In a similar vein, music files can be ripped from a medium such as a compact disk and placed upon the computing device, thereby enabling the computing device to act as a juke box. Word processing documents can be created and retained, wherein such documents can relate to one's bills, reports, school papers, employment, investment portfolio, etc. Spreadsheet files, slide presentations, and other item types relating to any topic desired by the user can also be created and/or retained in a hard disk or memory of a computing device. Given the significant number of data files that may exist on a computing device, wherein such files can be created at different times and relate to different topics, it can be discerned that organization and/or indexing of such files can be extremely problematic.
To undertake data file organization, conventionally folders and sub-folders are created, wherein names and locations of the folders within a hierarchy are determined according to topic and content that is to be retained therein. This can be done manually and/or automatically; for instance, a user can manually create a folder, name the folder, and place the folder in a desired location. Thereafter, the user can move data/files to such folder and/or cause newly created data/files to be saved in the folder. Folders can also be created automatically through one or more programs. For example, digital cameras typically store files in folders that are named by date; thus, digital photographs can be stored in a folder that recites the date on which the photographs therein were taken. This approach works well for a small number of files created over a relatively short time frame, as users can remember locations of folders and contents that were stored therein. When the number of files and folders increases and time passes, however, users have difficulty remembering where items that they wish to retrieve are located, what they were named, etc. A search for file content or name can then be employed, but often this search is deficient in locating desired data, as a user may not remember a name of a file, when such file was created, and other parameters that can be searched. To cause even further difficulty, a file may be related to a particular topic, but a search function cannot be employed due to lack of content or lack of particular wording.
A similar problem exists with respect to emails, as users can retain hundreds if not thousands of emails. Currently, organizing such emails requires a significant amount of labeling by a user. For instance, a user can categorize emails from a particular sender as “junk” email, thus causing each email delivered from such sender to be provided to a certain folder. Similarly, users can manually create folders and drag emails into such folders to organize emails. Furthermore, an email application can be trained to automatically direct emails to a particular folder. However, when an email that belongs to more than one folder is assigned to a single folder, the other folders are left incomplete. Additionally, items moved outside of an inbox are typically ignored.
The following presents a simplified summary in order to provide a basic understanding of some aspects of the claimed subject matter. This summary is not an extensive overview, and is not intended to identify key/critical elements or to delineate the scope of the claimed subject matter. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
The claimed subject matter relates generally to document organization, and more particularly to automatic document organization through automatic discovery of topics of interest of a user from their email. To effectuate this automatic organization of documents, emails associated with a user can be received and clustered by way of any suitable clustering algorithm(s). For example, the clustering can be undertaken such that an email can reside within a single cluster, or the clustering can be undertaken such that an email can reside within multiple clusters. The emails can be received from a web-based email service and/or an email application resident upon a client. Thus, all emails associated with a particular user can be analyzed and employed in connection with automatic organization of documents. In one particular example, multi-level clustering can be undertaken against the received emails, wherein multi-level clustering refers to undertaking several clustering acts against the received emails.
Upon the emails being placed into one or more clusters, key phrases can be extracted from multiple emails within the clusters. These extracted key phrases can be representative of a topic of personal interest to the user, and documents can be assigned to such topics. Documents that can be assigned to topics include emails, word processing documents, spreadsheets, digital images, video files, audio files, and any other suitable type of electronic file. Extracting key phrases from clustered emails is advantageous in that users often communicate over email with respect to areas of personal relevance to the user. Thus, topics that are highly relevant to the user can be automatically generated and utilized to organize emails as well as other documents.
To ensure that extracted key phrases are personalized with respect to the user, various filtering mechanisms can be employed to remove terms that are too general and/or not typically related to personalized topics. For example, a candidate list of key phrases can be reduced to noun phrases in subject lines of emails. Similarly, dates, days of the week, and the like can be removed as candidates for key phrases that characterize a particular topic. Still further, names of recipients and senders of emails can be collected and employed to refine key phrases extracted from clusters of documents.
Upon determining topics of interest to a user, any suitable document can be assigned to one or more topics that are characterized by the key phrases. For example, text associated with a document can be analyzed to determine a measure of relevance between the document and a particular topic. If the measure of relevance is above a threshold, the document can be assigned to the topic. Furthermore, it is understood that documents can be assigned to multiple topics. A user interface can be employed by a user to quickly access documents according to topic. For example, upon selection of a particular topic, documents associated with such topic can be provided to the user.
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings. These aspects are indicative, however, of but a few of the various ways in which the principles of the claimed subject matter may be employed and the claimed matter is intended to include all such aspects and their equivalents. Other advantages and novel features may become apparent from the following detailed description when considered in conjunction with the drawings.
The subject invention is now described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the claimed subject matter. It may be evident, however, that such subject matter may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing the subject invention.
As used in this application, the terms “component” and “system” are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and a computer. By way of illustration, both an application running on a server and the server can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers. The word “exemplary” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs.
Furthermore, aspects of the claimed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement various aspects of the subject invention. The term “article of manufacture” as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or media. For example, computer readable media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips, . . . ), optical disks (e.g., compact disk (CD), digital versatile disk (DVD), . . . ), smart cards, and flash memory devices (e.g., card, stick, key drive, . . . ). Additionally it should be appreciated that a carrier wave can be employed to carry computer-readable electronic data such as those used in transmitting and receiving electronic mail or in accessing a network such as the Internet or a local area network (LAN). Of course, those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope or spirit of what is described herein.
The claimed subject matter will now be described with respect to the drawings, where like reference numerals refer to like elements throughout. The claimed subject matter relates to automatically discovering topics of interest associated with a user by reviewing email data of such user. These topics of interest can then be employed to automatically organize items (including emails) associated with a user. Further, the systems, methods, articles of manufacture, and/or apparatuses described herein can be considered as being unsupervised and automatic, meaning that a pre-existing folder structure is not necessary to determine topics and organize items based at least in part upon the determined topics. Thus, an entire document store can be arranged into topics that are meaningful to the user.
To that end,
Content of the emails 104 received by the clustering component 102 can be employed to discover topics that are of importance to an individual associated with the emails 104. For example, an individual's personal interests can be gleaned through analyzing content of their emails. As utilized herein, a topic can be defined as any cohesive concept that is relevant to a user, such as an activity in which the user participates, an event the user organized or attended, a person or group of people within an organization to which the user belongs, etc. Furthermore, groups of people can sometimes be defined by concepts that appear in the emails 104, such as a project, a person, an activity, a mailing group, and the like. In another example, a group of people can be defined by information not within the emails 104, such as a circle of friends that do not utilize the term “friends” in email to refer to one another. Most commonly, a topic is signaled by occurrence of words relating to a common activity. The clustering component 102 can be thought of as a mechanism for deriving topics. Upon receipt of the emails 104, the clustering component 102 can cluster the plurality of emails 104 into a plurality of disparate clusters. For example, the clustering can be undertaken such that an email can be assigned to several clusters. Alternatively, the clustering component 102 can cluster the emails 104 such that an email may only be assigned to a single cluster.
In more detail, the clustering component 102 can employ any suitable clustering algorithm(s) to effectuate clustering of the emails 104. For example, to minimize variations that may be associated with clustering, a multi-level clustering scheme can be employed (where different clustering actions can be performed in a certain sequence). For instance, the emails 104 can be represented using tf-idf (term frequency-inverse document frequency) vectors of particular words within the emails 104, and a cosine distance measure can be employed to measure similarity between different emails. Clusters can then be initialized through employment of agglomerative clustering on a small sample of the emails 104. Thereafter, K-means can be run utilizing the initializations on each of the emails 104 within the data store 106. Probabilistic Latent Semantic Analysis can then be run utilizing the K-means clusters as initial clusters. During each clustering stage, clusters that are not associated with a threshold number or threshold percentage of the emails 104 can be removed from further consideration, as providing too many topics will not aid a user in organizing emails and/or documents. It is understood that while examples of clustering acts have been described above, any suitable clustering algorithm can be employed in connection with clustering the emails 104. For example, a mixture of multinomials and hierarchical agglomerative clustering can be employed. Further, Probabilistic Latent Semantic Analysis can be run with random initialization.
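As a purely illustrative sketch, and not the implementation of the claimed subject matter, the first stages of such a scheme might look as follows in Python. The function names, the raw-count term frequency, the natural-logarithm idf weighting, and the toy emails are all assumptions made for the example. The seed vectors simply stand in for the output of the agglomerative step, and the Probabilistic Latent Semantic Analysis stage is omitted.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (term -> weight) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: raw term count times log(N / document frequency).
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def kmeans(vectors, seeds, iterations=10):
    """K-means-style loop using cosine similarity, started from the given seeds."""
    centroids = [dict(s) for s in seeds]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its most similar centroid.
        assignment = [
            max(range(len(centroids)),
                key=lambda k: cosine_similarity(v, centroids[k]))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:
                total = Counter()
                for v in members:
                    for t, w in v.items():
                        total[t] += w
                centroids[k] = {t: w / len(members) for t, w in total.items()}
    return assignment

# Hypothetical emails, already tokenized.
emails = [
    "soccer practice saturday field".split(),
    "soccer game field schedule".split(),
    "budget report quarterly review".split(),
    "quarterly budget meeting review".split(),
]
vectors = tfidf_vectors(emails)
seeds = [vectors[0], vectors[2]]  # stand-in for agglomerative initialization
labels = kmeans(vectors, seeds)
print(labels)
```

In this toy run the two soccer emails share one cluster and the two budget emails share the other, because emails in different groups have no terms in common.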
To characterize a topic, multi-document key phrase extraction can be employed to extract one or more key phrases from emails within each cluster. In more detail, a select number of characteristic keywords and/or key phrases of a topic can be extracted from each cluster. An advantage of employing Probabilistic Latent Semantic Analysis for clustering is that each cluster is automatically characterized by distribution of words in such cluster. Multi-document key phrase extraction is described in greater detail below.
Upon determining a select number of topics and labeling such topics, an organization component 110 can be employed to organize the emails 104 and the documents 108 through employment of the key phrases that characterize the topics. For instance, a word processing document can be associated with a particular topic by comparing content of such document with key phrases extracted from one or more clusters. Metadata can be assigned to each of the documents 108 to indicate topic(s) to which each of the documents belongs. For instance, each document can be associated with a relevance measure to various topics, and can be assigned to topics where the relevance measure is above a pre-defined threshold. In a detailed example, the relevance measure may be a sum of tf-idf counts of all keywords/key phrases in a particular document. Other relevance measuring techniques, however, are contemplated and intended to fall under the scope of the hereto-appended claims. The organization component 110 can analyze such relevance scores and assign each document within the data store 106 to one or more topics. Furthermore, a user can define a subset of the documents that are to be assigned to topics. For instance, the user may wish that word processing and spreadsheet documents be automatically associated with topics while not wishing to associate digital photographs with topics.
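The relevance measure in the detailed example above can be sketched as follows. This is a minimal illustration under stated assumptions: key phrases are treated as single tokens, idf is the natural logarithm of the corpus size over document frequency, and the function names, topic labels, document frequencies, and thresholds are hypothetical.

```python
import math
from collections import Counter

def relevance(doc_tokens, key_terms, doc_freq, num_docs):
    """Sum of tf-idf weights of a topic's key terms within one document."""
    tf = Counter(doc_tokens)
    score = 0.0
    for term in key_terms:
        df = doc_freq.get(term)
        if df and tf[term]:
            score += tf[term] * math.log(num_docs / df)
    return score

def assign_topics(doc_tokens, topics, doc_freq, num_docs, threshold):
    """Return every topic whose relevance to the document exceeds the threshold."""
    return [
        name for name, terms in topics.items()
        if relevance(doc_tokens, terms, doc_freq, num_docs) > threshold
    ]

# Hypothetical topics and corpus statistics.
topics = {"soccer": ["soccer", "field"], "budget": ["budget", "quarterly"]}
doc_freq = {"soccer": 2, "field": 2, "budget": 2, "quarterly": 2}
num_docs = 8
doc = "notes on the quarterly budget and the soccer field budget".split()

print(assign_topics(doc, topics, doc_freq, num_docs, threshold=2.0))
print(assign_topics(doc, topics, doc_freq, num_docs, threshold=3.0))
```

With the lower threshold the document is assigned to both topics, which matches the statement that a document can belong to more than one topic. With the higher threshold only the budget topic remains.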
Turning now to
The system 200 can further include a filtering component 204 that removes a cluster if the number of emails within such cluster is below a predefined threshold and/or a predefined percentage of the total number of emails. For instance, if the clustering component 102 utilizes a multi-level clustering scheme when clustering the emails 104, clusters that do not meet certain criteria at separate clustering steps can be removed. In a detailed example, after agglomerative clustering and running K-means, clusters that do not include a threshold percentage of the emails 104 can be removed. Similarly, after performing Probabilistic Latent Semantic Analysis, topics (characterized by particular key phrases) that do not exceed a certain threshold (e.g., 0.1) can be removed from consideration. The system 200 can also include a post-processing component 206 that can remove domain-dependent words, wherein such words may not be meaningful in connection with representing a topic. For example, a name of a department within which an individual is assigned may appear within a multitude of emails, and thus may not be representative of a topic.
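A size-based cluster filter of the kind attributed to the filtering component 204 could be sketched as below. The function name, parameter names, and sample assignment are assumptions for illustration; the description does not fix particular threshold values for this step.

```python
from collections import Counter

def filter_small_clusters(assignment, min_count=0, min_fraction=0.0):
    """Return the ids of clusters that meet both the count and fraction thresholds.

    `assignment` holds one cluster id per email.
    """
    sizes = Counter(assignment)
    total = len(assignment)
    return {
        cluster for cluster, size in sizes.items()
        if size >= min_count and size / total >= min_fraction
    }

# Hypothetical assignment of eight emails to three clusters.
assignment = [0, 0, 0, 0, 1, 1, 1, 2]
print(filter_small_clusters(assignment, min_fraction=0.2))
print(filter_small_clusters(assignment, min_count=4))
```

Cluster 2 holds one of eight emails and falls under the 20 percent cutoff, so only clusters 0 and 1 survive the first call. The second call keeps only cluster 0.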
After pre-processing, post-processing, and/or filtering associated with the clustering component 102, key phrase extraction can be completed by a key phrase extraction component 208. The key phrase extraction component 208 can extract key phrases from multiple emails that exist within particular clusters. As stated above, one advantage of performing Probabilistic Latent Semantic Analysis is that automatic characterization of topics can be completed based upon distribution of words in such topics. For instance, words can be selected as key phrases if they are within a threshold number (e.g., half) of the probability of a most likely word associated with the cluster. This can limit a number of key phrases that characterize topics to a reasonable number. Additionally, words can be extracted as additional keywords that lie between threshold values (e.g., one half and one fifth) of the word most likely to be associated with the topic. Further, words that are sub-phrases of selected key phrases can be removed from a list of key phrases associated with a topic. For instance, if the phrase “puzzles and logic” is associated with a topic, then the phrase “puzzles” can be removed. Moreover, words that are associated with an individual's name can be prohibited from characterizing a topic unless they are the only words that can characterize the topic.
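One possible reading of these selection rules is sketched below. It interprets "within half of the probability of the most likely word" as a probability of at least one half of the top probability, which is an assumption, as are the function names and the sample distribution. The rule concerning personal names is omitted.

```python
def select_key_phrases(phrase_probs, primary_ratio=0.5, secondary_ratio=0.2):
    """Split phrases into primary and additional key phrases by probability.

    Primary: probability >= primary_ratio * top probability.
    Additional: probability between secondary_ratio and primary_ratio of the top.
    """
    top = max(phrase_probs.values())
    ranked = sorted(phrase_probs.items(), key=lambda item: -item[1])
    primary = [p for p, prob in ranked if prob >= primary_ratio * top]
    additional = [
        p for p, prob in ranked
        if secondary_ratio * top <= prob < primary_ratio * top
    ]
    return primary, additional

def drop_sub_phrases(phrases):
    """Remove any phrase that occurs as a whole-word sub-phrase of another."""
    def contained(short, long):
        return short != long and f" {short} " in f" {long} "
    return [p for p in phrases if not any(contained(p, q) for q in phrases)]

# Hypothetical per-topic distribution, e.g. taken from a PLSA cluster.
probs = {"puzzles and logic": 0.30, "puzzles": 0.20, "chess": 0.10, "meeting": 0.02}
primary, additional = select_key_phrases(probs)
print(primary)
print(additional)
print(drop_sub_phrases(primary))
```

Here "puzzles" qualifies as a primary key phrase by probability but is then dropped because it is a sub-phrase of "puzzles and logic", mirroring the example in the text.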
Identification of the topics characterized by key phrases can be provided to the organization component 110, which can analyze the documents 108 and automatically organize such documents based at least in part upon the key phrases. For instance, the organization component 110 can evaluate text associated with the documents 108 in light of key phrases that characterize/identify a topic. To that end, the organization component 110 can include a probability component 210 that determines a probability that a document belongs to a specific topic. This probability can be determined, for instance, by discerning a number of instances of the key phrases within the document. Any suitable manner for determining a relevance score between a document and a topic, however, is contemplated and intended to fall under the scope of the hereto-appended claims. If the determined probability is above a threshold, the document can be assigned to the topic. Furthermore, a document can reside within multiple topics. For example, the document can be associated with a probability above a threshold with respect to multiple topics, and can be organized accordingly.
Referring now to
Upon the key phrase extraction component 208 characterizing topics (clusters) with key phrases, such topics can be relayed to the organization component 110. The organization component 110 can access the documents 108 within the data store 106 and organize such documents 108 based at least in part upon the key phrases. As described above, each of the documents 108 can be analyzed in light of the key phrases.
Referring now to
The system 400 can additionally include an assignment component 404 that is employed to assign an incoming email to a particular topic. For example, it may not be desirable to perform clustering each time an email is received by an email application. The assignment component 404 can assign the incoming email to at least one topic based at least in part upon a calculated probability of relevance (e.g., relevance measure) with respect to the at least one topic. Thus, the assignment component 404 and the organization component 110 can communicate with one another to complete the assignment. Determining a relevance measure with respect to documents has been described above.
Referring now to
The organization component 110 can be associated with an interface component 502 that automatically arranges topics defined by the clustering component 102 and presents such topics (and documents associated with such topics) to the user. For instance, the topics can be arranged by the interface component 502 in alphabetical order, according to a number of emails associated with the topics, or any other suitable manner for organizing the topics. Documents associated with the topics can similarly be arranged alphabetically according to title (or subject line), arranged according to sender of an email message, or any suitable manner of arranging the documents with respect to a topic. In one example, the interface component 502 can comprise a relevance calculator component 504 that calculates a relevance measure of an email with respect to a topic that includes the email. The relevance calculator component 504 can use standard relevance formulae in the ranking of the documents. For example, the documents can be ranked by the dot product of the tf-idf vector of each document with the tf-idf vector of the key phrases extracted for that topic. Other relevance formulae, such as BM25, can also be used. The interface component 502 can then display the documents associated with topics according to the calculated relevance. Thus, documents with a highest relevance score with respect to a topic can be displayed most prominently upon a user selecting the topic.
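The dot-product ranking mentioned above can be illustrated with sparse vectors stored as dictionaries. The vectors, file names, and function names here are hypothetical, and the tf-idf weights are assumed to have been computed elsewhere. BM25 is not shown.

```python
def dot(a, b):
    """Dot product of two sparse vectors stored as term -> weight dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def rank_documents(doc_vectors, topic_vector):
    """Order document names by descending dot product with the topic vector."""
    scored = [(dot(vec, topic_vector), name) for name, vec in doc_vectors.items()]
    # Ties are broken by name so that the ordering is deterministic.
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1]))]

# Hypothetical precomputed tf-idf vectors.
topic_vector = {"soccer": 1.2, "field": 0.8}
doc_vectors = {
    "a.doc": {"soccer": 2.0, "budget": 1.0},
    "b.doc": {"field": 1.0},
    "c.doc": {"budget": 3.0},
}
print(rank_documents(doc_vectors, topic_vector))
```

The document that shares the most heavily weighted key phrase with the topic is listed first, which corresponds to displaying the most relevant documents most prominently.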
Now referring to
The user context discovered by the context discovery component 602 can be provided to a machine-learning component 604, which can make inferences based upon current and historical contexts and provide the organization component 110 with instructions based at least in part upon the inferences. As used herein, the term “inference” refers generally to the process of reasoning about or inferring states of the system, environment, and/or user from a set of observations as captured via events and/or data. Inference can be employed to identify a specific context or action, or can generate a probability distribution over states, for example. The inference can be probabilistic—that is, the computation of a probability distribution over states of interest based on a consideration of data and events. Inference can also refer to techniques employed for composing higher-level events from a set of events and/or data. Such inference results in the construction of new events or actions from a set of observed events and/or stored event data, whether or not the events are correlated in close temporal proximity, and whether the events and data come from one or several event and data sources. Various classification schemes and/or systems (e.g., support vector machines, neural networks, expert systems, Bayesian belief networks, fuzzy logic, data fusion engines, . . . ) can be employed in connection with performing automatic and/or inferred action. In a particular example, the machine-learning component 604 can determine that during a lunch break the user reviews information relating to a certain sports team. Such determination can be made based upon current and previous contexts provided by the context discovery component 602. Thereafter, the machine-learning component 604 can infer with a threshold probability of correctness that the user desires a certain topic to be prominently displayed.
Referring now to
Referring specifically to
At 708, key phrase extraction can be performed on multiple emails within the clusters. Before or after clustering, a candidate number of possible key phrases can be reduced through various filtering mechanisms. For example, candidate key phrases can be limited to noun phrases that occur within subject lines of one or more emails and are associated with a sufficient number of repetitions within bodies of emails. Similarly, dates, days of a week, and the like can be filtered from a list of possible key phrases. At 710, each cluster is labeled with one or more extracted key phrases. These extracted key phrases can then be employed to characterize a topic. At 712, documents are organized based at least in part upon the labels. In one particular example, the key phrases can be compared with content of each document. More specifically, a relevance ranking can be calculated, wherein such ranking is based upon a sum of tf-idf counts of all keywords in a particular document. If the relevance ranking is above a threshold with respect to a particular topic, the document can be assigned to such topic. Other manners for determining relevance are also contemplated. The methodology 700 then completes at 714.
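The candidate filtering at 708 might be sketched as follows. This is an illustration only: noun-phrase detection would require a part-of-speech tagger and is left out, so the input is assumed to already be a list of candidate phrases taken from subject lines. The date pattern, the minimum repetition count, and all names and sample data are assumptions.

```python
import re

DAYS = {"monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"}
MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}
# Assumed numeric date forms such as 12/25 or 12-25-2005.
DATE_PATTERN = re.compile(r"^\d{1,2}[/-]\d{1,2}([/-]\d{2,4})?$")

def is_date_like(phrase):
    """True if every word is a day name, month name, number, or numeric date."""
    words = phrase.lower().split()
    return all(
        w in DAYS or w in MONTHS or w.isdigit() or DATE_PATTERN.match(w)
        for w in words
    )

def filter_candidates(subject_phrases, bodies, min_repeats=2):
    """Keep subject-line phrases that are not dates and recur in email bodies."""
    kept = []
    for phrase in subject_phrases:
        if is_date_like(phrase):
            continue
        count = sum(body.lower().count(phrase.lower()) for body in bodies)
        if count >= min_repeats:
            kept.append(phrase)
    return kept

# Hypothetical candidates and email bodies.
subjects = ["soccer practice", "Friday", "12/25/2005", "budget"]
bodies = [
    "Soccer practice moved. Bring gear to soccer practice.",
    "The budget is due Friday.",
]
print(filter_candidates(subjects, bodies))
```

In this run the day name and the numeric date are rejected as date-like, and "budget" is rejected because it appears only once in the bodies, leaving "soccer practice" as the sole candidate.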
Now referring to
Turning now to
Turning now to
In order to provide additional context for various aspects of the subject invention,
Generally, however, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular data types. The operating environment 1110 is only one example of a suitable operating environment and is not intended to suggest any limitation as to the scope of use or functionality of the features described herein. Other well known computer systems, environments, and/or configurations that may be suitable for use with the invention include but are not limited to, personal computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include the above systems or devices, and the like.
With reference to
The system bus 1118 can be any of several types of bus structure(s) including the memory bus or memory controller, a peripheral bus or external bus, and/or a local bus using any variety of available bus architectures including, but not limited to, 8-bit bus, Industry Standard Architecture (ISA), Micro-Channel Architecture (MCA), Extended ISA (EISA), Intelligent Drive Electronics (IDE), VESA Local Bus (VLB), Peripheral Component Interconnect (PCI), Universal Serial Bus (USB), Accelerated Graphics Port (AGP), Personal Computer Memory Card International Association bus (PCMCIA), and Small Computer Systems Interface (SCSI). The system memory 1116 includes volatile memory 1120 and nonvolatile memory 1122. The basic input/output system (BIOS), containing the basic routines to transfer information between elements within the computer 1112, such as during start-up, is stored in nonvolatile memory 1122. By way of illustration, and not limitation, nonvolatile memory 1122 can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory 1120 includes random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).
Computer 1112 also includes removable/nonremovable, volatile/nonvolatile computer storage media.
It is to be appreciated that
A user enters commands or information into the computer 1112 through input device(s) 1136. Input devices 1136 include, but are not limited to, a pointing device such as a mouse, trackball, stylus, touch pad, keyboard, microphone, joystick, game pad, satellite dish, scanner, TV tuner card, digital camera, digital video camera, web camera, remote control, and the like. As described above, advertisements can be provided to a user upon receipt of user input. These and other input devices connect to the processing unit 1114 through the system bus 1118 via interface port(s) 1138. Interface port(s) 1138 include, for example, a serial port, a parallel port, a game port, and a universal serial bus (USB). Output device(s) 1140 use some of the same type of ports as input device(s) 1136. Thus, for example, a USB port may be used to provide input to computer 1112, and to output information from computer 1112 to an output device 1140. Output adapter 1142 is provided to illustrate that there are some output devices 1140 like monitors, speakers, and printers among other output devices 1140 that require special adapters. The output adapters 1142 include, by way of illustration and not limitation, video and sound cards that provide a means of connection between the output device 1140 and the system bus 1118. It should be noted that other devices and/or systems of devices provide both input and output capabilities such as remote computer(s) 1144.
Computer 1112 can operate in a networked environment using logical connections to one or more remote computers, such as remote computer(s) 1144. The remote computer(s) 1144 can be a personal computer, a server, a router, a network PC, a workstation, a microprocessor based appliance, a peer device or other common network node and the like, and typically includes many or all of the elements described relative to computer 1112. For purposes of brevity, only a memory storage device 1146 is illustrated with remote computer(s) 1144. Remote computer(s) 1144 is logically connected to computer 1112 through a network interface 1148 and then physically connected via communication connection 1150. Network interface 1148 encompasses communication networks such as local-area networks (LAN) and wide-area networks (WAN). LAN technologies include Fiber Distributed Data Interface (FDDI), Copper Distributed Data Interface (CDDI), Ethernet/IEEE 802.3, Token Ring/IEEE 802.5 and the like. WAN technologies include, but are not limited to, point-to-point links, circuit switching networks like Integrated Services Digital Networks (ISDN) and variations thereon, packet switching networks, and Digital Subscriber Lines (DSL).
Communication connection(s) 1150 refers to the hardware/software employed to connect the network interface 1148 to the bus 1118. While communication connection 1150 is shown for illustrative clarity inside computer 1112, it can also be external to computer 1112. The hardware/software necessary for connection to the network interface 1148 includes, for exemplary purposes only, internal and external technologies such as, modems including regular telephone grade modems, cable modems and DSL modems, ISDN adapters, and Ethernet cards.
What has been described above includes examples of the claimed subject matter. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing such subject matter, but one of ordinary skill in the art may recognize that many further combinations and permutations are possible. Accordingly, the claimed subject matter is intended to embrace all such alterations, modifications, and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the term “includes” is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US5963940 *||Aug 14, 1996||Oct 5, 1999||Syracuse University||Natural language information retrieval system and method|
|US6128613 *||Apr 29, 1998||Oct 3, 2000||The Chinese University Of Hong Kong||Method and apparatus for establishing topic word classes based on an entropy cost function to retrieve documents represented by the topic words|
|US6167368 *||Aug 14, 1998||Dec 26, 2000||The Trustees Of Columbia University In The City Of New York||Method and system for indentifying significant topics of a document|
|US6253169 *||May 28, 1998||Jun 26, 2001||International Business Machines Corporation||Method for improvement accuracy of decision tree based text categorization|
|US6349307 *||Dec 28, 1998||Feb 19, 2002||U.S. Philips Corporation||Cooperative topical servers with automatic prefiltering and routing|
|US6446061 *||Jun 30, 1999||Sep 3, 2002||International Business Machines Corporation||Taxonomy generation for document collections|
|US6493663 *||Dec 7, 1999||Dec 10, 2002||Fuji Xerox Co., Ltd.||Document summarizing apparatus, document summarizing method and recording medium carrying a document summarizing program|
|US6578032 *||Jun 28, 2000||Jun 10, 2003||Microsoft Corporation||Method and system for performing phrase/word clustering and cluster merging|
|US6592627 *||Jun 10, 1999||Jul 15, 2003||International Business Machines Corporation||System and method for organizing repositories of semi-structured documents such as email|
|US6654743 *||Nov 13, 2000||Nov 25, 2003||Xerox Corporation||Robust clustering of web documents|
|US6691108 *||Dec 12, 2000||Feb 10, 2004||Nec Corporation||Focused search engine and method|
|US6871174 *||May 17, 2000||Mar 22, 2005||Microsoft Corporation||System and method for matching a textual input to a lexical knowledge base and for utilizing results of that match|
|US7228301 *||Jun 27, 2003||Jun 5, 2007||Microsoft Corporation||Method for normalizing document metadata to improve search results using an alias relationship directory service|
|US7565630 *||Jun 15, 2004||Jul 21, 2009||Google Inc.||Customization of search results for search queries received from third party sites|
|US7593932 *||Jan 13, 2003||Sep 22, 2009||Elucidon Group Limited||Information data retrieval, where the data is organized in terms, documents and document corpora|
|US7627590 *||Oct 25, 2004||Dec 1, 2009||Apple Inc.||System and method for dynamically presenting a summary of content associated with a document|
|US20020023136 *||Jun 15, 2001||Feb 21, 2002||Silver Edward Michael||Electronic mail (email) internet applicance methods and systems|
|US20020055936 *||Aug 20, 2001||May 9, 2002||Kent Ridge Digital Labs||Knowledge discovery system|
|US20020156810 *||Apr 19, 2001||Oct 24, 2002||International Business Machines Corporation||Method and system for identifying relationships between text documents and structured variables pertaining to the text documents|
|US20030182631 *||Mar 22, 2002||Sep 25, 2003||Xerox Corporation||Systems and methods for determining the topic structure of a portion of text|
|US20030220922 *||Mar 28, 2003||Nov 27, 2003||Noriyuki Yamamoto||Information processing apparatus and method, recording medium, and program|
|US20040117736 *||Dec 16, 2002||Jun 17, 2004||Palo Alto Research Center, Incorporated||Method and apparatus for normalizing quoting styles in electronic mail messages|
|US20040177048 *||Sep 27, 2003||Sep 9, 2004||Klug John R.||Method and apparatus for identifying, managing, and controlling communications|
|US20040177319 *||Jul 16, 2003||Sep 9, 2004||Horn Bruce L.||Computer system for automatic organization, indexing and viewing of information from multiple sources|
|US20040243388 *||Jun 3, 2003||Dec 2, 2004||Corman Steven R.||System amd method of analyzing text using dynamic centering resonance analysis|
|US20060085504 *||Oct 20, 2004||Apr 20, 2006||Juxing Yang||A global electronic mail classification system|
|US20060095521 *||Nov 4, 2004||May 4, 2006||Seth Patinkin||Method, apparatus, and system for clustering and classification|
|1||Andrew McCallum, et al. A Comparison of Event Models for Naive Bayes Text Classification. AAAI-98 Workshop on Text Categorization. 1998.|
|2||Douglass R. Cutting, et al. Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections. SIGIR 92, pp. 318-329, 1992.|
|3||Endong Xun. A Unified Statistical Model for the Identification of English BaseNP. ACL-2000, The 38th Annual Meeting of the Association for Computational Linguistics, Hong Kong. Oct. 2000.|
|4||Gary Boone. Concept Features in Re: Agent, an Intelligent Email Agent. Second International Conference on Autonomous Agents. May 10-13, 1998.|
|5||Gina D. Venolia, et al. Supporting Email Workflow. Microsoft Technical Report. 2001.|
|6||Kenrick Mock. An Experimental Framework for Email Categorization and Management. SIGIR. 2001.|
|7||Olle Balter, et al. Bifrost Inbox Organizer: Giving users control over the inbox. NADA Technical Report TRITA-NA-p. 0101. Royal. Oct. 2002.|
|8||Paul Dourish et al. Presto: An Experimental Architecture for Fluid Interactive Document Spaces. ACM Transactions on Computer-Human Interaction, 1999.|
|9||Richard B. Segal, et al. MailCat: An Intelligent Assistant for Organizing E-Mail. In Proceedings of the 3rd International Conference on Autonomous Agents. 1999.|
|10||Steve Whittaker, et al. Email overload: exploring personal information management of email. Proceedings of the ACM Conference on Human Factors in Computing Systems. 1996.|
|11||Thomas Hofmann. Probabilistic Latent Semantic Indexing. Machine Learning. 2001.|
|12||*||Tzoukermann et al., GIST-IT: summarizing email using linguistic knowledge and machine learning, Annual Meeting of the ACL Proceedings of the workshop on Human Language Technology and Knowledge Management-vol. 2001; Toulouse, France pp. 1-8, Year of Publication: 2001, Association for Computational Linguistics.|
|13||*||Tzoukermann et al., GIST-IT: summarizing email using linguistic knowledge and machine learning, Annual Meeting of the ACL Proceedings of the workshop on Human Language Technology and Knowledge Management-vol. 2001; Toulouse, France pp. 1-8, Year of Publication: 2001, Association for Computational Linguistics.|
|14||Victoria Bellotti, et al. Taking Email to Task: The Design and Evaluation of a Task Management Centered Email Tool. Proceedings of the Conference on Human Factors in Computing Systems (CHI-2003). 2003.|
|15||Wendy E. Mackay. More Than Just a Communication System: Diversity in the Use of Electronic Mail. Proceedings of the CSCW 1988 Conference on Computer Supported Co-operative Work. 1988.|
|16||Yifen Huang, et al. Inferring Ongoing Activities of Workstation Users by Clustering Email. Conference on Email and Anti-Spam. 2004.|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US7899871 *||Aug 14, 2007||Mar 1, 2011||Clearwell Systems, Inc.||Methods and systems for e-mail topic classification|
|US8032598||Jan 23, 2007||Oct 4, 2011||Clearwell Systems, Inc.||Methods and systems of electronic message threading and ranking|
|US8145648 *||Dec 19, 2008||Mar 27, 2012||Samsung Electronics Co., Ltd.||Semantic metadata creation for videos|
|US8392409||Sep 10, 2007||Mar 5, 2013||Symantec Corporation||Methods, systems, and user interface for E-mail analysis and review|
|US8458192||Jan 31, 2012||Jun 4, 2013||Google Inc.||System and method for determining topic interest|
|US8458193||Jan 31, 2012||Jun 4, 2013||Google Inc.||System and method for determining active topics|
|US8458194||Jan 31, 2012||Jun 4, 2013||Google Inc.||System and method for content-based document organization and filing|
|US8458195||Jan 31, 2012||Jun 4, 2013||Google Inc.||System and method for determining similar users|
|US8458196||Jan 31, 2012||Jun 4, 2013||Google Inc.||System and method for determining topic authority|
|US8458197||Jan 31, 2012||Jun 4, 2013||Google Inc.||System and method for determining similar topics|
|US8458271||Nov 9, 2010||Jun 4, 2013||International Business Machines Corporation||Handling email communications having human delegate prepared summaries|
|US8713078||Aug 13, 2009||Apr 29, 2014||Samsung Electronics Co., Ltd.||Method for building taxonomy of topics and categorizing videos|
|US8719257||Feb 16, 2011||May 6, 2014||Symantec Corporation||Methods and systems for automatically generating semantic/concept searches|
|US8756236||Jan 31, 2012||Jun 17, 2014||Google Inc.||System and method for indexing documents|
|US8843822||Jan 30, 2012||Sep 23, 2014||Microsoft Corporation||Intelligent prioritization of activated extensions|
|US8886648||Jan 31, 2012||Nov 11, 2014||Google Inc.||System and method for computation of document similarity|
|US8943071||Aug 23, 2011||Jan 27, 2015||At&T Intellectual Property I, L.P.||Automatic sort and propagation associated with electronic documents|
|US8949283||Dec 23, 2013||Feb 3, 2015||Google Inc.||Systems and methods for clustering electronic messages|
|US8959425||Dec 9, 2011||Feb 17, 2015||Microsoft Corporation||Inference-based extension activation|
|US9015192||Mar 19, 2014||Apr 21, 2015||Google Inc.||Systems and methods for improved processing of personalized message queries|
|US9026591||May 25, 2011||May 5, 2015||Avaya Inc.||System and method for advanced communication thread analysis|
|US9124546||Dec 31, 2013||Sep 1, 2015||Google Inc.||Systems and methods for throttling display of electronic messages|
|US9152307||Feb 18, 2014||Oct 6, 2015||Google Inc.||Systems and methods for simultaneously displaying clustered, in-line electronic messages in one display|
|US9251508||Dec 9, 2010||Feb 2, 2016||At&T Intellectual Property I, L.P.||Intelligent message processing|
|US9256445||Jan 30, 2012||Feb 9, 2016||Microsoft Technology Licensing, Llc||Dynamic extension view with multiple levels of expansion|
|US9275129||Feb 16, 2011||Mar 1, 2016||Symantec Corporation||Methods and systems to efficiently find similar and near-duplicate emails and files|
|US9306893 *||Feb 21, 2014||Apr 5, 2016||Google Inc.||Systems and methods for progressive message flow|
|US9378200||Sep 30, 2014||Jun 28, 2016||Emc Corporation||Automated content inference system for unstructured text data|
|US9449112||Jan 30, 2012||Sep 20, 2016||Microsoft Technology Licensing, Llc||Extension activation for related documents|
|US9454528 *||Oct 17, 2011||Sep 27, 2016||Xerox Corporation||Method and system for creating ordered reading lists from unstructured document sets|
|US9519883||Jun 28, 2011||Dec 13, 2016||Microsoft Technology Licensing, Llc||Automatic project content suggestion|
|US9542455 *||Dec 11, 2013||Jan 10, 2017||Avaya Inc.||Anti-trending|
|US9542668||Feb 21, 2014||Jan 10, 2017||Google Inc.||Systems and methods for clustering electronic messages|
|US9600568||May 18, 2011||Mar 21, 2017||Veritas Technologies Llc||Methods and systems for automatic evaluation of electronic discovery review and productions|
|US20100011020 *||Jul 11, 2008||Jan 14, 2010||Motorola, Inc.||Recommender system|
|US20100057694 *||Dec 19, 2008||Mar 4, 2010||Samsung Electronics Co., Ltd.||Semantic metadata creation for videos|
|US20130097167 *||Oct 17, 2011||Apr 18, 2013||Xerox Corporation||Method and system for creating ordered reading lists from unstructured document sets|
|US20130159082 *||Dec 16, 2011||Jun 20, 2013||Comcast Cable Communications, Llc||Managing electronic mail|
|US20140115483 *||May 2, 2013||Apr 24, 2014||Aol Inc.||Systems and methods for processing and organizing electronic content|
|US20150161216 *||Dec 11, 2013||Jun 11, 2015||Avaya, Inc.||Anti-trending|
|US20150188870 *||Feb 21, 2014||Jul 2, 2015||Google Inc.||Systems and methods for progressive message flow|
|CN103620587A *||Jun 9, 2012||Mar 5, 2014||微软公司||Automatic classification of electronic content into projects|
|WO2013003008A3 *||Jun 9, 2012||Apr 25, 2013||Microsoft Corporation||Automatic classification of electronic content into projects|
|U.S. Classification||707/738, 707/739|
|International Classification||G06F17/30, G06F7/00|
|Cooperative Classification||H04L12/58, G06Q10/107|
|Jan 27, 2006||AS||Assignment|
Owner name: MICROSOFT CORPORATION, WASHINGTON
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SURENDRAN, ARUNGUNRAM C.;RENSHAW, ERIN L.;PLATT, JOHN C.;REEL/FRAME:017078/0678;SIGNING DATES FROM 20051228 TO 20051229
|May 3, 2011||CC||Certificate of correction|
|Dec 30, 2013||FPAY||Fee payment|
Year of fee payment: 4
|Dec 9, 2014||AS||Assignment|
Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034543/0001
Effective date: 20141014