[FIG. 1 drawing. Labeled elements: network 106; clients 102a, 102b; users 112a, 112b; processor 110; memory 108; client applications 120; capture processor 124; queue 126; search engine 122; indexer 130; query system 132; formatter 134; display processor 128; server device 150; processor 160; memory 162; network search engine 170.]
METHODS AND SYSTEMS FOR ELIMINATING DUPLICATE EVENTS
The invention generally relates to search engines. More particularly, the invention relates to methods and systems for eliminating duplicate events.
BACKGROUND OF THE INVENTION
Users generate and access a large number of articles, such as emails, web pages, word processing documents, spreadsheet documents, instant messenger messages, and presentation documents, using a client device, such as a personal computer, personal digital assistant, or mobile phone. Some articles are stored on one or more storage devices coupled to, accessible by, or otherwise associated with the client device(s). Users sometimes wish to search the storage device(s) for articles.
Conventional client-device search applications may significantly degrade the performance of the client device. For example, certain conventional client-device search applications typically use batch processing to index all articles, which can result in noticeably slower performance of the client device during the batch processing. Additionally, batch processing occurs only periodically. Therefore, when a user performs a search, the most recent articles are sometimes not included in the results. Moreover, if the batch processing is scheduled for a time when the client device is not operational and is thus not performed for an extended period of time, the index of articles associated with the client device can become outdated. Conventional client-device search applications may also need to rebuild the index at each batch processing, or build new partial indexes and perform a merge operation, either of which can consume substantial client-device resources. Conventional client-device search applications also sometimes use a great deal of system resources when operational, resulting in slower performance of the client device.
Additionally, conventional client-device search applications can require an explicit search query from a user to generate results, and may be limited to examining file names or the contents of a particular application’s files.
Embodiments of the present invention comprise methods and systems for information capture. In one embodiment, an event is captured, wherein the event comprises a user interaction with an article on a client device and it is determined whether the event is a duplicate of a stored event. If it is determined that the event is not a duplicate of a stored event, then the event is indexed. If it is determined that the event is not a duplicate of a stored event, then the event can also be stored.
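The capture, duplicate check, and conditional indexing and storing described above can be illustrated with a minimal Python sketch. This is an illustration only and not the method of any particular embodiment: the `EventStore` class, the `fingerprint` function, and the choice of the `type`, `article`, and `content` fields as the basis for comparison are assumptions introduced here.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class EventStore:
    """Hypothetical stand-in for the stored events and their index."""
    fingerprints: set = field(default_factory=set)
    stored: list = field(default_factory=list)
    index: dict = field(default_factory=dict)


def fingerprint(event: dict) -> str:
    # Assumption: two events are duplicates when these three fields match.
    key = {k: event.get(k) for k in ("type", "article", "content")}
    payload = json.dumps(key, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def process_event(event: dict, store: EventStore) -> bool:
    """Index and store the event unless it duplicates a stored event."""
    fp = fingerprint(event)
    if fp in store.fingerprints:
        return False  # duplicate: neither indexed nor stored
    store.fingerprints.add(fp)
    store.stored.append(event)
    position = len(store.stored) - 1
    # Toy inverted index: each distinct word maps to event positions.
    for word in set(str(event.get("content", "")).lower().split()):
        store.index.setdefault(word, []).append(position)
    return True
```

In this sketch a second capture of the same interaction returns `False` and leaves both the stored events and the index unchanged.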
These exemplary embodiments are mentioned not to limit or define the invention, but to provide examples of the embodiments of the invention to aid understanding thereof. Exemplary embodiments are discussed in the Detailed Description, and further description of the invention is provided there. Advantages offered by the various embodiments of the present invention may be further understood by examining this specification.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other features, aspects, and advantages of the present invention are better understood when the following Detailed Description is read with reference to the accompanying drawings, wherein:
FIG. 1 is a diagram illustrating an exemplary environment in which one embodiment of the present invention may operate;
FIG. 2 is a flow diagram illustrating an exemplary method of capturing and processing event data associated with a client device in one embodiment of the present invention; and
FIG. 3 is a flow diagram illustrating an exemplary method of determining duplicate events in one embodiment of the present invention.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
Referring now to the drawings in which like numerals indicate like elements throughout the several figures, FIG. 1 is a block diagram illustrating an exemplary environment for implementation of an embodiment of the present invention. While the environment shown in FIG. 1 reflects a client-side search engine architecture embodiment, other embodiments are possible. The system 100 shown in FIG. 1 includes multiple client devices 102a-n that can communicate with a server device 150 over a network 106. The network 106 shown in FIG. 1 comprises the Internet. In other embodiments, other networks, such as an intranet, may be used instead. Moreover, methods according to the present invention may operate within a single client device that does not communicate with a server device or a network.
Client devices 102a-n can be coupled to a network 106, or alternatively, can be stand-alone machines. Client devices 102a-n may also include a number of external or internal devices such as a mouse, a CD-ROM, DVD, a keyboard, a display device, or other input or output devices. Examples of client devices 102a-n are personal computers, digital assistants, personal digital assistants, cellular phones, mobile phones, smart phones, pagers, digital tablets, laptop computers, Internet appliances, and other processor-based devices. In general, the client devices 102a-n may be any type of processor-based platform that operates on any suitable operating system, such as Microsoft® Windows® or Linux, capable of supporting one or more client application programs. For example, the client device 102a can comprise a personal computer executing client application programs, also known as client applications 120. The client applications 120 can be contained in memory 108 and can include, for example, a word processing application, a spreadsheet application, an email application, an instant messenger application, a presentation application, an Internet browser application, a calendar/organizer application, a video playing application, an audio playing application, an image display application, a file management program, an operating system shell, and other applications capable of being executed by a client device. Client applications may also include client-side applications that interact with or access other applications (such as, for example, a web browser executing on the client device 102a that interacts with a remote e-mail server to access e-mail).
The user 112a can interact with the various client applications 120 and articles associated with the client applications 120 via various input and output devices of the client device 102a. Articles include, for example, word processor documents, spreadsheet documents, presentation documents, emails, instant messenger messages, database entries, calendar entries, appointment entries, task manager entries, source code files, and other client application program content, files, messages, items, web pages of various formats, such as HTML, XML, XHTML, Portable Document Format (PDF) files, and media files, such as image files, audio files, and video files, or any other documents or items or groups of documents or items or information of any suitable type whatsoever.
The user’s 112a interaction with articles, the client applications 120, and the client device 102a creates event data that may be observed, recorded, analyzed or otherwise used. An event can be any possible occurrence associated with an article, client application 120, or client device 102a, such as inputting text in an article, displaying an article on a display device, sending an article, receiving an article, manipulating an input device, opening an article, saving an article, printing an article, closing an article, opening a client application program, closing a client application program, idle time, processor load, disk access, memory usage, bringing a client application program to the foreground, changing visual display details of the application (such as resizing or minimizing) and any other suitable occurrence associated with an article, a client application program, or the client device whatsoever. Additionally, event data can be generated when the client device 102a interacts with an article independent of the user 112a, such as when receiving an email or performing a scheduled task.
The memory 108 of the client device 102a can also contain a capture processor 124, a queue 126, and a search engine 122. The client device 102a can also contain or be in communication with a data store 140. The capture processor 124 can capture events and pass them to the queue 126. The queue 126 can pass the captured events to the search engine 122 or the search engine 122 can retrieve new events from the queue 126. In one embodiment, the queue 126 notifies the search engine 122 when a new event arrives in the queue 126 and the search engine 122 retrieves the event (or events) from the queue 126 when the search engine 122 is ready to process the event (or events). When the search engine 122 receives an event, the event can be processed and can be stored in the data store 140. The search engine 122 can receive an explicit query from the user 112a or generate an implicit query and it can retrieve information from the data store 140 in response to the query. In another embodiment, the queue is located in the search engine 122. In still another embodiment, the client device 102a does not have a queue and the events are passed from the capture processor 124 directly to the search engine 122. According to other embodiments, the event data is transferred using an information exchange protocol. The information exchange protocol can comprise, for example, any suitable rule or convention facilitating data exchange, and can include, for example, any one of the following communication mechanisms: Extensible Markup Language-Remote Procedure Calling protocol (XML/RPC), Hypertext Transfer Protocol (HTTP), Simple Object Access Protocol (SOAP), shared memory, sockets, local or remote procedure calling, or any other suitable information exchange mechanism.
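The embodiment in which the queue 126 notifies the search engine 122 of a new event, and the search engine 122 retrieves the event (or events) only when ready, can be sketched in Python as follows. The class and method names (`NotifyingQueue`, `SearchEngine`, `drain`, `process_when_ready`) are hypothetical and chosen for illustration; the sketch is single-threaded and does not represent any specific implementation.

```python
import queue
import threading


class NotifyingQueue:
    """Hypothetical queue that notifies a listener when an event arrives."""

    def __init__(self, on_arrival):
        self._events = queue.Queue()
        self._on_arrival = on_arrival

    def put(self, event):
        self._events.put(event)
        self._on_arrival()  # notification only; the event stays queued

    def drain(self):
        """Return every event currently waiting, oldest first."""
        drained = []
        while True:
            try:
                drained.append(self._events.get_nowait())
            except queue.Empty:
                return drained


class SearchEngine:
    """Hypothetical consumer that pulls events only when it is ready."""

    def __init__(self):
        self.pending = threading.Event()
        self.processed = []

    def notify(self):
        self.pending.set()

    def process_when_ready(self, event_queue):
        # Called by the engine itself whenever it has capacity for work.
        if self.pending.is_set():
            self.pending.clear()
            self.processed.extend(event_queue.drain())
```

Separating the notification from the retrieval lets the consumer defer work, for example while the client device is busy, without losing events.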
The capture processor 124 can capture an event by identifying and compiling event data associated with an event. Examples of events include a user viewing a web page, saving a word processing document, printing a spreadsheet document, inputting text to compose or edit an email, such as adding a name to the recipients list, opening a presentation application, closing an instant messenger application, entering a keystroke, moving the mouse, hovering the mouse over a hyperlink, moving the window focus to a word processing document, and sending an instant messenger message. An example of event data captured by the capture processor 124 for an event involving the viewing of a web page by a user can comprise the URL of the web page, the time and date the user viewed the web page, the content of the web page in original or processed forms, a screenshot of the page as displayed to the user, and a thumbnail version of the screenshot.
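As an illustration, the event data listed above for the viewing of a web page could be compiled into a single record along the following lines. The class name `WebPageViewEvent`, the helper `compile_view_event`, and all field names are assumptions made for this sketch; the screenshot and thumbnail are treated as optional raw bytes, and producing them is outside the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class WebPageViewEvent:
    """Hypothetical record of the event data for one page view."""
    url: str                             # URL of the web page
    viewed_at: datetime                  # time and date of the view
    content: str                         # page content, original or processed
    screenshot: Optional[bytes] = None   # page as displayed to the user
    thumbnail: Optional[bytes] = None    # reduced version of the screenshot


def compile_view_event(url, content, screenshot=None, thumbnail=None,
                       viewed_at=None):
    """Compile the identified event data into one immutable record."""
    if viewed_at is None:
        viewed_at = datetime.now(timezone.utc)
    return WebPageViewEvent(url, viewed_at, content, screenshot, thumbnail)
```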
In one embodiment, the capture processor 124 comprises multiple capture components. For example, the capture processor 124 shown comprises a separate capture component for each client application in order to capture events associated with each application. The capture processor 124 shown in FIG. 1 also comprises a separate capture component to monitor and capture keystrokes input by the user and a separate capture component to monitor and capture items, such as text, displayed on a display device associated with the client device 102a. An individual capture component can monitor multiple client applications and multiple capture components can monitor different aspects of a single client application.
The capture processor 124 shown also comprises a separate capture component that monitors overall network activity in order to capture event data associated with network activity, such as the sending or receipt of an instant messenger message. The capture processor 124 shown in FIG. 1 may also comprise a separate client device capture component that monitors overall client device performance data, such as processor load, idle time, disk access, the client applications in use, and the amount of memory available.
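The relationship between the capture processor 124 and its capture components can be sketched in Python as follows, showing one component monitoring several sources and several components monitoring one source. The names `CaptureComponent`, `CaptureProcessor`, `register`, and `observe`, and the source labels used in the usage example, are hypothetical.

```python
class CaptureComponent:
    """Hypothetical component that watches one or more activity sources."""

    def __init__(self, name, sources):
        self.name = name
        self.sources = set(sources)

    def capture(self, source, data):
        # Compile a simple event record for activity seen on a source.
        return {"component": self.name, "source": source, "data": data}


class CaptureProcessor:
    """Hypothetical processor that routes activity to its components."""

    def __init__(self, sink):
        self._components = []
        self._sink = sink  # e.g. the put method of a queue

    def register(self, component):
        self._components.append(component)

    def observe(self, source, data):
        # Every component monitoring this source produces its own event.
        for component in self._components:
            if source in component.sources:
                self._sink(component.capture(source, data))


captured = []
processor = CaptureProcessor(captured.append)
processor.register(CaptureComponent("keystrokes", ["word_processor", "email"]))
processor.register(CaptureComponent("email_app", ["email"]))
processor.observe("email", "typed recipient name")
```

After the final call, `captured` holds two events for the single email activity, one from each component monitoring that source.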
In one embodiment, the capture processor 124, through the individual capture components, can monitor activity on the client device and can capture events by a generalized event definition and registration mechanism, such as an event schema. Each capture component can define its own event schema or can use a predefined one. Event schemas can differ depending on the client application or activity the capture component is monitoring. Generally, the event schema can describe the format for an event, for example, by providing fields for event data associated with the event (such as the time of the event) and fields related to any associated article (such as the title) as well as the content of any associated