
Patents

A system and method for real time search that matches a plurality of client queries against a plurality of terms extracted from a plurality of information packets. The method and system allow implementation of complex matching techniques in real time. The method and system provide that a group of information packets originating from a single information source is checked in order to provide a query result. In addition, they provide that received information packets and information representative of a reception of extracted terms are stored in a manner that allows fast insertion and deletion of content.

Inventors: Oren Zamir, Guy Windreich, Guy Engelhard, Sharon Fridman, Edo Segal, Arik Kopelman
Original Assignee: The Relegence Corporation
Primary Examiner: Paul H. Kang
Attorneys: Seth D. Levy, Davis Wright Tremaine LLP
Current U.S. Classification: 707/673; 707/694; 707/725; 707/747; 707/754; 707/758; 707/780; 707/781; 707/913; 707/959; 707/999.003; 709/206


Citations

Cited Patent | Filing date | Issue date | Original Assignee | Title
US5724424 | Nov 29, 1995 | Mar 3, 1998 | Open Market, Inc. | Digital active advertising
US5886746 | Jul 8, 1997 | Mar 23, 1999 | Gemstar Development Corporation | Method for channel scanning
US5890152 | Sep 9, 1996 | Mar 30, 1999 | Seymour Alvin Rapaport; Jeffrey Alan Rapaport; Linda Rapaport | Personal feedback browser for obtaining media files
US5970206 | Apr 11, 1997 | Oct 19, 1999 | Gemstar Development Corporation | Television calendar and method for creating same
US6052145 | Oct 1, 1997 | Apr 18, 2000 | Gemstar Development Corporation | System and method for controlling the broadcast and recording of television programs and for distributing information to be displayed on a television screen
US6091882 | Aug 1, 1994 | Jul 18, 2000 | Gemstar Development Corporation | Apparatus and method using compressed codes for recorder preprogramming
US6101493 | Mar 15, 1999 | Aug 8, 2000 | Apple Computer, Inc. | Method and system for displaying related information from a database
US6226635 | Aug 14, 1998 | May 1, 2001 | Microsoft Corporation | Layered query management
US6269368 | Oct 16, 1998 | Jul 31, 2001 | Textwise LLC | Information retrieval using dynamic evidence combination
US6311189 | Dec 3, 1998 | Oct 30, 2001 | Altavista Company | Technique for matching a query to a portion of media
US6327590 | May 5, 1999 | Dec 4, 2001 | Xerox Corporation | System and method for collaborative ranking of search results employing user and group profiles derived from document collection content analysis
US6332154 | Feb 19, 1999 | Dec 18, 2001 | Genesys Telecommunications Laboratories, Inc. | Method and apparatus for providing media-independent self-help modules within a multimedia communication-center customer interface
US6381594 | Jul 11, 2000 | Apr 30, 2002 | Yahoo Inc. | System and method for personalized information filtering and alert generation
US6574632 | Nov 18, 1998 | Jun 3, 2003 | Harris Corporation | Multiple engine information retrieval and visualization system
US6591245 | Sep 28, 1999 | Jul 8, 2003 | | Media content notification via communications network

Referenced by

Citing Patent | Filing date | Issue date | Original Assignee | Title
US7469276 | Dec 27, 2004 | Dec 23, 2008 | International Business Machines Corporation | Service offering for the delivery of information with continuing improvement
US7539611 | Nov 19, 2004 | May 26, 2009 | | Method of identifying and highlighting text
US7849072 | Feb 26, 2007 | Dec 7, 2010 | NHN Corporation | Local terminal search system, filtering method used for the same, and recording medium storing program for performing the method
US7885993 | Dec 26, 2007 | Feb 8, 2011 | Sony Corporation | Communication apparatus, communication method, electronic apparatus, control method for controlling electronic apparatus, and storage medium
US7933975 | Aug 28, 2008 | Apr 26, 2011 | International Business Machines Corporation | Service offering for the delivery of information with continuing improvement
US7941439 | Mar 31, 2004 | May 10, 2011 | Google Inc. | Methods and systems for information capture
US7979321 | Jul 25, 2007 | Jul 12, 2011 | eBay Inc. | Merchandising items of topical interest
US8099407 | Mar 31, 2004 | Jan 17, 2012 | Google Inc. | Methods and systems for processing media files
US8117060 | Dec 20, 2007 | Feb 14, 2012 | eBay Inc. | Geographic demand distribution and forecast
US8121905 | May 31, 2011 | Feb 21, 2012 | eBay Inc. | Merchandising items of topical interest
US8161053 | Mar 31, 2004 | Apr 17, 2012 | Google Inc. | Methods and systems for eliminating duplicate events

Claims

1. A system for real time search, adapted to receive a client query originated by a client system, to receive a plurality of information packets provided by a plurality of information sources or representative of a portion of a signal provided by the plurality of information sources, and to generate query results to be provided to the client system, the system comprising:

an information packet processor, for receiving an information packet and for processing the information packet to generate at least one processed portion of the information packet, wherein the at least one processed portion of the information packet is an at least one extracted term;

storage means, coupled to the information packet processor and to a storage means, for temporarily storing information representative of a reception of the at least one processed portion of the information packet, the storage means being configured to allow fast insertion and fast deletion of content;

a query and result manager, coupled to the storage means, for matching a received client query against at least a portion of a content of the storage means to generate a query result; and

at least one module selected from a group of modules consisting of:
a message coordinator module adapted to coordinate a handling of a plurality of information packets;
a message buffer adapted to hold temporarily the plurality of information packets;
a message filter module for filtering the plurality of information packets according to predefined rules;
a term extractor module for performing parsing and stemming on said plurality of information packets;
a terms filter for excluding extracted terms according to predefined rules;
a queries coordinator module to coordinate the processing of client queries;
a query-term extractor to parse and stem incoming queries in order to extract and process operative query-terms;
a query-terms filter for excluding specific query-terms in a predefined manner;
an archive search module for indexing data on archive files containing historical informational content and for returning results according to said indexed data;
a semi-static database search module to act on a semi-static database holding semi-static information source control data;
a future search module for matching extracted terms from the plurality of information packets against static queries; and
a queries index for holding queries for a predefined time frame to provide means of future search.

2. The system of claim 1, wherein the storage means is a term index data structure.

3. The system of claim 2 wherein the term index data structure is adapted to hold indexed extracted terms and information packet identifiers.

4. The system of claim 3 wherein the term index data structure further comprises:

a terms hash table to hold extracted, filtered and processed terms;

a terms inverted file, pointed to by said terms hash table, holding a terms inverted entry map;

a messages hash table to hold information packets identification;

a messages data table to hold information packets data; and
a channel map to hold a list of information sources and the related number of index terms of said information source.

5. The system of claim 4 wherein the terms inverted file further comprises:

a terms inverted entries map table;

a total instances of said term;

a number of information sources containing said term; and

a last modification time of said term.

6. The system of claim 5 further comprising:

a message terms keyed map;

an information source identification; and

an information packet time of arrival.

7. The system of claim 6 wherein the message terms keyed map further comprises:

a pointer to said terms inverted file;

an instances number of said term in said information packet; and

a pointer to said inverted file entry related to said term.

8. The system of claim 7 wherein the terms inverted entries map further comprises:

an information source identification;

an instances number of said term in said information source informational content; and

a time of last appearance of said term in said information source informational content.
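
The term index data structure of claims 4 through 8 can be pictured as a small set of nested records. The Python sketch below is illustrative only: every class and field name is a hypothetical label chosen to mirror the claim language, the messages hash table and messages data table are merged into one dictionary for brevity, and the claims do not prescribe this or any other implementation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class InvertedEntry:
    """Per-source record for one term (fields listed in claim 8)."""
    source_id: str
    instances: int = 0
    last_appearance: float = 0.0


@dataclass
class TermsInvertedFile:
    """Per-term record (fields listed in claim 5)."""
    entries: Dict[str, InvertedEntry] = field(default_factory=dict)  # keyed by source id
    total_instances: int = 0
    last_modified: float = 0.0

    @property
    def source_count(self) -> int:
        # Number of information sources containing the term.
        return len(self.entries)


@dataclass
class MessageTermRef:
    """One row of the message terms keyed map (fields listed in claim 7)."""
    inverted_file: TermsInvertedFile
    instances_in_packet: int
    entry: InvertedEntry


@dataclass
class MessageData:
    """Per-packet record (fields listed in claim 6)."""
    source_id: str
    arrival_time: float
    terms: Dict[str, MessageTermRef] = field(default_factory=dict)


@dataclass
class TermIndex:
    """Top-level containers (claim 4). Assumption: one dict stands in for
    both the messages hash table and the messages data table."""
    terms_hash: Dict[str, TermsInvertedFile] = field(default_factory=dict)
    messages_hash: Dict[str, MessageData] = field(default_factory=dict)
    channel_map: Dict[str, int] = field(default_factory=dict)  # source id -> indexed term count
```

Because each packet record keeps direct references to the per-term and per-source records it touched, a later deletion can reach those records without scanning the whole index.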

9. The system of claim 1 wherein said storage means allows fast insertion and deletion of content.

10. The system of claim 1 wherein the storage means further allows timely deletions of irrelevant or time-decayed terms and query-terms.

11. The system of claim 1 further comprising a means selected from the group consisting of:

adding means for adding control data to said information packets;

filtering means for the plurality of information packets;

processing means for said extracted terms, to add control information to said extracted terms; and

term filtering means for the extracted terms to generate filtered extracted terms.

12. The system of claim 11 wherein the control data comprises information packet identification, information source identification and time of arrival.

13. The system of claim 1 wherein the extracted terms are extracted out of the plurality of information packets by parsing and stemming the plurality of information packets; and wherein the term filtering means are adapted to (a) discard said terms constructed of one-letter words; (b) discard said terms constructed of frequently used words; (c) discard said terms constructed of stop-words; and (d) discard said terms constructed of predefined words.
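
A rough illustration of the parsing, stemming, and four discard rules of claim 13 follows. It is a sketch under stated assumptions: the word lists are invented placeholders, `naive_stem` is a toy suffix stripper rather than any stemmer named in the patent, and the filters are applied to the parsed word before stemming as a simplification.

```python
import re
from typing import Iterable, List, Set

# Hypothetical word lists; a real deployment would supply its own.
STOP_WORDS: Set[str] = {"the", "a", "an", "of", "and", "to", "in", "is"}
PREDEFINED_WORDS: Set[str] = {"advertisement"}


def naive_stem(word: str) -> str:
    """Toy suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_terms(text: str, frequent: Iterable[str] = ()) -> List[str]:
    """Parse, filter, and stem the text of one information packet."""
    frequent_set = set(frequent)
    terms = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        if len(word) == 1:            # (a) one-letter words
            continue
        if word in frequent_set:      # (b) frequently used words
            continue
        if word in STOP_WORDS:        # (c) stop-words
            continue
        if word in PREDEFINED_WORDS:  # (d) predefined words
            continue
        terms.append(naive_stem(word))
    return terms
```

For example, `extract_terms("The storms are approaching a city", frequent=["are"])` returns `["storm", "approach", "city"]`.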

14. The system of claim 1 further adapted to receive an information packet; to store the information packet with an associated packet identifier in an information packet storage means; to store extracted term information representative of a reception of at least one extracted term, said at least one extracted term extracted from the information packet; and to link between the stored information packet and the extracted term information.

15. The system of claim 14 further adapted to delete an information packet and delete the linked extracted term information.

16. The system of claim 14 wherein information packets are stored in a messages hash, and wherein the linked extracted term information is stored in a terms hash.

17. The system of claim 16 wherein the extracted term information comprises at least one information field selected from a group consisting of:

a last modification time field, indicating a most recent time in which the extracted term was received;

a number of channels containing term, indicating a number of information sources that provided the extracted term;

a total instances field, indicating a number of times the extracted term was provided; and

a terms inverted entries map, comprising a plurality of terms inverted file entries, each entry holding information representative of a reception of the extracted term from a single information source.

18. The system of claim 17 wherein each inverted file entry comprises at least one field selected from a group consisting of:

a channel identifier, for identifying the information source that provided the extracted term;

an instances number, for indicating a number of times the extracted term was provided by an information source; and

a time of last appearance, for indicating a most recent time in which the extracted term was received from an information source.

19. The system of claim 18 wherein each information packet is further associated with a message terms key map, said message terms key map comprising a plurality of message characteristic entries, each message characteristic entry associated with an extracted term being extracted from the information packet, said message characteristic entry comprising at least one field selected from a group consisting of:

a term inverted file, for pointing to the term extracted information;

an instances number, for indicating a number of times said extracted term appeared in the information packet; and

an inverted file entry, for pointing to a terms inverted file entry.

20. The system of claim 1 further adapted to insert an extracted term into a terms hash table and into a terms inverted file; insert an information source identification, said information source providing the extracted term to a terms inverted entry map table in said terms inverted file; insert information packet data in a messages hash table; insert the extracted term from said information packet to a messages data table; increase a value of instances in said messages data table by one; and update a value of information source identification in said message data table.

21. The system of claim 20 further adapted to extract an extracted term and accordingly to perform at least one operation selected from a group consisting of:

increase a value of total instances in said terms inverted file;

update a value of last modification time in said terms inverted file;

increase a value of instances number in said inverted entry map table associated with said information source identification in said terms inverted file; and

update a value of message time in said messages data table.

22. The system of claim 1 further adapted to delete an information packet, and accordingly to perform at least one operation selected from a group consisting of:

receive an information packet identification, whereas the terms extracted from the information packets are to be deleted;

read the information packet identification from a messages hash table in a terms index data structure of said storage means;

obtain relevant entries of said extracted terms belonging to said information packet in said messages data; and

access a terms inverted file of said storage means for each terms entry pointed to in said terms inverted file.
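
The insertion bookkeeping of claims 20 and 21 and the deletion path of claim 22 can be sketched together. The toy index below is an assumption-laden illustration, not the claimed design: the class name, method names, and the nested-dictionary layout are hypothetical, and the per-term totals are recomputed on demand instead of being stored.

```python
from collections import defaultdict
from typing import Dict, List


class TermIndex:
    """Toy in-memory index; the layout is an assumption, not the claimed design."""

    def __init__(self) -> None:
        # term -> source id -> [instances, time of last appearance]
        self.terms: Dict[str, Dict[str, list]] = defaultdict(dict)
        # packet id -> (source id, {term: instances in this packet})
        self.messages: Dict[str, tuple] = {}

    def insert(self, packet_id: str, source_id: str, terms: List[str], now: float) -> None:
        counts: Dict[str, int] = {}
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
            entry = self.terms[term].setdefault(source_id, [0, now])
            entry[0] += 1   # increase the instances number
            entry[1] = now  # update the time of last appearance
        self.messages[packet_id] = (source_id, counts)

    def delete(self, packet_id: str) -> None:
        # The per-packet term map means only the affected entries are visited.
        source_id, counts = self.messages.pop(packet_id)
        for term, n in counts.items():
            per_source = self.terms[term]
            entry = per_source[source_id]
            entry[0] -= n
            if entry[0] <= 0:
                del per_source[source_id]
            if not per_source:
                del self.terms[term]

    def total_instances(self, term: str) -> int:
        return sum(e[0] for e in self.terms.get(term, {}).values())
```

Deleting a packet costs time proportional to the number of distinct terms in that packet, which is one plausible reading of the "fast deletion" the claims call for.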

23. The system of claim 1 further adapted to store alert criteria and to match alert criteria received and processed in the past against newly received terms to generate an alert.
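
Claim 23 reverses the usual search direction: the queries are stored, and newly arriving terms are matched against them. A minimal sketch follows, assuming that an alert fires only when all of its required terms appear in a single packet; that conjunctive rule, the class name, and the method names are illustrative choices, not taken from the patent.

```python
from collections import defaultdict
from typing import Dict, List, Set


class AlertRegistry:
    """Toy store of standing queries, keyed by the terms they need."""

    def __init__(self) -> None:
        self.criteria: Dict[str, Set[str]] = {}               # alert id -> required terms
        self.by_term: Dict[str, Set[str]] = defaultdict(set)  # term -> alert ids

    def register(self, alert_id: str, required_terms: Set[str]) -> None:
        self.criteria[alert_id] = set(required_terms)
        for term in required_terms:
            self.by_term[term].add(alert_id)

    def on_packet(self, packet_terms: Set[str]) -> List[str]:
        """Return ids of alerts whose required terms all appear in the packet."""
        candidates: Set[str] = set()
        for term in packet_terms:
            candidates |= self.by_term.get(term, set())
        return sorted(a for a in candidates if self.criteria[a] <= packet_terms)
```

Indexing the stored criteria by term means each incoming packet only touches alerts that share at least one term with it.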

24. The system of claim 1 further adapted to match the client query against historical archives of informational content to generate an archive query result.

25. The system of claim 24 further adapted to generate a query result from an archive query result and from a recent query result.

26. The system of claim 1 further adapted to match the client query against a semi-static database of said informational content and having a low incidence of changing to generate a semi-static query result.

27. The system of claim 26 further adapted to generate a query result from a semi-static query result and from a recent query result.
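
Claims 25 and 27 describe building one query result from a recent result plus an archive or semi-static result, without saying how the two are combined. The sketch below shows one possible merge, assuming each result is an (item id, score) pair and that the highest score per item wins; both the result shape and the merge rule are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Result = Tuple[str, float]  # (item id, score); a hypothetical result shape


def merge_results(*result_sets: Iterable[Result], limit: int = 10) -> List[Result]:
    """Combine result lists, keeping the best score seen for each item."""
    best: Dict[str, float] = {}
    for results in result_sets:
        for item_id, score in results:
            if score > best.get(item_id, float("-inf")):
                best[item_id] = score
    # Highest score first; ties broken by item id for a stable order.
    ordered = sorted(best.items(), key=lambda pair: (-pair[1], pair[0]))
    return ordered[:limit]
```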

28. The system of claim 1 further adapted to rank information sources according to a similarity between at least a portion of information packets provided by said information sources and the client query.

29. The system of claim 28 further adapted to insert a list of ranked information sources in the query result.

30. The system of claim 29 wherein the step of ranking is based upon a parameter out of a group consisting of: a total amount of extracted terms provided by an information source in a predefined time interval; an elapsed time since the extracted term was provided by the information source in said predefined time interval; and an extracted term position in the information source.
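
Two of the ranking parameters in claim 30, the total amount of terms a source provided and the time elapsed since a term last appeared, can be combined into a single score. The formula below is an assumption made for illustration (query-term share multiplied by an exponential half-life decay); the patent does not state a scoring formula, and the function name, data layout, and `half_life` parameter are hypothetical.

```python
from typing import Dict, List, Tuple

# source id -> term -> (instances, time of last appearance); toy data layout.
SourceStats = Dict[str, Dict[str, Tuple[int, float]]]


def rank_sources(stats: SourceStats, query_terms: List[str], now: float,
                 half_life: float = 60.0) -> List[Tuple[str, float]]:
    """Score each source by query-term share, discounted by staleness."""
    ranked = []
    for source_id, terms in stats.items():
        total = sum(count for count, _ in terms.values())
        if total == 0:
            continue
        score = 0.0
        for term in query_terms:
            if term in terms:
                count, last_seen = terms[term]
                elapsed = max(0.0, now - last_seen)
                decay = 0.5 ** (elapsed / half_life)  # halves every half_life time units
                score += (count / total) * decay
        if score > 0:
            ranked.append((source_id, score))
    # Best score first; ties broken by source id.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked
```

Sources that never mention a query term are left out of the ranked list entirely.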

31. The system of claim 1 wherein an information source is selected from a group consisting of: television broadcast providers; radio broadcast providers; data network providers; chat channels providers; news providers; and music providers.

32. The system of claim 1 wherein information packets comprise content selected from a group of: text, audio, video, multimedia, and executable code streaming media.

33. The system of claim 1 further adapted to compute a similarity between a client query and a group of at least one information packet.

34. The system of claim 33 wherein the group of at least one information packet comprises at least one information packet received from a single information source.

35. The system of claim 1 wherein the similarity reflects at least one parameter selected from the group consisting of:

a total amount of extracted terms being received from at least one information source during a predefined time interval;

a number of relevant extracted terms being received from at least one information source during the predefined time interval;

a total number of information sources being searched during the predefined time interval;

an elapsed time since a last appearance of a relevant extracted term from an information source during the predefined time interval;
a position of relevant extracted terms in at least one information source in proximity to a relevant extracted term;
a part of speech of a relevant extracted term; and
a relevant extracted term frequency and importance in a language of the information source.

36. The system of claim 1 adapted to implement a matching technique selected from a group consisting of:

Boolean based matching;

probabilistic matching;

fuzzy matching;

proximity matching; and
vector based matching.
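
Two of the matching techniques listed in claim 36 are simple enough to show in a few lines. The sketch below gives a Boolean AND match and a vector-based match using cosine similarity over raw term counts. These are textbook formulations offered as examples; the function names are hypothetical and nothing here is taken from the patent's own implementation.

```python
import math
from collections import Counter
from typing import Iterable, Set


def boolean_and_match(query_terms: Iterable[str], doc_terms: Set[str]) -> bool:
    """Boolean AND: every query term must be present."""
    return all(term in doc_terms for term in query_terms)


def cosine_similarity(query_terms: Iterable[str], doc_terms: Iterable[str]) -> float:
    """Vector-based matching over raw term counts."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)  # Counter returns 0 for missing terms
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0
```

Boolean matching yields a yes or no answer, while the cosine score yields a value between 0 and 1 that can feed a ranking step.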

37. The system of claim 1 adapted to implement complex matching techniques.