|Publication number||US20050132197 A1|
|Application number||US 10/845,648|
|Publication date||Jun 16, 2005|
|Filing date||May 13, 2004|
|Priority date||May 15, 2003|
|Also published as||EP1649645A2, US7831667, US8402102, US20050108339, US20050108340, US20110055343, WO2004105332A2, WO2004105332A3, WO2004105332A9|
|Original Assignee||Art Medlar|
The present application claims priority to U.S. Provisional Application Ser. No. 60/471,242, filed May 15, 2003, which is incorporated herein in its entirety.
The present invention relates to data processing; more particularly, the present invention relates to a character-based comparison of documents.
The Internet is growing in popularity, and more and more people are conducting business over the Internet, advertising their products and services by generating and sending electronic mass mailings. These electronic messages (emails) are usually unsolicited and are regarded as nuisances by the recipients because they occupy storage space needed for necessary and important data. For example, a mail server may have to reject an important and/or desired email when its storage capacity is filled with unwanted emails containing advertisements. Moreover, thin client systems such as set top boxes, PDAs, network computers, and pagers all have limited storage capacity. Unwanted emails in any one of such systems can tie up a finite resource for the user. In addition, a typical user wastes time downloading voluminous but useless advertisement information. These unwanted emails are commonly referred to as spam.
Presently, there are products that are capable of filtering out unwanted messages. For example, a spam block method exists which keeps an index list of all spam agents (i.e., companies that generate mass unsolicited e-mails), and provides means to block any e-mail sent from a company on the list.
Another “junk mail” filter currently available employs filters which are based on predefined words and patterns as mentioned above. An incoming mail is designated as an unwanted mail, if the subject contains a known spam pattern.
However, as spam filtering grows in sophistication, so do the techniques spammers use to avoid the filters. Examples of tactics employed by the recent generation of spammers include randomization, origin concealment, and filter evasion using HTML.
A method and system for a character-based comparison of documents are described. According to one aspect, the method includes dividing a first document into tokens. Each token includes a predefined number of sequential characters from the first document. The method further includes calculating hash values for the tokens and creating, for the first document, a signature including a subset of hash values from the calculated hash values and additional information pertaining to the tokens of the first document. The signature of the first document is subsequently compared with a signature of a second document to determine resemblance between the first document and the second document.
Other features of the present invention will be apparent from the accompanying drawings and from the detailed description that follows.
The present invention will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the invention, which, however, should not be taken to limit the invention to the specific embodiments, but are for explanation and understanding only.
A method and apparatus for a character-based comparison of documents are described. In the following description, numerous details are set forth. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.
Some portions of the detailed descriptions which follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
The present invention also relates to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.
A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes read only memory (“ROM”); random access memory (“RAM”); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); etc.
Filtering Email Spam Based on Similarity Measures
The control center 102 is an anti-spam facility that is responsible for analyzing messages identified as spam, developing filtering rules for detecting spam, and distributing the filtering rules to the servers 104. A message may be identified as spam because it was sent by a known spam source (as determined, for example, using a “spam probe”, i.e., an email address specifically selected to make its way into as many spammer mailing lists as possible).
A server 104 may be a mail server that receives and stores messages addressed to users of corresponding user terminals. Alternatively, a server 104 may be a different server coupled to the mail server 104. Servers 104 are responsible for filtering incoming messages based on the filtering rules received from the control center 102.
In one embodiment, the control center 102 includes a spam content preparation module 108 that is responsible for generating data characterizing the content associated with a spam attack and sending this data to the servers 104. Each server 104 includes a similarity determination module 110 that is responsible for storing spam data received from the control center 102 and identifying incoming email messages resembling the spam content using the stored data.
In an alternative embodiment, each server 104 hosts both the spam content preparation module 108 that generates data characterizing the content associated with a spam attack and the similarity determination module 110 that uses the generated data to identify email messages resembling the spam content.
The spam content parser 202 is responsible for parsing the body of email messages resulting from spam attacks (referred to as spam messages).
The spam data generator 206 is responsible for generating data characterizing a spam message. In one embodiment, data characterizing a spam message includes a list of hash values calculated for sets of tokens (e.g., characters, words, lines, etc.) composing the spam message. Data characterizing a spam message or any other email message is referred to herein as a message signature. Signatures of spam messages or any other email messages may contain various data identifying the message content and may be created using various algorithms that enable the use of similarity measures in comparing signatures of different email messages.
In one embodiment, the spam content preparation module 200 also includes a noise reduction algorithm 204 that is responsible for detecting data indicative of noise and removing the noise from spam messages prior to generating signatures of spam messages. Noise represents data invisible to a recipient that was added to a spam message to hide its spam nature.
In one embodiment, the spam content preparation module 200 also includes a message grouping algorithm (not shown) that is responsible for grouping messages originated from a single spam attack. Grouping may be performed based on specified characteristics of spam messages (e.g., included URLs, message parts, etc.). If grouping is used, the spam data generator 206 may generate a signature for a group of spam messages rather than for each individual message.
The spam data transmitter 208 is responsible for distributing signatures of spam messages to participating servers such as servers 104 of
The incoming message parser 302 is responsible for parsing the body of incoming email messages.
The spam data receiver 306 is responsible for receiving signatures of spam messages and storing them in the spam database 304.
The message data generator 310 is responsible for generating signatures of incoming email messages. In some embodiments, a signature of an incoming email message includes a list of hash values calculated for sets of tokens (e.g., characters, words, lines, etc.) composing the incoming email message. In other embodiments, a signature of an incoming email message includes various other data characterizing the content of the email message (e.g., a subset of token sets composing the incoming email message). As discussed above, signatures of email messages may be created using various algorithms that allow for use of similarity measures in comparing signatures of different email messages.
In one embodiment, the similarity determination module 300 also includes an incoming message cleaning algorithm 308 that is responsible for detecting data indicative of noise and removing the noise from the incoming email messages prior to generating their signatures, as will be discussed in more detail below.
The resemblance identifier 312 is responsible for comparing the signature of each incoming email message with the signatures of spam messages stored in the spam database 304 and determining, based on this comparison, whether an incoming email message is similar to any spam message.
In one embodiment, the spam database 304 stores signatures generated for spam messages before they undergo the noise reduction process (i.e., noisy spam messages) and signatures generated for these spam messages after they undergo the noise reduction process (i.e., spam messages with reduced noise). In this embodiment, the message data generator 310 first generates a signature of an incoming email message prior to noise reduction, and the resemblance identifier 312 compares this signature with the signatures of noisy spam messages. If this comparison indicates that the incoming email message is similar to one of these spam messages, then the resemblance identifier 312 marks this incoming email message as spam. Otherwise, the resemblance identifier 312 invokes the incoming message cleaning algorithm 308 to remove noise from the incoming email message. Then, the message data generator 310 generates a signature for the modified incoming message, which is then compared by the resemblance identifier 312 with the signatures of spam messages with reduced noise.
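The two-stage comparison described in this embodiment can be sketched as follows. This is a minimal illustration, not the implementation of modules 308, 310, and 312: it reads the second stage as a fallback used when the first comparison finds no match, and the function names, the toy signature (a set of lowercase words), and the toy noise remover (stripping HTML comments) are all hypothetical stand-ins.

```python
import re

def classify(message, noisy_spam_sigs, clean_spam_sigs,
             make_signature, remove_noise, resembles):
    """Compare the raw message first, then the noise-reduced message."""
    # Stage 1: signature of the message as received vs. noisy spam signatures.
    raw_sig = make_signature(message)
    if any(resembles(raw_sig, s) for s in noisy_spam_sigs):
        return "spam"
    # Stage 2: clean the message, re-sign it, compare with cleaned spam signatures.
    cleaned_sig = make_signature(remove_noise(message))
    if any(resembles(cleaned_sig, s) for s in clean_spam_sigs):
        return "spam"
    return "not spam"

# Toy stand-ins (assumptions for illustration only).
def toy_signature(text):
    return frozenset(text.lower().split())

def toy_remove_noise(text):
    return re.sub(r"<!--.*?-->", "", text)

def toy_resembles(sig_a, sig_b):
    return sig_a == sig_b

noisy_db = [toy_signature("Buy <!--q--> pills now")]
clean_db = [toy_signature("buy pills now")]
verdict = classify("Buy <!--z--> pills now", noisy_db, clean_db,
                   toy_signature, toy_remove_noise, toy_resembles)
```

Here the raw message differs from the stored noisy signature only in its injected comment, so the first stage misses it and the second stage catches it after cleaning.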
At processing block 404, processing logic modifies the spam message to reduce noise. One embodiment of a noise reduction algorithm will be discussed in more detail below in conjunction with
At processing block 406, processing logic generates a signature of the spam message. In one embodiment, a signature of the spam message includes a list of hash values calculated for sets of tokens (e.g., characters, words, lines, etc.) composing the spam message, as will be discussed in more detail below in conjunction with
At processing block 408, processing logic transfers the signature of the spam message to a server (e.g., a server 104 of
At processing block 504, processing logic modifies the incoming message to reduce noise. One embodiment of a noise reduction algorithm will be discussed in more detail below in conjunction with
At processing block 506, processing logic generates a signature of the incoming message based on the content of the incoming message. In one embodiment, a signature of an incoming email message includes a list of hash values calculated for sets of tokens (e.g., characters, words, lines, etc.) composing the incoming email message, as will be discussed in more detail below in conjunction with
At processing block 508, processing logic compares the signature of the incoming message with signatures of spam messages.
At processing block 510, processing logic determines that the resemblance between the signature of the incoming message and a signature of some spam message exceeds a threshold similarity measure. One embodiment of a process for determining the resemblance between two messages is discussed in more detail below in conjunction with
At processing block 512, processing logic marks the incoming email message as spam.
At processing block 604, processing logic calculates hash values for the sets of tokens. In one embodiment, a hash value is calculated by applying a hash function to each combination of a set of tokens and a corresponding token occurrence number.
At processing block 606, processing logic creates a signature for the email message using the calculated hash values. In one embodiment, the signature is created by selecting a subset of calculated hash values and adding a parameter characterizing the email message to the selected subset of calculated hash values. The parameter may specify, for example, the size of the email message, the number of calculated hash values, the keyword associated with the email message, the name of an attachment file, etc.
In one embodiment, a signature for an email message is created using a character-based document comparison mechanism that will be discussed in more detail below in conjunction with
Processing logic begins with comparing a parameter in a signature of the incoming email message with a corresponding parameter in a signature of each spam message (processing block 652).
At decision box 654, processing logic determines whether any spam message signatures contain a parameter similar to the parameter of the incoming message signature. The similarity may be determined, for example, based on the allowed difference between the two parameters or the allowed ratio of the two parameters.
If none of the spam message signatures contain a parameter similar to the parameter of the incoming message signature, processing logic decides that the incoming email message is legitimate (i.e., it is not spam) (processing block 662).
Alternatively, if one or more spam message signatures have a similar parameter, processing logic determines whether the signature of the first spam message has hash values similar to the hash values in the signature of the incoming email (decision box 656). Based on the similarity threshold, the hash values may be considered similar if, for example, a certain number of them match or the ratio of matched and unmatched hash values exceeds a specified threshold.
If the first spam message signature has hash values similar to the hash values of the incoming email signature, processing logic decides that the incoming email message is spam (processing block 670). Otherwise, processing logic further determines if there are more spam message signatures with the similar parameter (decision box 658). If so, processing logic determines whether the next spam message signature has hash values similar to the hash values of the incoming email signature (decision box 656). If so, processing logic decides that the incoming email message is spam (processing block 670). If not, processing logic returns to decision box 658.
If processing logic determines that no other spam message signatures have the similar parameter, then it decides that the incoming mail message is not spam (processing block 662).
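The flow through blocks 652 to 670 can be sketched as a parameter pre-filter followed by a hash comparison over the remaining candidates. The sketch below is an assumption-laden illustration: it uses message size as the parameter and an absolute difference as the similarity test (both are options the text names, but the thresholds `max_size_diff` and `min_matches` and the `Signature` class are hypothetical).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Signature:
    size: int           # parameter: message size (one option named in the text)
    hashes: frozenset   # selected subset of hash values

def is_spam(incoming, spam_signatures, max_size_diff=64, min_matches=3):
    # Blocks 652/654: keep only spam signatures whose parameter is close enough.
    candidates = [s for s in spam_signatures
                  if abs(s.size - incoming.size) <= max_size_diff]
    # Boxes 656/658: test candidates one at a time, stopping at the first match.
    for spam in candidates:
        if len(spam.hashes & incoming.hashes) >= min_matches:
            return True    # block 670: message is spam
    return False           # block 662: message is legitimate

incoming = Signature(1000, frozenset({1, 2, 3, 4}))
known_spam = [
    Signature(5000, frozenset({1, 2, 3, 4})),  # rejected by the size pre-filter
    Signature(1010, frozenset({8, 9})),        # similar size, no shared hashes
    Signature(990, frozenset({1, 2, 3, 7})),   # similar size, three shared hashes
]
result = is_spam(incoming, known_spam)
```

The pre-filter is cheap compared with set intersection, which is presumably why the parameter check comes first.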
Character-Based Document Comparison Mechanism
At processing block 704, processing logic divides the document into tokens, with each token including a predefined number of sequential characters from the document. In one embodiment, each token is combined with its occurrence number. This combination is referred to as a labeled shingle. For example, if the predefined number of sequential characters in the token is equal to 3, the expression specified above includes the following set of labeled shingles:
In one embodiment, the shingles are represented as a histogram.
At processing block 706, processing logic calculates hash values for the tokens. In one embodiment, the hash values are calculated for the labeled shingles. For example, if a hashing function H(x) is applied to each labeled shingle illustrated above, the following results are produced:
In one embodiment, processing logic then sorts the hash values as follows:
At processing block 708, processing logic selects a subset of hash values from the calculated hash values. In one embodiment, processing logic selects X smallest values from the sorted hash values and creates from them a “sketch” of the document. For example, for X=4, the sketch can be expressed as follows:
At processing block 710, processing logic creates a signature of the document by adding to the sketch a parameter pertaining to the tokens of the document. In one embodiment, the parameter specifies the number of original tokens in the document. In the example above, the number of original tokens is 15. Hence, the signature of the document can be expressed as follows:
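Processing blocks 704 through 710 can be sketched in Python as follows. This is an illustration under assumptions rather than the described implementation: the function names, the use of SHA-1 truncated to 64 bits as the hashing function H(x), the way the occurrence number is combined with the token before hashing, and the sample string are all hypothetical choices.

```python
import hashlib
from collections import Counter

def labeled_shingles(text, k=3):
    """Return (token, occurrence_number) pairs for every k-character window."""
    seen = Counter()
    shingles = []
    for i in range(len(text) - k + 1):
        token = text[i:i + k]
        seen[token] += 1                 # 1 for the first occurrence, 2 for the second, ...
        shingles.append((token, seen[token]))
    return shingles

def shingle_hash(token, occurrence):
    """Hash one labeled shingle to a 64-bit integer (SHA-1 is an arbitrary choice)."""
    data = f"{occurrence}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")

def make_signature(text, k=3, sketch_size=4):
    """Sketch = the X smallest hash values; signature = sketch plus token count."""
    shingles = labeled_shingles(text, k)
    hashes = sorted(shingle_hash(t, n) for t, n in shingles)
    return {"token_count": len(shingles), "sketch": hashes[:sketch_size]}

signature = make_signature("abcabc")
```

For the string "abcabc" and k=3, the labeled shingles are ("abc", 1), ("bca", 1), ("cab", 1), and ("abc", 2), so the repeated token stays distinguishable after hashing.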
If the token number in the first signature is outside of the allowed range with respect to the token number from the second signature, processing logic decides that documents 1 and 2 are different (processing block 808). Otherwise, if the token number in the first signature is within the allowed range with respect to the token number from the second signature, processing logic determines whether the resemblance between hash values in signatures 1 and 2 exceeds a threshold (e.g., more than 95 percent of hash values are the same) (decision box 804). If so, processing logic decides that the two documents are similar (processing block 806). If not, processing logic decides that documents 1 and 2 are different (processing block 808).
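The comparison just described can be sketched as a two-step test. The dictionary layout of a signature, the ratio-based "allowed range" for token counts, and the way resemblance is computed (shared hash values divided by the larger sketch size) are assumptions made for illustration; only the 95 percent example threshold comes from the text.

```python
def documents_similar(sig1, sig2, max_count_ratio=1.1, min_overlap=0.95):
    """Return True if two signatures indicate similar documents."""
    n1, n2 = sig1["token_count"], sig2["token_count"]
    if min(n1, n2) == 0:
        return False
    # Step 1: token counts must fall within the allowed range of each other.
    if max(n1, n2) / min(n1, n2) > max_count_ratio:
        return False                     # block 808: different
    # Step 2: the share of matching hash values must exceed the threshold.
    h1, h2 = set(sig1["sketch"]), set(sig2["sketch"])
    if not h1 or not h2:
        return False
    overlap = len(h1 & h2) / max(len(h1), len(h2))
    return overlap > min_overlap         # block 806 if True, block 808 otherwise

doc_a = {"token_count": 100, "sketch": list(range(100))}
doc_b = {"token_count": 102, "sketch": list(range(100))}
doc_c = {"token_count": 200, "sketch": list(range(100))}
doc_d = {"token_count": 100, "sketch": list(range(10, 110))}
```

With these hypothetical signatures, `doc_a` and `doc_b` are judged similar, `doc_c` fails the token-count check, and `doc_d` shares only 90 percent of its hash values with `doc_a`.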
Email Spam Filtering Using Noise Reduction
At processing block 904, processing logic modifies the content of the email message to reduce the noise. In one embodiment, the content modification includes removing formatting data, translating numeric character references and character entity references to their ASCII equivalents, and modifying URL data.
At processing block 906, processing logic compares the modified content of the email message with the content of a spam message. In one embodiment, the comparison is performed to identify an exact match. Alternatively, the comparison is performed to determine whether the two documents are similar.
At decision box 1004, processing logic determines whether the found formatting data qualifies as an exception. Typically, HTML formatting does not add anything to the information content of a message. However, a few exceptions exist. These exceptions are the tags that contain useful information for further processing of the message (e.g., tags <BODY>, <A>, <IMG>, and <FONT>). For example, the <FONT> and <BODY> tags are needed for “white on white” text elimination, and the <A> and <IMG> tags typically contain link information that may be used for passing data to other components of the system.
If the formatting data does not qualify as an exception, the formatting data is extracted from the email message (processing block 1006).
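One way to picture the extraction step with its exceptions is a parser that drops every HTML tag except those on a keep list. The sketch below uses Python's standard `html.parser` module; the class and function names are hypothetical, and the keep list simply mirrors the four example tags named above.

```python
from html.parser import HTMLParser

KEPT_TAGS = {"body", "a", "img", "font"}   # exceptions named in the text

class FormattingStripper(HTMLParser):
    """Collects text and kept tags; all other tags and comments are dropped."""

    def __init__(self):
        # Leave character references untouched; they are converted in a later step.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in KEPT_TAGS:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag in KEPT_TAGS:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in KEPT_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

def strip_formatting(html_text):
    parser = FormattingStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)

cleaned = strip_formatting('<b>Hi</b> <a href="http://x.test/">there</a><br>')
```

The `<b>` and `<br>` tags are removed while the `<a>` tag and its link survive for later processing.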
Next, processing logic converts each numerical character reference and character entity reference into a corresponding ASCII character (processing block 1008).
In HTML, numeric character references may take two forms:
Sometimes the conversion performed at processing block 1008 may need to be repeated. For example, the string “&#38;” corresponds to the string “&” in ASCII, the string “&#35;” corresponds to the string “#” in ASCII, the string “&#51;” corresponds to 3 in ASCII, the string “&#56;” corresponds to 8 in ASCII, and “&#59;” corresponds to the string “;” in ASCII. Hence, the combined string “&#38;&#35;&#51;&#56;&#59;”, when converted, results in the string “&#38;” that needs to be converted again.
Accordingly, after the first conversion operation at processing block 1008, processing logic checks whether the converted data still includes numeric character references or character entity references (decision box 1010). If the check is positive, processing logic repeats the conversion operation at processing block 1008. Otherwise, processing logic proceeds to processing block 1012.
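The repeat-until-stable loop of blocks 1008 and 1010 can be sketched with Python's standard `html.unescape`. The function name and the `max_passes` safety bound are assumptions. Note also that `html.unescape` produces Unicode characters rather than strictly ASCII.

```python
import html

def decode_references(text, max_passes=10):
    """Convert character references repeatedly until nothing changes."""
    for _ in range(max_passes):
        decoded = html.unescape(text)    # one conversion pass (block 1008)
        if decoded == text:              # no references left (decision box 1010)
            break
        text = decoded
    return text

once = html.unescape("&#38;&#35;&#51;&#56;&#59;")
fully = decode_references("&#38;&#35;&#51;&#56;&#59;")
```

A single pass over the doubly encoded string `&#38;&#35;&#51;&#56;&#59;` yields `&#38;`, and a second pass reduces that to `&`.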
At processing block 1012, processing logic modifies URL data of predefined categories. These categories may include, for example, numerical character references contained in the URL that are converted by processing logic into corresponding ASCII characters. In addition, the URL “password” syntax may be used to add characters before an “@” in the URL hostname. These characters are ignored by the target web server but they add significant amounts of noise information to each URL. Processing logic modifies the URL data by removing these additional characters. Finally, processing logic removes the “query” part of the URL, following a string “?” at the end of the URL.
An example of a URL is as follows:
http %3a %2f %firstname.lastname@example.org %2fbar.html?muchmorejunk
Processing logic modifies the above URL data into http://www.foo.com/bar.html.
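The three URL modifications (decoding escaped characters, dropping the characters before “@”, and removing the query part) can be sketched with the standard `urllib.parse` module. The function name and the sample URL are hypothetical, and the sketch assumes the noise is percent-encoded; decoding before splitting is a simplification that could change the structure of a URL containing an escaped “?” or “#”.

```python
from urllib.parse import unquote, urlsplit, urlunsplit

def normalize_url(url):
    """Decode escapes, strip the user-info noise before '@', and drop the query."""
    decoded = unquote(url)                        # e.g. %3a -> ':' and %2f -> '/'
    parts = urlsplit(decoded)
    host = parts.netloc.rpartition("@")[2]        # keep only what follows the last '@'
    return urlunsplit((parts.scheme, host, parts.path, "", parts.fragment))

noisy = "http%3a%2f%2fjunk@www.example.test%2fbar.html?more=junk"
clean = normalize_url(noisy)
```

After normalization, two spam URLs that differ only in their injected user-info or query strings collapse to the same value.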
An Exemplary Computer System
The computer system 1100 includes a processor 1102, a main memory 1104 and a static memory 1106, which communicate with each other via a bus 1108. The computer system 1100 may further include a video display unit 1110 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)). The computer system 1100 also includes an alpha-numeric input device 1112 (e.g., a keyboard), a cursor control device 1114 (e.g., a mouse), a disk drive unit 1116, a signal generation device 1120 (e.g., a speaker) and a network interface device 1122.
The disk drive unit 1116 includes a computer-readable medium 1124 on which is stored a set of instructions (i.e., software) 1126 embodying any one, or all, of the methodologies described above. The software 1126 is also shown to reside, completely or at least partially, within the main memory 1104 and/or within the processor 1102. The software 1126 may further be transmitted or received via the network interface device 1122. For the purposes of this specification, the term “computer-readable medium” shall be taken to include any medium that is capable of storing or encoding a sequence of instructions for execution by the computer and that causes the computer to perform any one of the methodologies of the present invention. The term “computer-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic disks, and carrier wave signals.
Whereas many alterations and modifications of the present invention will no doubt become apparent to a person of ordinary skill in the art after having read the foregoing description, it is to be understood that any particular embodiment shown and described by way of illustration is in no way intended to be considered limiting. Therefore, references to details of various embodiments are not intended to limit the scope of the claims which in themselves recite only those features regarded as essential to the invention.
|International Classification||H04L12/58, G06F15/16|
|Cooperative Classification||H04L51/12, H04L51/063|
|European Classification||H04L12/58F, H04L12/58|
|Mar 7, 2005||AS||Assignment|
Owner name: SYMANTEC CORPORATION, CALIFORNIA
Free format text: MERGER;ASSIGNOR:BRIGHTMAIL, INC.;REEL/FRAME:016331/0026
Effective date: 20040618