BACKGROUND OF THE INVENTION
This is a non-provisional application based on Provisional Application Serial No. 60/066,292 filed Nov. 25, 1997, the contents of which are incorporated herein by reference.
1. Field of the Invention
The present invention concerns electronic messaging in general and electronic mail in particular, and provides a method and system for handling electronic mail messages, verifying the origination of messages to determine the probability that they are or are not junk e-mail, and detecting that a mass mailing has been initiated by utilizing special addresses.
2. Description of the Background
Digital storage of information brings with it the ability to transfer such information easily and inexpensively. As a result, unwanted or unsolicited junk e-mail (sometimes referred to as “spam”) has become prevalent on the Internet, since messages can be sent without a specific “per-character” cost. Consequently, the average e-mail account currently receives a number of unsolicited, unwelcome pieces of junk e-mail each day, and that number is forecast to increase rapidly.
Documents are available which describe electronic mail handling procedures. In particular, two Internet standards on e-mail are incorporated herein by reference in their entirety. They are: Internet STD0014 entitled “MAIL ROUTING AND THE DOMAIN SYSTEM” (also known as RFC 974) and Internet STD0010 entitled “SIMPLE MAIL TRANSFER PROTOCOL” (also known as RFC 821). The contents of the Second Edition of “sendmail” by Bryan Costales and Eric Allman, published by O'Reilly Publishing, are also incorporated herein by reference. Further, some issued patents address the general handling of electronic mail. For example, U.S. Pat. No. 5,377,354 teaches a method for prioritizing a plurality of incoming electronic mail messages by comparing the messages with a list of key words. U.S. Pat. No. 5,619,648 teaches a method for reducing junk e-mail which uses non-address information and uses a filtering system that has access to models of the user's correspondents. The e-mail system adds a recipient identifier that is used to further specify which recipients in the group to whom the message is sent should actually receive the message.
U.S. Pat. No. 5,555,426 teaches a method and apparatus for disseminating messages to unspecified users in a data processing system. The method permits users to associate conditions of interest, such as keywords or originator identities, but does not perform any verification of the originator's identity. The method permits messages to be sent based upon probable interest in the message, rather than being addressed to any specific individual.
U.S. Pat. No. 5,627,764 teaches a method for implementing a rules-based system that can run a user's set of rules under system control and process messages according to the user's rules. Peloria Mail Scout uses rules to screen junk mail by limiting messages to only known and acceptable senders, but makes no provision for unknown, yet acceptable senders.
U.S. Pat. No. 5,675,733 teaches a method for collecting, sorting, and compiling statistical summaries of message acknowledgment data, also known as Confirmations of Delivery or COD's. That patent teaches a method for acknowledging a single message to multiple recipients and generating a statistical list of information delivery under such circumstances. Each of the above-referenced U.S. patents is incorporated herein by reference in its entirety.
SUMMARY OF THE INVENTION
It is an object of the present invention to address deficiencies in known e-mail handling systems.
This object and other objects of the present invention are addressed through the use of a computer system or mail handling system which provides enhanced blocking of junk e-mail. Accordingly, the present invention first ascertains if the sender of the e-mail has a verifiable identity and valid computer address. Based upon that determination, certain user-assignable and computable confidence ratios may be automatically determined. If the identity cannot be verified or the address is determined not to be valid or usable for a reply to the sender, the mail can be assigned a presumptive classification as junk e-mail. By applying additional filters, the confidence ratio can be increased to nearly 100%, and the mail can be handled in accordance with standard rules-based procedures, providing for a range of alternatives that include deletion or storage in a manner determined by the user.
The system of the present invention also advantageously utilizes a cooperative tool, known as an authenticator, to determine if a received e-mail is a junk e-mail. The mail handling system, either automatically or as part of a mail filter, contacts an authenticator with information about a received e-mail. If the authenticator has received negative or adverse notifications from other users who have received the same or similar e-mails, the authenticator informs any mail handling system that queries it that the received e-mail is very likely junk e-mail. This information from the authenticator, along with other factors, can be weighted to provide an overall confidence rating.
The system of the present invention also advantageously utilizes a list of “seed” addresses that do not correspond to real users but, rather, to special non-existent (or ghost) accounts. When a message is received that is addressed to a ghost account, the system searches other incoming and recently received messages for the same message body. For messages with the same message body as received for the ghost account, the system marks the messages as having a high probability of being junk e-mail. In an alternate embodiment, the system of the present invention provides cooperative filtering by sending the message body to authenticators or other systems to help the authenticators or other systems to determine that the message is probably a junk e-mail.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic illustration of a computer system for performing the method of the present invention;
FIG. 2 is a listing of a first exemplary header that is analyzed according to the present invention;
FIG. 3 is a listing of a second exemplary header that is analyzed according to the present invention;
FIG. 4 is a pseudo-code listing of how deliverability is tested according to the present invention;
FIG. 5 is a pseudo-code listing of how confidence testing of a message is performed according to the present invention;
FIGS. 6A and 6B are flow diagrams of how message creation, transmission, and reception are processed according to the present invention;
FIG. 7 is a schematic illustration of plural computers which interact to send, receive, and process/authenticate e-mail according to the present invention; and
FIG. 8 is a schematic illustration of the operation of the authenticator of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Referring now to the drawings, wherein like reference numerals designate identical or corresponding parts throughout the several views, FIG. 1 is a schematic illustration of a computer system for blocking unwanted or junk e-mails. A computer 100 implements the method of the present invention, wherein the computer housing 102 houses a motherboard 104 which contains a CPU 106, memory 108 (e.g., DRAM, ROM, EPROM, EEPROM, SRAM and Flash RAM), and other optional special purpose logic devices (e.g., ASICs) or configurable logic devices (e.g., GAL and reprogrammable FPGA). The computer 100 also includes plural input devices (e.g., a keyboard 122 and mouse 124) and a display card 110 for controlling monitor 120. In addition, the computer system 100 further includes a floppy disk drive 114; other removable media devices (e.g., compact disc 119, tape, and removable magneto-optical media (not shown)); and a hard disk 112, or other fixed, high density media drives, connected using an appropriate device bus (e.g., a SCSI bus or an Enhanced IDE bus). Although compact disc 119 is shown in a CD caddy, the compact disc 119 can be inserted directly into CD-ROM drives which do not require caddies. Also connected to the same device bus or another device bus as the high density media drives, the computer 100 may additionally include a compact disc reader 118, a compact disc reader/writer unit (not shown) or a compact disc jukebox (not shown). In addition, a printer (not shown) may provide printed copies of e-mails.
The system further includes at least one computer readable medium. Examples of computer readable media are compact discs 119, hard disks 112, floppy disks, tape, magneto-optical disks, PROMs (EPROM, EEPROM, Flash EPROM), DRAM, SRAM, etc. Stored on any one or on a combination of the computer readable media, the present invention includes software both for controlling the hardware of the computer 100 and for enabling the computer 100 to interact with a human user. Such software may include, but is not limited to, device drivers, operating systems and user applications, such as development tools. Such computer readable media further include the computer program product of the present invention for blocking unwanted e-mails. These computer readable media can include programs, dynamic link libraries, scripts, or any other executable or interpreted code, including, but not limited to, Java code, C or C++ code, Perl scripts, and Active X controls.
The method and system of the present invention assign confidence ratings to messages to signify their status as junk e-mails or as bona fide messages that the recipient may wish to read. The method and system begin by analyzing the origins and transmission paths of the messages. The sender's origination information is extracted from the e-mail message, and an automatic reply (called a verification request) is created and sent. Based on the verification response that is received in response to the verification request, the sender is scored as to the probable characteristics, origination, validity, and desirability of the mail. Incoming messages (e-mails) are automatically scanned and parsed, either (1) at a server located at an Internet provider (prior to delivery to the intended ultimate recipient), (2) at a LAN-based receiving station, or (3) at the actual ultimate recipient's mail machine, i.e., local to the user. Once the message has been parsed or broken down into fields, the message is compared with several user-defined rules for handling messages, and a confidence rating is assigned to the message. In one embodiment, the message header information is analyzed and a verification request(s) is/are automatically sent to the purported sender(s), as identified by fields such as “From:” or “Reply-To:”. If there is a problem in delivering the verification request, the presumed validity of the message is reduced in accordance with a set of user-definable criteria. In addition to determining the purported origination point, the present invention automatically analyzes all information pertaining to the sender, the path of delivery, any copies or blind copies, and any other indicia of the validity of the message's origin to determine whether there has been a discernible effort to obscure the origin, disguise the sender, or otherwise inhibit the recipient from verifying the sender's identity.
For example, if a message has purportedly been relayed through a machine named mail.fromnowhere.com and the mail handling system has determined that such a machine does not actually exist, the confidence rating for the message should be decreased.
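The relay check in this example can be sketched as follows. This is a minimal Python sketch, not the patented implementation: it assumes the relay host is the first token after “from” in each Received: header, expresses the rating as a junk confidence between 0 and 1 (so lowering the confidence in the message means raising this score), and takes a caller-supplied `host_exists` function and a hypothetical fixed `penalty` per relay host that does not exist.

```python
import re
from typing import Callable, Iterable

# Captures the host a relay claims to be in "Received: from <host> ... by ..."
RECEIVED_FROM = re.compile(r"^\s*from\s+([A-Za-z0-9.-]+)", re.IGNORECASE)


def relay_hosts(received_values: Iterable[str]) -> list[str]:
    """Return the claimed relay host from each Received: header value."""
    hosts = []
    for value in received_values:
        match = RECEIVED_FROM.match(value)
        if match:
            hosts.append(match.group(1).lower())
    return hosts


def raise_for_fake_relays(junk_confidence: float,
                          received_values: Iterable[str],
                          host_exists: Callable[[str], bool],
                          penalty: float = 0.25) -> float:
    """Raise the junk confidence once for every claimed relay host that does not exist.

    `penalty` is a hypothetical, user-tunable weight; the result is capped at 1.0.
    """
    for host in relay_hosts(received_values):
        if not host_exists(host):
            junk_confidence += penalty
    return min(junk_confidence, 1.0)
```

Passing `host_exists` in as a function keeps the scoring logic separate from the network lookup, so the same rule can be driven by DNS, by a cached list, or by a test stub.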
Techniques for reducing the amount of junk e-mail by using confidence rating technology based upon characteristics of junk e-mail are also implemented in the invention. Factors that the invention incorporates in determining whether mail is junk e-mail or a valid message include maintaining (1) a list of certain mail providers known to be an origination point of junk e-mail, (2) a dictionary of certain content frequently found in junk e-mail, and (3) a learning knowledge base that creates its own rules to ascertain prior junk e-mail characteristics and subsequently adds those criteria to the knowledge base to prevent future junk e-mail with the same or similar characteristics from being delivered.
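The three factors above might be kept together as in the following minimal Python sketch. It is an illustration under stated assumptions, not the invention's knowledge base: the class and method names are hypothetical, dictionary entries are matched as plain substrings, and the “learning” step is a deliberately simple stand-in that promotes a word of four or more letters to a rule once it has appeared in `learn_threshold` junk messages and in none of the messages the user kept.

```python
import re
from collections import Counter

# Words of four or more letters; shorter words are too common to be useful rules.
WORD = re.compile(r"[a-z']{4,}")


class JunkKnowledgeBase:
    def __init__(self, junk_providers, junk_phrases, learn_threshold=2):
        self.junk_providers = {p.lower() for p in junk_providers}  # factor (1)
        self.junk_phrases = {p.lower() for p in junk_phrases}      # factor (2)
        self.learned = set()                                        # factor (3)
        self.learn_threshold = learn_threshold
        self._junk_counts = Counter()  # number of junk messages containing each word
        self._valid_words = set()      # words seen in mail the user kept

    def learn_from_junk(self, body: str) -> None:
        """Count words in a junk message and promote recurring ones to rules."""
        self._junk_counts.update(set(WORD.findall(body.lower())))
        for word, count in self._junk_counts.items():
            if count >= self.learn_threshold and word not in self._valid_words:
                self.learned.add(word)

    def learn_from_valid(self, body: str) -> None:
        """Remember words from valid mail and drop learned rules that match it."""
        words = set(WORD.findall(body.lower()))
        self._valid_words |= words
        self.learned -= words

    def factors(self, sender_host: str, body: str) -> dict:
        """Report which junk indicators apply to one message."""
        text = body.lower()
        rules = self.junk_phrases | self.learned
        return {
            "known_junk_provider": sender_host.lower() in self.junk_providers,
            "junk_phrases": sorted(rule for rule in rules if rule in text),
        }
```

Keeping the learned rules in a separate set means user-supplied dictionary phrases are never discarded when a learned rule is withdrawn.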
Primary components of the invention are (1) screening all incoming messages by the receiver on either the mail server or the local receiving facility and (2) automatically sending a reply (in the form of a verification request) to the purported sender(s). The verification request is sent to all address locations contained in the sender's address information or to any subset of those addresses as determined by the recipient. If that verification request is undeliverable (as determined by the receipt of the corresponding verification response), the message can be automatically deleted or marked as junk e-mail. In addition, rules filters can be used in conjunction with the presumptive test of a purported sender's address to determine a confidence rating based upon a scoring technique, which the user can configure according to the factors the user considers most significant. The e-mail filtering can be used in conjunction with the verification response to refine the confidence rating. As an example, phrases from a previously read junk e-mail can be added to the rules base. Phrase matching alone may not be sufficient, however, to distinguish valid mail that cites or quotes from the junk e-mail. If, on the other hand, the content is combined with an address that cannot pass a verification request, the user may wish to assign a 100% confidence rating, and the mail can optionally be automatically deleted.
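A minimal Python sketch of such a user-weighted score follows. The weights, field names, and the 0-to-1 scale (1.0 meaning certain junk) are hypothetical choices; the one rule taken directly from the text is that junk content combined with a failed verification request may be rated at 100%.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class CheckResults:
    verification_failed: bool   # the verification request could not be delivered
    known_junk_provider: bool   # sender's host is on the user's junk-provider list
    junk_phrase_hits: int       # number of rules-base phrases found in the body


# Hypothetical user-chosen weights for the factors the user considers significant.
DEFAULT_WEIGHTS = {"verification_failed": 0.6, "known_junk_provider": 0.3, "per_phrase": 0.1}


def junk_confidence(results: CheckResults,
                    weights: Mapping[str, float] = DEFAULT_WEIGHTS) -> float:
    """Combine test results into a junk confidence between 0.0 and 1.0."""
    # Junk phrases alone might only be quoted in valid mail, but junk
    # content from an unverifiable sender is rated as certain junk.
    if results.verification_failed and results.junk_phrase_hits > 0:
        return 1.0
    score = results.junk_phrase_hits * weights["per_phrase"]
    if results.verification_failed:
        score += weights["verification_failed"]
    if results.known_junk_provider:
        score += weights["known_junk_provider"]
    return min(score, 1.0)
```

With these weights, a valid reply that quotes two junk phrases but comes from a verifiable sender scores only 0.2, which keeps it away from automatic deletion.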
FIG. 2 shows an exemplary e-mail header that is received by the system of the present invention. The fields for “Return-Path:,” “From:,” and “Reply-To:” are highlighted as three of the fields which the present invention will parse from the message header. One such line in FIG. 2 is broken down into a user id (48941493) and a host name (notarealaddress.com). Likewise, another line is broken down into its corresponding user id (junker) and host name (notarealaddress3.com). Both of these addresses will receive verification requests attempting to verify that they represent valid user and host names. The same process is performed on the message header shown in FIG. 3.
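The parsing step can be sketched with Python's standard `email` package. This is a minimal sketch, not the patented parser: the function name is hypothetical, and the sample header at the end is made up for illustration (it reuses the user ids and host names quoted above but is not a reproduction of FIG. 2).

```python
from email import message_from_string
from email.utils import getaddresses

# The three header fields named in the text.
SENDER_FIELDS = ("Return-Path", "From", "Reply-To")


def sender_addresses(raw_message: str) -> list[tuple[str, str, str]]:
    """Split each sender address into (field, user id, host name)."""
    msg = message_from_string(raw_message)
    found = []
    for field in SENDER_FIELDS:
        for _display_name, address in getaddresses(msg.get_all(field, [])):
            if "@" not in address:
                continue  # e.g. an empty "Return-Path: <>" or a bare local name
            user, host = address.rsplit("@", 1)
            found.append((field, user, host.lower()))
    return found


# Illustrative header only; not the header shown in FIG. 2.
raw = (
    "Return-Path: <48941493@notarealaddress.com>\n"
    "From: Junk Sender <junker@notarealaddress3.com>\n"
    "Subject: example\n"
    "\n"
    "body\n"
)
for field, user, host in sender_addresses(raw):
    print(field, user, host)
```

Because “From:” and “Reply-To:” often carry the same address, a caller would typically deduplicate the (user, host) pairs before sending verification requests.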
Accordingly, the system of the present invention can analyze e-mail headers to determine whether or not the e-mail has been received from a site suspected of sending junk e-mail. A received e-mail that conforms to RFC 821 includes fields identifying the sender and the recipient, i.e., the “From:” and the “To:” fields, respectively. Messages may optionally contain a “Reply-To:” field if a user wishes to have his/her replies directed to a different e-mail address. Since junk e-mails often come from non-existent users, non-existent sites, or both, a first level check is to determine whether the alleged sender identified by the “From:” or “Reply-To:” field is valid. This first level check corresponds to issuing a verification request and can take many forms, including: (1) sending a message to the user identified by the “From:” or “Reply-To:” field and examining whether the message can be successfully delivered, (2) using the UNIX “whois” command to determine whether a site (or host) by that name actually exists, (3) using the UNIX “finger” command to identify whether a user name exists at a verifiable host, (4) using the “vrfy” command when connected to a sendmail daemon to verify that a user exists at a particular site, and (5) using the UNIX “traceroute” command to make sure there is a valid route back to the specified host. It is presently preferred to use a method which does not create an endless cycle of messages while attempting to verify a sender's address. That is, if each message generated a sender verification message which in turn generated another sender verification message, the system would quickly become inundated with extra messages. Accordingly, the present invention uses sender verification messages that do not generate a cascade of new verification requests.
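Two of the listed checks can be sketched with the Python standard library, as below. This is a hedged sketch, not the invention's verifier: name resolution through `socket.getaddrinfo` stands in for the “whois” and “traceroute” style host checks, and the VRFY probe connects to the named host directly because the standard library has no MX-record lookup (a real verifier would query the domain's mail exchangers). The reply-code mapping follows the SMTP specifications; code 252 means the server will not confirm either way, and many servers disable VRFY entirely. Neither check sends mail, so neither can start the cascade of verification requests described above.

```python
import smtplib
import socket
from typing import Optional


def host_exists(host: str) -> bool:
    """True if the host name resolves to at least one address."""
    try:
        socket.getaddrinfo(host, None)
    except (socket.gaierror, UnicodeError):
        return False
    return True


def interpret_vrfy(code: int) -> Optional[bool]:
    """Map an SMTP VRFY reply code to valid (True), invalid (False), or unknown (None)."""
    if code in (250, 251):
        return True    # user exists (251: not local, will be forwarded)
    if code in (550, 553):
        return False   # no such user, or mailbox name not allowed
    return None        # e.g. 252 (cannot verify) or 502 (command not implemented)


def vrfy_user(user: str, host: str, timeout: float = 10.0) -> Optional[bool]:
    """Ask the SMTP server on `host` (port 25) whether `user` exists."""
    try:
        with smtplib.SMTP(host, 25, timeout=timeout) as smtp:
            code, _reply = smtp.verify(f"{user}@{host}")
    except (OSError, smtplib.SMTPException):
        return None    # unreachable or refused: no verdict either way
    return interpret_vrfy(code)
```

Returning `None` for “cannot tell” lets the caller treat an uncooperative server differently from a server that positively denies the user.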
In an alternate embodiment, the system keeps track of which verification requests are outstanding and thereby prevents cascading requests by limiting the system to sending a single verification message for a particular address within a period of time. The system thus maintains a cache of recently authorized and recently denied addresses.
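The cache described in this embodiment might look like the following minimal Python sketch. The class name, the one-hour default window, and the injectable clock (used so the behavior can be tested) are assumptions for illustration; a cached result of `None` means a request is still outstanding or nothing is cached.

```python
import time
from typing import Callable, Optional


class VerificationCache:
    """Allow one verification request per address per `window` seconds."""

    def __init__(self, window: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.window = window
        self.clock = clock
        # address -> (time of last request or response, result); result None = outstanding.
        # Addresses are compared case-insensitively, a simplification.
        self._entries: dict[str, tuple[float, Optional[bool]]] = {}

    def _fresh(self, address: str):
        entry = self._entries.get(address.lower())
        if entry is not None and self.clock() - entry[0] < self.window:
            return entry
        return None

    def should_send(self, address: str) -> bool:
        """True only if no request for this address is outstanding or recent."""
        return self._fresh(address) is None

    def mark_sent(self, address: str) -> None:
        self._entries[address.lower()] = (self.clock(), None)

    def record(self, address: str, valid: bool) -> None:
        """Cache an authorized (True) or denied (False) address."""
        self._entries[address.lower()] = (self.clock(), valid)

    def cached_result(self, address: str) -> Optional[bool]:
        entry = self._fresh(address)
        return None if entry is None else entry[1]
```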
FIG. 4 shows a test of deliverability for three messages received by a mail handling system. Each of the three message headers is parsed into fields to enable the system to determine the purported senders. The system then generates replies to the messages in the form of verification requests. Each verification request is sent to the purported sender of its corresponding message, and the replies, or verification responses, are analyzed. For each verification request that is undeliverable, the system marks the corresponding message as suspected junk e-mail; otherwise, the message passes the sender deliverability test. Additionally, a successful verification request also serves as a return receipt verification.
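The deliverability test of FIG. 4 reduces to a loop like the one below. This Python sketch is an illustration, not a transcription of FIG. 4: `deliver` is a hypothetical callable that sends a verification request and reports whether it was delivered, and a message is marked as suspected junk if any of its purported senders fails or if it names no sender at all.

```python
from typing import Callable, Mapping, Sequence


def deliverability_test(senders_by_message: Mapping[str, Sequence[str]],
                        deliver: Callable[[str], bool]) -> dict[str, str]:
    """Mark each message 'suspected junk' if any of its verification requests bounced."""
    verdicts = {}
    for message_id, senders in senders_by_message.items():
        # Send to every purported sender (no short-circuit); a delivered
        # request doubles as a return receipt for that sender.
        delivered = [deliver(sender) for sender in senders]
        # A message with no sender address cannot be verified at all.
        passed = bool(delivered) and all(delivered)
        verdicts[message_id] = "passed" if passed else "suspected junk"
    return verdicts
```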
The process of FIG. 4 can be augmented in an alternate embodiment to include the confidence testing shown in FIG. 5. By analyzing phrases and keywords in the message bodies, the system can assign better confidence values to each e-mail message.
When verifying that a user is a valid user by sending a verification request in the form of an e-mail message, the system creates and transmits an e-mail message and examines the verification response as shown in FIGS. 6A, 6B, and 7. The network that connects the computers can be a local area network, a wide area network, or the Internet. Table I below shows the steps of creating and transmitting an e-mail message and of receiving a delivery result message as shown in FIGS. 6A and 6B.
TABLE I
A. Message Creation
    1. Address header
    2. Subject
    3. Message content
B. Message Transmission
    1. Address header
    2. Routing
        a) To
        b) From
            (1) Test From address for validity
        c) Reply
            (1) Test Reply address for validity
        d) Received 1
            (1) Test for validity
        e) Received 2
            (1) Test for validity
        f) Received 3
            (1) Test for validity
C. Message Receipt
    1. Server
        a) Review results of tests
        b) Apply rules based on test results
        c) Assign confidence rating
        d) File mail based on confidence rating rule
    2. Local
        a) Review results of tests
        b) Apply rules based on test results
        c) Assign confidence rating
        d) File mail based on confidence rating rule
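The message receipt steps of Table I (review the test results, apply rules, assign a confidence rating, file the mail) can be sketched as one Python function. The penalty values, destination names, and thresholds are hypothetical user settings, and the rating is expressed as a junk confidence from 0 to 1; the same function would serve the server-side and local variants.

```python
from typing import Mapping, Sequence, Tuple

# Hypothetical filing rules: (minimum junk confidence, destination).
FILING_RULES: Sequence[Tuple[float, str]] = (
    (1.0, "delete"),
    (0.7, "junk folder"),
    (0.4, "suspect folder"),
    (0.0, "inbox"),
)


def process_receipt(test_results: Mapping[str, bool],
                    penalties: Mapping[str, float],
                    rules: Sequence[Tuple[float, str]] = FILING_RULES) -> Tuple[float, str]:
    # a) review the results and b) apply rules: each failed test adds its penalty
    rating = sum(penalties.get(name, 0.0)
                 for name, passed in test_results.items() if not passed)
    # c) assign the confidence rating, capped at certainty
    rating = min(float(rating), 1.0)
    # d) file the mail under the highest threshold the rating meets
    for threshold, destination in sorted(rules, reverse=True):
        if rating >= threshold:
            return rating, destination
    return rating, "inbox"
```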
As shown in FIG. 8, the general mail blocking program can be supplemented with an authenticator component to enable cooperative determination of junk e-mail. This works as described above, except that verification replies are directed to an address that the system of the present invention supplies to the subscriber, rather than to the original sender. Users of the present invention would be provided with an authentication code certifying that they are not known spammers. In effect, the system would vouch for the authenticity of the sender, and the “spam check” would be sent to the system of the present invention and auto-responded to. If it turned out that the sender had abused his authentication privileges, the authentication address would be added to a list that automatically responds with a known key phrase in the subject line of the message, so that the recipient would know immediately that this sender is not trustworthy. This eliminates having to reply to the original sender, who may be unknown due to blind carbon copies (BCCs), etc. Further, the authenticator could receive additional information on whether or not a message was junk e-mail while the message was still present in a user's inbox. If the message was determined to be junk e-mail, the mail program would be informed, and the user could have the message automatically discarded or marked as potentially junk. If a message has previously been checked but was not yet known to be junk, and if the user has not yet read the message, the authenticator may “call back” the mail program that previously checked the message and indicate that the message, although previously thought to be okay, is now believed to be junk.
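The authenticator's bookkeeping can be sketched in Python as follows. Everything here is an assumption for illustration: messages are identified by a SHA-256 hash of the body after case and whitespace normalization (so only near-verbatim copies match), a message counts as junk once `report_threshold` distinct users have reported it, and the call-back is modeled as a function that the checking mail program registers. Deciding whether the user has already read the message is left to that mail program.

```python
import hashlib
from collections import defaultdict
from typing import Callable


def fingerprint(body: str) -> str:
    """Identify a message by a hash of its case-folded, whitespace-collapsed body."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class Authenticator:
    def __init__(self, report_threshold: int = 3):
        self.report_threshold = report_threshold
        self._reporters = defaultdict(set)   # fingerprint -> users who reported it as junk
        self._callbacks = defaultdict(list)  # fingerprint -> mail programs awaiting news

    def is_junk(self, fp: str) -> bool:
        return len(self._reporters.get(fp, ())) >= self.report_threshold

    def check(self, body: str, call_back: Callable[[str], None]) -> bool:
        """Answer a mail program's query; remember it in case the verdict changes."""
        fp = fingerprint(body)
        if self.is_junk(fp):
            return True
        self._callbacks[fp].append(call_back)
        return False

    def report_junk(self, body: str, reporter: str) -> None:
        """Record an adverse notification; call back earlier checkers once it is junk."""
        fp = fingerprint(body)
        was_junk = self.is_junk(fp)
        self._reporters[fp].add(reporter)  # a set, so repeat reports by one user count once
        if not was_junk and self.is_junk(fp):
            for notify in self._callbacks.pop(fp, []):
                notify(fp)
```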
In order to provide each user with an authentication ID that the authenticator can use to quickly determine whether the sender is a known junk e-mailer, the e-mail users would each register, potentially for a fee, and their e-mail programs would each be assigned a unique identification code. Preferably, the mail program would keep the unique code secret, so that neither the user nor others would see it. For example, to prevent a recipient from stealing the unique code of another user from whom he/she has received a message, the e-mail program or the e-mail handling system at an ISP or corporate level could strip the unique code before delivering the message. That is, when a message is received, the mail program or mail handling system would send the unique code and the “From:” identifier to the authenticator for authentication. The code and the “From:” identifier would be checked against the database of known junk e-mailers, as well as checked for consistency with each other. If the code belonged to a known junk e-mailer, or if the code and the “From:” field did not match, the mail program or mail handling system would be warned of the problem. Since the message would then be authenticated, the unique code would no longer be needed and could be stripped before the mail message is passed to the user.
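The check-and-strip step can be sketched with Python's `email` package. The header name `X-Sender-Auth-Code`, the registry class, and the verdict strings are hypothetical; the text does not say how the unique code travels with the message, so carrying it in a header is an assumption.

```python
from email import message_from_string
from email.message import Message
from email.utils import parseaddr

AUTH_HEADER = "X-Sender-Auth-Code"  # hypothetical header carrying the unique code


class CodeRegistry:
    """Authenticator-side records: registered codes and known junk e-mailers."""

    def __init__(self):
        self._owner = {}          # unique code -> registered sender address
        self.known_junk = set()   # codes belonging to known junk e-mailers

    def register(self, code: str, address: str) -> None:
        self._owner[code] = address.lower()

    def authenticate(self, code: str, from_field: str) -> str:
        _name, address = parseaddr(from_field)
        if code in self.known_junk:
            return "known junk e-mailer"
        if self._owner.get(code) != address.lower():
            return "code does not match From:"
        return "ok"


def authenticate_and_strip(msg: Message, registry: CodeRegistry) -> str:
    """Check the code against the From: field, then remove it before delivery."""
    code = msg.get(AUTH_HEADER)
    if code is None:
        return "no code"
    verdict = registry.authenticate(code.strip(), msg.get("From", ""))
    del msg[AUTH_HEADER]  # no longer needed; hide it from the recipient
    return verdict


if __name__ == "__main__":
    registry = CodeRegistry()
    registry.register("abc123", "alice@example.com")
    msg = message_from_string("From: Alice <alice@example.com>\nX-Sender-Auth-Code: abc123\n\nhi\n")
    print(authenticate_and_strip(msg, registry), msg.get(AUTH_HEADER))
```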
In an alternate embodiment, the unique code is further protected by being used in conjunction with message signing and encryption. The mail program (or mail handling system) would send the authenticator a message to be authenticated, including the digitally signed part, the signature, and the unique code. The authenticator then would check the signed part of the message against the signature using the encryption key which is registered to the unique code. In this way, added protection from junk e-mail is obtained.
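The signature check can be sketched as below. Python's standard library has no public-key signature API, so this sketch substitutes an HMAC-SHA256 tag computed with a secret key registered to the unique code. That is a different (symmetric) technique from the digital signatures the text describes, where the authenticator would hold only the sender's public key. Class and function names are hypothetical.

```python
import hashlib
import hmac


def compute_tag(key: bytes, signed_part: bytes) -> str:
    """Sender side: compute the tag over the signed part of the message."""
    return hmac.new(key, signed_part, hashlib.sha256).hexdigest()


class SigningAuthenticator:
    def __init__(self):
        self._keys = {}  # unique code -> key registered to that code

    def register(self, code: str, key: bytes) -> None:
        self._keys[code] = key

    def verify(self, code: str, signed_part: bytes, signature: str) -> bool:
        """Check the signed part against the tag using the key registered to `code`."""
        key = self._keys.get(code)
        if key is None:
            return False  # unknown code: cannot authenticate
        expected = compute_tag(key, signed_part)
        return hmac.compare_digest(expected, signature)  # constant-time comparison
```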
In an alternate embodiment, e-mail programs would send mail to be authenticated directly to an authentication server. The authentication server would check the message as in any of the above methods. When the authenticator had verified that the message was not part of a junk e-mail effort, the authenticator would “sign” the message and send the signed message on to its intended recipient. The user's mail program that eventually received the message would be able to authenticate it immediately as having been pre-authenticated, either by the signature or by the IP address from which the “signed” message was received. This would spare the mail program from having to perform a remote communication before delivering the message.
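On the receiving side, the pre-authentication test might reduce to the following Python sketch. The network 192.0.2.0/24 is a documentation-only range standing in for the authentication server's addresses, and `signature_valid` is assumed to come from a separate signature check; both are assumptions.

```python
import ipaddress

# Hypothetical address ranges of the authentication server(s).
AUTHENTICATOR_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def pre_authenticated(source_ip: str, signature_valid: bool) -> bool:
    """Accept without a remote check if signed, or if relayed by the authenticator."""
    if signature_valid:
        return True
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in AUTHENTICATOR_NETWORKS)
```

Trusting the source IP is only meaningful for the host that actually made the SMTP connection, not for addresses copied from Received: headers, which a sender can forge.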
In an alternate embodiment, a series of “seeded” e-mail addresses would be provided on the e-mail service to serve as an early warning of a junk e-mail effort. These addresses would correspond to non-existent or ghost accounts which a system has reserved for junk e-mail detection, e.g., Al Aardvark and Arnie Apple. Because these addresses begin with the first characters in ASCII order, a mass mailing sent in that order would reach them first, and the system would be notified early, when Al Aardvark and Arnie Apple receive the beginning of a mass junk e-mailing. Thus, the system could immediately identify the remaining messages with the same or similar contents as junk e-mail. An alternate way to do this would be to “seed” newsgroups and member profiles with phony addresses that only the provider would know of. As a result, these addresses could be watched for incoming junk e-mail, and a notification from the authentication server could then be broadcast to users indicating that mail with the subject of “XYZ” is junk e-mail. This would allow the client or server of the present invention to automatically eliminate the junk e-mail. Alternatively, for a user who has requested that a service provider handle this automatically, the provider would watch the seeded addresses, notice the junk e-mail, and automatically prevent the mail from being transmitted any further to users that have requested services of the system of the present invention.
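Detection through seeded addresses can be sketched as below. The ghost addresses are hypothetical examples chosen to sort early in ASCII order, and “same contents” is approximated by an exact match on a normalized body hash; recognizing merely similar contents would need a fuzzier comparison than this minimal Python sketch provides.

```python
import hashlib
from typing import Iterable

# Hypothetical ghost accounts, named so they sort near the start of an ASCII-ordered list.
GHOST_ACCOUNTS = frozenset({"al.aardvark@example.com", "arnie.apple@example.com"})


def body_fingerprint(body: str) -> str:
    """Hash the body after folding case and collapsing whitespace."""
    return hashlib.sha256(" ".join(body.lower().split()).encode("utf-8")).hexdigest()


def flag_mass_mailing(messages: list[tuple[str, str]],
                      ghosts: Iterable[str] = GHOST_ACCOUNTS) -> list[bool]:
    """Given (recipient, body) pairs, flag every message whose body reached a ghost account."""
    ghost_set = {g.lower() for g in ghosts}
    seeded = {body_fingerprint(body)
              for recipient, body in messages if recipient.lower() in ghost_set}
    return [body_fingerprint(body) in seeded for _recipient, body in messages]
```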
All of the above are only some of the examples of available embodiments of the present invention. Those skilled in the art will readily observe that numerous other modifications and alterations may be made without departing from the spirit and scope of the invention.