WO2005043417A3 - Methods and apparatuses for classifying electronic documents - Google Patents

Methods and apparatuses for classifying electronic documents

Publication number
WO2005043417A3
Authority
WO
WIPO (PCT)
Prior art keywords
multi
electronic documents
dimensional vector
classified
apparatuses
Prior art date
Application number
PCT/US2004/036759
Other languages
French (fr)
Other versions
WO2005043417A2 (en)
Inventor
Vipul Ved Prakash
Mark Stemm
Original Assignee
Cloudmark Inc
Vipul Ved Prakash
Mark Stemm
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cloudmark Inc, Vipul Ved Prakash, Mark Stemm
Publication of WO2005043417A2
Publication of WO2005043417A3

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21 Monitoring or handling of messages
    • H04L51/212 Monitoring or handling of messages using filtering or selective blocking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99942 Manipulating data structure, e.g. compression, compaction, compilation

Abstract

Embodiments of the invention provide methods and apparatuses for classifying electronic documents (e.g., electronic communications) as either spam electronic documents or legitimate electronic documents. In accordance with one embodiment of the invention, each of a plurality of electronic communications is reduced to a corresponding multi-dimensional vector based on a multi-dimensional vector space. The multi-dimensional vectors represent corresponding electronic documents that have been classified as at least one type of electronic document. Subsequent electronic documents to be classified are each reduced to a corresponding multi-dimensional vector that is inserted into the multi-dimensional vector space. The electronic document corresponding to an inserted multi-dimensional vector is classified based upon the proximity of the inserted multi-dimensional vector to at least one previously classified multi-dimensional vector of the multi-dimensional vector space.
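
The proximity-based classification described in the abstract can be illustrated with a minimal sketch. The abstract does not say how a document is reduced to a vector, which distance measure is used, or how many neighbors are consulted, so everything concrete here is an assumption made for illustration: term counts over a fixed hypothetical vocabulary (`VOCABULARY`), Euclidean distance, and a distance-weighted vote among the `k` nearest previously classified vectors. The function names `to_vector`, `distance`, and `classify` are likewise hypothetical and do not come from the patent.

```python
import math
from collections import Counter

# Hypothetical vocabulary defining the axes of the vector space (an assumption).
VOCABULARY = ["free", "winner", "prize", "meeting", "report", "agenda"]


def to_vector(text, vocabulary):
    """Reduce a document to a vector of term counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(vector, labeled_vectors, k=3):
    """Label a vector by a distance-weighted vote of its k nearest neighbors."""
    nearest = sorted(labeled_vectors, key=lambda item: distance(vector, item[0]))[:k]
    votes = Counter()
    for neighbor, label in nearest:
        # Closer neighbors get a larger vote; the small constant avoids
        # division by zero when two vectors coincide.
        votes[label] += 1.0 / (distance(vector, neighbor) + 1e-9)
    return votes.most_common(1)[0][0]


# Previously classified documents, already placed in the vector space.
TRAINING = [
    ("free prize winner", "spam"),
    ("winner winner free prize", "spam"),
    ("meeting agenda report", "legitimate"),
    ("report for the meeting", "legitimate"),
]
SPACE = [(to_vector(text, VOCABULARY), label) for text, label in TRAINING]

if __name__ == "__main__":
    for message in ["claim your free prize", "agenda for the next meeting"]:
        print(message, "->", classify(to_vector(message, VOCABULARY), SPACE))
```

In this sketch a newly classified vector could be appended to `SPACE` along with its label, which corresponds to inserting it into the vector space so that later documents can be compared against it.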
PCT/US2004/036759 2003-11-03 2004-11-03 Methods and apparatuses for classifying electronic documents WO2005043417A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US51701003P 2003-11-03 2003-11-03
US60/517,010 2003-11-03
US10/877,735 US7519565B2 (en) 2003-11-03 2004-06-24 Methods and apparatuses for classifying electronic documents
US10/877,735 2004-06-24

Publications (2)

Publication Number Publication Date
WO2005043417A2 (en) 2005-05-12
WO2005043417A3 (en) 2005-08-18

Family

ID=34556244

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2004/036759 WO2005043417A2 (en) 2003-11-03 2004-11-03 Methods and apparatuses for classifying electronic documents

Country Status (2)

Country Link
US (2) US7519565B2 (en)
WO (1) WO2005043417A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8516377B2 (en) 2005-05-03 2013-08-20 Mcafee, Inc. Indicating Website reputations during Website manipulation of user information

Families Citing this family (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050149546A1 (en) * 2003-11-03 2005-07-07 Prakash Vipul V. Methods and apparatuses for determining and designating classifications of electronic documents
US7519565B2 (en) * 2003-11-03 2009-04-14 Cloudmark, Inc. Methods and apparatuses for classifying electronic documents
US7191175B2 (en) 2004-02-13 2007-03-13 Attenex Corporation System and method for arranging concept clusters in thematic neighborhood relationships in a two-dimensional visual display space
US7702673B2 (en) 2004-10-01 2010-04-20 Ricoh Co., Ltd. System and methods for creation and use of a mixed media environment
US9171202B2 (en) * 2005-08-23 2015-10-27 Ricoh Co., Ltd. Data organization and access for mixed media document system
US9373029B2 (en) 2007-07-11 2016-06-21 Ricoh Co., Ltd. Invisible junction feature recognition for document security or annotation
US9405751B2 (en) 2005-08-23 2016-08-02 Ricoh Co., Ltd. Database for mixed media document system
US7812986B2 (en) 2005-08-23 2010-10-12 Ricoh Co. Ltd. System and methods for use of voice mail and email in a mixed media environment
US9530050B1 (en) 2007-07-11 2016-12-27 Ricoh Co., Ltd. Document annotation sharing
US9495385B2 (en) 2004-10-01 2016-11-15 Ricoh Co., Ltd. Mixed media reality recognition using multiple specialized indexes
US9384619B2 (en) 2006-07-31 2016-07-05 Ricoh Co., Ltd. Searching media content for objects specified using identifiers
US8156116B2 (en) 2006-07-31 2012-04-10 Ricoh Co., Ltd Dynamic presentation of targeted information in a mixed media reality recognition system
US8176054B2 (en) 2007-07-12 2012-05-08 Ricoh Co. Ltd Retrieving electronic documents by converting them to synthetic text
US8989431B1 (en) 2007-07-11 2015-03-24 Ricoh Co., Ltd. Ad hoc paper-based networking with mixed media reality
US7356777B2 (en) 2005-01-26 2008-04-08 Attenex Corporation System and method for providing a dynamic user interface for a dense three-dimensional scene
US7404151B2 (en) 2005-01-26 2008-07-22 Attenex Corporation System and method for providing a dynamic user interface for a dense three-dimensional scene
US8438499B2 (en) 2005-05-03 2013-05-07 Mcafee, Inc. Indicating website reputations during user interactions
US8566726B2 (en) 2005-05-03 2013-10-22 Mcafee, Inc. Indicating website reputations based on website handling of personal information
US7765481B2 (en) 2005-05-03 2010-07-27 Mcafee, Inc. Indicating website reputations during an electronic commerce transaction
US7822620B2 (en) 2005-05-03 2010-10-26 Mcafee, Inc. Determining website reputations using automatic testing
US9384345B2 (en) 2005-05-03 2016-07-05 Mcafee, Inc. Providing alternative web content based on website reputation assessment
US7769751B1 (en) * 2006-01-17 2010-08-03 Google Inc. Method and apparatus for classifying documents based on user inputs
US8489987B2 (en) 2006-07-31 2013-07-16 Ricoh Co., Ltd. Monitoring and analyzing creation and usage of visual content using image and hotspot interaction
US8201076B2 (en) 2006-07-31 2012-06-12 Ricoh Co., Ltd. Capturing symbolic information from documents upon printing
US9176984B2 (en) 2006-07-31 2015-11-03 Ricoh Co., Ltd Mixed media reality retrieval of differentially-weighted links
US9063952B2 (en) 2006-07-31 2015-06-23 Ricoh Co., Ltd. Mixed media reality recognition with image tracking
US8204891B2 (en) * 2007-09-21 2012-06-19 Limelight Networks, Inc. Method and subsystem for searching media content within a content-search-service system
US8966389B2 (en) * 2006-09-22 2015-02-24 Limelight Networks, Inc. Visual interface for identifying positions of interest within a sequentially ordered information encoding
US7917492B2 (en) * 2007-09-21 2011-03-29 Limelight Networks, Inc. Method and subsystem for information acquisition and aggregation to facilitate ontology and language-model generation within a content-search-service system
US9015172B2 (en) 2006-09-22 2015-04-21 Limelight Networks, Inc. Method and subsystem for searching media content within a content-search service system
US8396878B2 (en) 2006-09-22 2013-03-12 Limelight Networks, Inc. Methods and systems for generating automated tags for video files
US7945627B1 (en) * 2006-09-28 2011-05-17 Bitdefender IPR Management Ltd. Layout-based electronic communication filtering systems and methods
US8973678B1 (en) * 2006-11-22 2015-03-10 Symantec Corporation Misspelled word analysis for undesirable message classification
US8572184B1 (en) 2007-10-04 2013-10-29 Bitdefender IPR Management Ltd. Systems and methods for dynamically integrating heterogeneous anti-spam filters
US8010614B1 (en) 2007-11-01 2011-08-30 Bitdefender IPR Management Ltd. Systems and methods for generating signatures for electronic communication classification
US8695100B1 (en) 2007-12-31 2014-04-08 Bitdefender IPR Management Ltd. Systems and methods for electronic fraud prevention
US20090274376A1 (en) * 2008-05-05 2009-11-05 Yahoo! Inc. Method for efficiently building compact models for large multi-class text classification
US8170966B1 (en) 2008-11-04 2012-05-01 Bitdefender IPR Management Ltd. Dynamic streaming message clustering for rapid spam-wave detection
US8340405B2 (en) * 2009-01-13 2012-12-25 Fuji Xerox Co., Ltd. Systems and methods for scalable media categorization
US8515957B2 (en) 2009-07-28 2013-08-20 Fti Consulting, Inc. System and method for displaying relationships between electronically stored information to provide classification suggestions via injection
CA2772082C (en) 2009-08-24 2019-01-15 William C. Knight Generating a reference set for use during document review
US20120254333A1 (en) * 2010-01-07 2012-10-04 Rajarathnam Chandramouli Automated detection of deception in short and multilingual electronic messages
US20120047172A1 (en) * 2010-08-23 2012-02-23 Google Inc. Parallel document mining
US8464342B2 (en) * 2010-08-31 2013-06-11 Microsoft Corporation Adaptively selecting electronic message scanning rules
US8332415B1 (en) * 2011-03-16 2012-12-11 Google Inc. Determining spam in information collected by a source
US9058331B2 (en) 2011-07-27 2015-06-16 Ricoh Co., Ltd. Generating a conversation in a social network based on visual search results
US8762365B1 (en) * 2011-08-05 2014-06-24 Amazon Technologies, Inc. Classifying network sites using search queries
CN104487966A (en) * 2012-07-23 2015-04-01 惠普发展公司,有限责任合伙企业 Document classification
US11068546B2 (en) 2016-06-02 2021-07-20 Nuix North America Inc. Computer-implemented system and method for analyzing clusters of coded documents
RU2634180C1 (en) 2016-06-24 2017-10-24 Акционерное общество "Лаборатория Касперского" System and method for determining spam-containing message by topic of message sent via e-mail
US9954805B2 (en) * 2016-07-22 2018-04-24 Mcafee, Llc Graymail filtering-based on user preferences
US10489589B2 (en) * 2016-11-21 2019-11-26 Cylance Inc. Anomaly based malware detection
US20200019767A1 (en) * 2018-07-12 2020-01-16 KnowledgeLake, Inc. Document classification system
US11687717B2 (en) * 2019-12-03 2023-06-27 Morgan State University System and method for monitoring and routing of computer traffic for cyber threat risk embedded in electronic documents

Family Cites Families (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6298174B1 (en) 1996-08-12 2001-10-02 Battelle Memorial Institute Three-dimensional display of document set
US6141686A (en) * 1998-03-13 2000-10-31 Deterministic Networks, Inc. Client-side application-classifier gathering network-traffic statistics and application and user names using extensible-service provider plugin for policy-based network control
US6192360B1 (en) * 1998-06-23 2001-02-20 Microsoft Corporation Methods and apparatus for classifying text and for building a text classifier
US6161130A (en) * 1998-06-23 2000-12-12 Microsoft Corporation Technique which utilizes a probabilistic classifier to detect "junk" e-mail by automatically updating a training and re-training the classifier based on the updated training set
US6351712B1 (en) * 1998-12-28 2002-02-26 Rosetta Inpharmatics, Inc. Statistical combining of cell expression profiles
US7272593B1 (en) * 1999-01-26 2007-09-18 International Business Machines Corporation Method and apparatus for similarity retrieval from iterative refinement
US6941321B2 (en) * 1999-01-26 2005-09-06 Xerox Corporation System and method for identifying similarities among objects in a collection
US6564202B1 (en) 1999-01-26 2003-05-13 Xerox Corporation System and method for visually representing the contents of a multiple data object cluster
US6598054B2 (en) 1999-01-26 2003-07-22 Xerox Corporation System and method for clustering data objects in a collection
US6393427B1 (en) * 1999-03-22 2002-05-21 Nec Usa, Inc. Personalized navigation trees
US6647341B1 (en) * 1999-04-09 2003-11-11 Whitehead Institute For Biomedical Research Methods for classifying samples and ascertaining previously unknown classes
US6563952B1 (en) * 1999-10-18 2003-05-13 Hitachi America, Ltd. Method and apparatus for classification of high dimensional data
CA2307404A1 (en) * 2000-05-02 2001-11-02 Provenance Systems Inc. Computer readable electronic records automated classification system
US6766316B2 (en) 2001-01-18 2004-07-20 Science Applications International Corporation Method and system of ranking and clustering for document indexing and retrieval
US6901398B1 (en) * 2001-02-12 2005-05-31 Microsoft Corporation System and method for constructing and personalizing a universal information classifier
US6952700B2 (en) * 2001-03-22 2005-10-04 International Business Machines Corporation Feature weighting in κ-means clustering
US7194483B1 (en) * 2001-05-07 2007-03-20 Intelligenxia, Inc. Method, system, and computer program product for concept-based multi-dimensional analysis of unstructured information
US7308451B1 (en) 2001-09-04 2007-12-11 Stratify, Inc. Method and system for guided cluster based processing on prototypes
US6459974B1 (en) * 2001-05-30 2002-10-01 Eaton Corporation Rules-based occupant classification system for airbag deployment
US20030030666A1 (en) 2001-08-07 2003-02-13 Amir Najmi Intelligent adaptive navigation optimization
US6778995B1 (en) 2001-08-31 2004-08-17 Attenex Corporation System and method for efficiently generating cluster groupings in a multi-dimensional concept space
US7363311B2 (en) * 2001-11-16 2008-04-22 Nippon Telegraph And Telephone Corporation Method of, apparatus for, and computer program for mapping contents having meta-information
JP3860046B2 (en) * 2002-02-15 2006-12-20 インターナショナル・ビジネス・マシーンズ・コーポレーション Program, system and recording medium for information processing using random sample hierarchical structure
JP4175001B2 (en) * 2002-03-04 2008-11-05 セイコーエプソン株式会社 Document data retrieval device
US7158983B2 (en) * 2002-09-23 2007-01-02 Battelle Memorial Institute Text analysis technique
CA2530350A1 (en) * 2003-06-25 2005-03-10 National Institute Of Advanced Industrial Science And Technology Digital cell
GB0315154D0 (en) * 2003-06-28 2003-08-06 Ibm Improvements to hypertext integrity
US7610313B2 (en) * 2003-07-25 2009-10-27 Attenex Corporation System and method for performing efficient document scoring and clustering
US7519565B2 (en) 2003-11-03 2009-04-14 Cloudmark, Inc. Methods and apparatuses for classifying electronic documents
US20050149546A1 (en) 2003-11-03 2005-07-07 Prakash Vipul V. Methods and apparatuses for determining and designating classifications of electronic documents
US20050282193A1 (en) * 2004-04-23 2005-12-22 Bulyk Martha L Space efficient polymer sets

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DUDANI S A: "The Distance-Weighted k-Nearest-Neighbor Rule", IEEE TRANSACTIONS ON SYSTEMS, MAN AND CYBERNETICS, vol. SMC-6, no. 4, April 1976 (1976-04-01), LOS ALAMITOS, CA, US, pages 325 - 327, XP009044290 *
MACLEOD J E S ET AL: "A Re-Examination of the Distance-Weighted k-Nearest Neighbor Classification Rule", IEEE TRANSACTIONS ON SYSTEMS, MAN AND CYBERNETICS, vol. SMC-17, no. 4, August 1987 (1987-08-01), pages 689 - 696, XP009044291 *
YONGHONG LI ET AL: "Classification of text documents", PATTERN RECOGNITION, 1998. PROCEEDINGS. FOURTEENTH INTERNATIONAL CONFERENCE ON BRISBANE, QLD., AUSTRALIA 16-20 AUG. 1998, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, vol. 2, 16 August 1998 (1998-08-16), pages 1295 - 1297, XP010297856, ISBN: 0-8186-8512-3 *

Also Published As

Publication number Publication date
US20050097435A1 (en) 2005-05-05
US20090259608A1 (en) 2009-10-15
US7519565B2 (en) 2009-04-14
WO2005043417A2 (en) 2005-05-12
US7890441B2 (en) 2011-02-15

Similar Documents

Publication Publication Date Title
WO2005043417A3 (en) Methods and apparatuses for classifying electronic documents
WO2007068519A3 (en) Method and systems using radio frequency identifier tags for comparing and authenticating items
WO2004086192A3 (en) Systems and methods for interactive search query refinement
WO2005043416A3 (en) Methods and apparatuses for determining and designating classifications of electronic documents
WO2007027208A3 (en) Traversing data in a repeatable manner
WO2002080022A3 (en) Knowledge discovery from data sets
WO2004114045A3 (en) Two-phase hash value matching technique in message protection systems
MY152525A (en) Video abstraction
WO2006094206A3 (en) Generating structured information
WO2002031682A3 (en) Email to database import utility
WO2005086723A3 (en) Passively populating a participant list with known contacts
WO2010039519A3 (en) Methods and apparatus related to document processing based on a document type
WO2007062172A3 (en) A method of processing annotations using an editable multi-dimensional catalog
WO2008137637A3 (en) Methods, arrangements and systems for obtaining information associated with a sample using brillouin microscopy
WO2006077163A3 (en) Wire-printed circuit board or card comprising conductors with a rectangular or square cross-section
WO2005076923A3 (en) Database manipulations using group theory
WO2004057529A3 (en) Region-based image processor
WO2009075337A1 (en) Encryption method, decryption method, device, and program
WO2006115655A3 (en) Linking diffie hellman with hfs authentication by using a seed
WO2007121121A3 (en) Laminated biosensor and its manufacturing method
Shabbir et al. A new estimator of population mean in stratified sampling
US9968014B2 (en) Shielding cover, shielding cover assembly and electronic device employing the same
US10535922B2 (en) Host with multiple antennas
WO2002008876A3 (en) Folding cellular telephone and digital assistant with improved keyboard
WO2004017236A3 (en) Method for communicating structured information

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
122 EP: PCT application non-entry in European phase