DE602006021384D1 - Generation of descriptions for classes and clusters of documents - Google Patents

Generation of descriptions for classes and clusters of documents

Info

Publication number
DE602006021384D1
DE602006021384D1 DE602006021384T DE602006021384T DE602006021384D1 DE 602006021384 D1 DE602006021384 D1 DE 602006021384D1 DE 602006021384 T DE602006021384 T DE 602006021384T DE 602006021384 T DE602006021384 T DE 602006021384T DE 602006021384 D1 DE602006021384 D1 DE 602006021384D1
Authority
DE
Germany
Prior art keywords
clusters
classes
descriptions
documents
generation
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
DE602006021384T
Other languages
English (en)
Inventor
Cyril Goutte
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xerox Corp
Original Assignee
Xerox Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2005-12-20
Filing date
2006-12-14
Publication date
2011-06-01
Application filed by Xerox Corp
Publication of DE602006021384D1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
DE602006021384T 2005-12-20 2006-12-14 Generation of descriptions for classes and clusters of documents Active DE602006021384D1 (de)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/312,764 US7813919B2 (en) 2005-12-20 2005-12-20 Class description generation for clustering and categorization

Publications (1)

Publication Number Publication Date
DE602006021384D1 (de) 2011-06-01

Family

ID=37726701

Family Applications (1)

Application Number Title Priority Date Filing Date
DE602006021384T Active DE602006021384D1 (de) 2005-12-20 2006-12-14 Generation of descriptions for classes and clusters of documents

Country Status (3)

Country Link
US (1) US7813919B2 (de)
EP (2) EP2302532A1 (de)
DE (1) DE602006021384D1 (de)

Families Citing this family (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7809733B2 (en) * 2006-03-02 2010-10-05 Oracle International Corp. Effort based relevance
US8364467B1 (en) 2006-03-31 2013-01-29 Google Inc. Content-based classification
US7917492B2 (en) * 2007-09-21 2011-03-29 Limelight Networks, Inc. Method and subsystem for information acquisition and aggregation to facilitate ontology and language-model generation within a content-search-service system
JP5082374B2 (ja) * 2006-10-19 2012-11-28 富士通株式会社 Phrase alignment program, translation program, phrase alignment device, and phrase alignment method
US7757163B2 (en) * 2007-01-05 2010-07-13 International Business Machines Corporation Method and system for characterizing unknown annotator and its type system with respect to reference annotation types and associated reference taxonomy nodes
US7856351B2 (en) * 2007-01-19 2010-12-21 Microsoft Corporation Integrated speech recognition and semantic classification
US8108413B2 (en) * 2007-02-15 2012-01-31 International Business Machines Corporation Method and apparatus for automatically discovering features in free form heterogeneous data
US8996587B2 (en) 2007-02-15 2015-03-31 International Business Machines Corporation Method and apparatus for automatically structuring free form hetergeneous data
WO2009009192A2 (en) * 2007-04-18 2009-01-15 Aumni Data, Inc. Adaptive archive data management
US8856123B1 (en) * 2007-07-20 2014-10-07 Hewlett-Packard Development Company, L.P. Document classification
JP5379138B2 (ja) * 2007-08-23 2013-12-25 グーグル・インコーポレーテッド Domain dictionary creation
US7917355B2 (en) * 2007-08-23 2011-03-29 Google Inc. Word detection
US7983902B2 (en) * 2007-08-23 2011-07-19 Google Inc. Domain dictionary creation by detection of new topic words using divergence value comparison
US8140584B2 (en) * 2007-12-10 2012-03-20 Aloke Guha Adaptive data classification for data mining
US8189930B2 (en) * 2008-07-17 2012-05-29 Xerox Corporation Categorizer with user-controllable calibration
US8788497B2 (en) * 2008-09-15 2014-07-22 Microsoft Corporation Automated criterion-based grouping and presenting
US8266148B2 (en) * 2008-10-07 2012-09-11 Aumni Data, Inc. Method and system for business intelligence analytics on unstructured data
US8339680B2 (en) * 2009-04-02 2012-12-25 Xerox Corporation Printer image log system for document gathering and retention
US8386437B2 (en) * 2009-04-02 2013-02-26 Xerox Corporation Apparatus and method for document collection and filtering
US9405456B2 (en) * 2009-06-08 2016-08-02 Xerox Corporation Manipulation of displayed objects by virtual magnetism
US8165974B2 (en) 2009-06-08 2012-04-24 Xerox Corporation System and method for assisted document review
US8566349B2 (en) 2009-09-28 2013-10-22 Xerox Corporation Handwritten document categorizer and method of training
JP2011095905A (ja) * 2009-10-28 2011-05-12 Sony Corp Information processing apparatus and method, and program
US8756503B2 (en) 2011-02-21 2014-06-17 Xerox Corporation Query generation from displayed text documents using virtual magnets
US8860763B2 (en) 2012-01-31 2014-10-14 Xerox Corporation Reversible user interface component
US8880525B2 (en) 2012-04-02 2014-11-04 Xerox Corporation Full and semi-batch clustering
US9189473B2 (en) 2012-05-18 2015-11-17 Xerox Corporation System and method for resolving entity coreference
US9977829B2 (en) * 2012-10-12 2018-05-22 Hewlett-Packard Development Company, L.P. Combinatorial summarizer
US20140289260A1 (en) * 2013-03-22 2014-09-25 Hewlett-Packard Development Company, L.P. Keyword Determination
CN103678274A (zh) * 2013-04-15 2014-03-26 南京邮电大学 Text classification feature extraction method based on improved mutual information and entropy
US20150127323A1 (en) * 2013-11-04 2015-05-07 Xerox Corporation Refining inference rules with temporal event clustering
JP6044963B2 (ja) 2014-02-12 2016-12-14 International Business Machines Corporation Information processing apparatus, method, and program
CN104991891B (zh) * 2015-07-28 2018-03-30 北京大学 Short text feature extraction method
CN105045913B (zh) * 2015-08-14 2018-08-28 北京工业大学 Text classification method based on WordNet and latent semantic analysis
TWI571756B (zh) 2015-12-11 2017-02-21 財團法人工業技術研究院 Method and system for analyzing browsing records and their documents
CN107967912B (zh) * 2017-11-28 2022-02-25 广州势必可赢网络科技有限公司 Human voice segmentation method and device
US11893500B2 (en) 2017-11-28 2024-02-06 International Business Machines Corporation Data classification for data lake catalog
US11301629B2 (en) 2019-08-21 2022-04-12 International Business Machines Corporation Interleaved conversation concept flow enhancement

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5537488A (en) 1993-09-16 1996-07-16 Massachusetts Institute Of Technology Pattern recognition system with statistical classification
US5857179A (en) * 1996-09-09 1999-01-05 Digital Equipment Corporation Computer method and apparatus for clustering documents and automatic generation of cluster keywords
US6137911A (en) 1997-06-16 2000-10-24 The Dialog Corporation Plc Test classification system and method
US6104835A (en) * 1997-11-14 2000-08-15 Kla-Tencor Corporation Automatic knowledge database generation for classifying objects and systems therefor
US6424971B1 (en) 1999-10-29 2002-07-23 International Business Machines Corporation System and method for interactive classification and analysis of data
US6862586B1 (en) 2000-02-11 2005-03-01 International Business Machines Corporation Searching databases that identifying group documents forming high-dimensional torus geometric k-means clustering, ranking, summarizing based on vector triplets
US7035431B2 (en) * 2002-02-22 2006-04-25 Microsoft Corporation System and method for probabilistic exemplar-based pattern tracking
US7165024B2 (en) 2002-02-22 2007-01-16 Nec Laboratories America, Inc. Inferring hierarchical descriptions of a set of documents
US7031909B2 (en) * 2002-03-12 2006-04-18 Verity, Inc. Method and system for naming a cluster of words and phrases
US6931347B2 (en) * 2002-03-29 2005-08-16 International Business Machines Corporation Safety stock determination
US7085771B2 (en) * 2002-05-17 2006-08-01 Verity, Inc System and method for automatically discovering a hierarchy of concepts from a corpus of documents
US20030233232A1 (en) * 2002-06-12 2003-12-18 Lucent Technologies Inc. System and method for measuring domain independence of semantic classes
US7139754B2 (en) 2004-02-09 2006-11-21 Xerox Corporation Method for multi-class, multi-label categorization using probabilistic hierarchical modeling
US7457808B2 (en) 2004-12-17 2008-11-25 Xerox Corporation Method and apparatus for explaining categorization decisions
US20060287848A1 (en) * 2005-06-20 2006-12-21 Microsoft Corporation Language classification with random feature clustering
US7849087B2 (en) 2005-06-29 2010-12-07 Xerox Corporation Incremental training for probabilistic categorizer
US7630977B2 (en) 2005-06-29 2009-12-08 Xerox Corporation Categorization including dependencies between different category systems
US8209335B2 (en) * 2005-09-20 2012-06-26 International Business Machines Corporation Extracting informative phrases from unstructured text

Also Published As

Publication number Publication date
EP1801714A3 (de) 2007-09-05
EP1801714B1 (de) 2011-04-20
EP2302532A1 (de) 2011-03-30
US20070143101A1 (en) 2007-06-21
US7813919B2 (en) 2010-10-12
EP1801714A2 (de) 2007-06-27

Similar Documents

Publication Publication Date Title
DE602006021384D1 (de) Generation of descriptions for classes and clusters of documents
DE602006013810D1 (de) Polyester obtained from biomass resources and production process therefor
DE602006011834D1 (de) Start-up circuit and start-up method for bandgap voltage generators
DE602007006394D1 (de) Microphone and mounting arrangement
DE602006020661D1 (de) Aqueous dispersion of polymer particles
ATE480322T1 (de) Ceramic particles
DE602006013788D1 (de) Cyclo-1 gene from maize and promoter
EP1960952A4 (de) Analysis of administrative healthcare claims data and other data sources
DE602006013761D1 (de) Production process for water-absorbent particles, and water-absorbent particles
DE502006004466D1 (de) Electrical component
DE602007013962D1 (de) Hydrogen compressor system
DE602008001533D1 (de) Photomask data generation method, photomask production method, exposure method, and device manufacturing method
DE602007000901D1 (de) File division by division of the clusters and the management information
DE602006001272D1 (de) Anesthesia box and microscope
FI20055408A0 (fi) Creation of a finite computer model
DE602006017437D1 (de) Polyareneazole/thermoplastic pulp and production process therefor
BRPI0908668A2 (pt) Hearing protector with loudspeakers
AU314482S (en) Dust cover for an electrical connector
DE602006005502D1 (de) Developer regulating member and developing device
DE502005004418D1 (de) Turbocharger
DE602007006881D1 (de) Air switch, opening spring for the air switch, and connection method therefor
AT503152A3 (de) Installation of large electrical components in double-deck multiple-unit trains
DE502006005974D1 (de) Use of variables in multiple automation systems
DE602007002258D1 (de) Current preamplifier and associated current comparator
UY3519Q (es) Cover for air purifier