WO2005089340A3 - Training tree transducers - Google Patents

Training tree transducers

Info

Publication number
WO2005089340A3
Authority
WO
WIPO (PCT)
Prior art keywords
training
transducers
rules
those
tree
Application number
PCT/US2005/008648
Other languages
French (fr)
Other versions
WO2005089340A2 (en)
Inventor
Kevin Knight
Jonathan Graehl
Original Assignee
Univ Southern California
Kevin Knight
Jonathan Graehl
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2004-03-15
Filing date
2005-03-15
Publication date
2008-01-10
Application filed by Univ Southern California, Kevin Knight, Jonathan Graehl
Publication of WO2005089340A2
Publication of WO2005089340A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/44 Statistical methods, e.g. probability models

Abstract

Training of tree transducers is described. Given sample input/output pairs as training data (100, 110) and a set of tree transducer rules (120), the two are combined to yield locally optimal weights for those rules (140). This is done by building a weighted derivation forest for each input/output pair and applying counting methods to those forests (130).
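The counting step can be illustrated with a minimal Python sketch, assuming each derivation forest (130) has already been built as an acyclic hypergraph whose hyperedges are labeled with rule ids. It computes expected rule counts with the standard inside-outside recursions inside an EM loop, and it normalizes weights over rules that share a left-hand state. This is one common way to apply counting methods to forests, not necessarily the procedure claimed here; the names Edge, Forest, inside, outside, expected_counts, em and the lhs grouping are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Edge:
    head: str          # forest node derived by this hyperedge
    rule: str          # id of the transducer rule applied at this step
    tails: tuple = ()  # child forest nodes the rule's right-hand side expands into


@dataclass
class Forest:
    nodes: list  # topologically ordered: every tail node precedes its head
    root: str    # node covering the whole input/output pair
    edges: list


def inside(forest, w):
    """beta[v] = total weight of all sub-derivations rooted at node v."""
    incoming = defaultdict(list)
    for e in forest.edges:
        incoming[e.head].append(e)
    beta = {}
    for v in forest.nodes:  # children are finished before their parents
        beta[v] = sum(w[e.rule] * prod(beta[t] for t in e.tails) for e in incoming[v])
    return beta, incoming


def outside(forest, w, beta, incoming):
    """alpha[v] = total weight of the forest context surrounding node v."""
    alpha = defaultdict(float)
    alpha[forest.root] = 1.0
    for v in reversed(forest.nodes):  # parents are finished before their children
        for e in incoming[v]:
            for i, t in enumerate(e.tails):
                siblings = prod(beta[u] for j, u in enumerate(e.tails) if j != i)
                alpha[t] += alpha[v] * w[e.rule] * siblings
    return alpha


def expected_counts(forest, w):
    """Posterior expected number of uses of each rule in this pair's derivations."""
    beta, incoming = inside(forest, w)
    alpha = outside(forest, w, beta, incoming)
    z = beta[forest.root]
    counts = defaultdict(float)
    if z == 0.0:
        return counts  # no derivation has positive weight; the pair contributes nothing
    for e in forest.edges:
        counts[e.rule] += alpha[e.head] * w[e.rule] * prod(beta[t] for t in e.tails) / z
    return counts


def em(forests, w, lhs, iterations=10):
    """Re-estimate rule weights; normalizing per left-hand state is an assumption."""
    w = dict(w)
    for _ in range(iterations):
        total = defaultdict(float)
        for f in forests:
            for r, c in expected_counts(f, w).items():
                total[r] += c
        norm = defaultdict(float)
        for r, c in total.items():
            norm[lhs[r]] += c
        # Keep the old weight for any state that received no counts at all.
        w = {r: total[r] / norm[lhs[r]] if norm[lhs[r]] > 0 else w[r] for r in w}
    return w


# Toy data: pair A has one derivation (via r1); pair B can use r1 or r2.
forest_a = Forest(["a", "q"], "q", [Edge("a", "ra"), Edge("q", "r1", ("a",))])
forest_b = Forest(["a", "b", "q"], "q",
                  [Edge("a", "ra"), Edge("b", "rb"),
                   Edge("q", "r1", ("a",)), Edge("q", "r2", ("b",))])
lhs = {"r1": "q", "r2": "q", "ra": "qa", "rb": "qb"}
start = {"r1": 0.5, "r2": 0.5, "ra": 1.0, "rb": 1.0}
print(em([forest_a, forest_b], start, lhs, iterations=3)["r1"])  # 0.9375
```

Each EM iteration cannot decrease the likelihood of the training pairs, but it may stop at a local optimum, which fits the "locally optimal weights" above. In the toy data, pair B is derivable either way, so the weight of r2 halves on every iteration while r1, which pair A requires, absorbs the difference.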

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US55358704P 2004-03-15 2004-03-15
US60/553,587 2004-03-15

Publications (2)

Publication Number Publication Date
WO2005089340A2 (en) 2005-09-29
WO2005089340A3 (en) 2008-01-10

Family

ID=34994277

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2005/008648 WO2005089340A2 (en) 2004-03-15 2005-03-15 Training tree transducers

Country Status (2)

Country Link
US (1) US7698125B2 (en)
WO (1) WO2005089340A2 (en)

Families Citing this family (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2002316581A1 (en) 2001-07-03 2003-01-21 University Of Southern California A syntax-based statistical translation model
AU2003269808A1 (en) 2002-03-26 2004-01-06 University Of Southern California Constructing a translation lexicon from comparable, non-parallel corpora
US8548794B2 (en) 2003-07-02 2013-10-01 University Of Southern California Statistical noun phrase translation
US7711545B2 (en) * 2003-07-02 2010-05-04 Language Weaver, Inc. Empirical methods for splitting compound words with application to machine translation
US8296127B2 (en) 2004-03-23 2012-10-23 University Of Southern California Discovery of parallel text portions in comparable collections of corpora and training using comparable texts
US8666725B2 (en) 2004-04-16 2014-03-04 University Of Southern California Selection and use of nonstatistical translation components in a statistical machine translation framework
JP5452868B2 (en) 2004-10-12 2014-03-26 University Of Southern California Training for text-to-text applications that use string-to-tree conversion for training and decoding
US8886517B2 (en) 2005-06-17 2014-11-11 Language Weaver, Inc. Trust scoring for language translation systems
US8676563B2 (en) 2009-10-01 2014-03-18 Language Weaver, Inc. Providing human-generated and machine-generated trusted translations
US7697827B2 (en) 2005-10-17 2010-04-13 Konicek Jeffrey C User-friendlier interfaces for a camera
US10319252B2 (en) 2005-11-09 2019-06-11 Sdl Inc. Language capability assessment and training apparatus and techniques
WO2007078220A2 (en) * 2005-12-30 2007-07-12 Telefonaktiebolaget Lm Ericsson (Publ) Compiling method for command based router classifiers
US7966173B2 (en) * 2006-03-22 2011-06-21 Nuance Communications, Inc. System and method for diacritization of text
US8943080B2 (en) 2006-04-07 2015-01-27 University Of Southern California Systems and methods for identifying parallel documents and sentence fragments in multilingual document collections
US7827028B2 (en) * 2006-04-07 2010-11-02 Basis Technology Corporation Method and system of machine translation
US8886518B1 (en) 2006-08-07 2014-11-11 Language Weaver, Inc. System and method for capitalizing machine translated text
US8433556B2 (en) 2006-11-02 2013-04-30 University Of Southern California Semi-supervised training for statistical word alignment
US9122674B1 (en) 2006-12-15 2015-09-01 Language Weaver, Inc. Use of annotations in statistical machine translation
US8468149B1 (en) 2007-01-26 2013-06-18 Language Weaver, Inc. Multi-lingual online community
US7962323B2 (en) * 2007-03-07 2011-06-14 Microsoft Corporation Converting dependency grammars to efficiently parsable context-free grammars
US8615389B1 (en) 2007-03-16 2013-12-24 Language Weaver, Inc. Generation and exploitation of an approximate language model
US8831928B2 (en) 2007-04-04 2014-09-09 Language Weaver, Inc. Customizable machine translation service
US8825466B1 (en) * 2007-06-08 2014-09-02 Language Weaver, Inc. Modification of annotated bilingual segment pairs in syntax-based machine translation
US8311806B2 (en) * 2008-06-06 2012-11-13 Apple Inc. Data detection in a sequence of tokens using decision tree reductions
US8738360B2 (en) 2008-06-06 2014-05-27 Apple Inc. Data detection of a character sequence having multiple possible data types
US8447588B2 (en) * 2008-12-18 2013-05-21 Palo Alto Research Center Incorporated Region-matching transducers for natural language processing
US8510097B2 (en) * 2008-12-18 2013-08-13 Palo Alto Research Center Incorporated Region-matching transducers for text-characterization
US20100299132A1 (en) * 2009-05-22 2010-11-25 Microsoft Corporation Mining phrase pairs from an unstructured resource
US8990064B2 (en) 2009-07-28 2015-03-24 Language Weaver, Inc. Translating documents based on content
US8380486B2 (en) 2009-10-01 2013-02-19 Language Weaver, Inc. Providing machine-generated translations and corresponding trust levels
US10417646B2 (en) 2010-03-09 2019-09-17 Sdl Inc. Predicting the cost associated with translating textual content
US9317595B2 (en) * 2010-12-06 2016-04-19 Yahoo! Inc. Fast title/summary extraction from long descriptions
US20120158398A1 (en) * 2010-12-17 2012-06-21 John Denero Combining Model-Based Aligner Using Dual Decomposition
US8612204B1 (en) * 2011-03-30 2013-12-17 Google Inc. Techniques for reordering words of sentences for improved translation between languages
US11003838B2 (en) 2011-04-18 2021-05-11 Sdl Inc. Systems and methods for monitoring post translation editing
US8694303B2 (en) 2011-06-15 2014-04-08 Language Weaver, Inc. Systems and methods for tuning parameters in statistical machine translation
US8886515B2 (en) 2011-10-19 2014-11-11 Language Weaver, Inc. Systems and methods for enhancing machine translation post edit review processes
US8874615B2 (en) * 2012-01-13 2014-10-28 Quova, Inc. Method and apparatus for implementing a learning model for facilitating answering a query on a database
US8942973B2 (en) 2012-03-09 2015-01-27 Language Weaver, Inc. Content page URL translation
US10261994B2 (en) 2012-05-25 2019-04-16 Sdl Inc. Method and system for automatic management of reputation of translators
US9152622B2 (en) 2012-11-26 2015-10-06 Language Weaver, Inc. Personalized machine translation via online adaptation
US9213694B2 (en) 2013-10-10 2015-12-15 Language Weaver, Inc. Efficient online domain adaptation

Family Cites Families (69)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS57201958A (en) * 1981-06-05 1982-12-10 Hitachi Ltd Device and method for interpretation between natural languages
JPS58201175A (en) * 1982-05-20 1983-11-22 Kokusai Denshin Denwa Co Ltd <Kdd> Machine translation system
US4642526A (en) * 1984-09-14 1987-02-10 Angstrom Robotics & Technologies, Inc. Fluorescent object recognition system having self-modulated light source
JPS61217871A (en) * 1985-03-25 1986-09-27 Toshiba Corp Translation processor
DE3616751A1 (en) * 1985-05-20 1986-11-20 Sharp K.K., Osaka TRANSLATION SYSTEM
JPH083815B2 (en) 1985-10-25 1996-01-17 Hitachi Ltd Natural language co-occurrence relation dictionary maintenance method
US5146405A (en) * 1988-02-05 1992-09-08 At&T Bell Laboratories Methods for part-of-speech determination and usage
JPH0242572A (en) * 1988-08-03 1990-02-13 Hitachi Ltd Preparation/maintenance method for co-occurrence relation dictionary
JPH02301869A (en) * 1989-05-17 1990-12-13 Hitachi Ltd Method for maintaining and supporting natural language processing system
US5369574A (en) 1990-08-01 1994-11-29 Canon Kabushiki Kaisha Sentence generating system
US5212730A (en) * 1991-07-01 1993-05-18 Texas Instruments Incorporated Voice recognition of proper names using text-derived recognition models
US5477451A (en) * 1991-07-25 1995-12-19 International Business Machines Corp. Method and system for natural language translation
US5267156A (en) * 1991-12-05 1993-11-30 International Business Machines Corporation Method for constructing a knowledge base, knowledge base system, machine translation method and system therefor
GB9209346D0 (en) 1992-04-30 1992-06-17 Sharp Kk Machine translation system
US6278967B1 (en) * 1992-08-31 2001-08-21 Logovista Corporation Automated system for generating natural language translations that are domain-specific, grammar rule-based, and/or based on part-of-speech analysis
US5432948A (en) * 1993-04-26 1995-07-11 Taligent, Inc. Object-oriented rule-based text input transliteration system
GB2279164A (en) * 1993-06-18 1994-12-21 Canon Res Ct Europe Ltd Processing a bilingual database.
US5510981A (en) * 1993-10-28 1996-04-23 International Business Machines Corporation Language translation apparatus and method using context-based translation models
US6304841B1 (en) * 1993-10-28 2001-10-16 International Business Machines Corporation Automatic construction of conditional exponential models from elementary features
JP3345763B2 (en) 1994-03-04 2002-11-18 Nippon Telegraph And Telephone Corp Natural language translator
JP3377290B2 (en) * 1994-04-27 2003-02-17 Sharp Corp Machine translation device with idiom processing function
US5695980A (en) * 1995-06-06 1997-12-09 Human Genome Sciences Polynucleotides, vectors, cells and an expression method for human MutT2
JP2855409B2 (en) * 1994-11-17 1999-02-10 IBM Japan Ltd Natural language processing method and system
GB2295470A (en) * 1994-11-28 1996-05-29 Sharp Kk Machine translation system
CA2170669A1 (en) * 1995-03-24 1996-09-25 Fernando Carlos Neves Pereira Grapheme-to phoneme conversion with weighted finite-state transducers
WO1996041281A1 (en) * 1995-06-07 1996-12-19 International Language Engineering Corporation Machine assisted translation tools
US5903858A (en) * 1995-06-23 1999-05-11 Saraki; Masashi Translation machine for editing a original text by rewriting the same and translating the rewrote one
US5987404A (en) * 1996-01-29 1999-11-16 International Business Machines Corporation Statistical natural language understanding using hidden clumpings
JPH09259127A (en) * 1996-03-21 1997-10-03 Sharp Corp Translation device
US5870706A (en) * 1996-04-10 1999-02-09 Lucent Technologies, Inc. Method and apparatus for an improved language recognition system
US6161083A (en) * 1996-05-02 2000-12-12 Sony Corporation Example-based translation method and system which calculates word similarity degrees, a priori probability, and transformation probability to determine the best example for translation
US5806032A (en) * 1996-06-14 1998-09-08 Lucent Technologies Inc. Compilation of weighted finite-state transducers from decision trees
JPH1011447A (en) 1996-06-21 1998-01-16 Ibm Japan Ltd Translation method and system based upon pattern
JP3579204B2 (en) * 1997-01-17 2004-10-20 Fujitsu Ltd Document summarizing apparatus and method
US5991710A (en) * 1997-05-20 1999-11-23 International Business Machines Corporation Statistical translation system with features based on phrases or groups of words
US6415250B1 (en) * 1997-06-18 2002-07-02 Novell, Inc. System and method for identifying language using morphologically-based techniques
US6032111A (en) * 1997-06-23 2000-02-29 At&T Corp. Method and apparatus for compiling context-dependent rewrite rules and input strings
DE69837979T2 (en) * 1997-06-27 2008-03-06 International Business Machines Corp. System for extracting multilingual terminology
JPH11143877A (en) * 1997-10-22 1999-05-28 Internatl Business Mach Corp <Ibm> Compression method, method for compressing entry index data and machine translation system
US6533822B2 (en) 1998-01-30 2003-03-18 Xerox Corporation Creating summaries along with indicators, and automatically positioned tabs
US6031984A (en) * 1998-03-09 2000-02-29 I2 Technologies, Inc. Method and apparatus for optimizing constraint models
JP3430007B2 (en) 1998-03-20 2003-07-28 Fujitsu Ltd Machine translation device and recording medium
GB2337611A (en) * 1998-05-20 1999-11-24 Sharp Kk Multilingual document retrieval system
GB2338089A (en) * 1998-06-02 1999-12-08 Sharp Kk Indexing method
US6092034A (en) * 1998-07-27 2000-07-18 International Business Machines Corporation Statistical translation system and method for fast sense disambiguation and translation of large corpora using fertility models and sense models
US6285978B1 (en) * 1998-09-24 2001-09-04 International Business Machines Corporation System and method for estimating accuracy of an automatic natural language translation
JP2000132550A (en) * 1998-10-26 2000-05-12 Matsushita Electric Ind Co Ltd Chinese generating device for machine translation
US6182014B1 (en) * 1998-11-20 2001-01-30 Schlumberger Technology Corporation Method and system for optimizing logistical operations in land seismic surveys
US6460015B1 (en) * 1998-12-15 2002-10-01 International Business Machines Corporation Method, system and computer program product for automatic character transliteration in a text string object
US6223150B1 (en) * 1999-01-29 2001-04-24 Sony Corporation Method and apparatus for parsing in a spoken language translation system
WO2000062193A1 (en) * 1999-04-08 2000-10-19 Kent Ridge Digital Labs System for chinese tokenization and named entity recognition
US6393389B1 (en) * 1999-09-23 2002-05-21 Xerox Corporation Using ranked translation choices to obtain sequences indicating meaning of multi-token expressions
US6904402B1 (en) * 1999-11-05 2005-06-07 Microsoft Corporation System and iterative method for lexicon, segmentation and language model joint optimization
US6587844B1 (en) * 2000-02-01 2003-07-01 At&T Corp. System and methods for optimizing networks of weighted unweighted directed graphs
US7389234B2 (en) * 2000-07-20 2008-06-17 Microsoft Corporation Method and apparatus utilizing speech grammar rules written in a markup language
US6952666B1 (en) * 2000-07-20 2005-10-04 Microsoft Corporation Ranking parser for a natural language processing system
US6782356B1 (en) * 2000-10-03 2004-08-24 Hewlett-Packard Development Company, L.P. Hierarchical language chunking translation table
US7113903B1 (en) * 2001-01-30 2006-09-26 At&T Corp. Method and apparatus for providing stochastic finite-state machine translation
US7107215B2 (en) * 2001-04-16 2006-09-12 Sakhr Software Company Determining a compact model to transcribe the arabic language acoustically in a well defined basic phonetic study
US7177792B2 (en) * 2001-05-31 2007-02-13 University Of Southern California Integer programming decoder for machine translation
US7191115B2 (en) * 2001-06-20 2007-03-13 Microsoft Corporation Statistical method and apparatus for learning translation relationships among words
US6810374B2 (en) * 2001-07-23 2004-10-26 Pilwon Kang Korean romanization system
US7013262B2 (en) * 2002-02-12 2006-03-14 Sunflare Co., Ltd System and method for accurate grammar analysis using a learners' model and part-of-speech tagged (POST) parser
US7373291B2 (en) * 2002-02-15 2008-05-13 Mathsoft Engineering & Education, Inc. Linguistic support for a recognizer of mathematical expressions
DE60332220D1 (en) * 2002-03-27 2010-06-02 Univ Southern California PHRASE BASED COMMON PROBABILITY MODEL FOR STATISTICAL MACHINE TRANSLATION
US7149688B2 (en) * 2002-11-04 2006-12-12 Speechworks International, Inc. Multi-lingual speech recognition with cross-language context modeling
US20040111253A1 (en) * 2002-12-10 2004-06-10 International Business Machines Corporation System and method for rapid development of natural language understanding using active learning
US7346493B2 (en) * 2003-03-25 2008-03-18 Microsoft Corporation Linguistically informed statistical models of constituent structure for ordering in sentence realization for a natural language generation system
US20050125218A1 (en) * 2003-12-04 2005-06-09 Nitendra Rajput Language modelling for mixed language expressions

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ABNEY S.P.: "Stochastic Attribute Value Grammars", ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 1997, pages 597 - 618, XP058249076 *
ALSHAWI H.: "Head automata for speech translation, Spoken language", ICLP 96. PROCEEDINGS., FOURTH INTERNATIONAL CONFERENCE, October 1996 (1996-10-01), pages 2360 - 2363 *
MOHRI M.: "Regular approximation of context-free grammars through transformation, Robustness in Language and speech Technology", KLUWER ACADEMIC PUBLISHERS, pages 251 - 261 *
NORVIG P.: "Techniques for Automatic Memoization with Applications to Context-Free Parsing", ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 1991, pages 91 - 98 *

Also Published As

Publication number Publication date
WO2005089340A2 (en) 2005-09-29
US7698125B2 (en) 2010-04-13
US20050234701A1 (en) 2005-10-20

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

NENP Non-entry into the national phase

Ref country code: DE

WWW WIPO information: withdrawn in national office

Country of ref document: DE

121 EP: The EPO has been informed by WIPO that EP was designated in this application
122 EP: PCT application non-entry in European phase