CN101326572A - Speech recognition system with huge vocabulary - Google Patents
- Publication number
- CN101326572A
- Authority
- CN
- China
- Prior art keywords
- word
- phoneme
- speech recognition
- recognition system
- graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L2015/025—Phonemes, fenemes or fenones being the recognition units
Claims (13)
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP05111839.6 | 2005-12-08 | ||
EP05111839 | 2005-12-08 | ||
PCT/IB2006/054637 WO2007066297A1 (en) | 2005-12-08 | 2006-12-06 | Speech recognition system with huge vocabulary |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011101288722A Division CN102176310B (zh) | 2005-12-08 | 2006-12-06 | Speech recognition system with huge vocabulary |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101326572A (zh) | 2008-12-17 |
CN101326572B (zh) | 2011-07-06 |
Family
ID=37907345
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2006800460259A Expired - Fee Related CN101326572B (zh) | 2005-12-08 | 2006-12-06 | Speech recognition system with huge vocabulary |
CN2011101288722A Expired - Fee Related CN102176310B (zh) | 2005-12-08 | 2006-12-06 | Speech recognition system with huge vocabulary |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011101288722A Expired - Fee Related CN102176310B (zh) | 2005-12-08 | 2006-12-06 | Speech recognition system with huge vocabulary |
Country Status (8)
Country | Link |
---|---|
US (3) | US8140336B2 (zh) |
EP (1) | EP1960997B1 (zh) |
JP (2) | JP5322655B2 (zh) |
CN (2) | CN101326572B (zh) |
AT (1) | ATE457510T1 (zh) |
DE (1) | DE602006012218D1 (zh) |
RU (1) | RU2008127509A (zh) |
WO (1) | WO2007066297A1 (zh) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
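CN102479508A (zh) * | 2010-11-30 | 2012-05-30 | International Business Machines Corporation | Method and system for converting text into speech |
CN103544955A (zh) * | 2012-07-13 | 2014-01-29 | Samsung Electronics Co., Ltd. | Method for recognizing speech and electronic device thereof |
CN103903619A (zh) * | 2012-12-28 | 2014-07-02 | Anhui USTC iFlytek Information Technology Co., Ltd. | Method and system for improving speech recognition accuracy |
CN105895091A (zh) * | 2016-04-06 | 2016-08-24 | Puqiang Information Technology (Beijing) Co., Ltd. | Method for constructing an eswfst |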
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1960997B1 (en) | 2005-12-08 | 2010-02-10 | Nuance Communications Austria GmbH | Speech recognition system with huge vocabulary |
US8135590B2 (en) * | 2007-01-11 | 2012-03-13 | Microsoft Corporation | Position-dependent phonetic models for reliable pronunciation identification |
CN101689364B (zh) * | 2007-07-09 | 2011-11-23 | Fujitsu Limited | Speech recognition device and speech recognition method |
US8738360B2 (en) * | 2008-06-06 | 2014-05-27 | Apple Inc. | Data detection of a character sequence having multiple possible data types |
US8788256B2 (en) * | 2009-02-17 | 2014-07-22 | Sony Computer Entertainment Inc. | Multiple language voice recognition |
US9646603B2 (en) * | 2009-02-27 | 2017-05-09 | Longsand Limited | Various apparatus and methods for a speech recognition system |
US9659559B2 (en) * | 2009-06-25 | 2017-05-23 | Adacel Systems, Inc. | Phonetic distance measurement system and related methods |
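JP5660441B2 (ja) * | 2010-09-22 | 2015-01-28 | National Institute of Information and Communications Technology | Speech recognition device, speech recognition method, and program |
KR20120046627A (ko) * | 2010-11-02 | 2012-05-10 | Samsung Electronics Co., Ltd. | Speaker adaptation method and apparatus |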
WO2012104708A1 (en) * | 2011-01-31 | 2012-08-09 | Walter Rosenbaum | Method and system for information recognition |
CN102737638B (zh) * | 2012-06-30 | 2015-06-03 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method and device for speech decoding |
WO2014035394A1 (en) * | 2012-08-30 | 2014-03-06 | Interactive Intelligence, Inc. | Method and system for predicting speech recognition performance using accuracy scores |
US10019983B2 (en) * | 2012-08-30 | 2018-07-10 | Aravind Ganapathiraju | Method and system for predicting speech recognition performance using accuracy scores |
US9035884B2 (en) * | 2012-10-17 | 2015-05-19 | Nuance Communications, Inc. | Subscription updates in multiple device language models |
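CN103810997B (zh) * | 2012-11-14 | 2018-04-03 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method and device for determining the confidence of a speech recognition result |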
US9953646B2 (en) | 2014-09-02 | 2018-04-24 | Belleau Technologies | Method and system for dynamic speech recognition and tracking of prewritten script |
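JP7102710B2 (ja) * | 2017-11-22 | 2022-07-20 | Fujitsu Limited | Information generation program, word extraction program, information processing device, information generation method, and word extraction method |
CN110189750B (zh) * | 2018-02-23 | 2022-11-15 | Toshiba Corporation | Word detection system, word detection method, and recording medium |
JP7124358B2 (ja) * | 2018-03-13 | 2022-08-24 | Fujitsu Limited | Output program, information processing device, and output control method |
CN109002454B (zh) * | 2018-04-28 | 2022-05-27 | Chen Yitian | Method and electronic device for determining the spelling partitions of a target word |
CN109376358B (zh) * | 2018-10-25 | 2021-07-16 | Chen Yitian | Word learning method, apparatus, and electronic device drawing on historical spelling experience |
CN110176230B (zh) * | 2018-12-11 | 2021-10-08 | Tencent Technology (Shenzhen) Co., Ltd. | Speech recognition method, apparatus, device, and storage medium |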
US11217245B2 (en) * | 2019-08-29 | 2022-01-04 | Sony Interactive Entertainment Inc. | Customizable keyword spotting system with keyword adaptation |
Family Cites Families (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4980918A (en) * | 1985-05-09 | 1990-12-25 | International Business Machines Corporation | Speech recognition system with efficient storage and rapid assembly of phonological graphs |
JPS63155263A (ja) * | 1986-12-18 | 1988-06-28 | Fujitsu Ltd | Voice word processor |
US5033087A (en) | 1989-03-14 | 1991-07-16 | International Business Machines Corp. | Method and apparatus for the automatic determination of phonological rules as for a continuous speech recognition system |
JP3042455B2 (ja) * | 1997-07-28 | 2000-05-15 | NEC Corporation | Continuous speech recognition system |
US6757652B1 (en) * | 1998-03-03 | 2004-06-29 | Koninklijke Philips Electronics N.V. | Multiple stage speech recognizer |
US6243680B1 (en) * | 1998-06-15 | 2001-06-05 | Nortel Networks Limited | Method and apparatus for obtaining a transcription of phrases through text and spoken utterances |
US20020116196A1 (en) | 1998-11-12 | 2002-08-22 | Tran Bao Q. | Speech recognizer |
JP2000267693A (ja) * | 1999-03-12 | 2000-09-29 | Fuji Xerox Co Ltd | Speech processing device and index creation device |
US6542866B1 (en) | 1999-09-22 | 2003-04-01 | Microsoft Corporation | Speech recognition method and apparatus utilizing multiple feature streams |
DE60036486T2 (de) * | 1999-10-28 | 2008-06-12 | Canon K.K. | Method and apparatus for checking pattern matches |
US20030009331A1 (en) * | 2001-07-05 | 2003-01-09 | Johan Schalkwyk | Grammars for speech recognition |
DE10207895B4 (de) | 2002-02-23 | 2005-11-03 | Harman Becker Automotive Systems Gmbh | Method for speech recognition and speech recognition system |
US7181398B2 (en) * | 2002-03-27 | 2007-02-20 | Hewlett-Packard Development Company, L.P. | Vocabulary independent speech recognition system and method using subword units |
US6879954B2 (en) * | 2002-04-22 | 2005-04-12 | Matsushita Electric Industrial Co., Ltd. | Pattern matching for large vocabulary speech recognition systems |
US7149688B2 (en) * | 2002-11-04 | 2006-12-12 | Speechworks International, Inc. | Multi-lingual speech recognition with cross-language context modeling |
JP4072718B2 (ja) | 2002-11-21 | 2008-04-09 | Sony Corporation | Speech processing device and method, recording medium, and program |
US7409345B2 (en) * | 2003-04-04 | 2008-08-05 | International Business Machines Corporation | Methods for reducing spurious insertions in speech recognition |
KR100612839B1 (ko) * | 2004-02-18 | 2006-08-18 | Samsung Electronics Co., Ltd. | Domain-based dialogue speech recognition method and apparatus |
WO2006051180A1 (fr) * | 2004-11-08 | 2006-05-18 | France Telecom | Method for the distributed construction of a speech recognition model, and device, server, and computer programs for implementing such a method |
EP1960997B1 (en) | 2005-12-08 | 2010-02-10 | Nuance Communications Austria GmbH | Speech recognition system with huge vocabulary |
-
2006
- 2006-12-06 EP EP06832122A patent/EP1960997B1/en not_active Not-in-force
- 2006-12-06 CN CN2006800460259A patent/CN101326572B/zh not_active Expired - Fee Related
- 2006-12-06 CN CN2011101288722A patent/CN102176310B/zh not_active Expired - Fee Related
- 2006-12-06 JP JP2008543980A patent/JP5322655B2/ja not_active Expired - Fee Related
- 2006-12-06 DE DE602006012218T patent/DE602006012218D1/de active Active
- 2006-12-06 WO PCT/IB2006/054637 patent/WO2007066297A1/en active Application Filing
- 2006-12-06 US US12/096,046 patent/US8140336B2/en active Active
- 2006-12-06 RU RU2008127509/09A patent/RU2008127509A/ru not_active Application Discontinuation
- 2006-12-06 AT AT06832122T patent/ATE457510T1/de not_active IP Right Cessation
-
2012
- 2012-02-03 US US13/366,096 patent/US8417528B2/en active Active
- 2012-12-14 JP JP2012273922A patent/JP5968774B2/ja not_active Expired - Fee Related
-
2013
- 2013-03-06 US US13/786,973 patent/US8666745B2/en active Active
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
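CN102479508A (zh) * | 2010-11-30 | 2012-05-30 | International Business Machines Corporation | Method and system for converting text into speech |
CN102479508B (zh) * | 2010-11-30 | 2015-02-11 | International Business Machines Corporation | Method and system for converting text into speech |
CN103544955A (zh) * | 2012-07-13 | 2014-01-29 | Samsung Electronics Co., Ltd. | Method for recognizing speech and electronic device thereof |
CN103544955B (zh) * | 2012-07-13 | 2018-09-25 | Samsung Electronics Co., Ltd. | Method for recognizing speech and electronic device thereof |
CN103903619A (zh) * | 2012-12-28 | 2014-07-02 | Anhui USTC iFlytek Information Technology Co., Ltd. | Method and system for improving speech recognition accuracy |
CN103903619B (zh) * | 2012-12-28 | 2016-12-28 | iFlytek Co., Ltd. | Method and system for improving speech recognition accuracy |
CN105895091A (zh) * | 2016-04-06 | 2016-08-24 | Puqiang Information Technology (Beijing) Co., Ltd. | Method for constructing an eswfst |
CN105895091B (zh) * | 2016-04-06 | 2020-01-03 | Puqiang Information Technology (Beijing) Co., Ltd. | Method for constructing an eswfst |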
Also Published As
Publication number | Publication date |
---|---|
US20080294441A1 (en) | 2008-11-27 |
WO2007066297A1 (en) | 2007-06-14 |
US20130185073A1 (en) | 2013-07-18 |
JP2009518677A (ja) | 2009-05-07 |
CN101326572B (zh) | 2011-07-06 |
RU2008127509A (ru) | 2010-01-20 |
CN102176310A (zh) | 2011-09-07 |
CN102176310B (zh) | 2013-08-21 |
ATE457510T1 (de) | 2010-02-15 |
US8417528B2 (en) | 2013-04-09 |
DE602006012218D1 (de) | 2010-03-25 |
US8666745B2 (en) | 2014-03-04 |
JP5968774B2 (ja) | 2016-08-10 |
US20120136662A1 (en) | 2012-05-31 |
EP1960997B1 (en) | 2010-02-10 |
US8140336B2 (en) | 2012-03-20 |
EP1960997A1 (en) | 2008-08-27 |
JP2013068970A (ja) | 2013-04-18 |
JP5322655B2 (ja) | 2013-10-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101326572B (zh) | Speech recognition system with huge vocabulary | |
Ward et al. | Recent improvements in the CMU spoken language understanding system | |
CN110675855B (zh) | Speech recognition method, electronic device, and computer-readable storage medium | |
US20050091054A1 (en) | Method and apparatus for generating and displaying N-Best alternatives in a speech recognition system | |
Demuynck | Extracting, modelling and combining information in speech recognition | |
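KR101424193B1 (ko) | Indirect data-driven pronunciation variation modeling system and method for improving the performance of a speech recognition system on non-native speakers' speech | |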
Moyal et al. | Phonetic search methods for large speech databases | |
Avram et al. | Towards a romanian end-to-end automatic speech recognition based on deepspeech2 | |
US8423354B2 (en) | Speech recognition dictionary creating support device, computer readable medium storing processing program, and processing method | |
CN111105787A (zh) | Text matching method, device, and computer-readable storage medium | |
JP5897718B2 (ja) | Voice search device, computer-readable storage medium, and voice search method | |
Raymond et al. | Belief confirmation in spoken dialog systems using confidence measures | |
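CN114974218A (zh) | Voice conversion model training method and apparatus, and voice conversion method and apparatus | |
CN111489742B (zh) | Acoustic model training method, speech recognition method, apparatus, and electronic device | |
CN109559752B (zh) | Speech recognition method and device | |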
Tamburini | Playing with NeMo for building an automatic speech recogniser for Italian | |
Erman | A functional description of the Hearsay-II speech understanding system | |
Zhang et al. | Using mutual information criterion to design an efficient phoneme set for Chinese speech recognition | |
Chen et al. | Application of allophonic and lexical constraints in continuous digit recognition | |
King | Using information above the word level for automatic speech recognition | |
Müller | Anu Kurian | |
CN114283812A (zh) | Post-processing method and apparatus for speech recognition, and computer-readable storage medium | |
Razik et al. | Local word confidence measure using word graph and n-best list. | |
Lyu et al. | Cross-lingual audio-to-text alignment for multimedia content management | |
Balakrishna et al. | Automatic generation of statistical language models for interactive voice response applications. |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
ASS | Succession or assignment of patent right | Owner name: NUANCE COMMUNICATIONS AUSTRIA GMBH. Free format text: FORMER OWNER: KONINKLIJKE PHILIPS ELECTRONICS N.V. Effective date: 20091218 |
C41 | Transfer of patent application or patent right or utility model | |
TA01 | Transfer of patent application right | Effective date of registration: 20091218. Address after: Vienna, Austria. Applicant after: Nuance Communications, Inc. Address before: Eindhoven, Netherlands. Applicant before: Koninklijke Philips Electronics N.V. |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
ASS | Succession or assignment of patent right | Owner name: NUANCE COMMUNICATIONS, INC. Free format text: FORMER OWNER: NUANCE COMM AUSTRIA GMBH. Effective date: 20131028 |
C41 | Transfer of patent application or patent right or utility model | |
TR01 | Transfer of patent right | Effective date of registration: 20131028. Address after: Massachusetts, USA. Patentee after: Nuance Communications, Inc. Address before: Vienna, Austria. Patentee before: Nuance Communications, Inc. |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20110706 |