CA1301340C - Continuous speech recognition system - Google Patents

Continuous speech recognition system

Info

Publication number
CA1301340C
CA1301340C CA000538501A CA538501A CA1301340C CA 1301340 C CA1301340 C CA 1301340C CA 000538501 A CA000538501 A CA 000538501A CA 538501 A CA538501 A CA 538501A CA 1301340 C CA1301340 C CA 1301340C
Authority
CA
Canada
Prior art keywords
link
link records
records
record
speech patterns
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CA000538501A
Other languages
French (fr)
Inventor
Ira Alan Gerson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Motorola Solutions Inc
Original Assignee
Motorola Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Motorola Inc filed Critical Motorola Inc
Application granted granted Critical
Publication of CA1301340C publication Critical patent/CA1301340C/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/183Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G10L15/193Formal grammars, e.g. finite state automata, context free grammars or word networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/14Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]

Abstract

ABSTRACT
A continuous speech recognition system employs a grammar tree of alternative potentially recognized word paths. A technique of tracing back through the grammar tree is utilized in determining which partial word path is common to all potential word paths.
The common partial word path is deleted and words corresponding to the deleted partial word path are output as recognized words.

Description

- " ~L30134() ~ 5 :: I

CONTINUOUS SPEECH RECOGNITION SYSTEN
~ 10 ,, Back~round of the Invention :.
The present invention relates to speech recognition, and, more particularly, to speech recognition wherein spoken word end poin~s are not predetermined.
Recognition of isolated words from a given vocabulary for a known spea~er ha~ heen known ~or somo time. Words of the vocabulary are prestored as individual ~emplates, each t~mplata reprasenting the sound pattern for a word 2 in the voaabulary. When an isolated word is spoken, the sy~tem compares tha word to each individual template representing the vocabulary. This method is commonly referred to as whole-word template matching. Many ~uccess~ul recognition systems use whole-word template matching with dynamia programming to cope with nonlinear time scale variations betweQn the spoken word and the prestored template.
Although this techniqu~ has been effec~ive for i isolated word recognition applications, many practical applications require continuous word recognition. In continuous word recognition, the number of words in a phrase can be unlimited and the identity of the earlier ~; words can be determined before the end of the phrase, whereas in isolated word recogni~ion, delimiters are used `:

I , .
.
, . ... .
. , .
: .~
. ~ ,. , ~ .
~, .; . .
.

~3~
~ "

to identiXy the beginning and ending of input patterns and recognition occurs one word at a time. Moreover, a continuous Rpeech recognition system must distinguish an input pattern from other recognizable patterns, backgrollnd noise, speaker induced noise such as breathing noise, while isolated recognition cannot usually tolerate other recognizabls pattexns at the beginning or ending of tha word.
In IITwo level DP Matching A dynamic programming based pattern matching algorithm for connected word recognition", H. Sakoe, I~EE Trans. AcousticR, Speech and Signal Processing, Vol.ASSP-27, No.6, pp.588-595, Dec.
1979, the method of whole-word templata matching has been extended ~o deal with connected word recognition. The paper suggests a two-pass dynamic programming algorithm ; to ~ind a sequence of word templates which best matches the whole input pattern. In the first pass, a score is generated which indicates the similarity between every template matched against every pos~ible portion of the input pattern. In the ~econd pass, the ~core i~ used to find the best sequence of templates corresponding to the whole input pattern.
This extended method has distinct ~isadvantages. one disadvantage of this technique is tha amount of con~putation time it requires. Depending on the specific design requirement~, this limitation may create unwarranted need for an expensive high-speed processor.
Another disadvantage o~ this method is that the endpoints of the input pattern must be prede~ermined and the whole input pattern must be stored in the system before any accurate template matching can occur For an input pattern of any signi~ican~ leng~h, recognition response ti=e would be substant1a1ly degraded. Also, ;

~30~3~
errors in endpoint detection will seriously degrade racognizer performance. Further, the memory required to store thl~ information may bacome excessive.
In "Partial Traceback and Dynamic Programmin~", P.
Brown, J. Spohrer, P. Hochschild, and J. Bak~r, IEEE
~rans. Acoustics, Speech and Signal Procassing, Vol.ASS~ 27, No.6, pp.588-595, Dec. 1979, a technique is described which allows for continuous speech recognition ~ arbitrarily long input patters without prede~ermination of endpoints. This i accomplished using a technique called partial traceback. Partial traceback allows outputting o~ recognize~ words prior to completion of the complete inpu~ pattern without sacrificing recognizer per~ormance. However, the partial traceback technique described appears to be processor burdensome and cumbersome to implement.
Accordingly, there is a need for a continuou~ ~peech recognition system which can easily be implemented, yet ; 20 can operate e~iciently and inexpensively in real time.

,~

~ 25 ::~

- ~ . . --- :

36;~3~(~
. - 4 ~
Obiects and SummarY o~ the Invention Accordingly, it i~ an ob~ect of the present invention to provide an arrangement and method of speech recognition which can be implemented for real time applications to recognize continuous speech with inexpensi~e hardware.
It i~ a further ob~ect of the present invsntion to provide an arrangement and ~ethod of speech recognition which allows efficient memory management of speech : recognition memory during the recognition process.
It i~ a ~urther ob~ect o~ th~ present invention to provids an arrangement and method o~ speech recognition which allows ef~icient traceback through a ~o~tware linked network representing potentially matching template~ ~or the recognition process.
It is still a further ob;ect of the present invention to provida an arrangement and method of speech recognition which allows input speech to be recognized be~ore the whole string is input.
In brie~, the present invention pertains to a speech recognition system, wherein input frames are processed against prestored t~mplates representing speech, and template~, which axe under consideration as potentially recognized are individually recorded in a linked network, each link record generally having ancestor and descendant link records. One arrangement and method for recognizing speech patterns includes providing temporary pointers for : 30 the link records, and tracing back through the linked network using the temporary pointers for selecting link records to connect pot~ntially recognized templates corresponding to the link records. Those link records with ancestor records having ~wo or more potentially recognized descendants have their temporary pointers removed. The remaining :

:

~ ` ~3~

link record~ in the network, which are still labeled with tamporaxy pointers, hava their associated words output as recognlzed word~.
Another aspect of the present invention pertains to a similarly arranged speech recognition system as above, but is aimed toward management o~ speech recognition memory. This aspect of the present invention comprises means for storing tha link records as indexed data sets 1~ with each data set includ$ng a symbol representing a template, a sequence indicator representing the rela~ive time the link record wa~ storad and a pointer indicating a link record in thQ network from which it descends. The table i3 composed o~ freQ record spaca and established ; 15 record space, wherein th~ link records are stored in the established record space. Further, means are employed ~or identifying those link records which are part of unambiguously recognized network paths. Tha unambiguously recognized link records are output (displaced) as racognized, and those link records displaced from the established record space become ; availabla as free record space for subsequently storing additional link rQcords in the linked network.

,:

,......................... . . .
, ' ' . . ~ , ;
.
, ~3~3~o Brie~ De criptlon of tha Drawinqs The ~eatures of the pr~sent invention which are believed to be novel are 62t forth with particularity in ~he appended claims. Th~ invention, together with further object~ and advantages thereo~, may best be understood by making reference to the ~ollowing description taken in con~unction with th~ accompanying drawings, in the several figures o~ which liXe reference numerals identify identical elements, and wherein~
~ igure 1 is a hardware block diagram of a speech recognition ~ystem depicted in accordance with the present invention;
Figura 2 i~ a graphical representation o~ a recognition grammar model illustrating one aspect of a sp~ech recognition system implemented in accordance with the present invention;
Figure 3 is a graphical representation o~ a speech grammar tree illustrating an enumeration o~ all possible paths through the speech grammar model o~ Figure 1:
Figure~ 4a, 4b and 4c comprise a flowchart depicting a set of 6tQps which may bo performed to implement a recognition process in accordance with the present inventiont Figure 5 is a flowchart more particularly ~ illustrating block 72 of Figure 4c;
:~ Figures 6a, 6b, 6c and 6d compriss a flowchart more particularly illustrating block 44 of Figure 4a:
Figur~ 7 comprises a serie~ o~ yrammar ~ree diagrams illu~trating an example of "traceback" in accordance with the present invention.
The sheet entitled "Appendix A" illustrates entries recited in the above table.

13~ o Detailed Description of the Preferred Embodiment R~ferring to Fig. 1, shown i8 a block diagram of a sp~ech recognition ~ystem which can be us~d to implement tha present invention. The diagram include~ template memory 10 which contains a prestored vocabulary.
Formation of a typical prestored vocabula~y is ~scribed in liA simplified, robust training procedure for speaker trained, isolated word recognition systems", L.R.Rabiner and J.G.Wilpon, Journal ~coustic Society of Amer., 68(5), Nov. 1980. An acoustic processor 12, such as is described in "On the effects of varying ~ilter bank paramet~rs on isolated word recognitionl', B.A. Dautrich, L.R. Rabiner and T.B. ~artin, I~EE Trans. on Acoustics, Speech and signal Processing, pp.793-806 Vol. ~SSP-31, August 1983, may be used to convert input speech into a seriQs of speech segments, aommonly referred to as ; "~rame~". Each ~rame represents a tima segment of input speech, u~ually in the form o~ LPC or ~ilter bank data.
The ~rames from the acoustic processor are passed to recognizer 14.
The xecognizer 14 accesses word templates from the vocabulary pre~tored in template memory 10 and processes each input ~rame ~rom the acoustic procçssor 12 with segmen~s of the word templates. Such a technique i9 inherent to many speech recogn~tion system and may ba ; referred to as "template processing".
The recognizer 14 bidirectionally accesses a second memory, memory ~or link table 16. ~emory for link table 16 is used for storing five r~lated arrays. The arrays are further discussed below.
The recognizer 14 may ba implemented by using two processors, a recognition processor 18 and a link and tracebacX processor 20. Tha recognition processor 18 ~ handles all template matching, grammar, control and ;~ communication with the link and tracebac~ processor 20.
Th~ link and traceback processor 20 is used to maintain ~L3~

the me~ory ~or link table~ 16. This function includes recording potential template matches.while the continuous p~ech i~ being input, storing related information in the ; 5 memory for link tables 16, freeing 3pace in memory for .
' .

~.~

:`

., 3s ~' .` .: .
, ~3~134~
g link tables 16 for additional lnformation and outputting re~ogniz~ng result~ as input spaech i5 identified. The ~unctions o~ the recognition processor 18 and the link ~ 5 and traceback proc~ssor 20 may be combined into one i processor or may be ~eparated, as shown, whereby the recognition processor 18 may be implemented as speci~ically described in "An Algorithm for Connected Word Recognition", J. Bridle, M. Brown and R.
Chamberlain, Proceedings o~ the IEEE International Conference on Acoustics, SpeecA and Signal Processing, pp~899-902, 1982. The link and traceback processors may be implemented with an 8 bit processor such aR Motorola's MC6801, when used in accordance with the present invention~
MODELING THE GRAMMAR

Referring now to Figure 2, a simplified recognition grammar model is shown to illustrate all possible word seguences which may be recognized by the system. The model i9 re~erred to as "æimpli~ied" because the voaabulary shown i9 grossly limited, for the purpose of illustration, from what would typically be required. In Fiqur~ 2, there are 9iX possible word strings, each comprising two words (further discussed with Figure 3).
In a typical speech recognition system, the grammar model may include many more possible word strings~ each of which may contain several words. The topology of the grammar model is stored in general memory with pointers pointing to the corresponding templates in template memory. Each node in the gra~mar model ha~ assoclated with it, stored in general memory, a link pointer designating the node from which template matching originated and its associated accumulated distance.

~3~3~

MODELING TH ~ _PAT~S

In Figure 3, each o~ the 5iX possible word strings from Figure 2 is enumerated in a tree diagram. There are three possible ~irst words, "one", "two" and "three".
Each po~qible first word may be followed by two possible second word~ our" and "five". During templat~
matching, i.e. while input frame~ are being co~pared against prestored word template~, the xecognition proce sor recognizes potential "word ends". A potential word end" is found when a sequenc~ of input ~rame~
potentlally matches a word template. The identified wo~d template is appended to the tree diagram through link info~mation stored in the link table, pre~iously mentioned, and an accumulated distance indicating a ; measure o~ similarity between the se~uence o~ input frames being processed and the template(s) leading up to the node. For example, given the vocabulary string po~si~ilitie3 of Figures 2 and 3, when an input frame UenCQ i8 identi~ied as potentially matching the word "two", "two" i~ added to the tree diagram from the ~; initial node, node 24. Figure 3 illustrates that after ; additional frames are input and processed, the word "one"
has become a potential match; hence, it is next added to the tre~ diagram also at no~e 24. Next tha word "three"
is added to the tree diagram, then "~our" appended to node 26, followed by "five" appended to the same node, etc. This continue~ with each potential matching template added to tha tree diagram as it is identi~ied in time.
The term "tree node", or reference to a node in the tree diagram, will be used interchangably with the term link(ed) record. In general, a link record is a data set ~ 35 stored in memory defining a connec~ion in the tree ; diagram, including identification o~ the particular .~
:`~

~ .

~ ~ ~3~

trea node and relation to the previous node in the tree topology.
For each potential word end frame, a new entry, or link record, 1c added to the link table, corre~ponding to a link in the represent2tive tree diagram. Quite often, a word template, typically represented as a sequence of frames (representing state~) in the form of a state diagram, will have multiple potential word end~ as input frames are being processed. Each time a potential word end i~ detected, the corresponding template is added to the tr~ as a new link. Further, each sta~e o~ each template has recorded an accumulated d~stanc~ processed through the present input frame, and a link pointer polnting to the link record in the link table ; corresponding to the link in the tree rrom which decoding for that template began. For further information on templatQ matching, raference may be made to "An Algorithm for Connect~d Word Recognition", supra.
Un~ortunately, when the vocabulary i5 large, continuously appending templates to the tres causes problems. First, it dQlays recognition response time.
The longer the input frame ~equence, the longer the operator must walt be~ore the system recognizes and takes action upon the recognized words.
Second, continuously appending template~ requires extensi~e memory ~or linking the tree diagram informationO For complex grammar models and several potential word end frame~ for each potential word match, the memory raquirement~ for the link table (tree) will grow at a very ~ast rate. If the growth rate is too large, memory requirements may become impractical.
The present invention overcomes these problems by uniquely structuring a link table and maintaining it in a highly efficient manner.

~ ~ 13t}~

THE LINK TABLE

Th~ link table represent~ all potential word saquences under consideration as matches in the form of a tree network, similar to Figure 2. The word ~equences are, in ef~ect, concatenatQd templates having potential word ends detected during template matching.
~stablishing the ne~work in thls fashion allows analysis of those link~ that are unambiguously part o~ all potential word se~uences. ~his process of analy~is is ref~rred to as traceback. Properly employed, traceback also allows an e~ficient method o~ releasing llnk records whlch are no longer part of any sequence under still consideration.
Each link, or nodal tre~ connection, o~ the tree diagram requires several types of information to be stored. This information i~ stored in L-ACT, L-FWRD, L-8~K, L-WORD and ~ PTR array~ in memory ~or link table 16 o~ Figure 1. In the current embodiment, each array is 255 bytes long and i~ positioned one byte past a 256 byte boundary to allow efficient access. The corresponding elements ~rom each array constitute a "link record". The link records are chained into 2 linked list~. One list contains frae link record~, i.e. ompty record space available for additional links. The second list i~ the established list and it contains records for link currently being used. These lists are chained together by L-PT~ array, where one entry in L-PTR indica~es the next link record in the table, from the established or free list, where each record con~ains one byte from each o~ the five arrays. For example, if, for a given link record in the es~ablished li~t, the corresponding byte of thQ L-PTR array contains the binary ~' '~ `

. , .

~3~3~ ~

repres~ntation of the number "2", the next record o~ the astabllshed list would be found in the s2cond byto o~ all ~ive array~ A "zero" ~ntry in ths L-PTR array defines the end of tha linked li3t.
~ -BACK and L-WORD arraY9 contain the actual link information. L-BACK contains the pointer to the previous 1ink in the deCOding Path, 1.e. the PreViOU~ node in the tree diagram, whila L-WORD contain~ a s~mbol representing th~ word decoded ending at the current link. FQr . eXamP1e, in FigUr~ 3, after appending the word "~our" to tree nOde 26, L-WORD WOU1d COntain an 8-~it SYmbO1 repr~senting the word "~our" and L-BACK would contain a POinter pointing tO the link record corresponding to tree 1~ node 26. The other two arrayst L-ACT and ~-FWRD, are used to "traceback" through the decoding pàths (potential word sequences). L-ACT i5 used to indicate decoding paths still under consideration as a potential match, while L-FWRD is used to point to the subse~uent node in the trea diagram, the inversa o~ L-BACK. Hereinafter, those link records flagged as boing under considsration as being part o~ a potentially recognized path (active path) will be re~erad to as "active" link records.
The link records provide traceback in~ormation so that the path through the word model used to reach that ; state can be determined. Traceback also allows the tree to be pruned o~ nonuseful information. This is necessary to prevent in~ormation ~rom excessively accumulating in ; memory. Trac~back is also used to output words which are unambiguously recognized, i.e., common to all aCtiVQ
paths. The L-BAC~ entries in the link table point to previous entries in the tabl~, which correspond to previous connected node~ in ths tree diagram. Hence, tracebaok refers to the procesa of tracing back through the tree diagram to a point, or tree node, wher~ all ~ . ' q~

paths ~erge. The concept o~ tracing back to a point where all paths merge is well known to those skilled in tha art, for a general description "traceback" reference may be made to '~Partial TracebacX and Dynamic Programming", supra.
The above arrays are listed below ~or reference during discussion of the subsequent figures.

L-PTR: 255 byte~, each byte available for a pointer indicating the previous link record appended to the tree diagram (table) a~ a ~unction o~ time ~or the establi~hed list. It is also used to chain free link records in the free list.

L-BACK: 255 bytes, each byt~ available for a pointer indicating the previous link record in the tree diagram.

L-WO~D: 255 bytes, each byte available ~or a symbol indicating the potentially recognized word ; corresponding to the present link record.

L-ACT: 255 bytes, each byte available for indicating whether or not the present link record is active (used during traceback).

L-FWRD: 255 bytes, each byte available ~or a pointer indicating the subsequent valid link record in the tree diagram ~used during traceback).
In addition to the above arrays, ~ive other pointers are used. They are:

~3~
.

HEAD: a on~ byte pointer ~or indicating the first, i.a. the most recently added, link record in the established list, as chained by th~ L-PTR array.

FREE: a one byte pointer for indicating the first link xecord of the ~ree list, a~ chained by thQ L-PTR
array.

~`~ 10 PTR: ~ one byte pointer ~or referenclng the current tree node being processed.
, ~
T~Pl and TMP2: each i~ a ona byto temporary pointer used in the recognition ~lowchar~.
1~
Structurally, pre~uming a table with o~ly 10 entries in the established list, these arrays may be arranged as follows.

Rei'd# L-PTR L-BACK L-WORD L~ACT~ L-FWRD
2 3 3 four 0 0 4 1 2 one 0 0 ~ 5 6 4 three 0 0 :~ 25 6 10 10 ~ix 0 0 7 8 5 three 0 0 ; 8 5 5 nine 0 0 9 4 1 zero 0 0 9 4 one 0 0 The entriQs of the abova table are illustra~ed in a tree diagram in appendix A. It should be noted tha~ HEAD
points to record # 7, and the "0" in ~he L-PTR entry for record # 3:indicate~ ~he last record on the l~st. FREE
is not illustrated.
:

- . :: .' . ' ':
, ~3~3~1~

L-PTR allow~ the records in the established 11st to bQ ent~red by simply removing tho recoxd from the free list and designating its L-PTR entry to point to the HEA~
rscord, and updating HEAD and FREE. When a record is deleted from the table's established list during traceback, that record becomes available without xearranging th~ table by linking the record entry to the free list, and linXing over the removed record for the established list using the L-PTR entries. The entries for ~-ACT and L-FWRD are only used during tracQback and are otherwise always reset to zero.

; THE RECOGNITION FLOWCHART

Referring now to Figures 4a through 4c, a recognition flowchart is shown in accordance with the present invention. The flowchart o~ Figure 4a begins at block 30 by resetting the link tahle and its associated pointers.
The reset procedure includes setting each bytQ of L-FRWD
: 2 0 and L-ACT equal to O, setting HEAD pointer equal to 1 to indicate the beginning of the established list, and setting ~-PTR~l) and L-PTR(255) equal to O to indicate the end o~ the estahlished and ~ree lists, respectively.
Also, template state memory, typically stoxed in general memory, i~ set inactive. Accordingly, th~ first record compri~es the established list, and records 2 through 255 arQ chained using L-PTR entrieæ to form the free list, where HEAD points to the beginning of the established list and FREE points to the beginning of the free list ~link record #2)~
In block 32 of Figure 4a, the recognition grammar model is initialized. The initialization includes marking the initial node o~ the model active. This may be done by arbitrarily assigning it a low accumulated distance measure to indicate the starting point in the grammar. The link polnter for tbe initial node is set to .

:

~3~;34~

1, correcponding to the first entry initialized in the link table. Conversely, all other nodes in the grammar ~odQl are initialized as inactiv~. A node may be sst inactive by setting the accumulated distance for the node equal to infinity indicating that there i8 no llkelihood of being at this node at tha ~tart of processing.
In block 34 a traceback counker i8 in$tialized to 10.
Tho traceback counter is used to periodically ~ndicate that the traceback proces~ should be performed. In this embodiment~ traceback i8 performed after every 10 input ~rames are processed.
In block 36 the next input ~ramQ i~ input to the system for template matching, prQviou~ly ~ent~oned. The remaining steps of this ~lowchart all pertain to th~
pro~e~sing of the present input frame.
The traceback counter is decremented in block 38 to indicate a frame ha~ just been input.
At block 40 a test i9 performed to determine whether or not all nodes o~ the grammar model have been processed. In other words, whether or not the input ~rame has been proces~ed for the entlre grammar model.
If all nodes o~ the grammar model have been processed, ~low pxoceeds to block 42 to determine lf ~h2 traceback countar is indicating that traceback should be performed this frame. I~ so, traceback is performed by calling the traceback subroutine 44, subsequently discussed in ~ Figures 6a through 6d. Following traceback, the - traceback counter i8 re~et, block 46, befora processing the next input frame at block 36.
I~ all node~ of the gram~ar model have not been processed, flow proceed ~rom block 40 of ~lgure 4a to block 50 of Figure 4b. At block 50, processing of the recognition gra~mar modal proceeds to the next node. If no nodes have been processed for ~hi frame, "the next : ; .
;

.

~3n~

node" i~ the first node in the grammar model.
Therea~tar, in block 50, "the next node" refers to the node ~ollowing or a node parallel to the one ~ust processed. In particular, the ordering of nodes for processing must be such that a node i not processed until the originating node~ for all templates terminating at that node have been proce~sed ~or the current ~rame.
This 18 to insure that node accumulated dl~tancec and ; 10 links hav~ been updated for originating node~ of template3 before these templates ar~ processed.
In block 52, the new node i9 set inactive. This may be done by setting an accumulated distanc~ aasociated wikh the node equal to infinity.
A test i~ performed in block 54 to determine whether or not all template~ ending at this nodc have been processed. Later it will be recognized that each template immediately preceding each node o~ the grammar model is processed before moving to another node. If all the template~ ending at thi~ node have been processed, flow proceeds to block 68 o~ Figure 4c, subsequently discussed. I~ all the templates have not yet been processed, flow proceeds to block 56 to determine if, for the present input ~ramo, tracQback is ra~uired. Hence, the traceback counter, 18 compared to zero.
~ t this tims, it may be helpful to su~marize the processing of the recognition grammar model thus far indicat~d by the recognition ~lowchart. Referring once again to Figure 2, initial node 22 of the grammar model 3~ is set active and corresponding link table en~ry is initialized to indicate a reference from which all ~`` potential grammar paths (tree branches) eminate~ For each input frame processed, the grammar model is progressed through one node at a time from the beginning nod- to the ending node. Furthor, ~or each nodo o~ tbe ~3~34~ ~

gra~mar mod~l, each template end~ng at that node is pro~essed one template at a time, as will be discussed.
Accordingly, for each input frame~ each node is processed and, ~or each node, each template ending at the node is processed.
Regaxdless of whether or not traceback is required, a~ indicated in block 56, the next template i9 processed in either blocX 58 or block 60. Template ~atching at either block entail~ updating the accumulated distances and link pointers for all the states o~ the templat~
based on the current input fra~e, the templat~, and the accumulated distances and link pointers for all the states o~ the template as w~ll as the accumulated distance and link pointer ~or the originating node of the grammar model. If there is a potential word end for this template for the current frame, then an accumulated distance and link pointer will be generated corresponding to the potential word end. "An Algorithm for Connected ; 20 ~ord Recognition", supra.
If flow proceeds from block 56 to block 60, indicating thak traceback i9 being processed this frame, in addition to the template processing described above, all L-ACT entries, whiah correspond to the link records pointed to by the link pointers for each active state in the template, are set nonzero. The "active" template states are those which have a finite accumulated distance.
Flow next proceeds to block 62, where a test is par~ormed to determine whether or not ~he accumulated distance associated with the template is better than the current accumulated distance corresponding ~o ~he best previollsly processed template ending a~ the node for this frame (this will be inf$nity if this is the first template processed for this node). The outcome of this .

- ~3~3~

te~t is true only when template matching of the present input frame suggests a potential word end of the word template. As previously mentioned, a potential word end indicates that a sequence of input frames may correspond to, or match, a word template prestored in template memory.
I~ the template does not have a potential word end for the current input fr me, then it5 associated accumulated distance will be in~initP~
Flow proceedR back to block 54 if the ~sst recent template proce~sed-is not found to havs an accumulated distance which is better than thR previous accumulated distance stored for that nodQ, where additional ~emplates ending at the node are processed.
If the accumulated distance for th2 most recent template processed is found to be tha best of those yet processed for the node, flow proceeds to block 64 to record this in~ormation. In block 64, the accumulated distance and the link pointer corresponding to the above pro~essed template is recorded for the node o~ the grammar model. Additionally, the word number, or symbol, rQpresentlng the template i~ recorded. The word number is recorded for subse~uently outputting the word if it is later determined to have been reaognized. From block 64, flow proceeds to block 54 as discussed above.
once it i~ indicated at block 54 that all the templates ending at the node have been processed, ~low proceeds to block 68 o~ Figure 4c. In Figure 4c, block~
68 through 74 determine whether or not a link record should be added to the trea, and, if so, the link record is added to the tree through the link arrays.
In block 68, a test is p~rformed to determine if the node of the grammar model is active. The only manner in which the node could have become active is if at least :

13~3~

one word t~mplate processed ~or that node has a potential word end for tha current input frame. I~ the node is inactive, ~low proceeds to block 40 of Figure 4a to seek another node to proces~ for the current frame.
Otherwis~, flow proceed~ to block 70.
In block 70, a t~st i~ per~ormed to determine whether or not the best template ending at the node was a word template. In certain instances, it may havQ been another 0 ~YPQ Or template such as a silence template, in which case flow would proceed to block 40 of Figure 4a.
Silence template~ are not added to the tree since it is typically not necessary to output silence a~ having been recognized. If the best template endlng at this node was a word template, LINK subroutine, Figure 5, i~ called to add a link record to the tree diagram. Parameters are passed to LINK subroutine indicating the link pointer corresponding to the originating link record for the template and the word number representing the tem~late.
A~ter the link record i8 added, discussed below, a new link pointer i~ returned from LINK. In block 74, the llnk pointer for the current node of the grammar model is set to the link pointer passed from LINX.
Following bloak 74, flow proceeds to block 40 of Flgure 4a to check i~ all the nodes o~ the grammar tree have been processed for the current input ~rame.

ADDING A LINK TO THE TREE

Referring now to Figure 5, as previously mentioned thi~ subroutine add~ a link re¢ord to the tree diagram as defined by the link arrays. The param~ters which ara passed to the subroutine are the word number and the link `~ 35 :
:`

.

~3~3~0 pointer corresponding to the node of the tree from which it i~ being added.
In block 78, a test i8 pre~ormed to determine whether or not there are any freQ link record~. This is done by comparing FREE to O. I~ FREE equal~ 0, there are no more free link record~. Discussed above, the records in the ; link arrays are chained together by L-PTR array, which comprises ~ree link records and established link records.
1~ Fra~ link records allow additional link records to be added to the tree diagram. Accordingly, i~ there ara no ~ree link records, all the link records are b~ing used and flow proceeds to block 80 whexe an error is reported and tha system i~ resQt. It should be note~ tha~ this step in block 80 i8 only used as a protection from ; unu~ual conditions which might lead to a link table overflow. Under normal cond~tions, the invention will pr~vent the absence o~ ~ree link records by using a link tabls o~ adequate length.
Given one or more free link records, flow proceeds to block 82 where the next available link record is removed ~rom the fre~ list and inserted at the top, or beg~nning o~ the established list by updating the H~AD and FREE
pointers. FREE i9 set to point to the lndex of the next ~ree record and HEAD i8 set to point to the llnk record ~u~t added. L-PTR o~ the new HEAD link record is set to point to the previou~ H~AD link record to chain the new rscord onto the established listo In block 86, since HEAD points to the link record ~ust added in the established list, the word number passed to this subroutine is recorded ~or the new record in the L-WORD array. Also, the link pointer passed to the subrou~ine is recorded ~or the link record in the L-BACK array~

13~3L3~(~
~ . .

At block 88, a test i~ parformed to determine whether or not traceback i8 required ~or the current input ~rama.
Ir 80, ~low proceeds to block 90 whare the newly added linX record i~ marked active. This i8 done by setting the L-~CT array for the record equal to one. I~
traceback is not re~uired for th~ current input frame, the subroutine ends and flow returnq to block 74 of Fig.
4c.
~ 10 .:

:

' 35 .:
. .

:
-:
: .

~3~1~L3~ `

TRACING BACK THROUGH THE TREE

XQferring now to Fig~. 6a through 6d, the traceback subroutino, i.e., block ~4 o~ Fig. 4a, is shown s in detail. The traceback subroutine searches through the tree diagram for words which have been identified as potential matches, and determines whether or not there i~
any ambiguity to th~ uniqueness o~ the match. Those word wh$ch have been uniguely identifi~d ara output from thQ recognition syst~ as recognized word~O Further, the raceback subroutine remove3 all dead linX records, i.e.
those records that ar~ no longer under con~ideration a~ a potential match, to the free list to make memory available for future link records. B~ore entering 1~ TRACE~ACK, L-AC~ is set, or flag~ed, for all active link record~ as described above. At the start of traceback;
active link records represent the ends of all paths through the tree which are still under consideration.
The baslc concept of traceback is to "traceback" through the trae from the ends o~ all active paths (marked initially by L-ACT array) to find whera all active paths merge. The part of the tree which is common to all -; active paths represents the unambiguous partial path and those word~ corresponding to this unambiguous partial path may be output as being recognized. During traceback the L-FWRD array is used to chain partial paths in the forward direction (toward the ends o~ the tree). As thesQ partial paths are built tha base node ~or each partial path is set active via th~ L~ACT array. When an attempt is made to extend a partial path ~using L-BACK
information) from a current nod~ to a previous node which i5 already marked activet then more th n one possible path eminates from ~hi~ previous node and the partial paths from both nodes are dele~ed (forward pointer chains '~

: . ,....:, .
..

:~L3~3.3~

(L-FWRD) reset to 0). All nodes marked active are processed this way. The order of node processing is in the reverse order of time sequence the link records S were added into the link tables. This ordering is inherent in the structure of the established list. The last node to be processed will be the root node of the tree.
At this point the forward chain (partial path) emanating from the node will represent the unambiguous partial path and the corresponding recognized words may be output. The traceback procedure also "cleans up" after itself in that upon 10 completion the l,FVVRD and L-ACI arrays will have been reset to zero.
Further, all link records not on any active path, as well as the link records on the unambiguous partial path which has been outputted, will be returned to the free list.

Before describing the traceback subroutine in detail, it may be helpful to step through an illustrated exarnple. Referring to Figure 7, such an example is depicted as a series of tree diagrams, A through I.

In diagram A, the tree is shown prior to traceback, active link records, i.e. those links t`hat have active word links emanating from them are marked to the right of the link with a bold dot. The first step of traceback is to determine the most recent active link record as added in time, in this case the node marked 25. ~ test is performed to determine if there is an active node immediately preceding this node in the tree. If so, there is ambiguity as to which path, as in this instance via node 25 or node 21, traces back to node 24. When ambiguity occurs, the forward pointer for the nodes of ambiguity have their chained forward pointers removed if they exist. This is done by inserting a zeroin the L-FWRD array for each subsequent link record. In this ~ .

ln~tanc~, neither node ha~ ~ forward pointer, i.e.
L-FWRD~2O o The next most recently added active link record 5 i3 then identi~ied, node 24. Node 24'g preceding link is also noded 21 and is processed similar to the processing for node 25, discussed above.
The next most recently added active nodQ i~ node 23. Since it does not have a preceding node which is active, the traceback process sets ths preceding node active and records a forward pointer (L-FWRD) ~or the preceding node, node 19, equal to the node presently baing processed, node 23. Diagram B illustrate~ tree diagram A after proces~ing node 23, with th~ ~orward pointer added to node 19 depicted as a bold line. It should be noted that after each node is processed, it3 L~ACT entry which indicates that it is an active node, is removed. Hence, tree diagram B no longer depicts nodes 23, 24 and 25 as active.
Node 22 i5 the next most recsntly added entry with an active node. It does not have an active preceding node. Therefore, as was dono ~or node 23, the ~orward pointer for node 1~ is set equal to node 22, and node 18 is marked active. Diagram C illustrates the tree after procQssing node 22.
Node 21 is handled similar to node 22, since its preceding node is not activ~, shown in diagram D.
Node 20 is the next ackive node to be processed.
~ Th~ node preceding node 20 is active, which indicates !, 30 ambiguity. When ambiguity occurs, the forward pointer for the nodes of ambiguity, in this CaS2 nodes 18 and 20, have their forward pointer chains removed. In this case only node 18 has a ~orward pointer. As in~icated in the bold line after node lB, the forward pointer ~or node 18 ha~ been set equal to nods 22. Hence, in dlagram E, the .:

~3~3~

bold l ine i8 removed, by setting the forward pointer for node 18 Qqual to 0.
Node 19 is tha next most recently added entry with an active node. Since it8 preceding node is active, ambiguity requires that the forward pointer chains for both nodes 16 and 19 bs removed, a~ illustrated in Diagram F.
Node 18 is processed next. Since its preceding node, 16, is active but neither nodes ~6 or 18 ha~ a forward pointer, no action ic taken except removing the bold dot indicating the activity of the node.
Tha next active node i~ node 16, which has a preceding node which is not active. In thi~ case, the traceback proces~ sets the preceding node active and records a forward pointer for the preceding node, node 13, cqual to the node presently being procassed, node 16.
Diagram G illustrates tree diagram A a~ter pxocessing node 16.
Node 13 is handled similar to node 16. Hence, in diagram H the tree is shown with only node 11 ~arked activa and only forward pointers for nodes 11 and 13 remain.
On~e the traceback procQs~ arrives at the root node in the tree (node 11), that node which is not preceded by any other nodes, it outputs words chained through the ~orward pointers as recognized words. This is done by sequentially looking to those link records having chained forward pointers starting at the root node, in L-FWRDo As illustrated in Diagram I, the link records which record "eight" and "five" in ~heir respective L-WORD arrays are output. Additionally, those link records representing the links b tween nodes 11 and 16, in ~iagram H, are removed from the as~ablished list and linked into the free list, as indicated by L-PTR

~3~

array ~nd FREE pointer, ~t which time the new root node o~ the tree is node 16. The remaining tree i5 ; illustrated in Diagram I, which is used a~ the additional input frame~ are processed, i.e. where flow returns to block 46 or Fiqure 4a in tha recognition flowchart.
Referring now to Fi~ure~ 6a-6d, the traceback flowchart will b~ discussed ln d~tail. In Fig. 6a, ~he link tabl~ i8 ~earched for the most recently added link record which i active. In block 94, a test is per~ormad to determine if the fir~t record in th~ link table i~ the only record in the establ~s~ed record li~t. Thiq i~ done by looking into the index of array L-PTR which is pointed to by HEAD. As previously mentioned, HEAD contains the ; 15 index for the most recently added link record. If the L-PTR entry corresponding to ~EAD is equal to zero, the chain i8 terminated and no additional record~ are in the table, in which case, flow proceeds to block 96 where the corresponding L-ACT entry ls set inactiv~. From block 96, the subroutine returns to the recognition ~lowchart at bloak 46 at Flg. 4a.
In the ca3e where there are additional link records in the table, a test is performed in block 98 to determine if th~ ~irst link recoxd is active. The link record is active i~ its corresponding L-ACT entry is not equal to zeroO
If the link record is acti~e, flow proceeds to block lO0. In block lO0, to indicate that the link record has been accounted ~or, the link record is set inactive. Flow then proceeds to block llO where the record pointed to by HEAD i9 stored in th~ temporary poin~er, PTR. From block llO, flow proceeds to block 120, subsequently discussed.
If the first link record turns out ~o be lnactiv3, ~low proceods frc~ block 98 to b1Ock 112. In ~3~J~3~
- 29 - .
; block 112, L-PTR i~ stepped through until the first active link record i3 found, at which time the index for that active record is stored in PTR. In block 114, the activ~ indicator is cleared to indicate processing for that link recQrd, similarly dons in block 100.
In block 116, those link rQcords ~ound inactive, : a3 indicated between pointer~ HEAD and ~TR, are returned to the ~ree list ~or future use.
In block 118 a test is performed to determine i~
there are any more l~nks ln the link table. This test is similar to the test per~ormed ln block 94, above.
If there ara no more link record in the established list, the subroutine return~ to the recognltion flowchart at block 46 of Figure 4a.
I~ block 120 a test i~ par~ormed to determina whether or not the link (node), preceding tha current link is inactive. This is done by looking to the back pointer ~or the current link record and looking to its ; 20 corresponding L-ACT entry. If the preceding node is active, flow proceeds to block 12~ o~ ~ig. 6b to handle the problem of ambiguity, previously discussed. I~ th~
preceding node i9 inactive, flow proceeds to block 142 o~
Fig~ 6c.
Referring now to Fig. 6b, the steps therein handle the ambiguity, when the preceding node indicates . that there may be two or more link records emanating from that node that are still under consideration for a match.
This occurs when a node preceding an active node is also actiYe. The three temporary pointers (TMPl, TMP2 and PTR) are used in Fig. 6b for manipulation of link record data. The steps in block 121 remove ~he ~orward pointer chain from the previous link. These steps include blocks 124, 126, 128 and 130. Figure 6~ is entered with P~R

;~

.

~3t~

poin~ing to the nodQ, or link r~cord, currently being proce3sad. The L-BACR entry correspond to the link record point~ to the node immediately preceding tho node currently being processed, as mentioned above in ; discu~sion with Fig. 7. The L FWRD entry associated with th~ link record indicates the only potential descendant link record.
- In block 124, a pointer to ths node immediately preceding the current active node is stored in TMPl. In block 126, the descendant link record, as indicated in L-FWRD, of the preceding node ~s stored TMP2.
In block 12~, a test is perfor~ed to detsrmina if the node pointed to by TMP2 has an actual L-FWRD ~ntry or if it is set to zero. If tho node pointed to by T~P2 does have a forward pointer (L-FWRD not equal to 0), flow pxoceeds to block 130 where the forward pointer for that node i~ removed. In block 130, the contents of TMP2 is also moved to T~Pl, which temporarily move~ reference of the current node to tha node pointed to by TMP2, then, starting wlth block 126, the above steps are repeated for the subsequent nodes until a node is found in the forward chain which does not have a ~orward pointer, as indicated in block 128, indicating the end of the forward chain.
The step~ in block 122 remove the ~orward pointer chain from the current link, as indicated by PTR. In block 132 the current link record pointer, PTR, is stored in TMPl. In block 134, the ~-FWRD entry for that link record i5 stored in T~P2 . In block 136, a test is performed to determine if a forward pointer exists for ~ this lin~ record, as was done in block 128 above. If ; there is a forward pointer for this link record, flow proceeds to block 138 where ~he ~orward pointer is removed and the descendant node is stored in TMPl for removlnq lts forward pointer as woll. Startinq at block ' .; ,~ .

,',: ., . .

~3~3~ `

134, the above step~ are repeated until all ~orward polnt~rs chained ~rom ths current node are deleted.
Then, ~ro~ block 136, flow proceed~ to block 144 o~ Fig.
6c to process tha next active link record.
Re~erring back to block 120 of Fig. 6a, i~ the node preceding the current node is inactive, flow proceeds to block 142 of Flg. 6c, which is now discussed.
In block 142 of Fig. 6c, since the pxeceding node lo (linX record) was found inactive, it i~ set active and its ~orward pointer i~ set to point to the current link record. In block 144, starting at the current link record, khe t~ble i searched until the next active link record i8 ~oun~ and PTR is set to point to this record, indicating the new current link record. In block 146, all inactive records encountered during the step o~ block 144 are returned to the free list by modifying the appropriate entries in the L-PTR array. ~n block 148 the new node is set inactive to indicate that it has been accounted for, similarly done in blocks 100 and 114 of Figs. 6a.
A test i8 performed in block 152 to determine whether or not this new link record i8 the last in the chain. I~ 80~ then all link records have baen processed and flow proceeds to block 156 of Fig. 6d to output the words recognized during the traceback process. If this new link record is not the last in the chain, then flow proceeds to block 120 of ~ig. 6a ~or additional processing.
R~erring now to Fig. 6d, in block 156 the index ~or the current 1 ink record, which i8 tha root node in the tree, is stored in TMPl. At hlock 158, the nod~
(link record) representad by tha current node 1 8 forward pointer iR stored in TMP2. For example, referring to ,;

~3~3~!D

diagram H of Fig. 7, TMPl would contain 11 (node 11), and TMP2 ~ould contain 13 (node 13).
In block 160, a test is performed to determine if there are any forward pointers descending from the node stored in TMPl. This is performed by comparing the content~ of TMP2 to zero. If there are any forward pointers descending from thQ node stored in TMPl, flow proceed~ to block 162 where the forward pointer for the current nod~ i~ removed, and the current node is moved up to th~ next node in the forward chain, aY indicated by th~ current node's forward pointer stored in ~MP2. In block 164, the word associated with the current link record i8 output as a recognlzed word. Starting with ` 15 block 158, the above s eps are repeated until each link record in the forward pointer chain has its associated word output as a recognized word. During the step in block 160, a de~cending link record will he found which does not have a forward pointer, in which case, flow proceeds to block 168 where all dead link records, as indicated botwean TMPl and PTR, are returned to the ~ree list. Additionally, L-PT~ array is updated in block 1~8 by settin~ the L-PT~ entry ~or the new base node which is currently point~d to by PTR, with zero. The zero in this record ind~cates the root of the tree and the end of the e~tablishQd li~t o~ link record~. From block 168, the traceback is complete and flow proceeds to block 46 of Fig. 4a.
The present invention therefore provides a new and improved system and method for continuous speech recognition. The invention can readily be implemented as described in the efficiently ~ormula~ed abov~ flowcharts to accommodate real time recognition in a simple and inexpensive 8-bit processor. The present invention additionally provides excellent memory management such ;
:

.~. .

~3~

that only a minimum number o~ link records are needed to b~ stored as input ~rames are processed.
While the invention has been particularly shown and described with rsfQrence to a preferred e~bodiment, it will be understood by those skilled in the art that ~ various modi~ications and changes may be made to the ; present invention described above without departing from ~: the spirit and scope thereo~.
~` 10 What i~ claimed is:

'

Claims (56)

1. In a speech recognition system, wherein input frames are processed against prestored templates representing speech, and templates which are under consideration as potentially recognized templates are individually recorded in alinked network as link records, said link records generally having ancestor and descendant link records, an arrangement for recognizing speech patterns, comprising:

means for providing temporary pointers for said link records;

means for tracing back through said network while labeling selected ones of said link records with said temporary pointers to connect potentially recognized templates corresponding to said ones of link records;

means for determining link records which may have two or more potentially recognized descendant link records;

means for deleting said temporary pointers corresponding to said determined link records; and means for outputting data corresponding to said link records still labeled with said temporary pointers.
2. An arrangement for recognizing speech patterns, according to claim 1, including means for periodically alerting the system after a predetermined number of input frames are processed.
3. An arrangement for recognizing speech patterns, according to claim 1, including means for building the network according to a predetermined grammar model topology.
4. An arrangement for recognizing speech patterns, according to claim 1, including means for adding a link record to the network after a potential word end for at least one of said templates is recognized.
5. An arrangement for recognizing speech patterns, according to claim 1, including means for indicating the most recently appended link record in the network.
6. An arrangement for recognizing speech patterns, according to claim 1, including means for indicating a link record has no ancestor link records.
7. An arrangement for recognizing speech patterns, according to claim 1, including means for storing said link records in a table composed of free link records and established link records.
8. An arrangement for recognizing speech patterns, according to claim 7, including means for indicating the beginning of the established link records and the beginning of the free link records.
9. An arrangement for recognizing speech patterns, according to claim 7, including means for indicating the end of the established link records and the end of the free link records.
10. An arrangement for recognizing speech patterns, according to claim 1, including means for deleting temporary pointers for link records descending from said determined link records.
11. In speech recognition system, wherein input frames are processed against prestored templates representing speech, and templates which are under consideration as potentially recognized templates are individually recorded in a linked network as link records, said link records generally having ancestor and descendant link records, an arrangement for recognizing speech patterns, comprising:
means for storing said link records as indexed data sets, each data set including a symbol representing the template, a sequence indicator representing the relative time the link record was stored, a first pointer indicating a link record in the network from which each descends, and a second temporary pointer;
means for tracing back through the network using said indexed data sets, including said second temporary pointer, to identify at least one link record which is unambiguously recognized; and means for outputting said unambiguously recognized link records.
12. An arrangement for recognizing speech patterns, according to claim 11, including means for periodically alerting the system after a predetermined number of input frames are processed.
13. An arrangement for recognizing speech patterns, according to claim 11, including means for building the network according to a predetermined grammar model topology.
14. An arrangement for recognizing speech patterns, according to claim 11, including means for adding a link record to the network after a potential word end for at least one of said templates is recognized.
15. An arrangement for recognizing speech patterns, according to claim 11, including means for indicating the most recently appended link record in the network.
16. An arrangement for recognizing speech patterns, according to claim 11, including means for indicating a link record has no ancestor link records.
17. An arrangement for recognizing speech patterns, according to claim 11, including means for storing said link records in a table composed of free link records and established link records.
18. An arrangement for recognizing speech patterns, according to claim 17, including means for indicating the beginning of the established link records and the beginning of the free link records.
19. An arrangement for recognizing speech patterns, according to claim 17, including means for indicating the end of the established link records and the end of the free link records.
20. An arrangement for recognizing speech patterns, according to claim 11, including means for deleting temporary pointers for link records descending from said determined link records.
21. In speech recognition system, wherein input frames are processed against prestored templates representing speech, and templates which are under consideration as potentially recognized templates are individually recorded in a linked network as link records, said link records generally said network having ancestor and descendant link records, an arrangement for recognizing speech patterns, comprising:
means for storing said link records as indexed data sets in a table, each data set including a symbol representing the template, a sequence indicator representing the relative time the link record was stored and a pointer indicating a link record in the network from which each descends;
said table composed of free record space and established record space, wherein said link records are stored in said established record space;
means for tracing back through the network using said indexed data sets to identify one or more link records whose corresponding templates are unambiguously recognized;
means for outputting data representing said unambiguously recognized link records, and removing said link records from said established record space, whereby said removed link records becomes free record space for subsequently storing link records.
22. An arrangement for recognizing speech patterns, according to claim 21, including means for periodically alerting the system after a predetermined number of input frames are processed.
23. An arrangement for recognizing speech patterns, according to claim 21, including means for building the network according to a predetermined grammar model topology.
24. An arrangement for recognizing speech patterns, according to claim 21, including means for adding a link record to the network after a potential word end for at least one of said templates is recognized.
25. An arrangement for recognizing speech patterns, according to claim 21, including means for indicating the most recently appended link record in the network.
26. An arrangement for recognizing speech patterns, according to claim 21, including means for indicating a link record has no ancestor link records.
27. An arrangement for recognizing speech patterns, according to claim 21, including means for determining link records which have may have two or more potentially recognized descendant link records.
28. An arrangement for recognizing speech patterns, according to claim 21, including means for selecting particular link records in said established record space, whose corresponding templates have not been unambiguously recognized, and returning said particular link records to said free record space.
29. In speech recognition system, wherein input frames are processed against prestored templates representing speech, and templates which are under consideration as potentially recognized templates are individually recorded in a linked network as link records, said link records generally having ancestor and descendant link records, a method for recognizing speech patterns, comprising the steps of:
providing temporary pointers for said link records;
tracing back through said network while labeling selected ones of said link records with said temporary pointer to connect potentially recognized templates corresponding to said ones of link records;
determining link records which may have two or more potentially recognized descendant link records;
deleting said temporary pointers corresponding to said determined link records; and outputting data corresponding to said link records still labeled with said temporary pointers.
30. A method for recognizing speech patterns, according to claim 29, including the step of periodically alerting the system after a predetermined number of input frames are processed.
31. A method for recognizing speech patterns, according to claim 29, including the step of building the network according to a predetermined grammar model topology.
32. A method for recognizing speech patterns, according to claim 29, including the step of adding a link record to the network after a potential word end for at least one of said templates is recognized.
33. A method for recognizing speech patterns, according to claim 29, including the step of indicating the most recently appended link record in the network.
34. A method for recognizing speech patterns, according to claim 29, including the step of indicating a link record has no ancestor link records.
35. A method for recognizing speech patterns, according to claim 29, including the step of storing said link records in a table composed of free link records and established link records.
36. A method for recognizing speech patterns, according to claim 35, including the step of indicating the beginning of the established link records and the beginning of the free link records.
37, A method for recognizing speech patterns, according to claim 35, including the step of indicating the end of the the established link records and the end of the free link records.
38. A method for recognizing speech patterns, according to claim 29, including the step of deleting temporary pointers for link records descending from said determined link records.
39. In speech recognition system, wherein input frames are processed against prestored templates representing speech, and templates which are under consideration as potentially recognized templates are individually recorded in a linked network as link records, said link records generally having ancestor and descendant link records, a method for recognizing speech patterns, comprising the steps of:
storing said link records as indexed data sets, each data set including a symbol representing the template, a sequence indicator representing the relative time the link record was stored, a first pointer indicating a link record in the network from which each descends, and a second temporary pointer;
tracing back through the network using said indexed data sets, including said second temporary pointer, to identify at least one link record which is unambiguously recognized; and outputting said unambiguously recognized link records.
40. A method for recognizing speech patterns, according to claim 39, including the step of periodically alerting the system after a predetermined number of input frames are processed.
41. A method for recognizing speech patterns, according to claim 39, including the step of building the network according to a predetermined grammar model topology.
42. A method for recognizing speech patterns, according to claim 39, including the step of adding a link record to the network after a potential word end for at least one of said templates is recognized.
43. A method for recognizing speech patterns, according to claim 39, including the step of indicating the most recently appended link record in the network.
44. A method for recognizing speech patterns, according to claim 39, including the step of indicating a link record has no ancestor link records.
45. A method for recognizing speech patterns, according to claim 39, including the step of storing said link records in a table composed of free link records and established link records.
46. A method for recognizing speech patterns, according to claim 45, including the step of indicating the beginning of the established link records and the beginning of the free link records.
47 47. A method for recognizing speech patterns, according to claim 17, including the step of indicating the end of the established link records and the end of the free link records.
48. A method for recognizing speech patterns, according to claim 39, including the step of deleting temporary pointers for link records descending from said determined link records.
49. In speech recognition system, wherein input frames are processed against prestored templates representing speech, and templates which are under consideration as potentially recognized templates are individually recorded in a linked network as link records, said link record generally said network having ancestor and descendant link records, a method for recognizing speech patterns, comprising the steps of:
storing said link records as indexed data sets in a table, each data set including a symbol representing the template, a sequence indicator representing the relative time the link record was stored and a pointer indicating a link record in the network from which each descends, said table composed of free record space and established record space, wherein said link records are stored in said established record space;
tracing back through the network using said indexed data sets to identify one or more link records whose corresponding templates are unambiguously recognized;
outputting data representing said unambiguously recognized link records, and removing said link records from said established record space, whereby said removed link records becomes free record space for subsequently storing link records.
50. A method for recognizing speech patterns, according to claim 49, including the step of periodically alerting the system after a predetermined number of input frames are processed.
51. A method for recognizing speech patterns, according to claim 49, including the step of building the network according to a predetermined grammar model topology.
52. A method for recognizing speech patterns, according to claim 49, including the step of adding a link record to the network after a potential word end for at least one of said templates is recognized.
53. A method for recognizing speech patterns, according to claim 49, including the step of indicating the most recently appended link record in the network.
54. A method for recognizing speech patterns, according to claim 49, including the step of indicating a link record has no ancestor link records.
55. A method for recognizing speech patterns, according to claim 49, including the step of determining link records which may have two or more potentially recognized descendant link records.
56. A method for recognizing speech patterns, according to claim 49, including the step of selecting particular link records in said established record space, whose corresponding templates have not been output, and returning said particular link records to said free record space.
CA000538501A 1986-06-02 1987-06-01 Continuous speech recognition system Expired - Lifetime CA1301340C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
PCT/US1986/001224 WO1987007749A1 (en) 1986-06-02 1986-06-02 Continuous speech recognition system
USPCT/US86/01224 1986-06-02

Publications (1)

Publication Number Publication Date
CA1301340C true CA1301340C (en) 1992-05-19

Family

ID=22195542

Family Applications (1)

Application Number Title Priority Date Filing Date
CA000538501A Expired - Lifetime CA1301340C (en) 1986-06-02 1987-06-01 Continuous speech recognition system

Country Status (4)

Country Link
US (1) US5040127A (en)
JP (1) JP2717652B2 (en)
CA (1) CA1301340C (en)
WO (1) WO1987007749A1 (en)

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3043439B2 (en) * 1990-12-28 2000-05-22 富士通株式会社 Network storage method in data processing device
DE4130633A1 (en) * 1991-09-14 1993-03-18 Philips Patentverwaltung METHOD FOR RECOGNIZING THE SPOKEN WORDS IN A VOICE SIGNAL
DE4130631A1 (en) * 1991-09-14 1993-03-18 Philips Patentverwaltung METHOD FOR RECOGNIZING THE SPOKEN WORDS IN A VOICE SIGNAL
US6385312B1 (en) 1993-02-22 2002-05-07 Murex Securities, Ltd. Automatic routing and information system for telephonic services
US5848388A (en) * 1993-03-25 1998-12-08 British Telecommunications Plc Speech recognition with sequence parsing, rejection and pause detection options
US5819222A (en) * 1993-03-31 1998-10-06 British Telecommunications Public Limited Company Task-constrained connected speech recognition of propagation of tokens only if valid propagation path is present
US6230128B1 (en) 1993-03-31 2001-05-08 British Telecommunications Public Limited Company Path link passing speech recognition with vocabulary node being capable of simultaneously processing plural path links
ATE200590T1 (en) * 1993-07-13 2001-04-15 Theodore Austin Bordeaux VOICE RECOGNITION SYSTEM FOR MULTIPLE LANGUAGES
JP3004883B2 (en) * 1994-10-18 2000-01-31 ケイディディ株式会社 End call detection method and apparatus and continuous speech recognition method and apparatus
US5970457A (en) * 1995-10-25 1999-10-19 Johns Hopkins University Voice command and control medical care system
US5799276A (en) * 1995-11-07 1998-08-25 Accent Incorporated Knowledge-based speech recognition system and methods having frame length computed based upon estimated pitch period of vocalic intervals
US5835893A (en) * 1996-02-15 1998-11-10 Atr Interpreting Telecommunications Research Labs Class-based word clustering for speech recognition using a three-level balanced hierarchical similarity
US5872938A (en) * 1996-06-28 1999-02-16 International Business Machines Corp. Service priority queue implemented with ordered sub-queues and sub-queue pointers pointing to last entries in respective sub-queues
US5950160A (en) * 1996-10-31 1999-09-07 Microsoft Corporation Method and system for displaying a variable number of alternative words during speech recognition
US5884258A (en) * 1996-10-31 1999-03-16 Microsoft Corporation Method and system for editing phrases during continuous speech recognition
US5829000A (en) * 1996-10-31 1998-10-27 Microsoft Corporation Method and system for correcting misrecognized spoken words or phrases
US5899976A (en) * 1996-10-31 1999-05-04 Microsoft Corporation Method and system for buffering recognized words during speech recognition
US6006181A (en) * 1997-09-12 1999-12-21 Lucent Technologies Inc. Method and apparatus for continuous speech recognition using a layered, self-adjusting decoder network
US5983180A (en) * 1997-10-23 1999-11-09 Softsound Limited Recognition of sequential data using finite state sequence models organized in a tree structure
GB2333172A (en) * 1998-01-13 1999-07-14 Softsound Limited Recognition
US6442520B1 (en) 1999-11-08 2002-08-27 Agere Systems Guardian Corp. Method and apparatus for continuous speech recognition using a layered, self-adjusting decoded network
US20040205035A1 (en) * 2000-05-01 2004-10-14 Norbert Rimoux Method and system for adaptive learning and pattern recognition
KR100450396B1 (en) * 2001-10-22 2004-09-30 한국전자통신연구원 Tree search based speech recognition method and speech recognition system thereby
US20040064315A1 (en) * 2002-09-30 2004-04-01 Deisher Michael E. Acoustic confidence driven front-end preprocessing for speech recognition in adverse environments
CN1839383A (en) * 2003-09-30 2006-09-27 英特尔公司 Viterbi path generation for a dynamic Bayesian network
US7529657B2 (en) * 2004-09-24 2009-05-05 Microsoft Corporation Configurable parameters for grammar authoring for speech recognition and natural language understanding
US20070214153A1 (en) * 2006-03-10 2007-09-13 Mazzagatti Jane C Method for processing an input particle stream for creating upper levels of KStore
US8086463B2 (en) * 2006-09-12 2011-12-27 Nuance Communications, Inc. Dynamically generating a vocal help prompt in a multimodal application
US8060367B2 (en) * 2007-06-26 2011-11-15 Targus Information Corporation Spatially indexed grammar and methods of use
US9390708B1 (en) * 2013-05-28 2016-07-12 Amazon Technologies, Inc. Low latency and memory efficient keywork spotting

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4277644A (en) * 1979-07-16 1981-07-07 Bell Telephone Laboratories, Incorporated Syntactic continuous speech recognizer
US4783809A (en) * 1984-11-07 1988-11-08 American Telephone And Telegraph Company, At&T Bell Laboratories Automatic speech recognizer for real time operation
JPS62275300A (en) * 1986-05-16 1987-11-30 沖電気工業株式会社 Continuous voice recognition

Also Published As

Publication number Publication date
US5040127A (en) 1991-08-13
JP2717652B2 (en) 1998-02-18
WO1987007749A1 (en) 1987-12-17
JPH01502611A (en) 1989-09-07

Similar Documents

Publication Publication Date Title
CA1301340C (en) Continuous speech recognition system
US4819271A (en) Constructing Markov model word baseforms from multiple utterances by concatenating model sequences for word segments
US5528701A (en) Trie based method for indexing handwritten databases
CN111145728B (en) Speech recognition model training method, system, mobile terminal and storage medium
Young et al. The HTK book
CN103229232B (en) Speech recognition device and navigation device
JPH0772839B2 (en) Method and apparatus for grouping phoneme pronunciations into phonetic similarity-based context-dependent categories for automatic speech recognition
US5553284A (en) Method for indexing and searching handwritten documents in a database
WO1992014237A1 (en) Method for recognizing speech using linguistically-motivated hidden markov models
DE3876379D1 (en) AUTOMATIC DETERMINATION OF LABELS AND MARKOV WORD MODELS IN A VOICE RECOGNITION SYSTEM.
JP4289715B2 (en) Speech recognition apparatus, speech recognition method, and tree structure dictionary creation method used in the method
EP0248377B1 (en) Continuous speech recognition system
CN102027534B (en) Language model score lookahead value imparting device and method for the same
CN111144097A (en) Modeling method and device for emotion tendency classification model of dialog text
JP3914709B2 (en) Speech recognition method and system
CN110413779B (en) Word vector training method, system and medium for power industry
CA1336017C (en) Continuous speech recognition system
JP2000267693A (en) Voice processor and index preparation device
JP3818154B2 (en) Speech recognition method
JP2738403B2 (en) Voice recognition device
Thomae et al. A One-Stage Decoder for Interpretation of Natural Speech
JPH09244688A (en) Speech recognizing method
JP3369121B2 (en) Voice recognition method and voice recognition device
CN117316145A (en) Uygur language voice keyword retrieval method and device and electronic equipment
JPS615300A (en) Voice input unit with learning function

Legal Events

Date Code Title Description
MKLA Lapsed