CA2235361A1 - Telephone-based speech recognition for data collection - Google Patents

Telephone-based speech recognition for data collection Download PDF

Info

Publication number
CA2235361A1
Authority
CA
Canada
Prior art keywords
recognition
data collection
script
collection system
utterances
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002235361A
Other languages
French (fr)
Inventor
Susan J. Boyce
Lynne S. Brotman
Deborah W. Brown
Randy G. Goldberg
Edward D. Haszto
Stephen M. Marcus
Richard R. Rosinski
William R. Wetzel
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AT&T Corp
Original Assignee
AT&T Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AT&T Corp filed Critical AT&T Corp
Publication of CA2235361A1 publication Critical patent/CA2235361A1/en
Abandoned legal-status Critical Current

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/487Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M2201/00Electronic components, circuits, software, systems or apparatus used in telephone systems
    • H04M2201/40Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition

Abstract

A production script for interactions between participants and an automated data collection system is created by selectively tuning an experimental script through successive trials until a recognition rate of the system is at least an acceptability threshold. The data collection system uses a semi-constrained grammar, which is a practical accommodation of a larger range of possible inputs than a menu. The data collection system collects data by recognizing utterances from participants in accordance with the production script.

Description

TELEPHONE BASED SPEECH RECOGNITION FOR DATA COLLECTION
The present invention relates to telephone-based data collection systems, and, more particularly, is directed to a voice recognition system using a script including prompts and corresponding recognition grammars which are empirically refined.

Many industries collect data from respondents by telephone. For qualitative information, a human agent is conventionally preferred. However, for certain applications, such as medical patient monitoring, dependence on human interviewers prohibitively increases the cost of gathering data as often as desired. For quantitative information, an automated data collection system is generally adequate, and in some cases may be preferred, e.g., people are more willing to give negative responses to an automated system than to a human agent. An automated system has the additional advantage relative to a human agent of eliminating a subsequent data entry step.

In one known service offering, customers call a special phone number and respond to pre-recorded questions using a telephone keypad. Limitations of this service offering include lack of universal availability of dual tone multi-frequency telephones (rotary telephones provide out-of-voiceband signals, and so are unsuitable for data entry) and an awkward interface due to availability of only twelve keys.

Also, some users are uncomfortable with providing information by "typing".

SUMMARY OF THE INVENTION
In accordance with an aspect of this invention, a method of creating a production script for interactions between participants and an automated data collection system is provided. The automated data collection system is operative in accordance with an experimental script. The method comprises conducting a trial in which a group of subjects provides utterances to the automated data collection system, and evaluating a recognition rate attained by the automated data collection system.

If the recognition rate is below an acceptability threshold, the experimental script is tuned. The step of conducting a trial is repeated using the tuned experimental script, and the step of evaluating the recognition rate is also repeated.

If the recognition rate is at least the acceptability threshold, then the experimental script, as selectively tuned, is used as the production script.

It is not intended that the invention be summarized here in its entirety. Rather, further features, aspects and advantages of the invention are set forth in or are apparent from the following description and drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram of a telephone-based data collection system;

Fig. 2 is a flowchart referred to in describing how the software for the data collection system of Fig. 1 is produced; and

Fig. 3 is a flowchart of the tuning process referenced in Fig. 2.

CA 02235361 1998-04-20

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Fig. 1 shows a typical environment for a telephone-based data collection system. A telephone set 100 having a standard key set (0, 1, ..., 9, *, #) is used by a participant, such as from the participant's home or office. As used herein, a participant means a person providing information for collection to an automated data collection system.
Telecommunication network 200 comprises dial telephone lines and well-known switching equipment such as stored program control switches.
Data collection system 300 comprises communication interface 310 to network 200, communication interface 315 to external system 400, dual tone multi-frequency (DTMF) signal recognition circuit 320, speech recognition circuit 330, speech generation/replay circuit 340, data bus 350, processor 360, memory 370, disk interface 380 and disk storage 390. Data collection system 300 is located within a network operated by a communication network operator. In other embodiments, data collection system 300 may be external to the network operated by the communication network operator.
Communication interface 310 functions to receive and place calls through network 200 under control of processor 360. Communication interface 310 supplies inbound call signals from network 200 to recognition circuits 320, 330, and receives outbound call signals from generation/replay circuit 340 for delivery to network 200.
Communication interface 315 functions to transmit data to and receive data from external system 400 under control of processor 360. In some embodiments, communication interface 315 interfaces to PSTN 200, rather than a dedicated communication line (as shown).
DTMF signal recognition circuit 320 functions to recognize a DTMF signal produced by depression of a key on telephone set 100, to convert the DTMF signal to digital data and to supply the digital data to processor 360 via data bus 350.
Speech recognition circuit 330 is adapted to receive a speech signal spoken by a participant to telephone set 100, to process the speech signal to obtain a word string or determination that the speech signal is unrecognizable, and to deliver its processing results to processor 360 via data bus 350.
Speech generation/replay circuit 340 is adapted to receive a control signal from processor 360 via data bus 350, and in response thereto, to generate a speech signal or to replay a pre-recorded speech signal, and to provide the generated or replayed speech signal to communications interface 310.
Processor 360 serves to control circuits 310, 320, 330 and 340 to collect information from participants in accordance with a predetermined script including prompts and corresponding recognition grammars, to store the collected information and to make the stored information available to a recipient of the information.
Processor 360 is operative in conjunction with memory 370, disk interface 380 and disk storage 390 in a conventional manner which is well-known to one of ordinary skill in the art. It will be appreciated that the predetermined script is stored on disk storage 390. At initialization of data collection system 300, typically when power is supplied thereto, appropriate portions of the predetermined script are loaded into memory 370.
An important feature of the present invention is the ability to provide a semi-constrained vocabulary recognition response grammar. That is, rather than requiring a caller to enter data via manual entry on a telephone keypad, a data collection system according to the present invention supports very flexible speech recognition. The data collection system does not require "training" for each caller, but rather can accommodate new callers. The data collection system is not limited to a small recognition vocabulary, but rather accommodates a conversational style dialog with a caller. However, the recognition grammar is semi-constrained in accordance with an empirical grammar construction process, described in detail below.
To support a larger participant community, various elements of the system shown in Fig. 1 may be replicated. For example, to support a plurality of simultaneous calls, multiple occurrences of communication interface 310, DTMF signal recognition circuit 320, speech recognition circuit 330, and speech generation/replay circuit 340 may be provided.
Typical applications for the data collection system 300 include collection of medical data from patients; collection of purchase information from customers, e.g., to make a rental car reservation or to order merchandise or services; and collection of survey data from subjects.
Initially, the recipient of information (such as a hospital or company testing a new drug) sets up a script for the service dialog, and registers participants (such as doctors and patients) possibly including a voice print for verification. For applications involving confirmation of the caller's identity, multiple levels of verification may be elected by the recipient of information, such as tiers of verification.
On a regular or as-needed basis, patients call in to provide information, such as:

System: How are you feeling on a scale of one to five, with one being very good and five being very poor?

Patient: Ummm ... three.

System: You're feeling fair. Is that correct?

Patient: Yes.

Generally, a participant employs telephone set 100 to dial a call through network 200 to data collection system 300, which is configured to receive the participant's call and engage the participant in a speech-based dialog to collect mostly spoken information. Using automated speech recognition, data collection system 300 converts the spoken information into response data, which is verified with the speaker via voice synthesis or pre-recorded segments. The speech recognition supports "word spotting" (the desired response is recognized from a longer utterance) and "barge in"

(the speaker utters something before the system has completed its spoken question or instruction). An example of word spotting is processing "Ummm three" to extract the response "three".
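Word spotting as described here can be sketched as a scan of the utterance for tokens belonging to the active response grammar, ignoring everything else. The following is a minimal illustration only; the grammar set and function name are assumptions, and a real recognizer matches acoustic models rather than text:

```python
# Grammar for the one-to-five prompt shown above (illustrative).
SCALE_GRAMMAR = {"one", "two", "three", "four", "five"}

def spot_word(utterance, grammar):
    """Return the first in-grammar token in the utterance, or None."""
    for token in utterance.lower().replace(".", " ").split():
        if token in grammar:
            return token
    return None

print(spot_word("Ummm .. three", SCALE_GRAMMAR))  # -> three
```

Fillers such as "ummm" simply fail the grammar-membership test and are skipped, which is the essence of spotting the desired response inside a longer utterance.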
When data collection system 300 is used to interview an out-patient at a mental hospital, a prompt may be:

System: Please state all of the drugs that you are presently taking.

The semi-constrained vocabulary recognition response grammar is all of the drugs prescribed by the mental hospital. An example grammar is:

    ativan      lithium     stelazine
    buspar      luvox       symmetryl
    clozaril    paxil       thorazine
    cogentin    prolixin    tofranil
    depakote    prozac      xanax
    haldol      risperidol  zoloft
    klonopin    ritalin     zyprexa

It will be appreciated that, as the list of allowable responses grows, it is much more convenient for the participant to respond to an open-ended question which has a semi-constrained response vocabulary than for the participant to listen to a menu and enter a selection, possibly with a delimiter. An open-ended question is particularly convenient when the participant is being asked about information which they know, e.g., personal statistics or desired selection, and thus for which they do not require a menu.
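For the drug-list prompt, the same word-spotting idea extends naturally to multiple matches per utterance. A hedged sketch, assuming the example grammar above (the function name is illustrative, and a real recognizer works on speech, not transcribed text):

```python
# The example drug grammar from the text, as a flat lookup set.
DRUG_GRAMMAR = {
    "ativan", "buspar", "clozaril", "cogentin", "depakote", "haldol",
    "klonopin", "lithium", "luvox", "paxil", "prolixin", "prozac",
    "risperidol", "ritalin", "stelazine", "symmetryl", "thorazine",
    "tofranil", "xanax", "zoloft", "zyprexa",
}

def extract_drugs(utterance):
    """Word-spot every in-grammar drug name in a free-form answer."""
    return [t for t in utterance.lower().split() if t in DRUG_GRAMMAR]

print(extract_drugs("um I take lithium and uh prozac"))  # -> ['lithium', 'prozac']
```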
In some embodiments, data collection system 300 also includes speaker verification based on voice prints, which entails registration of participants including provision of a baseline voice print; confirmation of caller identity, such as by provision of prerecorded distinguishing information (e.g., mother's maiden name); call transfer, e.g., to voice mail services or a person; and outbound calling to participants to collect information, or as a reminder to call the automated system and provide information.
The collected response data are disseminated as printed reports, possibly transmitted via facsimile or as digital data. In some embodiments, the collected response data are made available through a voice-based interface tailored for the recipient of information, e.g., the recipient calls data collection system 300 and queries the status of selected participants and/or newly collected information. In some embodiments, data collection system 300 processes the collected data, e.g., detects predetermined events, or makes the collected data available to a separate data processing system (not shown).
External system 400 may be used to supply data to data collection system 300, e.g., a daily update of scheduling information or real-time reservation availability. External system 400 may additionally or alternatively be a destination for information collected by data collection system 300.
Fig. 2 is a flowchart showing how the predetermined script including prompts and corresponding recognition grammars used by data collection system 300 is produced.
At step 500, an experimental script is prepared. Typically, producers of the script provide their best guesses as to (i) the sequence and phrasing of questions most likely to elicit desired information from participants, that is, the prompts, and (ii) the types of responses which participants are most likely to give. Based on the most likely responses, a recognition grammar is created. If a response is not understood, provision of a question to elicit understandable information, referred to as re-prompting, may be included as part of the script. How extensively re-prompting is employed depends on the criticality of the information and the motivation of the participant, e.g., a patient versus an anonymous caller.
At step 510, a trial of the experimental script is conducted with a test group of participants. The participants' responses are recorded.
At step 520, the results of the trial are analyzed, and an overall recognition rate is obtained. In most cases, selected questions exhibit a low recognition rate, while other questions have an acceptable recognition rate.
The recognition rate of the data collection system 300 is defined as follows:

    Recognition Rate = Correct Recognitions / Total Attempted Recognitions

where

    Correct Recognitions = IN_ACCEPT + OUT_REJECT
    Total Attempted Recognitions = Correct Recognitions + IN_REJECT + OUT_ACCEPT

    IN_ACCEPT = responses in grammar that were accepted
    IN_REJECT = responses in grammar that were rejected
    OUT_ACCEPT = responses out of grammar that were accepted
    OUT_REJECT = responses out of grammar that were rejected

At step 530, tuning is performed to improve the recognition rate. Tuning is discussed in detail below. Most of the tuning is performed manually. The result of step 530 is a revised experimental script. Tuning is an important aspect of creation of a semi-constrained grammar.
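The recognition-rate definition translates directly into code. The following transcription is illustrative only; the function name and example counts are not from the patent:

```python
def recognition_rate(in_accept, in_reject, out_accept, out_reject):
    """Recognition rate per the definition above: correct recognitions
    divided by total attempted recognitions."""
    correct = in_accept + out_reject
    total_attempted = correct + in_reject + out_accept
    return correct / total_attempted

# e.g. 70 in-grammar accepts, 10 in-grammar rejects,
# 5 out-of-grammar accepts, 15 out-of-grammar rejects:
print(recognition_rate(70, 10, 5, 15))  # -> 0.85
```

Note that both correct acceptances of in-grammar responses and correct rejections of out-of-grammar responses count toward the numerator.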
At step 540, it is determined whether the revised experimental script is appropriate for batch testing. Generally, if the changes made during tuning do not affect the prompts or prompt sequence, then batch testing is appropriate.
If batch testing is appropriate, at step 550, a batch test is conducted. Batch testing is re-running collected data with a revised speech recognition system, and is generally faster and more economical than another user trial.

If, at step 540, it is determined that batch testing is inappropriate, step 510 is repeated using the revised experimental script. In most cases, user trials are conducted with a test group having participants other than those who participated in a previous user trial.
Step 520 is repeated on the results of the most recent user trial or batch test. If the recognition rate is unacceptable, the tuning, testing, trial and acceptability steps 530, 540, 510 and 520, respectively, are repeated until the recognition rate attains an acceptable threshold.
When an acceptable recognition rate is attained, the experimental script used in the associated trial is defined to be the production script, and, at step 560, the process of producing the predefined (or production) script is complete.
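The Fig. 2 loop, trial, evaluate, tune until the threshold is reached, can be sketched as follows. This is a schematic only: run_trial and tune stand in for the largely manual steps described in the text, and the simulated functions are assumptions for illustration:

```python
def produce_script(script, run_trial, tune, threshold=0.98):
    """Fig. 2 loop: trial (510) -> evaluate (520) -> tune (530) until done."""
    while True:
        rate = run_trial(script)       # step 510 (or 550 for a batch test)
        if rate >= threshold:          # step 520: evaluate recognition rate
            return script              # step 560: this is the production script
        script = tune(script, rate)    # step 530: revise the experimental script

# Stand-in trial/tuning functions for illustration: each tuning pass
# raises the simulated recognition rate.
def simulated_trial(script):
    return script["rate"]

def simulated_tune(script, rate):
    return {"rate": min(1.0, script["rate"] + 0.1)}

final = produce_script({"rate": 0.70}, simulated_trial, simulated_tune)
```

The default threshold of 0.98 mirrors the car-rental example later in the text, where a roughly 98% recognition rate was deemed acceptable.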
Fig. 3 is a flowchart of the tuning process shown in step 530 of Fig. 2. Much of the tuning process is performed manually, and the order in which the various tuning techniques are applied varies from artisan to artisan. The tuning techniques typically used include tuning the recognition grammar, tuning prompts, modifying re-prompts, pruning the grammar, tuning recognition parameters, pre- and post-processor tuning, and tuning the sequence of prompts. Other tuning techniques may also be used.
Tuning the recognition grammar refers to expanding the recognition grammar to include what people say, for example, a larger vocabulary of substantive responses, surrounding prepositions, and surrounding phrases.
Pruning the grammar refers to excluding rarely used words from the recognition grammar. Interestingly, although removing rarely used words ensures a recognition error when a participant says the removed word, the performance improvement in misrecognitions, that is, the reduction in recognition errors for more frequently said words which are misrecognized as one of the rarely used words, can be larger than the performance degradation due to the ensured errors.
Tuning recognition parameters refers to adjusting the parameters of the software used in speech recognition circuit 330. Recognition software is available from various vendors, such as the HARK(TM) Telephony Recognizer, Release 3.0, from Bolt, Beranek and Newman, Cambridge, Massachusetts 02138, described in Document No. 300-3.0 (July 1995), the disclosure of which is hereby incorporated by reference. An example of a recognition parameter is the "rejection parameter", which attempts to maximize the "proper rejections" curve and minimize the "false rejections" curve, which typically have different shapes. The rejection parameter is usually computed to maximize the following expression:

    (proper rejections) - (false rejections) = OUT_REJECT - IN_REJECT
Another example of a recognition parameter is speaker gender sensitivity.
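Choosing the rejection parameter can be sketched as a sweep over candidate thresholds, scoring each by OUT_REJECT - IN_REJECT as in the expression above. The per-response confidence scores and the pair representation are assumptions for illustration; an actual recognizer exposes this differently:

```python
def best_rejection_threshold(responses, candidates):
    """responses: (confidence, in_grammar) pairs; candidates: thresholds.
    Pick the threshold maximizing proper rejections minus false rejections,
    i.e. OUT_REJECT - IN_REJECT."""
    def objective(thresh):
        # Responses with confidence below the threshold would be rejected.
        out_reject = sum(1 for conf, in_g in responses if conf < thresh and not in_g)
        in_reject = sum(1 for conf, in_g in responses if conf < thresh and in_g)
        return out_reject - in_reject
    return max(candidates, key=objective)
```

A threshold set too low accepts out-of-grammar noise; set too high, it falsely rejects valid in-grammar responses, which is why the two curves must be traded off.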
Tuning prompts refers to changing a prompt. An example is provided below (changing PROMPT 2 of the experimental script discussed below).

Modifying re-prompts refers to adding re-prompts or altering re-prompts, which are used when the recognition system fails to recognize a response. An example is provided below (adding PROMPT R1 of the experimental script discussed below).

Tuning the sequence of prompts refers to changing the order in which prompts are presented. The purpose is to provide a better context for the participant, thereby avoiding errors due to participant confusion.
Pre- and post-processor tuning refers to altering how external information is employed by speech recognition circuit 330 and processor 360. An example of external information is an account number or a confusion matrix. Processor 360 may use the output of speech recognition circuit 330 to enhance account number recognition.
An example of post-processor tuning is using a confusion matrix to improve the recognition accuracy. Use of confusion matrices with account numbers is described in application serial no. 08/763,382 (attorney docket no. BROWN 1-3-1-3), the disclosure of which is hereby incorporated by reference.
An example of pre-processor tuning is constraining the account numbers which may exist to improve recognition accuracy. Choosing alpha-numerics to improve recognition accuracy is described in application serial no. 08/771,356 (attorney docket no. BROWN 24-14-3), the disclosure of which is hereby incorporated by reference.
At step 600 of Fig. 3, the results of a user trial conducted at step 510 of Fig. 2 are analyzed.
If, at step 610, it is determined that a sufficient percentage of responses were out of grammar, then at step 620, the out-of-grammar responses are analyzed to determine whether they are clustered or grouped. A sufficient percentage may be, for example, at least 10% of the responses. If the responses are clustered, at step 622, the recognition grammar is augmented (expanded) to include what people actually said, and a batch test can be performed to evaluate the results of the grammar tuning. Step 622 can be performed entirely by a computer programmed appropriately. If the responses are not clustered, then at step 625, the prompts are tuned, and at step 627 the re-prompts are adjusted. After modifying the prompts or re-prompts, a batch test is inappropriate, so a user trial should be performed.
If, at step 610, it was determined that an out-of-grammar situation was not present, then at step 630, it is determined whether the in-grammar recognition accuracy is low, for example, less than 80% correct recognition of in-grammar responses. If the in-grammar recognition accuracy is low, then at step 632, the grammar is pruned and at step 634, the recognition parameters are tuned. Step 632 can be performed entirely by a computer programmed appropriately. A batch test may be performed to evaluate the results of the grammar and parameter tuning.
If, at step 630, it is determined that the in-grammar recognition accuracy is acceptable, then at step 640, it is determined whether the responses indicate that people are getting confused, for example, by giving wholly inappropriate responses to certain prompts. If people are getting confused, then, at step 642, the prompt sequence is modified, and the prompts and re-prompts may also be adjusted.
If, at step 640, it is determined that people are not confused, i.e., their out-of-grammar responses are not clustered, then at step 645, pre-processor tuning and post-processor tuning are performed. A batch test may be performed to evaluate the results of the pre-processor and post-processor tuning.
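The Fig. 3 branching can be summarized in code. The thresholds (10% out-of-grammar, 80% in-grammar accuracy) come from the examples in the text; the function name and the returned labels are illustrative names for the tuning actions, not part of the patent:

```python
def choose_tuning(out_of_grammar_pct, out_clustered, in_grammar_accuracy, confused):
    """Select a tuning action following the Fig. 3 decision tree."""
    if out_of_grammar_pct >= 0.10:               # step 610: many out-of-grammar
        if out_clustered:
            return "augment grammar"             # step 622 (automatable; batch test)
        return "tune prompts and re-prompts"     # steps 625, 627 (then user trial)
    if in_grammar_accuracy < 0.80:               # step 630: low in-grammar accuracy
        return "prune grammar, tune parameters"  # steps 632, 634 (then batch test)
    if confused:                                 # step 640: inappropriate responses
        return "modify prompt sequence"          # step 642
    return "pre-/post-processor tuning"          # step 645
```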
An example of production of a script will now be discussed. The application is a system for collecting information for car rental reservations.
An experimental script included the following prompts and corresponding grammars, where triangular brackets (< >) indicate that a word in the indicated category is expected:
PROMPT 1: Where will you be renting?

The recognition grammar for PROMPT 1 was:

    < AIRPORT >
    < CITY > < STATE >
    < CITY > < STATE > < AIRPORT >

PROMPT 2: Picking up and returning back to <LOCATION>?

The recognition grammar for PROMPT 2 was:

    YES
    NO
17 of the participants said things that could not be processed by the telephone data 18 collection system. This recognition rate was unacceptable. Tuning was performed, 19 as described below.
The responses to PROMPT 1 included:

Chicago 2 from O-Hare 3 O-Hare 4 Chicago O-Hare O-Hare Airport 6 Chicago O-Hare Airport 8 New York 9 JFK Airport at JFK
11 from Newark Airport 12 uh landing um at Newark 13 KennPAy T--le- ,.,.~ional Los Angeles 16 L-A-X Airport 18 LA Il~ lational 2 0 Participants used many variations on place names, occasionally added prepositions to 21 their responses, and provided extraneous utterances such as "uh", "ah", "um", "er"
2 2 and so on. The ~ ar was adjusted to account for this.

The responses to PROMPT 2 were quite varied, so the prompt was eliminated, and the following prompts used in place thereof:

PROMPT 2A: Are you returning to the same location?
PROMPT 2B: Where will you be returning the car?

Additionally, re-prompting capability was added, including:

PROMPT R1: I'm sorry, I didn't understand what you said.

If a complete sentence was provided as an answer, re-prompting was considered faster than trying to extract information from the sentence.
For ambiguous airports and cities, additional prompting was added:

PROMPT 1A: Is that Charleston, South Carolina or Charleston, West Virginia?
PROMPT 1B: Is that Boston, Massachusetts or Austin, Texas?
PROMPT 1C: Is that Columbus, Ohio, Columbus, Mississippi or Columbus, Georgia?
PROMPT 1D: Is that Portland, Oregon or Portland, Maine?
PROMPT 1E: Is that Washington National, Dulles or BWI?
PROMPT 1F: Is that Dulles International or Dallas, Texas?
PROMPT 1G: Is that JFK, Laguardia or Newark International?

Also, at any point, if the participant said "help", an appropriate response was provided.
A second trial was conducted, and the responses were recorded. The recognition rate was about 90%, that is, 10% of the participants said things that could not be processed by the telephone data collection system. This recognition rate was unacceptable. Tuning was performed. The speech models used in the recognizer were changed to include large country-wide speech data, that is, more regional pronunciations were accommodated. Also, the speech models were changed to incorporate better phoneme recognition. Better rejection was added to the grammar. For example, if a response of "um" or "er" was provided, the system repeated the question.
A third trial was conducted, in which the pre-recorded responses of the second trial were played back; no new participant data was collected. Results were encouraging.
A fourth trial was conducted, with live participants. The recognition rate was about 98%. This was considered acceptable, that is, the acceptability threshold was no more than 98%. The script used in the fourth trial was defined to be the production script.
Although an illustrative embodiment of the present invention, and various modifications thereof, have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to this precise embodiment and the described modifications, and that various changes and further modifications may be effected therein by one skilled in the art without departing from the scope or spirit of the invention as defined in the appended claims.

Claims (9)

1. A method of creating a production script for interactions between participants and an automated data collection system, comprising the steps of:
conducting a trial in which a group of subjects provides utterances to the automated data collection system which is operative in accordance with an experimental script;
evaluating a recognition rate attained by the automated data collection system;
if the recognition rate is below an acceptability threshold, tuning the experimental script, repeating the step of conducting a trial using the tuned experimental script, and repeating the step of evaluating the recognition rate; and if the recognition rate is at least the acceptability threshold, using the experimental script, as selectively tuned, as the production script.
2. The method of claim 1, wherein the script comprises prompts and corresponding recognition grammars, and the step of tuning includes modifying at least one of the prompts and the recognition grammars.
3. The method of claim 2, wherein modifying the recognition grammar comprises pruning the recognition grammar.
4. The method of claim 2, wherein modifying the recognition grammar comprises augmenting the recognition grammar.
5. The method of claim 1, further comprising the step of recording the utterances provided by the test subjects during the trial, and wherein a subsequent trial is conducted using the recorded utterances instead of utterances from a group of test subjects.
6. The method of claim 1, wherein the utterances of the test subjects are provided to the automated data collection system via telephone.
7. The method of claim 1, wherein the automated data collection system is located within a network operated by a communication network operator.
8. The method of claim 1, wherein the automated data collection system interacts with an external system to recognize utterances.
9. The method of claim 1, wherein the recognition grammar includes a semi-constrained vocabulary recognition response grammar.
CA002235361A 1997-07-16 1998-04-20 Telephone-based speech recognition for data collection Abandoned CA2235361A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US08/895,183 US6101241A (en) 1997-07-16 1997-07-16 Telephone-based speech recognition for data collection
US08/895,183 1997-07-16

Publications (1)

Publication Number Publication Date
CA2235361A1 true CA2235361A1 (en) 1999-01-16

Family

ID=25404126

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002235361A Abandoned CA2235361A1 (en) 1997-07-16 1998-04-20 Telephone-based speech recognition for data collection

Country Status (2)

Country Link
US (1) US6101241A (en)
CA (1) CA2235361A1 (en)

Families Citing this family (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6961410B1 (en) * 1997-10-01 2005-11-01 Unisys Pulsepoint Communication Method for customizing information for interacting with a voice mail system
US6144938A (en) 1998-05-01 2000-11-07 Sun Microsystems, Inc. Voice user interface with personality
US8321411B2 (en) 1999-03-23 2012-11-27 Microstrategy, Incorporated System and method for management of an automatic OLAP report broadcast system
US9208213B2 (en) 1999-05-28 2015-12-08 Microstrategy, Incorporated System and method for network user interface OLAP report formatting
US8607138B2 (en) 1999-05-28 2013-12-10 Microstrategy, Incorporated System and method for OLAP report generation with spreadsheet report within the network user interface
US7612528B2 (en) * 1999-06-21 2009-11-03 Access Business Group International Llc Vehicle interface
US6829334B1 (en) 1999-09-13 2004-12-07 Microstrategy, Incorporated System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services, with telephone-based service utilization and control
US6964012B1 (en) 1999-09-13 2005-11-08 Microstrategy, Incorporated System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services, including deployment through personalized broadcasts
US8130918B1 (en) 1999-09-13 2012-03-06 Microstrategy, Incorporated System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services, with closed loop transaction processing
US6836537B1 (en) 1999-09-13 2004-12-28 Microstrategy Incorporated System and method for real-time, personalized, dynamic, interactive voice services for information related to existing travel schedule
US6873693B1 (en) 1999-09-13 2005-03-29 Microstrategy, Incorporated System and method for real-time, personalized, dynamic, interactive voice services for entertainment-related information
US6577713B1 (en) * 1999-10-08 2003-06-10 Iquest Technologies, Inc. Method of creating a telephone data capturing system
US7509266B2 (en) * 2000-05-31 2009-03-24 Quality Data Management Inc. Integrated communication system and method
US10142836B2 (en) 2000-06-09 2018-11-27 Airport America, Llc Secure mobile device
US7599847B2 (en) 2000-06-09 2009-10-06 Airport America Automated internet based interactive travel planning and management system
US7315567B2 (en) * 2000-07-10 2008-01-01 Motorola, Inc. Method and apparatus for partial interference cancellation in a communication system
US7398209B2 (en) 2002-06-03 2008-07-08 Voicebox Technologies, Inc. Systems and methods for responding to natural language speech utterance
US7693720B2 (en) 2002-07-15 2010-04-06 Voicebox Technologies, Inc. Mobile systems and methods for responding to natural language speech utterance
US7072684B2 (en) * 2002-09-27 2006-07-04 International Business Machines Corporation Method, apparatus and computer program product for transcribing a telephone communication
US7117153B2 (en) * 2003-02-13 2006-10-03 Microsoft Corporation Method and apparatus for predicting word error rates from text
US20050108013A1 (en) * 2003-11-13 2005-05-19 International Business Machines Corporation Phonetic coverage interactive tool
US7640160B2 (en) 2005-08-05 2009-12-29 Voicebox Technologies, Inc. Systems and methods for responding to natural language speech utterance
US7620549B2 (en) 2005-08-10 2009-11-17 Voicebox Technologies, Inc. System and method of supporting adaptive misrecognition in conversational speech
US7949529B2 (en) 2005-08-29 2011-05-24 Voicebox Technologies, Inc. Mobile systems and methods of supporting natural language human-machine interactions
US7634409B2 (en) 2005-08-31 2009-12-15 Voicebox Technologies, Inc. Dynamic speech sharpening
US8320649B2 (en) 2006-05-25 2012-11-27 Elminda Ltd. Neuropsychological spatiotemporal pattern recognition
US8386248B2 (en) * 2006-09-22 2013-02-26 Nuance Communications, Inc. Tuning reusable software components in a speech application
US8073681B2 (en) 2006-10-16 2011-12-06 Voicebox Technologies, Inc. System and method for a cooperative conversational voice user interface
WO2009069135A2 (en) * 2007-11-29 2009-06-04 Elminda Ltd. System and method for neural modeling of neurophysiological data
US7818176B2 (en) 2007-02-06 2010-10-19 Voicebox Technologies, Inc. System and method for selecting and presenting advertisements based on natural language processing of voice-based input
EP2227725A4 (en) * 2007-11-29 2013-12-18 Elminda Ltd Clinical applications of neuropsychological pattern analysis and modeling
US8140335B2 (en) 2007-12-11 2012-03-20 Voicebox Technologies, Inc. System and method for providing a natural language voice user interface in an integrated voice navigation services environment
US9305548B2 (en) 2008-05-27 2016-04-05 Voicebox Technologies Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
US8589161B2 (en) 2008-05-27 2013-11-19 Voicebox Technologies, Inc. System and method for an integrated, multi-modal, multi-device natural language voice services environment
US8326637B2 (en) 2009-02-20 2012-12-04 Voicebox Technologies, Inc. System and method for processing multi-modal device interactions in a natural language voice services environment
WO2011059997A1 (en) 2009-11-10 2011-05-19 Voicebox Technologies, Inc. System and method for providing a natural language content dedication service
US9171541B2 (en) 2009-11-10 2015-10-27 Voicebox Technologies Corporation System and method for hybrid processing in a natural language voice services environment
EP3195145A4 (en) 2014-09-16 2018-01-24 VoiceBox Technologies Corporation Voice commerce
US9898459B2 (en) 2014-09-16 2018-02-20 Voicebox Technologies Corporation Integration of domain information into state transitions of a finite state transducer for natural language processing
CN107003999B (en) 2014-10-15 2020-08-21 声钰科技 System and method for subsequent response to a user's prior natural language input
US10431214B2 (en) 2014-11-26 2019-10-01 Voicebox Technologies Corporation System and method of determining a domain and/or an action related to a natural language input
US10614799B2 (en) 2014-11-26 2020-04-07 Voicebox Technologies Corporation System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance
US10331784B2 (en) 2016-07-29 2019-06-25 Voicebox Technologies Corporation System and method of disambiguating natural language processing requests
CN107333102B * 2017-06-30 2019-10-29 普联技术有限公司 Method, apparatus and computer-readable storage medium for restoring data when a device powers off
US10304453B2 (en) 2017-07-27 2019-05-28 International Business Machines Corporation Real-time human data collection using voice and messaging side channel

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US34587A (en) * 1862-03-04 Improvement in apparatus for forging and crushing iron
US4451700A (en) * 1982-08-27 1984-05-29 M. A. Kempner, Inc. Automatic audience survey system
US4785408A (en) * 1985-03-11 1988-11-15 AT&T Information Systems Inc. American Telephone and Telegraph Company Method and apparatus for generating computer-controlled interactive voice services
US5255309A (en) * 1985-07-10 1993-10-19 First Data Resources Inc. Telephonic-interface statistical analysis system
US4866756A (en) * 1986-04-16 1989-09-12 Call It Co. Interactive computerized communications systems with voice input and output
US4922520A (en) * 1986-12-31 1990-05-01 M. A. Kempner, Inc. Automatic telephone polling system
US4897865A (en) * 1988-04-29 1990-01-30 Epic Data, Inc. Telephone data collection device
US5187735A (en) * 1990-05-01 1993-02-16 Tele Guia Talking Yellow Pages, Inc. Integrated voice-mail based voice and information processing system
US5131045A (en) * 1990-05-10 1992-07-14 Roth Richard G Audio-augmented data keying
US5303299A (en) * 1990-05-15 1994-04-12 Vcs Industries, Inc. Method for continuous recognition of alphanumeric strings spoken over a telephone network
US5375164A (en) * 1992-05-26 1994-12-20 At&T Corp. Multiple language capability in an interactive system
US5719920A (en) * 1995-03-31 1998-02-17 The Messenger Group Llc Method and apparatus for processing and downloading sound messages onto a permanent memory of a communication package
US5758323A (en) * 1996-01-09 1998-05-26 U S West Marketing Resources Group, Inc. System and Method for producing voice files for an automated concatenated voice system
US5737487A (en) * 1996-02-13 1998-04-07 Apple Computer, Inc. Speaker adaptation based on lateral tying for large-vocabulary continuous speech recognition

Also Published As

Publication number Publication date
US6101241A (en) 2000-08-08

Similar Documents

Publication Publication Date Title
CA2235361A1 (en) Telephone-based speech recognition for data collection
US5488652A (en) Method and apparatus for training speech recognition algorithms for directory assistance applications
US6570964B1 (en) Technique for recognizing telephone numbers and other spoken information embedded in voice messages stored in a voice messaging system
EP0890249B1 (en) Apparatus and method for reducing speech recognition vocabulary perplexity and dynamically selecting acoustic models
CA2105034C (en) Speaker verification with cohort normalized scoring
JP3479304B2 (en) Voice command control and verification system
US6487530B1 (en) Method for recognizing non-standard and standard speech by speaker independent and speaker dependent word models
US5797124A (en) Voice-controlled voice mail having random-order message retrieval based on played spoken identifier list
Rabiner Applications of voice processing to telecommunications
US7822611B2 (en) Speaker intent analysis system
US6219643B1 (en) Method of analyzing dialogs in a natural language speech recognition system
US5479488A (en) Method and apparatus for automation of directory assistance using speech recognition
US6438520B1 (en) Apparatus, method and system for cross-speaker speech recognition for telecommunication applications
US20010016813A1 (en) 2001-08-23 Distributed recognition system having multiple prompt-specific and response-specific speech recognizers
US20120262296A1 (en) User intent analysis extent of speaker intent analysis system
US7933775B2 (en) Method of and system for providing adaptive respondent training in a speech recognition application based upon the inherent response of the respondent
CA2267954A1 (en) Speaker verification method
JPH08320696A (en) Method for automatic call recognition of arbitrarily spoken word
JP2000194386A (en) Voice recognizing and responsing device
US20040098259A1 (en) 2004-05-20 Method for recognition of verbal utterances by a non-mother-tongue speaker in a speech processing system
US6738457B1 (en) Voice processing system
CA2235376A1 (en) Telephone-based speech recognition for data collection
de Veth et al. Comparison of hidden Markov model techniques for automatic speaker verification in real-world conditions
WO2000059193A1 (en) Auto attendant with library of recognisable names
EP0942575A2 (en) Adaptive telephone answering system

Legal Events

Date Code Title Description
EEER Examination request
FZDE Discontinued