US20030225578A1 - System and method for improving the accuracy of a speech recognition program - Google Patents

Info

Publication number
US20030225578A1
US20030225578A1 (US patent application Ser. No. 10/461,079)
Authority
US
United States
Prior art keywords
written text
speech recognition
recognition program
text
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/461,079
Inventor
Jonathan Kahn
Thomas Flynn
Charles Qin
Nicholas Linden
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/362,255 external-priority patent/US6490558B1/en
Application filed by Individual filed Critical Individual
Priority to US10/461,079 priority Critical patent/US20030225578A1/en
Publication of US20030225578A1 publication Critical patent/US20030225578A1/en
Abandoned legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0631 Creating reference templates; Clustering

Definitions

  • FIG. 5 of the drawings is a plan view of the present system and method showing the highlighting of a segment of text for playback or edit
  • FIG. 6 of the drawings is a plan view of the present system and method showing the highlighting of a segment of text with an error for correction;
  • FIG. 7 of the drawings is a plan view of the present system and method showing the initiation of the automated correction method
  • FIG. 8 of the drawings is a plan view of the present system and method showing the initiation of the automated training method.
  • FIG. 9 of the drawings is a plan view of the present system and method showing the selection of audio files for training for addition to the queue;
  • FIG. 1 of the drawings generally shows one potential embodiment of the present system for quickly improving the accuracy of a speech recognition program.
  • the system must include some means for receiving a pre-recorded audio file.
  • This audio file receiving means can be a digital audio recorder, an analog audio recorder, or standard means for receiving computer files on magnetic media or via a data connection. It is preferably implemented on a general-purpose computer (such as computer 20), although a specialized computer could be developed for this specific purpose.
  • the general-purpose computer should have, among other elements, a microprocessor (such as the Intel Corporation Pentium®, AMD K6® or Motorola 6800® series); volatile and non-volatile memory; one or more mass storage devices (i.e. HDD, floppy drive, and other removable media devices such as a CD-ROM drive, DITTOTM, ZIPTM or JAZTM drive (from Iomega Corporation) and the like); various user input devices, such as a mouse 23 , a keyboard 24 , or a microphone 25 ; and a video display system 26 .
  • the general-purpose computer is controlled by the WindowsTM 9.x operating system.
  • the present system would work equally well using a MacintoshTM computer or even another operating system such as a Windows CETM, UNIX or a JAVA® based operating system, to name a few.
  • the general purpose computer has amongst its programs a speech recognition program, such as Dragon Naturally SpeakingTM, IBM's Via VoiceTM, Lernout & Hauspie's Professional EditionTM or other programs.
  • In an embodiment utilizing an analog audio input (such as via microphone 25), the general-purpose computer must include a sound card (not shown).
  • Conversely, in an embodiment receiving the audio as a pre-recorded digital file, no sound card would be necessary to input the file.
  • a sound card is likely to be necessary for playback such that the human speech trainer can listen to the pre-recorded audio file toward modifying the written text into a verbatim text.
  • The general purpose computer may be loaded and configured to run digital audio recording software (such as the media utility in the Windows™ 9.x operating system, VOICEDOC™ from The Programmers' Consortium, Inc. of Oakton, Va., Cool Edit™ by Syntrillium Corporation of Phoenix, Ariz. or Dragon Naturally Speaking Professional Edition™ by Dragon Systems, Inc.).
  • the speech recognition program may create a digital audio file as a byproduct of the automated transcription process.
  • These various software programs produce a pre-recorded audio file in the form of a “WAV” file.
  • WAV refers to the Waveform Audio file format.
  • Other audio file formats, such as MP3 or DSS, could also be used to format the audio file without departing from the spirit of the present invention. The method of saving such audio files is well known to those of ordinary skill in the art.
  • Another means for receiving a pre-recorded audio file is a dedicated digital recorder 14, such as the Olympus Digital Voice Recorder D-1000 manufactured by the Olympus Corporation.
  • In order to harvest the digital audio file, upon completion of a recording, the dedicated digital recorder would be operably connected to the general-purpose computer toward downloading the digital audio file into that computer. With this approach, for instance, no audio card would be required.
  • Another alternative for receiving the pre-recorded audio file may consist of using one form or another of removable magnetic media containing a pre-recorded audio file. With this alternative an operator would input the removable magnetic media into the general-purpose computer toward uploading the audio file into the system.
  • a DSS file format may have to be changed to a WAV file format, or the sampling rate of a digital audio file may have to be upsampled or downsampled.
  • Software to accomplish such pre-processing is available from a variety of sources including Syntrillium Corporation and Olympus Corporation.
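For WAV input specifically, checking whether resampling is needed is straightforward; the following sketch uses Python's standard wave module purely for illustration (the target rate shown is an assumed example, not a documented requirement, and converting other formats such as DSS would require codecs not shown here):

```python
import wave

def needs_resampling(path, target_rate=11025):
    """Return True if a WAV file's sampling rate differs from the
    rate the speech recognition program expects (target_rate is an
    assumed example, not a documented requirement)."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() != target_rate
```

A pre-processing step would run such a check before handing the file to the recognizer, resampling or converting only when necessary.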
  • an acceptably formatted pre-recorded audio file is provided to at least a first speech recognition program that produces a first written text therefrom.
  • The first speech recognition program may also be selected from various commercially available programs, such as Naturally Speaking™ from Dragon Systems of Newton, Mass., Via Voice™ from IBM Corporation of Armonk, N.Y., or Speech Magic™ from Philips Corporation of Atlanta, Ga. It is preferably implemented on a general-purpose computer, which may be the same general-purpose computer used to implement the pre-recorded audio file receiving means.
  • In Dragon Systems' Naturally Speaking™, for instance, there is built-in functionality that allows speech-to-text conversion of pre-recorded digital audio.
  • the present invention can directly access executable files provided with Dragon Naturally SpeakingTM in order to transcribe the pre-recorded digital audio.
  • In an approach using IBM Via Voice™ (which does not have built-in functionality to allow speech-to-text conversion of pre-recorded audio), a sound card would be configured to "trick" IBM Via Voice™ into thinking that it is receiving audio input from a microphone or line-in when the audio is actually coming from a pre-recorded audio file. Such routing can be achieved, for instance, with a SoundBlaster Live™ sound card from Creative Labs of Milpitas, Calif.
  • the transcription errors in the first written text are located in some manner to facilitate establishment of a verbatim text for use in training the speech recognition program.
  • In one approach, a human transcriptionist establishes a transcribed file, which can be automatically compared with the first written text to create a list of differences between the two texts. This list is used to identify potential errors in the first written text, assisting a human speech trainer in locating and correcting them.
  • Such effort could be assisted by the use of specialized software for isolating or highlighting the errors and synchronizing them with their associated audio.
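The text-comparison step just described can be sketched with a standard sequence-matching routine. This is an illustrative Python sketch, not the patent's actual implementation; the function name and output format are invented for the example:

```python
import difflib

def list_potential_errors(recognized, reference):
    """Compare the speech recognition output word-by-word against a
    reference transcript (e.g., one produced by a human
    transcriptionist) and return the differences, indexed by word
    position in the recognized text."""
    rec_words = recognized.split()
    ref_words = reference.split()
    diffs = []
    matcher = difflib.SequenceMatcher(None, rec_words, ref_words)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only mismatched stretches
            diffs.append({
                "position": i1,                  # word index in recognized text
                "recognized": rec_words[i1:i2],  # what the program produced
                "reference": ref_words[j1:j2],   # what the reference says
            })
    return diffs
```

A human speech trainer would then step through each entry, jumping to the recorded position in the written text and playing the associated audio to decide on the verbatim correction.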
  • the acceptably formatted pre-recorded audio file is also provided to a second speech recognition program that produces a second written text therefrom.
  • the second speech recognition program has at least one “conversion variable” different from the first speech recognition program.
  • conversion variables may include one or more of the following:
  • different speech recognition programs, e.g. Dragon Systems' Naturally Speaking™, IBM's Via Voice™ or Philips Corporation's Speech Magic;
  • It is anticipated that the second speech recognition program will produce a slightly different written text than the first speech recognition program, and that by comparing the two resulting written texts, a list of differences between the two texts will assist a human speech trainer in locating and correcting such potential errors.
  • Such effort could be assisted by the use of specialized software for isolating or highlighting the errors and synchronizing them with their associated audio.
  • FIG. 2 is a flow diagram of this approach using the Dragon software developer's kit (“SDK”).
  • a user selects an audio file (usually “.wav”) for automatic transcription.
  • the selected pre-recorded audio file is sent to the TranscribeFile module of Dictation Edit Control of the Dragon SDK.
  • The location of each segment of text is determined automatically by the speech recognition program. For instance, in Dragon, an utterance is defined by a pause in the speech. As a result of Dragon completing the transcription, the present invention internally "breaks up" the text into segments according to the location of the utterances.
  • the location of the segments is determined by the Dragon SDK UtteranceBegin and UtteranceEnd modules which report the location of the beginning of an utterance and the location of the end of an utterance. For example, if the number of characters to the beginning of the utterance is 100, and to the end of the utterance is 115, then the utterance begins at 100 and has 15 characters. This enables the present system to find the text for audio playback and automated correction. The location of utterances is stored in a listbox for reference.
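The character-count bookkeeping in the example above can be sketched as follows. This is a hypothetical Python illustration; the real system queries Dragon's UtteranceBegin and UtteranceEnd through the SDK, which this sketch merely imitates with a precomputed boundary list:

```python
def extract_utterances(text, boundaries):
    """Slice a transcribed text into utterance segments given a list
    of (begin, end) character counts, as reported by functions
    analogous to the Dragon SDK's UtteranceBegin and UtteranceEnd."""
    segments = []
    for begin, end in boundaries:
        segments.append({
            "begin": begin,           # character count to start of utterance
            "length": end - begin,    # e.g., 115 - 100 = 15 characters
            "text": text[begin:end],  # the segment for playback/correction
        })
    return segments
```

Storing these records sequentially corresponds to the listbox of utterance locations described above.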
  • Each utterance is listed sequentially in a correction window (see FIG. 5).
  • the display may also contain a window that allows the user to view the original transcribed text.
  • the user then manually examines each utterance to determine if correction is necessary.
  • the present program can play the audio associated with the currently selected speech segment using a “playback” button in the correction window toward comparing the audible text with the selected speech segment in the correction window. If correction is necessary, then that correction is manually input with standard computer techniques (using the keyboard, mouse and/or speech recognition software and, potentially, lists of potential replacement words) (see FIG. 6).
  • In some cases the audio is unintelligible or unusable (e.g., the dictator sneezes and the speech recognition software types out a word, like "cyst" (an actual example)).
  • Sometimes the speech recognition program inserts word(s) when there is no detectable audio. Or the dictator says a command like "New Paragraph" and, rather than executing the command, the speech recognition software types in the words "new" and "paragraph".
  • One approach, where there is noise or no sound, is to type in some nonsense word like "xxxxx" for the utterance file so that audio-text alignment is not lost.
  • Similarly, the words "new" and "paragraph" may be treated as text (and not as a command).
  • correction techniques may be modified to take into account the limitations and errors of the underlying speech recognition software to promote improved automated training of speech files.
  • unintelligible or unusable portions of the pre-recorded audio file may be removed using an audio file editor, so that only the usable audio would be used for training the speech recognition program.
  • the segment in the correction window is manually accepted and the next segment automatically displayed in the correction window.
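The accept-or-correct cycle described above can be modeled as a loop over utterances with a decision callback standing in for the human speech trainer. All names here are invented for illustration; the actual system drives a correction window, not a callback:

```python
def review_utterances(utterances, decide):
    """Step through each utterance seriatim, as in the correction
    window. `decide` stands in for the human speech trainer: it
    returns None to accept the displayed segment as-is, or the
    replacement verbatim text. Every utterance, corrected or not,
    is included in the returned verbatim set."""
    corrected = []
    for utterance in utterances:
        replacement = decide(utterance)
        corrected.append(utterance if replacement is None else replacement)
    return corrected
```

The returned list corresponds to the single file of corrected utterances that the system saves, which is why accepted segments are kept alongside corrected ones.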
  • the user may then have the option to calculate the accuracy of the transcription performed by Dragon.
  • This process compares the corrected set of utterances with the original transcribed file. The percentage of correct words can be displayed, and the location of the differences is recorded by noting every utterance that contained an error.
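That accuracy computation might look like the following simplified sketch. Note it compares whole utterances, a coarser stand-in for the word-level percentage the text describes; the function name and return shape are invented:

```python
def transcription_accuracy(transcribed, corrected):
    """Compare the original transcribed utterances against the
    corrected (verbatim) set. Returns the percentage that matched
    and the indices of every utterance containing an error."""
    error_locations = [
        i for i, (t, c) in enumerate(zip(transcribed, corrected)) if t != c
    ]
    total = len(corrected)
    percent = 100.0 * (total - len(error_locations)) / total if total else 0.0
    return percent, error_locations
```

The recorded error locations are what make the later selective correction pass possible.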
  • the corrected set of utterances is saved to a single file. In a preferred embodiment, all the utterances are saved to this file, not just corrected ones. Thus, this file will contain a corrected verbatim text version of the pre-recorded audio.
  • the user may then choose to do an automated correction of the transcribed text (see FIG. 7).
  • This process inserts the corrected utterances into the original transcription file via Dragon's correction dialog.
  • this correction uses the locations of the differences between the corrected utterances and the transcribed text to only correct the erroneous utterances.
  • the user is prompted to Save the Speech file.
  • Another novel aspect of this invention is the ability to make changes in the transcribed file for the purposes of a written report versus for the verbatim files (necessary for training the speech conversion program).
  • the general purpose of the present invention is to allow for automated training of a voice recognition system. However, it may also happen that the initial recording contains wrong information or the wrong word was actually said during recording (e.g. the user said ‘right’ during the initial recording when the user meant to say ‘left’). In this case, the correction of the text cannot normally be made to a word that was not actually said in the recording as this would hinder the training of the voice recognition system.
  • the present invention may allow the user to make changes to the text and save this text solely for printing or reporting, while maintaining the separate verbatim file to train the voice recognition system.
  • One potential user interface for implementing the segmentation/correction scheme is shown in FIG. 5.
  • the program has selected “a range of dictation and transcription solutions” as the current speech segment.
  • the human speech trainer listening to the portion of pre-recorded audio file associated with the currently displayed speech segment, looking at the correction window and perhaps the speech segment in context within the transcribed text determines whether or not correction is necessary. By clicking on the “Play Selected” button the audio synchronized to the particular speech segment is automatically played back. Once the human speech trainer knows the actually dictated language for that speech segment, they either indicate that the present text is correct or manually replace any incorrect text with verbatim text. In a preferred approach, in either event, the corrected/verbatim text from the correction window is saved into a single file containing all the corrected utterances.
  • In FIG. 4, the Dragon Naturally Speaking™ program has selected "seeds for cookie" as the current speech segment (or utterance, in Dragon parlance).
  • the human speech trainer listening to the portion of pre-recorded audio file associated with the currently displayed speech segment, looking at the correction window and perhaps the speech segment in context within the transcribed text determines whether or not correction is necessary. By clicking on the “Play Back” button the audio synchronized to the particular speech segment is automatically played back.
  • the human speech trainer knows the actually dictated language for that speech segment, they either indicate that the present text is correct (by merely pressing an “OK” button) or manually replace any incorrect text with verbatim text.
  • the corrected/verbatim text from the correction window is preferably saved into a single file containing all the corrected utterances.
  • FIG. 3 is a flow diagram describing the training process.
  • the user has the option of running the training sequence a selected number of times to increase the effectiveness of the training.
  • the user chooses the file on which to perform the training.
  • the chosen files are then transferred to the queue for processing (FIG. 9).
  • the file containing the corrected set of utterances is read.
  • The corrected utterances file is opened and read into a listbox. This is not a function of the Dragon SDK, but is instead basic file I/O.
  • Next, the audio file is sent to the TranscribeFile method of DictationEditControl from the Dragon SDK by running the command "FrmControls.DeTop2.TranscribeFile filename". FrmControls is the form where the Dragon SDK ActiveX controls are located; DeTop2 is the name of the controls.
  • TranscribeFile is the function of these controls for transcribing wave files.
  • the UtteranceBegin and UtteranceEnd methods of DragonEngineControl report the location of utterances in the same manner as previously described. Once transcription ends, the location of the utterances that were determined are used to break apart the text.
  • This set of utterances is compared to the list of corrected utterances to find any differences.
  • One program that may be used to compare the differences is File Compare. The locations of the differences are then stored in a listbox, and those locations are used to correct only the utterances that had differences. Upon completion of correction, speech files are automatically saved. This cycle can then be repeated the predetermined number of times.
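The selective correction pass reduces to replacing only the utterances whose indices were stored as differences. A minimal sketch, with plain lists standing in for Dragon's correction dialog:

```python
def correct_erroneous_utterances(transcribed, corrected, error_locations):
    """Replace only the utterances whose index appears in the stored
    list of difference locations, leaving matching utterances
    untouched (they need no correction)."""
    result = list(transcribed)  # copy so the original transcription is preserved
    for i in error_locations:
        result[i] = corrected[i]
    return result
```

Correcting only the differing utterances, rather than every utterance, is what gives the claimed speed-up in training.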
  • TranscribeFile can be initiated one last time to transcribe the pre-recorded audio. The locations of the utterances are not calculated again in this step. This transcribed file is compared one more time to the corrected utterances to determine the accuracy of the voice recognition program after training.
  • the present system can produce a significant improvement in the accuracy of the speech recognition program.
  • the training process can be automated by using an executable file simultaneously operating with the speech recognition means that feeds phantom keystrokes and mousing operations through the WIN32API, such that the first speech recognition program believes that it is interacting with a human being, when in fact it is being controlled by the microprocessor.
  • the video and storage buffer of the speech recognition program are first cleared.
  • the pre-recorded audio file is loaded into the first speech recognition program, in the same manner disclosed above.
  • a new written text is established by the first speech recognition program.
  • the segmentation/correction program utilizes the speech recognition program's parsing system to sequentially identify speech segments and places each and every one of those speech segments into a correction window—whether correction is required on any portion of those segments or not—seriatim.
  • the system automatically replaces the next segment of erroneous text in the correction window using the saved corrected segments file. That text is then pasted into the underlying Dragon Naturally SpeakingTM buffer. The fourth and fifth steps are repeated until all of the erroneous segments have been replaced.

Abstract

A system and method for improving the accuracy of a speech recognition program. The system is based on a speech recognition program that automatically converts a pre-recorded audio file into a written text. The system parses the written text into segments, each of which can be corrected by the system and saved in a retrievable manner in association with the computer. The standard speech files are saved towards improving accuracy in speech-to-text conversion by the speech recognition program. The system further includes facilities to repetitively establish an independent instance of the written text from the pre-recorded audio file using the speech recognition program. This independent instance can then be broken into segments and each erroneous segment in said independent instance replaced with the corrected segment associated with that segment. In this manner, repetitive instruction of a speech recognition program can be facilitated.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of non-provisional patent application corresponding to provisional patent application Serial No. 60/208,878 filed on Jun. 1, 2000 entitled “System and Method for Improving the Accuracy of a Speech Recognition Program” and a continuation-in-part of co-pending patent application U.S. application Ser. No. 09/362,255 filed on Jul. 28, 1999 entitled “System and Method for Improving the Accuracy of a Speech Recognition Program.” [0001]
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention [0002]
  • The present invention relates in general to computer speech recognition systems and, in particular, to a system and method for expediting the aural training of an automated speech recognition program. [0003]
  • 2. Background Art [0004]
  • Speech recognition programs are well known in the art. While these programs are ultimately useful in automatically converting speech into text, many users are dissuaded from using them because they require each user to spend a significant amount of time training the system. Usually this training begins by having each user read a series of pre-selected materials for several minutes. Then, as the user continues to use the program, the user is expected to stop and train the program whenever words are improperly transcribed, thus advancing the ultimate accuracy of the speech files. Unfortunately, most professionals (doctors, dentists, veterinarians, lawyers) and business executives are unwilling to spend the time developing the necessary speech files to truly benefit from the automated transcription. [0005]
  • Accordingly, it is an object of the present invention to provide a system that offers expedited training of speech recognition programs. It is an associated object to provide a simplified means for providing verbatim text files for training the aural parameters (i.e. speech files, acoustic model and/or language model) of a speech recognition portion of the system. [0006]
  • Another object of the present invention is to provide a system that can increase the speed of the speech recognition training by training the speech recognition software with only the segments of transcribed speech that are determined to be erroneous. [0007]
  • It is an associated object of the present invention to provide a system that can recognize segments of text that require correction without the need to run speech recognition software in the background. [0008]
  • These and other objects will be apparent to those of ordinary skill in the art having the present drawings, specification and claims before them. [0009]
  • SUMMARY OF THE INVENTION
  • The present invention relates to a system for improving the accuracy of a speech recognition program. The system includes means for automatically converting a pre-recorded audio file into a written text. The system also includes means for parsing the written text into segments and for correcting each and every segment of the written text. In a preferred embodiment, a human speech trainer is presented with the text and associated audio for each and every segment. The segments that are ultimately modified by the human speech trainer are stored in a retrievable manner in association with the computer. The system further includes means for saving speech files associated with a substantially corrected written text and used by the speech recognition program towards improving accuracy in speech-to-text conversion. The system finally includes means for repetitively establishing an independent instance of the written text from the pre-recorded audio file using the speech recognition program and for replacing those segments that required correction in the independent instance of the written text with the corrected segments associated therewith. [0010]
  • In the preferred embodiment of the invention the means for parsing the written text into segments includes means for directly accessing the functions of the speech recognition program. The parsing means may include means to determine the character count to the beginning of the segment and means for determining the character count to the end of the segment. Such parsing means may further include the UtteranceBegin function of Dragon Naturally Speaking™ to determine the character count to the beginning of the segment and the UtteranceEnd function of Dragon Naturally Speaking™ to determine the character count to the end of the segment. [0011]
  • The means for automatically converting a pre-recorded audio file into a written text may further be accomplished by executing functions of Dragon Naturally Speaking™. The means for automatically converting may include the TranscribeFile function of Dragon Naturally Speaking™. [0012]
  • In one embodiment, the correcting means further includes means for highlighting likely errors in the written text. In such an embodiment, where the written text is at least temporarily synchronized to said pre-recorded audio file, the highlighting means further includes means for sequentially comparing a copy of the written text with a second written text resulting in a sequential list of unmatched words culled from the written text and means for incrementally searching for the current unmatched word contemporaneously within a first buffer associated with the speech recognition program containing the written text and a second buffer associated with a sequential list of possible errors. Such element further includes means for correcting the current unmatched word in the second buffer. [0013]
  • In one embodiment, the correcting means includes means for displaying the current unmatched word in a manner substantially visually isolated from other text in the written text and means for playing a portion of said synchronized voice dictation recording from said first buffer associated with said current unmatched word. The correcting means may further include means for alternatively viewing the current unmatched word in context within the copy of the written text. [0014]
  • The second written text may be established by a second speech recognition program having at least one conversion variable different from said speech recognition program. Alternatively, the second written text may be established by one or more human beings. [0015]
  • The invention further involves a method for improving the accuracy of a speech recognition program operating on a computer comprising: (a) automatically converting a pre-recorded audio file into a written text; (b) parsing the written text into segments; (c) correcting each and every segment of the written text; (d) saving the corrected segments in a retrievable manner; (e) saving speech files associated with a substantially corrected written text and used by the speech recognition program towards improving accuracy in speech-to-text conversion by the speech recognition program; (f) establishing an independent instance of the written text from the pre-recorded audio file using the speech recognition program; (g) replacing erroneous segments in the independent instance of the written text with the individually retrievable saved corrected segment associated therewith; (h) saving speech files associated with the independent instance of the written text used by the speech recognition program towards improving accuracy in speech-to-text conversion by the speech recognition program; and (i) repeating steps (f) through (i) a predetermined number of times.[0016]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 of the drawings is a block diagram of the system for quickly improving the accuracy of a speech recognition program; [0017]
  • FIG. 2 of the drawings is a flow diagram of the method for quickly improving the accuracy of the Dragon Naturally Speaking™ software; [0018]
  • FIG. 3 of the drawings is a flow diagram of the method for automatically training the Dragon Naturally Speaking™ software; [0019]
  • FIG. 4 of the drawings is a plan view of the present system and method in operation in conjunction with Dragon Naturally Speaking™ software; [0020]
  • FIG. 5 of the drawings is a plan view of the present system and method showing the highlighting of a segment of text for playback or edit; [0021]
  • FIG. 6 of the drawings is a plan view of the present system and method showing the highlighting of a segment of text with an error for correction; [0022]
  • FIG. 7 of the drawings is a plan view of the present system and method showing the initiation of the automated correction method; [0023]
  • FIG. 8 of the drawings is a plan view of the present system and method showing the initiation of the automated training method; and [0024]
  • FIG. 9 of the drawings is a plan view of the present system and method showing the selection of audio files for training for addition to the queue. [0025]
  • BEST MODES OF PRACTICING THE INVENTION
  • While the present invention may be embodied in many different forms, there is shown in the drawings and discussed herein one specific embodiment with the understanding that the present disclosure is to be considered only as an exemplification of the principles of the invention and is not intended to limit the invention to the embodiment illustrated. [0026]
  • FIG. 1 of the drawings generally shows one potential embodiment of the present system for quickly improving the accuracy of a speech recognition program. The system must include some means for receiving a pre-recorded audio file. This audio file receiving means can be a digital audio recorder, an analog audio recorder, or standard means for receiving computer files on magnetic media or via a data connection. The receiving means is preferably implemented on a general-purpose computer (such as computer 20), although a specialized computer could be developed for this specific purpose. [0027]
  • The general-purpose computer should have, among other elements, a microprocessor (such as the Intel Corporation Pentium®, AMD K6® or Motorola 6800® series); volatile and non-volatile memory; one or more mass storage devices (i.e. HDD, floppy drive, and other removable media devices such as a CD-ROM drive, DITTO™, ZIP™ or JAZ™ drive (from Iomega Corporation) and the like); various user input devices, such as a mouse 23, a keyboard 24, or a microphone 25; and a video display system 26. In one embodiment, the general-purpose computer is controlled by the Windows™ 9.x operating system. It is contemplated, however, that the present system would work equally well using a Macintosh™ computer or even another operating system such as Windows CE™, UNIX or a JAVA® based operating system, to name a few. In any embodiment, the general-purpose computer has amongst its programs a speech recognition program, such as Dragon Naturally Speaking™, IBM's Via Voice™, Lernout & Hauspie's Professional Edition™ or other programs. [0028]
  • Regardless of the particular computer platform used, in an embodiment utilizing an analog audio input (such as via microphone 25) the general-purpose computer must include a sound card (not shown). Of course, in an embodiment with a digital input no sound card would be necessary to input the file. However, a sound card is likely to be necessary for playback such that the human speech trainer can listen to the pre-recorded audio file toward modifying the written text into a verbatim text. [0029]
  • In one embodiment, the general-purpose computer may be loaded and configured to run digital audio recording software (such as the media utility in the Windows™ 9.x operating system, VOICEDOC™ from The Programmers' Consortium, Inc. of Oakton, Va., Cool Edit™ by Syntrillium Corporation of Phoenix, Ariz. or Dragon Naturally Speaking Professional Edition™ by Dragon Systems, Inc.). In another embodiment, the speech recognition program may create a digital audio file as a byproduct of the automated transcription process. These various software programs produce a pre-recorded audio file in the form of a “WAV” file. However, as would be known to those skilled in the art, other audio file formats, such as MP3 or DSS, could also be used to format the audio file, without departing from the spirit of the present invention. The method of saving such audio files is well known to those of ordinary skill in the art. [0030]
  • Another means for receiving a pre-recorded audio file is a dedicated digital recorder 14, such as the Olympus Digital Voice Recorder D-1000 manufactured by the Olympus Corporation. Thus, if a user is more comfortable with a more conventional type of dictation device, that user can use a dedicated digital recorder in combination with this system. In order to harvest the digital audio file, upon completion of a recording, the dedicated digital recorder would be operably connected toward downloading the digital audio file into that general-purpose computer. With this approach, for instance, no audio card would be required. [0031]
  • Another alternative for receiving the pre-recorded audio file may consist of using one form or another of removable magnetic media containing a pre-recorded audio file. With this alternative an operator would input the removable magnetic media into the general-purpose computer toward uploading the audio file into the system. [0032]
  • In some cases it may be necessary to pre-process the audio files to make them acceptable for processing by the speech recognition software. For instance, a DSS file format may have to be changed to a WAV file format, or the sampling rate of a digital audio file may have to be upsampled or downsampled. Software to accomplish such pre-processing is available from a variety of sources including Syntrillium Corporation and Olympus Corporation. [0033]
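  • As a simple illustration of the sampling-rate check discussed above, Python's standard wave module can report whether a WAV file matches the rate a recognition engine expects. This is only a sketch: the 11,025 Hz target rate is an assumption for illustration, not a rate mandated by any of the programs named in this disclosure, and format conversion itself (e.g. DSS to WAV) would still rely on third-party tools such as those mentioned above.

```python
import wave

def needs_resampling(path, target_rate=11025):
    """Return True when the WAV file's sampling rate differs from the
    rate the speech recognition engine expects (target is illustrative)."""
    with wave.open(path, "rb") as w:
        return w.getframerate() != target_rate
```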
  • In some manner, an acceptably formatted pre-recorded audio file is provided to at least a first speech recognition program that produces a first written text therefrom. The first speech recognition program may also be selected from various commercially available programs, such as Naturally Speaking™ from Dragon Systems of Newton, Massachusetts, Via Voice™ from IBM Corporation of Armonk, N.Y., or Speech Magic from Philips Corporation of Atlanta, Ga., and is preferably implemented on a general-purpose computer, which may be the same general-purpose computer used to implement the pre-recorded audio file receiving means. In Dragon Systems' Naturally Speaking™, for instance, there is built-in functionality that allows speech-to-text conversion of pre-recorded digital audio. In one preferred approach, the present invention can directly access executable files provided with Dragon Naturally Speaking™ in order to transcribe the pre-recorded digital audio. [0034]
  • In an alternative approach, Dragon Systems' Naturally Speaking™ is used by running an executable simultaneously with Naturally Speaking™ that feeds phantom keystrokes and mousing operations through the WIN32API, such that Naturally Speaking™ believes that it is interacting with a human being, when in fact it is being controlled by the microprocessor. Such techniques are well known in the computer software testing art and, thus, will not be discussed in detail. It should suffice to say that by watching the application flow of any speech recognition program, an executable to mimic the interactive manual steps can be created. [0035]
  • In an approach using IBM Via Voice™, which does not have built-in functionality to allow speech-to-text conversion of pre-recorded audio, a sound card would be configured to “trick” IBM Via Voice™ into thinking that it is receiving audio input from a microphone or in-line when the audio is actually coming from a pre-recorded audio file. Such routing can be achieved, for instance, with a SoundBlaster Live™ sound card from Creative Labs of Milpitas, Calif. [0036]
  • In a preferred embodiment, the transcription errors in the first written text are located in some manner to facilitate establishment of a verbatim text for use in training the speech recognition program. In one approach, a human transcriptionist establishes a transcribed file, which can be automatically compared with the first written text creating a list of differences between the two texts, which is used to identify potential errors in the first written text to assist a human speech trainer in locating such potential errors to correct same. Such effort could be assisted by the use of specialized software for isolating or highlighting the errors and synchronizing them with their associated audio. [0037]
  • In another approach for establishing a verbatim text, the acceptably formatted pre-recorded audio file is also provided to a second speech recognition program that produces a second written text therefrom. The second speech recognition program has at least one “conversion variable” different from the first speech recognition program. Such “conversion variables” may include one or more of the following: [0038]
  • (1) speech recognition programs (e.g. Dragon Systems' Naturally Speaking™, IBM's Via Voice™ or Philips Corporation's Speech Magic); [0039]
  • (2) language models within a particular speech recognition program (e.g. general English versus a specialized vocabulary (e.g. medical, legal); [0040]
  • (3) settings within a particular speech recognition program (e.g. “most accurate” versus “speed”); and/or [0041]
  • (4) the pre-recorded audio file by pre-processing same with a digital signal processor (such as Cool Edit™ by Syntrillium Corporation of Phoenix, Ariz. or a programmed DSP56000 IC from Motorola, Inc.) by changing the digital word size, sampling rate, removing particular harmonic ranges and other potential modifications. [0042]
  • By changing one or more of the foregoing “conversion variables” it is believed that the second speech recognition program will produce a slightly different written text than the first speech recognition program and that by comparing the two resulting written texts a list of differences between the two texts will assist a human speech trainer in locating such potential errors to correct same. Such effort could be assisted by the use of specialized software for isolating or highlighting the errors and synchronizing them with their associated audio. [0043]
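  • The sequential comparison of the two written texts described above can be sketched with a standard sequence-difference algorithm. The function below is illustrative only: it uses Python's difflib rather than any comparison routine named in this disclosure, and the words it returns correspond to the list of potential errors presented to the human speech trainer.

```python
import difflib

def unmatched_words(first_text, second_text):
    """Compare two transcriptions word-by-word and return the words in the
    first text that the second text does not confirm (candidate errors)."""
    a, b = first_text.split(), second_text.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    suspects = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":             # 'replace', 'delete' or 'insert' regions
            suspects.extend(a[i1:i2])  # keep the words unique to the first text
    return suspects

errors = unmatched_words("the patient has a cyst on the left arm",
                         "the patient has assist on the left arm")
# the two engines disagree on "a cyst" versus "assist"
```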
  • In a preferred approach, the present invention can directly access various executable files associated with Dragon Systems' Naturally Speaking™. This allows the present invention to use the built in functionality of Naturally Speaking™ to transcribe pre-recorded audio files. FIG. 2 is a flow diagram of this approach using the Dragon software developer's kit (“SDK”). A user selects an audio file (usually “.wav”) for automatic transcription. The selected pre-recorded audio file is sent to the TranscribeFile module of Dictation Edit Control of the Dragon SDK. As the audio is being transcribed, the location of each segment of text is determined automatically by the speech recognition program. For instance, in Dragon, an utterance is defined by a pause in the speech. As a result of Dragon completing the transcription, the text is internally “broken up” into segments according to the location of the utterances by the present invention. [0044]
  • In this approach, the location of the segments is determined by the Dragon SDK UtteranceBegin and UtteranceEnd modules which report the location of the beginning of an utterance and the location of the end of an utterance. For example, if the number of characters to the beginning of the utterance is 100, and to the end of the utterance is 115, then the utterance begins at 100 and has 15 characters. This enables the present system to find the text for audio playback and automated correction. The location of utterances is stored in a listbox for reference. [0045]
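  • The character-count bookkeeping just described can be mimicked in a few lines. This is a sketch of the arithmetic only, not the Dragon SDK itself: given the begin and end character counts reported for each utterance, the text of each segment is recovered by slicing.

```python
def slice_utterances(text, boundaries):
    """Given (begin, end) character counts for each utterance, as reported by
    callbacks such as UtteranceBegin/UtteranceEnd, return the text segments."""
    # e.g. an utterance beginning at character 100 and ending at character 115
    # starts at offset 100 and is 15 characters long
    return [text[begin:end] for begin, end in boundaries]

text = "the quick brown fox jumps over the lazy dog"
segments = slice_utterances(text, [(0, 19), (20, 43)])
# segments == ["the quick brown fox", "jumps over the lazy dog"]
```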
  • In Dragon's Naturally Speaking™ these speech segments vary from 1 to, say, 20 words depending upon the length of the pause setting in the Miscellaneous Tools section of Dragon Naturally Speaking™. If the end user makes the pause setting long, more words will be part of an utterance because a long pause is required before Naturally Speaking™ establishes a different utterance. If the pause setting is made short, there will be more utterances with fewer words each. Once transcription ends (using the TranscribeFile module), the text is captured. The location of the utterances (using the UtteranceBegin and UtteranceEnd modules) is then used to break apart the text to create a list of utterances. [0046]
  • Each utterance is listed sequentially in a correction window (see FIG. 5). The display may also contain a window that allows the user to view the original transcribed text. The user then manually examines each utterance to determine if correction is necessary. Using the utterance locations, the present program can play the audio associated with the currently selected speech segment using a “playback” button in the correction window toward comparing the audible text with the selected speech segment in the correction window. If correction is necessary, then that correction is manually input with standard computer techniques (using the keyboard, mouse and/or speech recognition software and, potentially, lists of potential replacement words) (see FIG. 6). [0047]
  • Sometimes the audio is unintelligible or unusable (e.g., the dictator sneezes and the speech recognition software types out a word, like “cyst”, an actual example). Sometimes the speech recognition program inserts word(s) when there is no detectable audio. Sometimes, when the dictator says a command like “New Paragraph”, rather than executing the command, the speech recognition software types in the words “new” and “paragraph”. One approach, where there is noise or no sound, is to type in some nonsense word like “xxxxx” for the utterance file so that audio-text alignment is not lost. In cases where the speaker pauses and the system types out “new” and “paragraph”, the words “new” and “paragraph” may be treated as text (and not as a command), although it is also possible to train commands to some extent by replacing such an error with the voice macro command (e.g. “\New-Paragraph”). Thus, it is contemplated that correction techniques may be modified to take into account the limitations and errors of the underlying speech recognition software to promote improved automated training of speech files. [0048]
  • In another potential embodiment, unintelligible or unusable portions of the pre-recorded audio file may be removed using an audio file editor, so that only the usable audio would be used for training the speech recognition program. [0049]
  • Once the speech trainer believes the segment is a verbatim representation of the synchronized audio, the segment in the correction window is manually accepted and the next segment is automatically displayed in the correction window. Once the erroneous utterances are corrected, the user may then have the option to calculate the accuracy of the transcription performed by Dragon. This process compares the corrected set of utterances with the original transcribed file. The percentage of correct words can be displayed, and the location of the differences is recorded by noting every utterance that contained an error. The corrected set of utterances is saved to a single file. In a preferred embodiment, all the utterances are saved to this file, not just the corrected ones. Thus, this file will contain a corrected verbatim text version of the pre-recorded audio. [0050]
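  • The accuracy figure described above, a percentage of correct words together with the locations of the differing utterances, might be computed along the following lines. This is a sketch under stated assumptions; the disclosure does not specify the exact formula, and a word-by-word zip comparison is only one reasonable reading of it.

```python
def word_accuracy(corrected_utterances, transcribed_utterances):
    """Compare corrected (verbatim) utterances against the engine's output.
    Returns (percent of correct words, indices of utterances with an error)."""
    total = correct = 0
    error_locations = []
    for i, (good, raw) in enumerate(zip(corrected_utterances,
                                        transcribed_utterances)):
        good_words, raw_words = good.split(), raw.split()
        total += len(good_words)
        # count positions where the engine's word matches the verbatim word
        correct += sum(g == r for g, r in zip(good_words, raw_words))
        if good != raw:
            error_locations.append(i)  # note every utterance containing an error
    return 100.0 * correct / max(total, 1), error_locations

pct, errs = word_accuracy(["hello world", "new paragraph"],
                          ["hello word", "new paragraph"])
# pct == 75.0 (3 of 4 words correct), errs == [0]
```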
  • The user may then choose to do an automated correction of the transcribed text (see FIG. 7). This process inserts the corrected utterances into the original transcription file via Dragon's correction dialog. In a preferred approach, this correction uses the locations of the differences between the corrected utterances and the transcribed text to only correct the erroneous utterances. After corrections are complete, the user is prompted to Save the Speech file. [0051]
  • Another novel aspect of this invention is the ability to make changes in the transcribed file for the purposes of a written report versus for the verbatim files (necessary for training the speech conversion program). The general purpose of the present invention is to allow for automated training of a voice recognition system. However, it may also happen that the initial recording contains wrong information or the wrong word was actually said during recording (e.g. the user said ‘right’ during the initial recording when the user meant to say ‘left’). In this case, the correction of the text cannot normally be made to a word that was not actually said in the recording as this would hinder the training of the voice recognition system. Thus, in one embodiment the present invention may allow the user to make changes to the text and save this text solely for printing or reporting, while maintaining the separate verbatim file to train the voice recognition system. [0052]
  • One potential user interface for implementing the segmentation/correction scheme is shown in FIG. 5. In FIG. 5, the program has selected “a range of dictation and transcription solutions” as the current speech segment. The human speech trainer listening to the portion of pre-recorded audio file associated with the currently displayed speech segment, looking at the correction window and perhaps the speech segment in context within the transcribed text determines whether or not correction is necessary. By clicking on the “Play Selected” button the audio synchronized to the particular speech segment is automatically played back. Once the human speech trainer knows the actually dictated language for that speech segment, they either indicate that the present text is correct or manually replace any incorrect text with verbatim text. In a preferred approach, in either event, the corrected/verbatim text from the correction window is saved into a single file containing all the corrected utterances. [0053]
  • Alternatively, another approach to correcting the written text may use Dragon Naturally Speaking™'s user interface by using executables simultaneously operating with the speech recognition means that feeds phantom keystrokes and mousing operations through the WIN32API (See FIG. 4). In FIG. 4, the Dragon Naturally Speaking™ program has selected “seeds for cookie” as the current speech segment (or utterance in Dragon parlance). As in the other approach, the human speech trainer listening to the portion of pre-recorded audio file associated with the currently displayed speech segment, looking at the correction window and perhaps the speech segment in context within the transcribed text determines whether or not correction is necessary. By clicking on the “Play Back” button the audio synchronized to the particular speech segment is automatically played back. As in the other approach, once the human speech trainer knows the actually dictated language for that speech segment, they either indicate that the present text is correct (by merely pressing an “OK” button) or manually replace any incorrect text with verbatim text. As in the other approach, in either event, the corrected/verbatim text from the correction window is preferably saved into a single file containing all the corrected utterances. [0054]
  • Once the verbatim text is completed (and preferably verified for accuracy), the file containing the corrected utterances can be used to train the speech recognition program (see FIG. 8). FIG. 3 is a flow diagram describing the training process. The user has the option of running the training sequence a selected number of times to increase the effectiveness of the training. The user chooses the file on which to perform the training. The chosen files are then transferred to the queue for processing (FIG. 9). Once training is initiated, the file containing the corrected set of utterances is read. The corrected utterances file is opened and read into a listbox. This is not a function of the Dragon SDK, but is instead a basic file I/O operation. The associated pre-recorded audio file is sent to the TranscribeFile method of DictationEditControl from the Dragon SDK. (In particular, the audio file is sent by running the command “FrmControls.DeTop2.TranscribeFile filename;” FrmControls is the form where the Dragon SDK ActiveX Controls are located; DeTop2 is the name of the controls.) TranscribeFile is the function of the controls for transcribing wave files. In conjunction with this transcribing, the UtteranceBegin and UtteranceEnd methods of DragonEngineControl report the location of utterances in the same manner as previously described. Once transcription ends, the locations of the utterances that were determined are used to break apart the text. This set of utterances is compared to the list of corrected utterances to find any differences. One program that may be used to compare the differences (native to Windows 9.x) is File Compare. The locations of the differences are then stored in a listbox. The locations of differences in the listbox are then used to correct only the utterances that had differences. Upon completion of correction, speech files are automatically saved. This cycle can then be repeated the predetermined number of times. [0055]
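  • The repetitive training cycle above can be sketched as follows. The engine object and its methods (transcribe, correct_utterance, save_speech_files) are hypothetical stand-ins for the speech recognition program and are not actual Dragon SDK calls; the sketch only shows the control flow of steps (f) through (h) repeated a predetermined number of times.

```python
def training_cycle(audio_file, corrected_utterances, engine, repetitions):
    """Sketch of the repetitive training loop, with `engine` as a
    hypothetical stand-in for the speech recognition program."""
    for _ in range(repetitions):
        # (f) establish an independent instance of the written text
        utterances = engine.transcribe(audio_file)
        # compare against the corrected utterances to locate differences
        differing = [i for i, (raw, good)
                     in enumerate(zip(utterances, corrected_utterances))
                     if raw != good]
        # (g) correct only the utterances that had differences
        for i in differing:
            engine.correct_utterance(i, corrected_utterances[i])
        # (h) save the speech files so the engine learns from the corrections
        engine.save_speech_files()
```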
  • Once training is complete, TranscribeFile can be initiated one last time to transcribe the pre-recorded audio. The locations of the utterances are not calculated again in this step. This transcribed file is compared one more time to the corrected utterances to determine the accuracy of the voice recognition program after training. [0056]
  • By automating this process, the present system can produce a significant improvement in the accuracy of the speech recognition program. [0057]
  • Alternatively, the training process can be automated by using an executable file simultaneously operating with the speech recognition means that feeds phantom keystrokes and mousing operations through the WIN32API, such that the first speech recognition program believes that it is interacting with a human being, when in fact it is being controlled by the microprocessor. In this approach, the video and storage buffer of the speech recognition program are first cleared. Next, the pre-recorded audio file is loaded into the first speech recognition program, in the same manner disclosed above. Third, a new written text is established by the first speech recognition program. Fourth, the segmentation/correction program utilizes the speech recognition program's parsing system to sequentially identify speech segments and places each and every one of those speech segments into a correction window—whether correction is required on any portion of those segments or not—seriatim. Fifth, the system automatically replaces the next segment of erroneous text in the correction window using the saved corrected segments file. That text is then pasted into the underlying Dragon Naturally Speaking™ buffer. The fourth and fifth steps are repeated until all of the erroneous segments have been replaced. [0058]
  • This selection and replacement of erroneous text segments within the buffer leads to an improvement in the aural parameters of the speech recognition program for the particular speech user that recorded the pre-recorded audio file. In this manner, the accuracy of the first speech recognition program's speech-to-text conversion can be markedly, yet quickly, improved. [0059]
  • The foregoing description and drawings merely explain and illustrate the invention, and the invention is not limited thereto. Those of skill in the art who have the disclosure before them will be able to make modifications and variations therein without departing from the scope of the present invention. [0060]

Claims (14)

What is claimed is:
1. A system for improving the accuracy of a speech recognition program operating on a computer, said system comprising:
means for automatically converting a pre-recorded audio file into a written text;
means for parsing said written text into segments;
means for correcting each and every segment of said written text;
means for saving each corrected segment in a retrievable manner in association with said computer;
means for saving speech files associated with a substantially corrected written text and used by said speech recognition program towards improving accuracy in speech-to-text conversion by said speech recognition program; and
means for repetitively establishing an independent instance of said written text from said pre-recorded audio file using said speech recognition program and for replacing each erroneous segment in said independent instance of said written text with said corrected segment associated therewith.
2. The invention according to claim 1 wherein said parsing means includes means for directly accessing functions of said speech recognition program.
3. The invention according to claim 2 wherein said parsing means further include means to determine a character count to the beginning of each of said segments and means to determine a character count to the end of each of said segments.
4. The invention according to claim 3 wherein said means to determine the character count to the beginning of each of said segments includes UtteranceBegin function from the Dragon Naturally Speaking™, and said means to determine the character count to the end of each of said segments includes UtteranceEnd function from the Dragon Naturally Speaking™.
5. The invention according to claim 1 wherein said means for automatically converting includes means for directly accessing functions of said speech recognition program.
6. The invention according to claim 5 wherein said means for automatically converting further includes TranscribeFile function of Dragon Naturally Speaking™.
7. The invention according to claim 1 wherein said correcting means further includes means for highlighting likely errors in said written text.
8. The invention according to claim 7 wherein said written text is at least temporarily synchronized to said pre-recorded audio file, said highlighting means comprises:
means for sequentially comparing a copy of said written text with a second written text resulting in a sequential list of unmatched words culled from said copy of said written text, said sequential list having a beginning, an end and a current unmatched word, said current unmatched word being successively advanced from said beginning to said end;
means for incrementally searching for said current unmatched word contemporaneously within a first buffer associated with the speech recognition program containing said written text and a second buffer associated with said sequential list; and
means for correcting said current unmatched word in said second buffer, said correcting means including means for displaying said current unmatched word in a manner substantially visually isolated from other text in said copy of said written text and means for playing a portion of said synchronized voice dictation recording from said first buffer associated with said current unmatched word.
9. The invention according to claim 8 wherein said second written text is established by a second speech recognition program having at least one conversion variable different from said speech recognition program.
10. The invention according to claim 8 wherein said second written text is established by one or more human beings.
11. The invention according to claim 8 wherein said correcting means further includes means for alternatively viewing said current unmatched word in context within said copy of said written text.
12. A method for improving the accuracy of a speech recognition program operating on a computer comprising:
(a) automatically converting a pre-recorded audio file into a written text;
(b) parsing the written text into segments;
(c) correcting each and every segment of the written text;
(d) saving each corrected segment in a retrievable manner;
(e) saving speech files associated with a substantially corrected written text and used by the speech recognition program towards improving accuracy in speech-to-text conversion by the speech recognition program;
(f) establishing an independent instance of the written text from the pre-recorded audio file using the speech recognition program;
(g) replacing each erroneous segment in the independent instance of the written text with the corrected segment associated therewith;
(h) saving speech files associated with the independent instance of the written text used by the speech recognition program towards improving accuracy in speech-to-text conversion by the speech recognition program; and
(i) repeating steps (f) through (i) a predetermined number of times.
13. The method according to claim 12 further comprising highlighting likely errors in said written text.
14. The method according to claim 13 wherein highlighting includes:
comparing sequentially a copy of said written text with a second written text resulting in a sequential list of unmatched words culled from said copy of said written text, said sequential list having a beginning, an end and a current unmatched word, said current unmatched word being successively advanced from said beginning to said end;
searching incrementally for said current unmatched word contemporaneously within a first buffer associated with the speech recognition program containing said written text and a second buffer associated with said sequential list; and
correcting said current unmatched word in said second buffer, said correcting means including means for displaying said current unmatched word in a manner substantially visually isolated from other text in said copy of said written text and means for playing a portion of said synchronized voice dictation recording from said first buffer associated with said current unmatched word.
US10/461,079 1999-07-28 2003-06-13 System and method for improving the accuracy of a speech recognition program Abandoned US20030225578A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/461,079 US20030225578A1 (en) 1999-07-28 2003-06-13 System and method for improving the accuracy of a speech recognition program

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US09/362,255 US6490558B1 (en) 1999-07-28 1999-07-28 System and method for improving the accuracy of a speech recognition program through repetitive training
US20887800P 2000-06-01 2000-06-01
US09/625,657 US6704709B1 (en) 1999-07-28 2000-07-26 System and method for improving the accuracy of a speech recognition program
US10/461,079 US20030225578A1 (en) 1999-07-28 2003-06-13 System and method for improving the accuracy of a speech recognition program

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
US09/362,255 Continuation US6490558B1 (en) 1999-07-28 1999-07-28 System and method for improving the accuracy of a speech recognition program through repetitive training
US09/625,657 Continuation US6704709B1 (en) 1999-07-28 2000-07-26 System and method for improving the accuracy of a speech recognition program

Publications (1)

Publication Number Publication Date
US20030225578A1 true US20030225578A1 (en) 2003-12-04

Family

ID=31890898

Family Applications (2)

Application Number Title Priority Date Filing Date
US09/625,657 Expired - Fee Related US6704709B1 (en) 1999-07-28 2000-07-26 System and method for improving the accuracy of a speech recognition program
US10/461,079 Abandoned US20030225578A1 (en) 1999-07-28 2003-06-13 System and method for improving the accuracy of a speech recognition program

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US09/625,657 Expired - Fee Related US6704709B1 (en) 1999-07-28 2000-07-26 System and method for improving the accuracy of a speech recognition program

Country Status (1)

Country Link
US (2) US6704709B1 (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006035402A1 (en) * 2004-09-30 2006-04-06 Koninklijke Philips Electronics N.V. Automatic text correction
US20070078806A1 (en) * 2005-10-05 2007-04-05 Hinickle Judith A Method and apparatus for evaluating the accuracy of transcribed documents and other documents
US20070143099A1 (en) * 2005-12-15 2007-06-21 International Business Machines Corporation Method and system for conveying an example in a natural language understanding application
US20070203707A1 (en) * 2006-02-27 2007-08-30 Dictaphone Corporation System and method for document filtering
US20080059173A1 (en) * 2006-08-31 2008-03-06 At&T Corp. Method and system for providing an automated web transcription service
US7899670B1 (en) * 2006-12-21 2011-03-01 Escription Inc. Server-based speech recognition
US8214213B1 (en) * 2006-04-27 2012-07-03 At&T Intellectual Property Ii, L.P. Speech recognition based on pronunciation modeling
US8504369B1 (en) * 2004-06-02 2013-08-06 Nuance Communications, Inc. Multi-cursor transcription editing
US8571869B2 (en) 2005-02-28 2013-10-29 Nuance Communications, Inc. Natural language system and method based on unisolated performance metric
US20140122513A1 (en) * 2005-01-03 2014-05-01 Luc Julia System and method for enabling search and retrieval operations to be performed for data items and records using data obtained from associated voice files
US20150058007A1 (en) * 2013-08-26 2015-02-26 Samsung Electronics Co. Ltd. Method for modifying text data corresponding to voice data and electronic device for the same
US20150331941A1 (en) * 2014-05-16 2015-11-19 Tribune Digital Ventures, Llc Audio File Quality and Accuracy Assessment
US20190295539A1 (en) * 2018-03-22 2019-09-26 Lenovo (Singapore) Pte. Ltd. Transcription record comparison
US10565994B2 (en) 2017-11-30 2020-02-18 General Electric Company Intelligent human-machine conversation framework with speech-to-text and text-to-speech
US11417319B2 (en) * 2017-09-21 2022-08-16 Kabushiki Kaisha Toshiba Dialogue system, dialogue method, and storage medium

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7236932B1 (en) * 2000-09-12 2007-06-26 Avaya Technology Corp. Method of and apparatus for improving productivity of human reviewers of automatically transcribed documents generated by media conversion systems
US6915258B2 (en) * 2001-04-02 2005-07-05 Thanassis Vasilios Kontonassios Method and apparatus for displaying and manipulating account information using the human voice
US7200565B2 (en) * 2001-04-17 2007-04-03 International Business Machines Corporation System and method for promoting the use of a selected software product having an adaptation module
CA2488256A1 (en) * 2002-05-30 2003-12-11 Custom Speech Usa, Inc. A method for locating an audio segment within an audio file
US20080255837A1 (en) * 2004-11-30 2008-10-16 Jonathan Kahn Method for locating an audio segment within an audio file
US20070055520A1 (en) * 2005-08-31 2007-03-08 Microsoft Corporation Incorporation of speech engine training into interactive user tutorial
US20070208567A1 (en) * 2006-03-01 2007-09-06 At&T Corp. Error Correction In Automatic Speech Recognition Transcripts
US8412522B2 (en) 2007-12-21 2013-04-02 Nvoq Incorporated Apparatus and method for queuing jobs in a distributed dictation/transcription system
WO2009082684A1 (en) 2007-12-21 2009-07-02 Sandcherry, Inc. Distributed dictation/transcription system
US8639505B2 (en) 2008-04-23 2014-01-28 Nvoq Incorporated Method and systems for simplifying copying and pasting transcriptions generated from a dictation based speech-to-text system
US8639512B2 (en) * 2008-04-23 2014-01-28 Nvoq Incorporated Method and systems for measuring user performance with speech-to-text conversion for dictation systems
US8688445B2 (en) * 2008-12-10 2014-04-01 Adobe Systems Incorporated Multi-core processing for parallel speech-to-text processing
US20110173537A1 (en) * 2010-01-11 2011-07-14 Everspeech, Inc. Integrated data processing and transcription service
TWI412019B (en) 2010-12-03 2013-10-11 Ind Tech Res Inst Sound event detecting module and method thereof
US8909534B1 (en) 2012-03-09 2014-12-09 Google Inc. Speech recognition training
US10878721B2 (en) 2014-02-28 2020-12-29 Ultratec, Inc. Semiautomated relay method and apparatus
US10748523B2 (en) 2014-02-28 2020-08-18 Ultratec, Inc. Semiautomated relay method and apparatus
US10389876B2 (en) 2014-02-28 2019-08-20 Ultratec, Inc. Semiautomated relay method and apparatus
US20180270350A1 (en) 2014-02-28 2018-09-20 Ultratec, Inc. Semiautomated relay method and apparatus
US20180034961A1 (en) 2014-02-28 2018-02-01 Ultratec, Inc. Semiautomated Relay Method and Apparatus
CN105827417A (en) * 2016-05-31 2016-08-03 安徽声讯信息技术有限公司 Voice quick recording device capable of performing modification at any time in conference recording
US11017778B1 (en) 2018-12-04 2021-05-25 Sorenson Ip Holdings, Llc Switching between speech recognition systems
US10573312B1 (en) 2018-12-04 2020-02-25 Sorenson Ip Holdings, Llc Transcription generation from multiple speech recognition systems
US10388272B1 (en) 2018-12-04 2019-08-20 Sorenson Ip Holdings, Llc Training speech recognition systems using word sequences
US11170761B2 (en) 2018-12-04 2021-11-09 Sorenson Ip Holdings, Llc Training of speech recognition systems
US11615789B2 (en) * 2019-09-19 2023-03-28 Honeywell International Inc. Systems and methods to verify values input via optical character recognition and speech recognition
US11539900B2 (en) 2020-02-21 2022-12-27 Ultratec, Inc. Caption modification and augmentation systems and methods for use by hearing assisted user
US11488604B2 (en) 2020-08-19 2022-11-01 Sorenson Ip Holdings, Llc Transcription of audio

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5960447A (en) * 1995-11-13 1999-09-28 Holt; Douglas Word tagging and editing system for speech recognition
US6064959A (en) * 1997-03-28 2000-05-16 Dragon Systems, Inc. Error correction in speech recognition
US6122614A (en) * 1998-11-20 2000-09-19 Custom Speech Usa, Inc. System and method for automating transcription services
US6219640B1 (en) * 1999-08-06 2001-04-17 International Business Machines Corporation Methods and apparatus for audio-visual speaker recognition and utterance verification
US6243713B1 (en) * 1998-08-24 2001-06-05 Excalibur Technologies Corp. Multimedia document retrieval by application of multimedia queries to a unified index of multimedia data for a plurality of multimedia data types

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6064957A (en) * 1997-08-15 2000-05-16 General Electric Company Improving speech recognition through text-based linguistic post-processing
US6006183A (en) * 1997-12-16 1999-12-21 International Business Machines Corp. Speech recognition confidence level display
US6151576A (en) * 1998-08-11 2000-11-21 Adobe Systems Incorporated Mixing digitized speech and text using reliability indices
US6195635B1 (en) * 1998-08-13 2001-02-27 Dragon Systems, Inc. User-cued speech recognition
US6064965A (en) * 1998-09-02 2000-05-16 International Business Machines Corporation Combined audio playback in speech recognition proofreader

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8504369B1 (en) * 2004-06-02 2013-08-06 Nuance Communications, Inc. Multi-cursor transcription editing
US20070299664A1 (en) * 2004-09-30 2007-12-27 Koninklijke Philips Electronics, N.V. Automatic Text Correction
WO2006035402A1 (en) * 2004-09-30 2006-04-06 Koninklijke Philips Electronics N.V. Automatic text correction
US20140122513A1 (en) * 2005-01-03 2014-05-01 Luc Julia System and method for enabling search and retrieval operations to be performed for data items and records using data obtained from associated voice files
US8571869B2 (en) 2005-02-28 2013-10-29 Nuance Communications, Inc. Natural language system and method based on unisolated performance metric
US8977549B2 (en) 2005-02-28 2015-03-10 Nuance Communications, Inc. Natural language system and method based on unisolated performance metric
US20070078806A1 (en) * 2005-10-05 2007-04-05 Hinickle Judith A Method and apparatus for evaluating the accuracy of transcribed documents and other documents
US9384190B2 (en) 2005-12-15 2016-07-05 Nuance Communications, Inc. Method and system for conveying an example in a natural language understanding application
US10192543B2 (en) 2005-12-15 2019-01-29 Nuance Communications, Inc. Method and system for conveying an example in a natural language understanding application
US8612229B2 (en) * 2005-12-15 2013-12-17 Nuance Communications, Inc. Method and system for conveying an example in a natural language understanding application
US20070143099A1 (en) * 2005-12-15 2007-06-21 International Business Machines Corporation Method and system for conveying an example in a natural language understanding application
US8036889B2 (en) * 2006-02-27 2011-10-11 Nuance Communications, Inc. Systems and methods for filtering dictated and non-dictated sections of documents
US20070203707A1 (en) * 2006-02-27 2007-08-30 Dictaphone Corporation System and method for document filtering
US8214213B1 (en) * 2006-04-27 2012-07-03 At&T Intellectual Property Ii, L.P. Speech recognition based on pronunciation modeling
US8532993B2 (en) 2006-04-27 2013-09-10 At&T Intellectual Property Ii, L.P. Speech recognition based on pronunciation modeling
US9070368B2 (en) 2006-08-31 2015-06-30 At&T Intellectual Property Ii, L.P. Method and system for providing an automated web transcription service
US8775176B2 (en) 2006-08-31 2014-07-08 At&T Intellectual Property Ii, L.P. Method and system for providing an automated web transcription service
US8521510B2 (en) * 2006-08-31 2013-08-27 At&T Intellectual Property Ii, L.P. Method and system for providing an automated web transcription service
US20080059173A1 (en) * 2006-08-31 2008-03-06 At&T Corp. Method and system for providing an automated web transcription service
US7899670B1 (en) * 2006-12-21 2011-03-01 Escription Inc. Server-based speech recognition
US20150058007A1 (en) * 2013-08-26 2015-02-26 Samsung Electronics Co. Ltd. Method for modifying text data corresponding to voice data and electronic device for the same
US20150331941A1 (en) * 2014-05-16 2015-11-19 Tribune Digital Ventures, Llc Audio File Quality and Accuracy Assessment
US10776419B2 (en) * 2014-05-16 2020-09-15 Gracenote Digital Ventures, Llc Audio file quality and accuracy assessment
US11417319B2 (en) * 2017-09-21 2022-08-16 Kabushiki Kaisha Toshiba Dialogue system, dialogue method, and storage medium
US10565994B2 (en) 2017-11-30 2020-02-18 General Electric Company Intelligent human-machine conversation framework with speech-to-text and text-to-speech
US20190295539A1 (en) * 2018-03-22 2019-09-26 Lenovo (Singapore) Pte. Ltd. Transcription record comparison
US10748535B2 (en) * 2018-03-22 2020-08-18 Lenovo (Singapore) Pte. Ltd. Transcription record comparison

Also Published As

Publication number Publication date
US6704709B1 (en) 2004-03-09

Similar Documents

Publication Publication Date Title
US6704709B1 (en) System and method for improving the accuracy of a speech recognition program
US6490558B1 (en) System and method for improving the accuracy of a speech recognition program through repetitive training
US6961699B1 (en) Automated transcription system and method using two speech converting instances and computer-assisted correction
CA2351705C (en) System and method for automating transcription services
EP1183680B1 (en) Automated transcription system and method using two speech converting instances and computer-assisted correction
US6161087A (en) Speech-recognition-assisted selective suppression of silent and filled speech pauses during playback of an audio recording
US7006967B1 (en) System and method for automating transcription services
US4866778A (en) Interactive speech recognition apparatus
US20080255837A1 (en) Method for locating an audio segment within an audio file
US7979281B2 (en) Methods and systems for creating a second generation session file
US20030004724A1 (en) Speech recognition program mapping tool to align an audio file to verbatim text
US20050131559A1 (en) Method for locating an audio segment within an audio file
US6915258B2 (en) Method and apparatus for displaying and manipulating account information using the human voice
KR20000057795A (en) Speech recognition enrollment for non-readers and displayless devices
US7120581B2 (en) System and method for identifying an identical audio segment using text comparison
US7895037B2 (en) Method and system for trimming audio files
AU776890B2 (en) System and method for improving the accuracy of a speech recognition program
AU3588200A (en) System and method for automating transcription services
JP2001325250A (en) Minutes preparation device, minutes preparation method and recording medium
WO2001093058A1 (en) System and method for comparing text generated in association with a speech recognition program
US20050125236A1 (en) Automatic capture of intonation cues in audio segments for speech applications
AU2004233462B2 (en) Automated transcription system and method using two speech converting instances and computer-assisted correction
JPH0644060A (en) Program development supporting method and device therefor
JPH1091392A (en) Voice document preparing device

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION