A method and system for dynamically controlling a voice-processing mechanism in a voice command platform. The platform receives a specification during a voice command session with a user and responsively sets a mode of operation of the voice-processing mechanism. The platform can receive the specification from various sources, such as a user (e.g., as a voice command), a stored user profile record, a voice command application and/or another stored table or data source. Example modes of operation include (i) use of a designated text-to-speech engine from among multiple text-to-speech engines, (ii) use of a designated voice prompt store from among multiple voice prompt stores, (iii) muting of a speech recognition engine, and (iv) selection of a secondary phoneme dictionary for use in recognizing phonemes in incoming speech signals.

Citations

Cited Patent | Filing date | Issue date | Original Assignee | Title
US3990050 | Sep 25, 1974 | Nov 2, 1976 | Bell Telephone Laboratories, Incorporated | Computer controlled automatic response system
US4215240 | Nov 11, 1977 | Jul 29, 1980 | Federal Screw Works | Portable voice system for the verbally handicapped
US4523055 | Nov 25, 1983 | Jun 11, 1985 | Pitney Bowes Inc. | Voice/text storage and retrieval system
US5257187 | Sep 30, 1991 | Oct 26, 1993 | Sharp Kabushiki Kaisha | Translation machine system
US5297183 | Apr 13, 1992 | Mar 22, 1994 | VCS Industries, Inc. | Speech recognition system for electronic switches in a cellular telephone or personal communication network
US5365050 | Mar 16, 1993 | Nov 15, 1994 | Worthington Data Solutions | Portable data collection terminal with voice prompt and recording
US5659597 | Mar 22, 1994 | Aug 19, 1997 | Voice Control Systems, Inc. | Speech recognition system for electronic switches in a non-wireline communications network
US5754736 | Sep 8, 1995 | May 19, 1998 | U.S. Philips Corporation | System and method for outputting spoken information in response to input speech signals
US5815639 | May 20, 1993 | Sep 29, 1998 | Engate Incorporated | Computer-aided transcription system using pronounceable substitute text with a common cross-reference library
US6058166 | Oct 6, 1997 | May 2, 2000 | Unisys Corporation | Enhanced multi-lingual prompt management in a voice messaging system with support for speech recognition
US6157848 | Aug 19, 1997 | Dec 5, 2000 | Philips Electronics North America Corporation | Speech recognition system for electronic switches in a non-wireline communications network
US6334101 | Dec 15, 1998 | Dec 25, 2001 | International Business Machines Corporation | Method, system and computer program product for dynamic delivery of human language translations during software operation

Referenced by

Citing Patent | Filing date | Issue date | Original Assignee | Title
US7505910 | Jan 29, 2004 | Mar 17, 2009 | Hitachi, Ltd. | Speech command management dependent upon application software status
US7519534 | Oct 30, 2003 | Apr 14, 2009 | AgileTV Corporation | Speech controlled access to content on a presentation medium
US7689421 | Jun 27, 2007 | Mar 30, 2010 | Microsoft Corporation | Voice persona service for embedding text-to-speech features into software programs
US7869998 | Dec 19, 2002 | Jan 11, 2011 | AT&T Intellectual Property II, L.P. | Voice-enabled dialog system
US7881932 | Oct 2, 2006 | Feb 1, 2011 | Nuance Communications, Inc. | VoiceXML language extension for natively supporting voice enrolled grammars
US7957509 | Oct 19, 2007 | Jun 7, 2011 | AT&T Intellectual Property I, L.P. | Voice enhancing for advance intelligent network services
US8015014 | Jun 16, 2006 | Sep 6, 2011 | Storz Endoskop Produktions GmbH | Speech recognition system with user profiles management component
US8024194 | Dec 8, 2004 | Sep 20, 2011 | Nuance Communications, Inc. | Dynamic switching between local and remote speech rendering
US8234117 | Mar 22, 2007 | Jul 31, 2012 | Canon Kabushiki Kaisha | Speech-synthesis device having user dictionary control

Claims

1. A network-based voice command platform for interacting with remote users via speech signals exchanged between the user and the voice command platform via a telephone and a communications network, comprising:

a user communication interface for receiving speech from the user via the telephone and the network;

a processor;

an application-processing module executable by the processor to process voice command applications designed for interacting with users via speech, the voice command applications defining user-prompts, allowed grammars, and application logic;

a voice-processing module executable by the processor to recognize the allowed grammars in speech signals received from a user via the user communication interface, and to convert the user-prompts into speech signals for transmission to the user via the user communication interface, the voice-processing module having a plurality of selectable modes of operation; and

selection-logic executable by the processor in response to a specification received during a voice command session with the user, to cause the voice-processing module to operate according to a mode of operation that corresponds with the specification.
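
The relationship between the specification, the selection-logic, and the modes of operation recited in claim 1 might be sketched as follows. This is an illustrative sketch only and not part of the claims; the class names, the specification strings, and the mode names are all hypothetical, and the claims do not prescribe any particular data structure.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class VoiceProcessingModule:
    """Hypothetical stand-in for the claimed voice-processing module."""
    mode: str = "default"


class SelectionLogic:
    """Maps a received specification to a mode of operation (names assumed)."""

    def __init__(self, module: VoiceProcessingModule, modes: Dict[str, str]):
        self.module = module
        self.modes = modes  # specification -> mode name

    def apply(self, specification: str) -> str:
        # Assumption: an unrecognized specification leaves the mode unchanged.
        self.module.mode = self.modes.get(specification, self.module.mode)
        return self.module.mode


module = VoiceProcessingModule()
logic = SelectionLogic(module, {"tts=engine_b": "tts_engine_b", "mute": "asr_muted"})
```

In this sketch the specification could come from any of the sources named in claims 2 to 4 (the user, a voice command application, or a stored profile record); only the lookup step is modelled.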

2. The voice command platform of claim 1, wherein the specification is received from the user during the voice command session.

3. The voice command platform of claim 1, wherein the specification is received from a voice command application being processed during the voice command session.

4. The voice command platform of claim 1, wherein the specification is received from a stored profile record for the user.

5. The voice command platform of claim 4, wherein the specification is received at initiation of the voice command session with the user.

6. The voice command platform of claim 1, wherein:

the user-prompts comprise text prompts;

the voice-processing module comprises a plurality of text-to-speech engines for converting the text prompts into speech signals;

the specification indicates a given text-to-speech engine to use for converting text prompts into speech signals; and

the mode of operation that corresponds with the specification comprises applying the given text-to-speech engine when converting text prompts into speech signals.

7. The voice command platform of claim 1, wherein:

the voice-processing module comprises a plurality of text-to-speech engines for converting user-prompts into speech signals, including a first text-to-speech engine and a second text-to-speech engine;

in a first mode of operation of the voice-processing module, the first text-to-speech engine is active, so that, when the processor executes the voice-processing module to convert user-prompts into speech signals, the processor applies the first text-to-speech engine;

in a second mode of operation of the voice-processing module, the second text-to-speech engine is active, so that, when the processor executes the voice-processing module to convert user-prompts into speech signals, the processor applies the second text-to-speech engine; and

the selection-logic is executable by the processor in response to the specification received during the voice command session, to switch from the first mode of operation to the second mode of operation.
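
The switch between a first and a second text-to-speech engine described in claim 7 might look roughly like this. The sketch is an assumption-laden illustration, not the claimed implementation: `engine_a`, `engine_b`, and `TtsSelector` are hypothetical names, and the "speech signals" are represented by tagged strings rather than audio.

```python
from typing import Callable, Dict


def engine_a(text: str) -> str:
    """Hypothetical first text-to-speech engine (returns a tagged string, not audio)."""
    return f"<voice-A>{text}</voice-A>"


def engine_b(text: str) -> str:
    """Hypothetical second text-to-speech engine."""
    return f"<voice-B>{text}</voice-B>"


class TtsSelector:
    """Keeps exactly one engine active; the active engine defines the mode."""

    def __init__(self, engines: Dict[str, Callable[[str], str]], default: str):
        self.engines = engines
        self.active = default  # first (default) mode of operation

    def switch(self, specification: str) -> None:
        # The specification names the engine to activate.
        if specification not in self.engines:
            raise KeyError(f"unknown text-to-speech engine: {specification}")
        self.active = specification

    def speak(self, prompt_text: str) -> str:
        # Convert a text prompt using whichever engine is currently active.
        return self.engines[self.active](prompt_text)


tts = TtsSelector({"A": engine_a, "B": engine_b}, default="A")
```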

8. The voice command platform of claim 7, wherein the first mode of operation is a default mode of operation of the voice-processing module.

9. The voice command platform of claim 7, wherein the specification is received from the user during the voice command session.

10. The voice command platform of claim 7, wherein the specification is received from a voice command application being executed during the voice command session.

11. The voice command platform of claim 7, wherein the specification is received from a stored profile record for the user.

12. The voice command platform of claim 11, wherein the specification is received at initiation of the voice command session with the user.

13. The voice command platform of claim 1, wherein:

the user-prompts comprise designations of prerecorded speech signals representing voice prompts to retrieve and transmit to the user;

the voice-processing module is executable to convert a user prompt defined by a voice command application into a speech signal by retrieving from a voice prompt store a given voice prompt corresponding to the user prompt;

the voice-processing module comprises a plurality of voice prompt stores, each voice prompt store comprising a respective version of the given voice prompt;

the specification indicates a given voice prompt store to use for converting prompts into speech signals; and

the mode of operation that corresponds with the specification comprises retrieving the given voice prompt from the given voice prompt store.
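
One way to picture the multiple voice prompt stores of claim 13, each holding its own version of the same prompt, is a pair of lookup tables keyed by prompt designation. The store names, prompt identifiers, and file names below are invented for illustration and do not appear in the claims.

```python
from typing import Dict

# Hypothetical stores: each holds its own recording of the same prompt.
PROMPT_STORES: Dict[str, Dict[str, str]] = {
    "store_1": {"welcome": "welcome_store1.wav", "goodbye": "goodbye_store1.wav"},
    "store_2": {"welcome": "welcome_store2.wav", "goodbye": "goodbye_store2.wav"},
}


def retrieve_prompt(store_name: str, prompt_id: str) -> str:
    """Return the version of the prompt held in the store named by the specification."""
    return PROMPT_STORES[store_name][prompt_id]
```

The same prompt designation ("welcome") resolves to a different recording depending on which store the specification indicates.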

14. The voice command platform of claim 13, wherein the specification is received from the user during the voice command session.

15. The voice command platform of claim 13, wherein the specification is received from a voice command application being executed during the voice command session.

16. The voice command platform of claim 13, wherein the specification is received from a stored profile record for the user.

17. The voice command platform of claim 16, wherein the specification is received at initiation of the voice command session with the user.

18. The voice command platform of claim 13, wherein:

the user-prompts further comprise text prompts;

the voice-processing module comprises a plurality of text-to-speech engines for converting the text prompts into speech signals;

the specification indicates a given text-to-speech engine to use for converting text prompts into speech signals; and

the mode of operation that corresponds with the specification further comprises applying the given text-to-speech engine when converting text prompts into speech signals.

19. The voice command platform of claim 1, wherein:

in a first mode of operation, the voice-processing module is executable to convert the user-prompts into speech signals representing a first voice persona;

in a second mode of operation, the voice-processing module is executable to convert the user-prompts into speech signals representing a second voice persona;

the selection-logic is executable by the processor in response to a first specification received during a first voice command session with a first user, to cause the voice-processing module to operate according to the first mode of operation; and

the selection-logic is executable by the processor in response to a second specification received during a second voice command session with a second user, to cause the voice-processing module to operate according to the second mode of operation.

20. The voice command platform of claim 19, wherein the first specification is received from a stored profile record for the first user, and the second specification is received from a stored profile record for the second user.
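
Claims 19 and 20 tie the voice persona to the session, with each user's stored profile record supplying the specification. A minimal sketch, assuming a hypothetical in-memory profile table and an assumed fallback persona for users without a record (the claims do not specify a fallback):

```python
from typing import Dict

# Hypothetical stored profile records, keyed by user identifier.
PROFILES: Dict[str, Dict[str, str]] = {
    "user_1": {"persona": "persona_1"},
    "user_2": {"persona": "persona_2"},
}


def start_session(user_id: str, default_persona: str = "persona_1") -> Dict[str, str]:
    """Read the persona specification from the user's profile at session start."""
    persona = PROFILES.get(user_id, {}).get("persona", default_persona)
    return {"user": user_id, "persona": persona}
```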

21. The voice command platform of claim 1, wherein:

the voice-processing module includes a speech recognition engine executable by the processor to recognize the allowed grammars in the speech signals received from a user;

in a first mode of operation of the voice-processing module, the speech recognition engine actively monitors incoming speech signals and provides output indicative of allowed grammars recognized in the incoming speech signals;

in a second mode of operation of the voice-processing module, the speech recognition engine does not actively monitor speech signals and provide output indicative of allowed grammars recognized in the incoming speech signals;

the specification is received from the user during the voice command session with the user; and

the selection-logic is executable by the processor in response to the specification to switch the voice-processing module from the first mode of operation to the second mode of operation.

22. The voice command platform of claim 21, wherein the specification comprises a mute-command provided by the user.

23. The voice command platform of claim 22, wherein the specification comprises a speech signal representing at least the word “Mute”.
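
The mute behaviour of claims 21 to 23 can be illustrated with a toy recognizer that stops reporting allowed grammars once it hears "Mute". This is a sketch under simplifying assumptions: utterances arrive as already-transcribed text rather than speech signals, the class and method names are hypothetical, and no way of leaving the muted mode is modelled because these claims do not recite one.

```python
from typing import Iterable, Optional


class SpeechRecognizer:
    """Toy recognizer: input is text standing in for an incoming speech signal."""

    def __init__(self, allowed_grammars: Iterable[str]):
        self.allowed = {g.lower() for g in allowed_grammars}
        self.muted = False  # first mode: actively monitoring

    def process(self, utterance: str) -> Optional[str]:
        word = utterance.strip().lower()
        if self.muted:
            # Second mode: no monitoring, so no recognition output.
            return None
        if word == "mute":
            # The mute-command is the specification that switches modes.
            self.muted = True
            return None
        return word if word in self.allowed else None


recognizer = SpeechRecognizer(["weather", "news"])
```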

24. The voice command platform of claim 1, wherein:

the voice-processing module comprises (i) a core phoneme dictionary that the processor uses to recognize phonemes in incoming speech signals and (ii) at least one secondary phoneme dictionary that the processor uses to recognize phonemes in incoming speech signals;

the specification comprises an indication of which secondary phoneme dictionary to use when the processor is processing a given voice command application; and

the selection-logic is executable by the processor in response to the specification to cause the processor to use a given secondary phoneme dictionary to recognize phonemes in incoming speech signals when the processor is processing the given voice command application.

25. The voice command platform of claim 24, wherein, when the processor uses a given secondary phoneme dictionary to recognize phonemes in incoming speech signals, the processor uses only the given secondary phoneme dictionary and not the core phoneme dictionary.

26. The voice command platform of claim 24, wherein, when the processor uses a given secondary phoneme dictionary to recognize phonemes in incoming speech signals, the processor uses the given secondary phoneme dictionary in conjunction with the core phoneme dictionary.

27. The voice command platform of claim 26, wherein the given secondary phoneme dictionary defines additions to the core phoneme dictionary.
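
Claims 25 to 27 describe two ways of using a secondary phoneme dictionary: on its own, or in conjunction with the core dictionary as a set of additions. The sketch below models each dictionary as a plain mapping from a word to a phoneme sequence, which is an assumption made for illustration; the function name, the `combine` flag, and the rule that a secondary entry wins on conflict are likewise hypothetical.

```python
from typing import Dict, List

PhonemeDict = Dict[str, List[str]]


def effective_dictionary(core: PhonemeDict, secondary: PhonemeDict,
                         combine: bool = True) -> PhonemeDict:
    """Build the dictionary the recognizer would consult for a given application."""
    if not combine:
        # Claim 25 style: only the secondary dictionary is used.
        return dict(secondary)
    # Claims 26-27 style: secondary entries are additions to the core.
    merged = dict(core)
    merged.update(secondary)  # assumption: secondary wins on conflict
    return merged


core = {"yes": ["Y", "EH", "S"], "no": ["N", "OW"]}
secondary = {"latte": ["L", "AA", "T", "EY"]}
```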

28. The voice command platform of claim 24, wherein the specification is received from a stored table that correlates voice command applications with secondary phoneme dictionaries.

29. The voice command platform of claim 28, wherein the voice command platform receives the given secondary phoneme dictionary from a source selected from the group consisting of (i) a provider of the given voice command application, in advance of the voice command session, and (ii) the voice command application, when the voice command application is loaded into the platform.
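
The stored table of claim 28, which correlates voice command applications with secondary phoneme dictionaries, could be as simple as a keyed lookup. The application and dictionary names here are invented, and returning `None` for an application with no entry is an assumption rather than something the claims state.

```python
from typing import Dict, Optional

# Hypothetical table: voice command application -> secondary phoneme dictionary.
APP_DICTIONARY_TABLE: Dict[str, str] = {
    "coffee_ordering_app": "coffee_terms_dictionary",
    "stock_quotes_app": "ticker_symbols_dictionary",
}


def dictionary_for(application: str) -> Optional[str]:
    """Return the secondary dictionary to use while processing the application."""
    return APP_DICTIONARY_TABLE.get(application)
```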

30. The voice command platform of claim 1, wherein speech signals are communicated between the user and the user communication interface via a telecommunications network comprising a wireless communications link.

31. A method of dynamically switching between modes of operation of a voice-processing module in a voice command platform, the voice-processing module being executable by a processor to recognize allowed grammars in speech signals received from a user via a user communication interface, and to convert user-prompts defined by voice command applications into speech signals for transmission to a user via the user communication interface, the method comprising:

receiving, during a voice command session with the user, a specification indicative of a mode of operation of the voice-processing module; and

in response to the specification, switching from a first mode of operation of the voice-processing module to a second mode of operation of the voice-processing module.

32. The method of claim 31, wherein receiving the specification indicative of the mode of operation of the voice-processing module comprises receiving the specification from the user during the voice command session.

33. The method of claim 31, wherein receiving the specification indicative of the mode of operation of the voice-processing module comprises receiving the specification from a voice command application being executed during the voice command session.

34. The method of claim 31, wherein receiving the specification indicative of the mode of operation of the voice-processing module comprises receiving the specification from a stored profile record for the user.

35. The method of claim 31, wherein:

the user-prompts comprise text prompts, the voice-processing module comprises a plurality of text-to-speech engines for converting the text prompts into speech signals, and the specification indicates a given text-to-speech engine to use for converting text prompts into speech signals;

in the first mode of operation of the voice-processing module, the processor applies a first text-to-speech engine to convert text prompts into speech signals;

in the second mode of operation of the voice-processing module, the processor applies a second text-to-speech engine to convert text prompts into speech signals; and

switching from the first mode of operation of the voice-processing module to the second mode of operation of the voice-processing module comprises causing the processor (i) to stop applying the first text-to-speech engine to convert text prompts into speech signals and (ii) to start applying the second text-to-speech engine to convert text prompts into speech signals.

36. The method of claim 35, wherein receiving the specification indicative of the mode of operation of the voice-processing module comprises receiving the specification from the user during the voice command session.

37. The method of claim 35, wherein receiving the specification indicative of the mode of operation of the voice-processing module comprises receiving the specification from a voice command application being executed during the voice command session.

38. The method of claim 35, wherein receiving the specification indicative of the mode of operation of the voice-processing module comprises receiving the specification from a stored profile record for the user.

39. The method of claim 31, wherein:

the user-prompts comprise designations of prerecorded speech signals representing voice prompts to retrieve and transmit to the user;

the voice-processing module is executable to convert a user prompt defined by a voice command application into a speech signal by retrieving from a voice prompt store a given voice prompt corresponding to the user prompt;

the voice-processing module comprises a plurality of voice prompt stores, each voice prompt store comprising a respective version of the given voice prompt;

the specification indicates a given voice prompt store to use for converting prompts into speech signals;

in the first mode of operation, the voice-processing module is executable to retrieve the given voice prompt from a first prompt store; and

in the second mode of operation, the voice-processing module is executable to retrieve the given voice prompt from a second prompt store.

40. The method of claim 39, wherein receiving the specification indicative of the mode of operation of the voice-processing module comprises receiving the specification from the user during the voice command session.

41. The method of claim 39, wherein receiving the specification indicative of the mode of operation of the voice-processing module comprises receiving the specification from a voice command application being executed during the voice command session.

42. The method of claim 39, wherein receiving the specification indicative of the mode of operation of the voice-processing module comprises receiving the specification from a stored profile record for the user.

43. The method of claim 31, wherein:

the voice-processing module includes a speech recognition engine executable by the processor to recognize the allowed grammars in the speech signals received from a user;

in the first mode of operation of the voice-processing module, the speech recognition engine actively monitors incoming speech signals and provides output indicative of allowed grammars recognized in the incoming speech signals;

in the second mode of operation of the voice-processing module, the speech recognition engine does not actively monitor speech signals and provide output indicative of allowed grammars recognized in the incoming speech signals; and

receiving the specification indicative of the mode of operation of the voice-processing module comprises receiving the specification from the user during a voice command session with the user.

44. The method of claim 31, wherein:

the voice-processing module comprises (i) a core phoneme dictionary that the processor uses to recognize phonemes in incoming speech signals and (ii) at least one secondary phoneme dictionary that the processor uses to recognize phonemes in incoming speech signals;

the specification comprises an indication of which secondary phoneme dictionary to use when the processor is processing a given voice command application;

in the first mode of operation, the processor uses a first secondary phoneme dictionary to recognize phonemes in incoming speech signals; and

in the second mode of operation, the processor uses a second secondary phoneme dictionary to recognize phonemes in incoming speech signals.

45. The method of claim 44, further comprising:

in the second mode of operation, the processor using the secondary phoneme dictionary in conjunction with the core phoneme dictionary.

46. The method of claim 44, wherein receiving the specification comprises referring to a stored table that correlates voice command applications with secondary phoneme dictionaries.

47. The method of claim 44, further comprising the platform receiving the second secondary phoneme dictionary from a source selected from the group consisting of (i) a provider of the given voice command application, in advance of the voice command session, and (ii) the voice command application, upon loading the voice command application into the platform.