WO2005022295A2 - Media center controller system and method - Google Patents

Media center controller system and method

Info

Publication number
WO2005022295A2
WO2005022295A2 (PCT/US2004/022301)
Authority
WO
WIPO (PCT)
Prior art keywords
user
input
audio
media center
processor
Prior art date
Application number
PCT/US2004/022301
Other languages
French (fr)
Other versions
WO2005022295A3 (en)
Inventor
Dean C. Weber
Laszlo B. Betyar
Kenneth R. Kubinak
James E. Hadzicki
Original Assignee
One Voice Technologies, Inc.
Priority date
Filing date
Publication date
Application filed by One Voice Technologies, Inc.
Publication of WO2005022295A2 publication Critical patent/WO2005022295A2/en
Publication of WO2005022295A3 publication Critical patent/WO2005022295A3/en

Classifications

    • GPHYSICS
    • G08SIGNALLING
    • G08CTRANSMISSION SYSTEMS FOR MEASURED VALUES, CONTROL OR SIMILAR SIGNALS
    • G08C17/00Arrangements for transmitting signals characterised by the use of a wireless electrical link
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/12Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H04L67/125Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks involving control of end-device applications over a network
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/30Definitions, standards or architectural aspects of layered protocol stacks
    • H04L69/32Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
    • H04L69/322Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
    • H04L69/329Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the application layer [OSI layer 7]
    • GPHYSICS
    • G08SIGNALLING
    • G08CTRANSMISSION SYSTEMS FOR MEASURED VALUES, CONTROL OR SIMILAR SIGNALS
    • G08C2201/00Transmission systems of control signals via wireless link
    • G08C2201/30User interface
    • G08C2201/31Voice input
    • GPHYSICS
    • G08SIGNALLING
    • G08CTRANSMISSION SYSTEMS FOR MEASURED VALUES, CONTROL OR SIMILAR SIGNALS
    • G08C2201/00Transmission systems of control signals via wireless link
    • G08C2201/40Remote control systems using repeaters, converters, gateways
    • G08C2201/42Transmitting or receiving remote control signals via a network

Definitions

  • the present invention relates to media center control, and, more particularly, to media center control by a user.
  • Remotely controlled devices are commonplace today. Remote control devices typically have multiple buttons each one of which when actuated by a user may send a remote command to the remotely controlled device causing the controlled device to change its state of operation (e.g., change television channel or volume setting). Remote control devices may control a single device or multiple devices. A universal remote control has been developed that can control multiple different devices from different, commercial manufacturers.
  • remote controls can be difficult to use in darkened rooms or under other conditions in which the button labels may be difficult to ascertain and, in any case, require the user to locate the button corresponding to the desired function.
  • users of a media center in a home or office may experience difficulty in attempting to control media devices or perform media related tasks using a remote control under conditions otherwise favorable to the media experience (e.g., seated or standing in a darkened room while directing attention to a display or screen).
  • voice command input may provide an easier user input mechanism.
  • Embodiments of the present invention may include a media center controller for controlling and providing user access to multiple devices and applications of a media center. Embodiments may also include systems and methods for transmitting and receiving speech commands from a user for remotely controlling one or more devices or applications.
  • a remote control device may be used as a voice command access point to control a variety of media related functions of a media center.
  • Embodiments may further include a media center controller that allows users to control various media center activities via manual devices, such as keypad or keyboard, or by voice command, which may include speaking naturally to their computers. Such activities may include playing music and DVDs, launching applications, dictating letters, browsing the Internet, using instant messaging, reading and sending electronic mail, and placing phone calls.
  • FIGURE 1 is a system functional block diagram according to at least one embodiment
  • FIGURE 2 is a flow chart illustrating a method according to at least one embodiment
  • FIGURE 3 is a detailed functional block diagram of at least one embodiment of a media center controller according to the invention.
  • FIGURE 4 is a detailed functional block diagram of a media center controller remote control device according to at least one embodiment;
  • FIGURE 5 is a detailed functional block diagram of a media center controller computing device according to at least one embodiment.
  • FIGURE 6 is a logical control and data flow diagram depicting the transfer of information among various modules comprising the media center command processor according to at least one embodiment
  • FIGURES 7a and 7b are a flow chart of a media center control method according to at least one embodiment
  • FIGURE 8 shows a top level menu interactive page according to at least one embodiment
  • FIGURE 9 shows a send voice recording interactive page according to at least one embodiment
  • FIGURE 10 shows a send e-mail interactive page according to at least one embodiment
  • FIGURE 11 shows a read e-mail interactive page according to at least one embodiment
  • FIGURE 12 shows a send text message interactive page according to at least one embodiment
  • FIGURE 13 shows a voice activated dialing interactive page according to at least one embodiment
  • FIGURE 14 shows a messenger interactive page according to at least one embodiment
  • FIGURE 15 shows a user account interactive page according to at least one embodiment
  • FIGURE 16 shows a user contacts interactive page according to at least one embodiment
  • FIGURES 17a and 17b are a flowchart of a method for voice over Internet Protocol (VoIP) or Personal Computer (PC)-to-PC applications in an embodiment
  • FIGURES 18a and 18b are a flowchart of a method 1800 for PC-to-phone applications in an embodiment.

DETAILED DESCRIPTION
  • the system and methods may include a computing device having a user dialog manager to process commands and input for controlling one or more controlled devices or applications.
  • the system and methods may include the capability to receive and respond to commands and input from a variety of sources, including voice and manual entry commands and spoken commands from a user, for remotely controlling one or more electronic devices.
  • the system and methods may also include a user interaction device capable of receiving spoken user input and transferring the spoken input to the computing device.
  • the user interaction device may be a handheld device.
  • embodiments of the present invention may include a system and method, interacting with a computer using a remote control device for controlling the computing device.
  • remote control devices may be used such as, for example, a Universal Remote Control device, which transmits utterances (i.e., spoken information) to a receiving computer device that may perform speech processing and natural language processing.
  • the remote control device may include a microphone, and optionally a speaker, along with an optional microphone On/Off button. When actuated, the microphone On/Off button may mute the device(s) controlled by the remote control device and begin transmitting the user's utterance to the receiving computing unit. When released, the microphone On/Off button may deactivate the microphone and un-mute the affected device(s) (such as, for example, a television or stereo).
  • the receiving computing unit may provide the audio transmission from the remote control device to a speech processing application and may transmit audio back to the remote control device for playback to the user using the speaker.
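  • As a minimal sketch of this push-to-talk cycle, assuming hypothetical helper functions for the microphone, the remote control interface 104, and the interface 103 (none of these names come from the patent):

```python
# Sketch of the microphone On/Off cycle described above. The three helpers
# are hypothetical stand-ins for the remote's hardware interfaces.

def read_mic_chunk():
    """Hypothetical: return the next block of raw microphone samples."""
    ...

def send_ir_command(command):
    """Hypothetical: transmit a command over the remote control interface 104."""
    ...

def send_audio_chunk(chunk):
    """Hypothetical: stream audio to the computing device over interface 103."""
    ...

def push_to_talk(button_is_pressed):
    """One press/release cycle of switch 122 (button_is_pressed is a callable
    polled each iteration, modeling the momentary switch)."""
    send_ir_command("MUTE")                     # quiet the controlled devices
    while button_is_pressed():
        send_audio_chunk(read_mic_chunk())      # stream the utterance live
    send_ir_command("UNMUTE")                   # restore audio on release
```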
  • FIGURE 1 is a system functional block diagram of at least one embodiment.
  • a system 100 may include a remote control device 101 which may be coupled to a computing device 102 using an interface 103.
  • the remote control device 101 may also include a remote control interface 104 for transmitting commands to one or more controlled devices 105.
  • the remote control device may be a media center controller remote control unit.
  • a media center command processor 106 may be coupled to or included with the computing device 102 and provided in communication with the remote control device 101 using the interface 103.
  • the computing device 102 may be coupled to one or more controlled devices 105.
  • the computing device 102 may be a media center controller computing device.
  • the computing device 102 may include a speech recognizer 110 and a natural language processor 111.
  • the speech recognizer 110 and the natural language processor 111 may be implemented, for example, using a sequence of programmed instructions executed by the computing device 102.
  • the speech recognizer 110 and the natural language processor 111 may comprise multiple portions of their respective applications, each portion executing on one or more of the computing device 102 and the media center command processor 106.
  • no training sequences are required by the speech recognizer 110.
  • the speech recognizer 110 may be configured to determine one or more remote control commands corresponding to the received audio signal.
  • the speech recognizer 110 may include a speech processing capability that detects features of the audio signal sufficient to identify the corresponding remote commands or user requests or input.
  • the mapping of the features to remote commands/requests may be maintained at the computing device 102 using, for example, non-volatile storage media such as a hard drive.
  • upon determining the remote command(s) or input, the computing device 102 sends the corresponding response(s) to the remote control device 101 using the interface 103.
  • the audio signal may be input to the natural language processor 111 for extraction of the relevant portions of the audio signal required for the speech recognizer 110 to determine the associated command or input.
  • the natural language processor 111 may receive the audio signal prior to the speech recognizer 110, at the same time as the speech recognizer 110, or only if the speech recognizer 110 first fails to confidently determine the corresponding remote command.
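  • A minimal sketch of the "natural language processing only on low confidence" ordering follows; the engine objects, their method names, and the threshold are illustrative assumptions, not APIs defined by the patent:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Interpretation:
    """Assumed result shape returned by both engines."""
    command: Optional[str]   # e.g. "CHANNEL_UP", or None if undecodable
    confidence: float        # 0.0 .. 1.0

CONFIDENCE_THRESHOLD = 0.7   # assumed cutoff; the patent fixes no value

def interpret(audio, recognizer, nlp) -> Optional[str]:
    """Try the speech recognizer first; invoke natural language processing
    only if the recognizer fails to confidently determine a command."""
    first = recognizer.recognize(audio)          # hypothetical engine call
    if first.command and first.confidence >= CONFIDENCE_THRESHOLD:
        return first.command
    return nlp.parse(audio).command              # hypothetical engine call
```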
  • the remote control device 101 may output the remote command to the affected controlled device(s) 105 using the remote control interface 104.
  • one or both of the speech recognizer 110 and the natural language processor 111 may be implemented in the media center command processor 106 which is coupled to or included with the computing device 102.
  • the media center command processor 106 may include hardware and software components to perform the speech analysis described above, thereby reducing the processing load and processing bandwidth requirements for the computing device 102.
  • the media center command processor 106 may be operably coupled to the computing device 102 using a variety of known interfacing mechanisms (e.g., USB, Ethernet, RS-232, parallel port, IEEE 802.11).
  • the media center command processor 106 may be coupled to the controlled device(s) 105 using a network 107.
  • the media center command processor 106 may be a set top box.
  • the media center command processor 106 may be implemented as one or more internal circuit board assemblies, software or a sequence of programmed instructions, or a combination thereof, of the computing device 102.
  • the media center command processor 106 may be implemented using hardware and software in the remote control device 101 or one or more of the controlled devices 105.
  • the computing device 102 and media center command processor 106 may be implemented using one or more computing platforms of a headend system for cable or satellite television or media signal distribution. In particular, the computing device 102 may be provided using one or more servers, which may be PC-based servers, at the headend. In these embodiments, the media center command processor 106 may be implemented as one or more internal circuit board assemblies, software or a sequence of programmed instructions, or a combination thereof, of the headend.
  • the remote control device 101 may output remote control signals (either keypad command or voice input) to the headend computing device 102 via the interface 103.
  • the interface 103 may be a satellite channel or a cable channel for communications in the direction from the user to the headend.
  • a Cable Television (CATV) converter box may be provided for transmitting information back to the CATV service provider or headend from the remote control device 101.
  • the remote control device 101 may include buttons which, when actuated by a user, cause the transmission of remote commands or status inquiries to the controlled device(s) 105 using the remote control interface 104.
  • the remote control device 101 may be capable of controlling a single device, multiple devices, or may be a Universal Remote Control device capable of controlling multiple controlled devices 105 provided by different manufacturers.
  • the remote control device 101 may be a BluetoothTM capable headset.
  • the remote control device 101 may allow user selection of a particular controlled device 105 to be controlled using the remote control device 101.
  • the remote control device 101 may include at least one processor such as, but not limited to, a microcontroller implemented using an integrated circuit.
  • the remote control device 101 may simultaneously send or broadcast information to more than one controlled device 105.
  • the remote control device 101 may include a microphone 120, a speaker 121, and a switch 122 operable to actuate the microphone and transmit information using the interfaces 103 and 104.
  • actuation of the switch 122 may cause information to be sent to one or more controlled devices 105 using the remote interface 104 that causes the audio output of those devices 105 to be muted while the switch is actuated.
  • the information or command that causes the muting may be sent from the media center command processor 106 or the computing device 102 directly to the controlled device 105.
  • the interface 103 transmits an audio signal of the audio received from the microphone 120 (spoken by a user, for example) to the computing device 102.
  • the audio signal may be encoded or compressed using a variety of compression algorithms (e.g., coder-decoder (CODEC), vocoding) to reduce the amount of information transferred using the interface 103, and its attendant bandwidth and data rate requirements.
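  • As one illustration of such companding, the sketch below implements a G.711-style 8-bit µ-law coder, which halves the payload of 16-bit speech samples; the patent does not specify which CODEC the interface 103 actually uses:

```python
import math

MU = 255  # mu-law parameter used by G.711

def mu_law_encode(sample_16bit: int) -> int:
    """Compress one signed 16-bit PCM sample to 8 bits."""
    x = max(-32768, min(32767, sample_16bit)) / 32768.0
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int((y + 1.0) / 2.0 * 255)        # map [-1, 1] onto [0, 255]

def mu_law_decode(byte_value: int) -> int:
    """Expand an 8-bit mu-law byte back to roughly the original sample."""
    y = byte_value / 255.0 * 2.0 - 1.0
    x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
    return int(x * 32768)
```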
  • the remote control device 101 may be configured to extract particular features from the audio received from the microphone 120.
  • the remote control device 101 may include a pushbutton by which a user may actuate and release the switch 122.
  • the switch 122 may be voice activated.
  • upon release of the switch 122, the remote control device 101 may deactivate the microphone 120, cease sending information to the computing device 102 via the interface 103, and send an "un-mute" command via the remote control interface 104 or the interface 107 to the controlled devices 105.
  • This approach reduces the power consumed by the remote control device 101.
  • the mute and un-mute signals may be sent by the computing device 102, in which case the computing device 102 may also include a remote control interface 104; or, the mute and un-mute signals may be sent by the media center command processor 106 via the interface 107, or by the remote interface 104 (if present at the media center command processor 106).
  • the remote control device 101 may include one or more programmable switches and a coder that transmits codes over the remote control interface 104 based on the switch settings as determined by a switch state to code mapping maintained by the remote control device 101.
  • the switches may be programmed by a user interacting with a user interface of the remote control device 101.
  • the switches may be programmed by the computing device 102 using the interface 103.
  • the switch state to code mapping is maintained by the computing device 102 and downloaded to the remote control device 101 using the interface 103.
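  • A sketch of such a downloadable switch-state-to-code mapping follows; the JSON serialization and the code values are assumptions made for illustration:

```python
import json

# Hypothetical mapping maintained at the computing device 102: each
# programmable switch state names the code sent over the remote interface 104.
SWITCH_CODE_MAP = {
    "POWER":       "0x10EF8877",   # illustrative code values only
    "VOLUME_UP":   "0x10EFA05F",
    "VOLUME_DOWN": "0x10EF20DF",
    "MUTE":        "0x10EF08F7",
}

def serialize_mapping() -> bytes:
    """Computing device side: package the mapping for download over interface 103."""
    return json.dumps(SWITCH_CODE_MAP).encode()

def load_mapping(payload: bytes) -> dict:
    """Remote control side: install the downloaded mapping."""
    return json.loads(payload.decode())

def code_for_switch(mapping: dict, switch_state: str) -> str:
    """Coder lookup: resolve a switch state to the code to transmit."""
    return mapping[switch_state]
```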
  • the computing device 102 may be implemented using a personal computer configured to execute applications compatible with the WindowsTM operating system available from Microsoft Corporation of Redmond, Washington.
  • the computing device 102 may execute the MicrosoftTM Windows Media CenterTM operating system.
  • the computing device may be implemented using a game device console (e.g., X-BoxTM, Sony PlaystationTM or Playstation2TM, or GameCubeTM), a television set top box, a digital video recorder (e.g., TiVoTM, Replay TVTM), a home theater sound processor, or other processing device.
  • all or a portion of the systems and methods described herein may be implemented as a sequence of programmed instructions executing on the computing device 102 along with and in cooperation with other processors or computing platforms.
  • the computing device 102 may include a sound card/Universal Serial Bus (USB) port for input of audio signal.
  • the computing device 102 may include an audio response capability.
  • the computing device 102 may provide an audio response to the remote control device 101 using the network 103.
  • the remote control device 101 may output the audio response to the user using the speaker 121.
  • the audio response information may be synthesized speech provided by the computing device 102.
  • the audio response information may be stored actual speech information from a human voice, or fragments thereof, or may be generated as required using a speech synthesis application.
  • the audio response information may produce audio confirming to the user that the operation requested in the audio signal (e.g., a spoken request from the user) has been accomplished.
  • these audio response functions may be performed by the computing device 102 without involving the remote control device 101, by using, for example, the interface 107.
  • the audio response may be played from a speaker on the computing device 102 (the computing device 102 having a sound card) or from a speaker of one or more of the controlled devices 105.
  • the media center command processor 106 may provide some or all of these audio response functions, or may share them with the computing device 102.
  • Controlled devices 105 may include electronic devices produced by different manufacturers such as, for example, but not limited to, televisions, stereos, video cassette recorders (VCRs), Compact Disc (CD) players/recorders, Digital Video Disc (DVD) players/recorders, TiVoTM units, satellite receivers, cable boxes, television set-top boxes, the Internet and devices provided in communication with the Internet, tuners, and receivers.
  • the remote control interface 104 may include, for example, an InfraRed (IR) wireless transceiver for transmission, and possibly reception, of command and status information to and from the controlled devices 105, as is commonly practiced.
  • the remote control interface 104 may be implemented according to a variety of techniques in addition to IR including, without limitation, wireline connection, a Radio Frequency (RF) interface, telephone wiring carried signals, BlueToothTM, FirewireTM, 802.11 standards, cordless telephone, or wireless telephone or cellular or digital over-the-air interfaces.
  • the computing device 102 may be configured as an Interactive Voice Response (IVR) system.
  • the computing device may be configured to support a limited set of IVR command-response pairs such as, for example, command-responses that accomplish pattern matching for the received audio signal without semantic recovery.
  • the interfaces 103 and 107 may be an electronic network capable of conveying information such as, for example, an RF network.
  • an RF network examples include Frequency Modulation (FM), IEEE 802.11 standard and variations, IR, FirewireTM, and BluetoothTM.
  • the interface 103 may be a satellite communication channel or a Cable Television (CATV) channel. Other networks are possible.
  • the remote control device 101 may include navigation keys 301, a numeric and text entry keypad 302, a microphone 120, a speaker 121, a mute button or switch 122, an interface 103, and a remote control interface 104.
  • the interface 103 may further include an audio receiver 303, an audio transmitter 304, and a function key transmitter 305.
  • the telephone customer premises equipment may be used to obtain and process a user's audio utterances for remote control.
  • the remote control device 101 may be implemented using a telephone handset (which may be a wireline or a cordless or cellular/mobile handset or headset) having the speech processing capabilities described herein.
  • Audio signal may be transmitted from the telephone handset to the computing device 102 using the existing household telephone wiring.
  • the handset microphone and speaker may be used for obtaining the user's utterances and for playback of the audio response, respectively.
  • the remote command information received from the computing device 102 may be transmitted by the handset to the controlled device(s) 105 using the interface 103 included in the handset for this purpose.
  • the computing device 102 may output audio queries to the user via the handset speaker (e.g., "What do you want to do?").
  • FIGURE 2 is a flow chart of a method 200 according to at least one embodiment.
  • the method 200 may commence at 202. Control may then proceed to 204 at which the user activates the microphone button on the remote control device. In response, at 206, the remote control unit may mute the controlled device(s). Upon the user uttering a command at 208, the remote control device microphone may output (for example, by streaming) the audio uttered by the user at 210 and transmit the audio signal to the computing device at 212.
  • the remote control device may unmute the controlled device(s) at 216.
  • the computing device may perform speech processing as described above to determine the associated remote command(s) at 218.
  • the computing device may then transmit the corresponding response (which may be a device command) to the remote control device at 220.
  • if the input is a non-spoken input, a keypad or keyboard input may be received at 219.
  • Control may then proceed to 228, at which the computing device provides the command to the controlled device(s).
  • the computing device may also transmit an audio response to an audio output device at 222.
  • the audio output device may play the audio response to the user using a speaker at 226.
  • the computing device may output the audio response directly to the controlled device to play over a speaker of the controlled device.
  • the method may end.
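  • The computing-device side of method 200 (steps 218 through 228) reduces to a small dispatch routine, sketched below; the recognizer, transport, and device objects are assumed interfaces, not structures defined by the patent:

```python
def handle_user_input(user_input, recognizer, remote, audio_out, devices):
    """Sketch of steps 218-228: interpret the input, forward the resulting
    command, and play an audio response. 'user_input.kind' and all
    collaborator objects are hypothetical."""
    if user_input.kind == "audio":
        command = recognizer.interpret(user_input.payload)   # step 218
        remote.send_response(command)                        # step 220
        audio_out.play(f"OK, {command}")                     # steps 222/226
    else:
        command = user_input.payload                         # step 219
    for device in devices:                                   # step 228
        device.apply(command)
```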
  • a media center may be any system that includes a processor configured to provide control and use of multiple media devices or capabilities.
  • Examples of such media devices include, but are not limited to, Television (TV), cable TV, direct broadcast satellite, stereo, Video Cassette Recorder (VCR), Digital Video Disc (DVD), Compact Disc (CD), TivoTM recorder, World Wide Web (WWW) browser, electronic mail client, telephone, and voicemail.
  • One or more of these media devices may be implemented using application software programmed instructions executing on a personal computer or computer platform.
  • FIGURE 3 is a detailed functional block diagram of a media center controller 300 according to at least one embodiment.
  • the media center controller 300 may include a computing device 102, which may be a media center controller computing device.
  • the computing device 102 may be coupled to a remote control device 101, which may be a media center controller remote control device, for receiving and transmitting audio information and for receiving control data from the remote control device 101.
  • the computing device 102 may include the media center command processor 106.
  • the media center command processor 106 may include a speech transceiver capability.
  • the computing device 102 for media center controller 300 may be operably coupled to a variety of media devices as described above.
  • the media center controller computing device 102 may be operably coupled to, for example, but not limited to, a radio signal source 301 for receiving radio broadcast signals, a Television (TV) signal source 302 for receiving TV broadcast signals, a satellite signal source 303 for receiving satellite transmitted TV and data signals, including direct broadcast satellite TV and data signals, a CATV converter box 313 for communication to and from a CATV headend, and to a private or public packet switched network 304 such as, for example, the Internet, for receiving and transmitting a variety of packet based information to other PCs or other communications devices.
  • Packet based information transferred by the computing device 102 includes, but is not limited to, electronic mail (email) messages in accordance with SMTP, Instant Messages (IM), Voice-Over-Internet-Protocol (VOIP) information, HTML and XML formatted pages such as, for example, WWW pages, and other packet or IP based data.
  • Further media devices to which the media center controller computing device 102 may be operably coupled to include, for example, but are not limited to, a wireline or cordless access telephone network 305 such as the Public Switched Telephone Network (PSTN), and wireless or cellular telephone systems.
  • the computing device 102 may be coupled to a telephone handset 315, which may be a cordless or wireless handset.
  • the computing device 102 may be optically or electronically coupled to a keyboard and mouse 311 for receiving command and data input, as well as to a camera 312 for receiving video input.
  • the computing device 102 may also be coupled to a variety of known video devices, optionally using a video receiver 306, for output of video or image information to a television 307, computer monitor 308, or other display device.
  • the computing device 102 may also be coupled to a variety of known audio devices, optionally using an audio receiver 309, for output of audio information to one or more speakers 310.
  • the media center controller 300 may include an audio file/track player to play audio files requested by the user; and an audio/visual player to play audio/visual files or tracks requested by the user.
  • the computing device 102 and media center command processor 106 may be implemented using one or more computing platforms of a headend system for cable or satellite television or media signal distribution.
  • the computing device 102 may be provided using one or more servers, which may be PC-based servers, at the headend.
  • the media center command processor 106 may be implemented as one or more internal circuit board assemblies, software or a sequence of programmed instructions, or a combination thereof, of the headend.
  • the remote control device 101 may output remote control signals (either keypad command or voice input) to the headend computing device 102 via the interface 103.
  • the interface 103 may be a satellite channel or a cable channel for communications in the direction from the user to the headend.
  • the media center controller 300 may include a CATV converter box for transmitting information back to the CATV service provider or headend from the remote control device 101.
  • FIGURE 4 is a detailed functional block diagram of a media center controller remote control device 101 according to at least one embodiment.
  • the remote control device 101 may include navigation buttons 401 operable to allow a user to input directional commands relative to a cursor position or to scroll among items for selection using a display, a numeric and text entry keypad 402 operable to allow a user to input numeric and text information, the microphone 120 for receiving user voice utterances, the speaker 121 for providing audio output to a user, the activation/mute switch 122 for muting controlled devices, the remote control interface 104 for sending information to controlled devices, and the interface 103 for transferring audio to and from and control data to the computing device 102.
  • the remote control device 101 may further include a 'clear' button and an 'enter' button.
  • the interface 103 may include an audio receiver portion 403, an audio transmitter portion 404, and a function key transmitter portion 405, for transferring this respective information to the computing device 102.
  • FIGURE 5 is a detailed functional block diagram of a media center controller computing device 102 according to at least one embodiment.
  • the computing device 102 may include the media center command processor 106.
  • the computing device 102 may also include standard computer components 506 such as, but not limited to, a processor, memory, storage, and device drivers.
  • the computing device 102 may be a Microsoft WindowsTM compatible PC provided by a variety of manufacturers such as the Dell Corporation of Austin, Texas.
  • the computing device 102 may also include an audio transmitter 507 for transferring synthesized speech and other audio output to the remote control device 101, an audio receiver, or other controlled device for output to a listening user.
  • the computing device 102 may also include an audio receiver 508 for receiving audio information from the remote control device 101 or a microphone.
  • the computing device 102 may include a data receiver 509 for receiving function key, keypad, or navigation key information from the remote control device 101, and for receiving keyboard or mouse input, and for receiving packet based information. Other types of received data are possible.
  • the media center command processor 106 may include the speech recognition processor 110, an audio feedback generator 505 that may include a speech synthesizer, a data/command processor 502, a sequence processor 503, and a user dialog manager 501.
  • the speech recognition processor 110 may further include the natural language processor 111.
  • each of these items comprising the media center command processor 106 may be implemented using a sequence of programmed instructions which, when executed by a processor such as the processor 506 of the computing device 102, causes the computing device 102 to perform the operations specified.
  • the media center command processor 106 may include one or more hardware items, such as a Digital Signal Processor (DSP), to enhance the execution speed and efficiency of the voice processing applications described herein.
  • the speech recognition processor 110 may receive the audio signal and convert or interpret it to one or more particular commands or to input data for further processing.
  • natural language processing may also be used for voice command interpretation. Further details regarding the interaction between the user dialog manager 501 and the speech recognition processor 110 for natural language processing are set forth in commonly assigned U.S. Patent No. 6,532,444, entitled “USING SPEECH RECOGNITION AND NATURAL LANGUAGE PROCESSING,” issued March 11, 2003 (“the '444 patent”), which is hereby incorporated by reference as if set forth fully herein.
  • the computing device 102 may be configured to include the natural language processor 111 and speech recognition processor 110 as described with respect to the functional block diagram in Figure 2 of the '444 patent.
  • the speech recognition processor 110 may include a natural language processor 111 as described herein to assist in decoding and parsing the received audio signal.
  • the natural language processor 111 may be used to identify or interpret an ambiguous audio signal resulting from unfamiliar speech phraseology, cadence, words, etc.
  • the speech recognition processor 110 and the natural language processor 111 may obtain expected speech characteristics for comparison from the grammar/sequence database 504.
  • the audio feedback generator 505 may be configured to convert stored information to a synthesized spoken word recognizable by a human listener, or to provide a pre-stored audio file for playback.
  • the data/command processor 502 may be configured to receive and process non-spoken information, such as information received via keyboard, remote 101 keypad, email, or VOIP, for example.
  • the sequence processor 503 may be configured to retrieve and execute a predefined spoken script or a predefined sequence of steps for eliciting information from a user according to a hierarchy of different command categories.
  • the sequence processor 503 may also validate the input received as being at the proper or expected step of a sequence or scenario.
  • the sequence processor 503 may obtain the sequence information from the grammar/sequence database 504.
  • the sequence processor 503 may determine an appropriate response for output to the user based on the received user input. In making this determination, the sequence processor 503 may use or consult a sequence or set of steps associated with the input and the context of the task requested or being performed by the user.
  • the user dialog manager 501 may provide management for functions such as, but not limited to: determining whether input received from an application includes an audio signal for speech recognition or is command/data input for command interpretation; requesting command validation and response identification from the sequence processor; outputting audio or display based responses to the user; requesting text to speech conversion or speech synthesis; requesting audio and/or visual output processing; and calling operating system functions and other applications as required to interact with the user.
  • the media center command processor 106 may further comprise a grammar/sequence database 504.
  • the grammar/sequence database 504 may include predefined sequences of information, each of which may be used by the sequence processor 503 to output information or responses to a user designed to elicit information from the user necessary to perform a media related function in a contextually proper manner. Further, the grammar/sequence database 504 may include state information to specify the valid states of a task, as well as the permissible state transitions.
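  • Such a state table can be as simple as a dictionary of permissible transitions, as in the sketch below; the task, states, and commands shown are invented for illustration:

```python
# Illustrative state table for one task (sending an e-mail). Keys are the
# valid states; each maps accepted commands to the state they lead to.
EMAIL_TASK_STATES = {
    "IDLE":            {"send email": "AWAIT_RECIPIENT"},
    "AWAIT_RECIPIENT": {"<contact name>": "AWAIT_BODY", "cancel": "IDLE"},
    "AWAIT_BODY":      {"<dictation>": "AWAIT_CONFIRM", "cancel": "IDLE"},
    "AWAIT_CONFIRM":   {"send": "IDLE", "edit": "AWAIT_BODY", "cancel": "IDLE"},
}

def validate(state: str, command: str, table: dict):
    """Sequence-processor check: is the command valid and in-sequence?
    Returns the next state, or None to signal a validation failure."""
    return table.get(state, {}).get(command)
```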
  • FIGURE 6 is a logical control and data flow diagram depicting the transfer of information among various modules of the media center command processor 106 according to at least one embodiment.
  • the user dialog manager 501 may receive user input from a variety of input devices via an application processor 601.
  • the application processor 601 may be configured to receive input from a user via spoken information such as, for example, audio signals received from the remote control device 101, as well as to receive non-spoken information, such as information received via keyboard manual entry, remote 101 keypad, or Voice Over Internet Protocol (VOIP), for example.
  • the user dialog manager 501 may transfer the audio signal to the speech recognition processor 110 for interpretation of the received audio signal into command or data information.
  • the user dialog manager 501 may transfer command information to the data/command processor 502 for further processing such as, for example, validation of the received input in the context of the requested task or task in process.
  • the user dialog manager 501 may also request the sequence processor 503 to validate that the received input is within an acceptable range and is received in the proper or expected sequence for an associated task. If the input is valid and in-sequence, the sequence processor 503 may identify to the user dialog manager 501 an appropriate response to be output to the user. Based on this response information, the user dialog manager 501 may request the audio feedback generator 505 to prepare an audio response to be output to the user, or may play a pre-recorded prompt. The user dialog manager 501 may also request a visual output formatter 602 to prepare a visual response to be output to the user.
  • the user dialog manager 501, the visual output formatter 602, and the application processor 601 may output the user response to an operating system 603 of the computing device 102 as well as to applications or device drivers for a variety of output devices 604 for output to the user, such that the user dialog manager 501 is logically connected through operating system services to input/output devices.
  • FIGURES 7a and 7b illustrate a flow chart of a media center control method 700 according to at least one embodiment.
  • a method 700 may commence at 705. Control may then proceed to 710, at which user input is received by an application or an application processor.
  • the input may be received from a user via spoken information such as, for example, audio signals received from the remote control device 101, but may also include non-spoken information, such as information received via keyboard manual entry, remote 101 keypad, or VOIP, for example.
  • Control may then proceed to 715, at which the application processor may transfer the user input (e.g., audio signal, commands, data) to the user dialog manager for interpretation.
  • Control may then proceed to 717, at which the user dialog manager may classify the input as audio or non-spoken input.
  • the user dialog manager may then transfer the audio signal to the speech recognition processor for interpretation of the audio signal into command or data information.
  • the user dialog manager may transfer non-spoken information to the data/command processor for further processing such as, for example, validation of the received input in the context of the requested task or task in process.
  • control may proceed to 735 at which natural language processing may be performed.
  • the natural language processing may provide for additional interpretation of the audio signal for determining the requested command, operation, or input.
  • Control may then proceed to 740, at which the speech recognition processor or data/command processor provide an indication of the interpreted command(s) or input to the user dialog manager.
  • control may then proceed to 745, at which the user dialog manager may transfer the interpreted command(s) or input to the sequence processor for validation.
  • the sequence processor may obtain command set and sequence information associated with the interpreted command(s) or input from the grammar/sequence database.
  • Control may then proceed to 755, at which the sequence processor may validate that the interpreted command or input is within an acceptable range and is received in the proper or expected sequence or dialog step for an associated task as specified in a predefined state table contained in the grammar/sequence database. If at 760 the sequence processor determines that the interpreted command or input is valid, then control may proceed to 764; otherwise, control proceeds to 762 at which the sequence processor provides an error indication to the user dialog manager indicating command/input validation failure.
  • the sequence processor may identify to the user dialog manager an appropriate response to be output to the user. Control may then proceed to 765, at which, based on this response information, the user dialog manager may prepare a response to the user. Control may then proceed to 770, at which the user dialog manager may determine if an audio output response is to be provided. If so, control may then proceed to 780 at which the user dialog manager requests the audio feedback generator to prepare an audio response to be output to the user, or plays a pre-recorded audio file. In either case, at 775 the user dialog manager may request the visual output formatter to prepare a visual response to be output to the user.
  • Control may then proceed to 785, at which the user dialog manager, the visual output formatter, and the application processor may output the user response to an operating system of the computing device as well as to applications or device drivers for a variety of output devices for output to the user.
  • a method may end.
  • the media center controller 300 may be used for control of and interaction with a variety of media devices and functions.
  • the media center controller may allow a user to command a platform (device or computer) to implement capabilities such as, but not limited to: making audio phone calls; making video phone calls; instant messaging; video messaging; sending voice recordings; reading e-mail; sending e-mail; sending text messages; managing user contacts; accessing voice mail; calendar management; playing music; playing movies; playing the radio; playing TV programs; recording TV programs; browsing the Internet; dictating documents; entering dates into a personal calendar application and having the system provide alerts for upcoming scheduled meetings and events; and launching applications.
  • the mechanism for interaction with the computer system may be a) a remote control device, b) microphone input, c) keyboard and mouse, or d) a touch-screen.
  • the remote control device may be a multi-mode input device that has a keypad for manual entry of commands transmitted to the system, as well as a microphone embedded in the remote control, allowing the user to provide spoken commands to the system.
  • the media center controller 300 may include a natural language interface that allows users to speak freely and naturally. However, manual key, touch-screen, and keyboard/mouse interfaces may also be provided as alternatives to speech.
  • the media center controller 300 may provide a mechanism, such as a logon authentication process using an interactive page, to identify the current user and allow or deny access to the system.
  • FIGURE 8 shows a top level menu interactive page 800 according to at least one embodiment.
  • the top level menu interactive page 800 may include several media function selection buttons 801.
  • a request to execute the associated media function may be received by the application processor 601.
  • the application processor 601 may forward the request to the user dialog manager 501 for processing as described with respect to FIGURES 7a and 7b herein.
  • FIGURE 9 shows a send voice recording interactive page 900 according to at least one embodiment.
  • the send voice recording interactive page 900 may provide an interface by which a user may compose and send a recorded voice message for a wireless device. Using this feature, the user can record a voice message to a recipient, such as a contact, and then send the recorded voice message to the recipient.
  • the media center controller 300 may record the voice message as a .wav file, for example.
  • the recipient listener will hear exactly what the recording user says, so misinterpretation can be avoided.
  • the recorded voice message may be delivered to the recipient's inbox as an e-mail. When the recipient opens the e-mail message, the .wav file plays the recorded message.
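  • Delivering a recorded .wav as an e-mail attachment needs only standard MIME handling, as the following sketch shows; the host, addresses, and file names are placeholders:

```python
import smtplib
from email.message import EmailMessage

def send_voice_recording(wav_path: str, recipient: str, smtp_host: str):
    """Sketch: wrap a recorded .wav file in an e-mail so it arrives in the
    recipient's inbox as described above."""
    msg = EmailMessage()
    msg["Subject"] = "Voice message"
    msg["From"] = "mediacenter@example.com"      # placeholder sender
    msg["To"] = recipient
    msg.set_content("A voice recording is attached.")
    with open(wav_path, "rb") as f:
        msg.add_attachment(f.read(), maintype="audio",
                           subtype="wav", filename="message.wav")
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```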
  • FIGURE 10 shows a send e-mail interactive page 1000 according to at least one embodiment.
  • the send e-mail interactive page 1000 may provide an interface by which a user may compose and send an e-mail message for a wireless device. Using this feature, the user may speak his message into the wireless device and his voice is converted to text as discussed herein. The e-mail message may be sent to the recipient using a network, and will appear in the recipient's inbox as if it was written on a computer.
  • the send e-mail feature requires no keypad tapping to compose a message. While the user dictates the message, he may be provided the option to edit, add more, or send.
  • FIGURE 11 shows a read e-mail interactive page 1100 according to at least one embodiment.
  • the read e-mail interactive page 1100 may provide an interface by which a user may read an e-mail message.
  • users may access their corporate or personal e-mail account via the Media center controller.
  • a POP3, IMAP, or corporate e-mail account may be required.
  • a user first enters her e-mail server name, account name, and password into the user profile portion (see FIGURE 15) of the read e-mail interactive page 1100.
  • the entered information may be stored by the computing device of the media center controller. Thereafter, when the user calls in, she will be able to check her e-mail by saying "Read E-mail.”
  • users may have the option to reply to, forward, delete, and skip e-mails.
  • FIGURE 12 shows a send text message interactive page 1200 according to at least one embodiment.
  • the send text message interactive page 1200 may provide an interface by which a user may send a text message.
  • Text messaging is a way to send short messages from a wireless device to a wireless phone.
  • users may send text messages such as, for example, SMS messages, to anyone with a messaging-capable phone.
  • the send text message interactive page 1200 may include a characters remaining field 1201 for informing the user how many text characters may be added to an in-process message.
  • the media center controller 300 may determine the number of characters remaining based on the display characteristics and capabilities of the receiving wireless device as maintained using a database.
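  • Computing the characters remaining field 1201 from such a per-device capability database might look like the sketch below; the device names and limits are invented:

```python
# Hypothetical per-device display limits; real values would come from the
# capability database maintained by the media center controller.
DEVICE_CHAR_LIMITS = {
    "generic-sms":  160,
    "legacy-pager":  80,
}

def characters_remaining(device_model: str, draft: str) -> int:
    """Value for field 1201: how many characters the user may still add."""
    limit = DEVICE_CHAR_LIMITS.get(device_model, 160)   # assume SMS default
    return max(0, limit - len(draft))

# Example: characters_remaining("generic-sms", "Running late, be there soon")
# returns 133.
```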
  • FIGURE 13 shows a voice activated dialing interactive page 1300 according to at least one embodiment.
  • the voice activated dialing interactive page 1300 may provide an interface by which a user may make voice-activated telephone calls by speaking a name, nickname, or number. Users can store all of their contact information using a user account interactive page, such as shown in FIGURE 15, of the media center controller 300. In at least one embodiment, there is no need to train the media center controller 300 to recognize each name.
  • FIGURE 14 shows a Windows MessengerTM interactive page 1400 according to at least one embodiment.
  • the Windows MessengerTM interactive page 1400 may provide an interface by which a user may communicate in real-time with other people who use Windows MessengerTM and who are signed in to the same instant messaging service.
  • the media center controller 300 may allow users to send instant messages to each other by typing; to communicate through a PC-to-PC audio connection; or to communicate through a PC-to-PC audio/video connection.
  • the media center controller 300 may provide an interface by which a user may access voice mail systems (VM) by voice command over the telephone network.
  • upon a (spoken or keyboard/keypad entered) command from the user to connect to his VM, the media center controller will connect to the VM by dialing, connecting the call, and automatically playing the proper VM Connect tone to the far-end VM system (for example, a "*" tone), and then automatically (if so selected by the user) playing the VM user account number and password, as appropriate, through DTMF.
  • this automated activity may be transparent to the user.
  • after the user states "[name] Voice Mail," the user hears Music On Hold (MOH), or feedback alerting him to wait for computer processing, until the request is recognized and the media center controller 300 has forwarded the account and password tones to the VM system.
  • the media center controller 300 may play a VM greeting from the VM system.
  • once the connection to the VM system is complete, if the user provided an incorrect account number or password, the media center controller 300 may connect through anyway and the user will hear the VM system request proper authorization keys.
  • the media center controller 300 will have connected the VM outgoing line to the user so he can hear the prompts, but the line from the user will be connected to the media center controller for voice recognition. If the user hits one or more DTMF keys, the DTMF tones may be passed through to the VM system. Note that a '##' key sequence will still disconnect from the VM system (assuming that VM systems will not use '##' for any commands).
  • media center controller 300 voicemail may provide most-often-used features such as, but not limited to: Play Voice Mail; Playback/Rewind/Repeat; Pause; Fast Forward n secs; Fast Rewind n secs; Get Next/Skip Ahead; Get Previous; Delete/Erase; Save Voice Mail; Call Sender; Help/VM Menu.
  • the system may also respond to requests such as "Help," "Tutorial," and "All Options." System response to such user requests will be analogous to how the system responds to these commands in other VUI sequences. Note that some VM systems do not support all of the features listed.
  • Unsupported features may be removed from the media center controller 300 prompts and online help, or the media center controller 300 will play a prompt to indicate that the requested feature is not supported by the active VM system.
  • a simple command, e.g., "[Get my | Call my] Voice Mail," may connect to a caller's VM.
  • for multiple VMs (e.g., carrier and corporate), the system may prompt "Say 'Verizon' or 'One Voice' or 'Home.'" From the main menu, the caller may be able to select which VM system he wants: "Voice Mail for Verizon," "Verizon VoiceMail," or "Voice Mail for One Voice."
  • the user may set up multiple VM systems, choosing from carrier, business, and home VM, interacting with each VM system externally using its own commands.
  • the fields defining a VoiceMail entry may include: friendly name, provider selection list box, password required checkbox, and password text field that is masked for security. If the 'Provider' selection is "Other", then other fields including a selection identifying the VM Connect key sequence (usually '#') may be displayed and need to be entered. In an embodiment, many VM systems appear in the dropdown listbox for 'Provider' to make the selection easier for the user.
  • the following example describes how a user of the media center controller 300 may access carrier voicemail.
  • the user may say, "Voice Mail.”
  • the media center controller 300 may respond with, for example, "Just a moment while I connect you to [your voice mail system].”
  • the media center controller 300 may then call the VM system.
  • the media center controller 300 may issue a "VM Connect" DTMF ('#' for Service Provider 1, '*' for Service Providers 2 and 3), if required, n msecs after off-hook and then DTMF the user's account number and/or password n msecs after it DTMFed the "VM Connect".
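  • The connect sequence can be expressed as a dial string in which a pause character stands in for the "n msec" delays, as sketched below; the ',' pause convention (roughly two seconds in common dialer syntax) and the field values are assumptions:

```python
def vm_dial_string(vm_number: str, connect_key: str,
                   account: str = "", password: str = "") -> str:
    """Build a dial string for the automated VM login described above.
    ',' is the conventional ~2-second dialer pause; the patent specifies
    only 'n msec' delays, so the timing here is an assumption."""
    parts = [vm_number]
    if connect_key:                   # e.g. '#' or '*', per provider
        parts.append("," + connect_key)
    if account:
        parts.append("," + account)
    if password:
        parts.append("," + password)
    return "".join(parts)

# Example: vm_dial_string("8005551234", "#", "4321", "9876")
# -> "8005551234,#,4321,9876"
```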
  • if the stored account number or password is incorrect, the media center controller 300 may not know that and will still connect, but the login to the VM system will then fail. If the VM system hangs up, the media center controller 300 may respond with, for example, "Sorry, we could not connect to your voice mail."
  • the 'Voice Mail Account Number' field for the carrier may be visible only if the user has Voice VM service provided by the carrier.
  • the 'Voice Mail Password' field for the carrier may be visible only if the user has Voice VM service provided by the carrier. For corporate or home VM access, the password field is always visible.
  • the media center controller 300 may include calendar management.
  • the media center controller 300 may allow a user to access calendar functions by speaking, "Calendar." The media center controller 300 may respond with, for example, "OK."
  • calendar main menu commands may include: Add [an] appointment; Add [a] meeting; Edit; Delete; Look up; [Main Menu, All Options, Help, Cancel, tutorial] - these are available at most response points. Also, in the following scenarios the "Undo" command always takes the user back to the previous step.
  • the user may speak, "Add an appointment.”
  • the media center controller 300 may respond with, for example, "OK. Please say the month and date of your appointment.” ⁇ 3 second delay> "You can also say today, tomorrow, or a day ofthe week.”
  • the user may reply, "October 20 th .”
  • the media center controller 300 may respond, "Monday, October 20 th .
  • the media center controller 300 may say the day, month and date followed by the year if the appointment occurs in the next year.
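  • The year-only-when-needed echo rule can be captured as in the sketch below; the phrasing mimics the sample dialog, and the formatting details are assumptions:

```python
import datetime

def speak_date(appointment: datetime.date, today: datetime.date) -> str:
    """Echo an appointment date as in the dialog above, appending the year
    only if the appointment falls in a later year than today."""
    # %d zero-pads single-digit days; strip the pad for natural speech.
    text = appointment.strftime("%A, %B %d").replace(" 0", " ")
    if appointment.year > today.year:
        text += f", {appointment.year}"
    return text

# Example (the sample dialog's date):
# speak_date(datetime.date(2003, 10, 20), datetime.date(2003, 10, 1))
# -> "Monday, October 20"
```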
  • the user may reply with one of: "10am to 11am," "10 o'clock," "10 am for 2 hours," "10 am," or "All day."
  • the media center controller 300 may respond with, for example, "October 20 th , 10 am to 1 lam.
  • the media center controller 300 may save as a .wav file as an attachment or link, as with VR, and then say, "Please say the location.” To which the user may reply, "Scripps Clinic.”
  • the media center controller 300 may save as a .wav file as an attachment or link, as with VR. Variations of this scenario are possible.
  • the media center controller 300 may allow the user to "look up” his calendar for a given day or period and, by interacting with the media center controller 300, receive his calendar schedule for that period.
  • the media center controller 300 may say, "MV: You have <#> appointment(s) today, October 21st.
  • First appointment is <appointment>.
  • Second appointment is <appointment>."
  • the user will have the option to choose where he/she would like the calendar alerts sent (e.g., mobile phone, e-mail at work, e-mail at home) under the preferences section of the user accounts interactive page of FIGURE 15.
  • the OutlookTM default will be used to determine when the alert is sent out.
  • Visual indications for calendar alerts may also be provided.
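The date and time replies quoted in this scenario suggest a small utterance grammar. The following sketch, with invented helper names and covering only the quoted utterance shapes, shows one plausible way such replies could be normalized, including the next-year rule noted above.

```python
import re
from datetime import date, time

MONTHS = {m.lower(): i for i, m in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

def parse_date(utterance, today=None):
    """Parse 'October 20th' style utterances; roll into next year if past."""
    today = today or date.today()
    m = re.match(r"(\w+)\s+(\d{1,2})(?:st|nd|rd|th)?$", utterance.strip(), re.I)
    if not m or m.group(1).lower() not in MONTHS:
        return None
    month, day = MONTHS[m.group(1).lower()], int(m.group(2))
    year = today.year if (month, day) >= (today.month, today.day) else today.year + 1
    return date(year, month, day)

def parse_time_range(utterance):
    """Parse '10am to 11am' into (start, end) times; None for 'All day'."""
    if utterance.strip().lower() == "all day":
        return None
    m = re.match(r"(\d{1,2})\s*(am|pm)\s*to\s*(\d{1,2})\s*(am|pm)", utterance.strip(), re.I)
    if not m:
        return None
    def to_time(h, ap):
        return time(int(h) % 12 + (12 if ap.lower() == "pm" else 0))
    return (to_time(m.group(1), m.group(2)), to_time(m.group(3), m.group(4)))

if __name__ == "__main__":
    print(parse_date("October 20th"), parse_time_range("10am to 11am"))
```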
  • FIGURE 15 shows a user account interactive page 1500 according to at least one embodiment.
  • the user account interactive page 1500 may provide an interface by which a user may create a profile with his preferences.
  • users will click on the New User button 1501. They will be asked to provide their first and last name, a greeting (how they want the media center controller to greet them at start-up), an e-mail address, and a voice model (male or female).
  • they will also have the option to choose BVI setup, phone setup, e-mail setup, preferences, training, save, delete, or cancel.
  • FIGURE 16 shows a user contacts interactive page 1600 according to at least one embodiment.
  • the user contacts interactive page 1600 may provide an interface by which users may access all of their contacts from any controlled device that can access the media center.
  • the media center controller 300 may provide users voice access to all their important contact names and phone numbers so they don't have to carry an address book or PDA. Users can also add or edit contact information via voice input.
  • each of the FIGURES 8-16 may include certain interactive display items in addition to those described above beneficial to a user of a media center.
  • FIGURES 9 and 12-16 show an "album cover" icon in the lower left corner indicating the artist, album, song track, and length of play time remaining for an audio music selection.
  • the media center controller 300 may support a variety of media center functions and applications. Further details regarding the ability of the media center controller 300 to support bidirectional VOIP, PC-to-phone, and PC-to-PC communication are set forth below.
  • the media center controller 300 may use a voice command capability to initiate PC-to-PC communications such as, for example, an Instant Messaging (IM) session, or VOIP communications.
  • FIGURES 17a and 17b are a flowchart of a method 1700 for VOIP or PC-to-PC applications using the media center controller 300 (a control-flow sketch follows this method description below). Referring to FIGURE 17a, a method 1700 may commence at 1705. Control may then proceed to 1710, at which, while the top level menu (see, for example, FIGURE 8) is displayed, the user may actuate the mute switch on a user interaction device (for example, user interaction device 101).
  • Control may then proceed to 1715, at which in response to receiving a signal from the user interaction device that the mute switch has been actuated, the media center command processor may output a signal(s) to one or more controlled devices to mute the audio from the controlled devices.
  • Control may then proceed to 1720, at which the user may speak a request for an audio or audio/video messaging session.
  • the spoken request may be received by the user interaction device and provided therefrom to the media center command processor as described herein.
  • Control may then proceed to 1725, at which the media center command processor may process the spoken request as set forth in FIGURES 7a and 7b herein.
  • Control may then proceed to 1730, at which a messaging interactive page may be displayed (see, for example, FIGURE 14). Control may then proceed to 1735, at which the user may select, via spoken request or manual selection, the person with whom he wants to chat, in accordance with the processing described with respect to FIGURES 7a and 7b herein. Control may then proceed to 1740, at which the user may select, via spoken request or manual selection, to commence the chat session (e.g., selects the "Start Talking" option), in accordance with the processing described with respect to FIGURES 7a and 7b herein.
  • Control may then proceed to 1745 of FIGURE 17b, at which the media center command processor may establish an Internet connection with a VOIP communication server to request an audio or audio/visual connection to the selected party.
  • Control may then proceed to 1750, at which if the selected party accepts the request for a conversation, a bi-directional VOIP channel may be opened between the media center command processor and the user and the called party. A conversation may then ensue.
  • control may then proceed to 1755 of FIGURE 17b at which the media center command processor may establish an Internet connection with another computing device such as, for example, a PC, to request an audio or audio/visual connection to the selected party.
  • control may then proceed to 1760, at which if the selected party accepts the request for a conversation, a bi-directional IP channel may be opened between the media center command processor and the user and the called party.
  • Control may then proceed to 1765, at which the conversation may be terminated by the called party, or by the media center command processor user through selection, via spoken request or manual selection, of a terminate conversation option via, for example, the messaging screen (see, for example, FIGURE 14), in accordance with the processing described with respect to FIGURES 7a and 7b herein. Control may then proceed to 1770, at which a method may end.
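A compressed, runnable caricature of method 1700 follows. Every class and method name (VoipServer, MessagingApp, and so on) is invented for illustration; the numbered comments map each call back to the steps of FIGURES 17a and 17b.

```python
class VoipServer:
    """Stands in for the VOIP communication server of step 1745."""
    def request_session(self, contact):
        print(f"requesting audio/video session with {contact}")
        return True  # assume the selected party accepts (step 1750)

class MessagingApp:
    def __init__(self, server):
        self.server = server
        self.muted = []

    def mute_controlled_devices(self, devices):      # steps 1710-1715
        self.muted = list(devices)
        print("muted:", ", ".join(self.muted))

    def start_chat(self, contact):                   # steps 1735-1750
        if self.server.request_session(contact):
            print(f"bi-directional VOIP channel open with {contact}")

    def end_chat(self):                              # steps 1765-1770
        print("conversation terminated; un-muting:", ", ".join(self.muted))
        self.muted = []

if __name__ == "__main__":
    app = MessagingApp(VoipServer())
    app.mute_controlled_devices(["television", "stereo"])
    app.start_chat("Alice")
    app.end_chat()
```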
  • the media center controller 300 may use a voice command capability to initiate PC-to-phone communications.
  • FIGURES 18a and 18b are a flowchart of a method 1800 for PC-to-phone applications using the media center controller 300. Referring to FIGURE 18a, a method 1800 may commence at 1805. Control may then proceed to 1810, at which, while the top level menu (see, for example, FIGURE 8) is displayed, the user may actuate the mute switch on a user interaction device (for example, user interaction device 101).
  • Control may then proceed to 1815, at which in response to receiving a signal from the user interaction device that the mute switch has been actuated, the media center command processor may output a signal(s) to one or more controlled devices to mute the audio from the controlled devices.
  • Control may then proceed to 1820, at which the user may speak a request to make a telephone call.
  • the spoken request may be received by the user interaction device and provided therefrom to the media center command processor as described herein.
  • Control may then proceed to 1825, at which the media center command processor may process the spoken request as set forth in FIGURES 7a and 7b herein.
  • Control may then proceed to 1830, at which a make phone call interactive page may be displayed (see, for example, FIGURE 13).
  • Control may then proceed to 1835, at which the user may select, via spoken request or manual selection, the person with whom he wants to speak or the telephone to which he wants to connect, in accordance with the processing described with respect to FIGURES 7a and 7b herein.
  • Control may then proceed to 1840, at which the user may select, via spoken request or manual selection, to initiate the telephone call (e.g., selects the "Dial" option), in accordance with the processing described with respect to FIGURES 7a and 7b herein.
  • Control may then proceed to 1845 of FIGURE 18b, at which the media center command processor may establish an Internet connection with a VOIP communication server to request the telephone call to the selected party.
  • Control may then proceed to 1850, at which, if the selected party answers the incoming call, a bi-directional voice communication channel may be opened between the media center command processor and the user and the called party.
  • the called party may be accessed via the PSTN.
  • the called party may be accessed via an IP enabled phone, handset or communication device.
  • the media center command processor may communicate with the called party using VOIP via a VOIP gateway for conversion between IP and PSTN traffic (a routing sketch follows this method description below).
  • the PSTN may also be used for voice connections with non-VOIP enabled called parties. A conversation may then ensue.
  • Control may then proceed to 1855, at which the call may be terminated by the called party, or by the media center controller user through selection, via spoken request or manual selection, of a terminate call option via, for example, the make phone call interactive page (such as, for example, FIGURE 13), in accordance with the processing described with respect to FIGURES 7a and 7b herein. Control may then proceed to 1860, at which a method may end.
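The endpoint choice described in the preceding steps, a direct VOIP channel for IP-enabled called parties versus a VOIP gateway for PSTN parties, reduces to a simple routing decision. The contact fields and function name below are hypothetical.

```python
def route_call(contact):
    """Return a (transport, destination) pair for the outgoing call."""
    if contact.get("ip_enabled"):
        # IP phone/handset: open a direct bi-directional VOIP channel.
        return ("voip", contact["sip_address"])
    # Non-VOIP party: send VOIP traffic to a gateway for PSTN conversion.
    return ("voip-gateway", contact["phone_number"])

if __name__ == "__main__":
    print(route_call({"name": "Bob", "ip_enabled": True,
                      "sip_address": "sip:bob@example.com"}))
    print(route_call({"name": "Carol", "ip_enabled": False,
                      "phone_number": "+16195550100"}))
```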
  • a media center controller that includes a computing device having a user dialog manager to process commands and input for controlling one or more controlled devices of a media center.
  • the system and methods may include the capability to receive and respond to commands and input from a variety of sources, including spoken commands from a user, for remotely controlling one or more electronic devices.
  • the system and methods may also include a user interaction device capable of receiving spoken user input and transferring the spoken input to the computing device.

Abstract

A system and methods for a media center controller. The system and methods include a computing device having a user dialog manager to process commands and input for controlling one or more controlled devices of the media center. The system and methods include the capability to receive and respond to commands and input from a variety of sources, including spoken commands from a user, for remotely controlling one or more electronic devices and to perform, in response to the input received from the handheld device, speech recognition processing, voice over Internet Protocol communications, instant messaging, electronic mail messaging, or control of one or more controlled devices. The system and methods may also include a user interaction device capable of receiving spoken user input and transferring the spoken input to the computing device.

Description

MEDIA CENTER CONTROLLER SYSTEM AND METHOD
[0001] This application claims the benefit of U.S. Provisional Application No. 60/490,937, filed July 30, 2003.
[0002] This disclosure contains information subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure or the patent as it appears in the U.S. Patent and Trademark Office files or records, but otherwise reserves all copyright rights whatsoever.
BACKGROUND
1. Field of Invention
[0003] The present invention relates to media center control, and, more particularly, to media center control by a user.
2. General Background
[0004] Remotely controlled devices are commonplace today. Remote control devices typically have multiple buttons, each of which, when actuated by a user, may send a remote command to the remotely controlled device, causing the controlled device to change its state of operation (e.g., change television channel or volume setting). Remote control devices may control a single device or multiple devices. A universal remote control has been developed that can control multiple different devices from different commercial manufacturers.
[0005] However, remote controls can be difficult to use in darkened rooms or under other conditions in which the button labels may be difficult to ascertain and, in any case, require the user to locate the button corresponding to the desired function. For example, users of a media center in a home or office may experience difficulty in attempting to control media devices or perform media related tasks using a remote control under conditions otherwise favorable to the media experience (e.g., seated or standing in a darkened room while directing attention to a display or screen). In some cases, voice command input may provide an easier user input mechanism.
SUMMARY
[0006] Embodiments of the present invention may include a media center controller for controlling and providing user access to multiple devices and applications of a media center. Embodiments may also include systems and methods for transmitting and receiving speech commands from a user for remotely controlling one or more devices or applications. In at least one embodiment, a remote control device may be used as a voice command access point to control a variety of media related functions of a media center.
[0007] Embodiments may further include a media center controller that allows users to control various media center activities via manual devices, such as keypad or keyboard, or by voice command, which may include speaking naturally to their computers. Such activities may include playing music and DVDs, launching applications, dictating letters, browsing the Internet, using instant messaging, reading and sending electronic mail, and placing phone calls.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] The invention claimed and/or described herein is further described in terms of exemplary embodiments. These exemplary embodiments are described in detail with reference to the drawings. These embodiments are non-limiting exemplary embodiments, in which like reference numerals represent similar structures throughout the several views of the drawings, and wherein:
[0009] FIGURE 1 is a system functional block diagram according to at least one embodiment;
[0010] FIGURE 2 is a flow chart illustrating a method according to at least one embodiment;
[0011] FIGURE 3 is a detailed functional block diagram of at least one embodiment of a media center controller according to the invention; [0012] FIGURE 4 is a detailed functional block diagram of a media center controller remote control device according to at least one embodiment;
[0013] FIGURE 5 is a detailed functional block diagram of a media center controller computing device according to at least one embodiment; and
[0014] FIGURE 6 is a logical control and data flow diagram depicting the transfer of information among various modules comprising the media center command processor according to at least one embodiment;
[0015] FIGURES 7a and 7b are a flow chart of a media center control method according to at least one embodiment;
[0016] FIGURE 8 shows a top level menu interactive page according to at least one embodiment;
[0017] FIGURE 9 shows a send voice recording interactive page according to at least one embodiment;
[0018] FIGURE 10 shows a send e-mail interactive page according to at least one embodiment;
[0019] FIGURE 11 shows a read e-mail interactive page according to at least one embodiment;
[0020] FIGURE 12 shows a send text message interactive page according to at least one embodiment;
[0021] FIGURE 13 shows a voice activated dialing interactive page according to at least one embodiment;
[0022] FIGURE 14 shows a messenger interactive page according to at least one embodiment;
[0023] FIGURE 15 shows a user account interactive page according to at least one embodiment;
[0024] FIGURE 16 shows a user contacts interactive page according to at least one embodiment;
[0025] FIGURES 17a and 17b are a flowchart of a method for voice over Internet Protocol (VoIP) or Personal Computer (PC)-to-PC applications in an embodiment; and
[0026] FIGURES 18a and 18b are a flowchart of a method 1800 for PC-to-phone applications in an embodiment.
DETAILED DESCRIPTION
[0027] Described herein are a system and methods for a media center controller. The system and methods may include a computing device having a user dialog manager to process commands and input for controlling one or more controlled devices or applications. The system and methods may include the capability to receive and respond to commands and input from a variety of sources, including manual entry commands and spoken commands from a user, for remotely controlling one or more electronic devices. In at least one embodiment, the system and methods may also include a user interaction device capable of receiving spoken user input and transferring the spoken input to the computing device. The user interaction device may be a handheld device.
[0028] Accordingly, embodiments of the present invention may include a system and method for interacting with a computing device using a remote control device. Alternatively, other remote control devices may be used such as, for example, a Universal Remote Control device, which transmits utterances (i.e., spoken information) to a receiving computer device that may perform speech processing and natural language processing. The remote control device may include a microphone, and optionally a speaker, along with an optional microphone On/Off button. When actuated, the microphone On/Off button may mute the device(s) controlled by the remote control device and begin transmitting the user's utterance to the receiving computing unit. When released, the microphone On/Off button may deactivate the microphone and un-mute the affected device(s) (such as, for example, television, stereo).
[0029] In at least one embodiment, the receiving computing unit may provide the audio transmission from the remote control device to a speech processing application and may transmit audio back to the remote control device for playback to the user using the speaker.
[0030] FIGURE 1 is a system functional block diagram of at least one embodiment. Referring to FIGURE 1, a system 100 may include a remote control device 101 which may be coupled to a computing device 102 using an interface 103. The remote control device 101 may also include a remote control interface 104 for transmitting commands to one or more controlled devices 105. In at least one embodiment, the remote control device may be a media center controller remote control unit. A media center command processor 106 may be coupled to or included with the computing device 102 and provided in communication with the remote control device 101 using the interface 103. Furthermore, in at least one embodiment the computing device 102 may be coupled to one or more controlled devices 105. In at least one embodiment, the computing device 102 may be a media center controller computing device.
[0031] In at least one embodiment, the computing device 102 may include a speech recognizer 110 and a natural language processor 111. The speech recognizer 110 and the natural language processor 111 may be implemented, for example, using a sequence of programmed instructions executed by the computing device 102. Alternatively, the speech recognizer 110 and the natural language processor 111 may comprise multiple portions of their respective applications, each of the portions executing on one or more of the computing device 102 and the media center command processor 106. In at least one embodiment, no training sequences are required by the speech recognizer 110.
[0032] An example of a natural language processor is given in commonly assigned U.S. Patent No. 6,434,524, entitled "OBJECT INTERACTIVE USER INTERFACE USING SPEECH RECOGNITION AND NATURAL LANGUAGE PROCESSING," issued August 13, 2002 ("the '524 patent"). In particular, the computing device 102 may be configured to include the natural language processor 111 as described with respect to the functional block diagram in Figure 2 of the '524 patent and at col. 6, lines 13-67, which is hereby incorporated by reference as if set forth fully herein.
[0033] In an embodiment, the speech recognizer 110 may be configured to determine one or more remote control commands corresponding to the received audio signal. The speech recognizer 110 may include a speech processing capability that detects features of the audio signal sufficient to identify the corresponding remote commands or user requests or input. The mapping of the features to remote commands/requests may be maintained at the computing device 102 using, for example, non- volatile storage media such as a hard drive. Upon determining the remote command(s) or input, the computing device 102 sends the corresponding response(s) to the remote control device 101 using the interface 103. [0034] In an embodiment, the audio signal may be input to the natural language processor 111 for extraction of the relevant portions of the audio signal required for the speech recognizer 110 to determine the associated command or input. The natural language processor 111 may receive the audio signal prior to the speech recognizer 110, at the same time as the speech recognizer 110, or only if the speech recognizer 110 first fails to confidently determine the corresponding remote command. Upon receiving the remote command or interpreted information from the computing device 102, the remote control device 101 may output the remote command to the affected controlled device(s) 105 using the remote control interface 104.
[0035] In an embodiment, one or both of the speech recognizer 110 and the natural language processor 111 may be implemented in the media center command processor 106 which is coupled to or included with the computing device 102. In particular, the media center command processor 106 may include hardware and software components to perform the speech analysis described above, thereby reducing the processing load and processing bandwidth requirements for the computing device 102. The media center command processor 106 may be operably coupled to the computing device 102 using a variety of known interfacing mechanisms (e.g., USB, Ethernet, RS-232, parallel port, IEEE 802.11). In at least one embodiment, the media center command processor 106 may be coupled to the controlled device(s) 105 using a network 107. The media center command processor 106 may be a set top box. Alternatively, the media center command processor 106 may be implemented as one or more internal circuit board assemblies, software or a sequence of programmed instructions, or a combination thereof, of the computing device 102. Alternatively, the media center command processor 106 may be implemented using hardware and software in the remote control device 101 or one or more of the controlled devices 105.
[0036] In an embodiment, the computing device 102 and media center command processor 106 may be implemented using one or more computing platforms of a headend system for cable or satellite television or media signal distribution. In particular, the computing device 102 may be provided using one or more servers, which may be PC-based servers, at the headend. In these embodiments, the media center command processor 106 may be implemented as one or more internal circuit board assemblies, software or a sequence of programmed instructions, or a combination thereof, of the headend. The remote control device 101 may output remote control signals (either keypad command or voice input) to the headend computing device 102 via the interface 103. In these embodiments, the interface 103 may be a satellite channel or a cable channel for communications in the direction from the user to the headend. A Cable Television (CATV) converter box may be provided for transmitting information back to the CATV service provider or headend from the remote control device 101.
[0037] In at least one embodiment, the remote control device 101 may include buttons which, when actuated by a user, cause the transmission of remote commands or status inquiries to the controlled device(s) 105 using the remote control interface 104. Furthermore, the remote control device 101 may be capable of controlling a single device, multiple devices, or may be a Universal Remote Control device capable of controlling multiple controlled devices 105 provided by different manufacturers. Alternatively, the remote control device 101 may be a Bluetooth™ capable headset. The remote control device 101 may allow user selection of a particular controlled device 105 to be controlled using the remote control device 101. In an embodiment, the remote control device 101 may include at least one processor such as, but not limited to, a microcontroller implemented using an integrated circuit. In some embodiments, the remote control device 101 may simultaneously send or broadcast information to more than one controlled device 105. In an embodiment, the remote control device 101 may include a microphone 120, a speaker 121, and a switch 122 operable to actuate the microphone and transmit information using the interfaces 103 and 104.
[0038] In at least one embodiment, actuation of the switch 122 may cause information to be sent to one or more controlled devices 105 using the remote interface 104 that causes the audio output of those devices 105 to be muted while the switch is actuated. Alternatively, the information or command that causes the muting may be sent from the media center command processor 106 or the computing device 102 directly to the controlled device 105. While the switch 122 is actuated, the interface 103 transmits the audio signal of the audio received from the microphone 120 (spoken by a user, for example) to the computing device 102. The audio signal may be encoded or compressed using a variety of compression algorithms (e.g., coder-decoder (CODEC), vocoding) to reduce the amount of information transferred using the interface 103, and its attendant bandwidth and data rate requirements. In an embodiment, the remote control device 101 may be configured to extract particular features from the audio received from the microphone 120. [0039] In at least one embodiment, the remote control device 101 may include a pushbutton by which a user may actuate and release the switch 122. Alternatively, the switch 122 may be voice activated. Upon the user releasing the switch 122, the remote control device 101 may deactivate the microphone 120, cease sending information to the computing device 102 via interface 103, and send an "un-mute" command via remote control interface 104 or interface 107 to the controlled devices 105. This approach reduces the power consumed by the remote control device 101. Alternatively, the mute and un-mute signals may be sent by the computing device 102, in which case the computing device 102 may also include a remote control interface 104; or, the mute and un-mute signals may be sent by the media center command processor 106 via the interface 107, or by the remote interface 104 (if present at the media center command processor 106).
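A minimal push-to-talk sketch of this mute/stream/un-mute behavior is shown below; the class, method, and command names are invented, and the real interfaces 103 and 104 are reduced to callables.

```python
class RemoteControl:
    def __init__(self, controlled_devices, send_remote, send_audio):
        self.devices = controlled_devices
        self.send_remote = send_remote   # stands in for remote control interface 104
        self.send_audio = send_audio     # stands in for interface 103
        self.mic_active = False

    def on_switch_pressed(self):
        # Mute every controlled device, then start capturing the microphone.
        for device in self.devices:
            self.send_remote(device, "MUTE")
        self.mic_active = True

    def stream(self, audio_chunk):
        # Forward microphone audio only while the switch is held.
        if self.mic_active:
            self.send_audio(audio_chunk)  # possibly CODEC-compressed

    def on_switch_released(self):
        # Deactivate the microphone (saving power) and un-mute the devices.
        self.mic_active = False
        for device in self.devices:
            self.send_remote(device, "UNMUTE")

if __name__ == "__main__":
    rc = RemoteControl(["tv", "stereo"],
                       lambda d, c: print(f"{c} -> {d}"),
                       lambda a: print(f"audio: {a!r}"))
    rc.on_switch_pressed()
    rc.stream(b"\x01\x02")
    rc.on_switch_released()
```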
[0040] In addition, the remote control device 101 may include one or more programmable switches and a coder that transmits codes over the remote control interface 104 based on the switch settings as determined by a switch state to code mapping maintained by the remote control device 101. In an embodiment, the switches may be programmed by a user interacting with a user interface of the remote control device 101. Alternatively, the switches may be programmed by the computing device 102 using the interface 103. Alternatively, the switch state to code mapping may be maintained by the computing device 102 and downloaded to the remote control device 101 using the interface 103.
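The switch-state-to-code mapping of paragraph [0040] might be little more than a lookup table, as in the following sketch; the switch names and code values are made-up placeholders, and in an embodiment the table could equally be downloaded from the computing device 102 over the interface 103.

```python
# Hypothetical switch-state-to-code table; codes are arbitrary examples.
switch_state_to_code = {
    ("switch_1", "on"):  0x20DF10EF,   # e.g., power toggle for device A
    ("switch_1", "off"): 0x20DF906F,
    ("switch_2", "on"):  0x10EFD827,   # e.g., input select for device B
}

def encode(switch, state):
    """Look up the IR/RF code to transmit for a switch actuation."""
    code = switch_state_to_code.get((switch, state))
    if code is None:
        raise KeyError(f"no code programmed for {switch}/{state}")
    return code

print(hex(encode("switch_1", "on")))
```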
[0041] In an embodiment, the computing device 102 may be implemented using a personal computer configured to execute applications compatible with the Windows™ operating system available from Microsoft Corporation of Redmond, Washington. For example, in at least one embodiment, the computing device 102 may execute the Microsoft™ Windows Media Center™ operating system. Other embodiments are possible, including other operating systems and computing platforms. For example, the computing device may be implemented using a game device console (e.g., X-Box™, Sony Playstation™ or Playstation2™, or GameCube™), a television set top box, a digital video recorder (e.g., TiVo™, Replay TV™), a home theater sound processor, or other processing device. In at least one embodiment, all or a portion of the systems and methods described herein may be implemented as a sequence of programmed instructions executing on the computing device 102 along with and in cooperation with other processors or computing platforms. In at least one embodiment, the computing device 102 may include a sound card/Universal Serial Bus (USB) port for input of audio signal.
[0042] In an embodiment, the computing device 102 may include an audio response capability. In particular, upon receiving the audio signal from the remote control device 101, the computing device 102 may provide an audio response to the remote control device 101 using the network 103. Upon receiving the audio response information from the computing device 102, the remote control device 101 may output the audio response to the user using the speaker 121. Accordingly, the audio response information may be synthesized speech provided by the computing device 102. Alternatively, the audio response information may be stored actual speech information from a human voice, or fragments thereof, or may be generated as required using a speech synthesis application. In an embodiment, the audio response information may produce audio confirming to the user that the operation requested in the audio signal (e.g., spoken request from the user) has been accomplished. For example, if the user utters "TV channel 27," upon the system changing the television controlled device to channel 27 as described herein, an audio response stating "TV Channel 27" may be played to the user over speaker 121. Other messages are possible, such as "Television 1 changed to channel 27," etc.
[0043] Alternatively, these audio response functions may be performed by the computing device 102 without involving the remote control device 101, by using, for example, the interface 107. In such embodiments, the audio response may be played from a speaker on the computing device 102 (the computing device 102 having a sound card) or from a speaker of one or more of the controlled devices 105. Alternatively, the media center command processor 106 may provide some or all of these audio response functions, or may share them with the computing device 102.
[0044] Controlled devices 105 may include electronic devices produced by different manufacturers such as, for example, but not limited to, televisions, stereos, video cassette recorders (VCRs), Compact Disc (CD) players/recorders, Digital Video Disc (DVD) players/recorders, TiVo™ units, satellite receivers, cable boxes, television set-top boxes, the Internet and devices provided in communication with the Internet, tuners, and receivers. The remote control interface 104 may include, for example, an InfraRed (IR) wireless transceiver for transmission, and possibly reception, of command and status information to and from the controlled devices 105, as is commonly practiced. However, the remote control interface 104 may be implemented according to a variety of techniques in addition to IR including, without limitation, wireline connection, a Radio Frequency (RF) interface, telephone wiring carried signals, BlueTooth™, Firewire™, 802.11 standards, cordless telephone, or wireless telephone or cellular or digital over-the-air interfaces.
[0045] Alternatively, the computing device 102 may be configured as an Interactive Voice Response (IVR) system. In particular, the computing device may be configured to support a limited set of IVR command-response pairs such as, for example, command-responses that accomplish pattern matching for the received audio signal without semantic recovery.
[0046] The interfaces 103 and 107 may be an electronic network capable of conveying information such as, for example, an RF network. Examples of such an RF network include Frequency Modulation (FM), IEEE 802.11 standard and variations, IR, Firewire™, and Bluetooth™. Further, the interface 103 may be a satellite communication channel or a Cable Television (CATV) channel. Other networks are possible.
[0047] The remote control device 101 may include navigation keys 301, a numeric and text entry keypad 302, a microphone 120, a speaker 121, a mute button or switch 122, an interface 103, and a remote control interface 104. The interface 103 may further include an audio receiver 303, an audio transmitter 304, and a function key transmitter 305.
[0048] In another embodiment, the telephone customer premises equipment (CPE) may be used to obtain and process a user's audio utterances for remote control. In particular, the remote control device 101 may be implemented using a telephone handset (which may be a wireline or a cordless or cellular/mobile handset or headset) having the speech processing capabilities described herein. Audio signal may be transmitted from the telephone handset to the computing device 102 using the existing household telephone wiring. The handset microphone and speaker may be used for obtaining the user's utterances and for playback of the audio response, respectively. The remote command information received from the computing device 102 may be transmitted by the handset to the controlled device(s) 105 using the interface 103 included in the handset for this purpose. In addition, the computing device 102 may output audio queries to the user via the handset speaker (e.g., "What do you want to do?").
[0049] FIGURE 2 is a flow chart of a method 200 according to at least one embodiment. Referring to FIGURE 2, the method 200 may commence at 202. Control may then proceed to 204 at which the user activates the microphone button on the remote control device. In response, at 206, the remote control unit may mute the controlled device(s). Upon the user uttering a command at 208, the remote control device microphone may output (for example, by streaming) the audio uttered by the user at 210 and transmit the audio signal to the computing device at 212.
[0050] Upon the user releasing the microphone button at 214, the remote control device (or the computing device or media center command processor) may unmute the controlled device(s) at 216.
[0051] Upon receiving the audio signal, the computing device may perform speech processing as described above to determine the associated remote command(s) at 218. The computing device may then transmit the corresponding response (which may be a device command) to the remote control device at 220. Or, if the input is a non-spoken input, a keypad or keyboard input may be received at 219. Control may then proceed to 228, at which the computing device provides the command to the controlled device(s).
[0052] The computing device may also transmit an audio response to an audio output device at 222. Upon receiving the audio response, the audio output device may play the audio response to the user using a speaker at 226. Alternatively, the computing device may output the audio response directly to the controlled device to play over a speaker of the controlled device. At 230, the method may end.
[0053] In at least one embodiment, the above described system and method may be used for media center control. A media center may be any system that includes a processor configured to provide control and use of multiple media devices or capabilities. Examples of such media devices include, but are not limited to, Television (TV), cable TV, direct broadcast satellite, stereo, Video Cassette Recorder (VCR), Digital Video Disc (DVD), Compact Disc (CD), TiVo™ recorder, World Wide Web (WWW) browser, electronic mail client, telephone, and voicemail. One or more of these media devices may be implemented using application software programmed instructions executing on a personal computer or computer platform.
[0054] FIGURE 3 is a detailed functional block diagram of a media center controller 300 according to at least one embodiment. Referring to FIGURE 3, the media center controller 300 may include a computing device 102, which may be a media center controller computing device. In at least one embodiment, the computing device 102 may be coupled to a remote control device 101, which may be a media center controller remote control device, for receiving and transmitting audio information and for receiving control data from the remote control device 101. As shown in FIGURE 3, the computing device 102 may include the media center command processor 106. In at least one embodiment, the media command processor 106 may include a speech transceiver capability. Further, the computing device 102 for media center controller 300 may be operably coupled to a variety of media devices as described above. As shown in FIGURE 3, the media center controller computing device 102 may be operably coupled to, for example, but not limited to, a radio signal source 301 for receiving radio broadcast signals, a Television (TV) signal source 302 for receiving TV broadcast signals, a satellite signal source 303 for receiving satellite transmitted TV and data signals, including direct broadcast satellite TV and data signals, a CATV converter box 313 for communication to and from a CATV headend, and to a private or public packet switched network 304 such as, for example, the Internet, for receiving and transmitting a variety of packet based information to other PCs or other communications devices. Packet based information transferred by the computing device 102 includes, but is not limited to, electronic mail (email) messages in accordance with SMTP, Instant Messages (IM), Voice-Over-Internet-Protocol (VOIP) information, HTML and XML formatted pages such as, for example, WWW pages, and other packet or IP based data.
[0055] Further media devices to which the media center controller computing device 102 may be operably coupled include, for example, but are not limited to, a wireline or cordless access telephone network 305 such as the Public Switched Telephone Network (PSTN), and wireless or cellular telephone systems. In such embodiments, the computing device 102 may be coupled to a telephone handset 315, which may be a cordless or wireless handset.
[0056] In addition, in an embodiment, the computing device 102 may be optically or electronically coupled to a keyboard and mouse 311 for receiving command and data input, as well as to a camera 312 for receiving video input. The computing device 102 may also be coupled to a variety of known video devices, optionally using a video receiver 306, for output of video or image information to a television 307, computer monitor 308, or other display device. The computing device 102 may also be coupled to a variety of known audio devices, optionally using an audio receiver 309, for output of audio information to one or more speakers 310. Furthermore, the media center controller 300 may include an audio file/track player to play audio files requested by the user; and an audio/visual player to play audio/visual files or tracks requested by the user.
[0057] With respect to FIGURE 3, in an embodiment, the computing device 102 and media center command processor 106 may be implemented using one or more computing platforms of a headend system for cable or satellite television or media signal distribution. In particular, the computing device 102 may be provided using one or more servers, which may be PC-based servers, at the headend. In these embodiments, the media center command processor 106 may be implemented as one or more internal circuit board assemblies, software or a sequence of programmed instructions, or a combination thereof, of the headend. The remote control device 101 may output remote control signals (either keypad command or voice input) to the headend computing device 102 via the interface 103. In these embodiments, the interface 103 may be a satellite channel or a cable channel for communications in the direction from the user to the headend. Further, the media center controller 300 may include a CATV converter box for transmitting information back to the CATV service provider or headend from the remote control device 101.
[0058] FIGURE 4 is a detailed functional block diagram of a media center controller remote control device 101 according to at least one embodiment. Referring to FIGURE 4, the remote control device 101 may include navigation buttons 401 operable to allow a user to input directional commands relative to a cursor position or to scroll among items for selection using a display, a numeric and text entry keypad 402 operable to allow a user to input numeric and text information, the microphone 120 for receiving user voice utterances, the speaker 121 for providing audio output to a user, the activation/mute switch 122 for muting controlled devices, the remote control interface 104 for sending information to controlled devices, and the interface 103 for transferring audio to and from and control data to the computing device 102. The remote control device 101 may further include a 'clear' button and an 'enter' button. In an embodiment, the interface 103 may include an audio receiver portion 403, an audio transmitter portion 404, and a function key transmitter portion 405, for transferring this respective information to the computing device 102. [0059] FIGURE 5 is a detailed functional block diagram of a media center controller computing device 102 according to at least one embodiment. Referring to FIGURE 5, the computing device 102 may include the media center command processor 106. The computing device 102 may also include standard computer components 506 such as, but not limited to, a processor, memory, storage, and device drivers. In an embodiment, the computing device 102 may be a Microsoft Windows™ compatible PC provided by a variety of manufacturers such as the Dell Corporation of Austin, Texas. The computing device 102 may also include an audio transmitter 507 for transferring synthesized speech and other audio output to the remote control device 101, an audio receiver, or other controlled device for output to a listening user. The computing device 102 may also include an audio receiver 508 for receiving audio information from the remote control device 101 or a microphone. Further, the computing device 102 may include a data receiver 509 for receiving function key, keypad, or navigation key information from the remote control device 101, and for receiving keyboard or mouse input, and for receiving packet based information. Other types of received data are possible.
[0060] In at least one embodiment, the media center command processor 106 may include the speech recognition processor 110, an audio feedback generator 505 that may include a speech synthesizer, a data/command processor 502, a sequence processor 503, and a user dialog manager 501. The speech recognition processor 110 may further include the natural language processor 111. In an embodiment, each of these items comprising the media center command processor 106 may be implemented using a sequence of programmed instructions which, when executed by a processor such as the processor 506 of the computing device 102, causes the computing device 102 to perform the operations specified. Alternatively, the media center command processor 106 may include one or more hardware items, such as a Digital Signal Processor (DSP), to enhance the execution speed and efficiency of the voice processing applications described herein. In an embodiment, the speech recognition processor 110 may receive the audio signal and convert or interpret it to one or more particular commands or to input data for further processing. In addition to command grammar processing, natural language processing may also be used for voice command interpretation. Further details regarding the interaction between the user dialog manager 501 and the speech recognition processor 110 for natural language processing are set forth in commonly assigned U.S. Patent No. 6,532,444, entitled "USING SPEECH RECOGNITION AND NATURAL LANGUAGE PROCESSING," issued March 11, 2003 ("the '444 patent"), which is hereby incorporated by reference as if set forth fully herein. In particular, the computing device 102 may be configured to include the natural language processor 111 and speech recognition processor 110 as described with respect to the functional block diagram in Figure 2 of the '444 patent.
[0061] In that regard, the speech recognition processor 110 may include a natural language processor 111 as described herein to assist in decoding and parsing the received audio signal. For example, the natural language processor 111 may be used to identify or interpret an ambiguous audio signal resulting from unfamiliar speech phraseology, cadence, words, etc. The speech recognition processor 110 and the natural language processor 111 may obtain expected speech characteristics for comparison from the grammar/sequence database 504. The audio feedback generator 505 may be configured to convert stored information to a synthesized spoken word recognizable by a human listener, or to provide a pre-stored audio file for playback. The data/command processor 502 may be configured to receive and process non-spoken information, such as information received via keyboard, remote 101 keypad, email, or VOIP, for example. The sequence processor 503 may be configured to retrieve and execute a predefined spoken script or a predefined sequence of steps for eliciting information from a user according to a hierarchy of different command categories. The sequence processor 503 may also validate the input received as being at the proper or expected step of a sequence or scenario. The sequence processor 503 may obtain the sequence information from the grammar/sequence database 504. In addition, the sequence processor 503 may determine an appropriate response for output to the user based on the received user input. In making this determination, the sequence processor 503 may use or consult a sequence or set of steps associated with the input and the context of the task requested or being performed by the user.
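One plausible shape for the state validation performed by the sequence processor 503 is a state table keyed by dialog state and input, standing in for the grammar/sequence database 504. The states and commands in this sketch are invented examples.

```python
# Each dialog state lists the inputs it accepts and the state each leads to.
STATE_TABLE = {
    "calendar_menu": {"add appointment": "await_date", "look up": "await_lookup_date"},
    "await_date":    {"<date>": "await_time"},
    "await_time":    {"<time_range>": "await_description"},
}

def validate(state, user_input):
    """Return the next state, or None if the input is out of sequence/range."""
    return STATE_TABLE.get(state, {}).get(user_input)

state = "calendar_menu"
for step in ["add appointment", "<date>", "<time_range>"]:
    nxt = validate(state, step)
    print(f"{state} --{step}--> {nxt}")
    state = nxt
```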
[0062] The user dialog manager 501 may provide management for functions such as, but not limited to: determining whether input received from an application includes an audio signal for speech recognition or is command/data input for command interpretation; requesting command validation and response identification from the sequence processor; outputting audio or display based responses to the user; requesting text to speech conversion or speech synthesis; requesting audio and/or visual output processing; and calling operating system functions and other applications as required to interact with the user.
[0063] In at least one embodiment, the media center command processor 106 may further comprise a grammar/sequence database 504. The grammar/sequence database 504 may include predefined sequences of information, each of which may be used by the sequence processor 503 to output information or responses to a user designed to elicit information from the user necessary to perform a media related function in a contextually proper manner. Further, the grammar/sequence database 504 may include state information to specify the valid states of a task, as well as the permissible state transitions.
[0064] FIGURE 6 is a logical control and data flow diagram depicting the transfer of information among various modules of the media center command processor 106 according to at least one embodiment. Referring to FIGURE 6, the user dialog manager 501 may receive user input from a variety of input devices via an application processor 601. The application processor 601 may be configured to receive input from a user via spoken information such as, for example, audio signals received from the remote control device 101, as well as to receive non-spoken information, such as information received via keyboard manual entry, remote 101 keypad, or Voice Over Internet Protocol (VOIP), for example. The user dialog manager 501 may transfer the audio signal to the speech recognition processor 110 for interpretation of the received audio signal into command or data information. The user dialog manager 501 may transfer command information to the data/command processor 502 for further processing such as, for example, validation of the received input in the context of the requested task or task in process.
[0065] The user dialog manager 501 may also request the sequence processor 503 to validate that the received input is within an acceptable range and is received in the proper or expected sequence for an associated task. If the input is valid and in-sequence, the sequence processor 503 may identify to the user dialog manager 501 an appropriate response to be output to the user. Based on this response information, the user dialog manager 501 may request the audio feedback generator 505 to prepare an audio response to be output to the user, or may play a pre-recorded prompt. The user dialog manager 501 may also request a visual output formatter 602 to prepare a visual response to be output to the user. The user dialog manager 501, the visual output formatter 602, and the application processor 601 may output the user response to an operating system 603 of the computing device 102 as well as to applications or device drivers for a variety of output devices 604 for output to the user, such that the user dialog manager 501 is logically connected through operating system services to input/output devices.
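The routing role of the user dialog manager 501 depicted in FIGURE 6 can be caricatured in a few lines; the handler functions below are placeholders for the speech recognition processor 110 and the data/command processor 502, and their return values are invented.

```python
def recognize_speech(audio_bytes):
    # Placeholder for the speech recognition processor 110.
    return {"command": "play music"}

def process_command(command_data):
    # Placeholder for the data/command processor 502.
    return {"command": command_data}

def dialog_manager(user_input):
    """Classify input as audio or non-spoken and dispatch accordingly."""
    if isinstance(user_input, (bytes, bytearray)):   # audio signal path
        return recognize_speech(user_input)
    return process_command(user_input)               # keypad/keyboard/VOIP path

print(dialog_manager(b"\x00\x01"))     # spoken input
print(dialog_manager("channel 27"))    # manual entry
```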
[0066] FIGURES 7a and 7b illustrate a flow chart of a media center control method 700 according to at least one embodiment. Referring to FIGURE 7a, a method 700 may commence at 705. Control may then proceed to 710, at which user input is received by an application or an application processor. The input may be received from a user via spoken information such as, for example, audio signals received from the remote control device 101, but may also include non-spoken information, such as information received via keyboard manual entry, remote 101 keypad, or VOIP, for example.
[0067] Control may then proceed to 715, at which the application processor may transfer the user input (e.g., audio signal, commands, data) to the user dialog manager for interpretation. Control may then proceed to 717, at which the user dialog manager may classify the input as audio or non-spoken input. At 720, the user dialog manager may then transfer the audio signal to the speech recognition processor for interpretation of the audio signal into command or data information. At 725, the user dialog manager may transfer non-spoken information to the data/command processor for further processing such as, for example, validation of the received input in the context of the requested task or task in process.
[0068] If at 730 the speech recognition processor determines that the received audio signal includes ambiguities such as extraneous information or noise, or is otherwise not readily susceptible of interpretation, then control may proceed to 735 at which natural language processing may be performed. The natural language processing may provide for additional interpretation of the audio signal for determining the requested command, operation, or input.
[0069] Control may then proceed to 740, at which the speech recognition processor or data/command processor provides an indication of the interpreted command(s) or input to the user dialog manager. Referring to FIGURE 7b, control may then proceed to 745, at which the user dialog manager may transfer the interpreted command(s) or input to the sequence processor for validation. At 750, the sequence processor may obtain command set and sequence information associated with the interpreted command(s) or input from the grammar/sequence database. Control may then proceed to 755, at which the sequence processor may validate that the interpreted command or input is within an acceptable range and is received in the proper or expected sequence or dialog step for an associated task as specified in a predefined state table contained in the grammar/sequence database. If at 760 the sequence processor determines that the interpreted command or input is valid, then control may proceed to 764; otherwise, control proceeds to 762 at which the sequence processor provides an error indication to the user dialog manager indicating command/input validation failure.
[0070] At 764, if the input is valid and in-sequence, the sequence processor may identify to the user dialog manager an appropriate response to be output to the user. Control may then proceed to 765, at which, based on this response information, the user dialog manager may prepare a response to the user. Control may then proceed to 770, at which the user dialog manager may determine if an audio output response is to be provided. If so, control may then proceed to 780 at which the user dialog manager requests the audio feedback generator to prepare an audio response to be output to the user, or plays a pre-recorded audio file. In either case, at 775 the user dialog manager may request the visual output formatter to prepare a visual response to be output to the user. Control may then proceed to 785, at which the user dialog manager, the visual output formatter, and the application processor may output the user response to an operating system of the computing device as well as to applications or device drivers for a variety of output devices for output to the user. At 790, a method may end.
[0071] In at least one embodiment, the media center controller 300 may be used for control of and interaction with a variety of media devices and functions. For example, the media center controller may allow a user to command a platform (device or computer) to implement capabilities such as, but not limited to: making audio phone calls; making video phone calls; instant messaging; video messaging; sending voice recordings; reading e-mail; sending e-mail; sending text messages; managing user contacts; accessing voice mail; calendar management; playing music; playing movies; playing the radio; playing TV programs; recording TV programs; browsing the Internet; dictating documents; entering dates into a personal calendar application and having the system provide alerts for upcoming scheduled meetings and events; and launching applications. In an embodiment, the mechanism for interaction with the computer system is accomplished through either a) a remote control device, b) microphone input, c) keyboard and mouse, or d) touch-screen. With respect to the remote control device, it may be a multi-mode input device that has a keypad for manual entry of commands transmitted to the system, as well as a microphone embedded in the remote control, allowing the user to provide spoken commands to the system. As discussed herein, in at least one embodiment, the media center controller 300 may include a natural language interface that allows users to speak freely and naturally. However, manual key, touchscreen, and keyboard/mouse interfaces may also be provided as alternatives to speech. In an embodiment, the media center controller 300 may provide a mechanism, such as a logon authentication process using an interactive page, to identify the current user and allow or deny access to the system.
[0072] FIGURE 8 shows a top level menu interactive page 800 according to at least one embodiment. Referring to FIGURE 8, the top level menu interactive page 800 may include several media function selection buttons 801. Upon user selection of a particular media function selection button 801, a request to execute the associated media function may be received by the application processor 601. The application processor 601 may forward the request to the user dialog manager 501 for processing as described with respect to FIGURES 7a and 7b herein.
[0073] FIGURE 9 shows a send voice recording interactive page 900 according to at least one embodiment. Referring to FIGURE 9, the send voice recording interactive page 900 may provide an interface by which a user may compose and send a recorded voice message for a wireless device. Using this feature, the user can record a voice message to a recipient, such as a contact, and then send the recorded voice message to the recipient. The media center controller 300 may record the voice message as a .wav file, for example. The recipient listener will hear exactly what the recording user says, so misinterpretation can be avoided. In an embodiment, the recorded voice message may be delivered to the recipient's inbox as an e-mail. When the recipient opens the e-mail message, the recipient will hear the .wav file play the message.
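One plausible realization of this delivery step, using only the Python standard library, is sketched below; the SMTP host, addresses, and file name are placeholders, and the specification does not prescribe this mechanism.

```python
import smtplib
from email.message import EmailMessage

def send_voice_recording(wav_path, sender, recipient, smtp_host="localhost"):
    """Deliver a recorded .wav voice message as an e-mail attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Voice message"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("A recorded voice message is attached.")
    with open(wav_path, "rb") as f:
        # Attach the recording so the recipient hears exactly what was said.
        msg.add_attachment(f.read(), maintype="audio", subtype="wav",
                           filename="message.wav")
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)
```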
[0074] FIGURE 10 shows a send e-mail interactive page 1000 according to at least one embodiment. Referring to FIGURE 10, the send e-mail interactive page 1000 may provide an interface by which a user may compose and send an e-mail message for a wireless device. Using this feature, the user may speak his message into the wireless device, and his speech is converted to text as discussed herein. The e-mail message may be sent to the recipient using a network, and will appear in the recipient's inbox as if it had been written on a computer. In at least one embodiment, the send e-mail feature requires no keypad tapping to create a message. While the user dictates the message, he may be provided the option to edit, add more, or send.
[0075] FIGURE 11 shows a read e-mail interactive page 1100 according to at least one embodiment. Referring to FIGURE 11, the read e-mail interactive page 1100 may provide an interface by which a user may read an e-mail message. In at least one embodiment, users may access their corporate or personal e-mail account via the media center controller. In order to use the E-mail Read feature, a POP3, IMAP, or corporate e-mail account may be required. To use this feature, a user first enters her e-mail server name, account name, and password into the user profile portion (see FIGURE 15) of the read e-mail interactive page 1100. Next, the entered information may be stored by the computing device of the media center controller. Thereafter, when the user calls in, she will be able to check her e-mail by saying "Read E-mail." In an embodiment, users may have the option to reply to, forward, delete, and skip e-mails.
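For the POP3 case, retrieval might look like the following standard-library sketch; the server, account, and password would come from the stored user profile (FIGURE 15), and everything here is illustrative rather than the disclosed implementation.

```python
import poplib
from email.parser import BytesParser
from email.policy import default

def read_email(server: str, account: str, password: str) -> None:
    conn = poplib.POP3_SSL(server)
    conn.user(account)
    conn.pass_(password)
    count, _size = conn.stat()
    for i in range(1, count + 1):
        _resp, lines, _octets = conn.retr(i)
        msg = BytesParser(policy=default).parsebytes(b"\r\n".join(lines))
        # The sender and subject would be handed to the speech synthesizer.
        print(msg["From"], "-", msg["Subject"])
    conn.quit()
```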
[0076] FIGURE 12 shows a send text message interactive page 1200 according to at least one embodiment. Referring to FIGURE 12, the send text message interactive page 1200 may provide an interface by which a user may send a text message. Text messaging is a way to send short messages from a wireless device to a wireless phone. In an embodiment, users may send text messages such as, for example, SMS messages, to anyone with a messaging-capable phone. In at least one embodiment, the send text message interactive page 1200 may include a characters remaining field 1201 for informing the user how many text characters may be added to an in-process message. The media center controller 300 may determine the number of characters remaining based on the display characteristics and capabilities of the receiving wireless device as maintained in a database.
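The characters remaining field 1201 reduces to a lookup and a subtraction. The sketch below assumes a hypothetical device-limit table standing in for the capability database mentioned above.

```python
# Hypothetical per-device message limits; a real deployment would query the
# device-capability database for the receiving wireless device.
DEVICE_LIMITS = {"generic_sms": 160, "legacy_pager": 120}

def characters_remaining(message: str, device_model: str) -> int:
    limit = DEVICE_LIMITS.get(device_model, 160)  # default to SMS length
    return max(0, limit - len(message))

# e.g. characters_remaining("Running late, start without me", "generic_sms")
```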
[0077] Furthermore, FIGURE 13 shows a voice activated dialing interactive page 1300 according to at least one embodiment. Referring to FIGURE 13, the voice activated dialing interactive page 1300 may provide an interface by which a user may make voice-activated telephone calls by speaking a name, nickname, or number. Users can store all of their contact information using a user account interactive page, such as shown in FIGURE 15, of the media center controller 300. In at least one embodiment, there is no need to train the media center controller 300 to recognize each name.
[0078] FIGURE 14 shows a Windows Messenger™ interactive page 1400 according to at least one embodiment. Referring to FIGURE 14, the Windows Messenger™ interactive page 1400 may provide an interface by which a user may communicate in real-time with other people who use Windows Messenger™ and who are signed in to the same instant messaging service. The media center controller 300 may allow users to send instant messages to each other by typing; to communicate through a PC-to-PC audio connection; or to communicate through a PC-to-PC audio/video connection.
[0079] In addition, the media center controller 300 may provide an interface by which a user may access voice mail systems (VM) by voice command over the telephone network. In particular, upon a (spoken or keyboard/keypad entered) command from the user to connect to his VM, the media center controller will connect to VM by dialing, connecting the call, and automatically playing the proper VM Connect tone to the far-end VM system (for example, a "*" tone), and then automatically (if so selected by the user) playing the VM user account number and password, as appropriate, through DTMF.
[0080] In an embodiment, this automated activity may be transparent to the user. After the user states "[name] Voice Mail," the user hears Music On Hold (MOH), or feedback alerting him to wait for computer processing, until the request is recognized and the media center controller 300 has forwarded the account and password tones to the VM system. Next, the media center controller 300 may play a VM greeting from the VM system. When the connection to the VM system is complete, if the user provided an incorrect account number or password, the media center controller 300 may connect through anyway, and the user will hear the VM system request proper authorization keys. At this point, the media center controller 300 will have connected the VM outgoing line to the user so he can hear the prompts, but the line from the user will be connected to the media center controller for voice recognition. If the user hits one or more DTMF keys, the DTMF tones may be passed through to the VM system. Note that a '##' key sequence will still disconnect from the VM system (assuming that VM systems will not use '##' for any commands).
[0081] Voice access to carrier voice mail, corporate voice mail, and personal voice mail (home answering machines) may all be provided in much the same manner.
[0082] In an embodiment, media center controller 300 voicemail may provide most-often-used features such as, but not limited to: Play Voice Mail; Playback/Rewind/Repeat; Pause; Fast Forward n secs; Fast Rewind n secs; Get Next/Skip Ahead; Get Previous; Delete/Erase; Save Voice Mail; Call Sender; Help/VM Menu. The system may also respond to requests such as "Help," "Tutorial," and "All Options." System response to such user requests will be analogous to how the system responds to these commands in other VUI sequences. Note that some VM systems do not support all of the features listed. Unsupported features may be removed from the media center controller 300 prompts and online help, or the media center controller 300 will play a prompt to indicate that the requested feature is not supported by the active VM system. A simple command (e.g., "[Get my | Call my] Voice Mail") may connect to a caller's VM. If the user commands "VoiceMail" from the main menu, and the user has more than one VM set up, then the system may prompt "Say 'Verizon' or 'One Voice' or 'Home.'" From the main menu, for multiple VMs (e.g., carrier and corporate), the caller may be able to select which VM system he wants: "Voice Mail for Verizon", "Verizon VoiceMail", or "Voice Mail for One Voice."
[0083] In an embodiment, the user may set up multiple VM systems, choosing from carrier, business, and home VM, by interacting with the VM systems externally and using their own commands. The fields defining a VoiceMail entry may include: friendly name, provider selection list box, password required checkbox, and a password text field that is masked for security. If the 'Provider' selection is "Other", then other fields, including a selection identifying the VM Connect key sequence (usually '#'), may be displayed and need to be entered. In an embodiment, many VM systems appear in the dropdown listbox for 'Provider' to make the selection easier for the user. The selections may include a) carrier name(s), b) corporate VM systems, and c) identifiers for particular answering machines. Clear identification of the VM system may also need to identify the VM by product name, model number, or version number. Knowing the type of VM service allows the media center controller 300 to automate the call setup sequence. If the user chooses "Other", then details such as the 'VM Connect' sequence, key mapping, and timing requirements must be entered by the user. An example of this concern involves entry of the 'VM Connect' sequence, followed by the password. Some VM systems allow '#12345' (VM Connect = '#', password = '12345') to be entered as one sequence. Other systems require a delay between '#' and '12345.'
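The per-provider connect details might be captured in a small profile record like the sketch below; the field names and the dial-plan representation are assumptions, chosen to show how a delayed '#' then '12345' sequence differs from a single '#12345' burst.

```python
from dataclasses import dataclass

@dataclass
class VoiceMailProfile:
    friendly_name: str          # e.g. "Home"
    number: str                 # number dialed to reach the VM system
    connect_key: str = "#"      # the 'VM Connect' key sequence
    password: str = ""
    pause_before_password: bool = False  # some systems need '#' <delay> '12345'

def dial_plan(profile: VoiceMailProfile) -> list:
    """Return (digits, pause_ms) pairs: wait pause_ms, then play digits as DTMF."""
    plan = [(profile.number, 0), (profile.connect_key, 500)]
    if profile.password:
        pause = 1000 if profile.pause_before_password else 0
        plan.append((profile.password, pause))
    return plan

# '#12345' as one burst:  VoiceMailProfile("Other", "5551234", password="12345")
# '#' <delay> '12345':    ... with pause_before_password=True
```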
[0084] The following example describes how a user of the media center controller 300 may access carrier voicemail. First, from the main menu, the user may say, "Voice Mail." The media center controller 300 may respond with, for example, "Just a moment while I connect you to [your voice mail system]." The media center controller 300 may then call the VM system. Upon connect, the media center controller 300 may issue a "VM Connect" DTMF ('#' for Service Provider 1, '*' for Service Providers 2 and 3), if required, n msecs after off-hook, and then send the user's account number and/or password via DTMF n msecs after it sent the "VM Connect". If the account number or password retrieved from the data store is bad, the media center controller 300 may not know that; it will still connect, but the login to the VM system will then fail. If the VM system hangs up, the media center controller 300 may respond with, for example, "Sorry, we could not connect to [...]."
[0085] For connection to the VM, the user must have entered their VM account and/or password on the media center controller 300 interactive page. The 'Voice Mail Account Number' field for the carrier may be visible only if the user has Voice VM service provided by the carrier. The 'Voice Mail Password' field for the carrier may be visible only if the user has Voice VM service provided by the carrier. For corporate or home VM access, the password field is always visible.
[0086] Furthermore, in at least one embodiment, the media center controller 300 may include calendar management. Regarding calendar management, the media center controller 300 may allow a user to access calendar functions by speaking, "Calendar." The media center controller 300 may respond with, for example, "OK. To access calendar features, say Add an appointment, Add a meeting request, Edit, Delete or Look up." <3 second delay> "For a list of all options say All Options. You can also say Help or Tutorial." In an embodiment, calendar main menu commands may include: Add [an] appointment; Add [a] meeting; Edit; Delete; Look up; [Main Menu, All Options, Help, Cancel, Tutorial] - these are available at most response points. Also, in the following scenarios the "Undo" command always takes the user back to the previous step.
[0087] For example, to add an appointment, the user may speak, "Add an appointment." The media center controller 300 may respond with, for example, "OK. Please say the month and date of your appointment." <3 second delay> "You can also say today, tomorrow, or a day of the week." The user may reply, "October 20th." The media center controller 300 may respond, "Monday, October 20th. At what time?" (The media center controller 300 may say the day, month, and date, followed by the year if the appointment occurs in the next year.) The user may reply with one of: "10am to 11am," "10 o'clock," "10am for 2 hours," "10am," or "All day." The media center controller 300 may respond with, for example, "October 20th, 10am to 11am. What is the subject of your appointment?" The user may reply, "Doctor's appointment." The media center controller 300 may save the reply as a .wav file as an attachment or link, as with VR, and then say, "Please say the location." To which the user may reply, "Scripps Clinic." The media center controller 300 may again save the reply as a .wav file as an attachment or link, as with VR. Variations of this scenario are possible. For example, the media center controller 300 may allow the user to "look up" his calendar for a given day or period and, by interacting with the media center controller 300, receive his calendar schedule for that period. For example, the media center controller 300 may say, "You have <#> appointment(s) today, October 21st. First appointment is <appointment>. Second appointment is <appointment>." Further, the user will have the option to choose where he/she would like the calendar alerts sent (e.g., mobile phone, e-mail at work, e-mail at home) under the preferences section of the user accounts interactive page of FIGURE 15. In an embodiment, the Outlook™ default will be used to determine when the alert is sent out. Visual indications for calendar alerts may also be provided.
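The add-appointment exchange amounts to a fixed prompt sequence. The sketch below is a minimal, assumed rendering in which the ask callback hides the recognition and recording machinery; the step names are illustrative.

```python
# Prompts follow the dialog above; parsing of dates and times is not shown.
APPOINTMENT_STEPS = [
    ("date",     "Please say the month and date of your appointment."),
    ("time",     "At what time?"),
    ("subject",  "What is the subject of your appointment?"),
    ("location", "Please say the location."),
]

def add_appointment(ask):
    """`ask` plays a prompt and returns the recognized (or recorded) reply."""
    appointment = {}
    for field, prompt in APPOINTMENT_STEPS:
        # Subject and location replies may be saved as .wav attachments.
        appointment[field] = ask(prompt)
    return appointment
```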
[0088] FIGURE 15 shows a user account interactive page 1500 according to at least one embodiment. Referring to FIGURE 15, the user account interactive page 1500 may provide an interface by which a user may create a profile with his preferences. To create a new user profile, users will click on the New User button 1501. They will be asked to provide their first and last name, greeting (how they want the media center controller to greet them at start-up), e-mail address, and voice model (male or female). On this page, they will also have the option to choose BVI setup, phone setup, e-mail setup, preferences, training, save, delete, or cancel.
[0089] Furthermore, FIGURE 16 shows a user contacts interactive page 1600 according to at least one embodiment. Referring to FIGURE 16, the user contacts interactive page 1600 may provide an interface by which users may access all of their contacts from any controlled device that can access the media center. The media center controller 300 may provide users voice access to all their important contact names and phone numbers so they don't have to carry an address book or PDA. Users can also add or edit contact information via voice input. In at least one embodiment, each of FIGURES 8-16 may include certain interactive display items, in addition to those described above, beneficial to a user of a media center. For example, FIGURES 9 and 12-16 show an "album cover" icon in the lower left corner indicating the artist, album, song track, and length of play time remaining for an audio music selection.

[0090] Thus, the media center controller 300 may support a variety of media center functions and applications. Further details regarding the ability of the media center controller 300 to support bidirectional VOIP, PC-to-phone, and PC-to-PC communication are set forth below.
[0091] In an embodiment, the media center controller 300 may use a voice command capability to initiate PC-to-PC communications such as, for example, an instant messaging (IM) session, or VOIP communications. FIGURES 17a and 17b are a flowchart of a method 1700 for VOIP or PC-to-PC applications using the media center controller 300. Referring to FIGURE 17a, a method 1700 may commence at 1705. Control may then proceed to 1710, at which, while the top level menu (see, for example, FIGURE 8) is displayed, the user may actuate a mute switch on a user interaction device (for example, user interaction device 101).
[0092] Control may then proceed to 1715, at which in response to receiving a signal from the user interaction device that the mute switch has been actuated, the media center command processor may output a signal(s) to one or more controlled devices to mute the audio from the controlled devices.
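The mute broadcast of 1715 can be sketched as a simple fan-out; controlled_devices and send_command are illustrative stand-ins for the controller's actual device interfaces.

```python
def set_mute(controlled_devices, muted: bool) -> None:
    """Fan a mute or unmute command out to every controlled device."""
    command = "MUTE" if muted else "UNMUTE"
    for device in controlled_devices:
        device.send_command(command)  # e.g. via an IR blaster or device driver
```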
[0093] Control may then proceed to 1720, at which the user may speak a request for an audio or audio/video messaging session. In an embodiment, the spoken request may be received by the user interaction device and provided therefrom to the media center command processor as described herein. Control may then proceed to 1725, at which the media center command processor may process the spoken request as set forth in FIGURES 7a and 7b herein.
[0094] Control may then proceed to 1730, at which a messaging interactive page may be displayed (see, for example, interactive page 1400 of FIGURE 14). Control may then proceed to 1735, at which the user may select, via spoken request or manual selection, the person with whom he wants to chat, in accordance with the processing described with respect to FIGURES 7a and 7b herein. Control may then proceed to 1740, at which the user may select, via spoken request or manual selection, to commence the chat session (e.g., selects the "Start Talking" option), in accordance with the processing described with respect to FIGURES 7a and 7b herein.
[0095] Control may then proceed to 1745 of FIGURE 17b, at which the media center command processor may establish an Internet connection with a VOIP communication server to request an audio or audio/visual connection to the selected party. Control may then proceed to 1750, at which if the selected party accepts the request for a conversation, a bi-directional VOIP channel may be opened between the media center command processor and the user and the called party. A conversation may then ensue.
[0096] Alternatively, from 1740 control may then proceed to 1755 of FIGURE 17b at which the media center command processor may establish an Internet connection with another computing device such as, for example, a PC, to request an audio or audio/visual connection to the selected party. Control may then proceed to 1760, at which if the selected party accepts the request for a conversation, a bi-directional IP channel may be opened between the media center command processor and the user and the called party.
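The PC-to-PC leg of 1755 and 1760 reduces to opening a bi-directional connection to the peer. The socket sketch below is an assumption-laden simplification; host and port are placeholders, and a real session would negotiate codecs and likely carry audio over RTP rather than a bare TCP stream.

```python
import socket

def open_pc_to_pc_channel(peer_host: str, peer_port: int) -> socket.socket:
    """Open a bi-directional IP channel to the selected party's computer."""
    sock = socket.create_connection((peer_host, peer_port), timeout=10)
    # Once the called party accepts the conversation request, the same socket
    # carries audio (or audio/video) frames in both directions.
    return sock
```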
[0097] Control may then proceed to 1765, at which the conversation may be terminated by the called party, or by the media center command processor user through selection, via spoken request or manual selection, of a terminate conversation option via, for example, the messaging screen (see, for example, FIGURE 14), in accordance with the processing described with respect to FIGURES 7a and 7b herein. Control may then proceed to 1770, at which a method may end.
[0098] In an embodiment, the media center controller 300 may use a voice command capability to initiate PC-to-phone communications. FIGURES 18a and 18b are a flowchart of a method 1800 for PC-to-phone applications using the media center controller 300. Referring to FIGURE 18a, a method 1800 may commence at 1805. Control may then proceed to 1810, at which, while the top level menu (see, for example, FIGURE 8) is displayed, the user may actuate a mute switch on a user interaction device (for example, user interaction device 101).
[0099] Control may then proceed to 1815, at which in response to receiving a signal from the user interaction device that the mute switch has been actuated, the media center command processor may output a signal(s) to one or more controlled devices to mute the audio from the controlled devices.
[0100] Control may then proceed to 1820, at which the user may speak a request to make a telephone call. In an embodiment, the spoken request may be received by the user interaction device and provided therefrom to the media center command processor as described herein. Control may then proceed to 1825, at which the media center command processor may process the spoken request as set forth in FIGURES 7a and 7b herein.

[0101] Control may then proceed to 1830, at which a make phone call interactive page may be displayed (see, for example, interactive page 1300 of FIGURE 13). Control may then proceed to 1835, at which the user may select, via spoken request or manual selection, the person he wants to call or the telephone to which he wants to connect, in accordance with the processing described with respect to FIGURES 7a and 7b herein. Control may then proceed to 1840, at which the user may select, via spoken request or manual selection, to initiate the telephone call (e.g., selects the "Dial" option), in accordance with the processing described with respect to FIGURES 7a and 7b herein.
[0102] Control may then proceed to 1845 of FIGURE 18b, at which the media center command processor may establish an Internet connection with a VOIP communication server to request the telephone call to the selected party. Control may then proceed to 1850, at which, if the selected party answers the incoming call, a bi-directional voice communication channel may be opened between the media center command processor and the user and the called party. In an embodiment, the called party may be accessed via the PSTN. In another embodiment, the called party may be accessed via an IP-enabled phone, handset, or communication device. In either case, the media center command processor may communicate with the called party using VOIP via a VOIP gateway for conversion between IP and PSTN traffic. Optionally, the PSTN may also be used for voice connections with non-VOIP-enabled called parties. A conversation may then ensue.
[0103] Control may then proceed to 1855, at which the call may be terminated by the called party, or by the media center controller user through selection, via spoken request or manual selection, of a terminate call option via, for example, the make phone call interactive page (such as, for example, FIGURE 13), in accordance with the processing described with respect to FIGURES 7a and 7b herein. Control may then proceed to 1860, at which a method may end.
[0104] Thus has been shown a media center controller that includes a computing device having a user dialog manager to process commands and input for controlling one or more controlled devices of a media center. The system and methods may include the capability to receive and respond to commands and input from a variety of sources, including spoken commands from a user, for remotely controlling one or more electronic devices. The system and methods may also include a user interaction device capable of receiving spoken user input and transferring the spoken input to the computing device.
[0105] While the invention has been described with reference to certain illustrated embodiments, the words that have been used herein are words of description, rather than words of limitation. Changes may be made, within the purview of the associated claims, without departing from the scope and spirit of the invention in its aspects. Although the invention has been described herein with reference to particular structures, acts, and materials, the invention is not to be limited to the particulars disclosed, but rather can be embodied in a wide variety of forms, some of which may be quite different from those of the disclosed embodiments, and extends to all equivalent structures, acts, and materials, such as are within the scope of the associated claims.

Claims

We claim:
1. A media center controller system comprising:
a computing device having at least one interface to one or more controlled devices; and
a media center command processor coupled to the computing device, the media center command processor including an interface to a handheld device, wherein the media center command processor includes a user dialog manager, a data/command processor, and a sequence processor;
wherein the media center command processor is configured to receive audio input from a handheld device and to perform, in response to the input received from the handheld device, at least one of: speech recognition processing, voice over Internet Protocol communications, instant messaging, electronic mail messaging, and control of one or more controlled devices.
2. The media center controller system of claim 1, wherein the media center command processor is further configured to receive manual input from the handheld device.
3. The media center controller system of claim 1, wherein the media center command processor further comprises:
a speech recognition processor; and
an audio feedback generator;
wherein the sequence processor is configured to process grammar or sequence data;
wherein the user dialog manager is configured to transfer an audio signal to the speech recognition processor, to receive audio feedback from the audio feedback generator, to transfer non-spoken input to the data/command processor, and to receive sequence information from the sequence processor;
wherein the computing device is configured to output interpreted command information to the one or more controlled devices, to output video information to a display monitor based on input received by the user dialog manager, and to output audio feedback to a user.
4. The media center controller system of claim 1, further comprising:
a handheld user interaction device configured to receive input from a user and including an interface to the media center command processor for transferring user input to the media center command processor.
5. The media center controller system of claim 4, wherein the computing device is configured to output audio feedback information and remote control commands received from the media center command processor to the user interaction device, and wherein the user interaction device is configured to output remote control commands to the one or more controlled devices.
6. The media center controller system of claim 5, wherein the user interaction device is configured to output audio feedback to a user.
7. The media center controller system of claim 4, wherein the computing device is configured to output audio feedback information to at least one controlled device.
8. The media center controller system of claim 4, wherein the computing device is configured to output video information to a display monitor.
9. The media center controller system of claim 4, in which the input received from a user includes audio input.
10. The media center controller system of claim 9, in which the input received from a user includes keypad input.
11. The media center controller system of claim 10, in which the input received from a user includes touchscreen input.
12. The media center controller system of claim 4, wherein the user interaction device is a remote control unit further including a microphone, and wherein the remote control unit is configured to transmit the audio signal to the computing device.
13. The media center controller system of claim 4, wherein the user interaction device is configured to receive audio feedback information and remote control commands from the computing device.
14. The media center controller system of claim 13, wherein the remote control unit includes a speaker.
15. The media center controller system of claim 12, in which the remote control unit further includes a mute switch, the remote control unit being configured to send a mute signal to the controlled devices through the computing device upon actuation of the mute switch and to send an unmute signal to the controlled devices through the computing device upon release of the mute switch.
16. The media center controller system of claim 15, in which the remote control unit controls the computing device.
17. The media center controller system of claim 1, in which the media center command processor is included in the computing device.
18. The media center controller system of claim 3, in which the speech recognition processor further includes a natural language processor configured to interpret spoken commands.
19. The media center controller system of claim 3, in which the audio signal represents speech provided by a user.
20. The media center controller system of claim 3, in which the audio signal is received via voice over Internet Protocol.
21. The media center controller system of claim 1, further comprising one or more controlled devices configured to output audio to a user using a speaker in response to receiving audio feedback information from the computing device.
22. The media center controller system of claim 1, in which the media center command processor is a headend system.
23. A method comprising:
receiving user input;
transferring the received user input for interpretation;
classifying the user input as audio input or non-spoken input;
transferring an audio signal to a speech recognition processor for interpretation of the audio signal into command or data information;
transferring non-spoken information to a data/command processor for validation;
providing, by the speech recognition processor or data/command processor, an indication of the interpreted command(s) or input;
transferring the interpreted command(s) or input to a sequence processor for validation;
obtaining sequence steps;
identifying valid commands at each sequence step;
transitioning from step to step within a sequence or between sequences;
validating the interpreted command or input to be within an acceptable range and received in sequence for an associated task as specified in a predefined state table;
preparing audio feedback to the user action;
preparing, using a visual output formatter, a visual response to the input; and
outputting the response to the user.
24. The method of claim 23, in which the audio input is received from a remote control device.
25. The method of claim 23, in which the non-spoken input is received via manual data entry source.
26. The method of claim 23, in which the audio input is received via voice over Internet Protocol.
27. The method of claim 23, in which the audio input is received via public switched telephone network.
28. The method of claim 23, further comprising outputting the audio response to one or more controlled devices configured to output the audio response to a user using a speaker.
29. The method of claim 23, further comprising performing natural language processing to interpret the audio signal containing ambiguities.
30. The method of claim 23, further comprising obtaining command set and sequence information associated with the user input from grammar/sequence data.
31. The method of claim 30, in which the state table is contained in the grammar/sequence data.
32. The method of claim 23, further comprising:
sending a mute signal to the controlled devices during user speech input; and
sending an unmute signal to the controlled devices following user speech input.
33. A remote control device comprising:
a microphone for receiving spoken user input; and
a first interface to a computing device, wherein the first interface may further include an audio receiver portion for receiving audio from the computing device, an audio transmitter portion for providing an audio signal to the computing device, and a function key transmitter portion for transferring keypad information to the computing device.
34. The remote control device of claim 33, further comprising command keys.
35. The remote control device of claim 34, in which the command keys include a numeric keypad, a clear button, an enter button, and navigation buttons for up, down, left, right movement.
36. The remote control device of claim 33, further comprising a speaker for outputting audio to a user.
37. The remote control device of claim 33, further comprising a second interface to at least one controlled device.
38. The remote control unit of claim 33, in which the remote control unit controls the computing device.
39. The remote control unit of claim 33, in which the remote control unit includes an interface to a headend system.
40. A media center controller system comprising:
a computing device including an application processor and a media center command processor, wherein the media center command processor includes a user dialog manager;
a handheld user interaction device coupled to the computing device;
wherein the user dialog manager further includes a speech recognition processor, an audio feedback generator including a speech synthesizer, a data/command processor, and a sequence processor;
wherein the speech recognition processor is configured to generate a text output converted from spoken utterances, the speech recognition processor further including a natural language processor;
wherein the user dialog manager is configured to transfer an audio signal to the speech recognition processor, to receive synthesized speech from the speech synthesizer of the audio feedback generator, to receive pre-recorded audio files from the audio feedback generator for audio feedback to a user, to transfer non-spoken input to the data/command processor, and to receive sequence information from the sequence processor;
the sequence processor being coupled to a grammar/sequence database;
a speech synthesizing processor for generating a synthesized speech output in response to text data;
an interface to one or more controlled devices;
wherein the computing device is configured to output synthesized speech and pre-recorded audio information and remote control commands to the user interaction device and to output interpreted command information to at least one controlled device and video information to a display monitor, based on input received by the user dialog manager;
wherein the user interaction device is coupled to the computing device and is configured to receive audio input from a user, the user interaction device further including an interface to the computing device for transferring user input to the computing device and a remote control interface to one or more controlled devices, and the user interaction device further configured to output remote control commands to the one or more controlled devices and to output synthesized speech or pre-recorded audio;
wherein the user interaction device further includes: a microphone and a speaker, and wherein the remote control unit is configured to transmit the audio signal to the computing device and to receive synthesized speech information, pre-recorded audio, and remote control commands from the computing device, and wherein the remote control unit further includes a mute switch, the remote control unit being configured to send a mute signal to the controlled devices through the media center command processor upon actuation of the mute switch and to send an unmute signal to the controlled devices through the media center command processor upon release of the mute switch;
an audio input system for receiving speech input provided by the user;
a video input system for receiving a live camera feed;
an audio output system for outputting synthesized speech to the user;
a keyboard entry system for input of user commands;
a display device for outputting visual responses and interactive pages to the user;
wherein the user dialog manager is logically connected through operating system services to input/output devices, the audio input system, the audio output system, the speech recognition processor and the speech synthesizing processor, and other computer-internal components;
a data set for storing and accessing user-related information, such as user profiles, contact information, and selected preferences; and
a data store for recorded audio or audio/visual files.
41. The media center controller system of claim 40, in which the controlled devices include a radio receiver for playing radio stations requested by the user.
42. The media center controller system of claim 40, in which the controlled devices include a television receiver for playing or recording television programs.
43. The media center controller system of claim 40, in which the controlled devices include an audio file/track player to play audio files requested by the user.
44. The media center controller system of claim 40, in which the controlled devices include an audio/visual player to play audio/visual files or tracks requested by the user.
45. The media center controller system of claim 40, in which the audio signal represents speech provided by a user.
46. The media center controller system of claim 40, in which the non-spoken input is received via manual data entry source.
47. The media center controller system of claim 40, in which the audio signal is received via voice over Internet Protocol.
48. The media center controller system of claim 40, in which the audio signal is received via public switched telephone network.
49. The media center controller system of claim 40, further comprising one or more controlled devices configured to output audio to a user using a speaker in response to receiving audio information from the computing device.
50. The media center controller system of claim 40, in which the media center command processor is a headend system.
51. A computer readable medium upon which is embodied a sequence of instructions which, when executed by a processor, cause the processor to be configured to:
receive user input;
transfer the received user input for interpretation;
classify the user input as audio input or non-spoken input;
transfer an audio signal to a speech recognition processor for interpretation of the audio signal into command or data information;
transfer non-spoken information to a data/command processor for validation;
provide, by the speech recognition processor or data/command processor, an indication of the interpreted command(s) or input;
transfer the interpreted command(s) or input to a sequence processor for validation;
validate the interpreted command or input to be within an acceptable range and received in sequence for an associated task as specified in a predefined state table;
prepare, using a speech synthesizer or a pre-recorded audio file, an audio response to the input;
prepare, using a visual output formatter, a visual response to the input; and
output the response to the user.
52. The computer readable medium of claim 51, in which the audio input is received from a remote control device.
53. The computer readable medium of claim 51, in which the non-spoken input is received via manual data entry source.
54. The computer readable medium of claim 51, in which the audio input is received via voice over Internet Protocol.
55. The computer readable medium of claim 51, in which the audio input is received via public switched telephone network.
56. The computer readable medium of claim 51, further comprising outputting the audio response to one or more controlled devices configured to output the audio response to a user using a speaker.
57. The computer readable medium of claim 51, further comprising performing natural language processing to interpret the audio signal containing ambiguities.
58. The computer readable medium of claim 51, further comprising obtaining command set and sequence information associated with the user input from grammar/sequence data.
59. The computer readable medium of claim 51, in which the state table is contained in the grammar/sequence data.
60. The computer readable medium of claim 51, further comprising outputting the audio response to a user via a speaker of the controlled device.
61. The computer readable medium of claim 51, further comprising:
sending a mute signal to the controlled devices during user speech input; and
sending an unmute signal to the controlled devices following user speech input.
62. A method comprising:
sending a mute signal to one or more controlled devices upon user actuation of a mute switch on a user interaction device;
receiving spoken user input in which the user input includes a request for audio or visual messaging;
transferring the received user input for interpretation;
classifying the user input as audio input;
transferring an audio signal to a speech recognition processor for interpretation of the audio signal into command or data information;
providing, by the speech recognition processor or data/command processor, an indication of the interpreted command(s) or input;
transferring the interpreted command(s) or input to a sequence processor for validation;
obtaining sequence steps;
identifying valid commands at each sequence step;
transitioning from step to step within a sequence or between sequences;
validating the interpreted command or input to be within an acceptable range and received in sequence for an associated task as specified in a predefined state table;
preparing audio feedback for an audio response to the user action;
preparing, using a visual output formatter, a messaging page;
outputting the response to the user;
selecting a person for messaging;
establishing an Internet connection and opening a bi-directional channel therein; and
terminating the messaging session.
63. The method of claim 62, in which the bi-directional channel is a voice over Internet Protocol channel.
64. A method comprising:
sending a mute signal to one or more controlled devices upon user actuation of a mute switch on a user interaction device;
receiving spoken user input in which the user input includes a request to make a telephone call;
transferring the received user input for interpretation;
classifying the user input as audio input;
transferring an audio signal to a speech recognition processor for interpretation of the audio signal into command or data information;
providing, by the speech recognition processor or data/command processor, an indication of the interpreted command(s) or input;
transferring the interpreted command(s) or input to a sequence processor for validation;
obtaining sequence steps;
identifying valid commands at each sequence step;
transitioning from step to step within a sequence or between sequences;
validating the interpreted command or input to be within an acceptable range and received in sequence for an associated task as specified in a predefined state table;
preparing, using a speech synthesizer or a pre-recorded file for playback, an audio response to the input;
preparing, using a visual output formatter, a make telephone call page;
outputting the response to the user;
selecting a person or telephone number for a telephone call;
establishing an Internet connection with a voice over Internet Protocol server and opening a bi-directional voice over Internet Protocol channel therein; and
terminating the telephone call.
PCT/US2004/022301 2003-07-30 2004-07-28 Media center controller system and method WO2005022295A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US49093703P 2003-07-30 2003-07-30
US60/490,937 2003-07-30
US10/897,093 2004-07-23
US10/897,093 US20050027539A1 (en) 2003-07-30 2004-07-23 Media center controller system and method

Publications (2)

Publication Number Publication Date
WO2005022295A2 true WO2005022295A2 (en) 2005-03-10
WO2005022295A3 WO2005022295A3 (en) 2006-09-21

Family

ID=34107928

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2004/022301 WO2005022295A2 (en) 2003-07-30 2004-07-28 Media center controller system and method

Country Status (2)

Country Link
US (1) US20050027539A1 (en)
WO (1) WO2005022295A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009109038A1 (en) * 2008-03-04 2009-09-11 Streamband Flexible router
CN103594085A (en) * 2012-08-16 2014-02-19 百度在线网络技术(北京)有限公司 Method and system providing speech recognition result

Families Citing this family (75)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7747596B2 (en) * 2005-06-17 2010-06-29 Fotonation Vision Ltd. Server device, user interface appliance, and media processing network
US7792970B2 (en) 2005-06-17 2010-09-07 Fotonation Vision Limited Method for establishing a paired connection between media devices
US7685341B2 (en) * 2005-05-06 2010-03-23 Fotonation Vision Limited Remote control apparatus for consumer electronic appliances
KR100520118B1 (en) * 2003-08-21 2005-10-10 삼성전자주식회사 Integrated control device for multi controled device and integrated control method thereof
US20060041926A1 (en) * 2004-04-30 2006-02-23 Vulcan Inc. Voice control of multimedia content
US7613893B2 (en) * 2004-06-22 2009-11-03 Intel Corporation Remote audio
US20060036438A1 (en) * 2004-07-13 2006-02-16 Microsoft Corporation Efficient multimodal method to provide input to a computing device
US7865365B2 (en) * 2004-08-05 2011-01-04 Nuance Communications, Inc. Personalized voice playback for screen reader
US20060104430A1 (en) * 2004-11-12 2006-05-18 International Business Machines Corporation Method for multiple dialing by phone
US8942985B2 (en) * 2004-11-16 2015-01-27 Microsoft Corporation Centralized method and system for clarifying voice commands
US7778821B2 (en) * 2004-11-24 2010-08-17 Microsoft Corporation Controlled manipulation of characters
DE102005004941B4 (en) * 2005-02-02 2006-12-21 Avt Audio Vision Technology Gmbh Conversion of data, in particular for the reproduction of audio and / or video information
US7818350B2 (en) 2005-02-28 2010-10-19 Yahoo! Inc. System and method for creating a collaborative playlist
US7636300B2 (en) * 2005-04-07 2009-12-22 Microsoft Corporation Phone-based remote media system interaction
JP4872241B2 (en) * 2005-05-31 2012-02-08 船井電機株式会社 TV receiver
JP4765427B2 (en) * 2005-06-20 2011-09-07 船井電機株式会社 AV equipment with voice recognition function
US20070005370A1 (en) * 2005-06-30 2007-01-04 Scott Elshout Voice-activated control system
US7424431B2 (en) * 2005-07-11 2008-09-09 Stragent, Llc System, method and computer program product for adding voice activation and voice control to a media player
US9866697B2 (en) 2005-08-19 2018-01-09 Nexstep, Inc. Consumer electronic registration, control and support concierge device and method
US8224647B2 (en) 2005-10-03 2012-07-17 Nuance Communications, Inc. Text-to-speech user's voice cooperative server for instant messaging clients
US7966577B2 (en) 2005-10-11 2011-06-21 Apple Inc. Multimedia control center
US8769408B2 (en) * 2005-10-07 2014-07-01 Apple Inc. Intelligent media navigation
US7721208B2 (en) * 2005-10-07 2010-05-18 Apple Inc. Multi-media center for computing systems
US9083564B2 (en) 2005-10-13 2015-07-14 At&T Intellectual Property I, L.P. System and method of delivering notifications
US20070106506A1 (en) * 2005-11-07 2007-05-10 Ma Changxue C Personal synergic filtering of multimodal inputs
US7979790B2 (en) * 2006-02-28 2011-07-12 Microsoft Corporation Combining and displaying multimedia content
US7925975B2 (en) 2006-03-10 2011-04-12 Microsoft Corporation Searching for commands to execute in applications
US8813163B2 (en) * 2006-05-26 2014-08-19 Cyberlink Corp. Methods, communication device, and communication system for presenting multi-media content in conjunction with user identifications corresponding to the same channel number
US20070280282A1 (en) * 2006-06-05 2007-12-06 Tzeng Shing-Wu P Indoor digital multimedia networking
US20070292135A1 (en) * 2006-06-09 2007-12-20 Yong Guo Integrated remote control signaling
US20070286600A1 (en) * 2006-06-09 2007-12-13 Owlink Technology, Inc. Universal IR Repeating over Optical Fiber
US20090115915A1 (en) * 2006-08-09 2009-05-07 Fotonation Vision Limited Camera Based Feedback Loop Calibration of a Projection Device
US7769593B2 (en) 2006-09-28 2010-08-03 Sri International Method and apparatus for active noise cancellation
US8266664B2 (en) 2007-01-31 2012-09-11 At&T Intellectual Property I, Lp Methods and apparatus to provide messages to television users
US7765266B2 (en) * 2007-03-30 2010-07-27 Uranus International Limited Method, apparatus, system, medium, and signals for publishing content created during a communication
US8060887B2 (en) 2007-03-30 2011-11-15 Uranus International Limited Method, apparatus, system, and medium for supporting multiple-party communications
US7765261B2 (en) * 2007-03-30 2010-07-27 Uranus International Limited Method, apparatus, system, medium and signals for supporting a multiple-party communication on a plurality of computer servers
US7950046B2 (en) * 2007-03-30 2011-05-24 Uranus International Limited Method, apparatus, system, medium, and signals for intercepting a multiple-party communication
US8627211B2 (en) * 2007-03-30 2014-01-07 Uranus International Limited Method, apparatus, system, medium, and signals for supporting pointer display in a multiple-party communication
US8702505B2 (en) * 2007-03-30 2014-04-22 Uranus International Limited Method, apparatus, system, medium, and signals for supporting game piece movement in a multiple-party communication
US8150261B2 (en) * 2007-05-22 2012-04-03 Owlink Technology, Inc. Universal remote control device
US20080313050A1 (en) * 2007-06-05 2008-12-18 Basir Otman A Media exchange system
US8201096B2 (en) * 2007-06-09 2012-06-12 Apple Inc. Browsing or searching user interfaces and other aspects
US8185839B2 (en) * 2007-06-09 2012-05-22 Apple Inc. Browsing or searching user interfaces and other aspects
US10877623B2 (en) * 2007-06-18 2020-12-29 Wirepath Home Systems, Llc Dynamic interface for remote control of a home automation network
US20090018818A1 (en) * 2007-07-10 2009-01-15 Aibelive Co., Ltd. Operating device for natural language input
US20090064258A1 (en) * 2007-08-27 2009-03-05 At&T Knowledge Ventures, Lp System and Method for Sending and Receiving Text Messages via a Set Top Box
US20090156251A1 (en) * 2007-12-12 2009-06-18 Alan Cannistraro Remote control protocol for media systems controlled by portable devices
TWI385932B (en) * 2008-03-26 2013-02-11 Asustek Comp Inc Device and system for remote controlling
EP2141674B1 (en) * 2008-07-01 2019-03-06 Deutsche Telekom AG Assembly with device which can be controlled remotely
US8078397B1 (en) 2008-08-22 2011-12-13 Boadin Technology, LLC System, method, and computer program product for social networking utilizing a vehicular assembly
US8073590B1 (en) 2008-08-22 2011-12-06 Boadin Technology, LLC System, method, and computer program product for utilizing a communication channel of a mobile device by a vehicular assembly
US8265862B1 (en) 2008-08-22 2012-09-11 Boadin Technology, LLC System, method, and computer program product for communicating location-related information
US8131458B1 (en) 2008-08-22 2012-03-06 Boadin Technology, LLC System, method, and computer program product for instant messaging utilizing a vehicular assembly
US8340974B2 (en) * 2008-12-30 2012-12-25 Motorola Mobility Llc Device, system and method for providing targeted advertisements and content based on user speech data
US20120030712A1 (en) * 2010-08-02 2012-02-02 At&T Intellectual Property I, L.P. Network-integrated remote control with voice activation
US8861744B1 (en) * 2011-03-15 2014-10-14 Lightspeed Technologies, Inc. Distributed audio system
US20120317492A1 (en) * 2011-05-27 2012-12-13 Telefon Projekt LLC Providing Interactive and Personalized Multimedia Content from Remote Servers
KR20130037777A (en) * 2011-10-07 2013-04-17 삼성전자주식회사 Display apparatus and control method thereof
US9020824B1 (en) * 2012-03-09 2015-04-28 Google Inc. Using natural language processing to generate dynamic content
CN104272202B (en) * 2012-06-29 2017-07-14 哈曼(中国)投资有限公司 Automotive universal control device for interfacing with sensor and controller
US20140052438A1 (en) * 2012-08-20 2014-02-20 Microsoft Corporation Managing audio capture for audio applications
US9805721B1 (en) * 2012-09-21 2017-10-31 Amazon Technologies, Inc. Signaling voice-controlled devices
US9344562B2 (en) * 2012-11-30 2016-05-17 At&T Intellectual Property I, Lp Apparatus and method for managing interactive television and voice communication services
US9224404B2 (en) * 2013-01-28 2015-12-29 2236008 Ontario Inc. Dynamic audio processing parameters with automatic speech recognition
US9894312B2 (en) 2013-02-22 2018-02-13 The Directv Group, Inc. Method and system for controlling a user receiving device using voice commands
US10133546B2 (en) * 2013-03-14 2018-11-20 Amazon Technologies, Inc. Providing content on multiple devices
US9842584B1 (en) 2013-03-14 2017-12-12 Amazon Technologies, Inc. Providing content on multiple devices
CN105357564A (en) * 2014-08-20 2016-02-24 中兴通讯股份有限公司 Remote control mobile terminal, remote control system and remote control method
US10178473B2 (en) 2014-09-05 2019-01-08 Plantronics, Inc. Collection and analysis of muted audio
US9691070B2 (en) * 2015-09-01 2017-06-27 Echostar Technologies L.L.C. Automated voice-based customer service
US10496363B2 (en) 2017-06-16 2019-12-03 T-Mobile Usa, Inc. Voice user interface for data access control
US10334415B2 (en) * 2017-06-16 2019-06-25 T-Mobile Usa, Inc. Voice user interface for device and component control
US10152297B1 (en) 2017-11-21 2018-12-11 Lightspeed Technologies, Inc. Classroom system
US10595073B2 (en) * 2018-06-03 2020-03-17 Apple Inc. Techniques for authorizing controller devices

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5774859A (en) * 1995-01-03 1998-06-30 Scientific-Atlanta, Inc. Information system having a speech interface
US6397186B1 (en) * 1999-12-22 2002-05-28 Ambush Interactive, Inc. Hands-free, voice-operated remote control transmitter

Family Cites Families (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03163623A (en) * 1989-06-23 1991-07-15 Articulate Syst Inc Voice control computor interface
US5020107A (en) * 1989-12-04 1991-05-28 Motorola, Inc. Limited vocabulary speech recognition system
US5689663A (en) * 1992-06-19 1997-11-18 Microsoft Corporation Remote controller user interface and methods relating thereto
US5890122A (en) * 1993-02-08 1999-03-30 Microsoft Corporation Voice-controlled computer simulateously displaying application menu and list of available commands
US5748841A (en) * 1994-02-25 1998-05-05 Morin; Philippe Supervised contextual language acquisition system
US5748974A (en) * 1994-12-13 1998-05-05 International Business Machines Corporation Multimodal natural language interface for cross-application tasks
JP3484554B2 (en) * 1995-02-28 2004-01-06 日本テキサス・インスツルメンツ株式会社 Semiconductor device
IL119948A (en) * 1996-12-31 2004-09-27 News Datacom Ltd Voice activated communication system and program guide
US6188985B1 (en) * 1997-01-06 2001-02-13 Texas Instruments Incorporated Wireless voice-activated device for control of a processor-based host system
US5884266A (en) * 1997-04-02 1999-03-16 Motorola, Inc. Audio interface for document based information resource navigation and method therefor
US5987525A (en) * 1997-04-15 1999-11-16 Cddb, Inc. Network delivery of interactive entertainment synchronized to playback of audio recordings
US6313851B1 (en) * 1997-08-27 2001-11-06 Microsoft Corporation User friendly remote system interface
US6505159B1 (en) * 1998-03-03 2003-01-07 Microsoft Corporation Apparatus and method for providing speech input to a speech recognition system
US6434524B1 (en) * 1998-09-09 2002-08-13 One Voice Technologies, Inc. Object interactive user interface using speech recognition and natural language processing
US6499013B1 (en) * 1998-09-09 2002-12-24 One Voice Technologies, Inc. Interactive user interface using speech recognition and natural language processing
US6304523B1 (en) * 1999-01-05 2001-10-16 Openglobe, Inc. Playback device having text display and communication with remote database of titles
US6553345B1 (en) * 1999-08-26 2003-04-22 Matsushita Electric Industrial Co., Ltd. Universal remote control allowing natural language modality for television and multimedia searches and requests
US6192340B1 (en) * 1999-10-19 2001-02-20 Max Abecassis Integration of music from a personal library with real-time information
US6339706B1 (en) * 1999-11-12 2002-01-15 Telefonaktiebolaget L M Ericsson (Publ) Wireless voice-activated remote control device
US20020055844A1 (en) * 2000-02-25 2002-05-09 L'esperance Lauren Speech user interface for portable personal devices
FR2810150B1 (en) * 2000-06-13 2002-10-04 St Microelectronics Sa DYNAMIC RANDOM MEMORY DEVICE AND METHOD FOR CONTROLLING ACCESS TO READ SUCH A MEMORY
US6629077B1 (en) * 2000-11-22 2003-09-30 Universal Electronics Inc. Universal remote control adapted to receive voice input
US20020090934A1 (en) * 2000-11-22 2002-07-11 Mitchelmore Eliott R.D. Content and application delivery and management platform system and method
US20020099545A1 (en) * 2001-01-24 2002-07-25 Levitt Benjamin J. System, method and computer program product for damage control during large-scale address speech recognition
US6747566B2 (en) * 2001-03-12 2004-06-08 Shaw-Yuan Hou Voice-activated remote control unit for multiple electrical apparatuses
WO2003050557A2 (en) * 2001-12-07 2003-06-19 Dashsmart Investments, Llc Portable navigation and communication systems
EP1466460A1 (en) * 2002-01-15 2004-10-13 Avaya Technology Corp. Communication application server for converged communication services
KR20040054061A (en) * 2002-12-17 2004-06-25 주식회사 이머텍 Internet Phone System and Method for a Mobile Telephone Service

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5774859A (en) * 1995-01-03 1998-06-30 Scientific-Atlanta, Inc. Information system having a speech interface
US6397186B1 (en) * 1999-12-22 2002-05-28 Ambush Interactive, Inc. Hands-free, voice-operated remote control transmitter

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009109038A1 (en) * 2008-03-04 2009-09-11 Streamband Flexible router
CN103594085A (en) * 2012-08-16 2014-02-19 百度在线网络技术(北京)有限公司 Method and system providing speech recognition result
CN103594085B (en) * 2012-08-16 2019-04-26 百度在线网络技术(北京)有限公司 It is a kind of that the method and system of speech recognition result are provided

Also Published As

Publication number Publication date
WO2005022295A3 (en) 2006-09-21
US20050027539A1 (en) 2005-02-03

Similar Documents

Publication Publication Date Title
US20050027539A1 (en) Media center controller system and method
US9948772B2 (en) Configurable phone with interactive voice response engine
JP3651508B2 (en) Information processing apparatus and information processing method
US6816577B2 (en) Cellular telephone with audio recording subsystem
US6792082B1 (en) Voice mail system with personal assistant provisioning
US6697467B1 (en) Telephone controlled entertainment
US8063749B2 (en) Multifunctional two-way remote control device
US20070026852A1 (en) Multimedia telephone system
EP1924977A1 (en) Method and system for obtaining feedback from at least one recipient via a telecommunication network
CN100455086C (en) Monitoring mobile phone and its remote monitoring method
WO2001078443A2 (en) Earset communication system
US7213259B2 (en) Method and apparatus for a mixed-media messaging delivery system
EP2312821A1 (en) Method and apparatus for unified interface for heterogeneous session management
CN111835923A (en) Mobile voice interactive dialogue system based on artificial intelligence
US7158618B1 (en) Apparatus and method for recording URL together with voice message in a browser equipped telephone
JP4503564B2 (en) Mobile phone having an answering machine announcement generation function and an answering machine announcement generation system
US9088815B2 (en) Message injection system and method
JP2008048126A (en) Cellular phone having function of generating announcement for answering machine and system of generating announcement for answering machine
IL157044A (en) System and method for recording audible and/or visual information
KR100651512B1 (en) Method for dispatching and receiving the message in wireless terminal
JP2008099121A (en) Cellular phone, and program
JP4537360B2 (en) Mobile phone having an answering machine announcement generation function and an answering machine announcement generation system
JP2674951B2 (en) Button telephone device
JP2005141767A (en) Information processor and information processing method
JP2003069718A (en) System for supporting remote interaction between person handicapped in hearing and person having no difficulty in hearing

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase