US 20100184409 A1
A device and method for managing voice messages using a graphic user interface (GUI) to display text transcribed from the content of a voice message, to parse the transcribed text and to use the parsed data for an application running on a device. The present invention hence replaces the current approach of retrieving voice messages by phone using a voice prompt based system with a GUI based system. This GUI based system individually lists voice messages in a menu list, making it very simple for a user not only to read the transcribed text from the voice message but also to run applications based on the parsed data from the transcribed text, as well as to perform other more familiar functions such as retrieving the voice message for playback.
1. A device for managing voice messages left for a user comprising a graphical user interface that displays text transcribed from the content of a voice message, and a parser that parses data from the transcribed text, the parsed data being used for an application running on the device.
2. A device for managing voice messages as in
3. A device for managing voice messages as in
4. A device for managing voice messages as in
5. A device for managing voice messages as in
6. A device for managing voice messages as in
7. A device for managing voice messages as in
8. A device for managing voice messages as in
9. A device for managing voice messages as in
10. A device for managing voice messages as in
11. A device for managing voice messages as in
12. A device for managing voice messages as in
13. A device for managing voice messages as in
14. A device for managing voice messages as in
15. A device for managing voice messages as in
16. A device for managing voice messages as in
17. A device for managing voice messages as in
18. A device for managing voice messages as in
19. A device for managing voice messages as in
20. A device for managing voice messages as in
21. A device for managing voice messages as in
22. A device for managing voice messages as in
23. A method for managing voice messages left for a user comprising the steps of: (a) displaying text transcribed from the content of a voice message and (b) parsing the transcribed text, the parsed data being used for an application running on a device.
24. A method for managing voice messages as in
25. A method for managing voice messages as in
26. A method for managing voice messages as in
27. A method for managing voice messages as in
28. A method for managing voice messages as in
29. A method for managing voice messages as in
30. A method for managing voice messages as in
31. A method for managing voice messages as in
32. A method for managing voice messages as in
33. A method for managing voice messages as in
34. A method for managing voice messages as in
35. A method for managing voice messages as in
36. A method for managing voice messages as in
37. A method for managing voice messages as in
38. A method for managing voice messages as in
39. A method for managing voice messages as in
40. A method for managing voice messages as in
41. A method for managing voice messages as in
1. Field of the Invention
This invention relates to a method of managing voicemails from a mobile telephone.
2. Description of the Prior Art
Voicemail has the sole purpose of storing voice messages from someone trying to call a user's telephone when that user is otherwise unavailable and then relaying those messages to the user when convenient. But today's voicemail systems, particularly for mobile telephones, fail to do this intelligently. The primary reason is the nature of the interface from the user's mobile telephone to the remote voice mail server: typically, a mobile telephone user will call (or be called by) a voicemail server controlled by the network operator. The voicemail server will generate a synthetic voice announcing the number of messages to the user and then replaying the messages; various options are then spoken by the synthetic voice, such as “press 1 to reply”, “press 2 to delete”, “press 3 to repeat” etc. This presents several challenges to the user: first, he may not have a pen and paper to hand to take down any important information; secondly, he may forget or not be able to hear the options and hence will be unable to operate the voicemail system effectively.
Because of this inadequate and opaque interface, voicemail is not used by at least 45% of mobile telephone users. Of those that do use voicemail, it typically accounts for 30% of a user's call time and spend. One of the reasons for this perhaps surprisingly high level is that, because of the difficult interface, users frequently dial in again just to listen to key voice messages they did not get the details of the first time round.
Some efforts have been made to make retrieving voicemails easier: reference may be made for example to U.S. Pat. No. 6,507,643 to Breveon Inc: in this patent, voice mail is automatically converted, using a voice recognition computer, to a text message suitable for sending as an e-mail message and for viewing on a text display device such as a PC or laptop computer. Reading a written message can be quicker than having to listen to a spoken voicemail; there is also no need to write down important information from the message since it has already been transcribed. However, automated voicemail systems have quite limited performance and accuracy; they also slavishly transcribe the normal hesitations in human speech (‘er’, ‘um’, ‘ah’ etc.). When one is listening to human speech, one can readily filter out these sounds and concentrate on the substantive communication. Seeing these hesitations slavishly transcribed to a text message or an e-mail can make the sender appear less than lucid.
In a first aspect, there is a method of managing voice messages using a mobile telephone, comprising the steps of:
The present invention hence supplants the current approach of retrieving voice messages (based on the user listening to various options spoken by a synthetic voice, such as “press 1 to reply”, “press 2 to delete”, “press 3 to repeat”) with a GUI based system; this system individually lists voice messages in a menu list displayed on the mobile telephone, making it very simple for an end-user to select a message to initiate playback of the voice message.
In an implementation, the GUI is a hierarchical interface which at a first or second level lists the number of stored voice messages in an in-box. The interface may list at a first or second level whether the received voice messages are new or have been listened to. The interface could be an inbox view with folders for storage/retrieval of voice messages.
In addition, the GUI can list the name of a person leaving a voice message or their telephone number. This greatly aids operation: the end user can quickly scan the menu list of stored voice messages, looking at the caller name etc. to decide if there are any important messages to listen to immediately.
The GUI may display a menu list with one or more of the following selectable options: play all voice messages; delete all voice messages; mark all voice messages as heard; forward all voice messages; store all voice messages. Again, this GUI-based approach is far easier for most people to operate than the prior art “press 1 to reply”, “press 2 to delete”, “press 3 to repeat” etc. approach.
The GUI may also be a hierarchical interface which displays a menu list of selectable items that enable the user to initiate further actions in respect of a selected voice message. For example, the further actions could be selected from the list: erase voice message; next voice message; fast forward through voice message; rewind through voice message; play previous voice message; call back to sender of voice message; open up text messaging application; store voice message in a specific folder; forward voice message; add caller's telephone number to contacts; configure greetings; configure call diversion behaviour. Again, presenting these options graphically on a display of the mobile telephone is far better than the current approach, which gives no visual cues as to how to initiate these functions.
Adding a caller's telephone number to a contacts application is an example of parsing the transcribed text message and using the parsed data in an application running on the mobile telephone. The GUI can display a menu list of other selectable items that enable the user to initiate further kinds of parsing and use of the parsed data. For example:
(a) extracting the phone number spoken allowing it to be used (to make a call), saved, edited or added to a phone book;
One or more items from the list could be displayed whilst the voice message is being played back on the device.
Speaking a command to initiate the further actions is also possible; then the telephone may display synchronised aural prompts (IVR) to facilitate a user speaking the command they want executed.
In another implementation, voice messages are succinctly transcribed to text format by remote, human transcribers and the transcribed messages are then sent to the mobile telephone. The GUI then lists any voice messages that have been converted to text format and the GUI further enables those voice messages converted to text format to be selected to cause the text format message to be displayed.
In a second aspect, there is a mobile telephone programmed to perform the above methods.
The present invention will be described with reference to the accompanying drawings, in which:
The present invention is implemented by SpinVox Limited, London, United Kingdom as part of a suite of mobile telephone products:
1. VoicemailView™: Voicemail to Text system—This gives subscribers the option to have voicemail delivered to their mobile telephone as text (SMS/MMS or equivalent messaging format) with the option to hear the original voicemail on the mobile telephone. The term ‘SMS’ means the short message service for sending plain text messages to mobile telephones; ‘MMS’ means the multimedia messaging service developed by 3GPP (Third Generation Partnership Project) for sending multimedia communications between mobile telephones and other forms of wireless information device. The terms also embrace any intermediary technology (such as EMS (Enhanced Message Service)) and variants, such as Premium SMS, and any future enhancements and developments of these services.
Key to the accurate transcription of voice messages to text format (as deployed in VoicemailView and VoiceMessenger) is the use of human operators, and not automated voice recognition systems, to do the actual transcribing intelligently by extracting the message (not a verbose word-for-word transcription). Key to the efficient operation of this system is an IT architecture that rapidly sends voice files to the operators and allows them to rapidly hear these messages, efficiently generate a transcription and then send the transcribed message as a text message.
A. VoicemailView™ Voicemail to Text system
There are three solutions described which deliver the Voicemail to Text system:
Referring now to
Referring now to
Referring now to
In any of the above variants, the mobile phone (or other wireless information device of some nature) will need to be upgraded OTA (Over the Air) or otherwise, in the following manner:
There are two options:
When one opens a standard SMS message, one can generally readily access further functionality (via an Options menu in Nokia mobile telephones, for example), such as ‘Erase’, ‘Reply’, ‘Edit’ etc. Under this standard ‘Options’ menu, or equivalent, the present implementation adds three new functions, as shown in
We expand on these new functions below:
Hear Original: This allows the user to now hear the original voicemail and uses the unique i/d encoded into the SMS/MMS message to correctly connect to the original voice file.
There are three options:
(i) The user goes into the standard voicemail system and follows the existing audio prompts for hearing the message.
In either case, upon ending the call to voicemail, the user is returned to the same point in the messaging application to decide what to do with the text/audio version.
(iii) The user embeds the original sound file in an MMS message (or equivalent, such as e-mail) to be played back locally on the terminal.
This uses the caller's number recorded with the message to call them back.
This takes the caller's number and automatically adds it to a new contact/address entry for the user to complete with name, etc.
This is a specific example of the mobile telephone software being able to parse the text that has been converted from voice and to use that intelligently. Other examples are:
(a) extracting the phone number spoken allowing it to be used (to make a call), saved, edited or added to a phone book;
The extent to which this can be done depends on the intelligence in the handset (in essence its parsing capacity and its interoperability with other applications and with the common clipboard where this data is normally stored for use in other applications). Today, nearly all phones support extraction of phone numbers, email addresses and web addresses from a text message. This is normally made available when the user is reading the message by the content being underlined (as a hyperlink or equivalent); the user then simply selects ‘Options’ (as found on Nokia telephones, or its equivalent on a different make of handset) and ‘Use’ (as found on Nokia telephones, or its equivalent on a different handset) and then, depending on the content type, further context sensitive options (e.g. with a street address it might offer: Look up, Navigate, Save in Address book, etc.).
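By way of illustration only, the kind of parsing described above can be sketched as follows. The regular expressions and the function name `parse_message` are assumptions made for this sketch; they are deliberately simple and do not represent the parser of any particular handset.

```python
import re

# Hypothetical, deliberately loose patterns (assumptions for this sketch).
PHONE_RE = re.compile(r"\+?\d[\d ]{6,}\d")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"(?:https?://|www\.)[^\s,]+")


def parse_message(text):
    """Return the actionable items found in a transcribed voice message."""
    return {
        # Spaces are dropped so the number can be dialled or saved directly.
        "phones": [m.replace(" ", "") for m in PHONE_RE.findall(text)],
        "emails": EMAIL_RE.findall(text),
        "urls": URL_RE.findall(text),
    }


items = parse_message(
    "Call me on 07798625155 or mail mike@example.com, details at www.example.com"
)
```

Each extracted item could then be offered to the user under a context sensitive menu (call, save, add to phone book, and so on).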
This application can be used either stand-alone or as an integral part of the VoicemailView Voice to SMS/MMS system (or equivalent text delivery system) described above at B.1.
The Voicemail Management application gives a user a GUI (Graphical User Interface) in addition to the standard audio prompts they are used to receiving when accessing and managing normal audio voicemail. When a subscriber calls (
For programming purposes, these controls will nearly all relate to standard DTMF tones that the voicemail system uses as input to it when the user currently presses keys on their phone's keypad.
During this process, the user is always offered the aural navigation options which are synchronised with what is shown on-screen, so that they have the best of both worlds. With the use of simple command based Speech Recognition, the user may just speak the command they want to execute, so if the user wants to play new messages, they would just say “Play” and the VoicemailManager engine would recognise this command and do just that—play the message.
Note: The exact numbers (keypad numbers) and their related functions will be those of the existing voicemail system and so will vary by network operator/voicemail system.
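A minimal sketch of how a GUI selection or a recognised spoken command might be translated into the DTMF digit expected by the voicemail system is given below. The key assignments are assumptions taken from the prompts quoted in the prior art discussion (“press 1 to reply”, “press 2 to delete”, “press 3 to repeat”); as noted above, a real deployment would load the table for the operator's own voicemail system. The names `DTMF_MAP` and `dtmf_for` are hypothetical.

```python
# Assumed key assignments; these vary by network operator/voicemail system.
DTMF_MAP = {"reply": "1", "delete": "2", "repeat": "3"}


def dtmf_for(action, dtmf_map=DTMF_MAP):
    """Return the DTMF digit to send for a menu selection or spoken command."""
    key = action.strip().lower()
    if key not in dtmf_map:
        raise ValueError(f"no DTMF mapping for {action!r}")
    return dtmf_map[key]
```

Keeping the table as data rather than code means the same GUI can be re-targeted at a different voicemail system by swapping the mapping.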
Users often prefer to send a message in text format rather than voice, e.g. if they do not want to disturb the recipient but still want to get the message to them. But it is often difficult for people to thumb-type text on a small alpha-numeric keypad. They may also be on the move, such as walking or in a car, have only one hand available, or be unable to type, such as whilst driving. The VoiceMessenger™ speech to text service addresses this need.
The user goes into their Messaging/Text application running on their mobile telephone, simply selects the message recipient either from their phone's address book, or types their number in, then selects the new VoiceMessenger option, as shown in
When connected to the remote VoiceMessenger Engine, the user simply speaks his message and the remote VoiceMessenger Engine records it, and then sends the audio file for conversion to text using the human operator based voice transcription system. The text format message is then packaged as a SMS/MMS (email or other appropriate messaging system) and sent through the SMS/MMS etc. gateway. The user will be given aural prompts for controlling the input, hearing the conversion and sending the message.
A user with an MMS enabled phone will be able to send voice-notes via an MMS which the human operator based voice transcription service will then transcribe and send on to their desired destination. They can also have their Voicemail converted and sent to their phone in MMS format if preferred.
This is to speed up the processing of inbound voice files and reduce operating costs. The prime function will be to auto-detect spoken phone numbers, and detect language to route audio files to the correct human operator staffed transcription bureau. It will also be used for detecting names and spoken numbers and addresses from the users online phone-book (see below) and commands for VoicemailManager controls.
There will be two forms of online address book that a user will be able to use when connected to SpinVox services by simply saying the name of the person they want to say:
Using Presently Available Servers, users can define what mode they want to be in for receiving communications, e.g. ‘Meeting’ lets a caller know before they communicate that the person they want to contact is in a meeting and will accept, say, an SMS/MMS or a VoiceView text message. Once out of the meeting, the user can then change their contact status to ‘Available’ and be contacted by a phone call.
A standard voicemail server system with IVR is the foundation; the IVR is programmed as shown in the
The user's phone will (during technical provisioning shown below) have the ‘1’ key (standard voicemail access key) re-programmed to automatically call the SpinVox voicemail server and have them automatically logged-in (unique phone-number+PIN) which takes them to the top level of the IVR tree.
If at any point the user hangs up, then the session is terminated with the relevant outcome. If this happens during a recording, including a dropped line from another mobile caller, then it is assumed to be the end of a recording, and the system proceeds to the transcription stage.
Each transcribed voicemail will contain a unique number starting with say a ‘4’ (depends on final IVR tree configuration), so that when a user presses and holds ‘1’ to connect to SpinVox's voicemail server, they simply press the unique message i/d—e.g. 403 which takes them to the 3rd message they have in the queue.
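The message i/d scheme can be sketched as below, assuming (as in the example above) a single-digit prefix of ‘4’ followed by the position of the message in the queue. The function name `queue_position` is hypothetical, and the prefix is a parameter because it depends on the final IVR tree configuration.

```python
def queue_position(message_id, prefix="4"):
    """Return the 1-based queue position encoded in a message i/d such as '403'."""
    body = message_id[len(prefix):]
    if not message_id.startswith(prefix) or not body.isdigit():
        raise ValueError(f"not a valid message i/d: {message_id!r}")
    position = int(body)
    if position < 1:
        raise ValueError(f"queue positions start at 1: {message_id!r}")
    return position
```

With these assumptions, the i/d ‘403’ resolves to the 3rd message in the queue.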
As shown in
The IVR system will accept a user programming in a speed-dial that allows them to dial their unique SpinVox number+PIN. They are then able to access all features shown above.
The user's phone is configured to divert to SpinVox voicemail under conditions they define shown below, where the caller will either hear:
The above IVR diagram shows how a user accesses VoiceMessenger, whether directly from their mobile phone, or via another phone.
The IVR system will accept a user programming in a speed-dial that allows them to dial their unique SpinVox number+PIN+‘3’.
If from their mobile phone, the technical provisioning below will have configured a speed-dial (by default key ‘2’) to dial and log them in (voicemail number+PIN+3) directly to the VoiceMessenger option.
They will then hear a standard prompt:
“Welcome to SpinVox's VoiceMessenger. At the tone, please either speak the destination number or type it in, then dictate the message you wish to send. Hang-up to send, or press # to send a new message.” [tone]
During Technical Provisioning, user data (handset, network, etc.) will be re-used to confirm to the user what they have selected.
Key will be the system sending the user SMS messages to part automate the configuration of the user's handset (diverts & V.Card for VoiceMessenger) and confirmation of successful setup. These messages are all sent as High Priority to ensure user/salesperson is not left ‘hanging’ whilst waiting for configuration SMS to arrive.
The steps are:
Step 1: handset selection, from a drop down list shown on the provisioning screen (usually at the point of sale)
This is provided to a human operator transcriber when they log-on to their account. All they need is a web browser, sound card, media player capable of playing and controlling playback of the media files or streaming protocol, and high-speed internet access.
Note: For User Data Protection reasons, the Transcriber will never see auto-populated telephone fields (or other user data fields), so the system will not show these unless it requires the Transcriber to type the destination number in.
When the Transcriber hits ‘Send’, the system will automatically spell check the message and, if any errors occur, correct them and display the corrections to the Transcriber with a prompt ‘Accept & Send’, or allow them to manually correct (as there might be a particular spelling they want).
To do this properly, the spell checking process will include a real-noun dictionary relevant to the geographic area and culture of the user. So for example, in the UK the real-noun dictionary will contain not only English names, but place names, landmarks, road-names, chain establishment names (e.g. pubs, bars, restaurants, etc.), etc.
Where there isn't a match, the Transcriber just double clicks on the underlined word and is offered the closest matches. If need be, they can rewind and re-listen to that part of the message to make the appropriate selection.
They can view the statistics for all the Transcriber accounts they own below them. They will be able to view and analyse:
These are the requirements for the Transcription Services to be used for both VoicemailView and VoiceMessenger services.
The key requirement is to deliver the actual message, not all the redundant information which is often spoken and left in a message.
The Transcription service must minimally provide complete confidentiality of messages it transcribes, within the Data Protection Act 98 or other legislation in force at the time.
If the user receives a text message, it will be intelligible—99% accurate to original voice file message.
All numbers, phone numbers, email addresses, web addresses, street addresses will be correctly converted.
Character Set 100% Compatible with SMS/MMS Allowed Characters
Characters used during transcription are compatible with the SMS/MMS system resulting message will be sent through.
User will clearly know to continue to next message to continue reading transcription. If system doesn't automatically provide obvious prompt to do so, then insert ‘1 of 2’, ‘2 of 3’ or the like.
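One possible way of splitting a long transcription and inserting the continuation markers is sketched below. The 160 character limit is that of a single SMS using the default GSM 7-bit alphabet; the ‘(n of m)’ marker format, the cap of nine parts and the function name `split_for_sms` are assumptions of this sketch.

```python
import textwrap

SMS_LIMIT = 160  # characters in one SMS using the GSM 7-bit alphabet


def split_for_sms(text, limit=SMS_LIMIT):
    """Split text on word boundaries into parts marked '(1 of 2)', '(2 of 2)', ..."""
    if len(text) <= limit:
        return [text]
    marker_room = len(" (9 of 9)")  # assumes no more than 9 parts
    chunks = textwrap.wrap(text, width=limit - marker_room)
    if len(chunks) > 9:
        raise ValueError("message too long for this sketch (more than 9 parts)")
    total = len(chunks)
    return [f"{chunk} ({i} of {total})" for i, chunk in enumerate(chunks, start=1)]


parts = split_for_sms("word " * 50)
```

Reserving room for the marker before wrapping guarantees that every part, marker included, still fits in a single message.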
Transcriptionists must be able to deal with the various regional accents and sayings that occur in a country. For instance, in the UK alone, there are over 12 regional accents ranging from the ‘posh’ South-Eastern accent to the thick Glaswegian accent of West Scotland to the lilted Irish accent. These should be translated correctly and in their form of saying things. Routing of a message to transcribers with the appropriate capabilities may be provided.
Typically speech contains much redundant ‘noise’, e.g.: ‘ummms’, ‘ahhh's’, ‘erre’, ‘ehmm’, pauses, breaths, coughs, sneezes and other typical speech artefacts. These clearly mustn't be included in the transcription.
Often a message will contain repeated phrases or names to clarify what is being said. These shouldn't be included.
Spoken message: “See you outside Waxy O'Connors, that's Waxy as in candle wax and O'Connor as in Irish singer Sinead O'Connor.”
Standard abbreviation of common terms should be used:
Whenever a number is spoken, the numeric format will be written down.
E.g. “See you at seven forty five tonight”=“See you at 7:45 pm”
To save character space, phone numbers are a single string of numbers with no spaces:
E.g.: 07798625155, not 07798 625 155 as two additional space characters are being used.
If phone number is given with 00 for international dialing, then convert this into a ‘+’.
e.g. 00442075864103 should be +442075864103.
Again this saves character spaces and correctly defines the number for international dialing prefix which is interpreted by the local Network for the correct international dial out code which isn't always 00 (e.g. in US it's 011).
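The two phone number rules above (no spaces, and a leading 00 converted to ‘+’) can be expressed as a short routine. This is a sketch only; the function name `normalise_phone_number` is hypothetical, and any character other than a digit or ‘+’ is simply discarded.

```python
def normalise_phone_number(number):
    """Apply the transcription rules: single string of digits, 00 prefix becomes '+'."""
    compact = "".join(ch for ch in number if ch in "0123456789+")
    if compact.startswith("00"):
        compact = "+" + compact[2:]
    return compact
```

Applied to the examples above, `07798 625 155` becomes `07798625155` and `00442075864103` becomes `+442075864103`.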
Messages must be correctly spelt and it is suggested that the relevant spell checker is used for all messages, e.g. UK English for the UK, US English for the US, etc.
The dictionary/spell checker used must include Real Nouns (names) and Place Names to assist in getting the information in the message right 1st time.
There are several aspects of this:
In multi-cultural societies, it is important to know that on many days a certain community will be celebrating something. For example, the Hindu new year (Divali) is not the same as the main UK new year, so on Divali, Transcribers must be prepared to hear greetings and wishes containing this and other associated words, and know how to spell them or what a message's context might mean.
(ii) Normal annual events: Easter, Christmas, New Year, etc.
The local Transcription Bureau Manager must have a full calendar of all cultural, social and sporting events which they must plan for at least 2 days in advance. In addition, this will be critical to determining the likely load balancing required with staff. For instance, at the end of the recent England Rugby world cup win, the text messaging and voicemail loads in the 2-3 hours that followed the match probably exceeded 300% of their normal levels and there would have been lots of references to players' names, technical words used in the game (try, conversion, ruck, maul, etc.), foreign cities and locations, and of course the following day all the traffic related to people getting back from the event, etc., which will naturally skew the load balancing again.
After the best attempt has been made to figure out what the word might be (could be the name of a bar or place that is outside the normal vocabulary), a question mark in brackets will be placed after it.
Spoken message: Meet you at Jongleurs at 6 tonight.
The message may contain ‘drop-outs’, ‘gaps’ or other interference due to temporary Network coverage issues. In this case, insert a ‘_’ where the word(s) are missing.
E.g. “John, it's Mike and I'm _ late _ so see you at 6 pm.”
This will likely prompt the user to dial-in to listen to the original and see if they can make sense of the message.
More than 3 Drop Outs:
In the case the message is unintelligible due to a high number of drop outs (3 or more), then use the ‘Undecipherable’ option to send the user a notice that they need to either listen to a voicemail or try speaking their text message again.
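The drop-out rule can be sketched as a simple count of ‘_’ markers in the transcription. The threshold follows the wording ‘3 or more’ in the paragraph above; the constant and function names are assumptions of this sketch.

```python
DROP_OUT_MARK = "_"
MAX_DROP_OUTS = 3  # '3 or more' drop outs means the message is treated as unintelligible


def is_undecipherable(transcript):
    """Return True when the transcript has too many drop-out markers to be sent as text."""
    return transcript.count(DROP_OUT_MARK) >= MAX_DROP_OUTS


example = "John, it's Mike and I'm _ late _ so see you at 6 pm."
```

The example message contains two drop outs, so under this rule it would still be sent as text rather than routed to the ‘Undecipherable’ option.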
The user will be notified via a text message using a standard template that there are undecipherable voice messages for them to listen to:
The standard text will say, “You have x new voicemail(s) to listen to that couldn't be converted. To hear them, please connect to VoicemailView by pressing and holding 1.”
Then the following fields will be automatically populated:
The standard text will say “We're sorry we couldn't convert the message you just dictated. Please try again speaking slowly and clearly. Thank you!”
Then the following fields will be automatically populated:
When it is clear that the person leaving the message is also using mood as part of the message, then the transcriptionist will include the following at the beginning of the message:
When the mood is unclear (e.g. may be just the way that person talks or the context that they're in), then don't add this in.
It is becoming common to insert text symbols to represent emotions (emoticons). The following is the set that we will support and publish on our website.
The official full listing of SMS-Speak is at: http://sites.ninemsn.com.au/minisite/web2sms/help/smsdict.asp
During dictation of the VoiceMessenger message, the user may say “Insert symbol-name” and the transcriber will insert the appropriate symbol.
E.g. “Thanks for confirming our trip. Insert smiley. Bye!”=“Thanks for confirming our trip :-) Bye!”
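In the service described, this substitution is made by the human transcriber. Purely to illustrate the rule, the same replacement could be sketched in software as below. Only the ‘smiley’ entry comes from the example above; the table name `EMOTICONS`, the function name and the handling of surrounding full stops are assumptions of this sketch.

```python
import re

# Only 'smiley' is taken from the example in the text; other entries would come
# from the published set of supported symbols.
EMOTICONS = {"smiley": ":-)"}

INSERT_RE = re.compile(r"\.?\s*\binsert (\w+)\.?", re.IGNORECASE)


def insert_symbols(dictated):
    """Replace spoken 'Insert symbol-name' commands with the matching text symbol."""
    def replace(match):
        symbol = EMOTICONS.get(match.group(1).lower())
        # Unknown symbol names are left exactly as dictated.
        return match.group(0) if symbol is None else " " + symbol

    return INSERT_RE.sub(replace, dictated).strip()


result = insert_symbols("Thanks for confirming our trip. Insert smiley. Bye!")
```

The pattern also absorbs the full stops on either side of the command, which reproduces the form of the example above.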
Normal punctuation should be used, such as capitals at the beginning of a sentence, full stops, question marks, exclamation marks, colons and semi-colons where it is clear that the intonation or the grammar requires it.
The Grammar checker used in the Transcribe Assistant ought to help eliminate mistypes.
Time taken for text message to arrive on receiver's phone from end of voicemail recording is on average 2 mins:
Queuing and load-balancing will be necessary to ensure optimal throughput of messages.