CA2244624A1 - Method and system for aligning natural and synthetic video to speech synthesis - Google Patents
Info
- Publication number
- CA2244624A1
- Authority
- CA
- Canada
- Prior art keywords
- text
- time stamp
- facial animation
- encoder
- bookmarks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/46—Embedding additional information in the video signal during the compression process
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transforming into visible information
- G10L2021/105—Synthesis of the lips movements from speech, e.g. for talking heads
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/20—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
Abstract
According to MPEG-4's TTS architecture, facial animation can be driven by two streams simultaneously: text and Facial Animation Parameters. In this architecture, text input is sent to a Text-to-Speech converter at a decoder that drives the mouth shapes of the face. Facial Animation Parameters are sent from an encoder to the face over the communication channel. The present invention includes codes (known as bookmarks) in the text string transmitted to the Text-to-Speech converter; these bookmarks are placed between words as well as inside them. According to the present invention, the bookmarks carry an encoder time stamp. Due to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time and should be interpreted as a counter. In addition, the Facial Animation Parameter stream carries the same encoder time stamp found in the bookmark of the text. The system of the present invention reads the bookmark and provides the encoder time stamp as well as a real-time time stamp to the facial animation system.
Finally, the facial animation system associates the correct facial animation parameter with the real-time time stamp using the encoder time stamp of the bookmark as a reference.
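The alignment described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the bookmark syntax `{ets=N}`, the fixed per-character speech duration, and the names `synthesize` and `schedule_faps` are all assumptions made for illustration. The sketch shows a stand-in Text-to-Speech step that strips each bookmark and reports its encoder time stamp together with a real-time time stamp, and a facial animation step that uses the encoder time stamp as the lookup key.

```python
import re
from typing import List, Tuple

# Hypothetical bookmark syntax, used only in this sketch: "{ets=<integer>}".
BOOKMARK = re.compile(r"\{ets=(\d+)\}")


def synthesize(text: str, ms_per_char: int = 50) -> Tuple[str, List[Tuple[int, int]]]:
    """Stand-in for the decoder-side Text-to-Speech converter.

    Returns the text with bookmarks removed, plus one
    (encoder_time_stamp, real_time_stamp_ms) pair per bookmark.
    Speech timing is faked as a fixed duration per spoken character.
    """
    spoken: List[str] = []
    stamps: List[Tuple[int, int]] = []
    pos = 0
    for match in BOOKMARK.finditer(text):
        spoken.append(text[pos:match.start()])
        pos = match.end()
        # Real-time stamp: how much speech has been produced so far.
        elapsed_chars = sum(len(part) for part in spoken)
        stamps.append((int(match.group(1)), elapsed_chars * ms_per_char))
    spoken.append(text[pos:])
    return "".join(spoken), stamps


def schedule_faps(
    fap_stream: List[Tuple[int, str]], stamps: List[Tuple[int, int]]
) -> List[Tuple[int, str]]:
    """Pair each FAP with the real-time stamp of the bookmark sharing its ETS."""
    rts_by_ets = dict(stamps)
    return [(rts_by_ets[ets], fap) for ets, fap in fap_stream if ets in rts_by_ets]


# One bookmark inside a word and one between words.
text = "Hel{ets=1}lo {ets=2}world"
# FAP stream entries carry the same encoder time stamps as the bookmarks.
fap_stream = [(1, "smile"), (2, "raise_eyebrows")]

spoken_text, stamps = synthesize(text)
schedule = schedule_faps(fap_stream, stamps)
print(spoken_text)  # Hello world
print(stamps)       # [(1, 150), (2, 300)]
print(schedule)     # [(150, 'smile'), (300, 'raise_eyebrows')]
```

The encoder time stamp is never compared with a clock here. It acts only as a counter that joins the two streams, which matches the abstract's statement that it does not relate to real-world time.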
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/905,931 US6567779B1 (en) | 1997-08-05 | 1997-08-05 | Method and system for aligning natural and synthetic video to speech synthesis |
US08/905,931 | 1997-08-05 |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2244624A1 (en) | 1999-02-05 |
CA2244624C (en) | 2002-05-28 |
Family
ID=25421706
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002244624A Expired - Lifetime CA2244624C (en) | 1997-08-05 | 1998-08-05 | Method and system for aligning natural and synthetic video to speech synthesis |
Country Status (5)
Country | Link |
---|---|
US (3) | US6567779B1 (en) |
EP (1) | EP0896322B1 (en) |
JP (2) | JP4716532B2 (en) |
CA (1) | CA2244624C (en) |
DE (1) | DE69819624T2 (en) |
Families Citing this family (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6567779B1 (en) * | 1997-08-05 | 2003-05-20 | At&T Corp. | Method and system for aligning natural and synthetic video to speech synthesis |
US7366670B1 (en) | 1997-08-05 | 2008-04-29 | At&T Corp. | Method and system for aligning natural and synthetic video to speech synthesis |
JP3720230B2 (en) * | 2000-02-18 | 2005-11-24 | シャープ株式会社 | Expression data control system, expression data control apparatus constituting the same, and recording medium on which the program is recorded |
FR2807188B1 (en) * | 2000-03-30 | 2002-12-20 | Vrtv Studios | EQUIPMENT FOR AUTOMATIC REAL-TIME PRODUCTION OF VIRTUAL AUDIOVISUAL SEQUENCES FROM A TEXT MESSAGE AND FOR THE BROADCAST OF SUCH SEQUENCES |
AU2001248996A1 (en) * | 2000-04-19 | 2001-10-30 | Telefonaktiebolaget Lm Ericsson (Publ) | System and method for rapid serial visual presentation with audio |
KR100343006B1 (en) * | 2000-06-01 | 2002-07-02 | 김상덕 | Language input type facial expression control mathod |
US7149686B1 (en) * | 2000-06-23 | 2006-12-12 | International Business Machines Corporation | System and method for eliminating synchronization errors in electronic audiovisual transmissions and presentations |
US7120583B2 (en) | 2000-10-02 | 2006-10-10 | Canon Kabushiki Kaisha | Information presentation system, information presentation apparatus, control method thereof and computer readable memory |
US8046010B2 (en) | 2006-03-07 | 2011-10-25 | Sybase 365, Inc. | System and method for subscription management |
AU2008100836B4 (en) * | 2007-08-30 | 2009-07-16 | Machinima Pty Ltd | Real-time realistic natural voice(s) for simulated electronic games |
US10248931B2 (en) * | 2008-06-23 | 2019-04-02 | At&T Intellectual Property I, L.P. | Collaborative annotation of multimedia content |
US20090319884A1 (en) * | 2008-06-23 | 2009-12-24 | Brian Scott Amento | Annotation based navigation of multimedia content |
US8225348B2 (en) | 2008-09-12 | 2012-07-17 | At&T Intellectual Property I, L.P. | Moderated interactive media sessions |
US20100070858A1 (en) * | 2008-09-12 | 2010-03-18 | At&T Intellectual Property I, L.P. | Interactive Media System and Method Using Context-Based Avatar Configuration |
US9697535B2 (en) | 2008-12-23 | 2017-07-04 | International Business Machines Corporation | System and method in a virtual universe for identifying spam avatars based upon avatar multimedia characteristics |
US9704177B2 (en) | 2008-12-23 | 2017-07-11 | International Business Machines Corporation | Identifying spam avatars in a virtual universe (VU) based upon turing tests |
US8656476B2 (en) * | 2009-05-28 | 2014-02-18 | International Business Machines Corporation | Providing notification of spam avatars |
KR102117082B1 (en) | 2014-12-29 | 2020-05-29 | 삼성전자주식회사 | Method and apparatus for speech recognition |
Family Cites Families (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4520501A (en) * | 1982-10-19 | 1985-05-28 | Ear Three Systems Manufacturing Company | Speech presentation system and method |
GB8528143D0 (en) * | 1985-11-14 | 1985-12-18 | British Telecomm | Image encoding & synthesis |
US4884972A (en) * | 1986-11-26 | 1989-12-05 | Bright Star Technology, Inc. | Speech synchronized animation |
JP2518683B2 (en) * | 1989-03-08 | 1996-07-24 | 国際電信電話株式会社 | Image combining method and apparatus thereof |
US5111409A (en) * | 1989-07-21 | 1992-05-05 | Elon Gasper | Authoring and use systems for sound synchronized animation |
JP3036099B2 (en) * | 1991-01-30 | 2000-04-24 | 日本電気株式会社 | Data management methods |
US5640590A (en) * | 1992-11-18 | 1997-06-17 | Canon Information Systems, Inc. | Method and apparatus for scripting a text-to-speech-based multimedia presentation |
US5878396A (en) * | 1993-01-21 | 1999-03-02 | Apple Computer, Inc. | Method and apparatus for synthetic speech in facial animation |
US5473726A (en) * | 1993-07-06 | 1995-12-05 | The United States Of America As Represented By The Secretary Of The Air Force | Audio and amplitude modulated photo data collection for speech recognition |
US5608839A (en) * | 1994-03-18 | 1997-03-04 | Lucent Technologies Inc. | Sound-synchronized video system |
DE4331710A1 (en) * | 1993-09-17 | 1995-03-23 | Sel Alcatel Ag | Method and device for creating and editing text documents |
US5623587A (en) * | 1993-10-15 | 1997-04-22 | Kideo Productions, Inc. | Method and apparatus for producing an electronic image |
US5657426A (en) * | 1994-06-10 | 1997-08-12 | Digital Equipment Corporation | Method and apparatus for producing audio-visual synthetic speech |
JPH08194494A (en) * | 1995-01-13 | 1996-07-30 | Canon Inc | Sentence analyzing method and device |
US5634084A (en) * | 1995-01-20 | 1997-05-27 | Centigram Communications Corporation | Abbreviation and acronym/initialism expansion procedures for a text to speech reader |
US5930450A (en) * | 1995-02-28 | 1999-07-27 | Kabushiki Kaisha Toshiba | Recording medium, apparatus and method of recording data on the same, and apparatus and method of reproducing data from the recording medium |
JPH0916195A (en) * | 1995-07-03 | 1997-01-17 | Canon Inc | Information processing device and its method |
JPH0922565A (en) * | 1995-07-06 | 1997-01-21 | Sony Corp | Device and method for processing data |
US5806036A (en) * | 1995-08-17 | 1998-09-08 | Ricoh Company, Ltd. | Speechreading using facial feature parameters from a non-direct frontal view of the speaker |
US6477239B1 (en) * | 1995-08-30 | 2002-11-05 | Hitachi, Ltd. | Sign language telephone device |
JPH0982040A (en) * | 1995-09-14 | 1997-03-28 | Toshiba Corp | Recording medium, apparatus and method for recording data onto same recording medium as well as apparatus and method for reproducing data from same recording medium |
JPH09138767A (en) * | 1995-11-14 | 1997-05-27 | Fujitsu Ten Ltd | Communication equipment for feeling expression |
SE519244C2 (en) * | 1995-12-06 | 2003-02-04 | Telia Ab | Device and method of speech synthesis |
JP3588883B2 (en) * | 1995-12-08 | 2004-11-17 | ヤマハ株式会社 | Karaoke equipment |
US5880731A (en) * | 1995-12-14 | 1999-03-09 | Microsoft Corporation | Use of avatars with automatic gesturing and bounded interaction in on-line chat session |
US5802220A (en) * | 1995-12-15 | 1998-09-01 | Xerox Corporation | Apparatus and method for tracking facial motion through a sequence of images |
US5793365A (en) * | 1996-01-02 | 1998-08-11 | Sun Microsystems, Inc. | System and method providing a computer user interface enabling access to distributed workgroup members |
US5732232A (en) * | 1996-09-17 | 1998-03-24 | International Business Machines Corp. | Method and apparatus for directing the expression of emotion for a graphical user interface |
US5884029A (en) * | 1996-11-14 | 1999-03-16 | International Business Machines Corporation | User interaction with intelligent virtual objects, avatars, which interact with other avatars controlled by different users |
US5963217A (en) * | 1996-11-18 | 1999-10-05 | 7Thstreet.Com, Inc. | Network conference system using limited bandwidth to generate locally animated displays |
KR100236974B1 (en) * | 1996-12-13 | 2000-02-01 | 정선종 | Sync. system between motion picture and text/voice converter |
US5812126A (en) * | 1996-12-31 | 1998-09-22 | Intel Corporation | Method and apparatus for masquerading online |
US5920834A (en) | 1997-01-31 | 1999-07-06 | Qualcomm Incorporated | Echo canceller with talk state determination to control speech processor functional elements in a digital telephone system |
US5818463A (en) * | 1997-02-13 | 1998-10-06 | Rockwell Science Center, Inc. | Data compression for animated three dimensional objects |
US5977968A (en) * | 1997-03-14 | 1999-11-02 | Mindmeld Multimedia Inc. | Graphical user interface to communicate attitude or emotion to a computer program |
US5983190A (en) * | 1997-05-19 | 1999-11-09 | Microsoft Corporation | Client server animation system for managing interactive user interface characters |
US6567779B1 (en) * | 1997-08-05 | 2003-05-20 | At&T Corp. | Method and system for aligning natural and synthetic video to speech synthesis |
US6177928B1 (en) * | 1997-08-22 | 2001-01-23 | At&T Corp. | Flexible synchronization framework for multimedia streams having inserted time stamp |
US5907328A (en) * | 1997-08-27 | 1999-05-25 | International Business Machines Corporation | Automatic and configurable viewpoint switching in a 3D scene |
1997
- 1997-08-05 US application US08/905,931, patent US6567779B1 (en), Expired - Lifetime
1998
- 1998-08-04 EP application EP98306215A, patent EP0896322B1 (en), Expired - Lifetime
- 1998-08-04 DE application DE69819624T, patent DE69819624T2 (en), Expired - Lifetime
- 1998-08-05 JP application JP22207298A, patent JP4716532B2 (en), Expired - Lifetime
- 1998-08-05 CA application CA002244624A, patent CA2244624C (en), Expired - Lifetime
2003
- 2003-01-23 US application US10/350,225, patent US6862569B1 (en), Expired - Lifetime
2005
- 2005-01-07 US application US11/030,781, patent US7110950B2 (en), Expired - Lifetime
2009
- 2009-06-05 JP application JP2009135960A, patent JP4783449B2 (en), Expired - Lifetime
Also Published As
Publication number | Publication date |
---|---|
US20050119877A1 (en) | 2005-06-02 |
US6862569B1 (en) | 2005-03-01 |
EP0896322B1 (en) | 2003-11-12 |
JP4716532B2 (en) | 2011-07-06 |
JP2009266240A (en) | 2009-11-12 |
CA2244624C (en) | 2002-05-28 |
EP0896322A3 (en) | 1999-10-06 |
US6567779B1 (en) | 2003-05-20 |
DE69819624D1 (en) | 2003-12-18 |
US7110950B2 (en) | 2006-09-19 |
EP0896322A2 (en) | 1999-02-10 |
JP4783449B2 (en) | 2011-09-28 |
DE69819624T2 (en) | 2004-09-23 |
JPH11144073A (en) | 1999-05-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA2244624A1 (en) | | Method and system for aligning natural and synthetic video to speech synthesis |
EP1715696A3 (en) | | System, method and apparatus for a variable output video decoder |
US6602299B1 (en) | | Flexible synchronization framework for multimedia streams |
MXPA03009539A (en) | | Multi-rate transcoder for digital streams. |
ES2159688T3 (en) | | METHOD AND APPLIANCE TO PLAY VOICE SIGNS AND METHOD FOR TRANSMITTERS. |
CA2149068A1 (en) | | Sound-synchronized video system |
DE69433593D1 (en) | | SPLIT VOICE RECOGNITION SYSTEM |
ATE367038T1 (en) | | ARRANGEMENT AND METHOD RELATING TO TEACHING LANGUAGE |
HK1091585A1 (en) | | Fidelity-optimised variable frame length encoding |
MY145597A (en) | | Method and apparatus for representing image granularity by one or more parameters |
US7844463B2 (en) | | Method and system for aligning natural and synthetic video to speech synthesis |
CA2285158A1 (en) | | A method and an apparatus for the animation, driven by an audio signal, of a synthesised model of human face |
BR0206615A (en) | | Methods for encoding and synthesizing a set of audio signals, encoders for encoding a set of audio signals and for encoding audio channels, decoder for synthesizing a set of audio signals, data bearer, and encoded signal |
CA2267219A1 (en) | | Differential coding for scalable audio coders |
ZA200205089B (en) | | Speech decoder and a method for decoding speech. |
US7076426B1 (en) | | Advance TTS for facial animation |
MXPA01008928A (en) | | Method and apparatus for coding moving picture image. |
WO2002025953A3 (en) | | Video and audio transcoder |
DE3277095D1 (en) | | Allophone vocoder |
HK1042980A1 (en) | | Speech synthesizer based on variable rate speech coding. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | EEER | Examination request | |
 | MKEX | Expiry | Effective date: 20180806 |