CA2244624A1 - Method and system for aligning natural and synthetic video to speech synthesis - Google Patents

Method and system for aligning natural and synthetic video to speech synthesis

Info

Publication number
CA2244624A1
Authority
CA
Canada
Prior art keywords
text
time stamp
facial animation
encoder
bookmarks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA002244624A
Other languages
French (fr)
Other versions
CA2244624C (en)
Inventor
Andrea Basso
Mark Charles Beutnagel
Joern Ostermann
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AT&T Intellectual Property II LP
Original Assignee
AT&T Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AT&T Corp filed Critical AT&T Corp
Publication of CA2244624A1 publication Critical patent/CA2244624A1/en
Application granted granted Critical
Publication of CA2244624C publication Critical patent/CA2244624C/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/24Speech recognition using non-acoustical features
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00Speech synthesis; Text to speech systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46Embedding additional information in the video signal during the compression process
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10Transforming into visible information
    • G10L2021/105Synthesis of the lips movements from speech, e.g. for talking heads
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding

Abstract

According to the MPEG-4 TTS architecture, facial animation can be driven by two streams simultaneously: text and Facial Animation Parameters (FAPs). In this architecture, text input is sent to a Text-To-Speech converter at the decoder, which drives the mouth shapes of the face; Facial Animation Parameters are sent from the encoder to the face over the communication channel. The present invention inserts codes (known as bookmarks) into the text string transmitted to the Text-to-Speech converter; these bookmarks may be placed between words as well as inside them. According to the present invention, each bookmark carries an encoder time stamp. Due to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time and should be interpreted as a counter. In addition, the Facial Animation Parameter stream carries the same encoder time stamps found in the bookmarks of the text. The system of the present invention reads each bookmark and provides its encoder time stamp, as well as a real-time time stamp, to the facial animation system.
Finally, the facial animation system associates the correct Facial Animation Parameter with the real-time time stamp, using the encoder time stamp of the bookmark as a reference.
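The alignment described in the abstract can be sketched in a few lines. The bookmark syntax `<bookmark ets=N>`, the function names, and the FAP payloads below are all illustrative assumptions for this sketch; the actual MPEG-4 TTS escape codes and interfaces differ.

```python
import re

# Illustrative bookmark syntax; the real MPEG-4 TTS escape codes differ.
BOOKMARK = re.compile(r"<bookmark ets=(\d+)>")

def parse_text_stream(stream):
    """Strip bookmarks from a TTS input string, returning the plain text
    plus a list of (character_position, encoder_time_stamp) pairs."""
    bookmarks, parts, pos, last = [], [], 0, 0
    for m in BOOKMARK.finditer(stream):
        chunk = stream[last:m.start()]
        parts.append(chunk)
        pos += len(chunk)
        # The ETS is a counter, not a real-world clock value.
        bookmarks.append((pos, int(m.group(1))))
        last = m.end()
    parts.append(stream[last:])
    return "".join(parts), bookmarks

def align_faps(ets_to_realtime, fap_stream):
    """Attach a real-time stamp to each FAP frame, keyed by the encoder
    time stamp (ETS) it shares with a bookmark in the text stream."""
    return [(ets_to_realtime[ets], fap) for ets, fap in fap_stream]

text, marks = parse_text_stream("Hello <bookmark ets=1>world<bookmark ets=2>!")
# text  == "Hello world!", marks == [(6, 1), (11, 2)]
```

In a full decoder, `ets_to_realtime` would be populated by the TTS engine as it reaches each bookmark during synthesis, giving the facial animation system the real-time stamp at which each FAP frame should be rendered.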
CA002244624A 1997-08-05 1998-08-05 Method and system for aligning natural and synthetic video to speech synthesis Expired - Lifetime CA2244624C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US08/905,931 US6567779B1 (en) 1997-08-05 1997-08-05 Method and system for aligning natural and synthetic video to speech synthesis
US08/905,931 1997-08-05

Publications (2)

Publication Number Publication Date
CA2244624A1 (en) 1999-02-05
CA2244624C CA2244624C (en) 2002-05-28

Family

ID=25421706

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002244624A Expired - Lifetime CA2244624C (en) 1997-08-05 1998-08-05 Method and system for aligning natural and synthetic video to speech synthesis

Country Status (5)

Country Link
US (3) US6567779B1 (en)
EP (1) EP0896322B1 (en)
JP (2) JP4716532B2 (en)
CA (1) CA2244624C (en)
DE (1) DE69819624T2 (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6567779B1 (en) * 1997-08-05 2003-05-20 At&T Corp. Method and system for aligning natural and synthetic video to speech synthesis
US7366670B1 (en) 1997-08-05 2008-04-29 At&T Corp. Method and system for aligning natural and synthetic video to speech synthesis
JP3720230B2 (en) * 2000-02-18 2005-11-24 シャープ株式会社 Expression data control system, expression data control apparatus constituting the same, and recording medium on which the program is recorded
FR2807188B1 (en) * 2000-03-30 2002-12-20 Vrtv Studios EQUIPMENT FOR AUTOMATIC REAL-TIME PRODUCTION OF VIRTUAL AUDIOVISUAL SEQUENCES FROM A TEXT MESSAGE AND FOR THE BROADCAST OF SUCH SEQUENCES
AU2001248996A1 (en) * 2000-04-19 2001-10-30 Telefonaktiebolaget Lm Ericsson (Publ) System and method for rapid serial visual presentation with audio
KR100343006B1 (en) * 2000-06-01 2002-07-02 김상덕 Language input type facial expression control method
US7149686B1 (en) * 2000-06-23 2006-12-12 International Business Machines Corporation System and method for eliminating synchronization errors in electronic audiovisual transmissions and presentations
US7120583B2 (en) 2000-10-02 2006-10-10 Canon Kabushiki Kaisha Information presentation system, information presentation apparatus, control method thereof and computer readable memory
US8046010B2 (en) 2006-03-07 2011-10-25 Sybase 365, Inc. System and method for subscription management
AU2008100836B4 (en) * 2007-08-30 2009-07-16 Machinima Pty Ltd Real-time realistic natural voice(s) for simulated electronic games
US10248931B2 (en) * 2008-06-23 2019-04-02 At&T Intellectual Property I, L.P. Collaborative annotation of multimedia content
US20090319884A1 (en) * 2008-06-23 2009-12-24 Brian Scott Amento Annotation based navigation of multimedia content
US8225348B2 (en) 2008-09-12 2012-07-17 At&T Intellectual Property I, L.P. Moderated interactive media sessions
US20100070858A1 (en) * 2008-09-12 2010-03-18 At&T Intellectual Property I, L.P. Interactive Media System and Method Using Context-Based Avatar Configuration
US9697535B2 (en) 2008-12-23 2017-07-04 International Business Machines Corporation System and method in a virtual universe for identifying spam avatars based upon avatar multimedia characteristics
US9704177B2 (en) 2008-12-23 2017-07-11 International Business Machines Corporation Identifying spam avatars in a virtual universe (VU) based upon turing tests
US8656476B2 (en) * 2009-05-28 2014-02-18 International Business Machines Corporation Providing notification of spam avatars
KR102117082B1 (en) 2014-12-29 2020-05-29 삼성전자주식회사 Method and apparatus for speech recognition

Family Cites Families (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4520501A (en) * 1982-10-19 1985-05-28 Ear Three Systems Manufacturing Company Speech presentation system and method
GB8528143D0 (en) * 1985-11-14 1985-12-18 British Telecomm Image encoding & synthesis
US4884972A (en) * 1986-11-26 1989-12-05 Bright Star Technology, Inc. Speech synchronized animation
JP2518683B2 (en) * 1989-03-08 1996-07-24 国際電信電話株式会社 Image combining method and apparatus thereof
US5111409A (en) * 1989-07-21 1992-05-05 Elon Gasper Authoring and use systems for sound synchronized animation
JP3036099B2 (en) * 1991-01-30 2000-04-24 日本電気株式会社 Data management methods
US5640590A (en) * 1992-11-18 1997-06-17 Canon Information Systems, Inc. Method and apparatus for scripting a text-to-speech-based multimedia presentation
US5878396A (en) * 1993-01-21 1999-03-02 Apple Computer, Inc. Method and apparatus for synthetic speech in facial animation
US5473726A (en) * 1993-07-06 1995-12-05 The United States Of America As Represented By The Secretary Of The Air Force Audio and amplitude modulated photo data collection for speech recognition
US5608839A (en) * 1994-03-18 1997-03-04 Lucent Technologies Inc. Sound-synchronized video system
DE4331710A1 (en) * 1993-09-17 1995-03-23 Sel Alcatel Ag Method and device for creating and editing text documents
US5623587A (en) * 1993-10-15 1997-04-22 Kideo Productions, Inc. Method and apparatus for producing an electronic image
US5657426A (en) * 1994-06-10 1997-08-12 Digital Equipment Corporation Method and apparatus for producing audio-visual synthetic speech
JPH08194494A (en) * 1995-01-13 1996-07-30 Canon Inc Sentence analyzing method and device
US5634084A (en) * 1995-01-20 1997-05-27 Centigram Communications Corporation Abbreviation and acronym/initialism expansion procedures for a text to speech reader
US5930450A (en) * 1995-02-28 1999-07-27 Kabushiki Kaisha Toshiba Recording medium, apparatus and method of recording data on the same, and apparatus and method of reproducing data from the recording medium
JPH0916195A (en) * 1995-07-03 1997-01-17 Canon Inc Information processing device and its method
JPH0922565A (en) * 1995-07-06 1997-01-21 Sony Corp Device and method for processing data
US5806036A (en) * 1995-08-17 1998-09-08 Ricoh Company, Ltd. Speechreading using facial feature parameters from a non-direct frontal view of the speaker
US6477239B1 (en) * 1995-08-30 2002-11-05 Hitachi, Ltd. Sign language telephone device
JPH0982040A (en) * 1995-09-14 1997-03-28 Toshiba Corp Recording medium, apparatus and method for recording data onto same recording medium as well as apparatus and method for reproducing data from same recording medium
JPH09138767A (en) * 1995-11-14 1997-05-27 Fujitsu Ten Ltd Communication equipment for feeling expression
SE519244C2 (en) * 1995-12-06 2003-02-04 Telia Ab Device and method of speech synthesis
JP3588883B2 (en) * 1995-12-08 2004-11-17 ヤマハ株式会社 Karaoke equipment
US5880731A (en) * 1995-12-14 1999-03-09 Microsoft Corporation Use of avatars with automatic gesturing and bounded interaction in on-line chat session
US5802220A (en) * 1995-12-15 1998-09-01 Xerox Corporation Apparatus and method for tracking facial motion through a sequence of images
US5793365A (en) * 1996-01-02 1998-08-11 Sun Microsystems, Inc. System and method providing a computer user interface enabling access to distributed workgroup members
US5732232A (en) * 1996-09-17 1998-03-24 International Business Machines Corp. Method and apparatus for directing the expression of emotion for a graphical user interface
US5884029A (en) * 1996-11-14 1999-03-16 International Business Machines Corporation User interaction with intelligent virtual objects, avatars, which interact with other avatars controlled by different users
US5963217A (en) * 1996-11-18 1999-10-05 7Thstreet.Com, Inc. Network conference system using limited bandwidth to generate locally animated displays
KR100236974B1 (en) * 1996-12-13 2000-02-01 정선종 Sync. system between motion picture and text/voice converter
US5812126A (en) * 1996-12-31 1998-09-22 Intel Corporation Method and apparatus for masquerading online
US5920834A (en) 1997-01-31 1999-07-06 Qualcomm Incorporated Echo canceller with talk state determination to control speech processor functional elements in a digital telephone system
US5818463A (en) * 1997-02-13 1998-10-06 Rockwell Science Center, Inc. Data compression for animated three dimensional objects
US5977968A (en) * 1997-03-14 1999-11-02 Mindmeld Multimedia Inc. Graphical user interface to communicate attitude or emotion to a computer program
US5983190A (en) * 1997-05-19 1999-11-09 Microsoft Corporation Client server animation system for managing interactive user interface characters
US6567779B1 (en) * 1997-08-05 2003-05-20 At&T Corp. Method and system for aligning natural and synthetic video to speech synthesis
US6177928B1 (en) * 1997-08-22 2001-01-23 At&T Corp. Flexible synchronization framework for multimedia streams having inserted time stamp
US5907328A (en) * 1997-08-27 1999-05-25 International Business Machines Corporation Automatic and configurable viewpoint switching in a 3D scene

Also Published As

Publication number Publication date
US20050119877A1 (en) 2005-06-02
US6862569B1 (en) 2005-03-01
EP0896322B1 (en) 2003-11-12
JP4716532B2 (en) 2011-07-06
JP2009266240A (en) 2009-11-12
CA2244624C (en) 2002-05-28
EP0896322A3 (en) 1999-10-06
US6567779B1 (en) 2003-05-20
DE69819624D1 (en) 2003-12-18
US7110950B2 (en) 2006-09-19
EP0896322A2 (en) 1999-02-10
JP4783449B2 (en) 2011-09-28
DE69819624T2 (en) 2004-09-23
JPH11144073A (en) 1999-05-28

Similar Documents

Publication Publication Date Title
CA2244624A1 (en) Method and system for aligning natural and synthetic video to speech synthesis
EP1715696A3 (en) System, method and apparatus for a variable output video decoder
US6602299B1 (en) Flexible synchronization framework for multimedia streams
MXPA03009539A (en) Multi-rate transcoder for digital streams.
ES2159688T3 (en) Method and apparatus for reproducing speech signals and method for transmission.
CA2149068A1 (en) Sound-synchronized video system
DE69433593D1 (en) Distributed speech recognition system
ATE367038T1 (en) ARRANGEMENT AND METHOD RELATING TO TEACHING LANGUAGE
HK1091585A1 (en) Fidelity-optimised variable frame length encoding
MY145597A (en) Method and apparatus for representing image granularity by one or more parameters
US7844463B2 (en) Method and system for aligning natural and synthetic video to speech synthesis
CA2285158A1 (en) A method and an apparatus for the animation, driven by an audio signal, of a synthesised model of human face
BR0206615A (en) Methods for encoding and synthesizing a set of audio signals, encoders for encoding a set of audio signals and for encoding audio channels, decoder for synthesizing a set of audio signals, data bearer, and encoded signal
CA2267219A1 (en) Differential coding for scalable audio coders
ZA200205089B (en) Speech decoder and a method for decoding speech.
US7076426B1 (en) Advance TTS for facial animation
MXPA01008928A (en) Method and apparatus for coding moving picture image.
WO2002025953A3 (en) Video and audio transcoder
DE3277095D1 (en) Allophone vocoder
HK1042980A1 (en) Speech synthesizer based on variable rate speech coding.

Legal Events

Date Code Title Description
EEER Examination request
MKEX Expiry

Effective date: 20180806