|Publication number||US6240390 B1|
|Application number||US 09/137,958|
|Publication date||May 29, 2001|
|Filing date||Aug 21, 1998|
|Priority date||May 18, 1998|
|Original Assignee||Winbond Electronics Corp.|
This application claims the priority benefit of Taiwan application serial no. 87107658, filed May 18, 1998, the disclosure of which is incorporated herein by reference.
1. Field of the Invention
This invention relates to speech synthesizers, and more particularly, to a speech synthesizer architecture and a method of synthesizing speech that allow the speech synthesizer to drive external devices in a multi-tasking manner while nonetheless keeping the software complexity low and the voice concatenation simple to implement.
2. Description of Related Art
A synthesizer is a device that combines a variety of items to form a new, complex product. Speech synthesizers are widely utilized in systems that use voice to output messages or data to the user, such as personal computers, mobile phones, toys, and warning systems, to name a few. A speech synthesizer is typically provided with a ROM (read-only memory) unit that stores a database of sounds or words that can be retrieved and combined to form a stream of voice with a specific meaning. This ROM unit is typically partitioned into a number of sections, called speech sections. In one standard for voice synthesizing, these speech sections are designated H4, S1, S2, . . . , Sn, and T4. Each speech section represents one of 250 basic phonic elements that can be selected and combined into the sound data of various words or phrases. Alternatively, each speech section can store the sound data of complete words; this, however, is merely a design choice made by the speech synthesizer designer.
The data in each speech section can be selected and synthesized into words or phrases through various speech equations (EQ), each EQ specifying a number of selected phonic elements that are combined to form a particular word or phrase with a specified meaning. For example, EQ=H4+S1+S2+S3+T4 may represent either a five-sound word or a five-word phrase.
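As an illustrative sketch only (the element names mirror the example above, but the sample values and database layout are invented, not taken from the disclosure), a speech equation can be modeled as concatenating the waveform data of its phonic elements:

```python
# Hypothetical phonic-element database; the sample values are invented.
PHONIC_DB = {
    "H4": [0.1, 0.2],   # head element
    "S1": [0.3],
    "S2": [0.4],
    "S3": [0.5],
    "T4": [0.6, 0.7],   # tail element
}

def synthesize_eq(eq):
    """Concatenate the waveform data of each phonic element named in the EQ."""
    samples = []
    for element in eq:
        samples.extend(PHONIC_DB[element])
    return samples

# EQ = H4 + S1 + S2 + S3 + T4
word = synthesize_eq(["H4", "S1", "S2", "S3", "T4"])
```

Storing only the small set of phonic elements and combining them per EQ is what keeps the speech database far smaller than storing every word's sound outright.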
The foregoing scheme of using phonic elements for synthesizing words allows the required memory space for the speech database to be significantly reduced as compared to the scheme of storing the sound of each word in the ROM unit. Moreover, it gives the designer more flexibility and versatility in designing the speech synthesizer to provide the sound data of more complex words or phrases.
One standard for speech synthesis defines one section of speech data as the combination of a number of bytes, respectively designated by H4, S1, S2, S3, and T4. This scheme is illustratively depicted in FIG. 1. Each of the bytes (H4, S1, S2, S3, T4) represents one basic constituent element of sound data and can be either a single sound, a series of sounds, a piece of music, or the combination of several pieces of music.
FIG. 2 is a schematic block diagram showing a conventional speech synthesizer, as designated by the reference numeral 10, that can be used for the synthesizing of the speech data shown in FIG. 1 into digital sound data. As shown, this speech synthesizer 10 includes a memory unit 11, such as a ROM unit, and a synthesizer 12. The ROM unit 11 is used to store a database of phonic elements and various other kinds of speech data that can be selectively retrieved for synthesizing into sound data of specific meanings. When the speech synthesizer 10 receives a trigger signal 14, the corresponding phonic elements in the ROM unit 11 are retrieved and then transferred to the synthesizer 12 for synthesizing into sound data. The synthesized sound data are then converted into audible sounds by a loudspeaker 13. One benefit of this speech synthesizer is that its system architecture is quite simple to implement.
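A minimal sketch of this single-task flow, with an assumed ROM content and a stand-in for the synthesis step (neither is the patent's actual data), might look like:

```python
# Assumed phonic-element database standing in for the ROM unit (11).
ROM = {"hello": [10, 20, 30]}

def on_trigger(word):
    """Retrieve the phonic elements for `word` and 'synthesize' them.

    While this runs, nothing else can execute: the simple architecture of
    FIG. 2 cannot drive a motor or LED at the same time.
    """
    samples = ROM[word]              # retrieve from the ROM unit (11)
    return [s * 2 for s in samples]  # stand-in for the synthesizer (12)

audio = on_trigger("hello")          # output would go to the loudspeaker (13)
```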
One drawback to the foregoing speech synthesizer 10, however, is that it is only capable of outputting the synthesized speech data as audible sounds through the loudspeaker 13, but incapable of driving external devices such as motors or light-emitting diodes (LED) in a multi-tasking manner at the same time.
The synthesizer 12 utilized in the speech synthesizer 10 is typically included in a state machine that can perform some I/O controls. One drawback to utilizing the speech synthesizer in a state machine, however, is that its I/O ports can be switched to other I/O functions only at the break between two consecutive speech sections. Therefore, the architecture of FIG. 2 cannot meet high quality requirements for speech synthesizers.
FIG. 3A is a schematic block diagram of a conventional speech synthesizer 20 with multi-tasking capability. As shown, this speech synthesizer 20 includes a memory unit 21 such as a ROM unit, a micro-controller 22, a synthesizer 23, and a digital-to-analog converter (DAC) 24. Moreover, the speech synthesizer 20 is coupled to a loudspeaker 25. The memory unit 21 is used to store a database of phonic elements and various other kinds of speech data that can be selectively retrieved for synthesizing into sound data of specific meanings. When the speech synthesizer 20 receives a trigger signal 27, the corresponding data are retrieved under control of the micro-controller 22 from the memory unit 21 and subsequently transferred to the synthesizer 23 for synthesizing into sound data of specific meanings. The digital output from the synthesizer 23 is then converted by the DAC 24 into analog form which is then converted by the loudspeaker 25 into audible form. The micro-controller 22 allows the speech synthesizer 20 to perform I/O functions with external devices such as motors or LEDs.
Alternatively, as shown in FIG. 3B, the micro-controller 22 and the synthesizer 23 in the speech synthesizer 20 of FIG. 3A can be replaced by a single microprocessor 26. With this architecture, both the I/O controls and the synthesizing of speech data are performed by the microprocessor 26.
The foregoing speech synthesizer with multi-tasking capability, however, still has a drawback in encoding. For example, voice concatenation, a technique for combining a number of separate phonic elements into a continuous stream of meaningful sounds, requires an algorithm so complex that it is very difficult to code into a software program. The design of the speech synthesizer is therefore a laborious and time-consuming job; the development period typically requires at least one month.
In conclusion, the prior art has the following drawbacks.
(1) First, with respect to the prior art of FIG. 2, although its simple system architecture makes it easy to design, it is incapable of driving external devices such as motors and LEDs in a multi-tasking manner while performing speech synthesis. Moreover, it cannot switch the output state of its I/O ports except at the break between two consecutive speech sections.
(2) Second, with respect to the prior art of FIGS. 3A-3B, the algorithm behind its multi-tasking capability is so complex that the programming is very difficult to implement. The development period is therefore quite long.
It is therefore an objective of the present invention to provide a speech synthesizer and a method of synthesizing speech, which is capable of driving external devices in a multi-tasking manner and which is simple in software complexity.
It is another objective of the present invention to provide a speech synthesizer and a method of synthesizing speech, which allows voice concatenation to be easy to implement either through hardware or through software.
In accordance with the foregoing and other objectives of the present invention, a new speech synthesizer and a method of synthesizing speech are provided.
The speech synthesizer of the invention includes a memory unit, a voice list pointer, a start address register, a program counter, a synthesizer and an interrupt controller.
The memory unit has an interrupt vector section, a voice list section, a control program section, and a speech data section. The value of the voice list pointer represents an address in the voice list section of the memory unit, used to gain access to the data stored at that address. The content of the start address register represents the starting address of a specific chunk of waveform data stored in the speech data section of the memory unit. The output of the program counter is used to gain access to specific addresses in the control program section of the memory unit. The synthesizer, coupled to the memory unit, is used for synthesizing the speech data retrieved from the memory unit into voice data. The interrupt controller, coupled to the synthesizer, is capable of actuating the execution of a synthesis interrupt service routine stored in the memory unit in response to an interrupt signal generated by the synthesizer.
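The register set and memory sections described above might be sketched as follows; the field names, section contents, and address values are assumptions for illustration, not the patent's encoding:

```python
from dataclasses import dataclass, field

@dataclass
class SpeechSynthesizer:
    """Illustrative model of the claimed architecture's state."""
    interrupt_vectors: dict = field(default_factory=dict)  # interrupt vector section
    voice_list: list = field(default_factory=list)         # voice list section
    control_program: list = field(default_factory=list)    # control program section
    speech_data: dict = field(default_factory=dict)        # speech data section
    vlp: int = 0          # voice list pointer: index into voice_list
    start_addr: int = 0   # start address register: waveform start in speech_data
    pc: int = 0           # program counter: index into control_program

syn = SpeechSynthesizer(voice_list=[("H4", 100), ("T4", 200)])
entry, addr = syn.voice_list[syn.vlp]  # the VLP selects the current speech section
syn.start_addr = addr                  # its waveform address loads the register
```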
The architecture of the speech synthesizer of the invention allows the speech synthesizer to perform multi-tasking on external devices while outputting the synthesized sound data. Moreover, it allows the speech synthesizer to be constructed with simple software complexity, and its voice concatenation can be realized by either hardware or software.
Further, one embodiment of the method of the invention includes the following steps. From a first speech section, the address corresponding to a voice list pointer (VLP) is fetched. A first segment of speech data is retrieved from the first speech section. The retrieved speech data are synthesized into voice data, and the synthesized voice data are then broadcast. An interrupt signal is generated when the broadcasting of the synthesized voice data is completed. The VLP is incremented to gain access to the next speech section. The method then determines whether a stop mark is encountered in the data retrieved from the current speech section. If no stop mark is encountered, the method repeats from the synthesizing step through the stop-mark check. If a stop mark is encountered, the synthesizing operation terminates.
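The steps above can be sketched as a loop; the stop-mark encoding (`None` here) and the data layout are illustrative assumptions:

```python
STOP = None  # assumed stand-in for the stop mark terminating the voice list

def synthesize(voice_list, speech_data):
    """Walk the voice list with a VLP, broadcasting each segment until a
    stop mark is encountered."""
    vlp = 0
    output = []
    while True:
        entry = voice_list[vlp]
        if entry is STOP:                  # stop mark: terminate synthesis
            break
        output.extend(speech_data[entry])  # synthesize + broadcast the segment
        vlp += 1                           # interrupt fires; VLP advances
    return output

voices = synthesize(["a", "b", STOP], {"a": [1, 2], "b": [3]})
```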
The above-described method of the invention allows the speech synthesizer to perform multi-tasking on external devices while outputting the synthesized sound data. Moreover, it allows the speech synthesizer to be constructed with simple software complexity, and its voice concatenation can be realized by either hardware or software.
The invention can be more fully understood by reading the following detailed description of the preferred embodiments, with reference made to the accompanying drawings, wherein:
FIG. 1 is a schematic diagram used to depict a present standard which defines the format for speech data and voice signal waveforms;
FIG. 2 is a schematic block diagram of a conventional speech synthesizer;
FIG. 3A is a schematic block diagram of a first conventional speech synthesizer with multi-tasking capability;
FIG. 3B is a schematic block diagram of a second conventional speech synthesizer with multi-tasking capability;
FIG. 4 is a schematic block diagram of the speech synthesizer according to the invention; and
FIG. 5 is a schematic diagram used to depict the memory allocation in the memory unit in the speech synthesizer.
FIG. 4 is a schematic block diagram showing the architecture of the speech synthesizer according to the invention, which is designated by the reference numeral 30. As shown, the speech synthesizer 30 of the invention includes a voice list pointer (VLP) unit 31, a start address register 32, a program counter 33, a stack register 34, a multiplexer 35, an interrupt controller 36, a memory unit 37, a synthesizer 38, an input/output (I/O) controller 39, and a digital-to-analog converter (DAC) 40. Output device 60 is external to speech synthesizer 30. The output of the DAC 40 is coupled to a sound transducer 41, such as a loudspeaker, for converting into audible form.
The memory unit 37 is, for example, a ROM (read-only memory), which is partitioned into a plurality of sections, including a first section 50 (FIG. 5) for storing a number of interrupt vectors branching to some interrupt routines including a synthesis interrupt service routine; a second section 51 for storing a voice list; a third section 52 for storing a control program that can be used for I/O controls; and a fourth section 53 for storing various speech data that can be retrieved in a predetermined manner for synthesizing into sound data that can be then reproduced.
The VLP 31 is used to point to the current speech section in the voice list section 51. The start address register 32 is used to store the address value indicative of the location in the speech data section 53 where the speech data corresponding to the pointed speech section in the voice list section 51 are stored. The program counter 33 is used to generate a sequence of consecutive address values used to gain access to the memory unit 37.
An example of speech synthesis by the speech synthesizer 30 is given in the following. At the start, the program counter 33 is set to output a specified address value used to gain access to a selected location in the control program section 52. The instruction code stored at this location is then executed to assign the starting address of a segment of speech data to the VLP 31. After this, the output address value from the program counter 33 is incremented to fetch the next instruction from the control program section 52, which is then executed to read the data in the first speech section of the speech data. The corresponding speech data in the voice list section 51 are then retrieved in accordance with the VLP 31. The data retrieved from the voice list section 51 include the frequency of the voice and a pointer to an address in the speech data section 53 where the associated waveform data are stored. The address of the associated waveform data is then put into the start address register 32. After this, the content of the VLP 31 is incremented to point to the next speech section.
The speech synthesizer 30 then retrieves the speech data stored in the speech data section 53 in accordance with the waveform data address stored in the start address register 32. The retrieved data are then transferred to the synthesizer 38 for synthesizing into speech voices.
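One fetch cycle might be sketched as follows, assuming (for illustration only) that a voice-list entry packs a playback frequency and a waveform pointer; the addresses and sample values are invented:

```python
# Voice list section (51): address -> (frequency in Hz, waveform pointer).
voice_list = {0x10: (8000, 0x40)}
# Speech data section (53): waveform pointer -> waveform samples.
speech_data = {0x40: [5, 6, 7]}

def fetch_section(vlp):
    """Resolve the speech section pointed to by the VLP into samples."""
    freq, wave_ptr = voice_list[vlp]      # read the entry at the VLP
    start_addr = wave_ptr                 # load the start address register
    return freq, speech_data[start_addr]  # samples go to the synthesizer

freq, samples = fetch_section(0x10)
```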
One example of the instruction sequence is shown below:
LD VLP, addr ;load the VLP with the address value addr
RD VLP ;read the data in the speech section currently pointed to by the VLP
play ch ;synthesize the retrieved speech data
When the instruction “play ch” is executed, the synthesizer 38 uses the data in the speech data section 53 of the memory unit 37 to reset and start itself so as to synthesize the retrieved speech data into sound data.
At the end of the data retrieved from the currently selected speech section, the synthesizer 38 generates an interrupt signal to the interrupt controller 36, causing the interrupt controller 36 to execute an interrupt service routine. This causes the speech synthesizer 30 to enter the interrupt mode, in which the program counter 33 is set to a specific address value pointing to a location in the interrupt vector section 50 where the corresponding interrupt vector is stored. The interrupt service routine fetches the data stored in the next speech section in the voice list section 51, which is currently pointed to by the VLP 31. Meanwhile, the start address register 32 is set to the address of the waveform data associated with the next speech section. After this, the VLP 31 is incremented to point to the next speech section. The retrieved data are then transferred to the synthesizer 38 for synthesizing into sound data. After this is completed, the speech synthesizer 30 exits the interrupt mode and returns to the main program.
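The interrupt-driven chaining can be simulated as below; the interleaving of playback and main-program I/O is a modeling assumption, not the hardware timing:

```python
def run(voice_list, speech_data):
    """Simulate interrupt-driven playback: after each segment's interrupt
    loads the next section, the main program is free to perform I/O."""
    vlp, events = 0, []
    while True:
        entry = voice_list[vlp]   # synthesis ISR: fetch the section at the VLP
        vlp += 1                  # and increment the VLP
        if entry == "STOP":       # stop mark: synthesizer enters standby
            break
        events.append(("play", speech_data[entry]))
        events.append(("io",))    # back in the main program: I/O controls run
    return events

log = run(["x", "y", "STOP"], {"x": [1], "y": [2, 9]})
```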
The foregoing process is repeated to retrieve data and synthesize the retrieved data into sound data. When a stop mark in the speech section is encountered, a stop signal will be generated to stop the operation of the synthesizer 38 and turn it into a standby state.
Since the synthesizing of the speech data into sound data is carried out through the interrupt service routine, it can operate repeatedly and incessantly. This feature allows the designer to fully utilize the main program for external I/O controls. The speech synthesizer of the invention can thus be simplified in software complexity while nonetheless capable of performing multi-tasking on external devices and the outputting of the synthesized sound data.
When the speech synthesizer of the invention is implemented through hardware, the compressed speech data from the memory unit 37 are first fed into the synthesizer 38 for synthesizing into sound data, and the digital output of the synthesizer 38 is then converted by the DAC 40 into analog form, which is in turn converted by the sound transducer 41 into audible form. The stack register 34 is used to store the return address of an interrupt/call operation. The multiplexer 35 is used to couple the output of the VLP 31, the output of the start address register 32, or the output of the program counter 33 to the memory unit 37 so as to gain access to data stored in various locations in the memory unit 37 in accordance with the current request. The interrupt controller 36 is capable of interrupting the speech synthesizer 30 in response to an externally generated trigger signal or an interrupt signal from the synthesizer 38. The synthesizer 38 is used to synthesize the speech data retrieved from the memory unit 37, through a PCM (pulse-code modulation) method, into digital sound data. The I/O controller 39 is used for I/O controls of external devices 60 such as a motor (not shown) or an LED (not shown) in response to instructions from the memory unit 37.
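The role of the multiplexer 35 can be sketched as a three-way address selector; the request names and address values are invented for illustration:

```python
def mux(select, vlp, start_addr, pc):
    """Route one of the three address sources onto the memory address bus."""
    return {"vlp": vlp, "start": start_addr, "pc": pc}[select]

# Assumed memory map of the memory unit (37).
memory = {0x00: "vector", 0x10: "voice-entry", 0x20: "opcode", 0x30: "samples"}

# A waveform fetch selects the start address register as the address source.
addr = mux("start", vlp=0x10, start_addr=0x30, pc=0x20)
word = memory[addr]
```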
In the foregoing speech synthesizer 30, the interrupt signal is generated through hardware means. Alternatively, it can be generated through software means.
One example of a software program designed for the speech synthesizer is shown below:
LD R0, 3
MOV R1, R2
ADD R3, R5
play ch, H4+S1+S2+S3+S4+S5+T4
LD R6, F
loop: (I/O control)
DJNZ R6, loop
LD output, 0011B
LD output, 0011B
LD output, 0010B
. . .
Synth-INT (synthesis interrupt service routine)
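A rough Python analogue of the listing above might look as follows; the register semantics and output port values are assumed, and the real program runs on the synthesizer's own instruction set with playback handled by the synthesis interrupt service routine:

```python
def start_playback(sections):
    """Stand-in for 'play ch': real playback proceeds under the
    synthesis interrupt service routine while the main program runs."""
    return sections

def main_program():
    R0, R6 = 3, 0xF                  # LD R0, 3 / LD R6, F (R0 mirrors the listing)
    start_playback(["H4", "S1", "S2", "S3", "S4", "S5", "T4"])  # play ch, ...
    outputs = []
    while R6 > 0:                    # loop: (I/O control)
        outputs.append("io")         # main program keeps doing I/O while
        R6 -= 1                      # the down-counter decrements (DJNZ R6, loop)
    outputs += [0b0011, 0b0011, 0b0010]  # LD output, 0011B / 0011B / 0010B
    return outputs

result = main_program()
```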
With the provision of the voice list section, the VLP 31, and the synthesis interrupt service routine, the voice concatenation can be carried out automatically by the hardware without having to devise complex software programs to perform this task. Therefore, the speech synthesizer is able to perform I/O controls at the same time it is outputting synthesized voice data.
In conclusion, the speech synthesizer 30 of the invention has the following advantages over the prior art.
(1) First, the invention allows the speech synthesizer 30 to perform multi-tasking on external devices 60 while outputting the synthesized sound data to the sound transducer 41. The drawbacks of the prior art mentioned in the background section are therefore eliminated.
(2) Second, the invention allows the speech synthesizer 30 to be constructed with simple software complexity, and its voice concatenation can be realized by either hardware or software.
The invention has been described using exemplary preferred embodiments. However, it is to be understood that the scope of the invention is not limited to the disclosed embodiments. On the contrary, it is intended to cover various modifications and similar arrangements. The scope of the claims, therefore, should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.
|Aug 21, 1998||AS||Assignment|
Owner name: WINBOND ELECTRONICS CORP., TAIWAN
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:JIH, CHAUR-WEN;REEL/FRAME:009407/0884
Effective date: 19980712