US 3646576 A
United States Patent  Feb. 29, 1972
SPEECH CONTROLLED PHONETIC TYPEWRITER
Inventor: David Thurston Griggs a/k/a D. Thurston Griggs, 5128 Rolling Road, Baltimore, Md. 21227
Filed: Jan. 9, 1970
Appl. No.: 1,739
U.S. Cl.: 179/1 SA
Int. Cl.: G10l 1/16
Field of Search: 179/1 SA, 1 VS; 178/31;
References Cited
UNITED STATES PATENTS
3,225,141 12/1965 Dersch 179/1 SA
3,204,030 8/1965 Olson 178/31
3,428,748 2/1969 Flanagan 179/1 SA
3,234,332 2/1966 Belar 179/1 SA
3,265,814 8/1966 Maeda 179/31
3,383,466 5/1968 Hillix 179/1 SA
OTHER PUBLICATIONS
Dolansky, On Certain Irregularities of Voiced-Speech Waveforms, IEEE Transactions, Vol. AU-16, 3/68, p. 51-56
Primary Examiner: Kathleen H. Claffy
Assistant Examiner: Jon Bradford Leaheey
Attorney: Keith Misegades and George R. Douglas, Jr.
 ABSTRACT To convert speech directly into print as it is being spoken, by machine, is a goal that has been thwarted by two critical wants: (1) a way to perform many complex selective operations with great speed, and (2) a way to close the gap between continuous speech as an unbroken sequence of sounds on the one hand, and the distinctly separated words and spelling conventions of the printed language, on the other. Recently, taking advantage of the electronic computer's speed with multiplex programmed operations, acousticians have sought to achieve more accurate detection and separation of speech sounds. Such efforts have required programming and availability of extensive computer facilities to approach one step in the problem.
switching circuits, into a real-time electrical phonemic analog of what is said; then it adds a special-purpose digital computer component to process and match syllabic sequences of sounds in the language. Thus, the computer element is smaller and is used not for phonetic detection but simply to give an output as close as possible to conventional printing as can be obtained by means of a prestored vocabulary of 12,000 words. The readout can be printed by a modern high-speed electric typewriter.
16 Claims, 12 Drawing Figures
SPEECH CONTROLLED PHONETIC TYPEWRITER
SPECIFICATION
The present invention relates to a mechanism which transcribes human speech of the standard American variety from any adult speaker, instantaneously and automatically into a typed printout that normally consists, 90 percent or more, of words that are spaced and appear in conventional spelling. The system comprises three modules and employs dual inputs, as shown in FIG. 1.
One of the objects and advantages of the present invention is that there is provided a device that transcribes automatically to give a printed output which is over 90 percent in conventional spelling with separations into words or syllabic units, instantaneously derived from the spoken input of standard American English.
A further object of the present invention is to provide real-time detection and analysis of speech sounds achieved by preswitching sounds according to their six manners of production into separate analytical circuits for each type, namely, vowels, nasals, unvoiced fricatives, voiced fricatives, voiced stops and unvoiced stops.
A further object of the present invention is to better differentiate one from another the voiced stops (plosives) and nasals where two different kinds of vocal inputs are used.
Another object of the present invention is to provide means for producing distinctions between voiced stops and unvoiced and unreleased stops (plosives) which are detected independent of voicing, when necessary, by means of rate of change of signal strength and durational timing.
A further object of the invention is to provide specific detection of speech sounds regardless of differences of pitch as between different speakers. Detection of vowels based upon frequency measurements alone is obviated because the centrum frequency and peak amplitude for the first formant are detected and measured and then correlated to centrum peak amplitude for the second formant, with a ratio characteristic for each vowel sound, regardless of pitch.
Another object of the invention is to provide signal indications for undifferentiated stops, both voiced and unvoiced, for an undifferentiated nasal and for an undifferentiated vowel, when those occur, so that the most probable intended but slighted phonemes may be interpolated in the processes of syllable and word formation by the device.
A still further object of the invention is to provide a means of single vowel detection so as to reduce phoneme storage, the diphthongs being identified by a process based upon detection of the simple vowels, but signaled out in the same manner.
Another object of the invention is to provide phonetic detection processes which produce differentiation of 38 different phonetic entities which serve as phonemes for the transcription process, and two additional signals: (a) an indication of silence and (b) an indication of syllabic stress.
A further object of the invention is to provide separation between successive identical sounds where one terminates the first word and the other starts the next word, this being accomplished by timing the duration of each type of sound so that only one signal will pass during the normal duration of a single occurrence of the sound, but a second signal will be emitted when it is prolonged as a bridge between successive words as a repeated sound.
Another object of the invention is to provide, by taking the vowel as the basis, for the analysis of the input by syllables, and the invention deals with up to 337 distinctive kinds of syllabic sequences of different types of phonemes. By means of these combinations, syllables are separated in connected speech, and speech vocabulary is classified. Provision is made for reconstitution of erroneous syllabic formulations, also instantaneously. Certain nasals are accorded syllabic import where, through their position, they supplant an articulated vowel.
A further object of the invention is to provide for the accumulation of separated and identified syllables, which are combined according to patterns arranged in a prestored vocabulary so that words will be formed, the longest possible ones first, and to provide a printout in conventional prestored spellings as a result of this matching of incoming material with what has been stored. This is both for ease of reading and to separate words for printing from within the stream of connected speech.
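The longest-first word formation described above can be illustrated with a minimal sketch. It assumes a hypothetical dictionary `VOCAB` keyed by tuples of syllable labels, a hypothetical function name `form_words`, and an arbitrary upper limit `max_len` on syllables per word; none of these names or values come from the specification, which describes hardware storage rather than software.

```python
def form_words(syllables, vocabulary, max_len=6):
    """Greedy longest-first matching of a syllable stream against a stored vocabulary.

    `vocabulary` maps tuples of syllables to a conventional spelling (hypothetical layout).
    """
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, shrinking until something matches.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            key = tuple(syllables[i:i + n])
            if key in vocabulary:
                words.append(vocabulary[key])
                i += n
                break
        else:
            # Unstored material: fall back to printing the single syllable as heard.
            words.append(syllables[i])
            i += 1
    return words


# Hypothetical miniature vocabulary, for illustration only.
VOCAB = {
    ("type",): "type",
    ("type", "wri", "ter"): "typewriter",
    ("wri", "ter"): "writer",
    ("speech",): "speech",
}

print(form_words(["speech", "type", "wri", "ter", "zog"], VOCAB))
# prints ['speech', 'typewriter', 'zog']
```

Because the longest candidate is tried first, "typewriter" is chosen over "type" followed by "writer", and the unstored syllable falls through to a syllabic printout.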
Another object of the invention is to provide for a vocabulary of 12,000 or more words and syllables in storage, with about 1,150 of these in a supplementary store so that short words of three or less phonemes print out independently but only after they have been useful as parts of longer stored words.
A further object is to provide, by means of coded designations for stored syllables, variations of pronunciation permissible within the language structure when these designations are matched to the stored vocabulary words for printout.
Another object is to provide for compensation for omissions in the stored vocabulary (mainly proper names) by printing a phonemic or syllabic printout of incoming unstored verbal material in a close approximation to conventional spelling, with stress indicated.
Another object is to provide for a printout of punctuation and of numerals and also of letter designations for spelling from spoken inputs by means of the syllable designations and vocabulary matching processes.
A still further object of the invention is to provide for real-time printout designations at a speed of 10 phonemes per second for the entire vocabulary, except that printing of each word must delay until completion of that word, the maximum time being approximately 2.5 seconds.
Another object is to provide for 88 printout signals which are suitable for a conventional high-speed typewriter or printer capable of handling 10 characters per second. The 88 printout signals may also be adapted to drive a Braille printing device, as well as a conventional printer.
Another object is to provide, among the 88 printout signals, characters which provide capital letters for spelling or designation purposes, numerals, punctuation and indications of phonemes where stress has occurred, as shown by vowels appearing in boldfaced type. A separate spacing indication signal to the printing module after words also is provided by the invention.
The design of the invention permits a choice of vocabulary constituents or substitution of various ones, either as an entire group or through individually altered circuits without requiring a wholly different apparatus.
The invention provides at least two opportunities for recording spoken material for subsequent delayed transcription: (1) a dual-track recording by oral and throat microphones jointly on tape, or (2) a recording of the output of the detected phonemes from the transducer module on single track.
These and other objects and advantages of the invention will become apparent upon full consideration of the following detailed description and accompanying drawings in which:
FIG. 1 is a block diagram of the three modules comprising the invention according to the preferred and best mode of the invention and showing that the transducer module employs two inputs;
FIG. 1A is a block diagram showing the manner in which FIGS. 1-9 are connected;
FIG. 2 is a block diagram of the sound separator apparatus or unit of the transducer module;
FIG. 3 is a block diagram of the stops-or-silence detector;
FIGS. 4 through 8 show methods of detecting individual speech sounds and their components based upon the concept of filtering at various frequencies with necessary comparators, switches and attending circuitry, and in particular FIG. 4 shows a block diagram for detecting and processing stop speech sounds;
FIG. 4A shows a block diagram for processing undifferentiated voiced stop speech sounds;
FIG. 5 shows a block diagram for detecting and processing fricative sounds;
FIG. 6 shows a block diagram of a nasal unit for detecting and processing nasal sounds;
FIG. 7 shows a block diagram of a vowel detection unit for processing vowel sounds;
FIG. 7A shows a block and circuit diagram of a second formant scanner unit for processing the second formant of vowel sounds;
FIG. 8 shows a circuit and block diagram of a diphthong transducer unit for detecting and processing diphthong sounds; and
FIG. 9 is a block diagram of the transcriber module 20 of FIG. 1 according to the preferred and best mode of the invention.
Note also the following tables in the Appendix:
Table 1 is a chart in the specification of sequences of phonemes in syllabic formations;
Table 2 is a chart or listing of the regrouping of phoneme sequences from NO-GO syllables into new syllables; and
Table 3 is a chart in the specification showing the proposed type font consisting of 88 figures or characters that are printouts of the typographic unit of FIG. 1. Also included with the 88 figures is a spacing unit, totaling 89 signal inputs thereto.
Table 4 is a chart showing differentiation of vowels according to formant peaks.
Referring now to the drawings, there is shown in FIG. 1 the transducer module 10, which accepts an oral signal received by a microphone (not shown) that picks up the voice of an adult speaker and is fed to the transducer module on oral signal line 12. Also provided to the transducer module is a signal derived from a microphone positioned on one's throat above and to one side of one's Adam's apple and in substantial contact with the skin surface; this signal is fed to the transducer module on conductor 14.
As is also shown in FIG. 1, the transducer module 10 sorts and detects the phonetic elements needed for subsequent transcription which is accomplished in the circuitry and components of FIGS. 2-8 to be described below, and produces 38 real-time electrical output signals on conductors 16 representing the phonemes that have been detected and delineated, including diphthongs. Also within the conductors 16 are two additional outputs, one indicating duration of silence and one indicating stress, to be described below.
The second or transcriber module of FIG. 1 is a modified digital computer unit, more particularly described in connection with FIG. 9, and which receives the 38 phoneme output signals from conductors 16 together with the two other signals indicating durations of silence and stress, respectively. The transcriber module divides its phoneme input into 337 types or patterns of syllables and makes words from a stored vocabulary of 12,000 or more words; and for syllables not stored, it arranges a similar printout. The output is appropriately coded so as to drive a high-speed typewriter or similar printing device 26, and thus produces thereby its written output. The printing device 26, also called a typographic unit, is responsive to activating signals of 88 different characters including a punctuation signal, and provides stress for isolated syllables and provides spacing after each of the language units it determines.
Thus the third module is any suitable high-speed typewriter or similar printing device which will accept the outputs of the transcriber module 20 at speeds up to 10 characters per second with type font modified to accord with the 88 outputs and space signals totaling 89 signals, as necessary.
FIG. 1A shows the arrangement in which FIGS. 2-8 interconnect to form the transducer module 10 of FIG. 1. In FIG. 2, there is shown the sound separator 30 having the oral signal line 12 and the throat signal line 14 connected to sensor elements 32,34 respectively. These sensor devices amplify the inputs applied thereto for analysis as to kinds of speech sounds, so that the inputs may be shunted through or conveyed to different subsequent analytical circuits according to the kinds of speech and their constituent components, for detection of individual speech sounds in several determined categories. From the sensor device 32, an output is conveyed or coupled to a linear amplifier or, what is called herein, a VOGAD 36, from whence it is fed to a set of switches or gates 40,42,44,46,48,50, each of which passes the oral input when appropriate for the category of speech sound being analyzed. These categories are six in number and relate in the following manner to the switches 40-50.
Switch 40: unvoiced stops
Switch 42: voiced stops
Switch 44: unvoiced fricative
Switch 46: voiced fricative
Switch 48: nasals
Switch 50: vowels
By these divisions or separations of conventional electric analogs of an oral input, there are derived signals from switches 40-50 that provide means of preswitching nasals, vowels and voiced fricatives.
Signal means of opening each of the gates or switches 40-50 likewise is shown in FIG. 2 by means of a series of three ratiometers 52,54,56, which act upon the ratios of amplitude of the signal strength in line 58 from sensor 32 and the signal strength in line 60, as applied to the ratiometers 52-56. Lines 58,60 carry signals indicative of the strength of the oral and throat inputs respectively. An oral OFF-switch 62 coupled to line 58 provides a rate of change of strength signal to the ratiometer 54, while the sensor 34 provides an output to a 700-c.p.s. low-pass filter 64, which provides an output to a throat signal differentiator 66, which in turn provides a rate of change of signal strength to the ratiometer 56.
In the ratiometer 52, there is an amplitude comparison of signals of about 3 to 2 at the throat for indicating a vowel; a throat input signal ratio of about 2 to 1 for providing the oral input indicating a nasal sound; and a ratio of approximately 1 to 1 which characterizes a voiced fricative sound.
The ratiometer 52, when satisfied, activates a vowel gate over conductor 52a in gate 50. The ratiometer 54, when satisfied, activates the nasal gate 48 over line 54a. The ratiometer 56 which provides an approximately 1-to-1 output comparison, when satisfied, provides activation of the gate or switch 46 which provides a voiced fricative indication.
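The amplitude-ratio preswitching described above can be sketched as follows. This is only an illustration under stated assumptions: the ratios are taken as throat amplitude divided by oral amplitude, the matching tolerance of 0.2 is arbitrary, and the names `TARGET_RATIOS` and `classify_by_ratio` are hypothetical. The specification implements this with analog ratiometers, not software.

```python
# Approximate throat-to-oral amplitude ratios named in the text.
# The direction of the ratio (throat over oral) is an assumption.
TARGET_RATIOS = {
    "vowel": 3 / 2,             # about 3 to 2
    "nasal": 2 / 1,             # about 2 to 1
    "voiced_fricative": 1 / 1,  # approximately 1 to 1
}


def classify_by_ratio(throat_amp, oral_amp, tolerance=0.2):
    """Return the category whose target ratio is nearest, or None if none is close.

    `tolerance` is a hypothetical matching window, not a value from the patent.
    """
    if oral_amp <= 0:
        return None  # no oral signal, so no meaningful ratio
    ratio = throat_amp / oral_amp
    best = min(TARGET_RATIOS, key=lambda name: abs(ratio - TARGET_RATIOS[name]))
    if abs(ratio - TARGET_RATIOS[best]) <= tolerance:
        return best
    return None


print(classify_by_ratio(3.0, 2.0))  # vowel
print(classify_by_ratio(2.0, 1.0))  # nasal
print(classify_by_ratio(1.0, 1.0))  # voiced_fricative
```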
During a rapid rate of change of oral input as shown by the sensors 32,34, the ratiometers 52,54,56 are cut off so that transitional states that may be developed will not register inappropriate ratios therein. A 700-hertz low-pass filter 64 is connected to the sensor 34 so that only the lower frequencies which are present and indicative of the signals and information found in the throat input 14 are used for detection in this process to develop rate of change output to the throat differentiator 66, and ON-OFF indications over conductors 70,72, respectively. The ON-OFF indications on conductors 70,72 are used as inputs to the stops-or-silence indicator 74 in FIG. 3. The ON output of filter 64 is provided as an input 76 to the voiced fricative switch 46, and the OFF-signal 72 is used as an input to the unvoiced fricative switch 44.
The stops-or-silence indicator 74, which more particularly is shown in detail in FIG. 3, has several inputs, namely, 70,72, as referred to above, signal line 80 from throat differentiator 66, signal conductor 82 from the oral differentiator 62, an oral OFF signal on conductor 84, an oral ON signal on conductor 86, and an oral input from VOGAD 36 on conductor 88.
A threshold adjustment means is connected from the output of the sensor 32 to a stress signal terminal 92.
There are seven general output signals from the stops-or-silence indicator 74, three of which are applied to the unvoiced stops switch 40, i.e., conductors 94,94,169, two of which are applied as output signals to the voiced stop switch 42 over conductors 96, 96, an output to a silence terminal 98, and an output from a switch that shows probable presence of a stop, which output passes by conductor 128 to close the unvoiced fricatives gate 44 in FIG. 5. This prevents mistaken identification of an unvoiced stop as an unvoiced fricative in the process shown in FIG. 5.
An output 169, used as an indication of an undifferentiated unvoiced stop, hereinafter defined as "p, t, or k," is used also to pass through the unvoiced stop switch 40 and then into the detection circuit of FIG. 4A.
Shown in FIG. 2 is the output of the linear amplifier or voltage VOGAD 36 to which input is applied over conductor 12. The output of VOGAD 36 provides a signal over conductor 88, as described above, to the stops-or-silence unit 74, and a further output to the unvoiced stop gate 40 over conductor 110, an output also to the voiced stop gate 42 over conductor 112, an output to the unvoiced fricative switch 44 over conductor 114, an output to the voiced fricative switch or gate 46 over conductor 116, an output to nasal switch 48 over conductor 118, and an output to the vowel switch 50 over conductor 120. These outputs from VOGAD 36 are used in conjunction with deriving the gated output of the switches 40-50. The outputs of these switches 40-50 are applied to further circuit units of the system, as shown by the output terminal extending below each switch in FIG. 2: the output of switch 40 is applied as an input to FIG. 4; that of voiced stop switch 42 to FIG. 4A; those of switches 44 and 46 to FIG. 5; that of nasal switch 48 to FIG. 6; and that of vowel switch 50 to FIGS. 7 and 7A. The signals from the VOGAD 36 are also used to derive a measure of stress to be used in the transcriber module of FIG. 9 through the above-described arrangements. For the stress indication, threshold adjustment 90 sets the threshold above which there is to be an indication of stress. The output 92 is applied to FIG. 9, as is shown and will be described in detail below.
Similar to the VOGAD unit 36 for the oral signal of conductor 12, there is also a VOGAD unit 122 connected to conductor 14 which carries the throat input signal, and the output of VOGAD 122 is applied as a gate signal to the nasal switch 48 over conductor 124.
FIG. 3, which has been described in part above, shows the stops-or-silence detector 74 which receives inputs 70,80,82,84,72,86,88, and which emits six outputs as shown. The purpose is to distinguish the true silence from various kinds of stops or plosive sounds and to distinguish the different kinds from each other when possible. However, when not possible, a signal for undifferentiated unvoiced stops is produced in conductor 94. The stops-or-silence detector is provided with voiced stops switch 126, and the unvoiced stops switch 142. It also provides a direct oral plosive input direct to the detection or transducer circuits for the unvoiced stops over conductor 94, and for the voiced stops over conductor 96. The two remaining outputs are indications of undifferentiated unvoiced stops over conductor 169, and of silence over conductor 98. The method of operation is essentially by means of timers and delay circuits, such as silence timer 130, a 0.01-second timer 132, a timer 134, a 0.06-second timer 136, a third timer of 0.04-second delay 140, and switches 141,142,144.
Prerequisite to all regular stops must occur a silence of at least 0.04 second followed by a rapid rate of change of oral signal. Delay switch 140 detects this from inputs 72,84,82, releasing signal 128 when these conditions occur. For voiced stops, a comparator 146 uses inputs of the oral and throat rates by conductor inputs 82,80 to establish whether there is greater than 1:1 ratio; and if so, and if the throat voltmeter or VOGAD 122 is in its ON-condition 70 and there is a stop signal from delay switch 140 with 128, the voiced stops gate 42 is activated. The silence output over conductor 98 depends on four inputs, that is, the oral VOGAD or voltmeter OFF condition from conductor 84, the throat voltmeter OFF condition from conductor 72, the absence of oral voltmeter ON condition from conductor 86, and the absence of indication of oral rate of change from oral differentiator circuit 62 from conductor 82. If the throat and oral input are OFF for 0.04 seconds, as determined by delay 140, and if there is no rate-of-change signal, a switch 141 emits a signal which may eventuate in a silence output over conductor 98; but with a rate of signal change present, the switch 142 will emit a signal to open the unvoiced stop gate over conductor 94. This separate switch 142 will open the gate only when there is determined to be a throat voltmeter OFF indication received over conductor 72. To return to the incipient silence signal from switch 141, a silence indication will pass to produce an output on conductor 94 showing silence if the timer is thus satisfied for 0.15 seconds; but if an oral voltage cuts in sooner from conductor 86, no silence indication is passed via conductor 98.
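The timing conditions in this paragraph can be summarized in a short sketch. It is a simplified software reading of an analog circuit, under several assumptions: the function name `classify_gap` and its parameters are hypothetical, the comparator ratio is taken as oral rate over throat rate (the order in which the text names them), and cases the paragraph does not address return "none".

```python
MIN_GAP_S = 0.04   # minimum silence that must precede any regular stop
SILENCE_S = 0.15   # quiet time required before a silence output is passed


def classify_gap(off_duration, rapid_oral_change, throat_on,
                 oral_rate=0.0, throat_rate=0.0):
    """Classify what follows a quiet interval of `off_duration` seconds.

    Returns "voiced_stop", "unvoiced_stop", "silence" or "none".
    """
    if off_duration < MIN_GAP_S:
        return "none"  # prerequisite gap of 0.04 s not met
    if rapid_oral_change:
        # Voiced stop: throat ON and oral/throat rate ratio greater than 1:1.
        if throat_on and throat_rate > 0 and oral_rate / throat_rate > 1.0:
            return "voiced_stop"
        # Unvoiced stop: rate of change present with the throat OFF.
        if not throat_on:
            return "unvoiced_stop"
        return "none"
    # No rate-of-change signal: silence only once the quiet lasts 0.15 s.
    if off_duration >= SILENCE_S:
        return "silence"
    return "none"


print(classify_gap(0.05, True, False))            # unvoiced_stop
print(classify_gap(0.05, True, True, 2.0, 1.0))   # voiced_stop
print(classify_gap(0.20, False, False))           # silence
```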
It is seen that thus far there are distinctions made between different kinds of stops, as well as between the stops and silence breaks in the speech which are assumed to be detectable upon a clear and intelligible speech input to the system. However, there are stops which are not sufficiently clear to be distinguished by the foregoing method of analysis; for these a separate provision is made to feed the input to the stops transducers in FIGS. 4 and 4A, as follows. The oral input 88 is suspended and stored by a timer of 0.01 seconds (timer 132) which is triggered by the switch 141 through conductor 128. The timer also receives an oral rate of change input 82. When there is a continuing rate of change during that time, the oral input undergoes additional suspension and storage up to 0.06 seconds as determined by an additional timer 136; and if during that succeeding 0.06-second interval, there is a rapid rate of change of oral input, then the stored plosive or stop oral input is supplied to the unvoiced stops transducers over conductor 94. It is supplied from the retaining timers 132,136 of the stops or silence detector rather than through the normal opening of the gate. If there is no variation in the rate of change during the 0.01 second of timer 132, the oral input signal instead of passing to the 0.06 second timer 136 is switched to 0.03 second timer 144 for retention. Then it is released to the transducers either of voiced 96 or unvoiced 94 stops, depending upon the reading of the associated comparator 144. This comparator has two settings as to fast or slow rate of change during the 0.03-second period, switching the stored input to unvoiced stops 94 to FIG. 4 with rapid change, or to voiced stops 96 to FIG. 4A with slow rate of change.
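The routing of unclear stops through the retaining timers amounts to a small decision tree, sketched below. The function name `route_unclear_stop` and its boolean parameters are hypothetical stand-ins for the timer and comparator readings; the case the paragraph leaves open (continuing change in the first 0.01 second but no rapid change in the following 0.06 second) returns None rather than guessing.

```python
def route_unclear_stop(changing_during_10ms,
                       rapid_during_60ms=False,
                       fast_during_30ms=False):
    """Return "unvoiced" (conductor 94), "voiced" (conductor 96), or None.

    changing_during_10ms: rate of change continues during the 0.01 s hold (timer 132)
    rapid_during_60ms:    rapid change seen during the further 0.06 s hold (timer 136)
    fast_during_30ms:     comparator reads "fast" during the 0.03 s hold (timer 144)
    """
    if changing_during_10ms:
        # Stored input is held up to 0.06 s more and released to the
        # unvoiced stops transducers only if a rapid change occurs.
        if rapid_during_60ms:
            return "unvoiced"
        return None  # outcome not specified in the passage
    # No variation in the first 0.01 s: hold 0.03 s, then the comparator decides.
    return "unvoiced" if fast_during_30ms else "voiced"


print(route_unclear_stop(True, rapid_during_60ms=True))   # unvoiced
print(route_unclear_stop(False, fast_during_30ms=True))   # unvoiced
print(route_unclear_stop(False))                          # voiced
```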
The foregoing described programs or processes will not handle the unvoiced stops which are not released and not distinct enough to be analyzed from ordinary speech in the transducer circuits beyond the gates. In order to determine the presence of such undifferentiated stops, i.e., p, t, or k, the oral voltage ON and OFF inputs and oral rate of change readings are used in the timer 134, which receives the oral OFF signal from conductor 84, the oral ON signal from conductor 86, and the oral differentiator signal from conductor 82. The timer 134 is set for 0.15 seconds and is a maximum transducer, activated only when there has been a high rate of change followed by zero change with oral voltage nil. If the oral voltage recurs within 0.15 seconds, the associated transducer in timer 134 emits a signal over conductor 169 for undifferentiated unvoiced stops. However, if there is a silence longer than 0.15 seconds, the transducer in timer 134 is not activated and the circuit output provides nothing.
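The 0.15-second window for unreleased stops can be expressed as a single check. In this sketch the names `undifferentiated_unvoiced_stop`, `armed` and `oral_return_delay` are hypothetical: `armed` stands for the condition "high rate of change followed by zero change with oral voltage nil", and `oral_return_delay` is the time in seconds until the oral voltage recurs, or None if it never does.

```python
WINDOW_S = 0.15  # setting of timer 134, from the text


def undifferentiated_unvoiced_stop(armed, oral_return_delay):
    """True when a signal for an undifferentiated p, t, or k would be emitted."""
    if not armed or oral_return_delay is None:
        return False
    # Oral voltage must recur within the window; a longer silence yields nothing.
    return oral_return_delay <= WINDOW_S


print(undifferentiated_unvoiced_stop(True, 0.08))  # True
print(undifferentiated_unvoiced_stop(True, 0.30))  # False
```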
FIG. 4 shows detection of individual unvoiced stops by stops detector 150. There are two inputs to the stops detector 150, namely, the unvoiced stops over conductor 88 processed by the stops or silence detector of FIG. 3, and the direct input from the gate 40 in FIG. 2 applied over conductor 94. The transduced undifferentiated unvoiced stops signal from the stops or silence detector of FIG. 3 is carried over on conductor 169 as an additional output of unvoiced stops gate 40. For detection of /p/, filters 152,154 are used with the resulting voltages compared in a comparator 156. Filter 152 passes 1,800-2,200 c.p.s., and filter 154 passes signals in the band 3,800-4,600 c.p.s. approximately, while the comparator 156 provides an output when the ratio of the output of filter 152 to that of filter 154 is greater than one. This means that a ratio showing the output of filter 152 to be greater than filter 154 output will give a transduced output signal for the sound /p/ on conductor 164. For detection of /k/, there are provided filter 154 and a filter 158, filter 158 passing a band of frequencies between 3,400 and 3,800 c.p.s., so that the resulting outputs of filters 154,158 are applied to a comparator 160, wherein the ratio of the voltage amplitudes of filter 154 and filter 158 must be more than 2:1 to give the transducer output for the sound /k/ on conductor 166. For detection of /t/, the voltages from filters 158,152 are correspondingly compared in comparator 162, wherein a ratio of output voltages of more than 2:1 for the outputs of filters 158 and 152 provides determination of the transducer signal for /t/ on conductor 168.
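The three comparator tests for /p/, /k/ and /t/ can be written out as a short sketch. It assumes the band voltages are already available as non-negative numbers; the function name `detect_unvoiced_stops` and the choice to return every comparator that fires (rather than a single winner) are illustrative assumptions, since the passage describes the comparators independently.

```python
def detect_unvoiced_stops(v152, v154, v158):
    """Apply the three comparator tests to band voltages.

    v152: output of filter 152 (1,800-2,200 c.p.s.)
    v154: output of filter 154 (3,800-4,600 c.p.s.)
    v158: output of filter 158 (3,400-3,800 c.p.s.)
    Returns the list of stops whose comparator condition is met.
    """
    def ratio_exceeds(numerator, denominator, threshold):
        # Written as a product so a zero denominator needs no special case.
        return numerator > threshold * denominator

    hits = []
    if ratio_exceeds(v152, v154, 1.0):  # comparator 156: 152/154 > 1
        hits.append("p")
    if ratio_exceeds(v154, v158, 2.0):  # comparator 160: 154/158 > 2
        hits.append("k")
    if ratio_exceeds(v158, v152, 2.0):  # comparator 162: 158/152 > 2
        hits.append("t")
    return hits


print(detect_unvoiced_stops(1.0, 0.5, 0.4))  # ['p']
print(detect_unvoiced_stops(0.2, 1.0, 0.3))  # ['k']
print(detect_unvoiced_stops(0.2, 0.3, 1.0))  # ['t']
```

An empty list corresponds to no comparator firing, the situation in which the undifferentiated signal on conductor 169 would be the only indication available.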
FIG. 4A illustrates how a similar process is arranged for detection of the individual voiced stop sounds in a circuit called voiced stop transducer 170. There are two inputs, namely, the oral input shown in FIG. 2, which passes through the voiced stops gate 42 and is provided on conductor 42a, together with a separate delayed input from the stops-or-silence detector of FIG. 3 over conductor 96.
The inputs to the undifferentiated voiced stop 170 from conductors 42a and 96 are fed to an ON-OFF detector 172 used to show there is an active input from either source over conductors 42a and 96. The ON-OFF detector feeds an input to each of band-pass filters such as filter 174 passing 1,100-1,700 c.p.s., filter 176 passing 1,700-2,000 c.p.s., and filter 178 passing 2,000-2,400 c.p.s. The ON-OFF detector also provides a signal to gate or switch 180.
The output voltages passing filters 174,176,178 are applied to a comparator 182 so that the voltage from filter 174 is compared with the combined voltage output from filters 176,178. Comparator 182 thus compares the output of filter 174 to the sum of outputs from filters 176,178, and if the ratio is greater than unity, an output of comparator 182 is derived indicating the sound /b/ on conductor 184. The output thus is transduced or present only if the ratio is greater than 1:1.
The detection for /g/ by comparator 186 is developed by applying the output of filter 178 for comparison with the sums of outputs of filters 174,176, so that when this comparison or ratio is less than unity, an output on conductor 190 provides the detection of /g/. FIG. 4A also produces a signal for the sound /d/ which is detected when there is a voltage output derived from filter 176 over conductor 192 to a switch 194. Switch 194 produces the output in conductor 196 when there is input over conductor 192 from filter 176 simultaneously compared with the absence of outputs from comparators 182,186 over conductors 184,190, respectively. Thus the sound /d/ is detected when there is the absence of detection of sounds /b/ or /g/, respectively.
In order to determine the presence of an undifferentiated voiced sound /b/, /d/ or /g/, outputs from conductors 184,190,196 are applied to the gate or switch 180. In the presence of an output from the ON-OFF detector 172, but in the absence of any specific output over conductors 184,190,196, switch 180 produces or releases a transducer signal indicating an undifferentiated voiced stop on conductor 200.
FIG. 5 shows a fricative transducer module 202 for detection of both voiced and unvoiced fricative sounds by means of a network stage of five low-pass filters 210,212,214,216,218 coupled by comparators 220,222,224,226,228. Low-pass filter 210 passes up to 10 k.; filter 212 passes up to 8 k.; filter 214 passes up to 5 k.; filter 216 passes up to 2.5 k.; and filter 218 passes up to 1 k. The input signal on conductor 88 from VOGAD 36 of FIG. 2 is applied to the low-pass filter 210 and to the two-way total signal switch 230, which processes about 96 percent of the total signal strength for an unvoiced fricative signal applied on conductor 44a from switch 44 (FIG. 2), or about 60 percent of the total input signal on conductor 88 for a voiced fricative signal on conductor 46a (FIG. 3). The voice input 88 is capable of interruption by switch 206 activated by the stops-or-silence detector in FIG. 3 through conductor 128. This input therefore will be lacking when a stop is present.
The two-way output of total signal switch 230 is applied to each of the comparators 220,222,224,226,228. The two-way output is actually realized by the conductor 46a for the voiced fricative signal being applied to a voltage-responsive switch means 234, which upon actuation thereof, mechanically activates a series single-pole-double-throw (SPDT) switches 240,242,244,246 from an initial position (shown vertically) to an actuated position (shown diagonally disposed) by lever means 248, schematically illustrated in dotted lines.
Now having described the physical arrangements, it is shown in FIG. 5 that the oral input signals present on conductor 88 are passed through the 10-k. low-pass filter 210, whose output is applied to the comparator 220, which compares it with the output of total switch 230, i.e., either 96 percent of the total signal strength in the case of an unvoiced fricative, or 60 percent of the total signal for a voiced fricative. If the ratio of the 10-k. filter output to the total switch output is greater than 1:1, SPDT switch 240 connected to the output of comparator 220 passes an output signal /unvoiced th/, unless two-way switch 234 is activated; in such case, an output signal /voiced th/ is produced, as shown.
The 10-k. low-pass filter 210 output is applied over conductor 250 to the 8-k. low-pass filter 212, whose output is applied to the comparator 222 for comparison with the output of total switch 230, as above described. If the ratio of the 8-k. filter output to the total switch output is greater than 1:1, SPDT switch 242 connected to the output of comparator 222 passes an output signal /f/ unless two-way switch 234 has been actuated; in such case, an output signal /v/ is produced, as shown.
The 8-k. low-pass filter 212 output is applied over conductor 252 to the 5-k. low-pass filter 214, whose output is applied in turn to the comparator 224 for comparison with the output of total switch 230, as above described. If the ratio of the 5-k. filter output to the total switch output is greater than 1:1, SPDT switch 244 connected to the output of comparator 224 passes an output signal /s/ unless two-way switch 234 has been activated; in such instance, an output signal /z/ is passed, as shown.
The 5-k. low-pass filter 214 output is fed over conductor 254 to the 2.5-k. low-pass filter 216, whose output in turn is fed to the comparator 226 for comparison with the output of total switch 230, as above described. If the ratio of the 2.5-k. filter output to the total switch output is greater than 1:1, SPDT switch 246 connected to the output of comparator 226 passes an output signal /sh/ unless two-way switch 234 has been actuated; in such instance, an output signal /zh/ is passed from the switch 246.
The 2.5-k. low-pass filter 216 output is fed over conductor 256 to the 1-k. low-pass filter 218, whose output in turn is fed to the comparator 228 for comparison with the total switch 230 output, as has been described. If the ratio of the 1-k. filter output to the total switch output is less than 1:1, output conductor 250 passes a sound /h/. It is noted that a ratio of less than 1:1 shows that the 1-k. filter has critically cut the frequencies within its band range, thereby identifying the /h/ speech sound.
Switch 234 is a mechanical gang switch activated by the voiced fricatives gate 46 of FIG. 2, so as to switch the transducer output from an unvoiced fricative to the voiced fricative sound that is in the same bandwidth.
In general, therefore, the same process of comparison and switching continues through the cascade of filters 210-218, except that with the last one (1-k. filter 218) there is no further filter, and a ratio of less than 1:1 when present will signal the presence of an /h/.
FIG. 5 thus is seen to yield output signals indicative of /θ/ (unvoiced th), /voiced th/, /f/, /v/, /s/, /z/, /sh/, /zh/ and /h/, a total of nine speech sounds.
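The comparator cascade of FIG. 5 can be sketched as follows. This is a sketch under stated assumptions, not the patented circuit: the function name and data layout are hypothetical, filter outputs and total signal strength are taken as plain amplitudes, and because the description does not say how simultaneous comparator outputs are resolved, the sketch simply reports every comparator that fires.

```python
# (low-pass cutoff in Hz, unvoiced label, voiced label) for comparators 220-226.
STAGES = [
    (10_000, "unvoiced th", "voiced th"),
    (8_000, "f", "v"),
    (5_000, "s", "z"),
    (2_500, "sh", "zh"),
]


def fricative_outputs(filter_out, total, voiced):
    """Return the labels of all comparators that fire.

    filter_out maps each cutoff (Hz) to that filter's output amplitude.
    The 0.96 and 0.60 fractions model total signal switch 230.
    """
    reference = total * (0.60 if voiced else 0.96)
    fired = []
    for cutoff, unvoiced_label, voiced_label in STAGES:
        # Comparators 220-226 fire when filter output / reference exceeds 1:1.
        if filter_out[cutoff] > reference:
            fired.append(voiced_label if voiced else unvoiced_label)
    # Comparator 228 (1-k. filter) fires on a ratio of LESS than 1:1.
    if filter_out[1_000] < reference:
        fired.append("h")
    return fired
```

The voiced flag plays the part of gang switch 234: it changes both the reference fraction and the label attached to each stage.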
FIG. 6 illustrates the nasal detection 260 having oral input from conductor 12 and throat signal input 14 (see FIG. 2). These signals are coupled for reinforcement of the lower frequencies on conductor 262; FIG. 6 produces the detection of nasal sounds /m/, /n/, /ng/, and undifferentiated nasal or /n/, and /l/.
The conductor 262 having reinforced input signals thereon is applied in parallel to each of six band-pass filters: filter 270 passing 700-1,200 c.p.s.; filter 272 passing 1,300-2,800; filter 274 passing 700-1,000; filter 276 passing 1,400-2,l; filter 278 passing 2,000-3,000; and filter 280 passing 1,600-2,000.
The outputs of filters 270 and 272 are applied to ratio or amplitude comparator 282 which produces a signal on conductor 284 indicative of the /m/ sound when filter 270 output/filter 272 output is greater than unity.
The outputs of filters 270 and 272 are applied to ratio or amplitude comparator 286 which produces a signal on conductor 288 indicative of the /n/ sound when filter 272 output/filter 270 output is greater than unity.
The outputs of filters 274 and 276 are fed to a ratio or amplitude comparator 290, which produces a signal on conductor 292 indicative of the /ng/ sound when filter 274 output/filter 276 output is greater than unity.
The outputs of filters 278 and 280 are fed to a ratio comparator 294 which produces a signal on conductor 296 indicative of the /l/ sound when the filter 278 output/filter 280 output is greater than unity.
The filter 274 output is fed to a gate or switch 300 to which also is fed the output of comparator 282 on conductor 284, the output of comparator 286 on conductor 288 and the output of comparator 290 on conductor 292; the output of switch 300 is a signal, if any, indicative of an undifferentiated nasal sound on conductor 302. A voltage through filter 274 between 700 and 1,000 Hz. will yield an output unless outputs on conductors 284,288 or 292 show the presence of a specifically identified /m/, /n/ or /ng/.
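The nasal comparisons of FIG. 6 can be summarized in a short sketch. The function and key names are hypothetical, filter outputs are taken as non-negative amplitudes, and a ratio "greater than unity" is expressed as a direct amplitude comparison so that a zero denominator needs no special handling.

```python
def nasal_outputs(f270, f272, f274, f276, f278, f280):
    """Model comparators 282, 286, 290, 294 and switch 300 of FIG. 6.

    Arguments are the output amplitudes of the like-numbered filters.
    """
    m = f270 > f272    # comparator 282, conductor 284
    n = f272 > f270    # comparator 286, conductor 288
    ng = f274 > f276   # comparator 290, conductor 292
    l = f278 > f280    # comparator 294, conductor 296
    # Switch 300: filter 274 has output but no specific nasal was identified.
    undifferentiated = f274 > 0 and not (m or n or ng)
    return {"m": m, "n": n, "ng": ng, "l": l,
            "undifferentiated": undifferentiated}
```

Equal outputs from filters 270 and 272 yield neither /m/ nor /n/, which is one way the undifferentiated output on conductor 302 can arise.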
The vowel detector 310 in FIG. 7 operates with only two inputs: the oral input signal on conductor 50a in FIG. 2, and total oral signal strength component on conductor 58 in FIG. 2. In summary, it produces transduced output signals, conductors 311 representing 10 vowel sounds including /r/ and one undifferentiated vowel signal. Eight of the vowel signals are supplied to the diphthong transducer of FIG. 8, and three of them are supplied directly to the transcriber module 20 of FIG. 9. In addition, a ratio signal for peak amplitude of first formant as a function of total signal strength 344 is passed to the diphthong transducer in FIG. 8.
The method used in differentiating vowel sounds is to compare the peak (centrum) amplitudes of the first and second formants relative to each other and to the total oral amplitude at the same time, then to check this against the frequency of the first formant. Detection of the first formant follows conventional methods by using a bank of filters 312 at 20-Hz. intervals between 240 and 960 Hz. This enables determination of the frequency of the first formant centrum. Detection of the second formant in FIG. 7A is done differently: the peak is scanned for, without regard for the exact frequency, by means of heterodyning.
The input signal from the vowel gate on conductor 50a of FIG. 2 is diverted into two different circuits, one for each formant, and it also supplies a transducer 340 which indicates the presence of any kind of vowel input whatever, described in detail below. The first formant detector 312 applies the input voltage to a bank of 36 sequentially centered filters of 20-cycle bandwidth in the range from 240 to 960 Hz. The 36 filtered outputs are supplied over individual conductors 314 to a comparator called a peak and centrum discriminator 316, which reads the peak amplitude values in output 318 and determines whether the centrum lies within certain particular bandwidth ranges: 400-700 Hz. passing over conductor 321; 340-460 Hz. passing over conductor 322; 260-400 Hz. passing over conductor 323; 240-340 Hz. passing over conductor 324; 400-700 Hz. passing over conductor 325; 400-500 Hz. passing over conductor 326; or 700-940 Hz. passing over conductor 327. Determination of the location of the centrum in these bandwidths is used to help differentiate the vowels when the peak amplitude ratios are compared. The method of differentiation is shown in the chart of Table 4 (see Appendix), where amplitude ratios are derived from the Peterson-Barney studies.
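The action of the peak and centrum discriminator 316 can be illustrated with a minimal sketch. Several points are assumptions: the 36 filters are taken to tile 240-960 Hz. in 20-Hz. bands (centers at 250, 270, ... 950 Hz.), the centrum is taken to be the center of the strongest filter, the second 400-700 Hz. range is keyed as conductor 325 following the later conductor list 321-327, and all names are hypothetical.

```python
# Centrum ranges in Hz, keyed by the conductor that signals each range.
RANGES_HZ = {
    321: (400, 700),
    322: (340, 460),
    323: (260, 400),
    324: (240, 340),
    325: (400, 700),
    326: (400, 500),
    327: (700, 940),
}

# Assumed center frequencies of the 36 filters of 20-Hz. bandwidth.
FILTER_CENTERS_HZ = [250 + 20 * i for i in range(36)]


def first_formant(amplitudes):
    """Return (peak amplitude, centrum in Hz, active range conductors)."""
    if len(amplitudes) != len(FILTER_CENTERS_HZ):
        raise ValueError("expected one amplitude per filter (36)")
    peak_index = max(range(len(amplitudes)), key=lambda i: amplitudes[i])
    centrum = FILTER_CENTERS_HZ[peak_index]
    active = [c for c, (lo, hi) in RANGES_HZ.items() if lo <= centrum <= hi]
    return amplitudes[peak_index], centrum, active
```

Because the ranges overlap, a single centrum generally activates more than one range conductor.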
A comparator 342 receives the input signal for total oral signal strength on conductor 58 and also the first formant's peak amplitude reading from conductor 318, and compares these, producing a ratio output on conductor 344 for the first formant quotient. This is supplied to a final comparator-ratiometer 346 associated with the transducer 348 for the vowel output signals 311.
For the second formant process in scanner 328, the input signal from oral input conductor 50a is passed through a pair of band-pass filters, one filter 332 passing 720-940 Hz. and the other filter 350 passing 940-2,900 Hz. The voltage from filter 332, lying below 940 Hz., is subject to close-off by a switch 330 which is activated when the first formant's centrum has proved to be located in that bandwidth, as signaled over conductor 327, because in that case the second formant will peak above 940 Hz. The filtered voltages from the paired filters 332,350 are supplied over conductors 352,354 to the second formant scanner 328 which is shown in detail in FIG. 7A. It produces an output of second formant peak amplitude on conductor 358 and an indication on conductor 356 when the second formant's centrum was above 1,050 Hz., both to be described below. That indication on conductor 356 is supplied to the vowel transducers 348 where it is needed for substantive discrimination between certain vowel phonemes as shown in Table 1. A comparator-ratiometer 346 receives the second formant's peak amplitude reading through connector 358 and also an input of the total signal strength component on conductor 58. This ratio output from ratiometer 346 is then supplied to a comparator 360 for comparing the ratios of peak amplitude of the first and second formants as quotients of the total signal strength. Preset ratios according to Table 1, when met, will activate the vowel transducers 348 in combination with the inputs of information concerning first formant bandwidth from conductors 321,322,323,324,325,326,327 and from the second formant scanner 328. The resulting transduced output signals 311 for presence of the various vowels pass to the diphthong transducer in FIG. 8 on conductors 371-378, and to the transcriber module 30 of FIG. 9 over conductors 379,380, the latter for sounds which do not appear in diphthongs.
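The two formant quotients compared in comparator 360 can be worked through numerically. The sketch below is illustrative only: the appendix chart appears to tabulate its distinguishing ratios in decibels, and treating each quotient as 20 log10 of an amplitude ratio is an assumption, as are the function and variable names.

```python
import math


def formant_quotients_db(total, first_peak, second_peak):
    """Return (first quotient, second quotient, second minus first), in dB.

    Each quotient is a formant peak amplitude divided by total oral
    signal strength; amplitudes must be positive.
    """
    q1 = 20 * math.log10(first_peak / total)    # ratiometer output on 344
    q2 = 20 * math.log10(second_peak / total)   # ratiometer 346 output
    return q1, q2, q2 - q1
```

For example, a first formant peak at half the total strength and a second formant peak at one tenth of it give quotients of roughly -6 dB and -20 dB.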
Whenever a vowel is detected by the vowel transducers 348, a signal is emitted over conductor 339 and passed to the undifferentiated vowel transducer 340 so that it will not put out a signal on conductor 341 for an undifferentiated vowel. However, whenever there is a vowel input through the vowel gate over conductor 50a without identification of a specific vowel, the transducer 340 will supply its output direct to the transcriber module 20 of FIG. 9 over conductor 341.
FIG. 7A is a detail of the second formant scanner 328 whose purpose is to detect and measure the peak amplitude of the second formant regardless of the frequency of its peak and to determine whether or not that peak lies in the range of 1,050 to 2,900 Hz. The method used is to heterodyne the incoming signal so that peak-measuring voltmeters can be used to read the peak. Input voltages in the range from 720 to 2,900 Hz., emanating from the two filters 332,350 over conductors 352,354, are merged and fed through connector 390 to a mixer 392. The mixer 392 is driven by a sweep oscillator 394 that linearly sweeps through the range from 10 kc. to 11.850 kc. The resulting signal on conductor 396 is passed through two filters of 20-Hz. bandwidth each, one filter 398 centered on 8,950 Hz., and the other filter 399 on 9,280 Hz. The filtered signal outputs of 398,399 then are separately supplied to peak-measuring voltmeters 400,402. These voltmeters are activated in synchronization with the sweep oscillator 394 by means of an ON-OFF signal applied over conductor 404.
In order to obtain the higher of the two peak amplitudes thus detected in peak voltmeters 400,402 as an indication of the peak amplitude of the second formant, the respective signals are supplied to a third voltmeter 406, the output of which supplies that information on conductor 358. That information from peak voltmeter 406 is also supplied to a comparator 408, which also receives the output from voltmeter 400. Comparator 408 determines, from the reading on connector 358, whether the second formant peak was the peak detected through the filter 398 centered at 8,950 Hz.; if so, the peak was in the range 1,050-2,900 Hz., and the comparator passes that information as its output 356.
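The heterodyne arithmetic can be checked directly. Assuming that each fixed filter selects the difference product (oscillator frequency minus input frequency), an input component is seen by a filter when the sweep passes through the input frequency plus the filter center. The names below are hypothetical.

```python
SWEEP_HZ = (10_000, 11_850)  # sweep oscillator 394, low and high ends


def input_range_seen(filter_center_hz, sweep_hz=SWEEP_HZ):
    """Input band (Hz) that a fixed filter observes over one full sweep.

    Assumes the difference product f_osc - f_in is the one selected.
    """
    low, high = sweep_hz
    return (low - filter_center_hz, high - filter_center_hz)
```

Under that assumption the filter centered on 8,950 Hz. scans inputs from 1,050 to 2,900 Hz., which agrees with the range stated for the second formant, and the filter centered on 9,280 Hz. scans from 720 to 2,570 Hz.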
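The selection performed by voltmeter 406 and comparator 408 amounts to a maximum and an equality test. A minimal sketch, assuming v400 is the reading of the voltmeter fed through the 8,950-Hz. filter, v402 the reading fed through the 9,280-Hz. filter, and that a tie counts as a peak seen by the 8,950-Hz. path (the names and the tie rule are assumptions):

```python
def second_formant_peak(v400, v402):
    """Return (peak amplitude on 358, above-1,050-Hz. flag on 356)."""
    peak = max(v400, v402)                    # third voltmeter 406
    # Comparator 408: was the overall peak the one read by voltmeter 400?
    above_1050 = v400 > 0 and v400 >= v402
    return peak, above_1050
```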
A diphthong transducer 420 of FIG. 1A, shown in detail in FIG. 8, processes single vowel outputs from the vowel detector 348 of FIG. 7 to identify diphthongs when present, and it produces distinctive electrical signals on conductors 421,422,423,424,425,426 for six such diphthongs. Since it also relays the eight single-vowel signals 371-378 from the vowel detector 348, it passes all vowel and diphthong output signals detected by the transducer module 420 to the transcriber module 20.
The single-vowel signals 371-378 fed to the diphthong transducer 420 represent the basic simple vowel phonemes of American English as continuous signals generally timed to last 0.2 seconds each except for /U/ and /u/ which are transmitted in single, somewhat shorter, pulses. The continuous signals allow retention of the single-vowel signals during the time required to determine whether or not they are used in a diphthong before releasing them. Since /U/ and /u/ occur only terminally, such delay is not required for them. Other inputs to the diphthong transducer 420 are a signal from the vowel detector 310 in FIG. 7, through conductor 344 from peak comparator 342 that gives the ratio of the first formant signal strength to the total oral signal strength. That input (344) is supplied to a memory unit 424 of 0.15 seconds which is activated by the signal on conductor 426 from an OR circuit responsive to a signal on any of 371-376 indicating the presence of any vowel input except /U/ and /u/. The other input to the diphthong transducer 420 is the rate of change of oral signal on conductor 82 from FIG. 2. It is supplied each to the memory unit 424 and to a comparator 430 of rates of change.
The presence or absence of a diphthong is determined in the transducer 420 according to whether there is a steady rate of change of signal strength when two vowels occur consecutively. The memory 424 of 0.15 seconds determines the timing parameter for this comparison in comparator 430; and if there is only a steady rate of change, which characterizes a diphthong glide, the comparator 430 activates a ganged switch 434 which engages the diphthong circuits by means of single-pole-double-throw switches 441,442,443,444,445,446,447,448. An unsteady rate of change from comparator 430 indicates that no glide is present and the vowels 311 are intended to be separate. Consequently, unless the comparator 430 is satisfied with a steady-state ratio, the switches 441-448 are normally closed (upwardly, as seen in FIG. 8) in a state where the single, simple vowel signals will be transmitted following the 0.1-second memory delay. Those output signals will pass to the transcriber module through connectors 451,452,453,454,455,456,457,458.
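The steadiness test of comparator 430 can be sketched as follows. The description does not quantify "steady", so the tolerance, the sampled representation of the rate-of-change signal on conductor 82, and the function name are all assumptions made for illustration.

```python
def is_glide(rates, tolerance=0.1):
    """Decide whether rate-of-change samples indicate a diphthong glide.

    rates: samples of the rate of change of oral signal taken across the
    0.15-second window held by memory 424. The rate is called steady when
    its spread over the window stays within the (assumed) tolerance.
    """
    if len(rates) < 2:
        return False  # nothing to compare against
    return max(rates) - min(rates) <= tolerance
```

A True result corresponds to actuation of ganged switch 434; a False result leaves switches 441-448 in their normal position so that the single vowels pass through.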
When the switch 434 is actuated (closed down, as seen in FIG. 8), the following occurs. The /eh/ sound signal in conductor 373 is applied to transducer 461 to combine with sounds /I/ or /i/ from switch 446, so that a signal "A" is produced in conductor 421. /I/ and /i/ merge in conductor 470.
The /I/ or /i/ sounds from switch 446 also combine in transducer 462 with the signal /ɔ/ from conductor 376 passing switch 442, to produce /oi/ in conductor 422.
The /a/ sounds from switch 443, or /æ/ in conductor 474 from switch 444, combine in transducer 463 with the signal /I/ or /i/ in conductor 470 to produce "I" in conductor 423.
The /æ/ sound from conductor 374 passing switch 444 merges with the /a/ signal through connector 474 as a dialectal substitution for /a/. /a/ or /æ/ is applied to transducer 464 with the output from a rate detector 475 to produce /au/ in conductor 424. The rate detector 475 is responsive to the comparator 430 but detects that the ratio change of the comparator increases in a period of 0.2 seconds.
The /I/ or /i/ sound signal in conductor 470 combines in transducer 465 with the merged /U/ or /u/ signals in conductor 476 to produce an output for "U" in conductor 425.
The /ɔ/ sound from switch 442 combines in transducer 466 with the output from the rate detector 475 to produce /o/ on conductor 426.
The comparator 475 senses any increase within 0.2-second periods in the ratio of the first formant peak amplitude to the total signal amplitude in the interval between the beginning and end of a diphthong. This is the interval involved in the action of the memory unit 424. This comparator receives the ratio signal through conductor 344 from the vowel detector of FIG. 7. Its timing is activated by an output from the comparator 430. When there is an increase of the first formant strength without a rapid rate of change of oral signal, the comparator 475 passes an output 477 to gates 464 and 466 to complete formation of diphthongs /au/ and /o/ respectively.
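The six pairings of FIG. 8 can be collected into one lookup. This is a minimal sketch, assuming ASCII stand-ins for the phoneme symbols (several are garbled in this text, so the vowel arriving on conductor 376 is keyed simply as "v376"), a hypothetical function name, and an arbitrary precedence when more than one rule could apply. The return value is the number of the output conductor that would carry the diphthong signal.

```python
GLIDE_I = {"I", "i"}   # merged on conductor 470
GLIDE_U = {"U", "u"}   # merged on conductor 476
LOW_A = {"a", "ae"}    # /a/ and its dialectal substitute


def diphthong_conductor(first, second=None, f1_rising=False):
    """Return the diphthong output conductor (421-426), or None.

    f1_rising stands for the output of rate detector 475.
    """
    if first == "eh" and second in GLIDE_I:
        return 421
    if first == "v376" and second in GLIDE_I:
        return 422
    if first in LOW_A and second in GLIDE_I:
        return 423
    if first in LOW_A and f1_rising:
        return 424
    if first in GLIDE_I and second in GLIDE_U:
        return 425
    if first == "v376" and f1_rising:
        return 426
    return None
```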
The six diphthong output signals 421-426 supplement the eight single-vowel signals 451-458 in providing the complete vowel and diphthong indications from the transducer module 420 to the transcriber module in FIG. 9.
The transcriber module 480 in FIG. 1, shown in detail in FIG. 9, receives (in general) electrical signals representing the detected sounds (39 in number) plus a signal representing the occurrence of silence, and processes these, converting them to 89 output signals to drive the typing or printing mechanism 26 of FIG. 1. FIG. 9 shows this transcriber module 480. It has six input channels: one for the silence signal on conductor 98 from the stops-or-silence detector 74, FIG. 3; one for the stress indication via conductor 92 from the threshold setting 90 of FIG. 2; and four channels representing the categories of speech sounds, i.e., (1) fricatives from switches 240-246 which come from transducer outputs 220-228, 261 representing the fricative sounds both voiced and unvoiced in FIG. 5, merged in the fricative input channel since none of these signals will occur simultaneously; (2) stops from conductors 164,166,168,190,196,200 which come from outputs of transducers 156,160,162,169,l ,186,194 representing the various stops, voiced and unvoiced and undifferentiated in FIGS. 4 and 4A, also merged; (3) vowels from conductors 371-380, 341, 421-426 which come from the outputs of transducers 340,348 representing single vowels and diphthongs and undifferentiated vowels, in FIGS. 7 and 8, also merged; and (4) nasals from conductors 284,286,290,300 which come from the outputs of transducers 282,286,290,300 representing the nasal sounds and /l/ and undifferentiated nasals in FIG. 6, likewise merged.
Input pulse dividers or time choppers 491,492,493,494 are placed in each channel and divide the pulse or signal input into a series of approximately equal periods, respectively, for each of the channels. The time choppers for fricatives and vowels are set to divide the applied signal into a series of 0.2-second pulses for as long as the signal is applied and, correspondingly, the time choppers for stops and nasals are set to divide the applied signal into a series of 0.1-second pulses for as long as the applied signal is present.
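The effect of the time choppers can be sketched as a conversion from signal duration to a count of pulses. The rounding rule (nearest whole pulse, with at least one pulse for any applied signal) is an assumption, since the description says only that the periods are approximately equal; the names are hypothetical.

```python
# Chopper period in seconds for each input channel.
PERIOD_S = {"fricative": 0.2, "vowel": 0.2, "stop": 0.1, "nasal": 0.1}


def chop(channel, duration_s):
    """Return the number of pulses emitted for a signal of given duration."""
    period = PERIOD_S[channel]
    # Assumed rule: round to the nearest pulse, never fewer than one.
    return max(1, round(duration_s / period))
```

A vowel held for 0.6 seconds thus becomes three pulses, so a sustained sound can be presented to the sequence sensor as repeated phoneme units.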
The transcriber module 480 is essentially a multiphase digital computer having a direct set program. It has two outputs: a spacing signal to indicate times between words on conductor 482, and a channel 484 for 88 distinctive signals for printed characters.
The input signals are applied first to a phoneme sequence sensor and designator 490 to which the silence signal 98 also is applied since its presence will signal grouping of inputs to constitute syllables. The phoneme sequence sensor and designator 490 contains a storage of about 337 syllabic combinations of fricatives from the four classes of sounds represented by its four channels. Basic to English, these syllabic combinations are shown in Table 1 (see Appendix). When a possible syllable has been identified, its speech-element signals are passed as inputs over conduit 492 to the regrouper and storage means 494 of 5 preselected actual syllables for that particular combination of classes of sounds among the 337 syllabic combinations. Consequently, sequence indication on conduit 496 accompanies passing of the speech-element signals to facilitate the search in storage means 494. The storage of actual syllables is combined with a unit for regrouping sequences if there is no match. The maximum capacity of this unit is 10 phonemes or not more than 1 second, whichever is first. If, in a particular sequence there is no match with a stored syllable, the sequence indication is altered accordingly as the last phoneme of the nonviable sequence is dropped from it and the shorter sequence then tried. If it then fits, that last phoneme becomes the first one in a new sequence, then the last two, then the last three, etc. This shortening and consequent regrouping of phonemes for the syllable involves identification of a different sequence. Therefore, that information is passed back to the phoneme sequence sensor and designator 490 through conductor 498 so that it can start identification of a new sequence with the terminal elements rejected from the previous sequence. 
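The drop-the-last-phoneme regrouping described above resembles a greedy longest-match segmentation, which can be sketched as follows. This is a simplification under stated assumptions: it works on class letters only (V, S, F, N), it uses a small hypothetical pattern set rather than the full stored table, it ignores the specific regrouping rules of the appendix tables, and it emits an unmatched single phoneme as a fragment.

```python
def segment(classes, patterns, max_len=10):
    """Split a string of phoneme-class letters into stored patterns.

    classes: e.g. "SVNFV"; patterns: set of viable sequences.
    max_len mirrors the stated capacity of 10 phonemes.
    """
    out = []
    i = 0
    while i < len(classes):
        # Try the longest candidate first, then drop the last phoneme.
        for n in range(min(max_len, len(classes) - i), 0, -1):
            candidate = classes[i:i + n]
            if candidate in patterns:
                out.append(candidate)
                i += n          # rejected tail starts the next sequence
                break
        else:
            out.append(classes[i])  # no match at all: pass on a fragment
            i += 1
    return out
```

For instance, with the patterns V, SV, FV, VN and SVN, the sequence SVNFV is regrouped as SVN followed by FV.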
Regrouping of phonemes starts either with that signal on conductor 498, with a silence input applied on conductor 98, or upon receipt of a signal from conductor 482 fed to designator 490 that shows completion of a syllable or word ready for printout. The pattern used for regrouping is shown in Tables 2(a) and 2(b) (see Appendix). If the actual syllable identified is not one that is used in stored words of the vocabulary of 12,000 or so words chosen, or if it consists of only two or three elements (i.e., appears to be a fragment or remnant), it is passed through conductor 500 to a storage unit of two and three phoneme units 501 where it is either matched with a stored short word or is converted to printout signals in phonetic-phonemic form over conductor 532.
Upon identification of a syllabic sequence unit in designator 490 that could be part of a larger word in the vocabulary storage 516 of multisyllable words, the designation for that particular syllable is signaled to the storage unit 494 of vocabulary words through conductor 492 and, simultaneously, the phoneme sequence units together are passed to a syllable retainer 510 through conductor 512. The syllable retainer 510 has a storage capacity of up to nine syllables which it holds either (1) until their release as parts of a word that matches one in the vocabulary storage 516 through conductor 540, or (2) until the retainer 510 is saturated, or (3) until there is a spacing input to indicate beginning of a new verbal unit, as indicated on conductor 482. These latter two releases through conductors 500 and 520 pass to the storage of two and three phoneme units for short printouts. In the storage of two and three phoneme units 501, sequences that do not match about 1,600 stored short words that will be printed conventionally over conduit 532 will be printed in phonetic-phonemic manner, with stress shown by uppercase or bold printing in response to a stress signal on conductor 92 coming from FIG. 2.
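The three release conditions of the syllable retainer 510 can be modeled with a small class. This is a sketch only: the class and method names are hypothetical, the vocabulary is a plain mapping from syllable tuples to spelled words, and releasing on the first match is a simplification that would prevent a longer word sharing the same opening syllables from ever being matched.

```python
class SyllableRetainer:
    """Hold up to nine syllables until one of three releases occurs."""

    CAPACITY = 9

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary  # tuple of syllables -> spelled word
        self.held = []

    def push(self, syllable):
        self.held.append(syllable)
        word = self.vocabulary.get(tuple(self.held))
        if word is not None:                 # release (1): vocabulary match
            self.held = []
            return ("word", word)
        if len(self.held) >= self.CAPACITY:  # release (2): saturation
            return ("phonetic", self._flush())
        return None

    def space(self):
        # Release (3): spacing input marks the start of a new verbal unit.
        return ("phonetic", self._flush()) if self.held else None

    def _flush(self):
        released, self.held = self.held, []
        return released
```

The "phonetic" releases stand for the sequences that pass on to the storage of two and three phoneme units for short or phonetic printout.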
An additional input to the storage of two and three phoneme units 501 is provision for supplying a period or dot whenever the silence input reaches 1 second cumulatively. This is accomplished by a timer 522 of one second operating on the silence signal 98 which will automatically signal what appear to be sentence endings or long pauses on line 524 to the storage of two and three phoneme units 501. This same storage 501 will convert inputs designating letters of the alphabet orally into the corresponding capital letters for each.
The storage of word vocabulary 516 contains about 10,600 words that are arranged according to syllable designations within the 337 of this system, stored in the proper sequence. For each such stored word, there is a corresponding coded printout signal which is activated through the output 530 to the printing unit 26, whenever there is an input of syllables that matches a stored word. In the case of punctuation, the coding translates from the verbal name to the punctuational printout designation. Upon release of a printout signal for any word, the storage of word vocabulary unit 516 emits a spacing signal through conductor 482 which goes both to the syllable retainer 510 and to the phoneme sequence sensor and designator 490 as a signal for start of a new verbal unit. This spacing signal, of course, likewise goes to the typographic or printing unit 26 at the end of each word. Since printout signals will not emanate simultaneously from both the word vocabulary storage 516 and the storage of two and three phoneme units 501, their outputs 530 and 532 pass through the same connector 484 to the printing unit 26.
Table 3 (see Appendix) shows 88 printout characters and phonemic printout symbols (with phonemic equivalents in parentheses) for use in the printing unit of FIG. 1. The 88 symbols are those which will be activated by 88 distinctive signals through connector 484. The 89th signal is for spacing which is applied over conductor 482.
Additional embodiments of the invention in this specification will occur to others and therefore it is intended that the true spirit of the invention be limited only by the appended claims and not by the embodiment described hereinabove. Accordingly, reference should be made to the following claims in determining the true spirit of the invention.
APPENDIX. TABLE 1 [377 phoneme sequence patterns in English, in four categories of phonemes] Code: V = vowel including /r/ and /l/; terminal diphthongs as single vowel units.
S = stop (plosive), voiced or unvoiced. F = fricative, voiced or unvoiced.
V SVVV SVVVN FVVVS FSVN SFVN F FNVF VV FVVV SVVVNS FVVVSF FSVNS SFVNSSF FNVFS VVV NVVV SVVVNF FVVVSFS FSVNSF SFVNSFF FNVFF VVVV FSVVV FVVVSFF FSVNSS SFVNSFS FNVFFS SN SVVVNFS SVNF SFVNFS FNVFFF VVVN SVVVF NVN FSVNSSF SFVF FNVFSS SV VVVNS SVVVFS NVNS FSVNSFF SFVFS FNVFSF FV VVVNF SVVVFF NVNSF FSVNSFS SFVFF FNVS NV VV VNFS SVVVS NVNSS FSVNFS SFVFFS FNVSS FSV VVVF SVV VSF NVNF FSVF SFVFFF FNVSF SFV VVVFS SVVVSFS NVNSSF FSVFS SFVFSS FNVSFS FNV VVVFF SVVVSFF NVNSFF FSVFF SFVFSF FNVSFF FN VV VS ..NVNSFS FSVFFS SFVS FNVSSF V V VSF FVN NVNFS FSVFFF SFVSS FNVSFSF VN VVVSFS FVNS NVF FSVFSS SFVSF VNS VVVSFF FVNSF NVFS FSVFSF SFVSFS FNVVN VNSF FVNSS NVFF FSVS SFVSFF FNVVNS VNSS VVVVF FVNF NVFFS FSVSS SFVSSF FNVVNI" VNF VVVVS FVNSSF NVFFF FSVSF SFVSFSF VNSSF FVNSFF NVFSS FSVSFS VNSFS SVN FVNSFS NVFSF .SFVVN SVNF FVFF NVSFS SF\'\'F FNVVS FS VFF SVNSSF FVFFS NVSFF FSVVN SFVVFS FNVVSF F VFFb SVNFF FVFFF NVSSF FSVVNS SFVVFF VFFF BVNSFS FVFSS N'VSFSF FSV'VNF SFVVS FNVYY N VlBb SVNFfa FVFSF .FSVVN F SFYYSF 14 1\1\'\ N s VP'BY SVF FVS NVVN FSVYF SFVVSFS FN\'\'\'N F VS SVFS FVSS NVVNS FSVVFS SFVVS l" FNY N Fh VSS SVFF FVSF N'VVNF FSVYFF s FNY YF "SF SVFFS FVSFS NVVNFS VSFS SVFFF FVSFF NVVF VSFF SVFSS FVSSF NVVFS VSSF SVFSF FVSFS NVVFF VSFSF SVS FVVN NVVS SVSS FVVNS NVVSF SVV SVSF FVVNF NVVSFS FVV SVSFS FVVNFS NVVSFF NVV SVSFF FVVF FSV V VNFS SFVVVSF FSVV SVSSF FVVFS NVVVN FSVVVF SFVVVSFS SVSFSF FVVFF NV V VNS FSVVVFS SFVVVSFF VVN SVVN FVVS NVVVNF FSVVVFF VVNS SVVNS FVVSF NVVVNFS FSVVVS FNVN VVNF SVVNF FVVSFS VVVF FSVVVSF FNVNS VVNFS SVVNFS FVVSFF NVVVFS FSV V VSFS FNVNSF VVF SVVF FVV VN .FSV V VSFF FNVNSS VVFS SVVFS FVVVNS NVVVFF MFNVNF VVFF SVVFF F.V V VNF NV V V S SFVN FNVNSSF VVS SVVS FV V VNFS NVVVSF SFVNS FNVNSFF VVSF SVVSF FVV VF NV V VSFS SFVNSF FNVNSFS VVSFS SVVSFS FVVVFS NV VVSFF SFVNSS FNVNFS VVSFF SVVSFF FV V VFF APPENDIX.-TABLE 2(a) I APPENDIX Table 3 [Sequences of phonemes in syllable formations] Key: fi i E or a a b c t l e i g h a ir l 'm n o p q r s t u v w x y z 2 3 v 
l I I I I I I I I 7 I I I I 7 I Y 1 I Y J I I Y I I Nnas a l. (W 220.127.116.11, 9.4 ,1), .F, H. 1.1. K, L, M, 0.1. Q, V-vowal, r I 2 N b d a, 6, ii, 11, in k, 6 Bold face (only for phoneme-syllable um er 0O 8 (example only) print-out). E, I, 0 U, a, e, l, o, a, 6, o, u plus a spac ng an Initial sequences (may Phonemic moms.
stand complete): E g 1 s (I)-flnal part of St V i (1) r z dlphthongs:
47 6 5:) m 1 :11, A, I. g (3 n 11 U) final art I -ng s p 0 2g 35 o (0;, (6) p 211 (5) diphtho%s: 51 1 so, 0, 52 t tl1 53 1! MP 0 54 11 (u) b ch (1; 61 W- d 1' 40 y- (1-) 8 complete) 62 a h Terminal sequences: 1
-N 21 Norm-m. undlflerentiated nasal: m,n, or -ng; p. undifferentiated p, 22 l; ork k- N Fr 24 ..J ba-'. 1 .h.1 d -v --N St St--. 23
t APPENDlX.-Tuble4 Distinguishing 29 peak ratios Frequency range 11 Basis for confirmation $35-- Vowel 1st F 2nd F 1st F 2nd F and supplemental distinction Fr St 16 -5 -15 440-540 xxx Fr 17 -1 -s (680-940) xxx (Peak amplitude ratios only). gi gt; 5 -1 -10 (580-800) XXX St 31 i g gg }Flrst formant ireq. range. "S g: 2 17 400-100 1, 700-2, 400 First lormant freq. ra ge g 35 -3 --23 340-400 1, 800-2, 000 (with second formant tre- F 37 4 -24 260-340 2,1s0-2,900 quency more than 1,050). F r 33 260400 7504 050) First formant lreq. range t t r (with second formant ireq. St Fr St Fr 36 0 -7 520-640 (720-1, 030) less than 1 050) 1 Must be attached to one of the above.
' KFPENDIXETABLE 2 (b) [Re-grouping oi phoneme sequence]s from no-go syllables into new syllables IMMEDIATE shift [or new terminal 5 grouping: New initial sequence grouping: 1
28dc22t021- 28 becomes 44-46 or #48; 22 becomes 41-42 or #49. 236:24 t022. 24 becomes 44-46 or #48; 23 becomes 41-42 or #49. 25 to 23-. 25 becomes 44-46 or #48. Y 264227 to 24. 26 becomes 44-46 or #48; 27 becomes 41-42 or #49. 29 to 28. 29 becomes 41-42 or #49. 138: 15 toll. 15 becomes 44-46 or #48; 13 becomes 41-42 or #49. 1041 171:015. 11 becomes 44-40 or #43; 10 becomes 41-42 or #49. 186: 14 to 13. 14 becomes 44-46 or #48; 18 becomes 41-42 or #49. 326: 34 to31. 34 becomes 44-46 or #48; 32 becomes 41-42 or #49. 33 to 32... 33 becomes 44-46 or #48. 356237 to34. 37 becomes 44-46 or #48; 35 becomes 41-42 or #49. 36 to 35- 36 becomes 44-46 or #48. 46 to 48... 46 becomes 51.
4 e shewe ea elmeeeehswaJnTQle;(an.
In neg. db referred to 1st formant centrum for Wh le ance s;
1. Speech-to-writer apparatus comprising:
a detection and analysis transducer module receiving an oral and a throat signal input and having sound separation means for detecting, differentiating, processing and producing speech sound signals according to at least the following stated categories: (1) vowels and semivowels, (2) nasals, (3) unvoiced fricatives, (4) voiced fricatives, (5) unvoiced stops, and (6) voiced stops, said sound separation means (FIG. 2) having sensors responsive to the oral signal and throat signal inputs,
a stops-or-silence means responsive to said sensors, ratiometer circuits fed by said sensors and said stops-or-silence means to produce an output signal, and
a plurality of logic gate means selectively responsive to outputs from said sensors, said stops-or-silence means, said ratiometer circuits, for singularly passing processed speech.
2. The invention according to claim 1 wherein said sound signals produced by said transducer module are fed separately to a transcriber module having a syllable storage and a word-vocabulary storage.
3. The invention according to claim 1 wherein a printout signal means for said transcriber module is connected to a typographic unit producing a written output of either conventionally spelled words or syllabic utterances.
4. The invention according to claim 1 wherein said sound separation means (FIG. 2) has:
an oral sensor gate receiving an input of said oral signal and producing digital outputs indicative of oral ON and oral OFF,
a throat sensor gate and low-pass filter receiving an input of throat signal and producing digital outputs indicative of throat ON and throat OFF signal,
differentiator means responsive to said oral sensor,
throat differentiator responsive to said throat sensor,
said stops-or-silence unit receiving inputs of said oral signal, said oral ON and said oral OFF signals, said throat ON and OFF signals, and said oral and throat differentiator outputs,
said ratiometer circuits receiving inputs from said oral and throat sensor gates, said oral and throat differentiator outputs and producing ratioed output signals, and
each said logic gate means receiving said oral signal, including an unvoiced stop gate receiving outputs from said stop-or-silence unit,
a voiced stop gate receiving outputs from said unvoiced stop gate and said stop-or-silence unit,
an unvoiced fricative gate receiving said oral ON signals and said throat OFF signal,
a voiced fricative gate receiving said throat ON signal and one of said ratioed output signals, and
a vowel gate receiving outputs from said oral gate and one of said ratioed output signals.
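The gate arrangement recited in claim 4 can be illustrated with a minimal sketch, offered only as an aid to reading and not as the circuit of FIG. 2. The names `GateInputs`, `stop_flag`, `vowel_ratio` and `classify` are hypothetical, and it is an assumption of the sketch that a single ratioed signal separates vowels from voiced fricatives; the claim states only that each of those gates receives one of the ratioed output signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateInputs:
    oral_on: bool               # oral sensor gate output (oral ON / oral OFF)
    throat_on: bool             # throat sensor gate output (throat ON / throat OFF)
    stop_flag: bool = False     # hypothetical: stops-or-silence unit reports a stop
    vowel_ratio: bool = False   # hypothetical: ratioed signal favors a vowel


def classify(inputs: GateInputs) -> str:
    """Return the one category whose gate would pass the signal."""
    if inputs.stop_flag:
        # Stop gates: voicing selects the voiced or the unvoiced stop gate.
        return "voiced stop" if inputs.throat_on else "unvoiced stop"
    if inputs.oral_on and not inputs.throat_on:
        # Unvoiced fricative gate: oral ON together with throat OFF.
        return "unvoiced fricative"
    if inputs.oral_on and inputs.throat_on:
        # Assumed: the ratioed signal decides between vowel and voiced fricative.
        return "vowel" if inputs.vowel_ratio else "voiced fricative"
    return "unclassified"


print(classify(GateInputs(oral_on=True, throat_on=False)))
```

Only one branch can fire for a given set of inputs, which mirrors the requirement of claim 1 that the gates pass processed speech singularly.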
5. The invention according to claim 4 wherein (FIG. 4) said unvoiced stop gate provides an output to a stop unit comprised of a terminal for receiving signals indicative of an unvoiced stop that is undifferentiable as to whether it is /p/, /t/ or /k/,
first, second and third band-pass filters for passing 1,800-2,200, 3,800-4,600 and 3,400-3,800 cycles, respectively, and receiving the outputs of said unvoiced stop gate and said stops-or-silence unit producing filter output signals, and
a first, second and third comparator, said first comparator receiving output signals from said first and second filters, said second comparator receiving output signals from said second and third filters, and said third comparator receiving output signals from said first and third filters.
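The filter and comparator wiring of claim 5 may be sketched as follows, under stated assumptions. The pass bands are those recited in the claim; the representation of the signal as (frequency, magnitude) pairs, the use of summed squared magnitude as the filter output, the half-open band edges, and the treatment of ties are all assumptions of this sketch. The claim does not state how the three comparator outputs map onto /p/, /t/ and /k/, so no such mapping is attempted.

```python
# Pass bands recited in claim 5, in cycles per second.
BANDS = {"first": (1800, 2200), "second": (3800, 4600), "third": (3400, 3800)}


def band_energy(spectrum, band):
    """Sum squared magnitudes of (frequency, magnitude) pairs inside a band.

    The band is treated as half-open [low, high) so that 3,800 cycles is
    counted in the second band only (an assumption of this sketch).
    """
    low, high = band
    return sum(mag * mag for freq, mag in spectrum if low <= freq < high)


def comparator_outputs(spectrum):
    """Pairwise comparisons wired as in claim 5.

    True means the first-named filter output is at least as large as the
    other; resolving ties this way is an assumption.
    """
    e = {name: band_energy(spectrum, band) for name, band in BANDS.items()}
    return {
        "first_vs_second": e["first"] >= e["second"],
        "second_vs_third": e["second"] >= e["third"],
        "first_vs_third": e["first"] >= e["third"],
    }


# Hypothetical burst spectrum with most of its energy near 4,200 cycles.
print(comparator_outputs([(2000, 0.1), (4200, 1.0), (3500, 0.5)]))
```

Three pairwise comparisons are enough to rank the three filter outputs, which is presumably why the claim recites one comparator per pair of filters.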
6. The invention according to claim 1 wherein (FIG. 3) said stops-or-silence means for voiced stops, unvoiced stops, and for silence, comprises a comparator 146 receiving outputs from said oral and throat differentiators and producing output when the amplitude of said throat differentiator output is equal to or greater than the amplitude of said oral differentiator output,
a switch means 126 receiving said produced output, said throat ON signal, and an output from a second circuit indicating the presence of a stop, said second circuit having a timer 140 receiving inputs of said oral OFF and throat OFF signals and producing a delay gate output,
a switch 141 receiving inputs of said oral differentiator and said delay gate output producing complementary outputs identified as silence signal without dt and stop with dt,
a timer gate (0.15 seconds max.) 134 receiving inputs of said oral ON and OFF and oral differentiator signals, to pass a predetermined delayed signal representing undifferentiated unvoiced stops,
a silence timer gate receiving inputs of said silence signal without dt and said oral ON signal producing a silence signal,
an unvoiced stop gate 142 receiving inputs from said throat OFF signal and said stop with dt signal,
a timer (0.01) delay gate 132 receiving inputs of said oral signal from said oral differentiator and from said second circuit means producing mutually exclusive outputs of greater or less than 0.01-second delay duration,
a timer (0.06) 136 receiving inputs of said less than 0.01-second delay duration and said oral signal, and also from the said oral differentiator, producing an output direct to the unvoiced stops transducers,
a delay switch (0.03) 144 receiving inputs of said more than 0.01-second delay duration, and said oral differentiator, and producing outputs direct to both voiced and unvoiced stops transducers.
7. The invention of claim 1 wherein said detection and analysis transducer module comprises a nasal unit (FIG. 6) responsive to oral and throat inputs, a plurality of filter comparators, one of said comparators deriving an /l/ signal and one deriving an undifferentiated nasal sound as a result of said added throat input.
8. The invention of claim 1 wherein said detection and analysis transducer module comprises a vowel detection unit (FIG. 7) responsive to oral input to provide detection and differentiation of single vowels from each other by ratio comparison of first and second formant peaks to total signal strength deriving vowel signals therefrom.
9. The invention of claim 8 wherein are means for deriving (FIG. 7A) second formant peak amplitude signals without reference to its frequency which is accomplished by heterodyning.
10. The invention of claim 9 wherein said transducer module operates substantially as well with oral input signals from either a male or female voice.
11. The invention of claim 8 wherein said vowel detection unit includes a transducer 340 receiving an oral input signal and a signal indicative of the presence of an identified vowel, said transducer producing an output signal indicative of an undifferentiated vowel.
12. The invention of claim 11 wherein said transducer module comprises a diphthong transducer (FIG. 8) that distinguishes diphthong sound signals from single vowel sound signals (371-376) by a differentiated oral signal applied to a memory, and a first formant quotient signal and from a signal from the ratio comparison of first formant peaks to total signal strengths (claim 9), said memory acting upon a comparator to condition diphthong gates (461-466) to pass a diphthong signal, said memory also acting on a switch (434) to shunt nondiphthong sound signals to a transcriber module.
13. The invention of claim 2 wherein said transcriber module (FIG. 9) comprises a phoneme sequence sensor and designator (490) for receiving signals indicative of fricatives, stops, vowels, diphthongs, nasals, and a silence signal;
a regrouper and storage means 494 of preselected phoneme groupings responsive to the output of the designator,
a syllable retainer 510 having a storage capacity of syllables to be actuated by the regrouper means output, and
a word vocabulary 516 cumulatively responsive to the consecutive output of the syllable retainer for producing print out signals of the longest word forms provided by spoken input.
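The word vocabulary of claim 13, which produces the longest word forms from consecutive syllables, can be illustrated by a greedy longest-match sketch. This is an illustration under assumptions and not the stored-vocabulary hardware of FIG. 9: the function name `transcribe`, the `max_syllables` limit, the dictionary keyed by syllable tuples and the sample entries are all hypothetical. Emitting an unmatched syllable as-is follows claim 3, which allows a written output of either conventionally spelled words or syllabic utterances.

```python
def transcribe(syllables, vocabulary, max_syllables=4):
    """Greedy longest match of consecutive syllables against a vocabulary.

    vocabulary maps tuples of syllables to spelled words (hypothetical
    layout). Syllables that begin no stored word are emitted unchanged.
    """
    output = []
    i = 0
    while i < len(syllables):
        longest = min(max_syllables, len(syllables) - i)
        # Try the longest candidate first, then shorter ones.
        for n in range(longest, 0, -1):
            word = vocabulary.get(tuple(syllables[i:i + n]))
            if word is not None:
                output.append(word)
                i += n
                break
        else:
            # No stored word starts here: print the syllabic utterance itself.
            output.append(syllables[i])
            i += 1
    return output


# Hypothetical two-entry vocabulary for illustration only.
vocab = {("type",): "type", ("type", "ri", "ter"): "typewriter"}
print(transcribe(["type", "ri", "ter", "zz"], vocab))  # ['typewriter', 'zz']
```

Trying the longest candidate first is what makes "typewriter" win over the shorter stored word "type" in the example.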
14. The invention of claim 13 wherein said transcriber module in response to receiving said silence signal produces a spacing indication in the printout signal, indicating spacing between words and punctuation, said printout signal providing either (a) dot signals for a typographic unit, or (b) punctuation signals responsive to specific voiced inputs, e.g., "comma," "semicolon."
15. The invention of claim 14 wherein the transcriber module (FIG. 9) comprises storage of phoneme sequences responsive to (a) an indication of stress in the oral input signal, (b) output from the regrouper means, and (c) the out-