US 2891111 A
[Drawings, 3 sheets: J. L. Flanagan, Speech Analysis, filed April 12, 1957, patented June 16, 1959; Figs. 1 to 6.]
United States Patent Office
SPEECH ANALYSIS
James Loton Flanagan, Greenwood, Miss., assignor to the United States of America as represented by the Secretary of the Air Force
Application April 12, 1957, Serial No. 652,634
2 Claims. (Cl. 179-1)
(Granted under Title 35, U.S. Code (1952), sec. 266)
The invention described herein may be manufactured and used by or for the United States Government for governmental purposes without payment to me of any royalty thereon.
This invention relates to speech analysis, and particularly to the transmission of speech over a communication channel highly restricted in transmission bandwidth and capacity.
The invention is pertinent to a speech bandwidth-compression system of the analysis-synthesis type, in which the speech information is coded in terms of signals representing the major vocal resonances (or formants) and the nature of the excitation of the vocal tract, both as functions of time during the production of speech. The major vocal resonances, or formants, are manifested as maxima in the frequency spectrum of the speech radiated by a talker. It is basic to the operation of such a speech compression system that the frequencies of these spectral maxima (i.e., the formant frequencies) be determined by automatic analysis as the speech is uttered, and that signals representing these frequencies be transmitted to the speech synthesizer so that the speech may be reproduced with negligible time delay.
The conventional telephone channel, which is a waveform transmission system, requires a transmission bandwidth of the order of 3000 c.p.s. and a signal-to-noise ratio of about 30 db. The speech bandwidth compression system disclosed in my co-pending patent application Ser. No. 551,478, filed December 6, 1955, on the other hand, can function with a much narrower total transmission bandwidth, of the order of only 50 c.p.s., at a signal-to-noise ratio of approximately 30 db.
The present application is also concerned with a speech bandwidth compression system which, like the system disclosed in my above-identified co-pending application, requires only a narrow transmission bandwidth. The system herein disclosed also accepts continuous speech at its input and delivers at its output slowly varying electrical signals whose amplitudes represent the frequencies of the first three major vocal resonances, that is, the first three formant frequencies of the input speech. The input speech signals are directed through a set of contiguous band-pass filters, each with an associated rectifier and smoothing network (as in the previous patent application), to provide a short-time amplitude spectrum of the speech. But whereas the system of the previous patent application employs a sampling procedure to examine this spectrum and indicate the frequencies of its first three maxima, the system herein disclosed employs a different examination procedure. This procedure, which may be termed spectrum segmentation, or maximum-amplitude selection, takes advantage of the fact that the first three speech formants occupy frequency ranges which, on the average, do not overlap to any great extent.
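One channel of the analyzing filter set (band-pass filter, rectifier, smoothing network) can be approximated digitally. The following is a minimal sketch, not the patented analog circuit: it assumes a digital band-pass biquad in the Audio EQ Cookbook form, a full-wave rectifier, and a one-pole smoother. The names `bandpass_coeffs` and `channel_envelope`, the filter Q of 4, and the 30 c.p.s. smoothing cutoff are all hypothetical choices made for illustration.

```python
import math

def bandpass_coeffs(f0, q, fs):
    """Band-pass biquad with 0 dB peak gain (Audio EQ Cookbook form)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a

def channel_envelope(x, f0, fs, q=4.0, smooth_hz=30.0):
    """One analyzer channel: band-pass filter, full-wave rectifier, one-pole smoother.

    Returns the smoothed envelope sample by sample; q and smooth_hz are assumptions.
    """
    (b0, b1, b2), (a1, a2) = bandpass_coeffs(f0, q, fs)
    x1 = x2 = y1 = y2 = 0.0
    k = 1.0 - math.exp(-2 * math.pi * smooth_hz / fs)  # smoothing coefficient
    env, out = 0.0, []
    for s in x:
        y = b0 * s + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2  # band-pass section
        x2, x1 = x1, s
        y2, y1 = y1, y
        env += k * (abs(y) - env)  # rectify, then smooth
        out.append(env)
    return out
```

Running one such channel per band and reading all envelopes at the same instant gives one frame of the short-time amplitude spectrum. For a unit sine at the channel's center frequency the envelope settles near 2/π, the mean of a full-wave-rectified sine.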
Utilizing this tendency of the speech formants to segregate themselves into reasonably well-defined frequency ranges, the present invention provides a method of speech spectrum examination which comprises, as its first step, grouping the channels of the speech-analyzing filter set into three distinct channel families: the first channel family (0-800 c.p.s.) coincides with a frequency range (hereinafter referred to as range F1) embracing the first formant of the principal English speech vowels; the second channel family (800-2250 c.p.s.) coincides with a frequency range (F2) embracing the second formant of these vowels; and the third channel family (2250-3600 c.p.s.) coincides with a frequency range (F3) embracing the third formant of these vowels. The indicated values of F1, F2 and F3 are for adult male speech.
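The grouping step amounts to partitioning channel indices by frequency. The sketch below uses the F1, F2 and F3 ranges stated above, but the channel layout is an assumption: the text gives 36 contiguous filters without their widths, so the hypothetical `channel_centers` helper spaces 36 equal 100 c.p.s. bands from 0 to 3600 c.p.s. and assigns each channel by its center frequency.

```python
# Formant ranges for adult male speech, as given in the text (c.p.s.).
FORMANT_RANGES = {"F1": (0.0, 800.0), "F2": (800.0, 2250.0), "F3": (2250.0, 3600.0)}

def channel_centers(n=36, low=0.0, high=3600.0):
    """Hypothetical layout: n contiguous, equal-width bands from low to high."""
    width = (high - low) / n
    return [low + width * (i + 0.5) for i in range(n)]

def group_channels(centers, ranges=FORMANT_RANGES):
    """Assign each channel index to the family whose range contains its center."""
    families = {name: [] for name in ranges}
    for idx, fc in enumerate(centers):
        for name, (lo, hi) in ranges.items():
            if lo <= fc < hi:
                families[name].append(idx)
                break
    return families
```

Under this assumed layout the families contain 8, 14 and 14 channels; a real filter set with unequal bandwidths would simply supply a different `centers` list.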
As its second and third steps, the speech-examination method of the present invention comprises selecting, from each of the three channel families, the individual channel that reflects the maximum signal content, and repeating this maximum-amplitude selection at a rate of, say, 60 or more times per second. The fourth step is to store or register each maximum-amplitude selection at the instant it occurs.
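The selection and registration steps can be sketched as a per-frame argmax within each family. This is a minimal illustration, assuming the spectrum is already available as one list of channel amplitudes per 1/60 second frame, and assuming that a channel's center frequency is what gets registered; `select_formants` is a hypothetical name.

```python
FRAME_RATE = 60  # selections per second, the repetition rate named in the text

def select_formants(frames, families, centers):
    """For each short-time spectrum frame, register the time stamp and the
    center frequency of the strongest channel in each family."""
    track = []
    for n, spectrum in enumerate(frames):
        picks = {"t": n / FRAME_RATE}
        for name, idxs in families.items():
            best = max(idxs, key=lambda i: spectrum[i])  # maximum-amplitude selection
            picks[name] = centers[best]
        track.append(picks)  # register the selections as they occur
    return track
```

Because each family is searched independently, the three estimates cannot collide even when two formants sit close together near a range boundary; the cost is that a formant straying outside its assumed range is reported at the nearest channel inside it.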
The invention also embraces the particular apparatus and circuitry herein disclosed for practicing the spectrum-segmentation method of examination outlined above.
Other characteristics of the invention will appear upon reference to the following description of the invention as illustrated in the accompanying drawings wherein:
Fig. 1 is a block diagram of a system employing the speech spectrum segmentation concept underlying the invention;
Fig. 2 is a circuit diagram embodying the normalizing and maximum-selector components of the system of Fig. 1;
Fig. 3 is a circuit diagram embodying the clamper components of the system of Fig. 1;
Fig. 4 is a circuit diagram of a conventional type of amplifier assembly suitable for the circuits of Figs. 2 and 3;
Fig. 5 is a chart showing the average frequencies in c.p.s. and relative intensities in db of the first three formants of the ten indicated English vowels as uttered by male speakers; and
Fig. 6 is a series of wave diagrams illustrating how the system operates.
The arrangement of major components is illustrated in Fig. 1. The task of the system is to accept continuous speech at its input and to yield three output voltages, F1(t), F2(t), and F3(t), whose magnitudes, as functions of time, represent the frequencies of the first three major vocal resonances (formants). Continuous speech signals are fed directly (or, if desired, through a vowel-extracting apparatus such as is depicted in Fig. 2 of my above-identified co-pending application) into the analyzing filter set indicated at 10 in Fig. 1. This filter set 10 may be composed of 36 contiguous band-pass filters, each with an associated amplifier, rectifier and smoothing network, as illustrated in Fig. 5 of my co-pending application. The output voltages (negative D.C.) of the individual filter units are directed to filter output terminals 11a, 11b, or 11c (Fig. 1), depending upon whether each individual filter's passband falls in the F1, F2, or F3 frequency range, corresponding to the first, second, and third vowel formant groups. From terminals 11a, 11b, and 11c the output voltages (or, optionally, their second differences) are directed into amplitude normalizing networks 12a, 12b, and 12c, respectively.
As shown in Fig. 2, amplitude normalizing network 12a (networks 12b and 12c are duplicates thereof) includes a series of resistance loads 22 (a to n), 23 (a to n), and 24 (a to n), equal in number to the number (n) of input lines connecting terminal group 11a to network 12a. These resistors function (under the control of adjustable resistor 27, and with the assistance of amplifier A1) to subtract the mean value of all applied voltages from the value of each individual input voltage, and to present to the respective output terminals 30 (a to n) voltage values equal to one-half the magnitudes of the differences obtained in the several subtraction operations. For example, if e_k is the voltage input to the normalizing circuit from the kth filter channel of a channel family having N individual channels, then the normalized kth channel voltage is:

$$e'_k = \frac{1}{2}\left(e_k - \frac{1}{N}\sum_{j=1}^{N} e_j\right)$$
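The normalization reduces to "subtract the family mean, then halve". The sketch below is a direct transcription of that relation as reconstructed from the surrounding description; it does not model the circuit's sign conventions (the filter outputs are negative D.C.), and simply treats a larger value as a stronger channel. `normalize` is a hypothetical name.

```python
def normalize(voltages):
    """Return e'_k = (e_k - mean(e)) / 2 for every channel in one family."""
    n = len(voltages)
    mean = sum(voltages) / n
    return [0.5 * (e - mean) for e in voltages]
```

Since the normalized values of a family always sum to zero, the strongest channel is driven positive whenever the channels are not all equal, and no channel is positive when all are equal (for example, during silence). This is what allows a positive-grid selector downstream to fire only when there is a genuine maximum.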
The normalized set of voltages presented to output terminals 30 advances to amplifiers A2 to An+1 (Fig. 2), where the voltages undergo a gain of the order of [...; text missing in source] whose plate circuits include a common resistor 42 and are connected by a common conductor 46 to the twin anodes of pulser 47. The pulser has its cathodes grounded at 48 and its grids triggered by current from the secondary of a transformer 49, whose primary circuit receives energy from A.C. source 50. With pulsing current applied to the plate circuit at 60 c.p.s. (assuming transformer 49 receives 60-cycle input from source 50), pulser triodes 47 and pulser output circuit 46 will enable one or another of the thyratrons to fire every 1/60th second, the activated thyratron being the particular one receiving the maximum positive grid voltage on any given cycle. Each time any thyratron is thus activated, it operates to preclude the firing of any other. (If no positive voltages are being delivered to the thyratron grids at the moment of enabling, then of course none of the thyratrons will fire.) The cathode outputs of the thyratrons are weighted by means of potentiometers 43a to 43n and are summed to provide a single output by way of resistive summation network 44a to 44n. The selector output at 17a, therefore, is a string of weighted rectangular pulses whose heights correspond to the num[...; text missing in source] Fig. 2 is illustrated in Fig. 6 at (a), (b), and (c). Fig. 6(a) represents the output voltages of three arbitrarily chosen filter channels during seven successive selecting time intervals. In the first time interval, no output has appeared. In the second interval, a filter output has appeared and channel No. 1 has the maximum value. In the following intervals, the maximum moves successively from channel one to two, from two to three, from three back to two, and from two back to one. Fig. 6(b) shows the normalized values of these channel voltages during the same succession of selecting intervals. Fig. 6(c) assumes that the maximum selector is selecting from these three channels and shows its output as a function of time for the same succession of selecting intervals.
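The thyratron selector behaves as a winner-take-all gate evaluated once per enabling pulse. The sketch below models it under stated assumptions: a tube fires only if its grid voltage is above zero, the winning cathode delivers a unit pulse, and the per-channel weights (standing in for potentiometers 43a to 43n) are hypothetical values chosen by the caller. The unweighted cathode sum, used later as the clamper trigger, is returned alongside the weighted pulse height. `thyratron_select` and `selector_output` are hypothetical names.

```python
def thyratron_select(normalized, weights, threshold=0.0):
    """One enabling pulse: return (index, weighted_pulse, trigger).

    Only the channel with the largest grid voltage can fire, and only if that
    voltage exceeds the threshold; its firing locks out every other channel.
    """
    best = max(range(len(normalized)), key=lambda i: normalized[i])
    if normalized[best] <= threshold:
        return None, 0.0, 0.0      # no positive grid voltage: nothing fires
    return best, weights[best], 1.0  # weighted cathode pulse, unit trigger

def selector_output(frames, weights):
    """Weighted pulse heights and unweighted triggers for successive enabling pulses."""
    heights, triggers = [], []
    for v in frames:
        _, h, trig = thyratron_select(v, weights)
        heights.append(h)
        triggers.append(trig)
    return heights, triggers
```

If the weights increase with channel frequency, the pulse height directly encodes which channel won, and hence the frequency of the spectral maximum within the family.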
The unweighted cathode voltages of the maximum amplitude selector are summed to provide a cathode-sum voltage output at point 16a (Figs. 2 and 3) for triggering the left-hand half 67 of a twin triode (Fig. 3) constituting part of the clamper circuitry. The clamper circuitry also includes a one-shot multivibrator (triodes 69 and 70, Fig. 3) and two amplifier assemblies 52 and 56, each having constituent parts like those of Fig. 4. The control grid of the first stage of amplifier assembly 52 receives the voltage output of the thyratron selection network by way of maximum amplitude selector output terminal 17a and conductor 51. The purpose of the clamper circuit is to provide a staircase smoothing of the rectangular pulses coming from the selector. To accomplish this, the one-shot multivibrator 69-70 generates a gating pulse at the proper phase of the enabling-disabling period of the maximum selector, and this pulse is applied by way of conductor 65 to the grid of gating tube 55. Gating tube 55 reads the height of each pulse from the maximum selector, stores this value for a brief holding interval, and then delivers it to terminals 18a by way of amplifier stage 56 and lead 59. Thus the output at terminals 18a represents, in staircase fashion, the heights of the successive output pulses from the selector 13a (Fig. 2). This height-reading operation is illustrated in Fig. 6(d). The voltage read by the gating tube is stored and held in the clamper circuit until the next sampling occurs. The clamper output (that is, the staircase smoothing of the output pulses from the selector) is shown in Fig. 6(e). This output can be smoothed further by a passive low-pass network.
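The clamper's read-and-hold action and the optional passive low-pass can be sketched as a sample-and-hold followed by a one-pole filter, one value per 1/60 second interval. This is a discrete-time approximation under assumptions: `staircase` and `rc_lowpass` are hypothetical names, and the low-pass cutoff is left to the caller because the text does not specify the passive network.

```python
import math

def staircase(pulse_heights, fired):
    """Read a pulse height whenever the gate reads, and hold it until the next reading."""
    held, out = 0.0, []
    for h, f in zip(pulse_heights, fired):
        if f:
            held = h
        out.append(held)
    return out

def rc_lowpass(x, cutoff_hz, rate_hz=60.0):
    """One-pole stand-in for the passive low-pass network (cutoff is an assumption)."""
    k = 1.0 - math.exp(-2 * math.pi * cutoff_hz / rate_hz)
    y, out = 0.0, []
    for v in x:
        y += k * (v - y)
        out.append(y)
    return out
```

For example, `staircase([0, 1, 0, 3, 0], [False, True, False, True, False])` holds each read value across the following interval, turning isolated pulses into steps.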
Triggered and synchronous sampling: Two methods for generating the height-sampling gate have been provided. They are termed triggered and synchronous sampling. During triggered sampling (switch 64 in the full-line position, Fig. 3), the height-reading gate is generated only if the thyratron selector is making a selection. The trigger is derived from a summation, without weighting, of the thyratron cathode voltages, and the gate reads each time any thyratron fires. During synchronous sampling (switch 64 in the dash-line position, Fig. 3), the height-reading gate is generated in synchronism with the enabling plate voltage of the thyratron set. It therefore reads regardless of whether or not a thyratron is firing. If no thyratron is firing, the gate of course reads the value zero, and this appears at the clamper output 18a. The dotted portions of the curves in Figs. 6(d) and 6(e) indicate the result obtained with synchronous sampling.
The method of sampling determines the manner in which the clamper output voltage is extrapolated. With triggered sampling, the clamper holds the last value of voltage read when the thyratrons were selecting and firing. It loses this value relatively slowly, returning to zero or to a neutral voltage with a time constant of approximately a quarter-second. With synchronous sampling, the output voltage goes to zero in the enabling interval immediately following the last selection of the thyratrons. Therefore, if one wishes to extract formant signals that are extrapolated smoothly across consonant and silent intervals, triggered sampling yields the better results.
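The difference between the two sampling modes shows up in what the clamper does during intervals when no thyratron fires. The sketch below models both modes at the 60-per-second enabling rate, assuming the triggered-mode hold decays exponentially toward zero with the quarter-second time constant mentioned above; a decay toward a nonzero neutral voltage is not modeled. `clamper_output` is a hypothetical name.

```python
import math

FRAME_RATE = 60.0  # enabling pulses per second (60-cycle source)
TAU = 0.25         # approximate decay time constant of the held value, seconds

def clamper_output(heights, fired, mode):
    """Clamper output per enabling interval under triggered or synchronous sampling."""
    if mode not in ("triggered", "synchronous"):
        raise ValueError("mode must be 'triggered' or 'synchronous'")
    decay = math.exp(-1.0 / (FRAME_RATE * TAU))  # per-interval leak of the held value
    held, out = 0.0, []
    for h, f in zip(heights, fired):
        if f:
            held = h        # a thyratron fired: the gate reads its pulse height
        elif mode == "synchronous":
            held = 0.0      # the gate still reads, and reads zero
        else:
            held *= decay   # no gate: the held value leaks slowly toward zero
        out.append(held)
    return out
```

After a formant disappears, synchronous sampling drops to zero on the very next interval, while triggered sampling retains about 1/e of the last value after 15 intervals (a quarter-second). This slow decay is why triggered sampling bridges short consonant and silent gaps more smoothly.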
The amplifier assemblies indicated in circular form at 52 and 56 in Fig. 3 may be of conventional design, for example the design shown in Fig. 4, which is that of a conventional plug-in type of amplifier assembly. The numerals in the small circles spaced about large circles 52 and 56 (Fig. 3) indicate connection of these circuit points to the points similarly designated by circled numerals in Fig. 4. The only difference in the external connections of units 52 and 56 is that unit 52 is connected to function as a polarity inverter (by way of phase-inversion loop 53), whereas unit 56 is connected to function as a cathode follower, with its output line 59 leading to terminal 18a and with feedback 57 to adjustable bias control network 58-60. Each unit (52 and 56) includes two twin triodes 71, 72 (Fig. 4), two gas diodes 73, 74, and resistance and capacitance parameters of the values indicated adjacent to each unit. Equivalent amplifier circuitry may, of course, be substituted.
Reverting to Fig. 3, resistor 62 is adjustable to set the proper phase relation between the voltages of transformer 61 and transformer 49 (Fig. 2), to ensure that the sampling action will be truly synchronous, as described.
What is claimed is:
1. An automatic, electronic speech analyzing apparatus for extracting from input speech narrow bandwidth electrical signals whose varying amplitudes indicate the content of said input speech, said apparatus including means for converting the speech input into electrical signal groups whose frequency ranges differ, each from the others, in accordance with the speech spectral differences distinguishing major speech formants, one from the other, means for selecting from each of the subbands within each formant group the particular signal having maximum amplitude, said signal selecting means including a plurality of voltage-responsive ionization discharge devices, each adjusted to respond to the same voltage input level, and means responsive to the firing of any one of said devices to preclude the firing of any of the associated devices, during any given operating cycle, and means for simultaneously registering the selected signals.
2. Electronic speech analyzing apparatus as defined in claim 1 wherein said maximum signal selecting means includes electronic gating means controlling delivery of the output of said ionization discharge devices to said registering means.
References Cited in the file of this patent
UNITED STATES PATENTS
2,243,527 Dudley, May 27, 1941
2,458,227 Vermeulen, Jan. 4, 1949
2,635,146 Steinberg, Apr. 14, 1953