US 3483325 A
Dec. 9, 1969 J. L. STEWART SPEECH PROCESSING SYSTEM 2 Sheets-Sheet 2 Filed April 22, 1966
[Drawing sheet residue: frequency-axis label; INVENTOR. JOHN L. STEWART; ATTORNEYS]
3,483,325 SPEECH PROCESSING SYSTEM John L. Stewart, Menlo Park, Calif., assignor to Santa Rita Technology, Inc., Menlo Park, Calif., a corporation of Arizona. Filed Apr. 22, 1966, Ser. No. 544,531
Int. Cl. H04m 1/00; H04b 1/66; G01r 23/16. U.S. Cl. 179-1. 10 Claims.
ABSTRACT OF THE DISCLOSURE
A speech processing system that develops, from an electrical signal corresponding in frequency and magnitude to the speech, a first signal within a constant bandwidth of comparatively low frequency and a second signal within a variable bandwidth of somewhat higher frequency than the constant bandwidth. A tuning system controls the variable bandwidth filter in proportion to the frequency distribution characteristics of the input signal. The tuning circuit is arranged to generate a tuning signal that is weighted according to the frequency content of the signal so as to preserve and extract only the essential frequency components of the signal.
This invention relates generally to speech processing systems, and pertains more particularly to such a system employing bandwidth compression techniques.
As a preface to a detailed description of my invention, it can be explained that not all of the frequency components that exist at any instant in a speech signal are equally important, some being unnecessary. Most of the intelligibility and naturalness of speech can be retained by passing only two relatively narrow bands of components. In accordance with the present invention, one of these bands is fixed to cover the range from approximately 300 to 600 cycles per second, and the other moves with the speech signal in the range from approximately 600 to 5,000 cycles per second, this latter band, or more properly sub-band, possessing a constant bandwidth to center frequency ratio of approximately unity, although this specific ratio is not critical.
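As a worked illustration of the "approximately unity" ratio, the sketch below computes the edges of the moving sub-band for a given center frequency. It assumes, which the text does not state, that the center frequency is the arithmetic midpoint of the sub-band; with a geometric midpoint the numbers would differ. The function names are hypothetical.

```python
from __future__ import annotations

# Range over which the tunable sub-band may move (from the text above).
TUNABLE_RANGE_HZ = (600.0, 5000.0)


def sub_band_edges(fc_hz: float, ratio: float = 1.0) -> tuple[float, float]:
    """Edges of a sub-band whose bandwidth is ratio * fc_hz.

    Assumes fc_hz is the arithmetic midpoint of the band.
    """
    if fc_hz <= 0 or not 0 < ratio < 2:
        raise ValueError("need fc_hz > 0 and 0 < ratio < 2")
    half_width = ratio * fc_hz / 2.0
    return fc_hz - half_width, fc_hz + half_width


def centre_limits(ratio: float = 1.0) -> tuple[float, float]:
    """Smallest and largest center frequencies that keep the whole
    sub-band inside TUNABLE_RANGE_HZ."""
    if not 0 < ratio < 2:
        raise ValueError("need 0 < ratio < 2")
    lo, hi = TUNABLE_RANGE_HZ
    fc_min = lo / (1.0 - ratio / 2.0)  # lower edge sits at 600 Hz
    fc_max = hi / (1.0 + ratio / 2.0)  # upper edge sits at 5,000 Hz
    return fc_min, fc_max


if __name__ == "__main__":
    print(sub_band_edges(2000.0))  # (1000.0, 3000.0)
    print(centre_limits())         # about (1200.0, 3333.3)
```

With a unity ratio, a sub-band centered at 2,000 cycles per second spans 1,000 to 3,000 cycles per second, and the center can range from about 1,200 to about 3,333 cycles per second if the sub-band is never to leave the 600 to 5,000 cycles per second range; whether the original circuit imposed such a limit is not stated.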
With the above in mind, one object of the present invention is to provide a speech processing system whose speech output is generally intelligible to the human ear, even though only relatively narrow bands of frequency components are used to constitute or reconstitute the speech. More specifically, the invention has as its dual aim the provision of a system that will find utility either in processing a voice input signal to derive a voice output signal having a generally acceptable level of intelligibility, or in shaping a broad band noise input signal into synthesized speech having the same desired characteristics as when an actual voice input signal is processed. Stated somewhat differently, my invention basically involves passing frequencies in a particular fixed band toward the lower end of the audio range and superimposing on the signal so passed a second signal drawn from frequency bands that are variable and are selected, in a tuned fashion, by a control signal, so that the combined signal still yields a speech output that can be understood.
While my invention has the broader object mentioned above, a more specific object is to provide an output voice signal that contains a certain frequency emphasis so as to improve hearing in some cases of sensori-neural deafness by shifting the normal speech frequencies to a somewhat lower value before the voice signal reaches the listener's ear. In a somewhat reverse situation, although functionally similar to improving hearing in the deafness situation alluded to above, it is possible to counteract certain distortions of speech where the hearing is normal. Such a case exists where there is some impairment in the speaker's manner of speaking, or where the speaker is located in an abnormal environment, for example under water while breathing a mixture of helium and oxygen, a distortion that has been experienced by persons attempting to listen to undersea divers.
Thus, quite briefly, my invention involves passing frequencies within a certain predetermined band toward the lower end of the audio spectrum, more specifically within the band from approximately 300 to 600 cycles per second. Via a tunable bandwidth filter operating in the 600 to 5,000 cycles per second range, another band within that range is selected in accordance with a tuning control signal so as to pass the necessary frequency components. Through the agency of an adder, the signals from both paths are combined and then delivered to a suitable output device which expresses the derived signal in speech form. Since the system briefly described can receive either an electrical input signal transduced from a voice or a broad band noise input signal, suitable provision can be made for processing one or the other of such signals in order to obtain the desired voiced or synthetic output speech signal. It can be pointed out at this time, though, that when processing a voice input signal, a somewhat simpler circuit configuration can be used than when synthesizing a signal to be used as the output speech signal. As will become more readily understood hereinafter, use is made of a pattern centroid signal for controlling and selecting the appropriate sub-band that is to be added to, or combined with, the more basic signal falling within a band toward the lower end of the audio range. When synthesizing a signal, area control signals are utilized and are multiplied with the signals that are to be added or combined, in order to get the speech output signal that is sought.
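The two-path arrangement just summarized (a fixed 300 to 600 cycles per second band plus a tunable sub-band, summed in an adder) can be sketched digitally as below. This is a minimal sketch, not the patent's analog circuit: the band-pass filters are idealized as FFT bin masking, the sub-band is chosen by hand instead of by the centroid-derived tuning signal, and the synthesis multipliers are omitted. All names are hypothetical.

```python
import numpy as np


def brickwall_bandpass(x: np.ndarray, fs: float, f_lo: float, f_hi: float) -> np.ndarray:
    """Idealized band-pass: zero every FFT bin outside [f_lo, f_hi] Hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def two_path_output(x, fs, sub_band, low_gain=1.0):
    """Fixed 300-600 Hz path plus a tunable sub-band path, combined by addition.

    low_gain stands in for the adjustable gain control; sub_band is the
    (lower, upper) pair that the tuning signal would otherwise select.
    """
    low_path = low_gain * brickwall_bandpass(x, fs, 300.0, 600.0)
    high_path = brickwall_bandpass(x, fs, sub_band[0], sub_band[1])
    return low_path + high_path  # the adder


if __name__ == "__main__":
    fs = 16000.0
    t = np.arange(16000) / fs  # one second, so FFT bins fall on whole hertz
    tones = {f: np.sin(2 * np.pi * f * t) for f in (450.0, 2000.0, 4500.0)}
    x = sum(tones.values())
    y = two_path_output(x, fs, sub_band=(1000.0, 3000.0))
    # 450 Hz (fixed band) and 2000 Hz (sub-band) survive; 4500 Hz is removed.
    print(np.allclose(y, tones[450.0] + tones[2000.0], atol=1e-9))
```

Brick-wall masking keeps the example exact and dependency-light; a closer model of the hardware would use realizable filters with finite skirts.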
These and other objects and advantages of my invention will more fully appear from the following description, made in connection with the accompanying drawings, wherein like reference characters refer to the same or similar parts throughout the several views and in which:
FIGURE 1 is a view, largely in block form, illustrating a circuit arrangement that can be employed for either speech sharpening or speech synthesizing purposes;
FIGURE 2 is a gain curve showing the relative gain function for the various filters which are embodied in the analog ear analyzer shown in FIGURE 1, and
FIGURE 3 is a graphical representation of a set of e waveforms typical of those utilized in FIGURE 1 and defining a spatial pattern which may be continuous and which changes with time, the view three-dimensionally characterizing this pattern being in the form of a surface plotted against distance, intensity and time.
Since my speech processing system can be utilized either for speech sharpening or for speech synthesizing purposes, it will be helpful first to describe the circuitry used to achieve the simpler goal, which is speech sharpening. My speech sharpening system includes an input device 20, such as a microphone or tape deck, serving as the means for delivering an appropriate electrical signal that has been transduced or converted from the speech to be sharpened. The electrical signal from the input device 20 is delivered through a switch 22 to a first filter 24 having a fixed bandwidth from approximately 300 to 600 cycles per second and therefore capable of passing relatively low audio frequencies. Also connected to the input device 20 through the switch 22 is a second filter 26 having a variable bandwidth for passing audio frequencies above the fixed bandwidth selected for the filter 24. More specifically, the bandwidth for the filter 26 is from approximately 600 to 5,000 cycles per second, and an appropriate sub-band within this relatively wide frequency range is shifted in accordance with certain characteristics, described hereinafter, that are to be imparted to the ultimate voice signal. Preferably, this sub-band has a constant bandwidth to center frequency ratio of approximately unity.
In practice, it is desirable to feed the output from the filter 24 through an adjustable gain control 30 so that the contribution from the filter 24 may be augmented or suppressed according to the desires of the user. The now-sharpened outputs are fed through switches 28 and 32, which are closed so as to bypass the multiplier circuitry referred to hereinafter in conjunction with the synthesizing action that is possible with my system. The two signals via the switches 28, 32, or the signals from the multipliers 60, 62 (described more specifically hereinafter), are then applied to an adder 34 and an output device 36, which can be a set of earphones or a loudspeaker if the voice output signal is to be heard directly by the listener, or a recorder of some type if the processed voice signal is to be used later.
From the information presented above, it is readily apparent that the manner in which the filter 26 is tuned has not yet been dealt with. Therefore, it will be observed that the same voice input signal from the device 20 is delivered to a frequency spreader or analog ear analyzer designated generally by the reference numeral 38. In the depicted situation, the analyzer 38 comprises a common filter 39 and a plurality or bank of low-pass filters 40-1, 40-2, ..., 40-12; while 12 such low-pass filters have been mentioned, this number can be increased or decreased as circumstances require without any essential change in the practice of the instant invention. Also included in the analyzer 38 is a group of amplifiers 42-1 through 42-12, there being one such amplifier after each filter 40-1 through 40-12 in order to provide a prescribed gain of 6 db. It may in some instances be advantageous to combine the functions of filtering and amplifying such that the two operations cannot be separately identified. Especially noteworthy in this regard is that, through the use of feedback amplifiers, the filters can be made to perform as if both inductance and capacitance were present, but without actually having any inductors. It is to be recognized that alternative but equivalent filtering schemes exist, one of which yields a purely passive resistance-inductance-capacitance circuit. It will be helpful at this time to refer to FIGURE 2, where the gain curve scheme is graphically displayed. The particular amplified outputs of the analyzer 38 have been labeled, respectively, with the reference numerals 1 through 12. Also superimposed upon the graph constituting FIGURE 2 is the characteristic of the common filter 39, this filter having, by way of suggestion, a break point at 2,500 cycles per second below which the slope of the curve is 18 db per octave. It will be recognized that the abscissa represents frequency on a logarithmic scale and the ordinate represents relative gain in decibels.
From the nature of these several curves, a number of system variations will be obvious. Part of the common filter characteristic can be associated with each filter output, as can accumulated gain values. For example, a differentiator can be associated with each output section so that the slope of the common filter characteristic below the break point becomes 12 db per octave, which is in accord with the auditory threshold characteristic of human hearing at low frequencies. The gain value of 6 db associated with each filter is predicated on the use of 12 filters, giving a total accumulated gain of 72 db for the 12th stage. If the same frequency range is covered using, say, 24 filters, then maintaining the same accumulated gain requires that each filter be associated with a gain of 3 db instead of 6 db. The intent of FIGURE 2 is in part to indicate gain values, but it is also meant to show that the bandwidths of the separate low-pass filters bear a constant ratio, one to the next. The proper values for gains and bandwidths using any number of sections other than the 12 explicitly described herein should thus be obvious.
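The gain and bandwidth arithmetic of the two preceding paragraphs can be made concrete as follows. The sketch assumes the sections are cascaded so that gains add in decibels (consistent with the term "accumulated gain"), realizes the constant bandwidth ratio by spacing the cutoffs geometrically, and models the common filter 39 as flat above its break point; these are assumptions, and all names are hypothetical.

```python
from __future__ import annotations

import math


def per_stage_gain_db(total_db: float, n_sections: int) -> float:
    """Per-section gain so that the last section accumulates total_db."""
    return total_db / n_sections


def accumulated_gains_db(total_db: float, n_sections: int) -> list[float]:
    """Accumulated gain at the output of each section, first to last."""
    g = per_stage_gain_db(total_db, n_sections)
    return [g * (k + 1) for k in range(n_sections)]


def geometric_cutoffs(f_first: float, f_last: float, n_sections: int) -> list[float]:
    """Cutoffs chosen so each bandwidth is a constant multiple of the next."""
    if n_sections < 2:
        raise ValueError("need at least two sections")
    step = (f_last / f_first) ** (1.0 / (n_sections - 1))
    return [f_first * step**k for k in range(n_sections)]


def common_filter_gain_db(f: float, slope_db_per_octave: float = 18.0,
                          f_break: float = 2500.0) -> float:
    """Common-filter gain, assumed flat above the break point and falling
    at slope_db_per_octave below it."""
    if f >= f_break:
        return 0.0
    return -slope_db_per_octave * math.log2(f_break / f)


if __name__ == "__main__":
    print(per_stage_gain_db(72, 12), per_stage_gain_db(72, 24))  # 6.0 3.0
    print(common_filter_gain_db(1250.0))                         # -18.0
    print(common_filter_gain_db(1250.0, slope_db_per_octave=12.0))  # -12.0
```

One octave below the 2,500 cycles per second break point the common filter is down 18 db; passing a 12 db per octave slope models the differentiator case, since a differentiator contributes a rising 6 db per octave. A call such as geometric_cutoffs(20000.0, 300.0, 12) gives a neighbor-to-neighbor ratio of about 1.46; those endpoints are illustrative only, loosely borrowed from the frequency ranges quoted later for filters 40-1 and 40-12.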
It can be explained at this time that an analyzer very similar to the one generally described herein has been more fully described in my Patent No. 3,387,093 for Speech Bandwidth Compression System, and reference may be made thereto for a more complete understanding of the analyzer 38, although the analyzer or frequency spreader described in my patent has a smaller number of filter sections and does not have the 6 db gain amplifiers included in the analyzer 38. Attention is also directed to another of my co-pending patent applications, namely, Method and System of Analyzing the Inner Ear, filed on July 12, 1965, Ser. No. 471,074.
Inasmuch as the various filters 40-1 through 40-12 are extremely overlapping, as pointed out in my co-pending application, these filters may be thought of as making a type of weighted spectrum analysis of the speech or voice delivered from the input device 20. The output from each filter 40-1 through 40-12 is rectified by a rectifier 44, and each rectified signal is fed to a respective filter 46 having a bandwidth of approximately 30 cycles per second.
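The rectify-and-smooth step that turns each analyzer output into a slowly varying voltage can be sketched as below. The text specifies only a rectifier 44 and a filter 46 of about 30 cycles per second bandwidth; full-wave rectification and a one-pole low-pass are assumptions chosen for simplicity, and the function name is hypothetical.

```python
import numpy as np


def band_envelope(band_signal, fs: float, cutoff_hz: float = 30.0) -> np.ndarray:
    """Rectify one analyzer output and smooth it into a slowly varying e voltage.

    Full-wave rectification and a one-pole low-pass are assumptions; the text
    specifies only a rectifier and a filter of about 30 c.p.s. bandwidth.
    """
    rectified = np.abs(np.asarray(band_signal, dtype=float))
    # One-pole (RC-style) smoothing coefficient for the chosen cutoff.
    alpha = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / fs)
    out = np.empty_like(rectified)
    acc = 0.0
    for i, v in enumerate(rectified):
        acc += alpha * (v - acc)
        out[i] = acc
    return out


if __name__ == "__main__":
    fs = 16000.0
    t = np.arange(16000) / fs
    e = band_envelope(np.sin(2 * np.pi * 1000.0 * t), fs)
    # After settling, the envelope sits near the rectified mean
    # (about 2/pi for a unit-amplitude sine) with only small ripple.
    print(float(e[8000:].mean()))
```

The narrow 30 cycles per second smoothing is what makes each e voltage change slowly compared with the speech waveform itself, which is the property the following paragraphs rely on.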
The outputs from the filters 46 are in the form of a set of relatively slowly varying waveforms or voltages e1, e2, ..., e12 (or however many filters have been selected for the analyzer 38), in part representing temporal variations of the energies of individual spectral bands. From FIGURE 3, it will be appreciated that the group of signals e1, e2, ..., e12 constitutes bandwidth-compressed speech to the extent that the individual e's are slowly varying and are similar to one another. Nevertheless, these slowly varying voltage signals e1-e12 are representative of intelligence that has been obtained from the transduced voice signal. It is this information that is further processed in order to tune the filter 26 properly, so that the signal forwarded from the filter 26 to the adder 34 carries the information needed to provide an intelligible voice output from the adder 34.
Attention is therefore now drawn to the pattern centroid extractor 48, to which the various filters 46 are connected, terminals 50-1 through 50-12 being provided for introducing the slowly varying e signals into the extractor. Although several suitable pattern centroid extractors have been shown and described in my patent, to which resort can be made if detailed information is desired, it will be of assistance to explain the processing action that takes place in the extractor 48. It should also help to refer to FIGURE 3, which shows the various e signals that are fed to the pattern centroid extractor 48 via the terminals 50-1 through 50-12.
These various rectified and filtered voltages e1-e12 are added together in resistor/amplifier linear summing arrays contained within the extractor 48 to get two voltages of the form
$$E_A = \sum_{n=1}^{12} a_n e_n, \qquad E_B = \sum_{n=1}^{12} b_n e_n.$$
These voltages $E_A$ and $E_B$ are subtracted in one instance and added in another, and a further division is performed on the results to get the final tuning signal that is applied to the tunable bandwidth filter 26. The quantity $E_A$ can be considered a combination of the voltage signals $e_n$ in an increasing weighted fashion (the weights $a_n$ increase with $n$), and the quantity $E_B$ a combination of the voltage signals $e_n$ in a decreasing weighted fashion (the weights $b_n$ decrease with $n$). The tuning signal is
$$\text{tuning signal} = \frac{E_A - E_B}{E_A + E_B}.$$
Since I have stated that reference can be made to my Patent No. 3,387,093, titled Speech Bandwidth Compression System, for a more comprehensive understanding of the role performed by the extractor 48, it will simplify matters to state that the extractor in effect provides a signal representative of the center of gravity of the area under the envelope labeled 51 in FIGURE 3. As this center of gravity shifts, the control signal impressed on the tunable bandwidth filter 26 changes so as to shift the sub-band within the bandwidth of the filter 26. Consequently, the information taken from the analyzer 38 through the medium of the extractor 48 is converted into a usable form that controls the tuning of the filter 26 so that intelligible information is forwarded to one input of the adder 34. Since the adder 34 receives its other signal directly from the adjustable gain control 30, the output from the adder 34 is in a speech-sharpened form that can be used directly or recorded for subsequent use at the output device 36.
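The centroid computation performed by the extractor 48 reduces to a few lines of arithmetic. The text requires only that $E_A$ weight the voltages in increasing fashion and $E_B$ in decreasing fashion; the sketch below assumes the simplest such choice, linear weights $a_n = n$ and $b_n = N + 1 - n$. Which end of the filter bank counts as "increasing" is a sign convention, and the function name is hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def tuning_signal(e: Sequence[float]) -> float:
    """(E_A - E_B) / (E_A + E_B) for voltages e = [e1, ..., eN].

    Assumes linear weights: a_n = n (increasing) and b_n = N + 1 - n
    (decreasing). The patent only requires increasing and decreasing weights.
    """
    n_ch = len(e)
    e_a = sum((n + 1) * v for n, v in enumerate(e))     # weights 1, 2, ..., N
    e_b = sum((n_ch - n) * v for n, v in enumerate(e))  # weights N, ..., 2, 1
    total = e_a + e_b
    if total == 0:
        return 0.0  # no signal: centroid undefined; returning 0 is an assumption
    return (e_a - e_b) / total


if __name__ == "__main__":
    print(tuning_signal([0.0] * 11 + [1.0]))  # all energy in e12: 11/13
    print(tuning_signal([1.0] + [0.0] * 11))  # all energy in e1: -11/13
```

With these weights $E_A + E_B = (N+1)\sum e_n$, so the result is a normalized center of gravity lying between $-(N-1)/(N+1)$ and $+(N-1)/(N+1)$; a pattern symmetric about the middle gives 0. Because the division cancels overall level, a louder or softer utterance with the same spectral shape yields the same tuning signal.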
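Having described the acquisition of the centroid in the foregoing, it should now be stated that not all of the 12 signals need be used for this purpose. It was previously stated that the tunable filter covers the range above 600 cycles per second, a range that is not properly represented by the voltages e9-e12, which correspond to filters 40-9 through 40-12 (about 300 to 600 cycles per second). In the actual process of centroid extraction, these voltages are attenuated or even eliminated from the computation of $E_A$ and $E_B$. In the case of complete elimination, there results
$$\text{tuning signal} = \frac{\sum_{n=1}^{8} a_n e_n - \sum_{n=1}^{8} b_n e_n}{\sum_{n=1}^{8} a_n e_n + \sum_{n=1}^{8} b_n e_n},$$
where $a_n$ and $b_n$ are the increasing and decreasing weights used in forming $E_A$ and $E_B$, and where it is to be understood that a situation intermediate between total usage and total elimination of the voltages e9-e12 may be most appropriate.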
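The intermediate case between total usage and total elimination of e9-e12 can be expressed with a single attenuation factor. This sketch scales e9 through e12 by a hypothetical factor `w_low` before weighting (1 keeps them, 0 eliminates them) and again assumes linear weights; neither the factor nor the weights are specified in the text.

```python
from __future__ import annotations

from typing import Sequence


def tuning_signal_partial(e: Sequence[float], w_low: float, first_low: int = 9) -> float:
    """Tuning signal with e_first_low..e_N scaled by w_low before weighting.

    w_low = 1 keeps all voltages; w_low = 0 eliminates e9..e12 (with the
    default first_low = 9, counted from one). Linear weights are an assumption.
    """
    if not 0.0 <= w_low <= 1.0:
        raise ValueError("w_low must lie in [0, 1]")
    n_ch = len(e)
    scaled = [v * (w_low if n + 1 >= first_low else 1.0) for n, v in enumerate(e)]
    e_a = sum((n + 1) * v for n, v in enumerate(scaled))     # increasing weights
    e_b = sum((n_ch - n) * v for n, v in enumerate(scaled))  # decreasing weights
    total = e_a + e_b
    return 0.0 if total == 0 else (e_a - e_b) / total


if __name__ == "__main__":
    flat = [1.0] * 12
    for w in (1.0, 0.5, 0.0):
        print(w, tuning_signal_partial(flat, w))
```

For a flat pattern the result moves from 0 (w_low = 1) through about -0.12 (w_low = 0.5) to -4/13 (w_low = 0), illustrating how the degree of attenuation shifts the computed centroid.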
Having given the above-presented description, the manner in which my speech processing system is employed for synthesizing purposes will be more easily understood. It will be observed that the switch 22 has two positions; when in the phantom-outline position, it is connected directly to a broad band noise input source 54. If desired, the source 54, which provides white Gaussian noise, can be augmented with a buzzing sound from a buzz source 56 so that the human larynx can be emulated.
However, since in the situation currently being described it is desired to synthesize or build up a voice output at 36, the broad band noise signal from the source 54 is not forwarded to the analyzer 38. Instead, the same voice input signal as in the case of speech sharpening is employed. In this case, the desire is to synthesize speech from ordinary voice using control signals which are relatively simple compared to the speech waveform itself.
As with the speech sharpening procedure, the filters 46 supply slowly varying e voltages which are impressed, in the same manner as before, on the terminals 50-1 through 50-12 of the extractor 48. Hence, the tuning control signal delivered via the line 52 to the filter 26 is used in the same fashion as when using the voice input device 20. Not only is a tuning control signal derived in the synthesizing of speech, as described above, but two pattern area control signals are also obtained from the slowly varying e voltages through addition of combinations of the voltages e1-e12. These are designated A1 and A2, composed as
$$A_1 = \sum_{n=1}^{8} e_n, \qquad A_2 = \sum_{n=9}^{12} e_n,$$
which may be modified somewhat through inclusion of some of e9 in A1 and/or reduction of various of the components in A2. Means for acquiring A1 and A2 from the signals e1-e12 are believed to be so well known that specific descriptions are not needed. It will be recognized that A1 and A2 are representative of partial areas under the envelope 51 of FIGURE 3, A1 being representative of the partial area taken over a distance related to the frequency range passed by filters 40-1 to 40-8, about 800 to 20,000 cycles per second, and A2 being representative of the area taken over a distance related to the frequency range passed by filters 40-9 to 40-12, which is about 300 to 600 cycles per second. These partial areas are measures of the average intensity of the signals in the associated frequency ranges. The area measure A1 is directed to one input of the previously mentioned multiplier 62, which has its other input connected to the tunable filter 26. In a similar manner, the area measure A2 is directed to the multiplier 60, which has its other input connected to the 300-600 c.p.s. filter 24 via the adjustable gain control 30. Hence, instead of the filters 24 and 26 delivering their outputs directly to the adder 34, as was done when obtaining a speech sharpening action, when synthesizing, the filter outputs are delivered to the multipliers 60 and 62 to be operated upon in accordance with the area measures A1 and A2, and the resulting products are fed to the adder 34 and thence to the output device 36 in order to provide a constituted or synthesized speech signal that contains a sufficient number of frequency components of proper relative intensity to be generally intelligible to the human ear.
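The area-control step of synthesis can be sketched as plain sums followed by sample-by-sample multiplication, which stands in for the multipliers 60 and 62. The routing follows claim 9: the 300 to 600 cycles per second area signal scales the fixed-band output, and the higher-frequency area signal scales the tunable sub-band output. The sketch assumes the e voltages have been sampled at the audio rate into a (12, n_samples) array and that the filtered noise paths are already available; the function names are hypothetical.

```python
from __future__ import annotations

import numpy as np


def area_controls(e) -> tuple[np.ndarray, np.ndarray]:
    """Partial-area signals from voltages e of shape (12, n_samples).

    A1 sums e1..e8 (filters 40-1 to 40-8); A2 sums e9..e12 (filters 40-9 to 40-12).
    """
    e = np.asarray(e, dtype=float)
    if e.shape[0] != 12:
        raise ValueError("expected 12 channel voltages")
    return e[:8].sum(axis=0), e[8:].sum(axis=0)


def synthesize(low_band, sub_band, e) -> np.ndarray:
    """Scale each filtered noise path by its area signal, then add.

    Routing follows claim 9: the 300-600 c.p.s. area (A2 here) multiplies the
    fixed-band output, and the higher-frequency area (A1) multiplies the
    tunable sub-band output.
    """
    a1, a2 = area_controls(e)
    return a2 * np.asarray(low_band, dtype=float) + a1 * np.asarray(sub_band, dtype=float)
```

Because A1 and A2 vary slowly, the multiplication imposes the voice's band-by-band intensity pattern on the noise (or buzz) passed by the two filters, which is what lets the synthesized output track the original speech.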
Since the operation has been given in each instance, it is believed unnecessary to provide a separate operational description at this point. It will be appreciated, though, that when either sharpening or synthesizing speech, there is produced an appropriate tuning signal that carries sufficient information or intelligence that, when it is applied to the tunable bandwidth filter 26, an appropriate sub-band of frequencies is transmitted to the adder 34 for combining with the signal forwarded from the fixed bandwidth filter 24 through the intermediary of the adjustable gain control 30. It has previously been explained that the gain control 30 permits individual adjustment to suit the listening tastes of the individual when the speech signal is transduced at the output device 36.
It will, of course, be understood that various changes may be made in the form, details, arrangements and proportions of the parts without departing from the scope of my invention as set forth in the appended claims.
1. A speech processing system comprising means for providing an input electrical signal containing various frequencies within the audio band, first filter means connected to said input means having a fixed bandwidth for passing relatively low audio frequencies, second filter means connected to said input means having a variable bandwidth for passing audio frequencies above said fixed bandwidth, means for combining the respective outputs from said first and second filter means, means connected to said combining means for providing a speech signal, and means for tuning said second filter means to selected frequencies so that said speech signal is generally intelligible to the human ear, said tuning means including means for forming a plurality of discrete voltage signals $e_n$ that correspond to the magnitude of frequency-selected portions of the input electrical signal, first combining means for combining the discrete voltage signals in increasing weighted fashion to produce a composite signal $E_A$, second combining means for combining the discrete voltage signals in decreasing weighted fashion to produce a composite signal $E_B$, and means constituting the output of said tuning means for generating the voltage function
$$\frac{E_A - E_B}{E_A + E_B}.$$
2. A speech processing system as defined in claim 1 in which said first filter means has a fixed bandwidth from approximately 300 to about 600 cycles per second, and said second filter means has a tunable sub-band within a bandwidth of from approximately 600 to about 5,000 cycles per second.
3. A speech processing system as defined in claim 1 in which said input providing means provides a signal transduced from speech sound.
4. A speech processing system as defined in claim 3 in which said first filter means has a fixed bandwidth from approximately 300 to about 600 cycles per second, and said second filter means has a tunable sub-band within a bandwidth of from approximately 600 to about 5,000 cycles per second, said sub-band having a center frequency to bandwidth ratio of approximately unity.
5. A speech processing system as defined in claim 4 in which said system further includes an adjustable gain control connected between said first filter means and said combining means for selectively varying the contribution from said first filter means to said speech signal.
6. A speech processing system as defined in claim 1 in which said input means provides a broad band noise signal, said system further including means for supplying to said analyzer a signal transduced from speech sound.
7. A speech processing system as defined in claim 6 in which said first filter means has a fixed bandwidth from approximately 300 to about 600 cycles per second, and said second filter means has a tunable sub-band within a bandwidth of from approximately 600 to about 5,000 cycles per second.
8. A speech processing system as defined in claim 7, further including means connected between each of said first and second filter means and said combining means for adjusting the relative average intensity of the respective outputs from said first and second filter means.
9. A speech processing system as defined in claim 8, in which each of said intensity adjusting means comprises a multiplier for providing an output signal which is the product of two input signals, the respective outputs of each of said first and second filter means being connected to one input of the corresponding multiplier, said system further including means responsive to said analyzer for providing a first area control signal related to the average intensity of frequency components of said transduced speech signal in the range of about 300 to about 600 cycles per second and a second area control signal related to the average intensity of frequency components of said transduced speech signal above about 800 cycles per second, means for supplying said first area control signal to the other input of the multiplier connected to said first filter means, and means for supplying said second area control signal to the other input of the multiplier connected to said second filter means.
10. A speech processing system as defined in claim 9 in which said input means includes a source of buzzing sound.
References Cited
UNITED STATES PATENTS
3,431,356 3/1969 Copel.
2,906,955 9/1956 Edson et al.
3,078,345 2/1963 Campanella et al. 179-1555
3,176,073 3/1965 Samuelson et al.
3,376,386 5/1968 Pant.
3,387,093 6/1968 Stewart.
3,394,228 7/1968 Flanagan et al.
KATHLEEN H. CLAFFY, Primary Examiner
ARTHUR A. MCGILL, Assistant Examiner
U.S. Cl. X.R.