Publication number: US3387093 A
Publication type: Grant
Publication date: Jun 4, 1968
Filing date: Apr 22, 1964
Priority date: Apr 22, 1964
Inventors: John L Stewart
Original Assignee: Santa Rita Technology Inc
Speech bandwidth compression system
US 3387093 A

[Drawings: 7 sheets, FIGURES 1 to 12, not reproduced. Sheet 1 carries FIG. 1 (input device 22, common filter 24, frequency spreader 26, rectifiers 1 to 10, filters 1 to 10, extractor for providing spatial pattern signals) and FIG. 2 (axis label: distance). Each sheet is headed "June 4, 1968, J. L. Stewart, 3,387,093, Speech Bandwidth Compression System, Filed April 22, 1964".]

United States Patent 3,387,093
SPEECH BANDWIDTH COMPRESSION SYSTEM
John L. Stewart, Menlo Park, Calif., assignor to Santa Rita Technology, Inc., Menlo Park, Calif., a corporation of Arizona
Filed Apr. 22, 1964, Ser. No. 361,809
24 Claims. (Cl. 179-15.55)

ABSTRACT OF THE DISCLOSURE

A frequency spreader composed of filters having extremely overlapping bandwidths, together with post-detection filters, provides slowly varying voltage outputs. With the band overlap, a frequency component in a signal to be analyzed contributes to the response of several outputs simultaneously. Patterns due to speech are rendered so simple that only a few pattern measures derived from the voltage outputs are needed to reconstitute the speech into intelligible form.

This invention relates generally to narrow-band speech transmission, and pertains more particularly to a system for first compressing the frequency bandwidth and thereafter reconstituting the speech into intelligible form.

Those familiar with speech bandwidth compression and recognition techniques will appreciate that sounds such as those of human speech can be analyzed in numerous ways so as to provide one or more slowly varying measures which describe the sound but which are simpler in detail than the original speech waveform. In this manner, the total frequency band required for the transmission of the intelligibility contained in speech has been reduced over that normally required. The slowly varying measures have been utilized for the purpose of causing appropriate sound generators to be operated in accordance with the measures so as to reconstitute the speech in audible form, thereby providing a complete bandwidth compressed speech system. Formant tracking and vocoder systems have been able to compress the bandwidth required for transmission of speech to the order of hundreds of cycles per second; more specifically, the state of the art devices have been instrumental in yielding bandwidth compression to a minimum of 100 to 200 cycles per second.

Previously conceived systems and methods have been limited by the inadequacy of the electronic analyzing system which extracts the slowly varying measures. In particular, waveform data of purely semantic significance and waveform data related to quality characteristics of the particular speaker are not separable with vocoders and related devices as they exist today. Therefore, one object of the present invention is to extract measures from speech involving a total bandwidth for transmission of these measures which is of the order of cycles per second.

Another object of the invention is to provide a system that will not be vulnerable to excessive effects of individual variations. More specifically, it is an aim of the invention to provide pattern measures that need not be closely adjusted to a given speaker.

Another object is to derive patterns from speech which are rendered so simple in form and quite slowly varying such that only a very few pattern measures are required. In practice, it has been found that as few as two such pattern measures will be ample for allowing the reconstitution of moderately intelligible speech, although additional pattern measures have also been employed with improved results.

Yet another object of the invention is to provide a system that will closely simulate some of the functional processing procedures employed in the animal (human) ear and central nervous system.

Quite briefly, the analyzing portion of my system includes a frequency spreader that can be represented in terms of a set of band-pass filters which continuously cover the speech spectrum. These filters are disposed logarithmically in frequency and have a constant center-frequency-to-bandwidth ratio at the half-power level of about two or somewhat less. The various filter bands in my system are extremely overlapping in contrast to prior art schemes where such an overlap has been normally deemed undesirable. With the band overlap envisaged in the present system, a frequency component in a signal to be analyzed contributes to the response of several outputs simultaneously rather than identifying itself with a particular filter and no others. As a result, the present system does not analyze according to notions of frequency components; the fact that it does not is a major contributor to its success.
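The overlap described above can be illustrated numerically. The following is a minimal sketch, not the circuit of the patent: it assumes a simple symmetric second-order resonator for each band, ten channels, and a 200 to 5000 c.p.s. range, none of which are stated in the text (the filters of the invention are described later as asymmetric). The names `center_frequencies` and `resonator_gain` are hypothetical.

```python
import math

def center_frequencies(f_low, f_high, n):
    """Return n center frequencies spaced uniformly on a logarithmic scale."""
    ratio = (f_high / f_low) ** (1.0 / (n - 1))
    return [f_low * ratio ** k for k in range(n)]

def resonator_gain(f, f0, q):
    """Magnitude response of a second-order band-pass resonator, equal to 1 at f0.

    q is the center-frequency-to-bandwidth ratio at the half-power level.
    """
    detune = f / f0 - f0 / f
    return 1.0 / math.sqrt(1.0 + (q * detune) ** 2)

# Assumed for illustration only: 10 channels from 200 to 5000 c.p.s., ratio 1.7.
centers = center_frequencies(200.0, 5000.0, 10)
tone = 1000.0
responses = [resonator_gain(tone, f0, 1.7) for f0 in centers]

# Channels whose response to the single tone exceeds an arbitrary 0.4 threshold.
excited = [f0 for f0, g in zip(centers, responses) if g > 0.4]

for f0, g in zip(centers, responses):
    print(f"channel at {f0:7.1f} c.p.s. responds with gain {g:.2f}")
```

With these assumed values a single tone produces a substantial response in several neighboring channels at once, which is the behavior the text contrasts with prior art filter banks.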

The speech synthesizing or decoding portion of my system converts the slowly varying measures into audible speech through the agency of first providing a constant signal giving a broad spectrum which is filtered with a controllable filter or filters whose parameters are varied in accordance with the spatial pattern instant by instant. The invention utilizes a single filter band driven with a broad band source where the filter is of the band-pass kind. The center frequency is varied in accordance with the principal spatial pattern measure, and also the filter gain is adjusted in accordance with the average pattern intensity. It is also within the purview of the invention to provide a filter bandwidth and symmetry about the center frequency which can also be varied according to appropriate pattern measures.

These and other objects and advantages of my invention will more fully appear from the following description, made in connection with the accompanying drawings, wherein like reference characters refer to the same or similar parts throughout the several views and in which:

FIGURE 1 is a block diagram of one form my complete system can assume, such system including the analyzing, extracting and synthesizing portions of the circuitry;

FIGURE 2 is a graphical representation of a set of e waveforms as in FIGURE 1 defining a spatial pattern which may be continuous and which changes with time, the view characterizing this pattern in the form of a surface plotted against distance, intensity and time;

FIGURE 3 is a schematic view of the circuitry constituting the common filter and speech amplifier shown in the block diagram of FIGURE 1;

FIGURE 4 is a schematic diagram depicting the frequency spreader shown in FIGURE 1;

FIGURE 5 is a schematic detail illustrating a specific form that one of the detectors and post detection filters of FIGURE 1 may assume;

FIGURE 6 is a schematic diagram illustrating a second form that the post detection filters of FIGURE 1 may assume;

FIGURE 7 is a schematic diagram representing one form, primarily of symbolic significance, that the spatial pattern extractor may take;

FIGURE 8 is a schematic diagram depicting a second form that the spatial pattern extractor of FIGURE 1 may constitute;

FIGURE 9 is a schematic diagram depicting a third form that the spatial pattern extractor of FIGURE 1 may constitute;

FIGURE 10 is a block diagram denoting a fourth embodiment that can be used as the spatial pattern extractor;

FIGURE 11 is a subdivided block diagram illustrating the synthesizer or decoder utilized in the over-all speech processing system of FIGURE 1; and

FIGURE 12 is a schematic diagram corresponding to the block diagram of FIGURE 11.

Referring first to FIGURE 1, the entire system there depicted has been denoted by the reference numeral 20. An input device 22, such as a microphone or tape deck, serves as the means for delivering an appropriate electrical signal that has been transduced or converted from the speech, the frequency bandwidth of which is to be compressed. The input device 22 is coupled to a common filter 24 and the output from the filter 24 is fed to a frequency spreader 26 to which are connected a bank or plurality of rectifiers 28 which in turn are connected to a similar number of post detection filters 30.

Before proceeding further, it will be helpful to point out that the speech analyzer portion of the system is in some ways similar to vocoders, such as that described in Dudley Patent No. 2,151,091 granted Mar. 21, 1939. However, the frequency spreader 26 and the post detection filters 30 differ from those employed in the vocoder. As already indicated, the frequency spreader 26 is composed of filters that are disposed logarithmically in frequency and which possess a constant center-frequency-to-bandwidth ratio at the half-power level of about two or possibly slightly less since the value is not especially critical.

Also, as earlier indicated, the various filter bands for the set of band-filters constituting the frequency spreader 26 are extremely overlapping, as will become better understood when referring to FIGURE 4.

Thus, the various filters constituting the spreader 26 may be thought of as making a kind of weighted spectrum analysis, even though somewhat indirectly, of the speech to be encoded. As with the vocoder, the output from each filter within the spreader 26 is rectified by the various rectifiers 28 and filtered again in the filters 30. While any number of rectifiers 28 and filters 30 may be used, 10 such rectifiers and 10 such filters have been found satisfactory in actual practice. The result is that a set of relatively slowly varying waveforms e1, e2, ..., e10 (or eN, where N is the number of filters 30 actually used) in part represent temporal variations of energies of individual spectral bands. It will be appreciated that the system or group of signals e1, e2, ..., eN constitutes the bandwidth compressed speech to the extent that individual e's are slowly varying and that they are similar to one another.

It will be understood that, all other things being equal, the fewer the number of unique e waveforms, the smaller the bandwidth. Also, all other things being equal, the more slowly varying are the e waveforms, the smaller is the bandwidth. Consequently, the complete set of e waveforms defines a spatial pattern which may be continuous and which changes with time. FIGURE 2 is a characterization of the resulting pattern in the form of a surface or envelope designated by the reference numeral 36, this envelope having distance, intensity and time coordinates. It can be explained that the filtering of the e signals derived in FIGURE 1 acts to reduce noise along the time dimension of FIGURE 2. Further filtering is left to a subsequent point in the system as will later be discussed.

Having referred to the common filter 24 only generally, it is believed that a schematic circuit showing one form that the filter 24 may assume will be helpful. This circuitry is depicted in FIGURE 3 and the reader will be immediately oriented by reason of the input device 22 being shown specifically as a microphone. The common filter has two purposes: (1) to provide adequate speech amplification as in any common speech amplifier; and (2) to provide a special frequency selective characteristic. The frequency characteristic is meant to simulate the standard human audiometric curve of threshold, a mathematical approximation for which is:

[Equation not legible in the source text.]

where f is frequency in cycles per second. It will be obvious that many designs for achieving this transfer function or one roughly similar to it (where accuracy is not critical) are possible. The amplifier-filter shown in FIGURE 3 is so straightforward that its detailed description is hardly necessary. The output from the amplifier driver, of course, is connected directly to the frequency spreader 26.

The frequency spreader 26 is detailed in FIGURE 4. It will be observed that the system 20 shown in FIGURE 1 has been depicted with ten channels of rectification and filtering action so the circuitry of FIGURE 4 is set up so as to show these ten channels. Each channel, of course, produces one of the previously-mentioned e waveforms subsequent to detection and filtering. It will be of assistance in understanding the role played by the spreader 26 to have listed the various component parameters along with the resonant frequencies for each channel:

No.    Resonant frequency (c.p.s.)    L1 (hy.)    C1 (µf)    C2 (µf)    C3 (µf)

In addition to the components L1, C1, C2 and C3 there are adjustable resistors R and the value of each resistor R is constant throughout the various channels, being 5,000 ohms, so as to yield a resonant circuit Q of 1.7. A moderately different value for each resistor R yields a modified Q value if such is desired; the particular value is not critical over a moderate range. Still additional resistors, these being fixed resistors, are labeled R and R and have ohmic values of 220K and 560K respectively.

Each channel is connected to a detector and filter circuit as shown in FIGURE 5. Each such combination contains a vacuum tube 60 which has in its plate circuit the primary of a transformer 62.

The secondary of each transformer 62 is connected to a diode bridge circuit 64, there being one such bridge circuit 64 in each rectifier designated generally by the reference numeral 28.

The type of post detection filter 30 shown in FIGURE 5 is low-pass so that the bandwidths of the relevant e waveforms are materially decreased and the residual fundamental components in voiced speech sounds are adequately removed. The bandwidth of the filter shown in FIGURE 5 is approximately 15 cycles per second.
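The rectify-then-smooth step for one channel can be sketched in discrete time. This is an illustrative model only: it assumes ideal full-wave rectification (standing in for the diode bridge 64) and a one-pole low-pass with a 15 c.p.s. cutoff (standing in for the filter 30). The sampling rate, the test tone, and the function name `envelope` are assumptions made for the example.

```python
import math

def envelope(samples, fs, cutoff_hz=15.0):
    """Full-wave rectify, then smooth with a one-pole low-pass filter."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y = 0.0
    out = []
    for x in samples:
        y += alpha * (abs(x) - y)  # abs() is the rectifier, the update is the filter
        out.append(y)
    return out

fs = 8000  # assumed sampling rate
# A 1000 c.p.s. tone whose amplitude steps from 0.2 to 1.0 halfway through.
signal = [
    (0.2 if n < fs // 2 else 1.0) * math.sin(2.0 * math.pi * 1000.0 * n / fs)
    for n in range(fs)
]
e = envelope(signal, fs)
print(f"level before step: {e[fs // 2 - 1]:.3f}, level at end: {e[-1]:.3f}")
```

The output follows the amplitude of the tone rather than its individual cycles, which is the sense in which each e waveform is slowly varying.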

By direct experimentation, I have discovered that the post detection filter 30a shown in FIGURE 6 yields e waveform temporal variations for actual speech that are not greatly distorted from those resulting from the use of filters, such as the filter 30, shown in FIGURE 5. But the special nonlinear filter 30a in FIGURE 6 has a bandwidth of only about 2 cycles per second, this considerable reduction being made possible because the filter response embodies both the value of the input signal and its rate of change, where the latter acts to compensate for frequency distortions which would otherwise occur. The diode 78 in FIGURE 6 prevents the filter output from ever becoming negative, which would be contrary to the nature of corresponding variations in the animal nervous system. With the post detection filter shown in FIGURE 6, even ignoring considerable redundancy between adjacent e waveforms produced in FIGURE 1 (which redundancy is accounted for using area, centroid, and width measures as will later be described), the total bandwidth of a 10 section analyzer is indicated as being only about 20 cycles per second.
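The text gives no equations for the FIGURE 6 circuit, so the following is only a sketch of the general idea it describes: a narrow low-pass output combined with a term proportional to its rate of change, followed by a clamp that keeps the result from going negative. The one-pole model, the `lead` weighting, and the function name `lead_compensated` are all assumptions.

```python
import math

def lead_compensated(samples, fs, cutoff_hz=2.0, lead=0.5):
    """Narrow one-pole low-pass plus a rate-of-change (lead) term, clamped at zero.

    The smoothed value y obeys tau * dy/dt + y = x. Adding lead * tau * dy/dt
    restores part of the fast variation that the narrow filter smears out.
    """
    tau = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs
    y = 0.0
    out = []
    for x in samples:
        dy_dt = (x - y) / tau
        y += dy_dt * dt
        z = y + lead * tau * dy_dt
        out.append(max(0.0, z))  # clamp, in the role the text assigns to diode 78
    return out

fs = 1000  # assumed sampling rate
step = [1.0] * 200
plain = lead_compensated(step, fs, lead=0.0)
compensated = lead_compensated(step, fs, lead=0.5)
print(f"50 ms after the step: plain {plain[50]:.2f}, with lead {compensated[50]:.2f}")
```

Under these assumptions the compensated output reaches a given level sooner than the plain 2 c.p.s. filter, and an input that would drive the sum negative yields zero instead.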

Encoding the several e waveforms may be done in a number of ways embraced by the present invention. The important point to be borne in mind is that it is desired to obtain pattern measures of the central moment kind. In this regard, we are desirous of acquiring signals in accordance with area and centroid, as well as certain higher order moment measures. The terminology will be clear if we consider a cross section along the distance-intensity plane of FIGURE 2 (i.e., at a given instant of time). The result is a curve of intensity versus distance which usually has the shape of a single bump going to zero at the ends. This pattern may be described in terms of measures which are particularly well suited to a simple bump shape. The total area defined by the distance-intensity curve is one such measure, denoted A. The mechanical centroid is another, where this measure can also be visualized as the point of balance of a paper card cut in the shape of the distance-intensity curve. The width of the bump is another measure, and its asymmetry (lopsidedness) is yet another. These several measures (centroid x̄, width and asymmetry) are the central moment measures that are well known in mechanics.
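The four measures named above can be computed directly from one sampled cross section. The sketch below assumes unit spacing between channels (positions 1 to N) and uses the usual definitions of the second and third central moments for width and asymmetry. The text does not specify a normalization, so dividing the third moment by the cube of the width is an assumption, as is the name `pattern_measures`.

```python
import math

def pattern_measures(e):
    """Area, centroid, width and asymmetry of a sampled distance-intensity curve."""
    positions = range(1, len(e) + 1)  # assumed unit spacing between channels
    area = sum(e)
    centroid = sum(x * v for x, v in zip(positions, e)) / area
    second = sum((x - centroid) ** 2 * v for x, v in zip(positions, e)) / area
    third = sum((x - centroid) ** 3 * v for x, v in zip(positions, e)) / area
    width = math.sqrt(second)
    asymmetry = third / width ** 3 if width > 0 else 0.0
    return area, centroid, width, asymmetry

# A symmetric single bump that goes to zero at the ends.
bump = [0, 1, 3, 6, 8, 6, 3, 1, 0, 0]
area, centroid, width, asymmetry = pattern_measures(bump)
print(area, centroid, round(width, 3), round(asymmetry, 3))
```

For this symmetric bump the centroid falls on the peak and the asymmetry vanishes; skewing the bump toward one end moves the centroid and gives a nonzero asymmetry.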

The area is proportional to the sum of all of the voltages from the several filters contained in the frequency spreader 26. A simple set of equal resistors may be connected to the output from the post detection filters 30, these resistors being in turn connected to a common summing point, so as to acquire the area measure A.

Again with a similar set of resistors, if the conductance values be tapered linearly from one end of the frequency spreader to the other, the summed voltage is proportional both to area and centroid value, as Ax̄ or A(L - x̄), where x̄ measures pattern center of gravity from one end of the frequency spreader filter array, and L - x̄ is the corresponding value measured from the other end. An electronic division of this signal by the area measure provides the centroid directly. When two tapered resistance arrays are employed, one of which measures centroid from one end of the frequency spreader 26 and the other from the opposite end thereof, the difference between these two arrays gives centroid measured from the center of the pattern, with proportionality also to intensity as before.
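The relations among the three sums can be checked with a few lines of arithmetic. This sketch treats the resistor networks as ideal weighted sums with weights 1 to N in one array and N to 1 in the other, and takes L = N + 1 for unit channel spacing. Those conventions and the name `tapered_sums` are assumptions made for the example.

```python
def tapered_sums(e):
    """Return (A, A*xbar, A*(L - xbar)) for weights 1..N and N..1, with L = N + 1."""
    n = len(e)
    area = sum(e)                                                  # equal weights
    from_first = sum(k * v for k, v in enumerate(e, 1))            # linear taper
    from_last = sum((n + 1 - k) * v for k, v in enumerate(e, 1))   # reversed taper
    return area, from_first, from_last

e = [0, 2, 5, 9, 6, 3, 1, 0, 0, 0]
area, a_xbar, a_rest = tapered_sums(e)
span = len(e) + 1                          # L under the assumed unit spacing

xbar = a_xbar / area                       # division by area gives the centroid
from_center = (a_xbar - a_rest) / (2 * area)  # difference of the two arrays
print(f"A = {area}, centroid = {xbar:.3f}, offset from center = {from_center:.3f}")
```

The difference of the two tapered sums is proportional to area times the centroid offset from the middle of the array, which is the "proportionality also to intensity" noted in the text.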

As indicated in FIGURE 7, the spatial pattern extractor 32 utilizes two tapered arrays of the type alluded to in the paragraph above, as well as one array of equal resistors R which yields pattern area A. The first tapered array is composed of resistors 92-98. Actually, a 10 section channel has been utilized and there would be 10 such resistors although only four have been pictured. To render the discussion as general as possible, it will be referred to with the thought in mind that N detector outputs are to be summed (even though we have selected N to be 10) and that the resistor 92 has a value equal to R/N, the resistor 94 has a value equal to R/(N-1), the resistor 96 a value equal to R/(N-2) and the resistor 98 having a value equal to R. The second array is composed of resistors 100-106. In this instance, the resistor 100 has a value R, the resistor 102 a value R/2, the resistor 104 a value of R/3 and the resistor 106 a value of R/N. The array involving the resistors 92-98 yields centroid value x̄ with multiplier of area A. The value for x̄ is measured from the end of the array at which the voltage e is obtained, this being the end which involves the use of the resistor 92. The centroid L - x̄ is measured with the other array, more specifically, the end thereof which includes the resistor 106.
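The resistor values listed above give conductances that step linearly across the array, which is what makes each summing point deliver a linearly weighted sum. The sketch below assumes an idealized summing point held at zero volts and an arbitrary R of 100,000 ohms (the text gives only the ratios). The helper names are hypothetical.

```python
def tapered_array(r_ohms, n, descending=True):
    """Resistor values R/N, R/(N-1), ..., R, or the reverse order R, R/2, ..., R/N."""
    divisors = range(n, 0, -1) if descending else range(1, n + 1)
    return [r_ohms / d for d in divisors]

def summed_current(voltages, resistors):
    """Current into an ideal summing point at zero volts: sum of e_k / R_k."""
    return sum(v / r for v, r in zip(voltages, resistors))

R = 100_000.0  # assumed value, for illustration only
N = 10
first_array = tapered_array(R, N)           # resistors 92 to 98: R/N ... R
second_array = tapered_array(R, N, False)   # resistors 100 to 106: R ... R/N

# Conductance times R recovers the integer weights N..1 and 1..N.
first_weights = [round(R / r) for r in first_array]
second_weights = [round(R / r) for r in second_array]
print(first_weights)
print(second_weights)
```

Because the two arrays carry complementary weights, the two summed currents add up to (N + 1) times the current from an equal-resistor array of value R.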

Whereas the acquisition of continuous area data is self-evident, the determination of the centroid independently of area is not so direct. Several methods have been devised for achieving the electronic division of Ax̄ by A as mentioned above to give x̄.

Continuing with the description of FIGURE 7 and the particular extractor 32 there depicted, it will be perceived that a servo-mechanism is utilized for the succinct explanation of how the signals from the resistor arrays 92-98 and 100-106, respectively, are processed to provide the centroid measure x̄. In this regard, there is a potentiometer comprised of a resistor 108 which is connected between the commoned ends of the resistors 92-98 at one end thereof and to the commoned ends of the resistors 100-106 at the other end thereof. The potentiometer further includes a wiper arm 110 that is movable along the resistor 108. Through the agency of a resistor 112, an error signal is derived which is sensed by a motor control amplifier 114. The output from the amplifier 114 is fed to a motor 116 coupled to a shaft 118 which drives the hub 120 to which the wiper arm 110 is attached. Also attached to the same hub 120 is a wiper arm 122 of a second potentiometer, the second potentiometer having a resistor 124 which is engaged by the wiper arm 122. The resistor 124 is energized via a battery 126. Hence, the position of the wiper arm 122 is in accordance with the centroid measure x̄ and such a signal can be sensed at terminal 128. In other words, the position of the shaft 118 is proportional to the centroid and this position is converted to an electrical signal via the potentiometer composed of the elements 122, 124. Obviously, in order to follow speech signals, the servo-system must be relatively fast, but this is within the state of the art in view of the narrow bandwidth realized through the use of the post detection filter 30 illustrated in FIGURE 5.
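The principle of dividing by nulling an error can be sketched with a simple discrete loop. This is not a model of the bridge arrangement in FIGURE 7. It assumes only that the motor moves a position p in proportion to an error signal, and that the error is the difference between the tapered sum and p times the area. The function name and the loop gain are hypothetical.

```python
def servo_null(area, area_times_centroid, loop_gain=0.1, steps=400):
    """Move shaft position p until the error (A*xbar - p*A) is driven to zero."""
    p = 0.0
    for _ in range(steps):
        error = area_times_centroid - p * area  # what the amplifier senses
        p += loop_gain * error                  # the motor integrates the error
    return p

# Two patterns with the same centroid (0.6) but different areas.
print(servo_null(2.0, 1.2))
print(servo_null(0.5, 0.3))
```

The shaft settles at the ratio of the two inputs regardless of the area, although in this simplified loop the settling speed depends on the area, and the loop is stable only while `loop_gain * area` stays below 2.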

Another circuit arrangement which can be used to extract x̄, this time by phase comparison, is shown in FIGURE 8. This circuit has been labeled 32a. A transistor chopper designated generally by the reference numeral 134 is employed for the purpose of chopping the waveforms for A and Ax̄, delivered to terminals 132 and 130, respectively. The two unfiltered sine wave chopper fundamental components are added after +45° phase shift for one and -45° for the other; this addition is performed at 136 and provides the necessary addition in phase quadrature. Along with amplification at 138, the added waveform is filtered so as to remove all harmonics of 5,000 c.p.s., thereby leaving only fundamental components. The composite sine wave is then passed through several precision clipper amplifiers 144 of the kind used in quality phase meters. The zero crossing points of the clipped waveform are maintained independent of amplitude. The clipped wave is finally demodulated in a phase-sensitive demodulator indicated generally by the reference numeral 154 that is synchronized with the original modulating frequency of the chopper 134 through a phase adjustment means 152. Thus, the centroid x̄ is extracted and made available at the terminal 156.
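The reason the quadrature sum carries the centroid in its phase can be shown with a little trigonometry. Writing the sum as A sin(wt + 45°) + Ax̄ sin(wt - 45°) and collecting the sin(wt) and cos(wt) terms gives a phase of 45° minus arctan(x̄), which does not depend on A. The sketch below checks this numerically. Which signal receives which shift is an assumption, and the function names are hypothetical.

```python
import math

def quadrature_phase(a, a_xbar):
    """Phase in radians of a*sin(wt + 45 deg) + a_xbar*sin(wt - 45 deg)."""
    s = math.sqrt(0.5)
    in_phase = s * (a + a_xbar)      # coefficient of sin(wt)
    quadrature = s * (a - a_xbar)    # coefficient of cos(wt)
    return math.atan2(quadrature, in_phase)

def centroid_from_phase(phi):
    """Invert phi = 45 deg - arctan(xbar)."""
    return math.tan(math.pi / 4.0 - phi)

xbar = 0.3
for a in (0.5, 3.0):
    phi = quadrature_phase(a, a * xbar)
    print(f"A = {a}: phase {math.degrees(phi):.2f} deg, "
          f"recovered centroid {centroid_from_phase(phi):.3f}")
```

Since clipping discards amplitude and keeps zero crossings, only this phase survives to the demodulator, which is consistent with the text's remark that the zero crossing points are independent of amplitude.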

I have also employed a logarithmic computing device in what I think may be a unique circuit configuration to obtain the ratio Ax̄/A = x̄. It is well known that the logarithm of a voltage may be obtained with a resistor-diode series circuit. With two such circuits operating on different voltages V1 and V2, there result voltages proportional to log V1 and log V2. Subtracting these two logarithms gives log (V1/V2). The practical difficulty with this procedure is relative stability between the two diode circuits. I have succeeded in time sharing a single diode by using a chopper. Not only does this avoid the selection of closely matched diodes, but it also permits the small ratio signal to be A.-C. amplified thus avoiding the need for stable D.-C. amplifiers. Finally, the use of modulation permits use of synchronous demodulation so that output polarity resolves the usual ambiguity in logarithmic ratio computers as to which of the two input signals is the larger.

The nature of the logarithmic circuit is evident from FIGURE 9 which circuit has been labeled 32b. The two (positive) inputs are chopped and applied to a diode logarithmic computer located at 146. The peak-to-peak computed voltage is the desired log ratio, and the waveform can be amplified in a standard A.-C. amplifier 148.

Finally, detection can be with a phase-sensitive detector such as a synchronized chopper or a ring demodulator. Or, if the indication of V1 > V2 or V2 > V1 is not important, a simple peak-to-peak detector can be employed.

Application of the logarithmic method to centroid extraction results from using Ax̄ for V1 and A for V2 to give log x̄. Using proper dimensional scaling and over the typical range of x̄ excursions for normal speech, log x̄ may be a nearly linear function of x̄.
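The time-shared logarithmic ratio can be sketched with an idealized element whose output is proportional to the logarithm of a positive input. The model below is an assumption, not the FIGURE 9 circuit: `scale` is an arbitrary illustrative constant, and the chopper is reduced to the two levels it alternates between.

```python
import math

def diode_log(v, scale=0.026):
    """Idealized log element: output proportional to the log of a positive input."""
    return scale * math.log(v)

def chopped_log_ratio(v1, v2, scale=0.026):
    """Alternate v1 and v2 through one log element and demodulate synchronously.

    The signed difference between the two half cycles is scale * log(v1 / v2);
    dividing by scale returns the natural logarithm of the ratio.
    """
    first_half = diode_log(v1, scale)
    second_half = diode_log(v2, scale)
    return (first_half - second_half) / scale

area = 4.0
xbar = 0.6
signed = chopped_log_ratio(area * xbar, area)
peak_to_peak = abs(signed)
print(f"log ratio {signed:.4f}, exp gives {math.exp(signed):.3f}")
```

Because both inputs pass through the same element, any fixed offset in that element cancels in the difference. The sign of the demodulated output tells which input is larger, whereas the peak-to-peak value alone does not.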

Yet another technique, this being for the purpose of extracting pattern centroid and width, is shown in block form in FIGURE 10 and has been assigned the reference numeral 32c. This modification is considerably different from the three previously discussed, and of itself is a novel electronic instrumentation technique. Each filtered e waveform from FIGURE 1 is applied to one input of a multiplier 158. The other input to each multiplier 158 is in the form of a band of noise. The noise source has been indicated by the numeral 160 and the noise is filtered at 162 before being delivered to the second input of each multiplier 158. Each of the multipliers 158 is associated with a different band of noise in order that a regular progression is provided. Thus, e1 associates with a relatively narrow noise band centered at a given frequency, whereas e2 would be associated with a band of noise having a center frequency greater than the first-mentioned center frequency and so forth. All the outputs from the multipliers 158 are added at 164. This results in the provision of a single waveform which has a frequency spectrum whose shape is similar to the shape of the spatial pattern array of e voltages. It is assumed for convenience that this spectrum has a total bandwidth that is appreciably less than the average frequency.

Continuing with the description of FIGURE 10, the single sum waveform from the adder 164 is next acted upon by a standard limiter-discriminator detecting system 166. The output from the discriminator 166 (which should be of the cycle-counting type so as to depend only upon waveform zero crossings) has a frequency that is representative of the average frequency of the spectrum, which frequency is proportional to the centroid of the spatial pattern of the e signals. After filtering with a low pass filter 168, the centroid measure x̄ is made available at terminal 170. Because noise bands are added in the system, the output from the discriminator 166 is more or less noisy, there being random fluctuations superimposed upon the value representing x̄. A measure of this randomness is achieved with an averaging A.-C. detector circuit 172 driven through a high-pass filter that may be considered as being included in the detector 172. The output from the detector is connected directly to a terminal 174 and it is at this terminal that the width measurement, designated a, is indicated. It will be appreciated that normal variations of x̄ inherent in speech are not transmitted via the detector 172, thereby rendering the detector responsive only to pattern width. Finally, it should be evident that area A is simply obtained with a broadband detector 176 which is connected between the adder 164 and the discriminator 166. The area measurement A is provided at terminal 178.
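The cycle-counting step alone can be sketched as follows. The example does not model the noise bands or the multipliers. It only illustrates, on a plain sine wave at an assumed sampling rate, that a frequency estimate built from zero crossings is unaffected by amplitude. The function name is hypothetical.

```python
import math

def cycle_count_frequency(samples, fs):
    """Estimate average frequency from sign changes only (two per cycle)."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration = (len(samples) - 1) / fs
    return crossings / (2.0 * duration)

fs = 8000  # assumed sampling rate
def tone(amplitude):
    # 300 c.p.s. sine with a small phase offset so no sample lands on zero.
    return [amplitude * math.sin(2.0 * math.pi * 300.0 * n / fs + 0.5)
            for n in range(fs)]

quiet = cycle_count_frequency(tone(0.1), fs)
loud = cycle_count_frequency(tone(5.0), fs)
print(f"quiet: {quiet:.1f} c.p.s., loud: {loud:.1f} c.p.s.")
```

Both estimates agree, which is why a detector of this type responds to where the spectrum sits rather than to how strong it is; the strength is recovered separately by the broadband detector 176.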

Describing now the synthesizer 34, it will help to provide a block breakdown of the general block shown in FIGURE 1. This has been set forth in FIGURE 11. As can be seen from FIGURE 11, the synthesizer 34 includes a broadband source that provides sounds that are generated separately and independently of the measures. The broadband source is connected to a tunable band-pass filter 182 having the terminals 170 and 174 connected thereto in order to provide frequency control and pattern width control, respectively. The filter 182 is connected to an equalizer 184 and the equalizer in turn is connected to a multiplier 186 having the terminal 178 connected thereto. The equalizer may or may not be present and can be located after the multiplier as well as before it.

The controlled sound subsequent to the controllable filter 182 produces a spatial pattern similar to that due to the original speech from which pattern measures were derived. Thus, the output from the multiplier 186 when fed to a speaker 188 provides an intelligible reproduction of the original sound.
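The signal path just described (broadband source, tunable band-pass filter, multiplier) can be sketched in discrete time. This is an illustrative stand-in, not the vacuum-tube ring described with FIGURE 12: it assumes uniform white noise as the source, a two-pole digital resonator whose center frequency follows a centroid control, and a per-sample gain set by the area control. The sampling rate, the fixed ratio `q`, and the name `synthesize` are assumptions, and the equalizer and width control are omitted.

```python
import math
import random

def synthesize(centroid_hz, area, fs=8000, q=2.0, seed=0):
    """Filter white noise with a tunable resonator, then scale by the area control."""
    rng = random.Random(seed)
    y1 = y2 = 0.0
    out = []
    for f0, a in zip(centroid_hz, area):
        r = math.exp(-math.pi * (f0 / q) / fs)           # pole radius from bandwidth f0/q
        c1 = 2.0 * r * math.cos(2.0 * math.pi * f0 / fs)  # sets the center frequency
        c2 = -r * r
        x = rng.uniform(-1.0, 1.0)                       # broadband source
        y = (1.0 - r) * x + c1 * y1 + c2 * y2            # tunable band-pass filter
        y2, y1 = y1, y
        out.append(a * y)                                # multiplier driven by area
    return out

n = 8000
# Centroid control gliding from 500 to 2000 c.p.s. at constant area.
glide = synthesize([500.0 + 1500.0 * k / n for k in range(n)], [1.0] * n)
print(len(glide), max(abs(v) for v in glide))
```

Raising the centroid control shifts the noise band upward, and setting the area control to zero silences the output, matching the two control roles the text assigns to terminals 170 and 178.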

It can be pointed out that the sound source may have a variety of qualities. In one, pure broadband noise can be applied to the controllable or tunable filter 182. There can be embedded in this noise a few harmonics of low frequency periodic sound in order to provide a voiced quality to the speech. Presence of low frequency periodic sounds can in part be determined by the centroid measure, there being such sounds more often on the average for large x̄ values (low average pattern frequency content).

Other approaches to the design of the controllable filter exist from that depicted in FIGURE 12. The primary requirement is that sufficiently rapid variations be possible. The circuit shown in FIGURE 12, which has been used with success, consists of a feedback ring of five triode vacuum tubes 190 interspersed with five pentode vacuum tubes 192. Grid bias control of the pentodes 192 varies the loop gain which in turn varies the center frequency of a pass band in part of the ring through which the separately generated noise passes. The triodes 190 act as cathode followers to avoid screen degeneration of the pentodes so that larger maximum loop gain can be achieved. The electronic multiplier 186, as already indicated, is used to give gain variations according to the area measure. A set of simple filter functions in cascade provides whatever frequency equalization may be required so that centroid control varies only the average frequency of the generated sound and not its amplitude (as measured in the original analysis system). Although not shown in FIGURE 12, FIGURE 11 indicates that two electronically controlled filters in cascade or in parallel can be used. With two bands, purposeful band misalignment can be used for bandwidth control. In this way, the pattern width measure can be employed in the synthesis in addition to area and centroid.

By way of recapitulation, important features included in the present invention embrace a set of filters which are employed in the frequency spreader 26 whose individual pass bands are distributed uniformly along a logarithmic frequency scale. Additionally, these filters are extensively overlapping and are also asymmetric with a sharper cutoff below the center frequency than above it.

Further, a detection system is utilized in each filter in the frequency spreader 26 which may or may not have lead compensation. Approximately equivalent behavior may be realized when filtering and possibly also lead compensation is applied only to certain slowly varying measures rather than to each of the e signals in FIGURE 1. With response intensity presumed to be pattern area, overall steady state behavior emulates the human threshold curve in hearing.

A system of slowly varying measures is employed in the invention, especially the central-moment system for specifying the intelligibility in speech sounds. Several schemes for extracting these measures have been described.
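The central-moment measures may be illustrated for a single spatial pattern. In the sketch below the pattern is a list of post detection filter outputs indexed by channel number starting at one; area is taken as their sum, centroid as the first moment divided by the area, and width as the square root of the second central moment. The definition of width as a root-mean-square spread is an assumption made for illustration, as is the function name.

```python
import math


def pattern_measures(pattern):
    """Return (area, centroid, width) of a spatial pattern of non-negative values."""
    area = sum(pattern)
    if area <= 0.0:
        return 0.0, 0.0, 0.0            # silent pattern: no meaningful position
    # First moment about the origin, normalized by area.
    centroid = sum(i * v for i, v in enumerate(pattern, start=1)) / area
    # Second moment about the centroid, normalized by area.
    variance = sum(v * (i - centroid) ** 2
                   for i, v in enumerate(pattern, start=1)) / area
    return area, centroid, math.sqrt(variance)


area, centroid, width = pattern_measures([0.0, 1.0, 2.0, 1.0, 0.0])
print(area, centroid, round(width, 4))   # 4.0 3.0 0.7071
```

Three numbers thus stand in for the whole set of channel outputs, which is the sense in which the measures are said to specify the pattern compactly.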

Slowly varying measures, especially the area-centroid-width combination, can be used for speech recognition and classification in my system so as to implement, for example, automatic recognition of spoken words.

The slowly varying measures referred to herein are used for controlling a signal consisting of a band of noise derived from the source 180 with or without embedded harmonic components, which signal produces a similar set of measures in an analog ear as does the original speech. The resulting signal produces synthetic speech and comprises a narrow band speech system.

The use of my invention for bandwidth compression has been emphasized here. It is pointed out that study of the several pattern parameters also implements research into automatic speech recognition because each word associates with a unique finite-duration set of pattern measures. My analyzer simulates some of the functional processing mechanisms of the human ear-brain system, but not all. In fact, for bandwidth compression as opposed to recognition, it is important not to model all of these mechanisms. The human observer listens to synthetic speech and effects the usual kinds of processing in his own brain; if certain of these are also involved in the production of synthetic speech, a repetition of processing would occur, once in the synthesizer, and again in the observer's brain. Synthetic speech produced using some of these duplicative mechanisms would therefore not sound natural.

I have studied some of the processing mechanisms which are appropriate for use in speech recognition but which are not employed in speech synthesis. One of these is mutual inhibition. The detectors in FIGURE 1 are caused to inverse gain control both themselves and their neighbors. The result is partly a controlled gain characteristic, but more importantly, there results a sharpening of spatial patterns so as to make desirable speech sounds easier to recognize in a noisy background. Also, there are certain adaptive mechanisms which adjust the gain of the channel which carries the centroid measure so that varying environmental background noise will not prevent use of fixed memory devices in automatic recognition. In addition, I believe that pattern width is augmented by rate of change of pattern area. This provides for a sustained sound which changes abruptly from one level to another a consonant-like distinction.

Although the functioning of my speech processing system is believed readily understandable from the information herein presented, nonetheless reference can be made to two co-pending patent applications for additional clarification. The first of these applications is entitled Electronic Analog Ear, Ser. No. 245,697, filed Dec. 19, 1962, in the names of John L. Stewart, the present applicant, William F. Caldwell, and Ewald Glaesser, now Patent No. 3,294,909. The second application having a bearing on the present subject matter is my application for Sound Analyzing System, Ser. No. 310,394, filed Sept. 20, 1963, now Patent No. 3,325,597.

The analyzing filter in my invention was designed to duplicate some of the characteristics of the human cochlea, or inner ear. Instead of the analyzing filters described here, it is permissible to use an analog ear system of filters as described in the patent application entitled Electronic Analog Ear, cited above. The major functional difference between the present filters and those provided by an analog cochlea is that in the latter case the filters are arranged along a delay line so that a signal of a given frequency experiences increasing phase lag as it excites filters of progressively lowering frequency. When using an analog cochlea, it may be necessary to adjust the common filter as compared with that shown in FIGURE 3; whereas the gains of the several analyzing filters in FIGURE 1 are equal, those of filters in the analog cochlea may taper somewhat.

It will, of course, be understood that various changes may be made in the form, details, arrangements and proportions of the parts without departing from the scope of my invention as set forth in the appended claims.

What is claimed is:

1. A speech band-width compression system comprising:

(a) means for converting speech into an input electrical signal,

(b) a plurality of band-pass filters connected to said converting means for continuously covering the speech spectrum,

(c) said filters having extremely overlapping frequency bands such that any frequency component of said electrical signal falls within the frequency band of each of more than two of said filters.

2. A speech bandwidth compression system in accordance with claim 1 in which:

(a) said filters are disposed logarithmically as to frequency.

3. A speech bandwidth compression system in accordance with claim 2 in which:

(a) said filters have a center-to-bandwidth ratio at the half-power level of approximately two.

4. A speech bandwidth compression system in accordance with claim 1 including:

(a) means connected to said filters for providing at least one slowly varying measure representative of intelligence contained in said speech.

5. A speech band-width compression system comprising:

(a) means for converting speech to be processed into an input electrical signal,

(b) a plurality of bandwidth filters connected to said converting means for continuously covering the speech spectrum,

(c) said filters being disposed logarithmically as to frequency and having extremely overlapping frequency bands so that any frequency component in the signal being processed contributes appreciably to the output from more than two of said filters, and

(d) means connected to the outputs from said filters for changing said outputs to at least one slowly varying measure.

6. A speech bandwidth compression system in accordance with claim 5 including:

(a) means for extracting a spatial pattern signal from said last-mentioned means, and

(b) means connected to said extracting means for synthesizing the extracted signal to reconstitute said speech into intelligible form.

7. A speech bandwidth compression system comprising:

(a) means for converting speech into an input electrical signal;

(b) a plurality of band-pass filters connected to said converting means for continuously covering the speech spectrum;

(c) said filters being disposed logarithmically as to frequency and having extremely overlapping frequency bands such that any frequency component of said electrical signal falls within the frequency band of each of more than two of said filters;

(d) means for rectifying the various outputs from said filters, and

(e) a plurality of post detection filters for providing a set of relatively slowly varying waveforms.

8. A speech bandwidth compression system comprising:

(a) means for converting speech into an input electrical signal;

(b) means in circuit with said converting means including a plurality of band-pass filters disposed logarithmically as to frequency and having extremely overlapping frequency bands such that any frequency component of said electrical signal falls within the frequency band of each of more than two said filters;

(c) means connected to said last-mentioned means for providing a set of relatively slowly varying waveforms;

(d) extractor means for providing spatial pattern signals from said waveforms, and

(e) means for synthesizing said pattern signals to provide a reconstituted speech signal of an intelligible character.

9. A speech bandwidth compression system comprising:

(a) an input device for providing a transduced electrical signal from the speech to be processed;

(b) a common filter connected to said input device for providing a frequency characteristic simulating the standard human audiometric curve of threshold;

(c) a frequency spreader connected to said common filter including a plurality of band-pass filters disposed logarithmically as to frequency and having exceedingly overlapping frequency bands;

(d) a rectifier connected to each filter of said frequency spreader;

(e) a low-pass post detection filter connected to each rectifier so that the bandwidths of the voltage waveforms delivered thereto are materially decreased;

(f) means for extracting spatial pattern signals from the output signals from said post detection filters, and

(g) means for synthesizing said pattern signals to provide a reconstituted speech signal of an intelligible character.

10. A speech bandwidth compression system in accordance with claim 9 in which said extracting means includes:

(a) two tapered resistance arrays and one equal resistance array,

(b) said tapered arrays providing a measure involving centroid and area, and said equal array providing a measure involving only area.

11. A speech bandwidth compression system in accordance with claim 10 in which said extracting means further includes:

(a) a potentiometer in circuit with said tapered arrays for providing an electrical signal representing only centroid.

12. A speech bandwidth compression system in accordance with claim 9 in which said extracting means includes:

(a) a phase comparison means for extracting the centroid value.

13. A speech bandwidth compression system in accordance with claim 12 in which said phase comparison means includes:

(a) a resistor-diode series circuit for producing logarithmically proportional voltages, and

(b) a chopper for time sharing said resistor-diode circuit.

14. A speech bandwidth compression system in accordance with claim 9 in which said extracting means includes:

(a) a multiplier for each of said post detection filters having one input connected to the output of the post detection filter with which it is associated;

(b) a broadband noise source;

(c) a filter for each multiplier having its input connected to said broadband noise source and its output connected to the other input of one of said multipliers;

(d) means for adding together the outputs from said multipliers to provide a single waveform having a frequency spectrum whose shape is similar to the spatial pattern array of the voltages from said post detection filters.

15. A speech bandwidth compression system in accordance with claim 14 including:

(a) a limiter-discriminator detecting circuit connected to said adding means, and

(b) a low-pass filter connected to said limiter-discriminator detecting circuit for providing an output signal having a value representing the centroid.

16. A speech bandwidth compression system in accordance with claim 15 including:

(a) an averaging detector connected to said adding means for providing an output signal having a value representing area.

17. A speech bandwidth compression system in accordance with claim 16 including:

(a) an averaging detector connected to said limiter-discriminator circuit for producing an output signal having a value representing width.

18. A speech bandwidth compression system in accordance with claim 17 in which said synthesizing means includes:

(a) a broadband noise source;

(b) a tunable band-pass filter connected to said last-mentioned broadband noise source;

(c) said tunable band-pass filter also being connected to said low-pass filter and said last-mentioned averaging detector;

(d) a multiplier having one input connected to the output from said tunable band-pass filter and its other input connected to said first mentioned averaging detector.

19. A speech bandwidth compression system in accordance with claim 18 including:

(a) a speaker connected to the output of said multiplier.

20. A speech bandwidth compression system comprising:

(a) means for converting speech into an input electrical signal;

(b) a plurality of band-pass filters connected to said converting means for continuously covering the speech spectrum;

(c) said filters being disposed logarithmically as to frequency and having extremely overlapping frequency bands;

(d) means for rectifying the various outputs from said filters;

(e) a plurality of post detection filters for providing a set of relatively slowly varying waveforms;

(f) a common filter interposed between said converting means and said band-pass filters, the frequency characteristic of said common filter simulating the standard human audiometric curve of threshold having a mathematical approximation for relative power equal to (f/500) J 1 1 1+(f/ +(f/ where f is frequency in cycles per second.

21. A speech bandwidth compression system comprising:

(a) means for converting speech into an input electrical signal;

(b) a plurality of band-pass filters connected to said converting means for continuously covering the speech spectrum;

(c) said filters being disposed logarithmically as to frequency and having extremely overlapping frequency bands such that any frequency component of said electrical signal falls within the frequency band of each of more than two of said filters;

(d) means for rectifying the various outputs from said filters, and

(e) a plurality of post detection filters for providing a set of relatively slowly varying waveforms, the bandwidth of each of said post detection filters being approximately 15 cycles per second.

22. A speech bandwidth compression system comprising:

(a) means for converting speech into an input electrical signal;

(b) means in circuit with said converting means including a plurality of band-pass filters disposed logarithmically as to frequency and having overlapping frequency bands;

(c) means connected to said last-mentioned means for providing a set of relatively slowly varying waveforms;

(d) extractor means for providing spatial pattern signals from said waveforms, said spatial pattern signals being in accordance with area and centroid, and

(e) means for synthesizing said pattern signals to provide a reconstituted speech signal of an intelligible character.

23. A speech bandwidth compression system comprising:

(a) means for converting speech into an input electrical signal;

(b) means in circuit with said converting means including a plurality of band-pass filters disposed logarithmically as to frequency and having overlapping frequency bands;

(c) means connected to said last-mentioned means for providing a set of relatively slowly varying waveforms;

(d) extractor means for providing spatial pattern signals from said waveforms, said spatial pattern signals being in accordance with area, centroid and width, and

(e) means for synthesizing said pattern signals to provide a reconstituted speech signal of an intelligible character.

24. A speech bandwidth compression system comprising:

(a) means for converting speech into an input electrical signal;

(b) a plurality of band-pass filters connected to said converting means for continuously covering the speech spectrum;

(c) said band-pass filters being disposed logarithmically as to frequency and having extremely overlapping frequency bands such that any frequency component of said electrical signal falls within the frequency band of each of more than two of said filters;

(d) a common filter interposed between said converting means and said band-pass filters;

(e) means for rectifying the various outputs of said band-pass filters; and

(f) a plurality of post detection filters for providing a set of relatively slowly varying waveforms.

References Cited

UNITED STATES PATENTS

2,868,882 1/1959 Di Toro 179-15.55

2,340,364 2/1944 Bedford 179-1

2,817,711 12/1957 Feldman 179-15.55

ROBERT L. GRIFFIN, Primary Examiner.

JOHN W. CALDWELL, Examiner.

J. T. STRATMAN, Assistant Examiner.

Patent Citations
Cited Patent | Filing date | Publication date | Applicant | Title
US2340364 * | Aug 22, 1942 | Feb 1, 1944 | Rca Corp | Audio transmission circuit
US2817711 * | May 10, 1954 | Dec 24, 1957 | Bell Telephone Labor Inc | Band compression system
US2868882 * | Jan 12, 1953 | Jan 13, 1959 | Itt | Communication system
Referenced by
Citing Patent | Filing date | Publication date | Applicant | Title
US3469034 * | May 23, 1966 | Sep 23, 1969 | Santa Rita Technology Inc | Neural-like analyzing system
US3483325 * | Apr 22, 1966 | Dec 9, 1969 | Santa Rita Technology Inc | Speech processing system
US4941178 * | May 9, 1989 | Jul 10, 1990 | Gte Laboratories Incorporated | Speech recognition using preclassification and spectral normalization
US6138089 * | Mar 10, 1999 | Oct 24, 2000 | Infolio, Inc. | Apparatus system and method for speech compression and decompression
Classifications
U.S. Classification: 704/203, 704/246
International Classification: G10L11/00
Cooperative Classification: G10L25/00, H05K999/99
European Classification: G10L25/00