US 3925616 A
Apparatus, for determining the glottal waveform of a subject, is disclosed which comprises a hard-walled, uniform, cylindrical, acoustic waveguide having a cross-sectional area commensurate with that of the vocal tract of the subject. The waveguide is terminated at one end to substantially reduce acoustic wave reflections. A microphone is positioned within the wall of the waveguide to detect and convey to suitable instrumentation the glottal waveform of the subject.
Description
United States Patent [19]
Sondhi
Dec. 9, 1975
APPARATUS FOR DETERMINING THE GLOTTAL WAVEFORM
Inventor: Man Mohan Sondhi, Berkeley Heights, N.J.
 Assignee: Bell Telephone Laboratories,
Incorporated, Murray Hill, NJ.
[22] Filed: Apr. 30, 1974
[21] Appl. No.: 465,604
U.S. Cl.: 179/1 SC
Int. Cl.: G10L 1/04
Field of Search: 179/1 SA, 1 SC; 181/34, 181/40
References Cited
UNITED STATES PATENTS
1,800,234  4/1931  Tuttle  181/34
2,508,581  5/1950  Morrow  181/34
2,572,547  10/1951  Webb  181/34
OTHER PUBLICATIONS
Dudley, H., Speech Fundamentals..., J. of Aud. Eng. Soc., Vol. 3, 1955, p. 170.
Primary Examiner: Kathleen H. Claffy
Assistant Examiner: E. S. Kemeny
Attorney, Agent, or Firm: G. E. Murphy; W. Ryan
8 Claims, 4 Drawing Figures
APPARATUS FOR DETERMINING THE GLOTTAL WAVEFORM
BACKGROUND OF THE INVENTION
This invention pertains to the measurement of acoustic waveforms and, more particularly, to the measurement of the glottal waveform.
In the normal production of speech, air is forced from the lungs through the trachea and into the throat cavity. The top of the throat cavity is surmounted by a structure known as the larynx which includes two lips of ligament and muscle; these are the vocal cords. The slitlike orifice that exists between the cords is called the glottis. Voiced sounds of speech, also known as phonation, are produced by vibratory action of the vocal cords. Phonation is accomplished when the subglottal pressure, i.e., the pressure below the glottis, is increased sufficiently to force apart the tensed vocal cords. As the air flow increases in volume through the resultant orifice, the local pressure is reduced and the vocal cords return to their initial position. As the cords draw together, the air flow is diminished and the local pressure approaches the subglottal value. The cycle is then repeated. The mass and compliance of the vocal cords and the subglottal pressure essentially determine the period of oscillation. They also determine the shape of the waveform within a period. Thus, the variable area orifice produced by the vibrating cords generates quasi-periodic pulses of air which excite the acoustic system above the vocal cords.
The measurement of these pulses of air flow through the glottis during phonation, i.e., the measurement of the glottal waveform, has been studied in considerable detail by acousticians. Since the quasi-periodic pulses emanating from the glottis provide the acoustic excitation of the vocal tract during voiced speech sounds, the nature of the glottal waveform is an important consideration in the design of speech synthesizers as well as in the formulation of models of speech production.
In view of the importance of the subject, numerous methods have been proposed and utilized to measure the glottal waveform. For example, the larynx can be viewed with a small mirror suitably positioned far back in the mouth. By illuminating the vocal cords with a high intensity light beam, it is possible with the appropriate apparatus to make movies of vocal cord motion. These movies may be processed to yield approximately four thousand samples per second of the area A_g(t) of the orifice between the vocal cords. This sampling rate is adequate for some applications. However, some amount of aliasing is to be expected, especially when the duty cycle of the glottal pulses is low (e.g., during phonation with high vocal effort). It is to be noted that, as a measurement of the glottal waveform, this technique can at best give an accurate estimate of A_g(t). The air flow, and thus the waveform, must be estimated from this.
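The aliasing remark can be illustrated with a minimal sketch. It assumes, purely for illustration, a raised-cosine pulse shape, a 125 Hz fundamental, and the function names shown; none of these come from the patent. It estimates what fraction of the harmonic power of a periodic pulse train lies above 2000 Hz, the Nyquist frequency for about four thousand samples per second, for several duty cycles.

```python
import cmath
import math

def pulse_period(duty, n=256):
    """One period of a pulse train; a raised-cosine pulse fills `duty` of the period.
    The pulse shape is an assumption made only for this illustration."""
    samples = []
    for i in range(n):
        x = i / n  # position within the period, 0 <= x < 1
        samples.append(0.5 * (1 - math.cos(2 * math.pi * x / duty)) if x < duty else 0.0)
    return samples

def power_fraction_above(duty, f0_hz=125.0, limit_hz=2000.0, n=256):
    """Fraction of harmonic (non-DC) power at harmonics above limit_hz."""
    period = pulse_period(duty, n)
    above = 0.0
    total = 0.0
    for k in range(1, n // 2):  # harmonic k sits at k * f0_hz
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(period))
        power = abs(coeff) ** 2
        total += power
        if k * f0_hz > limit_hz:
            above += power
    return above / total

for duty in (0.6, 0.4, 0.2):
    print(duty, power_fraction_above(duty))
```

Under these assumptions the fraction grows as the duty cycle shrinks, which is the direction of the effect described above; the absolute numbers depend entirely on the assumed pulse shape.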
Another method of estimating the orifice area is by transillumination of the glottis. If a light source is positioned on one side of the vocal cords and a photocell on the other, then the pattern of variations in the transmitted light may be recorded. Such a record bears a strong resemblance to the area function A_g(t). However, since a number of assumptions must be made in order to relate transmitted light intensity to the area of the orifice, the accuracy of this method is not readily verifiable.
Various other functions can be measured which have a resemblance to the waveform of air flow through the glottis, e.g., signals derived from the motion of the throat wall adjacent to the glottis, the attenuation of ultrasonic waves upon transmission through, or reflection from, the vocal cords, and the modulation of a high frequency alternating current due to the variable impedance of the vibrating vocal cords. However, the relationship of each of these functions to the glottal air flow is rather complex, and these methods cannot be expected to yield anything more than some gross features of the glottal pulses, e.g., their presence or absence, periodicity, etc.
There are also methods which estimate the glottal wave from the acoustic output at the subject's lips. Methods of this type employ inverse filtering in one form or another. The basic idea is to eliminate the influence of the vocal tract. Accordingly, an estimate of the transfer function of the vocal tract is made, a filter with a transfer function which is the inverse of the estimate is realized, and the speech output is filtered to obtain the glottal excitation. In most inverse filtering implementations, the pressure output at the lips is used, and the vocal tract transfer function is approximated by a product of three or four complex-conjugate pole-pairs and by corrections for higher order poles and for radiation. There being no unique decomposition of a speech signal into separate excitation and vocal-tract portions, the success or failure of the inverse filtering technique depends entirely upon the constraints and criteria imposed by the experimenter. These constraints and criteria are based upon a priori information about the nature of the speech production apparatus. While, in general, the a priori information may be reliable, it is important to keep this limitation in mind. For example, quite frequently it is not possible to decide whether a feature in the recovered waveform is to be attributed to the glottal source or to an improper adjustment of the inverse filter; thus, demanding a smooth output from the inverse filter may obliterate genuine oscillations in the glottal waveform.
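The inverse filtering idea can be sketched as follows, as an illustration only and not as any particular prior-art implementation. The sketch models the vocal tract as a discrete-time all-pole filter built from three complex-conjugate pole pairs and then applies the exact inverse. The sampling rate, formant frequencies, bandwidths, and function names are all assumptions, and the radiation and higher-pole corrections mentioned above are omitted.

```python
import math

def pole_pair(freq_hz, bw_hz, fs_hz):
    """Denominator [1, a1, a2] of one complex-conjugate pole pair (one resonance)."""
    r = math.exp(-math.pi * bw_hz / fs_hz)   # pole radius set by the bandwidth
    theta = 2.0 * math.pi * freq_hz / fs_hz  # pole angle set by the centre frequency
    return [1.0, -2.0 * r * math.cos(theta), r * r]

def convolve(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def tract_polynomial(formants, fs_hz):
    """Product of the pole-pair denominators for (frequency, bandwidth) pairs."""
    poly = [1.0]
    for freq_hz, bw_hz in formants:
        poly = convolve(poly, pole_pair(freq_hz, bw_hz, fs_hz))
    return poly

def all_pole_filter(x, a):
    """Model tract: y[n] = x[n] - sum over k >= 1 of a[k] * y[n-k]."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

def inverse_filter(y, a):
    """Inverse of the model tract: x[n] = sum over k of a[k] * y[n-k]."""
    return [sum(a[k] * y[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(y))]

FS_HZ = 10000.0                                             # assumed sampling rate
FORMANTS = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)]  # assumed values
a = tract_polynomial(FORMANTS, FS_HZ)
excitation = [1.0 if n % 80 == 0 else 0.0 for n in range(400)]  # toy pulse train
speech = all_pole_filter(excitation, a)
recovered = inverse_filter(speech, a)
print(max(abs(p - q) for p, q in zip(excitation, recovered)))
```

The recovery is essentially exact here only because the inverse uses the very polynomial that generated the signal. In practice the polynomial must be estimated from the speech itself, which is where the experimenter's constraints and criteria enter.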
It is an object of this invention to overcome the limitations of the prior art methods and apparatus described above and to determine the glottal waveform in a simple, direct, and inexpensive manner.
SUMMARY OF THE INVENTION
This and other objects of this invention are accomplished by utilizing apparatus responsive to the acoustic output at the lips. However, this invention differs from, for example, the inverse filtering method described above in that instead of post facto removal of the influence of the vocal tract from the speech utterance, I ensure that the vocal tract transmits the glottal wave with minimal alteration or distortion, thereby obviating the need for inverse filtering.
More particularly, I utilize a uniform, hard-walled, cylindrical, acoustic waveguide, having a cross-sectional area commensurate with that of the vocal tract of the subject. The waveguide, or tube, is matched at one end to prevent reflection of the acoustic waves transmitted down the tube by the subject. A small probe microphone positioned within the wall of the waveguide detects the acoustic waveform.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 depicts the apparatus of this invention used to measure the glottal waveform of a subject;
FIG. 2 depicts the glottal waveform detected by the apparatus of this invention;
FIG. 3A depicts the magnitude of the Fourier transform of the waveform of FIG. 2 as a linear function of frequency; and
FIG. 3B depicts the magnitude of the Fourier transform of the waveform of FIG. 2 as a logarithmic function of frequency.
DETAILED DESCRIPTION
Consider, for example, an individual phonating the neutral vowel, /ə/ as in "ado". The vocal tract may be accurately approximated in such a condition by a uniform, hollow cylinder or tube. However, at the end of the tube, i.e., at the lips, there is a termination with a very low impedance, the radiation load, which may, practically, be assumed to be equal to zero. Thus, even presuming that the vocal tract does not alter the glottal waveform, the waveform measured at the lips will be significantly distorted because of the existence of this unmatched termination.
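The effect of the termination can be sketched with standard lossless plane-wave tube acoustics. The function names, the 0.17 m tract length, and the 350 m/s sound speed below are assumptions for illustration, not values from the patent. A zero-impedance load reflects the pressure wave completely with inverted sign, a matched load reflects nothing, and a uniform tube closed at the glottis and open at the lips resonates at odd multiples of the quarter-wavelength frequency.

```python
def reflection_coefficient(z_load, z_line):
    """Pressure reflection coefficient of a line of impedance z_line ending in z_load."""
    return (z_load - z_line) / (z_load + z_line)

def open_tube_resonances_hz(length_m, sound_speed_m_s=350.0, count=3):
    """Resonances of a uniform tube closed at one end and open (zero load) at the other."""
    return [(2 * n - 1) * sound_speed_m_s / (4.0 * length_m)
            for n in range(1, count + 1)]

Z0 = 1.0  # characteristic impedance in arbitrary units (hypothetical)
print(reflection_coefficient(0.0, Z0))  # idealized radiation load: -1.0, total reflection
print(reflection_coefficient(Z0, Z0))   # matched load: 0.0, no reflection
print(open_tube_resonances_hz(0.17))    # assumed 17 cm tract length
```

With the unmatched termination, the repeated reflections produce these resonances and superimpose them on the glottal pulses. With a matched termination there is nothing to reflect, which is the condition the apparatus aims to create.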
In accordance with this invention, the acoustic termination at the lips is matched, thereby preventing reflections and eliminating distortions of the generated glottal waveform. In FIG. 1, for example, a subject is illustrated phonating the neutral vowel /ə/ into a reflectionless acoustic waveguide 12. Acoustic waveguide 12, preferably of uniform cylindrical cross section, has an internal bore whose area is commensurate with, i.e., approximately the same as, that of the vocal tract and is of a length substantially greater than the length of the vocal tract. Waveguide or tube 12 is coupled to the subject's lips by mouthpiece 11 which, advantageously, is readily removable from tube 12, for sanitary reasons, and also is of a shape and size to conveniently adapt to the lips of the subject. To maintain the desired acoustic impedance match at the subject's lips, the passageway through mouthpiece 11 should be of substantially the same cross-sectional area and configuration as the internal bore of waveguide or tube 12. Tube 12 is closed, at the end opposite to that of the lips of the subject, by cap 14. Mounted within waveguide 12, at the opposite end from that of the subject's lips, is an acoustic cone or wedge 13 which provides a matched, i.e., reflectionless, termination. Probe microphone 15, which penetrates through the wall of tube 12, is utilized to detect the acoustic waveform, and conveys this detected signal to a suitable instrument 16. Assuming that the cross-sectional area of acoustic waveguide 12 and the area of the vocal tract of the speaker are approximately of the same magnitude, the glottal source sees a matched load. Thus, there is no distortion of the glottal waveform measured at microphone 15, either by the vocal tract or by radiation loading.
For most subjects, tube 12 should be a uniform, hollow, hard-walled, e.g., brass, cylinder, approximately 3 to 6 feet long with an inside diameter of approximately 1 inch. The corresponding area of the cylinder bore thus should be approximately 3 to 5 square centimeters. Acoustic wedge termination 13 may be made of fiberglass insulating material; it should extend for about one-third the length of the tube, from the capped end, and be approximately conical in shape. In order to prevent a buildup of acoustic pressure within the tube, a small orifice 18 may be provided in the capped end of tube 12. A small diameter, for example, one-quarter inch, electret microphone 15 positioned in the wall of tube 12, approximately one-third the length of the tube from the subject's lips, has performed satisfactorily. Instrument 16, connected to microphone 15, may be an oscilloscope or other well-known measuring or recording apparatus. Stand 17 assists in the support of tube 12.
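As a quick arithmetic check on the stated dimensions, the following sketch converts the bore diameter to an area and the tube length to metres. It also estimates the frequency below which only plane waves propagate in a rigid circular tube, using the standard first cross-mode cutoff 1.8412·c/(π·d). The function names and the 343 m/s sound speed are assumptions, not values from the patent.

```python
import math

INCH_M = 0.0254
FOOT_M = 0.3048

def bore_area_cm2(diameter_in):
    """Cross-sectional area of a circular bore, in square centimetres."""
    radius_cm = diameter_in * INCH_M * 100.0 / 2.0
    return math.pi * radius_cm ** 2

def first_cross_mode_hz(diameter_in, sound_speed_m_s=343.0):
    """Cutoff of the first non-planar mode of a rigid circular duct."""
    diameter_m = diameter_in * INCH_M
    return 1.8412 * sound_speed_m_s / (math.pi * diameter_m)

def length_m(length_ft):
    """Tube length converted from feet to metres."""
    return length_ft * FOOT_M

print(bore_area_cm2(1.0))            # about 5.07 square centimetres
print(first_cross_mode_hz(1.0))      # roughly 7.9 kHz under the assumed sound speed
print(length_m(3.0), length_m(6.0))  # 3 to 6 feet expressed in metres
```

A 1 inch bore thus falls at the upper end of the stated 3 to 5 square centimeter range, and under these assumptions the plane-wave picture of the tube holds up to several kilohertz.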
In order to avoid the vagaries of analog tape recorders, the output of microphone 15 was digitally recorded in one implementation of this invention. Also, to avoid erratic disturbances due to breathing, accidental motions of the apparatus, etc., a high-pass filter with a low frequency cutoff at approximately twenty hertz was inserted between the microphone and the digital recorder. The high-pass filter was inserted for convenience only. As long as the microphone has high fidelity down to the lowest expected frequencies, the filter need not be used. If so desired, the low frequency, d.c. content, of the waveform may be simultaneously recovered at orifice 18 when the output of microphone 15 is being filtered.
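The patent does not say how the high-pass filter was realized. The following is only a minimal sketch of one common choice, a first-order digital high-pass with a 20 Hz cutoff; the function name and the 10 kHz sampling rate are hypothetical.

```python
import math

def high_pass(samples, cutoff_hz, fs_hz):
    """First-order high-pass: y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # equivalent RC time constant
    dt = 1.0 / fs_hz
    alpha = rc / (rc + dt)
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out

FS_HZ = 10000.0  # assumed sampling rate
drift = high_pass([1.0] * 5000, 20.0, FS_HZ)  # a constant offset decays away
tone = high_pass([math.sin(2.0 * math.pi * 200.0 * n / FS_HZ) for n in range(5000)],
                 20.0, FS_HZ)                 # a 200 Hz tone passes nearly unchanged
print(abs(drift[-1]), max(abs(v) for v in tone[-1000:]))
```

Such a filter removes slow disturbances but also removes the d.c. component of the flow, which is why the text notes that the d.c. content would have to be recovered separately.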
FIG. 2 shows a display of a glottal waveform obtained by the use of the apparatus of this invention. Except for the trivial filtering mentioned, the signal displayed is as recorded, with no processing whatsoever. To the trained eye of an acoustician, it would appear that the influence of the vocal tract and any radiation mismatch is completely absent. A more convincing proof of this is provided by the magnitude of the Fourier transform of the measured glottal waveform. This is shown in the plot of FIG. 3A, on a linear frequency scale, and on a logarithmic frequency scale in FIG. 3B. It is clear from these plots that there is no discernible trace of the formant structure of voiced sounds; thus, the characteristic influence of the vocal tract is absent.
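A magnitude spectrum of the kind plotted in FIGS. 3A and 3B could be computed from a digitally recorded waveform along the following lines. This is a minimal sketch using a direct discrete Fourier transform; the function name, the sampling rate, and the test tone are hypothetical, and the recorded waveform itself is not reproduced here.

```python
import cmath
import math

def magnitude_spectrum_db(signal, fs_hz):
    """Return (frequency_hz, magnitude_db) pairs for bins 0 .. n/2 of a direct DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(signal))
        magnitude = abs(coeff) / n
        # Clamp very small magnitudes so the logarithm stays finite.
        spectrum.append((k * fs_hz / n, 20.0 * math.log10(max(magnitude, 1e-12))))
    return spectrum

FS_HZ = 8000.0  # assumed sampling rate
tone = [math.sin(2.0 * math.pi * 500.0 * i / FS_HZ) for i in range(256)]
peak = max(magnitude_spectrum_db(tone, FS_HZ), key=lambda pair: pair[1])
print(peak[0])  # frequency of the strongest bin for the test tone
```

Plotting the same pairs against frequency or against the logarithm of frequency gives the two views described. Formant structure, if present, would appear as peaks in the envelope of the harmonics.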
The above-described method of measuring or obtaining the glottal waveform is obviously inexpensive, requiring little in the way of apparatus. For example, comparing this method with what is considered the most accurate of the methods heretofore available, i.e., inverse filtering, it is clear that inverse filtering requires a computer, for simulated filters, or a complete set of fixed and variable filters. Also, the inverse filtering method requires laborious and skillful manual adjustment of the filter apparatus, or the running of a rather expensive computer program to make the adjustments automatically. In the instant invention, the only adjustment necessary is to ensure that the vocal tract of the subject has the proper neutral configuration. This may be accomplished by the subject observing the detected glottal waveform on an oscilloscope connected to microphone 15. In an extremely short period of time, the subject learns to manipulate his vocal tract to eliminate oscillations at formant frequencies in the observed glottal waveform which are due to constrictions in the vocal tract. Furthermore, inverse filtering is rather sensitive to noise in the input waveform because the nature of the signal processing enhances the noise in various frequency ranges. Consequently, speech recordings for this purpose must be made in a studio or anechoic chamber. In the instant invention, since microphone 15 is completely enclosed, it is immune to noise in the environment.
What is claimed is:
1. Apparatus for detecting the glottal waveform comprising:
an acoustic waveguide having a cross-sectional area substantially equal to that of the human vocal tract, said waveguide terminated at one end to substantially reduce acoustic wave reflections and adapted at the other end for physical coupling to a subject's lips; and
means positioned in the wall of said waveguide for detecting said glottal waveform.
2. The apparatus defined in claim 1 wherein the cross-sectional area of said waveguide is approximately 3 to 5 square centimeters.
3. The apparatus defined in claim 2 wherein said means for detecting is an electret microphone.
4. Apparatus for determining the glottal waveform comprising:
a cylindrical hollow tube having a cross-sectional area commensurate with that of the human vocal tract, the first end of said tube terminated to substantially reduce acoustic wave reflections, the second end of said tube acoustically impedance matched to the human lips and receptive to vocal excitations;
means positioned in the wall of said tube for detecting said glottal waveform.
5. The apparatus defined in claim 4 wherein the cross-sectional area of said tube is approximately 3 to 5 square centimeters.
6. The apparatus defined in claim 5 wherein said means for detecting is an electret microphone.
7. Apparatus for determining the glottal waveform of a subject comprising:
a hard-walled, uniform, cylindrical, acoustic waveguide having a cross-sectional area commensurate with that of the vocal tract of said subject, said waveguide terminated at one end to substantially reduce acoustic wave reflections and adapted at the other end to receive vocal excitations from said subject; and
an electret microphone positioned in the wall of said waveguide for detecting the glottal waveform of said subject.
8. The method of determining the glottal waveform comprising the steps of:
phonating the neutral vowel /ə/ into a hard-walled, uniform, cylindrical, hollow tube having a cross-sectional area commensurate with that of the human vocal tract, said waveguide terminated at one end to substantially reduce acoustic wave reflections; and
detecting said glottal waveform with a microphone positioned in the wall of said tube.