US 3855418 A
A method and apparatus for indicating emotional stress in speech by detecting the presence of vibratto or rapid modulation of the phonation constituent within the speech signal envelope. The modulation is emphasized by rectification, smoothing, and time and amplitude discrimination of the speech wave form and is compared to a selected voltage level to produce a series of uniform pulses, the number of which is indicative of the magnitude of vibratto content.
Description (OCR text may contain errors)
United States Patent 3,855,418
Fuller, Dec. 17, 1974
Inventor: Fred Fuller, 4450 Park St., Chevy Chase, Md. 20014
Filed: Dec. 1, 1972
Appl. No.: 311,392
Primary Examiner: David L. Stewart
Attorney, Agent, or Firm: Fidelman, Wolffe, Leitner & Hiney
U.S. Cl.: 179/1 SA, 179/1 SP
Int. Cl.: G10l 1/04
Field of Search: 179/1 SA, 1 SB, 1 VS, 15.55 R, 1 SP; 128/206; 35/21
13 Claims, 8 Drawing Figures
METHOD AND APPARATUS FOR PHONATION ANALYSIS LEADING TO VALID TRUTH/LIE DECISIONS BY VIBRATTO COMPONENT ASSESSMENT
BACKGROUND OF THE INVENTION
The present invention relates generally to voice signal analysis systems and more specifically to a method and apparatus for detecting emotional stress within a voice pattern. The presence of an emotional state will be used to determine the truthfulness of a response to questions asked by a skilled interrogator.
DESCRIPTION OF THE PRIOR ART
It has long been known that the voice may be, and often is, used to convey the emotions of the speaker. The emotional state of the speaker produces readily observable variations in the measurable parameters of the voice.
Speech is the acoustic energy response of: (a) the voluntary motions of the vocal cords and the vocal tract, which consists of the throat, the nose, the mouth, the tongue, the lips, and the pharynx, and (b) the resonances of the various openings and cavities of the human head. The primary source of speech energy is excess air under pressure, contained in the lungs. This air is allowed to flow out of the mouth and nose under muscular control, and the flow is modulated by the speaker in a variety of ways.
The major source of modulation is the vibration of the vocal cords. This vibration produces the major component of the voiced speech sounds, such as those required when pronouncing the vowel sounds in a normal manner. These voiced sounds, formed by the buzzing action of the vocal cords, contrast with the voiceless sounds, such as the letter "s" or the letter "f", produced by the nose, tongue and lips. This action of voicing is known as phonation.
The basic buzz or pitch frequency, which establishes phonation, is different for men and women. The vocal cords of a typical adult male vibrate or buzz at a frequency of about 120 Hz, whereas for women this basic rate is approximately an octave higher, near 250 Hz. The basic pitch pulses of phonation contain many harmonics and overtones of the fundamental rate, in both men and women.
The vocal cords are capable of a variety of shapes and motions. During the process of simple breathing, they are involuntarily held open, and during phonation they are brought together. As air is expelled from the lungs at the onset of phonation, the cords vibrate back and forth, alternately closing and opening. Current physiological authorities hold that the muscular tension and the effective mass of the cords are varied by learned muscular action. These changes strongly influence the oscillating or vibrating system.
Certain physiologists consider that phonation is established by or governed by two different structures in the pharynx: the vocal cord muscles and a mucous membrane called the conus elasticus. These two structures are acoustically coupled together at a mutual edge, within the pharynx, and cooperate to produce two different modes of vibration.
In one mode, which seems to be an emotionally stable or non-stressful timbre of voice, the conus elasticus and the vocal cord muscle vibrate as a unit, in synchronism. Phonation in this mode sounds "soft" or "mellow" and few overtones are present.
In the second mode, a pitch cycle begins with a subglottal closure of the conus elasticus. This membrane is forced upward toward the coupled edge of the vocal cord muscle in a wave-like fashion, by air pressure being expelled from the lungs. When the closure reaches the coupled edge, a small puff of air explosively occurs, giving rise to the open phase of vocal cord motion. After the explosive puff of air has been released, the subglottal closure is pulled shut by a suction which results from the aspiration of air through the glottis. Shortly after this, the vocal cord muscles also close. Thus, in this mode, the two masses tend to vibrate in opposite phase. The result is a relatively long closed time alternating with short, sharp air pulses which may produce numerous overtones and harmonics.
The balance of the respiratory tract and the nasal and cranial cavities give rise to a variety of resonances, known as formants in the physiology of speech. The lowest frequency formant can be approximately identified with the pharyngeal cavity, resonating as a closed pipe. The second formant arises in the mouth cavity. The third formant is often considered related to the second resonance of the pharyngeal cavity. The modes of the higher order formants are too complex to be very simply identified. The frequencies of the various formants vary greatly with the production of the various voiced sounds.
Certain investigators and researchers in the field have determined that amplitude and frequency variations in the fundamental voiced pitch energy (often termed the fine structure) appear to be an acoustic correlate of emotional content transmitted through speech. Other parameters thought to be related to the emotional transmission of information include: Phonetic Content, Gross Changes in Fundamental Frequency, Relative Energy Levels in Various Frequency Bands, and the Speech Envelope Amplitude. These parameters all contribute to the conveyance of emotion or a stressful condition existing in the speaker.
Speech analysis and the equipment for accomplishing it have been developed for a variety of loosely related purposes. One of the primary concerns is the transmission of speech with a high order of intelligibility and presence over a very reduced bandwidth. The applicability of this particular art becomes obvious in civil and military communications. Other fields in which speech analysis equipment is used are the voice operated printing or recording device, such as a typewriter, and systems, equipment and devices that command and control the spoken word or phrase. While these activities are interesting and valuable in themselves, they do not relate to the detection of emotional content of a speech wave nor to its use to determine the veracity of the speaker.
According to the present invention, the amplitude variations of the basic phonation may be assessed and quantified by measurement of the amount of rapid aperiodic amplitude modulation on the speech signal envelope of a spoken word. This rapid variation of the amplitude of the speech signal envelope is called Vibratto for the purposes of this invention.
This invention discloses a means whereby the measure of vibratto in the speech envelope of a person under interrogation may be meaningfully quantified in real time, so that a Truth/Lie decision can be made. Research into the vibratto component of the speech wave has conclusively demonstrated that the amount of vibratto correlates well with stress or emotional involvement which leads to the Truth/Lie decision.
There are many ways to detect and measure the amount of vibratto in the phonation of an emotionally involved person under interrogation. Frequency fluctuation in the basic pitch frequency could be quantified with the aid of a frequency discriminator, for example. In addition, variability of time between successive pitch pulses could be obtained by conventional zero crossing analysis.
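By way of illustration only, the zero crossing alternative mentioned above can be sketched in Python. The sketch is not part of the disclosed apparatus: the function names, the 8 kHz sampling rate, the synthetic 120 Hz tone, and the choice of the mean absolute period-to-period difference as the variability measure are all assumptions made for the example.

```python
import math

def rising_zero_crossings(samples, sample_rate):
    """Return interpolated times (s) where the signal crosses zero going upward."""
    times = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev < 0.0 <= cur:
            # Linear interpolation between the two samples around the crossing.
            frac = -prev / (cur - prev)
            times.append((i - 1 + frac) / sample_rate)
    return times

def period_stats(samples, sample_rate):
    """Mean pitch period and mean absolute period-to-period difference (s)."""
    t = rising_zero_crossings(samples, sample_rate)
    periods = [b - a for a, b in zip(t, t[1:])]
    if len(periods) < 2:
        raise ValueError("need at least three rising zero crossings")
    mean_period = sum(periods) / len(periods)
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return mean_period, sum(diffs) / len(diffs)

# Synthetic, perfectly regular 120 Hz tone (illustrative values only).
RATE = 8000
tone = [math.sin(2 * math.pi * 120 * n / RATE + 0.1) for n in range(RATE // 4)]
mean_period, jitter = period_stats(tone, RATE)
```

For this regular tone the mean period comes out close to 1/120 second (about 8.3 milliseconds) and the variability measure is close to zero; an irregular voice would raise the second figure.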
SUMMARY OF THE INVENTION
The present vibratto quantification method and apparatus provides means for identifying and selecting speech signal envelope amplitude excursions or modulations in excess of a selected value and means for displaying, counting and recording the number of these amplitude excursions. The speech signal is rectified and the envelope is smoothed. The envelope is time and amplitude discriminated by a differentiator and a DC base line restorer to emphasize the amplitude excursions or vibratto modulation. The resulting pulses are applied to a level detector and then passed to a pulse counter which drives the display to indicate the amount of modulation or vibratto in the speaker's speech.
A simple oscillographic recorder readout may be used so that the over-threshold envelope amplitude excursions could be visually counted and recorded in such a manner as to allow comparison between successive responses during interrogation. Other comparison or threshold selection techniques can be employed too. The output pulses of such circuits may be counted in a digital manner or in a simple integrating analog diode circuit. The specific embodiment of the invention includes a comparator circuit with a variable voltage level to be selected after observation by a trained operator. The value selected determines the level beyond which the stressed phonation pulses may be considered statistically significant in making the Truth/Lie decision. The group of pulses is digitally counted for each proper utterance of the person being interrogated. A digital display is employed to indicate the number of pulses that exceeded the selected threshold of the comparator circuit. This digital measure of a proper utterance is available to the interrogator so that he (or she) can intelligently quantify the veracity of the answers to the selected questions during the interrogation process. Statistical data has revealed that this technique allows the Truth/Lie decision to be made with a high degree of accuracy.
OBJECTS OF THE INVENTION
It is an object of the present invention to provide a means for detecting a stressful or emotional condition in a human being who is speaking.
An additional object of this invention is to detect this emotional or stressful condition while the person who is speaking is under direct and skillful interrogation.
A further object of this invention is to provide means whereby a valid Truth/Lie decision can be rendered by direct observations of the data readout of a voice or speech analysis system.
A still further object of this invention is to detect the emotional or stressful condition by analysis of the rapid amplitude modulation of the fundamental phonation of the speaker using an electronic signal analysis system.
Other objects, advantages and novel features of the present invention will become apparent from the following detailed description of the invention when considered in conjunction with the accompanying drawings in which:
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is an oscillograph of a male voice responding with the word "yes" in the English language, in answer to a direct question at a bandwidth of 5 kHz.
FIG. 2 is an oscillograph of a male voice responding with the word "no" in the English language, in answer to a direct question at a bandwidth of 5 kHz.
FIGS. 3a and 3b are typical graphs of a portion of a "yes" response with and without emotional stress, respectively.
FIGS. 4a and 4b are typical graphs of a portion of a "no" response with and without emotional stress, respectively.
FIG. 5 is a block diagram of the vibratto signal processing circuit.
FIG. 6 is a detailed block schematic of the vibratto pulse level selecting and counting circuit with display.
DESCRIPTION OF PREFERRED EMBODIMENTS
FIG. 1 shows an oscillograph of a male voice responding with the word "yes" in the English language in answer to a direct question at a bandwidth of 5 kHz. The wave form contains two distinct envelopes, the first being for the "ye" sound and the second being for the harsh "s" sound. Since the first envelope of the "yes" signal wave form is a mellower sound, being produced primarily by the vocal cords and conus elasticus, this envelope will be processed to detect emotional stress content or modulations. The male voice responding with the word "no" in the English language at a bandwidth of 5 kHz is shown in FIG. 2. This response has a single envelope which will be analyzed by the present device to detect the presence of rapid modulation of the phonation constituent of the speech signal.
FIG. 3a is a drawn replica of a portion of the response "yes", delivered under emotional stress. The rapid modulation or vibratto pulses can be seen extending above and below the normal envelope. These additional excursions occur as the result of non-symmetric action between the vocal cords and the conus elasticus. The basic repetition period of this male voice is about 8.3 milliseconds.
FIG. 3b is a drawn replica of a portion of a male voice responding "yes", delivered under conditions of no emotional stress. The smooth regular features of the pitch pulses can be easily seen.
FIG. 4a is a drawn replica of a portion of the same male voice responding "no" under a condition of emotional stress. The vibratto modulations appear as distortions near the axis of averages and as excessively high peaks in the positive direction. This non-regularity is the result of interaction in the pharynx between the vocal cords and the conus elasticus, leading to an "explosive" type of formant excitation.
FIG. 4b is a drawn replica of a portion of the same male voice answering "no" to a non-stressful question. The smoothness and regularity of the response can be readily seen.
Thus, it is an object of the present invention to isolate the rapid modulation of the phonation constituent of the speech signal envelope in order to detect the presence of emotional stress in the speaker.
The present invention will emphasize the rapid modulations whose amplitude is in excess of a selected level within the envelope in order to distinguish them. After this emphasis, the signal will be analyzed by comparison with a selected voltage level above which the pulses will be counted. It is the registration of the number of pulses which will indicate the presence of emotional stress in the speech of the individual under interrogation.
Experimentation with the present invention has shown that the difference in count between a non-emotional response and an emotional response is readily evident. Though the count varies for degree of emotional stress and for various individuals, the number of pulses counted in the emotional state is usually greater than twice the number of pulses counted where no emotional stress is present. It is this type of comparison that will present the interrogator with an instantaneous and readily observable deviation which can be correlated with the questions asked to make the Truth/Lie decision regarding the subject.
A block diagram of the vibratto signal processing circuit is shown in FIG. 5 as having an acoustical transducer 2 at its input. The acoustical transducer 2 is a microphone type of device which converts the acoustical utterance of the speaker into alternating current energy. As shown in FIG. 6, a tape recorder 4 may be used as a source of electrical signal energy instead of direct transduction by means of a microphone. In either case, the microphone used to record the information into the tape recorder (or as an input directly into the system) should have the property to transfer the acoustical utterances into electrical energy with a minimum of frequency and amplitude distortion.
Electrical signals representing the speech wave are amplified in operational amplifier 10, which provides linear amplification and isolation of the input from the remainder of the system. The amplified speech signal is then rectified in a unipolar processor 14 to provide an electrical signal having only one polarity. The rectified signal is again amplified and isolated from the remainder of the circuit by operational amplifier 24. The electrical signal of a single polarity is then smoothed in filter 28, 32 by integration. The smoothing filter 28, 32 removes the high frequency energy of the phonation and extracts a signal which is representative of the envelope of the speech wave. The smoothed signal is again amplified and isolated from the remainder of the circuit by operational amplifier 38. The smoothed envelope is then differentiated in time and compared with the envelope amplitude in the time and amplitude discriminator 40, 41, 42, 44. The resulting signal is compared in voltage comparator 50 with a variable voltage level (to be determined by resistor 48). By proper selection of the variable voltage level by the interrogator, analysis of the speech wave can be adapted to the specific person being interrogated. By varying the voltage level for comparator 50, the interrogator determines the statistical weight to be given to various amplitude levels of modulation. Voltage comparator 50 produces a series of uniform pulses indicative of the number of pulses which it has received that are greater than the voltage level set by resistor 48. These pulses are counted in pulse counter 52 and displayed in a numerical indicator 54.
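By way of illustration only, the signal chain just described (rectify, smooth, differentiate, compare, count) can be approximated in discrete time. The following Python sketch is an assumption-laden model and not the disclosed circuit: the one-pole filter, the finite-difference differentiator, the `null` term standing in for the envelope amplitude discrimination, the 10 millisecond time constant, the synthetic 125 Hz test utterances, and the rule of placing the comparator level 25 percent above the largest excursion of the control response are all hypothetical choices.

```python
import math

def half_wave_rectify(signal):
    """Unipolar processing: pass one polarity only."""
    return [max(0.0, s) for s in signal]

def rc_smooth(signal, dt, tau):
    """One-pole low-pass standing in for the R/C integrator."""
    alpha = dt / (tau + dt)
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out

def discriminate(envelope, dt, null):
    """Time derivative of the envelope minus a fraction of the envelope itself."""
    out, prev = [], envelope[0]
    for y in envelope:
        out.append((y - prev) / dt - null * y)
        prev = y
    return out

def count_pulses(signal, level):
    """Count upward crossings of the comparator level."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a <= level < b)

def process(signal, dt, tau=0.010, null=20.0):
    return discriminate(rc_smooth(half_wave_rectify(signal), dt, tau), dt, null)

RATE = 8000    # samples per second (assumed)
PERIOD = 64    # samples per pitch period, i.e. a 125 Hz voice (assumed)

def utterance(boosted=()):
    """Synthetic voiced sound; pitch periods listed in `boosted` are 60% larger."""
    out = []
    for n in range(40 * PERIOD):
        ramp = min(1.0, n / (10 * PERIOD))            # gradual onset
        gain = 1.6 if n // PERIOD in boosted else 1.0
        out.append(ramp * gain * math.sin(2 * math.pi * (n % PERIOD) / PERIOD))
    return out

calm = process(utterance(), 1 / RATE)
stressed = process(utterance({14, 17, 23, 26, 34}), 1 / RATE)
level = 1.25 * max(calm)   # stand-in for the operator's adjustment of the level
calm_count = count_pulses(calm, level)
stressed_count = count_pulses(stressed, level)
```

With these synthetic inputs the control utterance produces no counts, and the utterance containing five irregularly spaced enlarged pitch pulses produces one count for each of them.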
With the brief description of the block diagram of the present invention, it is obvious that the present invention provides a rapidly observable indication to a trained interrogator of the truthfulness of the subject's response by mere observation of the numerical indicator 54. The interrogator would initially ask the subject a series of questions for which he knows the answers and about which the applicant would not lie. These questions would include "Are you wearing a specific color shirt?" and the response would be "yes" or "no." After observing the number on numerical indicator 54, the interrogator would adjust the voltage level of voltage comparator 50 so that the number appearing on the numerical indicator would be minimal or approximately under 10. It should be noted that the count in response to a "yes" is different from the count in response to a "no." Once the initial adjustment of the system has been accomplished, the interrogator may proceed to ask questions for which he is not sure of the answers. Upon monitoring various responses, the interrogator may determine which questions the applicant answered with various degrees of emotional stress. By comparing the number in the numerical indicator 54 with the number determined for truthful responses, the interrogator can determine when emotional stress is present, which would correspond to when the applicant is lying. The number on the numerical indicator for an untruth, or presence of an emotional stress, will normally exceed twice the value that would be recorded for truthful responses to a "yes" or "no."
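By way of illustration only, the comparison procedure described above can be written out as a small Python sketch. The separate baselines for "yes" and "no" and the factor of two follow the description; the use of the arithmetic mean of the control counts as the baseline, the function names, and the sample counts are assumptions made for the example.

```python
from statistics import mean

def baseline_counts(control):
    """control: (answer, count) pairs from questions with known truthful answers."""
    by_answer = {}
    for answer, count in control:
        by_answer.setdefault(answer, []).append(count)
    return {answer: mean(counts) for answer, counts in by_answer.items()}

def flag_stress(answer, count, baselines, ratio=2.0):
    """True when the count exceeds `ratio` times the baseline for that answer."""
    return count > ratio * baselines[answer]

# Hypothetical indicator readings from the control questions.
control = [("yes", 6), ("yes", 8), ("no", 4), ("no", 5)]
baselines = baseline_counts(control)           # yes: 7, no: 4.5

flag_yes = flag_stress("yes", 15, baselines)   # 15 > 14
flag_no = flag_stress("no", 8, baselines)      # 8 is not greater than 9
```

Keeping one baseline per answer word reflects the observation that the count for a "yes" differs from the count for a "no."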
A more detailed schematic of the present invention is shown in FIG. 6. As described in reference to FIG. 5, the electrical signal from either the transducer 2 or the tape recorder 4 enters the system at input port 5. An operational amplifier 10 with its gain and performance determining resistors 6, 8 and 12 is used to provide isolation and linear amplification of the input signal.
This isolated and amplified signal, at the output of the operational amplifier 10, is conducted to a unipolar processor or diode 14 where one polarity of the signal is allowed to pass into the following circuitry. A diode connected in the opposite polarity 16 could be used equally well. A full wave rectification or bridge rectification circuit (not shown) could be used as well with a small additional complication of the circuit. The electrical energy out of the diode, at the input of the following circuitry, is therefore primarily and predominantly of one polarity. The DC energy return resistance 20 prevents a residual charge from building up on the input of the following circuit.
Operational amplifier 24 with its gain and performance determining resistors 18, 22 and 26, is used to isolate the diode circuit from the follow-on circuitry. The follow-on circuitry consists of a smoothing filter in the form of an R/C integrator having a variable resistor 28 and a fixed capacitor 32. It can be seen by those versed in the art that a variety of different active and passive smoothing filters could be used to remove the high frequency energy of the phonation and to extract a signal which is representative of the envelope of the speech wave. The R/C integrator, which is used in the present embodiment, functions quite well and is simple to employ. The time constant is variable to afford adjustment for voices of various fundamental frequencies.
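By way of illustration only, the effect of the variable time constant can be worked through with the standard first-order R/C low-pass relations, cutoff frequency 1/(2πRC) and gain 1/sqrt(1 + (2πfRC)²). The Python sketch below is not taken from the disclosure: the 1 microfarad value for capacitor 32 and the rule of setting the time constant to two pitch periods are hypothetical.

```python
import math

def rc_cutoff_hz(r_ohms, c_farads):
    """-3 dB frequency of a first-order R/C low-pass."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

def rc_gain(freq_hz, r_ohms, c_farads):
    """Magnitude response of a first-order R/C low-pass at freq_hz."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * freq_hz * r_ohms * c_farads) ** 2)

def resistor_for_pitch(pitch_hz, c_farads, periods=2.0):
    """Resistance giving a time constant of `periods` pitch periods (assumed rule)."""
    return periods / (pitch_hz * c_farads)

C32 = 1e-6  # hypothetical value for the fixed capacitor, in farads

settings = {}
for voice, pitch_hz in (("male", 120.0), ("female", 250.0)):
    r28 = resistor_for_pitch(pitch_hz, C32)
    settings[voice] = (r28, rc_cutoff_hz(r28, C32), rc_gain(pitch_hz, r28, C32))
```

Under these assumptions the resistance is about 16.7 kilohms for a 120 Hz voice and 8 kilohms for a 250 Hz voice, and in both cases the pitch frequency itself is attenuated to roughly 8 percent, so that scaling the resistance with the pitch period keeps the smoothing comparable across voices.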
The R/C integrator is followed by a further operational amplifier 38 with its gain and performance determining resistors 30, 34 and 36. This operational amplifier isolates the processing of the R/C integrator 28 and 32 from the subsequent circuit.
Following the isolation amplifier 38 is a special time and amplitude discriminator having a differentiator circuit involving the variable capacitor 42 and the fixed resistor 40. These two components perform the time differentiation function. The potentiometer 41 provides a measure of the undifferentiated signal envelope which is used to null out residual envelope energy. This component, connected as it is, performs the envelope amplitude discrimination function. An operational amplifier 44 with its gain and performance determining resistances 43, 45 and 46 accepts the time derivative signal and the amplitude discrimination signal and provides effective base line restoration for most typical types of phonation.
Base line restoration can be accomplished in a variety of ways, for example, clamping and DC restoration. Irrespective of the circuit used, the output of the amplifier 44 is a series of varying amplitude pulses that comprise the variable modulation of the phonation which represents the vibratto to be quantified. This circuitry emphasizes the modulation with respect to the normally present phonation.
Statistical analysis of the series of output pulses at the output of amplifier 44 employing manual means, has been used to derive the validity of Truth/Lie decision assessment of the vibratto quantification. The present invention provides an electronic system to provide digital results.
The preferred embodiment of the invention provides a comparator 50 by which the level of significant output pulses may be adjusted by a knowledgeable operator of the equipment. Potentiometer 48 is the control means for this level adjustment. This control is shown to function at either a positive or a negative voltage level. When the polarity of diode 14 or 16 is selected, the comparator voltage level takes the polarity that will select either excess positive or excess negative peaks. The potentiometer 48 may be set at zero volts, at which time the circuit becomes a conventional zero-crossing detection device. It has been found that the statistical significance of the Truth/Lie decision process will improve if a level away from the baseline is selected for the functioning of the comparator 50. The comparator may be a simple diode circuit or it may be a Schmitt trigger circuit with suitable voltage supplies, passive and active components. However, for simplicity and economy, a differential voltage comparator such as the Motorola MC1710 has been used for the circuit function. When the differential comparator is used, the output of amplifier 44 is brought into the comparator 50 on the signal input lead 49 while the voltage to which the input pulses are being differentially compared is brought into the comparator at lead 51. The output of the comparator is a series of pulses of constant amplitude that are related to the vibratto component in stressed and unstressed phonation.
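By way of illustration only, the behavior of the level comparison can be modeled in a few lines of Python. The sketch covers three cases named above: a level of zero (zero-crossing detection), a level away from the baseline, and a Schmitt trigger, modeled here by a hysteresis band centered on the level. The function names, the hysteresis parameter, and the sample values are assumptions and do not describe the MC1710 or any particular circuit.

```python
def comparator_pulses(signal, level, hysteresis=0.0, high=1.0, low=0.0):
    """Uniform-amplitude output: `high` while the input is above the level.

    With hysteresis > 0 the output switches on above level + hysteresis/2 and
    off below level - hysteresis/2, as a Schmitt trigger would.
    """
    out, state = [], False
    for x in signal:
        if not state and x > level + hysteresis / 2:
            state = True
        elif state and x < level - hysteresis / 2:
            state = False
        out.append(high if state else low)
    return out

def count_rising_edges(pulses):
    """Number of uniform pulses, counted at their leading edges."""
    return sum(1 for a, b in zip(pulses, pulses[1:]) if b > a)

# Hypothetical varying-amplitude pulse train.
signal = [0.0, 0.4, -0.3, 0.9, -0.2, 1.4, -0.1, 0.3, -0.5, 0.6, 0.0]

at_zero = count_rising_edges(comparator_pulses(signal, 0.0))     # every excursion
at_half = count_rising_edges(comparator_pulses(signal, 0.5))     # larger ones only
at_one = count_rising_edges(comparator_pulses(signal, 1.0))      # the largest only
schmitt = count_rising_edges(comparator_pulses(signal, 0.5, hysteresis=0.4))
```

For this pulse train the counts are 5, 3, 1 and 2 respectively, which shows how raising the level away from the baseline restricts the count to the larger excursions.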
These pulses may be counted in a variety of ways. They could be simply recorded on a chart recorder and manually and visually counted. They could also be put into an integrating diode counter, and the resultant DC voltage at the output of the counter would be directly proportional to the number of pulses of interest. A digital counter could obviously be used as well. In the chosen embodiment of the invention, the pulses at the output of the comparator 50 are fed into a digital counter that counts and registers, in decimal digits, the exact number of pulses at its input. The digital counter 52 takes the input pulses in at terminal 53 and registers the count at digital indicator 54.
Thus the present device and method provides a readily identifiable numerical indication which will provide an interrogator with instantaneous indication of the veracity or the presence of emotional stress in the subject's responses. The invention, by electrical analysis of the phonation speech envelope and emphasis of modulation, produces a uniform pulse train which can be monitored to provide the data needed to detect the emotional stress. Although the invention has been described and illustrated in detail, it is to be clearly understood that the same is by way of illustration and example only and is not to be taken by way of limitation, the spirit and scope of the invention being limited only by the terms of the appended claims.
What is claimed:
1. A method to detect emotional stress in the speech of an individual comprising:
converting said speech to an electrical signal;
smoothing said electrical signal to produce an envelope;
isolating any rapid aperiodic amplitude modulations present on said smoothed envelope;
counting the number of said rapid aperiodic modulations; and
indicating the count per utterance which is indicative of emotional stress.
2. A method as in claim 1 wherein said isolating includes:
differentiating said smoothed electrical signal; and
comparing said differentiated signal with a selected voltage level to produce a pulse for each differentiated signal above said selected voltage.
3. A method as in claim 2 including the step of rectifying said electrical signal before smoothing and wherein said smoothing comprises integrating said rectified electrical signal.
4. A device for measuring the emotional stress produced variations in a speech sound comprising:
means for converting speech sounds into electrical signals;
means connected to said converting means for emphasizing an emotional stress produced variation segment of said electrical signals, by time and amplitude discrimination, by integration followed by differentiation and baseline restoration;
means connected to said emphasizing means for detecting said emotional stress produced variation segment; and
means connected to said detecting means for indicating the degree of emotional stress produced variations detected.
5. A device as in claim 4 wherein said emphasizing means includes an integrating means and a differentiating means connected in series.
6. A device as in claim 5 wherein said emphasizing means includes a rectifying means connected to the input of said integrating means.
7. A device as in claim 6 including amplifiers connected between said converting means and said rectifying means, between said rectifying means and said integrating means, and between said integrating means and said differentiating means.
8. A device as in claim 5 wherein said differentiating means produces a series of varying amplitude pulses and said detecting means includes a voltage comparing means for producing a series of uniform amplitude pulses for each varying amplitude pulse above a predetermined level.
9. A device as in claim 8 wherein said indicating means includes a counting means for counting the uniform amplitude pulses, whereby the number of uniform amplitude pulses indicates the degree of emotional stress produced variations present.
10. A device for determining emotional stress by speech wave analysis comprising:
means for producing an electrical signal representative of said speech wave;
means connected to said producing means for amplifying and shaping said electrical signal to form an electrical signal envelope;
means connected to said amplifying and shaping means for emphasizing rapid aperiodic amplitude modulation on said electrical signal envelope;
means connected to said emphasizing means for detecting amplitudes of said emphasized rapid aperiodic amplitude modulation above a predetermined level; and
means connected to said detecting means for indicating the number of detected modulations whereby emotional stress is determined by the value indicated.
11. A device as in claim 10 wherein said emphasizing means comprises a differentiator means and a baseline restoration means for producing a varying amplitude pulse train.
12. A device as in claim 11 wherein said detecting means comprises a voltage comparator means for producing a uniform amplitude pulse for each varying amplitude pulse above said predetermined level.
13. A device as in claim 12 wherein said shaping means includes a rectifying means and an integrating means connected in series.