WO1993005595A1 - Multi-speaker conferencing over narrowband channels - Google Patents


Info

Publication number
WO1993005595A1
Authority
WO
WIPO (PCT)
Application number
PCT/US1992/002048
Other languages
French (fr)
Inventor
Terrence Gerard Champion
Original Assignee
The United States Of America As Represented By The Secretary Of The Air Force
Application filed by The United States Of America As Represented By The Secretary Of The Air Force
Publication of WO1993005595A1

Classifications

    • H04M 3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M 3/568 Audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants
    • H04M 3/561 Conference facilities provided by multiplexing



Abstract

A system (100) for digital conferencing over narrowband channels which allows use by multiple simultaneous speakers (101, 102, 103) and which allows for interfacing between vocoders (21, 22, 23) which operate at different bit rates. The system (100) takes advantage of the properties of multi-rate parametric vocoders (21, 22, 23) (which include the Sinusoidal Transform Coder and the Multi-Band Excitation Vocoder, as well as embedded coders), defined as parametric vocoders for which the basic model and parameter set remain unchanged over a wide range of bit rates. The system (100) performs signal summation. To maintain quality for a single speaker (101, 102, 103) while allowing multiple speakers (101, 102, 103), the system (100) adaptively allocates channel bandwidth based on the number of speakers (101, 102, 103) to be represented. This system (100) also provides digital-to-digital conversion (38, 39, 40) between narrowband digitizers (vocoders) operating at different bit rates. The system (100) takes advantage of the characteristics of a particular class of coders, sine-wave based coders, which perform parameter estimation by sine-wave summation. Digital-to-digital conversion is performed by estimating the parameter set associated with the lower bit rate digitizer from the parameter set associated with the higher bit rate.

Description

MULTI-SPEAKER CONFERENCING OVER NARROWBAND CHANNELS
STATEMENT OF GOVERNMENT INTEREST
The invention described herein may be
manufactured and used by or for the Government for governmental purposes without the payment of any royalty thereon.
REFERENCE TO MICROFICHE APPENDIX
Reference is made to the microfiche appendix which contains 1 sheet representing 10 frames of the conferencing bridge source code.
BACKGROUND OF THE INVENTION
The present invention relates generally to digital voice conferencing systems which provide digitally encoded voice communication to remotely located digital voice terminals, and more
specifically to a system for facilitating
multispeaker conferencing on vocoders which use narrowband channels (such as 4.8 kb/second). The present invention also provides a system that allows digital-to-digital conversion between vocoders which output digital data streams at different bit rates.
A vocoder (voice operated coder) is a device used to enable people to participate in private communication conferences over ordinary telephone lines by encoding their speech for transmission, and decoding speech for reception. The vocoder unit consists of an electronic speech analyzer which converts the speech waveform to several simultaneous electronic digital signals, and an electronic speech synthesizer which produces artificial sounds in accordance with the encoded electronic digital signals received.
The problem of conferencing over systems which employ parametric vocoders has long been of
interest. In analog or wideband digital
conferencing, overlapping speakers are handled by signal summation at a conferencing bridge. Such a scheme is not feasible for parametric vocoders for two reasons: 1) signal summation would require tandeming, synthesis and reanalysis of the speech waveform, a process which causes severe degradations in narrowband parametric vocoders; 2) narrowband vocoders cannot satisfactorily represent multiple speakers. One of the difficulties in combining two or more voice tracks is that you end up with two fundamental frequencies: one for each voice signal. These are difficult to encode and separate.
One narrowband technique currently in use is based on the idea of signal selection: a speaker has the channel until he finishes or someone with a higher priority bumps him, and speakers vie for the open channel when it becomes available. The
advantage of such a technique is that it avoids the degradations described above; however, such a
technique is cumbersome since most conference control is handled by interruptions and overlapping speakers, and this scheme presents only one speaker to the listener. Some coders have some capability of representing multiple speakers; however, the speech quality is significantly degraded due to the tandem between coders. In other schemes two-speaker
overlaps can be accomplished by permanently halving the available bandwidth allotted to each coder and deferring signal summation to the terminal. This scheme limits the overall quality of the conference by forcing the coder to work at half the available bandwidth. Since, for the majority of a conference, there will be only a single speaker, this technique causes a degradation in perceived quality.
Examples of vocoder system technology are discussed in the following U.S. Patents, the
disclosures of which are incorporated herein by reference:
U.S. Patent No. 4,856,068 issued to Quatieri, Jr. et al;
U.S. Patent No. 4,885,790 issued to McAulay et al;
U.S. Patent No. 4,270,025 issued to Alsup et al;
U.S. Patent No. 4,271,502 issued to Goutmann et al;
U.S. Patent No. 4,435,832 issued to Asada et al;
U.S. Patent No. 4,441,201 issued to Henderson et al;
U.S. Patent No. 4,618,982 issued to Horvath et al; and
U.S. Patent No. 4,937,873 issued to McAulay et al.
All of the above-cited patents disclose digital vocoders and voice compression systems that can be improved by the present invention. Of particular interest are the Quatieri, Jr. et al and McAulay et al references which disclose vocoder systems with equipment used by the present invention.
The inherent problems encountered by all of the prior art vocoder systems are a result of the difficulty of realistically representing human speech in limited narrowband channels. As pointed out in the Goutmann et al reference, current digital voice terminals achieve bit rates ranging between 2.4 and 32 kilobits per second. One of the most common systems uses 4.8 kb/second. When a user of a system that uses only a 4.8 kb/second data stream is attempting to communicate with a user of a 2.4 kb/second data stream, a means of converting the bit rate signals becomes necessary. The present invention provides examples with specific kb/second rates; however, it should be understood that this invention is not limited to these specific data bit rates. Although the examples include the use of modems and telephone lines, the invention is applicable over any medium that transmits digitally encoded voice signals. These media include, but are not limited to: radio transmission, satellite communication systems and laser communication systems.
In view of the foregoing discussion, it is apparent that there remains an ongoing need to enhance the ability of digital vocoders to handle multispeaker conferencing on narrowband channels, and to interface vocoders which have different bit rate data streams while preserving voice quality. The present invention is intended to help satisfy that need.
SUMMARY OF THE INVENTION
The present invention includes a digital conversion system that can be used to facilitate multispeaker conferencing on narrowband vocoders, and which also permits interaction between users of vocoders which operate at different system data bit rates. Vocoders generally output digitally encoded voice signals to each other using a communications network, such as a set of modems and the telephone lines. The present invention uses a conferencing bridge to receive, convert and output digital data streams for either multispeaker conferencing or interaction between vocoders which operate at
different system data bit rates. This conferencing bridge can be either a hardwired circuit or a
computer which uses the source code of the microfiche appendix, and which operates as described below.
When the conferencing bridge performs
multispeaker conferencing between vocoders which each operate at the same data rate, the following four step process is conducted. The first step includes counting the number of active conferees to produce thereby a count number N, the active conferees being users of the digitally encoded voice system. The second step of the process includes compressing each of the encoded digital voice signals produced by the active conferees down to a plurality of compressed digital signals which each have a compressed data rate. The third step is a digital signal summation step in which the compressed digital signals are combined to output thereby a combined compressed digital signal which is sent back to the users at the system data rate. The fourth step of the process is performed by the synthesizers of the vocoders. More specifically, the fourth step of the process is performed by the vocoders decoding the combined compressed digital signal to output thereby an artificial voice signal to the users of the digitally encoded voice system, wherein the artificial voice signal represents voices of all the active conferees which are speaking.
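The rate arithmetic of the first three steps is simple enough to sketch. The following C fragment is a minimal illustration of that arithmetic, not the patent's bridge code (which resides in the microfiche appendix); the names and the three-conferee setup are hypothetical.

    /* Minimal sketch of the bridge's four-step conferencing arithmetic.
     * All names and the three-conferee setup are illustrative; the
     * patent's actual bridge code is in the microfiche appendix. */
    #include <stdio.h>

    #define SYSTEM_DATA_RATE 4800.0  /* bits/s, the example system rate */

    /* Step 1: count the active conferees to produce the count number N. */
    static int count_active(const int *is_speaking, int n_conferees) {
        int n = 0;
        for (int i = 0; i < n_conferees; i++)
            if (is_speaking[i]) n++;
        return n;
    }

    int main(void) {
        int is_speaking[3] = {1, 1, 0};       /* two of three conferees talk */
        int n = count_active(is_speaking, 3); /* step 1: N = 2 */

        /* Step 2: each active stream is compressed to SDR / N. */
        double per_speaker = SYSTEM_DATA_RATE / n;   /* 2400 bits/s each */

        /* Step 3: digital summation packs the N compressed streams into
         * one combined stream that still fits the system data rate. */
        double combined = per_speaker * n;           /* 4800 bits/s */

        printf("N=%d, per-speaker %.0f b/s, combined %.0f b/s\n",
               n, per_speaker, combined);
        /* Step 4, decoding to artificial speech, happens in each
         * vocoder's synthesizer rather than in the bridge. */
        return 0;
    }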
When the conferencing bridge is used to interface vocoders which are operating at different data stream bit rates, the conferencing bridge performs a two step process. The first step includes measuring each of the particular data stream bit rates produced by each of the digitally encoded voice systems to produce thereby a set of data stream bit rates which are identified for each particular digitally encoded voice system. The second step includes combining and converting all received encoded digital signals into a set of converted encoded digital signals which are each sent back to the digitally encoded voice system at their
particular data stream bit rate.
In both of the applications described above, compression and conversion of digital data streams can be performed by techniques that include but are not limited to: the parameter de-scoping technique, the frame dilation technique, and the heterogeneous transform technique. Parameter de-scoping entails decreasing the resolution and/or dynamic range of the parameterization. For example, 7 bits (128 values) may be allowed to code the fundamental frequency parameter f₀ at a given bit-rate, while 5 bits (32 values) may be allowed for a lower bit-rate. Either the dynamic range of the parameter must be compressed or the spacing between coded values must somehow be increased, or both, to allow f₀ to be coded with 5 bits. Once the values for the different parameterizations have been set, a one-to-one mapping can be set up. While for some parameters it may be efficient to map directly from one coded parameter to its lower bit-rate counterpart, other parameters might be decoded, and the estimated parameter recoded at the lower resolution parameterization.
In the frame dilation technique, bit-rate reduction can also be achieved by keeping the same parameter resolution while increasing the frame length. With this technique the parameters
themselves are considered to be a set of time-varying functions, which are sampled at a higher rate for the high bit-rate parameterization and down-sampled for the lower rate parameterization. For example, a parameter stream at 4.8 kb/s with a parameter sampling rate of 20 ms can be resampled at 30 ms to produce a parameter stream of 3.6 kb/s without decreasing the resolution of the parameter set.
The heterogeneous transform technique can be used to transform between parameterizations which are based on different speech models. For example, two alternate ways of representing the spectral envelope H(ω) are cepstral smoothing and linear prediction. If one has chosen to code cepstral coefficients at a higher bit-rate and linear prediction coefficients at a lower bit-rate, one can reconstruct the spectral envelope from the cepstral coefficients, derive an estimate of the autocorrelation function from the spectral envelope, and derive the reflection
coefficients.
It is an object of the present invention to provide a system for facilitating multispeaker
conferencing between users of digitally encoded voice systems. It is another object of the present invention to provide a data rate conversion system to interface vocoders which operate at different digital system data rates.
These together with other objects, features and advantages of the invention will become more readily apparent from the following detailed
description when taken in conjunction with the accompanying drawings wherein like elements are given like reference numbers throughout.
DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram of the
conferencing bridge of the present invention which traces the electrical signals from multiple conferees to the conferencing bridge;
Figure 2 is a block diagram of the
conferencing bridge interconnecting three remote vocoders;
Figure 3 is a block diagram which traces electrical signals out of the conferencing bridge back to the users of three vocoders;
Figure 4 is a flow chart of the process performed by each analyzer in the vocoder units;
Figure 5 is a flow chart of the process performed by each synthesizer in the vocoder units;
Figure 6 is a map of Figures 6A, 6B and 6C, which depict a flow chart of the process performed by the conferencing bridge when it facilitates
multispeaker conferencing; and
Figures 7 and 8 are block diagrams which depict two embodiments of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
The present invention includes a digital summation system which facilitates multispeaker conferencing on vocoders which are narrowband
channels (such as, but not limited to 4.8 kb/second). One embodiment of the invention consists of a speech terminal for each conferee and a conferencing bridge. The speech terminal analyzes the conferee's voice and codes at the highest bit-rate allowed by the channel connecting the terminal to the bridge. The conferencing bridge monitors all channels and determines which conferrees are actively speaking. The bridge determines the number of active speakers and allocates the bandwidth available to each
conferee based on the number of speaker parameter sets to be transmitted. After allocating the channel to the active speaker, the bridge uses
parameter-space transformations to reduce the
bandwidth required to encode the parameter sets.
When there is only one speaker, all conferencees receive the single speaker at the highest data rate allowed by each conferencee' s channel. When a collision between two speakers occurs, the bridge allocates to each speaker half the available channel bandwidth, while performing bit-rate reduction by parameter-space transformations on the active
speakers to allow both data-streams to fit in the channel. In a typical scenario with three conferees and two interrupting speakers, over 4.8 kb/s
channels, the non-speaking listeners receive the two colliding speakers at 2.4 kb/s, as discussed below. The reader's attention is now directed towards Figure 1, which is a block diagram of the
conferencing bridge 100 of the present invention facilitating multispeaker conferencing and digital rate conversion between the users of these telephone handsets 101-103. As shown in Figure 1, three A/D converters 111-113 convert analog voice signals into digital voice signals, which are respectively encoded by three analyzer units 121-123 of a vocoder (as discussed in the above-cited patents). More
specifically, all three analyzer units 121-123 are included in the output ports of current vocoder units. As mentioned above, vocoders are used to encode and compress voice signals (using an analyzer) and then decode and produce artificial speech from received signals (using a synthesizer).
The three modem units 131-133 transmit the digital output signals of the analyzers 121-123 over the phone lines 150 where they are conducted by their respective bridge modems 191-193 to the conferencing bridge unit 100.
The present invention includes, but is not limited to, two different embodiments. In the first embodiment of the invention, the conferencing bridge 100 is used to allow smooth digital-to-digital conversion between multiple vocoder analyzers 121-123 which output digital data streams at different bit rates. In the second embodiment, the conferencing bridge will perform digital data compression to allow multispeaker conferencing between multiple voice digitizers (vocoders) which are only capable of outputting and receiving a digital data stream at a single system data rate. In the second embodiment of the invention the conferencing bridge 100 will multiplex the inputs of conferees speaking
simultaneously so that the users of the vocoders can hear all active conferees by receiving a single combined digital data stream at the system data rate. Of these two embodiments of the invention, the first embodiment will be described first in the discussion that follows.
In the first embodiment of the present invention, the vocoder analyzers 121-123 can be the sinusoidal transform coders such as those used in the above-cited McAulay reference, but which each operate at a different system data rate. The function of the conferencing bridge 100 is to perform digital-to-digital conversion between each of the vocoders so that each vocoder can receive the digitized voice data produced by the other vocoders, as shown in Figures 2 and 3, as discussed below.
Digital-to-digital conversion is performed by estimating the parameter set associated with the lower bit-rate analyzer from the parameter set associated with the higher bit-rate.
In the present invention, the conferencing bridge 100 is a computer which performs data
conversion using a subroutine in the program of the microfiche appendix. This conversion can be done by three different techniques as well as a combination of these and other techniques, as described below.
The above-cited patents of McAulay et al and Quatieri et al describe sinusoidal models for
acoustic waveforms. One of the parametric coders for which the present invention will work is the Sinusoidal Transform Coder (STC). The underlying model for the STC working at bit rates from 8 kb/s is

    s(n) = Σ_k H(2πk f̂₀) cos(2πk f̂₀ n + φ(2πk f̂₀) + ψ[2πk f̂₀])    (1)

where f̂₀ is an estimate of the fundamental frequency, H(ω) is an estimate of the magnitude of the speech spectrum, φ(ω) is the minimum-phase phase estimate, and ψ[2πk f̂₀] is an estimate of the residual phase (that is, the difference between the true phase and the first two terms in the argument of the exponential). The variable ψ[2πk f̂₀] is defined

    ψ[2πk f̂₀] = 0 below the voicing-dependent cutoff, and ψ[2πk f̂₀] = θ above it    (2)

where P_v ∈ [0,1] is a voicing measure determined during analysis that sets the cutoff, and θ is a random variable uniformly distributed between −π and π. For coding, any parameterization Ω which includes an estimate of f̂₀, P_v, and H(ω) can be used to transmit the information from analyzer to synthesizer. The Multi-Band Excitation (MBE) coder is similar except that there is a set of voicing parameters {P_v,i} corresponding to different frequency regions, where each P_v,i equals either 1 or 0. The fact that analysis/synthesis for these types of coder is largely independent of the type of parameterization used for coding allows a great deal of freedom for translating between parameterizations; one set of parameters coded for a given bit-rate of M bps (designated Ω_M) can be fit to another coded set of parameters Ω_K for another bit-rate of K bps. The resulting speech quality is largely
a matter of the parameter resolution of the lowest bit-rate parameter set.
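To make the model of equations (1) and (2) concrete, here is a hedged C sketch of frame synthesis from an STC-style parameter set. The sampling rate, the voicing cutoff rule P_v·(F_s/2), and all names are assumptions made for illustration, not the coder specified by the cited patents.

    #include <math.h>
    #include <stdlib.h>

    #define FS 8000.0                /* sampling rate in Hz (assumed) */
    #define PI 3.14159265358979

    static double uniform_phase(void) {          /* theta ~ U(-pi, pi) */
        return PI * (2.0 * rand() / RAND_MAX - 1.0);
    }

    /* Synthesize one frame of n_samp samples from pitch f0 (Hz), voicing
     * Pv in [0,1], and per-harmonic log magnitudes logH[] with minimum
     * phases phi[] sampled at the harmonics of f0. */
    void stc_synth(double *out, int n_samp, double f0, double Pv,
                   const double *logH, const double *phi, int n_harm) {
        for (int n = 0; n < n_samp; n++) out[n] = 0.0;
        for (int k = 1; k <= n_harm; k++) {
            double fk = k * f0;
            /* eq (2): harmonics above the assumed voicing cutoff
             * Pv*(FS/2) get a random residual phase; below it, zero */
            double psi = (fk > Pv * FS / 2.0) ? uniform_phase() : 0.0;
            double a = exp(logH[k - 1]);   /* H(w) sampled at 2*pi*k*f0 */
            for (int n = 0; n < n_samp; n++)          /* eq (1) */
                out[n] += a * cos(2.0 * PI * fk * n / FS
                                  + phi[k - 1] + psi);
        }
    }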
The first data conversion technique is parameter de-scoping, which entails decreasing the resolution and/or dynamic range of the parameterization. For example, 7 bits (128 values) may be allowed to code the fundamental frequency parameter f₀ at a given bit-rate, while 6 bits (64 values) may be allowed for a lower bit-rate. Either the dynamic range of the parameter must be compressed or the spacing between coded values must somehow be increased, or both, to allow f₀ to be coded with 6 bits. As an example, let {f_i} be the set of coded pitch values allowed for the source coder with 7 bits allocated for pitch, and {g_j} be the set of coded pitch values allowed for the target coder working with 6 bits allocated for pitch, defined by the equations:

    f_i = f_L α^i,  i ∈ [0, 127]    (3)
    g_j = g_L β^j,  j ∈ [0, 63]     (4)

where f_L and g_L are the lowest allowable coded pitch values and α and β are determined by the range of pitch values to be coded. The transformed pitch value g_j for the received pitch value f_i is the coded target value nearest f_i, where j is defined

    j = round(i · ln α / ln β)    (5)

for any j ∈ [0, 63]. Once the values for the different parameterizations have been set, a one-to-one mapping can be set up.
While for some parameters it may be efficient to map directly from one coded parameter to its lower bit-rate counterpart, other parameters might be decoded, and the estimated parameter recoded at the lower resolution parameterization.
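A small C sketch of the pitch mapping of equations (3) through (5) follows, assuming both coders span the same 50-400 Hz logarithmic range described later in the text; the nearest-index rounding is my reading of equation (5), and the function names are invented.

    #include <math.h>

    #define F_LO 50.0                  /* lowest coded pitch, Hz */
    #define F_HI 400.0                 /* highest coded pitch, Hz */

    /* Decoded pitch for index i in an nbits logarithmic code, eq (3)/(4). */
    static double decode_pitch(int i, int nbits) {
        int steps = (1 << nbits) - 1;                  /* 127 or 63 */
        double ratio = pow(F_HI / F_LO, 1.0 / steps);  /* alpha or beta */
        return F_LO * pow(ratio, i);
    }

    /* One-to-one 7-bit -> 6-bit index mapping in the spirit of eq (5):
     * decode, then round to the nearest 6-bit coded value. */
    static int descope_pitch_index(int i7) {
        double f = decode_pitch(i7, 7);
        double beta = pow(F_HI / F_LO, 1.0 / 63.0);
        int j = (int)(log(f / F_LO) / log(beta) + 0.5);
        if (j < 0) j = 0;
        if (j > 63) j = 63;
        return j;
    }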
The spectrum is coded by splitting it into N bands, or channels. The center frequencies for the channels below 1000 Hz are linearly spaced, while those above 1000 Hz are logarithmically spaced. For channel k in the non-linear region the center frequency is

    ω_k = (1 + σ) ω_{k−1}    (6)

The channel gains are derived in the following manner. Cepstral coefficients are calculated from a smoothed representation S(ω) of the log-magnitude spectrum by means of equations (6A) and (6B), and the channel gains F_k are calculated from the cepstral coefficients by means of equations (6C) and (6D). These equations, along with equations (6E) and (6F), define an invertible transform between the cepstral and channel representations. The channel gains F_k are the transmitted parameters; from them the cepstral coefficients are recovered by means of equations (6E) and (6F). The cepstral coefficients are then used to derive a cepstrally smoothed spectral envelope S(ω) in the usual manner. The sinewave amplitudes are estimated by sampling S(ω) at the harmonics of the fundamental frequency ω₀. Since S(ω) represents the log-magnitude spectrum, S(kω₀) must be exponentiated in order to be used as an estimate of the sinewave amplitude a_k. Each channel gain F_k is differentially coded from the previous gain F_{k−1}:

    F_k = F_{k−1} + Δ_k    (7)

where Δ_k is one of a set of discrete values. The number of values Δ_k can take on is dependent on the number of bits allocated to it. The parameters that can be varied in any parameterization of the spectrum are the frequency range that is to be represented Ω, the spacing of the linear channels L, the number of channels N, and the number of bits allocated to each channel.
Parameter de-scoping can be accomplished by decreasing the number of channels N. If Ω remains the same, decreasing the number of channels decreases the resolution of the parameterization. Further parameter de-scoping can be achieved by reducing the number of bits allocated to each channel. Finally, parameter de-scoping can be accomplished by reducing the frequency range Ω to be coded, which produces a somewhat tinnier output. Given the many degrees of freedom available to reduce the bit allocation requirements for the spectrum, the most efficient way to perform the transformation is to reconstruct the spectral envelope estimate S(ω) at the higher resolution and recode it at the lower resolution.
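As a C sketch of that decode-and-recode path: the differential gains of equation (7) are decoded, then re-sampled at a coarser channel grid. Linear interpolation of the log gains stands in here for the full cepstral reconstruction of equations (6A)-(6F), and channel 0 is assumed to be coded absolutely; all names are illustrative.

    /* eq (7): decode differentially coded channel gains; channel 0 is
     * assumed to be coded absolutely (see the bit allocation below). */
    void decode_gains(double F0, const double *delta, double *F, int n_chan) {
        F[0] = F0;
        for (int k = 1; k < n_chan; k++)
            F[k] = F[k - 1] + delta[k];    /* F_k = F_{k-1} + delta_k */
    }

    /* Re-sample the decoded log gains from source channel centers
     * wc_src[0..n_src-1] onto coarser target centers wc_dst[0..n_dst-1],
     * standing in for reconstruction at high resolution followed by
     * recoding at low resolution. */
    void resample_gains(const double *F, const double *wc_src, int n_src,
                        double *G, const double *wc_dst, int n_dst) {
        for (int j = 0; j < n_dst; j++) {
            int k = 1;
            while (k < n_src - 1 && wc_src[k] < wc_dst[j]) k++;
            double t = (wc_dst[j] - wc_src[k - 1]) /
                       (wc_src[k] - wc_src[k - 1]);
            G[j] = (1.0 - t) * F[k - 1] + t * F[k];
        }
    }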
In the second data conversion technique, bit-rate reduction is achieved by keeping the same parameter resolution while increasing the frame length. With this technique the parameters themselves are considered to be a set of time-varying functions, which are sampled at a higher rate for the high bit-rate parameterization and down-sampled for the lower rate parameterization. For example, a parameter stream at 4.8 kb/s with a parameter sampling rate of 20 ms can be resampled at 30 ms to produce a parameter stream of 3.6 kb/s without decreasing the resolution of the parameter set.
Although the parameters should be low-pass filtered in time before downsampling, the parameters vary slowly enough in time that linear interpolation is sufficient. Interpolation is done in the following manner. Let t be the absolute time of the current received frame and t_int the time of the next interpolated frame which is to be transmitted. When t > t_int, an interpolated set of parameters {P̂_k} is estimated from the received parameters of the previous frame {P_k^(i−1)} and the current frame {P_k^(i)} by means of the equation

    P̂_k = w′ P_k^(i−1) + w P_k^(i)    (8)

The weights w and w′ are calculated with the following equations:

    w = (t_int mod b_f) / b_f    (9)
    w′ = 1 − w    (10)

where b_f is the frame period of the untransformed parameters.
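A C sketch of this resampling, treating one parameter as a time track; the straddling-frame search and the weights follow equations (8) through (10), with names invented for illustration:

    /* Re-sample one parameter track from a src_dt ms frame period to a
     * dst_dt ms period by linear interpolation between the two received
     * frames straddling each transmitted instant, per eqs (8)-(10). */
    void dilate_track(const double *src, int n_src, double src_dt,
                      double *dst, int n_dst, double dst_dt) {
        for (int j = 0; j < n_dst; j++) {
            double t = j * dst_dt;          /* absolute time, ms */
            int i = (int)(t / src_dt);      /* last received frame */
            if (i >= n_src - 1) i = n_src - 2;
            double w  = (t - i * src_dt) / src_dt;   /* eq (9) */
            double wp = 1.0 - w;                     /* eq (10) */
            dst[j] = wp * src[i] + w * src[i + 1];   /* eq (8) */
        }
    }

Calling this with src_dt = 20 and dst_dt = 30 gives the 20 ms to 30 ms dilation of the example, keeping each parameter's resolution while cutting the frame rate by one third.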
A specific example is described in which bit-rate reduction is performed from 4.8 kb/s to 2.4 kb/s using STC. The 4.8 kb/s parameters are sampled every 20 milliseconds while the 2.4 kb/s parameters are sampled every 30 ms. The specific bit allocation is tabulated below.

    PARAMETER            4.8 kb/s   2.4 kb/s
    Synchronization      1 bit      1 bit
    Pitch                7          6
    Mid-frame Pitch      7          6
    Voicing              3          3
    Mid-frame Voicing    3          3
    Spectral Envelope    73         52
    Mid-frame Envelope   2          2
    Total                96         72

For spectrum coding the following values were used for parameterization:

    PARAMETER   4.8 kb/second   2.4 kb/second
    Ω           0-3.8 kHz       0-3.6 kHz
    N           26              24
    Channel 0   6 bits          6 bits

At 4.8 kb/second, channels 1-19 have 3 bits per channel, while channels 20-25 have 2 bits per channel. At the transformed 2.4 kb/second rate, channels 1-23 have 2 bits per channel.
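As a quick arithmetic check of the table above, the frame payloads and frame periods reproduce the two stream rates:

    /* Rate check for the bit allocations tabulated above. */
    #include <stdio.h>

    int main(void) {
        printf("4.8 kb/s set: %.0f b/s\n", 96 / 0.020); /* 96 bits / 20 ms */
        printf("2.4 kb/s set: %.0f b/s\n", 72 / 0.030); /* 72 bits / 30 ms */
        return 0;
    }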
Pitch for both parameter sets is logarithmically coded between 50 and 400 Hz; there is greater separation between the coded values as f₀ goes up. The spectral envelope is coded by dividing the spectrum into channels. The channel spacing is based on a mel scale. Both the number of channels and the scale are different for the different bit-rates.
There is no parameter de-scoping for the voicing. Since the transformation is working at two different frame rates, the 4.8 kb/second parameters are decoded to derive time tracks from which the 2.4 kb/second parameters can be estimated. Interpolation is used to derive the 30 ms estimates from the 20 ms estimates. Once the 30 ms estimates are derived, they are coded using the 2.4 kb/second parameter set.
While the parameters are transmitted every 20 or 30 milliseconds, estimates of the parameters for 10 and 15 milliseconds, called the mid-frame
estimate, are transmitted with each frame.
The third data conversion technique is the heterogeneous transform, which is used to transform between parameterizations which are based on different speech models. For example, two alternative ways of representing the spectral envelope H(ω) are cepstral smoothing and linear prediction. If one has chosen to code cepstral coefficients at a higher bit-rate and linear prediction coefficients at a lower bit-rate, one can reconstruct the spectral envelope from the cepstral coefficients, derive an estimate of the autocorrelation function from the spectral envelope, and derive the reflection
coefficients. Letting {ĉ_k} represent the set of coded cepstral coefficients, the estimated spectrum is calculated as

    log Ĥ(ω) = Σ_k ĉ_k cos(kω)    (11)

and the set of autocorrelation coefficients {r_k} is derived from the equation

    r_k = (1/2π) ∫_{-π}^{π} |Ĥ(ω)|² cos(kω) dω    (12)
From the {r_k} the reflection coefficients can be derived using the Levinson-Durbin recursion.
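A standard statement of that recursion, with illustrative names and a fixed maximum order (both assumptions, not taken from the patent), is:

/* Levinson-Durbin recursion: derive the reflection coefficients
 * refl[0..p-1] from the autocorrelation values r[0..p].  Returns 0 on
 * success, -1 on degenerate input.  Illustrative sketch. */
int levinson_durbin(const double *r, double *refl, int p)
{
    double a[64], a_prev[64]; /* predictor coefficients, order <= 64 */
    double err = r[0];        /* prediction error E_0 */
    int m, i;
    if (p > 64 || err <= 0.0) return -1;
    for (m = 0; m < p; m++) {
        double acc = r[m + 1];
        double kcoef;
        for (i = 0; i < m; i++)
            acc -= a[i] * r[m - i];
        kcoef = acc / err;            /* (m+1)-th reflection coefficient */
        refl[m] = kcoef;
        for (i = 0; i < m; i++) a_prev[i] = a[i];
        for (i = 0; i < m; i++)
            a[i] = a_prev[i] - kcoef * a_prev[m - 1 - i];
        a[m] = kcoef;
        err *= (1.0 - kcoef * kcoef); /* E_m = (1 - k_m^2) E_(m-1) */
        if (err <= 0.0) return -1;
    }
    return 0;
}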
This idea can be extended to produce a method for transformation between coders which use different speech models. Previously, it has been reported that the uncoded sinewave parameters could be represented in terms of 10 parameters. This idea can be extended by fitting the parameters to coded sinewave parameters. For example, if a cepstral envelope has been used to code the sinewave spectrum at some frame rate, technique 2 and technique 3 can be used to make the bit stream coming from the sinewave coder interoperable.
As mentioned above, in the second major embodiment of the present invention multispeaker conferencing is accomplished between users of vocoders which are only capable of outputting and receiving a single digital data stream at a certain data rate. As shown in Figure 2, the system of the invention includes a speech terminal for each conferee and a conferencing bridge. Each of the speech terminal units (201-203 of Figure 2) includes the handset, A/D converter, vocoder analyzer, and modem units depicted in Figure 1. The purpose of Figure 2 is to show that each vocoder is made up of an analyzer and a synthesizer. The analyzer of the speech terminal analyzes the conferee's voice and codes it at the highest bit-rate allowed by the channel connecting the terminal to the bridge. The synthesizer receives encoded speech signals and produces therefrom signals which represent artificial sounds that simulate the original speech of active conferees. The conferencing bridge monitors all channels and determines which conferees are actively speaking. The bridge determines the number of active speakers and allocates the available bandwidth to each conferee based on the number of speaker parameter sets to be transmitted. After allocating the channel to the active speakers, the bridge uses parameter-space transformations to reduce the bandwidth required to encode the parameter sets. When there is only one speaker, all conferees receive the single speaker at the highest data rate allowed by each conferee's channel. When a collision between two speakers occurs, the bridge allocates to each speaker half the available channel bandwidth, while performing bit-rate reduction by parameter-space transformations on the active speakers to allow both data streams to fit in the channel. Speaker and interrupter hear each other at the full channel bandwidth.
Since the speech terminal will often receive multiple parameter sets, it must be capable of synthesizing and summing several signals. One technique that can be employed with the Sinusoidal Transform Coder allows for summation in the frequency domain before synthesis. Alternatively, synthesis could be performed first and the resulting analog waveforms summed.
In a typical scenario with three conferees and two interrupting speakers over 4.8 kb/s channels, the non-speaking listeners receive the two colliding speakers at 2.4 kb/s. Each speaker receives the other speaker at the higher bit-rate, since a speaker is not fed back his own speech.
Figures 4, 5 and 6 give the flowcharts for analysis, synthesis, and parameter-space
transformation routines respectively, used by the analyzer, synthesizer and conferencing bridge of Figures 1 and 2 and are discussed below after the discussion of Figures 2 and 3.
The present invention represents an
improvement over signal selection techniques in that it allows for the representation of multiple
speakers, allowing the conference to flow in a manner analogous to analog conferences. It improves upon the idea of permanently halving the bandwidth to allow for two speakers by only dividing the channel when there are multiple speakers, thereby allowing for higher quality when only one speaker is present. In addition, the principle of a parameter-space transformation allows a conference that is taking place at a higher bit-rate to be reconstructed at a lower bit-rate for conferees whose channels are more bandlimited. For example, a listener who is connected to the example conference given above but whose channel can only support 2.4 kb/s could hear the primary speaker at 2.4 kb/s and the speaker and interrupter at 1.2 kb/s.
In both of the embodiments of the invention described above, the conferencing bridge 100 of Figures 1, 2 and 3 can be implemented using a microprocessor which is programmed with the computer program in the microfiche appendix and connected to the modem units as illustrated. One suitable computer is the SUN Sparc 370, but other equivalent computers can be used. Similarly, the modem units, A/D converters and vocoders are commercially-available items and need not be described in detail.
Figure 3 is a block diagram showing the output signal path from the conferencing bridge 100 back to the three telephone handsets 101-103 of Figure 1. As indicated above, the bridge has two basic functions: 1) signal routing; and 2) bit-rate reduction on speaker parameter sets to allow multiple speakers to be transmitted through the channel. For signal routing the conferencing bridge considers each terminal a source (the speech analyzer) with an associated target (the speech synthesizer). When there is only one active source, that is, one speaker, all conferees except the active speaker receive the same set of parameters at the highest rate. When there are two active speakers, each speaker receives the other at the highest rate, while passive listeners receive the two parameter sets of the two active speakers, each transformed to a lower
bit-rate. Figure 2 shows a typical scenario with three conferees, two of which are actively speaking.
When one speaker is speaking, the conferencing bridge will receive the data from the telephone handset 101 at the full 4.8 kb/s data rate. Since there is no conflict between speakers, the conferencing bridge outputs the voice signal through the bridge modems 191-193, phone lines 150, speaker modems 131-133, synthesizers 301-303, and digital-to-analog converters (DAC) 311-313 back to all three handsets 101-103.
When two speakers are talking simultaneously, the conferencing bridge will receive each of the bit data streams at the full 4.8 kb/s data rate, but will perform digital data compression to output a combined data signal with each voice represented in a 2.4 kb/s data stream. Similarly, each data stream of three speakers would be compressed into a 1.6 kb/s data stream; 1.2 kb/s for four speakers, 0.96 kb/s for five speakers, etc. When compressing a data stream, voice quality is traded off in exchange for the use of a minimum number of bits to represent the original voice. The principle of the present invention is limited only by the ability to retain voice quality as the individual data streams of multiple speakers are first compressed (in proportion to the number of speakers) and then combined so that their sum equals the total bit stream data rate being used. The combined data stream, when processed by the synthesizers 301-303 and converted back into analog by the DAC units 311-313, will sound like the conversation of the conferees on the telephone handsets 101-103.
The present invention includes the process of multispeaker conferencing over a narrowband channel at a system data rate in which digital signal
compression and digital signal summation are used as follows. First, in the example shown above, the conferencing bridge monitors the system and counts the number of conferees speaking. When one conferee is speaking, his voice is conducted to all users of the system at the full system data rate (4.8 kb/s in the example used). When multiple conferees are speaking at once, the conferencing bridge counts the number of active conferees, and identifies the number by the integer N. This means that the digital data stream of each conferee must be compressed down to a bit stream rate given by SDR/N, where SDR is the system data rate.
Finally, the conferencing bridge combines and forwards all the compressed data streams in a combined digital signal summation data stream back to all users of the system such that the combined digital signal summation data stream has a bit data rate that equals the system data rate. In developing the present invention, the system data rate used by one particular vocoder was 4.8 kb/s, but the principles of the present invention can apply to any system data rate, and are not limited to the example used. Vocoders have data rates which range from 2.4 to 32 kb/s, and the invention can be applied to any of these systems.
Figure 4 is a flow chart of the process used by each analyzer in the vocoders. Analog speech is converted to digital, analyzed and encoded at the system data rate. The significance of Figure 4 is that each analyzer operates at the highest bit rate allowed by the system, and compression and digital signal summation are all performed at the
conferencing bridge.
Figure 5 is a flowchart of the process used by each synthesizer in the vocoders. The synthesizers each receive encoded digital data streams and produce therefrom artificial speech that represents the voices originally input into the handsets. In the present invention, the data stream received by the
synthesizers is always at the system data rate (4.8 kb/s in this example) and represents either one conferee or any number of conferees speaking
simultaneously.
Figure 6 is a flow chart of the process performed by the conferencing bridge. As mentioned above, the conferencing bridge can be a
microprocessor programmed with the source code contained in the microfiche appendix. As described above, the process first counts the number of active conferees and determines therefrom the rate to which each data stream needs to be compressed to maintain an output at the system data rate.
The control logic for the bridge is fairly simple. Two slots are available for active speakers on a first-come, first-served basis. New speakers that begin while both slots are occupied are denied access to a newly freed channel, to prevent speakers from being presented in mid-sentence. Since some interpolation of parameters is done, care must be taken to properly associate parameters going into and out of collisions. For this purpose the bridge recognizes and codes one of four states. One state represents no change from the previous state. Another state signals an increase in the number of speakers from one to two (one speaker is assumed); the other two states identify which speaker is still speaking during the transition back to one speaker. The use of overlap-add synthesis allows for a particularly simple method for synthesizing multiple speakers. With the overlap-add technique a speaker's parameters (pitch, voicing, and spectrum) are used to fill out an FFT buffer. The inverse transform is taken and summed in the usual overlap-add manner. With two speakers, the two FFT buffers are added before taking the inverse transform. In this way the synthesis of two speakers is no more complex than that of one speaker.
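The following sketch shows the shape of that two-speaker synthesis step. It assumes spectra held in real-valued FFT buffers and a generic inverse-FFT routine; ifft() and the buffer layout are placeholders, not the routines of the appendix.

#define FFT_LEN 512

extern void ifft(const float *spectrum, float *frame, int n); /* placeholder */

/* Two-speaker overlap-add synthesis: sum the two FFT buffers, take one
 * inverse transform, and overlap-add the frame into the output, so two
 * speakers cost no more than one.  Illustrative sketch. */
void synth_two_speakers(const float *spec_a, const float *spec_b,
                        float *out, int t)
{
    float sum[FFT_LEN], frame[FFT_LEN];
    int i;
    for (i = 0; i < FFT_LEN; i++)
        sum[i] = spec_a[i] + spec_b[i]; /* sum in the frequency domain */
    ifft(sum, frame, FFT_LEN);          /* single inverse transform */
    for (i = 0; i < FFT_LEN; i++)
        out[t + i] += frame[i];         /* usual overlap-add */
}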
The idea of splitting the channel to allow for the parameter sets of multiple speakers depends on an effective transformation from a higher bit-rate to a lower bit-rate. Such a transformation must be able to work on a received set of parameters without changing the source rate. A bit-rate reduction can be performed by using a suite of embedded coders, coders for which the parameters of the higher bit-rate coder contain coarser-resolution parameters for a lower bit-rate system.
The digital data signal compression can be performed by the conferencing bridge using any of the three digital data conversion techniques described above. However, the present invention is not limited to these three techniques alone, nor is it mandatory that the conferencing bridge be a computer. The process of Figure 6 could be performed by a hardwired circuit as well as by a computer programmed with the source code listed in the attached microfiche appendix. Similarly, the digital data received by the conferencing bridge is not limited to digitized voice data, but can include other digital data signals, including visual images.
The proposed conferencing system can be implemented using embedded coders, coders which have the lower bit rate parameters embedded in the
parameters of the higher bit rate coder.
Figures 7 and 8 are illustrative of two embodiments of the present invention. Figure 7 depicts the conferencing bridge 20 which is connected by communication links 14-16 to multiple conferees using vocoders 21-23. As mentioned above, each of the communication links 14-16 can include but is not limited to: modems and telephone lines, satellite communication links, radio, and laser communication networks. Each of the vocoders 21-23 has an analyzer for encoding and digitizing the user outputs, and a synthesizer for decoding a combined conference signal received from the conferencing bridge. Each of the vocoders 21-23 has a user interface 11-13 which includes an input port device 18 and an output port device 17. These input and output port devices include, but are not limited to, the microphone and speaker units of telephone handsets.
Figure 8 is a block diagram of an application of the principles of the present invention that accomplishes both multispeaker conferencing and the interfacing of vocoders with different system data rates. The conferencing bridge unit 30 includes: a multiconferencing processor 41 which receives, combines and outputs a combined conference signal to users via communication links 31-33; multiple detector units 35-37; and multiple rate translator units 38-40.
Each of the detector units 35-37 output a detector signal to the multiconferencing processor 41 when a conferee on its particular line is speaking. Each of the rate translators 38-40 may be an individual microprocessor which performs one of the rate conversion techniques described above. The multiconferencing processor 41 is a data processor which outputs a combined conference signal to each of the conferees at their particular data rate using the techniques described above.
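The patent leaves the internal design of the detector units open; in the appendix listing, each terminal simply transmits a speaking bit (spkng) with its frame. One plausible bridge-side realization, with illustrative frame size, threshold, and hangover values, is a frame-energy detector:

#include <math.h>

#define DET_FRAME 160 /* e.g. 20 ms of speech at 8 kHz; illustrative */

/* Return 1 while the line appears active.  threshold is an energy level
 * in dB; *hangover holds the line active for a few frames after speech
 * stops.  Illustrative sketch, not part of the patent. */
int detect_speech(const short *frame, double threshold, int *hangover)
{
    double energy = 0.0;
    int i;
    for (i = 0; i < DET_FRAME; i++)
        energy += (double)frame[i] * (double)frame[i];
    energy = 10.0 * log10(energy / DET_FRAME + 1e-9); /* mean power, dB */
    if (energy > threshold) {
        *hangover = 10; /* hold active ~10 frames past last speech */
        return 1;
    }
    if (*hangover > 0) {
        (*hangover)--;
        return 1;
    }
    return 0;
}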
COMPUTER PROGRAM LISTING OF MICROFICHE APPENDIX FOR MULTI-SPEAKER CONFERENCING OVER NARROWBAND CHANNELS
/*tgc:5/25/91: new bridge algorithm for conferencing. This bridge routine was developed at Rome Laboratory, and demonstrates the bridge function when there are two active speakers*/
/* This routine was developed at Rome Laboratory by Terrence Champion. All subroutine calls are from subroutines developed at Lincoln Laboratory by R. J. McAulay, except for pack_bits and unpack_bits, which were developed at Rome Laboratory. Several of the subroutines have been modified at Rome Laboratory, including setup_bridge(), code_gs_init() and code_gs(). The header files are modifications of files developed at Lincoln Laboratory. */
#include <math.h>
#include <stdio.h>
#include "define_e.bridge.h" /*STC arrays modded to accommodate CHANNEL speakers*/
#include "pointers.h"        /*all the arrays from STC*/

static int abs_time[CHANNELS], interp_time[CHANNELS];
FILE *fin[CHANNELS];

main(argc, argv)
int argc;
char *argv[];
{
    int nmbr_spkrs, nmbr_spkrs_past, spkng[CHANNELS], spkng_past[CHANNELS];
    int chan0, chan1, chan[2];
    int index_range;
    int i, k, n, conferee;
    int offset, sync, state, clear, temp;
    static int xform_pack;

    /*initialize*/
    nmbr_spkrs = 0;
    chan0 = 0;
    clear = 0;
    nmbr_spkrs_past = 0;
    allocate();
    setup_bridge(argc, argv);

    /*set warp design parameters*/
    order_cepstral = order_cep_1;
    nmbr_channels = nmbr_channels_1;
    spacing_channels = spacing_channels_1;
    trig_init();
    warp_init();
    code_pitch_init();
    code_gs_init(&index_range);
    srand(); /* random seed generation */
    clear_data();
    n = 0;
    /*************************************************************************/
    /*unpack parameters from all channels*/
    for (conferee = 0; conferee < CHANNELS; conferee++) {
        /*unpack parameters for each conferee*/
        offset = conferee * order_cepstral;
        for (k = 0; k < order_cepstral; k++)
            gs_q_kml[offset + k] = gs_q_kpl[offset + k];
        pitch_q_kml[conferee] = pitch_q_kpl[conferee];
        vprob_q_kml[conferee] = vprob_q_kpl[conferee];
        params = fin[conferee];
        unpack_bits(&sync, 1, conferee); /*sync bit*/
        unpack_bits(&spkng[conferee], 1, conferee);
        if (ss_pitch_frame_fill != 'y') {
            unpack_bits(&pitch_index_k[conferee], nbits_pitch, conferee);
            decode_pitch(pitch_kpl, pitch_index_k[conferee],
                &pitch_q_k[conferee]);
        }
        unpack_bits(&pitch_index_kpl[conferee], nbits_pitch, conferee);
        decode_pitch(pitch_kpl, pitch_index_kpl[conferee],
            &pitch_q_kpl[conferee]);
        if (ss_voicing_frame_fill != 'y') {
            unpack_bits(&vprob_index_k[conferee], nbits_voicing, conferee);
            decode_voicing(vprob_kpl, vprob_index_k[conferee],
                &vprob_q_k[conferee]);
        }
        unpack_bits(&vprob_index_kpl[conferee], nbits_voicing, conferee);
        decode_voicing(vprob_final, vprob_index_kpl[conferee],
            &vprob_q_kpl[conferee]);
        unpack_bits(&stepsize_index[conferee], index_range, conferee);
        unpack_bits(&ff_bit_env[conferee], 2, conferee);
        if (ss_pitch_frame_fill == 'y')
            unpack_bits(&ff_bit_pitch[conferee], 1, conferee);
        if (ss_voicing_frame_fill == 'y')
            unpack_bits(&ff_bit_voicing[conferee], 1, conferee);
        unpack_bits(&sign_kpl[offset], 1, conferee);
        unpack_bits(&gs_index_kpl[offset], nbits_channels[0], conferee);
        for (k = 1; k < order_cepstral; k++) {
            unpack_bits(&sign_kpl[offset + k], 1, conferee);
            /* the source listing is truncated here; by symmetry with the
               pack loop below, an unpack of gs_index_kpl with
               nbits_channels[k]-1 bits presumably followed */
            unpack_bits(&gs_index_kpl[offset + k], nbits_channels[k] - 1,
                conferee);
        }
    }

    /*determine and set state for passive listeners*/
    nmbr_spkrs = 0;
    for (k = 0; k < CHANNELS; k++)
        nmbr_spkrs += spkng[k];
    /*************************************************************************/
    if (nmbr_spkrs > 2) nmbr_spkrs = 2; /*clamp nmbr_spkrs at 2*/
    if (nmbr_spkrs == 0) clear = 1;
    if (nmbr_spkrs > nmbr_spkrs_past) {
        if (clear == 1) { /*a single new speaker has started*/
            state = 0;
            for (k = 0; k < CHANNELS; k++) {
                if (spkng[k] == 1 && spkng[k] != spkng_past[k]) {
                    chan0 = k;
                    offset = order_cepstral * k;
                    for (i = 0; i < order_cepstral; i++)
                        gs_q_kml[offset + i] = 0.0;
                    break;
                }
            }
            clear = 0;
        }
        else if (nmbr_spkrs == 2) { /*a second speaker has joined*/
            state = 1;
            for (k = 0; k < CHANNELS; k++) {
                if (spkng[k] == 1 && k != chan0
                    && spkng[k] != spkng_past[k]) {
                    chan1 = k;
                    offset = order_cepstral * k;
                    for (i = 0; i < order_cepstral; i++)
                        gs_q_kml[offset + i] = 0.0;
                    break;
                }
            }
        }
    }
    else if (nmbr_spkrs_past > nmbr_spkrs) { /*one speaker has dropped off*/
        if (spkng[chan0] == 0
            && spkng_past[chan0] != spkng[chan0]) {
            /*it's the first speaker*/
            state = 2;
            chan0 = chan1;
            for (k = 0; k < CHANNELS; k++) {
                interp_time[k] = 0;
                abs_time[k] = 0;
            }
        }
        else {
            state = 3; /* the source listing shows 0 here, but the
                          four-state description above implies 3 (the
                          second speaker has dropped off) */
        }
    }
    if (nmbr_spkrs == 1 || nmbr_spkrs == 0) {
        pack_bits(state, 2); /*pack state*/
        pack_bits(0, 1);     /*sync bit*/
    }
    else if (abs_time[chan0] >= interp_time[chan0]) {
        pack_bits(state, 2); /*pack state*/
        pack_bits(0, 1);     /*sync bit*/
    }
    /*************************************************************************/
    /*pack parameters to passive listeners*/
    if (nmbr_spkrs == 1 || nmbr_spkrs == 0) { /*pack bits for one speaker*/
        offset = chan0 * order_cepstral;
        if (ss_pitch_frame_fill != 'y')
            pack_bits(pitch_index_k[chan0], nbits_pitch);
        pack_bits(pitch_index_kpl[chan0], nbits_pitch);
        if (ss_voicing_frame_fill != 'y')
            pack_bits(vprob_index_k[chan0], nbits_voicing);
        pack_bits(vprob_index_kpl[chan0], nbits_voicing);
        pack_bits(stepsize_index[chan0], index_range);
        pack_bits(ff_bit_env[chan0], 2);
        if (ss_pitch_frame_fill == 'y')
            pack_bits(ff_bit_pitch[chan0], 1);
        if (ss_voicing_frame_fill == 'y')
            pack_bits(ff_bit_voicing[chan0], 1);
        pack_bits(sign_kpl[offset], 1);
        pack_bits(gs_index_kpl[offset], nbits_channels[0]);
        printf("\norder=%d", order_cepstral);
        for (k = 1; k < order_cepstral; k++) {
            pack_bits(sign_kpl[offset + k], 1);
            gs_index_kpl[offset + k] -= 1;
            if (nbits_channels[k] > 1)
                pack_bits(gs_index_kpl[offset + k], nbits_channels[k] - 1);
        }
    }
    else { /*two active speakers: decode each and transform to the lower rate*/
        offset = chan0 * order_cepstral;
        if (ss_pitch_frame_fill != 'y')
            decode_pitch(pitch_kpl, pitch_index_k[chan0], &pitch_q_k[chan0]);
        decode_pitch(pitch_kpl, pitch_index_kpl[chan0], &pitch_q_kpl[chan0]);
        if (ss_voicing_frame_fill != 'y')
            decode_voicing(vprob_final, vprob_index_k[chan0],
                &vprob_q_k[chan0]);
        decode_voicing(vprob_final, vprob_index_kpl[chan0],
            &vprob_q_kpl[chan0]);
        /*convert channel gains to cepstral coefficients*/
        for (i = 0; i < order_cepstral; i++) {
            cs_kml[offset + i] = cs_kpl[offset + i];
        }
        channels_to_cs(&gs_q_kpl[offset], &cs_kpl[offset], order_cepstral);
        xform(chan0);

        offset = chan1 * order_cepstral;
        if (ss_pitch_frame_fill != 'y')
            decode_pitch(pitch_kpl, pitch_index_k[chan1], &pitch_q_k[chan1]);
        decode_pitch(pitch_kpl, pitch_index_kpl[chan1], &pitch_q_kpl[chan1]);
        if (ss_voicing_frame_fill != 'y')
            decode_voicing(vprob_final, vprob_index_k[chan1],
                &vprob_q_k[chan1]);
        decode_voicing(vprob_final, vprob_index_kpl[chan1],
            &vprob_q_kpl[chan1]);
        /*convert channel gains to cepstral coefficients*/
        for (i = 0; i < order_cepstral; i++) {
            cs_kml[offset + i] = cs_kpl[offset + i];
        }
        channels_to_cs(&gs_q_kpl[offset], &cs_kpl[offset], order_cepstral);
        xform(chan1);
    }
    nmbr_spkrs_past = nmbr_spkrs;
} /*end of main; any per-frame loop wrapper is not preserved in the source listing*/
xform(spkr)
int spkr;
{
    int i, k, n;
    int wait, index_range, trsh, offset;
    float modulus, modulus_prime;
    float inside_modulus, inside_modulus_prime;
    float *ptr1, *ptr2, *ptr3, *ptr4, *ptr5, *ptr6;
    static float gs_q_old[256];

    /*************************************************************************/
    offset = spkr * order_cepstral;

    /*begin time interpolation routine*/
    /*set interpolation markers*/
    modulus = ((float)abs_time[spkr] - interp_time[spkr]);
    modulus_prime = ((float)base_frame - modulus);
    if (modulus >= base_frame_div_2) {
        /* the first line of this branch is garbled in the source; by
           symmetry with the else branch it presumably read: */
        inside_modulus = modulus - base_frame_div_2;
        inside_modulus_prime = base_frame_div_2 - inside_modulus;
    }
    else {
        inside_modulus = modulus;
        inside_modulus_prime = modulus_prime - base_frame_div_2;
    }

    /*************************************************************************/
    if (abs_time[spkr] >= interp_time[spkr]) {
        /*interpolate*/
        if (ss_pitch_frame_fill != 'y')
            if (modulus_prime <= base_frame_div_2)
                pitch_final = (inside_modulus * pitch_q_kml[spkr]
                    + inside_modulus_prime * pitch_q_k[spkr])
                    / base_frame_div_2;
            else
                pitch_final = (inside_modulus * pitch_q_k[spkr]
                    + inside_modulus_prime * pitch_q_kpl[spkr])
                    / base_frame_div_2;
        else
            pitch_final = (modulus * pitch_q_kml[spkr]
                + modulus_prime * pitch_q_kpl[spkr]) / base_frame;
        if (ss_voicing_frame_fill != 'y')
            if (modulus_prime <= base_frame_div_2)
                vprob_final = (inside_modulus * vprob_q_kml[spkr]
                    + inside_modulus_prime * vprob_q_k[spkr])
                    / base_frame_div_2;
        /* the remainder of the voicing interpolation (the else branches
           paralleling the pitch case above) is missing from the source
           listing */

        /*code parameters at new bit rate*/
        /*code the pitch*/
        code_pitch(pitch_final, &pitch_index_kpl[spkr]);
        /*code the voicing probability*/
        code_voicing(vprob_final, &vprob_index_kpl[spkr]);

        /*convert envelopes*/
        cs_to_envelopes(&cs_final[offset], order_cepstral,
            freq_lpc, &env_final[offset], &sysfaz_final[offset]);
        order_cepstral = order_cep_2;
        nmbr_channels = nmbr_channels_2;
        spacing_channels = spacing_channels_2;
        nmbr_bits_channel = nmbr_bits_channel_2;
        warp_init();
        code_gs_init(&index_range);
        envelope_to_cs(&env_final[offset], &cs_final[offset], order_cepstral);
        cs_to_channels(&cs_final[offset], &gs_final[offset], order_cepstral);
        code_gs(&gs_final[offset], &gs_q_old[offset], order_cepstral,
            &index_range, &stepsize_index[spkr], sign_kpl, gs_index_kpl,
            &gain_min, &gain_max, &audio_in[offset_ana-smprate]);

        /*pack channel*/
        pack_bits(pitch_index_kpl[spkr], nbits_pitch);
        pack_bits(vprob_index_kpl[spkr], nbits_voicing);
        pack_bits(stepsize_index[spkr], index_range);
        pack_bits(ff_bit_env[spkr], 2);
        pack_bits(ff_bit_pitch[spkr], 1);
        pack_bits(ff_bit_voicing[spkr], 1);
        pack_bits(sign_kpl[0], 1);
        pack_bits(gs_index_kpl[0], nbits_channels[0]);
        gs_q_old[offset] = gs_final[offset];
        for (k = 1; k < order_cepstral; k++) {
            pack_bits(sign_kpl[k], 1);
            gs_index_kpl[k] -= 1;
            if (nbits_channels[k] > 1)
                pack_bits(gs_index_kpl[k], nbits_channels[k] - 1);
            gs_q_old[offset + k] = gs_final[offset + k];
        }

        /*debug trace*/
        for (i = 0; i < order_cepstral; i++) {
            printf("\ngs_kpl[%d]=%f", offset + i, gs_q_kpl[offset + i]);
            printf("\ngs_final[%d]=%f", offset + i, gs_final[offset + i]);
        }

        /*set time counters*/
        if (abs_time[spkr] == interp_time[spkr]) {
            abs_time[spkr] = 0;
            interp_time[spkr] = interp_frame;
        }
        else {
            interp_time[spkr] += interp_frame;
        }

        /*restore the higher-rate warp design parameters*/
        order_cepstral = order_cep_1;
        nmbr_channels = nmbr_channels_1;
        spacing_channels = spacing_channels_1;
        nmbr_bits_channel = nmbr_bits_channel_1;
        warp_init();
        code_gs_init(&index_range);
    } /*end of if*/
    abs_time[spkr] += base_frame;
} /*end of subroutine*/
While the invention has been described in its presently preferred embodiment, it is understood that the words which have been used are words of description rather than words of limitation, and that changes within the purview of the appended claims may be made without departing from the scope and spirit of the invention in its broader aspects.

Claims

1. A process of facilitating multispeaker conferencing for users of a digitally encoded voice system, wherein each of said users is capable of outputting encoded digital voice signals in a bit stream at a system data rate, wherein said process comprises the steps of: counting a number of active conferees to produce thereby a count number N, said active
conferees being users of said digitally encoded voice system which are speaking when said counting step is performed; compressing each of the encoded digital voice signals produced by said active conferees down to a plurality of compressed digital signals which each have a compressed data rate; a digital signal summation step, in which the compressed digital signals are combined to output thereby a combined compressed digital signal which is transmitted to said users at the system data rate; and decoding said combined compressed digital signal to output thereby an artificial voice signal to said users of the digitally encoded voice system, wherein said artificial voice signal represents voices of all the active conferees which are speaking.
2. A process, as defined in claim 1, wherein said digitally encoded voice system used in said process comprises a set of vocoders which operate over a narrowband with said system data rate ranging
substantially between 2.4 and 32.0 kb/s.
3. A process, as defined in claim 1, wherein said compressing step comprises digitally compressing each of said encoded digital voice signals into said compressed digital signals such that their compressed data rate has a bit rate given by SDR/N, where SDR is the system data rate, and N is the number of active conferees.
4. A process, as defined in claim 2 , wherein said compressing step comprises digitally compressing each of said encoded voice signals into said compressed digital signals such that their compressed data rate has a bit rate given by SDR/N, where SDR is the system data rate, and N is the number of active conferees.
5. A process, as defined in claim 1, wherein said counting and compressing steps are performed using a microprocessor which is electrically connected to each of said users of said digitally encoded voice system using telephone lines and a plurality of modem units, wherein said microprocessor is programmed to perform as a conferencing bridge and to perform said counting and compressing steps.
6. A process, as defined in claim 2, wherein said counting and compressing steps are performed using a microprocessor which is electrically connected to each of said users of said digitally encoded voice system using telephone lines and a plurality of modem units, wherein said microprocessor is programmed to perform as a conferencing bridge and to perform said counting and compressing steps.
7. A process, as defined in claim 3, wherein said counting and compressing steps are performed using a microprocessor which is electrically connected to each of said users of said digitally encoded voice system using telephone lines and a plurality of modem units, wherein said microprocessor is programmed to perform as a conferencing bridge and to perform said counting and compressing steps.
8. A process, as defined in claim 4, wherein said counting and compressing steps are performed using a microprocessor which is electrically connected to each of said users of said digitally encoded voice system using telephone lines and a plurality of modem units, wherein said microprocessor is programmed to perform as a conferencing bridge and to perform said counting and compressing steps.
9. A process, as defined in claim 1, wherein said counting step is performed by counting users of a vocoder system which has the system data rate of about 4.8 kilobits per second, and wherein said compressing step comprises: compressing each of the encoded digital voice signals into approximately 2.4 kilobit per second data stream when there are two active
conferees; compressing each of the encoded digital voice signals into approximately 1.6 kilobit per second data stream when there are three active conferees; compressing each of the encoded digital voice signals into approximately 1.2 kilobit per second data stream when there are four active conferees; compressing each of the encoded digital voice signals into approximately 0.96 kilobit per second data stream when there are five active conferees; and compressing each of the encoded digital voice signals into approximately 0.8 kilobit per second data stream when there are six active
conferees.
10. A process, as defined in claim 2, wherein said counting step is performed by counting users of a vocoder system which has the system data rate of about 4.8 kilobits per second, and wherein said compressing step comprises: compressing each of the encoded digital voice signals into approximately 2.4 kilobit per second data stream when there are two active conferees; compressing each of the encoded digital voice signals into approximately 1.6 kilobit per second data stream when there are three active conferees; compressing each of the encoded digital voice signals into approximately 1.2 kilobit per second data stream when there are four active conferees; compressing each of the encoded digital voice signals into approximately 0.96 kilobit per second data stream when there are five active conferees; and compressing each of the encoded digital voice signals into approximately 0.8 kilobit per second data stream when there are six active conferees.
11. A process, as defined in claim 3, wherein said counting step is performed by counting users of a vocoder system which has the system data rate of about 4.8 kilobits per second, and wherein said compressing step comprises: compressing each of the encoded digital voice signals into approximately 2.4 kilobit per second data stream when there are two active
conferees; compressing each of the encoded digital voice signals into approximately 1.6 kilobit per second data stream when there are three active conferees; compressing each of the encoded digital voice signals into approximately 1.2 kilobit per second data stream when there are four active
conferees; compressing each of the encoded digital voice signals into approximately 0.96 kilobit per second data stream when there are five active
conferees; and compressing each of the encoded digital voice signals into approximately 0.8 kilobit per second data stream when there are six active
conferees.
12. A process, as defined in claim 4, wherein said counting step is performed by counting users of a vocoder system which has the system data rate of about 4.8 kilobits per second, and wherein said compressing step comprises: compressing each of the encoded digital voice signals into approximately 2.4 kilobit per second data stream when there are two active
conferees; compressing each of the encoded digital voice signals into approximately 1.6 kilobit per second data stream when there are three active conferees; compressing each of the encoded digital voice signals into approximately 1.2 kilobit per second data stream when there are four active
conferees; compressing each of the encoded digital voice signals into approximately 0.96 kilobit per second data stream when there are five active conferees; and compressing each of the encoded digital voice signals into approximately 0.8 kilobit per second data stream when there are six active
conferees.
13. A process, as defined in claim 5, wherein said counting step is performed by counting users of a vocoder system which has the system data rate of about 4.8 kilobits per second, and wherein said compressing step comprises: compressing each of the encoded digital voice signals into approximately 2.4 kilobit per second data stream when there are two active
conferees; compressing each of the encoded digital voice signals into approximately 1.6 kilobit per second data stream when there are three active conferees; compressing each of the encoded digital voice signals into approximately 1.2 kilobit per second data stream when there are four active conferees; compressing each of the encoded digital voice signals into approximately 0.96 kilobit per second data stream when there are five active
conferees; and compressing each of the encoded digital voice signals into approximately 0.8 kilobit per second data stream when there are six active
conferees.
14. A process, as defined in claim 6, wherein said counting step is performed by counting users of a vocoder system which has the system data rate of about 4.8 kilobits per second, and wherein said compressing step comprises: compressing each of the encoded digital voice signals into approximately 2.4 kilobit per second data stream when there are two active
conferees; compressing each of the encoded digital voice signals into approximately 1.6 kilobit per second data stream when there are three active conferees; compressing each of the encoded digital voice signals into approximately 1.2 kilobit per second data stream when there are four active conferees; compressing each of the encoded digital voice signals into approximately 0.96 kilobit per second data stream when there are five active conferees; and compressing each of the encoded digital voice signals into approximately 0.8 kilobit per second data stream when there are six active
conferees.
15. A process, as defined in claim 7, wherein said counting step is performed by counting users of a vocoder system which has the system data rate of about 4.8 kilobits per second, and wherein said compressing step comprises: compressing each of the encoded digital voice signals into approximately 2.4 kilobit per second data stream when there are two active conferees; compressing each of the encoded digital voice signals into approximately 1.6 kilobit per second data stream when there are three active conferees; compressing each of the encoded digital voice signals into approximately 1.2 kilobit per second data stream when there are four active conferees; compressing each of the encoded digital voice signals into approximately 0.96 kilobit per second data stream when there are five active conferees; and compressing each of the encoded digital voice signals into approximately 0.8 kilobit per second data stream when there are six active conferees.
16. A process, as defined in claim 8, wherein said counting step is performed by counting users of a vocoder system which has the system data rate of about 4.8 kilobits per second, and wherein said compressing step comprises: compressing each of the encoded digital voice signals into approximately 2.4 kilobit per second data stream when there are two active
conferees; compressing each of the encoded digital voice signals into approximately 1.6 kilobit per second data stream when there are three active conferees; compressing each of the encoded digital voice signals into approximately 1.2 kilobit per second data stream when there are four active
conferees; compressing each of the encoded digital voice signals into approximately 0.96 kilobit per second data stream when there are five active
conferees; and compressing each of the encoded digital voice signals into approximately 0.8 kilobit per second data stream when there are six active
conferees.
17. A system for facilitating multispeaker conferencing for users of a digitally encoded voice system, wherein each of said users is capable of outputting encoded digital voice signals in a bit stream at a system data rate, wherein said system comprises: a means for counting a number of active conferees to produce thereby a count number N, said active conferees being users of said digitally encoded voice system which are actually speaking; a means for compressing each of the encoded digital voice signals produced by said active conferees down to a plurality of compressed digital signals which each have a compressed data rate; a means for digital signal summation in which the compressed digital signals are combined to output thereby a combined compressed digital signal which is sent back to said users at the system data rate; and a means for decoding the combined compressed digital signal to output thereby an artificial voice signal to said users of the
digitally encoded voice system, wherein said
artificial voice signal represents voices of all the active conferees which are speaking.
18. A system, as defined in claim 17, wherein said digitally encoded voice system used in said system comprises a set of vocoders which operate over a narrowband with said system data rate ranging between 2.4 and 32.0 kb/s.
19. A system, as defined in claim 17, wherein said compressing means digitally compresses each of said encoded digital voice signals into said
compressed digital signals such that their compressed data rate has a bit rate given by SDR/N, where SDR is the system data rate, and N is the number of active conferees.
20. A system, as defined in claim 17, wherein said counting and compressing means comprises a microprocessor which is electrically connected to each of said users of said digitally encoded voice system using telephone lines and a plurality of modem units, wherein said microprocessor is programmed to perform as a conferencing bridge and to perform said counting and compressing.
21. A system for facilitating conferencing between users of a plurality of digitally encoded voice systems which each output encoded digital signals at their own data stream bit rates, wherein said system comprises: an electrical network which is electrically connected with each of said digitally encoded voice systems and conducts said encoded digital signals therebetween; and a means for digital-to-digital conversion which is connected to said electrical network to receive and adjust the data stream bit rate of said encoded digital signals to enable each of the digitally encoded voice systems to receive the encoded digital signals at its data stream bit rate.
22. A system, as defined in claim 21, wherein said electrical network comprises any media which conducts digitally encoded voice signals such as radio, laser communication systems, satellite
communication systems and telephone systems.
23. A system, as defined in claim 21, wherein said electrical network comprises a plurality of modems which connect said digitally encoded voice systems to said conversion means by telephone lines.
24. A system, as defined in claim 21, wherein said conversion means comprises a computer which receives digitized acoustic waveforms with coded values representing parameters that can include:
pitch, synchronization, voicing and a spectral envelope, and wherein said computer is programmed to perform parameter de-scoping and change spacing between the coded values so that each of said
digitally encoded voice systems receives the
digitized acoustic waveforms at its particular data stream bit rate.
25. A system, as defined in claim 21, wherein said conversion means comprises a computer which receives digitized acoustic waveforms with coded values representing signals which were sampled with a first sampling rate, and wherein said computer selects a second sampling rate into which to map said digitized acoustic waveforms and adjusts thereby the data stream bit rates conducted by the electrical network so that each of said digitally encoded voice systems receives the digitized acoustic waveforms at its particular data stream bit rate.
26. A system, as defined in claim 22, wherein said conversion means comprises a computer which receives digitized acoustic waveforms with coded values representing signals which were sampled with a first sampling rate, and wherein said computer selects a second sampling rate into which to map said digitized acoustic waveforms and adjusts thereby the data stream bit rates conducted by the electrical network so that each of said digitally encoded voice systems receives the digitized acoustic waveforms at its particular data stream bit rate.
27. A system, as defined in claim 21, wherein said conversion means comprises a computer which receives digitized acoustic waveforms with coded values representing acoustic parameters that can include: pitch, synchronization, voicing and a spectral envelope, and wherein said computer is programmed to reencode the spectral envelope with cepstral coefficients that have an adjusted bit-rate and yield thereby the encoded digital signals at the data stream bit rates of each of the digitally encoded voice systems.
28. A system, as defined in claim 21, wherein said conversion means comprises a computer which receives digitized acoustic waveforms with coded values representing acoustic parameters that can include: pitch, synchronization, voicing and a spectral envelope, and wherein said computer is programmed to reencode the spectral envelope with cepstral coefficients that have an adjusted bit-rate and yield thereby the encoded digital signals at the data stream bit rates of each of the digitally encoded voice systems.
29. A process for facilitating conferencing between users of a plurality of digitally encoded voice systems which each output encoded digital signals at their own data stream bit rates, wherein said process comprises: measuring each of the particular data stream bit rates produced by each of the digitally encoded voice systems to produce thereby a set of data stream bit rates which are identified for each digitally encoded voice system; and combining and converting all received encoded digital signals into a set of converted encoded digital signals which are each sent back to said digitally encoded voice systems at their data stream bit rate.
30. A process, as defined in claim 29, wherein said combining and converting step is performed by a computer which receives digitized acoustic waveforms with coded values representing parameters that can include: pitch, synchronization, voicing and a spectral envelope, and wherein said computer is programmed to perform parameter
de-scoping and change spacing between the coded values so that each of said digitally encoded voice systems receives the digitized acoustic waveforms at its data stream bit rate.
31. A process, as defined in claim 29, wherein said combining and converting step is performed by a computer which receives digitized acoustic waveforms with coded values representing signals which were sampled with a first sampling rate, and wherein said computer selects a second sampling rate into which to map said digitized acoustic waveforms and adjusts thereby the data stream bit rates conducted by the electrical network so that each of said digitally encoded voice systems receives the digitized acoustic waveforms at its data stream bit rate.
32. A process, as defined in claim 29, wherein said combining and converting step is performed by a computer which receives digitized acoustic waveforms with coded values representing acoustic parameters that can include: pitch, synchronization, voicing and spectral envelope, and wherein said computer is programmed to reencode the spectral envelope with spectral coefficients that have an adjusted bit-rate and yield thereby the encoded digital signals at the data stream bit rates of each of the digitally encoded voice systems.
33. A process, as defined in claim 29, wherein said digitally encoded voice systems are each narrow-band vocoders which each include digitized acoustic voice data in their encoded digital signals by performing sinusoidal transform coding, and wherein said converting step is performed by a computer which receives digitized acoustic waveforms with coded values representing parameters that can include: pitch, synchronization, voicing and a spectral envelope, and wherein said computer is programmed to perform parameter de-scoping and change spacing between the coded values so that each of said digitally encoded voice systems receives the
digitized acoustic waveforms at its data stream bit rate.
34. A process, as defined in claim 29, wherein said digitally encoded voice systems are each narrow-band vocoders which each include digitized acoustic voice data in their encoded digital signals by performing sinusoidal transform coding, and wherein said converting step is performed by a computer which receives digitized acoustic waveforms with coded values representing signals which were sampled with a first sampling rate, and wherein said computer selects a second sampling rate into which to map said digitized acoustic waveforms and adjusts thereby the data stream bit rates conducted by the electrical network so that each of said digitally encoded voice systems receives the digitized acoustic waveforms at its data stream bit rate.
35. A process, as defined in claim 29, wherein said digitally encoded voice systems are each narrow-band vocoders which each include digitized acoustic voice data in their encoded digital signals by performing sinusoidal transform coding and wherein said converting step is performed by a computer which receives digitized acoustic waveforms with coded values representing acoustic parameters that can include: pitch, synchronization, voicing and
spectral envelope, and wherein said computer is programmed to reencode the spectral envelope with cepstral coefficients that have an adjusted bit-rate and yield thereby the encoded digital signals at the data stream bit rates of each of the digitally
encoded voice systems.
36. A process of facilitating multispeaker conferencing for users of a digitally encoded voice system, wherein each of said users is capable of outputting encoded digital voice signals in a bit stream at a system data rate, wherein said process comprises the steps of: monitoring the users as they output said digital voice signals serially when one of said users is speaking, and monitoring said users as they output said digital voice signals simultaneously when multiple users are speaking; and producing a conferencing signal usable by all of said users, said conferencing signal
representing one digital voice signal when one of said users is speaking, said conferencing signal representing multiple digital voice signals when multiple users are speaking simultaneously.
37. A process, as defined in claim 36, wherein said monitoring step includes counting a number of active conferees to produce thereby a count number N, said active conferees being users of said digitally encoded voice system which are speaking when said monitoring step is performed.
38. A process, as defined in claim 37, wherein said producing step comprises digitally compressing each of said encoded digital voice signals into said compressed digital signals such that their compressed data rate has a bit rate given by SDR/N, where SDR is the system data rate, and N is the number of active conferees.
39. A process, as defined in claim 36, wherein said digitally encoded voice system used in said process comprises a set of vocoders which operate over a narrowband with said system data rate ranging substantially between 2.4 and 32.0 kb/s.
40. A process, as defined in claim 39, wherein said monitoring step includes counting a number of active conferees to produce thereby a count number N, said active conferees being users of said digitally encoded voice system which are speaking when said monitoring step is performed.
41. A process, as defined in claim 39, wherein said producing step comprises digitally compressing each of said encoded digital voice
signals into said compressed digital signals such that their compressed data rate has a bit rate given by SDR/N, where SDR is the system data rate, and N is the number of active conferees.
42. A system for facilitating multispeaker conferencing for users of a digitally encoded voice system, wherein each of said users is capable of outputting encoded digital voice signals in a bit stream at a system data rate, wherein said system comprises: a means for monitoring the users as they output said digital voice signals serially when one of said users is speaking, and monitoring said users as they output said digital voice signals
simultaneously when multiple users are speaking; and a means for producing a conferencing signal usable by all of said users, said conferencing signal representing one digital voice signal when one of said users is speaking, said conferencing signal representing multiple digital voice signals when multiple users are speaking simultaneously.
43. A system, as defined in claim 42, wherein said digitally encoded voice system used in said system comprises a set of vocoders which operate over a narrowband with said system data rate ranging between 2.4 and 32.0 kb/s.
44. A system, as defined in claim 42, wherein said monitoring means counts a number of active conferees to produce thereby a count number N, said active conferees being users of said digitally encoded voice system which are actually speaking.
45. A system, as defined in claim 44, wherein said producing means digitally compresses each of said encoded digital voice signals into said
compressed digital signals such that their compressed data rate has a bit rate given by SDR/N, where SDR is the system data rate, and N is the number of active conferees.
46. A system, as defined in claim 42, wherein said producing means comprises a
microprocessor which is electrically connected to each of said users of said digitally encoded voice system, using telephone lines and a plurality of modem units, wherein said microprocessor is
programmed to perform as a conferencing bridge and to perform said producing of said conference signal.
PCT/US1992/002048 1991-09-12 1992-03-12 Multi-speaker conferencing over narrowband channels WO1993005595A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US760,021 1991-09-12
US07/760,021 US5317567A (en) 1991-09-12 1991-09-12 Multi-speaker conferencing over narrowband channels

Publications (1)

Publication Number Publication Date
WO1993005595A1 true WO1993005595A1 (en) 1993-03-18

Family

ID=25057811

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1992/002048 WO1993005595A1 (en) 1991-09-12 1992-03-12 Multi-speaker conferencing over narrowband channels

Country Status (3)

Country Link
US (2) US5317567A (en)
AU (1) AU2321892A (en)
WO (1) WO1993005595A1 (en)

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5765127A (en) * 1992-03-18 1998-06-09 Sony Corp High efficiency encoding method
US5384772A (en) * 1993-09-01 1995-01-24 Intel Corporation Method and apparatus for audio flow control during teleconferencing
US5457685A (en) * 1993-11-05 1995-10-10 The United States Of America As Represented By The Secretary Of The Air Force Multi-speaker conferencing over narrowband channels
CA2146688A1 (en) * 1994-05-04 1995-11-05 Gregory Ciurpita Jr. Microphone/loudspeakers and systems using multiple microphone/loudspeakers
US5483587A (en) * 1994-06-08 1996-01-09 Linkusa Corporation System and method for call conferencing
US5903862A (en) * 1995-01-25 1999-05-11 Weaver, Jr.; Lindsay A. Method and apparatus for detection of tandem vocoding to modify vocoder filtering
US5956673A (en) * 1995-01-25 1999-09-21 Weaver, Jr.; Lindsay A. Detection and bypass of tandem vocoding using detection codes
US5717819A (en) * 1995-04-28 1998-02-10 Motorola, Inc. Methods and apparatus for encoding/decoding speech signals at low bit rates
US5737405A (en) * 1995-07-25 1998-04-07 Rockwell International Corporation Apparatus and method for detecting conversation interruptions in a telephonic switch
US6292662B1 (en) * 1995-09-29 2001-09-18 Qualcomm Incorporated Method and system for processing telephone calls involving two digital wireless subscriber units that avoid double vocoding
EP0779732A3 (en) * 1995-12-12 2000-05-10 OnLive! Technologies, Inc. Multi-point voice conferencing system over a wide area network
US5912882A (en) * 1996-02-01 1999-06-15 Qualcomm Incorporated Method and apparatus for providing a private communication system in a public switched telephone network
US5806038A (en) * 1996-02-13 1998-09-08 Motorola, Inc. MBE synthesizer utilizing a nonlinear voicing processor for very low bit rate voice messaging
US5898675A (en) * 1996-04-29 1999-04-27 Nahumi; Dror Volume control arrangement for compressed information signals
US5812968A (en) * 1996-08-28 1998-09-22 Ericsson, Inc. Vocoder apparatus using the link margin
US7085710B1 (en) 1998-01-07 2006-08-01 Microsoft Corporation Vehicle computer system audio entertainment system
AU3372199A (en) * 1998-03-30 1999-10-18 Voxware, Inc. Low-complexity, low-delay, scalable and embedded speech and audio coding with adaptive frame loss concealment
US6301265B1 (en) 1998-08-14 2001-10-09 Motorola, Inc. Adaptive rate system and method for network communications
US6163766A (en) * 1998-08-14 2000-12-19 Motorola, Inc. Adaptive rate system and method for wireless communications
US6240070B1 (en) 1998-10-09 2001-05-29 Siemens Information And Communication Networks, Inc. System and method for improving audio quality on a conferencing network
US6728221B1 (en) * 1999-04-09 2004-04-27 Siemens Information & Communication Networks, Inc. Method and apparatus for efficiently utilizing conference bridge capacity
US7006616B1 (en) 1999-05-21 2006-02-28 Terayon Communication Systems, Inc. Teleconferencing bridge with EdgePoint mixing
US6944137B1 (en) * 2000-03-24 2005-09-13 Motorola, Inc. Method and apparatus for a talkgroup call in a wireless communication system
US20060067500A1 (en) * 2000-05-15 2006-03-30 Christofferson Frank C Teleconferencing bridge with edgepoint mixing
US6956828B2 (en) * 2000-12-29 2005-10-18 Nortel Networks Limited Apparatus and method for packet-based media communications
US7006456B2 (en) * 2001-02-02 2006-02-28 Nortel Networks Limited Method and apparatus for packet-based media communication
FI114129B (en) * 2001-09-28 2004-08-13 Nokia Corp Conference call arrangement
JP3973078B2 (en) * 2001-11-29 2007-09-05 パイオニア株式会社 Network conferencing system
US8385233B2 (en) 2007-06-12 2013-02-26 Microsoft Corporation Active speaker identification
US7782802B2 (en) * 2007-12-26 2010-08-24 Microsoft Corporation Optimizing conferencing performance
JP7143574B2 (en) * 2017-07-18 2022-09-29 富士通株式会社 Evaluation program, evaluation method and evaluation device

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4270025A (en) * 1979-04-09 1981-05-26 The United States Of America As Represented By The Secretary Of The Navy Sampled speech compression system
US4271502A (en) * 1979-06-19 1981-06-02 Magnavox Government And Industrial Electronics Co. Digital voice conferencer
JPS5650398A (en) * 1979-10-01 1981-05-07 Hitachi Ltd Sound synthesizer
US4441201A (en) * 1980-02-04 1984-04-03 Texas Instruments Incorporated Speech synthesis system utilizing variable frame rate
ATE15415T1 (en) * 1981-09-24 1985-09-15 Gretag Ag Method and device for redundancy-reducing digital speech processing
US4856068A (en) * 1985-03-18 1989-08-08 Massachusetts Institute Of Technology Audio pre-processing methods and apparatus
US4937873A (en) * 1985-03-18 1990-06-26 Massachusetts Institute Of Technology Computationally efficient sine wave synthesis for acoustic waveform processing
US4885790A (en) * 1985-03-18 1989-12-05 Massachusetts Institute Of Technology Processing of acoustic waveforms
US4679187A (en) * 1985-04-22 1987-07-07 International Business Machines Corp. Adaptive trunk-compression system with constant grade of service
US4833718A (en) * 1986-11-18 1989-05-23 First Byte Compression of stored waveforms for artificial speech
US4852168A (en) * 1986-11-18 1989-07-25 Sprague Richard P Compression of stored waveforms for artificial speech
US4899148A (en) * 1987-02-25 1990-02-06 Oki Electric Industry Co., Ltd. Data compression method
US4890327A (en) * 1987-06-03 1989-12-26 Itt Corporation Multi-rate digital voice coder apparatus
US4815134A (en) * 1987-09-08 1989-03-21 Texas Instruments Incorporated Very low rate speech encoder and decoder
US5065395A (en) * 1990-04-09 1991-11-12 Dsc Communications Corporation Rudimentary digital speech interpolation apparatus and method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3970797A (en) * 1975-01-13 1976-07-20 Gte Sylvania Incorporated Digital conference bridge
US4138596A (en) * 1976-09-02 1979-02-06 Roche Alain Equipments for connecting PCM multiplex digital transmission systems having different nominal bit rates
US4313033A (en) * 1978-05-31 1982-01-26 Hughes Aircraft Company Apparatus and method for digital combination of delta modulated data
US4802189A (en) * 1983-03-25 1989-01-31 Siemens Aktiengesellschaft Method and circuit arrangement for the transmission of data signals between subscriber stations of a data network
US4685101A (en) * 1984-12-20 1987-08-04 Siemens Aktiengesellschaft Digital multiplexer for PCM voice channels having a cross-connect capability
US5072442A (en) * 1990-02-28 1991-12-10 Harris Corporation Multiple clock rate teleconferencing network

Also Published As

Publication number Publication date
US5383184A (en) 1995-01-17
US5317567A (en) 1994-05-31
AU2321892A (en) 1993-04-05

Similar Documents

Publication Publication Date Title
US5317567A (en) Multi-speaker conferencing over narrowband channels
US5457685A (en) Multi-speaker conferencing over narrowband channels
EP1914724B1 (en) Dual-transform coding of audio signals
US8428959B2 (en) Audio packet loss concealment by transform interpolation
US5570363A (en) Transform based scalable audio compression algorithms and low cost audio multi-point conferencing systems
KR101178114B1 (en) Apparatus for mixing a plurality of input data streams
US6356545B1 (en) Internet telephone system with dynamically varying codec
EP2402939B1 (en) Full-band scalable audio codec
US8831932B2 (en) Scalable audio in a multi-point environment
US20040068399A1 (en) Method and apparatus for transmitting an audio stream having additional payload in a hidden sub-channel
US5272698A (en) Multi-speaker conferencing over narrowband channels
US8340959B2 (en) Method and apparatus for transmitting wideband speech signals
US9984698B2 (en) Optimized partial mixing of audio streams encoded by sub-band encoding
US20030220801A1 (en) Audio compression method and apparatus
JP2005114814A (en) Method, device, and program for speech encoding and decoding, and recording medium where same is recorded
JPH0761043B2 (en) Stereo audio transmission storage method
Podolsky, A study of speech/audio coding on packet switched networks
JP3092157B2 (en) Communication signal compression system and compression method
JP3217237B2 (en) Loop type band division audio conference circuit
Galand et al., Voice-excited predictive coder (VEPC) implementation on a high-performance signal processor
JPH0784595A (en) Band dividing and encoding device for speech and musical sound

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AT AU BR CA CH CS DE DK ES FI GB JP KR LU NL NO PL RU SE

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IT LU MC NL SE

122 Ep: PCT application non-entry in European phase
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: CA