CA2446085C - Audio conference platform with dynamic speech detection threshold - Google Patents
- Publication number
- CA2446085C
- Authority
- CA
- Canada
- Prior art keywords
- port
- threshold value
- signals
- ports
- dynamic threshold
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/02—Details
- H04L12/16—Arrangements for providing special services to substations
- H04L12/18—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
- H04L12/1813—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
- H04L12/1818—Conference organisation arrangements, e.g. handling schedules, setting up parameters needed by nodes to attend a conference, booking network resources, notifying involved parties
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
- H04M3/567—Multimedia conference systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
- H04M3/568—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/02—Details
- H04L12/16—Arrangements for providing special services to substations
- H04L12/18—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
- H04L12/1813—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/18—Comparators
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/18—Automatic or semi-automatic exchanges with means for reducing interference or noise; with means for reducing effects due to line faults with means for protecting lines
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/20—Automatic or semi-automatic exchanges with means for interrupting existing connections; with means for breaking-in on conversations
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
- H04M3/568—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants
- H04M3/569—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants using the instant speaker's algorithm
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04Q—SELECTING
- H04Q1/00—Details of selecting apparatus or arrangements
- H04Q1/18—Electrical details
- H04Q1/30—Signalling arrangements; Manipulation of signalling currents
- H04Q1/44—Signalling arrangements; Manipulation of signalling currents using alternate current
- H04Q1/444—Signalling arrangements; Manipulation of signalling currents using alternate current with voice-band signalling frequencies
- H04Q1/45—Signalling arrangements; Manipulation of signalling currents using alternate current with voice-band signalling frequencies using multi-frequency signalling
Abstract
The present invention comprises a method for audio/video conferencing including an audio conferencing platform (26), the platform comprising network interface cards (38-40), processing boards (44-46), a CPU (48), a TDM bus (42) and a PCI bus (50). The method comprises using a dynamic threshold value to determine whether there is speech on a line. In one aspect, the method comprises determining a dynamic threshold value based on one or more characteristics of signals received on a port; associating that dynamic threshold value with the port; and comparing one or more characteristics of signals subsequently received on the port to the dynamic threshold value.
Signals received over a plurality of ports are summed, but for ports whose signal characteristics have a specified relationship to the dynamic threshold value associated with that port, signals are not contained in the sum.
Description
AUDIO CONFERENCE PLATFORM WITH
DYNAMIC SPEECH DETECTION THRESHOLD
BACKGROUND OF THE INVENTION
The present invention relates to telephony, and in particular to an audio conferencing platform.
Audio conferencing platforms are known. For example, see U.S. Patents 5,483,588 and 5,495,522. Audio conferencing platforms allow conference participants to easily schedule and conduct audio conferences with a large number of users. In addition, audio conference platforms are generally capable of simultaneously supporting many conferences.
A problem with existing audio conference platforms is that they employ a fixed threshold to determine whether a conference participant is speaking. Using such a fixed threshold may result in a conference participant being added to the summed conference audio, even though they are not speaking. Specifically, if the background audio noise is high (e.g., the user is on a factory floor), then the amount of digitized audio energy associated with that conference participant may be sufficient for the conference platform to falsely detect speech, and add the background noise to the conference sum under the mistaken belief that the energy is associated with speech.
Therefore, there is a need for a system that accounts for background noise in the detection of valid conference speakers.
SUMMARY OF THE INVENTION
One object of the present invention is to provide a method and system that advantageously accounts for background noise on lines participating in a conference call and prevents the background noise from being added to the conference sum because an erroneous determination has been made that the energy is associated with speech.
Another object is to provide such an advantage dynamically, to account for changing conditions on participating lines.
A preferred embodiment of the invention comprises an audio conferencing platform that includes a time division multiplexing (TDM) data bus, a controller, and an interface circuit that receives audio signals from a plurality of conference participants and provides digitized audio signals in assigned time slots over the data bus. The audio conferencing platform also includes a plurality of digital signal processors (DSPs) adapted to communicate on the TDM bus with the interface circuit. At least one of the DSPs sums a plurality of the digitized audio signals associated with conference participants who are speaking to provide a summed conference signal. This DSP provides the summed conference signal to at least one of the other plurality of DSPs, which removes the digitized audio signal associated with a speaker whose voice is included in the summed conference signal, thus providing a customized conference audio signal to each of the speakers.
Each of the digitized audio signals is processed to determine whether the digitized audio signal includes speech. For each digitized audio signal, the amount of energy associated with the digitized audio signal is compared against a dynamic threshold value associated with the line over which the audio signal is received. The dynamic threshold value is set as a function of background noise within the digitized audio signal.
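The comparison of per-line energy against a noise-dependent threshold can be sketched as follows. This is only an illustration under stated assumptions: the passage above does not say how the threshold is derived from the background noise, so the sketch assumes the noise floor is the minimum frame energy seen over a recent window and that the threshold is a fixed multiple of that floor. The names PortThreshold, frame_energy, window, margin and min_threshold are hypothetical.

```python
from collections import deque


def frame_energy(samples):
    """Mean squared amplitude of one frame of linear PCM samples."""
    return sum(s * s for s in samples) / len(samples)


class PortThreshold:
    """Per-port dynamic speech threshold (hypothetical sketch, not the patented method)."""

    def __init__(self, window=50, margin=4.0, min_threshold=100.0):
        # Assumption: background noise is estimated as the lowest frame
        # energy observed over the last `window` frames on this port.
        self.recent = deque(maxlen=window)
        self.margin = margin                # assumed multiple of the noise floor
        self.min_threshold = min_threshold  # assumed lower bound for quiet lines

    @property
    def threshold(self):
        if not self.recent:
            return self.min_threshold
        return max(self.min_threshold, self.margin * min(self.recent))

    def speech_bit(self, energy):
        """Record the frame energy and report whether it exceeds the threshold."""
        self.recent.append(energy)
        return energy > self.threshold


quiet, noisy = PortThreshold(), PortThreshold()
for _ in range(10):
    quiet.speech_bit(10.0)     # quiet office line
    noisy.speech_bit(1000.0)   # factory-floor line

# The same frame energy counts as speech on the quiet line only.
print(quiet.speech_bit(1000.0), noisy.speech_bit(1000.0))
```

With a fixed threshold the factory-floor line would be treated as a permanent speaker. A minimum-over-window estimate is one simple choice; it drifts upward if a participant talks without pause for longer than the window, so a real system would likely need a more careful noise estimator.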
The audio conferencing platform preferably configures at least one of the DSPs as a centralized audio mixer and at least another one of the DSPs as an audio processor. The centralized audio mixer performs the step of summing a plurality of the digitized audio signals associated with conference participants who are speaking, to provide the summed conference signal. The centralized audio mixer provides the summed conference signal to the audio processor(s) for post processing and routing to the conference participants. The post processing includes removing the audio associated with a speaker from the conference signal to be sent to the speaker. For example, if there are forty conference participants and three of the participants are speaking, then the summed conference signal will include the audio from the three speakers. The summed conference signal is made available on the data bus to the thirty-seven non-speaking conference participants. However, the three speakers each receive an audio signal that is equal to the summed conference signal less the digitized audio signal associated with that speaker. Removing the speaker's own voice from the audio he hears reduces echoes.
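The forty-participant example can be sketched in a few lines. This is a minimal illustration, assuming each port delivers one frame of already-decompressed linear samples and ignoring saturation of the sum; the function name mix_conference and its dictionary-based interface are hypothetical and are not taken from the platform itself.

```python
def mix_conference(frames, speech_bits):
    """Sum the frames of speaking ports and build a per-port output.

    frames:      dict mapping port -> list of linear samples (equal lengths)
    speech_bits: dict mapping port -> True if the port is speaking
    Returns (summed, outputs), where outputs[port] is what that port hears.
    """
    speakers = [port for port in frames if speech_bits.get(port, False)]
    length = len(next(iter(frames.values()), []))
    summed = [sum(frames[port][i] for port in speakers) for i in range(length)]

    outputs = {}
    for port, frame in frames.items():
        if port in speakers:
            # A speaker hears the conference sum minus their own audio.
            outputs[port] = [s - x for s, x in zip(summed, frame)]
        else:
            # A listener hears the full conference sum.
            outputs[port] = summed
    return summed, outputs


# Forty participants, three of whom (ports 1, 2 and 3) are speaking.
frames = {port: [port, 2 * port] for port in range(1, 41)}
speech_bits = {port: port <= 3 for port in frames}
summed, outputs = mix_conference(frames, speech_bits)
print(summed, outputs[1], outputs[40])
```

Computing the sum once and subtracting per speaker mirrors the division of labor described above: one mixer produces the sum, and the audio processors apply the per-speaker subtraction.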
The centralized audio mixer also preferably receives DTMF detect bits indicative of the digitized audio signals that include a DTMF tone. The DTMF detect bits may be provided by another of the DSPs that is programmed to detect DTMF tones. If the digitized audio signal is associated with a speaker, but the digitized audio signal includes a DTMF tone, the centralized conference mixer will not include the digitized audio signal in the summed conference signal while that DTMF detect bit signal is active. This ensures that conference participants do not hear annoying DTMF tones in the conference audio. When the DTMF tone is no longer present in the digitized audio signal, the centralized conference mixer may include the audio signal in the summed conference signal.
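The gating rule described here amounts to a per-port test on two bits. The sketch below assumes the mixer has both bits available as dictionaries keyed by port; the function name ports_in_sum and that data layout are hypothetical.

```python
def ports_in_sum(speech_bits, dtmf_bits):
    """Return the ports whose audio belongs in the summed conference signal.

    A port contributes only while its speech bit is set and its DTMF
    detect bit is clear (a missing DTMF entry is treated as clear).
    """
    return [
        port
        for port, speaking in speech_bits.items()
        if speaking and not dtmf_bits.get(port, False)
    ]


speech_bits = {1: True, 2: True, 3: False}

# Port 2 is pressing a key: it is excluded while the DTMF bit is active.
print(ports_in_sum(speech_bits, {2: True}))

# Once the tone ends, port 2 is eligible for the sum again.
print(ports_in_sum(speech_bits, {2: False}))
```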
The audio conference platform is preferably capable of supporting a number of simultaneous conferences (e.g., 384). As a result, the audio conference mixer provides a summed conference signal for each of the conferences.
Each of the digitized audio signals may be preprocessed. The preprocessing steps include decompressing the signal (e.g., using the well-known μ-law or A-law compression schemes), and determining whether the magnitude of the decompressed audio signal is greater than a detection threshold. If it is, then a speech bit associated with the digitized audio signal is set. Otherwise, the speech bit is cleared.
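The two preprocessing steps can be sketched for the μ-law case. The expansion below follows the widely used G.711 reference decoding (invert the byte, rebuild the biased magnitude from the segment and quantization bits, then remove the bias); applying the magnitude test to a single sample, and the names ulaw_to_linear and speech_bit, are assumptions made for illustration.

```python
ULAW_BIAS = 0x84  # bias added by the G.711 mu-law encoder


def ulaw_to_linear(byte):
    """Expand one 8-bit mu-law code to a signed linear sample."""
    u = ~byte & 0xFF                     # mu-law codes are stored inverted
    t = ((u & 0x0F) << 3) + ULAW_BIAS    # quantization bits plus bias
    t <<= (u & 0x70) >> 4                # scale by the segment number
    return ULAW_BIAS - t if u & 0x80 else t - ULAW_BIAS


def speech_bit(byte, detection_threshold):
    """Set the speech bit when the decompressed magnitude exceeds the threshold."""
    return abs(ulaw_to_linear(byte)) > detection_threshold


print(ulaw_to_linear(0xFF), ulaw_to_linear(0x80), ulaw_to_linear(0x00))
print(speech_bit(0x80, 1000), speech_bit(0xFF, 1000))
```

A-law expansion differs in its bit inversion pattern and segment layout, so it would need its own decoder rather than a flag on this one.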
The centralized conference mixer reduces repetitive tasks distributed between the plurality of DSPs. In addition, centralized conference mixing provides a system architecture that is scalable and thus easily expanded.
Advantageously, using a dynamic threshold value to determine whether there is speech on a line helps to ensure that background noise is not falsely detected as speech.
According to one aspect of the present invention, there is provided a method for conferencing, comprising: receiving audio signals over a plurality of ports; for at least one port, determining a dynamic threshold value based on at least one characteristic of signals received on the port; associating said dynamic threshold value with the port; and comparing at least one characteristic of signals subsequently received on the port to the dynamic threshold value; and establishing whether noise is present on at least one of the ports; establishing a value of a speech bit for the at least one port based on a comparing; summing signals received over the plurality of ports, wherein signals received on the at least one port whose characteristics have a specified relationship to the dynamic threshold value are not contained in the sum.
The method may further comprise preprocessing audio signals by decompressing them using either μ-law or A-law decompression.
In one aspect, the method comprises identifying which ports are receiving audio signals that contain speech; and, on each such identified port, transmitting a summed signal, wherein said summed signal does not contain signals received on that port.
In another aspect, the method comprises identifying which ports are receiving audio signals that contain DTMF tones; and, on each such identified port, transmitting a summed signal, wherein said summed signal does not contain signals received on that port. Preferably, the step of identifying comprises setting a DTMF detect bit for a signal. The method may also comprise the step of including signals from previously identified ports in the sum after those ports are no longer identified as receiving signals containing one or more DTMF tones.
The invention further comprises computer readable media and systems for implementing methods described herein.
According to another aspect of the present invention, there is provided a system for conferencing, comprising: means for receiving audio signals over a plurality of ports; for at least one port, means for determining a dynamic threshold value based on at least one characteristic of signals received on the port; means for associating said dynamic threshold value with the port; and means for comparing at least one characteristic of signals subsequently received on the port to the dynamic threshold value; and means for establishing whether noise is present on at least one of the ports; means for establishing a value of a speech bit for the at least one port based on a comparing; means for summing signals received over the plurality of ports, wherein signals received on the at least one port whose characteristics have a specified relationship to the dynamic threshold value are not contained in the sum.
According to still another aspect of the present invention, there is provided a method for conferencing, comprising: receiving a plurality of audio signals over a plurality of ports; establishing whether noise is present on at least one of said ports; determining a dynamic threshold value for said at least one port based on said establishing; comparing an energy level of at least one of said received audio signals on said at least one port to the determined dynamic threshold value; and establishing a value of a speech bit for said at least one port based on said comparing.
According to yet another aspect of the present invention, there is provided a system for conferencing, comprising: means for receiving a plurality of audio signals over a plurality of ports; means for establishing whether noise is present on at least one of said ports; means for determining a dynamic threshold value for said at least one port based on said establishing; means for comparing an energy level of at least one of said received audio signals on said at least one port to the determined dynamic threshold value; and means for establishing a value of a speech bit for said at least one port based on said comparing.
These and other objects, features, and advantages of the present invention will become apparent in light of the following detailed description of preferred embodiments thereof, as illustrated in the accompanying drawings.
Although the invention has been described in connection with an audio conferencing platform, it is not limited to such a platform and may be used, for example, in a video conferencing system.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a conferencing system in accordance with a preferred embodiment of the present invention;
FIG. 2 illustrates a functional block diagram of an audio conferencing platform of a preferred embodiment within the conferencing system of FIG. 1;
FIG. 3 is a block diagram illustration of a processor board of a preferred embodiment within the audio conferencing platform of FIG. 2;
FIG. 4 is a functional block diagram illustration of resources on the processor board of FIG. 3;
FIG. 5 is a flow chart illustrating the processing of signals received from network interface cards over a TDM bus;
FIG. 6 is a flow chart illustration of the DTMF tone detection processing;
FIGS. 7A-7B together provide a flow chart illustration of preferred conference mixer processing to create a summed conference signal; and
FIG. 8 is a flow chart illustrating the processing of signals to be output to the network interface cards via the TDM bus.
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 is a diagram of a conferencing system 20 in accordance with a preferred embodiment of the present invention. The system 20 connects a plurality of user sites 21-23 through a switching network 24 to an audio conferencing platform 26. The plurality of user sites may be distributed worldwide, or at a company facility/campus. For example, each of the user sites 21-23 may be in different cities and connected to the audio platform 26 via the switching network 24, which may include PSTN and PBX systems. The connections between the user sites and the switching network 24 may include T1, E1, T3, and ISDN lines.
Each user site 21-23 preferably includes one or more telephones 28 and one or more personal computers or servers 30. However, a user site may only include either a telephone, such as user site 21a, or a computer/server, such as user site 23a. The computer/server 30 may be connected via an Internet/intranet backbone 32 to a server 34. The audio conferencing platform 26 and the server 34 are connected via a data link 36 (e.g., a 10/100 BaseT Ethernet link). The computer 30 allows the user to participate in a data conference simultaneous to the audio conference via the server 34. In addition, the user can use the computer 30 to interface (e.g., via a browser) with the server 34 to perform functions such as conference control, administration (e.g., system configuration, billing, reports,...), scheduling and account maintenance. The telephone 28 and the computer 30 may cooperate to provide voice over the Internet/intranet 32 to the audio conferencing platform 26 via the data link 36.
FIG. 2 is a functional block diagram of an audio conferencing platform 26 in accordance with a preferred embodiment of the present invention. The audio conferencing platform 26 includes a plurality of network interface cards (NICs) 38-40 that receive audio information from the switching network 24 (see FIG. 1). Each NIC is preferably capable of handling a plurality of different trunk lines (e.g., eight). The data received by the NIC is generally an 8-bit μ-law or A-law sample. The NIC places the sample into a memory device (not shown), which is used to output the audio data onto a data bus. The data bus is preferably a TDM bus based, in one embodiment, upon the H.110 telephony standard.
The audio conferencing platform 26 also includes a plurality of processor boards 44-46 that receive and transmit data to the NICs 38-40 over the TDM bus 42.
The NICs and the processor boards 44-46 also communicate with a controller/CPU board 48 over a system bus 50. The system bus 50 is preferably based upon the Compact Peripheral Component Interconnect ("cPCI") standard. The CPU/controller communicates with the server 34 (see FIG. 1) via the data link 36. The controller/CPU board may include a general purpose processor such as a 200 MHz Pentium™ CPU manufactured by Intel Corporation, a processor from AMD or any other similar processor (including an ASIC) having sufficient processor speed (MIPS) to support the present invention.
FIG. 3 is a block diagram illustration of the processor board 44. The board 44 includes a plurality of dynamically programmable digital signal processors 60-65. Each digital signal processor (DSP) is an integrated circuit that communicates with the controller/CPU card 48 (see FIG. 2) over the system bus 50. Specifically, the processor board 44 includes a bus interface 68 that interconnects the DSPs 60-65 to the system bus 50. Each DSP also includes an associated dual port RAM (DPR) 70-75 that buffers commands and data for transmission between the system bus 50 and the associated DSP.
Each DSP 60-65 also transmits data over and receives data from the TDM bus 42.
The processor card 44 includes a TDM bus interface 78 that performs any necessary signal conditioning and transformation. For example, if the TDM bus is an H.110 bus, it includes thirty-two serial lines. As a result the TDM bus interface may include a serial-to-parallel and a parallel-to-serial interface.
Each DSP 60-65 also includes an associated TDM dual port RAM 80-85 that buffers data for transmission between the TDM bus 42 and the associated DSP.
Each of the DSPs is preferably a general purpose digital signal processor IC, such as the model number TMS320C6201 processor available from Texas Instruments. The number of DSPs resident on the processor board 44 is a function of the size of the integrated circuits, their power consumption, and the heat dissipation ability of the processor board. For example, in certain embodiments there may be between four and ten DSPs per processor board.
Executable software applications may be downloaded from the controller/CPU 48 (see FIG. 2) via the system bus 50 to a selected one(s) of the DSPs 60-65. Each of the DSPs is preferably also connected to an adjacent DSP via a serial data link.
FIG. 4 is illustrates the DSP resources on the processor board 44 illustrated in FIG. 3.
Referring to FIGS. 3 and 4, the controller/CPU 48 (see FIG. 2) downloads executable program instructions to a DSP based upon the function that the controller/CPU
assigns to the DSP. For example, the controller/CPU may download executable program instructions for the DSP3 62 to function as an audio conference mixer 90, while the DSP2 61 and the DSP4 63 may be configured as audio processors 92, 94, respectively. Other DSPs 60, 65 may be configured by the controller/CPU 48 (see FIG. 2) to provide services such as DTMF detection 96, audio message generation 98 and music playback 100.
Each audio processor 92, 94 is capable of supporting a certain number of user ports (i.e., conference participants). This number is based upon the operational speed of the various components within the processor board and the over-all design of the system. Each audio processor 92, 94 receives compressed audio data 102 from the conference participants over the TDM bus 42.
The TDM bus 42 may, for example, support 4096 time slots, each having a bandwidth of 64 kbps. The timeslots are generally dynamically assigned by the controller/CPU 48 (see FIG. 2) as needed for the conferences that are currently occurring. However, one of ordinary skill in the art will recognize that in a static system the timeslots may be predetermined.
FIG. 5 is a flow chart illustrating the processing steps 500 performed by each audio processor on the digitized audio signals received over the TDM bus 42 from the NICs 38-40 (see FIG. 2). The executable program instructions associated with these processing steps 500 are typically downloaded to the audio processors 92, 94 (see FIG. 4) by the controller/CPU 48 (see FIG. 2). The download may occur during system initialization or reconfiguration. These processing steps 500 preferably are executed at least once every 125 microseconds to provide audio of the requisite quality.
For each of the active/assigned ports for the audio processor, step 502 reads the audio data for that port from TDM dual port RAM associated with the audio processor.
For example, if DSP2 61 (see FIG. 3) is configured to perform the function of audio processor 92 (see FIG. 4), then the data is read from the read bank of the TDM dual port RAM 81. If the audio processor 92 is responsible for, for example,700 active/assigned ports, then step 502 reads the 700 bytes of associated audio data from the TDM dual port RAM 81.
Each audio processor includes a time slot allocation table (not shown) that specifies the address location in the TDM dual port RAM for the audio data from each port.
Since each of the audio signals is typically compressed (e.g., -law, A-law), step 504 decompresses each of the 8-bit signals to a 16-bit word. Step 506 computes the average magnitude (AVM) for each of the decompressed signals associated with the ports assigned to the audio processor. For additional details, see co-pending U.S. Patent Application No.
09/532,602, filed March 22, 2000, entitled "Scalable Audio Conference Platform," the entire contents of which are incorporated herein by reference for all purposes.
Step 508 is performed to determine which of the ports are speaking. This step compares the average magnitude for the port computed in step 506 against a predetermined magnitude value representative of speech (e.g., -35 dBm). If average magnitude for the port exceeds the predetermined magnitude value representative of speech, a speech bit associated witli the port is set. Otherwise, the associated speech bit is cleared. Each port has an associated speech bit. Step 510 outputs all the speech bits (eight per timeslot) onto the TDM
bus. Step 512 is performed to calculate an automatic gain correction (AGC) value for each port. To compute an AGC value for the port, the AVM value is converted to an index value associated with a table containing gain/attenuation factors. For example, there may be 256 index values, each uniquely associated with 256 gain/attenuation factors. The index value is used by the conference mixer 90 (see FIG. 4) to determine the gain/attenuation factor to be applied to an audio signal that will be summed to create the conference sum signal.
In a preferred embodiment, the threshold used in step 508 to determine whether speech is present is a dynamic speech detection threshold value, set as a function of the noise detected on the line. For example, if the magnitude for the energy for the line/port exceeds a noise detection threshold value for a predetermined amount of time (e.g., three seconds), then noise is detected and a higher threshold value may be used in step 510 to determine whether the user is speaking. Once noise has been detected, the dynamic threshold value may be set as a function of the magnitude of the energy on the line. For example, the dynamic threshold value may be set to a certain value greater than the value of the noise on the line (e.g., the average noise). Each line may employ a different speech detection threshold, since the background noise on each of the lines may be different.
The system may also set a noise bit for the line, and the noise bit may be provided to the controller/CPU 48 (see FIG. 2) to take the necessary action due to the background noise.
The action may include not allowing this conference participant to be on the speech list (i.e., the list of lines summed to create the conference signal), or sending an audio message to the conference participant that the system detects high background noise and recommends that the conference participant try to take corrective action (e.g., move to a different area, close an office door, go off speaker phone, etc.).
Additional action may include sending an audio message to the conference participant that the system detects high background noise and instructing the participant to hit a key on the telephone keypad so the system does not consider the audio from the participant for the conference audio. The system would then detect the DTMF tone associated with the key being depressed and take the necessary action to prevent audio from this participant from being used in the conference sum, until such time that the user, for example, hits the same key again or another key instructing the system to consider audio from the participant for the conference sum.
FIG. 6 is a flow chart illustration of the DTMF tone detection processing 600.
These processing steps 600 are performed by the DTMF processor 96 (see FIG. 4), preferably at least once every 125 microseconds, to detect DTMF tones within digitized audio signals from the NICs 38-40 (FIG. 2). One or more of the DSPs may be configured to operate as a DTMF
tone detector. The executable program instructions associated with the processing steps 600 are typically downloaded by the controller/CPU 48 (see FIG. 2) to the DSP
designated to perform the DTMF tone detection function. The download may occur during initialization or system reconfiguration.
For an assigned number of the active/assigned ports of the conferencing system, step 602 reads the audio data for the port from the TDM dual port RAM associated with the DSP(s) configured to perform the DTMF tone detection function. Step 604 then expands the 8-bit signal to a 16-bit word. Next, step 606 tests each of.these decompressed audio signals to determine whether any of the signals includes a DTMF tone. For any signal that does include a DTMF tone, step 606 sets a DTMF detect bit associated with the port.
Otherwise, the DTMF detect bit is cleared. Each port has an associated DTMF detect bit.
Step 608 informs the controller/CPU 48 (see FIG. 3) through Dual Port Ram (DPR) which DTMF tone was detected, since the tone is representative of system commands and/or data from a conference participant. Step 610 outputs the DTMF detect bits onto the TDM
bus.
FIGS. 7A-7B collectively provide a flow chart illustrating processing steps performed by the audio conference mixer 90 (see FIG. 4), preferably at least once every 125 microseconds, to create a summed conference signal for each conference. The executable program instructions associated with the processing steps 700 are typically downloaded by the controller/CPU 48 (see FIG. 2) over the system bus 50 (see FIG. 2) to the DSP designated to perform the conference mixer function. The download may occur during initialization or system reconfiguration.
Referring to FIG. 7A, for each of the active/assigned ports of the audio conferencing system, step 702 reads the speech bit and the DTMF detect bit received over the TDM bus 42 (see FIG. 4). Alternatively, the speech bits may be provided over a dedicated serial link that interconnects the audio processor or processorsand the conference mixer. Step 704 is then performed to determine whether the speech bit for the port is set (i.e., whether energy that may be speech is detected on that port). If the speech bit is set, then step 706 is performed to see whether the DTMF detect bit for the port is also set. If the DTMF detect bit is clear, then the audio received by the port is speech and the audio does not include DTMF
tones. As a result, step 708 sets the conference bit for that port; otherwise, step 709 clears the conference bit associated with the port. Since the audio conferencing platform 26 (see FIG. 1) preferably can support many simultaneous conferences (e.g., 384), the controller/CPU 48 (see FIG. 2) keeps track of the conference that each port is assigned to and provides that information to the DSP performing the audio conference mixer function. Upon the completion of step 708, the conference bit for each port has been updated to indicate the conference participants whose voice should be included in the conference sum.
Referring to FIG. 7B, for each of the conferences, step 710 is performed, if needed, to decompress each of the audio signals associated with conference bits that are set. Step 711 performs AGC and gain/TLP (Test Level Point) compensation on the expanded signals from step 710. Step 712 is then performed to sum each of the compensated audio samples to provide a summed conference signal. Since many conference participants may be speaking at the same time, the system preferably limits the number of conference participants whose voice is summed to create the conference audio. For example, the system may sum the audio signals from a maximum of three speaking conference participants. Step 714 outputs the summed audio signal for the conference to the audio processors, as appropriate. In a preferred embodiment, the summed audio signal for each conference is output to the audio processor(s) over the TDM bus. Since the audio conferencing platform supports a number of simultaneous conferences, steps 710-714 are performed for each of the conferences.
FIG. 8 is a flow chart illustrating the processing steps 800 performed by each audio processor to output audio signals over the TDM bus to conference participants.
The executable program instructions associated with these processing steps 800 are typically downloaded to each audio processor by the controller/CPU during system initialization or reconfiguration. These steps 800 are also preferably executed at least once every 125 microseconds.
For each active/assigned port, step 802 retrieves the summed conference signal for the conference that the port is assigned to. Step 804 reads the conference bit associated with the port, and step 806 tests the bit to determine whether audio from the port was used to create the summed conference signal. If it was, then step 808 removes the gain (e.g., AGC and gain/TLP) compensated audio signal associated with the port from the summed audio signal.
This step removes the speaker's own voice from the conference audio. If step 806 determines that audio from the port was not used to create the summed conference signal, then step 808 is bypassed. To prepare the signal to be output, step 810 applies a gain, and step 812 compresses the gain corrected signal. Step 814 then outputs the compressed signal onto the TDM bus for routing to the conference participant associated with the port, via the NIC (see FIG. 2).
Preferably, the audio conferencing platform 26 (see FIG. 1) computes conference sums at a central location. This reduces the distributed summing that would otherwise need to be performed to ensure that the ports receive the proper conference audio.
In addition, the conference platforin is readily expandable by adding additional NICs and/or processor boards.
That is, the centralized conference mixer architecture allows the audio conferencing platform to be scaled to the user's requirements.
FIG. 6 is a flow chart illustration of the DTMF tone detection processing;
FIGS. 7A-7B together provide a flow chart illustration of preferred conference mixer processing to create a summed conference signal; and
FIG. 8 is a flow chart illustrating the processing of signals to be output to the network interface cards via the TDM bus.
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 is a diagram of a conferencing system 20 in accordance with a preferred embodiment of the present invention. The system 20 connects a plurality of user sites 21-23 through a switching network 24 to an audio conferencing platform 26. The plurality of user sites may be distributed worldwide, or at a company facility/campus. For example, each of the user sites 21-23 may be in different cities and connected to the audio platform 26 via the switching network 24, which may include PSTN and PBX systems. The connections between the user sites and the switching network 24 may include T1, E1, T3, and ISDN lines.
Each user site 21-23 preferably includes one or more telephones 28 and one or more personal computers or servers 30. However, a user site may only include either a telephone, such as user site 21a, or a computer/server, such as user site 23a. The computer/server 30 may be connected via an Internet/intranet backbone 32 to a server 34. The audio conferencing platform 26 and the server 34 are connected via a data link 36 (e.g., a 10/100 BaseT Ethernet link). The computer 30 allows the user to participate in a data conference simultaneously with the audio conference via the server 34. In addition, the user can use the computer 30 to interface (e.g., via a browser) with the server 34 to perform functions such as conference control, administration (e.g., system configuration, billing, reports,...), scheduling and account maintenance. The telephone 28 and the computer 30 may cooperate to provide voice over the Internet/intranet 32 to the audio conferencing platform 26 via the data link 36.
FIG. 2 is a functional block diagram of an audio conferencing platform 26 in accordance with a preferred embodiment of the present invention. The audio conferencing platform 26 includes a plurality of network interface cards (NICs) 38-40 that receive audio information from the switching network 24 (see FIG. 1). Each NIC is preferably capable of handling a plurality of different trunk lines (e.g., eight). The data received by the NIC is generally an 8-bit μ-law or A-law sample. The NIC places the sample into a memory device (not shown), which is used to output the audio data onto a data bus. The data bus is preferably a TDM bus based, in one embodiment, upon the H.110 telephony standard.
The audio conferencing platform 26 also includes a plurality of processor boards 44-46 that receive data from and transmit data to the NICs 38-40 over the TDM bus 42.
The NICs and the processor boards 44-46 also communicate with a controller/CPU board 48 over a system bus 50. The system bus 50 is preferably based upon the Compact Peripheral Component Interconnect ("cPCI") standard. The CPU/controller communicates with the server 34 (see FIG. 1) via the data link 36. The controller/CPU board may include a general purpose processor such as a 200 MHz Pentium™ CPU manufactured by Intel Corporation, a processor from AMD or any other similar processor (including an ASIC) having sufficient processor speed (MIPS) to support the present invention.
FIG. 3 is a block diagram illustration of the processor board 44. The board 44 includes a plurality of dynamically programmable digital signal processors 60-65. Each digital signal processor (DSP) is an integrated circuit that communicates with the controller/CPU card 48 (see FIG. 2) over the system bus 50. Specifically, the processor board 44 includes a bus interface 68 that interconnects the DSPs 60-65 to the system bus 50. Each DSP also includes an associated dual port RAM (DPR) 70-75 that buffers commands and data for transmission between the system bus 50 and the associated DSP.
Each DSP 60-65 also transmits data over and receives data from the TDM bus 42.
The processor card 44 includes a TDM bus interface 78 that performs any necessary signal conditioning and transformation. For example, if the TDM bus is an H.110 bus, it includes thirty-two serial lines. As a result the TDM bus interface may include a serial-to-parallel and a parallel-to-serial interface.
Each DSP 60-65 also includes an associated TDM dual port RAM 80-85 that buffers data for transmission between the TDM bus 42 and the associated DSP.
Each of the DSPs is preferably a general purpose digital signal processor IC, such as the model number TMS320C6201 processor available from Texas Instruments. The number of DSPs resident on the processor board 44 is a function of the size of the integrated circuits, their power consumption, and the heat dissipation ability of the processor board. For example, in certain embodiments there may be between four and ten DSPs per processor board.
Executable software applications may be downloaded from the controller/CPU 48 (see FIG. 2) via the system bus 50 to a selected one(s) of the DSPs 60-65. Each of the DSPs is preferably also connected to an adjacent DSP via a serial data link.
FIG. 4 illustrates the DSP resources on the processor board 44 illustrated in FIG. 3.
Referring to FIGS. 3 and 4, the controller/CPU 48 (see FIG. 2) downloads executable program instructions to a DSP based upon the function that the controller/CPU assigns to the DSP. For example, the controller/CPU may download executable program instructions for the DSP3 62 to function as an audio conference mixer 90, while the DSP2 61 and the DSP4 63 may be configured as audio processors 92, 94, respectively. Other DSPs 60, 65 may be configured by the controller/CPU 48 (see FIG. 2) to provide services such as DTMF detection 96, audio message generation 98 and music playback 100.
Each audio processor 92, 94 is capable of supporting a certain number of user ports (i.e., conference participants). This number is based upon the operational speed of the various components within the processor board and the over-all design of the system. Each audio processor 92, 94 receives compressed audio data 102 from the conference participants over the TDM bus 42.
The TDM bus 42 may, for example, support 4096 time slots, each having a bandwidth of 64 kbps. The timeslots are generally dynamically assigned by the controller/CPU 48 (see FIG. 2) as needed for the conferences that are currently occurring. However, one of ordinary skill in the art will recognize that in a static system the timeslots may be predetermined.
FIG. 5 is a flow chart illustrating the processing steps 500 performed by each audio processor on the digitized audio signals received over the TDM bus 42 from the NICs 38-40 (see FIG. 2). The executable program instructions associated with these processing steps 500 are typically downloaded to the audio processors 92, 94 (see FIG. 4) by the controller/CPU 48 (see FIG. 2). The download may occur during system initialization or reconfiguration. These processing steps 500 preferably are executed at least once every 125 microseconds to provide audio of the requisite quality.
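The figures quoted above are mutually consistent, as the following short arithmetic sketch suggests. It assumes one 8-bit sample per time slot per 125 microsecond frame and, purely for illustration, an even split of the 4096 time slots across the thirty-two H.110 serial lines mentioned earlier; the variable names are hypothetical.

```python
# Arithmetic sketch relating the 125 microsecond processing period,
# the 64 kbps time slot bandwidth and the 4096 time slots of the TDM bus.
FRAME_PERIOD_US = 125      # processing period stated in the text
BITS_PER_SAMPLE = 8        # one compressed sample per slot (assumption)
TIME_SLOTS = 4096          # example slot count from the text
SERIAL_LINES = 32          # H.110 serial lines from the text

frames_per_second = 1_000_000 // FRAME_PERIOD_US      # 8000 frames/s
slot_rate_bps = frames_per_second * BITS_PER_SAMPLE   # 64 kbps per slot
aggregate_bps = slot_rate_bps * TIME_SLOTS            # whole bus

# Assumption: slots are spread evenly over the serial lines.
slots_per_line = TIME_SLOTS // SERIAL_LINES
line_rate_bps = slots_per_line * slot_rate_bps

print(frames_per_second, slot_rate_bps, aggregate_bps)
print(slots_per_line, line_rate_bps)
```

Under these assumptions each port contributes exactly one byte per processing pass, which is why the per-port processing described below operates on single bytes.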
For each of the active/assigned ports for the audio processor, step 502 reads the audio data for that port from TDM dual port RAM associated with the audio processor.
For example, if DSP2 61 (see FIG. 3) is configured to perform the function of audio processor 92 (see FIG. 4), then the data is read from the read bank of the TDM dual port RAM 81. If the audio processor 92 is responsible for, for example, 700 active/assigned ports, then step 502 reads the 700 bytes of associated audio data from the TDM dual port RAM 81.
Each audio processor includes a time slot allocation table (not shown) that specifies the address location in the TDM dual port RAM for the audio data from each port.
Since each of the audio signals is typically compressed (e.g., μ-law, A-law), step 504 decompresses each of the 8-bit signals to a 16-bit word. Step 506 computes the average magnitude (AVM) for each of the decompressed signals associated with the ports assigned to the audio processor. For additional details, see co-pending U.S. Patent Application No. 09/532,602, filed March 22, 2000, entitled "Scalable Audio Conference Platform," the entire contents of which are incorporated herein by reference for all purposes.
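Steps 504 and 506 might be sketched as follows. The expansion follows the widely used G.711 μ-law decoding arithmetic; the average magnitude is taken here as a plain mean of absolute sample values over a block, which is an assumption, since the exact AVM computation is described in the referenced application rather than here. The function names are hypothetical.

```python
# Sketch of step 504 (mu-law expansion) and step 506 (average magnitude).

def ulaw_to_linear(byte):
    """Expand one 8-bit mu-law code to a signed linear sample."""
    u = ~byte & 0xFF                 # mu-law codes are stored inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def average_magnitude(samples):
    """Mean absolute value of a block of linear samples (assumed AVM)."""
    if not samples:
        return 0.0
    return sum(abs(s) for s in samples) / len(samples)


# Usage: expand a short block of codes for one port and measure it.
block = [ulaw_to_linear(code) for code in (0xFF, 0x80, 0x00)]
print(block, average_magnitude(block))
```

A production audio processor would more likely keep a running average per port than re-scan a block, but the comparison in step 508 is the same either way.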
Step 508 is performed to determine which of the ports are speaking. This step compares the average magnitude for the port computed in step 506 against a predetermined magnitude value representative of speech (e.g., -35 dBm). If the average magnitude for the port exceeds the predetermined magnitude value representative of speech, a speech bit associated with the port is set. Otherwise, the associated speech bit is cleared. Each port has an associated speech bit. Step 510 outputs all the speech bits (eight per timeslot) onto the TDM bus. Step 512 is performed to calculate an automatic gain correction (AGC) value for each port. To compute an AGC value for the port, the AVM value is converted to an index value associated with a table containing gain/attenuation factors. For example, there may be 256 index values, each uniquely associated with 256 gain/attenuation factors. The index value is used by the conference mixer 90 (see FIG. 4) to determine the gain/attenuation factor to be applied to an audio signal that will be summed to create the conference sum signal.
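Steps 508 through 512 can be illustrated with the following minimal sketch. It makes several assumptions that the text does not fix: the threshold is expressed in the same linear units as the AVM rather than in dBm, the speech bits are packed least significant bit first, and the AVM is mapped linearly onto the 256 table indices. All function names are hypothetical.

```python
# Sketch of step 508 (speech bits), step 510 (eight bits per timeslot)
# and step 512 (AVM converted to a gain table index).

def speech_bits(avm_by_port, threshold):
    """Set a port's bit when its AVM exceeds the speech threshold."""
    return [1 if avm > threshold else 0 for avm in avm_by_port]


def pack_bits(bits):
    """Pack bits eight per byte, first port in the lowest bit (assumed)."""
    out = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for offset, bit in enumerate(bits[start:start + 8]):
            byte |= (bit & 1) << offset
        out.append(byte)
    return bytes(out)


def agc_index(avm, full_scale=32768, table_size=256):
    """Map an AVM onto a table index (linear mapping is an assumption)."""
    index = int(avm * table_size / full_scale)
    return max(0, min(table_size - 1, index))


# Usage: nine ports yield two bytes of speech bits.
bits = speech_bits([100, 500, 50, 0, 0, 0, 0, 0, 900], threshold=200)
print(bits, pack_bits(bits).hex(), agc_index(16384))
```

Packing the bits keeps the signalling overhead small: one 8-bit timeslot carries the speech state of eight ports.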
In a preferred embodiment, the threshold used in step 508 to determine whether speech is present is a dynamic speech detection threshold value, set as a function of the noise detected on the line. For example, if the magnitude for the energy for the line/port exceeds a noise detection threshold value for a predetermined amount of time (e.g., three seconds), then noise is detected and a higher threshold value may be used in step 508 to determine whether the user is speaking. Once noise has been detected, the dynamic threshold value may be set as a function of the magnitude of the energy on the line. For example, the dynamic threshold value may be set to a certain value greater than the value of the noise on the line (e.g., the average noise). Each line may employ a different speech detection threshold, since the background noise on each of the lines may be different.
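One possible per-port realization of the dynamic threshold is sketched below. It is an illustration under stated assumptions, not the claimed implementation: the noise estimate is the mean AVM over the qualifying period, the raised threshold is that estimate plus a fixed margin, the estimate is frozen once noise is declared, and the port reverts to its base threshold as soon as the energy falls back below the noise detection threshold. The class and field names are hypothetical. At one update per 125 microseconds, three seconds corresponds to 24,000 updates.

```python
# Sketch of a per-port dynamic speech detection threshold.
from dataclasses import dataclass, field


@dataclass
class PortThreshold:
    base_threshold: float    # speech threshold used on a quiet line
    noise_threshold: float   # energy level that may indicate noise
    hold_frames: int         # updates the energy must stay above it
    margin: float            # added to the average noise (assumption)
    frames_above: int = 0
    noise_sum: float = 0.0
    noise_bit: bool = False
    speech_threshold: float = field(init=False, default=0.0)

    def __post_init__(self):
        self.speech_threshold = self.base_threshold

    def update(self, avm):
        """Feed one AVM value; return True when the speech bit is set."""
        if avm > self.noise_threshold:
            if not self.noise_bit:
                self.frames_above += 1
                self.noise_sum += avm
                if self.frames_above >= self.hold_frames:
                    # Sustained energy: declare noise, raise the threshold.
                    self.noise_bit = True
                    average_noise = self.noise_sum / self.frames_above
                    self.speech_threshold = max(
                        self.base_threshold, average_noise + self.margin)
        else:
            # Line went quiet: revert to the base threshold (assumption).
            self.frames_above = 0
            self.noise_sum = 0.0
            self.noise_bit = False
            self.speech_threshold = self.base_threshold
        return avm > self.speech_threshold


# Usage with a deliberately short hold period for demonstration.
port = PortThreshold(base_threshold=200, noise_threshold=100,
                     hold_frames=3, margin=50)
print([port.update(level) for level in (300, 300, 300, 500, 50)])
```

Because each port owns its own state object, each line naturally ends up with its own threshold, matching the observation that background noise differs from line to line.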
The system may also set a noise bit for the line, and the noise bit may be provided to the controller/CPU 48 (see FIG. 2) to take the necessary action due to the background noise.
The action may include not allowing this conference participant to be on the speech list (i.e., the list of lines summed to create the conference signal), or sending an audio message to the conference participant that the system detects high background noise and recommends that the conference participant try to take corrective action (e.g., move to a different area, close an office door, go off speaker phone, etc.).
Additional action may include sending an audio message to the conference participant that the system detects high background noise and instructing the participant to hit a key on the telephone keypad so the system does not consider the audio from the participant for the conference audio. The system would then detect the DTMF tone associated with the key being depressed and take the necessary action to prevent audio from this participant from being used in the conference sum, until such time that the user, for example, hits the same key again or another key instructing the system to consider audio from the participant for the conference sum.
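The keypad-driven exclusion described above amounts to a per-port toggle, sketched below. The choice of the "#" key and the class and method names are hypothetical; the text only requires that some key removes the participant from the conference sum and that the same key or another key restores the participant.

```python
# Sketch of a DTMF-driven toggle that excludes a port from the sum.

class MuteControl:
    def __init__(self, mute_key="#"):
        self.mute_key = mute_key   # hypothetical key assignment
        self.muted = set()         # ports currently excluded

    def on_dtmf(self, port, key):
        """Toggle exclusion when the mute key is detected on a port."""
        if key == self.mute_key:
            if port in self.muted:
                self.muted.discard(port)
            else:
                self.muted.add(port)
        return port in self.muted

    def eligible(self, port, conference_bit):
        """A port is summed only if its bit is set and it is not muted."""
        return bool(conference_bit) and port not in self.muted


# Usage: port 7 presses the key twice.
control = MuteControl()
print(control.on_dtmf(7, "#"), control.eligible(7, 1))
print(control.on_dtmf(7, "#"), control.eligible(7, 1))
```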
FIG. 6 is a flow chart illustration of the DTMF tone detection processing 600.
These processing steps 600 are performed by the DTMF processor 96 (see FIG. 4), preferably at least once every 125 microseconds, to detect DTMF tones within digitized audio signals from the NICs 38-40 (FIG. 2). One or more of the DSPs may be configured to operate as a DTMF tone detector. The executable program instructions associated with the processing steps 600 are typically downloaded by the controller/CPU 48 (see FIG. 2) to the DSP designated to perform the DTMF tone detection function. The download may occur during initialization or system reconfiguration.
For an assigned number of the active/assigned ports of the conferencing system, step 602 reads the audio data for the port from the TDM dual port RAM associated with the DSP(s) configured to perform the DTMF tone detection function. Step 604 then expands the 8-bit signal to a 16-bit word. Next, step 606 tests each of these decompressed audio signals to determine whether any of the signals includes a DTMF tone. For any signal that does include a DTMF tone, step 606 sets a DTMF detect bit associated with the port.
Otherwise, the DTMF detect bit is cleared. Each port has an associated DTMF detect bit.
Step 608 informs the controller/CPU 48 (see FIG. 3) through Dual Port Ram (DPR) which DTMF tone was detected, since the tone is representative of system commands and/or data from a conference participant. Step 610 outputs the DTMF detect bits onto the TDM bus.
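The text does not say how step 606 tests a signal for a DTMF tone. A common technique, substituted here purely for illustration, is the Goertzel algorithm evaluated at the eight standard DTMF frequencies over a block of expanded samples. The block length of 205 samples, the 0.5 energy fraction and all function names are assumptions.

```python
# Sketch of a Goertzel-based DTMF test for step 606 (swapped-in technique).
import math

ROWS = (697, 770, 852, 941)          # standard DTMF row frequencies, Hz
COLS = (1209, 1336, 1477, 1633)      # standard DTMF column frequencies, Hz
KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, rate=8000):
    """Squared magnitude of the signal's spectrum at one frequency."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_dtmf(samples, rate=8000, min_fraction=0.5):
    """Return the detected key, or None when no DTMF pair dominates."""
    total = sum(x * x for x in samples)
    if total == 0:
        return None
    row_power = [goertzel_power(samples, f, rate) for f in ROWS]
    col_power = [goertzel_power(samples, f, rate) for f in COLS]
    row = max(range(4), key=row_power.__getitem__)
    col = max(range(4), key=col_power.__getitem__)
    # Scale spectral power to signal energy so it compares with total.
    tone_energy = 2.0 * (row_power[row] + col_power[col]) / len(samples)
    if tone_energy < min_fraction * total:
        return None
    return KEYS[row][col]


def make_tone(freqs, count=205, rate=8000, amplitude=8000):
    """Synthesize a test block containing the given frequencies."""
    return [sum(amplitude * math.sin(2 * math.pi * f * n / rate)
                for f in freqs) for n in range(count)]


# Usage: key "5" is the 770 Hz row tone plus the 1336 Hz column tone.
print(detect_dtmf(make_tone((770, 1336))))
```

A deployed detector would add further checks (for example, tone duration and the relative level of the two tones), which are omitted from this sketch.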
FIGS. 7A-7B collectively provide a flow chart illustrating processing steps performed by the audio conference mixer 90 (see FIG. 4), preferably at least once every 125 microseconds, to create a summed conference signal for each conference. The executable program instructions associated with the processing steps 700 are typically downloaded by the controller/CPU 48 (see FIG. 2) over the system bus 50 (see FIG. 2) to the DSP designated to perform the conference mixer function. The download may occur during initialization or system reconfiguration.
Referring to FIG. 7A, for each of the active/assigned ports of the audio conferencing system, step 702 reads the speech bit and the DTMF detect bit received over the TDM bus 42 (see FIG. 4). Alternatively, the speech bits may be provided over a dedicated serial link that interconnects the audio processor or processors and the conference mixer. Step 704 is then performed to determine whether the speech bit for the port is set (i.e., whether energy that may be speech is detected on that port). If the speech bit is set, then step 706 is performed to see whether the DTMF detect bit for the port is also set. If the DTMF detect bit is clear, then the audio received by the port is speech and the audio does not include DTMF tones. As a result, step 708 sets the conference bit for that port; otherwise, step 709 clears the conference bit associated with the port. Since the audio conferencing platform 26 (see FIG. 1) preferably can support many simultaneous conferences (e.g., 384), the controller/CPU 48 (see FIG. 2) keeps track of the conference that each port is assigned to and provides that information to the DSP performing the audio conference mixer function. Upon the completion of step 708, the conference bit for each port has been updated to indicate the conference participants whose voice should be included in the conference sum.
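The decision made in steps 704 through 709 reduces to a single condition per port: the conference bit is set only when the speech bit is set and the DTMF detect bit is clear. A minimal sketch, with a hypothetical function name:

```python
# Sketch of steps 704-709: derive conference bits from the two inputs.

def conference_bits(speech, dtmf):
    """Set a port's conference bit when it has speech and no DTMF tone."""
    return [1 if s and not d else 0 for s, d in zip(speech, dtmf)]


# Usage: only the first port carries speech without a DTMF tone.
print(conference_bits([1, 1, 0, 0], [0, 1, 0, 1]))
```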
Referring to FIG. 7B, for each of the conferences, step 710 is performed, if needed, to decompress each of the audio signals associated with conference bits that are set. Step 711 performs AGC and gain/TLP (Test Level Point) compensation on the expanded signals from step 710. Step 712 is then performed to sum each of the compensated audio samples to provide a summed conference signal. Since many conference participants may be speaking at the same time, the system preferably limits the number of conference participants whose voice is summed to create the conference audio. For example, the system may sum the audio signals from a maximum of three speaking conference participants. Step 714 outputs the summed audio signal for the conference to the audio processors, as appropriate. In a preferred embodiment, the summed audio signal for each conference is output to the audio processor(s) over the TDM bus. Since the audio conferencing platform supports a number of simultaneous conferences, steps 710-714 are performed for each of the conferences.
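Steps 711 and 712 for a single conference might look like the sketch below. The text limits the number of summed participants but does not say how they are chosen; selecting the loudest ports by AVM is an assumption, as are the dictionary-based data layout and the function name. The per-port contributions are returned alongside the sum because a later step needs them to remove a speaker's own voice.

```python
# Sketch of steps 711-712: gain compensation and limited summation.

def mix_conference(samples, conference_bit, gains, avm, max_speakers=3):
    """Return (summed sample, {port: compensated contribution}).

    samples, conference_bit, gains and avm are dicts keyed by port.
    """
    active = [port for port in samples if conference_bit.get(port)]
    # Assumption: keep the loudest ports when too many are speaking.
    chosen = sorted(active, key=lambda port: avm[port],
                    reverse=True)[:max_speakers]
    contributions = {port: int(samples[port] * gains[port])
                     for port in chosen}
    return sum(contributions.values()), contributions


# Usage: four ports are speaking, only three are summed.
total, parts = mix_conference(
    samples={1: 1000, 2: -500, 3: 200, 4: 4000},
    conference_bit={1: 1, 2: 1, 3: 1, 4: 1},
    gains={1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0},
    avm={1: 900, 2: 600, 3: 100, 4: 3000},
)
print(total, sorted(parts))
```

The sum is deliberately left unclipped in this sketch so that subtracting one contribution later remains exact; limiting to the 16-bit range can then be applied once, just before compression.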
FIG. 8 is a flow chart illustrating the processing steps 800 performed by each audio processor to output audio signals over the TDM bus to conference participants.
The executable program instructions associated with these processing steps 800 are typically downloaded to each audio processor by the controller/CPU during system initialization or reconfiguration. These steps 800 are also preferably executed at least once every 125 microseconds.
For each active/assigned port, step 802 retrieves the summed conference signal for the conference that the port is assigned to. Step 804 reads the conference bit associated with the port, and step 806 tests the bit to determine whether audio from the port was used to create the summed conference signal. If it was, then step 808 removes the gain (e.g., AGC and gain/TLP) compensated audio signal associated with the port from the summed audio signal.
This step removes the speaker's own voice from the conference audio. If step 806 determines that audio from the port was not used to create the summed conference signal, then step 808 is bypassed. To prepare the signal to be output, step 810 applies a gain, and step 812 compresses the gain corrected signal. Step 814 then outputs the compressed signal onto the TDM bus for routing to the conference participant associated with the port, via the NIC (see FIG. 2).
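The output path of steps 806 through 812 can be sketched as follows. The compression uses the commonly published G.711 μ-law encoding arithmetic; the single scalar output gain, the clamp to the 16-bit range and the function names are assumptions made for illustration.

```python
# Sketch of steps 806-812: remove own voice, apply gain, compress.

BIAS = 0x84     # mu-law encoding bias
CLIP = 32635    # largest magnitude representable after adding the bias


def linear_to_ulaw(sample):
    """Compress one signed linear sample to an 8-bit mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def port_output(conference_sum, own_contribution, in_mix, output_gain=1.0):
    """Produce the compressed sample sent back to one port."""
    signal = conference_sum
    if in_mix:
        # Step 808: a speaker should not hear his or her own voice.
        signal -= own_contribution
    signal = int(signal * output_gain)            # step 810
    signal = max(-32768, min(32767, signal))      # keep within 16 bits
    return linear_to_ulaw(signal)                 # step 812


# Usage: a speaking port hears the sum minus its own contribution.
print(hex(port_output(4500, 4000, in_mix=True)),
      hex(port_output(4500, 0, in_mix=False)))
```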
Preferably, the audio conferencing platform 26 (see FIG. 1) computes conference sums at a central location. This reduces the distributed summing that would otherwise need to be performed to ensure that the ports receive the proper conference audio.
In addition, the conference platform is readily expandable by adding additional NICs and/or processor boards.
That is, the centralized conference mixer architecture allows the audio conferencing platform to be scaled to the user's requirements.
One of ordinary skill will appreciate that the overall system design is a function of the processing ability of each DSP. For example, if a sufficiently fast DSP is available, then the functions of the audio conference mixer, the audio processor and the DTMF tone detection and the other DSP functions may be performed by a single DSP.
In addition, although the aspect of the dynamic threshold value has been discussed in the context of a system that employs a centralized summing architecture, one of ordinary skill in the art will recognize that dynamic thresholding is certainly not limited to systems with a centralized summing architecture. It is contemplated that all audio conferencing systems, and systems with similar audio capabilities, would enjoy the benefits associated with employing a dynamic threshold value for determining whether a line includes speech.
Although the present invention has been shown and described with respect to several preferred embodiments thereof, various changes, omissions and additions to the form and detail thereof, may be made therein, without departing from the spirit and scope of the invention.
Claims (69)
1. A method for conferencing, comprising:
receiving audio signals over a plurality of ports;
for at least one port, determining a dynamic threshold value based on at least one characteristic of signals received on the port;
associating said dynamic threshold value with the port; and comparing at least one characteristic of signals subsequently received on the port to the dynamic threshold value; and establishing whether noise is present on at least one of the ports;
establishing a value of a speech bit for the at least one port based on said comparing;
summing signals received over the plurality of ports, wherein signals received on the at least one port whose characteristics have a specified relationship to the dynamic threshold value are not contained in the sum.
2. The method of claim 1, wherein the dynamic threshold value is an energy level.
3. The method of claim 1, wherein the dynamic threshold value is determined based on one or more characteristics that comprise energy level.
4. The method of claim 1, wherein the at least one characteristic of signals subsequently received on the port compared to the dynamic threshold value comprise energy level.
5. The method of claim 1, wherein the specified relationship to the dynamic threshold value is that of being less than the threshold value.
6. The method of claim 1, further comprising:
identifying which ports are receiving audio signals that contain speech; and on each such identified port, transmitting a summed signal, wherein said summed signal does not contain signals received on that port.
7. The method of claim 1, further comprising:
identifying which ports are receiving audio signals that contain DTMF tones; and on each such identified port, transmitting a summed signal, wherein said summed signal does not contain signals received on that port.
8. The method of claim 7, wherein said step of identifying comprises setting a DTMF detect bit for a signal.
9. The method of claim 1, further comprising preprocessing received audio signals by decompressing the signals.
10. The method of claim 9, wherein said step of comparing at least one characteristic of signals subsequently received on the port to the dynamic threshold value comprises determining whether a magnitude of a decompressed audio signal is greater than said threshold value.
11. The method of claim 9, wherein said step of decompressing uses µ-law decompression.
12. The method of claim 9, wherein said step of decompressing uses A-law decompression.
13. The method of claim 7, further comprising the step of including signals from previously identified ports in said sum after those ports are no longer identified as receiving signals containing one or more DTMF tones.
14. A computer readable medium having computer executable instructions stored thereon, the instructions comprising:
instructions for receiving audio signals over a plurality of ports;
for at least one port, instructions for determining a dynamic threshold value based on one or more characteristics of signals received on the port;
instructions for associating said dynamic threshold value with the port; and instructions for comparing one or more characteristics of signals subsequently received on the port to the dynamic threshold value; and instructions for establishing whether noise is present on at least one of the ports;
instructions for establishing a value of a speech bit for the at least one port based on said comparing;
instructions for summing signals received over the plurality of ports, wherein signals received on the at least one port whose characteristics have a specified relationship to the dynamic threshold value are not contained in the sum.
15. The computer readable medium of claim 14, wherein the dynamic threshold value is an energy level.
16. The computer readable medium of claim 14, wherein the dynamic threshold value is determined based on one or more characteristics that comprise energy level.
17. The computer readable medium of claim 14, wherein the one or more characteristics of signals subsequently received on the port compared to the dynamic threshold value comprise energy level.
18. The computer readable medium of claim 14, wherein the specified relationship to the dynamic threshold value is that of being less than the threshold value.
19. The computer readable medium of claim 14, further comprising:
instructions for identifying which ports are receiving audio signals that contain speech; and instructions for, on each such identified port, transmitting a summed signal, wherein said summed signal does not contain signals received on that port.
20. The computer readable medium of claim 14, wherein the instructions further comprise:
instructions for identifying which ports are receiving audio signals that contain DTMF tones; and instructions for, on each such identified port, transmitting a summed signal, wherein said summed signal does not contain signals received on that port.
21. The computer readable medium of claim 20, wherein said instructions for identifying comprises software for setting a DTMF detect bit for a signal.
22. The computer readable medium of claim 14, wherein the instructions further comprise instructions for preprocessing received audio signals by decompressing the signals.
23. The computer readable medium of claim 22, wherein said instructions for comparing one or more characteristics of signals subsequently received on the port to the dynamic threshold value comprises instructions for determining whether a magnitude of a decompressed audio signal is greater than said threshold value.
24. The computer readable medium of claim 22, wherein said instructions for decompressing uses µ-law decompression.
25. The computer readable medium of claim 22, wherein said instructions for decompressing uses A-law decompression.
26. The computer readable medium of claim 20, wherein the instructions further comprise instructions for including signals from previously identified ports in said sum after those ports are no longer identified as receiving signals containing one or more DTMF tones.
27. A system for conferencing, comprising:
means for receiving audio signals over a plurality of ports;
for at least one port, means for determining a dynamic threshold value based on at least one characteristic of signals received on the port;
means for associating said dynamic threshold value with the port; and means for comparing at least one characteristic of signals subsequently received on the port to the dynamic threshold value; and means for establishing whether noise is present on at least one of the ports;
means for establishing a value of a speech bit for the at least one port based on said comparing;
means for summing signals received over the plurality of ports, wherein signals received on the at least one port whose characteristics have a specified relationship to the dynamic threshold value are not contained in the sum.
28. The system of claim 27, wherein the dynamic threshold value is an energy level.
29. The system of claim 27, wherein the dynamic threshold value is determined based on at least one characteristic that comprises energy level.
30. The system of claim 27, wherein the at least one characteristic of signals subsequently received on the port compared to the dynamic threshold value comprise energy level.
31. The system of claim 27, wherein the specified relationship to the dynamic threshold value is that of being less than the threshold value.
32. The system of claim 27, further comprising:
means for identifying which ports are receiving audio signals that contain speech; and means for, on each such identified port, transmitting a summed signal, wherein said summed signal does not contain signals received on that port.
33. The system of claim 27, further comprising:
means for identifying which ports are receiving audio signals that contain DTMF tones; and means for, on each such identified port, transmitting a summed signal, wherein said summed signal does not contain signals received on that port.
34. The system of claim 33, wherein said means for identifying comprises means for setting a DTMF detect bit for a signal.
35. The system of claim 27, further comprising means for preprocessing received audio signals by decompressing the signals.
36. The system of claim 35, wherein said means for comparing at least one characteristic of signals subsequently received on the port to the dynamic threshold value comprises means for determining whether a magnitude of a decompressed audio signal is greater than said threshold value.
37. The system of claim 35, wherein said means for decompressing uses µ-law decompression.
38. The system of claim 35, wherein said means for decompressing uses A-law decompression.
39. The system of claim 33, further comprising means for including signals from previously identified ports in said sum after those ports are no longer identified as receiving signals containing one or more DTMF tones.
40. A method for conferencing, comprising:
receiving a plurality of audio signals over a plurality of ports;
establishing whether noise is present on at least one of said ports;
determining a dynamic threshold value for said at least one port based on said establishing;
comparing an energy level of at least one of said received audio signals on said at least one port to the determined dynamic threshold value;
and establishing a value of a speech bit for said at least one port based on said comparing.
41. The method of claim 40, further comprising:
omitting said at least one port from a conference sum of said ports if said energy level of said at least one received audio signal on said at least one port does not exceed said determined dynamic threshold value.
42. The method of claim 40, wherein said establishing whether said noise is present comprises:
detecting the presence of said noise on said at least one port if said energy level of said at least one received audio signal on said at least one port exceeds a noise detection threshold value.
43. The method of claim 42 further comprising:
setting a noise bit for said at least one port if said noise is present on said at least one port.
44. The method of claim 43 wherein said setting comprises:
setting said noise bit if said energy level of said at least one received audio signal exceeds said noise detection threshold value for a predetermined time period.
45. The method of claim 43 wherein said setting comprises:
setting said noise bit if said energy level of said at least one received audio signal exceeds said noise detection threshold value for over two seconds.
46. The method of claim 43 further comprising:
if said noise bit is set, sending a message over said at least one port indicating a high-noise condition.
47. The method of claim 40, wherein said establishing said value of said speech bit comprises:
setting said speech bit if said energy level of said at least one received audio signal on said at least one port exceeds said determined dynamic threshold value.
48. The method of claim 40, wherein said determining comprises:
if said noise is present on said at least one port, making a level of said dynamic threshold value for said at least one port greater than a level of said noise on said at least one port by a predetermined margin.
49. The method of claim 40, further comprising:
determining whether one or more selected ports of said plurality of ports are receiving audio signals that contain speech; and on each said selected port, transmitting a summed signal omitting said speech-containing audio signals received at each said selected port.
50. The method of claim 40, further comprising:
determining whether one or more selected ports of said plurality of ports are receiving audio signals that contain Dual Tone Multi-Frequency (DTMF) tones; and on each said selected port, transmitting a summed signal omitting said DTMF-tone-containing audio signals received at each said selected port.
51. The method of claim 50, further comprising setting a DTMF detect bit for each said selected port.
52. The method of claim 40, further comprising preprocessing said plurality of received audio signals by decompressing the signals.
53. The method of claim 52, wherein said comparing comprises:
comparing an energy level of said decompressed at least one received audio signal to said determined dynamic threshold value.
54. The method of claim 52, wherein said decompressing uses µ-law decompression.
55. The method of claim 52, wherein said decompressing uses A-law decompression.
56. The method of claim 50, further comprising including in said summed signal signals received on ports no longer receiving said DTMF-tone-containing audio signals.
57. A system for conferencing, comprising:
means for receiving a plurality of audio signals over a plurality of ports;
means for establishing whether noise is present on at least one of said ports;
means for determining a dynamic threshold value for said at least one port based on said establishing;
means for comparing an energy level of at least one of said received audio signals on said at least one port to the determined dynamic threshold value;
and means for establishing a value of a speech bit for said at least one port based on said comparing.
58. The system of claim 57, further comprising:
means for omitting said at least one port from a conference sum of said ports if said energy level of said at least one received audio signal on said at least one port does not exceed said dynamic threshold value.
59. The system of claim 57, wherein said means for establishing whether said noise is present comprises:
means for detecting the presence of noise on said at least one port if said energy level of said at least one received audio signal on said at least one port exceeds a noise detection threshold value.
60. The system of claim 57, wherein said means for establishing said value of said speech bit comprises:
means for setting said speech bit if said energy level of said at least one received audio signal on said at least one port exceeds said determined dynamic threshold value.
61. The system of claim 57, wherein said means for determining comprises:
means for making a level of said dynamic threshold value for said at least one port greater than a level of said noise on said at least one port by a predetermined margin, if said noise is present on said at least one port.
62. The system of claim 57, further comprising:
means for determining whether one or more selected ports of said plurality of ports are receiving audio signals that contain speech; and means for transmitting, on each said selected port, a summed signal omitting said speech-containing audio signals received at each said selected port.
63. The system of claim 57, further comprising:
means for determining whether one or more selected ports of said plurality of ports are receiving audio signals that contain Dual Tone Multi-Frequency (DTMF) tones; and means for transmitting, on each said selected port, a summed signal omitting said DTMF-tone-containing audio signals received at each said selected port.
64. The system of claim 63, further comprising means for setting a DTMF detect bit for each said selected port.
65. The system of claim 57, further comprising means for preprocessing said plurality of received audio signals by decompressing the signals.
66. The system of claim 65, wherein said means for comparing comprises:
means for comparing an energy level of said decompressed at least one received audio signal to said determined dynamic threshold value.
67. The system of claim 65, wherein said decompressing uses µ-law decompression.
68. The system of claim 65, wherein said decompressing uses A-law decompression.
69. The system of claim 36, further comprising means for including in said summed signal signals received on ports no longer receiving said DTMF-tone-containing audio signals.
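The noise detection, dynamic thresholding, and speech-bit steps recited in claims 40 to 48 can be sketched as follows. This is an illustrative sketch only: the numeric constants, the per-frame energy input, the frame-count stand-in for the predetermined time period, and the use of the minimum energy in a run as the noise-level estimate are all assumptions and are not taken from the claims.

```python
from dataclasses import dataclass

# All values are illustrative assumptions, not values from the patent.
NOISE_DETECT_THRESHOLD = 200.0  # energy above this counts toward the noise timer
NOISE_HOLD_FRAMES = 3           # stand-in for the "predetermined time period"
DEFAULT_THRESHOLD = 500.0       # speech threshold used while no noise is flagged
MARGIN = 300.0                  # "predetermined margin" above the noise level


@dataclass
class PortState:
    frames_above: int = 0
    noise_level: float = 0.0
    noise_bit: bool = False
    threshold: float = DEFAULT_THRESHOLD
    speech_bit: bool = False


def update_port(state: PortState, energy: float) -> bool:
    """Update one port with the energy of its latest frame and return its speech bit."""
    # Compare against the threshold derived from previously received signals.
    state.speech_bit = energy > state.threshold
    # Track how long the energy has stayed above the noise detection threshold.
    if energy > NOISE_DETECT_THRESHOLD:
        if state.frames_above == 0:
            state.noise_level = energy
        else:
            state.noise_level = min(state.noise_level, energy)  # assumed noise estimate
        state.frames_above += 1
    else:
        state.frames_above = 0
        state.noise_level = 0.0
    state.noise_bit = state.frames_above >= NOISE_HOLD_FRAMES
    # With noise present, hold the threshold a fixed margin above the noise level.
    state.threshold = state.noise_level + MARGIN if state.noise_bit else DEFAULT_THRESHOLD
    return state.speech_bit


def conference_sum(samples, states):
    """Sum only the ports whose speech bit is set; other ports are omitted."""
    return sum(s for s, st in zip(samples, states) if st.speech_bit)
```

With these assumed constants, a port whose frame energies are 400, 400, 400, 600, 900 has its noise bit set on the third frame and its threshold raised to 700, so the 600 frame is not treated as speech while the 900 frame is.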
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US28744101P | 2001-04-30 | 2001-04-30 | |
US60/287,441 | 2001-04-30 | ||
PCT/US2002/013438 WO2002089458A1 (en) | 2001-04-30 | 2002-04-30 | Audio conference platform with dynamic speech detection threshold |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2446085A1 (en) | 2002-11-07 |
CA2446085C (en) | 2010-04-27 |
Family
ID=23102925
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA2446085A Expired - Lifetime CA2446085C (en) | 2001-04-30 | 2002-04-30 | Audio conference platform with dynamic speech detection threshold |
Country Status (4)
Country | Link |
---|---|
US (3) | US6721411B2 (en) |
EP (1) | EP1391106B1 (en) |
CA (1) | CA2446085C (en) |
WO (1) | WO2002089458A1 (en) |
Families Citing this family (90)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1226578A4 (en) * | 1999-12-31 | 2005-09-21 | Octiv Inc | Techniques for improving audio clarity and intelligibility at reduced bit rates over a digital network |
US20020075965A1 (en) * | 2000-12-20 | 2002-06-20 | Octiv, Inc. | Digital signal processing techniques for improving audio clarity and intelligibility |
US20030023429A1 (en) * | 2000-12-20 | 2003-01-30 | Octiv, Inc. | Digital signal processing techniques for improving audio clarity and intelligibility |
US7236929B2 (en) * | 2001-05-09 | 2007-06-26 | Plantronics, Inc. | Echo suppression and speech detection techniques for telephony applications |
US6888925B2 (en) * | 2001-10-26 | 2005-05-03 | Snowshore Networks, Inc. | Method for testing large-scale audio conference servers |
US7433462B2 (en) * | 2002-10-31 | 2008-10-07 | Plantronics, Inc | Techniques for improving telephone audio quality |
US7031752B1 (en) * | 2003-10-24 | 2006-04-18 | Excel Switching Corporation | Media resource card with programmable caching for converged services platform |
US20050285935A1 (en) * | 2004-06-29 | 2005-12-29 | Octiv, Inc. | Personal conferencing node |
US20050286443A1 (en) * | 2004-06-29 | 2005-12-29 | Octiv, Inc. | Conferencing system |
US8116500B2 (en) * | 2004-10-15 | 2012-02-14 | Lifesize Communications, Inc. | Microphone orientation and size in a speakerphone |
US7720232B2 (en) * | 2004-10-15 | 2010-05-18 | Lifesize Communications, Inc. | Speakerphone |
US7826624B2 (en) * | 2004-10-15 | 2010-11-02 | Lifesize Communications, Inc. | Speakerphone self calibration and beam forming |
US7760887B2 (en) * | 2004-10-15 | 2010-07-20 | Lifesize Communications, Inc. | Updating modeling information based on online data gathering |
US7720236B2 (en) * | 2004-10-15 | 2010-05-18 | Lifesize Communications, Inc. | Updating modeling information based on offline calibration experiments |
US7903137B2 (en) * | 2004-10-15 | 2011-03-08 | Lifesize Communications, Inc. | Videoconferencing echo cancellers |
US7970151B2 (en) * | 2004-10-15 | 2011-06-28 | Lifesize Communications, Inc. | Hybrid beamforming |
US20060132595A1 (en) * | 2004-10-15 | 2006-06-22 | Kenoyer Michael L | Speakerphone supporting video and audio features |
US7599357B1 (en) * | 2004-12-14 | 2009-10-06 | At&T Corp. | Method and apparatus for detecting and correcting electrical interference in a conference call |
US20060221869A1 (en) * | 2005-03-29 | 2006-10-05 | Teck-Kuen Chua | System and method for audio multicast |
US7822192B2 (en) * | 2005-03-30 | 2010-10-26 | Applied Voice And Speech Technologies, Inc. | Sound event processing with echo analysis |
US7970150B2 (en) * | 2005-04-29 | 2011-06-28 | Lifesize Communications, Inc. | Tracking talkers using virtual broadside scan and directed beams |
US7593539B2 (en) | 2005-04-29 | 2009-09-22 | Lifesize Communications, Inc. | Microphone and speaker arrangement in speakerphone |
US7991167B2 (en) * | 2005-04-29 | 2011-08-02 | Lifesize Communications, Inc. | Forming beams with nulls directed at noise sources |
US7606856B2 (en) * | 2005-11-09 | 2009-10-20 | Scenera Technologies, Llc | Methods, systems, and computer program products for presenting topical information referenced during a communication |
US8588220B2 (en) * | 2005-12-30 | 2013-11-19 | L-3 Communications Corporation | Method and apparatus for mitigating port swapping during signal tracking |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US7957512B2 (en) | 2006-10-27 | 2011-06-07 | Nortel Networks Limited | Source selection for conference bridges |
US8660039B2 (en) * | 2007-01-08 | 2014-02-25 | Intracom Systems, Llc | Multi-channel multi-access voice over IP intercommunication systems and methods |
US8126129B1 (en) | 2007-02-01 | 2012-02-28 | Sprint Spectrum L.P. | Adaptive audio conferencing based on participant location |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
CN101383876A (en) * | 2007-09-07 | 2009-03-11 | 华为技术有限公司 | Method, media server acquiring current active speaker in conference |
US8676904B2 (en) | 2008-10-02 | 2014-03-18 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US9191234B2 (en) * | 2009-04-09 | 2015-11-17 | Rpx Clearinghouse Llc | Enhanced communication bridge |
US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US8355490B2 (en) * | 2009-12-08 | 2013-01-15 | At&T Intellectual Property I, Lp | Computer program product and method to detect noisy connections on a telephonic conference bridge |
US8548146B2 (en) | 2010-05-13 | 2013-10-01 | At&T Intellectual Property, I, L.P. | Method and system to manage connections on a conference bridge |
US10417037B2 (en) | 2012-05-15 | 2019-09-17 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
EP2901668B1 (en) | 2012-09-27 | 2018-11-14 | Dolby Laboratories Licensing Corporation | Method for improving perceptual continuity in a spatial teleconferencing system |
CN113470640B (en) | 2013-02-07 | 2022-04-26 | 苹果公司 | Voice trigger of digital assistant |
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
US10748529B1 (en) | 2013-03-15 | 2020-08-18 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
TWI566107B (en) | 2014-05-30 | 2017-01-11 | 蘋果公司 | Method for processing a multi-part voice command, non-transitory computer readable storage medium and electronic device |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US10200824B2 (en) | 2015-05-27 | 2019-02-05 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on a touch-sensitive device |
WO2016205296A1 (en) | 2015-06-16 | 2016-12-22 | Dolby Laboratories Licensing Corporation | Post-teleconference playback using non-destructive audio transport |
US20160378747A1 (en) | 2015-06-29 | 2016-12-29 | Apple Inc. | Virtual assistant for media playback |
US10224038B2 (en) * | 2015-07-14 | 2019-03-05 | International Business Machines Corporation | Off-device fact-checking of statements made in a call |
US10740384B2 (en) | 2015-09-08 | 2020-08-11 | Apple Inc. | Intelligent automated assistant for media search and playback |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10331312B2 (en) | 2015-09-08 | 2019-06-25 | Apple Inc. | Intelligent automated assistant in a media environment |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10956666B2 (en) | 2015-11-09 | 2021-03-23 | Apple Inc. | Unconventional virtual assistant interactions |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10586535B2 (en) | 2016-06-10 | 2020-03-10 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
DK201670540A1 (en) | 2016-06-11 | 2018-01-08 | Apple Inc | Application integration with a digital assistant |
DK179415B1 (en) | 2016-06-11 | 2018-06-14 | Apple Inc | Intelligent device arbitration and control |
DK180048B1 (en) | 2017-05-11 | 2020-02-04 | Apple Inc. | Maintaining the data protection of personal information |
US10726832B2 (en) | 2017-05-11 | 2020-07-28 | Apple Inc. | Maintaining privacy of personal information |
DK179745B1 (en) | 2017-05-12 | 2019-05-01 | Apple Inc. | Synchronization and task delegation of a digital assistant |
DK201770429A1 (en) | 2017-05-12 | 2018-12-14 | Apple Inc. | Low-latency intelligent automated assistant |
DK179496B1 (en) | 2017-05-12 | 2019-01-15 | Apple Inc. | User-specific acoustic models |
US20180336892A1 (en) | 2017-05-16 | 2018-11-22 | Apple Inc. | Detecting a trigger of a digital assistant |
US20180336275A1 (en) | 2017-05-16 | 2018-11-22 | Apple Inc. | Intelligent automated assistant for media exploration |
EP3506563A1 (en) * | 2017-12-29 | 2019-07-03 | Unify Patente GmbH & Co. KG | Method, system, and server for reducing noise in a workspace |
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US11145294B2 (en) | 2018-05-07 | 2021-10-12 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
DK180639B1 (en) | 2018-06-01 | 2021-11-04 | Apple Inc | Disability of attention-attentive virtual assistant |
DK179822B1 (en) | 2018-06-01 | 2019-07-12 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10892996B2 (en) | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
US10924894B2 (en) * | 2018-09-20 | 2021-02-16 | Avaya Inc. | System and method for sending and receiving non-visual messages in an electronic audio communication session |
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
DK201970509A1 (en) | 2019-05-06 | 2021-01-15 | Apple Inc | Spoken notifications |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
DK180129B1 (en) | 2019-05-31 | 2020-06-02 | Apple Inc. | User activity shortcut suggestions |
DK201970511A1 (en) | 2019-05-31 | 2021-02-15 | Apple Inc | Voice identification in digital assistant systems |
US11468890B2 (en) | 2019-06-01 | 2022-10-11 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11061543B1 (en) | 2020-05-11 | 2021-07-13 | Apple Inc. | Providing relevant data items based on context |
US11183193B1 (en) | 2020-05-11 | 2021-11-23 | Apple Inc. | Digital assistant hardware abstraction |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11490204B2 (en) | 2020-07-20 | 2022-11-01 | Apple Inc. | Multi-device audio adjustment coordination |
US11438683B2 (en) | 2020-07-21 | 2022-09-06 | Apple Inc. | User identification using headphones |
US11146602B1 (en) * | 2020-12-04 | 2021-10-12 | Plantronics, Inc. | User status detection and interface |
Family Cites Families (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2162719B (en) * | 1984-08-01 | 1988-01-06 | Stc Plc | Telephone exchange conferencing |
US5548638A (en) * | 1992-12-21 | 1996-08-20 | Iwatsu Electric Co., Ltd. | Audio teleconferencing apparatus |
JPH08506460A (en) * | 1993-02-01 | 1996-07-09 | マルティリンク インコーポレイテッド | Method and apparatus for audio conferencing connection of multiple telephone channels |
US5841763A (en) * | 1995-06-13 | 1998-11-24 | Multilink, Inc. | Audio-video conferencing system |
US5768263A (en) * | 1995-10-20 | 1998-06-16 | Vtel Corporation | Method for talk/listen determination and multipoint conferencing system using such method |
US5889851A (en) * | 1996-02-26 | 1999-03-30 | Lucent Technologies Inc. | DTMF signal detection/removal using adaptive filters |
EP0867856B1 (en) * | 1997-03-25 | 2005-10-26 | Koninklijke Philips Electronics N.V. | Method and apparatus for vocal activity detection |
US5983183A (en) * | 1997-07-07 | 1999-11-09 | General Data Comm, Inc. | Audio automatic gain control system |
US6718302B1 (en) * | 1997-10-20 | 2004-04-06 | Sony Corporation | Method for utilizing validity constraints in a speech endpoint detector |
US6480823B1 (en) * | 1998-03-24 | 2002-11-12 | Matsushita Electric Industrial Co., Ltd. | Speech detection for noisy conditions |
US6259691B1 (en) * | 1998-07-24 | 2001-07-10 | 3Com Corporation | System and method for efficiently transporting dual-tone multi-frequency/multiple frequency (DTMF/MF) tones in a telephone connection on a network-based telephone system |
US6556670B1 (en) * | 1998-08-21 | 2003-04-29 | Lucent Technologies Inc. | Method for solving the music-on-hold problem in an audio conference |
US6711540B1 (en) * | 1998-09-25 | 2004-03-23 | Legerity, Inc. | Tone detector with noise detection and dynamic thresholding for robust performance |
CA2358203A1 (en) * | 1999-01-07 | 2000-07-13 | Tellabs Operations, Inc. | Method and apparatus for adaptively suppressing noise |
CA2364898C (en) * | 1999-03-22 | 2004-09-14 | Octave Communications, Inc. | Audio conferencing system streaming summed conference signal onto the internet |
WO2001039175A1 (en) * | 1999-11-24 | 2001-05-31 | Fujitsu Limited | Method and apparatus for voice detection |
US20020116186A1 (en) * | 2000-09-09 | 2002-08-22 | Adam Strauss | Voice activity detector for integrated telecommunications processing |
US20020136382A1 (en) * | 2001-03-22 | 2002-09-26 | Alon Cohen | System and method for providing simplified conferencing |
US7428223B2 (en) * | 2001-09-26 | 2008-09-23 | Siemens Corporation | Method for background noise reduction and performance improvement in voice conferencing over packetized networks |
US7433462B2 (en) * | 2002-10-31 | 2008-10-07 | Plantronics, Inc | Techniques for improving telephone audio quality |
- 2002-04-30: US application US10/135,323, published as US6721411B2 (not active; Expired - Lifetime)
- 2002-04-30: CA application CA2446085A, published as CA2446085C (not active; Expired - Lifetime)
- 2002-04-30: WO application PCT/US2002/013438, published as WO2002089458A1 (not active; Application Discontinuation)
- 2002-04-30: EP application EP02725844.1A, published as EP1391106B1 (not active; Expired - Lifetime)
- 2004-03-16: US application US10/801,276, published as US8111820B2 (Active)
- 2012-01-30: US application US13/361,395, published as US8611520B2 (not active; Expired - Lifetime)
Also Published As
Publication number | Publication date |
---|---|
US8611520B2 (en) | 2013-12-17 |
US20130028404A1 (en) | 2013-01-31 |
US20040174973A1 (en) | 2004-09-09 |
WO2002089458A1 (en) | 2002-11-07 |
EP1391106A1 (en) | 2004-02-25 |
EP1391106A4 (en) | 2005-04-06 |
EP1391106B1 (en) | 2014-02-26 |
US6721411B2 (en) | 2004-04-13 |
CA2446085A1 (en) | 2002-11-07 |
US20020172342A1 (en) | 2002-11-21 |
US8111820B2 (en) | 2012-02-07 |
WO2002089458A8 (en) | 2003-03-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA2446085C (en) | | Audio conference platform with dynamic speech detection threshold |
EP1163785B1 (en) | | Audio conference platform with centralized summing |
US6985571B2 (en) | | Audio conferencing method |
US6850609B1 (en) | | Methods and apparatus for providing speech recording and speech transcription services |
US7206404B2 (en) | | Communications system and method utilizing centralized signal processing |
US7532713B2 (en) | | System and method for voice over internet protocol audio conferencing |
US20030231746A1 (en) | | Teleconference speaker identification |
CA1282481C (en) | | Communication system dynamic conferencer circuit |
US20030198328A1 (en) | | Voice activity identification for speaker tracking in a packed based conferencing system with distributed processing |
US6404872B1 (en) | | Method and apparatus for altering a speech signal during a telephone call |
MY117901A (en) | | Method and system for processing telephone calls involving two digital wireless subscriber units that avoids double vocoding. |
US8184790B2 (en) | | Notification of dropped audio in a teleconference call |
US6353662B1 (en) | | Method and apparatus for eliminating nuisance signals in multi-party calls |
CA2364898C (en) | | Audio conferencing system streaming summed conference signal onto the internet |
US7177286B2 (en) | | System and method for processing digital audio packets for telephone conferencing |
KR100528742B1 (en) | | Multi-party Conference Call Device of Exchange System |
US20010011945A1 (en) | | Dynamic virtual paging method and apparatus |
US20050058275A1 (en) | | Audio source identification |
ITMI940783A1 (en) | | TELEPHONE ANNOUNCEMENT WITHOUT VOICE INTERPOSITION |
JPH04291873A (en) | | Telephone conference system |
JPH10304076A (en) | | Simultaneous command service system for private branch exchange |
JPS63257364A (en) | | Conference communication control system |
JP2000151806A (en) | | Method and device for connection of subscriber telephone line |
JPH066470A (en) | | Private branch exchange telephone system |
JPH0698027A (en) | | Speaking system for plural points distributed connecting conference telephone service |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | EEER | Examination request | |
| | MKEX | Expiry | Effective date: 20220502 |