US7716048B2 - Method and apparatus for segmentation of audio interactions - Google Patents

Method and apparatus for segmentation of audio interactions

Info

Publication number
US7716048B2
US7716048B2 (Application US10/567,810)
Authority
US
United States
Prior art keywords
interaction
audio interaction
segment
segmentation
speaker
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US10/567,810
Other versions
US20080181417A1 (en)
Inventor
Oren Pereg
Moshe Waserblat
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nice Systems Ltd
Original Assignee
Nice Systems Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nice Systems Ltd filed Critical Nice Systems Ltd
Assigned to NICE SYSTEMS LTD. reassignment NICE SYSTEMS LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: PEREG, OREN, WASERBLAT, MOSHE
Publication of US20080181417A1 publication Critical patent/US20080181417A1/en
Application granted granted Critical
Publication of US7716048B2 publication Critical patent/US7716048B2/en
Assigned to JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT reassignment JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT PATENT SECURITY AGREEMENT Assignors: AC2 SOLUTIONS, INC., ACTIMIZE LIMITED, INCONTACT, INC., NEXIDIA, INC., NICE LTD., NICE SYSTEMS INC., NICE SYSTEMS TECHNOLOGIES, INC.
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04H BROADCAST COMMUNICATION
    • H04H60/00 Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H60/56 Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups H04H60/29-H04H60/54
    • H04H60/58 Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups H04H60/29-H04H60/54 of audio

Definitions

  • the present invention relates to audio analysis in general and to a method and apparatus for segmenting an audio interaction, in particular.
  • Audio analysis refers to the extraction of information and meaning from audio signals for purposes such as word statistics, trend analysis, quality assurance, and the like. Audio analysis could be performed in audio interaction-extensive working environments, such as for example call centers, financial institutions, health organizations, public safety organizations or the like. Typically, audio analysis is used in order to extract useful information associated with or embedded within captured or recorded audio signals carrying interactions. Audio interactions contain valuable information that can provide enterprises with insights into their business, users, customers, activities and the like. The extracted information can be used for issuing alerts, generating reports, sending feedback or otherwise using the extracted information. The information can be usefully manipulated and processed, such as being stored, retrieved, synthesized, combined with additional sources of information, and the like.
  • Extracted information can include for example, continuous speech, spotted words, identified speaker, extracted emotional (positive or negative) segments within an interaction, data related to the call flow such as number of bursts in from each side, segments of mutual silence, or the like.
  • the customer side of an interaction recorded in a commercial organization can be used for various purposes such as trend analysis, competitor analysis, emotion detection (finding emotional calls) to improve customer satisfaction level, and the like.
  • the service provider side of such interactions can be used for purposes such as script adherence, emotion detection (finding emotional calls) to track deficient agent behavior, and the like.
  • the most common interaction recording format is summed audio, which is the product of analog line recording, observation mode and legacy systems.
  • a summed interaction may include, in addition to two or more speakers that at times may talk simultaneously (co-speakers), also music, tones, background noises on either side of the interaction, or the like.
  • the audio analysis performance as measured in terms of accuracy, detection, real-time efficiency and resource efficiency, depends directly on the quality and integrity of the captured and/or recorded signals carrying the audio interaction, on the availability and integrity of additional meta-information, on the capabilities of the computer programs that constitute the audio analysis process and on the available computing resources. Many of the analysis tasks are highly sensitive to the audio quality of the processed interactions.
  • unsupervised speaker segmentation algorithms are based on bootstrap (bottom up) classification methods, starting with short discriminative segments and extending such segments using additional, not necessarily adjacent segments.
  • a homogenous speaker segment is located, and regarded as an anchor.
  • the anchored segment is used for initially creating a model of the first speaker.
  • a second homogenous speaker segment is located, in which the speaker characteristics are most different from the first segment.
  • the second segment is used for creating a model of the second speaker.
  • By deploying an iterative maximum-likelihood (ML) classifier based on the anchored speaker models, all other utterance segments could be roughly classified.
  • the conventional methods suffer from a few limitations: the performance of the speaker segmentation algorithm is highly sensitive to the initial phase, i.e., poor choice of the initial segment (anchored segment) can lead to unreliable segmentation results. Additionally, the methods do not provide a verification mechanism for assessing the success of the segmentation, nor the convergence of the methods, in order to eliminate poorly segmented interactions from being further processed by audio analysis tools and providing further inaccurate results. Another drawback is that additional sources of information, such as computer-telephony-integration (CTI) data, screen events and the like are not used. Yet another drawback is the inability of the method to tell which collection of segments belongs to one speaking side, such as the customer, and which belongs to the other speaking side, since different analyses are performed on both sides, to serve different needs.
  • the segmentation tool has to be effective, i.e., extract segments of the interaction in which a single speaker is speaking that are as long and as numerous as possible, with as little compromise as possible on the reliability, i.e., the quality of the segments. Additionally, the tool should be fast and efficient, so as not to introduce delays to further processing, or place additional burden on the computing resources of the organization. It is also required that the tool provide a performance estimation which can be used in deciding whether the speech segments are to be sent for analysis or not.
  • a speaker segmentation method for associating one or more segments for each of two or more sides of one or more audio interactions, with one of the sides of the interaction using additional information, the method comprising: a segmentation step for associating the one or more segments with one side of the interaction, and a scoring step for assigning a score to said segmentation.
  • the additional information can be one or more of the group consisting of: computer-telephony-integration information related to the at least one interaction; spotted words within the at least one interaction; data related to the at least one interaction; data related to a speaker thereof; external data related to the at least one interaction; or data related to at least one other interaction performed by a speaker of the at least one interaction.
  • the method can further comprise a model association step for scoring the segments against one or more statistical models of one side, and obtaining a model association score.
  • the scoring step can use discriminative information for discriminating the two or more sides of the interaction.
  • the scoring step can comprise a model association step for scoring the segments against a statistical model of one side, and obtaining a model association score.
  • the scoring step can further comprise a normalization step for normalizing the one or more model scores.
  • the scoring step can also comprise evaluating the association of the one or more segments with a side of the interaction, using additional information.
  • the additional information can be one or more of the group consisting of: computer-telephony-integration information related to the at least one interaction; spotted words within the at least one interaction; data related to the at least one interaction; data related to a speaker thereof; external data related to the at least one interaction; or data related to at least one other interaction performed by a speaker of the at least one interaction.
  • the scoring step can comprise statistical scoring.
  • the method can further comprise: a step of comparing the score to a threshold; and repeating the segmentation step and the scoring step if the score is below the threshold.
  • the threshold can be predetermined, or dynamic, or depend on: information associated with said at least one interaction, information associated with an at least one speaker thereof, or external information associated with the interaction.
  • the segmentation step can comprise a parameterization step to transform the speech signal to a set of feature vectors in order to generate data more suitable for statistical modeling; an anchoring step for locating an anchor segment for each side of the interaction; and a modeling and classification step for associating at least one segment with one side of the interaction.
  • the anchoring step or the modeling and classification step can comprise using additional data, wherein the additional data is one or more of the group consisting of: computer-telephony-integration information related to the at least one interaction; spotted words within the at least one interaction; data related to the at least one interaction; data related to a speaker thereof; external data related to the at least one interaction; or data related to at least one other interaction performed by a speaker of the at least one interaction.
  • the method can comprise a preprocessing step for enhancing the quality of the interaction, or a speech/non-speech segmentation step for eliminating non-speech segments from the interaction.
  • the segmentation step can comprise scoring the one or more segments with a voice model of a known speaker.
  • a speaker segmentation apparatus for associating one or more segments for each of two or more speakers participating in one or more audio interactions, with a side of the interaction, using additional information
  • the apparatus comprising: a segmentation component for associating one or more segments within the interaction with one side of the interaction; and a scoring component for assigning a score to said segmentation.
  • the additional information can be of the group consisting of: computer-telephony-integration information related to the at least one interaction; spotted words within the at least one interaction; data related to the at least one interaction; data related to a speaker thereof; external data related to the interaction; or data related to one or more other interactions performed by a speaker of the interaction.
  • Yet another aspect of the disclosed invention relates to a quality management apparatus for interaction-rich environments, the apparatus comprising: a capturing or logging component for capturing or logging one or more audio interactions; a segmentation component for segmenting the interactions; and a playback component for playing one or more parts of the one or more audio interactions.
  • FIG. 1 is a schematic block diagram of a typical environment in which the disclosed invention is used, in accordance with a preferred embodiment of the present invention
  • FIG. 2 is a schematic flowchart of the disclosed segmentation method, in accordance with a preferred embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of the scoring process, in accordance with a preferred embodiment of the present invention.
  • the present invention overcomes the disadvantages of the prior art by providing a novel method and a system for locating segments within an audio interaction in which a single speaker is speaking, dividing the segments into two or more groups, wherein the speaker in each segment group is the same one, and discriminating in which group of segments a certain participant, or a certain type of participant, such as a service representative (agent) of an organization, is speaking, and in which group another participant or participant type, such as a customer, is speaking.
  • the disclosed invention utilizes additional types of data collected in interaction-intensive environments, such as call centers, financial institutions or the like, in addition to captured or recorded audio interactions in order to enhance the segmentation and the association of a group of segments with a specific speaker or speaker type, such as an agent, a customer or the like.
  • the discussion below is oriented more to applications involving commerce or service, but the method is applicable to any required domain, including public safety, financial organizations such as trade floors, health organizations and others.
  • the information includes raw information, such as meta data, as well as information extracted by processing the interactions.
  • Raw information includes, for example, Computer Telephony Integration (CTI) information which includes hold periods, number called, number called from, DNIS, VDN, ANI or the like, agent details, screen events related to the current or other interactions with the customer, information exchanged between the parties, and other relevant information that can be retrieved from external sources such as CRM data, billing information, workflow management, mail messages and the like.
  • the extracted information can include, for example certain words spotted within the interaction, such as greetings, compliance phrases or the like, continuous speech recognition, emotion detected within an interaction, and call flow information, such as bursts of one speaker when the other speaker is talking, mutual silence periods and others.
  • Other data used include for example voice models of a single or multiple speakers.
  • the collected data is used in the process of segmenting the audio interaction in a number of ways.
  • the information can be used to obtain an accurate anchor point for the initial selection of a segment of a single speaker.
  • a segment in which a compliance phrase was spotted can be a good anchor point for one speaker, specifically the agent.
  • a highly emotional segment can be used as an anchor for the customer side.
  • Such information can be used during the classification of segments into speakers, and also for a posteriori assessment of the performance of the segmentation.
  • the absence or presence, and certainty level of specific events within the segments of a certain speaker can contribute to the discrimination of the agent side from the customer side, and also for assessing the performance of the segmentation.
  • the presence of compliance sentences and typical customer-side noises (such as a barking dog) in segments of allegedly the same speaker can suggest a deficient segmentation.
  • the discrimination of the speakers can be enhanced by utilizing agent-customer-discriminating information, such as screen events, emotion levels, and voice models of a specific agent, a specific customer, a group of agents, a universal agent model or a universal customer model. If segments attributed to one side have a high probability of complying with a specific agent's characteristics or with a universal agent model, relating the segments to the agent side will have a higher score, and vice versa. Thus, the segmentation can be assessed, and according to the assessment result accepted, rejected, or repeated.
  • FIG. 1 presents a block diagram of the main components in a typical environment in which the disclosed invention is used.
  • the environment is an interaction-rich organization, typically a call center, a bank, a trading floor, another financial institute, a public safety contact center, or the like.
  • Customers, users or other contacts are contacting the center, thus generating input information of various types.
  • the information types include vocal interactions, non-vocal interactions and additional data.
  • the capturing of voice interactions can employ many forms and technologies, including trunk side, extension side, summed audio, separate audio, various encoding and decoding protocols such as G729, G726, G723.1, and the like.
  • the vocal interactions usually include telephone 12 , which is currently the main channel for communicating with users in many organizations.
  • a typical environment can further comprise voice over IP channels 16 , which possibly pass through a voice over IP server (not shown).
  • the interactions can further include face-to-face interactions, such as those recorded in a walk-in-center 20 , and additional sources of vocal data 24 , such as microphone, intercom, the audio of video capturing, vocal input by external systems or any other source.
  • the environment comprises additional non-vocal data of various types 28 .
  • For example, Computer Telephony Integration (CTI) equipment used in capturing the telephone calls can track and provide data such as number and length of hold periods, transfer events, number called, number called from, DNIS, VDN, ANI, or the like.
  • Additional data can arrive from external sources such as billing, CRM, or screen events, including text entered by a call representative, documents and the like.
  • the data can include links to additional interactions in which one of the speakers in the current interaction participated.
  • Another type of data includes data extracted from vocal interactions, such as spotted words, emotion level, speech-to-text or the like. Data from all the above-mentioned sources and others is captured and preferably logged by capturing/logging unit 32 .
  • the captured data is stored in storage 34, comprising one or more of a magnetic tape, a magnetic disc, an optical disc, a laser disc, a mass-storage device, or the like.
  • the storage can be common or separate for different types of captured interactions and different types of additional data.
  • the storage can be remote from the site of capturing and can serve one or more sites of a multi-site organization such as a bank.
  • Capturing/logging unit 32 comprises a computing platform running one or more computer applications as is detailed below. From capturing/logging unit 32 , the vocal data and preferably the additional relevant data is transferred to segmentation component 36 which executes the actual segmentation of the audio interaction. Segmentation component 36 transfers the output segmentation to scoring component 38 , which assigns a score to the segmentation.
  • the threshold can be predetermined, or it can be set dynamically, taking into account the interaction type, one or more of the speakers if known, additional data such as Computer-Telephony-Integration (CTI) data, CRM, or billing data, data associated with any of the speakers, screen events or the like.
  • CTI Computer-Telephony-Integration
  • CRM Computer-Telephony-Integration
  • billing data data associated with any of the speakers, screen events or the like.
  • the system can assign a higher threshold to an interaction of a VIP customer than to an interaction of an ordinary customer, or a higher threshold for interactions involving opening an account or the like.
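As a rough illustration of how such a dynamic threshold could be derived from interaction metadata, the sketch below raises the required segmentation score for more sensitive calls. The field names, base value and adjustments are illustrative assumptions, not values taken from the patent.

```python
def segmentation_score_threshold(interaction_meta: dict) -> float:
    """Pick a segmentation-acceptance threshold from interaction metadata.

    Minimal sketch: the base value and adjustments are placeholders that a
    real deployment would tune per environment.
    """
    threshold = 0.60                                  # base acceptance score
    if interaction_meta.get("customer_tier") == "VIP":
        threshold += 0.15                             # stricter for VIP callers
    if interaction_meta.get("call_type") == "account_opening":
        threshold += 0.10                             # stricter for sensitive transactions
    if interaction_meta.get("duration_sec", 0) < 30:
        threshold -= 0.05                             # very short calls rarely segment cleanly
    return min(threshold, 0.95)

# Example: a VIP customer opening an account gets a threshold of 0.85
print(segmentation_score_threshold({"customer_tier": "VIP",
                                    "call_type": "account_opening",
                                    "duration_sec": 240}))
```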
  • the segmented audio can assume the form of separate audio streams or files for each side, the form of the original stream or file accompanied by indexing information denoting the beginning and end of each segment in which a certain side of the interaction is speaking, or any other form.
  • the segmented audio is preferably transferred to further engines 40, such as a speech-to-text engine, emotion detection, speaker recognition, or other voice processing engines.
  • the segmentation information or the segmented voice is transferred for storage purposes 44 .
  • the information can be transferred to any other purpose or component 48 , such as, but not limited to a playback component for playing the captured or segmented audio interactions.
  • All components of the system including capturing/logging components 32 and segmentation component 36 , preferably comprise one or more computing platforms, such as a personal computer, a mainframe computer, or any other type of computing platform that is provisioned with a memory device (not shown), a CPU or microprocessor device, and several I/O ports (not shown).
  • each component can be a DSP chip, an ASIC device storing the commands and data necessary to execute the methods of the present invention, or the like.
  • Each component can further include a storage device (not shown), storing the relevant applications and data required for processing.
  • Each application running on each computing platform, such as the capturing applications or the segmentation application is a set of logically inter-related computer programs or modules and associated data structures that interact to perform one or more specific tasks.
  • All applications can be co-located and run on the same one or more computing platform, or on different platforms.
  • the information sources and capturing platforms can be located on each site of a multi-site organization, and one or more segmentation components can be remotely located, segment interactions captured at one or more sites and store the segmentation results in a local, central, distributed or any other storage.
  • FIG. 2 showing a flowchart of the main steps in the proposed speaker segmentation method.
  • Summed audio as well as additional data such as CTI data, screen events, spotted words, data from external sources such as CRM, billing, or the like are introduced at step 104 to the system.
  • the summed audio can use any format and any compression method acceptable by the system, such as PCM, WAV, MP3, G729, G726, G723.1, or the like.
  • the audio can be introduced in streams, files, or the like.
  • preprocessing is performed on the audio, in order to enhance the audio for further processing.
  • the preprocessing preferably includes decompression, according to the compression used in the specific interaction.
  • the preprocessing can include compression and decompression with one of the protocols used in the environment in order to adapt the audio to the characteristics common in the environment.
  • the preprocessing can further include low-quality segments removal or other processing that will enhance the quality of the audio.
  • Step 110 marks, removes or otherwise eliminates non-speech segments from the audio. Such segments can include music, tones, DTMF, silence, segments with significant background noise or other substantially non-speech segments.
  • Preprocessing step 108 and speech/non-speech segmentation step 110 are optional, and can be dispensed with. However, the performance in time, computing resources and the quality of the speaker segmentation will degrade if step 108 or step 110 are omitted.
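A minimal sketch of the kind of speech/non-speech gating performed at step 110, using only short-time energy and zero-crossing rate as cues; the frame length, percentile and ZCR bounds are assumptions, and a production system would add dedicated music, tone and DTMF detectors.

```python
import numpy as np

def speech_frame_mask(signal: np.ndarray, sample_rate: int,
                      frame_ms: float = 25.0,
                      energy_percentile: float = 30.0) -> np.ndarray:
    """Return a boolean mask marking frames that look like speech."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)

    log_energy = np.log10(np.mean(frames ** 2, axis=1) + 1e-10)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

    energy_floor = np.percentile(log_energy, energy_percentile)
    # Keep frames with enough energy whose ZCR is neither tone-like nor noise-like
    return (log_energy > energy_floor) & (zcr > 0.01) & (zcr < 0.5)
```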
  • Segmentation step 112 comprises a parameterization step 118 , an anchoring step 120 and a modeling and classification step 124 .
  • the speech is being parameterized by transforming the speech signal into a set of feature vectors. The purpose of this transformation is to obtain a new representation which is more compact, less redundant and more suitable for statistical modeling. Most of the speaker segmentation systems depend on cepstral representation of speech in addition to prosodic parameters such as pitch, pitch variance, energy level and the like.
  • the parameterization generates a sequence of feature vectors, wherein each vector relates to a certain time frame, preferably in the range of 10-30 ms, where the speech could be regarded as stationary.
  • the parameterization step is performed earlier as part of preprocessing step 108 or speech/non-speech segmentation step 110 .
  • the speech signal is divided into non-overlapping segments, typically, but not necessarily, 1-3 seconds long.
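The parameterization and segment division described above might look roughly like the following sketch, which computes cepstral features over short frames and groups them into non-overlapping 2-second segments. librosa is used for convenience; the sampling rate, frame and segment lengths, and the choice of 13 MFCCs are assumptions rather than values mandated by the patent, and the prosodic features mentioned in the text (pitch, energy) could be appended to each vector in the same way.

```python
import librosa

def parameterize(path: str, sr: int = 8000, frame_ms: float = 25.0,
                 hop_ms: float = 10.0, segment_sec: float = 2.0):
    """Return a list of feature matrices, one per non-overlapping segment.

    Each matrix holds one 13-dimensional MFCC vector per ~10 ms frame.
    """
    y, sr = librosa.load(path, sr=sr)
    hop = int(sr * hop_ms / 1000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=int(sr * frame_ms / 1000),
                                hop_length=hop)
    features = mfcc.T                                   # (n_frames, 13)
    frames_per_segment = int(segment_sec * 1000 / hop_ms)
    return [features[i:i + frames_per_segment]
            for i in range(0, len(features), frames_per_segment)]
```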
  • the speaker segmentation main process starts at step 120 , during which, anchor segments are located within the audio interaction.
  • the method searches for two segments to be used as anchor segments and each of the two segments should contain speech of a different speaker.
  • Each anchor segment will be used for initial voice modeling of the speaker it represents.
  • the first anchor segment finding is preferably performed by a statistical modeling of every segment in the interaction and then by locating the most homogenous segment in terms of statistical voice feature distribution.
  • Such a segment is more likely to be a segment in which a single speaker is speaking rather than an area of transition between two speakers.
  • This segment will be used for first speaker initial voice model building. Locating such first segment can also involve utilizing additional data, such as CTI events, for example the first speaker in a call center interaction is likely to be the agent addressing the customer. Alternatively, spotting with high certainty standard phrases which agents are instructed to use, such as “company X good morning, how can I help you”, can help identify an anchor segment for the agent side, and standard questions, such as “how much would it cost to”, can help in locating homogenous segments of a customer side.
  • the method constructs a statistical model of the voice features in that segment where the statistical model represents the voice characteristics of the first speaker.
  • the method searches for a second anchor segment, whose statistical model is as different as possible from the statistical model of the first anchor; the distance is measured and quantified by some statistical distance function, such as a likelihood ratio test.
  • the aim of the second anchor finding is to find an area in the interaction which is most likely produced by a different statistical source, i.e. a different speaker.
  • locating the segments of the agent can be done by searching for all segments which comply with the specific agent model, and continuing by associating all the rest of the speech segments with the customer (or agent) side.
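A minimal sketch of the anchoring idea under simplifying assumptions: each candidate segment is modeled by a single diagonal-covariance Gaussian, the first anchor is the segment best explained by its own model (the most homogeneous one), and the second anchor is the segment explained worst by the first anchor's model. The patent only requires some statistical model and distance measure, such as a likelihood ratio test; the single Gaussian and the average log-likelihood used here are stand-ins.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussian(feats: np.ndarray):
    """Diagonal-covariance Gaussian over one segment's feature vectors."""
    return multivariate_normal(mean=feats.mean(axis=0),
                               cov=np.diag(feats.var(axis=0) + 1e-6))

def find_anchors(segments):
    """segments: list of (n_frames, n_features) arrays; returns (first, second)."""
    models = [fit_gaussian(s) for s in segments]
    # First anchor: the segment best explained by its own model (most homogeneous)
    self_ll = [m.logpdf(s).mean() for m, s in zip(models, segments)]
    first = int(np.argmax(self_ll))
    # Second anchor: the segment least well explained by the first anchor's model,
    # i.e. the one most likely produced by a different speaker
    cross_ll = np.array([models[first].logpdf(s).mean() for s in segments])
    cross_ll[first] = np.inf
    second = int(np.argmin(cross_ll))
    return first, second
```

Additional data such as a spotted compliance phrase or a CTI event can simply override or constrain these choices, as the text suggests.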
  • Step 124 comprises an iterative process. On each iteration, a statistical model is constructed from the aggregated segments identified so far as belonging to each speaker. Then the distance between each segment in the interaction and the speakers' voice models is measured and quantified. The distance can be produced by likelihood calculation or the like. Next, one or more segments which are most likely to come from the same statistical distributions as the speakers' statistical models, i.e. produced by the same speaker, are added to the similar speaker's pool of segments from the previous iteration. On the next iteration, the statistical models are reconstructed, utilizing the newly added segments as well as the previous ones, and new segments to be added are searched for.
  • the iterations proceed until one or more stopping criteria are met, such as the distance between the model and the most similar segment exceeding a certain threshold, the length of the added segments being below a certain threshold or the like.
  • soft classification techniques can also be applied in determining the similarity between a segment and a statistical model or when calculating if a stop criterion is met.
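One way the iterative modeling and classification of step 124 could be realized is sketched below, with small Gaussian mixture models as the per-speaker voice models. sklearn is used for brevity; the number of mixture components, the one-segment-per-iteration greediness and the stopping threshold `min_avg_loglik` are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def iterative_classification(segments, first, second,
                             n_components: int = 4,
                             min_avg_loglik: float = -45.0):
    """Greedy assignment of segments to two speakers, starting from the anchors.

    Returns two lists of segment indices. The loop stops when no remaining
    segment is similar enough to either speaker model.
    """
    pools = {0: [first], 1: [second]}
    unassigned = set(range(len(segments))) - {first, second}
    while unassigned:
        models = {side: GaussianMixture(n_components, covariance_type="diag",
                                        reg_covar=1e-3)
                        .fit(np.vstack([segments[i] for i in pools[side]]))
                  for side in (0, 1)}
        # Candidate with the best average log-likelihood under either model
        score, side, idx = max((models[s].score(segments[i]), s, i)
                               for i in unassigned for s in (0, 1))
        if score < min_avg_loglik:        # stopping criterion: nothing close enough
            break
        pools[side].append(idx)
        unassigned.remove(idx)
    return pools[0], pools[1]
```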
  • scoring step 128 takes place. Scoring step 128 assigns a score to the segmentation result. If the score is below a predetermined threshold, the performance is unsatisfactory and the process repeats, restarting from step 120 , excluding the former first and second anchor segments or from step 118 using different voice features.
  • the threshold can be predetermined, or it can be set dynamically, taking into account the interaction type, other data related to the interaction, additional data such as CTI data, external data such as CRM or billing data, data associated with any of the speakers, screen events or the like.
  • the stopping condition for the segmentation can be defined in a predetermined manner, such as “try at most X times, and if the segmentation does not succeed, skip the interaction and segment another one”.
  • the stopping criteria can be defined dynamically, for example, "continue the segmentation as long as there are still segments for which no segment within X seconds of them has been used as an anchor segment". If the segmentation score exceeds the predetermined threshold, the results are output at step 144.
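Putting steps 112-144 together, the outer control flow (segment, score, retry with different anchors while the score is unsatisfactory) could look like the sketch below. `run_segmentation` and `score_segmentation` are hypothetical placeholders for the steps described above, and the simple "try at most N times" policy is only one of the stopping conditions the text mentions.

```python
def segment_with_retries(run_segmentation, score_segmentation,
                         threshold: float, max_attempts: int = 3):
    """Repeat segmentation with fresh anchors until the score is acceptable.

    run_segmentation(exclude) must return (segmentation, anchor_indices);
    score_segmentation(segmentation) must return a number comparable to
    `threshold`. Returns (segmentation, score), or (None, last_score) if no
    attempt reached the threshold.
    """
    excluded, score = set(), float("-inf")
    for _ in range(max_attempts):
        segmentation, anchors = run_segmentation(exclude=excluded)
        score = score_segmentation(segmentation)
        if score >= threshold:
            return segmentation, score       # accepted: results output at step 144
        excluded.update(anchors)             # retry from step 120 with new anchors
    return None, score                       # give up and skip this interaction
```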
  • the Scoring process is detailed in association with FIG. 3 below.
  • the results output at step 144 can take any required form.
  • One preferred form is a file or stream containing text, denoting the start and end locations of each segment, for example in terms of time units from the beginning of the interaction, and the associated speaker.
  • the output can also comprise start and end locations for segments of an unknown speaker, or for non-speech segments.
  • Another preferred form comprises two or more files wherein each file comprises the segments of one speaker. The non-speech or unknown speaker segments can be ignored or reside in a separate file for purposes such as playback.
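As an illustration of the first output form, an index over the original audio could be written as a small CSV file; the column layout and the 'unknown'/'non-speech' labels are assumptions, not a format defined by the patent.

```python
import csv

def write_segment_index(path: str, segments, labels) -> None:
    """segments: list of (start_sec, end_sec) pairs; labels: parallel list of
    strings such as 'agent', 'customer', 'unknown' or 'non-speech'."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["start_sec", "end_sec", "speaker"])
        for (start, end), who in zip(segments, labels):
            writer.writerow([f"{start:.2f}", f"{end:.2f}", who])

# A row such as "12.40,15.10,agent" marks one agent segment in the interaction.
```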
  • the scoring step comprises two main parts, assessing a statistical score and an agent-customer discrimination score.
  • the statistical score determined at step 204 is based on determining the distance between the model generated from the segments attributed to one side and the model generated from the segments attributed to the other side. If the distance between the models is above a predetermined threshold, then the segments attributed to one side are significantly different than the segments attributed to the other side, and the classification is considered successful. If the distance is below a predetermined threshold (not necessarily equal to the predetermined threshold mentioned above), the segments attributed to different speakers are not distinctive enough, and the classification is assumed to be unsuccessful.
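The statistical score of step 204 could, for example, be a symmetric cross-likelihood distance: each side's model should explain its own segments noticeably better than it explains the other side's. The specific distance below is an assumption; the patent only requires some distance between the two models compared against a threshold.

```python
def statistical_score(model_a, model_b, feats_a, feats_b) -> float:
    """Symmetric cross-likelihood distance between two speaker models.

    model_a / model_b: fitted models exposing score(X) as the average
    log-likelihood (e.g. sklearn GaussianMixture). feats_a / feats_b: stacked
    feature vectors of the segments attributed to each side. Larger values
    indicate better-separated speakers; a value below the threshold means the
    two segment groups are not distinctive enough.
    """
    d_a = model_a.score(feats_a) - model_b.score(feats_a)
    d_b = model_b.score(feats_b) - model_a.score(feats_b)
    return 0.5 * (d_a + d_b)
```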
  • Discriminative scoring step 208 uses discriminative information, such as discriminative customer-agent information in order to assess the success of the speaker segmentation process, and to determine or verify the association of each segment group with a specific speaker. Discriminative scoring step 208 is divided into model association step 212 and additional information scoring step 216 . Model association step 212 uses previously built or otherwise acquired universal models of agents and of customers.
  • the universal agent model is built from speech segments in which multiple agents of the relevant environment are speaking, using the same types of equipment used in the environment.
  • the universal customer model is built from multiple segments of customers using various types of equipment, including land lines, cellular lines, various handsets, various types of typical customer background noise and the like.
  • the model preferably incorporates both male and female customers if customers of both genders are likely to speak in real interactions, customers of relevant ages, accents and the like. If the speaker segmentation includes side (agent/customer) association, step 212 is used for verification of the association; otherwise step 212 is used for associating each segment group with a specific side.
  • In model association step 212, the speech segments attributed to each side are preferably scored against the universal agent model in step 220, and against the universal customer model in step 224, thus obtaining two model association scores.
  • the two model association scores are normalized in normalization step 228. If one segment group was assigned, for example, to an agent, and indeed the normalized score against the universal agent model yielded a significantly higher score than the scoring against the universal customer model, the association of the segment group to the agent side is reinforced. However, if the score of an agent-assumed segment group against a customer model is higher than the score against the general agent model, this might indicate a problem either in the segmentation or in the side association.
  • the scoring can be performed for the segments attributed to a certain side one or more at a time, or all of them together, using a combination of the feature vectors associated with the segments. If the segment group is not assigned to a specific side, a normalized score to one side which exceeds a certain threshold can be used in determining the side as well as the quality of the segmentation.
  • Model association step 212 can be performed solely in order to associate a segment group with a certain side, and not just to assess a segmentation quality, in which case it is not part of discriminative score 208 but rather an independent step.
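A sketch of model association step 212: the pooled segments of one side are scored against previously trained universal agent and customer models, the two scores are normalized against each other, and the margin is used both to label (or verify) the side and to flag dubious segmentations. The simple difference-of-average-log-likelihoods normalization and the margin value are assumptions.

```python
def associate_side(universal_agent, universal_customer, side_feats,
                   margin: float = 0.5):
    """side_feats: stacked feature vectors of one segment group.
    universal_agent / universal_customer: fitted models exposing score(X).

    Returns ('agent' | 'customer' | 'unclear', normalized margin).
    """
    agent_ll = universal_agent.score(side_feats)
    customer_ll = universal_customer.score(side_feats)
    norm_margin = agent_ll - customer_ll        # positive favours the agent model
    if norm_margin > margin:
        return "agent", norm_margin
    if norm_margin < -margin:
        return "customer", norm_margin
    return "unclear", norm_margin
```

If the group was already assumed to be, say, the agent side and the margin points the other way, the disagreement lowers the discriminative score, as described above.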
  • At step 232 the method further uses additional data evaluation, in order to evaluate the contribution of each segment attributed to a certain speaker.
  • Additional data can include spotted words that are typical to a certain side, such as “how can I help you” on the agent side, and “how much would that cost” for a customer side, CTI events, screen events, external or internal information or the like.
  • the presence, possibly associated with a certainty level, of such events in segments associated with a specific side is accumulated or otherwise combined into a single additional data score.
  • the scores of statistical scoring 204, model association 212 and additional data scoring 232 are combined at step 236, and a general score is issued. If the score is below a predetermined threshold, as is evaluated at step 140 of FIG. 2, the segmentation process restarts at step 120, excluding the former first and second anchor segments. Since none of scoring steps 204, 212, and 232 is mandatory, combining step 236 weights whatever scores are available. Each subset of the scoring results of scoring steps 204, 212 and 232 can be used to produce a general scoring result. Combining step 236 can be further designed to weight additional scores, such as user input or other scoring mechanisms currently known or that will become known at a later time. Combining step 236 can use dynamic or predetermined parameters and schemes to weight or otherwise combine the available scores.
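Combining step 236 can be as simple as a weighted average over whichever of the three scores are available, with the weights renormalized accordingly; the weight values and the assumption that all scores are already scaled to a common range are illustrative only.

```python
def combine_scores(scores: dict, weights=None) -> float:
    """scores: any subset of {'statistical', 'model_association',
    'additional_data'} mapped to values on a comparable scale.
    Missing scores are left out and the remaining weights are renormalized."""
    weights = weights or {"statistical": 0.5,
                          "model_association": 0.3,
                          "additional_data": 0.2}
    available = {k: v for k, v in scores.items() if k in weights}
    if not available:
        raise ValueError("no scores to combine")
    total = sum(weights[k] for k in available)
    return sum(weights[k] * v for k, v in available.items()) / total

# Example: only the statistical and additional-data scores are present
general_score = combine_scores({"statistical": 0.8, "additional_data": 0.4})
```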
  • the same data item should not be used in the scoring phase if it was already used during the segmentation phase.
  • Using the same data item in the two phases will bias the results and give a higher, unjustified score to a certain segmentation. For example, if the phrase "Company X good morning" was spotted at a certain location, and the segment it appeared in was used as an anchor for the agent side, considering this phrase during the additional data scoring step will raise the score in an artificial manner, since it is already known that the segment in which the phrase was said is associated with the agent side.
  • the presented methods and scorings can be partitioned in a different manner over the described steps without significant change in the results. It will also be appreciated by people skilled in the art that additional scoring methods can exist and be applied in addition, or instead of the presented scoring.
  • the scoring method can be applied to the results of any segmentation method, and not necessarily the one presented above. Also, different variations can be applied to the segmentation and the scoring methods as described, without significant change to the proposed solution. It will further be appreciated by people skilled in the art that the disclosed invention can be extended to segmenting an interaction between more than two speakers, without significant changes to the described method.
  • the described rules and parameters, such as the acceptable score values, stopping criteria for the segmentation and the like, can be predetermined or set dynamically. For example, the parameters can take into account the type or length of the interaction, the customer type as received from an external system, or the like.
  • the disclosed invention provides a novel approach to segmenting an audio interaction into segments, and associating each group of segments with one speaker.
  • the disclosed invention provides a scoring and control mechanism over the quality of the resulting segmentation.

Abstract

A method and apparatus for segmenting an audio interaction, by locating an anchor segment for each side of the interaction, iteratively classifying additional segments into one of the two sides, and scoring the resulting segmentation. If the score is below a threshold, the process is repeated until the segmentation score is satisfactory or until a stopping criterion is met. The anchoring and the scoring steps comprise using additional data associated with the interaction, a speaker thereof, internal or external information related to the interaction or to a speaker thereof, or the like.

Description

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to audio analysis in general and to a method and apparatus for segmenting an audio interaction, in particular.
2. Discussion of the Related Art
Audio analysis refers to the extraction of information and meaning from audio signals for purposes such as word statistics, trend analysis, quality assurance, and the like. Audio analysis could be performed in audio interaction-extensive working environments, such as for example call centers, financial institutions, health organizations, public safety organizations or the like. Typically, audio analysis is used in order to extract useful information associated with or embedded within captured or recorded audio signals carrying interactions. Audio interactions contain valuable information that can provide enterprises with insights into their business, users, customers, activities and the like. The extracted information can be used for issuing alerts, generating reports, sending feedback or otherwise using the extracted information. The information can be usefully manipulated and processed, such as being stored, retrieved, synthesized, combined with additional sources of information, and the like. Extracted information can include, for example, continuous speech, spotted words, identified speaker, extracted emotional (positive or negative) segments within an interaction, data related to the call flow such as the number of bursts in from each side, segments of mutual silence, or the like. The customer side of an interaction recorded in a commercial organization can be used for various purposes such as trend analysis, competitor analysis, emotion detection (finding emotional calls) to improve customer satisfaction level, and the like. The service provider side of such interactions can be used for purposes such as script adherence, emotion detection (finding emotional calls) to track deficient agent behavior, and the like. The most common interaction recording format is summed audio, which is the product of analog line recording, observation mode and legacy systems. A summed interaction may include, in addition to two or more speakers that at times may talk simultaneously (co-speakers), also music, tones, background noises on either side of the interaction, or the like. The audio analysis performance, as measured in terms of accuracy, detection, real-time efficiency and resource efficiency, depends directly on the quality and integrity of the captured and/or recorded signals carrying the audio interaction, on the availability and integrity of additional meta-information, on the capabilities of the computer programs that constitute the audio analysis process and on the available computing resources. Many of the analysis tasks are highly sensitive to the audio quality of the processed interactions. Multiple speakers, as well as music (which is often present on hold periods), tones, background noises such as street noise, ambient noise, convolutional noises such as channel type and handset type, keystrokes and the like, severely degrade the performance of these engines, sometimes to the degree of complete uselessness, for example in the case of emotion detection where it is mandatory to analyze only one speaker's speech segments. Therefore it is crucial to identify only the speech segments of an interaction wherein a single speaker is speaking. The customary solution is to use an unsupervised speaker segmentation module as part of the audio analysis.
Traditionally, unsupervised speaker segmentation algorithms are based on bootstrap (bottom up) classification methods, starting with short discriminative segments and extending such segments using additional, not necessarily adjacent segments. Initially, a homogenous speaker segment is located, and regarded as an anchor. The anchored segment is used for initially creating a model of the first speaker. In the next phase a second homogenous speaker segment is located, in which the speaker characteristics are most different from the first segment. The second segment is used for creating a model of the second speaker. By deploying an iterative maximum-likelihood (ML) classifier, based on the anchored speaker models, all other utterance segments could be roughly classified. The conventional methods suffer from a few limitations: the performance of the speaker segmentation algorithm is highly sensitive to the initial phase, i.e., poor choice of the initial segment (anchored segment) can lead to unreliable segmentation results. Additionally, the methods do not provide a verification mechanism for assessing the success of the segmentation, nor the convergence of the methods, in order to eliminate poorly segmented interactions from being further processed by audio analysis tools and providing further inaccurate results. Another drawback is that additional sources of information, such as computer-telephony-integration (CTI) data, screen events and the like are not used. Yet another drawback is the inability of the method to tell which collection of segments belongs to one speaking side, such as the customer, and which belongs to the other speaking side, since different analyses are performed on both sides, to serve different needs.
It should be easily perceived by one with ordinary skill in the art that there is an obvious need for an unsupervised segmentation method and for an apparatus to segment an unconstrained interaction into segments that should not be analyzed, such as music, tones, low quality segments or the like, and segments carrying speech of a single speaker, where segments of the same speaker should be grouped or marked accordingly. Additionally, identifying the sides of the interaction is required. The segmentation tool has to be effective, i.e., extract segments of the interaction in which a single speaker is speaking that are as long and as numerous as possible, with as little compromise as possible on the reliability, i.e., the quality of the segments. Additionally, the tool should be fast and efficient, so as not to introduce delays to further processing, or place additional burden on the computing resources of the organization. It is also required that the tool provide a performance estimation which can be used in deciding whether the speech segments are to be sent for analysis or not.
SUMMARY OF THE PRESENT INVENTION
It is an object of the present invention to provide a novel method for speaker segmentation which overcomes the disadvantages of the prior art. In accordance with the present invention, there is thus provided a speaker segmentation method for associating one or more segments for each of two or more sides of one or more audio interactions, with one of the sides of the interaction using additional information, the method comprising: a segmentation step for associating the one or more segments with one side of the interaction, and a scoring step for assigning a score to said segmentation. The additional information can be one or more of the group consisting of: computer-telephony-integration information related to the at least one interaction; spotted words within the at least one interaction; data related to the at least one interaction; data related to a speaker thereof; external data related to the at least one interaction; or data related to at least one other interaction performed by a speaker of the at least one interaction. The method can further comprise a model association step for scoring the segments against one or more statistical models of one side, and obtaining a model association score. The scoring step can use discriminative information for discriminating the two or more sides of the interaction. The scoring step can comprise a model association step for scoring the segments against a statistical model of one side, and obtaining a model association score. Within the method, the scoring step can further comprise a normalization step for normalizing the one or more model scores. The scoring step can also comprise evaluating the association of the one or more segments with a side of the interaction, using additional information. The additional information can be one or more of the group consisting of: computer-telephony-integration information related to the at least one interaction; spotted words within the at least one interaction; data related to the at least one interaction; data related to a speaker thereof; external data related to the at least one interaction; or data related to at least one other interaction performed by a speaker of the at least one interaction. The scoring step can comprise statistical scoring. The method can further comprise: a step of comparing the score to a threshold; and repeating the segmentation step and the scoring step if the score is below the threshold. The threshold can be predetermined, or dynamic, or depend on: information associated with said at least one interaction, information associated with an at least one speaker thereof, or external information associated with the interaction. The segmentation step can comprise a parameterization step to transform the speech signal to a set of feature vectors in order to generate data more suitable for statistical modeling; an anchoring step for locating an anchor segment for each side of the interaction; and a modeling and classification step for associating at least one segment with one side of the interaction. 
The anchoring step or the modeling and classification step can comprise using additional data, wherein the additional data is one or more of the group consisting of: computer-telephony-integration information related to the at least one interaction; spotted words within the at least one interaction; data related to the at least one interaction; data related to a speaker thereof; external data related to the at least one interaction; or data related to at least one other interaction performed by a speaker of the at least one interaction. The method can comprise a preprocessing step for enhancing the quality of the interaction, or a speech/non-speech segmentation step for eliminating non-speech segments from the interaction. The segmentation step can comprise scoring the one or more segments with a voice model of a known speaker.
Another aspect of the disclosed invention relates to a speaker segmentation apparatus for associating one or more segments for each of two or more speakers participating in one or more audio interactions, with a side of the interaction, using additional information, the apparatus comprising: a segmentation component for associating one or more segments within the interaction with one side of the interaction; and a scoring component for assigning a score to said segmentation. Within the apparatus, the additional information can be of the group consisting of: computer-telephony-integration information related to the at least one interaction; spotted words within the at least one interaction; data related to the at least one interaction; data related to a speaker thereof; external data related to the interaction; or data related to one or more other interactions performed by a speaker of the interaction.
Yet another aspect of the disclosed invention relates to a quality management apparatus for interaction-rich environments, the apparatus comprising: a capturing or logging component for capturing or logging one or more audio interactions; a segmentation component for segmenting the interactions; and a playback component for playing one or more parts of the one or more audio interactions.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be understood and appreciated more fully from the following detailed description taken in conjunction with the drawings in which:
FIG. 1 is a schematic block diagram of a typical environment in which the disclosed invention is used, in accordance with a preferred embodiment of the present invention;
FIG. 2 is a schematic flowchart of the disclosed segmentation method, in accordance with a preferred embodiment of the present invention; and
FIG. 3 is a schematic flowchart of the scoring process, in accordance with a preferred embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
The present invention overcomes the disadvantages of the prior art by providing a novel method and a system for locating segments within an audio interaction in which a single speaker is speaking, dividing the segments into two or more groups, wherein the speaker in each segment group is the same one, and discriminating in which group of segments a certain participant, or a certain type of participant, such as a service representative (agent) of an organization, is speaking, and in which group another participant or participant type, such as a customer, is speaking. The disclosed invention utilizes additional types of data collected in interaction-intensive environments, such as call centers, financial institutions or the like, in addition to captured or recorded audio interactions in order to enhance the segmentation and the association of a group of segments with a specific speaker or speaker type, such as an agent, a customer or the like. The discussion below is oriented more to applications involving commerce or service, but the method is applicable to any required domain, including public safety, financial organizations such as trade floors, health organizations and others.
The information includes raw information, such as meta data, as well as information extracted by processing the interactions. Raw information includes, for example, Computer Telephony Integration (CTI) information which includes hold periods, number called, number called from, DNIS, VDN, ANI or the like, agent details, screen events related to the current or other interactions with the customer, information exchanged between the parties, and other relevant information that can be retrieved from external sources such as CRM data, billing information, workflow management, mail messages and the like. The extracted information can include, for example, certain words spotted within the interaction, such as greetings, compliance phrases or the like, continuous speech recognition, emotion detected within an interaction, and call flow information, such as bursts of one speaker when the other speaker is talking, mutual silence periods and others. Other data used include, for example, voice models of a single or multiple speakers.
The collected data is used in the process of segmenting the audio interaction in a number of ways. First, the information can be used to obtain an accurate anchor point for the initial selection of a segment of a single speaker. For example, a segment in which a compliance phrase was spotted can be a good anchor point for one speaker, specifically the agent. A highly emotional segment can be used as an anchor for the customer side. Such information can be used during the classification of segments into speakers, and also for posteriori assessment of the performance of the segmentation. Second, the absence or presence, and certainty level of specific events within the segments of a certain speaker can contribute to the discrimination of the agent side from the customer side, and also for assessing the performance of the segmentation. For example, the presence of compliance sentences and typical customer-side noises (such as a barking dog) in segments of allegedly the same speaker, can suggest a deficient segmentation. The discrimination of the speakers can be enhanced by utilizing agent-customer-discriminating information, such as screen events, emotion levels, and voice models of a specific agent, a specific customer, a group of agents, a universal agent model or a universal customer model. If segments attributed to one side have a high probability of complying with a specific agent's characteristics or with a universal agent model, relating the segments to the agent side will have a higher score, and vice versa. Thus, the segmentation can be assessed, and according to the assessment result accepted, rejected, or repeated.
Referring now to FIG. 1, which presents a block diagram of the main components in a typical environment in which the disclosed invention is used. The environment, generally referenced as 10, is an interaction-rich organization, typically a call center, a bank, a trading floor, another financial institute, a public safety contact center, or the like. Customers, users or other contacts are contacting the center, thus generating input information of various types. The information types include vocal interactions, non-vocal interactions and additional data. The capturing of voice interactions can employ many forms and technologies, including trunk side, extension side, summed audio, separate audio, various encoding and decoding protocols such as G729, G726, G723.1, and the like. The vocal interactions usually include telephone 12, which is currently the main channel for communicating with users in many organizations. The voice typically passes through a PABX (not shown), which in addition to the voice of the two or more sides participating in the interaction collects additional information discussed below. A typical environment can further comprise voice over IP channels 16, which possibly pass through a voice over IP server (not shown). The interactions can further include face-to-face interactions, such as those recorded in a walk-in-center 20, and additional sources of vocal data 24, such as microphone, intercom, the audio of video capturing, vocal input by external systems or any other source. In addition, the environment comprises additional non-vocal data of various types 28. For example, Computer Telephony Integration (CTI) used in capturing the telephone calls, can track and provide data such as number and length of hold periods, transfer events, number called, number called from, DNIS, VDN, ANI, or the like. Additional data can arrive from external sources such as billing, CRM, or screen events, including text entered by a call representative, documents and the like. The data can include links to additional interactions in which one of the speakers in the current interaction participated. Another type of data includes data extracted from vocal interactions, such as spotted words, emotion level, speech-to-text or the like. Data from all the above-mentioned sources and others is captured and preferably logged by capturing/logging unit 32. The captured data is stored in storage 34, comprising one or more magnetic tape, a magnetic disc, an optical disc, a laser disc, a mass-storage device, or the like. The storage can be common or separate for different types of captured interactions and different types of additional data. Alternatively, the storage can be remote from the site of capturing and can serve one or more sites of a multi-site organization such as a bank. Capturing/logging unit 32 comprises a computing platform running one or more computer applications as is detailed below. From capturing/logging unit 32, the vocal data and preferably the additional relevant data is transferred to segmentation component 36 which executes the actual segmentation of the audio interaction. Segmentation component 36 transfers the output segmentation to scoring component 38, which assigns a score to the segmentation. If the score exceeds a certain threshold, the segmentation is accepted. If the score is below the threshold, another activation of the segmentation is attempted. The scoring and segmentation sequence is repeated until an acceptable score is achieved, or a stopping criterion is met. 
The threshold can be predetermined, or it can be set dynamically, taking into account the interaction type, one or more of the speakers if known, additional data such as Computer-Telephony-Integration (CTI) data, CRM or billing data, data associated with any of the speakers, screen events or the like. For example, the system can assign a higher threshold to an interaction of a VIP customer than to an interaction of an ordinary customer, or a higher threshold to interactions involving opening an account or the like. It is obvious that if the audio content of interactions, or some of the interactions, is recorded as summed, then speaker segmentation has to be performed. However, even when the audio interactions are recorded separately for each side, as is usually the case in trunk-side or digital extension recording, there is still segmentation work to be done. Separating speech from non-speech is required in order to obtain fluent speech segments, by excluding segments of music, tones, significant background noise, low quality or the like. In addition, there might still be effects of echo, background speech on either side, the customer consulting a third person, or the like, which require the segmentation and association of single-speaker segments with one speaker. The segmented audio can assume the form of separate audio streams or files for each side, the form of the original stream or file accompanied by indexing information denoting the beginning and end of each segment in which a certain side of the interaction is speaking, or any other form. The segmented audio is preferably transferred to further engines 40, such as a speech-to-text engine, emotion detection, speaker recognition, or other voice processing engines. Alternatively, the segmentation information or the segmented voice is transferred for storage purposes 44. In addition, the information can be transferred to any other purpose or component 48, such as, but not limited to, a playback component for playing the captured or segmented audio interactions. All components of the system, including capturing/logging components 32 and segmentation component 36, preferably comprise one or more computing platforms, such as a personal computer, a mainframe computer, or any other type of computing platform that is provisioned with a memory device (not shown), a CPU or microprocessor device, and several I/O ports (not shown). Alternatively, each component can be a DSP chip, an ASIC device storing the commands and data necessary to execute the methods of the present invention, or the like. Each component can further include a storage device (not shown), storing the relevant applications and data required for processing. Each application running on each computing platform, such as the capturing applications or the segmentation application, is a set of logically inter-related computer programs or modules and associated data structures that interact to perform one or more specific tasks. All applications can be co-located and run on the same one or more computing platforms, or on different platforms. In yet another alternative, the information sources and capturing platforms can be located on each site of a multi-site organization, while one or more segmentation components can be remotely located, segment interactions captured at one or more sites, and store the segmentation results in a local, central, distributed or any other storage.
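As a rough illustration of setting the threshold dynamically, the following sketch derives an acceptance threshold from interaction metadata; the field names and weight values are assumptions, not part of the disclosure.

```python
# Sketch: raise the acceptance threshold for interactions that warrant stricter
# quality control (e.g. VIP customers or sensitive call types).

def acceptance_threshold(metadata, base=0.70):
    threshold = base
    if metadata.get("vip"):                          # stricter for VIP customers
        threshold += 0.10
    if metadata.get("call_type") == "account_opening":
        threshold += 0.05                            # stricter for sensitive call types
    return min(threshold, 0.95)

print(acceptance_threshold({"vip": True, "call_type": "account_opening"}))  # 0.85
```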
Referring now to FIG. 2 showing a flowchart of the main steps in the proposed speaker segmentation method. Summed audio as well as additional data, such as CTI data, screen events, spotted words, data from external sources such as CRM, billing, or the like are introduced at step 104 to the system. The summed audio can use any format and any compression method acceptable to the system, such as PCM, WAV, MP3, G729, G726, G723.1, or the like. The audio can be introduced in streams, files, or the like. At step 108, preprocessing is performed on the audio, in order to enhance the audio for further processing. The preprocessing preferably includes decompression, according to the compression used in the specific interaction. If the audio is from an external source, the preprocessing can include compression and decompression with one of the protocols used in the environment in order to adapt the audio to the characteristics common in the environment. The preprocessing can further include low-quality segment removal or other processing that will enhance the quality of the audio. Step 110 marks, removes or otherwise eliminates non-speech segments from the audio. Such segments can include music, tones, DTMF, silence, segments with significant background noise or other substantially non-speech segments. Preprocessing step 108 and speech/non-speech segmentation step 110 are optional, and can be dispensed with. However, the performance in time, computing resources and the quality of the speaker segmentation will degrade if step 108 or step 110 is omitted. The enhanced audio is then transferred to segmentation step 112. Segmentation step 112 comprises a parameterization step 118, an anchoring step 120 and a modeling and classification step 124. At step 118 the speech is parameterized by transforming the speech signal into a set of feature vectors. The purpose of this transformation is to obtain a new representation which is more compact, less redundant and more suitable for statistical modeling. Most speaker segmentation systems depend on a cepstral representation of speech in addition to prosodic parameters such as pitch, pitch variance, energy level and the like. The parameterization generates a sequence of feature vectors, wherein each vector relates to a certain time frame, preferably in the range of 10-30 ms, where the speech can be regarded as stationary. In another alternative method, the parameterization step is performed earlier, as part of preprocessing step 108 or speech/non-speech segmentation step 110. Also at step 118, the speech signal is divided into non-overlapping segments, typically, but not limited to, having a duration of 1-3 seconds. The main speaker segmentation process starts at step 120, during which anchor segments are located within the audio interaction. Preferably, the method searches for two segments to be used as anchor segments, and each of the two segments should contain speech of a different speaker. Each anchor segment will be used for initial voice modeling of the speaker it represents. Finding the first anchor segment is preferably performed by statistically modeling every segment in the interaction and then locating the most homogenous segment in terms of statistical voice feature distribution. Such a segment is more likely to be a segment in which a single speaker is speaking rather than an area of transition between two speakers. This segment will be used for building the initial voice model of the first speaker.
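A much-simplified sketch of the parameterization and first-anchor steps is given below; log band energies stand in for the cepstral and prosodic features mentioned above, and the most homogenous two-second segment (smallest generalized variance of its feature vectors) is taken as the first anchor. The frame and segment lengths follow the ranges quoted above; everything else is an assumption.

```python
import numpy as np

def frame_features(signal, sr, frame_ms=25, n_bands=8):
    """Very simplified stand-in for cepstral/prosodic parameterization:
    log band energies computed over non-overlapping frames."""
    frame = int(sr * frame_ms / 1000)
    n = len(signal) // frame
    frames = signal[: n * frame].reshape(n, frame)
    spectra = np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1)) ** 2
    bands = np.array_split(spectra, n_bands, axis=1)
    return np.log(np.stack([b.sum(axis=1) for b in bands], axis=1) + 1e-10)

def first_anchor(features, frames_per_segment):
    """Return the frame range of the most homogenous segment (smallest
    generalized variance of its feature vectors) -- the first anchor."""
    n_seg = len(features) // frames_per_segment
    best_logdet, best_idx = np.inf, 0
    for i in range(n_seg):
        seg = features[i * frames_per_segment:(i + 1) * frames_per_segment]
        cov = np.cov(seg, rowvar=False) + 1e-6 * np.eye(seg.shape[1])
        logdet = np.linalg.slogdet(cov)[1]            # homogeneity proxy
        if logdet < best_logdet:
            best_logdet, best_idx = logdet, i
    return best_idx * frames_per_segment, (best_idx + 1) * frames_per_segment

sr = 8000
audio = np.random.randn(sr * 20)                      # 20 seconds of placeholder audio
feats = frame_features(audio, sr)                     # one vector per 25 ms frame
print(first_anchor(feats, frames_per_segment=80))     # 80 frames = 2-second segments
```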
Locating such a first segment can also involve utilizing additional data, such as CTI events; for example, the first speaker in a call center interaction is likely to be the agent addressing the customer. Alternatively, spotting with high certainty standard phrases which agents are instructed to use, such as “company X good morning, how can I help you”, can help identify an anchor segment for the agent side, and standard questions, such as “how much would it cost to”, can help in locating homogenous segments of the customer side. Once the first anchor segment is determined, the method constructs a statistical model of the voice features in that segment, where the statistical model represents the voice characteristics of the first speaker. Subsequently, the method searches for a second anchor segment, whose statistical model is as different as possible from the statistical model of the first anchor, where the distance is measured and quantified by a statistical distance function, such as a likelihood ratio test. The aim of finding the second anchor is to find an area in the interaction which is most likely produced by a different statistical source, i.e. a different speaker. Alternatively, if the agent (or the customer) is known and a voice model of the agent has previously been built using other voice samples of the speaker, or can be otherwise obtained, locating the segments of the agent can be done by searching for all segments which comply with the specific agent model, and continuing by associating all the rest of the speech segments with the customer (or agent) side. Once the two anchor segments are determined, the system goes into the modeling and classification step 124. Step 124 comprises an iterative process. On each iteration, a statistical model is constructed from the aggregated segments identified so far as belonging to each speaker. Then the distance between each segment in the interaction and the speakers' voice models is measured and quantified. The distance can be produced by likelihood calculation or the like. Next, one or more segments which are most likely to come from the same statistical distributions as the speakers' statistical models, i.e. produced by the same speaker, are added to the corresponding speaker's pool of segments from the previous iteration. On the next iteration, the statistical models are reconstructed, utilizing the newly added segments as well as the previous ones, and new segments to be added are searched for. The iterations proceed until one or more stopping criteria are met, such as the distance between the model and the most similar segment exceeding a certain threshold, the length of the added segments being below a certain threshold, or the like. During modeling and classification step 124, soft classification techniques can also be applied in determining the similarity between a segment and a statistical model, or when calculating whether a stop criterion is met. Once the modeling and classification is done, scoring step 128 takes place. Scoring step 128 assigns a score to the segmentation result. If the score is below a predetermined threshold, the performance is unsatisfactory and the process repeats, restarting either from step 120, excluding the former first and second anchor segments, or from step 118, using different voice features.
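The second-anchor search and the iterative modeling and classification can be sketched, under simplifying assumptions, as follows; each speaker model here is a single diagonal Gaussian rather than the richer statistical models a deployed system would typically use, and the least-likely segment under the first anchor's model stands in for the statistical distance function mentioned above.

```python
import numpy as np

def diag_gauss(feats):
    """Single diagonal-Gaussian speaker model: per-dimension mean and variance."""
    return feats.mean(axis=0), feats.var(axis=0) + 1e-6

def loglik(feats, model):
    """Mean per-frame log-likelihood of feature vectors under a diagonal Gaussian."""
    mu, var = model
    return float(np.mean(-0.5 * (np.log(2 * np.pi * var) + (feats - mu) ** 2 / var)))

def segment_speakers(segments, first_anchor_idx, n_iters=20):
    # Second anchor: the segment least likely under the first anchor's model.
    model_a = diag_gauss(segments[first_anchor_idx])
    candidates = [i for i in range(len(segments)) if i != first_anchor_idx]
    second = min(candidates, key=lambda i: loglik(segments[i], model_a))
    assigned = {first_anchor_idx: "A", second: "B"}
    for _ in range(n_iters):                              # iterative classification
        feats_a = np.vstack([segments[i] for i, s in assigned.items() if s == "A"])
        feats_b = np.vstack([segments[i] for i, s in assigned.items() if s == "B"])
        model_a, model_b = diag_gauss(feats_a), diag_gauss(feats_b)
        unassigned = [i for i in range(len(segments)) if i not in assigned]
        if not unassigned:
            break                                         # stopping criterion: all assigned
        # add the single most confidently classified segment per iteration
        best = max(unassigned,
                   key=lambda i: abs(loglik(segments[i], model_a) -
                                     loglik(segments[i], model_b)))
        a, b = loglik(segments[best], model_a), loglik(segments[best], model_b)
        assigned[best] = "A" if a > b else "B"
    return assigned

rng = np.random.default_rng(0)
segs = [rng.normal(loc=0, size=(80, 8)) for _ in range(5)] + \
       [rng.normal(loc=3, size=(80, 8)) for _ in range(5)]
print(segment_speakers(segs, first_anchor_idx=0))
```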
The threshold can be predetermined, or it can be set dynamically, taking into account the interaction type, other data related to the interaction, additional data such as CTI data, external data such as CRM or billing data, data associated with any of the speakers, screen events or the like. The stopping condition for the segmentation can be defined in a predetermined manner, such as “try at most X times, and if the segmentation does not succeed, skip the interaction and segment another one”. Alternatively, the stopping criteria can be defined dynamically, for example, “continue the segmentation as long as there are still segments for which no segment within X seconds of them has been used as an anchor segment”. If the segmentation score exceeds the predetermined threshold, the results are output at step 144. The scoring process is detailed in association with FIG. 3 below. The results output at step 144 can take any required form. One preferred form is a file or stream containing text, denoting the start and end locations of each segment, for example in terms of time units from the beginning of the interaction, and the associated speaker. The output can also comprise start and end locations for segments of an unknown speaker, or for non-speech segments. Another preferred form comprises two or more files, wherein each file comprises the segments of one speaker. The non-speech or unknown-speaker segments can be ignored or reside in a separate file for purposes such as playback.
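One possible realization of the text-index output form described above, with illustrative values, might look as follows; the file name, field order and labels are assumptions.

```python
# Sketch: write a plain-text index giving the start and end of each segment
# (seconds from the beginning of the interaction) and the associated speaker.

segments = [
    (0.0, 4.2, "agent"),
    (4.2, 9.8, "customer"),
    (9.8, 11.0, "non-speech"),
    (11.0, 17.5, "agent"),
]

with open("interaction_0001.seg", "w") as out:
    for start, end, speaker in segments:
        out.write(f"{start:.2f}\t{end:.2f}\t{speaker}\n")
```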
Referring now to FIG. 3 showing the main steps in the scoring assessment process referred to in step 140 of FIG. 2. The scoring step comprises two main parts, assessing a statistical score and an agent-customer discrimination score. The statistical score determined at step 204 is based on determining the distance between the model generated from the segments attributed to one side and the model generated from the segments attributed to the other side. If the distance between the models is above a predetermined threshold, then the segments attributed to one side are significantly different from the segments attributed to the other side, and the classification is considered successful. If the distance is below a predetermined threshold (not necessarily equal to the predetermined threshold mentioned above), the segments attributed to different speakers are not distinctive enough, and the classification is assumed to be unsuccessful. However, the statistical score can be problematic, since the model-distance determination is calculated using the same tools and principles used when assigning segments to a certain speaker during the classification step. Therefore, the segmentation step and the testing step use the same data and the same calculations, which makes the examination biased and less reliable. Discriminative scoring step 208 uses discriminative information, such as discriminative customer-agent information, in order to assess the success of the speaker segmentation process, and to determine or verify the association of each segment group with a specific speaker. Discriminative scoring step 208 is divided into model association step 212 and additional information scoring step 216. Model association step 212 uses previously built or otherwise acquired universal models of agents and of customers. The universal agent model is built from speech segments in which multiple agents of the relevant environment are speaking, using the same types of equipment used in the environment. The universal customer model is built from multiple segments of customers using various types of equipment, including land lines, cellular lines, various handsets, various types of typical customer background noise and the like. The model preferably incorporates both male and female customers, if customers of both genders are likely to speak in real interactions, as well as customers of the relevant ages, accents and the like. If the speaker segmentation includes side (agent/customer) association, step 212 is used for verification of the association; otherwise step 212 is used for associating each segment group with a specific side. In model association step 212, the speech segments attributed to each side are preferably scored against the universal agent model in step 220, and against the universal customer model in step 224, thus obtaining two model association scores. The two model association scores are normalized in normalization step 228. If one segment group was assigned, for example, to an agent, and the normalized score against the universal agent model is indeed significantly higher than the score against the universal customer model, the association of the segment group with the agent side is reinforced. However, if the score of an agent-assumed segment group against the customer model is higher than the score against the general agent model, this might indicate a problem either in the segmentation or in the side association.
The scoring can be performed for the segments attributed to a certain side one or more at a time, or for all of them together, using a combination of the feature vectors associated with the segments. If the segment group is not assigned to a specific side, a normalized score against one side which exceeds a certain threshold can be used in determining the side, as well as the quality of the segmentation. Model association step 212 can also be performed solely in order to associate a segment group with a certain side, rather than to assess the segmentation quality, in which case it is not part of discriminative scoring step 208 but rather an independent step.
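A hedged sketch of model association step 212 is given below; Gaussian mixture models trained on placeholder data stand in for the universal agent and customer models, and a simple log-likelihood difference stands in for normalization step 228. All names and values are assumptions for illustration only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
agent_training = rng.normal(0.0, 1.0, size=(2000, 8))      # placeholder training data
customer_training = rng.normal(1.5, 1.2, size=(2000, 8))

# Stand-ins for the universal agent and universal customer models.
universal_agent = GaussianMixture(n_components=4, random_state=0).fit(agent_training)
universal_customer = GaussianMixture(n_components=4, random_state=0).fit(customer_training)

def association_scores(segment_group):
    """segment_group: (n_frames, n_features) feature vectors of one segment group."""
    a = universal_agent.score(segment_group)        # mean log-likelihood per frame
    c = universal_customer.score(segment_group)
    normalized = a - c                              # > 0 leans agent, < 0 leans customer
    return a, c, normalized

group = rng.normal(0.0, 1.0, size=(300, 8))         # group assumed to be the agent side
print(association_scores(group))
```

A strongly positive normalized score for a group already labeled as the agent side would reinforce the segmentation, while a negative one would count against it.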
In step 232 the method further uses additional data evaluation, in order to evaluate the contribution of each segment attributed to a certain speaker. Additional data can include spotted words that are typical of a certain side, such as “how can I help you” on the agent side, and “how much would that cost” on the customer side, CTI events, screen events, external or internal information, or the like. The presence, possibly associated with a certainty level, of such events in segments associated with a specific side is accumulated or otherwise combined into a single additional data score. The scores of statistical scoring 204, model association 212 and additional data scoring 232 are combined at step 236, and a general score is issued. If the score is below a predetermined threshold, as is evaluated at step 140 of FIG. 2, the segmentation process restarts at step 120, excluding the former first and second anchor segments. Since none of scoring steps 204, 212, and 232 is mandatory, combining step 236 weights whatever scores are available. Any subset of the scoring results of scoring steps 204, 212 and 232 can be used to produce a general scoring result. Combining step 236 can be further designed to weight additional scores, such as user input or other scoring mechanisms currently known or that will become known at a later time. Combining step 236 can use dynamic or predetermined parameters and schemes to weight or otherwise combine the available scores.
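Combining step 236 might be sketched as a weighted mean over whichever scores are available, as below; the weight values are illustrative assumptions, and a missing score simply does not participate.

```python
def combine_scores(scores, weights=None):
    """scores: dict with any subset of 'statistical', 'model_association',
    'additional_data' mapped to values in [0, 1]; returns a weighted mean."""
    weights = weights or {"statistical": 0.4, "model_association": 0.4, "additional_data": 0.2}
    available = {k: v for k, v in scores.items() if v is not None and k in weights}
    if not available:
        raise ValueError("no scores available to combine")
    total_weight = sum(weights[k] for k in available)
    return sum(weights[k] * v for k, v in available.items()) / total_weight

# Model association score missing: the remaining two scores are re-weighted.
print(combine_scores({"statistical": 0.8, "additional_data": 0.6}))
```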
As mentioned above in relation to the statistical model scoring, and as is applicable to all types of data, the same data item should not be used in the scoring phase if it was already used during the segmentation phase. Using the same data item in the two phases will bias the results and give a higher, unjustified score to a certain segmentation. For example, if the phrase “Company X good morning” was spotted at a certain location, and the segment it appeared in was used as an anchor for the agent side, considering this phrase during the additional data scoring step will raise the score in an artificial manner, since it is already known that the segment in which the phrase was said is associated with the agent side.
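A small sketch of this exclusion rule, with hypothetical event records, is shown below: any spotted event whose segment was already consumed as an anchor during segmentation is removed before additional data scoring.

```python
def scoring_events(all_events, anchor_segment_ids):
    """all_events: list of dicts like {"name": "compliance_phrase", "segment": 3}.
    Returns only the events not already used to pick anchor segments."""
    return [e for e in all_events if e["segment"] not in anchor_segment_ids]

events = [
    {"name": "compliance_phrase", "segment": 0},    # this segment was the agent anchor
    {"name": "barking_dog", "segment": 7},
]
print(scoring_events(events, anchor_segment_ids={0}))   # only the unused event remains
```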
It will be appreciated by people skilled in the art that some of the presented methods and scorings can be partitioned in a different manner over the described steps without significant change in the results. It will also be appreciated by people skilled in the art that additional scoring methods can exist and be applied in addition to, or instead of, the presented scoring. The scoring method can be applied to the results of any segmentation method, and not necessarily the one presented above. Also, different variations can be applied to the segmentation and the scoring methods as described, without significant change to the proposed solution. It will further be appreciated by people skilled in the art that the disclosed invention can be extended to segmenting an interaction between more than two speakers, without significant changes to the described method. The described rules and parameters, such as the acceptable score values, stopping criteria for the segmentation and the like, can be predetermined or set dynamically. For example, the parameters can take into account the type or length of the interaction, the customer type as received from an external system, or the like.
The disclosed invention provides a novel approach to segmenting an audio interaction into segments, and associating each group of segments with one speaker. The disclosed invention further provides a scoring and control mechanism over the quality of the resulting segmentation.
It will be appreciated by persons skilled in the art that the present invention is not limited to what has been particularly shown and described hereinabove. Rather the scope of the present invention is defined only by the claims which follow.

Claims (20)

1. A speaker segmentation method for associating an at least one segment of speech for each of at least two sides of a summed audio interaction, with one of the at least two sides of the interaction, using additional information, the method comprising:
a receiving step for receiving the summed audio interaction from a capturing and logging unit;
a segmentation step for associating the at least one segment with one side of the summed audio interaction, the segmentation step comprising
a parameterization step for transforming a speech signal into a set of feature vectors and dividing the set into non-overlapping segments;
an anchoring step for locating an anchor segment for each of the at least two sides of the summed audio interaction, the anchoring step comprising:
selecting a homogenous segment as a first anchor segment;
constructing a first model of the homogenous segment; and
selecting a second anchor segment such that its model is different from the first model; and
a modeling and classification step for associating at least one second segment with each side of the summed audio interaction; and
a scoring step for assigning a score to said segmentation.
2. The method of claim 1 wherein the additional information is at least one item selected from the group consisting of: computer-telephony-integration information related to the summed audio interaction; spotted words within the summed audio interaction; data related to the summed audio interaction; data related to a speaker thereof; external data related to the summed audio interaction; and data related to at least one other interaction performed by a speaker of the summed audio interaction.
3. The method of claim 1 further comprising a model association step for scoring the at least one segment against an at least one statistical model of one side, and obtaining a model association score.
4. The method of claim 1 wherein the scoring step uses discriminative information for discriminating the at least two sides of the summed audio interaction.
5. The method of claim 4 wherein the scoring step comprises a model association step for scoring the at least one segment against an at least one statistical model of one side, and obtaining a model association score.
6. The method of claim 5 wherein the scoring step further comprises a normalization step for normalizing the at least one model score.
7. The method of claim 4 wherein the scoring step comprises evaluating the association of the at least one segment with a side of the summed audio interaction using second additional information.
8. The method of claim 7 wherein the second additional information is at least one item selected from the group consisting of: computer-telephony-integration information related to the summed audio interaction; spotted words within the summed audio interaction; data related to the summed audio interaction; data related to a speaker thereof; external data related to the summed audio interaction; and data related to at least one other interaction performed by a speaker of the summed audio interaction.
9. The method of claim 1 wherein the scoring step comprises statistical scoring.
10. The method of claim 1 further comprising:
a step of comparing said score to a threshold; and
repeating the segmentation step and the scoring step if said score is below the threshold.
11. The method of claim 10 wherein the threshold is predetermined, or dynamic, or depends on: information associated with said summed audio interaction, information associated with an at least one speaker thereof or external information associated with the summed audio interaction.
12. The method of claim 1 wherein the homogenous segment is selected by spotting a predetermined phrase.
13. The method of claim 1 wherein the anchoring step or the modeling and classification step comprise using second additional data.
14. The method of claim 13 wherein the second additional data is at least one item selected from the group consisting of: computer-telephony-integration information related to the summed audio interaction; spotted words within the summed audio interaction; data related to the summed audio interaction; data related to a speaker thereof; external data related to the summed audio interaction; and data related to at least one other interaction performed by a speaker of the summed audio interaction.
15. The method of claim 1 further comprising a preprocessing step for enhancing the quality of the summed audio interaction.
16. The method of claim 1 further comprising a speech/non-speech segmentation step for eliminating non-speech segments from the summed audio interaction.
17. The method of claim 1 wherein the segmentation step comprises scoring the at least one segment with a voice model of a known speaker.
18. A speaker segmentation apparatus for associating an at least one segment of speech for each of at least two speakers participating in an audio interaction, with a side of the interaction, using additional information, the apparatus comprising:
a segmentation component for associating an at least one segment within the audio interaction with one side of the audio interaction, the segmentation component comprising:
a parameterization component for transforming a speech signal into a set of feature vectors and dividing the set into non-overlapping segments;
an anchoring component for locating an anchor segment for each of the at least two sides of the audio interaction, the anchoring component selecting a homogenous segment as a first anchor segment, and a second anchor segment having a statistical model different from a statistical model associated with the first anchor segment; and
a modeling and classification component for associating at least one second segment with each side of the audio interaction; and
a scoring component for assigning a score to said segmentation.
19. The apparatus of claim 18 wherein the additional information is at least one item selected from the group consisting of: computer-telephony-integration information related to the audio interaction; spotted words within the audio interaction; data related to the audio interaction; data related to a speaker thereof; external data related to the audio interaction; and data related to at least one other interaction performed by a speaker of the audio interaction.
20. A quality management apparatus for interaction-rich speech environments, the apparatus comprising:
a capturing or logging component for capturing or logging an at least one audio interaction in which at least two sides communicate;
a segmentation component for segmenting the at least one audio interaction, the segmentation component comprising:
a parameterization component for transforming a speech signal into a set of feature vectors and dividing the set into non-overlapping segments;
an anchoring component for locating an anchor segment for each of the at least two sides of the at least one audio interaction, the anchoring component selecting a homogenous segment as a first anchor segment, and a second anchor segment having a statistical model different from a statistical model associated with the first anchor segment; and
a modeling and classification component for associating at least one second segment with each side of the at least one audio interaction; and
a playback component for playing an at least one part of the at least one audio interaction.
US10/567,810 2006-01-25 2006-01-25 Method and apparatus for segmentation of audio interactions Active 2028-11-09 US7716048B2 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IL2006/000100 WO2007086042A2 (en) 2006-01-25 2006-01-25 Method and apparatus for segmentation of audio interactions

Publications (2)

Publication Number Publication Date
US20080181417A1 US20080181417A1 (en) 2008-07-31
US7716048B2 true US7716048B2 (en) 2010-05-11

Family

ID=38309591

Family Applications (1)

Application Number Title Priority Date Filing Date
US10/567,810 Active 2028-11-09 US7716048B2 (en) 2006-01-25 2006-01-25 Method and apparatus for segmentation of audio interactions

Country Status (2)

Country Link
US (1) US7716048B2 (en)
WO (1) WO2007086042A2 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090103708A1 (en) * 2007-09-28 2009-04-23 Kelly Conway Methods and systems for determining segments of a telephonic communication between a customer and a contact center to classify each segment of the communication, assess negotiations, and automate setup time calculation
US20090292541A1 (en) * 2008-05-25 2009-11-26 Nice Systems Ltd. Methods and apparatus for enhancing speech analytics
US20100070276A1 (en) * 2008-09-16 2010-03-18 Nice Systems Ltd. Method and apparatus for interaction or discourse analytics
US20110119060A1 (en) * 2009-11-15 2011-05-19 International Business Machines Corporation Method and system for speaker diarization
US20110196677A1 (en) * 2010-02-11 2011-08-11 International Business Machines Corporation Analysis of the Temporal Evolution of Emotions in an Audio Interaction in a Service Delivery Environment
US20140172427A1 (en) * 2012-12-14 2014-06-19 Robert Bosch Gmbh System And Method For Event Summarization Using Observer Social Media Messages
US20150088513A1 (en) * 2013-09-23 2015-03-26 Hon Hai Precision Industry Co., Ltd. Sound processing system and related method
US9472188B1 (en) * 2013-11-15 2016-10-18 Noble Systems Corporation Predicting outcomes for events based on voice characteristics and content of a contact center communication
US9711167B2 (en) 2012-03-13 2017-07-18 Nice Ltd. System and method for real-time speaker segmentation of audio interactions
US10642889B2 (en) 2017-02-20 2020-05-05 Gong I.O Ltd. Unsupervised automated topic detection, segmentation and labeling of conversations
US11276407B2 (en) 2018-04-17 2022-03-15 Gong.Io Ltd. Metadata-based diarization of teleconferences
US20220254336A1 (en) * 2019-08-12 2022-08-11 100 Brevets Pour La French Tech Method and system for enriching digital content representative of a conversation

Families Citing this family (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7953219B2 (en) * 2001-07-19 2011-05-31 Nice Systems, Ltd. Method apparatus and system for capturing and analyzing interaction based content
US8204884B2 (en) * 2004-07-14 2012-06-19 Nice Systems Ltd. Method, apparatus and system for capturing and analyzing interaction based content
US9571652B1 (en) * 2005-04-21 2017-02-14 Verint Americas Inc. Enhanced diarization systems, media and methods of use
US8639757B1 (en) 2011-08-12 2014-01-28 Sprint Communications Company L.P. User localization using friend location information
US9346397B2 (en) 2006-02-22 2016-05-24 Federal Signal Corporation Self-powered light bar
WO2007117770A2 (en) * 2006-02-22 2007-10-18 Federal Signal Corporation Networked fire station management
US7746794B2 (en) * 2006-02-22 2010-06-29 Federal Signal Corporation Integrated municipal management console
US20070194906A1 (en) * 2006-02-22 2007-08-23 Federal Signal Corporation All hazard residential warning system
US9002313B2 (en) * 2006-02-22 2015-04-07 Federal Signal Corporation Fully integrated light bar
US7476013B2 (en) 2006-03-31 2009-01-13 Federal Signal Corporation Light bar and method for making
US7991613B2 (en) * 2006-09-29 2011-08-02 Verint Americas Inc. Analyzing audio components and generating text with integrated additional session information
US8838732B2 (en) * 2007-03-22 2014-09-16 Comscore, Inc. Data transfer for network interaction fraudulence detection
US8348839B2 (en) * 2007-04-10 2013-01-08 General Electric Company Systems and methods for active listening/observing and event detection
US8041848B2 (en) * 2008-08-04 2011-10-18 Apple Inc. Media processing method and device
JP5499038B2 (en) 2008-09-18 2014-05-21 コーニンクレッカ フィリップス エヌ ヴェ System control method and signal processing system
JP5526134B2 (en) 2008-09-18 2014-06-18 コーニンクレッカ フィリップス エヌ ヴェ Conversation detection in peripheral telephone technology systems.
US8306814B2 (en) * 2010-05-11 2012-11-06 Nice-Systems Ltd. Method for speaker source classification
US9036888B2 (en) * 2012-04-30 2015-05-19 General Electric Company Systems and methods for performing quality review scoring of biomarkers and image analysis methods for biological tissue
US9368116B2 (en) 2012-09-07 2016-06-14 Verint Systems Ltd. Speaker separation in diarization
US10242330B2 (en) * 2012-11-06 2019-03-26 Nice-Systems Ltd Method and apparatus for detection and analysis of first contact resolution failures
US10134400B2 (en) * 2012-11-21 2018-11-20 Verint Systems Ltd. Diarization using acoustic labeling
US9626963B2 (en) * 2013-04-30 2017-04-18 Paypal, Inc. System and method of improving speech recognition using context
US9460722B2 (en) 2013-07-17 2016-10-04 Verint Systems Ltd. Blind diarization of recorded calls with arbitrary number of speakers
US9984706B2 (en) 2013-08-01 2018-05-29 Verint Systems Ltd. Voice activity detection using a soft decision mechanism
BR112016004299B1 (en) 2013-08-28 2022-05-17 Dolby Laboratories Licensing Corporation METHOD, DEVICE AND COMPUTER-READABLE STORAGE MEDIA TO IMPROVE PARAMETRIC AND HYBRID WAVEFORM-ENCODIFIED SPEECH
US9875743B2 (en) * 2015-01-26 2018-01-23 Verint Systems Ltd. Acoustic signature building for a speaker from multiple sessions
US10043517B2 (en) 2015-12-09 2018-08-07 International Business Machines Corporation Audio-based event interaction analytics
US20180158462A1 (en) * 2016-12-02 2018-06-07 Cirrus Logic International Semiconductor Ltd. Speaker identification
JP6845489B2 (en) * 2017-03-07 2021-03-17 日本電気株式会社 Speech processor, speech processing method, and speech processing program
US11024316B1 (en) * 2017-07-09 2021-06-01 Otter.ai, Inc. Systems and methods for capturing, processing, and rendering one or more context-aware moment-associating elements
US11100943B1 (en) 2017-07-09 2021-08-24 Otter.ai, Inc. Systems and methods for processing and presenting conversations
US10978073B1 (en) 2017-07-09 2021-04-13 Otter.ai, Inc. Systems and methods for processing and presenting conversations
US10003688B1 (en) 2018-02-08 2018-06-19 Capital One Services, Llc Systems and methods for cluster-based voice verification
US11538128B2 (en) 2018-05-14 2022-12-27 Verint Americas Inc. User interface for fraud alert management
US11423911B1 (en) * 2018-10-17 2022-08-23 Otter.ai, Inc. Systems and methods for live broadcasting of context-aware transcription and/or other elements related to conversations and/or speeches
US10887452B2 (en) 2018-10-25 2021-01-05 Verint Americas Inc. System architecture for fraud detection
US11069352B1 (en) * 2019-02-18 2021-07-20 Amazon Technologies, Inc. Media presence detection
US11115521B2 (en) 2019-06-20 2021-09-07 Verint Americas Inc. Systems and methods for authentication and fraud detection
US11868453B2 (en) 2019-11-07 2024-01-09 Verint Americas Inc. Systems and methods for customer authentication based on audio-of-interest
CN112201275A (en) * 2020-10-09 2021-01-08 深圳前海微众银行股份有限公司 Voiceprint segmentation method, voiceprint segmentation device, voiceprint segmentation equipment and readable storage medium
US11676623B1 (en) 2021-02-26 2023-06-13 Otter.ai, Inc. Systems and methods for automatic joining as a virtual meeting participant for transcription

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5335618A (en) * 1993-07-22 1994-08-09 Stopgap Enterprises Collapsible animal enclosure
GB9706172D0 (en) * 1997-03-25 1997-05-14 Crane John Uk Ltd Improvements in and relating to spring energised plastic seals
US6751354B2 (en) * 1999-03-11 2004-06-15 Fuji Xerox Co., Ltd Methods and apparatuses for video segmentation, classification, and retrieval using image class statistical models
US20040016113A1 (en) * 2002-06-19 2004-01-29 Gerald Pham-Van-Diep Method and apparatus for supporting a substrate
DE20211390U1 (en) * 2002-07-10 2003-11-20 Dolmar Gmbh Adjustable suspension damping system (anti-vibration system), especially for a hand-held tool

Patent Citations (91)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4104539A (en) 1976-04-05 1978-08-01 Hase A M Parallel redundant and load sharing regulated AC system
US4145715A (en) 1976-12-22 1979-03-20 Electronic Management Support, Inc. Surveillance system
US4359679A (en) 1978-01-16 1982-11-16 Wescom Switching, Inc. Switching d-c. regulator and load-sharing system for multiple regulators
US4527151A (en) 1982-05-03 1985-07-02 Sri International Method and apparatus for intrusion detection
US4821118A (en) 1986-10-09 1989-04-11 Advanced Identification Systems, Inc. Video image system for personal identification
US4766364A (en) 1987-11-04 1988-08-23 International Business Machines Corporation Parallel power systems
US5353168A (en) 1990-01-03 1994-10-04 Racal Recorders Limited Recording and reproducing system using time division multiplexing
US5051827A (en) 1990-01-29 1991-09-24 The Grass Valley Group, Inc. Television signal encoder/decoder configuration control
US5091780A (en) 1990-05-09 1992-02-25 Carnegie-Mellon University A trainable security system emthod for the same
US5307170A (en) 1990-10-29 1994-04-26 Kabushiki Kaisha Toshiba Video camera having a vibrating image-processing operation
US5734441A (en) 1990-11-30 1998-03-31 Canon Kabushiki Kaisha Apparatus for detecting a movement vector or an image by detecting a change amount of an image density value
US5303045A (en) 1991-08-27 1994-04-12 Sony United Kingdom Limited Standards conversion of digital video signals
US5404170A (en) 1992-06-25 1995-04-04 Sony United Kingdom Ltd. Time base converter which automatically adapts to varying video input rates
US5519446A (en) 1993-11-13 1996-05-21 Goldstar Co., Ltd. Apparatus and method for converting an HDTV signal to a non-HDTV signal
US5491511A (en) 1994-02-04 1996-02-13 Odle; James A. Multimedia capture and audit system for a video surveillance network
US5598507A (en) * 1994-04-12 1997-01-28 Xerox Corporation Method of speaker clustering for unknown speakers in conversational audio data
US5655058A (en) * 1994-04-12 1997-08-05 Xerox Corporation Segmentation of audio data for indexing of conversational speech for real-time or postprocessing applications
US5659662A (en) * 1994-04-12 1997-08-19 Xerox Corporation Unsupervised speaker clustering for automatic speaker indexing of recorded audio data
US5920338A (en) 1994-04-25 1999-07-06 Katz; Barry Asynchronous video event and transaction data multiplexing technique for surveillance systems
WO1995029470A1 (en) 1994-04-25 1995-11-02 Barry Katz Asynchronous video event and transaction data multiplexing technique for surveillance systems
US6028626A (en) 1995-01-03 2000-02-22 Arc Incorporated Abnormality detection and surveillance system
US5847755A (en) 1995-01-17 1998-12-08 Sarnoff Corporation Method and apparatus for detecting object movement within an image sequence
US5751346A (en) 1995-02-10 1998-05-12 Dozier Financial Corporation Image retention and information security system
US5796439A (en) 1995-12-21 1998-08-18 Siemens Medical Systems, Inc. Video format conversion process and apparatus
US5742349A (en) 1996-05-07 1998-04-21 Chrontel, Inc. Memory efficient video graphics subsystem with vertical filtering and scan rate conversion
US6081606A (en) 1996-06-17 2000-06-27 Sarnoff Corporation Apparatus and a method for detecting motion within an image sequence
WO1998001838A1 (en) 1996-07-10 1998-01-15 Vizicom Limited Video surveillance system and method
US5895453A (en) 1996-08-27 1999-04-20 Sts Systems, Ltd. Method and system for the detection, management and prevention of losses in retail and other environments
US5790096A (en) 1996-09-03 1998-08-04 Allus Technology Corporation Automated flat panel display control system for accomodating broad range of video types and formats
US6404857B1 (en) 1996-09-26 2002-06-11 Eyretel Limited Signal monitoring apparatus for analyzing communications
US6031573A (en) 1996-10-31 2000-02-29 Sensormatic Electronics Corporation Intelligent video information management system performing multiple functions in parallel
US6037991A (en) 1996-11-26 2000-03-14 Motorola, Inc. Method and apparatus for communicating video information in a communication system
US6094227A (en) 1997-02-03 2000-07-25 U.S. Philips Corporation Digital image rate converting method and device
US6295367B1 (en) 1997-06-19 2001-09-25 Emtera Corporation System and method for tracking movement of objects in a scene using correspondence graphs
US6014647A (en) 1997-07-08 2000-01-11 Nizzari; Marcia M. Customer interaction tracking
US6097429A (en) 1997-08-01 2000-08-01 Esco Electronics Corporation Site control unit for video security system
US6111610A (en) 1997-12-11 2000-08-29 Faroudja Laboratories, Inc. Displaying film-originated video on high frame rate monitors without motions discontinuities
US6092197A (en) 1997-12-31 2000-07-18 The Customer Logic Company, Llc System and method for the secure discovery, exploitation and publication of information
US6704409B1 (en) 1997-12-31 2004-03-09 Aspect Communications Corporation Method and apparatus for processing real-time transactions and non-real-time transactions
US6327343B1 (en) 1998-01-16 2001-12-04 International Business Machines Corporation System and methods for automatic call and data transfer processing
US6134530A (en) 1998-04-17 2000-10-17 Andersen Consulting Llp Rule based routing system and method for a virtual sales and service center
US6070142A (en) 1998-04-17 2000-05-30 Andersen Consulting Llp Virtual customer sales and service center and method
US20010043697A1 (en) 1998-05-11 2001-11-22 Patrick M. Cox Monitoring of and remote access to call center activity
US6604108B1 (en) 1998-06-05 2003-08-05 Metasolutions, Inc. Information mart system and information mart browser
US6405166B1 (en) * 1998-08-13 2002-06-11 At&T Corp. Multimedia search apparatus and method for searching multimedia content using speaker detection by audio data
US6628835B1 (en) 1998-08-31 2003-09-30 Texas Instruments Incorporated Method and system for defining and recognizing complex events in a video sequence
US6230197B1 (en) 1998-09-11 2001-05-08 Genesys Telecommunications Laboratories, Inc. Method and apparatus for rules-based storage and retrieval of multimedia interactions within a communication center
US6170011B1 (en) 1998-09-11 2001-01-02 Genesys Telecommunications Laboratories, Inc. Method and apparatus for determining and initiating interaction directionality within a multimedia communication center
US6212178B1 (en) 1998-09-11 2001-04-03 Genesys Telecommunication Laboratories, Inc. Method and apparatus for selectively presenting media-options to clients of a multimedia call center
US6167395A (en) 1998-09-11 2000-12-26 Genesys Telecommunications Laboratories, Inc Method and apparatus for creating specialized multimedia threads in a multimedia communication center
US6345305B1 (en) 1998-09-11 2002-02-05 Genesys Telecommunications Laboratories, Inc. Operating system having external media layer, workflow layer, internal media layer, and knowledge base for routing media events between transactions
US6570608B1 (en) 1998-09-30 2003-05-27 Texas Instruments Incorporated System and method for detecting interactions of people and vehicles
US6138139A (en) 1998-10-29 2000-10-24 Genesys Telecommunications Laboraties, Inc. Method and apparatus for supporting diverse interaction paths within a multimedia communication center
US6549613B1 (en) 1998-11-05 2003-04-15 Ulysses Holding Llc Method and apparatus for intercept of wireline communications
US6421645B1 (en) * 1999-04-09 2002-07-16 International Business Machines Corporation Methods and apparatus for concurrent speech recognition, speaker segmentation and speaker classification
US6424946B1 (en) * 1999-04-09 2002-07-23 International Business Machines Corporation Methods and apparatus for unknown speaker labeling using concurrent speech recognition, segmentation, classification and clustering
US6330025B1 (en) 1999-05-10 2001-12-11 Nice Systems Ltd. Digital video logging system
WO2000073996A1 (en) 1999-05-28 2000-12-07 Glebe Systems Pty Ltd Method and apparatus for tracking a moving object
US7103806B1 (en) 1999-06-04 2006-09-05 Microsoft Corporation System for performing context-sensitive decisions about ideal communication modalities considering information about channel reliability
GB2352948A (en) 1999-07-13 2001-02-07 Racal Recorders Ltd Voice activity monitoring
US6427137B2 (en) 1999-08-31 2002-07-30 Accenture Llp System, method and article of manufacture for a voice analysis system that detects nervousness for preventing fraud
US20030033145A1 (en) 1999-08-31 2003-02-13 Petrushin Valery A. System, method, and article of manufacture for detecting emotion in voice signals by utilizing statistics for voice signal parameters
US6236582B1 (en) 2000-02-01 2001-05-22 Micro Linear Corporation Load share controller for balancing current between multiple supply modules
US20010052081A1 (en) 2000-04-07 2001-12-13 Mckibben Bernard R. Communication network with a service agent element and method for providing surveillance services
US20020005898A1 (en) 2000-06-14 2002-01-17 Kddi Corporation Detection apparatus for road obstructions
US20020010705A1 (en) 2000-06-30 2002-01-24 Lg Electronics Inc. Customer relationship management system and operation method thereof
US20020059283A1 (en) 2000-10-20 2002-05-16 Enteractllc Method and system for managing customer relations
WO2002037856A1 (en) 2000-11-06 2002-05-10 Dynapel Systems, Inc. Surveillance video camera enhancement system
US6441734B1 (en) 2000-12-12 2002-08-27 Koninklijke Philips Electronics N.V. Intruder detection through trajectory analysis in monitoring and surveillance systems
US20020087385A1 (en) 2000-12-28 2002-07-04 Vincent Perry G. System and method for suggesting interaction strategies to a customer service representative
US20040249650A1 (en) 2001-07-19 2004-12-09 Ilan Freedman Method apparatus and system for capturing and analyzing interaction based content
WO2003013113A2 (en) 2001-08-02 2003-02-13 Eyretel Plc Automatic interaction analysis between agent and customer
US20030059016A1 (en) 2001-09-21 2003-03-27 Eric Lieberman Method and apparatus for managing communications and for creating communication routing rules
US20030128099A1 (en) 2001-09-26 2003-07-10 Cockerham John M. System and method for securing a defined perimeter using multi-layered biometric electronic processing
US6559769B2 (en) 2001-10-01 2003-05-06 Eric Anthony Early warning real-time security system
WO2003067884A1 (en) 2002-02-06 2003-08-14 Nice Systems Ltd. Method and apparatus for video frame sequence-based object tracking
WO2003067360A2 (en) 2002-02-06 2003-08-14 Nice Systems Ltd. System and method for video content analysis-based detection, surveillance and alarm management
US20040161133A1 (en) 2002-02-06 2004-08-19 Avishai Elazar System and method for video content analysis-based detection, surveillance and alarm management
US20030163360A1 (en) 2002-02-25 2003-08-28 Galvin Brian R. System and method for integrated resource scheduling and agent work management
US20030182118A1 (en) * 2002-03-25 2003-09-25 Pere Obrador System and method for indexing videos based on speaker distinction
US20040141508A1 (en) 2002-08-16 2004-07-22 Nuasis Corporation Contact center architecture
US7295970B1 (en) * 2002-08-29 2007-11-13 At&T Corp Unsupervised speaker segmentation of multi-speaker speech data
US7076427B2 (en) 2002-10-18 2006-07-11 Ser Solutions, Inc. Methods and apparatus for audio data monitoring and evaluation using speech recognition
US20040098295A1 (en) 2002-11-15 2004-05-20 Iex Corporation Method and system for scheduling workload
WO2004091250A1 (en) 2003-04-09 2004-10-21 Telefonaktiebolaget Lm Ericsson (Publ) Lawful interception of multimedia calls
EP1484892A2 (en) 2003-06-05 2004-12-08 Nortel Networks Limited Method and system for lawful interception of packet switched network services
US20040260550A1 (en) * 2003-06-20 2004-12-23 Burges Chris J.C. Audio processing system and method for classifying speakers in audio data
DE10358333A1 (en) 2003-12-12 2005-07-14 Siemens Ag Telecommunication monitoring procedure uses speech and voice characteristic recognition to select communications from target user groups
US20050228673A1 (en) * 2004-03-30 2005-10-13 Nefian Ara V Techniques for separating and evaluating audio and video source data
US20060093135A1 (en) 2004-10-20 2006-05-04 Trevor Fiatal Method and apparatus for intercepting events in a communication system
US20060229876A1 (en) * 2005-04-07 2006-10-12 International Business Machines Corporation Method, apparatus and computer program providing a multi-speaker database for concatenative text-to-speech synthesis

Non-Patent Citations (14)

* Cited by examiner, † Cited by third party
Title
(Hebrew) print from Haaretz, "The Computer at the Other End of the Line", Feb. 17, 2002.
Bimbot, et al.; A Tutorial on Text-Independent Speaker Verification; EURASIP Journal on Applied Signal Processing; pp. 430-451; © 2004 Hindawi Publishing Corporation.
Chaudhari, et al.; Very Large Population Text-Independent Speaker Identification Using Transformation Enhanced Multi-Grained Models; IBM T.J. Watson Research Center; Oct. 2000.
Financial companies want to turn regulatory burden into competitive advantage, Feb. 24, 2003, printed from Information Week, http://www.informationweek.com/story/IWK20030223S0002.
Freedman, I. Closing the Contact Center Quality Loop with Customer Experience Management, Customer Interaction Solutions, vol. 19, No. 9, Mar. 2001.
Lawrence P. Mark; Sertainty™ Automated Quality Assurance; © 2003-2005 SER Solutions, Inc.
Muthusamy, et al.; Reviewing Automatic Language Identification; IEEE Signal Processing Magazine; pp. 33-41; Oct. 1994.
PR Newswire, Nice Redefines Customer Interactions with Launch of Customer Experience Management, Jun. 13, 2000.
PR Newswire, Recognition Systems and Hyperion to Provide Closed Loop CRM Analytic Applications, Nov. 17, 1999.
Reynolds, et al.; Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models; IEEE Transactions on Speech and Audio Processing, vol. 3, No. 1, pp. 72-83; Jan. 1995.
Reynolds, et al.; Speaker Verification Using Adapted Gaussian Mixture Models; M.I.T. Lincoln Laboratory; Digital Signal Processing 10, pp. 19-41 (Oct. 1, 2000).
SERTAINTY™ Agent Performance Optimization; © 2005 SER Solutions, Inc. www.ser.com.
SERTAINTY™ Automated Quality Monitoring; © 2003 SER Solutions, Inc. www.ser.com.
Zissman; Comparison of Four Approaches to Automatic Language Identification of Telephone Speech; IEEE Transactions on Speech and Audio Processing, vol. 4, No. 1, pp. 31-44; Jan. 1996.

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090103708A1 (en) * 2007-09-28 2009-04-23 Kelly Conway Methods and systems for determining segments of a telephonic communication between a customer and a contact center to classify each segment of the communication, assess negotiations, and automate setup time calculation
US8611523B2 (en) * 2007-09-28 2013-12-17 Mattersight Corporation Methods and systems for determining segments of a telephonic communication between a customer and a contact center to classify each segment of the communication, assess negotiations, and automate setup time calculation
US8145482B2 (en) * 2008-05-25 2012-03-27 Ezra Daya Enhancing analysis of test key phrases from acoustic sources with key phrase training models
US20090292541A1 (en) * 2008-05-25 2009-11-26 Nice Systems Ltd. Methods and apparatus for enhancing speech analytics
US8676586B2 (en) * 2008-09-16 2014-03-18 Nice Systems Ltd Method and apparatus for interaction or discourse analytics
US20100070276A1 (en) * 2008-09-16 2010-03-18 Nice Systems Ltd. Method and apparatus for interaction or discourse analytics
US20130006635A1 (en) * 2009-11-15 2013-01-03 International Business Machines Method and system for speaker diarization
US20110119060A1 (en) * 2009-11-15 2011-05-19 International Business Machines Corporation Method and system for speaker diarization
US8554562B2 (en) * 2009-11-15 2013-10-08 Nuance Communications, Inc. Method and system for speaker diarization
US8554563B2 (en) * 2009-11-15 2013-10-08 Nuance Communications, Inc. Method and system for speaker diarization
US8417524B2 (en) * 2010-02-11 2013-04-09 International Business Machines Corporation Analysis of the temporal evolution of emotions in an audio interaction in a service delivery environment
US20110196677A1 (en) * 2010-02-11 2011-08-11 International Business Machines Corporation Analysis of the Temporal Evolution of Emotions in an Audio Interaction in a Service Delivery Environment
US9711167B2 (en) 2012-03-13 2017-07-18 Nice Ltd. System and method for real-time speaker segmentation of audio interactions
US20140172427A1 (en) * 2012-12-14 2014-06-19 Robert Bosch Gmbh System And Method For Event Summarization Using Observer Social Media Messages
US10224025B2 (en) * 2012-12-14 2019-03-05 Robert Bosch Gmbh System and method for event summarization using observer social media messages
US20150088513A1 (en) * 2013-09-23 2015-03-26 Hon Hai Precision Industry Co., Ltd. Sound processing system and related method
US9472188B1 (en) * 2013-11-15 2016-10-18 Noble Systems Corporation Predicting outcomes for events based on voice characteristics and content of a contact center communication
US9552812B1 (en) * 2013-11-15 2017-01-24 Noble Systems Corporation Predicting outcomes for events based on voice characteristics and content of a voice sample of a contact center communication
US9779729B1 (en) * 2013-11-15 2017-10-03 Noble Systems Corporation Predicting outcomes for events based on voice characteristics and content of a voice sample of a contact center communication
US10642889B2 (en) 2017-02-20 2020-05-05 Gong I.O Ltd. Unsupervised automated topic detection, segmentation and labeling of conversations
US11276407B2 (en) 2018-04-17 2022-03-15 Gong.Io Ltd. Metadata-based diarization of teleconferences
US20220254336A1 (en) * 2019-08-12 2022-08-11 100 Brevets Pour La French Tech Method and system for enriching digital content representative of a conversation

Also Published As

Publication number Publication date
US20080181417A1 (en) 2008-07-31
WO2007086042A3 (en) 2009-05-07
WO2007086042A2 (en) 2007-08-02

Similar Documents

Publication Publication Date Title
US7716048B2 (en) Method and apparatus for segmentation of audio interactions
US8219404B2 (en) Method and apparatus for recognizing a speaker in lawful interception systems
US20240021206A1 (en) Diarization using acoustic labeling
US8306814B2 (en) Method for speaker source classification
US8311824B2 (en) Methods and apparatus for language identification
US9711167B2 (en) System and method for real-time speaker segmentation of audio interactions
US8412530B2 (en) Method and apparatus for detection of sentiment in automated transcriptions
US8676586B2 (en) Method and apparatus for interaction or discourse analytics
US8798255B2 (en) Methods and apparatus for deep interaction analysis
US8571853B2 (en) Method and system for laughter detection
US9093081B2 (en) Method and apparatus for real time emotion detection in audio interactions
CN101547261B (en) Association apparatus and association method
US8990090B1 (en) Script compliance using speech recognition
US20080040110A1 (en) Apparatus and Methods for the Detection of Emotions in Audio Interactions
US8145482B2 (en) Enhancing analysis of test key phrases from acoustic sources with key phrase training models
US20090150152A1 (en) Method and apparatus for fast search in call-center monitoring
US20110004473A1 (en) Apparatus and method for enhanced speech recognition
Mamou et al. Spoken document retrieval from call-center conversations
US20080195387A1 (en) Method and apparatus for large population speaker identification in telephone interactions
CN103377651A (en) Device and method for automatic voice synthesis
Thomas et al. Expressions of style in information seeking conversation with an agent
Nandwana et al. Analysis of Critical Metadata Factors for the Calibration of Speaker Recognition Systems.
Friedland et al. Live speaker identification in conversations
Fu et al. Improving meeting inclusiveness using speech interruption analysis
Foote et al. Finding presentations in recorded meetings using audio and video features

Legal Events

Date Code Title Description
AS Assignment

Owner name: NICE SYSTEMS LTD., ISRAEL

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PEREG, OREN;WASERBLAT, MOSHE;REEL/FRAME:017570/0325

Effective date: 20060126

AS Assignment

Owner name: DAIMLER AG, GERMANY

Free format text: CHANGE OF NAME;ASSIGNOR:DAIMLERCHRYSLER AG;REEL/FRAME:020976/0889

Effective date: 20071019

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

AS Assignment

Owner name: JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT, ILLINOIS

Free format text: PATENT SECURITY AGREEMENT;ASSIGNORS:NICE LTD.;NICE SYSTEMS INC.;AC2 SOLUTIONS, INC.;AND OTHERS;REEL/FRAME:040821/0818

Effective date: 20161114

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552)

Year of fee payment: 8

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 12