WO2007118709A1 - A method for detecting a commercial in a video data stream by evaluating descriptor information - Google Patents

A method for detecting a commercial in a video data stream by evaluating descriptor information

Info

Publication number
WO2007118709A1
Authority
WO
WIPO (PCT)
Prior art keywords
video data
sub
descriptor information
commercial
evaluating
Prior art date
Application number
PCT/EP2007/003409
Other languages
French (fr)
Inventor
Ronald Glasberg
Thomas Sikora
Cengiz Tas
Original Assignee
Technische Universität Berlin
Priority date
Filing date
Publication date
Application filed by Technische Universität Berlin
Publication of WO2007118709A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content

Definitions

  • the invention relates to a method for detecting a commercial in a video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features.
  • a known descriptor refers to the appearance of several monochrome black frames also referred to as separating frames or dark frames between each commercial block.
  • Lienhart et al. published an approach (R. Lienhart et al., IEEE Conference on Multimedia Computing and Systems, pp. 509 - 516, 1997), requiring that the average and the standard deviation intensity values of the pixels in these frames should be below a certain threshold.
  • Sadlier et al. (International Conference on Enterprise Information Systems, pp. 449 - 452, 2001) designed a method to detect black frames using the DC-coefficients in an MPEG-1-encoded bit stream.
  • a method for detecting a commercial in a video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features comprising the steps of: detecting a plurality of video data frames in the video data stream; analyzing for each video data frame of the plurality of video data frames sub-areas where an essentially static logo is likely to be broadcasted; generating from the analysis of the sub-areas static area descriptor information; and using the static area descriptor information in the step of evaluating the descriptor information.
  • a method for detecting a commercial in a video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features comprising the steps of: detecting a plurality of video data frames in the video data stream; deciding for each video data frame whether the video data frame is a separating frame by analyzing for each video data frame a plurality of sub-images; generating separating block descriptor information; and using the separating block descriptor information in the step of evaluating the descriptor information.
  • a method for detecting a commercial in a video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features, wherein the descriptor information provided for the plurality of descriptors is evaluated in an entropy based decision process.
  • the invention provides the advantage that detection of commercials in a video data stream is faster and more reliable.
  • the techniques provided are especially suitable for real-time application.
  • Static area descriptor information is provided from a fast descriptor which detects the presence of a transparent or non-transparent static logo by detecting sub-areas instead of recognizing logos.
  • the recognition is computationally expensive and, therefore, not suitable for real-time application.
  • the separating block descriptor information prevents false acceptance of dark frames as separating frames, so that a very high detection accuracy of separating blocks is achieved.
  • the descriptor information is logically combined with a simple classifier to produce a reliable commercial detection, instead of using a complex classifier.
  • the step of analyzing the sub-areas comprises a step of analyzing sub-areas located in corner sections of the video data frames.
  • the step of analyzing the sub-areas further comprises steps of: generating for each video data frame of the plurality of video data frames values of luminance for each of the sub-areas; storing a plurality of darkest values of luminance for each of the sub-areas; generating for each of the sub-areas an average value of luminance; and generating the static area descriptor information as indicative of the commercial if for at least one of the sub-areas the average value of luminance exceeds a threshold value.
  • the method further comprises the steps of: deciding for each video data frame whether the video data frame is a separating frame by analyzing for each video data frame a plurality of sub-images; generating separating block descriptor information; and using the separating block descriptor information in the step of evaluating the descriptor information.
  • the step of analyzing the plurality of sub-images further comprises steps of: generating for each of the sub-images an average value of luminance; and comparing for each of the sub-images the average value of luminance to a threshold value of luminance.
  • the step of analyzing the plurality of sub-images further comprises steps of: generating for each of the sub-images a value of variance; and comparing for each of the sub-images the value of variance to a threshold value of variance.
  • the step of analyzing the plurality of sub-images further comprises steps of: detecting a number of consecutive separating frames; and comparing the detected number of consecutive separating frames to a pre-defined number of consecutive separating frames.
  • the step of analyzing the plurality of sub-images further comprises steps of: detecting a separating block of separating frames; and detecting a time distance of the detected separating block to a previous separating block of separating frames.
  • the entropy based decision process uses an ID3 algorithm. The descriptor information is logically combined with a simple classifier, instead of a complex classifier, to produce a reliable commercial detection.
  • Fig. 1 shows schematically a concept of a video-genre-classification system
  • Fig. 2 shows schematically a representation of a method for detecting a commercial in a video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features;
  • Fig. 3 shows a plurality of candidate video data frames for false acceptance as a separating block being part of a fade
  • Fig. 4 shows schematically a representation of a structure of video data frames of a commercial
  • Fig. 5 shows schematically a block diagram representation of a process of generating separating block descriptor information
  • Fig. 6 shows schematically a block diagram representation of a process of generating static area descriptor information
  • Fig. 7 shows schematically a block diagram representation of a process of generating hard-cut-rate descriptor information
  • Fig. 8 shows schematically a representation of the appearance of features in a commercial
  • Fig. 9 shows schematically a block diagram representation of a process of evaluating descriptor information in an entropy based decision tree process.
  • a method for detecting a commercial in a video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features is described.
  • the video data can be compressed.
  • Fig. 1 shows schematically a concept of a video-genre-classification system.
  • Input to the system is a video data stream 10 received, for example, by a device 20 over an antenna, cable, internet, satellite or a DVD player.
  • descriptors analyze video data frames, extract features and combine them with a classifier to a genre. Therefore the new system enables users to access programs shown on a user device 40 to be clustered by genres.
  • Fig. 2 shows schematically a representation of a method for detecting a commercial in the video data stream 10 by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features. Extracted information from different visual descriptors, namely a static area descriptor 60, a separating block descriptor 80, and hard-cut-rate descriptor 70, of consecutive video data frames 50 are logically combined using a decision tree based evaluation 90 to produce a reliable recognition 100.
  • Fig. 3 shows a plurality of candidate video data frames 110 for false acceptance as a separating block being part of a fade.
  • the descriptor which examines sub-images of each video data frame, the number of consecutive dark frames and the time-distance to previous separating blocks, the false acceptance of the presented dark frame as a separating block is prevented.
  • Fig. 4 shows schematically a representation of video data frames of a commercial 200. The following features are depicted:
  • descriptors are proposed to transform the video data into feature vectors.
  • a first step all consecutive video data frames of the video data stream are saved temporarily for processing.
  • descriptor information is generated for a plurality of descriptors.
  • a process of generating separating block descriptor information is described.
  • the following steps are performed: 1) A color transformation to receive the luminance information by taking each consecutive video data frame and transforming it in a color space, where the luminance signal Y is directly available (e.g. YCBCR) 300.
  • An average value of luminance and a variance value of luminance are generated for each video data frame by determining the average luminance Lμ as well as the variance Lvar of the pixels for each consecutive video data frame 310. All candidate separating frames have to be below a certain threshold in these values.
  • a value of average luminance Lμ,Sb for 3x3 sub-images of the video data frames is provided 320.
  • the number of consecutive separating frames is counted.
  • the number of consecutive separating frames, satisfying the requirements mentioned in step 3), is counted and has to be in a certain range (appearance of 3 to 14 separating frames between each commercial) 330.
  • the time distance to previous blocks of separating frames is considered. This is called a separating block f1 if the time distance to the previous separating block fulfills the restrictions regarding the duration of individual commercials (more than 10 and less than 60 seconds) 340.
  • the present method prevents false acceptance of separating frames, also referred to as black or dark frames, that belong to fades within a commercial spot but still show a small area of information.
  • the proposed descriptor analyses sub-images of a video data frame and, optionally, the time-distance between separating blocks.
  • a process of generating static area descriptor information is described. It is investigated whether a TV-logo is present or not. This is done by examining whether a static field is present in the scanned areas of interest. In the present embodiment, the following steps are performed:
  • Pixel luminance values in scanned sub-areas c0-c3 of a first examined video data frame are saved.
  • Corresponding luminance values Y of the pixels in the scanned sub-areas c0-c3 of the first video data frame of a processing window N are saved in a buffer Lbuffer.
  • the darkest pixel luminance values are stored.
  • the current values LactFrame in the scanned sub-areas are compared with Lbuffer in the scanned sub-areas.
  • the darkest values of the search for each sub-area are stored 410.
  • a binary image is generated. After a length of N video data frames, a binary image is generated by comparing the resulting values to a threshold 420.
  • An average value of luminance for each scanned sub-area is generated. For each of the four scanned sub-areas c0-c3 the average luminance value is calculated separately 430. It is decided whether a static area is present. If the average luminance value μc of at least one of the four sub-areas c0-c3 is higher than zero, static pixels f2 are detected, and the probability of a TV-logo, present in this sub-area, is high 440.
  • the difference of two consecutive I-frames represents a hard-cut if this difference exceeds a certain threshold.
  • the amount of these values is averaged over a window N resulting in the hard cut frequency f3 500.
  • one descriptor was implemented which builds on the motion-activity information included in the mpeg-2 stream.
  • the descriptors were applied to a set of training video data.
  • case 1 a separating block and non-static areas (no-logos) were detected within a decision window.
  • case 2 either a high hard-cut-rate or no-logo was detected, provided that case 1 occurred within the last 100 I-frames.
  • no-logo and a high hard-cut-frequency were simultaneously detected.
  • Fig. 9 the case with the lowest entropy forms a first node 600.
  • the resulting decision tree is shown in Fig. 9. The following steps are performed: (i) First it is examined whether case 1 (see Fig. 8) appeared within the decision window of N (50 consecutive I-frames). If yes, a commercial has been detected. (ii) If case 1 did not appear within N, it is checked whether case 2 (see Fig. 8) appeared within 2*N. If yes, again a commercial has been detected. (iii) If case 1 and case 2 did not appear, but case 3 appeared, a commercial has been detected.
  • the performance of each descriptor on the training-data was examined.
  • the separating block descriptor should detect the transition from a running spot to a new spot.
  • Table 3 shows the amount of windows N within each genre and the detected number of windows with non-static areas.
  • Table 4 shows the performance of the hard-cut-rate descriptor.
  • Table 5 shows the detection rates for the 20 video sequences of each genre used in our experiments; there were 100 video data streams for testing.
  • Table 5 Probability for video being detected as Commercial
  • 91 of 98 examined windows from the genre 'commercial' were classified as commercial. The remaining 7 windows were very close to the commercial-detection threshold of 50%. Those sequences started within a commercial, after the appearance of a separating block, presented a long spot with a 'company-logo' and had a low hard-cut-frequency. It is obvious that detection of sequences with 'special cases' is highly unreliable. It is interesting to note that only 1 of 105 examined windows of the genre 'music' and 1 of 95 windows of 'news' were misclassified (caused by non-static areas and high cut-rate). The genres 'cartoon' and 'sport' were correctly classified in more than 90 windows.
  • a new approach for the detection of commercials is presented.
  • three contributions to optimization of commercial detection in video data streams have been made.
  • New visual descriptors are provided.
  • the temporal relations of the features are evaluated.
  • a decision tree process is proposed to combine the results of the visual descriptors, deriving a probability rate for a video sequence being a 'commercial' or 'non-commercial'.
  • a video database containing five popular genres namely cartoon, commercial, music, news and sports has been used.
  • An average correct classification rate of 93% for commercial-videos detected as a 'commercial' and more than 99% for the other genres detected as a 'non-commercial' has been achieved.

Abstract

Systems, methods, and devices for detecting a commercial in a video data stream by evaluating descriptor information provided from a plurality of visual descriptors indicative of commercial or non-commercial features. The descriptor information provided from the plurality of descriptors may be evaluated in an entropy based decision process.

Description

A method for detecting a commercial in a video data stream by evaluating descriptor information
The invention relates to a method for detecting a commercial in a video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features.
Background of the invention
With the advent of digital TV-broadcasts and video libraries presenting more than a hundred channels at a time over antenna, cable, internet and satellite, the need for user-friendly TV-program selection is growing. Unlike the present TV and Internet, a new system should enable users to access programs clustered by genres.
There are several approaches addressing commercial-detection and video-classification. Satterwhite et al. (IEEE Potentials, pp. 9 - 12, 2004) describe the characteristics of commercials and give an overview of several algorithms which have been experimentally used for detection. Usually so-called descriptor information is evaluated for detecting commercials in a video data stream. A descriptor can be considered as a filter extracting indicative parameters. A descriptor can extract commercial-specific features from a video data stream.
A known descriptor refers to the appearance of several monochrome black frames, also referred to as separating frames or dark frames, between each commercial block. In this context Lienhart et al. published an approach (R. Lienhart et al., IEEE Conference on Multimedia Computing and Systems, pp. 509 - 516, 1997) requiring that the average and the standard deviation of the intensity values of the pixels in these frames should be below a certain threshold. Sadlier et al. (International Conference on Enterprise Information Systems, pp. 449 - 452, 2001) designed a method to detect black frames using the DC-coefficients in an MPEG-1-encoded bit stream.
Information on the removal of the TV-logo (network logo) during the commercial blocks is another descriptor. The recognition of logos, for example, is described in R. J. M. den Hollander et al. (International Conference on Image Processing, volume 3, pp. 517 - 520, 2003). These methods are computationally expensive and therefore not suitable for our real-time application.
Summary of the invention
It is the object of the invention to provide techniques for detecting a commercial in a video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features, where the commercials are detected with higher likelihood.
According to one aspect of the invention, a method for detecting a commercial in a video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features is provided, the method comprising the steps of: detecting a plurality of video data frames in the video data stream; analyzing for each video data frame of the plurality of video data frames sub-areas where an essentially static logo is likely to be broadcasted; generating from the analysis of the sub-areas static area descriptor information; and using the static area descriptor information in the step of evaluating the descriptor information.
According to another aspect of the invention, a method for detecting a commercial in a video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features is provided, the method comprising the steps of: detecting a plurality of video data frames in the video data stream; deciding for each video data frame whether the video data frame is a separating frame by analyzing for each video data frame a plurality of sub-images; generating separating block descriptor information; and using the separating block descriptor information in the step of evaluating the descriptor information.
According to still another aspect of the invention, a method for detecting a commercial in a video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features is provided, wherein the descriptor information provided for the plurality of descriptors is evaluated in an entropy based decision process.
The invention provides the advantage that detection of commercials in a video data stream is faster and more reliable. The techniques provided are especially suitable for real-time application.
Static area descriptor information is provided from a fast descriptor which detects the presence of a transparent or non-transparent static logo by detecting sub-areas instead of recognizing logos. The recognition is computationally expensive and, therefore, not suitable for real-time application. The separating block descriptor information prevents false acceptance of dark frames as separating frames, so that a very high detection accuracy of separating blocks is achieved. In the entropy based process the descriptor information is logically combined with a simple classifier to produce a reliable commercial detection, instead of using a complex classifier.
Preferably, the step of analyzing the sub-areas comprises a step of analyzing sub-areas located in corner sections of the video data frames.
In another preferred embodiment, the step of analyzing the sub-areas further comprises steps of: generating for each video data frame of the plurality of video data frames values of luminance for each of the sub-areas; storing a plurality of darkest values of luminance for each of the sub-areas; generating for each of the sub-areas an average value of luminance; and generating the static area descriptor information as indicative of the commercial if for at least one of the sub-areas the average value of luminance exceeds a threshold value.
The following advantageous embodiments may be provided where separating block descriptor information is evaluated.
In a preferred embodiment, the method further comprises the steps of: deciding for each video data frame whether the video data frame is a separating frame by analyzing for each video data frame a plurality of sub-images; generating separating block descriptor information; and using the separating block descriptor information in the step of evaluating the descriptor information.
In a refinement of the invention, the step of analyzing the plurality of sub-images further comprises steps of: generating for each of the sub-images an average value of luminance; and comparing for each of the sub-images the average value of luminance to a threshold value of luminance.
In a preferred embodiment, the step of analyzing the plurality of sub-images further comprises steps of: generating for each of the sub-images a value of variance; and comparing for each of the sub-images the value of variance to a threshold value of variance.
In a further preferred embodiment, the step of analyzing the plurality of sub-images further comprises steps of: detecting a number of consecutive separating frames; and comparing the detected number of consecutive separating frames to a pre-defined number of consecutive separating frames. Preferably, the step of analyzing the plurality of sub-images further comprises steps of: detecting a separating block of separating frames; and detecting a time distance of the detected separating block to a previous separating block of separating frames.
In a preferred embodiment, the entropy based decision process uses an ID3 algorithm. The descriptor information is logically combined with a simple classifier, instead of a complex classifier, to produce a reliable commercial detection.
Description of preferred embodiments of the invention
In the following, the invention will be described in further detail, by way of example, with reference to different embodiments. In the figures,
Fig. 1 shows schematically a concept of a video-genre-classification system;
Fig. 2 shows schematically a representation of a method for detecting a commercial in a video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features;
Fig. 3 shows a plurality of candidate video data frames for false acceptance as a separating block being part of a fade;
Fig. 4 shows schematically a representation of a structure of video data frames of a commercial;
Fig. 5 shows schematically a block diagram representation of a process of generating separating block descriptor information;
Fig. 6 shows schematically a block diagram representation of a process of generating static area descriptor information;
Fig. 7 shows schematically a block diagram representation of a process of generating hard-cut-rate descriptor information;
Fig. 8 shows schematically a representation of the appearance of features in a commercial; and
Fig. 9 shows schematically a block diagram representation of a process of evaluating descriptor information in an entropy based decision tree process.
Referring to Fig. 1 to 9, a method for detecting a commercial in a video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features is described. The video data can be compressed.
Fig. 1 shows schematically a concept of a video-genre-classification system. Input to the system is a video data stream 10 received, for example, by a device 20 over an antenna, cable, internet, satellite or a DVD player. In a feature extraction and classification stage 30, descriptors analyze video data frames and extract features, which a classifier combines to determine a genre. The system thus enables users to access programs, shown on a user device 40, clustered by genres.
In an overview, Fig. 2 shows schematically a representation of a method for detecting a commercial in the video data stream 10 by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features. Information extracted from consecutive video data frames 50 by different visual descriptors, namely a static area descriptor 60, a separating block descriptor 80, and a hard-cut-rate descriptor 70, is logically combined using a decision tree based evaluation 90 to produce a reliable recognition 100. Although optimized results are provided by evaluating descriptor information from several descriptors, improved commercial detection compared to the prior art is also achieved by using at least one of the newly proposed evaluation features.
Fig. 3 shows a plurality of candidate video data frames 110 for false acceptance as a separating block being part of a fade. With the descriptor, which examines sub-images of each video data frame, the number of consecutive dark frames and the time-distance to previous separating blocks, the false acceptance of the presented dark frame as a separating block is prevented.
Commercials for TV have several characteristic features in common. Fig. 4 shows schematically a representation of video data frames of a commercial 200. The following features are depicted:
(a) appearance of 3 to 14 separating video data frames 210 (black frames) between each commercial block 220,
(b) duration of individual commercials more than 10 seconds and less than 60 seconds,
(c) removal of the TV-logo during the commercial breaks, and
(d) high number of hard-cuts within the commercials.
Based on these observations, three descriptors are proposed to transform the video data into feature vectors. In a first step all consecutive video data frames of the video data stream are saved temporarily for processing. Subsequently, descriptor information is generated for a plurality of descriptors.
Referring to Fig. 5, a process of generating separating block descriptor information is described. In the present embodiment, the following steps are performed:
1) A color transformation is performed to obtain the luminance information: each consecutive video data frame is transformed into a color space in which the luminance signal Y is directly available (e.g. YCbCr) 300.
2) An average value of luminance and a variance value of luminance are generated for each video data frame by determining the average luminance Lμ as well as the variance Lvar of the pixels for each consecutive video data frame 310. All candidate separating frames have to be below a certain threshold in these values.
3) A value of average luminance Lμ,Sb for 3x3 sub-images of the video data frames is provided 320. For each candidate dark frame, the average luminance Lμ,Sb of the 3x3 sub-images of the selected frame is examined. If the 3x3=9 average luminance values are below a threshold, the video data frame is declared a separating frame (black frame or dark frame).
4) The number of consecutive separating frames is counted. The number of consecutive separating frames satisfying the requirements mentioned in step 3) is counted and has to be in a certain range (appearance of 3 to 14 separating frames between each commercial) 330.
5) The time distance to previous blocks of separating frames is considered. This is called a separating block f1 if the time distance to the previous separating block fulfills the restrictions regarding the duration of individual commercials (more than 10 and less than 60 seconds) 340.
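Steps 2) to 5) can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: frames are taken to be luminance (Y) planes given as lists of rows, the threshold values MEAN_T, VAR_T and SUB_T are invented (the text only speaks of "a certain threshold"), the frame rate is a parameter, the time distance is measured between the starts of two runs, and all function names are hypothetical.

```python
from statistics import mean, pvariance

# Illustrative thresholds only; the text does not give numeric values.
MEAN_T = 30.0   # assumed limit for the frame-wide average luminance
VAR_T = 400.0   # assumed limit for the frame-wide luminance variance
SUB_T = 30.0    # assumed limit for each sub-image average


def sub_image_means(frame, rows=3, cols=3):
    """Average luminance of each cell of a rows x cols grid laid over the frame.

    Assumes the frame has at least `rows` rows and `cols` columns.
    """
    h, w = len(frame), len(frame[0])
    means = []
    for r in range(rows):
        for c in range(cols):
            cell = [frame[y][x]
                    for y in range(r * h // rows, (r + 1) * h // rows)
                    for x in range(c * w // cols, (c + 1) * w // cols)]
            means.append(mean(cell))
    return means


def is_separating_frame(frame):
    """Steps 2 and 3: frame-wide mean/variance test, then all 9 sub-image means."""
    pixels = [p for row in frame for p in row]
    if mean(pixels) >= MEAN_T or pvariance(pixels) >= VAR_T:
        return False
    return all(m < SUB_T for m in sub_image_means(frame))


def separating_blocks(frames, fps=25.0, min_len=3, max_len=14,
                      min_gap=10.0, max_gap=60.0):
    """Steps 4 and 5: start indices of runs accepted as separating blocks.

    A run of 3 to 14 consecutive separating frames is accepted when its start
    lies more than 10 s and less than 60 s after the start of the previous
    run of valid length (an assumed reading of "time distance").
    """
    flags = [is_separating_frame(f) for f in frames] + [False]  # sentinel ends last run
    runs, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if min_len <= i - start <= max_len:
                runs.append(start)
            start = None
    blocks = []
    for prev, cur in zip(runs, runs[1:]):
        if min_gap < (cur - prev) / fps < max_gap:
            blocks.append(cur)
    return blocks
```

A mostly dark frame that still shows a small brighter patch passes the frame-wide test in this sketch but fails the sub-image test, which mirrors the fade case the descriptor is meant to reject.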
Compared to the prior art, the present method prevents false acceptance of separating frames, also referred to as black or dark frames, that belong to fades within a commercial spot but still show a small area of information. The proposed descriptor analyses sub-images of a video data frame and, optionally, the time-distance between separating blocks.
Referring to Fig. 6, a process of generating static area descriptor information is described. It is investigated whether a TV-logo is present or not. This is done by examining whether a static field is present in the scanned areas of interest. In the present embodiment, the following steps are performed:
10) A color transformation is performed to obtain the luminance information: each consecutive video data frame is transformed into a color space in which the luminance signal Y is directly available (e.g. YCbCr) 400.
11) Pixel luminance values in scanned sub-areas c0-c3 of a first examined video data frame are saved. Corresponding luminance values Y of the pixels in the scanned sub-areas c0-c3 of the first video data frame of a processing window N are saved in a buffer Lbuffer.
12) The darkest pixel luminance values are stored. For each consecutive frame, the current values LactFrame in the scanned sub-areas are compared with Lbuffer in the scanned sub-areas. The darkest values of the search for each sub-area are stored 410.
13) A binary image is generated. After a length of N video data frames, a binary image is generated by comparing the resulting values to a threshold 420.
14) An average value of luminance for each scanned sub-area is generated. For each of the four scanned sub-areas c0-c3 the average luminance value is calculated separately 430.
15) It is decided whether a static area is present. If the average luminance value μc of at least one of the four sub-areas c0-c3 is higher than zero, static pixels f2 are detected, and the probability of a TV-logo, present in this sub-area, is high 440.
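Steps 11) to 15) amount to a per-pixel running minimum over the window followed by a threshold, which the following Python sketch illustrates. The square corner sub-areas, their size, the binarisation threshold and the function names are assumptions made for the example; frames are again luminance planes given as lists of rows.

```python
def corner_regions(height, width, size):
    """Pixel coordinates of four corner sub-areas c0..c3 (assumed size x size squares)."""
    tops = (0, height - size)
    lefts = (0, width - size)
    return [[(y, x) for y in range(t, t + size) for x in range(l, l + size)]
            for t in tops for l in lefts]


def static_area_descriptor(frames, size=2, threshold=100):
    """Return (static_area_present, per-sub-area averages) for one window of frames."""
    h, w = len(frames[0]), len(frames[0][0])
    regions = corner_regions(h, w, size)
    # Step 11: initialise the buffer with the sub-area pixels of the first frame.
    buffers = [[frames[0][y][x] for (y, x) in region] for region in regions]
    # Step 12: keep the darkest value seen so far for every buffered pixel.
    for frame in frames[1:]:
        for buf, region in zip(buffers, regions):
            for i, (y, x) in enumerate(region):
                buf[i] = min(buf[i], frame[y][x])
    # Steps 13 and 14: binarise against the threshold, then average per sub-area.
    averages = [sum(1 for v in buf if v > threshold) / len(buf) for buf in buffers]
    # Step 15: a static area is assumed present if any sub-area average is above zero.
    return any(a > 0 for a in averages), averages
```

The idea behind the running minimum is that a pixel belonging to a bright static logo stays bright in every frame, whereas moving picture content is likely to turn dark at least once within the window, so only static bright pixels survive the threshold.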
Referring to Fig. 7, the difference of two consecutive I-frames represents a hard cut if this difference exceeds a certain threshold. The number of such hard cuts is averaged over a window N, resulting in the hard-cut frequency f3 500. In addition, one descriptor was implemented which builds on the motion-activity information included in the MPEG-2 stream.
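The hard-cut frequency f3 can be illustrated with a short sketch. The specification does not state how the difference between two I-frames is measured, so the mean absolute luminance difference is used here purely as an assumption; the threshold of 30 and the function names are likewise hypothetical.

```python
from typing import Sequence


def frame_difference(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute luminance difference between two frames (assumed measure)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def hard_cut_frequency(frames: Sequence[Sequence[int]], threshold: float = 30.0) -> float:
    """Fraction of consecutive I-frame pairs in the window that form a hard cut."""
    if len(frames) < 2:
        return 0.0
    cuts = sum(
        1
        for prev, cur in zip(frames, frames[1:])
        if frame_difference(prev, cur) > threshold
    )
    return cuts / (len(frames) - 1)
```

Each frame is passed as a flat list of luminance values; a window of N I-frames yields N-1 pairwise differences.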
Referring to Fig. 8, the descriptors were applied to a set of training video data. Within a decision window of N=50 I-frames, the extracted features of the commercial sequences appeared mainly in three combinations, namely case 1, case 2, and case 3 according to Fig. 8. In case 1, a separating block and non-static areas (no-logos) were detected within a decision window. In case 2, either a high hard-cut-rate or no-logo was detected, provided that case 1 occurred within the last 100 I-frames. In case 3, no-logo and a high hard-cut-frequency were simultaneously detected.
These cases are combined with a decision tree according to the ID3 algorithm which as such is known (see for example J. R. Quinlan, IEEE Transactions on Systems, Man and Cybernetics, volume 20, pp. 339 - 346, 1990). The average entropy E for each of the cases shown in Fig. 8 is calculated as follows:
Figure imgf000009_0001
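The equation itself is available only as an image in this text. As a hedged illustration, the sketch below computes the average entropy in the form commonly used with ID3: for each branch b of a test, the entropy of the class distribution in that branch is weighted by the share n_b / n_t of samples falling into the branch. Whether the specification uses exactly this form and these symbols is an assumption; the function name and the input layout are hypothetical.

```python
import math
from typing import Sequence, Tuple


def average_entropy(branches: Sequence[Tuple[int, int]]) -> float:
    """Average entropy of a test, given (accepted, rejected) counts per branch."""
    n_t = sum(accepted + rejected for accepted, rejected in branches)
    total = 0.0
    for accepted, rejected in branches:
        n_b = accepted + rejected
        if n_b == 0:
            continue  # an empty branch contributes nothing
        branch_entropy = 0.0
        for n_bc in (accepted, rejected):
            if n_bc:
                p = n_bc / n_b
                branch_entropy -= p * math.log2(p)
        total += (n_b / n_t) * branch_entropy
    return total
```

A test that separates the two classes perfectly yields an entropy of 0, and a test whose branches are evenly mixed yields 1, so the case with the lowest value is the most informative one.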
The results are depicted in Table 1.
Figure imgf000010_0001
Table 1 : Cases with accepted / rejected rate as commercial
Referring to Fig. 9 and the results in Table 1 above, the case with the lowest entropy forms a first node 600. The resulting decision tree is shown in Fig. 9. The following steps are performed: (i) First, it is examined whether case 1 (see Fig. 8) appeared within the decision window N (50 consecutive I-frames). If yes, a commercial has been detected. (ii) If case 1 did not appear within N, it is checked whether case 2 (see Fig. 8) appeared within 2*N. If yes, a commercial has again been detected. (iii) If neither case 1 nor case 2 appeared but case 3 appeared, a commercial has been detected.
Experimental studies were performed on a database of 200 representative video data sequences (100 sequences for training and 100 as testing-data), a total of 400 minutes of recordings: 40 'commercials' and 4*40 'non-commercials' (cartoon, music, news and sport) of 2 minutes each, gathered from popular networks (ARD, BBC, CNN, MTV, VIVA, ZDF). Video data frames were extracted and scaled down, for the analysis only, to a resolution of 90*72 pixels. The number of considered frames in the processing window is N=50 I-frames.
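The cascade of steps (i) to (iii) can be sketched as follows. The sketch assumes that the three descriptors have already been reduced to one boolean each per decision window, and that "within the last 100 I-frames" covers the two decision windows following a case 1 detection. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

N = 50  # decision window length in I-frames


@dataclass
class WindowFeatures:
    separating_block: bool    # f1: a separating block was detected
    static_area: bool         # f2: static pixels found, TV-logo likely present
    high_hard_cut_rate: bool  # f3: hard-cut frequency above its threshold


def classify_windows(windows: List[WindowFeatures]) -> List[Optional[str]]:
    """Return the case that fired per window, or None for 'non-commercial'."""
    results: List[Optional[str]] = []
    frames_since_case1 = None  # I-frames elapsed since case 1 last occurred
    for w in windows:
        no_logo = not w.static_area
        case1 = w.separating_block and no_logo
        if case1:
            frames_since_case1 = 0
        elif frames_since_case1 is not None:
            frames_since_case1 += N
        recent_case1 = frames_since_case1 is not None and frames_since_case1 <= 2 * N
        if case1:
            results.append("case 1")
        elif (w.high_hard_cut_rate or no_logo) and recent_case1:
            results.append("case 2")
        elif no_logo and w.high_hard_cut_rate:
            results.append("case 3")
        else:
            results.append(None)
    return results
```

Every window for which one of the three cases fires is treated as a commercial; the returned label only records which node of the tree produced the decision.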
The performance of each descriptor on the training-data was examined. The separating block descriptor should detect the transition from a running spot to a new spot.
Figure imgf000011_0001
Table 2: Classification accuracy of the separating-block descriptor
From 111 subjectively determined separating blocks within the 20 'commercials' only three blocks, including fades, were misclassified. The same reason caused the false acceptances in 'cartoon' and 'music'. In 'news' and 'sports' the descriptor worked correctly.
Table 3 shows the amount of windows N within each genre and the detected number of windows with non-static areas.
Figure imgf000011_0002
Table 3: Classification accuracy of the static-area descriptor
Of the 98 windows in 'commercial', 90 were detected as having non-static areas. The remaining 8 contained a static 'company-logo'. In 'music' there was a window including a scene-change without the TV-logo, and in 'news' the logo was, at the beginning of a window, outside the scanned corners.
Table 4 shows the performance of the hard-cut-rate descriptor.
Figure imgf000012_0001
Table 4: Classification accuracy of the hard-cut-rate descriptor
The recognition of 'commercials' in a processing window using a single descriptor is obviously not sufficient. In order to achieve high identification rates, we developed the tree with nodes based on logical combinations of the descriptors and a branching ratio of 2.
Table 5 shows the detection rates for the 20 video sequences of each genre used in our experiments; in total, there were 100 video data streams for testing.
Figure imgf000012_0002
Table 5: Probability for video being detected as Commercial
In the experiment, 91 of 98 examined windows from the genre 'commercial' were classified as commercial. The remaining 7 windows were very close to the commercial-detection threshold of 50%. Those sequences started within a commercial, after the appearance of a separating block, presented a long spot with a 'company-logo' and had a low hard-cut-frequency. It is obvious that detection of sequences with such 'special cases' is highly unreliable. It is interesting to note that only 1 of 105 examined windows of the genre 'music' and 1 of 95 windows of 'news' were misclassified (caused by non-static areas and a high cut-rate). The genres 'cartoon' and 'sport' were correctly classified in more than 90 windows.
A new approach for the detection of commercials is presented. Among others, three contributions to optimization of commercial detection in video data streams have been made. New visual descriptors are provided. The temporal relations of the features are evaluated. Finally, a decision tree process is proposed to combine the results of the visual descriptors, deriving a probability rate for a video sequence being a 'commercial' or 'non-commercial'. A video database containing five popular genres namely cartoon, commercial, music, news and sports has been used. An average correct classification rate of 93% for commercial-videos detected as a 'commercial' and more than 99% for the other genres detected as a 'non-commercial' has been achieved.
The features disclosed in this specification, claims and / or the figures may be material for the realization of the invention in its various embodiments, taken in isolation or in various combinations thereof.

Claims
1. A method for detecting a commercial in a video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features, the method comprising the steps of: detecting a plurality of video data frames in the video data stream; analyzing for each video data frame of the plurality of video data frames sub-areas where an essentially static logo is likely to be broadcasted; generating from the analysis of the sub-areas static area descriptor information; and using the static area descriptor information in the step of evaluating the descriptor information.
2. The method of claim 1, wherein the step of analyzing the sub-areas comprises a step of analyzing sub-areas located in corner sections of the video data frames.
3. The method of claim 1, wherein the step of analyzing the sub-areas further comprises steps of: generating for each video data frame of the plurality of video data frames values of luminance for each of the sub-areas; storing a plurality of darkest values of luminance for each of the sub-areas; generating for each of the sub-areas an average value of luminance; and generating the static area descriptor information as indicative of the commercial if for at least one of the sub-areas the average value of luminance exceeds a threshold value.
4. The method of claim 1, wherein the method further comprises the steps of: deciding for each video data frame whether the video data frame is a separating frame by analyzing for each video data frame a plurality of sub-images; generating separating block descriptor information; and using the separating block descriptor information in the step of evaluating the descriptor information.
5. The method of claim 4, wherein the step of analyzing the plurality of sub-images further comprises steps of: generating for each of the sub-images an average value of luminance; and comparing for each of the sub-images the average value of luminance to a threshold value of luminance.
6. The method of claim 4, wherein the step of analyzing the plurality of sub-images further comprises steps of: generating for each of the sub-images a value of variance; and comparing for each of the sub-images the value of variance to a threshold value of variance.
7. The method of claim 4, wherein the step of analyzing the plurality of sub-images further comprises steps of: detecting a number of consecutive separating frames; and comparing the detected number of consecutive separating frames to a pre-defined number of consecutive separating frames.
8. The method of claim 4, wherein the step of analyzing the plurality of sub-images further comprises steps of: detecting a separating block of separating frames; and detecting a time distance of the detected separating block to a previous separating block of separating frames.
9. The method of claim 1, wherein the descriptor information provided from the plurality of descriptors is evaluated in an entropy based decision process.
10. The method of claim 9, wherein the entropy based decision process uses an ID3 algorithm.
11. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps of detecting a commercial in a video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features, comprising: detecting a plurality of video data frames in the video data stream; analyzing for each video data frame of the plurality of video data frames sub-areas where an essentially static logo is likely to be broadcasted; generating from the analysis of the sub-areas static area descriptor information; and using the static area descriptor information in the step of evaluating the descriptor information.
12. An apparatus for processing a video data stream, said apparatus comprising a processing unit, wherein the processing unit is implementing: a frame detection module for detecting a plurality of video data frames in the video data stream; a sub-area analyzing module for analyzing for each video data frame of the plurality of video data frames sub-areas where an essentially static logo is likely to be broadcasted; a descriptor information generator module for generating from the analysis of the sub-areas static area descriptor information; and an evaluation module for detecting a commercial in the video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features, wherein the evaluation module is configured to use the static area descriptor information in the step of evaluating the descriptor information.
13. A method for detecting a commercial in a video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features, the method comprising the steps of: detecting a plurality of video data frames in the video data stream; deciding for each video data frame whether the video data frame is a separating frame by analyzing for each video data frame a plurality of sub-images; generating separating block descriptor information; and using the separating block descriptor information in the step of evaluating the descriptor information.
14. The method of claim 13, wherein the step of analyzing the plurality of sub-images further comprises steps of: generating for each of the sub-images an average value of luminance; and comparing for each of the sub-images the average value of luminance to a threshold value of luminance.
15. The method of claim 13, wherein the step of analyzing the plurality of sub-images further comprises steps of: generating for each of the sub-images a value of variance; and comparing for each of the sub-images the value of variance to a threshold value of variance.
16. The method of claim 13, wherein the step of analyzing the plurality of sub-images further comprises steps of: detecting a number of consecutive separating frames; and comparing the detected number of consecutive separating frames to a pre-defined number of consecutive separating frames.
17. The method of claim 13, wherein the step of analyzing the plurality of sub-images further comprises steps of: detecting a separating block of separating frames; and detecting a time distance of the detected separating block to a previous separating block of separating frames.
18. The method of claim 13, wherein the method comprises the steps of: analyzing for each video data frame of the plurality of video data frames sub-areas where an essentially static logo is likely to be broadcasted; generating from the analysis of the sub-areas static area descriptor information; and using the static area descriptor information in the step of evaluating the descriptor information.
19. The method of claim 18, wherein the step of analyzing the sub-areas comprises a step of analyzing sub-areas located in corner sections of the video data frames.
20. The method of claim 18, wherein the step of analyzing the sub-areas further comprises steps of: generating for each video data frame of the plurality of video data frames values of luminance for each of the sub-areas; storing a plurality of darkest values of luminance for each of the sub-areas; generating for each of the sub-areas an average value of luminance; and generating the static area descriptor information as indicative of the commercial if for at least one of the sub-areas the average value of luminance exceeds a threshold value.
21. The method of claim 13, wherein the descriptor information provided for the plurality of descriptors is evaluated in an entropy based decision process.
22. The method of claim 21, wherein the entropy based decision process uses an ID3 algorithm.
23. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps of detecting a commercial in a video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features, comprising: detecting a plurality of video data frames in the video data stream; deciding for each video data frame whether the video data frame is a separating frame by analyzing for each video data frame a plurality of sub-images; generating separating block descriptor information; and using the separating block descriptor information in the step of evaluating the descriptor information.
24. An apparatus for processing a video data stream, said apparatus comprising a processing unit, wherein the processing unit is implementing: a frame detection module for detecting a plurality of video data frames in the video data stream; a decision module for deciding for each video data frame whether the video data frame is a separating frame by analyzing for each video data frame a plurality of sub-images; a descriptor information generator module for generating separating block descriptor information; and an evaluation module for detecting a commercial in the video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features, wherein the evaluation module is configured to use the separating block descriptor information in the step of evaluating the descriptor information.
25. A method for detecting a commercial in a video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features, wherein the descriptor information provided from the plurality of descriptors is evaluated in an entropy based decision process.
26. The method of claim 25, wherein the entropy based decision process uses an ID3 algorithm.
27. The method of claim 25, wherein the step of evaluating descriptor information provided from the plurality of descriptors comprises a step of evaluating at least one descriptor information selected from the following group of descriptor information: static area descriptor information, separating block descriptor information and hard cut descriptor information.
28. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps of detecting a commercial in a video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features, comprising: evaluating the descriptor information provided from the plurality of descriptors in an entropy based decision process.
29. An apparatus for processing a video data stream, said apparatus comprising a processing unit, wherein the processing unit is implementing an evaluation module for detecting a commercial in the video data stream by evaluating descriptor information provided from a plurality of descriptors indicative of commercial or non-commercial features, and wherein the evaluation module is configured for evaluating the descriptor information in an entropy based decision process.
PCT/EP2007/003409 2006-04-18 2007-04-18 A method for detecting a commercial in a video data stream by evaluating descriptor information WO2007118709A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/406,044 US7761491B2 (en) 2006-04-18 2006-04-18 Method for detecting a commercial in a video data stream by evaluating descriptor information
US11/406,044 2006-04-18

Publications (1)

Publication Number Publication Date
WO2007118709A1 true WO2007118709A1 (en) 2007-10-25

Family

ID=38235435

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2007/003409 WO2007118709A1 (en) 2006-04-18 2007-04-18 A method for detecting a commercial in a video data stream by evaluating descriptor information

Country Status (2)

Country Link
US (1) US7761491B2 (en)
WO (1) WO2007118709A1 (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5311813B2 (en) * 2007-12-18 2013-10-09 三菱電機株式会社 Commercial processing equipment
JP4813517B2 (en) * 2008-05-29 2011-11-09 オリンパス株式会社 Image processing apparatus, image processing program, image processing method, and electronic apparatus
US20100153995A1 (en) * 2008-12-12 2010-06-17 At&T Intellectual Property I, L.P. Resuming a selected viewing channel
US8175413B1 (en) * 2009-03-05 2012-05-08 Google Inc. Video identification through detection of proprietary rights logos in media
US9055335B2 (en) * 2009-05-29 2015-06-09 Cognitive Networks, Inc. Systems and methods for addressing a media database using distance associative hashing
US10949458B2 (en) 2009-05-29 2021-03-16 Inscape Data, Inc. System and method for improving work load management in ACR television monitoring system
US10375451B2 (en) 2009-05-29 2019-08-06 Inscape Data, Inc. Detection of common media segments
US8769584B2 (en) 2009-05-29 2014-07-01 TVI Interactive Systems, Inc. Methods for displaying contextually targeted content on a connected television
US10116972B2 (en) 2009-05-29 2018-10-30 Inscape Data, Inc. Methods for identifying video segments and displaying option to view from an alternative source and/or on an alternative device
US9449090B2 (en) 2009-05-29 2016-09-20 Vizio Inscape Technologies, Llc Systems and methods for addressing a media database using distance associative hashing
WO2011052589A1 (en) * 2009-10-27 2011-05-05 シャープ株式会社 Display device, control method for said display device, program, and computer-readable recording medium having program stored thereon
US9838753B2 (en) 2013-12-23 2017-12-05 Inscape Data, Inc. Monitoring individual viewing of television events using tracking pixels and cookies
US10192138B2 (en) 2010-05-27 2019-01-29 Inscape Data, Inc. Systems and methods for reducing data density in large datasets
US9078013B2 (en) * 2013-06-28 2015-07-07 Verizon Patent And Licensing Inc. Content verification using luminance mapping
CN103714350B (en) * 2013-12-13 2016-11-02 科大讯飞股份有限公司 Television advertising detection method based on channel logo position and system
US9955192B2 (en) 2013-12-23 2018-04-24 Inscape Data, Inc. Monitoring individual viewing of television events using tracking pixels and cookies
CN108337925B (en) 2015-01-30 2024-02-27 构造数据有限责任公司 Method for identifying video clips and displaying options viewed from alternative sources and/or on alternative devices
CA2982797C (en) 2015-04-17 2023-03-14 Inscape Data, Inc. Systems and methods for reducing data density in large datasets
US10080062B2 (en) 2015-07-16 2018-09-18 Inscape Data, Inc. Optimizing media fingerprint retention to improve system resource utilization
WO2017011792A1 (en) 2015-07-16 2017-01-19 Vizio Inscape Technologies, Llc Prediction of future views of video segments to optimize system resource utilization
KR20180030885A (en) 2015-07-16 2018-03-26 인스케이프 데이터, 인코포레이티드 System and method for dividing search indexes for improved efficiency in identifying media segments
CN108293140B (en) 2015-07-16 2020-10-02 构造数据有限责任公司 Detection of common media segments
US9971940B1 (en) * 2015-08-10 2018-05-15 Google Llc Automatic learning of a video matching system
US11256923B2 (en) 2016-05-12 2022-02-22 Arris Enterprises Llc Detecting sentinel frames in video delivery using a pattern analysis
US10097865B2 (en) 2016-05-12 2018-10-09 Arris Enterprises Llc Generating synthetic frame features for sentinel frame matching
BR112019019430A2 (en) 2017-04-06 2020-04-14 Inscape Data Inc computer program system, method and product

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002013067A2 (en) * 2000-08-05 2002-02-14 Hrl Laboratories, Llc System for online rule-based video classification
US20030033347A1 (en) * 2001-05-10 2003-02-13 International Business Machines Corporation Method and apparatus for inducing classifiers for multimedia based on unified representation of features reflecting disparate modalities
US20050078222A1 (en) * 2003-10-09 2005-04-14 Samsung Electronics Co., Ltd. Apparatus and method for detecting opaque logos within digital video signals

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6005603A (en) * 1998-05-15 1999-12-21 International Business Machines Corporation Control of a system for processing a stream of information based on information content
AU2002252698A1 (en) * 2001-04-20 2002-11-05 France Telecom Research And Development L.L.C. Replacing commercials according to location and time
US6870956B2 (en) * 2001-06-14 2005-03-22 Microsoft Corporation Method and apparatus for shot detection
US20030001977A1 (en) * 2001-06-28 2003-01-02 Xiaoling Wang Apparatus and a method for preventing automated detection of television commercials
US7164798B2 (en) * 2003-02-18 2007-01-16 Microsoft Corporation Learning-based automatic commercial content detection

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ALBIOL A ET AL: "Detection of tv commercials", ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2004. PROCEEDINGS. (ICASSP '04). IEEE INTERNATIONAL CONFERENCE ON MONTREAL, QUEBEC, CANADA 17-21 MAY 2004, PISCATAWAY, NJ, USA,IEEE, 17 May 2004 (2004-05-17), pages 541 - 544, XP010718246, ISBN: 0-7803-8484-9 *
BA TU TRUONG ET AL: "Automatic genre identification for content-based video categorization", 3 September 2000, PATTERN RECOGNITION, 2000. PROCEEDINGS. 15TH INTERNATIONAL CONFERENCE ON SEPTEMBER 3-7, 2000, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, PAGE(S) 230-233, ISBN: 0-7695-0750-6, XP010533062 *
LIENHART R ET AL: "On the detection and recognition of television commercials", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MULTIMEDIA COMPUTING AND SYSTEMS, LOS ALAMITOS, CA, US, 3 June 1997 (1997-06-03), pages 509 - 516, XP002154465 *
SADLIER D A ET AL: "Automatic TV advertisement detection from MPEG bitstream", December 2002, PATTERN RECOGNITION, ELSEVIER, KIDLINGTON, GB, PAGE(S) 2719-2726, ISSN: 0031-3203, XP004379642 *
YE YUAN ET AL: "Automatic video classification using decision tree method", MACHINE LEARNING AND CYBERNETICS, 2002. PROCEEDINGS. 2002 INTERNATIONAL CONFERENCE ON NOV. 4-5, 2002, PISCATAWAY, NJ, USA,IEEE, 4 November 2002 (2002-11-04), pages 1153 - 1157, XP010802731, ISBN: 0-7803-7508-4 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014004028A2 (en) * 2012-06-25 2014-01-03 Intel Corporation Video analytics test system
WO2014004028A3 (en) * 2012-06-25 2014-05-01 Intel Corporation Video analytics test system
CN105893930A (en) * 2015-12-29 2016-08-24 乐视云计算有限公司 Video feature identification method and device
WO2017113691A1 (en) * 2015-12-29 2017-07-06 乐视控股(北京)有限公司 Method and device for identifying video characteristics

Also Published As

Publication number Publication date
US7761491B2 (en) 2010-07-20
US20070261075A1 (en) 2007-11-08

Similar Documents

Publication Publication Date Title
US7761491B2 (en) Method for detecting a commercial in a video data stream by evaluating descriptor information
JP4036328B2 (en) Scene classification apparatus for moving image data
US7336890B2 (en) Automatic detection and segmentation of music videos in an audio/video stream
US8316301B2 (en) Apparatus, medium, and method segmenting video sequences based on topic
US7170566B2 (en) Family histogram based techniques for detection of commercials and other video content
EP2457214B1 (en) A method for detecting and adapting video processing for far-view scenes in sports video
US20030061612A1 (en) Key frame-based video summary system
Han et al. Video scene segmentation using a novel boundary evaluation criterion and dynamic programming
Nasir et al. Event detection and summarization of cricket videos
KR101195613B1 (en) Apparatus and method for partitioning moving image according to topic
Kolekar et al. Semantic event detection and classification in cricket video sequence
Huang et al. An intelligent subtitle detection model for locating television commercials
JP4999015B2 (en) Moving image data classification device
Glasberg et al. Recognizing commercials in real-time using three visual descriptors and a decision-tree
KR100656373B1 (en) Method for discriminating obscene video using priority and classification-policy in time interval and apparatus thereof
JP4396914B2 (en) Moving image data classification device
Vadhanam et al. Exploiting BICC features for classification of advertisement videos using RIDOR algorithm
Kolekar et al. A hierarchical framework for semantic scene classification in soccer sports video
Glasberg et al. Cartoon-recognition using visual-descriptors and a multilayer-perceptron
Hameed A novel framework of shot boundary detection for uncompressed videos
Kim et al. An adaptive shot change detection algorithm using an average of absolute difference histogram within extension sliding window
Kyperountas et al. Scene change detection using audiovisual clues
El-Khoury et al. Unsupervised TV program boundaries detection based on audiovisual features
Waseemullah et al. Unsupervised Ads Detection in TV Transmissions
Khan et al. Unsupervised Ads Detection in TV Transmissions

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07724346

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07724346

Country of ref document: EP

Kind code of ref document: A1