US20050217463A1 - Signal processing apparatus and signal processing method, program, and recording medium - Google Patents

Signal processing apparatus and signal processing method, program, and recording medium

Publication number
US20050217463A1
Authority
US
United States
Prior art keywords
section
frequency
tempo
audio signal
feature value
Prior art date
Legal status
Granted
Application number
US11/082,778
Other versions
US7507901B2 (en)
Inventor
Yoshiyuki Kobayashi
Current Assignee
Sony Corp
Original Assignee
Sony Corp
Priority date
Filing date
Publication date
Application filed by Sony Corp
Assigned to SONY CORPORATION reassignment SONY CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: KOBAYASHI, YOSHIYUKI
Publication of US20050217463A1
Priority to US12/350,519 (US7868240B2)
Application granted
Publication of US7507901B2
Expired - Fee Related (current legal status)
Adjusted expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G10H1/40 Rhythm
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/076 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction of timing, tempo; Beat detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/131 Mathematical functions for musical analysis, processing, synthesis or composition
    • G10H2250/215 Transforms, i.e. mathematical transforms into domains appropriate for musical signal processing, coding or compression
    • G10H2250/235 Fourier transform; Discrete Fourier Transform [DFT]; Fast Fourier Transform [FFT]

Definitions

  • This invention relates to a signal processing apparatus and a signal processing method, a program, and a recording medium, and more particularly to a signal processing apparatus and a signal processing method, a program, and a recording medium by which a feature value of an audio signal such as the tempo is detected with a high degree of accuracy.
  • Various methods are known by which the tempo of an audio signal of, for example, a tune is detected. According to one of the methods, a peak portion and a level of an autocorrelation function of sound production starting time of an audio signal are observed to analyze the periodicity of the sound production time, and the tempo which is the number of quarter notes for one minute is detected from a result of the analysis.
  • the method described is disclosed, for example, in Japanese Patent Laid-Open No. 2002-116754.
  • a signal processing apparatus for processing an audio signal, comprising a production section for producing a level signal representative of a transition of the level of the audio signal, a frequency analysis section for frequency analyzing the level signal produced by the production section, and a feature value calculation section for determining a feature value or values of the audio signal based on a result of the frequency analysis by the frequency analysis section.
  • a signal processing method for a signal processing apparatus which processes an audio signal comprising a production step of producing a level signal representative of a transition of the level of the audio signal, a frequency analysis step of frequency analyzing the level signal produced by the process at the production step, and a feature value calculation step of determining a feature value or values of the audio signal based on a result of the frequency analysis by the process at the frequency analysis step.
  • a program for causing a computer to execute processing of an audio signal comprising a production step of producing a level signal representative of a transition of the level of the audio signal, a frequency analysis step of frequency analyzing the level signal produced by the process at the production step, and a feature value calculation step of determining a feature value or values of the audio signal based on a result of the frequency analysis by the process at the frequency analysis step.
  • a recording medium on or in which a program for causing a computer to execute processing of an audio signal is recorded, the program comprising a production step of producing a level signal representative of a transition of the level of the audio signal, a frequency analysis step of frequency analyzing the level signal produced by the process at the production step, and a feature value calculation step of determining a feature value or values of the audio signal based on a result of the frequency analysis by the process at the frequency analysis step.
  • a level signal representative of a transition of the level of an audio signal is produced and frequency analyzed. Then, a feature value of the audio signal is determined based on a result of the frequency analysis.
  • a feature value of music such as the tempo can be detected with a high degree of accuracy.
  • FIG. 1 is a block diagram showing an example of a configuration of a feature value detection apparatus to which the present invention is applied;
  • FIG. 2 is a block diagram showing a detailed configuration of a level calculation section and a frequency analysis section shown in FIG. 1 ;
  • FIG. 3 is a block diagram showing a detailed configuration of a speed feeling detection section shown in FIG. 1 ;
  • FIG. 4 is a block diagram showing a detailed configuration of a tempo fluctuation detection section shown in FIG. 1 ;
  • FIG. 5 is a flow chart illustrating a feature value detection process performed by the feature value detection apparatus of FIG. 1 ;
  • FIG. 6 is a flow chart illustrating a frequency analysis process at step S 13 of FIG. 5 ;
  • FIGS. 7A to 7 E and 8 are waveform diagrams illustrating the frequency analysis process of a frequency analysis section shown in FIG. 1 ;
  • FIG. 9 is a flow chart illustrating a speed feeling detection process at step S 15 of FIG. 5 ;
  • FIGS. 10 and 11 are diagrams illustrating different examples of frequency components of an audio signal of one tune obtained by the frequency analysis section shown in FIG. 1 ;
  • FIG. 12 is a flow chart illustrating a tempo correction process at step S 16 of FIG. 5 ;
  • FIG. 13 is a flow chart illustrating a tempo fluctuation detection process at step S 17 of FIG. 5 ;
  • FIGS. 14 and 15 are diagrams illustrating different examples of frequency components of an audio signal of one tune obtained by the frequency analysis section shown in FIG. 1 ;
  • FIG. 16 is a block diagram showing an example of a configuration of a computer to which the present invention is applied.
  • a signal processing apparatus (for example, a feature value detection apparatus 1 of FIG. 1 ) for processing an audio signal, comprising a production section (for example, a level calculation section 21 of FIG. 1 ) for producing a level signal representative of a transition of the level of the audio signal, a frequency analysis section (for example, a frequency analysis section 22 of FIG. 1 ) for frequency analyzing the level signal produced by the production section, and a feature value calculation section (for example, a feature extraction section 23 of FIG. 1 ) for determining a feature value or values of the audio signal based on a result of the frequency analysis by the frequency analysis section.
  • the signal processing apparatus may further comprise a statistic processing section (for example, a statistic processing section 49 of FIG. 2 ) for performing a statistic process of the result of the frequency analysis by the frequency analysis section.
  • the feature value calculation section determines the feature value or values based on the result of the frequency analysis statistically processed by the statistic processing section.
  • the signal processing apparatus may further comprise a frequency component processing section (for example, a frequency component processing section 48 of FIG. 2 ) for adding, to frequency components of the level signal of the result of the frequency analysis by the frequency analysis section, frequency components having a relationship of harmonics to the frequency components and outputting the sum values as the frequency components of the level signal.
  • the feature value calculation section determines the feature value or values based on the frequency components outputted from the frequency component processing section.
  • a signal processing method for a signal processing apparatus which processes an audio signal comprising a production step (for example, a step S 12 of FIG. 5 ) of producing a level signal representative of a transition of the level of the audio signal, a frequency analysis step (for example, a step S 13 of FIG. 5 ) of frequency analyzing the level signal produced by the process at the production step, and a feature value calculation step (for example, steps S 14 to S 16 of FIG. 5 ) of determining a feature value or values of the audio signal based on a result of the frequency analysis by the process at the frequency analysis step.
  • a program for causing a computer to execute processing of an audio signal and a recording medium on or in which a program for causing a computer to execute processing of an audio signal is recorded comprising a production step (for example, a step S 12 of FIG. 5 ) of producing a level signal representative of a transition of the level of the audio signal, a frequency analysis step (for example, a step S 13 of FIG. 5 ) of frequency analyzing the level signal produced by the process at the production step, and a feature value calculation step (for example, steps S 14 to S 16 of FIG. 5 ) of determining a feature value or values of the audio signal based on a result of the frequency analysis by the process at the frequency analysis step.
  • Referring to FIG. 1 , there is shown in block diagram an example of a configuration of a feature value detection apparatus to which the present invention is applied.
  • the feature value detection apparatus 1 shown receives an audio signal supplied thereto as a digital signal of a tune reproduced, for example, from a CD (Compact Disc) and detects and outputs, for example, a tempo t, a speed feeling S and a tempo fluctuation W as feature values of the audio signal. It is to be noted that, in FIG. 1 , the audio signal supplied to the feature value detection apparatus 1 is a stereo signal.
  • the feature value detection apparatus 1 includes an adder 20 , a level calculation section 21 , a frequency analysis section 22 and a feature extraction section 23 .
  • An audio signal of the left channel and another audio signal of the right channel of a tune are supplied to the adder 20 .
  • the adder 20 adds the audio signals of the left and right channels and supplies a resulting signal to the level calculation section 21 .
  • the level calculation section 21 produces a level signal representative of a transition of the level of the audio signal supplied thereto from the adder 20 and supplies the produced level signal to the frequency analysis section 22 .
  • the frequency analysis section 22 frequency analyzes the level signal representative of a transition of the level of the audio signal supplied thereto from the level calculation section 21 and outputs frequency components A of individual frequencies of the level signal as a result of the analysis. Then, the frequency analysis section 22 supplies the frequency components A to the feature extraction section 23 .
  • the feature extraction section 23 includes a tempo calculation section 31 , a speed feeling detection section 32 , a tempo correction section 33 and a tempo fluctuation detection section 34 .
  • the tempo calculation section 31 outputs a tempo (feature value) t of the audio signal based on the frequency components A of the level signal supplied thereto from the frequency analysis section 22 and supplies the tempo t to the tempo correction section 33 .
  • the speed feeling detection section 32 detects a speed feeling S of the audio signal based on the frequency components A of the level signal supplied thereto from the frequency analysis section 22 and supplies the speed feeling S to the tempo correction section 33 . Further, the speed feeling detection section 32 outputs the speed feeling S as one of feature values of the audio signal to the outside.
  • the tempo correction section 33 corrects the tempo t supplied thereto from the tempo calculation section 31 as occasion demands based on the speed feeling S supplied thereto from the speed feeling detection section 32 . Then, the tempo correction section 33 outputs the corrected tempo t as one of feature values of the audio signal to the outside.
  • the tempo fluctuation detection section 34 detects a tempo fluctuation W which is a fluctuation of the tempo of the audio signal based on the frequency components A of the level signal supplied thereto from the frequency analysis section 22 and outputs the tempo fluctuation W as one of the feature values of the audio signal to the outside.
  • audio signals of the left channel and the right channel of a tune are supplied to the level calculation section 21 through the adder 20 .
  • the level calculation section 21 converts the audio signals into a level signal.
  • the frequency analysis section 22 detects frequency components A of the level signal.
  • the tempo calculation section 31 arithmetically operates the tempo t based on the frequency components A while the speed feeling detection section 32 detects the speed feeling S based on the frequency components A.
  • the tempo correction section 33 corrects the tempo t based on the speed feeling S as occasion demands and outputs the corrected tempo t.
  • the tempo fluctuation detection section 34 detects and outputs the tempo fluctuation W based on the frequency components A.
  • FIG. 2 shows an example of a detailed configuration of the level calculation section 21 and the frequency analysis section 22 shown in FIG. 1 .
  • the level calculation section 21 includes an EQ (Equalize) processing section 41 and a level signal production section 42 .
  • the frequency analysis section 22 includes a decimation filter section 43 , a down sampling section 44 , an EQ processing section 45 , a window processing section 46 , a frequency conversion section 47 , a frequency component processing section 48 and a statistic processing section 49 .
  • the EQ processing section 41 performs a filter process for the audio signal.
  • the EQ processing section 41 has a configuration of a high-pass filter (HPF) and removes low frequency components of the audio signal which are not suitable for extraction of the tempo t.
  • the EQ processing section 41 outputs an audio signal of frequency components which are suitable for extraction of the tempo t to the level signal production section 42 .
  • the coefficient of the filter used by the filter process of the EQ processing section 41 is not limited specifically.
  • the level signal production section 42 produces a level signal representative of a transition of the level of the audio signal supplied thereto from the EQ processing section 41 and supplies the level signal to (the decimation filter section 43 of) the frequency analysis section 22 .
  • the level signal may represent, for example, an absolute value or a power (squared) value of the audio signal, a moving average (value) of such an absolute value or power value, a value used for level indication by a level meter or the like. If a value used for level indication by a level meter is adopted as the level signal here, then the absolute value of the audio signal at each sample point makes the level signal at the sample point.
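The level-signal options listed above (absolute value, power value, or a moving average of either) can be sketched as follows. This is a minimal illustration, not the apparatus itself: the function name `level_signal`, its parameters, and the use of NumPy are assumptions introduced here.

```python
import numpy as np

def level_signal(audio, mode="abs", window=1):
    """Return a level signal for a mono audio array (hypothetical helper).

    mode="abs" uses the absolute value of each sample,
    mode="power" uses the squared value.
    window > 1 applies a moving average over that many samples.
    """
    x = np.asarray(audio, dtype=float)
    if mode == "abs":
        level = np.abs(x)
    elif mode == "power":
        level = x ** 2
    else:
        raise ValueError("mode must be 'abs' or 'power'")
    if window > 1:
        kernel = np.ones(window) / window
        # mode="same" keeps the output the same length as the input
        level = np.convolve(level, kernel, mode="same")
    return level
```

With `window=1` and `mode="abs"`, the result at each sample point is simply the absolute value of the audio signal at that point, which matches the level-meter case described in the text.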
  • the decimation filter section 43 removes high frequency components of the level signal supplied thereto from the level signal production section 42 in order to allow down sampling to be performed by the down sampling section 44 at the next stage.
  • the decimation filter section 43 supplies a resulting level signal to the down sampling section 44 .
  • the down sampling section 44 performs down sampling of the level signal supplied thereto from the decimation filter section 43 .
  • the down sampling section 44 thins out samples of the level signal to decrease the sampling frequency of the level signal to 172 Hz.
  • the level signal after the down sampling is supplied to the EQ processing section 45 .
  • the down sampling by the down sampling section 44 can reduce the load (arithmetic operation amount) of later processing.
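The decimation filter and down sampling stages can be sketched together. The text gives only the target rate of 172 Hz; the 44.1 kHz input rate (typical for CD audio), the factor of 256 (44100 / 256 is about 172.3), and the moving-average low-pass filter used here are all assumptions chosen for illustration, since the filter design is not specified.

```python
import numpy as np

def decimate(level, factor=256):
    """Low-pass filter a level signal, then keep every `factor`-th sample.

    A moving average of length `factor` stands in for the decimation
    filter; the actual filter is not specified in the text.
    """
    x = np.asarray(level, dtype=float)
    kernel = np.ones(factor) / factor
    smoothed = np.convolve(x, kernel, mode="same")  # crude anti-alias filter
    return smoothed[::factor]                       # thin out the samples
```

Removing the high frequency components first matters because thinning alone would fold them back into the low-frequency range that carries the tempo information.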
  • the EQ processing section 45 performs a filter process of the level signal supplied thereto from the down sampling section 44 to remove low frequency components (for example, a dc component and frequency components lower than a frequency corresponding to the tempo 50 (the number of quarter notes for one minute is 50 )) and high frequency components (frequency components higher than a frequency corresponding to the tempo 400 (the number of quarter notes for one minute is 400 )) from the level signal.
  • the EQ processing section 45 removes those low frequency components and high frequency components which are not suitable for extraction of the tempo t.
  • the EQ processing section 45 supplies a level signal of remaining frequencies as a result of the removal of the low frequency components and high frequency components to the window processing section 46 .
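Since a tempo is a number of quarter notes per minute, a tempo of 50 corresponds to 50/60 Hz (about 0.83 Hz) and a tempo of 400 to 400/60 Hz (about 6.67 Hz). The sketch below applies that conversion and removes everything outside the band by zeroing FFT bins. The FFT-masking approach and the names `tempo_to_hz` and `band_limit` are assumptions for illustration; the text does not say how the EQ filter is realized.

```python
import numpy as np

def tempo_to_hz(tempo):
    """Convert quarter notes per minute to a frequency in Hz."""
    return tempo / 60.0

def band_limit(level, fs=172.0, low_tempo=50, high_tempo=400):
    """Keep only components between low_tempo and high_tempo (assumed FFT mask)."""
    x = np.asarray(level, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    keep = (freqs >= tempo_to_hz(low_tempo)) & (freqs <= tempo_to_hz(high_tempo))
    spectrum[~keep] = 0.0  # drops the dc component and out-of-band content
    return np.fft.irfft(spectrum, n=len(x))
```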
  • Here, tempo i denotes the tempo of the audio signal where the number of quarter notes for one minute is i.
  • the window processing section 46 extracts, from the level signal supplied thereto from the EQ processing section 45 , the level signals for a predetermined period of time, that is, a predetermined number of samples of the level signal, as one block in a time sequence. Further, in order to reduce the influence of sudden variation of the level signal at the opposite ends of the block or for some other object, the window processing section 46 window processes the level signal of the block using a window function such as a Hamming window or a Hanning window by which portions at the opposite ends of the block are gradually attenuated (or multiplies the level signal of the block by a window function) and supplies a resulting level signal to the frequency conversion section 47 .
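Block extraction and windowing can be sketched as below. The block size of 1024 samples and the use of non-overlapping blocks are assumptions; the text says only that a predetermined number of samples forms one block. `np.hamming` and `np.hanning` are the NumPy window functions.

```python
import numpy as np

def windowed_blocks(level, block_size=1024, window="hamming"):
    """Split a level signal into blocks and multiply each by a window function."""
    x = np.asarray(level, dtype=float)
    if window == "hamming":
        w = np.hamming(block_size)
    elif window == "hanning":
        w = np.hanning(block_size)
    else:
        raise ValueError("window must be 'hamming' or 'hanning'")
    n_blocks = len(x) // block_size  # trailing partial block is dropped
    return [x[i * block_size:(i + 1) * block_size] * w for i in range(n_blocks)]
```

Either window attenuates the two ends of each block, which limits the spurious components that an abrupt cut at the block edges would add to the later frequency conversion.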
  • the frequency conversion section 47 performs, for example, discrete cosine transform for the level signal of the block supplied thereto from the window processing section 46 to perform frequency conversion (frequency analysis) of the level signal.
  • the frequency conversion section 47 obtains frequency components of frequencies corresponding, for example, to the tempos 50 to 1,600 from among the frequency components obtained by the frequency conversion of the level signal of the block and supplies the obtained frequency components to the frequency component processing section 48 .
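A discrete cosine transform of one block, restricted to the tempo range 50 to 1,600, might look like the following sketch. It uses a directly computed DCT-II, whose k-th basis function has frequency k·fs/(2N) Hz for a block of N samples. Taking the magnitude of each coefficient as the "frequency component", and the name `dct_tempo_components`, are assumptions not stated in the text.

```python
import numpy as np

def dct_tempo_components(block, fs=172.0, low_tempo=50, high_tempo=1600):
    """Return (tempos, components) for one block via a plain DCT-II."""
    x = np.asarray(block, dtype=float)
    n = len(x)
    k = np.arange(n)
    # DCT-II basis: cos(pi / N * (m + 1/2) * k) for sample index m
    basis = np.cos(np.pi / n * np.outer(k, np.arange(n) + 0.5))
    coeffs = basis @ x
    tempos = 60.0 * k * fs / (2.0 * n)  # bin k -> Hz -> quarter notes per minute
    keep = (tempos >= low_tempo) & (tempos <= high_tempo)
    return tempos[keep], np.abs(coeffs[keep])
```

The upper limit of 1,600 is four times 400, so the harmonics needed for every tempo up to 400 remain available to the next stage.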
  • the frequency component processing section 48 processes the frequency components of the level signal of the block from the frequency conversion section 47 .
  • the frequency component processing section 48 adds, to the frequency components of frequencies corresponding to, for example, the tempos 50 to 400 from among the frequency components of the level signal of the block from the frequency conversion section 47 , frequency components (harmonics) of frequencies corresponding to tempos equal to twice, three times and four times the tempos, respectively. Then, the frequency component processing section 48 determines results of the addition as frequency components of the frequencies corresponding to the tempos.
  • For example, to the frequency component of the frequency corresponding to the tempo 50 , frequency components of a frequency corresponding to the tempo 100 which is twice the tempo 50 , another frequency corresponding to the tempo 150 which is three times the tempo 50 and a further frequency corresponding to the tempo 200 which is four times the tempo 50 are added, and the sum is determined as a frequency component of the frequency corresponding to the tempo 50 .
  • Similarly, to the frequency component of the frequency corresponding to the tempo 100 , frequency components of a frequency corresponding to the tempo 200 which is twice the tempo 100 , another frequency corresponding to the tempo 300 which is three times the tempo 100 and a further frequency corresponding to the tempo 400 which is four times the tempo 100 are added, and the sum is determined as a frequency component of the frequency corresponding to the tempo 100 .
  • the frequency component corresponding to the tempo 100 which is added when the frequency component corresponding to the tempo 50 is to be determined is a frequency component corresponding to the tempo 100 before frequency components of harmonics thereto are added. This also applies to the other tempos.
  • the frequency component processing section 48 adds, to individual frequency components of the frequencies corresponding to the range of the tempos 50 to 400 , frequency components of harmonics to them and uses the sum values as new frequency components to obtain frequency components of the frequencies corresponding to the range of the tempos 50 to 400 for each block.
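The harmonic addition can be sketched as follows. Each tempo in the range 50 to 400 receives the components at two, three and four times that tempo, always taken from the values before any addition, as the text requires. Representing the components as a dictionary keyed by integer tempo is an assumption made to keep the example short.

```python
def add_harmonics(components, low_tempo=50, high_tempo=400, multiples=(2, 3, 4)):
    """Add 2x, 3x and 4x harmonic components to each tempo's component.

    components: dict mapping an integer tempo to its frequency component
    (hypothetical layout). The input dict is never modified, so every sum
    uses the components as they were before harmonics were added.
    """
    result = {}
    for tempo in range(low_tempo, high_tempo + 1):
        if tempo not in components:
            continue
        total = components[tempo]
        for m in multiples:
            total += components.get(tempo * m, 0.0)  # missing harmonic adds nothing
        result[tempo] = total
    return result
```

For instance, with components `{50: 1, 100: 2, 150: 3, 200: 4}`, the new component for tempo 50 is 1 + 2 + 3 + 4 = 10, while the component for tempo 100 is built from the original value 2, not from any already summed value.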
  • the frequency component processing section 48 supplies the obtained frequency components to the statistic processing section 49 .
  • a frequency component of a certain frequency represents the degree of possibility that the frequency may be a basic frequency (pitch frequency) f b of the level signal. Accordingly, the frequency component of the certain frequency can be regarded as basic frequency likelihood of the frequency. It is to be noted that, since the basic frequency f b represents that the level signal exhibits repetitions with the basic frequency, it corresponds to the tempo of the original audio signal.
  • the statistic processing section 49 performs a statistic process for blocks of one tune.
  • the statistic processing section 49 adds frequency components of the level signal for one tune supplied thereto in a unit of a block from the frequency component processing section 48 for each frequency. Then, the statistic processing section 49 supplies a result of the addition of frequency components over the blocks for one tune obtained by the statistic process as frequency components A of the level signal of the one tune to the feature extraction section 23 .
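The statistic process described here is a per-frequency sum over all blocks of the tune. A minimal sketch, assuming each block's components are stored as one row of equal length (the name `accumulate_blocks` is hypothetical):

```python
import numpy as np

def accumulate_blocks(block_components):
    """Sum frequency components over all blocks, separately for each frequency.

    block_components: sequence of rows, one row per block, where column j
    always refers to the same frequency.
    """
    return np.sum(np.asarray(block_components, dtype=float), axis=0)
```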
  • the speed feeling detection section 32 shown includes a peak extraction section 61 , a peak addition section 62 , a peak frequency arithmetic operation section 63 and a speed feeling arithmetic operation section 64 .
  • the peak extraction section 61 supplies the 10 comparatively high frequency components A 1 to A 10 to the peak addition section 62 and supplies the frequency components A 1 to A 10 and the corresponding frequencies f 1 to f 10 to the peak frequency arithmetic operation section 63 .
  • the speed feeling arithmetic operation section 64 arithmetically operates a speed feeling S (or information representative of a speed feeling S) based on the sum value $\sum A_i$ supplied thereto from the peak addition section 62 and the integrated value $\sum A_i \times f_i$ supplied thereto from the peak frequency arithmetic operation section 63 .
  • the speed feeling arithmetic operation section 64 supplies the speed feeling S to the tempo correction section 33 and outputs the speed feeling S to the outside.
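The two quantities fed to the speed feeling arithmetic operation section 64 can be sketched as below: the sum of the selected components, and the sum of each component multiplied by its frequency. Two assumptions are made here: that the "10 comparatively high" components are simply the 10 largest values, and that the integrated value is a sum of products. How S is then derived from the two values is not given in this passage, so the sketch stops short of computing S.

```python
import numpy as np

def peak_sums(freqs, comps, n_peaks=10):
    """Return (sum of A_i, sum of A_i * f_i) over the n_peaks largest components."""
    freqs = np.asarray(freqs, dtype=float)
    comps = np.asarray(comps, dtype=float)
    top = np.argsort(comps)[::-1][:n_peaks]  # indices of the largest components
    a, f = comps[top], freqs[top]
    return a.sum(), (a * f).sum()
```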
  • FIG. 4 shows in block diagram an example of a detailed configuration of the tempo fluctuation detection section 34 shown in FIG. 1 .
  • the tempo fluctuation detection section 34 shown includes an addition section 81 , a peak extraction section 82 and a division section 83 .
  • the frequency components A of the frequencies corresponding to the range of the tempos 50 to 400 are supplied from the frequency analysis section 22 to the addition section 81 .
  • the addition section 81 adds the frequency components A supplied thereto from the frequency analysis section 22 over all of the frequencies and supplies a resulting sum value $\sum A$ to the division section 83 .
  • the frequency components A of the frequencies corresponding to the range of the tempos 50 to 400 from the frequency analysis section 22 are supplied also to the peak extraction section 82 .
  • the peak extraction section 82 extracts the maximum frequency component A 1 from among the frequency components A and supplies the frequency component A 1 to the division section 83 .
  • the division section 83 arithmetically operates a tempo fluctuation W based on the sum value $\sum A$ of the frequency components A supplied thereto from the addition section 81 and the maximum frequency component A 1 supplied thereto from the peak extraction section 82 and outputs the tempo fluctuation W to the outside.
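This passage names the two inputs of the division section 83 but does not give the expression for W. The sketch below assumes W is the sum of all components divided by the maximum component; both the direction of the division and the name `tempo_fluctuation` are assumptions, not taken from the text.

```python
import numpy as np

def tempo_fluctuation(comps):
    """Assumed form: W = (sum of all components) / (maximum component)."""
    a = np.asarray(comps, dtype=float)
    peak = a.max()
    if peak == 0:
        raise ValueError("maximum component is zero; W is undefined")
    return a.sum() / peak
```

Under this assumed form, a tune whose energy sits in a single tempo component gives W close to 1, and a tune whose energy is spread over many components gives a larger W.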
  • the feature value detection process is started when audio signals of the left and right channels are supplied to the adder 20 .
  • At step S 11 , the adder 20 adds the audio signals of the left and right channels and supplies a resulting audio signal to the level calculation section 21 . Thereafter, the processing advances to step S 12 .
  • the level calculation section 21 produces a level signal of the audio signal supplied thereto from the adder 20 and supplies the level signal to the frequency analysis section 22 .
  • the EQ processing section 41 of the level calculation section 21 removes low frequency components of the audio signal which are not suitable for extraction of the tempo t and supplies the audio signal of frequency components suitable for extraction of the tempo t to the level signal production section 42 . Then, the level signal production section 42 produces a level signal representative of a transition of the level of the audio signal supplied thereto from the EQ processing section 41 and supplies the level signal to the frequency analysis section 22 .
  • At step S 13 , the frequency analysis section 22 frequency analyzes the level signal supplied thereto from the level calculation section 21 and outputs frequency components A of individual frequencies of the level signal as a result of the analysis. Then, the frequency analysis section 22 supplies the frequency components A to the tempo calculation section 31 , speed feeling detection section 32 and tempo fluctuation detection section 34 of the feature extraction section 23 . Thereafter, the processing advances to step S 14 .
  • the tempo calculation section 31 determines a tempo t of the audio signal based on the frequency components A of the level signal supplied thereto from the frequency analysis section 22 and supplies the tempo t to the tempo correction section 33 .
  • the tempo calculation section 31 extracts the maximum frequency component A 1 from among the frequency components A of the level signal supplied thereto from the frequency analysis section 22 and determines the frequency of the maximum frequency component A 1 as the basic frequency f b of the level signal.
  • the frequency of the maximum frequency component A 1 is a frequency of a maximum basic frequency likelihood, that is, a frequency which is most likely as the basic frequency. Therefore, the frequency of the maximum frequency component A 1 from among the frequency components A of the level signal is determined as the basic frequency f b .
  • the tempo calculation section 31 determines the tempo t of the original audio signal using the following expression (1) based on the basic frequency f b and the sampling frequency f s of the level signal and supplies the tempo t to the tempo correction section 33 .
  • $$t = f_b / f_s \times 60 \qquad (1)$$
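The tempo calculation can be sketched in two steps: pick the frequency whose component is largest as the basic frequency f b, then evaluate expression (1). The expression is implemented literally as the basic frequency divided by the sampling frequency, times 60; the units in which f b is expressed are not spelled out in this passage, so the function names and the example values below are purely illustrative assumptions.

```python
import numpy as np

def basic_frequency(freqs, comps):
    """Return the frequency whose component (basic frequency likelihood) is largest."""
    return np.asarray(freqs)[int(np.argmax(comps))]

def tempo_expression_1(f_b, f_s):
    """Literal evaluation of expression (1): t = f_b / f_s * 60."""
    return f_b / f_s * 60
```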
  • step S 14 the processing advances to step S 15 , at which the speed feeling detection section 32 performs a speed feeling detection process based on the frequency components A supplied thereto from the frequency analysis section 22 . Then, the speed feeling detection section 32 supplies a speed feeling S of the audio signal obtained by the speed feeling detection process to the tempo correction section 33 and outputs the speed feeling S to the outside.
  • step S 16 the processing advances to step S 16 , at which the tempo correction section 33 performs a tempo correction process of correcting the tempo t supplied thereto from the tempo calculation section 31 at step S 14 as occasion demands based on the speed feeling S supplied thereto from the speed feeling detection section 32 at step S 15 . Then, the tempo correction section 33 outputs a tempo t (or information representative of a tempo t) obtained by the tempo correction process to the outside, and then ends the process.
  • After step S 16 , the processing advances to step S 17 , at which the tempo fluctuation detection section 34 performs a tempo fluctuation detection process based on the frequency components A of the level signal supplied thereto from the frequency analysis section 22 . Then, the tempo fluctuation detection section 34 outputs a tempo fluctuation W obtained by the tempo fluctuation detection process and representative of the fluctuation of the tempo of the audio signal to the outside. Then, the tempo fluctuation detection section 34 ends the process.
  • The tempo t, speed feeling S and tempo fluctuation W outputted to the outside at steps S 15 to S 17 described above are supplied, for example, to a monitor so that they are displayed on the monitor.
  • At step S 31 , the decimation filter section 43 of the frequency analysis section 22 ( FIG. 2 ) removes high frequency components of the level signal supplied thereto from the level signal production section 42 , in order to allow the down sampling section 44 at the next stage to perform down sampling, and supplies the resulting level signal to the down sampling section 44 . Thereafter, the processing advances to step S 32 .
  • the down sampling section 44 performs down sampling of the level signal supplied thereto from the decimation filter section 43 and supplies the level signal after the down sampling to the EQ processing section 45 .
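  • The decimation filtering and down sampling of steps S 31 and S 32 can be sketched as follows. This is only an illustration: the text does not say which low-pass filter the decimation filter section 43 uses, so the moving average, the function name `decimate` and the parameter `factor` are assumptions.

```python
def decimate(level, factor):
    """Low-pass a level signal, then keep every `factor`-th sample.

    The moving-average filter is an assumed stand-in for the decimation
    filter; the source text does not specify the filter design.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    filtered = []
    for i in range(len(level)):
        # Average the current sample with up to factor-1 preceding samples.
        start = max(0, i - factor + 1)
        window = level[start:i + 1]
        filtered.append(sum(window) / len(window))
    # Down sampling: keep one sample out of every `factor` samples.
    return filtered[factor - 1::factor]
```

  • Removing the high frequency components first keeps them from aliasing into the band of interest once samples are discarded.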
  • After step S 32 , the processing advances to step S 33 , at which the EQ processing section 45 performs filter processing of the level signal supplied thereto from the down sampling section 44 to remove low frequency components and high frequency components of the level signal. Then, the EQ processing section 45 supplies the level signal having frequency components remaining as a result of the removal of the low and high frequency components to the window processing section 46 , whereafter the processing advances to step S 34 .
  • the window processing section 46 extracts, from the level signal supplied thereto from the EQ processing section 45 , a predetermined number of samples in a time series as the level signal of one block, and performs a window process for the level signal of the block and supplies the resulting level signal to the frequency conversion section 47 . It is to be noted that processes at the succeeding steps S 34 to S 36 are performed in a unit of a block.
  • After step S 34 , the processing advances to step S 35 , at which the frequency conversion section 47 performs discrete cosine transform for the level signal of the block supplied thereto from the window processing section 46 thereby to perform frequency conversion of the level signal. Then, the frequency conversion section 47 obtains, from among frequency components obtained by the frequency conversion of the level signal of the block, those frequency components which have frequencies corresponding to, for example, the tempos 50 to 1,600 and supplies the frequency components to the frequency component processing section 48 .
  • At step S 36 , the frequency component processing section 48 processes the frequency components of the level signal of the block from the frequency conversion section 47 .
  • the frequency component processing section 48 adds, to the frequency components of the frequencies corresponding to, for example, the tempos 50 to 400 from among the frequency components of the level signal of the block from the frequency conversion section 47 , frequency components (harmonics) of the frequencies corresponding to the tempos equal to twice, three times and four times the tempos, respectively. Then, the frequency component processing section 48 determines the sum values as new frequency components and thereby obtains frequency components of the frequencies corresponding to the range of the tempos 50 to 400 , and supplies the frequency components to the statistic processing section 49 .
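  • The harmonic addition performed by the frequency component processing section 48 can be sketched as below. The representation is an assumption made for illustration: the frequency components are held in a dictionary keyed by integer tempo, and the name `add_harmonics` does not come from the text.

```python
def add_harmonics(components, low=50, high=400, multiples=(2, 3, 4)):
    """Fold harmonics into the components for tempos `low`..`high`.

    `components` maps an integer tempo to its frequency component
    (an assumed representation). Missing tempos count as zero.
    """
    result = {}
    for tempo in range(low, high + 1):
        total = components.get(tempo, 0.0)
        for m in multiples:
            # Add the component found at 2x, 3x and 4x the tempo.
            total += components.get(tempo * m, 0.0)
        result[tempo] = total
    return result
```

  • With the default range, the highest harmonic consulted is 4 x 400 = 1,600, which matches the range of tempos 50 to 1,600 retained by the frequency conversion section 47 .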
  • At step S 37 , the statistic processing section 49 decides whether or not frequency components of the level signal of blocks for one tune are received from the frequency component processing section 48 . If it is decided that frequency components of the level signal of blocks for one tune are not received as yet, then the processing returns to step S 34 . Then at step S 34 , the window processing section 46 extracts, from within the level signal succeeding the level signal extracted as one block, the level signal for one block and performs a window process for the extracted level signal for one block. Then, the window processing section 46 supplies the level signal of the block after the window process to the frequency conversion section 47 , whereafter the processing advances to step S 35 so that the processes described above are repeated.
  • the window processing section 46 may extract the level signal for one block from a point of time immediately after the block extracted at step S 34 in the immediately preceding cycle and perform a window process for the extracted level signal for one block or may otherwise extract the level signal for one block such that the level signal for one block overlaps with the level signal of a block extracted at step S 34 in the immediately preceding cycle and perform a window process for the extracted level signal.
  • If it is decided at step S 37 that frequency components of the level signal of blocks for one tune are received, then the processing advances to step S 38 , at which the statistic processing section 49 performs a statistic process for the blocks for one tune.
  • the statistic processing section 49 adds the frequency components of the level signal for one tune successively supplied thereto in a unit of a block from the frequency component processing section 48 for the individual frequencies. Then, the statistic processing section 49 supplies frequency components A of the frequencies of the level signal for one tune obtained by the statistic process to the feature extraction section 23 , whereafter the processing returns to step S 13 of FIG. 5 .
  • At step S 14 , the tempo calculation section 31 uses the frequency of the maximum frequency component A 1 from among the frequency components A obtained by the statistic process of the frequency components of the level signal of the blocks for one tune supplied thereto from the statistic processing section 49 as the basic frequency f b of the level signal to determine the tempo t in accordance with the expression (1) given hereinabove. Consequently, the tempo t of the audio signal corresponding to one tune can be determined with a high degree of accuracy.
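  • The statistic process of step S 38 and the selection of the basic frequency at step S 14 can be sketched as follows. The function names and the list-based data layout are assumptions for illustration; the conversion of the basic frequency into a tempo is left to the expression (1).

```python
def accumulate_blocks(blocks):
    """Add the per-block frequency components frequency by frequency."""
    if not blocks:
        raise ValueError("at least one block is required")
    totals = [0.0] * len(blocks[0])
    for block in blocks:
        if len(block) != len(totals):
            raise ValueError("all blocks must cover the same frequencies")
        for i, value in enumerate(block):
            totals[i] += value
    return totals


def basic_frequency(frequencies, components):
    """Return the frequency whose accumulated component is the largest."""
    index = max(range(len(components)), key=lambda i: components[i])
    return frequencies[index]
```

  • Summing over all blocks of a tune lets a periodicity that persists through the tune dominate over peaks that appear only in a few blocks.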
  • the window processing section 46 extracts the level signal for one block as seen in FIG. 7B at step S 34 of FIG. 6 .
  • the window processing section 46 extracts a predetermined number of samples from the level signal illustrated in FIG. 7A as the level signal of one block.
  • the window processing section 46 performs a window process for the level signal of the block illustrated in FIG. 7B (or multiplies the level signal of the block by a predetermined window function) to obtain a level signal illustrated in FIG. 7C wherein opposite end portions of the block are attenuated.
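  • Block extraction and the window process of step S 34 can be sketched as below. The text speaks only of "a predetermined window function", so the Hann window used here is an assumption, as are the names `extract_blocks`, `apply_window`, `size` and `hop`.

```python
import math


def extract_blocks(level, size, hop):
    """Cut a level signal into blocks of `size` samples.

    hop == size gives back-to-back blocks; hop < size gives overlapping
    blocks. Both variants are allowed by the description.
    """
    return [level[s:s + size] for s in range(0, len(level) - size + 1, hop)]


def apply_window(block):
    """Multiply a block by a Hann window (assumed window function)."""
    n = len(block)
    if n < 2:
        return list(block)
    return [
        x * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
        for i, x in enumerate(block)
    ]
```

  • The window tapers both ends of the block toward zero, which corresponds to the attenuation of the opposite end portions shown in FIG. 7C .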
  • the level signal of the block illustrated in FIG. 7C is supplied from the window processing section 46 to the frequency conversion section 47 . Then at step S 35 of FIG. 6 , the frequency conversion section 47 performs discrete cosine transform for the level signal to obtain frequency components of frequencies corresponding to the range of the tempos 50 to 1,600 as seen in FIG. 7D .
  • the axis of abscissa indicates the frequency and the axis of ordinate indicates the frequency component.
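  • The frequency conversion of step S 35 can be illustrated with a direct discrete cosine transform. The text does not state which DCT variant is used, so the unnormalized DCT-II below is an assumption, and the O(n^2) loop is chosen for clarity rather than speed.

```python
import math


def dct_ii(block):
    """Unnormalized DCT-II of a block of samples (assumed variant).

    Coefficient k measures how strongly the block correlates with a
    cosine that completes k half-cycles over the block.
    """
    n = len(block)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(block))
        for k in range(n)
    ]
```

  • A level signal that rises and falls periodically yields a large coefficient at the index matching its period, which is why a peak in the transformed level signal indicates a candidate for the basic frequency.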
  • the frequency components of the frequencies corresponding to the range from the tempo 50 to the tempo 1,600 illustrated in FIG. 7D are supplied from the frequency conversion section 47 to the frequency component processing section 48 .
  • the frequency component processing section 48 adds, to the frequency components of the frequencies corresponding to the tempos 50 to 400 , frequency components (harmonics) of frequencies corresponding to tempos equal to twice, three times and four times the tempos, respectively. Then, the frequency component processing section 48 determines the sum values newly as frequency components of the frequencies corresponding to the tempos. Consequently, frequency components of the frequencies corresponding to the range of the tempos 50 to 400 are obtained as seen in FIG. 7E . It is to be noted that the axis of abscissa indicates the frequency and the axis of ordinate indicates the frequency component similarly as in FIG. 7D .
  • the statistic processing section 49 adds, at step S 38 of FIG. 6 , the frequency components illustrated in FIG. 7E regarding the level signal of the blocks for one tune thereby to obtain, for example, frequency components A illustrated in FIG. 8 regarding the audio signal of one tune.
  • the frequency components A of FIG. 8 include 11 peaks (maximum values) A 1 to A 11 .
  • Of the peaks A 1 to A 11 , the ten comparatively high peaks in the descending order are the frequency components A 1 to A 10 , and the corresponding frequencies are frequencies f 1 to f 10 , respectively.
  • the maximum frequency component is the frequency component A 1 .
  • the frequency f 1 of the frequency component A 1 is determined as the basic frequency f b of the level signal, and the tempo t of the overall audio signal of one tune is determined in accordance with the expression (1) given hereinabove.
  • the peak extraction section 61 of the speed feeling detection section 32 of FIG. 3 extracts, from the frequency components A of the level signal supplied thereto from the statistic processing section 49 ( FIG. 2 ) at step S 38 of FIG. 6 , those frequency components which each forms a peak, and further extracts, from the extracted frequency components, ten frequency components A 1 to A 10 having comparatively high peaks in the descending order. Then, the peak extraction section 61 supplies the ten comparatively high frequency components A 1 to A 10 to the peak addition section 62 , and supplies the frequency components A 1 to A 10 and the corresponding frequencies f 1 to f 10 to the peak frequency arithmetic operation section 63 .
  • the peak extraction section 61 extracts, from among the peaks A 1 to A 11 which each forms a peak, the frequency components A 1 to A 10 which form ten comparatively high peaks in the descending order. Then, the frequency components A 1 to A 10 are supplied to the peak addition section 62 , and the frequency components A 1 to A 10 and the frequencies f 1 to f 10 are supplied to the peak frequency arithmetic operation section 63 .
  • At step S 54 , the speed feeling arithmetic operation section 64 arithmetically operates a speed feeling S (or information representative of a speed feeling S) based on the sum value $\sum A_i$ supplied thereto from the peak addition section 62 and the integrated value $\sum A_i \times f_i$ supplied thereto from the peak frequency arithmetic operation section 63 . Then, the speed feeling arithmetic operation section 64 supplies the speed feeling S to the tempo correction section 33 and outputs the speed feeling S to the outside. Then, the speed feeling arithmetic operation section 64 returns the processing to step S 16 of FIG. 5 .
  • the speed feeling arithmetic operation section 64 uses the following expression (2) to arithmetically operate a speed feeling S and supplies the speed feeling S to the tempo correction section 33 .
  • each of the frequencies f i of the frequency components which each forms a peak is weighted in accordance with the magnitude of the frequency component A i of the peak, and the weighted frequencies f i are added. Accordingly, the speed feeling S determined using the expression (2) exhibits a high value where comparatively high peaks of the frequency components A i exist much on the high frequency side, but exhibits a low value where comparatively high peaks of the frequency components A i exist much on the low frequency side.
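  • Peak extraction and the speed feeling computation can be sketched as follows. The expression (2) itself is not reproduced in this text, so the normalized form used here, the integrated value of A i × f i divided by the sum of A i , is an assumption chosen to agree with the two quantities supplied to the speed feeling arithmetic operation section 64 . The function names are likewise illustrative.

```python
def extract_peaks(frequencies, components, count=10):
    """Return up to `count` highest local maxima as (component, frequency)."""
    peaks = []
    for i in range(1, len(components) - 1):
        # A peak is a component strictly larger than both neighbours.
        if components[i - 1] < components[i] > components[i + 1]:
            peaks.append((components[i], frequencies[i]))
    peaks.sort(reverse=True)  # descending order of component magnitude
    return peaks[:count]


def speed_feeling(peaks):
    """Component-weighted mean of the peak frequencies (assumed form)."""
    total = sum(a for a, _ in peaks)
    if total == 0:
        raise ValueError("no peak energy to weight the frequencies with")
    return sum(a * f for a, f in peaks) / total
```

  • Under this form, high peaks concentrated at high frequencies pull the value up and high peaks concentrated at low frequencies pull it down, as the description requires.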
  • the speed feeling S determined using the expression (2) is further described with reference to FIGS. 10 and 11 .
  • FIGS. 10 and 11 illustrate an example of the frequency components A of the audio signal of one tune obtained by the frequency analysis section 22 . It is to be noted that, in FIGS. 10 and 11 , the axis of abscissa indicates the frequency, and the axis of ordinate indicates the frequency component (basic frequency likelihood).
  • the frequency components A of the level signal are one-sided to the low frequency side as seen in FIG. 10 .
  • a speed feeling S having a low value is obtained.
  • the tempo correction section 33 decides whether or not the tempo t supplied thereto from the tempo calculation section 31 ( FIG. 1 ) at step S 14 of FIG. 5 is higher than a predetermined value (threshold value) TH 1 .
  • a predetermined value TH 1 is set, for example, upon manufacture of the feature value detection apparatus 1 , by a manufacturer of the feature value detection apparatus 1 .
  • At step S 72 , the tempo correction section 33 decides whether or not the speed feeling S supplied from the speed feeling detection section 32 at step S 54 of FIG. 9 is higher than a predetermined value (threshold value) TH 2 .
  • the predetermined value TH 2 is set, for example, upon manufacture of the feature value detection apparatus 1 , by a manufacturer of the feature value detection apparatus 1 .
  • If it is decided at step S 72 that the speed feeling S from the speed feeling detection section 32 is higher than the predetermined value TH 2 , that is, if a process result that both of the tempo t and the speed feeling S are high is obtained with regard to the original audio signal, then the processing advances to step S 74 .
  • If it is decided at step S 71 that the tempo t from the tempo calculation section 31 is not higher than the predetermined value TH 1 , that is, when the tempo t from the tempo calculation section 31 is slow, the processing advances to step S 73 .
  • At step S 73 , it is decided whether or not the speed feeling S supplied from the speed feeling detection section 32 at step S 54 of FIG. 9 is higher than a predetermined value TH 3 similarly as at step S 72 .
  • the predetermined value TH 3 is set, for example, upon manufacture of the feature value detection apparatus 1 , by a manufacturer of the feature value detection apparatus 1 . Further, the values of the predetermined values TH 2 and TH 3 may be equal to each other or may be different from each other.
  • If it is decided at step S 73 that the speed feeling S from the speed feeling detection section 32 is not higher than the predetermined value TH 3 , that is, if a processing result that both of the tempo t and the speed feeling S are low is obtained with regard to the original audio signal, then the processing advances to step S 74 .
  • the tempo correction section 33 determines the tempo t from the tempo calculation section 31 as it is as a tempo of the audio signal. In particular, if it is decided at step S 72 that the speed feeling S is high, then since it is decided that the tempo t from the tempo calculation section 31 is fast and the speed feeling S from the speed feeling detection section 32 is high, it is determined that the tempo t from the tempo calculation section 31 is reasonable from comparison thereof with the speed feeling S. Thus, at step S 74 , the tempo t from the tempo calculation section 31 is finally determined as it is as the tempo of the audio signal.
  • Likewise, if it is decided at step S 73 that the speed feeling S is not high, then since it is decided that the tempo t from the tempo calculation section 31 is slow and the speed feeling S from the speed feeling detection section 32 is low, it is still determined that the tempo t from the tempo calculation section 31 is reasonable from comparison thereof with the speed feeling S. Consequently, at step S 74 , the tempo t from the tempo calculation section 31 is finally determined as it is as the tempo of the audio signal. After the tempo correction section 33 determines the tempo, the processing returns to step S 16 of FIG. 5 .
  • If it is decided at step S 72 that the speed feeling S from the speed feeling detection section 32 is not higher than the predetermined value TH 2 , that is, if a processing result that the tempo t from the tempo calculation section 31 is fast but the speed feeling S from the speed feeling detection section 32 is low is obtained with regard to the original audio signal, then the processing advances to step S 75 .
  • the tempo correction section 33 determines a value of, for example, one half the tempo t from the tempo calculation section 31 as the tempo t of the audio signal.
  • the tempo correction section 33 corrects the tempo t from the tempo calculation section 31 to a value equal to one half the tempo t and determines the corrected value as the tempo of the audio signal.
  • If it is decided at step S 73 that the speed feeling S from the speed feeling detection section 32 is higher than the predetermined value TH 3 , that is, if a processing result that the tempo t from the tempo calculation section 31 is slow but the speed feeling S from the speed feeling detection section 32 is high is obtained with regard to the original audio signal, then the processing advances to step S 76 .
  • the tempo correction section 33 determines a value of, for example, twice the tempo t from the tempo calculation section 31 as the tempo t of the audio signal.
  • the tempo correction section 33 corrects the tempo t from the tempo calculation section 31 to a value equal to twice the tempo t and determines the corrected value as the tempo of the audio signal.
  • Since the tempo correction section 33 corrects the tempo t from the tempo calculation section 31 based on the speed feeling S from the speed feeling detection section 32 , an accurate tempo t which corresponds to the speed feeling S can be obtained.
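  • The decisions of steps S 71 to S 76 can be summarized in a short sketch. The function name and the argument names are illustrative, and the threshold values must be supplied by the caller because the text gives no numerical values for TH 1 , TH 2 and TH 3 .

```python
def correct_tempo(tempo, speed_feeling, th1, th2, th3):
    """Halve or double a tempo that disagrees with the speed feeling."""
    if tempo > th1:                  # step S71: the tempo is fast
        if speed_feeling > th2:      # step S72: the speed feeling is high
            return tempo             # step S74: keep the tempo as it is
        return tempo / 2             # step S75: fast tempo, low speed feeling
    if speed_feeling > th3:          # step S73: the speed feeling is high
        return tempo * 2             # step S76: slow tempo, high speed feeling
    return tempo                     # step S74: keep the tempo as it is
```

  • Halving and doubling target the error described in the background, where the number of eighth notes rather than quarter notes for one minute is detected as the tempo.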
  • the addition section 81 adds the frequency components A of the frequencies corresponding to the range of the tempos 50 to 400 supplied thereto from the frequency analysis section 22 at step S 38 of FIG. 6 over all of the frequencies and supplies a resulting sum value $\sum A$ to the division section 83 .
  • the peak extraction section 82 extracts, from among the frequency components A of the frequencies corresponding to the range of the tempos 50 to 400 supplied thereto from the frequency analysis section 22 at step S 38 of FIG. 6 , the maximum frequency component A 1 and supplies the frequency component A 1 to the division section 83 .
  • After step S 92 , the processing advances to step S 93 , at which the division section 83 arithmetically operates a tempo fluctuation W based on the sum value $\sum A$ of the frequency components A supplied thereto from the addition section 81 and the maximum frequency component A 1 supplied thereto from the peak extraction section 82 and outputs the tempo fluctuation W to the outside.
  • the tempo fluctuation W represents a ratio of the sum value $\sum A$ of the frequency components to the maximum frequency component A 1 . Accordingly, the tempo fluctuation W determined using the expression (3) exhibits a low value where the frequency component A 1 is much greater than the other frequency components A, but exhibits a high value where the frequency component A 1 is not much greater than the other frequency components A.
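  • The ratio described above can be sketched as follows. The expression (3) is not reproduced in this text, so the code follows the verbal description only (the sum of the frequency components divided by the maximum frequency component); the function name is illustrative.

```python
def tempo_fluctuation(components):
    """Ratio of the sum of all frequency components to the largest one."""
    peak = max(components)
    if peak <= 0:
        raise ValueError("the maximum frequency component must be positive")
    return sum(components) / peak
```

  • A single dominant peak drives the value toward 1, whereas components of similar magnitude drive it toward the number of components.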
  • FIGS. 14 and 15 illustrate an example of the frequency components A regarding an audio signal of one tune obtained by the frequency analysis section 22 . It is to be noted that the axis of abscissa indicates the frequency and the axis of ordinate indicates the frequency component (basic frequency likelihood).
  • a tempo fluctuation W of a value which corresponds to the degree of variation of the tempo of the audio signal can be determined.
  • the tempo t can be detected with a high degree of accuracy.
  • If the tempo t or the tempo fluctuation W outputted from the feature value detection apparatus 1 is used, then it is possible to recommend music (a tune) to the user.
  • an audio signal of classic music or a live performance usually has a slow tempo t and has a great tempo fluctuation W.
  • an audio signal of music in which an electronic drum is used usually has a fast tempo t and a small tempo fluctuation W.
  • Although the tempo correction section 33 in the present embodiment corrects the tempo t determined by the frequency analysis of the level signal of the audio signal based on the speed feeling S of the audio signal, the correction of the tempo t may otherwise be performed for a tempo obtained by any method.
  • Although the adder 20 adds audio signals of the left channel and the right channel in order to moderate the load of processing, a feature value detection process can be performed for each channel without adding the audio signals of the left and right channels.
  • such feature values as the tempo t, speed feeling S or tempo fluctuation W can be detected with a high degree of accuracy for each of the audio signals of the left and right channels.
  • Although the feature value detection apparatus 1 uses discrete cosine transform for the frequency analysis of a level signal, a comb filter, a short-time Fourier analysis, wavelet conversion and so forth, for example, can be used instead for the frequency analysis of a level signal.
  • processing for an audio signal can be performed such that the audio signal is band divided into a plurality of audio signals of different frequency bands and the processing is performed for each of the audio signals of the individual frequency bands.
  • the tempo t, speed feeling S and tempo fluctuation W can be detected with a higher degree of accuracy.
  • the audio signal may not be a stereo signal but be a monaural signal.
  • Although the statistic processing section 49 performs a statistic process for blocks for one tune, the statistic process may be performed in a different manner, for example, for some of blocks of one tune.
  • the frequency conversion section 47 may perform discrete cosine transform for the overall level signal of one tune.
  • Although an audio signal in the form of a digital signal is inputted, it is otherwise possible to input an audio signal in the form of an analog signal. It is to be noted, however, that, in this instance, it is necessary to provide an A/D (Analog/Digital) converter, for example, at a preceding stage to the adder 20 or between the adder 20 and the level calculation section 21 .
  • the arithmetic operation expression for the speed feeling S is not limited to the expression (2).
  • the arithmetic operation expression for the tempo fluctuation W is not limited to the expression (3).
  • Although the tempo t, speed feeling S and tempo fluctuation W are determined as feature values of an audio signal, it is possible to determine some other feature value such as the beat.
  • FIG. 16 shows an example of a configuration of a form of a computer into which a program for executing the series of processes described above is to be installed.
  • the program can be recorded in advance on a hard disk 105 or in a ROM 103 as a recording medium built in the computer.
  • the program may be stored (recorded) temporarily or permanently on a removable recording medium 111 such as a flexible disk, a CD-ROM (Compact Disc-Read Only Memory), an MO (Magneto-Optical) disk, a DVD (Digital Versatile Disc), a magnetic disk or a semiconductor memory.
  • a removable recording medium 111 as just described can be provided as package software.
  • the program may not only be installed from such a removable recording medium 111 as described above into the computer but also be transferred from a download site by radio communication into the computer through an artificial satellite for digital satellite broadcasting or transferred by wire communication through a network such as a LAN (Local Area Network) or the Internet to the computer.
  • the computer thus can receive the program transferred in this manner by a communication section 108 and install the program into the hard disk 105 built therein.
  • the computer has a built-in CPU (Central Processing Unit) 102 .
  • An input/output interface 110 is connected to the CPU 102 through a bus 101 . Consequently, if an instruction is inputted through the input/output interface 110 when an inputting section 107 formed from a keyboard, a mouse, a microphone and so forth is operated by the user or the like, then the CPU 102 loads a program stored in the ROM (Read Only Memory) 103 in accordance with the instruction.
  • the CPU 102 loads a program stored on the hard disk 105 , a program transferred from a satellite or a network, received by the communication section 108 and installed in the hard disk 105 , or a program read out from the removable recording medium 111 loaded in a drive 109 and installed in the hard disk 105 , into a RAM (Random Access Memory) 104 and then executes the program. Consequently, the CPU 102 performs the process in accordance with the flow charts described hereinabove or performs processes which can be performed by the configuration described hereinabove with reference to the block diagrams.
  • the CPU 102 causes, for example, an outputting section 106 , which is formed from an LCD (Liquid Crystal Display) unit, a speaker and so forth, to output a result of the process through the input/output interface 110 or causes the communication section 108 to transmit or the hard disk 105 to record the result of the process.
  • the steps which describe the program for causing a computer to execute various processes may be but need not necessarily be processed in a time series in the order as described as the flow charts, and include processes which are executed in parallel or individually (for example, processes by parallel processing or by an object).
  • the program may be processed by a single computer or may otherwise be processed in a distributed fashion by a plurality of computers. Further, the program may be transferred to and executed by a computer at a remote place.

Abstract

A signal processing apparatus and method is disclosed by which a feature value of an audio signal such as the tempo can be detected with a high degree of accuracy. A level calculation section produces a level signal representative of a transition of the level of an audio signal. A frequency analysis section frequency analyzes the level signal. A feature value extraction section determines a tempo, a speed feeling and a tempo fluctuation of the audio signal based on a result of the frequency analysis of the level signal. The invention can be applied to an apparatus which determines, for example, a tempo from an audio signal.

Description

    BACKGROUND OF THE INVENTION
  • This invention relates to a signal processing apparatus and a signal processing method, a program, and a recording medium, and more particularly to a signal processing apparatus and a signal processing method, a program, and a recording medium by which a feature value of an audio signal such as the tempo is detected with a high degree of accuracy.
  • Various methods are known by which the tempo of an audio signal of, for example, a tune is detected. According to one of the methods, a peak portion and a level of an autocorrelation function of sound production starting time of an audio signal are observed to analyze the periodicity of the sound production time, and the tempo which is the number of quarter notes for one minute is detected from a result of the analysis. The method described is disclosed, for example, in Japanese Patent Laid-Open No. 2002-116754.
  • However, according to such a method of detecting the tempo from the periodicity of sound production time of a peak portion of an autocorrelation function as described above, if a peak appears at a position corresponding to an eighth note in the autocorrelation function, then not the number of quarter notes for one minute but the number of eighth notes is likely to be detected as the tempo. For example, music of the tempo 60 (the number of quarter notes for one minute is 60) is sometimes detected as music of the tempo 120, wherein the number of peaks for one minute, that is, the number of eighth notes, is 120. Accordingly, it is difficult to accurately detect the tempo.
  • Also a large number of algorithms are available for detecting the tempo instantaneously from an audio signal for a certain short period of time. However, it is difficult to detect the tempo of an overall tune using the algorithms.
  • SUMMARY OF THE INVENTION
  • It is an object of the present invention to provide a signal processing apparatus and a signal processing method, a program, and a recording medium by which a feature value of an audio signal such as the tempo can be detected with a high degree of accuracy.
  • In order to attain the object described above, according to an aspect of the present invention, there is provided a signal processing apparatus for processing an audio signal, comprising a production section for producing a level signal representative of a transition of the level of the audio signal, a frequency analysis section for frequency analyzing the level signal produced by the production section, and a feature value calculation section for determining a feature value or values of the audio signal based on a result of the frequency analysis by the frequency analysis section.
  • According to another aspect of the present invention, there is provided a signal processing method for a signal processing apparatus which processes an audio signal, comprising a production step of producing a level signal representative of a transition of the level of the audio signal, a frequency analysis step of frequency analyzing the level signal produced by the process at the production step, and a feature value calculation step of determining a feature value or values of the audio signal based on a result of the frequency analysis by the process at the frequency analysis step.
  • According to a further aspect of the present invention, there is provided a program for causing a computer to execute processing of an audio signal, comprising a production step of producing a level signal representative of a transition of the level of the audio signal, a frequency analysis step of frequency analyzing the level signal produced by the process at the production step, and a feature value calculation step of determining a feature value or values of the audio signal based on a result of the frequency analysis by the process at the frequency analysis step.
  • According to a still further aspect of the present invention, there is provided a recording medium on or in which a program for causing a computer to execute processing of an audio signal is recorded, the program comprising a production step of producing a level signal representative of a transition of the level of the audio signal, a frequency analysis step of frequency analyzing the level signal produced by the process at the production step, and a feature value calculation step of determining a feature value or values of the audio signal based on a result of the frequency analysis by the process at the frequency analysis step.
  • In the signal processing apparatus, signal processing method, program and recording medium, a level signal representative of a transition of the level of an audio signal is produced and frequency analyzed. Then, a feature value of the audio signal is determined based on a result of the frequency analysis.
  • Therefore, with the signal processing apparatus, signal processing method, program and recording medium, a feature value of music such as the tempo can be detected with a high degree of accuracy.
  • The above and other objects, features and advantages of the present invention will become apparent from the following description and the appended claims, taken in conjunction with the accompanying drawings in which like parts or elements are denoted by like reference symbols.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram showing an example of a configuration of a feature value detection apparatus to which the present invention is applied;
  • FIG. 2 is a block diagram showing a detailed configuration of a level calculation section and a frequency analysis section shown in FIG. 1;
  • FIG. 3 is a block diagram showing a detailed configuration of a speed feeling detection section shown in FIG. 1;
  • FIG. 4 is a block diagram showing a detailed configuration of a tempo fluctuation detection section shown in FIG. 1;
  • FIG. 5 is a flow chart illustrating a feature value detection process performed by the feature value detection apparatus of FIG. 1;
  • FIG. 6 is a flow chart illustrating a frequency analysis process at step S13 of FIG. 5;
  • FIGS. 7A to 7E and 8 are waveform diagrams illustrating the frequency analysis process of a frequency analysis section shown in FIG. 1;
  • FIG. 9 is a flow chart illustrating a speed feeling detection process at step S15 of FIG. 5;
  • FIGS. 10 and 11 are diagrams illustrating different examples of frequency components of an audio signal of one tune obtained by the frequency analysis section shown in FIG. 1;
  • FIG. 12 is a flow chart illustrating a tempo correction process at step S16 of FIG. 5;
  • FIG. 13 is a flow chart illustrating a tempo fluctuation detection process at step S17 of FIG. 5;
  • FIGS. 14 and 15 are diagrams illustrating different examples of frequency components of an audio signal of one tune obtained by the frequency analysis section shown in FIG. 1; and
  • FIG. 16 is a block diagram showing an example of a configuration of a computer to which the present invention is applied.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
  • Before the best mode for carrying out the present invention is described in detail, a corresponding relationship between several features recited in the accompanying claims and particular elements of the preferred embodiment described below is described. It is to be noted, however, that, even if some mode for carrying out the invention which is recited in the specification is not described in the description of the corresponding relationship below, this does not signify that the mode for carrying out the invention is out of the scope or spirit of the present invention. On the contrary, even if some mode for carrying out the invention is described as being within the scope or spirit of the present invention in the description of the corresponding relationship below, this does not signify that the mode is not within the spirit or scope of some other invention than the present invention.
  • Further, the following description does not signify all of the invention disclosed in the present specification. In other words, the following description does not deny the presence of an invention which is disclosed in the specification but is not recited in the claims of the present application, that is, the description does not deny the presence of an invention which may be filed for patent in a divisional patent application or may be additionally included into the present patent application as a result of later amendment.
  • According to claim 1 of the present invention, there is provided a signal processing apparatus (for example, a feature value detection apparatus 1 of FIG. 1) for processing an audio signal, comprising a production section (for example, a level calculation section 21 of FIG. 1) for producing a level signal representative of a transition of the level of the audio signal, a frequency analysis section (for example, a frequency analysis section 22 of FIG. 1) for frequency analyzing the level signal produced by the production section, and a feature value calculation section (for example, a feature extraction section 23 of FIG. 1) for determining a feature value or values of the audio signal based on a result of the frequency analysis by the frequency analysis section.
  • According to claim 6 of the present invention, the signal processing apparatus may further comprise a statistic processing section (for example, a statistic processing section 49 of FIG. 2) for performing a statistic process of the result of the frequency analysis by the frequency analysis section. In this instance, the feature value calculation section determines the feature value or values based on the result of the frequency analysis statistically processed by the statistic processing section.
  • According to claim 7 of the present invention, the signal processing apparatus may further comprise a frequency component processing section (for example, a frequency component processing section 48 of FIG. 2) for adding, to frequency components of the level signal of the result of the frequency analysis by the frequency analysis section, frequency components having a relationship of harmonics to the frequency components and outputting the sum values as the frequency components of the level signal. In this instance, the feature value calculation section determines the feature value or values based on the frequency components outputted from the frequency component processing section.
  • According to claim 8 of the present invention, there is provided a signal processing method for a signal processing apparatus which processes an audio signal, comprising a production step (for example, a step S12 of FIG. 5) of producing a level signal representative of a transition of the level of the audio signal, a frequency analysis step (for example, a step S13 of FIG. 5) of frequency analyzing the level signal produced by the process at the production step, and a feature value calculation step (for example, steps S14 to S16 of FIG. 5) of determining a feature value or values of the audio signal based on a result of the frequency analysis by the process at the frequency analysis step.
  • According to claims 9 and 10 of the present invention, there are provided a program for causing a computer to execute processing of an audio signal and a recording medium on or in which a program for causing a computer to execute processing of an audio signal is recorded, the program comprising a production step (for example, a step S12 of FIG. 5) of producing a level signal representative of a transition of the level of the audio signal, a frequency analysis step (for example, a step S13 of FIG. 5) of frequency analyzing the level signal produced by the process at the production step, and a feature value calculation step (for example, steps S14 to S16 of FIG. 5) of determining a feature value or values of the audio signal based on a result of the frequency analysis by the process at the frequency analysis step.
  • In the following, a preferred embodiment of the present invention is described.
  • Referring to FIG. 1, there is shown in block diagram an example of a configuration of a feature value detection apparatus to which the present invention is applied.
  • The feature value detection apparatus 1 shown receives an audio signal supplied thereto as a digital signal of a tune reproduced, for example, from a CD (Compact Disc) and detects and outputs, for example, a tempo t, a speed feeling S and a tempo fluctuation W as feature values of the audio signal. It is to be noted that, in FIG. 1, the audio signal supplied to the feature value detection apparatus 1 is a stereo signal.
  • The feature value detection apparatus 1 includes an adder 20, a level calculation section 21, a frequency analysis section 22 and a feature extraction section 23.
  • An audio signal of the left channel and another audio signal of the right channel of a tune are supplied to the adder 20. The adder 20 adds the audio signals of the left and right channels and supplies a resulting signal to the level calculation section 21.
  • The level calculation section 21 produces a level signal representative of a transition of the level of the audio signal supplied thereto from the adder 20 and supplies the produced level signal to the frequency analysis section 22.
  • The frequency analysis section 22 frequency analyzes the level signal representative of a transition of the level of the audio signal supplied thereto from the level calculation section 21 and outputs frequency components A of individual frequencies of the level signal as a result of the analysis. Then, the frequency analysis section 22 supplies the frequency components A to the feature extraction section 23.
  • The feature extraction section 23 includes a tempo calculation section 31, a speed feeling detection section 32, a tempo correction section 33 and a tempo fluctuation detection section 34.
  • The tempo calculation section 31 outputs a tempo (feature value) t of the audio signal based on the frequency components A of the level signal supplied thereto from the frequency analysis section 22 and supplies the tempo t to the tempo correction section 33.
  • The speed feeling detection section 32 detects a speed feeling S of the audio signal based on the frequency components A of the level signal supplied thereto from the frequency analysis section 22 and supplies the speed feeling S to the tempo correction section 33. Further, the speed feeling detection section 32 outputs the speed feeling S as one of feature values of the audio signal to the outside.
  • The tempo correction section 33 corrects the tempo t supplied thereto from the tempo calculation section 31 as occasion demands based on the speed feeling S supplied thereto from the speed feeling detection section 32. Then, the tempo correction section 33 outputs the corrected tempo t as one of feature values of the audio signal to the outside.
  • The tempo fluctuation detection section 34 detects a tempo fluctuation W which is a fluctuation of the tempo of the audio signal based on the frequency components A of the level signal supplied thereto from the frequency analysis section 22 and outputs the tempo fluctuation W as one of the feature values of the audio signal to the outside.
  • In the feature value detection apparatus 1 having such a configuration as described above, audio signals of the left channel and the right channel of a tune are supplied to the level calculation section 21 through the adder 20. The level calculation section 21 converts the audio signals into a level signal. Then, the frequency analysis section 22 detects frequency components A of the level signal, and the tempo calculation section 31 calculates the tempo t based on the frequency components A while the speed feeling detection section 32 detects the speed feeling S based on the frequency components A. The tempo correction section 33 corrects the tempo t based on the speed feeling S as occasion demands and outputs the corrected tempo t. Meanwhile, the tempo fluctuation detection section 34 detects and outputs the tempo fluctuation W based on the frequency components A.
  • FIG. 2 shows an example of a detailed configuration of the level calculation section 21 and the frequency analysis section 22 shown in FIG. 1.
  • Referring to FIG. 2, the level calculation section 21 includes an EQ (Equalize) processing section 41 and a level signal production section 42. The frequency analysis section 22 includes a decimation filter section 43, a down sampling section 44, an EQ processing section 45, a window processing section 46, a frequency conversion section 47, a frequency component processing section 48 and a statistic processing section 49.
  • An audio signal is supplied from the adder 20 to the EQ processing section 41. The EQ processing section 41 performs a filter process for the audio signal. For example, the EQ processing section 41 has a configuration of a high-pass filter (HPF) and removes low frequency components of the audio signal which are not suitable for extraction of the tempo t. Thus, the EQ processing section 41 outputs an audio signal of frequency components which are suitable for extraction of the tempo t to the level signal production section 42. It is to be noted that the coefficient of the filter used by the filter process of the EQ processing section 41 is not limited specifically.
  • The level signal production section 42 produces a level signal representative of a transition of the level of the audio signal supplied thereto from the EQ processing section 41 and supplies the level signal to (the decimation filter section 43 of) the frequency analysis section 22. It is to be noted that the level signal may represent, for example, an absolute value or a power (squared) value of the audio signal, a moving average (value) of such an absolute value or power value, or a value used for level indication by a level meter or the like. If a value used for level indication by a level meter is adopted as the level signal here, then the absolute value of the audio signal at each sample point serves as the level signal at the sample point. However, if the absolute value of the audio signal at a sample point whose level signal is to be outputted now is lower than the level signal at the immediately preceding sample point, then a value obtained by multiplying the level signal at the immediately preceding sample point by a release coefficient R equal to or higher than 0.0 but lower than 1.0 (0.0≦R<1.0) is used as the level signal at the sample point whose level signal is to be outputted now.
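  • The level-meter variant of the level signal described above can be sketched as follows. This is a minimal illustration only: the function name `level_meter` and the default release value are hypothetical, and the rule is applied literally as stated in the text (take the absolute value, unless it is lower than the preceding level, in which case take the preceding level multiplied by R).

```python
def level_meter(samples, release=0.99):
    """Level-meter style level signal (illustrative sketch).

    `release` plays the role of the release coefficient R; the default
    value 0.99 is an assumption, the text only requires 0.0 <= R < 1.0.
    """
    if not 0.0 <= release < 1.0:
        raise ValueError("release coefficient R must satisfy 0.0 <= R < 1.0")
    level = 0.0  # level signal at the immediately preceding sample point
    out = []
    for x in samples:
        magnitude = abs(x)
        if magnitude < level:
            # Input fell below the previous level: let the level decay by R.
            level = level * release
        else:
            # Otherwise the absolute value itself is the level.
            level = magnitude
        out.append(level)
    return out
```

  • With R = 0.5, an impulse followed by silence decays as 1.0, 0.5, 0.25, which shows how R controls how quickly the level signal falls after a beat.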
  • The decimation filter section 43 removes high frequency components of the level signal supplied thereto from the level signal production section 42 in order to allow down sampling to be performed by the down sampling section 44 at the next stage. The decimation filter section 43 supplies a resulting level signal to the down sampling section 44.
  • The down sampling section 44 performs down sampling of the level signal supplied thereto from the decimation filter section 43. Here, in order to detect the tempo t, only those components of the level signal having frequencies of several hundred Hz or so are required. Therefore, the down sampling section 44 thins out samples of the level signal to decrease the sampling frequency of the level signal to 172 Hz. The level signal after the down sampling is supplied to the EQ processing section 45. Here, the down sampling by the down sampling section 44 can reduce the load (arithmetic operation amount) of later processing.
  • The EQ processing section 45 performs a filter process of the level signal supplied thereto from the down sampling section 44 to remove low frequency components (for example, a dc component and frequency components lower than a frequency corresponding to the tempo 50 (the number of quarter notes for one minute is 50)) and high frequency components (frequency components higher than a frequency corresponding to the tempo 400 (the number of quarter notes for one minute is 400)) from the level signal. In other words, the EQ processing section 45 removes those low frequency components and high frequency components which are not suitable for extraction of the tempo t. Then, the EQ processing section 45 supplies a level signal of remaining frequencies as a result of the removal of the low frequency components and high frequency components to the window processing section 46. It is to be noted that, in the following description, the tempo of an audio signal in which the number of quarter notes for one minute is i is referred to as tempo i.
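  • The combination of decimation filtering and down sampling can be sketched as below. The text does not specify the decimation filter, so a plain block average is used here as a stand-in low-pass filter; that choice, the function name `decimate` and the 44.1 kHz input rate (the usual CD rate) are assumptions. Under that input rate, keeping one value per 256 samples gives approximately the 172 Hz mentioned above.

```python
def decimate(level, factor):
    """Average each run of `factor` samples and keep one value per run.

    The block average is an assumed stand-in for the decimation filter;
    the text does not state the actual filter design.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out = []
    # Step through complete runs only; a trailing partial run is dropped.
    for start in range(0, len(level) - factor + 1, factor):
        chunk = level[start:start + factor]
        out.append(sum(chunk) / factor)
    return out


# Assumed CD sampling rate; 44100 / 256 is roughly 172.27 Hz.
CD_RATE_HZ = 44100
FACTOR = CD_RATE_HZ // 172
```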
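  • Since tempo i means i quarter notes per minute, the frequency corresponding to a tempo is the tempo divided by 60. The short sketch below converts between the two and evaluates the band edges quoted above; the helper names are hypothetical.

```python
def tempo_to_hz(tempo):
    """Frequency in Hz of a pulse with `tempo` quarter notes per minute."""
    return tempo / 60.0


def hz_to_tempo(freq_hz):
    """Inverse conversion: quarter notes per minute for a frequency in Hz."""
    return freq_hz * 60.0


# Band edges of the EQ processing described above (tempos 50 and 400).
LOW_EDGE_HZ = tempo_to_hz(50)    # about 0.83 Hz
HIGH_EDGE_HZ = tempo_to_hz(400)  # about 6.67 Hz
```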
  • The window processing section 46 extracts, from the level signal supplied thereto from the EQ processing section 45, the level signals for a predetermined period of time, that is, a predetermined number of samples of the level signal, as one block in a time sequence. Further, in order to reduce the influence of sudden variation of the level signal at the opposite ends of the block or for some other object, the window processing section 46 window processes the level signal of the block using a window function such as a Hamming window or a Hanning window by which portions at the opposite ends of the block are gradually attenuated (or multiplies the level signal of the block by a window function) and supplies a resulting level signal to the frequency conversion section 47.
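  • The window process amounts to multiplying each block by a window function sample by sample. The following sketch uses the standard Hamming window; the function names are hypothetical, and the block length is left to the caller because the text does not state it.

```python
import math


def hamming(n):
    """Standard Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1))
            for k in range(n)]


def apply_window(block):
    """Multiply a block of the level signal by a Hamming window."""
    window = hamming(len(block))
    return [x * w for x, w in zip(block, window)]
```

  • The window equals 0.08 at both ends of the block and 1.0 at its centre, so the ends are attenuated gradually as described.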
  • The frequency conversion section 47 performs, for example, discrete cosine transform for the level signal of the block supplied thereto from the window processing section 46 to perform frequency conversion (frequency analysis) of the level signal. The frequency conversion section 47 obtains frequency components of frequencies corresponding, for example, to the tempos 50 to 1,600 from among the frequency components obtained by the frequency conversion of the level signal of the block and supplies the obtained frequency components to the frequency component processing section 48.
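  • A sketch of the frequency conversion follows. The text names the discrete cosine transform without giving its type, so the common DCT-II is assumed; taking the magnitude of each coefficient as the frequency component is also an assumption. For a DCT-II of N samples at sampling frequency fs, bin k corresponds to k·fs/(2N) Hz, that is, to the tempo 60·k·fs/(2N).

```python
import math


def dct_ii(block):
    """Unnormalised DCT-II of a block (direct O(N^2) evaluation)."""
    n = len(block)
    return [sum(x * math.cos(math.pi / n * (i + 0.5) * k)
                for i, x in enumerate(block))
            for k in range(n)]


def bin_to_tempo(k, n, fs):
    """Tempo (quarter notes per minute) of DCT-II bin k."""
    return 60.0 * k * fs / (2.0 * n)


def components_in_tempo_range(block, fs, lo=50.0, hi=1600.0):
    """Map tempo -> coefficient magnitude for bins within [lo, hi]."""
    n = len(block)
    coeffs = dct_ii(block)
    result = {}
    for k, c in enumerate(coeffs):
        tempo = bin_to_tempo(k, n, fs)
        if lo <= tempo <= hi:
            result[tempo] = abs(c)  # magnitude: an assumed choice
    return result
```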
  • The frequency component processing section 48 processes the frequency components of the level signal of the block from the frequency conversion section 47. In particular, the frequency component processing section 48 adds, to the frequency components of frequencies corresponding to, for example, the tempos 50 to 400 from among the frequency components of the level signal of the block from the frequency conversion section 47, frequency components (harmonics) of frequencies corresponding to tempos equal to twice, three times and four times the tempos, respectively. Then, the frequency component processing section 48 determines results of the addition as frequency components of the frequencies corresponding to the tempos.
  • For example, to a frequency component of a frequency corresponding to the tempo 50, frequency components of a frequency corresponding to the tempo 100 which is twice the tempo 50, another frequency corresponding to the tempo 150 which is three times the tempo 50 and a further frequency corresponding to the tempo 200 which is four times the tempo 50 are added, and the sum is determined as a frequency component of the frequency corresponding to the tempo 50. Further, for example, to a frequency component of a frequency corresponding to the tempo 100, frequency components of a frequency corresponding to the tempo 200 which is twice the tempo 100, another frequency corresponding to the tempo 300 which is three times the tempo 100 and a further frequency corresponding to the tempo 400 which is four times the tempo 100 are added, and the sum is determined as a frequency component of the frequency corresponding to the tempo 100.
  • It is to be noted that, for example, the frequency component corresponding to the tempo 100 which is added when the frequency component corresponding to the tempo 50 is to be determined is a frequency component corresponding to the tempo 100 before frequency components of harmonics thereto are added. This also applies to the other tempos.
  • As described above, the frequency component processing section 48 adds, to individual frequency components of the frequencies corresponding to the range of the tempos 50 to 400, frequency components of harmonics to them and uses the sum values as new frequency components to obtain frequency components of the frequencies corresponding to the range of the tempos 50 to 400 for each block. The frequency component processing section 48 supplies the obtained frequency components to the statistic processing section 49.
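  • The harmonic addition can be sketched as below. For illustration the components are held in a dictionary keyed by tempo, which is a hypothetical representation; a harmonic that has no entry is counted as zero. As noted above, the harmonics are read from the components before any addition.

```python
def add_harmonics(components, lo=50, hi=400, multiples=(2, 3, 4)):
    """Add the 2nd to 4th harmonics to each component in [lo, hi].

    `components` maps tempo -> frequency component before addition.
    The values read for the harmonics always come from `components`
    itself, never from the sums being built in `result`.
    """
    result = {}
    for tempo, value in components.items():
        if lo <= tempo <= hi:
            harmonics = sum(components.get(tempo * m, 0.0) for m in multiples)
            result[tempo] = value + harmonics
    return result
```

  • For instance, with components 1, 2, 3 and 4 at the tempos 50, 100, 150 and 200, the new component at the tempo 50 is 1 + 2 + 3 + 4 = 10.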
  • Here, a frequency component of a certain frequency represents the degree of possibility that the frequency may be a basic frequency (pitch frequency) fb of the level signal. Accordingly, the frequency component of the certain frequency can be regarded as basic frequency likelihood of the frequency. It is to be noted that, since the basic frequency fb represents that the level signal exhibits repetitions with the basic frequency, it corresponds to the tempo of the original audio signal.
  • The statistic processing section 49 performs a statistic process for blocks of one tune. In particular, the statistic processing section 49 adds frequency components of the level signal for one tune supplied thereto in a unit of a block from the frequency component processing section 48 for each frequency. Then, the statistic processing section 49 supplies a result of the addition of frequency components over the blocks for one tune obtained by the statistic process as frequency components A of the level signal of the one tune to the feature extraction section 23.
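  • The statistic process is a per-frequency sum over all blocks of the tune. A minimal sketch, again assuming the hypothetical tempo-keyed dictionaries for each block:

```python
def accumulate_blocks(blocks):
    """Sum the frequency components of all blocks for each frequency.

    `blocks` is an iterable of dicts mapping tempo -> component;
    the result plays the role of the frequency components A of the tune.
    """
    total = {}
    for components in blocks:
        for tempo, value in components.items():
            total[tempo] = total.get(tempo, 0.0) + value
    return total
```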
  • FIG. 3 shows in block diagram an example of a detailed configuration of the speed feeling detection section 32 shown in FIG. 1.
  • Referring to FIG. 3, the speed feeling detection section 32 shown includes a peak extraction section 61, a peak addition section 62, a peak frequency arithmetic operation section 63 and a speed feeling arithmetic operation section 64.
  • Frequency components A of the level signal are supplied from the frequency analysis section 22 to the peak extraction section 61. The peak extraction section 61 extracts, for example, frequency components of peak values (maximum values) from among the frequency components A of the level signal and further extracts frequency components A1 to A10 having 10 comparatively high peak values in a descending order from the extracted frequency components. Here, the frequency component having the ith peak in the descending order is represented by Ai (i=1, 2, . . . ) and the corresponding frequency is represented by fi.
  • The peak extraction section 61 supplies the 10 comparatively high frequency components A1 to A10 to the peak addition section 62 and supplies the frequency components A1 to A10 and the corresponding frequencies f1 to f10 to the peak frequency arithmetic operation section 63.
  • The peak addition section 62 adds all of the frequency components A1 to A10 supplied thereto from the peak extraction section 61 and supplies a resulting sum value ΣAi (=A1+A2+ . . . +A10) to the speed feeling arithmetic operation section 64.
  • The peak frequency arithmetic operation section 63 uses the frequency components A1 to A10 and the frequencies f1 to f10 supplied thereto from the peak extraction section 61 to calculate an integrated value ΣAi×fi (=A1×f1+A2×f2+ . . . +A10×f10) which is a sum total of the products of the frequency components Ai and the frequencies fi. Then, the peak frequency arithmetic operation section 63 supplies the integrated value ΣAi×fi to the speed feeling arithmetic operation section 64.
  • The speed feeling arithmetic operation section 64 calculates a speed feeling S (or information representative of a speed feeling S) based on the sum value ΣAi supplied thereto from the peak addition section 62 and the integrated value ΣAi×fi supplied thereto from the peak frequency arithmetic operation section 63. The speed feeling arithmetic operation section 64 supplies the speed feeling S to the tempo correction section 33 and outputs the speed feeling S to the outside.
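  • The data flow of FIG. 3 can be sketched as follows. The passage above states only that S is based on ΣAi and ΣAi×fi; combining them as the ratio ΣAi×fi / ΣAi (a component-weighted mean frequency) is an assumption made for this sketch, as are the function names and the treatment of peaks as strict local maxima.

```python
def top_peaks(freqs, comps, count=10):
    """Return up to `count` (component, frequency) peaks, highest first.

    A peak is taken to be a strict local maximum of `comps`
    (an assumed reading of "frequency components of peak values").
    """
    peaks = [(comps[i], freqs[i])
             for i in range(1, len(comps) - 1)
             if comps[i - 1] < comps[i] > comps[i + 1]]
    peaks.sort(reverse=True)
    return peaks[:count]


def speed_feeling(freqs, comps, count=10):
    """Speed feeling S from the peak sum and the integrated value."""
    peaks = top_peaks(freqs, comps, count)
    sum_a = sum(a for a, _ in peaks)        # sum value of the peaks
    sum_af = sum(a * f for a, f in peaks)   # integrated value
    # ASSUMPTION: S is the ratio of the two quantities.
    return sum_af / sum_a if sum_a else 0.0
```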
  • FIG. 4 shows in block diagram an example of a detailed configuration of the tempo fluctuation detection section 34 shown in FIG. 1.
  • Referring to FIG. 4, the tempo fluctuation detection section 34 shown includes an addition section 81, a peak extraction section 82 and a division section 83.
  • The frequency components A of the frequencies corresponding to the range of the tempos 50 to 400 are supplied from the frequency analysis section 22 to the addition section 81. The addition section 81 adds the frequency components A supplied thereto from the frequency analysis section 22 over all of the frequencies and supplies a resulting sum value ΣA to the division section 83.
  • The frequency components A of the frequencies corresponding to the range of the tempos 50 to 400 from the frequency analysis section 22 are supplied also to the peak extraction section 82. The peak extraction section 82 extracts the maximum frequency component A1 from among the frequency components A and supplies the frequency component A1 to the division section 83.
  • The division section 83 calculates a tempo fluctuation W based on the sum value ΣA of the frequency components A supplied thereto from the addition section 81 and the maximum frequency component A1 supplied thereto from the peak extraction section 82 and outputs the tempo fluctuation W to the outside.
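  • A sketch of the tempo fluctuation detection of FIG. 4 is given below. The passage above says only that W is obtained by the division section from ΣA and A1; the direction of the division, W = ΣA / A1, is an assumption here, chosen so that W grows when the components are spread over many frequencies rather than concentrated in one peak.

```python
def tempo_fluctuation(comps):
    """Tempo fluctuation W from the component sum and the maximum.

    ASSUMPTION: W = (sum of all components) / (maximum component).
    """
    if not comps:
        raise ValueError("no frequency components supplied")
    peak = max(comps)   # maximum frequency component A1
    total = sum(comps)  # sum value over all frequencies
    if peak <= 0:
        raise ValueError("maximum component must be positive")
    return total / peak
```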
  • Now, a feature value detection process performed by the feature value detection apparatus 1 of FIG. 1 is described with reference to a flow chart of FIG. 5. The feature value detection process is started when audio signals of the left and right channels are supplied to the adder 20.
  • At step S11, the adder 20 adds the audio signals of the left and right channels and supplies a resulting audio signal to the level calculation section 21. Thereafter, the processing advances to step S12.
  • At step S12, the level calculation section 21 produces a level signal of the audio signal supplied thereto from the adder 20 and supplies the level signal to the frequency analysis section 22.
  • More particularly, the EQ processing section 41 of the level calculation section 21 removes low frequency components of the audio signal which are not suitable for extraction of the tempo t and supplies the audio signal of frequency components suitable for extraction of the tempo t to the level signal production section 42. Then, the level signal production section 42 produces a level signal representative of a transition of the level of the audio signal supplied thereto from the EQ processing section 41 and supplies the level signal to the frequency analysis section 22.
  • After the process at step S12, the processing advances to step S13, at which the frequency analysis section 22 frequency analyzes the level signal supplied thereto from the level calculation section 21 and outputs frequency components A of individual frequencies of the level signal as a result of the analysis. Then, the frequency analysis section 22 supplies the frequency components A to the tempo calculation section 31, speed feeling detection section 32 and tempo fluctuation detection section 34 of the feature extraction section 23. Thereafter, the processing advances to step S14.
  • At step S14, the tempo calculation section 31 determines a tempo t of the audio signal based on the frequency components A of the level signal supplied thereto from the frequency analysis section 22 and supplies the tempo t to the tempo correction section 33.
  • More particularly, the tempo calculation section 31 extracts the maximum frequency component A1 from among the frequency components A of the level signal supplied thereto from the frequency analysis section 22 and determines the frequency of the maximum frequency component A1 as the basic frequency fb of the level signal. In particular, since each of the frequency components A of the frequencies of the level signal represents a basic frequency likelihood of the frequency as described hereinabove, the frequency of the maximum frequency component A1 is a frequency of a maximum basic frequency likelihood, that is, a frequency which is most likely as the basic frequency. Therefore, the frequency of the maximum frequency component A1 from among the frequency components A of the level signal is determined as the basic frequency fb.
  • Further, the tempo calculation section 31 determines the tempo t of the original audio signal using the following expression (1) based on the basic frequency fb and the sampling frequency fs of the level signal and supplies the tempo t to the tempo correction section 33.
    $t = f_b / f_s \times 60$   (1)
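  • The tempo calculation at step S14 can be sketched in two parts: selecting the frequency of the maximum frequency component as the basic frequency fb, and applying expression (1). The function names are hypothetical. The units of fb are not spelled out in this passage, so the second function simply applies expression (1) as written to whatever values are passed in.

```python
def basic_frequency(freqs, comps):
    """Frequency whose component (basic frequency likelihood) is largest."""
    best = max(range(len(comps)), key=lambda i: comps[i])
    return freqs[best]


def tempo_from_expression_1(fb, fs):
    """Expression (1) applied literally: t = fb / fs * 60."""
    return fb / fs * 60.0
```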
  • After the process at step S14, the processing advances to step S15, at which the speed feeling detection section 32 performs a speed feeling detection process based on the frequency components A supplied thereto from the frequency analysis section 22. Then, the speed feeling detection section 32 supplies a speed feeling S of the audio signal obtained by the speed feeling detection process to the tempo correction section 33 and outputs the speed feeling S to the outside.
  • After the process at step S15, the processing advances to step S16, at which the tempo correction section 33 performs a tempo correction process of correcting the tempo t supplied thereto from the tempo calculation section 31 at step S14 as occasion demands based on the speed feeling S supplied thereto from the speed feeling detection section 32 at step S15. Then, the tempo correction section 33 outputs a tempo t (or information representative of a tempo t) obtained by the tempo correction process to the outside, and then ends the process.
  • After the process at step S16, the processing advances to step S17, at which the tempo fluctuation detection section 34 performs a tempo fluctuation detection process based on the frequency components A of the level signal supplied thereto from the frequency analysis section 22. Then, the tempo fluctuation detection section 34 outputs a tempo fluctuation W obtained by the tempo fluctuation detection process and representative of the fluctuation of the tempo of the audio signal to the outside. Then, the tempo fluctuation detection section 34 ends the process.
  • It is to be noted that the tempo t, speed feeling S and tempo fluctuation W outputted to the outside at steps S15 to S17 described above are supplied, for example, to a monitor so that they are displayed on the monitor.
  • Now, the frequency analysis process at step S13 of FIG. 5 is described with reference to a flow chart of FIG. 6.
  • At step S31, the decimation filter section 43 of the frequency analysis section 22 (FIG. 2) removes, in order to allow the down sampling section 44 at the next stage to perform down sampling, high frequency components of the level signal supplied thereto from the level signal production section 42 and supplies the resulting level signal to the down sampling section 44. Thereafter, the processing advances to step S32.
  • At step S32, the down sampling section 44 performs down sampling of the level signal supplied thereto from the decimation filter section 43 and supplies the level signal after the down sampling to the EQ processing section 45.
  • After the process at step S32, the processing advances to step S33, at which the EQ processing section 45 performs filter processing of the level signal supplied thereto from the down sampling section 44 to remove low frequency components and high frequency components of the level signal. Then, the EQ processing section 45 supplies the level signal having frequency components remaining as a result of the removal of the low and high frequency components to the window processing section 46, whereafter the processing advances to step S34.
  • At step S34, the window processing section 46 extracts, from the level signal supplied thereto from the EQ processing section 45, a predetermined number of samples in a time series as the level signal of one block, and performs a window process for the level signal of the block and supplies the resulting level signal to the frequency conversion section 47. It is to be noted that the processes at step S34 and the succeeding steps S35 and S36 are performed in a unit of a block.
  • After the process at step S34, the processing advances to step S35, at which the frequency conversion section 47 performs discrete cosine transform for the level signal of the block supplied thereto from the window processing section 46 thereby to perform frequency conversion of the level signal. Then, the frequency conversion section 47 obtains, from among frequency components obtained by the frequency conversion of the level signal of the block, those frequency components which have frequencies corresponding to, for example, the tempos 50 to 1,600 and supplies the frequency components to the frequency component processing section 48.
  • After the process at step S35, the processing advances to step S36, at which the frequency component processing section 48 processes the frequency components of the level signal of the block from the frequency conversion section 47. In particular, the frequency component processing section 48 adds, to the frequency components of the frequencies corresponding to, for example, the tempos 50 to 400 from among the frequency components of the level signal of the block from the frequency conversion section 47, frequency components (harmonics) of the frequencies corresponding to the tempos equal to twice, three times and four times the tempos, respectively. Then, the frequency component processing section 48 determines the sum values as new frequency components and thereby obtains frequency components of the frequencies corresponding to the range of the tempos 50 to 400, and supplies the frequency components to the statistic processing section 49.
  • After the process at step S36, the processing advances to step S37, at which the statistic processing section 49 decides whether or not frequency components of the level signal of blocks for one tune are received from the frequency component processing section 48. If it is decided that frequency components of the level signal of blocks for one tune are not received as yet, then the processing returns to step S34. Then at step S34, the window processing section 46 extracts, from within the level signal succeeding the level signal extracted as one block, the level signal for one block and performs a window process for the extracted level signal for one block. Then, the window processing section 46 supplies the level signal of the block after the window process to the frequency conversion section 47, whereafter the processing advances to step S35 so that the processes described above are repeated.
  • It is to be noted that the window processing section 46 may extract the level signal for one block from a point of time immediately after the block extracted at step S34 in the immediately preceding cycle and perform a window process for the extracted level signal for one block or may otherwise extract the level signal for one block such that the level signal for one block overlaps with the level signal of a block extracted at step S34 in the immediately preceding cycle and perform a window process for the extracted level signal.
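  • By way of a non-limiting illustration, the two manners of block extraction mentioned above (contiguous blocks or overlapping blocks) may be sketched in Python as follows. The function name `extract_blocks` and the parameters `block_len` and `hop` are assumptions introduced only for this illustration and do not appear in the embodiment; a `hop` equal to `block_len` gives contiguous blocks, and a smaller `hop` gives overlapping blocks.

```python
def extract_blocks(level_signal, block_len, hop=None):
    """Split a level signal into blocks of block_len samples.

    hop is the number of samples between the starts of successive blocks
    (hypothetical parameter). hop == block_len means no overlap.
    """
    hop = block_len if hop is None else hop
    if block_len < 1 or hop < 1:
        raise ValueError("block_len and hop must be positive")
    blocks = []
    start = 0
    # Only complete blocks are extracted; a trailing partial block is dropped.
    while start + block_len <= len(level_signal):
        blocks.append(level_signal[start:start + block_len])
        start += hop
    return blocks
```

  • Dropping a trailing partial block is a simplification of the sketch; the specification does not state how the end of a tune is handled.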
  • If it is decided at step S37 that frequency components of the level signal of blocks for one tune are received, then the processing advances to step S38, at which the statistic processing section 49 performs a statistic process for the blocks for one tune. In particular, the statistic processing section 49 adds the frequency components of the level signal for one tune successively supplied thereto in a unit of a block from the frequency component processing section 48 for the individual frequencies. Then, the statistic processing section 49 supplies frequency components A of the frequencies of the level signal for one tune obtained by the statistic process to the feature extraction section 23, whereafter the processing returns to step S13 of FIG. 5.
  • After the process at step S13 of FIG. 5, the processing advances to step S14, at which the tempo calculation section 31 uses the frequency of the maximum frequency component A1 from among the frequency components A obtained by the statistic process of the frequency components of the level signal of the blocks for one tune supplied thereto from the statistic processing section 49 as the basic frequency fb of the level signal to determine the tempo t in accordance with the expression (1) given hereinabove. Consequently, the tempo t of the audio signal corresponding to one tune can be determined with a high degree of accuracy.
  • Now, the frequency analysis process of the frequency analysis section 22 is described with reference to FIGS. 7A to 7E and 8.
  • If a level signal illustrated in FIG. 7A is supplied from the EQ processing section 45 to the window processing section 46 in the frequency analysis section 22, then the window processing section 46 extracts the level signal for one block as seen in FIG. 7B at step S34 of FIG. 6. In particular, the window processing section 46 extracts a predetermined number of samples from the level signal illustrated in FIG. 7A as the level signal of one block. Then, the window processing section 46 performs a window process for the level signal of the block illustrated in FIG. 7B (or multiplies the level signal of the block by a predetermined window function) to obtain a level signal illustrated in FIG. 7C wherein opposite end portions of the block are attenuated.
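  • The window process may be sketched, for example, as follows. The specification only refers to "a predetermined window function"; the Hann window used here is an assumption chosen because it attenuates the opposite end portions of the block, and the names `hann_window` and `apply_window` are hypothetical.

```python
import math


def hann_window(n):
    """Return n Hann window coefficients (assumed window; not named in the text)."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def apply_window(block):
    """Multiply each sample of the block by the matching window coefficient."""
    window = hann_window(len(block))
    return [x * w for x, w in zip(block, window)]
```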
  • The level signal of the block illustrated in FIG. 7C is supplied from the window processing section 46 to the frequency conversion section 47. Then at step S35 of FIG. 6, the frequency conversion section 47 discrete cosine transforms the level signal to obtain frequency components of frequencies corresponding to the range of the tempos 50 to 1,600 as seen in FIG. 7D. It is to be noted that, in FIG. 7D, the axis of abscissa indicates the frequency and the axis of ordinate indicates the frequency component. “T=50” indicated on the axis of abscissa represents the value of a frequency corresponding to the tempo 50, and “T=1600” represents the value of a frequency corresponding to the tempo 1,600.
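  • As a minimal sketch of the frequency conversion, a direct (unnormalized) DCT-II of one block may be written as follows. The choice of the DCT-II variant, the absence of normalization, and the names `dct_ii` and `bin_frequency_hz` are assumptions for illustration; the specification states only that discrete cosine transform is used. For a DCT-II of N samples taken at a sample rate fs, coefficient k corresponds to the frequency k×fs/(2N).

```python
import math


def dct_ii(block):
    """Unnormalized DCT-II computed directly from its definition, O(N^2)."""
    n = len(block)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(block))
        for k in range(n)
    ]


def bin_frequency_hz(k, block_len, sample_rate_hz):
    """Frequency in Hz represented by DCT-II coefficient k."""
    return k * sample_rate_hz / (2.0 * block_len)
```

  • A practical implementation would normally use a fast transform; the direct form is shown only because it is short and self-contained.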
  • The frequency components of the frequencies corresponding to the range from the tempo 50 to the tempo 1,600 illustrated in FIG. 7D are supplied from the frequency conversion section 47 to the frequency component processing section 48. Thus, at step S36 of FIG. 6, the frequency component processing section 48 adds, to the frequency components of the frequencies corresponding to the tempos 50 to 400, frequency components (harmonics) of frequencies corresponding to tempos equal to twice, three times and four times the tempos, respectively. Then, the frequency component processing section 48 determines the sum values newly as frequency components of the frequencies corresponding to the tempos. Consequently, frequency components of the frequencies corresponding to the range of the tempos 50 to 400 are obtained as seen in FIG. 7E. It is to be noted that, in FIG. 7E, the axis of abscissa indicates the frequency and the axis of ordinate indicates the frequency component similarly as in FIG. 7D. Further, “T=50” indicated on the axis of abscissa represents the value of a frequency corresponding to the tempo 50, and “T=400” indicates the value of a frequency corresponding to the tempo 400.
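  • The addition of harmonics at step S36 may be sketched as follows. For simplicity the sketch assumes that the frequency components are held in a dictionary keyed by integer tempo value, so that the component at twice, three times or four times a tempo can be looked up directly; this data layout and the name `add_harmonics` are assumptions of the illustration, and a real implementation would have to map tempos onto the nearest transform coefficients.

```python
def add_harmonics(components, low=50, high=400, multiples=(2, 3, 4)):
    """Add to each component in [low, high] the components at its harmonics.

    components: dict mapping an integer tempo to a frequency component value
    (hypothetical layout). Missing entries are treated as 0.
    """
    summed = {}
    for tempo in range(low, high + 1):
        total = components.get(tempo, 0.0)
        for m in multiples:
            # Component at m times the tempo (2x, 3x and 4x by default).
            total += components.get(tempo * m, 0.0)
        summed[tempo] = total
    return summed
```

  • With the default arguments, the highest tempo looked up is 4×400 = 1,600, which matches the range of the tempos 50 to 1,600 obtained at step S35.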
  • When such processes as described above are performed for the level signal of blocks for one tune and the frequency components of the frequencies illustrated in FIG. 7E regarding the level signal of blocks for one tune are supplied from the frequency component processing section 48 to the statistic processing section 49, the statistic processing section 49 adds, at step S38 of FIG. 6, the frequency components illustrated in FIG. 7E regarding the level signal of the blocks for one tune thereby to obtain, for example, frequency components A illustrated in FIG. 8 regarding the audio signal of one tune.
  • The frequency components A of FIG. 8 include 11 peaks (maximum values) A1 to A11. Here, of the eleven peaks A1 to A11, ten comparatively high peaks in the descending order are the frequency components A1 to A10, and the corresponding frequencies are frequencies f1 to f10, respectively. Then, the maximum frequency component is the frequency component A1.
  • In this instance, at step S14 of FIG. 5, the frequency f1 of the frequency component A1 is determined as the basic frequency fb of the level signal, and the tempo t of the overall audio signal of one tune is determined in accordance with the expression (1) given hereinabove.
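  • The statistic process at step S38 and the selection of the basic frequency fb at step S14 may be sketched as follows. Each block is assumed to be represented by a dictionary mapping a frequency to its frequency component; this layout and the names `accumulate_blocks` and `basic_frequency` are assumptions of the illustration. The conversion of fb into the tempo t by the expression (1) is deliberately not reproduced in the sketch.

```python
def accumulate_blocks(block_spectra):
    """Add the frequency components of all blocks for each individual frequency."""
    totals = {}
    for spectrum in block_spectra:
        for freq, value in spectrum.items():
            totals[freq] = totals.get(freq, 0.0) + value
    return totals


def basic_frequency(totals):
    """Return the frequency whose accumulated component is the maximum (A1)."""
    return max(totals, key=totals.get)
```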
  • Now, the speed feeling detection process at step S15 of FIG. 5 is described with reference to a flow chart of FIG. 9.
  • At step S51, the peak extraction section 61 of the speed feeling detection section 32 of FIG. 3 extracts, from the frequency components A of the level signal supplied thereto from the statistic processing section 49 (FIG. 2) at step S38 of FIG. 6, those frequency components which each forms a peak, and further extracts, from the extracted frequency components, ten frequency components A1 to A10 having comparatively high peaks in the descending order. Then, the peak extraction section 61 supplies the ten comparatively high frequency components A1 to A10 to the peak addition section 62, and supplies the frequency components A1 to A10 and the corresponding frequencies f1 to f10 to the peak frequency arithmetic operation section 63.
  • For example, if the frequency components A illustrated in FIG. 8 are supplied from the statistic processing section 49 to the speed feeling detection section 32, then the peak extraction section 61 extracts, from among the peaks A1 to A11 which each forms a peak, the frequency components A1 to A10 which form ten comparatively high peaks in the descending order. Then, the frequency components A1 to A10 are supplied to the peak addition section 62, and the frequency components A1 to A10 and the frequencies f1 to f10 are supplied to the peak frequency arithmetic operation section 63.
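  • The peak extraction at step S51 may be sketched as follows. The sketch assumes that a peak is a component strictly greater than both of its neighbors and that the two end components are never peaks; the specification does not define a peak beyond "maximum values", so this criterion, like the name `extract_peaks`, is an assumption.

```python
def extract_peaks(freqs, comps, count=10):
    """Return up to count (component, frequency) pairs, highest peak first.

    freqs and comps are parallel lists ordered by ascending frequency.
    """
    peaks = []
    for i in range(1, len(comps) - 1):
        # Assumed peak criterion: strict local maximum.
        if comps[i] > comps[i - 1] and comps[i] > comps[i + 1]:
            peaks.append((comps[i], freqs[i]))
    peaks.sort(key=lambda p: p[0], reverse=True)
    return peaks[:count]
```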
  • After the process at step S51, the processing advances to step S52, at which the peak addition section 62 adds all of the frequency components A1 to A10 supplied thereto from the peak extraction section 61 and supplies a sum value ΣAi (=A1+A2+ . . . +A10) to the speed feeling arithmetic operation section 64.
  • After the process at step S52, the processing advances to step S53, at which the peak frequency arithmetic operation section 63 uses the frequency components A1 to A10 and the frequencies f1 to f10 supplied thereto from the peak extraction section 61 to arithmetically operate an integrated value ΣAi×fi (=A1×f1+A2×f2+ . . . +A10×f10) which is the sum total of the products of the frequency components Ai and the frequencies fi. Then, the peak frequency arithmetic operation section 63 supplies the integrated value ΣAi×fi to the speed feeling arithmetic operation section 64.
  • After the process at step S53, the processing advances to step S54, at which the speed feeling arithmetic operation section 64 arithmetically operates a speed feeling S (or information representative of a speed feeling S) based on the sum value ΣAi supplied thereto from the peak addition section 62 and the integrated value ΣAi×fi supplied thereto from the peak frequency arithmetic operation section 63. Then, the speed feeling arithmetic operation section 64 supplies the speed feeling S to the tempo correction section 33 and outputs the speed feeling S to the outside. Then, the speed feeling arithmetic operation section 64 returns the processing to step S16 of FIG. 5.
  • In particular, the speed feeling arithmetic operation section 64 uses the following expression (2) to arithmetically operate a speed feeling S and supplies the speed feeling S to the tempo correction section 33.

$$S = \frac{\sum_{i=1}^{10} A_i \times f_i}{\sum_{i=1}^{10} A_i} = \frac{A_1}{\sum_{i=1}^{10} A_i} \times f_1 + \frac{A_2}{\sum_{i=1}^{10} A_i} \times f_2 + \cdots + \frac{A_{10}}{\sum_{i=1}^{10} A_i} \times f_{10} \qquad (2)$$
  • In the expression (2) above, each of the frequencies fi of the frequency components which form peaks is weighted in accordance with the magnitude of the frequency component Ai of the peak, and the weighted frequencies fi are added. Accordingly, the speed feeling S determined using the expression (2) exhibits a high value where the comparatively high peaks of the frequency components Ai are concentrated on the high frequency side, but exhibits a low value where the comparatively high peaks of the frequency components Ai are concentrated on the low frequency side.
  • The speed feeling S determined using the expression (2) is further described with reference to FIGS. 10 and 11.
  • FIGS. 10 and 11 illustrate an example of the frequency components A of the audio signal of one tune obtained by the frequency analysis section 22. It is to be noted that, in FIGS. 10 and 11, the axis of abscissa indicates the frequency, and the axis of ordinate indicates the frequency component (basic frequency likelihood).
  • In the case of an audio signal which does not have a speed feeling (a slow audio signal), the frequency components A of the level signal are concentrated on the low frequency side as seen in FIG. 10. In this instance, according to the expression (2), a speed feeling S having a low value is obtained.
  • On the other hand, in the case of an audio signal which has a speed feeling (a fast audio signal), the frequency components A of the level signal are concentrated on the high frequency side as seen in FIG. 11. In this instance, according to the expression (2), a speed feeling S having a high value is obtained.
  • Thus, the expression (2) yields a value corresponding to the speed feeling of the audio signal.
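  • The arithmetic operation of the expression (2) may be sketched as follows, where the peaks are given as pairs of a frequency component Ai and its frequency fi. The function name `speed_feeling` and the treatment of an all-zero input as an error are assumptions of the illustration.

```python
def speed_feeling(peaks):
    """Weighted mean of the peak frequencies f_i with weights A_i (expression (2)).

    peaks: iterable of (A_i, f_i) pairs, for example the ten highest peaks.
    """
    peaks = list(peaks)
    total = sum(a for a, _ in peaks)          # sum value of A_i
    if total == 0:
        raise ValueError("sum of the frequency components must be non-zero")
    integrated = sum(a * f for a, f in peaks)  # integrated value of A_i x f_i
    return integrated / total
```

  • For instance, peaks weighted toward low frequencies such as (4.0, 1.0) and (1.0, 5.0) give a lower value than peaks weighted toward high frequencies such as (1.0, 1.0) and (4.0, 5.0).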
  • Now, the tempo correction process at step S16 of FIG. 5 is described with reference to a flow chart of FIG. 12.
  • At step S71, the tempo correction section 33 decides whether or not the tempo t supplied thereto from the tempo calculation section 31 (FIG. 1) at step S14 of FIG. 5 is higher than a predetermined value (threshold value) TH1. It is to be noted that the predetermined value TH1 is set, for example, upon manufacture of the feature value detection apparatus 1, by a manufacturer of the feature value detection apparatus 1.
  • If it is decided at step S71 that the tempo t from the tempo calculation section 31 is higher than the predetermined value TH1, that is, when the tempo t from the tempo calculation section 31 is fast, the processing advances to step S72. At step S72, the tempo correction section 33 decides whether or not the speed feeling S supplied from the speed feeling detection section 32 at step S54 of FIG. 9 is higher than a predetermined value (threshold value) TH2. It is to be noted that the predetermined value TH2 is set, for example, upon manufacture of the feature value detection apparatus 1, by a manufacturer of the feature value detection apparatus 1.
  • If it is decided at step S72 that the speed feeling S from the speed feeling detection section 32 is higher than the predetermined value TH2, that is, if a process result that both of the tempo t and the speed feeling S are high is obtained with regard to the original audio signal, then the processing advances to step S74.
  • If it is decided at step S71 that the tempo t from the tempo calculation section 31 is not higher than the predetermined value TH1, that is, when the tempo t from the tempo calculation section 31 is slow, the processing advances to step S73. At step S73, it is decided whether or not the speed feeling S supplied thereto from the speed feeling detection section 32 at step S54 of FIG. 9 is higher than a predetermined value TH3 similarly as at step S72.
  • It is to be noted that the predetermined value TH3 is set, for example, upon manufacture of the feature value detection apparatus 1, by a manufacturer of the feature value detection apparatus 1. Further, the values of the predetermined values TH2 and TH3 may be equal to each other or may be different from each other.
  • If it is decided at step S73 that the speed feeling S from the speed feeling detection section 32 is not higher than the predetermined value TH3, that is, if a processing result that both of the tempo t and the speed feeling S are low is obtained with regard to the original audio signal, then the processing advances to step S74.
  • At step S74, the tempo correction section 33 determines the tempo t from the tempo calculation section 31 as it is as a tempo of the audio signal. In particular, if it is decided at step S72 that the speed feeling S is high, then since it is decided that the tempo t from the tempo calculation section 31 is fast and the speed feeling S from the speed feeling detection section 32 is high, it is determined that the tempo t from the tempo calculation section 31 is reasonable from comparison thereof with the speed feeling S. Thus, at step S74, the tempo t from the tempo calculation section 31 is finally determined as it is as the tempo of the audio signal.
  • On the other hand, if it is decided at step S73 that the speed feeling S is not high, since it is decided that the tempo t from the tempo calculation section 31 is slow and the speed feeling S from the speed feeling detection section 32 is low, it is still determined that the tempo t from the tempo calculation section 31 is reasonable from comparison thereof with the speed feeling S. Consequently, at step S74, the tempo t from the tempo calculation section 31 is finally determined as it is as the tempo of the audio signal. After the tempo correction section 33 determines the tempo, the processing returns to step S16 of FIG. 5.
  • If it is decided at step S72 that the speed feeling S from the speed feeling detection section 32 is not higher than the predetermined value TH2, that is, if a processing result that the tempo t from the tempo calculation section 31 is fast but the speed feeling S from the speed feeling detection section 32 is low is obtained with regard to the original audio signal, then the processing advances to step S75.
  • At step S75, the tempo correction section 33 determines a value of, for example, one half the tempo t from the tempo calculation section 31 as the tempo t of the audio signal. In particular, in the present case, since it is decided that the tempo t from the tempo calculation section 31 is fast but the speed feeling S from the speed feeling detection section 32 is low, the tempo t from the tempo calculation section 31 does not correspond to the speed feeling S from the speed feeling detection section 32. Therefore, the tempo correction section 33 corrects the tempo t from the tempo calculation section 31 to a value equal to one half the tempo t and determines the corrected value as the tempo of the audio signal. After the tempo correction section 33 determines the tempo, the processing returns to step S16 of FIG. 5.
  • If it is decided at step S73 that the speed feeling S from the speed feeling detection section 32 is higher than the predetermined value TH3, that is, if a processing result that the tempo t from the tempo calculation section 31 is slow but the speed feeling S from the speed feeling detection section 32 is high is obtained with regard to the original audio signal, then the processing advances to step S76.
  • At step S76, the tempo correction section 33 determines a value of, for example, twice the tempo t from the tempo calculation section 31 as the tempo t of the audio signal. In particular, in the present case, since it is decided that the tempo t from the tempo calculation section 31 is slow but the speed feeling S from the speed feeling detection section 32 is high, the tempo t from the tempo calculation section 31 does not correspond to the speed feeling S from the speed feeling detection section 32. Therefore, the tempo correction section 33 corrects the tempo t from the tempo calculation section 31 to a value equal to twice the tempo t and determines the corrected value as the tempo of the audio signal. After the tempo correction section 33 determines the tempo, the processing returns to step S16 of FIG. 5.
  • As described above, since, at steps S74 to S76 of FIG. 12, the tempo correction section 33 corrects the tempo t from the tempo calculation section 31 based on the speed feeling S from the speed feeling detection section 32, the accurate tempo t which corresponds to the speed feeling S can be obtained.
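  • The decisions at steps S71 to S76 may be summarized by the following sketch. The function name `correct_tempo` is hypothetical, and the threshold values TH1 to TH3 are passed in as parameters because the specification states only that they are set by the manufacturer; the factors of one half and twice are the examples given above.

```python
def correct_tempo(tempo, speed, th1, th2, th3):
    """Correct the tempo t using the speed feeling S (steps S71 to S76)."""
    if tempo > th1:                 # step S71: the tempo is fast
        if speed > th2:             # step S72: the speed feeling is also high
            return tempo            # step S74: keep the tempo as it is
        return tempo / 2.0          # step S75: fast tempo, low speed feeling
    if speed > th3:                 # step S73: slow tempo, high speed feeling
        return tempo * 2.0          # step S76
    return tempo                    # step S74: both are low
```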
  • Now, the tempo fluctuation detection process executed at step S17 of FIG. 5 by the tempo fluctuation detection section 34 of FIG. 4 is described with reference to a flow chart of FIG. 13.
  • At step S91, the addition section 81 adds the frequency components A of the frequencies corresponding to the range of the tempos 50 to 400 supplied thereto from the frequency analysis section 22 at step S38 of FIG. 6 over all of the frequencies and supplies a resulting sum value ΣA to the division section 83.
  • At step S92 after the process at step S91, the peak extraction section 82 extracts, from among the frequency components A of the frequencies corresponding to the range of the tempos 50 to 400 supplied thereto from the frequency analysis section 22 at step S38 of FIG. 6, the maximum frequency component A1 and supplies the frequency component A1 to the division section 83.
  • After the process at step S92, the processing advances to step S93, at which the division section 83 arithmetically operates a tempo fluctuation W based on the sum value ΣA of the frequency components A supplied thereto from the addition section 81 and the maximum frequency component A1 supplied thereto from the peak extraction section 82 and outputs the tempo fluctuation W to the outside.
  • More particularly, the division section 83 arithmetically operates the tempo fluctuation W using the following expression (3):

$$W = \frac{\sum A}{A_1} \qquad (3)$$
  • According to the expression (3), the tempo fluctuation W represents a ratio of the sum value ΣA of the frequency components to the maximum frequency component A1. Accordingly, the tempo fluctuation W determined using the expression (3) exhibits a low value where the frequency component A1 is much greater than the other frequency components A, but exhibits a high value where the frequency component A1 is not much greater than the other frequency components A.
  • Now, the tempo fluctuation W determined using the expression (3) is described with reference to FIGS. 14 and 15.
  • FIGS. 14 and 15 illustrate an example of the frequency components A regarding an audio signal of one tune obtained by the frequency analysis section 22. It is to be noted that the axis of abscissa indicates the frequency and the axis of ordinate indicates the frequency component (basic frequency likelihood).
  • In the case of an audio signal whose tempo fluctuation is small, that is, in the case of an audio signal whose tempo varies little, the maximum frequency component A1 of the level signal of the audio signal is outstandingly greater than the other frequency components A as seen in FIG. 14. In this instance, according to the expression (3) above, a tempo fluctuation W of a low value is determined.
  • On the other hand, in the case of an audio signal whose tempo fluctuation is great, the maximum frequency component A1 of the level signal thereof is not outstandingly greater than the other frequency components A as seen in FIG. 15. In this instance, according to the expression (3), a tempo fluctuation W having a high value is obtained.
  • Thus, the expression (3) yields a tempo fluctuation W whose value corresponds to the degree of variation of the tempo of the audio signal.
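  • The arithmetic operation of the expression (3) may be sketched as follows. The function name `tempo_fluctuation` is hypothetical, and the frequency components are assumed to be non-negative values with a positive maximum.

```python
def tempo_fluctuation(comps):
    """Ratio of the sum of all frequency components to the maximum one (expression (3))."""
    peak = max(comps)                 # maximum frequency component A1
    if peak <= 0:
        raise ValueError("the maximum frequency component must be positive")
    return sum(comps) / peak          # sum value of A divided by A1
```

  • A spectrum dominated by a single component, such as [1.0, 8.0, 1.0], gives a value close to 1, whereas a flat spectrum such as [4.0, 4.0, 4.0] gives a value equal to the number of components.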
  • As described above, according to the feature value detection apparatus 1, since a level signal of an audio signal is determined and frequency analyzed and the tempo t is determined based on a result of the frequency analysis, the tempo t can be detected with a high degree of accuracy.
  • Further, if the tempo t or the tempo fluctuation W outputted from the feature value detection apparatus 1 is used, then it is possible to recommend music (a tune) to the user.
  • For example, an audio signal of classical music or a live performance usually has a slow tempo t and has a great tempo fluctuation W. On the other hand, for example, an audio signal of music in which an electronic drum is used usually has a fast tempo t and a small tempo fluctuation W.
  • Accordingly, it is possible to identify a genre and so forth of an audio signal based on the tempo t and/or the tempo fluctuation W and recommend a tune of a desirable genre to the user.
  • It is to be noted that, while the tempo correction section 33 in the present embodiment corrects the tempo t determined by the frequency analysis of the level signal of the audio signal based on the speed feeling S of the audio signal, the correction of the tempo t may otherwise be performed for a tempo obtained by any method.
  • Further, while, in the feature value detection apparatus 1, the adder 20 adds audio signals of the left channel and the right channel in order to moderate the load of processing, a feature value detection process can be performed for each channel without adding the audio signals of the left and right channels. In this instance, such feature values as the tempo t, speed feeling S or tempo fluctuation W can be detected with a high degree of accuracy for each of the audio signals of the left and right channels.
  • Further, while the feature value detection apparatus 1 uses discrete cosine transform for the frequency analysis of a level signal, for example, a comb filter, a short-time Fourier analysis, wavelet conversion and so forth can be used for the frequency analysis of a level signal.
  • Further, in the feature value detection apparatus 1, processing for an audio signal can be performed such that the audio signal is band divided into a plurality of audio signals of different frequency bands and the processing is performed for each of the audio signals of the individual frequency bands. In this instance, the tempo t, speed feeling S and tempo fluctuation W can be detected with a higher degree of accuracy.
  • Further, the audio signal need not be a stereo signal and may instead be a monaural signal.
  • Further, while the statistic processing section 49 performs a statistic process for blocks for one tune, the statistic process may be performed in a different manner, for example, for some of blocks of one tune.
  • Further, the frequency conversion section 47 may perform discrete cosine transform for the overall level signal of one tune.
  • Further, while, in the present embodiment, an audio signal in the form of a digital signal is inputted, it is otherwise possible to input an audio signal in the form of an analog signal. It is to be noted, however, that, in this instance, it is necessary to provide an A/D (Analog/Digital) converter, for example, at a preceding stage to the adder 20 or between the adder 20 and the level calculation section 21.
  • Furthermore, the arithmetic operation expression for the speed feeling S is not limited to the expression (2). Similarly, also the arithmetic operation expression for the tempo fluctuation W is not limited to the expression (3).
  • Further, while, in the present embodiment, the tempo t, speed feeling S and tempo fluctuation W are determined as feature values of an audio signal, it is possible to determine some other feature value such as the beat.
  • While the series of processes described above can be executed by hardware for exclusive use, it may otherwise be executed by software. Where the series of processes is executed by software, a program which constructs the software is installed into a computer for universal use or the like.
  • FIG. 16 shows an example of a configuration of a form of a computer into which a program for executing the series of processes described above is to be installed.
  • The program can be recorded in advance on a hard disk 105 or in a ROM 103 as a recording medium built in the computer.
  • Or, the program may be stored (recorded) temporarily or permanently on a removable recording medium 111 such as a flexible disk, a CD-ROM (Compact Disc-Read Only Memory), an MO (Magneto-Optical) disk, a DVD (Digital Versatile Disc), a magnetic disk or a semiconductor memory. Such a removable recording medium 111 as just described can be provided as package software.
  • It is to be noted that the program may not only be installed from such a removable recording medium 111 as described above into the computer but also be transferred from a download site by radio communication into the computer through an artificial satellite for digital satellite broadcasting or transferred by wire communication through a network such as a LAN (Local Area Network) or the Internet to the computer. The computer thus can receive the program transferred in this manner by a communication section 108 and install the program into the hard disk 105 built therein.
  • The computer has a built-in CPU (Central Processing Unit) 102. An input/output interface 110 is connected to the CPU 102 through a bus 101. Consequently, if an instruction is inputted through the input/output interface 110 when an inputting section 107 formed from a keyboard, a mouse, a microphone and so forth is operated by the user or the like, then the CPU 102 loads a program stored in the ROM (Read Only Memory) 103 in accordance with the instruction. Or, the CPU 102 loads a program stored on the hard disk 105, a program transferred from a satellite or a network, received by the communication section 108 and installed in the hard disk 105, or a program read out from the removable recording medium 111 loaded in a drive 109 and installed in the hard disk 105, into a RAM (Random Access Memory) 104 and then executes the program. Consequently, the CPU 102 performs the process in accordance with the flow charts described hereinabove or performs processes which can be performed by the configuration described hereinabove with reference to the block diagrams. Then, as occasion demands, the CPU 102 causes, for example, an outputting section 106, which is formed from an LCD (Liquid Crystal Display) unit, a speaker and so forth, to output a result of the process through the input/output interface 110 or causes the communication section 108 to transmit or the hard disk 105 to record the result of the process.
  • It is to be noted that, in the present specification, the steps which describe the program for causing a computer to execute various processes may be but need not necessarily be processed in a time series in the order as described as the flow charts, and include processes which are executed in parallel or individually (for example, processes by parallel processing or by an object).
  • Further, the program may be processed by a single computer or may otherwise be processed in a distributed fashion by a plurality of computers. Further, the program may be transferred to and executed by a computer at a remote place.

Claims (10)

1. A signal processing apparatus for processing an audio signal, comprising:
a production section for producing a level signal representative of a transition of the level of the audio signal;
a frequency analysis section for frequency analyzing the level signal produced by said production section; and
a feature value calculation section for determining a feature value or values of the audio signal based on a result of the frequency analysis by said frequency analysis section.
2. A signal processing apparatus according to claim 1, wherein said feature value calculation section determines a tempo of the audio signal as the feature value.
3. A signal processing apparatus according to claim 1, wherein said feature value calculation section determines a speed feeling of the audio signal as the feature value.
4. A signal processing apparatus according to claim 1, wherein said feature value calculation section determines a fluctuation of a tempo of the audio signal as the feature value.
5. A signal processing apparatus according to claim 1, wherein said feature value calculation section determines a tempo and a speed feeling of the audio signal as the feature values, and corrects the tempo based on the speed feeling to determine a final tempo.
6. A signal processing apparatus according to claim 1, further comprising a statistic processing section for performing a statistic process of the result of the frequency analysis by said frequency analysis section, said feature value calculation section determining the feature value or values based on the result of the frequency analysis statistically processed by said statistic processing section.
7. A signal processing apparatus according to claim 1, further comprising a frequency component processing section for adding, to frequency components of the level signal of the result of the frequency analysis by said frequency analysis section, frequency components having a relationship of harmonics to the frequency components and outputting the sum values as the frequency components of the level signal, said feature value calculation section determining the feature value or values based on the frequency components outputted from said frequency component processing section.
8. A signal processing method for a signal processing apparatus which processes an audio signal, comprising:
a production step of producing a level signal representative of a transition of the level of the audio signal;
a frequency analysis step of frequency analyzing the level signal produced by the process at the production step; and
a feature value calculation step of determining a feature value or values of the audio signal based on a result of the frequency analysis by the process at the frequency analysis step.
9. A program for causing a computer to execute processing of an audio signal, comprising:
a production step of producing a level signal representative of a transition of the level of the audio signal;
a frequency analysis step of frequency analyzing the level signal produced by the process at the production step; and
a feature value calculation step of determining a feature value or values of the audio signal based on a result of the frequency analysis by the process at the frequency analysis step.
10. A recording medium on or in which a program for causing a computer to execute processing of an audio signal is recorded, the program comprising:
a production step of producing a level signal representative of a transition of the level of the audio signal;
a frequency analysis step of frequency analyzing the level signal produced by the process at the production step; and
a feature value calculation step of determining a feature value or values of the audio signal based on a result of the frequency analysis by the process at the frequency analysis step.
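
The three steps recited in claims 1, 2 and 8 (producing a level signal, frequency analyzing it, and determining a tempo from the result) can be illustrated with a minimal sketch. It is not the implementation described in the specification. The function names, the block size, the 60 to 180 BPM search range and the synthetic click track are all assumptions made for illustration, and a plain DFT evaluated only at candidate tempo frequencies stands in for whatever frequency analysis the apparatus actually uses.

```python
import math


def level_signal(samples, block_size):
    """Production step: mean absolute amplitude of each block of samples."""
    n_blocks = len(samples) // block_size
    return [
        sum(abs(s) for s in samples[b * block_size:(b + 1) * block_size]) / block_size
        for b in range(n_blocks)
    ]


def magnitude_at(level, level_rate, freq_hz):
    """Frequency analysis step: DFT magnitude of the level signal at freq_hz."""
    mean = sum(level) / len(level)  # remove DC so it does not mask the rhythm
    re = im = 0.0
    for n, value in enumerate(level):
        phase = 2.0 * math.pi * freq_hz * n / level_rate
        re += (value - mean) * math.cos(phase)
        im -= (value - mean) * math.sin(phase)
    return math.hypot(re, im)


def estimate_tempo(samples, sample_rate, block_size=100, bpm_min=60, bpm_max=180):
    """Feature value step: candidate BPM with the strongest level periodicity."""
    level = level_signal(samples, block_size)
    level_rate = sample_rate / block_size  # level samples per second
    return max(
        range(bpm_min, bpm_max + 1),
        key=lambda bpm: magnitude_at(level, level_rate, bpm / 60.0),
    )


def click_track(bpm, sample_rate, seconds, click_len=50):
    """Hypothetical test input: short full-scale clicks on every beat."""
    samples = [0.0] * int(sample_rate * seconds)
    period = int(round(sample_rate * 60.0 / bpm))
    for start in range(0, len(samples), period):
        for i in range(start, min(start + click_len, len(samples))):
            samples[i] = 1.0
    return samples


if __name__ == "__main__":
    audio = click_track(bpm=120, sample_rate=8000, seconds=10)
    print(estimate_tempo(audio, sample_rate=8000))
```

Restricting the search to a fixed BPM range sidesteps the octave ambiguity between a tempo and its harmonics. Claims 5 and 7 address that ambiguity by other means (correction by speed feeling and addition of harmonic components), which this sketch does not attempt.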
US11/082,778 2004-03-23 2005-03-18 Signal processing apparatus and signal processing method, program, and recording medium Expired - Fee Related US7507901B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/350,519 US7868240B2 (en) 2004-03-23 2009-01-08 Signal processing apparatus and signal processing method, program, and recording medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2004084815A JP4650662B2 (en) 2004-03-23 2004-03-23 Signal processing apparatus, signal processing method, program, and recording medium
JP2004-084815 2004-03-23

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US12/350,519 Continuation US7868240B2 (en) 2004-03-23 2009-01-08 Signal processing apparatus and signal processing method, program, and recording medium

Publications (2)

Publication Number Publication Date
US20050217463A1 (en) 2005-10-06
US7507901B2 (en) 2009-03-24

Family

ID=35052807

Family Applications (2)

Application Number Title Priority Date Filing Date
US11/082,778 Expired - Fee Related US7507901B2 (en) 2004-03-23 2005-03-18 Signal processing apparatus and signal processing method, program, and recording medium
US12/350,519 Expired - Fee Related US7868240B2 (en) 2004-03-23 2009-01-08 Signal processing apparatus and signal processing method, program, and recording medium

Country Status (2)

Country Link
US (2) US7507901B2 (en)
JP (1) JP4650662B2 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4650662B2 (en) * 2004-03-23 2011-03-16 ソニー株式会社 Signal processing apparatus, signal processing method, program, and recording medium
JP2009047860A (en) * 2007-08-17 2009-03-05 Sony Corp Performance supporting device and method, and program
JP4640407B2 (en) * 2007-12-07 2011-03-02 ソニー株式会社 Signal processing apparatus, signal processing method, and program
JP5179905B2 (en) * 2008-03-11 2013-04-10 ローランド株式会社 Performance equipment
JP4591557B2 (en) 2008-06-16 2010-12-01 ソニー株式会社 Audio signal processing apparatus, audio signal processing method, and audio signal processing program
JP4640463B2 (en) 2008-07-11 2011-03-02 ソニー株式会社 Playback apparatus, display method, and display program
JP2010063316A (en) * 2008-09-05 2010-03-18 Toyota Motor Corp Current detector
JP5203404B2 (en) * 2010-02-13 2013-06-05 学校法人 龍谷大学 Tempo value detection device and tempo value detection method
JP2011221133A (en) 2010-04-06 2011-11-04 Sony Corp Information processing device, client device, server device, list generating method, list retrieving method, list providing method, and program
JP5569228B2 (en) * 2010-08-02 2014-08-13 ソニー株式会社 Tempo detection device, tempo detection method and program
CN103093786A (en) * 2011-10-27 2013-05-08 浪潮乐金数字移动通信有限公司 Music player and implementation method thereof
WO2017145800A1 (en) 2016-02-25 2017-08-31 株式会社ソニー・インタラクティブエンタテインメント Voice analysis apparatus, voice analysis method, and program
JP6842558B2 (en) * 2017-09-12 2021-03-17 AlphaTheta株式会社 Music analysis device and music analysis program
CN108281157B (en) * 2017-12-28 2021-11-12 广州市百果园信息技术有限公司 Method for detecting drumbeat beat in music, computer storage medium and terminal

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5952596A (en) * 1997-09-22 1999-09-14 Yamaha Corporation Method of changing tempo and pitch of audio by digital signal processing
US6188967B1 (en) * 1998-05-27 2001-02-13 International Business Machines Corporation Audio feedback control for manufacturing processes
US20020053275A1 (en) * 2000-11-06 2002-05-09 Michiko Ogawa Musical signal processing apparatus
US20030133700A1 (en) * 2002-01-15 2003-07-17 Yamaha Corporation Multimedia platform for recording and/or reproducing music synchronously with visual images
US6721711B1 (en) * 1999-10-18 2004-04-13 Roland Corporation Audio waveform reproduction apparatus
US20040069123A1 (en) * 2001-01-13 2004-04-15 Native Instruments Software Synthesis Gmbh Automatic recognition and matching of tempo and phase of pieces of music, and an interactive music player based thereon
US20040177746A1 (en) * 2001-06-18 2004-09-16 Friedmann Becker Automatic generation of musical scratching effects
US20050217462A1 (en) * 2004-04-01 2005-10-06 Thomson J Keith Method and apparatus for automatically creating a movie
US20050275805A1 (en) * 2004-06-15 2005-12-15 Yu-Ru Lin Slideshow composition method
US20050283360A1 (en) * 2004-06-22 2005-12-22 Large Edward W Method and apparatus for nonlinear frequency analysis of structured signals
US20060185501A1 (en) * 2003-03-31 2006-08-24 Goro Shiraishi Tempo analysis device and tempo analysis method
US20070044641A1 (en) * 2003-02-12 2007-03-01 Mckinney Martin F Audio reproduction apparatus, method, computer program
US7236226B2 (en) * 2005-01-12 2007-06-26 Ulead Systems, Inc. Method for generating a slide show with audio analysis
US20070180980A1 (en) * 2006-02-07 2007-08-09 Lg Electronics Inc. Method and apparatus for estimating tempo based on inter-onset interval count

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4887505A (en) * 1987-06-26 1989-12-19 Yamaha Corporation Electronic musical instrument capable of performing an automatic accompaniment
JP3127406B2 (en) * 1991-05-13 2001-01-22 カシオ計算機株式会社 Tempo detection device
US5350882A (en) * 1991-12-04 1994-09-27 Casio Computer Co., Ltd. Automatic performance apparatus with operated rotation means for tempo control
JP3455757B2 (en) * 1993-08-25 2003-10-14 カシオ計算機株式会社 Tempo data generation device and tempo data generation method
JP3362491B2 (en) * 1993-12-27 2003-01-07 ティーディーケイ株式会社 Voice utterance device
JP2900976B2 (en) * 1994-04-27 1999-06-02 日本ビクター株式会社 MIDI data editing device
JP2636197B2 (en) * 1995-01-24 1997-07-30 工業技術院長 Awakening maintaining device and awakening recording medium
JPH10134549A (en) * 1996-10-30 1998-05-22 Nippon Columbia Co Ltd Music program searching-device
JP3631650B2 (en) * 1999-03-26 2005-03-23 日本電信電話株式会社 Music search device, music search method, and computer-readable recording medium recording a music search program
US6392135B1 (en) * 1999-07-07 2002-05-21 Yamaha Corporation Musical sound modification apparatus and method
JP4060993B2 (en) * 1999-07-26 2008-03-12 パイオニア株式会社 Audio information storage control method and apparatus, and audio information output apparatus.
JP3789326B2 (en) * 2000-07-31 2006-06-21 松下電器産業株式会社 Tempo extraction device, tempo extraction method, tempo extraction program, and recording medium
JP3780858B2 (en) * 2001-03-26 2006-05-31 ヤマハ株式会社 Waveform data analysis method, waveform data analysis apparatus and program
JP4695781B2 (en) * 2001-07-10 2011-06-08 大日本印刷株式会社 Method for encoding an acoustic signal
JP3674950B2 (en) * 2002-03-07 2005-07-27 ヤマハ株式会社 Method and apparatus for estimating tempo of music data
JP4650662B2 (en) * 2004-03-23 2011-03-16 ソニー株式会社 Signal processing apparatus, signal processing method, program, and recording medium
JP2006011550A (en) * 2004-06-22 2006-01-12 Sony Corp Information transmission system by cooperative filtering, information processing apparatus to be used for the same, and program to be used in information processing
JP2007164545A (en) * 2005-12-14 2007-06-28 Sony Corp Preference profile generator, preference profile generation method, and profile generation program
US20090260506A1 (en) * 2008-04-17 2009-10-22 Utah State University Method for controlling the tempo of a periodic conscious human physiological activity

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100094782A1 (en) * 2005-10-25 2010-04-15 Yoshiyuki Kobayashi Information Processing Apparatus, Information Processing Method, and Program
US8738674B2 (en) * 2005-10-25 2014-05-27 Sony Corporation Information processing apparatus, information processing method and program
US20070112558A1 (en) * 2005-10-25 2007-05-17 Yoshiyuki Kobayashi Information processing apparatus, information processing method and program
US8315954B2 (en) 2005-10-25 2012-11-20 Sony Corporation Device, method, and program for high level feature extraction
US8101845B2 (en) 2005-11-08 2012-01-24 Sony Corporation Information processing apparatus, method, and program
US20090287323A1 (en) * 2005-11-08 2009-11-19 Yoshiyuki Kobayashi Information Processing Apparatus, Method, and Program
US20080236371A1 (en) * 2007-03-28 2008-10-02 Nokia Corporation System and method for music data repetition functionality
US7659471B2 (en) * 2007-03-28 2010-02-09 Nokia Corporation System and method for music data repetition functionality
US20090249942A1 (en) * 2008-04-07 2009-10-08 Sony Corporation Music piece reproducing apparatus and music piece reproducing method
US8076567B2 (en) 2008-04-07 2011-12-13 Sony Corporation Music piece reproducing apparatus and music piece reproducing method
US8344234B2 (en) 2008-04-11 2013-01-01 Pioneer Corporation Tempo detecting device and tempo detecting program
US20110067555A1 (en) * 2008-04-11 2011-03-24 Pioneer Corporation Tempo detecting device and tempo detecting program
WO2010129693A1 (en) * 2009-05-06 2010-11-11 Gracenote, Inc. Apparatus and method for determining a prominent tempo of an audio work
US9753925B2 (en) 2009-05-06 2017-09-05 Gracenote, Inc. Systems, methods, and apparatus for generating an audio-visual presentation using characteristics of audio, visual and symbolic media objects
US20100282045A1 (en) * 2009-05-06 2010-11-11 Ching-Wei Chen Apparatus and method for determining a prominent tempo of an audio work
US8071869B2 (en) 2009-05-06 2011-12-06 Gracenote, Inc. Apparatus and method for determining a prominent tempo of an audio work
US20100325135A1 (en) * 2009-06-23 2010-12-23 Gracenote, Inc. Methods and apparatus for determining a mood profile associated with media data
US8805854B2 (en) 2009-06-23 2014-08-12 Gracenote, Inc. Methods and apparatus for determining a mood profile associated with media data
US11580120B2 (en) 2009-06-23 2023-02-14 Gracenote, Inc. Methods and apparatus for determining a mood profile associated with media data
US11204930B2 (en) 2009-06-23 2021-12-21 Gracenote, Inc. Methods and apparatus for determining a mood profile associated with media data
US10558674B2 (en) 2009-06-23 2020-02-11 Gracenote, Inc. Methods and apparatus for determining a mood profile associated with media data
US9842146B2 (en) 2009-06-23 2017-12-12 Gracenote, Inc. Methods and apparatus for determining a mood profile associated with media data
US9367282B2 (en) * 2010-04-17 2016-06-14 NL Giken Incorporated Electronic music box
US20140316544A1 (en) * 2010-04-17 2014-10-23 NL Giken Incorporated Electronic Music Box
US9286871B2 (en) * 2012-08-16 2016-03-15 Clevx, Llc System for calculating the tempo of music
US20150143977A1 (en) * 2012-08-16 2015-05-28 Clevx, Llc System for calculating the tempo of music
US8952233B1 (en) * 2012-08-16 2015-02-10 Simon B. Johnson System for calculating the tempo of music
CN111383621A (en) * 2018-12-28 2020-07-07 罗兰株式会社 Information processing device, rhythm detection device, and image processing system
US11094305B2 (en) * 2018-12-28 2021-08-17 Roland Corporation Information processing device, tempo detection device and video processing system
US20210335332A1 (en) * 2018-12-28 2021-10-28 Roland Corporation Video processing device and video processing method

Also Published As

Publication number Publication date
JP2005274708A (en) 2005-10-06
US20090114081A1 (en) 2009-05-07
US7507901B2 (en) 2009-03-24
JP4650662B2 (en) 2011-03-16
US7868240B2 (en) 2011-01-11

Legal Events

Date Code Title Description
AS Assignment

Owner name: SONY CORPORATION, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:KOBAYASHI, YOSHIYUKI;REEL/FRAME:016701/0672

Effective date: 20050601

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20210324