US7552105B2 - Importance degree calculation program, importance degree calculation method, and importance degree calculation apparatus - Google Patents
- Publication number
- US7552105B2
- Authority
- US
- United States
- Prior art keywords
- section
- neighborhood
- instance
- importance degree
- instance set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
Definitions
- the present invention relates to an importance degree calculation program, an importance degree calculation method, and an importance degree calculation apparatus for calculating an importance degree used in Memory-based Reasoning (MBR).
- MBR: Memory-based Reasoning
- “Variable” means type of information such as age or gender.
- “Category value” means a value represented by a character string such as “man” or “woman”. There is no order relationship between category values.
- “Category value variable” means a variable whose value is a category value.
- “Numeric variable” means a variable whose value is a numeric value such as age. There is an order relationship between values of the numeric variable.
- “Objective variable” means the category value variable serving as a criterion for calculation of an importance degree (to be described later).
- “Objective variable value” means a value of the objective variable.
- “Objective variable distribution” means the frequency distribution of the objective variable. The total of all objective variable value distributions becomes 1.
- “Explanatory variable” means a variable other than the objective variable, which serves as a calculation target in calculation of an importance degree (to be described later).
- “Explanatory variable value” means a value of the explanatory variable.
- “Instance” means a set of a plurality of explanatory variable values and one objective variable value.
- “Instance set” means a set including a plurality of instances.
- “Section” means a given range obtained by dividing the explanatory variable. In the case where “age” is used as an explanatory variable, the section indicates, e.g., a range from 20 to 29 years old.
- “Importance degree” means the importance of a given section of the explanatory variable in the instance set.
- The importance degree is calculated with the objective variable as a criterion.
- the instance set having two instances each including three explanatory variables of “gender”, “age”, and “annual income” and one objective variable of “buying history” is represented as follows.
- Importance degree is used in an MBR as disclosed in, e.g., Patent Document 1: Jpn. Pat. Appln. Laid-Open Publication No. 2005-302054.
- the MBR extracts a plurality of instances close in terms of distance to an instance (unknown instance) whose objective variable is unknown from an instance set whose objective variable is known and estimates the objective variable of the unknown instance by a majority among the plurality of instances.
- the importance degree is used for calculation of a distance between instances, and emphasis is placed on the explanatory variable having a higher importance degree to increase accuracy in the estimation. Therefore, in order to make highly accurate estimation in the MBR, it is important to accurately calculate the importance degree (refer to Patent Document 1).
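The estimation described above can be illustrated with a minimal sketch. It assumes numeric explanatory variables, a Euclidean distance scaled by one weight per explanatory variable, and a fixed number `k` of neighbors; the function names, the parameter `k`, and the sample data are hypothetical and are not taken from Patent Document 1.

```python
import math
from collections import Counter

def weighted_distance(a, b, weights):
    """Euclidean distance in which each explanatory variable is scaled by its weight (assumed form)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def mbr_estimate(known, unknown, weights, k=3):
    """Estimate the objective variable value of `unknown` by a majority among its k nearest known instances.

    known: list of (explanatory_values, objective_value) pairs whose objective variable is known.
    """
    nearest = sorted(known, key=lambda inst: weighted_distance(inst[0], unknown, weights))[:k]
    votes = Counter(objective for _, objective in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical instance set: (age, annual income in millions) -> buying history
known = [((25, 3.0), "yes"), ((27, 3.2), "yes"), ((52, 7.5), "no"),
         ((48, 6.8), "no"), ((30, 3.5), "yes")]
estimate = mbr_estimate(known, (29, 3.4), weights=(1.0, 1.0))
```

A larger weight makes differences in that explanatory variable count more in the distance, which is the role the importance degree plays in the description above.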
- Nc is the number of types of objective variable values in an instance set
- p(c|v) is the distribution of an objective variable value c in a j-th section vj in an explanatory variable
- p(c) is the distribution of an objective variable value c in the entire instance set.
- Σ denotes summation over all c or summation over all d.
- a conventional method comprises the following steps: previously dividing an explanatory variable into a plurality of sections; calculating the objective variable distribution in each section; and using the calculated distributions without change to calculate the objective variable distribution.
- FIG. 20 shows an example in which an explanatory variable is coarsely divided so as not to cause an error. As can be seen from FIG. 20 , a large difference is observed between the calculated objective variable distribution and a real distribution at the central portion.
- the average width stays constant in a conventional method, so that a problem occurs when the frequency of the explanatory variable drastically changes. More specifically, in a low density part, the frequency of the average width becomes low to decrease reliability of calculated objective variable distribution, so that an error is likely to occur in the importance degree obtained by using the calculated objective variable distribution. On the other hand, in a high density part, calculation of the objective variable distribution is made beyond the required frequency (i.e., including unnecessary part), so that a difference is caused between the calculated objective variable distribution and a real distribution.
- FIG. 21 shows an example in which the average width is set wide so that objective variable distribution having higher reliability can be obtained even in a low frequency part. As can be seen from FIG. 21 , a large difference is observed between the obtained objective variable distribution and a real distribution in the central part.
- the present invention has been made to solve the above problem, and an object thereof is to provide an importance degree calculation program, an importance degree calculation apparatus, and an importance degree calculation method capable of obtaining an objective variable distribution having high reliability irrespective of the section size in an explanatory variable in which frequency drastically changes.
- an importance degree calculation program allowing a computer to calculate importance degree from an instance set and an explanatory variable, comprising: a section generation step that receives, as an input, an instance set and an explanatory variable and uses the instance set to divide the explanatory variable into a plurality of sections to obtain a section set; a neighborhood instance set extraction step that uses the instance set, the section set obtained by the section generation step, and a neighborhood instance number threshold to extract from across all sections a neighborhood instance set of each section in which the number of instances is greater than the neighborhood instance number threshold; an objective variable distribution calculation step that calculates an objective variable distribution from the neighborhood instance set of each section extracted by the neighborhood instance set extraction step; and an importance degree calculation step that calculates importance degree of each section from the objective variable distribution in each section obtained by the objective variable distribution calculation step and instance set.
- the neighborhood instance set extraction step comprises: a neighborhood section set extraction step that uses the instance set, section set, and neighborhood instance number threshold to output a neighborhood section set of each section in which the sum of instances exceeds the neighborhood instance number threshold; and a neighborhood instance set output step that outputs, as a neighborhood instance set, an instance set included in the neighborhood section set of each section obtained by the neighborhood section set extraction step.
- the section generation step outputs respective category values existing in the explanatory variable as a section
- the neighborhood instance set extraction step uses the sections generated by the section generation step and distance between category values to extract neighborhood instances in the ascending order in terms of the distance between category values.
- the section generation step outputs respective category values existing in the explanatory variable as a section
- the neighborhood section set extraction step uses the sections generated by the section generation step and distance between categories to extract neighborhood sections in the ascending order in terms of the distance between category values.
- the section generation step outputs a section set in which each section is constituted by a combination of sections of respective explanatory variables.
- the section generation step outputs a section set in which sections are arranged in time-series.
- the importance degree calculation program comprises: a neighborhood instance number threshold input step that inputs a neighborhood instance number threshold; and an importance degree list display step that displays a list of calculated importance degrees, wherein the neighborhood instance number threshold input step and importance degree list display step are alternately repeated based on the user's determination.
- the importance degree calculation program according to the present invention comprises a neighborhood instance number threshold calculation step that uses an instance set and a neighborhood ratio threshold to output the product of the number of instances in the instance set and neighborhood ratio threshold as the instance number threshold.
- the importance degree calculation program comprises: a neighborhood ratio threshold input step that inputs a neighborhood ratio threshold; and an importance degree list display step that displays a list of importance degrees, wherein the neighborhood ratio threshold input step and importance degree list display step are alternately repeated based on the user's determination.
- the objective variable distribution calculation step calculates a distance between each instance in a neighborhood instance set and a section serving as a criterion and uses a weight which increases as the distance decreases to calculate an objective variable distribution.
- an importance degree calculation apparatus comprising: a section generation section that receives, as an input, an instance set and an explanatory variable and uses the instance set to divide the explanatory variable into a plurality of sections to obtain a section set; a neighborhood instance set extraction section that receives, as an input, the instance set, the section set obtained by the section generation section, and a neighborhood instance number threshold to extract from across all sections a neighborhood instance set of each section in which the number of instances is greater than the neighborhood instance number threshold; an objective variable distribution calculation section that calculates an objective variable distribution from the neighborhood instance set of each section extracted by the neighborhood instance set extraction section; and an importance degree calculation section that calculates importance degree of each section from the objective variable distribution in each section obtained by the objective variable distribution calculation section and instance set.
- the neighborhood instance set extraction section comprises: a neighborhood section set extraction section that receives, as an input, the instance set, section set, and neighborhood instance number threshold to output a neighborhood section set of each section in which the sum of instances exceeds the neighborhood instance number threshold; and a neighborhood instance set output section that outputs, as a neighborhood instance set, an instance set included in the neighborhood section set of each section obtained by the neighborhood section set extraction section.
- the section generation section outputs respective category values existing in the explanatory variable as a section
- the neighborhood instance set extraction section uses the sections generated by the section generation section and distance between category values to extract neighborhood instances in the ascending order in terms of the distance between category values.
- the importance degree calculation apparatus comprises a neighborhood instance number threshold calculation section that receives, as an input, an instance set and a neighborhood ratio threshold to output the product of the number of instances in the instance set and neighborhood ratio threshold as the instance number threshold.
- an importance degree calculation method that calculates importance degree from an instance set and an explanatory variable.
- the method is executed by a computer.
- the method comprises: a section generation step that receives, as an input, an instance set and an explanatory variable and uses the instance set to divide the explanatory variable into a plurality of sections to obtain a section set; a neighborhood instance set extraction step that uses the instance set, the section set obtained by the section generation step, and a neighborhood instance number threshold to extract from across all sections a neighborhood instance set of each section in which the number of instances is greater than the neighborhood instance number threshold; an objective variable distribution calculation step that calculates an objective variable distribution from the neighborhood instance set of each section extracted by the neighborhood instance set extraction step; and an importance degree calculation step that calculates importance degree of each section from the objective variable distribution in each section obtained by the objective variable distribution calculation step and instance set.
- FIG. 1 is a block diagram showing a basic configuration of an embodiment of the present invention
- FIG. 2 shows a flowchart showing operation of a neighborhood instance set extraction section 12 ;
- FIG. 3 shows an example of a frequency, a real objective variable distribution, and an objective variable distribution obtained in the present embodiment
- FIG. 4 is a block diagram showing another configuration example of the neighborhood instance set extraction section
- FIG. 5 shows a flowchart showing operation of a neighborhood section set extraction section 31 ;
- FIG. 6 is a block diagram showing a case where a category value variable is specified as an explanatory variable
- FIG. 7 is a table showing an example of a distance between each section and instance
- FIG. 8 is a block diagram showing a neighborhood instance number threshold calculation section
- FIG. 9 is a flowchart showing operation of the neighborhood instance number threshold calculation section
- FIG. 10 is a block diagram showing a neighborhood instance number threshold (or neighborhood ratio threshold) input section
- FIG. 11 is a flowchart showing a determination flow of a neighborhood instance number threshold (neighborhood ratio threshold);
- FIG. 12 is a view showing an example of an instance set
- FIG. 13 is a view showing a calculation result obtained according to a conventional method (sectioning).
- FIG. 14 is a view showing a calculation result obtained according to a conventional method (moving average).
- FIG. 15 is a view showing a calculation result obtained according to the present embodiment.
- FIG. 16 is a graph showing a result obtained according to conventional methods and a method of the present embodiment.
- FIG. 17 is a view showing an example of an instance set for time-series data
- FIG. 18 is a view showing a distance between sections calculated according to the present embodiment.
- FIG. 19 is a view showing a calculation result obtained according to the present embodiment.
- FIG. 20 is a view showing an example of a frequency, a real objective variable distribution and an objective variable distribution obtained using a conventional method (sectioning).
- FIG. 21 is a view showing an example of a frequency, a real objective variable distribution, and an objective variable distribution obtained using a conventional method (moving average).
- FIG. 1 is a block diagram showing a basic configuration of an embodiment of the present invention.
- An importance degree calculation apparatus 1 includes a section generation section 11 , a neighborhood instance set extraction section 12 , an objective variable distribution calculation section 13 , and an importance degree calculation section 14 .
- operations of the respective sections 11 to 14 correspond to steps (steps S 11 to S 14 ) necessary to perform operations of the present invention.
- the section generation section 11 receives, as an input, an instance set and an explanatory variable and divides the explanatory variable into fine sections (step S 11 ).
- the neighborhood instance set extraction section 12 receives, as an input, the instance set, section set, and a threshold of the number of neighborhood instances and extracts, across all sections, a neighborhood instance set of each section in which the number of instances is greater than the neighborhood instance number threshold irrespective of the size of each explanatory variable section obtained by the section generation section 11 (step S 12 ).
- the objective variable distribution calculation section 13 calculates an objective variable distribution based on the neighborhood instance set of each section (step S 13 ).
- the importance degree calculation section 14 calculates an importance degree of each section based on the objective variable distribution of each section and input instance set (step S 14 ).
- the section generation section 11 can use, e.g., a method of equally dividing the explanatory variable range (from the maximum value to minimum value) into a large number (e.g., 1,000) of sections.
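The equal division mentioned above might look like the following sketch. The function names and the half-open convention for section boundaries are assumptions made for illustration; the description only states that the range is divided equally into a large number of sections.

```python
def generate_sections(values, n_sections=1000):
    """Divide the range [min, max] of an explanatory variable into equal-width sections."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_sections
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n_sections)]

def section_index(value, lo, hi, n_sections):
    """Return the index of the section containing `value` (the maximum falls in the last section)."""
    if hi == lo:
        return 0  # degenerate range: everything belongs to one section
    idx = int((value - lo) / (hi - lo) * n_sections)
    return min(idx, n_sections - 1)

ages = [20, 30, 60]
sections = generate_sections(ages, n_sections=4)
```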
- the neighborhood instance set extraction section 12 calculates the average value (average number of instances included in each section) of the explanatory variable in each section, extracts the instances closest to that average value, up to the number given by the “neighborhood instance number threshold”, and sets the extracted instances as the neighborhood instance set.
- the neighborhood instance number threshold is specified by a user. Examples of the method for specifying the neighborhood instance number threshold include the following.
- FIG. 2 shows a flowchart showing detailed operation of the neighborhood instance set extraction section 12 .
- the neighborhood instance set extraction section 12 acquires a section set and an instance set respectively from a section set database 21 and an instance set database 22 and calculates an average value (average number of instances) in each section (step S 121 ).
- the neighborhood instance set extraction section 12 then calculates a distance between each section and instance (step S 122 ) and performs the calculation for all instances (steps S 122 and S 123 ). Results of the distance calculation are stored in a database 23 for set of instance and distance.
- the neighborhood instance set extraction section 12 then extracts instances the number of which corresponds to an upper neighborhood instance number threshold based on the set of instance and distance in the database 23 and neighborhood instance number threshold in a neighborhood instance number threshold database 24 (step S 124 ).
- the extraction result is stored in a neighborhood instance set database 25 .
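Steps S 121 to S 124 can be sketched for a single numeric explanatory variable as follows. This is an illustration under stated assumptions: the distance between a section and an instance is taken as the absolute difference between the section average and the instance value, an empty section falls back to its midpoint, and exactly `threshold` instances are returned. The databases 21 to 25 are replaced by plain Python lists.

```python
def neighborhood_instances(instances, section, threshold):
    """Return the `threshold` instances nearest to the average value of `section`.

    instances: list of (explanatory_value, objective_value) pairs.
    section:   (lower_bound, upper_bound), lower bound inclusive.
    """
    lo, hi = section
    members = [x for x, _ in instances if lo <= x < hi]
    # Step S121: average value of the explanatory variable in the section
    # (midpoint fallback for an empty section is an assumption).
    center = sum(members) / len(members) if members else (lo + hi) / 2
    # Steps S122-S123: distance between the section and every instance.
    ranked = sorted(instances, key=lambda inst: abs(inst[0] - center))
    # Step S124: keep the nearest instances up to the threshold.
    return ranked[:threshold]

instances = [(21, "a"), (22, "a"), (23, "b"), (40, "b"), (41, "a")]
dense = neighborhood_instances(instances, (20, 30), threshold=3)
sparse = neighborhood_instances(instances, (40, 50), threshold=3)
```

In the sparse section the neighborhood reaches outside the section boundaries to collect enough instances, which is the behavior the description relies on.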
- the neighborhood instance set extraction section 12 extracts, across all sections, a neighborhood instance set of each section in which the number of instances is greater than the neighborhood instance number threshold irrespective of the size of each explanatory variable section to thereby obtain sets of neighborhood instances.
- because the importance degree of each section is calculated based on the obtained sets of neighborhood instances, it is possible to increase reliability of an objective variable distribution obtained by the objective variable distribution calculation section 13 (the details will be described later) irrespective of the section size. As a result, the importance degree becomes less subject to an error in objective variable distribution, thereby increasing reliability of the importance degree.
- the objective variable distribution calculation section 13 calculates objective variable distribution based on the neighborhood instance set of each section. For example, a distribution of a given objective variable value is obtained according to the following formula: “(the number of instances whose objective variable value fulfills a predetermined criterion in neighborhood instance set)/(the number of instances in neighborhood instance set)”.
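The ratio quoted above can be written as a short sketch. The function name and the sample neighborhood are hypothetical.

```python
from collections import Counter

def objective_distribution(neighborhood):
    """Frequency distribution of objective variable values in a neighborhood instance set.

    neighborhood: list of (explanatory_value, objective_value) pairs.
    """
    counts = Counter(objective for _, objective in neighborhood)
    total = len(neighborhood)
    return {value: n / total for value, n in counts.items()}

dist = objective_distribution([(21, "buy"), (22, "buy"), (23, "none"), (24, "buy")])
```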
- FIG. 3 shows an example of a frequency, a real objective variable distribution, and an obtained objective variable distribution.
- the obtained objective variable value distribution is extremely close to the real objective variable value distribution. This reveals that the importance degree calculation section 14 (the details of which will be described below) can obtain a highly reliable importance degree.
- the importance degree calculation section 14 calculates an importance degree of each section based on an objective variable distribution and an instance set of each section.
- the importance degree is calculated based on a difference between the objective variable distribution in an explanatory variable value and entire objective variable distribution using the equation (1), as the weight calculation method disclosed in the abovementioned Patent Document 1.
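Equation (1) itself is not reproduced in this text, so the following sketch only illustrates the idea of measuring the difference between a section's objective variable distribution and the distribution over the entire instance set. It assumes a root-mean-square difference averaged over the Nc objective variable values; this form is an assumption and may differ from equation (1).

```python
import math

def importance_degree(section_dist, overall_dist):
    """Root-mean-square difference between two objective variable distributions (assumed form)."""
    values = set(section_dist) | set(overall_dist)
    nc = len(values)  # number of types of objective variable values
    squared = sum((section_dist.get(c, 0.0) - overall_dist.get(c, 0.0)) ** 2 for c in values)
    return math.sqrt(squared / nc)

overall = {"buy": 0.5, "none": 0.5}
same = importance_degree({"buy": 0.5, "none": 0.5}, overall)
skewed = importance_degree({"buy": 1.0}, overall)
```

Under this assumed form, a section whose distribution matches the overall distribution has importance degree 0, and the value grows as the section's distribution departs from it.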
- the neighborhood instance set extraction section 12 may be constituted by a neighborhood section set extraction section 31 and a neighborhood instance set output section 32 , as shown in FIG. 4 .
- the neighborhood section set extraction section 31 receives, as an input, an instance set, a section set, and a neighborhood instance number threshold and outputs a neighborhood section set of each section in which the number of instances is greater than the neighborhood instance number threshold. For example, the following method is used in order to obtain a neighborhood set of a given section.
- Step 1: Calculate the number of instances in each section and the average value (average number of instances included in each section) of the explanatory variable in each section.
- Step 2: Extract sections in the ascending order in terms of a distance between the average values thereof until the sum of the instances included in the extracted sections exceeds a neighborhood instance number threshold.
- Step 3: Set the section set obtained in step 2 as a neighborhood section set of each section.
- the neighborhood instance set output section 32 outputs an instance set included in the neighborhood section set of each section as a neighborhood instance set.
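Steps 1 to 3 can be sketched as follows, assuming that the per-section instance count and average value have already been computed and that only non-empty sections are listed. The dictionary layout and function name are hypothetical.

```python
def neighborhood_sections(section_stats, target, threshold):
    """Collect sections nearest to `target` until their instance count exceeds `threshold`.

    section_stats: dict mapping section id -> (instance_count, average_value).
    """
    target_mean = section_stats[target][1]
    # Step 2: rank sections by the distance between average values (the target itself comes first).
    ranked = sorted(section_stats, key=lambda s: abs(section_stats[s][1] - target_mean))
    chosen, total = [], 0
    for s in ranked:
        chosen.append(s)
        total += section_stats[s][0]
        if total > threshold:
            break
    # Step 3: the collected sections form the neighborhood section set.
    return chosen

stats = {"s0": (5, 22.0), "s1": (3, 27.0), "s2": (10, 35.0), "s3": (2, 60.0)}
near_s1 = neighborhood_sections(stats, "s1", threshold=7)
near_s3 = neighborhood_sections(stats, "s3", threshold=10)
```

Because the ranking runs over sections rather than individual instances, the number of candidates to sort is the number of sections, not the number of instances.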
- FIG. 5 shows a flowchart showing operation of the neighborhood section set extraction section 31 .
- the neighborhood section set extraction section 31 acquires a section set and an instance set respectively from the section set database 21 and instance set database 22 and calculates an average value in each section (steps S 211 and S 212 ). This calculation is performed for all sections.
- the average value in each section is stored in a section average value database 26 .
- the neighborhood section set extraction section 31 uses the average value in each section to calculate a distance from another section (step S 213 ). This calculation is performed for all sections (Yes in step S 214 ).
- the distance from one section to another is stored in the database 23 A as an element of a set of distances from one section to another.
- the neighborhood section set extraction section 31 extracts sections in the ascending order in terms of a distance between sections until the sum of the instances included in the extracted sections reaches a neighborhood instance number threshold (step S 215 ).
- the data from the databases 22 , 23 A, and 24 , i.e., the instance set, the set of distances from one section to another, and the neighborhood instance number threshold, are used in the extraction processing.
- neighborhood section sets of all sections are output from the database 25 A (step S 217 ).
- the neighborhood section set extraction section 31 extracts a neighborhood not by searching for a neighborhood of the instance set but by searching for a neighborhood of the section set. This reduces the number of extraction targets to enable higher speed operation than in the case of searching for a neighborhood of the instance set.
- FIG. 6 is a block diagram showing a case where a category value variable is specified as an explanatory variable.
- the section generation section 11 outputs category values existing in an explanatory variable as a section as in the case of FIG. 1 .
- the neighborhood instance set extraction section 12 allows a user to input a distance between category values, extracts neighborhood instances (neighborhood section, in the case of FIG. 4 ) in the ascending order in terms of a distance between category values and outputs a neighborhood instance set (neighborhood section set, in the case of FIG. 4 ).
- the section generation section 11 outputs three sections set for “office worker”, “part-time job”, and “inoccupation”.
- the neighborhood instance set extraction section 12 receives, as an input from a user, a neighborhood instance number threshold of “20” and distances between category values shown in the following table.
- the neighborhood instance set extraction section 12 outputs an instance set in which the explanatory variable is “office worker” as a neighborhood instance set of “office worker”. Further, since the distance between “part-time job” and “inoccupation” is the smallest, the neighborhood instance set extraction section 12 outputs an instance set in which the explanatory variable is “part-time job” or “inoccupation” as the neighborhood instance set of both “part-time job” and “inoccupation”.
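The category value case can be sketched as below. The distance table and the per-category instance counts are hypothetical values chosen only so that the outcome matches the description (the table referred to in the text is not reproduced here); the threshold of 20 is the one stated above.

```python
def category_neighborhood(counts, distance, target, threshold):
    """Collect category values nearest to `target` until the instance count exceeds `threshold`.

    counts:   dict mapping category value -> number of instances.
    distance: dict mapping frozenset({a, b}) -> user-supplied distance between a and b.
    """
    def dist(a, b):
        return 0.0 if a == b else distance[frozenset((a, b))]

    ranked = sorted(counts, key=lambda c: dist(target, c))
    chosen, total = [], 0
    for c in ranked:
        chosen.append(c)
        total += counts[c]
        if total > threshold:
            break
    return chosen

# Hypothetical counts and distances (not taken from the patent's table).
counts = {"office worker": 30, "part-time job": 12, "inoccupation": 11}
distance = {
    frozenset(("office worker", "part-time job")): 2.0,
    frozenset(("office worker", "inoccupation")): 3.0,
    frozenset(("part-time job", "inoccupation")): 1.0,
}
office = category_neighborhood(counts, distance, "office worker", threshold=20)
part_time = category_neighborhood(counts, distance, "part-time job", threshold=20)
```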
- A case where a plurality of explanatory variables (an explanatory variable group) are specified will be described next.
- the section generation section 11 outputs a section set in which each section is constituted by a combination of sections of respective explanatory variables.
- the section generation section 11 outputs a section set constituted by six sections of “20 years old and 3 million”, “20 years old and 4 million”, “20 years old and 5 million”, “30 years old and 3 million”, “30 years old and 4 million”, and “30 years old and 5 million”.
- the neighborhood instance set extraction section 12 outputs a neighborhood instance set of each section in which the number of instances is greater than the neighborhood instance number threshold.
- the neighborhood instance set extraction section 12 uses the following equations (equations (2A and 2B)) to define a distance between a given section and instance and extracts neighborhood instance sets in the ascending order in terms of distance between them.
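- Distance between section x and section y: $d(x, y) = \left[ \sum_i \left\{ \dfrac{\bar{x}_i - \bar{y}_i}{\max_i - \min_i} \right\}^2 \right]^{1/2}$ (2B), where $\bar{x}_i$ and $\bar{y}_i$ are the average values of section x and section y for the i-th explanatory variable, and $\max_i$ and $\min_i$ are the maximum and minimum values of the i-th explanatory variable.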
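Equation (2B) can be evaluated as in the following sketch. Each explanatory variable is normalized by its range before the differences are squared and summed; the function name and argument layout are assumptions, and a variable whose maximum equals its minimum is not handled.

```python
import math

def section_distance(mean_x, mean_y, var_min, var_max):
    """Range-normalized Euclidean distance between two sections, following equation (2B).

    mean_x, mean_y:   per-variable average values of sections x and y.
    var_min, var_max: per-variable minimum and maximum over the instance set.
    """
    return math.sqrt(sum(((mx - my) / (hi - lo)) ** 2
                         for mx, my, lo, hi in zip(mean_x, mean_y, var_min, var_max)))

# Sections described by (age, annual income in millions).
d_far = section_distance((20, 3.0), (30, 5.0), var_min=(20, 3.0), var_max=(30, 5.0))
d_same = section_distance((20, 3.0), (20, 3.0), var_min=(20, 3.0), var_max=(30, 5.0))
```

The normalization keeps a variable with a large numeric range, such as annual income expressed in currency units, from dominating the distance.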
- a distance between each section and instance is represented as a table shown in FIG. 7 .
- a neighborhood instance set of each section can be extracted.
- the processing performed in the respective sections is the same as that described in the above embodiment.
- the importance degree can be calculated for six sections of “year 1995 and 20 years old”, “year 2000 and 20 years old”, “year 2005 and 20 years old”, “year 1995 and 30 years old”, “year 2000 and 30 years old”, and “year 2005 and 30 years old”.
- a temporal variation in the importance degree of “20 years old” and “30 years old” from “year 1995” to “year 2005” can be grasped.
- a neighborhood instance number threshold calculation section 61 , which calculates a neighborhood instance number threshold and outputs it to the neighborhood instance set extraction section 12 (neighborhood section set extraction section 31 , in the case of FIG. 4 ), will be described with reference to FIG. 8 .
- a flowchart of FIG. 9 shows operation of the neighborhood instance number threshold calculation section 61 .
- the neighborhood instance number threshold calculation section 61 calculates the number of instances in an instance set (step S 611 ), calculates the minimum number of instances in objective variable (step S 612 ), and calculates the following formula (step S 613 ): $\dfrac{A \times C}{B}$ (3), where A is the number of instances, C is the objective variable value instance number threshold in the neighborhood, and B is the minimum number of instances in the objective variable.
- the neighborhood instance number threshold calculation section 61 calculates the product of the number of instances and a neighborhood ratio threshold based on the calculated instance number (step S 614 ) and selects the smaller of the product and the neighborhood instance number threshold (step S 615 ). Then, the neighborhood instance number threshold calculation section 61 selects the larger of the selected value and the value obtained according to formula (3) (step S 616 ) and sets the result as the neighborhood instance number threshold.
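Steps S 611 to S 616 can be followed in a short sketch. It assumes that the “minimum number of instances in objective variable” is the instance count of the least frequent objective variable value; the function and parameter names are hypothetical.

```python
from collections import Counter

def neighborhood_threshold(objectives, ratio_threshold, number_threshold, per_value_threshold):
    """Combine the ratio-based and count-based thresholds with formula (3).

    objectives:          objective variable value of every instance in the instance set.
    ratio_threshold:     neighborhood ratio threshold.
    number_threshold:    neighborhood instance number threshold given as input.
    per_value_threshold: objective variable value instance number threshold in neighborhood (C).
    """
    a = len(objectives)                      # step S611: number of instances (A)
    b = min(Counter(objectives).values())    # step S612: count of the rarest value (B), assumed reading
    formula3 = a * per_value_threshold / b   # step S613: A x C / B
    product = a * ratio_threshold            # step S614
    selected = min(product, number_threshold)  # step S615: smaller of the two
    return max(selected, formula3)           # step S616: larger of that and formula (3)

objectives = ["buy"] * 80 + ["none"] * 20
t1 = neighborhood_threshold(objectives, 0.1, 50, 5)
t2 = neighborhood_threshold(objectives, 0.5, 30, 1)
```

Under this reading, formula (3) acts as a lower bound: the neighborhood is kept large enough that the rarest objective variable value is still expected to appear about C times.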
- the neighborhood instance number threshold calculation section 61 receives, as an input, an instance set and a neighborhood ratio threshold and outputs the product of the number of instances in the instance set and the neighborhood ratio threshold as a neighborhood instance number threshold. For example, a user specifies 0.1 as the neighborhood ratio threshold. Thus, the user can easily specify the neighborhood instance number threshold without taking into consideration the number of instances in the instance set.
- the objective variable distribution calculation section 13 calculates a distance between each instance in the neighborhood instance set and a section serving as a criterion and uses a weight which increases as the distance decreases to calculate an objective variable distribution.
- the distance between each instance in the neighborhood instance set and section serving as a criterion can be obtained using a distance calculation method (equations 2A and 2B) which has been described in (4: Case where plurality of explanatory variables (explanatory variable group) are specified) (the same calculation method can be used even in the case where the number of explanatory variables is 1).
- objective variable distribution p(i,c) of objective variable value c in i-th section can be calculated using the following (equation (4)).
- $p(i,c) = \dfrac{\sum_{j:\, c_j = c} \dfrac{1}{1 + d(i,j)}}{\sum_{j} \dfrac{1}{1 + d(i,j)}}$ (4)
- addition in the numerator is performed for j with respect to neighborhood instances in which an objective variable value is c, and addition in the denominator is performed for j with respect to all neighborhood instances.
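Equation (4) can be sketched as follows, with the weight 1/(1 + d(i, j)) given to each neighborhood instance j. The distances are assumed to have been computed beforehand; the function name and sample values are hypothetical.

```python
def weighted_distribution(neighbors):
    """Distance-weighted objective variable distribution of one section, following equation (4).

    neighbors: list of (distance_to_section, objective_value) pairs for the neighborhood instances.
    """
    weights = {}
    total = 0.0
    for d, c in neighbors:
        w = 1.0 / (1.0 + d)  # the weight increases as the distance decreases
        weights[c] = weights.get(c, 0.0) + w  # numerator: instances whose objective value is c
        total += w                            # denominator: all neighborhood instances
    return {c: w / total for c, w in weights.items()}

weighted = weighted_distribution([(0.0, "buy"), (1.0, "none"), (3.0, "buy")])
```

When every distance is 0 the weights are all 1 and the result reduces to the plain frequency ratio.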
- a neighborhood instance number threshold (or neighborhood ratio threshold) input section 81 allows a user to input a neighborhood instance number threshold (neighborhood ratio threshold) and presents a result obtained by calculation based on the input value to the user. The user can change the presented neighborhood instance number threshold (neighborhood ratio threshold) as needed.
- an importance degree list display section 82 displays a list of importance degrees output from the importance degree calculation section 14 to the user and prompts the user to determine whether he or she changes the input neighborhood instance number threshold (neighborhood ratio threshold).
- the user confirms the list of importance degrees and, when determining that importance degree varies greatly due to the influence of an error, increases the neighborhood instance number threshold (neighborhood ratio threshold) to be set. In contrast, when determining a variation in importance degree is poor and a neighborhood size is too large, the user decreases the neighborhood instance number threshold (neighborhood ratio threshold) to be set. After repetition of the above specification of the neighborhood instance number threshold (neighborhood ratio threshold) by the user and presentation of a list of importance degrees, the user inputs an adequate neighborhood instance number threshold (neighborhood ratio threshold), allowing an adequate influence level to be output.
- the neighborhood instance number threshold (or neighborhood ratio threshold) input section 81 specifies an initial value of the neighborhood instance number threshold (step S811), calculates an importance degree (step S812), and acquires a list of importance degrees (step S813). The list may be presented to a user for confirmation.
- when the importance degree is determined to vary greatly due to the influence of an error, the neighborhood instance number threshold (or neighborhood ratio threshold) input section 81 increases the current neighborhood instance number threshold (specified value or changed value) by a predetermined value, resets it (step S815), and stores it in the database 24 as the neighborhood instance number threshold.
- the neighborhood instance number threshold (or neighborhood ratio threshold) input section 81 determines whether the variation in the importance degree is small (poor) (step S816).
- when the variation is determined to be small, the neighborhood instance number threshold (or neighborhood ratio threshold) input section 81 decreases the current neighborhood instance number threshold (specified value or changed value) by a predetermined value, resets it (step S817), and stores it in the database 24 as the neighborhood instance number threshold.
- otherwise, the neighborhood instance number threshold (or neighborhood ratio threshold) input section 81 sets the current neighborhood instance number threshold without change and ends this flow.
- a user may make a determination in the above determination steps or determine an input value and set it in the above changing steps of a neighborhood instance number threshold (neighborhood ratio threshold).
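The adjustment flow above can be sketched as a small loop. This is only an illustration under stated assumptions: the function name, the callback parameters `compute_importances`, `varies_greatly`, and `variation_is_small`, and the `max_rounds` bound are all hypothetical, and the text does not say how the two determinations are made (a user may make them).

```python
def tune_threshold(initial, step, compute_importances,
                   varies_greatly, variation_is_small, max_rounds=20):
    """Raise or lower the neighborhood instance number threshold.

    compute_importances(threshold) -> list of importance degrees (assumed)
    varies_greatly(importances)    -> True if dominated by error (assumed)
    variation_is_small(importances)-> True if variation is poor (assumed)
    """
    threshold = initial
    for _ in range(max_rounds):  # bound added so the sketch always terminates
        importances = compute_importances(threshold)
        if varies_greatly(importances):
            threshold += step                      # enlarge the neighborhood
        elif variation_is_small(importances):
            threshold = max(1, threshold - step)   # shrink the neighborhood
        else:
            break                                  # keep the current value
    return threshold


# Placeholder criteria for demonstration only: the "importance list" is just
# the threshold itself, judged too noisy below 100 and too flat above 200.
tuned = tune_threshold(50, 25, lambda t: t,
                       lambda v: v < 100, lambda v: v > 200)
```

Passing the two determinations in as callbacks keeps the loop the same whether the judgment comes from a user prompt or from an automatic rule.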
- the neighborhood instance number threshold is specified by a user.
- “age” is set as an explanatory variable and “buying history” is set as an objective variable. “age” is a numeric variable that takes integer values from 26 to 44. “buying history” takes one of two values, “presence” or “absence”.
- parameters for each method are determined so that the number of instances to be used for calculating an objective variable distribution in each section becomes 100 or more. Note that a calculation result is rounded to two decimal places.
- the explanatory variable (age) is equally divided into two sections, “26 to 35 years old” and “36 to 44 years old”. In this case, the number of instances in each section, the objective variable distribution, and the importance degree are as shown in FIG. 13.
- the moving average width is set to 5 (the average is calculated over a range of 5 years of age around the target section).
- the number of instances in moving average width, objective variable distribution, and importance degree are as shown in FIG. 14 .
- Importance degree for each age obtained according to the above three methods is shown in FIG. 16 .
- the objective variable distribution and the importance degree differ greatly from the real importance value (always 0.8) in the age sections above 35 years old.
- a proper objective variable distribution and importance degree are obtained. This reveals that by using the method of the present embodiment, it is possible to obtain an importance degree corresponding to a fine section size and less subject to an error even in an explanatory variable in which frequency drastically changes.
- the entire objective variable distribution is 0.5:0.5.
- 0.1 is specified as a neighborhood ratio threshold.
- a similar instance set of each section is calculated according to the above distances, and a method described in (1: Basic configuration) is used to calculate an objective variable distribution in each section and an importance degree of each section. The result is shown in FIG. 19 .
- the computer-readable storage medium mentioned here includes: a portable storage medium such as a CD-ROM, a flexible disk, a DVD disk, a magneto-optical disk, or an IC card; a database that holds a computer program; another computer and its database; and a transmission medium on a network line.
Description
Gender | Age | Annual income | |
---|---|---|---|
Man | 30 | 3 million | |
Woman | 20 | 4 million | Absence |
$$q_v(c) = \frac{p(c \mid v)}{p(c)}$$
$$W_j(v) = \frac{\sum_{c} \left| \frac{q_v(c)}{\sum_{d} q_v(d)} - \frac{1}{N_c} \right|}{2 - \frac{2}{N_c}} \tag{1}$$
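Equation (1) can be evaluated as in the following sketch. The function name and the dictionary-based inputs are assumptions for illustration; Nc is read here as the number of objective variable values, so that the denominator 2 − 2/Nc scales the result into the range 0 to 1.

```python
def importance_degree(conditional, overall):
    """Evaluate equation (1) for one explanatory variable value v.

    conditional: {objective value c: p(c|v)}  (hypothetical input layout)
    overall:     {objective value c: p(c)}, every p(c) assumed > 0
    """
    n_c = len(overall)
    if n_c < 2:
        raise ValueError("at least two objective variable values are needed")
    # q_v(c) = p(c|v) / p(c)
    q = {c: conditional.get(c, 0.0) / overall[c] for c in overall}
    q_total = sum(q.values())
    if q_total == 0.0:
        raise ValueError("p(c|v) is zero for every objective variable value")
    # Sum of absolute deviations of the normalized q from the uniform 1/Nc.
    deviation = sum(abs(q[c] / q_total - 1.0 / n_c) for c in overall)
    return deviation / (2.0 - 2.0 / n_c)


# Illustrative values: a 0.9:0.1 distribution against an overall 0.5:0.5.
w = importance_degree({"presence": 0.9, "absence": 0.1},
                      {"presence": 0.5, "absence": 0.5})
# w is approximately 0.8
```

The value is 0 when the distribution for v matches the overall distribution and 1 when all of the probability falls on a single objective variable value.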
(Explanatory variable 1) | (Explanatory variable 2) | Distance |
---|---|---|
Office worker | Part-time job | 1 |
Office worker | Inoccupation | 1 |
Part-time job | Inoccupation | 0.5 |
Distance between section x and instance y:
$$d(x, y) = \left[ \sum_{i} \left( \frac{\mu_i(x) - v_i(y)}{\max_i - \min_i} \right)^{2} \right]^{1/2} \tag{2A}$$
Here, μ_i(x) is the average value of the i-th explanatory variable in section x, v_i(y) is the value of the i-th explanatory variable of instance y, and max_i and min_i are the maximum and minimum values of the i-th explanatory variable. The sum runs over i = 1 to the number of explanatory variables. In a similar way, a distance between sections can also be calculated using the following equation.
Distance between section x and section y:
$$d(x, y) = \left[ \sum_{i} \left( \frac{\mu_i(x) - \mu_i(y)}{\max_i - \min_i} \right)^{2} \right]^{1/2} \tag{2B}$$
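A minimal Python sketch of the range-normalized distance in equations (2A) and (2B) follows. It covers numeric explanatory variables only; the function name, the argument layout, and the sample income figures are assumptions for illustration.

```python
import math


def normalized_distance(section_means, other_values, minimums, maximums):
    """Range-normalized Euclidean distance of equations (2A)/(2B).

    section_means: average of each explanatory variable in section x
    other_values:  values of instance y (2A) or averages of section y (2B)
    minimums, maximums: per-variable minimum and maximum values
    All four sequences are assumed to have the same length.
    """
    total = 0.0
    for mean, value, lo, hi in zip(section_means, other_values,
                                   minimums, maximums):
        if hi == lo:
            raise ValueError("a variable with zero range cannot be normalized")
        total += ((mean - value) / (hi - lo)) ** 2
    return math.sqrt(total)


# Illustrative data: (age, income) with assumed ranges 26-44 and 300-500.
d = normalized_distance([30.0, 400.0], [39.0, 500.0],
                        [26.0, 300.0], [44.0, 500.0])
```

Dividing each difference by the variable's range keeps a variable with large raw values, such as income, from dominating the distance. The same function serves for (2B) by passing the averages of the second section as `other_values`.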
[Number of instances: A]×[Objective variable value instance number threshold in neighborhood: C]/[Minimum number of instances in objective variable: B] (3)
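$$p(i,c) = \frac{\sum_{j \in N_i,\ c_j = c} \frac{1}{1 + d(i,j)}}{\sum_{j \in N_i} \frac{1}{1 + d(i,j)}} \tag{4}$$
where the numerator sums over the neighborhood instances j of the i-th section whose objective variable value c_j is c, and the denominator sums over all neighborhood instances N_i.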
Claims (20)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2006-067015 | 2006-03-13 | | |
Publications (2)
Publication Number | Publication Date |
---|---|
US20070214102A1 (en) | 2007-09-13 |
US7552105B2 (en) | 2009-06-23 |
Family
ID=38480126
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/482,929 (US7552105B2 (en); Active, expires 2027-07-24) | Importance degree calculation program, importance degree calculation method, and importance degree calculation apparatus | 2006-03-13 | 2006-07-10 |
Country Status (1)
Country | Link |
---|---|
US (1) | US7552105B2 (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: FUJITSU LIMITED, JAPAN. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: MAEDA, KAZUHO; REEL/FRAME: 018051/0166. Effective date: 20060627 |
 | AS | Assignment | Owner name: FUJITSU LIMITED, JAPAN. Free format text: RECORD TO CORRECT THE ADDRESS OF THE ASSIGNEE ON THE ASSIGNMENT, ORIGINALLY RECORDED AT REEL 018051, FRAME 0166. (ASSIGNMENT OF ASSIGNOR'S INTEREST); ASSIGNOR: MAEDA, KAZUHO; REEL/FRAME: 018234/0352. Effective date: 20060627 |
 | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
 | FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | FPAY | Fee payment | Year of fee payment: 4 |
 | FPAY | Fee payment | Year of fee payment: 8 |
 | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 12 |