CA2279359A1 - A method of generating attribute cardinality maps - Google Patents
- Publication number
- CA2279359A1
- Authority
- CA
- Canada
- Prior art keywords
- range
- elements
- bin
- mean
- determining
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24542—Plan optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24542—Plan optimisation
- G06F16/24545—Selectivity estimation or determination
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99932—Access augmentation or optimizing
Abstract
This invention provides a novel means for creating a histogram for use in minimizing response time and resource consumption when optimizing a query in a database and other like structures. The histogram is created by placing ordered elements into a specific range until the next element to be considered for inclusion in the range is more than a predetermined distance from the (generalized) mean value associated with the elements within the range, whereupon that next element is placed in the following range. Similarly, each following range is closed when the next element to be considered for inclusion in the range is more than a predetermined distance from the (generalized) mean value associated with the elements in that range, whereupon that next element is placed in the following range. For each range, the location and size of the range are recorded with, for example, the mean value, the slope, or another attribute characterizing one or more elements in the range. The invention also has applications in pattern recognition, message routing, and actuarial science.
Claims (64)
1. A method of generating a histogram comprising the steps of:
(a) providing a data set representing a plurality of elements and a value associated with each element, the data set having a property defining an order of the elements therein;
(b) determining at least one range, each of the at least one range having at least an element, an arithmetic mean of each range equal to the arithmetic mean of the values associated with the at least an element within said range, a specific range from the at least a range comprising a plurality of elements from the data set adjacent each other within the defined order, wherein the arithmetic mean of the specific range is within a predetermined maximum distance from a value associated with an element within the specific range, the predetermined maximum distance independent of the number of elements within the specific range and their associated values;
and, (c) for each range storing at least a value related to an estimate of a value associated with an element within the range and at least data relating to the size and location of the range.
2. A method of generating a histogram as defined in claim 1 wherein the at least a range comprises a plurality of ranges, and wherein some ranges from the plurality of ranges have a different number of elements and some ranges from the plurality of ranges have different areas, an area of each range equal to the product of the arithmetic mean of said range and the number of elements within said range.
3. A method of generating a histogram as defined in claim 2 wherein the step of determining the at least a range is performed so as to limit variance between the values associated with the elements within a same range from the at least a range in a known fashion, the limitation forming further statistical data of the histogram.
4. A method of generating a histogram as defined in claim 3 wherein a value associated with each element within a range from the at least a range is within […], where τ is the tolerance value used in generating the R-ACM and i is the location of the element within a histogram sector of length l.
5. A method of generating a histogram as defined in claim 3 wherein the at least a value related to an estimate of a value for an element within the range includes a value relating to the arithmetic mean, and wherein the at least data relating to the range comprises data relating to both endpoints of the range.
6. A method of generating a histogram as defined in claim 3 wherein the step of determining at least a range comprises the steps of:
(a) using a suitably programmed processor, defining a first bin as a current bin;
(b) using the suitably programmed processor, selecting an element and adding it to the current bin as the most recently added element to the current bin;
(c) selecting an element from within the data set, the selected element not within a bin and adjacent the most recently added element;
(d) determining at least a mean of the values associated with elements within the bin;
(e) when the most recently selected element differs from a mean from the at least a mean by an amount less than a predetermined amount, adding the most recently selected element to the current bin as the most recently added element to the current bin and returning to step (c);
(f) when the selected element differs from the mean from the at least a mean by an amount more than the predetermined amount, creating a new bin as the current bin and adding the selected element to the new bin as the most recently added element to the current bin and returning to step (c);
and, (g) providing data relating to each bin including data indicative of a range of elements within each bin as the determined at least a range.
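Steps (a) through (g) of claim 6 can be illustrated with a minimal Python sketch. It assumes the arithmetic mean of claim 8 and the sequential scan of claim 9. The names `Bin`, `build_bins` and `tolerance` are hypothetical and do not come from the patent. The claim does not say what happens when the difference exactly equals the predetermined amount; this sketch assumes the element stays in the current bin.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    start: int    # position of the first element of the range
    count: int    # number of elements in the range
    total: float  # sum of the values, so that mean == total / count

    @property
    def mean(self) -> float:
        return self.total / self.count


def build_bins(values, tolerance):
    """Scan the ordered values once, closing the current bin whenever the
    next value differs from the bin's running arithmetic mean by more than
    `tolerance` (assumption: a difference equal to `tolerance` is accepted)."""
    bins = []
    for index, value in enumerate(values):
        if bins and abs(value - bins[-1].mean) <= tolerance:
            # Step (e): the value is close enough, extend the current bin.
            bins[-1].count += 1
            bins[-1].total += value
        else:
            # Steps (a), (b) and (f): open a new bin holding this element.
            bins.append(Bin(start=index, count=1, total=float(value)))
    return bins


if __name__ == "__main__":
    for b in build_bins([10, 11, 12, 30, 31, 5], tolerance=3):
        print(b.start, b.count, b.mean)
```

Storing the start position, the element count and the running sum is one possible way to keep "data indicative of a range of elements" together with the mean; other encodings would serve equally well.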
7. A method of generating a histogram as defined in claim 6 comprising the step of: (aa) providing a value, τ, of the predetermined maximum distance.
8. A method of generating a histogram as defined in claim 6 wherein the at least a mean comprises an arithmetic mean and wherein the mean from the at least a mean is the arithmetic mean.
9. A method of generating a histogram as defined in claim 6 wherein adjacent elements are selected from a start location within the data set in order sequentially toward an end of the data set.
10. A method of generating a histogram as defined in claim 6 wherein adjacent elements are selected from a start location in an alternating fashion toward a beginning of the data set and toward an end of the data set.
11. A method of generating a histogram as defined in claim 3 wherein the step of determining at least a range comprises the steps of:
(a) selecting an element from within the data set;
(b) determining a bin with which to associate the element;
(c) when the determined bin is empty, adding the element to the bin;
(d) when the determined bin is other than empty, determining at least a mean of the values associated with elements within the determined bin;
(e) when the most recently selected element differs from a mean from the at least a mean by an amount less than a predetermined amount, adding the most recently selected element to the determined bin and returning to step (a);
(f) when the selected element differs from the mean from the at least a mean by an amount more than the predetermined amount, adding the selected element to the determined bin and dividing the determined bin into one of two bins and three bins, one of which includes the selected element and returning to step (a); and, (g) providing data relating to each bin including data indicative of a range of elements within the bin.
12. A method of generating a histogram as defined in claim 11 wherein the at least a mean comprises an arithmetic mean and wherein the mean from the at least a mean is the arithmetic mean.
13. A method of generating a histogram as defined in claim 12 comprising the step of: determining a first arithmetic mean of a first selected bin; determining a second arithmetic mean of a second selected bin adjacent the first selected bin;
comparing the first and second arithmetic means; and, when the arithmetic means are within a predetermined distance of each other, merging the first selected bin and the second selected bin to form a single merged bin including all the elements of the first selected bin and all the elements of the second selected bin.
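The merging step of claim 13 might look as follows in Python. This is a sketch under stated assumptions: bins are represented as hypothetical `(start, count, total)` tuples in data-set order, `merge_adjacent` and `distance` are illustrative names, and a single left-to-right pass is used, so each bin is compared against the already merged bin before it.

```python
def merge_adjacent(bins, distance):
    """Merge neighbouring bins whose arithmetic means lie within `distance`
    of each other. Each bin is a (start, count, total) tuple."""
    merged = []
    for start, count, total in bins:
        if merged:
            prev_start, prev_count, prev_total = merged[-1]
            prev_mean = prev_total / prev_count
            if abs(prev_mean - total / count) <= distance:
                # The merged bin keeps every element of both bins.
                merged[-1] = (prev_start, prev_count + count, prev_total + total)
                continue
        merged.append((start, count, total))
    return merged


if __name__ == "__main__":
    print(merge_adjacent([(0, 2, 20), (2, 3, 33), (5, 2, 100)], distance=2))
```

Because counts and sums are carried along, the mean of a merged bin is available without revisiting the underlying elements.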
14. A method of generating a histogram as defined in claim 3 wherein the step of determining at least a range comprises the steps of:
(a) using a suitably programmed processor, defining a first bin as a current bin;
(b) using the suitably programmed processor, selecting an element and adding it to the current bin as the most recently added element to the current bin;
(c) selecting elements adjacent the most recently added element(s);
(d) determining a first mean of the values associated with elements within the bin and determining a second mean of the selected elements;
(e) when the second mean differs from the first mean by an amount less than a predetermined amount, adding the most recently selected elements to the current bin as the most recently added elements and returning to step (c);
(f) when the second mean differs from the first mean by an amount more than the predetermined amount, creating a new bin as the current bin and adding at least one of the selected elements to the new bin as the most recently added element(s) to the current bin and returning to step (c); and, (g) providing data relating to each bin including data indicative of a range of elements within the bin.
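The group-wise variant of claim 14 can be sketched in Python as below. Several details are assumptions rather than statements of the claim: a fixed `group_size` is used for the selected elements, the whole group (not only some of its elements) moves to the new bin in step (f), and a difference equal to `tolerance` is accepted. The function name and the `(start, count, total)` tuple layout are hypothetical.

```python
def build_bins_by_group(values, group_size, tolerance):
    """Compare the mean of the next group of adjacent values with the mean
    of the current bin; extend the bin or open a new one accordingly."""
    if not values:
        return []
    # Steps (a) and (b): the first bin starts with the first element.
    bins = [(0, 1, values[0])]
    position = 1
    while position < len(values):
        group = values[position:position + group_size]  # step (c)
        group_total = sum(group)
        start, count, total = bins[-1]
        first_mean = total / count                      # step (d)
        second_mean = group_total / len(group)
        if abs(second_mean - first_mean) <= tolerance:  # step (e)
            bins[-1] = (start, count + len(group), total + group_total)
        else:                                           # step (f)
            bins.append((position, len(group), group_total))
        position += len(group)
    return bins


if __name__ == "__main__":
    print(build_bins_by_group([10, 10, 12, 30, 32, 31], group_size=2, tolerance=3))
```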
15. A method of generating a histogram as defined in claim 14 wherein the step of (f) includes the steps of:
(f1) determining a first element within the selected elements to add to the new current bin;
(f2) adding the selected element(s) before the first element to the previous current bin; and, (f3) adding the selected element(s) from and including the first element to the new current bin.
16. A method of generating a histogram as defined in claim 3 wherein the step of determining at least a range comprises the steps of:
(a) using a suitably programmed processor, defining a first bin as a current bin;
(b) using the suitably programmed processor, selecting an element and adding it to the current bin as the most recently added element to the current bin;
(c) selecting an element from within the data set, the selected element not within a bin and adjacent the most recently added element;
(d) determining the Generalized positive-2 mean of the current bin as the square-root of the (sum of the squares of the values associated with elements within the bin divided by the number of the elements within the bin);
(e) determining the Generalized negative-2 mean of the current bin as the square of the (sum of the square-roots of the values associated with elements within the bin divided by the number of the elements within the bin);
(f) when the value associated with the selected element is lower than the said Generalized positive-2 mean, determining a difference between the value associated with the selected element and the said Generalized positive-2 mean, and when the value associated with the selected element is higher than the said Generalized negative-2 mean, determining a difference between the value associated with the selected element and the said Generalized negative-2 mean;
(g) when a difference is other than greater than the predetermined amount, adding the selected element to the current bin as the most recently added element to the current bin and returning to step (c);
(h) when a difference is greater than the predetermined amount, defining a new bin as the current bin, adding the selected element to the current bin as the most recently added element to the current bin, and returning to step (c); and, (i) providing data relating to each bin including data indicative of a range of elements within each bin as the determined at least a range.
17. A method of generating a histogram as defined in claim 3 wherein the step of determining at least a range comprises the steps of:
(a) using a suitably programmed processor, defining a first bin as a current bin;
(b) using the suitably programmed processor, selecting an element and adding it to the current bin as the most recently added element to the current bin;
(c) selecting an element from within the data set, the selected element not within a bin and adjacent the most recently added element;
(d) determining the Generalized positive-k mean of the current bin for a pre-determined k as the k th-root of the (sum of the k th powers of the values associated with elements within the bin divided by the number of the elements within the bin);
(e) determining the Generalized negative-k mean of the current bin as the k th power of the (sum of the k th-roots of the values associated with elements within the bin divided by the number of the elements within the bin);
(f) when the value associated with the selected element is lower than the said Generalized positive-k mean, determining a difference between the value associated with the selected element and the said Generalized positive-k mean, and when the value associated with the selected element is higher than the said Generalized negative-k mean, determining a difference between the value associated with the selected element and the said Generalized negative-k mean;
(g) when a difference is other than greater than the predetermined amount, adding the selected element to the current bin as the most recently added element to the current bin and returning to step (c);
(h) when a difference is greater than the predetermined amount, defining a new bin as the current bin, adding the selected element to the current bin as the most recently added element to the current bin, and returning to step (c); and, (i) providing data relating to each bin including data indicative of a range of elements within each bin as the determined at least a range.
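The two generalized means of claim 17, and the test of steps (f) through (h), can be sketched in Python as follows. Setting k = 2 gives the means of claim 16. The sketch reads "sum ... divided by the number of the elements" as the average of the powers (or roots), and it assumes non-negative values so that the roots are real; both readings are assumptions. The names `generalized_means` and `exceeds_tolerance` are hypothetical.

```python
def generalized_means(values, k):
    """Return (positive-k mean, negative-k mean) as worded in the claim:
    the k-th root of the average of k-th powers, and the k-th power of the
    average of k-th roots. Assumes non-negative values."""
    n = len(values)
    positive = (sum(v ** k for v in values) / n) ** (1.0 / k)
    negative = (sum(v ** (1.0 / k) for v in values) / n) ** k
    return positive, negative


def exceeds_tolerance(bin_values, candidate, k, tolerance):
    """True when the candidate should start a new bin (step (h)),
    False when it joins the current bin (step (g))."""
    positive, negative = generalized_means(bin_values, k)
    differences = []
    if candidate < positive:
        differences.append(positive - candidate)
    if candidate > negative:
        differences.append(candidate - negative)
    return any(d > tolerance for d in differences)


if __name__ == "__main__":
    print(generalized_means([1, 9], k=2))
    print(exceeds_tolerance([1, 9], candidate=5, k=2, tolerance=2))
    print(exceeds_tolerance([1, 9], candidate=12, k=2, tolerance=2))
```

For non-negative values and k of at least 1, the negative-k mean is never larger than the positive-k mean, so a candidate lying between the two is checked against both.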
18. A method of generating a histogram as defined in claim 3 wherein the step of determining at least a range comprises the steps of:
(a) using a suitably programmed processor, defining a first bin as a current bin;
(b) using the suitably programmed processor, selecting an element and adding it to the current bin as the most recently added element to the current bin;
(c) selecting an element from within the data set, the selected element not within a bin and adjacent the most recently added element;
(d) determining a current largest value as the largest of the values associated with elements within the bin;
(e) determining a current smallest value as the smallest of the values associated with elements within the bin;
(f) when the value associated with the selected element is lower than the current largest value, determining a difference between the value associated with the selected element and the current largest value, and when the value associated with the selected element is higher than the current smallest value, determining a difference between the value associated with the selected element and the current smallest value;
(g) when a difference is other than greater than the predetermined amount, adding the selected element to the current bin as the most recently added element to the current bin and returning to step (c);
(h) when a difference is greater than the predetermined amount, defining a new bin as the current bin, adding the selected element to the current bin as the most recently added element to the current bin, and returning to step (c); and, (i) providing data relating to each bin including data indicative of a range of elements within each bin as the determined at least a range.
19. A method of generating a histogram as defined in claim 18 wherein the predetermined amount is equal to 2τ.
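The largest/smallest-value rule of claims 18 and 19 can be sketched in Python as below. The sketch assumes the predetermined amount is twice a tolerance passed in as `tau`, and it stores each bin as a hypothetical `(start, count, smallest, largest)` tuple; neither the function name nor the tuple layout comes from the patent.

```python
def build_bins_minmax(values, tau):
    """Close the current bin when the next value is more than 2 * tau away
    from the bin's current largest or current smallest value."""
    limit = 2 * tau
    bins = []
    for index, value in enumerate(values):
        if bins:
            start, count, smallest, largest = bins[-1]
            # Step (f): differences against the current extremes.
            below = value < largest and largest - value > limit
            above = value > smallest and value - smallest > limit
            if not (below or above):
                # Step (g): extend the bin and update its extremes.
                bins[-1] = (start, count + 1, min(smallest, value), max(largest, value))
                continue
        # Steps (a), (b) and (h): open a new bin with this element.
        bins.append((index, 1, value, value))
    return bins


if __name__ == "__main__":
    print(build_bins_minmax([10, 12, 14, 15, 3], tau=2))
```

Under this rule the spread between the smallest and largest value in any bin never exceeds 2 * tau, because a value that would widen the spread beyond that limit always opens a new bin.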
20. A method of generating a histogram as defined in claim 1 wherein the step of determining at least a range comprises the steps of:
(a) selecting a group of elements within the data set and adjacent one another within the ordering;
(b) determining a mean of the values associated with each selected element within the selected group of elements;
(c) comparing a value associated with each selected element in the group to the mean value to determine a difference;
(d) when a value is different from the mean by more than a predetermined amount, returning to step (a); and, (e) when all values are different from the mean by less than or equal to the predetermined amount, creating a bin including the selected group of elements and returning to step (a).
21. A method of generating a histogram as defined in claim 20 comprising the steps of: (f1) selecting an element adjacent the bin including the selected group of elements, the selected element other than an element within a bin;
(f2) determining a mean of the values associated with each element within the bin and the selected element; and, (f3) when the value of the selected element differs from the mean by less than or equal to the predetermined amount, adding the selected element to the bin and returning to step (f1).
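Claims 20 and 21 together describe seeding a bin from a group of adjacent elements and then growing it. A minimal Python sketch follows. It assumes a fixed `group_size`, growth toward the end of the data set only, and a caller that chooses the next `start` when seeding fails; none of these choices is fixed by the claims, and the name `seed_and_grow` is hypothetical.

```python
def seed_and_grow(values, start, group_size, tolerance):
    """Try to create a bin from values[start:start + group_size] and extend
    it to the right. Returns (start, end_exclusive), or None when the seed
    group is rejected."""
    group = values[start:start + group_size]
    if len(group) < group_size:
        return None
    mean = sum(group) / len(group)
    # Claim 20, steps (c) and (d): reject the group if any value is too far.
    if any(abs(v - mean) > tolerance for v in group):
        return None
    total = sum(group)
    end = start + group_size
    # Claim 21, steps (f1) to (f3): add neighbours while they stay close to
    # the mean of the bin taken together with the candidate.
    while end < len(values):
        candidate = values[end]
        mean_with_candidate = (total + candidate) / (end - start + 1)
        if abs(candidate - mean_with_candidate) > tolerance:
            break
        total += candidate
        end += 1
    return (start, end)


if __name__ == "__main__":
    print(seed_and_grow([10, 11, 12, 13, 30], start=0, group_size=2, tolerance=2))
```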
22. A method as defined in claim 1 comprising the step of: providing the mean associated with the range as an estimate of a value associated with an element within said range.
23. A method as defined in claim 22 comprising the step of: estimating a reliability of the estimated value.
24. A method as defined in claim 23 wherein the estimated value is used for estimating the computational efficiency of a search within a database.
25. A method as defined in claim 1 comprising the step of: estimating a value associated with a selected range of elements as a sum of (products of (the arithmetic mean for each range and a number of elements within both the range and the selected range)).
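The estimate of claim 25 can be written as a short Python sketch. It assumes each stored range is a hypothetical `(start, count, mean)` tuple and that the selected range is given by inclusive element positions `lo` and `hi`; the function name is illustrative only.

```python
def estimate_range_sum(bins, lo, hi):
    """Sum, over all stored ranges, of (range mean) * (number of elements
    lying in both the stored range and the selected range [lo, hi])."""
    total = 0.0
    for start, count, mean in bins:
        last = start + count - 1
        overlap = min(hi, last) - max(lo, start) + 1
        if overlap > 0:
            total += mean * overlap
    return total


if __name__ == "__main__":
    histogram = [(0, 3, 11.0), (3, 2, 30.5), (5, 1, 5.0)]
    print(estimate_range_sum(histogram, lo=1, hi=3))  # 2 * 11.0 + 1 * 30.5
```

In the usage example, positions 1 to 3 overlap the first stored range in two elements and the second in one, so the estimate is 2 × 11.0 + 1 × 30.5 = 52.5.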
26. A method as defined in claim 1 wherein a plurality of ranges are determined and the determined ranges are used for dividing the elements into groups having similar statistical properties for use in actuarial calculations.
27. A method as defined in claim 1 comprising the steps of: determining a routing table in dependence upon the histogram; and, estimating a value for use in network routing in dependence upon the routing table.
28. A method as defined in claim 22 comprising the step of using the estimate for determining an approach to searching within a plurality of different databases given a predetermined limited time for conducting the search, wherein the approach is selected to approximately maximise the probability of successfully completing the search.
29. A method as defined in claim 1 comprising the step of improving a discriminant function used in a process of pattern recognition in dependence upon the histogram.
30. A method of generating a histogram comprising the steps of:
(a) providing a data set representing a plurality of elements and a value associated with each element, the data set having a property defining an order of the elements therein;
(b) determining a plurality of ranges, each of the plurality of ranges having at least an element, a known statistical correlation existing between values associated with elements in a same range, some ranges from the at least a range comprising a plurality of elements from the data set adjacent each other within the defined order, the statistical correlation for those ranges indicative of a maximum error between an estimated value associated with an element within the range and the value associated with the element, the maximum error other than (the total area of the range minus the estimated value), wherein an area of each range is equal to the product of the arithmetic mean of said range and the number of elements within said range; and, (c) for each range storing at least a value related to an estimate of a value associated with an element within the range and at least data relating to the size and location of the range.
31. A method of generating a histogram as defined in claim 30 wherein a range from the some ranges defines a range of values associated with elements within the range, and wherein the range of values has a maximum upper limit and a minimum lower limit, the lower limit other than zero and the upper limit other than the area of the range.
32. A method of generating a histogram as defined in claim 31 wherein a value associated with each element within a range from the some ranges is within a known maximum error of an estimated value for that element and wherein the maximum error is different for some elements in a same range than for others.
33. A method of generating a histogram comprising the steps of:
(a) providing a data set representing a plurality of elements and a value associated with each element, the data set having a property defining an order of the elements therein;
(b) determining at least one range having a length such that the value associated with at least one element within the range is within a predetermined maximum distance of at least one other element within the range; and, (c) for each range storing at least a value related to an estimate of a value associated with an element within the range and at least data relating to the size and location of the range.
34. A method of generating a histogram comprising the steps of:
(a) providing a data set representing a plurality of elements and a value associated with each element, the data set having a property defining an order of the elements therein;
(b) determining at least one range having a length such that the value associated with every element within the range is within a predetermined maximum distance of every other element within the range; and, (c) for each range storing at least a value related to an estimate of a value associated with an element within the range and at least data relating to the size and location of the range.
35. An article of manufacture comprising a computer usable medium having data determinative of the following:
(a) computer readable program code embodied therein for providing a data set representing a plurality of elements and a value associated with each element, the data set having a property defining an order of the elements therein;
(b) computer readable program code embodied therein for determining at least one range, each of the at least one range having at least an element, an arithmetic mean of each range equal to the arithmetic mean of the values associated with the at least an element within said range, a specific range from the at least a range comprising a plurality of elements from the data set adjacent each other within the defined order, wherein the arithmetic mean of the specific range is within a predetermined maximum distance from a value associated with an element within the specific range, the predetermined maximum distance independent of the number of elements within the specific range and their associated values; and, (c) computer readable program code embodied therein for storing for each range at least a value related to the mean and at least data relating to the size and location of the range.
36. A method of generating a histogram comprising the steps of:
(a) providing a data set representing a plurality of elements and a value associated with each element, the data set having a property defining an order of the elements therein;
(b) determining a range within the data set, the range comprising a plurality of elements from the data set and adjacent within the order; and, (c) storing a plurality of values indicative of a straight line defining an approximate upper boundary of the values associated with each element within the range, the straight line indicating different values for different elements within the range.
37. A method of generating a histogram as defined in claim 36, wherein the provided range has a mean equal to the mean of the values associated with the at least an element within said range and an area of the range equal to the product of the mean of said range and the number of elements within said range; and comprising the steps of:
(d) providing a value associated with specific elements within the range; and, (e) determining a straight line approximating the determined value for each of the specific elements and having an area therebelow determined according to (the value of the straight line at the first element in the range + the value of the straight line at the last element in the range) divided by 2 all times the number of elements within the range, the area below the straight line approximately equal to the area of the range.
38. A method of generating a histogram as defined in claim 37, comprising the steps of:
(f) providing a second range within the data set, the second range adjacent the first range and comprising a plurality of elements from the data set and adjacent within the order, the provided second range having a mean equal to the mean of the values associated with the at least an element within said range and an area of the range equal to the product of the mean of said range and the number of elements within said range;
(g) providing a value associated with specific elements within the second range;
(h) determining a second straight line approximating the determined value for each of the specific elements within the second range and having an area therebelow determined according to (the value of the second straight line at the first element in the second range + the value of the second straight line at the last element in the second range) divided by 2 all times the number of elements within the second range, the area below the straight line approximately equal to the area of the second range;
and, storing a plurality of values indicative of the second straight line.
39. A method of generating a histogram as defined in claim 38, wherein the step of determining the straight line and of determining the second straight line are performed such that the adjacent endpoints of the first straight line and the second straight line are a same point.
40. A method of generating a histogram as defined in claim 39, wherein the step of determining the second straight line is performed in dependence upon the endpoint of the first straight line adjacent the second range.
41. A method of generating a histogram as defined in claim 37, wherein the step of determining a range is performed so as to limit variance between values associated with elements in the range and the straight line in a known fashion, the limitation forming further statistical data of the histogram.
42. A method of generating a histogram as defined in claim 41, wherein the step of determining a range is performed so as to limit average error between some values associated with elements in the range and the straight line in a known fashion.
43. A method of generating a histogram as defined in claim 41, wherein the step of determining a range is performed so as to limit least squared error between some values associated with elements in the range and the straight line in a known fashion.
44. A method of generating a histogram as defined in claim 36, wherein the plurality of values are indicative of a range beginning, a range ending, a value at the range beginning and a value at the range ending.
45. A method of generating a histogram as defined in claim 44, wherein the plurality of values includes data for determining a straight line approximating the values associated with elements within the range, and differing therefrom by an amount less than a known amount less than each associated value.
46. A method of generating a histogram comprising the steps of:
(a) providing a data set representing a plurality of elements and a value associated with each element, the data set having a property defining an order of the elements therein;
(b) providing a range within the data set, the range comprising a plurality of elements from the data set and adjacent within the order;
(c) determining a straight line indicating different values for different elements within the range; and, (d) storing a plurality of values indicative of a straight line defining an approximate upper boundary of the values associated with each element within the range, the straight line indicating different values for different elements within the range.
47. A method of generating a histogram as defined in claim 46, wherein the provided range has a mean equal to the mean of the values associated with the at least an element within said range and an area of the range equal to the product of the mean of said range and the number of elements within said range; and wherein the step (c) comprises the steps of:
(c1) providing a value associated with specific elements within the range;
and, (c2) determining a straight line approximating the determined value for each of the specific elements and having an area therebelow determined according to (the value of the straight line at the first element in the range + the value of the straight line at the last element in the range) divided by 2 all times the number of elements within the range, the area below the straight line approximately equal to the area of the range.
48. A method of generating a histogram as defined in claim 47, comprising the steps of:
(e) providing a second range within the data set, the second range adjacent the first range and comprising a plurality of elements from the data set and adjacent within the order, the provided second range having a mean equal to the mean of the values associated with the at least an element within said range and an area of the range equal to the product of the mean of said range and the number of elements within said range;
(f) providing a value associated with specific elements within the second range;
(g) determining a second straight line approximating the determined value for each of the specific elements within the second range and having an area therebelow determined according to (the value of the second straight line at the first element in the second range + the value of the second straight line at the last element in the second range) divided by 2 all times the number of elements within the second range, the area below the straight line approximately equal to the area of the second range;
and, storing a plurality of values indicative of the second straight line.
49. A method of generating a histogram as defined in claim 48, wherein the step of determining the straight line and of determining the second straight line are performed such that the adjacent endpoints of the first straight line and the second straight line are a same point.
50. A method of generating a histogram as defined in claim 49, wherein the step of determining the second straight line is performed in dependence upon the endpoint of the first straight line adjacent the second range.
51. A method of generating a histogram as defined in claim 47, wherein the step of determining the straight line is performed so as to limit variance between values associated with elements in the range and the straight line in a known fashion, the limitation forming further statistical data of the histogram.
52. A method of generating a histogram as defined in claim 51, wherein the step of determining the straight line is performed so as to limit average error between some values associated with elements in the range and the straight line in a known fashion.
53. A method of generating a histogram as defined in claim 51, wherein the step of determining the straight line is performed so as to limit least squared error between some values associated with elements in the range and the straight line in a known fashion.
54. A method as defined in claim 36, comprising the step of estimating a value associated with an element based on the location of the straight line at the element.
55. A method as defined in claim 54, comprising the step of estimating a reliability of the estimated value.
56. A method as defined in claim 46, comprising the step of estimating a value associated with an element based on the location of the straight line at the element.
57. A method as defined in claim 56, comprising the step of estimating a reliability of the estimated value.
58. A method as defined in claim 56, comprising the step of using the estimated value, estimating the computational efficiency of a search within a database.
59. A method as defined in claim 46, comprising the steps of determining a routing table in dependence upon the histogram; and, determining an estimate of a value within the routing table for determining a network routing.
60. A method as defined in claim 56, comprising the step of: using the estimate for determining an approach to searching within a plurality of different databases given a predetermined limited time for conducting the search, wherein the approach is determined to approximately maximise the probability of successfully completing the search.
61. A method as defined in claim 46, comprising the step of: improving the discriminant function used in a process of pattern recognition in dependence upon the histogram.
62. A method as defined in claim 36, comprising the step of estimating a value associated with a selected range of elements as a sum of areas below the straight lines for portions of ranges within the selected range.
63. A method as defined in claim 46, comprising the step of estimating a value associated with a selected range of elements as a sum of areas below the straight lines for portions of ranges within the selected range.
64. An article of manufacture comprising a computer usable medium having data determinative of the following:
(a) computer readable program code embodied therein for providing a data set representing a plurality of elements and a value associated with each element, the data set having a property defining an order of the elements therein;
(b) computer readable program code embodied therein for providing a range within the data set, the range comprising a plurality of elements from the data set and adjacent within the order;
(c) computer readable program code embodied therein for determining a straight line indicating different values for different elements within the range; and, (d) computer readable program code embodied therein for storing a plurality of values indicative of a straight line defining an approximate upper boundary of the values associated with each element within the range, the straight line indicating different values for different elements within the range.
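The two families of claims above can be illustrated with a short sketch. It is not the patented implementation, only a minimal reading of the claim language under stated assumptions. The first part groups adjacent elements into ranges while each newly selected element stays within a predetermined amount of the running mean (in the spirit of steps (f2) and (f3)), and estimates the value for a selected range as a sum of mean times overlap count (claim 25). The second part derives a straight line over one range whose area, computed as (first + last) / 2 times the number of elements, equals the area of the range (claims 37 and 47). All names (`Range`, `build_ranges`, `estimate_sum`, `fit_trapezoid`, `line_value`, `tolerance`, `left`) are hypothetical, and the choice of the left endpoint value is assumed to be supplied by the caller.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Range:
    """One stored range (hypothetical layout): location, size, and mean."""
    start: int    # index of the first element within the defined order
    count: int    # number of elements in the range
    mean: float   # arithmetic mean of the values in the range


def build_ranges(values: Sequence[float], tolerance: float) -> List[Range]:
    """Greedily grow a bin while the selected element differs from the mean
    of (bin + selected element) by at most `tolerance`."""
    ranges: List[Range] = []
    start, total, count = 0, 0.0, 0
    for i, v in enumerate(values):
        if count > 0:
            mean_with_v = (total + v) / (count + 1)
            if abs(v - mean_with_v) > tolerance:
                # Close the current bin and start a new one at element i.
                ranges.append(Range(start, count, total / count))
                start, total, count = i, 0.0, 0
        total += v
        count += 1
    if count > 0:
        ranges.append(Range(start, count, total / count))
    return ranges


def estimate_sum(ranges: Sequence[Range], lo: int, hi: int) -> float:
    """Estimate the total value over elements lo..hi (inclusive) as the sum
    of (range mean * number of elements in both the range and lo..hi)."""
    total = 0.0
    for r in ranges:
        last = r.start + r.count - 1
        overlap = min(hi, last) - max(lo, r.start) + 1
        if overlap > 0:
            total += r.mean * overlap
    return total


def fit_trapezoid(values: Sequence[float], left: float) -> Tuple[float, float]:
    """Given an assumed value `left` for the line at the first element, pick
    the value at the last element so that (left + right) / 2 * n equals the
    area of the range, i.e. mean * n."""
    mean = sum(values) / len(values)
    right = 2.0 * mean - left
    return left, right


def line_value(left: float, right: float, n: int, k: int) -> float:
    """Value of the straight line at the k-th element (0-based) of n."""
    if n == 1:
        return left
    return left + (right - left) * k / (n - 1)


if __name__ == "__main__":
    ranges = build_ranges([10, 11, 9, 30, 31, 50], tolerance=2)
    print(ranges)
    print(estimate_sum(ranges, 1, 4))
    left, right = fit_trapezoid([2, 4, 6, 8], left=2.0)
    print([line_value(left, right, 4, k) for k in range(4)])
```

Because the line is sampled at equally spaced elements, the sum of its values over the range is exactly (left + right) / 2 times the number of elements, which is why fixing one endpoint determines the other. A fuller treatment would also guard against a negative right endpoint and record an error bound for each range; both are left out of this sketch.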
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA2279359A CA2279359C (en) | 1999-07-30 | 1999-07-30 | A method of generating attribute cardinality maps |
CA2743462A CA2743462C (en) | 1999-07-30 | 1999-07-30 | A method of generating attribute cardinality maps |
US09/487,328 US6865567B1 (en) | 1999-07-30 | 2000-01-19 | Method of generating attribute cardinality maps |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA2279359A CA2279359C (en) | 1999-07-30 | 1999-07-30 | A method of generating attribute cardinality maps |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA2743462A Division CA2743462C (en) | 1999-07-30 | 1999-07-30 | A method of generating attribute cardinality maps |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2279359A1 (en) | 2001-01-30 |
CA2279359C (en) | 2012-10-23 |
Family
ID=4163900
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA2279359A Expired - Fee Related CA2279359C (en) | 1999-07-30 | 1999-07-30 | A method of generating attribute cardinality maps |
CA2743462A Expired - Fee Related CA2743462C (en) | 1999-07-30 | 1999-07-30 | A method of generating attribute cardinality maps |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA2743462A Expired - Fee Related CA2743462C (en) | 1999-07-30 | 1999-07-30 | A method of generating attribute cardinality maps |
Country Status (2)
Country | Link |
---|---|
US (1) | US6865567B1 (en) |
CA (2) | CA2279359C (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112837250A (en) * | 2021-01-27 | 2021-05-25 | 武汉华中数控股份有限公司 | Infrared image self-adaptive enhancement method based on generalized histogram equalization |
Families Citing this family (139)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6532458B1 (en) * | 1999-03-15 | 2003-03-11 | Microsoft Corporation | Sampling for database systems |
JP4563558B2 (en) * | 2000-07-31 | 2010-10-13 | 株式会社ターボデータラボラトリー | Data compiling method and storage medium storing compiling method |
US7085769B1 (en) * | 2001-04-26 | 2006-08-01 | Ncr Corporation | Method and apparatus for performing hash join |
US20030033582A1 (en) * | 2001-05-09 | 2003-02-13 | Wavemarket, Inc. | Representations for estimating distance |
US6732085B1 (en) * | 2001-05-31 | 2004-05-04 | Oracle International Corporation | Method and system for sample size determination for database optimizers |
US6907422B1 (en) * | 2001-12-18 | 2005-06-14 | Siebel Systems, Inc. | Method and system for access and display of data from large data sets |
JP2003203076A (en) * | 2001-12-28 | 2003-07-18 | Celestar Lexico-Sciences Inc | Knowledge searching device and method, program and recording medium |
GB0205000D0 (en) * | 2002-03-04 | 2002-04-17 | Isis Innovation | Unsupervised data segmentation |
US7174344B2 (en) * | 2002-05-10 | 2007-02-06 | Oracle International Corporation | Orthogonal partitioning clustering |
US7707145B2 (en) * | 2002-07-09 | 2010-04-27 | Gerald Mischke | Method for control, analysis and simulation of research, development, manufacturing and distribution processes |
US7047230B2 (en) * | 2002-09-09 | 2006-05-16 | Lucent Technologies Inc. | Distinct sampling system and a method of distinct sampling for optimizing distinct value query estimates |
US20040093413A1 (en) * | 2002-11-06 | 2004-05-13 | Bean Timothy E. | Selecting and managing time specified segments from a large continuous capture of network data |
US7031958B2 (en) * | 2003-02-06 | 2006-04-18 | International Business Machines Corporation | Patterned based query optimization |
WO2004086185A2 (en) * | 2003-03-19 | 2004-10-07 | Unisys Corporation | Rules-based deployment of computing components |
CA2427209A1 (en) * | 2003-04-30 | 2004-10-30 | Ibm Canada Limited - Ibm Canada Limitee | Optimization of queries on views defined by conditional expressions having mutually exclusive conditions |
US7299226B2 (en) * | 2003-06-19 | 2007-11-20 | Microsoft Corporation | Cardinality estimation of joins |
US7149735B2 (en) * | 2003-06-24 | 2006-12-12 | Microsoft Corporation | String predicate selectivity estimation |
US7155585B2 (en) * | 2003-08-01 | 2006-12-26 | Falconstor Software, Inc. | Method and system for synchronizing storage system data |
EP1510932A1 (en) * | 2003-08-27 | 2005-03-02 | Sap Ag | Computer implemented method and according computer program product for storing data sets in and retrieving data sets from a data storage system |
US7739263B2 (en) * | 2003-09-06 | 2010-06-15 | Oracle International Corporation | Global hints |
US7260563B1 (en) * | 2003-10-08 | 2007-08-21 | Ncr Corp. | Efficient costing for inclusion merge join |
US7167848B2 (en) * | 2003-11-07 | 2007-01-23 | Microsoft Corporation | Generating a hierarchical plain-text execution plan from a database query |
US20050223019A1 (en) * | 2004-03-31 | 2005-10-06 | Microsoft Corporation | Block-level sampling in statistics estimation |
US20060005121A1 (en) * | 2004-06-30 | 2006-01-05 | Microsoft Corporation | Discretization of dimension attributes using data mining techniques |
US7283990B2 (en) * | 2004-07-27 | 2007-10-16 | Xerox Corporation | Method and system for managing resources for multi-service jobs based on location |
US20060074875A1 (en) * | 2004-09-30 | 2006-04-06 | International Business Machines Corporation | Method and apparatus for predicting relative selectivity of database query conditions using respective cardinalities associated with different subsets of database records |
US8046354B2 (en) * | 2004-09-30 | 2011-10-25 | International Business Machines Corporation | Method and apparatus for re-evaluating execution strategy for a database query |
US20060116983A1 (en) * | 2004-11-30 | 2006-06-01 | International Business Machines Corporation | System and method for ordering query results |
US7359922B2 (en) * | 2004-12-22 | 2008-04-15 | Ianywhere Solutions, Inc. | Database system and methodology for generalized order optimization |
US7716162B2 (en) * | 2004-12-30 | 2010-05-11 | Google Inc. | Classification of ambiguous geographic references |
US8935273B2 (en) * | 2005-06-23 | 2015-01-13 | International Business Machines Corporation | Method of processing and decomposing a multidimensional query against a relational data source |
US7630977B2 (en) * | 2005-06-29 | 2009-12-08 | Xerox Corporation | Categorization including dependencies between different category systems |
CA2519021A1 (en) * | 2005-09-13 | 2007-03-13 | Cognos Incorporated | System and method of providing date, arithmetic, and other functions for olap sources |
CA2519001A1 (en) * | 2005-09-13 | 2007-03-13 | Cognos Incorporated | System and method of data agnostic business intelligence query |
US7570821B2 (en) * | 2005-11-02 | 2009-08-04 | Kitakyushu Foundation For The Advancement Of Industry, Science And Technology | Apparatus and method for image coding |
US7685098B2 (en) * | 2005-12-08 | 2010-03-23 | International Business Machines Corporation | Estimating the size of a join by generating and combining partial join estimates |
US7185004B1 (en) | 2005-12-09 | 2007-02-27 | International Business Machines Corporation | System and method for reverse routing materialized query tables in a database |
US8732138B2 (en) * | 2005-12-21 | 2014-05-20 | Sap Ag | Determination of database statistics using application logic |
US7565266B2 (en) * | 2006-02-14 | 2009-07-21 | Seagate Technology, Llc | Web-based system of product performance assessment and quality control using adaptive PDF fitting |
US20070208696A1 (en) * | 2006-03-03 | 2007-09-06 | Louis Burger | Evaluating materialized views in a database system |
US7873628B2 (en) * | 2006-03-23 | 2011-01-18 | Oracle International Corporation | Discovering functional dependencies by sampling relations |
US7478083B2 (en) * | 2006-04-03 | 2009-01-13 | International Business Machines Corporation | Method and system for estimating cardinality in a database system |
US7702699B2 (en) * | 2006-05-31 | 2010-04-20 | Oracle America, Inc. | Dynamic data stream histograms for large ranges |
US7395270B2 (en) * | 2006-06-26 | 2008-07-01 | International Business Machines Corporation | Classification-based method and apparatus for string selectivity estimation |
US7962499B2 (en) | 2006-08-18 | 2011-06-14 | Falconstor, Inc. | System and method for identifying and mitigating redundancies in stored data |
US8694524B1 (en) * | 2006-08-28 | 2014-04-08 | Teradata Us, Inc. | Parsing a query |
US20080065616A1 (en) * | 2006-09-13 | 2008-03-13 | Brown Abby H | Metadata integration tool, systems and methods for managing enterprise metadata for the runtime environment |
US20080195578A1 (en) * | 2007-02-09 | 2008-08-14 | Fabian Hueske | Automatically determining optimization frequencies of queries with parameter markers |
JP5063151B2 (en) * | 2007-03-19 | 2012-10-31 | 株式会社リコー | Information search system and information search method |
US20080288527A1 (en) * | 2007-05-16 | 2008-11-20 | Yahoo! Inc. | User interface for graphically representing groups of data |
US8122056B2 (en) | 2007-05-17 | 2012-02-21 | Yahoo! Inc. | Interactive aggregation of data on a scatter plot |
US7739229B2 (en) | 2007-05-22 | 2010-06-15 | Yahoo! Inc. | Exporting aggregated and un-aggregated data |
US7756900B2 (en) * | 2007-05-22 | 2010-07-13 | Yahoo!, Inc. | Visual interface to indicate custom binning of items |
US7668952B2 (en) * | 2007-08-27 | 2010-02-23 | International Business Machines Corporation | Apparatus, system, and method for controlling a processing system |
US7774336B2 (en) * | 2007-09-10 | 2010-08-10 | International Business Machines Corporation | Adaptively reordering joins during query execution |
JP5018487B2 (en) * | 2008-01-14 | 2012-09-05 | 富士通株式会社 | Multi-objective optimization design support apparatus, method, and program considering manufacturing variations |
JP5003499B2 (en) * | 2008-01-14 | 2012-08-15 | 富士通株式会社 | Multi-objective optimization design support apparatus, method, and program |
US20090182538A1 (en) * | 2008-01-14 | 2009-07-16 | Fujitsu Limited | Multi-objective optimum design support device using mathematical process technique, its method and program |
US7987177B2 (en) * | 2008-01-30 | 2011-07-26 | International Business Machines Corporation | Method for estimating the number of distinct values in a partitioned dataset |
US20090307187A1 (en) * | 2008-02-28 | 2009-12-10 | Amir Averbuch | Tree automata based methods for obtaining answers to queries of semi-structured data stored in a database environment |
US7987195B1 (en) | 2008-04-08 | 2011-07-26 | Google Inc. | Dynamic determination of location-identifying search phrases |
CN102084363B (en) * | 2008-07-03 | 2014-11-12 | 加利福尼亚大学董事会 | A method for efficiently supporting interactive, fuzzy search on structured data |
US9189523B2 (en) * | 2008-07-05 | 2015-11-17 | Hewlett-Packard Development Company, L.P. | Predicting performance of multiple queries executing in a database |
US9910892B2 (en) | 2008-07-05 | 2018-03-06 | Hewlett Packard Enterprise Development Lp | Managing execution of database queries |
US8473327B2 (en) * | 2008-10-21 | 2013-06-25 | International Business Machines Corporation | Target marketing method and system |
US8195496B2 (en) * | 2008-11-26 | 2012-06-05 | Sap Aktiengesellschaft | Combining multiple objective functions in algorithmic problem solving |
US8214352B2 (en) * | 2008-11-26 | 2012-07-03 | Hewlett-Packard Development Company | Modular query optimizer |
JP5163472B2 (en) * | 2008-12-17 | 2013-03-13 | 富士通株式会社 | Design support apparatus, method, and program for dividing and modeling parameter space |
US20100287015A1 (en) * | 2009-05-11 | 2010-11-11 | Grace Au | Method for determining the cost of evaluating conditions |
US8117224B2 (en) * | 2009-06-23 | 2012-02-14 | International Business Machines Corporation | Accuracy measurement of database search algorithms |
US20110184934A1 (en) * | 2010-01-28 | 2011-07-28 | Choudur Lakshminarayan | Wavelet compression with bootstrap sampling |
US8812484B2 (en) | 2010-03-30 | 2014-08-19 | Hewlett-Packard Development Company, L.P. | System and method for outer joins on a parallel database management system |
US8650218B2 (en) * | 2010-05-20 | 2014-02-11 | International Business Machines Corporation | Dynamic self configuring overlays |
US9785904B2 (en) * | 2010-05-25 | 2017-10-10 | Accenture Global Services Limited | Methods and systems for demonstrating and applying productivity gains |
US8356027B2 (en) * | 2010-10-07 | 2013-01-15 | Sap Ag | Hybrid query execution plan generation and cost model evaluation |
US20120136879A1 (en) * | 2010-11-29 | 2012-05-31 | Eric Williamson | Systems and methods for filtering interpolated input data based on user-supplied or other approximation constraints |
US8229917B1 (en) * | 2011-02-24 | 2012-07-24 | International Business Machines Corporation | Database query optimization using clustering data mining |
US9208462B2 (en) * | 2011-12-21 | 2015-12-08 | Mu Sigma Business Solutions Pvt. Ltd. | System and method for generating a marketing-mix solution |
US20130212085A1 (en) * | 2012-02-09 | 2013-08-15 | Ianywhere Solutions, Inc. | Parallelizing Query Optimization |
US9087361B2 (en) | 2012-06-06 | 2015-07-21 | Addepar, Inc. | Graph traversal for generating table views |
US9015073B2 (en) * | 2012-06-06 | 2015-04-21 | Addepar, Inc. | Controlled creation of reports from table views |
US9411853B1 (en) * | 2012-08-03 | 2016-08-09 | Healthstudio, LLC | In-memory aggregation system and method of multidimensional data processing for enhancing speed and scalability |
US9208198B2 (en) | 2012-10-17 | 2015-12-08 | International Business Machines Corporation | Technique for factoring uncertainty into cost-based query optimization |
US8972378B2 (en) * | 2012-10-22 | 2015-03-03 | Microsoft Corporation | Formulating global statistics for distributed databases |
GB2508223A (en) * | 2012-11-26 | 2014-05-28 | Ibm | Estimating the size of a joined table in a database |
US9465826B2 (en) * | 2012-11-27 | 2016-10-11 | Hewlett Packard Enterprise Development Lp | Estimating unique entry counts using a counting bloom filter |
GB2508603A (en) | 2012-12-04 | 2014-06-11 | Ibm | Optimizing the order of execution of multiple join operations |
US9105062B2 (en) | 2012-12-13 | 2015-08-11 | Addepar, Inc. | Transaction effects |
US9135300B1 (en) * | 2012-12-20 | 2015-09-15 | Emc Corporation | Efficient sampling with replacement |
US10642918B2 (en) * | 2013-03-15 | 2020-05-05 | University Of Florida Research Foundation, Incorporated | Efficient publish/subscribe systems |
US9870415B2 (en) * | 2013-09-18 | 2018-01-16 | Quintiles Ims Incorporated | System and method for fast query response |
US9785645B1 (en) * | 2013-09-24 | 2017-10-10 | EMC IP Holding Company LLC | Database migration management |
US9361339B2 (en) | 2013-11-26 | 2016-06-07 | Sap Se | Methods and systems for constructing q, θ-optimal histogram buckets |
US10223410B2 (en) * | 2014-01-06 | 2019-03-05 | Cisco Technology, Inc. | Method and system for acquisition, normalization, matching, and enrichment of data |
US9953074B2 (en) * | 2014-01-31 | 2018-04-24 | Sap Se | Safe synchronization of parallel data operator trees |
US9842152B2 (en) * | 2014-02-19 | 2017-12-12 | Snowflake Computing, Inc. | Transparent discovery of semi-structured data schema |
US9792328B2 (en) | 2014-03-13 | 2017-10-17 | Sybase, Inc. | Splitting of a join operation to allow parallelization |
US9836505B2 (en) | 2014-03-13 | 2017-12-05 | Sybase, Inc. | Star and snowflake join query performance |
GB201409214D0 (en) * | 2014-05-23 | 2014-07-09 | Ibm | A method and system for processing a data set |
US10007644B2 (en) * | 2014-06-17 | 2018-06-26 | Sap Se | Data analytic consistency of visual discoveries in sample datasets |
US10671917B1 (en) | 2014-07-23 | 2020-06-02 | Hrl Laboratories, Llc | System for mapping extracted Neural activity into Neuroceptual graphs |
US10360506B2 (en) | 2014-07-23 | 2019-07-23 | Hrl Laboratories, Llc | General formal concept analysis (FCA) framework for classification |
US10740331B2 (en) * | 2014-08-07 | 2020-08-11 | Coupang Corp. | Query execution apparatus, method, and system for processing data, query containing a composite primitive |
US9424333B1 (en) | 2014-09-05 | 2016-08-23 | Addepar, Inc. | Systems and user interfaces for dynamic and interactive report generation and editing based on automatic traversal of complex data structures |
US9244899B1 (en) | 2014-10-03 | 2016-01-26 | Addepar, Inc. | Systems and user interfaces for dynamic and interactive table generation and editing based on automatic traversal of complex data structures including time varying attributes |
US9218502B1 (en) | 2014-10-17 | 2015-12-22 | Addepar, Inc. | System and architecture for electronic permissions and security policies for resources in a data system |
US10409835B2 (en) * | 2014-11-28 | 2019-09-10 | Microsoft Technology Licensing, Llc | Efficient data manipulation support |
US9798775B2 (en) * | 2015-01-16 | 2017-10-24 | International Business Machines Corporation | Database statistical histogram forecasting |
US10318866B2 (en) * | 2015-03-05 | 2019-06-11 | International Business Machines Corporation | Selectivity estimation using artificial neural networks |
US9875087B2 (en) * | 2015-04-10 | 2018-01-23 | Oracle International Corporation | Declarative program engine for large-scale program analysis |
CN107710239A (en) * | 2015-07-23 | 2018-02-16 | 赫尔实验室有限公司 | Parzen window feature selection algorithm for formal concept analysis (FCA) |
US10204135B2 (en) | 2015-07-29 | 2019-02-12 | Oracle International Corporation | Materializing expressions within in-memory virtual column units to accelerate analytic queries |
US10366083B2 (en) * | 2015-07-29 | 2019-07-30 | Oracle International Corporation | Materializing internal computations in-memory to improve query performance |
US11443390B1 (en) | 2015-11-06 | 2022-09-13 | Addepar, Inc. | Systems and user interfaces for dynamic and interactive table generation and editing based on automatic traversal of complex data structures and incorporation of metadata mapped to the complex data structures |
US10732810B1 (en) | 2015-11-06 | 2020-08-04 | Addepar, Inc. | Systems and user interfaces for dynamic and interactive table generation and editing based on automatic traversal of complex data structures including summary data such as time series data |
US10372807B1 (en) | 2015-11-11 | 2019-08-06 | Addepar, Inc. | Systems and user interfaces for dynamic and interactive table generation and editing based on automatic traversal of complex data structures in a distributed system architecture |
US9747338B2 (en) | 2015-12-16 | 2017-08-29 | International Business Machines Corporation | Runtime optimization for multi-index access |
US10452656B2 (en) * | 2016-03-31 | 2019-10-22 | Sap Se | Deep filter propagation using explicit dependency and equivalency declarations in a data model |
WO2017175433A1 (en) * | 2016-04-06 | 2017-10-12 | 三菱電機株式会社 | Map data generation system and method for generating map data |
US10706354B2 (en) | 2016-05-06 | 2020-07-07 | International Business Machines Corporation | Estimating cardinality selectivity utilizing artificial neural networks |
US10162859B2 (en) * | 2016-10-31 | 2018-12-25 | International Business Machines Corporation | Delayable query |
US10642832B1 (en) | 2016-11-06 | 2020-05-05 | Tableau Software, Inc. | Reducing the domain of a subquery by retrieving constraints from the outer query |
US11055331B1 (en) | 2016-11-06 | 2021-07-06 | Tableau Software, Inc. | Adaptive interpretation and compilation of database queries |
US10565286B2 (en) * | 2016-12-28 | 2020-02-18 | Sap Se | Constructing join histograms from histograms with Q-error guarantees |
US10346398B2 (en) | 2017-03-07 | 2019-07-09 | International Business Machines Corporation | Grouping in analytical databases |
US10949438B2 (en) | 2017-03-08 | 2021-03-16 | Microsoft Technology Licensing, Llc | Database query for histograms |
WO2019147201A2 (en) * | 2017-07-26 | 2019-08-01 | Istanbul Sehir Universitesi | Method of estimation for the result cluster of the inquiry realized for searching string in database |
US10977240B1 (en) | 2017-10-21 | 2021-04-13 | Palantir Technologies Inc. | Approaches for validating data |
US11422983B2 (en) | 2017-12-13 | 2022-08-23 | Paypal, Inc. | Merging data based on proximity and validation |
US11080276B2 (en) * | 2018-02-23 | 2021-08-03 | Sap Se | Optimal ranges for relational query execution plans |
US11226955B2 (en) | 2018-06-28 | 2022-01-18 | Oracle International Corporation | Techniques for enabling and integrating in-memory semi-structured data and text document searches with in-memory columnar query processing |
US11153400B1 (en) | 2019-06-04 | 2021-10-19 | Thomas Layne Bascom | Federation broker system and method for coordinating discovery, interoperability, connections and correspondence among networked resources |
CN111159316B (en) * | 2020-02-14 | 2023-03-14 | 北京百度网讯科技有限公司 | Relational database query method, device, electronic equipment and storage medium |
US11392572B2 (en) * | 2020-03-02 | 2022-07-19 | Sap Se | Selectivity estimation using non-qualifying tuples |
US11455302B2 (en) * | 2020-05-15 | 2022-09-27 | Microsoft Technology Licensing, Llc | Distributed histogram computation framework using data stream sketches and samples |
CN111784246B (en) * | 2020-07-01 | 2023-04-07 | 深圳市检验检疫科学研究院 | Logistics path estimation method |
US11782918B2 (en) * | 2020-12-11 | 2023-10-10 | International Business Machines Corporation | Selecting access flow path in complex queries |
CN113220806B (en) * | 2021-03-15 | 2022-03-18 | 中山大学 | Large-scale road network direction judgment method and system based on derivative parallel line segments |
US11928128B2 (en) * | 2022-05-12 | 2024-03-12 | Truist Bank | Construction of a meta-database from autonomously scanned disparate and heterogeneous sources |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US588968A (en) * | | 1897-08-31 | | Vehicle-brake |
US4950894A (en) * | 1985-01-25 | 1990-08-21 | Fuji Photo Film Co., Ltd. | Radiation image read-out method |
US5167228A (en) * | 1987-06-26 | 1992-12-01 | Brigham And Women's Hospital | Assessment and modification of endogenous circadian phase and amplitude |
US4956774A (en) * | 1988-09-02 | 1990-09-11 | International Business Machines Corporation | Data base optimizer using most frequency values statistics |
US5883968A (en) * | 1994-07-05 | 1999-03-16 | Aw Computer Systems, Inc. | System and methods for preventing fraud in retail environments, including the detection of empty and non-empty shopping carts |
US5950185A (en) * | 1996-05-20 | 1999-09-07 | Lucent Technologies Inc. | Apparatus and method for approximating frequency moments |
US5987468A (en) * | 1997-12-12 | 1999-11-16 | Hitachi America Ltd. | Structure and method for efficient parallel high-dimensional similarity join |
US6278989B1 (en) * | 1998-08-25 | 2001-08-21 | Microsoft Corporation | Histogram construction using adaptive random sampling with cross-validation for database systems |
US6146830A (en) * | 1998-09-23 | 2000-11-14 | Rosetta Inpharmatics, Inc. | Method for determining the presence of a number of primary targets of a drug |
US6401088B1 (en) * | 1999-02-09 | 2002-06-04 | At&T Corp. | Method and apparatus for substring selectivity estimation |
- 1999-07-30 CA CA2279359A, patent CA2279359C, not active (Expired - Fee Related)
- 1999-07-30 CA CA2743462A, patent CA2743462C, not active (Expired - Fee Related)
- 2000-01-19 US US09/487,328, patent US6865567B1, not active (Expired - Lifetime)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112837250A (en) * | 2021-01-27 | 2021-05-25 | 武汉华中数控股份有限公司 | Infrared image self-adaptive enhancement method based on generalized histogram equalization |
CN112837250B (en) * | 2021-01-27 | 2023-03-10 | 武汉华中数控股份有限公司 | Infrared image self-adaptive enhancement method based on generalized histogram equalization |
Also Published As
Publication number | Publication date |
---|---|
US6865567B1 (en) | 2005-03-08 |
CA2743462A1 (en) | 2001-01-30 |
CA2743462C (en) | 2012-10-16 |
CA2279359C (en) | 2012-10-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA2279359A1 (en) | A method of generating attribute cardinality maps | |
WO2021189729A1 (en) | Information analysis method, apparatus and device for complex relationship network, and storage medium | |
JP4875200B2 (en) | Method for searching for an object appearing in an image, apparatus therefor, computer program, computer system, and computer-readable storage medium | |
TW455794B (en) | System and method for detecting clusters of information | |
Hu et al. | Distance indexing on road networks | |
US8768893B2 (en) | Identifying computer users having files with common attributes | |
CN105843956A (en) | Paging query method and system | |
CA2523128A1 (en) | Information retrieval and text mining using distributed latent semantic indexing | |
CN102737123B (en) | A multidimensional data distribution method | |
CN102169491B (en) | Dynamic detection method for multi-data concentrated and repeated records | |
US5995970A (en) | Method and apparatus for geographic coordinate data storage | |
Kriegel et al. | Similarity search in structured data | |
CN103473268B (en) | Linear element spatial index structuring method, system and search method and system thereof | |
US20140074538A1 (en) | Stack handling operation method, system, and computer program | |
CN105279524A (en) | High-dimensional data clustering method based on unweighted hypergraph segmentation | |
CN112434031A (en) | Uncertain high-utility mode mining method based on information entropy | |
Wicaksono et al. | The comparison of apriori algorithm with preprocessing and FP-growth algorithm for finding frequent data pattern in association rule | |
CN101140583A (en) | Text searching method and device | |
CN106484782B (en) | A large-scale medical image retrieval method based on multi-kernel hash learning | |
KR101030250B1 (en) | Data processing method and data processing program | |
JP4440246B2 (en) | Spatial index method | |
JP3938815B2 (en) | Node creation method, image search method, and recording medium | |
CN104715002A (en) | Space division method and space division device | |
KR101319647B1 (en) | Method for providing scalable collaborative filtering framework with overlapped information | |
Low et al. | Colour-based relevance feedback for image retrieval |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
EEER | Examination request | ||
MKLA | Lapsed | Effective date: 20170731 |