US 6606579 B1

Abstract

A method of combining spectral data with non-spectral data which uses a defined distance measure of likeness (DML) value and conditional probabilities. The method includes the steps of collecting the spectral and non-spectral data for the produce item, determining DML values between the spectral and the non-spectral data and reference produce data for a plurality of types of produce items, determining conditional probability densities for all of the types of produce items using the DML values, combining the conditional probability densities to form a combined conditional probability density, and determining probabilities for the types of produce items from the combined conditional probability density.
Claims (12)

1. A method of identifying a produce item comprising the steps of:
(a) collecting produce data from the produce item;
(b) determining Distance Measure of Likeness (DML) values between the produce data and reference produce data for a plurality of types of produce items;
(c) determining conditional probability densities for all of the types of produce items using the DML values;
(d) combining the conditional probability densities together to form a combined conditional probability density;
(e) determining probabilities for the types of produce items from the combined conditional probability density;
(f) determining a number of candidate identifications from the probabilities; and
(g) identifying the produce item from the number of candidate identifications.
2. The method as recited in claim 1, wherein step (e) comprises the substeps of:
(e-1) determining an a priori probability for the types of produce items; and
(e-2) determining the probabilities for the types of produce items from the combined conditional probability density and the a priori probability.
3. The method as recited in claim 1, wherein step (g) comprises the substeps of:
(g-1) displaying the number of candidate identifications; and
(g-2) recording an operator selection of one of the number of candidate identifications.
4. The method as recited in claim 1, wherein step (a) comprises the step of:
collecting spectral data.
5. The method as recited in claim 1, wherein step (a) comprises the step of:
collecting non-spectral data.
6. The method as recited in claim 1, wherein step (a) comprises the steps of:
collecting spectral data from a spectrometer; and
collecting non-spectral data.
7. A method of identifying a produce item comprising the steps of:
(a) collecting produce data from the produce item;
(b) determining Distance Measure of Likeness (DML) values between the produce data and reference produce data for a plurality of types of produce items;
(c) determining conditional probability densities for all of the types of produce items using the DML values;
(d) combining the conditional probability densities together to form a combined conditional probability density;
(e) determining an a priori probability for the types of produce items;
(f) determining probabilities for the types of produce items from the combined conditional probability density and the a priori probability;
(g) determining a number of candidate identifications from the probabilities;
(h) displaying the number of candidate identifications; and
(i) recording an operator selection of one of the number of candidate identifications.
8. A method of combining spectral data with non-spectral data for recognizing a produce item comprising the steps of:
(a) collecting the spectral and non-spectral data for the produce item;
(b) determining Distance Measure of Likeness (DML) values between the spectral and the non-spectral data and reference produce data for a plurality of types of produce items;
(c) determining conditional probability densities for all of the types of produce items using the DML values;
(d) combining the conditional probability densities to form a combined conditional probability density; and
(e) determining probabilities for the types of produce items from the combined conditional probability density.
9. The method as recited in claim 8, wherein step (e) comprises the substeps of:
(e-1) determining an a priori probability for the types of produce items; and
(e-2) determining the probabilities for the types of produce items from the combined conditional probability density and the a priori probability.
10. A produce recognition system comprising:
a number of sources of produce data for a produce item; and
a computer system which determines Distance Measure of Likeness (DML) values between the produce data and reference produce data for a plurality of types of produce items, determines conditional probability densities for all of the types of produce items using the DML values, combines the conditional probability densities together to form a combined conditional probability density, determines probabilities for the types of produce items from the combined conditional probability density, determines a number of candidate identifications from the probabilities, and identifies the produce item from the number of candidate identifications.
11. A produce recognition system comprising:
a number of sources of produce data for a produce item; and
a computer system which determines Distance Measure of Likeness (DML) values between the produce data and reference produce data for a plurality of types of produce items, determines conditional probability densities for all of the types of produce items using the DML values, combines the conditional probability densities together to form a combined conditional probability density, determines an a priori probability for the types of produce items, determines probabilities for the types of produce items from the combined conditional probability density and the a priori probability, determines a number of candidate identifications from the probabilities, displays the number of candidate identifications, and records an operator selection of one of the number of candidate identifications.
12. The system as recited in
Description

“A Produce Data Collector And A Produce Recognition System”, filed Nov. 10, 1998, invented by Gu, and having U.S. Ser. No. 09/189,783, now U.S. Pat. No. 6,332,573; “System and Method of Recognizing Produce Items Using Probabilities Derived from Supplemental Information”, filed Jul. 10, 2000, invented by Kerchner, and having U.S. Ser. No. 09/612,682; “Method of Recognizing Produce Items Using Checkout Frequency”, filed Aug. 16, 2000, invented by Gu, and having U.S. Ser. No. 09/640,032, now issued as U.S. Pat. No. 6,409,085; and “Produce Texture Data Collecting Apparatus and Method”, filed Aug. 16, 2000, invented by Gu, and having U.S. Ser. No. 09/640,0254, pending.

The present invention relates to product checkout devices and more specifically to a method of combining spectral data with non-spectral data in a produce recognition system. Bar code readers are well known for their usefulness in retail checkout and inventory control. Bar code readers are capable of identifying and recording most items during a typical transaction since most items are labeled with bar codes. Items which are typically not identified and recorded by a bar code reader are produce items, since produce items are typically not labeled with bar codes. Bar code readers may include a scale for weighing produce items to assist in determining the price of such items. But identification of produce items is still a task for the checkout operator, who must identify a produce item and then manually enter an item identification code. Operator identification methods are slow and inefficient because they typically involve a visual comparison of a produce item with pictures of produce items, or a lookup of text in a table. Operator identification methods are also prone to error, on the order of fifteen percent. A produce recognition system is disclosed in the cited co-pending application, U.S. Ser. No. 09/189,783, now U.S. Pat. No. 6,332,573.
A produce item is placed over a window in a produce data collector, the produce item is illuminated, and the spectrum of the diffuse reflected light from the produce item is measured. A terminal compares the spectrum to reference spectra in a library. The terminal determines candidate produce items and corresponding confidence levels and chooses the candidate with the highest confidence level. The terminal may additionally display the candidates for operator verification and selection. Increases in speed and accuracy are important in a checkout environment. It would be desirable to improve the speed and accuracy of the produce recognition process by supplementing spectral data with additional information helpful to recognition. Types of data which could potentially be used to improve identification include texture data, size and shape data, weight and density data, and brightness data. Since each data type describes a different physical attribute of an object, combining them mathematically is difficult and non-trivial. Specifically, the spectral data may consist of dozens of variables, each corresponding to a single color band, while weight and brightness, for example, may each be represented by a single variable. Therefore, it would be desirable to provide a method of combining spectral data with non-spectral data in a produce recognition system. In accordance with the teachings of the present invention, a method of combining spectral data with non-spectral data in a produce recognition system is provided. 
A method is presented for using a defined distance measure of likeness (DML) algorithm and Bayes Rule to compute the probability of an unknown object being of a given class C. The method includes the steps of collecting the spectral and non-spectral data for the produce item, determining DML values between the spectral and the non-spectral data and reference produce data for a plurality of types of produce items, determining conditional probability densities for all of the types of produce items using the DML values, combining the conditional probability densities to form a combined conditional probability density, and determining probabilities for the types of produce items from the combined conditional probability density. It is accordingly an object of the present invention to provide a method of combining spectral data with non-spectral data in a produce recognition system. It is another object of the present invention to improve the speed and accuracy of produce recognition. It is another object of the present invention to provide a produce recognition system and method. It is another object of the present invention to provide a produce recognition system and method which combines spectral data with non-spectral data. It is another object of the present invention to provide a produce recognition system and method which combines spectral data with non-spectral data using a distance measure of likeness (DML) value. It is another object of the present invention to provide a produce recognition system and method which combines spectral data with non-spectral data and which identifies produce items by sorting the distance measure of likeness (DML) values in ascending order and choosing the item with the smallest distance as the most likely identification.
Additional benefits and advantages of the present invention will become apparent to those skilled in the art to which this invention relates from the subsequent description of the preferred embodiments and the appended claims, taken in conjunction with the accompanying drawings, in which: FIG. 1 is a block diagram of a transaction processing system including a produce recognition system; FIG. 2 is a block diagram of a type of spectral data collector; FIG. 3 is a detailed view of the spectral data collector of FIG. 2; FIG. 4 is an illustration of a probability density distribution of random samples on a two-dimensional plane; FIG. 5 is an illustration of symmetric two-dimensional probability density distributions for two classes; FIG. 6 is an illustration of asymmetric two-dimensional probability density distributions for two classes of produce items; FIG. 7 is a flow diagram illustrating the recognition method of the present invention; FIG. 8 is a flow diagram illustrating spectral data reduction procedures; and FIG. 9 is a flow diagram illustrating non-spectral data reduction procedures for a particular type of non-spectral data collector. Referring now to FIG. 1, the transaction processing system includes a bar code data collector, a spectral data collector, and a non-spectral data collector. Non-spectral data may also include a priori or supplemental probabilities, such as those derived from customer shopping histories. Shopping histories may be collected as part of a customer loyalty program.
A database stores a classification library containing reference produce data for the classes of produce items. During a transaction, the spectral data collector and the non-spectral data collector capture data from the produce item, and a scale may record its weight. In the case of bar coded items, the transaction terminal records the item from its bar code; in the case of non-bar coded produce items, the transaction terminal runs the produce recognition software. Each sample of spectral and non-spectral data represents an instance to be processed by the produce recognition software. Turning now to FIGS. 2 and 3, an example spectral data collector is shown. A light source illuminates the produce item through a transparent window, and a spectrometer, including a light separating element and a photodetector array, measures the spectrum of the diffuse reflected light. A plurality of different-colored LEDs having different non-overlapping wavelength ranges may be employed, but may provide less than desirable collector performance if gaps exist in the overall spectral distribution. The illustrated embodiment includes sixteen white LEDs arranged in four groups. In operation, an operator places the produce item over the window and data collection begins. The classification library associates each class number with one or more PLU numbers. If a class number is associated with more than one PLU number, the additional PLU numbers represent a produce item which is similar except for different non-classifiable features. These features include cultivation methods (regular versus organic), ripening methods (regular versus vine-ripened or tree-ripened), seeded versus seedless, etc.
Since non-classifiable features are normally not discernible by visually examining produce items, their identification must be accomplished using additional means, such as stickers, color-coded rubber bands, etc. There are two ways to process produce items with non-distinguishable features. In the first method, a class with multiple PLU numbers is expanded into multiple choices when presented to the operator in the user interface, and the operator identifies the produce item directly. A second method involves additional operator input: after the operator selects the correct class, the produce recognition software obtains the additional operator input needed to resolve the PLU number. In either case, the produce recognition software determines the final PLU number. The DML algorithm allows the projection of any data type into a one-dimensional space, thus simplifying the multivariate conditional probability density function into a univariate function. While the sum of squared difference (SSD) is the simplest measure of distance between an unknown instance and instances of known items, the distance between an unknown instance and a class of instances is most relevant to the identification of unknown instances. A distance measure of likeness (DML) value provides a distance between an unknown instance and a class, with the smallest DML value yielding the most likely candidate. In more detail, each instance is a point in N-dimensional space, where N is the number of parameters used to describe the instance. The distance between two points P1 and P2 can be measured by the Euclidean distance

d(P1, P2) = [Σ_i (x1i − x2i)²]^(1/2).

In reality, there are always measurement errors due to instrumental noise and other factors. No two items of the same class are identical, and for the same item, the color and appearance change over its surface area. The variations of orientation and distance of the produce item introduce further differences between instances. In a supermarket, a large number of instance points are measured from all the items of a class.
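The Euclidean distance between instance points described above can be sketched as follows; a minimal illustration in which the two 3-parameter instances are hypothetical, not taken from the patent:

```python
import math

def euclidean_distance(p1, p2):
    """Distance between two instance points in N-dimensional space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

# Two hypothetical instances, each described by 3 parameters.
p1 = [0.40, 0.55, 0.20]
p2 = [0.43, 0.50, 0.25]
d = euclidean_distance(p1, p2)
```

The sum of squared differences (SSD) mentioned in the text is simply the square of this distance.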
There are enough instances from all items for all instance points to be spread in a practically definable volume in the N-dimensional space, and for the shape and size of this volume to completely characterize the appearances of all the items of the class. The shape of this volume may be regular, like a ball in three dimensions, or it may be quite irregular, like a dumbbell in three dimensions. Now if the unknown instance P happens to be in the volume of a particular class, then it is likely to be identifiable as an item of the class. There is no certainty, however, if the volumes of other classes overlap the same region. A class is not only best described in N-dimensional space, but also is best described statistically, i.e., each instance is a random event, and a class is a probability density distribution in a certain volume in N-dimensional space. As an example, consider randomly sampling items from a large number of items within the class “Garden Tomatoes”. The items in this class have relatively well defined color and appearance: they are all red, but there are slight color variations from item to item, and even from side to side of the same item. Compared to other classes, however, these variations are small. It is difficult to imagine, much less to illustrate, the relative positions and overlapping of classes in higher dimensions, so two-dimensional examples are used. A first ideal example in two-dimensional space is shown in FIG. 5. An unknown instance P happens to be in the overlapping area of two classes, C1 and C2. Relative to the respective distance scales, instance P is closer to the typical instance of one class than to that of the other. A second example, with asymmetric distributions, is shown in FIG. 6. Although the relative positions of P and the class centers are unchanged, the asymmetry of the distributions changes which class is the more likely identification. A generalized distance measure for symmetric and asymmetric distributions in two-dimensional space is herein defined.
This distance measure is a Distance Measure of Likeness (DML) for an unknown instance P(x, y) relative to a class C with typical instance P0(x0, y0):

D(P, C) = [((x − x0)/sx)² + ((y − y0)/sy)²]^(1/2),

where sx and sy are the distance scales of the class along the two dimensions. The DML definition is extended to N-dimensional space:

D(P, C) = [Σ_i ((xi − x0i)/si)²]^(1/2),   (10)

where P(x1, …, xN) is the unknown instance. Before a DML value may be calculated, the typical instance and the related distance scales must be determined. If each class has a relatively well-defined color and the instance-to-instance variations are mostly random, then the typical instance is well approximated by the average instance:

x0i = (1/M) Σ_m xi(m),

where the average runs over the M reference instances of each class in the library.
Each instance point P is a vector of parameter values. Thus, the distance scale for the i-th dimension can be defined as the root-mean-square (rms) deviation of the i-th parameter:

si = [(1/M) Σ_m (xi(m) − x0i)²]^(1/2).

The conditional probability density function of the spectral data for a given class (containing classifiable items) can be modeled and computed using the DML distance value. Captured spectral data is discrete data defined by many small wavelength bands. A spectrometer may record color information in dozens or even hundreds of wavelength bands. However, since diffuse reflection has a continuous and relatively smooth spectrum, about sixty equally-spaced wavelength bands in the 400-700 nm range may be adequate. The optimal number of wavelength bands depends on the application requirements and the actual resolution of the spectrometer. Assuming that the spectral variation of the diffuse reflection from a given class of objects is due to intrinsic color variation and some relatively small measurement error, then for a given class the DML value provides a distance measure in an N-dimensional space. If we model the conditional probability density with the multivariate normal density function, then due to the definition of DML we have

p(x | C) ∝ exp(−D²/2),   (11)

where D is the DML distance of instance x relative to class C. This model is valid if all spectral components are statistically independent. This may not be true if the intrinsic color variation within the class is the dominant component: since the spectral curve is smooth and continuous, the variation of neighboring wavelength bands will most likely be somewhat correlated. A more general probability density may be established as a univariate function of the DML distance D.
For example, it could be established from the histogram (in D) of the reference instances of the class. While the above discussion is based on continuum spectral data, the DML algorithm and equations (10) and (11) can be applied to any other multivariate data type. Of course, they are also applicable to univariate cases. While the variation of different spectral components of a given class may be correlated, it is a very good assumption that the variations of data from different sources (i.e., derived from different physical attributes) are independent of each other. For a produce recognition system, relevant non-spectral data types may include texture data, size and shape data, weight and density data, and brightness data captured from the produce item. Other important non-spectral pieces of information do not involve the physical properties of the object. For example, the check-out frequency of a given produce item can be used as a priori information for recognition. Other a priori probabilities include time of year, geographical region, total sales, and combinations thereof. Data to support calculation of these probabilities may be derived from customer shopping information. These probability values can then be used to rank the classes and to determine a subset of the most likely classes. For any given type of data, if the data can be reduced to a single parameter, the conditional probability density is naturally a univariate function. If the data involves two or more variables, it can be reduced to a univariate function by using the DML distance, as defined in equation (10). In more detail, assume there are N data types.
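The per-class statistics and the DML computation described above can be sketched as follows; a minimal illustration in which the reference instances are hypothetical, the typical instance is the per-parameter mean, the distance scales are the per-parameter rms deviations, and the density is modeled, per the multivariate normal assumption in the text, as exp(−D²/2) up to normalization:

```python
import math

def class_statistics(instances):
    """Typical instance (per-parameter mean) and distance scales (rms deviations)
    computed from a class's reference instances."""
    n = len(instances[0])
    m = len(instances)
    mean = [sum(inst[i] for inst in instances) / m for i in range(n)]
    scale = [math.sqrt(sum((inst[i] - mean[i]) ** 2 for inst in instances) / m)
             for i in range(n)]  # scales must be nonzero for the DML below
    return mean, scale

def dml(p, mean, scale):
    """Distance Measure of Likeness of instance p relative to a class."""
    return math.sqrt(sum(((x - mu) / s) ** 2
                         for x, mu, s in zip(p, mean, scale)))

def conditional_density(d):
    """Unnormalized univariate conditional density, exp(-D^2 / 2)."""
    return math.exp(-d * d / 2.0)

# Hypothetical reference instances for one class (3 parameters each).
refs = [[0.40, 0.55, 0.20], [0.44, 0.53, 0.22], [0.42, 0.57, 0.18]]
mean, scale = class_statistics(refs)
d = dml([0.43, 0.54, 0.21], mean, scale)
p = conditional_density(d)
```

An instance equal to the typical instance has DML distance zero and maximal density.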
The univariate conditional probability density for each data type can be written as p_j(D_j | C), where D_j is the DML distance for the j-th data type. Under the assumption that the variables from different data types are statistically independent of each other, the combined conditional probability density function can be defined as the product of all the individual density functions:

p(x | C) = Π_j p_j(D_j | C).

By applying equation (11), we have

p(x | C) ∝ exp(−(1/2) Σ_j D_j²).   (14)
Equation (14) is called the state-conditional probability density or class-conditional probability density. When the natural frequency (or probability of occurrence) P(C_k) of each class is available, the a posteriori probability for a given observation can be computed using Bayes Rule:

P(C_j | x) = Q · p(x | C_j) P(C_j),   (17)

with

Q = 1 / Σ_k p(x | C_k) P(C_k),   (18)

where the sum runs over all N_c classes. The class with the highest probability as computed with equation (17) is the most likely class. For many decision theory applications, a unique identification is required, in which case choosing the class with the highest a posteriori probability will statistically guarantee the lowest error rate. However, for applications in which the error rate would be too high if only a single choice were allowed, the a posteriori probabilities should instead be used to provide a ranked list of possible identifications. There are two scenarios for the application of the a posteriori probabilities: 1. Provide a ranked list of all classes. In this case, only the relative value of the probability matters and the constant factor Q in equation (18) can be ignored. 2. Provide a truncated list of the most likely classes. A ranked list of all N_c classes is first computed;
if we set the minimum probability as P_min, the ranked list is cut off at the last class whose a posteriori probability is at least P_min; thus the truncated rank-list of choices is given by the first k classes. Turning now to FIG. 7, the produce recognition method of the present invention proceeds through the collection, DML, probability, and identification steps summarized above, beginning with START. Turning to FIG. 8, a data reduction method for the spectral data used to build the library is illustrated.
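The combination, Bayes Rule, and truncation steps above can be sketched end to end; a minimal illustration in which the class names, DML values, and a priori probabilities are hypothetical, not taken from the patent:

```python
import math

def combined_density(dml_values):
    """Combined conditional density over data types: the product of
    exp(-D_j^2/2) terms, i.e. exp of minus half the sum of squared DMLs."""
    return math.exp(-0.5 * sum(d * d for d in dml_values))

def posteriors(class_densities, priors):
    """A posteriori probabilities via Bayes Rule: Q * p(x|C_k) * P(C_k)."""
    weighted = {c: class_densities[c] * priors[c] for c in class_densities}
    q = 1.0 / sum(weighted.values())
    return {c: q * w for c, w in weighted.items()}

def truncated_ranking(post, p_min):
    """Ranked candidate list, truncated at a minimum probability P_min."""
    ranked = sorted(post.items(), key=lambda kv: kv[1], reverse=True)
    return [(c, p) for c, p in ranked if p >= p_min]

# Hypothetical per-class DML values for two data types (e.g. spectral, weight)
# and a priori probabilities (e.g. from checkout frequency).
densities = {
    "tomato": combined_density([0.8, 1.1]),
    "red_apple": combined_density([1.5, 2.0]),
    "orange": combined_density([3.0, 2.5]),
}
priors = {"tomato": 0.5, "red_apple": 0.3, "orange": 0.2}
post = posteriors(densities, priors)
candidates = truncated_ranking(post, p_min=0.05)
```

The resulting candidate list would then be displayed for operator verification and selection, as described for the system above.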
Calibration information includes a reference spectrum F measured from a reference surface; the spectrum collected from the produce item is corrected against this reference spectrum to obtain its diffuse reflectance. Calibration information may also include a correction factor C. Turning to FIG. 9, a data reduction method for non-spectral data from the camera is illustrated. Reference data readings R are recorded and stored as reference data for each class, and the method ends. Advantageously, an analysis of DML values provides an appropriate measure for identifying produce items. The technique may be applied equally well to identify other items. Although the invention has been described with particular reference to certain preferred embodiments thereof, variations and modifications of the present invention can be effected within the spirit and scope of the following claims. Specifically, while the DML algorithm developed in this invention was based on discussions of a special data type, spectral data, it may be applied to any data type or combination of data types with the same effect. For example, the technique may be applied to produce data from produce data collectors of other types. The instance point vector is just an abstract set of numbers; it may consist of one type or a number of different types of data: spectral data, physical dimensions, surface texture measures, weight, etc., and these data may be reduced in different ways or remapped to different spaces.