RED-EYE DETECTION BASED ON RED REGION DETECTION WITH EYE CONFIRMATION
 This is a continuation of application Ser. No. 09/783,505, filed Feb. 13, 2001, entitled "Red-Eye Detection Based On Red Region Detection With Eye Confirmation", which is hereby incorporated by reference herein.
 This invention relates to detecting red eye, and more particularly to red-eye detection based on red region detection with eye confirmation.
BACKGROUND OF THE INVENTION
 Red-eye is a problem commonly encountered in photography when light (typically from the camera's flash) reflects off the retinas at the back of the subject's eyes and causes the subject's eyes to turn red. Red-eye has been a problem for many years, and although a variety of solutions have been proposed to cure the problem, these solutions tend to be costly, cumbersome, and/or ineffective. One such solution is to use a bounce flash so that light hits the subject's eyes from the side (or above/below) rather than straight-on, thereby preventing the reflected light from coming straight back to the camera's lens. Bounce flashes, however, are cumbersome (often rivaling the size of the camera) and costly. Another solution is to pre-flash the subject, thereby causing the subject's pupils to close and decrease the amount of light allowed into the subject's eyes when the picture is taken. These pre-flash solutions, however, are not always effective, and cause a delay (while the pre-flash is operating) before the picture is actually taken during which time the subject may move.
 Attempts have also been made to cure the red-eye problem after-the-fact by processing the image to remove the red from the eyes. Computer software packages are available that allow for the removal of red-eye, such as by changing the color of the red portion of the eye. Some systems require manual selection, by the user, of the pixels within the image that are part of the red eyes prior to removing the red-eye. These systems are rather user unfriendly due to the steps the user must follow to identify exactly which pixels are part of the red eyes.
 Other systems have attempted to automatically detect where the red-eye portions of an image are (as opposed to other non-eye portions of the image that are red). Such systems typically start by using face detection techniques to determine where any faces are in the image and where eyes are within those faces. Once these faces (and eyes within them) are detected, the systems try to determine whether the eyes are red eyes. These systems, however, can have poor performance under many circumstances (e.g., when a face is partially obscured, such as by heavy shadows or heavy beards, when the face has an unusual expression or is distorted, etc.).
 The invention described below addresses these disadvantages, providing improved red-eye detection systems and methods.
SUMMARY OF THE INVENTION
 Red-eye detection based on red region detection with eye confirmation is described herein.
 In accordance with one aspect, pixels that correspond to the color of red-eye within an image are identified. A determination is then made as to whether these identified pixels and surrounding areas are part of an eye or not part of an eye. Those identified pixels that are determined to be part of an eye are the detected red-eye regions.
 In accordance with another aspect, a skin color filter is initially applied to a received image to identify areas of the image that include skin color. Those areas are then searched to identify red pixels within the areas. Adjacent red pixels (or those red pixels close enough to one another) are grouped together and a shape filter applies several rules to the pixel groupings. Pixel groups remaining after the filtration process are candidate red-eye regions. These candidate red-eye regions are input to an eye confirmation process which uses a multi-scale process to confirm whether each candidate red-eye region is part of an eye.
BRIEF DESCRIPTION OF THE DRAWINGS
 The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings. The same numbers are used throughout the figures to reference like components and/or features.
 FIG. 1 illustrates an exemplary environment in which the present invention may be practiced.
 FIG. 2 illustrates an exemplary system for performing red-eye detection in accordance with certain embodiments of the invention.
 FIG. 3 illustrates an example of scaling an image in accordance with certain embodiments of the invention.
 FIG. 4 is a flowchart illustrating an exemplary process for detecting red-eye regions in accordance with certain embodiments of the invention.
 FIG. 5 illustrates an example of a suitable operating environment in which the invention may be implemented.
 FIG. 1 illustrates an exemplary environment in which the present invention may be practiced. A camera 102 is used to take one or more pictures of a subject 104 using a flash on camera 102 that potentially causes red-eye in images captured of subject 104. The images captured by camera 102 (the capture commonly being referred to as taking a picture) are analyzed for red-eye, and the areas with red-eye are automatically detected as discussed in more detail below. The red-eye detection can be performed at camera 102, or alternatively the captured images may be transferred to a computing device 106 that detects red-eye. Device 106 may be any of a wide variety of devices, such as a desktop or portable computer, copying or printing devices (e.g., a photograph enlargement device including a scanner and printer), etc.
 Camera 102 may be any type of image capture device that captures and stores (or communicates) images, such as a film camera, a digital camera, a video camera, a camcorder, etc. Camera 102 may capture images in any of a variety of conventional manners, such as exposing film on which the image is captured, exposing one or more charge coupled devices (CCDs) and storing a captured still image in memory (e.g., a removable Flash memory, hard disk (or other magnetic or optical storage medium), or motion video tape), exposing one or more CCDs and storing multiple captured frames (a captured video sequence) on a recording medium (e.g., Flash memory, disk or tape), etc.
 FIG. 2 illustrates an exemplary system 120 for performing red-eye detection in accordance with certain embodiments of the invention. System 120 can be implemented in any of a wide variety of devices, such as computers (whether desktop, portable, handheld, etc.), image capture devices (e.g., camera 102 of FIG. 1), etc. Alternatively, system 120 may be a standalone system for coupling to (or incorporation within) other devices or systems.
 System 120 receives an image 122 into a red region detection module 124. Image 122 is received in digital format, but can be received from any of a wide variety of sources including sources that capture images in a nondigital format (e.g., on film) but that are subsequently converted to digital format (digitized). In the illustrated example, image 122 is made up of multiple pixels that can be referenced in a conventional manner using an x, y coordinate system. Red region detection module 124 detects red regions that are potentially regions of red-eye and identifies those detected regions to an eye confirmation module 126. Eye confirmation module 126 confirms each detected region as being either part of an eye or not part of an eye, and outputs an indication 128 of those detected regions that are confirmed as being parts of eyes. The identified detected red-eye regions 128 can then be made available to other systems for further processing, such as automatic removal of the red-eye regions (e.g., by changing the red color to black).
 Red region detection module 124 includes a red pixel identifier 130, a pixel grouper 132, and a filter 134. Image 122 is received by red pixel identifier 130 which analyzes image 122 on a per-pixel basis and identifies which of the pixels are "red" pixels. These identified red pixels are those pixels having a color that is associated with the colors typically found in red-eye. Identifier 130 may analyze each pixel in image 122, or alternatively only a subset of the pixels in image 122. For example, if a large number of red pixels in a circular pattern are identified then some of the pixels in the center of that pattern need not be analyzed. By way of another example, analysis of some other pixels may simply be skipped (e.g., at the corners or edges of the image), although skipping such analysis may degrade the performance of the red-eye detection.
 In one implementation, skin color filter module 136 detects those areas of image 122 that include skin color and communicates those areas to identifier 130, thereby allowing identifier 130 to analyze only those pixels that are within the areas that include skin color. Different skin color filters can be applied by module 136, and in one implementation a skin color classifier is used in which color quantization of the original image is initially performed in order to improve skin color segmentation by homogenizing the image regions. The quantized color image is then segmented according to skin color characteristics based on either the YCbCr color model or the HSV (Hue, Saturation, Value) color model. This color quantization and image segmentation is discussed in more detail in Christophe Garcia and Georgios Tziritas, "Face Detection Using Quantized Skin Color Regions Merging and Wavelet Packet Analysis", IEEE Transactions on Multimedia, Vol. 1, No. 3, Sep. 1999, which is hereby incorporated by reference.
 Given that red-eye is not typically a single shade of red, pixel identifier 130 uses a red-eye color model to which the color of each pixel being analyzed is compared. Based on this comparison to the red-eye color model, pixel identifier 130 determines whether the pixel is or is not a red pixel.
 In the illustrated example, the pixels of image 122 are 24-bit color pixels that are represented using the conventional RGB (Red, Green, Blue) color model, in which three different dots (one red, one green, and one blue) are energized to different intensities to create the appropriate color for the pixel. The 24 bits of color information identify the intensity that each of the three different dots is to be energized to in order to display the pixel. The RGB color model is well known to those skilled in the art and thus will not be discussed further except as it pertains to the present invention.
 Identifier 130 converts the 24-bit color model using RGB into a two-dimensional space referred to herein as the g and Y characteristics. The g and Y characteristics are determined based on the three components of the RGB model as follows:
 This two-dimensional space using the g and Y characteristics is previously trained (e.g., offline) using multiple color samples from known red-eye pixels. This results in a two-dimensional Gaussian distribution for red-eye colors based on the g and Y characteristics. Once the g and Y characteristics are generated for the pixel being analyzed, the g and Y characteristics for that pixel are compared to the Gaussian distribution. If the g and Y characteristics of the pixel are within a threshold probability of the Mixture Gaussian distribution, then identifier 130 determines that the pixel is a red pixel; otherwise identifier 130 determines that the pixel is not a red pixel. In one implementation, the threshold probability is 0.6, although different values could alternatively be used.
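 By way of illustration only, the comparison of a pixel's two characteristics against a trained distribution might be sketched as follows. The sketch assumes a single two-dimensional Gaussian component and interprets "within a threshold probability" as a likelihood normalized to 1.0 at the model mean; both are assumptions, as are the names `gaussian_score` and `is_red_pixel` and the example mean and covariance values. The computation of the two characteristics from RGB is not shown; the feature pair is taken as an input.

```python
import math


def gaussian_score(feature, mean, cov):
    """Return exp(-0.5 * d^2), where d is the Mahalanobis distance of a
    2-D feature from the model mean. The score is 1.0 at the mean and
    decays toward 0.0 as the feature moves away from it."""
    dx = feature[0] - mean[0]
    dy = feature[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form v^T * inverse(cov) * v for a 2x2 matrix.
    d2 = (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det
    return math.exp(-0.5 * d2)


def is_red_pixel(feature, mean, cov, threshold=0.6):
    """Classify a pixel as 'red' when its score meets the threshold
    (0.6 is the threshold value given in the text)."""
    return gaussian_score(feature, mean, cov) >= threshold


# Hypothetical model parameters; real values would come from offline
# training on known red-eye pixel samples.
MODEL_MEAN = (0.2, 0.5)
MODEL_COV = ((0.01, 0.0), (0.0, 0.04))
```

 A pixel whose characteristics lie one standard deviation from the mean along one axis scores roughly 0.61 under this normalization and is therefore just inside the 0.6 threshold; a mixture model would combine several such components.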
 Identifier 130 outputs an identification of each of the red pixels to pixel grouper 132. This identification can take any of a variety of forms, such as an x, y coordinate position of each pixel. Each of the identified pixels is a candidate red-eye pixel. Pixel grouper 132 groups together the candidate red-eye pixels identified by identifier 130 into one or more pixel groups. Pixel grouper 132 groups together any two adjacent candidate red-eye pixels into the same pixel group. In one implementation, two pixels are adjacent if each of their x and y coordinate values differs by no greater than one. Thus, each pixel surrounding a given pixel (whether above, below, to the left, to the right, or at a diagonal) is an adjacent pixel. Alternatively, surrounding diagonal pixels may not be considered adjacent. Pixel grouper 132 may optionally group together two candidate red-eye pixels that are not adjacent but are within a threshold distance of one another (e.g., separated by not more than one or two pixels) into the same group. Pixel grouper 132 then identifies these pixel groups (which include any single pixels as their own groups) to filter 134.
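 The grouping described above amounts to a connected-component search over the candidate pixels. A minimal sketch, assuming pixels are given as (x, y) tuples, is shown below; the function name `group_pixels` and the `max_gap` parameter are hypothetical. With `max_gap=1`, two pixels join the same group when each coordinate differs by no more than one (diagonals included); a larger value approximates the optional threshold-distance grouping.

```python
from collections import deque


def group_pixels(red_pixels, max_gap=1):
    """Group (x, y) pixels into connected components. Two pixels are linked
    when both their x and y coordinates differ by at most max_gap."""
    remaining = set(red_pixels)
    groups = []
    while remaining:
        seed = remaining.pop()
        group = [seed]
        queue = deque([seed])
        # Breadth-first expansion from the seed pixel.
        while queue:
            x, y = queue.popleft()
            for dx in range(-max_gap, max_gap + 1):
                for dy in range(-max_gap, max_gap + 1):
                    neighbor = (x + dx, y + dy)
                    if neighbor in remaining:
                        remaining.remove(neighbor)
                        group.append(neighbor)
                        queue.append(neighbor)
        groups.append(sorted(group))
    # Sort for a deterministic result; isolated pixels form their own groups.
    return sorted(groups)
```

 Excluding diagonal neighbors, as in the alternative mentioned above, would only require skipping offsets where both `dx` and `dy` are nonzero.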
 Filter 134 filters the pixel groups received from pixel grouper 132 based on a set of one or more heuristic rules. Filter 134 identifies certain pixel groups as being potential (or candidate) red-eye regions based on these rules and outputs an identification of the candidate red-eye regions to eye confirmation module 126.
 In one implementation, filter 134 begins by assuming all of the pixel groups received from grouper 132 are candidate red-eye regions, and then uses the following five rules to eliminate pixel groups. The pixel groups remaining (if any) after application of these rules are the candidate red-eye regions output to eye confirmation module 126.
 Rule 1) If the entire image is red (e.g., greater than a threshold fraction of the pixels in the picture are red, such as 95%) then none of the pixel groups are red-eye regions.
 Rule 2) A pixel group containing too few pixels (e.g., five or fewer) is not a red-eye region.
 Rule 3) A pixel group that is more rectangular than circular is not a red-eye region. The shape of a pixel group can be determined in any of a wide variety of conventional manners, such as based on the circumference of the group.
 Rule 4) A pixel group having an aspect ratio substantially different from a circle is not a red-eye region. The aspect ratio of the pixel group can be identified by calculating the distance between the rightmost and leftmost pixels in the group (the horizontal aspect), as well as the distance between the uppermost and lowermost pixels in the group (the vertical aspect). The aspect ratio is then the horizontal aspect divided by the vertical aspect. For a circle, the aspect ratio is one. In the illustrated example, a pixel group with an aspect ratio less than a lower bound or greater than an upper bound is not a red-eye region. In one implementation, the lower bound is 0.5 and the upper bound 2.0.
 Rule 5) A pixel group having a low filling ratio (e.g., less than 0.6) is not a red-eye region. The filling ratio is the number of red pixels in the group divided by the product of the horizontal aspect and the vertical aspect (as described in Rule 4).
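 As a rough illustration, Rules 1, 2, 4, and 5 might be sketched as follows, using the example thresholds given above. Rule 3 (rectangular versus circular shape) is omitted because its test is left open. The function names are hypothetical, and the horizontal and vertical aspects are computed here as pixel-inclusive extents (maximum minus minimum plus one), which is an assumption made so that a fully filled group has a filling ratio of exactly one.

```python
def is_candidate_region(group, min_pixels=6, aspect_bounds=(0.5, 2.0),
                        min_fill=0.6):
    """Apply Rules 2, 4, and 5 to one pixel group (a list of (x, y) tuples)."""
    # Rule 2: groups of five or fewer pixels are rejected.
    if len(group) < min_pixels:
        return False
    xs = [x for x, _ in group]
    ys = [y for _, y in group]
    width = max(xs) - min(xs) + 1    # horizontal aspect (assumed inclusive)
    height = max(ys) - min(ys) + 1   # vertical aspect (assumed inclusive)
    # Rule 4: the aspect ratio must lie between the lower and upper bounds.
    aspect = width / height
    if not (aspect_bounds[0] <= aspect <= aspect_bounds[1]):
        return False
    # Rule 5: the filling ratio must not be low.
    fill = len(group) / (width * height)
    return fill >= min_fill


def filter_groups(groups, red_count, total_pixels, max_red_fraction=0.95):
    """Apply Rule 1 to the whole image, then the per-group rules."""
    # Rule 1: if nearly the entire image is red, no group is a red-eye region.
    if total_pixels > 0 and red_count / total_pixels > max_red_fraction:
        return []
    return [g for g in groups if is_candidate_region(g)]
```

 For example, a fully filled 3-by-3 group passes all three per-group rules, whereas a 1-by-8 line of pixels fails Rule 4 with an aspect ratio of 8.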
 Filter 134 identifies the resultant candidate red-eye regions to eye confirmation module 126. This identification can take any of a variety of forms, such as the x, y coordinates of each pixel in the grouping, one pixel of the grouping (e.g., at the center) and the size of the grouping, a single pixel of the grouping (e.g., at the center), etc. Eye confirmation module 126 moves a window the size of an eye template (also referred to as an eye detector) around the image (e.g., starting with the eye template at or close to the center of the grouping) and determines whether the pixels of the image within the window match the eye template. The eye template is trained based on multiple previously analyzed (e.g., offline) eyes, which include both the pupil areas (which include the red-eye portion) and the areas surrounding the pupil (which may include, for example, the iris and the sclera, as well as possibly the skin, eyelashes, and eyebrows surrounding the eyeball, etc.). The eye confirmation module 126 analyzes the area surrounding the candidate red-eye regions to determine whether the regions are part of an eye (and thus truly red-eye regions) or not part of an eye (and thus not red-eye regions). However, only areas close to the pupil are analyzed (the entire face is not detected). In one implementation the window is 25 pixels (horizontally) by 15 pixels (vertically), although windows of other sizes may be used.
 To perform the confirmation for a particular candidate red-eye region, the window is positioned over (e.g., centered on) the red-eye region and the pixels within the window are classified, based on the eye template, as being either an eye or not an eye. If the pixels are classified as an eye, then no further analysis need be made for that eye. Alternatively, additional analysis may be performed (by moving the window over the image in the horizontal and/or vertical directions and repeating the classification) in order to identify the actual location (boundaries) of the eye (e.g., the eyeball, including the pupil, iris, and sclera). However, if the pixels are classified as not an eye, then the window is adjusted in the horizontal and/or vertical direction and the classification repeated. The window can be moved around multiple times and in multiple directions in an attempt to "locate" the eye if the candidate red-eye region is indeed a red-eye region. In one implementation, movement of the window is limited to ranging from -3 pixels to +3 pixels from the starting location in both the horizontal and vertical directions.
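 The window search described above can be sketched as follows. This is an illustration under stated assumptions: `is_eye` is a hypothetical callable standing in for the trained eye-template classifier, taking the window's left, top, width, and height; the order in which offsets are tried (nearest to the starting location first) is a choice made for the sketch and is not specified above. The 25-by-15 window and the -3 to +3 pixel range are the example values from the text.

```python
def search_offsets(max_shift=3):
    """Return all (dx, dy) offsets within +/- max_shift, nearest first."""
    offsets = [(dx, dy)
               for dx in range(-max_shift, max_shift + 1)
               for dy in range(-max_shift, max_shift + 1)]
    return sorted(offsets, key=lambda o: (abs(o[0]) + abs(o[1]), o))


def confirm_eye(center, is_eye, window=(25, 15), max_shift=3):
    """Center the window on a candidate region, then shift it within the
    allowed range until the classifier reports an eye. Returns the (left,
    top) corner of the matching window, or None if no position matches."""
    cx, cy = center
    width, height = window
    for dx, dy in search_offsets(max_shift):
        left = cx + dx - width // 2
        top = cy + dy - height // 2
        if is_eye(left, top, width, height):
            return (left, top)
    return None
```

 With the example values, at most 49 window positions are classified per candidate region, and the search stops at the first position classified as an eye.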
 The classification of the pixels within the window based on the trained eye template can be performed using any of a wide variety of conventional classification schemes. In one implementation, a conventional SVM (Support Vector Machine) classifier is trained using both images of eyes and images of objects similar to eyes but that are not eyes. Based on this training, the SVM classifier can classify the pixels in a window as being either an eye or not an eye. In another implementation, the pixels in the window are normalized to account for variations in lighting conditions and then projected onto an eigenspace representation which returns a feature vector for the candidate eye region. This feature vector is then input to a neural network trained with images of eyes, which classifies the feature vector as either an eye or not an eye. The operation and use of SVM and neural networks for classification are both well-known to those skilled in the art and thus will not be discussed further except as they pertain to the present invention.
 In the illustrated example, the size of the eye template and the size of the window are both fixed. Thus, some accommodation is made to account for the different sizes of eyes that may appear in images (e.g., based on how close the camera is to the subject, the size of the subject, how much the camera may have been "zoomed" for the picture, etc.). In one implementation, rather than having a fixed size eye template and window, multiple different-sized eye templates and windows are used to accommodate these differences.
 In another implementation, the eye template and window sizes remain fixed, but the scale of the image is modified. FIG. 3 illustrates an example of the scaling of the image in accordance with certain embodiments of the invention. An image 160 is illustrated including multiple candidate red-eye regions 162, 164, 166, 168, and 170. For ease