US 7065233 B2 Abstract Described herein is a technique for creating a 3D face model using images obtained from an inexpensive camera associated with a general-purpose computer. Two still images of the user are captured, and two video sequences. The user is asked to identify five facial features, which are used to calculate a mask and to perform fitting operations. Based on a comparison of the still images, deformation vectors are applied to a neutral face model to create the 3D model. The video sequences are used to create a texture map. The process of creating the texture map references the previously obtained 3D model to determine poses of the sequential video images.
Claims(15) 1. One or more computer-readable media containing a program that is executable by a computer to create a 3D model of a face, the program comprising the following actions:
capturing at least two images of the face;
finding matched points in the two images based on locations identified by a human user on at least one of the images;
estimating rotation and translation of the face based on the matched points in the images;
determining 3D positions of the matched points based on the estimated rotation and translation to create a 3D representation of the face;
capturing a succession of 2D images containing the face in a range of poses;
determining poses of the face by performing the following actions for each 2D image in succession:
matching points of the face in the 2D image with corresponding points of a previous 2D image whose pose is already known, wherein the matched points of the images have corresponding 3D positions in fitted neutral face model, the 3D positions being determined by the poses of the images; and
calculating a pose for the 2D image that minimizes differences between projections of 3D positions of matched points of the previous image onto the given image and the corresponding matched points of the given image.
2. One or more computer-readable media as recited in
fitting a neutral face model to the 3D representation by applying deformation vectors to the neutral face model.
3. One or more computer-readable media as recited in
4. One or more computer-readable media as recited in
5. One or more computer-readable media as recited in
6. A method to create a 3D model of a face, the method comprising the following actions:
capturing at least two images of the face;
finding matched points in the two images based on locations identified by a human user on at least one of the images;
estimating rotation and translation of the face based on the matched points in the images;
determining 3D positions of the matched points based on the estimated rotation and translation to create a 3D representation of the face;
capturing a succession of 2D images containing the face in a range of poses;
determining poses of the face by performing the following actions for each 2D image in succession:
matching points of the face in the 2D image with corresponding points of a previous 2D image whose pose is already known, wherein the matched points of the images have corresponding 3D positions in fitted neutral face model, the 3D positions being determined by the poses of the images; and
calculating a pose for the 2D image that minimizes differences between projections of 3D positions of matched points of the previous image onto the given image and the corresponding matched points of the given image.
7. A method as recited in
fitting a neutral face model to the 3D representation by applying deformation vectors to the neutral face model.
8. A method as recited in
creating a texture map from the succession of 2D images in conjunction with the determined poses of the 2D images.
9. A method as recited in
10. A method as recited in
11. A device that creates a 3D model of a face, the device being configured to perform actions comprising:
means for capturing at least two images of the face;
means for finding matched points in the two images based on locations identified by a human user on at least one of the images;
means for estimating rotation and translation of the face based on the matched points in the images;
means for determining 3D positions of the matched points based on the estimated rotation and translation to create a 3D representation of the face;
means for capturing a succession of 2D images containing the face in a range of poses;
means for determining poses of the face by performing the following actions for each 2D image in succession:
means for matching points of the object in the 2D image with corresponding points of a previous 2D image whose pose is already known, wherein the matched points of the images have corresponding 3D positions in fitted neutral face model, the 3D positions being determined by the poses of the images; and
means for calculating a pose for the 2D image that minimizes differences between projections of 3D positions of matched points of the previous image onto the given image and the corresponding matched points of the given image.
12. A device as recited in
means for fitting a neutral face model to the 3D representation by applying deformation vectors to the neutral face model.
13. A device as recited in
14. A device as recited in
15. A device as recited in
creating a texture map from the succession of 2D images in conjunction with the calculated poses of the 2D images.
Description This application is a continuation of U.S. patent application Ser. No. 10/967,765, filed Oct. 18, 2004; which is a continuation of U.S. patent application Ser. No. 09/754,938, filed Jan. 4, 2001 now U.S. Pat. No. 6,807,290; which claims the benefit of U.S. Provisional Application No. 60/188,603, filed Mar. 9, 2000. The disclosure below relates to generating realistic three-dimensional human face models and facial animations from still images of faces. One of the most interesting and difficult problems in computer graphics is the effortless generation of realistic looking, animated human face models. Animated face models are essential to computer games, film making, online chat, virtual presence, video conferencing, etc. So far, the most popular commercially available tools have utilized laser scanners. Not only are these scanners expensive, the data are usually quite noisy, requiring hand touchup and manual registration prior to animating the model. Because inexpensive computers and cameras are widely available, there is a great interest in producing face models directly from images. In spite of progress toward this goal, the available techniques are either manually intensive or computationally expensive. Facial modeling and animation has been a computer graphics research topic for over 25 years [6, 16, 17, 18, 19, 20, 21, 22, 23, 27, 30, 31, 33]. The reader is referred to Parke and Waters' book [23] for a complete overview. Lee et al. [17, 18] developed techniques to clean up and register data generated from laser scanners. The obtained model is then animated using a physically based approach. DeCarlo et al. [5] proposed a method to generate face models based on face measurements randomly generated according to anthropometric statistics. They showed that they were able to generate a variety of face geometries using these face measurements as constraints. A number of researchers have proposed to create face models from two views [1, 13, 4]. 
They all require two cameras which must be carefully set up so that their directions are orthogonal. Zheng [37] developed a system to construct geometrical object models from image contours, but it requires a turn-table setup. Pighin et al. [26] developed a system to allow a user to manually specify correspondences across multiple images, and use vision techniques to compute 3D reconstructions. A 3D mesh model is then fit to the reconstructed 3D points. They were able to generate highly realistic face models, but with a manually intensive procedure. Blanz and Vetter [3] demonstrated that linear classes of face geometries and images are very powerful in generating convincing 3D human face models from images. Blanz and Vetter used a large image database to cover every skin type. Kang et al. [14] also use linear spaces of geometrical models to construct 3D face models from multiple images. But their approach requires manually aligning the generic mesh to one of the images, which is in general a tedious task for an average user. Fua et al. [8] deform a generic face model to fit dense stereo data, but their face model contains many more parameters to estimate because essentially all of the vertices are independent parameters. In addition, reliable dense stereo data are in general difficult to obtain with a single camera. Their method usually takes 30 minutes to an hour, while ours takes 2–3 minutes. Guenter et al. [9] developed a facial animation capturing system to capture both the 3D geometry and texture image of each frame and reproduce high quality facial animations. The problem they solved is different from what is addressed here in that they assumed the person's 3D model was available and the goal was to track the subsequent facial deformations. The system described below allows an untrained user with a PC and an ordinary camera to create and instantly animate his/her face model in no more than a few minutes.
The user interface for the process comprises three simple steps. First, the user is instructed to pose for two still images. The user is then instructed to turn his/her head horizontally, first in one direction and then the other. Third, the user is instructed to identify a few key points in the images. Then the system computes the 3D face geometry from the two images, and tracks the video sequences, with reference to the computed 3D face geometry, to create a complete facial texture map by blending frames of the sequence. To overcome the difficulty of extracting 3D facial geometry from two images, the system matches a sparse set of corners and uses them to compute head motion and the 3D locations of these corner points. The system then fits a linear class of human face geometries to this sparse set of reconstructed corners to generate the complete face geometry. Linear classes of face geometry and image prototypes have previously been demonstrated for constructing 3D face models from images in a morphable model framework. Below, we show that linear classes of face geometries can be used to effectively fit/interpolate a sparse set of 3D reconstructed points. This novel technique allows the system to quickly generate photorealistic 3D face models with minimal user intervention. The following description sets forth a specific embodiment of a 3D modeling system that incorporates elements recited in the appended claims. The embodiment is described with specificity in order to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the claimed invention might eventually be embodied in other ways, to include different elements or combinations of elements similar to the ones described in this document, in conjunction with other present or future technologies. System Overview The video camera is an inexpensive model such as many that are widely available for Internet videoconferencing. 
We assume the intrinsic camera parameters have been calibrated, a reasonable assumption given the simplicity of calibration procedures [36].

Data Capture

The first stage is data capture. The user takes two images with a small relative head motion, and two video sequences: one with the head turning to each side. Alternatively, the user can simply turn his/her head from left all the way to the right, or vice versa. In that case, the user needs to select one approximately frontal view while the system automatically selects the second image and divides the video into two sequences. In the sequel, we call the two images the base images. The user then locates five markers in each of the two base images.

The next processing stage computes the face mesh geometry and the head pose with respect to the camera frame using the two base images and markers as input. The final stage determines the head motions in the video sequences, and blends the images to generate a facial texture map.

Notation

We denote the homogeneous coordinates of a vector x by $\tilde{x}$, i.e., the homogeneous coordinates of an image point $m = (u, v)^T$ are $\tilde{m} = (u, v, 1)^T$.

The fundamental geometric constraint between two images is known as the epipolar constraint [7, 35]. It states that in order for a point m in one image and a point m′ in the other image to be the projections of a single physical point in space, or in other words, in order for them to be matched, they must satisfy
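In its standard textbook form, the epipolar constraint reads $\tilde{m}'^{T} F \tilde{m} = 0$, where $F$ is the fundamental matrix relating the two views. The following is a minimal Python sketch, not the patent's implementation, that checks this form numerically. It assumes a single calibrated camera with intrinsic matrix `A` in both views and a relative motion `(R, t)`; the helper names and all numeric values are hypothetical.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_from_motion(A, R, t):
    """F = A^-T [t]x R A^-1 for one camera (intrinsics A) under motion (R, t)."""
    A_inv = np.linalg.inv(A)
    return A_inv.T @ skew(t) @ R @ A_inv

def project(A, R, t, P):
    """Homogeneous pixel coordinates (u, v, 1) of 3D point P after motion (R, t)."""
    p = A @ (R @ P + t)
    return p / p[2]

def epipolar_residual(F, m, m_prime):
    """Value of m'^T F m; close to zero for a correct match."""
    return float(m_prime @ F @ m)

# Assumed example values (not taken from the source).
A = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
theta = np.deg2rad(8.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.1, 0.0, 0.02])
P = np.array([0.05, -0.03, 1.2])

F = fundamental_from_motion(A, R, t)
m = project(A, np.eye(3), np.zeros(3), P)   # point in the first image
m_prime = project(A, R, t, P)               # matching point in the second image
print(epipolar_residual(F, m, m_prime))     # approximately zero
```

A candidate match whose residual is far from zero is inconsistent with the estimated motion, which is the basis for rejecting false matches.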
Instead of representing a face as a linear combination of real faces or face models, we represent it as a linear combination of a neutral face model and some number of face metrics, where a metric is a deformation vector that linearly deforms a face in a certain way, such as to make the head wider, make the nose bigger, etc. Each deformation vector specifies a plurality of displacements corresponding respectively to the plurality of 3D points of the neutral face model. To be more precise, let's denote the face geometry by a vector S=(v

We now describe our techniques to determine the face geometry from just two views. The two base images are taken in a normal room by a static camera while the head is moving in front. There is no control on the head motion, and the motion is unknown. We have to determine first the motion of the head and match some pixels across the two views before we can fit an animated face model to the images. However, some preprocessing of the images is necessary.

Determining Facial Portions of the Images

There are at least three major groups of objects undergoing different motions between the two views: background, head, and other parts of the body such as the shoulder. If we do not separate them, there is no way to determine a meaningful head motion. Since the camera is static, we can expect to remove the background by subtracting one image from the other. However, as the face color changes smoothly, a portion of the face may be marked as background. Another problem with the image subtraction technique is that the moving body and the head cannot be distinguished.

Neither the union nor the intersection of the two mask images is enough to locate the face, because it will include either too many (e.g., including undesired moving body) or too few (e.g., missing desired eyes and mouth) pixels.
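The face representation described earlier in this section, a neutral face plus a weighted sum of deformation vectors ("metrics"), can be sketched as follows. This is only an illustration under assumed array shapes; the function name `deform_face`, the toy two-vertex "face", and the "wider head" metric are hypothetical and not taken from the source.

```python
import numpy as np

def deform_face(neutral, metrics, coeffs):
    """Return neutral + sum_j coeffs[j] * metrics[j].

    neutral: (n, 3) vertex positions of the neutral face.
    metrics: (m, n, 3) per-vertex displacements, one deformation vector per metric.
    coeffs:  (m,) metric coefficients.
    """
    neutral = np.asarray(neutral, dtype=float)
    metrics = np.asarray(metrics, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    # Contract the coefficient axis against the first axis of `metrics`.
    return neutral + np.tensordot(coeffs, metrics, axes=1)

# Toy two-vertex "face" with a single hypothetical "wider head" metric.
neutral = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
wider = np.array([[[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]])
face = deform_face(neutral, wider, [0.5])
print(face)
```

Because every metric displaces the same set of vertices, fitting a face reduces to estimating a small number of coefficients rather than every vertex position independently.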
Since we already have information about the position of eye corners and mouth corners, we initially predict the approximate boundaries of the facial portion of each image, based on the locations identified by the user. Within the inner ellipse, a “union” or “joining” operation is applied to the two mask images. The above steps result in a final mask image.

Corner Matching and Motion Determination

One popular technique of image registration is optical flow [12, 2], which is based on the assumption that the intensity/color is conserved. This is not the case in our situation: the color of the same physical point appears to be different in images because the illumination changes when the head is moving. We therefore resort to a feature-based approach that is more robust to intensity/color variations. It consists of the following steps: (i) detecting corners in each image; (ii) matching corners between the two images; (iii) detecting false matches based on a robust estimation technique; (iv) determining the head motion; (v) reconstructing matched points in 3D space.

Motion Estimation

The image locations of the feature points are usually not precise. A human typically cannot mark the feature points with high precision, and an automatic facial feature detection algorithm may not produce perfect results. When there are errors, a five-point algorithm is not robust even when refined with a well-known bundle adjustment technique. For each of the five feature points, its 3D coordinates (x, y, z) need to be determined, which gives fifteen (15) unknowns. The motion between the two views adds further unknowns.

To substantially increase the robustness of the five-point algorithm, a new set of parameters is created. These parameters take into consideration physical properties of the feature points. The property of symmetry is used to reduce the number of unknowns.
Additionally, reasonable lower and upper bounds are placed on nose height and are represented as inequality constraints. As a result, the algorithm becomes more robust. Using these techniques, the number of unknowns is significantly reduced below 20. Even though the following algorithm is described with respect to five feature points, it is straightforward to extend the idea to any number of feature points less than or greater than five feature points for improved robustness. Additionally, the algorithm can be applied to other objects besides a face as long as the other objects represent some level of symmetry. Head motion estimation is first described with respect to five feature points. Next, the algorithm is extended to incorporate other image point matches obtained from image registration methods.

Head Motion Estimation from Five Feature Points.

- (1) A line E_{1}E_{2} connecting the eye corners E_{1} and E_{2} is parallel to a line M_{1}M_{2} connecting the mouth corners.
- (2) A line centered on the nose (e.g., line EOM when viewed straight on or lines NM or NE when viewed from an angle as shown) is perpendicular to mouth line M_{1}M_{2} and to eye line E_{1}E_{2}.
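The two symmetry properties above can be checked numerically for a candidate set of five points. The sketch below is illustrative only: the local-frame layout in `symmetric_five_points` (eye corners at (±a, b, 0), mouth corners at (±d, −c, 0), nose tip at (0, 0, e)) is an assumption chosen to satisfy both properties with five parameters, and all function names and values are hypothetical.

```python
import numpy as np

def symmetric_five_points(a, b, c, d, e):
    """Eye corners, mouth corners and nose tip in an assumed local frame."""
    return {
        "E1": np.array([-a, b, 0.0]),
        "E2": np.array([a, b, 0.0]),
        "M1": np.array([-d, -c, 0.0]),
        "M2": np.array([d, -c, 0.0]),
        "N": np.array([0.0, 0.0, e]),
    }

def check_symmetry(pts, tol=1e-9):
    """Return (eye line parallel to mouth line, nose axis perpendicular to both)."""
    eye = pts["E2"] - pts["E1"]
    mouth = pts["M2"] - pts["M1"]
    # Axis through the midpoints of the eye line and the mouth line.
    axis = 0.5 * (pts["E1"] + pts["E2"]) - 0.5 * (pts["M1"] + pts["M2"])
    parallel = np.linalg.norm(np.cross(eye, mouth)) < tol
    perpendicular = abs(axis @ eye) < tol and abs(axis @ mouth) < tol
    return bool(parallel), bool(perpendicular)

def rigid_transform(pts, R, t):
    """Move every point by the same rotation R and translation t."""
    return {name: R @ p + t for name, p in pts.items()}

pts = symmetric_five_points(a=1.5, b=2.0, c=3.0, d=2.5, e=1.0)
angle = np.deg2rad(8.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
moved = rigid_transform(pts, R, np.array([0.0, 0.0, 50.0]))
print(check_symmetry(pts), check_symmetry(moved))
```

Both properties survive any rigid head motion, which is why they can be imposed in a head-attached frame while the pose is estimated separately.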
Let π be the plane defined by E_{1}, E_{2}, M_{1} and M_{2}. By redefining the coordinate system, the number of parameters used to define five feature points is reduced from nine (9) parameters for generic five points to five (5) parameters for five feature points in this local coordinate system. Let t denote the coordinates of O under the camera coordinate system, and R the rotation matrix whose three columns are vectors of the three coordinate axes of Ω.

To make the system even more robust, we add an inequality constraint on e. The idea is to force e to be positive and not too large compared to a, b, c, d. In the context of the face, the nose is always out of plane π. In particular, we use the following inequality:
In summary, based on equations (1), (2) and (4), we estimate a, b, c, d, e, (R, t) and (R′, t′) by minimizing

Incorporating Image Point Matches.

If we estimate camera motion using only the five user-marked points, the result is sometimes not very accurate because the markers contain human errors. In this section, we describe how to incorporate the image point matches (obtained by any feature matching algorithm) to improve precision.

In summary, the objective function (6) becomes
Notice that this is a much smaller minimization problem. We only need to estimate 16 parameters as in the five-point problem (5), instead of 16+3K unknowns. To obtain a good initial estimate, we first use only the five feature points to estimate the head motion by using the algorithm described in Section 2. Thus we have the following two-step algorithm:

- Step 1. Set w_{p}=0. Solve minimization problem (8).
- Step 2. Set w_{p}=1. Use the results of step 1 as the initial estimates. Solve minimization problem (8).
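The two-step scheme can be illustrated on a toy problem. The sketch below is not the patent's objective: it swaps in a 2D rigid alignment (one angle and a translation) for minimization problem (8), and uses a plain Gauss-Newton solver with a forward-difference Jacobian as an assumed optimizer. The names `transform`, `residuals`, `gauss_newton` and `two_step_estimate` are hypothetical. What it does show is the weighting pattern: marker residuals always count, point-match residuals are switched on by `w_p`, and step 2 starts from the step 1 result.

```python
import numpy as np

def transform(x, pts):
    """Rotate (n, 2) points by angle x[0], then translate by (x[1], x[2])."""
    c, s = np.cos(x[0]), np.sin(x[0])
    R = np.array([[c, -s], [s, c]])
    return pts @ R.T + x[1:3]

def residuals(x, markers, matches, w_p):
    """Marker residuals stacked with point-match residuals scaled by sqrt(w_p)."""
    r_markers = (transform(x, markers[0]) - markers[1]).ravel()
    r_matches = np.sqrt(w_p) * (transform(x, matches[0]) - matches[1]).ravel()
    return np.concatenate([r_markers, r_matches])

def gauss_newton(fun, x0, iters=20, eps=1e-6):
    """Plain Gauss-Newton iteration with a forward-difference Jacobian."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = fun(x)
        J = np.empty((r.size, x.size))
        for j in range(x.size):
            dx = np.zeros_like(x)
            dx[j] = eps
            J[:, j] = (fun(x + dx) - r) / eps
        step = np.linalg.lstsq(J, -r, rcond=None)[0]
        x = x + step
    return x

def two_step_estimate(markers, matches):
    # Step 1: w_p = 0, so only the user-marked points contribute.
    x1 = gauss_newton(lambda x: residuals(x, markers, matches, 0.0), np.zeros(3))
    # Step 2: w_p = 1, all matches contribute, starting from the step 1 result.
    x2 = gauss_newton(lambda x: residuals(x, markers, matches, 1.0), x1)
    return x1, x2
```

A usage example would pass `markers` and `matches` as `(source_points, target_points)` pairs of `(n, 2)` arrays; with noisy markers, step 2 typically refines the step 1 estimate because many more correspondences constrain the same few parameters.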
Notice that we can apply this idea to the more general cases where the number of feature points is not five. For example, if there are only two eye corners and mouth corners, we'll end up with 14 unknowns and 16+3K equations. Other symmetric feature points (such as the outside eye corners, nostrils, and the like) can be added into equation (8) in a similar way by using the local coordinate system Ω.

Head Motion Estimation Results.

In this section, we show some test results to compare the new algorithm with the traditional algorithms. Since there are multiple traditional algorithms, we chose to implement the algorithm as described in [34]. It works by first computing an initial estimate of the head motion from the essential matrix [7], and then re-estimating the motion with a nonlinear least-squares technique. We have run both the traditional algorithm and the new algorithm on many real examples. We found many cases where the traditional algorithm fails while the new algorithm successfully results in reasonable camera motions. When the traditional algorithm fails, the computed motion is completely bogus, and the 3D reconstructions give meaningless results. But the new algorithm gives a reasonable result. We generate 3D reconstructions based on the estimated motion, and perform Delaunay triangulation.

We have also performed experiments on artificially generated data. We arbitrarily select 80 vertices from a 3D face model and project them onto two views (the head motion is eight degrees apart). The image size is 640 by 480 pixels. We also project the five 3D feature points (eye corners, nose top, and mouth corners) to generate the image coordinates of the markers. We then add random noise to the coordinates (u, v) of both the image points and the markers. The noise is generated by a pseudo-random generator subject to a Gaussian distribution with zero mean and variance ranging from 0.4 to 1.2.
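The synthetic-data setup described above can be sketched as follows. This is an assumed reconstruction of the experiment, not the authors' code: the intrinsic matrix `A`, the head position `t`, the random stand-in vertices, and the interpretation of the variance as being in squared pixels are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def rotation_y(deg):
    """Rotation about the vertical (y) axis by `deg` degrees."""
    a = np.deg2rad(deg)
    return np.array([[np.cos(a), 0.0, np.sin(a)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(a), 0.0, np.cos(a)]])

def project(A, R, t, points):
    """Project (n, 3) head-frame points to (n, 2) pixel coordinates."""
    cam = points @ R.T + t
    uvw = cam @ A.T
    return uvw[:, :2] / uvw[:, 2:3]

def make_noisy_views(points, A, t, angle_deg, variance, rng):
    """Project points in two poses `angle_deg` apart and add zero-mean Gaussian noise."""
    sigma = np.sqrt(variance)
    views = []
    for R in (np.eye(3), rotation_y(angle_deg)):
        clean = project(A, R, t, points)
        views.append(clean + rng.normal(0.0, sigma, clean.shape))
    return views

rng = np.random.default_rng(0)
# Assumed intrinsics for a 640 by 480 image.
A = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
t = np.array([0.0, 0.0, 1.0])                     # assumed head position in front of the camera
vertices = rng.uniform(-0.1, 0.1, size=(80, 3))   # stand-in for 80 face-model vertices
for variance in (0.4, 0.8, 1.2):
    view1, view2 = make_noisy_views(vertices, A, t, 8.0, variance, rng)
    print(variance, view1.shape, view2.shape)
```

Feeding the noisy correspondences to each motion-estimation algorithm and comparing the recovered motion against the known eight-degree rotation gives one error value per noise level.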
We can see that as the noise increases, the error of the traditional algorithm has a sudden jump at a certain point, whereas the errors of our new algorithm grow much more slowly.

3D Reconstruction.

3D positions of the markers are determined in the same way.

Fitting a Face Model

This stage of processing creates a 3D model of the face. The face model fitting process consists of two steps: fitting to 3D reconstructed points and fine adjustment using image information.

3D Fitting

The vertex coordinates of the face mesh in the camera frame are a function of both the metric coefficients and the pose of the face. To solve this problem, we use an iterative closest point approach. At each iteration, we first fix T.

Fine Adjustment Using Image Information

After the geometric fitting process, we now have a face mesh that is a close approximation to the real face. To further improve the result, we search the images for silhouettes and other face features. We use the snake approach [15] to compute the silhouettes of the face. The silhouette of the current face mesh is used as the initial estimate. For each point on this piecewise linear curve, we find the maximum gradient location along the normal direction within a small range (10 pixels each side in our implementation). Then we solve for the vertices (acting as control points) to minimize the total distance between all the points and their corresponding maximum gradient locations. We use a similar approach to find the upper lips. To find the outer eye corner (not marked), we rotate the current estimate of that eye corner (given by the face mesh) around the marked eye corner by a small angle, and look for the eye boundary using image gradient information. This is repeated for several angles, and the boundary point that is the most distant to the marked corner is chosen as the outer eye corner. We could also use the snake approach to search for eyebrows.
However, our current implementation uses a slightly different approach. Instead of maximizing image gradients across contours, we minimize the average intensity of the image area that is covered by the eyebrow triangles. Again, the vertices of the eyebrows are only allowed to move in a small region bounded by their neighboring vertices. This has worked very robustly in our experiments. We then use the face features and the image silhouettes as constraints in our system to further improve the mesh.

Face Texture From Video Sequence

Now we have the geometry of the face from only two views that are close to the frontal position. For the sides of the face, the texture from the two images is therefore quite poor or even not available at all. Since each image only covers a portion of the face, we need to combine all the images in the video sequence to obtain a complete texture map. This is done by first determining the head pose for the images in the video sequence and then blending them to create a complete texture map.

Determining Head Motions in Video Sequences

In general, it is inefficient to use all the images in the video sequence for texture blending, because head motion between two consecutive frames is usually very small. To avoid unnecessary computation, the following process is used to automatically select images from the video sequence. Let us call the amount of rotation of the head between two consecutive frames the rotation speed. If s is the current rotation speed and α is the desired angle between each pair of selected images, the next image is selected α/s frames away. In our implementation, the initial guess of the rotation speed is set to 1 degree/frame and the desired separation angle is equal to 5 degrees.

Texture Blending
Because the rendering operations can be done using graphics hardware, this approach is very fast. User Interface We have built a user interface to guide the user through collecting the required images and video sequences, and marking two images. The generic head model without texture is used as a guide. Recorded instructions are lip-synced with the head directing the user to first look at a dot on the screen and push a key to take a picture. A second dot appears and the user is asked to take the second still image. The synthetic face mimics the actions the user is to follow. After the two still images are taken, the guide directs the user to slowly turn his/her head to record the video sequences. Finally, the guide places red dots on her own face and directs the user to do the same on the two still images. The collected images and markings are then processed and a minute or two later they have a synthetic head that resembles them. Animation Having obtained the 3D textured face model, the user can immediately animate the model with the application of facial expressions including frowns, smiles, mouth open, etc. To accomplish this we have defined a set of vectors, which we call posemes. Like the metric vectors described previously, posemes are a collection of artist-designed displacements. We can apply these displacements to any face as long as it has the same topology as the neutral face. Posemes are collected in a library of actions and expressions. The idle motions of the head and eyeballs are generated using Perlin's noise functions [24, 25]. Results We have used our system to construct face models for various people. No special lighting equipment or background is required. After data capture and marking, the computations take between 1 and 2 minutes to generate the synthetic textured head. Most of this time is spent tracking the video sequences. 
For people with hair on the sides or the front of the face, our system will sometimes pick up corner points on the hair and treat them as points on the face. The reconstructed model may be affected by them. For example, a subject might have hair lying down over his/her forehead, above the eyebrows. Our system treats the points on the hair as normal points on the face, thus the forehead of the reconstructed model is higher than the real forehead. In some animations, we have automatically cut out the eye regions and inserted separate geometries for the eyeballs. We scale and translate a generic eyeball model. In some cases, the eye textures are modified manually by scaling the color channels of a real eye image to match the face skin colors. We plan to automate this last step shortly. Even though the system is quite robust, it fails sometimes. We have tried our system on twenty people, and our system failed on two of them. Both people are young females with very smooth skin, where the color matching produces too few matches. Perspectives Very good results obtained with the current system encourage us to improve the system along three directions. First, we are working at extracting more face features from two images, including the lower lip and nose. Second, face geometry is currently determined from only two views, and video sequences are used merely for creating a complete face texture. We are confident that a more accurate face geometry can be recovered from the complete video sequences. Third, the current face mesh is very sparse. We are investigating techniques to increase the mesh resolution by using higher resolution face metrics or prototypes. Another possibility is to compute a displacement map for each triangle using color information. Several researchers in computer vision are working at automatically locating facial features in images [29].
With the advancement of those techniques, a completely automatic face modeling system can be expected, even though it is not a burden to click just five points with our current system. Additional challenges include automatic generation of eyeballs and eye texture maps, as well as accurate incorporation of hair, teeth, and tongues. Conclusions We have developed a system to construct textured 3D face models from video sequences with minimal user intervention. With a few simple clicks by the user, our system quickly generates a person's face model which is animated right away. Our experiments show that our system is able to generate face models for people of different races, of different ages, and with different skin colors. Such a system can be potentially used by an ordinary user at home to make their own face models. These face models can be used, for example, as avatars in computer games, online chatting, virtual conferencing, etc. Although details of specific implementations and embodiments are described above, such details are intended to satisfy statutory disclosure obligations rather than to limit the scope of the following claims. Thus, the invention as defined by the claims is not limited to the specific features described above. Rather, the invention is claimed in any of its forms or modifications that fall within the proper scope of the appended claims, appropriately interpreted in accordance with the doctrine of equivalents.