|Publication number||USRE43490 E1|
|Application number||US 12/118,570|
|Publication date||Jun 26, 2012|
|Filing date||May 9, 2008|
|Priority date||May 27, 1994|
|Publication number||118570, 12118570, US RE43490 E1, US RE43490E1, US-E1-RE43490, USRE43490 E1, USRE43490E1|
|Inventors||Eric Gullichsen, Susan P. Wyshynski|
|Original Assignee||B.H. Image Co. Llc|
This is a reissue of U.S. Pat. No. 7,042,497, which is a continuation of application Ser. No. 09/429,697, filed Oct. 28, 1999, now U.S. Pat. No. 6,346,967, which is a continuation of prior application Ser. No. 09/128,963, filed Aug. 4, 1998, now U.S. Pat. No. 6,005,611, which was a continuation of application Ser. No. 08/250,594, filed May 27, 1994, now U.S. Pat. No. 5,796,426, incorporated herein by reference.
The present invention relates to a method and apparatus for displaying a perspective corrected field of view from wide angle video sources, and more particularly relates to permitting the user of an orientation sensing means to view a selected portion of stored or real time video encoded from a wide angle source and transforming that portion to a perspective-corrected field of view.
“Virtual reality” and “telepresence” have become extremely popular for use in research, industrial and entertainment applications. In “virtual reality”, or VR, a user is permitted to view a computer-generated graphical representation of a selected environment. Depending on the sophistication of the hardware and software used to generate the virtual reality environment, the user may be treated to a three dimensional view of the simulated environment. In “telepresence,” a user is permitted to view a real-world, live or recorded environment from a three dimensional perspective.
In addition, in some higher end systems the user is permitted to see different portions of the VR and telepresence environments simply by moving or orienting his head in one or more degrees of freedom. This permits the user to obtain the sensation that he is immersed in the computer-generated/real-world environment. High end devices detect pan, roll and tilt motions by the user and cause the environment to change accordingly. The pan/tilt/roll motions may be inputted by many types of input devices, such as joysticks, buttons or head orientation sensors (which may be connected to head mounted displays).
In VR applications, a continuing problem is how to render a three dimensional environment of the quality and speed users want while offering the product at a price they can afford. To make a realistic environment, such as in a three dimensional video game, many three dimensional polygons need to be rendered. This rendering requires prohibitively expensive hardware which greatly restricts the commercial value of such a system.
In relation to telepresence applications, a continuing problem with the prior art is how to encode sufficient data that a viewer may arbitrarily move his viewing perspective within the telepresence environment and not look beyond the field of view. One relatively simple solution, where the telepresence environment is based on a real three dimensional environment, is to simply use the head orientation sensors to cause a camera to track the orientation of the viewer. This has obvious limitations in that only one viewer can be in the telepresence environment at a time (since the camera can only track one viewer, and the other viewers will not typically be able to follow the head motions of the controlling viewer) and, also, prerecorded data cannot be used. Further, there is an inherent delay between a change in user viewing perspective and the time that it takes to realign the corresponding camera. These limitations greatly restrict the value of such systems.
One method for overcoming each of these limitations is to encode, either in real time or by pre-recording, a field of view largely equivalent to the entire range of motion vision of a viewer—that is, what the viewer would see if he moved his head in each permitted direction throughout the entire permissible range. For example, encoding substantially a full hemisphere of visual information would permit a plurality of viewers a reasonable degree of freedom to interactively look in a range of directions within the telepresence environment.
The difficulty with this approach is that most means for encoding such information distort, or warp, the visual data, so that the information must be corrected, or “de-warped” before a viewer can readily assimilate it. For example, a typical approach for encoding substantially a full hemisphere of information involves using a fish-eye lens. Fish-eye lenses, by their nature, convert a three dimensional scene to a two-dimensional representation by compressing the data at the periphery of the field of view. For the information to be viewed comfortably by a viewer in the VR environment, the visual data must be decompressed, or dewarped, so that it is presented in normal perspective as a two dimensional representation.
One solution to the distortion problem is proposed in U.S. Pat. No. 5,185,667 issued to Steven Zimmerman. The '667 patent describes an apparatus which effects camera control for pan, tilt, rotate and zoom while having no moving parts. Through the use of a fisheye lens and a complicated trigonometric technique, portions of the video images can be dewarped. However, the solution proposed by the '667 patent is impractical because it is insufficiently flexible to accommodate the use of other lenses besides a theoretically perfect hemispherical fisheye lens without the introduction of mathematical errors due to the misfit between the theoretical and the actual lens characteristics. This solution also introduces undesirable trigonometric complexity which slows down the transformation and is overly expensive to implement. This solution further maps each individual pixel through the complex trigonometric mapping formula further reducing the speed of the transformation from one coordinate system to another.
As a result, there has been a substantial need for a method and apparatus which can dewarp encoded wide angle visual data with sufficient speed and accuracy to permit a viewer to immerse himself in a VR or telepresence environment and look around within the environment while at the same time permitting other viewers concurrently to independently engage in the same activity on the same broadcast video signal. There has also been a need for a method and apparatus capable of providing such dewarping on a general purpose high speed computer.
The present invention overcomes the limitations of the prior art. In particular, the present invention transforms a plurality of viewing vectors within a selected portion of the wide angle, three dimensional video input into two dimensional control points and uses a comparatively simple method to transform the image between the control points to create a perspective-corrected field of view.
More specifically, the present invention is drawn to a method and apparatus which provides perspective corrected views of live, prerecorded or simulated wide angle environments. The present invention first captures a wide angle digital video input by any suitable means, such as through the combination of a high resolution video camera, hemispherical fisheye lens and real time digital image capture board. The captured image is then stored in a suitable memory means so portions of the image may be selected at a later time.
When a portion of the stored video is selected, a plurality of discrete viewing vectors in three dimensional space are chosen and transformed into a plurality of control points in a corresponding two dimensional plane. The area between the control points, which is still warped from the original wide angle image capture, is then transformed into a perspective corrected field of view through a biquadratic polynomial mapping technique. The perspective corrected field of view is then displayed on a suitable displaying apparatus, such as a monitor or head mounted display. The present invention further has the ability to sense an inputted selection, orientation and magnification of a new portion of the stored video for transformation.
In comparison with the prior art, the present invention provides a dependable, low cost, faster and more elegantly simple solution to dewarping wide angle three dimensional images. The present invention also allows for simultaneous dynamic transformation of wide angle video to multiple viewers and provides each user with the ability to access and manipulate the same or different portions of the video input. In VR applications, the present invention also allows the computer generated three dimensional polygons to be rendered in advance; thus, users may view the environments from any orientation quickly and without expensive rendering hardware.
It is therefore one object of the present invention to provide a method and apparatus for dewarping wide angle video to a perspective corrected field of view which can then be displayed.
It is another object of the present invention to provide a method and apparatus which can simultaneously transform the same or different portions of wide angle video input for different users.
It is yet another object of the present invention to provide a method and apparatus which allows selection and orientation of any portion of the video input.
It is still another object of the present invention to provide a method and apparatus for magnification of the video input.
It is still another object of the present invention to provide a method and apparatus which performs all of the foregoing objects while having no moving parts.
These and other objects of the invention will be better understood from the following Detailed Description of the Invention, taken together with the attached Figures.
Referring now to
The fisheye lens 20 causes the video output signal 24 from the camera 10 to be optically warped in a non-linear manner. Before the image can be comfortably viewed by a user, perspective-correcting measures must be taken. The digitized video signal 24 is thus transferred through the digitizing board 30 (typically but not necessarily operating at 30 frames per second) into memory 40 of the computer 150 so that portions of the video picture can be randomly accessed by a microprocessor 50, also within the computer 150, at any time.
The dewarping software is also stored in memory 40 and is applied to the video signal 24 by the microprocessor 50. The stored video signal is then transmitted from memory 40 to a special purpose ASIC 60 capable of biquadratic or higher order polynomial transformations for texture warping and interpolation. Alternatively, the texture warping ASIC 60 may be omitted and its functionality may be performed by software. Phantom lines have been used to show the optional nature of ASIC 60. The perspective corrected video signal is next transmitted to a video output stage 70, such as a standard VGA card, and from there displayed on a suitable monitor, head mounted display or the like 80. An input device 90, such as a joystick or headtracker (which senses the head movements of a user wearing a head mounted display), transmits position information through a suitable input port 100, such as a standard serial, parallel or game port, to the microprocessor 50 to control the portion of the stored video that is selected, dewarped and displayed. The input device 90 also transmits roll/pitch/yaw information to the microprocessor 50 so that a user may control the orientation of the dewarped video signal. Further, one skilled in the art will appreciate that a magnification option could be added to the input device 90 to allow the user to magnify the selected portion of video input, constrained only by the resolution of the camera 10.
The first generation of ASICs, developed for low-cost texture mapping of three dimensional graphics, mapped video images through a bilinear technique, such as is shown in
The warped image in the U-V plane, shown in
For any given viewing direction in three dimensional X-Y-Z space, we then have:
In the case of an ideal hemispheric fisheye lens, f(θ)=(RADIUS)(sin(θ)) and the lens equation which results is:
Equations (1) convert an inputted X-Y-Z three dimensional viewing vector into a corresponding control point in the U-V plane.
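The conversion from a viewing vector to a control point can be illustrated with a minimal sketch. It assumes, hypothetically, that the lens axis lies along +z, that θ is the angle between the viewing vector and that axis, and that the fisheye image is centred at a pixel (u_origin, v_origin); the function name and parameters are illustrative and do not come from the specification. The default lens function is the ideal hemispheric fisheye f(θ)=(RADIUS)(sin(θ)) named above, and any other rotationally symmetric lens function may be passed in its place.

```python
import math

def viewing_vector_to_uv(x, y, z, radius, u_origin=0.0, v_origin=0.0, lens=None):
    """Map a 3-D viewing vector to a control point in the U-V plane.

    Assumptions (not taken from the text): the lens axis is +z and
    (u_origin, v_origin) is the centre of the fisheye image in pixels.
    `lens` is a radial lens function r = f(theta); the default is the
    ideal hemispheric fisheye f(theta) = radius * sin(theta).
    """
    if lens is None:
        lens = lambda theta: radius * math.sin(theta)
    rho = math.hypot(x, y)          # distance of the vector from the lens axis
    if rho == 0.0:
        return (u_origin, v_origin)  # a vector along the axis maps to the centre
    theta = math.atan2(rho, z)      # angle between the vector and the lens axis
    r = lens(theta)                 # radial displacement in pixels
    return (u_origin + r * x / rho, v_origin + r * y / rho)
```

For the default lens this reduces to scaling the x and y components of the normalized viewing vector by RADIUS, which avoids evaluating any trigonometric function in practice.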
To dewarp a rectangular portion of the wide angle video input for a given viewing direction (x,y,z), eight other viewing vectors, which surround the primary viewing vector, are computed at the field of view angles fov_h and fov_v from the primary viewing vector, as shown in
The global bivariate polynomial transformation
is then found to describe the geometric correction necessary to transform the region within the warped 3×3 grid in the U-V plane into a perspective corrected field of view. A biquadratic polynomial transformation, N=2 in the above equations, has been selected because a second order polynomial approximates the warping characteristics of most lenses to an adequately high degree of precision and because there is existing hardware to perform the resulting biquadratic transformation. However, it will be appreciated by one skilled in the art that other polynomial transformations of higher degree could be used to increase the precision of the transformation.
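For reference, a global bivariate polynomial of degree N is conventionally written as a double sum over monomials in the destination coordinates (x, y). The following is a sketch of that standard form and of its N=2 expansion. It assumes the six-term-per-coordinate version, which is consistent with the operation counts quoted for this embodiment (five coefficient products and five additions for each of u and v); the exact index convention is an assumption.

```latex
u = \sum_{i=0}^{N} \sum_{j=0}^{N-i} a_{ij}\, x^{i} y^{j},
\qquad
v = \sum_{i=0}^{N} \sum_{j=0}^{N-i} b_{ij}\, x^{i} y^{j}

% N = 2 (biquadratic case):
u = a_{00} + a_{10}\,x + a_{01}\,y + a_{11}\,xy + a_{20}\,x^{2} + a_{02}\,y^{2}
v = b_{00} + b_{10}\,x + b_{01}\,y + b_{11}\,xy + b_{20}\,x^{2} + b_{02}\,y^{2}
```

Under this reading, the single-index names used elsewhere in the description (a1 for the coefficient of x, a4 for the coefficient of x²) would correspond to a flat ordering of the six coefficients such as 1, x, y, xy, x², y².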
Expanding the above equations (3):
The values for v and bij can be similarly found. In matrix form, the expanded equations (4) can be written as:
To discover aij and bij according to the method of the present invention, a pseudo-inverse technique is used. However, one skilled in the art will appreciate that there are methods to solve equations (5) other than by a pseudo-inverse technique, i.e. a least squares technique. The pseudo-inverse solutions for A and B in the above equation (5) are:
Therefore, for a target display of a given pixel resolution N×M, W and its pseudo-inverse (WᵀW)⁻¹Wᵀ can be calculated a priori. The values for aij and bij are then found by mapping the points in the U-V plane for the 3×3 grid of control points using the above equations (6). The biquadratic polynomial transformations of the equations (3) are then used to transform the area between the control points. In this embodiment, the determination of the coordinates of each pixel in the U-V plane takes a total of thirteen multiplication and ten addition operations. Additionally, three of the required multiplication operations per pixel may be obviated by storing a table of xy, x² and y² values for each xy coordinate pair in the dewarped destination image. In another embodiment, the “x” values which do not vary as “y” changes (i.e. a1*x+a4*x² and b1*x+b4*x²) may also be precomputed and stored. Likewise, the “y” values which do not vary as “x” changes may be precomputed and stored. These further optimizations reduce the operations needed to determine the coordinates of each pixel in the U-V plane to two multiplication and four addition operations.
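The fitting and evaluation steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patented implementation: the coefficient order (1, x, y, xy, x², y²) is assumed from the a1 and a4 terms mentioned above, the helper names are hypothetical, and the pseudo-inverse is applied by solving the normal equations (WᵀW)c = Wᵀt rather than by forming the inverse explicitly.

```python
def terms(x, y):
    """Monomials [1, x, y, xy, x^2, y^2] (assumed coefficient order a0..a5)."""
    return [1.0, x, y, x * y, x * x, y * y]

def solve(M, rhs):
    """Solve the square system M c = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [val / p for val in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

def fit_biquadratic(xy_ctrl, targets):
    """Least-squares coefficients from the normal equations (W^T W) c = W^T t.

    xy_ctrl: destination (x, y) of each control point (e.g. a 3x3 grid).
    targets: the u (or v) coordinate of each control point in the warped image.
    """
    W = [terms(x, y) for x, y in xy_ctrl]
    k = len(W[0])
    WtW = [[sum(row[i] * row[j] for row in W) for j in range(k)] for i in range(k)]
    Wtt = [sum(row[i] * t for row, t in zip(W, targets)) for i in range(k)]
    return solve(WtW, Wtt)

def dewarp_coordinates(a, b, width, height):
    """Source (u, v) for each destination pixel, with x-only and y-only terms hoisted."""
    # Terms that depend only on x are computed once per column.
    ux = [a[1] * x + a[4] * x * x for x in range(width)]
    vx = [b[1] * x + b[4] * x * x for x in range(width)]
    out = []
    for y in range(height):
        # Terms that depend only on y are computed once per row.
        uy = a[0] + a[2] * y + a[5] * y * y
        vy = b[0] + b[2] * y + b[5] * y * y
        # Per pixel: two multiplications by the xy product and four additions.
        # (x * y would come from a lookup table in the scheme described.)
        out.append([(ux[x] + uy + a[3] * (x * y), vx[x] + vy + b[3] * (x * y))
                    for x in range(width)])
    return out
```

Because W depends only on the destination grid, the normal-equation matrix could be factored once per display resolution and reused for every new viewing direction; only the control-point targets change from frame to frame.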
It will be appreciated by one skilled in the art that the accuracy of the dewarping transformation will increase as the number of transformed viewing vectors increases; for example, a 4×4 grid of control points will produce a more accurate transformation than a 3×3 grid of control points. However, the amount of increase in accuracy quickly draws near an asymptote as the number of control points is increased. One skilled in the art will recognize, therefore, that there is little reason to increase the number of viewing vectors to more than half of the total number of pixels in the displayed region.
It will be further appreciated by one skilled in the art that the selection of a rectangular shape of the video input could be changed to another shape and still be within the scope of the invention. Further, the number of control points could be increased or decreased to correspondingly increase or decrease the accuracy of the transformation. Further still, an image filtering stage could be applied during the inverse texture mapping without deviating from the present invention.
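As an illustration of such a filtering stage, the inverse mapping generally yields fractional (u, v) source coordinates, and one common way to sample the source image at those coordinates is bilinear interpolation. The sketch below is an assumption about how that could be done in software; the function name, the row-major list-of-lists image layout and the clamping at the image border are all illustrative choices rather than details from the specification.

```python
def bilinear_sample(image, u, v):
    """Sample a 2-D grid of intensities at fractional (u, v).

    `image` is a list of rows; u indexes columns and v indexes rows.
    Coordinates outside the image are clamped to the nearest edge.
    """
    rows, cols = len(image), len(image[0])
    u = min(max(u, 0.0), cols - 1.0)
    v = min(max(v, 0.0), rows - 1.0)
    u0, v0 = int(u), int(v)                       # top-left neighbour
    u1, v1 = min(u0 + 1, cols - 1), min(v0 + 1, rows - 1)
    fu, fv = u - u0, v - v0                       # fractional offsets
    top = image[v0][u0] * (1 - fu) + image[v0][u1] * fu
    bottom = image[v1][u0] * (1 - fu) + image[v1][u1] * fu
    return top * (1 - fv) + bottom * fv
```

Sampling at integer coordinates returns the stored pixel unchanged, so the filter only affects pixels whose source position falls between samples.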
The foregoing description describes an inverse texture mapping technique whereby the biquadratic output (X-Y) is mapped to the input (U-V). In the case where a forward texture mapping ASIC is used, the coordinates for the destination control points in X-Y must be supplied so that the rectilinear source texture region can be mapped from the U-V plane, as provided by the inverse texture mapping software solution above, to the X-Y plane. An example of a forward texture mapping ASIC is the NV-1 chip sold by N-Vidia Corporation.
Thus, the same control points for the U-V plane map to the corners of the display screen in the X-Y plane. The warped regions outside the bounding box may be clipped by hardware or software so that they are not visible on the display screen.
The source pixel coordinates, which are fed from the host CPU, are converted to aij and bij coordinates for forward mapping in the forward mapping solution stage 240, again using techniques mathematically equivalent to those of the equations (7). A series of instructions is further sent from the host CPU to the chip 230 and received by a control unit 260. The control unit 260 sequences and controls the operation of the other functional stages within the chip 230. The host CPU also directs a linear sequence of source pixels, which are to be warped, to an interpolation sampler stage 250 within chip 230. Optionally, these can be subject to a low-pass spatial prefiltering stage 270 prior to transmission to the chip, to reduce sampling error during the warping process. Thus, within the chip 230, the source pixels and the aij and bij coordinates are both fed to the interpolation sampler 250. For each input pixel, one or more destination pixels together with their corresponding X-Y destination coordinates are produced. These warped pixels are then fed into the video frame buffer 280, located outside of the ASIC chip 230. Optionally, anti-aliasing circuitry 290 within the chip performs interpolation on output pixel values, such as bilinear interpolation between adjacent pixel samples, to minimize the effects of output spatial quantization error. One skilled in the art will recognize that the preceding hardware solution is merely exemplary and that there are many such solutions which could be employed and still fall within the scope of the present invention.
The techniques described herein may also be applied to synthetic images. Such images may be created entirely within a computer environment and may be composed of three dimensional geometrical descriptions of objects which can be produced by computer graphics rendering techniques generally known to those skilled in the art. Typically, synthetic images are produced by linear perspective projection, emulating the physical process of imaging onto planar film with a lens having a narrow field of view and producing a view of the synthetic environment as seen through a cone or truncated three dimensional pyramid. The color, intensity shading and other simulated physical properties of each pixel on the planar image grid can also be readily determined. For a synthetic environment, the viewing vectors in X-Y-Z space are rewritten in terms of the warped control points coordinates in the U-V plane
A direction vector in X-Y-Z space can thus be generated for each pixel in the U-V plane in the synthetic wide angle image which is created. For a perfect hemispherical fisheye, the generated vectors point in all directions within the created hemisphere, spaced to the limits of the resolution of the U-V image. This simulates a non-planar image grid, such as the projection of the synthetic environment onto a surface of a spherical image substrate. In this way, a fisheye or other wide angle image of a synthetic three dimensional environment can be produced. This technique can be used for the production of three dimensional modeled cartoons or interactive home gaming applications, among others.
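For the perfect hemispherical fisheye case, the per-pixel direction vector can be sketched by inverting the ideal lens relation r = (RADIUS)(sin(θ)). The sketch assumes, hypothetically, that the lens axis is +z and that the image centre is at (u_origin, v_origin); the function name is illustrative, and pixels outside the fisheye circle are reported as having no direction.

```python
import math

def uv_to_viewing_vector(u, v, radius, u_origin=0.0, v_origin=0.0):
    """Unit direction vector for a pixel of an ideal hemispheric fisheye image.

    Assumes the lens axis is +z and r = radius * sin(theta), so the x and y
    components of the unit vector are the pixel offsets divided by radius.
    Returns None for pixels outside the fisheye circle.
    """
    x = (u - u_origin) / radius
    y = (v - v_origin) / radius
    s = x * x + y * y
    if s > 1.0:
        return None
    return (x, y, math.sqrt(1.0 - s))
```

A renderer could cast one ray along each returned vector and store the resulting color at (u, v), which yields a synthetic wide angle image that the same dewarping path can later consume.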
One skilled in the art will appreciate that the present invention may be applied to a sequence of wide angle images changing in time, either live or recorded to an analog or digital storage media. The image substrate for recordation may be an electronic two dimensional image sensor, such as a CCD chip, or photographic film capable of chemically recording the image for subsequent transfer into digital form.
One skilled in the art will also appreciate that the present invention is not limited to transforming wide angle video onto a planar (U-V) surface, but that it is within the scope of the invention to transform wide angle video onto any suitable surface for displaying the video for the user.
Further, two real world, wide angle lenses can be positioned opposite each other to permit near 360 degrees of total coverage of the environment. If seamless omnidirectional coverage of an environment is required, this could be achieved with six wide angle lenses positioned along each direction of a three dimensional axis, as shown in
Further still, the same video signal may be simultaneously transmitted to an arbitrarily large number of viewers all having the ability to simultaneously dewarp the same or different portions of the video input, as in the case of interactive cable TV viewing or multiple player online interactive video game playing.
Having fully described the preferred embodiment of the present invention, it will be apparent to those of ordinary skill in the art that numerous alternatives and equivalents exist which do not depart from the invention set forth above. It is therefore to be understood that the present invention is not to be limited by the foregoing description, but only by the appended claims.
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US3953111||Nov 4, 1974||Apr 27, 1976||Mcdonnell Douglas Corporation||Non-linear lens|
|US4728839||Feb 24, 1987||Mar 1, 1988||Remote Technology Corporation||Motorized pan/tilt head for remote control|
|US4751660||Jun 26, 1986||Jun 14, 1988||Sony Corporation||Determining orientation of transformed image|
|US4754269||Mar 5, 1985||Jun 28, 1988||Fanuc Ltd||Graphic display method for displaying a perspective view of an object on a CRT|
|US4772942||Jan 7, 1987||Sep 20, 1988||Pilkington P.E. Limited||Display system having wide field of view|
|US5023725||Oct 23, 1989||Jun 11, 1991||Mccutchen David||Method and apparatus for dodecahedral imaging system|
|US5048102||Sep 14, 1988||Sep 10, 1991||Commissariat A L'energie Atomique||Multiple interpolation process for image correction|
|US5067019||Mar 31, 1989||Nov 19, 1991||The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration||Programmable remapper for image processing|
|US5173948||Mar 29, 1991||Dec 22, 1992||The Grass Valley Group, Inc.||Video image mapping system|
|US5175808||Sep 12, 1989||Dec 29, 1992||Pixar||Method and apparatus for non-affine image warping|
|US5185667||May 13, 1991||Feb 9, 1993||Telerobotics International, Inc.||Omniview motionless camera orientation system|
|US5384588||Jan 31, 1994||Jan 24, 1995||Telerobotics International, Inc.||System for omindirectional image viewing at a remote location without the transmission of control signals to select viewing parameters|
|US5422987||Aug 19, 1992||Jun 6, 1995||Fujitsu Limited||Method and apparatus for changing the perspective view of a three-dimensional object image displayed on a display screen|
|US5796426||May 27, 1994||Aug 18, 1998||Warp, Ltd.||Wide-angle image dewarping method and apparatus|
|US5877801||Jun 5, 1997||Mar 2, 1999||Interactive Pictures Corporation||System for omnidirectional image viewing at a remote location without the transmission of control signals to select viewing parameters|
|US6005611||Aug 4, 1998||Dec 21, 1999||Be Here Corporation||Wide-angle image dewarping method and apparatus|
|US6346967||Oct 28, 1999||Feb 12, 2002||Be Here Corporation||Method apparatus and computer program products for performing perspective corrections to a distorted image|
|US7873233 *||Oct 17, 2006||Jan 18, 2011||Seiko Epson Corporation||Method and apparatus for rendering an image impinging upon a non-planar surface|
|EP0610863B1||Feb 7, 1994||Nov 14, 2001||Interactive Pictures Corporation||Omniview motionless camera surveillance system|
|GB2221118A||Title not available|
|WO1992021208A1||May 5, 1992||Nov 26, 1992||Telerobotics Int Inc||Omniview motionless camera orientation system|
|1||Rebiai et al. "Image Distortion from Zoom Lenses: modeling and digital correction," IBC 1992, pp. 438-441, IEE London, UK, Jul. 1992.|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US8525871 *||Nov 13, 2008||Sep 3, 2013||Adobe Systems Incorporated||Content-aware wide-angle images|
|US20100033551 *||Nov 13, 2008||Feb 11, 2010||Adobe Systems Incorporated||Content-Aware Wide-Angle Images|
|US20140314336 *||Dec 18, 2012||Oct 23, 2014||Dai Nippon Printing Co., Ltd.||Image processing device, image processing method, program for image processing device, recording medium, and image display device|
|U.S. Classification||348/207.99, 348/335|
|International Classification||H04N5/225, H04N5/262, G06T3/00|
|Cooperative Classification||H04N5/2628, G06T3/0018|
|European Classification||H04N5/262T, G06T3/00C2|
|Nov 18, 2009||AS||Assignment|
Owner name: B.H. IMAGE CO. LLC, DELAWARE
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BE HERE CORPORATION;REEL/FRAME:023535/0040
Effective date: 20071117
|Oct 9, 2012||CC||Certificate of correction|
|Oct 11, 2013||FPAY||Fee payment|
Year of fee payment: 8