US 20050264857 A1
The present invention discloses a three dimensional display system comprising a three dimensional horizontal perspective display and a 3-D audio system, such as binaural simulation, to lend realism to the three dimensional display. The three dimensional display system can further comprise a second display, together with a curvilinear blending display section to merge the various images. The multi-plane display surface can accommodate the viewer by adjusting the various images and 3-D sound according to the viewer's eyepoint and earpoint locations.
1. A method of three dimensional image display by horizontal perspective projection and 3-D audio system projection, the horizontal perspective projection comprising a display of horizontal perspective images according to a predetermined projection eyepoint, and the 3-D audio system projection comprising providing 3-D sound according to a predetermined projection earpoint, the method comprising the steps of:
detecting an eyepoint location of a viewer;
displaying a horizontal perspective image using the detected eyepoint location as the projection eyepoint; and
projecting a 3-D sound to the viewer using the eyepoint location as the projection earpoint.
2. A method as in
3. A method as in
4. A method of three dimensional image display by horizontal perspective projection and 3-D audio system projection, the horizontal perspective projection comprising a display of horizontal perspective images according to a predetermined projection eyepoint, and the 3-D audio system projection comprising providing 3-D sound according to a predetermined projection earpoint, the method comprising the steps of:
continuously scanning to detect an eyepoint location of a viewer;
calculating a new horizontal perspective image using the detected eyepoint location as the projection eyepoint;
displaying the new image;
calculating a new 3-D sound using the detected eyepoint location as the projection earpoint; and
displaying the new 3-D sound.
5. A method as in
6. A method as in
7. A method as in
8. A method as in
9. A method as in
10. A method as in
11. A method as in
12. A method as in
13. A method as in
14. A method as in
15. A method as in
16. A method as in
17. A method as in
18. A method of image display by horizontal perspective projection and 3-D audio system projection, the horizontal perspective projection comprising a display of horizontal perspective images according to a predetermined projection eyepoint, and the 3-D audio system projection comprising providing 3-D sound according to a predetermined projection earpoint, the method comprising the steps of:
displaying a first image onto a first display;
continuously scanning to detect an eyepoint location of a viewer;
calculating a second horizontal perspective image using the detected eyepoint location as the projection eyepoint;
displaying the second image onto a second horizontal perspective display; and
projecting a 3-D sound to the viewer using the eyepoint location as the projection earpoint.
19. A method as in
20. A method as in
This application claims priority from U.S. provisional applications Ser. No. 60/576,187 filed Jun. 1, 2004, entitled “Multi plane horizontal perspective display”; Ser. No. 60/576,189 filed Jun. 1, 2004, entitled “Multi plane horizontal perspective hand on simulator”; Ser. No. 60/576,182 filed Jun. 1, 2004, entitled “Binaural horizontal perspective display”; and Ser. No. 60/576,181 filed Jun. 1, 2004, entitled “Binaural horizontal perspective hand on simulator” which are incorporated herein by reference.
This application is related to co-pending application Ser. No. 11/098,681 filed Apr. 4, 2005, entitled “Horizontal projection display”; Ser. No. 11/098,685 filed Apr. 4, 2005, entitled “Horizontal projection display”, Ser. No. 11/098,667 filed Apr. 4, 2005, entitled “Horizontal projection hands-on simulator”; Ser. No. 11/098,682 filed Apr. 4, 2005, entitled “Horizontal projection hands-on simulator”; “Multi plane horizontal perspective display” filed May 27, 2005; “Multi plane horizontal perspective hand on simulator” filed May 27, 2005; “Binaural horizontal perspective display” filed May 27, 2005; and “Binaural horizontal perspective hand on simulator” filed May 27, 2005.
This invention relates to a three-dimensional display system, and in particular, to a multiple view display system.
Ever since humans began to communicate through pictures, they faced a dilemma of how to accurately represent the three-dimensional world they lived in. Sculpture was used to successfully depict three-dimensional objects, but was not adequate to communicate spatial relationships between objects and within environments. To do this, early humans attempted to “flatten” what they saw around them onto two-dimensional, vertical planes (e.g. paintings, drawings, tapestries, etc.). Scenes where a person stood upright, surrounded by trees, were rendered relatively successfully on a vertical plane. But how could they represent a landscape, where the ground extended out horizontally from where the artist was standing, as far as the eye could see?
The answer is three dimensional illusions. The two dimensional pictures must provide a number of cues of the third dimension to the brain to create the illusion of three dimensional images. This effect is realistically achievable because the brain is quite accustomed to it. The three dimensional real world is always converted into a two dimensional (e.g. height and width) projected image at the retina, a concave surface at the back of the eye. From this two dimensional image, the brain, through experience and perception, generates the depth information to form the three dimensional visual image from two types of depth cues: monocular (one eye perception) and binocular (two eye perception). In general, binocular depth cues are innate and biological while monocular depth cues are learned and environmental.
The major binocular depth cues are convergence and retinal disparity. The brain measures the amount of convergence of the eyes to provide a rough estimate of the distance since the angle between the line of sight of each eye is larger when an object is closer. The disparity of the retinal images due to the separation of the two eyes is used to create the perception of depth. The effect is called stereoscopy where each eye receives a slightly different view of a scene, and the brain fuses them together using these differences to determine the ratio of distances between nearby objects.
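The relation between convergence and distance can be illustrated with a small sketch. It assumes the fixated object lies straight ahead on the midline between the eyes, and it borrows the two inch eye separation used in the stereo rendering example later in this description; the function name and default value are illustrative only.

```python
import math


def convergence_angle_deg(distance, eye_separation=2.0):
    """Angle, in degrees, between the two lines of sight.

    Assumes the object sits on the midline between the eyes at `distance`
    (same units as `eye_separation`; inches are assumed here).
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    # Each eye rotates inward by atan(half separation / distance).
    return math.degrees(2.0 * math.atan((eye_separation / 2.0) / distance))


if __name__ == "__main__":
    for d in (6.0, 24.0, 120.0):
        print(d, round(convergence_angle_deg(d), 2))
```

The angle shrinks as the distance grows, which is why convergence gives only a rough distance estimate for far objects.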
Binocular cues provide a very powerful perception of depth. However, there are also depth cues available with only one eye, called monocular depth cues, which create an impression of depth on a flat image. The major monocular cues are: overlapping, relative size, linear perspective and light and shadow. When an object is viewed partially covered, this pattern of blocking is used as a cue to determine that the object is farther away. When two objects are known to be the same size and one appears smaller than the other, this pattern of relative size is used as a cue to assume that the smaller object is farther away. The cue of relative size also provides the basis for the cue of linear perspective, where the farther away the lines are from the observer, the closer together they will appear, since parallel lines in a perspective image appear to converge towards a single point. The light falling on an object from a certain angle can provide the cue for the form and depth of an object. The distribution of light and shadow on an object is a powerful monocular cue for depth, provided by the biologically correct assumption that light comes from above.
Perspective drawing, together with relative size, is most often used to achieve the illusion of three dimension depth and spatial relationships on a flat (two dimension) surface, such as paper or canvas. Through perspective, three dimension objects are depicted on a two dimension plane, but “trick” the eye into appearing to be in three dimension space. The first theoretical treatise for constructing perspective, Depictura, was published in the early 1400's by the architect, Leone Battista Alberti. Since the introduction of his book, the details behind “general” perspective have been very well documented. However, the fact that there are a number of other types of perspectives is not well known. Some examples are military, cavalier, isometric, and dimetric, as shown at the top of
Of special interest is the most common type of perspective, called central perspective, shown at the bottom left of
The vast majority of images, including central perspective images, are displayed, viewed and captured in a plane perpendicular to the line of vision. Viewing the images at an angle different from 90° would result in image distortion, meaning a square would be seen as a rectangle when the viewing surface is not perpendicular to the line of vision. However, there is a little known class of images, which we call “horizontal perspective,” where the image appears distorted when viewed head on but displays a three dimensional illusion when viewed from the correct viewing position. In horizontal perspective, the angle between the viewing surface and the line of vision is preferably 45° but can be almost any angle, and the viewing surface is preferably horizontal (hence the name “horizontal perspective”), but it can be any surface, as long as the line of vision forms a non-perpendicular angle to it.
Horizontal perspective images offer a realistic three dimensional illusion, but are little known primarily due to the narrow viewing location (the viewer's eyepoint has to coincide precisely with the image projection eyepoint), and the complexity involved in projecting the two dimensional image or the three dimensional model into the horizontal perspective image.
The generation of horizontal perspective images requires considerably more expertise than that of conventional perpendicular images. The conventional perpendicular images can be produced directly from the viewer or camera point. One need simply open one's eyes or point the camera in any direction to obtain the images. Further, with much experience in viewing three dimensional depth cues from perpendicular images, viewers can tolerate a significant amount of distortion generated by deviations from the camera point. In contrast, the creation of a horizontal perspective image requires much manipulation. A conventional camera, by projecting the image into the plane perpendicular to the line of sight, would not produce a horizontal perspective image. Making a horizontal drawing requires much effort and is very time consuming. Further, since humans have limited experience with horizontal perspective images, the viewer's eye must be positioned precisely where the projection eyepoint is to avoid image distortion. Horizontal perspective, with its difficulties, has therefore received little attention.
For realistic three dimensional display, binaural or three dimensional audio simulation is also needed.
The present invention recognizes that the personal computer is perfectly suitable for horizontal perspective display. It is personal, thus it is designed for the operation of one person, and the computer, with its powerful microprocessor, is well capable of rendering various horizontal perspective images to the viewer.
Thus the present invention discloses a three dimensional display system comprising at least a display surface displaying three dimensional horizontal perspective images. The other display surfaces can display two dimensional images, or preferably three dimensional central perspective images. Further, the display surfaces can have a curvilinear blending display section to merge the various images. The multi-plane display system can comprise various camera eyepoints, one for the horizontal perspective images, one for the central perspective images, and optionally one for the curvilinear blending display surface. The multi-plane display surface can further adjust the various images to accommodate the position of the viewer. By changing the displayed images to keep the camera eyepoints of the horizontal perspective and central perspective images in the same position as the viewer's eyepoint, the viewer's eye is always positioned at the proper viewing position to perceive the three dimensional illusion, thus minimizing the viewer's discomfort and distortion. The display can accept manual input, such as from a computer mouse, trackball, joystick, tablet, etc., to re-position the horizontal perspective images. The display can also automatically re-position the images based on an input device automatically providing the viewer's viewpoint location.
Further, the display also includes three dimensional audio, such as binaural simulation, to lend realism to the three dimensional display.
The present invention discloses a multi-plane display system comprising at least two display surfaces, one of which is capable of projecting a three dimensional illusion based on horizontal perspective projection.
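One ingredient of binaural simulation, the difference in arrival time of a sound at the two ears, can be sketched as follows. This is only an illustration under stated assumptions: the ears are placed symmetrically about the earpoint along the x axis, the ear spacing of 0.18 m and the speed of sound of 343 m/s are assumed values, and only the straight-line path difference is modeled. A real binaural system would also account for the shadowing and filtering of the head and outer ears, which this sketch ignores.

```python
import math


def interaural_time_difference(earpoint, source,
                               ear_spacing=0.18, speed_of_sound=343.0):
    """Arrival-time difference (seconds) between the left and right ears.

    Positive values mean the sound reaches the right ear first.
    Coordinates are in metres; both defaults are assumptions.
    """
    ex, ey, ez = earpoint
    left_ear = (ex - ear_spacing / 2.0, ey, ez)
    right_ear = (ex + ear_spacing / 2.0, ey, ez)
    d_left = math.dist(left_ear, source)
    d_right = math.dist(right_ear, source)
    return (d_left - d_right) / speed_of_sound


if __name__ == "__main__":
    # A source one metre to the right of the listener's earpoint.
    print(interaural_time_difference((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)))
```

Recomputing this value whenever the detected earpoint moves is one simple way the 3-D sound could follow the viewer.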
In general, the present invention multi-plane display system can be used to display three dimensional images and has obvious utility to many industrial applications such as manufacturing design reviews, ergonomic simulation, safety and training, video games, cinematography, scientific 3D viewing, and medical and other data displays.
Horizontal perspective is a little-known perspective, of which we found only two books that describe its mechanics: Stereoscopic Drawing (©1990) and How to Make Anaglyphs (©1979, out of print). Although these books describe this obscure perspective, they do not agree on its name. The first book refers to it as a “free-standing anaglyph,” and the second, a “phantogram.” Another publication called it “projective anaglyph” (U.S. Pat. No. 5,795,154 by G. M. Woods, Aug. 18, 1998). Since there is no agreed-upon name, we have taken the liberty of calling it “horizontal perspective.” Normally, as in central perspective, the plane of vision, at right angle to the line of sight, is also the projected plane of the picture, and depth cues are used to give the illusion of depth to this flat image. In horizontal perspective, the plane of vision remains the same, but the projected image is not on this plane. It is on a plane angled to the plane of vision. Typically, the image would be on the ground level surface. This means the image will be physically in the third dimension relative to the plane of vision. Thus horizontal perspective can be called horizontal projection.
In horizontal perspective, the object is to separate the image from the paper, and fuse the image to the three dimension object that projects the horizontal perspective image. Thus the horizontal perspective image must be distorted so that the visual image fuses to form the free standing three dimensional figure. It is also essential the image is viewed from the correct eye points, otherwise the three dimensional illusion is lost. In contrast to central perspective images which have height and width, and project an illusion of depth, and therefore the objects are usually abruptly projected and the images appear to be in layers, the horizontal perspective images have actual depth and width, and illusion gives them height, and therefore there is usually a graduated shifting so the images appear to be continuous.
In other words, in Image A, the real-life three dimension object (three blocks stacked slightly above each other) was drawn by the artist closing one eye, and viewing along a line of sight perpendicular to the vertical drawing plane. The resulting image, when viewed vertically, straight on, and through one eye, looks the same as the original image.
In Image B, the real-life three dimension object was drawn by the artist closing one eye, and viewing along a line of sight 45° to the horizontal drawing plane. The resulting image, when viewed horizontally, at 45° and through one eye, looks the same as the original image.
One major difference between the central perspective shown in Image A and the horizontal perspective shown in Image B is the location of the display plane with respect to the projected three dimensional image. In the horizontal perspective of Image B, the display plane can be adjusted up and down, and therefore the projected image can be displayed in the open air above the display plane, i.e. a physical hand can touch (or more likely pass through) the illusion, or it can be displayed under the display plane, i.e. one cannot touch the illusion because the display plane physically blocks the hand. This is the nature of horizontal perspective, and as long as the camera eyepoint and the viewer eyepoint are at the same place, the illusion is present. In contrast, in the central perspective of Image A, the three dimensional illusion is likely to be only inside the display plane, meaning one cannot touch it. To bring the three dimensional illusion outside of the display plane to allow the viewer to touch it, central perspective would need an elaborate display scheme such as surround image projection and large volume.
Now look at
Again, the reason your one open eye needs to be at this precise location is because both central and horizontal perspective not only define the angle of the line of sight from the eye point; they also define the distance from the eye point to the drawing. This means that
Notice that in
The generation of horizontal perspective images requires considerably more expertise than that of central perspective images. Even though both methods seek to provide the viewer with a three dimensional illusion resulting from a two dimensional image, central perspective images directly produce the three dimensional landscape from the viewer or camera point. In contrast, the horizontal perspective image appears distorted when viewed head on, but this distortion has to be precisely rendered so that, when viewed at a precise location, the horizontal perspective image produces a three dimensional illusion.
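The precise distortion described above can be sketched as a ray-plane intersection: each point of the three dimensional object is drawn where the line from the eyepoint through that point meets the horizontal display plane. The coordinate convention (display plane at z = 0, z pointing up) and the function name are assumptions made for illustration, not taken from this description.

```python
def project_to_horizontal_plane(eye, point):
    """Return the (x, y) spot on the plane z = 0 where `point` must be drawn.

    `eye` and `point` are (x, y, z) tuples in the same units. Points above
    the plane (z > 0) and below it (z < 0) are both handled, provided the
    point is lower than the eye.
    """
    ex, ey, ez = eye
    px, py, pz = point
    if pz >= ez:
        raise ValueError("point must be below the eyepoint height")
    # Parameter along the ray eye -> point at which z reaches 0.
    t = ez / (ez - pz)
    return (ex + t * (px - ex), ey + t * (py - ey))


if __name__ == "__main__":
    eye = (0.0, -12.0, 12.0)   # 12 inches back and 12 inches up
    print(project_to_horizontal_plane(eye, (0.0, 0.0, 0.0)))
    print(project_to_horizontal_plane(eye, (0.0, 0.0, 6.0)))
```

With the eye 12 inches back and 12 inches up, the top of a 6 inch post standing at the origin is drawn 12 inches beyond its base, which is the stretching that makes the image look distorted when seen head on.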
The present invention multi-plane display system promotes horizontal perspective projection viewing by providing the viewer with the means to adjust the displayed images to maximize the illusion viewing experience. By employing the computation power of the microprocessor and a real time display, the horizontal perspective display of the present invention is shown in
The horizontal perspective display system, shown in
The input device can be operated manually or automatically. The input device can detect the position and orientation of the viewer's eyepoint, so that the image can be computed and projected onto the display according to the detection result. Alternatively, the input device can be made to detect the position and orientation of the viewer's head along with the orientation of the eyeballs. The input device can comprise an infrared detection system to detect the position of the viewer's head, allowing the viewer freedom of head movement. Other embodiments of the input device can use a triangulation method of detecting the viewer's eyepoint location, such as a CCD camera providing position data suitable for the head tracking objectives of the invention. The input device can also be manually operated by the viewer, such as a keyboard, mouse, trackball, joystick, or the like, to indicate the correct display of the horizontal perspective display images.
The head or eye-tracking system can comprise a base unit and a head-mounted sensor on the head of the viewer. The head-mounted sensor produces signals showing the position and orientation of the viewer in response to the viewer's head movement and eye orientation. These signals can be received by the base unit and are used to compute the proper three dimensional projection images. The head or eye tracking system can also comprise infrared cameras to capture images of the viewer's eyes. Using the captured images and other techniques of image processing, the position and orientation of the viewer's eyes can be determined, and then provided to the base unit. The head and eye tracking can be done in real time, at small enough time intervals to provide continuous tracking of the viewer's head and eyes.
The multi-plane display system comprises a number of new computer hardware and software elements and processes, and together with existing components creates a horizontal perspective viewing simulator. For the viewer to experience these unique viewing simulations, the computer hardware viewing surface is preferably situated horizontally, such that the viewer's line of sight is at a 45° angle to the surface. Typically, this means that the viewer is standing or seated vertically, and the viewing surface is horizontal to the ground. Note that although the viewer can experience hands-on simulations at viewing angles other than 45° (e.g. 55°, 30° etc.), 45° is the optimal angle for the brain to recognize the maximum amount of spatial information in an open space image. Therefore, for simplicity's sake, we use “45°” throughout this document to mean “an approximate 45 degree angle”. Further, while a horizontal viewing surface is preferred since it simulates viewers' experience with the horizontal ground, any viewing surface could offer a similar three dimensional illusion experience. The horizontal perspective illusion can appear to be hanging from a ceiling by projecting the horizontal perspective images onto a ceiling surface, or appear to be floating from a wall by projecting the horizontal perspective images onto a vertical wall surface.
The viewing simulations are generated within a three dimensional graphics view volume, both situated above and below the physical viewing surface. Mathematically, the computer-generated x, y, z coordinates of the Angled Camera point form the vertex of an infinite “pyramid”, whose sides pass through the x, y, z coordinates of the Reference/Horizontal Plane.
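A minimal sketch of a test for membership in this view volume follows. It assumes the sides of the pyramid pass through the edges of a rectangular Reference/Horizontal Plane centered at the origin in the plane z = 0; the rectangle dimensions and the function name are hypothetical.

```python
def inside_view_pyramid(eye, point, half_width, half_depth):
    """True if `point` lies inside the pyramid whose vertex is `eye`.

    The pyramid's sides are assumed to pass through the edges of a
    rectangle spanning [-half_width, half_width] in x and
    [-half_depth, half_depth] in y on the plane z = 0.
    """
    ex, ey, ez = eye
    px, py, pz = point
    if pz >= ez:
        return False  # at or above the vertex, outside the volume
    # Follow the ray from the eye through the point down to z = 0.
    t = ez / (ez - pz)
    hit_x = ex + t * (px - ex)
    hit_y = ey + t * (py - ey)
    return abs(hit_x) <= half_width and abs(hit_y) <= half_depth


if __name__ == "__main__":
    eye = (0.0, -12.0, 12.0)
    print(inside_view_pyramid(eye, (0.0, 0.0, 6.0), 8.0, 10.0))
    print(inside_view_pyramid(eye, (0.0, 0.0, 6.0), 8.0, 12.0))
```

Because the test follows the ray all the way to the plane, it treats points above and below the physical viewing surface in the same way.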
For the viewer to view open space images on their physical viewing device, the device must be positioned properly, which usually means the physical Reference Plane is placed horizontally to the ground. Whatever the viewing device's position relative to the ground, the Reference/Horizontal Plane must be at approximately a 45° angle to the viewer's line-of-sight for optimum viewing.
One way the viewer might perform this step is to position their CRT computer monitor on the floor in a stand, so that the Reference/Horizontal Plane is horizontal to the floor. This example uses a CRT-type television or computer monitor, but it could be any type of viewing device, display screen, monochromic or color display, luminescent, TFT, phosphorescent, computer projectors and other method of image generation in general, providing a viewing surface at approximately a 45° angle to the viewer's line-of-sight.
The display needs to know the viewer's eyepoint to properly display the horizontal perspective images. One way to do this is for the viewer to supply the horizontal perspective display with their eye's real-world x, y, z location and line-of-sight information relative to the center of the physical Reference/Horizontal Plane. For example, the viewer tells the horizontal perspective display that their physical eye will be located 12 inches up, and 12 inches back, while looking at the center of the Reference/Horizontal Plane. The horizontal perspective display then maps the computer-generated Angled Camera point to the viewer's eyepoint physical coordinates and line-of-sight. Another way is for the viewer to manually adjust an input device such as a mouse, with the horizontal perspective display adjusting its image projection eyepoint until the proper eyepoint location is experienced by the viewer. Yet another way is to use triangulation with an infrared device or camera to automatically locate the viewer's eyes.
The present invention also allows the viewer to move around the three dimensional display without suffering great distortion, since the display can track the viewer's eyepoint and re-display the images correspondingly. This is in contrast to the conventional prior art three dimensional image display, where the image would be computed and projected as seen from a single viewing point, so that any movement by the viewer away from the intended viewing point in space would cause gross distortion.
The display system can further comprise a computer capable of re-calculating the projected image given the movement of the eyepoint location. The horizontal perspective images can be very complex, tedious to create, or created in ways that are not natural for artists or cameras, and therefore require the use of a computer system for the tasks. Displaying a three-dimensional image of an object with complex surfaces, or creating an animation sequence, demands a lot of computational power and time, and is therefore a task well suited to the computer. Three dimensional capable electronics and computing hardware devices and real-time computer-generated three dimensional computer graphics have advanced significantly recently, with marked innovations in visual, audio and tactile systems, and have produced excellent hardware and software products to generate realism and more natural computer-human interfaces.
The multi-plane display system of the present invention is not only in demand for entertainment media such as televisions, movies, and video games but is also needed in various fields such as education (displaying three-dimensional structures) and technological training (displaying three-dimensional equipment). There is an increasing demand for three-dimensional image displays, which can be viewed from various angles to enable observation of real objects using object-like images. The horizontal perspective display system is also capable of substituting a computer-generated reality for the viewer's observation. The systems may include audio, visual, motion and inputs from the user in order to create a complete experience of three dimensional illusion.
The input for the horizontal perspective system can be a two dimensional image, several images combined to form one single three dimensional image, or a three dimensional model. The three dimensional image or model conveys much more information than a two dimensional image, and by changing the viewing angle, the viewer will get the impression of seeing the same object from different perspectives continuously.
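The example of an eye 12 inches up and 12 inches back can be checked with a short calculation of the angle between the line of sight and the Reference/Horizontal Plane. The function name is illustrative, and the sketch assumes the viewer looks at the center of the plane as in the example.

```python
import math


def line_of_sight_angle_deg(height, setback):
    """Angle in degrees between the line of sight and the horizontal plane.

    `height` is the eye's distance above the plane and `setback` its
    horizontal distance from the center of the plane (same units).
    """
    return math.degrees(math.atan2(height, setback))


if __name__ == "__main__":
    print(line_of_sight_angle_deg(12.0, 12.0))  # equal height and setback
    print(line_of_sight_angle_deg(12.0, 20.0))  # a lower viewing angle
```

Equal height and setback give the preferred 45° viewing angle, so the supplied coordinates can also serve as a quick check that the viewer is close to the optimum position.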
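The track-and-redisplay behavior can be outlined as a simple loop. This is a sketch only: `render_image` and `render_sound` are hypothetical callables standing in for the image and 3-D sound computations, and the eyepoint samples stand in for the output of whatever tracking device is used.

```python
def run_tracking_loop(eyepoint_samples, render_image, render_sound):
    """Recompute image and sound for every detected eyepoint.

    The detected eyepoint is used both as the projection eyepoint for the
    image and as the projection earpoint for the sound.
    """
    frames = []
    for eyepoint in eyepoint_samples:
        image = render_image(eyepoint)   # new horizontal perspective image
        sound = render_sound(eyepoint)   # new 3-D sound for the same spot
        frames.append((image, sound))
    return frames


if __name__ == "__main__":
    samples = [(0.0, -12.0, 12.0), (2.0, -12.0, 12.0)]
    for frame in run_tracking_loop(samples,
                                   lambda e: ("image for", e),
                                   lambda e: ("sound for", e)):
        print(frame)
```

Passing the renderers in as arguments keeps the loop independent of any particular projection or audio method.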
The multi-plane display system can further provide multiple views or “Multi-View” capability. Multi-View provides the viewer with multiple and/or separate left-and right-eye views of the same simulation. Multi-View capability is a significant visual and interactive improvement over the single eye view. In Multi-View mode, both the left eye and right eye images are fused by the viewer's brain into a single, three-dimensional illusion. The problem of the discrepancy between accommodation and convergence of eyes, inherent in stereoscopic images, leading to the viewer's eye fatigue with large discrepancy, can be reduced with the horizontal perspective display, especially for motion images, since the position of the viewer's gaze point changes when the display scene changes.
In Multi-View mode, the objective is to simulate the actions of the two eyes to create the perception of depth, namely that the left eye and the right eye see slightly different images. Thus Multi-View devices that can be used in the present invention include methods with glasses, such as the anaglyph method, special polarized glasses or shutter glasses, and methods without glasses, such as a parallax stereogram, a lenticular method, and a mirror method (concave and convex lens).
In the anaglyph method, a display image for the right eye and a display image for the left eye are superimposed and displayed in two colors, e.g., red and blue, and observation images for the right and left eyes are separated using color filters, thus allowing a viewer to recognize a stereoscopic image. The images are displayed using the horizontal perspective technique with the viewer looking down at an angle. As with the one eye horizontal perspective method, the eyepoint of the projected images has to coincide with the eyepoint of the viewer, and therefore the viewer input device is essential in allowing the viewer to observe the three dimensional horizontal perspective illusion. Since the early days of the anaglyph method, there have been many improvements, such as in the spectrum of the red/blue glasses and display, to generate much more realism and comfort for the viewers.
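The two-color superimposition can be sketched with plain nested lists of (r, g, b) pixels. Assigning red to the left-eye image and blue to the right-eye image is an assumption made for illustration, since the text does not say which eye receives which color; the function name is likewise hypothetical.

```python
def compose_anaglyph(left_image, right_image):
    """Superimpose two equal-sized images into one red/blue anaglyph.

    Each image is a list of rows, each row a list of (r, g, b) tuples.
    The red channel is taken from the left-eye image and the blue channel
    from the right-eye image (assumed assignment); green is left empty.
    """
    if len(left_image) != len(right_image) or any(
        len(lrow) != len(rrow) for lrow, rrow in zip(left_image, right_image)
    ):
        raise ValueError("images must have the same dimensions")
    return [
        [(lpix[0], 0, rpix[2]) for lpix, rpix in zip(lrow, rrow)]
        for lrow, rrow in zip(left_image, right_image)
    ]


if __name__ == "__main__":
    left = [[(200, 200, 200), (0, 0, 0)]]
    right = [[(0, 0, 0), (200, 200, 200)]]
    print(compose_anaglyph(left, right))
```

Each color filter then passes only the channel that came from its own eye's image.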
In polarized glasses method, the left eye image and the right eye image are separated by the use of mutually extinguishing polarizing filters such as orthogonally linear polarizer, circular polarizer, elliptical polarizer. The images are normally projected onto screens with polarizing filters and the viewer is then provided with corresponding polarized glasses. The left and right eye images appear on the screen at the same time, but only the left eye polarized light is transmitted through the left eye lens of the eyeglasses and only the right eye polarized light is transmitted through the right eye lens.
Another way for stereoscopic display is the image sequential system. In such a system, the images are displayed sequentially between left eye and right eye images rather than superimposing them upon one another, and the viewer's lenses are synchronized with the screen display to allow the left eye to see only when the left image is displayed, and the right eye to see only when the right image is displayed. The shuttering of the glasses can be achieved by mechanical shuttering or with liquid crystal electronic shuttering. In the shutter glasses method, display images for the right and left eyes are alternately displayed on a CRT in a time sharing manner, and observation images for the right and left eyes are separated using time sharing shutter glasses which are opened and closed in synchronism with the display images, thus allowing an observer to recognize a stereoscopic image.
Another way to display stereoscopic images is by the optical method. In this method, display images for the right and left eyes, which are separately displayed on a viewer using optical means such as prisms, mirrors, lenses, and the like, are superimposed and displayed as observation images in front of an observer, thus allowing the observer to recognize a stereoscopic image. Large convex or concave lenses can also be used, where two image projectors, projecting left eye and right eye images, provide focus to the viewer's left and right eye respectively. A variation of the optical method is the lenticular method, where the images form on cylindrical lens elements or a two dimensional array of lens elements.
The illustration in the upper left of
Once the horizontal perspective display has incremented the Angled Camera point's x coordinate by two inches, or by the personal eye separation value supplied by the viewer, the rendering continues by displaying the second (left-eye) view.
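The eyepoint shift between the two views can be sketched as follows. The direction of the shift along x follows the wording above (the second, left-eye view is obtained by incrementing the x coordinate), and the function name and default separation of two inches are illustrative.

```python
def stereo_eyepoints(first_eyepoint, eye_separation=2.0):
    """Return the eyepoints for the first and second views.

    The second eyepoint is the first with its x coordinate incremented by
    the eye separation (inches assumed), either the default two inches or
    a personal value supplied by the viewer.
    """
    x, y, z = first_eyepoint
    second_eyepoint = (x + eye_separation, y, z)
    return first_eyepoint, second_eyepoint


if __name__ == "__main__":
    first, second = stereo_eyepoints((0.0, -12.0, 12.0))
    print(first, second)
    print(stereo_eyepoints((0.0, -12.0, 12.0), eye_separation=2.5))
```

Rendering the scene once from each returned eyepoint yields the pair of images that the stereoscopic viewing device presents to the two eyes.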
Depending on the stereoscopic 3D viewing device used, the horizontal perspective display continues to display the left- and right-eye images, as described above, until it needs to move to the next display time period. An example of when this may occur is if the bear cub moves his paw or any part of his body. Then a new and second Simulated Image would be required to show the bear cub in its new position. This new Simulated Image of the bear cub, in a slightly different location, gets rendered during a new display time period. This process of generating multiple views via the nonstop incrementing of display time continues as long as the horizontal perspective display is generating real-time simulations in stereoscopic 3D.
By rapidly displaying the horizontal perspective images, a three dimensional illusion of motion can be realized. Typically, 30 to 60 images per second would be adequate for the eye to perceive motion. For stereoscopy, the same display rate is needed for superimposed images, and twice that amount would be needed for the time sequential method.
The display rate is the number of images per second that the display completely generates and displays. This is similar to a movie projector, which displays an image 24 times a second. Therefore, 1/24 of a second is required for one image to be displayed by the projector. But the display time could be a variable, meaning that depending on the complexity of the view volumes it could take 1/12 or ½ a second for the computer to complete just one display image. Since the display generates a separate left and right eye view of the same image, the total display time is twice the display time for one eye image.
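The display-rate arithmetic above can be illustrated with a minimal sketch. The function names are hypothetical and chosen only for illustration; the sketch assumes that a time sequential stereoscopic display shows the left-eye and right-eye images one after the other.

```python
from fractions import Fraction


def frame_time(images_per_second):
    """Time in seconds needed to display one image at the given display rate."""
    return Fraction(1, images_per_second)


def stereo_display_time(single_eye_time):
    """Total display time when a left-eye and a right-eye view are both generated."""
    return 2 * single_eye_time


def required_rate(motion_rate, time_sequential):
    """Images per second needed; the time sequential method needs twice the rate."""
    return motion_rate * 2 if time_sequential else motion_rate
```

For example, frame_time(24) gives the 1/24 of a second of the movie projector, and a one-eye image that takes 1/12 of a second to complete yields a total stereoscopic display time of 1/6 of a second.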
The present invention further discloses a Multi-Plane display comprising a horizontal perspective display together with a non-horizontal central perspective display.
The Multi-Plane display can be made with one or more physical viewing surfaces. For example, the vertical leg of the “L” can be one physical viewing surface, such as a flat panel display, and the horizontal leg of the “L” can be a separate flat panel display. The edge of the two display segments can be a non-display segment, and therefore the two viewing surfaces are not continuous. Each leg of a Multi-Plane display is called a viewing plane and as you can see in the upper left of
To generate both the horizontal perspective and central perspective images requires the creation of two camera eyepoints (which can be the same or different) as shown in
The multi-plane display system can further include a curvilinear connection display section to blend the horizontal perspective and the central perspective images together at the location of the seam in the “L,” as shown at the bottom of
Furthermore, the multi-plane display system can comprise multiple display surfaces together with multiple curvilinear blending sections as shown in
The present invention multi-plane display system thus can simultaneously project a plurality of three dimensional images onto multiple display surfaces, one of which is a horizontal perspective image. Further, it can be a stereoscopic multiple display system allowing viewers to use their stereoscopic vision for three dimensional image presentation.
Since the multi-plane display system comprises at least two display surfaces, various requirements need to be addressed to ensure high fidelity in the three dimensional image projection. The display requirements are typically geometric accuracy, to ensure that objects and features of the image are correctly positioned; edge match accuracy, to ensure continuity between display surfaces; no blending variation, to ensure no variation in luminance in the blending section of the various display surfaces; and field of view, to ensure a continuous image from the eyepoint of the viewer.
Since the blending section of the multi-plane display system is preferably a curved surface, some distortion correction could be applied in order for the image projected onto the blending section surface to appear correct to the viewer. There are various solutions for providing distortion correction to a display system, such as using a test pattern image, designing the image projection system for the specific curved blending display section, using special video hardware, or utilizing a piecewise-linear approximation for the curved blending section. Still another distortion correction solution for the curved surface projection is to automatically compute image distortion correction for any given position of the viewer eyepoint and the projector.
Since the multi-plane display system comprises more than one display surface, care should be taken to minimize the seams and gaps between the edges of the respective displays. To avoid seam or gap problems, there could be at least two image generators generating adjacent overlapped portions of an image. The overlapped image is calculated by an image processor to ensure that the projected pixels in the overlapped areas are adjusted to form the proper displayed images. Another solution is to control the degree of intensity reduction in the overlapping area to create a smooth transition from the image of one display surface to the next.
For realistic three dimensional display, binaural or three dimensional audio simulation is also included. The present invention also provides the means to adjust the binaural or 3D audio to ensure proper sound simulation.
Similar to vision, hearing using one ear is called monaural and hearing using two ears is called binaural. Hearing can provide the direction of sound sources, though with poorer resolution than vision; the identity and content of a sound source, such as speech or music; and the nature of the environment, such as a normal room or an open field, via echoes and reverberation.
The head and ears, and sometimes the shoulders, function as an antenna system to provide information about the location, distance and environment of the sound sources. The brain can properly interpret the various kinds of sound arriving at the head, such as direct sounds, sounds diffracted around the head and by interaction with the outer ears and shoulders, different sound amplitudes and different arrival times of the sounds. These acoustic modifications are called ‘sound cues’ and serve to provide us with the directional acoustic information of the sounds.
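Controlling the intensity reduction across an overlapped region can be sketched as a cross-fade. The following is a minimal illustration only, assuming a linear ramp and intensities expressed in linear light (display gamma is ignored); the function names are hypothetical and the ramp shape is an assumption, not a method specified here.

```python
def blend_weights(overlap_pixels):
    """Return (weights_a, weights_b) for a linear cross-fade across the overlap.

    weights_a falls from near 1 to near 0 while weights_b rises, so that the
    two weights at every pixel sum to 1 and total luminance stays constant.
    """
    if overlap_pixels < 1:
        raise ValueError("overlap must contain at least one pixel")
    n = overlap_pixels
    weights_b = [(i + 0.5) / n for i in range(n)]
    weights_a = [1.0 - w for w in weights_b]
    return weights_a, weights_b


def blend_overlap(row_a, row_b):
    """Blend two rows of pixel intensities covering the same overlapped area."""
    if len(row_a) != len(row_b):
        raise ValueError("both rows must cover the same overlap")
    weights_a, weights_b = blend_weights(len(row_a))
    return [wa * a + wb * b
            for wa, wb, a, b in zip(weights_a, weights_b, row_a, row_b)]
```

Because the weights sum to one at each pixel, two image generators that agree on the overlapped content produce the same intensity after blending as either would alone.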
Basically, the sound cues are related to timing, volume, frequency and reflection. In timing cues, the ears recognize the time the sound arrives and assume that the sound comes from the closest source. Further, with two ears separated by about 8 inches, the delay of the sound reaching one ear with respect to the other ear can give a cue about the location of the sound source. The timing cue is stronger than the level cue in the sense that the listener locates the sound based on the first wave that reaches the ear, regardless of the loudness of any later arriving waves. In volume (or level) cues, the ears recognize the volume (or loudness) of the sound and assume that the sound is coming from the loudest direction. With the binaural (two ears) effect, the amplitude difference between the ears is a strong cue for the localization of the sound source. In frequency (or equalization) cues, the ears recognize the frequency balance of the sound as it arrives in each ear, since frontal sounds are directed into the eardrums, while rear sounds bounce off the external ear and thus have a high frequency roll off. In reflection cues, the sound bounces off various surfaces and is either dispersed or absorbed in varying degrees before reaching the ears multiple times. These reflections off the walls of the room, and the foreknowledge of the difference between the way various floor coverings sound, also contribute to localization. In addition, the body, especially the head, can move relative to the sound source to help locate the sound.
The above various sound cues are scientifically classified into three types of spatial hearing cues: interaural time differences (ITDs), interaural level differences (ILDs), and head-related transfer functions (HRTFs). ITDs relate to the time for a sound to reach the ears and the time difference for reaching both ears. ILDs refer to the amplitude in the frequency spectrum of sound reaching the ears and also the amplitude differences of the sound frequencies as heard in both ears. HRTFs can provide the perception of distance by the changes in the timbre and distance dependencies, the time delay and directions of direct sound and reflections in echoic environments.
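As an illustration of the interaural time difference, the sketch below uses the classical spherical-head (Woodworth) approximation, ITD = (r / c)(θ + sin θ). This formula is not taken from the present description; it is a commonly used simplification. The head radius of about 0.10 m is an assumption derived from the roughly 8 inch ear separation mentioned above, and the speed of sound of 343 m/s is assumed for room-temperature air.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumed: air at roughly room temperature
HEAD_RADIUS_M = 0.1016       # assumed: half of an ear separation of about 8 inches


def interaural_time_difference(azimuth_deg,
                               head_radius=HEAD_RADIUS_M,
                               speed_of_sound=SPEED_OF_SOUND_M_S):
    """Approximate ITD in seconds for a distant source on a spherical head.

    azimuth_deg is measured from straight ahead (0) to directly beside one
    ear (90). Uses the Woodworth approximation (r / c) * (theta + sin(theta)).
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return (head_radius / speed_of_sound) * (theta + math.sin(theta))
```

Under these assumptions the time difference is zero for a source straight ahead and grows to somewhat under one millisecond for a source directly to one side.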
The HRTFs are a collection of spatial cues for a particular listener, including ITDs, ILDs and the reflections, diffractions and damping caused by the listener's body, head, outer ears and shoulders. The external ear, or pinna, makes a significant contribution to the HRTFs. Higher frequency sounds are filtered by the pinna to provide the brain a way to perceive the lateral position, or azimuth, and the elevation of the sound source, since the response of the pinna filter is highly dependent on the overall direction of the sound source. The head can account for a reduced amplitude of various frequencies of the sounds, since the sound has to go through or around the head in order to reach the ear. The overall effects of head shadowing contribute to the perception of linear distance and direction of a sound source. Further, sound frequencies in the range of 1-3 kHz are reflected from the shoulder to produce echoes representing a time delay dependent on the elevation of the sound source. The reflections from surfaces in the world and the reverberation also seem to affect the localization judgement of sound distance and direction.
In addition to these cues, the movement of the head to help locate a sound source is a key factor, together with vision to confirm the sound direction. For 3D immersion, all mechanisms to localize the sounds are always in play and should normally agree. If not, there would be some discomfort and confusion.
Although we can hear with one ear, hearing with two ears is clearly better. Many of the sound cues are related to binaural perception, depending on both the relative loudness of sound and the relative time of arrival of sound at each ear. Thus binaural performance is clearly superior for the localization of single or multiple sound sources and for the formation of the room environment, for the separation of signals coming from multiple incoherent and coherent sound sources, and for the enhancement of a chosen signal in a reverberant environment.
Mathematically speaking, the HRTF is the frequency response of the sound waves as received by the ears. By measuring the HRTF of a particular listener and synthesizing it electronically using digital signal processing, sounds can be delivered to the listener's ears via headphones or loudspeakers to create a virtual sound image in three dimensions.
The sound transformation to the ear canal, i.e. HRTF frequency response, can be measured accurately by using small microphones in the ear canals. The measured signal is then processed by a computer to derive the HRTF frequency responses for the left and right ears corresponding to the sound source location.
Thus a 3D audio system works by using the measured HRTFs as the audio filters or equalizers. When a sound signal is processed by the HRTF filters, the sound localization cues are reproduced, and the listener should perceive the sound at the location specified by the HRTFs. This method of binaural synthesis works extremely well when the listener's own HRTFs are used to synthesize the localization cues. However, measuring HRTFs is a complicated procedure, so 3D audio systems typically use a single set of HRTFs previously measured from a particular human or manikin subject. Thus the HRTF sometimes needs to be changed to respond accurately to a particular listener. The tuning of an HRTF can be accomplished by providing various sound source locations and environments and asking the listener to identify them.
A 3D audio system should provide the ability for the listener to define a three-dimensional space, to position multiple sound sources and that listener in that 3D space, and to do it all in real-time, or interactively. Besides 3D audio systems, other technologies such as stereo extension and surround sound could offer some aspects of 3D positioning or interactivity.
Extended stereo processes an existing stereo (two channel) soundtrack to add spaciousness and to make it appear to originate from outside the left/right speaker locations through fairly straightforward methods. Some of the characteristics of the extended stereo technology include the size of the listening area (called the sweet spot), the amount of spreading of stereo images, the amount of tonal changes, the amount of lost stereo panning information, and the ability to achieve the effect on headphones as well as speakers.
Surround sound creates a larger-than-stereo sound stage with a surround sound 5-speaker setup. Additionally, virtual surround sound systems use 3D audio technology to create the illusion of five speakers emanating from a regular set of stereo speakers, therefore enabling a surround sound listening experience without the need for a five speaker setup. The characteristics of the surround sound technology include the presentation accuracy, the clarity of spatial imaging, and the size of the listening area. For a better 3D audio system, audio technology needs to create a life-like listening experience by replicating the 3D audio cues that the ears hear in the real world, allowing non-interactive and interactive listening and positioning of sounds anywhere in the three-dimensional space surrounding a listener.
The head tracker function is also very important to provide perceptual room constancy to the listener. In other words, when listeners move their heads around, the signals would change so that the perceived auditory world maintains its spatial position.
To this end, the simulation system needs to know the head position in order to be able to control the binaural impulse responses adequately. Head position sensors therefore have to be provided. The impression of being immersed is of particular relevance for applications in the context of virtual reality.
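The room constancy described above can be sketched for rotation about the vertical axis only. This is a minimal illustration under stated assumptions: the function name is hypothetical, and both the source azimuth and the head yaw are assumed to be measured in degrees in the same rotational direction from the same room reference.

```python
def relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Direction of a room-fixed source as seen from the rotated head.

    Both angles are in degrees, measured in the same rotational direction
    from the same room reference. The result lies in the range [0, 360).
    """
    return (source_azimuth_deg - head_yaw_deg) % 360.0
```

The binaural impulse responses would then be selected for this relative direction rather than for the room direction, so that a source stays fixed in the room while the head turns.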
A replica of a sound field can be produced by putting an infinite number of microphones everywhere. After being stored on a recorder with an infinite number of channels, this recording can then be played back through an infinite number of point-source loudspeakers, each placed exactly as its corresponding microphone was placed.
As the number of microphones and speakers is reduced, the quality of the sound field being simulated suffers. By the time we are down to two channels, height cues have certainly been lost and instead of a stage that is audible from anywhere in the room we find that sources on the stage are now only localizable if we listen along a line equidistant from the last two remaining speakers and face them.
However, only two channels should be adequate, since if we deliver the exact sound required to simulate a live performance at the entrance to each ear canal, then since we only have two ear canals, we should only need to generate two such sound fields. In other words, since we can hear three-dimensionally in the real world using just two ears, it must be possible to achieve the same effect from just two speakers or a set of headphones.
Headphone reproduction thus differs from loudspeaker reproduction, since microphones for headphone reproduction should be spaced about seven inches apart, matching a normal ear separation, while microphones for loudspeaker reproduction should be about seven feet apart. Further, loudspeakers suffer from crosstalk, and therefore some signal conditioning such as crosstalk cancellation will be needed for a 3D loudspeaker setup.
Loudspeaker 3D audio systems are extremely effective in desktop computing environments. This is because there is usually only a single listener (the computer user) who is almost always centered between the speakers and facing forward towards the monitor. Thus, the primary user gets the full 3D effect because the crosstalk is properly cancelled. In typical 3D audio applications, like video gaming, friends may gather around to watch. In this case, the best 3D audio effects are heard by others when they are also centered with respect to the loudspeakers. Off-center listeners may not get the full effect, but they still hear a high quality stereo program with some spatial enhancements.
To achieve 3D audio, the speakers are typically arranged surrounding the listener in about the same horizontal plane, but could be arranged to completely surround the listener, from the ceiling to the floor to the surrounding walls. Optionally, the speakers can also be put on the ceiling, on the floor, arranged in an overhead dome configuration, or arranged in a vertical wall configuration. Further, beam transmitted speakers can be used instead of headphones. Beam transmitted speakers offer freedom of movement for the listener without crosstalk between speakers, since each beam transmitted speaker provides a tight beam of sound.
Generally, a minimum of four loudspeakers are required to achieve a convincing 3-D audio experience, while some researchers are using twenty or more speakers in an anechoic chamber to recreate acoustic environments with much greater precision.
The main advantages of multi-speaker playback are:
Many crosstalk cancellers are based on a highly simplified model of crosstalk, for example modeling crosstalk as a simple delay and attenuation process, or a delay and a lowpass filter. Other crosstalk cancellers have been based on a spherical head model. As with binaural synthesis, crosstalk cancellation performance is ultimately limited by the variation in the size and shape of human heads.
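The simplest model mentioned above, crosstalk as a delay and an attenuation, can be sketched as follows. This is an illustration under stated assumptions, not a canceller specified here: the layout is assumed symmetric, the direct path is assumed to have unit gain and no delay, and the gain and delay values are hypothetical parameters (the gain should be below 1 for the recursion to remain stable).

```python
def add_crosstalk(spk_left, spk_right, gain, delay):
    """Model the ear signals: each ear hears its own speaker plus a delayed,
    attenuated copy of the opposite speaker."""
    ear_left, ear_right = [], []
    for i in range(len(spk_left)):
        leak_from_right = spk_right[i - delay] if i >= delay else 0.0
        leak_from_left = spk_left[i - delay] if i >= delay else 0.0
        ear_left.append(spk_left[i] + gain * leak_from_right)
        ear_right.append(spk_right[i] + gain * leak_from_left)
    return ear_left, ear_right


def cancel_crosstalk(binaural_left, binaural_right, gain, delay):
    """Compute speaker feeds so that, under the model above, each ear
    receives only its intended binaural signal."""
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    n = len(binaural_left)
    spk_left, spk_right = [0.0] * n, [0.0] * n
    for i in range(n):
        prev_left = spk_left[i - delay] if i >= delay else 0.0
        prev_right = spk_right[i - delay] if i >= delay else 0.0
        # Subtract in advance what the opposite speaker will leak into this ear.
        spk_left[i] = binaural_left[i] - gain * prev_right
        spk_right[i] = binaural_right[i] - gain * prev_left
    return spk_left, spk_right
```

Passing the output of cancel_crosstalk through add_crosstalk with the same gain and delay returns the original binaural signals, which is the sense in which the crosstalk is cancelled under this simplified model.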
3D audio simulation can be accomplished by the following steps:
The simulation of an acoustic environment involves one or more of the following functions:
Binaural simulation is generally carried out using the sound source material free from any unwanted echoes or noise. The sound source material can then be replayed to a subject, using the appropriate HRTF filters, to create the illusion that the source audio is originating from a particular direction. The HRTF filtering is achieved by simply convolving the audio signal with the pair of HRTF responses (one HRTF filter for each channel of the headphone).
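The convolution step can be sketched as below. The sketch works with the time-domain form of each HRTF, commonly called a head-related impulse response (HRIR); the function names and the toy impulse responses in the usage note are hypothetical, and a practical system would typically use an FFT-based convolution for speed.

```python
def convolve(signal, impulse_response):
    """Direct-form discrete convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def binaural_render(mono, hrir_left, hrir_right):
    """Filter one mono source with the left and right impulse responses,
    giving one output signal for each channel of the headphone."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)
```

For instance, with a toy left response of [1.0] and a toy right response of [0.0, 0.0, 0.5], the right channel is a delayed and attenuated copy of the left, which loosely imitates a source located toward the listener's left.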
The eyes and ears often perceive an event at the same time. Seeing a door close, and hearing a shutting sound, are interpreted as one event if they happen synchronously. If we see a door shut without a sound, or we see a door shut in front of us, and hear a shutting sound to the left, we get alarmed and confused. In another scenario, we might hear a voice in front of us, and see a hallway with a corner; the combination of audio and visual cues allows us to figure out that a person might be standing around the corner. Together, synchronized 3D audio and 3D visual cues provide a very strong immersion experience.
Both 3D audio and 3D graphics systems can be greatly enhanced by such synchronization.
Improved playback through headphones can be achieved through the use of head tracking. This technique makes use of continuous measurements of the orientation of a subject's head, and adapts the audio signals being fed to the headphones appropriately.
Binaural signals should allow a subject to easily discriminate between left and right sound source locations, but the ability to discriminate between front and back, and between high and low sound sources, is generally only possible if head movement is permitted. Whilst multiple speaker playback methods solve this problem to a large degree, there are still many applications where headphone playback is preferable, and head tracking can then be used as a valuable tool for improving the quality of the 3-D playback.
The simplest form of head tracking binaural system is one which simply simulates anechoic HRTFs, and changes the HRTFs rapidly in response to the subject's head movements. This HRTF switching can be achieved through a lookup table, with interpolation used to resolve angles that are not represented in the HRTF table.
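The lookup table with interpolation can be sketched as follows. This is a minimal illustration assuming a table indexed by azimuth only, impulse responses of equal length, and plain linear interpolation between the two nearest measured angles; the function name and table layout are hypothetical, and linear interpolation of impulse responses is itself a simplification.

```python
import bisect


def interpolate_hrir(table, azimuth_deg):
    """Return an impulse response for any azimuth from a sparse table.

    table maps measured azimuths in degrees (0 <= angle < 360) to impulse
    responses of equal length. Angles between two entries are resolved by
    linear interpolation, wrapping around at 360 degrees.
    """
    angles = sorted(table)
    az = azimuth_deg % 360.0
    idx = bisect.bisect_right(angles, az)
    lo_key = angles[idx - 1]            # wraps to the last entry when idx == 0
    hi_key = angles[idx % len(angles)]  # wraps to the first entry past the end
    lo = lo_key if idx > 0 else lo_key - 360.0
    hi = hi_key if idx < len(angles) else hi_key + 360.0
    weight = (az - lo) / (hi - lo)
    return [(1.0 - weight) * a + weight * b
            for a, b in zip(table[lo_key], table[hi_key])]
```

In a head tracking system the azimuth passed in would be the direction of the source relative to the head, updated each time a new head orientation is measured, with one such table for each ear.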
Simulation of room acoustics over headphones with head tracking becomes more difficult because the direction of arrival of the early reflections is also important in making the result sound realistic. Many researchers believe that the echoes in the reverberant tail of the room response are generally so diffuse that there is no requirement for this part of the room response to be tracked with the subject's head movements.
An important feature of any head tracking playback system is the delay from the subject's head movement to the change in the audio response at the headphones. If this delay is excessive, the subject can experience a form of virtual motion sickness and general disorientation.
Audio cues change dramatically when a listener tilts or rotates his or her head. For example, quickly turning the head 90 degrees to look to the side is the equivalent of a sound traveling from the listener's side to the front in a split second. We often use head motion to track sounds or to search for them. The ears alert the brain about an event outside of the area that the eyes are currently focused on, and we automatically turn to redirect our attention. Additionally, we use head motion to resolve ambiguities: a faint, low sound could be either in front or back of us, so we quickly and sub-consciously turn our head a small fraction to the left, and we know if the sound is now off to the right, it is in the front, otherwise it is in the back. One of the reasons why interactive audio is more realistic than pre-recorded audio (soundtracks) is the fact that the listener's head motion can be properly simulated in an interactive system (using inputs from a joystick, mouse, or head-tracking system).
The HRTF processing is performed using digital signal processing (DSP) hardware for real time performance. Typical requirements of the DSP are that the direct sound must be processed to give the correct amplitude and perceived direction, the early echoes must arrive at the listener with appropriate time, amplitude and frequency response to give the perception of the size of the space (as well as the acoustic nature of the room surfaces), and the late reverberation must be natural and correctly distributed in 3-D around the listener. The relative amplitude of the direct sound compared to the remainder of the room response helps to provide the sensation of distance.
Thus 3D audio simulation can provide a binaural gain, so that the exact same audio content is more audible and intelligible in the binaural case, because the brain can localize and therefore “single out” the binaural signal, while the non-binaural signal gets washed into the noise. Further, the listener would still be able to tune into and understand individual conversations, because they are still spatially separated and “amplified by” binaural gain, an effect called the cocktail party effect. Binaural simulation can also provide faster reaction time because such a signal mirrors the ones received in the real world. In addition, binaural signals can convey positional information: a binaural radar warning sound can warn a user about a specific object that is approaching (with a sound that is unique to that object), and naturally indicate where that object is coming from. Listening to binaural simulation can also be less fatiguing, since we are used to hearing sounds that originate outside of our heads, as is the case with binaural signals. Mono or stereo signals appear to come from inside a listener's head when using headphones, and produce more strain than a natural sounding binaural signal. And lastly, 3D binaural simulation can provide increased perception and immersion in a higher quality 3D environment when visuals are shown in sync with binaural sound.