|Publication number||US7868891 B2|
|Application number||US 11/229,458|
|Publication date||Jan 11, 2011|
|Filing date||Sep 16, 2005|
|Priority date||Sep 16, 2004|
|Also published as||CN101091175A, CN101091175B, US20060059494, WO2006034034A2, WO2006034034A3|
|Publication number||11229458, 229458, US 7868891 B2, US 7868891B2, US-B2-7868891, US7868891 B2, US7868891B2|
|Inventors||Daniel Elliot Wexler, Larry I. Gritz, Eric B. Enderton, Cass W. Everitt|
|Original Assignee||Nvidia Corporation|
This disclosure claims priority pursuant to 35 USC 119(e) from U.S. provisional patent application Ser. No. 60/610,873, filed on Sep. 16, 2004, by Dan Wexler et al., titled “LOAD BALANCING,” assigned to the assignee of the presently claimed subject matter.
This disclosure is related to load balancing, such as between co-processors, for example.
Computer graphics is an extensive field in which a significant amount of hardware and software development has taken place over the last twenty years or so. See, for example, Computer Graphics: Principles and Practice, by Foley, Van Dam, Feiner, and Hughes, published by Addison-Wesley, 1997. Typically, in a computer platform or other similar computing device, dedicated graphics hardware is employed in order to render graphical images, such as those used in connection with computer games, for example. For such systems, dedicated graphics hardware may be limited in a number of respects that have the potential to affect the quality of the graphics, including hardware flexibility and/or its rendering capability.
In graphics, typically, a standard computing platform will include a central processing unit (CPU) and a graphical processing unit (GPU). As GPUs continue to become more complex and capable of a larger number of computing tasks, techniques for load balancing processing between the processors become more desirable.
Embodiments of methods, apparatuses, devices, and/or systems for load balancing two processors, such as for graphics and/or video processing, for example, are described. In accordance with one embodiment, a method of load balancing between a programmable GPU and a CPU includes the following. A two-ended queue is formed of separate work units each capable of being processed at least in part by said GPU and said CPU. The GPU and CPU process the work units by having the GPU and CPU select work units from respective ends of the queue.
Subject matter is particularly pointed out and distinctly claimed in the concluding portion of the specification. The claimed subject matter, however, both as to organization and method of operation, together with objects, features, and advantages thereof, may best be understood by reference to the following detailed description when read with the accompanying drawings in which:
In the following detailed description, numerous specific details are set forth to provide a thorough understanding of the claimed subject matter. However, it will be understood by those skilled in the art that the claimed subject matter may be practiced without these specific details. In other instances, well-known methods, procedures, components and/or circuits have not been described in detail so as not to obscure the claimed subject matter.
Computer graphics is an extensive field in which a significant amount of hardware and software development has taken place over the last twenty years or so. See, for example, Computer Graphics: Principles and Practice, by Foley, Van Dam, Feiner, and Hughes, published by Addison-Wesley, 1997. Typically, in a computer platform or other similar computing device, dedicated graphics hardware is employed in order to render graphical images, such as those used in connection with computer games, for example. For such systems, dedicated graphics hardware may be limited in a number of respects that have the potential to affect the quality of the graphics, including hardware flexibility and/or its rendering capability. However, higher quality graphics continue to be desirable as the technology and the marketplace continue to evolve. Thus, signal processing and/or other techniques to extend the capability of existing hardware in terms of the quality of graphics that may be produced continue to be an area of investigation.
As previously discussed, dedicated graphics hardware may be limited in its capabilities, such as its graphics rendering capabilities and/or its flexibility. This may be due at least in part, for example, to the cost of hardware providing improved abilities relative to the demand for such hardware. Despite this, however, in recent years, the capabilities of dedicated graphics hardware provided on state-of-the-art computer platforms and/or similar computing systems have improved and continue to improve. For example, fixed function pipelines have been replaced with programmable vertex and fragment processing stages. As recently as 6 years ago, most consumer three-dimensional (3D) graphics operations were principally calculated on a CPU and the graphics card primarily displayed the result as a frame buffer. However, dedicated graphics hardware has evolved into a graphics pipeline comprising tens of millions of transistors. Today, a programmable graphics processing unit (GPU) is capable of more than simply feed-forward triangle rendering. State-of-the-art graphics chips, such as the NVIDIA GeForce 4 and the ATI Radeon 9000, for example, replace fixed-function vertex and fragment processing stages with programmable stages, as described in more detail hereinafter. These programmable vertex and fragment processing stages have the capability to execute programs allowing control over shading and/or texturing calculations, as described in more detail hereinafter.
Similar to CPU architectures, a GPU may be broken down into pipeline stages. However, whereas a CPU embodies a general purpose design used to execute arbitrary programs, a GPU is architected to process raw geometry data and eventually represent that information as pixels on a display, such as a monitor, for example.
Typically, for an object to be drawn, the following operations are executed by such a pipeline:
As illustrated by block 115 of
As illustrated by block 120 and previously suggested, a graphics pipeline typically will perform transform and lighting (T & L) operations and the like. Block 120 depicts a fixed-function unit; however, these operations are being replaced more and more by programmable vertex units, such as 130, also referred to as vertex shaders. Vertex shader 130 applies a vertex program to a stream of vertices. Therefore, the program processes data at the vertex level. Most operations are performed in one cycle, although this restriction need not apply. A typical vertex program is on the order of a hundred or more instructions.
As with the vertex stage, the fragment processing stage has undergone an evolution from a fixed function unit, such as illustrated by block 140, to a programmable unit, such as illustrated by block 150. Thus, previously, texturing, filtering and blending were performed using fixed function state machines or similar hardware. As with vertex shaders, a pixel shader, such as 150, also referred to as a programmable fragment processing stage, permits customized programming control. Therefore, on a per pixel basis, a programmer is able to compute color and the like to produce desired customized visual effects.
These trends in programmability of the graphics pipeline have transformed the graphics processing unit (GPU) and its potential applications. Thus, one potential application of such a processor or processing unit is to accomplish high quality graphics processing, such as may be desirable for a variety of different situations, such as for creating animation and the like, for example. More specifically, in recent years, the performance of graphics hardware has increased more rapidly than that of central processing units (CPUs). As previously indicated, CPU designs are typically intended for high performance processing on sequential code. It is, therefore, becoming increasingly more challenging to use additional transistors to improve processing performance. In contrast, as just illustrated, programmable graphics hardware is designed for parallel processing of vertex and fragment stage code. As a result, GPUs are able to use additional transistors more effectively than CPUs to produce processing performance improvements. Thus, GPUs offer the potential to sustain processing performance improvements as semiconductor fabrication technology continues to advance.
Of course, programmability is a relatively recent innovation. Furthermore, a range of differing capabilities are included within the context of “programmability.” For the discussion of this particular embodiment, focus will be placed upon the fragment processing stage of the GPU rather than the vertex stage, although, of course, the claimed subject matter is not limited in scope in this respect. Thus, in one embodiment, a programmable GPU may comprise a fragment processing stage that has a simple instruction set. Fragment program data types may primarily comprise fixed point input textures. Output frame buffer colors may typically comprise eight bits per color component. Likewise, a stage typically may have a limited number of data input elements and data output elements, a limited number of active textures, and a limited number of dependent textures. Furthermore, the number of registers and the number of instructions for a single program may be relatively small. The hardware may permit certain instructions for computing texture addresses only at certain points within the program. The hardware may only permit a single color value to be written to the frame buffer for a given pass, and programs may not loop or execute conditional branching instructions. In this context, an embodiment of a GPU with this level of capability or a similar level of capability shall be referred to as a fixed point programmable GPU.
In contrast, more advanced dedicated graphics processors or dedicated graphics hardware may comprise more enhanced features. The fragment processing stage may be programmable with floating point instructions and/or registers, for example. Likewise, floating point texture frame buffer formats may be available. Fragment programs may be formed from a set of assembly language level instructions capable of executing a variety of manipulations. Such programs may be relatively long, such as on the order of hundreds of instructions or more. Texture lookups may be permitted within a fragment program, and there may, in some embodiments, be no limits on the number of texture fetches or the number of levels of texture dependencies within a program. The fragment program may have the capability to write directly to texture memory and/or a stencil buffer and may have the capability to write a floating point vector to the frame buffer, such as RGBA, for example. In this context, an embodiment of a GPU with this level of capability or a similar level of capability may be referred to as a floating point programmable GPU.
Likewise, a third embodiment or instantiation of dedicated graphics hardware shall be referred to here as a programmable streaming processor. A programmable streaming processor comprises a processor in which a data stream is applied to the processor and the processor executes similar computations or processing on the elements of the data stream. The system may execute, therefore, a program or kernel by applying it to the elements of the stream and by providing the processing results in an output stream. In this context, likewise, a programmable streaming processor which focuses primarily on processing streams of fragments comprises a programmable streaming fragment processor. In such a processor, a complete instruction set and larger data types may be provided. It is noted, however, that even in a streaming processor, loops and conditional branching are typically not capable of being executed without intervention originating external to the dedicated graphics hardware, such as from a CPU, for example. Again, an embodiment of a GPU with this level of capability or a similar level comprises a programmable streaming processor in this context.
In this particular embodiment, GPU 210 may comprise any instantiation of a programmable GPU, such as, for example, one of the three previously described embodiments, although for the purposes of this discussion, it is assumed that GPU 210 comprises a programmable floating point GPU. Likewise, it is, of course, appreciated that the claimed subject matter is not limited in scope to only the three types of GPUs previously described. These three are merely provided as illustrations of typical programmable GPUs. All other types of programmable GPUs currently known or to be developed later are included within the scope of the claimed subject matter. For example, while
Likewise, for this simplified embodiment, system 200 comprises a CPU 230 and a GPU 210. In this particular embodiment, memory 240 comprises random access memory or RAM, although the claimed subject matter is not limited in scope in this respect. Any one of a variety of types of memory currently known or to be developed may be employed. It is noted that memory 240 includes frame buffer 250 in this particular embodiment, although, again, the claimed subject matter is not limited in scope in this respect. For example,
It is worth repeating that
In graphics, one typical and frequent computation is referred to as “ray tracing.” Ray tracing is employed in a variety of ways in graphics, such as for simulating illumination effects, including shadows, reflection, and/or refraction, as well as other uses. In general, ray tracing refers to a process to determine the visibility of surfaces present in a particular graphical image by tracing imaginary rays of light from the viewer's eye to objects in the scene. See, for example, Computer Graphics, Sec. 15.10, pp. 701-718.
One of the difficulties with ray tracing is that typically it is one of the most time consuming graphics operations to be performed. Furthermore, ray tracing is typically performed on a CPU, rather than on a GPU due, at least in part, to the complexity of the computation involved. However, more recently, work has begun to employ a programmable GPU in the computation process. For example, in “Ray Tracing on Programmable Graphics Hardware,” by Timothy Purcell et al., ACM Transactions On Graphics, 2002, interesting methods of storing and/or accessing data for general purpose computations on a GPU are explored. One problem with the approach suggested by Purcell et al., however, is the large amount of storage capability needed to store an entire scene while performing calculations. In another recent paper, “The Ray Engine,” by Nathan Carr et al., Graphics Hardware, 2002, an approach is suggested in which a GPU is employed to compute ray-triangle intersections. The difficulty with this approach is that the ray-triangle intersections are computed on the GPU one triangle at a time. Such an approach, therefore, may be time consuming and may not fully utilize the parallel processing capability available via a programmable GPU. Additional techniques for using a programmable GPU to perform ray tracing for graphics processing are, therefore, desired.
Although ray tracing is a time consuming operation, it may be, at least for some computations, that more time is spent determining portions of an image where ray tracing does not need to be utilized than is spent actually performing and computing ray-primitive intersections. Thus, as will become more clear, processing advantages may be obtained by applying a programmable GPU to reduce the number of ray-primitive intersection calculations to be completed or performed by determining those portions of the image where ray tracing is not desired.
Another aspect of this particular embodiment is employing the parallel processing capability of the GPU. In particular, intersecting a plurality of rays with a hierarchy of bounding surfaces suggests a repetitive calculation that may potentially be performed effectively on a GPU. The following discussion focuses on processing by the GPU itself and how the GPU interacts with the CPU to load balance and compute ray-primitive intersections. Thus, yet another aspect of this particular embodiment involves load balancing between the GPU and the CPU.
Referring now to
It is noted that the shape of the bounding surfaces may take any form. For example, the shape of a bounding surface may comprise a sphere, square, rectangle, convex surface, or other types of surface. For this particular embodiment, the bounding surface comprises a box, referred to here as a bounding box. One advantage of employing a box is that it is quick and easy to implement. In this context, a bounding box also shall be referred to as a voxel or volume. Due at least in part to the use of bounding boxes here, the division of the image is substantially grid-based, as illustrated in
As previously alluded to,
In this particular embodiment, the rays are represented by pixels, as illustrated in
As a result of dividing the image spatially, voxels may be ranked based at least in part on the number of rays that intersect the perimeter, referred to here as a batch of rays. It is noted here that, for this particular application of this particular technique, the rays are substantially coherent. Thus, based at least in part on the number of rays that intersect its perimeter, a bounding box or voxel represents, in this context, an amount of work to be performed by a processor, referred to here as an item or unit of work. The amount of work to be performed for a particular work unit or work item is related at least in part to the number of rays that intersect the perimeter of the particular bounding box. Additionally, within a bounding box is a series of additional bounding boxes or a hierarchy. Thus, the particular bounding box illustrated in
As depicted at block 330 of
Once the GPU takes on a unit of work, such as, for example, the work unit designated as row 1, column 1 of grid 610, it is able to process that voxel using a technique that processes ten rays and eight bounding boxes in a cycle, although, of course, the claimed subject matter is not limited in scope in this respect. The number of rays and the bounding boxes, however, to process in a GPU cycle may vary depending upon a variety of factors. It is also noted that, in this embodiment, the same ten rays are applied. Furthermore, for this particular embodiment, the eight boxes comprise hierarchically successive boxes, although the claimed subject matter is, of course, not limited in scope to employing hierarchically successive boxes. Thus, if a particular ray intersects all eight bounding boxes, here, that provides some information to be used for further graphical processing, as described in more detail below.
The mechanism that is employed to process ten rays and eight bounding boxes in a cycle involves taking advantage of the architecture of a GPU that includes a programmable pixel shader stage, as previously described. Therefore, the number of rays, for example, to employ for processing on a GPU may vary depending, at least in part, on the application and the particular situation. It may also vary depending at least in part on other factors, such as, for example, the particular GPU, its particular architecture, the particular image being processed, etc. Likewise, a similar variation may apply to the number of bounding boxes to process in a cycle.
As previously explained, a pixel shader executes a so-called “fragment program.” Thus, in the fragment stage of a GPU, such as fragment stage 180, for example, a pixel shader, such as 150, for example, is presented with a fragment program in the form of instructions to execute. Likewise, particular pixels are designated on which it is desired that the fragment program be executed. When executing such a program, a GPU typically produces or outputs values to a particular location for a particular pixel. Thus, in this embodiment, to perform a parallel computation, such as eight bounding boxes in a cycle, the results of a particular ray/pixel computation are written to a particular location, in this particular embodiment, to a stencil buffer. More specifically, for a pixel processed by a pixel shader, typically a GPU computes its color (e.g., red, green, blue), alpha (e.g., coverage), depth, and other additional values that may be specific to the particular fragment program. For this particular embodiment, the stencil buffer comprises a byte or eight bits for storing those other additional values. Hence, for this particular embodiment, eight bounding boxes, each utilizing a bit of the stencil byte, are handled in a cycle. Again, the claimed subject matter is not limited in scope in this particular respect. For example, the results of the computation might, instead, be stored as depth, as color, or as some other attribute for which the GPU has a specific buffer location. Here, as indicated, each bit in the stencil buffer represents the results of computing the intersection between a particular ray and a particular bounding box. One advantage, then, of employing the stencil buffer is that from an input/output perspective, it is relatively easy to read out a particular computation result by masking the other bits of the stencil buffer.
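The bit-per-box bookkeeping described above can be illustrated with a minimal CPU-side sketch in Python. It only models the stencil byte for a single pixel; the helper names `set_hit` and `was_hit`, and the choice of which boxes are hit, are hypothetical and not taken from the disclosure.

```python
STENCIL_BITS = 8  # one stencil byte per pixel, as in the embodiment described above


def set_hit(stencil: int, box_index: int) -> int:
    """Return the stencil byte with the bit for box_index set."""
    if not 0 <= box_index < STENCIL_BITS:
        raise ValueError("box_index must fit in the stencil byte")
    return stencil | (1 << box_index)


def was_hit(stencil: int, box_index: int) -> bool:
    """Mask off the other bits and test only the bit for box_index."""
    return (stencil & (1 << box_index)) != 0


# Hypothetical result for one ray/pixel: it intersected boxes 0, 3 and 7.
stencil = 0
for box in (0, 3, 7):
    stencil = set_hit(stencil, box)
```

Reading a single result back is then one mask operation, for example `was_hit(stencil, 3)`.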
As previously discussed, in this particular embodiment, ten rays are processed in a cycle. In this particular embodiment, this is done in the form of a 2×5 array of pixels, although the claimed subject matter is not limited in scope in this respect. In general, it is desirable to use an array having a dimension that is a multiple of two to make efficient use of the GPU. Of course, the claimed subject matter is not limited in scope to employing a 2×N array, where N is any positive integer. Therefore, in this particular embodiment, ten rays are processed in a cycle to capture the efficiencies of parallel processing.
In this particular embodiment, a bounding box is represented as a range in X, a range in Y, and a range in Z. Thus, a fragment program may be written to determine for ten pixels whether the rays associated with those pixels intersect such a bounding box. If an intersection occurs, a bit may be set in the stencil buffer for that particular pixel. Likewise, with a fragment program, computing eight bounding boxes in a cycle leverages the hardware architecture of the GPU, as previously described.
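As an illustration only, the test such a fragment program performs can be sketched on the CPU in Python. The disclosure does not specify the intersection test; the sketch assumes the standard slab method for axis-aligned boxes, and the ray layout, box coordinates, and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner), i.e. the X, Y and Z ranges


def ray_hits_box(origin: Vec3, direction: Vec3, box: Box) -> bool:
    """Slab test (assumed technique): clip the ray against each axis range."""
    t_near, t_far = 0.0, float("inf")
    lo, hi = box
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < a or o > b:  # parallel to this slab and outside it
                return False
            continue
        t0, t1 = (a - o) / d, (b - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:  # the per-axis intervals no longer overlap
            return False
    return True


def stencil_for_ray(origin: Vec3, direction: Vec3, boxes: Sequence[Box]) -> int:
    """Pack up to eight hit/miss results into one byte, one bit per box."""
    if len(boxes) > 8:
        raise ValueError("at most eight boxes fit in one stencil byte")
    value = 0
    for i, box in enumerate(boxes):
        if ray_hits_box(origin, direction, box):
            value |= 1 << i
    return value


# Hypothetical batch: a 2x5 array of parallel rays travelling along +Z.
rays = [((float(x), float(y), -1.0), (0.0, 0.0, 1.0))
        for y in range(2) for x in range(5)]
# Two hierarchically successive boxes: an outer box and a smaller inner box.
boxes: List[Box] = [((-0.5, -0.5, 0.0), (4.5, 1.5, 8.0)),
                    ((-0.5, -0.5, 2.0), (1.5, 0.5, 4.0))]
stencils = [stencil_for_ray(o, d, boxes) for o, d in rays]
```

Here every ray hits the outer box (bit 0), and only the two rays at x = 0 and x = 1 in the first row also hit the inner box (bit 1). On a GPU the per-pixel work would run in parallel; the loop above is sequential purely for clarity.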
One issue when employing a GPU is determining when processing has ceased. To determine this, the CPU queries the GPU. However, such querying has some efficiency implications. Querying the GPU results in the GPU stopping its processing so that it is able to provide data to the CPU. Thus, it may be undesirable to query too frequently because this may result in processing inefficiency. However, it is, likewise, desirable to not query the GPU too infrequently because once the GPU has ceased, it may sit idle, representing wasted processing time.
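The trade-off between querying too often and too rarely can be made concrete with a toy cost model in Python. The model is an assumption made for illustration and is not part of the disclosure: the CPU polls at a fixed interval, each poll that lands before the GPU has finished costs a fixed stall, stalls do not lengthen the task, and the GPU sits idle from completion until the next poll. All parameter names and values are hypothetical.

```python
import math


def wasted_time(task_ms: float, poll_ms: float, stall_ms: float) -> float:
    """Stall time from early polls plus GPU idle time after completion."""
    polls = math.ceil(task_ms / poll_ms)   # polls up to the one that sees completion
    early_polls = polls - 1                # each of these interrupts the GPU
    idle = polls * poll_ms - task_ms       # wait between completion and next poll
    return early_polls * stall_ms + idle


# A 10 ms work unit with a 0.5 ms stall per early poll, at three poll intervals.
results = [wasted_time(10.0, p, 0.5) for p in (1.0, 4.0, 16.0)]
```

Under these assumed numbers, polling every 1 ms loses 4.5 ms to stalls, polling every 16 ms loses 6 ms to idling, and the intermediate 4 ms interval loses only 3 ms.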
For this particular embodiment, a two-sided queue, as previously described, provides a mechanism to balance these considerations, as shall be described in more detail later. Within this context, as suggested, the frequency at which the CPU queries the GPU may affect efficiency of processing by the GPU. Thus, depending on the particular implementation or embodiment, it may be desirable to vary this frequency.
As previously described, the GPU and the CPU begin separate work units initially, as illustrated by block 330 of
If, alternatively, however, the GPU has uncovered a hit, this means that some rays intersected bounding boxes for the particular voxel. The GPU, by providing data back to the CPU regarding the rays where this intersection has taken place, assists the CPU to determine the number of rays that still remain “active” for further processing. This information allows the CPU to schedule another work unit in the two-sided queue previously described. This scheduling by the CPU determines whether additional processing for this particular voxel will be performed by the GPU or the CPU.
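One way this scheduling step might look is sketched below in Python. It assumes, for illustration, that a ray remains active when its stencil byte is non-zero and that the queue is kept sorted by active-ray count, so that the position of the new work unit implicitly decides which processor is likely to pick it up. The voxel labels and function names are hypothetical.

```python
import bisect
from typing import List, Tuple

WorkUnit = Tuple[int, str]  # (number of active rays, hypothetical voxel label)


def active_rays(stencils: List[int]) -> List[int]:
    """Indices of rays whose stencil byte records at least one box hit."""
    return [i for i, s in enumerate(stencils) if s != 0]


def reschedule(queue: List[WorkUnit], voxel: str, stencils: List[int]) -> None:
    """Insert follow-up work for voxel, keeping the queue sorted by ray count."""
    remaining = len(active_rays(stencils))
    if remaining:  # a voxel with no hits needs no further processing
        bisect.insort(queue, (remaining, voxel))


queue: List[WorkUnit] = [(2, "a"), (40, "b"), (900, "c")]
reschedule(queue, "d", [0, 3, 0, 1, 0, 0, 7, 0, 0, 0])  # three rays still active
reschedule(queue, "e", [0] * 10)                         # all rays missed
```

After both calls the queue holds the new three-ray unit near the small end, and nothing has been added for the voxel whose rays all missed.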
At some point, however, there are no additional bounding boxes in the hierarchy. Once this occurs, assuming the GPU has uncovered a “hit,” it indicates that a computation be performed to determine whether the ray or rays intersect the primitives bounded by the bounding boxes. In this particular embodiment, this latter computation is performed by the CPU rather than by the GPU. Therefore, the CPU computes intersections between one or more rays and one or more graphical objects based at least in part on the computations performed by the GPU. The CPU completes such processing for a particular work unit by determining whether the ray or rays intersect any primitives. This is illustrated in
It is possible for a ray to intersect two or three objects. In order to address this, intersections between rays and primitives are cached and sorted using a z-buffer to determine which primitive is the first or closest intersection.
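A minimal Python sketch of selecting the closest intersection per ray follows. It keeps only the smallest depth seen for each ray, in the manner of a z-buffer comparison; the tuple layout, depth values, and primitive identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

Hit = Tuple[int, float, str]  # (ray index, depth along the ray, hypothetical primitive id)


def closest_hits(hits: List[Hit]) -> Dict[int, Tuple[float, str]]:
    """Keep, for each ray, the cached hit with the smallest depth."""
    nearest: Dict[int, Tuple[float, str]] = {}
    for ray, depth, prim in hits:
        if ray not in nearest or depth < nearest[ray][0]:
            nearest[ray] = (depth, prim)
    return nearest


# Ray 0 intersects three primitives; ray 1 intersects one.
hits = [(0, 5.0, "tri_a"), (0, 2.5, "tri_b"), (1, 7.0, "tri_a"), (0, 9.0, "tri_c")]
nearest = closest_hits(hits)
```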
It is, of course, noted that the claimed subject matter is not limited in scope to this particular embodiment, such as to a particular time at which the CPU queries the GPU, for example. As one example, the CPU may query the GPU at substantially predetermined time intervals. Alternatively, in an adaptive approach, the CPU may make queries based at least in part on the amount of processing for the GPU to do with respect to the latest work unit begun by the GPU. As noted, the manner in which the CPU queries the GPU may affect the processing efficiency of the GPU. It is intended, therefore, to include within the scope of the claimed subject matter, any and all ways in which the CPU may query the GPU. As should be clear, it is desirable to have GPU and CPU processing coincide or overlap temporally. In general, the greater the amount of time that both processors are processing in parallel, the greater the throughput. Thus, it is desirable that querying of the GPU by the CPU take place in a manner to make temporal overlap of processing by the processors as extensive as possible. Of course, as previously indicated, the manner in which this is accomplished effectively may vary with particular implementations. As was also previously suggested, the GPU, therefore, in this particular embodiment, is employed to accelerate processing by assisting the CPU to determine those voxels in which it is not desirable to attempt to compute ray-primitive intersections.
To illustrate querying of the GPU at any time,
In this particular embodiment, as suggested, when queried, if processing has ceased, the GPU provides information to the CPU regarding whether a bounding box has been intersected or whether the ray(s) have missed the object(s) in the particular voxel. Hitting or intersecting a bounding box suggests additional processing is desired; however, if a bounding box is not hit or intersected, then, likewise, the primitives bounded by the box will not be intersected. It is, of course, noted that the information desired is not simply that a ray has intersected a primitive, but also where that intersection has occurred, what is the closest ray-primitive intersection, etc. This is a reason to have the CPU complete the process begun by the GPU, once there are no more bounding boxes in the hierarchy. The GPU is employed, in essence, to determine when it is desirable to have the CPU compute ray-primitive intersections for a particular cell or voxel. However, advantages of employing a GPU to, at times, “walk the hierarchy” include that the GPU may perform calculations in parallel with the CPU and that the GPU may perform some calculations more efficiently than the CPU.
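The pruning logic described above, in which a missed bounding box rules out everything it bounds, can be sketched for a single ray in Python. The sketch is deliberately simplified: the hit test is passed in as a function, and the example uses hypothetical one-dimensional ranges in place of real boxes.

```python
from typing import Callable, Sequence, TypeVar

Box = TypeVar("Box")


def walk_hierarchy(boxes: Sequence[Box], hits: Callable[[Box], bool]) -> str:
    """Walk hierarchically successive boxes, outermost first, for one ray."""
    for box in boxes:
        if not hits(box):
            return "miss"            # nothing bounded by a missed box can be hit
    return "cpu_primitive_test"      # hierarchy exhausted: hand off to the CPU


# Hypothetical 1-D stand-in: each box is an (lo, hi) range and a ray is a point x.
nested = [(0.0, 10.0), (2.0, 8.0), (4.0, 6.0)]
outcomes = [walk_hierarchy(nested, lambda b, x=x: b[0] <= x <= b[1])
            for x in (5.0, 3.0, 11.0)]
```

Only the first ray reaches the innermost box and is handed to the CPU for ray-primitive intersection; the other two are discarded as soon as a box is missed.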
In summary, in this particular embodiment, a set of hierarchical voxels or bounding boxes is employed. Batches of substantially coherent rays are iterated through a bounding box hierarchy. After a voxel has completed its processing, the batch of rays is walked forward into the next set of voxels. Within this process, the number of rays contained within a voxel is employed to determine implicitly, via a two-sided queue, whether the computations would better be performed by the GPU or the CPU for the purposes of load balancing. As previously indicated, it is desirable that large batches be processed on the GPU and small batches be processed on the CPU to take advantage of the particular capabilities of these respective processors.
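The two-sided queue can be illustrated with a small Python simulation. It is a sketch under stated assumptions rather than the implementation: work units are reduced to their ray counts, the per-ray costs `gpu_ms_per_ray` and `cpu_ms_per_ray` are hypothetical, and whichever processor becomes free first takes the next unit from its own end of the queue.

```python
from collections import deque
from typing import List, Sequence, Tuple


def balance(ray_counts: Sequence[int],
            gpu_ms_per_ray: int = 1,
            cpu_ms_per_ray: int = 10) -> Tuple[List[int], List[int]]:
    """Simulate GPU and CPU pulling work from opposite ends of a sorted queue."""
    queue = deque(sorted(ray_counts))   # smallest batches on the left
    gpu_units: List[int] = []
    cpu_units: List[int] = []
    gpu_free_at = cpu_free_at = 0       # simulated time each processor becomes idle
    while queue:
        if gpu_free_at <= cpu_free_at:
            unit = queue.pop()          # GPU takes the largest remaining batch
            gpu_units.append(unit)
            gpu_free_at += unit * gpu_ms_per_ray
        else:
            unit = queue.popleft()      # CPU takes the smallest remaining batch
            cpu_units.append(unit)
            cpu_free_at += unit * cpu_ms_per_ray
    return gpu_units, cpu_units


gpu_units, cpu_units = balance([3, 12, 70, 450, 900, 20, 5])
```

With these assumed costs the GPU ends up with the two largest batches and the CPU with the five smallest. No explicit threshold is needed, because the split point falls wherever the two ends of the queue meet.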
It is, of course, now appreciated, based at least in part on the foregoing disclosure, that software may be produced capable of producing the desired graphics processing. It will, of course, also be understood that, although particular embodiments have just been described, the claimed subject matter is not limited in scope to a particular embodiment or implementation. For example, one embodiment may be in hardware, such as implemented to operate on a device or combination of devices as previously described, for example, whereas another embodiment may be in software. Likewise, an embodiment may be implemented in firmware, or with any combination of hardware, software, and/or firmware, for example. Likewise, although the claimed subject matter is not limited in scope in this respect, one embodiment may comprise one or more articles, such as a storage medium or storage media. These storage media, such as one or more CD-ROMs and/or disks, for example, may have stored thereon instructions that, when executed by a system, such as a computer system, a computing platform, a GPU, a CPU, another device or system, or combinations thereof, for example, may result in an embodiment of a method in accordance with the claimed subject matter being executed, such as one of the embodiments previously described, for example. As one potential example, a computing platform may include one or more processing units or processors, one or more input/output devices, such as a display, a keyboard and/or a mouse, and/or one or more memories, such as static random access memory, dynamic random access memory, flash memory, and/or a hard drive, although, again, the claimed subject matter is not limited in scope to this example.
In the preceding description, various aspects of the claimed subject matter have been described. For purposes of explanation, specific numbers, systems and/or configurations were set forth to provide a thorough understanding of the claimed subject matter. However, it should be apparent to one skilled in the art having the benefit of this disclosure that the claimed subject matter may be practiced without the specific details. In other instances, well-known features were omitted and/or simplified so as not to obscure the claimed subject matter. While certain features have been illustrated and/or described herein, many modifications, substitutions, changes and/or equivalents will now occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and/or changes as fall within the true spirit of the claimed subject matter.
|US6999100||Nov 28, 2000||Feb 14, 2006||Nintendo Co., Ltd.||Method and apparatus for anti-aliasing in a graphics system|
|US7015914||Dec 10, 2003||Mar 21, 2006||Nvidia Corporation||Multiple data buffers for processing graphics data|
|US7061502||Nov 28, 2000||Jun 13, 2006||Nintendo Co., Ltd.||Method and apparatus for providing logical combination of N alpha operations within a graphics system|
|US7071937||May 30, 2000||Jul 4, 2006||Ccvg, Inc.||Dirt map method and apparatus for graphic display system|
|US7091979||Aug 29, 2003||Aug 15, 2006||Nvidia Corporation||Pixel load instruction for a programmable graphics processor|
|US7119810||Dec 5, 2003||Oct 10, 2006||Siemens Medical Solutions Usa, Inc.||Graphics processing unit for simulation or medical diagnostic imaging|
|US7133041 *||Feb 26, 2001||Nov 7, 2006||The Research Foundation Of State University Of New York||Apparatus and method for volume processing and rendering|
|US7180523||Mar 31, 2000||Feb 20, 2007||Intel Corporation||Trimming surfaces|
|US7471291 *||Nov 6, 2006||Dec 30, 2008||The Research Foundation Of State University Of New York||Apparatus and method for real-time volume processing and universal three-dimensional rendering|
|US20030043169||Aug 31, 2001||Mar 6, 2003||Kevin Hunter||System and method for multi-sampling primitives to reduce aliasing|
|US20030227457||Jun 6, 2002||Dec 11, 2003||Pharr Matthew Milton||System and method of using multiple representations per object in computer graphics|
|US20040207623||Apr 18, 2003||Oct 21, 2004||Isard Michael A.||Distributed rendering of interactive soft shadows|
|US20050225670||Apr 2, 2004||Oct 13, 2005||Wexler Daniel E||Video processing, such as for hidden surface reduction or removal|
|1||"Last Time", CS679-Fall 2003-Copyright University of Wisconsin, 23 pgs.|
|2||"Order-Independent Transparency Rendering System and Method", U.S. Appl. No. 09/944,988, filed Aug. 31, 2001, 35 pgs.|
|3||"Last Time", CS679-Fall 2003—Copyright University of Wisconsin, 23 pgs.|
|4||Advisory action mailed Dec. 28, 2007 in co-pending U.S. Appl. No. 10/817,692, 4 pages.|
|5||Advisory Action mailed Oct. 14, 2008 in co-pending U.S. Appl. No. 10/792,497, 3 pages.|
|6||Advisory Action mailed Oct. 15, 2007 in co-pending U.S. Appl. No. 10/792,497, 4 pages.|
|7||Amendment AF filed Sep. 29, 2008 in co-pending U.S. Appl. No. 10/792,497, 23 pages.|
|8||Amendment After Final filed Dec. 12, 2007 in co-pending U.S. Appl. No. 10/817,692, 15 pages.|
|9||Amendment After Final filed Feb. 5, 2009 in co-pending U.S. Appl. No. 10/817,692, 17 pages.|
|10||Amendment After Final filed Sep. 27, 2007 in co-pending U.S. Appl. No. 10/792,497, 19 pages.|
|11||Amendment and RCE filed Dec. 29, 2008 in co-pending U.S. Appl. No. 10/792,497, 23 pages.|
|12||Amendment and RCE filed Jan. 31, 2008 in co-pending U.S. Appl. No. 10/817,692, 15 pages.|
|13||Amendment and RCE filed Oct. 29, 2007 in co-pending U.S. Appl. No. 10/792,497, 19 pages.|
|14||Amendment filed Jan. 19, 2007 in co-pending U.S. Appl. No. 10/817,692, 14 pages.|
|15||Amendment filed Jul. 17, 2007 in co-pending U.S. Appl. No. 10/817,692, 17 pages.|
|16||Amendment filed Jun. 22, 2006 in co-pending U.S. Appl. No. 10/817,692, 6 pages.|
|17||Amendment filed Jun. 27, 2008 in co-pending U.S. Appl. No. 10/817,692, 17 pages.|
|18||Amendment filed Mar. 4, 2008 in co-pending U.S. Appl. No. 10/792,497, 22 pages.|
|19||Amendment filed May 7, 2009 in co-pending U.S. Appl. No. 10/792,497, 20 pages.|
|20||*||Amit Reisman et al., Parallel Progressive Rendering of Animation Sequences at Interactive Rates on Distributed-Memory Machines, 1997, pp. 39-47.|
|21||Carr et al., "The Ray Engine", University of Illinois, Graphics Hardware 2002, 10 pgs.|
|22||Certificate of Patent mailed Feb. 19, 2010 in co-pending JP Patent Application No. 2007-532499, 4 pages.|
|23||CN Office Action & Translation issued Mar. 20, 2009 in co-pending CN Application No. 200580031001.1, 17 pages.|
|24||Decision to Grant a Patent mailed Jan. 7, 2010 in co-pending JP Patent Application No. 2007-532499, 4 pages.|
|25||Djeu, Peter, "Graphics on a Stream Processor", Mar. 20, 2003, 53 pgs.|
|26||Everitt, "Interactive Order-Independent Transparency", May 15, 2001, 12 pages.|
|27||Final Office action mailed Aug. 18, 2009 in co-pending U.S. Appl. No. 10/792,497, 20 pages.|
|28||Final Office Action mailed Jul. 27, 2007 in co-pending U.S. Appl. No. 10/792,497, 15 pages.|
|29||Final Office Action mailed Jul. 28, 2008 in co-pending U.S. Appl. No. 10/792,497, 19 pages.|
|30||Final Office action mailed Nov. 6, 2008 in co-pending U.S. Appl. No. 10/817,692, 15 pages.|
|31||Final Office action mailed Oct. 12, 2007 in co-pending U.S. Appl. No. 10/817,692, 14 pages.|
|32||Haeberli et al., "The Accumulation Buffer: Hardware Support for High-Quality Rendering", Computer Graphics, vol. 24, No. 4, Aug. 1990, pp. 309-318.|
|33||Haller, Michael, "Shader Programming Cg, NVIDIA'S Shader Language", 2003, www.nvidia.com, 45 pgs.|
|34||Heckbert et al., "Beam Tracing Polygonal Objects", Computer Graphics, vol. 18, No. 3, Jul. 1984, pp. 119-127.|
|35||International Search Report and the Written Opinion, International application No. PCT/US05/33170, dated Apr. 2007.|
|36||Issue Fee and Comments on Statement of Reasons for Allowance filed May 21, 2009 in co-pending U.S. Appl. No. 10/817,692, 3 pages.|
|37||Issue Notification mailed Jun. 10, 2009 in co-pending U.S. Appl. No. 10/817,692, 1 page.|
|38||Kapasi et al., "Programmable Stream Processors", Computer. Org., vol. 36, No. 8, Aug. 2003 IEEE Computer Society, pp. 1-14.|
|39||Kumar et al., "Efficient Rendering of Trimmed NURBS Surfaces", Apr. 23, 1995, pp. 1-28.|
|40||Lindholm et al., "A User-Programmable Vertex Engine", NVIDIA Corporation, 2001 ACM, pp. 149-158.|
|41||Luebke, David, "Programmable Graphics Hardware", Nov. 20, 2003, 22 pgs.|
|42||Macedonia, Michael, "The GPU Enters Computing's Mainstream", 2003 IEEE, pp. 1-5.|
|43||Mammen, Abraham, "Transparency and Antialiasing Algorithms Implemented with the Virtual Pixel Maps Technique", IEEE Computer Graphics and Applications, vol. 9, Issue 4:43-55, Jul. 1989.|
|44||*||Nathan A. Carr et al., The Ray Engine, The Eurographics Association, 2002, pp. 37-46.|
|45||Notice of Allowance mailed Feb. 26, 2009 in co-pending U.S. Appl. No. 10/817,692, 8 pages.|
|46||Notice of Reasons for Rejection mailed Aug. 19, 2009 in co-pending JP Patent Application No. 2007-532499, 38 pages.|
|47||Office action mailed Apr. 17, 2007 in co-pending U.S. Appl. No. 10/817,692, 13 pages.|
|48||Office Action mailed Dec. 4, 2007 in co-pending U.S. Appl. No. 10/792,497, 19 pages.|
|49||Office action mailed Feb. 2, 2009 in co-pending U.S. Appl. No. 10/792,497, 17 pages.|
|50||Office action mailed Feb. 8, 2007 in co-pending U.S. Appl. No. 10/792,497, 17 pages.|
|51||Office action mailed Mar. 13, 2006 in co-pending U.S. Appl. No. 10/817,692, 13 pages.|
|52||Office action mailed Mar. 27, 2008 in co-pending U.S. Appl. No. 10/817,692, 13 pages.|
|53||Office action mailed Sep. 19, 2006 in co-pending U.S. Appl. No. 10/817,692, 12 pages.|
|54||Peercy et al., "Interactive Multi-Pass Programmable Shading", ACM 2000, pp. 425-432.|
|55||Pharr et al., "Rendering Complex Scenes with Memory-Coherent Ray Tracing", Computer Science Dept., Stanford University, 1997, 8 pgs.|
|56||Polyglot, "What Are These Pixel Shaders of Which You Speak", Oct. 28, 2003, Kuro5hin, www.kuro5hin.org., pp. 1-11.|
|57||Proudfoot et al., "A Real-Time Procedural Shading System for Programmable Graphics Hardware", 2001 ACM, pp. 159-170.|
|58||Purcell et al., "Ray Tracing on Programmable Graphics Hardware", Stanford University, 2002 ACM, pp. 703-712.|
|59||*||Purcell et al., Ray Tracing on Programmable Graphics Hardware, © 2002, Association for Computing Machinery, Inc., pp. 703-712.|
|60||Response filed May 8, 2007 in co-pending U.S. Appl. No. 10/792,497, 22 pages.|
|61||Rockwood et al., "Real-time Rendering of Trimmed Surfaces", Computer Graphics, vol. 23, No. 3, Jul. 1989, pp. 107-116.|
|62||Shantz et al., "Rendering Trimmed NURBS with Adaptive Forward Differencing", Computer Graphics, vol. 22, No. 4, Aug. 1988, pp. 189-198.|
|63||Shinya et al., "Principles and Applications of Pencil Tracing", Computer Graphics, vol. 21, No. 4, Jul. 1987, pp. 45-54.|
|64||Tomov, Stan, "Numerical Simulations Using Programmable GPUs", Data Analysis and Visualization, Sep. 5, 2003, Brookhaven Science Associates, 16 pgs.|
|65||*||Tong-Yee Lee, C.S. Raghavendra, John B. Nicholas, "Load Balancing Strategies for Ray Tracing on Parallel Processors", IEEE Region Annual International Conference, pp. 177-181, 1994.|
|66||*||Turner Whitted, An Improved Illumination Model for Shaded Display, ACM Press, Jun. 1980, pp. 343-349.|
|67||U.S. Appl. No. 10/817,692, filed Apr. 2004, Wexler.|
|68||*||Wilfrid Lefer, An Efficient Parallel Ray Tracing Scheme for Distributed Memory Parallel Computers, ACM Press, Nov. 1993, pp. 77-80.|
|69||Zenz, Dave, "Advances in Graphics Architectures", Dell Graphics Technologist, Sep. 2002 Dell Computer Corporation, pp. 1-6.|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US8120611 *||May 24, 2007||Feb 21, 2012||Kabushiki Kaisha Toshiba||Information processing apparatus and information processing method|
|US8243081 *||Aug 22, 2006||Aug 14, 2012||International Business Machines Corporation||Methods and systems for partitioning a spatial index|
|US8289332 *||Dec 8, 2005||Oct 16, 2012||Sinvent As||Apparatus and method for determining intersections|
|US8587588 *||Aug 18, 2009||Nov 19, 2013||Dreamworks Animation Llc||Ray-aggregation for ray-tracing during rendering of imagery|
|US8795087||Feb 14, 2012||Aug 5, 2014||Empire Technology Development Llc||Load balancing in cloud-based game system|
|US9237115||Jun 16, 2014||Jan 12, 2016||Empire Technology Development Llc||Load balancing in cloud-based game system|
|US20080001954 *||May 24, 2007||Jan 3, 2008||Yoshiyuki Hirabayashi||Information processing apparatus and information processing method|
|US20080049016 *||Aug 22, 2006||Feb 28, 2008||Robert Allen Shearer||Methods and Systems for Partitioning A Spatial Index|
|US20080259078 *||Dec 8, 2005||Oct 23, 2008||Tor Dokken||Apparatus and Method for Determining Intersections|
|US20110043521 *||Aug 18, 2009||Feb 24, 2011||Dreamworks Animation Llc||Ray-aggregation for ray-tracing during rendering of imagery|
|WO2013122572A1 *||Feb 14, 2012||Aug 22, 2013||Empire Technology Development Llc||Load balancing in cloud-based game system|
|U.S. Classification||345/503, 345/502, 345/419|
|International Classification||G06F15/16, G06T15/00|
|Cooperative Classification||G06T1/20, G06F2209/509, G06F9/505, G06F9/5066, G06T15/005|
|European Classification||G06F9/50C2, G06F9/50A6L, G06T15/00A|
|Sep 16, 2005||AS||Assignment|
Owner name: NVIDIA CORPORATION, CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: WEXLER, DANIEL ELLIOT; GRITZ, LARRY I.; ENDERTON, ERIC B.; AND OTHERS; REEL/FRAME: 017002/0129
Effective date: 20050915
|Jun 11, 2014||FPAY||Fee payment|
Year of fee payment: 4