|Publication number||US7420572 B1|
|Application number||US 11/313,085|
|Publication date||Sep 2, 2008|
|Filing date||Dec 19, 2005|
|Priority date||Dec 19, 2005|
|Also published as||US8432406|
|Inventors||Lordson L. Yue, Vimal S. Parikh|
|Original Assignee||Nvidia Corporation|
The invention relates generally to graphics processing. More particularly, the invention relates to an apparatus, system, and method for clipping graphics primitives with accelerated context switching.
Advanced Graphics Processing Units (“GPUs”) sometimes implement techniques for context switching. In general, context switching refers to switching execution among multiple contexts such that those contexts can share a common processing resource. Multiple contexts can be related to distinct operation modes of the same application program or multiple application programs.
In order to accelerate graphics processing, it is desirable to reduce a response time for context switching. As can be appreciated, this response time represents a performance penalty for context switching and is typically dependent upon a pair of factors, namely an amount of time to complete any pending work and an amount of execution state information to be stored and restored. Typically, a larger amount of time to complete any pending work translates into a longer response time, thus resulting in a larger performance penalty. Similarly, a larger amount of execution state information to be stored and restored typically translates into a longer response time and a larger performance penalty. Unfortunately, current techniques for context switching can be deficient from the standpoint of one or both of these factors, particularly with respect to clipping graphics primitives.
One current technique for context switching is a “wait for idle” technique. In accordance with this technique, context switching is typically placed on hold so as to complete any pending work in connection with clipping a graphics primitive. While an amount of execution state information to be stored and restored is thus reduced, completing the pending work often takes an undesirable amount of time. Indeed, in some instances, completing the pending work can take hundreds of clock cycles, particularly when clipping the graphics primitive with respect to multiple clipping planes. Another current technique for context switching is a “halt style” technique. In accordance with this technique, any pending work in connection with clipping a graphics primitive is halted prior to its completion to allow context switching. While an amount of time to complete the pending work is thus reduced, an amount of execution state information to be stored and restored is often undesirably large, particularly given the extensive amount of information that is typically maintained in hardware while clipping the graphics primitive. Indeed, in some instances, storing and restoring the execution state information can take numerous clock cycles.
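By way of a non-limiting illustration, the trade-off between the two techniques described above can be expressed as a simple cost model. The sketch below is an assumption made for explanatory purposes only: the function name, the additive form of the model, and all cycle counts are hypothetical and are not measurements of any particular hardware.

```python
def switch_response_cycles(pending_work_cycles, state_words, cycles_per_word=1):
    """Hypothetical additive model of context-switch response time.

    pending_work_cycles: cycles spent completing pending work before switching.
    state_words: words of execution state that are stored and later restored.
    cycles_per_word: assumed cost of moving one word in one direction.
    """
    store_and_restore = 2 * state_words * cycles_per_word
    return pending_work_cycles + store_and_restore


# Illustrative numbers only (not taken from any real device).
# "Wait for idle": long pending work, little state to move.
wait_for_idle = switch_response_cycles(pending_work_cycles=300, state_words=8)
# "Halt style": no pending work, but a large amount of state to move.
halt_style = switch_response_cycles(pending_work_cycles=0, state_words=200)
```

Under this assumed model, each technique minimizes one term at the expense of the other, which is the deficiency noted above.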
It is against this background that a need arose to develop the apparatus, system, and method described herein.
In one aspect, the invention relates to a graphics processing apparatus. In one embodiment, the graphics processing apparatus includes a clipping unit that is configured to issue an initial set of outputs based on execution of a set of clipping operations. The graphics processing apparatus also includes a control unit that is connected to the clipping unit. The control unit is configured to preserve an initial execution state of the clipping unit in response to an initial command for context switching, and the initial execution state is preserved based on a number of the initial set of outputs.
In another aspect, the invention relates to a clipping unit. In one embodiment, the clipping unit includes a clipping engine that is configured to produce outputs for a current context. The clipping unit also includes an output unit that is connected to the clipping engine. The output unit is configured to selectively issue a subset of the outputs based on whether the subset of the outputs is yet to be issued for the current context.
In a further aspect, the invention relates to a graphics processing method. In one embodiment, the graphics processing method includes issuing an initial set of outputs based on clipping of a graphics primitive. The graphics processing method also includes, in response to an initial command for context switching: (i) suspending the clipping of the graphics primitive; and (ii) preserving an initial execution state based on an indication that the initial set of outputs has already been issued.
Other aspects and embodiments of the invention are also contemplated. The foregoing summary and the following detailed description are not meant to restrict the invention to any particular embodiment but are merely meant to describe some embodiments of the invention.
For a better understanding of the nature and objects of some embodiments of the invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings, in which:
Like reference numerals are used to refer to corresponding components of the drawings.
The computer 102 includes a Central Processing Unit (“CPU”) 108, which is connected to a memory 110 over a bus 122. Referring to
In the illustrated embodiment, the graphics processing apparatus 112 performs a number of operations when servicing a particular context related to the application programs 124. This context can be related to a particular operational mode of the application programs 124, such as a three-dimensional (“3-D”) graphics mode, a two-dimensional (“2-D”) graphics mode, or a video mode. Referring to
During execution of a particular context, the transformation module 116 initially receives a graphics primitive that represents an object to be displayed. Examples of graphics primitives include one-dimensional graphics primitives, such as lines, and two-dimensional graphics primitives, such as polygons. Referring to
The foregoing describes operation of the graphics processing apparatus 112 in the absence of context switching. Advantageously, the graphics processing apparatus 112 also operates in accordance with an improved technique for context switching. This improved technique provides the graphics processing apparatus 112 with multi-tasking capabilities by allowing the graphics processing apparatus 112 to efficiently service multiple contexts related to the application programs 124. In particular, when execution switches from a first context to a second context, the graphics processing apparatus 112 preserves an execution state of the first context prior to servicing the second context. The execution state of the first context represents a degree of progress of execution of the first context. Subsequent to servicing the second context, the graphics processing apparatus 112 restores the execution state of the first context so that the execution of the first context can proceed. In connection with context switching, execution state information of the first context is collected from the transformation module 116, the clipping module 118, and the rasterization module 120, and this execution state information is then delivered to the local memory 126 for storage. Subsequent to servicing the second context, this execution state information is retrieved from the local memory 126 and then restored to the transformation module 116, the clipping module 118, and the rasterization module 120. In the illustrated embodiment, context switching is typically initiated by the CPU 108, which issues a context switching command that is delivered to the graphics processing apparatus 112 over the bus 122. However, it is also contemplated that context switching can be initiated by the graphics processing apparatus 112.
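By way of a non-limiting illustration, the store-and-restore sequence described above can be sketched in software. The class names, method names, and dictionary-based state are hypothetical stand-ins chosen for this sketch; they do not describe the actual hardware modules 116, 118, and 120 or the local memory 126.

```python
class Module:
    """Hypothetical stand-in for one pipeline stage holding per-context state."""

    def __init__(self, name):
        self.name = name
        self.state = {}

    def collect_state(self):
        # Return a copy so later work does not alter the saved snapshot.
        return dict(self.state)

    def restore_state(self, saved):
        self.state = dict(saved)


class GraphicsPipeline:
    def __init__(self):
        self.modules = [Module("transformation"), Module("clipping"),
                        Module("rasterization")]
        self.local_memory = {}  # context id -> {module name -> saved state}
        self.current = None

    def switch_to(self, context_id):
        # Preserve the outgoing context's execution state in local memory.
        if self.current is not None:
            self.local_memory[self.current] = {
                m.name: m.collect_state() for m in self.modules
            }
        # Restore the incoming context's state, or start clean if it is new.
        saved = self.local_memory.pop(context_id, {})
        for m in self.modules:
            m.restore_state(saved.get(m.name, {}))
        self.current = context_id


pipe = GraphicsPipeline()
pipe.switch_to("first")
pipe.modules[1].state["outputs_issued"] = 3   # progress made in the first context
pipe.switch_to("second")                       # first context is preserved
state_seen_by_second = dict(pipe.modules[1].state)
pipe.switch_to("first")                        # first context is restored
state_after_restore = dict(pipe.modules[1].state)
```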
With respect to the clipping module 118, context switching is implemented so as to reduce both an amount of time to complete any pending work and an amount of execution state information to be stored and restored. In particular, in response to receiving a context switching command, the clipping module 118 suspends clipping a graphics primitive in accordance with the first context. Advantageously, clipping the graphics primitive is suspended prior to its completion and with little or no delay upon receiving the context switching command. The clipping module 118 then collects execution state information of the first context and delivers this execution state information to the local memory 126 for storage. Advantageously, this execution state information represents a reduced and optimized set of information to preserve an execution state of clipping the graphics primitive and need not include a variety of other information that is maintained in the clipping module 118. In particular, as further described below, this execution state is readily preserved based on how many outputs have already been issued by the clipping module 118 in connection with clipping the graphics primitive. Subsequent to servicing the second context, this execution state is restored to the clipping module 118, which resumes issuing outputs that have not yet been issued in connection with clipping the graphics primitive. By operating in such manner, the clipping module 118 allows context switching to be performed in an accelerated manner so as to reduce a performance penalty for context switching. In addition, such accelerated context switching is achieved with little or no additional cost or complexity as compared with a conventional implementation.
Attention next turns to
As illustrated in
As illustrated in
Advantages and features of the clipping module 118 can be further understood in conjunction with
Next, at Tswitch1, the control unit 214 receives an initial context switching command and directs the clipping unit 204 to switch to a second context. In response, the clipping engine 206 suspends clipping the graphics primitive in accordance with the first context. In conjunction, the control unit 214 collects contents of the vertex memory 202 and the counter 212 so as to preserve an execution state of clipping the graphics primitive. In particular, the control unit 214 retrieves a content of the counter 212 so as to preserve an indication that three outputs have already been issued as a result of clipping the graphics primitive. Upon switching to the second context, the clipping unit 204 can either service the second context or remain idle.
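By way of a non-limiting illustration, the reduced execution state collected at Tswitch1 can be sketched as a record holding only the vertex memory contents and the count of issued outputs. The names `SavedClipState`, `ClippingUnitModel`, and `preserve`, as well as the `scratch` field, are hypothetical and are introduced solely for this sketch.

```python
from dataclasses import dataclass
from typing import Tuple

Vertex = Tuple[float, float, float, float]  # assumed (x, y, z, w) layout


@dataclass(frozen=True)
class SavedClipState:
    vertices: Tuple[Vertex, ...]  # contents of the vertex memory
    issued_count: int             # content of the counter of issued outputs


class ClippingUnitModel:
    def __init__(self, vertices):
        self.vertex_memory = list(vertices)
        self.issued_counter = 0
        self.scratch = {}   # intermediate work; deliberately never saved
        self.active = True

    def issue_output(self):
        self.issued_counter += 1


def preserve(unit):
    """Suspend clipping and collect only the vertices and the issued count."""
    unit.active = False
    return SavedClipState(tuple(unit.vertex_memory), unit.issued_counter)


triangle = [(0.0, 0.0, 0.0, 1.0), (2.0, 0.0, 0.0, 1.0), (0.0, 2.0, 0.0, 1.0)]
unit = ClippingUnitModel(triangle)
for _ in range(3):                 # three outputs issued before the switch
    unit.issue_output()
unit.scratch["partial_edge"] = (1.0, 0.5)   # in-flight work that is dropped
saved = preserve(unit)
```

In this sketch the in-flight `scratch` contents are simply abandoned, which reflects the point that intermediate work need not be stored when the outputs can be produced again.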
In conjunction, the output unit 208 selectively issues the outputs O4 through On without reissuing the outputs O1 through O3. The output unit 208 can either discard the outputs O1 through O3 or designate these outputs as being invalid. In the illustrated example, the output unit 208 references contents of the counters 210 and 212 to determine which outputs should be issued. In particular, the counter 210 is cleared upon restart of clipping the graphics primitive, namely at or around Tswitch2, and again tracks a number of outputs that are produced as a result of clipping the graphics primitive. However, the counter 212 is not initially incremented so as to preserve the indication that three outputs have already been issued prior to resuming the first context. Thus, by comparing the contents of the counters 210 and 212, the output unit 208 will not issue any output until the output O4 is produced, upon which the counters 210 and 212 are again incremented substantially in unison. In the absence of further context switching, the counter 212 is then cleared upon completion of clipping the graphics primitive, namely at or around Tend. Alternatively, the clipping module 118 can operate in a similar manner as described above in the event of further context switching.
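By way of a non-limiting illustration, the comparison of the counters 210 and 212 described above can be sketched as follows. The class `OutputUnitModel` and its attribute names are hypothetical; the attribute `produced` plays the role of the counter 210 and the attribute `issued` plays the role of the counter 212.

```python
class OutputUnitModel:
    def __init__(self, issued_before=0):
        self.produced = 0             # cleared on restart (role of counter 210)
        self.issued = issued_before   # restored from saved state (role of counter 212)

    def on_output(self, output):
        """Return the output if it should be issued, else None (discarded)."""
        self.produced += 1
        if self.produced <= self.issued:
            return None   # already issued before the context switch
        self.issued += 1  # from here on, both counters advance in unison
        return output


# Clipping is restarted from the beginning after three outputs were issued.
outputs = ["O1", "O2", "O3", "O4", "O5", "O6"]
output_unit = OutputUnitModel(issued_before=3)
reissued = []
for o in outputs:
    result = output_unit.on_output(o)
    if result is not None:
        reissued.append(result)
```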
As can be appreciated at this point, embodiments of the invention allow context switching to be performed in an accelerated manner and with little or no additional cost or complexity as compared with a conventional implementation. In particular, context switching is accelerated by reducing both an amount of time to complete any pending work and an amount of execution state information to be stored and restored. For example, prior to context switching, a clipping program can be executed in accordance with an initial context to produce an initial sequence of outputs. In the event of context switching, execution of the clipping program can be suspended prior to its completion and with little or no delay. A reduced and optimized set of information can be collected so as to preserve an initial execution state of the clipping program. In particular, the initial execution state can be preserved simply based on contents of a vertex memory and a counter, without requiring other contents of an internal datapath pipeline or scratch registers to be stored and restored. Upon resuming the initial context, the initial execution state can be restored, and the clipping program can be restarted from the beginning. By referencing the initial execution state, a remaining sequence of outputs can be issued without reissuing the initial sequence of outputs.
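By way of a non-limiting illustration, the suspend-and-restart sequence summarized above can be sketched end to end. The source does not name a particular clipping algorithm; the sketch below assumes a Sutherland-Hodgman style clip of a two-dimensional polygon against a single boundary, and the function names and the `stop_after` parameter are hypothetical.

```python
def clip_polygon_x_max(vertices, x_max):
    """Yield clipped vertices one at a time (Sutherland-Hodgman, one boundary)."""
    for i in range(len(vertices)):
        (x0, y0), (x1, y1) = vertices[i - 1], vertices[i]
        in0, in1 = x0 <= x_max, x1 <= x_max
        if in0 != in1:
            # The edge crosses the boundary: emit the intersection point.
            t = (x_max - x0) / (x1 - x0)
            yield (x_max, y0 + t * (y1 - y0))
        if in1:
            yield (x1, y1)


def run(vertices, x_max, already_issued=0, stop_after=None):
    """Restart clipping from the beginning, skipping outputs already issued.

    Returns (newly issued outputs, total issued count). stop_after models a
    context switch arriving once that many outputs have been issued in total.
    """
    issued, count = [], already_issued
    for produced, out in enumerate(clip_polygon_x_max(vertices, x_max), start=1):
        if produced <= already_issued:
            continue  # produced again on restart, but not reissued
        if stop_after is not None and count >= stop_after:
            break     # context switch: suspend without finishing
        issued.append(out)
        count += 1
    return issued, count


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
uninterrupted, _ = run(square, x_max=1.0)
before_switch, saved_count = run(square, x_max=1.0, stop_after=2)
after_resume, _ = run(square, x_max=1.0, already_issued=saved_count)
```

Because the clipping program in this sketch is deterministic, restarting it from the beginning produces the same sequence of outputs, so the saved count alone identifies where issuing should resume.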
It should be appreciated that the specific embodiments of the invention described above are provided by way of example, and various other embodiments are encompassed by the invention. For example, while some embodiments have been described with reference to clipping graphics primitives, it is contemplated that the improved technique for context switching can be similarly implemented for various applications, such as those in which a relatively large number of outputs are produced or those in which completing any pending work can take a relatively long time.
Some embodiments of the invention relate to a computer storage product with a computer-readable medium having computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the invention, or they may be of the kind well known and available to those having skill in the computer software arts. Examples of computer-readable media include, but are not limited to: magnetic storage media such as hard disks, floppy disks, and magnetic tape; optical storage media such as Compact Disc/Digital Video Discs (“CD/DVDs”), Compact Disc-Read Only Memories (“CD-ROMs”), and holographic devices; magneto-optical storage media such as floptical disks; carrier wave signals; and hardware devices that are specially configured to store and execute program code, such as Application-Specific Integrated Circuits (“ASICs”), Programmable Logic Devices (“PLDs”), and ROM and RAM devices. Examples of computer code include, but are not limited to, machine code, such as produced by a compiler, and files containing higher-level code that are executed by a computer using an interpreter. For example, an embodiment of the invention may be implemented using Java, C++, or other object-oriented programming language and development tools. Additional examples of computer code include, but are not limited to, encrypted code and compressed code.
Some embodiments of the invention can be implemented using computer code in place of, or in combination with, hardwired circuitry. For example, with reference to
While the invention has been described with reference to the specific embodiments thereof, it should be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the true spirit and scope of the invention as defined by the appended claims. In addition, many modifications may be made to adapt a particular situation, material, composition of matter, method, process operation or operations, to the objective, spirit and scope of the invention. All such modifications are intended to be within the scope of the claims appended hereto. In particular, while the methods disclosed herein have been described with reference to particular operations performed in a particular order, it will be understood that these operations may be combined, sub-divided, or re-ordered to form an equivalent method without departing from the teachings of the invention. Accordingly, unless specifically indicated herein, the order and grouping of the operations is not a limitation of the invention.
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US3889107||Sep 27, 1973||Jun 10, 1975||Evans & Sutherland Computer Co||System of polygon sorting by dissection|
|US4958305||Nov 4, 1987||Sep 18, 1990||General Electric Company||Polygon edge clipping|
|US5051737||Feb 23, 1989||Sep 24, 1991||Silicon Graphics, Inc.||Efficient graphics process for clipping polygons|
|US5224210 *||Jun 18, 1992||Jun 29, 1993||Hewlett-Packard Company||Method and apparatus for graphics pipeline context switching in a multi-tasking windows system|
|US5420980 *||Mar 16, 1993||May 30, 1995||Hewlett-Packard Company||Methods and apparatus for graphics pipeline relative addressing in a multi-tasking windows system|
|US5428779 *||Nov 9, 1992||Jun 27, 1995||Seiko Epson Corporation||System and method for supporting context switching within a multiprocessor system having functional blocks that generate state programs with coded register load instructions|
|US5455958 *||Oct 19, 1994||Oct 3, 1995||International Business Machines Corporation||Rendering context manager for display adapters|
|US5564009 *||Jun 2, 1995||Oct 8, 1996||Hewlett-Packard Company||Methods and apparatus for burst data block movement in a multi-tasking windows system|
|US5572657 *||Dec 9, 1994||Nov 5, 1996||Hewlett-Packard Company||Methods and apparatus for graphics block movement in a multi-tasking windows system|
|US5613052||Sep 2, 1993||Mar 18, 1997||International Business Machines Corporation||Method and apparatus for clipping and determining color factors for polygons|
|US5720019||Jun 8, 1995||Feb 17, 1998||Hewlett-Packard Company||Computer graphics system having high performance primitive clipping preprocessing|
|US5838331||Aug 18, 1997||Nov 17, 1998||Parametric Technology Corporation||Computer graphics system for creating and enhancing texture maps|
|US5877773||May 30, 1997||Mar 2, 1999||Hewlett-Packard Company||Multi-pass clipping in a geometry accelerator|
|US5949421||Mar 31, 1997||Sep 7, 1999||Cirrus Logic, Inc.||Method and system for efficient register sorting for three dimensional graphics|
|US5986669||Sep 9, 1997||Nov 16, 1999||Intergraph Corporation||Graphics processing with efficient clipping|
|US6052129||Oct 1, 1997||Apr 18, 2000||International Business Machines Corporation||Method and apparatus for deferred clipping of polygons|
|US6061066||Mar 23, 1998||May 9, 2000||Nvidia Corporation||Method and apparatus for creating perspective correct graphical images|
|US6137497||May 30, 1997||Oct 24, 2000||Hewlett-Packard Company||Post transformation clipping in a geometry accelerator|
|US6181352||Mar 22, 1999||Jan 30, 2001||Nvidia Corporation||Graphics pipeline selectively providing multiple pixels or multiple textures|
|US6208361||Jun 15, 1998||Mar 27, 2001||Silicon Graphics, Inc.||Method and system for efficient context switching in a computer graphics system|
|US6359630||Jun 14, 1999||Mar 19, 2002||Sun Microsystems, Inc.||Graphics system using clip bits to decide acceptance, rejection, clipping|
|US6459438||Feb 2, 2000||Oct 1, 2002||Ati International Srl||Method and apparatus for determining clipping distance|
|US6507348||Feb 2, 2000||Jan 14, 2003||Ati International, Srl||Method and apparatus for clipping an object element in accordance with a clip volume|
|US6512524||Feb 2, 2000||Jan 28, 2003||Ati International, Srl||Method and apparatus for object element attribute determination using barycentric coordinates|
|US6552723||Aug 20, 1999||Apr 22, 2003||Apple Computer, Inc.||System, apparatus and method for spatially sorting image data in a three-dimensional graphics pipeline|
|US6577305||Aug 20, 1999||Jun 10, 2003||Apple Computer, Inc.||Apparatus and method for performing setup operations in a 3-D graphics pipeline using unified primitive descriptors|
|US6597363 *||Aug 20, 1999||Jul 22, 2003||Apple Computer, Inc.||Graphics processor with deferred shading|
|US6621495 *||Jun 8, 2000||Sep 16, 2003||International Business Machines Corporation||Method and apparatus to handle immediate mode data streams in a data processing system|
|US6686924||Feb 2, 2000||Feb 3, 2004||Ati International, Srl||Method and apparatus for parallel processing of geometric aspects of video graphics data|
|US6782432||Jun 30, 2000||Aug 24, 2004||Intel Corporation||Automatic state savings in a graphics pipeline|
|US6928646||Feb 2, 2000||Aug 9, 2005||Sony Corporation||System and method for efficiently performing scheduling operations in an electronic device|
|US7088359||Apr 23, 2003||Aug 8, 2006||Via Technologies, Inc.||Vertex reordering in 3D graphics|
|US7292242||Aug 11, 2004||Nov 6, 2007||Nvidia Corporation||Clipping with addition of vertices to existing primitives|
|US20030095137||Nov 16, 2001||May 22, 2003||Chung-Yen Lu||Apparatus and method for clipping primitives in a computer graphics system|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US8139070 *||Oct 3, 2007||Mar 20, 2012||Matrox Graphics, Inc.||Systems for and methods of context switching in a graphics processing system|
|US8909892||Jun 15, 2012||Dec 9, 2014||Nokia Corporation||Method, apparatus, and computer program product for fast context switching of application specific processors|
|U.S. Classification||345/620, 345/506, 345/619|
|International Classification||G09G5/00, G06T1/20|
|Cooperative Classification||G09G5/363, G06T1/20|
|European Classification||G09G5/36C, G06T1/20|
|Date||Code||Event||Description|
|Dec 19, 2005||AS||Assignment||Owner name: NVIDIA CORPORATION, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:YUE, LORDSON L.;PARIKH, VIMAL S.;REEL/FRAME:017396/0899. Effective date: 20051214|
|Feb 1, 2012||FPAY||Fee payment||Year of fee payment: 4|
|Feb 25, 2016||FPAY||Fee payment||Year of fee payment: 8|