US 7382381 B2
A graphics to video encoder is disclosed. The encoder comprises a client image constructor that receives client input and constructs client image frames based on the client input. A scene integrator is coupled to the client image constructor. The scene integrator accesses base image frames and integrates client image frames with base image frames to generate client scene frames. The graphics to video encoder also has coupled to the scene integrator a video encoder that encodes and outputs the client scene frames as a video bitstream.
1. A graphics to video encoder, comprising:
a client image constructor that receives client input and constructs client image frames based on the client input;
a scene integrator coupled to the client image constructor and accessing base image frames, each of said base image frames based upon a three dimensional model of a game event received from a server, and integrating ones of the client image frames with ones of the base image frames to generate client scene frames; and
a video encoder coupled to the scene integrator and that encodes and outputs the client scene frames as a video bitstream.
2. The encoder of
3. The encoder of
4. The encoder of
5. The encoder of
6. The encoder of
7. The encoder of
8. The encoder of
9. The encoder of
10. A method of integrating client input into a graphics based image, comprising:
receiving client input;
generating client image frames based on said client input;
integrating ones of said client image frames with ones of base image frames to produce client scene frames, wherein each of said base image frames is based upon a three dimensional model of a game event received from a server;
encoding said client scene frames to produce a video bitstream; and
transmitting said video bitstream.
11. The method of
12. The method of
13. The method of
14. The method of
15. The method of
16. The method of
17. The method of
18. The method of
19. A system for forming and transmitting a video bitstream, said system comprising:
a first server that generates three-dimensional graphics; and
a second server coupled to the first server and comprising:
a client image constructor that receives client input and constructs client image frames based on the client input;
a graphics-rendering unit that generates base image frames based on the three-dimensional graphics of a game event received from said first server;
a scene integrator coupled to the client image constructor and the graphics-rendering unit, said scene integrator integrating ones of the client image frames with ones of the base image frames to generate client scene frames; and
a video encoder coupled to the scene integrator and that encodes and outputs the scene frames as a video bitstream.
20. The system of
21. The system of
22. The system of
23. The system of
24. The system of
25. The system of
Embodiments of the present invention relate to the field of video encoding. More specifically, embodiments of the present invention relate to integrating an image based on client input with a graphical image.
A wide array of mobile clients, such as personal digital assistants (PDAs) and cellular telephones, include a display screen for displaying streaming video content. With the expanded bandwidth of wireless networks (e.g., 3G wireless networks), it was believed that streaming video would occupy the vast majority of wireless media. However, the fastest-growing applications have instead been in the arena of mobile network games based on three-dimensional (3D) graphics models. For instance, in countries such as Korea and Japan, the use of mobile network games has increased such that there is a substantial desire to access mobile network games using mobile electronic devices.
Mobile network games require real-time interactivity, which demands high-volume and timely delivered data. This is a difficult task for the current 3G wireless network real-time mechanism. Moreover, typical mobile clients are low-powered lightweight devices with limited computing resources; therefore, they lack the ability to render the millions of triangles per second typically necessary for high quality graphics. As a result, current mobile online games are typically limited in group size and interaction, and are simplistic in visual quality.
In order to improve the quality of mobile online games, a fundamental advance in wireless network technology and a drastic speedup in mobile computing hardware are required. However, game observers comprise a large and growing set of mobile client users. Game observers are users that access network games as non-participants who are interested only in viewing the action. As network games mature, highly skilled players acquire fan bases that loyally follow and observe their heroes in action en masse in multicast channels.
Currently, mobile network game observers are subject to the same limitations as active game participants. Specifically, in order to observe a network game, the observer's mobile client typically must meet the hardware requirements necessary to display 3D graphics. However, as described above, typical mobile clients do not include hardware capable of rendering high-quality 3D graphics. Accordingly, mobile game observers are limited to viewing mobile online games, which often have simplistic graphics and are therefore less compelling and less desirable to observe.
A graphics to video encoder is disclosed. The encoder comprises a client image constructor that receives client input and constructs client image frames based on the client input. A scene integrator is coupled to the client image constructor. The scene integrator accesses base image frames and integrates the client image frames with the base image frames to generate client scene frames. The graphics to video encoder has a video encoder coupled to the scene integrator. The video encoder encodes and outputs the client scene frames as a video bitstream.
The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
A method, system, and device to integrate client input derived images with graphics images are disclosed. In the following detailed description of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be obvious to one skilled in the art that the present invention may be practiced without these specific details or by using alternate elements or methods. In other instances, well known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the present invention.
Aspects of the present invention may be implemented in a computer system that includes, in general, a processor for processing information and instructions, random access (volatile) memory (RAM) for storing information and instructions, read-only (non-volatile) memory (ROM) for storing static information and instructions, a data storage device such as a magnetic or optical disk and disk drive for storing information and instructions, an optional user output device such as a display device (e.g., a monitor) for displaying information to the computer user, an optional user input device including alphanumeric and function keys (e.g., a keyboard) for communicating information and command selections to the processor, and an optional user input device such as a cursor control device (e.g., a mouse) for communicating user input information and command selections to the processor.
Game server 108 is operable to provide three-dimensional (3D) graphics game play (e.g., games using 3D models) to game player 102. Game server 108 sends updated game events to game player 102 as well as to proxy server 610. It should be appreciated that any number of game players 102 may access core network 104.
Proxy server 610 is a computer system for converting updated game events to an encoded bit stream. In addition to video content, the bitstream transmitted by the proxy server 610 may comprise audio content. In one embodiment, the bitstream is streamed to interested clients 614 using communication link 112. However, any method or technique for wireless transmission of data may be used. In one embodiment, mobile multicast support 112 utilizes a 3G wireless standard, and includes a Gateway GPRS (General Packet Radio Service) Support Node (GGSN), a Serving GPRS Support Node (SGSN), a plurality of UMTS (Universal Mobile Telecommunications System) Terrestrial Radio Access Network (UTRAN) nodes, and a plurality of user equipment (UE) for transmitting the bit stream to clients 614. However, any wireless standard may be used. In one embodiment, the user equipment is a mobile client. In one embodiment, the mobile client is a personal digital assistant. In another embodiment, the mobile client is a cellular telephone.
The client node 614 may be a device with wireless capability. Many wireless devices, such as wireless handsets, do not have rendering engines or have rendering engines with limited processing capability. Thus, by performing the graphical rendering at the proxy server 610, the client node 614 is able to display higher quality video than if rendering were attempted at the client node 614.
By being an inactive game observer receiving streaming video (and audio) of game action instead of an active game player, the complexity of rendering millions of triangles per second is pushed back to the game server and proxy server, which are much more capable of performing the rendering and then the graphics-to-video conversion. The result is that the game observer can enjoy high visual quality, in some cases even higher quality than the game players themselves.
Embodiments of the present invention provide a method, device, and system that integrate an image based on client input with a base image prior to video encoding. The proxy server 610 can receive client input from one or more client nodes 614 and generate two-dimensional client video frames based thereon. These client frames are integrated with two-dimensional frames generated by the proxy server 610 based on events from the game server 108. For example, the content of one of the client frames is integrated with the content of one of the game server frames, then a second of each, etc. The resulting integrated frames are encoded and transmitted to the client nodes 614 in the video bitstream.
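The frame-by-frame flow described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the `Frame` type, the `integrate` overlay rule (non-zero client bytes replace base bytes), and the use of `zlib` as a stand-in for a real video encoder are all assumptions made only for illustration.

```python
import zlib
from typing import Iterable, Iterator

# Hypothetical frame representation: one raw frame as packed pixel bytes.
Frame = bytes


def integrate(client: Frame, base: Frame) -> Frame:
    """Overlay a client frame on a base frame.

    Assumed rule: a zero byte in the client frame means "transparent",
    so the base byte shows through; any other client byte wins.
    """
    return bytes(c if c else b for c, b in zip(client, base))


def client_scene_stream(
    client_frames: Iterable[Frame], base_frames: Iterable[Frame]
) -> Iterator[bytes]:
    """Pair frames one-to-one, integrate each pair, then encode the result.

    zlib.compress is only a placeholder for a video encoder.
    """
    for client, base in zip(client_frames, base_frames):
        yield zlib.compress(integrate(client, base))


if __name__ == "__main__":
    base = [bytes([1, 1, 1, 1]), bytes([2, 2, 2, 2])]
    client = [bytes([0, 9, 0, 0]), bytes([0, 0, 9, 0])]
    for packet in client_scene_stream(client, base):
        print(list(zlib.decompress(packet)))
```

Pairing with `zip` mirrors the "one of each, then a second of each" description: the first client frame meets the first base frame, and so on.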
As an illustration of client input, the client nodes 614 are allowed to provide comments and the like, which may affect the presentation of the game being observed by client 614. For example, the client input may be comments commonly referred to as “heckling”. These heckling comments can be viewed by many client nodes 614. The game player 102 is typically not aware of the client input; however, the client input can be presented to the game player(s). Thus, the clients 614 can enjoy a limited level of interactivity while retaining the benefit of high quality graphics of the video streaming presented herein.
The client image 715 is generated by a rendering unit based on client input. Exemplary client inputs are text messages and instant messages. The client image 715 comprises client icons 740, in this embodiment. There is also a text box 745, which is based on client input. The text box 745 is generated from a text message in one embodiment. The client icons 740 can be made to perform various actions, such as moving around or throwing objects 750. For example, the client input may be a message whose content is “jump.” The client icons 740 are rendered as jumping in a series of frames in the client image 715, in response to this client input. These frames are then integrated with a series of frames in the base image 705.
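As a rough sketch of how a “jump” message might be turned into a series of client image frames, the following computes a per-frame vertical offset for a client icon. The function names, the parabolic trajectory, the frame count, and the dictionary layout are hypothetical choices for illustration; the text above does not specify them.

```python
def jump_offsets(num_frames: int, peak_height: int) -> list[int]:
    """Vertical offset (in pixels) of a client icon for each frame of a jump.

    Assumes a simple parabola: 0 at take-off, peak_height mid-jump, 0 at landing.
    """
    if num_frames < 2:
        return [0] * num_frames
    offsets = []
    for i in range(num_frames):
        t = i / (num_frames - 1)  # 0.0 at take-off, 1.0 at landing
        offsets.append(round(4 * peak_height * t * (1 - t)))
    return offsets


def frames_for_input(message: str, num_frames: int = 5) -> list[dict]:
    """Map one client message to per-frame icon state (hypothetical mapping).

    "jump" animates the icon; any other message is shown in the text box.
    """
    if message.strip().lower() == "jump":
        return [{"icon_dy": dy, "text": None} for dy in jump_offsets(num_frames, 8)]
    return [{"icon_dy": 0, "text": message} for _ in range(num_frames)]


if __name__ == "__main__":
    print(jump_offsets(5, 8))
    print(frames_for_input("nice shot", num_frames=2))
```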
The integration of the frames generated from the client nodes is non-time-critical. By non-time-critical, it is meant that the frames of client image 715, which are generated based on the client input, do not have to be integrated with particular frames of the base image 705. For example, the client may make a comment based on his/her observation of the game being streamed to the client. It is not critical when in time the client comments get integrated into the game. For example, there may be a delay between the time the client input is sent by the client node to the proxy server 610 and the time that the proxy server 610 incorporates the effect of the client comments into the video stream sent back to the client.
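One way to picture non-time-critical integration is a simple first-in, first-out buffer: client frames wait until the next base frame happens to be ready, whichever frame that is. The class and method names below are hypothetical, and a FIFO queue is only one possible policy.

```python
from collections import deque
from typing import Optional


class PendingClientFrames:
    """Buffers client frames until some base frame is ready to receive them."""

    def __init__(self) -> None:
        self._queue: deque = deque()

    def push(self, client_frame: str) -> None:
        """Called whenever client input has been rendered into a frame."""
        self._queue.append(client_frame)

    def next_for(self, base_frame_index: int) -> Optional[str]:
        """Return the oldest pending client frame, or None if nothing is waiting.

        base_frame_index is deliberately ignored: any base frame will do,
        which is what makes the integration non-time-critical.
        """
        return self._queue.popleft() if self._queue else None


if __name__ == "__main__":
    pending = PendingClientFrames()
    pending.push("heckle-1")
    # The comment arrived around base frame 3 but is merged into frame 10.
    print(pending.next_for(10))
    print(pending.next_for(11))
```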
The integration of the frames generated from the client nodes may be intrusive or non-intrusive. By non-intrusive it is meant that the clients' input does not affect other elements in the base image. For example, if the base image is a game environment, the client input is purely additive to the rest of the scene composition. Referring to
An embodiment of the present invention provides for intrusive client input. In this case, the client input alters the base image. For example, the client input does affect elements in the game environment. In the intrusive case, typically a central authority such as the game server determines the consequence of the intrusive actions. As an illustration, the clients might vote for their preferred game player. The results can be used to alter the base image 705 by, for example, causing a player 735 to fall down.
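The voting illustration can be sketched as a tally performed by the central authority. The rule used here (the player with the fewest votes is the one made to fall down, with ties going to the earlier-listed player) is purely an assumption for illustration; the text leaves the consequence to the game server.

```python
from collections import Counter


def player_to_penalize(votes: list[str], players: list[str]) -> str:
    """Pick the player who receives the consequence of the client vote.

    Assumed rule: the least-preferred player is penalized. Votes for
    unknown players are ignored. `players` must be non-empty.
    """
    counts = Counter(v for v in votes if v in players)
    # min() returns the first minimal element, so ties favor list order.
    return min(players, key=lambda p: counts[p])


if __name__ == "__main__":
    print(player_to_penalize(["red", "red", "blue"], ["red", "blue"]))
```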
Referring again to
While the proxy server 610 and game server 108 are depicted as separate components, their functions can be incorporated into a single server. Having a separate proxy server 610 allows offloading of load from the game server 108, and also improves scalability to a large number of viewers. For instance, if there is a game with two players and 1000 viewers, it is possible to use a single game server 108 and 50 proxy servers 610, each of which will handle client input of 20 client nodes 614.
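The arithmetic in this example (1000 viewers spread over 50 proxy servers gives 20 client nodes each) can be checked with a small sketch. Round-robin assignment is an assumed policy; the text does not say how client nodes are distributed.

```python
def assign_clients(client_ids, num_proxies: int) -> list[list]:
    """Distribute client nodes across proxy servers in round-robin order."""
    if num_proxies < 1:
        raise ValueError("at least one proxy server is required")
    shards = [[] for _ in range(num_proxies)]
    for i, client_id in enumerate(client_ids):
        shards[i % num_proxies].append(client_id)
    return shards


if __name__ == "__main__":
    shards = assign_clients(range(1000), 50)
    print(len(shards), {len(s) for s in shards})
```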
When multiple proxy servers 610 are used, each may receive identical input from the game server 108. Each proxy server 610 then handles its respective client input and integrates those client inputs with the game events before generating the video stream.
Further, the proxy server 610 can keep track of which client nodes 614 wish to be grouped together. For example, a group of users may want to share the experience. Thus, the client input from one client node 614 may be viewed by another client node 614. The proxy server 610 may combine the client input from multiple client nodes 614 and generate a common video stream therefrom.
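A minimal sketch of this grouping follows, assuming the proxy server keeps a mapping from client node to group. The names `combine_group_inputs` and `group_of`, and the choice to treat ungrouped clients as single-member groups, are hypothetical.

```python
from collections import defaultdict


def combine_group_inputs(inputs, group_of: dict) -> dict:
    """Collect client input per group so one common stream can be built per group.

    inputs:   iterable of (client_id, message) pairs in arrival order
    group_of: client_id -> group_id; clients missing from the mapping are
              treated as a group of their own (an assumption)
    """
    combined = defaultdict(list)
    for client_id, message in inputs:
        group = group_of.get(client_id, client_id)
        combined[group].append((client_id, message))
    return dict(combined)


if __name__ == "__main__":
    groups = {"alice": "fans", "bob": "fans"}
    msgs = [("alice", "go!"), ("carol", "boo"), ("bob", "jump")]
    print(combine_group_inputs(msgs, groups))
```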
In some embodiments, the client image 715 is rendered at a depth between the active polygon layer 805 and the background layer 810. This is referred to herein as depth-multiplexing the client image 715, active polygon layer 805, and the background layer into a single client scene 725. However, it will be appreciated that it is not required that the images (715, 805, 810) are depth-multiplexed. For example, the depths at which the various images are rendered in the client scene may overlap.
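Depth-multiplexing three layers can be sketched as per-pixel selection from front to back. The representation below (each layer as a grid where `None` marks a transparent pixel) is an assumption for illustration, not the disclosed data format.

```python
def depth_multiplex(background, client, active):
    """Combine three same-sized layers into one client scene.

    Ordering, nearest first: active layer, client image, background.
    A pixel value of None means the layer is transparent at that position.
    """
    scene = []
    for bg_row, cl_row, ac_row in zip(background, client, active):
        row = []
        for bg, cl, ac in zip(bg_row, cl_row, ac_row):
            if ac is not None:
                row.append(ac)   # foreground hides everything behind it
            elif cl is not None:
                row.append(cl)   # client image sits in front of the background
            else:
                row.append(bg)
        scene.append(row)
    return scene


if __name__ == "__main__":
    background = [["B", "B", "B"]]
    client = [[None, "C", "C"]]
    active = [[None, None, "A"]]
    print(depth_multiplex(background, client, active))
```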
In one implementation, the active layer 805, which is typically foreground, is differentiated (e.g., separated) from the background layer 810 by selecting a depth value “d” such that “x” percent of rendered objects have depths less than “d.” A suitable value of “x” may be determined by an analysis of the depth values extracted from data used to render the frames. Alternatively, the value of “x” may be pre-determined. The objects in the base image 705 having relatively low depth values may be assumed to be active layer 805 objects. The remaining objects may be assumed to be background layer 810 objects. The objects in the client image 715 may be rendered at a depth between the active layer objects and background layer objects.
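The depth-threshold rule can be sketched as follows. How ties and rounding are handled, and placing the client image midway between the two layers, are assumptions; the text only requires that roughly “x” percent of rendered objects have depths less than “d.”

```python
import math


def split_by_depth(depths, x_percent: float):
    """Choose d so that about x percent of objects have depth < d.

    Returns (d, active_depths, background_depths). With tied depth values
    the active share can fall slightly below x percent.
    """
    ordered = sorted(depths)
    k = math.ceil(len(ordered) * x_percent / 100)  # target active-layer size
    k = max(0, min(k, len(ordered)))
    d = ordered[k] if k < len(ordered) else math.inf
    active = [z for z in depths if z < d]
    background = [z for z in depths if z >= d]
    return d, active, background


def client_depth(active, background) -> float:
    """Assumed placement: midway between the farthest active object and the
    nearest background object. Both lists must be non-empty."""
    return (max(active) + min(background)) / 2


if __name__ == "__main__":
    d, active, background = split_by_depth([50, 1, 20, 3, 2, 10, 40, 30, 60, 70], 30)
    print(d, sorted(active), client_depth(active, background))
```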
However, it is not required that the base image 705 be divided between an active polygon layer 805 and a background layer 810. In one implementation, the client icons 740, objects 750, and other information in the client image 715 are rendered in the client scene 725 as ghost images. For example, they are rendered as semi-transparent characters such that they would not opaquely cover information derived from the active polygon layer 805.
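Semi-transparent “ghost” rendering is commonly done with alpha blending, sketched below for a single RGB pixel. The opacity value and the function name are hypothetical; the text does not state how transparency is computed.

```python
def blend_ghost(client_px, base_px, alpha: float):
    """Alpha-blend one client pixel over one base pixel.

    alpha is the client opacity: 0.0 leaves the base pixel untouched,
    1.0 covers it completely. Pixels are (R, G, B) tuples of 0..255.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return tuple(
        round(alpha * c + (1.0 - alpha) * b) for c, b in zip(client_px, base_px)
    )


if __name__ == "__main__":
    # A red client icon drawn at 25% opacity over a blue base pixel.
    print(blend_ghost((255, 0, 0), (0, 0, 255), 0.25))
```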
Some games provide one or more background images 810 separate from the active polygon layer 805 prior to start of game play. Referring to
Various embodiments of the present invention can flexibly locate the client image 715 in the client scene 725 such that objects in the client image 715 are not completely covered by the foreground (e.g., active layer 805). It will be appreciated that the client image 715 may be rendered in front of the active polygon layer 805, if desired.
The client image constructor 920 may comprise a composition information extractor (not depicted in
The graphics-rendering unit 210 renders frames of base image 705 based on game events. Graphics rendering unit 210 may receive a 3D model from a server (e.g., game server 108 of
The scene integrator 910 combines the content of a frame of the base image 705 with a frame of the client image 715 to form a frame of client scene 725, which is input into the video encoder 230.
The video encoder 230 converts the raw frames of client scene 725 into compressed video for video streaming to the interested client nodes. The bit stream is subsequently packetized and sent to the interested observers (e.g., client nodes 614 of
Audio information may also be added to the encoded bit stream 235, based on the client input. For example, the client image constructor 920 may generate a digitized audio signal based on client input. The digitized audio signal can be input to the video encoder, which incorporates the digitized audio signal into the encoded bitstream, or it can be sent as a separate media stream, along with the video stream, to the client(s).
The video encoder 230 can use the composition information from the client image constructor 920 and/or the graphics-rendering unit 210 for more efficient video encoding and/or streaming. In one embodiment, the composition information includes depth values. It should be appreciated that the composition information may include other factors, including but not limited to: motion, texture, shape, fog, lighting, and other semantic information. In one embodiment, composition information 225 is transmitted to video encoder 230 to improve the visual quality of regions of interest. In another embodiment, client image constructor 920 generates a weighting factor based on the composition information 225 for identifying a region of interest. The weighting factor is then transmitted to video encoder 230 to improve the visual quality of regions of interest.
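A weighting factor derived from depth values might look like the following sketch, in which nearer regions receive larger weights. The linear mapping to the range 0 to 1 is an assumption; the text does not define how the weighting factor is computed.

```python
def depth_weights(region_depths: list[float]) -> list[float]:
    """Map per-region depth to a region-of-interest weight in [0, 1].

    Assumed rule: the nearest region gets weight 1.0, the farthest 0.0,
    with linear interpolation in between.
    """
    if not region_depths:
        return []
    nearest, farthest = min(region_depths), max(region_depths)
    if farthest == nearest:
        return [1.0] * len(region_depths)  # all regions equally important
    span = farthest - nearest
    return [(farthest - d) / span for d in region_depths]


if __name__ == "__main__":
    print(depth_weights([1.0, 3.0, 5.0]))
```

An encoder could then spend more bits, or use a finer quantizer, on regions with higher weights.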
While depicted as separate blocks, it will be appreciated that the functionality of some of the blocks in
As previously discussed, the client image 715 may be constructed from client information from multiple client nodes. In this case, the objects in the client image 715 relate to client input that was received at different points in time. Thus, some objects in the client image 715 relate to recent client input, while other objects relate to older client input. If the client input is heckling, the recent client input can be defined as an active heckler and the older input as a silent heckler. In order to improve image quality, more bits can be allocated to active hecklers than to silent hecklers at any given time. Further, if the amount of scene activity is large, as determined by the average of the magnitude of the motion vectors in a frame during video encoding, more bits can be allocated to the foreground as opposed to the hecklers.
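The bit-allocation idea can be sketched as a weighted split of a per-frame bit budget. The activity threshold and every weight below are hypothetical values chosen only to show the ordering the text describes: active hecklers above silent hecklers, and a larger foreground share when the average motion-vector magnitude is high.

```python
import math


def mean_motion(motion_vectors) -> float:
    """Average magnitude of the (dx, dy) motion vectors in a frame."""
    if not motion_vectors:
        return 0.0
    return sum(math.hypot(dx, dy) for dx, dy in motion_vectors) / len(motion_vectors)


def allocate_bits(budget: int, motion_vectors, activity_threshold: float = 4.0) -> dict:
    """Split a frame's bit budget between foreground and hecklers.

    All weights and the threshold are illustrative assumptions.
    """
    if mean_motion(motion_vectors) > activity_threshold:
        weights = {"foreground": 6, "active_hecklers": 2, "silent_hecklers": 1}
    else:
        weights = {"foreground": 4, "active_hecklers": 3, "silent_hecklers": 1}
    total = sum(weights.values())
    return {name: budget * w // total for name, w in weights.items()}


if __name__ == "__main__":
    print(allocate_bits(9000, [(3, 4), (6, 8)]))   # busy scene
    print(allocate_bits(8000, [(0, 1)]))           # quiet scene
```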
Step 520 is generating client image frames based on the client input. For example, a client image constructor 920 generates a client image 715 based on the client input. The client input may be, but is not limited to, text messages and instant messages.
Step 530 is integrating client image frames with base image frames to produce client scene frames. The integration is performed on a frame-by-frame basis. For example, one client image frame is integrated with one base image frame to produce one client scene frame. The base image may be from a game; however, the base image is not so limited.
Step 540 is encoding the client scene frames to produce a video bitstream. For example, a video encoder 230 converts raw frames of client scene into compressed video for video streaming to the interested client nodes. The bit stream may be packetized.
Step 550 is transmitting the video bitstream. For example, the bitstream is sent to the client nodes (e.g., 614 of
Embodiments of the present invention, integrating a graphic image with an image based on client input, are thus described. While the present invention has been described in particular embodiments, it should be appreciated that the present invention should not be construed as limited by such embodiments, but rather construed according to the below claims.