Publication number: US20030105907 A1
Publication type: Application
Application number: US 10/273,829
Publication date: Jun 5, 2003
Filing date: Oct 17, 2002
Priority date: Oct 22, 2001
Also published as: CN1286019C, CN1608249A, DE60211730D1, DE60211730T2, DE60236309D1, DE60237222D1, DE60239227D1, EP1438667A2, EP1438667B1, EP1438818A2, EP1438818B1, EP1442355A2, EP1442355B1, EP1442365A2, EP1442374A2, EP1442374B1, EP1466448A2, EP1466448B1, US6901491, US6938119, US7209996, US7248585, US7865667, US20030081615, US20030084309, US20030088610, US20030093614, US20030097518, US20070162911, WO2003036450A2, WO2003036450A3, WO2003036482A2, WO2003036482A3, WO2003036485A2, WO2003036485A3, WO2003036508A2, WO2003036508A3, WO2003036884A2, WO2003036884A3, WO2003036902A2, WO2003036902A3
Publication number: 10273829, 273829, US 2003/0105907 A1, US 2003/105907 A1, US 20030105907 A1, US 20030105907A1, US 2003105907 A1, US 2003105907A1, US-A1-20030105907, US-A1-2003105907, US2003/0105907A1, US2003/105907A1, US20030105907 A1, US20030105907A1, US2003105907 A1, US2003105907A1
Inventors: Leslie Kohn, Michael Wong
Original Assignee: Sun Microsystems, Inc.
System and method for caching DRAM using an egress buffer
US 20030105907 A1
Abstract
A system and method includes a server that includes a processor and a memory system that are coupled to a bus system. A network interface is coupled to the processor and an egress buffer is coupled to the processor and the network interface by an egress bus.
Claims(19)
What is claimed is:
1. A server comprising:
a processor coupled to a bus system;
a memory system coupled to the bus system;
a network interface coupled to the processor; and
an egress buffer coupled to the processor and the network interface by an egress bus.
2. The server of claim 1, wherein the processor includes a plurality of processors.
3. The server of claim 2, wherein the plurality of processors are included on a first die.
4. The server of claim 2, wherein the plurality of processors are included on a plurality of dies.
5. The server of claim 1, wherein the egress buffer includes high speed random access memory.
6. The server of claim 1, wherein the egress buffer includes random access memory that has an operating speed of about 400 MHz.
7. The server of claim 1, wherein the egress buffer and the egress bus have a data throughput rate that is greater than or equal to about twice the amount of a data stream to be served.
8. The server of claim 1, wherein the egress buffer includes a double data rate buffer.
9. The server of claim 1, wherein the egress bus has a bandwidth that is greater than or equal to about twice the amount of a data stream to be served.
10. The server of claim 1, wherein the egress bus includes a 32-bit data bus.
11. A method of serving data comprising:
receiving a request for data in a processor in a server;
retrieving the requested data;
processing the retrieved data in the processor;
storing the processed data in an egress buffer that is coupled to the processor and a network interface; and
serving the stored data from the egress buffer through the network interface.
12. The method of claim 11, wherein the egress buffer is coupled to the processor and the network interface by an egress bus.
13. The method of claim 11, wherein the requested data includes a data stream.
14. The method of claim 13, wherein the egress bus has a bandwidth of about twice a bandwidth of the data stream.
15. The method of claim 13, wherein the egress bus includes a 32-bit data bus.
16. The method of claim 11, wherein the processed data is stored in the egress buffer substantially simultaneously with the stored data being served from the egress buffer.
17. The method of claim 11, wherein processing the retrieved data in the processor includes at least one of a group consisting of formatting the data, encrypting the data, and decrypting the data.
18. A method of serving a data stream comprising:
receiving a request for a data stream in a processor in a server;
retrieving the requested data stream;
processing the retrieved data stream in the processor;
storing the processed data stream in an egress buffer that is coupled to the processor and a network interface by an egress bus having a bandwidth that is greater than or equal to about twice the data stream; and
serving the stored data stream from the egress buffer through the network interface.
19. The method of claim 18, wherein the data stream includes at least one of a group consisting of audio and video.
Description
CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority from U.S. Provisional Patent Application No. 60/345,315 filed on Oct. 22, 2001 and entitled “High Performance Web Server,” which is incorporated herein by reference in its entirety for all purposes.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention relates generally to microprocessors, and more particularly, to methods and systems for microprocessors to serve data from memory systems.

[0004] 2. Description of the Related Art

[0005] A typical server computer, such as a web server, has one main memory. A server serves data from the memory to a client computer that requested the data. FIG. 1 shows a typical web server 102 and client computer 110 that are linked by a network 104, such as the Internet or other network. FIG. 2 is a high-level block diagram of a typical web server 102. As shown, the web server 102 includes a processor 202, a memory system 203 that includes a ROM 204, a main memory DRAM 206 and a mass storage device 210, each connected by a peripheral bus system 208. The peripheral bus system 208 may include one or more buses connected to each other through various bridges, controllers and/or adapters, such as are well known in the art. For example, the peripheral bus system 208 may include a “system bus” that is connected through an adapter to one or more expansion buses, such as a Peripheral Component Interconnect (PCI) bus. Also coupled to the peripheral bus system 208 are a network interface 212, a number (N) of input/output (I/O) devices 216-1 through 216-N and a peripheral cryptographic processor 220. I/O devices 216-1 through 216-N may include, for example, a keyboard, a pointing device, a display device and/or other conventional I/O devices. Mass storage device 210 may include any suitable device for storing large volumes of data, such as a magnetic disk or tape, magneto-optical (MO) storage device, or any of various types of Digital Versatile Disk (DVD) or Compact Disk (CD) based storage.

[0006] Network interface 212 provides data communication between the computer system and other computer systems on the network 104. Hence, network interface 212 may be any device suitable for or enabling the web server 102 to communicate data with a remote processing system (e.g., client computer 110) over a data communication link, such as a conventional telephone modem, an Integrated Services Digital Network (ISDN) adapter, a Digital Subscriber Line (DSL) adapter, a cable modem, a satellite transceiver, an Ethernet adapter, or the like. The web server 102 typically processes large quantities of data, for example, streaming data such as streaming video or streaming audio or other types of data or serving a website and other web data. FIG. 3 is a flowchart of the method operations 300 of the web server 102 serving a large volume of data such as a 10 MB data stream. In operation 305, the web server 102 receives a request for the 10 MB data stream from the client 110. If the 10 MB data stream requires processing such as being encrypted, then the 10 MB data stream must first be retrieved from the DRAM 206 into the processor 202. In operation 310, the data stream is retrieved from the DRAM 206 and/or other portions of the memory system 203.

[0007] In operation 315, the data stream is processed in the processor 202. In operation 320, the processed data stream is stored in the DRAM 206. In operation 325, the data stream is served through the network interface 212 to the network 104 to the client 110.

[0008] The processed data stream must be stored in the memory system 203 because the processor 202 and the network interface 212 typically have different data processing rates. By way of example, the processor 202 can process data at a rate of about 2 GHz or even greater. The peripheral bus system 208 typically operates at about 166 MHz; therefore, the network interface 212 typically does not operate as fast as 2 GHz and cannot serve the data as fast as the processor can process the data. As a result, the processed data must be temporarily stored in the memory system 203 so that the network interface 212 can serve the processed data at the optimal rate for the network interface 212. Alternatively, the network interface 212 may be able to output data faster than the processor can process the data; in that case, the processed data can be built up in the memory system 203 and the network interface 212 can serve the data from the memory system 203 at a high rate.

[0009] Now, as described in FIG. 3 above, the 10 MB data stream must transfer across the peripheral bus system 208 between the DRAM 206 and the processor 202 three times. Therefore, serving a 10 MB data stream results in 30 MB of data flowing between the DRAM 206 and the processor 202. These multiple passes between the DRAM 206 and the processor 202 consume a large portion of the total I/O bandwidth of the processor 202, which can limit the ability of the processor 202 to perform other operations besides serving the 10 MB data stream.
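
The traffic multiplication described in this paragraph can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function name and the list of crossings are hypothetical, and the mapping of the three crossings onto operations 310, 320 and 325 is an assumption based on the FIG. 3 flow as described in the text.

```python
# Illustrative sketch only; names are hypothetical, not part of the disclosure.
# Each entry is one pass of the whole stream across the peripheral bus system.
CONVENTIONAL_CROSSINGS = (
    "operation 310: retrieve the stream from the DRAM",
    "operation 320: store the processed stream back in the DRAM",
    "operation 325: read the stored stream out of the DRAM to be served",
)


def bus_traffic_mb(stream_mb, crossings):
    """Total bus traffic in MB when the full stream crosses once per entry."""
    return stream_mb * len(crossings)


print(bus_traffic_mb(10, CONVENTIONAL_CROSSINGS))  # prints 30
```

Under these assumptions, the traffic grows linearly with the number of crossings, which is why removing crossings from this path is the natural target for reducing bandwidth usage.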

[0010] What is needed is a system and method to reduce the bandwidth usage of the processor to memory system interface.

SUMMARY OF THE INVENTION

[0011] Broadly speaking, the present invention fills these needs by providing a system and method for caching DRAM to reduce the bandwidth usage of the processor to memory system interface. It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, computer readable media, or a device. Several inventive embodiments of the present invention are described below.

[0012] One embodiment includes a server that includes a processor and a memory system that are coupled to a bus system. A network interface is coupled to the processor and an egress buffer is coupled to the processor and the network interface by an egress bus.

[0013] The processor can also include multiple processors. The multiple processors can be included on a first die or chip. Alternatively, the multiple processors can be included on multiple separate dies or chips.

[0014] The egress buffer can include a high-speed random access memory. In one embodiment, the egress buffer includes random access memory that has an operating speed of about 400 MHz.

[0015] The egress buffer and the egress bus can have a data throughput rate that is greater than or equal to about twice the amount of a data stream to be served.

[0016] The egress buffer can also include a double data rate buffer.

[0017] The egress buffer can also include a double data rate buffer.

[0018] The egress bus has a bandwidth that is greater than or equal to about twice the amount of a data stream to be served. The egress bus can also include a 32-bit data bus.

[0019] One embodiment includes a system and method of serving data that includes receiving a request for data in a processor in a server. The requested data is retrieved. The retrieved data is processed in the processor. The processed data is stored in an egress buffer that is coupled to the processor and a network interface. The stored data is served from the egress buffer through the network interface.

[0020] The egress buffer is coupled to the processor and the network interface by an egress bus.

[0021] The requested data can include a data stream.

[0022] The egress bus has a bandwidth of about twice a bandwidth of the data stream.

[0023] The egress bus can include a 32-bit data bus.

[0024] The processed data can be stored in the egress buffer substantially simultaneously with the stored data being served from the egress buffer.

[0025] Processing the retrieved data in the processor can also include formatting the data, encrypting the data, and decrypting the data among other processes.

[0026] Another embodiment includes a system and method of serving a data stream that includes receiving a request for a data stream in a processor in a server. The requested data stream is retrieved. The retrieved data stream is processed in the processor. The processed data stream is stored in an egress buffer that is coupled to the processor and a network interface by an egress bus. The egress bus has a bandwidth that is greater than or equal to about twice the data stream. The stored data stream is served from the egress buffer through the network interface. The data stream can include audio or video or any other streaming media.

[0027] Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0028] The present invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, and like reference numerals designate like structural elements.

[0029] FIG. 1 shows a typical web server and client computer that are linked by a network, such as the Internet or other network.

[0030] FIG. 2 is a high-level block diagram of a typical web server.

[0031] FIG. 3 is a flowchart of the method operations of the web server serving a large volume of data such as a 10 MB data stream.

[0032] FIG. 4 shows a block diagram of a server in accordance with one embodiment of the present invention.

[0033] FIG. 5 is a flow chart of the method operations of serving data using an egress buffer in accordance with one embodiment of the present invention.

[0034] FIG. 6 shows a block diagram of a processor according to one embodiment of the present invention.

DETAILED DESCRIPTION OF THE EXEMPLARY EMBODIMENTS

[0035] Several exemplary embodiments for caching DRAM to reduce the bandwidth usage of the processor to memory system interface will now be described. It will be apparent to those skilled in the art that the present invention may be practiced without some or all of the specific details set forth herein.

[0036] One embodiment of the present invention includes an egress buffer that can be used to temporarily store processed data from the processor that will be served by the network interface. The egress buffer thereby reduces the demand on the bandwidth usage of the processor to memory system interface by about two-thirds.
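
The "about two-thirds" figure can be reproduced with a short calculation. The sketch below is a hedged illustration: it assumes that the conventional flow moves the stream across the processor to memory system interface three times and that, with the egress buffer, only the initial retrieval still crosses that interface. The function name is hypothetical.

```python
from fractions import Fraction


def memory_interface_reduction(crossings_before, crossings_after):
    """Fraction of processor-to-memory traffic removed (hypothetical helper)."""
    return Fraction(crossings_before - crossings_after, crossings_before)


# Assumption: three crossings without the egress buffer, one crossing with it.
reduction = memory_interface_reduction(3, 1)
print(reduction)  # prints 2/3
```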

[0037] FIG. 4 shows a block diagram of a server 400 in accordance with one embodiment of the present invention. The server 400 can be a web server or other type of server. The server 400 includes a bus system 408 that couples a processor 402 and a memory system 404. The processor 402 includes at least one processor core 402A. The server 400 also includes an egress buffer 420 that is coupled to the processor 402 and a network interface 412.

[0038] The egress buffer 420 is coupled to the processor 402 and the network interface 412 via a dedicated egress bus 422. The egress bus 422 can be as wide as necessary. For example, the egress bus 422 can be 32 bits (i.e., lines) wide, but the egress bus 422 could be narrower or wider, such as 16 bits or 64 bits. The egress buffer 420 can be large enough to buffer the desired data throughput of the network interface 412 as will be described in more detail below. By way of example, for a data throughput of 10 gigabits per second, the egress buffer 420 would need to be 32 megabytes or possibly larger.

[0039] In one embodiment, the egress buffer 420 includes a very high-speed RAM such as a fast cycle time RAM (FCRAM) that operates as fast as about 400 MHz or more. The FCRAM allows the egress buffer 420 to serve the data across the egress bus 422 to the network interface 412 at the speed of the network interface 412.

[0040] In one embodiment, the server 400 can include multiple processors on multiple processor chips or dies. The egress bus 422 can also couple a single egress buffer 420 to all of the multiple processors. An egress bus controller can be included to manage the data flow between the multiple processors and the egress buffer 420.

[0041] FIG. 5 is a flow chart of the method operations 500 of serving data using an egress buffer in accordance with one embodiment of the present invention. In operation 505, a request for data is received in the server 400. The request can be from an application within the server 400 or due to a request received from an external data requester, such as a client computer 110 in FIG. 1 that is linked to the server 400 by a network.

[0042] The processor 402 retrieves the requested data, in operation 510. The data can be retrieved from numerous sources such as from the memory system 404 or other sources via the system data bus 408. In operation 515, the processor 402 processes the retrieved data such as packetizing the data or performing some other formatting, encryption, decryption, or other processing to the retrieved data.

[0043] The processed data is stored in the egress buffer 420 via the egress bus 422, in operation 520. In operation 525, the network interface 412, 412′ retrieves the processed data from the egress buffer 420, via the egress bus 422 and serves the data to the data requestor.
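
The method operations 500 can be pictured as a small software model. The Python sketch below is only an illustration under stated assumptions: the egress buffer is modeled as a bounded FIFO of chunks, the processing step is a placeholder XOR transform standing in for formatting or encryption, and all class, function and parameter names are hypothetical rather than taken from the disclosure.

```python
from collections import deque


class EgressBuffer:
    """Hypothetical bounded FIFO standing in for the egress buffer."""

    def __init__(self, capacity_chunks):
        self.capacity_chunks = capacity_chunks
        self._chunks = deque()

    def store(self, chunk):
        # Operation 520: the processor writes a processed chunk.
        if len(self._chunks) >= self.capacity_chunks:
            raise BufferError("egress buffer full")
        self._chunks.append(chunk)

    def drain(self):
        # Operation 525: the network interface reads chunks in order.
        while self._chunks:
            yield self._chunks.popleft()


def process(chunk, key=0x5A):
    """Placeholder for operation 515 (XOR is illustrative, not real encryption)."""
    return bytes(b ^ key for b in chunk)


def serve(request, memory, egress_buffer, chunk_size=4):
    """Model operations 505 to 525 for one request; returns the bytes served."""
    data = memory[request]                      # operation 510: retrieve
    served = []
    for start in range(0, len(data), chunk_size):
        chunk = process(data[start:start + chunk_size])
        egress_buffer.store(chunk)
        served.extend(egress_buffer.drain())    # served without touching memory
    return b"".join(served)


memory = {"stream": b"example payload"}
out = serve("stream", memory, EgressBuffer(capacity_chunks=8))
print(process(out) == memory["stream"])  # prints True (XOR undoes itself)
```

In this model the processed data never returns to `memory`; it passes only through the buffer object, which mirrors the way the described flow keeps processed data off the system bus.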

[0044] FIG. 6 shows a block diagram of a processor 402′ according to one embodiment of the present invention. The processor 402′ includes a processor core 402A′ and an integrated network interface 412′. Because the integrated network interface 412′ is included on the processor die 402′ with the processor core 402A′, the network interface 412′ can output data faster than the network interface 412 described in FIG. 4 above.

[0045] In one embodiment, a dedicated bus 422A couples the processor core 402A′ to the egress bus 422, through a process data switch 430. The process data switch 430 is also coupled to the network interface 412′ via a bus 422B. Alternatively, the network interface 412′ can be coupled to the egress buffer 420 by a separate, dedicated bus. The process data switch 430 directs the data from the processor core 402A′ to the egress buffer 420 or the memory system 404 and controls the data flow across the egress bus 422 so that the data flows either to the network interface 412′ or from the processor core 402A′.

[0046] In alternative embodiments, the egress bus 422 can also link other components on the processor die to the egress buffer 420.

[0047] In one embodiment, the egress buffer 420 is an about 400 MHz, double data rate (DDR) buffer. A 400 MHz DDR buffer transfers data at an effective rate of 800 MHz, which, combined with a 32-bit wide egress bus 422, produces a throughput of 3.2 GB per second with a relatively small actual buffer of only two or four bits per 32 bit lines of the egress bus 422. A 3.2 GB per second throughput of the egress bus 422 and egress buffer 420 equates to 25.6 gigabits per second, slightly more than 24 gigabits per second. A 24 gigabit per second egress buffer 420 can support two 10-gigabit-per-second data streams: a first 10 gigabit data stream is input to the egress buffer 420 while a second 10 gigabit data stream is output from the egress buffer 420 to the network interface 412, 412′. The speed of the egress buffer 420 memory must be sufficient to support the network interface 412, 412′ data demand rate.
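
The throughput figures in this paragraph follow from multiplying the clock rate, the two transfers per clock of a double data rate device, and the bus width. The Python sketch below works through that arithmetic; the function name and parameter names are hypothetical, and decimal units (1 gigabit = 10^9 bits) are assumed.

```python
def egress_throughput_bits_per_s(clock_hz, bus_width_bits, transfers_per_clock=2):
    """Peak throughput of a DDR buffer on a parallel bus (hypothetical helper)."""
    return clock_hz * transfers_per_clock * bus_width_bits


bits_per_s = egress_throughput_bits_per_s(400_000_000, 32)

print(bits_per_s / 1e9)        # prints 25.6 (gigabits per second)
print(bits_per_s / 8 / 1e9)    # prints 3.2  (gigabytes per second)

# One 10 gigabit per second stream in plus one out needs 20 gigabits per second.
print(bits_per_s >= 2 * 10_000_000_000)  # prints True
```

The same function can be used to explore the narrower or wider bus widths mentioned elsewhere in the description, for example by passing 16 or 64 as `bus_width_bits`.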

[0048] The egress buffer 420

[0049] Because the egress buffer 420 is coupled to the processor core 402A′ by the dedicated egress bus 422, the egress bus 422 can deliver the data much more quickly than a shared data bus such as the I/O interface 432 between the memory system 404 and the processor core 402A′. Further, because the egress buffer 420 uses a much higher speed type of RAM (e.g., FCRAM), the egress buffer 420 can serve the data faster than standard DRAM.

[0050] The egress buffer 420 can also substantially smooth out the data interface between the data processing rate of the processor core 402A and the rate at which the network interface 412 can serve the data. The difference between these rates (i.e., the transient variation) can vary as the processor performs other operations or as the network becomes busy and reduces the rate at which the network interface 412 can serve the data. The amount of transient variation that can be absorbed increases as the size of the egress buffer 420 increases.
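
One way to picture the smoothing role of the egress buffer is to ask how long a full buffer could keep feeding the network interface if the processor were briefly busy with other operations. The Python sketch below is a rough illustration only: it assumes a constant drain rate, decimal megabytes (1 MB = 10^6 bytes), and the 32 megabyte and 10 gigabit per second figures mentioned earlier in the description; the function name is hypothetical.

```python
def absorbable_stall_ms(buffer_bytes, drain_bits_per_s):
    """Milliseconds a full buffer can feed the interface with no new input."""
    return buffer_bytes * 8 * 1000 / drain_bits_per_s


print(absorbable_stall_ms(32_000_000, 10_000_000_000))  # prints 25.6
print(absorbable_stall_ms(64_000_000, 10_000_000_000))  # prints 51.2
```

Under these assumptions, doubling the buffer size doubles the length of the stall that can be hidden from the network interface.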

[0051] The egress buffer 420 FCRAM can operate in any range from about 100 MHz or even slower to about 400 MHz or greater. The higher the speed of the egress buffer 420, the greater the efficiency of the processor serving the data to the network interface. Alternatively, a lower speed egress buffer 420 FCRAM would also increase the efficiency by reducing the demand across the system bus 408 and specifically across the interface between the memory system 404 and the processor 402.

[0052] The egress buffer 420 could be within a single die or chip with the processor 402. However, typically the egress buffer 420 would not be part of the processor die because the physical size of the memory is relatively large as compared to the size of the microprocessor devices in the processor 402, and therefore including the egress buffer is not an efficient use of the space on the processor die.

[0053] The network interface 412, 412′ can have any bandwidth, such as about 4 gigabits per second or about 10 gigabits per second. The network interface 412, 412′ has direct access to the egress buffer 420 via the dedicated egress bus 422.

[0054] As used herein the term “about” means +/−10%. By way of example, the phrase “about 250” indicates a range of between 225 and 275.

[0055] With the above embodiments in mind, it should be understood that the invention might employ various computer-implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. Further, the manipulations performed are often referred to in terms, such as producing, identifying, determining, or comparing.

[0056] Any of the operations described herein that form part of the invention are useful machine operations. The invention also relates to a device or an apparatus for performing these operations. The apparatus may be specially constructed for the required purposes, or it may be a general-purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general-purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.

[0057] The invention can also be embodied as computer readable code on a computer readable medium. The computer readable medium is any data storage device that can store data, which can thereafter be read by a computer system. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over a network of coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.

[0058] It will be further appreciated that the instructions represented by the operations in FIG. 5 are not required to be performed in the order illustrated, and that all the processing represented by the operations may not be necessary to practice the invention. Further, the processes described in FIG. 5 can also be implemented in software stored in any one of or combinations of the RAM, the ROM, or the hard disk drive.

[0059] Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.

Referenced by
Citing Patent | Filing date | Publication date | Applicant | Title
US7290116 | Jun 30, 2004 | Oct 30, 2007 | Sun Microsystems, Inc. | Level 2 cache index hashing to avoid hot spots
US7366829 | Jun 30, 2004 | Apr 29, 2008 | Sun Microsystems, Inc. | TLB tag parity checking without CAM read
US7430643 | Dec 30, 2004 | Sep 30, 2008 | Sun Microsystems, Inc. | Multiple contexts for efficient use of translation lookaside buffer
US7509484 | Jun 30, 2004 | Mar 24, 2009 | Sun Microsystems, Inc. | Handling cache misses by selectively flushing the pipeline
US7519796 | Jun 30, 2004 | Apr 14, 2009 | Sun Microsystems, Inc. | Efficient utilization of a store buffer using counters
US7543132 | Jun 30, 2004 | Jun 2, 2009 | Sun Microsystems, Inc. | Optimizing hardware TLB reload performance in a highly-threaded processor with multiple page sizes
US7571284 | Jun 30, 2004 | Aug 4, 2009 | Sun Microsystems, Inc. | Out-of-order memory transactions in a fine-grain multithreaded/multi-core processor
Classifications
U.S. Classification: 710/305, 710/54
International Classification: G06F9/46, G06F15/167, G09C1/00, H04L29/06, G06F12/00, G06F13/38, G11C11/4074, G06F15/173, G06F11/10, G06F21/00, G06F12/16, G06F13/16, G06F9/38, G06F13/00, G06F15/78, G06F15/16, G06F12/08, G06F9/30, G06F1/32, H04L12/56, G06F13/14
Cooperative Classification: G06F11/108, Y02B60/1225, G06F21/72, H04L29/06, G06F9/3891, H04L49/90, G06F15/7846, G06F9/30043, G06F9/3879, G06F12/0811, G06F12/084, G06F12/0813, G06F1/3203, G06F15/17375, G11C11/4074, H04L45/745, H04L47/2441, G06F11/1008, G06F1/3275, H04L49/9057, H04L47/10, G06F1/3225, H04L49/9089, G06F13/1689, G06F9/3851, H04L69/22
European Classification: G06F12/08B4L, H04L45/745, G11C11/4074, H04L49/90P, H04L47/24D, G06F1/32P1C6, H04L47/10, G06F12/08B4N, G06F13/16D8, G06F1/32P, G06F9/38S1, G06F12/08B4S, G06F9/30A2L, G06F11/10M, G06F9/38E4, H04L49/90, H04L49/90R1, G06F21/72, G06F9/38T6C, G06F15/173N4A, G06F1/32P5P8, H04L29/06, G06F15/78P1C, G06F11/10R1
Legal Events
Date | Code | Event | Description
Oct 17, 2002 | AS | Assignment | Owner name: SUN MICROSYSTEMS, INC., CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KOHN, LESLIE;WONG, MICHAEL K.;REEL/FRAME:013405/0127; Effective date: 20021015