Publication number: US20050071542 A1
Publication type: Application
Application number: US 10/842,339
Publication date: Mar 31, 2005
Filing date: May 10, 2004
Priority date: May 13, 2003
Also published as: CN1788260A, CN100444141C, DE112004000821T5, US7016213, US7421525, US20040230718, US20050162882, US20050166006, WO2004102403A2, WO2004102403A3
Inventors: Frederick Weber, Ross La Fetra, Paul Miranda
Original Assignee: Advanced Micro Devices, Inc.
Prefetch mechanism for use in a system including a host connected to a plurality of memory modules via a serial memory interconnect
US 20050071542 A1
Abstract
A system includes a host coupled to a serially connected chain of memory modules. In one embodiment, the host includes a memory controller that may be configured to issue a memory read request for data stored within the memory modules. The memory controller may further request that data be prefetched from the memory modules by encoding prefetch information within the memory read request. The memory controller may also be configured to issue a memory write request to write data to the memory modules and to selectively request that one or more pages of memory within a given one of the memory modules remain open by encoding the prefetch information within the memory write request.
Claims(23)
1. A system comprising:
a host including a memory controller; and
a plurality of memory modules coupled serially in a chain to said host;
wherein said memory controller is configured to issue a memory read request for data stored within said plurality of memory modules; and
wherein said memory controller is further configured to request that data be prefetched from said plurality of memory modules by encoding prefetch information within said memory read request.
2. The system as recited in claim 1, wherein each of said plurality of memory modules includes a memory control hub coupled to control access to a plurality of memory chips.
3. The system as recited in claim 2, wherein each of said plurality of memory modules includes a DRAM controller configured to generate a memory read cycle to said plurality of memory chips in response to receiving a memory read command having a memory address that matches a memory address associated with said memory control hub.
4. The system as recited in claim 3, wherein said DRAM controller is further configured to selectively generate a memory read cycle to prefetch data in response to receiving particular prefetch information.
5. The system as recited in claim 1, wherein said prefetch information includes prefetch hint information that indicates whether or not to prefetch data.
6. The system as recited in claim 1, wherein said prefetch information includes prefetch stride information that indicates a number of addresses skipped between accesses to said plurality of memory modules.
7. The system as recited in claim 1, wherein said plurality of memory modules is coupled serially in a chain to said host via a plurality of memory links, wherein each memory link includes an uplink for conveying transactions toward said host and a downlink for conveying transactions originating at said host to a next memory module in said chain.
8. The system as recited in claim 7, wherein said uplink and said downlink are each a uni-directional link including a plurality of signals configured to convey transactions using packets that include control and configuration packets and memory access packets, wherein at least a portion of packets include control, address and data information, and wherein said control, address and data information share the same wires of a given link.
9. The system as recited in claim 1, wherein each memory module of said plurality of memory modules includes a storage for storing prefetch data returned from a respective plurality of memory chips located on each of said memory modules.
10. The system as recited in claim 1, wherein said memory controller is further configured to issue said memory read request without knowledge of a memory size associated with each of said memory modules or an address range associated with any of said memory modules.
11. The system as recited in claim 10, wherein said memory controller is further configured to issue subsequent memory read requests prior to receiving a response to said memory read request.
12. The system as recited in claim 1, wherein said memory controller includes a prefetch unit configured to provide said prefetch information for inclusion within said memory read request.
13. The system as recited in claim 1, wherein said memory controller is further configured to issue a memory write request to write data to said plurality of memory modules and to selectively request that one or more pages of memory within a given one of said plurality of memory modules remain open by encoding said prefetch information within said memory write request.
14. A memory module coupled serially in a chain with other memory modules to a host, said memory module comprising:
a plurality of memory chips; and
a memory control hub coupled to control access to said plurality of memory chips, wherein said memory control hub is configured to generate memory read cycles to said plurality of memory chips in response to receiving a memory read request for data stored within said plurality of memory chips;
wherein said memory control hub is further configured to selectively prefetch data from said plurality of memory chips in response to receiving prefetch information encoded within said memory read request.
15. The memory module as recited in claim 14, wherein said memory module includes a DRAM controller configured to generate said memory read cycle to said plurality of memory chips in response to receiving a memory read command having a memory address that matches a memory address associated with said memory control hub.
16. The memory module as recited in claim 15, wherein said DRAM controller is further configured to selectively generate said memory read cycle to prefetch data in response to receiving particular prefetch information.
17. The memory module as recited in claim 14, wherein said prefetch information includes prefetch hint information that indicates whether or not to prefetch data.
18. The memory module as recited in claim 14, wherein said prefetch information includes prefetch stride information that indicates a number of addresses skipped between accesses to said plurality of memory modules.
19. The memory module as recited in claim 1, wherein said memory module is coupled serially with other memory modules in said chain to said host via a plurality of memory links, wherein each memory link includes an uplink for conveying transactions toward said host and a downlink for conveying transactions originating at said host to a next memory module in said chain.
20. The memory module as recited in claim 19, wherein said uplink and said downlink are each a unidirectional link including a plurality of signals configured to convey transactions using packets that include control and configuration packets and memory access packets, wherein at least a portion of packets include control, address and data information, and wherein said control, address and data information share the same wires of a given link.
21. The memory module as recited in claim 14, wherein said memory control hub is further configured to write data to said plurality of memory chips and to selectively request that one or more pages of memory within said plurality of memory chips remain open in response to receiving a memory write request including said prefetch information.
22. A method comprising:
connecting a plurality of memory modules serially in a chain to a host including a memory controller;
said memory controller issuing a memory read request for data stored within said plurality of memory modules; and
said memory controller requesting that data be prefetched from said plurality of memory modules by encoding prefetch information within said memory read request.
23. The method as recited in claim 22 further comprising storing prefetch data returned from a respective plurality of memory chips located on each of said memory modules within a respective storage located on each of said memory modules.
Description

This application claims the benefit of U.S. Provisional Application No. 60/470,078 filed May 13, 2003.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to computer system memory and, more particularly, to prefetching data in a serial memory subsystem topology.

2. Description of the Related Art

Many computer systems employ a main system memory that may be configured dependent upon the needs of an end user. In such systems, a motherboard or system board may include a number of memory expansion sockets. One or more small circuit boards, referred to as memory modules, may be inserted into the sockets as needed to increase the memory capacity of the computer system. Each of the memory modules typically includes multiple memory devices that provide a given amount of memory capacity. The memory devices are usually implemented using some type of dynamic random access memory (DRAM). Some examples of DRAM types include synchronous DRAM (SDRAM) as well as the various types of double data rate SDRAM (DDR SDRAM).

In conventional computer systems, the memory modules are connected to a memory/DRAM controller via a memory bus that includes address, control and data signals. In some computer systems, the address, control and data signals may be multiplexed and thus share the same sets of wires. In other computer systems, the address, control and data signals may use separate wires. In either case, each of the address and control signals is routed to each expansion socket such that the memory modules, when inserted, are connected in parallel to the memory/DRAM controller. In some systems the memory/DRAM controller may reside on the same integrated circuit (IC) chip as the system processor, while in other systems the memory/DRAM controller may reside in one IC (e.g., a Northbridge) of a chipset.

Although the operating speed of computer system processors continues to increase, the relative performance of the main system memory has not increased at the same rate. This may be due, at least in part, to the incremental improvement in the bandwidth of the system memory architectures described above.

SUMMARY

Various embodiments of a prefetch mechanism for use in a system including a host coupled serially to a plurality of memory modules are disclosed. In one embodiment, the system includes a host coupled to a serially connected chain of memory modules. The host includes a memory controller that may be configured to issue a memory read request for data stored within the memory modules. The memory controller may further request that data be prefetched from the memory modules by encoding prefetch information within the memory read request.

In one specific implementation, each of the memory modules may include a memory control hub that may control access to a plurality of memory chips on the memory module. In addition, each of the memory modules may include a DRAM controller configured to generate a memory read cycle to the memory chips in response to receiving a memory read command having a memory address that matches a memory address associated with the memory control hub.

In another specific implementation, the prefetch information may include prefetch hint information that indicates whether or not to prefetch data and prefetch stride information that indicates a number of addresses skipped between accesses to the memory modules.

In another specific implementation, each memory module includes a storage for storing prefetch data returned from the memory chips located on each of the memory modules.

In still another specific implementation, the memory controller may be configured to issue a memory write request to write data to the memory modules and to selectively request that one or more pages of memory within a given one of the memory modules remain open by encoding the prefetch information within the memory write request.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of one embodiment of a system including a serially connected chain of memory modules.

FIG. 2 is a block diagram of one embodiment of a memory module such as a memory module illustrated in FIG. 1.

FIG. 3 is a block diagram of another embodiment of a memory module such as a memory module illustrated in FIG. 1.

FIG. 4 is a diagram of one embodiment of a memory read packet.

FIG. 5 is a diagram of one embodiment of a memory write packet.

FIG. 6 is a block diagram of one embodiment of a computer system.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present invention as defined by the appended claims. Note, the headings are for organizational purposes only and are not meant to be used to limit or interpret the description or claims. Furthermore, note that the word “may” is used throughout this application in a permissive sense (i.e., having the potential to, being able to), not a mandatory sense (i.e., must). The term “include” and derivations thereof mean “including, but not limited to.” The term “connected” means “directly or indirectly connected,” and the term “coupled” means “directly or indirectly coupled.”

DETAILED DESCRIPTION

Turning now to FIG. 1, a block diagram of one embodiment of a system including a serially connected chain of memory modules is shown. System 50 includes a host 100 coupled to a system memory 125 via a memory link 110A. System 50 may be configured to operate as part of a computing device such as a computer system or server system, for example. System memory 125 includes a memory module 150A coupled to a memory module 150B via a memory link 110B. Memory module 150B is shown coupled to a memory link 110C, which may be coupled to an additional memory module (not shown) as desired to form a serially connected chain of memory modules that is coupled to host 100. It is noted that although two memory modules are shown in the chain, it is contemplated that one or more memory modules may be connected in this manner. It is further noted that components including a reference number followed by a reference letter may be referred to generally by the reference number alone. For example, when referring generally to all memory modules, reference may be made to memory module 150.

In the illustrated embodiment, memory module 150A includes a memory control hub 160A, which is coupled to a plurality of memory devices that are designated memory chip 171A through 171N, where N may be any number, as desired. In one embodiment, memory control hub 160A may be coupled to the memory chips via any type of memory interconnect. For example, in one embodiment, the memory interconnect may be a typical address, control and data bus configuration.

Similarly, memory module 150B includes a memory control hub 160B, which is coupled to a plurality of memory devices that are designated memory chip 181A through 181N, where N may be any number, as desired. In one embodiment, memory control hub 160B may be coupled to the memory chips via any type of memory interconnect as described above. It is noted that each of memory chips 171A through 171N and 181A through 181N may be any type of memory device such as a memory device in the DRAM family of memory devices, for example.

In the illustrated embodiment, memory links 110A-110C form a memory interconnect. In one embodiment, each of memory links 110A-110C forms a point-to-point memory interconnect that is implemented as two sets of unidirectional lines. One set of unidirectional lines is referred to as a downlink and is configured to convey transactions away from host 100 in a downstream direction. The other set of unidirectional lines is referred to as an uplink and is configured to convey transactions toward host 100 in an upstream direction. In addition, in one embodiment, each set of unidirectional lines may be implemented using a plurality of differential signal pairs. In one embodiment, each memory link 110 includes an 18-bit downlink and a 16-bit uplink, where each bit is a differential signal pair. As will be described in greater detail below in conjunction with the descriptions of FIG. 5A through FIG. 5D, the memory interconnect formed by memory links 110 may be configured to convey packets.

Generally speaking, all transactions from host 100 flow downstream through all memory modules 150 on the downlink and all response transactions flow upstream from the responding memory module 150 through each upstream memory module 150 on the uplink. More particularly, in one embodiment, host 100 may request to retrieve or store data within system memory 125. In response to host 100 making a request, memory controller 105 initiates a corresponding transaction such as a memory read transaction or a memory write transaction, for example. Memory controller 105 transmits the transaction to system memory 125 via memory link 110A. In the illustrated embodiment, the transaction is received by memory control hub 160A of memory module 150A.

In response to receiving the transaction, memory control hub 160A is configured to transmit the received transaction to memory module 150B via memory link 110B without decoding the transaction. This is referred to as forwarding the transaction downstream. Thus, each transaction received on a downlink by a given memory control hub 160 of a given memory module 150 is forwarded to the next memory module 150 in the chain that is coupled to the downlink without decoding the transaction. In one embodiment, decoding of the transaction may occur in parallel with the forwarding of the transaction. In other embodiments, the decoding of the transaction may occur after the transaction has been forwarded. A more detailed description of the downstream forwarding function may be found below in the description of FIG. 3.

Likewise, if memory controller 105 initiates a read request transaction, for example, the memory module 150 having the memory location corresponding to the address in the request will respond with the requested data. The response will be transmitted on the memory module's uplink toward host 100. If there are any intervening memory modules between the sending memory module and host 100, each intervening memory module will forward the response transaction on its uplink to either host 100 or the next memory module in the chain in an upstream direction. In addition, when the responding memory module is ready to send the response, it may inject the response into a sequence of transactions that are being forwarded upstream on the uplink. A more detailed description of the upstream forwarding function may be found below in the description of FIG. 5.

In one embodiment, memory controller 105 may be configured to make requests to system memory 125 without knowledge of which of memory modules 150A and 150B a particular address is associated with. For example, each of memory modules 150 may be assigned a range of memory addresses during a system configuration sequence. Each memory control hub 160 may include logic (not shown in FIG. 1) that may decode the address of an incoming request. Thus, a memory control hub 160 of a given memory module 150 may initiate a memory read cycle or memory write cycle to the memory chips on the given memory module 150 in response to decoding a memory request having an address that is in the address range assigned to the given memory module 150. As will be described in greater detail below in conjunction with the description of FIG. 2, in one embodiment, each memory control hub 160 may include a DRAM controller (not shown in FIG. 1) for initiating memory cycles to the memory chips to which it is connected.
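
The address-range matching described above may be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed hardware: the `Hub` class, the `route_read` function and the example address ranges are hypothetical names and values chosen only for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Hub:
    """Hypothetical model of a memory control hub and its assigned range."""
    name: str
    base: int  # first address assigned during system configuration
    size: int  # number of addresses assigned

    def claims(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def route_read(chain, addr):
    """Return (hubs that saw the request, responding hub name or None).

    The host does not pick a target module: every hub forwards the request
    downstream, and the hub whose range contains the address responds.
    """
    seen = []
    responder = None
    for hub in chain:  # downstream order, nearest the host first
        seen.append(hub.name)
        if hub.claims(addr):
            responder = hub.name
    return seen, responder


# Example ranges are assumptions for illustration only.
chain = [Hub("150A", 0x0000, 0x1000), Hub("150B", 0x1000, 0x1000)]
print(route_read(chain, 0x1800))
```

In this sketch the request is observed by both modules, and only the module whose assigned range contains the address responds.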

In addition, in one embodiment, memory controller 105 may initiate a subsequent memory access request prior to receiving a response to a previous memory access request. In such an embodiment, memory controller 105 may keep track of outstanding requests and may thus process the responses in a different order than they were sent.

Further, in the illustrated embodiment, memory controller 105 includes a prefetch unit 107 configured to provide prefetch information for prefetching data from addresses that correspond to a current memory read request. As will be described in greater detail below, prefetch unit 107 may predict which address may be fetched next. The prefetch unit 107 may calculate the prefetch address or addresses and may provide prefetch hints that may be encoded and embedded within the memory request packets. Prefetch unit 107 may also provide the stride (i.e., the number of cache lines that will be skipped between memory accesses) for inclusion within the memory request packets. In addition, a prefetch buffer (not shown in FIG. 1) may be used to store the prefetched cache lines. In one implementation, a prefetch buffer may be located on each memory module. In an alternative implementation a prefetch buffer may be located on the host.
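
One way a prefetch unit might derive hint and stride values from recent accesses is sketched below. The description does not specify a prediction algorithm, so the three-access rule used here is purely an assumption, as are the function name `predict` and the constant names. The numeric hint values follow the exemplary encodings given later in Table 4, and the stride limit follows Table 5.

```python
# Exemplary hint encodings (see Table 4); names are hypothetical.
HINT_NONE, HINT_SAME_LINE, HINT_NEXT, HINT_PREV = 0, 2, 3, 4
MAX_STRIDE = 3  # largest exemplary stride value (see Table 5)


def predict(history):
    """Return (hint, stride) from a list of recent cache line numbers.

    Assumed rule: if the last two deltas are equal, expect the pattern to
    continue. Stride is the number of cache lines skipped between accesses.
    """
    if len(history) < 3:
        return HINT_NONE, 0
    d1 = history[-1] - history[-2]
    d2 = history[-2] - history[-3]
    if d1 != d2:
        return HINT_NONE, 0
    if d1 == 0:
        return HINT_SAME_LINE, 0
    skipped = abs(d1) - 1
    if skipped > MAX_STRIDE:
        return HINT_NONE, 0  # pattern cannot be expressed in the stride field
    return (HINT_NEXT if d1 > 0 else HINT_PREV), skipped


print(predict([10, 12, 14]))
```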

The Memory Interconnect

The memory interconnect includes one or more high-speed point-to-point memory links such as memory links 110A-110C each including an uplink such as uplink 111A and a downlink such as downlink 112A, for example. As noted above, in one embodiment downlinks may be 18-bit links while uplinks may be 16-bit links. As such, an 18-bit downlink may include 16 control, address and data (CAD) signals, a busy signal and a Control (CTL) signal. A given uplink may include 16 control, address and data (CAD) signals. It is contemplated, however, that in an alternative embodiment, an uplink such as uplink 111A may also include a CTL signal.

In addition to the high-speed links, other signals may be provided to each memory module 150. For example, in one embodiment, a reset signal, a power OK signal and a reference clock may be provided to each memory module 150 from host 100. Further, other signals may be provided between each memory module. For example, as described above, a next memory module present signal may be provided between memory modules.

Generally speaking, the types of transactions conveyed on memory links 110 may be categorized into configuration and control transactions and memory transactions. In one embodiment, configuration and control transactions may be used to configure memory control hub 160. For example, configuration and control transactions may be used to access configuration registers, assign a memory address range to a memory module or to assign a hub address to a memory control hub. Memory transactions may be used to access the memory locations within the memory chips (e.g., 171A-171N . . . 181A-181N).

Accordingly, in one embodiment, there are two types of addressing supported: hub addressing and memory addressing. Using hub addressing, eight hub bits identify the specific memory control hub being accessed. In one embodiment, a hub address of FFh may be indicative of a broadcast to all memory control hubs. Using memory addressing, each hub decodes the upper portion of the address bits to determine which hub should accept the request and the lower portion to determine the memory location to be accessed. In one embodiment, there are 40 address bits, although it is contemplated that other numbers of address bits may be used as desired.
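
The two addressing modes may be illustrated with the short sketch below. The 8-bit hub address, the FFh broadcast value and the 40-bit memory address come from the description above. The function names and the `upper_bits` parameter are hypothetical, since the description does not state where the boundary between the upper and lower portions of the address lies.

```python
BROADCAST_HUB = 0xFF  # hub address indicating a broadcast to all hubs
ADDR_BITS = 40        # memory address width in this embodiment


def hub_accepts(hub_id: int, target: int) -> bool:
    """Hub addressing: accept a direct match or a broadcast."""
    return target == BROADCAST_HUB or target == hub_id


def split_address(addr: int, upper_bits: int):
    """Memory addressing: split an address into (upper, lower) portions.

    upper_bits is an assumed, configurable boundary; the upper portion
    selects the hub and the lower portion selects the memory location.
    """
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address does not fit in 40 bits")
    lower_bits = ADDR_BITS - upper_bits
    return addr >> lower_bits, addr & ((1 << lower_bits) - 1)


print(split_address(0x123456789A, 8))
```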

In one embodiment, each of the memory links is configured to convey the transactions using one or more packets. The packets include control and configuration packets and memory access packets, each of which may include a data payload depending on the type of command the packet carries. As such, the sets of wires that make up memory links 110 may be used to convey control, address and data.

The packets may be generally characterized by the following: Each packet includes a number of bit positions, each of which conveys a single bit of information. Each packet is divided into several bit times and, during a given bit time, all of the bit positions of the packet are sampled. As such, the control information and data share the same wires of a given link (e.g., CAD wires). As will be described in greater detail below, in one embodiment, packets are multiples of bit pairs and the first bit-time of every packet is sampled at an even bit-time. Packets begin with a control header that may be either one or two bit-pairs in length. In one embodiment, the first five bits of the control header are the command code. Table 1 below illustrates the various types of packets and their associated command codes. It is noted, however, that the actual codes shown in column one are for illustrative purposes and that other codes may be used for each given command.

TABLE 1
Packet types and command codes

Code  Header length (bit-times)  Command    Description                Direction  Normal response  Address type
00h   -                          NOP        Null Operation/Idle State  Both       -                -
04h   2                          AddrSet    Address Set                Down       AddrAck          Hub
05h   2                          AddrAck    Address Acknowledge        Up         -                -
06h   2                          Ack        Acknowledge                Up         -                -
07h   2                          Nak        Not Acknowledge/Error      Up         -                -
08h   2                          SRdResp    Short Read Response        Up         -                -
09h   2                          LRdResp    Long Read Response         Up         -                -
0Ah   2                          ConfigRd   Configuration Read         Down       RdResp           Hub
0Ch   2                          ConfigWr   Configuration Write        Down       Ack              Hub
0Eh   2                          DIMMCtl    DIMM Control               Down       Ack              Hub
10h   4                          SMemRd     Short Memory Read          Down       RdResp/Ack       Memory
11h   4                          LMemRd     Long Memory Read           Down       RdResp           Memory
12h   4                          BlkMemWr   Block Memory Write         Down       Ack              Memory
13h   4                          SbytMemWr  Short Byte Memory Write    Down       Ack              Memory
14h   4                          LbytMemWr  Long Byte Memory Write     Down       Ack              Memory
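
The command codes of Table 1 may be captured in a lookup table, as in the sketch below. The table contents are taken from Table 1; entries that Table 1 leaves blank are represented as `None`. The placement of the five command bits in the low-order bits of the first 16-bit CAD word is an assumption made only for illustration, as are the names `COMMANDS`, `command_code` and `lookup`.

```python
# code: (command, header length in bit-times, direction,
#        normal response, address type); None where Table 1 is blank.
COMMANDS = {
    0x00: ("NOP", None, "Both", None, None),
    0x04: ("AddrSet", 2, "Down", "AddrAck", "Hub"),
    0x05: ("AddrAck", 2, "Up", None, None),
    0x06: ("Ack", 2, "Up", None, None),
    0x07: ("Nak", 2, "Up", None, None),
    0x08: ("SRdResp", 2, "Up", None, None),
    0x09: ("LRdResp", 2, "Up", None, None),
    0x0A: ("ConfigRd", 2, "Down", "RdResp", "Hub"),
    0x0C: ("ConfigWr", 2, "Down", "Ack", "Hub"),
    0x0E: ("DIMMCtl", 2, "Down", "Ack", "Hub"),
    0x10: ("SMemRd", 4, "Down", "RdResp/Ack", "Memory"),
    0x11: ("LMemRd", 4, "Down", "RdResp", "Memory"),
    0x12: ("BlkMemWr", 4, "Down", "Ack", "Memory"),
    0x13: ("SbytMemWr", 4, "Down", "Ack", "Memory"),
    0x14: ("LbytMemWr", 4, "Down", "Ack", "Memory"),
}


def command_code(first_cad_word: int) -> int:
    """Extract the 5-bit command code (assumed to be the low-order bits)."""
    return first_cad_word & 0x1F


def lookup(first_cad_word: int):
    """Return the Table 1 entry for a header word, or None if unassigned."""
    return COMMANDS.get(command_code(first_cad_word))


print(lookup(0xABF0))
```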

Further, in one embodiment, packets (except NOP packets) may be transmitted with an error detecting code (EDC). It is noted that in one implementation, the EDC is a 32-bit cyclic redundancy code (CRC), although other implementations may employ other EDCs as desired. Additionally, addresses are sent most significant bit-time first to speed decode within memory control hub 160 while data is sent least significant byte first. It is noted, however, that other embodiments are contemplated in which the addresses may be sent least significant bit-time first and data may be sent most significant byte first. Packets may carry a payload of byte enables and/or data. Packets with no payload are referred to as header-only packets. In one embodiment, the size of the data for short reads may be up to one half of a programmed cache line size. In addition, the size of the data for long reads and block writes may be up to the programmed cache line size. Further, the size of the data for byte writes may be a maximum of 64 bytes regardless of the cache line size setting.
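
The data size limits just described may be expressed as a small helper. The limits themselves are taken from the paragraph above and the command names from Table 1. The function name `max_data_bytes` and the choice to express the cache line size in bytes are assumptions of this sketch.

```python
def max_data_bytes(command: str, cache_line_size: int) -> int:
    """Return the maximum data size, in bytes, for a memory command.

    cache_line_size is the programmed cache line size in bytes (assumed unit).
    """
    if command == "SMemRd":                      # short read
        return cache_line_size // 2
    if command in ("LMemRd", "BlkMemWr"):        # long read, block write
        return cache_line_size
    if command in ("SbytMemWr", "LbytMemWr"):    # byte writes
        return 64  # fixed, regardless of the cache line size setting
    raise ValueError(f"not a memory command with a data size: {command}")


for cmd in ("SMemRd", "LMemRd", "SbytMemWr"):
    print(cmd, max_data_bytes(cmd, 128))
```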

In addition to the control header and command code information included within a packet, the CTL signal may be used to convey information about each packet. Table 2 below shows some exemplary CTL encodings.

TABLE 2
CTL encodings for downstream use

Even  Odd  Content of CAD
0     0    Data or Byte Enable Payload
1     1    Control Header
0     1    CRC for a Packet with Payload
1     0    CRC for a Header-Only Packet

Different values of CTL for the header and payload portions of a packet may provide enough information to allow header-only packets to be inserted within the payload of another packet. This may be useful for reducing the latency of read commands by allowing them to issue while a write packet is still being sent on the link. Table 3 illustrates an exemplary packet including a payload in tabular format. The packet in Table 3 also shows that a header-only packet is inserted in the payload during bit times 4-7.

TABLE 3
Packet with payload and header-only packet inserted within payload

Bit-time  CTL  CAD
0         1    Header1 bits [15:0]
1         1    Header1 bits [31:16]
2         0    Data bits [15:0]
3         0    Data bits [31:16]
4         1    Header2 bits [15:0]
5         1    Header2 bits [31:16]
6         1    CRC2 bits [15:0]
7         0    CRC2 bits [31:16]
8         0    Data bits [47:32]
9         0    Data bits [63:48]
10        0    CRC1 bits [15:0]
11        1    CRC1 bits [31:16]
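
The way a receiver might use the CTL encodings of Table 2 to separate the inserted header-only packet of Table 3 from the surrounding packet is sketched below. This is an illustrative model only: the function name `demux`, the representation of each bit-time as a `(ctl, cad)` tuple, and the restriction to a single outer packet are assumptions of the sketch.

```python
def demux(bit_times):
    """Split the bit-times of one outer packet into (outer, inserted) words.

    Each even/odd CTL pair is classified per Table 2:
      (0, 0) payload, (1, 1) control header,
      (0, 1) CRC of a packet with payload, (1, 0) CRC of a header-only packet.
    """
    if len(bit_times) % 2:
        raise ValueError("packets are multiples of bit-pairs")
    outer, inserted = [], []
    payload_seen = False
    for i in range(0, len(bit_times), 2):
        (ctl_even, cad_even), (ctl_odd, cad_odd) = bit_times[i], bit_times[i + 1]
        kind = (ctl_even, ctl_odd)
        words = [cad_even, cad_odd]
        if kind == (0, 0):        # data or byte-enable payload
            payload_seen = True
            outer += words
        elif kind == (0, 1):      # CRC closing the packet with payload
            outer += words
        elif payload_seen:        # header or header-only CRC inside a payload
            inserted += words
        else:                     # the outer packet's own control header
            outer += words
    return outer, inserted


# The bit-times of Table 3 as (CTL, CAD) pairs.
TABLE_3 = [
    (1, "Header1[15:0]"), (1, "Header1[31:16]"),
    (0, "Data[15:0]"), (0, "Data[31:16]"),
    (1, "Header2[15:0]"), (1, "Header2[31:16]"),
    (1, "CRC2[15:0]"), (0, "CRC2[31:16]"),
    (0, "Data[47:32]"), (0, "Data[63:48]"),
    (0, "CRC1[15:0]"), (1, "CRC1[31:16]"),
]
outer, inserted = demux(TABLE_3)
print(inserted)
```

Applied to the Table 3 sequence, the sketch recovers the header-only packet sent during bit times 4-7 separately from the eight bit-times of the surrounding packet.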

Referring now to FIG. 2, a block diagram of one embodiment of memory module 150 is shown. Components that correspond to those shown in FIG. 1 are numbered identically for clarity and simplicity. Memory module 150 includes a memory control hub 160 coupled to memory chips 261A through 261N via a memory bus 265. Memory control hub 160 includes a control unit 240 coupled to a DRAM controller 250. DRAM controller 250 is coupled to memory chips 261A-261N. Control unit 240 includes an uplink control 241 and a downlink control 242. As noted above, memory bus 265 may be any type of memory interconnect. In the illustrated embodiment, memory control hub 160 is coupled to a memory link 110A in an upstream direction and a memory link 110B in a downstream direction. It is further noted that the frequency of operation of memory bus 265 is independent of the frequency of operation of memory links 110.

In the illustrated embodiment, uplink control unit 241 may be configured to receive and forward packets received from another memory module downstream. The receiving and forwarding of the upstream packets creates an upstream transaction sequence. In addition, uplink control unit 241 may be configured to inject packets that originate within memory module 150 into the transaction stream.

In the illustrated embodiment, downlink control unit 242 may be configured to receive packets that originate at the host and, if a memory module is connected downstream, to forward those packets to the downstream memory module. In addition, downlink control unit 242 may be configured to copy and decode the packets. In one embodiment, if the packets include an address that is within the range of addresses assigned to memory module 150 and the packet is a memory access request, downlink control unit 242 may pass the command associated with the packet to DRAM controller 250. However, if the packet is not a memory request, but is instead a configuration packet, downlink control unit 242 may pass the configuration command associated with the packet to the core logic of control unit 240 (not shown) for processing. It is noted that in one embodiment, if the packet does not include an address that is within the range of addresses assigned to memory module 150, memory control hub 160 may drop or discard the packet if memory module 150 is the last memory module in the chain.
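
The downlink handling described above may be summarized by the following sketch. It models only the decisions named in this paragraph. The function name `handle_downstream`, the dictionary packet format and the action strings are hypothetical, and hub-address matching for configuration packets is omitted for brevity.

```python
def handle_downstream(packet, assigned_range, has_downstream):
    """Return the list of actions a hub takes for one downstream packet.

    packet: {"kind": "memory" | "config", "addr": int} (assumed format)
    assigned_range: (low, high) with high exclusive
    """
    actions = []
    if has_downstream:
        actions.append("forward")              # always forwarded if possible
    low, high = assigned_range
    if packet["kind"] == "config":
        actions.append("to_core_logic")        # configuration command
    elif packet["kind"] == "memory" and low <= packet["addr"] < high:
        actions.append("to_dram_controller")   # address is ours
    elif not has_downstream:
        actions.append("drop")                 # last module, nobody claims it
    return actions


print(handle_downstream({"kind": "memory", "addr": 0x10}, (0x0, 0x100), True))
```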

In one embodiment, memory control hub 160 is configured to receive a module present signal (not shown), which when activated by a downstream memory module, indicates to an upstream memory module that there is a downstream memory module present. In such an embodiment, if memory control hub 160 receives a transaction and no downstream memory module is determined to be present, memory control hub 160 may drop the transaction.

As mentioned above, in one implementation, prefetch unit 107 may predict which addresses may be needed and may encode hint information and stride information into a memory request packet. In an alternative implementation, the prediction information may be generated by other hardware and/or software and provided to prefetch unit 107. For example, software executing a memory-streaming algorithm may provide explicit stride information, which may be passed to prefetch unit 107. Prefetch unit 107 may then generate hint and stride information for inclusion within a given memory request.

The hint information may indicate to DRAM controller 250 the type of addresses to prefetch (if any). Table 4 below illustrates an exemplary set of hint values. It is noted that in other embodiments, other values having other meanings are possible.

TABLE 4
Packet Prefetch Hint Values
Hint    Meaning
0       None
1       Last access to this cacheline
2       More accesses to this cacheline
3       Accesses to next cacheline to follow, see stride
4       Accesses to previous cacheline to follow, see stride
5-7     Reserved

The stride information indicates how many cachelines will be skipped between accesses. Table 5 below illustrates an exemplary set of stride values. It is noted that in other embodiments, other values having other meanings are possible.

TABLE 5
Packet Prefetch Stride Values
Stride  Number of cachelines skipped
0       None
1       1
2       2
3       3

In one implementation, DRAM controller 250 is configured to initiate memory cycles to memory chips 261A-261N in response to memory commands from memory control hub 160. Thus, in response to receiving a memory request to an address assigned to that memory module, downlink control 242 may pass the memory request command, including the hint and stride information, to DRAM controller 250. DRAM controller 250 generates the memory cycles corresponding to the request. For example, if a read request is received for the data at a given address, DRAM controller 250 generates the read cycles for the data at that address. In addition, DRAM controller 250 decodes the hint and stride information and prefetches the cachelines of data indicated by the hint and stride information.
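One possible reading of the hint and stride decode is sketched below. This is an illustrative sketch rather than the disclosed logic: the function name `prefetch_address` and the 64-byte cacheline size are assumptions, and the sketch assumes that a stride of n means n cachelines are skipped, so that the prefetched line lies n+1 lines away from the requested line. Hint values 1 and 2 refer to the current cacheline and are treated here as requiring no additional address.

```python
CACHELINE_BYTES = 64  # assumed; the cacheline size is programmable per the text

# Hint values from Table 4 (5-7 are reserved).
HINT_NONE, HINT_LAST, HINT_MORE, HINT_NEXT, HINT_PREV = 0, 1, 2, 3, 4


def prefetch_address(addr, hint, stride, line_bytes=CACHELINE_BYTES):
    """Return the cacheline address to prefetch, or None if none applies."""
    if not 0 <= stride <= 3:
        raise ValueError("stride is a 2-bit field")
    line = addr - (addr % line_bytes)      # align to the start of the cacheline
    step = (stride + 1) * line_bytes       # skip `stride` lines between accesses
    if hint == HINT_NEXT:
        return line + step
    if hint == HINT_PREV:
        target = line - step
        return target if target >= 0 else None
    return None  # hints 0-2 and reserved values carry no new address
```

With these assumptions, a read of address 0x1000 carrying hint 3 and stride 0 would lead to a prefetch of the adjacent line at 0x1040, while stride 2 would lead to a prefetch of the line at 0x10C0.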

As will be described in greater detail below in conjunction with the description of FIG. 3, each memory module 150 may include a storage configured as a prefetch buffer (not shown in FIG. 2) for storing prefetch data. In such an implementation, the read data from the explicit memory request may be returned to host 100 without being stored within the prefetch buffer.

In another embodiment, all the requested data (including any prefetched data) returned by DRAM controller 250 may be stored within a read buffer (not shown) within host 100. However, depending on the size of the read buffer, the prefetch data may get discarded to make room for read request data that is explicitly requested in a memory read packet before the expected data arrives. Thus, in another implementation, host 100 may include a separate prefetch buffer for storing prefetched data separately from read data returned as a result of explicit read requests.

Turning to FIG. 3, a block diagram of another embodiment of memory module 150 is shown. Components that correspond to those shown in FIG. 1 are numbered identically for clarity and simplicity. Memory module 150 of FIG. 3 includes a memory control hub 160 coupled to memory chips 261A through 261N via a memory bus 265. Memory control hub 160 includes a control unit 240 coupled to a DRAM controller 250. DRAM controller 250 is coupled to memory chips 261A-261N and to a prefetch buffer 375. Control unit 240 includes an uplink control 241 and a downlink control 242. As noted above, memory bus 265 may be any type of memory interconnect. In the illustrated embodiment, memory control hub 160 is coupled to a memory link 110A in an upstream direction and a memory link 110B in a downstream direction. It is noted that the frequency of operation of memory bus 265 is independent of the frequency of operation of memory links 110.

The operation of memory module 150 of FIG. 3 is similar to the operation of memory module 150 of FIG. 2. However, since memory module 150 of FIG. 3 includes a prefetch buffer 375, the operational differences are described below. It is noted that although prefetch buffer 375 is shown as part of memory control hub 160, in other embodiments, prefetch buffer 375 may be separate from memory control hub 160.

As mentioned above, the data retrieved in response to an explicit memory read request may be sent back to host 100 and not stored within prefetch buffer 375. However, prefetch buffer 375 may store data that is prefetched by DRAM controller 250 in response to receiving hint and stride information with an explicit memory request. When a subsequent memory request is received by DRAM controller 250, prefetch buffer 375 may be checked for the requested data. If the data is stored within prefetch buffer 375, that data is returned to host 100, thereby possibly saving time and reducing the latency associated with accessing memory chips 261A-N.

In one implementation, when an explicit memory request packet such as a read request packet is received, the command and address information corresponding to the request packet is provided to DRAM controller 250. In addition, the hint and stride information is provided to DRAM controller 250. DRAM controller 250 is configured to generate memory read cycles corresponding to the explicit read command. In addition, DRAM controller 250 is configured to decode the prefetch hint and stride information and to generate memory read cycles corresponding to the addresses indicated by the hint and stride information. The read data returned from memory chips 261A-N as a result of the explicit read command may be packetized and injected into the upstream flow of packets on uplink 211A by uplink control 241. The read data returned from memory chips 261A-N as a result of the prefetch read commands is stored within prefetch buffer 375.

Similar to a memory read request, when a memory write request packet is received by a memory module, the command and address information corresponding to the write request packet is also provided to DRAM controller 250. If the write packet includes hint and stride information, instead of performing prefetch operations (as when a read request is received), in one implementation, DRAM controller 250 may use the hint and stride information to keep the corresponding page within memory chips 261A-N open for any subsequent accesses.

In one embodiment, prefetch buffer 375 may be implemented using memory devices in the random access memory (RAM) family that have faster access times than for example, access times of memory chips 261A-N. Any suitable RAM device may be used such as static RAM (SRAM) or fast SRAM (FSRAM).

In addition, prefetch buffer 375 may be implemented using a variety of suitable structures. For example, depending on the size, prefetch buffer 375 may be implemented as a fully associative, set associative, or direct mapped structure that may include tags to support lookup functions. The tags may also be used to invalidate entries in prefetch buffer 375 in response to DRAM controller 250 receiving a command to write data to an address that is stored within prefetch buffer 375, for example.
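As an illustration of the tag-based lookup and invalidation described above, the following is a minimal sketch of a direct mapped prefetch buffer. The class name `PrefetchBuffer`, the entry count, and the 64-byte cacheline size are assumptions chosen for illustration; the disclosure does not fix any of them, and a set associative or fully associative structure would be equally consistent with the text.

```python
class PrefetchBuffer:
    """Direct mapped buffer indexed by cacheline number (sizes illustrative)."""

    def __init__(self, num_entries=8, line_bytes=64):
        self.num_entries = num_entries
        self.line_bytes = line_bytes
        self.tags = [None] * num_entries  # full cacheline number used as the tag
        self.data = [None] * num_entries

    def _slot(self, addr):
        line = addr // self.line_bytes
        return line, line % self.num_entries

    def fill(self, addr, line_data):
        """Store prefetched data, replacing whatever occupied the slot."""
        line, idx = self._slot(addr)
        self.tags[idx] = line
        self.data[idx] = line_data

    def lookup(self, addr):
        """Return the buffered line on a tag match, else None (a miss)."""
        line, idx = self._slot(addr)
        return self.data[idx] if self.tags[idx] == line else None

    def invalidate(self, addr):
        """Drop the entry for addr, e.g. when a write to that line arrives."""
        line, idx = self._slot(addr)
        if self.tags[idx] == line:
            self.tags[idx] = None
            self.data[idx] = None
```

In this sketch, a subsequent read would first call `lookup`; on a hit the data is returned without accessing the memory chips, and a write to a buffered address would call `invalidate` so that stale data is never returned.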

FIG. 4 and FIG. 5 illustrate exemplary memory access packets that may be conveyed on memory links 110A through 110C of FIG. 1. Turning now to FIG. 4, a diagram of one embodiment of an exemplary memory read packet is shown. In the illustrated embodiment, memory read packet 425 is 16 bits wide and includes six bit times or three bit-pairs. During bit time zero, the five-bit command code (e.g., 10h or 11h) is conveyed in bit positions 0-4. A prefetch hint value is encoded and conveyed in bit positions 5-7. Exemplary prefetch hint values are discussed and shown above in Table 4. An eight-bit tag is conveyed in bit positions 8-15.

During bit time one, the length of the data that should be returned is conveyed in bit positions 0-5. In one embodiment, a value of 00h indicates no data, a value of 01h indicates two bit-pairs of data, a value of 02h indicates four bit-pairs of data, and so on. A zero length read results in an acknowledge packet (Ack) being returned to the requestor. In one embodiment, a read of a half cache line or less may result in a short RdResp and a read of more than a half cache line may result in either a single long RdResp or two short RdResp. The cache line size may be programmed by software into the configuration registers of host 100 and each memory control hub 160. A prefetch stride prediction value is encoded and conveyed in bit positions 6-7. Exemplary prefetch stride prediction values are discussed and shown in Table 5 above. Address bits 39-32 of the requested location in DRAM are conveyed in bit positions 8-15.

During bit time two, the address bits 31-16 of the requested location in DRAM are conveyed in bit positions 0-15 and during bit time 3, the address bits 3-15 of the requested location in DRAM are conveyed in bit positions 3-15. Also during bit time 3, the packet priority is conveyed in bit positions 0-1. In one embodiment, the priority may be indicative of the priority of the packet relative to other requests. For example, one priority may be to delay all requests with lower priority even if they are already in progress and to execute this request ahead of them. Bit position 2 is reserved. During bit times four and five, bits 0-15 and 16-31, respectively, of a CRC are conveyed in bit positions 0-15.
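The field layout of the exemplary memory read packet can be expressed as bit packing over six 16-bit words, one per bit time. The sketch below is illustrative only: the function names are hypothetical, the reserved bit is assumed to be sent as zero, and the CRC is taken as a caller-supplied 32-bit value because the text does not specify how it is computed.

```python
def pack_read_packet(cmd, hint, tag, length, stride, addr, priority, crc):
    """Return the six 16-bit words (bit times 0-5) of a memory read packet."""
    if not (0 <= cmd < 32 and 0 <= hint < 8 and 0 <= tag < 256):
        raise ValueError("cmd is 5 bits, hint is 3 bits, tag is 8 bits")
    if not (0 <= length < 64 and 0 <= stride < 4 and 0 <= priority < 4):
        raise ValueError("length is 6 bits, stride and priority are 2 bits")
    if not (0 <= addr < (1 << 40)) or addr % 8:
        raise ValueError("address is 40 bits; bits 2-0 are not conveyed")
    if not 0 <= crc < (1 << 32):
        raise ValueError("crc is 32 bits")
    return [
        cmd | (hint << 5) | (tag << 8),                         # bit time 0
        length | (stride << 6) | (((addr >> 32) & 0xFF) << 8),  # bit time 1
        (addr >> 16) & 0xFFFF,                                  # bit time 2
        priority | (addr & 0xFFF8),                             # bit time 3
        crc & 0xFFFF,                                           # bit time 4
        (crc >> 16) & 0xFFFF,                                   # bit time 5
    ]


def unpack_read_packet(words):
    """Inverse of pack_read_packet; returns the fields as a dict."""
    w0, w1, w2, w3, w4, w5 = words
    return {
        "cmd": w0 & 0x1F,
        "hint": (w0 >> 5) & 0x7,
        "tag": w0 >> 8,
        "length": w1 & 0x3F,
        "stride": (w1 >> 6) & 0x3,
        "addr": ((w1 >> 8) << 32) | (w2 << 16) | (w3 & 0xFFF8),
        "priority": w3 & 0x3,
        "crc": w4 | (w5 << 16),
    }
```

Because address bits 3-15 travel in bit positions 3-15 of bit time three, the low word of the address can be masked in place rather than shifted, which leaves bit positions 0-2 free for the priority and the reserved bit.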

Referring to FIG. 5, a diagram of one embodiment of an exemplary block memory write packet is shown. In the illustrated embodiment, block memory write packet 525 is 16 bits wide and includes eight bit times or four bit-pairs. During bit time zero, the five-bit command code (e.g., 12h) is conveyed in bit positions 0-4. A prefetch hint value is encoded and conveyed in bit positions 5-7. An eight-bit tag is conveyed in bit positions 8-15.

During bit time one, the length of the data being conveyed in the data payload is conveyed in bit positions 0-5. In one embodiment, a value of 00h indicates no data, a value of 01h indicates two bit-pairs of data, a value of 02h indicates four bit-pairs of data, and so on. A prefetch stride prediction value is encoded and conveyed in bit positions 6-7. Address bits 39-32 of the location in DRAM being written are conveyed in bit positions 8-15.

During bit time two, the address bits 31-16 of the location in DRAM being written are conveyed in bit positions 0-15 and during bit time 3, the address bits 3-15 of the location in DRAM being written are conveyed in bit positions 3-15. Also during bit time 3, the packet priority is conveyed in bit positions 0-1. Bit position 2 is reserved.

During bit times four and five, bits 0-15 and 16-31 of a first bit pair of the data payload are conveyed in bit positions 0-15. If more data is being written, subsequent bit pairs may convey bits 0-15 and 16-31 of subsequent data payload. During bit times 4+2N and 5+2N, bits 0-15 and 16-31, respectively, of a CRC are conveyed in bit positions 0-15.
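The placement of the data payload and CRC in the block memory write packet can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the four header words (bit times zero through three) are assumed to have been built already, the CRC is a caller-supplied 32-bit value, and the length decode assumes that the "and so on" progression is linear (a field value of n meaning 2n bit-pairs).

```python
def length_to_bit_pairs(length_field):
    """Decode the 6-bit length field: 00h -> 0, 01h -> 2, 02h -> 4 bit-pairs."""
    if not 0 <= length_field < 64:
        raise ValueError("length is a 6-bit field")
    return 2 * length_field  # assumes the progression continues linearly


def assemble_write_packet(header, payload, crc):
    """Append N payload bit-pairs and the CRC to a four-word write header.

    `payload` is a list of N 32-bit values. Each value occupies two bit times,
    low half first, so the CRC lands in bit times 4+2N and 5+2N.
    """
    if len(header) != 4:
        raise ValueError("header must contain bit times 0 through 3")
    words = list(header)
    for value in payload:
        words.append(value & 0xFFFF)          # bits 0-15 of the bit-pair
        words.append((value >> 16) & 0xFFFF)  # bits 16-31 of the bit-pair
    words.append(crc & 0xFFFF)                # CRC bits 0-15
    words.append((crc >> 16) & 0xFFFF)        # CRC bits 16-31
    return words
```

With a single bit-pair of payload the result is eight bit times long, which matches the eight bit times of the illustrated packet.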

It is noted that although only two types of packets were shown, other types of packets, which may correspond to the command codes listed in Table 3, are contemplated. It is further noted that although the various fields of the exemplary packets are shown having a particular number of bits, it is contemplated that in other embodiments, the various fields of each packet may include other numbers of bits as desired.

FIG. 6 is a block diagram of one embodiment of a computer system. Computer system 600 includes process nodes 612A-612D each interconnected by coherent packet interface links 615A-D. Each link of coherent packet interface 615 may form a high-speed point-to-point link. Process nodes 612A-D may each include one or more processors. Computer system 600 also includes an I/O node 620 which is coupled to process node 612A via a non-coherent packet interface 650A. I/O node 620 may be connected to another I/O node (not shown) in a chain topology, for example, by non-coherent packet interface 650B. Process node 612A is illustrated as a host node and may include a host bridge for communicating with I/O node 620 via NC packet interface 650A. Process nodes 612B-D may also include host bridges for communication with other I/O nodes (not shown). The non-coherent packet interface links formed by NC packet interface 650A-B may also be referred to as point-to-point links. I/O node 620 is connected to a pair of peripheral buses 625A-B.

FIG. 6 further illustrates respective system memories (e.g., 617A and 617B) coupled to process nodes 612A and 612B. In the illustrated embodiment, process nodes 612A and 612B are each illustrative of a host as shown in FIG. 1, and each system memory 617 may be implemented in the configuration described in conjunction with the description of FIG. 1 and FIG. 2 above. Further, the interconnects between each of process nodes 612A and 612B and their respective system memories 617 may be reflective of the memory interconnect including memory link 110C described above in FIG. 1 and FIG. 2. It is noted that in other embodiments, other numbers of process nodes may be used. Further, it is contemplated that each of process nodes 612C and 612D may be similarly connected to a respective system memory such as system memory 617, for example.

In the illustrated embodiment, each link of coherent packet interface 615 is implemented as sets of unidirectional lines (e.g., lines 615A are used to transmit packets from processing node 612A to processing node 612B and lines 615B are used to transmit packets from processing node 612B to processing node 612C). Other sets of lines 615C-D are used to transmit packets between other processing nodes as illustrated in FIG. 6. The coherent packet interface 615 may be operated in a cache coherent fashion for communication between processing nodes (“the coherent link”). Further, non-coherent packet interface 650 may be operated in a non-coherent fashion for communication between I/O nodes and between I/O nodes and a host bridge such as the host bridge of process node 612A (“the non-coherent link”). The interconnection of two or more nodes via coherent links may be referred to as a “coherent fabric”. Similarly, the interconnection of two or more nodes via non-coherent links may be referred to as a “non-coherent fabric”. It is noted that a packet to be transmitted from one processing node to another may pass through one or more intermediate nodes. For example, a packet transmitted by processing node 612A to processing node 612C may pass through either processing node 612B or processing node 612D as shown in FIG. 6. Any suitable routing algorithm may be used. Other embodiments of computer system 600 may include more or fewer processing nodes than the embodiment shown in FIG. 6.

One example of a packet interface such as non-coherent packet interface 650 may be compatible with HyperTransport™ technology. Peripheral buses 625A and 625B are illustrative of a common peripheral bus such as a peripheral component interconnect (PCI) bus. It is understood, however, that other types of buses may be used.

It is further noted that other computer system configurations are possible and contemplated. For example, it is contemplated that the system memory configuration described above in FIG. 1 through FIG. 5 may be used in conjunction with a computer system employing a processor chipset that includes a Northbridge. In such an embodiment, a memory controller within the Northbridge may serve as the host.

Although the embodiments above have been described in considerable detail, numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.

Classifications
U.S. Classification: 711/105, 711/137
International Classification: G06F13/28, G06F13/16, G11C5/00, G06F12/00
Cooperative Classification: G06F13/4243, G06F12/0215, G06F12/0862, G06F2212/6022, G06F13/1626
European Classification: G06F13/16A2R, G06F13/42C3S, G06F12/02C
Legal Events
Date: May 10, 2004
Code: AS
Event: Assignment
Owner name: ADVANCED MICRO DEVICES, INC., CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: WEBER, FREDERICK D.; LA FETRA, ROSS V.; MIRANDA, PAUL C.; REEL/FRAME: 015313/0299; SIGNING DATES FROM 20040331 TO 20040411