WO2013083194A1 - Memory controller and method for controlling accesses to a memory - Google Patents


Info

Publication number
WO2013083194A1
Authority
WO
WIPO (PCT)
Prior art keywords
pending
memory
access requests
access request
write
Prior art date
Application number
PCT/EP2011/072158
Other languages
French (fr)
Inventor
Ronen HADAR
Shlomo Reches
Yoram Gross
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/EP2011/072158
Publication of WO2013083194A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1605Handling requests for interconnection or transfer for access to memory bus based on arbitration
    • G06F13/161Handling requests for interconnection or transfer for access to memory bus based on arbitration with latency improvement
    • G06F13/1626Handling requests for interconnection or transfer for access to memory bus based on arbitration with latency improvement by reordering requests

Definitions

  • the present invention relates to a memory controller and to a method for controlling accesses to a memory.
  • GHz gigahertz
  • the speed of system memory is another determining factor for computing performance.
  • DRAM Dynamic Random Access Memory
  • a computer's memory is a temporary storage area for data that needs to be available for programs to run efficiently. The faster the memory can provide data, the more work the processor can perform. Increased data "throughput" translates directly into better system performance. DRAM devices are widely used due to their cost performance, high density and low power consumption.
  • DDR Double Data Rate SDRAM memory
  • PC66 memory speed of 66 MHz
  • PC100 100MHz
  • PC133 133MHz
  • DDR Double Data Rate SDRAM memory technology
  • DDR memory started with 200MHz DDR (or DDR200) and is now available in DDR266, DDR333, and DDR400 speeds for mainstream PCs.
  • memory speeds were able to keep up with the processor's requirements.
  • when the point was reached where the processor's ability to process data was accelerating faster than current memory technologies could support, memory became a major limitation in system performance.
  • the invention is based on the finding that the time it takes to perform n random memory accesses varies depending on how the accesses are ordered.
  • the ordering can be described by a single-source shortest path problem for a graph with nonnegative edge path costs.
  • other nodes may be added to the graph.
  • a change of the shortest path may occur.
  • an algorithm can take into account the possible future nodes and their properties.
  • An access is represented by a vertex (node).
  • the cost of access defines a path from one vertex to the next.
  • An algorithm for finding a near-optimal access order at very high clock rates, yet with low complexity, maintains a history of processed memory accesses and of pending memory accesses.
  • To select the next access the algorithm computes the cost of performing each of the pending accesses with relation to the history accesses.
  • the history length is chosen such that no timing effects remain between accesses older than the history and new accesses.
  • the algorithm maintains a window with pending accesses and can be implemented by hardware.
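The selection step described above can be sketched as a small Python model. This is purely illustrative, not the patented hardware implementation: the `pair_cost` rules, penalty values and history length are invented for the example, while the real costs would come from the memory's timing parameters.

```python
from collections import deque

def pair_cost(pending, processed):
    """Hypothetical waste-cycle cost of issuing `pending` after `processed`;
    the actual rules would come from the device's timing parameters."""
    if pending["bank"] == processed["bank"]:
        return 8   # assumed same-bank (row-cycle) penalty
    if pending["op"] != processed["op"]:
        return 4   # assumed read/write turnaround penalty
    return 0

def select_next(pending_window, history):
    """Greedy step: pick the pending access whose accumulated cost
    against the history of processed accesses is minimal."""
    return min(pending_window,
               key=lambda p: sum(pair_cost(p, h) for h in history))

# History is kept only as long as timing effects persist (here: 4 entries).
history = deque(maxlen=4)
history.append({"op": "wr", "bank": 0})
pending = [{"op": "rd", "bank": 0},
           {"op": "rd", "bank": 1},
           {"op": "wr", "bank": 0}]
nxt = select_next(pending, history)
# the read to bank 1 avoids both the same-bank and the turnaround penalty
```

Because the pending window and the history are both bounded, this comparison can be evaluated for all pairs in parallel, which is what makes a hardware implementation feasible.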
  • DRAM Dynamic Random Access Memory
  • SDRAM Synchronous Dynamic Random Access Memory
  • DDR Double Data Rate
  • DDR3 Double Data Rate Type 3
  • BL Burst Length
  • tRC Row-cycle time
  • tRTW Read-to-write delay time
  • tWTR Write-to-read delay time
  • tWR Write Recovery time
  • DSP Digital Signal Processor
  • ASIC application specific integrated circuit
  • the invention relates to a memory controller for controlling accesses to a memory, comprising: a history buffer configured to store processed access requests to the memory; a pending buffer configured to store pending access requests to the memory; and a selection unit configured to select an access request to be processed from the pending access requests, the selection being based on a cost function of the pending access requests and the processed access requests.
  • the cost function is based on timing parameters of the memory.
  • the cost function is configured to evaluate waste cycles between each of the pending access requests and a last processed one of the processed access requests.
  • the selection unit is configured to select a pending access request which minimizes the evaluated waste cycles.
  • the evaluating the waste cycles considers timing dependencies between the pending access requests and the processed access requests stored in the history buffer.
  • a length of the history buffer is such that the processed access requests stored in the history buffer have non-negligible timing dependencies with respect to the pending access requests.
  • an aging weight is assigned to each of the pending access requests in the pending buffer.
  • the aging weight indicates a length of stay of the associated pending access request in the pending buffer.
  • an aging weight assigned to an older pending access request is greater than or equal to an aging weight assigned to a younger pending access request.
  • a cost provided by the cost function for a pending access request is reduced by the aging weight assigned to the pending access request.
  • the memory is a DRAM.
  • the memory is a DDR SDRAM, in particular a DDR3 SDRAM.
  • the timing parameters are one or more of the following: a row-to-row activation delay time, a row-cycle time, a write recovery time, a read-to-write delay time, a write-to-read delay time, a burst length and a clock frequency.
  • the memory comprises a number of M memory banks;
  • the selection unit is configured to select the access request to be processed from N pending read access requests and one pending write access request; and the N pending read access requests are directed to different ones of the memory banks.
  • the number N of pending read access requests is smaller than or equal to the number M of memory banks.
  • the N pending read access requests are the oldest read access requests stored in the pending buffer of size K with respect to the memory banks they are directed to, and the one pending write access request is the oldest write access request stored in the pending buffer.
  • the number N of the pending read access requests plus the number 1 of the pending write access request is smaller than or equal to the size K of the pending buffer.
  • the invention relates to a method for controlling accesses to a memory, comprising: storing processed access requests to the memory in a history buffer; storing pending access requests to the memory in a pending buffer; and selecting an access request to be processed from the pending access requests, the selection being based on a cost function of the pending access requests and the processed access requests.
  • the memory comprises a number of M memory banks and the method further comprises:
  • the N pending read access requests, from which the access request to be processed is selected are the oldest read access requests stored in the pending buffer with respect to the memory banks they are directed to; and the one write access request, from which the access request to be processed is selected, is the oldest write access request stored in the pending buffer.
  • the selecting the one pending write access request comprises: assigning memory banks from the M memory banks to recommended memory banks such that the recommended memory banks have less entries than memory banks not assigned to the recommended memory banks, wherein memory banks, to which a write access request stored in the history buffer was directed, are not assigned to the recommended memory banks; and selecting the one pending write access request from pending write access requests directed to one of the recommended memory banks.
  • the one pending write access request is selected from pending write access requests directed to the one of the recommended memory banks which has the lowest number of entries; if no memory bank is assigned to the recommended memory banks, the one pending write access request is selected from pending write access requests directed to an arbitrary one of the M memory banks.
  • the invention relates to a computer program for implementing a method according to the second aspect as such or according to any of the implementation forms of the second aspect. According to aspects of the invention, DRAM performance is increased by an innovative controller design.
  • the controller design is flexible: by applying new sets of rules, the same design may be adapted and optimized for different memory devices and different technologies through a simple configuration.
  • the algorithm is configured by a set of rules, e.g. defined by the timing parameters, for the memory device it interacts with.
  • the presented technique optimizes the sequence of memory accesses and thus increases the memory performance. DRAM performance is enhanced by an innovative controller mechanism as described below with respect to the drawings.
  • the memory controller according to the first aspect can be implemented in every design using DRAM memory. It enhances system performance or even reduces the system cost by relaxing the memory timing requirement. As such the system designer may be able to use lower speed and lower cost memory to achieve a required target bandwidth.
  • the methods described herein may be implemented as hardware or as software in a memory controller, in a micro-controller, in a Digital Signal Processor (DSP) or in any other side- processor or as hardware circuit within an application specific integrated circuit (ASIC).
  • the invention can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations thereof.
  • Fig. 1 shows a memory controller 100 according to an implementation form
  • Fig. 2 shows a data processing system 200 with a memory controller according to an implementation form
  • Fig. 3 shows a memory controller 300 according to an implementation form
  • Fig. 4 shows a diagram 400 of an aging weight function according to an implementation form
  • Fig. 5 shows a memory controller 500 according to an implementation form
  • Fig. 6 shows a performance diagram 600 of a memory controller according to an implementation form
  • Fig. 7 shows a performance diagram 700 of a memory controller according to an implementation form
  • Fig. 8 shows a schematic diagram of a method 800 for controlling accesses to a memory according to an implementation form.
  • Fig. 1 shows a memory controller 100 according to an implementation form.
  • the memory controller 100 acts as the glue logic that connects processors, high speed input-output devices and the memory system (not shown) to each other.
  • the memory controller 100 can exist as a separate device or as part of the processor and integrated into the processor package; the function of the memory controller 100 remains essentially the same in either case.
  • the primary function of the memory controller 100 is to manage the flow of data between the processors, input-output devices and the memory system, correctly and efficiently.
  • the function of the memory controller 100 is to manage the flow of data into and out of the memory devices.
  • the memory access protocol and timing parameters define the interface protocol of the memory controller 100.
  • the performance characteristics of the memory controller 100 depend on the implementation specifics of the micro-architecture as described in the following.
  • the memory controller 100 controls accesses to a memory (not shown) and comprises a history buffer 101, a pending buffer 103 and a selection unit 105.
  • the history buffer stores processed access requests 111 to the memory, i.e. past access requests which have been processed by giving access to the memory.
  • the pending buffer 103 stores pending access requests 109 to the memory, i.e. access requests that are waiting for being allowed to access the memory.
  • the selection unit 105 is the unit which gives allowance for access to the memory.
  • the selection unit 105 selects an access request 107 to be processed from the pending access requests 109, i.e. from the access requests waiting for an allowance to access the memory.
  • the access request 107 is selected from an integer number of N+1 pending access requests directed to different memory banks of a memory comprising an integer number of M memory banks.
  • the number of pending access requests N is smaller than or equal to the number of memory banks M.
  • the N+1 pending access requests from which the access request 107 is selected are the N+1 oldest pending access requests stored in the pending buffer 103.
  • the pending buffer 103 has a size of K buffer entries, wherein the size K is greater than the number N+1 of pending access requests.
  • Access requests may comprise read access requests and write access requests.
  • the access request comprises a number of up to N read access requests directed to different memory banks and one single write access request directed to one memory bank. A detailed implementation example is described with respect to Figure 5 below.
  • the selection or allowance is based on a cost function of the pending access requests 109 and the processed access requests 111. For each of the pending access requests 109 stored in the pending buffer 103 a cost is calculated with respect to each of the processed access requests 111 stored in the history buffer 101.
  • the selection unit 105 selects the pending access request 107 which has the minimum cost for being allowed to access the memory.
  • the cost may be based on timing parameters of the memory, e.g. by evaluating waste cycles between each of the pending access requests 109 and a last processed one 107 of the processed access requests 111.
  • Waste cycles are the processor cycles which cannot be used for processing because a timing dependency has to be observed such that the processor must wait for a specific number of processor cycles.
  • the selection unit 105 may select the pending access request which minimizes the evaluated waste cycles.
  • the waste cycles may be evaluated by considering timing dependencies between the pending access requests 109 and the processed access requests 111 stored in the history buffer 101.
  • the length of the history buffer 101 may be chosen such that the processed access requests 111 stored in the history buffer 101 have non-negligible timing dependencies with respect to the pending access requests 109.
  • Non-negligible timing dependencies produce different cost functions in the selection unit 105 because waste cycles between a pending access request 109 and the last processed one 107 depend on the timing dependency with respect to earlier processed access requests 111.
  • the length of the history buffer 101 is chosen such that only processed access requests 111 are stored therein which have a (non-negligible) timing relation to the pending access requests 109. Thus, the length of the history buffer 101 may be limited.
  • the history buffer 101 and the pending buffer 103 may be FIFO (First In First Out) buffers.
  • Fig. 2 shows a data processing system 200 with a memory controller 100 according to an implementation form; the memory controller 100 corresponds to the memory controller 100 depicted in Fig. 1.
  • a processor 202 in a computer needs memory storage 210 to process its data.
  • Data 204 (from hard drives, CD-ROM, DVD drives, Flash cards, peripheral cards, USB, routers, etc.) is stored in memory 210 first, before being delivered to the processor 202.
  • Memory 210 is divided in blocks of memory, called memory banks or memory modules, e.g. a number of M memory banks 210_1, 210_2, ..., 210_M as depicted in Fig. 2.
  • M is any natural number.
  • Data processing in a computer system can be represented by a funnel system as depicted in Fig. 2.
  • Data 204 is filled into the funnel which outputs data 204 to the memory 210; the funnel then "channels" the data 204 through its pipe to the processor's 202 input:
  • a "traffic" controller located in the funnel's pipe.
  • the memory controller 100 that handles all data transfers involving the memory modules 210_1, 210_2, ..., 210_M and the processor 202.
  • the memory controller 100 manages all movement of data between the processor 202 and the memory modules 210_1, 210_2, ..., 210_M.
  • Data 204 is sent to the memory controller 100 which may be part of a computer motherboard's "chipset".
  • the memory controller 100 is like a traffic signal that regulates data transfer either to memory modules 210_1, 210_2, ..., 210_M for storage, or to the processor 202 for data manipulation or "crunching".
  • Data 204 moves through the funnel's pipe in one direction at a time.
  • the memory controller 100 acts like a traffic signal that directs the movement of data 204 across the memory channel. For example, data arriving to the memory controller 100 is first stored in the memory modules 210_1, 210_2, ..., 210_M, then is re-read and finally transferred to the processor 202.
  • the memory controller 100 sends data 204 as fast as the processor 202 can receive it and stores it back into the memory modules 210_1, 210_2, ..., 210_M as fast as the processor 202 can "pump" the data 204 out.
  • the memory controller 100 reaches its peak efficiency when the data throughput from the processor 202 matches the throughput of the memory modules 210_1, 210_2, ..., 210_M.
  • a memory controller corresponding to the memory controller 100 as described with respect to Fig. 1 controls the accesses to the memory banks 210_1, 210_2, ..., 210_M by a cost function weighting the accesses to the memory with respect to timing parameters of the memory 210.
  • An access offering the lowest cost, e.g. with respect to waste cycles, is allowed to access the memory 210 thereby increasing the speed of the memory accesses and reducing number of events in which the funnel is overfilled with data.
  • the memory 210 is a DRAM and the memory controller 100 is a DRAM controller. In an implementation form, the memory 210 is a DDR SDRAM and the memory controller 100 is a DDR SDRAM controller. In an implementation form, the memory 210 is a DDR3 SDRAM and the memory controller 100 is a DDR3 SDRAM controller.
  • the memory 210 is a DDR3 SDRAM and the timing parameters are one or more of the following: a row-to-row activation delay time tRRD, a row-cycle time tRC, a write recovery time tWR, a read-to-write delay time tRTW, a write-to-read delay time tWTR, a burst length BL and a clock frequency. Exemplary values for these timing parameters are given in Table 1. Table 1: DDR3-1600 timing
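Since Table 1 itself is not reproduced in this text, the sketch below uses typical JEDEC-style DDR3-1600 figures purely for illustration; the numeric values are assumptions, not the patent's table. It shows how the nanosecond-denominated timing parameters translate into controller clock cycles, which is the unit in which waste cycles are counted.

```python
import math

# Illustrative DDR3-1600-like parameters (NOT the patent's Table 1 values).
DDR3_1600 = {
    "clock_mhz": 800,  # controller/IO clock; DDR transfers data on both edges
    "BL": 8,           # burst length: transfers per column command
    "tRRD_ns": 6.0,    # row-to-row activation delay
    "tRC_ns": 48.75,   # row-cycle time
    "tWR_ns": 15.0,    # write recovery time
    "tWTR_ns": 7.5,    # write-to-read delay
}

def ns_to_cycles(ns, clock_mhz):
    """Round a nanosecond constraint up to whole controller clock cycles."""
    period_ns = 1000.0 / clock_mhz
    return math.ceil(ns / period_ns)

twr_cycles = ns_to_cycles(DDR3_1600["tWR_ns"], DDR3_1600["clock_mhz"])
# 15 ns at a 1.25 ns clock period -> 12 cycles
```

A cost function driven by such a table is what makes the same controller design adaptable to different devices: only the parameter set changes, not the selection logic.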
  • Fig. 3 shows a memory controller 300 according to an implementation form.
  • the memory controller 300 controls accesses to a memory (not shown) and comprises a history buffer 301 or a history access register, a pending buffer 303 or a pending access register and a selection unit 305.
  • the history buffer 301 stores processed access requests A0, A1, A2, A3, A4, A5, A6 and A7 to the memory, i.e. past access requests which have been processed by giving access to the memory.
  • the pending buffer 303 stores pending access requests RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7 and WR to the memory, i.e. access requests that are waiting for being allowed to access the memory.
  • the selection unit 305 is the unit which gives allowance for access to the memory.
  • the selection unit 305 selects a best access request 307 to be processed from the pending access requests RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7 and WR.
  • the selection or allowance is based on a cost function of the pending access requests RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7 and WR and the processed access requests A0, A1, A2, A3, A4, A5, A6 and A7.
  • for each of the pending access requests RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7 and WR, a cost is calculated with respect to each of the processed access requests A0, A1, A2, A3, A4, A5, A6 and A7 stored in the history buffer 301.
  • the logic that selects the best access from the window of pending accesses RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7 and WR is a decision matrix 315.
  • the matrix 315 calculates the cost between all couple combinations 315_y_x of pending RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7, WR and history A0, A1, A2, A3, A4, A5, A6, A7 accesses. This is shown in Fig. 3.
  • the cost is calculated by accumulating the costs of the write access request WR with respect to the processed access request A0 determined by the couple combination 315_WR_A0, the write access request WR with respect to the processed access request A1 determined by the couple combination 315_WR_A1, and so on for the remaining processed access requests.
  • the costs determined by the respective couple combinations 315_y_x (y representing the line inputs RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7, WR and x representing the column inputs A0, A1, A2, A3, A4, A5, A6, A7 of matrix 315) may be determined by evaluating waste cycles between the respective pending access request RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7, WR and the respective processed access request A0, A1, A2, A3, A4, A5, A6, A7.
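The decision matrix 315 can be modelled as a small cost grid. This is an illustrative sketch only: `waste_cycles` is a stand-in for the timing-parameter evaluation, and its penalty values are invented; in Fig. 3 the rows would be RD0..RD7 and WR and the columns A0..A7.

```python
def waste_cycles(pending, processed):
    """Hypothetical pairwise cost between a pending (op, bank) tuple and a
    processed one; the real rules come from the memory's timing parameters."""
    cost = 0
    if pending[1] == processed[1]:   # same bank -> assumed row-cycle penalty
        cost += 6
    if pending[0] != processed[0]:   # assumed read/write turnaround penalty
        cost += 3
    return cost

def decision_matrix(pending_accesses, history_accesses):
    """Build the |pending| x |history| cost grid and pick the row with the
    smallest accumulated cost, mirroring decision matrix 315."""
    matrix = [[waste_cycles(p, h) for h in history_accesses]
              for p in pending_accesses]
    totals = [sum(row) for row in matrix]
    return matrix, totals.index(min(totals))

pending = [("rd", 0), ("rd", 1), ("wr", 2)]   # would be RD0..RD7, WR in Fig. 3
history = [("rd", 0), ("wr", 1)]              # would be A0..A7
matrix, best = decision_matrix(pending, history)
# best == 2: the write to bank 2 incurs the least accumulated waste cycles
```

Because every matrix element depends only on one (pending, history) pair, all 72 decision elements of the full 9-by-8 grid can be evaluated concurrently in hardware.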
  • the memory comprises a number of M memory banks and the selection unit 305 selects the access request to be processed from N pending read access requests and one pending write access request, wherein the N pending read access requests are directed to different ones of the memory banks.
  • Figure 3 illustrates a number of 8 read access requests RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7 and one write access request WR.
  • the number N may be any natural number.
  • the number N of pending read access requests is smaller than or equal to the number M of memory banks.
  • the number N of the pending read access requests plus the number 1 of the pending write access request is smaller than or equal to the size K of the pending buffer.
  • read access request RD0 is directed to the first memory bank 210
  • read access request RD1 is directed to the second memory bank 211
  • read access request RD2 is directed to the third memory bank 212
  • read access request RD3 is directed to the fourth memory bank 213
  • read access request RD4 is directed to the fifth memory bank 214
  • read access request RD5 is directed to the sixth memory bank 215
  • read access request RD6 is directed to the seventh memory bank 216
  • read access request RD7 is directed to the eighth memory bank 217.
  • the eight read access requests RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7 are requesting parallel access to different memory banks of a memory having at least eight memory banks and the one write access request WR is requesting access to a best memory bank, e.g. a memory bank having the least entries.
  • the pending buffer 303 is sufficiently large to store the eight read access requests RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7, i.e. the pending buffer 303 has a size of at least 8 entries in this implementation form.
  • the N pending read access requests are the oldest read access requests stored in the pending buffer 303 of size K with respect to the memory banks they are directed to and the one pending write access request is the oldest write access request stored in the pending buffer 303.
  • Fig. 4 shows a diagram 400 of an aging weight function according to an implementation form.
  • the aging weight function assigns a weight 403 to access ages 405 of the pending access requests 109.
  • the weight 403 is zero for access ages 405 ranging from zero to a predetermined access age 401 and increases linearly for access ages 405 greater than the predetermined access age 401.
  • other aging weight functions are possible, for example an exponentially increasing weight, or a predetermined access age 401 of zero such that the weight 403 increases from the beginning.
  • the memory controller 100, 300 uses a greedy cost function as it always prefers the access with lowest cost.
  • an aging weight 403 is implemented in the memory controller 100, 300.
  • the weight 403 is a monotonic function whose value grows the longer an access remains in the pending window 103.
  • the selection unit 105, 305 subtracts the aging value 403 from the cost causing the memory controller 100, 300 to prefer older accesses.
  • An example for an aging function is shown in Fig. 4.
  • An aging weight 403 is assigned to each of the pending access requests 109 in the pending buffer 103.
  • the aging weight 403 indicates a length of stay 405 or an access age of the associated pending access request 109 in the pending buffer 103.
  • An aging weight 403 assigned to an older pending access request is greater than or equal to an aging weight 403 assigned to a younger pending access request.
  • a cost provided by the cost function for a pending access request 109 is reduced by the aging weight 403 assigned to the pending access request 109.
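The aging mechanism of Fig. 4 can be sketched as follows. The threshold and slope are illustrative, configurable assumptions; the patent only requires the weight to be zero up to a predetermined access age 401 and then to grow.

```python
def aging_weight(access_age, threshold=16, slope=0.5):
    """Fig. 4 style weight: zero up to `threshold` (the predetermined access
    age 401), then linear growth. Threshold and slope values are assumed."""
    if access_age <= threshold:
        return 0.0
    return slope * (access_age - threshold)

def effective_cost(raw_cost, access_age):
    """The selection unit subtracts the aging weight from the cost, so a
    long-waiting access becomes progressively more attractive."""
    return raw_cost - aging_weight(access_age)
```

Because the weight is monotonically non-decreasing in age, an older pending access never receives a smaller bonus than a younger one; this counteracts the starvation risk of the otherwise greedy lowest-cost selection.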
  • Fig. 5 shows a memory controller 500 according to an implementation form.
  • the memory controller 500 may correspond to the memory controller 300 as described with respect to Fig. 3 or to the memory controller 100 as described with respect to Fig. 1.
  • the memory controller 500 comprises a pending buffer 503 of size K, where K is any integer number, which pending buffer 503 stores a number of K pending access requests 509_1, 509_2, 509_3, ..., 509_K.
  • the memory controller 500 further comprises a number of N+1 age selectors 510_1, 510_2, ..., 510_N, 510_N+1, where N is any integer number smaller than or equal to K.
  • each of the age selectors 510_1, 510_2, ..., 510_N, 510_N+1 selects a pending access request from the K pending access requests 509_1, 509_2, 509_3, ..., 509_K stored in the pending buffer 503.
  • the first age selector 510_1 selects a read access request RD0 directed to a first memory bank from the K pending access requests 509_1, 509_2, 509_3, ..., 509_K;
  • the second age selector 510_2 selects a read access request RD1 directed to a second memory bank from the K pending access requests 509_1, 509_2, 509_3, ..., 509_K;
  • the third age selector 510_3 selects a read access request RD2 directed to a third memory bank from the K pending access requests 509_1, 509_2, 509_3, ..., 509_K, and so on until the Nth (in this implementation form the eighth) age selector 510_N selects a read access request RD7 directed to an eighth memory bank from the K pending access requests 509_1, 509_2, 509_3, ..., 509_K.
  • the N+1th (in this implementation form the ninth) age selector 510_N+1 selects a write access request WR directed to a best memory bank, e.g. a memory bank having the most free entries or an arbitrary memory bank, from the K pending access requests 509_1, 509_2, 509_3, ..., 509_K according to the mechanism described below.
  • the oldest read access to each memory bank stored in the pending buffer 503 and the oldest write access stored in the pending buffer 503 are selected.
  • the age selectors 510_1, 510_2, ..., 510_N, 510_N+1 are configured to select the oldest pending access request directed to the respective memory bank 1 to N from the K pending access requests 509_1, 509_2, 509_3, ..., 509_K stored in the pending buffer 503.
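The age-selector stage can be sketched in a few lines. This is an illustrative model, not the hardware: the dictionary-based request format is an assumption, and a single pass over a FIFO-ordered buffer stands in for the N+1 parallel selectors.

```python
def age_select(pending_buffer):
    """From a FIFO-ordered pending buffer (oldest entries first), keep the
    oldest read per memory bank plus the oldest write overall — one result
    per age selector 510_1 .. 510_N+1."""
    oldest_read_per_bank = {}
    oldest_write = None
    for req in pending_buffer:
        if req["op"] == "rd":
            # setdefault keeps only the first (i.e. oldest) read per bank
            oldest_read_per_bank.setdefault(req["bank"], req)
        elif oldest_write is None:
            oldest_write = req
    return list(oldest_read_per_bank.values()), oldest_write

pending = [{"op": "rd", "bank": 0}, {"op": "wr", "bank": 2},
           {"op": "rd", "bank": 0}, {"op": "rd", "bank": 1},
           {"op": "wr", "bank": 3}]
reads, write = age_select(pending)
# reads holds the oldest read to bank 0 and to bank 1; write is the first WR
```

Pre-filtering the K-entry buffer down to at most N reads plus one write is what shrinks the decision matrix from K rows to N+1 rows.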
  • the write access request WR is selected according to the best memory bank.
  • the mechanism can work in the following way:
  • the memory controller 500 knows how many entries are used in each memory bank 210_1, 210_2, ..., 210_M and defines a list of memory banks that have fewer entries than others. This list is known as the recommended banks and can hold a number of 1 to M memory banks.
  • the recommended list is matched to the history buffer 101 and any memory bank that resides in both, i.e. the recommended list and the history buffer 101 , is removed from the recommended list.
  • One of the memory banks in the resulting list (of phase [2]) is selected to be the bank to which the write transaction will be written.
  • the write transaction is processed the same way as the N read transactions.
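The three phases above can be modelled as follows. The "fewer entries than average" threshold and the fixed fallback bank are assumptions made for the sketch; the text only requires the recommended banks to have fewer entries than the others and permits an arbitrary bank when the list is empty.

```python
def pick_write_bank(entries_per_bank, history_banks):
    """Phases [1]-[3] of the write-bank selection mechanism (rules assumed)."""
    avg = sum(entries_per_bank) / len(entries_per_bank)
    # [1] recommended list: banks with fewer queued entries than average
    recommended = [b for b, n in enumerate(entries_per_bank) if n < avg]
    # [2] remove any bank that also appears in the history buffer
    recommended = [b for b in recommended if b not in history_banks]
    if not recommended:
        return 0  # fallback to an arbitrary bank (here: bank 0)
    # [3] choose the remaining recommended bank with the fewest entries
    return min(recommended, key=lambda b: entries_per_bank[b])

bank = pick_write_bank([5, 1, 2, 7], history_banks={1})
# bank 1 is emptiest but was recently accessed, so bank 2 is chosen
```

Excluding recently accessed banks in phase [2] keeps the write away from banks whose timing constraints (e.g. row-cycle time) are still in effect.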
  • the memory controller 500 uses an optimized window implementation.
  • A memory with 8 banks allows a maximum of 8 different parallel accesses. For example, an effective window with 30 positions should select one read access from each bank and one write access; the actual write bank is selected later from the available free banks. The selected read access should be the oldest of its bank.
  • the pending access requests RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7, WR selected by the age selectors 510_1, 510_2, ..., 510_N, 510_N+1 correspond to the pending access requests RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7, WR illustrated in Fig. 3 which are input to the decision matrix 315.
  • the decision matrix 315 may have a number of eight times nine, i.e. 72 decision elements 315_y_x.
  • the memory controller 500 has a reduced complexity by minimizing the window size in the pending buffer 103 and can be efficiently implemented.
  • the memory controller 500 applies the strategy of write caching.
  • the basic idea for write caching is that write requests are typically non-critical in terms of performance, but read requests may be critical. As a result, it is typically desirable to cache write requests and allow read requests to proceed ahead. Furthermore, DRAM devices are typically poorly designed to support back-to-back read and write requests.
  • Due to the differences in the direction of data flow between read and write commands, significant overheads exist when column read and write commands are pipelined back-to-back.
  • the strategy of write caching allows read requests that may be critical to application performance to proceed ahead of write requests, and the write caching strategy can also reduce read-write overheads when it is combined with a strategy to send multiple write requests to the memory system consecutively.
  • the memory controller 500 utilizes the write caching strategy thereby prioritizing read requests over write requests.
  • Fig. 6 shows a performance diagram 600 of a memory controller according to an implementation form.
  • the performance diagram 600 illustrates normalized bandwidth over packet length, wherein packet lengths between 64 and 2007 are simulated.
  • the memory controller 100 was simulated with various parameters and under random access scenario.
  • the graphs 601 and 602 depicted in Fig. 6 illustrate the performance results with 601 and without 602 optimization for various length packets.
  • a reference memory system containing 5 x 16-bit DDR3-1600 devices running at 800MHz is defined.
  • Then the bandwidth required by packets in all lengths with 601 and without 602 optimization is simulated and normalized to the reference system.
  • the graph 601 illustrates that for small packets the normalized bandwidth is 2.1. It means that for small packets, 2.1 times the reference memory bandwidth is consumed.
  • the bandwidth requirements from the optimized design 601 are low compared to a design without optimization 602.
  • the gap between the two lines of the graphs 601 and 602 is the bandwidth saved by the optimized design per packet length.
  • Fig. 7 shows a performance diagram 700 of a memory controller according to an implementation form.
  • the window length for the optimized design is fixed.
  • the graph illustrated in Fig. 7 shows the optimized design efficiency for different window lengths, e.g. value K described herein, in the range between 1 and 37.
  • the design efficiency increases with increasing window size: when the window size is 1 (nothing to select), a design efficiency of 40% can be reached; for a window size of 8, a design efficiency of 82% can be reached; and for a window size of 37, a design efficiency of 97% can be reached.
  • Fig. 8 shows a schematic diagram of a method 800 for controlling accesses to a memory according to an implementation form.
  • the method 800 comprises storing 801 processed access requests to the memory in a history buffer; storing 803 pending access requests to the memory in a pending buffer; and selecting 805 an access request to be processed from the pending access requests, wherein the selection is based on a cost function of the pending access requests and the processed access requests.
  • the memory may comprise a number of M memory banks and the method may comprise a selecting of the access request to be processed from N pending read access requests and one pending write access request, wherein the N pending read access requests are directed to different ones of the memory banks.
  • the N pending read access requests, from which the access request to be processed is selected, may be the oldest read access requests stored in the pending buffer with respect to the memory banks they are directed to.
  • the one write access request, from which the access request to be processed is selected may be the oldest write access request stored in the pending buffer.
  • the selecting the one pending write access request may comprise an assigning of memory banks from the M memory banks to recommended memory banks such that the recommended memory banks have fewer entries than memory banks not assigned to the recommended memory banks. Memory banks, to which a write access request stored in the history buffer was directed, are not assigned to the recommended memory banks. The one pending write access request may be selected from pending write access requests directed to one of the recommended memory banks.
  • the one pending write access request can be selected from pending write access requests directed to the one of the recommended memory banks which has the lowest number of entries. If no memory bank is assigned to the recommended memory banks, the one pending write access request is selected from pending write access requests directed to an arbitrary one of the M memory banks.
  • the write is selected according to the best bank.
  • the mechanism works in the following way:
  • the memory controller 100 knows how many entries are used in each memory bank 210_1, 210_2, 210_M and defines a list of memory banks that have fewer entries than the others. This list is known as the recommended banks and can hold between 1 and M memory banks.
  • the recommended list is matched to the history buffer 101 and any memory bank that resides in both, i.e. in the recommended list and in the history buffer 101, is removed from the recommended list.
  • One of the memory banks in the resulting list (of phase [2]) is selected to be the bank to which the write transaction will be written.
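The three phases above can be sketched as follows. The function and data-structure names are illustrative assumptions, not part of the patent text; in particular, the "less entries than others" criterion of phase [1] is not specified precisely, so a strictly-below-maximum rule is assumed here.

```python
def select_write_bank(entries_per_bank, history_write_banks):
    """entries_per_bank: list with one entry count per memory bank.
    history_write_banks: set of banks targeted by write accesses that
    are still in the history buffer."""
    # Phase [1]: recommend banks with fewer entries than the others
    # (assumed here: strictly fewer than the fullest bank).
    threshold = max(entries_per_bank)
    recommended = [b for b, n in enumerate(entries_per_bank) if n < threshold]
    # Phase [2]: drop any bank that also resides in the history buffer.
    recommended = [b for b in recommended if b not in history_write_banks]
    # Phase [3]: among the survivors, take the bank with the fewest
    # entries; if the list is empty, fall back to an arbitrary bank.
    if recommended:
        return min(recommended, key=lambda b: entries_per_bank[b])
    return 0  # arbitrary fallback bank

print(select_write_bank([3, 1, 2, 3], {1}))  # bank 2: bank 1 is in the history
```

The fallback in phase [3] corresponds to the "arbitrary one of the M memory banks" case of the fourth implementation form of the second aspect.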
  • the present disclosure also supports a computer program product including computer executable code or computer executable instructions that, when executed, causes at least one computer to execute the performing and computing steps described herein.
  • the present disclosure also supports a system configured to execute the performing and computing steps described herein.

Abstract

The invention relates to a memory controller (100) for controlling accesses to a memory, comprising a history buffer (101) configured to store processed access requests (111) to the memory; a pending buffer (103) configured to store pending access requests (109) to the memory; and a selection unit (105) configured to select an access request (107) to be processed from the pending access requests (109), the selection being based on a cost function of the pending access requests (109) and the processed access requests (111).

Description

DESCRIPTION
Memory controller and method for controlling accesses to a memory BACKGROUND OF THE INVENTION
The present invention relates to a memory controller and to a method for controlling accesses to a memory.

Traditionally, the GHz (Gigahertz) number, indicating processor speed, has always been one of the key factors in any computer system. As the GHz number increases, so does the performance of the computer. The speed of system memory is another determining factor for computing performance. The most common form of memory installed today is Dynamic Random Access Memory (DRAM). A computer's memory is a temporary storage area for data that needs to be available for programs to run efficiently. The faster the memory can provide data, the more work the processor can perform. Increased data "throughput" translates directly into better system performance. DRAM devices are widely used due to their cost performance, high density and low power. Since 1997, there have been several major transitions in DRAM memory speed and technology. SDRAM started with a memory speed of 66 MHz (PC66) and progressed to 100 MHz (PC100) and then to 133 MHz (PC133). In 2002, standard SDRAM began to be replaced with the faster Double Data Rate (DDR) SDRAM memory technology. DDR memory started with 200 MHz DDR (or DDR200) and is now available in DDR266, DDR333 and DDR400 speeds for mainstream PCs. In the past, memory speeds were able to keep up with the processor's requirements. However, a point was reached where the processor's ability to process data accelerated faster than current memory technologies could support, and memory became a major limitation in system performance. Simply put, memory speeds could no longer keep up with advances in processor speeds and data throughput. The DRAM continues to be the performance bottleneck in system design, and the memory controller is one weak link in this chain. A new concept to get more data to the processor in mainstream computers was needed - without relying solely on memory speed.
Further to that, applications like networking have a high percentage of random accesses, where DRAM capability is at its worst. Some applications reduce the amount of randomness using cache architectures. However, the timing cost of random access may tend to increase so that even with low randomness the performance impact may not be tolerable. Applications where random accesses cannot be controlled suffer the most from performance degradation.
Hence there is a demand for improving memory access in a computer system.
SUMMARY OF THE INVENTION
It is the object of the invention to provide a concept for improving memory access in a computer system.
This object is achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.
The invention is based on the finding that the time it takes to perform n random memory accesses varies depending on how the accesses are ordered. The ordering can be described as a single-source shortest path problem for a graph with nonnegative edge path costs. While progressing through the accesses, i.e. via the graph nodes, other nodes may be added to the graph; thus, the shortest path may change.
Therefore, an algorithm can take into account the possible future nodes and their properties. An access is represented by a vertex (node). The cost of an access defines a path from one vertex to the next. An algorithm for finding the optimum access order at very high clock rates with near-optimal results but low complexity maintains a history of memory accesses and pending memory accesses. To select the next access, the algorithm computes the cost of performing each of the pending accesses in relation to the history accesses. The history length is chosen such that accesses older than the history have no timing effects on new accesses. The algorithm maintains a window with pending accesses and can be implemented in hardware.
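A minimal sketch of this greedy selection follows. The pair-wise cost function here is a toy stand-in (same-bank accesses are penalized by an invented delay); in the invention the cost is defined by the memory device's timing rules.

```python
def next_access(pending, history, cost):
    """Greedy choice: pick the pending access whose accumulated cost
    against the recent history accesses is minimal."""
    return min(pending, key=lambda p: sum(cost(p, h) for h in history))

# Toy cost model (assumed for illustration): an access to the same bank
# as a recent access costs 4 cycles, any other access costs nothing.
def toy_cost(p, h):
    return 4 if p["bank"] == h["bank"] else 0

pending = [{"id": "RD0", "bank": 0}, {"id": "RD1", "bank": 1}]
history = [{"id": "A0", "bank": 0}]
print(next_access(pending, history, toy_cost)["id"])  # RD1
```

RD1 wins because RD0 would collide with the bank of the history access A0; the selected access would then be appended to the history window before the next selection round.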
In order to describe the invention in detail, the following terms, abbreviations and notations will be used:
DRAM: Dynamic Random Access Memory,
SDRAM: Synchronous Dynamic Random Access Memory, DDR: Double Data Rate, DDR3: Double Data Rate Type 3,
PC: Personal Computer, tRRD: Row-to-row activation delay,
BL: Burst Length, tRC: Row-cycle time, tRTW: Read-to-write delay time, tWTR: Write-to-read delay time, tWR: Write Recovery time,
DSP: Digital Signal Processor, ASIC: application specific integrated circuit
According to a first aspect, the invention relates to a memory controller for controlling accesses to a memory, comprising: a history buffer configured to store processed access requests to the memory; a pending buffer configured to store pending access requests to the memory; and a selection unit configured to select an access request to be processed from the pending access requests, the selection being based on a cost function of the pending access requests and the processed access requests.
In a first possible implementation form of the memory controller according to the first aspect, the cost function is based on timing parameters of the memory.
In a second possible implementation form of the memory controller according to the first aspect as such or according to the first implementation form of the first aspect, the cost function is configured to evaluate waste cycles between each of the pending access requests and a last processed one of the processed access requests.

In a third possible implementation form of the memory controller according to the second implementation form of the first aspect, the selection unit is configured to select a pending access request which minimizes the evaluated waste cycles.

In a fourth possible implementation form of the memory controller according to the second or according to the third implementation form of the first aspect, the evaluating of the waste cycles considers timing dependencies between the pending access requests and the processed access requests stored in the history buffer.

In a fifth possible implementation form of the memory controller according to the first aspect as such or according to any of the previous implementation forms of the first aspect, a length of the history buffer is such that the processed access requests stored in the history buffer have non-negligible timing dependencies with respect to the pending access requests.

In a sixth possible implementation form of the memory controller according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, an aging weight is assigned to each of the pending access requests in the pending buffer.
In a seventh possible implementation form of the memory controller according to the sixth implementation form of the first aspect, the aging weight indicates a length of stay of the associated pending access request in the pending buffer.
In an eighth possible implementation form of the memory controller according to the sixth or according to the seventh implementation form of the first aspect, an aging weight assigned to an older pending access request is greater than or equal to an aging weight assigned to a younger pending access request.
In a ninth possible implementation form of the memory controller according to one of the sixth to the eighth implementation forms of the first aspect, a cost provided by the cost function for a pending access request is reduced by the aging weight assigned to the pending access request.
In a tenth possible implementation form of the memory controller according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the memory is a DRAM.

In an eleventh possible implementation form of the memory controller according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the memory is a DDR SDRAM, in particular a DDR3 SDRAM.

In a twelfth possible implementation form of the memory controller according to the eleventh implementation form of the first aspect, the timing parameters are one or more of the following: a row-to-row activation delay time, a row-cycle time, a write recovery time, a read-to-write delay time, a write-to-read delay time, a burst length and a clock frequency.

In a thirteenth possible implementation form of the memory controller according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the memory comprises a number of M memory banks;
the selection unit is configured to select the access request to be processed from N pending read access requests and one pending write access request; and the N pending read access requests are directed to different ones of the memory banks.
In a fourteenth possible implementation form of the memory controller according to the thirteenth implementation form of the first aspect, the number N of pending read access requests is smaller than or equal to the number M of memory banks.
In a fifteenth possible implementation form of the memory controller according to the thirteenth implementation form or according to the fourteenth implementation form of the first aspect, the N pending read access requests are the oldest read access requests stored in the pending buffer of size K with respect to the memory banks they are directed to, and the one pending write access request is the oldest write access request stored in the pending buffer.
In a sixteenth possible implementation form of the memory controller according to the fifteenth implementation form of the first aspect, the number N of the pending read access requests plus the number 1 of the pending write access request is smaller than or equal to the size K of the pending buffer.
According to a second aspect, the invention relates to a method for controlling accesses to a memory, comprising: storing processed access requests to the memory in a history buffer; storing pending access requests to the memory in a pending buffer; and selecting an access request to be processed from the pending access requests, the selection being based on a cost function of the pending access requests and the processed access requests.
In a first possible implementation form of the method according to the second aspect, the memory comprises a number of M memory banks and the method further comprises:
selecting the access request to be processed from N pending read access requests and one pending write access request, wherein the N pending read access requests are directed to different ones of the memory banks.

In a second possible implementation form of the method according to the first implementation form of the second aspect, the N pending read access requests, from which the access request to be processed is selected, are the oldest read access requests stored in the pending buffer with respect to the memory banks they are directed to; and the one write access request, from which the access request to be processed is selected, is the oldest write access request stored in the pending buffer.
In a third possible implementation form of the method according to the first or according to the second implementation form of the second aspect, the selecting the one pending write access request comprises: assigning memory banks from the M memory banks to recommended memory banks such that the recommended memory banks have less entries than memory banks not assigned to the recommended memory banks, wherein memory banks, to which a write access request stored in the history buffer was directed, are not assigned to the recommended memory banks; and selecting the one pending write access request from pending write access requests directed to one of the recommended memory banks.
In a fourth possible implementation form of the method according to the third implementation form of the second aspect, the one pending write access request is selected from pending write access requests directed to the one of the recommended memory banks which has the lowest number of entries; and the one pending write access request is selected from pending write access requests directed to an arbitrary one of the M memory banks if no memory bank is assigned to the recommended memory banks.

According to a third aspect, the invention relates to a computer program for implementing a method according to the second aspect as such or according to any of the implementation forms of the second aspect.

According to aspects of the invention, DRAM performance is increased by an innovative controller design. The controller design is flexible: by applying new sets of rules, the same design may be adapted and optimized to different memory devices and different technologies by a simple configuration. The algorithm is configured by a set of rules, e.g. defined by the timing parameters, for the memory device it interacts with. The presented technique optimizes the sequence of memory accesses and thus increases the memory performance. DRAM performance is enhanced by an innovative controller mechanism as described below with respect to the drawings.
The memory controller according to the first aspect can be implemented in every design using DRAM memory. It enhances system performance or even reduces the system cost by relaxing the memory timing requirement. As such the system designer may be able to use lower speed and lower cost memory to achieve a required target bandwidth.
The methods described herein may be implemented as hardware or as software in a memory controller, in a micro-controller, in a Digital Signal Processor (DSP) or in any other side- processor or as hardware circuit within an application specific integrated circuit (ASIC).
The invention can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations thereof.
BRIEF DESCRIPTION OF THE DRAWINGS
Further embodiments of the invention will be described with respect to the following figures, in which:
Fig. 1 shows a memory controller 100 according to an implementation form;
Fig. 2 shows a data processing system 200 with a memory controller according to an implementation form; Fig. 3 shows a memory controller 300 according to an implementation form;
Fig. 4 shows a diagram 400 of an aging weight function according to an implementation form; Fig. 5 shows a memory controller 500 according to an implementation form;
Fig. 6 shows a performance diagram 600 of a memory controller according to an
implementation form; Fig. 7 shows a performance diagram 700 of a memory controller according to an
implementation form; and
Fig. 8 shows a schematic diagram of a method 800 for controlling accesses to a memory according to an implementation form.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
Fig. 1 shows a memory controller 100 according to an implementation form. The memory controller 100 acts as the glue logic that connects processors, high speed input-output devices and the memory system (not shown) to each other. The memory controller 100 can exist as a separate device or as part of the processor and integrated into the processor package; the function of the memory controller 100 remains essentially the same in either case. The primary function of the memory controller 100 is to manage the flow of data between the processors, input-output devices and the memory system, correctly and efficiently. The function of the memory controller 100 is to manage the flow of data into and out of the memory devices. The memory access protocol and timing parameters define the interface protocol of the memory controller 100. The performance characteristics of the memory controller 100 depends on the implementation specifics of the micro-architecture as described in the following.
The memory controller 100 controls accesses to a memory (not shown) and comprises a history buffer 101, a pending buffer 103 and a selection unit 105. The history buffer stores processed access requests 111 to the memory, i.e. past access requests which have been processed by giving access to the memory. The pending buffer 103 stores pending access requests 109 to the memory, i.e. access requests that are waiting for being allowed to access the memory. The selection unit 105 is the unit which gives allowance for access to the memory. The selection unit 105 selects an access request 107 to be processed from the pending access requests 109, i.e. from the access requests waiting for an allowance to access the memory.
In an implementation form, the access request 107 is selected from an integer number of N+1 pending access requests directed to different memory banks of a memory comprising an integer number of M memory banks. In an implementation form, the number of pending access requests N is smaller than or equal to the number of memory banks M. In an implementation form, the N+1 pending access requests from which the access request 107 is selected are the N+1 oldest pending access requests stored in the pending buffer 103. In an implementation form, the pending buffer 103 has a size of K buffer entries, wherein the size K is greater than the number N+1 of pending access requests. Access requests may comprise read access requests and write access requests. In an implementation form, the access request comprises a number of up to N read access requests directed to different memory banks and one single write access request directed to one memory bank. A detailed implementation example is described with respect to Figure 5 below.
The selection or allowance is based on a cost function of the pending access requests 109 and the processed access requests 111. For each of the pending access requests 109 stored in the pending buffer 103 a cost is calculated with respect to each of the processed access requests 111 stored in the history buffer 101. The selection unit 105 selects the pending access request 107 which has the minimum cost for being allowed to access the memory.
The cost may be based on timing parameters of the memory, e.g. by evaluating waste cycles between each of the pending access requests 109 and a last processed one 107 of the processed access requests 111. Waste cycles are the processor cycles which cannot be used for processing because a timing dependency has to be observed such that the processor must wait for a specific number of processor cycles. The selection unit 105 may select the pending access request which minimizes the evaluated waste cycles. The waste cycles may be evaluated by considering timing dependencies between the pending access requests 109 and the processed access requests 111 stored in the history buffer 101. The length of the history buffer 101 may be chosen such that the processed access requests 111 stored in the history buffer 101 have non-negligible timing dependencies with respect to the pending access requests 109. Non-negligible timing dependencies produce different cost functions in the selection unit 105 because waste cycles between a pending access request 109 and the last processed one 107 depend on the timing dependency with respect to earlier processed access requests 111. The length of the history buffer 101 is chosen such that only processed access requests 111 are stored therein which have a (non-negligible) timing relation to the pending access requests 109. Thus, the length of the history buffer 101 may be limited.
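As an illustration, a heavily simplified waste-cycle model covering only the read/write turnaround penalties might look as follows. The delay values are placeholders for this sketch, not DDR3 datasheet numbers, and a full model would also account for bank conflicts (tRC) and activation delays (tRRD).

```python
def waste_cycles(pending_kind, last_kind, t_rtw=6, t_wtr=4):
    """Cycles wasted when `pending_kind` ('read'/'write') follows the
    last processed access of `last_kind` on the memory bus."""
    if last_kind == "read" and pending_kind == "write":
        return t_rtw   # read-to-write turnaround delay (tRTW)
    if last_kind == "write" and pending_kind == "read":
        return t_wtr   # write-to-read turnaround delay (tWTR)
    return 0           # same direction: commands pipeline back-to-back

print(waste_cycles("write", "read"))  # 6
```

The selection unit would evaluate such a function for every pending request against the history window and pick the request with the smallest total.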
After the selected pending access has accessed the memory it is stored in the history buffer 101. The history buffer 101 and the pending buffer 103 may be FIFO (First In First Out) buffers.
Fig. 2 shows a data processing system 200 with a memory controller 100 according to an implementation form, which memory controller 100 corresponds to the memory controller 100 depicted in Fig. 1. A processor 202 in a computer needs memory storage 210 to process its data. Data 204 (from hard drives, CD-ROM, DVD drives, Flash cards, peripheral cards, USB, routers, etc.) is stored in memory 210 first, before being delivered to the processor 202. Memory 210 is divided into blocks of memory, called memory banks or memory modules, e.g. a number of M memory banks 210_1, 210_2, 210_M as depicted in Fig. 2. M is any natural number. When more data 204 can be delivered to the processor 202 via memory 210 at faster speeds, the processor 202 can manipulate instructions and data 204 more efficiently and ultimately, the requested task can be accomplished in less time.
Data processing in a computer system can be represented by a funnel system as depicted in Fig. 2. Data 204 is filled into the funnel which outputs data 204 to the memory 210; the funnel then "channels" the data 204 through its pipe to the processor's 202 input. To prevent the funnel from being over-filled with data 204, there is a "traffic" controller located in the funnel's pipe. In computers, there is a special chip called the "memory controller" 100 that handles all data transfers involving the memory modules 210_1, 210_2, 210_M and the processor 202. The memory controller 100 manages all movement of data between the processor 202 and the memory modules 210_1, 210_2, 210_M. Data 204 is sent to the memory controller 100 which may be part of a computer motherboard's "chipset". The memory controller 100 is like a traffic signal that regulates data transfer either to the memory modules 210_1, 210_2, 210_M for storage, or to the processor 202 for data manipulation or "crunching".
Graphically, this architecture is pictured in Fig. 2: Data 204 moves through the funnel's pipe in one direction at a time. The memory controller 100 acts like a traffic signal that directs the movement of data 204 across the memory channel. For example, data arriving at the memory controller 100 is first stored in the memory modules 210_1, 210_2, 210_M, then re-read and finally transferred to the processor 202. The memory controller 100 sends data 204 as fast as the processor 202 can receive it and stores it back into the memory modules 210_1, 210_2, 210_M as fast as the processor 202 can "pump" the data 204 out. The memory controller 100 reaches its peak efficiency when the data throughput from the processor 202 matches the throughput of the memory modules 210_1, 210_2, 210_M.
A memory controller corresponding to the memory controller 100 as described with respect to Fig. 1 controls the accesses to the memory banks 210_1, 210_2, 210_M by a cost function weighting the accesses to the memory with respect to timing parameters of the memory 210. An access offering the lowest cost, e.g. with respect to waste cycles, is allowed to access the memory 210, thereby increasing the speed of the memory accesses and reducing the number of events in which the funnel is overfilled with data.
In an implementation form, the memory 210 is a DRAM and the memory controller 100 is a DRAM controller. In an implementation form, the memory 210 is a DDR SDRAM and the memory controller 100 is a DDR SDRAM controller. In an implementation form, the memory 210 is a DDR3 SDRAM and the memory controller 100 is a DDR3 SDRAM controller.
In an implementation form, the memory 210 is a DDR3 SDRAM and the timing parameters are one or more of the following: a row-to-row activation delay time tRRD, a row-cycle time tRC, a write recovery time tWR, a read-to-write delay time tRTW, a write-to-read delay time tWTR, a burst length BL and a clock frequency. Exemplary values for these timing parameters are given in Table 1. Table 1 : DDR3-1600 timing
Fig. 3 shows a memory controller 300 according to an implementation form. The memory controller 300 controls accesses to a memory (not shown) and comprises a history buffer 301 or a history access register, a pending buffer 303 or a pending access register and a selection unit 305. The history buffer 301 stores processed access requests A0, A1, A2, A3, A4, A5, A6 and A7 to the memory, i.e. past access requests which have been processed by giving access to the memory. The pending buffer 303 stores pending access requests RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7 and WR to the memory, i.e. access requests that are waiting for being allowed to access the memory, wherein RD0, RD1, RD2, RD3, RD4, RD5, RD6 and RD7 are pending read access requests and WR is a pending write access request. The selection unit 305 is the unit which gives allowance for access to the memory. The selection unit 305 selects a best access request 307 to be processed from the pending access requests RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7 and WR. The selection or allowance is based on a cost function of the pending access requests RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7 and WR and the processed access requests A0, A1, A2, A3, A4, A5, A6 and A7. For each of the pending access requests RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7 and WR stored in the pending buffer 303 a cost is calculated with respect to each other of the pending access requests RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7 and WR and with respect to each of the processed access requests A0, A1, A2, A3, A4, A5, A6 and A7 stored in the history buffer 301. The logic that selects the best access from the window of pending accesses RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7 and WR is a decision matrix 315. The matrix 315 calculates the cost between all couple combinations 315_y_x of pending (RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7, WR) and history (A0, A1, A2, A3, A4, A5, A6, A7) accesses. This is shown in Fig. 3.

For the write access request WR, for example, the cost is calculated by accumulating the costs of the write access request WR with respect to the processed access request A0 determined by the couple combination 315_WR_A0, the write access request WR with respect to the processed access request A1 determined by the couple combination 315_WR_A1, the write access request WR with respect to the processed access request A2 determined by the couple combination 315_WR_A2, the write access request WR with respect to the processed access request A3 determined by the couple combination 315_WR_A3, the write access request WR with respect to the processed access request A4 determined by the couple combination 315_WR_A4, the write access request WR with respect to the processed access request A5 determined by the couple combination 315_WR_A5, the write access request WR with respect to the processed access request A6 determined by the couple combination 315_WR_A6 and the write access request WR with respect to the processed access request A7 determined by the couple combination 315_WR_A7.

Analogously, the costs for the read access requests RD0, RD1, RD2, RD3, RD4, RD5, RD6 and RD7 are calculated. The costs determined by the respective couple combinations 315_y_x (y representing the line inputs RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7, WR and x representing the column inputs A0, A1, A2, A3, A4, A5, A6, A7 of matrix 315) may be determined by evaluating waste cycles between the respective pending access request RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7, WR and the respective processed access request A0, A1, A2, A3, A4, A5, A6, A7.
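The row-wise accumulation performed by the decision matrix can be sketched as follows. Here `pair_cost` stands in for the timing-rule evaluation of one couple combination 315_y_x; the concrete cost rule below is an invented stand-in for illustration only.

```python
def decision_matrix_select(pending, history, pair_cost):
    """One matrix row per pending access, one column per history access:
    sum each row and return the pending access with the minimal total."""
    totals = {p: sum(pair_cost(p, a) for a in history) for p in pending}
    return min(totals, key=totals.get)

# Stand-in rule: cost 1 whenever the first characters match (so the
# write WR is penalized against the history write W1).
cost = lambda p, a: 1 if p[0] == a[0] else 0
print(decision_matrix_select(["RD0", "WR"], ["A0", "W1"], cost))  # RD0
```

In hardware, the K x (history length) cost cells 315_y_x can be evaluated in parallel, with an adder tree per row and a comparator tree selecting the minimum total.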
In an implementation form, the memory comprises a number of M memory banks and the selection unit 305 selects the access request to be processed from N pending read access requests and one pending write access request, wherein the N pending read access requests are directed to different ones of the memory banks. Figure 3 illustrates a number of 8 read access requests RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7 and one write access request WR. The number N may be any natural number. In an implementation form, the number N of pending read access requests is smaller than or equal to the number M of memory banks. In an implementation form, the number N of the pending read access requests plus the number 1 of the pending write access request is smaller than or equal to the size K of the pending buffer.
In an implementation form, read access request RD0 is directed to the first memory bank 210, read access request RD1 is directed to the second memory bank 211, read access request RD2 is directed to the third memory bank 212, read access request RD3 is directed to the fourth memory bank 213, read access request RD4 is directed to the fifth memory bank 214, read access request RD5 is directed to the sixth memory bank 215, read access request RD6 is directed to the seventh memory bank 216 and read access request RD7 is directed to the eighth memory bank 217. In an implementation form, the eight read access requests RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7 request parallel access to different memory banks of a memory having at least eight memory banks, and the one write access request WR requests access to a best memory bank, e.g. a memory bank having the fewest used entries. In an implementation form, the pending buffer 303 is sufficiently large to store the eight read access requests RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7, i.e. the pending buffer 303 has a size of at least 8 entries in this implementation form.
In an implementation form, the N pending read access requests are the oldest read access requests stored in the pending buffer 303 of size K with respect to the memory banks they are directed to, and the one pending write access request is the oldest write access request stored in the pending buffer 303. Fig. 4 shows a diagram 400 of an aging weight function according to an implementation form. The aging weight function assigns a weight 403 to access ages 405 of the pending access requests 109. In the implementation form, the weight 403 is zero for access ages 405 ranging from zero to a predetermined access age 401 and increases linearly for access ages 405 greater than the predetermined access age 401. Other implementation forms support other types of aging weight functions, for example an exponentially increasing weight, or a predetermined access age 401 of zero such that the weight 403 increases from the beginning. The memory controller 100, 300 uses a greedy cost function, as it always prefers the access with the lowest cost. However, in order to limit the time certain accesses spend in the pending window or pending buffer 103, an aging weight 403 is implemented in the memory controller 100, 300. In the implementation form, the weight 403 is a monotonic function whose value grows the longer an access remains in the pending window 103. The selection unit 105, 305 subtracts the aging value 403 from the cost, causing the memory controller 100, 300 to prefer older accesses. An example of an aging function is shown in Fig. 4.
An aging weight 403 is assigned to each of the pending access requests 109 in the pending buffer 103. The aging weight 403 indicates a length of stay 405, or access age, of the associated pending access request 109 in the pending buffer 103. An aging weight 403 assigned to an older pending access request is greater than or equal to an aging weight 403 assigned to a younger pending access request. A cost provided by the cost function for a pending access request 109 is reduced by the aging weight 403 assigned to the pending access request 109.
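A minimal sketch of such an aging scheme follows. The threshold and slope values are illustrative assumptions; effective_cost mirrors the subtraction of the aging weight from the cost performed by the selection unit 105, 305.

```python
# Aging weight function of Fig. 4: zero up to a predetermined access age,
# then increasing linearly. Both constants below are assumed values.
AGE_THRESHOLD = 16   # predetermined access age 401 (cycles), assumed
AGE_SLOPE = 0.5      # weight gained per cycle beyond the threshold, assumed

def aging_weight(access_age):
    """Monotonic weight: 0 until the threshold, then growing linearly."""
    if access_age <= AGE_THRESHOLD:
        return 0.0
    return AGE_SLOPE * (access_age - AGE_THRESHOLD)

def effective_cost(raw_cost, access_age):
    """The selection unit subtracts the aging weight, so older requests win."""
    return raw_cost - aging_weight(access_age)
```

With this scheme, two pending requests of equal raw cost are tie-broken in favour of the one that has waited longer in the pending buffer.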
Fig. 5 shows a memory controller 500 according to an implementation form. The memory controller 500 may correspond to the memory controller 300 as described with respect to Fig. 3 or to the memory controller 100 as described with respect to Fig. 1. The memory controller 500 comprises a pending buffer 503 of size K, where K is any integer number, which pending buffer 503 stores a number of K pending access requests 509_1, 509_2, 509_3, ..., 509_K. The memory controller 500 further comprises a number of N+1 age selectors 510_1, 510_2, ..., 510_N, 510_N+1, where N is any integer number smaller than or equal to K. Each of the age selectors 510_1, 510_2, ..., 510_N, 510_N+1 selects a pending access request from the K pending access requests 509_1, 509_2, 509_3, ..., 509_K stored in the pending buffer 503. The first age selector 510_1 selects a read access request RD0 directed to a first memory bank; the second age selector 510_2 selects a read access request RD1 directed to a second memory bank; the third age selector 510_3 selects a read access request RD2 directed to a third memory bank, and so on until the Nth (in this implementation form the eighth) age selector 510_N selects a read access request RD7 directed to an eighth memory bank. The (N+1)th (in this implementation form the ninth) age selector 510_N+1 selects a write access request WR directed to a best memory bank, e.g. a memory bank having the most free entries or an arbitrary memory bank, from the K pending access requests 509_1, 509_2, 509_3, ..., 509_K according to the mechanism described below.
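The behaviour of the N+1 age selectors can be sketched as follows. The Pending record layout (bank, read/write flag, age) is an assumption for illustration only.

```python
from collections import namedtuple

# Hypothetical pending-buffer entry: target bank, write flag, and how long
# the request has been waiting (larger age = older request).
Pending = namedtuple("Pending", "bank is_write age")

def select_candidates(pending_buffer, num_banks):
    """Oldest pending read per memory bank, plus the oldest pending write."""
    oldest_read = {}     # bank -> oldest pending read directed to that bank
    oldest_write = None  # oldest pending write overall
    for req in pending_buffer:
        if req.is_write:
            if oldest_write is None or req.age > oldest_write.age:
                oldest_write = req
        else:
            cur = oldest_read.get(req.bank)
            if cur is None or req.age > cur.age:
                oldest_read[req.bank] = req
    reads = [oldest_read[b] for b in range(num_banks) if b in oldest_read]
    return reads, oldest_write
```

The resulting at most N reads and one write would then be fed to the decision matrix, keeping the matrix small even when the pending buffer itself is large.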
In an implementation form, the oldest read access to each memory bank stored in the pending buffer 503 and the oldest write access stored in the pending buffer 503 are selected. The age selectors 510_1, 510_2, ..., 510_N, 510_N+1 are configured to select the oldest pending access request directed to the respective memory bank 1 to N from the K pending access requests 509_1, 509_2, 509_3, ..., 509_K stored in the pending buffer 503. In an implementation form, the write access request WR is selected according to the best memory bank. The mechanism can work in the following way:
[1] The memory controller 500 knows how many entries are used in each memory bank 210_1, 210_2, ..., 210_M and defines a list of memory banks that have fewer entries than the others. This list is known as the recommended banks and can hold between 1 and M memory banks.
[2] The recommended list is matched to the history buffer 101 and any memory bank that resides in both, i.e. the recommended list and the history buffer 101, is removed from the recommended list.
[3] One of the memory banks in the resulting list (of phase [2]) is selected to be the bank to which the write transaction will be written.
[4] If the resulting list (of phase [2]) is empty, an arbitrary memory bank is selected for the write.
[5] The write transaction is processed the same way as the N read transactions. In an implementation form, the memory controller 500 uses an optimized window implementation. In an exemplary implementation form, the implemented window for a DRAM with 8 banks is limited to 8+1=9 positions even if the effective window is greater than 9. A memory with 8 banks allows a maximum of 8 accesses directed to different banks. For example, an effective window with 30 positions should select one read access from each bank and one write access; the actual bank for the write access is selected later from the available free banks. The selected read access for each bank should be the oldest one directed to that bank.
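The five phases above can be sketched as follows. Representing the recommended list as the banks tied for the fewest used entries is an assumption of this sketch; the text only requires banks with fewer entries than others.

```python
import random

def select_write_bank(entries_per_bank, history_banks):
    """Phases [1]-[4] of the write-bank selection.

    entries_per_bank: used-entry count per bank, index = bank number.
    history_banks: set of banks appearing in the history buffer.
    """
    fewest = min(entries_per_bank)
    # Phase [1]: recommended banks, here the banks tied for fewest entries.
    recommended = [b for b, n in enumerate(entries_per_bank) if n == fewest]
    # Phase [2]: drop any bank that also resides in the history buffer.
    recommended = [b for b in recommended if b not in history_banks]
    if recommended:
        # Phase [3]: pick one bank from the resulting list.
        return recommended[0]
    # Phase [4]: resulting list empty, fall back to an arbitrary bank.
    return random.randrange(len(entries_per_bank))
```

Phase [5] then simply enqueues the write to the chosen bank alongside the N read transactions.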
In an implementation form, the pending access requests RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7, WR selected by the age selectors 510_1, 510_2, ..., 510_N, 510_N+1 correspond to the pending access requests RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7, WR illustrated in Fig. 3 which are input to the decision matrix 315. In this implementation form, the decision matrix 315 may have eight times nine, i.e. 72, decision elements 315_y_x. The memory controller 500 has a reduced complexity by minimizing the window size in the pending buffer 103 and can be efficiently implemented.
In an implementation form, the memory controller 500 applies the strategy of write caching. The basic idea behind write caching is that write requests are typically non-critical in terms of performance, whereas read requests may be critical. As a result, it is typically desirable to cache write requests and allow read requests to proceed ahead. Furthermore, DRAM devices are typically poorly suited to back-to-back read and write requests: when a column read command follows a write command, the change in the direction of data flow introduces significant overheads if the two commands are pipelined back-to-back. The strategy of write caching allows read requests that may be critical to application performance to proceed ahead of write requests, and it can also reduce read-write turnaround overheads when combined with a strategy of sending multiple write requests to the memory system consecutively. The memory controller 500 according to an implementation form utilizes the write caching strategy, thereby prioritizing read requests over write requests.
Fig. 6 shows a performance diagram 600 of a memory controller according to an implementation form. The performance diagram 600 illustrates normalized bandwidth over packet length, wherein packet lengths between 64 and 2007 are simulated. The memory controller 100 was simulated with various parameters under a random access scenario. The graphs 601 and 602 depicted in Fig. 6 illustrate the performance results with (601) and without (602) optimization for various packet lengths. A reference memory system containing 5 x 16-bit DDR3-1600 devices running at 800 MHz is defined. The bandwidth required by packets of all lengths with (601) and without (602) optimization is then simulated and normalized to the reference system. The graph 601 illustrates that for small packets the normalized bandwidth is 2.1, i.e. for small packets, 2.1 times the reference memory bandwidth is consumed. The bandwidth requirements of the optimized design 601 are low compared to a design without optimization 602. The gap between the two lines of the graphs 601 and 602 is the bandwidth saved by the optimized design per packet length.
Fig. 7 shows a performance diagram 700 of a memory controller according to an implementation form. In the previous example of Fig. 6, the window length for the optimized design is fixed. The graph illustrated in Fig. 7 shows the optimized design efficiency for different window lengths, e.g. the value K described herein, in the range between 1 and 37. The design efficiency increases with increasing window size. When the window size is 1 (nothing to select), a design efficiency of 40% is reached; for a window size of 8, a design efficiency of 82% is reached; and for a window size of 37, a design efficiency of 97% is reached.
Fig. 8 shows a schematic diagram of a method 800 for controlling accesses to a memory according to an implementation form. The method 800 comprises storing 801 processed access requests to the memory in a history buffer; storing 803 pending access requests to the memory in a pending buffer; and selecting 805 an access request to be processed from the pending access requests, wherein the selection is based on a cost function of the pending access requests and the processed access requests.
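A minimal skeleton of method 800 might look as follows. The cost_fn callback standing in for the cost function is an assumption of this sketch, as is the use of a bounded deque for the history buffer.

```python
from collections import deque

class MemoryController:
    """Skeleton of method 800; cost_fn(pending, history) is an assumed callback."""

    def __init__(self, history_len, cost_fn):
        self.history = deque(maxlen=history_len)  # processed access requests (step 801)
        self.pending = []                         # pending access requests (step 803)
        self.cost_fn = cost_fn

    def submit(self, request):
        self.pending.append(request)              # store in the pending buffer

    def select_next(self):
        # Step 805: select the pending request with the lowest cost against
        # the processed requests held in the history buffer.
        best = min(self.pending, key=lambda r: self.cost_fn(r, self.history))
        self.pending.remove(best)
        self.history.append(best)                 # it now becomes a processed request
        return best
```

The bounded deque naturally discards the oldest processed requests, matching a history buffer whose length covers only the non-negligible timing dependencies.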
The memory may comprise a number of M memory banks and the method may comprise selecting the access request to be processed from N pending read access requests and one pending write access request, wherein the N pending read access requests are directed to different ones of the memory banks.
The N pending read access requests, from which the access request to be processed is selected, may be the oldest read access requests stored in the pending buffer with respect to the memory banks they are directed to. The one write access request, from which the access request to be processed is selected, may be the oldest write access request stored in the pending buffer. Selecting the one pending write access request may comprise assigning memory banks from the M memory banks to recommended memory banks such that the recommended memory banks have fewer entries than memory banks not assigned to the recommended memory banks. Memory banks to which a write access request stored in the history buffer was directed are not assigned to the recommended memory banks. The one pending write access request may be selected from pending write access requests directed to one of the recommended memory banks.
The one pending write access request can be selected from pending write access requests directed to the one of the recommended memory banks which has the lowest number of entries. If no memory bank is assigned to the recommended memory banks, the one pending write access request is selected from pending write access requests directed to an arbitrary one of the M memory banks.
In other words, the write is selected according to the best bank. The mechanism works in the following way:
[1] The memory controller 100 knows how many entries are used in each memory bank 210_1, 210_2, ..., 210_M and defines a list of memory banks that have fewer entries than the others. This list is known as the recommended banks and can hold between 1 and M memory banks.
[2] The recommended list is matched to the history buffer 101 and any memory bank that resides in both, i.e. the recommended list and the history buffer 101 , is removed from the recommended list.
[3] One of the memory banks in the resulting list (of phase [2]) is selected to be the bank to which the write transaction will be written.
[4] If the resulting list (of phase [2]) is empty, an arbitrary memory bank is selected for the write.
[5] The write transaction is processed the same way as the N read transactions.
From the foregoing, it will be apparent to those skilled in the art that a variety of methods, systems, computer programs on recording media, and the like, are provided. The present disclosure also supports a computer program product including computer executable code or computer executable instructions that, when executed, causes at least one computer to execute the performing and computing steps described herein. The present disclosure also supports a system configured to execute the performing and computing steps described herein.
Many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the above teachings. Of course, those skilled in the art readily recognize that there are numerous applications of the invention beyond those described herein. While the present invention has been described with reference to one or more particular embodiments, those skilled in the art recognize that many changes may be made thereto without departing from the scope of the present invention. It is therefore to be understood that within the scope of the appended claims and their equivalents, the invention may be practiced otherwise than as specifically described herein.

Claims

1. A memory controller (100) for controlling accesses to a memory, comprising: a history buffer (101) configured to store processed access requests (111) to the memory; a pending buffer (103) configured to store pending access requests (109) to the memory; and a selection unit (105) configured to select an access request (107) to be processed from the pending access requests (109), the selection being based on a cost function of the pending access requests (109) and the processed access requests (111).
2. The memory controller (100) of claim 1, wherein the cost function is based on timing parameters of the memory.
3. The memory controller (100) of one of the preceding claims, wherein the cost function is configured to evaluate waste cycles between each of the pending access requests (109) and a last processed one of the processed access requests (111); wherein the selection unit (105) is configured to select a pending access request (109) which minimizes the evaluated waste cycles.
4. The memory controller (100) of one of the preceding claims, wherein a length of the history buffer (101) is such that the processed access requests (111) stored in the history buffer (101) have non-negligible timing dependencies with respect to the pending access requests (109).
5. The memory controller (100) of one of the preceding claims, wherein an aging weight (403) is assigned to each of the pending access requests (109) in the pending buffer (103); wherein the aging weight (403) indicates a length of stay (405) of the associated pending access request (109) in the pending buffer (103).
6. The memory controller (100) of claim 5, wherein a cost provided by the cost function for a pending access request (109) is reduced by the aging weight (403) assigned to the pending access request (109).
7. The memory controller (100) of one of the preceding claims, wherein the memory (210) is a DDR SDRAM, in particular a DDR3 SDRAM; wherein the timing parameters are one or more of the following: a row-to-row activation delay time, a row-cycle time, a write recovery time, a read-to-write delay time, a write-to-read delay time, a burst length and a clock frequency.
8. The memory controller (100, 300, 500) of one of the preceding claims, wherein the memory (210) comprises a number of M memory banks (210_1, 210_2, ..., 210_M); wherein the selection unit (105) is configured to select the access request (107) to be processed from N pending read access requests (RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7) and one pending write access request (WR); and wherein the N pending read access requests (RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7) are directed to different ones of the memory banks (210_1, 210_2, ..., 210_M).
9. The memory controller (100, 300, 500) of claim 8, wherein the N pending read access requests (RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7) are the oldest read access requests (109) stored in the pending buffer (103) of size K with respect to the memory banks (210_1, 210_2, ..., 210_M) they are directed to, and wherein the one pending write access request (WR) is the oldest write access request (109) stored in the pending buffer (103).
10. The memory controller (100, 300, 500) of claim 9, wherein the number N of the pending read access requests (RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7) plus the number 1 of the pending write access request (WR) is smaller than or equal to the size K of the pending buffer (103).
11. A method (800) for controlling accesses to a memory (210), comprising: storing (801) processed access requests (111) to the memory (210) in a history buffer (101); storing (803) pending access requests (109) to the memory (210) in a pending buffer (103); and selecting (805) an access request (107) to be processed from the pending access requests (109), the selection being based on a cost function of the pending access requests (109) and the processed access requests (111).
12. The method (800) of claim 11, wherein the memory (210) comprises a number of M memory banks (210_1, 210_2, ..., 210_M), the method (800) further comprising: selecting the access request (107) to be processed from N pending read access requests (RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7) and one pending write access request (WR), wherein the N pending read access requests (RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7) are directed to different ones of the memory banks (210_1, 210_2, ..., 210_M).
13. The method (800) of claim 12, wherein the N pending read access requests (RD0, RD1, RD2, RD3, RD4, RD5, RD6, RD7), from which the access request (107) to be processed is selected, are the oldest read access requests (109) stored in the pending buffer (103) with respect to the memory banks (210_1, 210_2, ..., 210_M) they are directed to; and wherein the one write access request (WR), from which the access request (107) to be processed is selected, is the oldest write access request (109) stored in the pending buffer (103).
14. The method (800) of claim 12 or claim 13, wherein the selecting of the one pending write access request (WR) comprises: assigning memory banks from the M memory banks (210_1, 210_2, ..., 210_M) to recommended memory banks such that the recommended memory banks have fewer entries than memory banks not assigned to the recommended memory banks, wherein memory banks, to which a processed write access request (111) stored in the history buffer (101) was directed, are not assigned to the recommended memory banks; and selecting the one pending write access request (WR) from pending write access requests (109) directed to one of the recommended memory banks.
15. The method (800) of claim 14, wherein the one pending write access request (WR) is selected from pending write access requests (109) directed to the one of the recommended memory banks which has the lowest number of entries; and wherein the one pending write access request (WR) is selected from pending write access requests (109) directed to an arbitrary one of the M memory banks (210_1, 210_2, ..., 210_M) if no memory bank is assigned to the recommended memory banks.
PCT/EP2011/072158 2011-12-08 2011-12-08 Memory controller and method for controlling accesses to a memory WO2013083194A1 (en)
