
Publication number: US 20060059316 A1
Publication type: Application
Application number: US 11/030,010
Publication date: Mar 16, 2006
Filing date: Jan 5, 2005
Priority date: Sep 10, 2004
Also published as: CN100533372C, CN101036117A, CN101036117B, CN101040256A, CN101053234A, CN101053234B, CN101069170A, CN101069170B, CN101128804A, CN101128804B, US7941585, US9141548, US20060059286, US20060059310, US20140317353
Inventors: David Asher, Gregg Bouchard, Richard Kessler, Robert Sanzone
Original assignee: Cavium Networks
Method and apparatus for managing write back cache
US 20060059316 A1
Abstract
A network services processor includes an input/output bridge that avoids unnecessary updates to memory when cache blocks storing processed packet data are no longer required. The input/output bridge monitors requests to free buffers in memory received from cores and I/O units in the network services processor. Instead of writing the cache block back to the buffer in memory that will be freed, the input/output bridge issues don't write back commands to a cache controller to clear the dirty bit for the selected cache block, thus avoiding wasteful write-backs from cache to memory. After the dirty bit is cleared, the buffer in memory is freed, that is, made available for allocation to store data for another packet.
Images(7)
Claims(17)
1. A network services processor comprising:
a plurality of processors;
a coherent shared memory including a cache and a memory, the coherent shared memory shared by the plurality of processors; and
an input/output bridge coupled to the plurality of processors and the cache, the input/output bridge monitoring requests to free a buffer in memory to avoid writing a modified cache block in the cache back to the buffer.
2. The network services processor of claim 1, wherein upon detecting a request to free the buffer, the input/output bridge issues a command to clear a dirty bit associated with the cache block.
3. The network services processor of claim 2 further comprising:
a cache controller coupled to the plurality of processors, the cache and the input/output bridge, the cache controller storing the dirty bit associated with the block and clearing the dirty bit upon receiving the command from the input/output bridge.
4. The network services processor of claim 3 wherein the input/output bridge further comprises:
a don't write back queue which stores commands to be issued to the cache controller.
5. The network services processor of claim 4 wherein the input/output bridge further comprises:
a free queue that stores requests to free blocks to be added to a free pool.
6. The network services processor of claim 5 further comprising:
a plurality of processing units coupled to the input/output bridge, the input/output bridge storing packets to be transferred between processing units and the coherent shared memory in which packets are stored for processing by the processors.
7. The network services processor of claim 1, further comprising:
a memory allocator which provides free lists of buffers in memory for storing received packets.
8. The network services processor of claim 1, wherein the coherent shared memory is coupled to the processors and input/output bridge by a coherent memory bus that includes a commit bus, a store bus, a fill bus and an add bus.
9. A method for increasing memory bandwidth comprising:
sharing a coherent shared memory among a plurality of processors, the coherent shared memory including a cache and a memory; and
monitoring requests to free a buffer in memory to avoid writing a modified cache block in the cache back to the buffer.
10. The method of claim 9 further comprising:
upon detecting a request to free the buffer, issuing a command to clear a dirty bit associated with the cache block.
11. The method of claim 10 further comprising:
storing commands to be issued to the cache controller in a don't write back queue.
12. The method of claim 10 further comprising:
storing requests to free blocks to be added to a free pool in a free queue.
13. The method of claim 10 further comprising:
storing packets to be transferred between a plurality of processing units and the coherent shared memory in which packets are stored for processing by the processors.
14. The method of claim 9, further comprising:
providing a list of free buffers in memory for storing received packets.
15. The method of claim 9, wherein the coherent shared memory is coupled to the processors and input/output bridge by a coherent memory bus that includes a commit bus, a store bus, a fill bus and an add bus.
16. A network services processor comprising:
means for sharing, by a plurality of processors, a coherent shared memory, the coherent shared memory including a cache and a memory; and
means for monitoring requests to free a buffer in memory to avoid writing a modified cache block in the cache back to the buffer.
17. A system for managing a write back cache comprising:
a memory; and
logic which issues a don't write back command in response to a request to free a buffer in the memory, the don't write back command issued to clear a dirty bit in a cache block associated with the buffer to avoid writing the modified cache block back to the buffer.
Description
    RELATED APPLICATION
  • [0001]
    This application claims the benefit of U.S. Provisional Application No. 60/609,211, filed on Sep. 10, 2004. The entire teachings of the above application are incorporated herein by reference.
  • BACKGROUND OF THE INVENTION
  • [0002]
    A multi-processing system includes a plurality of processors that share a single memory. Typically, multi-level caches are used to reduce memory bandwidth demands on the single memory. The multi-level caches may include a first-level private cache in each processor and a second-level cache shared by all of the processors. As the cache is much smaller than the memory in the system, only a portion of the data stored in buffers/blocks in memory is replicated in the cache.
  • [0003]
    If data stored in a buffer/block requested by a processor is replicated in the cache, there is a cache hit. If the requested data is not replicated in the cache, there is a cache miss and the requested block that stores the data is retrieved from memory and also stored in the cache.
  • [0004]
    When shared data is cached, the shared value may be replicated in multiple first-level caches. Thus, caching of shared data requires cache coherence. Cache coherence ensures that multiple processors see a consistent view of memory, for example, a read of the shared data by any of the processors returns the most recently written value of the data.
  • [0005]
    Typically, blocks of memory (cache blocks) are replicated in cache and each cache block has an associated tag that includes a so-called dirty bit. The state of the dirty bit indicates whether the cache block has been modified. In a write back cache, the modified cache block is written back to memory only when the modified cache block is replaced by another cache block in the cache.
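    The write-back behavior described above can be sketched in C as follows; the structure and function names are illustrative only, not taken from the patent's hardware:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 128  /* bytes per cache block, as in the embodiment */

/* Illustrative tag for one cache block: the dirty bit records whether
 * the cached copy has been modified since it was filled from memory. */
struct cache_tag {
    uint64_t addr;   /* block-aligned address of the buffer in memory */
    bool     valid;
    bool     dirty;  /* set on store; cleared on write-back */
};

/* In a write-back cache, a modified block reaches memory only when it
 * is replaced: returns true if a memory write was actually performed. */
bool evict_block(struct cache_tag *tag, const uint8_t *block,
                 uint8_t *memory /* byte-addressable backing store */)
{
    bool wrote = false;
    if (tag->valid && tag->dirty) {
        memcpy(memory + tag->addr, block, BLOCK_SIZE);
        wrote = true;
    }
    tag->valid = false;
    tag->dirty = false;
    return wrote;
}
```

    A clean block is simply dropped on eviction; only a dirty block costs a memory write, which is the cost the don't write back mechanism below avoids.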
  • SUMMARY OF THE INVENTION
  • [0006]
    In a network services processor, when the modified cache block is replaced in the cache, the modified cache block may not always need to be written back to memory. For example, the cache block can be used to store packet data while it is being processed. After the data has been processed, the processed packet data stored in the cache block is no longer required and the buffer in memory is freed, that is, made available for allocation to store data for another packet. As the processed packet data that is stored in the cache block will not be used when the buffer in memory is re-allocated for storing other packet data, it would be wasteful to write the cache block in the cache back to the buffer in memory. Not performing a write operation to write the cache block back to memory reduces both the time taken for the write operation in the processor and the memory bandwidth to write the data to memory.
  • [0007]
    Accordingly, a network services processor includes an input/output bridge that avoids unnecessary memory updates when cache blocks storing processed packet data are no longer required, that is, buffers in memory (corresponding to the cache blocks in cache) are freed. Instead of writing the cache block back to memory, only the dirty bit for the selected cache block is cleared, thus avoiding these wasteful write-backs from cache to memory.
  • [0008]
    A network services processor includes a plurality of processors and a coherent shared memory. The coherent memory includes a cache and a memory and is shared by the plurality of processors. An input/output bridge is coupled to the plurality of processors and the cache. The input/output bridge monitors requests to free a buffer in memory (that is, a buffer that has been allocated for storing packet data) to avoid writing a modified cache block in the cache back to the buffer.
  • [0009]
    Upon detecting a request to free the block stored in cache memory, the input/output bridge issues a command to clear a dirty bit associated with the cache block. A cache controller may be coupled to the plurality of processors, the cache and the input/output bridge. The cache controller stores the dirty bit associated with the block and clears the dirty bit upon receiving the command from the input/output bridge.
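    The cache-controller side of the don't write back command can be sketched as below; the flat tag array and function name are simplifications assumed for illustration (a real controller indexes by set and compares ways in parallel):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tag entry modeling the dirty-bit state kept by the
 * cache controller. */
struct tag { uint64_t addr; bool valid; bool dirty; };

/* On a request to free a buffer, the bridge issues a command naming the
 * cache block; if the block is present and dirty, the controller clears
 * the dirty bit so the eventual eviction skips the write to memory.
 * Returns true when a future write-back was suppressed. */
bool dont_write_back(struct tag *tags, int ntags, uint64_t block_addr)
{
    for (int i = 0; i < ntags; i++) {
        if (tags[i].valid && tags[i].addr == block_addr && tags[i].dirty) {
            tags[i].dirty = false;
            return true;
        }
    }
    return false;  /* block absent or already clean: command is a no-op */
}
```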
  • [0010]
    The input/output bridge may also include a don't write back queue which stores commands to be issued to the cache controller. The input/output bridge may include a free queue that stores requests to free blocks to be added to a free pool. The network services processor may also include a plurality of processing units coupled to the input/output bridge. The input/output bridge stores packets to be transferred between the processing units and the coherent shared memory in which packets are stored for processing by the processors.
  • [0011]
    The network services processor may also include a memory allocator that provides free lists of blocks in shared coherent memory for storing received packets.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • [0012]
    The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention.
  • [0013]
    FIG. 1 is a block diagram of a security appliance including a network services processor according to the principles of the present invention;
  • [0014]
    FIG. 2 is a block diagram of the network services processor shown in FIG. 1;
  • [0015]
    FIG. 3 is a block diagram illustrating a Coherent Memory Bus (CMB) coupled to cores, L2 cache controller and Input/Output Bridge (IOB) and units for performing input and output packet processing coupled to the IOB through the IO bus;
  • [0016]
    FIG. 4 is a block diagram of the cache controller and L2 cache shown in FIG. 3;
  • [0017]
    FIG. 5 is a block diagram of the I/O Bridge (IOB) in the network services processor shown in FIG. 3; and
  • [0018]
    FIG. 6 illustrates the format of a pool free command to add a free address to a pool.
  • DETAILED DESCRIPTION OF THE INVENTION
  • [0019]
    A description of preferred embodiments of the invention follows.
  • [0020]
    FIG. 1 is a block diagram of a security appliance 102 including a network services processor 100 according to the principles of the present invention. The security appliance 102 is a standalone system that can switch packets received at one Ethernet port (Gig E) to another Ethernet port (Gig E) and perform a plurality of security functions on received packets prior to forwarding the packets. For example, the security appliance 102 can be used to perform security processing on packets received on a Wide Area Network prior to forwarding the processed packets to a Local Area Network. The network services processor 100 includes hardware packet processing, buffering, work scheduling, ordering, synchronization, and cache coherence support to accelerate packet processing tasks according to the principles of the present invention.
  • [0021]
    The network services processor 100 processes Open System Interconnection network L2-L7 layer protocols encapsulated in received packets. As is well-known to those skilled in the art, the Open System Interconnection (OSI) reference model defines seven network protocol layers (L1-7). The physical layer (L1) represents the actual physical and electrical interface that connects a device to a transmission medium. The data link layer (L2) performs data framing. The network layer (L3) formats the data into packets. The transport layer (L4) handles end to end transport. The session layer (L5) manages communications between devices, for example, whether communication is half-duplex or full-duplex. The presentation layer (L6) manages data formatting and presentation, for example, syntax, control codes, special graphics and character sets. The application layer (L7) permits communication between users, for example, file transfer and electronic mail.
  • [0022]
    The network services processor performs work (packet processing operations) for upper level network protocols, for example, L4-L7. The packet processing (work) to be performed on a particular packet includes a plurality of packet processing operations (pieces of work). The network services processor allows processing of upper level network protocols in received packets to be performed to forward packets at wire-speed. Wire-speed is the rate of data transfer of the network over which data is transmitted and received. By processing the protocols to forward the packets at wire-speed, the network services processor does not slow down the network data transfer rate.
  • [0023]
    The network services processor 100 includes a plurality of Ethernet Media Access Control interfaces with standard Reduced Gigabit Media Independent Interface (RGMII) connections to the off-chip PHYs 104 a, 104 b.
  • [0024]
    The network services processor 100 receives packets from the Ethernet ports (Gig E) through the physical interfaces PHY 104 a, 104 b, performs L7-L2 network protocol processing on the received packets and forwards processed packets through the physical interfaces 104 a, 104 b to another hop in the network or the final destination or through the PCI bus 106 for further processing by a host processor. The network protocol processing can include processing of network security protocols such as Firewall, Application Firewall, Virtual Private Network (VPN) including IP Security (IPSec) and/or Secure Sockets Layer (SSL), Intrusion Detection System (IDS) and Anti-virus (AV).
  • [0025]
    A DRAM controller in the network services processor 100 controls access to an external Dynamic Random Access Memory (DRAM) 108 that is coupled to the network services processor 100. The DRAM 108 stores data packets received from the PHYs interfaces 104 a, 104 b or the Peripheral Component Interconnect Extended (PCI-X) interface 106 for processing by the network services processor 100. In one embodiment, the DRAM interface supports 64 or 128 bit Double Data Rate II Synchronous Dynamic Random Access Memory (DDR II SDRAM) operating up to 800 MHz.
  • [0026]
    A boot bus 110 provides the necessary boot code which is stored in flash memory 112 and is executed by the network services processor 100 when the network services processor 100 is powered-on or reset. Application code can also be loaded into the network services processor 100 over the boot bus 110, from a device 114 implementing the Compact Flash standard, or from another high-volume device, which can be a disk, attached via the PCI bus.
  • [0027]
    The miscellaneous I/O interface 116 offers auxiliary interfaces such as General Purpose Input/Output (GPIO), Flash, IEEE 802 two-wire Management Interface (MDIO), Universal Asynchronous Receiver-Transmitters (UARTs) and serial interfaces.
  • [0028]
    The network services processor 100 includes another memory controller for controlling Low latency DRAM 118. The low latency DRAM 118 is used for Internet Services and Security applications allowing fast lookups, including the string-matching that may be required for Intrusion Detection System (IDS) or Anti Virus (AV) applications.
  • [0029]
    FIG. 2 is a block diagram of the network services processor 100 shown in FIG. 1. The network services processor 100 delivers high application performance using a plurality of processor cores 202.
  • [0030]
    In one embodiment, each processor core 202 is a dual-issue, superscalar processor with instruction cache 206, Level 1 data cache 204, and built-in hardware acceleration (crypto acceleration module) 200 for cryptography algorithms with direct access to low latency memory over the low latency memory bus 230.
  • [0031]
    The network services processor 100 includes a memory subsystem. The memory subsystem includes level 1 data cache memory 204 in each core 202, instruction cache in each core 202, level 2 cache memory 212, a DRAM controller 216 for access to external DRAM memory 108 (FIG. 1) and an interface 230 to external low latency memory.
  • [0032]
    The memory subsystem is architected for multi-core support and tuned to deliver both high-throughput and low-latency required by memory intensive content networking applications. Level 2 cache memory 212 and external DRAM memory 108 (FIG. 1) are shared by all of the cores 202 and I/O co-processor devices over a coherent memory bus 234. The coherent memory bus 234 is the communication channel for all memory and I/O transactions between the cores 202, an I/O Bridge (IOB) 232 and the Level 2 cache and controller 212.
  • [0033]
    Frequently used data values stored in DRAM 108 (FIG. 1) may be replicated for quick access in cache (L1 or L2). The cache stores the contents of frequently accessed locations in DRAM 108 (FIG. 1) and the address in DRAM where the contents are stored. If the cache stores the contents of an address in DRAM requested by a core 202, there is a “hit” and the data stored in the cache is returned. If not, there is a “miss” and the data is read directly from the address in DRAM 108 (FIG. 1).
  • [0034]
    A Free Pool Allocator (FPA) 236 maintains pools of pointers to free memory locations (that is, memory that is not currently used and is available for allocation) in DRAM 108 (FIG. 1). In one embodiment, the FPA unit 236 implements a bandwidth efficient (Last In First Out (LIFO)) stack for each pool of pointers.
  • [0035]
    In one embodiment, pointers submitted to the free pools are aligned on a 128 byte boundary and each pointer points to at least 128 bytes of free memory. The free size (number of bytes) of memory can differ in each pool and can also differ within the same pool. In one embodiment, the FPA unit 236 stores up to 2048 pointers. Each pool uses a programmable portion of these 2048 pointers, so higher priority pools can be allocated a larger amount of free memory. If a pool of pointers is too large to fit in the Free Pool Allocator (FPA) 236, the Free Pool Allocator (FPA) 236 builds a tree/list structure in level 2 cache 212 or DRAM using freed memory in the pool of pointers to store additional pointers.
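    The bandwidth-efficient LIFO stack of pointers described above can be sketched as a simple bounded stack; the spill-over of overflowing pools into a tree/list structure in L2 cache or DRAM is omitted, and all names are illustrative:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_CAPACITY 2048  /* on-chip pointer capacity in one embodiment */

/* Minimal LIFO free pool: pointers to 128-byte-aligned free buffers are
 * pushed when buffers are freed and popped when buffers are allocated. */
struct free_pool {
    uint64_t ptrs[POOL_CAPACITY];
    size_t   top;
};

bool pool_free(struct free_pool *p, uint64_t buf)  /* push a free buffer */
{
    if (p->top == POOL_CAPACITY || (buf & 127) != 0)
        return false;  /* pool full, or pointer not 128-byte aligned */
    p->ptrs[p->top++] = buf;
    return true;
}

bool pool_alloc(struct free_pool *p, uint64_t *buf)  /* pop a free buffer */
{
    if (p->top == 0)
        return false;  /* pool empty */
    *buf = p->ptrs[--p->top];
    return true;
}
```

    LIFO order is what makes the stack bandwidth efficient: the most recently freed buffer, whose cache blocks are most likely still resident, is the first one re-allocated.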
  • [0036]
    The I/O Bridge (IOB) 232 manages the overall protocol and arbitration and provides coherent I/O partitioning. The IOB 232 includes a bridge 238 and a Fetch and Add Unit (FAU) 240. The bridge 238 includes queues for storing information to be transferred between the I/O bus 262, coherent memory bus 234, and the IO units including the packet input unit 214 and the packet output unit 218. The bridge 238 also includes a Don't Write Back (DWB) engine 260 that monitors requests to free memory in order to avoid unnecessary cache updates to DRAM 108 (FIG. 1) when cache blocks are no longer required (that is, the buffers in memory are freed) by adding them to a free pool in the FPA unit 236. Prior to describing the operation of the bridge 238 in further detail, the IO units coupled to the IO bus 262 in the network services processor 100 will be described.
  • [0037]
    Packet Input/Output processing is performed by an interface unit 210 a, 210 b, a packet input unit (Packet Input) 214 and a packet output unit (PKO) 218. The input controller and interface units 210 a, 210 b perform all parsing of received packets and checking of results to offload the cores 202.
  • [0038]
    The packet input unit 214 allocates and creates a work queue entry for each packet. This work queue entry includes a pointer to one or more buffers (blocks) stored in L2 cache 212 or DRAM 108 (FIG. 1). The packet input unit 214 writes packet data into buffers in Level 2 cache 212 or DRAM 108 in a format that is convenient to higher-layer software executed in at least one processor core 202 for further processing of higher level network protocols. The packet input unit 214 supports a programmable buffer size and can distribute packet data across multiple buffers in DRAM 108 (FIG. 1) to support large packet input sizes.
  • [0039]
    A packet is received by any one of the interface units 210 a, 210 b through a SPI-4.2 or RGMII interface. A packet can also be received by the PCI interface 224. The interface unit 210 a, 210 b handles L2 network protocol pre-processing of the received packet by checking various fields in the L2 network protocol header included in the received packet. After the interface unit 210 a, 210 b has performed L2 network protocol processing, the packet is forwarded to the packet input unit 214. The packet input unit 214 performs pre-processing of L3 and L4 network protocol headers included in the received packet. The pre-processing includes checksum checks for Transmission Control Protocol (TCP)/User Datagram Protocol (UDP) (L4 network protocols).
  • [0040]
    The Packet order/work (POW) module (unit) 228 queues and schedules work (packet processing operations) for the processor cores 202. Work is defined to be any task to be performed by a core that is identified by an entry on a work queue. The task can include packet processing operations, for example, packet processing operations for L4-L7 layers to be performed on a received packet identified by a work queue entry on a work queue. The POW module 228 selects (i.e. schedules) work for a core 202 and returns a pointer to the work queue entry that describes the work to the core 202.
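    The work queuing and scheduling just described can be sketched as a FIFO of work queue entries; the tag-based synchronization and ordering that the POW module adds are omitted, and the entry fields and function names are assumptions for illustration:

```c
#include <stddef.h>
#include <stdint.h>

#define WQ_DEPTH 64  /* illustrative queue depth */

/* Hypothetical work queue entry: the primary descriptor for one piece
 * of work (e.g. L4-L7 processing of one received packet). */
struct work_entry {
    uint64_t buf_ptr;  /* pointer to the packet's first buffer */
    uint32_t tag;      /* flow tag used for ordering (unused here) */
};

struct work_queue {
    struct work_entry entries[WQ_DEPTH];
    unsigned head, tail;  /* head: next to schedule; tail: next free slot */
};

int pow_add_work(struct work_queue *q, struct work_entry e)
{
    if (q->tail - q->head == WQ_DEPTH)
        return -1;                        /* queue full */
    q->entries[q->tail++ % WQ_DEPTH] = e;
    return 0;
}

/* Schedules work for a core: returns a pointer to the work queue entry
 * describing the work, or NULL when no work is available. */
struct work_entry *pow_get_work(struct work_queue *q)
{
    if (q->head == q->tail)
        return NULL;
    return &q->entries[q->head++ % WQ_DEPTH];
}
```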
  • [0041]
    After the packet has been processed by the cores 202, a packet output unit (PKO) 218 reads the packet data stored in L2 cache 212 or memory (DRAM 108 (FIG. 1)), performs L4 network protocol post-processing (e.g., generates a TCP/UDP checksum), forwards the packet through the interface unit 210 a, 210 b and frees the L2 cache 212 or DRAM 108 locations used to store the packet by adding pointers to the locations in a pool in the FPA unit 236.
  • [0042]
    The network services processor 100 also includes application specific co-processors that offload the cores 202 so that the network services processor achieves high-throughput. The application specific co-processors include a DFA co-processor 244 that performs Deterministic Finite Automata (DFA) and a compression/decompression co-processor 208 that performs compression and decompression.
  • [0043]
    The Fetch and Add Unit (FAU) 240 is a 2 KB register file supporting read, write, atomic fetch-and-add, and atomic update operations. The PCI interface controller 224 has a DMA engine that allows the processor cores 202 to move data asynchronously between local memory in the network services processor and remote (PCI) memory in both directions.
  • [0044]
    FIG. 3 is a block diagram illustrating the Coherent Memory Bus (CMB) 234 coupled to the cores 202, L2 cache controller 212 and Input/Output Bridge (IOB) 232. FIG. 3 also illustrates IO units for performing input and output packet processing coupled to the IOB 232 through the IO bus 262. The CMB 234 is the communication channel for all memory and I/O transactions between the cores 202, the IOB 232 and the L2 cache controller and cache 212.
  • [0045]
    The CMB 234 includes four busses: ADD 300, STORE 302, COMMIT 304, and FILL 306. The ADD bus 300 transfers address and control information to initiate a CMB transaction. The STORE bus 302 transfers the store data associated with a transaction. The COMMIT bus 304 transfers control information that initiates transaction responses from the L2 cache. The FILL bus 306 transfers fill data (cache blocks) from the L2 cache controller and cache 212 to the L1 data cache 204 and reflection data for transfers from a core 202 to the I/O bus 262. The reflection data includes commands/results that are transferred between the I/O Bridge 232 and cores 202. The CMB 234 is a split-transaction highly pipelined bus. For an embodiment with a cache block size of 128 bytes, a CMB transaction transfers a cache block size at a time.
  • [0046]
    All of the busses in the CMB 234 are decoupled by queues in the L2 cache controller and cache 212 and the bridge 238. This decoupling allows for variable timing between the different operations required to complete different CMB transactions.
  • [0047]
    Memory requests to coherent memory space initiated by a core 202 or the IOB 232 are directed to the L2 cache controller 212. The IOB 232 initiates memory requests on behalf of I/O units coupled to the IO bus 262.
  • [0048]
    A fill transaction initiated by a core 202 replicates contents of a cache block in either L1 instruction cache 206 (FIG. 2) or L1 data cache 204 (FIG. 2). Once the core wins arbitration for the ADD bus 300, it puts control information (that is, the fill transaction) and the address of the cache block on the ADD bus 300. The L2 cache controller 212 receives the ADD bus information, and services the transaction by sending a fill indication on the COMMIT bus 304 and then transferring the cache block on the FILL bus 306.
  • [0049]
    A store transaction puts contents of a cache block stored in L1 instruction cache 206 (FIG. 2) or L1 data cache 204 (FIG. 2) into L2 cache. Once the initiator (core or IOB) wins arbitration for the ADD bus, it puts control information (store transaction), the address of the cache block and the number of transfers required on the ADD bus. The STORE bus cycles are scheduled later, after the STORE bus 302 is available. The store data is driven onto the STORE bus 302 by the cores or IOB 232. For an embodiment with a cache block size of 128 bytes and 128-bit octaword (16 byte) transfers, the number of cycles on the STORE bus 302 can range from one to eight to transfer an entire cache block. If a copy of the cache block is not stored in L1 data cache 204 in another core, no core data cache invalidation is required and the L2 cache controller 212 puts a commit operation on the COMMIT bus 304. The commit operation indicates that the store is visible to all users of the CMB at this time. If an out-of-date copy of the cache block resides in at least one L1 data cache 204 in a core 202, a commit/invalidation operation appears on the COMMIT bus 304, followed by an invalidation cycle on the FILL bus 306.
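    The one-to-eight cycle range above follows directly from the octaword arithmetic: the STORE bus moves 16 bytes (one 128-bit octaword) per cycle, so a transfer of 1 to 128 bytes of a cache block needs between one and eight cycles. A small helper makes the rounding explicit:

```c
/* Number of STORE bus cycles for a transfer of the given byte count,
 * with a 128-bit (16-byte) octaword moved per cycle. */
unsigned store_bus_cycles(unsigned bytes)
{
    return (bytes + 15) / 16;  /* round up to whole octawords */
}
```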
  • [0050]
    A Don't write back command issued by the IOB 232 results in control information and the address of the cache block placed on the ADD bus 300. The L2 cache controller 212 receives the ADD bus information and services the command by clearing a dirty bit in a tag associated with the cache block, if the cache block is present in the L2 cache. The L2 cache controller and cache 212 will be described later in conjunction with FIG. 4. By clearing the dirty bit in the tag associated with the cache block, a write of the cache block back to DRAM 108 (FIG. 1) is avoided. In a write-back cache, this write is avoided whenever the cache block is replaced in the L2 cache.
  • [0051]
    As already discussed in conjunction with FIG. 1 and FIG. 2, packets are received through any one of the interface units 210 a, 210 b or the PCI interface 224. The interface units 210 a, 210 b and packet input unit 214 perform parsing of received packets and check the results of the parsing to offload the cores 202. The interface unit 210 a, 210 b checks the L2 network protocol trailer included in a received packet for common exceptions. If the interface unit 210 a, 210 b accepts the packet, the Free Pool Allocator (FPA) 236 allocates memory for storing the packet data in L2 cache memory or DRAM 108 (FIG. 1) and the packet is stored in the allocated memory (cache or DRAM).
  • [0052]
    The packet input unit 214 includes a Packet Input Processing (PIP) unit 302 and an Input Packet Data unit (IPD) 400. The packet input unit 214 uses one of the pools of pointers in the FPA unit 236 to store received packet data in level 2 cache or DRAM.
  • [0053]
    The I/O busses include an inbound bus (IOBI) 308 and an outbound bus (IOBO) 310, a packet output bus (POB) 312, a PKO-specific bus (PKOB) 316 and an input packet data bus (IPDB) 314. The interface unit 210 a, 210 b places the 64-bit packet segments from the received packets onto the IOBI bus 308. The IPD 400 in the packet input unit 214 latches each 64-bit packet segment from the IOBI bus for processing. The IPD 400 accumulates the 64 bit packet segments into 128-byte cache blocks. The IPD 400 then forwards the cache block writes on the IPDB bus 314. The I/O Bridge 232 forwards the cache block write onto the Coherent Memory Bus (CMB) 234.
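    The accumulation step performed by the IPD can be sketched as follows; sixteen 64-bit segments fill one 128-byte cache block, at which point the block would be forwarded on the IPDB bus. The structure and function names are illustrative, not the hardware's:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Accumulator packing 64-bit packet segments latched from the IOBI bus
 * into a 128-byte cache block. */
struct ipd_accum {
    uint8_t  block[128];
    unsigned nsegs;        /* 64-bit segments collected so far (0..16) */
};

/* Returns true when the 16th segment completes a full cache block,
 * i.e. the block is ready to forward as a cache block write. */
bool ipd_push_segment(struct ipd_accum *a, uint64_t seg)
{
    memcpy(a->block + a->nsegs * 8, &seg, 8);
    if (++a->nsegs == 16) {
        a->nsegs = 0;      /* block full: forward and start a new one */
        return true;
    }
    return false;
}
```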
  • [0054]
    A work queue entry is added to a work queue by the packet input unit 214 for each packet arrival. The work queue entry is the primary descriptor that describes work to be performed by the cores. The Packet Order/Work (POW) unit 228 implements hardware work queuing, hardware work scheduling and tag-based synchronization and ordering to queue and schedule work for the cores.
  • [0055]
    FIG. 4 is a block diagram of the Level 2 cache controller and L2 cache 212 shown in FIG. 3. The Level 2 cache controller and L2 cache 212 includes an interface to the CMB 234 and an interface to the DRAM controller 216. In one embodiment, the CMB interface is 384 bits wide, the DRAM interface is 512 bits wide, and the internal cache data interfaces are 512 bits wide. The L2 cache in the L2 cache and controller 212 is shared by all of the cores 202 and the I/O units, although it can be bypassed using particular transactions on the CMB 234.
  • [0056]
    The L2 cache controller 212 also contains internal buffering and manages simultaneous in-flight transactions. The L2 cache controller 212 maintains copies of tags for L1 data cache 204 in each core 202 and initiates invalidations to the L1 data cache 204 in the cores 202 when other CMB sources update blocks in the L1 data cache.
  • [0057]
    In one embodiment, the L2 cache is 1 MB, 8-way set associative with a 128 byte cache block. In a set associative cache, a cache block read from memory can be stored in a restricted set of blocks in the cache. A cache block is first mapped to a set of blocks and can be stored in any block in the set. For example, in an 8-way set associative cache, there are eight blocks in a set of blocks and a 128 byte block in memory can be replicated in any block in the set of blocks in the cache. The cache controller includes an address tag for each block that stores the block address. The address tag is stored in the L2 tags 410.
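    For the geometry of this embodiment the set mapping is simple bit selection: 1 MB / 128 B = 8192 blocks, and 8192 blocks / 8 ways = 1024 sets, so ten address bits above the 7-bit block offset select the set and the remaining upper bits form the tag. A sketch, with names assumed for illustration:

```c
#include <stdint.h>

#define BLOCK_BITS 7     /* log2 of the 128-byte cache block */
#define SET_BITS   10    /* log2 of 1024 sets (1 MB / 128 B / 8 ways) */
#define NUM_SETS   (1u << SET_BITS)

/* Set index: address bits [16:7] for this geometry. */
uint32_t l2_set_index(uint64_t addr)
{
    return (uint32_t)((addr >> BLOCK_BITS) & (NUM_SETS - 1));
}

/* Tag: the address bits above the set index, stored in the L2 tags. */
uint64_t l2_tag(uint64_t addr)
{
    return addr >> (BLOCK_BITS + SET_BITS);
}
```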
  • [0058]
    The CMB 234 includes write-invalidate coherence support. The data cache 204 in each core is a write-through cache. The L2 cache is write-back and both the data stored in the L2 cache 612 and the tags stored in L2 tags 410 are protected by a Single Error Correction, Double Error Detection Error Correction Code (SECDED ECC).
  • [0059]
    The L2 cache controller 212 maintains memory reference coherence and returns the latest copy of a block for every fill request, whether the latest copy of the block is in the cache (L1 data cache 204 or L2 data cache 612), in DRAM 108 (FIG. 1) or in flight. The L2 cache controller 212 also stores a duplicate copy of the tags in duplicate tags 412 for each core's L1 data cache 204. The L2 cache controller 212 compares the addresses of cache block store requests against the data cache tags stored in the duplicate tags 412, and invalidates (both copies of) a data cache tag for a core 202 whenever the store is from another core 202 or from an IO unit coupled to the IO bus 262 (FIG. 2) via the IOB 232.
  • [0060]
    The L2 cache controller 212 has two memory input queues 602 that receive memory transactions from the ADD bus 300: one for transactions initiated by cores 202 and one for transactions initiated by the IOB 232.
  • [0061]
    The two queues 602 allow the L2 cache controller 212 to give the IOB memory transactions a higher priority than core transactions. The L2 cache controller 212 processes transactions from the queues 602 in one of two programmable arbitration modes, fixed priority or round-robin, allowing IOB transactions that service real-time packet transfers to be processed at a higher priority.
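    The two arbitration modes can be sketched as a simple two-input grant function (illustrative only; `arbitrate`, its arguments, and the mode names are assumptions, not the patent's logic):

```c
#include <stdbool.h>

enum arb_mode { ARB_FIXED_IOB_FIRST, ARB_ROUND_ROBIN };

/* Returns 0 to grant the core queue, 1 to grant the IOB queue.
 * `last` is the previously granted queue (used only by round-robin). */
static int arbitrate(enum arb_mode mode, bool core_ready, bool iob_ready, int last)
{
    if (core_ready && iob_ready) {
        if (mode == ARB_FIXED_IOB_FIRST)
            return 1;                 /* IOB transactions always win */
        return last == 1 ? 0 : 1;     /* round-robin: alternate grants */
    }
    return iob_ready ? 1 : 0;         /* only one requester: grant it */
}
```

    Fixed priority guarantees IOB transactions minimal queuing delay for real-time packet transfers; round-robin trades some of that latency for fairness toward core transactions.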
  • [0062]
    The L2 cache controller 212 also services CMB reflections, that is, non-memory transactions that are necessary to transfer commands and/or data between the cores and the IOB. The L2 cache controller 212 includes two reflection queues 604, 606 that store the ADD/STORE bus information to be reflected. Two different reflection queues are provided to avoid deadlock: reflection queue 604 stores reflections destined to the cores 202, and reflection queue 606 stores reflections destined to the IOB 232 over the FILL bus and COMMIT bus.
  • [0063]
    The L2 cache controller 212 can store and process up to 16 simultaneous memory transactions in its in-flight address buffer 610. The L2 cache controller 212 can also manage up to 16 in-flight cache victims, and up to four of these victims may reside in the victim data file 608. On a fill transaction, received data is returned from either the L2 cache or DRAM 108 (FIG. 1). The L2 cache controller 212 deposits data received on the STORE bus 302 into a file associated with the in-flight addresses 610. Stores can either update the cache 612 or be written-through to DRAM 108 (FIG. 1). Stores that write into the L2 data cache 612 do not require a DRAM fill to first read the old data in the block, if the store transaction writes the entire cache block.
  • [0064]
    All data movement transactions between the L2 cache controller 212 and the DRAM controller 216 are 128 byte, full-cache blocks. The L2 cache controller 212 buffers DRAM controller fills in one or both of two queues: in a DRAM-to-L2 queue 420 for data destined to be written to L2 cache 612, and in a DRAM-to-CMB queue 422 for data destined for the FILL bus 306. The L2 cache controller 212 buffers stores for the DRAM controller in the victim address/data files 414, 608 until the DRAM controller 216 accepts them.
  • [0065]
    The cache controller buffers all the COMMIT/FILL bus commands needed from each possible source: the two reflection queues 604, 606, fills from L2/DRAM 420, 422, and invalidates 416.
  • [0066]
    FIG. 5 is a block diagram of the I/O Bridge (IOB) 232 shown in FIG. 3. The I/O Bridge (IOB) 232 manages the overall protocol and arbitration and provides coherent I/O partitioning. The IOB 232 has three virtual busses: (1) I/O to I/O (request and response), (2) core to I/O (request) and (3) I/O to L2 Cache (request and response). The IOB also has separate PKO and IPD interfaces.
  • [0067]
    The IOB 232 includes twelve queues 500 a-l to store information to be transferred on different buses. There are six queues 500 a-f arbitrating to transfer on the ADD/STORE buses of the Coherent Memory Bus (CMB) 234 and five queues 500 g-k arbitrating to transfer on the IOBO bus. Another queue 500 l queues packet data to be transferred to the PKO 218 (FIG. 3).
  • [0068]
    As previously discussed, when a buffer in memory is added to a free pool in the FPA unit 236, that buffer may also be replicated in a cache block in cache (L1 data cache 204 in a core 202 or L2 cache 612). Furthermore, these cached blocks may store a more current version of the data than stored in the corresponding block in DRAM 108 (FIG. 1). That is, the cache blocks in cache may be “dirty”, signified by a dirty bit set in a tag associated with each cache block stored in L2 tags 410 (FIG. 4). As is well-known in the art, a “dirty” bit is a bit used to mark modified data stored in a cache so that the modification may be carried over to primary memory (DRAM 108 (FIG. 1)).
  • [0069]
    In a write-back cache, when dirty blocks are replaced in the cache, the dirty cache blocks are written back to DRAM to ensure that the data in the block stored in the DRAM is up-to-date. However, when the memory has just been freed, it will not be used until it is re-allocated for processing another packet, so it would be wasteful to write the cache blocks from the level 2 cache back to the DRAM. It is more efficient to clear the dirty bit for any of these blocks that are replicated in the cache, avoiding a later write of the ‘dirty’ cache blocks to DRAM.
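    The effect of clearing the dirty bit can be sketched as follows (a minimal illustration; the `l2_tag` structure and function names are assumptions for exposition, not the patent's hardware):

```c
#include <stdbool.h>
#include <stdint.h>

struct l2_tag {
    uint64_t addr;   /* block address                          */
    bool     valid;
    bool     dirty;  /* set by stores, cleared by a DWB        */
};

/* Eviction writes back to DRAM only if the block is still marked dirty. */
static bool needs_writeback(const struct l2_tag *t)
{
    return t->valid && t->dirty;
}

/* DWB for a freed buffer: on a tag hit, clear the dirty bit. The freed
 * block's contents are dead by definition, so no DRAM update is needed. */
static void dont_write_back(struct l2_tag *t, uint64_t block_addr)
{
    if (t->valid && t->addr == block_addr)
        t->dirty = false;
}
```

    After `dont_write_back` runs, a later eviction of the block sees a clean tag and silently drops the data instead of consuming DRAM write bandwidth.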
  • [0070]
    The core freeing the memory executes a store instruction to add the address to the pool of free buffers. The store instruction from the core is reflected through reflection queue 606 on FILL bus 306 of the CMB. The IOB 232 can create Don't Write Back (DWB) CMB commands as a result of the memory free command.
  • [0071]
    The DWB command results in a Don't Write Back (DWB) coherent memory bus transaction on the ADD bus 300 that results in clearing the dirty bit in the L2 tags 410, if the cache block is present in the L2 cache. This is an ADD-bus only transaction on the coherent memory bus. This architecture allows the DWB engine 260 to be separated from the free pool unit 236. In one embodiment, the DWB engine 260 resides nearer to the cache controller, so less bandwidth is required to issue the DWB commands on the coherent memory bus 234. The Don't write back operation is used to avoid unnecessary writebacks from the L2 cache to DRAM for free memory locations (that is, memory blocks (buffers) in a free memory pool available for allocation).
  • [0072]
    When a core 202 or I/O unit coupled to the IO bus 262 adds free memory to a pool in the FPA unit 236, it not only specifies the address of the free memory, but also specifies the number of cache blocks for which the DWB engine 260 can send DWB commands to the L2 cache controller. The core or I/O module need not initiate any DWB commands. Rather, the DWB engine 260 automatically creates the DWB commands when it observes the command to add free memory to a pool in the FPA unit 236.
  • [0073]
    The DWB engine 260 avoids unnecessary cache-to-memory updates by intercepting memory free requests destined for the free pool allocator (FPA) unit 236 when buffers that store processed packets, and that may be replicated in cache blocks, are freed. The IOB 232 intercepts memory free commands arriving from either the cores (via a reflection onto the COMMIT/FILL busses 304, 306) or from other IO units (via the IOBI bus 308). When the DWB engine 260 observes a memory free operation, it intercepts and queues the memory free operation. The free memory is not made available to the FPA unit 236 while the DWB engine 260 is sending DWB commands for the free memory. The DWB engine 260 then sends all necessary DWB commands for the free memory. After all of the DWB commands are completed/visible, the memory free operation continues by forwarding the request to the FPA unit 236.
  • [0074]
    The IOB 232 can buffer a limited number of the memory free commands inside the DWB engine 260. If buffering is available, the IOB holds the memory free request until the IOB 232 has finished issuing the CMB DWB commands through the DWB engine 260 to the L2 cache controller queue 500 e for the request, and then forwards the request to the FPA unit 236 (via the IOBO bus 310). It is optional for the IOB 232 to issue the DWB requests. Thus, if buffering is not available in the DWB engine 260, the DWB engine 260 does not intercept the memory free request; instead, the memory free request is forwarded directly to the FPA unit 236 and no DWB commands are issued.
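    The buffering decision above reduces to a simple routing choice per memory free request. This C sketch is illustrative only (the type and function names are assumptions, not the patent's design):

```c
#include <stdbool.h>

/* Outcome of the IOB's buffering decision for one memory free request. */
enum free_path {
    FREE_WITH_DWB,   /* held in the DWB engine; DWB commands issued,
                        then the request is forwarded to the FPA       */
    FREE_DIRECT      /* no buffering (or zero count): forwarded
                        straight to the FPA, no DWB commands issued    */
};

struct free_req {
    unsigned long long ptr;        /* pointer to the freed memory      */
    unsigned           dwb_count;  /* hint: cache blocks to DWB        */
};

static enum free_path route_free(struct free_req r, bool dwb_slot_available)
{
    if (dwb_slot_available && r.dwb_count > 0)
        return FREE_WITH_DWB;
    return FREE_DIRECT;            /* DWBs are an optimization, never
                                      required for correctness         */
}
```

    Because the DWB commands are purely an optimization, skipping them when the DWB engine is full is always safe: the only cost is the write-back bandwidth the commands would have saved.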
  • [0075]
    The memory free requests include a hint indicating the number of DWB Coherent Memory Bus (CMB) transactions that the IOB 232 can issue. Don't Write Back (DWB) commands are issued on the ADD bus 300 in the Coherent Memory Bus (CMB) 234 for free memory blocks so that DRAM bandwidth is not unnecessarily wasted writing the freed cache blocks back to DRAM. The DWB commands are queued on the DWB-to-L2C queue 500 e and result in the L2 cache controller 212 clearing the dirty bits for the selected blocks in the L2 tags 410 in the L2 cache memory controller, thus avoiding these wasteful write-backs to DRAM 108 (FIG. 1).
  • [0076]
    Returning to FIG. 4, the DWB command enters the “in flight address” structure 610. Eventually, it is selected to be sent to the L2 tags 410. The address in the DWB command is compared to the addresses stored in the L2 tags 410, and if the associated address is replicated in the cache (that is, there is a ‘hit’), the dirty bit in the L2 tag is cleared. If the associated address hits in a write-buffer entry in a write buffer in a core 202 (that is, the data has not yet been updated in the L2 cache), the write-buffer entry is invalidated. In this way, all memory updates for the cache block are voided.
  • [0077]
    No further processing of the address is performed; that is, the address is not checked against the copies of the L1 tags in the “Duplicate Tags” block 412 or against the victim address file 414, as would be the case for other in-flight addresses.
  • [0078]
    Returning to FIG. 5, the DWB engine 260 in the input/output bridge 232 waits to receive a commit from the L2 cache controller before it can pass the free request on to the FPA unit 236. The IOB bridges the address/data pair onto the IOBO bus, and the FPA unit 236 recognizes it and buffers the pointer to the available memory in the pool within the FPA unit 236. A DMA write access can be used to free up space in the pool within the FPA unit 236. The FPA unit 236 places the Direct Memory Access (DMA) address and data onto the IOBI bus (shown), which the IOB bridges onto the CMB 234.
  • [0079]
    FIG. 6 illustrates the format of a pool free command 600 to add a free address to a pool in the FPA unit 236. The pool free command 600 includes a subdid field 602 that stores the pool number in the FPA unit 236 to which the address is to be added, a pointer field 604 for storing a pointer to the free (available) memory, and a DWB count field 606 for storing a DWB count. The DWB count specifies the number of cache lines starting at the address stored in the pointer field 604 for which the IOB is to execute “don't write back” commands. A pool free command specifies the maximum number of DWBs to execute on the coherent memory bus 234.
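    FIG. 6 names the three fields of the pool free command (subdid, pointer, DWB count) but this section does not specify their bit positions, so the field widths and shifts in the following pack/unpack sketch are illustrative assumptions only:

```c
#include <stdint.h>

/* Assumed layout (for illustration only): bits 0-39 pointer,
 * bits 40-47 subdid (pool number), bits 48-63 DWB count. */
#define PTR_MASK     ((1ULL << 40) - 1)
#define SUBDID_SHIFT 40
#define DWBCNT_SHIFT 48

static uint64_t pack_pool_free(uint8_t subdid, uint64_t ptr, uint16_t dwb_count)
{
    return ((uint64_t)dwb_count << DWBCNT_SHIFT) |
           ((uint64_t)subdid    << SUBDID_SHIFT) |
           (ptr & PTR_MASK);
}

static uint64_t unpack_ptr(uint64_t cmd)        { return cmd & PTR_MASK; }
static uint8_t  unpack_subdid(uint64_t cmd)     { return (uint8_t)(cmd >> SUBDID_SHIFT); }
static uint16_t unpack_dwb_count(uint64_t cmd)  { return (uint16_t)(cmd >> DWBCNT_SHIFT); }
```

    Whatever the real encoding, the semantics are as stated above: the pointer locates the freed memory, the subdid selects the FPA pool, and the DWB count bounds how many cache lines the IOB may issue “don't write back” commands for.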
  • [0080]
    The DWB engine 260 in the IOB 232 starts issuing DWB commands for cache blocks starting at the beginning of the free memory identified by the pointer 604 and marches forward linearly. As the DWB commands consume bandwidth on the CMB, the DWB count should be selected so that DWB commands are only issued for cache blocks that may have been modified.
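    The linear march described above can be sketched as computing one target address per 128-byte cache block, starting at the (block-aligned) freed pointer. The function name and interface are illustrative, not the patent's implementation:

```c
#include <stdint.h>

#define CACHE_BLOCK_BYTES 128u

/* Compute the block addresses the DWB engine would target: start at the
 * freed buffer's pointer, aligned down to a cache block boundary, and
 * march forward one 128-byte block at a time, dwb_count times.
 * `out` must have room for dwb_count entries; returns the count. */
static unsigned dwb_targets(uint64_t ptr, unsigned dwb_count, uint64_t *out)
{
    uint64_t addr = ptr & ~(uint64_t)(CACHE_BLOCK_BYTES - 1); /* align */
    for (unsigned i = 0; i < dwb_count; i++)
        out[i] = addr + (uint64_t)i * CACHE_BLOCK_BYTES;
    return dwb_count;
}
```

    Since each address costs an ADD-bus transaction on the CMB, software keeps `dwb_count` no larger than the number of blocks in the buffer that may actually have been modified.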
  • [0081]
    While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.