WO1999046681A1 - Cache coherence unit for interconnecting multiprocessor nodes having pipelined snoopy protocol - Google Patents


Info

Publication number
WO1999046681A1
Authority: WO (WIPO, PCT)
Prior art keywords: coherence, memory, bus, cache, unit
Application number: PCT/US1999/005523
Other languages: French (fr)
Inventor: Wolf-Dietrich Weber
Original Assignee: Fujitsu Limited
Application filed by Fujitsu Limited
Priority to JP53870899A (published as JP2001525095A)
Publication of WO1999046681A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/0815 Cache consistency protocols
    • G06F 12/0817 Cache consistency protocols using directory methods
    • G06F 12/0828 Cache consistency protocols using directory methods with concurrent directory accessing, i.e. handling multiple concurrent coherency transactions
    • G06F 12/0831 Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
    • G06F 12/0813 Multiuser, multiprocessor or multiprocessing cache systems with a network or matrix configuration

Abstract

The present invention consists of a cache coherence protocol within a cache coherence unit for use in a data processing system. The data processing system is comprised of multiple nodes, each node having a plurality of processors with associated caches, a memory, and input/output. The processors within the node are coupled to a memory bus operating according to a 'snoopy' protocol. This invention includes a cache coherence protocol for a sparse directory in combination with the multiprocessor nodes. In addition, the invention has the following features: the current state and information from the incoming bus request are used to make an immediate decision on actions and next state; the decision mechanism for outgoing coherence is pipelined to follow the bus; and the incoming coherence pipeline acts independently of the outgoing coherence pipeline.

Description

CACHE COHERENCE UNIT FOR
INTERCONNECTING MULTIPROCESSOR NODES
HAVING PIPELINED SNOOPY PROTOCOL
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is related to co-pending U.S. Patent Application Serial No. 09/003,7771, entitled "Memory Protection Mechanism For A Distributed Shared
Memory Multiprocessor With Integrated Message Passing Support," filed on January
7, 1998; and co-pending U.S. Provisional Patent Application Serial No. 60/084,795, entitled "Cache Coherence Unit For Interconnecting Multiprocessor Nodes Having A
Sparse Directory With Enhanced Replacement Control," filed on May 8, 1998; which
are hereby incorporated by reference.
BACKGROUND OF THE INVENTION
1. Field of Invention
The present invention generally relates to cache coherence for multiprocessor data processing systems, and more particularly to cache coherence for a plurality of
multiprocessor nodes, each node having a snoopy bus protocol.
2. Discussion of the Background Art
Multiprocessor architectures are classified according to types of address space
and memory organization. Address space architecture classifications are based upon
the mechanism by which processors communicate. Processors communicate either by
explicit messages sent directly from one processor to another or by access through shared-memory address space. The first classification is called a message passing
architecture while the second is a shared-memory architecture.
Memory organization is classified as centralized or distributed. In a
centralized organization memory system, the entire memory is located concentrically
or symmetrically with respect to each processor in the system. Thus, each processor
has equivalent access to a given memory location. In a distributed organization
system, on the other hand, each processor within the multiprocessor system has an
associated memory that is physically located near the processor; furthermore, every
processor has the capability of directly address its own memory as well as the remote
memories of the other processors. A distributed, shared-memory system is known as
a distributed shared-memory (DSM) or a non-uniform memory access (NUMA)
architecture. DSM architecture provides a single shared address space to the programmer where all memory locations may be accessed by every processor. As
there is no need to distribute data or explicitly communicate data between the processors in software, programming a parallel machine is simpler in a DSM model. In addition, by dynamically partitioning the work, DSM architecture makes it easier to balance the computational load between processors. Finally, as
shared memory is the model provided on small-scale multiprocessors, DSM
architecture facilitates the portability of programs parallelized for a small system to a
larger shared-memory system. In contrast, in a message-passing system, the
programmer is responsible for partitioning all shared data and managing communication of any updates.
The prior art provides numerous examples of DSM architectures. However,
such systems communicate through high bandwidth buses or switching networks, and shared-memory access increases data latency. Latency, defined as the time required to access a memory location within the computer, is the bottleneck impeding system performance in multiprocessor systems. Latency is decreased in DSM systems
by memory caching and hardware cache-coherence.
Caching involves placing high-speed memory adjacent to a processor where
the cache is hardware rather than software controlled. The cache holds data and
instructions that are frequently accessed by the processor. A cache system capitalizes
on the fact that programs exhibit temporal and spatial locality in their memory
accesses. Temporal locality refers to the propensity of a program to again access a
location that was recently accessed, while spatial locality refers to the tendency of a
program to access variables at locations near those that were recently accessed.
Cache latency is typically several times less than that of main system memory.
Lower latency results in improved speed of the computer system. Caching is
especially important in multiprocessor systems, where memory latency is higher because the systems are physically larger, but caching does introduce coherence problems between the independent caches. In a multiprocessor system, it becomes necessary to ensure that when a processor requests data from memory, the processor receives the most up-to-date copy of the data to maintain cache coherence.
Protocols incorporated in hardware have been developed to maintain cache coherence. Most small-scale multiprocessor systems maintain cache coherence with a
snoopy protocol. This protocol relies on every processor monitoring (or "snooping")
all requests to memory. Each cache independently determines if accesses made by
another processor require an update. Snoopy protocols are usually built around a
central bus (a snoopy bus). Snoopy bus protocols are very common, and many small-scale systems utilizing snoopy protocols are commercially available.
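The snooping behavior described above can be sketched in a few lines. This is an illustrative simulation only, assuming a simple write-invalidate policy with made-up class names and a reduced modified/shared state set; it is not the patent's implementation.

```python
# Illustrative write-invalidate snooping sketch. Class names and the
# two-state (M/S) simplification are assumptions, not the patent's design.

class SnoopyCache:
    def __init__(self):
        self.lines = {}  # address -> state: 'M' (modified) or 'S' (shared)

    def read(self, addr, bus):
        if addr not in self.lines:
            bus.broadcast('read', addr, self)   # every other cache snoops this
            self.lines[addr] = 'S'

    def write(self, addr, bus):
        bus.broadcast('write', addr, self)      # others invalidate their copies
        self.lines[addr] = 'M'

    def snoop(self, op, addr):
        if op == 'write' and addr in self.lines:
            del self.lines[addr]                # invalidate on a remote write

class Bus:
    def __init__(self, caches):
        self.caches = caches

    def broadcast(self, op, addr, requester):
        for cache in self.caches:
            if cache is not requester:
                cache.snoop(op, addr)
```

Two caches sharing a line and one cache then writing it demonstrates the invalidation: the writer ends up with the only (modified) copy.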
To increase the processing power of computer systems, manufacturers have
attempted to add more processing units to existing systems. When connecting additional microprocessors to the main bus to help share the workload, processing
power is added linearly to the system while maintaining the cost-performance of the
uni-processor. In such systems, however, bus bandwidth becomes the limiting factor
in system performance since performance decreases rapidly with an increase in the
number of processors.
In order to overcome the scaling problem of bus-based cache coherence
protocols, directory-based protocols have been designed. In directory based systems,
the state of each memory line is kept in a directory. The directory is distributed with
memory such that the state of a memory line is attached to the memory where that line
lives. The caches are kept coherent by a point-to-point cache coherence protocol
involving the memory system and all the processor caches.
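A per-line directory entry of the kind described above might be sketched as follows. The field and function names are hypothetical; the source does not specify a layout.

```python
# Hypothetical sketch of a directory-based protocol's per-line state:
# the entry lives with the memory that owns the line and records which
# nodes hold copies, enabling point-to-point coherence traffic.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DirectoryEntry:
    state: str = 'uncached'                    # 'uncached', 'shared', 'dirty'
    sharers: set = field(default_factory=set)  # node ids with clean copies
    owner: Optional[int] = None                # node id with the dirty copy

directory = {}  # memory line address -> DirectoryEntry

def record_read(addr, node):
    entry = directory.setdefault(addr, DirectoryEntry())
    entry.sharers.add(node)
    if entry.owner is None:
        entry.state = 'shared'

def record_write(addr, node):
    entry = directory.setdefault(addr, DirectoryEntry())
    to_invalidate = entry.sharers - {node}     # point-to-point invalidations
    entry.sharers = set()
    entry.owner, entry.state = node, 'dirty'
    return to_invalidate
```

Because the entry lists exactly the caching nodes, a write triggers invalidations only to those nodes rather than a bus-wide broadcast.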
U.S. Patent No. 5,029,070 to McCarthy et al. discloses a method for maintaining cache coherence by storing a plurality of cache coherency status bits with each addressable line of data in the caches. McCarthy et al. specifically rejects storing the plurality of cache coherency status bits in the global memory. A plurality of state
lines are hardwired to the bus master logic and bus monitor logic in each cache. The
state lines are ORed so that all the states of all the same type of cache coherency bits
in every cache except for the line undergoing a cache miss appear on the state line.
This allows the bus master to rapidly determine if any other cache has a copy of the line being accessed because of a cache miss.
U.S. Patent No. 5,297,269 to Donaldson et al. discloses a system for point-to- point cache coherence in a multi-node processing system where the coherency is
maintained by each main memory module through a memory directory resident on the
individual memory module. The memories and nodes are coupled together by means of a cross bar switch unit coupled point-to-point to one or more main memory
modules. The memory directory of each main memory module contains a plurality of
coherency state fields for each data block within the module. Each main memory
module maintains the coherency between nodes. The module queries its own directory upon each data transfer operation that affects the coherency state of a data
block.
Sequent (T. Lovett and R. Clapp, "StiNG: A CC-NUMA Computer System for
the Commercial Marketplace," Proceedings of the 23rd International Symposium on
Computer Architecture, pages 308-317, May 1996) and Data General (R. Clark and K.
Alnes, "An SCI Interconnect Chipset and Adapter," Symposium Record, Hot Interconnects IV, pages 221-235, August 1996) disclose machines that interconnect
multiple quad Pentium Pro nodes into a single shared-memory system. These two
systems both utilize an SCI-based interconnect, a micro-coded controller, and a large
per-node cache. The use of the SCI coherence protocol prevents close coupling of the inter-node coherence mechanism to the intra-node (snoopy) coherence mechanisms.
The mismatch between the two protocols requires the use of a large L3 (node-level)
cache to store the coherence tag information required by the SCI protocol, to correct
the mismatch of cache line size, and to adapt the coherence abstraction presented by
the processing node to that required by SCI. In addition, the complexity of the SCI
coherence protocol invariably leads to programmable implementations that are unable
to keep up with the pipeline speed of the processor bus, and that can only process one
request at a time. The result is a coherence controller that is large, expensive, and slow.
What is needed is an inter-node coherence mechanism that is simple, fast, and well-matched to the pipelined snoopy protocol. Such a mechanism can be very tightly
coupled to the processor bus and can thus achieve higher performance at lower cost.
SUMMARY OF THE INVENTION
This invention includes the cache coherence protocol for a sparse directory in combination with multiprocessor nodes, each node having a memory and a data bus
operating under a pipelined snoopy bus protocol. In addition, the invention has the
following features: the current state and information from the incoming bus request
are used to make an immediate decision on actions and next state; the decision
mechanism for outgoing coherence is pipelined to follow the bus; and the incoming coherence pipeline acts independently of the outgoing coherence pipeline.
The invention implements the cache coherence protocol within a cache
coherence unit for use in a data processing system. The data processing system is
comprised of multiple nodes, each node having a plurality of processors with associated caches, a memory, and input/output. The processors within the node are
coupled to a memory bus operating according to a "snoopy" protocol.
Multiple nodes are coupled together using an interconnection network, with
the mesh coherence unit acting as a bridge between the processor/memory bus and the
interconnection network. The mesh coherence unit is attached to the
processor/memory bus and the interconnection network. In addition, it has a
coherence directory attached to it. This directory keeps track of the state information
of the cached memory locations of the node memory. The mesh coherence unit follows bus transactions on the processor/memory bus, looks up cache coherence state
in the directory, and exchanges messages with other mesh coherence units across the
interconnection network as required to maintain cache coherence.
The invention incorporates close coupling of the pipelined snoopy bus to the
sparse directory. In addition, the invention incorporates dual coherence pipelines. 8 The purpose of having dual coherence pipelines is to be able to service network
requests and bus requests at the same time in order to increase performance. Finally,
the invention incorporates a coherence protocol where all protocol interactions have
clearly defined beginnings and endings. All protocol interactions on a given line are ended before a new interaction on that line may begin. This process is
achieved by the invention keeping track of all the transient states within the system.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1A is a block diagram illustrating a multiprocessor system having a
plurality of nodes connected via a mesh coherence unit to an interconnect;
FIG. 1B is a block diagram of a node of FIG. 1A;
FIG. 2A is a block diagram of an individual node of FIG. 1A in further detail;
FIG. 2B is a block diagram of a P6 segment of FIG. 2A;
FIG. 3 is a block diagram illustrating one embodiment of a shared memory site of FIG. 1A;
FIG. 4 illustrates the state transitions of the cache coherency protocol during a
remote read miss;
FIG. 5 illustrates the state transitions of the cache coherency protocol for a
remote write miss with clean copies;
FIG. 6 is a block diagram of the mesh coherence unit; and
FIG. 7 is a block diagram illustrating the relationship of the TRAT, ROB, NI,
UP, and DP.
DETAILED DESCRIPTION OF THE INVENTION
Referring to FIGS. 1A and 1B, a plurality of nodes 100 are coupled to
Interconnect 110, which enables nodes 100 to share information either by a message-
passing mechanism, a shared-memory mechanism, or a hybrid of the two mechanisms.
In the preferred embodiment, up to four nodes 100 are coupled together in a shared-
memory mechanism to create a shared-memory site 120. The nodes are connected to
Interconnect 110 via a Mesh Coherence Unit 130. Each Mesh Coherence Unit 130 is
coupled to an associated Sparse Directory 140.
Referring now to FIGS. 2A and 2B, node 200 of the cluster shown in FIG. 1A
is shown. Four processors 210, together with their associated caches 220, are coupled
to memory bus 230. In the present embodiment, memory bus 230 operates according
to a snoopy bus protocol. In addition, associated memory 240 and input/output 250 of
the processors are attached to bus 230. In the preferred embodiment, Quad-P6
segment (P6 segment) 260 contains standard high volume Intel processor-based SMP
nodes made up of four Pentium® Pro processors, up to 1 GByte of DRAM, and two
PCI buses for attaching I/O. The P6 segment 260 is shown in FIG. 2B.
The P6 segment 260 maintains coherency within the segment by the use of a
snoopy bus protocol. Within P6 segment 260, each associated cache 220 snoops or
monitors all transactions with main memory 240 by all other caches 220 within
segment 260. In this way, each cache 220 within the P6 segment 260 is aware of all
memory lines 270 within memory 240 that are transferred from main memory 240 to a
cache 220. Mesh Coherence Unit (MCU) 130 is coupled to both P6 segment 260 via memory bus
230 and to Interconnect 110. All inter-node communication is passed through MCU
130. P6 segment 260, together with MCU 130, makes up current node 200. MCU
130 coordinates the flow of instructions and data between current node 200 and the
other nodes 100 connected to Interconnect 110. MCU 130 maintains the cache
coherence between nodes 100 within shared-memory site 120 and extends the P6 bus
functions over the mesh to connect multiple nodes 100 together. Nodes 100, together
with the Interconnect 110, make up a cluster. The nodes within a cluster, both within
shared memory sites 120 and those outside the sites, may be located physically close
to one another or may be distributed at a distance. Within a site 120, coherency is
maintained between nodes 100 by the MCU 130 and coherency is maintained within
node 200 by the standard P6 snoopy bus protocol. In the preferred embodiment,
coherency within site 120 is maintained by hardware; however, those familiar with the
art will recognize that site coherency could also be maintained in software or
firmware.
As will be described in detail below, MCU 130 maintains inter-node
coherency by implementing a directory-based, cache coherence protocol. MCU 130
keeps track of cache lines accessed by remote nodes 100 and the cache line status
within current node 200 with sparse directory 140. The MCU 130 also supports
Message Passing and Memory Copy between nodes both inter- and intra-site. In the
preferred embodiment, MCU 130 is a single custom CMOS chip. Sparse coherence
directory 140 is stored in standard, off-the-shelf SRAM chips. In the present invention, only three 1 Mbit chips are required for a total directory size of less than
0.5 MByte. A comparable full-directory design with 6 bits of state for every cache
line would require 24 MByte of storage per GByte of main memory.
Now referring to FIG. 3, one embodiment of the present invention is shown.
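The storage comparison quoted above can be checked with simple arithmetic. The calculation assumes 32-byte cache lines, a figure not stated in this passage, though it is the only line size consistent with the quoted 24 MByte result.

```python
# Checking the directory storage figures above. The 32-byte cache line
# size is an assumption; the 6 bits of state per line comes from the text.
GBYTE = 1 << 30
MBYTE = 1 << 20
line_size = 32                                # bytes per cache line (assumed)
state_bits = 6                                # bits of state per line

lines_per_gbyte = GBYTE // line_size          # 33,554,432 lines per GByte
full_dir_bytes = lines_per_gbyte * state_bits // 8
print(full_dir_bytes // MBYTE)                # -> 24 MByte per GByte, as quoted

# The sparse directory, by contrast, fits in three 1 Mbit SRAM chips:
sparse_bytes = 3 * (1 << 20) // 8             # 1 Mbit = 2**20 bits
print(sparse_bytes / MBYTE)                   # -> 0.375, less than 0.5 MByte
```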
Cache coherence within the system is maintained on two levels. On one level, cache
coherence is maintained within individual nodes 200 by a snoopy bus protocol. On
the second level, MCU 130 maintains cache coherence between nodes 100 by using an
invalidating, directory-based cache coherence protocol. Memory line 1900 is one
cache line's worth of data stored in memory 1910 (240) at a particular address. Home
node 1920 is the particular node where a cached memory line physically resides. A
memory line 1900 can be cached locally 1921 at the home node 1920 or remotely in
one or more processor caches 1922 or 1923. When a memory line 1900 is cached
remotely, the line is either unmodified (clean) 1923 or modified (dirty) 1922. Owner
node 1930 is the particular node that has control of memory line 1900 with the ability
to update the line. The owner node 1930 is said to own a dirty copy 1922 of the
memory line 1900. A remote node 1950 is said to have a clean copy 1923 of the
memory line 1900 and is referred to as a "sharer" node. A local node 1940 is the node
200 where a current memory request originates. Any node that is not the home or
local node is called a remote node 1950.
Each MCU 1911 (130) maintains a directory to track its own memory lines
1900 in use by remote nodes 1930, 1940, 1950. The home node directory 1960 (140)
tracks those nodes 200 that have requested a read-only copy 1923 of the cache line
1900 so that when a node wants to update a line 1900, the directory 1960 (140) knows
which nodes to selectively invalidate. In addition, the directory 1960 (140) tracks the
node, if any, that owns the memory line 1900. The home node 1920 knows all the
remote nodes 1950 within the shared-memory site 120 that have requested a read-only
copy of the home node's 1920 memory line 1900, and also the remote node 1930 that has requested write access to the memory line 1900. When home node 1920 must
communicate with remote nodes 1930, 1950, it does so selectively based upon which
nodes have accessed the line rather than to all nodes within site 120. Thus, the directory-based cache coherence protocol achieves coherence by point-to-point
communication between nodes 100 rather than broadcast invalidation.
When processor 1941 requests memory line 1900, its cache 1942 is checked to
determine whether the data is present in the cache. If processor 1941 finds the data in
cache 1942 (a "cache hit"), cache 1942 transfers the data to processor 1941. If
processor 1941 does not find the data in cache 1942 (a "cache miss"), the processor
issues a request onto P6 memory bus 230 for the data. The bus request includes the
real address of memory line 1900 and the identification of home node 1920 for
memory line 1900. If memory line 1900 resides on a different node, MCU 1943 (130)
of local node 1940 generates a network request for the cache line 1900 across
Interconnect 110 to home node 1920. When the MCU at home node 1911 (130)
receives the network request, it stages the request onto home node's memory bus 1912
(230). The network request then snoops bus 1912 (230) to determine if memory line
1900 is cached locally 1921. If not cached, the network request obtains control of
memory line 1900 in the node's memory 1910 (240). Thus, the action of MCU 1911
(130) at home node 1920 is like the action of a local processor 210 accessing memory
line 1900 within node 200. Depending on the status of memory line 1900, MCU 1911
(130) will take the appropriate action to give the requested memory line 1900 to local
node 1940. If local node 1940 has requested an update of line 1900, local node 1940
becomes the owner node; if local node 1940 only requests read access, it becomes a
sharer. Only one node 200 may be the owner of a memory line 1900 at any one time. When an owner node 1930 finishes accessing a memory line 1900, the owner
node 1930 must notify the home node 1920 that it no longer requires access to line
1900. If owner node 1930 has modified the data, line 1900 must be updated at home
node 1920. This process is termed a "writeback" of the data.
If home node 1920 needs to regain control of memory line 1900, line 1922
may be flushed back to home node 1920 or flushed forward to another remote node
1940 if the remote node requires access. Owner node 1930, upon receiving a flush
back request, returns the control of memory line 1900 to home node 1920. Upon
receiving a flush forward request, owner node 1930 transfers memory line 1900 to a
remote node 1940, which becomes the new owner node.
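The flush-back and flush-forward responses just described can be sketched as a small dispatch function. Function and message names here are illustrative, not the patent's terminology.

```python
# Illustrative sketch of an owner node's response to flush requests.
# Names and message strings are assumptions, not the patent's terms.

def handle_flush(owner_state, request, new_owner=None):
    """owner_state: {'line': addr, 'dirty': bool}. Returns (messages, owner)."""
    if request == 'flush_back':
        # Owner returns control of the line to the home node; a dirty
        # line also requires a writeback of the modified data.
        msgs = ['writeback'] if owner_state['dirty'] else ['ack']
        return msgs, 'home'
    if request == 'flush_forward':
        # Owner transfers the line directly to another remote node,
        # which becomes the new owner.
        return ['data_to_' + str(new_owner)], new_owner
    raise ValueError('unknown flush request: ' + request)
```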
Referring now to FIG. 4, the state transitions upon a read of a memory line 270
are illustrated. When a remote read by a processor 210 results in a cache miss, the
local node 600 must obtain memory line 270 from home node 610. Local node 600
first determines that memory line 270 is not in cache 220 (read miss). Local MCU
130 requests a fetch 601 from home node 610 for memory line 270 using a fetch
request message. If memory line 270 is not dirty (owned by another node), home
node 610 returns with a data message 602.
Referring now to FIG. 5, the state transitions on write requests by a remote
node to the home node, where one or more remote nodes have cached read-only
copies of the requested line, are illustrated. In this transition, only read-only copies are
cached and no node has ownership of the line at the beginning of the sequence. After
a cache miss, local node 720 sends the fetch exclusive request 721 to home node 730.
Home node 730 sends back the data and ownership of line 270 to local node 720 via
the data exclusive message 731. Simultaneously, home node 730 sends invalidation
line 270. Remote nodes 740 immediately acknowledge the receipt of the invalidation
741 when the invalidation is received in Mesh Interface 350 of the remote nodes 740.
When home node 730 receives acknowledgments 741 from all remote nodes 740 with
cached copies, home node 730 notifies local node 720 with the done message 733. At
this point, local node 720 is the owner of line 270.
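The FIG. 5 exchange can be summarized as an ordered message trace. The message names follow the text; the function itself is an illustrative sketch, not the hardware protocol engine.

```python
# Illustrative trace of the FIG. 5 write-miss sequence: fetch-exclusive,
# data-exclusive, parallel invalidations with acknowledgments, then done.

def write_miss(local, home, sharers):
    trace = []
    trace.append((local, home, 'fetch_exclusive'))   # after the cache miss
    # Home returns data and ownership, and in parallel invalidates the
    # remote nodes holding cached read-only copies.
    trace.append((home, local, 'data_exclusive'))
    for sharer in sharers:
        trace.append((home, sharer, 'invalidate'))
        trace.append((sharer, home, 'inv_ack'))      # acked on receipt
    # Only after all acknowledgments does home notify local with done;
    # at that point local is the owner of the line.
    trace.append((home, local, 'done'))
    return trace
```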
Referring now to FIG. 6, a block diagram of the Mesh Coherence Unit (MCU)
130 is shown. The Pipe Control (PC) 310 arbitrates for control of P6 memory bus 230
during different bus phases of Up Pipe (UP) 320 and Down Pipe (DP) 330. Requests
that arrive from P6 bus 230 are turned into network requests by DP 330 and are then
sent to Queue Interface (QI) 340. The network requests are dispatched over the mesh
by Mesh Interface (MI) 350. Remote requests from the mesh are received by MI 350
and passed to Network Interface (NI) 360, which generates either bus requests that are
passed to the UP 320 or network requests that are passed to the QI 340. UP 320
requests P6 bus 230 arbitration from PC 310, and UP 320 sources the requests onto
the P6 bus 230 once arbitration has been won by the PC 310. The Exported Real
Address Table (ERAT) 370 maintains cache coherence information while Temporary
Real Address Table (TRAT) 380 and Remote Outstanding Operations Buffer (ROB)
390 keep track of cache lines in transition.
ERAT 370, TRAT 380, and ROB 390 together make up the sparse directory
140. The sparse directory 140 is a set associative directory. The specific functions of
directory 140 are discussed in detail below.
Down Pipe (DP) 330 transforms the P6 bus requests into network requests.
DP 330 maintains coherence during a P6 bus request in three phases: the request phase, the snoop phase, and the response phase. During the request phase, DP 330 records the P6 request phase signal in a history queue. The history queue holds up to
eight outstanding requests on the bus awaiting a reply and nine data transfer requests.
The request is held in the queue until the DP 330 is ready to process the request.
DP 330 processes the bus request during the snoop phase. During this phase,
the bus request address is sent to ERAT 370 for lookup and the ERAT data bus is
switched for an ERAT read. The bus requests are translated into network requests with the use of a Coherence Action Table. The network requests are prepared for
output to QI 340 and the ERAT 370, ROB 390, and TRAT 380 entries are updated as
required.
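The request-phase/snoop-phase staging described above might be sketched as a bounded queue feeding a lookup table. The eight-entry depth comes from the text, while the class shape and the Coherence Action Table contents are assumptions.

```python
# Sketch of the DP's staging: bus requests are recorded in a bounded
# history queue at request phase and translated into network requests at
# snoop phase via a Coherence Action Table. Shape and names are assumed.
from collections import deque

HISTORY_DEPTH = 8   # "up to eight outstanding requests" per the text

class DownPipe:
    def __init__(self, coherence_action_table):
        self.history = deque(maxlen=HISTORY_DEPTH)
        self.table = coherence_action_table  # bus request -> network request

    def request_phase(self, bus_request):
        self.history.append(bus_request)     # held until DP can process it

    def snoop_phase(self):
        if not self.history:
            return None
        request = self.history.popleft()
        return self.table.get(request, 'no_action')
```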
The Queue Interface (QI) 340 creates the packets for transfer to the remote
nodes and passes the completed packets to Mesh Interface (MI) 350. QI 340 receives
a request or command from either DP 330 or NI 360. QI 340 stores the request in an
output queue. If the request contains data, QI 340 must hold the request in abeyance
until it receives the associated data from DP 330. Once QI 340 receives the data, QI
340 consolidates the request and data together into a single packet. The transfer of
data usually occurs sometime after QI 340 receives the request. Requests to transfer
data are always bus 230 initiated requests, while requests initiated by NI 360 are
command requests only.
NI 360 receives packets from Interconnect 110 through MI 350. NI 360 decodes the
packet and determines whether to send the P6 messages to UP 320, send mesh
messages to QI 340, or to update ROB 390 and TRAT 380 entries. The detailed
operation of NI 360 is discussed below.
Up Pipe (UP) 320 receives bus requests from NI 360. UP 320 requests bus arbitration from Pipe Control (PC) 310. Once PC 310 notifies UP 320 that arbitration
has been won, UP 320 sources the bus request onto P6 memory bus 230. Depending
upon the request received from NI 360, UP 320 may also deallocate ROB 390 or
TRAT 380 entries for the incoming transient request.
Referring now to FIG. 7, the interaction between ROB 390/TRAT 380
combination and UP 320, DP 330, and NI 360 is shown. When NI 360 receives a
network request, it reads ROB 390/TRAT 380 entry for the request to determine the
request's current transition state. NI 360 updates or deallocates the ROB 390/TRAT
380 entry as required by the request. During the snoop phase as described above, DP
330 reads the ROB 390/TRAT 380 entries to determine the current transient state of a
given memory line. If no ROB 390/TRAT 380 entry exists for a given line, and the
ROB 390/TRAT 380 is not full, DP 330 may allocate an entry within ROB 390/TRAT
380. Similarly, UP 320 reads the ROB 390/TRAT 380 for a given memory line. UP
320 may deallocate the entry in TRAT 380 depending upon the response that is
received from NI 360. UP 320, NI 360, and DP 330 are each pipelined to handle multiple bus transactions or network requests at the same time. Through the use of
the ROB 390 and TRAT 380, as described above, the UP 320, NI 360, and DP 330
coordinate their actions to operate independently and in parallel when the addresses of
the request they are processing are independent. In this manner, a high-performance,
coherence protocol processing engine is achieved.
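The allocate/deallocate discipline on the ROB/TRAT entries can be sketched as a small table that serializes transactions per line while leaving unrelated lines free to proceed in parallel. The capacity check and state strings are illustrative assumptions.

```python
# Illustrative transient-state table in the spirit of the ROB/TRAT:
# an entry pins a line while its transaction is in flight, so a second
# transaction on the same line must wait, while other lines proceed.

class TransientTable:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}   # line address -> transient state

    def allocate(self, addr, state):
        if addr in self.entries or len(self.entries) >= self.capacity:
            return False    # line already in transition, or table full
        self.entries[addr] = state
        return True

    def deallocate(self, addr):
        # The transaction on this line has ended; a new one may begin.
        self.entries.pop(addr, None)
```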
The invention has been explained above with reference to a preferred
embodiment. Other embodiments will be apparent to those skilled in the art in light
of this disclosure. For example, the present invention may readily be implemented
using configurations other than those described in the preferred embodiment above. Additionally, the present invention may effectively be used in combination with systems other than the one described above as the preferred embodiment. Therefore, these and other variations upon the preferred embodiments are intended to be covered by the present invention, which is limited only by the appended claims.

Claims

WHAT IS CLAIMED IS:
1. A data processing system comprising: a plurality of multiprocessor nodes;
a plurality of coherence units coupled to said multiprocessor nodes; and
a node interconnect coupled to said coherence units.
2. A cache coherence unit for use in a data processing system having multiple
nodes each having a plurality of processors coupled to a memory bus operating
according to a snoopy protocol, each processor having an associated cache memory
for caching information, said cache coherence unit having an address phase, a snoop
phase, and a response phase, and comprising:
a bus interface element coupled to said memory bus;
a coherence control element coupled to said bus interface element and coupled to said cache memory; and a directory coupled to said coherence control element for storing information
of locations in said cache memory.
3. The cache coherence unit as in claim 2, wherein said coherence control
element reads state information from said directory after said address phase and
during said snoop phase, and updates said state information in said directory before a
next snoop phase.
4. The coherence unit of claim 2 further comprising a protocol decision pipeline
coupled to said coherence control element for processing requests for cache memory from a remote node.
5. The coherence unit of claim 2 wherein said directory is a sparse directory.
6. The coherence unit of claim 2 wherein said directory has dual ports.
7. The cache coherence unit of claim 2, wherein said directory stores state
information characterizing a current transient state and a next transient state.
8. The cache coherence unit of claim 7, wherein said coherence control element
updates said information characterizing a current transient state.
9. The cache coherence unit of claim 7, wherein said coherence control element
updates said information characterizing a current transient state and then updates said
information characterizing a next transient state.
10. A method for maintaining cache coherence in a cache coherence unit in a data processing system having multiple nodes each having a plurality of processors coupled to a memory bus operating according to a snoopy protocol having an address
phase, a snoop phase, and a response phase, each processor having an associated
cache memory, comprising the steps of:
coupling a bus interface element to said memory bus;
coupling a coherence control element to said bus interface element and to said cache memory; and
storing information of locations in said cache memory in a directory
coupled to said coherence control element.
11. The method of claim 10, wherein said coupling step further comprises:
reading state information from said directory after said address phase and
before said snoop phase; and
updating said state information before a next snoop phase.
12. The cache coherence unit of claim 10, wherein said coherence control element
further comprises:
means for reading state information from said directory after said address
phase and before said snoop phase; and
means for updating said state information before a next snoop phase.
13. The cache coherence unit of claim 11, wherein said coherence control element further comprises:
means for updating information characterizing a current transient state;
means for reading information characterizing a next transient state; and means for updating said information characterizing a next transient state after
said means for updating information characterizing a current transient state
is completed.
14. A data processing system comprising:
a first node having a first memory coupled to a first memory bus, a plurality of processors coupled to said first memory bus, each processor having a
respective cache and said memory bus operated according to a snoopy bus protocol for maintaining coherence between said caches, said snoopy bus
protocol having an address phase, a snoop phase, and a response phase;
a second node having a second memory coupled to a second memory bus, a
plurality of processors coupled to said second memory bus, each processor
having a respective cache and said memory bus operated according to said
snoopy bus protocol; a first internode communication unit coupled to said first memory bus and having a first directory for indicating states of cached blocks of said first
and second memories and a coherence control element coupled to said first
bus and coupled to said first directory for reading state information from
said first directory after said address phase and before said snoop phase
and updating said state information before a next snoop phase; and
a second internode communication unit coupled to said second memory bus
and coupled to said first internode communication unit, said first memory
being accessible to said second node and said second memory being
accessible to said first node, and said second coherence unit having a
second directory for indicating states of cached blocks of said second memory and a coherence control element coupled to said second bus and
coupled to said second directory for reading state information from said
second directory after said address phase and before said snoop phase and
updating said state information before a next snoop phase.
15. The data processing system of claim 14 further comprising a plurality of
nodes.
16. The data processing system of claim 14 wherein said first and second
directories are sparse directories.
17. The data processing system of claim 14, wherein said first and second
internode communication units further comprise:
a mesh interface unit; a network interface unit coupled to said mesh interface unit; and a transfer agent coupled to said network interface unit, said memory buses, and
said directories.
18. The data processing system of claim 17, wherein said network interface unit
further comprises: a data transfer unit coupled to an interconnect unit and to said transfer agent
for transferring data and control requests from said interconnect unit to
said transfer agent; and a coherence action unit coupled to said interconnect unit and said directory for
reading state information from said directory.
19. A method of maintaining cache coherency within a data processing system
having a plurality of nodes and an interconnect, each node having a plurality of multiprocessors with associated caches and memory, a memory bus controlled by a snoopy bus protocol, a directory for storing state information of cached memory lines,
and a mesh interface controlled by a mesh protocol, comprising the steps of:
receiving a bus request from said memory bus for control of said
cached memory line;
reading said state information of said cached line; updating said state information of said cache line;
transforming said bus request into a network request; and
forwarding said network request to said mesh interface.
20. The method of claim 19, further comprising the steps of: receiving a network request from said mesh interface;
arbitrating with said memory bus for scheduling said request onto said
memory bus;
passing said network request to a transfer agent; and
transferring said network requests from said transfer agent to said memory bus.
21. A cache coherence unit within a data processing system having a plurality of nodes, each node having a plurality of processors coupled to a memory bus operating
according to a snoopy bus protocol, each processor having an associated cache, and
coupled to an interconnect, said cache coherence unit comprising:
a control element for arbitrating with said memory bus;
a coherence control unit coupled to said control element for maintaining coherence of cached memory locations of bus requests received from said
memory bus;
a directory coupled to said coherence control unit for storing state information
of said cached memory locations; a first interface unit coupled to said coherence control unit for transforming
said bus requests into mesh-out requests;
a mesh interface coupled to said first interface unit for transferring said mesh-
out requests to said interconnect; a second interface unit coupled to said mesh interface for receiving mesh-in
requests from said interconnect; and
a transfer agent coupled to said second interface unit and coupled to said
memory bus for transferring said mesh-in requests from said second
interface unit to said memory bus.
22. The cache coherence unit of claim 21, wherein said coherence control unit
further comprises: a first agent coupled to said first interface unit for transforming said bus requests into
mesh-out requests; and
a second agent coupled to said first interface for transferring said mesh-out
requests to said interconnect.
23. The cache coherence unit of claim 21, wherein said coherence control unit
further comprises: a third agent coupled to said first interface for transferring data from said
memory bus to said interconnect.
24. The cache coherence unit of claim 21, wherein said second interface unit
further comprises:
a first input agent for mapping incoming network requests from said remote
nodes to bus transactions; and
a second input agent for handling incoming mesh replies and dealing with all
mesh-to-bus data transfers.
25. The cache coherence unit of claim 21, wherein said second interface unit
further comprises: a data transfer unit coupled to said mesh interface and said data storage unit;
and a coherence action unit coupled to said mesh interface unit and said directory.
26. A cache coherence unit within a data processing system having a plurality of
nodes and an interconnect, each node having a plurality of multiprocessors with associated caches and memory, a memory bus controlled by a snoopy bus protocol, a
directory for storing state information of cached memory lines, and a mesh interface
controlled by a mesh protocol, comprising:
a control unit coupled to said memory bus for receiving bus requests;
means for reading said state information of said cached line;
means for updating said state information of said cache line; and a mesh interface for forwarding said bus requests to said interconnect.
27. The cache coherence unit of claim 26, further comprising:
means for receiving network requests from said interconnect;
a bus arbitration unit coupled to said memory bus for scheduling said network
requests onto said memory bus; means for passing said network request to a transfer agent; and
means for transferring said network requests from said transfer agent to said
memory bus.
28. A method for maintaining cache coherence in a multi-node data processing system, comprising the steps of:
receiving a request for a cache line from a local node;
determining the status of said cache line; and
sending a copy of said cache line to said local node.
29. The method of claim 28, further comprising the steps of: invalidating the status of remote copies of said cache line; and
notifying said local node that transfer of said cache line is complete.
30. The method of claim 28, further comprising the steps of: requesting a copy of said cache line from a home node; and
receiving a copy of said cache line from said home node.
31. A system for maintaining cache coherence in a multi-node data processing system, comprising: means for receiving a request for a cache line from a local node;
means for determining the status of said cache line; and
means for sending a copy of said cache line to said local node.
32. The system of claim 31, further comprising:
means for invalidating the status of remote copies of said cache line; and
means for notifying said local node that transfer of said cache line is complete.
33. The system of claim 31, further comprising:
means for requesting a copy of said cache line from a home node; and
means for receiving a copy of said cache line from said home node.
PCT/US1999/005523 1998-03-12 1999-03-12 Cache coherence unit for interconnecting multiprocessor nodes having pipilined snoopy protocol WO1999046681A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP53870899A JP2001525095A (en) 1998-03-12 1999-03-12 Cache coherence unit for interconnected multiprocessor nodes with pipelined snoopy protocol

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/041,568 US6631448B2 (en) 1998-03-12 1998-03-12 Cache coherence unit for interconnecting multiprocessor nodes having pipelined snoopy protocol
US09/041,568 1998-03-12

Publications (1)

Publication Number Publication Date
WO1999046681A1 true WO1999046681A1 (en) 1999-09-16

Family

ID=21917213

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/005523 WO1999046681A1 (en) 1998-03-12 1999-03-12 Cache coherence unit for interconnecting multiprocessor nodes having pipilined snoopy protocol

Country Status (3)

Country Link
US (1) US6631448B2 (en)
JP (1) JP2001525095A (en)
WO (1) WO1999046681A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1652091A1 (en) * 2003-08-05 2006-05-03 Newisys, Inc. Methods and apparatus for providing early responses from a remote data cache
US10042804B2 (en) 2002-11-05 2018-08-07 Sanmina Corporation Multiple protocol engine transaction processing

Families Citing this family (81)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6581126B1 (en) * 1996-12-20 2003-06-17 Plx Technology, Inc. Method, system and apparatus for a computer subsystem interconnection using a chain of bus repeaters
US6560681B1 (en) * 1998-05-08 2003-05-06 Fujitsu Limited Split sparse directory for a distributed shared memory multiprocessor system
JP3676934B2 (en) * 1998-12-15 2005-07-27 株式会社日立製作所 Processors and multiprocessor systems
US7529799B2 (en) * 1999-11-08 2009-05-05 International Business Machines Corporation Method and apparatus for transaction tag assignment and maintenance in a distributed symmetric multiprocessor system
US6697919B2 (en) * 2000-06-10 2004-02-24 Hewlett-Packard Development Company, L.P. System and method for limited fanout daisy chaining of cache invalidation requests in a shared-memory multiprocessor system
US8635410B1 (en) * 2000-07-20 2014-01-21 Silicon Graphics International, Corp. System and method for removing data from processor caches in a distributed multi-processor computer system
US20030131201A1 (en) * 2000-12-29 2003-07-10 Manoj Khare Mechanism for efficiently supporting the full MESI (modified, exclusive, shared, invalid) protocol in a cache coherent multi-node shared memory system
US6615319B2 (en) * 2000-12-29 2003-09-02 Intel Corporation Distributed mechanism for resolving cache coherence conflicts in a multi-node computer architecture
FR2820850B1 (en) * 2001-02-15 2003-05-09 Bull Sa CONSISTENCY CONTROLLER FOR MULTIPROCESSOR ASSEMBLY, MODULE AND MULTIPROCESSOR ASSEMBLY WITH MULTIMODULE ARCHITECTURE INCLUDING SUCH A CONTROLLER
WO2002069238A2 (en) * 2001-02-24 2002-09-06 International Business Machines Corporation Managing coherence via put/get windows
US6883070B2 (en) * 2001-03-14 2005-04-19 Wisconsin Alumni Research Foundation Bandwidth-adaptive, hybrid, cache-coherence protocol
US6745272B2 (en) * 2001-04-04 2004-06-01 Advanced Micro Devices, Inc. System and method of increasing bandwidth for issuing ordered transactions into a distributed communication system
US7222220B2 (en) * 2001-05-01 2007-05-22 Sun Microsystems, Inc. Multiprocessing system employing address switches to control mixed broadcast snooping and directory based coherency protocols transparent to active devices
US6799217B2 (en) * 2001-06-04 2004-09-28 Fujitsu Limited Shared memory multiprocessor expansion port for multi-node systems
US6598120B1 (en) * 2002-03-08 2003-07-22 International Business Machines Corporation Assignment of building block collector agent to receive acknowledgments from other building block agents
US7107409B2 (en) * 2002-03-22 2006-09-12 Newisys, Inc. Methods and apparatus for speculative probing at a request cluster
US7653790B2 (en) * 2002-05-13 2010-01-26 Glasco David B Methods and apparatus for responding to a request cluster
US7395379B2 (en) * 2002-05-13 2008-07-01 Newisys, Inc. Methods and apparatus for responding to a request cluster
US7266587B2 (en) * 2002-05-15 2007-09-04 Broadcom Corporation System having interfaces, switch, and memory bridge for CC-NUMA operation
US7296121B2 (en) * 2002-11-04 2007-11-13 Newisys, Inc. Reducing probe traffic in multiprocessor systems
US7003633B2 (en) * 2002-11-04 2006-02-21 Newisys, Inc. Methods and apparatus for managing probe requests
US7103726B2 (en) * 2002-11-04 2006-09-05 Newisys, Inc. Methods and apparatus for managing probe requests
US7346744B1 (en) 2002-11-04 2008-03-18 Newisys, Inc. Methods and apparatus for maintaining remote cluster state information
US6934814B2 (en) * 2002-11-05 2005-08-23 Newisys, Inc. Cache coherence directory eviction mechanisms in multiprocessor systems which maintain transaction ordering
US7162589B2 (en) * 2002-12-16 2007-01-09 Newisys, Inc. Methods and apparatus for canceling a memory data fetch
US7334089B2 (en) * 2003-05-20 2008-02-19 Newisys, Inc. Methods and apparatus for providing cache state information
US6973548B1 (en) * 2003-06-20 2005-12-06 Unisys Corporation Data acceleration mechanism for a multiprocessor shared memory system
US7337279B2 (en) * 2003-06-27 2008-02-26 Newisys, Inc. Methods and apparatus for sending targeted probes
US8028130B1 (en) * 2003-07-22 2011-09-27 Oracle America, Inc. Pipeline structure for a shared memory protocol
US7383464B2 (en) * 2003-12-08 2008-06-03 International Business Machines Corporation Non-inline transaction error correction
US8468308B2 (en) * 2004-01-20 2013-06-18 Hewlett-Packard Development Company, L.P. System and method for non-migratory requests in a cache coherency protocol
US20050160238A1 (en) * 2004-01-20 2005-07-21 Steely Simon C.Jr. System and method for conflict responses in a cache coherency protocol with ordering point migration
US8145847B2 (en) * 2004-01-20 2012-03-27 Hewlett-Packard Development Company, L.P. Cache coherency protocol with ordering points
US7769959B2 (en) 2004-01-20 2010-08-03 Hewlett-Packard Development Company, L.P. System and method to facilitate ordering point migration to memory
US7620696B2 (en) * 2004-01-20 2009-11-17 Hewlett-Packard Development Company, L.P. System and method for conflict responses in a cache coherency protocol
US7818391B2 (en) 2004-01-20 2010-10-19 Hewlett-Packard Development Company, L.P. System and method to facilitate ordering point migration
US8090914B2 (en) * 2004-01-20 2012-01-03 Hewlett-Packard Development Company, L.P. System and method for creating ordering points
US8176259B2 (en) 2004-01-20 2012-05-08 Hewlett-Packard Development Company, L.P. System and method for resolving transactions in a cache coherency protocol
US7395374B2 (en) * 2004-01-20 2008-07-01 Hewlett-Packard Company, L.P. System and method for conflict responses in a cache coherency protocol with ordering point migration
US20050193177A1 (en) * 2004-03-01 2005-09-01 Moga Adrian C. Selectively transmitting cache misses within coherence protocol
US20060047849A1 (en) * 2004-06-30 2006-03-02 Mukherjee Shubhendu S Apparatus and method for packet coalescing within interconnection network routers
US7698278B2 (en) * 2004-08-31 2010-04-13 Red Hat, Inc. Method and system for caching directory services
US8010682B2 (en) * 2004-12-28 2011-08-30 International Business Machines Corporation Early coherency indication for return data in shared memory architecture
JP4956900B2 (en) * 2005-03-07 2012-06-20 富士通株式会社 Address snoop method and multiprocessor system
JP4362454B2 (en) * 2005-04-07 2009-11-11 富士通株式会社 Cache coherence management device and cache coherence management method
US20070083715A1 (en) * 2005-09-13 2007-04-12 International Business Machines Corporation Early return indication for return data prior to receiving all responses in shared memory architecture
US7536514B2 (en) * 2005-09-13 2009-05-19 International Business Machines Corporation Early return indication for read exclusive requests in shared memory architecture
US20080010321A1 (en) * 2006-06-20 2008-01-10 International Business Machines Corporation Method and system for coherent data correctness checking using a global visibility and persistent memory model
US7506108B2 (en) * 2006-06-30 2009-03-17 Intel Corporation Requester-generated forward for late conflicts in a cache coherency protocol
US7536515B2 (en) * 2006-06-30 2009-05-19 Intel Corporation Repeated conflict acknowledgements in a cache coherency protocol
US7721050B2 (en) * 2006-06-30 2010-05-18 Intel Corporation Re-snoop for conflict resolution in a cache coherency protocol
US7600080B1 (en) * 2006-09-22 2009-10-06 Intel Corporation Avoiding deadlocks in a multiprocessor system
JP4868246B2 (en) * 2007-09-12 2012-02-01 エヌイーシーコンピュータテクノ株式会社 Multiprocessor and memory replacement method
US8275947B2 (en) * 2008-02-01 2012-09-25 International Business Machines Corporation Mechanism to prevent illegal access to task address space by unauthorized tasks
US8200910B2 (en) * 2008-02-01 2012-06-12 International Business Machines Corporation Generating and issuing global shared memory operations via a send FIFO
US8255913B2 (en) * 2008-02-01 2012-08-28 International Business Machines Corporation Notification to task of completion of GSM operations by initiator node
US8484307B2 (en) * 2008-02-01 2013-07-09 International Business Machines Corporation Host fabric interface (HFI) to perform global shared memory (GSM) operations
US8239879B2 (en) * 2008-02-01 2012-08-07 International Business Machines Corporation Notification by task of completion of GSM operations at target node
US8214604B2 (en) * 2008-02-01 2012-07-03 International Business Machines Corporation Mechanisms to order global shared memory operations
US20130268663A1 (en) * 2010-12-28 2013-10-10 Mitsubishi Electric Corporation Communication network system
JP5375876B2 (en) * 2011-05-19 2013-12-25 富士通株式会社 Multiprocessor system
WO2013165343A1 (en) * 2012-04-30 2013-11-07 Hewlett-Packard Development Company, L.P. Hidden core to fetch data
US10268583B2 (en) * 2012-10-22 2019-04-23 Intel Corporation High performance interconnect coherence protocol resolving conflict based on home transaction identifier different from requester transaction identifier
US20140114928A1 (en) * 2012-10-22 2014-04-24 Robert Beers Coherence protocol tables
DE112013003723B4 (en) * 2012-10-22 2018-09-13 Intel Corporation High performance physical coupling structure layer
US9256537B2 (en) * 2013-02-14 2016-02-09 International Business Machines Corporation Coherent attached processor proxy supporting coherence state update in presence of dispatched master
EP2979170B1 (en) 2013-03-28 2020-07-08 Hewlett-Packard Enterprise Development LP Making memory of compute and expansion blade devices available for use by an operating system
US10289467B2 (en) 2013-03-28 2019-05-14 Hewlett Packard Enterprise Development Lp Error coordination message for a blade device having a logical processor in another system firmware domain
CN105103121B (en) 2013-03-28 2018-10-26 慧与发展有限责任合伙企业 The subregion of blade system and blade system executes method
US20150254182A1 (en) * 2014-03-07 2015-09-10 Cavium, Inc. Multi-core network processor interconnect with multi-node connection
US9529532B2 (en) 2014-03-07 2016-12-27 Cavium, Inc. Method and apparatus for memory allocation in a multi-node system
US9372800B2 (en) 2014-03-07 2016-06-21 Cavium, Inc. Inter-chip interconnect protocol for a multi-chip system
US10592459B2 (en) 2014-03-07 2020-03-17 Cavium, Llc Method and system for ordering I/O access in a multi-node environment
US9411644B2 (en) 2014-03-07 2016-08-09 Cavium, Inc. Method and system for work scheduling in a multi-chip system
CN107077429B (en) * 2015-03-20 2019-10-18 华为技术有限公司 Method for reading data, equipment and system
US10467139B2 (en) * 2017-12-29 2019-11-05 Oracle International Corporation Fault-tolerant cache coherence over a lossy network
US11714755B2 (en) 2020-07-31 2023-08-01 Hewlett Packard Enterprise Development Lp System and method for scalable hardware-coherent memory nodes
US11573898B2 (en) 2020-08-17 2023-02-07 Hewlett Packard Enterprise Development Lp System and method for facilitating hybrid hardware-managed and software-managed cache coherency for distributed computing
US11366750B2 (en) * 2020-09-24 2022-06-21 EMC IP Holding Company LLC Caching techniques
CN113703958B (en) * 2021-07-15 2024-03-29 山东云海国创云计算装备产业创新中心有限公司 Method, device, equipment and storage medium for accessing data among multi-architecture processors
CN116962259B (en) * 2023-09-21 2024-02-13 中电科申泰信息科技有限公司 Consistency processing method and system based on monitoring-directory two-layer protocol

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0817076A1 (en) * 1996-07-01 1998-01-07 Sun Microsystems, Inc. A multiprocessing computer system employing local and global address spaces and multiple access modes
EP0820016A2 (en) * 1996-07-01 1998-01-21 Sun Microsystems, Inc. A multiprocessing system including an enhanced blocking mechanism for read-to-share-transactions in a NUMA mode

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5029070A (en) 1988-08-25 1991-07-02 Edge Computer Corporation Coherent cache structures and methods
US5297269A (en) 1990-04-26 1994-03-22 Digital Equipment Company Cache coherency protocol for multi processor computer system
US5448698A (en) 1993-04-05 1995-09-05 Hewlett-Packard Company Inter-processor communication system in which messages are stored at locations specified by the sender
US6490630B1 (en) * 1998-05-08 2002-12-03 Fujitsu Limited System and method for avoiding deadlock in multi-node network
US6715008B2 (en) * 1998-05-08 2004-03-30 Fujitsu Ltd. Method and system for over-run protection in a message passing multi-processor computer system using a credit-based protocol
US6625694B2 (en) * 1998-05-08 2003-09-23 Fujitsu Ltd. System and method for allocating a directory entry for use in multiprocessor-node data processing systems


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LOVETT T ET AL: "STING: A CC-NUMA COMPUTER SYSTEM FOR THE COMMERCIAL MARKETPLACE", PROCEEDINGS OF THE 23RD. ANNUAL SYMPOSIUM ON COMPUTER ARCHITECTURE, PHILADELPHIA, MAY 22 - 24, 1996, no. SYMP. 23, 22 May 1996 (1996-05-22), ASSOCIATION FOR COMPUTING MACHINERY, pages 308 - 317, XP000679364, ISBN: 0-89791-786-3 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10042804B2 (en) 2002-11-05 2018-08-07 Sanmina Corporation Multiple protocol engine transaction processing
EP1652091A1 (en) * 2003-08-05 2006-05-03 Newisys, Inc. Methods and apparatus for providing early responses from a remote data cache
EP1652091A4 (en) * 2003-08-05 2008-10-29 Newisys Inc Methods and apparatus for providing early responses from a remote data cache

Also Published As

Publication number Publication date
US20010013089A1 (en) 2001-08-09
JP2001525095A (en) 2001-12-04
US6631448B2 (en) 2003-10-07

Similar Documents

Publication Publication Date Title
US6631448B2 (en) Cache coherence unit for interconnecting multiprocessor nodes having pipelined snoopy protocol
JP3644587B2 (en) Non-uniform memory access (NUMA) data processing system with shared intervention support
JP3661761B2 (en) Non-uniform memory access (NUMA) data processing system with shared intervention support
US6615319B2 (en) Distributed mechanism for resolving cache coherence conflicts in a multi-node computer architecture
KR100548908B1 (en) Method and apparatus for centralized snoop filtering
KR100465583B1 (en) Non-uniform memory access(numa) data processing system that speculatively forwards a read request to a remote processing node and communication method in the system
EP0817073B1 (en) A multiprocessing system configured to perform efficient write operations
US6738868B2 (en) System for minimizing directory information in scalable multiprocessor systems with logically independent input/output nodes
US5878268A (en) Multiprocessing system configured to store coherency state within multiple subnodes of a processing node
US7657710B2 (en) Cache coherence protocol with write-only permission
CA2280172C (en) Non-uniform memory access (numa) data processing system that holds and reissues requests at a target processing node in response to a retry
KR100324975B1 (en) Non-uniform memory access(numa) data processing system that buffers potential third node transactions to decrease communication latency
US20110029738A1 (en) Low-cost cache coherency for accelerators
US6266743B1 (en) Method and system for providing an eviction protocol within a non-uniform memory access system
KR101072174B1 (en) System and method for implementing an enhanced hover state with active prefetches
US6226718B1 (en) Method and system for avoiding livelocks due to stale exclusive/modified directory entries within a non-uniform access system
CN106201939A (en) Multinuclear catalogue concordance device towards GPDSP framework
US6813694B2 (en) Local invalidation buses for a highly scalable shared cache memory hierarchy
US10489292B2 (en) Ownership tracking updates across multiple simultaneous operations
US6826654B2 (en) Cache invalidation bus for a highly scalable shared cache memory hierarchy
US20040030950A1 (en) Apparatus for imprecisely tracking cache line inclusivity of a higher level cache
US6636948B2 (en) Method and system for a processor to gain assured ownership of an up-to-date copy of data

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

ENP Entry into the national phase

Ref country code: JP

Ref document number: 1999 538708

Kind code of ref document: A

Format of ref document f/p: F

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase