TITLE: COMMUNICATION ERROR REPORTING MECHANISM IN A MULTIPROCESSING COMPUTER SYSTEM
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to the field of multiprocessor computer systems and, more particularly, to communication error reporting mechanisms in multiprocessor computer systems.
2. Description of the Relevant Art
Multiprocessing computer systems include two or more processors which may be employed to perform computing tasks. A particular computing task may be performed upon one processor while other processors perform unrelated computing tasks. Alternatively, components of a particular computing task may be distributed among multiple processors to decrease the time required to perform the computing task as a whole. Generally speaking, a processor is a device configured to perform an operation upon one or more operands to produce a result. The operation is performed in response to an instruction executed by the processor.
A popular architecture in commercial multiprocessing computer systems is the symmetric multiprocessor (SMP) architecture. Typically, an SMP computer system comprises multiple processors connected through a cache hierarchy to a shared bus. Additionally connected to the bus is a memory, which is shared among the processors in the system. Access to any particular memory location within the memory occurs in a similar amount of time as access to any other particular memory location. Since each location in the memory may be accessed in a uniform manner, this structure is often referred to as a uniform memory architecture (UMA).
Processors are often configured with internal caches, and one or more caches are typically included in the cache hierarchy between the processors and the shared bus in an SMP computer system. Multiple copies of data residing at a particular main memory address may be stored in these caches. In order to maintain the shared memory model, in which a particular address stores exactly one data value at any given time, shared bus computer systems employ cache coherency. Generally speaking, an operation is coherent if the effects of the operation upon data stored at a particular memory address are reflected in each copy of the data within the cache hierarchy. For example, when data stored at a particular memory address is updated, the update may be supplied to the caches which are storing copies of the previous data. Alternatively, the copies of the previous data may be invalidated in the caches such that a subsequent access to the particular memory address causes the updated copy to be transferred from main memory. For shared bus systems, a snoop bus protocol is typically employed. Each coherent transaction performed upon the shared bus is examined (or "snooped") against data in the caches. If a copy of the affected data is found, the state of the cache line containing the data may be updated in response to the coherent transaction.

Unfortunately, shared bus architectures suffer from several drawbacks which limit their usefulness in multiprocessing computer systems. A bus is capable of a peak bandwidth (e.g., a number of bytes per second which may be transferred across the bus). As additional processors are attached to the bus, the bandwidth required to supply the processors with data and instructions may exceed the peak bus bandwidth. Since some processors are forced to wait for available bus bandwidth, performance of the computer system suffers when the bandwidth requirements of the processors exceed the available bus bandwidth.
Additionally, adding more processors to a shared bus increases the capacitive loading on the bus and may even cause the physical length of the bus to be increased. The increased capacitive loading and extended bus length increase the delay in propagating a signal across the bus. Due to the increased propagation delay, transactions may take longer to perform. Therefore, the peak bandwidth of the bus may decrease as more processors are added. These problems are further magnified by the continued increase in operating frequency and performance of processors. The increased performance enabled by the higher frequencies and more advanced processor microarchitectures results in higher bandwidth requirements than previous processor generations, even for the same number of processors. Therefore, buses which previously provided sufficient bandwidth for a multiprocessing computer system may be insufficient for a similar computer system employing the higher performance processors.

Another approach for implementing multiprocessing computer systems is a scalable shared memory (SSM) architecture (also referred to as a distributed shared memory architecture). An SSM architecture includes multiple nodes within which processors and memory reside. The multiple nodes communicate via a network coupled therebetween. When considered as a whole, the memory included within the multiple nodes forms the shared memory for the computer system. Typically, directories are used to identify which nodes have cached copies of data corresponding to a particular address. Coherency activities may be generated via examination of the directories.
SSM systems are scalable, overcoming the limitations of the shared bus architecture. Since many of the processor accesses are completed within a node, nodes typically impose much lower bandwidth requirements upon the network than a shared bus architecture must provide upon its shared bus. The nodes may operate at high clock frequency and bandwidth, accessing the network when needed. Additional nodes may be added to the network without affecting the local bandwidth of the nodes. Instead, only the network bandwidth is affected.
In a typical SSM system, a global domain is created by way of the SSM protocol, which makes all the memory attached to the global domain look like one shared memory accessible to all of its processors. A global domain typically runs a single kernel. Hardware provides conventional MMU (memory management unit) protection, and the kernel manages mappings (e.g., reloading of key registers on context switches) to allow user programs to co-exist without trusting one another. Since the nodes of a global domain share memory and may cache data, a software error in one node may create a fatal software error which may crash the entire system. Similarly, a fatal hardware error in one node will typically cause the entire global domain to crash.
Accordingly, in another approach to multiprocessing computer systems, clustering may be employed to provide greater fault protection. Unlike SSM approaches, the memory of one node in a cluster system is not freely accessible by processors of other cluster nodes. Likewise, the I/O of one node is typically not freely accessible by processors of other nodes. While memory is not freely shared between nodes of a cluster, a cluster allows nodes to communicate with each other in a protected way using an interconnection network which may be initialized by the operating system. Normally, each node of a cluster runs a separate kernel. Nodes connected in a cluster should not be able to spread local faults, whether hardware or software, that would crash other nodes.
Cluster systems are often built on communication mechanisms which are less reliable than, for instance, SMP buses, since they must connect computers in separate chassis which may be separated by substantial distances. Because of this, cluster operations may incur errors, and application programs must be informed of these errors so that they can take appropriate recovery steps.
An ideal error reporting mechanism would be completely accurate and easy to use. Currently-used technology has various limitations in this area. For instance, interfaces which do not provide process-virtualized error information, but instead log errors on a controller- or system-wide basis, may cause processes which were not responsible for an error to incur error recovery overhead. On the other hand, interfaces which report error information directly to an initiating processor in the form of a processor fault or trap are less easy to use, since many programming languages do not cleanly support the handling of asynchronous errors.
It is accordingly desirable that a cluster communication interconnect be able to tolerate communication errors, and that it be able to report those errors to the software responsible for them. For maximum efficiency, it is desirable that the interconnect be able to provide error information directly to an application process rather than to the operating system.
In one approach to communication error reporting in a cluster system, a number of cluster error status registers are embedded in each communications interface. Each of these registers is associated with a particular processor in the multiprocessor computer system. When a cluster operation initiated by one of the processors incurs an error, the interface notes that error in the cluster error status register associated with that processor. Applications may read their cluster error status register whenever they wish to check the status of previously performed cluster operations. The per-processor cluster error status registers are saved and restored on processor context switches, thus providing virtual per-application cluster error status registers to every operating system process.
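The save/restore step described above can be sketched as follows. This is a minimal illustrative model, not an implementation from the specification; the `ClusterInterface` and `Process` names, register widths, and error codes are assumptions made for the example.

```python
# Hypothetical sketch: virtualizing per-processor cluster error status
# registers across context switches. All names and codes are illustrative.

class ClusterInterface:
    """One cluster error status register (ESR) per physical processor."""
    def __init__(self, num_cpus):
        self.esr = [0] * num_cpus  # 0 means "no error"

class Process:
    def __init__(self):
        self.saved_esr = 0  # the process's virtualized register image

def context_switch(iface, cpu, outgoing, incoming):
    # Save the hardware register into the outgoing process's image,
    # then load the incoming process's image into the hardware.
    outgoing.saved_esr = iface.esr[cpu]
    iface.esr[cpu] = incoming.saved_esr

iface = ClusterInterface(num_cpus=4)
a, b = Process(), Process()
iface.esr[0] = 0x5       # an error logged while process "a" ran on CPU 0
context_switch(iface, 0, outgoing=a, incoming=b)
assert a.saved_esr == 0x5   # the error now belongs to process "a"
assert iface.esr[0] == 0    # process "b" starts with a clean register
```

Because the kernel swaps the register image on every switch, each process observes only errors incurred by its own cluster operations.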
Systems employing such approaches to communication error reporting suffer from various drawbacks. For example, in a system which contains multiple cluster interfaces, an application which wants to ascertain the status of its operations may need to read multiple cluster error status registers, one from each cluster interface. This increases the time needed to perform a complete messaging operation. In addition, the operating system must save and restore multiple cluster error status registers for each process during a context switch. This increases context switch time and thus adds to the general overhead imposed by the operating system.
Another drawback to such systems is that the cluster interface must contain cluster error status registers for all processors which could possibly be part of any machine in which it is installed. This adds to the cost of the interface, which is a particular drawback when trying to develop a high-volume, low-cost implementation which is usable in multiple types of systems.
It is thus desirable to provide a fast and reliable error communication mechanism in a multiprocessing computer system which allows for efficient and scalable implementations of user- and kernel-level communication protocols.
SUMMARY OF THE INVENTION
The problems outlined above may in large part be solved by a communication error reporting mechanism in accordance with the present invention. In one embodiment, a multiprocessing computer system includes a plurality of processing nodes, each including one or more processors, a memory, and a system interface. The plurality of processing nodes may be interconnected through a global interconnect network which supports cluster communications. The system interface of an initiating node may launch a request to a remote node's memory or I/O. The computer system implements an error communication reporting mechanism wherein errors associated with remote transactions may be reported back to the particular processor which initiated the transaction. Each processor includes an error status register that is large enough to hold a transaction error code. The protocol associated with a local bus of each node (i.e., a bus interconnecting the processors of a node to the node's system interface) includes acknowledgement messages for transactions when they have completed. In the event a transaction which is transmitted by a system interface upon the global interconnect network on behalf of a particular processor incurs an error, the system interface sets an error flag in the acknowledgement message and provides an associated error code. If the acknowledgement message denotes an error, the error code is written into the processor's error status register for later retrieval by software. In various embodiments, a system interface may acknowledge a transaction to a given processor early (even if that transaction has not completed globally) if a subsequent transaction from the same processor is pending in the interface.
Advantageously, the per-processor error status registers may be saved and restored on processor context switches, thus providing virtual per-application cluster error status registers to every operating system process.
Improved scaling may be attained in embodiments employing multiple system interfaces, since only a single error status register needs to be read on an error check or context switch. Additionally, a processor may perform a read to its associated error status register without executing a cycle upon the local bus. Still further, errors may be reported without processor faults or traps.
BRIEF DESCRIPTION OF THE DRAWINGS
Other objects and advantages of the invention will become apparent upon reading the following detailed description and upon reference to the accompanying drawings, in which:

Fig. 1 is a block diagram of a multiprocessor computer system;

Fig. 2 is a block diagram of another embodiment of a multiprocessor computer system;

Fig. 3 is a block diagram of yet another embodiment of a multiprocessor computer system;

Fig. 4 is a block diagram illustrating aspects of a node of a multiprocessor computer system;

Fig. 5 is a block diagram illustrating aspects of a group of error handling subsystems within a multiprocessor computer system; and

Figs. 6-8 are block diagrams illustrating operation of the group of error handling subsystems of Fig. 5.
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.
DETAILED DESCRIPTION OF THE INVENTION
Turning now to Fig. 1, a block diagram of one embodiment of a multiprocessing computer system 10 is shown. Computer system 10 includes multiple SMP nodes 12A-12D interconnected by a point-to-point network 14. Each SMP node includes multiple processors, a local bus, a memory, and a system interface. For example, SMP node 12A includes processors 16A, an SMP bus 20A, a memory 22A, and a system interface 24A. SMP nodes 12B-12D are configured similarly. Each SMP node may further include one or more input/output (I/O) interfaces (e.g., I/O interfaces 26A-26D) which are used to interface to peripheral devices such as serial and parallel ports, disk drives, modems, printers, and so on.
Elements referred to herein with a common reference number followed by a particular letter will be collectively referred to by the reference number alone. For example, SMP nodes 12A-12D will be collectively referred to as SMP nodes 12.
Each SMP node 12 is essentially an SMP system having its corresponding memory 22 as the shared memory. Processors 16 are high performance processors. In one embodiment, each processor 16 is a SPARC™ processor compliant with version 9 of the SPARC™ processor architecture. It is noted, however, that any processor architecture may be employed by processors 16. It is further noted that each of the processors 16 includes an appropriate interface to support the protocols associated with each SMP bus 20, as will be described further below. Each of the processors 16 may additionally include cache memory subsystems.

SMP bus 20 accommodates communication between processors 16, memory 22, system interface 24, and I/O interface 26. In one embodiment, SMP bus 20 includes an address bus and related control signals, as well as a data bus and related control signals. Because the address and data buses are separate, a split-transaction bus protocol may be employed upon SMP bus 20. Generally speaking, a split-transaction bus protocol is a protocol in which a transaction occurring upon the address bus may differ from a concurrent transaction occurring upon the data bus. Transactions involving address and data include an address phase, in which the address and related control information are conveyed upon the address bus, and a data phase, in which data is conveyed upon the data bus. Additional address phases and/or data phases for other transactions may be initiated prior to the data phase corresponding to a particular address phase. An address phase and the corresponding data phase may be correlated in a number of ways. For example, data transactions may occur in the same order that the address transactions occur. Alternatively, the address and data phases of a given transaction may be identified via a unique tag.
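The tag-based correlation option just described can be sketched in a few lines. This is an illustrative software model only; the `SplitBus` class and its method names are assumptions made for the example, not part of the bus specification.

```python
# Illustrative model of a split-transaction bus that correlates address
# and data phases with a unique tag. Names are hypothetical.

import itertools

class SplitBus:
    def __init__(self):
        self._tags = itertools.count()   # source of unique transaction tags
        self.pending = {}                # tag -> address awaiting its data phase

    def address_phase(self, addr):
        # Convey the address; return the tag that will identify the data phase.
        tag = next(self._tags)
        self.pending[tag] = addr
        return tag

    def data_phase(self, tag, data):
        # Match the data back to its address phase via the tag.
        addr = self.pending.pop(tag)
        return addr, data

bus = SplitBus()
t0 = bus.address_phase(0x1000)
t1 = bus.address_phase(0x2000)            # second address phase issued first
assert bus.data_phase(t1, b"b") == (0x2000, b"b")  # data may return out of order
assert bus.data_phase(t0, b"a") == (0x1000, b"a")
```

Because each data phase carries its tag, data phases may complete in any order relative to their address phases, which is the point of the split-transaction protocol.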
In one embodiment, each transaction conveyed upon SMP bus 20 includes a field (or control signals) which identifies the particular CPU which initiated the transaction. A particular processor 16 may initiate a read operation upon SMP bus 20 by asserting certain control signals and conveying the address of the requested data upon SMP bus 20. This corresponds to the address phase of a read operation. When the target device has the requested data available, the data is conveyed upon SMP bus 20 and is provided to the initiator during a data phase. When the requested data resides in a remote node, system interface 24 conveys the request across network 14 and, upon receipt of the data from the remote node, provides the read data to the initiator via SMP bus 20.
For write operations, a particular processor 16 may indicate its intent to perform a write by conveying the destination address during an address phase upon SMP bus 20. The target then issues a message indicating it is ready to accept the data, and indicates where the initiator should send the data (e.g., to an allocated internal buffer of the target). The initiator subsequently sends the data across SMP bus 20 during a data phase. Similar to read transactions, if the destination for a write transaction resides in a remote node, the corresponding system interface 24 handles the transaction globally on behalf of the initiating processor. It is noted that in other embodiments, other specific protocols may be supported by each SMP bus 20.

Computer system 10 may be operable in a cluster mode. When operating in a cluster mode, the memory of one node is not freely accessible by processors of other cluster nodes. Likewise, the I/O of one node is typically not freely accessible by processors of other nodes. Instead, the system interface 24 of each node 12 includes cluster management functionality which is operable to determine whether a particular remote node is allowed access to that node's memory or I/O. The system interface of each node also detects transactions upon SMP bus 20 which require a transfer to another SMP node 12. System interface 24 performs the transfers and tracks the transactions until they have completed in the remote nodes. The cluster configuration is typically maintained by the operating system kernel.
In the embodiment shown, system interface 24 is coupled to a point-to-point network 14. In a point-to-point network, individual connections exist between each node upon the network. A particular node communicates directly with a second node via a dedicated link. To communicate with a third node, the particular node utilizes a different link than the one used to communicate with the second node. Alternatively, point-to-point network 14 may be configured such that a particular node may be used as a "hop" to pass through communications between a sending node and a receiving node. That is, the network is arranged such that communications from a sending node to a particular receiving node must pass through a hop node. By configuring the network using hop nodes, the cost of the system may be reduced, and the interconnect network may be simplified.
It is noted that, although four SMP nodes 12 are shown in Fig. 1, embodiments of computer system 10 employing any number of nodes are contemplated. Additionally, in other embodiments, global interconnects other than a point-to-point network, such as a broadcast network, may be employed to interconnect and facilitate communication between the processing nodes. As used herein, a processing node is a data processing subsystem including at least one processor, a corresponding memory, and circuitry for communicating with other processing nodes.
It is further noted that embodiments are also contemplated wherein a plurality of nodes are configured to operate in an SSM mode of operation with respect to each other, but collectively form a cluster node within a cluster that includes other cluster nodes.

As discussed previously, it is possible that communication errors will occur in systems such as computer system 10 when a transaction which requires access to the memory or I/O of another node is transmitted from a particular node across network 14. For example, it is possible that a node receiving the transaction will determine that the requesting node does not have access rights to the address specified in the transaction, as determined by the cluster management functionality. Similarly, the node to which a particular transaction is sent may not respond at all. Various other types of errors are also possible, such as destination busy errors, invalid transaction errors, access violation errors, read-only data errors, non-existent node errors, general communication errors, and so on. Accordingly, computer system 10 implements an error communication reporting mechanism wherein errors associated with remote transactions (that is, transactions that are transmitted to remote nodes) may be reported back to the particular processor 16 which initiated the transaction. More particularly, and as will be described in further detail below, each processor 16 includes an error status register which is large enough to hold a transaction error code.
The protocol associated with each SMP bus 20 is extended to include acknowledgement messages for transactions when they have completed. In the event a transaction which is transmitted by a system interface 24 upon network 14 on behalf of a particular processor incurs an error, the system interface sets an error flag in the acknowledgement message and provides an associated error code. If the acknowledgement message denotes an error, the error code is written into the processor's error status register for later retrieval by software.
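The acknowledgement path just described can be modeled in software as follows. This sketch is illustrative only: the message layout, the `NO_ERROR` encoding, and the class names are assumptions, not the actual bus encoding.

```python
# Minimal sketch of the error-reporting acknowledgement path: the system
# interface sets an error flag and code in the completion acknowledgement,
# and the initiating processor latches the code into its error status
# register. All names and encodings are hypothetical.

NO_ERROR = 0b000

class Processor:
    def __init__(self):
        self.error_status_register = NO_ERROR

    def receive_ack(self, ack):
        # Only acknowledgements that denote an error update the register.
        if ack["error_flag"]:
            self.error_status_register = ack["error_code"]

def make_ack(error_code=NO_ERROR):
    # The system interface builds the acknowledgement message.
    return {"error_flag": error_code != NO_ERROR, "error_code": error_code}

cpu = Processor()
cpu.receive_ack(make_ack())          # normal completion: register unchanged
assert cpu.error_status_register == NO_ERROR
cpu.receive_ack(make_ack(0b101))     # e.g. an access-violation code
assert cpu.error_status_register == 0b101
```

Software may later read the register to learn whether any of its remote transactions incurred an error, without taking a fault or trap.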
Advantageously, the per-processor error status registers are saved and restored on processor context switches, thus providing virtual per-application cluster error status registers to every operating system process. Further details regarding a particular implementation of the error reporting mechanism are provided below in conjunction with Figs. 4 and 5.
Prior to discussing details of the error reporting mechanism, it is first noted that several system interfaces may reside within a single node. For example, Fig. 2 illustrates an exemplary system in which node 12B includes a plurality of system interfaces 24B-1 through 24B-n. Each system interface 24, which may be implemented as an integrated circuit chip, includes a finite number of ports to support point-to-point connections to other nodes. Accordingly, by including several system interfaces within a common node, greater connectivity to additional nodes may be achieved.
Different system interfaces 24 may also be provided and initialized to handle only a subset of the accesses for a particular address slice (e.g., address region). For example, one interface may handle even addresses while another interface handles odd addresses. In this way, having more than one interface may increase the bandwidth provided to one specific node.
This concept may be better understood with reference to Fig. 3. Fig. 3 illustrates an exemplary system configuration wherein a node 12A includes a pair of system interfaces 24A-1 and 24A-2, each coupled to provide selected cluster communications to corresponding system interfaces 24B-1 and 24B-2 of node 12B. In this configuration, system interfaces 24A-1 and 24A-2 may be initialized such that system interface 24A-1 handles even addresses for a particular address slice, while system interface 24A-2 handles odd addresses. This "data striping" thus provides increased bandwidth to node 12B for accesses to that slice, since the burden associated with such transfers is spread between the system interfaces.
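The even/odd striping above amounts to selecting an interface by the parity of the address. The sketch below assumes striping at cache-line granularity; the 64-byte line size and function names are illustrative assumptions, not values from the specification.

```python
# Hypothetical sketch of "data striping" across two system interfaces:
# within an address slice, even cache-line addresses route to one
# interface and odd addresses to the other. Line size is assumed.

LINE_SIZE = 64  # assumed cache-line size in bytes

def select_interface(addr, interfaces):
    # Stripe by cache-line index; generalizes to any number of interfaces.
    line = addr // LINE_SIZE
    return interfaces[line % len(interfaces)]

interfaces = ["24A-1", "24A-2"]
assert select_interface(0x0000, interfaces) == "24A-1"  # even line
assert select_interface(0x0040, interfaces) == "24A-2"  # odd line
assert select_interface(0x0080, interfaces) == "24A-1"  # even again
```

Consecutive lines alternate between the interfaces, so a streaming transfer to the slice is split roughly evenly between them.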
The handling of inter-node communication errors within the systems described above is next considered. Fig. 4 is a block diagram illustrating aspects of one embodiment of a node 12 including a pair of system interfaces 24 coupled between a global interconnect 52 and an SMP bus 20. A plurality of processors 16-1 through 16-m and a memory 18 are further shown coupled to SMP bus 20.
As illustrated, each system interface 24 includes a plurality of request agents 54-1 through 54-n. Request agents 54-1 through 54-n include error handling subsystems 56-1 through 56-n, respectively. A transaction filter 57 of each system interface 24 is shown coupled between request agents 54-1 through 54-n and SMP bus 20. A global transaction processing unit 58 within each system interface 24 is further shown coupled between global interconnect 52 and SMP bus 20. A cluster management unit 60 is depicted within each global transaction processing unit 58.
During operation, transaction filter 57 monitors transactions initiated upon SMP bus 20 to determine whether a given transaction must be conveyed globally to another node via global interconnect 52. This may be determined by the address of the transaction. Each of request agents 54-1 through 54-n is capable of receiving a transaction initiated upon SMP bus 20 through transaction filter 57, and is configured to transmit a corresponding transaction via global interconnect 52 to a destination remote node on behalf of the initiating processor 16. In this particular embodiment, each request agent 54 is capable of handling a single outstanding transaction at a time, and tracks the transaction until it has completed.
The global transaction processing unit 58 of each system interface 24 is provided to receive incoming requests from remote nodes, and to convey the requests to SMP bus 20 when appropriate. When the global transaction processing unit 58 of a given system interface receives a transaction from a remote node, the associated cluster management unit 60 determines whether access from the remote node is allowed in accordance with the cluster configuration. If access is allowed, the global transaction processing unit 58 initiates a corresponding transaction upon SMP bus 20. In the case of write operations, the global transaction processing unit 58 may cause the data to be written into a particular memory or I/O location. In the case of read transactions, the global transaction processing unit 58 may cause data to be read from a particular memory or I/O location. Following the data access, the global transaction processing unit 58 transmits a completion message (including read data, in the case of reads) through global interconnect 52 to the node from which the transaction was initially received.
Each of the processors 16-1 through 16-m is shown with an internal error status register 66-1 through 66-m, respectively. Each error status register 66 is large enough to hold a transaction error code. If the error handling subsystem 56 of a particular request agent 54 determines that an error has occurred with respect to a particular transaction it is handling, the request agent 54 provides an error code in an acknowledgement message conveyed upon SMP bus 20. In general, both read and write operations performed on SMP bus 20 conclude with an acknowledgement message. For read transactions, the acknowledgement message may be provided in the same phase in which the read data is conveyed to the initiator. For write operations, the acknowledgement message may be communicated in a separate phase on SMP bus 20.
In response to receiving an acknowledgement message indicating an error, the error code is stored within the error status register 66 of the processor that initiated the transaction. Various types of errors may be indicated by the error handling subsystem 56 of a particular request agent 54. For example, in one embodiment, detectable errors include errors reported by a remote node, such as access violations (including out-of-bounds accesses), destination time-out errors, destination busy errors, and so on. For these classes of errors, the request agent receives an encoded error message from the remote node in a global communication conveyed through global interconnect 52. The request agent then passes a corresponding error code to the initiating processor in the acknowledgement message conveyed on SMP bus 20. In addition, errors may also be determined by a request agent itself, such as time-out errors, which may occur, for example, when a remote node does not respond to a transaction.
In accordance with the error reporting mechanism as described above in conjunction with Fig. 4, improved scaling may be attained in embodiments employing multiple system interfaces, since only a single error status register needs to be read on an error check or context switch. Additionally, a processor 16 may perform a read to its associated error status register 66 without executing a cycle upon SMP bus 20. It is noted that a particular processor 16 may read its associated error status register using an address dedicated to the internal error status register or, in other implementations, by executing a specialized instruction.
Other advantages may also be realized. For example, the cost of a system implemented in accordance with the foregoing description may further be reduced, since a separate error status register corresponding to each possible CPU in the system is not incorporated within each of the system interfaces 24.

In one embodiment, before a given transaction is acknowledged on SMP bus 20 to an initiating processor 16, the given transaction must first be completed globally. In this manner, if the request agent 54 handling the transaction determines the global transaction incurred an error, the appropriate error code can be conveyed with the acknowledgement message on SMP bus 20 to the initiating processor 16. The data rate between a processor and a remote node may thus be limited by the number of transactions the processor allows to be outstanding, and by the latency of those transactions' acknowledgements.
Accordingly, to improve performance, in other embodiments certain transactions may be acknowledged upon SMP bus 20 to the initiating processor before the transaction has actually completed globally. This is possible since normally the status of individual transactions is not important (that is, software executing upon a given processor normally would not check the content of a corresponding error status register 66 after every transaction). Instead, software will normally check the status of the corresponding error status register 66 after a group of transactions has completed. Accordingly, in various implementations (including that described below in conjunction with Fig. 5), the request agents 54 may be configured to determine if there are any previous outstanding transactions from the processor issuing a new transaction. If so, those previous transactions may be acknowledged early (that is, before the transactions have completed globally), if desired. Any errors that come back from a remote node which are related to those early-acknowledged transactions can be reported on any later outstanding transaction. It is noted that in such implementations, the last transaction pending in the interface for a given processor must wait until all previous remote transactions have completed globally before it can be acknowledged. Additionally, it is noted that before the error status for a group of transactions can be determined, the last transaction in the group must be completed. In embodiments employing SPARC™ processors, the MEMBAR instruction may be executed to ensure completion of all previous transactions. The throughput in embodiments which allow request agents to acknowledge transactions upon SMP bus 20 early may advantageously be limited by the number of transactions the system interface 24 is able to keep track of, instead of by the number of transactions each processor allows to be outstanding.
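The early-acknowledgement policy described above can be sketched as a small model. This is an illustrative software analogue under stated assumptions: the `SystemInterface` class, its dictionaries, and the deferred-error bookkeeping are constructs invented for the example, not the hardware implementation.

```python
# Sketch of early acknowledgement: a new transaction from a processor
# lets any previous outstanding transaction from that processor be
# acknowledged early; an error arriving later from a remote node is
# reported on a still-outstanding transaction. Names are illustrative.

class SystemInterface:
    def __init__(self):
        self.outstanding = {}      # txn id -> initiating processor id
        self.deferred_error = {}   # processor id -> error code to report later

    def issue(self, txn, cpu):
        # Early-acknowledge every previous outstanding transaction from
        # this CPU, then track the new one.
        early_acks = [t for t, c in self.outstanding.items() if c == cpu]
        for t in early_acks:
            del self.outstanding[t]
        self.outstanding[txn] = cpu
        return early_acks

    def global_error(self, cpu, code):
        # Error for an early-acknowledged transaction: remember it, to be
        # reported on a later acknowledgement to the same processor.
        self.deferred_error[cpu] = code

    def complete(self, txn):
        # Final acknowledgement carries any deferred error (0 = no error).
        cpu = self.outstanding.pop(txn)
        return self.deferred_error.pop(cpu, 0)

si = SystemInterface()
si.issue("t1", cpu=0)
assert si.issue("t2", cpu=0) == ["t1"]   # t1 acknowledged early
si.global_error(0, 0b011)                # t1's error arrives later
assert si.complete("t2") == 0b011        # reported on the later transaction
```

Note how the last pending transaction ("t2") carries the error of the early-acknowledged one, matching the rule that the final transaction in a group must wait for global completion.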
Fig. 5 is a block diagram illustrating further aspects of one embodiment of a multiprocessing computer system including a number of error handling subsystems 56-1 through 56-x associated with various request agents 54. The error handling subsystems 56 illustrated in Fig. 5 may reside within a single system interface 24 or within multiple system interfaces. The error handling subsystems 56 are interconnected by an initiate bus 70 and a completion bus 72. It is noted that the initiate bus 70 and completion bus 72 are independent of global interconnect 52. Each error handling subsystem 56 includes an associated control unit 80-1 through 80-x coupled to a memory or storage unit 82-1 through 82-x, respectively, and to a timeout counter 84-1 through 84-x, respectively. The storage unit 82 of each error handling subsystem 56 includes a field for storing an "inherited error" code, a field for storing a processor id, a field for storing a "has-parent" bit, and a field for storing a "has-child" bit. From Fig. 4, it is noted that a separate error handling subsystem 56 as illustrated in Fig. 5 may be provided for each request agent 54. It is further noted that time-out counters 84 are provided for determining time-out errors, which may occur when a response is not received from a remote node in response to a globally transmitted transaction.
During operation, when a request agent accepts a new transaction, the control unit 80 associated with that request agent sets the inherited error field of storage unit 82 to "000" (indicating no error, in this particular example) and clears its has-parent and has-child bits. The associated control unit 80 further sets the processor id field to the initiator of the transaction, and drives the processor id value onto the initiate bus 70.
When a control unit 80 of another error handling subsystem 56 detects a processor id value upon initiate bus 70 which is the same as the processor id stored in its associated storage unit 82, the control unit 80 of that error handling subsystem sets the has-child bit for that error handling subsystem and asserts the predecessor signal at line 74. If the control unit 80 which is driving the initiate bus 70 detects that the predecessor signal is asserted by another error handling subsystem, it sets its associated has-parent bit. A request agent whose has-child bit is clear and whose has-parent bit is set is referred to herein as being an "heir".
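The initiate-bus handshake just described may be modeled as follows. This is an illustrative software sketch, not the embodiment's hardware: the `Agent` class and `initiate` function are hypothetical names standing in for an error handling subsystem's storage-unit fields and its behavior when a new transaction is accepted.

```python
class Agent:
    """Models one error handling subsystem's storage-unit fields (cf. unit 82)."""
    def __init__(self):
        self.busy = False
        self.proc_id = None
        self.has_parent = False
        self.has_child = False
        self.inherited_error = 0   # 0 plays the role of "000" (no error)

    def is_heir(self):
        # An "heir" has its has-parent bit set and its has-child bit clear.
        return self.has_parent and not self.has_child

def initiate(agents, new_agent, proc_id):
    # Accepting a new transaction: clear state and drive the processor id
    # onto the initiate bus; matching agents respond with "predecessor".
    new_agent.busy = True
    new_agent.proc_id = proc_id
    new_agent.has_parent = new_agent.has_child = False
    new_agent.inherited_error = 0
    predecessor = False
    for a in agents:
        if a is not new_agent and a.busy and a.proc_id == proc_id:
            a.has_child = True     # this agent now has a successor
            predecessor = True     # assert the predecessor signal (cf. line 74)
    new_agent.has_parent = predecessor
```

Issuing three transactions from the same processor through three different agents reproduces the chain structure described in the example of Figs. 6-8: the newest agent becomes the heir.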
Any non-heir request agent whose has-child bit is set may acknowledge a transaction to the processor it is acting on behalf of before that transaction has completed globally (e.g., in the case of a write operation). When it does so, the control unit 80 for that request agent supplies the value contained in its associated inherited error field as the transaction's completion status (in the acknowledgement message provided on SMP bus 20). It is noted that the request agent remains busy (i.e., it cannot accept a new transaction) until the transaction is completed globally. On the other hand, an heir request agent cannot acknowledge a transaction on SMP bus 20, and must wait until it is no longer an heir. A non-heir request agent with no children (wherein both the has-child bit and has-parent bit are cleared) can acknowledge a transaction on SMP bus 20 when the transaction has completed globally. When a request agent receives a completion message from a remote node through global interconnect 52, and the control unit 80 associated with that request agent has already provided an early acknowledgement corresponding to the transaction upon SMP bus 20, the control unit 80 drives the processor id and inherited error code of the associated error handling subsystem upon completion bus 72. At this point, the associated request agent may retire the transaction. Similarly, if the request agent has not yet provided a corresponding acknowledgement upon SMP bus 20, the control unit 80 of that request agent drives its associated processor id and a "000" error status on the completion bus 72. It further acknowledges the transaction upon SMP bus 20. In the acknowledgement message driven upon SMP bus 20, the control unit 80 either drives the value within the inherited error field of the associated storage unit 82 as an error code or, if that is "000", provides whatever error code was received in the global completion message.
Each remaining error handling subsystem 56 monitors the completion bus 72 to determine whether a processor id corresponding to the value stored in the processor id field of its associated storage unit 82 is driven upon completion bus 72. If a control unit 80 detects a conveyance of a processor id corresponding to the processor id value stored in its associated storage unit 82, the control unit 80 asserts the "survivor" signal at line 76 if it is a non-heir agent. If an heir agent detects a conveyance of a corresponding processor id on completion bus 72, the heir agent samples the survivor signal. If the survivor signal is not asserted, that agent clears its associated has-parent bit, and is thus no longer an heir. Regardless of whether the bit is cleared or not, if the agent's inherited error field is "000", it is set to the error status driven on the completion bus.
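The completion-bus side of the protocol may be sketched in the same illustrative style. The helper names below are hypothetical; each agent is modeled as a plain record of the storage-unit fields, and `complete` models one conveyance of a processor id and error code on the completion bus, including the survivor-signal sampling performed by an heir.

```python
def make_agent(proc_id, has_parent, has_child):
    # One error handling subsystem's storage-unit fields (cf. storage unit 82).
    return {"busy": True, "proc_id": proc_id, "has_parent": has_parent,
            "has_child": has_child, "inherited_error": 0}

def complete(agents, finishing, error_code):
    # The finishing agent drives its processor id and an error code on the
    # completion bus; agents holding a matching processor id then react.
    finishing["busy"] = False
    watchers = [a for a in agents
                if a is not finishing and a["busy"]
                and a["proc_id"] == finishing["proc_id"]]
    # Every matching non-heir agent asserts the "survivor" signal (cf. line 76).
    survivor = any(a["has_child"] or not a["has_parent"] for a in watchers)
    for a in watchers:
        if a["has_parent"] and not a["has_child"]:   # this agent is an heir
            if not survivor:
                a["has_parent"] = False              # no longer an heir
            if a["inherited_error"] == 0:            # "000": no error stored yet
                a["inherited_error"] = error_code
```

Starting from the state of Fig. 8, a completion by the first agent leaves the heir an heir (the survivor signal is asserted) but transfers the error code into the heir's inherited error field; a later completion by the surviving agent then clears the heir's has-parent bit.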
The operation of the error handling subsystems 56 of Fig. 5 may be better understood with reference to the example illustrated in Figs. 6-8. Referring to Fig. 6, assume processor 16-1 initiates a transaction upon SMP bus 20 that is handled and transmitted globally by the request agent associated with error handling subsystem 56-1. In response to receipt of this transaction, the error handling subsystem 56-1 sets the processor id field of storage unit 82-1 to a value of, for example, "001", which corresponds to processor 16-1. The control unit 80-1 further sets the inherited error field of storage unit 82-1 to "000" and clears the has-parent and has-child bits, as illustrated. The control unit 80-1 finally drives the processor id value "001" upon the initiate bus 70. At this point it is assumed that the request agents associated with error handling subsystems 56-2 through 56-x have no outstanding transactions.
Next, assume processor 16-1 initiates another transaction upon SMP bus 20 which is handled and transmitted globally by the request agent associated with error handling subsystem 56-2. Similar to the previous operation, the error handling subsystem 56-2 responsively sets its inherited error field to "000", and clears its has-child and has-parent bits. The control unit 80-2 further sets its processor id field to "001" and drives the processor id on the initiate bus 70. At this point, the control unit 80-1 of error handling subsystem 56-1 detects the transmission of the processor id "001" on initiate bus 70 and, since it matches the processor id within storage unit 82-1, control unit 80-1 sets its has-child bit and asserts the predecessor signal at line 74. In response to the predecessor signal being asserted, the control unit 80-2 sets its has-parent bit. The values stored within storage units 82-1 and 82-2 following these operations are illustrated in Fig. 7. As stated previously, a request agent whose has-parent bit is set and whose has-child bit is clear is referred to herein as an "heir".
Assume next that processor 16-1 initiates a transaction upon SMP bus 20 which is handled and transmitted globally by the request agent associated with error handling subsystem 56-x. In response to this operation, control unit 80-x sets the inherited error field of storage unit 82-x to "000" and the processor id field to "001", and clears its has-parent and has-child bits. Control unit 80-x further drives the processor id value on the initiate bus 70. Control unit 80-2 responsively sets the has-child bit of storage unit 82-2, and asserts the predecessor signal at line 74 (it is noted that error handling subsystem 56-1 may do the same; however, its has-child bit was already set). In response to the predecessor signal being asserted, control unit 80-x sets the has-parent bit of storage unit 82-x. The values stored within each storage unit 82 following these operations are illustrated in Fig. 8.
In the situation illustrated by Fig. 8, the request agent associated with error handling subsystem 56-x is an heir. Since the request agents associated with error handling subsystems 56-1 and 56-2 are not heirs (and have set has-child bits), either could acknowledge the transaction it is handling upon SMP bus 20 to processor 16-1. Thus, consider a situation wherein the request agent associated with error handling subsystem 56-1 acknowledges the transaction it is handling early (i.e., before the transaction completes globally). In this case, the value "000" within the inherited error field of storage unit 82-1 is conveyed upon SMP bus 20 in an acknowledgement message, indicating no error. This value may be stored within the error status register 66-1 of processor 16-1.
If the request agent associated with error handling subsystem 56-1 later receives a completion message from a remote node indicating an error, or determines that an error has occurred due to a timeout, for example, control unit 80-1 conveys the corresponding error code upon completion bus 72, along with the processor id "001". At this point, the request agent associated with error handling subsystem 56-1 may be retired, and is available to accept new transactions. If no error is indicated, an error code value of "000" (indicating no error) is conveyed upon completion bus 72 along with the processor id.
In response to control unit 80-1 conveying the error code upon completion bus 72, control unit 80-2 asserts the survivor signal, since it is a non-heir agent. Additionally, since the request agent associated with control unit 80-x is an heir agent, control unit 80-x samples the survivor signal. Since in this case the survivor signal is asserted by control unit 80-2, the has-parent bit of storage unit 82-x is not cleared, and the request agent associated with error handling subsystem 56-x remains an heir (note that if the survivor signal were not asserted, the has-parent bit would be cleared). The error code conveyed upon completion bus 72 is, however, stored within the inherited error field of storage unit 82-x. This value may later be conveyed in an acknowledgement message upon SMP bus 20 when error handling subsystem 56-x is allowed to acknowledge its corresponding transaction. Operations in accordance with the foregoing description are performed in response to subsequent transactions initiated by processor 16-1, and in response to the acknowledgements of other transactions.
Software executing on a particular processor can periodically read the error status register 66 associated with that processor to determine if any error has been recorded since the last time it read the error status register. This may be accomplished by performing a read operation to a particular address in the address space of the system (i.e., to an address to which each error status register 66 is mapped). In other embodiments, a specialized instruction may be defined to allow access to each error status register 66.
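The polling pattern described above might be sketched as follows. Both callables are hypothetical stand-ins, since the register's mapped address and the barrier mechanism are platform-specific: `membar` stands in for a MEMBAR-style barrier ensuring all previous transactions have completed, and `read_error_status` stands in for a load from the register's memory-mapped address.

```python
def check_group_status(read_error_status, membar):
    # Ensure all previous transactions have completed globally (cf. MEMBAR),
    # then read the per-processor error status register; a value of 0
    # corresponds to the "000" no-error code.
    membar()
    return read_error_status() == 0
```

Software would typically call such a helper once per group of transactions rather than after each individual transaction, consistent with the grouped error-checking described earlier.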
The systems described above may advantageously allow per-processor error status registers to be saved and restored on processor context switches, thus providing virtual per-application cluster error status registers to every operating system process. The systems may further allow for efficient and scalable implementations of user- and kernel-level communication protocols with error reporting. Errors may be reported without processor faults or traps.
Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.