|Publication number||US5327548 A|
|Application number||US 07/973,356|
|Publication date||Jul 5, 1994|
|Filing date||Nov 9, 1992|
|Priority date||Nov 9, 1992|
|Also published as||EP0597598A2|
|Publication number||07973356, 973356, US 5327548 A, US 5327548A, US-A-5327548, US5327548 A, US5327548A|
|Inventors||William R. Hardell, Jr., James D. Henson, Jr., Oscar R. Mitchell|
|Original Assignee||International Business Machines Corporation|
The present invention is related to co-pending U.S. patent application Ser. No. 07/969,596, filed Oct. 30, 1992, having title "APPARATUS AND METHOD FOR BOOTING A MULTIPLE PROCESSOR SYSTEM HAVING A GLOBAL/LOCAL MEMORY ARCHITECTURE", and having common inventorship and assignee.
The present invention relates in general to computer system memories. More particularly, the invention is directed to systems and methods for using spare bits in the context of a global memory shared by a multiplicity of processors.
Systems composed of multiple but coordinated processors were first developed and used in the context of mainframes. More recently, interest in multiple processor systems has escalated as a consequence of the low cost and high performance of microprocessors, with the objective of replicating mainframe performance through the parallel use of multiple microprocessors.
A variety of architectures have been defined for multi-processor systems. Most designs rely upon highly integrated architectures by virtue of the need for cache coherence. In such systems, cache coherence is maintained through complex logic circuit interconnection of the cache memories associated with the individual microprocessors to ensure data consistency as reflected in the various caches and main memory.
A somewhat different approach to architecting a multi-processor system relies upon a relatively loose hardware level coupling of the individual processors, with the singular exception of circuit logic controlling access to the shared global memory, and the use of software to manage cache coherence. An architecture which relies upon software managed cache coherence allows the designer to utilize existing processor hardware to the maximum extent, including the utilization of memory error correction resources such as bank related spare bit steering and data error correction code (ECC) memory configurations. This relative independence of the processors also lends itself to multi-processor systems with extended levels of availability, in that one or more processors may be disconnected without disrupting the operation of the remaining processors. Coordinating access to a shared global memory, and keeping it coherent, is of course somewhat more difficult when the processors are not closely coupled.
One problem that arises with a shared global memory, loosely coupled, multi-processor architecture relates to the management of error detection and correction resources. In such context, the designation and coordinated use of spare bits as well as error correction code bits must be consistent from processor to processor, so that the data in global memory is both consistent and reliable.
The present invention defines a system and method for steering spare bits in a multi-processor architecture having global memory resources, being comprised of a means for selecting a first processor to define the steering of spare bits in global memory, a means for enabling processors to define the steering of spare bits in respective local memories, and means for transferring global memory spare bit steering information from the first processor to other processors.
In a preferred practice of the invention, the first of the multi-processors reaching a specified stage in the booting process is assigned responsibility for testing both its local memory and the global memory. The remaining processors test only their respective local memory arrays. The bit steering information derived by the selected processor is thereafter conveyed to each of the other processors as a part of ensuring that the memory spare bit steering is consistent from processor to processor for the global memory. Local memory bit steering is individualized to the associated processor.
The global memory spare bit steering information is conveyed from the selected processor which performs the global memory test to the remaining processors in either of two manners, as preferably embodied. The first involves transfer through semaphore related registers in an atomic semaphore controller connected to all of the processors. In another form, spare bit steering and bank configuration information is fundamentally conveyed from the processor testing global memory to the other processors in the multi-processor system through a specially allocated block of global memory. Limited setup and global memory pointer information is passed through the atomic semaphore controller in the second form. Both systems and methods distribute for common use an identical set of spare bit steering and memory configuration information.
The benefits and features of the systems and methods to which the present invention pertains will be more clearly understood and appreciated upon considering the ensuing description of a detailed embodiment.
FIG. 1 is a schematic block diagram of a multiprocessor system.
FIG. 2 schematically depicts the relationship between data in global memory and spare bit steering and memory configuration information as stored in configuration registers of the processors.
FIG. 3 illustrates by flow diagram the operations performed by the various processors in the embodying system.
FIG. 1 illustrates by schematic block diagram an architecture for the multi-processor system to which the present invention pertains. Included within the system are four processors, identified by reference numerals 1-4. A representative example of a processor is the RISC System/6000 workstation with associated AIX Operating System, as is commercially available from IBM Corporation. Each processor 1-4 includes memory configuration register 6, which stores memory array starting address and size information, and bit steering configuration register 7, which stores spare bit steering information by bank. Associated with each processor 1-4 are locally addressable memory arrays, respectively identified by reference numerals 8, 9, 11 and 12. Though not explicitly shown, each processor also includes a cache type memory for both instructions and data. As noted earlier, cache coherency is managed by software in a manner to be described hereinafter.
Atomic semaphore controller 13 in FIG. 1 allows software to coordinate access to the global memory array, generally at 14. Controller 13 includes a number of lockable semaphore type registers 16. During operation, controller 13 allows only one processor at a time to acquire exclusive access to a given semaphore register. However, different processors may own different semaphores at the same time, and each processor may own more than one semaphore at a time. Software uses the semaphores to allocate processor access to the different blocks or banks of global memory. Software also introduces cache flush cycles to maintain global memory coherence between the various processor caches.
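The ownership rules described for the semaphore registers can be illustrated with a small software model. This is only a sketch under assumptions: the class name `AtomicSemaphoreController`, its methods, and the use of a Python lock to stand in for the controller's atomic behavior are all hypothetical and are not taken from the patent.

```python
import threading


class AtomicSemaphoreController:
    """Toy model of a controller holding lockable semaphore registers."""

    def __init__(self, num_registers):
        self._guard = threading.Lock()        # stands in for hardware atomicity
        self._owner = [None] * num_registers  # None means the register is free

    def try_acquire(self, reg, cpu):
        """Return True if `cpu` obtained register `reg`, False if it is owned."""
        with self._guard:
            if self._owner[reg] is None:
                self._owner[reg] = cpu
                return True
            return False

    def release(self, reg, cpu):
        """Free register `reg`; only the current owner may release it."""
        with self._guard:
            if self._owner[reg] != cpu:
                raise PermissionError("register is not owned by this processor")
            self._owner[reg] = None


ctl = AtomicSemaphoreController(4)
first = ctl.try_acquire(0, cpu=1)    # processor 1 takes semaphore 0
second = ctl.try_acquire(0, cpu=2)   # processor 2 is refused the same semaphore
other = ctl.try_acquire(1, cpu=2)    # but may own a different semaphore
extra = ctl.try_acquire(2, cpu=1)    # and processor 1 may own more than one
```

In this model a refused processor simply gets `False` back and may retry; how the actual controller signals refusal is not specified in the text.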
Non-blocking crosspoint switch 17 uses a relatively conventional design to allow processors 1-4 direct access to all parts of global memory array 14, in the absence of any address contentions. The processors are thereby able to concurrently communicate with the global memory in all but localized contention situations.
The generation, distribution and use of spare bit and memory configuration information is best understood with reference to FIG. 2. Block 18 depicts a composite, local and global, memory address range as viewed by a processor. Typically the first bank, Bank 0, is the local memory. Each bank of the memory is shown to include by row not only a string of data bits in columns 0-N, but also a spare bit column S. The data bits 0-N include both raw data and bits added for error correction, preferably adequate to identify two errors and to correct one. The presence of one, or possibly more than one, spare bit column ensures that hard defects in the memory array do not consistently consume the error correction code resources, given that those are usually included to manage soft errors.
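The text does not name the error correction code used. One well known code with the stated property (correct one error, identify two) is an extended Hamming code, often called SECDED. The following is a toy sketch on four data bits, given purely as an illustration of that property; the function names and the 8-bit layout are assumptions, and real memory words are much wider.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits)."""
    code = [0] * 8  # index 0: overall parity; indexes 1..7: Hamming positions
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    for p in (1, 2, 4):  # each parity bit covers the positions that share its bit
        for pos in range(1, 8):
            if pos & p and pos != p:
                code[p] ^= code[pos]
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, value); value is None when two errors are detected."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    parity_bad = sum(code) % 2 == 1
    fixed = list(code)
    if syndrome == 0 and not parity_bad:
        status = "ok"
    elif parity_bad:
        fixed[syndrome] ^= 1  # syndrome 0 means the overall parity bit flipped
        status = "corrected"
    else:
        return "double", None  # nonzero syndrome with even parity: two errors
    bits = (fixed[3], fixed[5], fixed[6], fixed[7])
    return status, sum(b << i for i, b in enumerate(bits))


word = encode(0b1011)
one_error = list(word)
one_error[5] ^= 1
two_errors = list(one_error)
two_errors[6] ^= 1
```

A hard defect in a column would use up the single-error correction on every access to that column, which is why steering the column to a spare leaves the code free to handle soft errors.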
In the illustrated memory block 18, bank 0 is composed of 64 megs and has, as shown, a single bad bit positioned in the third data bit column, and a further succession of five bad bits in the B data bit column. In this case, a single spare bit column is inadequate to substitute for both columns 3 and B, requiring that the whole of the page of memory be mapped out. Bank 1 is composed of 32 megs, and in this case has all five defective bits in column 6. The various processors when addressing bank 1 will be steered so that the data designated for a column 6 position is written to and read from the spare bit column S.
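The two cases above can be sketched in a few lines. This is a simplified model rather than the patented circuit: the helper names are hypothetical, a defective cell is assumed to be stuck at 0, and column "B" is assumed to be hexadecimal notation for column 11.

```python
SPARE_COLUMNS = 1  # assumption: one spare column per bank, as illustrated


def plan_bank(defect_columns, spare_columns=SPARE_COLUMNS):
    """Decide how a bank is handled given the data columns with hard defects."""
    bad = sorted(set(defect_columns))
    if len(bad) <= spare_columns:
        return {"usable": True, "steer": bad}
    return {"usable": False, "steer": []}  # too many bad columns: map out


def write_row(bits, steer_col, stuck_col):
    """Store a row; the cell in `stuck_col` is modelled as stuck at 0."""
    cells = list(bits)
    spare = bits[steer_col] if steer_col is not None else 0
    if stuck_col is not None:
        cells[stuck_col] = 0
    return cells, spare


def read_row(cells, spare, steer_col):
    """Read a row back, taking the steered column from the spare cell."""
    out = list(cells)
    if steer_col is not None:
        out[steer_col] = spare
    return out


bank0 = plan_bank([3] + [0xB] * 5)  # defects in two different columns
bank1 = plan_bank([6] * 5)          # five defects, all in column 6

data = [1] * 16
cells, spare = write_row(data, steer_col=6, stuck_col=6)
steered = read_row(cells, spare, steer_col=6)
unsteered = read_row(cells, spare, steer_col=None)
```

Here `bank0` comes back unusable because one spare cannot cover two columns, while `bank1` is steered on column 6 and reads back intact.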
Note that the information about bank 0 and bank 1, as well as bank 2, appears in bit steering configuration register 7 of processor 1. Memory configuration register 6 in processor 1 includes data regarding the sizes and starting addresses of the banks in memory system 18.
According to the present invention, the information in registers 6 and 7 of processor 1, which is presumed to be the processor which ran the global memory test by which the noted defects were identified, is distributed in identical form to each of the other processors in the multi-processor system. On the other hand, the corresponding form of information about the local memory associated with each respective processor is not distributed. The distribution of the information to corresponding registers in all of the multi-processors ensures a consistent view of global memory from each of the multiple processors while allowing fully individualized management of local memory.
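A minimal sketch of this split between shared and private register contents follows. The `ProcessorRegisters` type, the dictionary layout, and the convention that bank 0 is the local bank are assumptions made for illustration; the addresses and sizes are invented values in arbitrary units.

```python
from dataclasses import dataclass, field

LOCAL_BANK = 0  # assumption: bank 0 is each processor's local memory


@dataclass
class ProcessorRegisters:
    memory_config: dict = field(default_factory=dict)  # bank -> (start, size)
    bit_steering: dict = field(default_factory=dict)   # bank -> column or None


def distribute_global(source, targets):
    """Copy only the global-bank entries from `source` into each target."""
    for target in targets:
        for bank, entry in source.memory_config.items():
            if bank != LOCAL_BANK:
                target.memory_config[bank] = entry
        for bank, column in source.bit_steering.items():
            if bank != LOCAL_BANK:
                target.bit_steering[bank] = column


tester = ProcessorRegisters(
    memory_config={0: (0, 64), 1: (64, 32), 2: (96, 32)},
    bit_steering={0: None, 1: 6, 2: None},
)
other = ProcessorRegisters(memory_config={0: (0, 16)}, bit_steering={0: 9})
distribute_global(tester, [other])
```

After the call, `other` holds the same entries as `tester` for banks 1 and 2 while its own bank 0 entries are unchanged.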
The distribution of spare bit steering information and memory configuration information can be accomplished in various ways. Preferably, the register data is conveyed from the processor which performed the global memory test to the other processors in the system through a broadcast using semaphore registers 16 (FIG. 1). In the alternative, the spare bit steering and memory configuration information may be written to a designated block of global memory by the processor which tested global memory, and then read from that block, one processor at a time, by each of the other three processors. In this practice of the invention, memory pointer and minimum configuration information is still passed through semaphores 16. Pointers are used to identify the block of global memory containing the information, so that the location can be adjusted for global memory defects. The minimum configuration information specifies the global memory bank organization.
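The second approach, a block in global memory located through a pointer held in a semaphore register, might look like the following sketch. Everything concrete here is an assumption for illustration: the block size, the byte layout, the `NO_STEER` marker, and the choice of register 0 for the pointer. The minimum configuration information mentioned above is omitted for brevity.

```python
import struct

BLOCK_SIZE = 32   # hypothetical size of the designated block, in bytes
NO_STEER = 0xFF   # hypothetical marker: no column is steered in this bank
POINTER_REG = 0   # hypothetical semaphore register that carries the pointer


def publish(global_mem, bad_blocks, steering, semaphore_regs):
    """Write steering data to the first defect-free block and post its address."""
    payload = bytes([len(steering)])
    for bank, column in steering:
        payload += struct.pack("<BB", bank, NO_STEER if column is None else column)
    if len(payload) > BLOCK_SIZE:
        raise ValueError("steering data does not fit in one block")
    for index in range(len(global_mem) // BLOCK_SIZE):
        if index not in bad_blocks:
            address = index * BLOCK_SIZE
            global_mem[address:address + len(payload)] = payload
            semaphore_regs[POINTER_REG] = address
            return address
    raise RuntimeError("no defect-free block available")


def fetch(global_mem, semaphore_regs):
    """Follow the posted pointer and rebuild the list of (bank, column) pairs."""
    address = semaphore_regs[POINTER_REG]
    count = global_mem[address]
    steering = []
    for i in range(count):
        bank, column = struct.unpack_from("<BB", global_mem, address + 1 + 2 * i)
        steering.append((bank, None if column == NO_STEER else column))
    return steering


memory = bytearray(128)
registers = [0] * 4
posted = publish(memory, bad_blocks={0}, steering=[(1, 6), (2, None)],
                 semaphore_regs=registers)
seen_by_reader = fetch(memory, registers)
```

Because block 0 is marked defective in this example, the data lands in the next block and the readers find it only through the posted pointer, which is the reason the text gives for using pointers.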
FIG. 3 illustrates the sequences of operations in each of N associated processors. It is presumed that processor 0 acquires responsibility for testing global memory. Two aspects are worth noting. First, since only one processor performs the test of global memory and determines the associated bit steering and bank configuration, the delays and potential inconsistencies associated with having each processor perform a similar test are obviated. Second, note from the parallelism of the operations performed in the various processors that one or more of the processors could be disconnected or disabled without inhibiting the operation of the remaining processors. Since the first processor to reach a certain stage in the booting process assumes the responsibilities of processor 0, the parallel character always remains intact.
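The boot sequence can be imitated with threads standing in for processors. This is a rough sketch under assumptions, not the flow of FIG. 3 itself: `boot_all`, the callback parameters, and the use of a Python lock and event in place of semaphore registers are all hypothetical.

```python
import threading


def boot_all(num_cpus, test_local, test_global):
    """Run a toy boot on `num_cpus` threads and return each one's result."""
    election = threading.Lock()    # stands in for an election semaphore register
    published = threading.Event()  # set once the global results are available
    shared = {}
    results = [None] * num_cpus

    def boot(cpu):
        local = test_local(cpu)                 # every processor tests local memory
        won = election.acquire(blocking=False)  # first arrival wins, never released
        if won:
            shared["global"] = test_global()    # only the winner tests global memory
            published.set()
        else:
            published.wait()                    # the rest wait for the winner's data
        results[cpu] = {
            "local": local,
            "global": dict(shared["global"]),
            "tested_global": won,
        }

    threads = [threading.Thread(target=boot, args=(c,)) for c in range(num_cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


outcome = boot_all(
    4,
    test_local=lambda cpu: {0: cpu},       # placeholder local steering result
    test_global=lambda: {1: 6, 2: None},   # placeholder global steering result
)
```

Which thread wins the election varies from run to run, mirroring the point that any processor can take on the role of processor 0. The sketch does not model a winner that fails after winning the election.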
In summary, it should be apparent that the system and related method to which the present invention pertains ensure consistency of global array memory spare bit steering in the context of a loosely coupled multi-processor system coordinated through software management of both semaphore registers in an atomic semaphore controller and cache coherence. Spare bit and memory configuration information is derived by a selected processor and distributed either through the atomic semaphore controller or through a commonly accessible block of global memory. The relative independence of the processors provides extended system level operational redundancy.
Though the invention has been described and illustrated by way of a specific embodiment, the systems and methods encompassed by the invention should be interpreted consistent with the breadth of the claims set forth hereinafter.
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US4253146 *||Dec 21, 1978||Feb 24, 1981||Burroughs Corporation||Module for coupling computer-processors|
|US4438494 *||Aug 25, 1981||Mar 20, 1984||Intel Corporation||Apparatus of fault-handling in a multiprocessing system|
|US4608687 *||Sep 13, 1983||Aug 26, 1986||International Business Machines Corporation||Bit steering apparatus and method for correcting errors in stored data, storing the address of the corrected data and using the address to maintain a correct data condition|
|US4965717 *||Dec 13, 1988||Oct 23, 1990||Tandem Computers Incorporated||Multiple processor system having shared memory with private-write capability|
|US4972314 *||Jul 5, 1988||Nov 20, 1990||Hughes Aircraft Company||Data flow signal processor method and apparatus|
|US5099418 *||Jun 14, 1990||Mar 24, 1992||Hughes Aircraft Company||Distributed data driven process|
|US5134616 *||Feb 13, 1990||Jul 28, 1992||International Business Machines Corporation||Dynamic ram with on-chip ecc and optimized bit and word redundancy|
|US5163133 *||Feb 17, 1989||Nov 10, 1992||Sam Technology, Inc.||Parallel processing system having a broadcast, result, and instruction bus for transmitting, receiving and controlling the computation of data|
|US5199033 *||May 10, 1990||Mar 30, 1993||Quantum Corporation||Solid state memory array using address block bit substitution to compensate for non-functional storage cells|
|US5204938 *||Mar 18, 1992||Apr 20, 1993||Loral Aerospace Corp.||Method of implementing a neural network on a digital computer|
|1||*||IBM Technical Disclosure Bulletin, vol. 33, No. 9, Feb., 1991-"System Support for Multiprocessing Without an Atomic Storage", pp. 18-23.|
|2||IBM Technical Disclosure Bulletin, vol. 33, No. 9, Feb., 1991-"System Support for Multiprocessing Without an Atomic Storage", pp. 18-23.|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US5642506 *||Jan 16, 1996||Jun 24, 1997||International Business Machines Corporation||Method and apparatus for initializing a multiprocessor system|
|US5684979 *||Jul 23, 1996||Nov 4, 1997||International Business Machines Corporation||Method and means for initializing a page mode memory in a computer|
|US5829052 *||Dec 10, 1996||Oct 27, 1998||Intel Corporation||Method and apparatus for managing memory accesses in a multiple multiprocessor cluster system|
|US5835767 *||Aug 19, 1994||Nov 10, 1998||Unisys Corporation||Method and apparatus for controlling available processor capacity|
|US5867702 *||Jan 23, 1997||Feb 2, 1999||International Business Machines Corporation||Method and apparatus for initializing a multiprocessor system|
|US6058475 *||Sep 22, 1997||May 2, 2000||Ncr Corporation||Booting method for multi-processor computer|
|US6151663 *||Jan 2, 1998||Nov 21, 2000||Intel Corporation||Cluster controller for memory and data cache in a multiple cluster processing system|
|US6192384 *||Sep 14, 1998||Feb 20, 2001||The Board Of Trustees Of The Leland Stanford Junior University||System and method for performing compound vector operations|
|US6457100 *||Sep 15, 1999||Sep 24, 2002||International Business Machines Corporation||Scaleable shared-memory multi-processor computer system having repetitive chip structure with efficient busing and coherence controls|
|US6601165||Mar 26, 1999||Jul 29, 2003||Hewlett-Packard Company||Apparatus and method for implementing fault resilient booting in a multi-processor system by using a flush command to control resetting of the processors and isolating failed processors|
|US7251744 *||Jan 21, 2004||Jul 31, 2007||Advanced Micro Devices Inc.||Memory check architecture and method for a multiprocessor computer system|
|US7996592 *||May 2, 2001||Aug 9, 2011||Nvidia Corporation||Cross bar multipath resource controller system and method|
|US9600189||Jun 11, 2014||Mar 21, 2017||International Business Machines Corporation||Bank-level fault management in a memory system|
|US20020166017 *||May 2, 2001||Nov 7, 2002||Kim Jason Seung-Min||Cross bar multipath resource controller system and method|
|US20080059687 *||Aug 31, 2006||Mar 6, 2008||Peter Mayer||System and method of connecting a processing unit with a memory unit|
|U.S. Classification||711/147, 713/1|
|International Classification||G06F15/16, G06F9/52, G06F15/177, G06F11/10, G06F15/173, G11C29/00|
|Cooperative Classification||G11C29/70, G06F15/17375, G06F11/1044|
|European Classification||G11C29/70, G06F15/173N4A|
|Nov 9, 1992||AS||Assignment|
Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST.;ASSIGNORS:HARDELL,, WILLIAM R., JR.;HENSON,, JAMES D., JR.;MITCHELL, OSCAR R.;REEL/FRAME:006339/0764
Effective date: 19921109
|Nov 12, 1997||FPAY||Fee payment|
Year of fee payment: 4
|Dec 14, 2001||FPAY||Fee payment|
Year of fee payment: 8
|Jan 18, 2006||REMI||Maintenance fee reminder mailed|
|Jul 5, 2006||LAPS||Lapse for failure to pay maintenance fees|
|Aug 29, 2006||FP||Expired due to failure to pay maintenance fee|
Effective date: 20060705