|Publication number||US7826996 B2|
|Application number||US 11/679,175|
|Publication date||Nov 2, 2010|
|Priority date||May 20, 2003|
|Also published as||US7184916, US8126674, US20040267481, US20080059105, US20100324854, WO2004114319A2, WO2004114319A3|
|Publication number||11679175, 679175, US 7826996 B2, US 7826996B2, US-B2-7826996, US7826996 B2, US7826996B2|
|Inventors||David R. Resnick, Gerald A. Schwoerer, Kelly J. Marquardt, Alan M. Grossmeier, Michael L. Steinberger, Van L. Snyder, Roger A. Bethard|
|Original Assignee||Cray Inc.|
This is a divisional of U.S. patent application Ser. No. 10/850,044 titled “APPARATUS AND METHOD FOR TESTING MEMORY CARDS” filed May 19, 2004 (now U.S. Pat. No. 7,184,916), which claimed benefit under 35 U.S.C. 119(e) to U.S. Provisional Application Ser. No. 60/472,174, entitled “APPARATUS AND METHOD FOR TESTING MEMORY CARDS”, filed May 20, 2003, each of which is herein incorporated in its entirety by reference.
This application is also related to: U.S. patent application Ser. No. 10/850,057 entitled “APPARATUS AND METHOD FOR MEMORY WITH BIT SWAPPING ON THE FLY AND TESTING” filed on May 19, 2004 (now U.S. Pat. No. 7,320,100), which claimed benefit of U.S. Provisional Patent Application No. 60/472,174 filed on May 20, 2003, titled “APPARATUS AND METHOD FOR TESTING MEMORY CARDS,” and to
U.S. patent application Ser. No. 11/558,450 titled “APPARATUS AND METHOD FOR MEMORY READ-REFRESH, SCRUBBING AND VARIABLE-RATE REFRESH” filed on Nov. 10, 2006 (which is a divisional of U.S. patent application Ser. No. 10/850,057);
U.S. patent application Ser. No. 11/558,452 titled “APPARATUS AND METHOD FOR MEMORY ASYNCHRONOUS ATOMIC READ-CORRECT-WRITE OPERATION” filed on Nov. 10, 2006 (now U.S. Pat. No. 7,676,728) (which is also a divisional of U.S. patent application Ser. No. 10/850,057); and
U.S. patent application Ser. No. 11/558,454 titled “APPARATUS AND METHOD FOR MEMORY BIT-SWAPPING-WITHIN-ADDRESS-RANGE CIRCUIT” filed on Nov. 10, 2006 (now U.S. Pat. No. 7,565,593) (which is also a divisional of U.S. patent application Ser. No. 10/850,057); each of which is incorporated herein in its entirety by reference.
This invention relates to the field of computer memories, and more specifically to a method and apparatus for testing a computer memory, for example one implemented on a card in which additional logic functions on the card make direct access to the memory parts themselves difficult or impossible, and for testing memory-card logic in which the normal data paths do not support easy, low-cost test access.
Modern computer systems require faster, more sophisticated, and larger capacity memory, often provided on daughter cards such as DIMMs (dual-inline memory modules) having a plurality of memory chips per daughter card. As system performance keeps increasing, it is difficult and expensive to connect enough memory parts more or less directly to the processor or its interface ICs. Electrical issues and pin limitations push memory system design in directions that put the memory controller(s) on the memory cards and also push the card interface to have higher data rates per pin in order to reduce the number of pins while keeping the card bandwidth in line with the higher performance needs of the attached processors and of the bandwidth of the memory components on the memory cards. A memory card design that adopts this direction has test issues, in that the memory components (the chips) are not directly accessible for testing as is normal in past industry practice, and the data rates of the high-speed interfaces are too fast for connection to testers that are available in normal production testing. While special purpose test equipment can be built and used, the design of special-purpose memory testers is very expensive and time consuming.
Thus, there is a need for improved testing methods and apparatus for new memory cards and for logic functions in which test access is ‘hidden’ behind high speed interfaces.
The present invention provides a memory daughter card (MDC) having one or more (likely multiple) very high-speed serial interface(s), optionally an on-card L3 cache, and an on-card MDC test engine that allows one MDC to be directly connected to another MDC, or to itself, for testing purposes. In some embodiments, a control interface, such as a JTAG interface and/or a Firewire channel, allows the test engine to be programmed and controlled by a test controller on a test fixture that allows a single card to be tested, or simultaneous testing of one or more pairs of MDCs, one MDC in a pair (the “golden” MDC) testing the other MDC of that pair.
A method is also described, wherein one MDC executes a series of reads and writes (and optionally other commands) to another MDC to test at least some of (and ideally, most or all of) the other card's functions. A method is also described, wherein one port of an MDC executes a series of reads and writes (and optionally other commands) to another port of the same MDC to test at least some of (and ideally, most or all of) the card's functions.
It is to be understood that a memory “card” includes any suitable packaging, including printed circuit card, ceramic module, or any other packaging that holds a plurality of memory chips along with some or all of the circuitry described herein. In some embodiments, a “card” would include a single integrated-circuit chip having both the memory and some or all of the circuitry described herein.
In the following detailed description of the preferred embodiments, reference is made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific embodiments in which the invention may be practiced. It is understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention.
The leading digit(s) of reference numbers appearing in the Figures generally corresponds to the Figure number in which that component is first introduced, such that the same reference number is used throughout to refer to an identical component which appears in multiple Figures. Signals and connections may be referred to by the same reference number or label, and the actual meaning will be clear from its use in the context of the description.
In some embodiments, each MDC 110 includes a single W-chip 120 (i.e., a circuit 120 that, in some embodiments, is implemented on a single chip and, in some embodiments, includes other circuitry and/or functions than described herein; in other embodiments, circuit 120 is implemented using more than one chip, but is still designated herein as W-chip or circuit 120) having a high-speed external card interface 112, which in turn includes a plurality of SerDes (serializer-deserializer) ports 121 (for example, four SerDes ports 121 per MDC 110 are used in some embodiments). A crossbar switch 123 connects each SerDes port 121 to each one of a plurality of L3 caches 124 (for example, four L3 caches 124 per MDC 110 are provided in some embodiments). In some embodiments, each L3 cache 124 is tied by connection 126 to a corresponding DDR2 memory controller 127. In some embodiments, an additional “degrade capability” connection 128 is provided between each L3 cache 124 and a neighboring DDR2 memory controller 127. In some embodiments, each DDR2 memory controller 127 controls a memory or memory portion 130 having five eight-bit-wide DDR2 memory-chip groups 130 (for example, each chip group 130 having one memory chip, or having two or more stacked chips). This provides each DDR2 memory controller 127 with a forty-bit-wide data path, providing 32 data bits, seven ECC (error-correction code) bits, and a spare bit.
In some embodiments, the individual memory components of the memory-chip group(s) 130 conform to the emerging JEDEC Standards Committee DDR2 SDRAM Data Sheet Revision 1.0 Specification JC 42.3 (JESD79-2 Revision 1.0) dated Feb. 3, 2003 or subsequent versions thereof. In other embodiments, conventional, readily available DDR chips are used. In yet other embodiments, any suitable memory-chip technology (such as Rambus™, SDRAM, SRAM, EEPROM, Flash memory, etc.) is used for memory chip-groups 130.
The W-chip 120 also includes a control interface 122 (some embodiments use a JTAG-type boundary scan circuit for control interface 122; some embodiments further use a Firewire (IEEE Standard 1394) channel for the off-card interface 119 to control interface circuit 122). In some embodiments, the Firewire interface is built into W-chip 120, while in other embodiments, the Firewire interface is built on a separate chip on MDC 110, and connects to a JTAG interface provided by control interface 122. Control interface 122 provides the mechanism to set bit patterns in configuration registers (for example, some embodiments use memory-mapped registers (MMRs)) that hold variables that control the operation of W-chip 120.
The present invention also provides circuitry that allows one MDC 110 to test another MDC 110, in some embodiments, or to test itself, in some embodiments. In some embodiments, this circuitry is implemented as a W-chip test engine (WTE) 125 having a microcode sequence, described further below.
In some embodiments, such a configuration allows a large variety of debug activities to be performed that are not available on simpler setups that run a large number of tests, but generate only a pass-fail result, such as checking a checksum value after a large number of tests were run. The ability to load microcode having newly devised tests allows intricate debug to be performed, even when the high-speed interfaces (SerDes ports 121, for example) are run at full speed.
In some embodiments, the test fixture 211 (which is similar to fixture 210 of
In some embodiments, test controller 220 sets up one or more SerDes ports 121 (for example, port 0 and port 2) as the tester port(s) (wherein WTE 125 runs the memory tests out of those ports and receives results back into those ports), and sets up the other ports 121 (for example, ports 1 and 3) as the unit-under-test (UUT) ports wherein they are configured in the normal read/write memory card mode (as if it were in system 100 of
In some embodiments, a test-control computer 288 is provided to drive test controller 220, and to receive results for display, transmission, or storage. In some embodiments, a computer-readable storage medium 289 (such as diskette, CDROM) is used to provide the control program data that is loaded into microcode memory 310 of
In some embodiments, the cache quadrants 124 each drive separate memory controllers 127. In turn, each memory controller drives a set of memory chips 130.
Thus, the memory daughter card (MDC) 110 for computer system 100 is very different from conventional memory cards designed and used previously in the computer industry. MDC 110 does not provide direct access to the memory parts on the card from the card's connector, but instead it receives commands and functional requests through four high-speed ports 121 that cannot easily be connected to, or functionally tested by, general-purpose testers or conventional memory testers. This means that test capability of the card must be designed into the card as part of the design process and, in some embodiments, needs to interact with and accept test requirements of the vendor or vendors that will manufacture the card. This invention describes the basic test requirements and capabilities in support of all aspects of making and using an MDC 110: in card manufacturing and test, in initial system debug and checkout, in field test and support, in card repair, etc.
The test capability described here is typically not intended to replace a multimillion-dollar test system, but to enable verification of correct operation of all components on the card and to support maintenance and debugging functions when needed.
Overview of MDC 110
In some embodiments, MDC 110 includes two major kinds of components: a single ASIC called W-chip 120 (other embodiments include a plurality of chips that together provide the function for such a W-chip 120), and a plurality of (e.g., twenty, in some embodiments) DDR2 (double-data-rate type two) memory-chip groups 130 (or, in other embodiments, other types or mixes of types of memory components 130). In some embodiments, there are multiple less-complex components, generally capacitors.
Clock signals 222 (there are two required, in some embodiments) are supplied through the card connector using differential signaling.
As shown in
In addition, two MDCs 110 can be connected together such that one MDC 110 can be used to provide test data and test sequences for the other MDC 110.
In some embodiments, the W-chip test engine 125, other maintenance functions, and other status and control aspects of MDC 110 and W-chip 120 are accessed through a JTAG port 122 (Joint Test Action Group, IEEE Std. 1149.1) that is available at the card connector pins. In other embodiments, a Firewire channel is provided and connected as the external interface to the MDC 110, and is internally connected to the JTAG control interface 122.
In some embodiments, each DRAM controller 127 drives five memory parts 130, each being eight-bits wide, and thus has a 40-bit data interface. In some embodiments, a second rank of five parts 130 is also supported. In other embodiments, multiple ranks of chips are provided, with a separate chip select per rank. This needs only one additional chip-select signal output from each memory controller 127 for each memory rank in the chip-group stacks since, if the two-rank capability is implemented, memory chips are, in some embodiments, connected as five stacks of two memory parts each with almost all pins shared in each stack.
In operation, each 40-bit data interface is used as thirty-two data bits, seven SECDED (single-bit error correction, double-bit error detection) checkbyte bits and an active spare bit. When being tested, memory can be accessed like that or alternatively or additionally can be exercised as a simple 40-bit interface.
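By way of a non-limiting illustration, the two ways of using the 40-bit interface described above can be sketched in Python. The bit positions chosen here (data in the low 32 bits, checkbits next, spare bit on top) and the names `pack_word` and `unpack_word` are assumptions made for the example; the description does not specify a bit ordering.

```python
DATA_BITS = 32   # data bits per memory word (from the description)
CHECK_BITS = 7   # SECDED checkbits (from the description)
WORD_BITS = 40   # full width of the memory data interface

def pack_word(data: int, check: int, spare: int = 0) -> int:
    """Pack the three fields into one 40-bit word (hypothetical bit layout)."""
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("data must fit in 32 bits")
    if not 0 <= check < (1 << CHECK_BITS):
        raise ValueError("check must fit in 7 bits")
    if spare not in (0, 1):
        raise ValueError("spare is a single bit")
    return data | (check << DATA_BITS) | (spare << (DATA_BITS + CHECK_BITS))

def unpack_word(word: int):
    """Split a 40-bit word back into (data, check, spare)."""
    data = word & ((1 << DATA_BITS) - 1)
    check = (word >> DATA_BITS) & ((1 << CHECK_BITS) - 1)
    spare = (word >> (DATA_BITS + CHECK_BITS)) & 1
    return data, check, spare

# Normal mode: the fields are handled separately.
word = pack_word(0xDEADBEEF, 0x55, 1)
# Test mode: the same storage is exercised as a plain 40-bit pattern.
all_ones = (1 << WORD_BITS) - 1
```

In this sketch, writing `all_ones` in the simple 40-bit mode exercises the checkbit and spare-bit storage directly, which is what the 32-bit mode alone would not allow.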
A basic feature for the test design of MDC 110 is that the card is testable with almost no support needed externally, except for connection to a controlling JTAG (or Firewire or other similar or suitable channel) interface, two clock sources, and some routing on the connector that provides power in addition to connections to the clocks and maintenance wiring, at a minimum. In an MDC 110 testing environment, wiring for interface port loopback tests should be provided, for example as shown in
The test design will also support using one MDC 110 to test another. Doing this means a more complex test fixture in order to have the pair of cards connected together, for example as shown in
Software support is required to drive the JTAG interface and to make use of the test capabilities of the card. In some embodiments, an interface is provided between a standard channel such as Firewire (IEEE Std. 1394) and the JTAG pins of W-chip 120, because the connection to a maintenance or control processor 220 will, in some embodiments, require the interface chip for operation and maintenance of computer system 100.
In some embodiments, loopback connections for the high-speed ports 121 using the test fixture enable the ports 121 to be tested at full data rates without test or tester connections to the ports 121. The port interface transmitters and receivers automatically synchronize together and then pass test data back and forth as part of each port's initialization sequence, indicating that each port is ready for use. In addition, the WTE can generate and receive test operands for the interface ports 121 using the test fixture's loopback wiring. These tests can use test-specified or pseudo-random data patterns. The same test sequences can be done using an internal loopback capability at each port's IO pads (See
In some embodiments, the WTE is a basic microcode sequencer that is designed to generate requests and accept responses from the internal logic and memory functions and can check the returned data. The sequencer is loaded with tests consisting of commands, address sequences (including looping capabilities), test data and expected result data according to the needs of the test to be performed. The test engine 125 is very flexible so that a diagnostic or test engineer can directly specify needed test functions and sequences. Test sequences of almost unlimited lengths can be generated.
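By way of a non-limiting illustration, the idea of a sequencer loaded with commands, looping address sequences, test data and expected data can be sketched as follows. The instruction tuple format, the operation names `WRITE` and `READ_CHECK`, and the `StuckBitMemory` fault model are all hypothetical and chosen only for this example; they are not the microcode format of the WTE.

```python
def run_sequence(program, memory):
    """Run (op, base, stride, count, data) steps; return a list of mismatches.

    Each step loops over `count` addresses starting at `base`. READ_CHECK
    compares the returned value against the expected `data`.
    """
    failures = []
    for op, base, stride, count, data in program:
        for i in range(count):
            addr = base + i * stride
            if op == "WRITE":
                memory[addr] = data
            elif op == "READ_CHECK":
                got = memory.get(addr)
                if got != data:
                    failures.append((addr, data, got))
            else:
                raise ValueError(f"unknown op {op!r}")
    return failures

class StuckBitMemory(dict):
    """Toy memory in which bit 3 at address 0x20 is stuck at zero."""
    def __setitem__(self, addr, value):
        if addr == 0x20:
            value &= ~(1 << 3)
        super().__setitem__(addr, value)

program = [
    ("WRITE", 0x00, 0x10, 4, 0xFFFFFFFF),
    ("READ_CHECK", 0x00, 0x10, 4, 0xFFFFFFFF),
]
good = run_sequence(program, {})
bad = run_sequence(program, StuckBitMemory())
```

Here `good` is empty, while `bad` reports the failing address together with the expected and observed data, which is the kind of detail a pass-fail checksum would not provide.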
In some embodiments, the test data width is controllable so that data functions with and without accompanying SECDED ECC can be tested easily. The WTE also can generate tests with pseudo-random numbers and check the results of tests using that data. The number of different test-data operands and expected-data results is typically bounded by buffer size limits.
The L3 cache can be tested specifically by the test engine 125 and can be used to help test the DRAM memory subsections. When testing the subsections, test data can be placed into the cache through the JTAG port or can be written to the cache by the WTE. A test sequence in the WTE can then generate requests to the cache that cause cache data to be written to the subsection memory. Subsequent WTE requests can cause that data to be read and checked. The benefit of doing this, as the cache is small with respect to the memory in each subsection, is that full memory bandwidths can be generated so as to check for data and timing interactions and for other transient issues.
Each of the logic functions in W-chip 120 has several associated MMRs (Memory Mapped Registers). The registers control and configure the respective logic. Also, if a function has status (such as a memory controller 127 providing information on SECDED errors), that information is recorded in local MMRs. All MMRs can be accessed and controlled through the JTAG interface.
Some errors detected by normal logic functions can indicate the need for support, recovery or reconfiguration by the operating system or maintenance processor. For those cases, data packets can be generated by the normal logic functions; these packets become interrupt requests in normal system operation and can provide expected interaction that helps verify correct operation of MDC 110 functions. All interrupts can be enabled and disabled by setting control bits in MMRs.
In normal use, system data paths are 64 data-bits wide and are considered as having a single 8-byte data item or two 32-bit data items. At the memory, the data path is 40-bits wide to support 32-bit data items, ECC (the error correction code data) and the spare-memory-bit path. In some embodiments, in order to enable full testing of the memory chips, all needed paths in W-chip 120 support 40- and 80-bit data widths.
Of course the high-speed processor ports 121 are narrower (four bits in each direction, for some embodiments). However, the SerDes assembly/disassembly process allows for interface data packet elements (called flits in packet parlance) to support data that is 32- and 64-bits wide. In addition, the interface supports 40-bit wide data elements in test mode, in which 40 bits of each 64-bit data item hold test data.
Functions of the Memory Controllers 127 that Affect Test and Maintenance
A simplification for some embodiments of the controller 127 is that individual byte-enables are not used. For those cases, at each data strobe, all 40 data bits are used or they are all skipped. Also, in some embodiments, there are no power-down or sleep modes supported in memory and there are no chip self- or auto-refresh functions. Each controller 127 generates distributed refresh functions using normal memory references and uses the returned data to accomplish background memory scrubbing. (If the refresh data has an error, a memory write cycle is scheduled to put correct data back in memory, if that is possible for those embodiments.)
For some embodiments, each memory controller 127 can only accept memory requests that result in 16-byte/burst-of-four or 32-byte/burst-of-eight data transfers to/from memory. All references close the banks in the memory parts at the completion of that operation for those embodiments. In some embodiments, there is one maintenance case where one MDC 110 is being used to source test sequences to another card in which whole rows from the memory banks are transferred. This function is typically not used in normal system operation.
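As a brief worked check of the two transfer sizes named above, the following sketch relates burst length to bytes of user data, assuming (consistent with the 32 data bits per 40-bit word described elsewhere herein) that each beat of a burst carries four bytes of data. The function name `burst_bytes` is hypothetical.

```python
DATA_BITS_PER_BEAT = 32  # data portion of each 40-bit transfer

def burst_bytes(burst_length: int) -> int:
    """User-data bytes moved by one burst of the given length."""
    return burst_length * DATA_BITS_PER_BEAT // 8

sizes = {n: burst_bytes(n) for n in (4, 8)}
```

Under that assumption a burst of four moves 16 bytes and a burst of eight moves 32 bytes, matching the two request sizes stated above.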
The same logic that detects and fixes data being scrubbed can be used to rewrite correct data back to memory when a correctable error occurs during normal user operation, in some embodiments. (Most systems using SECDED or more powerful error correction schemes fix the data being returned to a user but leave the data bad in memory. This can accumulate soft errors in memory and result in multi-bit, uncorrectable errors.)
For some embodiments, each controller has 7-bit SECDED and an active spare bit along with the normal data path of 32 bits. In test mode either 32-bit (letting the controller control the other eight bits), or 40-bit data can be written and read. In 32-bit mode, checkbytes are generated and checked and the position of a data bit to be replaced by the spare data-bit can be specified. The WTE can exercise and test this logic.
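By way of a non-limiting illustration, one textbook way to obtain single-error correction and double-error detection with seven checkbits over 32 data bits is an extended Hamming code: six Hamming parity bits plus one overall parity bit. The sketch below implements that generic code; it is an assumption for illustration only, since the description does not state which SECDED code or bit ordering the controller uses.

```python
PARITY_POS = (1, 2, 4, 8, 16, 32)                            # Hamming parity positions
DATA_POS = [p for p in range(1, 39) if p not in PARITY_POS]  # 32 data positions
CODE_BITS = 39                                               # position 0 is overall parity

def encode(data: int) -> int:
    """Encode 32 data bits into a 39-bit extended Hamming codeword."""
    code = 0
    for i, pos in enumerate(DATA_POS):
        if (data >> i) & 1:
            code |= 1 << pos
    for p in PARITY_POS:
        parity = 0
        for pos in range(1, CODE_BITS):
            if (pos & p) and (code >> pos) & 1:
                parity ^= 1
        code |= parity << p
    code |= bin(code).count("1") & 1   # make the total weight even
    return code

def decode(code: int):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, CODE_BITS):
        if (code >> pos) & 1:
            syndrome ^= pos
    odd_weight = bin(code).count("1") & 1
    if syndrome == 0 and not odd_weight:
        status = "ok"
    elif odd_weight and syndrome < CODE_BITS:
        code ^= 1 << syndrome          # syndrome 0 means the overall parity bit
        status = "corrected"
    else:
        return None, "uncorrectable"   # even weight with nonzero syndrome
    data = 0
    for i, pos in enumerate(DATA_POS):
        data |= ((code >> pos) & 1) << i
    return data, status

codeword = encode(0xCAFEF00D)
```

A test sequence of the kind described above could write a codeword in 40-bit mode with one or two bits deliberately flipped, then read it back in 32-bit mode and confirm that the correction and detection logic respond as expected.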
For some embodiments, each controller is designed to maximize memory bandwidth by allowing memory requests to go out of order and by grouping read and write operations such that bus turn-around losses are reduced. The reordering takes place with respect to the memory banks of the memory chips so that multiple requests for the same bank stay in order. For example, if the oldest request is for bank 0 but that bank is busy, a following request is used to start another memory operation for a bank that is not busy. The reordering function cannot be turned off in some embodiments, but can be controlled and used by choosing the address sequences that are generated for testing. The test engine 125 can check returned data without being dependent on data ordering. Each memory request has a transaction identifier (TID) that is used to establish correspondence between particular requests and data being returned in response to the requests by returning the TID with the corresponding returned data items.
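By way of a non-limiting illustration, the bank-aware reordering and TID matching described above can be sketched as follows. The one-request-per-cycle model, the fixed bank busy time, and the names `Request`, `pick_next`, `schedule` and `check_responses` are assumptions made for the example and do not describe the actual controller timing.

```python
from collections import namedtuple

Request = namedtuple("Request", "tid bank")

def pick_next(queue, busy_banks):
    """Index of the oldest queued request whose bank is idle, else None."""
    for i, req in enumerate(queue):
        if req.bank not in busy_banks:
            return i
    return None

def schedule(requests, bank_busy_cycles=3):
    """Issue at most one request per cycle; return TIDs in issue order."""
    queue = list(requests)
    busy = {}                      # bank -> cycles of busy time remaining
    issued = []
    while queue:
        busy = {b: c - 1 for b, c in busy.items() if c > 1}
        idx = pick_next(queue, busy)
        if idx is not None:
            req = queue.pop(idx)
            busy[req.bank] = bank_busy_cycles
            issued.append(req.tid)
    return issued

def check_responses(expected_by_tid, responses):
    """Return TIDs whose returned data differs, regardless of arrival order."""
    return [tid for tid, data in responses if expected_by_tid[tid] != data]

reqs = [Request(0, 0), Request(1, 0), Request(2, 1), Request(3, 0), Request(4, 2)]
order = schedule(reqs)
```

Because the scan always starts from the oldest entry, two requests to the same bank can never pass each other, while requests to idle banks move ahead of a blocked one. Checking by TID means the test does not need to predict the issue order.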
Each controller can be driven directly from the JTAG interface for more direct memory access, though this path does not support testing at high data rates (in some embodiments, it is limited to 4 MBytes/sec or so).
The spare bit capability mentioned above allows an otherwise unused bit in the data path to memory to substitute for any of the other bits. Thus the memory interface is functionally 39 bits wide and the 40th bit can be used in place of any of the other 39. It is expected that the spare will generally be used to avoid ‘stuck’ bits in memory though it is also useful for some failures like broken nets and pins and similar faults.
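By way of a non-limiting illustration, steering one failing bit position to the spare bit can be sketched as follows. Placing the spare at bit 39 and the function names `to_physical` and `to_logical` are assumptions for the example; the description does not give the physical position of the spare.

```python
LOGICAL_BITS = 39
SPARE_POS = 39                      # hypothetical position of the spare bit
LOGICAL_MASK = (1 << LOGICAL_BITS) - 1

def to_physical(logical: int, bad_pos=None) -> int:
    """Build the 40-bit word to store; copy the bit at bad_pos into the spare."""
    word = logical & LOGICAL_MASK
    if bad_pos is not None:
        word |= ((word >> bad_pos) & 1) << SPARE_POS
    return word

def to_logical(physical: int, bad_pos=None) -> int:
    """Recover the 39-bit value; take the bit at bad_pos from the spare."""
    word = physical & LOGICAL_MASK
    if bad_pos is not None:
        spare = (physical >> SPARE_POS) & 1
        word = (word & ~(1 << bad_pos)) | (spare << bad_pos)
    return word

def stuck_at_zero(word: int, pos: int) -> int:
    """Model a memory cell or net whose bit at pos always reads as zero."""
    return word & ~(1 << pos)

value = 0x5A5A5A5A5A & LOGICAL_MASK
without_spare = to_logical(stuck_at_zero(to_physical(value), 5))
with_spare = to_logical(stuck_at_zero(to_physical(value, 5), 5), 5)
```

In this sketch the stuck bit corrupts `without_spare`, whereas `with_spare` returns the original value because the failing position is read from the spare instead.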
In some embodiments, there is a ‘memory degrade’ option that allows system operation to be restarted in the presence of failing memory components. When the degrade option is activated, two of the four memory controllers 127 support all four L3 cache quadrants. The degrade option allows either the even or odd numbered controllers to be used, with the other pair idled. This reduces the memory size and the memory bandwidth by half but allows users to continue to use the processors whose associated memory has failures. The degrade paths must be tested as part of the verification testing of MDC 110.
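By way of a non-limiting illustration, the degrade option can be viewed as a mapping from cache quadrant to memory controller. The pairing assumed below (quadrants 0 and 1 share one neighbor pair, quadrants 2 and 3 the other) and the name `controller_for` are hypothetical; the description states only that each cache has a connection to its own controller and to a neighboring one.

```python
def controller_for(quadrant: int, degrade=None) -> int:
    """Memory controller serving an L3 quadrant.

    degrade=None   -> normal operation, quadrant q uses controller q
    degrade="even" -> only controllers 0 and 2 are used
    degrade="odd"  -> only controllers 1 and 3 are used
    """
    if quadrant not in range(4):
        raise ValueError("quadrant must be 0..3")
    if degrade is None:
        return quadrant
    if degrade == "even":
        return quadrant & ~1
    if degrade == "odd":
        return quadrant | 1
    raise ValueError(f"unknown degrade mode {degrade!r}")

normal = [controller_for(q) for q in range(4)]
even = [controller_for(q, "even") for q in range(4)]
odd = [controller_for(q, "odd") for q in range(4)]
```

Verification testing of the degrade paths would then exercise every quadrant in both degraded modes, so that each neighbor connection is driven at least once.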
The controller design supports multiple memory-chip densities and various memory timing and functional variations, in some embodiments. These functions and modes are controlled by on-chip registers and can be exercised and tested by the test engine as desired. The memory controller, in some embodiments, also supports multiple different kinds of atomic memory operations (AMOs) like add-to-memory functions for example. These read-modify-write functions can also be exercised and tested by the test engine.
Test and Maintenance Functions of the Processor Ports 121
In some embodiments, when a SerDes receiver (SerDes-in 341 portion of a port 121) is powered up or when the receiver loses link synchronization, the receiver automatically goes into a ‘training’ mode where it expects to receive a timing sequence so that clock and frame sync can be established or recovered. When the output logic of a SerDes port 121 is initialized, each bit-serial driver puts out a data sequence that enables the corresponding receive logic to acquire both clock and frame synchronization. After the frame-sync interval, a test-data sequence is generated and processed to verify each link's functionality. If that sequence is done correctly the receiver becomes ready to accept normal data traffic.
In order for things to remain in sync, each output constantly sends data packets. If there is no port information to be transmitted at the time each packet is sent, a null packet is formed and transmitted. Status in both the transmitter and receiver indicates how things are going. This means that, for example, if a net or connector breaks, reading the status MMR of the receiver indicates that the receiver has dropped out of clock sync and is not detecting any input.
In normal use data is ‘packetized’ to enable detection and recovery from errors. Each packet has ECC for data checking and has a packet ID so that error packets can be identified. As packets are received the ECC is checked. If all packets in a frame are received correctly, an acknowledgement is passed back to the transmitter. This enables the transmitter to keep sending more packets. There is a maximum number of packets that can be sent without being acknowledged. If an error is detected, no acknowledgement is returned. The transmitter will time out (in some embodiments, the timing is adjustable) and, by knowing the last frame that was successfully received at the other end, will start retransmitting the failed frame packets. Status is kept and another MMR has a limit on the number of retries that will be attempted before giving up.
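By way of a non-limiting illustration, the acknowledge, time-out and retry-limit behavior described above can be sketched as follows. This is a simplified model that handles one frame at a time and does not model the window of unacknowledged packets; the callback `frame_ok` stands in for the receiver's ECC check, and all names are hypothetical.

```python
def send_frames(frames, frame_ok, max_retries=3):
    """Send frames in order, retransmitting any frame that is not acknowledged.

    frame_ok(index, attempt) returns True when the receiver's check passes
    and an acknowledgement comes back. Returns (delivered, retries, success).
    """
    delivered, retries = [], 0
    for index, frame in enumerate(frames):
        attempt = 0
        while not frame_ok(index, attempt):
            # No acknowledgement: the transmitter times out here.
            if attempt == max_retries:
                return delivered, retries, False   # give up, leave status
            attempt += 1
            retries += 1
        delivered.append(frame)
    return delivered, retries, True

frames = ["f0", "f1", "f2"]
# Frame 1 is corrupted on its first two transmissions, then gets through.
flaky = send_frames(frames, lambda i, a: not (i == 1 and a < 2))
# A broken link never acknowledges anything.
dead = send_frames(frames, lambda i, a: False)
```

The retry count returned here plays the role of the status that the description says is kept, and `max_retries` plays the role of the MMR-held retry limit.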
There are some other test functions that test that the packet error checking and packet retry functions work correctly. The functions are, in some embodiments, able to be controlled directly from on-chip MMRs and so do not require the WTE, though the test engine 125 can provide additional testing, if desired.
In some embodiments, any errors detected in the SerDes interface and in checking the packet data are recorded in status MMRs and are available at all times.
As was stated before, in some embodiments, logic associated with each SerDes port (the LCB or Link Control Block) can generate a pseudo-random data sequence that can be sent and checked at the receiver. This is normally done as part of the initialization sequence. This means that, in some embodiments, no additional direct test capability is needed from the WTE or from other tests specifically directed at the interface ports. Of course the ports will be exercised by data passing through the ports, as when one memory card is being used to test another card. Error checking and recovery is enabled and used for these cases.
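By way of a non-limiting illustration, a pseudo-random sequence that can be generated at the transmitter and checked at the receiver is commonly produced with a linear-feedback shift register. The sketch below uses the widely known PRBS7 polynomial x^7 + x^6 + 1; the choice of polynomial, the seed, and the names `prbs7` and `count_errors` are assumptions, since the description does not say which sequence the LCB generates.

```python
from itertools import islice

def prbs7(seed: int = 0x7F):
    """Yield bits of a PRBS7 sequence (x^7 + x^6 + 1), period 127."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be nonzero")
    while True:
        newbit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | newbit) & 0x7F
        yield newbit

def count_errors(received, seed: int = 0x7F) -> int:
    """Compare received bits against a locally generated reference."""
    reference = prbs7(seed)
    return sum(bit != next(reference) for bit in received)

sent = list(islice(prbs7(), 254))
corrupted = list(sent)
corrupted[10] ^= 1               # one bit flipped in transit
```

Because both ends can regenerate the same sequence from the same seed, the receiver needs no stored copy of the expected data to detect a bad link.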
The transmit/output and receive/input sides of each SerDes port are independent enough that a single loopback connection can verify functionality using the functions discussed above. There is a maintenance function to activate this loopback connection at the pins of W-chip 120.
Test and Maintenance Functions for the L3 Cache and Associated Logic
In some embodiments, the L3 data cache has SECDED circuitry on a 32-bit basis. Like the DRAM interface, data can be written and read in this mode and also in a 40-bit mode so that the memory underneath the data checkbytes can be easily tested. This would normally require that the cache support 39 bits, but 40 bits of data width are provided so that the data items in the cache can be used as full-width test operands for the memory subsections.
Associated with each cache line (32 bytes per cache line, in some embodiments) is an address. The address is used when memory requests arrive from the processors to see if the requested data item is present in the cache so that a subsection memory reference can be avoided. The addresses for all the cache lines are grouped together into a Tag RAM. Each entry in the Tag RAM is the address for the data of one cache line. In addition to the address data in the Tag RAM, sharing and coherency state data for each line is also stored. This information is used to determine data ‘ownership’ and sharing properties.
In some embodiments, the Tag RAM is protected by its own SECDED checkbyte. The logic and memory associated with the checkbyte are not directly testable but have a maintenance function, discussed below, that enables full test of the associated functionality. The coherency logic is tested with specific test sequences from the WTE. Built into the coherency logic are illegal sequence detectors (like trying to evict the same item twice in succession) that help in the testing of these functions, in some embodiments.
The ‘way-compare’ logic in the cache (in some embodiments, sixteen comparators that see if a request address matches one of the addresses in the Tag RAM) is tested by storing specific addresses in the Tag RAM and then generating a memory request (usually from the WTE) and seeing if data is returned from the cache or if a memory-get request is generated to the memory controller 127 (indicating that no address match was found).
Each quadrant of the L3 data cache is ‘more or less’ testable as a random-access memory when put into a specific test mode. At the same time and using the same test mode, the other sharing and coherency logic is driven by the same sequence (read and write operations) and sends responses to the WTE for checking. The ‘more or less’ comes from the fact that the multiple cache entries at a single address index (the ‘sixteen ways’) are distinguished by the requirement that the address in each respective Tag entry must be different and the way-compare logic indicates that a particular ‘way’ has the data cached for a particular address and self identifies. In some embodiments, there is no mechanism to say “read the data item that resides in ‘way-3’ for the following address/index.” In a test mode the individual ways can be identified, but again, without knowing a ‘real memory address.’ In some embodiments, from the WTE, data can be written to specific ways and memory indexes; this is equivalent to having a memory address. When data is being read from the cache, the address compare logic chooses a way that matches the requested address and returns the correct data without ever having a specific read address. In some embodiments, the JTAG path can read and write specific cache locations but at a lower bandwidth than can be sustained by the WTE.
Testing of the SECDED checkbyte generation, memory, syndrome generation, and data correction functions of the Tag RAM is accomplished with the following test:
The cache is also used in testing the DRAM memory. When this is done, data to be written to the DRAMs is stored in the cache. The WTE generates AMO (or other) references that cause data to be written to the DRAMs in the associated subsection. Data can be subsequently read by having the WTE generate normal memory reads for the same addresses. In some embodiments, using AMO (atomic memory operation) references allows full memory bandwidth to be generated and does not require that the detailed structure of the cache be understood in order to generate useful test sequences. (By way of explanation: in some embodiments, AMO operations take place in each memory controller 127; any cache data must be forwarded to the controller so that can take place. The memory controller 127 writes the data to memory as part of AMO functionality.)
Other Test and Maintenance Functions
In some embodiments, the W-chip 120 has an internal test-point monitoring capability. Commands are sent to the logic monitor to choose which test points to monitor and to select a triggering condition. The selected testpoint data is saved in a buffer memory for later observation.
The trigger condition can start or stop data recording. If the trigger condition mode stops testpoint data recording, data recording is started when the mode is selected, runs continuously (the testpoint data buffer is circular), and is stopped when the trigger condition occurs. As a result, data in the testpoint buffer looks backward in time, as the event that generated the trigger condition corresponds to the last entry in the buffer. If the trigger condition mode is to start recording testpoint data, then data recording is started when the trigger condition occurs and is stopped when the buffer is full. Data in the buffer is then later in time than the triggering event. This capability has proved very useful for low-level debugging and fault-finding.
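The two trigger modes can be sketched in software as follows. This is a minimal model, not the logic monitor itself: the function name `record`, the mode strings `"stop"` and `"start"`, and the choice to store the triggering sample as the first entry in start mode are all assumptions made for illustration.

```python
from collections import deque


def record(samples, trigger, mode, depth):
    """Model testpoint recording.

    samples: iterable of testpoint values, in time order
    trigger: predicate that is true when the trigger condition occurs
    mode:    "stop"  -> record continuously, stop on trigger
             "start" -> begin recording on trigger, stop when buffer is full
    depth:   number of entries in the testpoint buffer
    """
    if mode == "stop":
        buf = deque(maxlen=depth)  # circular buffer: oldest entries fall out
        for s in samples:
            buf.append(s)
            if trigger(s):
                break  # triggering sample is the last entry in the buffer
        return list(buf)
    if mode == "start":
        buf = []
        armed = False
        for s in samples:
            if not armed and trigger(s):
                armed = True
            if armed:
                buf.append(s)
                if len(buf) == depth:
                    break  # buffer full: recording stops
        return buf
    raise ValueError("unknown mode: %r" % mode)
```

With ten samples numbered 0 through 9, a trigger on sample 6, and a four-entry buffer, stop mode returns samples 3 through 6 (history leading up to the trigger) and start mode returns samples 6 through 9 (activity from the trigger onward).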
The JTAG scan logic has full access to all memory-mapped registers which hold configuration information and control and receive status from all major logic functions in the IC. This includes system level operations as well as maintenance and diagnostic functions.
Functions of the W-Chip Test Engine 125
The WTE (W-chip test engine) 125 is connected into the chip's logic as shown in
The test engine 125 is controlled and results observed through MMR registers that are accessed through the JTAG port. In addition, in some embodiments, the test engine 125 can be used in other system test operations, for example by generating test data packets that can be sent to the processors for diagnostic functions.
The logic of the test engine 125 consists of two major components: a sequencer 346 (e.g., one that is controlled by microcode stored in the W-chip) which generates tests and a result test checker 347. A block diagram of the sequencer is shown in
In some embodiments, the Test Generation logic has the following major features and subcomponents:
When the WTE is running a test, the different registers needed for the test and the contents of some of the fields in the sequence memory are used to build a request packet (for example, write at the following address using a specified data item from the test data buffer), which is then sent off for execution. Each packet is given an identifier, called a TID (for Transaction IDentifier), that is most importantly used when data is returned as a result of a data read request. The Result logic keeps a pointer to the expected data in association with the TID. This means that data checking is not dependent on the order in which data is returned from memory.
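The TID-based checking described above can be illustrated with a short sketch. The class `ResultChecker` and its method names are hypothetical, and the model keeps the pointer to the expected data as an index into a Python list standing in for the test data buffer; the actual Result logic is not described at this level of detail.

```python
class ResultChecker:
    """Illustrative model of order-independent result checking by TID."""

    def __init__(self, test_data):
        self.test_data = test_data  # stand-in for the test data buffer
        self.pending = {}           # TID -> index of the expected data item
        self.next_tid = 0
        self.errors = []

    def issue_read(self, expected_index):
        # Assign a TID to the outgoing request and remember where
        # the expected data for that request lives.
        tid = self.next_tid
        self.next_tid += 1
        self.pending[tid] = expected_index
        return tid

    def on_response(self, tid, data):
        # Look up the expected data by TID, so arrival order is irrelevant.
        expected = self.test_data[self.pending.pop(tid)]
        if data != expected:
            self.errors.append((tid, expected, data))
```

Because each response is matched to its expectation through the TID, responses that arrive as TID 2, then 0, then 1 are checked exactly as if they had arrived in issue order.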
The Test Result logic is shown in
All the needed ‘meta’ controls for the WTE test functions (indicating, for example, to the crossbar logic that 40/80-bit data paths are required instead of 32/64-bit paths, or that the test sequence is for the L3 cache rather than the DRAM memories) are MMRs that are controlled via the JTAG scan logic.
The WTE also has the ability to generate requests to the memory subsection controllers that result in a stream of data being dumped to the processor ports. The data stream becomes a sequence of memory read and write requests to a connected unit-under-test. A test mode set in the memory controllers 127 causes whole memory rows to be read at maximum bandwidth. This function is used on the Gold unit when it is generating test streams for use in testing another MDC 110.
Among several other functions that can be useful in support of system operation, debugging, or checkout, it is, in some embodiments, very easy for the WTE to change the ECC checkbytes in memory in the following ways: 1) pass through memory making the data checkbytes correspond to data stored there and 2) pass through memory storing invalid checkbyte values. The first function allows corrupt memory to be accessed and the second is intended to generate an interrupt when a program accesses data that has not been subsequently validly initialized; this is useful in software program debugging.
The test engine can also be used, in some embodiments, in normal system operation, for example by zeroing-out newly allocated memory pages as a help to operating system allocation routines.
Using One MDC 110 to Test Another
When one MDC 110 tests another, one card (the golden unit) is a master and is used to provide a stream of requests to the MDC unit under test. The following is done:
In this mode, the memory controllers 127 always reference and send out whole rows from the memory. If the test ends before the last data in a row, the test data generator must pad the end of the sequence with null/empty packets, in some embodiments.
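The padding requirement can be shown with a small sketch. The function name `pad_to_whole_rows`, the `NULL_PACKET` placeholder, and the `packets_per_row` parameter are assumptions for illustration only; the text does not give the row size or the encoding of a null packet.

```python
NULL_PACKET = None  # hypothetical stand-in for a null/empty packet


def pad_to_whole_rows(packets, packets_per_row):
    """Return the sequence padded with null packets to a whole number of rows."""
    remainder = len(packets) % packets_per_row
    if remainder == 0:
        return list(packets)  # already ends on a row boundary
    padding = packets_per_row - remainder
    return list(packets) + [NULL_PACKET] * padding
```

For example, a three-packet sequence with four packets per row gains one null packet, while an eight-packet sequence is returned unchanged.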
The request data stored in the memory of the gold unit must be properly formatted data packets. In some embodiments, data within the test sequence can be normal 32- and 64-bit data or it can provide 40-bit data items in the data portion of the request packets. For some embodiments, a single test stream must not mix 32/64 bit data requests with 40-bit data requests. The 40-bit data format allows memory normally holding ECC data bits to be tested as normal memory with full control over the stored data bits. This 40-bit mode will not exercise full memory bandwidth however, in some embodiments. When in 40-bit mode, all memory requests must be for 16-byte data items (a single burst-of-four for each memory subsection when using DDR2 SDRAM memory), in some embodiments.
About the Memory Mapped Registers (MMRs) in W-Chip 120
All MMRs are loaded and unloaded through the JTAG scan path, in some embodiments. All control functions including master clear and initialization functions are done through on-chip MMRs. Internal status for all functions is available in the requisite MMRs. The internal memory blocks including the L3 data and Tag/coherency memories and the test point buffer can be written and read through the MMR access mechanism.
Each MMR or memory function is assigned an address or an address range. In the JTAG scan port there is a register that can be loaded with the needed address; there is also a function register that is loaded at the same time. If the function is writing, data follows the address in the serial data stream. If the function is reading, the data from the addressed entity is driven from the scan output. The result is quick access to any needed function, status register, or data memory and avoidance of long scan chains when accessing the MMR functions. When the IC is powered up or is given the lowest level of master clear, all MMRs are loaded with default values, in some embodiments. While some of the defaults will likely never change except for some of the maintenance functions (enable coherency in the L3 cache, for example), others will become obsolete and will always change; for example, when 4-Gbit memory parts become available the memory size default for 1-Gbit memory parts will, in some embodiments, never be used on new systems from that point onward. For some embodiments, the scan port in W-chip 120 can run at any frequency from dc to 50 MHz.
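The address-plus-function access scheme can be modeled roughly as below. Everything named here is hypothetical: the class `ScanPort`, the `READ` and `WRITE` function codes, and the register addresses and default values used in the usage note are invented for illustration, and the serial bit-level shifting of the real scan path is not modeled.

```python
READ, WRITE = 0, 1  # hypothetical function-register codes


class ScanPort:
    """Illustrative model of MMR access through an address and function register."""

    def __init__(self, defaults):
        self.defaults = dict(defaults)  # address -> default value
        self.mmr = dict(defaults)

    def master_clear(self):
        # Power-up or lowest-level master clear reloads every default.
        self.mmr = dict(self.defaults)

    def access(self, function, address, data=None):
        if address not in self.mmr:
            raise KeyError("no MMR at address %#x" % address)
        if function == WRITE:
            self.mmr[address] = data  # write data follows the address
            return None
        return self.mmr[address]      # read data is driven to the scan output
```

A port constructed as `ScanPort({0x10: 1, 0x20: 0})` returns 1 for a read of address 0x10, returns a newly written value after a write, and returns 1 again after `master_clear()`.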
Using the Test Functions in MDC 110/W-Chip 120
In some embodiments, test sequences will follow the same basic operational steps:
Some embodiments of the invention include a first circuit 120 for use with a first memory card 110, the card having a plurality of memory chips 130. This first circuit includes a high-speed external card interface 112 (also called a system interface 112) connected to write and read data to and from the memory chips 130, and a test engine 125 configured to control the high-speed interface 112 and/or the memory chips 130 and to provide testing functions to a second substantially identical circuit 120 on a second memory card 110.
Some embodiments of the first circuit 120 further include one or more memory controllers 127, each one of the one or more memory controllers 127 connected to control a subset of the plurality of memory chips 130.
Some embodiments of the first circuit 120 further include one or more caches 124; each one of the one or more caches 124 operatively coupled to a corresponding one of the memory controllers 127.
In some embodiments of the first circuit 120, the high-speed external card interface 112 further includes a crossbar switch 123, and one or more SerDes ports 121, each one of the one or more SerDes ports 121 connectable through the crossbar switch 123 to a plurality of the caches 124.
Some embodiments of the first circuit 120 further include a control interface 122, the control interface configured to program the test engine and to initialize, control, and observe test sequences.
In some embodiments, the invention includes a system 200 for using a first memory card 110 to test a second memory card 110, the system 200 including a test fixture 210 having a first interface 219A connectable to the first memory card and a second interface 219B connectable to the second memory card, such that at least some inputs from the first interface are connected to corresponding outputs of the second interface, and at least some outputs from the first interface are connected (via connection wiring 230) to corresponding inputs of the second interface, and a test controller 220 operable to send configuration data to the first interface to cause a testing function to be performed when suitable first and second memory cards are connected to the fixture.
In some embodiments, the first interface connects each of one or more high-speed SerDes ports of the first memory card 110 to a corresponding SerDes port of the second card 110.
In some embodiments, the test controller 220 receives test results from the first memory card 110 indicative of functionality of the second memory card 110.
In some embodiments, the test controller 220 includes an interface 219 (or 219A and 219B) to send and receive data from respective control interface ports 119 of the control interfaces 122 on the first memory card 110 and the second memory card 110.
In some embodiments, the test controller 220 is operable to configure the second memory card 110 to each one of a plurality of different operation modes.
Some embodiments of the test system 200 further include a test controller connection 219 to both the first and second memory cards.
In some embodiments, the invention includes a method for testing memory cards, the method including connecting a plurality of interface lines of a first memory card to corresponding complementary interface lines of a second memory card, configuring the first memory card to be operable to perform testing functions, configuring the second memory card to be operable to perform normal read and write operations, and testing the second memory card under the control of the first memory card.
In some embodiments of this method, the configuring of the first memory card includes loading microcode into the first memory card.
In some embodiments, the invention includes a first memory card 110 that includes a plurality of memory chips 130, one or more high-speed external card interfaces 121, including a first interface 121 and a second interface 121, each connected to write and read data to and from the memory chips 130, and a test engine 125 configured to control the first high-speed interface 121 and the memory chips 130 in order to provide testing functions to the second high-speed interface 121.
In some embodiments of this card 110, the test engine 125 is operable to generate requests that look like and perform as normal requests to the card.
In some embodiments of this card 110, the test engine 125 includes internal paths that enable the test engine 125 to send requests to and receive results from a plurality of internal chip functions.
Some embodiments of this card further include circuitry that allows results to return in a different order than the order in which they were generated.
Some embodiments of this card further include a microcode memory that stores code that controls at least some functions of the test engine.
In some embodiments, the invention includes a computer system 100 or 200 that includes a first processing unit 106 or 220, and the first memory card 110 described above, operatively coupled to the first processing unit 106 or 220.
Some embodiments of this computer system 100 or 200 further include a second memory card 110 substantially identical to the first memory card 110, and operatively coupled to the first processing unit 106 or 220.
In some embodiments of the computer system 200, at least one interface port 121 of the first memory card 110 is complementarily connected to a respective interface port 121 of the second memory card 110, and wherein the first processing unit 220 is configured to load configuration information into the first memory card to cause the first memory card 110 to perform test functions to the second memory card 110, the first processing unit 220 also configured to receive test results.
In some embodiments of the computer system 100, the first processing unit 106 is configured to load configuration information into the first memory card 110 and the second memory card 110 to cause the first memory card 110 and second memory card 110 to perform normal read and write operations.
Some embodiments further include a second processing unit 106, a third memory card 110 substantially identical to the first memory card 110, and operatively coupled to the second processing unit 106, and a fourth memory card 110 substantially identical to the first memory card 110, and operatively coupled to the second processing unit 106.
Other embodiments of the invention include a first memory card 110 that includes a plurality of memory chips 130, a high-speed external card interface 112 connected to write and read data to and from the memory chips 130, and a test engine 125 configured to control the high-speed interface 112 and/or the memory chips 130 in order to provide testing functions to a second substantially identical memory card 110.
Some embodiments of card 110 further include one or more memory controllers 127, each one of the one or more memory controllers 127 connected to control a subset of the plurality of memory chips 130.
Some embodiments of card 110 further include one or more caches 124; each one of the one or more caches 124 operatively coupled to a corresponding one of the memory controllers 127.
In some embodiments of card 110, the high-speed external card interface 112 further includes a crossbar switch, and one or more SerDes ports, each one of the one or more SerDes ports connectable through the crossbar switch to a plurality of the caches.
Some embodiments of the first memory card 110 further include a control interface, the control interface configured to program the test engine and to initialize, control, and observe test sequences.
Another aspect of the invention in some embodiments provides a single-chip memory-support circuit 120 that includes a system interface 112, a memory interface 113 operable to generate read and write operations to a memory 130, wherein the circuit 120 operates to provide data from the memory interface 113 to the system interface 112, and a test engine 125 operatively coupled to control the system interface 112 and the memory interface 113 in order to provide testing functions. In some embodiments, the testing functions are programmably configurable, i.e., they can be controlled by information that is loadable into the test engine. Since this control information is loadable, it can be changed to enable testing of various conditions that perhaps could not be anticipated early in the design phase.
Some embodiments of card 110 further include a control interface 122, wherein testing configuration information is loadable through the control interface 122 into the test engine 125 to provide the programmably configurable testing functions.
Some embodiments of card 110 further include a cache operatively coupled to the memory interface and the system interface to provide cached data to the system interface.
In some embodiments, the test engine includes a test-generation function; and a test-result-checking function, wherein results can be returned and checked in an order different than the order in which they were generated.
Another aspect of the invention in some embodiments provides an integrated-circuit chip that includes an input/output port; and a test engine operatively coupled to control the input/output port such that functionality of the input/output port can be tested by connecting the input/output port to a similar port of another chip and sending test commands to and receiving test results from the other chip's port.
In some embodiments of this chip, the testing can be performed without regard to the electrical and architectural implementation of the ports.
Some embodiments of this chip further include a memory interface operable to generate read and write operations to a memory, wherein the circuit operates to provide data from the memory interface into the input/output port.
Some embodiments of this chip further include a control interface, wherein testing configuration information is loadable through the control interface into the test engine to provide testing functions.
Some embodiments of this chip further include a cache operatively coupled to the memory interface and the input/output port to provide cached data to the input/output port.
Some embodiments of this chip further include functional logic on the chip; wherein use of the test engine is independent of operation of the functional logic.
Some embodiments of this chip further include functional logic on the chip; wherein use of the test engine is independent of and tests operation of the functional logic.
In some embodiments, the test engine generates a plurality of tests in order that two or more simultaneous functions of the functional logic are tested at the same time. For example, testing cache and causing heavy memory traffic, by requesting lots of data that is not in the cache, which in turn causes additional memory operations to fill the cache. In some embodiments, the WTE 125 can stimulate the crossbar 123 with a broadcast function requesting, for example, four pieces of data simultaneously. In some embodiments, the results checker 347 provides simultaneous checking of up to four results.
In some embodiments, various functions provided by the test engine are also used in normal operation. For example, the test engine provides a fast, efficient, and easily programmed way to provide additional functionality to the MDC 110 for normal operation, such as the ability to zero a block of data, or to fill data patterns that are recognizable as invalid data (such functions could be, but need not be, associated with allocation of memory blocks). In some embodiments, a user requests the operating system (OS) (e.g., of processor 106 of
The WTE 125 is also useful for debugging, in some embodiments. For example, the user sees that some program is making a memory reference to an address that is considered out of bounds, and the program is crashing the operating system, but due to the large number of different programs that are multitasking in the computer system it is very difficult to tell which program is making the out-of-bounds memory request, or where in the program. Thus, in some embodiments, the WTE 125 is used to initialize some or all unused memory with a particular data pattern that is never validly usable by normal code (e.g., in a memory with SECDED error-correction code, this could be a pattern of all zeros in the normal 32-bit data field, and with a pattern of data in the field of error-correction bits (the seven or eight extra bits that are used for error correction) that indicates a two-or-more-bit uncorrectable error). Upon receiving the command to initialize memory, WTE 125 would go through the memory-allocation block and initialize that piece of memory that is going out of bounds with the predetermined special data pattern (which gives an uncorrectable error indication when accessed as normal memory). Thus, when the user accesses that area (e.g., the area beyond the end of a defined array), they get a multiple-bit error due to the initialization done by WTE 125. When a user's program is exceeding the bounds of an array, the multiple-bit error pattern is read from past the end of the array, and the W-chip 120 recognizes and reports the “corrupt data.”
In some embodiments, there is an interrupt generated by the W-chip 120 for multiple-bit errors that are detected. In some embodiments, each memory controller 127 performs SECDED error correction (generates the ECC bits on data being written, and checks and corrects correctable errors, and reports uncorrectable errors). WTE 125 can cause writes of 40-bit data (of any arbitrary pattern), rather than 32-bit data plus SECDED, as is written by a normal write of data from a system processor. In some embodiments, the interrupts to report errors go through the normal data path through the high-speed serial ports, and the error gets reported back by an interrupt-request packet to inform the OS that this or that error happened.
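The idea of an all-zero data word whose check bits signal an uncorrectable error can be demonstrated with a textbook extended Hamming code. This sketch is an assumption throughout: it uses a generic 39-bit SECDED layout (32 data bits, six Hamming check bits, one overall parity bit), which is not claimed to be the code used by the memory controllers 127, and the function names `encode`, `classify`, and `poison_word` are hypothetical.

```python
CHECK_POSITIONS = (1, 2, 4, 8, 16, 32)  # Hamming check bits sit at powers of two
N = 39  # position 0: overall parity; positions 1..38: Hamming code word


def encode(data):
    """Encode a 32-bit integer into a list of 39 code bits."""
    code = [0] * N
    bit = 0
    for pos in range(1, N):
        if pos not in CHECK_POSITIONS:
            code[pos] = (data >> bit) & 1
            bit += 1
    for p in CHECK_POSITIONS:
        # Each check bit makes the parity of its covered positions even.
        code[p] = sum(code[pos] for pos in range(1, N) if pos & p and pos != p) & 1
    code[0] = sum(code[1:]) & 1  # overall parity over the Hamming code word
    return code


def classify(code):
    """Return 'ok', 'correctable' (single-bit) or 'uncorrectable' (double-bit)."""
    syndrome = 0
    for pos in range(1, N):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) & 1
    if syndrome == 0 and overall == 0:
        return "ok"
    if overall == 1:
        return "correctable"  # odd parity: treated as a single-bit error
    return "uncorrectable"    # nonzero syndrome with even parity: double error


def poison_word():
    """All-zero data field with check bits chosen to look like a double error."""
    code = [0] * N
    code[1] = 1  # two check bits set: syndrome is 3, overall parity stays even
    code[2] = 1
    return code
```

In this model a correctly encoded word classifies as `ok`, the same word with one flipped bit classifies as `correctable`, and the poison word classifies as `uncorrectable` even though every data position holds zero. Errors of three or more bits are outside what a SECDED code guarantees and are not handled here.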
In some embodiments, all requests have TID (Transaction IDentifier) tags that are sent to MDC 110 with each request, and then when the data are retrieved, they are returned with the corresponding TID to identify to the processor which request this data belongs to. If an error is detected, the error return includes the corresponding TID, along with an error-reply flag (indicating an error in the request, MDC 110 unable to satisfy with the proper data). The OS is told which card and which memory controller 127 detected the error.
In some embodiments, another aspect of the invention provides a system for testing a first memory card. This system includes a test fixture having a first interface connectable to the first memory card, such that at least some inputs of the first interface are connected to corresponding outputs of the first interface, and a test controller operable to send test configuration data to the first interface to cause a testing function to be performed by the first memory card when connected to the fixture.
In some embodiments, the first interface connects one SerDes port of the first memory card to another SerDes port of the first memory card.
In some embodiments, the test controller receives test results from the first memory card indicative of functionality of the first memory card.
In some embodiments, the test controller includes an interface to send and receive data from a control interface port on the first memory card.
In some embodiments, the test controller is operable to configure the first memory card to each one of a plurality of different operation modes.
Some embodiments of the invention include a computer-readable medium (such as, for example, a CDROM, DVD, floppy diskette, hard disk drive, flash memory device, or network or internet connection connectable to supply instructions). The computer-readable medium includes instructions stored thereon for causing a suitably programmed information processing system to perform methods that implement any or all of the inventions and combinations described herein. In some embodiments, this computer-readable medium is connected or connectable to system 100 of
In some embodiments, all of the memory 130 is implemented on a single chip. In some embodiments, all of the circuitry described for one or another of the embodiments of MDC 110 is implemented on a single chip.
It is to be understood that the above description is intended to be illustrative, and not restrictive. Although numerous characteristics and advantages of various embodiments as described herein have been set forth in the foregoing description, together with details of the structure and function of various embodiments, many other embodiments and changes to details will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention should be, therefore, determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein,” respectively. Moreover, the terms “first,” “second,” and “third,” etc., are used merely as labels, and are not intended to impose numerical requirements on their objects.
|U.S. Classification||702/118, 702/190, 702/188, 702/189|
|International Classification||G06F11/00, G11C29/44, G11C29/26|
|Cooperative Classification||G11C11/401, G11C29/26, G11C2029/0401, G11C29/44, G11C29/4401, G11C2029/4002|
|European Classification||G11C29/44A, G11C29/26, G11C29/44|