|Publication number||US20050273559 A1|
|Application number||US 11/132,432|
|Publication date||Dec 8, 2005|
|Filing date||May 19, 2005|
|Priority date||May 19, 2004|
|Also published as||CN101002169A, US8719837, US9003422, US20050278505, US20050278513, US20050278517, US20050289321, US20050289323, US20140208087, WO2005114441A2, WO2005114441A3|
|Inventors||Aris Aristodemou, Daniel Hansson, Morgyn Taylor, Kar-Lik Wong|
|Original Assignee||Aris Aristodemou, Daniel Hansson, Morgyn Taylor, Kar-Lik Wong|
This application claims priority to provisional application No. 60/572,238 filed May 19, 2004, entitled “Microprocessor Architecture,” hereby incorporated by reference in its entirety.
This invention relates generally to microprocessor architecture and more specifically to an improved cache debug unit for a microprocessor.
A major focus of microprocessor design has been to increase effective clock speed through hardware simplifications. Exploiting the property of locality of memory references, cache memories have been successful in achieving high performance in many computer systems. In the past, cache memories of microprocessor-based systems were provided off-chip using high performance memory components. This was primarily because the amount of silicon area necessary to provide an on-chip cache memory of reasonable performance would have been impractical. Increasing the size of an integrated circuit to accommodate a cache memory adversely impacts the yield of the integrated circuit in a given manufacturing process. However, with the density achieved recently in integrated circuit technology, it is now possible to provide on-chip cache memory economically.
In a computer system with a cache memory, when a memory word is needed, the central processing unit (CPU) looks into the cache memory for a copy of the memory word. If the memory word is found in the cache memory, a cache “hit” is said to have occurred, and the main memory is not accessed. Thus, a figure of merit which can be used to measure the effectiveness of the cache memory is the “hit” ratio. The hit ratio is the percentage of total memory references in which the desired datum is found in the cache memory without accessing the main memory. When the desired datum is not found in the cache memory, a “cache miss” is said to have occurred and the main memory is then accessed for the desired datum. In addition, in many computer systems there are portions of the address space which are not mapped to the cache memory. Such portions of the address space are said to be “uncached” or “uncacheable”. For example, the addresses assigned to input/output (I/O) devices are almost always uncached. Both a cache miss and an uncacheable memory reference result in an access to the main memory.
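By way of illustration only, the hit, miss and uncached outcomes described above, and the resulting hit ratio, may be modelled with a short sketch. The fully associative organization, the least-recently-used replacement policy, the `TinyCache` name and the address ranges are all assumptions made for the example and are not taken from this disclosure.

```python
# Illustrative sketch only: the cache organization, the LRU policy and every
# name below are assumptions, not part of the disclosed embodiments.
from collections import OrderedDict


class TinyCache:
    """Fully associative, word-granular cache with LRU replacement (assumed)."""

    def __init__(self, capacity, uncached_ranges=()):
        self.capacity = capacity
        self.uncached_ranges = uncached_ranges  # e.g. addresses assigned to I/O
        self.lines = OrderedDict()              # address -> present marker
        self.hits = 0
        self.references = 0
        self.main_memory_accesses = 0

    def is_uncached(self, address):
        return any(lo <= address < hi for lo, hi in self.uncached_ranges)

    def read(self, address):
        self.references += 1
        if self.is_uncached(address):
            # Uncacheable reference: always goes to main memory.
            self.main_memory_accesses += 1
            return "uncached"
        if address in self.lines:
            # Cache hit: main memory is not accessed.
            self.lines.move_to_end(address)
            self.hits += 1
            return "hit"
        # Cache miss: fetch from main memory and fill the cache.
        self.main_memory_accesses += 1
        self.lines[address] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used word
        return "miss"

    def hit_ratio(self):
        """Percentage of all memory references that were found in the cache."""
        return 100.0 * self.hits / self.references if self.references else 0.0


cache = TinyCache(capacity=2, uncached_ranges=[(0xF000, 0x10000)])
outcomes = [cache.read(a) for a in (0x10, 0x10, 0x20, 0x10, 0xF004, 0x30, 0x20)]
```

In this sketch, uncached references count toward the total number of references but can never be hits, which matches the definition of the hit ratio given above.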
In the course of developing or debugging a computer system, it is often necessary to monitor program execution by the CPU or to interrupt one instruction stream to direct the CPU to execute certain alternate instructions. A known method used to debug a processor utilizes means for observing the program flow during operation of the processor. With systems having off-chip cache, program observability is relatively straightforward by using probes. However, observing the program flow of processors having cache integrated on-chip is much more difficult because most of the processing operations are performed internally within the chip.
As integrated circuit manufacturing techniques have improved, on-chip cache has become standard in most microprocessor designs. Due to difficulties in interfacing with the on-chip cache, debugging systems have also had to move onto the chip. Modern on-chip cache memories may now employ cache debug units directly in the cache memories themselves.
There is therefore a need for a cached processor having a relatively simple design, a reduced silicon footprint and reduced power consumption that allows the real time capture of data in the cached processor for debug purposes and which can be used at high frequencies.
It should be appreciated that the description herein of various advantages and disadvantages associated with known apparatus, methods, and materials is not intended to limit the scope of the invention to their exclusion. Indeed, various embodiments of the invention may include one or more of the known apparatus, methods, and materials without suffering from their disadvantages.
As background to the techniques discussed herein, the following references are incorporated herein by reference: U.S. Pat. No. 6,862,563 issued Mar. 1, 2005 entitled “Method And Apparatus For Managing The Configuration And Functionality Of A Semiconductor Design” (Hakewill et al.); U.S. Ser. No. 10/423,745 filed Apr. 25, 2003, entitled “Apparatus and Method for Managing Integrated Circuit Designs”; and U.S. Ser. No. 10/651,560 filed Aug. 29, 2003, entitled “Improved Computerized Extension Apparatus and Methods”, all assigned to the assignee of the present invention.
Various embodiments of the invention are disclosed that overcome one or more of the shortcomings of conventional microprocessors through a microprocessor architecture having a unified cache debug unit. In these embodiments, a separate cache debug unit is provided which serves as an interface to both the instruction cache and the data cache. In various exemplary embodiments, the cache debug unit has shared hardware logic accessible to both the instruction cache and the data cache. In various exemplary embodiments, a cache debug unit may be selectively switched off or run on a separate clock from the instruction pipeline. In various exemplary embodiments, an auxiliary unit of the execute stage of the microprocessor core is used to pass instructions to the cache debug unit and to receive responses back from the cache debug unit. Through the instruction cache and data cache respectively, the cache debug unit may also access the memory subsystem to perform cache flushes, cache updates and various other debugging functions.
At least one exemplary embodiment of the invention provides a microprocessor core comprising a multistage pipeline, a cache debug unit, a data pathway between the cache debug unit and an instruction cache unit, a data pathway between the cache debug unit and a data cache unit, and a data pathway between a unit of the multistage pipeline and the cache debug unit.
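By way of illustration only, the arrangement summarized above may be sketched in software: a single debug unit, reached through an auxiliary unit, applies the same shared logic to either cache and reaches the memory subsystem through the selected cache. The class names, the command names (`read_line`, `update_line`, `flush`) and the dictionary-based cache model are hypothetical and are not the interface of any disclosed embodiment.

```python
# Illustrative sketch only: all class names, command names and the
# dictionary-based cache model are hypothetical.
class Cache:
    def __init__(self, name):
        self.name = name
        self.lines = {}    # address -> (value, dirty flag)
        self.memory = {}   # stand-in for the memory subsystem behind this cache

    def flush(self):
        """Write dirty lines back to memory, empty the cache, return the count."""
        written = 0
        for address, (value, dirty) in self.lines.items():
            if dirty:
                self.memory[address] = value
                written += 1
        self.lines.clear()
        return written


class CacheDebugUnit:
    """One unit, outside both caches, whose logic is shared by both."""

    def __init__(self, instruction_cache, data_cache):
        self.targets = {"icache": instruction_cache, "dcache": data_cache}

    def execute(self, target, command, address=None, value=None):
        cache = self.targets[target]  # the same code path serves either cache
        if command == "read_line":
            return cache.lines.get(address)
        if command == "update_line":
            cache.lines[address] = (value, True)
            return cache.lines[address]
        if command == "flush":
            return cache.flush()
        raise ValueError(f"unknown debug command: {command}")


class AuxiliaryUnit:
    """Stand-in for the execute-stage unit that forwards debug instructions."""

    def __init__(self, cdu):
        self.cdu = cdu

    def debug(self, target, command, **operands):
        return self.cdu.execute(target, command, **operands)


icache, dcache = Cache("icache"), Cache("dcache")
aux = AuxiliaryUnit(CacheDebugUnit(icache, dcache))
aux.debug("dcache", "update_line", address=0x40, value=7)
aux.debug("icache", "update_line", address=0x80, value=0x1234)
flushed = aux.debug("dcache", "flush")
```

Because the command decoding exists once in `CacheDebugUnit` rather than once per cache, the sketch mirrors the shared-logic idea; it makes no claim about how the hardware decodes or sequences such commands.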
At least one additional exemplary embodiment provides a microprocessor comprising a multistage pipeline, a data cache unit, an instruction cache unit, and a unified cache debug unit operatively connected to the data cache unit, the instruction cache unit, and the multistage pipeline.
Yet another exemplary embodiment of this invention provides a RISC-type microprocessor comprising a multistage pipeline, and a cache debug unit, wherein the cache debug unit comprises an interface to an instruction cache unit of the microprocessor, and an interface to a data cache unit of the microprocessor.
Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.
The following description is intended to convey a thorough understanding of the invention by providing specific embodiments and details involving various aspects of a new and useful microprocessor architecture. It is understood, however, that the invention is not limited to these specific embodiments and details, which are exemplary only. It further is understood that one possessing ordinary skill in the art, in light of known systems and methods, would appreciate the use of the invention for its intended purposes and benefits in any number of alternative embodiments, depending upon specific design and other needs.
Discussion of the invention will now be made by way of example in reference to the various drawing figures.
Because the microprocessor core 100 shown in
Still referring to
Another novel feature of the microprocessor architecture illustrated in
With continued reference to
Referring now to
As noted herein, in a conventional microprocessor architecture employing cache debug, a portion of each of the instruction cache and data cache will be allocated for debug logic. Usually, however, these debug functions are performed off line, rather than at run time, and/or are expected to be slow. Furthermore, there are strong similarities between the debug functions in the instruction cache and those in the data cache, causing redundant logic to be employed in the processor design, thereby increasing the cost and complexity of the design. Although the debug units are seldom used during runtime, they consume power even when not being specifically invoked because of their inclusion in the instruction and data cache components themselves.
In various exemplary embodiments, this design drawback of conventional cache debug units is overcome by a unified cache debug unit 200, such as that shown in
As shown in the exemplary embodiment illustrated in
With continued reference to
In various exemplary embodiments, because the CDU 200 is located outside of both the instruction cache 210 and the data cache 220, the architecture of each of these structures is simplified. Moreover, because in various exemplary embodiments, the CDU 200 may be selectively turned off when it is not being used, less power will be consumed than with conventional cache-based debug units which receive power even when not in use. In various embodiments, the cache debug unit 200 remains powered off until a call is received from the auxiliary unit 240 or until the pipeline determines that an instruction from the auxiliary unit 240 to the cache debug unit 200 is in the pipeline. In various embodiments, the cache debug unit will remain powered on until an instruction is received to power off. However, in various other embodiments, the cache debug unit 200 will power off after all requested information has been sent back to the auxiliary unit 240. Moreover, because conventional instruction and data cache debug units have similar structure, a reduction in the total amount of silicon may be achieved due to shared logic hardware in the CDU 200.
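By way of illustration only, the two power-down behaviours described above may be modelled as a small state machine. The `PowerPolicy` and `GatedDebugUnit` names, the `power_off` request string and the response format are hypothetical, and the sketch models only the powered or unpowered state, not clock gating or any circuit-level mechanism.

```python
# Illustrative sketch only: names, request strings and the response format
# are hypothetical; only the on/off state of the unit is modelled.
from enum import Enum


class PowerPolicy(Enum):
    UNTIL_POWER_OFF_INSTRUCTION = 1  # stay on until told to power off
    OFF_AFTER_RESPONSE = 2           # power off once the response is returned


class GatedDebugUnit:
    def __init__(self, policy):
        self.policy = policy
        self.powered = False  # the unit starts, and idles, powered off

    def call(self, request):
        """Handle one request arriving from the auxiliary unit."""
        if request == "power_off":
            self.powered = False
            return None
        self.powered = True                 # a call wakes the unit
        response = f"done:{request}"        # placeholder for the debug result
        if self.policy is PowerPolicy.OFF_AFTER_RESPONSE:
            self.powered = False            # nothing left to send back
        return response


auto_unit = GatedDebugUnit(PowerPolicy.OFF_AFTER_RESPONSE)
auto_response = auto_unit.call("flush_dcache")
auto_state_after = auto_unit.powered

sticky_unit = GatedDebugUnit(PowerPolicy.UNTIL_POWER_OFF_INSTRUCTION)
sticky_unit.call("flush_icache")
sticky_state_after_call = sticky_unit.powered
sticky_unit.call("power_off")
sticky_state_after_off = sticky_unit.powered
```

Under either policy the unit draws power only while it is servicing requests or waiting for an explicit power-off instruction, in contrast to debug logic embedded in the caches, which is powered whenever the caches are.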
While the foregoing description includes many details and specificities, it is to be understood that these have been included for purposes of explanation only. The embodiments of the present invention are not to be limited in scope by the specific embodiments described herein. For example, although many of the embodiments disclosed herein have been described with reference to a cache debug unit in a RISC-type embedded microprocessor, the principles herein are equally applicable to cache debug units in microprocessors in general. Indeed, various modifications of the embodiments of the present inventions, in addition to those described herein, will be apparent to those of ordinary skill in the art from the foregoing description and accompanying drawings. Thus, such modifications are intended to fall within the scope of the following appended claims. Further, although the embodiments of the present inventions have been described herein in the context of a particular implementation in a particular environment for a particular purpose, those of ordinary skill in the art will recognize that their usefulness is not limited thereto and that the embodiments of the present inventions can be beneficially implemented in any number of environments for any number of purposes. Accordingly, the claims set forth below should be construed in view of the full breadth and spirit of the embodiments of the present inventions as disclosed herein.
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US5423011 *||Jun 11, 1992||Jun 6, 1995||International Business Machines Corporation||Apparatus for initializing branch prediction information|
|US5450586 *||Apr 30, 1992||Sep 12, 1995||Hewlett-Packard Company||System for analyzing and debugging embedded software through dynamic and interactive use of code markers|
|US5493687 *||Jul 8, 1991||Feb 20, 1996||Seiko Epson Corporation||RISC microprocessor architecture implementing multiple typed register sets|
|US5530825 *||Apr 15, 1994||Jun 25, 1996||Motorola, Inc.||Data processor with branch target address cache and method of operation|
|US5560036 *||May 1, 1995||Sep 24, 1996||Mitsubishi Denki Kabushiki Kaisha||Data processing having incircuit emulation function|
|US5586279 *||Jan 28, 1994||Dec 17, 1996||Motorola Inc.||Data processing system and method for testing a data processor having a cache memory|
|US5636363 *||Jun 14, 1991||Jun 3, 1997||Integrated Device Technology, Inc.||Hardware control structure and method for off-chip monitoring entries of an on-chip cache|
|US5808876 *||Jun 20, 1997||Sep 15, 1998||International Business Machines Corporation||Multi-function power distribution system|
|US5809293 *||Jul 29, 1994||Sep 15, 1998||International Business Machines Corporation||System and method for program execution tracing within an integrated processor|
|US5848264 *||Oct 25, 1996||Dec 8, 1998||S3 Incorporated||Debug and video queue for multi-processor chip|
|US5920711 *||Sep 18, 1995||Jul 6, 1999||Synopsys, Inc.||System for frame-based protocol, graphical capture, synthesis, analysis, and simulation|
|US5964884 *||Sep 26, 1997||Oct 12, 1999||Advanced Micro Devices, Inc.||Self-timed pulse control circuit|
|US5978909 *||Nov 26, 1997||Nov 2, 1999||Intel Corporation||System for speculative branch target prediction having a dynamic prediction history buffer and a static prediction history buffer|
|US6154857 *||Dec 17, 1997||Nov 28, 2000||Advanced Micro Devices, Inc.||Microprocessor-based device incorporating a cache for capturing software performance profiling data|
|US6185732 *||Aug 25, 1997||Feb 6, 2001||Advanced Micro Devices, Inc.||Software debug port for a microprocessor|
|US6292879 *||Oct 23, 1996||Sep 18, 2001||Anthony S. Fong||Method and apparatus to specify access control list and cache enabling and cache coherency requirement enabling on individual operands of an instruction of a computer|
|US6550056 *||Dec 3, 1999||Apr 15, 2003||Mitsubishi Denki Kabushiki Kaisha||Source level debugger for debugging source programs|
|US6609194 *||Nov 12, 1999||Aug 19, 2003||Ip-First, Llc||Apparatus for performing branch target address calculation based on branch type|
|US6622240 *||Feb 1, 2000||Sep 16, 2003||Intrinsity, Inc.||Method and apparatus for pre-branch instruction|
|US6774832 *||Mar 25, 2003||Aug 10, 2004||Raytheon Company||Multi-bit output DDS with real time delta sigma modulation look up from memory|
|US6823444 *||Jul 3, 2001||Nov 23, 2004||Ip-First, Llc||Apparatus and method for selectively accessing disparate instruction buffer stages based on branch target address cache hit and instruction stage wrap|
|US6925634 *||Dec 3, 2001||Aug 2, 2005||Texas Instruments Incorporated||Method for maintaining cache coherency in software in a shared memory system|
|US6963554 *||Dec 27, 2000||Nov 8, 2005||National Semiconductor Corporation||Microwire dynamic sequencer pipeline stall|
|US7093165 *||Oct 24, 2002||Aug 15, 2006||Kabushiki Kaisha Toshiba||Debugging Method|
|US20020100019 *||Dec 3, 2001||Jul 25, 2002||Hunter Jeff L.||Software shared memory bus|
|US20020100020 *||Dec 3, 2001||Jul 25, 2002||Hunter Jeff L.||Method for maintaining cache coherency in software in a shared memory system|
|US20030046614 *||May 29, 2002||Mar 6, 2003||Brokish Charles W.||System and method for using embedded real-time analysis components|
|US20030126508 *||Dec 28, 2001||Jul 3, 2003||Timothe Litt||Method and apparatus for efficiently implementing trace and/or logic analysis mechanisms on a processor chip|
|US20030154463 *||Feb 8, 2002||Aug 14, 2003||Betker Michael Richard||Multiprocessor system with cache-based software breakpoints|
|US20050097398 *||Oct 30, 2003||May 5, 2005||International Business Machines Corporation||Program debug method and apparatus|
|US20050273559 *||May 19, 2005||Dec 8, 2005||Aris Aristodemou||Microprocessor architecture including unified cache debug unit|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US8495287 *||Jun 24, 2010||Jul 23, 2013||International Business Machines Corporation||Clock-based debugging for embedded dynamic random access memory element in a processor core|
|US20050273559 *||May 19, 2005||Dec 8, 2005||Aris Aristodemou||Microprocessor architecture including unified cache debug unit|
|US20050289323 *||May 19, 2005||Dec 29, 2005||Kar-Lik Wong||Barrel shifter for a microprocessor|
|US20110320716 *||Jun 24, 2010||Dec 29, 2011||International Business Machines Corporation||Loading and unloading a memory element for debug|
|U.S. Classification||711/125, 714/E11.207, 712/41, 711/126|
|International Classification||G06F15/76, H03M13/00, G06F9/318, G06F9/00, G06F9/30, G06F15/78, G06F15/00, G06F9/38, G06F12/00, G06F12/08|
|Cooperative Classification||G06F9/32, G06F15/7867, G06F9/30032, G06F11/3648, G06F5/01, G06F9/325, G06F9/30149, G06F9/3885, G06F9/3816, Y02B60/1207, G06F9/30036, G06F9/3844, G06F12/0802, G06F9/3897, G06F9/30181, Y02B60/1225, G06F9/3806, G06F9/3802, G06F9/3846, G06F9/3861|
|European Classification||G06F11/36B7, G06F9/38E2D, G06F5/01, G06F9/38T8C2, G06F9/30X, G06F9/38B, G06F15/78R, G06F9/38D2, G06F9/38B9, G06F9/38T, G06F9/38B2B, G06F9/30T2, G06F9/30A1P, G06F9/32B6, G06F9/38E2S, G06F9/30A1M|
|Aug 22, 2005||AS||Assignment|
Owner name: ARC INTERNATIONAL, UNITED KINGDOM
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ARISTODEMOU, ARIS;HANSSON, DANIEL;TAYLOR, MORGYN;AND OTHERS;REEL/FRAME:016909/0825
Effective date: 20050721