US 20050273559 A1
A microprocessor architecture including a unified cache debug unit. A debug unit on the processor chip receives data/command signals from a unit of the execute stage of the multi-stage instruction pipeline of the processor and returns information to the execute stage unit. The cache debug unit is operatively connected to both instruction and data cache units of the microprocessor. The memory subsystem of the processor may be accessed by the cache debug unit through either of the instruction or data cache units. By unifying the cache debug in a separate structure, the need for redundant debug structure in both cache units is obviated. Also, the unified cache debug unit can be powered down when not accessed by the instruction pipeline, thereby saving power.
1. In a microprocessor, a microprocessor core comprising:
a multistage pipeline;
a cache debug unit;
a data pathway between the cache debug unit and an instruction cache unit;
a data pathway between the cache debug unit and a data cache unit; and
a data pathway between a unit of the multistage pipeline and the cache debut unit.
2. The microprocessor according to
3. The microprocessor according to
4. The microprocessor according to
5. The microprocessor according to
6. The microprocessor according to
7. The microprocessor according to
8. A microprocessor comprising:
a multistage pipeline;
a data cache unit;
an instruction cache unit; and
a unified cache debug unit operatively connected to the data cache unit, the instruction cache unit, and the multistage pipeline.
9. The microprocessor according to
10. The microprocessor according to
11. The microprocessor according to
12. The microprocessor according to
13. The microprocessor according to
14. The microprocessor according to
15. A RISC-type microprocessor comprising:
a multistage pipeline; and
a cache debug unit, wherein the cache debug unit comprises:
an interface to an instruction cache unit of the microprocessor; and
an interface to a data cache unit of the microprocessor.
16. The microprocessor according to
17. The microprocessor according to
18. The microprocessor according to
19. The microprocessor according to
20. The microprocessor according to
21. The microprocessor according to
This application claims priority to provisional application No. 60/572,238 filed May 19, 2004, entitled “Microprocessor Architecture,” hereby incorporated by reference in its entirety.
This invention relates generally to microprocessor architecture and more specifically to an improved cache debug unit for a microprocessor.
A major focus of microprocessor design has been to increase effective clock speed through hardware simplifications. Exploiting the property of locality of memory references, cache memories have been successful in achieving high performance in many computer systems. In the past, cache memories of microprocessor-based systems were provided off-chip using high performance memory components. This was primarily because the amount of silicon area necessary to provide an on-chip cache memory of reasonable performance would have been impractical. Increasing the size of an integrated circuit to accommodate a cache memory adversely impacts the yield of the integrated circuit in a given manufacturing process. However, with the density achieved recently in integrated circuit technology, it is now possible to provide on-chip cache memory economically.
In a computer system with a cache memory, when a memory word is needed, the central processing unit (CPU) looks into the cache memory for a copy of the memory word. If the memory word is found in the cache memory, a cache “hit” is said to have occurred, and the main memory is not accessed. Thus, a figure of merit which can be used to measure the effectiveness of the cache memory is the “hit” ratio. The hit ratio is the percentage of total memory references in which the desired datum is found in the cache memory without accessing the main memory. When the desired datum is not found in the cache memory, a “cache miss” is said to have occurred and the main memory is then accessed for the desired datum. In addition, in many computer systems there are portions of the address space which are not mapped to the cache memory. This portion of the address space is said to be “uncached” or “uncacheable”. For example, the addresses assigned to input/output (I/O) devices are almost always uncached. Both a cache miss and an uncacheable memory reference result in an access to the main memory.
In the course of developing or debugging a computer system, it is often necessary to monitor program execution by the CPU or to interrupt one instruction stream to direct the CPU to execute certain alternate instructions. A known method used to debug a processor utilizes means for observing the program flow during operation of the processor. With systems having off-chip cache, program observability is relatively straight forward by using probes. However, observing the program flow of processors having cache integrated on-chip is much more difficult because most of the processing operations are performed internally within the chip.
As integrated circuit manufacturing techniques have improved, on-chip cache has become standard in most microprocessors designs. Due to difficulties in interfacing with the on-chip cache, debugging systems have also had to move onto the chip. Modern on-chip cache memories may now employ cache debug units directly in the cache memory themselves.
There is therefore a need for a cached processor having relatively simple design, reduced silicon footprint and reduced power consumption that allows the real time capture of data in the cached processor for debug purposes and which can be used at high frequencies.
It should be appreciated that the description herein of various advantages and disadvantages associated with known apparatus, methods, and materials is not intended to limit the scope of the invention to their exclusion. Indeed, various embodiments of the invention may include one or more of the known apparatus, methods, and materials without suffering from their disadvantages.
As background to the techniques discussed herein, the following references are incorporated herein by reference: U.S. Pat. No. 6,862,563 issued Mar. 1, 2005 entitled “Method And Apparatus For Managing The Configuration And Functionality Of A Semiconductor Design” (Hakewill et al.); U.S. Ser. No. 10/423,745 filed Apr. 25, 2003, entitled “Apparatus and Method for Managing Integrated Circuit Designs”; and U.S. Ser. No. 10/651,560 filed Aug. 29, 2003, entitled “Improved Computerized Extension Apparatus and Methods”, all assigned to the assignee of the present invention.
Various embodiments of the invention are disclosed that overcome one or more of the shortcomings of conventional microprocessors through a microprocessor architecture having a unified cache debug unit. In these embodiments, a separate cache debug unit is provided which serves as an interface to both the instruction cache and the data cache. In various exemplary embodiments, the cache debug has shared hardware logic accessible to both the instruction cache and the data cache. In various exemplary embodiments, a cache debug unit may be selectively switched off or run on a separate clock than the instruction pipeline. In various exemplary embodiments, an auxiliary unit of the execute stage of the microprocessor core is used to pass instructions to the cache debug unit and to receive responses back from the cache debug unit. Through the instruction cache and data cache respectively, the cache debug unit may also access the memory subsystem to perform cache flushes, cache updates and various other debugging functions.
At least one exemplary embodiment of the invention provide a microprocessor core comprising a multistage pipeline, a cache debug unit, a data pathway between the cache debug unit and an instruction cache unit, a data pathway between the cache debug unit and a data cache unit, and a data pathway between a unit of the multistage pipeline and the cache debut unit.
At least one additional exemplary embodiment provides a microprocessor comprising a multistage pipeline, a data cache unit, an instruction cache unit, and a unified cache debug unit operatively connected to the data cache unit, the instruction cache unit, and the multistage pipeline.
Yet another exemplary embodiment of this invention provides a RISC-type microprocessor comprising a multistage pipeline, and a cache debug unit, wherein the cache debug unit comprises an interface to an instruction cache unit of the microprocessor, and an interface to a data cache unit of the microprocessor.
Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.
The following description is intended to convey a thorough understanding of the invention by providing specific embodiments and details involving various aspects of a new and useful microprocessor architecture. It is understood, however, that the invention is not limited to these specific embodiments and details, which are exemplary only. It further is understood that one possessing ordinary skill in the art, in light of known systems and methods, would appreciate the use of the invention for its intended purposes and benefits in any number of alternative embodiments, depending upon specific design and other needs.
Discussion of the invention will now made by way of example in reference to the various drawing figures.
Because the microprocessor core 100 shown in
Still referring to
Another novel feature of the microprocessor architecture illustrated in
With continued reference to
Referring now to
As noted herein, in a conventional microprocessor architecture employing cache debug, a portion of each of the instruction cache and data cache will be allocated for debug logic. Usually, however, these debug functions are performed off line, rather than at run time, and/or are expected to be slow. Furthermore, there are strong similarities to the debug functions in both the instruction cache and the data cache causing redundant logic to be employed in the processor design, thereby increasing costs and complexity of the design. Although the debug units are seldom used during runtime, they consume power even when not being specifically invoked because of their inclusion in the instruction and data cache components themselves.
In various exemplary embodiments, this design drawback of conventional cache debug units is overcome by a unified cache debug unit 200, such as that shown in
As shown in the exemplary embodiment illustrated in
With continued reference to
In various exemplary embodiments, because the CDU 200 is located outside of both the instruction cache 210 and the data cache 220, the architecture of each of these structures is simplified. Moreover, because in various exemplary embodiments, the CDU 200 may be selectively turned off when it is not being used, less power will be consumed than with conventional cache-based debug units which receive power even when not in use. In various embodiments, the cache debug unit 200 remains powered off until a call is received from the auxiliary unit 240 or until the pipeline determines that an instruction from the auxiliary unit 240 to the cache debug unit 200 is in the pipeline. In various embodiments, the cache debug unit will remain powered on until an instruction is received to power off. However, in various other embodiments, the cache debug unit 200 will power off after all requested information has been sent back to the auxiliary unit 240. Moreover, because conventional instruction and data cache debug units have similar structure, reduction in total amount of silicon may be achieved due to shared logic hardware in the CDU 200.
While the foregoing description includes many details and specificities, it is to be understood that these have been included for purposes of explanation only. The embodiments of the present invention are not to be limited in scope by the specific embodiments described herein. For example, although many of the embodiments disclosed herein have been described with reference to cache debug unit in an RISC-type embedded microprocessor, the principles herein are equally applicable to cache debug units in microprocessors in general. Indeed, various modifications of the embodiments of the present inventions, in addition to those described herein, will be apparent to those of ordinary skill in the art from the foregoing description and accompanying drawings. Thus, such modifications are intended to fall within the scope of the following appended claims. Further, although the embodiments of the present inventions have been described herein in the context of a particular implementation in a particular environment for a particular purpose, those of ordinary skill in the art will recognize that its usefulness is not limited thereto and that the embodiments of the present inventions can be beneficially implemented in any number of environments for any number of purposes. Accordingly, the claims set forth below should be construed in view of the full breadth and spirit of the embodiments of the present inventions as disclosed herein.