RISC PROCESSOR ARCHITECTURE WITH
HIGH PERFORMANCE CONTEXT
SWITCHING IN WHICH ONE CONTEXT
CAN BE LOADED BY A CO-PROCESSOR
WHILE ANOTHER CONTEXT IS BEING
ACCESSED BY AN ARITHMETIC LOGIC UNIT
BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates to reduced instruction set computer (RISC) processor architecture. More particularly, the invention relates to a processor architecture designed to substantially improve processing speed in real time I/O intensive applications.
2. State of the Art
One of the many known methods for increasing throughput in a microprocessor is known as "pipeline processing". Pipeline processing involves overlapping the execution of several instructions by temporally offsetting each subsequent instruction. In order to implement pipeline processing effectively, it is preferable that each instruction in the processor's instruction set utilize the same number of clock cycles. For example, in a case where each instruction utilizes exactly n clock cycles, a pipeline of n instructions can be created with each subsequent instruction being offset from the previous instruction by one clock cycle. In such a system of pipeline processing, the processor effectively processes one full instruction each clock cycle. One of the achievements of RISC processor design is the definition of an instruction set in which the execution of all, or most, instructions requires a uniform number of cycles. A discussion of the general background of RISC can be found in "MIPS R-2000 RISC Architecture" by G. Kane (Prentice Hall, 1987), the complete disclosure of which is hereby incorporated by reference herein.
A popular prior art RISC architecture is the MIPS I Instruction Set Architecture (ISA). MIPS is a simple but high performance RISC architecture which has attracted enormous third-party support. The MIPS I and MIPS II ISAs are well documented in "MIPS RISC Architecture" by G. Kane and J. Heinrich (Prentice Hall, 1992), the complete disclosure of which is hereby incorporated by reference herein.
The MIPS R-2000 processor executes instructions in five portions (one per clock cycle) and the instruction pipeline is a five stage pipeline, one stage per instruction portion. The five instruction portions are instruction fetch (IF), read operands from registers while decoding instruction (RD), perform operation on instruction operands (ALU), access memory (MEM), and write back results to a register (WB). Prior art FIG. 1 illustrates the MIPS pipeline with five instructions offset from each other by one clock cycle. As shown in FIG. 1, during the cycle in which the first instruction is writing back results to a register (WB), the second instruction is accessing memory (MEM), the third instruction is performing an operation on instruction operands (ALU), the fourth instruction is reading operands from registers while decoding instruction (RD), and the fifth instruction is fetching the instruction (IF) from instruction RAM. Additional background on the MIPS pipeline may be found in "Computer Organization and Design: the Hardware/Software Interface", by D. A. Patterson and J. L. Hennessy (Morgan Kaufmann, 1994), the complete disclosure of which is hereby incorporated by reference herein.
The instruction pipeline in RISC architecture achieves a certain amount of operational "parallelism". In the example shown in FIG. 1, once the pipeline is full, five instructions are executed in parallel. Although each instruction still requires five clock cycles, a new instruction can be added to the pipeline each clock cycle to keep the pipeline full. So long as the pipeline is full, the RISC processor may continue to process instructions at the effective rate of one instruction per clock cycle, provided there are no stall cycles, NOP instructions, or aborted pipelines.
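The overlap described above can be illustrated with a minimal sketch of an ideal five stage pipeline with no stalls, NOPs, or aborts. The function names `pipeline_schedule` and `total_cycles` are hypothetical and are introduced only for illustration; the stage names are those given above for the R-2000.

```python
# Illustrative model only: an ideal pipeline with no stalls, NOPs, or aborts.
STAGES = ["IF", "RD", "ALU", "MEM", "WB"]  # the five stages named in the text


def total_cycles(num_instructions):
    """Cycles needed to finish all instructions in the ideal pipeline."""
    return num_instructions + len(STAGES) - 1


def pipeline_schedule(num_instructions):
    """Return one dict per clock cycle mapping instruction index -> stage.

    Instruction i enters IF at cycle i, so each instruction is offset
    from its predecessor by exactly one clock cycle.
    """
    schedule = []
    for cycle in range(total_cycles(num_instructions)):
        active = {}
        for instr in range(num_instructions):
            stage_index = cycle - instr
            if 0 <= stage_index < len(STAGES):
                active[instr] = STAGES[stage_index]
        schedule.append(active)
    return schedule


if __name__ == "__main__":
    for cycle, active in enumerate(pipeline_schedule(5)):
        print(cycle, active)
```

In this model the fifth cycle (index 4) has all five instructions in flight, one per stage, which corresponds to the situation described for FIG. 1. A long run of instructions approaches one completed instruction per clock cycle because the fill time of four cycles is paid only once.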
Those skilled in the art will appreciate that inherent latencies exist for load, jump, and branch instructions and that some instructions may require data which is not yet available. These conditions are referred to as processing interdependencies. One way to resolve interdependencies is to stall or delay the pipeline. Another way (utilized by the R-2000) is to insert NOP (no operation) instructions in the pipeline to account for latency between instructions. The insertion of NOP instructions is effected by the software assembler when a program is compiled. It will also be understood that exceptions (e.g., interrupts) interfere with the smooth flow of the pipeline. When an R-2000 detects an exception, for example, the instruction causing the exception is aborted and all instructions in the pipeline which have started execution are aborted. A jump to the designated exception handler occurs. After the exception is processed, the processor returns to the instruction which preceded the instruction which was executing when the exception occurred. Interrupt handling robs processor cycles and degrades system performance. If interrupt handling is not efficient, the performance advantages of pipeline processing may be lost.
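The NOP insertion performed by the assembler can be sketched as follows. This is a simplified illustration, not the actual R-2000 toolchain: the tuple format `(opcode, destination, sources)`, the mnemonic `lw`, and the function `insert_load_nops` are assumptions made for this example, and only a one-instruction load latency is modeled.

```python
# Hypothetical instruction format: (opcode, destination register, source registers).
NOP = ("nop", None, ())


def insert_load_nops(program):
    """Insert a NOP after each load whose result is read by the next instruction.

    Assumes a load latency of one instruction; branch and jump latencies
    are not modeled in this sketch.
    """
    out = []
    for i, instr in enumerate(program):
        out.append(instr)
        opcode, dest, _sources = instr
        if opcode == "lw" and i + 1 < len(program):
            _next_op, _next_dest, next_sources = program[i + 1]
            if dest in next_sources:
                out.append(NOP)  # the loaded value is not yet available
    return out


if __name__ == "__main__":
    program = [
        ("lw", "r1", ("r2",)),        # load r1
        ("add", "r3", ("r1", "r4")),  # uses r1 immediately: needs a NOP first
        ("lw", "r5", ("r2",)),        # load r5
        ("sub", "r6", ("r3", "r4")),  # does not use r5: no NOP needed
    ]
    for instr in insert_load_nops(program):
        print(instr)
```

Each inserted NOP occupies a pipeline slot without doing useful work, which is why the one instruction per clock cycle rate holds only in the absence of NOP instructions. A practical assembler may instead try to move an independent instruction into the slot.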
Most modern processors, including RISC processors, support multiple simultaneous processes and/or multithreaded processes. When running several different programs on a single processor (multiple simultaneous processes) or when running a multithreaded process, it is necessary for the processor (or operating system) to switch from one program or thread (context) to another. Context switching is often performed according to a priority schedule whereby some processes are given more processing time than others. Theoretically, context switching can improve system performance by switching to a new context whenever a process or thread is stalled waiting for an I/O device and by returning to the stalled process or thread when it is ready to run. In practice, however, context switching tends to prevent optimum system performance because extra processing cycles (128 cycles in the case of a MIPS processor) must be used to switch contexts and no process instructions are executed during the context switch. During a context switch, the contents of all immediate registers (also called general purpose registers, i.e. registers which are directly read from or written to by the ALU of the processor) which describe the state of the current process are saved to RAM before switching to another process. After saving the current state (context), the next context is loaded from RAM into registers before the next process can be run. This nonproductive processor activity (saving and restoring register contents) can adversely affect overall performance, particularly in a real time event driven system where context switches are largely governed by I/O activity.
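The cost of this nonproductive activity can be estimated with a short calculation. The sketch below is illustrative only: the function `switch_overhead` is hypothetical, and the figure of 1000 productive cycles between switches is an assumed workload, not a value taken from this disclosure. Only the 128 cycle switch cost comes from the text above.

```python
def switch_overhead(switch_cycles, work_cycles_between_switches):
    """Fraction of all cycles spent switching contexts rather than
    executing process instructions."""
    return switch_cycles / (switch_cycles + work_cycles_between_switches)


if __name__ == "__main__":
    # 128 cycles per switch is the figure cited above for a MIPS processor;
    # 1000 productive cycles between switches is an assumed example workload.
    fraction = switch_overhead(128, 1000)
    print(f"{fraction:.1%} of cycles are spent switching contexts")
```

Under these assumed numbers roughly eleven percent of all cycles do no useful work, and the fraction grows as I/O events force switches more frequently.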
Even with a single thread program, context switching may occur often. For example, the MIPS R-2000 ISA has two operating modes: user mode and kernel mode. Each of these modes is a different context and the programmer may create several "user mode" contexts, each for a different thread. However, even with a single user mode context, context switching between the user mode context and the kernel context may occur frequently. According to the MIPS ISA, the CPU enters the kernel mode whenever an exception is detected and remains in kernel mode until a Restore From Exception (RFE) instruction is executed. Consequently, in an event driven application, frequent context switches can be expected regardless of the number of threads in user modes.
The relatively high speed of RISC processors makes them an ideal choice for telecommunications applications including SONET and ATM applications. Despite the power of RISC processors, however, the extremely high demands of SONET and ATM telecommunications tax the resources of RISC processors, particularly with regard to interrupt handling and context switching. It will be appreciated that telecommunications in general is almost entirely real time event driven and that the high volume, broad band communications provided via SONET and ATM is even more so.
SUMMARY OF THE INVENTION
It is therefore an object of the invention to provide a processor architecture which is particularly well suited for telecommunications applications.
It is also an object of the invention to provide a processor architecture which is particularly well suited for real time event driven applications.
It is another object of the invention to provide a processor architecture which is ideally suited to interrupt handling and context switching.
It is still another object of the invention to provide an improved context switching architecture in a RISC processor which is readily supported by third-party products.
In accord with these objects which will be discussed in detail below, the RISC processor of the present invention is similar to a MIPS R-2000 processor with several modifications which are designed to optimize the processor for use in telecommunications applications such as SONET and ATM applications and to generally optimize its performance for real time event driven applications. More specifically, the processor of the invention broadly includes a sequencer, a register ALU (RALU), an optional (preferable) data RAM, and a coprocessor interface. The sequencer includes an Nx32 bit instruction RAM (IRAM) which is booted from external memory through the coprocessor interface. The RALU includes an ALU and a multiport register file implemented as a plurality of general purpose registers which are arranged to accommodate three contexts. According to a presently preferred embodiment, the multiported register file includes three sets of general purpose registers and a new opcode is provided for switching among the sets of general purpose registers. With multiple sets of general purpose registers, context switching can be completed in three processing cycles. In addition, one set of general purpose registers can be loaded by a coprocessor while another set of general purpose registers is in use by the ALU. According to a presently preferred embodiment, each of the three sets of general purpose registers includes twenty-eight thirty-two bit registers. In addition, according to the presently preferred embodiment, a single set of four thirty-two bit common registers is provided for use in any context. The set of common registers is preferably used to store information which is used by more than one context. With the three sets of general purpose registers, the processor of the invention services interrupts approximately 10-12 times faster than a standard MIPS R-2000 processor.
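The register arrangement summarized above can be pictured with a small behavioral model. The sketch below is an illustration under stated assumptions, not the disclosed hardware: the class `BankedRegisterFile`, its method names, and the choice to map register numbers 28 through 31 onto the four common registers are all hypothetical. The counts (three sets, twenty-eight registers per set, four common registers, thirty-two bits each) are taken from the text.

```python
NUM_CONTEXTS = 3        # three sets of general purpose registers
REGS_PER_CONTEXT = 28   # twenty-eight registers per set
NUM_COMMON = 4          # four common registers shared by all contexts
MASK32 = 0xFFFFFFFF     # registers are thirty-two bits wide


class BankedRegisterFile:
    """Behavioral sketch: register numbers 0-27 are per context and
    28-31 are common. This numbering is an assumption for illustration."""

    def __init__(self):
        self.banks = [[0] * REGS_PER_CONTEXT for _ in range(NUM_CONTEXTS)]
        self.common = [0] * NUM_COMMON
        self.active = 0  # set currently used by the ALU

    def switch_context(self, context):
        """Select another register set; no register contents are copied."""
        if not 0 <= context < NUM_CONTEXTS:
            raise ValueError("no such context")
        self.active = context

    def read(self, reg):
        if 0 <= reg < REGS_PER_CONTEXT:
            return self.banks[self.active][reg]
        if reg < REGS_PER_CONTEXT + NUM_COMMON:
            return self.common[reg - REGS_PER_CONTEXT]
        raise ValueError("no such register")

    def write(self, reg, value):
        if 0 <= reg < REGS_PER_CONTEXT:
            self.banks[self.active][reg] = value & MASK32
        elif reg < REGS_PER_CONTEXT + NUM_COMMON:
            self.common[reg - REGS_PER_CONTEXT] = value & MASK32
        else:
            raise ValueError("no such register")

    def coprocessor_load(self, context, values):
        """Fill a set that the ALU is not using; `values` maps register
        number -> value. The active set is left untouched."""
        if context == self.active:
            raise ValueError("set is in use by the ALU")
        if not 0 <= context < NUM_CONTEXTS:
            raise ValueError("no such context")
        for reg, value in values.items():
            if not 0 <= reg < REGS_PER_CONTEXT:
                raise ValueError("no such register")
            self.banks[context][reg] = value & MASK32


if __name__ == "__main__":
    rf = BankedRegisterFile()
    rf.write(5, 111)                    # context 0 working value
    rf.write(30, 999)                   # common register, visible everywhere
    rf.coprocessor_load(1, {5: 222})    # prepare context 1 in the background
    rf.switch_context(1)
    print(rf.read(5), rf.read(30))
```

Because a switch in this model only changes which set is selected, nothing is saved to or restored from RAM, which is the property that allows the switch to complete in a few cycles rather than the time needed to move every register through memory.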
According to the preferred embodiment of the invention, the data RAM is preferably Mx32 bits, is byte addressable, and is preferably implemented with asynchronous SRAM.
The RISC processor of the invention is designed to operate within most of the MIPS ISA with a few instructions ignored and several new instructions added. Accordingly, consistent with the MIPS ISA, the sequencer is treated as coprocessor 0 and coprocessor 1 is reserved for a floating point unit. Whereas the MIPS ISA only provides for two additional coprocessors (for a total of four), the ISA according to the invention supports up to six additional coprocessors (for a total of eight). According to the invention, all logic external to the processor is accessed through one of the (six) coprocessor interfaces.
The processor's pipeline, interblock communication, and clocking scheme have been designed to operate in an ASIC implementation from a VHDL model which utilizes most of the MIPS I ISA (except for features which are not relevant to telecommunications and other I/O intensive applications) with the enhancements described herein. Most of the new instructions in the ISA of the invention deal with coprocessor functionality, exception processing, and context switching.
Additional objects and advantages of the invention will become apparent to those skilled in the art upon reference to the detailed description taken in conjunction with the provided figures.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagram of prior art pipeline instruction processing in a MIPS processor;
FIG. 2 is a schematic block diagram of the major functional blocks of a processor according to the invention;
FIG. 3 is a schematic block diagram of the major functional blocks of the RALU of FIG. 2;
FIG. 4 is a schematic block diagram of the major functional blocks of the sequencer of FIG. 2;
FIG. 5 is a schematic block diagram of the major functional blocks of the coprocessor interface of FIG. 2; and
FIG. 6 is a timing diagram of the waveforms of key signals of the alternate context interface of the invention.
DETAILED DESCRIPTION OF THE
Referring now to FIG. 2, a processor 10 according to the invention generally includes a sequencer 12, a register ALU (RALU) 14, data RAM 16, and a coprocessor interface 18, each being coupled to a thirty-two bit data bus 20. The data RAM 16 is not essential to the operation of the processor, but is preferable for most applications. The data RAM is preferably Mx32 bits, is byte addressable, and is preferably implemented with asynchronous SRAM. The sequencer 12 is coupled to the RALU 14 and the coprocessor interface 18 by a thirty-two bit instruction bus 22 whereby instructions fetched by the sequencer from IRAM are made available to the RALU and the coprocessor(s) as described in more detail below. When data RAM 16 is provided, it is controlled by the RALU 14 via a control link 24. Flags for conditional instructions and traps are passed by the RALU to the sequencer 12 via a flag line 26. It will be appreciated that the sequencer 12, RALU 14, and coprocessor interface 18 each have a clock/reset input 28, 30, 32 respectively. In addition, the sequencer has an interrupt request input 34 as well as a coprocessor condition flag input 36. It will also be understood that the coprocessor interface 18 is provided with I/O lines 38 for coupling to a coprocessor.
As mentioned above, the presently preferred processor 10 according to the invention is based on the MIPS R-2000 ISA