US 7865699 B2
This invention pertains to apparatus, method and a computer program stored on a computer readable medium. The computer program includes instructions for use with an instruction unit having a code page, and has computer program code for partitioning the code page into at least two sections for storing in a first section thereof a plurality of instruction words and, in association with at least one instruction word, for storing in a second section thereof an extension to each instruction word in the first section. The computer program further includes computer program code for setting a state of at least one page table entry bit for indicating, on a code page by code page basis, whether the code page is partitioned into the first and second sections for storing instruction words and their extensions, or whether the code page is comprised instead of a single section storing only instruction words.
1. A digital data processor comprising an instruction unit, said instruction unit comprising a code page that is partitioned for storing in a first section thereof a plurality of instruction words and, in association with at least one instruction word, in a second section thereof an extension to said at least one instruction word,
where said first section is comprised of a first plurality of storage locations, and where said second section is comprised of a second plurality of storage locations, and where said at least one instruction word and said extension to said at least one instruction word are one of fixed length and variable length program instructions, wherein each of the first and second section is comprised of contiguous storage locations,
where at least some of the second storage locations are not allocated for storing instruction word extensions.
2. A digital data processor as in
3. A digital data processor as in
4. A digital data processor as in
5. A digital data processor as in
6. A digital data processor as in
7. A digital data processor comprising an instruction unit, said instruction unit comprising a code page that is partitioned for storing in a first section thereof a plurality of instruction words and, in association with at least one instruction word, in a second section thereof an extension to said at least one instruction word, further comprising at least one page table entry bit having a state for indicating, on a code page by code page basis, whether the code page is partitioned into said first and second sections for storing instruction words and at least one instruction word extension, or whether the code page is comprised instead of a single section storing only instruction words, wherein each of the first and second section is comprised of contiguous storage locations.
8. A digital data processor as in
9. A digital data processor as in
10. A digital data processor as in
11. A digital data processor as in
12. A digital data processor as in
13. A method to operate an instruction unit having a code page, comprising:
partitioning said code page into at least two sections;
storing in a first section thereof a plurality of instruction words and, in association with at least one instruction word, storing in a second section thereof an extension to said at least one instruction word, wherein each of the at least two sections is comprised of contiguous storage locations; and
detecting when program execution has reached the end of the first section for ensuring that a next instruction address is not contained in the second section.
14. A method as in
15. A method as in
16. A method as in
17. A method as in
18. A method as in
19. A method as in
20. A method as in
21. A method as in
22. A method as in
23. A method as in
24. A computer program stored on a computer readable medium, said computer program comprising instructions for use with an instruction unit having a code page, comprising:
computer program code for partitioning said code page into at least two sections for storing in a first section thereof a plurality of instruction words and, in association with at least one instruction word, for storing in a second section thereof an extension to said at least one instruction word, wherein each of the first and second section is comprised of contiguous storage locations; and
computer program code for setting a state of at least one page table entry bit for indicating, on a code page by code page basis, whether the code page is partitioned into said first and second sections for storing instruction words and at least one instruction word extension, or whether the code page is comprised instead of a single section storing only instruction words, where at least some of the second storage locations are not allocated for storing instruction word extensions.
25. A computer program as in
This patent application is a continuation of U.S. patent application Ser. No. 10/720,585, filed on Nov. 24, 2003, now U.S. Pat. No. 7,340,588.
This invention relates generally to digital data processor architectures and, more specifically, relates to program instruction decoding and execution hardware that operates with either fixed or variable length instruction words.
A number of data processor instruction set architectures (ISAs) operate with fixed length instructions. For example, several Reduced Instruction Set Computer (RISC) architecture data processors, such as one known as the PowerPC™ (PowerPC is a trademark of the International Business Machines Corporation), feature instruction words that have a (fixed) width of 32 bits. Another conventional architecture, known as IA-64 EPIC (Explicitly Parallel Instruction Computer), uses a fixed format of three instructions per 128 bits, and a 32-bit Modifier field (the first word in every quadword) that provides up to 10 additional instruction bits for each of the next three instructions of the quadword.
As instruction pipelines become deeper and memory latencies become longer, more instructions must be in flight (executing) at once in order to keep data processor execution units well utilized. However, in order to increase the number of non-memory operations in flight, it is generally necessary to increase the number of registers in the data processor, so that independent instructions may read their inputs and write their outputs without interfering with the execution of other instructions. Unfortunately, in most RISC architectures there is not sufficient space in a 32-bit opcode (instruction word) for operands to specify more than 32 registers, i.e., 5-bits per operand, with most operations requiring three operands and some requiring two or four operands.
In addition, as the conventional fixed-width data processor architectures age, new applications become important, and these new applications may require new types of instructions to run efficiently. For example, in the last few years multimedia vector extensions have been made to several ISAs, for example SSE-2 for the IA-32 architecture and VMX (also known as Altivec™, a trademark of Motorola, Inc., or as Velocity Engine™, a trademark of Apple Computer, Inc.) for the PowerPC™ architecture. However, with only a fixed number of bits in an instruction word, it has become increasingly difficult or impossible to add new instructions/opcodes to many architectures.
Several techniques for extending instruction word length have been proposed and used in the prior art. For example, Complex Instruction Set Computer (CISC) architectures generally allow the use of a variable length instruction. However, variable instruction lengths have at least three significant drawbacks.
A first drawback to the use of variable length instructions is that they complicate the decoding of instructions, as the instruction length is generally not known until at least a part of the instruction has been read, and because the positions of all operands within an instruction are likewise not generally known until at least part of the instruction is read.
A second drawback to the use of variable length instructions is that variable length instructions may cross a memory page boundary. In modern data processors having address translation this means that both the lower order and higher order parts of the instruction address must be checked to ensure that they have a valid mapping from the effective address space given by the instruction pointer to the physical address space of the machine, with an appropriate exception being signaled if one or both parts of the instruction address do not have a valid mapping. It is noted that page crossings cannot occur if: (1) instructions have a fixed width of 32-bits (or equivalently 4 bytes, or any number of bytes that is a power of 2); and (2) instruction addresses are aligned on a “natural” byte boundary corresponding to the width of the instruction, e.g., 4 byte instructions on 4-byte boundaries.
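The two conditions noted above can be checked numerically. The following Python sketch is illustrative only: the function name `crosses_page` and the 4096-byte page size are assumptions made for the example. It confirms that naturally aligned 4-byte instructions never straddle a page boundary, whereas an unaligned or wider instruction placed near the end of a page can.

```python
PAGE_SIZE = 4096  # assumed page size in bytes, for illustration only


def crosses_page(addr: int, width: int, page_size: int = PAGE_SIZE) -> bool:
    """Return True if the bytes [addr, addr + width) lie on two different pages."""
    first_page = addr // page_size
    last_page = (addr + width - 1) // page_size
    return first_page != last_page


# Naturally aligned 4-byte instructions: scan two pages of start addresses.
aligned_crossings = [a for a in range(0, 2 * PAGE_SIZE, 4) if crosses_page(a, 4)]

# A hypothetical 6-byte variable length instruction that starts 2 bytes
# before the end of a page spills onto the next page.
variable_crosses = crosses_page(PAGE_SIZE - 2, 6)

# A 4-byte instruction that is NOT aligned on a 4-byte boundary can also cross.
unaligned_crosses = crosses_page(PAGE_SIZE - 2, 4)
```

In the crossing cases, both pages would need a valid address translation before the instruction could be fetched, which is the extra checking the paragraph above describes.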
A third drawback to the use of variable length instructions is that instructions of variable width are not compatible with the existing code for fixed width data processor architectures.
The use of a fixed width 64-bit instruction word (or other higher powers of two) would avoid the first two problems, but not the third. However, the use of 64-bit instructions introduces the further difficulty that the additional 32-bits beyond the current 32-bit instruction words are far more than what is needed to specify the numbers of additional registers required by deeper instruction pipelines, or the number of additional opcodes likely to be needed in the foreseeable future. The use of excess instruction bits wastes space in main memory and in instruction caches, thereby slowing the performance of the data processor.
The above-mentioned IA-64 architecture packs three instructions into 16 bytes (128-bits), for an average of 42.67 bits per instruction. While this type of instruction encoding avoids problems with page and cache line crossing, this type of instruction encoding also exhibits several problems, both on its own, and as a technique for extending other fixed instruction width ISAs.
First, and without incurring significant implementation difficulty (likely slowing the execution speed and requiring significantly more integrated circuit die area), this technique allows branches to go only to the first of the three instructions, whereas most other architectures allow branches to any instruction.
Second, this technique also “wastes” bits for specifying the interaction between instructions. For example, “stop bits” are used to indicate if all three instructions can be executed in parallel, or whether they must be executed sequentially, or whether some combination of the two is possible.
Third, the three instruction packing technique also forces additional complexity in the implementation in order to deal with three instructions at once.
Finally, the three instruction packing format for IA-64 has no requirement to be compatible with existing 32-bit instruction sets. As a result, there is no obvious mechanism to achieve compatibility with other fixed width instruction encodings, such as the conventional 32-bit RISC encodings.
Prior to this invention, the problems that were inherent in the prior art instruction word extension approaches were not adequately addressed or solved.
The foregoing and other problems are overcome, and other advantages are realized, in accordance with the presently preferred embodiments of this invention.
This invention provides a method and an apparatus to augment instruction sets that use fixed width instructions to include additional or extra instruction bits per instruction word. The extra instruction word bits are added in a manner that is compatible with existing conventional fixed instruction width code, and permit the mixing of conventional and augmented instructions, with one type directly invoking the other (without operating system intervention). A feature of this invention is that the number of bits that are added is not excessive as compared to what is required to specify a reasonable number of additional registers and/or opcodes. For example, in a presently preferred embodiment only eight bits are added to a 32-bit instruction word. Another feature of this invention is that the widened or augmented instructions never cross a page boundary, and thus require only one access to a translation lookaside buffer (TLB), and furthermore will not generate a page fault exception in the middle of an instruction. Another feature of this invention is that the widened instructions have a fixed width (e.g., 40 bits in the preferred embodiment when starting from an instruction set that employs 32-bit instructions), and thereby the problems associated with variable width instruction words are avoided. In addition, the widened instructions made possible by the use of this invention do not require any changes in the way the program counter is updated, nor do they impose any restrictions on the operation of branch instructions (e.g., any instruction can be the target of a branch instruction).
In the preferred embodiment of this invention, the instruction extensions at the end of the page are skipped either when the program counter is updated sequentially or by means of a branch at the end of the instruction section of the page.
A feature of this invention is a divided code page structure, guaranteeing that instructions and their extensions lie on the same code page.
Another feature of this invention is the use of at least one page table entry bit for indicating instruction length on a page-by-page basis.
While the exemplary embodiment describes a page-table selected embodiment, the teachings of the present invention can be practiced with a variety of selection techniques. One non-limiting example of such a selection technique uses one of a per-process bit, a per-segment bit, or a global mode bit such as a bit in the machine state register (MSR). In another embodiment, this mode may be entered by executing a specially designated instruction that indicates that such a mode is entered, e.g., a special “switch” instruction, or a “jump and switch”, “branch subroutine and switch”, or “return from subroutine” instruction.
A still further feature of this invention is in providing an ability to maintain existing instruction address semantics. For example, during sequential code execution the program counter is still updated by the same number of bytes (e.g., four bytes) as in the original architecture to point to the next instruction.
This invention pertains to apparatus, method and a computer program stored on a computer readable medium. For example, the computer program includes instructions for use with an instruction unit having a code page, and has computer program code for partitioning the code page into at least two sections for storing in a first section thereof a plurality of instruction words and, in association with at least one instruction word, for storing in a second section thereof an extension to the at least one instruction word. The computer program further includes computer program code for setting a state of at least one page table entry bit for indicating, on a code page by code page basis, whether the code page is partitioned into the first and second sections for storing instruction words and at least one instruction word extension, or whether the code page is comprised instead of a single section storing only instruction words.
The foregoing and other aspects of these teachings are made more evident in the following Detailed Description of the Preferred Embodiments, when read in conjunction with the attached Drawing Figures, wherein:
It is noted at the outset that this invention will be described below in the context of an extension of 32-bit instruction words, of a type commonly employed in RISC architectures, to 40-bit instruction words. However, instruction width augmentation for other fixed width instruction sizes (e.g., 40-bits, 64-bits, or 128-bits, including bundle-oriented instruction sets such as some Very Long Instruction Word (VLIW) architectures, for example the one known as IA-64 (Intel Corporation)) is also within the scope of this invention, as is the instruction width augmentation of 32-bit instruction words to other than 40-bit instruction words (e.g., to 48-bits, or to 56-bits). The invention is also described in the context of a page size of 4096 bytes (1024 32-bit instructions). However, the use of other page sizes (more or less than 4096 bytes) is also clearly within the scope of this invention. Thus, those skilled in the art should realize that the ensuing description, and specific references to numbers of bits, instruction widths and code page sizes, are not intended to be read in a limiting sense upon the practice of this invention.
In this embodiment of the invention the upper one fourth of a (standard 4096-byte) code page is reserved for storing extensions to instructions in the lower three quarters of the code page. In other words, the lower 3072 bytes of the page, offsets 0x000 to 0xBFF, hold 768 32-bit instructions. The upper 1024 bytes, offsets 0xC00 to 0xFFF, hold the 768 8-bit instruction extensions, one for each of the 768 instructions, which occupy offsets 0xC00 to 0xEFF. The uppermost 256 bytes of the page (offsets 0xF00 to 0xFFF, or 1/16 of the page) are not required, and in this embodiment are not used.
Specifically, the instruction at offset 0x000 has a one-byte extension at offset 0xC00, the instruction at offset 0x004 has a one-byte extension at offset 0xC01, and so forth up to the instruction at 0xBFC, which has a one-byte extension at offset 0xEFF. In general, if an instruction is at offset Q on a page (0<=Q<=0xBFC), then the one-byte extension is at offset (0xC00|(Q>>2)), which is a straightforward computation for hardware.
Furthermore, in that all instructions and their extensions are always on the same page, only a single translation lookaside buffer (TLB) lookup is necessary. Neither the instructions nor their extensions ever cross page boundaries.
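The offset computation given above can be sketched in Python as follows. The function names, the big-endian byte order, and the placement of the extension byte in the high-order eight bits of the combined 40-bit value are assumptions made for illustration; the description specifies only the offset formula itself.

```python
EXT_BASE = 0xC00         # start of the extension section within the page
LAST_INSTR_OFFSET = 0xBFC  # offset of the last 32-bit instruction


def extension_offset(q: int) -> int:
    """Offset of the one-byte extension for the instruction at page offset q."""
    if not (0 <= q <= LAST_INSTR_OFFSET) or q % 4 != 0:
        raise ValueError("q must be a 4-byte aligned offset in 0x000..0xBFC")
    return EXT_BASE | (q >> 2)


def fetch_extended(page: bytes, q: int) -> int:
    """Combine the 32-bit word at q with its extension byte into a 40-bit value.

    Assumption: big-endian word, extension byte in bits 39..32.
    """
    base = int.from_bytes(page[q:q + 4], "big")
    ext = page[extension_offset(q)]
    return (ext << 32) | base


# Build a 4096-byte page with one instruction at 0x004 and its extension.
page = bytearray(4096)
page[0x004:0x008] = (0x12345678).to_bytes(4, "big")
page[extension_offset(0x004)] = 0xAB
combined = fetch_extended(bytes(page), 0x004)
```

Because Q is a multiple of four no larger than 0xBFC, (Q>>2) never exceeds 0x2FF and so never overlaps the 0xC00 bits, which is why the OR behaves as a simple addition here.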
It should be noted that in a given code page, every 32-bit basic instruction has a corresponding 8-bit extension, or no 32-bit basic instruction has an 8-bit extension. In the preferred embodiment, whether a page is of the first type or of the second type is determined by a value of a bit in the Page Table Entry (PTE) for that page. Whenever code is executed using address translation (which is almost always in modern processors), each page has a corresponding PTE, whose contents are normally determined by the operating system. When code is executed without using address translation, all instructions are treated as 32-bit basic instructions and there are no extensions. Whether address translation is used is also normally determined by the operating system and conveyed to the processor via a bit in the processor's machine state register (MSR). For code pages employing 8-bit extensions to each 32-bit basic instruction, extensions are supplied for all 768 basic instructions, even if the extended instruction results in the same function (e.g., a no-operation (NOP) function) as would have been conveyed by the 32-bit basic instruction.
The basic 32-bit instructions and their 8-bit extensions may be combined into a single 40-bit instruction at several points.
The output of the L1 instruction cache 14 is applied to the input of a 40-bit wide, otherwise conventional instruction processing pipeline having a fetch stage 20, a decode stage 22, an execution stage 24 and a writeback stage 26. The use of the invention with other instruction processing pipeline architectures should be readily apparent to those skilled in the art.
The instruction unit 10 also includes address decoding logic 28 that operates for page offsets at or above 0xC00 to determine whether the PTE bit Ext_Ins_Page is true. If it is, a Fetch Address Exception condition is indicated whenever the IAR 16 generates an instruction fetch address whose low order 12 bits are equal to or greater than 0xC00 (an address in the second section 12B of the code page 12), i.e., when the IAR should instead be addressing a 32-bit base instruction in the first section 12A (0x000<=address<0xC00) of the code page 12.
It should be noted that if the Ext_Ins_Page bit is not true, then the IAR 16 can freely generate instruction addresses for the entire 1024 word code page 12, including instruction addresses in the range of 0xC00 to 0xFFC, in a conventional manner.
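The fetch address check described for the address decoding logic 28 can be modeled in software roughly as follows. This is a sketch only: the exception class and function name are hypothetical, and real hardware would perform the comparison in logic rather than by raising an exception object.

```python
PAGE_OFFSET_MASK = 0xFFF    # low order 12 bits select the offset in a 4096-byte page
EXT_SECTION_START = 0xC00   # first offset of the extension section 12B


class FetchAddressException(Exception):
    """Hypothetical stand-in for the Fetch Address Exception condition."""


def check_fetch_address(iar: int, ext_ins_page: bool) -> int:
    """Return the page offset of the fetch, or raise if it is not permitted.

    ext_ins_page models the Ext_Ins_Page bit of the PTE for the page.
    """
    offset = iar & PAGE_OFFSET_MASK
    if ext_ins_page and offset >= EXT_SECTION_START:
        raise FetchAddressException(f"fetch at offset {offset:#05x} is in section 12B")
    return offset


# On a partitioned page the last legal instruction offset is 0xBFC.
last_ok = check_fetch_address(0x5BFC, ext_ins_page=True)

# On a conventional page the whole page may hold instructions.
conventional_ok = check_fetch_address(0x5C00, ext_ins_page=False)
```

The same address (offset 0xC00) is therefore legal or illegal as a fetch target depending solely on the state of the PTE bit for that page.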
In the embodiment of
Discussing the foregoing embodiments now in further detail, in the embodiment of
There are at least two techniques to determine whether an instruction has 8-bits of associated extended instruction, and these techniques are independent of the technique for combining the 32-bit and 8-bit portions of instructions into a single 40-bit instruction.
In the first technique a new mode bit may be added to the machine state register (MSR), which when true indicates that the instruction fetch should use the 40-bit extended scheme, with traditional fixed width (32-bit) instructions being used otherwise. This technique is compatible with existing code, and allows the 40-bit extensions to be used in any number of applications. However, the use of the MSR is not presently preferred, as its use requires operating system intervention to change modes (as modifying the MSR is generally only permitted when a processor is in “supervisor” state). This technique also does not allow code with traditional 32-bit instructions to call, and be called by, code with 40-bit extended instructions.
In the presently preferred technique, the one depicted in
To most effectively implement this invention the instruction pipeline 20-26 is modified as necessary to incorporate the additional numbers of bits and, as in
The software that is stored on the computer readable medium is designed, in accordance with this invention, to accommodate the selective partitioning of the code page 12, and the setting of the state of the PTE bit in the TLB 18.
It should be noted that more than one bit in a PTE may be used to support multiple instruction sizes. For example, two PTE bits can be used to specify normal (32-bit) instruction word widths, or 40-bit instruction word widths, or 48-bit instruction word widths, or 56-bit instruction word widths, with corresponding changes being made to the sizes of the 32-bit code portion 12A and the extended code portion 12B of the code page 12.
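As a rough illustration of the two-bit case, the Python sketch below maps a two-bit PTE field to an instruction width and computes an upper bound on how many instructions of that width fit in a 4096-byte page. The particular bit encodings and the names used are hypothetical; the description does not specify them, nor the exact section sizes for the 48-bit and 56-bit cases.

```python
PAGE_SIZE = 4096   # bytes per code page
BASE_BYTES = 4     # bytes per 32-bit basic instruction

# Hypothetical encoding of two PTE bits as an instruction width in bits.
WIDTH_BY_PTE_BITS = {0b00: 32, 0b01: 40, 0b10: 48, 0b11: 56}


def extension_bytes(pte_bits: int) -> int:
    """Bytes of extension stored in section 12B for each basic instruction."""
    return WIDTH_BY_PTE_BITS[pte_bits] // 8 - BASE_BYTES


def max_instructions(pte_bits: int) -> int:
    """Upper bound on instructions per page: each needs base plus extension bytes."""
    return PAGE_SIZE // (BASE_BYTES + extension_bytes(pte_bits))


capacity = {bits: max_instructions(bits) for bits in WIDTH_BY_PTE_BITS}
```

For the 40-bit case the bound is 819 instructions, while the embodiment described above uses 768, a choice that keeps the section boundary at 0xC00 and the extension address computation simple at the cost of 256 unused bytes.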
There are further considerations and advantages that arise from the use of this invention. For example, on pages with 40-bit instructions the last instruction must branch to the start of the next page, in order to skip over the extended code region 12B, if the instruction that would have “naturally” occupied this location on the page is not an unconditional branch, or to the branch target, if the instruction that would have “naturally” occupied this last location on the page is an unconditional branch. Alternatively, additional hardware (e.g., a comparator 19 (shown in
The “unused” space (256 bytes) at the end of a page of 40-bit instructions may be used for certain purposes. These include, but are not limited to, storing constant values, storing security information, and storing CRC, checksum, or other error detection and/or correction information to guarantee the integrity of the page. Such uses need not be defined in the architecture, and may be specified by the software.
While described thus far in the context of fixed width basic instructions (e.g., 32-bit basic instructions and extensions thereof), this invention can also be used with variable width instruction architectures. For example, it is known that variable width instruction architectures can run out of “prefix bits” for indicating the number of bytes in a variable width instruction. In such cases the techniques used by this invention for extending fixed basic instruction widths may be employed to extend the number of bits available in a variable width instruction, and hence allow additional “prefix bits” to be specified, if needed.
In general, each instruction word has a width of x bits and each instruction word extension has a width of y bits, where x=n*(8-bits), where y=m*(8-bits), where n is an integer greater than one (e.g., n=4, or n=8), and where m has a value less than one, equal to one, or greater than one for providing an overall instruction word of the desired width. For example, m may be 0.5 for providing an instruction word extension of 4-bits, or m may be 1.0 for providing an instruction word extension of 8-bits, or m may be 1.5 for providing an instruction word extension of 12-bits.
Based on the foregoing, and in view of the presently preferred embodiments discussed above, it can be appreciated that this invention enables eight additional bits of extended opcode for each existing 32-bit instruction of an instruction set. The use of these eight bits of extended opcode is sufficient to add two bits to each of four register fields, and to thus permit up to 128 registers to be directly addressed in every instruction. Since most machines of interest have multiple register files, e.g. integer and floating point registers, the use of this invention allows each of these files to be extended from, for example, 32 entries to 128 entries. Alternatively, some or all of the additional instruction bits may be used to specify new instruction types for execution.
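The register arithmetic can be illustrated with a short Python sketch. The assignment of extension bits to register fields shown here (field i takes bits 2i+1..2i of the extension as its two high-order bits) is an assumption made for the example; the description states only that two bits are added to each of four 5-bit register fields.

```python
BASE_FIELD_BITS = 5   # register field width in the 32-bit basic instruction
EXTRA_BITS = 2        # additional bits per field supplied by the extension


def extend_register_fields(base_fields: list[int], extension: int) -> list[int]:
    """Widen four 5-bit register numbers to 7 bits using an 8-bit extension.

    Assumed layout: field i uses extension bits (2*i + 1)..(2*i) as high bits.
    """
    if len(base_fields) != 4 or not 0 <= extension <= 0xFF:
        raise ValueError("expected four register fields and an 8-bit extension")
    regs = []
    for i, base in enumerate(base_fields):
        if not 0 <= base < (1 << BASE_FIELD_BITS):
            raise ValueError("base register field must fit in 5 bits")
        extra = (extension >> (EXTRA_BITS * i)) & 0b11
        regs.append((extra << BASE_FIELD_BITS) | base)
    return regs


# With all extension bits set, each field reaches the highest of 128 registers.
highest = extend_register_fields([31, 31, 31, 31], 0xFF)

# With a zero extension, the register numbers are those of the basic instruction.
unchanged = extend_register_fields([1, 2, 3, 4], 0x00)
```

Each widened field is 5 + 2 = 7 bits, giving 2^7 = 128 addressable registers, and four fields consume exactly the 4 x 2 = 8 extension bits.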
Furthermore, the use of this invention does not require any change to the computation of a next instruction address, nor does it impose any restriction on branch instructions (other than they not branch to an instruction extension in the code page section 12B).
The teachings of this invention are compatible with existing RISC code and, using as little as a single bit with each PTE (the Extended Instruction Page (Ext_Ins_Page) bit), this invention allows code with 40-bit instructions to be intermixed with traditional 32-bit instructions, with no mode change or supervisor activity required. Beneficially, in that the basic instructions and their extensions are always located on the same page 12, only a single TLB 18 lookup is necessary. Neither the instructions nor their extensions ever cross page boundaries.
The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the best method and apparatus presently contemplated by the inventors for carrying out the invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. As but some examples, and as was noted above, this invention is not limited to the use of any specific instruction widths, instruction extension widths, code page memory sizes, specific sizes of partitions or allocations of code page memory and the like, nor is this invention limited for use with any one specific type of hardware architecture or programming model, nor is this invention limited to a particular instruction pipeline. The use of other and similar or equivalent embodiments may be attempted by those skilled in the art. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention.
Further, some of the features of the present invention could be used to advantage without the corresponding use of other features. As such, the foregoing description should be considered as merely illustrative of the principles of the present invention, and not in limitation thereof.