WO1996038780A2 - Method for performing signed division - Google Patents

Method for performing signed division

Info

Publication number
WO1996038780A2
Authority
WO
WIPO (PCT)
Prior art keywords
equal
dividend
temporary
register
setting
Prior art date
Application number
PCT/US1996/007614
Other languages
French (fr)
Other versions
WO1996038780A3 (en)
Inventor
John T. Hon-Kai
Original Assignee
National Semiconductor Corporation
Priority date
Filing date
Publication date
Application filed by National Semiconductor Corporation
Priority to EP96925249A (EP0772815A2)
Publication of WO1996038780A2
Publication of WO1996038780A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/535 Dividing only
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2207/00 Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F2207/535 Indexing scheme relating to groups G06F7/535 - G06F7/5375
    • G06F2207/5353 Restoring division
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/499 Denomination or exception handling, e.g. rounding or overflow
    • G06F7/49905 Exception handling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/499 Denomination or exception handling, e.g. rounding or overflow
    • G06F7/49936 Normalisation mentioned as feature only
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/499 Denomination or exception handling, e.g. rounding or overflow
    • G06F7/49994 Sign extension

Definitions

  • the present invention relates generally to microprocessor architecture, and more particularly to the architecture of a microprocessor execution unit which performs arithmetic and logic operations concurrently with address computations.
  • microprocessors have a "pipelined architecture" whereby the processor is divided into stages. This permits the processor to perform several tasks at once thereby allowing the processor to work on different parts of the instructions simultaneously as they are advanced through the pipe with each clock cycle. Under ideal conditions, one instruction can leave the pipeline and another instruction enter the pipeline every clock cycle.
  • One such microprocessor is the Intel486™ microprocessor.
  • Compatible x86-type microprocessors include those made by Advanced Micro Devices and Cyrix.
  • the x86 microprocessor has a complex instruction set architecture which includes over 400 instructions. However, some of these instructions are rarely used by either the operating system or the compiler. Thus, it would be desirable to optimize the architecture for commonly used instructions.
  • the memory of an x86-type microprocessor is organized as a sequence of 8-bit bytes and each byte is assigned a unique physical address.
  • application programs do not directly address the physical address, but instead use a virtual addressing scheme whereby the physical address is calculated based on a memory management model that includes segmentation and paging.
  • the physical memory is divided into independent memory spaces called segments. Each segment has a segment descriptor which contains its base address and a size limit for that segment.
  • An application program issues a logical address which the addressing hardware translates into a linear address by adding an offset to the base address, so long as the offset does not exceed the size limit.
  • The offset is the sum of three components, namely, a displacement value, a base register and an index register.
  • Paging is also supported by x86-type architecture, whereby linear address space, which may be part of physical memory or disk storage, is divided into 4k blocks of memory called pages. If paging is employed, addressing hardware translates the linear address into a physical address. If not, the linear address is the same as the physical address.
  • x86-type addressing hardware must handle the worst case scenario, namely, wherein the effective address is the sum of the base and all offset components. However, in many applications, only one offset component is present. Therefore, it would also be desirable to optimize the addressing hardware to handle the usual rather than the worst case addressing computation.
  • An architecture for a microprocessor execution unit is disclosed.
  • the architecture is generally described as having an arithmetic unit and an addressing unit.
  • the arithmetic unit performs arithmetic and logical operations on a pair of operands in response to control signals.
  • the addressing unit operates in conjunction with the arithmetic unit to calculate linear addresses as well as offsets and limits.
  • the arithmetic unit includes a first portion for performing addition, subtraction and logical operations, a second portion for performing multiply, divide and single-bit shift operations, and a third portion for performing multi-bit shift and rotate operations.
  • the addressing unit is coupled to the output of the arithmetic unit.
  • the base component is input to the addressing unit and the index and displacement components are input to the arithmetic unit.
  • the results are summed in a single cycle to yield a linear address.
  • Figure 1 is a block diagram showing portions of an integrated microprocessor system
  • Figure 2 is a functional block diagram of the execution unit portion of the integrated microprocessor system of Figure 1
  • Figure 3 is a more detailed block diagram of the execution unit of Figure 2
  • Figure 4 is a schematic diagram of the result registers for a division operation
  • Figure 5 is a schematic diagram of the result registers for a multiplication operation
  • Figure 6 is a block diagram of a portion of the execution unit of Figure 3
  • the preferred embodiment of the present invention is an integrated microprocessor system 1 having a pipelined architecture, wherein the pipeline includes, for example, a decoder stage 2, an execution unit 3 and a writeback stage 4
  • the present disclosure is directed to the execution unit 3, which provides a 32-bit data path for operands and instructions stored in general registers, including register file 6 and shadow register 7, and wherein arithmetic, logical and addressing computations are performed by the execution unit for programming instructions executed by the integrated microprocessor system 1
  • the microprocessor system 1 includes a control logic unit 5 which is coupled to send control signals CTRL to the execution unit 3 and to receive data signals SGNL from the execution unit
  • the control logic unit 5 is also coupled to other components of the microprocessor system 1 and receives microcode and other input for making programmed control decisions
  • the execution unit 3 is implemented to be substantially compatible with the Intel x86 instruction set, as set forth in the "Intel486™ MICROPROCESSOR FAMILY PROGRAMMER'S REFERENCE MANUAL," which is expressly incorporated herein by reference
  • the x86 instructions will be referenced herein in their common mnemonic form, such as ADD, SUB, MUL, DIV, etc
  • A simplified functional diagram of the execution unit 3 is shown in Figure 2
  • An arithmetic unit 50 has two inputs 51 and 52 for receiving operands OpA and OpB, respectively, from register file 6
  • the arithmetic unit 50 generates an arithmetic or logical result 53 in a single cycle for many x86 instructions
  • the arithmetic unit 50 includes a first portion for performing addition, subtraction and logical operations, a second portion for performing multiply, divide and single-bit shift operations, and a third portion for performing multi-bit shift and rotate operations, as will be shown and described in more detail below
  • An addressing unit 55 has one input selectively coupled to the output of the arithmetic unit 50 or to OpA. When an addressing instruction is received, the segment base component is provided to the addressing unit 55 on input 56, and the base, index, or displacement components, or immediate segment address operands, are provided to the arithmetic unit 50 on inputs 51 and 52. The addressing unit 55 then sums the address components to yield output 57, which is a linear address
  • a limit check unit 60 is provided to make sure the offset 58, i.e., output 53 or OpA, is not addressing a location outside of the segment as determined by the control signal LIMIT
  • Sign extension unit 101 is a 3:1 multiplexor that selects a byte and sign extends it into 32 bits, or selects a word and sign extends it into 32 bits, or selects a dword, and then outputs the 32 bit result a_in onto data line 201.
  • the term "sign extend" means copying the sign bit into the 24 highest order bits for a byte or into the 16 highest order bits for a word.
  • Operand B is received into a sign extension unit 102.
  • Sign extension unit 102 includes a 5:1 multiplexor that selects a signed byte and sign extends it into 32 bits, or a signed word and sign extends it into 32 bits, or an unsigned byte and sign extends it into 32 bits, or an unsigned word and sign extends it into 32 bits, or a dword.
  • the output 202 is a 32 bit result b_in.
  • Sign extension unit 102 also includes a 2:1 multiplexor that selects OpB or its complement.
  • An adder 103 receives and operates upon data lines 202 and 203 and carry input Cl 204.
  • Data line 203 is from the output of a 2:1 multiplexor 104, which selects either a_in data line 201 or UpperQ data line 205.
  • Adder 103 performs logical operations on data lines 202 and 203 to generate logic output 207, which is available to the user through output gate 111.
  • the adder 103 also performs addition on data lines 202, 203 and 204 to generate sum output 206, which is available to the user through output gate 112.
  • Two 32 bit registers are provided for performing multiply, divide and single-bit shift operations.
  • a 3:1 multiplexor 105 selects from a_in data line 201, UpperQ data line 205, or SUM data line 207. The selected value may be shifted either left or right by one bit by left/right shifter 106 and then stored in register 107.
  • a 2:1 multiplexor 108 selects from b_in data line 202 or from LowerQ data line 208. The selected value may be shifted either left or right by one bit by left/right shifter 109 and then stored in register 110.
  • the least significant bit (LSB) of left/right shifter 106 is coupled to the most significant bit (MSB) of left/right shifter 109 to permit up to 64 single bit position shifts.
  • the UpperQ register 107 provides an output data line 205 which is fed back to multiplexor 104 or multiplexor 105, as described above, or made available to the user through output gate 113.
  • the LowerQ register 110 provides an output data line 208 which is fed back to multiplexor 108 as described above, or available to the user through output gate 114.
  • a barrel shifter 120 comprising a 32 by 32 transistor array is provided for performing multi-bit shift and rotate operations.
  • a pair of 32 bit 2:1 multiplexors 121, 122 couple the a_in data line 201 to the barrel shifter 120.
  • a 5 bit decoder 123 provides 32 output signals, only one of which is true, to the barrel shifter 120, thus selecting one row of the barrel shifter.
  • the output 209 of the barrel shifter 120 is available to the user through output gate 115.
  • a multiplexor 130 selectively outputs status flags from the execution unit 3 through output gate 116, as shown in Table I:
  • ZF Zero Flag: a zero result sets ZF to 1; else ZF is cleared
  • the addressing unit includes a 4:1 multiplexor 152 that selects the a_in data line 201 if it is a dword, or zero extends the a_in data line 201 if it is a word, or the SUM output 206 if a dword, or zero extends the SUM output if a word
  • zero extend means copying a zero into the 24 highest order bits for a byte or into the 16 highest order bits for a word
  • An adder 154 receives the output from multiplexor 152 as well as the segment base value on data line 56 and adds the two values together, thereby generating a linear address 57
  • a limit check unit 160 is also provided in execution unit 3
  • the address includes a 20 bit limit value 162 which is stored in the shadow register 7
  • This limit value is provided to multiplexor 164, where it is scaled to 32 bits, depending on the value of the granularity bit, then inverted through 32 bit inverter 166
  • the output of inverter 166 is coupled to an adder 168, in which only the carry out function is used, and to a multiplexor 170.
  • the output of multiplexor 152 is also coupled to adder 168.
  • the output B of adder 168 indicates that the offset is below the scaled limit value.
  • the multiplexor 170 is provided with constants HC (half ceiling) and FC (full ceiling), which provide the maximum value for addressing computations and cause selection of either 16 bit addresses (HC) or 32 bits addresses (FC).
  • the output of multiplexor 170, which is the upper limit for address computations, is fed to adder 174, which is a carry save adder (CSA).
  • Additional inputs to CSA 174 are from multiplexor 152 and multiplexor 176.
  • the output of CSA 174 is fed to the input of adder 178 and to a single bit left shift unit 180, which effectively multiplies the value of the carry bits by 2
  • the output of shift unit 180 is fed to the adder 178
  • the output SegSpace of adder 178 is used for a limit calculation by a prefetch unit (not shown) and the output A of adder 178 indicates that the offset is above the scaled limit value
  • execution unit 3 for arithmetic and logical instructions will now be described in more detail.
  • Instructions for addition, subtraction, and logical operations are carried out in a conventional manner by utilizing the resources of adder 103.
  • a division example of 50 by 7 yields a quotient of 7 with a remainder of 1, as shown in Table II (truncated to 8 bits).
  • Register 110 is used to provide the quotient while register 107 is used to provide the remainder, as illustrated in Figure 4
  • the lower 8 bits of register 110 contain the quotient while the lower 8 bits of register 107 contain the remainder
  • the lower 16 bits of register 110 contain the quotient while the lower 16 bits of register 107 contain the remainder
  • all 32 bits of register 110 contain the quotient while all 32 bits of register 107 contain the remainder.
  • step 302 the divisor is set equal to the absolute value of the 32 bit sign extension of OpB.
  • step 304 the divisor is compared to zero. If true, then an interrupt occurs in step 306 and the routine stops. If not, then the data length is determined in step 308 and a temporary register initialized in step 310. If the data length is a byte, then register tempi is set equal to 80 hex. If the data length is a word, then register tempi is set equal to 8000 hex. If the data length is a dword, then register tempi is set equal to 80000000 hex.
  • step 312 the dividend OpA is examined to determine if it is negative. If not, then the program jumps to step 320. If so, then the lower dividend (register 110) is adjusted to become the two's complement value of the lower dividend in step 314.
  • step 316 the adder 103 is examined to see if there is a carry out. If so, go to step 318. If not, go to step 319.
  • step 318 an adjusted upper dividend is set equal to the two's complement of the upper dividend (register 107).
  • step 319 the adjusted upper dividend is set equal to the one's complement of the upper dividend.
  • step 320 the adjusted upper dividend is set equal to the upper dividend.
  • step 322 the adjusted lower dividend is aligned to be left justified.
  • step 324 a division carry register is set equal to the most significant bit of the adjusted upper dividend, then the adjusted upper dividend is shifted left one bit position, then the LSB of the adjusted upper dividend is set equal to the MSB of the adjusted lower dividend, then the adjusted lower dividend is shifted to the left by one bit position, and finally, the LSB of the adjusted lower dividend is set equal to zero.
  • a temporary result register stores the result of subtracting the value in register tempi from the adjusted divisor. Then, the temporary result register is set equal to the adjusted upper dividend less the temporary result register.
  • step 328 the temporary result and the adjusted upper register are compared to zero. If true, then the parity flag hPF is set equal to one (step 329). If not, then the parity flag hPF is set equal to zero (step 330).
  • step 332 the size is defined based on the data length, i.e., a byte, a word, or a dword.
  • Step 334 includes several sub steps. Step 334a is the first division step and is basically a subtract then shift left. Step 334b calls for comparing the hidden parity flag hPF to 1. If true, then go to step 334c, else go to step 334d.
  • Step 336 checks to see if the adjusted upper dividend and the division carry register are greater than the adjusted divisor, and that the divisor and dividend are positive values. If so, then an interrupt is generated in step 338 and the routine stops. If not, then the size is compared to 0 in step 340. If true, then go to step 350. If not, go to step 342.
  • step 342 a normal division operation is performed.
  • step 344 the size is decremented by one.
  • Step 350 is a division end step that is similar to the normal division step, except that the difference is not shifted left one bit, but is stored directly into the upper register 107.
  • the lower register 110 is updated as before.
  • step 352 the temporary remainder is set equal to the upper register 107. If, in step 354, the temporary remainder is 0, and the divisor is greater than 0, and hPF equals 1, then an interrupt is generated (step 356) and the routine stops. If not, then a temporary quotient is set equal to the lower register 110 in step 358.
  • step 360 the sign of the divisor is compared to the sign of the dividend. If equal, a second temporary quotient is set equal to the first temporary quotient in step 362. If not, then the second temporary quotient is set equal to the complement of the first temporary quotient in step 364. In step 366, if the sign of the second temporary quotient is not equal to the exclusive OR of the sign of the dividend with the sign of the divisor, and the second temporary quotient is not equal to 0, then an interrupt is generated (step 368) and the routine stops. If not, then go to step 370.
  • The dividend is examined in step 370 to see if it is negative. If so, then the remainder is set equal to the temporary remainder in step 372. If not, then the remainder is set equal to the complement of the temporary remainder in step 374.
  • the product of a multiplication operation is contained in registers 110 and 107 as illustrated in Figure 5
  • a 16-bit result is contained in the upper 8 bits of register 110 and the lower 8 bits of register 107
  • a 32-bit result is contained in the upper 16 bits of register 110 and the lower 16 bits of register 107
  • a 64-bit result is contained in all 32 bits of register 110 and all 32 bits of register 107
  • the barrel shifter 120 and associated multiplexors 121 and 122 may be used to carry out multi-bit shift and rotate operations, as is more fully described in the following commonly assigned, copending applications: (1) "BARREL SHIFTER" by Thomas W S Thomson and H John Tarn as filed on May 26, 1995, (2) "BIT SEARCHING THROUGH 8, 16, OR 32-BIT OPERANDS USING A 32-BIT DATA PATH" by Thomas W S Thomson as filed on May 26, 1995, and (3) "METHOD FOR PERFORMING ROTATE THROUGH CARRY USING A 32-BIT BARREL SHIFTER AND COUNTER" by H John Tarn as filed on May 26, 1995
  • Double precision shift operations are also fully supported by the execution unit 3, as more fully described in commonly assigned, copending application entitled "DOUBLE PRECISION (64-BIT) SHIFT OPERATIONS USING A 32-BIT DATA PATH" by Thomas W S Thomson and filed on May 26, 1995
  • Addressing computations for x86 segmented address space are optimized in execution unit 3 for the predominant cases, i.e., where the address consists only of two components, namely a scaled index and a displacement, or a base and a displacement
  • the execution unit is capable of performing the entire address computation in a single cycle, i.e., it can calculate the offset, the linear address and the limit in a single cycle
  • An address cycle is illustrated schematically in Figure 7
  • a 32-bit segment base address is provided to input 56 and defines the memory segment space in which an operand resides
  • a 32-bit or 16-bit segment offset value is added to the segment base to form the linear address
  • the offset value is constructed from up to two general registers, namely a base register or an index register, and a literal displacement value, which is an 8-bit, 16-bit, or 32-bit value taken from the addressing instruction format
  • the index register can be scaled by a factor of 2, 4, or 8 before use, thereby allowing the index register to count elements rather than bytes when indexing through an array
  • the invention embodiments described herein have been implemented in an integrated circuit which includes a number of additional functions and features which are described in the following co-pending, commonly assigned patent applications, the disclosure of each of which is incorporated herein by reference: U.S. patent application Serial No. 08/ , entitled "DISPLAY CONTROLLER
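
The signed-division flow listed above (steps 302 to 374) reduces to dividing magnitudes and then restoring the signs. The following Python sketch illustrates that idea only; it is not the patented microcode sequence. The names `DivideError` and `signed_divide` are hypothetical, the sign and range rules follow the usual x86 IDIV convention (quotient truncated toward zero, remainder carrying the sign of the dividend), and Python's `divmod` stands in for the hardware shift-and-subtract loop.

```python
class DivideError(Exception):
    """Hypothetical stand-in for the interrupts of steps 306, 338, 356 and 368."""


def signed_divide(dividend, divisor, bits=8):
    """Divide a signed dividend by a signed `bits`-wide divisor.

    Assumption: x86 IDIV conventions. The quotient must fit in a signed
    `bits`-wide register, otherwise a divide error is raised.
    """
    if divisor == 0:
        raise DivideError("divide by zero")

    # Work on magnitudes, as the listed steps do by taking absolute values
    # and two's complements before the division loop.
    q_mag, r_mag = divmod(abs(dividend), abs(divisor))

    # The quotient is negative when the operand signs differ.
    quotient = -q_mag if (dividend < 0) != (divisor < 0) else q_mag
    # The remainder takes the sign of the dividend.
    remainder = -r_mag if dividend < 0 else r_mag

    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    if not lowest <= quotient <= highest:
        raise DivideError("quotient does not fit in the result register")
    return quotient, remainder
```

For example, `signed_divide(-50, 7)` returns `(-7, -1)`, while `signed_divide(1000, 2)` raises `DivideError` because 500 does not fit in a signed byte.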

Abstract

Instructions for multiplication and division are carried out by using the adder (103), the upper shifter comprising multiplexor (105), shifter (106) and register (107), and the lower shifter comprising multiplexor (108), shifter (109) and register (110). Generally, most multiplication and division instructions are performed according to conventional algorithms, i.e. shift and add for multiplication, and subtract and shift for division operations. For a division operation, if the value of A is greater than the value stored in registers (107, 110) then 0 is entered and the shifter is selected else 1 is entered and the adder (103) is selected. A division example of 50 by 7 yields a quotient of 7 with a remainder of 1, as shown in Table II (truncated to 8 bits). Register (110) is used to provide the quotient while register (107) is used to provide the remainder.
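
As a rough illustration of the subtract-and-shift scheme summarized in the abstract, the Python sketch below performs unsigned restoring division on an upper/lower register pair. The function name and the `bits` parameter are illustrative assumptions; `upper` and `lower` only play the roles of registers (107) and (110) and do not model the multiplexors or shifters.

```python
from typing import Tuple


def restoring_divide(dividend: int, divisor: int, bits: int = 8) -> Tuple[int, int]:
    """Unsigned shift-and-subtract division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")
    mask = (1 << bits) - 1
    upper = 0                # role of register 107: ends up holding the remainder
    lower = dividend & mask  # role of register 110: dividend in, quotient out

    for _ in range(bits):
        # Shift the register pair left by one bit; the MSB of the lower
        # register moves into the LSB of the upper register.
        upper = (upper << 1) | (lower >> (bits - 1))
        lower = (lower << 1) & mask
        if upper >= divisor:
            # The subtraction succeeds: keep the difference and enter a 1.
            upper -= divisor
            lower |= 1
        # Otherwise a 0 has already been shifted in and upper is kept as is.
    return lower, upper


print(restoring_divide(50, 7))  # (7, 1)
```

The 8-bit run reproduces the example in the abstract: 50 divided by 7 gives a quotient of 7 and a remainder of 1.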

Description

METHOD FOR PERFORMING SIGNED DIVISION
BACKGROUND OF THE INVENTION
The present invention relates generally to microprocessor architecture, and more particularly to the architecture of a microprocessor execution unit which performs arithmetic and logic operations concurrently with address computations.
Many modern microprocessors have a "pipelined architecture" whereby the processor is divided into stages. This permits the processor to perform several tasks at once thereby allowing the processor to work on different parts of the instructions simultaneously as they are advanced through the pipe with each clock cycle. Under ideal conditions, one instruction can leave the pipeline and another instruction enter the pipeline every clock cycle. One such microprocessor is the Intel486™ microprocessor. Compatible x86-type microprocessors include those made by Advanced Micro Devices and Cyrix.
Arising out of the need for compatibility with older Intel microprocessor designs and the fact that it is a general purpose microprocessor, the x86 microprocessor has a complex instruction set architecture which includes over 400 instructions. However, some of these instructions are rarely used by either the operating system or the compiler. Thus, it would be desirable to optimize the architecture for commonly used instructions.
As described in Chapter 2 of the "Intel486™ MICROPROCESSOR FAMILY PROGRAMMER'S REFERENCE MANUAL," the memory of an x86-type microprocessor is organized as a sequence of 8-bit bytes and each byte is assigned a unique physical address. However, application programs do not directly address the physical address, but instead use a virtual addressing scheme whereby the physical address is calculated based on a memory management model that includes segmentation and paging.
The physical memory is divided into independent memory spaces called segments. Each segment has a segment descriptor which contains its base address and a size limit for that segment. An application program issues a logical address which the addressing hardware translates into a linear address by adding an offset to the base address, so long as the offset does not exceed the size limit. The offset is the sum of three components, namely, a displacement value, a base register and an index register.
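
The translation from logical to linear address described in this paragraph can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the function and parameter names are hypothetical, addresses are treated as 32-bit values, and the optional `scale` argument reflects the index scaling by 2, 4, or 8 mentioned elsewhere in this document.

```python
MASK32 = 0xFFFFFFFF


def linear_address(segment_base, segment_limit,
                   displacement=0, base=0, index=0, scale=1):
    """Form a linear address from a segment base and up to three offset parts."""
    # The offset is the sum of the displacement, base and (scaled) index.
    offset = (displacement + base + index * scale) & MASK32
    # The translation is only valid while the offset stays within the limit.
    if offset > segment_limit:
        raise ValueError("offset exceeds the segment limit")
    return (segment_base + offset) & MASK32


# Worst case: all three offset components are present.
print(hex(linear_address(0x10000, 0xFFFF,
                         displacement=0x10, base=0x100, index=4, scale=4)))
# 0x10120
```

In the common case only one or two of the keyword arguments are non-zero, which is the situation the disclosed hardware is said to be optimized for.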
Paging is also supported by x86-type architecture, whereby linear address space, which may be part of physical memory or disk storage, is divided into 4k blocks of memory called pages. If paging is employed, addressing hardware translates the linear address into a physical address. If not, the linear address is the same as the physical address.
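
To make the 4k page split concrete, the sketch below divides a linear address into a page number and a page offset and looks the page up in a table. The flat dictionary `page_table` and both function names are assumptions made for illustration; actual x86 hardware walks multi-level tables held in memory, which this sketch does not model.

```python
PAGE_SHIFT = 12                      # 4k pages: 2**12 bytes
PAGE_OFFSET_MASK = (1 << PAGE_SHIFT) - 1


def split_linear(linear):
    """Return (page number, offset within page) for a linear address."""
    return linear >> PAGE_SHIFT, linear & PAGE_OFFSET_MASK


def to_physical(linear, page_table, paging_enabled=True):
    """Translate a linear address; without paging it is already physical."""
    if not paging_enabled:
        return linear
    page, offset = split_linear(linear)
    frame = page_table[page]         # a KeyError stands in for a page fault
    return (frame << PAGE_SHIFT) | offset


print(hex(to_physical(0x00403ABC, {0x403: 0x7F})))  # 0x7fabc
```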
Thus, x86-type addressing hardware must handle the worst case scenario, namely, wherein the effective address is the sum of the base and all offset components. However, in many applications, only one offset component is present. Therefore, it would also be desirable to optimize the addressing hardware to handle the usual rather than the worst case addressing computation.
SUMMARY OF THE INVENTION
An architecture for a microprocessor execution unit is disclosed. The architecture is generally described as having an arithmetic unit and an addressing unit. The arithmetic unit performs arithmetic and logical operations on a pair of operands in response to control signals. The addressing unit operates in conjunction with the arithmetic unit to calculate linear addresses as well as offsets and limits.
The arithmetic unit includes a first portion for performing addition, subtraction and logical operations, a second portion for performing multiply, divide and single-bit shift operations, and a third portion for performing multi-bit shift and rotate operations.
The addressing unit is coupled to the output of the arithmetic unit. When an addressing instruction is received, the base component is input to the addressing unit and the index and displacement components are input to the arithmetic unit. The results are summed in a single cycle to yield a linear address.
A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description of the invention and accompanying drawings which set forth an illustrative embodiment in which the principles of the invention are utilized.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram showing portions of an integrated microprocessor system.
Figure 2 is a functional block diagram of the execution unit portion of the integrated microprocessor system of Figure 1.
Figure 3 is a more detailed block diagram of the execution unit of Figure 2.
Figure 4 is a schematic diagram of the result registers for a division operation.
Figure 5 is a schematic diagram of the result registers for a multiplication operation.
Figure 6 is a block diagram of a portion of the execution unit of Figure 3.
DETAILED DESCRIPTION OF THE INVENTION
Referring now to Figure 1, the preferred embodiment of the present invention is an integrated microprocessor system 1 having a pipelined architecture, wherein the pipeline includes, for example, a decoder stage 2, an execution unit 3 and a writeback stage 4. The present disclosure is directed to the execution unit 3, which provides a 32-bit data path for operands and instructions stored in general registers, including register file 6 and shadow register 7, and wherein arithmetic, logical and addressing computations are performed by the execution unit for programming instructions executed by the integrated microprocessor system 1.
The microprocessor system 1 includes a control logic unit 5 which is coupled to send control signals CTRL to the execution unit 3 and to receive data signals SGNL from the execution unit. The control logic unit 5 is also coupled to other components of the microprocessor system 1 and receives microcode and other input for making programmed control decisions.
The execution unit 3 is implemented to be substantially compatible with the Intel x86 instruction set, as set forth in the "Intel486™ MICROPROCESSOR FAMILY PROGRAMMER'S REFERENCE MANUAL," which is expressly incorporated herein by reference. The x86 instructions will be referenced herein in their common mnemonic form, such as ADD, SUB, MUL, DIV, etc.
A simplified functional diagram of the execution unit 3 is shown in Figure 2. An arithmetic unit 50 has two inputs 51 and 52 for receiving operands OpA and OpB, respectively, from register file 6. The arithmetic unit 50 generates an arithmetic or logical result 53 in a single cycle for many x86 instructions.
The arithmetic unit 50 includes a first portion for performing addition, subtraction and logical operations, a second portion for performing multiply, divide and single-bit shift operations, and a third portion for performing multi-bit shift and rotate operations, as will be shown and described in more detail below.
An addressing unit 55 has one input selectively coupled to the output of the arithmetic unit 50 or to OpA. When an addressing instruction is received, the segment base component is provided to the addressing unit 55 on input 56, and the base, index, or displacement components, or immediate segment address operands, are provided to the arithmetic unit 50 on inputs 51 and 52. The addressing unit 55 then sums the address components to yield output 57, which is a linear address.
A limit check unit 60 is provided to make sure the offset 58, 1 e , output 53 or OpA, is not addressing a location outside of the segment as determined by the control signal LIMIT
Referring now to Figure 3, the execution unit 3 is illustrated in greater detail. It should be apparent to one versed in the art that each component of the execution unit described below is controlled or selected by one or more control signals provided by the control logic unit 5. However, a detailed description of these control signals is not necessary for a complete understanding of the invention.
Operand A is received into a sign extension unit 101. Sign extension unit 101 is a 3:1 multiplexor that selects a byte and sign extends it into 32 bits, or selects a word and sign extends it into 32 bits, or selects a dword, and then outputs the 32 bit result a_in onto data line 201. The term "sign extend" means copying the sign bit into the 24 highest order bits for a byte or into the 16 highest order bits for a word.
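By way of illustration only, the sign extension just defined can be sketched in software as follows. The function name `sign_extend` and its parameters are hypothetical and are not taken from the disclosure; the sketch merely copies the sign bit of an 8 or 16 bit value into the higher order bits of a 32 bit result.

```python
def sign_extend(value, bits):
    """Sign extend the low `bits` bits of `value` into a 32 bit pattern."""
    mask = (1 << bits) - 1          # e.g. 0xFF for a byte
    value &= mask                   # keep only the operand bits
    sign_bit = 1 << (bits - 1)      # position of the sign bit
    if value & sign_bit:
        # Copy the sign bit into every higher order bit up to bit 31.
        value |= 0xFFFFFFFF & ~mask
    return value


print(hex(sign_extend(0x80, 8)))     # 0xffffff80
print(hex(sign_extend(0x7F, 8)))     # 0x7f
print(hex(sign_extend(0x8000, 16)))  # 0xffff8000
```

A dword passes through unchanged, since no bits remain above the operand.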
Operand B is received into a sign extension unit 102. Sign extension unit 102 includes a 5:1 multiplexor that selects a signed byte and sign extends it into 32 bits, or a signed word and sign extends it into 32 bits, or an unsigned byte and sign extends it into 32 bits, or an unsigned word and sign extends it into 32 bits, or a dword. The output 202 is a 32 bit result b_in. Sign extension unit 102 also includes a 2:1 multiplexor that selects OpB or its complement.
An adder 103 receives and operates upon data lines 202 and 203 and carry input CI 204. Data line 203 is from the output of a 2:1 multiplexor 104, which selects either a_in data line 201 or UpperQ data line 205.
Adder 103 performs logical operations on data lines 202 and 203 to generate logic output 207, which is available to the user through output gate 111. The adder 103 also performs addition on data lines 202, 203 and 204 to generate sum output 206, which is available to the user through output gate 112.
Two 32 bit registers are provided for performing multiply, divide and single-bit shift operations. For the upper 32 bits, a 3:1 multiplexor 105 selects from a_in data line 201, UpperQ data line 205, or SUM data line 206. The selected value may be shifted either left or right by one bit by left/right shifter 106 and then stored in register 107. For the lower 32 bits, a 2:1 multiplexor 108 selects from b_in data line 202 or from LowerQ data line 208. The selected value may be shifted either left or right by one bit by left/right shifter 109 and then stored in register 110. The least significant bit (LSB) of left/right shifter 106 is coupled to the most significant bit (MSB) of left/right shifter 109 to permit up to 64 single bit position shifts.
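The coupling of the two shifters can be pictured with the following minimal sketch, which treats the upper and lower values as one 64 bit quantity shifted a single position at a time. The function names are hypothetical, and the sketch assumes both inputs already fit in 32 bits; it models only the bit movement, not the multiplexors or registers.

```python
MASK32 = 0xFFFFFFFF


def shift_pair_left(upper, lower):
    """Shift the 64 bit pair left one bit.

    The MSB of the lower half moves into the LSB of the upper half.
    Returns (upper, lower, bit shifted out of the upper half).
    """
    carry_out = (upper >> 31) & 1
    upper = ((upper << 1) | (lower >> 31)) & MASK32
    lower = (lower << 1) & MASK32
    return upper, lower, carry_out


def shift_pair_right(upper, lower):
    """Shift the 64 bit pair right one bit.

    The LSB of the upper half moves into the MSB of the lower half.
    """
    lower = (lower >> 1) | ((upper & 1) << 31)
    upper >>= 1
    return upper, lower


print(shift_pair_left(0x00000001, 0x80000000))   # (3, 0, 0)
print(shift_pair_right(0x00000001, 0x00000000))  # (0, 2147483648)
```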
The UpperQ register 107 provides an output data line 205 which is fed back to multiplexor 104 or multiplexor 105, as described above, or made available to the user through output gate 113. The LowerQ register 110 provides an output data line 208 which is fed back to multiplexor 108 as described above, or made available to the user through output gate 114.
A barrel shifter 120 comprising a 32 by 32 transistor array is provided for performing multi-bit shift and rotate operations. A pair of 32 bit 2:1 multiplexors 121, 122 couple the a_in data line 201 to the barrel shifter 120. A 5 bit decoder 123 provides 32 output signals, only one of which is true, to the barrel shifter 120, thus selecting one row of the barrel shifter. The output 209 of the barrel shifter 120 is available to the user through output gate 115.
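As a simple illustration of the 5 bit decoder, the sketch below produces 32 outputs of which exactly one is true. The function name `decode_5_to_32` is hypothetical, and the sketch says nothing about how the selected row is used inside the transistor array.

```python
def decode_5_to_32(select):
    """Return 32 outputs; only the one indexed by the 5 bit `select` is 1."""
    if not 0 <= select < 32:
        raise ValueError("select must fit in 5 bits")
    return [1 if row == select else 0 for row in range(32)]


outputs = decode_5_to_32(5)
print(sum(outputs), outputs.index(1))  # 1 5
```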
A multiplexor 130 selectively outputs status flags from the execution unit 3 through output gate 116, as shown in Table I:
TABLE I
Flag    Function
CF      Carry Flag: carry or borrow from most-significant bit
PF      Parity Flag: exclusive NOR of lower 8 bits of result
AF      Auxiliary Flag: carry or borrow from bit 3
ZF      Zero Flag: zero result sets ZF to 1; else ZF is cleared
SF      Sign Flag: set to most-significant bit of result
OF      Overflow Flag: set to 1 if two's complement overflow occurs; else cleared
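The flag functions of Table I can be illustrated for an addition with the following sketch. It assumes the conventional x86 definitions of these flags, in particular that PF is set for an even number of one bits in the low byte and that AF is the carry out of bit 3. The function name `add_flags` and the dictionary layout are hypothetical.

```python
def add_flags(a, b, bits=8):
    """Add two `bits`-wide operands and derive the status flags."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    total = a + b
    result = total & mask
    return {
        "result": result,
        "CF": int(total > mask),                            # carry out of MSB
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),  # even parity, low byte
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),             # carry out of bit 3
        "ZF": int(result == 0),
        "SF": int(bool(result & sign)),
        # Overflow: operands share a sign that differs from the result's sign.
        "OF": int(bool(~(a ^ b) & (a ^ result) & sign)),
    }


print(add_flags(0x7F, 0x01))
print(add_flags(0xFF, 0x01))
```

The first call overflows in the signed sense (127 + 1), so OF and SF are set; the second wraps to zero, so CF and ZF are set.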
The addressing unit includes a 4:1 multiplexor 152 that selects the a_in data line 201 if it is a dword, or zero extends the a_in data line 201 if it is a word, or the SUM output 206 if a dword, or zero extends the SUM output if a word. The term "zero extend" means copying a zero into the 24 highest order bits for a byte or into the 16 highest order bits for a word.
An adder 154 receives the output from multiplexor 152 as well as the segment base value on data line 56 and adds the two values together, thereby generating a linear address 57.
A limit check unit 160 is also provided in execution unit 3. The address includes a 20 bit limit value 162 which is stored in the shadow register 7. This limit value is provided to multiplexor 164, where it is scaled to 32 bits, depending on the value of the granularity bit, then inverted through 32 bit inverter 166. The output of inverter 166 is coupled to an adder 168, in which only the carry out function is used, and to a multiplexor 170. The output of multiplexor 152 is also coupled to adder 168. The output B of adder 168 indicates that the offset is below the scaled limit value.
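The use of an inverter followed by an adder whose carry out alone is examined can be sketched as below. The scaling rule assumes the standard x86 granularity convention (a set granularity bit counts the limit in 4 Kbyte pages); the function names are hypothetical. How the carry is mapped onto output B is not detailed above, so the sketch simply reports whether the offset exceeds the limit.

```python
MASK32 = 0xFFFFFFFF


def scale_limit(limit20, granularity):
    """Scale a 20 bit segment limit to 32 bits (x86 convention assumed)."""
    limit20 &= 0xFFFFF
    if granularity:
        return (limit20 << 12) | 0xFFF   # limit counted in 4 Kbyte pages
    return limit20                       # limit counted in bytes


def offset_exceeds_limit(offset, limit32):
    """Compare using only the carry out of offset + NOT(limit)."""
    inverted = ~limit32 & MASK32
    carry_out = ((offset & MASK32) + inverted) >> 32
    # offset + (2**32 - 1 - limit) reaches 2**32 only when offset > limit.
    return carry_out == 1


limit = scale_limit(0x00001, granularity=1)
print(hex(limit))                           # 0x1fff
print(offset_exceeds_limit(0x1FFF, limit))  # False
print(offset_exceeds_limit(0x2000, limit))  # True
```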
The multiplexor 170 is provided with constants HC (half ceiling) and FC (full ceiling), which provide the maximum value for addressing computations and cause selection of either 16 bit addresses (HC) or 32 bit addresses (FC). The output of multiplexor 170, which is the upper limit for address computations, is fed to adder 174, which is a carry save adder (CSA). Additional inputs to CSA 174 are from multiplexor 152 and multiplexor 176. Additional constant inputs 0, 1 and 3 are provided to the multiplexor 176 to define the instruction length, i.e., 0 = byte, 1 = word, and 3 = dword.
The output of CSA 174 is fed to the input of adder 178 and to a single bit left shift unit 180, which effectively multiplies the value of the carry bits by 2. The output of shift unit 180 is fed to the adder 178. The output SegSpace of adder 178 is used for a limit calculation by a prefetch unit (not shown) and the output A of adder 178 indicates that the offset is above the scaled limit value.
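The reason the carry bits are shifted left by one position before the final addition can be seen in a generic carry save adder, sketched below. This is the textbook construction and is offered only as an assumption about the general technique, not as the specific wiring of CSA 174; the function names are hypothetical.

```python
def carry_save_add(a, b, c):
    """Reduce three operands to a sum vector and a carry vector."""
    sum_bits = a ^ b ^ c                        # per-bit sum without carries
    carry_bits = (a & b) | (a & c) | (b & c)    # per-bit carry (majority)
    return sum_bits, carry_bits


def resolve(sum_bits, carry_bits):
    """Final adder: each carry has twice the weight of its bit position."""
    return sum_bits + (carry_bits << 1)


s, c = carry_save_add(0xFFFF, 0x1234, 3)
print(resolve(s, c) == 0xFFFF + 0x1234 + 3)  # True
```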
The operation of execution unit 3 for arithmetic and logical instructions will now be described in more detail.
Instructions for addition, subtraction, and logical operations are carried out in a conventional manner by utilizing the resources of adder 103.
Instructions for multiplication and division are carried out by using the adder 103, the upper shifter comprising multiplexor 105, shifter 106 and register 107, and the lower shifter comprising multiplexor 108, shifter 109 and register 110. Generally, most multiplication and division instructions are performed according to conventional algorithms, i.e., shift and add for multiplication, and subtract and shift for division operations.
For a division operation, if the value of a_in is greater than the value stored in registers 107, 110, then 0 is entered and the shifter is selected, else 1 is entered and the adder 103 is selected. For a multiplication operation, if OpA equals 1, then the adder 103 is selected, else the shifter is selected.
A division example of 50 by 7 yields a quotient of 7 with a remainder of 1, as shown in Table II (truncated to 8 bits).
Table II
Cycle   Register 107+110   Shifter 106+109   Adder 103
1       00110010           01100100          1111
2       01100100           11001001          0101
3       01011001           10110011          0100
4       01000011           10000111          0001
5       00010111           xxxx xxxx         xxxx
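The cycles of Table II can be reproduced with the following sketch of a subtract and shift division on an 8 bit register pair (4 bit remainder over 4 bit quotient). The function name and the `half_bits` parameter are hypothetical. The sketch assumes the shifted upper half never overflows its width, which holds for this example but is not true in general.

```python
def shift_subtract_divide(dividend, divisor, half_bits=4):
    """Trace a subtract and shift division; returns (quotient, remainder, trace)."""
    half_mask = (1 << half_bits) - 1
    full_mask = (1 << (2 * half_bits)) - 1
    reg = dividend & full_mask
    trace = []
    for _ in range(half_bits):
        shifted = (reg << 1) & full_mask           # shifter output
        upper = shifted >> half_bits
        diff = (upper - divisor) & half_mask       # adder output
        if divisor > upper:
            next_reg = shifted                     # enter 0, keep shifter value
        else:
            shifted |= 1                           # enter 1 as the quotient bit
            next_reg = (diff << half_bits) | (shifted & half_mask)
        trace.append((reg, shifted, diff))
        reg = next_reg
    return reg & half_mask, reg >> half_bits, trace


q, r, trace = shift_subtract_divide(50, 7)
for cycle, (reg, shifted, diff) in enumerate(trace, start=1):
    print(cycle, f"{reg:08b}", f"{shifted:08b}", f"{diff:04b}")
print(q, r)  # 7 1
```

The final register value, 00010111, holds the remainder 1 in its upper half and the quotient 7 in its lower half, matching cycle 5 of the table.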
Register 110 is used to provide the quotient while register 107 is used to provide the remainder, as illustrated in Figure 4. Thus, for byte operations, the lower 8 bits of register 110 contain the quotient while the lower 8 bits of register 107 contain the remainder. For word operations, the lower 16 bits of register 110 contain the quotient while the lower 16 bits of register 107 contain the remainder. For dword operations, all 32 bits of register 110 contain the quotient while all 32 bits of register 107 contain the remainder.
The IDIV instruction requires special attention, and will be described with reference to Figures 6a through 6e. In step 302, the divisor is set equal to the absolute value of the 32 bit sign extension of OpB.
In step 304, the divisor is compared to zero. If true, then an interrupt occurs in step 306 and the routine stops. If not, then the data length is determined in step 308 and a temporary register is initialized in step 310. If the data length is a byte, then register temp1 is set equal to 80 hex. If the data length is a word, then register temp1 is set equal to 8000 hex. If the data length is a dword, then register temp1 is set equal to 80000000 hex.
In step 312, the dividend OpA is examined to determine if it is negative. If not, then the program jumps to step 320. If so, then the lower dividend (register 110) is adjusted to become the two's complement value of the lower dividend in step 314.
In step 316, the adder 103 is examined to see if there is a carry out. If so, go to step 318. If not, go to step 319.
In step 318, an adjusted upper dividend is set equal to the two's complement of the upper dividend (register 107). In step 319, the adjusted upper dividend is set equal to the one's complement of the upper dividend. In step 320, the adjusted upper dividend is set equal to the upper dividend.
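Steps 314 through 319 amount to negating a 64 bit value held in two 32 bit halves, and may be sketched as follows. The function name `negate_pair` is hypothetical. The carry out of the lower half occurs only when the lower half is zero, in which case the upper half needs the full two's complement; otherwise the one's complement suffices.

```python
MASK32 = 0xFFFFFFFF


def negate_pair(upper, lower):
    """Two's complement of a 64 bit value held as (upper, lower) halves."""
    total = (~lower & MASK32) + 1          # two's complement of lower half
    new_lower = total & MASK32
    carry_out = total >> 32                # 1 only when lower was zero
    if carry_out:
        new_upper = ((~upper & MASK32) + 1) & MASK32   # two's complement
    else:
        new_upper = ~upper & MASK32                    # one's complement
    return new_upper, new_lower


print(negate_pair(0xFFFFFFFF, 0xFFFFFFFB))  # (0, 5)
print(negate_pair(0xFFFFFFFF, 0x00000000))  # (1, 0)
```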
In step 322, the adjusted lower dividend is aligned to be left justified. In step 324, a division carry register is set equal to the most significant bit of the adjusted upper dividend, then the adjusted upper dividend is shifted left one bit position, then the LSB of the adjusted upper dividend is set equal to the MSB of the adjusted lower dividend, then the adjusted lower dividend is shifted to the left by one bit position, and finally, the LSB of the adjusted lower dividend is set equal to zero.
In step 326, a temporary result register stores the result of subtracting the value in register temp1 from the adjusted divisor. Then, the temporary result register is set equal to the adjusted upper dividend less the temporary result register.
In step 328, the temporary result and the adjusted upper register are compared to zero. If true, then the parity flag hPF is set equal to one (step 329). If not, then the parity flag hPF is set equal to zero (step 330).
In step 332, the size is defined based on the data length, i.e., a byte, a word, or a dword. Step 334 includes several sub steps. Step 334a is the first division step and is basically a subtract then shift left. Step 334b calls for comparing the hidden parity flag hPF to 1. If true, then go to step 334c, else go to step 334d.
Step 336 checks to see if the adjusted upper dividend and the division carry register are greater than the adjusted divisor, and that the divisor and dividend are positive values. If so, then an interrupt is generated in step 338 and the routine stops. If not, then the size is compared to 0 in step 340. If true, then go to step 350. If not, go to step 342.
In step 342, a normal division operation is performed. In step 344, the size is decremented by one. Step 350 is a division end step that is similar to the normal division step, except that the difference is not shifted left one bit, but is stored directly into the upper register 107. The lower register 110 is updated as before.
In step 352, the temporary remainder is set equal to the upper register 107. If, in step 354, the temporary remainder is 0, and the divisor is greater than 0, and hPF equals 1, then an interrupt is generated (step 356) and the routine stops. If not, then a temporary quotient is set equal to the lower register 110 in step 358.
In step 360, the sign of the divisor is compared to the sign of the dividend. If equal, a second temporary quotient is set equal to the first temporary quotient in step 362. If not, then the second temporary quotient is set equal to the complement of the first temporary quotient in step 364. In step 366, if the sign of the second temporary quotient is not equal to the exclusive OR of the sign of the dividend with the sign of the divisor, and the second temporary quotient is not equal to 0, then an interrupt is generated (step 368) and the routine stops. If not, then go to step 370.
The dividend is examined in step 370 to see if it is negative. If so, then the remainder is set equal to the temporary remainder in step 372. If not, then the remainder is set equal to the complement of the temporary remainder in step 374.
Finally, the quotient is set equal to the second temporary quotient in step 376.
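For orientation only, the overall effect of a signed division performed on magnitudes followed by a sign fix-up may be sketched as below. The sketch assumes the x86 IDIV conventions (quotient truncated toward zero, remainder taking the sign of the dividend, an error on a zero divisor or an out-of-range quotient). It is not a step-by-step transcription of steps 302 through 376, and the function name `idiv` and its parameters are hypothetical.

```python
def idiv(dividend, divisor, bits=8):
    """Signed division on magnitudes with sign fix-up (x86 conventions assumed)."""
    if divisor == 0:
        raise ZeroDivisionError("divide error")
    # Divide the absolute values, as an unsigned divider would.
    q_mag, r_mag = divmod(abs(dividend), abs(divisor))
    # Quotient is negative when the operand signs differ.
    quotient = -q_mag if (dividend < 0) != (divisor < 0) else q_mag
    # Remainder follows the sign of the dividend.
    remainder = -r_mag if dividend < 0 else r_mag
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lowest <= quotient <= highest:
        raise OverflowError("quotient does not fit the destination")
    return quotient, remainder


print(idiv(-50, 7))   # (-7, -1)
print(idiv(50, -7))   # (-7, 1)
print(idiv(-50, -7))  # (7, -1)
```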
A multiplication example of 10 by 5 yields a product of 50, as shown in Table III (truncated to 8 bits).
Table III
Cycle   Register 107+110   Shifter 106+109   Adder 103
1       00000000           00000000          1111
2       10100000           01010000          0101
3       01010000           00101000          0100
4       11001000           01100100          0001
5       01100100           00110010          1101
6       00110010           xxxx xxxx         xxxx
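The register and shifter columns of cycles 2 through 5 of Table III can be reproduced with the following shift and add sketch, in which the multiplicand is added into the upper half whenever the current multiplier bit is 1 and the pair is then shifted right. The function name and `half_bits` parameter are hypothetical, and the adder column of the table is not modelled.

```python
def shift_add_multiply(multiplicand, multiplier, half_bits=4):
    """Trace a shift and add multiplication; returns (product, trace)."""
    reg = 0
    trace = []
    for bit in range(half_bits):
        if (multiplier >> bit) & 1:
            reg += multiplicand << half_bits   # adder selected: add to upper half
        shifted = reg >> 1                     # shifter: move the pair right
        trace.append((reg, shifted))
        reg = shifted
    return reg, trace


product, trace = shift_add_multiply(10, 5)
for cycle, (reg, shifted) in enumerate(trace, start=2):
    print(cycle, f"{reg:08b}", f"{shifted:08b}")
print(product)  # 50
```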
The product of a multiplication operation is contained in registers 110 and 107 as illustrated in Figure 5. Thus, for byte operations, a 16-bit result is contained in the upper 8 bits of register 110 and the lower 8 bits of register 107. For word operations, a 32-bit result is contained in the upper 16 bits of register 110 and the lower 16 bits of register 107. For dword operations, a 64-bit result is contained in all 32 bits of register 110 and all 32 bits of register 107.
The barrel shifter 120 and associated multiplexors 121 and 122 may be used to carry out multi-bit shift and rotate operations, as is more fully described in the following commonly assigned, copending applications: (1) "BARREL SHIFTER" by Thomas W. S. Thomson and H. John Tarn as filed on May 26, 1995, (2) "BIT SEARCHING THROUGH 8, 16, OR 32-BIT OPERANDS USING A 32-BIT DATA PATH" by Thomas W. S. Thomson as filed on May 26, 1995, and (3) "METHOD FOR PERFORMING ROTATE THROUGH CARRY USING A 32-BIT BARREL SHIFTER AND COUNTER" by H. John Tarn as filed on May 26, 1995.
Double precision shift operations are also fully supported by the execution unit 3, as more fully described in commonly assigned, copending application entitled "DOUBLE PRECISION (64-BIT) SHIFT OPERATIONS USING A 32-BIT DATA PATH" by Thomas W. S. Thomson and filed on May 26, 1995.
Addressing computations for x86 segmented address space are optimized in execution unit 3 for the predominant cases, i.e., where the address consists only of two components, namely a scaled index and a displacement, or a base and a displacement. The execution unit is capable of performing the entire address computation in a single cycle, i.e., it can calculate the offset, the linear address and the limit in a single cycle.
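The arithmetic behind such an address computation can be illustrated as follows, assuming the usual x86 form in which the offset is the sum of a base, an index scaled by 1, 2, 4 or 8, and a displacement, and the linear address is the segment base plus that offset. The function names and parameters are hypothetical, and the sketch does not model the single cycle timing.

```python
MASK32 = 0xFFFFFFFF


def effective_offset(base=0, index=0, scale=1, displacement=0, address_size=32):
    """Offset within the segment, truncated to 16 or 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    offset = base + index * scale + displacement
    return offset & (0xFFFF if address_size == 16 else MASK32)


def linear_address(segment_base, offset):
    """Linear address: segment base plus offset, truncated to 32 bits."""
    return (segment_base + offset) & MASK32


offset = effective_offset(base=0x1000, index=3, scale=4, displacement=8)
print(hex(offset))                           # 0x1014
print(hex(linear_address(0x10000, offset)))  # 0x11014
```

Scaling the index by the element size lets the index count array elements rather than bytes.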
An address cycle is illustrated schematically in Figure 7. A 32-bit segment base address is provided to input 56 and defines the memory segment space in which an operand resides. A 32-bit or 16-bit segment offset value is added to the segment base to form the linear address. The offset value is constructed from up to two general registers, namely a base register or an index register, and a literal displacement value, which is an 8-bit, 16-bit, or 32-bit value taken from the addressing instruction format. The index register can be scaled by a factor of 2, 4, or 8 before use, thereby allowing the index register to count elements rather than bytes when indexing through an array.
The invention embodiments described herein have been implemented in an integrated circuit which includes a number of additional functions and features which are described in the following co-pending, commonly assigned patent applications, the disclosure of each of which is incorporated herein by reference:
U.S. patent application Serial No. 08/ , entitled "DISPLAY CONTROLLER CAPABLE OF ACCESSING AN EXTERNAL MEMORY FOR GRAY SCALE MODULATION DATA" (atty. docket no. NSC1-62700);
U.S. patent application Serial No. 08/ , entitled "SERIAL INTERFACE CAPABLE OF OPERATING IN TWO DIFFERENT SERIAL DATA TRANSFER MODES" (atty. docket no. NSC1-62800);
U.S. patent application Serial No. 08/ , entitled "HIGH PERFORMANCE MULTIFUNCTION DIRECT MEMORY ACCESS (DMA) CONTROLLER" (atty. docket no. NSC1-62900);
U.S. patent application Serial No. 08/ , entitled "OPEN DRAIN MULTI-SOURCE CLOCK GENERATOR HAVING MINIMUM PULSE WIDTH" (atty. docket no. NSC1-63000);
U.S. patent application Serial No. 08/ , entitled "INTEGRATED CIRCUIT WITH MULTIPLE FUNCTIONS SHARING MULTIPLE INTERNAL SIGNAL BUSES ACCORDING TO DISTRIBUTED BUS ACCESS AND CONTROL ARBITRATION" (atty. docket no. NSC1-63100);
U.S. patent application Serial No. 08/ , entitled "EXECUTION UNIT ARCHITECTURE TO SUPPORT x86 INSTRUCTION SET AND x86 SEGMENTED ADDRESSING" (atty. docket no. NSC1-63300);
U.S. patent application Serial No. 08/ , entitled "BARREL SHIFTER" (atty. docket no. NSC1-63400);
U.S. patent application Serial No. 08/ , entitled "BIT SEARCHING THROUGH 8, 16, OR 32-BIT OPERANDS USING A 32-BIT DATA PATH" (atty. docket no. NSC1-63500);
U.S. patent application Serial No. 08/ , entitled "DOUBLE PRECISION (64-BIT) SHIFT OPERATIONS USING A 32-BIT DATA PATH" (atty. docket no. NSC1-63600);
U.S. patent application Serial No. 08/ , entitled "METHOD FOR PERFORMING SIGNED DIVISION" (atty. docket no. NSC1-63700);
U.S. patent application Serial No. 08/ , entitled "METHOD FOR PERFORMING ROTATE THROUGH CARRY USING A 32-BIT BARREL SHIFTER AND COUNTER" (atty. docket no. NSC1-63800);
U.S. patent application Serial No. 08/ , entitled "AREA AND TIME EFFICIENT FIELD EXTRACTION CIRCUIT" (atty. docket no. NSC1-63900);
U.S. patent application Serial No. 08/ , entitled "NON-ARITHMETICAL CIRCULAR BUFFER CELL AVAILABILITY STATUS INDICATOR CIRCUIT" (atty. docket no. NSC1-64000);
U.S. patent application Serial No. 08/ , entitled "TAGGED PREFETCH AND INSTRUCTION DECODER FOR VARIABLE LENGTH INSTRUCTION SET AND METHOD OF OPERATION" (atty. docket no. NSC1-64100);
U.S. patent application Serial No. 08/ , entitled "PARTITIONED DECODER CIRCUIT FOR LOW POWER OPERATION" (atty. docket no. NSC1-64200);
U.S. patent application Serial No. 08/ , entitled "CIRCUIT FOR DESIGNATING INSTRUCTION POINTERS FOR USE BY A PROCESSOR DECODER" (atty. docket no. NSC1-64300);
U.S. patent application Serial No. 08/ , entitled "CIRCUIT FOR GENERATING A DEMAND-BASED GATED CLOCK" (atty. docket no. NSC1-64500);
U.S. patent application Serial No. 08/ , entitled "INCREMENTOR/DECREMENTOR" (atty. docket no. NSC1-64700);
U.S. patent application Serial No. 08/ , entitled "A PIPELINED MICROPROCESSOR THAT PIPELINES MEMORY REQUESTS TO AN EXTERNAL MEMORY" (atty. docket no. NSC1-64800);
U.S. patent application Serial No. 08/ , entitled "CODE BREAKPOINT DECODER" (atty. docket no. NSC1-64900);
U.S. patent application Serial No. 08/ , entitled "TWO TIER PREFETCH BUFFER STRUCTURE AND METHOD WITH BYPASS" (atty. docket no. NSC1-65000);
U.S. patent application Serial No. 08/ , entitled "INSTRUCTION LIMIT CHECK FOR MICROPROCESSOR" (atty. docket no. NSC1-65100);
U.S. patent application Serial No. 08/ , entitled "A PIPELINED MICROPROCESSOR THAT MAKES MEMORY REQUESTS TO A CACHE MEMORY AND AN EXTERNAL MEMORY CONTROLLER DURING THE SAME CLOCK CYCLE" (atty. docket no. NSC1-65200);
U.S. patent application Serial No. 08/ , entitled "APPARATUS AND METHOD FOR EFFICIENT COMPUTATION OF A 486™ MICROPROCESSOR COMPATIBLE POP INSTRUCTION" (atty. docket no. NSC1-65700);
U.S. patent application Serial No. 08/ , entitled "APPARATUS AND METHOD FOR EFFICIENTLY DETERMINING ADDRESSES FOR MISALIGNED DATA STORED IN MEMORY" (atty. docket no. NSC1-65800);
U.S. patent application Serial No. 08/ , entitled "METHOD OF IMPLEMENTING FAST 486™ MICROPROCESSOR COMPATIBLE STRING OPERATION" (atty. docket no. NSC1-65900);
U.S. patent application Serial No. 08/ , entitled "A PIPELINED MICROPROCESSOR THAT PREVENTS THE CACHE FROM BEING READ WHEN THE CONTENTS OF THE CACHE ARE INVALID" (atty. docket no. NSC1-66000);
U.S. patent application Serial No. 08/ , entitled "DRAM CONTROLLER THAT REDUCES THE TIME REQUIRED TO PROCESS MEMORY REQUESTS" (atty. docket no. NSC1-66300);
U.S. patent application Serial No. 08/ , entitled "INTEGRATED PRIMARY BUS AND SECONDARY BUS CONTROLLER WITH REDUCED PIN COUNT" (atty. docket no. NSC1-66400);
U.S. patent application Serial No. 08/ , entitled "SUPPLY AND INTERFACE CONFIGURABLE INPUT/OUTPUT BUFFER" (atty. docket no. NSC1-66500);
U.S. patent application Serial No. 08/ , entitled "CLOCK GENERATION CIRCUIT FOR A DISPLAY CONTROLLER HAVING A FINE TUNEABLE FRAME RATE" (atty. docket no. NSC1-66600);
U.S. patent application Serial No. 08/ , entitled "CONFIGURABLE POWER MANAGEMENT SCHEME" (atty. docket no. NSC1-66700);
U.S. patent application Serial No. 08/ , entitled "BIDIRECTIONAL PARALLEL SIGNAL INTERFACE" (atty. docket no. NSC1-67000);
U.S. patent application Serial No. 08/ , entitled "LIQUID CRYSTAL DISPLAY (LCD) PROTECTION CIRCUIT" (atty. docket no. NSC1-67100);
U.S. patent application Serial No. 08/ , entitled "IN-CIRCUIT EMULATOR STATUS INDICATOR CIRCUIT" (atty. docket no. NSC1-67400);
U.S. patent application Serial No. 08/ , entitled "DISPLAY CONTROLLER CAPABLE OF ACCESSING GRAPHICS DATA FROM A SHARED SYSTEM MEMORY" (atty. docket no. NSC1-67500);
U.S. patent application Serial No. 08/ , entitled "INTEGRATED CIRCUIT WITH TEST SIGNAL BUSES AND TEST CONTROL CIRCUITS" (atty. docket no. NSC1-67600);
U.S. patent application Serial No. 08/ , entitled "DECODE BLOCK TEST METHOD AND APPARATUS" (atty. docket no. NSC1-68000).
It should be understood that the invention is not intended to be limited by the specifics of the above-described embodiment, but rather defined by the accompanying claims.

Claims

WHAT IS CLAIMED IS:
1. A method for performing signed integer division, comprising the sequential steps of:
a. setting the divisor equal to the absolute value of the 32 bit sign extension of a first operand;
b. comparing the divisor to zero, wherein if true, an interrupt is generated and the routine stops;
c. determining the data length and initializing a temporary register based on the result of the determination;
d. determining whether the dividend is negative, and if so, take the two's complement of the dividend, and if not, jump to step f;
e. examining the adder to see if there is a carry out, and if not, jump to step g;
f. setting the adjusted upper dividend equal to the two's complement of the upper dividend, then jump to step h;
g. setting the adjusted upper dividend equal to the one's complement of the upper dividend;
h. left justifying the adjusted lower dividend;
i. setting a division carry register equal to the most significant bit of the adjusted upper dividend;
j. shifting the adjusted upper dividend left one bit position;
k. setting the LSB of the adjusted upper dividend equal to the MSB of the adjusted lower dividend;
l. shifting the adjusted lower dividend left by one bit position;
m. setting the LSB of the adjusted lower dividend equal to zero;
n. subtracting the value in register temp1 from the adjusted divisor and storing the result in a temporary result register;
o. setting the temporary result register equal to the adjusted upper dividend less the temporary result register;
p. comparing the temporary result and the adjusted upper register to zero, wherein if true, the parity flag is set equal to one, and wherein if false, the parity flag is set equal to zero;
q. subtracting then shifting left;
r. comparing the hidden parity flag to one;
s. storing directly into the upper register;
t. setting the temporary remainder equal to the upper register;
u. generating an interrupt if the temporary remainder is 0, and the divisor is greater than 0, and hPF equals 1;
v. setting the temporary quotient equal to the lower register;
w. comparing the sign of the divisor to the sign of the dividend, and if equal, setting a second temporary quotient equal to the first temporary quotient, and if not equal, then setting the temporary quotient equal to the complement of the first temporary quotient;
x. generating an interrupt if the sign of the second temporary quotient is not equal to the exclusive OR of the sign of the dividend AND the sign of the divisor, and the second temporary quotient is not equal to 0;
y. examining the dividend, and if negative, then setting the remainder equal to the temporary remainder, and if not, then setting the remainder equal to the complement of the temporary remainder; and
z. setting the quotient equal to the second temporary quotient.
PCT/US1996/007614 1995-05-26 1996-05-23 Method for performing signed division WO1996038780A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP96925249A EP0772815A2 (en) 1995-05-26 1996-05-23 Method for performing signed division

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US08/451,571 US5754460A (en) 1995-05-26 1995-05-26 Method for performing signed division
US08/451,204 1995-05-26

Publications (2)

Publication Number Publication Date
WO1996038780A2 true WO1996038780A2 (en) 1996-12-05
WO1996038780A3 WO1996038780A3 (en) 1997-01-09

Family

ID=23792763

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1996/007614 WO1996038780A2 (en) 1995-05-26 1996-05-23 Method for performing signed division

Country Status (3)

Country Link
US (1) US5754460A (en)
EP (1) EP0772815A2 (en)
WO (1) WO1996038780A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102156625A (en) * 2011-03-31 2011-08-17 北京大学 Method for performing division calculation by utilizing rheostatic element

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6125380A (en) * 1998-04-13 2000-09-26 Winbond Electronics Corporation Dividing method
DE10055659C1 (en) * 2000-11-10 2002-03-28 Infineon Technologies Ag Calculation circuit for division of fixed point signal uses adders and logic stage with AND and OR logic
JP4712247B2 (en) * 2001-08-31 2011-06-29 富士通セミコンダクター株式会社 Microprocessor development system for application programs involving integer division or integer remainder
US7174358B2 (en) * 2002-11-15 2007-02-06 Broadcom Corporation System, method, and apparatus for division coupled with truncation of signed binary numbers
US7165086B2 (en) * 2002-11-15 2007-01-16 Broadcom Corporation System, method, and apparatus for division coupled with rounding of signed binary numbers

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4224676A (en) * 1978-06-30 1980-09-23 Texas Instruments Incorporated Arithmetic logic unit bit-slice with internal distributed iterative control
US5097435A (en) * 1988-12-24 1992-03-17 Kabushiki Kaisha Toshiba High speed dividing apparatus

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4381550A (en) * 1980-10-29 1983-04-26 Sperry Corporation High speed dividing circuit
JPS60140428A (en) * 1983-12-28 1985-07-25 Hitachi Ltd Divider
JP3098242B2 (en) * 1988-07-13 2000-10-16 日本電気株式会社 Data processing device
US5027309A (en) * 1988-08-29 1991-06-25 Nec Corporation Digital division circuit using N/M-bit subtractor for N subtractions
US5204953A (en) * 1989-08-04 1993-04-20 Intel Corporation One clock address pipelining in segmentation unit
US5259006A (en) * 1990-04-18 1993-11-02 Quickturn Systems, Incorporated Method for substantially eliminating hold time violations in implementing high speed logic circuits or the like
US5189319A (en) * 1991-10-10 1993-02-23 Intel Corporation Power reducing buffer/latch circuit
US5254888A (en) * 1992-03-27 1993-10-19 Picopower Technology Inc. Switchable clock circuit for microprocessors to thereby save power
US5493523A (en) * 1993-12-15 1996-02-20 Silicon Graphics, Inc. Mechanism and method for integer divide involving pre-alignment of the divisor relative to the dividend
US5404473A (en) * 1994-03-01 1995-04-04 Intel Corporation Apparatus and method for handling string operations in a pipelined processor
US5574677A (en) * 1994-11-23 1996-11-12 Exponential Technology, Inc. Adaptive non-restoring integer divide apparatus with integrated overflow detect

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
COMPUTER DESIGN, vol. 16, no. 5, May 1977, LITTLETON, MASSACHUSETTS US, pages 124-127, XP002015482 S. SANYAL: "An ALgorithm for Nonrestoring Division" *
K. HWANG: "Computer arithmetic: principles, architecture, and design" 1979 , J. WILEY & SONS , NEW YORK XP002015483 pages 218-221 *

Also Published As

Publication number Publication date
WO1996038780A3 (en) 1997-01-09
EP0772815A2 (en) 1997-05-14
US5754460A (en) 1998-05-19

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): DE KR

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

AK Designated states

Kind code of ref document: A3

Designated state(s): DE KR

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

WWE Wipo information: entry into national phase

Ref document number: 1019970700540

Country of ref document: KR

WWE Wipo information: entry into national phase

Ref document number: 1996925249

Country of ref document: EP

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWP Wipo information: published in national office

Ref document number: 1996925249

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP Wipo information: published in national office

Ref document number: 1019970700540

Country of ref document: KR

WWW Wipo information: withdrawn in national office

Ref document number: 1996925249

Country of ref document: EP

WWG Wipo information: grant in national office

Ref document number: 1019970700540

Country of ref document: KR