US 7774393 B1
Abstract
An apparatus and method for integer to floating-point format conversion. A processor may include an adder configured to perform addition of respective mantissas of two floating-point operands to produce a sum, where a smaller-exponent one of the floating-point operands has a respective exponent less than or equal to a respective exponent of a larger-exponent one of the floating-point operands. The processor may further include an alignment shifter coupled to the adder and configured, in a first mode of operation, to align the floating-point operands prior to the addition by shifting the respective mantissa of the smaller-exponent operand towards a least-significant bit position. The alignment shifter may be further configured, in a second mode of operation, to normalize an integer operand by shifting the integer operand towards a most-significant bit position. The second mode of operation may be active during execution of an instruction to convert the integer operand to floating-point format.
Claims (21)
1. A processor, comprising:
an adder configured to perform addition of respective mantissas of two floating-point operands to produce a sum, wherein a smaller-exponent one of said floating-point operands has a respective exponent less than or equal to a respective exponent of a larger-exponent one of said floating-point operands;
an alignment shifter coupled to provide an input to said adder and configured, in a first mode of operation that is active during execution of a floating-point addition instruction, to align said floating-point operands prior to said addition by shifting the respective mantissa of said smaller-exponent operand towards a least-significant bit position; and
a rounding circuit configured to round said sum and a normalization circuit configured to normalize said sum, wherein said rounding circuit and said normalization circuit are configured such that during execution of any floating-point addition instruction, either said normalization circuit is operable to perform a normalization shift of said sum by more than one bit position, or said rounding circuit is operable to round said sum, but not both;
wherein said adder, said alignment shifter, said rounding circuit, and said normalization circuit are included within a floating-point adder pipeline comprising a plurality of pipeline stages, wherein the floating-point adder pipeline is configured to concurrently process multiple floating-point addition instructions that concurrently occupy different pipeline stages;
wherein said adder is operable, during execution of an instruction to convert an integer operand to floating-point format, to produce a result of conversion of said integer operand as said sum, wherein for at least some values of said integer operand, conversion of said integer operand to floating-point format requires both a normalization shift by more than one bit position and rounding, and wherein said rounding circuit is configured to round said result of conversion of said integer operand;
wherein said alignment shifter is further configured, in a second mode of operation that is active during execution of said instruction to convert said integer operand to floating-point format, to perform said normalization shift required by said conversion prior to said adder producing said result of conversion of said integer operand as said sum, such that execution of said instruction to convert said integer operand to floating-point format does not require additional execution latency relative to execution of said floating-point addition instruction.
2. The processor as recited in
3. The processor as recited in
4. The processor as recited in
5. The processor as recited in
6. The processor as recited in
7. The processor as recited in
8. A method, comprising:
during execution of a floating-point addition instruction:
an alignment shifter aligning two floating-point operands having respective mantissas by shifting the respective mantissa of a smaller-exponent one of said floating point operands towards a least-significant bit position;
an adder adding said floating-point operands after said aligning to produce a sum; and
either a rounding circuit rounding said sum or a normalization circuit performing a normalization shift of said sum by more than one bit position, wherein said rounding circuit and said normalization circuit are physically configured such that during execution of any floating-point addition instruction, either said normalization circuit performs said normalization shift or said rounding circuit performs said rounding, but not both;
wherein said adder, said alignment shifter, said rounding circuit, and said normalization circuit are included within a floating-point adder pipeline comprising a plurality of pipeline stages, wherein the floating-point adder pipeline concurrently processes multiple floating-point addition instructions that concurrently occupy different pipeline stages;
during execution of an instruction to convert an integer operand to floating-point format, wherein conversion of said integer operand to floating-point format requires, for at least some values of said integer operand, both a normalization shift by more than one bit position and rounding:
prior to said adder producing a result of said instruction to convert said integer operand, said alignment shifter performing said normalization shift of said integer operand required by said conversion; and
said adder producing a result of conversion of said integer operand as said sum and said rounding circuit rounding said result of conversion of said integer operand, such that executing said instruction to convert said integer operand to floating-point format does not require additional execution latency relative to executing said floating-point addition instruction;
wherein said smaller-exponent operand has a respective exponent less than or equal to a respective exponent of a larger-exponent one of said floating-point operands.
9. The method as recited in
10. The method as recited in
11. The method as recited in
12. The method as recited in
13. The method as recited in
14. The method as recited in
issuing an instruction from one of a plurality of threads during one execution cycle; and
issuing another instruction from another one of said plurality of threads during a successive execution cycle.
15. A system, comprising:
a system memory; and
a processor coupled to said system memory, wherein said processor comprises:
an adder configured to perform addition of respective mantissas of two floating-point operands to produce a sum, wherein a smaller-exponent one of said floating-point operands has a respective exponent less than or equal to a respective exponent of a larger-exponent one of said floating-point operands;
an alignment shifter coupled to provide an input to said adder and configured, in a first mode of operation that is active during execution of a floating-point addition instruction, to align said floating-point operands prior to said addition by shifting the respective mantissa of said smaller-exponent operand towards a least-significant bit position; and
a rounding circuit configured to round said sum and a normalization circuit configured to normalize said sum, wherein said rounding circuit and said normalization circuit are configured such that during execution of any floating-point addition instruction, either said normalization circuit is operable to perform a normalization shift of said sum by more than one bit position, or said rounding circuit is operable to round said sum, but not both;
wherein said adder, said alignment shifter, said rounding circuit, and said normalization circuit are included within a floating-point adder pipeline comprising a plurality of pipeline stages, wherein the floating-point adder pipeline is configured to concurrently process multiple floating-point addition instructions that concurrently occupy different pipeline stages;
wherein said adder is operable, during execution of an instruction to convert an integer operand to floating-point format, to produce a result of conversion of said integer operand as said sum, wherein for at least some values of said integer operand, conversion of said integer operand to floating-point format requires both a normalization shift by more than one bit position and rounding, and wherein said rounding circuit is configured to round said result of conversion of said integer operand;
wherein said alignment shifter is further configured, in a second mode of operation that is active during execution of said instruction to convert said integer operand to floating-point format, to perform said normalization shift required by said conversion prior to said adder producing said result of conversion of said integer operand as said sum, such that execution of said instruction to convert said integer operand to floating-point format does not require additional execution latency relative to execution of said floating-point addition instruction.
16. The system as recited in
17. The system as recited in
18. The system as recited in
19. The system as recited in
20. The system as recited in
21. The system as recited in
Description
1. Field of the Invention
This invention relates to processors and, more particularly, to execution of floating-point arithmetic instructions.
2. Description of the Related Art
In many processor implementations that include support for floating-point arithmetic, support for conversion of data from an integer format to a floating-point format is also provided. For example, an instruction set architecture may define specific instructions to perform such conversion, in order to allow programmers to perform floating-point operations on data originally formatted as integer data.
In some embodiments, floating-point data is represented in a normalized format in which the floating-point mantissa and exponent are adjusted so that the most significant bit of the floating-point mantissa is equal to one. If the result of a floating-point operation, such as addition or subtraction, is not normalized, a normalization shift may be performed to normalize the result. Additionally, in some embodiments certain floating-point results may not be capable of an exact representation using a finite number of digits. In some such embodiments, such inexact results may be rounded according to a particular rounding mode.
When floating-point operands are normalized, it may be the case that a given floating-point addition or subtraction operation requires either a large normalization shift of the result or result rounding, but not both. In some embodiments, a floating-point addition pipeline may be optimized for the exclusivity of these cases, for example by allowing normalization and rounding to occur in parallel rather than serially. However, integer data that is to be converted to floating-point format is not necessarily normalized, and may not exactly convert to floating-point representation (thus requiring rounding). Integer-to-floating-point conversion may therefore violate the exclusivity assumption just mentioned.
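By way of illustration only, the violation of the exclusivity assumption noted above may be sketched in software as follows. The sketch assumes a 32-bit unsigned integer operand and a 24-bit effective mantissa; both widths and the name `needs_shift_and_round` are hypothetical choices made for this example and are not taken from the disclosure.

```python
def needs_shift_and_round(value: int, int_width: int = 32, mant_bits: int = 24):
    """Return (normalization_shift, needs_rounding) for a positive integer.

    int_width and mant_bits are illustrative assumptions, not values
    mandated by the text.
    """
    if not 0 < value < (1 << int_width):
        raise ValueError("value must be positive and fit in int_width bits")
    # Number of leading zeros, i.e. how far the operand must be shifted
    # towards the MSB before its most significant one reaches the top.
    normalization_shift = int_width - value.bit_length()
    # Bits that do not fit in the mantissa and would have to be rounded away.
    excess = max(value.bit_length() - mant_bits, 0)
    needs_rounding = (value & ((1 << excess) - 1)) != 0
    return normalization_shift, needs_rounding


# 2**24 + 1 has 25 significant bits: it needs a 7-bit normalization shift
# in a 32-bit field and also loses a nonzero low-order bit.
print(needs_shift_and_round((1 << 24) + 1))
```

In this sketch the operand 2**24 + 1 requires both a normalization shift of more than one bit position and rounding, which is the combination that an addition of normalized floating-point operands is assumed not to produce.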
In some embodiments, if integer-to-floating-point conversion is implemented within a floating-point addition pipeline that is optimized based on the exclusivity of normalization and rounding, integer-to-floating-point conversion instructions may require an extra execution cycle relative to other floating-point instructions in order to perform rounding. This may degrade performance of the conversion instructions and may create a pipeline hazard in pipelined floating-point embodiments, which may degrade the performance of other floating-point instructions. In an alternative embodiment, additional dedicated logic may be implemented specifically to handle normalization of integer operands, but such additional logic may incur additional design area and power consumption. Various embodiments of an apparatus and method for integer to floating-point format conversion are disclosed. In one embodiment, a processor may include an adder configured to perform addition of respective mantissas of two floating-point operands to produce a sum, where a smaller-exponent one of the floating-point operands has a respective exponent less than or equal to a respective exponent of a larger-exponent one of the floating-point operands. The processor may further include an alignment shifter coupled to the adder and configured, in a first mode of operation, to align the floating-point operands prior to the addition by shifting the respective mantissa of the smaller-exponent operand towards a least-significant bit position. The alignment shifter may be further configured, in a second mode of operation, to normalize an integer operand by shifting the integer operand towards a most-significant bit position. The second mode of operation may be active during execution of an instruction to convert the integer operand to floating-point format. 
In one embodiment, a method may include determining whether an instruction to convert an integer operand to floating-point format is executing, and if the instruction is not executing, configuring an alignment shifter to align two floating-point operands having respective mantissas by shifting the respective mantissa of a smaller-exponent one of the floating point operands towards a least-significant bit position. The method may further include, if the instruction is executing, configuring the alignment shifter to normalize an integer operand by shifting the integer operand towards a most-significant bit position. The smaller-exponent operand may have a respective exponent less than or equal to a respective exponent of a larger-exponent one of the floating-point operands. While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims. 
A block diagram illustrating one embodiment of a multithreaded processor Cores Crossbar L2 cache In some embodiments, L2 cache Memory interface In the illustrated embodiment, processor Peripheral interface Network interface
Overview of Fine-Grained Multithreading Processor Core
As mentioned above, in one embodiment each of cores One embodiment of core Instruction fetch unit In one embodiment, fetch unit Pick unit Decode unit In some embodiments, instructions from a given thread may be speculatively issued from decode unit Execution units Floating point/graphics unit In the illustrated embodiment, FGU Load store unit In one embodiment, LSU Stream processing unit SPU
As previously described, instruction and data memory accesses may involve translating virtual addresses to physical addresses. In one embodiment, such translation may occur on a page level of granularity, where a certain number of address bits comprise an offset into a given page of addresses, and the remaining address bits comprise a page number. For example, in an embodiment employing 4 MB pages, a 64-bit virtual address and a 40-bit physical address, 22 address bits (corresponding to 4 MB of address space, and typically the least significant address bits) may constitute the page offset. The remaining 42 bits of the virtual address may correspond to the virtual page number of that address, and the remaining 18 bits of the physical address may correspond to the physical page number of that address. In such an embodiment, virtual to physical address translation may occur by mapping a virtual page number to a particular physical page number, leaving the page offset unmodified.
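The page-level translation described above may be sketched as follows, using the 4 MB page size, 64-bit virtual address and 40-bit physical address of the example. The dictionary standing in for the translation mapping and the function names are hypothetical and serve only to make the arithmetic concrete.

```python
PAGE_OFFSET_BITS = 22   # 4 MB pages: 2**22 bytes per page
VA_BITS = 64            # virtual address width from the example
PA_BITS = 40            # physical address width from the example

VPN_BITS = VA_BITS - PAGE_OFFSET_BITS   # 42-bit virtual page number
PPN_BITS = PA_BITS - PAGE_OFFSET_BITS   # 18-bit physical page number
OFFSET_MASK = (1 << PAGE_OFFSET_BITS) - 1


def split_virtual(va: int):
    """Split a virtual address into (virtual page number, page offset)."""
    return va >> PAGE_OFFSET_BITS, va & OFFSET_MASK


def translate(va: int, mapping: dict) -> int:
    """Map the virtual page number to a physical page number.

    `mapping` is a plain dict used as a stand-in for a translation
    structure; the page offset passes through unmodified.
    """
    vpn, offset = split_virtual(va)
    ppn = mapping[vpn]
    if ppn >= (1 << PPN_BITS):
        raise ValueError("physical page number does not fit in 18 bits")
    return (ppn << PAGE_OFFSET_BITS) | offset


va = (0x5 << PAGE_OFFSET_BITS) | 0x12345
print(hex(translate(va, {0x5: 0x3})))
```

Here virtual page 0x5 is mapped to physical page 0x3, and the offset 0x12345 appears unchanged in the low 22 bits of the resulting physical address.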
Such translation mappings may be stored in an ITLB or a DTLB for rapid translation of virtual addresses during lookup of instruction cache A number of functional units in the illustrated embodiment of core During the course of operation of some embodiments of core In one embodiment, TLU
Exemplary Core Pipeline Diagram
In the illustrated embodiment, core The first four stages of the illustrated integer pipeline may generally correspond to the functioning of IFU During the Execute stage, one or both of execution units
In the illustrated embodiment, integer instructions are depicted as executing back-to-back in the pipeline without stalls. In execution cycles 0 through 7, instructions from threads 0, 3, 6, 2, 7, 5, 1 and 4 enter the Fetch stage, respectively, though in other embodiments, instructions may issue from various threads in a different order according to the operation of pick unit
By execution cycle 7, it is noted that each stage of the pipeline holds an instruction from a different thread in a different stage of execution, in contrast to conventional processor implementations that typically require a pipeline flush when switching between threads or processes. In some embodiments, flushes and stalls due to resource conflicts or other scheduling hazards may cause some pipeline stages to have no instruction during a given cycle. However, in the fine-grained multithreaded processor implementation employed by the illustrated embodiment of core
Floating-Point Addition Path
As noted above, in various embodiments FGU Both swap mux
In many embodiments of floating-point addition logic, the two floating-point mantissas are aligned with one another prior to their being added. That is, one or both mantissas are shifted, adjusting corresponding operand exponents accordingly, until both exponents are equal.
In the illustrated embodiment, the operand having the smaller exponent (i.e., the numerically lesser exponent) may be aligned to the operand having the larger exponent by shifting the mantissa of the smaller-exponent operand towards the least significant bit (LSB) position (which is typically the rightmost bit position of a number, but may occupy another position in various embodiments). Since shifting a binary number by one bit position towards the LSB is equivalent to division by two, incrementing the exponent of the smaller-exponent operand by one for each bit position the mantissa is shifted (i.e., effectively multiplying the mantissa by two) preserves the value of the smaller-exponent operand. In case both exponents are already equal prior to alignment, no alignment may be necessary, and either operand may be considered the smaller-exponent operand. In the illustrated embodiment, the alignment shift count (i.e., the number of bit positions by which to shift the mantissa of the smaller-exponent operand) may be determined by exponent difference logic For any given addition operation, either operand A or operand B may be the smaller-exponent operand. Rather than provide an aligner for each operand, in the illustrated embodiment, swap mux Once selected by swap mux Following alignment of the smaller-exponent mantissa, both mantissas may be added by adder In many floating-point representation formats, such as IEEE 754, floating-point operands are normalized if it is possible to do so within the precision defined for the operand. In one embodiment a floating-point number is normalized if the most significant bit (MSB) of its mantissa is equal to one. (In some embodiments, such as IEEE 754, the most significant bit of the mantissa is not expressly represented, but is rather implied to have a value of one.) 
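The alignment and addition of mantissas described above may be sketched in software as follows. This is a simplified model that assumes unsigned integer mantissas and operands of the form mantissa × 2^exponent; bits shifted out during alignment are simply discarded here, whereas a hardware embodiment would typically retain additional information for rounding. The function name is hypothetical.

```python
def align_and_add(mant_a: int, exp_a: int, mant_b: int, exp_b: int):
    """Align the smaller-exponent mantissa, then add.

    Each operand represents the value mant * 2**exp. Returns
    (sum_mantissa, exponent) with the same interpretation.
    """
    # Swap so that operand A always holds the larger exponent; a single
    # aligner then suffices for the other operand.
    if exp_a < exp_b:
        mant_a, exp_a, mant_b, exp_b = mant_b, exp_b, mant_a, exp_a
    # Alignment shift count is the exponent difference (zero if equal).
    shift = exp_a - exp_b
    # Shifting towards the LSB divides by 2**shift; raising the exponent
    # by the same amount preserves the value, up to the discarded bits.
    aligned_b = mant_b >> shift
    return mant_a + aligned_b, exp_a


# 12 * 2**3 + 8 * 2**1 = 96 + 16 = 112 = 14 * 2**3
print(align_and_add(0b1100, 3, 0b1000, 1))
```

The swap at the start of the function plays the role of selecting the smaller-exponent operand for alignment, so the result is the same regardless of the order in which the two operands are supplied.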
Depending on the arithmetic operation being performed, the sum produced by adder In one embodiment, sum LZD logic
Because the number of mantissa bits available in any given floating-point representation format is finite, not all possible sums may be represented exactly. For example, the exact binary representation of certain rational floating-point fractions may require an infinite number of mantissa bits for exact representation, and may be only inexactly represented with a finite mantissa. In the illustrated embodiment, rounding logic Finally, in the illustrated embodiment result mux In the embodiment illustrated in
In another embodiment, if the need for rounding and a large normalization shift can be assumed to be mutually exclusive or orthogonal as just mentioned, the location at which a rounding increment should occur if rounding is required may be determinable during the addition process. In such an embodiment, an appropriate rounding constant (which typically includes a one in the bit position to be incremented due to rounding, and zeroes elsewhere) may be added along with the A and B operand mantissas, so that rounding is incorporated into the addition process. One such embodiment, which may also be referred to as an early round organization, is illustrated in In some embodiments performing early rounding, adder
Integer to Floating-Point Conversion
In some software applications, it may be necessary or convenient to perform floating-point calculations on data that is originally formatted in an integer representation. Many ISAs provide instructions for the conversion of an integer operand to floating-point format (e.g., single or double precision) and vice versa. However, unlike many floating-point representation formats, integer operands are not generally stored in a normalized format. Thus, a given integer operand may require a large normalization shift during conversion to a floating point mantissa value.
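The normalization shift of an integer operand mentioned above may be sketched as a shift towards the MSB by the operand's leading zero count. The 64-bit default width, the treatment of a zero operand, and the function names are assumptions made for this illustration only.

```python
def leading_zeros(value: int, width: int) -> int:
    """Count leading zero bits of a non-negative value in a width-bit field."""
    return width - value.bit_length()


def normalize_integer(value: int, width: int = 64):
    """Shift value towards the MSB until its top bit is one.

    Returns (normalized_value, shift_count). A zero operand has no
    leading one and is returned unshifted (an assumption of this sketch).
    """
    if not 0 <= value < (1 << width):
        raise ValueError("value must be non-negative and fit in width bits")
    if value == 0:
        return 0, 0
    shift = leading_zeros(value, width)
    return (value << shift) & ((1 << width) - 1), shift


normalized, shift = normalize_integer(0x00FF, width=16)
# With the binary point placed after the MSB, the unbiased exponent is
# width - 1 - shift, here 7, since 255 lies between 2**7 and 2**8.
print(hex(normalized), shift, 16 - 1 - shift)
```

For small operands the shift count approaches the full operand width, which illustrates why conversion may require a normalization shift of many bit positions.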
Depending on the floating-point precision chosen to represent the integer operand, some integer operands may not be capable of exact representation in floating-point format. For example, IEEE 754 single precision format provides a maximum effective mantissa width of 24 bits (23 explicit bits with an implicit MSB of one). Thus, some integer values wider than 24 bits may be rounded upon conversion to single precision format. It is noted that under some circumstances, conversion of an integer operand to floating-point format may require both a large normalization shift and a rounding operation, contrary to the assumption of orthogonality of these tasks that may be made for floating-point operands as described above. Accommodating integer-to-floating-point (IntToFP) conversion in some embodiments optimized around that assumption, such as the parallel embodiment of As described above, alignment shifter In the embodiments of FGU Specifically, in one embodiment alignment shifter In some embodiments, the integer operand to be converted may be represented in a signed two's complement format, whereas the floating-point format may use a sign-magnitude representation (e.g., a sign bit and an unsigned mantissa). In one such embodiment, if the integer operand is negative, IntToFP LZD logic It is noted that, in an embodiment configured to perform a one's complement of a negative two's complement integer operand prior to counting leading zeros, the resulting leading zero count in certain instances may be one greater than the count needed to normalize the integer operand. However, in some embodiments this situation may be detected and handled similarly to an overflow condition resulting from an ordinary addition operation. 
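The off-by-one leading zero count noted above may be illustrated as follows. The sketch assumes an 8-bit two's complement operand for brevity and compares the leading zero count of the one's complement of a negative operand against the count for its true magnitude; the function names are hypothetical.

```python
def lzc(value: int, width: int) -> int:
    """Leading zero count of a non-negative value in a width-bit field."""
    return width - value.bit_length()


def lzc_ones_complement(x: int, width: int) -> int:
    """Leading zero count after bitwise inversion of negative x."""
    mask = (1 << width) - 1
    pattern = x & mask            # two's complement bit pattern of x
    inverted = ~pattern & mask    # one's complement, equal to -x - 1
    return lzc(inverted, width)


def lzc_magnitude(x: int, width: int) -> int:
    """Leading zero count of the true magnitude of negative x."""
    return lzc(-x, width)


for x in (-7, -8):
    print(x, lzc_ones_complement(x, 8), lzc_magnitude(x, 8))
```

In this sketch the two counts agree for -7 but differ by one for -8, because the inverted pattern equals the magnitude minus one, which has one fewer significant bit exactly when the magnitude is a power of two.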
For example, in various embodiments, an overflow or carry out from adder Once the integer operand has been normalized by alignment shifter If necessary, the integer operand may be rounded according to serial round, parallel round, or early round configurations illustrated in One embodiment of a method of normalizing an integer operand is illustrated in If such a conversion instruction is not executing (for example, if a regular floating-point addition instruction is executing), alignment shifter If such a conversion instruction is executing, alignment shifter
Exemplary System Embodiment
As described above, in some embodiments processor In various embodiments, system memory Peripheral storage device As described previously, in one embodiment boot device Network
In the foregoing discussion, references made to values such as zero or one refer to arithmetic values. It is contemplated that in various embodiments, a given arithmetic value may be implemented using either positive logic, in which logic values ‘1’ and ‘0’ correspond respectively to arithmetic values one and zero, or negative logic, in which this correspondence is inverted. It is further contemplated that any other suitable signaling scheme for conveying arithmetic values may be employed.
Although the embodiments above have been described in considerable detail, numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.