US 6336208 B1 Abstract A process for mapping logic nodes to a plurality of sizes of lookup tables in a programmable gate array. A node and its predecessor nodes are selectively collapsed into a first single node as a function of delay factors associated with the plurality of sizes of lookup tables and a maximum of delay factors associated with the predecessor nodes. If a cut-size associated with the first single node is less than or equal to one of the sizes of lookup tables, the one size is selected to implement the first single node. If a lookup table size was not selected for the first single node, the node and its predecessor nodes are selectively collapsed into a second single node as a function of the delay factors and the maximum delay factor increased by a selected value. If a cut-size associated with the second single node is less than or equal to one of the sizes of lookup tables, the one size is selected to implement the second single node.
Claims (19)

1. A process for mapping logic nodes to a plurality of sizes of lookup tables in a programmable gate array, comprising:
selectively collapsing into a first single node, a node and its predecessor nodes as a function of delay factors associated with the plurality of sizes of lookup tables and a maximum of delay factors associated with the predecessor nodes;
selecting one of the sizes of lookup tables to implement the first single node if an associated cut-size of the first single node is less than or equal to the number of inputs to the one size;
selectively collapsing into a second single node a node and its predecessor nodes as a function of the delay factors and the maximum delay factor increased by a selected value if a lookup table size was not selected for the first single node; and
selecting one of the sizes of lookup tables to implement the second single node if an associated cut-size of the second single node is less than or equal to the number of inputs to the one size.
2. The process of
3. The process of
if a size was selected for the first single node, assigning a delay factor to the first single node;
if a size was selected for the second single node, assigning a delay factor to the second single node;
repeating the steps of selectively collapsing and selecting sizes for all the nodes in a network.
4. The process of
5. The process of
6. The process of
7. The process of
8. The process of
9. The process of
determining a maximum path delay value of delay values associated with respective input/output paths through the programmable gate array; and
combining a group of selected ones of the lookup tables into a single lookup table only if the path delay value associated with the single lookup table does not exceed the maximum path delay value.
10. A process for mapping a logic node and its predecessor logic nodes to one of a plurality of sizes of lookup tables in a programmable gate array, each of the plurality of sizes of lookup tables having an associated delay factor, comprising:
(a) initializing a counter to a selected value;
(b) selecting one of the sizes of lookup tables;
(c) collapsing into a single node the logic node and the ones of the predecessor logic nodes having delay factors greater than a maximum of delay factors associated with the predecessor logic nodes plus the counter value minus the delay factor of the one size lookup table;
(d) if the single node has an associated cut-size that is less than or equal to the number of inputs to the one size lookup table, mapping to the one size lookup table the logic nodes that have been collapsed into the single node and that are within a cut of the single node;
(e) if the associated cut-size of the single node is greater than the number of inputs of the one size lookup table, selecting another one of the sizes of lookup tables to use as the one size;
(f) repeating steps (c) through (e) until the logic nodes within a cut are mapped or cut-sizes for all the sizes of lookup tables have been considered in mapping;
(g) if the logic nodes within a cut have not been mapped to one of the sizes of lookup tables and cut-sizes for all the sizes of lookup tables have been considered in mapping, incrementing the counter value; and
(h) repeating steps (b) through (g) until the counter value equals a least of delay factors of the sizes of lookup tables.
11. The process of
12. A process for mapping a logic node and its predecessor logic nodes to one of a plurality of sizes of lookup tables in a programmable gate array, each of the plurality of sizes of lookup tables having an associated delay factor, comprising:
(a) initializing a collapse factor as a function of a maximum of respective delay factors associated with the predecessor logic nodes, wherein the collapse factor is greater than the maximum of the delay factors of the predecessor logic nodes;
(b) selecting one of the sizes of lookup tables;
(c) collapsing into a single node the logic node and the ones of the predecessor logic nodes having delay factors greater than the collapse factor minus the delay factor of the one size lookup table;
(d) if the single node has an associated cut-size that is less than or equal to the number of inputs to the one size lookup table, mapping to the one size lookup table the logic nodes collapsed into the single node that are within a cut of the single node;
(e) if the associated cut-size of the single node is greater than the number of inputs of the one size lookup table, selecting another one of the sizes of lookup tables to use as the one size;
(f) repeating steps (c) through (e) until the logic nodes within a cut are mapped or all the sizes of lookup tables have been considered in mapping.
13. The process of
(g) if the logic nodes within a cut have not been mapped to one of the sizes of lookup tables and cut-sizes for all the sizes of lookup tables have been considered in mapping, increasing the collapse factor; and
(h) repeating steps (b) through (g) until the logic nodes within a cut are mapped or the collapse factor exceeds a predetermined value.
14. The process of
15. The process of
16. A process for mapping logic nodes comprising:
(a) arranging the nodes to be processed in topological order such that all fan-ins for a node are processed before the node is processed;
(b) getting the next node to be processed;
(c) selecting the smallest size LUT;
(d) collapsing into a single node the node in process and all predecessor nodes having a delay greater than the maximum delay of the predecessor nodes minus the LUT delay of the selected LUT;
(e) if the network formed by collapsing these nodes has a cut size no greater than that of the selected LUT, mapping the collapsed node into the selected LUT and assigning the LUT delay to the collapsed node;
(f) if the cut sizes don't match and there are more LUT sizes available, selecting a larger LUT size;
(g) repeating steps d, e, and f until all LUT sizes have been tried or the node is mapped;
(h) if mapping did not occur for any LUT size when all those predecessor nodes having delay greater than the maximum delay of the predecessor nodes minus the LUT delay of the selected LUT were collapsed into a single node, then increasing delay by a factor i and repeat steps c through g until the factor i is equal to delay of the fastest LUT;
(i) selecting a fastest LUT and assigning the node to the fastest LUT; and
(j) repeating the process for all nodes.
17. The process of
18. The process of
19. A process for using slack information to determine LUT output delay, wherein the LUT output delay is the time interval between when a signal arrives at a primary input to the time when the signal arrives at the output of the LUT, comprising:
(a) arranging LUTs to be processed in topological order such that all fan-outs from an original LUT are processed before the original LUT is processed; and
(b) for each original LUT:
(b1) determining the maximum set of predecessor original LUTs that can be covered by a new LUT such that the LUT output delay of the new LUT is less than or equal to the LUT output delay of the original LUT plus the slack value of the original LUT;
(b2) replacing the original LUT and the maximum set of predecessor original LUTs with the new LUT; and
(b3) assigning slack values to fan-ins of the new LUT.
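Claim 19 and the packing process of FIG. 9 rest on two computations: propagating slack from the output LUTs back toward the inputs with the formula slack(LUT F) = slack(LUT A) + delay(LUT A) − delay(LUT F) − LUTXDELAY given in the description, and checking whether an original LUT can absorb some of its predecessor LUTs without its output delay exceeding its original output delay plus its slack. The Python sketch below is a minimal illustration under stated assumptions, not the patented implementation. The names `compute_slack` and `can_pack` and the dictionary layout are hypothetical; a LUT's output delay is taken to be the largest output delay among its inputs (primary inputs count as 0) plus its own delay factor; LUTXDELAY is read as the delay factor of the fan-out LUT A; and a LUT with several fan-outs is assumed to keep the smallest slack it receives.

```python
from graphlib import TopologicalSorter


def compute_slack(fanins, arrival, lut_delay, outputs):
    """Propagate slack from the output LUTs toward the inputs.

    fanins:    LUT name -> list of driving LUTs or primary inputs (hypothetical layout)
    arrival:   LUT name -> LUT output delay (primary inputs are implicitly 0)
    lut_delay: LUT name -> delay factor of that LUT's size (LUTXDELAY)
    outputs:   LUTs driving primary outputs; every LUT is assumed to reach one
    """
    worst = max(arrival[o] for o in outputs)
    # Output LUTs start with the gap between their delay and the slowest output.
    slack = {o: worst - arrival[o] for o in outputs}
    # TopologicalSorter yields fan-ins first, so the reversed order visits every
    # fan-out LUT before the LUTs that drive it.
    order = list(TopologicalSorter(fanins).static_order())
    for lut in reversed(order):
        if lut not in fanins:  # primary input: nothing to propagate
            continue
        for f in fanins[lut]:
            if f in fanins:
                # slack(F) = slack(A) + delay(A) - delay(F) - LUTXDELAY(A)
                s = slack[lut] + arrival[lut] - arrival[f] - lut_delay[lut]
                slack[f] = min(s, slack.get(f, s))  # keep the tightest fan-out
    return slack


def can_pack(root, absorbed, fanins, arrival, slack, k, d):
    """True if `root` plus the predecessor LUTs in `absorbed` fit one new k-input
    LUT with delay factor d without pushing the root past arrival + slack."""
    group = {root, *absorbed}
    # Signals between members of the group become internal to the new LUT.
    inputs = {s for n in group for s in fanins[n] if s not in group}
    new_arrival = max((arrival.get(s, 0) for s in inputs), default=0) + d
    return len(inputs) <= k and new_arrival <= arrival[root] + slack[root]
```

For example, with 4-LUTs of delay 2 where L3 = f(L1, L2, h) feeds the critical output L8 and a two-level side path L5 → L7 ends at 4 units against a worst output of 6, L7 gets a slack of 2 and can absorb L5 into a hypothetical 8-LUT of delay 3, while merging L1 into L3 is rejected because it would lengthen the zero-slack path.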
Description

The present invention generally relates to mapping combinational logic to circuit elements of a programmable logic device, and more particularly, to mapping combinational logic to lookup tables (LUTs) of multiple sizes.

Field programmable gate arrays (FPGAs), first introduced by XILINX in 1985, are becoming increasingly popular devices for use in electronics systems. For example, communications systems employ FPGAs in large measure for their re-programmability. In general, the use of FPGAs continues to grow at a rapid rate because FPGAs permit relatively short design cycles, reduce costs through logic consolidation, and offer flexibility in their re-programmability. The capabilities of and specifications for XILINX FPGAs are set forth in “The Programmable Logic Data Book,” published in 1998 by XILINX, Inc., the contents of which are incorporated herein by reference.

Some types of FPGAs are implemented with a network of programmable logic blocks that include lookup tables (LUTs). A LUT implements a user-programmed logic function of its inputs, and the number of inputs of a particular LUT depends upon the FPGA architecture. Mapping software converts a user's combinational logic circuit into a network of LUTs for an FPGA implementation. Examples of such mapping software include Chortle-crf and FlowMap. Chortle-crf is described in “Chortle-crf: Fast Technology Mapping for Lookup Table-Based FPGAs” by Robert Francis, Jonathan Rose, and Zvonko Vranesic, Proceedings of the Design Automation Conference 1991, pp. 227-233. FlowMap is described in “FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs” by J. Cong and Y. Ding, IEEE Transactions on CAD, February 1994, vol. 13, no. 1, pp. 1-12. Mapping software typically attempts to optimize area, delay, or a combination of area and delay.
Area is modeled as the total number of LUTs required for the mapping, and delay is typically modeled as the number of levels of logic on the critical path, that is, the input-to-output path of the circuit with the most levels of logic. Chortle-crf attempts to produce a mapping with optimized delay and area, and FlowMap attempts to produce a mapping with optimized delay.

The latest FPGAs have lookup tables of multiple sizes, and researchers have described algorithms for mapping to LUTs of multiple sizes while minimizing delay. One such algorithm is described by Jianshe He and Jonathan Rose in “Technology Mapping for Heterogeneous FPGAs” (1994 ACM International Conference on FPGAs). Jason Cong and Songjie Xu improve upon the process of He and Rose in “Delay-Optimal Technology Mapping for FPGAs with Heterogeneous LUTs,” presented at the 1998 Design Automation Conference. The process by He and Rose can take an indefinite amount of time. Cong and Xu state that their process can be performed in an amount of time proportional to …, where c is the number of LUT sizes, K …

A method that addresses the aforementioned problems, as well as other related problems, is therefore desirable.

In various embodiments, the invention comprises processes for mapping logic nodes to a plurality of sizes of lookup tables in a programmable gate array. In one embodiment, a node and its predecessor nodes are selectively collapsed into a first single node as a function of delay factors associated with the plurality of sizes of lookup tables and a maximum of delay factors associated with the predecessor nodes.
The term “collapse” is used here to mean that a plurality of original nodes and their input and output signals (where the output signal of one original node is an input signal to another original node) are replaced by a single node plus all the input signals and output signals except those that connect the original nodes. If a cut-size (number of inputs) associated with the first single node is less than or equal to the number of inputs to one of the sizes of lookup tables, that size is selected to implement the first single node. If no lookup table size was selected for the first single node, the node and its predecessor nodes are selectively collapsed into a second single node as a function of the delay factors and of the maximum delay factor increased by a selected value. If a cut-size associated with the second single node is less than or equal to the number of inputs to one of the sizes of lookup tables, that size is selected to implement the second single node.

In another embodiment, each of the plurality of sizes of lookup tables has an associated delay factor.
The process comprises:
(a) initializing a counter to a selected value;
(b) selecting one of the sizes of lookup tables;
(c) collapsing into a single node the logic node and those of the predecessor logic nodes having delay factors greater than a maximum of delay factors associated with the predecessor logic nodes plus the counter value minus the delay factor of the one size lookup table;
(d) if the single node has an associated cut-size that is less than or equal to the number of inputs to the one size lookup table, mapping to the one size lookup table the logic nodes that have been collapsed into the single node and that are within a cut of the single node;
(e) if the associated cut-size of the single node is greater than the number of inputs of the one size lookup table, selecting another one of the sizes of lookup tables to use as the one size;
(f) repeating steps c through e until the logic node is mapped or all the sizes of lookup tables have been considered in mapping;
(g) if the logic node has not been mapped to one of the sizes of lookup tables and all the sizes of lookup tables have been considered in mapping, incrementing the counter; and
(h) repeating steps b through g until the counter value equals the least of the delay factors of the sizes of lookup tables.
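The counter-based embodiment can be sketched for a single node as follows. This is a minimal Python illustration under assumptions, not the patented implementation: `LutType`, `collapse_cone`, and `map_node` are hypothetical names; delays are small integers; the counter starts at 0 and is incremented by 1; LUT sizes are tried from smallest to largest; the delay recorded for a mapped node is the largest delay among the inputs of its cut plus the delay factor of the chosen LUT; and when no size fits for any counter value, the node alone is given to the fastest LUT (as in step (i) of claim 16), which assumes the node's own fan-ins fit that LUT. The collapsed node is grown backward from the node being mapped, so it always has a single output, and, as in the definition of “collapse”, only signals running between collapsed nodes disappear.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LutType:
    inputs: int  # number of LUT inputs, e.g. 4 for a 4-LUT
    delay: int   # delay factor of this LUT size (hypothetical integer units)


def collapse_cone(node, fanins, delay, threshold):
    """Grow a single-output cone backward from `node`, absorbing every logic
    fan-in whose delay exceeds `threshold`; return (collapsed nodes, cut inputs)."""
    group, cut, stack = {node}, set(), [node]
    while stack:
        for s in fanins[stack.pop()]:
            if s in group:
                continue
            if s in fanins and delay[s] > threshold:
                group.add(s)
                stack.append(s)
            else:
                cut.add(s)  # primary input, or a logic node that is fast enough
    return group, cut


def map_node(node, fanins, delay, lut_types):
    """Return (LUT type, sorted cut, node delay) for one node.

    `delay` must already hold the delays of all logic nodes in the fan-in cone;
    names missing from `fanins` are primary inputs with delay 0.
    """
    max_delay = max((delay.get(s, 0) for s in fanins[node]), default=0)  # MAXDELAY
    d_min = min(t.delay for t in lut_types)
    for i in range(d_min):  # counter values below the least LUT delay factor
        for lut in sorted(lut_types, key=lambda t: t.inputs):
            threshold = max_delay + i - lut.delay
            _, cut = collapse_cone(node, fanins, delay, threshold)
            # Reject cuts that are too wide, and cuts that would need a primary
            # input (delay 0) collapsed because the threshold is negative.
            if len(cut) <= lut.inputs and all(delay.get(s, 0) <= threshold for s in cut):
                arrival = max((delay.get(s, 0) for s in cut), default=0) + lut.delay
                return lut, sorted(cut), arrival
    fastest = min(lut_types, key=lambda t: t.delay)
    return fastest, sorted(set(fanins[node])), max_delay + fastest.delay


if __name__ == "__main__":
    lut4, lut8 = LutType(4, 2), LutType(8, 3)
    fanins = {"n1": ["a", "b", "c", "d"], "n2": ["e", "f", "g"], "n3": ["n1", "n2", "h"]}
    delay = {}
    for n in ["n1", "n2", "n3"]:  # topological order
        lut, cut, delay[n] = map_node(n, fanins, delay, [lut4, lut8])
        print(n, lut, cut, delay[n])
```

With these hypothetical delay factors (4-LUT: 2 units, 8-LUT: 3 units), n1 and n2 fall back to 4-LUTs with delay 2. Node n3 cannot be mapped at its MAXDELAY of 2, but with the counter at 1 it collapses n1 and n2 into a single 8-LUT with delay 3, one unit better than the 4-LUT fallback.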
In yet another embodiment, a process for mapping a logic node and its predecessor logic nodes to one of a plurality of sizes of lookup tables in a programmable gate array comprises:
(a) initializing a collapse factor as a function of a maximum of respective delay factors associated with the predecessor logic nodes, wherein the collapse factor is greater than the maximum of the delay factors of the predecessor logic nodes;
(b) selecting one of the sizes of lookup tables;
(c) collapsing into a single node the logic node and the ones of the predecessor logic nodes having delay factors greater than the collapse factor minus the delay factor of the one size lookup table;
(d) if the single node has an associated cut-size that is less than or equal to the number of inputs to the one size lookup table, mapping to the one size lookup table the logic nodes collapsed into the single node that are within a cut of the single node;
(e) if the associated cut-size of the single node is greater than the number of inputs of the one size lookup table, selecting another one of the sizes of lookup tables to use as the one size; and
(f) repeating steps c through e until the logic node is mapped or all the sizes of lookup tables have been considered in mapping.
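The threshold “collapse factor minus the delay factor of the one size lookup table” can be justified by a one-line bound. Assuming a mapped node's delay is the largest delay among the inputs of its cut plus the delay factor of its LUT, and writing C for the collapse factor, d_x for the delay factor of the selected LUT size, and D(u) for the delay of node u, every predecessor above C − d_x is collapsed, so every remaining cut input lies at or below that threshold:

```latex
% C: collapse factor; d_x: delay factor of the selected LUT size; D(u): delay of node u
D(s) \le C - d_x \quad \forall\, s \in \operatorname{cut}(v)
\;\Longrightarrow\;
D(v) = \max_{s \in \operatorname{cut}(v)} D(s) + d_x \;\le\; (C - d_x) + d_x \;=\; C .
```

A successful mapping therefore yields a node no slower than C. In the counter-based embodiment C is the maximum predecessor delay plus the counter value; once the counter reaches the smallest LUT delay factor, the same target is met simply by placing the node alone in the fastest LUT (provided its fan-ins fit that LUT), which is why the search can stop there.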
In another embodiment, a process for mapping comprises:
(a) arranging the nodes to be processed in topological order such that all fan-ins for a node are processed before the node is processed;
(b) getting the next node to be processed;
(c) selecting the smallest size LUT, along with its LUT delay and cut size;
(d) collapsing into a single node the node in process and all predecessor nodes having a delay greater than the maximum delay of the predecessor nodes minus the LUT delay of the selected LUT (so that collapsing reduces delay);
(e) if the network formed by collapsing these nodes has a cut size less than or equal to the number of inputs of the selected LUT, mapping the collapsed node into the LUT and assigning the LUT delay to the collapsed node;
(f) if the cut sizes don't match and there are more LUT sizes available, selecting a larger LUT size (with new LUT delay and cut size);
(g) repeating steps d, e, and f until all LUT sizes have been tried or the node is mapped;
(h) if mapping did not occur for any of the LUT sizes when all those predecessor nodes having delay greater than MAXDELAY (the maximum delay of the predecessor nodes) minus LUTxDELAY (the LUT delay of the selected LUT) were collapsed into a single node, then increasing the delay by a factor i and repeating steps c through g until the factor i is equal to the delay of the fastest LUT;
(i) selecting the fastest LUT and assigning the node to the selected LUT; and
(j) repeating the process for all nodes.

The above summary of the present invention is not intended to describe each disclosed embodiment of the present invention. The figures and detailed description that follow provide additional example embodiments and aspects of the present invention.

Various aspects and advantages of the invention will become apparent upon review of the following detailed description and upon reference to the drawings in which:
FIG. 1 is a graph of an example network of nodes prior to mapping to LUTs of a programmable gate array;
FIG. 2 is a graph of a network after mapping the nodes to respective 4-LUTs;
FIG. 3 is a graph of the network after mapping the nodes to two 8-LUTs;
FIG. 4 is a graph of the network after mapping the nodes to an 8-LUT and a 4-LUT;
FIG. 5 is a flowchart of a process for mapping a network of nodes to LUTs having various sizes in accordance with an example embodiment of the invention;
FIG. 6 (comprising FIGS. 6A and 6B) is a flowchart of an example process for determining the minimum delays of nodes in a network as implemented with various sizes of LUTs;
FIG. 7A is a graph of an example network having a plurality of logic nodes;
FIG. 7B illustrates a plurality of nodes collapsed into a single node;
FIG. 8A is a graph of an example network of logic nodes;
FIG. 8B illustrates how a node of FIG. 8A cannot be combined with its predecessor nodes to form a node having a delay that is equal to the maximum delay of its fan-ins;
FIG. 8C is a graph that illustrates the process of FIG. 6;
FIG. 8D is a graph that further illustrates the process of FIG. 6;
FIG. 8E is a graph that further illustrates the process of FIG. 6;
FIG. 9 is a flowchart of an example process for packing LUT mappings; and
FIG. 10 is a graph of an example network, initially mapped to 4-LUTs, and subsequently packed into 8-LUTs.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the detailed description is not intended to limit the invention to the particular forms disclosed. On the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

The present invention is believed to be applicable to a variety of programmable gate arrays having LUTs of various sizes and delays.
For example, the present invention has been found to be particularly applicable and beneficial for programmable gate arrays having 4-input and 8-input LUTs. While the present invention is not so limited, an appreciation of the present invention is presented by way of specific examples, in this instance a programmable gate array having 4-input and 8-input LUTs. A LUT having 4 inputs will be referred to as a “4-LUT,” a LUT having 8 inputs will be referred to as an “8-LUT,” and a LUT having x inputs will be referred to as an “x-LUT.”

FIG. 1 is a graph of an example network … While not shown, it will be appreciated that the inputs a, b, c, d, e, f, g, h, and i and the output … FIGS. 2, …

FIG. 2 is a graph of the network … The combined delay factors of the 4-LUTs …

FIG. 3 is a graph of the network … The overall delay of the network … As compared to the mapping of FIG. 2, there is a difference of 1 unit between the overall delay of the 4-LUTs …

FIG. 4 is a graph of the network … As compared to the mappings of FIGS. 2 and 3, the mapping of FIG. 4 provides an overall delay that is less than the delay of the mapping of FIG. …

FIG. 5 is a flowchart of a process for mapping a network of nodes to LUTs having various sizes in accordance with an example embodiment of the invention. The process generally proceeds in three phases, wherein phase … In phase … At the end of phase … In phase … Phase …

FIG. 6 (comprising FIGS. 6A and 6B) is a flowchart of an example process for determining the minimum delays of nodes in a network as implemented with various sizes of LUTs. The objective of the process for determining the minimum delays of the nodes in the network is to find the smallest possible delay of a given node in the network when the node and its predecessor nodes are mapped to the various sizes of LUTs.
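Finding the smallest possible delay of a node requires the delays of all of its fan-ins, which is why the nodes are processed in topological order (step (a) of claim 16). The outer loop can be sketched in Python as below, assuming a hypothetical `label_network` function that accepts a pluggable per-node routine `node_delay(node, fanins, delay)` returning the node's minimum delay from the delays computed so far; the standard library's `graphlib.TopologicalSorter` supplies the order and raises `graphlib.CycleError` on a cyclic network. The `unit_delay` routine is only a stand-in that shows the ordering; a heterogeneous-LUT routine following FIG. 6 would be passed in its place.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

Fanins = Dict[str, List[str]]
NodeDelay = Callable[[str, Fanins, Dict[str, int]], int]


def label_network(fanins: Fanins, node_delay: NodeDelay) -> Dict[str, int]:
    """Give every logic node its minimum delay, visiting fan-ins before the node."""
    delay: Dict[str, int] = {}
    # TopologicalSorter expects node -> predecessors, which is exactly the fan-in map;
    # primary inputs appear only as predecessors and keep an implicit delay of 0.
    for node in TopologicalSorter(fanins).static_order():
        if node in fanins:
            delay[node] = node_delay(node, fanins, delay)
    return delay


def unit_delay(node: str, fanins: Fanins, delay: Dict[str, int]) -> int:
    """Stand-in routine: every node costs one level of logic."""
    return 1 + max((delay.get(s, 0) for s in fanins[node]), default=0)


if __name__ == "__main__":
    net = {"n1": ["a", "b"], "n2": ["n1", "c"], "n3": ["n1", "n2"]}
    print(label_network(net, unit_delay))
```

Passing the routine as a parameter keeps the ordering logic separate from the LUT-size search, so the same loop serves any per-node delay model.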
In processing a particular node, the minimum delay process generally first checks whether the node, when combined with selected ones of its predecessor nodes, or “fan-ins,” can have a delay factor equal to the maximum delay of its fan-ins. If the node cannot be combined as indicated, the process then checks whether the node, when combined with selected ones of its fan-ins, can have a delay equal to the maximum delay of its fan-ins plus a selected value. If the second combination is not possible, the LUT size having the minimum delay factor is selected. The process of FIG. 6 is described in conjunction with the example graphs of FIGS. 7A-7B and … At step … FIG. 7A is a graph of an example network having nodes … If, for example, a programmable gate array architecture has available 4-LUTs and 8-LUTs, step … At step … The dashed oval … Decision step … After collapsing, the network consists of a single node that subsumes nodes … If there are more nodes to process, step … Steps … Step …

FIGS. 8B and 8C illustrate how node … FIG. 8C illustrates an iteration through steps … Step … Beginning with a 4-LUT, dashed oval … As shown by dashed oval … Decision step … If decision step … If a node cannot be combined with selected ones of its fan-ins such that the combination is either MAXDELAY (steps …

FIG. 9 is a flowchart of a process for packing the LUT mappings established in steps … The critical limit is determined by the “slack” value. Initially, each output LUT is assigned a slack value that is equal to the difference between the maximum allowed delay at any of the output LUTs and the actual delay of the respective output LUT, as shown by step … Each LUT in the network is processed in the order described above. Step … Step …

slack(LUT F) = slack(LUT A) + delay(LUT A) − delay(LUT F) − LUTXDELAY.

The process continues as long as there are more LUTs, as shown by decision step … At the end of the packing process, the delay of the critical path remains unchanged (since the critical path has a slack of zero).
However, LUTs on non-critical paths may have been re-mapped to LUTs of different sizes, packed with predecessor or successor LUTs, and assigned larger delays, wherein no path is made to have a delay that exceeds the delay of the critical path. The packing process is expected to result in a network having fewer LUTs and a smaller area, although the fanout of the packed LUTs may cause some LUTs to be duplicated.

FIG. 10 is a graph of an example network …

Accordingly, the present invention provides, among other aspects, a method for mapping combinational logic to multiple sizes of LUTs in programmable logic devices. Other aspects and embodiments of the present invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. It is intended that the specification and illustrated embodiments be considered as examples only, with a true scope and spirit of the invention being indicated by the following claims.