US 20010003843 A1 Abstract A system for determining an affinity associated with relocating a cell located on a surface of a semiconductor chip to a different location on the surface is disclosed herein. Each cell may be part of a cell net containing multiple cells. The system initially defines a bounding box containing all cells in the net which contains the cell. The system then establishes a penalty vector based on the bounding box and borders of a region containing the cell, computes a normalized sum of penalties for all nets having the cell as a member, and calculates the affinity based on the normalized sum of penalties. Also included in the disclosed system are methods and apparatus for capacity and utilization planning of the floor, or surface area, and methods and apparatus for parallelizing the process of affinity based placement using multiple processors. Finally, a method and apparatus for connecting the cells based on a Steiner Tree method are disclosed.
Claims(21) 1. A method for locating a plurality of elements on a surface, said method comprising the steps of:
assigning the elements to portions of the surface; preplacing the elements onto the surface; repositioning the elements depending on relative affinities of the elements to each other; and connecting the elements on the surface. 2. A method according to claim 1 3. A method according to claim 1 4. A method according to claim 3 partitioning the surface into a grid comprising a plurality of regions; defining pieces, each piece comprising at least one of said regions, and each piece having a capacity; allocating said capacity of each of said pieces to predefined groups of the elements; and reallocating said capacity of said pieces to said groups of the elements. 5. A method according to claim 1 6. A method according to claim 5 7. A method according to claim 5 dividing the surface into a plurality of regions; assigning non-adjacent regions to said processors, each processor determining affinities of the elements of its assigned region; and repositioning the elements. 8. A method according to claim 5 assigning the elements to said processors, each processor determining affinities of its assigned elements; and repositioning the elements. 9. A method according to claim 1 partitioning the elements into a plurality of sets, each set having at least a predetermined number of elements; constructing a minimal spanning tree having vertices and edges, said vertices of said spanning tree representing the elements and said sets; and connecting the elements per said edges of said minimal spanning tree. 10. A computer-implemented method for locating a plurality of elements on a surface, said method comprising the steps of:
forming a neighborhood defined as a set of the elements; ordering the elements within each said neighborhood according to their relative distance from said target element; preplacing the elements within a two-dimensional abstraction of the surface; iteratively subdividing the surface into a plurality of regions; assigning the elements to said plurality of regions; calculating affinities of the elements using a plurality of processors; moving the elements based on affinities of the elements; levelizing element density over the surface based on relationships between the elements; relocating any overlapping elements; and performing a final cell adjustment for element positions. 11. A computer-implemented method according to claim 10 dividing the surface into a plurality of regions; and assigning non-adjacent regions to each of said plurality of processors to place the elements onto said regions simultaneously. 12. A computer-implemented method according to claim 10 assigning the elements to each of said plurality processors; and determining the element placements by simultaneously operating said plurality of processors. 13. A computer-implemented method according to claim 10 14. An apparatus for placing a plurality of elements on a surface, said apparatus comprising:
a processor; memory connected to said processor; said memory having instructions for said processor to assign the elements to portions of the surface; to preplace the elements onto the surface; to reposition the elements depending on relative affinities of the elements to each other; and to connect the elements on the surface. 15. An apparatus according to claim 8 16. An apparatus according to claim 9 17. An apparatus according to claim 8 18. An apparatus according to claim 8 19. An apparatus according to claim 8 20. A machine-readable storage medium containing instructions for a processor, said instructions comprising the steps for locating a plurality of elements on a surface and comprising the steps of:
assigning the elements to portions of the surface; preplacing the elements onto the surface; repositioning the elements depending on relative affinities of the elements to each other; and connecting the elements on the surface. 21. A storage medium according to claim 5 Description [0001] This is a continuation-in-part of co-pending application Ser. No. 08/672,535, filed Jun. 28, 1996. [0002] 1. Field of the Invention [0003] The present invention generally relates to the art of microelectronic integrated circuit layout, and more specifically to the art of placement and routing of cells on integrated circuit chips. [0004] 2. Description of Related Art [0005] a. Introduction [0006] Microelectronic integrated circuits consist of a large number of electronic components which are fabricated by layering several different materials on a silicon base or wafer. The design of an integrated circuit transforms a circuit description into a geometric description which is known as a layout. A layout consists of a set of planar geometric shapes in the various layers of the silicon chip. [0007] The process of converting the specifications of an electrical circuit into a layout is called the physical design. Physical design requires arranging elements, wires, and predefined cells on a fixed area, and the process can be tedious, time consuming, and prone to many errors due to tight tolerance requirements and the minuteness of the individual components. [0008] Currently, the minimum geometric feature size of a component is on the order of 0.5 microns. Feature size may be reduced to 0.1 micron within several years. This small feature size allows fabrication of as many as 10 million transistors or approximately 1 million gates of logic on a 25 millimeter by 25 millimeter chip. This feature size decrease/transistor increase trend is expected to continue, with even smaller feature geometries and more circuit elements on an integrated circuit. 
Larger chip sizes will allow far greater numbers of circuit elements. [0009] Due to the large number of components and the exacting details required by the fabrication process, physical design is not practical without the aid of computers. As a result, most phases of physical design extensively use Computer Aided Design (CAD) tools, and many phases have already been partially or fully automated. Automation of the physical design process has increased the level of integration, reduced turn around time and enhanced chip performance. [0010] The object of physical chip design is to determine an optimal arrangement of devices in a plane and to find an efficient interconnection or routing scheme between the devices to obtain the desired functionality. Since space on the chip surface is at a premium, algorithms must use the space very efficiently to lower costs and improve yield. The arrangement of individual cells in an integrated circuit chip is known as a cell placement. [0011] Each microelectronic circuit device or cell includes a plurality of pins or terminals, each of which is connected to pins of other cells by a respective electrical interconnect wire network or net. A goal of the optimization process is to determine a cell placement such that all of the required interconnects can be made, and the total wirelength and interconnect congestion are minimized. [0012] Prior art methods for achieving this goal comprise generating one or more initial placements, modifying the placements using optimization methodologies including genetic algorithms such as simulated evolution, force directed placement or simulated annealing, described hereinbelow, and comparing the resulting placements using a cost criteria. [0013] Depending on the input, placement algorithms are classified into two major groups, constructive placement and iterative improvement methods. The input to the constructive placement algorithms consists of a set of blocks along with the netlist. 
The algorithm provides locations for the blocks. Iterative improvement algorithms start with an initial placement. These algorithms modify the initial placement in search of a better placement. The algorithms are applied in a recursive or an iterative manner until no further improvement is possible, or the solution is considered to be satisfactory based on a predetermined criterion. [0014] Iterative algorithms can be divided into three general classifications: simulated annealing, simulated evolution and force directed placement. The simulated annealing algorithm simulates the annealing process that is used to temper metals. Simulated evolution simulates the biological process of evolution, while the force directed placement simulates a system of bodies attached by springs. [0015] Assuming that a number N of cells are to be optimally arranged and routed on an integrated circuit chip, the number of different ways that the cells can be arranged on the chip, or the number of permutations, is equal to N! (N factorial). In the following description, each arrangement of cells will be referred to as a placement. In a practical integrated circuit chip, the number of cells can be hundreds of thousands or millions. Thus, the number of possible placements is extremely large. [0016] Iterative algorithms function by generating large numbers of possible placements and comparing them in accordance with some criterion which is generally referred to as fitness. The fitness of a placement can be measured in a number of different ways, for example, overall chip size. A small size is associated with a high fitness and vice versa. Another measure of fitness is the total wire length of the integrated circuit. A high total wire length indicates low fitness and vice versa.
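As an illustration of total wire length as a measure of fitness, the following is a minimal sketch. It assumes that the wire length of each net is estimated by the half-perimeter of the bounding box enclosing its cells; the text above does not state how wire length is estimated, so this estimator is an assumption. All names (`Placement`, `Netlist`, `net_half_perimeter`, `total_wirelength`) are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representations, chosen only for this sketch.
Placement = Dict[str, Tuple[float, float]]  # cell name -> (x, y) position
Netlist = List[List[str]]                   # each net is a list of cell names


def net_half_perimeter(net: List[str], placement: Placement) -> float:
    """Half-perimeter of the bounding box enclosing all cells of one net."""
    xs = [placement[cell][0] for cell in net]
    ys = [placement[cell][1] for cell in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(nets: Netlist, placement: Placement) -> float:
    """Estimated total wire length: the sum of the per-net half-perimeters."""
    return sum(net_half_perimeter(net, placement) for net in nets)


# Two placements of the same four cells and two nets.
nets = [["a", "b"], ["b", "c", "d"]]
spread = {"a": (0, 0), "b": (4, 0), "c": (4, 3), "d": (0, 3)}
compact = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}

print(total_wirelength(nets, spread))   # larger estimate, lower fitness
print(total_wirelength(nets, compact))  # smaller estimate, higher fitness
```

Under this estimate the compact placement has the lower total wire length and would therefore be ranked as the fitter of the two.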
[0017] The relative desirability of various placement configurations can alternatively be expressed in terms of cost, which can be considered as the inverse of fitness, with high cost corresponding to low fitness and vice versa. [0018] b. Simulated Annealing [0019] Basic simulated annealing per se is well known in the art and has been successfully used in many phases of VLSI physical design such as circuit partitioning. Simulated annealing is used in placement as an iterative improvement algorithm. Given a placement configuration, a change to that configuration is made by moving a component or interchanging locations of two components. Such interchange can be alternatively expressed as transposition or swapping. [0020] In the case of a simple pairwise interchange algorithm, it is possible that a configuration achieved has a cost higher than that of the optimum, but no single interchange can cause further cost reduction. In such a situation, the algorithm is trapped at a local optimum and cannot proceed further. This happens quite often when the algorithm is used in practical applications. Simulated annealing helps to avoid becoming trapped at a local optimum by occasionally accepting moves that result in a cost increase. [0021] In simulated annealing, all moves that result in a decrease in cost are accepted. Moves that result in an increase in cost are accepted with a probability that decreases over time as the iterations proceed. The analogy to the actual annealing process is heightened with the use of a parameter called temperature T. This parameter controls the probability of accepting moves that result in increased cost. [0022] More of such moves are accepted at higher values of temperature than at lower values. The algorithm starts with a very high value of temperature that gradually decreases so that moves that increase cost have a progressively lower probability of being accepted.
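The acceptance rule of paragraphs [0021] and [0022] can be sketched as follows. The text does not give a formula for the acceptance probability; the sketch assumes the conventional Metropolis form exp(-ΔC/T), and it also assumes that cost-neutral moves are accepted. The function name `accept_move` and its parameters are hypothetical.

```python
import math
import random


def accept_move(delta_cost: float, temperature: float, rng: random.Random) -> bool:
    """Decide whether to accept a move that changes the cost by delta_cost."""
    # Moves that decrease the cost are always accepted.
    # Assumption: cost-neutral moves (delta_cost == 0) are accepted as well.
    if delta_cost <= 0:
        return True
    # At zero temperature only cost-reducing moves remain acceptable.
    if temperature <= 0:
        return False
    # Assumed Metropolis criterion: the acceptance probability of a
    # cost-increasing move shrinks as the temperature is lowered.
    return rng.random() < math.exp(-delta_cost / temperature)


rng = random.Random(0)
# Count how often the same cost-increasing move is accepted when hot and when cold.
hot = sum(accept_move(5.0, 100.0, rng) for _ in range(1000))
cold = sum(accept_move(5.0, 0.5, rng) for _ in range(1000))
print(hot, cold)
```

With these assumed values, the move is accepted in most trials at the high temperature and almost never at the low one, which is the behavior the text describes.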
Finally, the temperature reduces to a very low value at which only moves that reduce cost are accepted. In this way, the algorithm converges to an optimal or near optimal configuration. [0023] In each stage, the placement is shuffled randomly to get a new placement. This random shuffling could be achieved by transposing a cell to a random location, a transposition of two cells, or any other move that can change the wire length or other cost criteria. After the shuffle, the change in cost is evaluated. If there is a decrease in cost, the configuration is accepted. Otherwise, the new configuration is accepted with a probability that depends on the temperature. [0024] The temperature is then lowered using some function which, for example, could be exponential in nature. The process is stopped when the temperature has dropped to a certain level. A number of variations and improvements on the basic simulated annealing algorithm have been developed. An example is described in an article entitled “Timberwolf 3.2: A New Standard Cell Placement and Global Routing Package” by Carl Sechen, et al., IEEE 23rd Design Automation Conference paper 26.1, pages 432 to 439. [0025] c. Simulated Evolution [0026] Simulated evolution, which is also known as the genetic algorithm, is analogous to the natural process of mutation of species as they evolve to better adapt to their environment. The algorithm starts with an initial set of placement configurations which is called the population. The initial placement can be generated randomly. Each individual in the population represents a feasible placement to the optimization problem and is actually represented by a string of symbols. [0027] The symbols used in the solution string are called genes. A solution string made up of genes is called a chromosome. A schema is a set of genes that make up a partial solution. The simulated evolution or genetic algorithm is iterated, and each iteration is called a generation.
During each iteration, the individual placements of the population are evaluated on the basis of fitness or cost. Two individual placements among the population are selected as parents, with probabilities based on their fitness. A better fitness for an individual placement increases the probability that the placement will be chosen. [0028] The genetic operators, called crossover, mutation and inversion, which are analogous to their counterparts in the evolution process, are applied to the parents to combine genes from each parent to generate a new individual called the offspring or child. The offspring are evaluated, and a new generation is formed by including some of the parents and the offspring on the basis of their fitness in a manner such that the size of the population remains the same. As the tendency is to select high fitness individuals to generate offspring, and the weak individuals are deleted, the next generation tends to have individuals that have good fitness. [0029] The fitness of the entire population improves with successive generations. Consequently, overall placement quality improves over iterations. At the same time, some low fitness individual cell placements are reproduced from previous generations to maintain diversity even though the probability of doing so is quite low. In this way, it is assured that the algorithm does not lock into a local optimum. [0030] The first main operator of the genetic algorithm is crossover, which generates offspring by combining schemata of two individuals at a time. Combining schemata entails choosing a random cut point and generating the offspring by combining the left segment of one parent with the right segment of the other. However, after doing so, some cells may be duplicated while other cells are deleted. This problem will be described in detail below.
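The duplication and deletion of cells by crossover can be illustrated with a minimal sketch. It assumes that a chromosome is a list in which position i holds the name of the cell assigned to slot i; this encoding is an assumption, as the text only says that a placement is represented by a string of symbols. The cut point is passed in explicitly rather than drawn at random so that the example is reproducible. All names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def single_point_crossover(parent_a: List[str], parent_b: List[str], cut: int) -> List[str]:
    """Join the left segment of parent_a with the right segment of parent_b."""
    return parent_a[:cut] + parent_b[cut:]


def duplicated_and_missing(child: List[str], cells: List[str]) -> Tuple[List[str], List[str]]:
    """Report which cells occur more than once in the child and which are absent."""
    counts = Counter(child)
    duplicated = sorted(cell for cell in cells if counts[cell] > 1)
    missing = sorted(cell for cell in cells if counts[cell] == 0)
    return duplicated, missing


cells = ["c1", "c2", "c3", "c4", "c5"]
parent_a = ["c1", "c2", "c3", "c4", "c5"]  # slot i holds parent_a[i]
parent_b = ["c3", "c5", "c1", "c2", "c4"]

child = single_point_crossover(parent_a, parent_b, cut=2)
duplicated, missing = duplicated_and_missing(child, cells)
print(child)       # ['c1', 'c2', 'c1', 'c2', 'c4']
print(duplicated)  # ['c1', 'c2']
print(missing)     # ['c3', 'c5']
```

The child is not a valid placement: cells c1 and c2 each occupy two slots while c3 and c5 are not placed at all, so some repair step is needed after a plain cut-and-join crossover.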
[0031] The amount of crossover is controlled by the crossover rate, which is defined as the ratio of the number of offspring produced by crossing in each generation to the population size. Crossover attempts to create offspring with fitness higher than either parent by combining the best genes from each. [0032] Mutation creates incremental random changes. The most commonly used mutation is pairwise interchange or transposition. This is the process by which new genes that did not exist in the original generation, or have been lost, can be generated. [0033] The mutation rate is defined as the ratio of the number of offspring produced by mutation in each generation to the population size. It must be carefully chosen because while it can introduce more useful genes, most mutations are harmful and reduce fitness. The primary application of mutation is to pull the algorithm out of local optima. Inversion is an operator that changes the representation of a placement without actually changing the placement itself so that an offspring is more likely to inherit certain schema from one parent. [0034] After the offspring are generated, individual placements for the next generation are chosen based on some criteria. Numerous selection criteria are available, such as total chip size and wire length as described above. In competitive selection, all the parents and offspring compete with each other, and the fittest placements are selected so that the population remains constant. In random selection, the placements for the next generation are randomly selected so that the population remains constant. [0035] The latter criteria is often advantageous considering the fact that by selecting the fittest individuals, the population converges to individuals that share the same genes and the search may not converge to an optimum. However, if the individuals are chosen randomly there is no way to gain improvement from an older generation to a new generation. 
By combining both methods, stochastic selection chooses individuals with probabilities based on the fitness of each individual. [0036] d. Force Directed Placement [0037] Force directed placement exploits the similarity between the placement problem and the classical mechanics problem of a system of bodies attached to springs. In this method, the blocks connected to each other by nets are supposed to exert attractive forces on each other. The magnitude of this force is directly proportional to the distance between the blocks. Additional proportionality is achieved by connecting more “springs” between blocks that “talk” to each other more (volume, frequency, etc.) and fewer “springs” where less extensive communication occurs between each block. [0038] According to Hooke's Law, the force exerted due to the stretching of the springs is proportional to the distance between the bodies connected to the spring. If the bodies are allowed to move freely, they would move in the direction of the force until the system achieved equilibrium. The same idea is used for placing the cells. The final configuration of the placement of cells is the one in which the system achieves a solution that is closest to actual equilibrium. [0039] e. Parallel Processing Technique 1 [0040] Because of the large number of possible placements, computerized implementation of the placement algorithms discussed above can take many days. In addition, the placement algorithm may need to be repeated with different parameters or different initial arrangements to improve the results. [0041] To reduce the time required to place the cells optimally, multiple processors have been used to speed up the process. In such implementations, multiple processors operate simultaneously to place the cells optimally on the integrated chip. However, such prior efforts to reduce the placement time by parallel processing of the placement methods have been impeded by three obstacles.
[0042] First, multiple processors may conflict with each other. This occurs where an area on the chip, which is being processed by one processor, is affected by movements of one or more cells into the area by another processor. When this occurs, one of the two conflicting processors must wait for the other to finish or postpone its own move for later. The area-conflict problem not only lessens the advantage of multiprocessing, but also increases the processing overhead encountered. This is because, before moving a cell, each of the processors must check for area-conflicts with all other processors. As the number of processors increases, the area-conflicts increase rapidly to negate the advantage of multiprocessing, such that the time required to place the cells is increased. [0043] Second, the optimization process can become trapped in a local optimum. To eliminate the area-conflict problem, some systems have assigned particular core areas to each of the processors with the restriction that each of the processors only operate within its assigned area. After processing cells of the assigned areas, the processors are then assigned to different areas, and so on. Although this method eliminates area-conflicts, it limits the movements of the cells to the area assigned to the processor. The limitation on the movement of the cells increases the likelihood of the placement becoming stuck at a local optimum. In the case of a pairwise interchange algorithm, it is possible that a configuration achieved is at a local optimum such that any further exchange within the limited area will not result in a further reduction in cost. In such a situation, the algorithm is trapped at the local optimum and does not proceed further. 
This happens frequently when the algorithm is used in practical applications, and the extent of the local optimum problem increases as additional processors are added because the increase in the number of processors operating simultaneously reduces the area assigned to each of the processors. Decreases in the area assigned to each of the processors lead to corresponding decreases in the distances the cells of the areas may be moved to improve the optimization. [0044] Third, if multiple processors are used simultaneously to place the cells of an integrated chip, it is possible for the processors to deadlock. This occurs where each of the processors has halted its operation while waiting for another processor to complete its operations. In this situation, all processing is stopped and the system halts. An example of deadlock is where processor P [0045] In short, because of the ever-increasing number of cells on an integrated chip (currently at millions of cells on a chip), and the resulting increase in the number of possible placements of the cells on the chip, a computer is used to find an optimal layout of the cells on the chip. Even with the aid of computers, existing methods can take several days to place a large number of cells, and these methods may need to be repeated with different parameters or different initial arrangements. To decrease the time required to place the chip, multiple processors have been used to perform the placement of the cells. However, the use of multiple processors has led to area-conflicts, local optimum problems, and potential deadlock situations, negating the advantages of using the multiple processors. [0046] f. Parallel Processing Technique 2 [0047] As an alternative to the Parallel Processing Technique 1 discussed above, another technique to implement parallel processing of cell placement algorithms is described below.
[0048] The problems associated with the prior art parallelization techniques of assigning regions to multiple processors is illustrated using FIG. 43. The figure illustrates a grossly simplified integrated circuit chip (IC) with four nets [0049] The first problem is the crossover net problem. If the regions are divided such that crossover nets are created, then the effectiveness of the parallel processing technique is reduced. This is because none of the processors which share the crossover nets can accurately calculate the position of the (which is always the basis for the decision about the cell move) because the other processor may move its cell during the calculation. Naturally, as the number of processors increases, the number of crossover nets increases, aggravating the problem. A large number of crossover nets can be fatal for the convergence of cell placement algorithms. For example, in FIG. 43, nets [0050] Second, cell movements from one region (or processor) to another creates communications overhead which may negate the advantages of multiple processor cell placement technique. Each time a cell is moved from one region to another, the processor moving the cell from its assigned region must communicate with the processor receiving the cell to its assigned region. The communication requirement complicates the implementation of cell placement algorithms and slows down both of the communicating processors. As the number of processors, the number of cells, or the number of required cell moves increase, the communication overhead increases. In particular, the performance of the parallel processing technique is especially poor if the spring density levelization method is used as the cell placement algorithm because the algorithm tends to make global cell moves. [0051] Third, to minimize crossover nets and communications overheads, the prior art parallelization techniques typically require a “good” preplacement of the cells on the chip. 
That is, in order to operate effectively, the prior art methods require the nets to be within a single region and the cells of the nets to be “close” to each other. The best way to achieve this is to increase the region size and decrease the number of processors running in parallel. However, the increase in the region size and the decrease in the number of parallel processors defeat the purpose of parallelizing the cell placement algorithm. Moreover, even with such preplacement of cells, there are generally still many crossover nets. [0052] In order to avoid the problems associated with crossover nets, regions have to be made larger. Use of large regions has the disadvantage that it limits the number of processors that can be used. In fact, if the entire integrated chip is defined as one region, and only one processor is assigned to place the cells of the chip, then there would be no crossover net problems or communications overhead; but there also is no parallel processing, and the cell placement becomes a sequential process. Finally, the prior art technique of assigning regions of the IC to each of the multiple processors leads to the problem of unbalanced work load. Because each of the regions may contain a varying number of nets, cells, or cells requiring further movements, it is difficult to assign regions to the processors so as to assign an equal amount of work to each of the processors. Consequently, some processors finish the placement of the cells of their assigned regions more quickly than other processors, reducing the effectiveness of parallelization of the placement algorithm. [0053] In short, multiple processors have been used to implement cell placement algorithms by assigning regions of the IC to each of the processors.
However, this technique has led to crossover net conflicts, interprocessor communication problems, cell preplacement requirements, and uneven distribution of work problems, negating the advantages of using the multiple processors. [0054] g. Floor Plan Optimization [0055] The cost or the desirability of various placement configurations can be measured using other methods such as capacity distribution and utilization ratio. Capacity distribution and utilization ratios measure the placement of the cells for each of the functional blocks of the integrated circuit. An integrated circuit is designed with various functional blocks, or functions, which, operating together, achieve the desired operation. [0056] Each of the functions of the circuit is implemented by a plurality of cells and is assigned a portion of the core space upon which the cells are placed. For example, an integrated circuit design may require the use of a central processor unit (CPU) function, memory function, and some type of input/output (I/O) function. [0057] In this Subsection, Subsection 1c-b, Section 3B and in the corresponding claims of this document, the terms and phrases “core,” “core space,” “core area,” “floor,” “floor space,” and “integrated circuit,” will be used interchangeably to refer to the area of the integrated circuit upon which cells are placed to implement various functions of the integrated circuit. [0058] The capacity is the maximum amount of cells which can be placed on the core space or any portion of the core space and is usually measured in cell height units. Provided that the entire core space has sufficient capacity, it is often desirable to place the cells on the core space with a certain capacity distribution. For instance, it may be desirable that the cells of the integrated circuit be distributed evenly throughout the chip to avoid high concentration of the cells in a small location with a low concentration of the cells for the rest of the core space.
On the other hand, it may be desirable to implement certain functions of the chip on a small portion of the core space with a high concentration of the cells. In sum, a predetermined capacity distribution of the core space or for any function assigned to a portion of the core space may be one of the requirements of the cell placement. [0059] A closely related concept is the utilization of the space. The utilization is the ratio of the amount of the actual core space use within a predefined portion of the core space to the capacity of the core space for the predefined portion of the core space. For example, if a portion of the core space assigned to a function has a capacity of 100,000 cell height units, and the cells to implement the function uses 50,000 cell height units, then the utilization of the portion of the core space is 50 percent. [0060] The capacity distribution or the utilization ratio for each of the functions of the integrated circuit or for the entire core space may be predetermined as an engineering parameter based on such factors as heat dissipation, power management, manufacturing constraints, etc. [0061] The current methods of optimally placing the cells on the integrated circuit involve (1) assigning functions to be implemented to portions of the integrated circuit; (2) placing the cells of each of the functions onto the assigned portion of the integrated circuit using a placement algorithm; (3) calculating the capacity distribution of the integrated circuit and the utilization rate of each portion of the integrated circuit used to implement its function; and (4) iterating the first three steps to obtain a better placement in terms of capacity distribution or utilization. [0062] The disadvantages of the current process involve time and accuracy. 
Because the placement process requires manual iteration between floor planning tools (to calculate and evaluate capacity and utilization) and placement tools (to newly place the cells onto the core), the optimal placement process takes a long time. Also, it is difficult to manually optimize many different parameters simultaneously because, at each iteration, the operator has to simultaneously consider many parameters: overall capacity, capacity distribution, overall utilization, utilization of each function, utilization distribution, overlap size among functions, aspect ratio of functions, etc. Even for highly experienced professionals, the simultaneous consideration of all of the parameters for an optimal cell placement is an extremely difficult process. Further, the complexity of the cell placement process is continually increasing as the number of functions and the number of cells on integrated chips increase, rendering manual analysis techniques nearly impossible to perform. [0063] In short, because of the ever-increasing complexity of integrated circuit chips and the number of cells required to implement the functions of the complex designs, the manual placement optimization methods are fast becoming obsolete. The manual floor planning and cell placement optimization process requires an inordinate amount of time because the process requires manual iteration between running floor plan tools and placement tools. In addition, it is extremely difficult, at best, for human beings to simultaneously optimize several parameters (function utilization, overlap size among functions, aspect ratios of functions, etc.). [0064] h. Net Routing [0065] Each microelectronic circuit device or cell includes a plurality of pins or terminals, each of which is connected to pins of other cells by a respective electrical interconnection wire network, or net.
A purpose of the optimization process used in the physical design stage is to determine a cell placement such that all of the required interconnections can be made, but total wirelength and interconnection congestion are minimized. The process of determining the interconnections of already placed cells of an integrated circuit is called routing. [0066] Assuming that a number N of cells are to be optimally arranged and routed on an integrated circuit chip, the number of different ways that the cells can be arranged on the chip, or the number of permutations, is equal to N! (N factorial). In addition, each of the cells may require multiple connection points (or pins), each of which, in turn, may require connections to multiple pins of multiple cells. The possible routing permutations are even larger than the possible cell placements by many orders of magnitude. [0067] Because of the large number of possible placements and routing permutations, even computerized implementation of the placement algorithms discussed above can take many days. In addition, the placement and routing algorithms may need to be repeated with different parameters or different initial arrangements to improve the results. [0068] To reduce the time required to optimally route the nets, multiple processors have been used to speed up the process. In such implementations, multiple processors are assigned to different areas of the chip to simultaneously route the nets in its assigned areas. However, it has been difficult to evenly distribute the amount of routing required from each of the multiple processors. In fact, due to the nonlinear algorithm complexity, the obvious, always assumed parallelization which is to split the nets among the processors does not work because routing of one highest fanout net can take much longer than routing of all other nets of the integrated circuit. 
Such unbalanced parallelization of the routing function has been the norm in the art, leading to ineffective use of parallel processing power. [0069] In short, because of the ever-increasing number of cells on an integrated chip (currently at millions of cells on a chip), and the resulting increase in the number of possible routings of the cells and the nets on the chips, multiple processors are used to simultaneously route the nets of an integrated chip. However, even with the aid of computers, existing methods can take several days, and the addition of processors may not decrease the required time because of the difficulties of balancing the amount of work between the processors. [0070] i. Other Considerations [0071] The problem of cell placement is compounded by external requirements specific to each individual integrated circuit chip. In conventional chip design, the positions of certain “unmovable” cells (external interconnect terminals or pads, large “megacells” etc.) are fixed a priori by the designer. Given those fixed positions, the rest of the cells are then placed on the chip. Since the unmovable cells and pads are located or placed before the placement for the rest of the cells of the chip has been decided on, it is unlikely that the chosen positions will be optimal. [0072] In this manner, a number of regions, which may have different sizes and shapes, are defined on the chip for placement of the rest of the cells. [0073] It is desirable to assign individual microelectronic devices or cells to the regions, or “partition” the placement such that the total interconnect wirelength is minimized. [0074] However, methodologies for accomplishing this goal efficiently have not been proposed heretofore. [0075] The general partitioning methodology is to hierarchically partition a large circuit into a group of smaller sub-circuits until each sub-circuit is small enough to be designed efficiently. 
Because the quality of the design may suffer due to the partitioning, the partitioning of a circuit requires care and precision. [0076] One of the most common objectives of partitioning is to minimize the cutsize which is defined as a number of nets crossing a cut. Also the number of partitions often appears as a constraint with upper and lower bounds. At chip level, the number of partitions is determined, in part, by the capability of the placement algorithm. [0077] The prior art accomplishes partitioning by means of a series of “bipartitioning” problems, in which a decision is made to assign a component to one of two regions. Each component is hierarchically bipartitioned until the desired number of components is achieved. [0078] Numerous alternate methodologies for cell placement and assignment are known in the art. These include quadratic optimization as disclosed in an article entitled “GORDIAN: VLSI Placement by Quadratic Programming and Slicing Optimization”, by J. Kleinhans et al, IEEE Trans. on CAD, 1991, pp. 356-365, and simulated annealing as described in an article entitled “A Loosely Coupled Parallel Algorithm for Standard Cell Placement”, by W. Sun and C. Sechan, Proceedings of IEEE/ACM IC-CAD Conference, 1994, pp. 137-144. [0079] These prior art methods cannot simultaneously solve the partitioning problem and the problem of placing partitions on the chip, and thus the applicability of such methods to physical design automation systems for integrated circuit chip design is limited. [0080] More specifically, prior art methods do not provide any metric for specifying distances between cells based on netlist connections. An initial placement must be performed to establish physical locations for cells and thereby distances therebetween. [0081] Also, prior art methods fix cells in clusters at the beginning of optimization, and do not provide any means for allowing cells to move between clusters as optimization proceeds. 
This can create areas of high routing congestion, which cannot be readily eliminated because cell movements between clusters which could relieve the congestion are not allowed. [0082] In summary, the problem inherent in these prior cell placement methods is that repeated iterations generally do not tend to converge to a satisfactory relatively uniform overall cell placement for large numbers of cells. The aforementioned methods can take several days to place a large number of cells, and repeating these methods with different parameters or different initial arrangements may not necessarily provide improvements to cell placement. Typical methods for using these designs involve using a chosen method until a particular parameter, for example wire length, achieves a certain criterion or the method fails to achieve this criterion for a predetermined number of runs. The results are inherently non-optimal for other placement fitness measurements, having optimized the method based only on a single parameter. Further, results of these placement techniques frequently cannot be wired properly, or alternately, the design does not meet timing requirements. For example, with respect to simulated annealing, setting the temperature to different values may, under certain circumstances, improve placement, but efficient and uniform placement of the cells is not guaranteed. [0083] According to the present invention, there is provided a method and an apparatus for locating a plurality of elements on a surface. The method comprises the steps of assigning the elements to portions of the surface; preplacing the elements onto the surface; repositioning the elements depending on relative affinities of the elements to each other; and connecting the elements on the surface. Specifically, the present invention applies the above method for placing cells on an integrated circuit chip. 
[0084] According to another embodiment of the present invention, a computer-implemented method and apparatus for locating a plurality of elements on a surface is disclosed. The method comprises the steps of forming a neighborhood defined as a set of the elements; ordering elements within each neighborhood according to their relative distance from said target element; preplacing the elements within a two-dimensional abstraction of said surface; iteratively subdividing the surface into a plurality of regions; assigning the elements to the regions; calculating affinities of the elements using a plurality of processors; moving the elements based on affinities for relocating said elements; levelizing element density over the surface based on the affinities between various elements; relocating any overlapping elements; and performing a final cell adjustment for element positions. [0085] According to another aspect of the present invention, there is provided a method and apparatus for maximizing effectiveness of parallel processing to achieve an optimal cell placement layout of a core area of an integrated chip. The core area is defined as the area on the integrated chip upon which the cells are to be placed. The method is realized by dividing the core area into a plurality of regions, assigning a set of non-adjacent regions to each of the multiple processors, and allowing each of the multiple processors to process the cells of the regions to which it has been assigned. [0086] Because each of the multiple processors is operating upon a non-adjacent region at any one time, most of the cell movements of one processor are “far enough” from the cell movements of the other processes so as to minimize conflict between processors. Consequently, no limits need be placed upon the areas within which a processor operates or cell movements are made. 
Because the cell moves allowed by the invention disclosed herein are not limited, acceptance of undesirable local optimal solutions is avoided. [0087] According to another aspect of the present invention, there is provided a method and apparatus for placing cells on an integrated circuit chip by assigning cells, rather than regions of the chip, to a plurality of processors and having each of the processors place its assigned cells on the chip. The cells are assigned to the processors so as to balance the workload among the processors. [0088] To reduce crossover nets and inter-processor communications overhead, the affinities of the individual cells to each of the multiple processors are calculated, and the affinity values are used to reassign the cells to other processors. However, the affinity values are also weighed against the processor work load to maintain a balanced work load among the processors. In addition, because the processors are assigned to cells instead of regions, the cell placement algorithms become inherently less dependent upon the initial placement of the cells on the integrated circuit. [0089] The parallelization techniques of the present invention can be modified for different placement algorithms because the method describes a way to implement any placement algorithm using multiple processors operating simultaneously. [0090] To overcome the difficulties of the current floor planning techniques, the floor planning method and apparatus disclosed in this specification provides for a method to optimize a given floorplan automatically while meeting required capacity distribution and utilization. The disclosed new floor planning technique achieves almost uniform utilization of the chip by optimally using the overlap and border regions of the functions while satisfying the given floorplan constraints. 
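The utilization ratio that this floor planning technique aims to keep nearly uniform can be illustrated with a minimal Python sketch. It computes utilization for each function as used space divided by capacity (both in cell height units) and reports the gap between the most and least utilized functions. The function names, the dictionary layout, and the "memory" entry are hypothetical and chosen only for illustration; the 50,000 / 100,000 figures are the ones used in the utilization example of the background discussion.

```python
def utilization(used_units, capacity_units):
    """Ratio of core space actually used to the capacity of that portion."""
    if capacity_units <= 0:
        raise ValueError("capacity must be positive")
    return used_units / capacity_units


def utilization_spread(functions):
    """Return (per-function utilization, max - min utilization).

    `functions` maps a function name to (used_units, capacity_units).
    A spread close to 0 means utilization is nearly uniform.
    """
    ratios = {name: utilization(u, c) for name, (u, c) in functions.items()}
    return ratios, max(ratios.values()) - min(ratios.values())


# Hypothetical floorplan: "cpu" reuses the figures from the text's example.
floorplan = {"cpu": (50_000, 100_000), "memory": (30_000, 40_000)}
ratios, spread = utilization_spread(floorplan)
print(ratios["cpu"])  # 0.5
print(spread)         # 0.25
```

A spread of 0.25 in this toy floorplan would indicate that capacity could be shifted from the less utilized function toward the more utilized one.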
[0091] According to another aspect of the present invention, there is provided a method and apparatus for allocating floor space of an integrated circuit chip to functions of the chip comprising the steps of partitioning the IC into a grid comprising a plurality of regions; defining pieces, where each piece comprises at least one of the regions, and each piece having a capacity; and allocating said capacity of each of said pieces to a plurality of functions. [0092] In addition, after the allocation of the capacity of the pieces to the functions, the sufficiency of the allocated capacities is analyzed for each of the functions. Then, the capacities of the pieces are reallocated to shift excess capacities from the functions with excess capacities to the functions with a shortage of space. [0093] According to another aspect of the present invention, there is provided a method and apparatus for grouping the pins of a cell placement layout of an integrated circuit to achieve a balanced performance for parallel processing of the cell routing. First, the pins of the net are partitioned into neighborhoods and the best partitions are selected. Then, the pins are reassigned into better partitions and a minimal spanning method is used to create a graph structure of the partitions of the pins to create minimally partitioned nets. The minimal spanning tree (MST) of the pins, thus defining the nets, can be used to assign groups of the pins to the multiple CPUs. The multiple CPUs simultaneously, or in parallel, connect the pins, thus routing the net. [0094] The method of the present invention can be applied to the entire set of pins of an integrated circuit, but is best applied to each of the nets of the integrated circuit. [0095] An apparatus for locating a plurality of elements on a surface includes one or more processors and memory connected to the processors. The memory stores the instructions for performing the above-described tasks. 
The apparatus may include other components such as a monitor and a harddrive to store information regarding the elements and the surface, and to display the results of the operations as well as system status information. [0096] The present invention also includes a computer storage medium that stores a plurality of executable instructions for instructing a computer for organizing integrated circuit pins for routing purposes, including instructions to partition the pins into a plurality of sets; to construct a spanning tree having vertices and edges; and to assign the sets in accordance with said edges of said spanning tree. [0097] These and other aspects, features, and advantages of the present invention will be apparent to those persons having ordinary skilled in the art to which the present invention relates from the foregoing description and the accompanying drawings. [0098]FIG. 1A is a flow chart illustrating the main steps of the process according to the present invention; [0099]FIG. 1B is a flow chart illustrating the main steps of the process according to the present invention; [0100]FIG. 2 is an illustration of an exemplary integrated circuit chip; [0101]FIG. 3 is an illustration of a cell that has one pin connected to each net; [0102]FIG. 4 illustrates seven nets, each of which interconnect a plurality of cells; [0103]FIG. 5 is a flowchart illustrating the formation of a cell cluster or “neighborhood” in accordance with the present invention; [0104]FIG. 6 is an illustration of a center cell and nets; [0105]FIG. 7 shows that the cell v is assigned a coordinate between 0 and 1; [0106]FIG. 8 is a flowchart illustrating the iteration of the recomputing of the net and cell coordinates; [0107]FIG. 9 illustrates a cell having several pins which belong to the net; [0108]FIG. 10 is an illustration of a core divided into subregions; [0109]FIG. 11 is an illustration of a moveable cell within the core region; [0110]FIG. 
12 is a flowchart that demonstrates the procedure for obtaining an initial one dimensional placement of the movable cells; [0111]FIG. 13 is an illustration of the coordinates of the nets along an imaginary line; [0112]FIG. 14 is an illustration of the coordinates of the nets along an imaginary line; [0113]FIG. 15 is an illustration of the placement of nets along the line in the direction partitioned along a dividing line providing two subregions containing the cell positions; [0114]FIG. 16 is an illustration of a region physically divided in half by a dividing point; [0115]FIG. 17 is an illustration of one cell located in each of the ten subintervals; [0116]FIG. 18 is a flowchart relating finding a levelizing cut point; [0117]FIG. 19 is an illustration of the calculation step which determines the offset of the cut line from the dividing line; [0118]FIG. 20 is an illustration of the two regions that are divided using two dividing lines; [0119]FIG. 21 is an illustration of adjacent cell location step which initially considers moving a cell from its current position to each of the adjacent regions, as well as considering leaving the cell in the current region; [0120]FIG. 22 is an illustration of the (A,B) interval which is subdivided into equal subintervals in subdivision step [0121]FIG. 23 is an illustration of the cell region having a certain number of columns, or possibly rows, located therein; [0122]FIG. 24 is an illustration of the penalty calculation step [0123]FIG. 25 is an illustration of a three pin net; [0124]FIG. 26 is an illustration of each dividing line partitions regions, and each of these regions has a capacity denoting the volume of cells which can fit within the region; [0125]FIG. 27 represents a region having indices (TX, TY). [0126]FIG. 28 illustrates an ordering of cells within the neighborhood; [0127]FIG. 
29 is an illustration of the weight assignment step which assigns each cell a weight equal to the size of the neighborhood minus the index of the cell; [0128]FIG. 30 is an illustration of the weights of the neighborhood attraction in a direction; [0129]FIG. 31 is an illustration of the system which iterates a predetermined number of times, preferably once, calculating affinities; [0130]FIG. 32 is a flowchart associated with the density driven spring system; [0131]FIG. 33 is an illustration of a portion of the chip that has seven columns which are partitioned into maximal segments without blockages; [0132]FIG. 34 is a preferred order for scanning the regions; [0133]FIG. 35 is an illustration that denotes the top and bottom of the column; [0134]FIG. 36 is a flowchart of a preferred process adjusting cell spacing in the column to remove overlap with minimal noise; [0135]FIG. 37 illustrates a column containing cells of specified heights; and [0136]FIG. 38 is an illustration of the cells that are set to the grids by increasing the coordinate until the bottom of each cell reaches the closest horizontal grid line; [0137]FIG. 39 illustrates a possible partitioning of a core region; [0138]FIG. 40 illustrates an embodiment of the core region partition in accordance with the present invention; [0139]FIG. 41 is a flow-chart illustrating a method of sequencing core area regions in accordance with the present invention; [0140]FIG. 42 illustrates an integrated circuit chip; [0141]FIG. 43 is a flowchart illustrating the steps taken by the parallel processing technique of the present invention for simultaneous cell placement; [0142]FIG. 44 illustrates an example of a possible assignment of core space area to various functions; [0143]FIG. 45 illustrates a partitioning of core space according to one embodiment of the present invention; [0144]FIG. 46 illustrates the relationship between the partitioning grid and a function-area assignment layout; [0145]FIG. 
47 illustrates the definition of pieces of the core space according to one embodiment of the present invention; [0146]FIG. 48 illustrates the pieces of the core space according to one embodiment of the present invention; [0147]FIG. 49 illustrates a graph of the functions of the core space of FIGS. [0148]FIG. 50 illustrates a graph of the functions of the core space of FIGS. [0149]FIG. 51 is a flow-chart illustrating a method of organizing the pins of an integrated circuit in accordance with a preferred embodiment of the present invention; [0150]FIG. 52 illustrates construction of neighborhoods of pins in accordance with a preferred embodiment of the present invention; [0151]FIG. 53 illustrates construction of partitions of pins in accordance with a preferred embodiment of the present invention; [0152]FIG. 54 illustrates modification of partitions of pins in accordance with a preferred embodiment of the present invention; AND [0153]FIG. 55 illustrates an apparatus according to a preferred embodiment of the present invention. [0154] An overall block diagram of the preferred implementation of the current invention is presented in FIG. 1. As will become apparent from the following detailed description, other embodiments can be implemented with highly effective results while still within the scope of the invention. [0155] SECTION 1: SYSTEM OVERVIEW [0156]FIGS. 1A and 1B comprise a flow chart that illustrates the main steps of the process according to the present invention. A brief description of the various steps of the process is presented with reference to FIGS. 1A and 1B. To facilitate describing and understanding the invention, this disclosure is divided into sections. This first section is a general overview of the process according to the present invention. Subsequent sections describe and explain the algorithms and process steps shown in FIGS. 1A and 1B with reference to other figures of the drawings as appropriate. 
[0157] The specific algorithms described herein, as well as the basic steps which they represent (even if they are replaced by different algorithms), are designed for implementation in a general purpose computer. Furthermore, each of the algorithms described herein, as well as the basic step it represents, can be encoded on computer storage media such as CD-ROMs, floppy disks and computer hard drives, whether alone or in combination with one or more of the algorithms and steps described herein. [0158] Given only the netlist, before the cells have been placed on the chip, there is no way using prior art techniques to compute the conventional geometric distance between two cells (the “Euclidean distance”) because no geometric coordinates exist for any cell. A new mathematical form of distance is defined in the algorithms according to the present invention in which the distance between cells can be calculated from the way in which connections in the netlist interconnect its cells. This distance measure plays a critical role in the analysis of the netlist for placement by the algorithms. [0159] The cell placement system according to the present invention performs placement as either a uniprocessor or multi- or parallel-processor procedure. Unlike previous systems in which a constructive heuristic provided an initial placement followed by a statistical improvement technique, the process according to the present invention constructs and optimizes placements in a series of highly integrated steps. [0160] Subsection 1-1: Data Preparation [0161] The use of placement techniques must, of course, be preceded by the step [0162] Subsection 1-2: Neighborhood Construction [0163] As shown at the start of the flowchart of FIG. 
1A, the process according to the present invention constructs a neighborhood [0164] Subsection 1-3: Optimization of Cell Neighborhood System [0165] After the neighborhood of a cell is constructed, coordinates are assigned to each cell, and the neighborhood system is optimized using the center cell. The optimization technique is described in detail in §3 below. [0166] Subsection 1-3A: Parallel Cell Placement with Minimal Conflicts [0167] Placement of the cells on an integrated circuit chip can be performed in parallel, using multiple processors, by assigning cells to the multiple processors. Section 3A below, along with FIGS. 42 and 43, discusses the implementation technique of the parallel processing of the cell placement methods by assigning cells to the multiple processors. [0168] Subsection 1-3B: Floor Plan Optimization [0169] Prior to the very first preplacement of the cells on the IC surface, the functions of the IC (which the cells implement) must be assigned to the various portions of the IC surface. For instance, the CPU function (the cells implementing the CPU function) of the IC may be assigned to the central portion of the surface while the memory function (the cells implementing the memory function) may be assigned to the upper portions. Section 3B below describes the method and apparatus to optimally assign the portions of the IC surface to the functions to achieve an optimal floor plan. [0170] Subsection 1-4: Iterative One Dimensional Preplacement Optimization [0171] The cell coordinates are then iteratively optimized under the iterative one-dimensional (1D) preplacement optimization procedure described in §4. The purpose of this iterative 1D preplacement optimization procedure is to get a fast, good cell preplacement. In the iterative one-dimensional preplacement optimization procedure of §4, the cells are pre-placed on a two-dimensional abstraction of the chip surface. 
The iterative one-dimensional preplacement optimization procedure begins with the step [0172] Subsection 1-5: Fast Procedure for Finding a Levelizing Cut Point [0173] A density-driven one dimensional preplacement is performed [0174] The surface abstraction is divided into subregions by bisection in a selected direction. A preferred levelization by bisection process [0175] Subsection 1-6: Median Control and Increase in Resolution [0176] A median control procedure [0177] The 1D preplacement optimization procedure of §4, the cut-point procedure of §5 and the median control procedure of §6 are then iterated for a specified number of times, and the average value of the cost function (e.g. wire length) for the iterations is computed. [0178] The 1D preplacement optimization procedure of §4, the cut-point procedure of §5 and the median control procedure of §6 are then again iterated for the specified number of times as a block [0179] Subsection 1-7: Universal Affinity Driven Discrete Placement Optimization [0180] An affinity-driven placement optimization [0181] Subsection 1-8: Density Driven Capacity Penalty System [0182] Another method to calculate the cost of a cell placement layout is the relative density of the partitions of the surface. A density driven system to calculate cell affinity is discussed in §8. [0183] Subsection 1-9: Wire Length Driven Affinity System [0184] An alternative embodiment of the present invention is to calculate cell affinities and placement costs according to the relative wire lengths of different designs. The wire length driven affinity system is discussed in §9. [0185] Subsection 1-10: Minimizing Maximal Cut Driven Affinity System [0186] Another parameter used to produce an affinity for improving cell placement is minimizing the maximal number of nets that intersect the unit segment of the grid system imposed on the surface abstraction of the chip. The net intersect minimization affinity system is discussed in §10. 
[0187] Subsection 1-11: Neighborhood System Driven Optimization [0188] Each moveable cell v is located within a neighborhood Neigh(v) constructed in accordance with the optimization of cell neighborhood system procedure outlined above. That procedure yields an ordering of cells according to the cells' distance from the center of the neighborhood, after optimization. The neighborhood driven affinity system is described in §11. [0189] Subsection 1-12: Functional Sieve Optimization Technique [0190] The combination of affinities introduces an element of randomization. A deterministic system for combining affinities which converges at a relatively rapid rate is desired to optimally utilize affinities. Such a system which iteratively optimizes cell placement using a combination of affinities is the functional sieve approach. The functional sieve technique is described in §12. [0191] When the affinity-driven optimization is complete, the level of the subdivision of the chip surface is checked [0192] The typical number of iterations of block [0193] After a certain level of hierarchy is achieved, it may be desirable to stop the process at this point and not continue with further cell placement. This is especially true if one wants to obtain a fast estimate of cell placement. [0194] Subsection 1-13: Coarse Overflow Remover (Bulldozer) [0195] Continuing with FIG. 1B, after global levelization has been performed, there may still be some density peaks in the core area of the chip. A preferred procedure for density peaks removal is described in §13. The procedure for density peaks removal [0196] Subsection 1-14: Overlap Remover with Minimal Noise [0197] The overlap removal procedure [0198] Subsections 1-15. 
Sinusoidal Optimization, and 1-16, Dispersion Driven Levelizing System [0199] When the highest level of hierarchy is reached [0200] The cell density levelization preferably begins with a dispersion-driven step [0201] The sinusoidal optimization procedure [0202] Subsection 1-16A: Efficient Multiprocessing of Cell Placement Algorithms [0203] The above-discussed placement optimization techniques can be implemented using multiple processors to simultaneously place the cells on the integrated chip (IC) surface. In particular, the IC surface can be conceptually divided into a plurality of regions and the multiple processors assigned to several, non-adjacent regions to process the cells of the assigned regions simultaneously. Section 16A, along with FIGS. [0204] Subsection 1-17: Cell Placement Crystallization [0205] After applying the overlap removal procedure to remove the overlap, most of the cells are close to their final positions. The crystallization step places the cells in correct, final positions. Proper vertical cell spacings are computed so that horizontal wires can be routed over and between cells in the vertical columns. Vertical and local-horizontal “swaps” may be performed if doing so improves the cost functions. Cells must be assigned proper geometric coordinates so that their positions correspond to legal grid positions specified by the underlying chip architecture. All of these steps [0206] Subsection 1-18: In General [0207] An exemplary integrated circuit chip is illustrated in FIG. 2 and generally designated by the reference numeral [0208] The integrated circuit [0209] The cells [0210] For a particular electrical circuit having predefined input and output terminals and interconnected in a predetermined way, the problem for the chip designer is in constructing a layout indicating the positions of the modules such that the area on the chip surface occupied by wires and the overall layout area are minimized. [0211] The system shown in FIGS. 
1A and 1B receives inputs for a user-specified integrated circuit design which includes a netlist. A connection between two or more interconnected elements of the circuit is known as a wiring net, or net. A netlist is a list of cells and nets. [0212] SECTION 2: NEIGHBORHOOD CONSTRUCTION [0213] A hyperedge is a series of pins which are interconnected, i.e., wired together with an electrically common connection. For example, a hyperedge having pins A, B, and C means that pins A, B, and C are all connected together with a common metal wire. The “length” l(q) of a wiring net or hyperedge is equal to the number of pins (vertices) that are interconnected by the net minus one. This can be represented mathematically as l(q)=|q|−1, where q is the net and |q| is the number of pins that are interconnected by the net q. [0214] A particular cell, especially a large cell, can have two or more pins that are interconnected by one net q, and for this reason |q| is the number of pins rather than the number of cells interconnected by a net q. However, for simplicity of description and illustration, the following examples will assume that each cell has only one pin connected to each net. [0215] An example is illustrated in FIG. 3. A net q [0216] A distance ρ(v [0217]FIG. 4 illustrates seven nets q [0218] The cell v [0219] There is a path from the cell v [0220]FIG. 4 also illustrates how to measure a distance ρ(v,q) between a cell v and a net q. This distance can be expressed mathematically as ρ(v,q)=min [0221] Measurement of the distance between the cell v [0222] In accordance with the present metric, a “range” range [0223] One further definition is necessary for understanding the present invention. A “border” is a list of all nets that have ranges equal to the index of the border. 
For example, a border having an index of 7 (border [0224] The borders can be considered as a series of concentric shells or quantum levels, with each border having an incrementally higher index and including nets having an incrementally higher range than the border with the next lower index. [0225]FIG. 5 is a flowchart illustrating the formation of a cell cluster or “neighborhood” N(v,M) in accordance with the present invention. The term “neighborhood” is illustrative of the fact that the clusters can be “fuzzy”, with one cell being included in two or more clusters, and two or more clusters being allowed to overlap. [0226] Initially, a target number M of cells are designated to be included in a neighborhood. A number of cells between 15 and 30 tends to work best, with the optimal number being about 20 cells in each neighborhood. The algorithm outlined below is executed until C [0227] The first step is to specify a particular cell v to constitute the center of the cluster N, and a value for M as indicated in a step [0228] The flowchart of FIG. 5 includes a plurality of nested loops indicated by broken lines. This notation indicates that all of the steps included within each loop are to be performed for all outer loops. [0229] A step [0230] The next step, designated as [0231] The next step [0232] The method of FIG. 5 will be described further with reference being made to an example illustrated in FIG. 6. This example includes a center cell v [0233] Step [0234] The net q [0235] In steps [0236] In step [0237] The next border is border [0238] The steps [0239] The next border is border [0240] Examination of the next border, border [0241] In this manner, clusters or neighborhoods are grown one border at a time until a maximum size is reached. In addition, the borders are grown by “hitting” nets having corresponding ranges through net interconnections starting at the center cell v. 
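The border-by-border growth described above can be sketched in Python. This is only an illustrative approximation under stated assumptions: the exact range metric that defines each border is not recoverable from this text, so the sketch substitutes plain breadth-first layering (one layer of not-yet-visited nets per step) for the range-indexed borders. The names `net_length`, `grow_neighborhood`, and `target_size` are hypothetical, and each cell is assumed to have one pin per net.

```python
def net_length(net_cells):
    """l(q) = |q| - 1, assuming one pin per cell on the net."""
    return len(net_cells) - 1


def grow_neighborhood(center, nets, target_size):
    """Grow a cluster around `center` one layer of nets at a time.

    Assumption: layers are formed by net hops from the center, standing in
    for the range-indexed borders of the text.
    """
    # Index: cell -> nets incident to it.
    cell_nets = {}
    for q, cells in nets.items():
        for v in cells:
            cell_nets.setdefault(v, []).append(q)

    neighborhood = {center}
    used_nets = set()
    frontier = [center]
    while frontier and len(neighborhood) < target_size:
        # Nets "hit" from the current frontier that were not processed yet.
        layer = sorted({q for v in frontier
                        for q in cell_nets.get(v, [])
                        if q not in used_nets})
        used_nets.update(layer)
        # Cells reached through this layer of nets.
        new_cells = sorted({w for q in layer for w in nets[q]} - neighborhood)
        neighborhood.update(new_cells)
        frontier = new_cells
    return neighborhood, used_nets
```

Because a whole layer is added at a time, the resulting cluster may slightly exceed `target_size`; the returned net set corresponds to the list of processed nets that Section 3 relies on.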
[0242] SECTION 3: OPTIMIZATION OF CELL NEIGHBORHOOD SYSTEM [0243] In the foregoing process of constructing neighborhoods, a list of the nets processed is generated. That list of nets includes all nets incident to cells included in the neighborhood. Once the neighborhood is established, coordinates are assigned to each individual cell. For each cell v, the neighborhood of the cell is constructed and optimized using the cell as the center. A target number of cells C [0244] In accordance with the current invention, we assign coordinates to each cell and to each net in the neighborhood. We assign the center V [0245] As shown in FIG. 8, we then recompute the net and cell coordinates by iterating two procedures, as follows: [0246] Procedure 1: The new net coordinates are computed such that for any net q within the set of nets Q,
$$Z[q] = \frac{1}{|q|} \sum_{v \in q} Z[v]$$
[0247] where |q| is the number of pins of the net q. This equation sums the current coordinates Z[v] of all cells v in an individual net, then divides by the total number of pins on the net. The result is the coordinate of the net q. [0248] Procedure 2: In new cell coordinate computation, for each cell v, the weight β[v] is represented by:
[0249] where for a net q, v is an element of q. [0250] The new cell coordinate Z[v] is equal to:
[0251] We apply the iteration procedure only on cells from the neighborhood except the center and only on nets that have at least one cell in the neighborhood. The iteration is generally accomplished for a pre-determined number of times, preferably 15 to 20 times. [0252] SECTION 3A: PARALLEL CELL PLACEMENT WITH MINIMAL CONFLICTS [0253] Referring now to FIG. 43, a flowchart [0254] As indicated by the reference numeral [0255] After the initial assignment of cells to the processors, the cells can be re-assigned between the processors [0256] As indicated by reference numeral [0257] To facilitate the discussion of the present invention, the following terms are used in this specification:
[0258] The value of time(C [0259] Assigning Cells to Processors [0260] As discussed above, the work load can be evenly distributed among the processors by assigning, to each of the processors, the average_load where the average_load is
[0261] Unlike the prior art techniques where the processors are assigned to regions of the IC, the present invention assigns the cells of the IC to each of the processors. For the initial assignment, the cells are divided into parts with the equal total times. More precisely, the following method is used. First, beginning with the first cell, C [0262] and the found set of cells {C [0263] and assign the set to P [0264] For each processor P [0265] Affinity Calculation and Cell Reassignment [0266] The reduction of crossover nets and inter-processor communications can be achieved by assigning the cells to processors to obtain the highest affinity value for the entire system. In this invention, the affinity of a cell to switch from the currently assigned processor to another processor consists of two parts. The first one is the reduction in number of conflicts and the second one controls the processors' load balance. Assuming that cell C affinity( [0267] and we define cell affinity (C [0268] The netlist_affinity(C [0269] The best way to calculate the number of conflicts caused by net N, denoted as conflicts(N), is to maintain an array (a [0270] Alternatively, conflicts(N) can be the number of different processors having cells from the net N minus 1. [0271] Yet another method to determine conflicts(N) is to assign 1 if cells from N are assigned to more than one processor and 0 otherwise. [0272] The load_affinity is the work load balancing factor and is determined by
[0273] A constant, λ, may be used as the weighing factor to shift the relative importance between the netlist_affinity and the load_affinity. A small constant value would reduce the relative effect of the load_affinity factor in the overall affinity calculation, thereby giving the netlist_affinity factor a relatively larger role in the determination of the affinity. In this case, the cells of the integrated circuit are more likely to be reassigned to processors based upon the reduction in the number of conflicts the reassignment will effect. On the other hand, a larger constant value would increase the relative effect of the load_affinity factor in the overall affinity calculation, thereby giving the load_affinity factor a relatively larger role in the determination of the affinity. Consequently, the cells of the integrated circuit are more likely to be reassigned to processors based upon work load balance among the processors. [0274] Once the cell affinities are calculated as discussed above, the cells are reassigned [0275] In all subsequent iterations [0276] The number of iterations [0277] Referring now to FIG. 44, an apparatus [0278] The specific algorithms described herein, as well as the basic steps which they represent (even if they are replaced by different algorithms), are designed for implementation in a general purpose computer. Furthermore, each of the algorithms described herein, as well as the basic steps it represents, can be encoded on computer storage media such as CD ROMS, floppy disks, computer harddrives, and other magnetic, optical, other machine readable media, whether alone or in combination with one or more of the algorithms and steps described herein. [0279] SECTION 3B: FLOOR PLAN OPTIMIZATION [0280] Step One: Assign Portions of the Core Space to the Functions. [0281] Referring to FIG. 
44, the first step of the cell placement optimization method is to assign portions of core space [0282] The assignment of the functions to the portions of core space [0283] In our example, the eight functions are assigned to the portions of the core space
[0284] Some portions border each other while other portions overlap. Core portion [0285] The border area and the overlap areas will be used by the method of the present invention to optimally place cells such that the capacity distribution and utilization requirements are met. As described below, the bordering and the overlapping areas are used to shift the capacities of the functions assigned to the bordering and overlapping portions of the core space to create additional capacity for placing the cells of the functions with a shortage of capacity. For example, suppose the capacity of the core portion [0286] When the excess capacity of portion [0287] Likewise, f5 (portion [0288] Moreover, the capacity-shifting technique using the bordering and the overlapping regions, can be employed to shift excess capacity from one portion (function) of the integrated circuit to another portion of the integrated circuit even when the two portions do not share a border or an overlapping area. For example, if portion [0289] The details of the implementation of the shifting technique will be discussed below. [0290] Step Two: Define Regions. [0291] Referring now to FIG. 45, once the functions of the integrated circuit are assigned specific portions ( [0292] Each region is assigned to each of the portions which takes space from the region. FIG. 46 shows the relationship between the regions and the portions of the core space. As the table illustrates, in contrast to the one-to-one relationship between functions and portions, there is a one-to-many relationship between portions and the regions. [0293] If a border between two or more portions lies within a region, then the region is assigned to all of the portions which have its border within the region. For instance, as illustrated by FIGS. 44, 46, and
[0294] Step Three: Define the Pieces. [0295] Referring to FIGS. 47 and 48, after partitioning the core region [0296]FIG. 47 illustrates the relationship between the pieces and the regions of the core space, and Table 3B(3) below partially lists the pieces of the core space and the regions comprising each of the listed pieces. FIG. 48 shows all of the pieces of the integrated circuit
[0297] For convenience, the following expressions are used: [0298] ℑ set of all pieces of the core; [0299] ℑ(f) set of all pieces from which the portion of the core assigned to function ƒ may take some space. [0300] Referring to FIG. 48, in our example, ℑ={P1262, P1264, P1266, P1268, P1270, P1272, P1274, P1276, P1277, P1278, P1280, P1282, P1284, P1286, P1288, P1290, P1292, P1294, P1296}. Table 3B(4) below lists the ℑ(f) for some of the functions of the example integrated circuit.
[0301] As shown by FIGS. 47 and 48 and by Tables 3B(3) and 3B(4), piece P1262 comprises all elementary regions belonging to f1 only. Piece P1264 comprises elementary regions each of which belongs to both f1 and f2. Note that a piece can consist of just a single elementary region. For instance, piece P1272 comprises only one elementary region R [0302] Each of the pieces has a capacity, or a maximum number of cells which can be placed in the core space defined by the piece. If a piece is assigned to a single portion (assigned to a function) of the core space, then the entire capacity of the piece is available to the portion (i.e., to accommodate the cells of the function assigned to that portion); however, a piece, such as P1270, can be assigned to two or more portions, each portion representing a function. In such a case, the capacity of the piece is divided and allocated to the functions to which the piece belongs. Therefore, the following notation is used to express the capacity of a piece assigned to a portion, which, in turn, is assigned to a function: X [0303] For example, if piece P1264 has capacity for 4,000 cell height units, then X [0304] Step Four: Define Capacity and Utilization Requirements [0305] A cell placement is acceptable when the placement results in a predetermined level of utilization for each of the portions assigned to the functions of the circuit. [0306] To place the cells with a built-in factor to achieve the predetermined level of utilization, the cells are given fictive heights prior to being placed on the core space. The fictive height of a cell is the height of the cell used to calculate the space, or the number of cell height units, required to place the cell on the core space. [0307] The actual height of a cell is usually measured in millimicrons. Because all of the standard cells have the same width, the cell height is usually used as the measure of capacity as well as the height of the cell. 
[0308] For example, if a function's target utilization rate is fifty percent, then the cells of the function should be placed on the core space such that the cells actually use fifty percent of the space provided for the cells on the core space. That is, when the cells of the function are placed on the core space, the ratio of the capacity actually used by the cells to the capacity taken up by or reserved for the cells must be fifty percent. Alternatively expressed, the utilization ratio determines the density of the space taken up to place the cells of the function. [0309] Therefore, if a function's target utilization rate is hh(f)=the sum of all fictive heights of all cells of the function f. [0310] For each piece of the core space, the following may be defined: cap(P)=the capacity of the piece P. [0311] Then, to meet the predetermined capacity distribution and utilization requirements, the following two expressions must be satisfied:
[0312] Expression (A) states that, for each piece P, the capacity of the piece, cap(P), must equal the sum of X [0313] If cap(P) is less than the sum of all X [0314] If the capacity allocations, X [0315] Expression (B) states that, for each function, the sum of the fictive heights of all the cells of the function must be less than or equal to the sum of the capacities assigned to the function in each of the pieces in which the function is assigned capacity. For example, referring to FIGS. 46 and 48, the sum of the fictive heights of all of the cells of f1 (assigned to portion [0316] If the sum of all fictive heights of all the cells of the function is greater than the sum of all the capacities of the function in each of its pieces, then there is an insufficient amount of core space to place the cells of the function. [0317] In summary, if Expression (A) is not satisfied, then a solution is not feasible. In such a case, for a feasible placement solution, the functions may be reassigned to different portions of the core space, the pieces may be redefined, or the capacities of the pieces may be reallocated to the functions until Expression (A) is met. When Expression (A) is met, then a feasible cell placement exists, and Expression (B) is analyzed. If Expression (B) is met for a given cell placement, then the placement is correct, and the processing stops. If Expression (B) is not met, then the following steps, Step Five, Step Six, and Step Seven, are followed to shift, or reallocate, the capacities of the pieces to meet Expression (B). [0318] Step Five: Construct the Graph. [0319] Referring now to FIG. 49, a graph [0320] In the instant example, the vertices ( [0321] Continuing to refer to FIG. 49, dashed-line edges [0322] The solid-line edges [0323] For example, edge [0324] Edge [0325] Referring primarily to FIG. 49 but also referring to FIGS. 
47 and 48, vertices [0326] edge [0327] edge [0328] edge [0329] edge [0330] Step Six: Identify the Functions with Capacity Shortages and the Functions with Excess Capacity. [0331] After building the graph [0332] hh(f)>the sum of the capacities of all pieces which contribute core space to the function (i.e.,the sum of all X [0333] In other words, V1 contains all vertices (functions) which do not have sufficient core space to place all of their cells. [0334] The functions (vertices) with excess core space are assigned to V2. All functions with excess core space satisfy the expression: [0335] hh(f)<sum of the capacities of all pieces which contribute core space to the function (i.e., the sum of all X [0336] In other words, V2 contains all vertices (functions) which have more than the core space needed to place their cells. [0337] Step Seven: Shifting Excess Capacities to Meet Deficiencies. [0338] For each of the vertices of V1, the graph [0339] During the traversal, a chain of the vertices and the edges traversed is maintained. The chain begins with a first vertex (function, f [0340] The maximum capacity that can be shifted from f [0341] 1. the amount of the shortage of f [0342] 2. the amount of excess capacity of f [0343] 3. the smallest maximum-capacity of any of the edges of the chain. The capacity of each of the edges is expressed as W(f1,f2,P). [0344] After building the chain through which excess capacity of a piece can be shifted, the capacities of the each pieces of the chain, is updated as to shift the amount of capacity, represented by α, from the second vertex (f [0345] The process can best be illustrated using an example shown by FIG. 50. Referring primarily to FIG. 50, but also to FIGS. [0346] A. The vertices have the following properties: [0347] 1. vertex [0348] 2. vertex [0349] 3. vertex [0350] 4. vertices [0351] B. The edges have the following properties: [0352] 1. [0353] 2. [0354] 3. [0355] 4. 
[0356] Given the graph [0357] Chain 1: [0358] Continuing to refer to FIG. 50 but also referring to FIGS. [0359] The chain can be denoted [0360] (1) 500, the deficiency of f3; [0361] (2) 300, the amount of excess capacity of f1; and [0362] ( [0363] Therefore, the α of Chain 1 is 300. [0364] The actual shifting of 300 cell height units from f1 (vertex [0365] (1) reallocating X_{f1,P1264} to be 300 units less than its previous value, thereby freeing space for cells of f2 in piece P1264; [0366] (2) reallocating X_{f2,P1264} to be 300 units more than its previous value, thereby taking the freed space, and creating an excess capacity of 300 units in f2; [0367] (3) reallocating X_{f2,P1268} to be 300 units less than its previous value, thereby freeing space for cells of f3 in piece P1268; and [0368] (4) reallocating X_{f3,P1268} to be 300 units more than its previous value, thereby adding space for cells of f3 in piece P1268, alleviating the shortage by 300 cell height units. [0369] After the above-listed operations to shift [0370] Chain 2: [0371] Of the 1200 units of excess capacity of function f5, 200 units can be shifted to f3 in a similar operation using Chain 2, which can be denoted [0372] The above-described process is repeated for each of the vertices (functions) of the set V1 until no vertices remain in the set. Set V1 cannot be emptied if at least one vertex (function) of the set does not have sufficient core space to place all of its cells. In that case the placement is not possible under the given parameters. [0373] Also, a vertex (function) cannot be reached to claim its excess core space when the total space assigned to the functions in the neighborhood is less than the minimum required to place the cells of the respective functions. To overcome this problem, the process disclosed by this document can be rerun after making one or more of the following changes: [0374] 1. the utilization of some or all of the neighboring functions can be increased; [0375] 2. 
the physical area assigned to the neighboring functions can be increased; or [0376] 3. elementary region grid can be modified to create shared core space pieces encompassing the function and its neighboring functions. [0377] SECTION 4: ITERATIVE ONE DIMENSIONAL PREPLACEMENT OPTIMIZATION [0378] A one dimensional iterative optimization initially provides a fast, good cell coordinate placement. The one dimensional iterative optimization is performed in both the x and y directions. As may be appreciated by one of ordinary skill in the art, the iterative optimization may be performed in the y direction initially, but the preferred method is to perform it in the x direction. In the x direction, a netlist or hypergraph H includes the set V of cells v and the set Q of nets q. In addition, it should be noted that where “x” or “X” is used below for calculation in the x-direction, when calculating in the y-direction, “y” or “Y” would be used. As used herein, “z” and “Z” are universal notations representing either “x” and “X”, on the one hand, or “y” and “Y” on the other, depending on which direction is being considered. [0379]FIG. 9 illustrates a cell v [0380] As shown in FIG. 10, the core [0381] In initial placement optimization initialization step [0382] The flowchart of FIG. 12 demonstrates the procedure for obtaining an initial one dimensional placement of the movable cells. The movable cells are assigned the coordinate of the center of the region where they are located in initialization step [0383] where |q| is the number of pins of the net q. This equation sums the total of the current coordinates of the cell v and the z-offset (x or y depending on the direction) of the pin on the cell which belongs to the net q, and sums this for all cells in an individual net, then divides by the total number of pins on the net. The result of the summation and division is the coordinate of the net q. 
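The net-coordinate computation just described can be sketched in Python. This is a minimal illustration rather than the disclosed implementation: the names `net_coordinates`, `cell_coord`, and `pin_offset` are hypothetical, and the sketch assumes each cell has exactly one pin per net (the simplification adopted in Section 2), so that |q| equals the number of cells on the net.

```python
def net_coordinates(nets, cell_coord, pin_offset):
    """Return Z[q] for every net q as the mean coordinate of its pins.

    nets:       dict mapping net name -> list of cells on that net
    cell_coord: dict mapping cell -> current coordinate Z[v] (x or y)
    pin_offset: dict mapping (cell, net) -> offset of the pin from the
                cell origin in the same direction; missing entries are
                treated as 0 (an assumption of this sketch)
    """
    coords = {}
    for q, cells in nets.items():
        # Sum of (cell coordinate + pin offset) over all pins of the net.
        total = sum(cell_coord[v] + pin_offset.get((v, q), 0.0) for v in cells)
        # Divide by the number of pins |q| (one pin per cell assumed).
        coords[q] = total / len(cells)
    return coords
```

The same function serves both directions: pass x coordinates and x offsets for the x pass, and y coordinates and y offsets for the y pass.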
[0384] In new cell coordinate computation step [0385] where for a net q, v is an element of q. [0386] For each interval X[i], X[i+1] and each cell v from that interval, the new cell coordinate Z[v] is equal to:
[0387] Z* (q,i) is calculated by determining a temporary value a, where a initially equals Z[q]. If a is greater than X[i+1], or is outside the interval, then a is set to the greater border condition, or equal to X[i+1]. If a is less than X[i], again outside the interval, then a is set to the lesser border condition X[i]. Finally, Z* (q,i) is set equal to a. [0388] This set of steps places the coordinates of the nets along an imaginary line as shown in FIG. 13, line [0389] Once the new cell coordinates are computed, the difference between the previous value of the cost function and the new value of the cost function is determined in step [0390] SECTION 5: FAST PROCEDURE FOR FINDING A LEVELIZING CUT POINT [0391] The surface abstraction, or core region [0392] The dividing point in the current example generates two subregions in the region R[i,j]. As shown in FIG. 15, the placement of nets along the line in the x direction is partitioned along a dividing line [0393] and the sum of all heights of all cells having coordinates greater than Z [0394] It is preferable to use the total of all cell heights, but other parameters, such as the number of cells, may be used while still within the scope of the invention. [0395] Initially, if all cells within a given region R[i,j] are within an interval (A,B), the (A,B) interval is subdivided into N equal subintervals in subdivision step [0396] As an example, assume (A,B) is an interval from 0 to 200 and 10 equal subintervals are desired. In fact, a number in the range of 1000 such intervals would normally be desired, but 10 is used here for purposes of illustration. Further, assume that one cell is located in each of the ten subintervals, as shown in FIG. 17, although it would be probable that subintervals would contain more than one cell. Assigning A has a value of zero, and B a value of 200, n(v) for a cell in this arrangement is equal to the minimum integer value greater than Z(v)/10 for the cell. 
The designation “]x[” denotes the minimum integer greater than x; for example, for x having a value of 1.3, the value of ]x[ is 2. [0397] This results in an integer value for a subinterval within the (A,B) region where the individual cell is located. Height accumulation step [0398] where h(v) is the height of cell v. Value of array elements step [0399] Cut point index locator step [0400] The levelizing cut point, where cell height is equivalent to the percentage of area within the surface abstraction, is equal to
[0401] Clustering of cells within a single region, or at a border of a region, may provide an inaccurate cut point. In such a case, where the levelizing cut point requires a higher accuracy, the subinterval where the levelizing cutpoint is located may be again divided into N subintervals in subdivision step [0402] Once the levelizing cutpoint is located, all cells are shifted according to the following procedure. [0403] SECTION 6: MEDIAN CONTROL AND INCREASE IN RESOLUTION [0404] For a given region R[i,j] having a dividing point D, the levelizing cut point Z
[0405] Cells that fall outside the region are placed at the border of the region. If a is greater than X[I[v]+1], then a is set equal to X[I[v]+1]. If a is less than X[I[v]], then a is set equal to X[I[v]]. Z[v] is then set equal to this value a. [0406] In the preferred embodiment, the system initially places all cells at the center of the two-dimensional abstraction of the chip surface. The system then performs a predetermined number of iterations of the One Dimensional Preplacement Optimization in one direction, such as the x direction. The surface abstraction is then subdivided into sub-regions by dividing the surface abstraction in the opposite direction. The system then uses the Levelizing Cut Point procedure to partition the cells into groups proportional to the capacities of the subregions. The Median Control procedure then modifies the coordinates of the cells. The Levelizing Cut Point and Median Control procedures are iterated a specified number of times (preferably 6) with the specified number of iterations comprising a Block. After each Block, an overall cost function, described below, is computed. After repeating this Block a predetermined number of times (typically 10), the system computes the average of the costs calculated during these Block iterations. The current average cost value is compared with the previous average cost value, and if the difference between the average value and the previous value is less than a predetermined value (such as 10 [0407] If the average cost function has not decreased by a specified amount, further Blocks of computations are required. At the end of this iterative procedure the cells are assigned to subregions in such a way that the capacities of the subregions are not violated. 
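The Block-based stopping rule can be sketched as a small Python loop. This is a hedged illustration only: `run_blocks`, `step`, and `cost` are hypothetical names, `step` stands for one pass of the Levelizing Cut Point and Median Control procedures, the threshold `tol` must be supplied by the caller because its value is not legible in this text, and `max_rounds` is a safety bound added for the sketch.

```python
def run_blocks(step, cost, tol, iters_per_block=6,
               blocks_per_average=10, max_rounds=100):
    """Repeat Blocks until the average cost stops changing by more than tol.

    step: callable performing one cut-point + median-control iteration
    cost: callable returning the current value of the cost function
    """
    previous_avg = None
    for _ in range(max_rounds):
        costs = []
        for _ in range(blocks_per_average):
            # One Block: a fixed number of iterations, then one cost sample.
            for _ in range(iters_per_block):
                step()
            costs.append(cost())
        avg = sum(costs) / len(costs)
        # Stop once consecutive averages differ by less than the threshold.
        if previous_avg is not None and abs(previous_avg - avg) < tol:
            return avg
        previous_avg = avg
    return previous_avg
```

Averaging over several Blocks before comparing smooths out the fluctuation of the cost between individual iterations, which is presumably why the comparison is made on averages rather than on single cost values.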
[0408] After assignment of the cells to a respective subregion, as is described in §1 above, the system may repeat the aforementioned procedures based on a cut in the opposite direction. If, for example, the initial iterative one-dimensional preplacement optimization divides the available space on the surface abstraction by a vertical line, or divides in the x direction, the system executes the finding of a levelizing cut point procedure and the median control and resolution increase procedure in this direction. Upon completion of these procedures, the cells are assigned to one of the two regions, and the procedure may be repeated in the y-direction, based on the cells located in the two regions, after other optimization procedures discussed below are completed. As shown in FIG. 20, the two regions are divided using two dividing lines in the y direction and cells are placed along these two lines. The system locates a levelizing cut point for each region and partitions out the cells to the four remaining cells. This division in the y direction creates a second level of hierarchy. [0409] For purposes of this patent specification, hierarchy levels are determined based on the number of divisions of the surface abstraction. The level of hierarchy is the sum of the number of times the surface abstraction is divided into separate regions. For example, if the surface abstraction has been divided three times in the x direction and two times in the y direction, the system has reached the fifth level of hierarchy. The total number of regions is equal to 2 [0410] SECTION 7: UNIVERSAL AFFINITY DRIVEN DISCRETE PLACEMENT OPTIMIZATION [0411] After each surface abstraction division, the system performs a discrete placement optimization. For purposes of illustrating this procedure, it is assumed that the previous routines have furnished two sets of cells partitioned into two regions on the surface abstraction. All cells are located in the centers of each region. 
[0412] The system calculates affinities and cost functions for the arrangement. An affinity is calculated based on current cell placement and blockages in a chip. Affinities are heuristically connected with a desired cost function, which should be minimized. Affinities can be driven by cell density, wire length, minimizing maximal cut, clustering, etc., or some combination of these parameters depending on the goal sought to be achieved. Affinities may be positive or negative, and relate to the quality of an alternate placement of a cell. For example, having a cell with a higher affinity at a first location and a lower affinity at a second location indicates that the preferred placement of the cell is the first location. [0413] The parameter of the discrete placement optimization is ε, which represents the accuracy of the placement, and is a small number, such as 10 [0414] Global threshold evaluation step [0415] The overall global threshold is determined using a similar procedure to that described above with reference to the procedure for finding the levelizing cut point. The affinities are ordered sequentially, and all cells are defined to be within an interval (A,B), exclusive of negative affinities. With reference to FIG. 22, the (A,B) interval is subdivided into k equal subintervals in subdivision step [0416] MaxAff(v) is the maximum affinity over all adjacent regions for the cell v. This calculation yields an integer value denoting a subinterval within the (A,B) region where the individual cell affinity is located. Now for each cell v we increase the appropriate element of the array by 1 such that A(i(v))=A(i(v))+1. Cell affinity summation array step [0417] Global affinity evaluation step [0418] After calculating GlobThresh, the system evaluates the list of all cells in a predetermined sequential order. Affinity comparison step [0419] The result of this procedure is a global threshold for all cells. 
Some cells have been moved to adjacent regions, altering affinities of other cells. The procedure is then repeated two more times, for a total of three iterations, through the list of all movable cells using the same threshold. [0420] As an additional and optional procedure, a local threshold can be calculated in addition to the global threshold. The local threshold is calculated in the same fashion as the global threshold, but with respect to only the cells from the region where the cell is located. If we use this additional, optional procedure, we move the cell only if the maximal affinity is greater than both the global threshold and the local threshold. [0421] An average cost function, representing the average of the three values of the cost function calculated after each iteration, is computed. Now we compute a new threshold as described above in step AvgCost [0422] then the optimization process is halted. δ is a small number, typically 10 [0423] SECTION 8: DENSITY DRIVEN CAPACITY PENALTY SYSTEM [0424] The surface abstraction is partitioned alternately in the vertical and horizontal directions, where each division denotes an additional level of hierarchy. The levels of hierarchy, Lev [0425] The system then calculates a region capacity in terms of the heights of cells which can be located within a single region. This capacity of cell heights accounts for rows or columns of locations where cells may be located. As shown in FIG. 23, the cell region will have a certain number of columns, or possibly rows, located therein. The cell height capacity represents the space available to individual cells within the region and is based on the hierarchy of the surface abstraction. As outlined below, the highest level of hierarchy defines a single column per region. As may be appreciated by one of ordinary skill in the art, rows may be used rather than columns to define a total cell width capacity rather than a height capacity. 
[0426] All cells are located at the center of a region during some phases of the placement procedure. The height of a single cell may extend into more than one region. A parameter ColKey is assigned to this placement system process. The center of each cell is assigned to the center of the region it occupies. If ColKey has a value of 0, the entire height of the cell is assigned to a single region. If ColKey is equal to 1, the height of the cell is distributed over the regions the cell overlaps. For example, if a cell has a height of 16 units while the region has a height of ten units, ten units are assigned to the region containing the cell, three units to the region above it, and three units to the region below it. Cells located in an edge region are assigned to the region away from the edge, and not to any region outside the edge. Hence in the example previously presented, ten units of the cell would be assigned to the edge region and three to the region above the edge region. [0427] Movement of the cells from one region to another requires updating the total of the heights in each region. [0428] Each cell v is located within the region with indices I[v] and J[v], in the x and y directions, respectively. Movement of a cell to an adjacent region is denoted by Δ [0429] for Δ [0430] otherwise (i.e., either or both Δ [0431] α represents the degree of counting, which affects the movability of a cell v to a new region. α will typically have a value between 0.1 and 1. Prior level calculation step PenCap( [0432] where λ is the capacity penalty weight in the total affinity, and Sw is a switch parameter set to 0 or 1 depending on whether use of the PenCap8 variable is desired. PenCap8 is used only when the area is divided into 16 by 16 regions or more. λ [0433] where λ [0434] SECTION 9: WIRE LENGTH DRIVEN AFFINITY SYSTEM [0435] An alternate embodiment of the current design is to calculate affinities and penalties according to the relative wire lengths of different designs. 
This procedure provides a set of affinities providing the minimal wire length over all feasible placement solutions. [0436] For each cell v and net q, the minimum and maximum values for the X component penalties are as follows: [0437] where X(w) is the current coordinate of the cell origin, x(v,q) is a pin offset from the origin where the pin belongs to the net q. The y component penalties are similar: [0438] These equations define a bounding box [0439] The borders of the region where the cell v is located are denoted by: [0440] X [0441] X [0442] Y [0443] Y [0444] The penalty vector for cell v and net q in the x-direction is: [0445] (PenHP [0446] These values correspond respectively to movement of the cell to the left, nonmovement of the cell in the horizontal direction, and movement of the cell to the right. [0447] The penalty vector for cell v and net q in the y-direction is: [0448] (PenHP [0449] These values correspond respectively to movement of the cell upward, nonmovement of the cell in the vertical direction, and movement of the cell downward. [0450] The penalty vector for the individual situation is as follows. If λ [0451] The total penalty for a cell v in the X direction is a normalized sum of the penalties in the X direction over all nets incident to the cell v:
[0452] In the Y direction,
[0453] The total penalty is the sum of the x and y components: PenHP( [0454] The affinity is the opposite of the penalty: AffHP( [0455] and a first combined affinity is calculated based on both capacity and wire length: Aff( [0456] QEF(v) represents a scaling factor having the following parameters:
[0457] where Height(v) represents the height of the cell v. Although any values may be used for A and B in this equation, experience and testing have shown that the values of 5 and 5 produce the most beneficial results. [0458] SECTION 10: MINIMIZING MAXIMAL CUT DRIVEN AFFINITY SYSTEM [0459] Another parameter used to produce an affinity for improving cell placement is minimizing the maximal number of nets that intersect the unit segment of the grid system imposed on the surface abstraction of the chip. Net overlap inherently yields inefficiency of wiring, and thus minimizing the number of nets which cross other nets improves overall system efficiency. For each level of the chip core partitioning hierarchy, the number and position of the vertical and horizontal lines which induced that level of partitioning are evaluated, including determining the number of nets which intersect a line partitioning the core into regions. Initially, the system determines the number of nets which intersect the lines and the relative affinities for these line crossings. The system moves the cells, the nets change position based on relative affinities, and then the number of net crossings and the affinities are recomputed. [0460] As shown in FIG. 26, each dividing line partitions regions, and each of these regions has a capacity denoting the volume of cells which can fit within the region. The system performs the following procedure once after each bisection. The system calculates the capacity of a dividing line as the average capacity of the regions adjacent to that line. In FIG. 26, the capacity of dividing line X(i) is defined as the average capacity of all regions to the left of the line and all regions to the right of the line. The system calculates average vertical line capacity and average horizontal line capacity for all lines, representing the amount of wiring which is available over the entire surface abstraction. 
The capacity may also represent the space available for wiring on multiple layers of the chip. The capacity of each horizontal and vertical line is then divided by the corresponding horizontal or vertical average value. Hence, if the line represented by X(i) in FIG. 26 has a capacity of 1500 cells and the average capacity of all vertical lines on the surface abstraction is 1000 cells, the relative capacity of the line is 1.5. The relative cut is defined as the ratio of the number of nets crossing a line to the capacity of the line. [0461] Before each optimization step in the affinity driven discrete placement optimization procedure, and particularly before calculation of global and/or local thresholds, the system calculates a midcut for the surface abstraction. The midcut represents the average relative cut over all lines of the surface abstraction. FIG. 27 represents a region having indices (TX, TY). The number of cuts represents, with the current cell configuration, the number of times a net crosses a boundary, while the capacity of the line represents the total number of possible crossings of the particular boundary. The system calculates four penalties which represent the cost of a change for a half-perimeter move of cells within the region one unit to the right, left, up, and down:
[0462] These equations, as illustrated in FIG. 27, represent the number of cuts over region dividing lines TX, TX+1, TY, and TY+1 relative to the capacity of the dividing lines. The XI and YJ factors represent the size of one region. The factor β represents the relative penalty associated with cuts, and testing has shown that a reasonable range for β factors is 0.4 to 0.5. As shown in FIG. 27, for a region twenty units in length on the x and y sides, with ten cuts along each dimension and a capacity for one hundred cuts, with an average number of cuts equal to twenty cuts, and a β factor of 0.45, the values for DXL and DYB are 11.045 each. For 40 cuts on the right hand side and upper side of the regions, the values are 11.18. [0463] Discrete affinities in the x and y direction represent the numbers of nets whose half-perimeter decreases on movement of cells across the boundary minus the number of nets whose halfperimeter increases when a cell moves in a given direction. AffX[i], i=−1,0,1; AffY[j], j=−1,0,1 [0464] Affinity for zero movement represents the numbers calculated above. Movement of a cell in a particular direction, such as crossing a boundary line, induces an affinity for that cell. From FIG. 27, movement of the cell to the right and up decreases the penalty, or increases the affinity for the cell. Thus affinity in the x direction, AffX, for movement to the right is −1, to the left is 1, and affinity for movement in the y direction, AffY for movement of the cell upward is −1, and downward is 1. Affinity for keeping the cell in its current position is 0. 
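The discrete affinity of paragraph [0463] (nets whose half-perimeter decreases minus nets whose half-perimeter increases) can be sketched in one dimension as follows. This is an illustrative sketch only: the data layout (each net given as the x coordinates of its other pins) and the names `half_perimeter_x` and `discrete_affinity_x` are assumptions, and the boundary-crossing factors of paragraph [0462] are not modelled.

```python
def half_perimeter_x(xs):
    """x-extent of a net's bounding box (the x part of the half-perimeter)."""
    return max(xs) - min(xs)


def discrete_affinity_x(cell_x, nets, step):
    """Nets that shrink minus nets that grow when the cell moves by `step`.

    nets is a list of lists; each inner list holds the x coordinates of the
    other pins of one net incident to the cell (hypothetical layout).
    """
    decreased = 0
    increased = 0
    for other_pins in nets:
        before = half_perimeter_x(other_pins + [cell_x])
        after = half_perimeter_x(other_pins + [cell_x + step])
        if after < before:
            decreased += 1
        elif after > before:
            increased += 1
    return decreased - increased


nets = [[8, 9], [1, 2], [3, 7], [10]]
aff_right = discrete_affinity_x(5, nets, 1)   # two nets shrink, one grows
aff_left = discrete_affinity_x(5, nets, -1)   # one net shrinks, two grow
```

With these sample nets a move to the right is favoured, because two of the four nets have all their other pins to the right of the cell.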
[0465] The discrete affinities for movement in each of the four directions are multiplied by the corresponding factor: AffcutX[−1]=AffX[−1]* AffcutY[−1]=AffY[−1]* AffcutX[1]=AffX[1]* AffcutY[1]=AffY[1]* [0466] Total affinities for movement of the cell in the vertical and horizontal directions are the summation of affinities in the X and Y directions: Affcut[ [0467] An alternative embodiment of this procedure is to use the square of the number of crossings as a component of the cost of change for the half-perimeter move. For movement to the left, this would yield an equation of:
[0468] Squaring the factors increases the emphasis on the number of cuts, and balancing with new β′ factors yields an arrangement wherein the total number of cuts converge rapidly to a relatively uniform quantity. [0469] SECTION 11: NEIGHBORHOOD SYSTEM DRIVEN OPTIMIZATION [0470] Each moveable cell v is located within a neighborhood Neigh(v) constructed in accordance with the optimization of cell neighborhood system procedure outlined above. That procedure yields an ordering of cells according to the cells' distance from the center of the neighborhood, after optimization. FIG. 28 illustrates such an ordering of cells within the neighborhood, Neigh(v)=(w(v,1), w(v,2), . . . w(v,M)), where M is the size of the neighborhood, generally in the range of 20 cells. [0471] From FIG. 29, weight assignment step [0472] An alternate preferred method of assigning weights is to declare a number L, where L equals M plus some positive integer, such as 2, and weights range from 21 down to 2. The reason for this shift is that the weight accorded to a factor of 1 is infinitely greater in terms of multiplications than a factor of zero. Thus relative weights may be misleading if low number factors, such as zero and one, are used as weighing factors. Any monotonically decreasing function may be employed in defining the weights accorded the cells within the neighborhood. [0473] The system then calculates attraction weights in step [0474] These equations represent the weights of the neighborhood attraction in a direction. For example, assume a neighborhood (v [0475] In affinity definition step AffNeighborhood [0476] [0477] These values represent the relative overall benefit of moving the location of the neighborhood in a particular direction or leaving the neighborhood in its current position. Grid [0478] Resuming with the example of FIG. 30, M is equal to five and we are in the fourth level of hierarchy. 
Thus, AffNeighborhood [0479] Affinities may be combined while still within the scope of the current invention. Combinations of capacity affinities, wire length affinities, cut affinities, and neighborhood affinities present an enhanced system of determining the preferred direction of movement of a cell or net. Such an affinity combination may include combining the following affinities: Aff( [0480] As outlined above, QEF(v) represents the capacity penalty influence factor, which is a function of cell v relative height. Such a combination of affinities takes into account cell position as well as relative weight accorded to an individual neighborhood. [0481] SECTION 12: FUNCTIONAL SIEVE OPTIMIZATION TECHNIQUE [0482] The combination of affinities introduces an element of randomization. A deterministic system for combining affinities which converges at a relatively rapid rate is desired to optimally utilize affinities. Such a system which iteratively optimizes cell placement using a combination of affinities is the functional sieve approach. [0483] The functional sieve performs several calculated iterations of combining affinities and moving cells based on relative affinities and then computing cost functions for the new cell positions. The functional sieve utilizes the following basic formula: Aff( [0484] As illustrated in FIG. 31, the system in step [0485] After computing the cost function, the system performs a predetermined number of major iterations and calculates the cost function after each major iteration. The preferred number of major iterations and cost function calculations is six. After this predetermined number of major iterations and cost function value calculations, the system computes the average cost value for all of the costs calculated in the previous steps. This procedure steps through different affinity evaluations and obtains a preferred overall movement of cells on the surface abstraction. 
The functional sieve optimization process is halted when two consecutive cost average function values satisfy a given accuracy, such as 10 [0486] During the discrete placement procedure described above, the μ [0487] During final placement, a crystallization procedure produces fine placement of the cells after the aforementioned functional sieve procedure is completed. The system uses a small non-zero value, such as 10 [0488] An alternate embodiment of the current functional sieve alters the multiplying factors for the various affinities. Such an embodiment is particularly useful in crowded net situations, and emphasizes cross cuts while taking advantage of open nets. [0489] The basic equation for the alternative embodiment is: Aff( [0490] where β is a number between zero and one, depending on the emphasis desired placed on the number of cuts. [0491] The affinity combinations disclosed within this functional sieve operation are not limited to those disclosed here, and may include other combinations using other weighing factors. Such an alternate weighing and affinity scheme would produce a desirable placement of cells and still within the scope of the present invention. [0492] SECTION 13: COARSE OVERFLOW REMOVER (BULLDOZER) [0493] A coarse overflow remover procedure is applied on the highest level of the chip core region hierarchy when each region contains a piece of only one column. The list of cells is scanned in the order of decreasing heights in order to find a new region for each of them. A list of cells in order of decreasing cell height is made. If the height of a cell is smaller than the available space in the corresponding column segment, then the cell retains its location. Most of the cells will keep their previous positions if the initial cell density is acceptable. [0494]FIG. 33 represents a portion of the chip that has seven columns [0495] The capacity of a column segment is its height. 
The next cell from the list will get a new position according to the following rule: look for the region closest to the current cell (using Manhattan distance) such that the corresponding column segment will not overflow its capacity if the cell is assigned to that region. A preferred order for scanning the regions is shown in FIG. 34. First consider the original region (marked with the numeral 0), then consider the regions having a distance of 1, then consider the regions having a distance of 2, etc. [0496] This step considers only the cells that have already been assigned new positions and the current one. Usually, a cell stays in its old position. As soon as a region is found that satisfies this condition, the region scanning is stopped and the cell is assigned to that region. If the original region satisfies the condition, the cell is reassigned to the original region. [0497] SECTION 14: OVERLAP REMOVER WITH MINIMAL NOISE [0498] The purpose of this process is to smoothly remove cell overlap with minimal increase of the wire length. FIG. 36 is a flow chart of an overlap remover according to the invention. The overlap remover process is applied separately to each column of cells. It is assumed that each column is continuously connected with no blockages between cells of the same column. As shown in FIG. 35, denote the top and bottom of the column with index j by T[j] and B[j], respectively. Similarly the top and bottom of column k are denoted by T[k] and B[k], respectively. The vertical grid step is used as the unit of measure. [0499] First the cells in a column are sorted in the order of increasing cell bottom y coordinates. Denote cells in that order by [0500] v [0501] [0502] The bottom coordinates of these cells are [0503] Y [0504] As shown in FIG. 35, the parameter zaz is defined as the distance between the top of one cell in a column and the bottom of the next cell upward. 
There must be at least one grid space between adjacent cells to have a feasible layout. [0505]FIG. 37 illustrates a numerical example. Suppose a column [0506] The average extra space per cell is now calculated as
[0507] The parameter minzaz satisfies the condition
[0508] Therefore, for the example given, a possible value for minzaz is 3. [0509] The following array is calculated: [0510] where Norms [v [0511] The parameter of the overlap remover process is minzaz, an integer that can be positive or negative. The process further includes the step of modifying the array zaz such that none of its elements is less than minzaz. The array elements are processed forward and backward alternately. The following procedure is executed: [0512] (a) At the beginning of the process the counter is initialized to zero. If the element being processed is less than minzaz, then the element is increased by 1, the counter is decreased by 1, and the next element is processed. [0513] (b) If the element is greater than minzaz and also positive, but the counter is negative, then the counter is increased by 1 and the element is decreased by 1. The steps (a) and (b) are repeated until the condition is satisfied. Then we proceed with the next element. [0514] (c) If no element is less than minzaz (zaz(i) ≥ minzaz) and the counter has zero value, the process is stopped. The cells are moved in one grid interval increments until the condition is satisfied. [0515] FIG. 36 is a flowchart of a preferred process for adjusting cell spacing in the column to remove overlap with minimal noise. The process of adjusting cell spacing begins with a step [0516] If zaz(i) is not less than minzaz, then a process step [0517] The remaining situation to be considered is when zaz(i) is not less than minzaz and the condition count >0 and zaz(i) <maxzaz is not satisfied. In a process step [0518] The process of adjusting cell spacing then proceeds to a step [0519] The result of adjusting the cell spacing in accordance with this preferred process is that overlap between cells is removed and spacings that were too large have been reduced to acceptable values. Cells that previously overlapped now have a spacing zaz(i) of one grid space. 
Cells that were too far apart now have spacings zaz(i) such that minzaz ≤ zaz(i) ≤ maxzaz. [0520] After finishing the procedure the cell coordinates are modified: [0521] Y [0522] Y [0523] For i=2, 3, . . . , n. [0524] SECTION 15: SINUSOIDAL OPTIMIZATION [0525] This procedure significantly levelizes the cell density with almost no increase in wire length. The ColKey parameter has been discussed above in the section that describes the density-driven capacity penalty system. For the sinusoidal optimization procedure the ColKey parameter should be set to 1. Setting the ColKey parameter to 1 means that the height of a cell is distributed over all regions with which the cell overlaps. More precisely, if the cell has been assigned to the highest level hierarchy region with an index j, it is assumed that the cell center is in the center of the region. Depending on the real height of the cell, the occupancy is updated for all regions with which the cell overlaps. [0526] The region occupancy is updated after every cell move. Because the number of cells taller than the smallest region height is relatively small, updating the region occupancy does not affect the complexity of the optimization. In addition to the basic region capacity penalty, which is calculated taking into account real cell dimensions as described above, the segment column capacity penalty is also used now. It is necessary to consider the capacity penalty to achieve a more uniform distribution of big cells on the chip. [0527] The main block of the sinusoidal optimization procedure comprises a number of big iterations of the discrete placement optimization described previously herein with reference to FIGS. 21 and 22. Denote that main block by Optim(k), where k is the number of iterations. The main parameter is the capacity penalty influence parameter λ, which has been described previously with reference to FIGS. 23 and 24. 
The value of the capacity penalty influence parameter λ will be changed during the sinusoidal optimization process. [0528] Steps that preferably are included in the sinusoidal optimization procedure are as follows:
[0529] {
[0530] Optim(m);
[0531] λ=λ·l;
[0532] Optim(2·m);
[0533] λ=λ·l;
[0534] Optim(m);
[0535] λ=λ/l;
[0536] Optim(2·m);
[0537] λ=λ/l;
[0538] }
[0539] where m and l are predetermined integer parameters. Typically m is one of the numbers 6 to 10, and l is 2. This sinusoidal optimization procedure typically is iterated in combination with the other levelizing procedures described herein, specifically, the dispersion-driven levelizing system described in §16. [0540] There are two types of sinusoidal optimization. One type is unconstrained and contains standard discrete placement optimization. The other type of sinusoidal optimization controls cell column densities inside the discrete placement optimization. [0541] SECTION 16: DISPERSION-DRIVEN LEVELIZING SYSTEM [0542] This procedure performs smooth, continuous cell density levelization on the chip and is illustrated by FIG. 32. First, a new coordinate system is introduced on the chip by imposing a mesh on the chip and assigning integer coordinates to the nodes of the mesh. The nodes of the mesh are classified as to whether they are movable or fixed. Nodes of a square that overlaps with a blockage or a megacell are fixed. All other nodes are movable. [0543] The densities of the square regions are calculated as a sum of portions of the height of the cells that overlap the region. [0544] After coordinates are assigned to the nodes of the square mesh, the node coordinates are transformed such that the squares defined by the mesh are deformed into arbitrary quadrilaterals. A constraint on the deformation of the mesh is that regions that overlap with megacells are not deformed. [0545] The coordinates of the movable nodes are iteratively recalculated to minimize a special cost function, the density dispersion. 
To speed up the convergence, the whole optimization procedure is organized hierarchically. Starting from the mesh square regions the hierarchy is built up using quadragrouping (reverse quadrasection). [0546] On the hierarchy level k denote by den (k, i, j) the density of the region (k, i, j), and by s (k, i, j) the area of the region. The total density DEN will be the sum of the densities of the regions for all i and j.
[0547] If the total available core area is a fixed number S, then define
[0548] The density dispersion D is then given by
[0549] which is the cost function. The dispersion is minimized by doing coordinate node local moves. Suppose the node is not on the core border and therefore has four adjacent regions. Then for each node A with coordinate (x, y) the local average density is computed as
[0550] where den [0551] The local cost function is defined as
[0552] The coordinates for A are chosen in order to minimize the local cost function. An algorithm for minimizing the local cost is to separately move each point A(x,y) a distance δ to the left or right (up or down for the y coordinate). The value of δ can change with each coordinate. The value of the cost function is calculated for each move. In each local region the set of the coordinates that minimizes the cost function is chosen for the cells. [0553] After all of the global levelization steps have been performed, there may still be some density “peaks” in the core region of the chip. The bulldozer procedure described above may be applied to remove these peaks. Finally, the sinusoidal optimization procedure is applied again to the chip surface, which is by now subdivided into cell columns. Reapplying the sinusoidal optimization process ensures that the cells will be evenly assigned to the columns as required by the structure of the final design. [0554] SECTION 16A: EFFICIENT MULTIPROCESSING OF CELL PLACEMENT ALGORITHMS [0555] An exemplary integrated circuit chip is illustrated in FIG. 2 and generally designated by the reference numeral [0556]FIG. 39 illustrates one possible partitioning of a core area [0557] To simplify the discussion, this specification will refer to each of the regions of the grid as R [0558]FIG. 39 also illustrates cell swaps between regions of the core area [0559] For the purposes of our discussion, it is assumed that three (3) processors—P [0560] If the regions are assigned to the processors sequentially, then the order in which the regions are processed and the processor assignments to the regions might be as shown below in Table 16A(1).
[0561] The entire Table 16A(1) represents the core area
[0562] Under the cell placement process described above, the first set of regions R [0563] As illustrated by FIG. 39, regions R
[0564] In this scenario, three sets of area-conflict problems arise. The first area-conflict is between P [0565] By constraining cell movements to adjacent regions only, the cell movement [0566] The final problem arising out of the current scenario is a possibility of a deadlock between the processors. If, for example, P [0567] All three problems discussed above can be minimized, or eliminated, if any two processors are, at any one time, operating sufficiently distant from each other to avoid area-conflicts. Automated assignments of regions to multiple processors for simultaneous processing such that the regions are sufficiently distant to avoid area conflicts is an important aspect of the present invention. The assignment is accomplished as follows: (1) dividing the core area into a plurality of rectangular regions of M columns by N rows; (2) determining the “interval parameter” for both the columns and for rows; and (3) determining a sequence in which the rectangular regions are to be processed such that each set of simultaneously processed regions contains regions which are sufficiently distant from each other to avoid conflicts. [0568] Consequently, when the multiple processors are assigned to the regions, each of the processors will be processing cells of a region far enough from the other regions being processed at that time such that area-conflict and deadlock problems are greatly reduced. In addition, the need to restrict the movements of cells, which creates local optimum problem, is also eliminated. [0569] The number of columns M and the number of rows N are predetermined and can be arbitrarily set. However, the value of M is typically set as one half of the number of cell columns in the core area, and the value of N is typically equal to M. FIG. 40 shows the core area [0570] The column “interval parameter,” denoted KX, may be any number greater than one and less than M. 
The row “interval parameter,” denoted KY, may be any number greater than one and less than N. The interval parameters are used in sequencing the rectangular regions as will be discussed more fully below. Although KX and KY may be assigned arbitrary values within the respective limits, it has been found that good choices for KX and for KY are:
[0571] Referring now to FIG. 41, a flowchart [0572] To create the sequence, the first operation, as indicated by the reference number [0573] For each of the columns traversed by p, a second index, denoted as q for the purposes of this discussion, is used to traverse the rows one (1) through the KY [0574] As indicated by the operation C [0575] For the instant example, the column traversal will be [0576] for p=1: C [0577] for p=2: C [0578] for p=3: C [0579] The index p will not reach 4 because KX=3. [0580] Operation W [0581] For the instant example, the row traversal will be: [0582] for q=1: W [0583] for q=2: W [0584] The index q will not reach 3 because KY=2. [0585] Using the indices i and j to traverse columns and rows in the above described manner, the sequence is created, as indicated by operation [0586] The above-described operations to produce a sequence of regions R
[0587] Alternatively, using a repeat-until construct, the pseudo-program becomes:
[0588] Utilizing the operations as described above, and using the values discussed previously, the core area
[0589] The entire Table 16A(4) represents the core area [0590] In the instant example, the first three regions of the sequence, R [0591] The above-described assignment technique increases the effectiveness of parallel processing because no processor has to wait idly for another processor to finish its operation before processing another region. The effect of this assignment technique on the overall performance of the placement algorithm is most evident when the number of cells in each of the rectangular regions varies or when the processors operate at different speeds. [0592] Table 16A(5) below sets forth one possible order in which the regions may be simultaneously processed by the processors.
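The interval-based sequencing described in paragraphs [0572] to [0585] can be sketched as follows. This is a hedged reconstruction: the original pseudo-program is not present in this text, so the nesting order of the loops, the function name `region_sequence`, and the grid size used in the usage lines (6 columns by 4 rows) are assumptions. Only the interval values KX=3 and KY=2 come from the example in the text.

```python
def region_sequence(m, n, kx, ky):
    """Order the regions R(i, j) of an m-column by n-row grid.

    For each column offset p (1..kx) and row offset q (1..ky), visit columns
    p, p+kx, p+2*kx, ... and rows q, q+ky, q+2*ky, ... so that regions that
    follow each other in the sequence tend to lie kx columns or ky rows apart.
    """
    sequence = []
    for p in range(1, kx + 1):              # p never reaches kx + 1
        for q in range(1, ky + 1):          # q never reaches ky + 1
            for i in range(p, m + 1, kx):   # columns selected by offset p
                for j in range(q, n + 1, ky):  # rows selected by offset q
                    sequence.append((i, j))
    return sequence


seq = region_sequence(6, 4, 3, 2)
first_batch = seq[:3]  # regions handed to three processors at the start
```

Under these assumptions the first three regions are (1, 1), (1, 3) and (4, 1), none of which share a border. As the text notes, some later batches may still contain conflicting regions.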
[0593] As Tables 16A(4) and 16A(5) illustrate, no two adjacent regions are processed simultaneously in this example. In particular, note that regions R [0594] Under the new cell placement process described above, the first set of regions to be simultaneously processed by the three processors are R [0595] However, it is possible, even under the new cell placement process, for some conflicts to exist. The eighth iteration of the new process, as detailed by Tables 16A(4) and 16A(5) may be used to illustrate the advantages of the new process even where some conflicts occur. [0596] The eighth iteration of the cell placement process involves the regions R
[0597] In this scenario, only one area-conflict problem exists. The area-conflict is between P
[0598] SECTION 17: CELL PLACEMENT CRYSTALLIZATION
[0599] The purpose of this procedure is to obtain the final cell placement. First, the height of each cell is increased by one grid plus γ
[0600] Now, the original height of each cell is increased by one grid plus a certain percentage of the remaining available space. For this purpose, 72% is preferable. Then the overlap remover procedure is executed with maxzaz set equal to the column height to ensure that there is no overflow in any of the connected column segments.
[0601] Next the positions of the large cells are fixed and then the sinusoidal optimization is executed for k
[0602] Now the detailed coordinates of each cell are obtained. In the remaining part of the placement crystallization, the following three procedures are iterated:
[0603] 1. The vertical optimization is performed for k3 iterations. During one iteration, the list of cells is scanned. For each cell, the change in the cost function is calculated if the cell is moved down for a (parameter). The change in the cost function is likewise calculated if the cell is moved up. The move that improves the cost function the most (if any) is performed.
[0604] 2. The overlap remover is executed with minimal noise.
[0605] 3. Next k
[0606] Finally, referring to FIG. 38, the cells are set to the grids by increasing the y-coordinate until the bottom of each cell reaches the closest horizontal grid line.
[0607] At this point, most of the cells are close to their final positions. The crystallization step places them in correct, final positions. Proper vertical cell spacings are computed, so that horizontal wires can be routed over and between cells in the vertical columns. Vertical and local-horizontal “swaps” may be performed if doing so improves the cost functions. Cells must be assigned proper geometric coordinates so that their positions correspond to legal grid positions specified by the underlying chip architecture. All of these steps are performed by the crystallization process described above, and the cells are frozen into their final positions. At this point, the placement process according to the invention has completed its work. A data structure is prepared that can be read by a routing system (not shown) for chip routing and design completion.
[0608] While the invention has been described in connection with specific embodiments thereof, it will be understood that the invention is capable of further modifications. This application is intended to cover any variations, uses or adaptations of the invention following, in general, the principles of the invention, and including such departures from the present disclosure as come within known and customary practice within the art to which the invention pertains.
[0609] SECTION 18: NET ROUTING AND PIN CONNECTION
[0610] Referring to FIG. 51, a flow chart
[0611] The step referenced by reference number
[0612] Therefore, the partition method is iterated with pp's as the elements of the new partition. This operation is identified by boxes
[0613] For example, if K is 20 and the integrated circuit contains 4,000 pins to be routed, the first partitioning of the pins into groups of about 20 pins each results in approximately 200 pin partitions (pp's). Because 200 is much larger than 20, the pp's are partitioned into sets of about 20 pp's each, resulting in approximately ten (10) sets of pp's. In this example, the number of sets of pp's, ten, is of the same order of magnitude as K; therefore, no further iteration of the partitioning step is necessary.
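The arithmetic of the example in paragraph [0613] can be sketched as follows. This is a minimal illustration, not the patented procedure: the function name `partition_levels` and the `stop_factor` parameter are hypothetical, and the stopping test is an assumed reading of "the same order of magnitude as K" (stop once the number of elements is no more than `stop_factor` times K).

```python
import math


def partition_levels(num_pins, k, stop_factor=5):
    """Return the number of elements at each level of the partition tree.

    Level 0 is the pins, level 1 the pin partitions (pp's), level 2 the
    sets of pp's, and so on. Each level groups about k elements of the
    level below it.

    stop_factor is a hypothetical knob: partitioning is assumed to stop
    once the element count is no more than stop_factor * k, which is one
    possible reading of "the same order of magnitude as K".
    """
    if num_pins < 1 or k < 2 or stop_factor < 1:
        raise ValueError("need num_pins >= 1, k >= 2, stop_factor >= 1")
    counts = [num_pins]
    while counts[-1] > stop_factor * k:
        # Group the current elements into partitions of about k each.
        counts.append(math.ceil(counts[-1] / k))
    return counts


print(partition_levels(4000, 20))  # [4000, 200, 10]
```

With K = 20 and 4,000 pins this reproduces the counts in the text: 200 pp's, then 10 sets of pp's, after which no further partitioning is performed.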
[0614] After the partitioning of the pins, as indicated by boxes
[0615] After creating an MST for each set of the pp's, the partitions of each set of pp's are redefined to “link” the partitions of the sets connected by the edge of the MST. This operation is indicated by box
[0616] The creation of the MSTs and the redefinition of the partitions to link the members of the sets are iterated
[0617] To create a minimal spanning tree, any of the well-known algorithms can be used. The inventors of the present invention have used Steiner's tree with good results.
[0618] The details of the method to create an MST for any set of vertices are discussed in the Minimal Spanning Tree subsection below.
[0619] The partition tree is distinguishable from the minimal spanning tree. The partition tree represents the iterative partitioning of the pins into pp's, the pp's into sets, and the sets into meta-sets, and so on until the highest level of meta-sets is formed. The MST represents the relationship, or interconnection, between the sets and all of the members of any set.
[0620] For instance, at the lowest level, the pins are partitioned into pp's with, on average, approximately K pins belonging to each pp. After assigning the pins to the pp's, an MST is generated for each set whereby the pins of each of the sets are connected to the other pins of the set to minimize the traversal, or spanning, of the pins of the set. Then, each of the sets of the pp's is thus connected, and so on.
[0621] The result of the above operations is one large MST at the top level of the partition tree where each of the vertices of the top-level MST represents, on average, approximately K sets. That is, each node of the top-level MST represents, on average, approximately 20 (the value of K in the example) subnodes, each of which, in turn, represents, on average, about 20 sub-subnodes, and so on. At the leaf level of the MST, each of the pp's represents, on average, about 20 pins. In fact, all of the sets belonging to the same level of the partition tree represent roughly the same number of pins. Consequently, if the same number of nodes of the MST are assigned to each of the multiple processors, then the processors will have approximately the same number of pins to connect. This leads to a balanced workload among the processors and an efficient implementation of parallel processing techniques.
[0622] Furthermore, the routing process itself will be efficient because, as will be explained below, the present invention partitions the pins into clusters of pins near each other.
[0623] Partitioning Method
[0624] The pins of the net are partitioned as discussed below.
[0625] First, from each pin of the net as a center pin, a neighborhood is constructed. Each of the neighborhoods contains at least K pins of the net. The neighborhood is constructed for the center pin as follows:
[0626] a. find the nearest pin to the center pin;
[0627] b. determine the distance (rectilinear distance is used in this example but Euclidean distance can be used) to the nearest pin;
[0628] c. define a bounding box to include the nearest pin;
[0629] d. if any other pins are included within the bounding box, include the other pins in the neighborhood; and
[0630] e. if the neighborhood contains fewer than K pins, then find the next nearest pin (not yet a member of the neighborhood) and repeat steps b to e.
[0631] Referring to FIG. 52, a sample net
[0632] Neighborhood
[0633] Second, the net is covered, or partitioned, with the neighborhoods having the highest ratio of the number of pins in the neighborhood (not already used by another neighborhood) to the geometric area of the neighborhood. This ratio indicates how “clustered” the pins are. Because the number of pins in the neighborhood is approximately K, the determining factor is the geometric area of the neighborhood. A high ratio indicates that the pins of the neighborhood are clustered together within a small area.
On the other hand, a low ratio indicates that the pins of the neighborhood are spread apart from each other.
[0634] The covering of the net is accomplished as follows:
[0635] a. analyze each of the neighborhoods to determine its ratio;
[0636] b. select the neighborhood, among the remaining neighborhoods, with the highest ratio;
[0637] c. let the selected neighborhood cover its pins; and
[0638] d. repeat steps a to c until all of the pins are covered.
[0639] Continuing to refer to FIG. 52, it seems that neighborhood
[0640] Third, after all of the pins have been covered, the center pins of the covering neighborhoods are used to construct pin partitions. The pin partitions are created by taking all of the center pins, and assigning all other pins of the net to the closest center pin. For the purposes of partition construction, the neighborhood definitions are abandoned. The neighborhood definitions were used only to determine the center pins of the partitions.
[0641] Therefore, in the example as illustrated by FIG. 52, assuming that all three neighborhoods
[0642] The net as illustrated by FIGS. 52 and 53 resulted in only three pin partitions (pp's)
[0643] Minimal Spanning Tree and Partition Routing
[0644] Following the construction of the partition tree, the pp's and the meta-sets are organized into minimal spanning trees (MSTs). To construct an MST for a set of pp's, the center pins of each of the pp's are considered as the vertices and the distance between any two pp's is defined as the distance between the closest pins of the two partitions.
[0645] FIG. 53 illustrates three pp's
[0646] Referring to FIG. 53, for the purposes of constructing the MST for the pp's
[0647] Given the partitions and the distances between the partitions, the process of constructing an MST from the given information is well known in the art and will not be discussed here. Professor James A. McHugh provides an adequate overview of the MST construction method in ALGORITHMIC GRAPH THEORY (1990, Prentice-Hall) pp.
[0648] Once an MST is constructed, each pair of connected partitions (as represented by the connected vertices of the MST) is connected as follows:
[0649] a. the two pins which determined the distance between the two partitions are identified;
[0650] b. for each of the two pins, calculate the minimal distance between the pin and any of the other pins of its partition; and
[0651] c. the pin whose just-calculated distance is greater is assigned to the partition of the other pin while retaining its assignment to the original partition.
[0652] Referring again to FIG. 53, assuming that the vertices representing partitions
[0653] Likewise, assuming that the vertices representing partitions
[0654] After the additional assignments of pins
[0655] Similar to the iterative application technique used to partition the pins and the sets of pins, the MST and the above-described partition routing technique can be applied iteratively to effect the same connections between sets of partitions and meta-sets of the sets of partitions.
[0656] IN GENERAL
[0657] Referring now to FIG. 55, an apparatus
[0658] SUMMARY
[0659] The specific algorithms described herein, as well as the basic steps which they represent (even if they are replaced by different algorithms), are designed for implementation in a general purpose computer. Furthermore, each of the algorithms described herein, as well as the basic steps it represents, can be encoded on computer storage media such as CD-ROMs, floppy disks, computer hard drives, and other magnetic, optical, or other machine-readable media, whether alone or in combination with one or more of the algorithms and steps described herein.
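As an illustration of the partition distance defined in paragraph [0644] and the connection steps a to c of paragraphs [0649] to [0651], the following is a minimal sketch only. All function names are hypothetical, pins are assumed to be (x, y) tuples, and rectilinear distance is used as in the partitioning example. Two cases the text does not address are handled by assumption: on a tie the pin of the second partition is shared, and a pin that is alone in its partition is treated as infinitely far from its partition mates.

```python
from itertools import product


def rectilinear(p, q):
    """Rectilinear (Manhattan) distance between two (x, y) pins."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def closest_pair(part_a, part_b):
    """Return (distance, pin_a, pin_b) for the closest pins of two partitions.

    This distance is the one used as the MST edge weight between partitions.
    """
    return min(
        ((rectilinear(a, b), a, b) for a, b in product(part_a, part_b)),
        key=lambda t: t[0],
    )


def nearest_in_own_partition(pin, partition):
    """Minimal distance from pin to any other pin of its own partition."""
    others = [q for q in partition if q != pin]
    # Assumption: a lone pin has no partition mate, so use infinity.
    return min((rectilinear(pin, q) for q in others), default=float("inf"))


def link_partitions(part_a, part_b):
    """Apply steps a to c to two partitions joined by an MST edge."""
    # Step a: identify the two pins that determined the partition distance.
    _, pin_a, pin_b = closest_pair(part_a, part_b)
    # Step b: distance from each of those pins to its nearest partition mate.
    dist_a = nearest_in_own_partition(pin_a, part_a)
    dist_b = nearest_in_own_partition(pin_b, part_b)
    new_a, new_b = list(part_a), list(part_b)
    # Step c: the pin with the greater distance joins the other partition
    # while staying in its own. Assumption: ties share pin_b.
    if dist_a > dist_b:
        new_b.append(pin_a)
    else:
        new_a.append(pin_b)
    return new_a, new_b


part_a = [(0, 0), (1, 0), (4, 0)]
part_b = [(6, 0), (7, 0)]
print(closest_pair(part_a, part_b))     # (2, (4, 0), (6, 0))
print(link_partitions(part_a, part_b))
```

In this example pin (4, 0) is 3 units from its nearest partition mate while pin (6, 0) is only 1 unit from its own, so (4, 0) is additionally assigned to the second partition, which links the two partitions through a shared pin.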
[0660] Although the present invention has been described in detail with regard to the exemplary embodiments and drawings thereof, it should be apparent to those skilled in the art that various adaptations and modifications of the present invention may be accomplished without departing from the spirit and the scope of the invention. Thus, by way of example and not of limitation, the present invention is discussed as illustrated by the figures. Accordingly, the invention is not limited to the precise embodiment shown in the drawings and described in detail hereinabove.
[0661] In the following claims, those elements which do not include the words “means for” are intended not to be interpreted under 35 U.S.C. §112 ¶ 6.