Pin assignment for multi-FPGA systems

By Victoria Richardson,2014-05-27 14:58
8 views 0
Pin assignment for multi-FPGA systems





     Pin Assignment for Multi-FPGA Systems

     Scott Hauck and Gaetano Borriello, Member, IEEE

     Abstract?ª Multi-FPGA systems have tremendous potential, providing a high-performance computing substrate for many different applications. One of the keys to achieving this potential is a complete, automatic mapping solution that creates high-quality mappings in the shortest possible time. In this paper, we consider one step in this process, the assignment of inter-FPGA signals to speci?c I/O pins on the FPGA??s in a multi-FPGA system. We show that this problem can neither be handled by pin assignment methods developed for other applications nor standard routing algorithms. Although current mapping systems ignore this issue, we show that an intelligent pin assignment method can achieve both quality and mapping speed improvements over random approaches. Intelligent pin assignment methods already exist for multi-FPGA systems, but are restricted to topologies where logic-bearing FPGA??s cannot be directly connected. In this paper, we provide three new algorithms for the pin assignment of multi-FPGA systems with arbitrary topologies. We compare these approaches on several mappings to current multi-FPGA systems, and show that the force-directed approach produces better mappings, in signi?cantly shorter time, than any of the other approaches. Index Terms?ª Logic emulation, multi-FPGA systems, pin assignment, recon?gurable computing, routing.



     Fig. 1. Two views of the inter-FPGA routing problem: As a complex graph including internal resources (a), and a more abstract graph with FPGA??s as nodes (b).

     I. INTRODUCTION HERE is currently tremendous interest in the development of computing platforms from standard FPGA??s. These systems harness multiple FPGA??s, connected in a ?xed pattern, to implement complex logic structures. In order to use such a system effectively, an automatic mapping system must be developed to translate hardware speci?cations into programming ?les for the individual FPGA??s in the system. This mapping system differs from the standard CAD algorithms used for other technologies because of two major constraints. First, the connections between chips are ?xed by the architecture of the multi-FPGA system, and thus any interchip routing must conform to this

    topology. Second, a multi-FPGA system is ready to use seconds after the mapping has been developed. Thus, the time to create a mapping is just as important as the resulting quality. The process of mapping logic onto a multi-FPGA system usually goes through the following stages. First, logic synthesis and technology mapping convert the logic into functions that ?t the logic blocks in the FPGA??s. Second, partitioning and

     Manuscript received March 27, 1996. This work was supported in part by the Defense Advanced Research Projects Agency under Contract N00014-J91-4041. The work of S. Hauck was supported by an AT&T fellowship. This paper was recommended by Associate Editor K. Cheng. S. Hauck is with the Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208 USA. G. Borriello is with the Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195 USA. Publisher Item Identi?er S 0278-0070(97)09011-8.


     global placement assign the logic to speci?c FPGA??s in the system. Third, global routing and pin assignment route the inter-FPGA signals. Fourth, single-chip placement and routing tools develop complete mappings for the individual FPGA??s in the system. One way of performing the global routing and pin assignment step for multi-FPGA systems is to use standard single-FPGA routing algorithms. In this approach, the FPGA??s are viewed as complex entities, explicitly modeling both internal routing resources and individual IOB??s connected by board traces [Fig. 1(a)]. The routing algorithm would be used to determine both which intermediate FPGA??s to use for longdistance routing (i.e., a signal from FPGA to would be assigned to use either FPGA or ), as well as which individual IOB??s to route through. Unfortunately, although the logic has already been assigned to FPGA??s during partitioning and global placement, the assignment of logic into individual logic blocks will not be done until the ?nal step, single-chip placement and routing. Thus, since there is no speci?c source or sink for the individual routes, standard routing algorithms cannot be applied. The approach generally taken is to abstract entire FPGA??s into single nodes in the routing graph, with the arcs between the nodes representing bundles of traces. This solves the unassigned source and sink problem mentioned above since partitioning has already assigned the logic to speci?c FPGA??s. Unfortunately, the routing algorithm can no longer determine which individual traces a signal should use since those details have been abstracted away. It is this problem, the assignment of a mapping??s logic pins to FPGA IOB??s, that this paper addresses. The techniques developed in this paper, when combined with solutions we have developed for other multiFPGA mapping and

    architecture issues [11], [10], can be combined to create a complete multi-FPGA system solution for rapid-prototyping and other domains [9]. II. PROBLEM DEFINITION In the pin assignment problem for multi-FPGA systems, we start with a multi-FPGA system and a logic mapping to be

     0278?C0070/97$10.00 ? 1997 IEEE



     Fig. 2. Terminology for the pin assignment problem.

     t onto this system. The logic mapping consists of partitions of logic, interconnected by external signals, as well as logic pins which represent the connection of a partition to a signal. Similarly, the multi-FPGA system consists of FPGA chips, interconnected by traces on the circuit board, as well as IOB??s which represent the connection of a trace to a chip. There is also a preexisting assignment of partitions to chips, where each partition must be assigned to a single chip, and each chip can be assigned at most one partition. The pin assignment process for multi-FPGA systems assigns all signals to traces, and logic pins to IOB??s (see Fig. 2). Speci?cally, each signal , which is connected to partitions , must be assigned to exactly one trace . This trace must be connected to chips such that each partition is assigned to a chip . Each trace can be assigned at most one signal. By assigning signals to traces, we also ?x the placement of logic pins to IOB??s, constraining the placement of the FPGA??s in the system. Note that under this de?nition, a signal can only move between FPGA??s that are directly connected (i.e., cannot take multiple hops from the source to the destination). This means that long-distance signals must be split into multiple shorter interconnected signals, where each short signal represents the passage of the signal between connected FPGA??s. We judge the quality of a pin assignment by its impact on the routing resource usage in the FPGA??s in the system after the logic mapping has been completely mapped to the multi-FPGA system, with a good pin assignment requiring a minimum of routing resources. In this paper, we estimate the routing resource usage in a multi-FPGA system by the cumulative source to sink delay of all signals in all FPGA??s in the system. This metric will penalize those assignments that require excessive usage of routing resources in the FPGA, which will decrease performance and reduce the amount of logic that can be ?t into the system. III. EXISTING PIN ASSIGNMENT APPROACHES The problem of pin assignment has been studied in many other domains, including routing channels in ASIC??s [3], general routing in cell-based designs [24], [5], and custom printed circuit boards (PCB??s) [17]. Unfortunately, these approaches are unsuitable to the multi-FPGA pin assignment problem. First, these algorithms seek to minimize the length of connections between cells or chips, while in a multi-FPGA system, the connections between chips

    are ?xed. The standard approaches also assume that pin assignment has no effect on the quality of the logic element implementations, while in the multi-FPGA problem, the primary impact of a pin assignment is on the quality of the mapping to the logic elements (individual FPGA??s in this case). Because of these differences, there is no obvious way to adapt existing pin

     assignment approaches for other technologies to the multiFPGA system problem. There are existing methods of pin assignment for certain multi-FPGA topologies. These approaches require that all logic-bearing FPGA??s are connected only through FPIC??s,1 crossbar chips, or routing-only FPGA??s. It is assumed that the routing chips can handle any assignment of signals to their IOB??s equally well, and thus the only consideration for pin assignment is the routeability of the logic-bearing chips. Since no logic-bearing chips are directly connected, we can place these FPGA??s independently. These placements induce a pin assignment onto the routing chips. Since the routing chips can handle any pin connection pattern equally well, the routing chips need only be con?gured to implement this pattern, and the mapping to the multi-FPGA system is complete. While most such approaches require a separate global routing step to determine which routing chips to send signals through [2], [13], [16], it is possible to handle some carefully constructed topologies without global routing [4]. The existing pin assignment methods are adequate for systems where logic-bearing FPGA??s are never directly connected. However, the majority of multi-FPGA systems have some or all of their logic-bearing FPGA??s directly connected [9], and thus cannot be handled by existing techniques. Today??s systems address this problem by simply creating a random pin assignment, or by requiring manual assignment by the user. For random assignments, the global routing step decides which FPGA??s to route through based purely on interchip concerns, and then pin assignment randomly assigns the signals to traces connecting the appropriate FPGA??s. By using a random pin assignment, the system gives up a potential optimization opportunity, degrading both routing resource usage and mapping time. A random pin assignment is likely to be a poor pin assignment, requiring extra routing inside the FPGA??s to compensate for the bad pin positioning. This will also make it harder to ?nd a routing for the individual FPGA??s, slowing down the single chip mapping tools. As we will show, this can cause the routing step to take up to four times as long simply because of the poor pin assignment. In the next sections, we present three new algorithms for performing pin assignment for arbitrary multi-FPGA topologies. These algorithms take into consideration the structure of the logic assigned to the FPGA??s in the system in order to develop an intelligent assignment of signals to traces. The ?rst methods demonstrate how exact placement

    and routing information for the individual FPGA??s in the system can be used to improve pin assignment quality. The ?nal method then demonstrates how less exact information can be used, maintaining the pin assignment quality while greatly improving the run times. IV. SEQUENTIAL PLACEMENT PIN ASSIGNMENT One approach to pin assignment is to allow the FPGA placement tool to determine its own assignment. This requires that the placement tool allow the user to restrict the locations

     1 An FPIC [1], [12] is a purely routing device, allowing arbitrary interconnections between its I/O pins.






     Fig. 3. Example of pin assignment by sequential placement. Partitioning and global placement assign logic to different FPGA??s (a), putting connected logic blocks A and B in adjacent FPGA??s. The mapping to FPGA 1 is then placed (b), with the restriction that the signal from A to B must go on a trace from 1 to 2. After placement, the assignment of this signal propagates to the adjacent FPGA??s (c), forcing a speci?c pin assignment for the signal from A to B . FPGA 2 would then be placed.



     Fig. 4. Checkerboard (a) and wavefront (b) pin assignment placement orders.

     to which a logic pin can be assigned (a feature available in tools such as the Xilinx APR and PPR placement and routing tools [22]). With such a system, interchip signals are restricted to only those IOB??s that are connected to the proper destinations. Once the placement tool determines the pin assignment for one FPGA, this assignment is propagated to the attached FPGA??s (see Fig. 3). Unfortunately, this limits the number of placement runs that can be performed in parallel. Speci?cally, the assignment from one FPGA is propagated to adjacent FPGA??s only after that FPGA has been placed, and thus no two adjacent FPGA??s can be placed simultaneously. Since the placement and routing steps can be the most timeconsuming steps in the mapping process, achieving the greatest parallelism in this task can be critical. An algorithm for achieving the highest parallelism during placement, while allowing the placement tool to determine the pin assignment, is to ?nd a minimum graph coloring of the multi-FPGA??s system (FPGA??s are nodes, board traces are edges). Since the structure of a multi-FPGA system is usually prede?ned, the coloring can be precomputed, and thus the

    inef?ciency of ?nding a graph coloring is not important. Then, all of the FPGA??s assigned the same color can be placed at the same time since adjacent FPGA??s cannot be assigned the same color. For example, in a four-way mesh (every FPGA is connected to the FPGA directly adjacent horizontally and vertically), the FPGA??s could be placed in a checkerboard pattern, with half handled in the ?rst iteration, and half in the second [Fig. 4(a)]. One problem with the maximum parallelism (or ??checkerboard??) method is that, while the pin assignment for the FPGA??s placed ?rst will be very good since the placer is mostly free in its placement choices, other FPGA??s may be placed fairly poorly. For example, in the mesh shown in

     Fig. 4(a), half of the FPGA??s have their pin assignments dictated by the placement of their neighbors, and thus there is no attention paid to the logic within these FPGA??s. There is an alternative to the checkerboard algorithm, which trades longer mapping run times for better results. The idea is to make sure that FPGA??s are not mapped completely independently of one another. In the ?rst step, a single FPGA is placed. The FPGA that is most constrained by the system??s architecture (i.e., by external interfaces) is normally chosen since it should bene?t most from avoiding extra pin constraints. In succeeding steps, neighbors of previously placed FPGA??s are chosen to be placed next, with as many FPGA??s placed as possible without simultaneously placing interconnected FPGA??s. For example, in a four-way mesh, where the ?rst FPGA routed is in the upper left corner, the mapping process would proceed in a diagonal wave from upper left to lower right [Fig. 4(b)]. In this way, mappings ??grow?? from a single ??seed?? assignment, and will be more related to one another than in the checkerboard approach, hopefully easing the mapping tasks for all of the FPGA??s. Unfortunately, even this ??wavefront?? placement order may not generate good pin assignments. While the individual FPGA placements attempt to ?nd local optimums, global concerns are largely ignored. For example, while signals that are used together in an FPGA will be placed together in the mapping, signals used together in an unplaced neighbor may be scattered (Fig. 5). There is also a signi?cant performance overhead for the checkerboard and wavefront methods (hereafter referred to jointly as ??sequential placement methods??) since they sequentialize the very time-consuming step of placement for different FPGA??s. There are some topologies that cannot be handled by sequential placement approaches. Speci?cally, when the traces that connect FPGA??s are allowed to connect more than two FPGA??s, there is the potential for con?icts to arise which can cause failures with sequential placement methods. For example, consider the topology in Fig. 6. Assume that we wish to map a circuit with one signal connected between and , seven between and , and seven between and . In this situation, at least one

    signal connected between each pair of FPGA??s must be placed on a three-destination trace. However,



     Fig. 5. Example of the quality problems with sequential placement-based pin assignments. The logic at left must be assigned to FPGA??s 1 and 2. If FPGA 1 is placed ?rst, the signals leading to C will be grouped together, but A and B may be separated. Changing the placement order will group together A and B , but could separate E and F . A better placement is shown at right.

     Fig. 6. Example of a multi-FPGA topology that cannot be handled by sequential placement techniques, but which can be found in current systems [20], [7], [14], [6], [21].

     regardless of the order, the ?rst FPGA placed can use all of the three-destination traces for its own signals, causing the assignment to fail. V. FORCE-DIRECTED PIN ASSIGNMENT As we have shown, pin assignment via sequential placement can be slow, cannot optimize globally, and may not work at all for some topologies. What is necessary is a more global approach which optimizes the entire mapping, while avoiding sequentializing the placement step. In this section, we will present one such approach, and give a quantitative comparison with the approaches presented earlier. Intuitively, the best approach to pin assignment would be to place all FPGA??s simultaneously, with the placement runs communicating with each other to balance the pin assignment demands of each FPGA. In this way, a global optimum could be quickly reached. Unfortunately, the communication necessary to perform this task would be prohibitive. The forcedirected approach discussed here is similar to simultaneous placement, but will perform the assignment on a single machine within a single process. Obviously, with the placement of a single FPGA consuming considerable CPU time, complete placement of all FPGA??s simultaneously on a single processor is impractical, and thus simpli?cation of the problem is key to a workable solution. Force-directed pin assignment uses force-directed placement of the individual FPGA??s [18]. In force-directed placement, the signals that connect logic in a mapping are replaced by springs between the signal??s source and each sink, and the placement process consists of seeking a minimum net force placement of the logic. To ?nd this minimum net force con?guration, and thus minimize wire length in the resulting mapping, the software randomly chooses a logic block and moves it to its minimum net force location. This greedy process continues until a local optimum is found, at which point the software accepts that con?guration.

     Force-directed placement may seem to be a poor choice for pin assignment, and is generally felt to be inferior to simulated annealing

    for placement. Two reasons for this are the dif?culty force-directed placement has optimizing for goals other than wire length, and the inaccuracy of the spring approximation to routing costs. However, force-directed placement can handle all of the optimization tasks involved in pin assignment, and we exploit properties of the spring metric in order to ef?ciently handle multi-FPGA systems. Also, force-directed placement is much faster than simulated annealing, and the time to create a mapping is an important issue for multi-FPGA systems. As implied earlier, we will not simply place individual FPGA??s, but will, in fact, use force-directed placement simultaneously on all FPGA??s in the system. To do this ef?ciently, we make two alterations to the basic force-directed algorithm. 1) We do not restrict logic functions to nonshared, discrete locations, but instead allow them to be freely placed into any location in the FPGA with no regard to congestion or physical logic block boundaries (however, logic pins are constrained to exact IOB??s, and logic pins cannot share a single IOB location). 2) We assume that all logic functions are always at their optimal positions. While the second alteration is simply a change in the movement order of nodes, the ?rst change could cause a signi?cant loss in accuracy. However, the resulting degradation of the local optimum will be more than made up for by the ability to seek the global optimum. Also, while this allows logic functions to share physical locations, something that cannot be done in valid FPGA mappings, force-directed placement is used only to determine pin assignments, and the logic functions will be placed later by standard FPGA placement tools. Thus, the ?nal placements will have logic functions assigned to unique, discrete logic block locations. Note that a similar assumption has proven to be a reasonable tradeoff in the placement of standard cells via simulated annealing [19]. By making the two assumptions just given, we now have the opportunity to greatly simplify the mapping process. We can examine the system of springs built for the circuit mapping, and use the laws of physics to remove nodes corresponding to logic functions, leaving only logic pins. As shown in the Appendix, as well as in the example of Fig. 7, the springs connected between an internal logic node and its neighbors can be replaced with a set of springs connected between the node??s neighbors while maintaining the exact same forces on the



     Fig. 7. Example of spring simpli?cation rules. The source circuit at left has logic node others are merged at right.


     replaced at center, and any springs created in parallel with

     remaining nodes. By repeatedly applying these simpli?cation rules to the logic nodes in the system, we end up with a mapping consisting only of logic pins, with spring connections that act identically to the complete mapping they replace. In this way, we simplify the problem enough to allow the pin assignment of a large system of FPGA??s to be performed ef?ciently. There are some details of the force-directed algorithm that need to be considered here. First, ef?ciency in the individual move calculations is critical. To determine the best destination of a node during a move, it is necessary to calculate the change in net force for assigning the signal to each trace that goes between the FPGA??s the signal connects. This is necessary because there may not be any relationship between the location of an IOB on one FPGA and the location of the IOB the trace goes to on another FPGA (in fact, it has been shown that there is some advantage to be gained by scattering IOB connections [11]). As shown in (1), we can speed up the process of determining the net force at each location by reformulating the standard force equation so that the information from all springs connected to a node can be combined at the beginning of a move. Then, the calculation of the net force at each location requires only a small amount of computation. Equation (1) shows the reformulation of the standard force equation SpringConst pos pos pos SpringConst (1)

     In fact, our current system allows different FPGA architectures and sizes to be intermingled within a single multi-FPGA system. We can also accommodate crossbar chips and FPIC??s, chips where minimizing wire length is unnecessary, by setting the spring constant of all springs within such chips to zero. Another important consideration is support for prede?ned pin assignments for special signals. Speci?cally, signals such as clocks, reset signals, ?xed global communication lines, and host interfaces need to be assigned to speci?c IOB locations. Handling these constraints in the pin assignment tool simply requires permanently locking these signals into their proper locations. VI. EXPERIMENTAL RESULTS To compare the various pin assignment approaches, we tested each approach on mappings for four different current systems. These include a systolic DNA comparison circuit for Splash [15], a telecommunications circuit for the NTT board [23], a calorimeter circuit for DECPeRLe-1 [21], and a RISC processor con?gured for logic simulation on the Marc1 system [14]. The pin assignment systems compared are: ??Random,?? which randomly assigns connections to traces; ??Checkerboard?? and ??Wavefront,?? which use sequential placement runs to do pin assignment; and ??Force,?? which uses the force-directed pin assignment technique. The results include both mapping time (Table I) and resulting wirelength (Table II). ??Multi?? is CPU time on SPARC-10??s, and includes the time to perform pin assignment as well as individual FPGA place and routes, and assumes that enough machines

    are available to achieve maximum parallelism. ??Uni,?? the time it takes to complete all of these tasks on a single machine, is also included. The reason the ??Multi?? category is included is that most users of multi-FPGA systems use multiple machines to perform the placement and routing of the individual FPGA??s in the system. In this way, signi?cant performance improvements can be obtained without requiring any special parallelization. The sequential placement techniques only sequentialize the placement step, while routing is allowed to be performed in parallel with subsequent placements and routings. Also, these placement attempts are begun as soon as possible so that, for example, in the checkerboard algorithm, we do not need to wait to begin placing FPGA??s of one color until all FPGA??s of the previous color are placed, but instead can proceed with a placement once all of its direct neighbors in the previous color(s) have been completed. The random approach

     SpringConst pos

     A ?nal extension necessary is software support to avoid the problems found in the sequential approaches on some topologies. Just as the topology in Fig. 6 caused problems for the other approaches, it could cause dif?culties with the forcedirected approach. The solution is to ensure that a signal is never moved into a trace unless it connects to the FPGA??s the signal moves between, and any signal already at that location can be moved to another position under the same restrictions. As mentioned earlier, the force-directed approach is capable of handling most of the optimization goals necessary for pin assignment. First, since little information about the individual FPGA architectures is necessary beyond the location of the IOB??s, it is very easy to port this software to different FPGA??s.












     is assumed to be able to generate a pin assignment in zero time, so its time value is simply the longest time to place and route any

Report this document

For any questions or suggestions please email