    M2M-OCT-SW8: Octal 8-Port Myrinet-SAN Switch

    Recommended Topologies

     “White Paper”

    for distribution to selected customers

    Danny Cohen & Chuck Seitz

    Myricom, Inc.

    3 July 1997

Background

Myricom would like to make it easier and more cost-effective to build large Myrinet-SAN clusters than it is now with the M2M-DUAL-SW8, and to make them physically more robust than they are with M2M-Ys (cable splitters) and similar accessories. The principal way in which Myricom can improve the performance/cost of the switch network for large clusters is to provide more highly integrated components. Switches with 16 ports will help, but they will not be ready to ship in standard products until November at the earliest.

Myricom is supplying M2M-DUAL-SW8 Myrinet-SAN switches at a (US list) price of $125/port, but for large clusters the cost to our customers of the SAN cables and of accessories such as splitters is similar to that of the switches themselves. The AMP microstrip SAN cables are technologically remarkable, and there is not much we can do about their cost and price. However, we can implement many of the inter-switch links of large networks on circuit boards internal to a more highly integrated switch component.

In other words, we can do a better job for our customers by supplying “higher-level,” more powerful switch components. At a given cost, we can provide a higher-performance (higher bisection data rate) network with fewer individual components, each containing more switches, and with significantly less inter-switch cabling.

Therefore, we have been working on a plan for a family of Myrinet-SAN-switch products that will be particularly attractive for large clusters. The basic question is: “What topology?”

© 1997 Myricom, Inc.

Full-BBW Topologies

The performance metric that we sought to maximize in defining a higher-level switch component is the network Bisection Bandwidth (BBW). The BBW is measured in links and is defined as the minimum number of links crossing any cut that bisects the hosts (half of the hosts on one side of the cut, the other half on the other side). We consider a topology to have full BBW when its BBW equals half the number of nodes, e.g., when every bisecting cut of a 128-node network crosses at least 64 links. Except for expander graphs and other exotic topologies, for which BBW must be defined differently, the full BBW is also the maximum possible: any cut across the host links between any half of the hosts and the switch network bisects the hosts and cuts a number of links equal to half the number of hosts, so under this definition the BBW can never exceed half the number of hosts.
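In symbols (our notation, not the paper's: H is the set of N hosts, V the set of all hosts and switches, and E the set of links), the definition and the bound just argued read:

```latex
\mathrm{BBW} \;=\; \min_{\substack{S \subseteq V \\ |S \cap H| = N/2}}
  \bigl|\{(u,v) \in E : u \in S,\ v \notin S\}\bigr| \;\le\; \frac{N}{2}
```

The bound comes from choosing S to be any N/2 hosts alone, a cut that crosses exactly their N/2 host links. A full-BBW network attains the bound; for N = 128 that is 64 links.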

In computing practice, a full-BBW network with a suitable choice of routes allows the hosts to perform any data permutation, with all hosts sending and receiving packets at once at the full link data rate. For chaotic traffic patterns, the significance of the BBW is that the aggregate network capacity increases statistically with, and is largely determined by, the BBW. Thus, the network BBW is an important practical metric for tightly coupled computations on multicomputers.

Inasmuch as the path-formation latency of Myrinet-SAN switches is so much smaller than software latencies, the maximal or average distance, measured in switches traversed, is a relatively unimportant metric of the network topology. In the topologies illustrated below, the number of nodes is denoted as N, the minimal-bisection data rate (in links) as BBW, the maximal distance (graph diameter, in switches) as D, and the average distance as Dav.

There is a tradeoff between performance and cost in switch-network topologies. The proposed higher-level switch component is not only an ideal building block for full-BBW topologies, but can also be used in somewhat lower-cost topologies in which the BBW is a fraction (such as 1/2) of the full BBW. The BBW actually required for multicomputers and clusters depends on the speed of the hosts relative to the links, and on the computations performed.

Suppose, for example, that a distributed computation on a Pentium Pro cluster consumes 20% of the available 528 MB/s of host-memory bandwidth for moving data between hosts: 10% for sending and 10% for receiving. By today’s standards, this computation would be considered quite communication-intensive. Since 10% of 528 MB/s is about 53 MB/s, the 160 MB/s Myrinet channels to and from each host would be conveying packet data about 1/3 of the time. If the packet destinations were randomly and uniformly distributed among the other hosts, the bisections of a full-BBW network would be at about 1/3 of capacity, and there would be relatively little internal blocking in the switch network. If, however, the network had only 1/2 of the full BBW, the bisections would be at about 2/3 of capacity, close to the point at which internal blocking can be expected to throttle the aggregate network throughput.
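The arithmetic behind this example can be sketched in a few lines of Python. The function names are hypothetical, and `bisection_load` simply encodes the paragraph's proportional argument (at full BBW the bisection is loaded about as heavily as each channel, and halving the BBW doubles the load); it is a back-of-the-envelope model under that assumption, not a simulation of blocking.

```python
# Figures taken from the example in the text.
HOST_MEMORY_MB_S = 528.0  # Pentium Pro host-memory bandwidth
CHANNEL_MB_S = 160.0      # Myrinet channel data rate in each direction


def channel_utilization(fraction_of_memory_bw: float) -> float:
    """Fraction of time a host's outgoing (or incoming) channel carries
    packet data when that share of memory bandwidth goes to sending
    (or receiving)."""
    return fraction_of_memory_bw * HOST_MEMORY_MB_S / CHANNEL_MB_S


def bisection_load(channel_util: float, bbw_fraction: float) -> float:
    """Bisection load under the proportional reasoning of the text: at full
    BBW it matches the channel utilization, and it scales inversely with
    the fraction of full BBW provided. Not a queueing model."""
    return channel_util / bbw_fraction


util = channel_utilization(0.10)
print(f"channel busy:       {util:.2f}")                       # 0.33
print(f"full-BBW bisection: {bisection_load(util, 1.0):.2f}")  # 0.33
print(f"half-BBW bisection: {bisection_load(util, 0.5):.2f}")  # 0.66
```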

The M2M-OCT-SW8 Switch


The topology. The M2M-OCT-SW8 contains eight 8-port switches connected as a bi-graph with two stages of four switches each. Each 8-port switch is connected to all four of the 8-port switches of the other stage, leaving four ports on each switch for external connections.

Physical and cabling characteristics. In the orientation of the diagram above, the M2M-OCT-SW8 has on its left side 16 Myrinet-SAN connectors, each carrying only one link (its A link, with no B link). These A-only connectors allow 16 hosts to be connected without any accessories. AMP has completed the custom tooling for Myrinet-SAN cables with locking cable ends, which will prevent cable ends from becoming disconnected from the SAN connectors due to vibration or mishandling. Cable splitters would then be the “weak link,” so we are now particularly motivated to eliminate them.

On its right side, the M2M-OCT-SW8 has 8 Myrinet-SAN connectors, each carrying two links (A and B). This double-link connector and cable arrangement is very efficient for the inter-switch connections in large configurations.

The M2M-OCT-SW8 will be 17” wide and 1U (1.75”) high, with an internal power supply, and will be designed to mount in a standard 19” rack. It will draw about 20 W.

As a switch network for 16 hosts. The M2M-OCT-SW8 provides a full BBW of 8 links. One 8-link cut can be visualized as a horizontal cut through the center of the diagram above, but there are many other cuts that leave half of the hosts on one side and the other half on the other side. You can take our word for it that no bisecting cut crosses fewer than 8 links.


Thus, the M2M-OCT-SW8 is an ideal switch for a 16-host cluster, even with the ports on its right side unused.
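The 8-link claim is small enough to check exhaustively. The Python sketch below models one M2M-OCT-SW8 as described above; the switch names, the function names, and the assumption that the right-side host links are spread evenly (two or four per right-stage switch) are illustrative, not part of the product description. Because every host hangs off a single switch port, it suffices to enumerate the 2^8 ways of splitting the switches and, for each split, add the minimum number of host links that must be cut to balance the hosts.

```python
from itertools import product


def octal_sw8(hosts_per_right_switch=0):
    """Hypothetical model of one M2M-OCT-SW8: four left-stage and four
    right-stage 8-port switches, each left switch linked once to each right
    switch, four hosts on each left switch, and hosts_per_right_switch hosts
    on each right switch (0, 2, or 4 for 16, 24, or 32 hosts)."""
    left = [f"L{i}" for i in range(4)]
    right = [f"R{i}" for i in range(4)]
    links = [(a, b) for a in left for b in right]  # the 16 inter-stage links
    hosts_on = {s: 4 for s in left}
    hosts_on.update({s: hosts_per_right_switch for s in right})
    return left + right, links, hosts_on


def bisection_width(switches, links, hosts_on):
    """Exhaustive bisection width, in links.

    For each two-way split of the switches, count the switch-to-switch
    links that cross it. Each host has one link to one switch, so the
    cheapest way to put exactly N/2 hosts on side 0 is to leave every host
    beside its switch and move |hosts beside side-0 switches - N/2| hosts
    across, cutting one host link per moved host.
    """
    n_hosts = sum(hosts_on.values())
    assert n_hosts % 2 == 0, "a bisection needs an even number of hosts"
    best = None
    for sides in product((0, 1), repeat=len(switches)):
        side = dict(zip(switches, sides))
        cut = sum(1 for a, b in links if side[a] != side[b])
        beside_side0 = sum(hosts_on[s] for s in switches if side[s] == 0)
        cut += abs(beside_side0 - n_hosts // 2)
        if best is None or cut < best:
            best = cut
    return best


for per_right in (0, 2, 4):
    switches, links, hosts_on = octal_sw8(per_right)
    n = sum(hosts_on.values())
    print(f"{n} hosts: BBW = {bisection_width(switches, links, hosts_on)} links")
# 16 hosts: BBW = 8 links
# 24 hosts: BBW = 8 links
# 32 hosts: BBW = 8 links
```

Under these assumptions all three arrangements come out at 8 links, the BBW quoted for the 16-, 24-, and 32-host configurations of a single M2M-OCT-SW8.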

For each of the topologies presented here, we include a tabulation of the components required for the cluster: first the interfaces and the cables between the interfaces and the switch network, then the switches, inter-switch cables, and any accessories required. The total cost will vary in actual implementations because the cable lengths depend on the physical configuration, i.e., the packaging, the relative position of cabinets, etc. The cost of a 5-foot SAN cable is used as the expected average cable cost. These tables use current prices, which are subject to change without notice.
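The tabulations follow a simple pattern that can be sketched as a Python function. The function and parameter names are hypothetical; the unit prices are the ones used in the tables of this paper, and, as in the tables, every host cable and every inter-switch cable is priced as a 5-foot SAN cable.

```python
# Unit prices (US list) used in the tables of this paper.
PRICE = {
    "interface": 1300,  # Myrinet-SAN/PCI interface
    "cable": 140,       # 5-foot SAN cable
    "octal": 6000,      # M2M-OCT-SW8
    "dual": 2000,       # M2M-DUAL-SW8
    "splitter": 40,     # M2M-Y
}


def configuration_cost(n_hosts, octals, duals=0, switch_cables=0, splitters=0):
    """Return (total, cost_per_host) in US dollars for one configuration.

    Each host needs one interface and one host cable; switch/switch cables
    are priced like host cables (5-foot SAN cable), as in the tables.
    """
    host_side = n_hosts * (PRICE["interface"] + PRICE["cable"])
    switch_side = (octals * PRICE["octal"]
                   + duals * PRICE["dual"]
                   + switch_cables * PRICE["cable"]
                   + splitters * PRICE["splitter"])
    total = host_side + switch_side
    return total, total / n_hosts


print(configuration_cost(16, octals=1))                     # (29040, 1815.0)
print(configuration_cost(128, octals=8, switch_cables=28))  # (236240, 1845.625)
```

Separating the host-side cost (interface plus cable, $1,440 per host) from the switch-side cost makes it easy to see that the switch network is a small fraction of each total.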

N = 16, BBW = 8 (full), D = 3, Dav = 2.5

Qty  Component                          Each     Total     Subtotal
16   Myrinet-SAN/PCI interface          $1,300   $20,800
16   5-foot SAN cable                   $140     $2,240    $23,040
1    Octal 8-port Myrinet-SAN switch    $6,000   $6,000    $6,000
     Total (cost per host = $1,815)                        $29,040
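The D and Dav values above can be checked with a breadth-first search over the switches. This Python sketch uses a hypothetical model of the octal (left-stage switches L0..L3, right-stage switches R0..R3, four hosts per left switch). The paper does not say how Dav is averaged; the sketch assumes it is taken over all N × N ordered host pairs, with a host paired with itself counted as crossing one switch, because that convention reproduces the 2.5 above and the 2.25 quoted for the 32-host arrangement (four hosts on every switch).

```python
from collections import deque


def octal_sw8_graph(hosts_per_right_switch=0):
    """Hypothetical model of one M2M-OCT-SW8: switch adjacency (each of
    L0..L3 linked once to each of R0..R3) plus, for every host, the switch
    it is attached to (four per left switch, hosts_per_right_switch per
    right switch)."""
    left = [f"L{i}" for i in range(4)]
    right = [f"R{i}" for i in range(4)]
    adj = {s: set() for s in left + right}
    for a in left:
        for b in right:
            adj[a].add(b)
            adj[b].add(a)
    host_switch = [s for s in left for _ in range(4)]
    host_switch += [s for s in right for _ in range(hosts_per_right_switch)]
    return adj, host_switch


def switches_traversed(adj, src):
    """Breadth-first search: switches on a shortest path from src to each switch."""
    dist = {src: 1}  # a path that stays on src still crosses one switch
    queue = deque([src])
    while queue:
        s = queue.popleft()
        for t in adj[s]:
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return dist


def diameter_and_average(adj, host_switch):
    """D and Dav in switches traversed, averaged over all N*N ordered
    host pairs (assumed convention; a host paired with itself counts 1)."""
    table = {s: switches_traversed(adj, s) for s in adj}
    d = [table[a][b] for a in host_switch for b in host_switch]
    return max(d), sum(d) / len(d)


print(diameter_and_average(*octal_sw8_graph(0)))  # 16 hosts: (3, 2.5)
print(diameter_and_average(*octal_sw8_graph(4)))  # 32 hosts: (3, 2.25)
```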

For comparison, here are the Myrinet-SAN components that would be required to build the equivalent of an M2M-OCT-SW8 (priced at $6,000) from M2M-DUAL-SW8 switches, and their cost:

Qty  Item                                         Price    Cost
4    M2M-DUAL-SW8 Dual 8-port Myrinet-SAN switch  $2,000   $8,000
8    M2M-Y SAN splitter                           $40      $320
4    M2M-F A-B Flip                               $40      $160
8    M2M-CB-05 5-foot SAN cable                   $140     $1,120
     Total                                                 $9,600

M2M-OCT-SW8 as a switch network for 24 or 32 hosts. A single M2M-OCT-SW8 can support 24 hosts (without M2M-Y accessories) simply by connecting 8 hosts to the A links of the SAN connectors on the right:

N = 24, BBW = 8 (2/3 full), D = 3, Dav = 2.22

Qty  Component                          Each     Total     Subtotal
24   Myrinet-SAN/PCI interface          $1,300   $31,200
24   5-foot SAN cable                   $140     $3,360    $34,560
1    Octal 8-port Myrinet-SAN switch    $6,000   $6,000    $6,000
     Total (cost per host = $1,690)                        $40,560

A single M2M-OCT-SW8 can also support 32 hosts by splitting the A and B links of the 8 SAN connectors on the right with 8 M2M-Y accessories:

N = 32, BBW = 8 (1/2 full), D = 3, Dav = 2.25


Qty  Component                          Each     Total     Subtotal
32   Myrinet-SAN/PCI interface          $1,300   $41,600
32   5-foot SAN cable                   $140     $4,480    $46,080
1    Octal 8-port Myrinet-SAN switch    $6,000   $6,000
8    M2M-Y SAN splitter                 $40      $320      $6,320
     Total (cost per host = $1,638)                        $52,400

For these “2/3 full” and “half-full” configurations, however, the BBW is still 8 links (10.24 Gbits/sec), which is the full BBW only for the 16-host configuration.
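The Gbits/sec figures convert a number of links to a data rate at the 160 MB/s link rate, assuming decimal units (1 MB = 10^6 bytes, 1 Gbit = 10^9 bits), which reproduce the figures quoted in this paper:

```latex
8~\text{links} \times 160~\tfrac{\text{MB}}{\text{s}} \times 8~\tfrac{\text{bits}}{\text{byte}}
  = 10{,}240~\tfrac{\text{Mbits}}{\text{s}} = 10.24~\tfrac{\text{Gbits}}{\text{s}}
```

The same 1.28 Gbits/sec per link gives 25.6, 40.96, and 81.92 Gbits/sec for 20, 32, and 64 links.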

M2M-OCT-SW8 in large clusters. What makes the M2M-OCT-SW8 an ideal building block for large, full-BBW clusters is that it preserves bandwidth between the 16 host links on the left and the 16 inter-switch links on the right. Although the BBW between the hosts is 8 links, every vertical cut of this topology crosses 16 links. This bandwidth-preserving property is clearly necessary for any switch component designed to provide full BBW while scaling to large numbers of hosts.

The remainder of this paper presents recommended network topologies for using the M2M-OCT-SW8 Myrinet-SAN switch component to build full-BBW networks for 32, 64, and 128 hosts, and fractional-BBW networks for 64 and 128 hosts. We believe that these networks represent the best options available for performance and cost-effectiveness.

We envision Myricom’s customers assembling large clusters from 16-host subclusters, each connected by one M2M-OCT-SW8, that can be checked out and operated independently before being cabled to other 16-host subclusters.

Routing with multiple-path redundancy. Among the 16 hosts connected to one M2M-OCT-SW8, there are four equivalent minimal routes from the four hosts on one 8-port switch to the hosts on each other 8-port switch of the same stage, one through each switch of the other stage. Both in the M2M-OCT-SW8 topology and in those shown later, any or all minimal (progressive) routes can be used without any possibility of network deadlock.

The host or interface software may employ either a static or a dynamic choice of routes. The usual objective is to spread the packet traffic uniformly across the available paths in order to use all of the available bisection. An example of dynamic routing is to select among all available minimal routes, sequentially or at random, for successive packets. With dynamic routing, packets between a pair of hosts will not necessarily be received in the order in which they were sent.
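A minimal Python sketch of the two dynamic choices just described, for hosts on two different left-stage switches of one M2M-OCT-SW8. The switch names and the `RoundRobinRoutes`/`random_route` helpers are hypothetical; a route here is only the sequence of switches traversed, not the port-by-port encoding carried in an actual Myrinet packet header. A static choice would instead fix one of these routes for each pair of hosts.

```python
import itertools
import random

# Hypothetical names: hosts sit on left-stage switches L0..L3, and each left
# switch has one link to each right-stage switch R0..R3.
RIGHT_STAGE = ("R0", "R1", "R2", "R3")


def minimal_routes(src, dst):
    """All minimal switch paths from left switch src to left switch dst."""
    if src == dst:
        return [(src,)]  # same switch: the path never leaves it
    return [(src, r, dst) for r in RIGHT_STAGE]


class RoundRobinRoutes:
    """Dynamic routing: use the minimal routes in turn for successive packets."""

    def __init__(self, routes):
        self._cycle = itertools.cycle(routes)

    def route_for_next_packet(self):
        return next(self._cycle)


def random_route(routes, rng=random):
    """Dynamic routing: pick a minimal route at random for each packet."""
    return rng.choice(routes)


routes = minimal_routes("L0", "L2")
rr = RoundRobinRoutes(routes)
for _ in range(5):
    print(rr.route_for_next_packet())  # via R0, R1, R2, R3, then R0 again
print(random_route(routes, random.Random(1)))
```

Cycling through the four routes spreads packets evenly over the four right-stage switches, and hence over the bisection; the cost, as noted above, is that packets between a pair of hosts may arrive out of order.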


N = 32, BBW = 16 (full), D = 4, Dav = 3.25

Qty  Component                          Each     Total     Subtotal
32   Myrinet-SAN/PCI interface          $1,300   $41,600
32   5-foot SAN cable                   $140     $4,480    $46,080
2    Octal 8-port Myrinet-SAN switch    $6,000   $12,000
8    5-foot SAN cable (switch/switch)   $140     $1,120    $13,120
     Total (cost per host = $1,850)                        $59,200


N = 64, BBW = 32 (full), D = 5, Dav = 4.31

[Figure: N = 64 topology diagram with four 16-host groups and 4 M2M-DUAL-SW8 switches]

Qty  Component                          Each     Total     Subtotal
64   Myrinet-SAN/PCI interface          $1,300   $83,200
64   5-foot SAN cable                   $140     $8,960    $92,160
4    Octal 8-port Myrinet-SAN switch    $6,000   $24,000
4    Dual 8-port Myrinet-SAN switch     $2,000   $8,000
32   5-foot SAN cable (switch/switch)   $140     $4,480    $36,480
     Total (cost per host = $2,010)                        $128,640


N = 64, BBW = 20 (5/8 full), D = 4, Dav = 3.62

This configuration costs less than the full-BBW version and provides a BBW of 20 links (25.6 Gbits/sec), compared with the full BBW of 32 links (40.96 Gbits/sec).

Qty  Component                          Each     Total     Subtotal
64   Myrinet-SAN/PCI interface          $1,300   $83,200
64   5-foot SAN cable                   $140     $8,960    $92,160
4    Octal 8-port Myrinet-SAN switch    $6,000   $24,000
16   5-foot SAN cable (switch/switch)   $140     $2,240    $26,240
     Total (cost per host = $1,850)                        $118,400


N = 128, BBW = 64 (full), D = 5, Dav = 4.7

[Figure: N = 128 topology diagram with eight 16-host groups and 8 M2M-DUAL-SW8 switches]

    16 hosts16 hosts16 hosts16 hosts

This multi-stage graph with 5 stages provides the full BBW of 64 links (81.92 Gbits/sec).

Qty  Component                          Each     Total     Subtotal
128  Myrinet-SAN/PCI interface          $1,300   $166,400
128  5-foot SAN cable                   $140     $17,920   $184,320
8    Octal 8-port Myrinet-SAN switch    $6,000   $48,000
8    Dual 8-port Myrinet-SAN switch     $2,000   $16,000
64   5-foot SAN cable (switch/switch)   $140     $8,960    $72,960
     Total (cost per host = $2,010)                        $257,280


N = 128, BBW = 32 (1/2 full), D = 4, Dav = 3.8

This configuration costs less than the full-BBW version and provides a BBW of 32 links (40.96 Gbits/sec), compared with the full BBW of 64 links (81.92 Gbits/sec). It leaves one of the double-link SAN connectors unused on each M2M-OCT-SW8; these ports could be used to connect additional hosts.
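The 28 switch/switch cables listed in the table below follow from the connector count, assuming, as the table implies, that every inter-switch cable joins two M2M-OCT-SW8s: each of the 8 switches uses 7 of its 8 double-link connectors, and each cable has two ends.

```latex
\frac{8~\text{switches} \times 7~\text{connectors}}{2~\text{ends per cable}} = 28~\text{cables},
\qquad 28~\text{cables} \times 2~\text{links} = 56~\text{inter-switch links}
```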

Qty  Component                          Each     Total     Subtotal
128  Myrinet-SAN/PCI interface          $1,300   $166,400
128  5-foot SAN cable                   $140     $17,920   $184,320
8    Octal 8-port Myrinet-SAN switch    $6,000   $48,000
28   5-foot SAN cable (switch/switch)   $140     $3,920    $51,920
     Total (cost per host = $1,846)                        $236,240
