A Communication-Based Complexity Measure

    At Sandia National Laboratories, we are currently engaged in research involving massively-parallel processing. There is considerable skepticism regarding the viability of massive parallelism; the skepticism centers around Amdahl's law, an argument put forth by Gene Amdahl in 1967 [1] that even when the fraction of serial work in a given problem is small, say s, the maximum speedup obtainable from even an infinite number of parallel processors is only 1/s. We now have timing results for a 1024-processor system that demonstrate that the assumptions underlying Amdahl's 1967 argument are inappropriate for the current approach to massive ensemble parallelism.

    If N is the number of processors, s is the amount of time spent (by a serial processor) on serial parts of a program, and p is the amount of time spent (by a serial processor) on parts of the program that can be done in parallel, then Amdahl's law says that speedup is given by

        Speedup = (s + p) / (s + p/N)
                = 1 / (s + p/N),

    where we have set total time s + p = 1 for algebraic simplicity. For N = 1024, this is an unforgivingly steep function of s near s = 0 (see Figure 1).

    The steepness of the graph near s = 0 (approximately N^2) implies that very few problems will experience even a 100-fold speedup. Yet for three very practical applications (s = 0.4-0.8 percent) used at Sandia, we have achieved speedup factors on a 1024-processor hypercube which we believe are unprecedented [2]: 1021 for beam stress analysis using conjugate gradients, 1020 for baffled surface wave simulation using explicit finite differences, and 1016 for unstable fluid flow using flux-corrected transport. How can this be, when Amdahl's argument would predict otherwise?

    The expression and graph both contain the implicit assumption that p is independent of N, which is virtually never the case. One does not take a fixed-size problem and run it on various numbers of processors except when doing academic research; in practice, the problem size scales with the number of processors.

    [FIGURE 1. Speedup under Amdahl's Law for N = 1024, plotted against serial fraction s: 1024x at s = 0, falling through 167x, 91x, 63x, 48x, 31x, and 24x as the serial fraction rises toward 4 percent (axis shown to 10 percent).]
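For a concrete sense of how steep the fixed-size curve is, the formula can be evaluated directly. The sketch below is ours, not part of the original article; the function name amdahl_speedup and the sample serial fractions are illustrative choices.

```python
# Amdahl's law for a fixed-size problem: serial fraction s,
# parallel fraction p = 1 - s, run on n processors.
def amdahl_speedup(s, n):
    return 1.0 / (s + (1.0 - s) / n)

if __name__ == "__main__":
    n = 1024
    for s in (0.0, 0.005, 0.01, 0.02, 0.03, 0.04, 0.10):
        print(f"s = {s:5.1%}  speedup = {amdahl_speedup(s, n):7.1f}x")
```

Running this reproduces the tick marks of Figure 1: speedup collapses from 1024x at s = 0 to roughly 91x at a 1 percent serial fraction and about 24x at 4 percent, and no number of processors can push it past 1/s.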

    Communications of the ACM May 1988 Volume 31 Number 5
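How sharp is the contradiction between the measured serial fractions and the observed speedups? As an illustrative aside (ours, not the article's; the helper name required_serial_fraction is hypothetical), one can invert Amdahl's formula to ask what serial fraction a 1021x speedup on 1024 processors would demand under the fixed-size assumption.

```python
# Invert Amdahl's law, Speedup = 1/(s + (1 - s)/n), solving for s.
def required_serial_fraction(speedup, n):
    return (1.0 / speedup - 1.0 / n) / (1.0 - 1.0 / n)

if __name__ == "__main__":
    # The 1021x beam stress analysis result on the 1024-processor hypercube.
    s = required_serial_fraction(1021.0, 1024)
    print(f"required s = {s:.6%}")
```

Under the fixed-size model, a 1021x speedup would require a serial fraction near 0.0003 percent, several orders of magnitude below the 0.4 to 0.8 percent actually measured; it is the fixed-size assumption, not the hardware, that Amdahl's law constrains.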

     When given a more powerful processor, the problem generally expands to make use of the increased facilities. Users have control over such things as grid resolution, number of timesteps, difference operator complexity, and other parameters that are usually adjusted to allow the program to be run in some desired amount of time. Hence, it may be most realistic to assume that run time, not problem size, is constant.

     As a first approximation, we have found that it is the parallel or vector part of a program that scales with the problem size. Times for vector startup, program loading, serial bottlenecks, and I/O that make up the s component of the run do not grow with problem size. When we double the number of degrees of freedom in a physical simulation, we double the number of processors. But this means that, as a first approximation, the amount of work that can be done in parallel varies linearly with the number of processors. For the three applications mentioned above, we found that the parallel portion scaled by factors of 1023.9969, 1023.9965, and 1023.9965.

     If we use s' and p' to represent serial and parallel time spent on the parallel system, then a serial processor would require time s' + p'N to perform the task. This reasoning gives an alternative to Amdahl's law, suggested by E. Barsis at Sandia:

        Scaled speedup = (s' + p'N) / (s' + p')
                       = s' + p'N
                       = N + (1 - N)s',

    where, as before, we normalize total time so that s' + p' = 1. In contrast with Figure 1, this function is simply a line, and one with much more moderate slope: 1 - N. It is thus much easier to achieve efficient parallel performance than is implied by Amdahl's paradigm. The two approaches, fixed-sized and scaled-sized, are contrasted and summarized in Figures 2a and 2b.

    [FIGURE 2a. Fixed-Size Model: a problem taking time 1 = s + p on a serial processor runs in time s + p/N on the parallel processor, so Speedup = 1 / (s + p/N).]

    [FIGURE 2b. Scaled-Size Model: a problem taking time 1 = s' + p' on the parallel processor would take time s' + Np' on a hypothetical serial processor, so Speedup = s' + Np'.]

     Our work to date shows that it is not an insurmountable task to extract very high efficiency from a massively-parallel ensemble, for the reasons presented here. We feel that it is important for the computing research community to overcome the "mental block" against massive parallelism imposed by misuse of Amdahl's speedup formula; speedup should be measured by scaling the problem to the number of processors, not by fixing problem size. We expect to extend our success to a broader range of applications and even larger values for N.

    REFERENCES
    1. Amdahl, G.M. Validity of the single-processor approach to achieving large scale computing capabilities. In AFIPS Conference Proceedings, vol. 30 (Atlantic City, N.J., Apr. 18-20). AFIPS Press, Reston, Va., 1967, pp. 483-485.
    2. Benner, R.E., Gustafson, J.L., and Montry, G.R. "Development and analysis of scientific application programs on a 1024-processor hypercube," SAND 88-0317, Sandia National Laboratories, Feb. 1988.
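As a closing numerical check on the scaled-size line (a sketch of ours, not the article's; scaled_speedup is an illustrative name):

```python
# Barsis's scaled speedup: s' is the serial fraction measured on the
# parallel system itself (s' + p' = 1); the result is linear in s'
# with slope 1 - n, far gentler than the fixed-size curve.
def scaled_speedup(s_prime, n):
    return n + (1.0 - n) * s_prime

if __name__ == "__main__":
    n = 1024
    for s_prime in (0.0, 0.004, 0.008, 0.02):
        print(f"s' = {s_prime:4.1%}  scaled speedup = {scaled_speedup(s_prime, n):8.3f}x")
```

With s' between 0.4 and 0.8 percent this yields roughly 1015.8x to 1019.9x, the same neighborhood as the 1016 to 1021 factors reported above, and even a 2 percent serial fraction still leaves a scaled speedup of about 1003.5x.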
