DOC

The FetchExecute Cycle

By Rick Chavez,2014-04-14 06:30
8 views 0
Modern computer design practice is driven by the fact that almost all programs, including Operating Systems, are written in a HLL (High Level Language).

    The FetchExecute Cycle

    This cycle is the logical basis of all stored program computers. Instructions are stored in memory as machine language.

    Instructions are fetched from memory and then executed. The common fetch cycle can be expressed in the following control sequence.

     MAR ? PC. // The PC contains the address of the instruction.

     READ. // Put the address into the MAR and read memory.

     IR ? MBR. // Place the instruction into the MBR.

    This cycle is described in many different ways, most of which serve to highlight

    additional steps required to execute the instruction. Examples of additional

    steps are: Decode the Instruction, Fetch the Arguments, Store the Result, etc.

    A stored program computer is often called a “von Neumann Machine” after one

    of the originators of the EDVAC.

    This FetchExecute cycle is often called the “von Neumann bottleneck”, as the

    necessity for fetching every instruction from memory slows the computer.

    Avoiding the Bottleneck In the simple stored program machine, the following loop is executed.

     Fetch the next instruction

     Loop Until Stop

     Execute the instruction

     Fetch the next instruction

     End Loop.

    The first attempt to break out of this endless cycle was “instruction prefetch”;

    fetch the next instruction at the same time the current one is executing.

    As we can easily see, this concept can be extended.

    InstructionLevel Parallelism: Instruction Prefetch

    Break up the fetchexecute cycle and do the two in parallel.

    This dates to the IBM Stretch (1959)

The prefetch buffer is implemented in the CPU with onchip registers.

    The prefetch buffer is implemented as a single register or a queue.

    The CDC6600 buffer had a queue of length 8 (I think).

    Think of the prefetch buffer as containing the IR (Instruction Register)

    When the execution of one instruction completes, the next one is already

    in the buffer and does not need to be fetched.

    Any program branch (loop structure, conditional branch, etc.) will invalidate the

    contents of the prefetch buffer, which must be reloaded.

    InstructionLevel Parallelism: Pipelining Better considered as an “assembly line”

Note that the throughput is distinct from the time required for the execution of a

    single instruction. Here the throughput is five times the single instruction rate.

    What About Two Pipelines?

Code emitted by a compiler tailored for this architecture has the possibility to

    run twice as fast as code emitted by a generic compiler.

    Some pairs of instructions are not candidates for dual pipelining.

     C = A + B

     D = A ? C // Need the new value of C here

    This is called a RAW (Read After Write) dependency, in that the value for C must be written to a register before it can be read for the next operation.

    Stopping the pipeline for a needed value is called stalling.

    The DynamicStatic Interface and

    Management of Complexity High Level Languages contains some complex constructs that need to be

    executed by the hardware. The question is how to handle the complexity.

    We need to view the architecture as being both the hardware to execute the

    instructions and the compiler that translates the HLL into those instructions.

    Do we have a sophisticated compiler or complex hardware?

    The DSI (DynamicStatic Interface) refers to the division of complexity

    between the compiler and the hardware.

    As an example of what a sophisticated compiler can do for us, consider the

    following two code fragments, based on the previous slide.

     C = A + B

     D = A ? C // The CPU must detect the dependence and stall

     C = A + B

     D = A2 + A?B // This code is equivalent with no stall necessary. A sophisticated compiler will convert the first code to its logical equivalent.

    Superscalar Architectures

    Having 2, 4, or 8 completely independent pipelines on a CPU is very

    resourceintensive and not directly in response to careful analysis.

    Often, the execution units are the slowest units by a large margin. It

    is usually a better use of resources to replicate the execution units.

    What Is Executed? The Idea of Multilevel Machines. In discussing the fetchexecute cycle, we claimed that each instruction is

    fetched and executed. We now ask about the type of instruction.

    In order to answer this question more precisely, we introduce the idea of a

    multilevel machine and multiple levels of computer languages.

    We begin this discussion by discussing three levels of languages.

    HighLevel Language Englishlike statements Z = X + Y

    Assembly Language Mnemonic codes Load X

     Add Y

     Store Z

    Machine Language Binary numbers 0x1100

     (Here shown in 0x3101

     hexadecimal form) 0x2102

    The machine language used in this example is the MARIE design (CPSC 2105)

    The Multilevel Machine

    (1)Following Andrew Tanenbaum, we define a fourlevel machine. Each level of the machine corresponds to a language level.

     Machine Language Language Type

     M3 L3 High level language such as C++ or Java

     M2 L2 Assembly language

     M1 L1 Binary machine language

     M0 Control Signals Microarchitecture level

    Following Tanenbaum, we define a virtual machine as a hypothetical computer that directly executes language at its level. For example, M3 as a virtual

    machine directly executes high level language programs.

    The student should be aware that there is another, very important, use of the

    term virtual machine, with an entirely different definition. We use that later.

    (1) Structured Computer Organization (5th Edition) by Andrew S. Tanenbaum.

     ISBN 0 13 148521 0. Dr. Tanenbaum defines six levels.

    Options for Executing a High Level Language Program

    There are three options for executing a L3 program. Each has been tried.

    Direct Execution. This has been tried with the FORTH and LISP languages.

     This is much less flexible than the other two approaches,

     much more difficult to implement, and less efficient.

    Translation Translate the L3 program to a lower level language, such

     as L2 or L1. The lower level languages are much more

     based on the computer hardware, and easier to execute.

     For a HLL, this step is called compilation.

    Interpretation Write a program in a lower level language, either L2 or

     L1, that takes the L3 program as input data and causes

     the computer to achieve the desired effect.

    Example: The JVM (Java Virtual Machine) is a virtual machine that

     appears to execute the Java program directly. In actual

     fact, it translates the Java code into byte code and

     interprets that byte code.

Report this document

For any questions or suggestions please email
cust-service@docsford.com