Instruction Pipeline

Introduction

The control unit in all Central Processing Units (CPUs) follows the same basic instruction processing sequence:

  1. fetch the instruction
  2. decode the instruction
  3. execute the instruction

On traditional CPUs, these phases are typically executed sequentially, with each instruction completing all three phases before the next one begins, as shown:

sequential-execution.png

Modern, high-performance CPUs (such as those based on the MIPS® architecture) use a technique called pipelining, whereby these phases of instruction processing are executed in independent, overlapping stages.

An n-stage pipeline therefore has up to n instructions in flight, each at a different stage of execution, similar to an automotive assembly line.

Pipelines do not shorten the execution time of an individual instruction (each one must still be fetched, decoded, and executed); rather, they improve instruction throughput. In our example, the 3-stage pipeline increases instruction throughput by up to 3x once the pipeline is full.
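The latency-versus-throughput distinction can be checked with a little arithmetic. The sketch below is an idealized model that assumes one clock cycle per stage and no stalls; the function names are illustrative only and are not part of any Microchip or MIPS tool.

```python
def sequential_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction finishes every stage before the next starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed by an ideal pipeline: fill it once, then retire one instruction per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


def speedup(num_instructions: int, num_stages: int) -> float:
    """Ratio of sequential to pipelined cycle counts (requires at least one instruction)."""
    if num_instructions < 1:
        raise ValueError("speedup is undefined for an empty program")
    return sequential_cycles(num_instructions, num_stages) / pipelined_cycles(
        num_instructions, num_stages
    )
```

Under these assumptions, three instructions on a 3-stage pipeline take 5 cycles instead of 9, a speedup of 1.8. The speedup approaches 3 only as the instruction count grows, and the latency of each individual instruction remains three cycles.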

PIC32MZ Pipeline

The MIPS microAptiv® MPU core implements a five-stage pipeline as shown here:

pic32mz-pipelined-execution.png
  • I-STAGE: INSTRUCTION FETCH
    • Fetch the instruction from cache/instruction SRAM. Increment the PC by four (4).
  • E-STAGE: EXECUTION
    • Fetch operands from the register file. Begin the ALU operation. Calculate the branch target address. Begin the MDU operation.
  • M-STAGE: MEMORY FETCH
    • The ALU operation completes. Data cache/SRAM access is performed for load/store operations. MDU operations proceed.
  • A-STAGE: ALIGN
    • Align loaded data with its word boundary. Load data or the MDU result is made available to the E-STAGE for bypassing.
  • W-STAGE: WRITEBACK
    • For register-to-register or load instructions, the result is written back to the register file.
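To visualize how instructions overlap in these five stages, the following sketch builds a simple text pipeline chart. It is an idealized illustration that assumes one instruction enters the I-stage per cycle with no stalls; the helper names `pipeline_chart` and `format_chart` are hypothetical.

```python
STAGES = ["I", "E", "M", "A", "W"]  # stage letters taken from the list above


def pipeline_chart(instructions):
    """Return (text, cells) rows; cells[c] is the stage occupied in cycle c, or '.'."""
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for index, text in enumerate(instructions):
        cells = ["."] * total_cycles
        # Instruction `index` enters the I-stage in cycle `index`,
        # then advances one stage per cycle.
        for offset, stage in enumerate(STAGES):
            cells[index + offset] = stage
        rows.append((text, cells))
    return rows


def format_chart(rows):
    """Render the rows as aligned text, one instruction per line."""
    if not rows:
        return ""
    width = max(len(text) for text, _ in rows)
    return "\n".join(f"{text:<{width}}  {' '.join(cells)}" for text, cells in rows)
```

Calling `print(format_chart(pipeline_chart(["lw t0, 0(sp)", "add t3, t1, t2", "sw t3, 4(sp)"])))` prints one row per instruction, each shifted one cycle to the right of the previous row. The instruction strings are only labels here; the sketch does not decode them.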

In a multi-stage pipelined implementation, certain operations can lead to interlocks and hazards, which reduce instruction throughput. The microAptiv® MPU core has been designed to minimize the effects of these hazards.

Data Hazards

The term data hazard refers to the situation where results from prior ALU operations are required before they have been written back to the register file.

In the following example, the result of the first operation (in register t3) is required as input for execution of the second operation:

The microAptiv® Microprocessor cores implement a “bypass” mechanism that allows the result of an operation to be sent directly to the instruction that needs it, without first writing the result to the register file and then reading it back.
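A rough way to see where bypassing matters is to scan a program for read-after-write dependencies that fall inside the pipeline. The sketch below is a simplified model, not a description of the actual core logic: it assumes operands are read in the E-stage and results reach the register file in the W-stage, as in the stage list above, and it reports which stage the producer occupies when the consumer reads its operands. The `Instr` type, `find_raw_hazards`, and the two-instruction example are illustrative assumptions. How the real core handles a producer that is in the W-stage during the read is not stated here, so the sketch simply reports that case.

```python
from typing import List, NamedTuple, Optional, Tuple

STAGES = ["I", "E", "M", "A", "W"]
READ_STAGE = STAGES.index("E")   # operands are fetched in the E-stage
WRITE_STAGE = STAGES.index("W")  # results reach the register file in the W-stage


class Instr(NamedTuple):
    text: str                 # label only; not decoded
    dest: Optional[str]       # register written, or None
    sources: Tuple[str, ...]  # registers read


def find_raw_hazards(program: List[Instr]) -> List[Tuple[int, int, str, str]]:
    """Return (producer_index, consumer_index, register, producer_stage) tuples.

    producer_stage is the stage the producer occupies while the consumer
    is in its E-stage, assuming one instruction issues per cycle.
    """
    hazards = []
    max_distance = WRITE_STAGE - READ_STAGE  # 3 in this five-stage model
    for j, consumer in enumerate(program):
        for reg in dict.fromkeys(consumer.sources):  # de-duplicate, keep order
            # Search backwards for the most recent writer still in the pipeline.
            for i in range(j - 1, max(j - max_distance, 0) - 1, -1):
                if program[i].dest == reg:
                    hazards.append((i, j, reg, STAGES[READ_STAGE + (j - i)]))
                    break
    return hazards


# Hypothetical pair: the second instruction needs t3 from the first.
example = [
    Instr("add t3, t1, t2", "t3", ("t1", "t2")),
    Instr("sub t5, t3, t4", "t5", ("t3", "t4")),
]
```

For `example`, `find_raw_hazards` returns `[(0, 1, "t3", "M")]`: the producer is in the M-stage when the consumer needs `t3`, which is the situation an M-to-E bypass path addresses. With one unrelated instruction in between, the reported stage becomes `"A"`.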

The following diagram depicts the PIC32MZ Datapath, showing the A » E and M » E stage bypass connections:

pic32mz-datapath-diagram.png

The following pipeline diagram depicts how the result (t3) can be made available after the M-STAGE to the following instruction without stalling the pipeline:

Control Hazards

Control hazards arise in pipelined CPU architectures because the instructions that follow a branch instruction are fetched before the branch outcome is known and (traditionally) are flushed if the branch is taken, reducing instruction throughput.

microAptiv® MCU cores implement a delay slot in the pipeline and include branch target address calculation hardware in the E-stage of the pipeline.

For branch operations, these result in:

  • Having an instruction slot available after all branch instructions to do useful work.
  • Guaranteeing that the pipeline fetches either the 'branch taken' instruction or 'fall-through' instruction in the cycle immediately following the delay slot.

Since the branch cannot take effect until the second stage of the pipeline, we say that it is a delayed branch (i.e. the branch/jump will take place after the instruction following the branch is in the pipeline).
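The delayed-branch behavior can be illustrated with a toy interpreter. This is a minimal sketch, not a MIPS simulator: it supports only three made-up tuple-encoded operations (`beq`, `addi`, `nop`), uses list indices in place of addresses, and rejects a branch placed in a delay slot. The names `run` and `PROGRAM` are hypothetical.

```python
def run(program, registers, max_steps=100):
    """Execute a toy program with a one-instruction branch delay slot.

    Returns the list of program indices in the order they were executed.
    """
    pc = 0
    pending_target = None  # target of a taken branch, applied after the delay slot
    trace = []
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op, *args = program[pc]
        trace.append(pc)
        in_delay_slot = pending_target is not None
        # After a delay slot, control transfers to the branch target.
        next_pc = pending_target if in_delay_slot else pc + 1
        pending_target = None
        if op == "beq":
            if in_delay_slot:
                raise ValueError("branch in a delay slot is not modeled")
            a, b, target = args
            if registers[a] == registers[b]:
                pending_target = target  # takes effect after the next instruction
        elif op == "addi":
            dest, src, imm = args
            registers[dest] = registers[src] + imm
        elif op != "nop":
            raise ValueError(f"unknown operation: {op}")
        pc = next_pc
    return trace


PROGRAM = [
    ("beq", "t1", "t2", 3),       # 0: branch to index 3 (L2) if t1 == t2
    ("addi", "t0", "t0", 1),      # 1: delay slot, executes either way
    ("addi", "t3", "zero", 10),   # 2: L1, fall-through path
    ("addi", "t4", "zero", 20),   # 3: L2, branch-taken path
]
```

With `t1 == t2` the trace is `[0, 1, 3]`; otherwise it is `[0, 1, 2, 3]`. In both cases index 1, the delay slot, executes, which is why the slot should hold useful work or a nop.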

The following pipeline diagram demonstrates how either the branch taken instruction (label L2) or the fall-through instruction (label L1) is executed immediately after the delay slot instruction (sw t0, 0(sp)):


a-e-bypass-example.png

Summary: The PIC32MZ pipeline begins the fetch of either the branch path or the fall-through path in the cycle following the delay slot. The MIPS® programmer (or compiler!) must organize their code to perform some useful work during this delay slot, or simply insert a nop instruction.

© 2024 Microchip Technology, Inc.