The final will cover material from the lectures, slides, assignments, and the midterm. The list of topics here is in bottom-up order of the material, not slide order, following the review session classes.
Logic
- gates: and, or, xor, not, [nor, nand, ...]
- truth tables
- boolean circuits
- sequential and combinational components
- synthesizing a boolean circuit from a truth table
- synthesizing a truth table from a boolean circuit
- half adder circuit
- full adder built from half adders
- d-latch circuit
- role of the clock
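The half adder and full adder above can be sketched in C with bitwise operators (a minimal software simulation of the truth tables, not hardware; function names are illustrative):

```c
#include <assert.h>

/* Half adder: sum = a XOR b, carry = a AND b (a and b are 0 or 1). */
void half_adder(int a, int b, int *sum, int *carry) {
    *sum   = a ^ b;
    *carry = a & b;
}

/* Full adder built from two half adders plus an OR of the carries. */
void full_adder(int a, int b, int cin, int *sum, int *cout) {
    int s1, c1, c2;
    half_adder(a, b, &s1, &c1);
    half_adder(s1, cin, sum, &c2);
    *cout = c1 | c2;   /* the two half-adder carries can never both be 1 */
}
```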
Instruction Set Architecture (ISA)
- Why define an ISA?
- ISA approaches: CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set Computer)
- RISC-V: an open source ISA
- flexible, extensible
- ISA designed to allow very fast implementations; power efficient implementations
Elements of an ISA
- CPU resources
- registers: how many? how wide?
- why would you want a lot of registers? why wouldn't you?
- relationship of register width to size of memory (virtual address space size)
- PC (program counter)
- system use registers: e.g., memory context, trap handler address
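The relationship between register width and virtual address space size can be checked with a one-line computation (a sketch; the helper name is illustrative, and byte addressing is assumed):

```c
#include <stdint.h>

/* With N-bit registers holding addresses, the virtual address space
   spans 2^N bytes (byte addressing assumed). Keep N < 64 to avoid
   an undefined shift. */
uint64_t vas_bytes(unsigned reg_width_bits) {
    return (uint64_t)1 << reg_width_bits;
}
```

For example, 32-bit registers give a 4 GiB virtual address space.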
Instructions
- What can they do?
- We know about RISC-style instruction sets
- "load-store" architectures allow you to operate on values only in registers, not in memory
- Memory addressing modes
- byte addressing
- Instruction encoding in binary
- How long are instructions? Why might you want them short? Why might you not?
- Fixed length or variable length?
- Decoding: opcode in fixed place
- Does instruction set include instructions with immediate data?
- what is "immediate data"?
- branch addressing: PC-relative
- leave out of the instruction any bits whose values are constant: e.g., the low-order bit of a branch address,
since branch addresses must be even
- Instructions are encoded in a small number of formats; e.g., R-type
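Encoding into a fixed format can be sketched by packing the R-type fields into a 32-bit word (a minimal sketch; the function name is illustrative):

```c
#include <stdint.h>

/* Pack the fields of a RISC-V R-type instruction:
   funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0] */
uint32_t encode_rtype(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                      uint32_t funct3, uint32_t rd, uint32_t opcode) {
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7) | opcode;
}
```

Because the opcode sits in a fixed place (the low 7 bits), decode can always find it first.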
Base-Offset Addressing
- if offset is 0, then you're using the address in the base register
- you can compute any address into a register
- natural for accessing arrays
- base register holds the address of the start of the array; the offset is computed from the index of the element you want
- for this to work, arrays must be allocated contiguously in memory
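The array-address arithmetic above can be written out directly (a sketch; the function name is illustrative):

```c
#include <stdint.h>

/* Address of a[i] = base + i * element_size, assuming the array
   is allocated contiguously in memory. */
uint64_t element_addr(uint64_t base, uint64_t index, uint64_t elem_size) {
    return base + index * elem_size;
}
```

E.g., with a 4-byte element size, element 3 of an array starting at 0x1000 lives at 0x100C.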
Pointers: base-offset where offset is 0
- E.g., Instruction fetch: pc is the "base register" that points to the next instruction
- In C: int x; int *p; p = &x; *p = 4; // sets x to 4
- In Java: Person p = new Person(); p.getName();
- In any language, what does x = y mean? In particular, if I next modify y, is x modified as well?
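In C, x = y copies a value, so a later change to y does not affect x; writing through a pointer is how two names can refer to the same storage. A small self-checking sketch:

```c
#include <assert.h>

/* Value vs. pointer semantics in C. */
void demo(void) {
    int x = 1, y = 2;

    x = y;           /* copies the VALUE of y into x */
    y = 99;
    assert(x == 2);  /* x is unaffected by the later change to y */

    int *p = &x;     /* p holds the ADDRESS of x */
    *p = 4;          /* write through the pointer ... */
    assert(x == 4);  /* ... so x itself changes */
}
```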
Higher Level Language Programs and Execution
- compiling: translating from an HLL to another language (e.g., machine code) in a way that computes the same thing
in both
- representing values using bits
- unsigned int - place value representation
- signed int - 2's complement representation
- char - ASCII / Unicode
- colors - red/green/blue
- etc.
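Two's complement can be sketched by interpreting the same bit pattern two ways (a minimal 8-bit sketch; the function name is illustrative):

```c
#include <stdint.h>

/* Interpret an 8-bit pattern as a two's-complement signed value:
   the high bit contributes -128 instead of +128. */
int twos_complement_value(uint8_t bits) {
    return (bits & 0x80) ? (int)bits - 256 : (int)bits;
}
```

So the pattern 0xFF is 255 as an unsigned int but -1 in two's complement.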
Compiling / Generating Instructions
- statement in higher level language is a specification for what to compute, not how to compute it
- expressions: e.g., sum = 4 + 3 - ( 4 + 3)
- many different sequences of hardware instructions can be used for that
- if...then...else
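That an expression specifies what to compute, not how, can be illustrated by two different-looking computations of the same statement (a sketch; both functions are illustrative stand-ins for generated instruction sequences):

```c
/* The statement sum = 4 + 3 - (4 + 3) specifies WHAT to compute;
   either sequence below is a valid translation. */
int eval_direct(void) {
    return 4 + 3 - (4 + 3);   /* a compiler may fold this to a constant */
}

int eval_with_temps(void) {
    int t1 = 4 + 3;           /* reuse the common subexpression */
    return t1 - t1;
}
```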
Procedure Call
- have to get args from caller to callee
- have to get return value to caller
- exactly how this is done is a system-wide convention (depends on OS and CPU)
- typically leave args in registers, find return value there as well
- caller has to tell callee where to branch to when done
- need some way to store the PC somewhere the callee can find the value
- on RISC-V, jal instruction
- need some agreement about who saves which registers
- s0, s1, ... are saved by callee
- t0, t1, ... are saved by caller
- local variables: allocated on stack
- amount of space required can be computed at compile time (so can generate code to do allocation)
- global variables: in data segment
- new'ed variables: in heap
- compiler and system need to agree on what memory is going to look like at runtime:
- stack at top
- instructions at bottom
- data segment above that
- heap above that
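The storage classes above can be sketched in one small C fragment; the comments label which runtime segment each object lives in (a sketch; names are illustrative):

```c
#include <stdlib.h>

int global_counter = 0;            /* global: lives in the data segment */

int add_one(int x) {               /* code: lives in the text (instruction) segment */
    int local = x + 1;             /* local: allocated in this call's stack frame */
    return local;                  /* return value handed back per the calling convention */
}

void run(void) {
    int *heap_val = malloc(sizeof *heap_val);  /* new'ed storage: the heap */
    if (heap_val == NULL) return;
    *heap_val = add_one(41);
    global_counter = *heap_val;
    free(heap_val);
}
```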
Virtual Memory
- one VAS (Virtual Address Space) per process (running program)
- simplifies compiler's job
- provides protection of one process from others
- Address translation
- solves (external) fragmentation issue for memory allocation
- fixed size pages of virtual memory and equal/fixed size frames of real memory
- program running on CPU issues virtual addresses
- MMU (memory management unit) translates from virtual address to physical address, and then reads/writes real memory
- MMU uses a page table to do the translation
- some data structure where the virtual page number is the key and the physical frame number is the value
- PTE (page table entry) also stores access rights: r(ead), w(rite), (e)x(ecute)
- and a "valid bit": if off, means the corresponding page in virtual memory doesn't exist at all
- Page Tables
- multi-level page table is an approach to reducing the size of the page table
- rather than proportional to the maximum possible size of the VAS, it's more like proportional to the
amount of VAS actually used by the process
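The core translation step can be sketched with a single-level table (a minimal sketch: 4 KiB pages assumed, valid and permission bits omitted; names are illustrative):

```c
#include <stdint.h>

#define PAGE_BITS 12                       /* 4 KiB pages */
#define PAGE_MASK ((1u << PAGE_BITS) - 1)

/* Single-level page table: index = virtual page number (VPN),
   value = physical frame number. The page offset passes through
   unchanged. */
uint32_t translate(const uint32_t *page_table, uint32_t vaddr) {
    uint32_t vpn    = vaddr >> PAGE_BITS;
    uint32_t offset = vaddr & PAGE_MASK;
    uint32_t frame  = page_table[vpn];
    return (frame << PAGE_BITS) | offset;
}
```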
CPU Implementation
- (Yes, this isn't bottom-up, as this is way at the bottom)
- Five phases: instruction fetch (IF), register read (REG), ALU (arithmetic/logic unit) (EX), data memory (MEM),
and register write (WB - write-back)
- The datapath has "pipes" that connect these components so that every flow of information among them needed by
every instruction type can be performed
- multiplexors are switches that "activate" the portion of the full datapath required for an individual instruction
(and cause other paths to be ignored)
- control figures out what the instruction is and sets the switches appropriately
- (Note: control controls more than just multiplexors, like the W bit on the register file)
Pipelining / Parallelism
- pipelining is used to execute a single instruction stream in parallel
- the pipelined execution is required to get results identical to a sequential execution (executing only
one instruction at a time)
- General idea in parallelism: dependences
- dependences restrict the order in which instructions from the sequential stream can be done,
relative to each other
- RAW (read-after-write), WAR (write-after-read), and WAW (write-after-write)
- Pipelining specific:
- WAR and WAW are not issues (because the pipeline always respects those dependences)
- RAW dependences can cause hazards
- hazards on data produced by an R-type instruction can be resolved by "forwarding"
- the value needed on the read exists somewhere in the pipeline; send it from there to the ALU
- if a LW is followed immediately by an R-type instruction that uses the just loaded value, that
value is not available in time to be fed to the ALU
- resolved by introducing "a bubble" (a NOP (no operation)) into the pipeline to separate the LW and the R-type
- who introduces the bubble?
- the hardware may have complicated enough control to do so, or
- the compiler does so if the hardware can't
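The load-use check above can be sketched as a small predicate over adjacent instructions (a sketch; the function name and argument shape are illustrative):

```c
#include <stdbool.h>

/* Load-use hazard: instruction i is a load, and the instruction
   immediately after it reads the just-loaded register. Forwarding
   cannot help here, so one bubble must be inserted. */
bool needs_bubble(bool i_is_load, int i_dest, int next_src1, int next_src2) {
    return i_is_load && (i_dest == next_src1 || i_dest == next_src2);
}
```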
- Control Hazard: data hazard on the PC
- don't know if a conditional branch will be taken or not for a few cycles after the branch is fetched,
but need to fetch another instruction the very next cycle
- approaches:
   (a) insert bubbles -- do nothing until you figure out whether or not to take the branch
   (b) assume not taken -- just update the PC with PC+4 and keep fetching. If the branch is taken, convert
       those incorrectly fetched instructions into NOPs (in the pipeline registers)
(c) branch prediction: hardware "remembers" what happens (branch taken or not) for each value of the
PC that fetches a branch, and uses that information to predict taken or not taken next time it fetches
that branch
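Approach (c) can be sketched as a tiny 1-bit predictor: remember the last outcome per (hashed) PC and predict the same outcome next time (a minimal sketch; real predictors use larger tables and 2-bit counters, and all names here are illustrative):

```c
#include <stdbool.h>

#define TABLE_SIZE 16   /* illustrative: tiny direct-mapped table */

/* One taken/not-taken bit per table slot, indexed by a hash of the PC. */
typedef struct { bool taken[TABLE_SIZE]; } Predictor;

bool predict(const Predictor *p, unsigned pc) {
    return p->taken[(pc >> 2) % TABLE_SIZE];
}

void update(Predictor *p, unsigned pc, bool was_taken) {
    p->taken[(pc >> 2) % TABLE_SIZE] = was_taken;
}
```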
NOT ON FINAL
- Interrupts/Exceptions/Traps
- Protected mode execution of CPU
- Context switching