Due: April 28th (extended) at 23:00 via the Catalyst dropbox

This homework has two parts that cover different topics from the lectures. We won't accept late submissions so we recommend you start early!

Questions about this assignment should go to the TA, Thierry.

Turnin

You should submit your modified discover-bpred.py file and ilp_scheduler.py to the Catalyst dropbox for part 1 and 2 respectively.

The formatting of the output of the python scripts should be kept intact so our grading scripts will work, though you can otherwise change the code to your heart's content. We will test your code by calling main() with one of our insn streams and expect output on stdout; as long as you maintain this interface your code will work fine with our tests.

PART 1 - Mystery Branch Predictor

Background

Dr. Chip D. Signer has been put in charge of designing a new chip for his employer. Dr. Signer wants to reverse engineer the branch predictor of a rival chipmaker in order to ensure that his chip performs better. All Dr. Signer knows is that the rival's chip uses a correlating (2-level) branch predictor that uses global history. In order to understand more about the branch predictor, Dr. Signer wants to determine:

the number of entries in the branch history register (BHR)
the number of bits used for the saturating counters in the branch history table (BHT)
the number of entries in the branch history table (BHT)

Assignment

Your assignment is to come up with an algorithm to determine the parameters of a a series of "mystery" branch predictors. Conveniently, these mystery predictors are provided in a Python file called branchpredictor.py. Each branch predictor can be accessed via the following interface:

predict(pc): ask the branch predictor to predict the direction of a branch at the specified PC. Returns True if the branch is predicted taken, False if not taken. Note that calls to predict() do not update the predictor's internal state.

actual(pc, direction): updates the branch predictor's state with the outcome of a branch at the specified PC going in the specified direction (True means taken, False means not taken). This method does not return anything.

reset(): reset the state of the branch predictor. All entries in the BHR are reset to not taken. Each saturating counter in the BHT is set to a value such that it currently reads not taken, but with 1 increment will read "taken".

The initial state of the branch predictor is the same as after a call to reset() (see above).

The constants MAX_* in discover-bpred.py list the maximum values (inclusive) for various branch predictor parameters, so your inference algorithm needn't check or handle values outside these ranges. The minimum value for each parameter is 1.

The indexing function for the BHT uses the bits of the BHR concatenated with the bits from the PC in the following manner: {BHR bits}{PC bits}, i.e. the PC bits are the low-order bits.

You must fill in the missing function definitions in discover-bpred.py to infer the parameters of the branch predictors implemented by branchpredictor.py. You should submit your modified discover-bpred.py file via the Catalyst dropbox.

Restrictions

We will be looking over your source code, so don't cheat by using any interface to the branch predictor other than the two functions specified above. You can of course change things to help debug, but your code will be tested against our versions of branchpredictor.py.

Feel free to change what you need to in discover-bpred.py, but please make sure that the discover() function returns a dictionary in the format indicated there (so our grading scripts will work). We recommend following the given order of inferring parameters (first BHR size, then saturating counter size, then BHT entries) though you can solve them in a different order if you wish.

Files

Download the Python files you need for the assignment here: hw1-mystery-branch-predictor.tar.gz.

You can run the code for this assignment with the command python discover-bpred.py in your Windows, Mac or Linux command terminal. You may need to install Python first; it's easy to download and install. Be sure to install Python 2.x; this code will not work with the new syntax of Python 3.x.

You should ensure your code works for the included mystery branch predictors. We will test your code against other mystery branch predictors, too, so feel free to come up with other test cases yourself.

PART 2 - ILP Scheduler

Background

Dr. Chip D. Signer has been put in charge of designing a new dynamically scheduled processor for his employer. In order to know how to best utilize on-chip resources, Dr. Signer wants to understand a few things about the workloads that will be running on his new chip. Dr. Signer wants to simulate these workloads on an ideal machine with infinite resources, to see how many resources these workloads require. In particular, he wants to know, for a given program (i.e. stream of instructions):

how many clock ticks it would take to run the program given infinite resources, assuming that each instruction (insn) takes 1 tick
the maximum number of insns that execute in parallel, to know how wide to make the datapath
the minimum number of physical registers needed by the program for it to run as quickly as possible, to know how large to make the register file

Since this is a dynamically scheduled processor, the register operands of each insn need to be renamed before the insns can be scheduled. In order to know which physical registers are free and which are still live, you must track the live-range (region of execution between dispatch of the instruction that allocated the register, and the retirement of the instruction that writes to the same logical register) of each physical register p. This range starts as soon as p is used for the first time, and ends after p's corresponding architectural register a is overwritten, as any subsequent uses of a should be routed to the new physical register responsible for storing a. p is then free to be reallocated to hold the value for some new architectural register.

Assignment

Your assignment is to write code that takes an insn stream, and performs register renaming and then scheduling of those insns on an infinite machine. After you have the insn schedule, you can answer Dr. Signer's 3 questions above.

Simplifying Assumptions

There are a number of simplifying assumptions in this assignment to make your life easier.

All insns have an execution latency of 1
Insns communicate exclusively through registers; there are no memory dependences
You're scheduling insns on an infinite machine, so you never have to worry about resource conflicts
You do have to worry about the live-ranges of registers, so that you can use registers as efficiently as possible
We've provided a random insn-trace generator (genInsns()) to make it easier for you to test your code
For all insn traces, you can assume that only register 0 is live at the beginning. Insns will only ever consume input from live registers, though they may output to arbitrary registers (causing the output register to become live going forward).

The format for insns is documented at the top of ilp_scheduler.py. Each insn is a simple Python dictionary.

Our solution added about 70 lines of Python to ilp_scheduler.py.

Renaming Walkthrough

This diagram walks through a modified version of random-3ticks-3width-4pregs.insns (included with the assignment), which has RAW, WAW and WAR data dependences. It shows how to optimally rename the trace, with an added insn F to make things more interesting.

Diagram explaining how renaming works.

A few things we can observe:

Since insns B, C and E all execute in cycle 2, they each need a distinct physical register for their output to avoid WAW conflicts. Some of these registers are then dead in the next cycle: e.g., blue is dead (since it's been overwritten by green in insn C) and so can be used in cycle 3 to store the output from insn D.

Keeping track of when a register is free can be tricky, as insn E shows. Data dependences allow it to execute in cycle 2, but we have to make sure that insn D gets the right value for architectural register 1 (from green), so insn E writes into orange, avoiding a WAR conflict. Since E overwrote architectural register 1, the value in green is dead. But we can't free green yet, because D needs to read from green "in the future". This is where OoO execution complicates renaming: all insns get renamed in-order (the light blue arrows), even though they may execute OoO if their data dependences (black arrows) allow.

This situation actually arises in other places, too, but some traces have not really required precise management of the free list. E.g., even in this trace, it's straightforward to rename insns A-D: the blue register always could be (incorrectly) set as "free immediately" and things would work ok, because of the data dependences. Once we add insns E and F, we're not so lucky: we can't let insn F write to green, as this would interfere with insn D which executes at the same time. So we need an extra physical register, yellow, to allow things to execute as quickly as possible.

Files

Download the files you need for this assignment: hw2-ilp-scheduler.tar.gz.

You can run the code for this assignment with the command python ilp_scheduler.py in your Windows, Mac or Linux command terminal. You may need to install Python first; it's easy to download and install. Be sure to install Python 2.x; this code will not work with the new syntax of Python 3.x.

You should ensure your code works for the included insn trace files (*.insns). We will test your code on these traces as well as others that we generate randomly, so you should test using the random insn trace generator yourself as well.

Thank you to Adrian Sampson who designed and created the first version of this assignment.

HW 1 Branch Prediction and ILP 548

Turnin

PART 1 - Mystery Branch Predictor

Background

Assignment

Restrictions

Files

PART 2 - ILP Scheduler

Background

Assignment

Simplifying Assumptions

Renaming Walkthrough

Files

HW 1 Branch Prediction and ILP
548