## CSE471: Computer Design & Organization Assignment 1

Due: Thursday, April 5

The purpose of this assignment is to acquaint you with the sim-outorder simulator, which is part of the SimpleScalar tool set, and with the environment in which it executes. Sim-outorder is an instruction-level simulator that implements many of the architectural features we will study this quarter. For this assignment, however, we will configure it so that it behaves much like the R3000 you studied in CSE378. Sim-outorder implements the SimpleScalar instruction set architecture, which is very similar to the MIPS architecture.

For this assignment you are expected to work in teams of 2 people. Please turn in one report per pair and remember to put both partners' names on the report. If you decide to divide up the work, say who did what.

For this assignment, you should:

- 1. Pick an application from the application directory to instrument. All these programs are taken from the SPEC95 and SPEC2000 benchmark suites, which were the standard workload for architecture research until very recently (we are now on SPEC2006, but compiling those programs to SimpleScalar code is still iffy). They have already been precompiled for sim-outorder, and you can use them as input to the simulator. Jacob will send you email if it turns out that any of the applications are not appropriate for this assignment.
- 2. Set sim-outorder configuration parameters to reflect a computer that has the following configuration:
  - a pipeline that fetches, decodes, issues, executes and commits one instruction/cycle, no matter what the instruction type
  - only one of each type of functional unit
  - 8KB, two-way set-associative L1 instruction and data caches with 32-byte blocks
  - a 256KB, direct-mapped, unified L2 cache with 32-byte blocks
  - an 8-way, 128-entry data TLB
  - a 4-way, 64-entry instruction TLB
  - All caches are write-back, write-allocate. The caches and both TLBs use an LRU replacement policy.
  - The page size is 4KB.

All other parameters should be left at their default values, except for two, which must be set as follows: fetch:speed 1 and issue:inorder true. Configuration parameters are set either through command-line arguments or through sim-outorder's config file.

- 3. Run the simulation for 100 million executed instructions (set the max:inst parameter appropriately).
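
The cache and TLB requirements in step 2 are stated as capacities, but a simulator configuration usually needs the number of sets. The following Python sketch shows that arithmetic. It assumes that each L1 cache is 8KB on its own, and that cache and TLB settings are written as `<name>:<nsets>:<bsize>:<assoc>:<repl>` with `l` meaning LRU; both the format and the labels `il1`, `dl1`, `ul2`, `dtlb`, `itlb` are assumptions to check against the simulator's own help output before use.

```python
# Sketch: derive set counts for the configuration required in step 2.
# Assumption: settings take the form <name>:<nsets>:<bsize>:<assoc>:<repl>,
# where 'l' selects LRU. Verify this against your simulator's help output.

def num_sets(total_bytes, block_bytes, assoc):
    """Number of sets = capacity / (block size * associativity)."""
    sets, remainder = divmod(total_bytes, block_bytes * assoc)
    if remainder:
        raise ValueError("capacity is not a multiple of block size * associativity")
    return sets

def spec(name, nsets, block_bytes, assoc, repl="l"):
    """Format one cache or TLB description string."""
    return f"{name}:{nsets}:{block_bytes}:{assoc}:{repl}"

KB = 1024
PAGE = 4 * KB  # page size given in the assignment

config = {
    "il1": spec("il1", num_sets(8 * KB, 32, 2), 32, 2),
    "dl1": spec("dl1", num_sets(8 * KB, 32, 2), 32, 2),
    "ul2": spec("ul2", num_sets(256 * KB, 32, 1), 32, 1),
    # For a TLB the "block" is one page, so sets = entries / associativity.
    "dtlb": spec("dtlb", 128 // 8, PAGE, 8),
    "itlb": spec("itlb", 64 // 4, PAGE, 4),
}

for label, value in config.items():
    print(label, value)
```

A direct-mapped cache is simply the case where the associativity is 1, so the L2 has as many sets as it has blocks.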

## Answer or do the following:

- 1. Print out the output generated by the simulator and highlight the configuration parameters you used, which will be either a configuration file or command-line parameters. (1 point)
- 2. Highlight and label the values for the following metrics on your sim-outorder output and answer the related questions:
  - (a) (2 points) The total number of instructions committed. These are all the instructions that have completed every phase of instruction execution, including writing their results to the register file. Why is this number not equal to the number of instructions executed as reported by the output? Is this latter number the same as the one set in max:inst (10<sup>8</sup>)?
  - (b) (2 points) The total number of branches and their frequency. Is this consistent with what you have heard about branch occurrences in integer programs?
  - (c) (1 point) The total number of block replacements for the L1 data cache.
  - (d) (3 points) The number of accesses to the L2 cache. If SimpleScalar had not printed it out, how could you have deduced it from other metrics generated by the simulator?
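
Locating these metrics in a long simulator printout can be tedious, so a small script may help. The Python sketch below assumes the statistics appear one per line as `name value # description`. The statistic names and every number in `SAMPLE` are made-up placeholders for illustration, not results from a real run, so adjust the names to match what your own output contains.

```python
import re

# Made-up excerpt in an assumed "name value # description" layout.
# The names and numbers are placeholders, not output from a real run.
SAMPLE = """\
sim_num_insn              1000 # total number of instructions committed
sim_total_insn            1100 # total number of instructions executed
sim_num_branches           150 # total number of branches committed
dl1.replacements            40 # total number of replacements
ul2.accesses                90 # total number of accesses
"""

# A statistic name, whitespace, a number, whitespace, then a '#' comment.
STAT_LINE = re.compile(r"^(\S+)\s+(-?\d+(?:\.\d+)?)\s+#")

def parse_stats(text):
    """Return a dict mapping each statistic name to its numeric value."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = float(match.group(2))
    return stats

stats = parse_stats(SAMPLE)
# Frequency of branches among committed instructions.
branch_frequency = stats["sim_num_branches"] / stats["sim_num_insn"]
print(f"branch frequency: {branch_frequency:.1%}")
```

Lines that do not match the assumed layout (banners, configuration dumps, blank lines) are skipped rather than treated as errors.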

- 3. Answer the following questions:

  - (a) (2 points) In general, we say that "L1 I-cache miss ratios are negligible while L1 D-cache miss ratios are not". Is this true for your experiment (the answer might vary depending on the application)? What factors could contribute to a non-negligible I-cache miss rate?
  - (b) (3 points) A major reason the CPI is not 1 is the time spent accessing the memory hierarchy. Attempt to quantify the contribution of cache misses to the CPI, and decompose it into the contributions due to misses in each of the 3 caches. Note that hit ratios are expressed per access to each cache, so you have to convert them into figures expressed per instruction.
  - (c) (3 points) List 3 other possible contributions to the CPI that make it greater than 1.
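
The conversion from per-access ratios to per-instruction figures mentioned in question (b) can be sketched as follows. This is one simple model, not the required method: it assumes every miss stalls the pipeline for a fixed penalty with no overlap between misses. All access counts, miss rates, and penalties below are hypothetical placeholders; substitute the values from your own run and configuration.

```python
def cpi_contribution(accesses, miss_rate, instructions, penalty_cycles):
    """Stall cycles per instruction caused by misses in one cache.

    misses per instruction = (accesses * miss_rate) / instructions
    """
    misses_per_instruction = accesses * miss_rate / instructions
    return misses_per_instruction * penalty_cycles

# Every number below is a hypothetical placeholder.
INSTRUCTIONS = 1_000_000
L1_MISS_PENALTY = 6    # assumed extra cycles for an L1 miss that hits in L2
L2_MISS_PENALTY = 20   # assumed extra cycles for an L2 miss served by memory

caches = {
    "il1": {"accesses": 1_000_000, "miss_rate": 0.01, "penalty": L1_MISS_PENALTY},
    "dl1": {"accesses": 300_000, "miss_rate": 0.05, "penalty": L1_MISS_PENALTY},
    "ul2": {"accesses": 25_000, "miss_rate": 0.20, "penalty": L2_MISS_PENALTY},
}

contributions = {
    name: cpi_contribution(c["accesses"], c["miss_rate"], INSTRUCTIONS, c["penalty"])
    for name, c in caches.items()
}

for name, value in contributions.items():
    print(f"{name}: {value:.3f} cycles per instruction")
print(f"total: {sum(contributions.values()):.3f} cycles per instruction")
```

With these placeholder values the D-cache has a higher miss rate than the I-cache but far fewer accesses per instruction, which is why the ratios cannot be compared directly.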

There is no electronic turn-in required for this assignment. Instead, bring paper copies of the output generated by sim-outorder and of your answers to the questions to class on April 5.