CSE 410 Homework 3: Memory Organization and Caches
Assigned | Monday, May 14, 2012 |
---|---|
Due Date | Tuesday, May 22, 2012, at 11:00 pm. You may turn in written copies in class on Monday, May 21, if you wish. |
Files | hw3.tar on the course web site or ~cse410/hw3.tar on klaatu |
Overview
This assignment has two parts: Part I is a set of written questions from the textbook. Part II is a set of benchmark experiments for you to run to observe how changes in memory reference patterns and other aspects of programs affect performance.
The purpose of this assignment is to help clarify your understanding of memory hierarchies and caches. The written problems will also help you better prepare for examinations. Also remember that the book contains many practice problems similar to the ones included here. Solutions to the practice problems are located at the end of each chapter.
The benchmark experiments will give you a chance to see how these principles play out when real code is executed on real systems. You should run the experiments, then answer questions to explain some of the behavior you observed.
Logistics
All of the answers to these questions should be turned in as electronic documents in PDF format. You may also turn in paper copies in the class before the assignment is due if you wish.
Grading
For the written problems from the book, we will use the same rules as on the previous assignments, including reserving the right to grade only a subset of the problems. We will provide solutions to all of the problems in the written homework assignments in a timely fashion after the assignment is due. Since the late policy affords you a maximum of two late days, we cannot release solutions until the Friday following the assignment's due date, at the earliest. The answers to Part II will be graded individually.
Part I: Written Problems
Answer the following questions from the book. Be sure you are using the 2nd edition of Computer Systems: A Programmer's Perspective. If you're not using the right book you might be doing the wrong problems!
- Problem 6.27
- Problem 6.30
- Problem 6.35
- Problem 6.42
Part II: An Experiment in C and Java
The goals of this part of the assignment are to:
- Evaluate a claim made in the lecture slides that two seemingly equivalent ways of writing a program can have vastly different performance.
- Get a feel for the relative performance of Java and C.
- Get a feel for the effectiveness of turning on C compiler optimizations.
Overview
Let's test the claim that understanding the memory hierarchy can be useful in writing efficient programs. An example in the first-day lecture slides said that interchanging two loops has no effect on the correctness of the results, but can give a 21x difference in performance. Let's see about that.
Here's the important part of the code. It computes exactly the same thing no matter which of the two loops is outermost.
int rep;
int i, j;
// ...
for (i = 0; i < 2048; i++) {
    for (j = 0; j < 2048; j++) {
        // src[i][j] = i * rep;
        dst[i][j] = src[i][j];
    }
}
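One of the modifications described below is to interchange these two loops. As a point of reference, a minimal sketch of what that edit looks like (assuming the same src and dst declarations as above, with only the loop order swapped) is:

// Interchanged ordering: j is now the outermost loop. The same
// elements are copied; only the order of the memory accesses changes.
for (j = 0; j < 2048; j++) {
    for (i = 0; i < 2048; i++) {
        // src[i][j] = i * rep;
        dst[i][j] = src[i][j];
    }
}

In the C version, where 2D arrays are stored row-major, the inner i loop of the interchanged version strides through memory 2048 ints at a time instead of touching consecutive elements, which is exactly the access-pattern change this experiment is designed to expose.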
You will download a set of four tiny programs—one in C and three in Java—that contain those loops. You'll compile them and time how long it takes them to run. For the C program, you'll compile both with and without compiler optimizations enabled, so in total you will have five programs to compare at a time (three Java programs + one C program compiled two ways).
You will do this several times, making small modifications and observing how each affects performance, how much the choice of language matters, and how effective the compiler is at optimizing your code. The modifications are to:
- interchange the order of the i and j loops
- uncomment the commented line
- change the size of the array being copied from 2048 x 2048 to 4096 x 4096
You'll run each version of the code and measure how long it takes to complete. With all the permutations (5 executables x 2 loop orderings x 2 commented/uncommented line versions x 2 array sizes), that's 40 versions. (It will be easy—just read all the way through these instructions first.)
You'll then turn in a short document, described below, in which you summarize your test results and answer a few questions.
Details
Downloading
Fetch the files, which are provided as a tar archive: hw3.tar. Save them to a directory in which you want a new directory (containing the files) created.
Now issue the command tar xvf hw3.tar. That will un-archive the files, creating the directory hw3. In that directory you will find these files:
File | Description |
---|---|
cacheExperiment.java | Rows 'Java' in your tables of test results (see below) |
cacheExperimentInteger.java | Rows 'JavaInteger' in your tables |
cacheExperimentIntegerInteger.java | Rows 'JavaIntegerInteger' in your tables |
cacheExperiment.c | Rows 'C' and 'Optimized C' in your tables |
run.pl | See “Automating” below |
Compiling
To compile the C program without optimizations, cd to the hw3 directory and run gcc -Wall cacheExperiment.c. That results in an executable named a.out. To compile the program with optimizations, type gcc -Wall -O2 cacheExperiment.c, which also produces executable a.out (overwriting the previous one).
To run a.out, you would type ./a.out. (Note: You don't actually want to do this. See the next heading about obtaining timings.)
To compile cacheExperiment.java, type javac -Xlint cacheExperiment.java, which produces cacheExperiment.class. Do the same thing for the other Java programs. To run it, type java -Xmx640M -cp . cacheExperiment. (Again, this is a command you need to time, so read on.)
Timing
On Linux, you can measure the CPU time consumed by any execution using the time program. For example:

$ /usr/bin/time ./a.out
0.12user 0.03system 0:00.16elapsed 95%CPU (0avgtext+0avgdata 66704maxresident)k
0inputs+0outputs (0major+8287minor)pagefaults 0swaps
This executes the command (./a.out) and then prints information about the resources it consumed. (Type man time to obtain more information about the time program and ways to format its output.)
The only information we'll use is the user time ('0.12user', meaning 0.12 seconds of CPU time consumed while not in the operating system) and the system time ('0.03system', meaning 0.03 CPU seconds spent by the operating system doing things for this application). The measured time we want is the sum of those two. For this example, the measured time would be 0.15 seconds.
Measured times are likely to vary quite a bit from one run to the next, even without changing anything. (This course will explain some of the reasons why.) Note that all the programs wrap the two array-copying loops with another loop that causes the copy to be performed 10 times. One goal of that is to reduce the amount of variability in the measurements.
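For reference, here is a minimal sketch of that overall structure. This is an assumption about how the distributed programs are organized, not their actual source, and the names N and REPS are purely illustrative:

#include <stdio.h>

#define N    2048   /* array dimension; the handout uses 2048 and, later, 4096 */
#define REPS 10     /* the copy is repeated to smooth out run-to-run noise */

static int src[N][N];
static int dst[N][N];

int main(void) {
    int rep, i, j;
    for (rep = 0; rep < REPS; rep++) {
        for (i = 0; i < N; i++) {
            for (j = 0; j < N; j++) {
                // src[i][j] = i * rep;
                dst[i][j] = src[i][j];
            }
        }
    }
    /* print something that depends on dst so the copy is observable */
    printf("dst[0][0] = %d\n", dst[0][0]);
    return 0;
}

Repeating the copy helps amortize one-time costs (such as first-touch page faults) across many iterations, so the user+system total varies less from run to run.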
Automating
The distribution includes an optional script, run.pl, that automates some of the chore of running the five executables and gathering measurements. To run it, type ./run.pl. It compiles each of the source files (compiling cacheExperiment.c twice, with and without optimizations), runs each with the time command, and reports the sum of the user and system times.
run.pl should work in most environments (including the CSE virtual machine), but it is an optional (and unsupported) tool.
So, to summarize:
1. Compile and measure each of the Java implementations as they come in the distribution. Compile and measure the C program with and without optimizations.
2. Edit each source file to uncomment the assignment to array src. Re-compile and re-measure.
3. Edit to switch the order of the i and j loops. Re-compile and re-measure.
4. Edit to re-comment out the statement assigning to array src (with the i and j loops still reversed). Re-compile and re-measure.
5. Edit to put the loops back in the original order. (At this point the code is the same as it was when you first fetched it.) Change the code to copy an array of size 4096 x 4096. Then repeat steps 1–4 above.
Test Results
Collect your results in a short PDF document with the following sections:
- The Test System
  - A short string describing the system you ran on (e.g., "my Mac laptop" or "the CSE home VM on my Windows laptop" or "lab Linux workstation").
  - What the CPU is on that system. You can obtain that on any Linux system by issuing the command cat /proc/cpuinfo. Give us the model name, as listed.
- Test Results
  Four tables of numbers giving the measured CPU time consumed when executing each of the five executables under the different configurations. Each table should look like this. (It doesn't have to match this formatting in every detail, but please keep your information in the same order; it makes reading 50 copies of these tables easier if they're all laid out the same way.)

  Array Size | Performing src assignment? | App | Time with i then j | Time with j then i |
  ---|---|---|---|---|
  2048 | No | Java | | |
  2048 | No | JavaInteger | | |
  2048 | No | JavaIntegerInteger | | |
  2048 | No | C | | |
  2048 | No | Optimized C | | |
- Q&A
  Answer these questions:
  - What are the source code differences among the three Java implementations?
  - Pick a single pair of results that most surprised you. What is it about the results that surprised you? (That is, from the (40*39)/2 pairs of measurement results, pick one pair whose relationship is least like what you would have guessed.)
  - [Optional extra credit] None of these programs appear to actually do anything, so one is tempted to optimize them by simply eliminating all code (resulting in an empty main()). Is that a correct optimization? Related to that, try compiling this C program, with and without optimization, and then timing its execution:

        #include <stdio.h>

        #define SIZE 1000000

        int main() {
            int i, j, k;
            int sum = 1;
            for (i = 0; i < SIZE; i++) {
                for (j = 0; j < SIZE; j++) {
                    for (k = 0; k < SIZE; k++) {
                        sum = -sum;
                    }
                }
            }
            printf("hello, world\n");
            return 0;
        }
    Now replace the printf line with

        printf("Sum is %d\n", sum);

    and compile/run unoptimized and optimized.

    Added 5/20: Feel free to experiment with smaller numbers for SIZE if using 1000000 as suggested above takes more time than you have patience for. The important thing is to observe how the code behaves with and without optimizations, both before and after changing the printf statement, and explain what is going on.
Submitting Your Work
Please turn in a PDF file containing your answers to the Catalyst drop box linked from the CSE 410 main web page. We will not accept electronic submissions that are not in PDF format. (Scanned copies of handwritten files are okay, as long as the files are small - a few MB at the most - and the copies are legible, but PDF files produced with a text editor or word processor are preferred.) You may also hand in a paper copy of your solutions in the class before the due date.