CSE 351: The Hardware/Software Interface

Autumn 2011 Course Website Return home »

Homework 0: An experiment in C and Java

Assigned Saturday, September 24, 2011
Due Date Wednesday, October 5, 2011 at 11:59p
Files hw0.tar.gz

Assignment Goals

Overview

Let's test the claim that understanding the memory hierarchy can be useful in writing efficient programs. An example in the first-day lecture slides said that interchanging two loops has no effect on the correctness of the results, but can give a 21x difference in performance. Let's see about that.

Here's the important part of the code. It computes exactly the same thing no matter which of the two loops is outermost.

int rep;
int i, j;

// ...

    for (i = 0; i < 2048; i++) {
        for (j = 0; j < 2048; j++) {
            // src[i][j] = i * rep;
            dst[i][j] = src[i][j];
        }
    }

You will download a set of four tiny programs—one in C and three in Java—that contain those loops. You'll compile them and time how long it takes them to run. For the C program, you'll compile both with and without compiler optimizations enabled, so in total you will have five programs to compare at a time (three Java programs + one C program compiled two ways).

You will do this several times, making small modifications to see what differences they make—how the choice of language affects performance and how effective the compiler can be at optimizing your code when you:

You'll run each version of the code and measure how long it takes to complete. With all the permutations (5 executables x 2 loop orderings x 2 commented/uncommented line versions x 2 array sizes), that's 40 versions. (It will be easy—just read all the way through these instructions first.)

You'll then turn in a short document, described below, in which you summarize your test results and answer a few questions.

Details

System requirements

The assignment requires use of a Linux system (or a Mac, if the right software is installed by default):

Downloading

Fetch the files, which are provided as a tar archive: hw0.tar.gz . Save them to a directory in which you want a new directory (containing the files) created.

Now issue the command tar xzf hw0.tar.gz. That will un-archive the files, creating directory hw0. In that directory you will find these files:

File Description
hw0.java Rows 'Java' in your tables of test results (see below)
hw0Integer.java Rows 'JavaInteger' in your tables
hw0IntegerInteger.java Rows 'JavaIntegerInteger' in your tables
hw0.c Rows 'C' and 'Optimized C' in your tables
run.pl See “Automating” below

Compiling

To compile the C program without optimizations, cd to the hw0 directory and run gcc -Wall hw0.c. That results in an executable named a.out. To compile the program with optimizations, type gcc -Wall -O2 hw0.c, which also produces executable a.out (overwriting it).

To run a.out, you would type ./a.out. (Note: You don't actually want to do this. See the next heading about obtaining timings.)

To compile hw0.java type javac -Xlint hw0.java, which produces hw0.class. Do the same thing for the other Java programs.
To run it, type java -Xmx640M -cp . hw0. (Again, this is a command you need to time, so read on.)

Timing

On Linux, you can measure the CPU time consumed by any execution using the time program. For example:

$ /usr/bin/time ./a.out
0.12user 0.03system 0:00.16elapsed 95%CPU (0avgtext+0avgdata 66704maxresident)k
0inputs+0outputs (0major+8287minor)pagefaults 0swaps

This executes the command (./a.out) and then prints information about the resources it consumed. (Type man time to obtain more information about the time program and ways to format its output.)

The only information we'll use is the user time ('0.12user', meaning 0.12 seconds of CPU time consumed while not in the operating system) and the system time ('0.03system', meaning 0.03 CPU seconds spent by the operating system doing things for this application). The measured time we want is the sum of those two. For this example, the measured time would be 0.15 seconds.

Measured times are likely to vary quite a bit from one run to the next, even without changing anything. (This course will explain some of the reasons why.) Note that all the programs wrap the two array-copying loops with another loop that causes the copy to be performed 10 times. One goal of that is to reduce the amount of variability in the measurements.

Automating

The distribution includes an optional script, run.pl, that automates some of the chore of running the five executables and gathering measurements. To run it, type ./run.pl. It compiles each of the source files (and hw0.c twice; with and without optimizations), runs each with the time command, and reports the sum of the user and sytem times.

run.pl should work in most environments (including the CSE virtual machine). It should work for you, but it is an optional (and unsupported) tool.

So, to summarize:

  1. Compile and measure each of the Java implementations as they come in the distribution. Compile and measure the C program with and without optimizations.
  2. Edit each source file to uncomment the assignment to array src. Re-compile and re-measure.
  3. Edit to switch the order of the i and j loops. Recompile and re-measure.
  4. Edit to re-comment out the statement assigning to array src (with the i and j loops still reversed). Re-compile and re-measure.
  5. Edit to put the loops back in the original order. (At this point the code is the same as it was when you first fetched it.) Change the code to copy an array of size 4096 x 4096. Then repeat steps 1–4 above.

Test Results

Collect your results in a short PDF document with the following sections:

  1. The Test System
    • A short string describing the system your ran on (e.g., “my Mac laptop” or “the CSE home VM on my Windows laptop” or “lab Linux workstation”).
    • What the CPU is on that system. You can obtain that on any Linux system by issuing the command cat /proc/cpuinfo. Give us the model name, as listed.
  2. Test Results Four tables of numbers giving the measured CPU time consumed when executing each of the five executables under the different configurations. Each table should look like this. (It doesn't have to be exactly this, to every detail of formatting, but please keep your information in the same order; it makes reading 50 copies of these tables easier if they're all laid out the same way.)
    Array Size Performing
    src assignment?
    App Time with i then j Time with j then i
    2048 No Java
    JavaInteger
    JavaIntegerInteger
    C
    Optimized C
  3. Q&A
    Answer these questions:
    1. What are the source code differences among the three Java implementations?
    2. Pick a single pair of results that most surprised you. What is it about the results that surprised you? (That is, from the (40*39)/2 pairs of measurement results, pick one pair whose relationship is least like what you would have guessed.)
    3. [Optional extra credit: 0 points] None of these programs appear to actually do anything, so one is tempted to optimize them by simply eliminating all code (resulting in an empty main()). Is that a correct optimization? Related to that, try compiling this C program, with and without optimization, and then timing running it:
      #include <stdio.h>
      
      #define SIZE 1000000
      
      int main() {
          int i, j, k;
          int sum = 1;
      
          for (i = 0; i < SIZE; i++) {
              for (j = 0; j < SIZE; j++) {
                  for (k = 0; k < SIZE; k++) {
                      sum = -sum;
                  }
              }
          }
      
          printf("hello, world\n");
      
          return 0;
      }
      

      Now replace the printf line with

      printf("Sum is %d\n", sum);
      and compile/run unoptimized and optimized.

Submitting Your Work

Please turn in a PDF file containing your answers to the Catalyst Drop Box for this assignment. We will not accept submissions that are not in PDF format.

Bring your results to section on Thursday so we can discuss them as a group and find out whether the class saw any interesting patterns.

Grading

Grading is binary. If you convince us that you did the assignment (and on time) then you will receive full credit. If you fail to convince us, you will receive no credit!