CSE 351: The Hardware/Software Interface, Winter 2011
  CSE Home   About Us   Search   Contact Info 
 
Course Home
  Home
Administation
  Overview
  Course email
  Anonymous feedback
  View feedback
 
Assignment Utilities
  Home Virtual Machines
  Homework Turnin
  Class GoPost Forum
  Gradebook
 
Most Everything
  Schedule
    Homework 0
Out: Monday January 3
Due: Wednesday January 5, 11:59 PM
Turnin: Online

Update: New version of run.pl modified to bark loudly when an execution fails.

Assignment Goals

  • Evaluate a claim made in the lecture slides that two seemingly equivalent ways of writing a program can have vastly different performance.
  • Get a feel for the relative performance of Java and C.
  • Get a feel for the effectiveness of turning on C compiler optimizations.
  • Verify that you have set up a working environment in which you can edit, compile, and run C programs.

Overview

The first-day lecture slides make the claim that understanding the memory hierarchy can be useful in writing efficient programs. An example shows that interchanging two loops leads to two correct implementations, but with a factor of 21 difference in performance.

Let's try that.

Here's the important part of the code. The code computes exactly the same thing no matter which of the two loops is outermost.


    for ( i=0; i<2048; i++ ) {
      for ( j=0; j<2048; j++ ) {
        //src[i][j] = i*rep;
	dst[i][j] = src[i][j];
      }
    }

I've written four implementations containing these loops, one in C and three in Java. For the C program, you'll compile both with and without compiler optimizations enabled. Comparing the performance of these five (3 Java + C compiled two ways) allows us to get some idea how the choice of language affects performance and how effective the compiler can be at optimizing your code.

You'll edit the programs to change a couple small things. One is to interchange the order of the i and j loops. Another is to uncomment the commented line. The third is to change the size of the array being copied (from 2048 to 8192).

You'll run each version of the code and measure how long it takes to complete. (5 executables x 2 loop orderings x 2 comment/uncommented line versions x 2 array sizes = 40 versions.) You'll then turn in a short document, structured like this:

  1. The Test System
    • A short string describing the system your ran on (e.g., 'my Mac laptop' or 'the CSE home VM on my Windows laptop' or 'lab Linux workstation' or ...).
    • What the CPU is on that system. You can obtain that on any Linux system using the command 'cat /proc/cpuinfo'. Give us the model name, as listed.

  2. Test Results
    Four tables of numbers giving the measured CPU time consumed when executing each of the five executables under the different configurations. Each table should look like this. (It doesn't have to be exactly this, to very detail of formatting, but it makes reading 50 copies of these tables easier if they're all laid out the same way.)
    Array SizePerforming
    src assignment?
    Appi then jj then i
    2048NoJavameasured timeetc.
    JavaInteger  
    JavaIntegerInteger  
    C  
    Optimized C  

  3. Q&A
    Answer these questions
    1. What are the source code differences among the three Java implementations?
    2. Pick a single pair of results that most surprised you. What is it about the results that surprised you?
      (That is, from the (40*39)/2 pairs of measurement results, pick one pair whose relationship is least like what you would have guessed.)
    3. [Optional extra credit: 0 points] None of these programs appear to actually do anything, so one is tempted to optimize them by simply eliminating all code (resulting in an empty main()). Is that a correct optimization?

      Related to that, trying compiling this C program, with and without optimization, and then timing running it:

      #include <stdio.h>
      
      #define SIZE 1000000
      
      int main() {
        
        int i, j, k;
        int sum=1;
      
        for (i=0; i<SIZE; i++)
          for (j=0; j<SIZE; j++)
            for (k=0; k<SIZE; k++)
      	sum = -sum;
      
      
        printf("hello, world\n");
      
        return 0;
      }
      

      Now replace the printf line with

      printf("Sum is %d\n", sum);
      
      and compile/run unoptimized and optimized.

Details

  • The assignment requires use of a Linux system (or possibly a Mac -- I'm not sure exactly what software comes on them). The practical possibilities are:
    • Your personal machine:
      • If you're running Linux, what you need is either already installed or else easily obtained.
      • If you're not running Linux, you can run the CSE home Linux virtual machine. I highly recommend this option!
    • A CSE instructional lab Linux workstation.
    • The CSE Linux instructional backend machine(s), attu.cs.washington.edu, accessed via ssh.

  • Fetch the files, which are provided as a tar archive: hw0.tar.gz.
    Save them to a directory in which you want a new directory (containing the files) created.

    Now issue the command tar xzf hw0.tar.gz
    That will un-archive the files, creating directory hw0.

    In that directory you will find these files:

    hw0.javaRows 'Java' in your tables
    hw0Integer.javaRows 'JavaInteger' in your tables
    hw0IntegerInteger.javaRows 'JavaIntegerInteger' in your tables
    hw0.CRows 'C' and 'Optimized C' in your tables

  • To compile the C program without optimizations, type 'gcc -Wall hw0.c'. That results in an executable named a.out.

    To compile the program with optimizations, type 'gcc -Wall -O2 hw0.c', which also produces executable a.out (overwriting it).

    To run a.out, type './a.out'.
    Note: You don't actually want to do this. See the point on obtaining timings below.

  • To compile hw0.java type 'javac -Xlint hw0.java'. Make the obvious change for the other java files.

    To run, type 'java -cp . hw0'.

  • On Linux, you can measure the CPU time consumed by any execution using the time program. For example:
    $ /usr/bin/time ./a.out
    0.12user 0.03system 0:00.16elapsed 95%CPU (0avgtext+0avgdata 66704maxresident)k
    0inputs+0outputs (0major+8287minor)pagefaults 0swaps
    
    This executes the command (./a.out) and then prints information about the resources it consumed. The only information we'll use is the user time ('0.12user', meaning 0.12 seconds of CPU time consumed while not in the operating system) and the system time ('0.03system', meaning 0.03 CPU seconds spent by the operating system doing things for this application).

    The measured time we want is the sum of those two. For this example, the measured time would be 0.15 seconds.

    (Type 'man time' to obtain more information about the time program.)

  • Measured times are likely to vary quite a bit from one run to the next, even without changing anything. (This course will explain some of the reasons why.)

    All implementations wrap the two array copying loops with another loop that causes the copy to be performed 10 times. Part of the goal of that is to reduce the amount of variability in the measurements.

  • The distribution includes a script, run.pl that automates some of the chore of running the five executables and gathering measurements. To run it, type './run.pl'. It compiles each of the source files (and hw0.c twice, with and without optimizations), runs each with the time command, and reports the sum of the user and sytem times.

    run.pl works everywhere I've tried it (including the CSE virtual machine). It should work for you, but it is an optional and unsupported tool.

  • To summarize the measurement task:
    • Compile and measure each of the Java implementations as they come in the distribution. Compile and measure the C program as well, with and without optimizations.
    • Edit each source file to uncomment the assignment to array src. Re-compile and re-measure.
    • Edit to switch the order of the i and j loops. Recompile and re-run.
    • Edit to re-comment the statement assigning to array src. Re-compile and re-run.
    • Edit to put the loops back in the original order. (At this point the code is the same as it was when you first fetched it.) Change the code to copy an array of size 8192x8192. Then repeat the steps above.

Turnin

Please turn in a .pdf or .txt file using the turn-in page linked from the course home page. (You can do so even if you're not yet officially enrolled.)

Grading

Grading is basically binary. If you convince us that you actually did the assignment, on time, then full credit. If you fail to convince us, then no credit.

Note that there are two kinds of work required to "do the assignment" -- the mechnical aspects of modifying and measuring the code, and the intellectual effort of thinking about the results.


Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to zahorjan at cs.washington.edu]