CSE 374 16wi :: Homework 1

Due: Thursday, Jan. 14, 2016, at 23.00.

Introduction

This homework presents you with two scenarios. Each asks you to write a shell script to solve a problem. The goal is to give you an opportunity to practice shell scripting on realistic tasks. Many real-world tasks involve operating on some data, and these are no exception. You will need to download the data files for this homework (from the command line: wget http://courses.cs.washington.edu/courses/cse374/16wi/hws/hw1-data-files.tar). They are distributed as a tar archive, which you can extract with tar -xf hw1-data-files.tar.

Each scenario requires you to use tools we did not talk about in class, so please read the implementation advice sections carefully. Please use the discussion board for this assignment to post questions and help each other out.

Performance Review

Among the data files, you will find two programs: moonlight and rosebud. Each program takes a single number as an argument. We are very interested in the relative performance of these two programs, and in particular, which one is faster. We expect this may depend on what number is passed to each program.

Your task is to collect performance data for both programs, store that data in files, and then display a plot of the data using gnuplot, a command-line graphing utility. This plot should make clear under what inputs each program is preferable in terms of running time.

Specification

Write a script profile.sh with the following features:
  1. Generates a file moonlight.dat where each line is two numbers: (1) the number passed to moonlight and (2) how long moonlight took to run in seconds. See below for an example of the required format (the first line in the file is a comment that labels the columns).
                #N time
                500 0.02
                1000 0.09
                1500 0.21
  2. Generates a file rosebud.dat containing the corresponding data for rosebud.
  3. Displays a plot of the data from moonlight and rosebud.
  4. Prints nothing.
  5. Takes no arguments.

Extra credit: Enhance the data collection or plotting in interesting ways. For example, you could display best-fit functions along with the data, or you could extend the data collection to average multiple trials for each data point to reduce error. Be creative and push yourself! Extra credit will be awarded for each distinct, interesting enhancement. Please document any extra credit features you implement in a file extra_credit.txt.
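If you try the averaging idea, keep in mind that bash arithmetic works only with integers, so the division has to happen in another tool. The sketch below assumes awk, a standard Unix utility, is available; the function name mean and the sample values in the comment are made up for illustration.

```shell
#!/bin/bash
# mean: hypothetical helper that reads one timing sample per line on stdin
# and prints their average with two decimal places.
# Bash's $(( )) arithmetic is integer-only, so awk does the floating-point math.
mean() {
  awk '{ sum += $1 } END { if (NR > 0) printf "%.2f\n", sum / NR }'
}

# Example with made-up samples:
#   printf '%s\n' 0.20 0.22 0.21 | mean    # prints 0.21
```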

Implementation Advice

Timing: There are multiple ways to get the running time of a program in Linux. Bash has a built-in time that is useful in most circumstances. Since you need to get the time in a specific format, however, I recommend using the GNU version of time instead, which is at /usr/bin/time. This version enables you to specify the format of the output. A description of how this works appears in the man page.

Use /usr/bin/time to get the running time of moonlight with an input of 500 as follows: /usr/bin/time -f "some_format_string" ./moonlight 500. The format string controls what timing information gets printed; consult the man page for a description of what goes into it. For example, the format string "total time: %E" tells /usr/bin/time to print total time: followed by the elapsed wall-clock time in [hours:]minutes:seconds format. It is important to note that /usr/bin/time prints its output to stderr, not stdout. Remember that the file descriptor for stderr is 2.
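Putting the format string and the stderr note together, timing a single run might look like the sketch below. It assumes GNU time is installed at /usr/bin/time and that the timed program writes its bulk output to stdout rather than stderr; the function name time_one and the loop bounds in the comment are hypothetical. GNU time's -o FILE option, which writes the report to a file instead of stderr, is another way to keep the two streams apart.

```shell
#!/bin/bash
# time_one CMD [ARGS...]: hypothetical helper that runs CMD once and prints only
# its elapsed wall-clock time in seconds (GNU time's %e, e.g. 0.21) on stdout.
# Assumes CMD writes its own output to stdout, not stderr.
time_one() {
  # Redirections are applied left to right:
  #   2>&1        fd 2, where /usr/bin/time reports, now points at our stdout
  #   > /dev/null fd 1, the program's normal output, is then thrown away
  /usr/bin/time -f "%e" "$@" 2>&1 > /dev/null
}

# Possible use in profile.sh (the range of N is illustrative):
#   echo "#N time" > moonlight.dat
#   for n in $(seq 500 500 5000); do
#     echo "$n $(time_one ./moonlight "$n")" >> moonlight.dat
#   done
```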

Output: Both moonlight and rosebud print a lot of output, but the specification for profile.sh states that the script should not print anything. To meet the spec, you will need to suppress this output when profile.sh runs either program. One way to accomplish this is to redirect the output to /dev/null, a special Unix file that throws away everything written to it. It is possible to have multiple redirections for a single command.
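Because redirections are processed left to right, the order in which you write them changes the result. The small experiment below shows this; noisy is a hypothetical stand-in for moonlight or rosebud that writes one line to each stream.

```shell
#!/bin/bash
# noisy: hypothetical stand-in for a program that writes to both streams.
noisy() {
  echo "lots of program output"    # stdout, file descriptor 1
  echo "a diagnostic message" >&2  # stderr, file descriptor 2
}

# Discard everything: fd 1 goes to /dev/null, then fd 2 is pointed
# wherever fd 1 now points (also /dev/null).
silent=$(noisy > /dev/null 2>&1)

# Reverse the order: fd 2 is first copied from the original stdout (the $(...)
# capture), and only afterwards is fd 1 sent to /dev/null. The stderr line survives.
only_stderr=$(noisy 2>&1 > /dev/null)
```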

Plotting: You will use gnuplot to display a simple plot of the performance data you collect. gnuplot provides a simple language to construct plots, and the kind of plot you are expected to display can be created with a single command:

            plot 'moonlight.dat' title 'moonlight' with linespoints, 'rosebud.dat' title 'rosebud' with linespoints

Of course, this command must be given to gnuplot somehow. My recommendation is to put this command in a separate file (the name is not important; I called mine plot.gp) and use gnuplot's -c flag to have it run commands from that script. You will also probably need to use -p to get the plot to stick around after gnuplot exits. Otherwise, the plot will immediately disappear after gnuplot finishes executing the command. Altogether, following this approach, the line to do the plotting in profile.sh would look like

            gnuplot -p -c plot.gp
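For reference, a plot.gp along these lines might hold the plot command above plus, optionally, axis labels (the labels are an addition, not part of the spec). The file is written here with a shell here-document only to show its full contents in one place; creating it in a text editor works just as well. gnuplot skips the #N time header in the data files because lines starting with # are treated as comments, and a trailing backslash continues a gnuplot command on the next line.

```shell
#!/bin/bash
# Write a gnuplot script. Quoting 'EOF' stops the shell from expanding anything inside.
cat > plot.gp <<'EOF'
set xlabel 'argument passed to the program'
set ylabel 'running time (s)'
plot 'moonlight.dat' title 'moonlight' with linespoints, \
     'rosebud.dat' title 'rosebud' with linespoints
EOF
# profile.sh would then display the plot with:  gnuplot -p -c plot.gp
```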

A final note about actually seeing the plot. gnuplot uses the X11 windowing system to display the plots. If you do this assignment locally on a Linux machine or within a Linux virtual machine, this should work without needing any additional action on your part. If you do this assignment on klaatu, however, there are a couple of things you will need to do. First, when you connect to klaatu with ssh you will need to use the -Y flag to tell ssh to forward the X11 data to your local machine (i.e., ssh -Y username@klaatu.cs.washington.edu).

Second, you will need a version of X11 on your local machine. On Windows, you will need to install Xming, a free implementation of X11 for Windows. You will need to have Xming already running in order to see the plots. Alternatively, MobaXTerm is an all-in-one solution. On Mac, you will need to install the free X11 implementation XQuartz. Depending on what version of OS X you have, XQuartz may already be installed. XQuartz should open automatically whenever it is needed. Please post to the discussion board for this homework if you encounter any difficulties getting this stuff up and running.

Image Recovery

We are currently collecting images of vital importance. Unfortunately, the instruments and software we use to acquire them are flawed, and the images are stored as text rather than image data. Your task is to write a script that can process these text files and convert them into images. Five of the images we have collected so far are provided in the data files (img_a.txt, img_b.txt, etc.) to assist you in testing your script.

Specification

Write a script txt_to_png.sh with the following features:
  1. Takes any number of arguments, each of which is the name of a text file to be converted into a png file (a type of image file, pronounced "ping," which stands for Portable Network Graphics).
  2. Converts the file given by each argument into a png file of the same name (not including the file extension, naturally).
  3. Prints an appropriate usage message to stderr if no arguments are provided.
  4. Prints an appropriate error message to stderr if one of the arguments points to a file that does not exist.
  5. Prints nothing to stdout.
  6. Removes any temporary files the script creates.
  7. Runs in a reasonable amount of time (takes no longer than 20 seconds on any of the provided images when run on klaatu).

Extra credit: The convert program you will be using is capable of much more than just converting images from one format to another. It can do simple image transformations like resizing and converting an image to grayscale, as well as more complicated operations such as sharpening and blurring. These are described in ImageMagick's extensive documentation. To receive extra credit for this exercise, you must implement at least two of these operations as optional arguments to your script (the bash built-in getopts may prove useful for this). Be careful that implementing these does not interfere with meeting the spec, as your script will first be evaluated without any extra credit features. Please document any extra credit features you implement in a file extra_credit.txt.
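A minimal getopts sketch, assuming two illustrative flags: -g for grayscale (convert's -colorspace Gray) and -r SIZE for resizing (convert's -resize). The function name parse_opts and the global arrays extra and files are hypothetical. Declaring OPTIND local resets getopts each time the function is called.

```shell
#!/bin/bash
# parse_opts ARGS...: hypothetical option parser. Fills the global array
# "extra" with convert arguments and "files" with the remaining file names.
parse_opts() {
  local opt OPTIND=1
  extra=()
  while getopts "gr:" opt; do        # "r:" means -r takes a value in $OPTARG
    case $opt in
      g) extra+=(-colorspace Gray) ;;
      r) extra+=(-resize "$OPTARG") ;;
      *) echo "Usage: txt_to_png.sh [-g] [-r SIZE] file.txt ..." >&2; return 1 ;;
    esac
  done
  shift $((OPTIND - 1))              # drop the options that were parsed
  files=("$@")
}
```

The collected words could then be spliced into the conversion, for example convert "txt:$tmp" "${extra[@]}" "$png", so that a run with no options behaves exactly as the base spec requires.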

Implementation Advice

Conversion: You will use ImageMagick's convert program to produce the final png files. The first step of the conversion, however, is to convert from the format of the input text files to a text format convert can understand. The format of the input text files is shown below.

          width
          height
          r g b
          r g b
          ...

The lines for width and height will give the image's width and height in pixels. Each line after that gives the color of a single pixel in the image in terms of the red, green, and blue values. For example, img_c.txt begins like this.

          315
          253
          86 82 79
          77 73 70

The pixel values are listed in something called row-major order, meaning each pixel in the top row of the image is listed, left-to-right, followed by each pixel in the second row, and so on. As a complete example, here's what the input file would look like for a 3-pixel-by-3-pixel image whose top row is red, green, blue, whose middle row is blue, red, green, and whose bottom row is green, blue, red:

          3
          3
          255 0 0
          0 255 0
          0 0 255
          0 0 255
          255 0 0
          0 255 0
          0 255 0
          0 0 255
          255 0 0
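Row-major order means a pixel's position can be computed from its index alone: with pixel lines counted from 0 (after the width and height lines), x is the remainder and y the quotient of dividing the index by the width. A small sketch; pixel_xy is a hypothetical helper name.

```shell
#!/bin/bash
# pixel_xy WIDTH INDEX: hypothetical helper printing "x,y" for the pixel
# at position INDEX (0-based) in a row-major list of pixels.
pixel_xy() {
  local width=$1 i=$2
  echo "$(( i % width )),$(( i / width ))"   # column, then row
}
```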
      

The format convert requires is similar, but different enough that you will need to think carefully about how you will do the conversion. Using the 3-by-3 image again, here's what the converted text file would look like.

          # ImageMagick pixel enumeration: 3,3,255,srgb
          0,0: (255,0,0)
          1,0: (0,255,0)
          2,0: (0,0,255)
          0,1: (0,0,255)
          1,1: (255,0,0)
          2,1: (0,255,0)
          0,2: (0,255,0)
          1,2: (0,0,255)
          2,2: (255,0,0)

This format must be followed exactly for convert to recognize the text as corresponding to image data. Note the contents of the first line, and the position of the width and height within that line. Also note that each pixel value is preceded by its x-y position in the image. To extract the width and height from the input file, I recommend looking into the -n option for the head and tail commands. To transform the pixel values separated by spaces into pixel values separated by commas, you might consider using parameter substitution or the tr program.
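The sketch below tries the suggested commands on a tiny 3-by-1 input (red, green, blue) written to a temporary file; the variable names are illustrative. Note that tail -n +3 starts printing at line 3, which skips the width and height lines.

```shell
#!/bin/bash
# Create a tiny input file: width 3, height 1, then three "r g b" lines.
tmp=$(mktemp)
printf '%s\n' 3 1 '255 0 0' '0 255 0' '0 0 255' > "$tmp"

width=$(head -n 1 "$tmp")                     # line 1
height=$(head -n 2 "$tmp" | tail -n 1)        # line 2
first_pixel=$(tail -n +3 "$tmp" | head -n 1)  # line 3, the first pixel

# Two ways to turn "255 0 0" into "255,0,0":
with_tr=$(echo "$first_pixel" | tr ' ' ',')
with_subst=${first_pixel// /,}                # bash: replace every space with a comma

header="# ImageMagick pixel enumeration: $width,$height,255,srgb"
rm -f "$tmp"                                  # remove the temporary file
```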

Once you have assembled a text file in the correct format, you invoke convert as follows.

          convert txt:input_filename output_filename
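Combining these pieces, one way to produce the converted text is to read the pixel lines in order while keeping a running x,y position that wraps to the next row at the image width. This is a sketch assuming a well-formed input file (no blank or extra lines); to_im_txt is a hypothetical name, and a plain bash read loop is only one option (an awk program would typically run faster). The commented lines show how a temporary file could be handed to convert and then removed.

```shell
#!/bin/bash
# to_im_txt FILE: hypothetical helper that rewrites an input text image
# (width, height, then one "r g b" line per pixel in row-major order)
# into ImageMagick's pixel enumeration format on stdout.
to_im_txt() {
  local file=$1 width height x=0 y=0 r g b
  width=$(head -n 1 "$file")
  height=$(head -n 2 "$file" | tail -n 1)
  echo "# ImageMagick pixel enumeration: $width,$height,255,srgb"
  # read splits each "r g b" line on whitespace into three variables.
  tail -n +3 "$file" | while read -r r g b; do
    echo "$x,$y: ($r,$g,$b)"
    # Move one pixel right; at the end of a row, wrap to the start of the next.
    x=$(( x + 1 ))
    if [ "$x" -eq "$width" ]; then x=0; y=$(( y + 1 )); fi
  done
}

# Possible use:
#   tmp=$(mktemp)
#   to_im_txt img_c.txt > "$tmp"
#   convert "txt:$tmp" img_c.png
#   rm -f "$tmp"
```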

Though your script should not do this, you can check to see if your script produced a reasonable-looking image with the display command.

Processing arguments: You may find the following built-in features useful when processing the arguments to your script.

Printing to stderr: Recall that since stdout is file descriptor 1 and stderr is file descriptor 2, you can redirect output from stdout to stderr with 1>&2.
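A sketch of the two argument checks from the specification, using $# (the number of arguments), a loop over "$@", and the -e file test. check_args and png_name are hypothetical names; they are written as functions that return a status so they are easy to try out, whereas a script would normally exit instead. ${1%.txt} removes a trailing .txt, which gives the output name required by item 2.

```shell
#!/bin/bash
# png_name FILE: replace a trailing .txt with .png (img_a.txt -> img_a.png).
png_name() {
  echo "${1%.txt}.png"
}

# check_args ARGS...: print problems to stderr only; return nonzero on any problem.
check_args() {
  if [ $# -eq 0 ]; then
    echo "Usage: txt_to_png.sh file.txt [file.txt ...]" >&2
    return 1
  fi
  local f status=0
  for f in "$@"; do                  # quoting "$@" keeps names with spaces intact
    if [ ! -e "$f" ]; then
      echo "txt_to_png.sh: $f: file not found" >&2
      status=1
    fi
  done
  return "$status"
}
```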

Turn-in Instructions

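Use the turn-in drop box to submit the files you created to solve these two exercises. These include profile.sh, txt_to_png.sh, any supporting files your scripts rely on (such as the gnuplot command file used by profile.sh), and extra_credit.txt if you implemented any extra credit features.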

Please do not turn in any of the input files. Your solutions will be tested with the input files as they were provided to you, so do not make any modifications to them as part of your solution. You may use up to two late days on this assignment. If you do the exercises on klaatu, you can use scp (pscp or winscp on Windows) to transfer the files to your local machine, so they can be submitted using the local web browser.