CSE 374 16wi :: Homework 1
Due: Thursday, Jan. 14, 2016, at 23.00.
Introduction
This homework presents you with two scenarios, each of which asks you to write a shell script to solve a problem. The goal is to give you practice with shell scripting on realistic tasks. Many real-world tasks involve operating on data, and these are no exception. You will need to download the data files for this homework (from the command line: wget http://courses.cs.washington.edu/courses/cse374/16wi/hws/hw1-data-files.tar). They are distributed in a tar archive, which you can extract with tar -xf filename.
Each scenario requires you to use tools we did not talk about in class, so please read the implementation advice sections carefully. Please use the discussion board for this assignment to post questions and help each other out.
Performance Review
Among the data files, you will find two programs: moonlight and rosebud. Each program takes a single number as an argument. We are very interested in the relative performance of these two programs, and in particular, which one is faster. We expect this may depend on what number is passed to each program.
Your task is to collect performance data for both programs, store that data in files, and then display a plot of the data using gnuplot, a command-line graphing utility. This plot should make clear under what inputs each program is preferable in terms of running time.
Specification
Write a script profile.sh with the following features:
- Generates a file moonlight.dat where each line is two numbers: (1) the number passed to moonlight and (2) how long moonlight took to run, in seconds. See below for an example of the required format (the first line in the file is a comment that labels the columns).
#N time
500 0.02
1000 0.09
1500 0.21
- Generates a file rosebud.dat containing the corresponding data for rosebud.
- Displays a plot of the data from moonlight and rosebud.
- Prints nothing.
- Takes no arguments.
Extra credit: Enhance the data collection or plotting in interesting ways. For example, you could display best-fit functions along with the data, or you could extend the data collection to average multiple trials for each data point to reduce error. Be creative and push yourself! Extra credit will be awarded for each distinct, interesting enhancement. Please document any extra credit features you implement in a file extra_credit.txt.
Implementation Advice
Timing: There are multiple ways to get the running time of a program in Linux. Bash has a built-in time that is useful in most circumstances. Since you need to get the time in a specific format, however, I recommend using the GNU version of time instead, which is at /usr/bin/time. This version enables you to specify the format of the output. A description of how this works appears in the man page.
Use /usr/bin/time to get the running time of moonlight with an input of 500 as follows: /usr/bin/time -f "some_format_string" ./moonlight 500. The format string controls what timing information gets printed; consult the man page for a description of what goes into the format string. For example, the format string "total time: %E" tells /usr/bin/time to print total time: followed by the elapsed time in hours, minutes, and seconds. It is important to note that /usr/bin/time prints its output to stderr, not stdout. Remember that the file descriptor for stderr is 2.
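As a minimal sketch of the capture technique described above (not a complete solution), the following times a stand-in command and stores the result in a variable. The sleep 0.1 command is only a placeholder for the program being timed, and the %e specifier (elapsed seconds) is one possible choice of format string.

```shell
#!/bin/bash
# %e asks GNU time for elapsed wall-clock time in seconds.
# "sleep 0.1" is a stand-in for the program being timed
# (in the homework this would be something like ./moonlight 500).
# Redirection order matters: 2>&1 first points stderr at the command
# substitution, then >/dev/null discards the timed program's stdout.
elapsed=$(/usr/bin/time -f "%e" sleep 0.1 2>&1 >/dev/null)

# Shown on the terminal here for illustration only; a real script
# would append this line to a .dat file instead.
echo "500 $elapsed"
```

Anything the timed program itself writes to stderr would be captured along with the timing, so this sketch assumes the program is quiet on stderr.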
Output: Both moonlight and rosebud print a lot of output, but the specification for profile.sh states that the script should not print anything. To meet the spec, you will need to suppress this output when profile.sh runs either program. One way to accomplish this is to redirect the output to /dev/null, a special Unix file that throws away everything written to it. It is possible to have multiple redirections for a single command.
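The following sketch illustrates multiple redirections on a single command. The noisy function is a hypothetical stand-in for a program that writes to both stdout and stderr; it is not one of the provided programs.

```shell
#!/bin/bash
# noisy is a hypothetical stand-in for a program that writes to both
# stdout and stderr.
noisy() {
    echo "progress on stdout"
    echo "warning on stderr" >&2
}

# Two redirections on one command: stdout (fd 1) and stderr (fd 2)
# are each sent to /dev/null, so nothing reaches the terminal.
noisy > /dev/null 2> /dev/null

# Shorter equivalent: send stdout to /dev/null, then point stderr at
# wherever stdout currently goes.
noisy > /dev/null 2>&1
```

Keep in mind that discarding file descriptor 2 wholesale also discards anything /usr/bin/time writes there, so the two streams usually need to be handled separately when timing.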
Plotting: You will use gnuplot to display a simple plot of the performance data you collect. gnuplot provides a simple language to construct plots, and the kind of plot you are expected to display can be created with a single command:
plot 'moonlight.dat' title 'moonlight' with linespoints, 'rosebud.dat' title 'rosebud' with linespoints
Of course, this command must be given to gnuplot somehow. My recommendation is to put this command in a separate file (the name is not important; I called mine plot.gp) and use gnuplot's -c flag to have it run commands from a script. You will also probably need to use -p to get the plot to stick around after gnuplot exits. Otherwise, the plot will immediately disappear after gnuplot finishes executing the command. Altogether, following this approach, the line to do the plotting in profile.sh would look like:
gnuplot -p -c plot.gp
A final note about actually seeing the plot. gnuplot uses the X11 windowing system to display the plots. If you do this assignment locally on a Linux machine or within a Linux virtual machine, this should work without needing any additional action on your part. If you do this assignment on klaatu, however, there are a couple of things you will need to do. First, when you connect to klaatu with ssh, you will need to use the -Y flag to tell ssh to forward the X11 data to your local machine (i.e., ssh -Y username@klaatu.cs.washington.edu).
Second, you will need a version of X11 on your local machine. On Windows, you will need to install Xming, a free implementation of X11 for Windows. You will need to have Xming already running in order to see the plots. Alternatively, MobaXterm is an all-in-one solution. On Mac, you will need to install the free X11 implementation XQuartz. Depending on what version of OS X you have, XQuartz may already be installed. XQuartz should open automatically whenever it is needed. Please post to the discussion board for this homework if you encounter any difficulties getting this set up and running.
Image Recovery
We are currently collecting images of vital importance. Unfortunately, the instruments and software we use to acquire them are flawed, and the images are stored as text rather than image data. Your task is to write a script that can process these text files and convert them into images. Five of the images we have collected so far are provided in the data files (img_a.txt, img_b.txt, etc.) to assist you in testing your script.
Specification
Write a script txt_to_png.sh with the following features:
- Takes any number of arguments, each of which is the name of a text file to be converted into a png file (a type of image file, pronounced "ping," which stands for Portable Network Graphics).
- Converts the file given by each argument into a png file of the same name (not including the file extension, naturally).
- Prints an appropriate usage message to stderr if no arguments are provided.
- Prints an appropriate error message to stderr if one of the arguments points to a file that does not exist.
- Prints nothing to stdout.
- Removes any temporary files the script creates.
- Runs in a reasonable amount of time (takes no longer than 20 seconds on any of the provided images when run on klaatu).
Extra credit: The convert program you will be using is capable of much more than just converting images from one format to another. It can do simple image transformations like resizing and changing an image to grayscale, as well as more complicated operations such as sharpen, blur, etc. These are described in the extensive documentation. To receive extra credit for this exercise, you must implement at least two of these operations as optional arguments to your script (the built-in getopts may prove useful for this). Be careful that implementing these does not interfere with meeting the spec, as your script will first be evaluated without any extra credit features. Please document any extra credit features you implement in a file extra_credit.txt.
Implementation Advice
Conversion: You will use ImageMagick's convert program to produce the final png files. The first step of the conversion, however, is to convert from the format of the input text files to a text format convert can understand. The format of the input text files is shown below.
width
height
r g b
r g b
...
The lines for width and height give the image's width and height in pixels. Each line after that gives the color of a single pixel in the image in terms of its red, green, and blue values. For example, img_c.txt begins like this.
315
253
86 82 79
77 73 70
The pixel values are listed in something called row-major order, meaning each pixel in the top row of the image is listed, left-to-right, followed by each pixel in the second row, and so on. As a complete example, here's what the input file would look like for a 3-pixel-by-3-pixel image whose rows are, from top to bottom: red, green, blue; blue, red, green; and green, blue, red.
3
3
255 0 0
0 255 0
0 0 255
0 0 255
255 0 0
0 255 0
0 255 0
0 0 255
255 0 0
The format convert requires is similar, but different enough that you will need to think carefully about how you will do the conversion. Using the 3-by-3 image again, here's what the converted text file would look like.
# ImageMagick pixel enumeration: 3,3,255,srgb
0,0: (255,0,0)
1,0: (0,255,0)
2,0: (0,0,255)
0,1: (0,0,255)
1,1: (255,0,0)
2,1: (0,255,0)
0,2: (0,255,0)
1,2: (0,0,255)
2,2: (255,0,0)
This format must be followed exactly for convert to recognize the text as corresponding to image data. Note the contents of the first line, and the position of the width and height within that line. Also note that each pixel value is preceded by its x-y position in the image. To extract the width and height from the input file, I recommend looking into the -n option for the head and tail commands. To transform the pixel values separated by spaces into pixel values separated by commas, you might consider using parameter substitution or the tr program.
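The building blocks mentioned above can be sketched as follows. This is an illustration on a made-up 2-by-1 sample written to a temporary file, not a complete converter; the variable names are arbitrary, and the index arithmetic is one way to recover x-y positions from row-major order.

```shell
#!/bin/bash
# Build a tiny sample input in a temporary file (2 pixels wide, 1 tall).
# The contents are made up for this illustration.
input=$(mktemp)
printf '2\n1\n255 0 0\n0 255 0\n' > "$input"

# head -n 1 keeps only the first line (the width).
width=$(head -n 1 "$input")
# head -n 2 keeps the first two lines; tail -n 1 keeps the last of those.
height=$(head -n 2 "$input" | tail -n 1)

# tail -n +3 prints everything from line 3 onward (the pixel lines);
# tr replaces each space with a comma.
pixels=$(tail -n +3 "$input" | tr ' ' ',')

# Row-major order: the pixel at zero-based index i sits at
# column i % width and row i / width (integer division).
i=1
x=$((i % width))
y=$((i / width))

echo "width=$width height=$height"
echo "$pixels"
echo "pixel $i is at $x,$y"

# Clean up the temporary file.
rm -f "$input"
```

In bash, the parameter substitution ${line// /,} performs the same space-to-comma replacement on a variable named line without starting another process.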
Once you have assembled a text file in the correct format, you invoke convert as follows.
convert txt:input_filename output_filename
Though your script should not do this, you can check to see if your script produced a reasonable-looking image with the display command.
Processing arguments: You may find the following built-in features useful when processing the arguments to your script.
- $# is the number of arguments that were provided.
- $@ is a list of all the arguments.
- shift throws out the first argument (i.e., $1), and shifts all the other arguments down one (i.e., $2 becomes $1, and so on). This affects $#.
- -e is the test operator that checks whether a file exists (e.g., [ -e filename ]).
- read can be used to iterate over a file one line at a time as follows.
while read line
do
# do things with $line
done < filename
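To show how these features fit together, here is a small sketch. The helper names check_files and count_lines are hypothetical, and the sample file is created only for the demonstration.

```shell
#!/bin/bash
# check_files is a hypothetical helper: it reports which of its
# arguments name existing files.
check_files() {
    echo "got $# argument(s)"
    # Loop until shift has consumed every argument.
    while [ $# -gt 0 ]; do
        if [ -e "$1" ]; then
            echo "exists: $1"
        else
            echo "missing: $1"
        fi
        shift   # drop $1; $2 becomes $1 and $# decreases by one
    done
}

# count_lines is a hypothetical helper that reads the file named by
# its first argument one line at a time and prints the line count.
count_lines() {
    n=0
    while read -r line; do
        n=$((n + 1))
    done < "$1"
    echo "$n"
}

# Demonstrate both helpers on a temporary three-line file.
sample=$(mktemp)
printf 'a\nb\nc\n' > "$sample"
check_files "$sample" /no/such/file
count_lines "$sample"
rm -f "$sample"
```

A for loop over "$@" (with the double quotes) would visit the same arguments without consuming them; the -r option to read keeps backslashes in the input from being interpreted.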
Printing to stderr: Recall that since stdout is file descriptor 1 and stderr is file descriptor 2, you can redirect output from stdout to stderr with 1>&2.
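A brief sketch of this redirection in use follows. The usage function and its message are placeholders, not required wording.

```shell
#!/bin/bash
# usage is a hypothetical helper; the script name and wording are
# placeholders.
usage() {
    # echo normally writes to stdout; 1>&2 sends it to stderr instead.
    echo "usage: example.sh FILE..." 1>&2
}

# Nothing from usage reaches stdout, so this capture is empty.
# The 2>/dev/null here only keeps the demonstration itself quiet.
captured=$(usage 2>/dev/null)
echo "stdout captured: '${captured}'"
```

Because the message travels on file descriptor 2, a caller who redirects only stdout will still see it on the terminal.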
Turn-in Instructions
Use the turn-in drop box to submit the files you created to solve these two exercises. These include:
- profile.sh
- a gnuplot file, if you used one (e.g., plot.gp)
- txt_to_png.sh
- extra_credit.txt, if you implemented any extra credit
Please do not turn in any of the input files. Your solutions will be tested with the input files as they were provided to you, so do not make any modifications to them as part of your solution. You may use up to two late days on this assignment. If you do the exercises on klaatu, you can use scp (pscp or winscp on Windows) to transfer the files to your local machine, so they can be submitted using the local web browser.