CSE 303: Autumn 2006, Assignment 2

Due: Monday 16 October 2006, 2:00pm

Updates

Thursday, October 12: Changed the "Advice and Hints" section to make it more clear what to do with the two image directories.

Assignment Goal

This is an individual assignment in which you will debug a bash script and write a bash script. Problem 2 is more difficult than problem 1.

1. Debug a Bash Script

Get mycat2buggy.sh from the course website. It is a buggy solution to the extra-credit from Assignment 1. Make mycat2.sh by fixing mycat2buggy.sh.

Do not make large changes to the file; try to change as little as possible.
There are 7–8 bugs (depending on how you count) that can be fixed by changing, adding, or deleting a total of 16 characters.
Recall mycat2.sh should cat its third-through-last arguments, sending stdout to its first argument and stderr to its second argument (exiting if there are fewer than three arguments or one of the first two arguments is an existing file). If the arguments have repeats (i.e., multiple arguments are the same string), it should cat each file only once.
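To make the expected behavior concrete, the sketch below reproduces by hand what one call to mycat2.sh should leave behind, without using the script itself. The file names (out.txt, err.txt, a.txt, missing.txt) and the temporary directory are invented for this illustration.

```shell
#!/bin/bash
# Hypothetical demo of the result expected from
#   mycat2.sh out.txt err.txt a.txt missing.txt a.txt
# All file names here are made up for the example.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
echo "hello" > a.txt

# a.txt appears twice in the argument list but is cat'ed only once.
# stdout goes to the first argument, stderr to the second.
# cat exits non-zero because missing.txt does not exist, hence "|| true".
cat a.txt missing.txt > out.txt 2> err.txt || true

cat out.txt        # the contents of a.txt, one copy only
cat err.txt        # the complaint about missing.txt
```

Comparing out.txt and err.txt from a run like this against the files your fixed script produces is one quick way to check it.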

2. Write a Bash Script

Write a shell script, described below, which creates a web page containing the original and modified images from another web page.

Motivation

You have joined a research team at UW that is developing face recognition algorithms. A publication deadline is coming up, and the graduate student you are working with needs images as test cases for the system. She has asked you to help her locate them. But there is a problem: the algorithm relies on the ability to detect edges in images, and this research team does not have access to a sophisticated edge detector. The graduate student wants you to collect images from the web and create a new web page that shows each image alongside an edge-detected version of the image. She will look through what you produce in order to find examples that are likely to succeed or likely to fail.

Approximate Size

The sample solution is 82 lines, including 30 lines that are blank or have only a comment, and another 51 lines that process command-line arguments. The sample is verbose. You should not need significantly more than this to do this assignment.

Specification

Write a script called processImages.sh that does the following.

Takes two arguments. The first is a URL for a web page with images. The second is a name for a new subdirectory. If the script is not given exactly two arguments, exit with an appropriate error message and a return code of 1. If the first argument is not a downloadable URL, exit with an appropriate error message and a return code of 2. If the second argument exists, ask the user before overwriting it. All error messages should be output on stderr.
Creates a new directory (named by the second argument) and puts all new files in that directory.
Downloads all images that the given web page (the first argument) references by URL. A single line of the web page may contain multiple image URLs; in that case you only need to get one of them (it does not matter which one). You may also assume that each image URL has the following properties:
- It begins with "http://"
- It ends with ".jpg"
- It is surrounded by double-quotes.
Creates a file index.html inside the new directory that is an HTML file (see separate documentation for a description of HTML and sample output) containing each downloaded picture next to an edge-detected version of that picture.
Includes a link to the original URL at the top of the web page you output.
You may use other “temporary” files (and probably will use at least two), but delete them before your script completes. (Note that while debugging it’s helpful not to delete them.)
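The argument checks in this specification could be sketched roughly as follows. This is not the sample solution: the helper names check_usage, url_ok, and confirm_overwrite are hypothetical, the wording of the messages is arbitrary, and the exit status used when the user declines to overwrite is an assumption because the specification does not give one.

```shell
#!/bin/bash
# Sketch only: check_usage, url_ok, and confirm_overwrite are hypothetical
# helper names, not part of the sample solution.

# Succeed only when exactly two arguments were given; complain on stderr.
check_usage() {
    if [ $# -ne 2 ]; then
        echo "usage: processImages.sh URL DIRECTORY" >&2
        return 1
    fi
}

# Succeed when wget can reach the URL. --spider checks without downloading.
url_ok() {
    wget --spider "$1" > /dev/null 2>&1
}

# Ask before reusing an existing directory; succeed only on the answer "y".
# read -n 1 is a bash feature that returns after a single character.
confirm_overwrite() {
    printf '%s exists. Overwrite it? (y/n) ' "$1" >&2
    read -r -n 1 answer
    echo >&2
    [ "$answer" = "y" ]
}

# One possible way to wire the checks together at the top of the script
# (exit status 0 on refusal is an assumption):
#   check_usage "$@" || exit 1
#   url_ok "$1" || { echo "cannot download $1" >&2; exit 2; }
#   if [ -e "$2" ]; then confirm_overwrite "$2" || exit 0; fi
```

Keeping each check in its own function makes it easy to try them one at a time from an interactive shell before putting the script together.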

Advice and Hints

Here is a detailed description of the sample solution, which uses some tricks. On attu, you can use firefox to view HTML. (Use Ctrl-o to open a local file, or supply the file name as a command line argument.)

First, check the number of command-line arguments as usual, comparing $# against the expected count.
Then check that the url is accessible. Use the command wget with the option --spider to do this. The output of this command is ugly, so redirect both stdout and stderr to /dev/null to get rid of it. (You may not want to redirect output while testing.) You can detect whether or not wget was successful by checking its return code. (See the man page on wget for more information on this useful tool.)
Then check to see if the second argument exists and is writable. If it is, ask the user whether she wants to overwrite it. Do this in the following way:
- Print a prompt string that explains the situation and asks the user to enter "y" to overwrite the file or "n" to exit.
- Use the read command to get user input. (The -n option is handy for limiting the response to a single character.)
- Test whether the response is "y", and if it is not, exit.
Then create the new directory and cd into it.
Then download the web page to a temporary file. Use wget with the -O (capital O) option to specify the file name, and don't forget to redirect output to /dev/null!
Then create two new directories, one for the original images and one for the edge-detected versions, and change into the directory for the original versions.
Then use grep and sed to make another temporary file.
- First, get all lines that have image URLs in them.
- Then for all these lines, use sed to "match the whole line" but replace the part before the URL with wget, the URL with itself, and the part after the URL with nothing.
- Save the results in a temporary file.
Then check to see if this temporary file is empty, and if so, exit. This makes it easier to avoid problems later if no images are downloaded.
Then download the images by using source on the temporary file. Again, redirect output to /dev/null.
Then use a for loop on the images in the current directory to generate edge-detected versions of each image. Use the convert command with the option -edge 1. The output file should have the same name as the original image, but should be stored in the second directory you created earlier.
Then use echo to produce the "beginning stuff" for index.html.
Then use another for loop to output HTML tags into index.html that show the original image followed by the edge-detected image.
Then use echo to produce the "ending stuff" for index.html.
Finally, remove any temporary files you made. (Don't delete the image files!)
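The grep and sed step is the trickiest part of the list above, so here is a minimal sketch of one way it might look. The function name make_wget_commands is hypothetical, and the patterns rely only on the three URL properties from the specification (starts with "http://", ends with ".jpg", surrounded by double quotes). The sample solution may do this differently.

```shell
#!/bin/bash
# Sketch only: make_wget_commands is a hypothetical helper name.
# Reads HTML on stdin and writes one "wget URL" line per input line
# that contains a quoted http://...jpg URL.
make_wget_commands() {
    # Keep only the lines that contain an image URL.
    grep '"http://[^"]*\.jpg"' |
        # Match the whole line; keep only the URL, prefixed with "wget ".
        # "|" is used as the sed delimiter because URLs contain "/".
        sed 's|^.*"\(http://[^"]*\.jpg\)".*$|wget \1|'
}

# Possible use, with page.html and getimages.tmp as made-up file names:
#   make_wget_commands < page.html > getimages.tmp
#   source getimages.tmp > /dev/null 2>&1
```

When a line holds several image URLs, this substitution keeps just one of them, which is all the specification asks for. URLs containing shell metacharacters such as & would need quoting before the file is sourced.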

You can check your output on this web page, which should output this.
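As a rough guide to the index.html steps, the sketch below writes a page with a link to the original URL followed by each image next to its edge-detected version. The HTML layout is an assumption and is not taken from the sample output; the function name write_index and the directory names orig and edges are also hypothetical.

```shell
#!/bin/bash
# Sketch only: write_index, orig, and edges are hypothetical names, and the
# HTML layout is an assumption rather than the official sample output.
# Usage: write_index PAGE_URL ORIG_DIR EDGE_DIR > index.html
write_index() {
    # "Beginning stuff": open the page and link back to the source URL.
    echo "<html><body>"
    echo "<p>Images from <a href=\"$1\">$1</a></p>"

    # One paragraph per image: the original, then the edge-detected version.
    for img in "$2"/*.jpg; do
        [ -e "$img" ] || continue      # no .jpg files: the glob did not match
        name=${img##*/}                # strip the directory part
        echo "<p><img src=\"$2/$name\"> <img src=\"$3/$name\"></p>"
    done

    # "Ending stuff": close the page.
    echo "</body></html>"
}
```

Run from inside the new directory, a call such as write_index "$1" orig edges > index.html would produce relative image paths that keep working if the directory is moved.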

Be Careful: As programmers, you have the ability to write buggy programs that harm others. For example, if you write an infinite loop that constantly downloads files from the above web page, you (and possibly your instructor) could get in trouble. Please be careful!

3. Extra Credit

Remember the course policy on extra credit. You may do either or both of the following problems.

Our script misses a lot of images on many web pages. For example, any image that has a relative URL (one that does not begin with "http://") will be missed by our program. Make processImagesRelative.sh, which is like processImages.sh but handles relative URLs.
In the readme.txt file associated with this assignment, explain why it was important for you to put a link to the original web page that the images came from. Hint: This is not a technical question.
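For the relative-URL problem, the core step is turning a relative reference into an absolute URL using the address of the page it came from. The sketch below shows one simplified way to do that with shell parameter expansion. The function name resolve_url is hypothetical, and the sketch deliberately ignores "../" segments and query strings, which a complete solution would need to consider.

```shell
#!/bin/bash
# Sketch only: resolve_url is a hypothetical helper name.
# Usage: resolve_url PAGE_URL REFERENCE
# Prints an absolute URL for REFERENCE as seen from PAGE_URL.
# Simplification: "../" segments and query strings are not handled.
resolve_url() {
    ru_page=$1
    ru_ref=$2
    ru_rest=${ru_page#http://}          # host plus path, scheme removed
    case $ru_ref in
        http://*)                       # already absolute
            echo "$ru_ref" ;;
        /*)                             # relative to the server root
            echo "http://${ru_rest%%/*}$ru_ref" ;;
        *)                              # relative to the page's directory
            case $ru_rest in
                */*) echo "http://${ru_rest%/*}/$ru_ref" ;;
                *)   echo "http://$ru_rest/$ru_ref" ;;
            esac ;;
    esac
}

resolve_url http://example.com/dir/page.html pics/a.jpg
```

The example call at the end should print the reference resolved against the directory that contains page.html.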

Assessment

Your solutions should be

Correct shell scripts that run on attu.cs.washington.edu
In good style, including indentation and line breaks
Of reasonable size

Turn-in instructions

Use the standard turn-in instructions described here. Your directory should contain the files mycat2.sh and processImages.sh. If you do the extra credit, you should also turn in processImagesRelative.sh and/or readme.txt.