The Zorkians are overwhelmingly impressed
with your ability to interpret their files. On the long
flight
back to Zork
for refueling, they begin taking computer science lessons
from you, and are doing pretty well until you get to complexity
theory. Then they have trouble believing your claims about what
makes a program efficient. You are challenged to
prove
your claims
before a panel of Zorkian wizards. Now, even
though they hadn't developed computers, the people of Zork are skilled
mathematicians and scientists. So they expect a scientific
approach from you. If you succeed, you will receive an
honorary
doctorate
of astronomy. If you fail, well...
let's not go there. This time you might be on your own; in this sector
of the cosmos, communication with other Earthlings is possible only by
limited means. At most you could persuade your hosts to dock
with one other saucer.
So, that's it. Design and conduct an experiment to settle, one way or the
other, some interesting question related to algorithmic complexity.
Start by stating a hypothesis; figure out how to test it; conduct the
experiment; and report and reflect on the results. Work with one classmate
if you like. Your effort should result in two primary artifacts:
-
A Lab Report describing the experiment and its results
-
Java code and possibly data files used in conducting the experiment
Please organize the Lab Report exactly as follows:
-
Abstract: Your name(s), date, course, a title, and a
one-paragraph summary of the rest of the report.
-
Hypothesis: Statement of the hypothesis, followed by an explanation,
if needed.
-
Methodology: Explain how the experiment was
conducted. For example: What exactly do the tests consist of?
Where does the data come from? How many tests are run? How are
the results collected? Where does the code come from?
And so on.
-
Results: the data that was collected, with as much or as little detail as
you feel is needed for credibility. In addition to the raw data,
you might process it in some way (even by hand or by spreadsheet)
for purposes of analysis or summarization.
[Feel free to turn in additional files with more data, as long as
your report mentions this and says what is in the files.]
-
Conclusions: Was the hypothesis verified? Why or why
not? What issues remain, if any? What
should be followed up on or done differently if more experiments
were to be conducted? Etc. You do
not have to apply formal methods of statistical validation
(confidence limits, etc.). Informal arguments suffice, if
they are supported by the data and convincingly presented.
Part 1 Deadline: Paper only, in quiz section on Tuesday, May 21.
Turn in a draft of the lab report, as follows:
-
The Abstract may be omitted except for your name(s) and section.
If you are going to dock with a partner, you must say so at this
time.
Each
partner should turn in an exact duplicate
of the Part 1 Lab Report (yes, even if you are in the same
section!). You cannot add or change partners after this
point.
-
The Hypothesis should be as complete and careful as you can make
it.
-
The Methodology section can be fairly sketchy, but should show
that you have thought through the problem and have a clear idea
how to proceed.
-
The Results section should be a brief status report: state where
you are in the process, how far along you are with the
programming, whether you have your data lined up, etc. It
should also state where you expect to get materials such as data,
Java classes, etc., or that you are developing it all yourself.
-
The Conclusions can be empty.
Part 2 Deadline: Electronic, Thursday night, May 23, 9:00pm; paperwork
in lecture Friday. If you work as a team, only one person should
turn in the material. A README file is still required. As usual,
it should explain how to run the program (which class is the main class,
which method to start with, what files are needed, what kinds of user
input are expected, etc.). The README doesn't need
to duplicate any information that is in the Lab Report.
Turn in all code files necessary to compile and run the
program. This means that if you used Java code from an outside
source, you must turn in a copy so that we can compile and run your
program. If your experiments require data files, turn in some
samples.
This time, package all the files (including the Lab Report) into a single .jar file and submit
that. The option to turn in individual files will not be
available (unless you hear otherwise later). Hint: you can
create a .jar file in BlueJ, or with tools in the Sun Java SDK.
Please try this out
before
you need to! If for some reason the Lab Report isn't printed properly
on the turn-in receipt, please print the file yourself and hand it in
separately. (This could happen if the file is in .doc format, for example.)
The Lab Report for Part 2 will be substantially different from the one you
turned in for Part 1, since it should contain the abstract, details of the
methodology, the actual results, and your conclusions.
In writing code to conduct the experiments, you are free to use any
Java code or test files or information from any source, as long
as
-
you use Java source code and turn it in; no precompiled .class
files or other binary programs should be turned in.
-
the source material is openly and freely available to all and has been so
since before you were given this assignment. It can't be
"this CD which came with a book I bought" or "this
code my former roommate left on his computer." It can
be "this code which accompanies a book and which is available
without charge on the web" or "this code which my former
roommate completed and which has been posted on his public web
site and indexed by Google since before the assignment was
given."
-
the source is given full credit (i.e., you state who the authors
are, to the best of your knowledge, and where the material can be
found).
-
The results must come from your efforts alone. That is,
even if you find appropriate experiments and results publicly
available, you must conduct the experiments yourself and report
your own results.
You are encouraged to use the Message Board to bounce ideas around
and to share results (but not to share code of any significant size).
Grading Notes:
As you might have gathered, the Lab Report will weigh heavily in
the grade. In addition to containing the requested information,
it should be well written and have a professional tone. The
Engineering Writing Center can help students work on their writing, if
that is a concern.
The Java program will be evaluated in terms of its suitability for
its purpose. As always, follow principles of good programming
practice, although that will not be a major focus of grading this
time.
There are no internal technical requirements this time, in terms of
Java features that must be present.
There are no specific requirements for the user
interface.
P-Point opportunity: If you so designate, all
your files (except the README) will be made publicly available on the
web. This is optional. If you allow
your files to be published, it will not affect your grade on this
project, but will earn a participation point.
You will want to avoid placing any sensitive personal information in
the files, such as your student ID.
Go to the
Turn-in form
for part 2
Notes on Picking a Hypothesis.
Good hypotheses should be narrow, easy to state and to understand
(although not necessarily easy to prove!), and ones that can be
plausibly investigated under the time constraints. It would also
be nice if you chose an "authentic" hypothesis, that is, one
which answers a real question that you have, or settles a real doubt
in your mind. The following are JUST some examples.
A fine idea as a starting point, but too vague:
    "Quicksort is faster than Insertion Sort."
Better:
    "The QuickSort in the Java library will out-perform the Insertion sort
    given in the lecture notes, when run on large arrays."
Better yet:
    "When run on the same computer under similar conditions, the QuickSort
    in the Java library will run faster than Insertion sort as coded in the
    lecture notes on all randomly generated arrays of doubles of size
    greater than 1000."
Good idea, but hard to prove, since it might (or might not) depend on the
program and the hardware:
    "The hardware you run on makes only a constant factor difference in how
    fast a program runs."
Better yet:
    "The same compiled Java program which correctly implements the binary
    search algorithm, when run on two computers which have processors of
    differing speeds, will differ in execution time by a constant factor
    for large data sets."
Even better:
    add information about which processors, etc. are to be compared.
Good start, interesting question:
    "A QuickSort hand-coded by a programmer will outperform the quicksort
    in the Java library."
Good question, but experiments might not be the right way to approach it:
    "QuickSort will sometimes run slower than Insertion sort, even for very
    large data sets."
Making the hypothesis testable:
    "QuickSort will generally run slower than Insertion sort, even for
    large data sets, if the data is <some fact or property>..."
A few more examples of plausible starting ideas (there are hundreds more
ideas out there):
- QuickSort with Partition choice <X> will run faster than QuickSort with
  Partition choice <Y>.
- Binary Search implemented iteratively will outperform Binary Search
  implemented recursively.
- If a random series of adds and removes is done, a Linked List will
  outperform an ArrayList (see the sketch after this list).
- If a random series of adds is done, a Linked List with a "last" pointer
  will significantly outperform a Linked List without one.
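As an illustration of how the list-comparison ideas above might be set up
(the operation count, the 50/50 add/remove mix, and the fixed random seed
are arbitrary choices for this sketch, not part of any hypothesis):

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

public class ListExperiment {
    // Perform a random series of adds and removes at random positions
    // and return the elapsed time in milliseconds.
    public static long randomOps(List<Integer> list, int operations, long seed) {
        Random rand = new Random(seed);   // same seed -> same operation sequence
        long start = System.currentTimeMillis();
        for (int i = 0; i < operations; i++) {
            if (list.isEmpty() || rand.nextBoolean()) {
                list.add(rand.nextInt(list.size() + 1), i);   // add at random index
            } else {
                list.remove(rand.nextInt(list.size()));       // remove at random index
            }
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        int ops = 20000;
        System.out.println("ArrayList:  " + randomOps(new ArrayList<Integer>(), ops, 42) + " ms");
        System.out.println("LinkedList: " + randomOps(new LinkedList<Integer>(), ops, 42) + " ms");
    }
}

Reusing the same seed guarantees that both list classes see exactly the
same sequence of operations, so any difference in time comes from the data
structure rather than from the input.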
The examples shown all involve a comparison. That isn't
strictly necessary. For example, "QuickSort exhibits an
n log n performance curve" is something you could try to prove or
disprove by experiment. (Caution: it's not enough to point to a
graph and say "it looks like an n log n curve.")
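One way to go beyond eyeballing a graph is to check how the measured time
grows as the input size doubles. The sketch below (the sizes and the use
of the library sort are placeholder choices) prints time divided by
n log n; if the hypothesis holds, that ratio should level off rather than
keep climbing:

import java.util.Arrays;
import java.util.Random;

public class GrowthCheck {
    public static void main(String[] args) {
        Random rand = new Random();
        for (int n = 250000; n <= 4000000; n *= 2) {
            double[] data = new double[n];
            for (int i = 0; i < n; i++) {
                data[i] = rand.nextDouble();
            }
            long start = System.currentTimeMillis();
            Arrays.sort(data);
            long elapsed = System.currentTimeMillis() - start;
            // For an n log n algorithm, elapsed / (n * log n) should stay
            // roughly constant as n doubles; for an n^2 algorithm it would
            // keep growing.
            double normalized = elapsed / (n * Math.log(n));
            System.out.println("n = " + n + "  time = " + elapsed
                    + " ms  time/(n log n) = " + normalized);
        }
    }
}

A similar normalization (dividing by n, or by n squared) can be used to
test other growth-rate hypotheses.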
The examples given all involve searching or sorting. You don't have
to stick with those topics. The criterion, as given above, is simply
"some interesting question related to algorithmic
complexity". You can explore other Java collections,
variations on familiar structures (e.g., circular or doubly-linked
lists), data structures that you have read about or that we will come to
later in the course, algorithms for sorting (there are
zillions of them), etc. But time is limited, so
don't byte off more than you can chew.
You do not need your TA's approval for the topic. But you are
encouraged to visit me or a TA, or a consultant in the IPL, to talk over your ideas.
Using What You Know
Some reminders:
System.currentTimeMillis(): Use this (or a Date or Calendar object) just
before and just after the events you want to
time. Milliseconds aren't a very fine unit, so for small, fast
sections of code, don't expect great accuracy. One standard way
to improve accuracy is to run each test case more than once and use
the average value. Remember that
file operations are hugely slower than memory access, so don't include
I/O time (unless that happens to be what you want to measure).
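For instance, here is a minimal timing sketch along those lines; the array
size, the number of repetitions, and the use of Arrays.sort are
placeholders for whatever your own experiment actually measures:

import java.util.Arrays;
import java.util.Random;

public class TimingSketch {
    public static void main(String[] args) {
        final int SIZE = 100000;   // placeholder problem size
        final int RUNS = 10;       // repeat to smooth out millisecond noise
        Random rand = new Random();
        long total = 0;
        for (int run = 0; run < RUNS; run++) {
            double[] data = new double[SIZE];
            for (int i = 0; i < SIZE; i++) {
                data[i] = rand.nextDouble();
            }
            // Time only the operation under test, not the data generation.
            long start = System.currentTimeMillis();
            Arrays.sort(data);
            long stop = System.currentTimeMillis();
            total += (stop - start);
        }
        System.out.println("Average time: " + (total / (double) RUNS) + " ms");
    }
}

Averaging over several runs also smooths out one-time effects such as
class loading and JIT warm-up.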
Streams: you know how to read and write them now. You don't
necessarily HAVE to use them, but they're one option for generating test
data or for collecting results. Remember there are handy file generator
programs around, too.
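For instance, here is one sketch of that option; the one-integer-per-line
file format and the method names are just illustrative choices, not
requirements:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Random;

public class DataFileSketch {
    // Write count random ints, one per line, to the named file.
    public static void generate(String fileName, int count) throws IOException {
        PrintWriter out = new PrintWriter(new FileWriter(fileName));
        Random rand = new Random();
        for (int i = 0; i < count; i++) {
            out.println(rand.nextInt(1000000));
        }
        out.close();
    }

    // Read the ints back into an array for use in a test.
    public static int[] load(String fileName, int count) throws IOException {
        int[] data = new int[count];
        BufferedReader in = new BufferedReader(new FileReader(fileName));
        for (int i = 0; i < count; i++) {
            data[i] = Integer.parseInt(in.readLine());
        }
        in.close();
        return data;
    }
}

Generating the file once and loading it for every algorithm guarantees
that all of your tests run on identical input.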
Inheritance: clever use of superclasses (including abstract classes
and interfaces as well as concrete superclasses) can help prevent
writing duplicate code (especially when the objects are used as
parameters). Use of subclassing can help avoid duplicate code
too, in completely different circumstances.
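As a sketch of that idea (the class and method names here are invented for
illustration), the timing code can live in an abstract superclass, and
each algorithm being compared fills in only one method:

// A tiny abstract harness: subclasses supply only the operation to time.
public abstract class TimedTest {
    // The one thing that varies from experiment to experiment.
    public abstract void runOnce(double[] data);

    // Shared timing code, written exactly once.
    public long timeOn(double[] data) {
        double[] copy = (double[]) data.clone();   // same input for every subclass
        long start = System.currentTimeMillis();
        runOnce(copy);
        return System.currentTimeMillis() - start;
    }
}

class LibrarySortTest extends TimedTest {
    public void runOnce(double[] data) {
        java.util.Arrays.sort(data);
    }
}

Each new algorithm then needs only a few lines, and every test is
guaranteed to be timed in exactly the same way.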
Programming by contract: Documenting and/or checking invariants
(and writing executable assertions when possible) can speed debugging.
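For instance, a small sanity check like the following (the class and
method names are made up) can be run on every result before its time is
recorded:

public class SortChecker {
    // Invariant for any sorting experiment: the output must be in order.
    public static boolean isSorted(double[] data) {
        for (int i = 1; i < data.length; i++) {
            if (data[i - 1] > data[i]) {
                return false;
            }
        }
        return true;
    }
}

A statement such as assert SortChecker.isSorted(result); fails loudly when
assertions are enabled (java -ea), so a broken algorithm can't quietly
produce impressive-looking times.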
Questions? Yes, I'll bet there are! Ask them.
Sooner rather than later.