The Zorkians are overwhelmingly impressed
with your ability to interpret their files. On the long
flight
back to Zork
for refueling, they begin taking computer science lessons
from you, and are doing pretty well until you get to complexity
theory. Then they have trouble believing your claims about what
makes a program efficient. You are challenged to
prove
your claims
before a panel of Zorkian wizards. Now, even
though they hadn't developed computers, the people of Zork are skilled
mathematicians and scientists. So they expect a scientific
approach from you. If you succeed, you will receive an
honorary
doctorate
of astronomy. If you fail, well...
let's not go there. This time you might be on your own; in this sector
of the cosmos, communication with other Earthlings is possible only by
limited means. At most you could persuade your hosts to dock
with one other saucer.
So, that's it. Design and conduct an experiment to settle, one way or the
other, some interesting question related to algorithmic complexity.
Start by stating a hypothesis; figure out how to test it; conduct the
experiment; and report and reflect on the results. Work with one classmate
if you like. Your effort should result in two primary artifacts:
-
A Lab Report describing the experiment and its results
-
Java code and possibly data files used in conducting the experiment
Please organize the Lab Report exactly as follows:
-
Abstract: Your name(s), date, course, a title, and a
one-paragraph summary of the rest of the report.
-
Hypothesis: Statement of the hypothesis, followed by an explanation,
if needed.
-
Methodology: Explain how the experiment was
conducted. For example: What exactly do the tests consist of?
Where does the data come from? How many tests are run? How are
the results collected? Where does the code come from?
And so on.
-
Results: the data that was collected, with as much or as little detail as
you feel is needed for credibility. In addition to the raw data,
you might process it in some way (even by hand or by spreadsheet)
for purposes of analysis or summarization.
[Feel free to turn in additional files with more data, as long as
your report mentions this and says what is in the files.]
-
Conclusions: Was the hypothesis verified? Why or why
not? What issues remain, if any? What
should be followed up on or done differently if more experiments
were to be conducted? Etc. You do
not have to apply formal methods of statistical validation
(confidence limits, etc.). Informal arguments suffice, if
they are supported by the data and convincingly presented.
Part 1 Deadline: Paper only, in quiz section on Tuesday, May 21.
Turn in a draft of the lab report, as follows:
-
The Abstract may be omitted except for your name(s) and section.
If you are going to dock with a partner, you must say so at this
time.
Each
partner should turn in an exact duplicate
of the Part 1 Lab Report (yes, even if you are in the same
section!). You cannot add or change partners after this
point.
-
The Hypothesis should be as complete and careful as you can make
it.
-
The Methodology section can be fairly sketchy, but should show
that you have thought through the problem and have a clear idea
how to proceed.
-
The Results section should be a brief status report: state where
you are in the process, how far along you are with the
programming, whether you have your data lined up, etc. It
should also state where you expect to get materials such as data,
Java classes, etc., or that you are developing it all yourself.
-
The Conclusions can be empty.
Part 2 Deadline: Electronic, Thursday night, May 23, 9:00pm; paperwork
in lecture Friday. If you work as a team, only one person should
turn in the material. A README file is still required. As usual,
it should explain how to run the program (which class is the main class,
which method to start with, what files are needed, what kinds of user
input are expected, etc.). The README doesn't need
to duplicate any information that is in the Lab Report.
Turn in all code files necessary to compile and run the
program. This means that if you used Java code from an outside
source, you must turn in a copy so that we can compile and run your
program. If your experiments require data files, turn in some
samples.
This time, package all the files (including the Lab Report) into a single .jar file and submit
that. The option to turn in individual files will not be
available (unless you hear otherwise later). Hint: you can
create a .jar file in BlueJ, or with tools in the Sun Java SDK.
Please try this out
before
you need to! If for some reason the Lab Report isn't printed properly
on the turn-in receipt, please print the file yourself and hand it in
separately. (This could happen if the file is in .doc format, for example.)
The Lab Report for Part 2 will be substantially different from the one you
turned in for Part 1, since it should contain the abstract, details of the
methodology, the actual results, and your conclusions.
In writing code to conduct the experiments, you are free to use any
Java code or test files or information from any source, as long
as
-
you use Java source code and turn it in; no precompiled .class
files or other binary programs should be turned in.
-
the source material is openly and freely available to all and has been so
since before you were given this assignment. It can't be
"this CD which came with a book I bought" or "this
code my former roommate left on his computer." It can
be "this code which accompanies a book and which is available
without charge on the web" or "this code which my former
roommate completed and which has been posted on his public web
site and indexed by Google since before the assignment was
given."
-
the source is given full credit (i.e., you state who the authors
are, to the best of your knowledge, and where the material can be
found).
-
The results must come from your efforts alone. That is,
even if you find appropriate experiments and results publicly
available, you must conduct the experiments yourself and report
your own results.
You are encouraged to use the Message Board to bounce ideas around
and to share results (but not to share code of any significant size).
Grading Notes:
As you might have gathered, the Lab Report will weigh heavily in
the grade. In addition to containing the requested information,
it should be well written and have a professional tone. The
Engineering Writing Center can help students work on their writing, if
that is a concern.
The Java program will be evaluated in terms of its suitability for
its purpose. As always, follow principles of good programming
practice, although that will not be a major focus of grading this
time.
There are no internal technical requirements this time, in terms of
Java features that must be present.
There are no specific requirements for the user
interface.
P-Point opportunity: If you so designate, all
your files (except the README) will be made publicly available on the
web. This is optional. If you allow
your files to be published, it will not affect your grade on this
project, but will earn a participation point.
You will want to avoid placing any sensitive personal information in
the files, such as your student ID.
Go to the
Turn-in form
for part 2
Notes on Picking a Hypothesis.
Good hypotheses should be narrow, easy to state and to understand
(although not necessarily easy to prove!), and ones that can be
plausibly investigated under the time constraints. It would also
be nice if you chose an "authentic" hypothesis, that is, one
which answers a real question that you have, or settles a real doubt
in your mind. The following are JUST some examples.
A fine idea as a starting point, but too vague:
    "Quicksort is faster than Insertion Sort."
Better:
    "The QuickSort in the Java library will out-perform the Insertion sort
    given in the lecture notes, when run on large arrays."
Better yet:
    "When run on the same computer under similar conditions, the QuickSort
    in the Java library will run faster than Insertion sort as coded in the
    lecture notes on all randomly generated arrays of doubles of size
    greater than 1000."
Good idea, but hard to prove, since it might (or might not) depend on the
program and the hardware:
    "The hardware you run on makes only a constant factor difference in how
    fast a program runs."
Better yet:
    "The same compiled Java program which correctly implements the binary
    search algorithm, when run on two computers which have processors of
    differing speeds, will differ in execution time by a constant factor
    for large data sets."
Even better:
    add information about which processors, etc. are to be compared.
Good start, interesting question:
    "A QuickSort hand-coded by a programmer will outperform the quicksort
    in the Java library."
Good question, but experiments might not be the right way to approach it:
    "QuickSort will sometimes run slower than Insertion sort, even for very
    large data sets."
Making the hypothesis testable:
    "QuickSort will generally run slower than Insertion sort, even for
    large data sets, if the data is <some fact or property>..."
A few more examples of plausible starting ideas (there are hundreds more
ideas out there):
- QuickSort with Partition choice <X> will run faster than QuickSort with
  Partition choice <Y>.
- Binary Search implemented iteratively will outperform Binary Search
  implemented recursively.
- If a random series of adds and removes is done, a Linked List will
  outperform an ArrayList (see the sketch after this list).
- If a random series of adds is done, a Linked List with a "last" pointer
  will significantly outperform a Linked List without one.
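As an illustration of how the list-comparison ideas above might be set up
(the operation count, the 50/50 add/remove mix, and the fixed random seed
are arbitrary choices for this sketch, not part of any hypothesis):

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

public class ListExperiment {
    // Perform a random series of adds and removes at random positions
    // and return the elapsed time in milliseconds.
    public static long randomOps(List<Integer> list, int operations, long seed) {
        Random rand = new Random(seed);   // same seed -> same operation sequence
        long start = System.currentTimeMillis();
        for (int i = 0; i < operations; i++) {
            if (list.isEmpty() || rand.nextBoolean()) {
                list.add(rand.nextInt(list.size() + 1), i);   // add at random index
            } else {
                list.remove(rand.nextInt(list.size()));       // remove at random index
            }
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        int ops = 20000;
        System.out.println("ArrayList:  " + randomOps(new ArrayList<Integer>(), ops, 42) + " ms");
        System.out.println("LinkedList: " + randomOps(new LinkedList<Integer>(), ops, 42) + " ms");
    }
}

Reusing the same seed guarantees that both list classes see exactly the
same sequence of operations, so any difference in time comes from the data
structure rather than from the input.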
The examples shown all involve a comparison. That isn't
strictly necessary. For example, "QuickSort exhibits an
n log n performance curve" is something you could try to prove or
disprove by experiment. (Caution: it's not enough to point to a
graph and say "it looks like an n log n curve.")
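One way to go beyond eyeballing a graph is to check how the measured time
grows as the input size doubles. The sketch below (the sizes and the use
of the library sort are placeholder choices) prints time divided by
n log n; if the hypothesis holds, that ratio should level off rather than
keep climbing:

import java.util.Arrays;
import java.util.Random;

public class GrowthCheck {
    public static void main(String[] args) {
        Random rand = new Random();
        for (int n = 250000; n <= 4000000; n *= 2) {
            double[] data = new double[n];
            for (int i = 0; i < n; i++) {
                data[i] = rand.nextDouble();
            }
            long start = System.currentTimeMillis();
            Arrays.sort(data);
            long elapsed = System.currentTimeMillis() - start;
            // For an n log n algorithm, elapsed / (n * log n) should stay
            // roughly constant as n doubles; for an n^2 algorithm it would
            // keep growing.
            double normalized = elapsed / (n * Math.log(n));
            System.out.println("n = " + n + "  time = " + elapsed
                    + " ms  time/(n log n) = " + normalized);
        }
    }
}

A similar normalization (dividing by n, or by n squared) can be used to
test other growth-rate hypotheses.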
The examples given all involve searching or sorting. You don't have
to stick with those topics. The criterion, as given above, is simply
"some interesting question related to algorithmic
complexity". You can explore other Java collections,
variations on familiar structures (e.g., circular or doubly-linked
lists), data structures that you have read about or that we will come to
later in the course, algorithms for sorting (there are
zillions of them), etc. But time is limited, so
don't byte off more than you can chew.
You do not need your TA's approval for the topic. But you are
encouraged to visit me or a TA, or a consultant in the IPL, to talk over your ideas.
Using What You Know
Some reminders:
System.currentTimeMillis(): Use this (or a Date or Calendar object) just
before and just after the events you want to
time. Milliseconds aren't a very fine unit, so for small, fast
sections of code, don't expect great accuracy. One standard way
to improve accuracy is to run each test case more than once and use
the average value. Remember that
file operations are hugely slower than memory access, so don't include
I/O time (unless that happens to be what you want to measure).
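For instance, here is a minimal timing sketch along those lines; the array
size, the number of repetitions, and the use of Arrays.sort are
placeholders for whatever your own experiment actually measures:

import java.util.Arrays;
import java.util.Random;

public class TimingSketch {
    public static void main(String[] args) {
        final int SIZE = 100000;   // placeholder problem size
        final int RUNS = 10;       // repeat to smooth out millisecond noise
        Random rand = new Random();
        long total = 0;
        for (int run = 0; run < RUNS; run++) {
            double[] data = new double[SIZE];
            for (int i = 0; i < SIZE; i++) {
                data[i] = rand.nextDouble();
            }
            // Time only the operation under test, not the data generation.
            long start = System.currentTimeMillis();
            Arrays.sort(data);
            long stop = System.currentTimeMillis();
            total += (stop - start);
        }
        System.out.println("Average time: " + (total / (double) RUNS) + " ms");
    }
}

Averaging over several runs also smooths out one-time effects such as
class loading and JIT warm-up.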
Streams: you know how to read and write them now. You don't
necessarily HAVE to use them, but they're one option for generating test
data or for collecting results. Remember there are handy file generator
programs around, too.
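For instance, here is one sketch of that option; the one-integer-per-line
file format and the method names are just illustrative choices, not
requirements:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Random;

public class DataFileSketch {
    // Write count random ints, one per line, to the named file.
    public static void generate(String fileName, int count) throws IOException {
        PrintWriter out = new PrintWriter(new FileWriter(fileName));
        Random rand = new Random();
        for (int i = 0; i < count; i++) {
            out.println(rand.nextInt(1000000));
        }
        out.close();
    }

    // Read the ints back into an array for use in a test.
    public static int[] load(String fileName, int count) throws IOException {
        int[] data = new int[count];
        BufferedReader in = new BufferedReader(new FileReader(fileName));
        for (int i = 0; i < count; i++) {
            data[i] = Integer.parseInt(in.readLine());
        }
        in.close();
        return data;
    }
}

Generating the file once and loading it for every algorithm guarantees
that all of your tests run on identical input.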
Inheritance: clever use of superclasses (including abstract classes
and interfaces as well as concrete superclasses) can help prevent
writing duplicate code (especially when the objects are used as
parameters). Use of subclassing can help avoid duplicate code
too, in completely different circumstances.
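As a sketch of that idea (the class and method names here are invented for
illustration), the timing code can live in an abstract superclass, and
each algorithm being compared fills in only one method:

// A tiny abstract harness: subclasses supply only the operation to time.
public abstract class TimedTest {
    // The one thing that varies from experiment to experiment.
    public abstract void runOnce(double[] data);

    // Shared timing code, written exactly once.
    public long timeOn(double[] data) {
        double[] copy = (double[]) data.clone();   // same input for every subclass
        long start = System.currentTimeMillis();
        runOnce(copy);
        return System.currentTimeMillis() - start;
    }
}

class LibrarySortTest extends TimedTest {
    public void runOnce(double[] data) {
        java.util.Arrays.sort(data);
    }
}

Each new algorithm then needs only a few lines, and every test is
guaranteed to be timed in exactly the same way.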
Programming by contract: Documenting and/or checking invariants
(and writing executable assertions when possible) can speed debugging.
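For instance, a small sanity check like the following (the class and
method names are made up) can be run on every result before its time is
recorded:

public class SortChecker {
    // Invariant for any sorting experiment: the output must be in order.
    public static boolean isSorted(double[] data) {
        for (int i = 1; i < data.length; i++) {
            if (data[i - 1] > data[i]) {
                return false;
            }
        }
        return true;
    }
}

A statement such as assert SortChecker.isSorted(result); fails loudly when
assertions are enabled (java -ea), so a broken algorithm can't quietly
produce impressive-looking times.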
Questions? Yes, I'll bet there are! Ask them.
Sooner rather than later.