Guidelines for final reports:
Be sure that your writeup contains:
- Team and member names (one report per group).
- Your goals for the project
- Your system design and algorithmic choices
- Sample screens of typical usage scenarios (if applicable)
- Experiments and results that show how effective your system
is, and where it could be improved. See the discussion at the bottom of
this page.
- Anything you considered surprising or that you learned. What would
you do differently if you could do it over?
- Conclusions and ideas for future work
- Appendices:
  - Which people did what parts of the work? Were there problematic
    dynamics in your group?
  - What externally written code (if any) was used in your project?
  - Instructions on how to start and use your project, including OS
    or browser requirements and a pointer to a README file (or include
    the README inline). Is there a website to visit?
There is no limit on length, but we appreciate good organization and tight,
precise writing. Points off for rambling and repetition.
What to Hand In
- Please give Dan a stapled, double-sided hard copy of your
report, and submit electronic versions of both the report and the
code using the class turnin.
- We also need an electronic (.pdf) copy of your in-class presentation,
so please submit that at the same time.
- Late submissions for the final project and writeup will not be
accepted, as it will be the end of the quarter and the registrar applies
a hard deadline for grade submission. Sorry! (But happy
holidays).
Experiments
No matter what you do, start by clearly stating the question you are trying to answer.
There are three main ways you can
evaluate your system: two pertain to the system as a whole (including the
UI), and the third looks at the performance of one or more submodules.
- Informal User Study of your System.
This is the most important type of user study and the one that is most
appropriate for people in this class. The basic idea is to watch a small
number of people using your system in order to understand what they are
trying to do, how well it works for them, what confuses them and what could
be improved. It is usually followed by improvements to the UI and perhaps
another evaluation, in a process of iterative design improvement. Your
report should describe the users' comments and your subsequent design
changes. An excellent thing to read before doing such a study is "Some
Techniques for Observing Users" by Kathleen Gomoll. A good example of a
paper that uses this technique is "Summarizing
Personal Web Browsing Sessions" by Mira Dontcheva et al., UIST
2006. Focus on the evaluation section which starts on page 7, especially the "Informal User Evaluation" on page 8.
- Formal User Study of your System.
Once you have a polished UI design, it is common to do a more detailed study, with a
larger group of subjects, looking for statistically significant results.
It is unlikely that any 454 group will have time to do this, but here is
an example of one paper which (in my biased view) does such a study nicely:
"Improving the Performance of Motor-Impaired Users with
Automatically-Generated, Ability-Based Interfaces" by
K. Gajos, J. Wobbrock, and D. Weld, CHI 2008.
- Module performance study.
Most (if not all) groups should include at least one experiment of this
form. Fortunately, with advance planning, these don't take very
long. Indeed, you did something of this form in HW1 with your evaluation
of the naive Bayes classifier. The
trick is to plan what you will measure before you write your code.
Pick a performance measure that is relevant to the system you have built:
precision? recall? speed? accuracy? throughput? latency? In the simplest
case, just measure this aspect of your system. Ideally, however, you will
measure two versions of your system and compare them. For example,
compare classifier accuracy using a bag-of-words representation vs. bag of
words augmented with part-of-speech tags, or throughput with and without
your snazzy caching scheme. This is why it is important to plan such an
experiment before you implement the caching mechanism: so you can
easily turn it on and off. Here is one example of a paper which includes
results of this form:
"Information Extraction from Wikipedia: Moving Down the Long Tail" by Fei Wu,
Raphael Hoffmann, and Daniel S. Weld, KDD 2008. Specifically, see the
experiment summarized in Figure 4.
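The on/off comparison described above might look something like the following minimal Python sketch. It is only an illustration under stated assumptions, not a required harness: `slow_lookup` is a hypothetical stand-in for an expensive operation (simulated here with a 1 ms sleep), `use_cache` is a hypothetical experiment flag that wraps it in the standard library's `functools.lru_cache`, and each configuration runs several times so that the median throughput can be reported.

```python
import statistics
import time
from functools import lru_cache
from typing import Callable


def slow_lookup(key: str) -> int:
    """Hypothetical expensive operation (e.g. a remote fetch), simulated with a 1 ms sleep."""
    time.sleep(0.001)
    return len(key)


def make_lookup(use_cache: bool) -> Callable[[str], int]:
    """Build the lookup function; use_cache is the experiment's on/off switch."""
    if use_cache:
        # A fresh cache on every call, so each trial starts cold.
        return lru_cache(maxsize=1024)(slow_lookup)
    return slow_lookup


def throughput(use_cache: bool, workload: list[str], trials: int = 3) -> float:
    """Median requests per second over several runs of the same workload."""
    rates = []
    for _ in range(trials):
        lookup = make_lookup(use_cache)
        start = time.perf_counter()
        for key in workload:
            lookup(key)
        elapsed = time.perf_counter() - start
        rates.append(len(workload) / elapsed)
    return statistics.median(rates)


if __name__ == "__main__":
    # A workload with repeated keys, so the cache has something to hit.
    workload = ["alpha", "beta", "gamma"] * 100
    for flag in (False, True):
        print(f"use_cache={flag}: {throughput(flag, workload):.0f} requests/s")
```

Because the cache sits behind a single flag, the same workload can be rerun in both configurations. Note that the measured benefit depends entirely on the workload: one with no repeated keys would show no speedup, so the workload itself is a choice worth stating in your report.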
In all cases, try to present your results graphically (instead of as a big
table). When creating such a graph, beware of Microsoft Office default
templates, which include gratuitous chart junk. Instead, maximize the
data-ink ratio. See this discussion of data presentation principles.