Suppose you are building a software agent that is supposed to support UW students in pursuing their education. You are competing to produce the most intelligent agent of this type, and the winning entry will be named the "Husky Helper". The intent of the contest is to produce the "most intelligent" tool.

What characteristics might qualify a tool as "intelligent"? Some possibilities are discussed below. What features do you feel would be most important to have in order to win the contest, and how could the judges measure these features?
However, some part of the tool's change could come from learning from all users. In general, the more data the tool has, the better able it will be to make correct learning decisions (i.e., decisions about how to change itself). The part of the tool's change that differs per user should be the part on which different users have different preferences or needs. For everything else, the common needs, the tool might appear to be changing in the same way for all students -- this would be legitimate.
Here's an example: a speech recognition system might improve
if it had more words in its dictionary. It could get new words from
anywhere (e.g., from a product upgrade, or each user might be asked to
help by entering words the system had trouble with) and then make them
available to all users. But it should adapt to each user's accent and
style of speaking separately, as sketched below.
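A minimal Python sketch of this split (the class, method names, and the
usage below are hypothetical, not any real speech API): the dictionary
lives at the class level and grows for everyone, while the accent
profile lives on each user's own instance.

    # Hypothetical sketch: vocabulary is shared by all users, accent
    # adaptation is kept separate for each user.
    class SpeechRecognizer:
        shared_dictionary = set()      # one copy; new words benefit every user

        def __init__(self, user_id):
            self.user_id = user_id
            self.accent_profile = {}   # per-user: tuned to this speaker only

        @classmethod
        def add_word(cls, word):
            # Words can come from anywhere (product upgrade, user entry)
            # and become immediately available to all users.
            cls.shared_dictionary.add(word.lower())

        def adapt_to_speaker(self, word, pronunciation):
            # Accent and speaking style are learned per user.
            self.accent_profile[word] = pronunciation

    # A word added in one user's session is visible to another user's,
    # but accent adaptation is not shared.
    alice, bob = SpeechRecognizer("alice"), SpeechRecognizer("bob")
    SpeechRecognizer.add_word("Suzzallo")
    alice.adapt_to_speaker("Suzzallo", "suh-ZAH-low")
    assert "suzzallo" in bob.shared_dictionary
    assert "Suzzallo" not in bob.accent_profile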
Recovering from failures is an important topic in "dialog systems"
that use natural language.
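One common recovery strategy, sketched below in Python (the recognize()
function, its return values, and the thresholds are all assumptions, not
from any particular system): act only on confident interpretations,
confirm doubtful ones, and ask the user to rephrase otherwise.

    # Hypothetical sketch of failure recovery in a dialog system.
    def handle_utterance(utterance, recognize, threshold=0.7):
        # recognize() is an assumed function returning (intent, confidence).
        intent, confidence = recognize(utterance)
        if confidence >= threshold:
            return f"OK: {intent}"               # confident: act on it
        if confidence >= threshold / 2:
            return f"Did you mean '{intent}'?"   # doubtful: confirm first
        return "Sorry, I didn't catch that. Could you rephrase?"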
Safety is a very significant issue for agents that can make changes,
not just gather information.
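One way to make that concrete (the action names and callbacks here are
hypothetical): let information-gathering actions run freely, but require
explicit confirmation before any action that changes state.

    # Hypothetical sketch: queries run freely; state-changing actions
    # (registering for or dropping a course) require confirmation.
    READ_ONLY_ACTIONS = {"lookup_course", "check_schedule"}

    def execute(action, args, perform, confirm):
        # perform() and confirm() are assumed callbacks supplied by the agent.
        if action in READ_ONLY_ACTIONS:
            return perform(action, args)     # safe: only gathers information
        if confirm(f"About to run {action}({args}). Proceed?"):
            return perform(action, args)     # change approved by the user
        return "Cancelled."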
Much of the knowledge the tool might need will be in common among all
users.
And if the tool were fairly "raw" to begin with and needed to be trained,
it would be appropriate to do a lot of training before releasing the tool
(or before submitting it for judging).
A caution: we may not be able to distinguish whether the tool is figuring out how to do these things, or whether the tool designer just wrote them into the tool. One possible way to tell these apart: we don't expect the tool designer to have thought of everything, so a tool whose behavior was simply written in would have boundaries it couldn't go beyond, while a tool that learns could eventually cross them.
A problem in rating tools that perform different tasks is:
How do we take into account the difficulty of the task that they're trying
to perform? Here, we might need to rely on experts in user interfaces,
who would know the relative difficulty of the tasks. For instance,
a natural language interface that uses speech is harder than one that uses
typed sentences. A tool that has a small, fixed set of "domains"
for which it can recognize requests, and always picks one of them, is simpler
than one that tries to determine when the user is talking about a domain
it doesn't know about. And it would be considerably harder for the
tool to try to add that new domain to its repertoire!
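A sketch of the difference (the scoring functions and threshold are
assumptions): the simpler tool always returns the best-scoring domain,
while the harder one also reports when no known domain fits -- the first
step toward noticing a domain it doesn't know about.

    # Hypothetical sketch: instead of always forcing a request into one
    # of a fixed set of domains, report "unknown" when no domain scores
    # well enough.
    def classify_domain(request, domain_scorers, threshold=0.5):
        # domain_scorers: assumed mapping of domain name -> scoring function.
        scores = {name: score(request) for name, score in domain_scorers.items()}
        best = max(scores, key=scores.get)
        if scores[best] < threshold:
            return "unknown"   # the simpler tool would just return best here
        return best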