From: Danny Wyatt (danny_at_cs.washington.edu)
Date: Sun Oct 19 2003 - 17:55:39 PDT
Evolving Robot Tank Controllers
Jacob Eisenstein
Summary:
Eisenstein designed a representation of Robocode controllers that
allows the controller program to be learned with genetic programming.
He evolved and evaluated his controllers against a set of hand-coded
opponents, and he discusses the results.
Important Ideas:
Finding the right fitness function is very important. As a scored game,
Robocode already provides an easy way to evaluate success, but most
human observers are not content with the raw score. Eisenstein adjusted
his fitness function to account for more nuanced appreciations of
Robocode battles.
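To make this concrete, here is a minimal sketch, not from the paper,
of how a fitness function might blend raw score with qualities a
human observer cares about; the BattleStats fields and every weight
below are hypothetical:

    // Hypothetical fitness shaping for an evolved Robocode controller.
    // All fields and weights are illustrative guesses, not
    // Eisenstein's actual function.
    class FitnessSketch {
        static class BattleStats {
            int rounds;           // rounds fought in the evaluation battle
            int roundsSurvived;   // rounds the controller outlived its opponent
            double totalScore;    // raw Robocode score summed over all rounds
            double damageDealt;
            double damageTaken;
        }

        static double fitness(BattleStats s) {
            double scoreTerm    = s.totalScore / s.rounds;
            double survivalTerm = (double) s.roundsSurvived / s.rounds;
            double damageTerm   = s.damageDealt / (s.damageTaken + 1.0);
            // Weight consistency across rounds, not one lucky blowout.
            return 0.5 * scoreTerm + 0.3 * survivalTerm + 0.2 * damageTerm;
        }
    }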
Learned behavior is hard to predict, but beneficial to understand.
Controllers that win hugely in one game and lose all the rest can
still succeed when scored on total points. Controllers that never
shoot can beat opponents that (possibly irrationally) choose to
shoot: firing costs a Robocode robot its own energy, so a pacifist
simply outlasts an opponent that keeps missing.
What can these developments teach us about the environment and our
assumptions about it?
Flaws:
It's all about the representation. As in many genetic programming
solutions, the problem representation and the genome build in many
assumptions that cannot be learned around. For example, Eisenstein
did not give the onHitByBullet or onHitRobot events control of the gun
"since they don't know where the opponent is". Yet these events
certainly carry enough information to infer something about the
opponent's location; in fact, onHitRobot pinpoints the opponent more
precisely than onScannedRobot does, since the two robots are in
contact. But because Eisenstein excluded this from his representation
a priori, the controllers will never have a chance to learn it.
(Indeed, he says his controllers could never beat Tracker, and that
may be the fault of this representational choice.)
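As an illustration of the information being thrown away (my sketch,
not code from the paper), Robocode's HitRobotEvent reports a bearing
to the opponent, which is enough to swing the gun onto a target that
is known to be at point-blank range:

    import robocode.HitRobotEvent;
    import robocode.Robot;

    // Hypothetical robot with a gun-controlling onHitRobot handler.
    public class RamGunner extends Robot {
        public void onHitRobot(HitRobotEvent e) {
            // On collision, the event's bearing (relative to our own
            // heading) gives the opponent's exact direction; no radar
            // scan is needed.
            double gunTurn = getHeading() + e.getBearing() - getGunHeading();
            // Normalize to (-180, 180] so the gun takes the short way around.
            while (gunTurn > 180) gunTurn -= 360;
            while (gunTurn <= -180) gunTurn += 360;
            turnGunRight(gunTurn);
            fire(3); // maximum-power shot at point-blank range
        }
    }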
Uncontrolled experiments. Eisenstein seems to suggest that he changed
his fitness function while his experiments were underway, and he may
not have adjusted earlier results to account for this change. More
broadly, there are so many variables at play that it is hard to
determine which changes lead to success, if success itself can even
be well-defined.
Open Questions:
How to generalize? Eisenstein says that his evolved controllers don't
adjust to random start positions and can't handle all opponents
equally well.
How to avoid catatonics? In an environment where penalties discourage
experimentation, it is easy for the most rational choice to become
"do nothing". How can learning be gotten off the ground before
playing for keeps?
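One speculative answer (mine, not Eisenstein's): anneal an
exploration bonus into the fitness function, so that early
generations are rewarded just for acting and the bonus decays to
zero once the population is judged purely on battle score. All
constants below are illustrative:

    // Illustrative only: pay for raw activity early, then fade the
    // bonus so later generations face the unshaped score.
    class AnnealedFitness {
        static double shapedFitness(int generation, double rawScore,
                                    int shotsFired, double distanceMoved) {
            double anneal = Math.max(0.0, 1.0 - generation / 50.0); // gone by gen 50
            double activityBonus = 0.1 * shotsFired + 0.01 * distanceMoved;
            return rawScore + anneal * activityBonus;
        }
    }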