There are two basic classes of function optimization methods: those that use derivative information and those that do not. Derivative-based methods are typically preferred when applicable, since they require fewer function evaluations than comparable non-derivative methods. Curiously, we found a non-derivative method to work best, since it was more robust near singularities where the function gradient is zero.
Since the objective function is a sum of many independent terms, we may compute its gradient as the sum of each term's gradient. For the heuristic preference for little joint motion, this is trivial. The additive costs for illegal joint positions are also easy, since their gradients are always zero. The tricky case is the handle constraints, since each is affected by an entire hierarchy of node angles, each of which affects the handle in a different way. We adopted the standard method of computing each partial derivative of the handle constraint cost using the Jacobian. The Jacobian can be computed efficiently in this case via matrix multiplication of the affine transforms and their derivatives.
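To make the chain-rule step concrete, here is a minimal sketch (not the paper's code) of the gradient of a single handle constraint for a hypothetical planar two-joint chain with made-up link lengths. The handle cost is taken to be half the squared distance between the end effector and a target point, so its gradient is the Jacobian transpose times the position error.

```python
import numpy as np

def end_effector(theta, l1=1.0, l2=1.0):
    # Forward kinematics of a planar two-link chain (hypothetical example).
    t1, t2 = theta
    return np.array([
        l1 * np.cos(t1) + l2 * np.cos(t1 + t2),
        l1 * np.sin(t1) + l2 * np.sin(t1 + t2),
    ])

def jacobian(theta, l1=1.0, l2=1.0):
    # Partial derivatives of the end-effector position with respect to
    # each joint angle.
    t1, t2 = theta
    return np.array([
        [-l1 * np.sin(t1) - l2 * np.sin(t1 + t2), -l2 * np.sin(t1 + t2)],
        [ l1 * np.cos(t1) + l2 * np.cos(t1 + t2),  l2 * np.cos(t1 + t2)],
    ])

def handle_cost_and_grad(theta, target):
    # Handle constraint cost: half the squared distance to the target.
    diff = end_effector(theta) - target
    cost = 0.5 * diff @ diff
    grad = jacobian(theta).T @ diff  # chain rule via the Jacobian
    return cost, grad
```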
By following the gradient, we can approach a local minimum in the state space that hopefully corresponds to a good solution. After playing around with a very simple gradient descent algorithm, we tried conjugate gradient descent, as implemented in Numerical Recipes in C. Conjugate gradient examines changes in the gradient to choose more intelligent directions and guarantee quadratic convergence. This worked significantly better than our initial approach.
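As an illustration of how such a gradient-based minimization might be driven, the sketch below uses SciPy's nonlinear conjugate gradient method in place of the Numerical Recipes routine the text refers to, reusing the hypothetical `handle_cost_and_grad` helper from the previous sketch.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical handle target; x0 is the initial joint configuration.
target = np.array([1.2, 0.7])

result = minimize(
    lambda th: handle_cost_and_grad(th, target)[0],   # objective value
    x0=np.zeros(2),
    jac=lambda th: handle_cost_and_grad(th, target)[1],  # analytic gradient
    method="CG",
)
print(result.x, result.fun)
```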
However, once we began to explore multiple constraints, conjugate gradient began to have increasing difficulty finding reasonable solutions. We suspect that this is due to a very challenging optimization surface with many local minima. It is also possible that our gradient computation method had bugs. Turning off the additive penalties for illegal angles gave conjugate gradient additional freedom, but its performance remained unimpressive.
We had a much better experience using Powell's direction set method, also as implemented in Numerical Recipes in C. Powell's method uses a set of direction vectors, initially one per angle, and minimizes along each direction independently. It then replaces the direction along which it travelled farthest with a new vector representing the overall direction travelled in that iteration. This implementation does not guarantee quadratic convergence (though it can be made to, with some modifications). Powell's method requires many more function evaluations per iteration than conjugate gradient, making it substantially slower. However, since it knows nothing of the gradient, it also explores the state space somewhat better and was less likely to get stuck.
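For comparison, the same minimization can be run derivative-free. The sketch below uses SciPy's Powell method as a stand-in for the Numerical Recipes direction-set routine; no gradient is supplied, only function evaluations.

```python
import numpy as np
from scipy.optimize import minimize

# Same hypothetical handle target and starting configuration as above.
target = np.array([1.2, 0.7])

result = minimize(
    lambda th: handle_cost_and_grad(th, target)[0],  # objective only
    x0=np.zeros(2),
    method="Powell",  # direction-set method, no gradient required
)
print(result.x, result.fun)
```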