Gradient-descent learning
Use a differentiable activation function
- Try a continuous function f(·) instead of sgn(·)
- First guess: Use a linear unit
- Define an error function (cost function)
- Linear unit: o = w · x = Σ_i w_i x_i
- Error function: E(w) = 1/2 Σ_{d∈D} (t_d − o_d)^2, where D is the training set, t_d the target output, and o_d the linear unit's output for example d
- Gradient descent: move the weights a small step against the gradient of E
- Δw_i = −η ∂E/∂w_i, with learning rate η > 0
- ∂E/∂w_i = −Σ_{d∈D} (t_d − o_d) x_{i,d}
- Training rule: Δw_i = η Σ_{d∈D} (t_d − o_d) x_{i,d} (see the code sketch at the end of this section)
Called the Delta Rule
- Minimizes the mean-squared error
- Delta rule = Adaline rule = Widrow-Hoff rule = LMS (least mean squares) rule
Cost function measures the network’s performance as a differentiable function of the weights
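As a concrete illustration, here is a minimal sketch of batch gradient descent with the delta rule for a single linear unit, written in Python with NumPy. The function and parameter names (train_delta_rule, eta, n_epochs) are illustrative choices, not taken from the slides.

```python
import numpy as np

def train_delta_rule(X, t, eta=0.1, n_epochs=100):
    """Batch gradient descent on E(w) = 1/(2N) * sum_d (t_d - o_d)^2
    for a single linear unit o = w . x (the delta / LMS rule)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_epochs):
        o = X @ w            # linear unit output o_d for every example d
        error = t - o        # the terms (t_d - o_d)
        # The gradient of the mean-squared error is -(1/N) X^T (t - o);
        # step in the opposite direction (steepest descent).
        w += eta * X.T @ error / n_samples
    return w

# Usage: recover the weights of a noisy linear target t = 2*x1 - 3*x2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
t = X @ np.array([2.0, -3.0]) + 0.01 * rng.normal(size=200)
print(train_delta_rule(X, t))  # approximately [ 2. -3.]
```

This sketch updates the weights after summing the error over the whole training set (batch mode); the incremental (stochastic) variant of the rule instead applies Δw_i = η (t_d − o_d) x_{i,d} after each example d.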