Gradient-descent learning
Use a differentiable activation function
- Try a continuous function f(·) instead of sgn(·)
- First guess: Use a linear unit
- Define an error function (cost function)
- Linear unit: o = w · x = Σ_i w_i x_i
- Error function: E(w) = 1/2 Σ_{d∈D} (t_d − o_d)^2, where D is the training set, t_d the target output, and o_d the linear unit's output for example d
- Gradient descent: move the weights a small step against the gradient of E
- Δw_i = −η ∂E/∂w_i, with learning rate η > 0
- ∂E/∂w_i = −Σ_{d∈D} (t_d − o_d) x_{i,d}
- Training rule: Δw_i = η Σ_{d∈D} (t_d − o_d) x_{i,d} (see the code sketch at the end of this section)
Called the Delta Rule
- Minimizes the mean-squared error
- Delta rule = Adaline rule = Widrow-Hoff rule = LMS (least mean squares) rule
Cost function measures the network’s performance as a differentiable function of the weights
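As a concrete illustration, here is a minimal sketch of batch gradient descent with the delta rule for a single linear unit, written in Python with NumPy. The function and parameter names (train_delta_rule, eta, n_epochs) are illustrative choices, not taken from the slides.

```python
import numpy as np

def train_delta_rule(X, t, eta=0.1, n_epochs=100):
    """Batch gradient descent on E(w) = 1/(2N) * sum_d (t_d - o_d)^2
    for a single linear unit o = w . x (the delta / LMS rule)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_epochs):
        o = X @ w            # linear unit output o_d for every example d
        error = t - o        # the terms (t_d - o_d)
        # The gradient of the mean-squared error is -(1/N) X^T (t - o);
        # step in the opposite direction (steepest descent).
        w += eta * X.T @ error / n_samples
    return w

# Usage: recover the weights of a noisy linear target t = 2*x1 - 3*x2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
t = X @ np.array([2.0, -3.0]) + 0.01 * rng.normal(size=200)
print(train_delta_rule(X, t))  # approximately [ 2. -3.]
```

This sketch updates the weights after summing the error over the whole training set (batch mode); the incremental (stochastic) variant of the rule instead applies Δw_i = η (t_d − o_d) x_{i,d} after each example d.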