Neural Network Binary Classification
- Initialize the weights to 0 or small random numbers
- For each training sample x⁽ⁱ⁾, perform the following steps:
  - Compute the output value ŷ
  - Update the weights
Update of the weights:
w_j := w_j + Δw_j
Δw_j = η (y⁽ⁱ⁾ − ŷ⁽ⁱ⁾) x_j⁽ⁱ⁾
where η is the learning rate, a constant between 0.0 and 1.0
Convergence of the perceptron is only guaranteed if the two classes are linearly separable and the learning rate is sufficiently small. Otherwise, set a maximum number of passes over the training dataset (epochs) or a threshold for the number of tolerated misclassifications.
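A minimal sketch of this learning rule in Python, assuming bipolar class labels (−1, 1) and NumPy arrays; the names Perceptron, eta, and n_iter are illustrative, not from the notes above.

```python
import numpy as np

class Perceptron:
    """Perceptron classifier (sketch of the update rule above)."""

    def __init__(self, eta=0.01, n_iter=10):
        self.eta = eta          # learning rate η, between 0.0 and 1.0
        self.n_iter = n_iter    # max passes (epochs) over the training set

    def fit(self, X, y):
        # Initialize weights to zero; w_[0] is the bias unit
        self.w_ = np.zeros(1 + X.shape[1])
        self.errors_ = []
        for _ in range(self.n_iter):
            errors = 0
            for xi, target in zip(X, y):
                # Δw = η (y − ŷ) x, applied per sample
                update = self.eta * (target - self.predict(xi))
                self.w_[1:] += update * xi
                self.w_[0] += update
                errors += int(update != 0.0)
            self.errors_.append(errors)  # misclassifications per epoch
        return self

    def net_input(self, X):
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def predict(self, X):
        # Unit step function: ŷ = 1 if net input >= 0, else -1
        return np.where(self.net_input(X) >= 0.0, 1, -1)
```

On a linearly separable dataset, fit should drive the per-epoch error count in errors_ to zero within a few passes, consistent with the convergence note above.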
Adaptive Linear Neurons (Adaline)
One of the key ingredients of a supervised machine learning algorithm is an objective function that is optimized during the learning process. Often this is a cost function to be minimized. In Adaline, the cost function is the sum of squared errors (SSE) between the computed outputs and the true class labels.
Also, the weights are updated based on a linear activation function (the identity, φ(z) = z) rather than the unit step function used in the perceptron.
Let the cost function be J(w) = ½ Σᵢ (y⁽ⁱ⁾ − φ(z⁽ⁱ⁾))². Because J is convex and differentiable, gradient descent can find its global minimum: each weight is moved a step in the opposite direction of the gradient, Δw_j = −η ∂J/∂w_j = η Σᵢ (y⁽ⁱ⁾ − φ(z⁽ⁱ⁾)) x_j⁽ⁱ⁾.
The weight update is computed from all samples in the training set at once, rather than incrementally after each sample; hence this is called batch gradient descent.
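A minimal sketch of Adaline with batch gradient descent, assuming the same NumPy setup as the perceptron sketch above; AdalineGD, eta, and n_iter are illustrative names.

```python
import numpy as np

class AdalineGD:
    """ADAptive LInear NEuron trained with batch gradient descent (sketch)."""

    def __init__(self, eta=0.01, n_iter=50):
        self.eta = eta
        self.n_iter = n_iter

    def fit(self, X, y):
        self.w_ = np.zeros(1 + X.shape[1])
        self.cost_ = []
        for _ in range(self.n_iter):
            # Linear activation: φ(z) = z = Xw + w0 (no unit step here)
            output = np.dot(X, self.w_[1:]) + self.w_[0]
            errors = y - output
            # One update computed from ALL samples: Δw = η Σ (y − φ(z)) x
            self.w_[1:] += self.eta * X.T.dot(errors)
            self.w_[0] += self.eta * errors.sum()
            # Sum-of-squared-errors cost J(w) = ½ Σ (y − φ(z))²
            self.cost_.append((errors ** 2).sum() / 2.0)
        return self

    def predict(self, X):
        # Unit step applied only at prediction time to get class labels
        net_input = np.dot(X, self.w_[1:]) + self.w_[0]
        return np.where(net_input >= 0.0, 1, -1)
```

Note that the unit step is used only for prediction; the error driving the weight update comes from the linear activation, which is what makes J differentiable.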
Stochastic gradient descent
Updates the weights incrementally after each training sample rather than once per pass over the whole training set, and therefore typically reaches convergence faster because of the more frequent weight updates.
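A sketch of the stochastic variant, assuming the same structure as the AdalineGD sketch above; only the training loop changes, applying the update per sample. Shuffling the data each epoch is a common practice added here, not stated in the notes.

```python
import numpy as np

class AdalineSGD:
    """Adaline trained with stochastic gradient descent (sketch)."""

    def __init__(self, eta=0.01, n_iter=15, random_state=1):
        self.eta = eta
        self.n_iter = n_iter
        self.rng = np.random.RandomState(random_state)

    def fit(self, X, y):
        self.w_ = np.zeros(1 + X.shape[1])
        self.cost_ = []
        for _ in range(self.n_iter):
            # Shuffle so samples are not revisited in the same order each epoch
            idx = self.rng.permutation(len(y))
            X, y = X[idx], y[idx]
            cost = []
            for xi, target in zip(X, y):
                # Per-sample update: Δw = η (y − φ(z)) x
                output = np.dot(xi, self.w_[1:]) + self.w_[0]
                error = target - output
                self.w_[1:] += self.eta * error * xi
                self.w_[0] += self.eta * error
                cost.append(0.5 * error ** 2)
            self.cost_.append(np.mean(cost))  # average cost per epoch
        return self

    def predict(self, X):
        net_input = np.dot(X, self.w_[1:]) + self.w_[0]
        return np.where(net_input >= 0.0, 1, -1)
```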