**Neural Network Binary Classification**

- Initialize the weights to 0 or small random numbers
- For each training sample x, perform the following steps:
- Compute the output value ŷ
- Update the weights

Weight update

wⱼ := wⱼ + Δwⱼ

where

Δwⱼ = η ( y − ŷ ) xⱼ

and η is the learning rate, between 0.0 and 1.0.

Convergence of the perceptron is guaranteed only if the two classes are linearly separable and the learning rate is sufficiently small. Otherwise, set a maximum number of passes over the training dataset (epochs) or a threshold for the number of tolerated misclassifications.
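The steps above can be sketched in code. This is a minimal illustration, not a reference implementation; the names (`eta`, `n_epochs`, `perceptron_fit`) and the use of `w[0]` as a bias term are assumptions for the sketch, and labels are assumed to be in {−1, 1}.

```python
import numpy as np

def perceptron_fit(X, y, eta=0.1, n_epochs=10):
    """Learn weights w (w[0] is the bias) for labels y in {-1, 1}."""
    w = np.zeros(1 + X.shape[1])                # initialize weights to 0
    for _ in range(n_epochs):                   # cap the passes over the data
        for xi, target in zip(X, y):            # one sample at a time
            y_hat = 1 if (np.dot(xi, w[1:]) + w[0]) >= 0 else -1
            delta = eta * (target - y_hat)      # Δw = η (y − ŷ)
            w[1:] += delta * xi                 # update feature weights
            w[0] += delta                       # update bias
    return w

def perceptron_predict(X, w):
    """Unit-step output: +1 if net input >= 0, else -1."""
    return np.where(np.dot(X, w[1:]) + w[0] >= 0, 1, -1)
```

Note that the weights change only on misclassified samples, since `target - y_hat` is zero whenever the prediction is already correct.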

**Adaptive Linear Neurons (Adaline)**

One of the key ingredients of a supervised machine learning algorithm is an objective function that is optimized during the learning process. Often it is a cost function to be minimized. In Adaline, the cost function is the sum of squared errors, which is minimized.

Also, the weights are updated based on a linear activation function rather than the unit step function.

Let the cost function be denoted J. Because J is convex (quadratic in the weights), following its gradient downhill leads to the global minimum; differentiating J gives the gradient used in the weight update.
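Written out, the standard derivation of the gradient for the sum-of-squared-errors cost is (superscript (i) indexes training samples):

```latex
J(\mathbf{w}) = \frac{1}{2} \sum_i \left( y^{(i)} - \mathbf{w}^\top \mathbf{x}^{(i)} \right)^2

\frac{\partial J}{\partial w_j} = -\sum_i \left( y^{(i)} - \mathbf{w}^\top \mathbf{x}^{(i)} \right) x_j^{(i)}

\Delta w_j = -\eta \, \frac{\partial J}{\partial w_j} = \eta \sum_i \left( y^{(i)} - \mathbf{w}^\top \mathbf{x}^{(i)} \right) x_j^{(i)}
```

The minus sign in Δwⱼ means we step in the direction opposite the gradient, i.e. downhill on J.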

The weight update is computed from all samples in the training set at once, rather than incrementally after each sample; hence this is called batch gradient descent.
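A batch-gradient-descent Adaline could be sketched as follows. This is an illustrative sketch, not the canonical implementation; the names (`adaline_fit`, `eta`, `n_epochs`) and the returned per-epoch cost list are assumptions.

```python
import numpy as np

def adaline_fit(X, y, eta=0.01, n_epochs=50):
    """Minimize the sum of squared errors J(w) = 1/2 * sum((y - net)^2)."""
    w = np.zeros(1 + X.shape[1])          # w[0] is the bias
    costs = []
    for _ in range(n_epochs):
        net = np.dot(X, w[1:]) + w[0]     # linear activation over ALL samples
        errors = y - net
        w[1:] += eta * X.T.dot(errors)    # one batch update: η Σ (y − net) x
        w[0] += eta * errors.sum()
        costs.append((errors ** 2).sum() / 2.0)  # track J per epoch
    return w, costs
```

Tracking the cost per epoch is useful for checking that the learning rate is small enough: if `costs` grows instead of shrinking, η is too large and the updates overshoot the minimum.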

**Stochastic gradient descent**

It updates the weights incrementally after each training sample, so it typically reaches convergence faster because of the more frequent weight updates.
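The stochastic variant can be sketched by moving the Adaline update inside the per-sample loop. Again an illustrative sketch; the shuffling each epoch (a common practice to avoid update cycles) and the names used are assumptions, not from the text above.

```python
import numpy as np

def adaline_sgd_fit(X, y, eta=0.01, n_epochs=20, seed=0):
    """Adaline trained with stochastic gradient descent."""
    rng = np.random.default_rng(seed)
    w = np.zeros(1 + X.shape[1])               # w[0] is the bias
    for _ in range(n_epochs):
        idx = rng.permutation(len(y))          # shuffle each epoch
        for xi, target in zip(X[idx], y[idx]):
            error = target - (np.dot(xi, w[1:]) + w[0])
            w[1:] += eta * error * xi          # update after EACH sample
            w[0] += eta * error
    return w
```

Compared with the batch version, each update uses a noisier gradient estimate from a single sample, which is why many small steps can reach a good solution in fewer passes over the data.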