Summary of ML Classifiers in scikit-learn

5 steps to train an ML classifier

  1. Select the features
  2. Choose a performance metric
  3. Choose a classifier and an optimization algorithm
  4. Evaluate the performance of the model
  5. Tune the algorithm
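
The five steps can be sketched end to end with scikit-learn. This is only an illustration under assumptions that are not in the original notes: the Iris dataset, two petal features, accuracy as the metric, logistic regression as the classifier, and a small grid of `C` values for tuning.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Step 1: select features (assumed here: petal length and petal width).
X, y = load_iris(return_X_y=True)
X = X[:, [2, 3]]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Step 2: choose a performance metric (plain accuracy in this sketch).
metric = "accuracy"

# Step 3: choose a classifier and its optimization algorithm.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="lbfgs", max_iter=1000),
)

# Step 4: evaluate the model with 5-fold cross-validation on the training set.
cv_scores = cross_val_score(model, X_train, y_train, scoring=metric, cv=5)

# Step 5: tune the regularization strength C with a small grid search.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0, 100.0]}
search = GridSearchCV(model, param_grid, scoring=metric, cv=5)
search.fit(X_train, y_train)

# Final check of the tuned model on the held-out test set.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("CV accuracy per fold:", cv_scores)
print("Best C:", search.best_params_["logisticregression__C"])
print("Test accuracy:", test_accuracy)
```

In practice steps 4 and 5 are repeated in a loop: evaluate, adjust, evaluate again, and only touch the test set once at the end.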


Perceptron:

  • Easy to understand.
  • Never converges if the classes are not linearly separable.
  • Parametric model
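
A minimal sketch of a perceptron, the linear classifier these points describe, using scikit-learn's `Perceptron`. The synthetic, well-separated blobs are an assumption made for illustration, so that the linear separability condition holds.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import Perceptron

# Two well-separated clusters, so a separating line exists (assumed toy data).
X, y = make_blobs(
    n_samples=200, centers=[[-5, -5], [5, 5]], cluster_std=1.0, random_state=0
)

clf = Perceptron(random_state=0)
clf.fit(X, y)

# The learned model is just a weight vector and a bias (parametric model).
print("weights:", clf.coef_[0], "bias:", clf.intercept_[0])
train_accuracy = clf.score(X, y)
print("training accuracy:", train_accuracy)
```

On data that is not linearly separable, the weight updates keep going until the iteration limit (`max_iter`) is reached, which is why that limit matters.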

Logistic Regression:

  • Performs very well on datasets that are linearly separable.
  • The output of the sigmoid function can be interpreted as the probability of the event happening.
  • Uses regularization to add bias and reduce the chance of overfitting. Regularization is a useful way to handle collinearity (high correlation between features), filter out noise from the data, and ultimately prevent overfitting. It introduces additional bias by penalizing extreme parameter weights. For it to work properly, all features must be on a comparable scale.
  • Works with streaming data if trained with stochastic gradient descent.
  • Does not perform well if there are outliers.
  • Parametric Model
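
The regularization and probability points can be illustrated with a short sketch. The breast cancer dataset and the three `C` values are assumptions chosen for the example. In scikit-learn, `C` is the inverse of the regularization strength, so a smaller `C` means a stronger penalty on the weights.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Standardize first so the L2 penalty treats all features on a comparable scale.
weight_norms = {}
for C in (0.01, 1.0, 100.0):
    clf = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=5000))
    clf.fit(X_train, y_train)
    coef = clf.named_steps["logisticregression"].coef_
    weight_norms[C] = np.linalg.norm(coef)  # shrinks as the penalty grows

# Refit with a moderate penalty and read class probabilities from the sigmoid.
clf = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, max_iter=5000))
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test[:3])  # one row per sample, one column per class
test_accuracy = clf.score(X_test, y_test)

print("weight norm per C:", weight_norms)
print("class probabilities:", proba)
print("test accuracy:", test_accuracy)
```

For the streaming case, `SGDClassifier` with a logistic loss and `partial_fit` is the usual scikit-learn route, although it is not shown here.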

Linear Support Vector Machine (SVM):

  • Maximises the distance between the decision boundary and the nearest sample points (maximises the margin).
  • Large margins tend to generalise well, whereas small margins are more prone to overfitting.
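
A small sketch of the margin idea, assuming synthetic two-cluster data and scikit-learn's `SVC` with a linear kernel. For a linear SVM with weight vector w, the margin width is 2 / ||w||, and the support vectors are the training points that define it.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two clearly separated clusters (assumed toy data).
X, y = make_blobs(
    n_samples=100, centers=[[-3, -3], [3, 3]], cluster_std=1.0, random_state=0
)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

w = clf.coef_[0]                       # normal vector of the decision boundary
margin_width = 2.0 / np.linalg.norm(w)  # distance between the two margin lines
n_support = clf.support_vectors_.shape[0]
train_accuracy = clf.score(X, y)

print("margin width:", margin_width)
print("number of support vectors:", n_support)
print("training accuracy:", train_accuracy)
```

Lowering `C` tolerates more margin violations and usually widens the margin; raising it fits the training points more tightly.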

Kernel SVM:

  • The basic idea is to create nonlinear combinations of the original features and project them into a higher-dimensional space via a mapping function, so that the classes become linearly separable.
  • A common kernel is the radial basis function (RBF), also known as the Gaussian kernel.
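
To see why the kernel matters, the sketch below compares a linear and an RBF kernel on concentric circles, a dataset that no straight line can separate. The dataset and the `gamma` value are assumptions for illustration. scikit-learn's RBF kernel is k(x, x') = exp(-gamma * ||x - x'||^2).

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Inner circle = one class, outer ring = the other (not linearly separable).
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)

linear_acc = linear_svm.score(X, y)
rbf_acc = rbf_svm.score(X, y)

print("linear kernel accuracy:", linear_acc)
print("RBF kernel accuracy:", rbf_acc)
```

A larger `gamma` makes each training point's influence more local, which tightens the decision boundary and raises the risk of overfitting.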

Decision Tree:

  • Easy to understand
  • Prone to overfitting, which is controlled by pruning the tree.
  • Starting at the root, the data is split on the feature that results in the largest information gain.
  • Non-parametric model
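
A sketch of the overfitting point, assuming the breast cancer dataset and using `max_depth` as a simple stand-in for pruning (it caps tree growth up front rather than cutting branches afterwards). `criterion="entropy"` makes the splits maximize information gain.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Unrestricted tree: grows until every leaf is pure, so it memorizes the training set.
full_tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
full_tree.fit(X_train, y_train)

# Depth-limited tree: a simpler model with a smaller train/test gap (assumed depth of 3).
small_tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
small_tree.fit(X_train, y_train)

for name, tree in (("full", full_tree), ("depth<=3", small_tree)):
    print(
        name,
        "depth:", tree.get_depth(),
        "train:", tree.score(X_train, y_train),
        "test:", tree.score(X_test, y_test),
    )
```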

Random Forest:

  • An ensemble of decision trees. Combines weak learners to build a robust model that generalises better and is less prone to overfitting.
  • Harder to interpret than a single decision tree, but there aren't many parameters to tune.
  • Non-parametric model
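
A rough comparison of a single tree and a forest, assuming the wine dataset and 100 trees. The number of trees (`n_estimators`) is typically the main parameter to set.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Mean 5-fold cross-validated accuracy for each model.
tree_cv = cross_val_score(tree, X, y, cv=5).mean()
forest_cv = cross_val_score(forest, X, y, cv=5).mean()

# Fit once on all data to inspect the ensemble itself.
forest.fit(X, y)
importances = forest.feature_importances_  # one value per feature, sums to 1

print("single tree CV accuracy:", tree_cv)
print("random forest CV accuracy:", forest_cv)
print("trees in the forest:", len(forest.estimators_))
```

The feature importances give back some of the interpretability that is lost when moving from one tree to many.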

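K-Nearest Neighbours (KNN):

  • Computationally expensive at prediction time, because each query is compared against the stored training samples.
  • The choice of k determines whether the model overfits or underfits.
  • Very susceptible to overfitting, especially with small k or many features.
  • Does not require a training step (lazy learner). It memorises the training data, so the classifier adapts as new data comes in.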
  • Non-parametric model
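
The effect of k can be sketched on a noisy toy dataset. The two-moons data and the three values of k are assumptions for illustration. With k = 1 every training point is its own nearest neighbour, so the training accuracy is perfect while the test accuracy shows how well the model actually generalises.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

train_acc, test_acc = {}, {}
for k in (1, 15, 101):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)  # no model is learned; the samples are stored and indexed
    train_acc[k] = knn.score(X_train, y_train)
    test_acc[k] = knn.score(X_test, y_test)

for k in train_acc:
    print("k =", k, "train:", train_acc[k], "test:", test_acc[k])
```

Small k gives a jagged boundary that follows the noise (overfitting); very large k averages over so many neighbours that the boundary becomes too smooth (underfitting).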



Author: Zac

Think & Do
