Accessing storage space in Termux

Storage Space in Termux

Executing termux-setup-storage (run apt update && apt upgrade to ensure that this tool is available) ensures:

  1. That permission to shared storage is granted to Termux when running on Android 6.0 or later.
  2. That an app-private folder on external storage is created (if external storage exists).
  3. That a folder $HOME/storage is created.
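
To check the result of the third step from a script, a small sketch like the following could be used. It only assumes that the folder is named storage and sits directly under the home directory, as described above; the helper name list_storage_entries is made up for this example.

```python
import os


def list_storage_entries(home=None):
    """Return the sorted entry names under <home>/storage, or [] if it is missing."""
    home = home or os.path.expanduser("~")
    storage_dir = os.path.join(home, "storage")
    if not os.path.isdir(storage_dir):
        # termux-setup-storage has probably not been run yet.
        return []
    return sorted(os.listdir(storage_dir))


if __name__ == "__main__":
    print(list_storage_entries())
```

An empty list suggests that the folder has not been created yet.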

Summary of ML Classifiers in scikit-learn

5 steps to train an ML classifier

  1. Select the features
  2. Choose a performance metric
  3. Choose a classifier and an optimization algorithm
  4. Evaluate the performance of the model
  5. Tune the algorithm
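
The five steps above can be sketched with scikit-learn roughly as follows. The Iris dataset, the two petal features, accuracy as the metric, logistic regression as the classifier and the grid of C values are all choices made for this illustration, not something the steps prescribe.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Select the features (assumption: petal length and petal width only).
X, y = load_iris(return_X_y=True)
X = X[:, [2, 3]]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# 2. Choose a performance metric (assumption: plain accuracy).
# 3. Choose a classifier and optimization algorithm
#    (assumption: logistic regression with its default solver).
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# 4. Evaluate the performance of the model on held-out data.
baseline_accuracy = accuracy_score(y_test, model.predict(X_test))

# 5. Tune the algorithm: search over the inverse regularization strength C.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0, 100.0]}
search = GridSearchCV(model, param_grid, scoring="accuracy", cv=5)
search.fit(X_train, y_train)
tuned_accuracy = accuracy_score(y_test, search.predict(X_test))

print(baseline_accuracy, search.best_params_, tuned_accuracy)
```

Putting the scaler inside the pipeline means it is refitted on each cross-validation fold, so the tuning step does not see statistics from the validation part.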


Perceptron:

  • Easy to understand.
  • Never converges if the classes are not linearly separable.
  • Parametric Model

Logistic Regression:

  • Performs very well on datasets that are linearly separable.
  • Output of sigmoid function can be interpreted as probability of event happening.
  • Uses regularization, which adds bias to reduce the chance of overfitting. Regularization is a useful way to handle collinearity (high correlation between features) and to filter noise out of the data. It works by penalizing extreme parameter weights, so all features must be on a comparable scale for it to work properly.
  • Works with streaming data when trained with stochastic gradient descent.
  • Does not perform well if there are outliers.
  • Parametric Model
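
A minimal sketch of the two ideas above, the sigmoid output read as a probability and a penalty on large weights. The function names and the L2 form of the penalty (0.5 * lam * sum of squared weights) are assumptions chosen for illustration; other forms of regularization exist.

```python
import math


def sigmoid(z):
    """Squash a net input z into the open interval (0, 1).

    Note: very large negative z would overflow math.exp; this sketch ignores that.
    """
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability of the positive class for one sample x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


def l2_penalty(weights, lam):
    """L2 regularization term: grows with the size of the weights."""
    return 0.5 * lam * sum(w * w for w in weights)


print(predict_proba([1.5, -0.5], 0.1, [2.0, 1.0]))
print(l2_penalty([3.0, 4.0], lam=0.1))
```

Because the penalty depends on the raw size of each weight, a feature measured on a much larger scale than the others would be penalized differently, which is why the features need comparable scales.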

Linear Support Vector Machine (SVM):

  • Maximises the margin, i.e. the distance between the decision boundary and the nearest sample points.
  • Large margins tend to generalise well, whereas small margins are more prone to overfitting.

Kernel SVM:

  • The basic idea is to create nonlinear combinations of the original features and project them into a higher-dimensional space via a mapping function, so that the data becomes linearly separable.
  • A common kernel is the radial basis function (RBF) kernel, also called the Gaussian kernel.
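
As a rough illustration of the RBF kernel mentioned above, the sketch below computes k(x1, x2) = exp(-gamma * ||x1 - x2||^2) for two points. The function name and the default gamma value are assumptions for this example.

```python
import math


def rbf_kernel(x1, x2, gamma=1.0):
    """Gaussian (RBF) kernel: similarity that decays with squared distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * squared_distance)


# Identical points get similarity 1; distant points approach 0.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))
print(rbf_kernel([0.0, 0.0], [3.0, 4.0], gamma=0.1))
```

A larger gamma makes the similarity fall off faster, so each training sample influences a smaller region around it.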

Decision Tree:

  • Easy to understand
  • Prone to overfitting, which is controlled by pruning the tree.
  • Starting at the root, the data is split on the feature that results in the largest information gain.
  • Non-parametric model
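
Information gain can be sketched as the impurity of the parent node minus the weighted impurity of its children. The version below assumes entropy as the impurity measure and a binary split; Gini impurity is another common choice.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted_children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted_children


parent = [0, 0, 1, 1]
print(information_gain(parent, [0, 0], [1, 1]))  # split that separates the classes
print(information_gain(parent, [0, 1], [0, 1]))  # split that separates nothing
```

The tree-building step would evaluate candidate splits this way and keep the one with the highest gain.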

Random Forest:

  • Ensemble of decision trees. Combines weak learners into a more robust model that generalises better and is less prone to overfitting.
  • Harder to interpret than a single decision tree, but there aren't many parameters to tune.
  • Non-parametric model

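K-Nearest Neighbours (KNN):

  • Computationally expensive at prediction time, because distances to the stored training samples have to be computed.
  • The right choice of k determines the balance between overfitting and underfitting.
  • Very susceptible to overfitting, particularly in high-dimensional feature spaces.
  • Does not require a training step (lazy learner). It stores the training data, so the classifier adapts as new data comes in.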
  • Non-parametric model
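
A minimal KNN sketch, assuming Euclidean distance and a simple majority vote (both are common defaults, not the only options). The tiny dataset and the function name knn_predict are made up for this example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # There is no training step: every prediction scans the stored samples.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


train_X = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.0), (6.0, 7.0)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (1.5, 1.5)))
print(knn_predict(train_X, train_y, (6.5, 6.5)))
```

Adding a new labelled sample is just an append to train_X and train_y, which is what makes the classifier adapt immediately to new data.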


Standard Deviation

If the population of interest is approximately normally distributed, the standard deviation provides information on the proportion of observations above or below certain values.

For example, the average height for adult men in the United States is about 70 inches (177.8 cm), with a standard deviation of around 3 inches (7.62 cm). This means that most men (about 68%, assuming a normal distribution) have a height within 3 inches (7.62 cm) of the mean (67–73 inches (170.18–185.42 cm)) – one standard deviation – and almost all men (about 95%) have a height within 6 inches (15.24 cm) of the mean (64–76 inches (162.56–193.04 cm)) – two standard deviations.

If the standard deviation were zero, then all men would be exactly 70 inches (177.8 cm) tall. If the standard deviation were 20 inches (50.8 cm), then men would have much more variable heights, with a typical range of about 50–90 inches (127–228.6 cm).

Three standard deviations account for 99.7% of the sample population being studied, assuming the distribution is normal (bell-shaped). (See the 68-95-99.7 rule, or the empirical rule, for more information.)
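
The percentages above can be reproduced with the standard library, assuming the heights follow a normal distribution with mean 70 inches and standard deviation 3 inches as in the example. The helper name share_within is made up for this sketch.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=3)


def share_within(dist, k):
    """Share of a normal population lying within k standard deviations of the mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)


for k in (1, 2, 3):
    low, high = 70 - 3 * k, 70 + 3 * k
    print(f"{low}-{high} inches: {share_within(heights, k):.1%}")
```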

Bias and Variance

In statistics, the bias (or bias function) of an estimator is the difference between this estimator’s expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased. Otherwise the estimator is said to be biased.
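
A classic illustration of a biased estimator is the sample variance computed with divisor n instead of n - 1. The sketch below assumes a made-up population of six values and enumerates every possible sample of size 3 drawn with replacement, so the expected value of each estimator is computed exhaustively rather than simulated.

```python
from itertools import product
from statistics import mean, pvariance

population = [1, 2, 3, 4, 5, 6]
true_variance = pvariance(population)


def variance_estimate(sample, ddof):
    """Sample variance with divisor len(sample) - ddof."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - ddof)


# Every equally likely sample of size 3, drawn with replacement.
samples = list(product(population, repeat=3))

expected_biased = mean(variance_estimate(s, ddof=0) for s in samples)
expected_unbiased = mean(variance_estimate(s, ddof=1) for s in samples)

# Bias = expected value of the estimator minus the true parameter value.
bias_biased = expected_biased - true_variance
bias_unbiased = expected_unbiased - true_variance

print(bias_biased, bias_unbiased)
```

The divisor-n estimator underestimates the true variance on average, while the divisor n - 1 version has a bias of zero up to floating-point rounding.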

In statistics and machine learning, the bias–variance tradeoff is the property of a set of predictive models whereby models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples, and vice versa. The bias–variance dilemma or problem is the conflict in trying to simultaneously minimize these two sources of error that prevent supervised learning algorithms from generalizing beyond their training set:

  • The bias is an error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
  • The variance is an error from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting).

Training ML Algo for Classification

Neural Network Binary Classification

  1. Initialize the weights to 0 or small random numbers
  2. For each training sample x, perform the following steps:
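    1. Compute the output value ŷ
    2. Update the weights

Update of weight

w_j := w_j + Δw_j

Δw_j = η (y − ŷ) x_j

where η is the learning rate (a constant between 0.0 and 1.0), y is the true class label of the sample, ŷ is the predicted class label, and x_j is the j-th feature value of the sample.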

Convergence of the perceptron is only guaranteed if the two classes are linearly separable and the learning rate is sufficiently small. Otherwise, set a maximum number of passes over the training dataset (epochs) or a threshold for the number of tolerated misclassifications.
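
The learning rule and the stopping condition above can be sketched in plain Python as follows. Several details are assumptions for this example: class labels are -1 and 1, the unit step thresholds the net input at 0, a separate bias term is learned alongside the weights, and the toy dataset is made up.

```python
def predict(weights, bias, x):
    """Unit step function applied to the net input."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if z >= 0.0 else -1


def train_perceptron(X, y, eta=0.1, max_epochs=10):
    """Perceptron rule: update the weights after every single sample."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    errors_per_epoch = []
    for _ in range(max_epochs):  # cap on passes in case the data is not separable
        errors = 0
        for xi, target in zip(X, y):
            update = eta * (target - predict(weights, bias, xi))
            weights = [w + update * v for w, v in zip(weights, xi)]
            bias += update
            errors += int(update != 0.0)
        errors_per_epoch.append(errors)
        if errors == 0:  # a full pass without misclassification: converged
            break
    return weights, bias, errors_per_epoch


X = [(2.0, 1.0), (3.0, 1.5), (2.5, 0.5), (-1.0, -1.5), (-2.0, -0.5), (-3.0, -1.0)]
y = [1, 1, 1, -1, -1, -1]

weights, bias, errors_per_epoch = train_perceptron(X, y)
print(weights, bias, errors_per_epoch)
```

When a sample is classified correctly, y - ŷ is zero and the weights stay unchanged, so only misclassified samples move the decision boundary.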

Adaptive Linear Neurons (Adaline)

One of the key ingredients of a supervised machine learning algorithm is an objective function that is optimized during the learning process. Often, it is a cost function to be minimized. In Adaline, the goal is to reduce the sum of squared errors.

Also, the weights are updated based on a linear activation function rather than a unit step function.

Let the cost function be J, the sum of squared errors between the true labels and the outputs of the linear activation. Because J is differentiable and convex, its gradient can be followed downhill (gradient descent) towards the global minimum.

The weight update is calculated from all samples in the training set, instead of incrementally after each sample, which is why this approach is called batch gradient descent.
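
A minimal sketch of Adaline with batch gradient descent, assuming the cost J = 0.5 * sum of squared errors, an identity activation, a separate bias term, and a made-up toy dataset with labels -1 and 1. The learning rate and epoch count are arbitrary illustrative values.

```python
def net_input(weights, bias, x):
    """Linear activation: the raw weighted sum, no thresholding."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def train_adaline_batch(X, y, eta=0.01, epochs=50):
    """One weight update per epoch, computed from all training samples."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    costs = []
    for _ in range(epochs):
        errors = [target - net_input(weights, bias, xi) for xi, target in zip(X, y)]
        # Step against the gradient of J, summed over the whole training set.
        weights = [
            w + eta * sum(e * xi[j] for e, xi in zip(errors, X))
            for j, w in enumerate(weights)
        ]
        bias += eta * sum(errors)
        costs.append(0.5 * sum(e * e for e in errors))
    return weights, bias, costs


X = [(2.0, 1.0), (3.0, 1.5), (2.5, 0.5), (-1.0, -1.5), (-2.0, -0.5), (-3.0, -1.0)]
y = [1, 1, 1, -1, -1, -1]

weights, bias, costs = train_adaline_batch(X, y)
print(costs[0], costs[-1])
```

If the learning rate is too large the cost can grow instead of shrink, so it is worth watching the recorded costs when changing eta.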

Stochastic gradient descent

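Updates the weights after each training sample instead of after a full pass over the training set. It typically reaches convergence faster because of the more frequent weight updates.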
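
The stochastic variant can be sketched by moving the Adaline weight update inside the loop over samples. Shuffling the samples each epoch, the fixed random seed, the bias term and the toy dataset are assumptions made for this example.

```python
import random


def train_adaline_sgd(X, y, eta=0.01, epochs=20, seed=1):
    """Adaline trained with one weight update per training sample."""
    rng = random.Random(seed)
    weights = [0.0] * len(X[0])
    bias = 0.0
    avg_costs = []
    indices = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(indices)  # assumption: visit samples in a new random order
        epoch_cost = 0.0
        for i in indices:
            output = bias + sum(w * v for w, v in zip(weights, X[i]))
            error = y[i] - output
            # Update immediately, using only this one sample.
            weights = [w + eta * error * v for w, v in zip(weights, X[i])]
            bias += eta * error
            epoch_cost += 0.5 * error ** 2
        avg_costs.append(epoch_cost / len(X))
    return weights, bias, avg_costs


X = [(2.0, 1.0), (3.0, 1.5), (2.5, 0.5), (-1.0, -1.5), (-2.0, -0.5), (-3.0, -1.0)]
y = [1, 1, 1, -1, -1, -1]

weights, bias, avg_costs = train_adaline_sgd(X, y)
print(avg_costs[0], avg_costs[-1])
```

With six samples, each epoch here performs six weight updates, whereas the batch version performs one.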

Python syntax

New f-string syntax in Python 3.6

>>> name = 'Fred'
>>> age = 42

>>> f'He said his name is {name} and he is {age} years old.'

'He said his name is Fred and he is 42 years old.'
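
Beyond plain variable names, the braces in an f-string accept any expression and an optional format specifier. A short sketch, where the variable price and the chosen format specifiers are made up for illustration:

```python
name = "Fred"
age = 42
price = 3.14159

greeting = f"He said his name is {name} and he is {age} years old."
next_year = f"Next year he will be {age + 1}."  # any expression is allowed
rounded = f"{price:.2f}"                        # two decimal places
padded = f"{age:>5}"                            # right-aligned in 5 characters

print(greeting)
print(next_year)
print(rounded)
print(padded)
```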


Underscore _ as a variable name in Python

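By convention, _ is used as the name of a throwaway variable whose value is not needed. In the interactive interpreter, _ also holds the result of the last expression evaluated.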
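
A small sketch of the throwaway usage; the file name report.csv is made up for the example. The "last expression" behaviour only applies in the interactive interpreter, so it is not shown in this script.

```python
# The loop counter is never used, so it is named _.
greetings = []
for _ in range(3):
    greetings.append("hello")

# Only the second part of the split is needed; the first is discarded.
_, extension = "report.csv".rsplit(".", 1)

print(greetings)
print(extension)
```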