Stashaway Risk Index

Did a little bit of backtesting using Google Sheets. Assume that I invest $100 every Tuesday and Thursday.
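The spreadsheet logic can be sketched in Python. The prices below are made up for illustration and are not the actual ETF data; the function and its parameters are hypothetical names, not anything StashAway publishes.

```python
import datetime as dt

def dca_return(prices, amount=100.0, buy_weekdays=(1, 3)):
    """Absolute (not annualised) return of investing `amount` on every
    Tuesday (1) and Thursday (3) found in `prices` (date -> price)."""
    units = invested = 0.0
    for day in sorted(prices):
        if day.weekday() in buy_weekdays:
            units += amount / prices[day]   # buy at that day's price
            invested += amount
    final_value = units * prices[max(prices)]
    return final_value / invested - 1.0

# 30 days of hypothetical, gently rising prices starting on a Tuesday.
prices = {dt.date(2019, 1, 1) + dt.timedelta(days=i): 100.0 + 0.1 * i
          for i in range(30)}
print(f"{dca_return(prices):.2%}")
```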

Currently I have two portfolios in Stashaway with different risk indices. Below is the current weighting for the Risk Index 20% portfolio:

ETF Weight
XLP 14.4%
XLY 14.8%
XLK 3.0%
VBK 0.0%
AAXJ 13.9%
CWB 14.8%
TLT 8.4%
TIP 14.8%
GLD 14.8%
CASH 1.1%

The performance as below:

Tenor Returns
1 week 0.19%
2 week -0.22%
1 month 0.13%
3 month -0.24%
6 month -1.70%
1 year -2.67%
2 year -0.14%
3 year 3.40%
5 year 7.13%
10 year 26.38%

These figures are absolute returns (not annualised). The strong performers in the portfolio are XLP, XLY and XLK; AAXJ and CWB are mid-tier performers.


For Risk Index 26%, the weights are as such,

ETF Weight
XLP 14.8%
XLY 15.8%
XLK 0.0%
VBK 14.8%
AAXJ 14.8%
CWB 14.8%
TLT 0.0%
TIP 8.9%

The main differences are the inclusion of VBK and the exclusion of XLK and TLT. I wonder why XLK is replaced by VBK when the two funds cover quite different sectors.

Performance as below.

Tenor Returns
1 week 0.20%
2 week -0.42%
1 month -0.23%
3 month -1.19%
6 month -2.94%
1 year -3.46%
2 year 0.34%
3 year 5.21%
5 year 9.20%
10 year 32.81%

The main reason for the outperformance is that TLT was dropped.

In both portfolios, GLD is currently a lagging component, but it hasn't always been that way. In my personal opinion, though, an investment in gold produces no dividends, nor does it improve business or life in general. Its role in wealth preservation seems to stem from the fact that currencies were previously pegged to it.

All in all, these figures aren't very good for investments. Volatility may well have been reduced, but the average returns are just too low. I was honestly expecting at least 5-6%.



Accessing storage space in Termux

Executing termux-setup-storage (run apt update && apt upgrade to ensure that this tool is available) ensures:

  1. That permission to shared storage is granted to Termux when running on Android 6.0 or later.
  2. That an app-private folder on external storage is created (if external storage exists).
  3. That a folder $HOME/storage is created.
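A typical session might look like the following (a sketch; the exact entries under $HOME/storage vary by device, and commands here only apply inside a Termux shell on Android):

```shell
apt update && apt upgrade   # make sure termux-setup-storage is available
termux-setup-storage        # Android prompts for the storage permission
ls ~/storage                # symlinks such as shared, downloads, dcim
```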

Summary of ML Classifiers in scikit-learn

Five steps to train an ML classifier:

  1. Select features
  2. Choose a performance metric
  3. Choose a classifier and optimization algorithm
  4. Evaluate the performance of the model
  5. Tune the algorithm
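The five steps above can be sketched in scikit-learn. The Iris dataset, the perceptron classifier, and all parameter values here are illustrative choices, not part of the original notes:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Select features: petal length and width (columns 2 and 3).
X, y = load_iris(return_X_y=True)
X = X[:, [2, 3]]

# Hold out 30% so step 4 measures generalisation, not memorisation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Scale features (gradient-based classifiers expect comparable scales).
sc = StandardScaler().fit(X_train)

# 3. Choose a classifier and optimisation algorithm.
clf = Perceptron(eta0=0.1, random_state=1).fit(sc.transform(X_train), y_train)

# 2 & 4. Performance metric: accuracy on the held-out test set.
acc = accuracy_score(y_test, clf.predict(sc.transform(X_test)))
print(f"accuracy: {acc:.2f}")
# 5. Tuning (e.g. eta0, number of epochs) would follow from here.
```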


Perceptron:

  • Easy to understand.
  • Never converges if the classes are not linearly separable.
  • Parametric model

Logistic Regression:

  • Performs very well on datasets that are linearly separable.
  • The output of the sigmoid function can be interpreted as the probability of an event happening.
  • Uses regularization to add bias and reduce the chance of overfitting. Regularization is a useful method to handle collinearity (correlation between features), filter out noise from the data and eventually prevent overfitting. It introduces additional bias to penalize extreme parameter weights. For it to work, all features must be on a comparable scale.
  • Works with streaming data if stochastic gradient descent is used.
  • Does not perform well if there are outliers.
  • Parametric model
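These points can be sketched with scikit-learn's LogisticRegression; the Iris dataset and the value C=100.0 are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Regularization only works properly when features share a comparable scale.
sc = StandardScaler().fit(X_train)

# C is the inverse regularization strength: smaller C means a stronger
# penalty on extreme weights (more bias, less variance).
lr = LogisticRegression(C=100.0, random_state=1)
lr.fit(sc.transform(X_train), y_train)

# predict_proba exposes the model outputs as class probabilities.
probs = lr.predict_proba(sc.transform(X_test[:1]))
score = lr.score(sc.transform(X_test), y_test)
print(probs.round(3), score)
```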

Linear Support Vector Machine (SVM):

  • Maximises the distance between the decision boundary and the nearest sample points (maximises the margin).
  • Generalises well when the margin is maximised, and overfits when the margin is small.

Kernel SVM:

  • The basic idea is to create nonlinear combinations of the original features and project them into a higher-dimensional space via a mapping function, so that the classes become linearly separable there.
  • A common kernel is the radial basis function (RBF), or Gaussian kernel.
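The difference shows up clearly on data that is deliberately not linearly separable. This sketch uses scikit-learn's two-moons toy dataset; the parameter values are illustrative:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: no straight line separates the classes.
X, y = make_moons(n_samples=200, noise=0.1, random_state=1)

linear = SVC(kernel="linear", C=1.0).fit(X, y)
# kernel="rbf" is the radial basis function / Gaussian kernel; gamma
# controls how far the influence of a single sample reaches.
rbf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)

lin_acc, rbf_acc = linear.score(X, y), rbf.score(X, y)
print(f"linear: {lin_acc:.2f}, rbf: {rbf_acc:.2f}")
```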

Decision Tree:

  • Easy to understand.
  • Prone to overfitting; this is controlled by pruning the tree.
  • Starting at the root, the data is split on the feature that yields the largest information gain.
  • Non-parametric model
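Pruning can be sketched with scikit-learn's max_depth parameter (pre-pruning); the Iris dataset and the depth values are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Fully grown tree: fits the training data perfectly (overfitting risk).
deep = DecisionTreeClassifier(criterion="entropy", random_state=1).fit(X, y)

# Pre-pruned tree: max_depth caps the number of successive splits.
shallow = DecisionTreeClassifier(
    criterion="entropy", max_depth=2, random_state=1).fit(X, y)

print(deep.get_depth(), deep.score(X, y))        # deep tree, training acc 1.0
print(shallow.get_depth(), shallow.score(X, y))  # depth 2, slightly lower acc
```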

Random Forest:

  • An ensemble of decision trees: combines weak learners to build a robust model with better generalisation and less overfitting.
  • Less interpretable than a single decision tree, but there aren't many hyperparameters to tune (chiefly the number of trees).
  • Non-parametric model
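A sketch with scikit-learn, where n_estimators (the number of trees) is the main knob; Iris is again just an illustrative dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# 100 bootstrapped trees, each considering random feature subsets at
# every split; the majority vote is the ensemble's prediction.
forest = RandomForestClassifier(n_estimators=100, random_state=1, n_jobs=-1)
forest.fit(X, y)

# Feature importances (mean impurity decrease) sum to 1.
print(forest.feature_importances_.round(3), forest.score(X, y))
```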

K-Nearest Neighbours (KNN):

  • Computationally expensive, since prediction requires comparing each new sample against the stored training set.
  • The right choice of k determines over- or underfitting.
  • Very susceptible to overfitting (especially with small k).
  • Requires no explicit training step (a lazy learner): the classifier memorises the training data and adapts as new data comes in.
  • Non-parametric model
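How k trades off over- and underfitting can be sketched as follows; the Iris dataset and the k values are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Distance-based methods need features on a comparable scale.
sc = StandardScaler().fit(X_train)

scores = {}
for k in (1, 5, 50):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(sc.transform(X_train), y_train)  # "fit" merely stores the data
    scores[k] = knn.score(sc.transform(X_test), y_test)

print(scores)  # k=1 risks overfitting; very large k underfits
```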


Standard Deviation

If the population of interest is approximately normally distributed, the standard deviation provides information on the proportion of observations above or below certain values.

For example, the average height for adult men in the United States is about 70 inches (177.8 cm), with a standard deviation of around 3 inches (7.62 cm). This means that most men (about 68%, assuming a normal distribution) have a height within 3 inches (7.62 cm) of the mean (67–73 inches (170.18–185.42 cm)) – one standard deviation – and almost all men (about 95%) have a height within 6 inches (15.24 cm) of the mean (64–76 inches (162.56–193.04 cm)) – two standard deviations.

If the standard deviation were zero, then all men would be exactly 70 inches (177.8 cm) tall. If the standard deviation were 20 inches (50.8 cm), then men would have much more variable heights, with a typical range of about 50–90 inches (127–228.6 cm).

Three standard deviations account for 99.7% of the sample population being studied, assuming the distribution is normal (bell-shaped). (See the 68-95-99.7 rule, or the empirical rule, for more information.)
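The 68-95-99.7 rule can be checked empirically against the height example above, using simulated data (mean 70 in, standard deviation 3 in):

```python
import random
import statistics

random.seed(1)
# Simulate 100,000 adult male heights: mean 70 in, sd 3 in.
heights = [random.gauss(70, 3) for _ in range(100_000)]
mu = statistics.mean(heights)
sd = statistics.stdev(heights)

fractions = {}
for k in (1, 2, 3):
    within = sum(abs(h - mu) <= k * sd for h in heights)
    fractions[k] = within / len(heights)

print(fractions)  # close to 0.683, 0.954, 0.997
```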

Bias and Variance

In statistics, the bias (or bias function) of an estimator is the difference between this estimator’s expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased. Otherwise the estimator is said to be biased.

In statistics and machine learning, the bias–variance tradeoff is the property of a set of predictive models whereby models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples, and vice versa. The bias–variance dilemma or problem is the conflict in trying to simultaneously minimize these two sources of error, which prevent supervised learning algorithms from generalizing beyond their training set:

The bias is an error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
The variance is an error from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting).
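The trade-off can be sketched with polynomial regression on noisy sine data (a made-up setup; all values are illustrative): a degree-1 fit has high bias and underfits, while a degree-15 fit on only 30 points has high variance, driving the training error down without a matching improvement on test data.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(0, 2 * np.pi, n)
    return x, np.sin(x) + rng.normal(0.0, 0.2, n)  # noisy sine samples

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

errors = {}
for degree in (1, 4, 15):
    coefs = np.polyfit(x_train, y_train, degree)  # least-squares fit
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    errors[degree] = (train_mse, test_mse)
    print(degree, round(train_mse, 3), round(test_mse, 3))
```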

Training ML Algo for Classification

Neural Network Binary Classification

  1. Initialize the weights to 0 or small random numbers.
  2. For each training sample x⁽ⁱ⁾, perform the following steps:
    1. Compute the output value ŷ.
    2. Update the weights.

Weight update:

wⱼ := wⱼ + Δwⱼ

Δwⱼ = η ( y⁽ⁱ⁾ − ŷ⁽ⁱ⁾ ) xⱼ⁽ⁱ⁾

where η is the learning rate (between 0.0 and 1.0), y⁽ⁱ⁾ is the true class label of the i-th sample and ŷ⁽ⁱ⁾ is the predicted class label.

Convergence of perceptron is only guaranteed if the 2 classes are linearly separable and learning rate is sufficiently small. Otherwise set a max number of passes over training dataset (epochs) or a threshold for the number of tolerated misclassifications.
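The update rule above can be sketched as a minimal NumPy perceptron on a hand-made, linearly separable toy set (the class is the sign of the first feature). This is an illustrative implementation, not taken from any particular library:

```python
import numpy as np

class Perceptron:
    def __init__(self, eta=0.1, epochs=10):
        self.eta = eta        # learning rate, 0.0 < eta <= 1.0
        self.epochs = epochs  # max passes over the training set

    def fit(self, X, y):
        self.w = np.zeros(X.shape[1] + 1)  # w[0] is the bias weight
        for _ in range(self.epochs):
            for xi, target in zip(X, y):
                # delta_w = eta * (y - y_hat) * x
                delta = self.eta * (target - self.predict(xi))
                self.w[1:] += delta * xi
                self.w[0] += delta
        return self

    def predict(self, x):
        # Unit step function on the net input.
        return np.where(x @ self.w[1:] + self.w[0] >= 0.0, 1, -1)

# Linearly separable toy data: the label is the sign of the first feature.
X = np.array([[2.0, 1.0], [1.5, -1.0], [-1.0, 0.5], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])
p = Perceptron().fit(X, y)
print(p.predict(X))  # → [ 1  1 -1 -1]
```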

Adaptive Linear Neurons (Adaline)

One of the key ingredients of a supervised machine learning algorithm is an objective function that is optimized during the learning process. Often it is a cost function to be minimized; in Adaline, it is the sum of squared errors (SSE).

Also, the weights are updated based on a linear activation function rather than unit step function.

Let the cost function be defined as J. Because the SSE cost function is convex and differentiable, we can differentiate J and follow its gradient to the global minimum.

The weights are updated based on all samples in the training set instead of incrementally after each sample; hence this is called batch gradient descent.
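Batch gradient descent for Adaline can be sketched as follows (the toy data and parameter values are illustrative; note the single weight update per epoch, computed from every sample at once):

```python
import numpy as np

class AdalineGD:
    def __init__(self, eta=0.01, epochs=50):
        self.eta = eta        # learning rate
        self.epochs = epochs  # passes over the whole training set

    def fit(self, X, y):
        self.w = np.zeros(X.shape[1] + 1)  # w[0] is the bias weight
        self.cost = []
        for _ in range(self.epochs):
            output = X @ self.w[1:] + self.w[0]  # linear activation
            errors = y - output
            # One batch update per epoch, from the gradient over all samples.
            self.w[1:] += self.eta * (X.T @ errors)
            self.w[0] += self.eta * errors.sum()
            self.cost.append(0.5 * (errors ** 2).sum())  # SSE / 2
        return self

X = np.array([[2.0, 1.0], [1.5, -1.0], [-1.0, 0.5], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
ada = AdalineGD().fit(X, y)
print(ada.cost[0], ada.cost[-1])  # cost shrinks as training proceeds
```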

Stochastic gradient descent

Reaches convergence faster because the weights are updated incrementally after each training sample. The training data should be shuffled at every epoch to avoid cycles.