Giving Computers the Ability to Learn from Data

General concepts of Machine Learning

3 types of learning and basic terminology

– Supervised learning. Learn a model from labeled training data that allows us to make predictions about unseen or future data. "Supervised" refers to a set of samples where the desired output signals (labels) are already known. E.g. a spam filter trained on emails labeled as spam or not spam.

When the labels are discrete classes, as in the spam filter, the task is called classification. Another subcategory is regression, where the outcome signal is a continuous value.
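To make the classification case concrete, here is a minimal sketch of a spam filter using a 1-nearest-neighbor rule. The two features (number of links, number of exclamation marks), the training samples and the `predict` helper are all made up for illustration; a real filter would use far richer features and a proper library.

```python
import math

# Hypothetical toy training set: (links, exclamation marks) -> known label.
# The numbers are invented for illustration only.
training_data = [
    ((0, 0), "ham"),
    ((1, 0), "ham"),
    ((0, 1), "ham"),
    ((7, 5), "spam"),
    ((9, 3), "spam"),
    ((6, 8), "spam"),
]

def predict(features):
    """Return the label of the closest training sample (1-nearest-neighbor)."""
    nearest = min(training_data, key=lambda item: math.dist(item[0], features))
    return nearest[1]

print(predict((8, 4)))  # unseen message with many links
print(predict((0, 2)))  # unseen message with no links
```

The labels in `training_data` are what makes this supervised: the model never has to guess what "spam" means, it only generalizes from examples that were already labeled.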

In regression, we are given a number of predictor (explanatory) variables and a continuous response variable (outcome), and we try to find a relationship between those variables that allows us to predict an outcome.
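As a rough illustration of regression with a single predictor, the sketch below fits a straight line by ordinary least squares. The `fit_line` helper and the hours/scores data are assumptions made up for this example, not something from the book.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: hours studied (predictor) and exam score (continuous response).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 61, 64, 69]

slope, intercept = fit_line(hours, scores)

# Use the learned relationship to predict the outcome for an unseen input.
predicted = slope * 6 + intercept
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
print(f"predicted score for 6 hours: {predicted:.1f}")
```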

– Reinforcement learning. Develop a system (agent) that improves its performance based on interactions with the environment. The environment gives back a reward signal along with information about its current state. This feedback is not a correct label but a measure of how good the action was, as scored by a reward function. The agent then uses reinforcement learning to learn a series of actions that maximizes this reward via an exploratory trial-and-error approach or deliberative planning. E.g. a chess engine that learns from the outcome of its games.
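The trial-and-error idea can be sketched in a much simpler setting than chess: a single-state problem where the agent repeatedly picks one of a few actions and only sees a reward. The example below uses an epsilon-greedy strategy, which is one common choice and not necessarily the one the book has in mind; `run_bandit`, the reward probabilities and the parameter values are all assumptions for illustration.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-known action, sometimes explore."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    counts = [0] * n_actions    # how often each action was tried
    values = [0.0] * n_actions  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: random action
        else:
            action = max(range(n_actions), key=lambda a: values[a])  # exploit
        # The environment answers with a reward, not with the "correct" action.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental update of the average reward for this action.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

# Hidden from the agent: probability that each action pays out.
values, counts = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=lambda a: values[a])
print("estimated values:", [round(v, 2) for v in values])
print("action the agent prefers:", best_action)
```

The agent is never told which action is best. It only accumulates reward estimates, which is the key difference from the labeled feedback in supervised learning.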

– Unsupervised learning deals with unlabelled data or data of unknown structure. It explores the structure of our data to extract meaningful information without the guidance of a known outcome or reward function.

Clustering is an exploratory technique that allows us to organize a pile of information into meaningful clusters without any prior knowledge of their group memberships.
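A minimal sketch of clustering, assuming k-means as the algorithm (the text above does not name one). The `k_means` helper, the naive initialization and the sample points are illustrative assumptions.

```python
import math

def k_means(points, k, iterations=10):
    """Plain k-means: returns (centroids, cluster index for each point)."""
    centroids = list(points[:k])  # naive initialization: the first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignments

# Made-up, unlabeled 2D points that visibly form two groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.5)]
centroids, assignments = k_means(points, k=2)
print(assignments)
```

No labels are passed in anywhere: the group memberships come entirely from the distances between the points.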

Dimensionality reduction. Each observation often comes with a large number of measurements (features), which can strain storage and the performance of learning algorithms. Unsupervised dimensionality reduction is a commonly used approach in feature preprocessing to remove noise from data and compress the data onto a lower-dimensional subspace while retaining most of the relevant information.
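One way to picture this is principal component analysis (PCA), used here as an example technique that the text above does not name. The sketch compresses 2D points onto the single direction of largest variance. It relies on the closed-form angle of the main axis of a 2x2 covariance matrix, so it only works for two features; `pca_2d_to_1d` and the data are assumptions for illustration.

```python
import math

def pca_2d_to_1d(points):
    """Project 2D points onto their first principal component."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Angle of the direction with the largest variance.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(angle), math.sin(angle))
    # One number per observation instead of two.
    projected = [x * direction[0] + y * direction[1] for x, y in centered]
    retained = (sum(p * p for p in projected) / (n - 1)) / (sxx + syy)
    return direction, projected, retained

# Made-up data in which the two features are strongly correlated.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
direction, projected, retained = pca_2d_to_1d(points)
print("share of variance retained:", round(retained, 3))
```

Because the two features carry almost the same information, a single coordinate keeps nearly all of the variance.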

Building blocks for successfully designing machine learning systems

  1. Preprocessing
    1. Feature extraction and scaling
    2. Feature selection
    3. Dimensionality Reduction
    4. Sampling
  2. Learning
    1. Model selection
    2. Cross validation
    3. Performance metrics. A commonly used metric is classification accuracy, the proportion of correctly classified samples
    4. Hyperparameter Optimization
  3. Evaluation
  4. Prediction
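The four stages above can be sketched end to end on a toy problem. This is a simplified illustration under several assumptions: the data is made up, the model is a nearest-centroid classifier chosen only because it is short, and cross-validation, feature selection and hyperparameter optimization are left out.

```python
import math
import random

# Made-up labeled data: two features per sample, two classes.
samples = [((1.0, 20.0), "a"), ((1.2, 22.0), "a"), ((0.8, 19.0), "a"),
           ((1.1, 21.0), "a"), ((3.0, 40.0), "b"), ((3.2, 43.0), "b"),
           ((2.9, 41.0), "b"), ((3.1, 39.0), "b")]

# 1. Preprocessing: sample a train/test split, then scale the features.
rng = random.Random(0)
rng.shuffle(samples)
split = int(len(samples) * 0.75)
train, test = samples[:split], samples[split:]

def column_stats(rows):
    """Mean and standard deviation of each feature column."""
    stats = []
    for column in zip(*rows):
        mean = sum(column) / len(column)
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
        stats.append((mean, std))
    return stats

# Scaling parameters come from the training data only.
stats = column_stats([features for features, _ in train])

def scale(features):
    return tuple((v - mean) / std for v, (mean, std) in zip(features, stats))

# 2. Learning: store the mean (centroid) of each class.
def fit_centroids(rows):
    grouped = {}
    for features, label in rows:
        grouped.setdefault(label, []).append(scale(features))
    return {label: tuple(sum(col) / len(col) for col in zip(*members))
            for label, members in grouped.items()}

centroids = fit_centroids(train)

def predict(features):
    scaled = scale(features)
    return min(centroids, key=lambda label: math.dist(scaled, centroids[label]))

# 3. Evaluation: classification accuracy on the held-out test set.
correct = sum(predict(features) == label for features, label in test)
accuracy = correct / len(test)

# 4. Prediction: apply the model to a new, unlabeled sample.
new_label = predict((3.0, 42.0))
print("accuracy:", accuracy, "prediction:", new_label)
```

Note that the test set is touched only in step 3. Reusing it to tune the model would make the accuracy estimate overly optimistic.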

Install and set up Python for data analysis and ML

Use Anaconda Python for the easiest setup.


Iris dataset

3 types of flowers: Setosa, Versicolor and Virginica

150 observations

4 features, measured in centimeters: Sepal Length, Sepal Width, Petal Length, Petal Width
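In code, a dataset like this is usually held as a feature matrix with one row per observation and one column per feature, plus a vector of class labels. The sketch below shows that layout with only three rows, taken to be the first sample of each class in the commonly distributed copy of the dataset; treat the exact numbers as illustrative and check them against your own copy. The variable names are assumptions, not part of any library.

```python
FEATURE_NAMES = ["sepal length", "sepal width", "petal length", "petal width"]
CLASS_NAMES = ["Setosa", "Versicolor", "Virginica"]

# Feature matrix X: one row per flower, one column per feature (cm).
# The full dataset has 150 rows; only one per class is shown here.
X = [
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
]

# Class label for each row of X.
labels = ["Setosa", "Versicolor", "Virginica"]

# Most learning algorithms expect integer class labels.
y = [CLASS_NAMES.index(name) for name in labels]

n_samples, n_features = len(X), len(X[0])
print(n_samples, "samples,", n_features, "features, labels:", y)
```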


Github Link




Author: Zac

