General concepts of Machine Learning
3 types of learning and basic terminology
– Supervised learning. Learn a model from labeled training data that allows us to make predictions about unseen or future data. "Supervised" refers to a set of samples where the desired output signals (labels) are already known. E.g., a spam filter.
A supervised task with discrete class labels is called classification. Another subcategory is regression, where the outcome signal is a continuous value.
In regression, we are given a number of predictor (explanatory) variables and a continuous response variable (outcome), and we try to find a relationship between those variables that allows us to predict an outcome.
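To make the regression idea concrete, here is a minimal sketch that fits a straight line to one explanatory variable with ordinary least squares. The data values and the names `x`, `y`, and `predict` are made up for illustration; NumPy is assumed to be installed.

```python
import numpy as np

# Hypothetical toy data: one explanatory variable x and a continuous response y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Fit y ~ slope * x + intercept by ordinary least squares.
slope, intercept = np.polyfit(x, y, deg=1)

def predict(x_new):
    """Predict the continuous outcome for a new value of the explanatory variable."""
    return slope * x_new + intercept

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"prediction for x=6: {predict(6.0):.2f}")
```

The fitted slope and intercept are the "relationship" between the variables; `predict` applies it to data the model has not seen.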
– Reinforcement learning. Develop a system (agent) that improves its performance based on interactions with the environment. The environment gives back a reward signal: information about the current state of the environment typically includes a measure, defined by a reward function, of how well the action performed. The agent then uses reinforcement learning to learn a series of actions that maximizes this reward via an exploratory trial-and-error approach or deliberative planning. E.g., a chess engine.
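The trial-and-error loop can be sketched with an epsilon-greedy agent on a multi-armed bandit. This is a deliberately simplified setting (there is no environment state and no sequence of moves, unlike chess), and the reward probabilities, `step`, and `epsilon` are illustrative assumptions rather than anything from these notes.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical environment: three actions, each paying a reward of 1
# with a probability that is hidden from the agent.
REWARD_PROBS = [0.2, 0.5, 0.8]

def step(action):
    """Environment: return a reward of 1 or 0 for the chosen action."""
    return 1 if random.random() < REWARD_PROBS[action] else 0

n_actions = len(REWARD_PROBS)
value_estimates = [0.0] * n_actions  # agent's running estimate of each action's reward
counts = [0] * n_actions             # how often each action has been tried
epsilon = 0.1                        # probability of exploring a random action

for _ in range(5000):
    if random.random() < epsilon:
        action = random.randrange(n_actions)  # explore
    else:
        action = max(range(n_actions), key=lambda a: value_estimates[a])  # exploit
    reward = step(action)
    counts[action] += 1
    # Incremental update of the mean reward observed for this action.
    value_estimates[action] += (reward - value_estimates[action]) / counts[action]

best_action = max(range(n_actions), key=lambda a: value_estimates[a])
print("estimated values:", [round(v, 2) for v in value_estimates])
print("best action found:", best_action)
```

The agent is never told which action is best; it only sees rewards and gradually shifts toward the action with the highest estimated payoff.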
– Unsupervised learning. Deals with unlabeled data or data of unknown structure. It explores the structure of our data to extract meaningful information without the guidance of a known outcome or reward function.
Clustering is an exploratory technique that allows us to organize a pile of information into meaningful clusters without any prior knowledge of their group memberships.
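As an illustration of clustering, the sketch below runs a bare-bones k-means on two synthetic blobs of points. k-means is just one clustering algorithm chosen here as an example; the data, the `kmeans` helper, and its parameters are hypothetical, and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unlabeled data: two well-separated blobs in 2-D.
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

def kmeans(X, k, n_iter=20, seed=0):
    """Minimal k-means: alternate between assigning points and moving centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_samples, k).
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids

labels, centroids = kmeans(X, k=2)
print("cluster sizes:", np.bincount(labels, minlength=2))
print("centroids:\n", centroids.round(2))
```

No group memberships are supplied; the algorithm recovers the two groups from the geometry of the data alone.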
Dimensionality reduction. High-dimensional data is common: each observation comes with a large number of measurements. Unsupervised dimensionality reduction is a commonly used approach in feature preprocessing to remove noise from data and compress it onto a lower-dimensional subspace while retaining most of the relevant information.
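One common way to do this is principal component analysis (PCA), which these notes do not name explicitly; it is used here as an assumed example. The sketch builds synthetic 3-feature data that mostly varies along one direction and projects it onto a single component using NumPy's SVD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 observations with 3 features that mostly vary
# along a single hidden direction, plus a little noise.
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[2.0, 1.0, 0.5]]) + 0.1 * rng.normal(size=(200, 3))

# PCA via SVD: center the data, then decompose it.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Share of the total variance captured by each principal component.
explained_variance_ratio = S**2 / np.sum(S**2)

# Keep only the first component: 3 features compressed to 1.
n_components = 1
X_reduced = X_centered @ Vt[:n_components].T

print("explained variance ratio:", explained_variance_ratio.round(3))
print("shape before:", X.shape, "after:", X_reduced.shape)
```

The explained variance ratio indicates how much information survives the compression, which is one way to decide how many components to keep.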
Building blocks for successfully designing machine learning systems
- Feature extraction and scaling
- Feature selection
- Dimensionality reduction
- Model selection
- Cross-validation
- Performance metrics. A commonly used metric is classification accuracy, the proportion of correctly classified samples.
- Hyperparameter optimization
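Several of these building blocks can be chained together in scikit-learn. The sketch below is one possible arrangement, assuming scikit-learn is installed: it uses synthetic data, and the choice of PCA, logistic regression, and the parameter grid values are illustrative assumptions. Feature selection is left out to keep it short.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic classification data.
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),                 # feature scaling
    ("reduce", PCA()),                           # dimensionality reduction
    ("clf", LogisticRegression(max_iter=1000)),  # the model itself
])

# Hyperparameter optimization: try every combination in the grid,
# scoring each one with 5-fold cross-validation on the training set.
param_grid = {
    "reduce__n_components": [2, 4, 6],
    "clf__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Performance metric: classification accuracy on held-out test data.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("best parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
print("test accuracy:", round(test_accuracy, 3))
```

Putting the preprocessing steps inside the pipeline means they are refit on each cross-validation fold, so the held-out fold does not influence scaling or reduction.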
Install and set up Python for data analysis and ML
Use Anaconda Python for the easiest setup.
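After installing, a quick sanity check is to try importing the usual data analysis packages and print their versions. The package list below is an assumption about what a typical setup needs, not something these notes specify.

```python
import importlib
import sys

# Assumed set of packages commonly used for data analysis and ML.
packages = ["numpy", "scipy", "pandas", "matplotlib", "sklearn"]

print("Python", sys.version.split()[0])

versions = {}
for name in packages:
    try:
        module = importlib.import_module(name)
        versions[name] = getattr(module, "__version__", "unknown")
    except ImportError:
        versions[name] = None  # package is not installed

for name, version in versions.items():
    print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```

Any package reported as not installed can be added with the environment's package manager before continuing.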
Iris dataset: 3 types of flowers (Setosa, Versicolor, and Virginica)
4 features: sepal length, sepal width, petal length, petal width
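The three flower types and four features listed here match the classic Iris dataset, which scikit-learn ships as `load_iris`. Assuming that is the dataset meant, the sketch below loads it and trains a simple classifier; the k-nearest-neighbors model and the split settings are arbitrary illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

print("classes:", list(iris.target_names))
print("features:", iris.feature_names)
print("data shape:", X.shape)

# Hold out 30% of the flowers to measure accuracy on data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Supervised classification: learn to predict the flower type from the 4 features.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print("test accuracy:", round(accuracy, 3))
```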