A **supervised learning algorithm** takes a known set of input data and known responses to the data (output), and *trains* a model to generate reasonable predictions for the response to new data.

For example, suppose you want to predict whether someone will have a heart attack within a year. You have a set of data on previous patients, including age, weight, height, blood pressure, etc. You know whether the previous patients had heart attacks within a year of their measurements. So, the problem is combining all the existing data into a model that can predict whether a new person will have a heart attack within a year.

You can think of the entire set of input data as a heterogeneous matrix. Rows of the matrix are called *observations*, *examples*, or *instances*, and each contains the set of measurements for a single subject (a patient in the example). Columns of the matrix are called *predictors*, *attributes*, or *features*, and each is a variable representing a measurement taken on every subject (age, weight, height, etc. in the example). You can think of the response data as a column vector where each row contains the output for the corresponding observation in the input data (whether the patient had a heart attack). To *fit* or *train* a supervised learning model, choose an appropriate algorithm, and then pass the input and response data to it.
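As a concrete illustration of this layout, here is a minimal sketch in Python with NumPy (a stand-in for the toolbox's matrices; the patient values are invented):

```python
import numpy as np

# Input matrix X: each row is one observation (a patient), each column
# one predictor (age, weight in kg, height in cm, systolic blood pressure).
X = np.array([
    [63, 81.0, 175, 145],
    [45, 70.5, 168, 120],
    [58, 95.2, 180, 160],
])

# Response vector y: one entry per row of X
# (True if that patient had a heart attack within a year).
y = np.array([True, False, True])

print(X.shape)  # (3, 4): 3 observations, 4 predictors
print(y.shape)  # (3,): one response per observation
```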

Supervised learning splits into two broad categories: **classification and regression**.

- In *classification*, the goal is to assign a class (or *label*) from a finite set of classes to an observation. That is, responses are categorical variables. Applications include spam filters, advertisement recommendation systems, and image and speech recognition. Predicting whether a patient will have a heart attack within a year is a classification problem, and the possible classes are `true` and `false`. Classification algorithms usually apply to nominal response values. However, some algorithms can accommodate ordinal classes.
- In *regression*, the goal is to predict a continuous measurement for an observation. That is, the response variables are real numbers. Applications include forecasting stock prices, energy consumption, or disease incidence.
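The two categories differ only in the type of the response. The sketch below uses scikit-learn in Python as a stand-in (the toolbox's own functions differ) to show a classifier fit to categorical responses and a regressor fit to numeric ones, on made-up data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])

# Classification: responses are labels from a finite set.
y_class = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y_class)

# Regression: responses are continuous real numbers.
y_reg = np.array([1.1, 1.9, 3.2, 3.9])
reg = LinearRegression().fit(X, y_reg)

print(clf.predict([[3.5]]))  # a class label from {0, 1}
print(reg.predict([[3.5]]))  # a real number
```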

**The steps for supervised learning are:**

- Prepare Data
- Choose an Algorithm
- Fit a Model
- Choose a Validation Method
- Examine Fit and Update Until Satisfied
- Use Fitted Model for Predictions
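The steps above can be sketched end to end. The following minimal illustration uses Python with scikit-learn and synthetic data as a stand-in for the toolbox workflow (the function names are scikit-learn's, not the toolbox's):

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# 1. Prepare data: observations in rows, predictors in columns.
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# 2. Choose an algorithm (a decision tree here).
model = DecisionTreeClassifier(max_depth=3, random_state=0)

# 3-4. Fit the model, holding out data for validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model.fit(X_train, y_train)

# 5. Examine the fit on held-out data; if it is poor, revisit steps 1-3.
acc = accuracy_score(y_test, model.predict(X_test))

# 6. Use the fitted model for predictions on new data.
new_observation = rng.normal(size=(1, 4))
prediction = model.predict(new_observation)
print(acc, prediction)
```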

#### Prepare Data

All supervised learning methods start with an input data matrix, usually called `X` here. Each row of `X` represents one observation. Each column of `X` represents one variable, or predictor. Represent missing entries with `NaN` values in `X`. Statistics and Machine Learning Toolbox supervised learning algorithms can handle `NaN` values, either by ignoring them or by ignoring any row with a `NaN` value.

You can use various data types for response data `Y`. Each element in `Y` represents the response to the corresponding row of `X`. Observations with missing `Y` data are ignored.

- For regression, `Y` must be a numeric vector with the same number of elements as the number of rows of `X`.
- For classification, `Y` can be any of several data types, each with its own convention for representing missing entries.
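A sketch of this preparation step, using Python/NumPy as a stand-in (the toolbox handles `NaN` values internally, but dropping any row with a missing value is one of the behaviors described above):

```python
import numpy as np

# X with a missing predictor marked NaN; y with one missing response.
X = np.array([
    [63.0, 145.0],
    [45.0, np.nan],   # missing blood pressure measurement
    [58.0, 160.0],
])
y = np.array([1.0, 0.0, np.nan])  # missing response for the last patient

# Keep only observations with no missing predictor and no missing
# response, mirroring the "ignore any row with a NaN value" behavior.
keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
X_clean, y_clean = X[keep], y[keep]

print(X_clean.shape)  # (1, 2): only the first patient survives
```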

#### Choose an Algorithm

There are tradeoffs between several characteristics of algorithms, such as:

- Speed of training
- Memory usage
- Predictive accuracy on new data
- Transparency or interpretability, meaning how easily you can understand the reasons an algorithm makes its predictions

Details of the algorithms appear in Characteristics of Classification Algorithms. More detail about ensemble algorithms is in Choose an Applicable Ensemble Method.
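These tradeoffs can be measured directly. The sketch below (again a scikit-learn stand-in with synthetic data, not the toolbox API) compares training time and held-out accuracy for two algorithms, which is one simple way to weigh speed against predictive accuracy:

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

results = {}
for name, model in [
    ("kNN", KNeighborsClassifier()),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=1)),
]:
    start = time.perf_counter()
    model.fit(X_tr, y_tr)          # measure speed of training
    elapsed = time.perf_counter() - start
    acc = accuracy_score(y_te, model.predict(X_te))  # accuracy on new data
    results[name] = (elapsed, acc)
    print(f"{name}: {elapsed:.4f} s, accuracy {acc:.2f}")
```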