Bias-variance trade-off

When we talk about the bias-variance trade-off, we are talking about supervised machine learning, where the algorithm learns a model from a labeled training data set. The goal is to estimate the mapping function (f) from the input variable (X) to the output variable (Y). The prediction error can be broken into three parts: bias error, variance error, and irreducible error. We can do almost nothing about the irreducible error, because it is introduced by the way we frame the problem and is caused by factors such as unknown variables. What we can focus on, then, are the bias error and the variance error.
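
For squared-error loss, this decomposition can be written out explicitly. A standard form (a sketch assuming the usual setup Y = f(X) + ε, where ε is zero-mean noise with variance σ² and f̂ is the learned model) is:

$$
\mathbb{E}\big[(Y - \hat{f}(X))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(X)] - f(X)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(X) - \mathbb{E}[\hat{f}(X)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible error}}
$$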

Bias error is caused by over-simplifying the machine learning algorithm, leading to under-fitting. When you train a model, you normally make simplifying assumptions so that the target function is easier to learn. Low bias means the model makes fewer assumptions about the form of the target function; high bias means it makes more.

Bias error is also known as “approximation error”. Approximation error measures the quality of the model family: a simple way to think about it is to ask, if you had an infinite amount of data to train your model, how well could that representation do?

Variance error is caused by overly complex machine learning algorithms, leading to over-fitting and high sensitivity to the training data. In this case, your model learns too much noise from the training data set and therefore performs poorly on the test data set. Low variance means changes to the training data set cause small changes to the estimate of the target function; high variance means changes to the training data set cause large changes to the estimate.
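
To make that last definition concrete, here is a minimal sketch (assuming scikit-learn; the data-generating function and constants are my own illustration, not from this article) that trains the same flexible model on several independently drawn training sets and watches how much its prediction at one fixed point swings around:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def sample_training_set(n=50):
    # Noisy sine curve: y = sin(x) + Gaussian noise.
    x = rng.uniform(0, 6, size=(n, 1))
    y = np.sin(x).ravel() + rng.normal(scale=0.3, size=n)
    return x, y

x_query = np.array([[3.0]])  # fixed input where we inspect the prediction

predictions = []
for _ in range(10):
    x_train, y_train = sample_training_set()
    tree = DecisionTreeRegressor()  # fully grown tree: low bias, high variance
    tree.fit(x_train, y_train)
    predictions.append(tree.predict(x_query)[0])

print("true value:", np.sin(3.0))
print("predictions across training sets:", np.round(predictions, 2))
# The spread of these predictions is the variance error in action.
```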

Variance error is also known as “estimation error”, which measures how far the actual learned classifier f is from the optimal classifier f*. Returning to the thought experiment above, this is the price you pay for not having an infinite amount of training data.

So, where does the trade-off come from? When you increase the complexity of the machine learning algorithm, bias error decreases because you are making fewer assumptions about the form of the target function. However, this only helps up to a certain point: as you keep increasing the complexity of the model, it starts to suffer from high variance, learning too much noise and failing to generalize. The bias-variance trade-off can therefore be summarized as follows: actions that decrease bias tend to increase variance, and actions that decrease variance tend to increase bias.
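
You can see this trade-off numerically in a few lines. Below is a sketch (again assuming scikit-learn, using polynomial degree as a stand-in for model complexity on an illustrative synthetic data set) that sweeps the degree and prints training and test error:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)

# Small noisy training set and a larger test set from the same sine curve.
x = rng.uniform(0, 1, size=(230, 1))
y = np.sin(2 * np.pi * x).ravel() + rng.normal(scale=0.3, size=230)
x_train, y_train, x_test, y_test = x[:30], y[:30], x[30:], y[30:]

for degree in (1, 3, 9, 15):
    # Higher degree = more complex model = lower bias, higher variance.
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(x_train))
    test_mse = mean_squared_error(y_test, model.predict(x_test))
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
# Training error keeps falling with degree, while test error typically rises
# again once the model starts fitting noise.
```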

Putting this into practice: the k-nearest neighbors algorithm with k = 1 has low bias and high variance, because each new point is predicted by its single nearest neighbor in the training data set. You can shift the trade-off by increasing the value of k (from 1 up to the number of training data points). Doing so increases the number of neighbors contributing to each prediction, which increases bias and decreases variance.
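
Here is the same experiment for kNN, sketched with scikit-learn's KNeighborsRegressor on illustrative synthetic data: sweeping k from 1 toward the training-set size walks the model from the high-variance end to the high-bias end.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(7)
x = rng.uniform(0, 1, size=(300, 1))
y = np.sin(2 * np.pi * x).ravel() + rng.normal(scale=0.3, size=300)
x_train, y_train, x_test, y_test = x[:100], y[:100], x[100:], y[100:]

for k in (1, 5, 25, 100):  # k = 100 averages over the entire training set
    knn = KNeighborsRegressor(n_neighbors=k).fit(x_train, y_train)
    train_mse = mean_squared_error(y_train, knn.predict(x_train))
    test_mse = mean_squared_error(y_test, knn.predict(x_test))
    print(f"k = {k:3d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
# k = 1 memorizes the training set (train MSE near 0: high variance), while a
# very large k over-smooths (high bias); the best test error lies in between.
```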

Although in reality we cannot calculate the real bias and variance error terms precisely, since we are never 100% sure about the form of the target function, it is important to be aware of the bias-variance trade-off. In the following article, I’ll talk about different ways to manage bias and variance.
