In this post we formally describe the problem of linear regression, or the fitting of a representative line (or hyperplane in higher dimensions) to a set of input/output data points. Regression in general may be performed for a variety of reasons: to produce a so-called trend line (or, more generally, a curve) that helps visually summarize or drive home a particular point about the data under study, or to learn a model from which precise predictions can be made about future output values.
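As a rough sketch of what "fitting a representative line" means in practice, the snippet below fits a slope and intercept to a small made-up set of input/output pairs using NumPy's least-squares solver; the data and variable names here are purely illustrative and not taken from the post itself.

```python
import numpy as np

# Hypothetical 1-D dataset: inputs x and outputs y (illustrative values).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Fit a line y = w0 + w1*x by least squares: stack a column of ones
# onto x so the first weight plays the role of the intercept.
X = np.column_stack([np.ones_like(x), x])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

print("intercept w0 = %.3f, slope w1 = %.3f" % (w[0], w[1]))
```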
In this post we describe a particular form of nonlinear regression called logistic regression, designed for a kind of dataset that arises constantly in machine learning/deep learning: two-class classification data.
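For a sense of the basic setup, here is a minimal sketch of one common formulation of logistic regression: a linear model passed through a sigmoid, scored by a cross-entropy cost. The label convention (0/1 here) and exact form of the cost are assumptions for illustration and may differ from the post's own derivation.

```python
import numpy as np

def sigmoid(t):
    # Squashes any real number into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-t))

def logistic_cost(w, X, y):
    # Cross-entropy cost for two-class labels y in {0, 1};
    # X carries a leading column of ones so w[0] acts as the bias.
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```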
In this post we discuss the perceptron - a historically significant and useful way of thinking about linear classification. The derivation of the perceptron addresses the task of classification more directly than that of logistic regression, both motivating the use of the rectified linear unit function, which plays such an important role in neural networks, and shedding light on the origin of the softmax cost function.
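To make the connection concrete, here is a minimal sketch of the two costs mentioned above, assuming labels in {-1, +1} and a bias folded into the weights via a column of ones in X; these conventions are assumptions for illustration.

```python
import numpy as np

def perceptron_cost(w, X, y):
    # Perceptron cost: ReLU(-y_p * x_p^T w) summed over the data.
    # A point on the correct side of the separating hyperplane
    # contributes zero to the cost.
    return np.sum(np.maximum(0.0, -y * (X @ w)))

def softmax_cost(w, X, y):
    # Replacing max(0, s) with the smooth surrogate log(1 + e^s)
    # gives the softmax cost, a differentiable relative of the above.
    return np.sum(np.log(1.0 + np.exp(-y * (X @ w))))
```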
Here we discuss an often-used variation of the original perceptron, called the margin perceptron, that is once again based on analyzing the geometry of the classification problem in which a line (or hyperplane in higher dimensions) separates two classes of data. We then build on this fundamental concept to derive support vector machines, a popular method for linear classification.
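As a rough illustration of how the margin idea changes the cost, the sketch below assumes labels in {-1, +1}, a bias carried by a column of ones in X, and a simple quadratic regularizer for the soft-margin SVM; all of these choices are assumptions made for the example, not the post's exact formulation.

```python
import numpy as np

def margin_perceptron_cost(w, X, y):
    # Margin perceptron cost: each point must clear the separating
    # hyperplane by a margin of 1, so violations are penalized by
    # max(0, 1 - y * x^T w).
    return np.sum(np.maximum(0.0, 1.0 - y * (X @ w)))

def svm_cost(w, X, y, lam=1e-2):
    # A soft-margin SVM adds a regularizer on the non-bias weights
    # to encourage a large margin between the two classes.
    return margin_perceptron_cost(w, X, y) + lam * np.sum(w[1:] ** 2)
```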
In the next two posts we discuss two popular generalizations of the two-class framework discussed previously for multi-class classification - namely, One-versus-All and multi-class logistic regression. For a dataset consisting of C classes both schemes learn C two-class linear classifiers - each of which distinguishes a single class from all others - fusing them together to create a multi-class classifier. These two popular approaches fuse the C linear classifiers together in essentially the same manner, differing only in how the individual classifiers are trained: in the One-versus-All scheme each classifier is trained independently of the others, while with multi-class logistic regression all are tuned simultaneously.
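The shared fusion rule can be sketched in a few lines: with one weight column per class, a point is assigned to whichever class scores it highest. The array shapes and the use of a bias column of ones in X are assumptions for illustration.

```python
import numpy as np

def fuse_predict(W, X):
    # Fusion rule: W has one column of weights per class (bias included
    # via a leading column of ones in X). Each column scores "this class
    # versus all others"; the predicted label is the highest-scoring class.
    return np.argmax(X @ W, axis=1)
```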
In this post we discuss a popular alternative to OvA multi-class classification - detailed in the previous post - in which we also learn C two-class classifiers (and likewise employ the fusion rule) but train them simultaneously rather than independently as with OvA.
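For a sense of what "training simultaneously" can look like, here is a minimal sketch of one standard multi-class logistic regression (softmax) cost, in which all C weight columns appear in a single objective; integer labels y in {0, ..., C-1} and a bias column of ones in X are assumed for the example.

```python
import numpy as np

def multiclass_softmax_cost(W, X, y):
    # All C weight columns of W are tuned together: for each point the
    # cost compares its true-class score against the log-sum-exp of
    # the scores assigned by every class.
    scores = X @ W                                    # shape (P, C)
    logsumexp = np.log(np.sum(np.exp(scores), axis=1))
    own = scores[np.arange(len(y)), y]                # score of true class
    return np.sum(logsumexp - own)
```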