Introduction
In a previous blog, I implemented the perceptron learning algorithm for the binary classification problem and gave a detailed example. In this blog, I will consider a 3-class classification example in detail. This is only slightly more complicated than the binary classification problem.
A Quick Recap of the Perceptron Learning Algorithm
The perceptron is a linear classifier. The way you learn the weights of the perceptron is by iterating through the training data. You look at one training item at a time and perform an update, if necessary. You then move on to the next item, again updating if necessary. You keep going until you've cycled through the entire data set once without making any updates.
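To make that loop concrete, here is a minimal sketch of the multiclass version in Python/NumPy. The function and variable names are my own, the update inside the loop is the one worked through later in this post, and labels are assumed to be 0-indexed:

```python
import numpy as np

def train_multiclass_perceptron(data, num_classes, num_features, max_epochs=100):
    """Sketch: one weight vector per class; loop until a full pass makes no update."""
    weights = np.zeros((num_classes, num_features))
    for _ in range(max_epochs):                 # cap the passes in case the data aren't separable
        made_update = False
        for f_x, y_true in data:                # one training item at a time
            scores = weights @ f_x              # score under each class's weight vector
            y_pred = int(np.argmax(scores))     # predict the highest-scoring class
            if y_pred != y_true:                # misclassified: nudge the weights
                weights[y_true] += f_x          # pull the true class's weights toward f(x)
                weights[y_pred] -= f_x          # push the predicted class's weights away
                made_update = True
        if not made_update:                     # a clean pass through the data: done
            break
    return weights
```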
A Numeric Example
As long as the data are linearly separable, the perceptron learning rule will find the separating boundary. The proof of the algorithm's convergence (along with a nice two-dimensional illustrative example of the algorithm in action) can be found in Chapter 4 of the Neural Network Design book.
Let's look at a concrete 3-class classification problem. Each of the three classes has its own weight vector. (Recall that for the binary classification problem we only had one weight vector.) Let's say that at some stage during the run of the algorithm, the weights are as follows: $$w_1 = [1, 2, -2]^T, w_2 = [3, -2, -1]^T, w_3= [-1, 2, 4]^T $$
The next training data point to consider is the following feature vector with label 2: $$f(x) = [1, -0.5, 3]^T, y^* = 2$$
To do the perceptron update, we first classify the data point based on the weights that we currently have. We compute the three inner products: $$ w_{1}^T \cdot f(x) = 1 \cdot 1 + 2 \cdot (-0.5) + (-2) \cdot 3 = -6 $$
Similarly, $$ w_2^T \cdot f(x) = 1 $$ $$ w_3^T \cdot f(x) = 10 $$
Of these three numbers, 10 is the largest. This means that, based on the current weight vectors, the prediction of our perceptron classifier for this particular feature vector $f(x)$ is $y = 3$.
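As a quick sanity check, the three scores and the resulting prediction can be reproduced with a few lines of NumPy (the array names here are just for illustration):

```python
import numpy as np

w = np.array([[ 1,  2, -2],     # w_1
              [ 3, -2, -1],     # w_2
              [-1,  2,  4]],    # w_3
             dtype=float)
f_x = np.array([1, -0.5, 3])

scores = w @ f_x                # inner product of each weight vector with f(x)
print(scores)                   # [-6.  1. 10.]
print(np.argmax(scores) + 1)    # 3   (classes are numbered 1..3 in this example)
```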
So the true label is $y^* = 2$ and our current weights predict the label $y = 3$.
Therefore, we need to do an update. We need to change the weight vectors $w_2$ and $w_3$ and nudge them in the correct direction, so to speak. As things stand at the moment, $w_3$ "favors" $f(x)$ more than $w_2$ does. We want the opposite: we want the inner product $w_2^T \cdot f(x)$ to be higher than $w_3^T \cdot f(x)$.
To achieve this, we update $w_2$ as follows: $$ w_2 \Leftarrow w_2 + f(x) $$
This gives us the new $w_2$: $$[3, -2, -1]^T + [1, -0.5, 3]^T = [4, -2.5, 2]^T $$
The update for $w_3$ goes in the opposite direction. $w_3$ is the weight vector winning the current prediction, and we don't want it to have such a high inner product with $f(x)$. So we update it as follows: $$ w_3 \Leftarrow w_3 - f(x) $$
The new $w_3$ is now $$[-1, 2, 4]^T - [1, -0.5, 3]^T = [-2, 2.5, 1]^T $$
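Continuing the little NumPy sketch from above (rows 1 and 2 of the hypothetical array `w` hold $w_2$ and $w_3$), the two updates are a single vector addition and subtraction:

```python
w[1] += f_x     # w_2 <- w_2 + f(x)  ->  [ 4.  -2.5  2. ]
w[2] -= f_x     # w_3 <- w_3 - f(x)  ->  [-2.   2.5  1. ]
```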
This completes the update. If you recompute the inner products with the new weight vectors, you get: $$ w_2^T \cdot f(x) = 11.25 $$ $$ w_3^T \cdot f(x) = -0.25 $$
Since $w_2^T \cdot f(x)$ is now the largest of the three scores ($w_1^T \cdot f(x)$ is unchanged at $-6$), the perceptron now predicts $y = 2$ for the feature vector $f(x)$, which matches the true label.
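Recomputing the scores with the updated weights (still using the sketch above) confirms that class 2 now wins:

```python
scores = w @ f_x
print(scores)                   # [-6.    11.25  -0.25]
print(np.argmax(scores) + 1)    # 2
```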