In this post we discuss the perceptron - a historically significant and useful way of thinking about linear classification.
Press the 'Toggle code' button below to toggle code on and off for this entire presentation.
from IPython.display import display
from IPython.display import HTML
import IPython.core.display as di # Example: di.display_html('<h3>%s:</h3>' % str, raw=True)
# This line will hide code by default when the notebook is exported as HTML
di.display_html('<script>jQuery(function() {if (jQuery("body.notebook_app").length == 0) { jQuery(".input_area").toggle(); jQuery(".prompt").toggle();}});</script>', raw=True)
# This line will add a button to toggle visibility of code blocks, for use with the HTML export version
di.display_html('''<button onclick="jQuery('.input_area').toggle(); jQuery('.prompt').toggle();">Toggle code</button>''', raw=True)
In this Section we derive the perceptron, which provides a foundational perspective on two-class classification. We will see - among other things - our first use of the rectified linear unit (ReLU), as well as the origin of the phrase 'softmax' in the softmax cost function.
With two-class classification we have a training set of $P$ points $\left\{\left(\mathbf{x}_p,\,y_p\right)\right\}_{p=1}^{P}$, where the labels $y_p$ take on just two values, $y_p \in \{-1,+1\}$.
As we saw with logistic regression, the decision boundary is formally given as a hyperplane

$$w_0 + \mathbf{x}^T\mathbf{w} = 0$$

When the hyperplane parameters $w_0$ and $\mathbf{w}$ are chosen well we have (for most points)

$$\begin{array}{cc}
w_0 + \mathbf{x}_p^T\mathbf{w} > 0 & \,\,\,\,\text{if} \,\, y_p = +1 \\
w_0 + \mathbf{x}_p^T\mathbf{w} < 0 & \,\,\,\,\text{if} \,\, y_p = -1
\end{array}$$
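In code, the resulting prediction rule is just the sign of the linear model evaluated at a point. A minimal sketch, with made-up parameters and a made-up input point:

import numpy as np

# hypothetical trained parameters and a new input point (made-up values)
w0 = -0.3
w = np.array([1.2, -0.7])
x_new = np.array([0.5, 0.1])

# predicted label: the sign of w0 + x_new^T w
y_pred = np.sign(w0 + np.dot(x_new, w))
print(y_pred)    # prints 1.0, since this point lies on the positive side of the hyperplane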
The two (ideal) conditions again:

$$\begin{array}{cc}
w_0 + \mathbf{x}_p^T\mathbf{w} > 0 & \,\,\,\,\text{if} \,\, y_p = +1 \\
w_0 + \mathbf{x}_p^T\mathbf{w} < 0 & \,\,\,\,\text{if} \,\, y_p = -1
\end{array}$$

We can combine them into a single inequality (because $y_p \in \{-1,+1\}$)

$$-y_p\left(w_0 + \mathbf{x}_p^T\mathbf{w}\right) < 0$$

written equivalently - using the ReLU function - as

$$\text{max}\left(0,\,-y_p\left(w_0 + \mathbf{x}_p^T\mathbf{w}\right)\right) = 0$$

Let's look at the expression $\text{max}\left(0,\,-y_p\left(w_0 + \mathbf{x}_p^T\mathbf{w}\right)\right)$ more closely: it is exactly zero when the $p^{\textrm{th}}$ point satisfies its condition above, and positive whenever the point is misclassified.
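A quick check with the same made-up numbers as above: the penalty is zero when the point is classified correctly, and positive when it is not.

import numpy as np

def point_penalty(w0, w, x_p, y_p):
    # the pointwise ReLU penalty max(0, -y_p (w0 + x_p^T w))
    return np.maximum(0, -y_p * (w0 + np.dot(x_p, w)))

w0, w = -0.3, np.array([1.2, -0.7])
x_p = np.array([0.5, 0.1])            # here w0 + x_p^T w is approx. 0.23 > 0

print(point_penalty(w0, w, x_p, +1))  # 0.0  - correctly classified, no penalty
print(point_penalty(w0, w, x_p, -1))  # approx. 0.23 - misclassified, penalty grows with the violation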
We sum this expression over the entire dataset, giving the so-called perceptron or ReLU cost

$$g\left(w_0,\mathbf{w}\right) = \sum_{p=1}^{P}\text{max}\left(0,\,-y_p\left(w_0 + \mathbf{x}_p^T\mathbf{w}\right)\right)$$

We test out the perceptron on the 3-d dataset used in the previous (logistic regression) example.
The perceptron / ReLU cost function - a Python implementation
# the ReLU (perceptron) cost function
# note: relies on the data matrix x and label vector y already loaded in the workspace
import numpy as np

def relu(w):
    cost = 0
    for p in range(len(y)):
        x_p = x[p]
        y_p = y[p]
        # linear model evaluated at the pth point: w_0 + x_p^T w
        a_p = w[0] + sum([a*b for a,b in zip(w[1:],x_p)])
        # pointwise ReLU penalty max(0, -y_p * a_p)
        cost += np.maximum(0,-y_p*a_p)
    return cost
# declare an instance of our optimizers
opt = superlearn.optimimzers.MyOptimizers()

# run desired algo with initial point, max number of iterations, etc.
w_hist = opt.gradient_descent(g = relu,w = np.random.randn(np.shape(x)[1]+1,1),max_its = 50,alpha = 10**-2,steplength_rule = 'diminishing')
# create instance of 3d demos
demo5 = superlearn.classification_3d_demos.Visualizer(data)
# draw the final results
demo5.static_fig(w_hist,view = [15,-140])
The softmax function, defined as

$$\text{soft}\left(s_1,s_2,\ldots,s_N\right) = \text{log}\left(e^{s_1} + e^{s_2} + \cdots + e^{s_N}\right)$$

is a generic smooth approximation to the max function

$$\text{soft}\left(s_1,s_2,\ldots,s_N\right) \approx \text{max}\left(s_1,s_2,\ldots,s_N\right)$$

Why? Say $s_j$ is the maximum and (much) larger than the rest. Then $e^{s_j}$ dominates the sum and we have

$$\text{log}\left(e^{s_1} + e^{s_2} + \cdots + e^{s_N}\right) \approx \text{log}\left(e^{s_j}\right) = s_j$$
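A quick numerical sanity check of this approximation (the input values below are purely illustrative):

import numpy as np

for s in [np.array([1.0, 2.0, 3.0]), np.array([10.0, 2.0, -3.0])]:
    soft = np.log(np.sum(np.exp(s)))     # soft(s_1,...,s_N) = log(e^{s_1} + ... + e^{s_N})
    print('soft = %.4f, max = %.4f' % (soft, np.max(s)))

The agreement tightens as one entry dominates the rest, exactly as the argument above suggests.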
The ReLU function

$$g(s) = \text{max}\left(0,\,s\right)$$

and its smooth softmax approximation

$$g(s) = \text{soft}\left(0,\,s\right) = \text{log}\left(1 + e^{s}\right)$$
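The same kind of check for the ReLU and its softmax smoothing, again with illustrative values:

import numpy as np

s = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(np.maximum(0, s))        # g(s) = max(0, s)
print(np.log(1 + np.exp(s)))   # g(s) = soft(0, s) = log(1 + e^s)

The two agree away from $s=0$, where the softmax version smooths out the kink.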
Recall the ReLU perceptron cost function

$$g\left(w_0,\mathbf{w}\right) = \sum_{p=1}^{P}\text{max}\left(0,\,-y_p\left(w_0 + \mathbf{x}_p^T\mathbf{w}\right)\right)$$

We replace the $p^{\textrm{th}}$ summand with its softmax approximation

$$\text{soft}\left(0,\,-y_p\left(w_0 + \mathbf{x}_p^T\mathbf{w}\right)\right) = \text{log}\left(e^{0} + e^{-y_p\left(w_0 + \mathbf{x}_p^T\mathbf{w}\right)}\right) = \text{log}\left(1 + e^{-y_p\left(w_0 + \mathbf{x}_p^T\mathbf{w}\right)}\right)$$

giving the smooth softmax cost function

$$g\left(w_0,\mathbf{w}\right) = \sum_{p=1}^{P}\text{log}\left(1 + e^{-y_p\left(w_0 + \mathbf{x}_p^T\mathbf{w}\right)}\right)$$

Where have you seen this cost before? It is precisely the cost we minimized for logistic regression, so predictions, misclassifications, accuracy, etc. are all found in the same manner described there.
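For completeness, a sketch of this smooth cost written in the same style as the relu implementation above (the name softmax_cost is mine; as before it assumes the data x and labels y are already loaded in the workspace):

# the softmax (smoothed ReLU) cost function
import numpy as np

def softmax_cost(w):
    cost = 0
    for p in range(len(y)):
        x_p = x[p]
        y_p = y[p]
        # linear model evaluated at the pth point: w_0 + x_p^T w
        a_p = w[0] + sum([a*b for a,b in zip(w[1:],x_p)])
        # pointwise softmax penalty log(1 + e^{-y_p * a_p})
        cost += np.log(1 + np.exp(-y_p*a_p))
    return cost

In principle this can be handed to the same gradient descent call used earlier, swapping g = relu for g = softmax_cost.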