Computational Calculus Series

Part 8: Derivatives of multi-input functions

In this post we describe how derivatives are defined in higher dimensions, when dealing with multi-input functions. We explore these ideas first with $N=2$ inputs for visualization purposes, generalizing afterwards.

Press the 'Toggle code' button below to toggle code on and off for this entire presentation.

In [1]:
from IPython.display import display
from IPython.display import HTML
import IPython.core.display as di # Example: di.display_html('<h3>%s:</h3>' % str, raw=True)

# This line will hide code by default when the notebook is exported as HTML
di.display_html('<script>jQuery(function() {if (jQuery("body.notebook_app").length == 0) { jQuery(".input_area").toggle(); jQuery(".prompt").toggle();}});</script>', raw=True)

# This line will add a button to toggle visibility of code blocks, for use with the HTML export version
di.display_html('''<button onclick="jQuery('.input_area').toggle(); jQuery('.prompt').toggle();">Toggle code</button>''', raw=True)

1. Generalizing the derivative for multi-input functions

In this section we describe how the notion of the derivative for single-input functions naturally generalizes to multi-input functions.

1.1 From tangent line to tangent hyperplane

  • the derivative of a multi-input function represents the set of slopes that define a tangent hyperplane

Example 1. Tangent hyperplane

  • below we plot the two closely related functions
\begin{array}{c} g(w) = 2 + \text{sin}(w)\\ g(w_1,w_2) = 2 + \text{sin}(w_1 + w_2) \end{array}

along with their tangent line / hyperplane at the origin

In [2]:
# import numpy and the custom plotting library used throughout this series
import numpy as np
import callib

# plot a single-input sinusoid and its two-input analog
func1 = lambda w: 2 + np.sin(w) 
func2 = lambda w: 2 + np.sin(w[0] + w[1]) 

# use custom plotter to show both functions
callib.derivative_ascent_visualizer.compare_2d3d(func1 = func1,func2 = func2)

1.2 Derivatives: from secants to tangents

  • we saw that the derivative of a single-input function $g(w)$ at a point $w^0$ is approximated by the value
\begin{equation} \frac{\mathrm{d}}{\mathrm{d}w}g(w^0) \approx \frac{g(w^0 + \epsilon) - g(w^0)}{\epsilon} \end{equation}
  • remember, this is the slope of the secant line passing through $(w^0,\,\,g(w^0))$ and $(w^0 + \epsilon, \,\, g(w^0 + \epsilon))$
  • letting $|\epsilon|$ shrink to zero, the approximation becomes an equality and the derivative equals the slope of the tangent line at $w^0$ (checked numerically in the sketch below)
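
This approximation is easy to check numerically. Below is a minimal sketch in plain NumPy (independent of the callib visualizers used elsewhere in this post) that evaluates the secant slope of $g(w) = \text{sin}(w)$ at $w^0 = 0$ for a shrinking sequence of $\epsilon$ values; the slopes approach the true derivative $\text{cos}(0) = 1$.

In [ ]:
import numpy as np

g = lambda w: np.sin(w)
w0 = 0

# secant slopes for a shrinking sequence of epsilon values
for eps in [1, 0.1, 0.01, 0.001]:
    slope = (g(w0 + eps) - g(w0)) / eps
    print(eps, slope)     # slopes approach cos(0) = 1, the true derivative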

Example 2. Single input secant experiment

  • below we plot $g(w) = \text{sin}(w)$ over a short window of its input around the point $w^0 = 0$
  • we take a nearby point, controlled via the slider mechanism, and connect the two points with a secant line
  • when the nearby point is close enough to $0$ the secant line becomes tangent, and turns from red to green
In [3]:
# what function should we play with?  Defined in the next line, along with our fixed point where we show tangency.
g = lambda w: np.sin(w)

# create an instance of the visualizer with this function
st = callib.secant_to_tangent.visualizer(g = g)

# run the visualizer for our chosen input function and initial point
st.draw_it(w_init = 0, num_frames = 200)

  • with $N$ inputs we have the same situation, except now we compute a derivative along each input axis
  • taking $N = 2$, we fix a point $(w_1,w_2) = (w^0_1,w^0_2)$ and can then form a secant hyperplane along either the $w_1$ or $w_2$ input axis
  • the slope of each such simple secant hyperplane approximates the derivative along that axis
  • for example, along the first axis
\begin{equation} \frac{\mathrm{d}}{\mathrm{d}w_1}g(w^0_1,w^0_2) \approx \frac{g(w^0_1 + \epsilon,w^0_2) - g(w^0_1,w^0_2)}{\epsilon} \end{equation}
  • note that the second point differs only in its $w_1$ value
  • likewise, to compute the derivative along the second input axis $w_2$ we compute the slope value
\begin{equation} \frac{\mathrm{d}}{\mathrm{d}w_2}g(w^0_1,w^0_2) \approx \frac{g(w^0_1 ,w^0_2 + \epsilon) - g(w^0_1,w^0_2)}{\epsilon} \end{equation}
  • each derivative $\frac{\mathrm{d}}{\mathrm{d}w_1}g(w^0_1,w^0_2)$ and $\frac{\mathrm{d}}{\mathrm{d}w_2}g(w^0_1,w^0_2)$ is called a partial derivative of the function $g(w_1,w_2)$
  • we typically use the $\partial$ symbol in place of $\mathrm{d}$ for partial derivatives: $\frac{\partial}{\partial w_1}g(w^0_1,w^0_2)$ and $\frac{\partial}{\partial w_2}g(w^0_1,w^0_2)$; the sketch below computes both numerically
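
These secant slopes are just as easy to compute numerically: perturb one input at a time and difference. Below is a minimal sketch in plain NumPy; the helper name partial_approx is our own, not part of callib.

In [ ]:
import numpy as np

# approximate the partial derivative of g along the i-th input axis at point w0
def partial_approx(g, w0, i, eps=1e-6):
    w_shift = np.array(w0, dtype=float)
    w_shift[i] += eps                     # perturb only the i-th input
    return (g(w_shift) - g(np.array(w0, dtype=float))) / eps

# the function from Example 3 below, evaluated at the origin
g = lambda w: 5 + (w[0] + 0.5)**2 + (w[1] + 0.5)**2
print(partial_approx(g, [0, 0], 0))   # ≈ 1, since the true partial is 2(w1 + 0.5)
print(partial_approx(g, [0, 0], 1))   # ≈ 1, since the true partial is 2(w2 + 0.5)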

Example 3. Multi-input secant experiment

  • below we repeat the secant experiment for the multi-input function
\begin{equation} g(w_1,w_2) = 5 + (w_1 + 0.5)^2 + (w_2 + 0.5)^2 \end{equation}
  • the secant hyperplanes whose slopes are given by the individual partial derivatives are shown in the left and right panels
  • when the secant points are close enough to the origin the secants become tangent along each input dimension, and each hyperplane changes color from red to green
In [11]:
# what function should we play with?  Defined in the next line, along with our fixed point where we show tangency.
func = lambda w: 5 + (w[0] + 0.5)**2 + (w[1] + 0.5)**2 
view = [20,150]

# run the visualizer for our chosen input function and initial point
callib.secant_to_tangent_3d.animate_it(func = func,num_frames=50,view = view)

Equation for tangent hyperplane

  • at a point $(w^0_1,w^0_2)$ there is a hyperplane tangent along each input dimension - like those shown in the figure above - one for each partial derivative

Each such hyperplane is simple, with a slope defined just as in the single-input case; e.g., along the $w_1$ axis

\begin{equation} h(w_1,w_2) = g(w^0_1,w^0_2) + \frac{\partial}{\partial w_1}g(w^0_1,w^0_2)(w^{\,}_1 - w^0_1) \end{equation}

and likewise for the tangency along the $w_2$ axis.

\begin{equation} h(w_1,w_2) = g(w^0_1,w^0_2) + \frac{\partial}{\partial w_2}g(w^0_1,w^0_2)(w^{\,}_2 - w^0_2) \end{equation}
  • neither simple hyperplane represents the full tangency at the point $(w^0_1,w^0_2)$, which must be a function of both inputs $w_1$ and $w_2$.
  • to get this we must sum the slope contributions from both input axes, which gives the full tangent hyperplane (a numerical check follows the equation below)
\begin{equation} h(w_1,w_2) = g(w^0_1,w^0_2) + \frac{\partial}{\partial w_1}g(w^0_1,w^0_2)(w^{\,}_1 - w^0_1) + \frac{\partial }{\partial w_2}g(w^0_1,w^0_2)(w^{\,}_2 - w^0_2) \end{equation}
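
To make the formula concrete, here is a minimal sketch (plain NumPy) that builds the full tangent hyperplane for the Example 3 function at the origin from finite-difference approximations of its two partial derivatives, then checks that it matches the function at the point of tangency and approximates it nearby.

In [ ]:
import numpy as np

g = lambda w: 5 + (w[0] + 0.5)**2 + (w[1] + 0.5)**2
w0 = np.array([0.0, 0.0])

# finite-difference approximations of both partial derivatives at w0
eps = 1e-6
d1 = (g(w0 + [eps, 0]) - g(w0)) / eps
d2 = (g(w0 + [0, eps]) - g(w0)) / eps

# the full tangent hyperplane at w0
h = lambda w: g(w0) + d1*(w[0] - w0[0]) + d2*(w[1] - w0[1])

print(g(w0), h(w0))                   # equal at the point of tangency
print(g([0.1, 0.1]), h([0.1, 0.1]))   # close nearby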

Example 4. Arbitrary tangent hyperplane

  • below we illustrate each single-input tangency (with respect to $w_1$ and $w_2$ in the left and middle panels respectively), along with the full tangent hyperplane (right panel) at a point
In [16]:
# what function should we play with?  Defined in the next line, along with our fixed point where we show tangency.
func = lambda w: 5 + (w[0] + 0.5)**2 + (w[1] + 0.5)**2
view = [10,150]

# run the visualizer for our chosen input function and initial point
callib.secant_to_tangent_3d.draw_it(func = func,num_frames=50,view = view)

1.3 The gradient

  • we collect the partial derivatives into a vector-valued function called the gradient, denoted $\nabla g(w_1,w_2)$, where the partial derivatives are stacked column-wise as
\begin{equation} \nabla g(w_1,w_2) = \begin{bmatrix} \frac{\partial}{\partial w_1}g(w_1,w_2) \\ \frac{\partial}{\partial w_2}g(w_1,w_2) \end{bmatrix} \end{equation}
  • when a function has only a single input the gradient reduces to a single derivative
  • this is why the derivative of a function (regardless of its number of inputs) is typically just referred to as its gradient
  • for general $N$ the gradient consists of $N$ partial derivatives stacked into a column vector (computed numerically in the sketch below)
\begin{equation} \nabla g(w_1,w_2,\ldots,w_N) = \begin{bmatrix} \frac{\partial}{\partial w_1}g(w_1,w_2,\ldots,w_N) \\ \frac{\partial}{\partial w_2}g(w_1,w_2,\ldots,w_N) \\ \vdots \\ \frac{\partial}{\partial w_N}g(w_1,w_2,\ldots,w_N) \end{bmatrix} \end{equation}
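
The finite-difference idea extends directly to any $N$: perturb one input at a time and stack the $N$ resulting slopes. A minimal sketch (the helper name numerical_gradient is our own, not a callib function):

In [ ]:
import numpy as np

# approximate the gradient of g at w by perturbing one input at a time
def numerical_gradient(g, w, eps=1e-6):
    w = np.array(w, dtype=float)
    grad = np.zeros(len(w))
    for i in range(len(w)):
        w_shift = w.copy()
        w_shift[i] += eps
        grad[i] = (g(w_shift) - g(w)) / eps
    return grad                            # N partial derivatives, stacked

g = lambda w: 6 + w[0]**2 + w[1]**2        # the function used in Example 6 below
print(numerical_gradient(g, [-1, 1]))      # ≈ [-2, 2]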

2. Derivatives and the direction of greatest ascent

2.1 The steepest ascent direction of a tangent line

The ascent / descent direction of a tangent hyperplane tells us the direction we must travel in (at least locally, around where the hyperplane most closely resembles its underlying function) in order to increase / decrease the value of the underlying function.
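
As a quick numerical illustration (a plain-NumPy sketch, separate from the callib visualizers below): at a point where the derivative is positive, a small step in the derivative's direction increases the function, while a step the opposite way decreases it.

In [ ]:
import numpy as np

g = lambda w: 1.5 + 0.4*w**2      # the function visualized in Example 5

# finite-difference derivative at w0 = 1 (true value: 0.8)
w0, eps = 1.0, 1e-6
d = (g(w0 + eps) - g(w0)) / eps

alpha = 0.1                       # small step length
print(g(w0))                      # value at the point
print(g(w0 + alpha*np.sign(d)))   # step along the derivative's sign: larger
print(g(w0 - alpha*np.sign(d)))   # step against it: smaller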

Example 5. The derivative as a direction

In [13]:
# what function should we play with?  Defined in the next line.
g = lambda w: 1.5 + 0.4*w**2 

# run the visualizer for our chosen input function
callib.derivative_ascent_visualizer.animate_visualize2d(g=g,num_frames = 50,plot_descent = True)

In [14]:
# what function should we play with?  Defined in the next line.
g = lambda w: np.sin(2*w) + 1.5

# run the visualizer for our chosen input function
callib.derivative_ascent_visualizer.animate_visualize2d(g=g,num_frames = 100,plot_descent = True)

  • at each point we can see that the ascent direction provides the right way to travel to increase the function value locally (near the point of tangency), and likewise the descent direction provides the way to travel to decrease it

Unit directions

  • it is the direction, not the magnitude, of the derivative that provides the ascent / descent direction
  • we can normalize the derivative to unit length by dividing off its norm, as $\frac{\frac{\mathrm{d}}{\mathrm{d}w}g(w)}{\left\Vert \frac{\mathrm{d}}{\mathrm{d}w}g(w) \right\Vert_2 }$
  • the value of this unit-length derivative is either $+1$ or $-1$ (we only have two directions to move in), giving unit-length ascent / descent directions, as the sketch below checks
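
A minimal numeric check of this, using the same finite-difference derivative as in the sketch above:

In [ ]:
import numpy as np

g = lambda w: 1.5 + 0.4*w**2

# unit-normalize the finite-difference derivative at two points
eps = 1e-6
for w0 in [-2.0, 3.0]:
    d = (g(w0 + eps) - g(w0)) / eps
    print(d / np.abs(d))          # always -1 or +1: only the direction survives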

2.2 The steepest ascent direction and the tangent hyperplane

  • precisely the same idea holds in higher dimensions, only now we have a multitude of partial derivatives supplying the ascent information
  • the steepest ascent direction is the direction of the gradient $\nabla g\left(\mathbf{w}\right)$; likewise the negative gradient $-\nabla g(\mathbf{w})$ provides the direction of steepest descent
  • as in the single-input case we care only about direction, so we can unit-normalize the gradient as $\frac{\nabla g(\mathbf{w})}{\Vert \nabla g(\mathbf{w}) \Vert_2 }$ (a quick numerical check follows)
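
As a quick numerical check (plain NumPy, reusing the finite-difference gradient idea from Section 1.3): stepping a short distance from a point along the unit-normalized gradient increases the function value, while stepping against it decreases the value.

In [ ]:
import numpy as np

g = lambda w: 6 + w[0]**2 + w[1]**2      # the Example 6 function below
w0 = np.array([-1.0, 1.0])

# finite-difference gradient at w0, then unit-normalize it
eps = 1e-6
grad = np.array([(g(w0 + [eps, 0]) - g(w0)) / eps,
                 (g(w0 + [0, eps]) - g(w0)) / eps])
unit = grad / np.linalg.norm(grad)       # unit-length steepest ascent direction

alpha = 0.1                              # small step length
print(g(w0))                             # value at the point
print(g(w0 + alpha*unit))                # larger: ascent
print(g(w0 - alpha*unit))                # smaller: descent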

Example 6. The gradient as a direction

  • below we illustrate the ascent / descent directions provided by the gradient, along with the tangent hyperplane, for
\begin{equation} g(w_1,w_2) = 6 + w_1^2 + w_2^2 \end{equation}

at $(-1,1)$.

  • the two partial derivatives are shown in blue, the gradient (steepest ascent) direction in black, the negative gradient (steepest descent) direction in red, and the tangent hyperplane in green
In [3]:
# define function, and the point at which to take the derivative
func = lambda w:  6 + (w[0])**2 + (w[1])**2
pt1 = [-1,1]

# run the 3d slope and direction visualizer
view = [33,30]
callib.derivative_ascent_visualizer.visualize3d(func=func,view = view,pt1 = pt1,plot_descent = True)

We show the same picture via the Python cell below for the more complicated sinusoid

\begin{equation} g(w_1,w_2) = 5 + \text{sin}(1.5\,w_1 - 2\,w_2) \end{equation}

evaluating the gradient at the origin.

In [4]:
# define function, and the point at which to take the derivative
func = lambda w:  5 + np.sin(1.5*w[0] - 2*w[1])
pt1 = [0,0]

# run the 3d slope and direction visualizer
view = [33,50]
callib.derivative_ascent_visualizer.visualize3d(func=func,view = view,pt1 = pt1,plot_descent = True)