In this series of posts we describe some basic ideas regarding vectors and matrices (also called 'arrays') that are fundamental to understanding machine learning / deep learning.
In this post we review elementary arithmetic performed with vectors and matrices, both of which are often referred to as 'arrays'.
# imports from custom library
import sys
sys.path.append('../../')
import matplotlib.pyplot as plt
from mlrefined_libraries import basics_library as baslib
from mlrefined_libraries import linear_algebra_library as linlib
%load_ext autoreload
%autoreload 2
In this Section we introduce the concept of a vector as well as the basic operations one can perform on a single vector or pairs of vectors. These include the transpose operation, addition/subtraction, and several multiplication operations including the inner, outer, and element-wise products.
A vector is another word for a listing of numbers. For example the following
$$ [2.1, \, \, -5.7, \, \, 13] $$is a vector of three elements, also referred to as a vector of length three or a vector of dimension $1\times3$. The '1' in the first position tells us that this is a row vector, while the '3' says how many elements the vector has. In general a vector can be of arbitrary length, and can contain numbers, variables, or both. For example
$$ [x_1, \,\, x_2, \,\, x_3, \,\, x_4] $$is a vector of four variables, of dimension $1\times4$.
When listing the numbers / variables out horizontally we call the vector a row vector. Of course we can also list them just as well vertically, e.g., we could write the first example above as a column
$$ {\begin{bmatrix} 2.1 \\ -5.7 \\ 13 \\ \end{bmatrix}} $$in which case we refer to this as a column vector of length three or a vector of dimension $3\times 1$. Notice that the row version of this had dimension $1\times 3$. Here the '1' in the second entry tells us that the vector is a column.
We can swap back and forth between a row and column version of a vector by transposing it. This is an operation performed on a single vector, and simply turns a row vector into an equivalent column vector and vice-versa. This operation is denoted by a superscript T placed just to the right and above a vector. For example we can transpose a column into a row vector like this
$$ {\begin{bmatrix} 2.1 \\ -5.7 \\ 13 \\ \end{bmatrix}}^{\,T} = [2.1, \, \, -5.7, \, \, 13] $$and likewise a row into a column vector like this
$$ [2.1, \, \, -5.7, \, \, 13]^{\,T} = {\begin{bmatrix} 2.1 \\ -5.7 \\ 13 \\ \end{bmatrix}} $$To discuss vectors more generally we use algebraic notation, typically a bold lowercase (often English) letter. This notation does not denote whether or not the vector is a row or column, or how many elements it contains: such information must be given explicitly. For example we can denote a vector of numbers
$$ \mathbf{x} = [2.1, \, \, -5.7, \, \, 13] $$Here the fact that $\mathbf{x}$ represents a row vector with three elements is clear from its definition. Thus when we say $\mathbf{x}^T$ it is clear from its definition that
$$ \mathbf{x}^T = {\begin{bmatrix} 2.1 \\ -5.7 \\ 13 \\ \end{bmatrix}} $$We could also define $\mathbf{x}$ to be a vector of variables like
$$ \mathbf{x} = {\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ \end{bmatrix}} $$Again nothing about the notation $\mathbf{x}$ itself tells us whether or not it is a row or column vector, nor how many elements it contains. This information is given explicitly when we define what the notation means here.
Vectors are often interpreted geometrically, and can be drawn when in two or three dimensions. A single vector is usually drawn as either a point or an arrow stemming from the origin. In the next cell we illustrate both. The 'point' version is in the left panel, with the 'arrow' version being in the right. In both cases we simply plot each coordinate of the vector as a coordinate in the Cartesian plane. Regardless of whether a vector is a column or row it is drawn the same.
# import numpy, define a vector
import numpy as np
vec1 = np.asarray([3,3])
plotter = baslib.vector_plots.single_plot(vec1)
Vectors are referred to more generally as 'arrays', which is the nomenclature used to construct a vector in numpy, as shown in the following cell.
# import statement for numpy
import numpy as np
# construct a vector (a.k.a. an array), and print it out
x = np.asarray([2.1,-5.7,13])
print (x)
By default a numpy array initialized in this way is dimensionless - technically speaking neither a row nor a column vector - which you can see by printing its 'shape', which is numpy-speak for dimensions.
# print out the vector's initial shape (or dimensions)
print (np.shape(x))
Thus we must explicitly define whether $\mathbf{x}$ is a row or column vector. We can do this by re-defining its shape as shown in the next cell.
# reshape x to be a row vector and print
x.shape = (1,3)
print ('----- x as a row vector ----')
print (x)
# reshape x to be a column vector and print
x.shape = (3,1)
print ('----- x as a column vector ----')
print (x)
The notation for transposing a vector in numpy looks like
numpy_array.T
We illustrate this on $\mathbf{x}$ in the next cell. Notice that we last set $\mathbf{x}$ to be a column vector prior to running the cell below.
print ('----- the original vector - a column -----')
print (x)
print ('----- the transpose - now a row vector ----- ')
print (x.T)
We add and subtract two vectors elementwise, with just one catch: in order to add/subtract two vectors they must have the same dimensions. This means they must have the same number of elements, and both must be row vectors or both column vectors.
For example, to add these two vectors
$$ \mathbf{x} = {\begin{bmatrix} 2.1 \\ -5.7 \\ 13 \\ \end{bmatrix}} \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \mathbf{y} = {\begin{bmatrix} 4.3 \\ 9.2 \\ 13 \\ \end{bmatrix}} $$we add them element-wise
$$ \mathbf{x} + \mathbf{y} = {\begin{bmatrix} 2.1 + 4.3 \\ -5.7 + 9.2 \\ 13 + 13 \\ \end{bmatrix}} = {\begin{bmatrix} 6.4 \\ 3.5 \\ 26 \\ \end{bmatrix}} $$We likewise subtract these two vectors as
$$ \mathbf{x} - \mathbf{y} = {\begin{bmatrix} 2.1 - 4.3 \\ -5.7 - 9.2 \\ 13 - 13 \\ \end{bmatrix}} = {\begin{bmatrix} -2.2 \\ -14.9 \\ 0 \\ \end{bmatrix}} $$We can add / subtract vectors in numpy as shown in the next Python cell.
# define both x and y, make x a row vector and y a column vector
x = np.asarray([2.1,-5.7,13])
x.shape = (3,1)
y = np.asarray([4.3, 9.2, 13])
y.shape = (3,1)
print ('*** x + y ***')
print (x + y)
print ('*** x - y ***')
print (x - y)
More generally, to add two $N\times 1$ column vectors
$$ \mathbf{x} = {\begin{bmatrix} x_1 \\ x_2\\ \vdots \\ x_N \\ \end{bmatrix}} \,\,\,\,\,\,\,\,\, \mathbf{y} = {\begin{bmatrix} y_1 \\ y_2\\ \vdots \\ y_N \\ \end{bmatrix}} $$we write
$$ \mathbf{x} + \mathbf{y} = {\begin{bmatrix} x_1 + y_1 \\ x_2 + y_2\\ \vdots \\ x_N + y_N \\ \end{bmatrix}} $$and likewise for subtraction.
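As a quick check on this general formula, we can compare numpy's built-in elementwise addition against an explicit loop over the entries (a minimal sketch using randomly generated vectors):

```python
import numpy as np

# create two random N x 1 column vectors
N = 5
x = np.random.randn(N, 1)
y = np.random.randn(N, 1)

# add them entry-by-entry with an explicit loop, matching the formula above
loop_sum = np.zeros((N, 1))
for n in range(N):
    loop_sum[n] = x[n] + y[n]

# numpy's '+' performs the same elementwise addition in one step
print(np.allclose(x + y, loop_sum))
```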
Elementwise addition / subtraction is by far the most common type of addition / subtraction used in practice with vectors, and is what we assume by default when describing addition / subtraction of vectors unless stated otherwise.
Even if two vectors have the same number of elements, technically speaking we cannot add or subtract them if one is a row vector and the other is a column vector. However, numpy makes it possible to add two vectors of different shapes via its built-in broadcasting operations.
For example if $\mathbf{x}$ was a row vector
$$\mathbf{x} = [2.1, \, \, -5.7, \, \, 13]$$and $\mathbf{y}$ was a column vector
$$\mathbf{y} = {\begin{bmatrix} 4.3 \\ 9.2 \\ 13 \\ \end{bmatrix}} $$addition/subtraction between $\mathbf{x}$ and $\mathbf{y}$ would not be defined. However, if we try this in numpy it will not throw an error, but will instead return a matrix of values.
# turn x into a row vector
x.shape = (1,3)
# try to add x and y
print ('*** x + y ***')
print (x + y)
Examining the matrix closely, you can see that numpy has made three copies of $\mathbf{y}$, added the first element of $\mathbf{x}$ to each element of the first copy, the second element of $\mathbf{x}$ to the second copy, and the third element of $\mathbf{x}$ to the third copy. Numpy's broadcasting makes this sort of operation on $\mathbf{x}$ and $\mathbf{y}$ more convenient than writing a for loop.
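We can verify this description of broadcasting directly: building the same matrix with an explicit loop over copies of $\mathbf{y}$ gives a result identical to numpy's broadcast sum (a small sketch, assuming the same shapes used above):

```python
import numpy as np

# x is a 1 x 3 row vector, y is a 3 x 1 column vector
x = np.asarray([2.1, -5.7, 13]).reshape(1, 3)
y = np.asarray([4.3, 9.2, 13]).reshape(3, 1)

# build the broadcast result by hand: the j-th column is a copy of y
# plus the j-th element of x
manual = np.zeros((3, 3))
for j in range(3):
    manual[:, j] = y.flatten() + x[0, j]

# numpy's broadcasting produces the same 3 x 3 matrix automatically
print(np.allclose(x + y, manual))
```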
If we try to add / subtract two vectors of different lengths, numpy will throw an error. For example, in the next cell we try to add a vector with three elements to one that has only two.
# define both x and y, make x a row vector and y a column vector
x = np.asarray([2.1,-5.7,13])
x.shape = (3,1)
y = np.asarray([4.3, 9.2])
y.shape = (2,1)
print ('*** x + y ***')
print (x + y)
Two-dimensional vectors are often represented geometrically by plotting each vector not as a point, but as an arrow stemming from the origin. From this perspective the sum of two vectors is (very nicely) always equal to the vector representing the far corner of the parallelogram formed by the two vectors in the sum. This is called the parallelogram law, and is illustrated by the Python cell below for any two user-defined input vectors.
Here the two input vectors are colored black, with their sum shown in red. Note the blue dashed lines are merely visual guides helping to outline the parallelogram underlying the sum.
# import numpy, define two vectors, and add to see their sum visually
import numpy as np
vec1 = np.asarray([1,3])
vec2 = np.asarray([-3,2])
plotter = baslib.vector_plots.vector_add_plot(vec1,vec2)
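If the custom `baslib` plotting library is not available, a minimal matplotlib sketch of the same parallelogram picture might look like the following; the styling choices here are our own, not a reproduction of the library's output:

```python
import numpy as np
import matplotlib.pyplot as plt

# two input vectors and their sum
vec1 = np.asarray([1, 3])
vec2 = np.asarray([-3, 2])
vsum = vec1 + vec2

# draw the two input vectors (black) and their sum (red) as arrows from the origin
fig, ax = plt.subplots()
for v, color in [(vec1, 'k'), (vec2, 'k'), (vsum, 'r')]:
    ax.annotate('', xy=v, xytext=(0, 0),
                arrowprops=dict(arrowstyle='->', color=color))

# dashed blue guides outlining the parallelogram
ax.plot([vec1[0], vsum[0]], [vec1[1], vsum[1]], 'b--')
ax.plot([vec2[0], vsum[0]], [vec2[1], vsum[1]], 'b--')

ax.set_xlim(-5, 3)
ax.set_ylim(-1, 6)
ax.set_aspect('equal')
plt.show()
```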
We can multiply any vector by a scalar by treating the multiplication elementwise. For example if
$$ \mathbf{x} = {\begin{bmatrix} 2.1 \\ -5.7 \\ 13 \\ \end{bmatrix}} $$then for any scalar $c$
$$ c\times\mathbf{x} = {\begin{bmatrix} c\times 2.1 \\ c\times-5.7 \\ c\times13 \\ \end{bmatrix}} $$And this is how scalar multiplication is done regardless of whether or not the vector is a row or column.
Using numpy we can use the standard multiplication operator to perform scalar-vector multiplication, as illustrated in the next cell.
# define vector
x = np.asarray([2.1,-5.7,13])
# multiply by a constant
c = 2
print (c*x)
This holds for a general $N\times 1$ vector $\mathbf{x}$ as well.
$$ c\times\mathbf{x} = {\begin{bmatrix} c\times x_1 \\ c\times x_2\\ \vdots \\ c\times x_N \\ \end{bmatrix}} $$There are a number of ways to multiply two vectors - perhaps the most natural is the elementwise product. This works precisely how it sounds: multiply two vectors of the same dimension element-by-element. The 'same dimension' requirement is important: just like addition, technically speaking both vectors must have the same dimension in order for this to work.
To multiply two $N\times 1$ vectors
$$ \mathbf{x} = {\begin{bmatrix} x_1 \\ x_2\\ \vdots \\ x_N \\ \end{bmatrix}} \,\,\,\,\,\,\,\,\, \mathbf{y} = {\begin{bmatrix} y_1 \\ y_2\\ \vdots \\ y_N \\ \end{bmatrix}} $$element-wise we write
$$ \mathbf{x} \times \mathbf{y} = {\begin{bmatrix} x_1 \times y_1 \\ x_2 \times y_2\\ \vdots \\ x_N \times y_N \\ \end{bmatrix}} $$In numpy we use the natural multiplication operation '*' to perform elementwise multiplication between two vectors.
# define vector
x = np.asarray([2.1,-5.7,13])
x.shape = (3,1)
y = np.asarray([4.3, 9.2, 13])
y.shape = (3,1)
print (x*y)
The inner product is another way to multiply two vectors of the same dimension, and is the natural extension of multiplication of two scalar values in that this product produces a scalar output. Here is how the inner product is defined: to take the inner product of two $N\times1$ vectors we first multiply them together entry-wise, then add up the result.
For two general vectors
$$ \mathbf{x} = {\begin{bmatrix} x_1 \\ x_2\\ \vdots \\ x_N \\ \end{bmatrix}} \,\,\,\,\,\,\,\,\, \mathbf{y} = {\begin{bmatrix} y_1 \\ y_2\\ \vdots \\ y_N \\ \end{bmatrix}} $$the inner product is written as $\mathbf{x}^T \mathbf{y}$ and is defined as
$$ \mathbf{x}^T \mathbf{y}= \text{sum}\left(\mathbf{x}\times \mathbf{y}\right) =x_{1}y_{1}+x_{2}y_{2}+\cdots + x_{N}y_{N} = \sum_{n=1}^Nx_ny_n $$The inner product is also often referred to as the 'dot' product, and written notationally as $\mathbf{x} \cdot \mathbf{y}$.
In the next cell we use numpy to take the inner product between two vectors. Notice that we can write this out in at least two ways:
np.dot(x.T,y)
np.sum(x*y)
# define two column vectors and compute the inner product
x = np.asarray([2.1,-5.7,13])
x.shape = (3,1)
y = np.asarray([4.3, 9.2, 13])
y.shape = (3,1)
# compute the inner product
print (np.dot(x.T,y)[0][0])
print(np.sum(x*y))
The inner product (also commonly called the correlation) is also interesting because it helps define the geometric length of a vector. Notice the result of taking the inner product of a vector $\mathbf{x}$ with itself
$$ \mathbf{x}^T \mathbf{x}= \sum_{n=1}^Nx_nx_n = \sum_{n=1}^Nx_n^2 $$We square each element and sum the results. Visualizing a vector - as we do in the next Python cell for a two-dimensional example - we can reconcile this formula with a common elementary formula for the vector's length. In the left panel below we show the 'point' view of a two-dimensional vector, and in the right panel the corresponding 'arrow' version. Visual guides - drawn in dashed blue - are shown in each panel.
# import numpy, define a vector
import numpy as np
vec1 = np.asarray([3,3])
plotter = baslib.vector_plots.single_plot(vec1,guides = True)
If we treat this vector - say as drawn in the right panel - as the hypotenuse of a right triangle, how would we calculate its length? Using the Pythagorean theorem we would square the length of each side - i.e., the length of each blue guide - sum the results, and take the square root. Or - denoting our two-dimensional vector more generally as
$$ \mathbf{x} = {\begin{bmatrix} x_1 \\ x_2\\ \end{bmatrix}} $$then this computation becomes
$$ \text{length of hypotenuse (a.k.a. length of vector)} = \sqrt{x_1^2 + x_2^2} $$In terms of the inner product we can write this length computation equivalently as
$$ \sqrt{x_1^2 + x_2^2} = \sqrt{\mathbf{x}^T\mathbf{x}} $$Therefore we can express the length of a vector in terms of the inner product with itself - and this generalizes to vectors of any length.
Thus, denoting the length of a vector $\mathbf{x}$ by $\lVert \mathbf{x} \rVert _2$, we can always compute the length as
$$ \lVert \mathbf{x} \rVert _2 = \sqrt{\mathbf{x}^T\mathbf{x}} $$The following beautiful formula - referred to as the inner product rule - holds for any two vectors $\mathbf{x}$ and $\mathbf{y}$
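This identity is easy to check numerically: the sketch below compares the inner-product formula for length with numpy's built-in Euclidean norm.

```python
import numpy as np

# define a column vector
x = np.asarray([2.1, -5.7, 13])
x.shape = (3, 1)

# length via the inner product of x with itself
length_inner = np.sqrt(np.dot(x.T, x))[0][0]

# length via numpy's built-in Euclidean norm
length_norm = np.linalg.norm(x)

print(length_inner)
print(length_norm)
```

The two printed values agree, as the identity promises.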
$$ \mathbf{x}^T\mathbf{y} = \lVert \mathbf{x} \rVert_2 \lVert \mathbf{y} \rVert_2 \text{cos}(\theta) $$This rule - which can be formally proven using the law of cosines from trigonometry - is perhaps best intuited after a slight rearrangement of its terms, as follows.
$$ \left(\frac{\mathbf{x}}{ \lVert \mathbf{x} \rVert_2}\right)^T \left(\frac{\mathbf{y}}{ \lVert \mathbf{y} \rVert_2} \right)= \text{cos}(\theta) $$Here each input vector is normalized, i.e., has length equal to one, and the formula can be interpreted as a smooth measurement of the angle $\theta$ between these two vectors lying on the unit circle.
Notice that because cosine lies between $-1$ and $+1$ so too does this measurement. When the two vectors point in the exact same direction the value takes on $+1$, when they point in completely opposite directions $-1$. When the two vectors are perpendicular to each other, their inner product is equal to zero.
The two extremes $\pm 1$ are easy to verify - for example if $\mathbf{x} = \mathbf{y}$ then the two vectors overlap completely, pointing in the same direction and
$$ \left(\frac{\mathbf{x}}{ \lVert \mathbf{x} \rVert_2}\right)^T \left(\frac{\mathbf{x}}{ \lVert \mathbf{x} \rVert_2} \right)= \left(\frac{\mathbf{x}^T\mathbf{x}}{ \lVert \mathbf{x} \rVert_2^2} \right)=\left(\frac{\lVert \mathbf{x} \rVert_2^2}{ \lVert \mathbf{x} \rVert_2^2} \right) = +1 $$In the next Python cell we animate the value of the inner product for two unit-length vectors using a slider widget. The first $\mathbf{x}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ is fixed, while the second is free to vary around the circle using the slider. As the free vector rotates around the circle the corresponding inner product between it and the fixed vector is shown in the right panel. Note how when the free vector is perpendicular to the fixed vector, the inner product is zero. When they are pointed in completely opposite directions the inner product is $-1$, and when perfectly aligned $+1$.
# illustrate the full range of the inner product against the vector [1,0]
start = 0.45 # where to start off on the unit circle
linlib.transform_animators.inner_product_visualizer(num_frames = 200,start = start)
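Beyond the animation, we can verify the inner product rule numerically for a pair of vectors whose angle we know in advance; a small sketch:

```python
import numpy as np

# the fixed vector [1, 0] and a second vector at a known angle theta
theta = np.pi / 3                                  # 60 degrees
x = np.asarray([1.0, 0.0])
y = np.asarray([np.cos(theta), np.sin(theta)])     # already unit length

# the inner product of the normalized vectors recovers cos(theta)
value = np.dot(x / np.linalg.norm(x), y / np.linalg.norm(y))
print(value)   # cos(pi/3) = 0.5, up to floating point error

# perpendicular vectors have inner product equal to zero
perp = np.asarray([0.0, 1.0])
print(np.dot(x, perp))
```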