In this example, we define an operator that computes the action of a Kernel Matrix without assembling the entire matrix. More precisely, each entry \(a_{i,j}\) of the matrix \(A\) is computed as \(K(x_i, x_j)\), where \(K\) is some given kernel and \(x_i\) comes from some given dataset. This is often useful in machine learning, for example when applying the “kernel trick” in SVMs.
To follow along in code, check out the KernelMatrix repo.
Example
Let \(X\) be 4 grid points of a unit square mesh.
\[X = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}\]

and let \(x_i\) be the \(i\)th column of \(X\).
Let \(K\) be the Euclidean distance kernel:
\[K(u, v) = \sqrt{\sum_{i=1}^n{(u_i - v_i)^2}}\]

Then the matrix \(A\) is given by
\[A = \begin{bmatrix} K(x_0, x_0) & K(x_0, x_1) & K(x_0, x_2) & K(x_0, x_3) \\ K(x_1, x_0) & K(x_1, x_1) & K(x_1, x_2) & K(x_1, x_3) \\ K(x_2, x_0) & K(x_2, x_1) & K(x_2, x_2) & K(x_2, x_3) \\ K(x_3, x_0) & K(x_3, x_1) & K(x_3, x_2) & K(x_3, x_3) \\ \end{bmatrix} \approx \begin{bmatrix} 0 & 1 & 1 & 1.41 \\ 1 & 0 & 1.41 & 1 \\ 1 & 1.41 & 0 & 1 \\ 1.41 & 1 & 1 & 0 \\ \end{bmatrix}\]

Kernel Matrix Operator
To compute the action of a Kernel Matrix in linalgcpp, we define our own operator, KernelMatrix.
The constructor takes in a dataset \(X\), a kernel \(K\), and a shift parameter \(\sigma\), where a Kernel is defined in code as a function of two Vectors.
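The exact declarations live in the repo; below is a minimal self-contained sketch of what the class interface and the Kernel type could look like. It uses `std::vector<double>` as a stand-in for linalgcpp's Vector and stores one data point per entry, so the names and layout here are assumptions, not the repo's actual code (the real class would also derive from linalgcpp's operator base class).

```cpp
#include <functional>
#include <utility>
#include <vector>

// Stand-in types: the repo uses linalgcpp's Vector; here a data point is a
// std::vector<double>, and a Kernel is any function of two such Vectors.
using Vector = std::vector<double>;
using Kernel = std::function<double(const Vector&, const Vector&)>;

class KernelMatrix
{
public:
    // X: dataset (one point per entry), k: kernel K, shift: the sigma
    // added along the diagonal.
    KernelMatrix(std::vector<Vector> X, Kernel k, double shift)
        : X_(std::move(X)), k_(std::move(k)), shift_(shift) { }

    // Number of rows (and columns) of the implicit square matrix A.
    int Size() const { return static_cast<int>(X_.size()); }

private:
    std::vector<Vector> X_;
    Kernel k_;
    double shift_;
};
```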
Then the action of the operator is given by

\[A = K(X, X) + \sigma I\]

We avoid assembling the entire matrix by computing each element \(a_{i,j} = K(x_i, x_j)\) on demand with the Entry function.
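In the repo, Entry is a member function; as a hedged stand-alone sketch over `std::vector<double>` data points (the actual signature may differ), it could look like this. Note that the \(\sigma\) shift only touches the diagonal:

```cpp
#include <functional>
#include <vector>

using Vector = std::vector<double>;
using Kernel = std::function<double(const Vector&, const Vector&)>;

// a_{i,j} = K(x_i, x_j), with the sigma shift added on the diagonal.
// Free-function sketch of the member Entry; X holds one data point per entry.
double Entry(const std::vector<Vector>& X, const Kernel& k, double shift,
             int i, int j)
{
    double value = k(X[i], X[j]);
    if (i == j) { value += shift; }
    return value;
}
```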
Then, to define the action of the operator on a vector, \(y = Ax\), we implement the required Mult function.
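A hedged sketch of such a Mult, again as a free function over `std::vector<double>` stand-ins rather than the repo's member function and linalgcpp Vector types:

```cpp
#include <functional>
#include <vector>

using Vector = std::vector<double>;
using Kernel = std::function<double(const Vector&, const Vector&)>;

// y = A x without assembling A: each entry a_{i,j} = K(x_i, x_j)
// (plus sigma on the diagonal) is generated the moment it is needed.
void Mult(const std::vector<Vector>& X, const Kernel& k, double shift,
          const Vector& x, Vector& y)
{
    const int n = static_cast<int>(X.size());
    for (int i = 0; i < n; ++i)
    {
        double sum = 0.0;
        for (int j = 0; j < n; ++j)
        {
            double a_ij = k(X[i], X[j]);
            if (i == j) { a_ij += shift; }
            sum += a_ij * x[j];
        }
        y[i] = sum;
    }
}
```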
Since the kernel is symmetric in its arguments, \(A\) is symmetric, and we can optimize the operation a bit.
This is almost exactly the same as a DenseMatrix-Vector multiply operation, but accessing each entry requires an application of the kernel function.
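One way to exploit the symmetry (a sketch under the same `std::vector<double>` stand-in types, not the repo's exact code): evaluate the kernel once per off-diagonal pair \((i, j)\) with \(j > i\) and use the value to update both \(y_i\) and \(y_j\), roughly halving the number of kernel evaluations.

```cpp
#include <functional>
#include <vector>

using Vector = std::vector<double>;
using Kernel = std::function<double(const Vector&, const Vector&)>;

// Symmetric mat-vec: since a_{i,j} = a_{j,i}, each off-diagonal kernel
// evaluation serves two updates.
void MultSym(const std::vector<Vector>& X, const Kernel& k, double shift,
             const Vector& x, Vector& y)
{
    const int n = static_cast<int>(X.size());
    for (int i = 0; i < n; ++i)
    {
        y[i] = (k(X[i], X[i]) + shift) * x[i];  // diagonal term
    }
    for (int i = 0; i < n; ++i)
    {
        for (int j = i + 1; j < n; ++j)
        {
            const double a_ij = k(X[i], X[j]);  // evaluated once per pair
            y[i] += a_ij * x[j];
            y[j] += a_ij * x[i];
        }
    }
}
```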
Application of the Operator
To test our newly created operator, we apply its action to a randomly generated Vector.
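The repo's test function is not reproduced here; the following is a minimal stand-alone sketch of what such a driver might do (the names `RandomVector` and `TimedMult`, and the `std::vector<double>` stand-in types, are hypothetical):

```cpp
#include <chrono>
#include <functional>
#include <random>
#include <vector>

using Vector = std::vector<double>;
using Kernel = std::function<double(const Vector&, const Vector&)>;

// Randomly generated input vector with a fixed seed for reproducibility.
Vector RandomVector(int n, unsigned seed = 0)
{
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    Vector x(n);
    for (double& xi : x) { xi = dist(gen); }
    return x;
}

// Apply y = A x once, generating entries on the fly, and return the
// elapsed wall-clock time in seconds.
double TimedMult(const std::vector<Vector>& X, const Kernel& k, double shift,
                 const Vector& x, Vector& y)
{
    const int n = static_cast<int>(X.size());
    y.assign(n, 0.0);

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i)
    {
        for (int j = 0; j < n; ++j)
        {
            double a_ij = k(X[i], X[j]);
            if (i == j) { a_ij += shift; }
            y[i] += a_ij * x[j];
        }
    }
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration<double>(stop - start).count();
}
```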
We measure the time taken to compute one matrix-vector operation \(y=Ax\).
It is also of interest to measure the amount of memory that would have been required to store the entries of the operator explicitly. Since a double-precision value occupies 8 bytes, storing the entire \(n \times m\) matrix requires \(8nm\) bytes.
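As a one-line helper, the conversion to gigabytes could be written as follows (assuming the conventional \(1024^3\) bytes per gigabyte; the repo may instead use \(10^9\)):

```cpp
#include <cstdint>

// Memory, in gigabytes, needed to store a dense n x m matrix of doubles:
// 8 bytes per entry, 1024^3 bytes per gigabyte.
double MatrixMemoryGB(std::int64_t n, std::int64_t m)
{
    const double bytes = 8.0 * static_cast<double>(n) * static_cast<double>(m);
    return bytes / (1024.0 * 1024.0 * 1024.0);
}
```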
Generating Test Data
Our test dataset \(X\) comes from the coordinates of a refined unit square mesh.
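The repo's mesh code is not shown here; one simple way to produce such coordinates, assuming \(r\) uniform refinements of the unit square yield a \((2^r + 1) \times (2^r + 1)\) grid of vertices (the function name and layout are hypothetical):

```cpp
#include <vector>

// Vertex coordinates of a unit square after 'refinements' uniform
// refinements: an (s + 1) x (s + 1) grid with s = 2^refinements.
std::vector<std::vector<double>> UnitSquarePoints(int refinements)
{
    const int s = 1 << refinements;  // segments per side
    const double h = 1.0 / s;        // grid spacing
    std::vector<std::vector<double>> X;
    X.reserve(static_cast<std::size_t>((s + 1) * (s + 1)));
    for (int j = 0; j <= s; ++j)
    {
        for (int i = 0; i <= s; ++i)
        {
            X.push_back({i * h, j * h});
        }
    }
    return X;
}
```

With zero refinements this reproduces the 4 corner points used in the example above.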
Distributed Environment
Although I have not yet looked into it, it may be of some interest to define a ParKernelMatrix.
This would present some interesting design challenges.
Since the Kernel Matrix is dense, each processor would likely need access to the entire dataset.
This would also probably benefit greatly from GPU acceleration.