
Kernel Matrix without Assembly

In this example, we define an operator that computes the action of a Kernel Matrix without assembling the entire matrix. More precisely, each entry \(a_{i,j}\) of the matrix \(A\) is computed as \(K(x_i,x_j)\), where \(K\) is a given kernel and \(x_i\) comes from a given dataset. This is often useful in machine learning when applying the “kernel trick”, for example in SVMs.

To follow along in code, check out the KernelMatrix repo.

Example

Let \(X\) be the 4 grid points of a unit square mesh,

\[X = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}\]

and \(x_i\) be the \(i\)th column of \(X\).

Let \(K\) be the Euclidean distance kernel:

\[K(u, v) = \sqrt{\sum_{i=1}^n{(u_i - v_i)^2}}\]
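For instance, pairing \(x_0 = (0, 0)\) with \(x_3 = (1, 1)\) gives

\[K(x_0, x_3) = \sqrt{(0 - 1)^2 + (0 - 1)^2} = \sqrt{2} \approx 1.41\]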

Then the matrix \(A\) is given by

\[A = \begin{bmatrix} K(x_0, x_0) & K(x_0, x_1) & K(x_0, x_2) & K(x_0, x_3) \\ K(x_1, x_0) & K(x_1, x_1) & K(x_1, x_2) & K(x_1, x_3) \\ K(x_2, x_0) & K(x_2, x_1) & K(x_2, x_2) & K(x_2, x_3) \\ K(x_3, x_0) & K(x_3, x_1) & K(x_3, x_2) & K(x_3, x_3) \\ \end{bmatrix} \\ = \begin{bmatrix} 0 & 1 & 1 & 1.41 \\ 1 & 0 & 1.41 & 1 \\ 1 & 1.41 & 0 & 1 \\ 1.41 & 1 & 1 & 0 \\ \end{bmatrix}\]

Kernel Matrix Operator

To compute the action of a Kernel Matrix in linalgcpp, we define our own operator KernelMatrix. The constructor takes in a dataset \(X\), a kernel \(K\), and a shift parameter \(\sigma\):

KernelMatrix(DenseMatrix X, Kernel K, double shift);

Here, a Kernel is defined in code as a function of two Vectors:

using Kernel = std::function<double(const VectorView& x_i, const VectorView& x_j)>;
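As a concrete illustration, the Euclidean distance kernel from the example above could be written as a lambda matching this signature. This is a minimal sketch, assuming VectorView exposes size() and operator[]:

// Sketch of the Euclidean distance kernel as a Kernel lambda.
// Assumes both vectors have the same length; requires <cmath> for std::sqrt.
Kernel euclidean = [](const VectorView& x_i, const VectorView& x_j)
{
    double sum = 0.0;

    for (int k = 0; k < x_i.size(); ++k)
    {
        double diff = x_i[k] - x_j[k];
        sum += diff * diff;
    }

    return std::sqrt(sum);
};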

Then the action of the operator is given by

\[A=K(X,X)+\sigma I\]

We avoid assembling the entire matrix by computing the entry \(a_{i,j} = K(x_i, x_j)\) each time it is needed via the Entry function, which is defined as

double KernelMatrix::Entry(int i, int j) const
{
    const auto&& X_i = X_.GetColView(i);
    const auto&& X_j = X_.GetColView(j);

    double k_ij = K_(X_i, X_j);

    if (i == j)
    {
        k_ij += shift_;
    }

    return k_ij;
}
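For the 4-point example above with shift \(\sigma\), Entry(0, 0) returns \(K(x_0, x_0) + \sigma = \sigma\), while Entry(0, 3) returns \(K(x_0, x_3) = \sqrt{2} \approx 1.41\).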

Then, to define the action of the operator on a vector, \(y=Ax\), we implement the required Mult function. Since \(A\) is symmetric (provided the kernel itself is symmetric in its arguments), each off-diagonal entry needs to be computed only once and can be applied to both \(x[i]\) and \(x[j]\), roughly halving the number of kernel evaluations. This is almost exactly the same as a DenseMatrix-vector multiply, except that accessing each entry requires an application of the kernel function.

void KernelMatrix::Mult(const VectorView& b, VectorView x) const
{
    int n = Rows();

    x = 0.0;

    for (int i = 0; i < n; ++i)
    {
        // Diagonal contribution
        double a_ii = Entry(i, i);
        x[i] += a_ii * b[i];

        // Each off-diagonal entry is computed once and applied symmetrically
        for (int j = i + 1; j < n; ++j)
        {
            double a_ij = Entry(i, j);

            x[j] += a_ij * b[i];
            x[i] += a_ij * b[j];
        }
    }
}
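As a quick sanity check against the 4-point example above (with zero shift), multiplying \(A\) by the all-ones vector should reproduce the row sums:

\[A \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 + 1 + 1 + 1.41 \\ 1 + 0 + 1.41 + 1 \\ 1 + 1.41 + 0 + 1 \\ 1.41 + 1 + 1 + 0 \end{bmatrix} \approx \begin{bmatrix} 3.41 \\ 3.41 \\ 3.41 \\ 3.41 \end{bmatrix}\]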

Application of the Operator

To test our newly created operator, we apply its action to a randomly generated Vector. Our test function is given by

void test_kernel_matrix(const std::string& label,
                        const KernelMatrix& kernel_mat,
                        VectorView rhs)
{
    Timer timer(Timer::Start::True);
    kernel_mat.Mult(rhs);
    timer.Click();

    std::cout << "------------------------------------\n";
    std::cout << "Kernel Type         " << label << "\n";
    std::cout << "Matrix Size         " << kernel_mat.Rows() << "\n";
    std::cout << "Memory to Assemble  " << calc_memory_reqs_gb(kernel_mat) << " GB\n";
    std::cout << "Mult Time           " << timer.TotalTime() << " seconds\n";
}

We measure the time taken to compute one matrix-vector operation \(y=Ax\).

It is also of interest to measure how much memory would have been required to store the entries of the operator explicitly. Since a double-precision value occupies 8 bytes, the memory required to store the entire \(n \times m\) matrix is \(8 \cdot n \cdot m\) bytes.

The calculation in gigabytes is given by

double calc_memory_reqs_gb(const KernelMatrix& kernel_mat)
{
    int n = kernel_mat.Rows();
    int m = kernel_mat.Cols();
    double gb = 1024 * 1024 * 1024;

    return (sizeof(double) * n * m) / gb;
}
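For instance, a square kernel matrix on \(n = m = 100{,}000\) data points would require \(8 \cdot 10^{10}\) bytes, or roughly 74.5 GB, which is exactly what we avoid by never assembling it.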

Generating Test Data

Our test dataset \(X\) comes from the coordinates of a refined unit square. This is generated by

DenseMatrix UnitSquareCoordinates(int refinements)
{
    linalgcpp_assert(refinements >= 0,
                     "Invalid number of refinements");

    int dim = 2;
    int size = std::pow(dim, refinements) + 1;
    int num_points = size * size;

    DenseMatrix coordinates(dim, num_points);

    for (int i = 0; i < size; ++i)
    {
        for (int j = 0; j < size; ++j)
        {
            int index = i * size + j;

            coordinates(0, index) = j / (size - 1.0);
            coordinates(1, index) = i / (size - 1.0);
        }
    }

    return coordinates;
}
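For example, refinements = 8 gives \(2^8 + 1 = 257\) points per side, or \(257^2 = 66{,}049\) grid points, so the assembled kernel matrix would occupy roughly 32.5 GB. A hypothetical driver (a sketch, not taken from the repo) might tie the pieces together as follows; the Vector<double> constructor and element access are assumptions about the linalgcpp API:

// Hypothetical driver sketch: build the dataset, construct the operator with
// the Euclidean kernel lambda sketched earlier, fill a random right-hand
// side, and run the timing test. Requires <random>.
DenseMatrix X = UnitSquareCoordinates(8);      // 2 x 66,049 coordinate matrix

KernelMatrix kernel_mat(X, euclidean, 1.0);

std::mt19937 gen(0);
std::uniform_real_distribution<double> dist(0.0, 1.0);

Vector<double> rhs(kernel_mat.Rows());         // assumed Vector<double>(int) constructor

for (int i = 0; i < rhs.size(); ++i)
{
    rhs[i] = dist(gen);
}

test_kernel_matrix("Euclidean", kernel_mat, rhs);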

Distributed Environment

Although I have not yet looked into it, it may be of some interest to define a ParKernelMatrix. This would present some interesting design challenges: since the kernel matrix is dense, each processor would likely need access to the entire dataset. The operation would also probably benefit greatly from GPU acceleration.