Scalable Gaussian process inference with Stan

Gaussian processes (GPs) are flexible distributions for modeling functional data. Although theoretically appealing, they are computationally cumbersome except for small datasets because evaluating the exact likelihood scales cubically with the number of observations. This package implements two methods for scaling GP inference in Stan:

  1. a sparse approximation of the likelihood that is generally applicable.

  2. an exact method for regularly spaced data modeled by stationary kernels using fast Fourier methods.

The implementation follows Stan’s design and exposes performant inference through a familiar interface.
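
To illustrate why Fourier methods can be exact for regularly spaced data, the following NumPy sketch compares a dense multivariate normal log density with the same quantity evaluated in the Fourier domain. It is not the package's implementation: it assumes a periodic domain, so that the covariance matrix is circulant and its eigenvalues are the discrete Fourier transform of its first row. The kernel, grid size, length scale, and jitter are arbitrary choices made for illustration.

```python
import numpy as np

n = 16
x = np.arange(n)

# Assumed kernel: squared exponential with unit length scale, made periodic by
# summing over a few shifted copies ("images") of the kernel.
shifts = n * np.arange(-3, 4)
first_row = np.exp(-0.5 * ((x[:, None] + shifts) / 1.0) ** 2).sum(axis=1)
first_row[0] += 1e-6  # Small jitter on the diagonal for numerical stability.

# Circulant covariance matrix: entry (i, j) depends only on (i - j) mod n.
cov = first_row[(x[:, None] - x[None, :]) % n]

rng = np.random.default_rng(0)
f = rng.normal(size=n)

# Dense evaluation, which costs O(n^3).
_, logdet = np.linalg.slogdet(cov)
dense_logp = -0.5 * (f @ np.linalg.solve(cov, f) + logdet + n * np.log(2 * np.pi))

# Fourier evaluation, which costs O(n log n).
eigvals = np.fft.fft(first_row).real  # Eigenvalues of the circulant matrix.
f_fft = np.fft.fft(f)
fourier_logp = -0.5 * (
    np.sum(np.abs(f_fft) ** 2 / eigvals) / n
    + np.sum(np.log(eigvals))
    + n * np.log(2 * np.pi)
)

print(dense_logp, fourier_logp)
```

The two values agree up to floating point error, but the Fourier version never forms or factorizes the full covariance matrix.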

Getting Started

The library is loaded with Stan’s #include statement, and methods to evaluate or approximate the likelihood of a GP use the declarative ~ sampling syntax. The following brief example uses Fourier methods to sample GP realizations.

functions {
    // Include utility functions, such as real fast Fourier transforms.
    #include gptools/util.stan
    // Include functions to evaluate GP likelihoods with Fourier methods.
    #include gptools/fft.stan
}

data {
    // The number of sample points.
    int<lower=1> n;
    // Real fast Fourier transform of the covariance kernel.
    vector[n %/% 2 + 1] cov_rfft;
}

parameters {
    // GP value at the `n` sampling points.
    vector[n] f;
}

model {
    // Sampling statement to indicate that `f` is a GP.
    f ~ gp_rfft(zeros_vector(n), cov_rfft);
}
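
The model expects cov_rfft as data. One possible way to prepare it, sketched here with NumPy, is to evaluate a kernel on the regular grid and take the real fast Fourier transform of the result. The function name periodic_exp_quad_rfft, the choice of a periodic squared exponential kernel, and all parameter values are assumptions made for illustration and are not part of the library.

```python
import numpy as np


def periodic_exp_quad_rfft(n, sigma=1.0, length_scale=0.1, period=1.0,
                           jitter=1e-9, num_images=3):
    """Hypothetical helper: rfft of a periodic squared exponential kernel."""
    # Regular grid of n points on [0, period).
    x = np.arange(n) * period / n
    # Make the kernel periodic by summing over shifted copies of the lag.
    shifts = period * np.arange(-num_images, num_images + 1)
    lags = x[:, None] + shifts[None, :]
    cov = sigma ** 2 * np.exp(-0.5 * (lags / length_scale) ** 2).sum(axis=1)
    # Jitter at zero lag keeps all Fourier coefficients strictly positive.
    cov[0] += jitter
    # The kernel is symmetric in the lag, so its transform is real.
    return np.fft.rfft(cov).real


n = 100
# Dictionary matching the names in the data block of the model above.
data = {"n": n, "cov_rfft": periodic_exp_quad_rfft(n)}
```

The resulting vector has n // 2 + 1 elements, matching the declaration vector[n %/% 2 + 1] in the data block.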

You can learn more by following the Examples or delving into the Function Reference. The Background section offers a deeper explanation of the methods used to evaluate likelihoods and the pros and cons of different parameterizations. See the accompanying publication “Scalable Gaussian process inference with Stan” for further details.

Installation

If you have a recent Python installation, the library can be installed by running

pip install gptools-stan

from the command line. The library exposes a function gptools.stan.compile_model() for compiling cmdstanpy.CmdStanModels with the correct include paths. For instance, the model in Getting Started can be compiled using the following snippet.

>>> from gptools.stan import compile_model
>>>
>>> # stan_file = path/to/getting_started.stan
>>> model = compile_model(stan_file=stan_file)
>>> model.name
'getting_started'

If you use cmdstanr or another Stan interface, you can download the library files from GitHub. Then add the library location to the compiler include_paths as described in the manual; the cmdstanr documentation has interface-specific instructions.

Reproducing results from the accompanying publication

The accompanying publication “Scalable Gaussian process inference with Stan” provides theoretical background and a technical description of the methods. All results and figures can be reproduced by following the instructions in the repository of reproduction materials.