Getting started#

This example illustrates the use of gptools by sampling from the prior distribution of a simple Gaussian process model. It uses Fourier Methods to accelerate sampling. Let’s define a squared exponential kernel and the grid \(x\) of \(n\) observations.

from gptools.util.kernels import ExpQuadKernel
from matplotlib import pyplot as plt
import numpy as np


n = 100  # Number of observations.
x = np.arange(n)  # Location of observations.
epsilon = 1e-9  # "Nugget" variance for numerical stability.

# Define a kernel, evaluate the covariance and its Fourier transform.
kernel = ExpQuadKernel(1, n / 10, period=n)
cov = kernel.evaluate(0, x[:, None])
cov[0] += epsilon
cov_rfft = kernel.evaluate_rfft(n) + epsilon

# Show the kernel and its Fourier transform.
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot(x, cov)
ax1.set_xlabel("location $x$")
ax1.set_ylabel("kernel $k(0, x)$")

xi = np.arange(cov_rfft.size)
ax2.plot(xi, cov_rfft)
ax2.set_xlabel(r"frequency $\xi$")
ax2.set_ylabel(r"kernel RFFT $\tilde k(\xi)$")
fig.tight_layout()

../../_images/7aa36a4c9218080e36b859bf8e789fe8a874cd1b046771518f5674d86a736411.png

The left panel shows the covariance between a point at \(x=0\) and the rest of the observation grid. We set period=n because the RFFT assumes periodic boundary conditions. While most real-world problems do not have periodic boundary conditions, we can attenuate their effect by padding the domain (see Density of T. panamensis on a 50 ha plot in Panama for details). The right panel shows the Fourier transform of the kernel. Because the squared exponential kernel gives rise to very smooth functions, only the first few Fourier modes have appreciable power. Let’s use Stan to sample from the distribution. The model is shown below.

functions {
    // Include utility functions, such as real fast Fourier transforms.
    #include gptools/util.stan
    // Include functions to evaluate GP likelihoods with Fourier methods.
    #include gptools/fft.stan
}

data {
    // The number of sample points.
    int<lower=1> n;
    // Real fast Fourier transform of the covariance kernel.
    vector[n %/% 2 + 1] cov_rfft;
}

parameters {
    // GP value at the `n` sampling points.
    vector[n] f;
}

model {
    // Sampling statement to indicate that `f` is a GP.
    f ~ gp_rfft(zeros_vector(n), cov_rfft);
}

from gptools.stan import compile_model
import os


# Define a data object for Stan, compile the model, and get the number of samples.
data = {"n": n, "cov_rfft": cov_rfft}
model = compile_model(stan_file="getting_started.stan")

# Sample from the model and visualize samples.
fit = model.sample(data, chains=1, iter_warmup=100, iter_sampling=20)


def plot_samples(fit, alpha):
    fig, ax = plt.subplots()
    ax.plot(x, fit.f.T, color="C0", alpha=alpha)
    ax.set_xlabel("location $x$")
    ax.set_ylabel("GP $f$")
    fig.tight_layout()
    

plot_samples(fit, 0.1)

WARNING:root:cmdstan<2.36.0 had a bug in the evaluation of Matern 3/2 kernels with different length scales for different dimensions; see https://github.com/stan-dev/math/pull/3084 for details. Your model may yield unexpected results or crash if you use nearest-neighbor Gaussian processes with Matern 3/2 kernels. Your cmdstan version is 2.31; consider upgrading.

WARNING:cmdstanpy:CmdStanModel(compile=...) is deprecated and will be removed in the next major version. The constructor will always ensure a model has a compiled executable.
If you wish to force recompilation, use force_compile=True instead.

INFO:cmdstanpy:compiling stan file /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started.stan to exe file /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started

DEBUG:cmdstanpy:cmd: make STANCFLAGS+=--filename-in-msg=getting_started.stan STANCFLAGS+=--include-paths=/home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/python/stan/gptools/stan,/home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started
cwd: /home/docs/.cmdstan/cmdstan-2.31.0

DEBUG:cmdstanpy:Console output:

--- Translating Stan model to C++ code ---
bin/stanc --filename-in-msg=getting_started.stan --include-paths=/home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/python/stan/gptools/stan,/home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started --o=/home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started.hpp /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started.stan

--- Compiling, linking C++ code ---
g++ -std=c++1y -pthread -D_REENTRANT -Wno-sign-compare -Wno-ignored-attributes      -I stan/lib/stan_math/lib/tbb_2020.3/include    -O3 -I src -I stan/src -I stan/lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.9 -I stan/lib/stan_math/lib/boost_1.78.0 -I stan/lib/stan_math/lib/sundials_6.1.1/include -I stan/lib/stan_math/lib/sundials_6.1.1/src/sundials    -DBOOST_DISABLE_ASSERTS          -c -Wno-ignored-attributes   -x c++ -o /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started.o /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started.hpp
g++ -std=c++1y -pthread -D_REENTRANT -Wno-sign-compare -Wno-ignored-attributes      -I stan/lib/stan_math/lib/tbb_2020.3/include    -O3 -I src -I stan/src -I stan/lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.9 -I stan/lib/stan_math/lib/boost_1.78.0 -I stan/lib/stan_math/lib/sundials_6.1.1/include -I stan/lib/stan_math/lib/sundials_6.1.1/src/sundials    -DBOOST_DISABLE_ASSERTS                -Wl,-L,"/home/docs/.cmdstan/cmdstan-2.31.0/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/home/docs/.cmdstan/cmdstan-2.31.0/stan/lib/stan_math/lib/tbb"      /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started.o src/cmdstan/main.o        -Wl,-L,"/home/docs/.cmdstan/cmdstan-2.31.0/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/home/docs/.cmdstan/cmdstan-2.31.0/stan/lib/stan_math/lib/tbb"   stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_idas.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_kinsol.a  stan/lib/stan_math/lib/tbb/libtbb.so.2 -o /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started
rm -f /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started.o

INFO:cmdstanpy:compiled model executable: /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started

DEBUG:cmdstanpy:cmd: /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started info
cwd: None

DEBUG:cmdstanpy:input tempfile: /tmp/tmp20shfmc6/am6h50u9.json

INFO:cmdstanpy:CmdStan start processing

DEBUG:cmdstanpy:idx 0

DEBUG:cmdstanpy:running CmdStan, num_threads: 1

DEBUG:cmdstanpy:CmdStan args: ['/home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started', 'id=1', 'random', 'seed=84386', 'data', 'file=/tmp/tmp20shfmc6/am6h50u9.json', 'output', 'file=/tmp/tmp20shfmc6/getting_startedjmmppx0t/getting_started-20250317164646.csv', 'method=sample', 'num_samples=20', 'num_warmup=100', 'algorithm=hmc', 'adapt', 'engaged=1']

INFO:cmdstanpy:CmdStan done processing.

DEBUG:cmdstanpy:runset
RunSet: chains=1, chain_ids=[1], num_processes=1
 cmd (chain 1):
	['/home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started', 'id=1', 'random', 'seed=84386', 'data', 'file=/tmp/tmp20shfmc6/am6h50u9.json', 'output', 'file=/tmp/tmp20shfmc6/getting_startedjmmppx0t/getting_started-20250317164646.csv', 'method=sample', 'num_samples=20', 'num_warmup=100', 'algorithm=hmc', 'adapt', 'engaged=1']
 retcodes=[0]
 per-chain output files (showing chain 1 only):
 csv_file:
	/tmp/tmp20shfmc6/getting_startedjmmppx0t/getting_started-20250317164646.csv
 console_msgs (if any):
	/tmp/tmp20shfmc6/getting_startedjmmppx0t/getting_started-20250317164646_0-stdout.txt

DEBUG:cmdstanpy:Chain 1 console:
method = sample (Default)
  sample
    num_samples = 20
    num_warmup = 100
    save_warmup = 0 (Default)
    thin = 1 (Default)
    adapt
      engaged = 1 (Default)
      gamma = 0.050000000000000003 (Default)
      delta = 0.80000000000000004 (Default)
      kappa = 0.75 (Default)
      t0 = 10 (Default)
      init_buffer = 75 (Default)
      term_buffer = 50 (Default)
      window = 25 (Default)
    algorithm = hmc (Default)
      hmc
        engine = nuts (Default)
          nuts
            max_depth = 10 (Default)
        metric = diag_e (Default)
        metric_file =  (Default)
        stepsize = 1 (Default)
        stepsize_jitter = 0 (Default)
    num_chains = 1 (Default)
id = 1 (Default)
data
  file = /tmp/tmp20shfmc6/am6h50u9.json
init = 2 (Default)
random
  seed = 84386
output
  file = /tmp/tmp20shfmc6/getting_startedjmmppx0t/getting_started-20250317164646.csv
  diagnostic_file =  (Default)
  refresh = 100 (Default)
  sig_figs = -1 (Default)
  profile_file = profile.csv (Default)
num_threads = 1 (Default)


Gradient evaluation took 4.8e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.48 seconds.
Adjust your expectations accordingly!


WARNING: There aren't enough warmup iterations to fit the
         three stages of adaptation as currently configured.
         Reducing each adaptation stage to 15%/75%/10% of
         the given number of warmup iterations:
           init_buffer = 15
           adapt_window = 75
           term_buffer = 10

Iteration:   1 / 120 [  0%]  (Warmup)
Iteration: 100 / 120 [ 83%]  (Warmup)
Iteration: 101 / 120 [ 84%]  (Sampling)
Iteration: 120 / 120 [100%]  (Sampling)

 Elapsed Time: 0.028 seconds (Warm-up)
               0.034 seconds (Sampling)
               0.062 seconds (Total)

../../_images/3f95e7b97c6058a162361e2d39027187264289404b84312f778bd097ff5085cb.png

The samples are consistent with a GP prior, but they are far from independent. Under the prior, adjacent elements of \(f\) are heavily correlated. This frustrates Stan’s sampler which draws samples by exploring the target distribution like a ball rolling over a landscape. Narrow valleys lead to slow exploration. We thus need to choose a different parameterization as shown in the model below.

functions {
    // Include utility functions, such as real fast Fourier transforms.
    #include gptools/util.stan
    // Include functions to evaluate GP likelihoods with Fourier methods.
    #include gptools/fft.stan
}

data {
    // The number of sample points.
    int<lower=1> n;
    // Real fast Fourier transform of the covariance kernel.
    vector[n %/% 2 + 1] cov_rfft;
}

parameters {
    // GP value at the `n` sampling points.
    vector[n] z;
}

transformed parameters {
    // Transform from white noise to a GP realization.
    vector[n] f = gp_inv_rfft(z, zeros_vector(n), cov_rfft);
}

model {
    // Implies that `f` is a GP.
    z ~ normal(0, 1);
}

Let’s repeat the sampling process using the reparameterized model.

model = compile_model(stan_file="getting_started_non_centered.stan")
fit = model.sample(data, chains=1, iter_warmup=100, iter_sampling=20)
plot_samples(fit, 0.5)

WARNING:root:cmdstan<2.36.0 had a bug in the evaluation of Matern 3/2 kernels with different length scales for different dimensions; see https://github.com/stan-dev/math/pull/3084 for details. Your model may yield unexpected results or crash if you use nearest-neighbor Gaussian processes with Matern 3/2 kernels. Your cmdstan version is 2.31; consider upgrading.

WARNING:cmdstanpy:CmdStanModel(compile=...) is deprecated and will be removed in the next major version. The constructor will always ensure a model has a compiled executable.
If you wish to force recompilation, use force_compile=True instead.

INFO:cmdstanpy:compiling stan file /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started_non_centered.stan to exe file /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started_non_centered

DEBUG:cmdstanpy:cmd: make STANCFLAGS+=--filename-in-msg=getting_started_non_centered.stan STANCFLAGS+=--include-paths=/home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/python/stan/gptools/stan,/home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started_non_centered
cwd: /home/docs/.cmdstan/cmdstan-2.31.0

DEBUG:cmdstanpy:Console output:

--- Translating Stan model to C++ code ---
bin/stanc --filename-in-msg=getting_started_non_centered.stan --include-paths=/home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/python/stan/gptools/stan,/home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started --o=/home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started_non_centered.hpp /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started_non_centered.stan

--- Compiling, linking C++ code ---
g++ -std=c++1y -pthread -D_REENTRANT -Wno-sign-compare -Wno-ignored-attributes      -I stan/lib/stan_math/lib/tbb_2020.3/include    -O3 -I src -I stan/src -I stan/lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.9 -I stan/lib/stan_math/lib/boost_1.78.0 -I stan/lib/stan_math/lib/sundials_6.1.1/include -I stan/lib/stan_math/lib/sundials_6.1.1/src/sundials    -DBOOST_DISABLE_ASSERTS          -c -Wno-ignored-attributes   -x c++ -o /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started_non_centered.o /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started_non_centered.hpp
g++ -std=c++1y -pthread -D_REENTRANT -Wno-sign-compare -Wno-ignored-attributes      -I stan/lib/stan_math/lib/tbb_2020.3/include    -O3 -I src -I stan/src -I stan/lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.9 -I stan/lib/stan_math/lib/boost_1.78.0 -I stan/lib/stan_math/lib/sundials_6.1.1/include -I stan/lib/stan_math/lib/sundials_6.1.1/src/sundials    -DBOOST_DISABLE_ASSERTS                -Wl,-L,"/home/docs/.cmdstan/cmdstan-2.31.0/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/home/docs/.cmdstan/cmdstan-2.31.0/stan/lib/stan_math/lib/tbb"      /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started_non_centered.o src/cmdstan/main.o        -Wl,-L,"/home/docs/.cmdstan/cmdstan-2.31.0/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/home/docs/.cmdstan/cmdstan-2.31.0/stan/lib/stan_math/lib/tbb"   stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_idas.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_kinsol.a  stan/lib/stan_math/lib/tbb/libtbb.so.2 -o /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started_non_centered
rm -f /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started_non_centered.o

INFO:cmdstanpy:compiled model executable: /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started_non_centered

DEBUG:cmdstanpy:cmd: /home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started_non_centered info
cwd: None

DEBUG:cmdstanpy:input tempfile: /tmp/tmp20shfmc6/6wpw8wf3.json

INFO:cmdstanpy:CmdStan start processing

DEBUG:cmdstanpy:idx 0

DEBUG:cmdstanpy:running CmdStan, num_threads: 1

DEBUG:cmdstanpy:CmdStan args: ['/home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started_non_centered', 'id=1', 'random', 'seed=74308', 'data', 'file=/tmp/tmp20shfmc6/6wpw8wf3.json', 'output', 'file=/tmp/tmp20shfmc6/getting_started_non_centeredsprmsd1y/getting_started_non_centered-20250317164705.csv', 'method=sample', 'num_samples=20', 'num_warmup=100', 'algorithm=hmc', 'adapt', 'engaged=1']

INFO:cmdstanpy:CmdStan done processing.

DEBUG:cmdstanpy:runset
RunSet: chains=1, chain_ids=[1], num_processes=1
 cmd (chain 1):
	['/home/docs/checkouts/readthedocs.org/user_builds/gptools-stan/checkouts/latest/docs/getting_started/getting_started_non_centered', 'id=1', 'random', 'seed=74308', 'data', 'file=/tmp/tmp20shfmc6/6wpw8wf3.json', 'output', 'file=/tmp/tmp20shfmc6/getting_started_non_centeredsprmsd1y/getting_started_non_centered-20250317164705.csv', 'method=sample', 'num_samples=20', 'num_warmup=100', 'algorithm=hmc', 'adapt', 'engaged=1']
 retcodes=[0]
 per-chain output files (showing chain 1 only):
 csv_file:
	/tmp/tmp20shfmc6/getting_started_non_centeredsprmsd1y/getting_started_non_centered-20250317164705.csv
 console_msgs (if any):
	/tmp/tmp20shfmc6/getting_started_non_centeredsprmsd1y/getting_started_non_centered-20250317164705_0-stdout.txt

DEBUG:cmdstanpy:Chain 1 console:
method = sample (Default)
  sample
    num_samples = 20
    num_warmup = 100
    save_warmup = 0 (Default)
    thin = 1 (Default)
    adapt
      engaged = 1 (Default)
      gamma = 0.050000000000000003 (Default)
      delta = 0.80000000000000004 (Default)
      kappa = 0.75 (Default)
      t0 = 10 (Default)
      init_buffer = 75 (Default)
      term_buffer = 50 (Default)
      window = 25 (Default)
    algorithm = hmc (Default)
      hmc
        engine = nuts (Default)
          nuts
            max_depth = 10 (Default)
        metric = diag_e (Default)
        metric_file =  (Default)
        stepsize = 1 (Default)
        stepsize_jitter = 0 (Default)
    num_chains = 1 (Default)
id = 1 (Default)
data
  file = /tmp/tmp20shfmc6/6wpw8wf3.json
init = 2 (Default)
random
  seed = 74308
output
  file = /tmp/tmp20shfmc6/getting_started_non_centeredsprmsd1y/getting_started_non_centered-20250317164705.csv
  diagnostic_file =  (Default)
  refresh = 100 (Default)
  sig_figs = -1 (Default)
  profile_file = profile.csv (Default)
num_threads = 1 (Default)


Gradient evaluation took 5.5e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.55 seconds.
Adjust your expectations accordingly!


WARNING: There aren't enough warmup iterations to fit the
         three stages of adaptation as currently configured.
         Reducing each adaptation stage to 15%/75%/10% of
         the given number of warmup iterations:
           init_buffer = 15
           adapt_window = 75
           term_buffer = 10

Iteration:   1 / 120 [  0%]  (Warmup)
Iteration: 100 / 120 [ 83%]  (Warmup)
Iteration: 101 / 120 [ 84%]  (Sampling)
Iteration: 120 / 120 [100%]  (Sampling)

 Elapsed Time: 0.022 seconds (Warm-up)
               0.005 seconds (Sampling)
               0.027 seconds (Total)

../../_images/3f80b1a64a50e0a24af3a7fbd5ff2f32a47902a0235c7702c5059d3dce365058.png

As shown above, we now have virtually independent samples from the target distribution. Picking the right parameterization is important for exploring the target distribution efficiently. You can explore more advanced use cases in further Examples.