Samplers, samplers, everywhere...

This notebook aims to provide basic examples of how to run a variety of MCMC and nested sampling codes in Python. It won't go into detail about MCMC methods in general, and assumes a bit of knowledge about them, nor will it discuss all the various bells-and-whistles that each sampler can use, but it will hopefully help people get started with writing a sampler code using an understandable (albeit "toy") example. A very nice, quite similar, article on this topic by Jake VanderPlas can be found here. There are also very good online notes from the Computational Statistics and Statistical Computing course at Duke University by Cliburn Chan & Janice McCarthy, covering some of this material (including PyMC3 and PyStan examples that are very similar to what is covered here). Section 2 of Brewer & Foreman-Mackey (2016) gives a nice summary of the advantages and disadvantages of some of the algorithms shown here. A good general description of fitting a model to data in a Bayesian formalism, including various useful examples, can be found in Hogg, Bovy & Lang (2010).

This code within this notebook is designed to work for Python 3. An older version that should be compatible with Python 2 can be found here.

You can jump straight to the examples for the different samplers with the links below:

MCMC samplers:

  1. emcee
  2. PyMC3
  3. TensorFlow Probability
  4. Zeus
  5. PyStan
  6. PyJAGS

Nested samplers:

  1. Nestle
  2. CPNest
  3. dynesty
  4. UltraNest
  5. PyMultiNest
  6. DNest4
  7. PyPolyChord

An example using the bilby package, which provides a unified interface to a variety of different samplers is also provided.

Background

MCMC methods are most often used in the context of Bayesian parameter estimation, e.g., you have a model, $y$, defined by a set of parameters, ${\vec{\theta}}$, with certain prior probability distributions $p(\vec{\theta}|I)$, and you want to efficiently sample from the marginalised posterior probability distributions on those parameters given some data, $\mathbf{d}$, and a particular likelihood function, $p(\mathbf{d}|\vec{\theta},I)$, ($I$ is just a substitute for all the implicit assumptions that have gone into defining our model). From Bayes theorem we have the joint posterior on the whole set of parameters,

$$ p(\vec{\theta}|\mathbf{d},I) = \frac{p(\mathbf{d}|\vec{\theta},I) p(\vec{\theta}|I)}{p(\mathbf{d}|I)}, $$

with marginalised posteriors on an individual parameter, say $\theta_1$, given by nested integrals over all other parameters,

$$ p(\theta_1|\mathbf{d},I) = \frac{1}{p(\mathbf{d}|I)}\int^{\forall \theta_i \in (\vec{\theta}\neq \theta_1)} p(\mathbf{d}|\vec{\theta},I) p(\vec{\theta}|I) {\rm d}{\theta_i}, $$

($\forall \theta_i \in (\vec{\theta}\neq \theta_1)$ can be translated into "for all parameters $\theta_i$ in $\vec{\theta}$ that are not equal to $\theta_1$). $p(\mathbf{d}|I)$ is the marginal likelihood, or evidence, for the data given the particular model used, and is given by

$$ p(\mathbf{d}|I) = \int^{\forall \theta_i \in \vec{\theta}} p(\mathbf{d}|\vec{\theta},I) p(\vec{\theta}|I) {\rm d}{\theta_i}, $$

which is a normalising constant for the posterior. It will be discussed more below in relation to nested sampling.

This is the only context I (and probably many people in the physical sciences) have encountered using MCMC methods, and is the context I will use below.

Toy model

The very basic example that I will use to demonstrate coding up the samplers is that of sampling from the posterior of two parameters, $m$ and $c$, defining the model:

$$ y(\mathbf{x};m,c) = m\mathbf{x} + c, $$

where $\mathbf{x}$ is a vector of values of $x$. This is basically fitting the gradient and $y$-intercept of a straight line, which should be fairly familiar as linear regression, and is obviously a solved problem (assuming uniform priors!) using, e.g., least squares fitting. But, this provides a simple example that can hopefully be extended to more realistics scenarios.

Our aim is to produce samples drawn from the posterior $p(m, c|\mathbf{d},I)$, which can be used (by, e.g., histogramming the sample values) to approximate the marginal posteriors $p(m|\mathbf{d},I)$ and $p(c|\mathbf{d},I)$.

The vector of data samples, $\mathbf{d}$, will consist of the model defined at a set of $M$ values, $\mathbf{x}$, with additive Gaussian noise of known standard deviation, i.e., data point $i$ will be defined by

$$ d_i = y(x_i;m,c) + n_i, $$

with

$$ n_i \sim N(0, \sigma_i), $$

which means the noise is drawn from a Gaussian (or Normal) distribution of zero mean and standard deviation of $\sigma_i$.

Note: I'll generally use the term Gaussian distribution rather than "Normal" distribution, as this it what I first heard it called. But, in many packages discussed below "Normal" is the more common word for the distribution.

Setting the likelihood

A sensible and fairly standard likelihood function (due to the Central Limit Theorem and maximum entropy arguments) for a single data point given values of $m$ and $c$ and a known noise standard deviation, will be given by a Gaussian distribution of the form

$$ p(d_i|m,c,I) = \left(2\pi\sigma_i^2\right)^{-1/2} \exp{\left(-\frac{\left[(d_i - y(x_i;m,c)\right]^2}{2\sigma_i^2} \right)}, $$

and the joint likelihood for the whole dataset of $M$ points (assuming independent noise) will be the product of the individual likelihoods

$$ p(\mathbf{d}|m,c,I) = \prod_{i=1}^M p(d_i|m,c,I) = \left(2\pi\right)^{-M/2}\left(\prod_{i=1}^M \sigma_i^{-1} \right) \exp{\left(-\sum_{i=1}^M\frac{\left[(d_i - y(x_i;m,c)\right]^2}{2\sigma_i^2} \right)}. $$

When computing likelihoods, numerical considerations mean that one almost always works with the natural logarithm of the likelihood, so we have

$$ \ln{p(\mathbf{d}|m,c,I)} \equiv \log{L} = -\frac{M}{2}\ln{(2\pi)} - \ln\left(\prod_{i=1}^M \sigma_i\right) - \sum_{i=1}^M\frac{\left[(d_i - y(x_i;m,c)\right]^2}{2\sigma_i^2}. $$

For many cases we can ignore the terms that do not include our required parameters when sampling from their marginalised likelihoods (as they will just be constants), so often the log-likelihood can be set to be

$$ \log{L} = - \sum_{i=1}^M\frac{\left[(d_i - y(x_i;m,c)\right]^2}{2\sigma_i^2}. $$

Note: If wanting to evaluate the marginal likelihood for the data (e.g., using nested sampling, which is discussed later) to use it for model comparison, the constants may be required.

Setting the prior

In addition to the likelihood, we need to define prior distributions on the parameters. To demonstrate a couple of different, and reasonably common, priors, I'll use a uniform probability distribution for the prior on $m$ and a Gaussian probability distribution for the prior on $c$:

$$ p(m|\mu_m,\sigma_m,I) = \frac{1}{\sqrt{2\pi\sigma_m^2}}\exp{\left(-\frac{(m-\mu_m)^2}{2\sigma_m^2}\right)}, $$

and

$$ p(c|c_{\rm min}, c_{\rm max}) = \Bigg\{\begin{array}{cl} \frac{1}{c_{\rm max}-c_{\rm min}} & \text{if}~c_{\rm min} < c < c_{\rm max}, \\ 0 & \text{otherwise}. \end{array} $$

In our example we'll set:

  • $\mu_m = 0$ and $\sigma_m = 10$,
  • $c_{\rm min} = -10$ and $c_{\rm max} = 10$.

Note: These priors are not intended to be the least informative for the parameterisation that we have, but are just chosen for demonstration purposes to show how to code up two different distributions. A better prior for this particular problem might be the one discussed here, which could be coded up for PyMC3 using the example here.

Creating the data

So, let's get started creating our data. I'll (arbitrarily) use:

  • $m = 3.5$,
  • $c = 1.2$,

for the model and

  • $\mathbf{x} \in [0, 10)$ in $M=50$ uniform steps ($[a,b)$ means inclusive of $a$, but exclusive of $b$),
  • $\sigma_i = \sigma = 2$ (i.e., each noise point is independent, but has the same standard deviation),

for the data.

In [1]:
# show plots inline in the notebook
%matplotlib inline

import numpy as np                  # import numpy
from time import time               # use for timing functions

# useful modules!
import os
import sys

# make the plots look a bit nicer with some defaults
from matplotlib import pyplot as pl # import pyplot from matplotlib
import matplotlib as mpl
rcparams = {}
rcparams['axes.linewidth'] = 0.5
rcparams['font.family'] = 'serif'
rcparams['font.size'] = 22
rcparams['legend.fontsize'] = 16
rcparams['mathtext.fontset'] = "stix"

# functions for plotting posteriors
import corner
from scipy.stats import gaussian_kde
In [2]:
# set the true values of the model parameters for creating the data
m = 3.5 # gradient of the line
c = 1.2 # y-intercept of the line

# set the "predictor variable"/abscissa
M = 50
xmin = 0.
xmax = 10.
stepsize = (xmax - xmin) / M
x = np.arange(xmin, xmax, stepsize)

# define the model function
def straight_line(x, m, c):
    """
    A straight line model: y = m*x + c
    
    Args:
        x (list): a set of abscissa points at which the model is defined
        m (float): the gradient of the line
        c (float): the y-intercept of the line
    """

    return m * x + c

# seed our random number generator, so we have reproducible data
np.random.seed(sum([ord(v) for v in 'samplers']))

# create the data - the model plus Gaussian noise
sigma = 2.0  # standard deviation of the noise
data = straight_line(x, m, c) + np.random.normal(scale=sigma, size=M)

# plot the data
mpl.rcParams.update(rcparams) # update plot parameters
fig, ax = pl.subplots(figsize=(9,6))
ax.plot(x, data, 'bo', alpha=0.5, label='data')
ax.plot(x, straight_line(x, m, c), 'r-', lw=2, label='model')
ax.legend()
ax.set_xlim([xmin, xmax])
ax.set_xlabel(r'$x$');

MCMC samplers

First up I'll deal with MCMC samplers that are purely written in Python, then a couple that are wrappers to other libraries.

emcee

emcee (Foreman-Mackey et al, 2013) is a Python MCMC implementation that uses an affine invariant ensemble sampler (Goodman & Weare, 2010). This basically means that it doesn't just evolve a single point in the model parameter space, but evolves and ensemble of points, and steps in the evolution are tuned based on the current state of the ensemble. The steps scale naturally (the affine invariant bit) as the ensemble closes in on the posterior. This code has been well used in the astrophysics community (and as an astrophysicist myself I'm showing it as my first example!).

emcee is available on PyPI and is installable via pip with:

pip install emcee

or via Conda with:

conda install -c conda-forge emcee

The source code is available on GitHub here. The example here is very similar to the line model example given in the emcee documentation.

First up, we need to define the likelihood function, prior functions, and posterior probability function. These all need to be defined as the natural logarithms of the functions. All the functions take in a sample tuple or list, where a sample is a vector $\vec{\theta}$ of parameter values, that can be unpacked to give the individual parameter values. Other arguments to the posterior function can be user defined.

Posterior

As we do not care about the marginal likelihood, the posterior will just be the product of the likelihood and prior, and will take the following form:

In [3]:
def logposterior(theta, data, sigma, x):
    """
    The natural logarithm of the joint posterior.

    Args:
        theta (tuple): a sample containing individual parameter values
        data (list): the set of data/observations
        sigma (float): the standard deviation of the data points
        x (list): the abscissa values at which the data/model is defined
    """

    lp = logprior(theta) # get the prior

    # if the prior is not finite return a probability of zero (log probability of -inf)
    if not np.isfinite(lp):
        return -np.inf

    # return the likeihood times the prior (log likelihood plus the log prior)
    return lp + loglikelihood(theta, data, sigma, x)

Likelihood

The likelihood will take the following form:

In [4]:
def loglikelihood(theta, data, sigma, x):
    """
    The natural logarithm of the joint Gaussian likelihood.

    Args:
        theta (tuple): a sample containing individual parameter values
        data (list): the set of data/observations
        sigma (float): the standard deviation of the data points
        x (list): the abscissa values at which the data/model is defined

    Note:
        We do not include the normalisation constants (as discussed above).
    """
    
    # unpack the model parameters from the tuple
    m, c = theta

    # evaluate the model (assumes that the straight_line model is defined as above)
    md = straight_line(x, m, c)

    # return the log likelihood
    return -0.5 * np.sum(((md - data) / sigma)**2)

Prior

The prior function will take the following form:

In [5]:
def logprior(theta):
    """
    The natural logarithm of the prior probability.

    Args:
        theta (tuple): a sample containing individual parameter values

    Note:
        We can ignore the normalisations of the prior here.
    """

    lp = 0.

    # unpack the model parameters from the tuple
    m, c = theta

    # uniform prior on c
    cmin = -10. # lower range of prior
    cmax = 10.  # upper range of prior

    # set prior to 1 (log prior to 0) if in the range and zero (-inf) outside the range
    # (we don't care about it being properly normalised, but you can if you want) 
    lp = 0. if cmin < c < cmax else -np.inf

    # Gaussian prior on m
    mmu = 0.     # mean of the Gaussian prior
    msigma = 10. # standard deviation of the Gaussian prior
    lp -= 0.5 * ((m - mmu) / msigma)**2

    return lp

MCMC set up

We need to decide on an initial number of ensemble points and initialise the samples, i.e., set starting points for them. The initial samples can be drawn from the prior distributions (in the emcee documentation example it starts the opposite way round, i.e., with initial samples tightly packed around the true value). We'll choose 100 ensemble points (the number of so-called walkers) and initialise them with:

In [6]:
Nens = 100   # number of ensemble points

mmu = 0.     # mean of the Gaussian prior
msigma = 10. # standard deviation of the Gaussian prior

mini = np.random.normal(mmu, msigma, Nens) # initial m points

cmin = -10.  # lower range of prior
cmax = 10.   # upper range of prior

cini = np.random.uniform(cmin, cmax, Nens) # initial c points

inisamples = np.array([mini, cini]).T # initial samples

ndims = inisamples.shape[1] # number of parameters/dimensions

Because our initial samples are drawn from the prior it can take time for them to converge on sampling from the posterior distributions that we want. To try and get around this we can run the sampler with a burn-in period (although see, e.g., here or here for some discussions about whether burn-ins are actually necessary), and we will ignore samples during this burn-in.

Note: Alternative options could be to use some optimisation routine to try and find the posterior mode and start samples at that point, or use all samples, work out an autocorrelation length and thin the samples accordingly so they are uncorrelated (as shown a bit later), which gets rid of correlations during the initial convergence of the samples.

We'll choose a burn-in of 500 samples, and then run the chain for another 500 samples, which will give an output with $(500+500)\times 100 = 100000$ samples.

In [7]:
Nburnin = 500   # number of burn-in samples
Nsamples = 500  # number of final posterior samples

The sampler first has to be initialised and told the required arguments, including any additional arguments for the posterior function:

In [8]:
import emcee # import the emcee package

print('emcee version: {}'.format(emcee.__version__))

# for bookkeeping set number of likelihood calls to zero
loglikelihood.ncalls = 0

# set additional args for the posterior (the data, the noise std. dev., and the abscissa)
argslist = (data, sigma, x)

# set up the sampler
sampler = emcee.EnsembleSampler(Nens, ndims, logposterior, args=argslist)
emcee version: 3.0.2

Now the sampler can be run, and the burn-in samples removed, with:

In [9]:
# pass the initial samples and total number of samples required
t0 = time() # start time
sampler.run_mcmc(inisamples, Nsamples + Nburnin);
t1 = time()

timeemcee = (t1-t0)
print("Time taken to run 'emcee' is {} seconds".format(timeemcee))

# extract the samples (removing the burn-in)
samples_emcee = sampler.get_chain(flat=True, discard=Nburnin)
Time taken to run 'emcee' is 1.9068684577941895 seconds

Now we can plot the resulting posteriors using corner.py, and also show smoothed versions using a Gaussian kernel density estimate.

In [10]:
# plot the resulting posteriors
mpl.rcParams.update({'font.size': 16})

def plotposts(samples, **kwargs):
    """
    Function to plot posteriors using corner.py and scipy's gaussian KDE function.
    """
    if "truths" not in kwargs:
        kwargs["truths"] = [m, c]

    fig = corner.corner(samples, labels=[r'$m$', r'$c$'], hist_kwargs={'density': True}, **kwargs)

    # plot KDE smoothed version of distributions
    for axidx, samps in zip([0, 3], samples.T):
        kde = gaussian_kde(samps)
        xvals = fig.axes[axidx].get_xlim()
        xvals = np.linspace(xvals[0], xvals[1], 100)
        fig.axes[axidx].plot(xvals, kde(xvals), color='firebrick')

# make plot
plotposts(samples_emcee)
In [11]:
# lets store some results for showing together later
resdict = {}

resdict['memcee_mu'] = np.mean(samples_emcee[:,0])     # mean of m samples
resdict['memcee_sig'] = np.std(samples_emcee[:,0])     # standard deviation of m samples
resdict['cemcee_mu'] = np.mean(samples_emcee[:,1])     # mean of c samples
resdict['cemcee_sig'] = np.std(samples_emcee[:,1])     # standard deviation of c samples
resdict['ccemcee'] = np.corrcoef(samples_emcee.T)[0,1] # correlation coefficient between parameters

The time to convergence can depend greatly on how the initial points are set up, so if you have a better guess as to location and shape of the posterior, then using those guesses is advised. But, beware overtuning! emcee has some handy utilities to draw samples from around an initial guess. In the example below I show the use of the (now deprecated) sample_ball() function, which creates a ball of samples assuming no correlations between parameters. However, if you knew that your parameter uncertainties had a particular covariance, you could use sample_ellipsoid() to generate a sample cloud using a supplied covarance matrix.

In [12]:
from emcee.utils import sample_ball
m_guess, c_guess = 1., 1. # guess at the mean of the distribution
m_std, c_std = 1., 1.     # guess at the width of the distribution

# generate samples from a ball based on those guesses
inisamples = sample_ball((m_guess, c_guess), (m_std, c_std), size=Nens)
In [13]:
# reset so we don't pick up the result from earlier
sampler.reset()

# pass the initial samples and total number of samples required
t0 = time() # start time
sampler.run_mcmc(inisamples, Nsamples+Nburnin);
t1 = time()

timeemcee = (t1-t0)
print("Time taken to run 'emcee' is {} seconds".format(timeemcee))

# extract the samples (removing the burn-in)
samples_emcee = sampler.get_chain(flat=True, discard=Nburnin)
Time taken to run 'emcee' is 1.8876838684082031 seconds
In [14]:
# plot the resulting posteriors
plotposts(samples_emcee)

Once the sampler has settled, the samples that come out of MCMC will be drawn from the posterior. However, even in the best of situations, draws which are adjacent are often correlated with each other. The number of "independent" samples, that is, fully independent draws from the posterior can be much fewer than the samples in hand.

The emcee EnsembleSampler class has an inbuilt function, get_autocorr_time(), to calculate the auto-correlation length (ACL) (or time) - this is an estimate the number of samples between independent samples in the chain. You can use this number to estimate convergence of chains, and also to thin out chains to produce independent samples.

In [15]:
m_acl, c_acl = sampler.get_autocorr_time(quiet=True)
print("The autocorrelation length for m is {0} and c is {1}".format(m_acl, c_acl))

# thin out the chain
samples_emcee = sampler.get_chain(flat=True, discard=Nburnin, thin=int(max([m_acl, c_acl])))
print("Number of independent samples is {}".format(len(samples_emcee)))

resdict['esemcee'] = len(samples_emcee)
resdict['essemcee'] = int(resdict['esemcee'] / timeemcee)
print("Effective samples per second: {}".format(resdict['essemcee']))

# plot the resulting posteriors
plotposts(samples_emcee)
WARNING:root:The chain is shorter than 50 times the integrated autocorrelation time for 2 parameter(s). Use this estimate with caution and run a longer chain!
N/50 = 20;
tau: [23.98692287 25.13038681]
The autocorrelation length for m is 23.986922869990487 and c is 25.130386813726226
Number of independent samples is 2000
Effective samples per second: 1059

In practice, if you have a minimum number of independent samples that you require then you can repeatedly run the sampler, checking the ACL each time (and maybe adjusting the c input parameter for get_autocorr_time()), until you have collected the required number.

Checking the ACL is something that can be done for any of the samplers discussed in the rest of this notebook. Different sampling methods can lead to different ACLs, and methods that may take more time per sample might end up having shorter ACLs. So, examining the various trade-offs between overall speed and the final number of independent samples can be worthwhile.

PyMC3

PyMC3 (Salvatier, Wiecki & Fonnesbeck, 2016) is a Python MCMC implementation that can use a variety of modern sampling method, including "No-U-turn sampling" (NUTS) (Hoffman & Gellman, 2014) and Hamiltonian Monte Carlo (Duane et al, 1987), both of which make use of the gradient of the posterior to efficiently sample the space. An example of getting started using PyMC3 can be found here upon which this example is based.

This code can be installed using pip with:

pip install pymc3

or with Conda with:

conda install -c conda-forge pymc3

and the source is available on GitHub here.

The majority of distributions that can be used for priors and likelihood functions are predefined in PyMC3, so you do not have to worry about writing them yourself (although you can create your own custom distribution).

Note that in PyMC3 the prior and likelihood distributions have to be defined within the context of a PyMC3 Model() class, and cannot be defined outside that context (the with statment). Therefore, to set up the model we can't just use the straight_line() function defined above, but can do the following:

In [16]:
import pymc3 as pm # import PyMC3

print('PyMC3 version: {}'.format(pm.__version__))

# set the PyMC3 model
linear_model = pm.Model()

with linear_model:
    # set prior parameters
    cmin = -10. # lower range of uniform distribution on c
    cmax = 10.  # upper range of uniform distribution on c
    
    mmu = 0.     # mean of Gaussian distribution on m
    msigma = 10. # standard deviation of Gaussian distribution on m
    
    # set priors for unknown parameters
    cmodel = pm.Uniform('c', lower=cmin, upper=cmax) # uniform prior on y-intercept
    mmodel = pm.Normal('m', mu=mmu, sd=msigma)       # Gaussian prior on gradient
    
    sigmamodel = sigma # set a single standard deviation
    
    # Expected value of outcome, aka "the model"
    mu = mmodel*x + cmodel

    # Gaussian likelihood (sampling distribution) of observations, "data"
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=sigmamodel, observed=data)
WARNING:theano.tensor.blas:Using NumPy C-API based implementation for BLAS functions.
PyMC3 version: 3.11.2

Now we can perform the sampling. By default, for continuous variables like we have, the sampling step method will use NUTS, but the method can be chosen manually. We'll run with 1000 samples, and tune (like the burn-in discussed above) with 1000 samples (discarding these tuning samples):

In [17]:
%%capture
Nsamples = 1000 # final number of samples
Ntune = 1000    # number of tuning samples

# perform sampling
t0 = time()
with linear_model:
    trace = pm.sample(Nsamples, tune=Ntune, discard_tuned_samples=True); # perform sampling
t1 = time()

timepymc3 = (t1 - t0)
WARNING:pymc3:The acceptance probability does not match the target. It is 0.8881386717506918, but should be close to 0.8. Try to increase the number of tuning steps.
WARNING:pymc3:The acceptance probability does not match the target. It is 0.9037542650276801, but should be close to 0.8. Try to increase the number of tuning steps.
In [18]:
print("Time taken to run 'PyMC3' is {} seconds".format(timepymc3))
Time taken to run 'PyMC3' is 4.574299335479736 seconds

Now, let's extract the samples from the trace object and make a corner plot!

In [19]:
samples_pymc3 = np.vstack((trace['m'], trace['c'])).T

resdict['mpymc3_mu'] = np.mean(samples_pymc3[:,0])      # mean of m samples
resdict['mpymc3_sig'] = np.std(samples_pymc3[:,0])      # standard deviation of m samples
resdict['cpymc3_mu'] = np.mean(samples_pymc3[:,1])      # mean of c samples
resdict['cpymc3_sig'] = np.std(samples_pymc3[:,1])      # standard deviation of c samples
resdict['ccpymc3'] = np.corrcoef(samples_pymc3.T)[0,1]  # correlation coefficient between parameters

plotposts(samples_pymc3)
In [20]:
# calculate the autocorrelation length
acts = []
for param in ["m", "c"]:
    acts.append(emcee.autocorr.integrated_time(trace[param]))
    print("Autocorrelation length for '{}' is {}".format(param, acts[-1]))
print("Maximum autocorrelation length is {0:.1f}".format(np.max(acts)))

resdict['espymc3'] = int(len(samples_pymc3) / np.max(acts))
resdict['esspymc3'] = int(resdict['espymc3'] / timepymc3)
print("Effective samples per second: {}".format(resdict['esspymc3']))
Autocorrelation length for 'm' is [4.57782586]
Autocorrelation length for 'c' is [3.95745792]
Maximum autocorrelation length is 4.6
Effective samples per second: 190

TensorFlow Probability

TensorFlow Probability (TFP) is a probabilistic modelling framework built upon the TensorFlow library. TFP contains some samplers for performing MCMC. At the time of writing TFP is still fairly new, and currently includes a Hamiltonian MC sampler, a Metropolis-Hastings sampler, and a slice sampler.

TFP can be installed from PyPI via pip with:

pip install tensorflow tensorflow-probability

where tensorflow must be explicitly installed as it's not a dependency of tensorflow-probability.

Using TFP for MCMC sampling has some similarites to PyMC3, however it can be fairly opaque, and (for me at least!) non-intuitive. The example given does not come with a detailed explanation of the constructions I've used, because I don't have any in-depth knowlegde of them! It was mainly cobbled together by looking at the example given here.

Firstly, we can set up the probablistic model (with a JointDistributionSequential class) that contains the prior distributions on the parameters of interest and a lambda function defining the likelihood function, within which we write out the straight line model for the data. Some things to be aware of are: the arguments to the lambda function defining the likelihood must be given in the opposite order to that in which those arguments are placed within the JointDistributionSequential class; the name of the Independent distribution that holds the likelihood function should be the same as the name of the variable that holds the data.

In [21]:
# import TensorFlow probability
import tensorflow_probability as tfp
from tensorflow_probability import distributions as tfd
import tensorflow as tf

cmin = -10.  # lower range of uniform distribution on c
cmax = 10.   # upper range of uniform distribution on c

mmu = 0.      # mean of Gaussian distribution on m
msigma = 10.  # standard deviation of Gaussian distribution on m

# convert x values and data to 32 bit float
xtfp = x.astype(np.float32)  # x is being use globally here
datatfp = data.astype(np.float32)

# set model - contains priors and the expected linear model
model = tfd.JointDistributionSequential([
  tfd.Normal(loc=mmu, scale=msigma, name="m"),  # m prior
  tfd.Uniform(cmin, cmax, name="c"),  # c prior
  lambda c, m: (tfd.Independent(
        tfd.Normal( # the Normal likelihood function
            loc=(m[..., tf.newaxis] * xtfp + c[..., tf.newaxis]), # the straight line model
            scale=sigma  # the data standard deviation
        ),
        name="datatfp",  # the name of the variable holding the data
        reinterpreted_batch_ndims=1,
    ))
])

print('TensorFlow version: {}'.format(tf.__version__))
print('TensorFlow Probability version: {}'.format(tfp.__version__))
TensorFlow version: 2.6.0-dev20210519
TensorFlow Probability version: 0.14.0-dev20210519

Next, the probablistic model must be wrapped in a function that takes in values of the parameters being sampled and returns the probability.

In [22]:
def target_log_prob_fn(mvalue, cvalue):
    return model.log_prob(
        (mvalue, cvalue, datatfp)
    )

We can now set up the sampler and here we'll use the No U-turn Sampler (NUTS). For the set up of the NoUTurnSampler we have to specify a step size (i.e., the size of the proposals). Some experimentation for our particular case shows a step size of 0.05 is reasonable, but it won't work for every case and most be tuned by hand. It also means that if different parameters have quite different dynamic ranges this method will not work well (it may produce jumps that are always for too big or small for certain sets of parameters).

Wrapping the sampling within a function (do_sampling()) with the @tf.function decorator compiles the model and means it runs far faster than without the wrapper.

We'll draw initial values of the m and c parameters from distributions that match their priors.

In [23]:
Nsamples = 4000 # final number of samples
Nburn = 4000    # number of tuning samples

# set up Hamiltonian MC (within TensorFlow function to speed up computation)
@tf.function(autograph=False)
def do_sampling():
    # set initial state (random draw from prior)
    qc = tf.random.uniform([], minval=cmin, maxval=cmax, name="init_c")
    qm = tf.random.normal([], stddev=msigma, mean=mmu, name="init_m")

    hmc_kernel = tfp.mcmc.HamiltonianMonteCarlo(
        target_log_prob_fn=target_log_prob_fn,
        step_size=0.05,
        num_leapfrog_steps=5,
    )

    return tfp.mcmc.sample_chain(
        num_results=Nsamples,
        current_state=[qm, qc],
        kernel=hmc_kernel,
        num_burnin_steps=Nburn,
    )

t0 = time()
states, kernel_results = do_sampling()
t1 = time()

timetfp = (t1-t0)

# extract the samples
ms, cs = states

# convert output states to numpy arrays
samples_tfp = np.vstack((ms.numpy(), cs.numpy())).T

num_accepted = np.sum(kernel_results.is_accepted)
print("Acceptance rate: {}".format(num_accepted / Nsamples))  # print out acceptance rate
/opt/conda/envs/python37/lib/python3.7/site-packages/tensorflow_probability/python/__init__.py:75: UserWarning: TensorFloat-32 matmul/conv are enabled for NVIDIA Ampere+ GPUs. The resulting loss of precision may hinder MCMC convergence. To turn off, run `tf.config.experimental.enable_tensor_float_32_execution(False)`. For more detail, see https://github.com/tensorflow/community/pull/287.
  'TensorFloat-32 matmul/conv are enabled for NVIDIA Ampere+ GPUs. The '
/opt/conda/envs/python37/lib/python3.7/site-packages/tensorflow_probability/python/mcmc/sample.py:342: UserWarning: Tracing all kernel results by default is deprecated. Set the `trace_fn` argument to None (the future default value) or an explicit callback that traces the values you are interested in.
  warnings.warn('Tracing all kernel results by default is deprecated. Set '
Acceptance rate: 0.927
In [24]:
print("Time taken to run 'TFP' is {} seconds".format(timetfp))
Time taken to run 'TFP' is 4.815892219543457 seconds
In [25]:
resdict['mtfp_mu'] = np.mean(samples_tfp[:,0])      # mean of m samples
resdict['mtfp_sig'] = np.std(samples_tfp[:,0])      # standard deviation of m samples
resdict['ctfp_mu'] = np.mean(samples_tfp[:,1])      # mean of c samples
resdict['ctfp_sig'] = np.std(samples_tfp[:,1])      # standard deviation of c samples
resdict['cctfp'] = np.corrcoef(samples_tfp.T)[0,1]  # correlation coefficient between parameters

plotposts(samples_tfp)
In [26]:
# calculate the autocorrelation length
acts = []
for i, param in enumerate(["m", "c"]):
    acts.append(emcee.autocorr.integrated_time(samples_tfp[:,i]))
    print("Autocorrelation length for '{}' is {}".format(param, acts[-1]))
print("Maximum autocorrelation length is {0:.1f}".format(np.max(acts)))

resdict['estfp'] = int(len(samples_tfp) / np.max(acts))
resdict['esstfp'] = int(resdict['estfp'] / timetfp)
print("Effective samples per second: {}".format(resdict['esstfp']))
Autocorrelation length for 'm' is [14.8794824]
Autocorrelation length for 'c' is [17.3901431]
Maximum autocorrelation length is 17.4
Effective samples per second: 47

Zeus

Zeus (Karamanis & Beutler, 2020, Karamanis, Beutler & Peacock, 2021) is a pure Python MCMC code that uses ensemble slice sampling. It is very similar in usage to emcee.

Zeus is available via PyPI and can be installed using pip with:

pip install zeus-mcmc

or with Conda with:

conda install -c conda-forge zeus-mcmc

The Zeus source code is available on Github.

The Zeus documentation gives a nice example of fitting a straight line. The interface is so similar to emcee that we can reuse the logprior, loglikelihood and logposterior functions that we used in the case (I won't bother redefining them, so you'll have to scroll back up!). So, that just leaves the sampling. We will just leave any tunable parameters at their default values, but you can choose between a few different proposal distributions.

In [27]:
import zeus

print('Zeus version: {}'.format(zeus.__version__))

ndim = 2          # Number of parameters/dimensions (e.g. m and c)
Nens = 100        # for consistency with the emcee example we'll use 100 
Nburn = 500       # burn-in samples
Nsamples = 500    # Nsamples

# set the initial samples (we'll draw from the prior)
mini = np.random.normal(mmu, msigma, Nens) # initial m points
cini = np.random.uniform(cmin, cmax, Nens) # initial c points
inisamples = np.array([mini, cini]).T # initial samples

# Initialise the sampler
sampler = zeus.EnsembleSampler(Nens, ndim, logposterior, args=[data, sigma, x])

t0 = time()
# Run the sampler
sampler.run_mcmc(inisamples, Nburn + Nsamples)
t1 = time()

timezeus = t1 - t0

# print summary diagnostics
sampler.summary
Initialising ensemble of 100 walkers...
Sampling progress :   1%|          | 8/1000 [00:00<00:13, 72.51it/s]
Zeus version: 2.3.0
Sampling progress : 100%|██████████| 1000/1000 [00:12<00:00, 79.13it/s]
Summary
-------
Number of Generations: 1000
Number of Parameters: 2
Number of Walkers: 100
Number of Tuning Generations: 14
Scale Factor: 0.641642
Mean Integrated Autocorrelation Time: 3.09
Effective Sample Size: 32373.08
Number of Log Probability Evaluations: 586000
Effective Samples per Log Probability Evaluation: 0.055244
In [28]:
print("Time taken to run 'Zeus' is {} seconds".format(timezeus))
Time taken to run 'Zeus' is 12.64051866531372 seconds

The EnsembleSampler object has a method to extract the posterior samples and we can discard the burn-in and thin the samples if required.

In [29]:
samples_zeus = sampler.get_chain(flat=True, discard=Nburn)

resdict['mzeus_mu'] = np.mean(samples_zeus[:,0])      # mean of m samples
resdict['mzeus_sig'] = np.std(samples_zeus[:,0])      # standard deviation of m samples
resdict['czeus_mu'] = np.mean(samples_zeus[:,1])      # mean of c samples
resdict['czeus_sig'] = np.std(samples_zeus[:,1])      # standard deviation of c samples
resdict['cczeus'] = np.corrcoef(samples_zeus.T)[0,1]  # correlation coefficient between parameters

plotposts(samples_zeus)
In [30]:
# calculate the autocorrelation length
acts = []
for i, param in enumerate(["m", "c"]):
    acts.append(emcee.autocorr.integrated_time(samples_zeus[:, i]))
    print("Autocorrelation length for '{}' is {}".format(param, acts[-1]))
print("Maximum autocorrelation length is {0:.1f}".format(np.max(acts)))

resdict['eszeus'] = int(len(samples_zeus) / np.max(acts))
resdict['esszeus'] = int(resdict['eszeus'] / timezeus)
print("Effective samples per second: {}".format(resdict['esszeus']))
Autocorrelation length for 'm' is [3.08637982]
Autocorrelation length for 'c' is [3.09159199]
Maximum autocorrelation length is 3.1
Effective samples per second: 1279

MCMC wrapper samplers

There are other MCMC samplers that are not pure Python packages, but have a Python-based wrapper to a different library. With these you can't write pure Python model and probability functions, but instead have to create a code string that is then compiled and run using the underlying library functions. We will look at PyStan and PyJAGS.

PyStan

PyStan is a Python interface to the popular Stan library. As with PyMC3, it uses the modern NUTS and Hamiltonain MC sampling methods. It can be installed using pip with:

pip install pystan

or via Conda with:

conda install -c conda-forge pystan

and the source code is available on GitHub here.

To define the model and distributions you need to write a code string like this:

In [31]:
# model and distributions for straight line fitting
line_code = """
data {{
    int N;      // number of data points
    real y[N];           // observed data points
    real x[N];           // abscissa points
    real sigma; // standard deviation
}}
parameters {{
    // parameters for the fit
    real m;
    real c;
}}
transformed parameters {{
    real theta[N];
    for (j in 1:N)
    theta[j] = m * x[j] + c; // straight line model
}}
model {{
    m ~ normal({mmu}, {msigma});     // prior on m (gradient)
    c ~ uniform({clower}, {cupper}); // prior on c (y-intercept)
    y ~ normal(theta, sigma);        // likelihood of the data given the model
}}
"""

You can then define the inputs to the data section in a dictionary:

In [32]:
# set the data and the abscissa
linear_data = {
    'N': M,          # number of data points
    'y': data,       # observed data (converted from numpy array to a list)
    'x': x,          # abscissa points (converted from numpy array to a list)
    'sigma': sigma,  # standard deviation
}

Now you can compile the model and run the sampler, with 1000 iterations and 4 chains, with:

In [33]:
%%capture
import pystan  # import PyStan

Nsamples = 1000  # set the number of iterations of the sampler
chains = 4       # set the number of chains to run with

# dictionary for inputs into line_code
linedict = {}
linedict['mmu'] = 0.0     # mean of Gaussian prior distribution for m
linedict['msigma'] = 10   # standard deviation of Gaussian prior distribution for m
linedict['clower'] = -10  # lower bound on uniform prior distribution for c
linedict['cupper'] = 10   # upper bound on uniform prior distribution for c

t0 = time()
sm = pystan.StanModel(model_code=line_code.format(**linedict)); # compile model
t1 = time()

timepystancomp = (t1 - t0)

t0 = time()
fit = sm.sampling(data=linear_data, iter=Nsamples, chains=chains); # perform sampling
t1 = time()

timepystan = (t1 - t0)
COMPILING THE C++ CODE FOR MODEL anon_model_15aa743a6e306b61e6880db7be28ca62 NOW.
In [34]:
print('PyStan version: {}'.format(pystan.__version__))

print("Time taken to comile 'PyStan' model is {} seconds".format(timepystancomp))
print("Time taken to run 'PyStan' is {} seconds".format(timepystan))
PyStan version: 2.19.1.1
Time taken to comile 'PyStan' model is 41.173564434051514 seconds
Time taken to run 'PyStan' is 0.36288952827453613 seconds

PyStan does have a plotting function (you could run fit.plot()), but we'll extract the chains and use corner.py:

In [35]:
la = fit.extract(permuted=True)  # return a dictionary of arrays

samples_pystan = np.vstack((la['m'], la['c'])).T

resdict['mpystan_mu'] = np.mean(samples_pystan[:,0])      # mean of m samples
resdict['mpystan_sig'] = np.std(samples_pystan[:,0])      # standard deviation of m samples
resdict['cpystan_mu'] = np.mean(samples_pystan[:,1])      # mean of c samples
resdict['cpystan_sig'] = np.std(samples_pystan[:,1])      # standard deviation of c samples
resdict['ccpystan'] = np.corrcoef(samples_pystan.T)[0,1] # correlation coefficient between parameters

# plot using corner.py
plotposts(samples_pystan)
In [36]:
# calculate the autocorrelation length
acts = []
for i, param in enumerate(["m", "c"]):
    acts.append(emcee.autocorr.integrated_time(samples_pystan[:, i]))
    print("Autocorrelation length for '{}' is {}".format(param, acts[-1]))
print("Maximum autocorrelation length is {0:.1f}".format(np.max(acts)))

resdict['espystan'] = int(len(samples_pystan) / np.max(acts))
resdict['esspystan'] = int(resdict['espystan'] / timetfp)
print("Effective samples per second: {}".format(resdict['esspystan']))
Autocorrelation length for 'm' is [1.21500575]
Autocorrelation length for 'c' is [1.11721154]
Maximum autocorrelation length is 1.2
Effective samples per second: 341

PyJAGS

PyJAGS is another wrapper to a non-Python library, in this case the JAGS library (Plummer, 2003). As with PyStan, you have to write a code string that is then compiled and run. To use PyJAGS you need to install JAGS, e.g. on a Debian-based system this can be done using:

sudo apt-get install jags

and then install PyJAGS using pip with:

pip install pyjags

The source code for PyJAGS is available on GitHub here.

A linear regression fitting example using PyJAGS is given here, on which this example is based.

We can define the model string using (note that in this model the Gaussian/Normal distribution dnorm takes the inverse variance as the second argument, rather than the standard deviation):

In [37]:
# create the JAGS code for the linear model
line_code_jags = """
model {{
    for (i in 1:N) {{
        y[i] ~ dnorm(c + m * x[i], {invvar}) # Gaussian likelihood
    }}
    m ~ dnorm({mmu}, {minvvar})     # Gaussian prior on m
    c ~ dunif({clower}, {cupper})   # Uniform prior on c
}}
"""

You can then define the data inputs section in a dictionary:

In [38]:
datadict = {
    'x': x,     # abscissa points (converted from numpy array to a list)
    'N': M,     # number of data points
    'y': data,  # the observed data
}

Now you can compile the model and run the sampler, with 1000 iterations and 4 chains, with:

In [39]:
%%capture
import pyjags # import PyJAGS

Nsamples = 1000 # set the number of iterations of the sampler
chains = 4      # set the number of chains to run with

# dictionary for inputs into line_code
linedict = {}
linedict['mmu'] = 0.0              # mean of Gaussian prior distribution for m
linedict['minvvar'] = 1 / 10**2    # inverse variance of Gaussian prior distribution for m
linedict['clower'] = -10           # lower bound on uniform prior distribution for c
linedict['cupper'] = 10            # upper bound on uniform prior distribution for c
linedict['invvar'] = 1 / sigma**2  # inverse variance of the data

# compile model
t0 = time()
model = pyjags.Model(line_code_jags.format(**linedict), data=datadict, chains=chains)
t1 = time()

timepyjagscomp = (t1 - t0)

t0 = time()
samples = model.sample(Nsamples, vars=['m', 'c']) # perform sampling
t1 = time()

timepyjags = (t1 - t0)
Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
NumExpr defaulting to 8 threads.
Using JAGS library located in /usr/lib/x86_64-linux-gnu/libjags.so.4.
Loading module basemod from /usr/lib/x86_64-linux-gnu/JAGS/modules-4/basemod.so
Loading module bugs from /usr/lib/x86_64-linux-gnu/JAGS/modules-4/bugs.so
Loading module lecuyer from /usr/lib/x86_64-linux-gnu/JAGS/modules-4/lecuyer.so
In [40]:
print('PyJAGS version: {}'.format(pyjags.__version__))

print("Time taken to comile 'PyJAGS' model is {} seconds".format(timepyjagscomp))
print("Time taken to run 'PyJAGS' is {} seconds".format(timepyjags))
PyJAGS version: 1.3.7
Time taken to comile 'PyJAGS' model is 0.09230971336364746 seconds
Time taken to run 'PyJAGS' is 0.059946537017822266 seconds

Now we can plot the samples using corner.py:

In [41]:
mchainjags = samples['m'].flatten()
cchainjags = samples['c'].flatten()

samples_pyjags = np.vstack((mchainjags, cchainjags)).T

resdict['mpyjags_mu'] = np.mean(samples_pyjags[:,0])      # mean of m samples
resdict['mpyjags_sig'] = np.std(samples_pyjags[:,0])      # standard deviation of m samples
resdict['cpyjags_mu'] = np.mean(samples_pyjags[:,1])      # mean of c samples
resdict['cpyjags_sig'] = np.std(samples_pyjags[:,1])      # standard deviation of c samples
resdict['ccpyjags'] = np.corrcoef(samples_pyjags.T)[0,1] # correlation coefficient between parameters

# plot using corner.py
plotposts(samples_pyjags)
In [42]:
# calculate the autocorrelation length
acts = []
for i, param in enumerate(["m", "c"]):
    acts.append(emcee.autocorr.integrated_time(samples_pystan[:, i]))
    print("Autocorrelation length for '{}' is {}".format(param, acts[-1]))
print("Maximum autocorrelation length is {0:.1f}".format(np.max(acts)))

resdict['espyjags'] = int(len(samples_pyjags) / np.max(acts))
resdict['esspyjags'] = int(resdict['espyjags'] / timetfp)
print("Effective samples per second: {}".format(resdict['esspyjags']))
Autocorrelation length for 'm' is [1.21500575]
Autocorrelation length for 'c' is [1.11721154]
Maximum autocorrelation length is 1.2
Effective samples per second: 683

That ends our exploration of MCMC codes. We now move on to Nested Sampling!

Nested Sampling

Nested sampling (Skilling, 2006) is a method to numerically perform the integral required to evaluate the marginal likelihood of the data given a particular model. This is not a value that is produced by most standard MCMC methods, but is a value that is very useful if wanting to do Bayesian model comparison. Above, we defined the marginal likelihood, or evidence, as

$$ p(\mathbf{d}|I) = \int^{\forall \theta_i \in \vec{\theta}} p(\mathbf{d}|\vec{\theta},I) p(\vec{\theta}|I) {\rm d}{\theta_i}, $$

but, we can make this more explicit for a given hypothesis, or model, $H_j$, by stating:

$$ p(\mathbf{d}|H_j, I) \equiv Z = \int^{\forall \theta_i \in \vec{\theta}} p(\mathbf{d}|\vec{\theta}, H_j, I) p(\vec{\theta}|H_j,I) {\rm d}{\theta_i}, $$

where previously the $H_j$ had been subsumed inside our implicit assumptions, $I$.

Nested sampling algorithms generally start by drawing a set of points (sometimes called "live points", or "active points") from the prior distribution. It relies on these points being independent draws from the prior, otherwise the resulting marginal likelihood value can be biased. The likelihood is evaluated at each point, and the smallest likelihood point is found. This points is then removed from the set of live points (and goes into the sum calculating the marginal likelihood integral) and a new point is drawn from the prior with the constraint that it has a likelihood larger than the just removed point. Drawing truly independent points from this constrained prior is the main tricky part of nested sampling (particularly when the number of parameters is not small), and a variety of methods (including MCMC methods) can be used.

The uncertainty on the estimate of the natural logarithm of the marginal likelihood, $\ln{Z}$, can be found if the information gain (or Kullback-Leibler divergence), $h$, in going from the prior to the posterior is calculated. If $h$ (in nats) is available then the uncertainty is (Skilling, 2006):

$$ \sqrt{\frac{h}{N_{\rm live}}}, $$

where $N_{\rm live}$ is the number of live points used.

Note: This is the statistical uncertainty on the value, but if there are any biases due to the sampling not being independent draws from the prior then there can be systematic uncertainties too.

As well as producing a value for the marginal likelihood, a by-product of the nested sampling algorithm is the production a set of samples that can be re-sampled from to provide samples drawn from the posterior distribution.

Note: MCMC methods can be used to estimate the marginal likelihood, e.g., emcee can be run with a parallel tempering sampling method and used to compute the evidence using thermodynamic integration (e.g., Goggans & Chi, 2004), or potentially using Slice sampling with PyMC3.

As with MCMC, there are various codes available to perform the nested sampling algorithm. We will again look at pure Python implementations and some that are wrappers to other libraries.

Nestle

Nestle is a pure Python nested sampling algorithm. It has three options for the method with which it draws new samples: "classic", "single ellipsoid", and "multiple ellipsoids. In this example we'll make use of the last of these. This uses the MultiNest method (also see PyMultiNest below) (Feroz et al, 2009) in which the current live points are clustered using a recursive k-means method (however, clustering is not re-done at every iteration of the algorithm), ellipsoids completely bounding each cluster are found, an ellipsoid is randomly chosen with a probability proportional to its volume, and a new sample is drawn uniformly from it.

Nestle is available on PyPI and installable using pip with:

pip install nestle

or via Conda with:

conda install -c conda-forge nestle

The source code is available on GitHub here.

The set up of likelihood functions is quite similar to that for emcee, but the prior that is used is a unit hypercube (an $n$-dimensional cube with all sides equal to one), so parameters must be mapped into this space. This is simple for uniform priors, but requires a re-parameterisation using a Jacobian for more complex priors (if possible).

For our Gaussian prior on $m$ we can't analytically re-parameterise it for a unit hypercube, however we can use inverse transform sampling to take samples from a uniform range between 0 and 1, and convert them back to a Gaussian, e.g.

$$ m = \mu_m + \sqrt{2}\sigma_m {\rm erf}^{-1}\left(2c - 1\right), $$

where $c \sim {\rm Uniform}(0,1)$ and ${\rm erf}^{-1}$ is the inverse error function. As shown in the Nestle documentation we can use the scipy.special.ndtri function for this.

An example of using Nestle for the linear regression problem, but with uniform priors on both parameters, is given here.

Setting the prior transform

Given the above description, rather than setting a prior function, we set a prior transformation function that maps from parameters drawn from the unit hypercube into the true parameters.

In [43]:
# import the inverse error function from scipy
from scipy.special import ndtri

def prior_transform(theta):
    """
    A function defining the tranform between the parameterisation in the unit hypercube
    to the true parameters.

    Args:
        theta (tuple): a tuple containing the parameters.
        
    Returns:
        tuple: a new tuple or array with the transformed parameters.
    """

    mprime, cprime = theta # unpack the parameters (in their unit hypercube form)

    cmin = -10.  # lower bound on uniform prior on c
    cmax = 10.   # upper bound on uniform prior on c

    mmu = 0.     # mean of Gaussian prior on m
    msigma = 10. # standard deviation of Gaussian prior on m

    m = mmu + msigma*ndtri(mprime) # convert back to m
    c = cprime*(cmax-cmin) + cmin  # convert back to c

    return (m, c)

Setting the likelihood function

This is similar to setting the likelihood function for emcee except, as we're calculating the marginal likelihood, we will keep the normalisation constants. As before, we work with the natural logarithm of the likelihood function. The function can only take in a tuple variable containing the parameter values, so any additional required values must be hard coded, or set as global variables.

In [44]:
# set the natural logarithm of 2pi, so that it doesn't have to be recalculated
LN2PI = np.log(2.*np.pi)
LNSIGMA = np.log(sigma) # natural log of the data noise standard deviation

def loglikelihood_nestle(theta):
    """
    The log-likelihood function.
    """

    m, c = theta # unpack the parameters

    # normalisation
    norm = -0.5 * M * LN2PI - M * LNSIGMA

    # chi-squared (data, sigma and x are global variables defined early on in this notebook)
    chisq = np.sum(((data - straight_line(x, m, c)) / sigma)**2)

    return norm - 0.5 * chisq

Now you can run the sampler. We'll use 1024 live/active points (there's nothing special about this number, and powers of two are not required). The more live points you use the longer the code will take to run, but the uncertainty on the final marginal likelihood value will decrease with greater numbers of live points.

Unlike with MCMCs, where you give it a number of iterations to run and after that point it stops, nested sampling needs to be supplied with a stopping criterion at which point it will terminate. One common criterion is based on the ratio between an estimate of the total evidence and the current calculated evidence value. Once this value gets below some set value the algorithm will terminate.

In [45]:
# import nestle
import nestle

print('Nestle version: {}'.format(nestle.__version__))

nlive = 1024     # number of live points
method = 'multi' # use MutliNest algorithm
ndims = 2        # two parameters
tol = 0.1        # the stopping criterion

t0 = time()
res = nestle.sample(loglikelihood_nestle, prior_transform, ndims, method=method, npoints=nlive, dlogz=tol)
t1 = time()

timenestle = (t1-t0)

print("Time taken to run 'Nestle' is {} seconds".format(timenestle))
Nestle version: 0.2.0
Time taken to run 'Nestle' is 0.9425952434539795 seconds

We can extract the value of the marginal likelihood, $\ln{Z}$, and its uncertainties with the following:

In [46]:
logZnestle = res.logz                         # value of logZ
infogainnestle = res.h                        # value of the information gain in nats
logZerrnestle = np.sqrt(infogainnestle/nlive) # estimate of the statistcal uncertainty on logZ

print("log(Z) = {} ± {}".format(logZnestle, logZerrnestle))
log(Z) = -116.7243045299626 ± 0.08290852230596568

A summary of the above values can also be output using:

In [47]:
print(res.summary())
niter: 10660
ncall: 17621
nsamples: 11684
logz: -116.724 +/-  0.083
h:  7.039

As mentioned above, the samples output by nested sampling are not drawn from the posterior, and as such need to be resampled from using certain weights. These weights are output by Nestle in the results. We can get posterior samples with the following:

In [48]:
# re-scale weights to have a maximum of one
nweights = res.weights/np.max(res.weights)

# get the probability of keeping a sample from the weights
keepidx = np.where(np.random.rand(len(nweights)) < nweights)[0]

# get the posterior samples
samples_nestle = res.samples[keepidx,:]

resdict['mnestle_mu'] = np.mean(samples_nestle[:,0])      # mean of m samples
resdict['mnestle_sig'] = np.std(samples_nestle[:,0])      # standard deviation of m samples
resdict['cnestle_mu'] = np.mean(samples_nestle[:,1])      # mean of c samples
resdict['cnestle_sig'] = np.std(samples_nestle[:,1])      # standard deviation of c samples
resdict['ccnestle'] = np.corrcoef(samples_nestle.T)[0,1]  # correlation coefficient between parameters
resdict['nestle_npos'] = len(samples_nestle)              # number of posterior samples
resdict['nestle_time'] = timenestle                       # run time
resdict['nestle_logZ'] = logZnestle                       # log marginalised likelihood
resdict['nestle_logZerr'] = logZerrnestle                 # uncertainty on log(Z)

print('Number of posterior samples is {}'.format(len(samples_nestle)))
Number of posterior samples is 2808

Now we can plot the posteriors:

In [49]:
plotposts(samples_nestle)

With nested sampling the samples should be uncorrelated by construction (in reality, for some complex problems, some methods used to draw new samples within the nested sampling algorithm can leave correlations). During nested sampling, you can estimate the number of effective posterior samples from the full chain of samples using, for example, (an unnormalised verion of) the effective sampler size calculation given here.

In [50]:
def ess(weights):
    """
    Estimate the effective sample size from the weights.
    
    Args:
        weights (array_like): an array of weights values for each nested sample
        
    Returns:
        int: the effective sample size
    """

    N = len(weights)
    w = weights / weights.sum()
    ess = N / (1.0 + ((N * w - 1)**2).sum() / N)

    return int(ess)
In [51]:
resdict['esnestle'] = ess(res.weights)
resdict['essnestle'] = int(resdict['esnestle'] / timenestle)
print("Effective samples per second: {}".format(resdict['essnestle']))
Effective samples per second: 4309

CPNest

CPNest (Veitch & del Pozzo, 2017) is another pure Python nested sampling algorithm. To perform the sampling from the restricted prior it uses a Metropolis-Hastings based MCMC, with sample proposal distributions that use a combination of the affine invarient ensemble methods described for emcee (Goodman & Weare, 2010), differential evolution, or jumps based on the eigenvectors of the covariance matrix of live points.

CPNest is available on PyPI and can be installed using pip with:

pip install cpnest

or via Conda with:

conda install -c conda-forge cpnest

The source code is available on GitHub here.

To run CPNest we have to define a class containing the likelihood and prior functions. CPNest also requires explicit bounds on all parameters from which it will uniformly draw an initial set of live points. Prior to running the nested sampling algorithm, it will run an MCMC using the prior as the target distribution to make sure the samples are drawn from the required prior, so it does not require any mapping between the prior and a unit-hypercube.

In [52]:
# import the Model class from CPNest
from cpnest.model import Model

LN2PI = np.log(2.*np.pi)

# set the model, likelihood and prior
class StraightLineModel(Model):
    """
    A simple straight line model, with a Gaussian likelihood.
    
    Args:
        data (:class:`numpy.ndarray`): an array containing the observed data
        abscissa (:class:`numpy.ndarray`): an array containing the points at which the data were taken
        modelfunc (function): a function defining the model
        sigma (float): the standard deviation of the noise in the data
        
    """
    names = ['m','c'] # parameter names (this is a required variables for the class)

    # define the boundaries using for initial sampling of the live points
    cmin = -10.  # lower range on c (the same as the uniform c prior lower bound)
    cmax = 10.   # upper range on c (the same as the uniform c prior upper bound)

    mmu = 0.     # mean of the Gaussian prior on m
    msigma = 10. # standard deviation of the Gaussian prior on m

    cbounds = [cmin, cmax]
    # set the m bounds to be +/- 10sigma about the mean
    mbounds = [mmu-10.*msigma, mmu+10.*msigma]

    # NOTE: the bounds could instead be set through arguments passed to the __init__
    # function for the class if wanted
    bounds=[mbounds, cbounds] # upper and lower bounds on each parameter (required for the class)

    def __init__(self, data, abscissa, modelfunc, sigma):
        # set the data
        self._data = data         # oberserved data
        self._abscissa = abscissa # points at which the observed data are taken
        self._sigma = sigma       # standard deviation(s) of the data
        self._logsigma = np.log(sigma) # log sigma here to save computations in the likelihood
        self._ndata = len(data)   # number of data points
        self._model = modelfunc   # model function

    def log_likelihood(self, params):
        """
        The natural logarithm of the likelihood function.

        Args:
            params (dict): a dictionary keyed to the parameter names.

        Returns:
            float: the log likelihood value.
        """
        # extract the parameters
        m = params['m']
        c = params['c']

        # calculate the model
        model = self._model(x, m, c)

        # normalisation
        norm = -0.5*self._ndata*LN2PI - self._ndata*self._logsigma

        # chi-squared
        chisq = np.sum(((self._data - model)/(self._sigma))**2)

        return norm - 0.5*chisq

    def log_prior(self,p):
        """
        The natural logarithm of the prior function.

        Args:
            p (dict): a dictionary keyed to parameter names.

        Returns:
            float: The log prior value.
        """
        # the uniform priors are dealt with by just checking that we're within the bounds
        if not self.in_bounds(p): return -np.inf # check parameters are all in bounds

        # Gaussian prior on m
        lp = 0.
        m = p['m']
        lp -= 0.5 * ((m - self.mmu) / self.msigma)**2 # no need for normalisation constant on the prior

        return lp
Setting verbosity to 0

We can now run CPNest. As with Nestle we will use 1024 live points. CPNest can use mutltiple CPU cores, but we'll set it to use one to be comparable to the other tests. The number of iterations used for the MCMC sampling is automatically worked out within the code by calculating autocorrelation lengths to see how many iterations are needed to produce independent samples, but a maximum chain length can also be set. We'll set this to 1024. As with Nestle a stopping criterion is required, which for CPNest is defined in the same way as Nestle. The current CPNest default is 0.1 and this cannot be altered by the user.

In [53]:
%%capture
# import cpnest
import cpnest

print('CPNest version: {}'.format(cpnest.__version__))

nlive = 1024      # number of live point
maxmcmc = 1024    # maximum MCMC chain length
nthreads = 1      # use one CPU core

# set up the algorithm
work = cpnest.CPNest(
    StraightLineModel(data, x, straight_line, sigma), verbose=0,
    nthreads=nthreads, nlive=nlive, maxmcmc=maxmcmc
);
# run the algorithm
t0 = time()
work.run();
t1 = time()

timecpnest = (t1 - t0)
Setting verbosity to 0
2021-05-20, 09:32:04 - cpnest.cpnest.CPNest                  : Running with 1 parallel threads
Running with 1 parallel threads
Setting verbosity to 0
2021-05-20, 09:32:21 - cpnest.sampler.Sampler                : Sampler process 970: MCMC samples accumulated = 461865
Sampler process 970: MCMC samples accumulated = 461865
2021-05-20, 09:32:21 - cpnest.sampler.Sampler                : Sampler process 970 - mean acceptance 0.496: exiting
Sampler process 970 - mean acceptance 0.496: exiting
2021-05-20, 09:32:21 - cpnest.NestedSampling.NestedSampler   : Final evidence: -116.80
Final evidence: -116.80
2021-05-20, 09:32:21 - cpnest.NestedSampling.NestedSampler   : Information: 7.16
Information: 7.16
2021-05-20, 09:32:22 - cpnest.nest2pos                       : Computed log_evidences: (-116.80638721422837,)
Computed log_evidences: (-116.80638721422837,)
2021-05-20, 09:32:22 - cpnest.nest2pos                       : Relative weights of input files: [1.0]
Relative weights of input files: [1.0]
2021-05-20, 09:32:22 - cpnest.nest2pos                       : Relative weights of input files taking into account their length: [1.0]
Relative weights of input files taking into account their length: [1.0]
2021-05-20, 09:32:22 - cpnest.nest2pos                       : Number of input samples: [11764]
Number of input samples: [11764]
2021-05-20, 09:32:22 - cpnest.nest2pos                       : Expected number of samples from each input file [2754]
Expected number of samples from each input file [2754]
2021-05-20, 09:32:22 - cpnest.nest2pos                       : Samples produced: 2754
Samples produced: 2754
In [54]:
print("Time taken to run 'CPNest' is {} seconds".format(timecpnest))
Time taken to run 'CPNest' is 17.42618751525879 seconds

We can again extract the value of $\ln{Z}$ and its uncertainty. One simple bit of model comparison that we may want to do is to compare the evidence for the model of the data containg a straight line (with the given priors on the parameter), with the evidence for a model that the data just contains independent Gaussian noise with the given standard deviation. We can calculate the latter evidence easily by just setting the model to zero in the log-likelihood function, e.g.:

$$ \ln{Z}_{\rm noise} = -\frac{M}{2}\ln{(2\pi)} - \ln\left(\prod_{i=1}^M \sigma_i\right) - \sum_{i=1}^M\frac{d_i^2}{2\sigma_i^2}. $$

The natural logarithm of the Bayes factor comparing the two models is then:

$$ \ln{\mathcal{B}} = \ln{Z} - \ln{Z}_{\rm noise}. $$

We will also extract this value.

In [55]:
logZcpnest = work.NS.logZ                     # value of log Z
infogaincpnest = work.NS.state.info           # value of the information gain
logZerrcpnest = np.sqrt(infogaincpnest/nlive) # estimate of the statistcal uncertainty on logZ

# get the null log likelihood (evidence that the data is Gaussian noise with zero mean, and given standard devaition)
logZnull = work.user.log_likelihood({'m': 0., 'c': 0.})

print("log(Z) = {} ± {}".format(logZcpnest, logZerrcpnest))

# output the log Bayes factor
print('log Bayes factor is {}'.format(logZcpnest - logZnull))
log(Z) = -116.80257733898489 ± 0.08359347188542411
log Bayes factor is 2633.843652183457

CPNest will itself resample from the nested samples to give posterior samples held in a dictionary, which we can plot.

In [56]:
mchain_cpnest = work.posterior_samples['m']  # extract chain of m values
cchain_cpnest = work.posterior_samples['c']  # extract chain if c values
samples_cpnest = np.vstack((mchain_cpnest, cchain_cpnest)).T

# print out the number of posterior samples
print('Number of posterior samples is {}'.format(samples_cpnest.shape[0]))

resdict['mcpnest_mu'] = np.mean(samples_cpnest[:,0])      # mean of m samples
resdict['mcpnest_sig'] = np.std(samples_cpnest[:,0])      # standard deviation of m samples
resdict['ccpnest_mu'] = np.mean(samples_cpnest[:,1])      # mean of c samples
resdict['ccpnest_sig'] = np.std(samples_cpnest[:,1])      # standard deviation of c samples
resdict['cccpnest'] = np.corrcoef(samples_cpnest.T)[0,1]  # correlation coefficient between parameters
resdict['cpnest_npos'] = len(samples_cpnest)              # number of posterior samples
resdict['cpnest_time'] = timecpnest                       # run time
resdict['cpnest_logZ'] = logZcpnest                       # log marginalised likelihood
resdict['cpnest_logZerr'] = logZerrcpnest                 # uncertainty on log(Z)

# plot using corner.py
plotposts(samples_cpnest)
Number of posterior samples is 2754
In [57]:
from cpnest.nest2pos import compute_weights
_, weights = compute_weights(work.get_nested_samples()["logL"], work.nlive)
weights = np.exp(weights)

resdict['escpnest'] = ess(weights)
resdict['esscpnest'] = int(resdict['escpnest'] / timecpnest)
print("Effective samples per second: {}".format(resdict['esscpnest']))
Effective samples per second: 229

dynesty

dynesty (Speagle, 2019) is a pure Python-based implementation of the dynamic nested sampling algorithm (Higson et al, 2017). It is very similar in usage to Nestle. Dynamic nested sampling is a way of dynamically adapting the number of live points used to provide more accurate evidence estimates and/or larger numbers of independent posterior samples. The code can run using dynamic adjustment of live points or just in a "static" mode with a fixed number of live points. There a many options for how the new live points are drawn, both in terms of the bounds from which they are drawn and the sampling from within those bounds. The bound options include the using bounding ellipsoids in the style of MultiNest, or the RadFriends algorithm of UltraNest, among others. The sampling options include uniform draws from the bounds, random walks, or slice sampling in the style of PolyChord. If not explicitly specified, dynesty will select a sampling algorithm based on the number of dimensions of the problem being tackled.

dynesty is available on PyPI and can be installed with pip using:

pip install dynesty

or using Conda with:

conda install -c conda-forge dynesty

Note: Other dynamic nested sampling codes exist such as dyPolyChord and perfectns.

The likelihood function and prior transform can be set up in a way that is identical to Nestle (we'll define both again below for completeness), with the differences being in how the sampler is called. Here we will show the sampling being used for both "static" and "dynamic" nested sampling.

In [58]:
# import the inverse error function from scipy
from scipy.special import ndtri

def prior_transform(theta):
    """
    A function defining the tranform between the parameterisation in the unit hypercube
    to the true parameters.

    Args:
        theta (tuple): a tuple containing the parameters.
        
    Returns:
        tuple: a new tuple or array with the transformed parameters.
    """

    mprime, cprime = theta # unpack the parameters (in their unit hypercube form)

    cmin = -10.  # lower bound on uniform prior on c
    cmax = 10.   # upper bound on uniform prior on c

    mmu = 0.     # mean of Gaussian prior on m
    msigma = 10. # standard deviation of Gaussian prior on m

    m = mmu + msigma*ndtri(mprime) # convert back to m
    c = cprime*(cmax-cmin) + cmin  # convert back to c

    return (m, c)


# set the natural logarithm of 2pi, so that it doesn't have to be recalculated
LN2PI = np.log(2. * np.pi)
LNSIGMA = np.log(sigma) # natural log of the data noise standard deviation


def loglikelihood_dynesty(theta):
    """
    The log-likelihood function.
    """

    m, c = theta # unpack the parameters

    # normalisation
    norm = -0.5 * M * LN2PI - M * LNSIGMA

    # chi-squared (data, sigma and x are global variables defined early on in this notebook)
    chisq = np.sum(((data-straight_line(x, m, c))/sigma)**2)

    return norm - 0.5 * chisq

First we'll run with the "static" sampler. For consistency with Nestle and CPNest we'll use 1024 live points and a stopping criterion of 0.1. As with our Nestle example we will use the MultiNest-style bounding ellipsoids, but with uniform sampling based on the low-dimensionality of the problem.

In [59]:
# import dynesty
import dynesty

print('dynesty version: {}'.format(dynesty.__version__))

nlive = 1024      # number of live points
bound = 'multi'   # use MutliNest algorithm for bounds
ndims = 2         # two parameters
sample = 'unif'   # uniform sampling
tol = 0.1         # the stopping criterion

from dynesty import NestedSampler

sampler = NestedSampler(loglikelihood_dynesty, prior_transform, ndims,
                        bound=bound, sample=sample, nlive=nlive)

t0 = time()
sampler.run_nested(dlogz=tol, print_progress=False) # don't output progress bar
t1 = time()

timedynesty = (t1-t0)

print("Time taken to run 'dynesty' (in static mode) is {} seconds".format(timedynesty))
dynesty version: 1.1
Time taken to run 'dynesty' (in static mode) is 3.2678871154785156 seconds

We can extract the value of the marginal likelihood, $\ln{⁡Z}$, and its uncertainties with the following:

In [60]:
res = sampler.results # get results dictionary from sampler

logZdynesty = res.logz[-1]        # value of logZ
logZerrdynesty = res.logzerr[-1]  # estimate of the statistcal uncertainty on logZ

print("log(Z) = {} ± {}".format(logZdynesty, logZerrdynesty))
log(Z) = -116.66207844899319 ± 0.11680906514881793

Or, get a summary of this information using:

In [61]:
print(res.summary())
Summary
=======
nlive: 1024
niter: 10600
ncall: 50103
eff(%): 23.200
logz: -116.662 +/-  0.117
None

As with Nestle we need to draw posterior samples from the nested samples based on their prior weights.

In [62]:
# get function that resamples from the nested samples to give sampler with equal weight
from dynesty.utils import resample_equal

# draw posterior samples
weights = np.exp(res['logwt'] - res['logz'][-1])
samples_dynesty = resample_equal(res.samples, weights)

resdict['mdynesty_mu'] = np.mean(samples_dynesty[:,0])      # mean of m samples
resdict['mdynesty_sig'] = np.std(samples_dynesty[:,0])      # standard deviation of m samples
resdict['cdynesty_mu'] = np.mean(samples_dynesty[:,1])      # mean of c samples
resdict['cdynesty_sig'] = np.std(samples_dynesty[:,1])      # standard deviation of c samples
resdict['ccdynesty'] = np.corrcoef(samples_dynesty.T)[0,1]  # correlation coefficient between parameters
resdict['dynesty_npos'] = len(samples_dynesty)              # number of posterior samples
resdict['dynesty_time'] = timedynesty                       # run time
resdict['dynesty_logZ'] = logZdynesty                       # log marginalised likelihood
resdict['dynesty_logZerr'] = logZerrdynesty                 # uncertainty on log(Z)

print('Number of posterior samples is {}'.format(len(samples_dynesty)))

# plot using corner.py
plotposts(samples_dynesty)
Number of posterior samples is 11624
In [63]:
resdict['esdynesty'] = ess(np.exp(res["logwt"] - logZdynesty))
resdict['essdynesty'] = int(resdict['esdynesty'] / timedynesty)
print("Effective samples per second: {}".format(resdict['essdynesty']))
Effective samples per second: 1246

We can now do the same thing, but with the "dynamic" sampler. For this we can specify an initial number of live points (we will stick with 1024), but the actual number used will evolve. The stopping criterion for the dynamic method is more complex than for the static case, and we will just leave it at the default values.

In [64]:
# import the dynamic nested sampler
from dynesty import DynamicNestedSampler

# set up the sampler
dsampler = DynamicNestedSampler(loglikelihood_dynesty, prior_transform, ndims,
                                bound=bound, sample=sample)

# run the sampler (not that we have not included the dlogz tolerance)
t0 = time()
dsampler.run_nested(nlive_init=nlive, print_progress=False)
t1 = time()

timedynestydynamic = (t1-t0)

print("Time taken to run 'dynesty' (in dynamic mode) is {} seconds".format(timedynestydynamic))
Time taken to run 'dynesty' (in dynamic mode) is 115.42199993133545 seconds

Let's just compare the marginal likelihoods and numbers of posterior samples between the static and dynamic methods.

In [65]:
dres = dsampler.results # get results dictionary from sampler

logZdynestydynamic = dres.logz[-1]        # value of logZ
logZerrdynestydynamic = dres.logzerr[-1]  # estimate of the statistcal uncertainty on logZ

print("Static: log(Z) = {} ± {}".format(logZdynesty, logZerrdynesty))
print("Dynamic: log(Z) = {} ± {}".format(logZdynestydynamic, logZerrdynestydynamic))

# draw posterior samples
dweights = np.exp(dres['logwt'] - dres['logz'][-1])
samples_dynestydynamic = resample_equal(dres.samples, dweights)
essdynestydynamic = ess(np.exp(dres["logwt"] - logZdynestydynamic))
print('Static: num. posterior samples: {}'.format(len(samples_dynesty)))
print('Dynamic: num. posterior samples: {}'.format(len(samples_dynestydynamic)))
print('Static: effective sample size: {}'.format(resdict['esdynesty']))
print('Dynamic: effective sample size: {}'.format(essdynestydynamic))
Static: log(Z) = -116.66207844899319 ± 0.11680906514881793
Dynamic: log(Z) = -116.70461530428305 ± 0.11701110275760397
Static: num. posterior samples: 11624
Dynamic: num. posterior samples: 25519
Static: effective sample size: 4074
Dynamic: effective sample size: 16617

UltraNest

UltraNest is Python-based implementation of the nested sampling, that contains several sampling algorithms designed for correctness of sampling. One particular method it provides is called RadFriends (Buchner, 2014) that is designed to be robust against effects caused by inaccuracies of drawing new samples from the constrained prior.

UltraNest is available from its github repository, but can be installed form PyPi using

pip install ultranest

or using Conda with:

conda install -c conda-forge ultranest

Similarly to Nestle and dynesty, UltraNest samples from a unit hypercube, and therefore you have to defined a prior transformation function, so we can set up these functions in a very similar way to what we have seen above.

In [66]:
# define the prior transform

from scipy.special import ndtri

def prior_transform_ultranest(theta):
    """
    A function defining the tranform between the parameterisation in the unit hypercube
    to the true parameters.

    Args:
        theta (list): a list/array containing the parameters.
        
    Returns:
        list: a new list/array with the transformed parameters.
    """

    # unpack the parameters (in their unit hypercube form)
    mprime = theta[0]
    cprime = theta[1]

    cmin = -10.  # lower bound on uniform prior on c
    cmax = 10.   # upper bound on uniform prior on c

    mmu = 0.     # mean of Gaussian prior on m
    msigma = 10. # standard deviation of Gaussian prior on m

    m = mmu + msigma*ndtri(mprime) # convert back to m
    c = cprime*(cmax-cmin) + cmin  # convert back to c

    return np.array([m, c])

We can again define the likelihood in a very similay way to Nestle.

In [67]:
LN2PI = np.log(2.*np.pi)
LNSIGMA = np.log(sigma)

# define the Gaussian likelihood function
def loglikelihood_ultranest(theta):
    """
    The log-likelihood function.
    """

    # unpack the parameters
    m = theta[0]
    c = theta[1]

    # normalisation
    norm = -0.5*M*LN2PI - M*LNSIGMA

    # chi-squared (data, sigma and x are global variables defined early on in this notebook)
    chisq = np.sum(((data-straight_line(x, m, c))/sigma)**2)

    return norm - 0.5*chisq

To run UltraNest we'll use the ReactiveNestedSampler method. Unlike other nested sampling methods, this does not require you to specify the number of live points to use. It will automatically adjust the number as required, starting with a default minimum of 400.

In [68]:
import ultranest

print('UltraNest version: {}'.format(ultranest.__version__))

# set the ReactiveNestedSampler method
sampler = ultranest.ReactiveNestedSampler(["m", "c"], loglikelihood_ultranest, prior_transform_ultranest)

ndims = 2         # two parameters
tol = 0.1         # the stopping criterion

# run the nested sampling algorithm
t0 = time()
result = sampler.run(dlogz=tol)
t1 = time()

timeultranest = (t1 - t0)
ReactiveNestedSampler: dims=2+0, resume=False, log_dir=None, backend=hdf5, vectorized=False, nbootstraps=30, ndraw=128..65536
UltraNest version: 3.2.0
[ultranest] Sampling 400 live points from prior ...
Sampling 400 live points from prior ...
run_iter dlogz=0.1, dKL=0.5, frac_remain=0.01, Lepsilon=0.0010, min_ess=400
max_iters=-1, max_ncalls=-1, max_num_improvement_loops=-1, min_num_live_points=400, cluster_num_live_points=40
minimal_widths_sequence: [(-inf, 400.0), (inf, 400.0)]
Z=-inf(0.00%) | Like=-214244.82..-121.47 [-214244.8166..-138.2075] | it/evals=0/401 eff=0.0000% N=400 
iteration=0, ncalls=401, logz=-inf, remainder_fraction=100.0000%, Lmin=-214244.82, Lmax=-121.47
Z=-71571.6(0.00%) | Like=-71292.90..-121.47 [-214244.8166..-138.2075] | it/evals=40/442 eff=95.2381% N=400 
iteration=40, ncalls=442, logz=-71571.64, remainder_fraction=100.0000%, Lmin=-71292.90, Lmax=-121.47
Z=-49279.5(0.00%) | Like=-48421.64..-121.47 [-214244.8166..-138.2075] | it/evals=80/486 eff=93.0233% N=400 
iteration=80, ncalls=486, logz=-49279.46, remainder_fraction=100.0000%, Lmin=-48421.64, Lmax=-121.47
Z=-34150.3(0.00%) | Like=-33268.40..-121.47 [-214244.8166..-138.2075] | it/evals=120/532 eff=90.9091% N=400 
iteration=120, ncalls=532, logz=-34150.27, remainder_fraction=100.0000%, Lmin=-33268.40, Lmax=-121.47
Z=-24050.8(0.00%) | Like=-23889.22..-121.47 [-214244.8166..-138.2075] | it/evals=160/580 eff=88.8889% N=400 
iteration=160, ncalls=580, logz=-24050.80, remainder_fraction=100.0000%, Lmin=-23889.22, Lmax=-121.47
Z=-20478.2(0.00%) | Like=-20270.79..-121.47 [-214244.8166..-138.2075] | it/evals=180/604 eff=88.2353% N=400 
iteration=180, ncalls=604, logz=-20478.23, remainder_fraction=100.0000%, Lmin=-20270.79, Lmax=-121.47
Z=-17572.1(0.00%) | Like=-17356.63..-121.47 [-214244.8166..-138.2075] | it/evals=200/633 eff=85.8369% N=400 
iteration=200, ncalls=633, logz=-17572.08, remainder_fraction=100.0000%, Lmin=-17356.63, Lmax=-121.47
Z=-13340.1(0.00%) | Like=-12613.91..-121.47 [-214244.8166..-138.2075] | it/evals=240/685 eff=84.2105% N=400 
iteration=240, ncalls=685, logz=-13340.13, remainder_fraction=100.0000%, Lmin=-12613.91, Lmax=-121.47
Z=-10635.3(0.00%) | Like=-10625.24..-121.47 [-214244.8166..-138.2075] | it/evals=270/724 eff=83.3333% N=400 
iteration=270, ncalls=724, logz=-10635.29, remainder_fraction=100.0000%, Lmin=-10625.24, Lmax=-121.47
Z=-10120.7(0.00%) | Like=-10091.27..-121.47 [-214244.8166..-138.2075] | it/evals=280/738 eff=82.8402% N=400 
iteration=280, ncalls=738, logz=-10120.74, remainder_fraction=100.0000%, Lmin=-10091.27, Lmax=-121.47
Z=-8348.0(0.00%) | Like=-8296.66..-121.47 [-214244.8166..-138.2075] | it/evals=320/785 eff=83.1169% N=400 
iteration=320, ncalls=785, logz=-8348.02, remainder_fraction=100.0000%, Lmin=-8296.66, Lmax=-121.47
Z=-6675.8(0.00%) | Like=-6653.67..-121.47 [-214244.8166..-138.2075] | it/evals=360/831 eff=83.5267% N=400 
iteration=360, ncalls=831, logz=-6675.79, remainder_fraction=100.0000%, Lmin=-6653.67, Lmax=-121.47
Z=-5420.9(0.00%) | Like=-5386.20..-110.42 [-214244.8166..-138.2075] | it/evals=400/879 eff=83.5073% N=400 
iteration=400, ncalls=879, logz=-5420.90, remainder_fraction=100.0000%, Lmin=-5386.20, Lmax=-110.42
Z=-4446.5(0.00%) | Like=-4436.99..-110.42 [-214244.8166..-138.2075] | it/evals=440/923 eff=84.1300% N=400 
iteration=440, ncalls=923, logz=-4446.53, remainder_fraction=100.0000%, Lmin=-4436.99, Lmax=-110.42
Z=-4233.0(0.00%) | Like=-4221.09..-110.42 [-214244.8166..-138.2075] | it/evals=450/933 eff=84.4278% N=400 
iteration=450, ncalls=933, logz=-4233.02, remainder_fraction=100.0000%, Lmin=-4221.09, Lmax=-110.42
Z=-3521.6(0.00%) | Like=-3510.33..-110.42 [-214244.8166..-138.2075] | it/evals=480/970 eff=84.2105% N=400 
iteration=480, ncalls=970, logz=-3521.60, remainder_fraction=100.0000%, Lmin=-3510.33, Lmax=-110.42
Z=-2865.7(0.00%) | Like=-2854.28..-110.42 [-214244.8166..-138.2075] | it/evals=520/1015 eff=84.5528% N=400 
iteration=520, ncalls=1015, logz=-2865.67, remainder_fraction=100.0000%, Lmin=-2854.28, Lmax=-110.42
Z=-2699.3(0.00%) | Like=-2672.52..-110.42 [-214244.8166..-138.2075] | it/evals=540/1039 eff=84.5070% N=400 
iteration=540, ncalls=1039, logz=-2699.32, remainder_fraction=100.0000%, Lmin=-2672.52, Lmax=-110.42
Z=-2485.3(0.00%) | Like=-2464.80..-110.42 [-214244.8166..-138.2075] | it/evals=560/1065 eff=84.2105% N=400 
iteration=560, ncalls=1065, logz=-2485.25, remainder_fraction=100.0000%, Lmin=-2464.80, Lmax=-110.42
Z=-2128.8(0.00%) | Like=-2117.42..-109.44 [-214244.8166..-138.2075] | it/evals=600/1110 eff=84.5070% N=400 
iteration=600, ncalls=1110, logz=-2128.82, remainder_fraction=100.0000%, Lmin=-2117.42, Lmax=-109.44
Z=-1803.4(0.00%) | Like=-1793.74..-109.44 [-214244.8166..-138.2075] | it/evals=640/1157 eff=84.5443% N=400 
iteration=640, ncalls=1157, logz=-1803.38, remainder_fraction=100.0000%, Lmin=-1793.74, Lmax=-109.44
Z=-1498.3(0.00%) | Like=-1486.63..-109.44 [-214244.8166..-138.2075] | it/evals=680/1210 eff=83.9506% N=400 
iteration=680, ncalls=1210, logz=-1498.25, remainder_fraction=100.0000%, Lmin=-1486.63, Lmax=-109.44
Z=-1268.3(0.00%) | Like=-1257.22..-109.44 [-214244.8166..-138.2075] | it/evals=720/1259 eff=83.8184% N=400 
iteration=720, ncalls=1259, logz=-1268.30, remainder_fraction=100.0000%, Lmin=-1257.22, Lmax=-109.44
Z=-1073.3(0.00%) | Like=-1061.70..-108.73 [-214244.8166..-138.2075] | it/evals=760/1305 eff=83.9779% N=400 
iteration=760, ncalls=1305, logz=-1073.27, remainder_fraction=100.0000%, Lmin=-1061.70, Lmax=-108.73
Z=-938.3(0.00%) | Like=-917.40..-108.73 [-214244.8166..-138.2075] | it/evals=800/1353 eff=83.9454% N=400 
iteration=800, ncalls=1353, logz=-938.26, remainder_fraction=100.0000%, Lmin=-917.40, Lmax=-108.73
Z=-796.0(0.00%) | Like=-783.14..-108.73 [-214244.8166..-138.2075] | it/evals=840/1405 eff=83.5821% N=400 
iteration=840, ncalls=1405, logz=-795.99, remainder_fraction=100.0000%, Lmin=-783.14, Lmax=-108.73
Z=-677.1(0.00%) | Like=-668.93..-108.73 [-214244.8166..-138.2075] | it/evals=880/1458 eff=83.1758% N=400 
iteration=880, ncalls=1458, logz=-677.08, remainder_fraction=100.0000%, Lmin=-668.93, Lmax=-108.73
Z=-624.9(0.00%) | Like=-614.39..-108.73 [-214244.8166..-138.2075] | it/evals=900/1485 eff=82.9493% N=400 
iteration=900, ncalls=1485, logz=-624.87, remainder_fraction=100.0000%, Lmin=-614.39, Lmax=-108.73
Z=-572.9(0.00%) | Like=-559.25..-108.73 [-214244.8166..-138.2075] | it/evals=920/1508 eff=83.0325% N=400 
iteration=920, ncalls=1508, logz=-572.94, remainder_fraction=100.0000%, Lmin=-559.25, Lmax=-108.73
Z=-508.1(0.00%) | Like=-499.48..-108.73 [-214244.8166..-138.2075] | it/evals=960/1555 eff=83.1169% N=400 
iteration=960, ncalls=1555, logz=-508.13, remainder_fraction=100.0000%, Lmin=-499.48, Lmax=-108.73
Z=-469.3(0.00%) | Like=-460.40..-108.73 [-214244.8166..-138.2075] | it/evals=990/1591 eff=83.1234% N=400 
iteration=990, ncalls=1591, logz=-469.33, remainder_fraction=100.0000%, Lmin=-460.40, Lmax=-108.73
Z=-458.2(0.00%) | Like=-449.71..-108.73 [-214244.8166..-138.2075] | it/evals=1000/1602 eff=83.1947% N=400 
iteration=1000, ncalls=1602, logz=-458.17, remainder_fraction=100.0000%, Lmin=-449.71, Lmax=-108.73
Z=-409.3(0.00%) | Like=-400.45..-108.73 [-214244.8166..-138.2075] | it/evals=1040/1650 eff=83.2000% N=400 
iteration=1040, ncalls=1650, logz=-409.28, remainder_fraction=100.0000%, Lmin=-400.45, Lmax=-108.73
Z=-370.7(0.00%) | Like=-362.39..-108.73 [-214244.8166..-138.2075] | it/evals=1080/1699 eff=83.1409% N=400 
iteration=1080, ncalls=1699, logz=-370.71, remainder_fraction=100.0000%, Lmin=-362.39, Lmax=-108.73
Z=-329.1(0.00%) | Like=-318.48..-108.73 [-214244.8166..-138.2075] | it/evals=1120/1751 eff=82.9016% N=400 
iteration=1120, ncalls=1751, logz=-329.08, remainder_fraction=100.0000%, Lmin=-318.48, Lmax=-108.73
Z=-308.1(0.00%) | Like=-300.60..-108.73 [-214244.8166..-138.2075] | it/evals=1160/1810 eff=82.2695% N=400 
iteration=1160, ncalls=1810, logz=-308.11, remainder_fraction=100.0000%, Lmin=-300.60, Lmax=-108.73
Z=-305.8(0.00%) | Like=-298.52..-108.73 [-214244.8166..-138.2075] | it/evals=1170/1825 eff=82.1053% N=400 
iteration=1170, ncalls=1825, logz=-305.77, remainder_fraction=100.0000%, Lmin=-298.52, Lmax=-108.73
Z=-287.9(0.00%) | Like=-279.59..-108.73 [-214244.8166..-138.2075] | it/evals=1200/1862 eff=82.0793% N=400 
iteration=1200, ncalls=1862, logz=-287.90, remainder_fraction=100.0000%, Lmin=-279.59, Lmax=-108.73
Z=-271.9(0.00%) | Like=-263.82..-108.73 [-214244.8166..-138.2075] | it/evals=1240/1914 eff=81.9022% N=400 
iteration=1240, ncalls=1914, logz=-271.89, remainder_fraction=100.0000%, Lmin=-263.82, Lmax=-108.73
Z=-261.2(0.00%) | Like=-252.90..-108.73 [-214244.8166..-138.2075] | it/evals=1260/1943 eff=81.6591% N=400 
iteration=1260, ncalls=1943, logz=-261.18, remainder_fraction=100.0000%, Lmin=-252.90, Lmax=-108.73
Z=-251.6(0.00%) | Like=-242.91..-108.73 [-214244.8166..-138.2075] | it/evals=1280/1967 eff=81.6847% N=400 
iteration=1280, ncalls=1967, logz=-251.56, remainder_fraction=100.0000%, Lmin=-242.91, Lmax=-108.73
Z=-238.2(0.00%) | Like=-230.17..-108.73 [-214244.8166..-138.2075] | it/evals=1320/2022 eff=81.3810% N=400 
iteration=1320, ncalls=2022, logz=-238.20, remainder_fraction=100.0000%, Lmin=-230.17, Lmax=-108.73
Z=-230.4(0.00%) | Like=-222.06..-108.73 [-214244.8166..-138.2075] | it/evals=1350/2061 eff=81.2763% N=400 
iteration=1350, ncalls=2061, logz=-230.40, remainder_fraction=100.0000%, Lmin=-222.06, Lmax=-108.73
Z=-228.5(0.00%) | Like=-220.97..-108.73 [-214244.8166..-138.2075] | it/evals=1360/2072 eff=81.3397% N=400 
iteration=1360, ncalls=2072, logz=-228.54, remainder_fraction=100.0000%, Lmin=-220.97, Lmax=-108.73
Z=-218.8(0.00%) | Like=-210.00..-108.73 [-214244.8166..-138.2075] | it/evals=1400/2121 eff=81.3481% N=400 
iteration=1400, ncalls=2121, logz=-218.78, remainder_fraction=100.0000%, Lmin=-210.00, Lmax=-108.73
Z=-208.7(0.00%) | Like=-200.12..-108.73 [-214244.8166..-138.2075] | it/evals=1440/2168 eff=81.4480% N=400 
iteration=1440, ncalls=2168, logz=-208.73, remainder_fraction=100.0000%, Lmin=-200.12, Lmax=-108.73
Z=-199.1(0.00%) | Like=-191.20..-108.73 [-214244.8166..-138.2075] | it/evals=1480/2217 eff=81.4529% N=400 
iteration=1480, ncalls=2217, logz=-199.13, remainder_fraction=100.0000%, Lmin=-191.20, Lmax=-108.73
Z=-192.1(0.00%) | Like=-183.70..-108.73 [-214244.8166..-138.2075] | it/evals=1520/2275 eff=81.0667% N=400 
iteration=1520, ncalls=2275, logz=-192.09, remainder_fraction=100.0000%, Lmin=-183.70, Lmax=-108.73
Z=-184.4(0.00%) | Like=-176.16..-108.73 [-214244.8166..-138.2075] | it/evals=1560/2330 eff=80.8290% N=400 
iteration=1560, ncalls=2330, logz=-184.38, remainder_fraction=100.0000%, Lmin=-176.16, Lmax=-108.73
Z=-177.7(0.00%) | Like=-169.48..-108.73 [-214244.8166..-138.2075] | it/evals=1600/2384 eff=80.6452% N=400 
iteration=1600, ncalls=2384, logz=-177.65, remainder_fraction=100.0000%, Lmin=-169.48, Lmax=-108.73
Z=-174.9(0.00%) | Like=-167.22..-108.73 [-214244.8166..-138.2075] | it/evals=1620/2410 eff=80.5970% N=400 
iteration=1620, ncalls=2410, logz=-174.95, remainder_fraction=100.0000%, Lmin=-167.22, Lmax=-108.73
Z=-172.9(0.00%) | Like=-165.06..-108.73 [-214244.8166..-138.2075] | it/evals=1640/2431 eff=80.7484% N=400 
iteration=1640, ncalls=2431, logz=-172.91, remainder_fraction=100.0000%, Lmin=-165.06, Lmax=-108.73
Z=-168.0(0.00%) | Like=-160.02..-108.73 [-214244.8166..-138.2075] | it/evals=1680/2479 eff=80.8081% N=400 
iteration=1680, ncalls=2479, logz=-168.03, remainder_fraction=100.0000%, Lmin=-160.02, Lmax=-108.73
Z=-164.9(0.00%) | Like=-157.02..-108.73 [-214244.8166..-138.2075] | it/evals=1710/2519 eff=80.6984% N=400 
iteration=1710, ncalls=2519, logz=-164.88, remainder_fraction=100.0000%, Lmin=-157.02, Lmax=-108.73
Z=-164.0(0.00%) | Like=-156.08..-108.73 [-214244.8166..-138.2075] | it/evals=1720/2532 eff=80.6754% N=400 
iteration=1720, ncalls=2532, logz=-163.97, remainder_fraction=100.0000%, Lmin=-156.08, Lmax=-108.73
Z=-159.5(0.00%) | Like=-151.55..-108.73 [-214244.8166..-138.2075] | it/evals=1760/2582 eff=80.6599% N=400 
iteration=1760, ncalls=2582, logz=-159.46, remainder_fraction=100.0000%, Lmin=-151.55, Lmax=-108.73
Z=-155.0(0.00%) | Like=-146.89..-108.73 [-214244.8166..-138.2075] | it/evals=1800/2628 eff=80.7899% N=400 
iteration=1800, ncalls=2628, logz=-155.00, remainder_fraction=100.0000%, Lmin=-146.89, Lmax=-108.73
Z=-151.0(0.00%) | Like=-142.93..-108.73 [-214244.8166..-138.2075] | it/evals=1840/2681 eff=80.6664% N=400 
iteration=1840, ncalls=2681, logz=-151.02, remainder_fraction=100.0000%, Lmin=-142.93, Lmax=-108.73
Z=-147.8(0.00%) | Like=-139.59..-108.73 [-214244.8166..-138.2075] | it/evals=1880/2730 eff=80.6867% N=400 
iteration=1880, ncalls=2730, logz=-147.85, remainder_fraction=100.0000%, Lmin=-139.59, Lmax=-108.73
Z=-144.9(0.00%) | Like=-136.68..-108.73 [-138.1550..-113.0528] | it/evals=1920/2782 eff=80.6045% N=400 
iteration=1920, ncalls=2782, logz=-144.95, remainder_fraction=100.0000%, Lmin=-136.68, Lmax=-108.73
Z=-142.2(0.00%) | Like=-134.36..-108.73 [-138.1550..-113.0528] | it/evals=1960/2836 eff=80.4598% N=400 
iteration=1960, ncalls=2836, logz=-142.23, remainder_fraction=100.0000%, Lmin=-134.36, Lmax=-108.73
Z=-141.3(0.00%) | Like=-133.17..-108.70 [-138.1550..-113.0528] | it/evals=1980/2865 eff=80.3245% N=400 
iteration=1980, ncalls=2865, logz=-141.25, remainder_fraction=100.0000%, Lmin=-133.17, Lmax=-108.70
Z=-140.3(0.00%) | Like=-132.43..-108.70 [-138.1550..-113.0528] | it/evals=2000/2888 eff=80.3859% N=400 
iteration=2000, ncalls=2888, logz=-140.32, remainder_fraction=100.0000%, Lmin=-132.43, Lmax=-108.70
Z=-138.7(0.00%) | Like=-130.75..-108.70 [-138.1550..-113.0528] | it/evals=2040/2931 eff=80.6006% N=400 
iteration=2040, ncalls=2931, logz=-138.68, remainder_fraction=100.0000%, Lmin=-130.75, Lmax=-108.70
Z=-137.1(0.00%) | Like=-128.82..-108.70 [-138.1550..-113.0528] | it/evals=2070/2967 eff=80.6389% N=400 
iteration=2070, ncalls=2967, logz=-137.09, remainder_fraction=100.0000%, Lmin=-128.82, Lmax=-108.70
Z=-136.6(0.00%) | Like=-128.28..-108.70 [-138.1550..-113.0528] | it/evals=2080/2977 eff=80.7140% N=400 
iteration=2080, ncalls=2977, logz=-136.57, remainder_fraction=100.0000%, Lmin=-128.28, Lmax=-108.70
Z=-134.3(0.00%) | Like=-126.12..-108.70 [-138.1550..-113.0528] | it/evals=2120/3024 eff=80.7927% N=400 
iteration=2120, ncalls=3024, logz=-134.29, remainder_fraction=100.0000%, Lmin=-126.12, Lmax=-108.70
Z=-132.8(0.00%) | Like=-124.77..-108.70 [-138.1550..-113.0528] | it/evals=2160/3090 eff=80.2974% N=400 
iteration=2160, ncalls=3090, logz=-132.79, remainder_fraction=100.0000%, Lmin=-124.77, Lmax=-108.70
Z=-131.4(0.00%) | Like=-123.23..-108.70 [-138.1550..-113.0528] | it/evals=2200/3139 eff=80.3213% N=400 
iteration=2200, ncalls=3139, logz=-131.41, remainder_fraction=100.0000%, Lmin=-123.23, Lmax=-108.70
Z=-129.9(0.00%) | Like=-121.64..-108.70 [-138.1550..-113.0528] | it/evals=2240/3189 eff=80.3155% N=400 
iteration=2240, ncalls=3189, logz=-129.93, remainder_fraction=99.9998%, Lmin=-121.64, Lmax=-108.70
Z=-129.6(0.00%) | Like=-121.42..-108.70 [-138.1550..-113.0528] | it/evals=2250/3201 eff=80.3285% N=400 
iteration=2250, ncalls=3201, logz=-129.59, remainder_fraction=99.9997%, Lmin=-121.42, Lmax=-108.70
Z=-128.6(0.00%) | Like=-120.35..-108.70 [-138.1550..-113.0528] | it/evals=2280/3236 eff=80.3949% N=400 
iteration=2280, ncalls=3236, logz=-128.63, remainder_fraction=99.9993%, Lmin=-120.35, Lmax=-108.70
Z=-127.4(0.00%) | Like=-119.19..-108.67 [-138.1550..-113.0528] | it/evals=2320/3286 eff=80.3881% N=400 
iteration=2320, ncalls=3286, logz=-127.44, remainder_fraction=99.9977%, Lmin=-119.19, Lmax=-108.67
Z=-126.9(0.00%) | Like=-118.69..-108.67 [-138.1550..-113.0528] | it/evals=2340/3313 eff=80.3296% N=400 
iteration=2340, ncalls=3313, logz=-126.91, remainder_fraction=99.9960%, Lmin=-118.69, Lmax=-108.67
Z=-126.4(0.01%) | Like=-118.26..-108.67 [-138.1550..-113.0528] | it/evals=2360/3338 eff=80.3268% N=400 
iteration=2360, ncalls=3338, logz=-126.42, remainder_fraction=99.9934%, Lmin=-118.26, Lmax=-108.67
Z=-125.6(0.02%) | Like=-117.28..-108.67 [-138.1550..-113.0528] | it/evals=2400/3385 eff=80.4020% N=400 
iteration=2400, ncalls=3385, logz=-125.55, remainder_fraction=99.9833%, Lmin=-117.28, Lmax=-108.67
Z=-124.9(0.03%) | Like=-116.85..-108.67 [-138.1550..-113.0528] | it/evals=2430/3425 eff=80.3306% N=400 
iteration=2430, ncalls=3425, logz=-124.92, remainder_fraction=99.9686%, Lmin=-116.85, Lmax=-108.67
Z=-124.7(0.04%) | Like=-116.66..-108.67 [-138.1550..-113.0528] | it/evals=2440/3437 eff=80.3424% N=400 
iteration=2440, ncalls=3437, logz=-124.73, remainder_fraction=99.9619%, Lmin=-116.66, Lmax=-108.67
Z=-124.1(0.07%) | Like=-115.95..-108.67 [-138.1550..-113.0528] | it/evals=2480/3489 eff=80.2849% N=400 
iteration=2480, ncalls=3489, logz=-124.07, remainder_fraction=99.9254%, Lmin=-115.95, Lmax=-108.67
Z=-123.4(0.14%) | Like=-115.25..-108.67 [-138.1550..-113.0528] | it/evals=2520/3546 eff=80.1017% N=400 
iteration=2520, ncalls=3546, logz=-123.40, remainder_fraction=99.8553%, Lmin=-115.25, Lmax=-108.67
Z=-122.8(0.27%) | Like=-114.66..-108.67 [-138.1550..-113.0528] | it/evals=2560/3592 eff=80.2005% N=400 
iteration=2560, ncalls=3592, logz=-122.80, remainder_fraction=99.7339%, Lmin=-114.66, Lmax=-108.67
Z=-122.2(0.48%) | Like=-114.11..-108.67 [-138.1550..-113.0528] | it/evals=2600/3642 eff=80.1974% N=400 
iteration=2600, ncalls=3642, logz=-122.24, remainder_fraction=99.5166%, Lmin=-114.11, Lmax=-108.67
Z=-122.1(0.56%) | Like=-113.96..-108.67 [-138.1550..-113.0528] | it/evals=2610/3658 eff=80.1105% N=400 
iteration=2610, ncalls=3658, logz=-122.11, remainder_fraction=99.4417%, Lmin=-113.96, Lmax=-108.67
Z=-121.8(0.78%) | Like=-113.69..-108.67 [-138.1550..-113.0528] | it/evals=2640/3693 eff=80.1701% N=400 
iteration=2640, ncalls=3693, logz=-121.77, remainder_fraction=99.2169%, Lmin=-113.69, Lmax=-108.67
Z=-121.3(1.24%) | Like=-113.11..-108.67 [-138.1550..-113.0528] | it/evals=2680/3739 eff=80.2636% N=400 
iteration=2680, ncalls=3739, logz=-121.33, remainder_fraction=98.7617%, Lmin=-113.11, Lmax=-108.67
Z=-121.1(1.55%) | Like=-112.92..-108.67 [-113.0496..-111.6313] | it/evals=2700/3764 eff=80.2616% N=400 
iteration=2700, ncalls=3764, logz=-121.11, remainder_fraction=98.4532%, Lmin=-112.92, Lmax=-108.67
Z=-120.9(1.86%) | Like=-112.72..-108.67 [-113.0496..-111.6313] | it/evals=2720/3786 eff=80.3308% N=400 
iteration=2720, ncalls=3786, logz=-120.91, remainder_fraction=98.1426%, Lmin=-112.72, Lmax=-108.67
Z=-120.5(2.66%) | Like=-112.38..-108.67 [-113.0496..-111.6313] | it/evals=2760/3835 eff=80.3493% N=400 
iteration=2760, ncalls=3835, logz=-120.53, remainder_fraction=97.3397%, Lmin=-112.38, Lmax=-108.67
Z=-120.3(3.39%) | Like=-112.13..-108.67 [-113.0496..-111.6313] | it/evals=2790/3872 eff=80.3571% N=400 
iteration=2790, ncalls=3872, logz=-120.28, remainder_fraction=96.6144%, Lmin=-112.13, Lmax=-108.67
Z=-120.2(3.71%) | Like=-112.04..-108.67 [-113.0496..-111.6313] | it/evals=2800/3884 eff=80.3674% N=400 
iteration=2800, ncalls=3884, logz=-120.20, remainder_fraction=96.2903%, Lmin=-112.04, Lmax=-108.67
Z=-120.2(3.74%) | Like=-112.03..-108.67 [-113.0496..-111.6313] | it/evals=2801/3885 eff=80.3730% N=400 
iteration=2801, ncalls=3885, logz=-120.19, remainder_fraction=96.2559%, Lmin=-112.03, Lmax=-108.67
Z=-119.9(4.99%) | Like=-111.69..-108.67 [-113.0496..-111.6313] | it/evals=2840/3930 eff=80.4533% N=400 
iteration=2840, ncalls=3930, logz=-119.89, remainder_fraction=95.0141%, Lmin=-111.69, Lmax=-108.67
Z=-119.6(6.49%) | Like=-111.41..-108.67 [-111.6293..-111.2050] | it/evals=2880/3985 eff=80.3347% N=400 
iteration=2880, ncalls=3985, logz=-119.61, remainder_fraction=93.5106%, Lmin=-111.41, Lmax=-108.67
Z=-119.4(8.18%) | Like=-111.14..-108.67 [-111.2035..-111.0319] | it/evals=2920/4036 eff=80.3080% N=400 
iteration=2920, ncalls=4036, logz=-119.35, remainder_fraction=91.8223%, Lmin=-111.14, Lmax=-108.67
Z=-119.1(10.45%) | Like=-110.88..-108.67 [-110.8782..-110.8709]*| it/evals=2960/4093 eff=80.1516% N=400 
iteration=2960, ncalls=4093, logz=-119.11, remainder_fraction=89.5535%, Lmin=-110.88, Lmax=-108.67
Z=-119.1(11.01%) | Like=-110.82..-108.67 [-110.8219..-110.8198]*| it/evals=2970/4110 eff=80.0539% N=400 
iteration=2970, ncalls=4110, logz=-119.05, remainder_fraction=88.9883%, Lmin=-110.82, Lmax=-108.67
Z=-118.9(12.93%) | Like=-110.65..-108.67 [-110.6542..-110.6499]*| it/evals=3000/4144 eff=80.1282% N=400 
iteration=3000, ncalls=4144, logz=-118.89, remainder_fraction=87.0707%, Lmin=-110.65, Lmax=-108.67
Z=-118.7(15.82%) | Like=-110.46..-108.67 [-110.4569..-110.4564]*| it/evals=3040/4198 eff=80.0421% N=400 
iteration=3040, ncalls=4198, logz=-118.69, remainder_fraction=84.1770%, Lmin=-110.46, Lmax=-108.67
Z=-118.6(17.35%) | Like=-110.35..-108.67 [-110.3460..-110.3453]*| it/evals=3060/4224 eff=80.0209% N=400 
iteration=3060, ncalls=4224, logz=-118.60, remainder_fraction=82.6463%, Lmin=-110.35, Lmax=-108.67
Z=-118.5(18.92%) | Like=-110.27..-108.67 [-110.2710..-110.2681]*| it/evals=3080/4249 eff=80.0208% N=400 
iteration=3080, ncalls=4249, logz=-118.51, remainder_fraction=81.0778%, Lmin=-110.27, Lmax=-108.67
Z=-118.4(22.13%) | Like=-110.12..-108.67 [-110.1276..-110.1166] | it/evals=3120/4300 eff=80.0000% N=400 
iteration=3120, ncalls=4300, logz=-118.35, remainder_fraction=77.8722%, Lmin=-110.12, Lmax=-108.67
Z=-118.2(24.61%) | Like=-110.01..-108.67 [-110.0101..-110.0042]*| it/evals=3150/4335 eff=80.0508% N=400 
iteration=3150, ncalls=4335, logz=-118.24, remainder_fraction=75.3853%, Lmin=-110.01, Lmax=-108.67
Z=-118.2(25.44%) | Like=-109.97..-108.67 [-109.9869..-109.9747] | it/evals=3160/4345 eff=80.1014% N=400 
iteration=3160, ncalls=4345, logz=-118.20, remainder_fraction=74.5642%, Lmin=-109.97, Lmax=-108.67
Z=-118.1(29.06%) | Like=-109.84..-108.67 [-109.8379..-109.8368]*| it/evals=3200/4391 eff=80.1804% N=400 
iteration=3200, ncalls=4391, logz=-118.07, remainder_fraction=70.9392%, Lmin=-109.84, Lmax=-108.67
Z=-118.0(32.75%) | Like=-109.74..-108.67 [-109.7368..-109.7303]*| it/evals=3240/4443 eff=80.1385% N=400 
iteration=3240, ncalls=4443, logz=-117.95, remainder_fraction=67.2473%, Lmin=-109.74, Lmax=-108.67
Z=-117.8(36.29%) | Like=-109.63..-108.67 [-109.6312..-109.6309]*| it/evals=3280/4487 eff=80.2545% N=400 
iteration=3280, ncalls=4487, logz=-117.85, remainder_fraction=63.7075%, Lmin=-109.63, Lmax=-108.67
Z=-117.7(39.95%) | Like=-109.53..-108.67 [-109.5301..-109.5300]*| it/evals=3320/4533 eff=80.3291% N=400 
iteration=3320, ncalls=4533, logz=-117.75, remainder_fraction=60.0528%, Lmin=-109.53, Lmax=-108.67
Z=-117.7(43.49%) | Like=-109.44..-108.66 [-109.4413..-109.4368]*| it/evals=3360/4585 eff=80.2867% N=400 
iteration=3360, ncalls=4585, logz=-117.66, remainder_fraction=56.5084%, Lmin=-109.44, Lmax=-108.66
Z=-117.6(47.04%) | Like=-109.36..-108.66 [-109.3590..-109.3566]*| it/evals=3400/4643 eff=80.1320% N=400 
iteration=3400, ncalls=4643, logz=-117.58, remainder_fraction=52.9566%, Lmin=-109.36, Lmax=-108.66
Z=-117.5(48.86%) | Like=-109.33..-108.66 [-109.3313..-109.3263]*| it/evals=3420/4677 eff=79.9626% N=400 
iteration=3420, ncalls=4677, logz=-117.55, remainder_fraction=51.1403%, Lmin=-109.33, Lmax=-108.66
Z=-117.5(50.51%) | Like=-109.29..-108.66 [-109.2872..-109.2856]*| it/evals=3440/4702 eff=79.9628% N=400 
iteration=3440, ncalls=4702, logz=-117.51, remainder_fraction=49.4859%, Lmin=-109.29, Lmax=-108.66
Z=-117.4(54.10%) | Like=-109.21..-108.66 [-109.2115..-109.2113]*| it/evals=3480/4754 eff=79.9265% N=400 
iteration=3480, ncalls=4754, logz=-117.44, remainder_fraction=45.9015%, Lmin=-109.21, Lmax=-108.66
Z=-117.4(57.38%) | Like=-109.16..-108.66 [-109.1610..-109.1599]*| it/evals=3520/4804 eff=79.9273% N=400 
iteration=3520, ncalls=4804, logz=-117.39, remainder_fraction=42.6234%, Lmin=-109.16, Lmax=-108.66
Z=-117.3(60.62%) | Like=-109.11..-108.66 [-109.1082..-109.1074]*| it/evals=3560/4856 eff=79.8923% N=400 
iteration=3560, ncalls=4856, logz=-117.33, remainder_fraction=39.3809%, Lmin=-109.11, Lmax=-108.66
Z=-117.3(63.63%) | Like=-109.06..-108.66 [-109.0610..-109.0609]*| it/evals=3600/4912 eff=79.7872% N=400 
iteration=3600, ncalls=4912, logz=-117.28, remainder_fraction=36.3731%, Lmin=-109.06, Lmax=-108.66
Z=-117.2(66.45%) | Like=-109.01..-108.66 [-109.0143..-109.0140]*| it/evals=3640/4963 eff=79.7721% N=400 
iteration=3640, ncalls=4963, logz=-117.24, remainder_fraction=33.5506%, Lmin=-109.01, Lmax=-108.66
Z=-117.2(69.16%) | Like=-108.98..-108.66 [-108.9832..-108.9831]*| it/evals=3680/5033 eff=79.4302% N=400 
iteration=3680, ncalls=5033, logz=-117.20, remainder_fraction=30.8439%, Lmin=-108.98, Lmax=-108.66
Z=-117.2(69.82%) | Like=-108.97..-108.66 [-108.9714..-108.9707]*| it/evals=3690/5048 eff=79.3890% N=400 
iteration=3690, ncalls=5048, logz=-117.19, remainder_fraction=30.1807%, Lmin=-108.97, Lmax=-108.66
Z=-117.2(71.70%) | Like=-108.94..-108.66 [-108.9352..-108.9348]*| it/evals=3720/5085 eff=79.4023% N=400 
iteration=3720, ncalls=5085, logz=-117.16, remainder_fraction=28.3038%, Lmin=-108.94, Lmax=-108.66
Z=-117.1(74.09%) | Like=-108.91..-108.66 [-108.9107..-108.9107]*| it/evals=3760/5133 eff=79.4422% N=400 
iteration=3760, ncalls=5133, logz=-117.13, remainder_fraction=25.9072%, Lmin=-108.91, Lmax=-108.66
Z=-117.1(76.26%) | Like=-108.89..-108.66 [-108.8884..-108.8884]*| it/evals=3800/5184 eff=79.4314% N=400 
iteration=3800, ncalls=5184, logz=-117.10, remainder_fraction=23.7406%, Lmin=-108.89, Lmax=-108.66
Z=-117.1(78.30%) | Like=-108.86..-108.66 [-108.8641..-108.8633]*| it/evals=3840/5241 eff=79.3225% N=400 
iteration=3840, ncalls=5241, logz=-117.08, remainder_fraction=21.6956%, Lmin=-108.86, Lmax=-108.66
Z=-117.1(79.74%) | Like=-108.85..-108.66 [-108.8499..-108.8497]*| it/evals=3870/5292 eff=79.1087% N=400 
iteration=3870, ncalls=5292, logz=-117.06, remainder_fraction=20.2588%, Lmin=-108.85, Lmax=-108.66
Z=-117.1(80.19%) | Like=-108.85..-108.66 [-108.8453..-108.8447]*| it/evals=3880/5303 eff=79.1352% N=400 
iteration=3880, ncalls=5303, logz=-117.05, remainder_fraction=19.8059%, Lmin=-108.85, Lmax=-108.66
Z=-117.0(81.92%) | Like=-108.83..-108.66 [-108.8260..-108.8257]*| it/evals=3920/5351 eff=79.1759% N=400 
iteration=3920, ncalls=5351, logz=-117.03, remainder_fraction=18.0752%, Lmin=-108.83, Lmax=-108.66
Z=-117.0(83.53%) | Like=-108.81..-108.66 [-108.8111..-108.8110]*| it/evals=3960/5400 eff=79.2000% N=400 
iteration=3960, ncalls=5400, logz=-117.01, remainder_fraction=16.4700%, Lmin=-108.81, Lmax=-108.66
Z=-117.0(84.99%) | Like=-108.80..-108.66 [-108.7960..-108.7958]*| it/evals=4000/5455 eff=79.1296% N=400 
iteration=4000, ncalls=5455, logz=-117.00, remainder_fraction=15.0120%, Lmin=-108.80, Lmax=-108.66
Z=-117.0(86.34%) | Like=-108.79..-108.66 [-108.7853..-108.7849]*| it/evals=4040/5514 eff=78.9988% N=400 
iteration=4040, ncalls=5514, logz=-116.98, remainder_fraction=13.6596%, Lmin=-108.79, Lmax=-108.66
Z=-117.0(86.65%) | Like=-108.78..-108.66 [-108.7813..-108.7806]*| it/evals=4050/5531 eff=78.9320% N=400 
iteration=4050, ncalls=5531, logz=-116.98, remainder_fraction=13.3458%, Lmin=-108.78, Lmax=-108.66
Z=-117.0(87.57%) | Like=-108.77..-108.66 [-108.7729..-108.7722]*| it/evals=4080/5567 eff=78.9626% N=400 
iteration=4080, ncalls=5567, logz=-116.97, remainder_fraction=12.4291%, Lmin=-108.77, Lmax=-108.66
Z=-117.0(88.70%) | Like=-108.76..-108.66 [-108.7632..-108.7631]*| it/evals=4120/5613 eff=79.0332% N=400 
iteration=4120, ncalls=5613, logz=-116.95, remainder_fraction=11.2961%, Lmin=-108.76, Lmax=-108.66
Z=-116.9(89.73%) | Like=-108.75..-108.66 [-108.7536..-108.7534]*| it/evals=4160/5669 eff=78.9524% N=400 
iteration=4160, ncalls=5669, logz=-116.94, remainder_fraction=10.2674%, Lmin=-108.75, Lmax=-108.66
Z=-116.9(90.67%) | Like=-108.74..-108.66 [-108.7441..-108.7437]*| it/evals=4200/5727 eff=78.8436% N=400 
iteration=4200, ncalls=5727, logz=-116.93, remainder_fraction=9.3280%, Lmin=-108.74, Lmax=-108.66
Z=-116.9(91.32%) | Like=-108.74..-108.66 [-108.7391..-108.7388]*| it/evals=4230/5767 eff=78.8150% N=400 
iteration=4230, ncalls=5767, logz=-116.92, remainder_fraction=8.6791%, Lmin=-108.74, Lmax=-108.66
Z=-116.9(91.53%) | Like=-108.74..-108.66 [-108.7375..-108.7373]*| it/evals=4240/5780 eff=78.8104% N=400 
iteration=4240, ncalls=5780, logz=-116.92, remainder_fraction=8.4716%, Lmin=-108.74, Lmax=-108.66
Z=-116.9(92.31%) | Like=-108.73..-108.66 [-108.7307..-108.7307]*| it/evals=4280/5828 eff=78.8504% N=400 
iteration=4280, ncalls=5828, logz=-116.91, remainder_fraction=7.6899%, Lmin=-108.73, Lmax=-108.66
Z=-116.9(93.02%) | Like=-108.73..-108.66 [-108.7260..-108.7256]*| it/evals=4320/5889 eff=78.7029% N=400 
iteration=4320, ncalls=5889, logz=-116.91, remainder_fraction=6.9804%, Lmin=-108.73, Lmax=-108.66
Z=-116.9(93.67%) | Like=-108.72..-108.66 [-108.7194..-108.7193]*| it/evals=4360/5942 eff=78.6720% N=400 
iteration=4360, ncalls=5942, logz=-116.90, remainder_fraction=6.3300%, Lmin=-108.72, Lmax=-108.66
Z=-116.9(94.26%) | Like=-108.72..-108.66 [-108.7155..-108.7150]*| it/evals=4400/5987 eff=78.7543% N=400 
iteration=4400, ncalls=5987, logz=-116.89, remainder_fraction=5.7423%, Lmin=-108.72, Lmax=-108.66
Z=-116.9(94.40%) | Like=-108.71..-108.66 [-108.7137..-108.7136]*| it/evals=4410/6000 eff=78.7500% N=400 
iteration=4410, ncalls=6000, logz=-116.89, remainder_fraction=5.6033%, Lmin=-108.71, Lmax=-108.66
Z=-116.9(94.79%) | Like=-108.71..-108.66 [-108.7098..-108.7098]*| it/evals=4440/6034 eff=78.8072% N=400 
iteration=4440, ncalls=6034, logz=-116.89, remainder_fraction=5.2095%, Lmin=-108.71, Lmax=-108.66
Z=-116.9(95.28%) | Like=-108.71..-108.66 [-108.7063..-108.7061]*| it/evals=4480/6085 eff=78.8039% N=400 
iteration=4480, ncalls=6085, logz=-116.88, remainder_fraction=4.7237%, Lmin=-108.71, Lmax=-108.66
Z=-116.9(95.50%) | Like=-108.70..-108.66 [-108.7042..-108.7040]*| it/evals=4500/6109 eff=78.8229% N=400 
iteration=4500, ncalls=6109, logz=-116.88, remainder_fraction=4.4990%, Lmin=-108.70, Lmax=-108.66
Z=-116.9(95.72%) | Like=-108.70..-108.66 [-108.7021..-108.7019]*| it/evals=4520/6135 eff=78.8143% N=400 
iteration=4520, ncalls=6135, logz=-116.88, remainder_fraction=4.2840%, Lmin=-108.70, Lmax=-108.66
Z=-116.9(96.12%) | Like=-108.70..-108.66 [-108.6985..-108.6984]*| it/evals=4560/6182 eff=78.8654% N=400 
iteration=4560, ncalls=6182, logz=-116.87, remainder_fraction=3.8838%, Lmin=-108.70, Lmax=-108.66
Z=-116.9(96.39%) | Like=-108.70..-108.66 [-108.6959..-108.6958]*| it/evals=4590/6217 eff=78.9067% N=400 
iteration=4590, ncalls=6217, logz=-116.87, remainder_fraction=3.6070%, Lmin=-108.70, Lmax=-108.66
Z=-116.9(96.48%) | Like=-108.70..-108.66 [-108.6953..-108.6953]*| it/evals=4600/6227 eff=78.9429% N=400 
iteration=4600, ncalls=6227, logz=-116.87, remainder_fraction=3.5193%, Lmin=-108.70, Lmax=-108.66
Z=-116.9(96.81%) | Like=-108.69..-108.66 [-108.6927..-108.6927]*| it/evals=4640/6283 eff=78.8713% N=400 
iteration=4640, ncalls=6283, logz=-116.87, remainder_fraction=3.1895%, Lmin=-108.69, Lmax=-108.66
Z=-116.9(97.11%) | Like=-108.69..-108.66 [-108.6894..-108.6894]*| it/evals=4680/6338 eff=78.8144% N=400 
iteration=4680, ncalls=6338, logz=-116.86, remainder_fraction=2.8903%, Lmin=-108.69, Lmax=-108.66
Z=-116.9(97.38%) | Like=-108.69..-108.66 [-108.6872..-108.6872]*| it/evals=4720/6388 eff=78.8243% N=400 
iteration=4720, ncalls=6388, logz=-116.86, remainder_fraction=2.6181%, Lmin=-108.69, Lmax=-108.66
Z=-116.9(97.63%) | Like=-108.68..-108.66 [-108.6850..-108.6848]*| it/evals=4760/6435 eff=78.8732% N=400 
iteration=4760, ncalls=6435, logz=-116.86, remainder_fraction=2.3718%, Lmin=-108.68, Lmax=-108.66
Z=-116.9(97.69%) | Like=-108.68..-108.66 [-108.6843..-108.6843]*| it/evals=4770/6447 eff=78.8821% N=400 
iteration=4770, ncalls=6447, logz=-116.86, remainder_fraction=2.3135%, Lmin=-108.68, Lmax=-108.66
Z=-116.9(97.85%) | Like=-108.68..-108.66 [-108.6828..-108.6825]*| it/evals=4800/6479 eff=78.9604% N=400 
iteration=4800, ncalls=6479, logz=-116.85, remainder_fraction=2.1479%, Lmin=-108.68, Lmax=-108.66
Z=-116.9(98.05%) | Like=-108.68..-108.66 [-108.6814..-108.6813]*| it/evals=4840/6530 eff=78.9560% N=400 
iteration=4840, ncalls=6530, logz=-116.85, remainder_fraction=1.9455%, Lmin=-108.68, Lmax=-108.66
Z=-116.9(98.15%) | Like=-108.68..-108.66 [-108.6806..-108.6806]*| it/evals=4860/6559 eff=78.9089% N=400 
iteration=4860, ncalls=6559, logz=-116.85, remainder_fraction=1.8516%, Lmin=-108.68, Lmax=-108.66
Z=-116.9(98.24%) | Like=-108.68..-108.66 [-108.6799..-108.6799]*| it/evals=4880/6585 eff=78.9006% N=400 
iteration=4880, ncalls=6585, logz=-116.85, remainder_fraction=1.7620%, Lmin=-108.68, Lmax=-108.66
Z=-116.8(98.40%) | Like=-108.68..-108.66 [-108.6785..-108.6785]*| it/evals=4920/6633 eff=78.9347% N=400 
iteration=4920, ncalls=6633, logz=-116.85, remainder_fraction=1.5957%, Lmin=-108.68, Lmax=-108.66
Z=-116.8(98.52%) | Like=-108.68..-108.66 [-108.6773..-108.6773]*| it/evals=4950/6668 eff=78.9726% N=400 
iteration=4950, ncalls=6668, logz=-116.85, remainder_fraction=1.4811%, Lmin=-108.68, Lmax=-108.66
Z=-116.8(98.56%) | Like=-108.68..-108.66 [-108.6770..-108.6770]*| it/evals=4960/6680 eff=78.9809% N=400 
iteration=4960, ncalls=6680, logz=-116.85, remainder_fraction=1.4448%, Lmin=-108.68, Lmax=-108.66
Z=-116.8(98.69%) | Like=-108.68..-108.66 [-108.6757..-108.6756]*| it/evals=5000/6726 eff=79.0389% N=400 
iteration=5000, ncalls=6726, logz=-116.85, remainder_fraction=1.3082%, Lmin=-108.68, Lmax=-108.66
Z=-116.8(98.82%) | Like=-108.67..-108.66 [-108.6744..-108.6743]*| it/evals=5040/6778 eff=79.0216% N=400 
iteration=5040, ncalls=6778, logz=-116.84, remainder_fraction=1.1844%, Lmin=-108.67, Lmax=-108.66
Z=-116.8(98.93%) | Like=-108.67..-108.66 [-108.6732..-108.6732]*| it/evals=5080/6830 eff=79.0047% N=400 
iteration=5080, ncalls=6830, logz=-116.84, remainder_fraction=1.0722%, Lmin=-108.67, Lmax=-108.66
[ultranest] Explored until L=-1e+02  
Explored until L=-1e+02  
[ultranest] Likelihood function evaluations: 6864
Likelihood function evaluations: 6864
did a run_iter pass!
[ultranest]   logZ = -116.8 +- 0.06943
  logZ = -116.8 +- 0.06943
[ultranest] Effective samples strategy satisfied (ESS = 1581.1, need >400)
Effective samples strategy satisfied (ESS = 1581.1, need >400)
[ultranest] Posterior uncertainty strategy is satisfied (KL: 0.45+-0.06 nat, need <0.50 nat)
Posterior uncertainty strategy is satisfied (KL: 0.45+-0.06 nat, need <0.50 nat)
[ultranest] Evidency uncertainty strategy is satisfied (dlogz=0.18, need <0.1)
Evidency uncertainty strategy is satisfied (dlogz=0.18, need <0.1)
[ultranest]   logZ error budget: single: 0.13 bs:0.07 tail:0.01 total:0.07 required:<0.10
  logZ error budget: single: 0.13 bs:0.07 tail:0.01 total:0.07 required:<0.10
[ultranest] done iterating.
done iterating.
In [69]:
print("Time taken to run 'UltraNest' is {} seconds".format(timeultranest))
Time taken to run 'UltraNest' is 7.066677570343018 seconds

We can extract the value of the marginal likelihood, $\ln{Z}$, and its uncertainties with the following:

In [70]:
logZultranest = result['logz']        # value of logZ
logZerrultranest = result['logzerr']  # estimate of the statistcal uncertainty on logZ

# output marginal likelihood
print("Marginalised evidence is {} ± {}".format(logZultranest, logZerrultranest))
Marginalised evidence is -116.83266582912734 ± 0.17742441776372703

We can now resample from the nested samples to get the posterior samples.

In [71]:
points = np.array(result["weighted_samples"]["points"])
weights = np.array(result["weighted_samples"]["weights"])
scaledweights = weights / weights.max()
mask = np.random.rand(len(scaledweights)) < scaledweights

samples_ultranest = points[mask, :]

resdict['multranest_mu'] = np.mean(samples_ultranest[:,0])      # mean of m samples
resdict['multranest_sig'] = np.std(samples_ultranest[:,0])      # standard deviation of m samples
resdict['cultranest_mu'] = np.mean(samples_ultranest[:,1])      # mean of c samples
resdict['cultranest_sig'] = np.std(samples_ultranest[:,1])      # standard deviation of c samples
resdict['ccultranest'] = np.corrcoef(samples_ultranest.T)[0,1]  # correlation coefficient between parameters
resdict['ultranest_npos'] = len(samples_ultranest)              # number of posterior samples
resdict['ultranest_time'] = timeultranest                       # run time
resdict['ultranest_logZ'] = logZultranest                       # log marginalised likelihood
resdict['ultranest_logZerr'] = logZerrultranest                 # uncertainty on log(Z)

# plot using corner.py
plotposts(samples_ultranest)

print('Number of posterior samples is {}'.format(samples_ultranest.shape[0]))
Number of posterior samples is 1085
In [72]:
resdict['esultranest'] = ess(weights)
resdict['essultranest'] = int(resdict['esultranest'] / timeultranest)
print("Effective samples per second: {}".format(resdict['essultranest']))
Effective samples per second: 223

We will now look at some nested sampling algorithms that are not pure Python, but do have Python wrappers.

PyMultiNest

PyMultiNest (Buchner et al, 2014) is a wrapper to the popular MultiNest code (Feroz & Hobson, 2008 & Feroz et al, 2009), which has been particularly taken up by the astrophysics and cosmology community. The basics underlying MultiNest were described in relation to Nestle, although MultiNest can use more sophisticated sampling methods, such as importance sampling (Feroz et al, 2014).

To use PyMultiNest, the MultiNest library itself must be installed (MultiNest can be downloaded from GitHub here). Installation of PyMultiNest and its requirements are described here; once MultiNest is installed you can then install PyMultiNest using pip with:

pip install pymultinest

A simpler solution, that comes bundled with MultiNest, is to install using Conda with:

conda install -c conda-forge pymultinest

Note: If using MultiNest, and software wrapping it, for your work please note the software license restrictions here.

Some examples of running PyMultiNest can be found here, although an interface through a solver function and Solver class are also available. We'll use the Solver class method, which can take in the same initialisation keyword arguments as the run() method.

We use the Solver class to create a new class for our specific example. As with CPNest we use this class to define our log likeihood function, and like Nestle we define a prior transform to go from a unit hypercube to the true parameterisation. These functions can only take in arrays of the parameter values that we are exploring and cannot take in additional arguments. Therefore any additional values required must either be global variables, or a new __init__ method for the class must be defined. We'll show the latter of these.

Note: PyMultiNest is also available in Python through the Monte Python package.

In [73]:
# import the Solver class
import pymultinest
from pymultinest.solve import Solver
from scipy.special import ndtri

LN2PI = np.log(2.*np.pi)

class StraightLineModelPyMultiNest(Solver):
    """
    A simple straight line model, with a Gaussian likelihood.

    Args:
        data (:class:`numpy.ndarray`): an array containing the observed data
        abscissa (:class:`numpy.ndarray`): an array containing the points at which the data were taken
        modelfunc (function): a function defining the model
        sigma (float): the standard deviation of the noise in the data
        **kwargs: keyword arguments for the run method
    """

    # define the prior parameters
    cmin = -10.  # lower range on c (the same as the uniform c prior lower bound)
    cmax = 10.   # upper range on c (the same as the uniform c prior upper bound)

    mmu = 0.     # mean of the Gaussian prior on m
    msigma = 10. # standard deviation of the Gaussian prior on m

    def __init__(self, data, abscissa, modelfunc, sigma, **kwargs):
        # set the data
        self._data = data         # oberserved data
        self._abscissa = abscissa # points at which the observed data are taken
        self._sigma = sigma       # standard deviation(s) of the data
        self._logsigma = np.log(sigma) # log sigma here to save computations in the likelihood
        self._ndata = len(data)   # number of data points
        self._model = modelfunc   # model function

        Solver.__init__(self, **kwargs)

    def Prior(self, cube):
        """
        The prior transform going from the unit hypercube to the true parameters. This function
        has to be called "Prior".

        Args:
            cube (:class:`numpy.ndarray`): an array of values drawn from the unit hypercube

        Returns:
            :class:`numpy.ndarray`: an array of the transformed parameters
        """

        # extract values
        mprime = cube[0]
        cprime = cube[1]

        m = self.mmu + self.msigma*ndtri(mprime)      # convert back to m
        c = cprime*(self.cmax-self.cmin) + self.cmin  # convert back to c

        return np.array([m, c])

    def LogLikelihood(self, cube):
        """
        The log likelihood function. This function has to be called "LogLikelihood".

        Args:
            cube (:class:`numpy.ndarray`): an array of parameter values.

        Returns:
            float: the log likelihood value.
        """

        # extract parameters
        m = cube[0]
        c = cube[1]

        # calculate the model
        model = self._model(x, m, c)

        # normalisation
        norm = -0.5*self._ndata*LN2PI - self._ndata*self._logsigma

        # chi-squared
        chisq = np.sum(((self._data - model)/(self._sigma))**2)

        return norm - 0.5*chisq

Now we can run the sampler. As before we'll use 1024 live points, and like with Nestle we'll use a stopping criterion of 0.1. By default the sampler will use importance sampling.

In [74]:
nlive = 1024 # number of live points
ndim = 2     # number of parameters
tol = 0.1    # stopping criterion

# run the algorithm
t0 = time()
solution = StraightLineModelPyMultiNest(data, x, straight_line, sigma, n_dims=ndim,
                                        n_live_points=nlive, evidence_tolerance=tol);
t1 = time()

timepymultinest = (t1-t0)
  analysing data from /tmp/tmpz61wkljapymultinest/.txt
In [75]:
print("Time taken to run 'PyMultiNest' is {} seconds".format(timepymultinest))
Time taken to run 'PyMultiNest' is 2.296818494796753 seconds

Now we can extract the value of $\ln{Z}$ and its uncertainty.

In [76]:
logZpymnest = solution.logZ        # value of log Z
logZerrpymnest = solution.logZerr  # estimate of the statistcal uncertainty on logZ

print("log(Z) = {} ± {}".format(logZpymnest, logZerrpymnest))
log(Z) = -116.67152537031939 ± 0.08263529812848029

These solutions, and the mean and standard deviation of the $m$ and $c$ parameters, can also be shown by using:

In [77]:
print(solution)
Model in "(temporary directory)" (2 dimensions)
Evidence ln Z = -116.7 +- 0.1
Parameter values:
   Parameter  1 : 3.464 +- 0.100
   Parameter  2 : 0.992 +- 0.566

We can extract and plot the posteriors. The solution.samples value contains an array of the posterior samples rather than the nested samples, so no resamping is required.

In [78]:
mchain_pymnest = solution.samples[:,0] # extract chain of m values
cchain_pymnest = solution.samples[:,1] # extract chain if c values

samples_pymnest = np.vstack((mchain_pymnest, cchain_pymnest)).T

resdict['mpymnest_mu'] = np.mean(samples_pymnest[:,0])      # mean of m samples
resdict['mpymnest_sig'] = np.std(samples_pymnest[:,0])      # standard deviation of m samples
resdict['cpymnest_mu'] = np.mean(samples_pymnest[:,1])      # mean of c samples
resdict['cpymnest_sig'] = np.std(samples_pymnest[:,1])      # standard deviation of c samples
resdict['ccpymnest'] = np.corrcoef(samples_pymnest.T)[0,1]  # correlation coefficient between parameters
resdict['pymnest_npos'] = len(samples_pymnest)              # number of posterior samples
resdict['pymnest_time'] = timepymultinest                   # run time
resdict['pymnest_logZ'] = logZpymnest                       # log marginalised likelihood
resdict['pymnest_logZerr'] = logZerrpymnest                 # uncertainty on log(Z)

# print out the number of posterior samples
print('Number of posterior samples is {}'.format(samples_pymnest.shape[0]))

# plot using corner.py
plotposts(samples_pymnest)
Number of posterior samples is 4532
In [79]:
# note: I'm not calculating the correct ESS here as I can't access the weights
resdict['espymnest'] = len(samples_pymnest)
resdict['esspymnest'] = int(resdict['espymnest'] / timepymultinest)
print("Effective samples per second: {}".format(resdict['esspymnest']))
Effective samples per second: 1973

DNest4

DNest4 (Brewer & Foreman-Mackey, 2016) uses a nested sampling algorithm called Diffusive Nested Sampling (Brewer et al, 2011). The main difference between this method and others is that it can at any point still propose samples throughout the whole prior, rather than being constrained to draw points within the current minimum likelihood bound. This allows it to find volumes of high likelihood that might be missed by other methods because they lie in an area where no live points remain, i.e. it is good for sampling narrow multi-model posteriors. It can also be used in cases where the number of parameters in the model that is being fit is itself a variable, e.g., if the line model could either be a straight line, or a quadratic, or a cubic. The method used is called a Reversible Jump MCMC (Green, 1995), although this is not currently available in the Python interface for DNest4.

An example of using the Python wrapper for DNest4 for a straight line mode is given here, but we will reproduce it with minor modifications (in that example $\sigma$ is also included in the fitting).

You can install the development version of DNest4 by cloning it from its GitHub repository and following the instructions here. However, the Python interace can be installed from PyPI via pip with

pip install dnest4

or through Conda with

conda install -c conda-forge dnest4

For DNest4 we need to create a Model class that contains the following methods:

  • from_prior(): a function that draws a sample from the prior distributions of the parameters
  • perturb(): a function that proposes new sample values
  • log_likelihood(): the natural logarithm of the likelihood function

The perturb() method also needs to calculate the natural logarithm of the Metropolis-Hastings ratio required to give the acceptance probability:

$$ H = \frac{p(\theta'|I)}{p(\theta|I)} \frac{g(\theta|\theta')}{g(\theta'|\theta)}, $$

where the first ratio on the right-hand-side is the ratio of priors at the proposed point $\theta'$ and current point $\theta$, and the second ratio of the ratio of the proposal distributions. In our case the proposal distribution is symmetric, so its ratio is unity, and the ratio of the prior depends on the Gaussian prior on $m$, but not on the uniform prior for $c$.

Parameters cannot be passed to the class on initialisation, so things such as the data must be global variables.

In [80]:
# import randh function, which draws points from a heavy-tailed distribution, and wrapm which wraps a point within a given range
from dnest4 import randh, wrap

LN2PI = np.log(2.*np.pi)
LNSIGMA = np.log(sigma)

# define prior range values as global variables
cmin = -10.  # lower range on c (the same as the uniform c prior lower bound)
cmax = 10.   # upper range on c (the same as the uniform c prior upper bound)

mmu = 0.     # mean of the Gaussian prior on m
msigma = 10. # standard deviation of the Gaussian prior on m

class DNest4Model(object):
    """
    Specify the model in Python.
    """
    def __init__(self):
        """
        Parameter values *are not* stored inside the class
        """
        pass

    def from_prior(self):
        """
        Draw points from the prior distribution.

        Returns:
            :class:`np.ndarray`: a numpy array containing the parameter values.
        """

        m = np.random.normal(mmu, msigma)
        c = np.random.uniform(cmin,  cmax)

        return np.array([m, c])

    def perturb(self, params):
        """
        Perturb the current parameters by proposing a new position. This takes a numpy array of
        parameters as input, and modifies it in-place. In just perturbs one parameter at a time.

        Args:
            params (:class:`np.ndarray`): the current parameters (this is modified by the function)

        Returns:
            float: the natural logarithm of the Metropolis-Hastings proposal ratio.
        """
        logH = 0.0 # log of the Metropolis-Hastings prior x proposal ratio

        # randomly choose which parameter to perturb 
        which = np.random.randint(len(params))
        mmu = 0.
        msigma = 10.
        if which == 0:
            # update H for Gaussian prior
            logH -= -0.5*((params[which]-mmu)/msigma)**2

        params[which] += 1.*randh() # we could scale this or use different distributions for each parameter, but in our case we know our distribution is quite compact and unimodal

        if which == 0:
            # update H for Gaussian prior
            logH += -0.5*((params[which]-mmu)/msigma)**2
        else:
            # wrap c value so that it stays within the prior range
            params[which] = wrap(params[which], cmin, cmax)

        return logH

    def log_likelihood(self, params):
        """
        Gaussian sampling distribution.
        """
        m, c = params # unpack parameters
        
        norm = -0.5*M*LN2PI - M*LNSIGMA
        chisq = np.sum(((data - straight_line(x, m, c))/sigma)**2)
        return norm - 0.5*chisq

We can now run the sampler. The values that are passed to the sampler are described in Section 7 of Brewer & Foreman-Mackey, 2016, but we'll stick with the defaults used in the example code. Unlike with the other nested sampling algorithms we do not pass a tolerance for the stopping criterion, but just specify the number of levels and steps to perform. The num_steps gives to total number of MCMC iterations, but if not set the code will run forever by default, until the process is killed by the user.

In [81]:
# import dnest4
import dnest4

print('DNest4 version: {}'.format(dnest4.__version__))

# Create a model object and a sampler
model = DNest4Model()
sampler = dnest4.DNest4Sampler(model, backend=dnest4.backends.CSVBackend(".", sep=" "))

# Set up the sampler. The first argument is max_num_levels
gen = sampler.sample(max_num_levels=30, num_steps=1000, new_level_interval=10000,
                     num_per_step=10000, thread_steps=100, num_particles=5, lam=10, beta=100)

# Do the sampling (one iteration here = one particle save)
t0 = time()
for sample in enumerate(gen):
    pass
t1 = time()

timednest4 = (t1-t0)

print("Time taken to run 'DNest4' is {} seconds".format(timednest4))
DNest4 version: 0.2.4
Time taken to run 'DNest4' is 197.01088690757751 seconds

To create the posterior samples the output of DNest4 has to be processed, which can be performed using the postprocess() function. Once this is done the posterior samples will be held in a posterior_sample.txt file containing a column for each parameter. As stated in Brewer & Foreman-Mackey, 2016:

"it is harder to compute justified error bars on $\ln{(Z)}$ in DNS than it is in standard NS."

so we will not quote a value here.

In [82]:
# Run the postprocessing to get marginal likelihood and generate posterior samples
logZdnest4, infogaindnest4, _ = dnest4.postprocess(plot=False);

samples_dnest4 = np.loadtxt('posterior_sample.txt')
log(Z) = -117.48591853302189
Information = 7.816433705922876 nats.
Effective sample size = 161.8889539426515

Now let's plot the posteriors (noting that the effective number of posterior samples is quite a bit smaller than for the other samplers).

In [83]:
# print out the number of posterior samples
print('Number of posterior samples is {}'.format(samples_dnest4.shape[0]))

resdict['mdnest4_mu'] = np.mean(samples_dnest4[:,0])      # mean of m samples
resdict['mdnest4_sig'] = np.std(samples_dnest4[:,0])      # standard deviation of m samples
resdict['cdnest4_mu'] = np.mean(samples_dnest4[:,1])      # mean of c samples
resdict['cdnest4_sig'] = np.std(samples_dnest4[:,1])      # standard deviation of c samples
resdict['ccdnest4'] = np.corrcoef(samples_dnest4.T)[0,1]  # correlation coefficient between parameters
resdict['dnest4_npos'] = len(samples_dnest4)              # number of posterior samples
resdict['dnest4_time'] = timednest4                       # run time
resdict['dnest4_logZ'] = logZdnest4                       # log marginalised likelihood

# plot using corner.py
plotposts(samples_dnest4)
Too few points to create valid contours
Number of posterior samples is 161
In [84]:
# note: I'm not calculating the correct ESS here as I can't access the weights
resdict['esdnest4'] = len(samples_dnest4)
resdict['essdnest4'] = len(samples_dnest4) / timednest4
print("Effective samples per second: {0:.2f}".format(resdict['essdnest4']))
Effective samples per second: 0.82

PyPolyChord

PolyChord (Handley et al, 2015a, Handley et al, 2015b) is a nested sampling implementation that uses slice sampling when drawing new points from the constrained prior.

The PolyChord package itself provides a Python interface called PyPolyChord for using the algorithm. PolyChord can be downloaded from Github here with:

git clone https://github.com/PolyChord/PolyChordLite.git

PyPolyChord can then be installed with

cd PolyChordLite
python setup.py install

The installation will attempt to find a copy of the libmpi.so library. If you have not got MPI (or are having further problems with MPI) you can disable it by instead installing with:

python setup.py --no-mpi install

As with Nestle and PyMultiNest PolyChord samples from a unit hypercube, so you need to define a prior transform function. We also need a log likelihood function. These can be very similar to those used for Nestle, but the functions must take in and output list rather than lists or tuples. Values such as the data array cannot be passed to the functions explicitly, so must be set as global variables.

Note: If using PolyChord for your work, please note the license restrictions described here.

In [85]:
from scipy.special import ndtri

LN2PI = np.log(2. * np.pi)
LNSIGMA = np.log(sigma)

def prior_transform_polychord(cube):
    """
    A function defining the tranform between the parameterisation in the unit hypercube
    to the true parameters.

    Args:
        cube (array, list): a list containing the parameters as drawn from a unit hypercube.

    Returns:
        list: the transformed parameters.
    """

    # unpack the parameters (in their unit hypercube form)
    mprime = cube[0]
    cprime = cube[1]
    
    cmin = -10.   # lower bound on uniform prior on c
    cmax = 10.    # upper bound on uniform prior on c

    mmu = 0.      # mean of Gaussian prior on m
    msigma = 10.  # standard deviation of Gaussian prior on m

    m = mmu + msigma * ndtri(mprime) # convert back to m
    c = cprime * (cmax - cmin) + cmin  # convert back to c

    theta = [m, c]

    return theta


def loglikelihood_polychord(theta):
    """
    The log-likelihood function.

    Args:
        theta (array, list): the set of parameter values.

    Returns:
        float: the log-likelihood value.
        list: A list of any derived parameters (an empty list if there are none).
    """

    # unpack the parameters
    m = theta[0]
    c = theta[1]

    # normalisation
    norm = -0.5 * M * LN2PI - M*LNSIGMA

    # chi-squared (data, sigma and x are global variables defined early on in this notebook)
    chisq = np.sum(((data-straight_line(x, m, c)) / sigma)**2)

    return norm - 0.5 * chisq, []

As before we'll use 1024 live points (the default is to use 25 times the dimensionality of the problem) and a stopping criterion of 0.1 (the default value is 0.001). The current version of PyPolyChord will return values of the marginal likelihood and its uncertainty from the PyPolyChord object that runs the algorithm, but the class does not currently contain the posterior samples. It instead outputs these to files. The location of these files is set by setting the keyword arguments base_dir (defaults to "chains") and file_root (defaults to "test") in a PolyChordSettings object. This sets {root} = {base_dir}/{file_root}, and within this the posterior samples can be found in {root}.txt and the other statistics found in {root}.stats.

You may find that codes produce a "Segmentation Fault" soon after they start running. The PolyChord documentation says:

"The slice sampling & clustering steps use a recursive procedure. The default memory allocated to recursive procedures is embarassingly small (to guard against memory leaks)."

and the solution is to:

Try increasing the stack size:
Linux: ulimit -s unlimited
OSX: ulimit -s hard

Alternatively, within a Python script you can achieve this using the resource module, as shown below.

In [86]:
import resource
curlimit = resource.getrlimit(resource.RLIMIT_STACK)
resource.setrlimit(resource.RLIMIT_STACK, (resource.RLIM_INFINITY,resource.RLIM_INFINITY))
In [87]:
nlive = 1024   # number of live points
ndims = 2      # number of parameters
nderived = 0   # number of derived parameters (this is zero)
tol = 0.1      # stopping criterion
basedir = os.path.join(os.getcwd(), 'polychord')  # output base directory
if not os.path.isdir(basedir):
    os.makedirs(basedir)                          # create base directory
    os.makedirs(os.path.join(basedir, 'clusters'))
fileroot = 'straightline'                         # output file name
broot = os.path.join(basedir, fileroot)

# import PolyChord
import pypolychord
from pypolychord.settings import PolyChordSettings  # class for passing setup information
from pypolychord.priors import UniformPrior         # pre-defined class for a uniform prior

# setup run settings using the PolyChordSetting class
pargs = {'nlive': nlive,
         'precision_criterion': tol,
         'base_dir': basedir,
         'file_root': fileroot,
         'write_resume': False, # don't output a resume file
         'read_resume': False}  # don't read a resume file
settings = PolyChordSettings(ndims, nderived, **pargs)

t0 = time()
output = pypolychord.run_polychord(loglikelihood_polychord, ndims, nderived, settings, prior_transform_polychord)
t1 = time()

timepolychord = (t1-t0)

print("Time taken to run 'PyPolyChord' is {} seconds".format(timepolychord))

# reset stack resource limit
resource.setrlimit(resource.RLIMIT_STACK, curlimit)
Time taken to run 'PyPolyChord' is 8.239450216293335 seconds

We can extract the marginal likelihood and the uncertainty from the output variable.

In [88]:
print("log(Z) = {} ± {}".format(output.logZ, output.logZerr))
log(Z) = -116.669815848209 ± 0.0833838596668114

We can also extract the posterior samples from the file with the _equal_weights.txt suffix. Given we have two parameters, the chains for these will be the final two columns within the samples file.

In [89]:
samplefile = broot+'_equal_weights.txt'
samples_polychord = np.loadtxt(samplefile)
samples_polychord = samples_polychord[:,-ndims:] # extract the last 'ndims' columns

# print out the number of posterior samples
print('Number of posterior samples is {}'.format(samples_polychord.shape[0]))

resdict['mpolychord_mu'] = np.mean(samples_polychord[:,0])      # mean of m samples
resdict['mpolychord_sig'] = np.std(samples_polychord[:,0])      # standard deviation of m samples
resdict['cpolychord_mu'] = np.mean(samples_polychord[:,1])      # mean of c samples
resdict['cpolychord_sig'] = np.std(samples_polychord[:,1])      # standard deviation of c samples
resdict['ccpolychord'] = np.corrcoef(samples_polychord.T)[0,1]  # correlation coefficient between parameters
resdict['polychord_npos'] = len(samples_polychord)              # number of posterior samples
resdict['polychord_time'] = timepolychord                       # run time
resdict['polychord_logZ'] = output.logZ                         # log marginalised likelihood
resdict['polychord_logZerr'] = output.logZerr                   # uncertainty on log(Z)

# plot using corner.py
plotposts(samples_polychord)
Number of posterior samples is 2816
In [90]:
weights = np.loadtxt(broot+'.txt')[:,0]  # get weights

resdict['espolychord'] = ess(weights)
resdict['esspolychord'] = int(resdict['espolychord'] / timepolychord)
print("Effective samples per second: {}".format(resdict['esspolychord']))
Effective samples per second: 493

One code to rule them all!

There are a variety of different styles, syntaxes and formats required to use the various packages decribed above, so wouldn't it be nice to have a unified interface to access a range of different samplers!

Well, there are in fact a few different packages that can allow you to use more than on of the above samplers, e.g., Monte Python, cronus, Gleipnir and bilby (other may, and probably do, exist).

bilby

Of these, bilby (Ashton et al, 2019) provides the access to the widest range of samplers (I may be biased as one of the developers), including both MCMC and nested sampler packages. An example of using bilby for a linear regression model can be found here, but I'll reproduce something equivalent to the example in this notebook below.

Bilby has a selection of built-in likelihood functions and prior distributions. You can define your own likelihood, but here we'll use the built-in Gaussian likelihood function (the source code for which can be found here). By default, bilby will use the dynesty sampler, but we'll explicitly set it in this example.

In [91]:
%%capture
import bilby

# import the likelihood function and prior function
from bilby.core.likelihood import GaussianLikelihood
from bilby.core.prior import Gaussian, Uniform

# define the likelihood function
likelihood = bilby.core.likelihood.GaussianLikelihood(x, data, straight_line, sigma=sigma)

cmin = -10.  # lower range of uniform distribution on c
cmax = 10.   # upper range of uniform distribution on c

mmu = 0.      # mean of Gaussian distribution on m
msigma = 10.  # standard deviation of Gaussian distribution on m

# define the priors in a dictionary
priors = {
    "m": Gaussian(mmu, msigma, name="m"),
    "c": Uniform(cmin, cmax, name="c"),
}

# run the sampler (using dynesty)
bilbyresult = bilby.run_sampler(
    likelihood=likelihood,
    priors=priors,
    outdir="bilby",     # the directory to output the results to
    label="bilby",      # a label to apply to the output file
    plot=False,         # by default this is True and a corner plot will be produced
    injection_parameters={"m": m, "c": c},  # (optional) true values for adding to plots
    sampler='dynesty',  # set the name of the sampler
    nlive=1024,         # add in any keyword arguments used by that sampler
    sample='unif',
)

The bilby Result object contains all the posterior samples (as a Pandas data frame) and has methods and attributes to extract the required information. We can use these to produce a corner plot:

In [92]:
bilbyresult.plot_corner()
Out[92]:

Summary

The timings given in this notebook are not directly comparable as the various codes have not been run with identical settings, and the number of likelihood evaluations produced can be quite different. But, they may be helpful in providing a rough ball-park of expected run times. As mentioned in relation to emcee above, it is important to note that different samplers will produce different numbers of truly independent samples, so some methods that may take longer to run may actually give more independent samples. Therefore comparisons of the number of effective samples per second may be more useful, although these will still be quite problem dependent. It is also worth noting that some of the codes can make use or multiple CPU cores, or GPUs, but these have not been used here. So, speed-ups could be achieved by using these in-built parallelisations.

The number of posterior samples produced by the nested sampling codes can depend strongly on the number of initial live points used. However, for the codes where we have specified a number of live points the final posterior sizes are very comparable. It should be noted that comparing posterior sample sizes and run times for DNest4 in particular is not entirely fair, as it is designed with more complex problems in mind and is not created to be efficient at simple examples where other methods work easily.

Something that is also worth noting, in particular with regard to nested sampling algorithms, is that they will generally take longer to run when there is a larger degree of difference between the prior and the posterior (i.e., the data contains a lot of new information, which, in technical terms, is called having a large Kullback-Leibler (KL) divergence). So, estimating the parameters of a model when the data contains a high signal-to-noise ratio signal will often take longer than the same model but with a weaker signal in the data.

Now, let's overplot everything (it will look a bit horrific)!

In [93]:
from matplotlib.pyplot import cm 
from matplotlib.colors import to_hex

# get a list of all the samples
chains = [samples_emcee, samples_pymc3, samples_zeus, samples_tfp, samples_pystan,
          samples_pyjags, samples_nestle, samples_cpnest, samples_dynesty, samples_ultranest,
          samples_pymnest, samples_dnest4, samples_polychord]

color = cm.gnuplot(np.linspace(0, 1, len(chains)))

hist2dkwargs = {'plot_datapoints': False,
                'plot_density': False,
                'levels': 1.0 - np.exp(-0.5 * np.arange(1.5, 2.1, 0.5) ** 2)} # roughly 1 and 2 sigma

for i, chain in enumerate(chains):
    cv = to_hex(color[i])
    if i == 0:
        fig = corner.corner(
            chain,
            labels=[r"$m$", r"$c$"],
            color=cv,
            hist_kwargs={'density': True},
            **hist2dkwargs,
        )
    else:
        if i < len(chains)-1:
            corner.corner(
                chain,
                labels=[r"$m$", r"$c$"],
                fig=fig,
                color=cv,
                hist_kwargs={'density': True},
                **hist2dkwargs,
            )
        else: # plot true values on last plot
            corner.corner(
                chain,
                labels=[r"$m$", r"$c$"],
                truths=[m, c],
                fig=fig,
                color=cv,
                hist_kwargs={'density': True},
                **hist2dkwargs,
            )
            
fig.set_size_inches(10.0, 9.0)
Too few points to create valid contours
In [94]:
from IPython.display import Markdown

outputtable = """
| Sampler | $m$ | $c$ | corr. coeff | num. effective samples | effective samples / s |
|--- | :---: | :---: | :---: | :---: | :---: |
| emcee | {memcee_mu:.3f} ± {memcee_sig:.3f} | {cemcee_mu:.3f} ± {cemcee_sig:.3f} | {ccemcee:.3f} | {esemcee:d} | {essemcee:d} |
| PyMC3 | {mpymc3_mu:.3f} ± {mpymc3_sig:.3f} | {cpymc3_mu:.3f} ± {cpymc3_sig:.3f} | {ccpymc3:.3f} | {espymc3:d} | {esspymc3:d} |
| TFP | {mtfp_mu:.3f} ± {mtfp_sig:.3f} | {ctfp_mu:.3f} ± {ctfp_sig:.3f} | {cctfp:.3f} | {estfp:d} | {esstfp:d} |
| emcee | {mzeus_mu:.3f} ± {mzeus_sig:.3f} | {czeus_mu:.3f} ± {czeus_sig:.3f} | {cczeus:.3f} | {eszeus:d} | {esszeus:d} |
| PyStan | {mpystan_mu:.3f} ± {mpystan_sig:.3f} | {cpystan_mu:.3f} ± {cpystan_sig:.3f} | {ccpystan:.3f} | {espystan:d} | {esspystan:d} |
| PyJAGS | {mpyjags_mu:.3f} ± {mpyjags_sig:.3f} | {cpyjags_mu:.3f} ± {cpyjags_sig:.3f} | {ccpyjags:.3f} | {espyjags:d} | {esspyjags:d} |
| Nestle | {mnestle_mu:.3f} ± {mnestle_sig:.3f} | {cnestle_mu:.3f} ± {cnestle_sig:.3f} | {ccnestle:.3f} | {esnestle:d} | {essnestle:d} |
| CPNest | {mcpnest_mu:.3f} ± {mcpnest_sig:.3f} | {ccpnest_mu:.3f} ± {ccpnest_sig:.3f} | {cccpnest:.3f} | {escpnest:d} | {esscpnest:d} |
| dynesty | {mdynesty_mu:.3f} ± {mdynesty_sig:.3f} | {cdynesty_mu:.3f} ± {cdynesty_sig:.3f} | {ccdynesty:.3f} | {esdynesty:d} | {essdynesty:d} |
| UltraNest | {multranest_mu:.3f} ± {multranest_sig:.3f} | {cultranest_mu:.3f} ± {cultranest_sig:.3f} | {ccultranest:.3f} | {esultranest:d} | {essultranest:d} |
| PyMultiNest | {mpymnest_mu:.3f} ± {mpymnest_sig:.3f} | {cpymnest_mu:.3f} ± {cpymnest_sig:.3f} | {ccpymnest:.3f} | {espymnest:d} | {esspymnest:d} |
| DNest4 | {mdnest4_mu:.3f} ± {mdnest4_sig:.3f} | {cdnest4_mu:.3f} ± {cdnest4_sig:.3f} | {ccdnest4:.3f} | {esdnest4:.2f} | {essdnest4:.2f} |
| PyPolyChord | {mpolychord_mu:.3f} ± {mpolychord_sig:.3f} | {cpolychord_mu:.3f} ± {cpolychord_sig:.3f} | {ccpolychord:.3f} | {espolychord:d} | {esspolychord:d} |
"""

Markdown(outputtable.format(**resdict))
Out[94]:
Sampler $m$ $c$ corr. coeff num. effective samples effective samples / s
emcee 3.466 ± 0.098 0.975 ± 0.560 -0.862 2000 1059
PyMC3 3.467 ± 0.094 0.978 ± 0.539 -0.858 873 190
TFP 3.463 ± 0.100 1.003 ± 0.578 -0.876 230 47
emcee 3.465 ± 0.099 0.985 ± 0.562 -0.864 16172 1279
PyStan 3.463 ± 0.096 0.992 ± 0.552 -0.862 1646 341
PyJAGS 3.462 ± 0.096 1.001 ± 0.546 -0.861 3292 683
Nestle 3.465 ± 0.101 0.983 ± 0.577 -0.858 4062 4309
CPNest 3.468 ± 0.095 0.961 ± 0.538 -0.856 4008 229
dynesty 3.468 ± 0.100 0.974 ± 0.570 -0.867 4074 1246
UltraNest 3.467 ± 0.093 0.963 ± 0.526 -0.847 1581 223
PyMultiNest 3.464 ± 0.100 0.992 ± 0.566 -0.868 4532 1973
DNest4 3.466 ± 0.094 1.004 ± 0.559 -0.869 161.00 0.82
PyPolyChord 3.467 ± 0.098 0.984 ± 0.568 -0.862 4066 493
In [95]:
outputnest = """
#### Nested sampling

| Sampler | $\ln{{Z}}$ | num. posterior samples | run time (s) |
| --- | --- | :---: | --- |
| Nestle | {nestle_logZ:.3f} ± {nestle_logZerr:.3f} | {nestle_npos} | {nestle_time:.3f} |
| CPNest | {cpnest_logZ:.3f} ± {cpnest_logZerr:.3f} | {cpnest_npos} | {cpnest_time:.3f} |
| dynesty | {dynesty_logZ:.3f} ± {dynesty_logZerr:.3f} | {dynesty_npos} | {dynesty_time:.3f} |
| UltraNest | {ultranest_logZ:.3f} ± {ultranest_logZerr:.3f} | {ultranest_npos} | {ultranest_time:.3f} |
| PyMultiNest | {pymnest_logZ:.3f} ± {pymnest_logZerr:.3f} | {pymnest_npos} | {pymnest_time:.3f} |
| DNest4 | {dnest4_logZ:.3f} | {dnest4_npos} | {dnest4_time:.3f} |
| PyPolyChord| {polychord_logZ:.3f} ± {polychord_logZerr:.3f} | {polychord_npos} | {polychord_time:.3f} |
"""

Markdown(outputnest.format(**resdict))
Out[95]:

Nested sampling

Sampler $\ln{Z}$ num. posterior samples run time (s)
Nestle -116.724 ± 0.083 2808 0.943
CPNest -116.803 ± 0.084 2754 17.426
dynesty -116.662 ± 0.117 11624 3.268
UltraNest -116.833 ± 0.177 1085 7.067
PyMultiNest -116.672 ± 0.083 4532 2.297
DNest4 -117.486 161 197.011
PyPolyChord -116.670 ± 0.083 2816 8.239
In [96]:
import cpuinfo
print('This was run on a single core "{}" CPU'.format(cpuinfo.get_cpu_info()['brand_raw']))
This was run on a single core "Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz" CPU

Running this notebook

I have created a Docker image, available on Quay.io, in which Python 3.7 and all the above samplers are installed (this is roughly 5 Gb in size, so may take some time to download). This can be used to run this notebook in Jupyter. You can download this notebook from here.

If you have Docker installed then under Linux, Mac OS or Windows you can run the notebook by opening a terminal (e.g., the Powershell in Windows) and running:

sudo docker run -i -t -p 8888:8888 -v ${notebook_location}:/samplers quay.io/mattpitkin/samplers-demo:latest /bin/bash -c "jupyter notebook --notebook-dir=/samplers --ip='*' --port=8888 --no-browser --allow-root --MultiKernelManager.default_kernel_name=Samplers"

where {notebook_location} is replaced with the directory path containing the downloaded notebook. This will provide a link that can be pasted into your browser to show the notebook, and the Samplers.ipynb file can be selected. Under Windows remove the sudo from the above command.

Comments