Likelihood-free inference modules

To be able to make use of the outputs of the IMNN it is best to use likelihood-free inference (this is true when using any sort of neural network, but anyway - I digress). Any lfi or sbi module should be able to use these outputs where the network (once trained) can be added to the simulation as a compression step. For ease of use there are some jax implementations of a few different approximate Bayesian computation algorithms. These also come with a plotting function. Examples are available here. Note that the module is not fully tested, but seems to be functionally working. There are more functions available with pip install imnn-tf where there is a previous implementation with more general versions of the same functions and checkout also github.com/justinalsing/pydelfi

Doing likelihood-free inference

With a trained IMNN it is possible to get an estimate of some data using

estimate = IMNN.get_estimate(target_data)

Along with the Fisher information from the network, we can use this to make a Gaussian approximation of the posterior under the assumption that the fiducial parameter values used to calculate the Fisher information coincide with the parameter estimate. This posterior can be calculated using the imnn.lfi module. For all of the available functions in the lfi module a TensorFlow Probability-like distribution is used for the prior, e.g. for a uniform distribution for two parameters between 0 and 10 each we could write

import tensorflow_probability
tfp = tensorflow_probability.substrates.jax

prior = tfp.distributions.Blockwise(
[tfp.distributions.Uniform(low=low, high=high)
 for low, high in zip([0., 0.], [10., 10.])])

 prior.low = np.array([0., 0.])
 prior.high = np.array([10., 10.])

We set the values of prior.low and prior.high since they are used to define the plotting ranges. Note that prior.event_shape should be equal to n_params, i.e. the number of parameters in the physical model.

Gaussian Approximation

The GaussianApproximation simply evaluates a multivariate Gaussian with mean at estimate and covariance given by np.linalg.inv(IMNN.F) on a grid defined by the prior ranges.

GA = imnn.lfi.GaussianApproximation(
  parameter_estimates=estimate,
  invF=np.linalg.inv(IMNN.F),
  prior=prior,
  gridsize=100)

And corner plots of the Gaussian approximation can be made using

GA.marginal_plot(
  ax=None,                   # Axes object to plot (constructs new if None)
  ranges=None,               # Ranges for each parameter (None=preset)
  marginals=None,            # Marginal distributions to plot (None=preset)
  known=None,                # Plots known parameter values if not None
  label=None,                # Adds legend element if not None
  axis_labels=None,          # Adds labels to the axes if not None
  levels=None,               # Plot specified approx significance contours
  linestyle="solid",         # Linestyle for the contours
  colours=None,              # Colour for the contours
  target=None,               # If multiple target data, which index to plot
  format=False,              # Whether to set up the plot decoration
  ncol=2,                    # Number of columns in the legend
  bbox_to_anchor=(1.0, 1.0)) # Where to place the legend

Note that this approximation shouldn’t be necessarily a good estimate of the true posterior, for that actual LFI methods should be used.

Approximate Bayesian Computation

To generate simulations and accept or reject these simulations based on a distance based criterion from some target data we can use

ABC = imnn.lfi.ApproximateBayesianComputation(
  target_data=target_data,
  prior=prior,
  simulator=simulator,
  compressor=IMNN.get_estimate,
  gridsize=100,
  F=IMNN.F,
  distance_measure=None)

This takes in the target data and compresses it using the provided compressor (like IMNN.get_estimate). The Fisher information matrix can be provided to rescale the parameter directions to make meaningful distance measurements as long as summaries are parameter estimates. If a different distance measure is better for the specific problem this can be passed as a function. Note that if simulations have already been done for the ABC and only the plotting and the acceptance and rejection is needed then simulator can be set to None. The ABC can actually be run by calling the module

parameters, summaries, distances = ABC(
    ϵ=None,             # The size of the epsilon ball to accept summaries
    rng=None,           # Random number generator for params and simulation
    n_samples=None,     # The number of samples to run (at one time)
    parameters=None,    # Values of parameters with premade compressed sims
    summaries=None,     # Premade compressed sims to avoid running new sims
    min_accepted=None,  # Num of required sims in epsilon ball (iterative)
    max_iterations=10,  # Max num of iterations to try and get min_accepted
    smoothing=None,     # Amount of smoothing on the histogrammed marginals
    replace=False)      # Whether to remove all previous run summaries

If not run and no premade simulations have been made then n_samples and rng must be passed. Note that if ϵ is too large then the accepted samples should not be considered to be drawn from the posterior but rather some partially marginalised part of the joint distribution of summaries and parameters, and hence it can be very misleading - ϵ should be a small as possible! Like with the GaussianApproximation there is a ABC.marginal_plot(...) but the parameter samples can also be plotted as a scatter plot on the corner plot

ABC.scatter_plot(
    ax=None,                 # Axes object to plot (constructs new if None)
    ranges=None,             # Ranges for each parameter (None=preset)
    points=None,             # Parameter values to scatter (None=preset)
    label=None,              # Adds legend element if not None
    axis_labels=None,        # Adds labels to the axes if not None
    colours=None,            # Colour for the scatter points (and hists)
    hist=True,               # Whether to plot 1D histograms of points
    s=5,                     # Marker size for points
    alpha=1.,                # Amount of transparency for the points
    figsize=(10, 10),        # Size of the figure if not premade
    linestyle="solid",       # Linestyle for the histograms
    target=None,             # If multiple target data, which index to plot
    ncol=2,                  # Number of columns in the legend
    bbox_to_anchor=(0., 1.)) # Where to place the legend

And the summaries can also be plotted on a corner plot with exactly the same parameters as scatter_plot (apart from gridsize being added) but if points is left None then ABC.summaries.accepted is used instead and the ranges calculated from these values. If points is supplied but ranges is None then the ranges are calculated from the minimum and maximum values of the points are used as the edges.

ABC.scatter_summaries(
    ax=None,
    ranges=None,
    points=None,
    label=None,
    axis_labels=None,
    colours=None,
    hist=True,
    s=5,
    alpha=1.,
    figsize=(10, 10),
    linestyle="solid",
    gridsize=100,
    target=None,
    format=False,
    ncol=2,
    bbox_to_anchor=(0.0, 1.0))

Population Monte Carlo

To more efficiently accept samples than using a simple ABC where samples are drawn from the prior, the PopulationMonteCarlo provides a JAX accelerated iterative acceptance and rejection scheme where each iteration the population of samples with summaries closest to the summary of the desired target defines a new proposal distribution to force a fixed population to converge towards the posterior without setting an explicit size for the epsilon ball of normal ABC. The PMC is stopped using a criterion on the number of accepted proposals compared to the number of total draws from the proposal. When this gets very small it suggests the distribution is stationary and that the proposal has been reached. It works similarly to ApproximateBayesianComputation.

PMC = imnn.lfi.PopulationMonteCarlo(
  target_data=target_data,
  prior=prior,
  simulator=simulator,
  compressor=IMNN.get_estimate,
  gridsize=100,
  F=IMNN.F,
  distance_measure=None)

And it can be run using

parameters, summaries, distances = PMC(
    rng,                     # Random number generator for params and sims
    n_points,                # Number of points from the final distribution
    percentile=None,         # Percentage of points making the population
    acceptance_ratio=0.1,    # Fraction of accepted draws vs total draws
    max_iteration=10,        # Maximum number of iterations of the PMC
    max_acceptance=1,        # Maximum number of tries to get an accepted
    max_samples=int(1e5),    # Maximum number of attempts to get parameter
    n_initial_points=None,   # Number of points in the initial ABC step
    n_parallel_simulations=None, # Number of simulations to do in parallel
    proposed=None,           # Prerun parameter values for the initial ABC
    summaries=None,          # Premade compressed simulations for ABC
    distances=None,          # Precalculated distances for the initial ABC
    smoothing=None,          # Amount of smoothing on histogrammed marginal
    replace=False)           # Whether to remove all previous run summaries

The same plotting functions as ApproximateBayesianComputation are also available in the PMC

Note

There seems to be a bug in PopulationMonteCarlo and the parallel sampler is turned off

LikelihoodFreeInference

class imnn.lfi.LikelihoodFreeInference(prior, gridsize=100, marginals=None)

Base class (some functionality) for likelihood free inference methods

Mostly used for plotting

Parameters
  • gridsize (list) – The number of grid points to evaluate the marginal distribution on for each parameter

  • ranges (list) – A list of arrays containing the gridpoints for the marginal distribution for each parameter

  • marginal (list of lists) – A list of rows and columns of marginal distributions of a corner plot

prior()

A prior distribution which can be evaluated and sampled from (should also contain a low and a high attribute with appropriate ranges)

Public Methods:

__init__(prior[, gridsize, marginals])

Constructor method

get_gridsize(gridsize, size)

Propogates gridpoints per parameter into a list if int provided

get_levels(marginal, ranges[, levels])

Used for approximately calculating confidence region isocontours

setup_plot([ax, ranges, axis_labels, …])

Builds corner plot

marginal_plot([ax, ranges, marginals, …])

Plots the marginal distribution corner plots

put_marginals(marginals)

Creates list of 1D and 2D marginal distributions ready for plotting

target_choice(target)

Returns the indices of targets to plot and the number of targets

Private Methods:

_scatter_plot([ax, ranges, points, label, …])

Plotter for scatter plots


_scatter_plot(ax=None, ranges=None, points=None, label=None, axis_labels=None, colours=None, hist=True, s=5.0, alpha=1.0, figsize=(10, 10), linestyle='solid', target=None, format=False, ncol=2, bbox_to_anchor=(0.0, 1.0))

Plotter for scatter plots

Plots scatter plots for points (parameters or summaries) in 2D subplots and the histogram of points in the 1D diagonal subplots.

Parameters
  • ax (list of matplotlib axes or None, default=None) – If ax is None a new figure and axes are created to make a corner plot otherwise a set of axes are formatted correctly for a corner plot. If existing ax is not correctly shaped there will be an error

  • ranges (list, default=None) – The list of ranges to set the number of rows and columns (if this is None the ranges set on initialisation will be used)

  • points (float(n_targets, n_points, {n_params} or {n_summaries})) – The points to scatter plot

  • label (str or None, default=None) – Name to be used in the legend

  • axis_labels (list of str or None, default=None) – A list of names for each parameter, no axis labels if None

  • colours (str or list or None, default=None) – The colour or list of colours to use for the different targets, if None then the normal matplotlib colours are used

  • hist (bool, default=True) – Whether or not to plot histograms on the diagonal of the corner plot

  • s (float, default=5.) – The size of the marker points in the scatter plot

  • alpha (float, default=1.) – The amount of alpha colour for the marker points

  • figsize (tuple, default=(10, 10)) – The size (in inches) to create a figure (if ax is None)

  • linestyle (str, default="solid") – Linestyle for the histograms

  • target (None or int or list, default=None) – The indices to choose which target’s points are plotted

  • format (bool, default=False) – If formatting is not needed

  • ncols (int, default=2) – Number of columns for the legend

  • bbox_to_anchor (tuple, default=(0.0, 1.0)) – Position to fix the legend to

Returns

The scatter plot axes

Return type

axes object

Raises

ValueError – if colour input is not correct

get_gridsize(gridsize, size)

Propogates gridpoints per parameter into a list if int provided

Parameters
  • gridsize (int or list) – The number of grid points to evaluate the marginal distribution on for every parameter (int) or each parameter (list)

  • size (int) – Number of parameters describing size of gridsize list

Returns

The number of gridpoints to evaluate marginals for each parameter

Return type

list

Raises
  • ValueError – If list passed is wrong length

  • TypeError – If gridsize is not int or list of correct length

get_levels(marginal, ranges, levels=[0.68, 0.95])

Used for approximately calculating confidence region isocontours

To calculate the values of the marginal distribution whose isocontour contains approximately levels specified fraction of samples drawn from the distribution the marginal distribution is sorted by value and normalised to one. If the distribution does is significantly non-zero outside of the range then this will cause large biases! The cumulative sum of the sorted values are then calculated and the value at which the index of this normalised cumulative distribution is closest to the desired level is used to return the value of the original marginal distribution.

Parameters
  • marginal (float(gridsize*n_param)) – The gridded marginal distribution to find the isocontour values of

  • ranges (list) – List of the grid points for each parameter

  • levels (list, default=[0.68, 0.95]) – The fraction describing the percentage of samples inside the isocontour

Returns

The values of the isocontours of the marginal distributions

Return type

list

marginal_plot(ax=None, ranges=None, marginals=None, known=None, label=None, axis_labels=None, levels=None, linestyle='solid', colours=None, target=None, format=False, ncol=2, bbox_to_anchor=(1.0, 1.0))

Plots the marginal distribution corner plots

Plots the 68% and 95% (approximate) confidence contours for each 2D parameter pair and the 1D densities as a corner plot.

Parameters
  • ax (list of matplotlib axes or None, default=None) – If ax is None a new figure and axes are created to make a corner plot otherwise a set of axes are formatted correctly for a corner plot. If existing ax is not correctly shaped there will be an error

  • ranges (list, default=None) – The list of ranges to set the number of rows and columns (if this is None the ranges set on initialisation will be used)

  • marginals (list of lists (n_params * n_params) or None, default=None) – The marginal distributions for every parameter and every 2D combination. If None the marginals defined in the class are used (if they exist)

  • known (float(n_params,)) – Values to plot known parameter value lines at

  • label (str or None, default=None) – Name to be used in the legend

  • axis_labels (list of str or None, default=None) – A list of names for each parameter, no axis labels if None

  • linestyle (str, default="solid") – Linestyle for the histograms

  • colours (str or list or None, default=None) – The colour or list of colours to use for the different targets, if None then the normal matplotlib colours are used

  • target (None or int or list, default=None) – The indices to choose which target’s points are plotted

  • format (bool, default=False) – If formatting is not needed

  • ncols (int, default=2) – Number of columns for the legend

  • bbox_to_anchor (tuple, default=(0.0, 1.0)) – Position to fix the legend to

Returns

The scatter plot axes

Return type

axes object

Raises
  • ValueError – if marginals is not provided

  • TypeError – if marginals is not a list

put_marginals(marginals)

Creates list of 1D and 2D marginal distributions ready for plotting

The marginal distribution lists from full distribution array. For every parameter the full distribution is summed over every other parameter to get the 1D marginals and for every combination the 2D marginals are calculated by summing over the remaining parameters. The list is made up of a list of n_params lists which contain n_columns number of objects.

Parameters

marginals (float(n_targets, gridsize*n_params)) – The full distribution from which to calculate the marginal distributions

Returns

The 1D and 2D marginal distributions for each parameter (of pair)

Return type

list of lists

setup_plot(ax=None, ranges=None, axis_labels=None, figsize=(10, 10), format=False)

Builds corner plot

axlist of matplotlib axes or None, default=None

If ax is None a new figure and axes are created to make a corner plot otherwise a set of axes are formatted correctly for a corner plot. If existing ax is not correctly shaped there will be an error

rangeslist, default=None

The list of ranges to set the number of rows and columns (if this is None there will be an error)

axis_labelslist of str or None, default=None

A list of names for each parameter, no axis labels if None

figsizetuple, default=(10, 10)

The size (in inches) to create a figure (if ax is None)

formatbool, default=False

If formatting is not needed

Returns

The formatted matplotlib axes

Return type

axes object

target_choice(target)

Returns the indices of targets to plot and the number of targets

Parameters

target (None or int or list) – The indices of the targets to plot. If None then n_targets from the class instance is used

Returns

  • list – The indices of the targets to be plotted

  • int – The number of targets to be plotted

GaussianApproximation

class imnn.lfi.GaussianApproximation(parameter_estimates, invF, prior, gridsize=100)

Uses Fisher information and parameter estimates approximate marginals

Since the inverse of the Fisher information matrix describes the minimum variance of some estimator we can use it to make an approximate (Gaussian- distributed) estimate of the target distribution. Note that this will not reflect the true shape of the target distribution and as likely to underestimate the distribution as overestimate it. Furthermore, if the Fisher information matrix is calculated far from the estimate of the parameter values then its value may not be representative of the Fisher information at that position and so the variance estimated from its inverse be incorrect.

Parameters
  • parameter_estimates (float(n_targets, n_params)) – The parameter estimates of each target data

  • invF (float(n_targets, n_params, n_params)) – The inverse Fisher information matrix for each target

  • marginals (list of lists) – The 1D and 2D marginal distribution for each target

Public Methods:

__init__(parameter_estimates, invF, prior[, …])

Constructor method

get_marginals([parameter_estimates, invF, …])

Creates list of 1D and 2D marginal distributions ready for plotting

Inherited from LikelihoodFreeInference

__init__(parameter_estimates, invF, prior[, …])

Constructor method

get_gridsize(gridsize, size)

Propogates gridpoints per parameter into a list if int provided

get_levels(marginal, ranges[, levels])

Used for approximately calculating confidence region isocontours

setup_plot([ax, ranges, axis_labels, …])

Builds corner plot

marginal_plot([ax, ranges, marginals, …])

Plots the marginal distribution corner plots

put_marginals(marginals)

Creates list of 1D and 2D marginal distributions ready for plotting

target_choice(target)

Returns the indices of targets to plot and the number of targets

Private Methods:

Inherited from LikelihoodFreeInference

_scatter_plot([ax, ranges, points, label, …])

Plotter for scatter plots


get_marginals(parameter_estimates=None, invF=None, ranges=None, gridsize=None)

Creates list of 1D and 2D marginal distributions ready for plotting

The marginal distribution lists from full distribution array. For every parameter the full distribution is summed over every other parameter to get the 1D marginals and for every combination the 2D marginals are calculated by summing over the remaining parameters. The list is made up of a list of n_params lists which contain n_columns number of objects. The value of the distribution comes from

Parameters
  • parameter_estimates (float(n_targets, n_params) or None, default=None) – The parameter estimates of each target data. If None the class instance parameter estimates are used

  • invF (float(n_targets, n_params, n_params) or None, default=None) – The inverse Fisher information matrix for each target. If None the class instance inverse Fisher information matrices are used

  • ranges (list or None, default=None) – A list of arrays containing the gridpoints for the marginal distribution for each parameter. If None the class instance ranges are used determined by the prior range

  • gridsize (list or None, default=None) – If using own ranges then the gridsize for these ranges must be passed (not checked)

Returns

The 1D and 2D marginal distributions for each parameter (of pair)

Return type

list of lists

ApproximateBayesianComputation

class imnn.lfi.ApproximateBayesianComputation(target_data, prior, simulator, compressor, gridsize=100, F=None, distance_measure=None)

Approximate Bayesian computation

Approximate Bayesian computation involves drawing parameter values from a prior, \(\boldsymbol{\theta}_i\leftarrow p(\boldsymbol{\theta})\), and generating simulations at this value using a model, \({\bf d}_i = f(\boldsymbol{\theta}_i)\), which can then be compressed down to some informative summary, \({\bf x}_i =g({\bf d}_i)\). By compressing some observed target to get \({\bf x}^\textrm{obs} =g({\bf d}^\textrm{obs})\) we can simulate from the prior distribution of possible parameter values to create a set of summaries \(\{{\bf x}_i|i\in[1, n_\textrm{sims}]\}\). The approximate posterior distribution is then obtained as the samples of parameter values whose summaries of simulations have an infinitely small difference from the summary of the target observation:

\[p(\boldsymbol{\theta}|{\bf d}^\textrm{obs})\approx \{\boldsymbol{\theta}_i| i~\textrm{where}~\lim_{\epsilon\to0} D(g(f(\boldsymbol{\theta}_i), g({\bf d}^\textrm{obs}))<\epsilon\}\]

Note that the distance measure \(D(x, y)\) is somewhat ambiguous, but in the infinitessimal limit will be more or less flat. Sensible choices for this distance measure can be made, i.e. for parameter estimates a Fisher scaled squared distance, but it is not wholly trivial in all cases.

This class contains several methods for making simulations, compressing them and doing distance based acceptance and rejection of the parameters used to make the simulations from the compressed summaries of target data. In particular either a single set of simulations can be run and the distance calculated, simulations can be run until a desired number of accepted points within achosen distance are obtained or if there are already a set of run simulations, different distance values can be chosen for acceptance and rejection of points. There are also several plotting methods for corner plots.

Note that if the summaries of the data are not parameter estimates then the distance measure must be optimal for the summary space to make sense.

Parameters
  • target_summaries (float(n_targets, n_summaries)) – The compressed data to be infered

  • n_summaries (int) – The size of the output of the compressor for a single simulation

  • n_targets (int) – The number of different targets to be infered

  • F (float(n_params, n_params)) – The Fisher information matrix for rescaling distance measure

  • parameters (utils container) – Holds accepted and rejected parameters and some summaries of results

  • summaries (utils container) – Holds accepted and rejected summaries and some summaries of results

  • distances (utils container) – Holds accepted and rejected distances of summaries to targets and some summaries of results

simulator()

A simulator which takes in an array of parameters and returns simulations made at those parameter values

compressor()

A function which takes in an array of simulations and compresses them. If no compression is needed this can be an identity function

distance_measure:

Either F_distance or euclidean_distance depending on inputs

Public Methods:

__init__(target_data, prior, simulator, …)

Constructor method

__call__(ε=None[, rng, n_samples, …])

Run the ABC

set_samples(parameters, summaries[, …])

Fills containers and calculates distances

get_samples(rng, n_samples[, dist])

Draws parameters from prior and makes and compresses simulations

set_accepted(ε[, smoothing])

Sets the accepted and rejected attributes of the containers

get_min_accepted(rng, ε, accepted[, …])

Iteratively run ABC until a minimum number of samples are accepted

set_marginals([accepted_parameters, ranges, …])

Wrapper for get_marginals to set the marginals attribute

get_marginals([accepted_parameters, ranges, …])

Creates the 1D and 2D marginal distribution list for plotting

scatter_plot([ax, ranges, points, label, …])

Plot a scatter corner plot of all accepted parameter pair values

scatter_summaries([ax, ranges, points, …])

Plot a scatter corner plot of all accepted summary pair values

F_distance(x, y, F)

Default distance measure, squared difference scaled by Fisher matrix

euclidean_distance(x, y[, aux])

Euclidean distance between simulation summaries and target summary

Inherited from LikelihoodFreeInference

__init__(target_data, prior, simulator, …)

Constructor method

get_gridsize(gridsize, size)

Propogates gridpoints per parameter into a list if int provided

get_levels(marginal, ranges[, levels])

Used for approximately calculating confidence region isocontours

setup_plot([ax, ranges, axis_labels, …])

Builds corner plot

marginal_plot([ax, ranges, marginals, …])

Plots the marginal distribution corner plots

put_marginals(marginals)

Creates list of 1D and 2D marginal distributions ready for plotting

target_choice(target)

Returns the indices of targets to plot and the number of targets

Private Methods:

Inherited from LikelihoodFreeInference

_scatter_plot([ax, ranges, points, label, …])

Plotter for scatter plots


F_distance(x, y, F)

Default distance measure, squared difference scaled by Fisher matrix

The target summaries are expanded on the zeroth axis so that it can broadcast with the simulation summaries.

Parameters
  • x ((any, n_params)) – Envisioned as a whole batch of summaries of simulations

  • y ((n_params)) – Envisioned as a target summary

  • F ((n_params)) – Fisher information matrix to scale the summary direction

euclidean_distance(x, y, aux=None)

Euclidean distance between simulation summaries and target summary

The target summaries are expanded on the zeroth axis so that it can broadcast with the simulation summaries.

Parameters
  • x ((any, n_params)) – Envisioned as a whole batch of summaries of simulations

  • y ((n_params)) – Envisioned as a target summary

  • aux (None, default=Note) – Empty holder so that function works in the same way as F_distance

get_marginals(accepted_parameters=None, ranges=None, gridsize=None, smoothing=None)

Creates the 1D and 2D marginal distribution list for plotting

Using list of parameter values (accepted by the ABC) an approximate set of marginal distributions for plotting are created based on histogramming the points. Smoothing can be performed on the histogram to avoid undersampling artefacts.

For every parameter the full distribution is summed over every other parameter to get the 1D marginals and for every combination the 2D marginals are calculated by summing over the remaining parameters. The list is made up of a list of n_params lists which contain n_columns number of objects.

Parameters
  • accepted_parameters (float(any, n_params) or None, default=None) – An array of all accepted parameter values. If None, the accepted parameters from the parameters class attribute are used

  • ranges (list or None, default=None) – A list of arrays containing the bin centres for the marginal distribution obtained by histogramming for each parameter. If None the ranges are constructed from the ranges of the prior distribution.

  • gridsize (list or None, default=None) – The number of grid points to evaluate the marginal distribution on for each parameter. This needs to be set if ranges is passed (and different from the gridsize set on initialisation)

  • smoothing (float or None, default=None) – A Gaussian smoothing for the marginal distributions. Smoothing not done if smoothing is None

Returns

The 1D and 2D marginal distributions for each parameter (of pair)

Return type

list of lists

get_min_accepted(rng, ε, accepted, n_simulations=1, max_iterations=10, smoothing=None, verbose=True)

Iteratively run ABC until a minimum number of samples are accepted

Setting a maximum distance that simulation summaries can be allowed to be from the summaries of each target an iterative scheme can be used to get n_samples simulations at a time, compress them and calculate the distances until the minimum number of desired samples are within the allowed cutoff distance. If multiple targets are being inferred then the simulations are run until all targets have at least the minimum number of accepted samples.

Parameters
  • rng (int(2,)) – A random number generator to draw the parameters and make simulations with

  • ϵ (float or float(n_targets)) – The acceptance distance between summaries from simulations and the summary of the target data. A different epsilon can be passed for each target.

  • accepted (int) – The minimum number of samples to be accepted within the ϵ cutoff

  • n_simulations (int, default=1) – The number of simulations to do at once

  • max_iterations (int, default=10) – Maximum number of iterations in the while loop to prevent infinite runs. Note if max_iterations is reached then there are probably not the required number of accepted samples

  • smoothing (float or None, default=None) – A Gaussian smoothing for the marginal distributions

  • verbose (bool, default=True) – Whether to print out the running stats (number accepted, iterations, etc)

Returns

  • float(any, n_params) – All parameters values drawn

  • float(any, n_summaries) – All summaries of simulations made

  • float(n_target, any) – The distance of every summary to each target

get_samples(rng, n_samples, dist=None)

Draws parameters from prior and makes and compresses simulations

Simulations are done n_samples in parallel

Parameters
  • rng (int(2,)) – A random number generator to draw the parameters and make simulations with

  • n_samples (int) – Number of parallel samples to draw

  • dist (fn or None, default=None) – A distribution to sample, if None the samples will be drawn from the prior set in the class initialisation

scatter_plot(ax=None, ranges=None, points=None, label=None, axis_labels=None, colours=None, hist=True, s=5, alpha=1.0, figsize=(10, 10), linestyle='solid', target=None, ncol=2, bbox_to_anchor=(0.0, 1.0))

Plot a scatter corner plot of all accepted parameter pair values

Plots scatter plots for parameter values in 2D subplots (and optionally the histogram of points in the 1D diagonal subplots).

Parameters
  • ax (list of matplotlib axes or None, default=None) – If ax is None a new figure and axes are created to make a corner plot otherwise a set of axes are formatted correctly for a corner plot. If existing ax is not correctly shaped there will be an error

  • ranges (list, default=None) – The list of ranges to set the number of rows and columns (if this is None the ranges set on initialisation will be used)

  • points (float(n_targets, n_points, {n_params} or {n_summaries})) – The points to scatter plot, if None the class instance of parameters.accepted will be used

  • label (str or None, default=None) – Name to be used in the legend

  • axis_labels (list of str or None, default=None) – A list of names for each parameter, no axis labels if None

  • colours (str or list or None, default=None) – The colour or list of colours to use for the different targets, if None then the normal matplotlib colours are used

  • hist (bool, default=True) – Whether or not to plot histograms on the diagonal of the corner plot

  • s (float, default=5.) – The size of the marker points in the scatter plot

  • alpha (float, default=1.) – The amount of alpha colour for the marker points

  • figsize (tuple, default=(10, 10)) – The size (in inches) to create a figure (if ax is None)

  • linestyle (str, default="solid") – Linestyle for the histograms

  • target (None or int or list, default=None) – The indices to choose which target’s points are plotted

  • format (bool, default=False) – If formatting is not needed

  • ncols (int, default=2) – Number of columns for the legend

  • bbox_to_anchor (tuple, default=(0.0, 1.0)) – Position to fix the legend to

Returns

The scatter plot axes

Return type

axes object

scatter_summaries(ax=None, ranges=None, points=None, label=None, axis_labels=None, colours=None, hist=True, s=5, alpha=1.0, figsize=(10, 10), linestyle='solid', gridsize=100, target=None, format=False, ncol=2, bbox_to_anchor=(0.0, 1.0))

Plot a scatter corner plot of all accepted summary pair values

Plots scatter plots for summary values in 2D subplots (and optionally the histogram of points in the 1D diagonal subplots).

Parameters
  • ax (list of matplotlib axes or None, default=None) – If ax is None a new figure and axes are created to make a corner plot otherwise a set of axes are formatted correctly for a corner plot. If existing ax is not correctly shaped there will be an error

  • ranges (list, default=None) – The list of ranges to set the number of rows and columns (if this is None the ranges set on initialisation will be used)

  • points (float(n_targets, n_points, {n_params} or {n_summaries})) – The points to scatter plot, if None the class instance of summaries.accepted will be used

  • label (str or None, default=None) – Name to be used in the legend

  • axis_labels (list of str or None, default=None) – A list of names for each parameter, no axis labels if None

  • colours (str or list or None, default=None) – The colour or list of colours to use for the different targets, if None then the normal matplotlib colours are used

  • hist (bool, default=True) – Whether or not to plot histograms on the diagonal of the corner plot

  • s (float, default=5.) – The size of the marker points in the scatter plot

  • alpha (float, default=1.) – The amount of alpha colour for the marker points

  • figsize (tuple, default=(10, 10)) – The size (in inches) to create a figure (if ax is None)

  • linestyle (str, default="solid") – Linestyle for the histograms

  • target (None or int or list, default=None) – The indices to choose which target’s points are plotted

  • format (bool, default=False) – If formatting is not needed

  • ncols (int, default=2) – Number of columns for the legend

  • bbox_to_anchor (tuple, default=(0.0, 1.0)) – Position to fix the legend to

Returns

The scatter plot axes

Return type

axes object

set_accepted(ε, smoothing=None)

Sets the accepted and rejected attributes of the containers

Using a distance (or list of distances for each target) cutoff between simulation summaries and summaries from some target the accepted and rejected parameter values (and summaries) can be defined. These points are used to make an approximate set of marginal distributions for plotting based on histogramming the points - where smoothing can be performed on the histogram to avoid undersampling artefacts.

Parameters
  • ϵ (float or float(n_targets)) – The acceptance distance between summaries from simulations and the summary of the target data. A different epsilon can be passed for each target.

  • smoothing (float or None, default=None) – A Gaussian smoothing for the marginal distributions

get_accepted:

Returns a boolean array with whether summaries are within ϵ

set_marginals(accepted_parameters=None, ranges=None, gridsize=None, smoothing=None)

Wrapper for get_marginals to set the marginals attribute

Parameters
  • accepted_parameters (float(any, n_params) or None, default=None) – An array of all accepted parameter values

  • ranges (list or None, default=None) – A list of arrays containing the bin centres for the marginal distribution obtained by histogramming for each parameter.

  • gridsize (list or None, default=None) – The number of grid points to evaluate the marginal distribution on for each parameter

  • smoothing (float or None, default=None) – A Gaussian smoothing for the marginal distributions

set_samples(parameters, summaries, distances=None, replace=False)

Fills containers and calculates distances

All parameters and summaries of simulations made at those parameters (and their distance from the summary of the observed data) are kept in containers for ease of use. This function sets the all attribute of these containers with the passed summaries and parameters (and distances if provided or it calculates the distances). These are concatenated to existing values unless replace = True in which the existing values are removed and overwritten with the passed summaries, parameters and distances.

Parameters
  • parameters (float(any, n_params)) – The pre-obtained parameter values to set to the class

  • summaries (float(any, n_summaries)) – The summaries of prerun simulations (at parameter values corresponding to the values in parameters)

  • distances (float(n_targets, any) or None, default=None) – The distances of the summaries from the summaries of each target. If None then the distances will be calculated

  • replace (bool, default=False) – Whether to replace the summaries, parameters and distances already obtained

PopulationMonteCarlo

class imnn.lfi.PopulationMonteCarlo(target_data, prior, simulator, compressor, F=None, gridsize=100, distance_measure=None)

Population driven approximate Bayesian computation sample retrieval

SEEMS TO BE A BUG IN THIS MODULE CURRENTLY!

Approximate Bayesian computation tends to be very expensive because it depends on drawing samples from all over the prior and a lot of the domain may be extremely unlikely without good knowledge of the sampling distribution. To make sampling more efficient, the population Monte Carlo iteratively updates the proposal distribution for new samples based on the density of the closest summaries of simulations to the summary of the target.

Parameters
  • acceptance_reached (float(n_targets,)) – The number of draws accepted over the number over the number of draws made for each target

  • iterations (int(n_targets,)) – The number of iterations of population updates for each target

  • total_draws (int(n_targets,)) – The total number of draws made (equivalent to number of simulations made) for each target

Public Methods:

__init__(target_data, prior, simulator, …)

Constructor method

__call__(rng, n_points[, percentile, …])

Run the PMC

move_samples(rng, samples, summaries, …[, …])

Loop which updates the population iteratively for individual target

set_samples(parameters, summaries[, …])

Fills containers and calculates distances

set_accepted([smoothing])

Container values to list of accepted attributes, builds marginals

w_cov(proposed, weighting)

Inherited from ApproximateBayesianComputation

__init__(target_data, prior, simulator, …)

Constructor method

__call__(rng, n_points[, percentile, …])

Run the PMC

set_samples(parameters, summaries[, …])

Fills containers and calculates distances

get_samples(rng, n_samples[, dist])

Draws parameters from prior and makes and compresses simulations

set_accepted([smoothing])

Container values to list of accepted attributes, builds marginals

get_min_accepted(rng, ε, accepted[, …])

Iteratively run ABC until a minimum number of samples are accepted

set_marginals([accepted_parameters, ranges, …])

Wrapper for get_marginals to set the marginals attribute

get_marginals([accepted_parameters, ranges, …])

Creates the 1D and 2D marginal distribution list for plotting

scatter_plot([ax, ranges, points, label, …])

Plot a scatter corner plot of all accepted parameter pair values

scatter_summaries([ax, ranges, points, …])

Plot a scatter corner plot of all accepted summary pair values

F_distance(x, y, F)

Default distance measure, squared difference scaled by Fisher matrix

euclidean_distance(x, y[, aux])

Euclidean distance between simulation summaries and target summary

Inherited from LikelihoodFreeInference

__init__(target_data, prior, simulator, …)

Constructor method

get_gridsize(gridsize, size)

Propogates gridpoints per parameter into a list if int provided

get_levels(marginal, ranges[, levels])

Used for approximately calculating confidence region isocontours

setup_plot([ax, ranges, axis_labels, …])

Builds corner plot

marginal_plot([ax, ranges, marginals, …])

Plots the marginal distribution corner plots

put_marginals(marginals)

Creates list of 1D and 2D marginal distributions ready for plotting

target_choice(target)

Returns the indices of targets to plot and the number of targets

Private Methods:

Inherited from LikelihoodFreeInference

_scatter_plot([ax, ranges, points, label, …])

Plotter for scatter plots


move_samples(rng, samples, summaries, distances, weighting, target, F, ε_ind=None, acceptance_ratio=None, max_iteration=None, max_acceptance=None, max_samples=None, n_parallel_simulations=None)

Loop which updates the population iteratively for individual target

The values of the parameters corresponding to the most distant summaries to the summary of the target data are considered to be the means of a new proposal distribution with a covariance given by the population weighted samples (weighted from the prior in the first instance). From this proposal distribution a new sample (or a parallel set of samples) is proposed and a simulation made at each sample, compressed and the distance from the summary of the target data calculated. If this distance is within the population then that proposal is accepted and the weighting is updated with this new sample and the new furthest samples will be updated with this new proposal distribution on the next iteration. This is continued until the number of accepted draws is small compared to the total number of draws (this is the acceptance_ratio). At this point the distribution is somewhat stationary (as long as acceptance_ratio is small enough) and the samples approximately come from the target distribution.

Parameters
  • rng (int(2,)) – A random number generator to draw the parameters and make simulations with

  • samples (float(n_points, n_params)) – The parameter values from the ABC step

  • summaries (float(n_points, n_summaries)) – The summaries of prerun simulations from the ABC step (at parameter values corresponding to the values in samples)

  • distances (float(n_points,)) – The distances of the summaries from the summaries of a target from the ABC step

  • weighting (float(n_points,)) – The value of the prior evaluated at the parameter values obtained in the ABC step

  • target (float(n_summaries,)) – The summary of the target to infer the parameter values of

  • F (float(n_params, n_params)) – The Fisher information matrix to rescale parameter directions for measuring the distance. This should be set to the identity matrix float(n_summaries, n_summaries) if the summaries are not parameter estimates

  • ϵ_ind (float, default=None) – The index of the outer most population simulation

  • acceptance_ratio (float, default=None) – The ratio of the number of accepted points to total proposals. When this gets small it suggests that points aren’t being added to the population any more because the population is stationary

  • max_iteration (int, default=None) – The cutoff number of iterations to break out of the while loop even if acceptance_ratio is not reached

  • max_acceptance (int, default=None) – The cutoff number of attempts in a single iteration to attempt to get a sample accepted

  • max_samples (int, default=None) – The number of attempts to get parameter values from the truncated normal distribution

  • n_parallel_simulations (int or None, default=None) – number of simulations to do at once, the innermost (closest) summaries are the only ones accepted, but this can massively reduce sampling time as long as the simulations can be easily parallelised

Returns

  • int(2,) – A random number generator to draw the parameters and make simulations with

  • float(n_points, n_params) – The accepted parameter values

  • float(n_points, n_summaries) – The summaries corresponding to the accepted parameter values, i.e. the closest summaries to the summary of the target data

  • float(n_points,) – The distances of the summaries from the summaries of the target

  • float(n_points,) – The value of the weighting from by the population

  • float – The acceptance ratio reach by the final iteration

  • int – The number of iterations run

  • int – The total number of draws from the proposal distribution over all iterations

single_iteration_condition:

Checks if the acceptance ratio or maximum iterations is reached

single_iteration:

Runs a single (or parallel set of) simulation(s), accepts if closer

set_accepted(smoothing=None)

Container values to list of accepted attributes, builds marginals

Parameters

smoothing (float or None, default=None) – A Gaussian smoothing for the marginal distributions

set_samples(parameters, summaries, distances=None, replace=False)

Fills containers and calculates distances

All parameters and summaries of simulations made at those parameters (and their distance from the summary of the observed data) are kept in containers for ease of use. This function sets the all attribute of these containers with the passed summaries and parameters (and distances if provided or it calculates the distances). These are concatenated to existing values unless replace = True in which the existing values are removed and overwritten with the passed summaries, parameters and distances.

Parameters
  • parameters (float(n_targets, n_points, n_params)) – The pre-obtained parameter values to set to the class

  • summaries (float(n_targets, n_points, n_summaries)) – The summaries of prerun simulations (at parameter values corresponding to the values in parameters)

  • distances (float(n_targets, n_points) or None, default=None) – The distances of the summaries from the summaries of each target. If None then the distances will be calculated

  • replace (bool, default=False) – Whether to replace the summaries, parameters and distances already obtained