Likelihood-free inference modules¶

To be able to make use of the outputs of the IMNN it is best to use likelihood-free inference (this is true when using any sort of neural network, but anyway - I digress). Any lfi or sbi module should be able to use these outputs where the network (once trained) can be added to the simulation as a compression step. For ease of use there are some jax implementations of a few different approximate Bayesian computation algorithms. These also come with a plotting function. Examples are available here. Note that the module is not fully tested, but seems to be functionally working. There are more functions available with pip install imnn-tf where there is a previous implementation with more general versions of the same functions and checkout also github.com/justinalsing/pydelfi

Doing likelihood-free inference¶

With a trained IMNN it is possible to get an estimate of some data using

estimate = IMNN.get_estimate(target_data)

Along with the Fisher information from the network, we can use this to make a Gaussian approximation of the posterior under the assumption that the fiducial parameter values used to calculate the Fisher information coincide with the parameter estimate. This posterior can be calculated using the imnn.lfi module. For all of the available functions in the lfi module a TensorFlow Probability-like distribution is used for the prior, e.g. for a uniform distribution for two parameters between 0 and 10 each we could write

import tensorflow_probability
tfp = tensorflow_probability.substrates.jax

prior = tfp.distributions.Blockwise(
[tfp.distributions.Uniform(low=low, high=high)
 for low, high in zip([0., 0.], [10., 10.])])

 prior.low = np.array([0., 0.])
 prior.high = np.array([10., 10.])

We set the values of prior.low and prior.high since they are used to define the plotting ranges. Note that prior.event_shape should be equal to n_params, i.e. the number of parameters in the physical model.

Gaussian Approximation¶

The GaussianApproximation simply evaluates a multivariate Gaussian with mean at estimate and covariance given by np.linalg.inv(IMNN.F) on a grid defined by the prior ranges.

GA = imnn.lfi.GaussianApproximation(
  parameter_estimates=estimate,
  invF=np.linalg.inv(IMNN.F),
  prior=prior,
  gridsize=100)

And corner plots of the Gaussian approximation can be made using

GA.marginal_plot(
  ax=None,                   # Axes object to plot (constructs new if None)
  ranges=None,               # Ranges for each parameter (None=preset)
  marginals=None,            # Marginal distributions to plot (None=preset)
  known=None,                # Plots known parameter values if not None
  label=None,                # Adds legend element if not None
  axis_labels=None,          # Adds labels to the axes if not None
  levels=None,               # Plot specified approx significance contours
  linestyle="solid",         # Linestyle for the contours
  colours=None,              # Colour for the contours
  target=None,               # If multiple target data, which index to plot
  format=False,              # Whether to set up the plot decoration
  ncol=2,                    # Number of columns in the legend
  bbox_to_anchor=(1.0, 1.0)) # Where to place the legend

Note that this approximation shouldn’t be necessarily a good estimate of the true posterior, for that actual LFI methods should be used.

Approximate Bayesian Computation¶

To generate simulations and accept or reject these simulations based on a distance based criterion from some target data we can use

ABC = imnn.lfi.ApproximateBayesianComputation(
  target_data=target_data,
  prior=prior,
  simulator=simulator,
  compressor=IMNN.get_estimate,
  gridsize=100,
  F=IMNN.F,
  distance_measure=None)

This takes in the target data and compresses it using the provided compressor (like IMNN.get_estimate). The Fisher information matrix can be provided to rescale the parameter directions to make meaningful distance measurements as long as summaries are parameter estimates. If a different distance measure is better for the specific problem this can be passed as a function. Note that if simulations have already been done for the ABC and only the plotting and the acceptance and rejection is needed then simulator can be set to None. The ABC can actually be run by calling the module

parameters, summaries, distances = ABC(
    ϵ=None,             # The size of the epsilon ball to accept summaries
    rng=None,           # Random number generator for params and simulation
    n_samples=None,     # The number of samples to run (at one time)
    parameters=None,    # Values of parameters with premade compressed sims
    summaries=None,     # Premade compressed sims to avoid running new sims
    min_accepted=None,  # Num of required sims in epsilon ball (iterative)
    max_iterations=10,  # Max num of iterations to try and get min_accepted
    smoothing=None,     # Amount of smoothing on the histogrammed marginals
    replace=False)      # Whether to remove all previous run summaries

If not run and no premade simulations have been made then n_samples and rng must be passed. Note that if ϵ is too large then the accepted samples should not be considered to be drawn from the posterior but rather some partially marginalised part of the joint distribution of summaries and parameters, and hence it can be very misleading - ϵ should be a small as possible! Like with the GaussianApproximation there is a ABC.marginal_plot(...) but the parameter samples can also be plotted as a scatter plot on the corner plot

ABC.scatter_plot(
    ax=None,                 # Axes object to plot (constructs new if None)
    ranges=None,             # Ranges for each parameter (None=preset)
    points=None,             # Parameter values to scatter (None=preset)
    label=None,              # Adds legend element if not None
    axis_labels=None,        # Adds labels to the axes if not None
    colours=None,            # Colour for the scatter points (and hists)
    hist=True,               # Whether to plot 1D histograms of points
    s=5,                     # Marker size for points
    alpha=1.,                # Amount of transparency for the points
    figsize=(10, 10),        # Size of the figure if not premade
    linestyle="solid",       # Linestyle for the histograms
    target=None,             # If multiple target data, which index to plot
    ncol=2,                  # Number of columns in the legend
    bbox_to_anchor=(0., 1.)) # Where to place the legend

And the summaries can also be plotted on a corner plot with exactly the same parameters as scatter_plot (apart from gridsize being added) but if points is left None then ABC.summaries.accepted is used instead and the ranges calculated from these values. If points is supplied but ranges is None then the ranges are calculated from the minimum and maximum values of the points are used as the edges.

ABC.scatter_summaries(
    ax=None,
    ranges=None,
    points=None,
    label=None,
    axis_labels=None,
    colours=None,
    hist=True,
    s=5,
    alpha=1.,
    figsize=(10, 10),
    linestyle="solid",
    gridsize=100,
    target=None,
    format=False,
    ncol=2,
    bbox_to_anchor=(0.0, 1.0))

Population Monte Carlo¶

To more efficiently accept samples than using a simple ABC where samples are drawn from the prior, the PopulationMonteCarlo provides a JAX accelerated iterative acceptance and rejection scheme where each iteration the population of samples with summaries closest to the summary of the desired target defines a new proposal distribution to force a fixed population to converge towards the posterior without setting an explicit size for the epsilon ball of normal ABC. The PMC is stopped using a criterion on the number of accepted proposals compared to the number of total draws from the proposal. When this gets very small it suggests the distribution is stationary and that the proposal has been reached. It works similarly to ApproximateBayesianComputation.

PMC = imnn.lfi.PopulationMonteCarlo(
  target_data=target_data,
  prior=prior,
  simulator=simulator,
  compressor=IMNN.get_estimate,
  gridsize=100,
  F=IMNN.F,
  distance_measure=None)

And it can be run using

parameters, summaries, distances = PMC(
    rng,                     # Random number generator for params and sims
    n_points,                # Number of points from the final distribution
    percentile=None,         # Percentage of points making the population
    acceptance_ratio=0.1,    # Fraction of accepted draws vs total draws
    max_iteration=10,        # Maximum number of iterations of the PMC
    max_acceptance=1,        # Maximum number of tries to get an accepted
    max_samples=int(1e5),    # Maximum number of attempts to get parameter
    n_initial_points=None,   # Number of points in the initial ABC step
    n_parallel_simulations=None, # Number of simulations to do in parallel
    proposed=None,           # Prerun parameter values for the initial ABC
    summaries=None,          # Premade compressed simulations for ABC
    distances=None,          # Precalculated distances for the initial ABC
    smoothing=None,          # Amount of smoothing on histogrammed marginal
    replace=False)           # Whether to remove all previous run summaries

The same plotting functions as ApproximateBayesianComputation are also available in the PMC

Note¶

There seems to be a bug in PopulationMonteCarlo and the parallel sampler is turned off

LikelihoodFreeInference¶

class imnn.lfi.LikelihoodFreeInference(prior, gridsize=100, marginals=None)¶

Base class (some functionality) for likelihood free inference methods

Mostly used for plotting

Parameters

gridsize (list) – The number of grid points to evaluate the marginal distribution on for each parameter
ranges (list) – A list of arrays containing the gridpoints for the marginal distribution for each parameter
marginal (list of lists) – A list of rows and columns of marginal distributions of a corner plot

prior()¶: A prior distribution which can be evaluated and sampled from (should also contain a low and a high attribute with appropriate ranges)

Public Methods:

`__init__`(prior[, gridsize, marginals])	Constructor method
`get_gridsize`(gridsize, size)	Propogates gridpoints per parameter into a list if int provided
`get_levels`(marginal, ranges[, levels])	Used for approximately calculating confidence region isocontours
`setup_plot`([ax, ranges, axis_labels, …])	Builds corner plot
`marginal_plot`([ax, ranges, marginals, …])	Plots the marginal distribution corner plots
`put_marginals`(marginals)	Creates list of 1D and 2D marginal distributions ready for plotting
`target_choice`(target)	Returns the indices of targets to plot and the number of targets

Private Methods:

_scatter_plot([ax, ranges, points, label, …])

Plotter for scatter plots

_scatter_plot(ax=None, ranges=None, points=None, label=None, axis_labels=None, colours=None, hist=True, s=5.0, alpha=1.0, figsize=(10, 10), linestyle='solid', target=None, format=False, ncol=2, bbox_to_anchor=(0.0, 1.0))¶

Plotter for scatter plots

Plots scatter plots for points (parameters or summaries) in 2D subplots and the histogram of points in the 1D diagonal subplots.

Parameters

ax (list of matplotlib axes or None, default=None) – If ax is None a new figure and axes are created to make a corner plot otherwise a set of axes are formatted correctly for a corner plot. If existing ax is not correctly shaped there will be an error
ranges (list, default=None) – The list of ranges to set the number of rows and columns (if this is None the ranges set on initialisation will be used)
points (float(n_targets, n_points, {n_params} or {n_summaries})) – The points to scatter plot
label (str or None, default=None) – Name to be used in the legend
axis_labels (list of str or None, default=None) – A list of names for each parameter, no axis labels if None
colours (str or list or None, default=None) – The colour or list of colours to use for the different targets, if None then the normal matplotlib colours are used
hist (bool, default=True) – Whether or not to plot histograms on the diagonal of the corner plot
s (float, default=5.) – The size of the marker points in the scatter plot
alpha (float, default=1.) – The amount of alpha colour for the marker points
figsize (tuple, default=(10, 10)) – The size (in inches) to create a figure (if ax is None)
linestyle (str, default="solid") – Linestyle for the histograms
target (None or int or list, default=None) – The indices to choose which target’s points are plotted
format (bool, default=False) – If formatting is not needed
ncols (int, default=2) – Number of columns for the legend
bbox_to_anchor (tuple, default=(0.0, 1.0)) – Position to fix the legend to

Returns

The scatter plot axes

Return type

axes object

Raises

ValueError – if colour input is not correct

get_gridsize(gridsize, size)¶

Propogates gridpoints per parameter into a list if int provided

Parameters

gridsize (int or list) – The number of grid points to evaluate the marginal distribution on for every parameter (int) or each parameter (list)
size (int) – Number of parameters describing size of gridsize list

Returns

The number of gridpoints to evaluate marginals for each parameter

Return type

list

Raises

ValueError – If list passed is wrong length
TypeError – If gridsize is not int or list of correct length

get_levels(marginal, ranges, levels=[0.68, 0.95])¶

Used for approximately calculating confidence region isocontours

To calculate the values of the marginal distribution whose isocontour contains approximately levels specified fraction of samples drawn from the distribution the marginal distribution is sorted by value and normalised to one. If the distribution does is significantly non-zero outside of the range then this will cause large biases! The cumulative sum of the sorted values are then calculated and the value at which the index of this normalised cumulative distribution is closest to the desired level is used to return the value of the original marginal distribution.

Parameters

marginal (float(gridsize*n_param)) – The gridded marginal distribution to find the isocontour values of
ranges (list) – List of the grid points for each parameter
levels (list, default=[0.68, 0.95]) – The fraction describing the percentage of samples inside the isocontour

Returns

The values of the isocontours of the marginal distributions

Return type

list

marginal_plot(ax=None, ranges=None, marginals=None, known=None, label=None, axis_labels=None, levels=None, linestyle='solid', colours=None, target=None, format=False, ncol=2, bbox_to_anchor=(1.0, 1.0))¶

Plots the marginal distribution corner plots

Plots the 68% and 95% (approximate) confidence contours for each 2D parameter pair and the 1D densities as a corner plot.

Parameters

ax (list of matplotlib axes or None, default=None) – If ax is None a new figure and axes are created to make a corner plot otherwise a set of axes are formatted correctly for a corner plot. If existing ax is not correctly shaped there will be an error
ranges (list, default=None) – The list of ranges to set the number of rows and columns (if this is None the ranges set on initialisation will be used)
marginals (list of lists (n_params * n_params) or None, default=None) – The marginal distributions for every parameter and every 2D combination. If None the marginals defined in the class are used (if they exist)
known (float(n_params,)) – Values to plot known parameter value lines at
label (str or None, default=None) – Name to be used in the legend
axis_labels (list of str or None, default=None) – A list of names for each parameter, no axis labels if None
linestyle (str, default="solid") – Linestyle for the histograms
colours (str or list or None, default=None) – The colour or list of colours to use for the different targets, if None then the normal matplotlib colours are used
target (None or int or list, default=None) – The indices to choose which target’s points are plotted
format (bool, default=False) – If formatting is not needed
ncols (int, default=2) – Number of columns for the legend
bbox_to_anchor (tuple, default=(0.0, 1.0)) – Position to fix the legend to

Returns

The scatter plot axes

Return type

axes object

Raises

ValueError – if marginals is not provided
TypeError – if marginals is not a list

put_marginals(marginals)¶

Creates list of 1D and 2D marginal distributions ready for plotting

The marginal distribution lists from full distribution array. For every parameter the full distribution is summed over every other parameter to get the 1D marginals and for every combination the 2D marginals are calculated by summing over the remaining parameters. The list is made up of a list of n_params lists which contain n_columns number of objects.

Parameters: marginals (float(n_targets, gridsize*n_params)) – The full distribution from which to calculate the marginal distributions
Returns: The 1D and 2D marginal distributions for each parameter (of pair)
Return type: list of lists

setup_plot(ax=None, ranges=None, axis_labels=None, figsize=(10, 10), format=False)¶

Builds corner plot

axlist of matplotlib axes or None, default=None: If ax is None a new figure and axes are created to make a corner plot otherwise a set of axes are formatted correctly for a corner plot. If existing ax is not correctly shaped there will be an error
rangeslist, default=None: The list of ranges to set the number of rows and columns (if this is None there will be an error)
axis_labelslist of str or None, default=None: A list of names for each parameter, no axis labels if None
figsizetuple, default=(10, 10): The size (in inches) to create a figure (if ax is None)
formatbool, default=False: If formatting is not needed

Returns: The formatted matplotlib axes
Return type: axes object

target_choice(target)¶

Returns the indices of targets to plot and the number of targets

Parameters

target (None or int or list) – The indices of the targets to plot. If None then n_targets from the class instance is used

Returns

list – The indices of the targets to be plotted
int – The number of targets to be plotted

GaussianApproximation¶

class imnn.lfi.GaussianApproximation(parameter_estimates, invF, prior, gridsize=100)¶

Uses Fisher information and parameter estimates approximate marginals

Since the inverse of the Fisher information matrix describes the minimum variance of some estimator we can use it to make an approximate (Gaussian- distributed) estimate of the target distribution. Note that this will not reflect the true shape of the target distribution and as likely to underestimate the distribution as overestimate it. Furthermore, if the Fisher information matrix is calculated far from the estimate of the parameter values then its value may not be representative of the Fisher information at that position and so the variance estimated from its inverse be incorrect.

Parameters

parameter_estimates (float(n_targets, n_params)) – The parameter estimates of each target data
invF (float(n_targets, n_params, n_params)) – The inverse Fisher information matrix for each target
marginals (list of lists) – The 1D and 2D marginal distribution for each target

Public Methods:

`__init__`(parameter_estimates, invF, prior[, …])	Constructor method
`get_marginals`([parameter_estimates, invF, …])	Creates list of 1D and 2D marginal distributions ready for plotting

Inherited from LikelihoodFreeInference

`__init__`(parameter_estimates, invF, prior[, …])	Constructor method
`get_gridsize`(gridsize, size)	Propogates gridpoints per parameter into a list if int provided
`get_levels`(marginal, ranges[, levels])	Used for approximately calculating confidence region isocontours
`setup_plot`([ax, ranges, axis_labels, …])	Builds corner plot
`marginal_plot`([ax, ranges, marginals, …])	Plots the marginal distribution corner plots
`put_marginals`(marginals)	Creates list of 1D and 2D marginal distributions ready for plotting
`target_choice`(target)	Returns the indices of targets to plot and the number of targets

Private Methods:

Inherited from LikelihoodFreeInference

_scatter_plot([ax, ranges, points, label, …])

Plotter for scatter plots

get_marginals(parameter_estimates=None, invF=None, ranges=None, gridsize=None)¶

Creates list of 1D and 2D marginal distributions ready for plotting

The marginal distribution lists from full distribution array. For every parameter the full distribution is summed over every other parameter to get the 1D marginals and for every combination the 2D marginals are calculated by summing over the remaining parameters. The list is made up of a list of n_params lists which contain n_columns number of objects. The value of the distribution comes from

Parameters

parameter_estimates (float(n_targets, n_params) or None, default=None) – The parameter estimates of each target data. If None the class instance parameter estimates are used
invF (float(n_targets, n_params, n_params) or None, default=None) – The inverse Fisher information matrix for each target. If None the class instance inverse Fisher information matrices are used
ranges (list or None, default=None) – A list of arrays containing the gridpoints for the marginal distribution for each parameter. If None the class instance ranges are used determined by the prior range
gridsize (list or None, default=None) – If using own ranges then the gridsize for these ranges must be passed (not checked)

Returns

The 1D and 2D marginal distributions for each parameter (of pair)

Return type

list of lists

ApproximateBayesianComputation¶

class imnn.lfi.ApproximateBayesianComputation(target_data, prior, simulator, compressor, gridsize=100, F=None, distance_measure=None)¶

Approximate Bayesian computation

Approximate Bayesian computation involves drawing parameter values from a prior, \(\boldsymbol{\theta}_i\leftarrow p(\boldsymbol{\theta})\), and generating simulations at this value using a model, \({\bf d}_i = f(\boldsymbol{\theta}_i)\), which can then be compressed down to some informative summary, \({\bf x}_i =g({\bf d}_i)\). By compressing some observed target to get \({\bf x}^\textrm{obs} =g({\bf d}^\textrm{obs})\) we can simulate from the prior distribution of possible parameter values to create a set of summaries \(\{{\bf x}_i|i\in[1, n_\textrm{sims}]\}\). The approximate posterior distribution is then obtained as the samples of parameter values whose summaries of simulations have an infinitely small difference from the summary of the target observation:

\[p(\boldsymbol{\theta}|{\bf d}^\textrm{obs})\approx \{\boldsymbol{\theta}_i| i~\textrm{where}~\lim_{\epsilon\to0} D(g(f(\boldsymbol{\theta}_i), g({\bf d}^\textrm{obs}))<\epsilon\}\]

Note that the distance measure \(D(x, y)\) is somewhat ambiguous, but in the infinitessimal limit will be more or less flat. Sensible choices for this distance measure can be made, i.e. for parameter estimates a Fisher scaled squared distance, but it is not wholly trivial in all cases.

This class contains several methods for making simulations, compressing them and doing distance based acceptance and rejection of the parameters used to make the simulations from the compressed summaries of target data. In particular either a single set of simulations can be run and the distance calculated, simulations can be run until a desired number of accepted points within achosen distance are obtained or if there are already a set of run simulations, different distance values can be chosen for acceptance and rejection of points. There are also several plotting methods for corner plots.

Note that if the summaries of the data are not parameter estimates then the distance measure must be optimal for the summary space to make sense.

Parameters

target_summaries (float(n_targets, n_summaries)) – The compressed data to be infered
n_summaries (int) – The size of the output of the compressor for a single simulation
n_targets (int) – The number of different targets to be infered
F (float(n_params, n_params)) – The Fisher information matrix for rescaling distance measure
parameters (utils container) – Holds accepted and rejected parameters and some summaries of results
summaries (utils container) – Holds accepted and rejected summaries and some summaries of results
distances (utils container) – Holds accepted and rejected distances of summaries to targets and some summaries of results

simulator()¶: A simulator which takes in an array of parameters and returns simulations made at those parameter values

compressor()¶: A function which takes in an array of simulations and compresses them. If no compression is needed this can be an identity function

distance_measure:: Either F_distance or euclidean_distance depending on inputs

Public Methods:

`__init__`(target_data, prior, simulator, …)	Constructor method
`__call__`(ε=None[, rng, n_samples, …])	Run the ABC
`set_samples`(parameters, summaries[, …])	Fills containers and calculates distances
`get_samples`(rng, n_samples[, dist])	Draws parameters from prior and makes and compresses simulations
`set_accepted`(ε[, smoothing])	Sets the accepted and rejected attributes of the containers
`get_min_accepted`(rng, ε, accepted[, …])	Iteratively run ABC until a minimum number of samples are accepted
`set_marginals`([accepted_parameters, ranges, …])	Wrapper for `get_marginals` to set the `marginals` attribute
`get_marginals`([accepted_parameters, ranges, …])	Creates the 1D and 2D marginal distribution list for plotting
`scatter_plot`([ax, ranges, points, label, …])	Plot a scatter corner plot of all accepted parameter pair values
`scatter_summaries`([ax, ranges, points, …])	Plot a scatter corner plot of all accepted summary pair values
`F_distance`(x, y, F)	Default distance measure, squared difference scaled by Fisher matrix
`euclidean_distance`(x, y[, aux])	Euclidean distance between simulation summaries and target summary

Inherited from LikelihoodFreeInference

`__init__`(target_data, prior, simulator, …)	Constructor method
`get_gridsize`(gridsize, size)	Propogates gridpoints per parameter into a list if int provided
`get_levels`(marginal, ranges[, levels])	Used for approximately calculating confidence region isocontours
`setup_plot`([ax, ranges, axis_labels, …])	Builds corner plot
`marginal_plot`([ax, ranges, marginals, …])	Plots the marginal distribution corner plots
`put_marginals`(marginals)	Creates list of 1D and 2D marginal distributions ready for plotting
`target_choice`(target)	Returns the indices of targets to plot and the number of targets

Private Methods:

Inherited from LikelihoodFreeInference

_scatter_plot([ax, ranges, points, label, …])

Plotter for scatter plots

F_distance(x, y, F)¶

Default distance measure, squared difference scaled by Fisher matrix

The target summaries are expanded on the zeroth axis so that it can broadcast with the simulation summaries.

Parameters

x ((any, n_params)) – Envisioned as a whole batch of summaries of simulations
y ((n_params)) – Envisioned as a target summary
F ((n_params)) – Fisher information matrix to scale the summary direction

euclidean_distance(x, y, aux=None)¶

Euclidean distance between simulation summaries and target summary

The target summaries are expanded on the zeroth axis so that it can broadcast with the simulation summaries.

Parameters

x ((any, n_params)) – Envisioned as a whole batch of summaries of simulations
y ((n_params)) – Envisioned as a target summary
aux (None, default=Note) – Empty holder so that function works in the same way as F_distance

get_marginals(accepted_parameters=None, ranges=None, gridsize=None, smoothing=None)¶

Creates the 1D and 2D marginal distribution list for plotting

Using list of parameter values (accepted by the ABC) an approximate set of marginal distributions for plotting are created based on histogramming the points. Smoothing can be performed on the histogram to avoid undersampling artefacts.

For every parameter the full distribution is summed over every other parameter to get the 1D marginals and for every combination the 2D marginals are calculated by summing over the remaining parameters. The list is made up of a list of n_params lists which contain n_columns number of objects.

Parameters

accepted_parameters (float(any, n_params) or None, default=None) – An array of all accepted parameter values. If None, the accepted parameters from the parameters class attribute are used
ranges (list or None, default=None) – A list of arrays containing the bin centres for the marginal distribution obtained by histogramming for each parameter. If None the ranges are constructed from the ranges of the prior distribution.
gridsize (list or None, default=None) – The number of grid points to evaluate the marginal distribution on for each parameter. This needs to be set if ranges is passed (and different from the gridsize set on initialisation)
smoothing (float or None, default=None) – A Gaussian smoothing for the marginal distributions. Smoothing not done if smoothing is None

Returns

The 1D and 2D marginal distributions for each parameter (of pair)

Return type

list of lists

get_min_accepted(rng, ε, accepted, n_simulations=1, max_iterations=10, smoothing=None, verbose=True)¶

Iteratively run ABC until a minimum number of samples are accepted

Setting a maximum distance that simulation summaries can be allowed to be from the summaries of each target an iterative scheme can be used to get n_samples simulations at a time, compress them and calculate the distances until the minimum number of desired samples are within the allowed cutoff distance. If multiple targets are being inferred then the simulations are run until all targets have at least the minimum number of accepted samples.

Parameters

rng (int(2,)) – A random number generator to draw the parameters and make simulations with
ϵ (float or float(n_targets)) – The acceptance distance between summaries from simulations and the summary of the target data. A different epsilon can be passed for each target.
accepted (int) – The minimum number of samples to be accepted within the ϵ cutoff
n_simulations (int, default=1) – The number of simulations to do at once
max_iterations (int, default=10) – Maximum number of iterations in the while loop to prevent infinite runs. Note if max_iterations is reached then there are probably not the required number of accepted samples
smoothing (float or None, default=None) – A Gaussian smoothing for the marginal distributions
verbose (bool, default=True) – Whether to print out the running stats (number accepted, iterations, etc)

Returns

float(any, n_params) – All parameters values drawn
float(any, n_summaries) – All summaries of simulations made
float(n_target, any) – The distance of every summary to each target

get_samples(rng, n_samples, dist=None)¶

Draws parameters from prior and makes and compresses simulations

Simulations are done n_samples in parallel

Parameters

rng (int(2,)) – A random number generator to draw the parameters and make simulations with
n_samples (int) – Number of parallel samples to draw
dist (fn or None, default=None) – A distribution to sample, if None the samples will be drawn from the prior set in the class initialisation

scatter_plot(ax=None, ranges=None, points=None, label=None, axis_labels=None, colours=None, hist=True, s=5, alpha=1.0, figsize=(10, 10), linestyle='solid', target=None, ncol=2, bbox_to_anchor=(0.0, 1.0))¶

Plot a scatter corner plot of all accepted parameter pair values

Plots scatter plots for parameter values in 2D subplots (and optionally the histogram of points in the 1D diagonal subplots).

Parameters

ax (list of matplotlib axes or None, default=None) – If ax is None a new figure and axes are created to make a corner plot otherwise a set of axes are formatted correctly for a corner plot. If existing ax is not correctly shaped there will be an error
ranges (list, default=None) – The list of ranges to set the number of rows and columns (if this is None the ranges set on initialisation will be used)
points (float(n_targets, n_points, {n_params} or {n_summaries})) – The points to scatter plot, if None the class instance of parameters.accepted will be used
label (str or None, default=None) – Name to be used in the legend
axis_labels (list of str or None, default=None) – A list of names for each parameter, no axis labels if None
colours (str or list or None, default=None) – The colour or list of colours to use for the different targets, if None then the normal matplotlib colours are used
hist (bool, default=True) – Whether or not to plot histograms on the diagonal of the corner plot
s (float, default=5.) – The size of the marker points in the scatter plot
alpha (float, default=1.) – The amount of alpha colour for the marker points
figsize (tuple, default=(10, 10)) – The size (in inches) to create a figure (if ax is None)
linestyle (str, default="solid") – Linestyle for the histograms
target (None or int or list, default=None) – The indices to choose which target’s points are plotted
format (bool, default=False) – If formatting is not needed
ncols (int, default=2) – Number of columns for the legend
bbox_to_anchor (tuple, default=(0.0, 1.0)) – Position to fix the legend to

Returns

The scatter plot axes

Return type

axes object

scatter_summaries(ax=None, ranges=None, points=None, label=None, axis_labels=None, colours=None, hist=True, s=5, alpha=1.0, figsize=(10, 10), linestyle='solid', gridsize=100, target=None, format=False, ncol=2, bbox_to_anchor=(0.0, 1.0))¶

Plot a scatter corner plot of all accepted summary pair values

Plots scatter plots for summary values in 2D subplots (and optionally the histogram of points in the 1D diagonal subplots).

Parameters

ax (list of matplotlib axes or None, default=None) – If ax is None a new figure and axes are created to make a corner plot otherwise a set of axes are formatted correctly for a corner plot. If existing ax is not correctly shaped there will be an error
ranges (list, default=None) – The list of ranges to set the number of rows and columns (if this is None the ranges set on initialisation will be used)
points (float(n_targets, n_points, {n_params} or {n_summaries})) – The points to scatter plot, if None the class instance of summaries.accepted will be used
label (str or None, default=None) – Name to be used in the legend
axis_labels (list of str or None, default=None) – A list of names for each parameter, no axis labels if None
colours (str or list or None, default=None) – The colour or list of colours to use for the different targets, if None then the normal matplotlib colours are used
hist (bool, default=True) – Whether or not to plot histograms on the diagonal of the corner plot
s (float, default=5.) – The size of the marker points in the scatter plot
alpha (float, default=1.) – The amount of alpha colour for the marker points
figsize (tuple, default=(10, 10)) – The size (in inches) to create a figure (if ax is None)
linestyle (str, default="solid") – Linestyle for the histograms
target (None or int or list, default=None) – The indices to choose which target’s points are plotted
format (bool, default=False) – If formatting is not needed
ncols (int, default=2) – Number of columns for the legend
bbox_to_anchor (tuple, default=(0.0, 1.0)) – Position to fix the legend to

Returns

The scatter plot axes

Return type

axes object

set_accepted(ε, smoothing=None)¶

Sets the accepted and rejected attributes of the containers

Using a distance (or list of distances for each target) cutoff between simulation summaries and summaries from some target the accepted and rejected parameter values (and summaries) can be defined. These points are used to make an approximate set of marginal distributions for plotting based on histogramming the points - where smoothing can be performed on the histogram to avoid undersampling artefacts.

Parameters

ϵ (float or float(n_targets)) – The acceptance distance between summaries from simulations and the summary of the target data. A different epsilon can be passed for each target.
smoothing (float or None, default=None) – A Gaussian smoothing for the marginal distributions

get_accepted:: Returns a boolean array with whether summaries are within ϵ

set_marginals(accepted_parameters=None, ranges=None, gridsize=None, smoothing=None)¶

Wrapper for get_marginals to set the marginals attribute

Parameters

accepted_parameters (float(any, n_params) or None, default=None) – An array of all accepted parameter values
ranges (list or None, default=None) – A list of arrays containing the bin centres for the marginal distribution obtained by histogramming for each parameter.
gridsize (list or None, default=None) – The number of grid points to evaluate the marginal distribution on for each parameter
smoothing (float or None, default=None) – A Gaussian smoothing for the marginal distributions

set_samples(parameters, summaries, distances=None, replace=False)¶

Fills containers and calculates distances

All parameters and summaries of simulations made at those parameters (and their distance from the summary of the observed data) are kept in containers for ease of use. This function sets the all attribute of these containers with the passed summaries and parameters (and distances if provided or it calculates the distances). These are concatenated to existing values unless replace = True in which the existing values are removed and overwritten with the passed summaries, parameters and distances.

Parameters

parameters (float(any, n_params)) – The pre-obtained parameter values to set to the class
summaries (float(any, n_summaries)) – The summaries of prerun simulations (at parameter values corresponding to the values in parameters)
distances (float(n_targets, any) or None, default=None) – The distances of the summaries from the summaries of each target. If None then the distances will be calculated
replace (bool, default=False) – Whether to replace the summaries, parameters and distances already obtained

PopulationMonteCarlo¶

class imnn.lfi.PopulationMonteCarlo(target_data, prior, simulator, compressor, F=None, gridsize=100, distance_measure=None)¶

Population driven approximate Bayesian computation sample retrieval

SEEMS TO BE A BUG IN THIS MODULE CURRENTLY!

Approximate Bayesian computation tends to be very expensive because it depends on drawing samples from all over the prior and a lot of the domain may be extremely unlikely without good knowledge of the sampling distribution. To make sampling more efficient, the population Monte Carlo iteratively updates the proposal distribution for new samples based on the density of the closest summaries of simulations to the summary of the target.

Parameters

acceptance_reached (float(n_targets,)) – The number of draws accepted over the number over the number of draws made for each target
iterations (int(n_targets,)) – The number of iterations of population updates for each target
total_draws (int(n_targets,)) – The total number of draws made (equivalent to number of simulations made) for each target

Public Methods:

`__init__`(target_data, prior, simulator, …)	Constructor method
`__call__`(rng, n_points[, percentile, …])	Run the PMC
`move_samples`(rng, samples, summaries, …[, …])	Loop which updates the population iteratively for individual target
`set_samples`(parameters, summaries[, …])	Fills containers and calculates distances
`set_accepted`([smoothing])	Container values to list of accepted attributes, builds marginals
`w_cov`(proposed, weighting)

Inherited from ApproximateBayesianComputation

`__init__`(target_data, prior, simulator, …)	Constructor method
`__call__`(rng, n_points[, percentile, …])	Run the PMC
`set_samples`(parameters, summaries[, …])	Fills containers and calculates distances
`get_samples`(rng, n_samples[, dist])	Draws parameters from prior and makes and compresses simulations
`set_accepted`([smoothing])	Container values to list of accepted attributes, builds marginals
`get_min_accepted`(rng, ε, accepted[, …])	Iteratively run ABC until a minimum number of samples are accepted
`set_marginals`([accepted_parameters, ranges, …])	Wrapper for `get_marginals` to set the `marginals` attribute
`get_marginals`([accepted_parameters, ranges, …])	Creates the 1D and 2D marginal distribution list for plotting
`scatter_plot`([ax, ranges, points, label, …])	Plot a scatter corner plot of all accepted parameter pair values
`scatter_summaries`([ax, ranges, points, …])	Plot a scatter corner plot of all accepted summary pair values
`F_distance`(x, y, F)	Default distance measure, squared difference scaled by Fisher matrix
`euclidean_distance`(x, y[, aux])	Euclidean distance between simulation summaries and target summary

Inherited from LikelihoodFreeInference

`__init__`(target_data, prior, simulator, …)	Constructor method
`get_gridsize`(gridsize, size)	Propogates gridpoints per parameter into a list if int provided
`get_levels`(marginal, ranges[, levels])	Used for approximately calculating confidence region isocontours
`setup_plot`([ax, ranges, axis_labels, …])	Builds corner plot
`marginal_plot`([ax, ranges, marginals, …])	Plots the marginal distribution corner plots
`put_marginals`(marginals)	Creates list of 1D and 2D marginal distributions ready for plotting
`target_choice`(target)	Returns the indices of targets to plot and the number of targets

Private Methods:

Inherited from LikelihoodFreeInference

_scatter_plot([ax, ranges, points, label, …])

Plotter for scatter plots

move_samples(rng, samples, summaries, distances, weighting, target, F, ε_ind=None, acceptance_ratio=None, max_iteration=None, max_acceptance=None, max_samples=None, n_parallel_simulations=None)¶

Loop which updates the population iteratively for individual target

The values of the parameters corresponding to the most distant summaries to the summary of the target data are considered to be the means of a new proposal distribution with a covariance given by the population weighted samples (weighted from the prior in the first instance). From this proposal distribution a new sample (or a parallel set of samples) is proposed and a simulation made at each sample, compressed and the distance from the summary of the target data calculated. If this distance is within the population then that proposal is accepted and the weighting is updated with this new sample and the new furthest samples will be updated with this new proposal distribution on the next iteration. This is continued until the number of accepted draws is small compared to the total number of draws (this is the acceptance_ratio). At this point the distribution is somewhat stationary (as long as acceptance_ratio is small enough) and the samples approximately come from the target distribution.

Parameters

rng (int(2,)) – A random number generator to draw the parameters and make simulations with
samples (float(n_points, n_params)) – The parameter values from the ABC step
summaries (float(n_points, n_summaries)) – The summaries of prerun simulations from the ABC step (at parameter values corresponding to the values in samples)
distances (float(n_points,)) – The distances of the summaries from the summaries of a target from the ABC step
weighting (float(n_points,)) – The value of the prior evaluated at the parameter values obtained in the ABC step
target (float(n_summaries,)) – The summary of the target to infer the parameter values of
F (float(n_params, n_params)) – The Fisher information matrix to rescale parameter directions for measuring the distance. This should be set to the identity matrix float(n_summaries, n_summaries) if the summaries are not parameter estimates
ϵ_ind (float, default=None) – The index of the outer most population simulation
acceptance_ratio (float, default=None) – The ratio of the number of accepted points to total proposals. When this gets small it suggests that points aren’t being added to the population any more because the population is stationary
max_iteration (int, default=None) – The cutoff number of iterations to break out of the while loop even if acceptance_ratio is not reached
max_acceptance (int, default=None) – The cutoff number of attempts in a single iteration to attempt to get a sample accepted
max_samples (int, default=None) – The number of attempts to get parameter values from the truncated normal distribution
n_parallel_simulations (int or None, default=None) – number of simulations to do at once, the innermost (closest) summaries are the only ones accepted, but this can massively reduce sampling time as long as the simulations can be easily parallelised

Returns

int(2,) – A random number generator to draw the parameters and make simulations with
float(n_points, n_params) – The accepted parameter values
float(n_points, n_summaries) – The summaries corresponding to the accepted parameter values, i.e. the closest summaries to the summary of the target data
float(n_points,) – The distances of the summaries from the summaries of the target
float(n_points,) – The value of the weighting from by the population
float – The acceptance ratio reach by the final iteration
int – The number of iterations run
int – The total number of draws from the proposal distribution over all iterations

single_iteration_condition:: Checks if the acceptance ratio or maximum iterations is reached

single_iteration:: Runs a single (or parallel set of) simulation(s), accepts if closer

set_accepted(smoothing=None)¶

Container values to list of accepted attributes, builds marginals

Parameters: smoothing (float or None, default=None) – A Gaussian smoothing for the marginal distributions

set_samples(parameters, summaries, distances=None, replace=False)¶

Fills containers and calculates distances

All parameters and summaries of simulations made at those parameters (and their distance from the summary of the observed data) are kept in containers for ease of use. This function sets the all attribute of these containers with the passed summaries and parameters (and distances if provided or it calculates the distances). These are concatenated to existing values unless replace = True in which the existing values are removed and overwritten with the passed summaries, parameters and distances.

Parameters

parameters (float(n_targets, n_points, n_params)) – The pre-obtained parameter values to set to the class
summaries (float(n_targets, n_points, n_summaries)) – The summaries of prerun simulations (at parameter values corresponding to the values in parameters)
distances (float(n_targets, n_points) or None, default=None) – The distances of the summaries from the summaries of each target. If None then the distances will be calculated
replace (bool, default=False) – Whether to replace the summaries, parameters and distances already obtained

Likelihood-free inference modules¶

Doing likelihood-free inference¶

Gaussian Approximation¶

Approximate Bayesian Computation¶

Population Monte Carlo¶

Note¶

LikelihoodFreeInference¶

GaussianApproximation¶

ApproximateBayesianComputation¶

PopulationMonteCarlo¶

Information Maximising Neural Network

Navigation

Related Topics