Likelihood-free inference modules¶
To be able to make use of the outputs of the IMNN it is best to use
likelihood-free inference (this is true when using any sort of neural network,
but anyway - I digress). Any lfi or sbi module should be able to use these
outputs where the network (once trained) can be added to the simulation as a
compression step. For ease of use there are some jax implementations of a few
different approximate Bayesian computation algorithms. These also come with a
plotting function. Examples are available
here. Note that
the module is not fully tested, but seems to be functionally working. There are
more functions available with pip install imnn-tf
where there is a previous
implementation with more general versions of the same functions and checkout
also github.com/justinalsing/pydelfi
Doing likelihood-free inference¶
With a trained IMNN it is possible to get an estimate of some data using
estimate = IMNN.get_estimate(target_data)
Along with the Fisher information from the network, we can use this to make a Gaussian approximation of the posterior under the assumption that the fiducial parameter values used to calculate the Fisher information coincide with the parameter estimate. This posterior can be calculated using the imnn.lfi
module. For all of the available functions in the lfi module a TensorFlow Probability-like distribution is used for the prior, e.g. for a uniform distribution for two parameters between 0 and 10 each we could write
import tensorflow_probability
tfp = tensorflow_probability.substrates.jax
prior = tfp.distributions.Blockwise(
[tfp.distributions.Uniform(low=low, high=high)
for low, high in zip([0., 0.], [10., 10.])])
prior.low = np.array([0., 0.])
prior.high = np.array([10., 10.])
We set the values of prior.low
and prior.high
since they are used to define the plotting ranges. Note that prior.event_shape
should be equal to n_params
, i.e. the number of parameters in the physical model.
Gaussian Approximation¶
The GaussianApproximation
simply evaluates a multivariate Gaussian with mean at estimate
and covariance given by np.linalg.inv(IMNN.F)
on a grid defined by the prior ranges.
GA = imnn.lfi.GaussianApproximation(
parameter_estimates=estimate,
invF=np.linalg.inv(IMNN.F),
prior=prior,
gridsize=100)
And corner plots of the Gaussian approximation can be made using
GA.marginal_plot(
ax=None, # Axes object to plot (constructs new if None)
ranges=None, # Ranges for each parameter (None=preset)
marginals=None, # Marginal distributions to plot (None=preset)
known=None, # Plots known parameter values if not None
label=None, # Adds legend element if not None
axis_labels=None, # Adds labels to the axes if not None
levels=None, # Plot specified approx significance contours
linestyle="solid", # Linestyle for the contours
colours=None, # Colour for the contours
target=None, # If multiple target data, which index to plot
format=False, # Whether to set up the plot decoration
ncol=2, # Number of columns in the legend
bbox_to_anchor=(1.0, 1.0)) # Where to place the legend
Note that this approximation shouldn’t be necessarily a good estimate of the true posterior, for that actual LFI methods should be used.
Approximate Bayesian Computation¶
To generate simulations and accept or reject these simulations based on a distance based criterion from some target data we can use
ABC = imnn.lfi.ApproximateBayesianComputation(
target_data=target_data,
prior=prior,
simulator=simulator,
compressor=IMNN.get_estimate,
gridsize=100,
F=IMNN.F,
distance_measure=None)
This takes in the target data and compresses it using the provided compressor (like IMNN.get_estimate
). The Fisher information matrix can be provided to rescale the parameter directions to make meaningful distance measurements as long as summaries are parameter estimates. If a different distance measure is better for the specific problem this can be passed as a function. Note that if simulations have already been done for the ABC and only the plotting and the acceptance and rejection is needed then simulator
can be set to None
. The ABC can actually be run by calling the module
parameters, summaries, distances = ABC(
ϵ=None, # The size of the epsilon ball to accept summaries
rng=None, # Random number generator for params and simulation
n_samples=None, # The number of samples to run (at one time)
parameters=None, # Values of parameters with premade compressed sims
summaries=None, # Premade compressed sims to avoid running new sims
min_accepted=None, # Num of required sims in epsilon ball (iterative)
max_iterations=10, # Max num of iterations to try and get min_accepted
smoothing=None, # Amount of smoothing on the histogrammed marginals
replace=False) # Whether to remove all previous run summaries
If not run and no premade simulations have been made then n_samples
and rng
must be passed. Note that if ϵ
is too large then the accepted samples should not be considered to be drawn from the posterior but rather some partially marginalised part of the joint distribution of summaries and parameters, and hence it can be very misleading - ϵ
should be a small as possible! Like with the GaussianApproximation
there is a ABC.marginal_plot(...)
but the parameter samples can also be plotted as a scatter plot on the corner plot
ABC.scatter_plot(
ax=None, # Axes object to plot (constructs new if None)
ranges=None, # Ranges for each parameter (None=preset)
points=None, # Parameter values to scatter (None=preset)
label=None, # Adds legend element if not None
axis_labels=None, # Adds labels to the axes if not None
colours=None, # Colour for the scatter points (and hists)
hist=True, # Whether to plot 1D histograms of points
s=5, # Marker size for points
alpha=1., # Amount of transparency for the points
figsize=(10, 10), # Size of the figure if not premade
linestyle="solid", # Linestyle for the histograms
target=None, # If multiple target data, which index to plot
ncol=2, # Number of columns in the legend
bbox_to_anchor=(0., 1.)) # Where to place the legend
And the summaries can also be plotted on a corner plot with exactly the same parameters as scatter_plot
(apart from gridsize
being added) but if points
is left None
then ABC.summaries.accepted
is used instead and the ranges calculated from these values. If points is supplied but ranges
is None then the ranges are calculated from the minimum and maximum values of the points are used as the edges.
ABC.scatter_summaries(
ax=None,
ranges=None,
points=None,
label=None,
axis_labels=None,
colours=None,
hist=True,
s=5,
alpha=1.,
figsize=(10, 10),
linestyle="solid",
gridsize=100,
target=None,
format=False,
ncol=2,
bbox_to_anchor=(0.0, 1.0))
Population Monte Carlo¶
To more efficiently accept samples than using a simple ABC where samples are drawn from the prior, the PopulationMonteCarlo provides a JAX accelerated iterative acceptance and rejection scheme where each iteration the population of samples with summaries closest to the summary of the desired target defines a new proposal distribution to force a fixed population to converge towards the posterior without setting an explicit size for the epsilon ball of normal ABC. The PMC is stopped using a criterion on the number of accepted proposals compared to the number of total draws from the proposal. When this gets very small it suggests the distribution is stationary and that the proposal has been reached. It works similarly to ApproximateBayesianComputation
.
PMC = imnn.lfi.PopulationMonteCarlo(
target_data=target_data,
prior=prior,
simulator=simulator,
compressor=IMNN.get_estimate,
gridsize=100,
F=IMNN.F,
distance_measure=None)
And it can be run using
parameters, summaries, distances = PMC(
rng, # Random number generator for params and sims
n_points, # Number of points from the final distribution
percentile=None, # Percentage of points making the population
acceptance_ratio=0.1, # Fraction of accepted draws vs total draws
max_iteration=10, # Maximum number of iterations of the PMC
max_acceptance=1, # Maximum number of tries to get an accepted
max_samples=int(1e5), # Maximum number of attempts to get parameter
n_initial_points=None, # Number of points in the initial ABC step
n_parallel_simulations=None, # Number of simulations to do in parallel
proposed=None, # Prerun parameter values for the initial ABC
summaries=None, # Premade compressed simulations for ABC
distances=None, # Precalculated distances for the initial ABC
smoothing=None, # Amount of smoothing on histogrammed marginal
replace=False) # Whether to remove all previous run summaries
The same plotting functions as ApproximateBayesianComputation are also available in the PMC
Note¶
There seems to be a bug in PopulationMonteCarlo and the parallel sampler is turned off
LikelihoodFreeInference¶
-
class
imnn.lfi.
LikelihoodFreeInference
(prior, gridsize=100, marginals=None)¶ Base class (some functionality) for likelihood free inference methods
Mostly used for plotting
- Parameters
gridsize (list) – The number of grid points to evaluate the marginal distribution on for each parameter
ranges (list) – A list of arrays containing the gridpoints for the marginal distribution for each parameter
marginal (list of lists) – A list of rows and columns of marginal distributions of a corner plot
-
prior
()¶ A prior distribution which can be evaluated and sampled from (should also contain a
low
and ahigh
attribute with appropriate ranges)
Public Methods:
__init__
(prior[, gridsize, marginals])Constructor method
get_gridsize
(gridsize, size)Propogates gridpoints per parameter into a list if int provided
get_levels
(marginal, ranges[, levels])Used for approximately calculating confidence region isocontours
setup_plot
([ax, ranges, axis_labels, …])Builds corner plot
marginal_plot
([ax, ranges, marginals, …])Plots the marginal distribution corner plots
put_marginals
(marginals)Creates list of 1D and 2D marginal distributions ready for plotting
target_choice
(target)Returns the indices of targets to plot and the number of targets
Private Methods:
_scatter_plot
([ax, ranges, points, label, …])Plotter for scatter plots
-
_scatter_plot
(ax=None, ranges=None, points=None, label=None, axis_labels=None, colours=None, hist=True, s=5.0, alpha=1.0, figsize=(10, 10), linestyle='solid', target=None, format=False, ncol=2, bbox_to_anchor=(0.0, 1.0))¶ Plotter for scatter plots
Plots scatter plots for points (parameters or summaries) in 2D subplots and the histogram of points in the 1D diagonal subplots.
- Parameters
ax (list of matplotlib axes or None, default=None) – If ax is None a new figure and axes are created to make a corner plot otherwise a set of axes are formatted correctly for a corner plot. If existing ax is not correctly shaped there will be an error
ranges (list, default=None) – The list of ranges to set the number of rows and columns (if this is None the ranges set on initialisation will be used)
points (float(n_targets, n_points, {n_params} or {n_summaries})) – The points to scatter plot
label (str or None, default=None) – Name to be used in the legend
axis_labels (list of str or None, default=None) – A list of names for each parameter, no axis labels if None
colours (str or list or None, default=None) – The colour or list of colours to use for the different targets, if None then the normal matplotlib colours are used
hist (bool, default=True) – Whether or not to plot histograms on the diagonal of the corner plot
s (float, default=5.) – The size of the marker points in the scatter plot
alpha (float, default=1.) – The amount of alpha colour for the marker points
figsize (tuple, default=(10, 10)) – The size (in inches) to create a figure (if ax is None)
linestyle (str, default="solid") – Linestyle for the histograms
target (None or int or list, default=None) – The indices to choose which target’s points are plotted
format (bool, default=False) – If formatting is not needed
ncols (int, default=2) – Number of columns for the legend
bbox_to_anchor (tuple, default=(0.0, 1.0)) – Position to fix the legend to
- Returns
The scatter plot axes
- Return type
axes object
- Raises
ValueError – if colour input is not correct
-
get_gridsize
(gridsize, size)¶ Propogates gridpoints per parameter into a list if int provided
- Parameters
gridsize (int or list) – The number of grid points to evaluate the marginal distribution on for every parameter (int) or each parameter (list)
size (int) – Number of parameters describing size of gridsize list
- Returns
The number of gridpoints to evaluate marginals for each parameter
- Return type
list
- Raises
ValueError – If list passed is wrong length
TypeError – If gridsize is not int or list of correct length
-
get_levels
(marginal, ranges, levels=[0.68, 0.95])¶ Used for approximately calculating confidence region isocontours
To calculate the values of the marginal distribution whose isocontour contains approximately levels specified fraction of samples drawn from the distribution the marginal distribution is sorted by value and normalised to one. If the distribution does is significantly non-zero outside of the range then this will cause large biases! The cumulative sum of the sorted values are then calculated and the value at which the index of this normalised cumulative distribution is closest to the desired level is used to return the value of the original marginal distribution.
- Parameters
marginal (float(gridsize*n_param)) – The gridded marginal distribution to find the isocontour values of
ranges (list) – List of the grid points for each parameter
levels (list, default=[0.68, 0.95]) – The fraction describing the percentage of samples inside the isocontour
- Returns
The values of the isocontours of the marginal distributions
- Return type
list
-
marginal_plot
(ax=None, ranges=None, marginals=None, known=None, label=None, axis_labels=None, levels=None, linestyle='solid', colours=None, target=None, format=False, ncol=2, bbox_to_anchor=(1.0, 1.0))¶ Plots the marginal distribution corner plots
Plots the 68% and 95% (approximate) confidence contours for each 2D parameter pair and the 1D densities as a corner plot.
- Parameters
ax (list of matplotlib axes or None, default=None) – If ax is None a new figure and axes are created to make a corner plot otherwise a set of axes are formatted correctly for a corner plot. If existing ax is not correctly shaped there will be an error
ranges (list, default=None) – The list of ranges to set the number of rows and columns (if this is None the ranges set on initialisation will be used)
marginals (list of lists (n_params * n_params) or None, default=None) – The marginal distributions for every parameter and every 2D combination. If None the marginals defined in the class are used (if they exist)
known (float(n_params,)) – Values to plot known parameter value lines at
label (str or None, default=None) – Name to be used in the legend
axis_labels (list of str or None, default=None) – A list of names for each parameter, no axis labels if None
linestyle (str, default="solid") – Linestyle for the histograms
colours (str or list or None, default=None) – The colour or list of colours to use for the different targets, if None then the normal matplotlib colours are used
target (None or int or list, default=None) – The indices to choose which target’s points are plotted
format (bool, default=False) – If formatting is not needed
ncols (int, default=2) – Number of columns for the legend
bbox_to_anchor (tuple, default=(0.0, 1.0)) – Position to fix the legend to
- Returns
The scatter plot axes
- Return type
axes object
- Raises
ValueError – if marginals is not provided
TypeError – if marginals is not a list
-
put_marginals
(marginals)¶ Creates list of 1D and 2D marginal distributions ready for plotting
The marginal distribution lists from full distribution array. For every parameter the full distribution is summed over every other parameter to get the 1D marginals and for every combination the 2D marginals are calculated by summing over the remaining parameters. The list is made up of a list of n_params lists which contain n_columns number of objects.
- Parameters
marginals (float(n_targets, gridsize*n_params)) – The full distribution from which to calculate the marginal distributions
- Returns
The 1D and 2D marginal distributions for each parameter (of pair)
- Return type
list of lists
-
setup_plot
(ax=None, ranges=None, axis_labels=None, figsize=(10, 10), format=False)¶ Builds corner plot
- axlist of matplotlib axes or None, default=None
If ax is None a new figure and axes are created to make a corner plot otherwise a set of axes are formatted correctly for a corner plot. If existing ax is not correctly shaped there will be an error
- rangeslist, default=None
The list of ranges to set the number of rows and columns (if this is None there will be an error)
- axis_labelslist of str or None, default=None
A list of names for each parameter, no axis labels if None
- figsizetuple, default=(10, 10)
The size (in inches) to create a figure (if ax is None)
- formatbool, default=False
If formatting is not needed
- Returns
The formatted matplotlib axes
- Return type
axes object
-
target_choice
(target)¶ Returns the indices of targets to plot and the number of targets
- Parameters
target (None or int or list) – The indices of the targets to plot. If None then
n_targets
from the class instance is used- Returns
list – The indices of the targets to be plotted
int – The number of targets to be plotted
GaussianApproximation¶
-
class
imnn.lfi.
GaussianApproximation
(parameter_estimates, invF, prior, gridsize=100)¶ Uses Fisher information and parameter estimates approximate marginals
Since the inverse of the Fisher information matrix describes the minimum variance of some estimator we can use it to make an approximate (Gaussian- distributed) estimate of the target distribution. Note that this will not reflect the true shape of the target distribution and as likely to underestimate the distribution as overestimate it. Furthermore, if the Fisher information matrix is calculated far from the estimate of the parameter values then its value may not be representative of the Fisher information at that position and so the variance estimated from its inverse be incorrect.
- Parameters
parameter_estimates (float(n_targets, n_params)) – The parameter estimates of each target data
invF (float(n_targets, n_params, n_params)) – The inverse Fisher information matrix for each target
marginals (list of lists) – The 1D and 2D marginal distribution for each target
Public Methods:
__init__
(parameter_estimates, invF, prior[, …])Constructor method
get_marginals
([parameter_estimates, invF, …])Creates list of 1D and 2D marginal distributions ready for plotting
Inherited from
LikelihoodFreeInference
__init__
(parameter_estimates, invF, prior[, …])Constructor method
get_gridsize
(gridsize, size)Propogates gridpoints per parameter into a list if int provided
get_levels
(marginal, ranges[, levels])Used for approximately calculating confidence region isocontours
setup_plot
([ax, ranges, axis_labels, …])Builds corner plot
marginal_plot
([ax, ranges, marginals, …])Plots the marginal distribution corner plots
put_marginals
(marginals)Creates list of 1D and 2D marginal distributions ready for plotting
target_choice
(target)Returns the indices of targets to plot and the number of targets
Private Methods:
Inherited from
LikelihoodFreeInference
_scatter_plot
([ax, ranges, points, label, …])Plotter for scatter plots
-
get_marginals
(parameter_estimates=None, invF=None, ranges=None, gridsize=None)¶ Creates list of 1D and 2D marginal distributions ready for plotting
The marginal distribution lists from full distribution array. For every parameter the full distribution is summed over every other parameter to get the 1D marginals and for every combination the 2D marginals are calculated by summing over the remaining parameters. The list is made up of a list of n_params lists which contain n_columns number of objects. The value of the distribution comes from
- Parameters
parameter_estimates (float(n_targets, n_params) or None, default=None) – The parameter estimates of each target data. If None the class instance parameter estimates are used
invF (float(n_targets, n_params, n_params) or None, default=None) – The inverse Fisher information matrix for each target. If None the class instance inverse Fisher information matrices are used
ranges (list or None, default=None) – A list of arrays containing the gridpoints for the marginal distribution for each parameter. If None the class instance ranges are used determined by the prior range
gridsize (list or None, default=None) – If using own ranges then the gridsize for these ranges must be passed (not checked)
- Returns
The 1D and 2D marginal distributions for each parameter (of pair)
- Return type
list of lists
ApproximateBayesianComputation¶
-
class
imnn.lfi.
ApproximateBayesianComputation
(target_data, prior, simulator, compressor, gridsize=100, F=None, distance_measure=None)¶ Approximate Bayesian computation
Approximate Bayesian computation involves drawing parameter values from a prior, \(\boldsymbol{\theta}_i\leftarrow p(\boldsymbol{\theta})\), and generating simulations at this value using a model, \({\bf d}_i = f(\boldsymbol{\theta}_i)\), which can then be compressed down to some informative summary, \({\bf x}_i =g({\bf d}_i)\). By compressing some observed target to get \({\bf x}^\textrm{obs} =g({\bf d}^\textrm{obs})\) we can simulate from the prior distribution of possible parameter values to create a set of summaries \(\{{\bf x}_i|i\in[1, n_\textrm{sims}]\}\). The approximate posterior distribution is then obtained as the samples of parameter values whose summaries of simulations have an infinitely small difference from the summary of the target observation:
\[p(\boldsymbol{\theta}|{\bf d}^\textrm{obs})\approx \{\boldsymbol{\theta}_i| i~\textrm{where}~\lim_{\epsilon\to0} D(g(f(\boldsymbol{\theta}_i), g({\bf d}^\textrm{obs}))<\epsilon\}\]Note that the distance measure \(D(x, y)\) is somewhat ambiguous, but in the infinitessimal limit will be more or less flat. Sensible choices for this distance measure can be made, i.e. for parameter estimates a Fisher scaled squared distance, but it is not wholly trivial in all cases.
This class contains several methods for making simulations, compressing them and doing distance based acceptance and rejection of the parameters used to make the simulations from the compressed summaries of target data. In particular either a single set of simulations can be run and the distance calculated, simulations can be run until a desired number of accepted points within achosen distance are obtained or if there are already a set of run simulations, different distance values can be chosen for acceptance and rejection of points. There are also several plotting methods for corner plots.
Note that if the summaries of the data are not parameter estimates then the distance measure must be optimal for the summary space to make sense.
- Parameters
target_summaries (float(n_targets, n_summaries)) – The compressed data to be infered
n_summaries (int) – The size of the output of the compressor for a single simulation
n_targets (int) – The number of different targets to be infered
F (float(n_params, n_params)) – The Fisher information matrix for rescaling distance measure
parameters (utils container) – Holds accepted and rejected parameters and some summaries of results
summaries (utils container) – Holds accepted and rejected summaries and some summaries of results
distances (utils container) – Holds accepted and rejected distances of summaries to targets and some summaries of results
-
simulator
()¶ A simulator which takes in an array of parameters and returns simulations made at those parameter values
-
compressor
()¶ A function which takes in an array of simulations and compresses them. If no compression is needed this can be an identity function
-
distance_measure:
Either
F_distance
oreuclidean_distance
depending on inputs
Public Methods:
__init__
(target_data, prior, simulator, …)Constructor method
__call__
(ε=None[, rng, n_samples, …])Run the ABC
set_samples
(parameters, summaries[, …])Fills containers and calculates distances
get_samples
(rng, n_samples[, dist])Draws parameters from prior and makes and compresses simulations
set_accepted
(ε[, smoothing])Sets the accepted and rejected attributes of the containers
get_min_accepted
(rng, ε, accepted[, …])Iteratively run ABC until a minimum number of samples are accepted
set_marginals
([accepted_parameters, ranges, …])Wrapper for
get_marginals
to set themarginals
attributeget_marginals
([accepted_parameters, ranges, …])Creates the 1D and 2D marginal distribution list for plotting
scatter_plot
([ax, ranges, points, label, …])Plot a scatter corner plot of all accepted parameter pair values
scatter_summaries
([ax, ranges, points, …])Plot a scatter corner plot of all accepted summary pair values
F_distance
(x, y, F)Default distance measure, squared difference scaled by Fisher matrix
euclidean_distance
(x, y[, aux])Euclidean distance between simulation summaries and target summary
Inherited from
LikelihoodFreeInference
__init__
(target_data, prior, simulator, …)Constructor method
get_gridsize
(gridsize, size)Propogates gridpoints per parameter into a list if int provided
get_levels
(marginal, ranges[, levels])Used for approximately calculating confidence region isocontours
setup_plot
([ax, ranges, axis_labels, …])Builds corner plot
marginal_plot
([ax, ranges, marginals, …])Plots the marginal distribution corner plots
put_marginals
(marginals)Creates list of 1D and 2D marginal distributions ready for plotting
target_choice
(target)Returns the indices of targets to plot and the number of targets
Private Methods:
Inherited from
LikelihoodFreeInference
_scatter_plot
([ax, ranges, points, label, …])Plotter for scatter plots
-
F_distance
(x, y, F)¶ Default distance measure, squared difference scaled by Fisher matrix
The target summaries are expanded on the zeroth axis so that it can broadcast with the simulation summaries.
- Parameters
x ((any, n_params)) – Envisioned as a whole batch of summaries of simulations
y ((n_params)) – Envisioned as a target summary
F ((n_params)) – Fisher information matrix to scale the summary direction
-
euclidean_distance
(x, y, aux=None)¶ Euclidean distance between simulation summaries and target summary
The target summaries are expanded on the zeroth axis so that it can broadcast with the simulation summaries.
- Parameters
x ((any, n_params)) – Envisioned as a whole batch of summaries of simulations
y ((n_params)) – Envisioned as a target summary
aux (None, default=Note) – Empty holder so that function works in the same way as F_distance
-
get_marginals
(accepted_parameters=None, ranges=None, gridsize=None, smoothing=None)¶ Creates the 1D and 2D marginal distribution list for plotting
Using list of parameter values (accepted by the ABC) an approximate set of marginal distributions for plotting are created based on histogramming the points. Smoothing can be performed on the histogram to avoid undersampling artefacts.
For every parameter the full distribution is summed over every other parameter to get the 1D marginals and for every combination the 2D marginals are calculated by summing over the remaining parameters. The list is made up of a list of n_params lists which contain n_columns number of objects.
- Parameters
accepted_parameters (float(any, n_params) or None, default=None) – An array of all accepted parameter values. If None, the accepted parameters from the parameters class attribute are used
ranges (list or None, default=None) – A list of arrays containing the bin centres for the marginal distribution obtained by histogramming for each parameter. If None the ranges are constructed from the ranges of the prior distribution.
gridsize (list or None, default=None) – The number of grid points to evaluate the marginal distribution on for each parameter. This needs to be set if ranges is passed (and different from the gridsize set on initialisation)
smoothing (float or None, default=None) – A Gaussian smoothing for the marginal distributions. Smoothing not done if smoothing is None
- Returns
The 1D and 2D marginal distributions for each parameter (of pair)
- Return type
list of lists
-
get_min_accepted
(rng, ε, accepted, n_simulations=1, max_iterations=10, smoothing=None, verbose=True)¶ Iteratively run ABC until a minimum number of samples are accepted
Setting a maximum distance that simulation summaries can be allowed to be from the summaries of each target an iterative scheme can be used to get
n_samples
simulations at a time, compress them and calculate the distances until the minimum number of desired samples are within the allowed cutoff distance. If multiple targets are being inferred then the simulations are run until all targets have at least the minimum number of accepted samples.- Parameters
rng (int(2,)) – A random number generator to draw the parameters and make simulations with
ϵ (float or float(n_targets)) – The acceptance distance between summaries from simulations and the summary of the target data. A different epsilon can be passed for each target.
accepted (int) – The minimum number of samples to be accepted within the ϵ cutoff
n_simulations (int, default=1) – The number of simulations to do at once
max_iterations (int, default=10) – Maximum number of iterations in the while loop to prevent infinite runs. Note if max_iterations is reached then there are probably not the required number of accepted samples
smoothing (float or None, default=None) – A Gaussian smoothing for the marginal distributions
verbose (bool, default=True) – Whether to print out the running stats (number accepted, iterations, etc)
- Returns
float(any, n_params) – All parameters values drawn
float(any, n_summaries) – All summaries of simulations made
float(n_target, any) – The distance of every summary to each target
-
get_samples
(rng, n_samples, dist=None)¶ Draws parameters from prior and makes and compresses simulations
Simulations are done n_samples in parallel
- Parameters
rng (int(2,)) – A random number generator to draw the parameters and make simulations with
n_samples (int) – Number of parallel samples to draw
dist (fn or None, default=None) – A distribution to sample, if None the samples will be drawn from the prior set in the class initialisation
-
scatter_plot
(ax=None, ranges=None, points=None, label=None, axis_labels=None, colours=None, hist=True, s=5, alpha=1.0, figsize=(10, 10), linestyle='solid', target=None, ncol=2, bbox_to_anchor=(0.0, 1.0))¶ Plot a scatter corner plot of all accepted parameter pair values
Plots scatter plots for parameter values in 2D subplots (and optionally the histogram of points in the 1D diagonal subplots).
- Parameters
ax (list of matplotlib axes or None, default=None) – If ax is None a new figure and axes are created to make a corner plot otherwise a set of axes are formatted correctly for a corner plot. If existing ax is not correctly shaped there will be an error
ranges (list, default=None) – The list of ranges to set the number of rows and columns (if this is None the ranges set on initialisation will be used)
points (float(n_targets, n_points, {n_params} or {n_summaries})) – The points to scatter plot, if None the class instance of parameters.accepted will be used
label (str or None, default=None) – Name to be used in the legend
axis_labels (list of str or None, default=None) – A list of names for each parameter, no axis labels if None
colours (str or list or None, default=None) – The colour or list of colours to use for the different targets, if None then the normal matplotlib colours are used
hist (bool, default=True) – Whether or not to plot histograms on the diagonal of the corner plot
s (float, default=5.) – The size of the marker points in the scatter plot
alpha (float, default=1.) – The amount of alpha colour for the marker points
figsize (tuple, default=(10, 10)) – The size (in inches) to create a figure (if ax is None)
linestyle (str, default="solid") – Linestyle for the histograms
target (None or int or list, default=None) – The indices to choose which target’s points are plotted
format (bool, default=False) – If formatting is not needed
ncols (int, default=2) – Number of columns for the legend
bbox_to_anchor (tuple, default=(0.0, 1.0)) – Position to fix the legend to
- Returns
The scatter plot axes
- Return type
axes object
-
scatter_summaries
(ax=None, ranges=None, points=None, label=None, axis_labels=None, colours=None, hist=True, s=5, alpha=1.0, figsize=(10, 10), linestyle='solid', gridsize=100, target=None, format=False, ncol=2, bbox_to_anchor=(0.0, 1.0))¶ Plot a scatter corner plot of all accepted summary pair values
Plots scatter plots for summary values in 2D subplots (and optionally the histogram of points in the 1D diagonal subplots).
- Parameters
ax (list of matplotlib axes or None, default=None) – If ax is None a new figure and axes are created to make a corner plot otherwise a set of axes are formatted correctly for a corner plot. If existing ax is not correctly shaped there will be an error
ranges (list, default=None) – The list of ranges to set the number of rows and columns (if this is None the ranges set on initialisation will be used)
points (float(n_targets, n_points, {n_params} or {n_summaries})) – The points to scatter plot, if None the class instance of summaries.accepted will be used
label (str or None, default=None) – Name to be used in the legend
axis_labels (list of str or None, default=None) – A list of names for each parameter, no axis labels if None
colours (str or list or None, default=None) – The colour or list of colours to use for the different targets, if None then the normal matplotlib colours are used
hist (bool, default=True) – Whether or not to plot histograms on the diagonal of the corner plot
s (float, default=5.) – The size of the marker points in the scatter plot
alpha (float, default=1.) – The amount of alpha colour for the marker points
figsize (tuple, default=(10, 10)) – The size (in inches) to create a figure (if ax is None)
linestyle (str, default="solid") – Linestyle for the histograms
target (None or int or list, default=None) – The indices to choose which target’s points are plotted
format (bool, default=False) – If formatting is not needed
ncols (int, default=2) – Number of columns for the legend
bbox_to_anchor (tuple, default=(0.0, 1.0)) – Position to fix the legend to
- Returns
The scatter plot axes
- Return type
axes object
-
set_accepted
(ε, smoothing=None)¶ Sets the accepted and rejected attributes of the containers
Using a distance (or list of distances for each target) cutoff between simulation summaries and summaries from some target the accepted and rejected parameter values (and summaries) can be defined. These points are used to make an approximate set of marginal distributions for plotting based on histogramming the points - where smoothing can be performed on the histogram to avoid undersampling artefacts.
- Parameters
ϵ (float or float(n_targets)) – The acceptance distance between summaries from simulations and the summary of the target data. A different epsilon can be passed for each target.
smoothing (float or None, default=None) – A Gaussian smoothing for the marginal distributions
-
get_accepted:
Returns a boolean array with whether summaries are within ϵ
-
set_marginals
(accepted_parameters=None, ranges=None, gridsize=None, smoothing=None)¶ Wrapper for
get_marginals
to set themarginals
attribute- Parameters
accepted_parameters (float(any, n_params) or None, default=None) – An array of all accepted parameter values
ranges (list or None, default=None) – A list of arrays containing the bin centres for the marginal distribution obtained by histogramming for each parameter.
gridsize (list or None, default=None) – The number of grid points to evaluate the marginal distribution on for each parameter
smoothing (float or None, default=None) – A Gaussian smoothing for the marginal distributions
-
set_samples
(parameters, summaries, distances=None, replace=False)¶ Fills containers and calculates distances
All parameters and summaries of simulations made at those parameters (and their distance from the summary of the observed data) are kept in containers for ease of use. This function sets the
all
attribute of these containers with the passed summaries and parameters (and distances if provided or it calculates the distances). These are concatenated to existing values unlessreplace = True
in which the existing values are removed and overwritten with the passed summaries, parameters and distances.- Parameters
parameters (float(any, n_params)) – The pre-obtained parameter values to set to the class
summaries (float(any, n_summaries)) – The summaries of prerun simulations (at parameter values corresponding to the values in parameters)
distances (float(n_targets, any) or None, default=None) – The distances of the summaries from the summaries of each target. If None then the distances will be calculated
replace (bool, default=False) – Whether to replace the summaries, parameters and distances already obtained
PopulationMonteCarlo¶
-
class
imnn.lfi.
PopulationMonteCarlo
(target_data, prior, simulator, compressor, F=None, gridsize=100, distance_measure=None)¶ Population driven approximate Bayesian computation sample retrieval
SEEMS TO BE A BUG IN THIS MODULE CURRENTLY!
Approximate Bayesian computation tends to be very expensive because it depends on drawing samples from all over the prior and a lot of the domain may be extremely unlikely without good knowledge of the sampling distribution. To make sampling more efficient, the population Monte Carlo iteratively updates the proposal distribution for new samples based on the density of the closest summaries of simulations to the summary of the target.
- Parameters
acceptance_reached (float(n_targets,)) – The number of draws accepted over the number over the number of draws made for each target
iterations (int(n_targets,)) – The number of iterations of population updates for each target
total_draws (int(n_targets,)) – The total number of draws made (equivalent to number of simulations made) for each target
Public Methods:
__init__
(target_data, prior, simulator, …)Constructor method
__call__
(rng, n_points[, percentile, …])Run the PMC
move_samples
(rng, samples, summaries, …[, …])Loop which updates the population iteratively for individual target
set_samples
(parameters, summaries[, …])Fills containers and calculates distances
set_accepted
([smoothing])Container values to list of accepted attributes, builds marginals
w_cov
(proposed, weighting)Inherited from
ApproximateBayesianComputation
__init__
(target_data, prior, simulator, …)Constructor method
__call__
(rng, n_points[, percentile, …])Run the PMC
set_samples
(parameters, summaries[, …])Fills containers and calculates distances
get_samples
(rng, n_samples[, dist])Draws parameters from prior and makes and compresses simulations
set_accepted
([smoothing])Container values to list of accepted attributes, builds marginals
get_min_accepted
(rng, ε, accepted[, …])Iteratively run ABC until a minimum number of samples are accepted
set_marginals
([accepted_parameters, ranges, …])Wrapper for
get_marginals
to set themarginals
attributeget_marginals
([accepted_parameters, ranges, …])Creates the 1D and 2D marginal distribution list for plotting
scatter_plot
([ax, ranges, points, label, …])Plot a scatter corner plot of all accepted parameter pair values
scatter_summaries
([ax, ranges, points, …])Plot a scatter corner plot of all accepted summary pair values
F_distance
(x, y, F)Default distance measure, squared difference scaled by Fisher matrix
euclidean_distance
(x, y[, aux])Euclidean distance between simulation summaries and target summary
Inherited from
LikelihoodFreeInference
__init__
(target_data, prior, simulator, …)Constructor method
get_gridsize
(gridsize, size)Propogates gridpoints per parameter into a list if int provided
get_levels
(marginal, ranges[, levels])Used for approximately calculating confidence region isocontours
setup_plot
([ax, ranges, axis_labels, …])Builds corner plot
marginal_plot
([ax, ranges, marginals, …])Plots the marginal distribution corner plots
put_marginals
(marginals)Creates list of 1D and 2D marginal distributions ready for plotting
target_choice
(target)Returns the indices of targets to plot and the number of targets
Private Methods:
Inherited from
LikelihoodFreeInference
_scatter_plot
([ax, ranges, points, label, …])Plotter for scatter plots
-
move_samples
(rng, samples, summaries, distances, weighting, target, F, ε_ind=None, acceptance_ratio=None, max_iteration=None, max_acceptance=None, max_samples=None, n_parallel_simulations=None)¶ Loop which updates the population iteratively for individual target
The values of the parameters corresponding to the most distant summaries to the summary of the target data are considered to be the means of a new proposal distribution with a covariance given by the population weighted samples (weighted from the prior in the first instance). From this proposal distribution a new sample (or a parallel set of samples) is proposed and a simulation made at each sample, compressed and the distance from the summary of the target data calculated. If this distance is within the population then that proposal is accepted and the weighting is updated with this new sample and the new furthest samples will be updated with this new proposal distribution on the next iteration. This is continued until the number of accepted draws is small compared to the total number of draws (this is the acceptance_ratio). At this point the distribution is somewhat stationary (as long as acceptance_ratio is small enough) and the samples approximately come from the target distribution.
- Parameters
rng (int(2,)) – A random number generator to draw the parameters and make simulations with
samples (float(n_points, n_params)) – The parameter values from the ABC step
summaries (float(n_points, n_summaries)) – The summaries of prerun simulations from the ABC step (at parameter values corresponding to the values in samples)
distances (float(n_points,)) – The distances of the summaries from the summaries of a target from the ABC step
weighting (float(n_points,)) – The value of the prior evaluated at the parameter values obtained in the ABC step
target (float(n_summaries,)) – The summary of the target to infer the parameter values of
F (float(n_params, n_params)) – The Fisher information matrix to rescale parameter directions for measuring the distance. This should be set to the identity matrix float(n_summaries, n_summaries) if the summaries are not parameter estimates
ϵ_ind (float, default=None) – The index of the outer most population simulation
acceptance_ratio (float, default=None) – The ratio of the number of accepted points to total proposals. When this gets small it suggests that points aren’t being added to the population any more because the population is stationary
max_iteration (int, default=None) – The cutoff number of iterations to break out of the while loop even if acceptance_ratio is not reached
max_acceptance (int, default=None) – The cutoff number of attempts in a single iteration to attempt to get a sample accepted
max_samples (int, default=None) – The number of attempts to get parameter values from the truncated normal distribution
n_parallel_simulations (int or None, default=None) – number of simulations to do at once, the innermost (closest) summaries are the only ones accepted, but this can massively reduce sampling time as long as the simulations can be easily parallelised
- Returns
int(2,) – A random number generator to draw the parameters and make simulations with
float(n_points, n_params) – The accepted parameter values
float(n_points, n_summaries) – The summaries corresponding to the accepted parameter values, i.e. the closest summaries to the summary of the target data
float(n_points,) – The distances of the summaries from the summaries of the target
float(n_points,) – The value of the weighting from by the population
float – The acceptance ratio reach by the final iteration
int – The number of iterations run
int – The total number of draws from the proposal distribution over all iterations
-
single_iteration_condition:
Checks if the acceptance ratio or maximum iterations is reached
-
single_iteration:
Runs a single (or parallel set of) simulation(s), accepts if closer
-
set_accepted
(smoothing=None)¶ Container values to list of accepted attributes, builds marginals
- Parameters
smoothing (float or None, default=None) – A Gaussian smoothing for the marginal distributions
-
set_samples
(parameters, summaries, distances=None, replace=False)¶ Fills containers and calculates distances
All parameters and summaries of simulations made at those parameters (and their distance from the summary of the observed data) are kept in containers for ease of use. This function sets the
all
attribute of these containers with the passed summaries and parameters (and distances if provided or it calculates the distances). These are concatenated to existing values unlessreplace = True
in which the existing values are removed and overwritten with the passed summaries, parameters and distances.- Parameters
parameters (float(n_targets, n_points, n_params)) – The pre-obtained parameter values to set to the class
summaries (float(n_targets, n_points, n_summaries)) – The summaries of prerun simulations (at parameter values corresponding to the values in parameters)
distances (float(n_targets, n_points) or None, default=None) – The distances of the summaries from the summaries of each target. If None then the distances will be calculated
replace (bool, default=False) – Whether to replace the summaries, parameters and distances already obtained