Prob.itsample#

Inverse transform sampling, for sampling from arbitrary probability density functions.

adaptive_chebfit(pdf, lower_bd, upper_bd, eps=1e-15)#

Fit a chebyshev polynomial, increasing sampling rate until coefficient tail falls below provided tolerance.

Parameters:
  • pdf (function, float -> float) – The probability density function (not necessarily normalized). Must take floats or ints as input, and return floats as an output.

  • lower_bd (float) – Lower bound of the support of the pdf. This parameter allows one to manually establish cutoffs for the density.

  • upper_bd (float) – Upper bound of the support of the pdf.

  • eps (float) – Error tolerance of Chebyshev polynomial fit of PDF.

Returns:

  • x (array) – The nodes at which the polynomial interpolation takes place. These are adaptively chosen based on the provided tolerance.

  • coeffs (array) – Coefficients in Chebyshev approximation of the PDF.

Notes

This fit defines the “error” as the magnitude of the tail of the Chebyshev coefficients. Computing the true error (i.e. discrepancy between the PDF and it’s approximant) would be much slower, so we avoid it and use this rough approximation in its place.

chebcdf(pdf, lower_bd, upper_bd, eps=1e-15)#

Get Chebyshev approximation of the CDF.

Parameters:
  • pdf (function, float -> float) – The probability density function (not necessarily normalized). Must take floats or ints as input, and return floats as an output.

  • lower_bd (float) – Lower bound of the support of the pdf. This parameter allows one to manually establish cutoffs for the density.

  • upper_bd (float) – Upper bound of the support of the pdf.

  • eps (float) – Error tolerance of Chebyshev polynomial fit of PDF.

Returns:

cdf – The cumulative density function of the (normalized version of the) provided pdf. The function cdf() takes an iterable of floats or doubles as an argument, and returns an iterable of floats of the same length.

Return type:

function

get_cdf(pdf, lower_bd=-inf, upper_bd=inf)#

Generate a CDF from a (possibly not normalized) pdf.

Parameters:
  • pdf (function, float -> float) – The probability density function (not necessarily normalized). Must take floats or ints as input, and return floats as an output.

  • lower_bd (float) – Lower bound of the support of the pdf. This parameter allows one to manually establish cutoffs for the density.

  • upper_bd (float) – Upper bound of the support of the pdf.

Returns:

cdf – The cumulative density function of the (normalized version of the) provided pdf. Will return a float if provided with a float or int; will return a numpy array if provided with an iterable.

Return type:

function

normalize(pdf, lower_bd=-inf, upper_bd=inf, vectorize=False)#

Normalize a non-normalized PDF.

Parameters:
  • pdf (function, float -> float) – The probability density function (not necessarily normalized). Must take floats or ints as input, and return floats as an output.

  • lower_bd (float) – Lower bound of the support of the pdf. This parameter allows one to manually establish cutoffs for the density.

  • upper_bd (float) – Upper bound of the support of the pdf.

  • vectorize (boolean) – Vectorize the function. This slows down function calls, and so is generally set to False.

Returns:

pdf_norm – Function with same signature as pdf, but normalized so that the integral between lower_bd and upper_bd is close to 1. Maps nicely over iterables.

Return type:

function

sample(pdf, num_samples, lower_bd=-inf, upper_bd=inf, guess=0, chebyshev=False)#

Sample from an arbitrary, unnormalized PDF.

Parameters:
  • pdf (function, float -> float) – The probability density function (not necessarily normalized). Must take floats or ints as input, and return floats as an output.

  • num_samples (int) – The number of samples to be generated.

  • lower_bd (float) – Lower bound of the support of the pdf. This parameter allows one to manually establish cutoffs for the density.

  • upper_bd (float) – Upper bound of the support of the pdf.

  • guess (float or int) – Initial guess for the numerical solver to use when inverting the CDF.

  • chebyshev (Boolean, optional (default=False)) – If True, then the CDF is approximated using Chebyshev polynomials.

Returns:

samples – An array of samples from the provided PDF, with support between lower_bd and upper_bd.

Return type:

numpy array

Notes

For a unimodal distribution, the mode is a good choice for the parameter guess. Any number for which the CDF is not extremely close to 0 or 1 should be acceptable. If the cdf(guess) is near 1 or 0, then its derivative is near 0, and so the numerical root finder will be very slow to converge.

This sampling technique is slow (~3 ms/sample for a unit normal with initial guess of 0), since we re-integrate to get the CDF at every iteration of the numerical root-finder. This is improved somewhat by using Chebyshev approximations of the CDF, but the sampling rate is still prohibitively slow for >100k samples.