We show how custom Monte Carlo proposal distributions can be expressed as samplers written in probabilistic programming languages. We call these probabilistic programs *proposal programs*. Proposal programs allow the inference practitioner to naturally express their knowledge about a target distribution by writing a sampler that can include fast heuristic estimation algorithms, internal random choices, and parameters that are automatically optimized during an inference compilation phase. The knowledge encoded in proposal programs can translate into vastly improved time-accuracy profiles for approximate inference, relative to generic proposals. Existing probabilistic programming machinery can automate the estimation of the intractable or tedious importance weights and acceptance probabilities that result when proposal programs are used to define complex proposal distributions in importance sampling, sequential Monte Carlo (SMC), and MCMC. The resulting samplers retain guarantees of asymptotic convergence.
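To make the role of a custom proposal concrete, the following is a minimal self-normalized importance sampling sketch in Python. It is illustrative only: the proposal here has a tractable density, whereas the paper's proposal programs additionally handle proposals whose marginal density is intractable because of internal random choices. All names (`target_logpdf`, `proposal_sample`, `importance_estimate`) and the specific target and proposal are hypothetical examples, not from the paper.

```python
import math
import random

def target_logpdf(x):
    # Unnormalized log-density of a hypothetical target: N(3, 1) up to a constant.
    return -0.5 * (x - 3.0) ** 2

def proposal_sample():
    # A stand-in "proposal program": any sampler we can also score.
    return random.gauss(0.0, 2.0)

def proposal_logpdf(x):
    # Log-density of the N(0, 2) proposal above.
    return -0.5 * (x / 2.0) ** 2 - math.log(2.0 * math.sqrt(2.0 * math.pi))

def importance_estimate(n=20000, f=lambda x: x):
    # Self-normalized importance sampling estimate of E_target[f(x)].
    xs = [proposal_sample() for _ in range(n)]
    log_ws = [target_logpdf(x) - proposal_logpdf(x) for x in xs]
    m = max(log_ws)  # subtract the max for numerical stability
    ws = [math.exp(lw - m) for lw in log_ws]
    return sum(w * f(x) for w, x in zip(ws, xs)) / sum(ws)
```

With a richer proposal program, the ratio `target_logpdf - proposal_logpdf` is exactly the quantity that becomes intractable and must be estimated by the automated machinery the abstract describes.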

Using the machinery of proposal programs, we propose an inference programming methodology that bridges the gap between compiled neural inference approaches and hand-crafted proposal distributions. Often, the knowledge we have about the target distribution pertains more to the locations of its modes than to the variability within a given mode. Therefore, instead of requiring the programmer to specify manually how the proposal variability should be computed, we propose to learn the proposal variability automatically. For example, the variability may be determined by a neural network whose parameters are optimized during an ‘inference compilation’ phase. We illustrate the technique on a Bayesian linear regression with outliers problem. The proposal program uses the RANSAC robust estimation algorithm to quickly locate the modes of the target distribution, and then adds variability to the hypothesis returned by RANSAC based on the output of a neural network, which predicts both whether RANSAC was accurate on the given data set and the inherent variability in the posterior.
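The two-stage proposal structure described above (a fast heuristic locates a mode, then learned noise adds proposal variability) can be sketched as follows. This is a hedged illustration, not the authors' implementation: the simplified RANSAC-style line fit, the function names, and the fixed `noise_scale` are all stand-ins; in the paper the variability is predicted by a trained neural network from the data set itself.

```python
import random

def ransac_line(points, iters=50, thresh=0.5):
    """Simplified RANSAC: return the (slope, intercept) with the most inliers."""
    best, best_inliers = (0.0, 0.0), -1
    for _ in range(iters):
        (x1, y1), (x2, y2) = random.sample(points, 2)
        if x1 == x2:
            continue  # vertical pair; skip this candidate
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = sum(1 for (x, y) in points
                      if abs(y - (slope * x + intercept)) < thresh)
        if inliers > best_inliers:
            best, best_inliers = (slope, intercept), inliers
    return best

def propose(points, noise_scale=0.1):
    # noise_scale is a placeholder for the learned, data-dependent
    # variability that the paper's neural network would predict.
    slope, intercept = ransac_line(points)
    return (random.gauss(slope, noise_scale),
            random.gauss(intercept, noise_scale))
```

Because `propose` mixes a deterministic heuristic with internal random choices (the pairs RANSAC samples), its marginal density over the returned hypothesis is intractable, which is exactly the situation the proposal-program machinery is designed to handle.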

Authors: Marco Cusumano-Towner, Vikash K. Mansinghka