The (International) Bayes Club
The Bayes Club is an informal meeting of researchers in the field of Bayesian statistics. Although the interest is general, most of the presentations concern subjects in mathematical statistics, such as non-parametric priors and the asymptotic behaviour of posterior distributions. Our aim is to exchange ideas, to present our work and research, and to discuss other developments in the field. In normal times, meetings are held on Fridays, 16:00-17:00h, (usually) at the Korteweg-de Vries Institute for Mathematics, University of Amsterdam, Science Park, Amsterdam.
(New) The Bayes Club now has a mailing list. Please subscribe here to get invitations for the Bayes Club meetings and other fun events.
Upcoming Talks this Semester
- 3 December, 2020
Time: 15:00-17:00, Location: https://vu-live.zoom.us/j/93507519282 (Password: d7eHf@)
Sergios Agapiou (U Cyprus) - Gauss versus Laplace rates of contraction under Besov regularity (abstract)
Pierre Jacob (Harvard) - A Gibbs sampler for a class of random convex polytopes (abstract)
Past Talks
- 3 December, 2020 - Sergios Agapiou - Gauss versus Laplace rates of contraction under Besov regularity (abstract)
- 3 December, 2020 - Pierre Jacob - A Gibbs sampler for a class of random convex polytopes (abstract)
- 5 November, 2020 - Hanne Kekkonen - Consistency of Bayesian inference for the inverse problem of the time dependent Schroedinger equation (abstract)
- 5 November, 2020 - Subhashis Ghosal - Bayesian Inference on Monotone Regression Quantile: Coverage and Rate Acceleration (abstract)
- 17 May, 2019 - Stefan Franssen - Uncertainty quantification with species sampling processes (abstract)
- 12 Apr, 2019 - Max Welling - Gauge Fields in Deep Learning (abstract)
- 1 March, 2019 - Peter Grunwald - Bayes Factors, S-Values and Optional Continuation (abstract)
- 8 Feb, 2019 - Gianluca Finocchio - Bayesian variance estimation in Gaussian models with partial information (abstract)
- 8 Feb, 2019 - Subhashis Ghosal - Posterior Contraction and Credible Sets for Filaments of Regression Functions (abstract)
- 30 Nov, 2018 - Moritz Schauer - Nonparametric Bayesian inference for Levy subordinators (abstract)
- 9 Nov, 2018 - Joris Bierkens - Reflections on piecewise deterministic Monte Carlo (abstract)
- 2 Nov, 2018 - Frank van der Meulen - Simulation of hypo-elliptic conditioned diffusions (abstract)
- 1 June, 2018 - Ismael Castillo - Inference via spike-and-slab posterior distributions (abstract)
- 18 May, 2018 - Bas Kleijn - What is asymptotically testable and what is not? (abstract)
- 4 May, 2018 - Amine Hadji - Uncertainty quantification using Gaussian Process with squared exponential kernel (abstract)
- 6 Apr, 2018 - Dong Yan - Bayesian linear inverse problems in regularity scales (abstract)
- 23 March, 2018 - Natalia Bochkina - Bernstein-von Mises theorem for inverse problems (abstract)
- 23 Febr, 2018 - Stéphanie van der Pas - Posterior concentration for Bayesian regression trees and their ensembles (abstract)
- 9 Febr, 2018 - Pierpaolo de Blasi - Asymptotics of the number of groups in a sample from the geometric stick-breaking process (abstract)
- 9 Febr, 2018 - Shota Gugushvili - Nonparametric Bayesian volatility estimation (abstract)
- 1 Dec, 2017 - Stephane Robin - Goodness of fit for logistic regression in random graphs: a (variational) Bayes approach (abstract)
- 3 Nov, 2017 - Kolyan Ray - Bayesian estimation of the average response in a missing data model (abstract)
- 20 Oct, 2017 - Moritz Schauer - Nonparametric Bayesian estimation of a Hölder continuous diffusion coefficient (abstract)
- 6 Oct, 2017 - Subhashis Ghosal - Bayesian methods for boundary detection in images (abstract)
- Sep 22, 2017 - Jan van Waaij - Adaptive posterior contraction results for Bayesian methods for diffusions (abstract)
- Jul 4, 2017 - Minwoo Chae - A probabilistic approach for solving some inverse problems (abstract)
- May 24, 2017 - Sonia Petrone - On a class of interacting randomly reinforced processes with implications in Bayesian nonparametrics (abstract)
- May 12, 2017 - Alexander Ly - Bayesian hypothesis testing in psychology: From Harold Jeffreys’s default Bayes factor to JASP (abstract)
- Apr 7, 2017 - Eduard Belitser - Empirical Bayes oracle uncertainty quantification for linear regression (abstract)
- Mar 17, 2017 - Moritz Schauer - Local criteria for posterior contraction (abstract)
- Feb 24, 2017 - Johannes Schmidt-Hieber and Markus Reiss - Nonparametric Bayesian analysis for support boundary recovery (abstract)
- Dec 9, 2016 - Frank van der Meulen - Bayesian estimation for hypo-elliptic diffusions (abstract)
- Nov 4, 2016 - Joris Bierkens - The Zig-Zag process and super-efficient sampling for Bayesian analysis of big data (abstract)
- 21 Oct, 2016 - William Weimin Yoo - Making Bayesian inference and quantifying uncertainty through the sup-norm distance (abstract)
- 7 Oct, 2016 - Julyan Arbel - Bayesian nonparametric inference for discovery probabilities: credible intervals and large sample asymptotics (abstract)
- Jun 3, 2016 - Subhashis Ghosal - Bayesian estimation and uncertainty quantification for differential equation models (abstract)
- May 13, 2016 - Paulo Serra - Regression with correlated noise, a non-parametric approach (abstract)
- Apr 22, 2016 - Stephanie van der Pas - Conditions for posterior contraction in the sparse normal means problem (abstract)
- Apr 15, 2016 - Eduard Belitser - Needles and straw in a haystack: robust empirical Bayes confidence for possibly sparse sequences (abstract)
- Mar 18, 2016 - Bas Kleijn - Posterior consistency revisited (abstract)
- Dec 4, 2015 - Peter Gruenwald - Generalized Bayesian inference (abstract)
- Nov 20, 2015 - Botond Szabo - How many needles in a haystack? (abstract)
- Nov 6, 2015 - Moritz Schauer - Bayesian inference for partially observed diffusion processes (abstract)
- Oct 23, 2015 - Jan van Waaij - Adaptive posterior contraction results for computationally efficient priors for diffusion models (abstract)
- Oct 16, 2015 - Marjan Sjerps - Assessing and reporting the strength of forensic evidence (abstract)
- May 29, 2015 - Stephen Walker - Recursive Bayesian predictive distributions (abstract)
- May 1, 2015 - Botond Szabo - Asymptotic behaviour of the empirical Bayes posteriors associated to maximum marginal likelihood estimator (abstract)
- April 24, 2015 - Bartek Knapik - Posterior contraction and nonparametric inverse problems (abstract)
- April 17, 2015 - Antonio Lijoi - Bayesian nonparametrics with heterogeneous data (abstract)
- March 20, 2015 - Alicia Kirichenko - Estimating a smoothly varying function on a large graph (abstract)
- February 27, 2015 - Jan van Waaij - Using random scaling to a Gaussian prior gives minimax adapted convergence rates for a diffusion model (abstract)
- June 6, 2014 - Fengnan Gao - Posterior contraction rates for deconvolution of Dirichlet-Laplace mixtures (abstract)
- May 9, 2014 - Bas Kleijn - Bayesian testability and consistency (abstract)
- April 4, 2014 - Stephanie van der Pas - The Horseshoe Estimator: Posterior Concentration around Nearly Black Vectors (abstract)
- March 14, 2014 - Jean-Bernard Salomond - Adaptive Bayes test for monotonicity (abstract)
- February 27, 2014 - Harrison Zhou - Rate-optimal Posterior Contraction for Sparse PCA (abstract)
- February 18, 2014 - Richard Nickl - On nonparametric Bernstein-von Mises (BvM) theorems (abstract)
- January 10, 2014 - Peter Orbanz - Nonparametric priors for graphs and arrays (abstract)
- December 20, 2013 - Yanyun Zhao - Alternatives for Ghosh-Ghosal-van der Vaart priors (abstract)
- November 22, 2013 - Johannes Schmidt-Hieber - On adaptive posterior contraction rates (abstract)
- October 25, 2013 - Aad van der Vaart - A review of Bayesian species sampling models (abstract)
- March 6, 2013 - Levi Boyles - Bayesian Hierarchical Clustering with the Coalescent Prior (abstract)
- March 6, 2013 - Judith Rousseau - On Bayesian nonparametric adaptive estimation under uniform loss (abstract)
- December 21, 2012 - Shota Gugushvili - Posterior consistency for a rescaled Brownian motion (abstract)
- December 14, 2012 - Bas Kleijn - Criteria for Bayesian consistency (abstract)
- November 30, 2012 - Peter Gruenwald - Inconsistency of Bayesian Inference When the Model Is Wrong (abstract)
- November 2, 2012 - Bartek Knapik - Semiparametric posterior limits revisited (abstract)
- October 19, 2012 - Max Welling - Bayesian Posterior Sampling with Stochastic Gradients for "Big Data" Problems (abstract)
- June 15, 2012 - Catia Scricciolo - Bayes and empirical Bayes: do they merge? (abstract)
- May 25, 2012 - Botond Szabó - Adaptive Bayesian techniques for inverse problems (abstract)
- May 11, 2012 - Suzanne Sniekers - Credible sets in the fixed design model with Brownian motion prior (abstract)
- April 27, 2012 - Eduard Belitser - On Bayesian construction of exact confidence sets (abstract)
- April 20, 2012 - Johannes Schmidt-Hieber - Posterior concentration in high-dimensional regression under sparsity (abstract)
- December 16, 2011 - Eduard Belitser - On estimation of high dimensional vector of binomial proportions (abstract)
- November 25, 2011 - Harry van Zanten - A differential equations approach to nonparametric Bayesian drift estimation for diffusions on the circle (abstract)
- September 16, 2011 - Tim van Erven - Bayesian Methods in Online Learning (abstract)
- June 17, 2011 - Frank van der Meulen (abstract)
- June 10, 2011 - Subhashis Ghosal - Predicting Proportion of False Discovery Proportions in Dependent Multiple Tests
- May 13, 2011 - Haralambie Leahu - On the BvM phenomenon in the Gaussian white noise model (abstract)
- April 8, 2011 - Bartek Knapik - Semiparametric posterior limits under local asymptotic exponentiality (abstract)
- April 1, 2011 - Bas Kleijn - Bayesian efficiency in mixture models
- March 18, 2011 - Aad van der Vaart (abstract)
Abstracts
We will discuss recent results on rates of contraction with a family of priors with tails between Laplace and Gaussian, termed p-exponential priors. We will focus on the white noise model and will discuss upper bounds on the rate of contraction under Besov regularity of the truth, in L_2-loss. We will use alpha-regular priors and will see that Laplace priors achieve the same and often better rates than Gaussian ones. In particular, we will see that for spatially inhomogeneous unknown functions, that is functions which are smooth in some areas but rough in other areas, Gaussian priors appear to be suboptimal. On the other hand Laplace priors achieve better rates, which can be minimax when the prior is appropriately calibrated.
This is joint work with Masoumeh Dashti and Tapio Helin
https://arxiv.org/abs/1811.12244 (to appear in Bernoulli).
We present a Gibbs sampler for the Dempster-Shafer (DS) approach to statistical inference in the setting of Categorical distributions, with arbitrary numbers of categories and observations. The DS framework extends the Bayesian approach, allows in particular the use of partial prior information, and yields three-valued uncertainty assessments (p,q,r) representing probabilities "for", "against", and "don't know" about formal assertions of interest. However, DS gives rise to computational challenges even in settings as classical and seemingly simple as Categorical distributions for count data. The proposed algorithm targets the distribution of a class of random convex polytopes which encapsulate the DS inference. The validity of the sampler relies on an equivalence between the iterative constraints of some vertex configuration and the non-negativity of cycles in a fully connected directed graph. Experiments illustrate the output of the algorithm in the setting of 2x2 contingency tables.
This is joint work with Ruobin Gong, Paul T. Edlefsen & Arthur P. Dempster and a technical report is available at https://arxiv.org/abs/1910.11953
We consider the statistical nonlinear inverse problem of recovering the potential f in the time dependent Schroedinger equation, with given boundary and initial value functions, from N discrete noisy point evaluations of the solution u_f. We study the statistical performance of Bayesian nonparametric procedures based on Gaussian process priors, that are often used in practice. We show that, as the number of measurements increases, the resulting posterior distributions concentrate around the true parameter f_0 that generated the data. We also derive a convergence rate for the reconstruction error of the associated posterior means.
For a univariate monotone regression function, the location where a specific value is attained is called a regression quantile. We study the coverage of a Bayesian credible interval for a regression quantile in a nonparametric monotone regression model, assuming that it is unique and the regression function has a positive derivative there. We put a prior on a piecewise constant function with equal intervals and independent normal priors on the step-heights. To comply with the monotonicity constraint, we induce a "projection-posterior" by imposing the monotonicity constraint on samples from the posterior distribution of the step-heights. We demonstrate two different interesting phenomena in this context. First, we show that the asymptotic coverage of a credible interval depends only on the credibility level, but is not the same as it. Rather, the asymptotic coverage is higher than the credibility. Such an over-coverage phenomenon has been recently observed for a credible interval for the value of a monotone regression or density function at a point, and is the opposite of the phenomenon observed in the context of smooth function estimation. A targeted asymptotic coverage may be obtained by using an appropriate lower credibility level, which can be numerically computed from the quantiles of a distribution that we will call the Bayes-Chernoff distribution. Next, we show that the posterior contraction rate for the regression quantile can be improved from its optimal $n^{-1/3}$ rate to the parametric rate $n^{-1/2}$ by sampling in two stages, sparing a fraction of the sampling budget to sample later from a credible interval obtained in the first stage. This property is analogous to that of a point estimator obtained by sampling in an appropriately-sized neighborhood of the maximum likelihood estimator in the second stage.
In this talk, we will introduce a class of species sampling process priors with independent relative stick-breaking weights which include the Dirichlet process and more generally the Pitman-Yor process. We will see a couple of results that make the latter two classes special. We will study the theoretical performance of the class by means of consistency theorems and prove Bernstein-von Mises theorems in two cases. We will first show the Bernstein-von Mises theorem for the Pitman-Yor prior, and then for a larger class of priors, we will show a Bernstein-von Mises theorem for atomless $P_0$.
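For readers who want to experiment with the stick-breaking construction mentioned in this abstract, here is a minimal sketch. It assumes the standard Pitman-Yor parametrisation, in which the k-th relative weight is drawn independently from Beta(1 - discount, concentration + k * discount), with discount 0 giving the Dirichlet process. The function name, the truncation level and the parameter values are illustrative choices, not taken from the talk.

```python
import random


def pitman_yor_weights(discount, concentration, n_sticks, rng):
    """Truncated stick-breaking weights with independent relative weights.

    Assumed parametrisation: V_k ~ Beta(1 - discount, concentration + k * discount),
    W_k = V_k * prod_{j < k} (1 - V_j). discount = 0 is the Dirichlet process.
    Returns the first n_sticks weights and the mass not yet broken off.
    """
    weights = []
    remaining = 1.0
    for k in range(1, n_sticks + 1):
        v = rng.betavariate(1.0 - discount, concentration + k * discount)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


rng = random.Random(0)
dp_weights, dp_leftover = pitman_yor_weights(0.0, 1.0, 200, rng)
py_weights, py_leftover = pitman_yor_weights(0.5, 1.0, 200, rng)
```

The leftover mass after truncation is typically much larger for a positive discount than for the Dirichlet process, which reflects the heavier tail of the Pitman-Yor weights.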
One of the many problems surrounding p-value based null hypothesis testing is this: if our test result is promising but inconclusive (say, p = 0.07) we cannot simply decide to gather a few more data points. While this "optional continuation" is ubiquitous in science, it invalidates frequentist error guarantees. Bayes factor (BF) hypothesis testing behaves better in this regard, *as long as the null hypothesis is simple*: we can then interpret the BF as (what we call) an 'S-value'. S-values generalize the concept of a nonnegative supermartingale. S-values generically handle optional continuation: if we reject when S > b (say 20), we get a frequentist Type-I error guarantee of 1/b (say 0.05) that *remains valid under optional continuation*. However, if the null is *composite* then the Bayes factor is usually not an S-value and indeed violates Type-I error guarantees. Here we provide a generic solution to this issue: we show that, for arbitrary composite nulls, there exist special priors under which BFs become S-values. In general, these priors are unlike any of the priors encountered in Bayesian practice; however, for the special case where all parameters in the null satisfy a group invariance, using the often-used improper right Haar prior one does get an S-value. Remarkably, Jeffreys' (1961) Bayesian t-test, which uses the right Haar prior on \sigma and a Cauchy prior on \mu, thus gives an S-value and can handle frequentist optional continuation; however, we show that there exists an alternative prior on \mu which gives substantially higher power, while still handling optional continuation.
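The 1/b guarantee for a simple null can be checked numerically. The sketch below is an illustration under stated assumptions, not the construction from the talk: the null is N(0, 1), the alternative puts a N(0, tau2) prior on the mean, and data collection stops as soon as the Bayes factor reaches the threshold b = 20. Under the null this Bayes factor is a nonnegative martingale with expectation 1, so by Ville's inequality the probability of ever crossing b is at most 1/b. All names and parameter values are hypothetical.

```python
import math
import random


def log_bayes_factor(sum_x, n, tau2):
    """Log Bayes factor of H1: x_i ~ N(mu, 1), mu ~ N(0, tau2) against H0: x_i ~ N(0, 1)."""
    return -0.5 * math.log(1.0 + n * tau2) + tau2 * sum_x ** 2 / (2.0 * (1.0 + n * tau2))


def rejects_under_optional_stopping(rng, threshold, max_n, tau2):
    """Sample under H0 and stop as soon as the Bayes factor reaches `threshold`."""
    log_threshold = math.log(threshold)
    sum_x = 0.0
    for n in range(1, max_n + 1):
        sum_x += rng.gauss(0.0, 1.0)
        if log_bayes_factor(sum_x, n, tau2) >= log_threshold:
            return True
    return False


def false_rejection_rate(n_experiments, threshold, max_n, tau2, seed):
    """Fraction of null experiments that reject at some sample size up to max_n."""
    rng = random.Random(seed)
    hits = sum(
        rejects_under_optional_stopping(rng, threshold, max_n, tau2)
        for _ in range(n_experiments)
    )
    return hits / n_experiments


rate = false_rejection_rate(n_experiments=2000, threshold=20.0, max_n=200, tau2=1.0, seed=1)
```

Up to Monte Carlo error, `rate` should not exceed 1/20 even though every simulated experiment was allowed to stop at the most favourable moment.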
Gauge field theory is the foundation of modern physics, including general relativity and the standard model of physics. It describes how a theory of physics should transform under symmetry transformations. For instance, in electrodynamics, electric forces may transform into magnetic forces if we transform a static observer to one that moves at constant speed. Similarly, in general relativity acceleration and gravity are equated to each other under symmetry transformations. Gauge fields also play a crucial role in modern quantum field theory and the standard model of physics, where they describe the forces between particles that transform into each other under (abstract) symmetry transformations.
In this work we describe how the mathematics of gauge groups becomes inevitable when you are interested in deep learning on manifolds. Defining a convolution on a manifold involves transporting geometric objects such as feature vectors and kernels across the manifold, which due to curvature become path dependent. As such it becomes impossible to represent these objects in a global reference frame and one is forced to consider local frames. These reference frames are arbitrary and changing between them is called a (local) gauge transformation. Since we do not want our computations to depend on the specific choice of frames we are forced to consider equivariance of our convolutions under gauge transformations. These considerations result in the first fully general theory of deep learning on manifolds, with gauge equivariant convolutions as the necessary key ingredient.
This is joint work with Taco Cohen, Maurice Weiler and Berkay Kicanaoglu.
We study the problem of variance estimation in Gaussian models where the maximum likelihood estimator is biased and inconsistent. These are sequences of $n$ independent Gaussian variables sharing the same variance and whose means are unknown parameters, with the exception of the first $p< n$ observations which form a subsample of i.i.d. standard Gaussian variables. In a semiparametric sense, we treat the mean vector as nuisance and we describe the behaviour of the marginal posterior for the variance depending on the class of priors on the parameters; the variance and the mean vector are always treated as independent a priori. We find that the marginal posterior is inconsistent if the nuisances are modelled as i.i.d. a priori, as long as the prior is proper. We also find that consistency is retained when a hierarchical structure is implemented, that is when the prior on the nuisance is a mixture over a parametric family of proper distributions.
The filament of a smooth function f consists of local maximizers of f when moving in a certain direction. The filament is an important geometrical feature of the surface of the graph of a function. It is also considered as an important lower dimensional summary in analyzing multivariate data. There have been some recent theoretical studies on estimating filaments of a density function using a nonparametric kernel density estimator. In this talk, we consider a Bayesian approach and concentrate on the nonparametric regression problem. We study the posterior contraction rates for filaments using a finite random series of B-splines prior on the regression function. Compared with the kernel method, this has the advantage that the bias can be better controlled when the function is smoother, which allows obtaining better rates. Under an isotropic Hölder smoothness condition, we obtain the posterior contraction rate for the filament under two different metrics: a distance of separation along an integral curve, and the Hausdorff distance between sets. Moreover, we construct credible sets of optimal size for the filament with sufficient frequentist coverage. We study the performance of our proposed method through a simulation study and apply it to a dataset on California earthquakes to assess the fault-line of the maximum local earthquake intensity.
Based on joint work with my former graduate student, Dr. Wei Li, Assistant Professor, Syracuse University, New York.
Given discrete-time observations over a growing time interval, we consider a non-parametric Bayesian approach to estimation of the Lévy density of a Lévy process belonging to a flexible class of infinite activity subordinators. Posterior inference is performed via MCMC, and we circumvent the problem of the intractable likelihood via the data augmentation device, that in our case relies on bridge process sampling via Gamma process bridges. Our approach also requires the use of an infinite-dimensional form of a reversible jump MCMC algorithm. We show that our method leads to good practical results in challenging simulation examples. On the theoretical side, we establish that our non-parametric Bayesian procedure is consistent: in the low-frequency data setting, with observations equi-spaced in time and intervals between successive observations remaining fixed, the posterior asymptotically, as the sample size grows to infinity, concentrates around the Lévy density under which the data have been generated. Finally, we test our method on an insurance dataset.
Joint work with Denis Belomestny, Shota Gugushvili, Peter Spreij
In recent years piecewise deterministic Markov processes (PDMPs) have emerged as a promising alternative to classical MCMC algorithms. In particular these PDMP based algorithms have good convergence properties and allow for efficient subsampling. Although many different PDMP based algorithms can be designed, two algorithms play fundamental roles: the Bouncy Particle sampler and the Zig-Zag sampler. In this talk both algorithms will be introduced and a comparison of properties of these algorithms will be presented, including recent results on ergodicity and on scaling with respect to dimension.
Joint work with Pierre-André Zitt, Kengo Kamatani and Gareth Roberts
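To give a feel for the Zig-Zag sampler introduced in this talk, here is a minimal one-dimensional sketch. It assumes a standard normal target, for which the switching rate along a segment is max(0, velocity * x + s) and the next switching time can be drawn exactly by inverting the integrated rate. The function name and the run length are illustrative, and the sketch does not cover subsampling or the Bouncy Particle sampler.

```python
import math
import random


def zigzag_gaussian_moments(n_events, rng, x0=0.0):
    """One-dimensional Zig-Zag process targeting N(0, 1).

    Returns time averages of x and x**2 along the piecewise linear path.
    """
    x, velocity = x0, 1.0
    total_time = integral_x = integral_x2 = 0.0
    for _ in range(n_events):
        # Switching rate s -> max(0, a + s) with a = velocity * x,
        # because the potential U(x) = x**2 / 2 has derivative x.
        a = velocity * x
        e = rng.expovariate(1.0)
        # Solve integral_0^tau max(0, a + s) ds = e for tau.
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        # Exact integrals of x(s) and x(s)**2 over the linear segment.
        integral_x += x * tau + 0.5 * velocity * tau ** 2
        integral_x2 += x * x * tau + x * velocity * tau ** 2 + tau ** 3 / 3.0
        total_time += tau
        x += velocity * tau
        velocity = -velocity  # flip direction at the switching event
    return integral_x / total_time, integral_x2 / total_time


mean_estimate, second_moment_estimate = zigzag_gaussian_moments(50_000, random.Random(42))
```

The two time averages estimate the mean and second moment of the target, so they should be close to 0 and 1 respectively. Note that the process never rejects a move: the randomness enters only through the switching times.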
Consider a diffusion that is constructed as a strong solution to a stochastic differential equation (SDE). Parameter estimation from discrete-time data is hard, due to intractability of the likelihood. To deal with this problem, a popular approach is to use a data-augmentation method, where the latent path is imputed. This imputation boils down to simulation of conditioned diffusions. Over the past decade this area of research has received quite some attention. Virtually all of the proposed methods assume that each component of the SDE is driven by its own Brownian motion. Unfortunately, if this is not the case, they break down. I will discuss how this problem can be circumvented for a wide class of SDE-models.
This talk is based on recent work with Joris Bierkens (Delft) and Moritz Schauer (Leiden)
This talk will discuss several aspects of inference using so-called spike-and-slab prior distributions on unknown sparse vectors, where the proportion of non-zero coefficients is chosen using marginal maximum likelihood Empirical Bayes. We will consider convergence of the full posterior distribution in terms of rates, its ability to provide asymptotically valid confidence sets that cover the true sparse parameter, as well as the possibility to use such a posterior for testing multiple hypotheses.
The talk is based on joint work with Romain Mismer, Botond Szabo and Etienne Roquain.
Given a statistical model for i.i.d. data, certain hypotheses can be tested consistently, while others cannot. If one thinks of consistent tests in terms of converging sequences of test statistics, some immediate conclusions can be drawn. But classical counterexamples (Cover, 1973; Dembo-Peres, 1995) demonstrate that the matter is more involved. We address the problem of what characterizes the asymptotic testability of hypotheses for uniform, pointwise and Bayesian tests. Posteriors distinguish measurable hypotheses (prior-almost-surely), but frequentist tests require more. Application of the Le Cam-Schwartz theorem (Le Cam-Schwartz, 1960; write U for the associated uniformity) leads to two equivalences: hypotheses are testable with uniform power if and only if they are separated by U. Hypotheses are testable in a pointwise sense, if and only if the testing problem can be represented (continuously with respect to U) in a separable metric space. The above is illustrated with a variety of examples and counterexamples.
Gaussian processes are widely used as nonparametric priors in various fields of application, including finance, machine learning, genomics, and epidemiology. Arguably, one of the most frequently used covariance kernels is the squared exponential kernel. As sample paths from the corresponding process are too smooth, it is common to rescale the kernel. The optimal rescaling factor depends, however, on the true parameter, which leads in practice to the use of empirical or hierarchical Bayes methods. The current results focus mainly on the recovery of the underlying functional parameter in different contexts and derive (nearly) optimal posterior contraction rates. However, less is known about the reliability of uncertainty quantification using this type of process. In our work, we investigate the coverage properties of the corresponding credible sets in the context of the Gaussian white noise model. We show that the resulting posterior distribution is not suitable for uncertainty quantification, as the credible sets will have coverage tending to zero for typical signals. The derived theoretical findings are demonstrated in a thorough simulation study, where, among other things, we find that Gaussian processes with squared exponential kernel have substantially worse coverage properties than other Gaussian processes. On the other hand, blowing up the radius of our credible set by a log n factor allows it to encompass the truth.
This is ongoing joint work with Botond Szabo and Aad van der Vaart.
We obtain rates of contraction of posterior distributions in inverse problems defined by scales of smoothness classes. We derive abstract results for general priors, with contraction rates determined by Galerkin approximation. The rate depends on the amount of prior concentration near the true function and the prior mass of functions with inferior Galerkin approximation. We apply the general result to non-conjugate series priors, showing that these priors give near optimal and adaptive recovery in some generality, to Gaussian priors, and to mixtures of Gaussian priors, where the latter are also shown to be near optimal and adaptive. The proofs are based on general testing and approximation arguments, without explicit calculations on the posterior distribution. We are thus not restricted to priors based on the singular value decomposition of the operator. We illustrate the results with examples of inverse problems resulting from differential equations.
This is joint work with Shota Gugushvili and Aad van der Vaart.
We study concentration and the rate of contraction of the posterior distribution when the observations of the function of interest are indirect, and where the variance of the noise is heterogeneous, possibly depending on the unknown signal. Performance of the considered Bayesian estimator is illustrated on simulated data.
This is joint work with my PhD student Jenovah Rodrigues (University of Edinburgh)
Since their inception in the 1980s, regression trees have been one of the more widely used nonparametric prediction methods. Tree-structured methods yield a histogram reconstruction of the regression surface, where the bins correspond to terminal nodes of recursive partitioning. Trees are powerful, yet susceptible to overfitting. Strategies against overfitting have traditionally relied on pruning greedily grown trees. The Bayesian framework offers an alternative remedy against overfitting through priors. Roughly speaking, a good prior charges smaller trees where overfitting does not occur. In this paper, we take a step towards understanding why and when Bayesian trees and their ensembles do not overfit. We study the speed at which the posterior concentrates around the true smooth regression function. We propose a spike-and-tree variant of the popular Bayesian CART prior and establish new theoretical results showing that regression trees (and their ensembles) (a) are capable of recovering smooth regression surfaces, achieving optimal rates up to a log factor, (b) can adapt to the unknown level of smoothness and (c) can perform effective dimension reduction when $p > n$. These results provide a piece of missing theoretical evidence explaining why Bayesian trees (and additive variants thereof) have worked so well in practice.
This is joint work with Veronika Rockova.
Given discrete time observations over a fixed time interval, we study a nonparametric Bayesian approach to estimation of the volatility coefficient of a stochastic differential equation. We postulate a histogram-type prior on the volatility with piecewise constant realisations on bins forming a partition of the time interval. The values on the bins are assigned an inverse Gamma Markov chain (IGMC) prior. Posterior inference is straightforward to implement via Gibbs sampling, as the full conditional distributions are available explicitly and turn out to be inverse Gamma. We also discuss in detail the hyperparameter selection for our method. Our nonparametric Bayesian approach leads to good practical results in a wide range of simulation examples. Finally, we apply it to a classical data set in change-point analysis: weekly closings of the Dow-Jones industrial averages.
Joint work with Frank van der Meulen, Moritz Schauer and Peter Spreij.
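The inverse Gamma conjugacy behind such full conditionals can be illustrated with a deliberately simplified sketch. It assumes a driftless model dX_t = s(t) dW_t and independent IG(alpha, beta) priors on the squared volatility of each bin, which is not the IGMC prior of the talk (the Markov chain coupling between neighbouring bins is dropped, so no Gibbs sweep is needed). The true volatility, the hyperparameters and all function names are hypothetical.

```python
import math
import random


def simulate_increments(n_obs, horizon, rng):
    """Increments of dX_t = s(t) dW_t on an equispaced grid (no drift).

    The illustrative volatility s(t) is 1 on the first half of the time
    interval and 2 on the second half.
    """
    dt = horizon / n_obs
    increments = []
    for i in range(n_obs):
        midpoint = (i + 0.5) * dt
        s = 1.0 if midpoint < 0.5 * horizon else 2.0
        increments.append(s * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return increments, dt


def bin_posteriors(increments, dt, n_bins, alpha, beta):
    """Inverse Gamma (shape, scale) posterior for s^2 on each bin.

    With s^2 ~ IG(alpha, beta) a priori and m increments Y_i ~ N(0, s^2 dt)
    in the bin, the posterior is IG(alpha + m / 2, beta + sum(Y_i^2) / (2 dt)).
    """
    per_bin = len(increments) // n_bins
    posteriors = []
    for b in range(n_bins):
        chunk = increments[b * per_bin:(b + 1) * per_bin]
        shape = alpha + 0.5 * len(chunk)
        scale = beta + sum(y * y for y in chunk) / (2.0 * dt)
        posteriors.append((shape, scale))
    return posteriors


def sample_inverse_gamma(shape, scale, rng):
    # If G ~ Gamma(shape, rate=scale), then 1 / G ~ IG(shape, scale).
    # random.gammavariate takes (shape, scale), hence the 1 / scale.
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)


rng = random.Random(2018)
increments, dt = simulate_increments(n_obs=4000, horizon=1.0, rng=rng)
posteriors = bin_posteriors(increments, dt, n_bins=4, alpha=0.1, beta=0.1)
posterior_means = [scale / (shape - 1.0) for shape, scale in posteriors]
posterior_draws = [sample_inverse_gamma(shape, scale, rng) for shape, scale in posteriors]
```

With 1000 increments per bin, the posterior means of the squared volatility should be near 1 on the first two bins and near 4 on the last two.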
Random discrete probability measures define prior distributions which are used for estimation in mixture models and species sampling problems. The law of the frequencies can be described in terms of the random partition featured by a sample taken from the prior and, in particular, by its large sample behavior as the sample size increases. By a result of Karlin, the asymptotics of the number of groups of the partition is determined by the rate of decay of the small frequencies, so that in the random frequency case it can be studied by a deconditioning argument whenever the frequencies in decreasing order admit a tractable form. The geometric stick-breaking process has been recently introduced as a simple yet effective alternative to the Dirichlet process, with frequencies that take the form of geometric probabilities with random probability of success. In this work we investigate the effect of the prior of this parameter on the asymptotic behavior of the number of groups, and we show that a whole range of logarithmic behavior can be achieved by appropriately tuning the prior. We also extend the analysis to a large class of priors that includes the geometric stick breaking process and features an additional parameter representing the number of failures of the negative binomial distribution.
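A small simulation makes the object of study concrete. The sketch below assumes frequencies of the form p (1 - p)^(k - 1), k = 1, 2, ..., with a random success probability p, and counts the number of distinct groups among the first i observations. The Beta(2, 2) prior on p, the sample size and the function names are illustrative assumptions and not choices made in the talk.

```python
import math
import random


def geometric_labels(n, success_prob, rng):
    """Draw n group labels with P(label = k) = p (1 - p)^(k - 1), k = 1, 2, ..."""
    log_q = math.log1p(-success_prob)
    labels = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        # Inversion: label > k exactly when u <= (1 - p)^k.
        labels.append(int(math.log(u) / log_q) + 1)
    return labels


def groups_along_sample(labels):
    """Number of distinct labels among the first i observations, i = 1..n."""
    seen = set()
    counts = []
    for label in labels:
        seen.add(label)
        counts.append(len(seen))
    return counts


rng = random.Random(7)
p = rng.betavariate(2.0, 2.0)  # illustrative prior on the success probability
labels = geometric_labels(5000, p, rng)
group_counts = groups_along_sample(labels)
```

Repeating the experiment with different priors on p gives an empirical view of how the prior changes the growth of `group_counts` with the sample size.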
Interaction networks are commonly used in various fields, from ecology to social sciences, to describe the relations between a set of entities such as species or individuals. In many applications, covariates are also available that describe phylogenetic proximities or social similarities between the entities. A natural question is then to understand to what extent covariates contribute to explaining the topology of the network.
To this aim we define a generic network model that combines logistic regression on the covariates with a graphon-type residual term. More specifically, we infer the residual graphon using a mixture of stochastic block models, which enables us to (i) estimate the covariate effects, (ii) test the existence of a residual structure (i.e. assess goodness-of-fit of the logistic regression) and (iii) visualize the residual structure.
In all this work, we adopt a Bayesian perspective, for which we first propose a variational Bayes (VB) inference strategy. We then show how the VB approximate posterior can be used as a starting point for bridge sampling toward the true posterior.
This is a joint work with Sophie Donnet, Pierre Latouche and Sarah Ouadah and relies on the following papers: * Latouche, P., & Robin, S. (2016). Variational Bayes model averaging for graphon functions and motif frequencies inference in W-graph models. Statistics and Computing, 26(6), 1173-1185. * Latouche, P., Robin, S., & Ouadah, S. (2017). Goodness of fit of logistic regression models for random graphs. Journal of Computational and Graphical Statistics. * Donnet, S., & Robin, S. (2017). Using deterministic approximations to accelerate SMC for posterior sampling. arXiv preprint arXiv:1707.07971.
We study semiparametric Bayesian estimation of the average response in a binary regression model with missing observations. In particular, some dependence between the missingness mechanism and outcomes is allowed. This model is studied due to applications in causal inference and biostatistics. We show that product priors satisfy a semiparametric Bernstein-von Mises theorem under some conditions and suggest a dependent prior geared towards estimating this specific functional.
This is ongoing joint work with Aad van der Vaart.
I present a non-parametric Bayesian approach to the estimation of a diffusion coefficient of a stochastic differential equation given discrete time observations on its solution over a fixed time interval. As a prior on the diffusion coefficient, a histogram-type prior with piecewise constant realisations on bins forming a partition of the time interval is employed. The approach is justified theoretically by deriving the rate at which the corresponding posterior distribution asymptotically concentrates around the diffusion coefficient under which the data have been generated. As practical examples I consider interest rates and the North Greenland Ice Core data on the ratio of oxygen isotopes.
This is joint work with Frank van der Meulen, Shota Gugushvili, Peter Spreij.
Boundary detection in images has many important
applications in spatial statistics, forestry, climatology, medical
sciences and other fields and may be thought of as a
higher-dimensional generalization of change-point problems. The
boundary of a d-dimensional image may be viewed as a
(d-1)-dimensional manifold, and in particular a smooth, closed,
non-self-intersecting curve for 2D images. We consider a
Bayesian approach to the problem using a prior indexed by
the unit circle, or the unit sphere in higher dimension, typically
constructed from a Gaussian process or a finite random
series using trigonometric polynomials or spherical harmonics
basis. For the most important case of 2D images, a very
convenient prior is the squared exponential periodic Gaussian
process. Its explicit eigendecomposition in terms of Bessel
functions allows a convenient computational scheme and the
derivation of posterior contraction rates. We show that the posterior
contracts at the minimax optimal rate and adapts to the
unknown smoothness level of the curve. Simulation
experiments show that the method is exceptionally efficient
and robust against misspecification.
[This talk is based on joint work with Meng Li.]
We observe a diffusion X^T={X_t:t\in[0,T]}, which is given
by the stochastic differential equation dX_t=\theta(X_t)dt+dW_t,
where the drift function \theta:R\to R is measurable, one-periodic
and \int_0^1\theta(x)^2dx<\infty and W is a Brownian motion. This
model was for instance used in molecular dynamics, see
Papaspiliopoulos et al (2012). We are interested in estimating the
unknown parameter \theta. In this talk we will consider hierarchical
and empirical Bayes methods on the scaling hyperparameter of a
Gaussian process prior to obtain adaptivity. For both methods we
obtain minimax posterior contraction results. We will also discuss
future work. [Joint work with Harry van Zanten (University of Amsterdam)
and Moritz Schauer (Leiden University); the hierarchical Bayes part
is based on https://projecteuclid.org/euclid.ejs/1457382316.]
We consider novel iterative algorithms for solving some classical
inverse problems. They are fixed-point iterations that can be derived from Bayes' theorem. The proposed approach is applied to solve i) a system of linear equations Ax = b and ii) a Fredholm integral equation of the first kind. If these equations have no solution, the proposed algorithm can find an optimal alternative. As a statistical application, we provide a new estimator for the mixing distribution in nonparametric mixtures. It is shown that the EM algorithm is a special case. [Joint work with M. Ryan and S. Walker]
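To give a flavour of what such a fixed-point iteration can look like, here is a minimal sketch of a multiplicative (Richardson-Lucy / EM-type) update for Ax = b with strictly positive A and b. This specific update is an assumption chosen for illustration; the talk's own algorithm may differ. The function name `fixed_point_solve` is hypothetical.

```python
def fixed_point_solve(A, b, iterations=500):
    """Multiplicative fixed-point iteration for Ax = b with positive entries.

    Update: x_j <- x_j * (sum_i A[i][j] * b[i] / (Ax)_i) / (sum_i A[i][j]).
    Any x with Ax = b is a fixed point, since each ratio b[i] / (Ax)_i equals 1.
    This is an EM-type update, used here as an illustrative assumption.
    """
    m, n = len(A), len(A[0])
    col_sums = [sum(A[i][j] for i in range(m)) for j in range(n)]
    x = [1.0] * n  # positive starting point
    for _ in range(iterations):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        x = [
            x[j] * sum(A[i][j] * b[i] / Ax[i] for i in range(m)) / col_sums[j]
            for j in range(n)
        ]
    return x


if __name__ == "__main__":
    A = [[2.0, 1.0], [1.0, 3.0]]
    b = [4.0, 7.0]  # generated from x = (1, 2)
    print(fixed_point_solve(A, b))
```

The update keeps x positive at every step, which is why the sketch is restricted to positive systems.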
Reinforced processes are of interest in probability and in
many related areas. In Bayesian statistics, they are the basis of
fundamental constructions of exchangeable processes, with notable
examples in Bayesian nonparametrics, and applications including
network modeling, where they may act as preferential attachment
rules. However, extensions to randomly reinforced processes (RRP)
may be needed in order to capture forms of competition, selection,
asymmetries or other forms of non-stationarity. In this talk, I will
review some results for a class of asymptotically exchangeable
RRPs, and present new results about interacting RRPs, that are
asymptotically partially exchangeable. Some examples will
illustrate the theory, and suggest implications in Bayesian
nonparametric inference. [Joint work with Sandra Fortini]
For many years statisticians (e.g., Berkson 1942; Wasserstein, 2016)
have warned applied scientists of the dangers of the mechanistic use of p-values
as a license for making a claim of a scientific finding. The erroneous belief that
a significant p-value justifies a scientific claim has led to many extremely
unbelievable, bizarre and obscene findings in (social) psychology that do not
replicate (OSF, 2015).
Despite the ongoing verbal assault, empirical disciplines such as psychology,
biology, and medicine continue to depend on p-values as the standard method
of drawing conclusions from data. An important reason for the lingering popularity
of the p-value is arguably the perceived lack of an accessible alternative method.
Under the guidance of Eric-Jan Wagenmakers, I have developed Bayes factors
for some standard null hypothesis significance tests, which I subsequently make
available to practitioners through the free and open-source software package JASP.
To construct a Bayes factor, one has to (1) select a pair of priors and (2) calculate
two integrals. In this talk I'll elaborate on how priors are selected for normal linear
model comparisons, based on the ideas of Harold Jeffreys and the subsequent
work of Liang et al. (2008) and Bayarri et al. (2012).
[Joint work with EJ Wagenmakers and the JASP team]
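The two steps named above (choose a pair of priors, compute two integrals) can be made concrete in a toy case. The sketch below is not the normal linear model setting of the talk and does not use JASP; it is an assumed binomial example in which the null fixes theta = 0.5 and the alternative puts a Beta(a, b) prior on theta, so both marginal likelihoods are available in closed form.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def bayes_factor_10(k, n, a=1.0, b=1.0, theta0=0.5):
    """Bayes factor of H1: theta ~ Beta(a, b) against H0: theta = theta0.

    Data: k successes in n Bernoulli trials. The binomial coefficient is
    common to both marginal likelihoods and cancels in the ratio.
    """
    # Integral under H1: int theta^k (1-theta)^(n-k) dBeta(a, b) = B(k+a, n-k+b) / B(a, b)
    log_m1 = log_beta(k + a, n - k + b) - log_beta(a, b)
    # "Integral" under H0: the prior is a point mass at theta0
    log_m0 = k * math.log(theta0) + (n - k) * math.log(1.0 - theta0)
    return math.exp(log_m1 - log_m0)


if __name__ == "__main__":
    print(bayes_factor_10(5, 10))   # balanced data: evidence leans towards H0
    print(bayes_factor_10(10, 10))  # extreme data: evidence leans towards H1
```

Changing `a` and `b` shows directly how the choice of prior under the alternative moves the Bayes factor.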
We propose an empirical Bayes posterior and use it to construct a predictor
and a confidence region. Next we study the contraction properties of the posterior
from the frequentist point of view, the estimation error of the predictor and the
coverage properties of the confidence set. Following an oracle approach which
quantifies error locally for each possible value of the parameter, optimal rates are
obtained and uniform size-optimal confidence sets with guaranteed coverage are
constructed under an "excessive bias restriction" condition for many conceivable
classes simultaneously. This condition gives rise to a new slicing of the entire
space that is suitable for uncertainty quantification.
[Joint work with S. Ghoshal.]
In this talk I consider posterior contraction rates under
local entropy and prior mass conditions in linear spaces. Appropriate
global tests are constructed from local alternative hypotheses
assuming a natural link between Hellinger distance and a locally
convex metric. Local metric entropy bounds then bound the difficulty
to test the truth against the entire space.
Given a sample of a Poisson point process with intensity \lambda_f(x,y) = n 1(f(x) \leq y), we study recovery of the boundary function f from a nonparametric Bayes perspective. Because of the irregularity of this model, the standard approach for posterior contraction rates cannot be applied. We derive a general result for posterior contraction with respect to the Hellinger distance. This result is applied to several classes of priors, including Gaussian priors, priors based on random series, compound Poisson processes, and subordinators. We also investigate the limiting shape of the posterior distribution and derive a nonparametric version of the Bernstein-von Mises theorem for irregular models. We show that for piecewise constant functions, the marginal posterior of the functional \theta = \int f does some automatic bias correction and contracts with a faster rate than the MLE. We also show that this property is lost if the true underlying function comes from a more general class of functions. [Joint work with M. Reiss]
Suppose X is a discretely observed diffusion process and we wish to sample from the posterior distribution of parameters appearing in either the drift coefficient or the diffusion coefficient. As the likelihood is intractable, a common approach is to derive an MCMC algorithm where the missing diffusion paths in between the observations are augmented to the state space. This requires efficient sampling of diffusion bridges. In recent years some results have appeared in the "uniformly elliptic" case, which is characterised by nondegeneracy of the covariance matrix of the noise. The "hypo-elliptic" case refers to the situation where the covariance matrix of the noise is degenerate and where observations are made only of variables that are not directly forced by white noise. As far as I am aware, not much is known about how to sample bridges in this case. In this talk I will share some recent ideas on extending earlier results with Harry van Zanten (UvA) and Moritz Schauer (Leiden), derived under the assumption of uniform ellipticity, to this setting. This concerns "work in progress", so I won't be able to provide a full solution to the problem.
Standard MCMC methods can scale poorly to big data settings due to the need to evaluate the likelihood at each iteration. There have been a number of approximate MCMC algorithms that use sub-sampling ideas to reduce this computational burden, but with the drawback that these algorithms no longer target the true posterior distribution. We introduce a new family of Monte Carlo methods based upon a multi-dimensional version of the Zig-Zag process of Bierkens and Roberts (2016), a continuous time piecewise deterministic Markov process. While traditional MCMC methods are reversible by construction, the Zig-Zag process offers a flexible non-reversible alternative. The dynamics of the Zig-Zag process correspond to a constant velocity model, with the velocity of the process switching at events from a point process. The rate of this point process can be related to the invariant distribution of the process. If we wish to target a given posterior distribution, then rates need to be set equal to the gradient of the log of the posterior. Unlike traditional MCMC, we show how the Zig-Zag process can be simulated without discretisation error, and give conditions for the process to be ergodic. Most importantly, we introduce a sub-sampling version of the Zig-Zag process that is an example of an exact approximate scheme. That is, if we replace the true gradient of the log posterior with an unbiased estimator, obtained by sub-sampling, then the resulting approximate process still has the posterior as its stationary distribution. Furthermore, if we use a control-variate idea to reduce the variance of our unbiased estimator, then both heuristic arguments and empirical observations show that Zig-Zag can be super-efficient: after an initial pre-processing step, essentially independent samples from the posterior distribution are obtained at a computational cost which does not depend on the size of the data.
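The constant-velocity dynamics with velocity flips can be sketched in one dimension. The example below assumes a standard normal target, for which the switching rate max(0, v x) along a trajectory can be integrated and inverted exactly, so no discretisation is needed. It does not include the sub-sampling or control-variate steps, and the function name is hypothetical.

```python
import math
import random


def zigzag_standard_normal(n_events, rng, x=0.0, v=1.0):
    """Simulate a 1D Zig-Zag process targeting N(0, 1).

    With U(x) = x^2 / 2 the switching rate at position x and velocity v is
    max(0, v * x). Starting from (x, v), the rate after time t is
    max(0, v * x + t), so the next event time solves a quadratic equation.
    Returns time averages of x and x^2 along the continuous trajectory.
    """
    total_time = 0.0
    integral_x = 0.0
    integral_x2 = 0.0
    for _ in range(n_events):
        a = v * x
        e = rng.expovariate(1.0)
        # Exact inversion of the integrated rate: int_0^tau max(0, a + t) dt = e
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        x_new = x + v * tau
        # Exact integrals of x and x^2 over the linear segment
        integral_x += x * tau + 0.5 * v * tau * tau
        integral_x2 += (x_new ** 3 - x ** 3) / (3.0 * v)
        total_time += tau
        x = x_new
        v = -v  # flip the velocity at each event
    return integral_x / total_time, integral_x2 / total_time


if __name__ == "__main__":
    mean, second_moment = zigzag_standard_normal(50000, random.Random(0))
    print(mean, second_moment)
```

Averages are taken along the whole piecewise linear path rather than at the event times, because the positions at the switching events alone are not distributed according to the target.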
In the context of nonparametric regression with unknown errors, we propose Bayesian methods to estimate the regression function f. We investigate frequentist properties of the resulting posterior distribution using the supremum-norm (sup-norm) distance. In particular, we study sup-norm posterior contraction rates and coverage of credible bands for f and its derivatives. We further study issues concerning adaptation in sup-norm, and provide adaptive Bayesian procedures that achieve the minimax sup-norm rate. We found that priors based on Lepski's method, spike and slab priors, and scaled integrated Brownian motion priors will work. The study of posterior contraction rates and credible sets in sup-norm is important, for its natural interpretation and implications for other problems such as function mode estimation.
Given a sample of size n from a population of individuals belonging to different species with unknown proportions, a popular problem of practical interest consists in making inference on the probability D_{n}(l) that the (n+1)-th draw coincides with a species with frequency l in the sample, for any l=0,1,...,n. This paper contributes to the methodology of Bayesian nonparametric inference for D_{n}(l). Specifically, under the general framework of Gibbs-type priors we show how to derive credible intervals for a Bayesian nonparametric estimation of D_{n}(l), and we investigate the large n asymptotic behaviour of such an estimator. Of particular interest are special cases of our results obtained under the specification of the two parameter Poisson-Dirichlet prior and the normalized generalized Gamma prior, which are two of the most commonly used Gibbs-type priors. With respect to these two prior specifications, the proposed results are illustrated through a simulation study and a benchmark Expressed Sequence Tags dataset. To the best of our knowledge, this illustration provides the first comparative study between the two parameter Poisson-Dirichlet prior and the normalized generalized Gamma prior in the context of Bayesian nonparametric inference for D_{n}(l). [Joint work with S. Favaro (University of Torino); Bernardo Nipoti (Trinity College, Dublin); Yee Whye Teh (University of Oxford)]
In several fields like genetics, viral dynamics, pharmacokinetics and pharmacodynamics, population studies and so on, regression models are often given by differential equations which are not analytically solvable. In this talk, Bayesian estimation and uncertainty quantification are addressed in such models. The approach is based on embedding the parametric model in a nonparametric regression model and extending the definition of the parameter beyond the original model. The nonparametric regression function is expanded in a basis and normal priors are given on coefficients leading to a normal posterior, which then induces a posterior distribution on the model parameters through a projection map. The posterior can be obtained by a simple direct sampling. We establish Bernstein-von Mises type theorems for the induced posterior distribution of the model parameters. We consider different choices of the projection map and study its impact on the asymptotic efficiency of the Bayesian estimator. We further show that posterior credible regions have asymptotically correct frequentist coverage. A simulation study and applications to some real data sets show practical usefulness of the method. Ideas of extending the methods to higher order differential equations and partial differential equations will also be discussed. [This talk is based on joint work with Prithwish Bhaumik.]
Regression models, particularly of the signal-plus-noise variant, play a central role in statistics and are a fundamental tool in many applied fields. Typically, the noise terms are assumed to be independent but this is often not a realistic assumption. Methods for selecting bandwidths/smoothing parameters for kernel/spline estimators like generalised cross-validation (GCV) break down even if the correlation is mild. To deal with this, two common approaches are to either “robustify” the criteria for selecting bandwidth/smoothing parameters, or making a parametric assumption on the noise. Unfortunately, both approaches are very sensitive to misspecification. The approach I will talk about is fully non-parametric. In this talk I will focus on penalised spline estimators, essentially smoothing splines with relatively few knots. I will show how they can be interpreted as Bayesian estimators (corresponding to a certain prior on the regression function). An alternative interpretation is as best linear unbiased predictors (BLUPs) in a linear mixed-effects model (LMM). The spline parameters are estimated via the empirical Bayes approach. I will talk a bit about some implementation issues, and about the asymptotics of the estimators. These asymptotics make explicit the influence of the correlation structure on the smoothing parameters of the penalised spline, and introduce some non-trivial constraints on the order of the splines. I will close with some numerical experiments where I compare our approach to two kernel estimators, and to a standard R procedure based on a (parametric) assumption on the noise structure.
[Joint work with Tatyana Krivobokova and Francisco Rosales (Univ. of Goettingen)]
A large number of continuous shrinkage priors has been proposed to tackle the sparse normal means problems. Many of these shrinkage priors can be written as a scale mixture of normals, which makes them particularly easy to implement. We propose general conditions on the prior on the local variance in scale mixtures of normals, such that posterior contraction at the minimax rate is assured. The conditions require tails at least as heavy as Laplace, but not too heavy, and a large amount of mass around zero relative to the tails, more so as the sparsity increases. These conditions give some general guidelines for choosing a shrinkage prior for estimation under a nearly black sparsity assumption. We verify these conditions for the class of priors considered by Ghosh and Chakrabarti (2015), which includes the horseshoe and the normal-exponential gamma priors, and for the horseshoe+, the inverse-Gaussian prior, the normal-gamma prior, and the spike-and-slab Lasso, and thus extend the number of shrinkage priors which are known to lead to posterior contraction at the minimax estimation rate.
In the many normal means model we construct an empirical Bayes posterior which we then use for uncertainty quantification for the unknown (possibly sparse) parameter by constructing an estimator and a confidence set around it as an empirical Bayes credible ball. We allow the model to be misspecified (the normality assumption can be dropped, with some moment conditions instead), leading to robust empirical Bayes inference. An important step in assessing the uncertainty is the derivation of the fact that the empirical Bayes posterior contracts to the parameter with a local (i.e., depending on the parameter) rate which is the best over a certain family of local rates, and is therefore called the oracle rate. We introduce the so-called Excessive Bias Restriction under which we establish the local (oracle) confidence optimality of the empirical Bayes credible ball. Adaptive minimax results (for the estimation and posterior contraction problems) over sparsity classes follow from our local results. An extra (square root of) log factor appears in the radial rate of the confidence ball; it is not known whether this is an artifact or not.
Frequentist conditions for asymptotic suitability of Bayesian procedures focus on lower bounds for prior mass in Kullback-Leibler neighbourhoods of the data distribution. In this talk, we investigate the flexibility in criteria for posterior consistency with i.i.d. data. We formulate a new posterior consistency theorem that applies both to well- and mis-specified models and which we use to re-derive Schwartz's theorem, consider Kullback-Leibler consistency and formulate consistency theorems in which priors charge metric balls. We also generalize to sieved models with Barron's negligible prior mass condition and to separable models with variations on Walker's consistency theorem. Results also apply to marginal semi-parametric consistency: support boundary estimation is considered explicitly and consistency is proved in a model where the Kullback-Leibler priors do not exist. Other applications include Hellinger consistent density estimation in mixture models with Dirichlet or Gibbs-type priors of full weak support. Regarding posterior convergence at a rate, it is shown that under a mild integrability condition, the second-order Ghosal-Ghosh-van der Vaart prior mass condition can be relaxed to a lower bound to the prior mass in Schwartz's Kullback-Leibler neighbourhoods. The posterior rate of convergence is derived in a simple model for heavy-tailed distributions in which the Ghosal-Ghosh-van der Vaart condition cannot be satisfied by any prior.
We develop a theory of 'generalized Bayesian inference' covering both standard Bayesian inference under misspecification and PAC-Bayesian inference. We define the \eta-generalized posterior, \eta=1 corresponding to standard Bayes, smaller \eta weighing the prior more strongly. We also define the \eta-convex hull of a probability model M, which for \eta=1 coincides with the standard convex hull but for smaller \eta shrinks towards M. Generalizing a construction due to Li and Barron, we show that for all \eta, there exists a distribution Q closest to the 'true' P* in KL divergence within the \eta-convex hull of M, and we define the *critical learning rate* \eta* as the largest \eta for which Q is not just in the \eta-convex hull but also in the model M itself. We show that generalized Bayes with any learning rate < \eta* concentrates as long as the prior puts sufficient mass in KL neighborhoods of Q, under no further conditions. A simple regression example shows that if generalized Bayes is run with a larger learning rate, it may not concentrate at all. We also show that conditions from the learning theory literature that ensure fast learning rates such as Tsybakov and Bernstein conditions, mixability and 'stochastic exp-concavity' can all be understood as special cases of the generic condition 'the critical learning rate should be large'. [Partially based on joint work with N. Mehta, T. van Ommen, T. van Erven, B. Williamson and M. Reid]
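The role of the learning rate \eta can be seen in a small conjugate case. The sketch below assumes the usual tempered form, posterior proportional to prior times likelihood raised to the power \eta, applied to Bernoulli data with a Beta(a, b) prior. The model and function names are illustrative assumptions, not taken from the talk.

```python
def generalized_beta_posterior(successes, failures, eta, a=1.0, b=1.0):
    """Parameters of the eta-generalized posterior for Bernoulli data.

    Assumes posterior(theta) is proportional to prior(theta) * likelihood(theta)**eta
    with a Beta(a, b) prior, which gives Beta(a + eta * successes, b + eta * failures).
    """
    return a + eta * successes, b + eta * failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


if __name__ == "__main__":
    for eta in (1.0, 0.5, 0.1):
        post = generalized_beta_posterior(7, 3, eta)
        print(eta, post, beta_mean(*post))
```

With \eta = 1 this is the standard posterior; as \eta decreases the data counts are down-weighted and the posterior stays closer to the prior.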
In our work we investigate the frequentist properties of the hierarchical Bayes and the maximum marginal likelihood empirical Bayes methods in the sparse multivariate mean model with unknown sparsity level. We consider the popular horseshoe prior introduced in Carvalho, Polson, and Scott (2008) and show that both adaptive Bayesian techniques lead to rate optimal posterior contraction without using any information on the sparsity level. Furthermore, we also investigate the frequentist coverage properties of Bayesian credible sets resulting from the horseshoe prior both in the non-adaptive and adaptive setting. We show that the credible sets have good frequentist coverage and optimal size for appropriate choice of the tuning parameter (using information about the sparsity level). In case this information is not available the construction of adaptive and honest confidence sets is not possible, hence we have to introduce some additional restriction. We show that under a self-similarity type of assumption both the (slightly modified) hierarchical and empirical Bayes credible sets have (almost) rate adaptive size and good coverage. [Joint work with Stéphanie van der Pas and Aad van der Vaart.]
A multivariate, non-linear diffusion process with unknown parameters in the drift and diffusion coefficients is partially observed with error at fixed times. We introduce a process which closely approximates a diffusion bridge conditional on partial information about the location of the diffusion bridge at an intermediate time. We show that the distribution of this approximation and the conditional distribution of the diffusion bridge given the noisy intermediate observation are stochastically equivalent and we find the corresponding Girsanov likelihood in closed form. This leads to a Markov chain Monte Carlo procedure to sample from the joint distribution of the unobserved diffusion trajectory and the model parameters given the noisy, discrete, partial observations. This is illustrated using the stochastic FitzHugh-Nagumo model for spike generation in squid axons modelling the axon membrane potential and a recovery variable, where only the membrane potential is observed.
Suppose we have continuous time observations $X^T=\{X_t:t\in[0,T]\}$ of a diffusion process $dX_t=b(X_t)dt+dW_t,$ with unknown drift parameter $b,$ which is 1-periodic and square integrable on $[0,1].$ In the Bayesian setting Gaussian process priors were considered (see for instance Pokern, Stuart and van Zanten (2013), van Waaij and van Zanten (2015)). A randomly scaled and truncated wavelet series prior with Gaussian coefficients was proposed by van der Meulen, Schauer and van Zanten (2014), for which they develop an efficient algorithm to sample from the posterior. In this talk we will discuss our recent work on the asymptotic properties of this prior. Optimal rates up to a log factor and adaptivity to every Besov smoothness bigger than 1/2 will be shown. [Joint work with Frank van der Meulen, Harry van Zanten and Moritz Schauer.]
The so-called "Bayesian framework for interpreting evidence" uses Bayes rule to define the roles of (forensic) experts and lawyers. The experts are supposed to derive a likelihood ratio (LR) / Bayes Factor defined as the ratio of the probability of observing the evidence under two competing hypotheses. The lawyer can use this LR to update his prior beliefs in the hypotheses. Thus, the LR is seen as a numerical measure of evidential strength. This framework is currently considered the state of the art in forensic science, and is implemented in several forensic laboratories including the Netherlands Forensic Institute. I will explain the basic ideas and then highlight some difficulties when trying to actually calculate LRs in forensic casework (DNA, fingerprints, chemical analyses of e.g. glass and fire debris). One of these is a discussion about the fundamental question whether it makes sense to consider the uncertainty of a LR. I would value the opinion of the audience on my answer to this question.
The talk will discuss ideas for the construction of Bayesian predictive distributions; in particular, a recursive expression for fast estimation of the predictive, which avoids the need to compute the posterior distribution, will be presented. The key to the construction is the bivariate Gaussian copula.
In Bayesian nonparametrics it is common to consider a family of prior distributions indexed by some hyper-parameters. The best choice of the prior out of this collection crucially depends on certain characteristics (e.g. smoothness, sparseness,...) of the unknown function of interest, which are usually not available. Therefore in practice it is common to apply data-dependent choices for the hyper-parameters. Arguably, the marginal likelihood empirical Bayes method is the best known data-dependent Bayesian procedure. The performance of this method was investigated only in specific models. Our aim is to investigate the performance of this method in a general nonparametric framework. We provide general theorems describing the frequentist behaviour of the empirical Bayes posterior distribution under “standard” assumptions. Then we apply the main theorem to various examples, recovering some of the existing results in the literature, alongside new models. [This is joint work with Judith Rousseau.]
General posterior contraction theorems are not suitable to deal with truly ill-posed inverse problems, as they lead to properties of the posterior for Kf rather than f. In other words, we obtain bounds on contraction rates in some natural metric measuring the distance between Kf and Kf_0, whereas the interest lies in the distance between f and f_0, and these two metrics are not equivalent. In this talk we review (a part of) the existing literature on the Bayesian approach to nonparametric inverse problems, and present a general contraction theorem. Our general result allows us to obtain minimax adaptive concentration rates in several settings, including a fixed-design nonparametric inverse regression example. [Joint work with JB Salomond (CWI Amsterdam)]
The talk surveys some recent work on random probability measure vectors and their role in Bayesian statistics. Indeed, dependent nonparametric priors are useful tools for drawing inferences on data that arise from different studies or experiments and for which the usual exchangeability assumption is not satisfied. The specific proposal that will be displayed gives rise to dependent discrete random probability measures and the talk will focus on their application to the analysis of right-censored survival data and to species sampling problems. The theoretical results to be presented are also relevant for devising Gibbs sampling schemes that will be applied to simulated and real datasets.
We propose a nonparametric Bayesian procedure for estimating a smooth function on an expanding graph. In particular, we investigate how the convergence rates of such procedures depend on the smoothness of the function and the geometry of the graph. Here both notions of ''geometry'' and ''smoothness'' are quantified using the Laplacian of the graph. We prove that using a rescaled Gaussian prior we can obtain an estimator that adapts to the degree of smoothness of the unknown function. Finally, we discuss the families of the graphs that satisfy our condition on the spectrum of the Laplacian.
Suppose we have continuous observations of a one-dimensional diffusion process, where the drift function has some Sobolev smoothness. If the Sobolev smoothness is known, put a Gaussian prior on the drift function. This gives minimax convergence rates and hence improves the results of Pokern, Stuart and Van Zanten (2012). If the Sobolev smoothness is known to be bounded from above, apply random scaling to a sufficiently smooth Gaussian prior. In this case we also obtain minimax convergence rates. If the Sobolev smoothness is unknown, apply random scaling to a Gaussian prior whose smoothness increases with the time T; this gives minimax rates up to a log factor. In this talk you will see how random scaling applied to a Gaussian process can be used to obtain a prior with adaptive optimal convergence rates.
We study nonparametric Bayesian inference with location mixtures of the Laplace density and a Dirichlet process prior on the mixing distribution. We derive a contraction rate of the corresponding posterior distribution, both for the mixing distribution relative to the Wasserstein metric and for the mixed density relative to the Hellinger and Lq metrics.
Bayesian consistency theorems come in (at least) three distinct types, e.g. Doob's prior-almost-sure consistency on Polish spaces (Doob, 1948), Schwartz's Hellinger consistency with KL-priors (Schwartz, 1965) and the `tailfree' weak consistency of Dirichlet posteriors. We ask the question how these notions are related and argue that one characterises them most conveniently using tests. We show that the existence of Bayesian tests is equivalent with Doob-like consistency of the posterior and show that Bayesian tests exist in much greater abundance than uniform tests. As examples we consider hypothesis testing problems like Cover's rational mean problem (Cover, 1973), tests for smoothness in Sobolev classes and tests for connectedness or cyclicality in networks. To achieve frequentist posterior consistency, we combine Bayesian tests with a prior condition that generalises Schwartz's KL-condition and accommodates weak consistency, e.g. involving the `tailfree' property of the Dirichlet distribution and others.
Carvalho, Polson and Scott (2010) introduced the horseshoe prior for the multivariate normal mean model in the situation that the mean vector is sparse in the nearly black sense. The corresponding posterior mean is used as an estimator of the underlying mean vector. We assume the frequentist framework where the data is generated according to a fixed mean vector. I will discuss some results on the $\ell_2$ risk and the rate of contraction of the posterior distribution around the horseshoe estimator. [Joint work with Bas Kleijn and Aad van der Vaart.]
We propose a Bayesian nonparametric approach to test for monotonicity in a regression setting. In that context, the usual Bayes factor approach gives poor results in practice. We thus study an alternative approach that is both efficient and straightforward to implement, which is a great improvement compared to the existing frequentist procedures. Furthermore we study its asymptotic properties and prove that our procedure attains the adaptive minimax separation rate for a wide variety of Hölder smooth alternatives.
Principal component analysis (PCA) is possibly one of the most widely used statistical tools to recover a low rank structure of the data. In high-dimensional settings, the leading eigenvector of the sample covariance can be nearly orthogonal to the true eigenvector. A sparse structure is then commonly assumed along with a low rank structure. Recently, minimax estimation rates of sparse PCA were established under various interesting settings. On the other hand, Bayesian methods are becoming more and more popular in high-dimensional estimation, but there is little work connecting frequentist properties and Bayesian methodologies for high-dimensional data analysis. In this talk, we propose a prior for the sparse PCA problem, and analyze its theoretical properties. The prior adapts to both sparsity and rank. The posterior distribution is shown to contract to the truth at optimal minimax rates. In addition, a computational strategy for the rank-one case is discussed.
I will discuss the following aspects of recent work with Ismael Castillo on Bernstein-von Mises theorems in nonparametric models:
1) Unlike in finite-dimensional models, whether a nonparametric Bayesian credible set is an exact frequentist confidence set depends crucially on the geometry of the set.
2) The geometry, or 'spaces', in which exact posterior asymptotics can be obtained have a natural interpretation in terms of `multi-scale statistics', used commonly in the frequentist literature.
3) Multi-scale results of this kind that we could obtain for i.i.d. sampling models, including applications to Donsker-Kolmogorov-Smirnov theorems and confidence bands for random histograms.
Suppose we observe data that aggregates into a graph, or more generally, a matrix or a higher-order array. As more data becomes available, the size of the graph increases. I will explain how Bayesian models of such data can be derived if the graph is exchangeable, and what exchangeability means for this type of data. I will then discuss why exchangeable models are misspecified for network data, and summarize what we know about the (so far completely open) problem of finding an alternative concept suitable for networks.
Conditions for the rate of contraction of the posterior distribution always require sufficient prior mass in sharpened Kullback-Leibler neighborhoods (Ghosal, Ghosh and Van der Vaart (2000)). In this talk, we try to accommodate a larger class of priors and formulate the part of Theorem 1.2 in Kleijn (2013) that concerns the rate of convergence of the posterior distribution, based on a more relaxed assumption on the prior and some more stringent conditions on the model. We are currently working on an application to support boundary estimation; this work is in progress. [This is joint work with Bas Kleijn.]
We investigate the problem of deriving posterior contraction rates under different loss functions in nonparametric Bayes. In the first part, we derive lower bounds on posterior coverages of shrinking neighbourhoods and discuss implications for proof strategies to derive posterior contraction rates. In the second part, feasible priors are constructed that lead to adaptive rates of contraction under $L_2$ or $L_\infty$ metrics and that moreover achieve our lower bound. As an outlook, we discuss some consequences for the asymptotic behaviour of posterior credible balls.
[This is joint work with Marc Hoffmann and Judith Rousseau.]
In Bayesian statistics species sampling models are random discrete distributions that can serve as priors for the distribution of the data. The Dirichlet prior is the most famous example. A random sample of size n from a discrete distribution will induce a partition of {1,...,n} by the pattern of ties in the sample (the `distinct species'). This links species sampling models to exchangeable partitions of {1,...,n}, of which there is a rich probabilistic theory. For the Dirichlet process this is the famous `Chinese restaurant process'. This talk is a review of some of this theory, the Bayesian applications, and some open problems.
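The link between sampling and partitions can be made concrete with a small simulation. The following is a minimal sketch, not taken from the talk, of the sequential seating rule of the Chinese restaurant process; the function name, the concentration parameter `alpha` and the seed are illustrative assumptions.

```python
import random

def chinese_restaurant_partition(n, alpha, rng):
    """Sample a partition of {1,...,n} from the Chinese restaurant process.

    Each new customer joins an occupied table with probability proportional
    to the number of customers already there, or opens a new table with
    probability proportional to alpha (hypothetical concentration parameter).
    Returns a list of blocks, each a list of customer labels.
    """
    tables = []
    for customer in range(1, n + 1):
        # One weight per occupied table (its size), plus alpha for a new table.
        weights = [len(t) for t in tables] + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(tables):
            tables.append([customer])
        else:
            tables[choice].append(customer)
    return tables

rng = random.Random(0)  # fixed seed only for reproducibility of the sketch
partition = chinese_restaurant_partition(20, alpha=1.0, rng=rng)
block_sizes = sorted((len(block) for block in partition), reverse=True)
print(partition)
print(block_sizes)
```

Each block of the returned partition plays the role of one `distinct species': the labels in a block are the sample indices that would be tied in a sample drawn from a Dirichlet process with total mass `alpha`.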
The past ten years have seen a growing literature on asymptotic properties of Bayesian nonparametric procedures, initiated mostly by the work of Ghosal, Ghosh and van der Vaart (1999) on posterior concentration rates for density estimation. Most of this literature deals with measures of concentration in terms of losses that are "natural", like the $L_2$ loss in Gaussian regression models or the Hellinger or $L_1$ metric in density estimation models. Recently some negative results have been obtained showing that bias might appear when some other types of losses are considered. In this work we first give some general results linking the control on the posterior concentration rate to the considered loss and some "natural loss" in a way similar to Cai and Low (2006). Then we study more precisely the case of the $L_\infty$ loss, for which we exhibit both lower and upper bounds, and we propose an adaptive Bayesian nonparametric prior in the case of the white noise model.
[Joint work with Marc Hoffmann and Johannes Schmidt-Hieber (Paris Dauphine and CREST - ENSAE)]
Bayesian hierarchical clustering priors such as the Dirichlet Diffusion Tree and Kingman's Coalescent are flexible modelling tools in which a datum can belong to a family of nested clusters, rather than a single cluster. This representation has many advantages; most notably a better sharing of statistical information among data. However, inference in these models is often difficult and computationally expensive, largely due to a tying of the tree topology and branch lengths in the prior. Through a connection of the Coalescent to Aldous' Beta-splitting model, we can construct priors where the tree topology and branch lengths factorize. This provides two benefits: there is a more flexible choice in the priors that can be constructed and more efficient Gibbs type inference can be used. We demonstrate this on an example model for density estimation and show the model achieves competitive experimental results.
A usual path for establishing posterior consistency for specific statistical models is through application of one of the general posterior consistency results (Schwartz (1965), Barron et al. (1999), Walker (2004)). For many (simpler) models, however, such an approach appears to be overkill. In this talk, assuming that a sample $X_{i/n},i=0,1,\ldots,n$ from a rescaled Brownian motion $X_t=\int_0^t \sigma(s)dW_s$ is available (here $\sigma$ is a deterministic function that parametrises the model), we will show how to establish posterior consistency for non-parametric Bayesian estimation of $\sigma$ using direct arguments that bypass general posterior consistency theorems.
[The talk is based on joint work with Peter Spreij.]
An unconventional application of the minimax theorem gives rise to a versatile sufficient condition for posterior consistency. We apply this condition to re-derive Schwartz' consistency theorem (Schwartz (1965)), sharpen its assertion somewhat and formulate several other consistency theorems. The main benefit of the proposed approach is enhanced flexibility in the choice of the prior: example consistency theorems formulate priors that charge Kullback-Leibler balls (as in Schwartz' theorem), as well as Hellinger and other metric balls. Marginal consistency in semi-parametric estimation problems falls within the range of application and an example is considered.
We present a family ('model') M of probability distributions, a distribution P outside M and a Bayesian prior distribution on M, such that
- M contains a distribution Q within a small (Hellinger, KL or L2) distance \delta from P. Nevertheless:
- when data are sampled according to P, then, no matter how many data are observed, the Bayesian posterior puts nearly all its mass on distributions that are at a distance from P that is much larger than \delta.
The result is fundamentally different from earlier Bayesian inconsistency results by Diaconis and Freedman, since we can choose M countable and the prior on Q to be > 0; if the model M were well-specified (`true'), then by Doob's theorem this would immediately imply consistency. We also discuss how the results can coexist with the Bayesian misspecification consistency results of Kleijn and Van der Vaart (2006). Partially based on joint work with John Langford (Microsoft Research New England). A preliminary version was presented at ISBA Valencia 2006 and published in the Machine Learning Journal, 2007.
In my talk I will go back to my first Bayes Club Seminar presentation in April 2011. Estimation of the end point of a distribution can be viewed as the shift or the scaling problem (or sometimes both). Assuming the underlying distribution possesses a density function, the behaviour of the density at the end point may simplify the estimation problem. I will show that densities with jumps give rise to a weakly converging expansion of the likelihood called local asymptotic exponentiality (LAE). This type of asymptotic behavior of the likelihood results in a one-sided, exponential posterior limit satisfying the irregular Bernstein-von Mises theorem. This can then be generalized to the semiparametric version of the irregular Bernstein-von Mises theorem. I will present a version of the theorem in a semiparametric LAE model for a shift parameter. Another semiparametric LAE example for a scale parameter will also be presented. In the latter setting one of the conditions of the irregular BvM seems to be much harder to verify than in the former. This might be surprising, since both semiparametric problems studied here are seemingly quite similar. [This is based on joint work with B. Kleijn]
The volume of data is increasing on a similar exponential curve as Moore's law. Bayesian methods are at risk of becoming too computationally expensive for these large scale datasets. Stochastic gradient methods have served frequentists well to deal with these issues. I claim that Bayesian inference methods, and in particular MCMC sampling can also be adapted to reap the benefits of stochastic approximations. I will discuss different versions of "Stochastic Gradient Langevin Dynamics" that start off as stochastic gradient descent and then automatically transition into posterior samplers with the correct equilibrium distribution as the stepsize decreases. An improved version called "Stochastic Gradient Fisher Scoring" uses a preconditioning matrix to sample from the best Gaussian approximation of the posterior for large stepsizes. Experiments show that these samplers are practical and indeed perform well for large datasets.
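As an illustration of the basic update, here is a minimal sketch of Stochastic Gradient Langevin Dynamics on a toy problem. The model (Gaussian observations with unknown mean and unit variance, a N(0, 100) prior), the step-size schedule and all parameter values are assumptions chosen for this example and are not taken from the talk.

```python
import math
import random

def sgld_gaussian_mean(data, n_steps, batch_size, rng,
                       prior_var=100.0, a=0.001, b=10.0, gamma=0.55):
    """Run SGLD for theta in the toy model x_i ~ N(theta, 1), theta ~ N(0, prior_var).

    Each step moves theta by half the step size times a minibatch estimate
    of the gradient of the log posterior, plus Gaussian noise whose variance
    equals the step size. The step size decays polynomially in t.
    """
    n_data = len(data)
    theta = 0.0
    samples = []
    for t in range(n_steps):
        eps = a * (b + t) ** (-gamma)             # decreasing step size
        batch = rng.sample(data, batch_size)      # minibatch without replacement
        grad_prior = -theta / prior_var           # gradient of the log prior
        # Minibatch gradient of the log likelihood, rescaled to the full data set.
        grad_lik = (n_data / batch_size) * sum(x - theta for x in batch)
        noise = rng.gauss(0.0, math.sqrt(eps))    # injected Langevin noise
        theta += 0.5 * eps * (grad_prior + grad_lik) + noise
        samples.append(theta)
    return samples

rng = random.Random(1)  # fixed seed only for reproducibility of the sketch
data = [rng.gauss(2.0, 1.0) for _ in range(1000)]
samples = sgld_gaussian_mean(data, n_steps=2000, batch_size=50, rng=rng)

# Exact posterior mean of the conjugate toy model, for comparison.
post_mean = sum(data) / (len(data) + 1.0 / 100.0)
avg = sum(samples[1000:]) / len(samples[1000:])
print(post_mean, avg)
```

In this conjugate setting the exact posterior is available, so the average of the later iterates can be compared directly with the posterior mean. For large step sizes the minibatch noise dominates and the iterates behave like stochastic gradient descent; as the step size shrinks, the injected noise dominates and the iterates behave like approximate posterior samples.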
Bayesian inference is attractive for its coherence and good frequentist properties. However, it is a common experience that eliciting an honest prior may be difficult and, in practice, people often take an empirical Bayes approach, plugging empirical estimates of the prior hyperparameters into the posterior distribution. Even if not rigorously justified, the underlying idea is that, when the sample size is large, empirical Bayes leads to "similar" inferential answers. Yet, precise mathematical results seem to be missing. In this work, we give a more rigorous justification in terms of merging of Bayes and empirical Bayes posterior distributions. We consider two notions of merging: Bayesian weak merging and frequentist merging in total variation. Since weak merging is related to consistency, we provide sufficient conditions for consistency of empirical Bayes posteriors. Also, we show that, under regularity conditions, the empirical Bayes procedure asymptotically selects the value of the hyperparameter for which the prior mostly favors the "truth". Joint work with Sonia Petrone and Judith Rousseau.
In my presentation I will talk about adaptive Bayesian techniques, such as empirical and hierarchical Bayes methods, and apply them to solve mildly ill-posed inverse problems. I will investigate the behaviour of the adaptive posterior distribution from a frequentist point of view and show that the posterior distribution achieves the minimax rate of contraction up to a slowly varying term. Furthermore, if time allows, I will examine how much confidence we can put in the adaptive credible sets and try to construct the largest set on which we can trust adaptive credible sets as a measure of uncertainty. This is ongoing joint work with Aad, Harry, and Bartek.
We will consider the problem of estimating an unknown function f at a given point in a fixed design setting. A Bayesian approach with Brownian motion prior will be used to derive an estimator, which we will study using frequentist methods. We will investigate how the bias depends on the Hölder smoothness of the function f. This result will be used to study the coverage of Bayesian credible sets for this model.
We consider a general, simple theorem which gives a recipe for constructing exact (i.e., non-asymptotic) confidence sets under certain conditions. Next we discuss some applications.
Recently, high-dimensional problems and sparsity constraints have gained a lot of attention due to their applicability in biology. In this talk, we consider a fully Bayesian approach to high-dimensional regression. For a class of priors, which accounts for sparsity, we provide results for the contraction rate of the posterior. We discuss the assumptions on the design matrix and relate them to existing work. For specific situations, we are able to determine the behavior of credible intervals. This is ongoing joint work with Aad and Ismael.
We consider the problem of (minimax and oracle) estimation of a high- (infinite-) dimensional vector of binomial proportions. Under some conditions we derive the asymptotic behavior of the minimax risk over some nonparametric classes, in particular, a "binomial version" of Pinsker's result. Further, we might touch upon the issue of (empirical) Bayesian adaptation and the problem of optimal allocation of observations.
I will report on a joint project with Andrew Stuart and Yvo Pokern in which we study a Bayesian approach to nonparametric estimation of the periodic drift function of a one-dimensional diffusion from continuous-time data. We specify a centered Gaussian prior on the drift with a precision operator that is of differential form. It is proved that the posterior is Gaussian as well and we give an explicit expression for the posterior precision operator and show that the posterior mean is the solution of a certain differential equation. Moreover, we bound the rate at which the posterior contracts around the true drift function. The results rely on tools from the analysis of differential equations and new functional limit theorems for the local time of diffusions on the circle.
I will give an introduction to online learning, which deals with decision problems that can be formulated as a repeated game between the statistician and an adversary. This is a natural way to model problems like spam detection and optimization of financial portfolios, which have an adversarial component. But there are also applications to data compression, for which particularly strong performance guarantees are possible.
Many standard algorithms in online learning can be interpreted as Bayesian methods, or approximations thereof. I will work out several examples. Time permitting, I will also present recent work with Peter Gruenwald, Wouter Koolen and Steven de Rooij, in which we analyse a new algorithm by viewing it as an approximation to Bayes, and show that fast convergence of the Bayesian posterior implies better decisions.
I will review some methods to deal with MCMC on spaces of varying dimension, in particular the reversible jump algorithm by Green. As an application, I will show how this method can be used to estimate the drift of a discretely observed diffusion. The particular hierarchical prior that we propose requires a slight adaptation of the basic algorithm. This concerns joint work with Moritz and Harry.
One of the most interesting consequences of the (parametric) BvM Theorem is that the Bayesian and frequentist distributions of the estimation error are asymptotically the same, having asymptotic Gaussian shape; in particular, Bayesian credible sets and frequentist confidence regions must coincide asymptotically. This talk investigates the occurrence of this "phenomenon" in the Gaussian white noise model.
I will first give a 15-minute presentation on Bayesian inverse problems, as preparation for the Philips Award session during NMC 2011 in Enschede. Then I will talk about semiparametric posterior limits under the condition of local asymptotic exponentiality. Consider the model indexed by a real-valued parameter \theta and a nuisance parameter \eta. Every element of the model has a density p_{\theta,\eta}(x) given by \eta(x-\theta), where \eta is a density function, supported on positive reals, and \eta(0) is positive and finite. I will show some results on the asymptotic behavior of the marginal posterior for the parameter of interest \theta in this semiparametric model.
My intention is to review some computational methods for posteriors based on Gaussian priors: expectation propagation and Laplace approximation. The first I learned (mostly) from a thesis in Nijmegen; regarding the second, I review the paper on INLA in JRSSb2009(?) by Chopin, Rue et al. They are claimed to be much faster than MCMC and just as good/bad.
No new stuff, and no theory, but if I find the time I'll prepare some pictures, and there may be possibilities of implementing similar algorithms on other models of interest.
For any comments, questions, requests, suggestions etc., please contact Botond Szabo (b.t.szabo[at]vu.nl) and/or Paulo Serra (p.j.de.andradeserra[at]vu.nl).
P. Serra - Fri Oct 2, 2020 (Original website by B. Kleijn)