<aside> 📜
© 2026 Denis Jacob Machado. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
</aside>
Abstract
Bayesian inference has become a widely adopted probabilistic framework for estimating evolutionary trees and model parameters. However, interpreting the results of Bayesian analyses requires a precise understanding of the mathematical guarantees underlying the method — and, critically, of where those guarantees break down. This manuscript clarifies the nature of Bayesian convergence, examining the constraints of the Bernstein–von Mises theorem, the complications introduced by discrete tree topologies, the inevitability of model misspecification in evolutionary biology, the permanent role of prior distributions in finite empirical alignments, and the practical consequences of these limitations for phylogenetic research.
In Bayesian phylogenetics, the objective is to estimate the joint posterior probability of a tree topology τ and substitution model parameters θ, given an alignment of sequence data X. This is calculated via Bayes' theorem:
$P(\tau, \theta \mid X) = \frac{P(X \mid \tau, \theta) \, P(\tau, \theta)}{P(X)}$
where P(X | τ, θ) is the likelihood, P(τ, θ) is the joint prior, and P(X) is the marginal likelihood — the normalizing constant obtained by integrating the numerator over all possible topologies and parameter values. Because this integral is analytically intractable for phylogenetic problems, Bayesian phylogenetics relies on Markov Chain Monte Carlo (MCMC) sampling to approximate the posterior distribution (Huelsenbeck and Ronquist, 2001; Drummond et al., 2006).
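The role of MCMC here can be illustrated with a minimal sketch. The key point is that Metropolis–Hastings acceptance ratios depend only on the *unnormalized* posterior $P(X \mid \tau, \theta)\,P(\tau, \theta)$, so the intractable marginal likelihood $P(X)$ cancels and never has to be computed. The model below is a deliberately simple stand-in (a binomial likelihood with a flat prior on a single continuous parameter, not a phylogenetic likelihood); the function names and tuning values are illustrative assumptions.

```python
import math
import random

def log_unnormalized_posterior(theta, k, n):
    """Binomial log-likelihood (k successes in n trials) plus a flat prior on (0, 1).

    This is a toy stand-in for log P(X | theta) + log P(theta); the
    normalizing constant P(X) is never needed.
    """
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

def metropolis_hastings(k, n, n_steps=20000, step=0.05, seed=1):
    """Random-walk Metropolis sampler targeting the unnormalized posterior."""
    random.seed(seed)
    theta = 0.5
    samples = []
    for _ in range(n_steps):
        proposal = theta + random.uniform(-step, step)
        # Acceptance ratio uses only the unnormalized posterior: P(X) cancels.
        log_ratio = (log_unnormalized_posterior(proposal, k, n)
                     - log_unnormalized_posterior(theta, k, n))
        if log_ratio > 0 or random.random() < math.exp(log_ratio):
            theta = proposal
        samples.append(theta)
    return samples

samples = metropolis_hastings(k=70, n=100)
burn_in = samples[5000:]  # discard early samples before the chain settles
print(sum(burn_in) / len(burn_in))  # posterior mean, close to 70/100
```

Real phylogenetic MCMC (as in MrBayes or BEAST) additionally proposes moves between tree topologies, but the same cancellation of $P(X)$ is what makes the approach feasible.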
A fundamental theoretical reassurance often cited in Bayesian analysis is that, given enough data, the posterior distribution will converge to the "truth." This concept is formalized by the Bernstein–von Mises (BvM) theorem. The BvM theorem states that under specific regularity conditions, as the amount of data (in phylogenetics, the sequence length n) approaches infinity, the posterior distribution converges to a multivariate normal distribution centered on the true parameter values (or the maximum likelihood estimator), with variance shrinking to zero (van der Vaart, 1998).
However, it is critical to recognize that this convergence guarantee is both asymptotic and conditional: it holds only in the limit of infinite sequence length, and only when the theorem's regularity conditions are satisfied.
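The shrinking-variance claim of the BvM theorem can be made concrete with a conjugate toy model (an illustrative assumption, not a phylogenetic model): with a flat Beta(1, 1) prior and $k$ successes in $n$ trials, the posterior is Beta($k+1$, $n-k+1$), and its standard deviation shrinks at the $1/\sqrt{n}$ rate the theorem predicts.

```python
import math

def posterior_sd(k, n):
    """Standard deviation of the Beta(k + 1, n - k + 1) posterior
    obtained from a flat prior and k successes in n trials."""
    a, b = k + 1, n - k + 1
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

true_p = 0.3
for n in (100, 10_000, 1_000_000):
    k = int(true_p * n)  # idealized data hitting the true frequency exactly
    sd = posterior_sd(k, n)
    # sd shrinks like 1/sqrt(n): sd * sqrt(n) stabilizes near
    # sqrt(p * (1 - p)) ~= 0.458 for p = 0.3.
    print(n, sd, sd * math.sqrt(n))
```

The stabilizing value of `sd * sqrt(n)` is the frequentist standard-error constant, which is exactly the BvM picture: asymptotically, the posterior behaves like the sampling distribution of the maximum likelihood estimator. None of this says anything about how fast the limit is approached for a given finite alignment.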
There is an additional subtlety that is often overlooked. The classical BvM theorem applies to continuous parameters in Euclidean space. Tree topologies, however, are discrete combinatorial objects. The phylogenetic parameter space is not a smooth manifold. It is, instead, a collection of orthants (one per topology) glued together along their boundaries, forming the Billera–Holmes–Vogtmann (BHV) tree space (Billera et al., 2001). The BvM theorem can be applied to the continuous parameters (branch lengths, substitution model parameters) conditional on a fixed topology, but the convergence of posterior probability across topologies is a separate and harder problem.