<aside> 📜
© 2026 Denis Jacob Machado. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
</aside>
Abstract
Bayesian inference has become a widely adopted probabilistic framework for estimating evolutionary trees and model parameters. However, interpreting the results of Bayesian analyses requires a precise understanding of the mathematical guarantees underlying the method — and, critically, of where those guarantees break down. This manuscript clarifies the nature of Bayesian convergence, examining the constraints of the Bernstein–von Mises theorem, the complications introduced by discrete tree topologies, the inevitability of model misspecification in evolutionary biology, the permanent role of prior distributions in finite empirical alignments, and the practical consequences of these limitations for phylogenetic research.
In Bayesian phylogenetics, the objective is to estimate the joint posterior probability of a tree topology τ and substitution model parameters θ, given an alignment of sequence data X. This is calculated via Bayes' theorem:
$P(\tau, \theta \mid X) = \frac{P(X \mid \tau, \theta) \, P(\tau, \theta)}{P(X)}$
where P(X | τ, θ) is the likelihood, P(τ, θ) is the joint prior, and P(X) is the marginal likelihood — the normalizing constant obtained by integrating the numerator over all possible topologies and parameter values. Because this integral is analytically intractable for phylogenetic problems, Bayesian phylogenetics relies on Markov Chain Monte Carlo (MCMC) sampling to approximate the posterior distribution (Huelsenbeck and Ronquist, 2001; Drummond et al., 2006).
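The role of MCMC here can be illustrated with a minimal sketch. The key point is that Metropolis–Hastings acceptance ratios depend only on the *unnormalized* posterior $P(X \mid \tau, \theta)\,P(\tau, \theta)$, so the intractable marginal likelihood $P(X)$ cancels and never has to be computed. The model below is a deliberately simple stand-in (a binomial likelihood with a flat prior on a single continuous parameter, not a phylogenetic likelihood); the function names and tuning values are illustrative assumptions.

```python
import math
import random

def log_unnormalized_posterior(theta, k, n):
    """Binomial log-likelihood (k successes in n trials) plus a flat prior on (0, 1).

    This is a toy stand-in for log P(X | theta) + log P(theta); the
    normalizing constant P(X) is never needed.
    """
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

def metropolis_hastings(k, n, n_steps=20000, step=0.05, seed=1):
    """Random-walk Metropolis sampler targeting the unnormalized posterior."""
    random.seed(seed)
    theta = 0.5
    samples = []
    for _ in range(n_steps):
        proposal = theta + random.uniform(-step, step)
        # Acceptance ratio uses only the unnormalized posterior: P(X) cancels.
        log_ratio = (log_unnormalized_posterior(proposal, k, n)
                     - log_unnormalized_posterior(theta, k, n))
        if log_ratio > 0 or random.random() < math.exp(log_ratio):
            theta = proposal
        samples.append(theta)
    return samples

samples = metropolis_hastings(k=70, n=100)
burn_in = samples[5000:]  # discard early samples before the chain settles
print(sum(burn_in) / len(burn_in))  # posterior mean, close to 70/100
```

Real phylogenetic MCMC (as in MrBayes or BEAST) additionally proposes moves between tree topologies, but the same cancellation of $P(X)$ is what makes the approach feasible.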
A fundamental theoretical reassurance often cited in Bayesian analysis is that, given enough data, the posterior distribution will converge to the "truth." This concept is formalized by the Bernstein–von Mises (BvM) theorem. The BvM theorem states that under specific regularity conditions, as the amount of data (in phylogenetics, the sequence length n) approaches infinity, the posterior distribution converges to a multivariate normal distribution centered on the true parameter values (or the maximum likelihood estimator), with variance shrinking to zero (van der Vaart, 1998).
However, it is critical to recognize that this convergence guarantee is both asymptotic and conditional: it holds only in the limit of infinite sequence length, and only when the theorem's regularity conditions are satisfied.
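The shrinking-variance claim of the BvM theorem can be made concrete with a conjugate toy model (an illustrative assumption, not a phylogenetic model): with a flat Beta(1, 1) prior and $k$ successes in $n$ trials, the posterior is Beta($k+1$, $n-k+1$), and its standard deviation shrinks at the $1/\sqrt{n}$ rate the theorem predicts.

```python
import math

def posterior_sd(k, n):
    """Standard deviation of the Beta(k + 1, n - k + 1) posterior
    obtained from a flat prior and k successes in n trials."""
    a, b = k + 1, n - k + 1
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

true_p = 0.3
for n in (100, 10_000, 1_000_000):
    k = int(true_p * n)  # idealized data hitting the true frequency exactly
    sd = posterior_sd(k, n)
    # sd shrinks like 1/sqrt(n): sd * sqrt(n) stabilizes near
    # sqrt(p * (1 - p)) ~= 0.458 for p = 0.3.
    print(n, sd, sd * math.sqrt(n))
```

The stabilizing value of `sd * sqrt(n)` is the frequentist standard-error constant, which is exactly the BvM picture: asymptotically, the posterior behaves like the sampling distribution of the maximum likelihood estimator. None of this says anything about how fast the limit is approached for a given finite alignment.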
There is an additional subtlety that is often overlooked. The classical BvM theorem applies to continuous parameters in Euclidean space. Tree topologies, however, are discrete combinatorial objects. The phylogenetic parameter space is not a smooth manifold. It is, instead, a collection of orthants (one per topology) glued together along their boundaries, forming the Billera–Holmes–Vogtmann (BHV) tree space (Billera et al., 2001). The BvM theorem can be applied to the continuous parameters (branch lengths, substitution model parameters) conditional on a fixed topology, but the convergence of posterior probability across topologies is a separate and harder problem.