<aside> 🗓️

This project was created by Denis Jacob Machado and is available under a CC-BY-NC v4.0 license. This project was updated on Feb. 23, 2026.

</aside>

Theoretical-Philosophical Foundations of Character, Homology, and Homoplasy, and Their Implementation in Contemporary Phylogenetic Reconstruction Strategies Based on High-Throughput Sequencing

Abstract

Phylogenetic systematics rests on three interrelated conceptual pillars: the character, homology, and homoplasy. This manuscript examines the theoretical and philosophical foundations of these concepts and reviews how they are operationalized in contemporary phylogenetic reconstruction workflows, with emphasis on high-throughput sequencing (HTS) data. Characters are historical individuals (transformation series in the sense of Hennig 1966), and phylogenetic systematics is therefore an ideographic, not nomothetic, science (Grant & Kluge 2004). Homology refers to a historical identity relation and is not synonymous with synapomorphy; plesiomorphy is also a form of homology, and the equation of homology with synapomorphy introduced by Patterson (1982) is historically and conceptually erroneous (Nixon & Carpenter 2011). Homology is inherently tree-dependent: it cannot be assessed in isolation from a cladogram. This apparent circularity is not a weakness but a feature of a coherent epistemological framework in which hypotheses of homology and hypotheses of relationship are tested simultaneously. Outgroup comparison, character polarity, and rooting reduce to the same problem and are best addressed through unconstrained simultaneous analysis of all terminals; ingroup monophyly is a hypothesis to be tested, not an assumption to be enforced (Nixon & Carpenter 1993). Dynamic homology, as implemented in direct optimization methods, extends this framework to molecular sequence data without requiring a pre-analysis multiple sequence alignment. Insertions and deletions (InDels) constitute the second most important source of genomic variation and are poorly served by static alignment approaches; coding gaps as missing data in maximum likelihood analysis produces unpredictable and significant effects on topology and model scores. Dynamic homology provides a principled solution by treating InDel cost as an integral component of the phylogenetic optimality criterion.
Character weighting strategies, sensitivity analysis, and parallel computing are indispensable tools for navigating the computational and analytical challenges posed by genome-scale data sets. Recent software developments, notably PhylogeneticGraph (PhyG; Wheeler et al. 2024), extend the phylogenetic search space beyond trees to softwired and hardwired networks, enabling analysis of horizontal transfer and hybridization scenarios. The phylogenetic minimum description length (PMDL; Wheeler & Varón 2025) offers a theoretically unified optimality criterion based on algorithmic complexity that subsumes parsimony, likelihood, and Bayesian inference under a common framework.

1. Introduction

Inferring the evolutionary history of organisms requires organized, heritable observations that can discriminate among competing phylogenetic hypotheses. The transformation of raw biological observations into such evidence is not a trivial operation. It demands a principled understanding of what constitutes a character, how characters bear on the hypothesis of common descent, and how apparent conflicts among characters are interpreted and resolved.

These questions have a long and contentious history. Lankester (1870) introduced the term homoplasy to distinguish features that superficially resemble one another but do not share a common origin. Hennig (1966) formalized the distinction between shared ancestral and shared derived character states, providing the logical foundation for cladistic methodology. Patterson (1982) and de Pinna (1991) subsequently addressed the epistemological status of homology as a hypothesis. Wheeler et al. (2006) and Wheeler (2012) synthesized these threads into an integrated computational framework capable of handling the scale and complexity of modern genomic data.

2. The Concept of Character

A character, in the phylogenetic sense, is "a historically independent transformation series" (Wheeler et al. 2006, p. 333). This definition has important epistemological consequences. A character is not a simple observation but a theoretically organized statement about heritable variation across taxa. Characters convey notions of relevance, comparability, and correspondence (Wheeler 2012). They are, in the language of Popper (1934, 1959), theory-laden objects: the investigator's prior understanding of organismal biology informs the decision to recognize a feature as a character at all.

Grant & Kluge (2004) explicated the deeper implications of Hennig's transformation series concept and argued that it is the only character concept consistent with the epistemological requirements of phylogenetic systematics. As transformation series, characters are historical individuals, analogous to species and clades in their ontological status. Phylogenetic systematics is, therefore, an explicitly ideographic science: it seeks to discover the series of necessarily unique historical transformation events that explain the heritable variation among lineages, and it does not conjecture universal laws. This has direct consequences for how characters are individuated in practice. The character independence that matters for phylogenetic inference is historical or transformational independence; functional and developmental correlations are irrelevant to the inference of unique historical events (Grant & Kluge 2004). A further consequence of this framework is that homology refers to a historical identity relation, namely the identity of being derived from the same unique transformation event, and is not synonymous with synapomorphy (Grant & Kluge 2004). This clarification is important because it separates the concept of homology from the particular test (character congruence) used to corroborate it.

Characters are composed of character states, which are the alternative forms that a transformation series can take. For binary characters, states typically designate the presence or absence of a structure. For multistate characters, states represent discrete forms that may be ordered or unordered. For sequence data, character states are the nucleotide or amino acid residues at a given site.

Wheeler et al. (2006) recognize five fundamental character types relevant to modern phylogenetics. Additive characters (Farris 1970) assume a linear ordering of states, so that the cost of transformation is the additive distance between states. Nonadditive characters (Fitch 1971) assign equal costs to all transformations, with no nested homology among states. Matrix or Sankoff characters (Sankoff & Rousseau 1975) allow arbitrary transformation costs to be specified. Sequence characters treat contiguous strings of nucleotides or amino acids as the unit of comparison, with transformation involving substitution, insertion, or deletion of component units. Finally, chromosomal characters simultaneously optimize nucleotide-level and locus-level variation, making them applicable to complete mitochondrial, bacterial, and viral genomes (Wheeler et al. 2006). PhyG (Wheeler et al. 2024) further extends character state representation to effectively unlimited alphabets and supports multicharacter state labels, allowing analysis of linguistic, developmental, and other non-standard comparative data types.
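The difference among the first three character types can be made concrete as a difference in transformation cost functions. The following is a minimal Python sketch, not the implementation used in POY or PhyG: it scores a single character on a small rooted binary tree with a Sankoff-style dynamic program, and the function names, the example tree, and the example cost matrix are all hypothetical choices made for illustration.

```python
INF = float("inf")

def additive_cost(a, b):
    """Additive (ordered) character: cost is the distance between states."""
    return abs(a - b)

def nonadditive_cost(a, b):
    """Nonadditive (unordered) character: every change costs one step."""
    return 0 if a == b else 1

def matrix_cost(matrix):
    """Matrix (Sankoff) character: costs are read from a user-supplied table."""
    return lambda a, b: matrix[a][b]

def state_costs(tree, n_states, cost):
    """Minimum subtree cost for each possible state at the root of `tree`.

    `tree` is either an int (the observed state of a terminal) or a
    2-tuple of subtrees. This is an illustrative encoding only.
    """
    if isinstance(tree, int):
        return [0 if s == tree else INF for s in range(n_states)]
    left = state_costs(tree[0], n_states, cost)
    right = state_costs(tree[1], n_states, cost)
    return [
        min(cost(s, t) + left[t] for t in range(n_states))
        + min(cost(s, t) + right[t] for t in range(n_states))
        for s in range(n_states)
    ]

def tree_length(tree, n_states, cost):
    """Minimum total transformation cost of one character on the tree."""
    return min(state_costs(tree, n_states, cost))

# Hypothetical example: four terminals with observed states 0, 2, 0, 1.
tree = ((0, 2), (0, 1))
# Hypothetical cost table in which 0 <-> 2 is cheap and changes involving 1 are expensive.
example_matrix = [[0, 3, 1],
                  [3, 0, 3],
                  [1, 3, 0]]

print("additive:   ", tree_length(tree, 3, additive_cost))
print("nonadditive:", tree_length(tree, 3, nonadditive_cost))
print("matrix:     ", tree_length(tree, 3, matrix_cost(example_matrix)))
```

The same observations yield different tree lengths under the three cost regimes, which is why the choice of character type is itself part of the hypothesis being tested. Sequence and chromosomal characters require additional machinery (insertion, deletion, and rearrangement events) that this sketch does not attempt to cover.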

Phenotypic characters are structural or functional characteristics that express the genotype in conjunction with environmental influences and include morphological, behavioral, and amino acid sequence data. Genotypic characters, by contrast, are derived directly from DNA or RNA sequences and represent the genetic signal transmitted from parent to offspring. The principal advantage of genotypic data is that concerns about heritability are eliminated. Their principal challenge is that nucleotide homology cannot be tested in isolation; it requires a congruence test against other characters (Wheeler et al. 2006). The principle of total evidence (Kluge 1989) holds that all evidence, phenotypic and genotypic alike, possesses equal initial value for discriminating among phylogenetic hypotheses and should therefore be analyzed simultaneously.
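One way to read the total evidence principle operationally is that a candidate tree is scored by summing the cost of every character, phenotypic and genotypic, with equal initial weight. The sketch below assumes unordered (Fitch) optimization on a prealigned toy data set; the taxon names, character codings, and function names are hypothetical and serve only to illustrate simultaneous analysis.

```python
def fitch_length(tree, states):
    """Minimum number of steps for one unordered character.

    `tree` is a nested 2-tuple of taxon names; `states` maps each
    taxon name to its observed state. Illustrative encoding only.
    """
    def post_order(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_steps = post_order(node[0])
        right_set, right_steps = post_order(node[1])
        shared = left_set & right_set
        if shared:
            return shared, left_steps + right_steps
        return left_set | right_set, left_steps + right_steps + 1
    return post_order(tree)[1]

def total_length(tree, matrix):
    """Sum of steps over all characters, each with equal weight."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_length(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )

# Hypothetical toy data: two morphological characters and three aligned sites.
morphology = {"A": ["0", "0"], "B": ["0", "1"], "C": ["1", "0"], "D": ["1", "1"]}
dna = {"A": "AAC", "B": "AAT", "C": "GGC", "D": "GGT"}

# Simultaneous analysis: all characters enter a single matrix.
combined = {taxon: morphology[taxon] + list(dna[taxon]) for taxon in morphology}

tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

for name, tree in (("tree_1", tree_1), ("tree_2", tree_2)):
    print(name, total_length(tree, combined))
```

In this toy case the morphological characters alone do not discriminate between the two trees, whereas the combined matrix does; the preferred tree is the one that minimizes the summed cost across all evidence.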

3. Homology: Definition, Epistemology, and Debate

3.1 The Operational Definition