<aside> 📜

© 2026 Denis Jacob Machado. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

</aside>

This page compiles key concepts and questions on the contemporary debate around AI and machine learning applied to biological and biomedical research. Questions are organized from broader foundational concepts to more specific applications.



Part 1: Foundational Concepts

What is the difference between artificial intelligence, machine learning, and deep learning?

These are nested concepts. Artificial intelligence (AI) is the broadest category — it refers to any system designed to perform tasks that typically require human intelligence, such as reasoning or decision-making. Machine learning (ML) is a subset of AI in which systems learn from data rather than being explicitly programmed with rules. Deep learning (DL) is a subset of machine learning that uses artificial neural networks with multiple layers to process information — this is where the term "deep" comes from. Deep learning typically handles complex, unstructured data such as images and text, whereas traditional machine learning often works well with structured, tabular data.

What is an artificial neuron?

An artificial neuron is the basic computational unit of a neural network, loosely inspired by biological neurons. It receives multiple inputs, multiplies each by a numerical weight, sums the weighted values together with a bias term (a dot product plus an offset), and passes the result through an activation function that acts as a threshold to determine whether the neuron "fires" or activates. This output can then serve as input to neurons in subsequent layers.
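The computation above can be written in a few lines. This is a minimal sketch, not a library implementation; the function name `neuron` and the specific input numbers are made up for illustration, and a sigmoid is used as the activation function.

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of inputs plus a bias term (a dot product plus an offset)
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid activation squashes the result into (0, 1),
    # acting as a soft threshold for whether the neuron "fires"
    return 1.0 / (1.0 + math.exp(-z))

output = neuron([0.5, -1.0, 2.0], [0.8, 0.2, 0.1], bias=0.1)
# z = 0.4 - 0.2 + 0.2 + 0.1 = 0.5, so output = sigmoid(0.5) ≈ 0.62
```

In a real network this output would be fed, alongside the outputs of many sibling neurons, into the neurons of the next layer.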

What are generative models?

Generative models are typically deep learning models that learn the underlying statistical distribution of training data and then produce new outputs that resemble that data. Rather than classifying or predicting existing data points, they generate entirely new examples — such as protein 3D structures, DNA sequences, synthetic medical images, or novel text. Examples include GANs (Generative Adversarial Networks) and models like ProtGPT2.
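Deep generative models are far more elaborate, but the core idea of "learn the distribution of the training data, then sample new examples from it" can be sketched with the simplest possible generative model: fitting a Gaussian to some data and drawing new points. The data values here are invented for illustration.

```python
import random
import statistics

# "Training data": learn its underlying distribution (here, just mean and spread)
data = [1.1, 0.9, 1.0, 1.2, 0.8, 1.05]
mu = statistics.mean(data)
sigma = statistics.stdev(data)

# "Generation": draw entirely new examples that resemble the training data
random.seed(0)  # fixed seed so the sketch is reproducible
samples = [random.gauss(mu, sigma) for _ in range(5)]
```

GANs, diffusion models, and protein language models replace the Gaussian with a learned neural parameterization of a vastly more complex distribution, but the train-then-sample pattern is the same.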

What are Bayesian networks?

A Bayesian network is a probabilistic graphical model represented as a directed acyclic graph (DAG). Nodes represent variables, directed edges encode probabilistic dependencies between them, and each node carries a conditional probability distribution given its parent nodes. The network uses Bayes' theorem to update beliefs about variables given new evidence, enabling principled inference under uncertainty. Bayesian networks are not deep learning models — they belong to probabilistic machine learning and can be used in both supervised and unsupervised modes depending on the task.
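Inference in the smallest possible Bayesian network, Disease → TestResult, is just Bayes' theorem. The probabilities below are invented for illustration; real networks have many nodes and use specialized inference algorithms rather than this hand-written calculation.

```python
# Conditional probability tables for a two-node network: Disease -> TestResult
p_disease = 0.01                 # prior: P(disease)
p_pos_given_disease = 0.95       # P(positive test | disease)
p_pos_given_no_disease = 0.05    # P(positive test | no disease)

# Marginal probability of the evidence (a positive test)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_no_disease * (1 - p_disease))

# Bayes' theorem: update belief in the disease given the positive test
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
# ≈ 0.161 — far below 0.95, because the disease is rare
```

This also illustrates why such networks matter in biomedicine: even a sensitive test yields a modest posterior when the condition's prior probability is low.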

What is AI-guided probabilistic inference?

AI-guided probabilistic inference refers to the combination of AI or machine learning techniques with probabilistic reasoning frameworks. Rather than running standard Bayesian inference alone, AI methods — such as neural networks — guide or accelerate the exploration of the probability space to improve inference. This approach offers more predictive power than Bayesian networks alone, while retaining the ability to quantify uncertainty. It is more computationally demanding and generally less interpretable than pure Bayesian networks.

What are LSTMs and how do they differ from large language models?

LSTMs (Long Short-Term Memory networks) are a type of recurrent neural network designed to capture long-range dependencies in sequential data by processing sequences step by step, carrying a memory state forward from one step to the next. They are useful for genomic sequence modeling, chromatin state prediction, and temporal clinical data analysis. Large language models (LLMs), by contrast, use transformer architectures that process entire sequences simultaneously via attention mechanisms, which makes them better at capturing long-range context and far easier to parallelize during training. LLMs use tokens (discrete text chunks) as their basic units of input and output, whereas LSTMs consume their input one sequence element at a time, updating their internal state as they go.
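The step-by-step nature of an LSTM can be sketched with a single scalar-valued cell. The weights here are arbitrary illustrative numbers (a real cell uses learned weight matrices over vectors), but the gate structure — input, forget, and output gates plus a candidate memory — matches the standard LSTM equations.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, w):
    # Each gate mixes the current input x with the previous hidden state h.
    # w maps gate name -> (input weight, recurrent weight, bias).
    i = sigmoid(w["i"][0] * x + w["i"][1] * h + w["i"][2])   # input gate
    f = sigmoid(w["f"][0] * x + w["f"][1] * h + w["f"][2])   # forget gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h + w["o"][2])   # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h + w["g"][2]) # candidate memory
    c = f * c + i * g            # keep some old memory, add some new
    h = o * math.tanh(c)         # expose a gated view of the memory
    return h, c

# Arbitrary fixed weights for illustration (normally these are learned)
w = {gate: (0.5, 0.5, 0.0) for gate in "ifog"}
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:      # the sequence is consumed one step at a time
    h, c = lstm_step(x, h, c, w)
```

Contrast this loop with a transformer, which sees all three inputs at once and relates them through attention rather than a carried-forward state.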

What are tokens and how do transformers use them?

A token is a chunk of text — a word, subword, or character — that a model processes as a discrete unit. Transformers break input text into tokens and then process all tokens simultaneously using attention mechanisms, allowing each token to "attend" to every other token in the sequence. This enables the model to capture context and long-range relationships efficiently. Token limits define the maximum amount of text a model can process at once (the context window).
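The "every token attends to every other token" step is scaled dot-product attention, which can be shown in a few lines of numpy. The three token embeddings below are made-up 4-dimensional vectors; real models use learned embeddings with hundreds or thousands of dimensions, plus separate learned projections for queries, keys, and values.

```python
import numpy as np

def attention(Q, K, V):
    # Each query is scored against every key (scaled by sqrt of the key dimension)
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax turns scores into attention weights that sum to 1 per token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each token's output is a weighted mix of all tokens' values
    return weights @ V, weights

# Three "tokens", each embedded as a 4-dimensional vector (illustrative numbers)
X = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0]])

# Self-attention: queries, keys, and values all come from the same tokens
out, w = attention(X, X, X)
```

Each row of `w` shows how much one token attends to each of the others; it is this all-pairs mixing, computed for the whole context window at once, that lets transformers capture long-range relationships efficiently.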