<aside> 📜

© 2026 Denis Jacob Machado. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

</aside>

<aside> 🗓️

This document was written by Denis Jacob Machado and was last updated on February 22, 2026.

</aside>


Contemporary Debates on Artificial Intelligence in Biomedical Science

Abstract

Artificial intelligence (AI) has rapidly transformed biomedical research, enabling substantive advances in gene therapy, cell therapy, vaccine design, bioinformatics, and synthetic biology. However, its growing influence has also generated substantial debate concerning interpretability, reliability, validation, dataset bias, and regulatory oversight. This article synthesizes contemporary discussions surrounding supervised and unsupervised machine learning, deep learning, generative models, Bayesian networks, AI-guided probabilistic inference, large language models (LLMs), and hybrid and AutoML systems. Drawing primarily from recent peer-reviewed literature in computational biology, gene therapy, cell therapy, vaccine design, and synthetic data generation, this work critically examines both demonstrated scientific progress and unresolved methodological challenges. The analysis emphasizes interpretability, reproducibility, and responsible deployment as central concerns for the future integration of AI into biomedical practice. We argue that the contemporary debate is not a binary question of adoption or rejection, but rather one of principled, epistemically accountable integration of AI methodology into the scientific enterprise.


1. Introduction

Artificial intelligence has transitioned from a specialized computational toolset to a central methodological framework within modern biomedical science. Over the past decade, AI applications in biomedicine have expanded to encompass stem cell differentiation modeling, gene therapy target identification, microbial image synthesis, vaccine epitope design, protein structure prediction, and clinical decision support systems, among many others (Topol, 2019; Jumper et al., 2021). This expansion has been driven by three converging developments: the exponential growth of multimodal biomedical datasets, the maturation of deep learning architectures, and the widespread availability of high-performance computing resources.

Recent work in cell therapy demonstrates how AI can model dynamic gene regulatory networks to guide stem cell differentiation, integrating synthetic biology with predictive modeling (Choudhury et al., 2025). Similarly, computational biology and machine learning approaches have enhanced gene therapy workflows, including target identification, vector design, and outcome prediction (Danaeifar & Najafi, 2025). In structural biology, the development of AlphaFold2 (Jumper et al., 2021) marked a watershed moment, demonstrating that deep learning could solve one of the most longstanding problems in the field—the protein folding problem—with near-experimental accuracy. More recently, generative approaches have been applied to design novel proteins and nucleic acid constructs with defined functional properties (Ferruz et al., 2022).

Despite these advances, the rapid expansion of AI methodologies has raised important theoretical and practical questions that the scientific community has not yet resolved. Central among these are the following: Are current AI systems genuinely explanatory, or are they sophisticated pattern matchers without mechanistic insight? Can deep learning models be trusted in high-stakes clinical contexts where errors carry life-altering consequences? How should uncertainty be quantified, propagated, and communicated to end users? And to what extent do the biases encoded in training data compromise the generalizability and fairness of AI-driven recommendations?

These questions collectively define the contemporary methodological debate in biomedical AI. The present article surveys this debate across the major paradigms—supervised and unsupervised learning, deep learning and representation learning, generative models and synthetic data, Bayesian and probabilistic inference, large language models, and hybrid and AutoML systems—with particular attention to both demonstrated scientific potential and unresolved challenges. Our treatment emphasizes epistemic accountability: the responsibility of researchers to understand not only what their models predict, but why, under what conditions, and with what degree of confidence.


2. Supervised and Unsupervised Machine Learning in Biomedicine

2.1 Supervised Learning: Applications and Scope

Supervised learning methods—including support vector machines (SVMs), random forests, gradient-boosted trees, and feedforward neural networks—have become central to genomics, transcriptomics, proteomics, and clinical informatics analyses (Libbrecht & Noble, 2015). In genomics, supervised classifiers trained on labeled sequence data have been applied to the identification of pathogenic single-nucleotide polymorphisms (SNPs), prediction of splice sites, and classification of regulatory elements. In transcriptomics, machine learning models trained on RNA-seq data enable differential expression analysis, cell-type deconvolution, and transcriptome-level phenotype prediction.
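To make the sequence-classification setting above concrete, the following is a minimal, self-contained sketch of supervised classification of labeled DNA sequences. It uses a nearest-centroid classifier on one-hot-encoded sequences; the sequences, labels, and the "splice site" framing are purely illustrative toy data, not drawn from any real dataset, and production pipelines would instead use the SVM, random forest, or neural network implementations in established libraries.

```python
# Toy sketch: supervised classification of DNA sequences (illustrative only).

def one_hot(seq):
    """One-hot encode a DNA string into a flat numeric feature vector."""
    table = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0),
             "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}
    vec = []
    for base in seq:
        vec.extend(table[base])
    return vec

def nearest_centroid_fit(X, y):
    """Compute one mean feature vector (centroid) per class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], x))

# Hypothetical "splice site" vs. "background" training sequences.
train_seqs = ["GTAAGT", "GTGAGT", "GTAAGA", "TTTTTT", "ACACAC", "CCCCCC"]
train_labels = ["site", "site", "site", "bg", "bg", "bg"]
X = [one_hot(s) for s in train_seqs]
model = nearest_centroid_fit(X, train_labels)
print(predict(model, one_hot("GTAAGG")))  # prints "site"
```

The same encode-train-predict structure underlies the genomic classifiers discussed above; what changes in practice is the scale of the labeled data and the capacity of the model family.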

In gene therapy research, supervised learning has been applied to all phases of the therapeutic pipeline. At the target identification stage, models trained on curated disease-gene association databases—such as DisGeNET or the OMIM catalog—predict novel therapeutic targets from multi-omics profiles (Danaeifar & Najafi, 2025). At the vector design stage, classifiers predict the efficiency and specificity of adeno-associated virus (AAV) serotypes for particular tissue tropisms based on capsid sequence features. At the outcome prediction stage, models trained on patient clinical and genomic data estimate therapeutic response, adverse event risk, and long-term durability of gene correction.
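The target identification stage described above can be caricatured as an evidence-aggregation problem: each candidate gene receives scores from several data sources, and a model combines them into a ranking. The sketch below uses a simple weighted sum; the gene names, feature names, and weights are all hypothetical placeholders, whereas real systems learn such weightings from curated resources like DisGeNET rather than fixing them by hand.

```python
# Hedged sketch: ranking hypothetical gene therapy targets by aggregating
# multi-omics evidence scores (all names and weights are illustrative).

def rank_targets(evidence, weights):
    """Score each gene as a weighted sum of its normalized evidence values,
    then return gene names sorted from highest to lowest score."""
    scores = {}
    for gene, feats in evidence.items():
        scores[gene] = sum(weights[k] * feats.get(k, 0.0) for k in weights)
    return sorted(scores, key=scores.get, reverse=True)

evidence = {
    "GENE_A": {"expression": 0.9, "association": 0.8, "druggability": 0.4},
    "GENE_B": {"expression": 0.2, "association": 0.9, "druggability": 0.9},
    "GENE_C": {"expression": 0.5, "association": 0.3, "druggability": 0.2},
}
weights = {"expression": 0.3, "association": 0.5, "druggability": 0.2}
print(rank_targets(evidence, weights))  # prints ['GENE_A', 'GENE_B', 'GENE_C']
```

A trained classifier replaces the fixed weight vector with parameters fitted to known disease-gene associations, but the input-score-rank pipeline is the same.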

Proteomics has similarly benefited from supervised learning approaches. Mass spectrometry data processed through machine learning pipelines now allow high-throughput protein identification, quantification, and post-translational modification mapping, enabling large-scale profiling of disease-associated proteomes (Orre et al., 2019). In clinical imaging, supervised convolutional neural networks (CNNs) have achieved radiologist-level performance in tasks such as tumor detection, grading, and segmentation across multiple cancer types (Litjens et al., 2017).
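The core operation behind the CNNs mentioned above is the discrete 2D convolution: a small learned kernel is slid over an image, producing a feature map that responds to local patterns such as edges. The sketch below implements that single operation in plain Python on a toy 4x4 intensity map with a hand-written vertical-edge kernel; real segmentation networks stack many such learned kernels with nonlinearities and pooling, typically via a deep learning framework.

```python
# Illustrative sketch of the building block of a CNN: a valid (no-padding)
# 2D convolution of a toy "image" with a hand-chosen edge-detection kernel.

def conv2d(image, kernel):
    """Slide `kernel` over `image` and return the resulting feature map."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

# A 4x4 intensity map whose right half is bright, and a vertical-edge kernel.
image = [[0, 0, 1, 1] for _ in range(4)]
kernel = [[-1, 1], [-1, 1]]
print(conv2d(image, kernel))  # prints [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The strong response in the middle column marks the vertical boundary between dark and bright regions; a trained CNN discovers kernels like this one automatically from labeled images.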

2.2 Unsupervised Learning and Latent Structure Discovery