Food allergies affect 8-10% of children and 4% of adults globally, with rising incidence in urbanized populations. Traditional genetic and exposure models cannot explain this rapid increase, prompting investigation of environmental factors, particularly early-life gut microbiota.
Accumulating evidence demonstrates that gut microbiota composition during critical developmental windows plays a central role in shaping immune tolerance to dietary antigens. Allergic children consistently exhibit reduced microbial diversity, depletion of protective taxa (Bifidobacterium, Faecalibacterium, butyrate-producing Clostridia), and enrichment of pro-inflammatory bacteria (Escherichia-Shigella, Ruminococcus gnavus).
These shifts correlate with decreased immunoregulatory short-chain fatty acids, impaired epithelial barrier integrity, and type 2 inflammatory skewing with reduced regulatory T cells. Critically, these perturbations can be detected months before clinical manifestation, suggesting predictive potential.
Studies by De Filippis et al. and Zhang et al. show that specific gut microbiome signatures are strongly associated with peanut allergies in children, while Özçam et al. discuss the relationship between gut microbiome composition and the potential failure of oral immunotherapy. Together, these findings indicate that microbiome information is essential for understanding food allergies. This knowledge can be applied in two ways: predictively, to detect developing food allergies early in life, and therapeutically, to identify treatments that restore food tolerance by improving gut microbiome health.
We're building a classifier that could transform how we predict and prevent food allergies by reading the hidden signals in our gut microbiome. Our initial model tackles the fundamental question: Can we identify who's at risk of developing food allergies by looking at microbiome data?
The core implementation will be a binary classifier, where label 0 indicates a "healthy subject" and label 1 indicates an "allergic subject." In future iterations, we may extend the model to predict which specific food allergies a patient is at risk of developing.
The model follows a straightforward two-stage architecture: a microbiome embedding model paired with a classifier head (e.g., logistic regression). Our approach leverages a recently developed foundation model for microbiome data, created by a member of our community (detailed in this blogpost). This foundation model will serve as our backbone, enabling us to extract rich, meaningful representations from raw microbiome data that can then be fed into the classifier.
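To make the two-stage design concrete, here is a minimal sketch in Python using NumPy and scikit-learn. It is an illustration under stated assumptions, not the project's actual implementation: `embed_samples` is a hypothetical stand-in for the foundation model (a fixed random projection of log-transformed abundances), `EMBED_DIM` is an arbitrary choice, and the data are synthetic counts with a toy signal injected for the allergic group. Only the classifier head (standardization followed by logistic regression) reflects the kind of component described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

EMBED_DIM = 16  # hypothetical embedding size, chosen only for this sketch


def embed_samples(abundances: np.ndarray, seed: int = 0) -> np.ndarray:
    """Hypothetical stand-in for the microbiome foundation model.

    Applies a fixed random projection to log-transformed abundances.
    A real pipeline would call the pretrained embedding model here.
    """
    rng = np.random.default_rng(seed)
    projection = rng.normal(size=(abundances.shape[1], EMBED_DIM))
    return np.log1p(abundances) @ projection


def build_classifier_head():
    """Classifier head: feature scaling followed by logistic regression."""
    return make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))


# Synthetic data: 200 subjects x 50 taxa (assumption, for illustration only).
rng = np.random.default_rng(42)
n_samples, n_taxa = 200, 50
labels = rng.integers(0, 2, size=n_samples)  # 0 = healthy, 1 = allergic
abundances = rng.poisson(lam=20, size=(n_samples, n_taxa)).astype(float)
# Toy signal: deplete the first five taxa in "allergic" subjects.
abundances[labels == 1, :5] *= 0.3

# Stage 1: embed raw abundances. Stage 2: fit the head on the embeddings.
embeddings = embed_samples(abundances)
head = build_classifier_head()
head.fit(embeddings[:150], labels[:150])

# Predict labels and allergy probabilities for the held-out subjects.
predicted = head.predict(embeddings[150:])
probabilities = head.predict_proba(embeddings[150:])[:, 1]
```

Keeping the embedding step behind a single function means the placeholder can be swapped for the real foundation model without touching the classifier head, and the head itself can be replaced by another scikit-learn estimator if logistic regression proves too simple.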
A complete example of a classifier implementation can be found in this repo.
High-level view of the classifier model

Detailed view of the input to the embedding pipeline

Our model will be evaluated using standard classification performance metrics following established best practices in machine learning: