Awesome food allergy datasets

Our goal with this project is to create the first comprehensive collection of food-allergy-related datasets, a resource with great potential for both ML enthusiasts and the scientific community in general.

We identified three main tracks for which to find and explore datasets:

🏥 Early detection & diagnosis

AI models can analyze diverse patient data, including genomics, electronic health records, IgE levels, and family history, to predict an individual's risk of developing a food allergy. Machine learning algorithms can identify complex patterns across these datasets that may not be apparent to human clinicians, enabling earlier and more accurate risk assessment.

🧬 Drug design

AI can be used to accelerate the development of safer immunotherapies. Models analyze the 3D structure of allergenic proteins to identify the specific regions (epitopes) that trigger the allergic reaction. Scientists can then computationally redesign these proteins, modifying the allergenic parts while preserving the segments needed to build immune tolerance.

🌿 Food engineering

The aim is to create hypoallergenic foods by engineering the expression of certain epitopes. A good starting point is AgroNT, a foundation large language model for edible plant genomes developed by InstaDeep.

Next steps ➡️

At the moment we’re just collecting entries, but the final goal is to serve them in a digestible format. The next steps are:

Writing a short description for each dataset
Identifying the most promising ones
Separating open-source from gated datasets, and collecting contacts for the institutions that require permission to access their datasets
Creating an open-source repo that organizes this knowledge (in the style of https://github.com/terryum/awesome-deep-learning-papers)
Ultimately, merging all this knowledge into an article to publish through our organization
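One way to make the "separate open-source vs gated datasets" and "collect contacts" steps concrete is to give every entry a small, fixed schema. The sketch below is hypothetical and not something the project has adopted: the `DatasetEntry` fields, the track keys, and the `split_by_access` / `render_list` helpers are illustrative assumptions. It groups entries by track, lists open datasets before gated ones, and prints the access contact next to each gated entry, roughly in the awesome-list layout of the repo referenced in the list.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical track keys mapped to the section titles used on this page.
TRACKS = {
    "diagnosis": "Early detection & diagnosis",
    "drug_design": "Drug design",
    "food_engineering": "Food engineering",
}


@dataclass
class DatasetEntry:
    """One catalog entry (illustrative schema, not an agreed format)."""
    name: str
    track: str            # one of the keys in TRACKS
    url: str
    description: str      # the short description from the next-steps list
    gated: bool = False   # True if access requires permission
    contact: Optional[str] = None  # who to ask for access when gated

    def __post_init__(self) -> None:
        # Reject entries filed under a track we do not know about.
        if self.track not in TRACKS:
            raise ValueError(f"unknown track: {self.track!r}")


def split_by_access(entries: List[DatasetEntry]) -> Tuple[List[DatasetEntry], List[DatasetEntry]]:
    """Return (open, gated) lists, preserving the input order."""
    open_entries = [e for e in entries if not e.gated]
    gated_entries = [e for e in entries if e.gated]
    return open_entries, gated_entries


def to_markdown_line(entry: DatasetEntry) -> str:
    """Format one entry as an awesome-list style bullet."""
    line = f"- [{entry.name}]({entry.url}) - {entry.description}"
    if entry.gated:
        line += f" (gated; contact: {entry.contact or 'unknown'})"
    return line


def render_list(entries: List[DatasetEntry]) -> str:
    """Render entries grouped by track, open datasets first within each track."""
    lines: List[str] = []
    for key, title in TRACKS.items():
        in_track = [e for e in entries if e.track == key]
        if not in_track:
            continue  # skip tracks with no entries yet
        lines.append(f"## {title}")
        open_entries, gated_entries = split_by_access(in_track)
        lines.extend(to_markdown_line(e) for e in open_entries + gated_entries)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Usage, with placeholder entries (the names and example.org URLs are made up): `print(render_list([DatasetEntry("A", "diagnosis", "https://example.org/a", "desc a")]))` prints a "## Early detection & diagnosis" heading followed by one bullet. Keeping access status and contact as explicit fields means the open/gated split and the contact list can both be generated from the same data instead of being maintained by hand.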

How can I contribute?

We set up a Hugging Face Space where you can submit new dataset entries, together with their related resources. You can also share documents in the repo that you can find in our GitHub organization: