FoodVision AI

Lakshyajeet Domyan, Taylor Le, Apoorav Rathore

INTRODUCTION

Logging meals is a pain. Studies consistently show that people under-report daily calories by 20-50% when manually tracking their food intake. This significant margin of error can derail health goals, fitness plans, and dietary monitoring for millions of people. For our Data Science Lab Final Project, we set out to solve this problem by creating an end-to-end food recognition and calorie estimation system that works in near real-time. The goal was ambitious but straightforward: snap a photo of your food and get an accurate calorie estimate before your coffee cools down.

PROJECT OBJECTIVES

To create a successful system, we established two critical performance targets:

Food classification accuracy of at least 70% across 101 different food categories
Calorie estimation with mean absolute error below 15 kcal/100g

Meeting both targets would indicate that the system is accurate enough to be useful for everyday meal logging.
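
As a rough illustration of how the two targets could be checked, the sketch below computes classification accuracy and mean absolute error from paired predictions and ground truth. The function names and the sample values in the usage note are hypothetical and are not results from this project; accuracy is assumed here to mean the fraction of images whose single predicted label matches the true label.

```python
from typing import Sequence

ACCURACY_TARGET = 0.70        # at least 70% classification accuracy
MAE_TARGET_KCAL = 15.0        # MAE below 15 kcal per 100 g


def classification_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the true label."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference, here in kcal per 100 g."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def meets_targets(accuracy: float, mae_kcal: float) -> bool:
    """True when both project targets are satisfied."""
    return accuracy >= ACCURACY_TARGET and mae_kcal < MAE_TARGET_KCAL
```

For example, with made-up labels, `classification_accuracy(["pizza", "sushi", "ramen", "pizza"], ["pizza", "sushi", "pho", "pizza"])` gives 0.75, which would clear the 70% target.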

DATA COLLECTION

The foundation of this project relied on two key data sources:

The Food-101 dataset provided the visual training corpus: 101,000 images spread evenly across 101 food categories (1,000 per category). The dataset varies widely in image quality and composition, ranging from high-resolution 512-pixel DSLR photographs to 135-pixel smartphone snapshots. This diversity proved invaluable for building a robust model capable of handling real-world image variability.

For nutritional information, we utilized a publicly available dataset from Kaggle titled “Calories in Food Items (per 100 grams)” by contributor kkhandekar. From this comprehensive dataset, we extracted and adapted a subset of 108 standardized food items, each with associated caloric content per 100g, mapped to categories relevant to our Food-101 classification task.
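
The mapping from a predicted Food-101 class to a calorie value can be sketched as a simple table lookup. This is only an illustration: the table contents below are placeholder numbers, not values from the Kaggle dataset, and the helper names are hypothetical. It assumes class labels use the Food-101 underscore style (for example `apple_pie`) and that the calorie table is keyed by lowercase food names.

```python
from typing import Mapping, Optional

# Placeholder values for illustration only; NOT taken from the Kaggle dataset.
PLACEHOLDER_KCAL_PER_100G = {
    "apple pie": 237.0,
    "caesar salad": 44.0,
    "pizza": 266.0,
}


def normalize_label(label: str) -> str:
    """Turn a class label such as 'apple_pie' into a lookup key such as 'apple pie'."""
    return " ".join(label.replace("_", " ").lower().split())


def lookup_kcal_per_100g(
    label: str,
    table: Mapping[str, float] = PLACEHOLDER_KCAL_PER_100G,
) -> Optional[float]:
    """Return kcal per 100 g for a class label, or None if the label is unmapped."""
    return table.get(normalize_label(label))
```

Returning `None` for an unmapped class keeps gaps in the mapping visible instead of silently reporting zero calories.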

DATA PREPROCESSING

Raw data rarely comes in the ideal format for machine learning applications. To prepare the datasets for model training and inference, we conducted multiple preprocessing steps:

Visual Data Processing:

Built pixel sanitation routines to identify and handle corrupted JPEGs
Created a standardized image transformation pipeline that resizes images to 224×224 pixels and normalizes them using ImageNet statistics

This careful preparation established the foundation for reliable model training and performance.
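
The two preprocessing steps can be illustrated with a small standard-library sketch. It is not the project's actual routine. The JPEG check is a lightweight heuristic that only looks for the start-of-image and end-of-image markers, so fully decoding each file would be more thorough. The normalization shows the per-channel arithmetic using the commonly published ImageNet mean and standard deviation. Both function names are hypothetical, and resizing is omitted.

```python
from typing import Tuple

JPEG_START = b"\xff\xd8"  # SOI (start of image) marker
JPEG_END = b"\xff\xd9"    # EOI (end of image) marker

# Commonly used ImageNet channel statistics (RGB, for pixel values scaled to 0..1).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def looks_like_complete_jpeg(data: bytes) -> bool:
    """Heuristic: file begins with SOI and ends with EOI.

    Truncated downloads usually lack the EOI marker. This does not prove the
    image decodes correctly, and files with trailing bytes after EOI are rejected.
    """
    return len(data) >= 4 and data.startswith(JPEG_START) and data.endswith(JPEG_END)


def normalize_pixel(rgb: Tuple[int, int, int]) -> Tuple[float, float, float]:
    """Scale an 8-bit RGB pixel to 0..1, then subtract the mean and divide by the std."""
    r, g, b = (
        (channel / 255.0 - mean) / std
        for channel, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )
    return (r, g, b)
```

In a real pipeline the same normalization would be applied to every pixel of the resized 224×224 image, typically through an image library rather than pixel by pixel.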

MODEL ARCHITECTURE AND TRAINING

Our current implementation uses a ResNet-18 architecture pre-trained on ImageNet and fine-tuned on the Food-101 dataset.