2. Image-to-Do
Overview
Snap a photo of a whiteboard, slide deck, or meeting flipchart (or even a receipt) and automatically extract actionable tasks (“To-Do” items) as a list.
Primary Use Cases
- Project managers capturing tasks jotted down during brainstorming sessions
- Students capturing homework assignments from the board
- Anyone turning handwritten notes or Post-it notes into structured tasks
Key Features
- Camera/upload widget in the browser
- OCR via TrOCR or PaddleOCR for text extraction
- NLP parser that classifies lines as tasks, deadlines, or assignees
- Export to common to-do apps (Todoist, Google Tasks)
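The NLP parser's line classification can be prototyped with simple heuristics before any model is trained. The sketch below is a hypothetical rule-based baseline, not the fine-tuned T5 model named under Tech Stack; the regular expressions, the `ACTION_VERBS` set, the label names, and the `classify_line` function are all illustrative assumptions.

```python
import re

# Illustrative patterns only; a trained parser would replace these rules.
BULLET_RE = re.compile(r"^\s*(?:[-*•]|\[\s?\]|\d+[.)])\s*")
DEADLINE_RE = re.compile(
    r"\b(?:due|by|before|deadline)\b[:\s]*"
    r"(?P<when>(?:mon|tues|wednes|thurs|fri|satur|sun)day|today|tomorrow|\d{1,2}/\d{1,2})",
    re.IGNORECASE,
)
ASSIGNEE_RE = re.compile(r"@(?P<name>\w+)")
ACTION_VERBS = {"book", "buy", "call", "email", "finish", "fix",
                "prepare", "review", "send", "update", "write"}


def classify_line(line: str) -> dict:
    """Label one OCR'd line as task/deadline/assignee/other and extract fields."""
    text = BULLET_RE.sub("", line, count=1).strip()  # drop "-", "*", "[ ]", "1." markers
    words = text.split()
    first_word = words[0].lower().strip(":,.") if words else ""
    deadline = DEADLINE_RE.search(text)
    assignee = ASSIGNEE_RE.search(text)

    if first_word in ACTION_VERBS:
        label = "task"       # imperative verb up front: treat as an action item
    elif deadline:
        label = "deadline"   # e.g. a standalone "Due: 3/14" line
    elif assignee:
        label = "assignee"   # e.g. "Owner: @bob"
    else:
        label = "other"
    return {
        "text": text,
        "label": label,
        "deadline": deadline.group("when") if deadline else None,
        "assignee": assignee.group("name") if assignee else None,
    }


if __name__ == "__main__":
    for raw in ["- Send slides to @ana by Friday", "Due: 3/14", "Owner: @bob", "Q3 roadmap"]:
        print(classify_line(raw))
```

A rule-based baseline like this could also serve as a yardstick: a learned parser is only worth its cost if it clearly beats these simple rules on real board photos.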
Tech Stack
- Frontend: React + TypeScript + Tailwind (upload/camera, preview)
- Backend: FastAPI (Python) for OCR & NLP; Go service for webhook integrations
- AI Models:
  - OCR: microsoft/trocr-base-handwritten
  - Task extraction: a T5-small model fine-tuned on “taskification” data
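The section does not specify how the T5 model's input and output are formatted. The sketch below assumes a hypothetical convention: OCR lines are joined behind an `extract tasks: ` prefix (T5 is commonly fine-tuned with a task prefix), and the model is trained to emit tasks separated by `;`, each with `|`-delimited `key: value` fields. The prefix, the field names, and the `Task` dataclass are assumptions; model inference itself (for example via Hugging Face Transformers) is omitted so the sketch has no dependencies.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed task prefix; T5 checkpoints are commonly fine-tuned with one.
TASK_PREFIX = "extract tasks: "


@dataclass
class Task:
    title: str
    due: Optional[str] = None
    assignee: Optional[str] = None
    tags: list[str] = field(default_factory=list)


def build_input(ocr_text: str) -> str:
    """Join non-empty OCR lines into one prefixed model input, using " / " as separator."""
    lines = [ln.strip() for ln in ocr_text.splitlines() if ln.strip()]
    return TASK_PREFIX + " / ".join(lines)


def parse_output(generated: str) -> list[Task]:
    """Parse the assumed target format:
    'task: X | due: Y | assignee: Z | tags: a, b ; task: ...'
    """
    tasks: list[Task] = []
    for segment in generated.split(";"):
        kv: dict[str, str] = {}
        for part in segment.split("|"):
            # Split on the first colon only, so a value like "due: 10:30" survives.
            key, sep, value = part.partition(":")
            if sep and value.strip():
                kv[key.strip().lower()] = value.strip()
        if "task" not in kv:
            continue  # drop malformed generations instead of failing the request
        tags = [t.strip() for t in kv.get("tags", "").split(",") if t.strip()]
        tasks.append(Task(kv["task"], kv.get("due"), kv.get("assignee"), tags))
    return tasks


if __name__ == "__main__":
    print(build_input("Book room\n\nEmail Sam"))
    # Hypothetical model output, written by hand; not produced by a real checkpoint.
    sample = "task: Book room | due: Friday | tags: ops, admin ; task: Email Sam | assignee: Lee"
    for task in parse_output(sample):
        print(task)
```

Because generation can produce malformed segments, the parser skips anything without a `task` field rather than rejecting the whole photo; a production target format would also need escaping for titles that contain `;` or `|`.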
Architecture
- Image ingest → normalize (deskew, crop) → send to OCR.
- Text processing → split into lines/blocks → send to NLP parser.
- Output → JSON list of tasks with optional metadata (due date, tags) → delivered to the frontend or pushed via webhook.
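The three architecture stages can be wired together as plain functions. In this sketch the OCR engine and the task parser are passed in as callables (the `ocr` and `parse_line` parameters are hypothetical), so TrOCR, PaddleOCR, or a model-backed parser can be swapped in; image normalization is left out, and the `tasks`/`title`/`due`/`tags` JSON keys are an assumed schema. One caveat: TrOCR recognizes single cropped text lines, so when it is the OCR engine, some line detection has to happen before recognition, not only afterward.

```python
import json
from typing import Callable, Optional


def split_lines(raw_text: str) -> list[str]:
    """Split OCR output into candidate lines, re-joining wrapped continuations.

    Assumed heuristic: a line that starts with a lowercase letter continues
    the previous line (e.g. a long note that wrapped on the whiteboard).
    """
    lines: list[str] = []
    for ln in raw_text.splitlines():
        ln = ln.strip()
        if not ln:
            continue
        if lines and ln[0].islower():
            lines[-1] += " " + ln
        else:
            lines.append(ln)
    return lines


def run_pipeline(
    image_bytes: bytes,
    ocr: Callable[[bytes], str],
    parse_line: Callable[[str], Optional[dict]],
) -> str:
    """Image -> OCR text -> lines -> tasks -> JSON for the frontend or a webhook."""
    # Normalization (deskew, crop) would transform image_bytes before this call.
    raw_text = ocr(image_bytes)
    parsed = (parse_line(ln) for ln in split_lines(raw_text))
    tasks = [t for t in parsed if t is not None]  # lines that are not tasks are dropped
    return json.dumps({"tasks": tasks}, ensure_ascii=False)


if __name__ == "__main__":
    # Stand-ins for a real OCR engine and task parser.
    def fake_ocr(_: bytes) -> str:
        return "TODO\n- Call vendor about\nthe invoice\n\n- Update roadmap #planning"

    def naive_parse(line: str) -> Optional[dict]:
        if not line.startswith("- "):
            return None
        title = line[2:]
        tags = [w[1:] for w in title.split() if w.startswith("#")]
        return {"title": title, "due": None, "tags": tags}

    # Prints two tasks: "Call vendor about the invoice" and "Update roadmap #planning".
    print(run_pipeline(b"", fake_ocr, naive_parse))
```

Keeping the stages as injected functions means the same JSON string can go to the React frontend or be handed to the Go webhook service without either consumer knowing which OCR or parsing model produced it.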