
22. 📷 Contract Scanner

Overview

A vision-enabled analyzer that processes scanned contract images or PDFs, performs OCR to extract text, then flags risky terms, missing mandatory sections, and unusual language patterns for compliance checks.

Primary Use Cases

Compliance teams scanning hard-copy contracts
Legal ops ensuring all required sections are present
Auditors quickly identifying high-risk language

Key Features

OCR pipeline optimized for legal document layouts
Template validation: checks for presence/absence of mandatory sections
Risk flagging: NLP model scores sentences for risk level (high/medium/low)
Report export: summary of findings with page/line references
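The template-validation and page/line-reference features above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `MANDATORY_SECTIONS` list, the `SectionHit` record, and the heading heuristic are all assumptions made for the example, and the input is assumed to be plain OCR text, one string per page.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical mandatory sections; a real template would define its own list.
MANDATORY_SECTIONS = ["Governing Law", "Termination", "Confidentiality", "Indemnification"]


@dataclass
class SectionHit:
    section: str
    page: int  # 1-based page number
    line: int  # 1-based line number within the page


def heading_pattern(section: str) -> re.Pattern:
    # Assumed heading heuristic: optional "Section"/"Article" prefix, optional
    # numbering such as "7." or "7.1", then the title alone on the line.
    return re.compile(
        rf"^\s*(?:(?:section|article)\s+)?(?:\d+(?:\.\d+)*\.?\s+)?"
        rf"{re.escape(section)}\s*[.:]?\s*$",
        flags=re.IGNORECASE,
    )


def find_sections(pages: list[str], required: list[str] = MANDATORY_SECTIONS):
    """Return (found, missing): first heading location per section, plus absent sections."""
    patterns = {s: heading_pattern(s) for s in required}
    found: dict[str, SectionHit] = {}
    for page_no, text in enumerate(pages, start=1):
        for line_no, line in enumerate(text.splitlines(), start=1):
            for section, pattern in patterns.items():
                if section not in found and pattern.match(line):
                    found[section] = SectionHit(section, page_no, line_no)
    missing = [s for s in required if s not in found]
    return found, missing


if __name__ == "__main__":
    pages = [
        "MASTER SERVICES AGREEMENT\n1. Confidentiality\nEach party shall keep...",
        "7. Termination\nEither party may terminate...\nSection 9 Governing Law\nThis agreement...",
    ]
    found, missing = find_sections(pages)
    for hit in found.values():
        print(f"{hit.section}: page {hit.page}, line {hit.line}")
    print("Missing:", missing)
```

Matching only whole-line headings keeps body text that merely mentions "termination" from counting as the section itself. Real OCR output is noisier, so fuzzy matching would likely be needed in practice.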

Tech Stack

Frontend: React + TypeScript + Tailwind for scanner UI and report viewer
Backend: FastAPI (Python)
AI Models:
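- OCR: TrOCR (e.g. microsoft/trocr-base-printed), with layout analysis via detectron2 to segment pages into text lines, since TrOCR recognizes one line crop at a time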
- Risk detection: legal-bert fine-tuned for sentence-level risk classification
Storage: S3 or Firebase Storage for images; PostgreSQL for reports
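As a rough illustration of how sentence-level scores from the risk model might map to the high/medium/low levels, here is a minimal sketch. The thresholds, the `flag_sentences` helper, and the `keyword_scorer` stand-in are all hypothetical. In the real system the scorer would be the fine-tuned classifier; the keyword heuristic only lets the example run without model weights.

```python
from __future__ import annotations

from typing import Callable

# Assumed cut-offs; real values would be tuned on labeled contract data.
HIGH_THRESHOLD = 0.7
MEDIUM_THRESHOLD = 0.4

# Hypothetical keyword weights used only by the stand-in scorer below.
KEYWORD_WEIGHTS = {
    "unlimited liability": 0.9,
    "indemnify": 0.6,
    "auto-renew": 0.5,
}


def risk_level(score: float) -> str:
    """Bucket a risk score in [0, 1] into high / medium / low."""
    if score >= HIGH_THRESHOLD:
        return "high"
    if score >= MEDIUM_THRESHOLD:
        return "medium"
    return "low"


def keyword_scorer(sentence: str) -> float:
    """Stand-in for the classifier: the highest weight of any matched keyword."""
    text = sentence.lower()
    matched = [w for kw, w in KEYWORD_WEIGHTS.items() if kw in text]
    return max(matched, default=0.1)


def flag_sentences(
    sentences: list[str],
    scorer: Callable[[str], float] = keyword_scorer,
) -> list[dict]:
    """Score each sentence and attach its risk level."""
    results = []
    for sentence in sentences:
        score = scorer(sentence)
        results.append({"sentence": sentence, "score": score, "level": risk_level(score)})
    return results


if __name__ == "__main__":
    sample = [
        "Supplier accepts unlimited liability for all claims.",
        "Customer shall indemnify Supplier.",
        "Notices must be sent in writing.",
    ]
    for row in flag_sentences(sample):
        print(f'{row["level"]:>6}  {row["score"]:.2f}  {row["sentence"]}')
```

Passing the scorer as a parameter keeps the bucketing logic independent of the model, so the classifier can be swapped in later without touching the thresholds or the report code.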

Architecture

Image Ingest: accept image/PDF → normalize resolution, deskew
OCR & Layout: apply OCR + detect document structure (headings, tables)