22. 📷 Contract Scanner
Overview
A vision-enabled analyzer that processes scanned contract images or PDFs, performs OCR to extract text, then flags risky terms, missing mandatory sections, and unusual language patterns for compliance checks.
Primary Use Cases
- Compliance teams scanning hard-copy contracts
- Legal ops ensuring all required sections are present
- Auditors quickly identifying high-risk language
Key Features
- OCR pipeline optimized for legal document layouts
- Template validation: checks that every mandatory section is present and reports any that are missing
- Risk flagging: NLP model scores sentences for risk level (high/medium/low)
- Report export: summary of findings with page/line references
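The template-validation feature above could be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the section names, the regex patterns, and the `validate_template` and `SectionCheck` names are all assumptions made for the example, and it presumes the OCR step has already produced plain text with one source line per text line.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical mandatory sections (assumption); a real deployment would
# likely load these from a template defined per contract type.
MANDATORY_SECTIONS: Dict[str, str] = {
    "governing_law": r"\bgoverning\s+law\b",
    "termination": r"\btermination\b",
    "confidentiality": r"\bconfidential(?:ity)?\b",
    "limitation_of_liability": r"\blimitation\s+of\s+liability\b",
}


@dataclass
class SectionCheck:
    name: str
    present: bool
    line: Optional[int]  # 1-based line of the first match, None if missing


def validate_template(
    ocr_text: str, sections: Dict[str, str] = MANDATORY_SECTIONS
) -> List[SectionCheck]:
    """Report, for each mandatory section, whether it appears in the text."""
    lines = ocr_text.splitlines()
    results = []
    for name, pattern in sections.items():
        regex = re.compile(pattern, re.IGNORECASE)
        # Find the first line whose text matches the section pattern.
        hit = next(
            (i for i, line in enumerate(lines, start=1) if regex.search(line)),
            None,
        )
        results.append(SectionCheck(name=name, present=hit is not None, line=hit))
    return results


if __name__ == "__main__":
    sample = "1. Termination\nEither party may end this agreement.\n2. Governing Law"
    for check in validate_template(sample):
        print(check)
```

Keyword matching like this is only a first pass. Because the pipeline also detects headings during layout analysis, a fuller version would probably match against detected headings rather than raw lines.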
Tech Stack
- Frontend: React + TypeScript + Tailwind for scanner UI and report viewer
- Backend: FastAPI (Python)
- AI Models:
  - OCR: microsoft/trocr-base with layout analysis via detectron2
  - Risk detection: legal-bert fine-tuned for sentence-level risk classification
- Storage: S3 or Firebase Storage for images; PostgreSQL for reports
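To illustrate how the risk-detection model's output might become the high/medium/low flags and page/line references described under Key Features, here is a small sketch. It assumes the fine-tuned classifier is wrapped in a function that returns a risk probability between 0 and 1 for a sentence. The thresholds, the `Finding` record, and the `flag_sentences` helper are hypothetical and not taken from the project.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Assumed cut-offs on a 0..1 risk probability; these would need tuning
# against labeled contract data.
HIGH_THRESHOLD = 0.7
MEDIUM_THRESHOLD = 0.4


@dataclass
class Finding:
    page: int
    line: int
    sentence: str
    score: float
    level: str


def risk_level(score: float) -> str:
    """Map a risk probability to one of the three report levels."""
    if score >= HIGH_THRESHOLD:
        return "high"
    if score >= MEDIUM_THRESHOLD:
        return "medium"
    return "low"


def flag_sentences(
    sentences: Iterable[Tuple[int, int, str]],
    scorer: Callable[[str], float],
) -> List[Finding]:
    """Score (page, line, text) triples and return findings, riskiest first."""
    findings = []
    for page, line, text in sentences:
        score = scorer(text)
        findings.append(Finding(page, line, text, score, risk_level(score)))
    return sorted(findings, key=lambda f: f.score, reverse=True)


if __name__ == "__main__":
    # Stand-in scorer for demonstration only; the real scorer would call
    # the fine-tuned sentence classifier.
    def keyword_scorer(text: str) -> float:
        return 0.9 if "indemnify" in text.lower() else 0.1

    sample = [
        (1, 4, "Payment is due within 30 days."),
        (2, 7, "Supplier shall indemnify Buyer without limit."),
    ]
    for finding in flag_sentences(sample, keyword_scorer):
        print(finding.level, finding.page, finding.line, finding.sentence)
```

Passing the scorer in as a callable keeps the bucketing and reporting logic testable without loading a model.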
Architecture
- Image Ingest: accept image/PDF → normalize resolution, deskew
- OCR & Layout: apply OCR + detect document structure (headings, tables)