Research Hub

📊 Report Generator

Automated research → polished report, end to end. Drop in a URL or file. Get back a fully rendered statistical report with narrative, tables, and charts.


🔴 At a Glance

🟢 Status: Active Development
👤 Owner: Justin Verlin
📅 Last updated: April 9, 2026
🔗 Repo: (none listed)
🧱 Stack: Python 3.10 · Quarto · litellm · OpenAI
📤 Outputs: HTML · PDF · DOCX

🗺️ How It Works

The pipeline has six stages, from input detection through rendering. Each stage is independent and can be swapped out.

Source (URL or file) → Parse → Chunk → LLM Extract (map-reduce) → LLM Report Writer → Quarto .qmd → HTML/PDF/DOCX
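
One way to read the flow above is as a chain of interchangeable callables, where each stage takes the previous stage's output. The sketch below is only an illustration of that idea, not the project's code: `run_pipeline` and the toy `parse`, `chunk`, and `extract` functions are hypothetical names that do no real parsing or LLM work.

```python
from functools import reduce
from typing import Any, Callable

# A stage is any callable that accepts the previous stage's output.
Stage = Callable[[Any], Any]


def run_pipeline(source: Any, stages: list[Stage]) -> Any:
    """Feed `source` through each stage in order and return the final result."""
    return reduce(lambda data, stage: stage(data), stages, source)


# Toy stand-ins for the real stages (hypothetical; no real parsing or LLM calls).
def parse(source: str) -> str:
    return source.strip()


def chunk(text: str) -> list[str]:
    return text.split(". ")


def extract(chunks: list[str]) -> dict:
    return {"claims": chunks}


result = run_pipeline("  Sales rose. Costs fell  ", [parse, chunk, extract])
print(result)
```

Because the runner only sees a list of callables, swapping a stage means replacing one entry in that list.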

| Stage | What happens |
| --- | --- |
| Input | Detects URL vs. local file, downloads if needed, identifies MIME type |
| Parse | Routes to the right parser: PDF, DOCX, XLSX, CSV, web, or plain text |
| Chunk | Splits text into overlapping context-window-sized pieces; tables kept whole |
| Extract | Each chunk hits the LLM for structured data (entities, stats, claims); results are merged |
| Generate | Second LLM call turns the JSON extraction into a full .qmd with prose + charts |
| Render | Quarto compiles .qmd → HTML / PDF / DOCX |
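
The Chunk and Extract stages can be sketched roughly as follows. This is a minimal illustration under several assumptions: chunk size is counted in characters rather than tokens, input arrives as hypothetical `(kind, content)` blocks, and `extract` stands in for the real LLM call. The function names and the merge rule (concatenate lists, drop exact duplicates) are not taken from the project.

```python
from typing import Callable


def chunk_text(text: str, size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def chunk_blocks(blocks: list[tuple[str, str]], size: int = 2000, overlap: int = 200) -> list[str]:
    """Chunk (kind, content) blocks; blocks tagged "table" are never split."""
    chunks = []
    for kind, content in blocks:
        if kind == "table":
            chunks.append(content)  # keep the table whole
        else:
            chunks.extend(chunk_text(content, size, overlap))
    return chunks


def map_reduce_extract(chunks: list[str], extract: Callable[[str], dict]) -> dict:
    """Map: run `extract` on every chunk. Reduce: merge the lists, dropping duplicates."""
    merged = {"entities": [], "stats": [], "claims": []}
    for chunk in chunks:
        result = extract(chunk)  # assumed to return a dict of lists
        for key, items in merged.items():
            for item in result.get(key, []):
                if item not in items:
                    items.append(item)
    return merged
```

For example, `chunk_text("abcdefghij", size=4, overlap=2)` yields four windows, each sharing two characters with its neighbor. A production version would likely size chunks by tokens against the model's context window.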

📥 Supported Sources

| Format | Parser | Notes |
| --- | --- | --- |
| 🌐 Web page | trafilatura | Any public URL |
| 📄 PDF | PyMuPDF + pdfplumber | Text and tables |
| 📝 Word | python-docx | .docx |
| 📊 Excel | pandas + openpyxl | .xlsx |
| 🗂️ CSV | pandas | |
| 📃 Plain text | built-in | |
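
The routing in the table above could look something like the sketch below. It is a simplification under stated assumptions: it routes on URL scheme and file extension rather than MIME type, it treats `.txt` as the plain-text extension, and it returns the parser's name from the table instead of calling the library. `PARSERS` and `pick_parser` are hypothetical names.

```python
from pathlib import PurePath
from urllib.parse import urlparse

# File extension -> parser name from the table (".txt" is an assumption).
PARSERS = {
    ".pdf": "PyMuPDF + pdfplumber",
    ".docx": "python-docx",
    ".xlsx": "pandas + openpyxl",
    ".csv": "pandas",
    ".txt": "built-in",
}


def pick_parser(source: str) -> str:
    """Return the parser name for a URL or local file path."""
    # Anything with an http(s) scheme is treated as a web page in this sketch.
    if urlparse(source).scheme in ("http", "https"):
        return "trafilatura"
    suffix = PurePath(source).suffix.lower()
    try:
        return PARSERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported source: {source!r}") from None


print(pick_parser("market_data.xlsx"))
print(pick_parser("https://example.com/article"))
```

A bare domain with no scheme is not recognized as a URL here, so callers would need to pass a full `https://` address.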

📤 Export Hub

Add a row each time a new report is generated. Link the .qmd source and the rendered output.

| Report | Source | Format | Generated | QMD | Output | Status |
| --- | --- | --- | --- | --- | --- | --- |
| Q1 Market Analysis | market_data.xlsx | HTML | Apr 7, 2026 | 📄 Link | 🔗 Link | ✅ Done |
| Industry Trends | bloomberg.com | PDF | Apr 5, 2026 | 📄 Link | 🔗 Link | ✅ Done |
| Competitive Landscape | comp_data.pdf | DOCX | Mar 31, 2026 | 📄 Link | 🔗 Link | ✅ Done |
| (next report) | | | | | | 🔲 Pending |

💡 Tip: Turn this table into a Notion database for filtering by format, date, or status.