📄 Codex Prompt v2.0

The full product specification used to build AI SEO Audit. Covers architecture, data model, job flow, report structure, UX requirements, and deployment. Refined collaboratively with Claude Opus 4.5 from the v1.0 baseline.
Version: 2.0
Refined with: Claude Opus 4.5
Based on: Codex Prompt v1.0 (original)
Status: Current (used for active development)

You are an expert full-stack engineer. Build a production-ready internal web app for SEO Audits.

APP IDENTITY
- Name: AI SEO Audit
- Display in: navbar, page titles (<title>), email headers, PDF headers
- Email "from": AI SEO Audit <noreply@yourdomain.com>

GOAL
- Signed-in users request an SEO audit with minimal inputs via a stepped wizard UI.
- Audit runs asynchronously via a job queue.
- On completion: send an email via SendGrid containing:
  - 3–6 bullet summary of key findings
  - A link to the HTML report in-app
  - A link to the PDF report

STACK (LOCKED)
- Next.js 14+ (App Router) + TypeScript + Tailwind
- Postgres + Prisma
- Auth: NextAuth v4 (Google OAuth ONLY)
- Email: SendGrid
- Async job queue: pg-boss (backed by Postgres; NO Redis)

AUTH IMPLEMENTATION (LOCKED)
- Use NextAuth route handler at /app/api/auth/[...nextauth]/route.ts
- Use getServerSession(authOptions) for route protection in API routes and server components

ADDITIONAL LIBRARIES (LOCKED)
- cheerio (HTML parsing)
- axios (HTTP requests)
- puppeteer (PDF generation from HTML)
- openai (LLM analysis)

STYLING (LOCKED)
- Use the attached theme.css as the base design system
- All components must use CSS variables defined in theme.css
- Do not override or create conflicting styles
- Tailwind config should extend (not replace) these theme tokens via tailwind.config.ts

SOURCE DOCUMENT (CANONICAL)
Use the attached PDF document as the canonical structure, steps, and output formats for each audit section. Do not invent extra sections or change table formats.

AUDIT SECTIONS (MUST MATCH PDF)
1. GBP Audit
2. GBP Ranking Factors
3. Complete Onpage SEO Audit (homepage or user-submitted URL)
4. Schema Audit
5. Rankability Test

GBP AUDIT ADAPTATION
The PDF references discovering competitors via Google Maps search—this is NOT implemented due to ToS concerns and reliability issues. Instead:
- Treat the user-provided competitor_gbp_urls as the "top 5 competitors"
- Label them Competitor #1 through #5 in the order provided
- Attempt best-effort extraction from the user-provided GBP URLs. If blocked/unavailable, set fields to unknown and continue; do not attempt automated Google Maps search.
- If user provides fewer than 5 URLs, label only those provided (e.g., Competitor #1–#3)
- No Google Maps SERP scraping. No bypassing bot protections.
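
The labeling rule above could be sketched roughly as follows. The function name `labelCompetitors` and the `LabeledCompetitor` shape are hypothetical, chosen only for illustration; the spec fixes the labels and ordering, not the code.

```typescript
// Hypothetical helper: names and shapes are illustrative, not part of the spec.
interface LabeledCompetitor {
  label: string; // e.g. "Competitor #1"
  gbpUrl: string;
}

const MAX_COMPETITORS = 5;

function labelCompetitors(competitorGbpUrls: string[]): LabeledCompetitor[] {
  // Keep the user's order; never pad the list or discover extra competitors.
  return competitorGbpUrls.slice(0, MAX_COMPETITORS).map((gbpUrl, index) => ({
    label: `Competitor #${index + 1}`,
    gbpUrl,
  }));
}
```

With three URLs this yields labels Competitor #1 through #3 only, which matches the "fewer than 5" rule.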

MINIMAL INPUTS (WIZARD)
Wizard flow: 3 steps

Step 1 - Basic Info:
- website_url (required; validate as valid URL)
- primary_keyword (required; text input)
- city_state (required; text input; label as "City, State/Province")
- gbp_search_phrase (required; text input; e.g., "plumber near me")
- business_type (required; dropdown with options: Restaurant, Retail Store, Service Business, Medical Practice, Legal Services, Home Services, Other)
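
A minimal sketch of Step 1 validation, assuming only http(s) URLs are accepted and that "required" means non-empty after trimming (the spec does not say either way). `validateBasicInfo` and the error messages are hypothetical.

```typescript
// Dropdown options copied from the spec.
const BUSINESS_TYPES = [
  "Restaurant", "Retail Store", "Service Business", "Medical Practice",
  "Legal Services", "Home Services", "Other",
] as const;

interface BasicInfoInput {
  website_url: string;
  primary_keyword: string;
  city_state: string;
  gbp_search_phrase: string;
  business_type: string;
}

// Assumption: only http: and https: count as valid website URLs.
function isHttpUrl(value: string): boolean {
  try {
    const url = new URL(value);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    return false;
  }
}

// Returns a map of field name -> error message; empty when the step is valid.
function validateBasicInfo(input: BasicInfoInput): Record<string, string> {
  const errors: Record<string, string> = {};
  if (!isHttpUrl(input.website_url.trim())) {
    errors.website_url = "Enter a valid URL, including https://";
  }
  for (const field of ["primary_keyword", "city_state", "gbp_search_phrase"] as const) {
    if (input[field].trim() === "") errors[field] = "This field is required";
  }
  if (!(BUSINESS_TYPES as readonly string[]).includes(input.business_type)) {
    errors.business_type = "Choose a business type from the list";
  }
  return errors;
}
```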

Step 2 - Competitors:
- competitor_gbp_urls (required; list of 3–5 Google Business Profile URLs)
  - Minimum: 3 URLs
  - Maximum: 5 URLs
  - Validate: must be valid Google Maps/Business URLs
  - UI must include helper text: "To find competitor GBP URLs: Search your keyword on Google Maps, click on a competitor's listing, and copy the URL from your browser's address bar."
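
The competitor URL rules could be checked along these lines. The spec does not define what counts as a "valid Google Maps/Business URL", so the host list in `isGoogleMapsUrl` is an assumption and would need adjusting (for example, for country-specific Google domains).

```typescript
const MIN_COMPETITOR_URLS = 3;
const MAX_COMPETITOR_URLS = 5;

// Assumption: accepted hosts are google.com/maps, maps.google.com and
// maps.app.goo.gl share links. This list is illustrative, not from the spec.
function isGoogleMapsUrl(value: string): boolean {
  try {
    const url = new URL(value);
    if (url.protocol !== "https:") return false;
    const host = url.hostname.toLowerCase();
    if (host === "maps.app.goo.gl" || host === "maps.google.com") return true;
    return (host === "google.com" || host === "www.google.com") &&
      url.pathname.startsWith("/maps");
  } catch {
    return false;
  }
}

// Returns a list of human-readable errors; empty when the step is valid.
function validateCompetitorUrls(urls: string[]): string[] {
  const cleaned = urls.map((u) => u.trim()).filter((u) => u !== "");
  const errors: string[] = [];
  if (cleaned.length < MIN_COMPETITOR_URLS || cleaned.length > MAX_COMPETITOR_URLS) {
    errors.push(`Provide between 3 and 5 competitor URLs (received ${cleaned.length})`);
  }
  cleaned.forEach((url, index) => {
    if (!isGoogleMapsUrl(url)) {
      errors.push(`URL #${index + 1} is not a Google Maps/Business URL`);
    }
  });
  return errors;
}
```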

Step 3 - Review & Submit:
- Display all inputs for user confirmation
- Submit button creates audit and redirects to dashboard

EXECUTION STRATEGY
- Competitor discovery: User-provided URLs only; NO Google Maps scraping
- gbp_search_phrase is used only for labeling/context + LLM analysis; do not perform automated Google/Maps searches
- Page fetching: axios to fetch HTML
- HTML parsing: cheerio
- Schema extraction: Parse JSON-LD blocks from page source using cheerio
- PageSpeed data: Google PageSpeed Insights API (free tier)
- PDF generation: Puppeteer renders report_html to PDF
- File storage (PDF): Local filesystem for dev; S3-compatible interface for prod
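
As an illustration of the schema extraction step, the sketch below takes the raw text of each JSON-LD script block (assumed to have been collected upstream, for example with cheerio) and gathers every `@type` it finds. `collectSchemaTypes` and `JsonLdResult` are hypothetical names; the cheerio selection itself is not shown.

```typescript
interface JsonLdResult {
  types: string[];     // unique @type values, in first-seen order
  parseErrors: number; // blocks that were not valid JSON
}

// Input: raw text of each <script type="application/ld+json"> block.
function collectSchemaTypes(scriptContents: string[]): JsonLdResult {
  const types = new Set<string>();
  let parseErrors = 0;

  const visit = (node: unknown): void => {
    if (Array.isArray(node)) {
      node.forEach(visit);
      return;
    }
    if (node === null || typeof node !== "object") return;
    const record = node as Record<string, unknown>;
    const type = record["@type"];
    if (typeof type === "string") {
      types.add(type);
    } else if (Array.isArray(type)) {
      type.forEach((t) => {
        if (typeof t === "string") types.add(t);
      });
    }
    // Walk nested values so "@graph" members and embedded entities are counted.
    Object.values(record).forEach(visit);
  };

  for (const raw of scriptContents) {
    try {
      visit(JSON.parse(raw));
    } catch {
      parseErrors += 1; // malformed block: count it and keep going
    }
  }
  return { types: Array.from(types), parseErrors };
}
```

Counting malformed blocks instead of throwing lets the Schema Audit report a "broken" verdict while still listing the types that did parse.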

ANALYSIS APPROACH
- Data extraction: Deterministic code (fetching, parsing, API calls)
- Qualitative analysis (pattern recognition, ranking factor hypothesis, verdicts): Call OpenAI API with structured prompts derived from the PDF
- LLM Model: gpt-4o (or gpt-4o-mini for cost savings—make configurable via env var)
- Token budget: ~10k tokens per audit section max
- Prompt construction: For each section requiring LLM analysis, construct a prompt that includes the relevant section instructions from the PDF, the extracted data as structured input, and clear output format requirements matching REPORT_JSON STRUCTURE
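
A rough sketch of the model switch and prompt assembly described above. `OPENAI_MODEL` is a hypothetical variable name (the spec only asks for "an env var"), the 4-characters-per-token estimate is a crude heuristic rather than a real tokenizer, and applying the ~10k budget to the prompt alone is an assumption.

```typescript
type LlmModel = "gpt-4o" | "gpt-4o-mini";

// OPENAI_MODEL is an assumed variable name. Defaults to gpt-4o.
function resolveModel(env: Record<string, string | undefined>): LlmModel {
  return env.OPENAI_MODEL === "gpt-4o-mini" ? "gpt-4o-mini" : "gpt-4o";
}

const SECTION_TOKEN_BUDGET = 10000;

// Heuristic only: roughly 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

interface SectionPromptInput {
  sectionName: string;     // e.g. "Schema Audit"
  pdfInstructions: string; // section instructions taken from the PDF
  extractedData: unknown;  // output of the deterministic extraction step
  outputFormat: string;    // the matching slice of REPORT_JSON STRUCTURE
}

function buildSectionPrompt(input: SectionPromptInput): string {
  const prompt = [
    `SECTION: ${input.sectionName}`,
    "INSTRUCTIONS (from the source PDF):",
    input.pdfInstructions,
    "EXTRACTED DATA (JSON):",
    JSON.stringify(input.extractedData, null, 2),
    "OUTPUT FORMAT (must match REPORT_JSON STRUCTURE):",
    input.outputFormat,
  ].join("\n");
  if (estimateTokens(prompt) > SECTION_TOKEN_BUDGET) {
    throw new Error(`Prompt for ${input.sectionName} exceeds the section token budget`);
  }
  return prompt;
}
```

In the app, `resolveModel(process.env)` would be the likely call site; the env object is passed in here so the function stays easy to test.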

JOB BEHAVIOR
- Timeout: 10 minutes max per audit job
- Retries: 2 retries on transient failures (network errors, API timeouts)
- Partial failure: If one section fails, continue with remaining sections; mark failed section as "status": "error" with "error_message" in report_json
- Concurrency: Process one audit at a time per worker (configurable)
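
One way to combine the retry and partial-failure rules is a per-section wrapper like the sketch below. It assumes retries are applied per section rather than per job (the spec could also be read as a queue-level retry limit), and `isTransient` is a hypothetical classifier that relies on errors carrying a `transient` flag. The `data` field is an illustrative wrapper; in report_json the section fields sit next to `status`.

```typescript
interface SectionResult<T> {
  status: "complete" | "error";
  error_message: string | null;
  data: T | null;
}

const MAX_RETRIES = 2;

// Hypothetical classifier: callers tag network errors and API timeouts
// with `transient: true` before rethrowing.
function isTransient(error: unknown): boolean {
  return typeof error === "object" && error !== null &&
    (error as { transient?: boolean }).transient === true;
}

async function runSection<T>(task: () => Promise<T>): Promise<SectionResult<T>> {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return { status: "complete", error_message: null, data: await task() };
    } catch (error) {
      // Retry transient failures up to MAX_RETRIES times (3 attempts total).
      if (isTransient(error) && attempt < MAX_RETRIES) continue;
      const message = error instanceof Error ? error.message : String(error);
      // Never rethrow: the caller moves on to the remaining sections.
      return { status: "error", error_message: message, data: null };
    }
  }
}
```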

JOB FLOW (MUST IMPLEMENT)

1. API Endpoint POST /api/audits:
- Validate auth (must be signed in)
- Validate inputs
- Create Audit record with status=queued
- Enqueue pg-boss job with audit_id
- Return { audit_id, status: "queued" }
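
The endpoint steps above can be sketched framework-free, with persistence and queueing injected so the order of operations is visible. Everything named here (`CreateAuditDeps`, `handleCreateAudit`, the queue name, the HTTP status codes) is an assumption; in the real app the dependencies would wrap Prisma and pg-boss, and the user ID would come from the NextAuth session.

```typescript
interface CreateAuditDeps {
  // Would wrap a Prisma insert with status=queued.
  createAudit(userId: string, inputs: unknown): Promise<{ id: string }>;
  // Would wrap the pg-boss enqueue call.
  enqueue(queue: string, payload: { audit_id: string }): Promise<void>;
}

interface HandlerResponse {
  status: number; // HTTP status codes here are assumptions, not from the spec
  body: unknown;
}

const AUDIT_QUEUE = "audit-run"; // hypothetical queue name

async function handleCreateAudit(
  deps: CreateAuditDeps,
  userId: string | null,  // null when there is no signed-in session
  inputs: unknown,
  inputErrors: string[],  // result of the wizard validation step
): Promise<HandlerResponse> {
  if (userId === null) return { status: 401, body: { error: "Not signed in" } };
  if (inputErrors.length > 0) return { status: 400, body: { errors: inputErrors } };
  // Persist first so the worker can always load the record it is handed.
  const audit = await deps.createAudit(userId, inputs);
  await deps.enqueue(AUDIT_QUEUE, { audit_id: audit.id });
  return { status: 201, body: { audit_id: audit.id, status: "queued" } };
}
```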

2. Worker Process (/worker):
- Poll pg-boss queue
- On job received:
  - Set status=running
  - Execute audit steps (fetch pages, parse data, call APIs, call LLM)
  - Build report_json following REPORT_JSON STRUCTURE
  - Render report_html from report_json
  - Generate PDF from report_html using Puppeteer
  - Store PDF (local or S3) and save pdf_url
  - Set status=complete
  - Send SendGrid email with summary + links
  - Update email_status=sent, last_emailed_at=now

3. Error Handling:
- If unrecoverable error: status=failed, error_message=<error details>
- Optionally send failure notification email
- Log full error for debugging
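
The worker steps and error handling above might be orchestrated as in this sketch. All dependency names in `WorkerDeps` are hypothetical stand-ins for Prisma, Puppeteer, storage, and SendGrid calls. Treating an email failure as separate from an audit failure is also an assumption, suggested by the separate EmailStatus enum in the data model.

```typescript
type AuditStatus = "QUEUED" | "RUNNING" | "COMPLETE" | "FAILED";

interface WorkerDeps {
  setStatus(auditId: string, status: AuditStatus, errorMessage?: string): Promise<void>;
  buildReportJson(auditId: string): Promise<unknown>;
  renderHtml(reportJson: unknown): string;
  generatePdf(html: string): Promise<Uint8Array>;
  storePdf(auditId: string, pdf: Uint8Array): Promise<string>; // returns pdf_url
  saveReport(auditId: string, report: { reportJson: unknown; reportHtml: string; pdfUrl: string }): Promise<void>;
  sendEmail(auditId: string, pdfUrl: string): Promise<void>;
  // Assumed to also stamp last_emailed_at when the status is SENT.
  setEmailStatus(auditId: string, status: "SENT" | "FAILED"): Promise<void>;
  logError(error: unknown): void;
}

async function processAuditJob(deps: WorkerDeps, auditId: string): Promise<AuditStatus> {
  let pdfUrl: string;
  try {
    await deps.setStatus(auditId, "RUNNING");
    const reportJson = await deps.buildReportJson(auditId);
    const reportHtml = deps.renderHtml(reportJson);
    const pdf = await deps.generatePdf(reportHtml);
    pdfUrl = await deps.storePdf(auditId, pdf);
    await deps.saveReport(auditId, { reportJson, reportHtml, pdfUrl });
    await deps.setStatus(auditId, "COMPLETE");
  } catch (error) {
    // Unrecoverable error: log it in full, then record status=failed.
    deps.logError(error);
    const message = error instanceof Error ? error.message : String(error);
    await deps.setStatus(auditId, "FAILED", message);
    return "FAILED";
  }
  try {
    await deps.sendEmail(auditId, pdfUrl);
    await deps.setEmailStatus(auditId, "SENT");
  } catch (error) {
    // Assumption: a finished report stays COMPLETE even if the email fails.
    deps.logError(error);
    await deps.setEmailStatus(auditId, "FAILED");
  }
  return "COMPLETE";
}
```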

REPORT_JSON STRUCTURE
HTML tables must match the PDF's headings and row order; report_json must be a structured equivalent of the same data:

{
  "gbp_audit": {
    "competitors": [
      {
        "label": "Competitor #1",
        "name": "string",
        "keywords_in_name": { "present": true, "keywords": ["keyword1"] },
        "primary_category": "string",
        "secondary_categories": ["string"],
        "address": "string",
        "city_borough": "string",
        "service_areas": ["string"],
        "location_type": "physical | SAB",
        "review_count": 123,
        "average_rating": 4.5,
        "recent_reviews_30d": 5,
        "review_keywords": ["keyword1", "keyword2"],
        "description_present": true,
        "description_length": "short | medium | long",
        "description_keywords": ["keyword"],
        "services_filled": true,
        "services_count": 10,
        "services_notable": ["Service Name"],
        "products_used": false,
        "photo_count": 25,
        "last_photo_date": "2024-01-15",
        "photo_types": ["job", "team", "branded", "stock"],
        "videos_present": false,
        "posts_active": true,
        "last_post_date": "2024-01-10",
        "last_post_type": "offer | update | service",
        "qa_present": true,
        "qa_owner_answered": true,
        "messaging_enabled": true,
        "website_linked": true,
        "website_type": "local | directory | landing_page",
        "badges_certifications": ["string"]
      }
    ],
    "patterns": {
      "top_3_common": {
        "categories": ["string"],
        "review_count_range": "50-200",
        "photo_frequency": "weekly | monthly",
        "keyword_usage": "string"
      },
      "outliers": [
        { "competitor": "#2", "observation": "Ranks high with only 20 reviews" }
      ]
    },
    "ranking_factors_hypothesis": [
      { "rank": 1, "factor": "Review authority", "reasoning": "string" },
      { "rank": 2, "factor": "Category relevance", "reasoning": "string" },
      { "rank": 3, "factor": "Proximity", "reasoning": "string" },
      { "rank": 4, "factor": "Keyword usage", "reasoning": "string" },
      { "rank": 5, "factor": "Activity/freshness", "reasoning": "string" }
    ],
    "status": "complete | error",
    "error_message": null
  },
  "gbp_ranking_factors": {
    "levers": [
      {
        "rank": 1,
        "lever": "string",
        "evidence": "Competitors #1, #3 demonstrate this by...",
        "why_it_matters": "string"
      }
    ],
    "status": "complete | error",
    "error_message": null
  },
  "onpage_audit": {
    "url_audited": "https://example.com",
    "final_url": "https://example.com (after redirects)",
    "summary": [
      "Big win or problem #1",
      "Big win or problem #2",
      "Big win or problem #3",
      "Big win or problem #4",
      "Big win or problem #5"
    ],
    "findings": [
      {
        "area": "Title tag",
        "checked": "Exact element checked",
        "found": "Actual content found (quoted)",
        "status": "correct | wrong | needs_improvement | unknown",
        "why_it_matters": "One line explanation",
        "priority": "P0 | P1 | P2"
      }
    ],
    "actions": [
      {
        "priority": "P0",
        "task": "Clear task description",
        "recommendation": "Exact replacement text or specific instruction",
        "effort": "S | M | L",
        "impact": "low | medium | high",
        "notes": "Dependencies or context"
      }
    ],
    "serp_rewrites": {
      "title_options": ["Option 1 (50-60 chars)", "Option 2"],
      "meta_description_options": ["Option 1 (140-160 chars)", "Option 2"]
    },
    "page_speed": {
      "performance_score": 75,
      "lcp": { "value": "2.5s", "status": "pass | fail" },
      "inp": { "value": "200ms", "status": "pass | fail" },
      "cls": { "value": "0.1", "status": "pass | fail" }
    },
    "status": "complete | error",
    "error_message": null
  },
  "schema_audit": {
    "existing": [
      {
        "schema_type": "LocalBusiness",
        "exists": true,
        "key_fields_present": ["name", "address", "telephone"],
        "verdict": "helpful | bare_minimum | broken"
      }
    ],
    "missing": [
      {
        "schema_type": "Service",
        "why_it_matters": "string",
        "priority": "high | medium | low"
      }
    ],
    "examples": [
      {
        "schema_type": "LocalBusiness",
        "priority": "high",
        "json_ld": "{ JSON-LD code block with placeholders }"
      }
    ],
    "status": "complete | error",
    "error_message": null
  },
  "rankability": {
    "target_url": "https://example.com",
    "target_keyword": "string",
    "top_3_competitors": [
      { "rank": 1, "url": "string", "title": "string" }
    ],
    "verdict": "deserves_higher | neutral | deserves_lower",
    "primary_reason": "1-2 sentence explanation",
    "top_improvement": "Single most impactful change",
    "actions": [
      {
        "rank": 1,
        "action": "Specific action item",
        "expected_impact": "string"
      }
    ],
    "status": "complete | error",
    "error_message": null
  },
  "metadata": {
    "audit_id": "uuid",
    "created_at": "ISO timestamp",
    "completed_at": "ISO timestamp",
    "total_duration_ms": 12345,
    "sections_completed": 5,
    "sections_failed": 0
  }
}
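
Every section in the structure above carries the same `status` / `error_message` pair, and `metadata` counts completed and failed sections. A small sketch of how those counters could be derived; the type and function names are hypothetical.

```typescript
type SectionStatus = "complete" | "error";

// The pair of fields shared by all five report sections.
interface SectionEnvelope {
  status: SectionStatus;
  error_message: string | null;
}

// Top-level section keys, in report order.
const SECTION_KEYS = [
  "gbp_audit",
  "gbp_ranking_factors",
  "onpage_audit",
  "schema_audit",
  "rankability",
] as const;

type SectionKey = (typeof SECTION_KEYS)[number];
type ReportSections = Record<SectionKey, SectionEnvelope>;

// Derives metadata.sections_completed and metadata.sections_failed.
function countSections(sections: ReportSections): {
  sections_completed: number;
  sections_failed: number;
} {
  let completed = 0;
  let failed = 0;
  for (const key of SECTION_KEYS) {
    if (sections[key].status === "complete") completed += 1;
    else failed += 1;
  }
  return { sections_completed: completed, sections_failed: failed };
}
```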

DATA MODEL (PRISMA)

model User {
  id        String   @id @default(cuid())
  email     String   @unique
  name      String?
  image     String?
  role      Role     @default(USER)
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt
  audits    Audit[]
  feedback  Feedback[]
}

enum Role {
  USER
  ADMIN
}

model Audit {
  id             String       @id @default(cuid())
  userId         String
  user           User         @relation(fields: [userId], references: [id], onDelete: Cascade)
  createdAt      DateTime     @default(now())
  updatedAt      DateTime     @updatedAt
  deletedAt      DateTime?
  status         AuditStatus  @default(QUEUED)
  inputs         Json
  reportJson     Json?
  reportHtml     String?      @db.Text
  pdfUrl         String?
  emailStatus    EmailStatus  @default(NOT_SENT)
  lastEmailedAt  DateTime?
  errorMessage   String?      @db.Text
  feedback       Feedback[]
  @@index([userId])
  @@index([status])
  @@index([createdAt])
}

enum AuditStatus {
  QUEUED
  RUNNING
  COMPLETE
  FAILED
}

enum EmailStatus {
  NOT_SENT
  SENT
  FAILED
}

model Feedback {
  id        String   @id @default(cuid())
  auditId   String
  audit     Audit    @relation(fields: [auditId], references: [id], onDelete: Cascade)
  userId    String
  user      User     @relation(fields: [userId], references: [id], onDelete: Cascade)
  rating    Int
  comment   String?  @db.Text
  createdAt DateTime @default(now())
  @@unique([auditId, userId])
  @@index([auditId])
  @@index([rating])
}

REPORT RENDERING (ROUTES)

User Routes:
- GET /audits — Dashboard list
- GET /audits/new — New audit wizard
- GET /audits/[id] — HTML report view
- GET /audits/[id]/pdf — Serve PDF
- DELETE /api/audits/[id] — Soft delete
- POST /api/audits/[id]/feedback — Submit feedback

API Routes:
- POST /api/audits — Create new audit
- GET /api/audits/[id]/status — Status polling

Admin Routes:
- GET /admin/audits — All audits
- GET /admin/audits/[id] — Admin view
- POST /api/admin/audits/[id]/resend-email — Re-send email
- GET /admin/feedback — All feedback

EMAIL CONTENT

On Success:
- Subject: Your SEO Audit for {{website_url}} is Ready
- Body: Branding header, key findings bullets, CTA to view report, PDF download link
- Footer: Support email

On Failure:
- Subject: Your SEO Audit for {{website_url}} encountered an issue
- Body: Apology + error description + support contact
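
A minimal sketch of the success email as plain text, assuming the caller supplies the key findings and a support address (the spec does not name one). `buildSuccessEmail` is hypothetical and does not call SendGrid; it only builds the message fields. Rejecting fewer than 3 findings is a design choice, not a spec requirement.

```typescript
interface SuccessEmailInput {
  websiteUrl: string;
  keyFindings: string[]; // the spec asks for a 3 to 6 bullet summary
  reportUrl: string;     // link to the in-app HTML report
  pdfUrl: string;
  supportEmail: string;  // assumption: provided by configuration
}

interface EmailMessage {
  from: string;
  subject: string;
  text: string;
}

const APP_NAME = "AI SEO Audit";
const FROM_ADDRESS = `${APP_NAME} <noreply@yourdomain.com>`;

function buildSuccessEmail(input: SuccessEmailInput): EmailMessage {
  if (input.keyFindings.length < 3) {
    throw new Error("Success email needs at least 3 key findings");
  }
  // Cap the summary at 6 bullets.
  const bullets = input.keyFindings.slice(0, 6).map((finding) => `- ${finding}`);
  return {
    from: FROM_ADDRESS,
    subject: `Your SEO Audit for ${input.websiteUrl} is Ready`,
    text: [
      APP_NAME,
      "",
      "Key findings:",
      ...bullets,
      "",
      `View report: ${input.reportUrl}`,
      `Download PDF: ${input.pdfUrl}`,
      "",
      `Support: ${input.supportEmail}`,
    ].join("\n"),
  };
}
```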

UI/UX REQUIREMENTS
- Responsive: mobile-first
- Dashboard: paginated (20/page), sorted newest first
- Status badges: Queued (gray), Running (blue), Complete (green), Failed (red)
- Status polling: every 5 seconds while QUEUED/RUNNING
- Report view: tabs/accordion per section, feedback form at bottom
- PDF: branded header/footer, page numbers
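
The badge and polling rules can be captured in a couple of small helpers. This is a sketch with hypothetical names; the color strings are plain labels, and mapping them to theme.css variables is left to the component.

```typescript
type AuditStatus = "QUEUED" | "RUNNING" | "COMPLETE" | "FAILED";

const POLL_INTERVAL_MS = 5000;

// Badge label and color per status, as listed in the requirements.
const STATUS_BADGES: Record<AuditStatus, { label: string; color: string }> = {
  QUEUED: { label: "Queued", color: "gray" },
  RUNNING: { label: "Running", color: "blue" },
  COMPLETE: { label: "Complete", color: "green" },
  FAILED: { label: "Failed", color: "red" },
};

// Poll only while the audit can still change state.
function shouldPoll(status: AuditStatus): boolean {
  return status === "QUEUED" || status === "RUNNING";
}

// Delay before the next status request, or null to stop polling.
function nextPollDelay(status: AuditStatus): number | null {
  return shouldPoll(status) ? POLL_INTERVAL_MS : null;
}
```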

SECURITY
- All reads verify user ownership or admin role
- Admin routes require role=ADMIN
- Audit inputs sanitized
- PDF URLs behind auth
- Rate limit: 10 audits/user/day
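
The ownership and rate-limit rules might reduce to checks like the following. Several details are assumptions the spec leaves open: soft-deleted audits are hidden from their owners but still visible to admins, and "day" is counted by whatever window the caller uses when producing the count.

```typescript
interface SessionUser {
  id: string;
  role: "USER" | "ADMIN";
}

interface AuditRef {
  userId: string;
  deletedAt: Date | null;
}

// Reads are allowed for admins, or for the owner of a non-deleted audit.
function canReadAudit(user: SessionUser, audit: AuditRef): boolean {
  if (user.role === "ADMIN") return true;
  return audit.userId === user.id && audit.deletedAt === null;
}

const DAILY_AUDIT_LIMIT = 10;

// `auditsCreatedToday` is assumed to come from a count query scoped to the user.
function isRateLimited(auditsCreatedToday: number): boolean {
  return auditsCreatedToday >= DAILY_AUDIT_LIMIT;
}
```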

DEPLOYMENT
- Web app: Vercel
- Worker: Railway (separate long-running Node process)

See also: Codex Prompt v1.0 (Archive), the original version, embedded as a toggle on the parent project page.