The full product specification used to build AI SEO Audit. Covers architecture, data model, job flow, report structure, UX requirements, and deployment. Refined collaboratively with Claude Opus 4.5 from the v1.0 baseline.
Version: 2.0
Refined with: Claude Opus 4.5
Based on: Codex Prompt v1.0 (original)
Status: Current (used for active development)
You are an expert full-stack engineer. Build a production-ready internal web app for SEO Audits.
APP IDENTITY
- Name: AI SEO Audit
- Display in: navbar, page titles (<title>), email headers, PDF headers
- Email "from": AI SEO Audit <noreply@yourdomain.com>
GOAL
- Signed-in users request an SEO audit with minimal inputs via a stepped wizard UI.
- Audit runs asynchronously via a job queue.
- On completion: send an email via SendGrid containing:
  - 3–6 bullet summary of key findings
  - A link to the HTML report in-app
  - A link to the PDF report
STACK (LOCKED)
- Next.js 14+ (App Router) + TypeScript + Tailwind
- Postgres + Prisma
- Auth: NextAuth v4 (Google OAuth ONLY)
- Email: SendGrid
- Async job queue: pg-boss (backed by Postgres; NO Redis)
AUTH IMPLEMENTATION (LOCKED)
- Use NextAuth route handler at /app/api/auth/[...nextauth]/route.ts
- Use getServerSession(authOptions) for route protection in API routes and server components
ADDITIONAL LIBRARIES (LOCKED)
- cheerio (HTML parsing)
- axios (HTTP requests)
- puppeteer (PDF generation from HTML)
- openai (LLM analysis)
STYLING (LOCKED)
- Use the attached theme.css as the base design system
- All components must use CSS variables defined in theme.css
- Do not override or create conflicting styles
- Tailwind config should extend (not replace) these theme tokens via tailwind.config.ts
SOURCE DOCUMENT (CANONICAL)
Use the attached PDF document as the canonical structure, steps, and output formats for each audit section. Do not invent extra sections or change table formats.
AUDIT SECTIONS (MUST MATCH PDF)
1. GBP Audit
2. GBP Ranking Factors
3. Complete Onpage SEO Audit (homepage or user-submitted URL)
4. Schema Audit
5. Rankability Test
GBP AUDIT ADAPTATION
The PDF references discovering competitors via Google Maps search. This is NOT implemented due to ToS concerns and reliability issues. Instead:
- Treat the user-provided competitor_gbp_urls as the "top 5 competitors"
- Label them Competitor #1 through #5 in the order provided
- Attempt best-effort extraction from the user-provided GBP URLs. If blocked/unavailable, set fields to unknown and continue; do not attempt automated Google Maps search.
- If the user provides fewer than 5 URLs, label only those provided (e.g., Competitor #1–#3)
- No Google Maps SERP scraping. No bypassing bot protections.
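The labeling rule above can be sketched as follows. This is a minimal illustration, not the spec's implementation: the CompetitorStub type, the source_url field, and the labelCompetitors helper are hypothetical names, and only three of the report_json competitor fields are shown. Every extracted field starts as "unknown" so that a blocked or unavailable GBP page leaves the audit able to continue.

```typescript
// Hypothetical stub type: a small subset of the competitor fields in report_json.
interface CompetitorStub {
  label: string;
  source_url: string; // assumption: kept for traceability, not part of the spec's report_json
  name: string;
  primary_category: string;
  review_count: number | "unknown";
}

const MAX_COMPETITORS = 5;

// Labels competitors in the order the user provided them.
// No search or discovery happens here; the list is taken as given.
function labelCompetitors(urls: string[]): CompetitorStub[] {
  return urls.slice(0, MAX_COMPETITORS).map((url, index) => ({
    label: `Competitor #${index + 1}`,
    source_url: url,
    // Defaults stay "unknown" until best-effort extraction succeeds.
    name: "unknown",
    primary_category: "unknown",
    review_count: "unknown",
  }));
}
```

With three URLs the result carries the labels Competitor #1 through Competitor #3 and nothing is invented for the missing slots.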
MINIMAL INPUTS (WIZARD)
Wizard flow: 3 steps
Step 1 - Basic Info:
- website_url (required; validate as valid URL)
- primary_keyword (required; text input)
- city_state (required; text input; label as "City, State/Province")
- gbp_search_phrase (required; text input; e.g., "plumber near me")
- business_type (required; dropdown with options: Restaurant, Retail Store, Service Business, Medical Practice, Legal Services, Home Services, Other)
Step 2 - Competitors:
- competitor_gbp_urls (required; list of 3–5 Google Business Profile URLs)
  - Minimum: 3 URLs
  - Maximum: 5 URLs
  - Validate: must be valid Google Maps/Business URLs
- UI must include helper text: "To find competitor GBP URLs: Search your keyword on Google Maps, click on a competitor's listing, and copy the URL from your browser's address bar."
Step 3 - Review & Submit:
- Display all inputs for user confirmation
- Submit button creates audit and redirects to dashboard
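The wizard rules above could be enforced by a single validation function shared by the UI and POST /api/audits. The sketch below is one possible shape. The function names are hypothetical, and the set of hosts accepted as "Google Maps/Business URLs" (google.com/maps, maps.google.com, maps.app.goo.gl, g.page) is an assumption, since the spec does not list them.

```typescript
interface AuditInputs {
  website_url: string;
  primary_keyword: string;
  city_state: string;
  gbp_search_phrase: string;
  business_type: string;
  competitor_gbp_urls: string[];
}

const BUSINESS_TYPES: string[] = [
  "Restaurant", "Retail Store", "Service Business", "Medical Practice",
  "Legal Services", "Home Services", "Other",
];

function isHttpUrl(value: string): boolean {
  try {
    const url = new URL(value);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    return false; // not parseable as a URL
  }
}

// Assumption: these hosts and paths are treated as GBP links; adjust as needed.
function isGoogleBusinessUrl(value: string): boolean {
  if (!isHttpUrl(value)) return false;
  const { hostname, pathname } = new URL(value);
  const host = hostname.replace(/^www\./, "");
  if (host === "maps.app.goo.gl" || host === "g.page" || host === "maps.google.com") return true;
  return host === "google.com" && pathname.startsWith("/maps");
}

// Returns a list of human-readable errors; an empty list means the inputs are valid.
function validateAuditInputs(inputs: AuditInputs): string[] {
  const errors: string[] = [];
  if (!isHttpUrl(inputs.website_url)) errors.push("website_url must be a valid URL");
  for (const field of ["primary_keyword", "city_state", "gbp_search_phrase"] as const) {
    if (inputs[field].trim() === "") errors.push(`${field} is required`);
  }
  if (!BUSINESS_TYPES.includes(inputs.business_type)) {
    errors.push("business_type must be one of the listed options");
  }
  const count = inputs.competitor_gbp_urls.length;
  if (count < 3 || count > 5) errors.push("competitor_gbp_urls must contain 3 to 5 URLs");
  inputs.competitor_gbp_urls.forEach((url, i) => {
    if (!isGoogleBusinessUrl(url)) errors.push(`competitor_gbp_urls[${i}] is not a Google Maps/Business URL`);
  });
  return errors;
}
```

Returning every error at once, instead of throwing on the first, lets the Review & Submit step show all problems together.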
EXECUTION STRATEGY
- Competitor discovery: User-provided URLs only; NO Google Maps scraping
- gbp_search_phrase is used only for labeling/context + LLM analysis; do not perform automated Google/Maps searches
- Page fetching: axios to fetch HTML
- HTML parsing: cheerio
- Schema extraction: Parse JSON-LD blocks from page source using cheerio
- PageSpeed data: Google PageSpeed Insights API (free tier)
- PDF generation: Puppeteer renders report_html to PDF
- File storage (PDF): Local filesystem for dev; S3-compatible interface for prod
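As an illustration of the schema extraction step, the sketch below parses the raw text of JSON-LD blocks and collects the schema types found. It deliberately leaves out the cheerio selection so that it has no dependencies: the rawBlocks input is assumed to be the text content of each script tag with type "application/ld+json", gathered beforehand. The function names are hypothetical.

```typescript
type JsonLdNode = Record<string, unknown>;

function isObject(value: unknown): value is JsonLdNode {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Parses each raw block, skipping malformed JSON instead of failing the audit,
// and flattens top-level arrays and "@graph" containers into one node list.
function parseJsonLdBlocks(rawBlocks: string[]): JsonLdNode[] {
  const nodes: JsonLdNode[] = [];
  for (const raw of rawBlocks) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      continue; // malformed block: ignore and keep going
    }
    const items: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
    for (const item of items) {
      if (!isObject(item)) continue;
      const graph = item["@graph"];
      if (Array.isArray(graph)) {
        for (const entry of graph) if (isObject(entry)) nodes.push(entry);
      } else {
        nodes.push(item);
      }
    }
  }
  return nodes;
}

// Collects the distinct "@type" values, which may be a string or an array of strings.
function schemaTypes(nodes: JsonLdNode[]): string[] {
  const types = new Set<string>();
  for (const node of nodes) {
    const value = node["@type"];
    const list: unknown[] = Array.isArray(value) ? value : [value];
    for (const entry of list) if (typeof entry === "string") types.add(entry);
  }
  return Array.from(types);
}
```

The resulting type list is the kind of deterministic input the Schema Audit section could pass to the LLM alongside the key fields present on each node.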
ANALYSIS APPROACH
- Data extraction: Deterministic code (fetching, parsing, API calls)
- Qualitative analysis (pattern recognition, ranking factor hypothesis, verdicts): Call OpenAI API with structured prompts derived from the PDF
- LLM Model: gpt-4o (or gpt-4o-mini for cost savings; make configurable via env var)
- Token budget: ~10k tokens per audit section max
- Prompt construction: For each section requiring LLM analysis, construct a prompt that includes the relevant section instructions from the PDF, the extracted data as structured input, and clear output format requirements matching REPORT_JSON STRUCTURE
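A prompt builder along the lines described above might look like the sketch below. Several things here are assumptions rather than part of the spec: the OPENAI_MODEL variable name, the function names, the prompt layout, and the four-characters-per-token estimate, which is only a rough heuristic and not a real tokenizer.

```typescript
interface SectionPromptInput {
  sectionName: string;      // e.g. "Schema Audit"
  pdfInstructions: string;  // section instructions copied from the canonical PDF
  extractedData: unknown;   // deterministic extraction output
  outputShape: string;      // description of the matching report_json fragment
}

const SECTION_TOKEN_BUDGET = 10000;

// Assumption: the model is read from a hypothetical OPENAI_MODEL variable.
// Unknown values fall back to gpt-4o.
function resolveModel(env: Record<string, string | undefined>): string {
  const allowed = ["gpt-4o", "gpt-4o-mini"];
  const requested = env.OPENAI_MODEL;
  return requested !== undefined && allowed.includes(requested) ? requested : "gpt-4o";
}

// Combines the three ingredients named in the spec into one prompt string.
function buildSectionPrompt(input: SectionPromptInput): string {
  return [
    `SECTION: ${input.sectionName}`,
    "INSTRUCTIONS (from the source PDF):",
    input.pdfInstructions,
    "EXTRACTED DATA (JSON):",
    JSON.stringify(input.extractedData, null, 2),
    "OUTPUT FORMAT (must match REPORT_JSON STRUCTURE):",
    input.outputShape,
    "Respond with JSON only.",
  ].join("\n");
}

// Rough heuristic only (about 4 characters per token); not a tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function exceedsBudget(prompt: string): boolean {
  return estimateTokens(prompt) > SECTION_TOKEN_BUDGET;
}
```

Passing the environment in as an argument keeps the functions easy to test; in the worker the caller would presumably hand over process.env.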
JOB BEHAVIOR
- Timeout: 10 minutes max per audit job
- Retries: 2 retries on transient failures (network errors, API timeouts)
- Partial failure: If one section fails, continue with remaining sections; mark failed section as "status": "error" with "error_message" in report_json
- Concurrency: Process one audit at a time per worker (configurable)
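The retry and partial-failure rules above can be sketched as two small wrappers. This version retries in-process; the same policy could instead be delegated to the queue. The "transient" marker on errors and the function names are hypothetical conventions for this example, and the 10-minute job timeout is not shown.

```typescript
interface SectionResult<T> {
  status: "complete" | "error";
  error_message: string | null;
  data: T | null;
}

// Assumption: transient failures (network errors, API timeouts) are tagged
// by the caller with a boolean "transient" property on the thrown error.
function isTransient(error: unknown): boolean {
  return typeof error === "object" && error !== null &&
    (error as { transient?: boolean }).transient === true;
}

// Runs the task once, then retries up to maxRetries times on transient errors only.
async function withRetries<T>(task: () => Promise<T>, maxRetries = 2): Promise<T> {
  let retriesUsed = 0;
  for (;;) {
    try {
      return await task();
    } catch (error) {
      if (!isTransient(error) || retriesUsed >= maxRetries) throw error;
      retriesUsed += 1;
    }
  }
}

// Isolates one audit section: a failure becomes an "error" result
// instead of aborting the remaining sections.
async function runSection<T>(task: () => Promise<T>): Promise<SectionResult<T>> {
  try {
    const data = await withRetries(task);
    return { status: "complete", error_message: null, data };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { status: "error", error_message: message, data: null };
  }
}
```

A worker would call runSection once per audit section and merge each result into report_json, so one failed section still yields a report with the other four.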
JOB FLOW (MUST IMPLEMENT)
1. API Endpoint POST /api/audits:
  - Validate auth (must be signed in)
  - Validate inputs
  - Create Audit record with status=queued
  - Enqueue pg-boss job with audit_id
  - Return { audit_id, status: "queued" }
2. Worker Process (/worker):
  - Poll pg-boss queue
  - On job received:
    - Set status=running
    - Execute audit steps (fetch pages, parse data, call APIs, call LLM)
    - Build report_json following REPORT_JSON STRUCTURE
    - Render report_html from report_json
    - Generate PDF from report_html using Puppeteer
    - Store PDF (local or S3) and save pdf_url
    - Set status=complete
    - Send SendGrid email with summary + links
    - Update email_status=sent, last_emailed_at=now
3. Error Handling:
  - If unrecoverable error: status=failed, error_message=<error details>
  - Optionally send failure notification email
  - Log full error for debugging
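The worker steps above can be sketched as one handler whose side effects are injected, which keeps the ordering of status updates easy to test. The WorkerDeps interface and all of its method names are hypothetical; pg-boss, Prisma, Puppeteer, and SendGrid calls would sit behind them. Treating an email failure as email_status=failed without failing the audit is an assumption of this sketch.

```typescript
type JobStatus = "queued" | "running" | "complete" | "failed";

interface AuditUpdate {
  status?: JobStatus;
  pdf_url?: string;
  email_status?: "sent" | "failed";
  last_emailed_at?: Date;
  error_message?: string;
}

// Hypothetical seams around the database, report builder, PDF renderer, storage, and mailer.
interface WorkerDeps {
  updateAudit(auditId: string, update: AuditUpdate): Promise<void>;
  buildReportJson(auditId: string): Promise<unknown>;
  renderHtml(reportJson: unknown): string;
  renderPdf(html: string): Promise<Uint8Array>;
  storePdf(auditId: string, pdf: Uint8Array): Promise<string>;
  sendEmail(auditId: string, pdfUrl: string): Promise<void>;
}

async function processAuditJob(auditId: string, deps: WorkerDeps): Promise<JobStatus> {
  let pdfUrl = "";
  try {
    await deps.updateAudit(auditId, { status: "running" });
    const reportJson = await deps.buildReportJson(auditId);
    const html = deps.renderHtml(reportJson);
    const pdf = await deps.renderPdf(html);
    pdfUrl = await deps.storePdf(auditId, pdf);
    await deps.updateAudit(auditId, { status: "complete", pdf_url: pdfUrl });
  } catch (error) {
    // Unrecoverable error: record it and stop.
    const message = error instanceof Error ? error.message : String(error);
    await deps.updateAudit(auditId, { status: "failed", error_message: message });
    return "failed";
  }
  try {
    await deps.sendEmail(auditId, pdfUrl);
    await deps.updateAudit(auditId, { email_status: "sent", last_emailed_at: new Date() });
  } catch {
    // Assumption: a finished audit stays complete even if the email fails.
    await deps.updateAudit(auditId, { email_status: "failed" });
  }
  return "complete";
}
```

Keeping the email step outside the main try block means the admin resend-email route has something to recover: a complete audit whose email_status is failed.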
REPORT_JSON STRUCTURE
HTML tables must match the PDF's headings and row order; report_json must be a structured equivalent of the same data:
{
  "gbp_audit": {
    "competitors": [
      {
        "label": "Competitor #1",
        "name": "string",
        "keywords_in_name": { "present": true, "keywords": ["keyword1"] },
        "primary_category": "string",
        "secondary_categories": ["string"],
        "address": "string",
        "city_borough": "string",
        "service_areas": ["string"],
        "location_type": "physical | SAB",
        "review_count": 123,
        "average_rating": 4.5,
        "recent_reviews_30d": 5,
        "review_keywords": ["keyword1", "keyword2"],
        "description_present": true,
        "description_length": "short | medium | long",
        "description_keywords": ["keyword"],
        "services_filled": true,
        "services_count": 10,
        "services_notable": ["Service Name"],
        "products_used": false,
        "photo_count": 25,
        "last_photo_date": "2024-01-15",
        "photo_types": ["job", "team", "branded", "stock"],
        "videos_present": false,
        "posts_active": true,
        "last_post_date": "2024-01-10",
        "last_post_type": "offer | update | service",
        "qa_present": true,
        "qa_owner_answered": true,
        "messaging_enabled": true,
        "website_linked": true,
        "website_type": "local | directory | landing_page",
        "badges_certifications": ["string"]
      }
    ],
    "patterns": {
      "top_3_common": {
        "categories": ["string"],
        "review_count_range": "50-200",
        "photo_frequency": "weekly | monthly",
        "keyword_usage": "string"
      },
      "outliers": [
        { "competitor": "#2", "observation": "Ranks high with only 20 reviews" }
      ]
    },
    "ranking_factors_hypothesis": [
      { "rank": 1, "factor": "Review authority", "reasoning": "string" },
      { "rank": 2, "factor": "Category relevance", "reasoning": "string" },
      { "rank": 3, "factor": "Proximity", "reasoning": "string" },
      { "rank": 4, "factor": "Keyword usage", "reasoning": "string" },
      { "rank": 5, "factor": "Activity/freshness", "reasoning": "string" }
    ],
    "status": "complete | error",
    "error_message": null
  },
  "gbp_ranking_factors": {
    "levers": [
      {
        "rank": 1,
        "lever": "string",
        "evidence": "Competitors #1, #3 demonstrate this by...",
        "why_it_matters": "string"
      }
    ],
    "status": "complete | error",
    "error_message": null
  },
  "onpage_audit": {
    "url_audited": "https://example.com",
    "final_url": "https://example.com (after redirects)",
    "summary": [
      "Big win or problem #1",
      "Big win or problem #2",
      "Big win or problem #3",
      "Big win or problem #4",
      "Big win or problem #5"
    ],
    "findings": [
      {
        "area": "Title tag",
        "checked": "Exact element checked",
        "found": "Actual content found (quoted)",
        "status": "correct | wrong | needs_improvement | unknown",
        "why_it_matters": "One line explanation",
        "priority": "P0 | P1 | P2"
      }
    ],
    "actions": [
      {
        "priority": "P0",
        "task": "Clear task description",
        "recommendation": "Exact replacement text or specific instruction",
        "effort": "S | M | L",
        "impact": "low | medium | high",
        "notes": "Dependencies or context"
      }
    ],
    "serp_rewrites": {
      "title_options": ["Option 1 (50-60 chars)", "Option 2"],
      "meta_description_options": ["Option 1 (140-160 chars)", "Option 2"]
    },
    "page_speed": {
      "performance_score": 75,
      "lcp": { "value": "2.5s", "status": "pass | fail" },
      "inp": { "value": "200ms", "status": "pass | fail" },
      "cls": { "value": "0.1", "status": "pass | fail" }
    },
    "status": "complete | error",
    "error_message": null
  },
  "schema_audit": {
    "existing": [
      {
        "schema_type": "LocalBusiness",
        "exists": true,
        "key_fields_present": ["name", "address", "telephone"],
        "verdict": "helpful | bare_minimum | broken"
      }
    ],
    "missing": [
      {
        "schema_type": "Service",
        "why_it_matters": "string",
        "priority": "high | medium | low"
      }
    ],
    "examples": [
      {
        "schema_type": "LocalBusiness",
        "priority": "high",
        "json_ld": "{ JSON-LD code block with placeholders }"
      }
    ],
    "status": "complete | error",
    "error_message": null
  },
  "rankability": {
    "target_url": "https://example.com",
    "target_keyword": "string",
    "top_3_competitors": [
      { "rank": 1, "url": "string", "title": "string" }
    ],
    "verdict": "deserves_higher | neutral | deserves_lower",
    "primary_reason": "1-2 sentence explanation",
    "top_improvement": "Single most impactful change",
    "actions": [
      {
        "rank": 1,
        "action": "Specific action item",
        "expected_impact": "string"
      }
    ],
    "status": "complete | error",
    "error_message": null
  },
  "metadata": {
    "audit_id": "uuid",
    "created_at": "ISO timestamp",
    "completed_at": "ISO timestamp",
    "total_duration_ms": 12345,
    "sections_completed": 5,
    "sections_failed": 0
  }
}
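The metadata counters in the structure above can be derived from the per-section status fields rather than tracked by hand. The sketch below shows one way to do that. The type and function names are hypothetical, and counting a missing section as failed is an assumption of this example.

```typescript
// The five section keys used in report_json.
const SECTION_KEYS = [
  "gbp_audit",
  "gbp_ranking_factors",
  "onpage_audit",
  "schema_audit",
  "rankability",
] as const;

type SectionKey = (typeof SECTION_KEYS)[number];

interface SectionStatus {
  status: "complete" | "error";
  error_message: string | null;
}

type ReportSections = Partial<Record<SectionKey, SectionStatus>>;

interface SectionCounts {
  sections_completed: number;
  sections_failed: number;
}

// Derives the metadata counters from the section statuses.
function countSections(report: ReportSections): SectionCounts {
  let completed = 0;
  let failed = 0;
  for (const key of SECTION_KEYS) {
    const section = report[key];
    if (section !== undefined && section.status === "complete") {
      completed += 1;
    } else {
      failed += 1; // assumption: a section that never ran counts as failed
    }
  }
  return { sections_completed: completed, sections_failed: failed };
}
```

Because the two counters always sum to five, a mismatch between them and the section statuses cannot creep into the stored report.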
DATA MODEL (PRISMA)
model User {
  id        String     @id @default(cuid())
  email     String     @unique
  name      String?
  image     String?
  role      Role       @default(USER)
  createdAt DateTime   @default(now())
  updatedAt DateTime   @updatedAt
  audits    Audit[]
  feedback  Feedback[]
}

enum Role {
  USER
  ADMIN
}

model Audit {
  id            String      @id @default(cuid())
  userId        String
  user          User        @relation(fields: [userId], references: [id], onDelete: Cascade)
  createdAt     DateTime    @default(now())
  updatedAt     DateTime    @updatedAt
  deletedAt     DateTime?
  status        AuditStatus @default(QUEUED)
  inputs        Json
  reportJson    Json?
  reportHtml    String?     @db.Text
  pdfUrl        String?
  emailStatus   EmailStatus @default(NOT_SENT)
  lastEmailedAt DateTime?
  errorMessage  String?     @db.Text
  feedback      Feedback[]

  @@index([userId])
  @@index([status])
  @@index([createdAt])
}

enum AuditStatus {
  QUEUED
  RUNNING
  COMPLETE
  FAILED
}

enum EmailStatus {
  NOT_SENT
  SENT
  FAILED
}

model Feedback {
  id        String   @id @default(cuid())
  auditId   String
  audit     Audit    @relation(fields: [auditId], references: [id], onDelete: Cascade)
  userId    String
  user      User     @relation(fields: [userId], references: [id], onDelete: Cascade)
  rating    Int
  comment   String?  @db.Text
  createdAt DateTime @default(now())

  @@unique([auditId, userId])
  @@index([auditId])
  @@index([rating])
}
REPORT RENDERING (ROUTES)
User Routes:
- GET /audits → Dashboard list
- GET /audits/new → New audit wizard
- GET /audits/[id] → HTML report view
- GET /audits/[id]/pdf → Serve PDF
- DELETE /api/audits/[id] → Soft delete
- POST /api/audits/[id]/feedback → Submit feedback
API Routes:
- POST /api/audits → Create new audit
- GET /api/audits/[id]/status → Status polling
Admin Routes:
- GET /admin/audits → All audits
- GET /admin/audits/[id] → Admin view
- POST /api/admin/audits/[id]/resend-email → Re-send email
- GET /admin/feedback → All feedback
EMAIL CONTENT
On Success:
- Subject: Your SEO Audit for {{website_url}} is Ready
- Body: Branding header, key findings bullets, CTA to view report, PDF download link
- Footer: Support email
On Failure:
- Subject: Your SEO Audit for {{website_url}} encountered an issue
- Body: Apology + error description + support contact
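The success email described above could be assembled by a small pure function before it is handed to SendGrid. The sketch below builds only the content. The function and field names are hypothetical, the findings are assumed to come from the finished report, and the sending step is left out. It caps the summary at six bullets; enforcing the three-bullet minimum is left to the caller.

```typescript
interface SuccessEmail {
  subject: string;
  bullets: string[];
  reportUrl: string;
  pdfUrl: string;
}

const MAX_BULLETS = 6;

// baseUrl is the app origin, e.g. "https://app.example.com" (hypothetical).
function buildSuccessEmail(
  websiteUrl: string,
  findings: string[],
  baseUrl: string,
  auditId: string,
): SuccessEmail {
  const bullets = findings
    .map((finding) => finding.trim())
    .filter((finding) => finding !== "")
    .slice(0, MAX_BULLETS);
  const origin = baseUrl.replace(/\/+$/, ""); // avoid a double slash in links
  return {
    subject: `Your SEO Audit for ${websiteUrl} is Ready`,
    bullets,
    reportUrl: `${origin}/audits/${auditId}`,
    pdfUrl: `${origin}/audits/${auditId}/pdf`,
  };
}
```

Both links point at the authenticated routes listed under User Routes, so the email never exposes a raw storage URL.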
UI/UX REQUIREMENTS
- Responsive: mobile-first
- Dashboard: paginated (20/page), sorted newest first
- Status badges: Queued (gray), Running (blue), Complete (green), Failed (red)
- Status polling: every 5 seconds while QUEUED/RUNNING
- Report view: tabs/accordion per section, feedback form at bottom
- PDF: branded header/footer, page numbers
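The polling, badge, and pagination rules above reduce to a few constants and helpers, sketched below. The names are hypothetical. The skip/take pair mirrors the pagination options that Prisma's findMany accepts, though the query itself is not shown.

```typescript
type AuditStatusValue = "QUEUED" | "RUNNING" | "COMPLETE" | "FAILED";

const POLL_INTERVAL_MS = 5000;
const PAGE_SIZE = 20;

// Badge colors from the UI requirements.
const BADGE_COLORS: Record<AuditStatusValue, string> = {
  QUEUED: "gray",
  RUNNING: "blue",
  COMPLETE: "green",
  FAILED: "red",
};

// Polling continues only while the audit can still change state.
function shouldPoll(status: AuditStatusValue): boolean {
  return status === "QUEUED" || status === "RUNNING";
}

// Converts a 1-based page number into offset pagination values.
// Invalid or fractional pages are clamped to the nearest valid page.
function pageQuery(page: number): { skip: number; take: number } {
  const safePage = Number.isFinite(page) ? Math.max(1, Math.floor(page)) : 1;
  return { skip: (safePage - 1) * PAGE_SIZE, take: PAGE_SIZE };
}
```

On the client, a timer firing every POLL_INTERVAL_MS would call GET /api/audits/[id]/status and stop as soon as shouldPoll returns false.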
SECURITY
- All reads verify user ownership or admin role
- Admin routes require role=ADMIN
- Audit inputs sanitized
- PDF URLs behind auth
- Rate limit: 10 audits/user/day
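The ownership and rate-limit rules above can be expressed as two pure checks, sketched below. The names are hypothetical, and two details are assumptions because the spec leaves them open: the daily limit is treated as a rolling 24-hour window, and soft-deleted audits are hidden from their owner but still visible to admins.

```typescript
interface Viewer {
  id: string;
  role: "USER" | "ADMIN";
}

interface AuditRef {
  userId: string;
  deletedAt: Date | null;
}

const DAILY_AUDIT_LIMIT = 10;
const DAY_MS = 24 * 60 * 60 * 1000;

// Admins can read everything; users can read only their own, non-deleted audits.
function canReadAudit(viewer: Viewer, audit: AuditRef): boolean {
  if (viewer.role === "ADMIN") return true;
  return audit.userId === viewer.id && audit.deletedAt === null;
}

// Assumption: "per day" means the 24 hours before `now`, not a calendar day.
// `createdAt` holds the creation times of the user's existing audits.
function isRateLimited(createdAt: Date[], now: Date): boolean {
  const windowStart = now.getTime() - DAY_MS;
  const recent = createdAt.filter((date) => date.getTime() > windowStart).length;
  return recent >= DAILY_AUDIT_LIMIT;
}
```

The same canReadAudit check would apply to the PDF route, which is what keeps PDF URLs behind auth.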
DEPLOYMENT
- Web app: Vercel
- Worker: Railway (separate long-running Node process)
See also: Codex Prompt v1.0 (Archive), the original version, embedded as a toggle on the parent project page.