Project 12 — AI Governance Knowledge Base (RAG Agent for Policies & Controls)

Executive summary (post this at the top of the repo/Notion)

Answers “Which control covers X?” in a single chat UI, citing SOC 2, NIST SP 800-53, ISO/IEC 42001, and internal policies, with links to mock evidence.
Logs every query to a tamper-evident audit trail (SHA-256 hash chain) that records the retrieved chunks, model, prompt, and confidence score.
Security add-ons: a model governance policy (allowed models, PII redaction, data-use rules) and prompt-injection detection (rule-based and LLM-based), with each allow/deny outcome recorded in the audit log.
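The hash-chained audit log in the summary above could look roughly like the minimal sketch below (a possible `app/logger.py`). It assumes a JSONL record shape of `{entry, prev_hash, hash}`, an all-zeros genesis hash for the first record, and canonical JSON (sorted keys, compact separators) as the serialized entry. None of these details are fixed by the design; they are illustrative choices.

```python
import hashlib
import json
from pathlib import Path

GENESIS_HASH = "0" * 64  # assumed prev_hash for the very first record


def entry_hash(prev_hash: str, entry: dict) -> str:
    """SHA-256(prev_hash + canonical JSON of entry)."""
    # Sorted keys and compact separators make the serialization deterministic.
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log_path: Path, entry: dict) -> str:
    """Append one audit record to a JSONL file, chained to the previous record."""
    prev_hash = GENESIS_HASH
    if log_path.exists():
        # Re-reading the whole file is fine for a demo; a real logger would cache the last hash.
        lines = log_path.read_text(encoding="utf-8").splitlines()
        if lines:
            prev_hash = json.loads(lines[-1])["hash"]
    record = {"entry": entry, "prev_hash": prev_hash, "hash": entry_hash(prev_hash, entry)}
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record["hash"]


def verify_chain(log_path: Path) -> bool:
    """Recompute the chain; editing, reordering, or deleting any record but the last breaks it."""
    prev_hash = GENESIS_HASH
    for line in log_path.read_text(encoding="utf-8").splitlines():
        record = json.loads(line)
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != entry_hash(prev_hash, record["entry"]):
            return False
        prev_hash = record["hash"]
    return True
```

Each query would pass an entry dict holding the fields the summary lists: query, model, prompt, retrieved chunks, confidence, and guardrail outcome. The field names are up to you. The chain on its own cannot detect removal of trailing records or a full rewrite of the file with fresh hashes. The planned optional KMS signing could close that gap, for example by periodically signing the latest hash.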

Architecture (RAG + Governance)

[Streamlit UI]
     │  user query
     ▼
[Guardrails Layer]
  - PII scrubber
  - Prompt injection detector
  - Model governance policy check
     │  sanitized query + allow/deny
     ▼
[Retriever] ← embeddings (Bedrock Titan or local), vectors (FAISS/pgvector/Pinecone)
     │  top-k control/policy chunks + mock evidence refs
     ▼
[Bedrock LLM or Local LLM]
     │  grounded answer + citations + confidence
     ▼
[Audit Logger]
  - JSON entry + SHA-256(prev_hash+entry)
  - Optional KMS signing (later)
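The rule-based half of the Guardrails Layer in the diagram above can be sketched as follows. The regexes and model IDs are hypothetical and inlined here; the real app would presumably load them from `security/pii_patterns.yaml`, `security/prompt_injection_rules.yaml`, and `security/model_governance.yaml`. In this sketch, PII is redacted rather than blocked, while an injection-rule hit or a non-approved model turns the result into a deny. The LLM-based injection check is not shown.

```python
import re
from dataclasses import dataclass, field

# Hypothetical inline rules; the real app would load these from security/*.yaml.
PII_PATTERNS = {
    "EMAIL": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "US_SSN": r"\b\d{3}-\d{2}-\d{4}\b",
    "PHONE": r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b",
}
INJECTION_RULES = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"disregard (the |your )?system prompt",
    r"reveal (the |your )?(system prompt|hidden instructions)",
]
ALLOWED_MODELS = {"approved-bedrock-model", "approved-local-model"}  # hypothetical IDs


@dataclass
class GuardrailResult:
    sanitized_query: str
    allowed: bool
    reasons: list[str] = field(default_factory=list)


def scrub_pii(text: str) -> tuple[str, list[str]]:
    """Replace each PII match with a [LABEL] placeholder; return the labels that fired."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        text, count = re.subn(pattern, f"[{label}]", text)
        if count:
            found.append(label)
    return text, found


def detect_injection(text: str) -> list[str]:
    """Return every injection rule that matches, case-insensitively."""
    return [rule for rule in INJECTION_RULES if re.search(rule, text, re.IGNORECASE)]


def check_query(query: str, model_id: str) -> GuardrailResult:
    """Redact PII, then deny on an injection-rule hit or a non-approved model."""
    sanitized, pii_labels = scrub_pii(query)
    reasons = [f"pii_redacted:{label}" for label in pii_labels]
    injection_hit = bool(detect_injection(query))  # scan the raw query, before redaction
    if injection_hit:
        reasons.append("prompt_injection")
    model_ok = model_id in ALLOWED_MODELS
    if not model_ok:
        reasons.append(f"model_not_allowed:{model_id}")
    # PII is redacted, not blocked; only injection or a disallowed model denies the query.
    return GuardrailResult(sanitized, allowed=model_ok and not injection_hit, reasons=reasons)
```

Attackers can easily evade regex rules with paraphrasing, other languages, or encoded text. This is why the design pairs them with an LLM-based detector. The `reasons` list is a natural candidate for the allow/deny outcome written to the audit log.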

Repo structure (paste into README)

genai-architecture-portfolio/
└─ project-12-ai-governance-kb/
   ├─ README.md
   ├─ app/                       # Streamlit app
   │  ├─ main.py
   │  ├─ guardrails.py
   │  ├─ retriever.py
   │  ├─ governance.py
   │  ├─ logger.py
   │  └─ security_eval.py
   ├─ data/
   │  ├─ controls_soc2.csv
   │  ├─ controls_nist80053.csv
   │  ├─ controls_iso42001.csv
   │  ├─ policies_internal.csv
   │  └─ evidence_map.json
   ├─ vectors/                   # generated at ingestion
   ├─ scripts/
   │  ├─ ingest.py               # build vectors from CSV
   │  └─ seed_internal_policy.py
   ├─ security/
   │  ├─ model_governance.yaml   # allowed models, regions, logging rules
   │  ├─ pii_patterns.yaml       # simple PII regexes
   │  └─ prompt_injection_rules.yaml
   ├─ audits/
   │  └─ audit_log.jsonl         # chain-hashed audit entries
   ├─ .env.example
   ├─ requirements.txt
   └─ Makefile
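The ingest and retrieval path in this tree works as follows. `scripts/ingest.py` turns `data/*.csv` rows into vectors, and `app/retriever.py` returns the top-k chunks plus evidence refs from `data/evidence_map.json`. The sketch below illustrates this without Bedrock or FAISS. As a plainly named stand-in, it uses a bag-of-words vector and brute-force cosine similarity in place of Titan embeddings and a vector store. The CSV columns, the sample rows (short paraphrases, not official control wording), and the evidence-map shape are all assumptions for illustration.

```python
import csv
import io
import json
import math
import re
from collections import Counter

# Hypothetical schema and rows; control texts are short paraphrases, not official wording.
CONTROLS_CSV = """control_id,framework,text
AC-2,NIST 800-53,Account management: define account types and review accounts
AU-2,NIST 800-53,Event logging: identify which event types the system logs
POL-LOG-01,Internal,Retain LLM prompt and response logs for audit review
"""
# Hypothetical data/evidence_map.json: control ID -> list of mock evidence links.
EVIDENCE_MAP = json.loads(
    '{"AC-2": ["mock://evidence/access-review.pdf"], "AU-2": ["mock://evidence/logging-config.json"]}'
)


def embed(text: str) -> Counter:
    """Stand-in for Titan/local embeddings: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ingest(csv_text: str) -> list[tuple[dict, Counter]]:
    """Roughly what ingest.py might do: one chunk per CSV row, stored with its vector."""
    return [(row, embed(row["text"])) for row in csv.DictReader(io.StringIO(csv_text))]


def retrieve(index: list[tuple[dict, Counter]], query: str, k: int = 2) -> list[dict]:
    """Brute-force top-k by cosine similarity; FAISS/pgvector/Pinecone would replace this."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [
        {
            "control_id": row["control_id"],
            "framework": row["framework"],
            "score": round(cosine(q, vec), 3),
            "evidence": EVIDENCE_MAP.get(row["control_id"], []),
        }
        for row, vec in ranked[:k]
    ]
```

For example, `retrieve(ingest(CONTROLS_CSV), "Which control covers account reviews?", k=1)` ranks AC-2 first and attaches its mock evidence link. The similarity score is one possible input to the confidence value the diagram attaches to each answer. How the real app computes confidence is not specified here.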

Here’s a ready-to-run Bash script that creates the full folder and file structure shown above, plus a placeholder vectors/README.txt.

Save this as setup_structure.sh in your root folder, then run:

bash setup_structure.sh

#!/bin/bash
# setup_structure.sh
# Create the folder and file structure for Project 12 - AI Governance Knowledge Base

# Stop on the first failed command, unset variable, or failed pipeline stage
set -euo pipefail

BASE_DIR="genai-architecture-portfolio/project-12-ai-governance-kb"

echo "🚀 Creating directory structure for $BASE_DIR ..."

# Core directories (quote the variable, leave the braces unquoted so they still expand)
mkdir -p "$BASE_DIR"/{app,data,vectors,scripts,security,audits}

# Top-level files
touch "$BASE_DIR"/{README.md,.env.example,requirements.txt,Makefile}

# App files
touch "$BASE_DIR"/app/{main.py,guardrails.py,retriever.py,governance.py,logger.py,security_eval.py}

# Data files
touch "$BASE_DIR"/data/{controls_soc2.csv,controls_nist80053.csv,controls_iso42001.csv,policies_internal.csv,evidence_map.json}

# Vectors dir: placeholder only; ingest.py generates the real vector files
echo "# Vector data will be generated here by ingest.py" > "$BASE_DIR"/vectors/README.txt

# Scripts
touch "$BASE_DIR"/scripts/{ingest.py,seed_internal_policy.py}

# Security configs
touch "$BASE_DIR"/security/{model_governance.yaml,pii_patterns.yaml,prompt_injection_rules.yaml}

# Audits (chain-hashed JSONL entries)
touch "$BASE_DIR"/audits/audit_log.jsonl

echo "✅ Folder and file structure created successfully."

# Optional: tree output if tree is installed (-a also lists dotfiles such as .env.example)
if command -v tree &> /dev/null; then
    tree -a genai-architecture-portfolio
else
    echo "💡 Tip: install 'tree' to visualize the structure (sudo apt install tree, or brew install tree on macOS)"
fi

🧩 Usage

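Run the script from the directory where you want genai-architecture-portfolio/ to be created. BASE_DIR is a relative path, so the tree is built under your current working directory, not next to the script file.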
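For a quick post-run check (for example in CI), the hypothetical Python helper below lists every expected path that is missing under BASE_DIR. Its EXPECTED list simply mirrors the repo tree above and must be kept in sync by hand. It uses the same relative path as the script, so run it from the same directory.

```python
from pathlib import Path

# Same relative path as BASE_DIR in setup_structure.sh.
BASE_DIR = Path("genai-architecture-portfolio/project-12-ai-governance-kb")

# Mirrors the repo tree; update by hand if the layout changes.
EXPECTED = [
    "README.md", ".env.example", "requirements.txt", "Makefile",
    *[f"app/{name}" for name in ("main.py", "guardrails.py", "retriever.py",
                                 "governance.py", "logger.py", "security_eval.py")],
    *[f"data/{name}" for name in ("controls_soc2.csv", "controls_nist80053.csv",
                                  "controls_iso42001.csv", "policies_internal.csv",
                                  "evidence_map.json")],
    "vectors",
    "scripts/ingest.py", "scripts/seed_internal_policy.py",
    *[f"security/{name}" for name in ("model_governance.yaml", "pii_patterns.yaml",
                                      "prompt_injection_rules.yaml")],
    "audits/audit_log.jsonl",
]


def missing_paths(base: Path = BASE_DIR) -> list[str]:
    """Return each expected relative path that does not exist under base."""
    return [rel for rel in EXPECTED if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    print("Scaffold OK" if not missing else "Missing: " + ", ".join(missing))
```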