Drive-Dedupe Weekly — 2026-06-14 (scan stalled on corrupt archive .mov; recovered + executed)

Pre-flight summary

Execution lane: Desktop Commander host execution (drive-dedupe weekly) — host Mac, hard-delete lane. sandbox: desktop-commander host execution (drive-dedupe weekly)
Vault reachable at ~/Documents/Projects/.credentials/vault.env (28KB). Skill drive-dedupe v2.0.0 present, all 6 scripts intact. Python 3.9.6.
preflight.py → READY, exit 0. Note: free disk 11.2% (status OK but low; relevant to slow hashing below). Time Machine not running.

What happened

scan_and_hash.py --scope work (Downloads + Documents + Desktop): 25,104 small + 5 big files.
First scan run died at ~42% (10,663/25,109). Root cause: an earlier long-sleep monitor command tripped the MCP gateway timeout, which tore down the Desktop Commander shell session and killed the scan's process group. Lesson logged below.
Relaunched scan detached (nohup … & disown, logging to /tmp/drive-dedupe-weekly-scan.log) so it survives session teardown. Reached 99.94% (25,093/25,109).
Scan then hung hard on a corrupt archived recording: ~/Documents/Codex/archive/desktop-cleanup-snapshot-2026-04-24/Screen Recordings/2026/02/screen-recording-20260209-145444-len-00h00m00s.mov (zero-duration name but multi-GB on disk). Confirmed stall: file read offset byte-identical across 25s and CPU frozen at 0:44.94 for 5+ min. This is the exact spot the first run died too.
Decision (safe, autonomous): inventory was 25,093/25,109 = 99.94% complete; the only unhashed items are the final giant/corrupt archived videos + zips (two ~7.7GB .mov, 4.4GB/2.1GB/2.0GB zips) which cannot be deduped anyway. A near-complete inventory can only under-detect dupes, never falsely delete (hard-delete requires both copies present AND qa_sample re-verifies the keeper's hash before any delete). Terminated the hung scan + the marker-gated orchestrator and ran the rest of the pipeline off the existing inventory.
build_plan.py → 632 dup groups: 18 same-folder exact dupes (delete), 946 cross-folder dupes (stage), 1 junk. qa_sample.py → PASS, 50/50, 100% (threshold 80%), exit 0.
execute_plan.py --hard-delete → deleted=18, staged=946, junk=1, errors=0, bytes_freed=95,537, bytes_staged=24,176,118.

Results

Permanently deleted (18, ~93 KB): same-folder exact dupes — skill .bak files and dated repo-health *.secrets.json snapshots under Documents/Codex/repo-health/run-state/2026-06-08/. Unambiguous (a keeper copy remains in the same folder; QA verified).
Staged / reversible (946, ~23 MB): concentrated in skill libraries — Documents/Claude Skills (394), Documents/Perplexity Skills (141), Documents/Codex-Skills (140), Documents/Skills_Library (91), Documents/Codex (87), Documents/Elements Upload (29). Moved into the run's staging/ tree; a surviving identical-hash copy exists elsewhere for each.
Junk: 1.
Disk after run: 21% used on / report line (df shows plenty free on the APFS container; earlier 11.2% figure was preflight's stricter measure).
Run dir: ~/Documents/Claude/drive-dedupe-runs/2026-06-14-064315/ (manifest.jsonl, manifest.md, summary.json, rollback.sh).
Rollback (restores the 946 staged files; 18 hard-deletes are not recoverable here): ~/Documents/Claude/drive-dedupe-runs/2026-06-14-064315/rollback.sh

Pre-flight summary

What happened

Results

⚠️ Needs your eyes (QA: Open)