The benchmarks/ directory contains the performance evaluation harness that measures jcodemunch's token efficiency against real-world repositories.
This module quantifies the core value proposition of jcodemunch: how many fewer tokens does an AI agent consume when using jcodemunch's structured index versus reading raw source files? The harness automates this comparison across multiple repositories and queries.
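As an illustration only, the comparison can be sketched as follows. The function names and the whitespace "tokenizer" here are placeholders, not the harness's actual API (the real harness counts tokens with `cl100k_base`):

```python
# Hypothetical sketch of the baseline-vs-index comparison.
# count_tokens is a stand-in; the real harness uses the cl100k_base tokenizer.

def count_tokens(text: str) -> int:
    # Placeholder tokenizer: whitespace split instead of cl100k_base.
    return len(text.split())

def measure_baseline(sources: dict[str, str]) -> int:
    # Baseline cost: an agent reads every raw source file in full.
    return sum(count_tokens(body) for body in sources.values())

def savings(baseline_tokens: int, jmunch_tokens: int) -> float:
    # Fraction of tokens saved by using the structured index instead.
    return 1 - jmunch_tokens / baseline_tokens

sources = {
    "app.py": "def handler(req): return 200",
    "util.py": "def add(a, b): return a + b",
}
baseline = measure_baseline(sources)
print(baseline, round(savings(baseline, 3), 2))
```

The report ultimately boils down to this ratio, computed per repository and per query.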
### `harness/run_benchmark.py`

The main benchmark script. It:

- Loads `tasks.json`, which lists the target repositories and sample queries.
- `measure_baseline()` — computes the total token count of all raw source files.
- `measure_jmunch()` — executes the search-then-fetch workflow for each query and counts the tokens consumed.
- Renders the results via `render_markdown()`.

### `tasks.json`

A JSON corpus defining which repositories to benchmark and what queries to run against them. Default targets include popular frameworks such as Express, FastAPI, and Gin.
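One plausible shape for a `tasks.json` entry is sketched below; the field names are illustrative assumptions, not the harness's documented schema:

```python
import json

# Illustrative tasks.json shape. The "repos", "name", "url", and "queries"
# keys are assumptions for demonstration, not the actual schema.
tasks_json = """
{
  "repos": [
    {
      "name": "express",
      "url": "https://github.com/expressjs/express",
      "queries": ["how is routing middleware registered?"]
    }
  ]
}
"""

tasks = json.loads(tasks_json)
for repo in tasks["repos"]:
    print(repo["name"], len(repo["queries"]))
```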
| Constant | Value | Description |
| --- | --- | --- |
| `SEARCH_MAX_RESULTS` | `5` | Maximum search results returned per query |
| `SYMBOLS_FETCHED` | `3` | Number of symbols fetched per search |
| `TOKENIZER` | `cl100k_base` | Tokenizer model used for consistent token counting |
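A minimal sketch of how these constants cap per-query token consumption in the search-then-fetch loop; `measure_query` and the whitespace token count are placeholders, not the harness's real functions:

```python
SEARCH_MAX_RESULTS = 5   # max search results considered per query
SYMBOLS_FETCHED = 3      # symbols actually fetched from those results

def count_tokens(text: str) -> int:
    # Placeholder for cl100k_base token counting.
    return len(text.split())

def measure_query(candidate_bodies: list[str]) -> int:
    # Hypothetical per-query accounting: truncate the search results,
    # then pay tokens only for the top SYMBOLS_FETCHED symbol bodies.
    top = candidate_bodies[:SEARCH_MAX_RESULTS]
    fetched = top[:SYMBOLS_FETCHED]
    return sum(count_tokens(body) for body in fetched)

bodies = [f"def f{i}(): pass" for i in range(10)]
print(measure_query(bodies))  # only 3 of 10 candidate symbols are counted
```

Because both limits are fixed, a query's token cost stays bounded no matter how large the target repository is, which is exactly what the benchmark is designed to demonstrate.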