The benchmarks/ directory contains the performance evaluation harness that measures jcodemunch's token efficiency against real-world repositories.
This module quantifies the core value proposition of jcodemunch: how many fewer tokens does an AI agent consume when using jcodemunch's structured index versus reading raw source files? The harness automates this comparison across multiple repositories and queries.
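As an illustration only, the comparison can be sketched as follows. The function names and the whitespace "tokenizer" here are placeholders, not the harness's actual API (the real harness counts tokens with `cl100k_base`):

```python
# Hypothetical sketch of the baseline-vs-index comparison.
# count_tokens is a stand-in; the real harness uses the cl100k_base tokenizer.

def count_tokens(text: str) -> int:
    # Placeholder tokenizer: whitespace split instead of cl100k_base.
    return len(text.split())

def measure_baseline(sources: dict[str, str]) -> int:
    # Baseline cost: an agent reads every raw source file in full.
    return sum(count_tokens(body) for body in sources.values())

def savings(baseline_tokens: int, jmunch_tokens: int) -> float:
    # Fraction of tokens saved by using the structured index instead.
    return 1 - jmunch_tokens / baseline_tokens

sources = {
    "app.py": "def handler(req): return 200",
    "util.py": "def add(a, b): return a + b",
}
baseline = measure_baseline(sources)
print(baseline, round(savings(baseline, 3), 2))
```

The report ultimately boils down to this ratio, computed per repository and per query.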
### `harness/run_benchmark.py`

The main benchmark script. It:

- Loads `tasks.json`, which lists the target repositories and sample queries.
- `measure_baseline()` — computes the total token count of all raw source files.
- `measure_jmunch()` — executes the search-then-fetch workflow for each query and counts the tokens consumed.
- Renders the results via `render_markdown()`.

### `tasks.json`

A JSON corpus defining which repositories to benchmark and what queries to run against them. Default targets include popular frameworks such as Express, FastAPI, and Gin.
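One plausible shape for a `tasks.json` entry is sketched below; the field names are illustrative assumptions, not the harness's documented schema:

```python
import json

# Illustrative tasks.json shape. The "repos", "name", "url", and "queries"
# keys are assumptions for demonstration, not the actual schema.
tasks_json = """
{
  "repos": [
    {
      "name": "express",
      "url": "https://github.com/expressjs/express",
      "queries": ["how is routing middleware registered?"]
    }
  ]
}
"""

tasks = json.loads(tasks_json)
for repo in tasks["repos"]:
    print(repo["name"], len(repo["queries"]))
```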
| Constant | Value | Description |
| --- | --- | --- |
| `SEARCH_MAX_RESULTS` | `5` | Maximum search results returned per query |
| `SYMBOLS_FETCHED` | `3` | Number of symbols fetched per search |
| `TOKENIZER` | `cl100k_base` | Tokenizer model used for consistent token counting |
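A minimal sketch of how these constants cap per-query token consumption in the search-then-fetch loop; `measure_query` and the whitespace token count are placeholders, not the harness's real functions:

```python
SEARCH_MAX_RESULTS = 5   # max search results considered per query
SYMBOLS_FETCHED = 3      # symbols actually fetched from those results

def count_tokens(text: str) -> int:
    # Placeholder for cl100k_base token counting.
    return len(text.split())

def measure_query(candidate_bodies: list[str]) -> int:
    # Hypothetical per-query accounting: truncate the search results,
    # then pay tokens only for the top SYMBOLS_FETCHED symbol bodies.
    top = candidate_bodies[:SEARCH_MAX_RESULTS]
    fetched = top[:SYMBOLS_FETCHED]
    return sum(count_tokens(body) for body in fetched)

bodies = [f"def f{i}(): pass" for i in range(10)]
print(measure_query(bodies))  # only 3 of 10 candidate symbols are counted
```

Because both limits are fixed, a query's token cost stays bounded no matter how large the target repository is, which is exactly what the benchmark is designed to demonstrate.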