Why ProofSet size doesn’t really matter

Analysis

Given that each challenge is sampled independently and uniformly at random among the $N$ indexes, the probability that a single challenge hits a node the prover can still prove is $1-\alpha$, where $\alpha$ is the fraction of data the prover has lost.

Because the challenges are independent, for $K$ challenges the probability that a prover who lost a fraction $\alpha$ of the data is not caught is

$p = (1-\alpha)^K$
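The formula translates directly into code. This is a minimal sketch; the function name `evasion_probability` is hypothetical and not part of any ProofSet implementation.

```python
def evasion_probability(alpha: float, k: int) -> float:
    """Probability that a prover missing a fraction `alpha` of the data
    passes all `k` independent, uniformly random challenges (hypothetical helper)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    # Each challenge independently lands on still-provable data with probability 1 - alpha,
    # so all k challenges do so with probability (1 - alpha)^k.
    return (1.0 - alpha) ** k


# Example: 5% of the data lost, 5 challenges.
print(f"{evasion_probability(0.05, 5):.5f}")  # prints 0.77378
```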

Since $\alpha$ is the fraction of storage lost rather than an absolute amount, the soundness error depends only on $\alpha$ and $K$, not on $N$.
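A quick Monte Carlo check can make the independence from $N$ concrete. This is a sketch under illustrative assumptions that do not come from the original: exactly $\mathrm{round}(\alpha N)$ nodes are lost, chosen at random, and each challenge picks an index uniformly with replacement. The function `simulate_evasion` is hypothetical.

```python
import random


def simulate_evasion(n: int, alpha: float, k: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the evasion probability for a ProofSet of n nodes.

    Illustrative assumptions: exactly round(alpha * n) nodes are lost, chosen
    uniformly at random, and each of the k challenges picks an index
    independently and uniformly from range(n) (i.e. with replacement).
    """
    rng = random.Random(seed)
    lost = set(rng.sample(range(n), round(alpha * n)))
    evaded = 0
    for _ in range(trials):
        # The prover evades detection only if no challenge hits a lost node.
        if all(rng.randrange(n) not in lost for _ in range(k)):
            evaded += 1
    return evaded / trials


alpha, k = 0.2, 5
print(f"analytic (1 - alpha)^K = {(1 - alpha) ** k:.4f}")
for n in (100, 10_000, 1_000_000):
    estimate = simulate_evasion(n, alpha, k, trials=20_000)
    print(f"N = {n:>9}: simulated evasion = {estimate:.4f}")
```

The simulated rate should stay close to $(1-\alpha)^K$ for every $N$, up to sampling noise.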

The absolute amount of data lost does grow with the ProofSet size (it is $\alpha N$ nodes), but that quantity is not part of this security analysis.

Assuming one proof per day with $K$ challenges each, and challenges independent across days, the soundness error after $T$ days decreases as

$$ p_{\text{day } T} = (1-\alpha)^{K\cdot T} $$
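Inverting this formula gives the number of days needed to push the soundness error below a target $\epsilon$: from $(1-\alpha)^{K T} \le \epsilon$ and $\ln(1-\alpha) < 0$, we get $T \ge \ln \epsilon \,/\, (K \ln(1-\alpha))$. The sketch below implements both directions; `evasion_after_days` and `days_to_reach` are hypothetical names, and the target value in the example is chosen only for illustration.

```python
import math


def evasion_after_days(alpha: float, k: int, days: int) -> float:
    """Soundness error (1 - alpha)^(k * days), assuming challenges are
    independent across days as well as within a day."""
    return (1.0 - alpha) ** (k * days)


def days_to_reach(alpha: float, k: int, target: float) -> int:
    """Smallest whole number of days T with (1 - alpha)^(k * T) <= target.

    (1 - alpha)^(K*T) <= eps  <=>  T >= ln(eps) / (K * ln(1 - alpha)),
    where the inequality flips because ln(1 - alpha) < 0 for 0 < alpha < 1.
    """
    if not (0.0 < alpha < 1.0 and 0.0 < target < 1.0 and k > 0):
        raise ValueError("need 0 < alpha < 1, 0 < target < 1, k > 0")
    return math.ceil(math.log(target) / (k * math.log(1.0 - alpha)))


# Example: 1% of data lost, 5 challenges per day, target soundness error 1e-6.
print(days_to_reach(0.01, 5, 1e-6))
```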

Writing $\epsilon$ for the probability of evasion, the detection probability is $1-\epsilon$. With $K = 5$ challenges per day, this gives:

Detection probability vs data loss fraction

| $\alpha$ (fraction of data lost) | Per-day evasion $(1-\alpha)^{5}$ | Per-day detection | 30-day evasion $(1-\alpha)^{150}$ | 30-day detection |
|---|---|---|---|---|
| 1% | 0.95099 | 4.901% | 0.22145 | 77.855% |
| 5% | 0.77378 | 22.622% | 0.00046 | 99.954% |
| 20% | 0.32768 | 67.232% | $2.91 \times 10^{-15}$ | ≈ 100% |
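The table values can be reproduced from $K = 5$ and $T = 30$ (the exponent 150 is $5 \cdot 30$). A minimal sketch; the generator `table_rows` is a hypothetical helper, not part of any library.

```python
def table_rows(alphas, k=5, days=30):
    """Yield (alpha, per-day evasion, per-day detection,
    multi-day evasion, multi-day detection) for each data-loss fraction."""
    for alpha in alphas:
        per_day = (1.0 - alpha) ** k             # (1 - alpha)^K
        multi_day = (1.0 - alpha) ** (k * days)  # (1 - alpha)^(K*T)
        yield alpha, per_day, 1.0 - per_day, multi_day, 1.0 - multi_day


for alpha, e1, d1, e30, d30 in table_rows([0.01, 0.05, 0.20]):
    print(f"{alpha:>4.0%}  {e1:.5f}  {d1:.3%}  {e30:.3g}  {d30:.3%}")
```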