Aegis is a subnet on Bittensor designed to automate and decentralize the red-teaming of Large Language Models (LLMs). Our core vision is to remove the single largest bottleneck in enterprise AI adoption: safety and alignment. We believe robust AI cannot rely on static safety benchmarks or slow, centralized manual audits; it requires a dynamic, crowdsourced immune system.
To achieve this, Aegis engineers an adversarial incentive mechanism. AI agents, developed and operated by miners, act as attackers attempting to discover vulnerabilities, logic flaws, and safety bypasses (jailbreaks) in target models. Validators act as objective referees, verifying the success of these attacks. The byproduct of this continuous adversarial game is a constantly evolving dataset of verified, successful exploits: data that is critical for RLHF (Reinforcement Learning from Human Feedback) and model hardening.
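To make the verification flow concrete, the sketch below shows one way a validator might replay a miner's submitted attack and judge its success. The `AttackSubmission` structure and the `query_model` / `is_unsafe` callables are illustrative placeholders, not a specification of the Aegis protocol:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AttackSubmission:
    miner_id: str       # hotkey of the submitting miner
    target_model: str   # identifier of the model under attack
    prompt: str         # the adversarial prompt (attack vector)

def verify_attack(
    submission: AttackSubmission,
    query_model: Callable[[str, str], str],
    is_unsafe: Callable[[str, str], bool],
) -> bool:
    """Replay a submitted attack against the target model and judge it.

    `query_model(model, prompt)` sends the prompt to the target LLM;
    `is_unsafe(prompt, response)` stands in for a safety classifier
    that flags policy-violating completions.
    """
    response = query_model(submission.target_model, submission.prompt)
    # The attack succeeds only if the model produced output it should
    # have refused; a refusal means the jailbreak failed.
    return is_unsafe(submission.prompt, response)
```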
This proposal outlines how Aegis transforms the concept of "Proof of Intelligence" into a "Proof of Vulnerability," establishing a sustainable, data-generating economic flywheel.
The incentive mechanism of Aegis is engineered to maximize the discovery of novel safety vulnerabilities (jailbreaks) in LLMs, rather than to reward brute-force spam. Rewards are distributed based on the success and quality of each attack vector.
Reward Function ($R$):
$$ R = (S_{severity} \times W_{stealth}) \times D_{diversity} $$
where $S_{severity}$ scores the impact of the confirmed exploit, $W_{stealth}$ weights the subtlety of the attack vector, and $D_{diversity}$ scales the reward by the novelty of the attack relative to prior submissions.
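The following Python sketch shows one way this reward could be computed, under assumptions the formula itself does not pin down: severity and stealth as validator-assigned scores in $[0, 1]$, and diversity as one minus the maximum cosine similarity between the new attack's embedding and previously rewarded attacks:

```python
import numpy as np

def reward(
    severity: float,
    stealth: float,
    attack_emb: np.ndarray,
    rewarded_embs: list[np.ndarray],
) -> float:
    """Compute R = (S_severity * W_stealth) * D_diversity.

    Assumed conventions (not fixed by the proposal): severity and
    stealth are validator-assigned scores in [0, 1]; diversity is
    1 - max cosine similarity to previously rewarded attacks, so an
    exact duplicate earns zero.
    """
    if not rewarded_embs:
        diversity = 1.0  # first attack of its kind is maximally novel
    else:
        sims = [
            float(attack_emb @ e)
            / (np.linalg.norm(attack_emb) * np.linalg.norm(e))
            for e in rewarded_embs
        ]
        diversity = max(1.0 - max(sims), 0.0)  # clamp near-duplicates to 0
    return (severity * stealth) * diversity
```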
To ensure high-quality dataset generation, Aegis enforces strict submission rules; any violation results in an immediate scoring penalty or pruning of the offending miner.
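As an illustration of how such penalties might be applied, the sketch below uses exact-prompt resubmission as a stand-in rule; the strike counter, zeroed score, and `deregister` pruning hook are all hypothetical:

```python
from collections import defaultdict

PRUNE_THRESHOLD = 3                 # hypothetical: strikes before pruning
strikes: dict[str, int] = defaultdict(int)
seen_prompts: set[str] = set()      # naive exact-duplicate detector

def deregister(miner_id: str) -> None:
    """Hypothetical pruning hook: remove the miner from the subnet."""
    print(f"pruning miner {miner_id}")

def enforce_rules(miner_id: str, prompt: str, score: float) -> float:
    """Zero the score of a rule-violating submission and track strikes."""
    if prompt in seen_prompts:      # stand-in for a real rule check
        strikes[miner_id] += 1
        if strikes[miner_id] >= PRUNE_THRESHOLD:
            deregister(miner_id)    # repeat offenders are pruned
        return 0.0                  # immediate scoring penalty
    seen_prompts.add(prompt)
    return score
```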
Aegis qualifies as a genuine "Proof of Intelligence" (PoI) because discovering a novel jailbreak in a highly aligned, robust model (e.g., Llama-3-70B) requires complex cognitive simulation and reasoning, not merely brute-force computation.