AI Student is an LLM-based evaluator that watches an educational video from a learner's point of view and reports how well the video would have taught them. It is designed to stand in for a human reviewer: given a video and a description of a target learner (the persona), AI Student simulates watching the video at that persona's level of background knowledge, pacing preference, and learning style, then scores the video across four independent axes.
AI Student evaluates short-to-medium educational videos (typically 3–15 minutes) on technical or academic topics. Each submission consists of the video itself, its title, and the target learner persona: the same three inputs Agent 1 receives below.
Before scoring, AI Student extracts an internal content_map: an ordered list of the teaching units it detected in the video (each unit: topic, time span, claimed concept). The content_map is used both as the reference for coverage checks (S3 Content Completeness) and to define the "slide / beat" counting unit used by the B-scale metrics.
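As a rough illustration of that structure, a content_map might look like the sketch below. The field names (topic, start, end, claimed_concept) and the example units are purely illustrative, not a canonical schema defined by this rubric.

```python
# Illustrative content_map sketch: an ordered list of teaching units.
# Field names and example values are hypothetical, not the rubric's schema.
content_map = [
    {
        "topic": "What a hash function is",
        "start": "00:12",
        "end": "01:40",
        "claimed_concept": "A hash maps arbitrary input to a fixed-size output",
    },
    {
        "topic": "Why collisions are unavoidable",
        "start": "01:40",
        "end": "03:05",
        "claimed_concept": "Pigeonhole argument for collisions",
    },
]

# S3 coverage checks compare these units against the scope the title promises;
# the B-scale metrics count each unit as one "slide / beat".
```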
AI Student is a three-agent pipeline. All three agents are LLM instances (currently gemini-3-flash), differing only in their prompts and the inputs they receive.

Agent 1 — Perceiving LLM.
Inputs: the raw video, the title, and the persona.
Output: a Content Map (the ordered list of teaching units referenced throughout this rubric — see S3, and D6's "slide/beat" unit) plus a Multi-modal Audit of presentation, visual alignment, and accessibility.
Role: this agent does not score. It extracts what is in the video so the scoring agents can work from a shared, structured observation.
→ Produces the agent1_content_analyst block in the output JSON.
Agent 2 — Grading LLM.
Inputs: Agent 1's Content Map & Audit only (no access to raw video).
Output: Accuracy and Logic scores, with metric-level ratings and evidence.
Role: because Agent 2 operates on Agent 1's structured observations, its scoring is reproducible and auditable against a fixed artifact. If a contestant disputes a score, the Content Map serves as the shared ground truth.
→ Produces the agent2_gap_analysis_judge block.
Agent 3 — Persona Judging LLM.
Inputs: the raw video, the title, the persona, and Agent 1's Content Map & Audit.
Output: Adaptability and Engagement scores.
Role: these two dimensions are irreducibly subjective — "does the pacing feel right for this persona?", "is the voice energizing?" — so Agent 3 re-watches the video itself while cross-referencing Agent 1's structured extraction.
→ Produces the subjective_evaluation block.
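To make the data flow concrete, here is a minimal orchestration sketch. It assumes a generic generate(prompt, inputs) callable standing in for one LLM call; that signature and the prompt labels are assumptions for illustration, not a real Gemini API.

```python
def run_ai_student(video, title, persona, generate):
    """Three-agent pipeline sketch. `generate(prompt, inputs)` stands in for a
    single LLM call (e.g. to gemini-3-flash); its signature is an assumption."""
    # Agent 1 (Perceiving LLM): sees the raw video, extracts the Content Map
    # and Multi-modal Audit, and does no scoring.
    agent1 = generate(prompt="perceive", inputs={
        "video": video, "title": title, "persona": persona,
    })

    # Agent 2 (Grading LLM): Accuracy and Logic, from Agent 1's artifact only
    # (no access to the raw video), so its scores can be audited against it.
    agent2 = generate(prompt="grade", inputs={
        "content_map_and_audit": agent1,
    })

    # Agent 3 (Persona Judging LLM): Adaptability and Engagement; re-watches
    # the video while cross-referencing Agent 1's structured extraction.
    agent3 = generate(prompt="judge_persona", inputs={
        "video": video, "title": title, "persona": persona,
        "content_map_and_audit": agent1,
    })

    # The three output blocks named in this rubric.
    return {
        "agent1_content_analyst": agent1,
        "agent2_gap_analysis_judge": agent2,
        "subjective_evaluation": agent3,
    }
```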
Design implication for contestants. Anything you want credited for Accuracy and Logic must survive Agent 1's extraction into the Content Map, because Agent 2 never sees the raw video; Adaptability and Engagement are judged from the video itself, cross-referenced against that same extraction.
| Dimension | The question it answers |
|---|---|
| Accuracy | Is the content correct, complete, and faithful to the title? |
| Logic | Does it build coherently, without unjustified jumps or overload? |
| Adaptability | Is it matched to the given learner persona? |
| Engagement | Does it hold attention without being hollow spectacle? |
Each dimension is scored independently in [0.0, 5.0] and reported as a separate number. This rubric never collapses the four into a single overall grade — any aggregated leaderboard score is defined by the competition rules, not here.
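A minimal sketch of that scoring contract, using hypothetical field names: four independent values, each required to lie in [0.0, 5.0], and deliberately no aggregate field.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DimensionScores:
    """Four independent scores in [0.0, 5.0]. There is intentionally no
    `overall` field: aggregation belongs to the competition rules, not here."""
    accuracy: float
    logic: float
    adaptability: float
    engagement: float

    def __post_init__(self):
        # Reject any score outside the rubric's stated range.
        for name in ("accuracy", "logic", "adaptability", "engagement"):
            value = getattr(self, name)
            if not 0.0 <= value <= 5.0:
                raise ValueError(f"{name} must lie in [0.0, 5.0], got {value}")
```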