0. About AI Student

AI Student is an LLM-based evaluator that watches an educational video from a learner's point of view and reports how well the video would have taught them. It is designed to stand in for a human reviewer: given a video and a description of a target learner (the persona), AI Student simulates watching the video with that persona's background knowledge, pacing preference, and learning style, then scores the video across four independent axes.

What AI Student evaluates

AI Student evaluates short-to-medium educational videos (typically 3–15 minutes) on technical or academic topics. Each submission consists of the video itself, its title, and a target learner persona.

Before scoring, AI Student extracts an internal content_map: an ordered list of the teaching units it detected in the video (each unit: topic, time span, claimed concept). The content_map is used both as the reference for coverage checks (S3 Content Completeness) and to define the "slide / beat" counting unit used by the B-scale metrics.
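The content_map described above can be pictured as a simple ordered structure. A minimal sketch in Python — the unit fields (topic, time span, claimed concept) come from this rubric, but the exact field names are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class TeachingUnit:
    topic: str            # what this unit teaches
    start_s: float        # time span start, in seconds (field names assumed)
    end_s: float          # time span end, in seconds
    claimed_concept: str  # the concept the unit claims to convey

# A content_map is an ordered list of the teaching units detected in the video.
content_map: list[TeachingUnit] = [
    TeachingUnit("binary search intuition", 0.0, 95.0, "halving the search space"),
    TeachingUnit("worked example", 95.0, 210.0, "maintaining the loop invariant"),
]

# The "slide / beat" counting unit used by the B-scale metrics is then just
# the number of detected units.
beat_count = len(content_map)
```

Under this sketch, coverage checks (S3) compare the units in content_map against what the title promises, and beat_count anchors any per-beat metric.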

How AI Student works internally

AI Student is a three-agent pipeline. All three agents are LLM instances (currently gemini-3-flash), differing only in their prompts and the inputs they receive.

[Figure: the three-agent pipeline]

Agent 1 — Perceiving LLM. Inputs: the raw video, the title, and the persona. Output: a Content Map (the ordered list of teaching units referenced throughout this rubric — see S3, and D6's "slide/beat" unit) plus a Multi-modal Audit of presentation, visual alignment, and accessibility. Role: this agent does not score. It extracts what is in the video so the scoring agents can work from a shared, structured observation. → Produces the agent1_content_analyst block in the output JSON.

Agent 2 — Grading LLM. Inputs: Agent 1's Content Map & Audit only (no access to raw video). Output: Accuracy and Logic scores, with metric-level ratings and evidence. Role: because Agent 2 operates on Agent 1's structured observations, its scoring is reproducible and auditable against a fixed artifact. If a contestant contests a score, the Content Map is the shared ground truth. → Produces the agent2_gap_analysis_judge block.

Agent 3 — Persona Judging LLM. Inputs: the raw video, the title, the persona, and Agent 1's Content Map & Audit. Output: Adaptability and Engagement scores. Role: these two dimensions are irreducibly subjective — "does the pacing feel right for this persona?", "is the voice energizing?" — so Agent 3 re-watches the video itself while cross-referencing Agent 1's structured extraction. → Produces the subjective_evaluation block.
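Putting the three agents together, the evaluator's output JSON contains one block per agent. A hedged sketch of the overall shape — the three top-level block names come from this rubric, while every nested field is an illustrative assumption:

```python
import json

# Skeleton of the output JSON. Top-level block names are from the rubric;
# all nested fields are assumed for illustration only.
report = {
    "agent1_content_analyst": {      # Agent 1: Content Map + Multi-modal Audit
        "content_map": [
            {"topic": "intro", "span": [0.0, 40.0], "claimed_concept": "motivation"},
        ],
        "multimodal_audit": {
            "presentation": "...",
            "visual_alignment": "...",
            "accessibility": "...",
        },
    },
    "agent2_gap_analysis_judge": {   # Agent 2: Accuracy and Logic, scored
        "accuracy": 4.5,             # from Agent 1's artifact only
        "logic": 4.0,
        "evidence": ["..."],
    },
    "subjective_evaluation": {       # Agent 3: Adaptability and Engagement,
        "adaptability": 3.5,         # scored while re-watching the video
        "engagement": 4.0,
    },
}

serialized = json.dumps(report, indent=2)
```

Because Agent 2 sees only the agent1_content_analyst block and never the raw video, its two scores can be audited against that fixed artifact.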

Design implication for contestants. Because Agent 2 grades from Agent 1's Content Map alone, material that Agent 1 cannot extract as a distinct teaching unit is effectively invisible to the Accuracy and Logic scores: make each unit's topic and claimed concept explicit in the video itself.

The four dimensions, at a glance

Dimension    | The question it answers
------------ | ----------------------------------------------------------------
Accuracy     | Is the content correct, complete, and faithful to the title?
Logic        | Does it build coherently, without unjustified jumps or overload?
Adaptability | Is it matched to the given learner persona?
Engagement   | Does it hold attention without being hollow spectacle?

Each dimension is scored independently in [0.0, 5.0] and reported as a separate number. This rubric never collapses the four into a single overall grade — any aggregated leaderboard score is defined by the competition rules, not here.
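Since each dimension is reported as an independent number in [0.0, 5.0], a consumer of the report only needs a per-dimension bounds check, never an aggregation step. A minimal sketch, with the function name and dict shape assumed:

```python
# The four rubric dimensions, each scored independently in [0.0, 5.0].
DIMENSIONS = ("accuracy", "logic", "adaptability", "engagement")

def validate_scores(scores: dict[str, float]) -> dict[str, float]:
    """Check that exactly the four dimensions are present and each lies in
    [0.0, 5.0]. No overall grade is computed here: aggregation, if any,
    is defined by the competition rules, not by this rubric."""
    if set(scores) != set(DIMENSIONS):
        raise ValueError(f"expected exactly {DIMENSIONS}, got {sorted(scores)}")
    for dim, value in scores.items():
        if not 0.0 <= value <= 5.0:
            raise ValueError(f"{dim} score {value} outside [0.0, 5.0]")
    return scores

validated = validate_scores(
    {"accuracy": 4.5, "logic": 4.0, "adaptability": 3.5, "engagement": 4.0}
)
```

Keeping the four numbers separate all the way to the consumer is what lets a leaderboard define its own weighting without touching this rubric.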