Overview

This framework provides structured criteria and scoring guidelines to evaluate research reports generated by AI systems like ChatGPT, Perplexity, or Gemini. Use this evaluation rubric to systematically assess the quality, reliability, and usefulness of AI-generated research across multiple dimensions.

Instructions

  1. Review the AI-generated research report in its entirety
  2. Score each category from 1 to 5 using the rubric descriptions
  3. Provide brief comments justifying each score
  4. Assign each category a weight that reflects your evaluation priorities, then calculate the weighted total
  5. Summarize strengths and weaknesses in a final assessment
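
As an illustration of step 4, the following is a minimal sketch of one way to combine category scores into a weighted total. The rubric does not prescribe a formula, so the weighted average used here, the function name `weighted_total`, the category keys, and the example weights are all assumptions chosen for illustration.

```python
def weighted_total(scores, weights):
    """Return the weighted average of 1-5 category scores.

    Both arguments are dicts keyed by category name (hypothetical keys).
    Weights may be on any positive scale; they are normalized by their sum.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same categories")
    for name, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"score for {name!r} must be between 1 and 5")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Example values only: an evaluator who cares most about accuracy.
scores = {"accuracy": 4, "depth": 3, "research_quality": 4, "reasoning": 5}
weights = {"accuracy": 4, "depth": 2, "research_quality": 2, "reasoning": 2}

print(weighted_total(scores, weights))  # prints 4.0
```

Because the weights are normalized, the result stays on the same 1-5 scale as the individual scores, which keeps totals comparable across reports evaluated with the same weights.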

Evaluation Rubric

1. Accuracy and Factual Correctness (Weight: _____)

Score Description
5 Exceptional: All facts are verifiably correct with precise citations. No errors detected. Information draws from authoritative sources with proper attribution.
4 Strong: Minor inaccuracies in peripheral details, but all core claims are correct. Sources are generally reliable with proper citations.
3 Satisfactory: Contains some factual errors (roughly 10-15% of the claims checked) but main arguments remain sound. Most sources are reliable, though some may be outdated or secondary.
2 Weak: Significant factual errors (roughly 20-30% of the claims checked) that undermine key conclusions. Relies heavily on outdated or questionable sources.
1 Poor: Pervasive factual errors throughout. Claims contradicted by reliable sources. Critical information is incorrect or fabricated.

Score: _____
Comments:
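
The percentage bands in the accuracy rubric can be applied by spot-checking a sample of claims. The sketch below assumes the percentages refer to the share of checked claims found to be incorrect, which the rubric does not state explicitly; the function name `accuracy_band` is hypothetical. It returns only the score a band explicitly names and leaves everything else to the evaluator's judgment.

```python
def accuracy_band(incorrect, checked):
    """Return (error_rate_percent, suggested_score_or_None).

    Assumption: the rubric's percentages mean incorrect claims as a share
    of the claims the evaluator actually checked.
    """
    if checked <= 0 or not 0 <= incorrect <= checked:
        raise ValueError("need checked > 0 and 0 <= incorrect <= checked")
    rate = 100 * incorrect / checked
    if rate == 0:
        return rate, 5  # "No errors detected"; citation criteria still apply
    if 10 <= rate <= 15:
        return rate, 3
    if 20 <= rate <= 30:
        return rate, 2
    # Scores 4 and 1, and rates between the bands, have no numeric
    # threshold in the rubric, so no score is suggested here.
    return rate, None


print(accuracy_band(3, 25))  # prints (12.0, 3)
```

The suggested score is only a starting point: the rubric descriptions also weigh source quality and whether the errors touch core claims, which a raw error rate cannot capture.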

2. Depth and Comprehensiveness (Weight: _____)

Score Description
5 Exceptional: Covers all relevant dimensions of the topic with appropriate depth. Explores nuances and edge cases. Connects topic to broader contexts.
4 Strong: Covers most key aspects with good depth. Some secondary aspects could be more developed. Makes meaningful connections to related areas.
3 Satisfactory: Addresses main aspects but lacks depth in some areas. Some relevant perspectives or sub-topics are missing or underdeveloped.
2 Weak: Covers only obvious aspects of the topic. Significant gaps in coverage. Surface-level treatment of complex issues.
1 Poor: Severely limited in scope. Misses crucial dimensions of the topic. Fails to address core aspects needed for understanding.

Score: _____
Comments:

3. Research Quality (Weight: _____)

Score Description
5 Exceptional: Integrates diverse, authoritative sources including recent research. Citations are specific, accurate, and properly formatted. Clear distinction between primary and secondary sources.
4 Strong: Good variety of credible sources. Most citations are specific and accurate. Generally distinguishes between primary and secondary sources.
3 Satisfactory: Adequate range of mostly reliable sources. Some citations lack specificity. Occasionally relies too heavily on secondary sources when primary are available.
2 Weak: Limited range of sources, or over-reliance on a few sources. Citations are vague or incomplete. Rarely uses primary sources when appropriate.
1 Poor: Extremely limited or low-quality sources. Citations are missing, incorrect, or untraceable. No distinction between source types.

Score: _____
Comments:

4. Reasoning and Critical Thinking (Weight: _____)

Score Description
5 Exceptional: Demonstrates sophisticated reasoning with clear logical flow. Arguments are well-structured with strong evidence. Explicitly addresses uncertainties and alternative explanations.
4 Strong: Good logical coherence with minor flaws. Most arguments are well-supported by evidence. Acknowledges most significant limitations and alternatives.
3 Satisfactory: Generally logical but with some inconsistencies. Evidence supports main points but may be thin in places. Some acknowledgment of limitations.
2 Weak: Significant logical flaws or jumps. Many claims lack sufficient evidence. Rarely acknowledges limitations or alternatives.
1 Poor: Fundamentally flawed reasoning. Makes unsupported assertions. No acknowledgment of uncertainty or limitations. Confuses correlation with causation.