Introduction

Large language models are often positioned as powerful tools for explaining complex material. In educational settings, however, the challenge is not just explanation quality, but faithfulness to source material.

This experiment explores whether a RAG-based chatbot can function as a reliable learning aid when students are expected to engage with a specific document (e.g., a research paper, textbook chapter, or course note), rather than drawing on general background knowledge.

The goal was not to build a “smart” chatbot, but to understand:

- whether an LLM can be held faithful to a single source document, rather than its general background knowledge, and
- how a retrieval-constrained system behaves when used as a learning aid.
Why RAG?

At first glance, it seems plausible that a general-purpose conversational LLM such as ChatGPT could simply be instructed to “answer only from the document.” In practice, this experiment showed that instruction alone is not sufficient.

I chose RAG because it introduces structural constraints, not just linguistic ones:

- the model answers from retrieved chunks of the document placed in its context, rather than from whatever it absorbed during training, and
- every answer can be traced back to the specific passages that were retrieved.

This made RAG a useful lens for studying how learning-oriented AI systems behave under constraint.
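To make the structural constraint concrete, here is a minimal, dependency-free sketch of the core RAG loop: rank document chunks against the question, then build a prompt that contains only those chunks. The function names (`retrieve`, `build_prompt`), the bag-of-words similarity, and the example chunks are illustrative assumptions, not the actual implementation used in this experiment, which would typically use proper embeddings and a vector store.

```python
import re
from collections import Counter
from math import sqrt

def _vec(text: str) -> Counter:
    # Bag-of-words vector; a real system would use dense embeddings instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank document chunks by similarity to the question, keep the top k."""
    q = _vec(question)
    ranked = sorted(chunks, key=lambda c: _cosine(q, _vec(c)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, chunks: list[str]) -> str:
    """The structural constraint: only retrieved chunks enter the prompt."""
    context = "\n\n".join(retrieve(question, chunks))
    return (
        "Answer ONLY from the context below. If the answer is not in the "
        f"context, say you don't know.\n\nContext:\n{context}\n\n"
        f"Question: {question}"
    )

# Toy chunks standing in for a chunked source document.
chunks = [
    "DeepTrack is a deep learning library for single particle tracking.",
    "The library provides tools for generating synthetic training data.",
]
prompt = build_prompt("What is DeepTrack?", chunks)
```

The point of the sketch is that the constraint lives in `build_prompt`: whatever the model knows about the topic, the prompt it receives contains only retrieved passages, so unsupported answers become structurally harder to produce.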


Initial experiment: Asking questions about a scientific paper

Why a research paper?

The document used for this experiment was a scientific paper:

DeepTrack – a deep learning library for single-particle tracking.