LLMs Know More Than They Show: On the Intrinsic Representation of...


Key Contributions


Traditional methods focus on user-perceptions and output-analysis to predict truthfulness and error-types

Where truthfulness is encoded?

Given the above,

However,

Probing classifiers