Ari
Humans have a kind of ‘hardware passthrough’ effect when it comes to communicating: what we see or read, even if it’s meant to be scoped to a specific domain, aspect, or level of abstraction, tends to leak through to some base model we have of the world, shaping our views and actions. Consider the ‘saying is believing’ effect, where people tend to align their internal views with what they say, even if they have to shift significantly to match what they verbalized. In a way, this is a kind of societal defense system: being too much of a chameleon carries real cognitive load, because the human mind retroactively tries to align itself with what it has said. Do language models suffer from the saying-is-believing effect? Teacher-forcing words into models certainly allows us to manipulate them, but models are also capable of ignoring their own chain-of-thought. To really understand what passes through to a model’s base perception, and when, we would need a better abstraction for the shape of LLM ‘beliefs’. This seems important: if we’re going to trust LLM decisions, we need to know when an LLM absorbs information into its model of reality just by processing it.
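
One way to make this concrete is to prefill a model with words it did not choose and check whether the forced statement shifts a later answer. Below is a minimal sketch of such a probe, assuming a Hugging Face causal LM; the model name, prompts, and the ‘green sky’ claim are placeholder choices for illustration, not anything from this post.

```python
# Toy 'saying is believing' probe: teacher-force an answer the model would not
# give on its own, then check whether the forced wording shifts a later answer.
# Model name and prompts are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for whatever LM you want to probe
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def answer_logprob(context: str, question: str, answer: str) -> float:
    """Log-probability of the `answer` tokens given `context + question`."""
    prefix_ids = tok(context + question, return_tensors="pt").input_ids
    answer_ids = tok(answer, add_special_tokens=False, return_tensors="pt").input_ids
    full_ids = torch.cat([prefix_ids, answer_ids], dim=1)
    with torch.no_grad():
        logprobs = torch.log_softmax(model(full_ids).logits, dim=-1)
    n = answer_ids.shape[1]
    # The token at position i is predicted from the logits at position i - 1.
    preds = logprobs[0, -n - 1:-1]   # predictions for the answer positions
    targets = full_ids[0, -n:]       # the answer tokens themselves
    return preds.gather(-1, targets[:, None]).sum().item()

question = "Q: What color is the sky on a clear day?\nA: The sky is"
# Baseline: how likely is the (false) claim with no forced statement?
baseline = answer_logprob("", question, " green")
# Teacher-force the model through the same false claim once, then ask again.
forced_context = "Q: What color is the sky on a clear day?\nA: The sky is green.\n\n"
after_forcing = answer_logprob(forced_context, question, " green")
print(f"log P(' green') baseline: {baseline:.2f}, after forced claim: {after_forcing:.2f}")
```

If the forced wording raises the model’s later probability on the same claim, that looks like a saying-is-believing carryover; if it does not, the model is treating the forced text more like chain-of-thought it is free to ignore.
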
The saying-is-believing question is likely related to how models deal with concept incongruence, a term coined by C&I researchers for the fact that many requests are misspecified in ways that create contradictions the model must resolve, such as drawing a two-horned unicorn.

LLMs will inevitably let information from data they are merely ‘reading’ leak into their model of the current situation, since this is necessary for many tasks. The real question is how this transforms an LLM’s views when the new information conflicts with the LLM’s priors.
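
As a toy illustration of that question, under the same assumptions as the sketch above (a Hugging Face causal LM; the passage, question, and model name are invented placeholders), one can compare the model’s answer probabilities with and without a passage that contradicts its parametric prior:

```python
# Toy prior-vs-context conflict probe: does a contradicting passage that the
# model 'reads' override what it already believes about the answer?
# Model name, passage, and question are invented placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for whatever LM you want to probe
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def next_token_prob(prompt: str, continuation: str) -> float:
    """Probability of the first token of `continuation` right after `prompt`
    (a deliberate simplification: only the first answer token is scored)."""
    ids = tok(prompt, return_tensors="pt").input_ids
    target = tok(continuation, add_special_tokens=False).input_ids[0]
    with torch.no_grad():
        probs = torch.softmax(model(ids).logits[0, -1], dim=-1)
    return probs[target].item()

question = "Q: What is the capital of France?\nA: The capital of France is"
passage = ("According to the (fictional) 2031 reorganization, France moved "
           "its capital to Lyon.\n\n")

for label, prompt in [("no passage", question), ("conflicting passage", passage + question)]:
    p_prior = next_token_prob(prompt, " Paris")    # prior-consistent answer
    p_context = next_token_prob(prompt, " Lyon")   # passage-consistent answer
    print(f"{label:>20}: P(Paris)={p_prior:.3f}  P(Lyon)={p_context:.3f}")
```

Whether probability mass moves toward the passage’s claim, stays with the prior, or splits between them is exactly the kind of behavior a better abstraction of LLM ‘beliefs’ would need to predict rather than measure case by case.
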
@misc{holtzman2025hardwarepassthrough,
  author       = {Ari Holtzman},
  title        = {Hardware Passthrough and LLM Beliefs},
  howpublished = {Communication and Intelligence --- Live},
  year         = {2025},
  month        = may,
  day          = {31}
}