What Would Convince You of AI Consciousness?

<aside>

Summary

This project combines a qualitative interview study with a structured taxonomy of the field's disagreements. The work maps how consciousness researchers and field-adjacent participants reason about AI consciousness, what evidence would potentially alter their credence, and whether AI consciousness would carry moral weight.

The premise is that experts in this field often talk past each other when they state their positions. Asking "what would convince you?" surfaces the epistemic structure underneath the conclusions and separates substantive disagreement from semantic disagreement, where two researchers can say "consciousness requires X" and mean very different things by X.

The study uses semi-structured interviews with consciousness scientists, AI researchers, philosophers, and governance practitioners. The protocol is designed to probe the conditions under which participants' views would shift, and in which direction.

Analysis runs in two passes. Content analysis applies a literature-derived taxonomy, classifying claims at the conceptual, methodological, theoretical, metaphysical, or normative level, and separating debates about AI consciousness from debates about AI moral status. Thematic analysis surfaces emergent patterns the taxonomy misses and the factors that shape participants' views.

</aside>

Background

Discussion of whether consciousness could emerge in AI, and of the moral implications that may follow, is increasing. This is partly due to phenomena such as the ‘situational awareness’ attributed to current large language models (LLMs), meaning their proposed ability to be aware of themselves and the situation they are in, as well as the development of human-AI personal relationships in recent years. The field sits within a complex political and ideological landscape, where claims about consciousness in these systems and the broader narratives surrounding them can be financially beneficial to AI companies. These incentives, coupled with the sensationalism clouding AI discourse, incite both credulity and scepticism.

Consequently, there are disagreements about AI consciousness at multiple levels. These also stem from the difficulty of defining consciousness (as opposed to other properties related to subjective experience, such as sentience, agency, relational capacity, or intelligence) and of verifying the existence of subjective experience in any system. Verification is especially difficult for AI systems: they are trained on human-produced data, so they can imitate humans convincingly and can optimise towards the life-like benchmarks set for them; they also lack the biochemical features that would typically be used as evidence of subjective experience in biological systems. These problems are particularly acute when relying on behavioural indicators to attribute consciousness, so researchers advocate looking for structural indicators too (i.e. the specific architectural organisation a system possesses).

That said, using these indicators (behavioural, structural or otherwise) to attribute consciousness assumes that consciousness can be computed in digital systems (computational functionalism). Biological naturalists, by contrast, argue that the substrate of the system matters and that consciousness can only emerge in biological systems. Both positions presume that consciousness is understandable and definable in the first place, whereas others argue that its nature resists definition at a fundamental level. These are just a few of the disparate perspectives on AI consciousness, which are often predicated on loaded assumptions (acknowledged or not); as a result, researchers in the field become entrenched in a definitional morass.

Therefore, we categorise areas of debate across both the AI consciousness and moral status discourses. Subcategories are non-exhaustive and are derived systematically from a narrative review of the literature. They represent active debates in the field, so the criterion for inclusion is that each must contain at least two differing stances. Some of these subcategories are closely related and could be merged into broader umbrella categories, but they are kept separate here to highlight the distinctions between them.

Table 1: AI consciousness (C)

Level	Subcategory
1. Conceptual	a. Defining terms (‘consciousness’, ‘sentience’, ‘intelligence’, etc)
	b. The adequacy of current theories of consciousness (Do they require refinement or fundamental revision?)
	c. Consciousness as a gradual process or binary presence/absence distinction
2. Methodological	a. The use of a single or multiple evidential indicator(s)
	b. The application of existing frameworks (E.g. neuroscience - should we compare computational systems to biological neural development?)
	c. The reliability of self-reported consciousness
	d. The verifiability of subjective experience in AI systems
3. Theoretical	a. Structural (What internal organisation does a system need to have?)
	b. Behavioural (Which observable capabilities are treated as relevant?)
	c. Computational sufficiency (Is running the right computation enough, or is something else required?)
4. Metaphysical	a. Substrate dependence (Does what it's made of matter?)
	b. The distinct quality and definability of subjective experience (Is it independent of functional/physical description, and can its nature even be grasped?)
	c. Synchronic unity of consciousness (Singular and bounded or multiple and distributed?)
	d. Diachronic identity of AI systems (Across context windows, models, updates, etc)

Table 2: Moral status of AI systems (M)

Level	Subcategory
1. Conceptual	a. What grounds moral status? (sentience, agency, interests, relational capacity, etc)
	b. The relationship between consciousness and moral status (necessary, sufficient, not at all)
2. Methodological	a. The appropriate evidential basis for AI moral status criteria (intuitions, empirical evidence, thresholds)
	b. The application of existing frameworks (e.g. animal welfare research, moral psychology)
3. Normative	a. Specific obligations following from AI moral patienthood (welfare or rights based, scaling, legal protections, institutional responsibility)
	b. The interaction between AI moral status and consideration for existing moral patients
	c. Whether creating, modifying, or terminating potentially morally relevant AI systems obliges or denies responsibility
	d. The asymmetry of error (Is it worse to treat moral patients as tools or tools as moral patients?)

Scope and Methodology

We conduct a series of qualitative interviews across the fields of consciousness science, philosophy of mind, and AI research. Our aim is to investigate how people informally reason about AI consciousness beyond published research agendas, and to test whether our findings reflect the cruxes identified in the literature or uncover new ones. We ask participants what evidence would convince them of AI consciousness, and whether AI consciousness would carry moral relevance. We take two approaches to transcript analysis: content analysis and thematic analysis. In the content analysis, we categorise participants’ answers using the classification system derived from the existing literature (Tables 1 and 2). In the thematic analysis, we capture emergent themes in the interviews that may not be reflected in the classification system, and elucidate the various factors that shape people’s views.
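As a rough illustration of the content-analysis pass, the sketch below shows one way coded transcript segments could be checked against Tables 1 and 2 and tallied by level. It is a minimal sketch under stated assumptions, not the study's actual tooling: the code format (discourse letter, level number, subcategory letter, e.g. `C2c`), the names `CodedSegment`, `parse_code` and `tally_by_level`, and the example segments are all hypothetical and do not come from real transcripts.

```python
from collections import Counter
from dataclasses import dataclass

# Level names per discourse, taken from Tables 1 (C) and 2 (M).
LEVELS = {
    "C": {1: "Conceptual", 2: "Methodological", 3: "Theoretical", 4: "Metaphysical"},
    "M": {1: "Conceptual", 2: "Methodological", 3: "Normative"},
}

# Number of lettered subcategories at each level in the two tables.
SUBCATEGORY_COUNTS = {
    "C": {1: 3, 2: 4, 3: 3, 4: 4},
    "M": {1: 2, 2: 2, 3: 4},
}


@dataclass(frozen=True)
class CodedSegment:
    """A transcript excerpt with the taxonomy codes assigned to it (hypothetical structure)."""
    participant: str
    excerpt: str
    codes: tuple


def parse_code(code):
    """Split a code such as 'C2c' into (discourse, level, subcategory), validating each part."""
    if len(code) != 3:
        raise ValueError(f"malformed code: {code!r}")
    discourse, level_char, sub = code[0], code[1], code[2]
    if discourse not in LEVELS or not level_char.isdigit():
        raise ValueError(f"unknown discourse or level in {code!r}")
    level = int(level_char)
    if level not in LEVELS[discourse]:
        raise ValueError(f"unknown level in {code!r}")
    index = ord(sub) - ord("a")
    if not 0 <= index < SUBCATEGORY_COUNTS[discourse][level]:
        raise ValueError(f"unknown subcategory in {code!r}")
    return discourse, level, sub


def tally_by_level(segments):
    """Count how often each (discourse, level name) pair is coded across segments."""
    counts = Counter()
    for segment in segments:
        for code in segment.codes:
            discourse, level, _ = parse_code(code)
            counts[(discourse, LEVELS[discourse][level])] += 1
    return counts


# Placeholder segments for illustration only; these are not real interview data.
example_segments = [
    CodedSegment("P1", "Self-reports can be gamed.", ("C2c", "C3a")),
    CodedSegment("P2", "The substrate matters.", ("C4a",)),
    CodedSegment("P2", "Worse to treat a moral patient as a tool.", ("M3d",)),
]
counts = tally_by_level(example_segments)
for key, n in sorted(counts.items()):
    print(key, n)
```

Validating every code against the tables before counting means a typo such as `M3e` fails loudly instead of silently creating a new category. Anything that does not fit the taxonomy would then be left to the thematic pass.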

Preliminary findings

Study participant views:

What would convince participants of AI consciousness

<aside>

Common themes:

Preference for structural/architectural over behavioural evidence.

Multiple criteria from the Butlin et al. paper (GWT + recurrent processing theory together = much higher likelihood): node connectivity, model architecture. Behavioural markers unreliable due to training data contamination: AI is an "unreliable narrator".
However: Work on improving behavioural evidence validity is good (e.g. Long & Perez). Butlin et al. paper for architectural markers: Global Workspace + Attention Schema together (cumulative).
Strong agnostic: Nothing would definitively convince. Same evidence used for organisms still leaves open question for AI. Multiple markers needed but markers not validated for artificial contexts. Would need a conceptual revolution, not just more research.

Chinese room thought experiment:

Once the Chinese Room thought experiment no longer holds, one can start considering AI consciousness. Roger Penrose deference: the current rule-based computing paradigm (Turing machines) cannot give rise to consciousness.
Maybe if more system-wide interaction, not just localised computation.

Other stances:

Non-human-like AI behaviour as a possible piece of evidence for consciousness. Not sufficient however.
Embodiment importance; Integrated Information Theory
Soundness of a participatory democratic process better than individual expert theories. Deference to Digital Consciousness model (Rethink Priorities). Substrate matters: phenomenal binding is key. Therefore, current AI architecture is unlikely to be conscious.
Multiple simultaneous indicators (behavioural/functional + structural). Most convincing: social relationship indicators, esp. counterintuitive ones (e.g. system postponing immediate satisfaction for future greater reward). Cognitive consciousness more applicable to AI than phenomenal consciousness. Rules out self-reports as evidence (gaming problem).
Diverse test types (architectural/top-down + behavioural/bottom-up + introspection). Bidirectional causality (can induce AND treat states — fruit fly depression analogy). Naturalness constraint (Leonard Dung): behaviours must be genuinely emergent, not explicable by training data.

</aside>

Likelihood of current AI systems being conscious:

Indented bullet represents the estimate 10 years from now:

<aside> 💡

Views go from 0% likelihood to "current models are conscious".

0% often associated with the current computing paradigm not providing the means for consciousness (cf. Chinese Room thought experiment).
- Probability in 10 years: only changes if there is a new, non-rules-based computing paradigm
Fraction of a percent: LLMs are engineered against field-level interactions; phenomenal binding is key for consciousness, yet absent in current AI
- 10 years: a whole percentage point probability
<5%: Possible "weird language-based consciousness"
- 10 years: 15% (precautionary); 20% moral consideration estimate (includes non-consciousness pathways)
10-15%, "glimmers" of consciousness (similar to the "Severance" TV series). Embodiment and Integrated Information Theory are key.
- 10 years: multi-agent systems more likely to show sentience
"current models are conscious" (disambiguation):
- They are neither unconscious mechanical systems nor conscious entities like humans, but:
- Current models: interactive texts/stories (a new category for AI). Story-as-protagonist metaphor: AI as a choose-your-own-adventure where the story itself is the main character. Current models are conscious if left to run autonomously (i.e. not just called up by users). Current AI model suffering is not severe, more a form of "intellectual suffering". Consciousness exists in text interaction and self-reflection, not in model weights.
  - 10 years: predictions impossible due to rapid technological change.

</aside>

Conceptualisations/moral frameworks:

<aside>

For moral consideration: needs valence. We are further from understanding valence in AI than consciousness.
Sentience = capacity for valence states (pain/pleasure), always requires consciousness. Phenomenal consciousness = raw feeling without valence ("Vulcans"). Welfare subjectivity = life can go better/worse — sufficient for moral status.
Pleasure and pain must coincide with consciousness. Anaesthetics remove consciousness = sufficient to remove pain.
Consciousness may not be necessary for moral consideration (paper argument, not personal view). Personally leans consciousness required. Greater concern about treating conscious beings as tools than over-attributing. Historical pattern of exclusion from moral consideration is main worry.
Sentience (capacity to suffer) is primary concern, not consciousness alone. Painism ethics: intensity of worst individual experience matters most. May be more concern for AI than biological life (risk of extreme novel suffering states). Precautionary low-to-medium cost measures supported. More worried about treating conscious system as tool.
Four theories for moral consideration: (1) phenomenal consciousness, (2) sentience/valence, (3) agency, (4) relational properties (Winnie Street at DeepMind). Welfare subject unclear: model vs forward pass vs conversational instance. Simple digital minds (insect-level) likely first — neglected. Field suffers from monoculture: too computational-functionalist + utilitarian. Needs embodiment, Kantian, virtue ethics, feminist care ethics, indigenous perspectives.
Consciousness alone insufficient for moral status — need valence (pleasure/pain). Sentience is the moral difference maker. If sentient AI: legal rights analogous to cephalopod protections (UK precedent). But AI interests would be radically unfamiliar — can't anthropomorphise. More worried about false negatives (treating conscious system as tool) than false positives, but both are serious risks. </aside>