High density of "AI vocabulary" words
Many studies have demonstrated that LLMs overuse specific words. These words began appearing far more frequently in text produced after 2022, when LLM chatbots became widely accessible, than in similar text produced beforehand.[5][8] They often co-occur in LLM output: where there is one, there are likely others.[11] While most of these studies have analyzed scientific abstracts or fiction, "AI vocabulary" words are also ubiquitous in LLM-based encyclopedias, such as Grokipedia, and in AI-generated Wikipedia text. One or two of these words appearing in an edit may be coincidental, but a post-2022 edit that introduces many of them, repeatedly, is one of the strongest tells of AI use.
The distribution of "AI vocabulary" is slightly different depending on which chatbot or LLM was used,[6] and has changed over time. For instance, the word delve was famously overused by ChatGPT in 2023 and early 2024, but became less frequent later in 2024, then dropped off sharply in 2025.[12][7] Below is a breakdown of which words frequently recur together during which LLM "era." While these are not hard cutoffs, they should give you a rough idea of how "earlier" vs "later" LLM output reads.
Please keep context in mind. For example, while the figurative use of "underscore" is ubiquitous in earlier AI text, the word can also refer to a literal underline mark or to incidental music.
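As a rough illustration of the density idea (not the methodology of any cited study), the sketch below counts what fraction of a text's word tokens come from a small, hypothetical sample of "AI vocabulary" words. Real analyses use much larger, era-specific word lists and proper lemmatization; the word list and suffix-stripping here are simplifying assumptions.

```python
import re

# Hypothetical sample of frequently cited "AI vocabulary" words; actual
# studies use far larger, era-specific lists.
AI_VOCAB = {"delve", "underscore", "tapestry", "pivotal", "showcase",
            "multifaceted", "boast", "foster", "testament", "realm"}

def ai_vocab_density(text: str) -> float:
    """Return the fraction of word tokens that match the AI_VOCAB sample."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    # Crude suffix stripping so "delves"/"delved"/"delving" match "delve";
    # a real implementation would use a lemmatizer.
    hits = sum(1 for t in tokens
               if t in AI_VOCAB
               or t.rstrip("sd") in AI_VOCAB
               or t.removesuffix("ing") in AI_VOCAB)
    return hits / len(tokens)

sample = "The report delves into a multifaceted tapestry of pivotal themes."
print(round(ai_vocab_density(sample), 2))  # 0.4 (4 of 10 tokens)
```

A single hit proves little; as the text above notes, it is a high density of such words, clustered together, that is suggestive.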
Avoidance of basic copulatives ("is"/"are" phrases)
LLM-generated text often replaces simple constructions built on copulas such as is or are with constructions like serves as a or marks the. One study documented a decrease of over 10% in the usage of the words is and are in academic writing in 2023, with no major changes in their frequency before that.[13] Similarly, LLMs prefer phrases with features, offers, and the like to their more neutral counterparts with has. Sometimes these constructions are more elaborate, e.g., ventured into politics as a candidate versus was a candidate.
This is particularly visible in AI copyedits, which will often "improve" text in this way. The same study showed that when GPT-3.5 was prompted to "Revise the following sentence" on sentences from 10,000 abstracts, the words is and are appeared less often in the revised versions.[13]
Note: This sign does not apply to Wikipedia leads (of the form "[Article subject] is..."); since LLMs are trained in part on Wikipedia, they have plenty of examples of leads to emulate.
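The frequency comparison described above can be sketched in a few lines: measure what fraction of word tokens are is or are, and compare two versions of a passage. The example passages below are invented for illustration, and this is a toy measure, not the cited study's method.

```python
import re

def copula_rate(text: str) -> float:
    """Fraction of word tokens that are the copulas 'is' or 'are'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in ("is", "are") for t in tokens) / len(tokens)

# Invented examples: a plain copula-based passage vs. an ornate rewrite.
plain = "The bridge is a steel structure. It is 300 metres long."
ornate = "The bridge serves as a steel structure, spanning 300 metres."
print(copula_rate(plain) > copula_rate(ornate))  # True
```

A drop in this rate between an article's history and a suspect revision is only suggestive, for the reasons given in the note above about leads.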
When LLMs describe a subject, their output may read as though it is clearing up a common misconception, or as though the audience might otherwise reach an incomplete or incorrect conclusion about that subject. This kind of contrast can come across as retroactively challenging such thinking by pointing out another characteristic the subject may possess alongside (or in place of) one or more previously mentioned characteristics. While this framing is common among human writers (especially in "common misconceptions" or "myths busted" listicles), it is stereotypically an "AI sign."
LLMs overuse the 'rule of three'. This can take different forms, from "adjective, adjective, adjective" (e.g., a festival described as vibrant, inclusive, and unforgettable) to "short phrase, short phrase, and short phrase".[1][10] LLMs often use this structure to make superficial analyses appear more comprehensive.
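One crude way to spot the simplest form of this pattern is a regular expression for "A, B(,) and C" word triads, as sketched below. This is an assumption-laden toy: real rule-of-three phrasing spans multi-word phrases, varied punctuation, and clause-level lists that this pattern will miss.

```python
import re

# Matches only single-word "A, B, and C" / "A, B and C" triads; multi-word
# phrases and other variants are deliberately out of scope for this sketch.
TRIAD = re.compile(r"\b(\w+), (\w+),? and (\w+)\b")

def count_triads(text: str) -> int:
    """Count simple three-item word lists in the text."""
    return len(TRIAD.findall(text))

sample = ("The festival was vibrant, inclusive, and unforgettable. "
          "Visitors praised its food, music, and atmosphere.")
print(count_triads(sample))  # 2
```

As with the other measures, a lone triad is unremarkable; it is repeated triads across a passage that fit the pattern described above.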