147 phrases across 5 structural archetypes that flag content as AI-written. Built from the Kompozy brand-safety gate — the same rules that govern our autopilot outputs.
Last verified · 2026-05-29 · by Moe Ameen
AI-generated content gets flagged as AI because of 5 structural patterns: hedge words, tricolons (lists of three), pseudo-philosophical bridges, authority hand-waves, and emotional thinness. The 147 phrases in this index cover the highest-frequency manifestations of each archetype. Banning these phrases at output-time is the single highest-leverage defense against AI-sounding content.
This is not a study. It is a curatorial index: an opinionated ranking of AI-tell phrases organized by the structural archetype that produces them.
Content that flags as AI underperforms in three measurable ways: lower trust (readers discount the claims), lower share rate (people do not amplify generic-feeling content), and lower SEO ceiling (Google's Helpful Content algorithm down-ranks content with low specificity scores). Banning the 5 archetypes is not about hiding that AI was involved; it is about producing content that earns the same trust as the best human writing, regardless of how it was produced.
The Kompozy brand-safety gate applies banned-word filtering at output-time, not just prompt-time. This is the structural defense — banned words are stripped from the final output even if the model emits them, then the model is re-prompted to fill the gap.
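A minimal Python sketch of that strip-then-re-prompt loop, assuming a hypothetical `regenerate(text, hits)` callback that stands in for the re-prompt call. The `BANNED` list is a tiny illustrative sample, not the index's 147 phrases, and this is not Kompozy's actual implementation.

```python
import re
from typing import Callable

# Illustrative sample only; not the 147-phrase index.
BANNED = ["studies have shown", "in today's fast-paced world", "game-changing"]


def strip_banned(text: str, banned: list[str]) -> tuple[str, list[str]]:
    """Remove banned phrases (case-insensitive) until none remain; return text and hits."""
    hits: list[str] = []
    changed = True
    while changed:
        changed = False
        for phrase in banned:
            pattern = re.compile(re.escape(phrase), re.IGNORECASE)
            if pattern.search(text):
                if phrase not in hits:
                    hits.append(phrase)
                text = pattern.sub(" ", text)
                # Tidy the gap: collapse runs of spaces, drop space before punctuation.
                text = re.sub(r"\s{2,}", " ", text)
                text = re.sub(r"\s+([.,;:!?])", r"\1", text).strip()
                changed = True
    return text, hits


def gate(draft: str, regenerate: Callable[[str, list[str]], str], max_rounds: int = 3) -> str:
    """Strip banned phrases from model output and re-prompt to fill the gaps."""
    text = draft
    for _ in range(max_rounds):
        text, hits = strip_banned(text, BANNED)
        if not hits:
            return text
        # Hypothetical re-prompt: ask the model to rewrite around the stripped spans.
        text = regenerate(text, hits)
    # Strip once more so a phrase the model keeps emitting does not ship.
    return strip_banned(text, BANNED)[0]
```

The final strip after the loop is what makes this an output-time defense: even if every re-prompt repeats a banned phrase, the phrase is removed before the text is returned.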
LLMs learn to avoid confident claims because, during RLHF, a confident claim carries hallucination risk; the model learned that hedging is "safer." Humans who are confident in their position do not hedge; they assert. Hedge words are the single most reliable AI tell because they appear in nearly every model-generated paragraph longer than 100 words.
Replace every hedge with a specific instance or a confident assertion. "It is often the case that X" becomes "X happened on the last three projects I shipped." Specificity beats hedging on every dimension: trust, engagement, memorability.
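To find the hedges worth replacing, a small Python sketch can list each hedged sentence alongside the hedge it contains. Only the first entry in `HEDGES` comes from the example above; the rest are assumed for illustration, and the sentence splitter is deliberately naive.

```python
import re

# Illustrative hedges: the first is from the example above, the rest are assumptions.
HEDGES = [
    "it is often the case that",
    "arguably",
    "it could be argued that",
    "generally speaking",
    "in many cases",
]


def flag_hedges(text: str) -> list[tuple[str, str]]:
    """Return (sentence, hedge) pairs so each hedge can be rewritten as a specific claim."""
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        for hedge in HEDGES:
            if re.search(r"\b" + re.escape(hedge) + r"\b", sentence, re.IGNORECASE):
                flagged.append((sentence, hedge))
    return flagged
```

Returning the whole sentence rather than just the phrase keeps the rewrite in view: the fix is a specific instance, not a deletion.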
LLMs love three-item lists because they pattern-match against rhetorical training data (Lincoln, Churchill, Aaron Sorkin). The model genuinely cannot help itself — given a prompt about anything, it will reach for a three-noun structure. Real writers use tricolons sparingly because lists of three lose specificity at scale.
Cut the third item. If the third item is needed, the first two were wrong. "Faster, cheaper, and better" becomes "Cheaper without losing quality" — a real claim with a real trade-off. The discipline is to commit to two dimensions per sentence, not three.
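A rough Python regex for spotting candidate tricolons: three one-word (optionally hyphenated) items joined by commas and "and" or "or". This pattern is an assumption, not the index's detector; it misses multi-word items, so hits are review prompts rather than verdicts.

```python
import re

# Heuristic: "A, B, and C" / "A, B or C" with single-word (or hyphenated) items.
TRICOLON = re.compile(r"\b([\w-]+), ([\w-]+),? (?:and|or) ([\w-]+)\b", re.IGNORECASE)


def find_tricolons(text: str) -> list[tuple[str, str, str]]:
    """Return each (first, second, third) item triple found in the text."""
    return TRICOLON.findall(text)
```

Each triple returned is a place to apply the rule above: decide which two dimensions the sentence is actually about and cut the third.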
LLMs produce pseudo-philosophical bridges because the training data is dense in essay-form prose. Real writers either skip transitions entirely (a paragraph break is the transition) or use specific, content-bearing transitions. Pseudo-bridges signal "I am about to say something" rather than actually saying it.
Delete the bridge entirely. Start the sentence with the content. "In an increasingly digital world, marketing is hard" becomes "Marketing is hard." Nine times out of ten the bridge added zero information. The reader already knows we are in a digital world.
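"Delete the bridge" can be sketched in Python as stripping a matching opener and re-capitalizing the first remaining word. The three patterns in `BRIDGES` are hypothetical examples, not the index's bridge list.

```python
import re

# Hypothetical sentence-opening bridges; extend with your own patterns.
BRIDGES = [
    r"in an increasingly [\w-]+ world,\s*",
    r"in today's [\w-]+ (?:world|landscape),\s*",
    r"at the end of the day,\s*",
]
BRIDGE_RE = re.compile(r"^(?:" + "|".join(BRIDGES) + ")", re.IGNORECASE)


def drop_bridge(sentence: str) -> str:
    """Delete a leading bridge so the sentence starts with its content."""
    stripped = BRIDGE_RE.sub("", sentence, count=1)
    if stripped == sentence or not stripped:
        # No bridge found (or nothing left after it): keep the original.
        return sentence
    return stripped[0].upper() + stripped[1:]
```

The pattern is anchored with `^`, so only openers are removed; the same phrase mid-sentence is left for a human to judge.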
LLMs invoke vague authority because real authority claims require citations the model often cannot produce. "Studies have shown" is the AI version of "trust me, bro": confidence projected without underlying evidence. Real writers either cite the specific study or do not invoke studies at all.
Either cite the specific study by name + year, or skip the authority claim entirely. "Studies have shown that X" becomes "The 2024 Backlinko ranking factors study found X." If you cannot cite, the claim was probably weaker than the hand-wave suggested.
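As a rough check, a Python sketch can flag sentences that use a vague-authority phrase but contain no year. Treating a four-digit year as a stand-in for a citation is an assumption that loosely follows the "name + year" rule above, and `HAND_WAVES` is an illustrative list.

```python
import re

# Illustrative hand-wave phrases (assumed); extend with your own list.
HAND_WAVES = ["studies have shown", "research shows", "experts agree", "data suggests"]
# A 19xx/20xx year is used as a crude proxy for "cited by name + year".
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def uncited_authority(text: str) -> list[str]:
    """Return sentences that invoke vague authority without naming a year."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        s for s in sentences
        if any(p in s.lower() for p in HAND_WAVES) and not YEAR.search(s)
    ]
```

A year alone does not make a citation real; anything the check passes still needs the study named.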
LLMs simulate emotional language by reaching for clinical descriptors of emotion rather than performing the emotion. The result is technically grammatical but flat: "exciting," "delightful," and "passionate" carry no specific texture. Real emotional language is concrete (a memory, a sensation, a stake), not labeled.
Replace the emotion label with the cause of the emotion. "It was a game-changing experience" becomes "I rewrote the pricing page after that call and conversion went from 1.2% to 3.4%." Emotion comes from the specific concrete; the label kills the emotion every time.
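A crude Python heuristic for emotional thinness: flag sentences that contain one of the emotion labels named in this section but no number. Using a digit as the marker of a concrete cause is an assumption borrowed from the conversion-rate example; plenty of concrete detail has no numbers, so expect false positives.

```python
import re

# Emotion labels named in this section; the digit check is an assumed proxy for "concrete".
LABELS = ["exciting", "delightful", "passionate", "game-changing"]


def thin_emotion(text: str) -> list[str]:
    """Return sentences that label an emotion without any number to ground it."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for s in sentences:
        has_label = any(
            re.search(r"\b" + re.escape(w) + r"\b", s, re.IGNORECASE) for w in LABELS
        )
        has_number = re.search(r"\d", s) is not None
        if has_label and not has_number:
            flagged.append(s)
    return flagged
```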
Ameen, M. (2026). The 2026 AI Tells Index: 5 archetypes and 147 phrases that flag content as AI-written. Kompozy. https://kompozy.io/research/ai-tells-index
This index is updated as new model versions ship and new tells emerge. Subscribe via the Kompozy changelog to be notified of revisions.
AI tells are recurring structural patterns that flag content as machine-generated. This index groups them into 5 archetypes: hedge words and softeners, tricolons (lists of three), pseudo-philosophical bridges, authority hand-waves, and emotional thinness. Concentrated together in one piece, they produce the recognizable "this was written by AI" feeling.
The index catalogs 147 phrases across the 5 archetypes, loosely ordered within each archetype by how strongly they signal AI generation when they appear in isolation.
Replace each tell rather than just deleting it: every banned phrase is a placeholder for missing specificity. Track concentration, not occurrence; one hedge in 1,500 words is fine, but twelve in 200 words is the AI feel. A brand-safety gate that strips banned words at output-time is the structural defense.
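The concentration rule can be sketched in Python as a density score. The two reference points above work out to about 0.07 tells per 100 words (1 in 1,500) and 6 per 100 words (12 in 200); the 1.0 cut-off between them is a hypothetical default, not a value from the index.

```python
import re

# Hypothetical cut-off between the two reference points (~0.07 vs 6 per 100 words).
THRESHOLD_PER_100_WORDS = 1.0


def tell_density(text: str, phrases: list[str]) -> float:
    """Occurrences of any listed phrase per 100 words (case-insensitive, whole-word)."""
    words = len(text.split())
    if words == 0:
        return 0.0
    hits = sum(
        len(re.findall(r"\b" + re.escape(p) + r"\b", text, re.IGNORECASE))
        for p in phrases
    )
    return 100 * hits / words


def reads_as_ai(text: str, phrases: list[str],
                threshold: float = THRESHOLD_PER_100_WORDS) -> bool:
    """Flag by concentration, not by a single occurrence."""
    return tell_density(text, phrases) > threshold
```

Scoring per 100 words rather than counting raw hits is what lets a single hedge in a long piece pass while a dense cluster in a short one does not.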
The index is not a study. It is a curatorial, opinionated index, not a statistical analysis of detection-tool performance. It does not claim that any single phrase guarantees AI authorship; humans use all of these phrases sometimes, and the signal is concentration.
The phrases were collected from three sources: the banned-words list inside the Kompozy brand-safety gate; public output from GPT-4, Claude 3, and Llama 3 family models given generic prompts; and operator feedback during autopilot calibration. The archetypes were derived inductively from that corpus.