A plain-English reference to the AI terms creators actually run into in 2026 — LLM, token, prompt, hallucination, multimodal, agent, RAG, diffusion, fine-tuning, and inference — with what each one means for the person making content.
Last verified · 2026-07-05 · by Moe Ameen
An AI glossary is a reference that defines the vocabulary of artificial intelligence in plain language. For creators specifically, the useful version is not the academic one — it is the subset of terms you keep hitting when you use AI to write, generate images and video, or automate publishing, defined by what they change about your workflow rather than by the math underneath. The point of learning them is practical: the words tell you what a tool can and cannot do, which is the difference between choosing the right tool and being surprised by its limits.
The terms cluster into a few groups. There are the model basics: a large language model (LLM) is the text engine behind most AI writing tools; a token is the unit of text it reads and that providers bill by (roughly ¾ of a word in English); the context window is how much text it can consider at once; inference is the act of running the model to get an output. There are the interaction terms: a prompt is your instruction to the model, and prompt engineering is the craft of writing instructions that reliably produce the output you want. There are the failure and safety terms: a hallucination is when a model states something false as if it were fact, and grounding, most often done through retrieval-augmented generation (RAG), is the technique of feeding the model real source material so it invents less. And there are the capability terms that define the frontier in 2026: multimodal means one model handles text, image, audio, and video together; agentic AI, or an AI agent, means a system that takes multi-step actions (browsing, calling tools, running code) toward a goal rather than answering a single question.
On the media side, the words that matter most to creators are about how images and video get made. A diffusion model is the architecture behind most AI image and video generation — it starts from noise and denoises toward the prompt. Face-lock or identity consistency is the ability to keep the same face across many generated images. Lip-sync and voice cloning are what turn a script into a talking [avatar video](/glossary/avatar-video). Text-to-video, image-to-video, and video-to-video describe what you feed a generative video model. Knowing these lets you read a tool's feature list and predict what it will actually output.
A glossary is a starting map, not the territory. The terms below are stable enough to rely on, but the field moves fast — new model names, new capabilities, and new jargon land monthly. Treat this as the working vocabulary for 2026 and expect to add to it. Where a term has its own deep entry — [content repurposing](/glossary/content-repurposing), [viral clip detection](/glossary/viral-clip-detection), [avatar video](/glossary/avatar-video), the [AI design aesthetic](/glossary/ai-design-aesthetic) — this glossary links out to it rather than repeating it.
The vocabulary creators use today was built in layers. "Neural network" and "deep learning" entered the mainstream in the 2010s as image recognition and translation started working. "GAN" (generative adversarial network) drove the first wave of synthetic images and early deepfakes in the late 2010s. The current lexicon — LLM, prompt, token, hallucination, fine-tuning — arrived with the consumer launch of ChatGPT in late 2022, which put a conversational model in front of hundreds of millions of people and forced a new set of everyday words into general use. "Prompt engineering" went from a niche skill to a résumé line inside a year.
The image and video half of the vocabulary broke out alongside it. "Diffusion model" became common knowledge in 2022 as Stable Diffusion, DALL·E, and Midjourney reached the public, replacing GANs as the dominant image architecture. "Multimodal" moved from research term to product category in 2023–2024 as models learned to handle text, images, and audio in one system. By 2026 the buzzword had shifted again to "agentic AI" — systems that act rather than just answer — as tool-using agents became the framing for the next generation of products. Each wave left a layer of vocabulary behind, which is exactly why a creator-facing glossary needs periodic refreshing: the words that mattered in 2023 are table stakes now, and the words that matter in 2026 barely existed two years earlier.
The reason a glossary is worth your time as a creator is not so you can talk about AI — it is so you can read a feature list and know what will actually come out the other end. Almost every disappointment with an AI tool traces back to a vocabulary gap: someone expected a diffusion model to keep a consistent face (it will not, without face-lock), or trusted an LLM's statistics (hallucination), or thought "multimodal" meant "great at video" (it means it accepts video). The words are a lie detector for marketing pages.
The other thing the vocabulary reveals is that most AI tools are single primitives — an LLM drafts text, a diffusion model makes an image, a clip model finds a highlight, an avatar model renders a talking head. Each does one thing. The work of turning those primitives into finished, on-brand, scheduled content is a separate layer, and it is the layer most creators underestimate. That is the layer Kompozy is: it orchestrates the primitives so you do not operate them individually. The glossary terms map cleanly onto how it works — the prompt problem is solved by a [Persona Brief](/glossary/persona-brief) that constrains voice on every generation; the hallucination and brand-safety problem is handled by [quality gates](/glossary/quality-gates) that reject invented stats and banned words before anything publishes; the agentic idea shows up as [Autopilot](/glossary/autopilot), which runs generate-and-schedule end to end; and multimodal breadth is why one source can fan out to video, image, text, blog, and newsletter across nine platforms. You do not need to run inference, tune a diffusion model, or wire up retrieval yourself — but knowing what those words mean is exactly how you judge whether the layer sitting on top of them is doing its job. If you want the deep end, the terms carry through to running your own models — see [running SOTA LLMs locally](/guides/running-sota-llms-locally).
An AI glossary is a plain-language reference that defines the vocabulary of artificial intelligence. For creators, the useful version defines the terms you actually meet when using AI to write, generate images and video, or automate publishing — LLM, token, prompt, hallucination, multimodal, agent, diffusion, and so on — by what they change about your workflow rather than by the underlying math.
A large language model is the text engine behind most AI writing tools. It is trained on huge amounts of text and generates responses by repeatedly predicting the next token. The "large" refers to the scale of the training data and the number of parameters: the biggest modern LLMs have hundreds of billions of parameters, and some are reported to exceed a trillion, while smaller models with a few billion parameters are also common. It is the thing drafting your captions, blogs, and scripts.
A token is the unit of text a language model reads and that providers bill by. In English it is roughly three-quarters of a word, so 1,000 tokens is about 750 words. Models have a context window measured in tokens (how much they can consider at once), and API usage is typically priced per token, which is why long inputs and outputs cost more.
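The three-quarters rule can be turned into a quick budget check. The sketch below is only an approximation under stated assumptions: it counts whitespace-separated words rather than running a real tokenizer, and `price_per_million` is a made-up rate for illustration, not any provider's actual price.

```python
import math

# Rule of thumb from the text: one token is roughly 3/4 of an English word.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Estimate the token count from a whitespace word count (approximation only)."""
    word_count = len(text.split())
    return math.ceil(word_count / WORDS_PER_TOKEN)


def estimate_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost for a token count at a hypothetical per-million-token price."""
    return tokens / 1_000_000 * price_per_million


draft = "word " * 750                 # stand-in for a 750-word script
tokens = estimate_tokens(draft)
print(tokens)                          # 1000
print(estimate_cost(tokens, price_per_million=2.00))  # hypothetical rate
```

Real tokenizers split text differently by language and by model, so treat the result as a planning estimate and check the provider's own token counter before relying on it.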
A hallucination is when an AI model states something false as if it were fact: an invented statistic, a fake citation, a wrong date. It is a property of how generative models work, not a fixable bug. You reduce it by grounding the model in real source material (a technique called retrieval-augmented generation, or RAG) and by verifying any fact before you publish it, but you cannot eliminate it entirely.
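To make grounding concrete, here is a minimal sketch of the retrieval half of RAG. It is a toy under loud assumptions: it ranks sources by plain word overlap instead of the embedding search production systems usually use, it stops at building the prompt rather than calling any model, and the function names and sample notes are invented for illustration.

```python
import re


def words(text: str) -> set[str]:
    """Lowercased word set, used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, sources: list[str], k: int = 1) -> list[str]:
    """Return the k sources that share the most words with the question."""
    q = words(question)
    ranked = sorted(sources, key=lambda s: len(q & words(s)), reverse=True)
    return ranked[:k]


def build_grounded_prompt(question: str, sources: list[str]) -> str:
    """Put the retrieved passages in front of the question."""
    context = "\n".join(f"- {p}" for p in retrieve(question, sources))
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n"
        f"Sources:\n{context}\n"
        f"Question: {question}"
    )


# Hypothetical creator notes, invented for this example.
notes = [
    "Our spring newsletter went out to 4,200 subscribers.",
    "The podcast publishes every Tuesday morning.",
]
print(build_grounded_prompt("How many subscribers got the newsletter?", notes))
```

The model now sees the real figure in its prompt instead of having to guess one, which lowers the chance of an invented statistic. It does not remove the need to verify the output.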
An AI agent is a system that takes a sequence of actions toward a goal — browsing the web, calling tools, running code, making decisions — rather than answering a single prompt and stopping. "Agentic" is the 2026 framing for this shift from chatbots that respond to systems that act. In content, an agentic tool might pull a source, generate outputs, and schedule them without a human step in between.
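The difference between answering once and acting in a loop can be sketched in a few lines. Everything here is hypothetical: the three tools are stubs, and `choose_next_action` is a hard-coded stand-in for the decision a real agent would delegate to a model. The part worth noticing is the loop shape: decide, act, update state, repeat until done.

```python
from typing import Callable, Optional


# Hypothetical tools: each takes the working state and returns an updated copy.
def pull_source(state: dict) -> dict:
    return {**state, "source": "transcript of episode 12"}


def generate_posts(state: dict) -> dict:
    return {**state, "posts": [f"Clip idea from {state['source']}"]}


def schedule_posts(state: dict) -> dict:
    return {**state, "scheduled": len(state["posts"])}


TOOLS: dict[str, Callable[[dict], dict]] = {
    "pull_source": pull_source,
    "generate_posts": generate_posts,
    "schedule_posts": schedule_posts,
}


def choose_next_action(state: dict) -> Optional[str]:
    """Stand-in for the model's decision: pick the next step from the state."""
    if "source" not in state:
        return "pull_source"
    if "posts" not in state:
        return "generate_posts"
    if "scheduled" not in state:
        return "schedule_posts"
    return None  # goal reached, nothing left to do


def run_agent(max_steps: int = 10) -> tuple[dict, list[str]]:
    state: dict = {}
    log: list[str] = []
    for _ in range(max_steps):  # cap the steps so the loop always ends
        action = choose_next_action(state)
        if action is None:
            break
        state = TOOLS[action](state)
        log.append(action)
    return state, log


final_state, actions = run_agent()
print(actions)  # ['pull_source', 'generate_posts', 'schedule_posts']
```

The step cap is a deliberate design choice: an agent that decides its own next action needs a hard limit so a bad decision cannot loop forever.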
Multimodal AI describes a model that handles more than one type of content — text, images, audio, and video — within a single system, rather than needing a separate tool for each. It refers to the inputs and outputs the model can process, not a guarantee that it is equally good at all of them.
The highest-value terms for a creator to know are LLM (the text engine), hallucination (why you verify facts), prompt and prompt engineering (how you steer output), multimodal (what formats a model handles), diffusion model (the architecture behind AI images and video), face-lock (consistent identity across generated images), and agentic AI (systems that act, not just answer). Together they let you read any tool's feature list and predict what it will actually produce.