Glossary
A plain-English glossary of the methods and ideas behind modern AI.
98 terms
| Term | Definition | Date |
|---|---|---|
| Ablation | An experiment that removes or changes one component of a model or training setup to measure how much it actually contributes. | May 22, 2026 |
| Adam Optimizer | A widely used adaptive optimizer that scales each parameter's update using running estimates of the gradient's mean and variance. | May 21, 2026 |
| Agentic Reinforcement Learning | Training LLMs with RL where the model takes multi-step actions in an environment (tools, code, web search) and is rewarded on task outcomes. | May 21, 2026 |
| AI Agents | Systems where a model plans, uses tools, and acts over multiple steps. | May 21, 2026 |
| AI Alignment | Making AI systems pursue what people actually intend. | May 21, 2026 |
| Attention | A mechanism that lets a model weigh which other tokens matter for each token. | May 21, 2026 |
| Autoencoder | A network that compresses data and reconstructs it. | May 21, 2026 |
| Autoregressive Models | Models that generate output one token at a time, left to right. | May 21, 2026 |
| Backpropagation | The algorithm that trains neural networks by propagating error gradients. | May 21, 2026 |
| Batch Normalization | Normalizing layer activations to stabilize and speed up training. | May 21, 2026 |
| Chain-of-Thought Prompting | Prompting a model to reason step by step before answering. | May 21, 2026 |
| CLIP | A model that learns shared text and image embeddings from paired data. | May 21, 2026 |
| Constitutional AI | Aligning models using a written set of principles and AI-generated feedback in place of most human labels. | May 21, 2026 |
| Context Engineering | Systematically constructing and evolving an agent’s context — instructions, memory, retrieved info, tool results — as the primary lever for performance. | May 21, 2026 |
| Context Window | The maximum number of tokens a model can attend to at once, spanning both the prompt and its generated output. | May 22, 2026 |
| Contrastive Learning | Learning by pulling related items together and pushing others apart. | May 21, 2026 |
| Convolutional Neural Network | A network that applies small learned filters across its input, well suited to images. | May 21, 2026 |
| Data Contamination | When benchmark or test data leaks into a model’s training set, inflating its scores. | May 22, 2026 |
| Deep Learning | Machine learning with many-layered neural networks that learn features automatically. | May 21, 2026 |
| Deep Research Agents | Autonomous agents that iteratively search, read, reason, and synthesize multi-source, cited reports on a question over many steps. | May 21, 2026 |
| DeepSeek Sparse Attention (DSA) | A fine-grained trainable sparse attention built on Multi-head Latent Attention, using a lightning indexer plus token selection. | May 21, 2026 |
| Diffusion Models | Generative models that create data by gradually denoising random noise. | May 21, 2026 |
| Direct Preference Optimization | A simpler alternative to RLHF that optimizes preferences without a reward model. | May 21, 2026 |
| Dropout | Randomly disabling neurons during training to prevent overfitting. | May 21, 2026 |
| Embeddings | Numeric vectors that capture the meaning of text, images, or other data. | May 21, 2026 |
| Emergent Abilities | Capabilities that appear only once models get large enough. | May 21, 2026 |
| Epoch | One full pass over the entire training dataset. | May 22, 2026 |
| Expert Systems | Classic AI based on hand-coded rules and knowledge. | May 21, 2026 |
| Few-Shot Learning | Performing a task from just a handful of examples. | May 21, 2026 |
| Fine-tuning | Adapting a pretrained model to a specific task or domain with further training. | May 21, 2026 |
| FlashAttention | An exact, IO-aware attention algorithm that runs faster and uses less memory than standard attention implementations. | May 21, 2026 |
| FLOPs | Floating-point operations — the standard unit for measuring the compute used to train or run a model. | May 22, 2026 |
| Foundation Models | Large models pretrained broadly, then adapted to many tasks. | May 21, 2026 |
| Generative Adversarial Network | Two networks, a generator and a discriminator (critic), trained against each other. | May 21, 2026 |
| Gradient Descent | Iteratively nudging parameters downhill to minimize a loss. | May 21, 2026 |
| Group Relative Policy Optimization (GRPO) | A critic-free RL algorithm that estimates the advantage baseline from a group of sampled responses to the same prompt. | May 21, 2026 |
| Group Sequence Policy Optimization (GSPO) | A reasoning-RL algorithm that computes importance ratios and clips at the whole-sequence level rather than per token. | May 21, 2026 |
| Grouped-Query Attention | An attention variant that shares key/value heads to speed inference. | May 21, 2026 |
| Hallucination | When a model states false information confidently. | May 21, 2026 |
| Hyperparameter | A configuration value set before training (e.g. learning rate, batch size) rather than learned by the model. | May 22, 2026 |
| In-Context Learning | Learning a task from examples in the prompt, without weight updates. | May 21, 2026 |
| Inference | Running a trained model to produce outputs — as opposed to training, which updates its weights. | May 22, 2026 |
| Instruction Tuning | Fine-tuning a base model to follow natural-language instructions. | May 21, 2026 |
| Interactive World Models | Generative models that produce explorable, action-conditioned 3D/video environments in real time from a prompt. | May 21, 2026 |
| Knowledge Distillation | Training a small "student" model to imitate a larger "teacher". | May 21, 2026 |
| KV Cache | Cached attention keys and values that let a model generate each new token without recomputing the whole sequence. | May 22, 2026 |
| Latent Reasoning | Performing reasoning in the model’s continuous hidden-state space instead of by emitting explicit chain-of-thought tokens. | May 21, 2026 |
| Layer Normalization | Normalizing across features within each token. | May 21, 2026 |
| Logits | A model’s raw, unnormalized output scores for each possible next token, converted to probabilities by softmax. | May 22, 2026 |
| LoRA | Low-Rank Adaptation — efficient fine-tuning by training small adapter matrices. | May 21, 2026 |
| LSTM | Long Short-Term Memory: a recurrent network whose gates control what to keep and what to forget, helping it retain long-range information. | May 21, 2026 |
| Masked Language Modeling | Pretraining by predicting hidden (masked) tokens. | May 21, 2026 |
| Mixture of Experts | A sparse architecture that routes each token to a few specialized sub-networks. | May 21, 2026 |
| Mixture-of-Recursions (MoR) | An architecture that recursively reuses shared layers and routes each token to its own recursion depth for adaptive computation. | May 21, 2026 |
| Model Context Protocol (MCP) | An open standard (JSON-RPC) for connecting LLMs and agents to external tools, data, and resources via a uniform client–server interface. | May 21, 2026 |
| Monte Carlo Tree Search | A search method that samples and evaluates promising move sequences. | May 21, 2026 |
| Multimodal Learning | Training models that jointly process multiple data types, such as text, images, and audio. | May 21, 2026 |
| Native Sparse Attention (NSA) | A hardware-aligned, end-to-end-trainable sparse attention using hierarchical token compression and selection. | May 21, 2026 |
| Neural Network | A model of interconnected "neurons" that learns patterns from data. | May 21, 2026 |
| Overfitting | When a model memorizes its training data and fails to generalize to new, unseen inputs. | May 22, 2026 |
| Parameters | The learned weights of a neural network; the "parameter count" (e.g. 70B) is a rough proxy for a model’s size and capacity. | May 22, 2026 |
| Perceptron | The earliest trainable artificial neuron. | May 21, 2026 |
| Perplexity | A measure of how well a language model predicts a sample of text; lower is better. | May 22, 2026 |
| Positional Encoding | How Transformers represent the order of tokens. | May 21, 2026 |
| Pretraining | The large-scale self-supervised stage that builds a model’s base knowledge. | May 21, 2026 |
| Prompt Engineering | Crafting inputs to steer a model toward better outputs. | May 21, 2026 |
| Quantization | Shrinking models by storing weights at lower numerical precision. | May 21, 2026 |
| Reasoning Models | Models that spend extra compute "thinking" before they answer. | May 21, 2026 |
| Recurrent Neural Network | A network with memory, processing sequences one step at a time. | May 21, 2026 |
| Reinforcement Learning | Learning by trial and error to maximize reward. | May 21, 2026 |
| Residual Connections | Skip connections that add a layer's input to its output, making very deep networks trainable. | May 21, 2026 |
| Retrieval-Augmented Generation | Grounding model answers in documents fetched at query time. | May 21, 2026 |
| Reward Model | A model trained to score outputs by human preference, used to guide reinforcement learning during alignment. | May 22, 2026 |
| RLHF | Reinforcement Learning from Human Feedback — aligning models to human preferences. | May 21, 2026 |
| Rotary Position Embedding | A positional scheme that rotates query/key vectors by position. | May 21, 2026 |
| Scaling Laws | Empirical relationships predicting model quality from compute, data, and size. | May 21, 2026 |
| Self-Play | Improving by training against copies of oneself. | May 21, 2026 |
| Self-Supervised Learning | Learning from unlabeled data by predicting part of it from the rest. | May 21, 2026 |
| Sequence-to-Sequence | An encoder-decoder framework that maps one sequence to another. | May 21, 2026 |
| Speculative Decoding | Speeding up generation by having a small draft model propose tokens that the larger model then verifies. | May 21, 2026 |
| State Space Models | Sequence models whose cost scales linearly with sequence length, an alternative to attention. | May 21, 2026 |
| Supervised Fine-Tuning (SFT) | Training a pretrained model on labeled input→output examples to teach a specific behavior, format, or task. | May 22, 2026 |
| Supervised Learning | Learning from labeled input-output examples. | May 21, 2026 |
| Synthetic Data | Training data generated by models rather than collected from humans. | May 22, 2026 |
| Temperature | A sampling setting that controls randomness: low values make output focused and nearly deterministic, high values make it more diverse. | May 22, 2026 |
| Test-Time Compute Scaling | Improving accuracy by spending more computation at inference — longer reasoning or multiple samples — rather than only at training. | May 21, 2026 |
| Token | The atomic unit of text a model reads and generates — typically a word, sub-word, or character chunk. | May 22, 2026 |
| Tokenization | Splitting text into the discrete units a model actually reads. | May 21, 2026 |
| Tool Use | Letting models call external tools, code, and APIs. | May 21, 2026 |
| Top-p (Nucleus) Sampling | Sampling the next token from the smallest set of candidates whose cumulative probability reaches at least p. | May 22, 2026 |
| Transfer Learning | Reusing a model trained on one task to bootstrap another. | May 21, 2026 |
| Transformer | The neural-network architecture behind virtually all modern language models. | May 21, 2026 |
| Tree of Thoughts | Reasoning by exploring and evaluating multiple solution paths. | May 21, 2026 |
| Turing Test | Alan Turing’s proposed test of machine intelligence: whether a machine's conversation can pass for a human's. | May 21, 2026 |
| Unsupervised Learning | Finding structure in data without labels. | May 21, 2026 |
| Variational Autoencoder | An autoencoder that learns a smooth, samplable latent space. | May 21, 2026 |
| Vision Transformer | Applying the Transformer architecture directly to images. | May 21, 2026 |
| Zero-Shot Learning | Doing a task with no examples, from instructions alone. | May 21, 2026 |
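
Several entries above (Logits, Temperature, Top-p Sampling) describe steps of the same sampling pipeline, so a small worked example may help tie them together. The following is a minimal sketch in plain Python, not any particular library's implementation. The function names `softmax` and `top_p_filter` and the four-token `logits` list are hypothetical, chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches at least p, then renormalize over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


# Hypothetical scores for a made-up 4-token vocabulary.
logits = [2.0, 1.0, 0.1, -1.0]

probs = softmax(logits)                   # temperature 1.0: unchanged distribution
sharp = softmax(logits, temperature=0.5)  # lower temperature: more peaked
flat = softmax(logits, temperature=2.0)   # higher temperature: more uniform
nucleus = top_p_filter(probs, p=0.9)      # token indices that survive top-p
```

With these made-up logits, the top token's probability rises as the temperature drops, and the top-p filter with `p=0.9` discards only the least likely token. To actually draw a token from the filtered set, one option is the standard library's `random.choices(list(nucleus), weights=list(nucleus.values()))`.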