Glossary

A plain-English glossary of the methods and ideas behind modern AI.

98 terms

Term	Definition	Date
Ablation	An experiment that removes or changes one component of a model or training setup to measure how much it actually contributes.	May 22, 2026
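Adam Optimizer	A widely used adaptive optimizer for training neural networks. It adjusts each parameter's step size using running averages of that parameter's gradients and squared gradients.	May 21, 2026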
Agentic Reinforcement Learning	Training LLMs with RL where the model takes multi-step actions in an environment (tools, code, web search) and is rewarded on task outcomes.	May 21, 2026
AI Agents	Systems where a model plans, uses tools, and acts over multiple steps.	May 21, 2026
AI Alignment	Making AI systems pursue what people actually intend.	May 21, 2026
Attention	A mechanism that lets a model weigh which other tokens matter for each token.	May 21, 2026
Autoencoder	A network that compresses data and reconstructs it.	May 21, 2026
Autoregressive Models	Models that generate output one token at a time, left to right.	May 21, 2026
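Backpropagation	The algorithm that computes how much each weight contributed to the error. It applies the chain rule backward through the network and produces the gradients used to train it.	May 21, 2026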
Batch Normalization	Normalizing layer activations to stabilize and speed up training.	May 21, 2026
Chain-of-Thought Prompting	Prompting a model to reason step by step before answering.	May 21, 2026
CLIP	A model that learns shared text and image embeddings from paired data.	May 21, 2026
Constitutional AI	Aligning models with AI-generated feedback guided by a written set of principles, which reduces reliance on human labels.	May 21, 2026
Context Engineering	Systematically constructing and evolving an agent’s context — instructions, memory, retrieved info, tool results — as the primary lever for performance.	May 21, 2026
Context Window	The maximum number of tokens a model can attend to at once, spanning both the prompt and its generated output.	May 22, 2026
Contrastive Learning	Learning by pulling related items together and pushing others apart.	May 21, 2026
Convolutional Neural Network	A network that slides small learned filters across its input, which makes it well suited to images.	May 21, 2026
Data Contamination	When benchmark or test data leaks into a model’s training set, inflating its scores.	May 22, 2026
Deep Learning	Machine learning with many-layered neural networks that learn features automatically.	May 21, 2026
Deep Research Agents	Autonomous agents that iteratively search, read, reason, and synthesize multi-source, cited reports on a question over many steps.	May 21, 2026
DeepSeek Sparse Attention (DSA)	A fine-grained, trainable sparse attention mechanism built on Multi-head Latent Attention. A lightweight "lightning indexer" scores earlier tokens, and each query attends only to the top-scoring tokens it selects.	May 21, 2026
Diffusion Models	Generative models that create data by gradually denoising random noise.	May 21, 2026
Direct Preference Optimization	A simpler alternative to RLHF that trains directly on pairs of preferred and rejected responses, without a separate reward model or RL loop.	May 21, 2026
Dropout	Randomly disabling neurons during training to prevent overfitting.	May 21, 2026
Embeddings	Numeric vectors that capture the meaning of text, images, or other data.	May 21, 2026
Emergent Abilities	Capabilities that appear to show up only once models reach a certain scale.	May 21, 2026
Epoch	One full pass over the entire training dataset.	May 22, 2026
Expert Systems	Classic AI based on hand-coded rules and knowledge.	May 21, 2026
Few-Shot Learning	Performing a task from just a handful of examples.	May 21, 2026
Fine-tuning	Adapting a pretrained model to a specific task or domain with further training.	May 21, 2026
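FlashAttention	An exact, IO-aware attention algorithm. It processes attention in tiles so the full attention matrix is never stored in slow GPU memory, making it faster and more memory-efficient than standard attention.	May 21, 2026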
FLOPs	Floating-point operations — the standard unit for measuring the compute used to train or run a model.	May 22, 2026
Foundation Models	Large models pretrained broadly, then adapted to many tasks.	May 21, 2026
Generative Adversarial Network	Two networks, a generator and a discriminator, trained against each other: the generator produces fake samples and the discriminator tries to tell them from real ones.	May 21, 2026
Gradient Descent	Iteratively nudging parameters downhill to minimize a loss.	May 21, 2026
Group Relative Policy Optimization (GRPO)	A critic-free RL algorithm that estimates the advantage baseline from a group of sampled responses to the same prompt.	May 21, 2026
Group Sequence Policy Optimization (GSPO)	A reasoning-RL algorithm that computes importance ratios and clips at the whole-sequence level rather than per token.	May 21, 2026
Grouped-Query Attention	An attention variant in which groups of query heads share a single key/value head. This shrinks the KV cache and speeds up inference.	May 21, 2026
Hallucination	When a model states false information confidently.	May 21, 2026
Hyperparameter	A configuration value set before training (e.g. learning rate, batch size) rather than learned by the model.	May 22, 2026
In-Context Learning	Learning a task from examples in the prompt, without weight updates.	May 21, 2026
Inference	Running a trained model to produce outputs — as opposed to training, which updates its weights.	May 22, 2026
Instruction Tuning	Fine-tuning a base model to follow natural-language instructions.	May 21, 2026
Interactive World Models	Generative models that produce explorable, action-conditioned 3D/video environments in real time from a prompt.	May 21, 2026
Knowledge Distillation	Training a small "student" model to imitate a larger "teacher".	May 21, 2026
KV Cache	Cached attention keys and values that let a model generate each new token without recomputing the whole sequence.	May 22, 2026
Latent Reasoning	Performing reasoning in the model’s continuous hidden-state space instead of by emitting explicit chain-of-thought tokens.	May 21, 2026
Layer Normalization	Normalizing activations across the features of each individual token, independently of the rest of the batch.	May 21, 2026
Logits	A model’s raw, unnormalized output scores for each possible next token, converted to probabilities by softmax.	May 22, 2026
LoRA	Low-Rank Adaptation: efficient fine-tuning that freezes the pretrained weights and trains only small low-rank adapter matrices added to them.	May 21, 2026
LSTM	A recurrent network with gates that remember long-range information.	May 21, 2026
Masked Language Modeling	Pretraining by predicting hidden (masked) tokens.	May 21, 2026
Mixture of Experts	A sparse architecture that routes each token to a few specialized sub-networks.	May 21, 2026
Mixture-of-Recursions (MoR)	An architecture that recursively reuses shared layers and routes each token to its own recursion depth for adaptive computation.	May 21, 2026
Model Context Protocol (MCP)	An open standard, built on JSON-RPC, for connecting LLMs and agents to external tools, data, and resources through a uniform client–server interface.	May 21, 2026
Monte Carlo Tree Search	A search method that samples and evaluates promising move sequences.	May 21, 2026
Multimodal Learning	Models that jointly understand multiple data types, such as text, images, and audio.	May 21, 2026
Native Sparse Attention (NSA)	A hardware-aligned, end-to-end-trainable sparse attention using hierarchical token compression and selection.	May 21, 2026
Neural Network	A model of interconnected "neurons" that learns patterns from data.	May 21, 2026
Overfitting	When a model memorizes its training data and fails to generalize to new, unseen inputs.	May 22, 2026
Parameters	The learned weights of a neural network; the "parameter count" (e.g. 70B) is a rough proxy for a model’s size and capacity.	May 22, 2026
Perceptron	The earliest trainable artificial neuron.	May 21, 2026
Perplexity	A measure of how well a language model predicts a sample of text. It is the exponential of the average per-token negative log-likelihood, and lower is better.	May 22, 2026
Positional Encoding	How Transformers represent the order of tokens.	May 21, 2026
Pretraining	The large-scale self-supervised stage that builds a model’s base knowledge.	May 21, 2026
Prompt Engineering	Crafting inputs to steer a model toward better outputs.	May 21, 2026
Quantization	Shrinking models by storing weights at lower numerical precision.	May 21, 2026
Reasoning Models	Models that spend extra compute "thinking" before they answer.	May 21, 2026
Recurrent Neural Network	A network that processes a sequence one step at a time and carries a hidden state forward as its memory.	May 21, 2026
Reinforcement Learning	Learning by trial and error to maximize reward.	May 21, 2026
Residual Connections	Skip connections that add a layer's input to its output, which makes very deep networks trainable.	May 21, 2026
Retrieval-Augmented Generation	Grounding model answers in documents fetched at query time.	May 21, 2026
Reward Model	A model trained to score outputs by human preference, used to guide reinforcement learning during alignment.	May 22, 2026
RLHF	Reinforcement Learning from Human Feedback — aligning models to human preferences.	May 21, 2026
Rotary Position Embedding	A positional scheme that rotates query/key vectors by position.	May 21, 2026
Scaling Laws	Empirical relationships predicting model quality from compute, data, and size.	May 21, 2026
Self-Play	Improving by training against copies of oneself.	May 21, 2026
Self-Supervised Learning	Learning from unlabeled data by predicting part of it from the rest.	May 21, 2026
Sequence-to-Sequence	An encoder-decoder framework that maps one sequence to another.	May 21, 2026
Speculative Decoding	Speeding up generation by having a small draft model propose several tokens, which the large model then verifies in a single pass.	May 21, 2026
State Space Models	Sequence models whose cost grows linearly with sequence length, offered as an alternative to attention.	May 21, 2026
Supervised Fine-Tuning (SFT)	Training a pretrained model on labeled input→output examples to teach a specific behavior, format, or task.	May 22, 2026
Supervised Learning	Learning from labeled input-output examples.	May 21, 2026
Synthetic Data	Training data generated by models rather than collected from humans.	May 22, 2026
Temperature	A sampling setting that divides the logits before softmax to control randomness. Low values make output more focused and deterministic, and high values make it more diverse.	May 22, 2026
Test-Time Compute Scaling	Improving accuracy by spending more computation at inference — longer reasoning or multiple samples — rather than only at training.	May 21, 2026
Token	The atomic unit of text a model reads and generates — typically a word, sub-word, or character chunk.	May 22, 2026
Tokenization	Splitting text into the discrete units a model actually reads.	May 21, 2026
Tool Use	Letting models call external tools, code, and APIs.	May 21, 2026
Top-p (Nucleus) Sampling	Sampling the next token from the smallest set of candidates whose cumulative probability reaches at least p.	May 22, 2026
Transfer Learning	Reusing a model trained on one task to bootstrap another.	May 21, 2026
Transformer	The neural-network architecture behind virtually all modern language models.	May 21, 2026
Tree of Thoughts	Reasoning by exploring and evaluating multiple solution paths.	May 21, 2026
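Turing Test	Alan Turing's 1950 "imitation game": a proposed test of machine intelligence in which a judge tries to tell a machine from a human through conversation alone.	May 21, 2026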
Unsupervised Learning	Finding structure in data without labels.	May 21, 2026
Variational Autoencoder	An autoencoder that learns a smooth, samplable latent space.	May 21, 2026
Vision Transformer	Applying the Transformer architecture directly to images by splitting them into patches that are treated like tokens.	May 21, 2026
Zero-Shot Learning	Doing a task with no examples, from instructions alone.	May 21, 2026
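The Logits, Temperature, and Top-p (Nucleus) Sampling entries describe one pipeline:
1. Divide the logits by the temperature.
2. Apply softmax to get probabilities.
3. Keep the smallest set of tokens whose probability reaches p.
4. Sample from that set.

Below is a minimal Python sketch of that pipeline for a tiny vocabulary given as a plain list of logits. The function names (softmax, nucleus, sample_next_token) are illustrative only and do not come from any particular library.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, dividing by the temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus(probs, p):
    """Indices of the smallest set of tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept

def sample_next_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Hypothetical helper: temperature, then top-p filtering, then a random draw."""
    probs = softmax(logits, temperature)
    kept = nucleus(probs, top_p)
    weights = [probs[i] for i in kept]  # random.choices renormalizes the weights itself
    return rng.choices(kept, weights=weights, k=1)[0]
```

Take logits [2.0, 1.0, 0.1] and top_p=0.9. Only the first two tokens survive the nucleus cut, because their probabilities (about 0.66 and 0.24) already sum past 0.9. Lowering the temperature to 0.5 sharpens the distribution further toward the first token.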
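The Perplexity entry can be made concrete with a short calculation. Suppose the model assigned a probability to each token that actually came next in the text. Perplexity is the exponential of the average negative log of those probabilities, which equals the inverse of their geometric mean. Here is a minimal sketch, assuming those per-token probabilities are already available as a list:

```python
import math

def perplexity(token_probs):
    """Perplexity from the probabilities a model gave to each actual next token."""
    if not token_probs:
        raise ValueError("need at least one token")
    # Average negative log-likelihood per token, then exponentiate.
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)
```

Consider a model that always spreads its probability evenly over 4 choices, so perplexity([0.25] * 4) is about 4.0. That model is "as uncertain as" a fair 4-way guess. A model that is always certain and correct scores the minimum of 1.0.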
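The Attention and Transformer entries rest on scaled dot-product attention, softmax(QKᵀ / √d)V. Each token's output is a weighted average of value vectors, and the weights come from how well its query matches each key. The pure-Python sketch below assumes a single head with no learned projections and no causal mask. Real models add all three, so treat it as an illustration of the core formula only.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Single-head scaled dot-product attention on plain lists of vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors, one output dimension at a time.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs
```

When all keys are identical, every value gets equal weight and the output is their mean. A query that strongly matches one key returns roughly that key's value.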
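The Adam Optimizer and Gradient Descent entries can be illustrated with a hand-written update rule. Adam keeps two running averages for each parameter: m for the gradient and v for the squared gradient. It corrects both for their zero initialization and steps by lr · m̂ / (√v̂ + ε). This is a minimal sketch on a list of scalar parameters. The class name and interface are hypothetical, and the default values follow the original Adam paper (lr=0.001, β₁=0.9, β₂=0.999, ε=1e-8).

```python
import math

class Adam:
    """Minimal, illustrative Adam optimizer for a list of scalar parameters."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # running average of gradients (first moment)
        self.v = None  # running average of squared gradients (second moment)
        self.t = 0     # step counter, used for bias correction

    def step(self, params, grads):
        """Return updated parameters after one Adam step."""
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias-corrected first moment
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)  # bias-corrected second moment
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated
```

As a usage example, you can minimize (x − 3)², whose gradient is 2(x − 3). Call opt.step([x], [2 * (x - 3)]) repeatedly, starting from x = 0 with lr=0.1. The first step moves x by almost exactly lr, because m̂ / √v̂ starts near ±1. After about a thousand steps, x settles close to 3. That scale-free first step is what the glossary means by "adaptive".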