AI Alignment

Making AI systems pursue what people actually intend.

Alignment is the problem of ensuring that AI systems’ goals and behavior match human values and intentions, especially as those systems grow more capable. Techniques such as RLHF (reinforcement learning from human feedback) and Constitutional AI are practical alignment methods; the broader research field studies how to keep AI systems safe as they scale.

Related papers