LoRA

Low-Rank Adaptation — efficient fine-tuning by training small adapter matrices.

LoRA freezes the original model weights and injects small, trainable low-rank matrices into each layer. Only those few parameters are updated during fine-tuning, cutting memory and compute by orders of magnitude while matching full fine-tuning quality on many tasks.

It made customizing large models practical on modest hardware and underpins most open-source fine-tuning today.

Related papers