QLoRA

Quantized Low-Rank Adaptation

Definition

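QLoRA combines 4-bit quantization of a pretrained base model with LoRA fine-tuning, which makes it possible to fine-tune very large models (up to 65B parameters) on a single GPU. The base model's weights are stored in NF4 (4-bit NormalFloat) format and kept frozen. During the forward and backward passes they are dequantized to a 16-bit compute type (typically BFloat16), and gradients flow through them only to update the LoRA adapters, which are trained in 16-bit precision.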
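The mechanics above can be illustrated with a small pure-Python sketch: build a 16-level NF4-style codebook from standard-normal quantiles, absmax-quantize blocks of a frozen weight matrix to 4-bit codes, then compute `y = dequant(W) x + (alpha / r) * B (A x)`. This is a simplified illustration, not the bitsandbytes implementation. Assumptions: the codebook follows the asymmetric NF4 recipe (8 positive levels, 7 negative levels, and an exact zero) with an assumed `offset` of 0.9677; each weight row is treated as one quantization block (real QLoRA uses fixed-size blocks, 64 weights in the paper); all function names are hypothetical.

```python
from statistics import NormalDist


def nf4_style_levels(offset=0.9677):
    """Return 16 sorted levels in [-1, 1] built from standard-normal quantiles.

    Assumption: asymmetric NF4-style recipe (8 positive, 7 negative, exact zero);
    the offset value is an assumed parameter.
    """
    nd = NormalDist()

    def linspace(a, b, n):
        return [a + (b - a) * i / (n - 1) for i in range(n)]

    pos = [nd.inv_cdf(p) for p in linspace(offset, 0.5, 9)[:-1]]   # 8 positive
    neg = [-nd.inv_cdf(p) for p in linspace(offset, 0.5, 8)[:-1]]  # 7 negative
    levels = sorted(pos + [0.0] + neg)
    scale = max(abs(v) for v in levels)
    return [v / scale for v in levels]  # normalize so the extremes are -1 and 1


def quantize_block(block, levels):
    """Absmax-scale one block and map each weight to its nearest level index (0..15)."""
    absmax = max(abs(w) for w in block) or 1.0  # avoid dividing by zero
    codes = [min(range(len(levels)), key=lambda i: abs(w / absmax - levels[i]))
             for w in block]
    return codes, absmax


def dequantize_block(codes, absmax, levels):
    """Recover approximate weights from 4-bit codes and the block's absmax constant."""
    return [levels[c] * absmax for c in codes]


def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def qlora_forward(w_codes, w_absmax, levels, lora_a, lora_b, x, alpha, r):
    """y = dequant(W) x + (alpha / r) * B (A x).

    The quantized base weight stays frozen; in training, only lora_a (r x d_in)
    and lora_b (d_out x r) would receive gradient updates.
    """
    w = [dequantize_block(c, s, levels) for c, s in zip(w_codes, w_absmax)]
    base = matvec(w, x)
    update = matvec(lora_b, matvec(lora_a, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


# Usage: quantize a tiny 2 x 4 weight matrix, one block per row.
levels = nf4_style_levels()
W = [[0.20, -0.10, 0.05, 0.00], [-0.30, 0.25, 0.10, -0.05]]
quantized = [quantize_block(row, levels) for row in W]
codes = [c for c, _ in quantized]
scales = [s for _, s in quantized]
A = [[0.01, 0.02, -0.01, 0.00]]  # r = 1, d_in = 4
B = [[0.0], [0.0]]               # LoRA initializes B to zero, so the update starts at 0
x = [1.0, 2.0, 3.0, 4.0]
y = qlora_forward(codes, scales, levels, A, B, x, alpha=16, r=1)
```

Because `B` starts at zero, the first forward pass reproduces the dequantized base model exactly; fine-tuning then changes only `A` and `B`, while the 4-bit codes and their per-block constants are never modified.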

QLoRA demonstrated that a 65B-parameter model can be fine-tuned on a single 48GB GPU while preserving the task performance of full 16-bit fine-tuning.
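Back-of-the-envelope arithmetic shows why 4-bit storage matters at this scale. The sketch below counts only the base-model weights, in decimal gigabytes. Assumptions: one 32-bit absmax constant per block of 64 weights (adding 32 / 64 = 0.5 bits per parameter), no double quantization, and no accounting for the LoRA adapters, their optimizer state, or activations.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


n = 65e9                                        # 65B parameters
fp16_gb = weight_memory_gb(n, 16)               # 16-bit weights
nf4_gb = weight_memory_gb(n, 4)                 # 4-bit codes only
# Assumption: one fp32 absmax constant per 64-weight block -> +0.5 bits/param.
nf4_with_consts_gb = weight_memory_gb(n, 4 + 32 / 64)

print(fp16_gb, nf4_gb, nf4_with_consts_gb)
```

This prints `130.0 32.5 36.5625`: 16-bit weights alone would not fit on a 48GB GPU, while the 4-bit version leaves roughly 11 GB for the adapters, their optimizer state, and activations.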