Quantization

Model Quantization

Definition

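Quantization is a model compression technique that reduces the numerical precision of weights and/or activations from floating-point formats (FP32, FP16) to lower-bit integer formats (INT8, INT4). For example, a value stored in INT8 takes a quarter of the memory of the same value in FP32. The lower precision shrinks the memory footprint, can increase inference throughput on hardware with fast low-precision arithmetic, and lowers power consumption, usually at the cost of a small, task-dependent loss in accuracy.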
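To make the float-to-integer mapping concrete, here is a minimal sketch of symmetric, per-tensor INT8 quantization in plain Python. The function names (quantize_symmetric, dequantize) and the sample weights are hypothetical and chosen only for illustration. Real libraries typically work on tensors and often use per-channel or per-group scales instead.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one shared scale (symmetric, per-tensor)."""
    qmax = 2 ** (num_bits - 1) - 1               # 127 for INT8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step, then clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the integer codes."""
    return [x * scale for x in q]


weights = [0.42, -1.0, 0.1, 0.0]                 # illustrative values, not real model weights
q, scale = quantize_symmetric(weights)
restored = dequantize(q, scale)

print(q)         # [53, -127, 13, 0]
print(restored)  # close to the originals; each error is at most scale / 2
```

The rounding step is where accuracy is lost: each restored value can differ from the original by up to half a quantization step (scale / 2), so a tensor with a few large outliers gets a coarse scale and larger errors on its small values.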

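Methods fall into two families: post-training quantization (PTQ), which quantizes an already-trained model (for example GPTQ and AWQ), and quantization-aware training (QAT), which simulates quantization during training so the model learns to tolerate the reduced precision.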
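As a rough illustration of the post-training approach, the sketch below derives quantization parameters from a handful of calibration samples using simple min/max calibration with an asymmetric (affine) 8-bit mapping. This is not GPTQ or AWQ, which use more elaborate, error-aware procedures. The function names and sample values are hypothetical.

```python
def calibrate_minmax(samples, num_bits=8):
    """Derive an affine (scale, zero_point) pair from calibration samples."""
    qmin, qmax = 0, 2 ** num_bits - 1            # unsigned range, 0..255 for 8 bits
    lo = min(0.0, min(samples))                  # include 0.0 so it maps to an exact code
    hi = max(0.0, max(samples))
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = round(qmin - lo / scale)        # integer code that represents 0.0
    return scale, zero_point


def quantize_affine(x, scale, zero_point, num_bits=8):
    """Quantize one float, clamping values outside the calibrated range."""
    q = round(x / scale) + zero_point
    return max(0, min(2 ** num_bits - 1, q))


def dequantize_affine(q, scale, zero_point):
    """Map an integer code back to an approximate float."""
    return (q - zero_point) * scale


calibration_activations = [-1.0, 0.0, 0.5, 3.0]  # stand-in for recorded activations
scale, zero_point = calibrate_minmax(calibration_activations)

print(zero_point)                                   # 64
print(quantize_affine(3.0, scale, zero_point))      # 255
print(quantize_affine(10.0, scale, zero_point))     # 255 (clamped: outside calibrated range)
```

Quantization-aware training typically applies the same quantize-then-dequantize round trip inside the forward pass during training, so the model adapts to the rounding error rather than meeting it for the first time at inference.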