Model Quantization
Definition
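Quantization is a model compression technique that reduces the numerical precision of a model's weights and/or activations, typically from floating-point formats (FP32 or FP16) to lower-bit integer formats such as INT8 or INT4. For example, storing a weight as INT8 instead of FP32 cuts its memory from 4 bytes to 1. Lower precision reduces the memory footprint, can increase inference throughput on hardware with fast low-bit arithmetic, and lowers power consumption. The cost is usually a small loss in accuracy.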
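To make the precision reduction concrete, here is a minimal sketch of symmetric, per-tensor, round-to-nearest quantization, the simplest scheme. It assumes one floating-point scale per tensor and the symmetric integer range [-qmax, qmax]. The function names `quantize_symmetric` and `dequantize` are hypothetical and do not come from any particular library. The sketch shows how the rounding error grows when going from INT8 to INT4.

```python
def quantize_symmetric(values, bits=8):
    """Symmetric per-tensor round-to-nearest quantization to signed `bits`-bit ints.

    Uses the symmetric range [-qmax, qmax], so the most negative code
    -(2 ** (bits - 1)) is left unused.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 7 for INT4
    max_abs = max(abs(v) for v in values)
    # A single FP scale maps the largest magnitude onto qmax; avoid dividing by zero.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Python's round() rounds halves to even; the clamp keeps codes in range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate real values."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.003, 1.0]
for bits in (8, 4):
    q, scale = quantize_symmetric(weights, bits)
    restored = dequantize(q, scale)
    max_err = max(abs(w - r) for w, r in zip(weights, restored))
    print(f"INT{bits}: q={q} scale={scale:.5f} max_err={max_err:.5f}")
```

With round-to-nearest, each restored value is within `scale / 2` of the original. Fewer bits means a coarser grid (larger scale), so INT4 shows a noticeably larger error than INT8 on the same weights. Production schemes often refine this with per-channel or per-group scales.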
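Methods fall into two broad families. Post-training quantization (PTQ) quantizes an already-trained model, typically using a small calibration dataset (e.g. GPTQ, AWQ). Quantization-aware training (QAT) simulates quantization during training or fine-tuning so the model learns to compensate for the rounding error.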
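The QAT idea can be sketched with "fake quantization": in the forward pass, a weight is quantized and immediately dequantized, so the loss sees the rounding error. In the backward pass, a straight-through estimator (STE) treats rounding as the identity inside the representable range. This is a minimal illustration, not GPTQ, AWQ, or any framework's API. The fixed scale of 0.1, the toy loss, and the names `fake_quant` and `fake_quant_grad` are all assumptions made for the example. Real QAT setups typically calibrate or learn the scale.

```python
def fake_quant(x, scale, qmin=-128, qmax=127):
    """Forward: quantize to the integer grid, then dequantize back to float."""
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale


def fake_quant_grad(x, scale, upstream_grad, qmin=-128, qmax=127):
    """Backward via the straight-through estimator (STE).

    round() has zero gradient almost everywhere, so the STE passes the
    upstream gradient through unchanged inside the representable range
    and blocks it (returns 0) outside that range.
    """
    in_range = qmin * scale <= x <= qmax * scale
    return upstream_grad if in_range else 0.0


# Toy problem (assumed): learn w so that fake_quant(w) * x matches target.
scale = 0.1
w = 0.337            # latent full-precision weight kept by the optimizer
x, target = 2.0, 0.8
lr = 0.05
for step in range(5):
    w_q = fake_quant(w, scale)             # the forward pass only ever uses w_q
    dloss_dwq = 2 * (w_q * x - target) * x  # d/dw_q of (w_q * x - target) ** 2
    w -= lr * fake_quant_grad(w, scale, dloss_dwq)  # STE update of latent w
print(f"latent w={w:.3f}, quantized w={fake_quant(w, scale):.1f}")
```

The optimizer updates the full-precision latent weight while the forward pass only uses its quantized value. Here, a single step moves `w` from 0.337 to about 0.377, which crosses the rounding boundary so that the quantized weight becomes 0.4 and the loss drops to zero.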