GPTQ
Generative Pre-trained Transformer Quantization
Definition
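GPTQ is a one-shot post-training quantization method for large language models (LLMs). It quantizes each layer's weights in turn and uses approximate second-order information (a Hessian of the layer's output error, estimated from a small calibration set) to adjust the weights that are not yet quantized, so that they compensate for the error each rounding step introduces. It produces 4-bit and 3-bit models with small accuracy loss relative to the full-precision baseline, which cuts memory use enough for many large models to run on a single GPU, including consumer cards for smaller model sizes.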
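The layer-by-layer error compensation described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the reference implementation: it handles a single weight matrix, processes columns in their natural order, and omits blocking, grouping, and activation reordering. The function names (`row_grid`, `quantize_rtn`, `gptq_quantize`), the per-row min/max grid, the damping value, and the synthetic calibration data are all assumptions made for the example.

```python
import numpy as np

def row_grid(W, bits):
    """Per-row asymmetric min/max grid (an illustrative choice)."""
    maxq = 2 ** bits - 1
    lo, hi = W.min(axis=1), W.max(axis=1)
    scale = (hi - lo) / maxq
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant rows
    zero = np.round(-lo / scale)
    return scale, zero, maxq

def quantize_rtn(w, scale, zero, maxq):
    """Round to the nearest grid level, then map back to floating point."""
    q = np.clip(np.round(w / scale) + zero, 0, maxq)
    return scale * (q - zero)

def gptq_quantize(W, X, bits=4, damp=0.01):
    """Quantize W (rows x cols) given calibration inputs X (cols x samples)."""
    W = W.astype(np.float64).copy()
    n_cols = W.shape[1]
    scale, zero, maxq = row_grid(W, bits)

    # Hessian of the layer-wise squared output error, damped for stability.
    H = 2.0 * X @ X.T
    H += damp * np.mean(np.diag(H)) * np.eye(n_cols)

    # Upper Cholesky factor U of the inverse Hessian (H^-1 = U^T U).
    U = np.linalg.cholesky(np.linalg.inv(H)).T

    Q = np.zeros_like(W)
    for i in range(n_cols):
        Q[:, i] = quantize_rtn(W[:, i], scale, zero, maxq)
        # Scale the rounding error, then push it onto the remaining columns.
        err = (W[:, i] - Q[:, i]) / U[i, i]
        W[:, i + 1:] -= np.outer(err, U[i, i + 1:])
    return Q

# Synthetic layer with correlated inputs (hypothetical data for illustration).
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 32))
X = rng.normal(size=(32, 8)) @ rng.normal(size=(8, 256))
X += 0.3 * rng.normal(size=(32, 256))

Q_gptq = gptq_quantize(W, X, bits=4)
scale, zero, maxq = row_grid(W, 4)
Q_rtn = quantize_rtn(W, scale[:, None], zero[:, None], maxq)

err_gptq = np.linalg.norm(W @ X - Q_gptq @ X)
err_rtn = np.linalg.norm(W @ X - Q_rtn @ X)
print(f"output error, GPTQ-style: {err_gptq:.3f}")
print(f"output error, round-to-nearest: {err_rtn:.3f}")
```

The comparison with plain round-to-nearest on the same grid shows the point of the method: both results use the same 16 levels per row, but the compensated version chooses levels that keep the layer's output closer to the original on the calibration inputs.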
GPTQ is widely applied through the AutoGPTQ library, which is used to quantize open-source models and to load the quantized weights for serving.