INT4

4-bit Integer Quantization

Definition

INT4 quantization represents model weights using 4-bit integers instead of 16- or 32-bit floats, reducing model size by roughly 4x relative to 16-bit weights and 8x relative to 32-bit weights. It can also increase inference speed on hardware and kernels that support low-bit weights. A 4-bit integer has only 16 distinct values, so INT4 models carry quantization error that can degrade accuracy, particularly on complex reasoning tasks. Modern methods like GPTQ and AWQ reduce this degradation.
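To make the idea concrete, here is a minimal sketch of symmetric round-to-nearest quantization with one scale per group of weights. It is only the simplest baseline, not GPTQ or AWQ, and the function names, the group size, and the sample weights are assumptions chosen for illustration.

```python
def quantize_int4(weights, group_size=4):
    """Round-to-nearest quantization to [-8, 7] with one scale per group (illustrative)."""
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7; avoid dividing by zero.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            codes.append(max(-8, min(7, round(w / scale))))
    return codes, scales


def dequantize_int4(codes, scales, group_size=4):
    """Recover approximate float weights from 4-bit codes and per-group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(codes)]


def pack_nibbles(codes):
    """Store two signed 4-bit codes per byte, low nibble first."""
    if len(codes) % 2:
        codes = codes + [0]  # pad to an even count
    return bytes(
        (codes[i] & 0xF) | ((codes[i + 1] & 0xF) << 4)
        for i in range(0, len(codes), 2)
    )


# Hypothetical sample weights, two groups of four.
weights = [0.12, -0.5, 0.33, 0.07, 1.5, -0.2, 0.0, 0.9]
codes, scales = quantize_int4(weights)
packed = pack_nibbles(codes)
restored = dequantize_int4(codes, scales)
print(codes)
print(len(packed), "bytes for", len(weights), "weights")
```

Each restored weight differs from the original by at most half of its group's scale, which is the quantization error the definition refers to. The per-group scales also take up space, so real INT4 formats come out slightly larger than a flat 4 bits per weight.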

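INT4 is commonly used to fit large models into limited GPU memory. At 4 bits per weight, a 70B-parameter model needs about 35 GB for its weights alone, which exceeds a single 24 GB consumer GPU but fits on one 48 GB card or across two 24 GB GPUs.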
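As a rough check on what fits where, the weight footprint can be estimated from the parameter count and the bits per weight. This is a back-of-the-envelope sketch: the helper name is hypothetical, and the estimate counts only the weights, ignoring quantization scales, activations, and the KV cache.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (10^9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9


for bits in (32, 16, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(70e9, bits):.0f} GB")
# 32-bit: 280 GB
# 16-bit: 140 GB
#  4-bit: 35 GB
```

Real memory use is higher than these numbers, so some headroom is needed beyond the weight footprint.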

Related Terms

INT8

8-bit Integer Quantization

Quantization

Model Quantization

GPTQ

Generative Pre-trained Transformer Quantization

AWQ

Activation-aware Weight Quantization
