Model Inference

Definition

Inference is the process of running a trained model to generate predictions or text from new inputs. Unlike training, inference does not update model weights.
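As a minimal sketch of that distinction, the toy linear model below runs a forward pass that reads its weights and never writes to them. The weight values and the names `WEIGHTS`, `BIAS`, and `predict` are hypothetical and chosen only for illustration; a real model would be loaded from a trained checkpoint.

```python
from typing import List

# Hypothetical "trained" parameters for a tiny linear model (illustrative values).
WEIGHTS: List[float] = [0.5, -1.0, 2.0]
BIAS: float = 0.25


def predict(features: List[float]) -> float:
    """Forward pass only: reads the weights and never modifies them."""
    if len(features) != len(WEIGHTS):
        raise ValueError("feature count must match weight count")
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


# Snapshot the weights, run inference on a new input, then compare.
weights_before = list(WEIGHTS)
prediction = predict([1.0, 2.0, 3.0])
weights_unchanged = WEIGHTS == weights_before

print(prediction)         # 4.75
print(weights_unchanged)  # True
```

A training step would add a loss computation and a weight update after the forward pass; inference stops at the prediction.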

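LLM inference is computationally intensive for two main reasons: the cost of self-attention grows quadratically with sequence length, and text is generated sequentially, one token at a time, with each new token depending on all the tokens before it. This drives demand for optimized inference engines, quantization, and hardware accelerators.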
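Both cost drivers can be illustrated with a short sketch. It assumes full (unmasked) self-attention, where every token is scored against every other token, and it uses a hypothetical `toy_next_token` function as a stand-in for a real model forward pass. The names here are illustrative and do not come from any particular library.

```python
from typing import Callable, List


def attention_score_count(seq_len: int) -> int:
    """Number of query-key pairs scored by full self-attention over seq_len tokens."""
    return seq_len * seq_len


def generate(
    prompt: List[int],
    next_token: Callable[[List[int]], int],
    max_new_tokens: int,
) -> List[int]:
    """Autoregressive decoding: each step needs every token produced so far."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # The next step cannot start until this token exists, so the loop
        # cannot be parallelized across output positions.
        tokens.append(next_token(tokens))
    return tokens


def toy_next_token(tokens: List[int]) -> int:
    """Hypothetical stand-in for a model: derives the next token id from the history."""
    return (sum(tokens) + 1) % 10


# Doubling the sequence length quadruples the attention score count.
print(attention_score_count(1024))  # 1048576
print(attention_score_count(2048))  # 4194304

print(generate([1, 2, 3], toy_next_token, 4))  # [1, 2, 3, 7, 4, 8, 6]
```

Inference engines commonly reduce the repeated work in the decoding loop by caching intermediate attention state between steps, but the step-by-step dependency between output tokens remains.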

Related Terms

vLLM Inference Engine
