TensorRT

Definition

TensorRT is NVIDIA's SDK for high-performance deep learning inference. It optimizes trained models for NVIDIA GPUs through layer fusion, reduced-precision execution (FP16, and INT8 with calibration), and kernel auto-tuning. TensorRT-LLM extends these optimizations to large language models with features such as in-flight batching and paged KV caching.

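Compared with standard PyTorch inference, TensorRT typically delivers higher throughput; the size of the gain depends on the model, the GPU, the batch size, and the precision used.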
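To make the idea of INT8 calibration concrete, here is a minimal pure-Python sketch. It is not TensorRT's API and not the algorithm TensorRT's calibrators use; it assumes the simplest possible scheme, symmetric max-abs scaling, where a small batch of sample activations determines one scale factor that maps floats to the range [-127, 127]. The function names and the sample values are hypothetical and chosen only for illustration.

```python
def calibrate_scale(samples):
    """Derive a symmetric INT8 scale from calibration data (max-abs scheme)."""
    max_abs = max(abs(x) for x in samples)
    # Map the largest observed magnitude to 127; use 1.0 if all samples are zero.
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Round each value to the nearest INT8 step and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map INT8 codes back to approximate float values."""
    return [q * scale for q in q_values]


# Hypothetical calibration batch standing in for recorded activations.
calibration_batch = [-2.54, -0.5, 0.0, 0.76, 1.0]

scale = calibrate_scale(calibration_batch)
q = quantize(calibration_batch, scale)
restored = dequantize(q, scale)

# Worst-case rounding error for in-range values is half a quantization step.
max_error = max(abs(a - b) for a, b in zip(calibration_batch, restored))
print(q, max_error)
```

Values outside the calibrated range are clamped, which is why the choice of calibration data matters: a range that is too narrow clips real activations, while one that is too wide wastes the 8 bits on values that rarely occur.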

