ai
TensorRT
TensorRT
Definition
TensorRT is NVIDIA's SDK for high-performance deep learning inference, optimizing models for NVIDIA GPUs through layer fusion, precision calibration (INT8/FP16), and kernel auto-tuning. TensorRT-LLM extends these optimizations specifically for large language models with features like in-flight batching and paged KV caching.
It enables significant throughput gains over standard PyTorch inference.
Ship secure code faster
Crash Override integrates security into the developer workflow. No context switching, no waiting on reviews.