Inference Optimization with NVIDIA TensorRT

11,526 views

NCSAatIllinois

2 years ago

Many applications of deep learning models benefit from reduced latency, i.e. the time taken for inference. This tutorial introduces NVIDIA TensorRT, an SDK for high-performance deep learning inference, and walks through all the steps needed to convert a trained deep learning model into an inference-optimized model on HAL.
Speakers: Nikil Ravi and Pranshu Chaturvedi, UIUC
Webinar Date: April 13, 2022
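
The usual conversion workflow covered in TensorRT tutorials of this kind is to export the trained model to ONNX and then build a TensorRT engine from it. Below is a minimal sketch of that workflow, not the webinar's actual code: the ResNet-50 stand-in model, file names, and FP16 flag are illustrative assumptions, and the engine-building calls use the TensorRT 8.x Python API.

# Minimal sketch: trained PyTorch model -> ONNX -> serialized TensorRT engine.
# Assumptions: torchvision ResNet-50 as a stand-in model, TensorRT 8.x Python API;
# file names and the FP16 flag are illustrative only.
import torch
import torchvision
import tensorrt as trt

# Step 1: export the trained model to ONNX, the most common TensorRT input format.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1").eval()
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "resnet50.onnx",
                  input_names=["input"], output_names=["output"],
                  opset_version=13)

# Step 2: parse the ONNX file and build a serialized TensorRT engine.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("resnet50.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parsing failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # enable FP16 kernels where supported

serialized_engine = builder.build_serialized_network(network, config)
with open("resnet50.engine", "wb") as f:
    f.write(serialized_engine)

The same engine can also be produced from the command line with trtexec (e.g. trtexec --onnx=resnet50.onnx --saveEngine=resnet50.engine --fp16), which is often the quickest way to try TensorRT on a cluster node.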

Related videos

Demo: Optimizing Gemma inference on NVIDIA GPUs with TensorRT-LLM · 12:21 · Google for Developers · 2.5K views
Profiling Deep Learning Applications with NVIDIA NSight · 53:01 · Argonne Leadership Computing Facility · 3.7K views
NVAITC Webinar: Deploying Models with TensorRT · 15:08 · NVIDIA Developer · 18K views
The moment we stopped understanding AI [AlexNet] · 17:38 · Welch Labs · 839K views
ONNX Tools: Polygraphy and ONNX-GraphSurgeon · 9:56 · ONNX · 2.1K views
Generative AI in a Nutshell - how to survive and thrive in the age of AI · 17:57
TensorRT-LLM: Quantization and Benchmarking · 19:24 · Long's Short-Term Memory · 594 views
How does TensorRT 8.2 work? · 11:43 · Daniel Persson · 2.1K views
High Performance Inferencing with TensorRT · 21:38 · InsideHPC Report · 7K views