HC2023-T1.1: ML Inference

2,237 views

hotchipsvideos

10 months ago

Tutorial 1, part 1, Hot Chips 2023, Sunday, August 27, 2023.
Organizer: Nathan Kalyanasundharam, CXL Board & AMD
This tutorial gives a brief introduction to the basic concepts underlying ML inference, then surveys several hot areas where current research is improving its performance and capabilities. After the introduction, the areas covered in this part of the tutorial are how quantization of weights and activations can be used to increase inference efficiency, and the techniques used to optimize inference on small mobile devices.
ML Inference Overview - Micah Villmow, NVIDIA
Quantization Methods for Efficient ML Inference - Amir Gholami, UC Berkeley
ML Inference at the Edge - Felix Baum, Qualcomm
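To make the quantization topic above concrete, here is a minimal sketch of symmetric per-tensor int8 quantization, the basic scheme most inference quantization methods build on. The function names and the NumPy-based implementation are illustrative assumptions, not code from the tutorial.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: map floats onto [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 1.27], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Rounding error per element is at most half a quantization step (scale / 2).
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6
```

Storing `q` instead of `w` cuts weight memory 4x versus float32, and integer matrix multiplies are typically faster on inference hardware; more advanced methods (per-channel scales, activation calibration) refine this same idea.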
