Accelerate Transformer inference on GPU with Optimum and Better Transformer

4,074 views

Julien Simon

A year ago

In this video, I show you how to accelerate Transformer inference with Optimum, an open-source library by Hugging Face, and Better Transformer, a PyTorch extension available since PyTorch 1.12.
Using an AWS instance equipped with an NVIDIA V100 GPU, I start from two models that I previously fine-tuned: a DistilBERT model for text classification and a Vision Transformer model for image classification. I first benchmark the original models, then use Optimum and Better Transformer to optimize them with a single line of code, and benchmark them again. This simple process delivers a 20-30% speedup with no accuracy drop!
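The before/after comparison in the video boils down to timing the same forward pass twice, once on the original model and once on the converted one. Here is a minimal sketch of such a measurement loop using only the standard library; the callable is a stand-in (in the video the real workload is the fine-tuned DistilBERT and ViT models running on the V100):

```python
import time

def benchmark(fn, warmup=10, iterations=100):
    """Return the mean latency of calling fn(), in milliseconds."""
    for _ in range(warmup):           # warm up caches (and CUDA kernels on GPU)
        fn()
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1000.0

# Stand-in for a model forward pass; on a real model this would be
# something like `lambda: model(**inputs)` with model and inputs on the GPU.
dummy_forward = lambda: sum(i * i for i in range(10_000))

latency_ms = benchmark(dummy_forward)
print(f"mean latency: {latency_ms:.2f} ms")
```

The one-line conversion itself is `model = BetterTransformer.transform(model)` from `optimum.bettertransformer`, applied between the two benchmark runs; note that on GPU you would also want to synchronize the device before reading the clock so queued kernels are included in the measurement.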
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos ⭐️⭐️⭐️
⭐️⭐️⭐️ Want to buy me a coffee? I can always use more :) www.buymeacoffee.com/julsimon ⭐️⭐️⭐️
- Optimum v1.5.0: github.com/huggingface/optimu...
- Optimum docs: huggingface.co/docs/optimum/o...
- Better Transformer blog post: pytorch.org/blog/a-better-tra...
- DistilBERT model: huggingface.co/juliensimon/di...
- Vision Transformer model: huggingface.co/juliensimon/au...
- Code: gitlab.com/juliensimon/huggin...

Comments: 5
@weebprogrammer2979 · a year ago
Can we even speed up the Whisper model?
@juliensimonfr · a year ago
It should help with all transformers, yes.
@theonerm2 · a year ago
Hey, I need some help. I like to mess around with these AI models on Hugging Face just for fun. I had an Nvidia GPU with 12 GB of VRAM and decided to go for an AMD 7900 XT, which has 20 GB of VRAM, to see if I could do a bit more with the models. Now I can't run PyTorch with the GPU. It appears I need Linux to do it, but I've also heard Linux doesn't run well on my GPU. I suppose I could try dual booting, but the problem there is that my PC is also a media server in my house, and I have programs I need that don't work on Linux. I'd really prefer one OS that does it all, but I can't seem to figure it out.
@setop123 · 10 months ago
I think that's a compatibility issue. AMD doesn't support CUDA very well (or even at all, I'm unsure), so I'd investigate that if you haven't figured it out already.
@ammarahmad6079 · 4 months ago
@setop123 Yeah, CUDA is for Nvidia only.