Accelerate Transformer inference on CPU with Optimum and ONNX

4,262 views

Julien Simon


1 day ago

In this video, I show you how to accelerate Transformer inference with Optimum, an open source library by Hugging Face, and ONNX.
I start from a DistilBERT model fine-tuned for text classification, export it to ONNX format, then optimize it, and finally quantize it. Running benchmarks on an AWS c6i instance (Intel Ice Lake architecture), we speed up the original model by more than 2.5x and halve its size, with just a few lines of simple Python code and without any accuracy drop!
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos ⭐️⭐️⭐️
⭐️⭐️⭐️ Want to buy me a coffee? I can always use more :) www.buymeacoffee.com/julsimon ⭐️⭐️⭐️
- Optimum: github.com/huggingface/optimum
- Optimum docs: huggingface.co/docs/optimum/o...
- ONNX: onnx.ai/
- Original model: huggingface.co/juliensimon/di...
- Code: gitlab.com/juliensimon/huggin...

Comments: 14
@anabildea9274 • 1 year ago
Thank you for sharing! Great content!
@geekyprogrammer4831 • 1 year ago
Thanks a lot for creating this video. I saved a month by watching this video!
@juliensimonfr • 1 year ago
Great to hear, thank you.
@youssefbenhachem993 • 1 year ago
To the point! Great explanation, thanks 😀
@juliensimonfr • 1 year ago
Glad it was helpful!
@Gerald-iz7mv • 1 month ago
How do you export to ONNX using CUDA? It seems Optimum doesn't support it; is there an alternative?
@juliensimonfr • 1 month ago
huggingface.co/docs/optimum/onnxruntime/usage_guides/gpu
@ahlamhusni6258 • 1 year ago
Are there any optimization methods applied to the word2vec 2.0 model? And can I apply these methods to word2vec 2.0?
@juliensimonfr • 1 year ago
Hi, Word2Vec isn't based on the Transformer architecture. You should take a look at Sentence Transformers; they're a good way to get started with Transformer embeddings: huggingface.co/blog/getting-started-with-embeddings
@ibrahimamin474 • 6 months ago
@@juliensimonfr I think he meant wav2vec 2.0
@TheBontenbal • 3 months ago
I am trying to follow along, but there have been many updates to the code, so unfortunately I'm hitting many errors.
@juliensimonfr • 3 months ago
Docs and examples here: huggingface.co/docs/optimum/onnxruntime/overview
@Gerald-xg3rq • 1 month ago
What's the difference between setfit.exporters.onnx and optimum.onnxruntime (e.g. optimizer = ORTModelForFeatureExtraction.from_pretrained(...), then optimizer.optimize()), etc.?
@juliensimonfr • 1 month ago
Probably the same :)