Deep dive: model merging

8,414 views

Julien Simon

Days ago

*** Part 2 is now available at • Deep dive: model mergi... : Model Breadcrumbs, Model Stock, DELLA
Model merging is an increasingly popular technique that makes it possible to add capabilities to transformer models, or remove them, without any additional training.
In this video, we first introduce what model merging is. Then, we discuss different merging algorithms implemented in the mergekit library (github.com/arcee-ai): model soups, SLERP, Task Arithmetic, TIES, DARE, and Franken-merging.
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos. Follow me on Medium at / julsimon or Substack at julsimon.substack.com. ⭐️⭐️⭐️
01:16 What is model merging?
07:10 Model soups
14:00 Spherical Linear Interpolation (SLERP)
20:35 Task Arithmetic
27:15 Trim, Extract Sign and Merge (TIES)
36:20 Drop and Rescale (DARE)
43:40 Franken-merging
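
The two simplest algorithms in the chapter list, model soups and SLERP, can be sketched in a few lines. This is a minimal illustration on flat lists of floats, not mergekit's implementation: the function names `uniform_soup` and `slerp` are hypothetical, real merges operate tensor by tensor, and the sketch assumes non-zero weight vectors of equal length.

```python
import math

def uniform_soup(models):
    """Average several weight vectors element-wise (a uniform 'model soup')."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]

def slerp(w0, w1, t, eps=1e-8):
    """Spherical linear interpolation between two non-zero weight vectors."""
    norm0 = math.sqrt(sum(x * x for x in w0))
    norm1 = math.sqrt(sum(x * x for x in w1))
    # Cosine of the angle between the vectors, clamped against rounding error.
    cos_omega = sum(a * b for a, b in zip(w0, w1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    if abs(math.sin(omega)) < eps:
        # Vectors are (anti)parallel: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(w0, w1)]
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(w0, w1)]

soup = uniform_soup([[1.0, 2.0], [3.0, 4.0]])
halfway = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

Unlike the plain average, SLERP follows the arc between the two vectors, so the halfway point between two orthogonal unit vectors keeps unit length instead of shrinking.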

Comments: 16
@kenchang3456 4 months ago
Thank you for this video. I gotta give this a try 🙂
@juliensimonfr 4 months ago
You're welcome, and yes, you should :)
@melikanobakhtian6018 9 days ago
That was great and it helped me so much! Is it possible to get the presentation slides?
@gnibu42 4 months ago
Super interesting Julien, thanks a lot for sharing
@juliensimonfr 4 months ago
Glad you enjoyed it
@SrikanthIyer 4 months ago
Thanks for the fantastic video. Loved how you simplified almost all the methods to merge the models!
@juliensimonfr 4 months ago
Glad it was helpful!
@subhamkundu5043 4 months ago
Hey @Julien, great video. I have a question about the scale factor in the TIES method: how do we determine it?
@juliensimonfr 4 months ago
Thank you. It's up to you, depending on how much you want to "influence" the base model. mergekit has a parameter called 'density': fraction of weights in differences from the base model to retain. Example at github.com/arcee-ai/mergekit/blob/edd3817e4a470c7a959ef4c505f52a650a46ff07/examples/ties.yml
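
To make the roles of the scale factor and 'density' concrete, here is a rough sketch of the TIES steps (trim, elect sign, merge) on flat lists of floats. It is an illustration under stated assumptions rather than mergekit's code: `trim`, `ties_merge`, and the `scale` argument are hypothetical names, and only `density` corresponds to the mergekit parameter mentioned in the reply above.

```python
def trim(task_vector, density):
    """Keep the `density` fraction of largest-magnitude entries, zero the rest."""
    k = int(round(density * len(task_vector)))
    order = sorted(range(len(task_vector)),
                   key=lambda i: abs(task_vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(task_vector)]

def ties_merge(base, finetuned_models, density, scale):
    """Merge fine-tuned weight vectors into `base` (hypothetical helper)."""
    # Task vectors: differences between each fine-tuned model and the base.
    task_vectors = [[f - b for f, b in zip(model, base)]
                    for model in finetuned_models]
    trimmed = [trim(tv, density) for tv in task_vectors]
    merged = []
    for i, b in enumerate(base):
        values = [tv[i] for tv in trimmed]
        # Elect the sign carrying the larger total magnitude.
        sign = 1.0 if sum(values) >= 0 else -1.0
        # Average only the values that agree with the elected sign.
        agreeing = [v for v in values if v * sign > 0]
        delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
        # The scale factor controls how far the base model is moved.
        merged.append(b + scale * delta)
    return merged

base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, -0.1, 0.5, 0.0]
model_b = [-0.2, 0.05, 0.7, 0.3]
merged = ties_merge(base, [model_a, model_b], density=0.5, scale=1.0)
```

In this sketch a lower `density` discards more of the small weight changes before merging, while a larger `scale` pushes the merged weights further from the base model.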
@uygarkurtai 3 months ago
Great video, thank you! What I didn't quite grasp is this: say I'm merging two models, one trained on math and the other on coding. Do we expect the merged model to perform at a high level on both tasks?
@juliensimonfr 3 months ago
Yes, that's the expectation :)
@abse-mj8pw 2 months ago
I can't help wondering whether there is an experiment that fully explores these techniques, for example by applying them to all kinds of models or by combining different methods?
@juliensimonfr 2 months ago
Check out arcee.ai, their platform is definitely going that way.
@abse-mj8pw 2 months ago
@juliensimonfr Thanks for your answer! I've found some interesting blogs about it!
@AbdennacerAyeb 4 months ago
This is a random comment to boost your channel. Thank you.
@juliensimonfr 4 months ago
LOL, thank you.