Deep dive: model merging

Views: 8,414


*** Part 2 is now available at • Deep dive: model mergi... : Model Breadcrumbs, Model Stock, DELLA
Model merging is an increasingly popular technique that makes it possible to add capabilities to transformer models, or remove them, without any additional training.
In this video, we first introduce what model merging is. Then, we discuss different merging algorithms implemented in the mergekit library (github.com/arcee-ai): model soups, SLERP, Task Arithmetic, TIES, DARE, and Franken-merging.
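To make the first two algorithms concrete, here is a minimal sketch of a uniform model soup and of SLERP on toy weight vectors. It assumes each model has been flattened into a plain Python list of floats; real merges operate on tensors, layer by layer. The function names `linear_average` and `slerp` are illustrative and are not mergekit's API.

```python
import math


def linear_average(weight_sets):
    """Uniform model soup: element-wise mean of several weight vectors."""
    n = len(weight_sets)
    return [sum(values) / n for values in zip(*weight_sets)]


def slerp(w0, w1, t, eps=1e-8):
    """Spherical linear interpolation between two weight vectors.

    t=0 returns w0, t=1 returns w1. Falls back to plain linear
    interpolation when the vectors are (nearly) colinear or one is zero.
    """
    norm0 = math.sqrt(sum(x * x for x in w0))
    norm1 = math.sqrt(sum(x * x for x in w1))
    if norm0 < eps or norm1 < eps:
        return [(1 - t) * a + t * b for a, b in zip(w0, w1)]

    # Cosine of the angle between the two vectors, clamped for acos.
    cos_omega = sum(a * b for a, b in zip(w0, w1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)

    if abs(sin_omega) < eps:
        return [(1 - t) * a + t * b for a, b in zip(w0, w1)]

    s0 = math.sin((1 - t) * omega) / sin_omega
    s1 = math.sin(t * omega) / sin_omega
    return [s0 * a + s1 * b for a, b in zip(w0, w1)]


# Toy usage: average two "models", then interpolate halfway along the arc.
soup = linear_average([[1.0, 2.0], [3.0, 4.0]])
halfway = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

Averaging works for any number of models, while SLERP as written here blends exactly two.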
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos. Follow me on Medium at / julsimon or Substack at julsimon.substack.com. ⭐️⭐️⭐️
01:16 What is model merging?
07:10 Model soups
14:00 Spherical Linear Interpolation (SLERP)
20:35 Task Arithmetic
27:15 Trim, Extract Sign and Merge (TIES)
36:20 Drop and Rescale (DARE)
43:40 Franken-merging
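
For the Drop and Rescale (DARE) chapter, the core operation can be sketched in a few lines: randomly zero out a fraction of a task vector (the difference between a fine-tuned model and its base), then rescale the surviving entries so the expected value is unchanged. The function name `dare`, the flat-list representation, and the 0.9 drop rate below are assumptions for illustration only.

```python
import random


def dare(task_vector, drop_rate, rng):
    """Drop each entry with probability drop_rate, rescale the rest.

    Survivors are multiplied by 1 / (1 - drop_rate), which keeps the
    expected value of every entry equal to its original value.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - drop_rate)
    return [
        0.0 if rng.random() < drop_rate else value * keep_scale
        for value in task_vector
    ]


# Toy usage: drop 90% of a hypothetical task vector, with a fixed seed.
rng = random.Random(0)
sparse_delta = dare([0.02, -0.01, 0.03, 0.005], drop_rate=0.9, rng=rng)
```

The sparsified task vector is then added back to the base model, or combined with other task vectors first.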

Comments: 16

@kenchang3456 4 months ago

Thank you for this video. I gotta give this a try 🙂

@juliensimonfr 4 months ago

You're welcome, and yes, you should :)

@melikanobakhtian6018 9 days ago

That was great and it helped me so much! Is it possible to get the presentation slides?

@gnibu42 4 months ago

Super interesting Julien, thanks a lot for sharing

@juliensimonfr 4 months ago

Glad you enjoyed it

@SrikanthIyer 4 months ago

Thanks for the fantastic video. Loved how you simplified almost all the methods to merge the models!

@juliensimonfr 4 months ago

Glad it was helpful!

@subhamkundu5043 4 months ago

Hey @Julien, great video. I have a question regarding the scale factor in the TIES method. How do we determine the scale factor?

@juliensimonfr 4 months ago

Thank you. It's up to you, depending on how much you want to "influence" the base model. mergekit has a parameter called 'density': the fraction of weights in differences from the base model to retain. Example at github.com/arcee-ai/mergekit/blob/edd3817e4a470c7a959ef4c505f52a650a46ff07/examples/ties.yml
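
As a rough illustration of where the two knobs in this exchange sit, the sketch below follows the three TIES steps named in the video (trim, elect sign, merge) on plain Python lists. Here `density` is the fraction of each task vector kept during trimming, and `scale` is the factor applied to the merged task vector before it is added back to the base model. The function names, the flat-list representation, and the default values are assumptions for illustration; this is not mergekit's implementation.

```python
def trim(task_vector, density):
    """Keep the `density` fraction of entries with the largest magnitude."""
    k = int(round(density * len(task_vector)))
    if k <= 0:
        return [0.0] * len(task_vector)
    by_magnitude = sorted(
        range(len(task_vector)),
        key=lambda i: abs(task_vector[i]),
        reverse=True,
    )
    keep = set(by_magnitude[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(task_vector)]


def ties_merge(base, finetuned_models, density=0.5, scale=1.0):
    """Toy TIES merge of several fine-tuned models sharing one base."""
    # Task vector = fine-tuned weights minus base weights.
    task_vectors = [
        [f - b for f, b in zip(model, base)] for model in finetuned_models
    ]
    trimmed = [trim(tv, density) for tv in task_vectors]

    merged = []
    for values in zip(*trimmed):
        # Elect the sign carrying the larger total magnitude.
        sign = 1.0 if sum(values) >= 0 else -1.0
        # Average only the entries that agree with the elected sign.
        agreeing = [v for v in values if v * sign > 0]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)

    # `scale` controls how strongly the merged delta moves the base model.
    return [b + scale * m for b, m in zip(base, merged)]


# Toy usage with hypothetical 4-parameter "models".
base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, -0.2, 0.5, 0.0]
model_b = [-0.4, 0.1, 0.7, 0.9]
merged = ties_merge(base, [model_a, model_b], density=0.75, scale=1.0)
```

A lower `density` discards more of each fine-tune's small changes, while a lower `scale` keeps the result closer to the base model.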

@uygarkurtai 3 months ago

Great video, thank you! What I didn't quite grasp is this: let's say I'm merging 2 models. One is trained on maths, the other is trained on coding. Do we expect the merged model to perform at a high level in both tasks?

@juliensimonfr 3 months ago

Yes, that's the expectation :)
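
A minimal sketch of that expectation, using Task Arithmetic from the video and assuming both specialists were fine-tuned from the same base model: each specialist contributes a task vector (its weights minus the base weights), and the merged model is the base plus a weighted sum of those vectors. The names `math_model` and `code_model`, the toy values, and the weights are hypothetical.

```python
def task_vector(finetuned, base):
    """Difference between fine-tuned and base weights."""
    return [f - b for f, b in zip(finetuned, base)]


def task_arithmetic(base, finetuned_models, weights):
    """Add a weighted sum of task vectors to the base model.

    A positive weight adds a capability; a negative weight subtracts it.
    """
    merged = list(base)
    for model, weight in zip(finetuned_models, weights):
        delta = task_vector(model, base)
        merged = [m + weight * d for m, d in zip(merged, delta)]
    return merged


# Toy usage: hypothetical 3-parameter models sharing one base.
base = [1.0, 1.0, 1.0]
math_model = [1.5, 1.0, 1.0]   # changed the first parameter
code_model = [1.0, 1.0, 0.0]   # changed the third parameter
merged = task_arithmetic(base, [math_model, code_model], [1.0, 1.0])
```

In this toy case the two fine-tunes touched different parameters, so both changes survive intact; when task vectors overlap and disagree, methods such as TIES and DARE are aimed at reducing that interference.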

@abse-mj8pw 2 months ago

I can't help wondering whether there is an experiment that explores these techniques thoroughly, for example by applying them to all kinds of models or by combining different methods.

@juliensimonfr 2 months ago

Check out arcee.ai, their platform is definitely going that way.

@abse-mj8pw 2 months ago

@juliensimonfr Thanks for your answer!! I've found some interesting blogs about it!