Model Distillation - How ChatGPT Cheaps Out Over Time

  41,477 views

bycloud

A day ago

To try everything Brilliant has to offer, free, for a full 30 days, visit brilliant.org/.... You’ll also get 20% off an annual premium subscription.
my newsletter:
mail.bycloud.ai
What is model distillation? Today I will explain the concept and how it is applied to make SoTA models cheaper for people to access. Models with names like Turbo, Fast, or Lightning have usually been produced with distillation. It can also be used in some other interesting ways, e.g. step distillation or hiding your model's structure.
Special thanks to Cross Product & Neggles for helping with this video
This video is supported by the kind Patrons & YouTube Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Robert Zawiasa, Owen Ingraham, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Penumbraa, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi, Hector, Drexon, Claxvii 177th, Inferencer, Michael Brenner, Akkusativ, Oleg Wock, FantomBloth, Thipok Tham, Clayton Ford, Theo, Handenon, Diego Silva, mayssam
[Discord] / discord
[Twitter] / bycloudai
[Patreon] / bycloud
[Music] Massobeats - daydream
[Profile & Banner Art] / pygm7
[Video Editor] ‪@Askejm‬

Comments: 104
@bycloudAI 1 month ago
To try everything Brilliant has to offer, free, for a full 30 days, visit brilliant.org/bycloud/. You’ll also get 20% off an annual premium subscription. I have a few more interesting ways people use model distillation, lmk if y'all would like to see more creative uses of this idea!
@parthmakode5255 1 month ago
hi, in this video could you please mention that it is not actually RL but RLHF, that distinction is actually important to not ignore. thanks
@karlkastor 1 month ago
Very good overview of distillation. One thing I felt was kinda missing here: why is distillation better than just training the student on the original dataset? It is because the teacher model gives better feedback via its outputs. For example, an LLM outputs a probability distribution over what the next token could be: "60% chance the next letter is an A, 20% chance it is a B". This makes it way easier for the student to learn than having to figure out these probabilities by itself from many examples where the next letter is either A or B.
@mirek190 1 month ago
Bigger models are better at finding patterns and correlations because their "brains" have more "neurons". Small models are too dumb to find correlations as well as big models ... but a teacher "big model" can explain those correlations to a small model in a way it can understand. A small model couldn't grasp a complex pattern by itself, but after a "big model" explains it, it can understand it anyway. Something like that
@Gebsfrom404 1 month ago
@@mirek190 Sounds like understand vs. memorize.
@weakmindedidiot 1 month ago
@@Gebsfrom404 It's less of memorizing and more of being told what is important to understand or learn. That's why you maintain huge amounts of the performance UP TO SOME LEVEL. At some point the compression is too lossy and the distilled model just kinda.. sucks.
@TragicGFuel 1 month ago
@@Gebsfrom404 more like, derive vs understand. You need more knowledge and intelligence to derive those patterns, but not necessarily the same complexity to understand them
@mirek190 1 month ago
@@TragicGFuel exactly
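The soft-label point at the top of this thread can be sketched in a few lines of Python. This is a toy illustration, not how any particular lab trains: the three-letter vocabulary, the student logits, and the choice to put the leftover 20% on a third letter "C" are all assumptions made up for the example, as are the helper names.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how far the student q is from the target p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def logit_gradient(student_probs, target_probs):
    """Gradient of the loss w.r.t. the student's logits (softmax + cross-entropy/KL)."""
    return [s - t for s, t in zip(student_probs, target_probs)]

# Toy vocabulary: ["A", "B", "C"]
teacher_probs = [0.6, 0.2, 0.2]   # 60% A, 20% B from the comment; 20% C is an assumption
hard_label = [1.0, 0.0, 0.0]      # what a single dataset example says: "the answer was A"
student_probs = softmax([1.0, 0.5, 0.0])  # hypothetical student output

soft_loss = kl_divergence(teacher_probs, student_probs)
hard_loss = kl_divergence(hard_label, student_probs)

# The hard label only pushes "more A, less of everything else".
# The teacher's distribution also tells the student how B and C should compare.
print("gradient from hard label:", logit_gradient(student_probs, hard_label))
print("gradient from teacher:   ", logit_gradient(student_probs, teacher_probs))
print("soft loss:", soft_loss, "hard loss:", hard_loss)
```

With the hard label, the student needs many sampled examples before the 60/20/20 split shows up in its averages; with the teacher's distribution, every single example carries that information directly.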
@leoj8633 1 month ago
drink a shot every time he says "Model"
@moxes8237 1 month ago
I’m drunk already
@Invuska 1 month ago
He says model 70 times according to the transcript. Nah, I'm good lmao 😂
@leoj8633 1 month ago
@@Invuska out of curiosity what are you using to check the transcript? and does it separate text by individual (like if the video had multiple people talking)
@q_cumber5936 1 month ago
meow purr
@dmitryr9613 1 month ago
@@leoj8633 there's a button for it in the video description
@setop123 1 month ago
I dare you to find a better channel about technical AI... There's none. Amazing job as always bycloud! 👏👏👏
@AfifFarhati 1 month ago
Subscribe to AI Explained and bycloud, and you're basically SET!
@colinharter4094 1 month ago
@@AfifFarhati came here to say AI Explained too.
@zyansheep 1 month ago
Teacher model: _generates your training data_
@ikki4885 1 month ago
Half of the training data... more like the answers. The questions that need to be fed to the model need humans to take care of data accuracy and governance. So in short: the questions or initial prompts need to be produced (carefully crafted) by humans, the answers are produced by the teacher model, and the student model is trained with RL to match the teacher model's answers
@fyruzone 1 month ago
4o gotta be distilled af
@andreilucasgoncalves1416 1 month ago
To be honest, to me it seems that ChatGPT is giving better answers and is faster. I don't pay for Plus, but I am enjoying the process
@aykutakguen3498 1 month ago
@@andreilucasgoncalves1416 I feel the opposite
@Steamrick 1 month ago
GPT 4o is distilled. GPT 4o mini is distilled AF
@w花b 1 month ago
Moonshine
@RedditGuy-22 1 month ago
Gpt4o is much dumber than Gpt4
@mariokotlar303 1 month ago
Thanks, great video! It would be interesting to hear details of how and why the distillation process affects finetuning, to better understand the efforts and troubles of the Stable Diffusion / Flux fine-tuning community.
@BadChess56 1 month ago
WE DISTILLING OUR MODELS WITH THIS ONE 🗣️🗣️🗣️👏👏👏🔥🔥🔥🔥🔥🔥
@nova8585 1 month ago
Once you make the video on quantization, could you give some thoughts on quantization vs model size performance? i.e. given a fixed amount of VRAM, should I go for a highly quantized 70B or a less quantized 8B, for example.
@fnorgen 1 month ago
In my experience there's little point in doing inference above 8-bit precision. However, I often find a highly quantized 4-bit model behaves very similarly to the same model in 8-bit with a high temperature. So if you value creativity over accuracy, go with a highly quantized model. By that logic, I think 4-bit might be a bit too sloppy if you exclusively value accuracy. I don't really know though. Just do some tests I guess. Maybe compare a 4-bit 70B model vs the corresponding 8-bit 35B model. Just please don't go running an 8B model at 16-bit precision unless you have no other viable options. It would just be a waste of resources.
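For the fixed-VRAM question in this thread, a rough back-of-the-envelope sketch in Python may help. It assumes weight memory is simply parameters × bits ÷ 8, and it ignores the KV cache, activations, and the per-block scale overhead that real quantization formats add, so actual usage will be somewhat higher. The function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

# The three setups mentioned in the thread.
configs = [
    ("70B @ 4-bit", 70e9, 4),
    ("35B @ 8-bit", 35e9, 8),
    ("8B @ 16-bit", 8e9, 16),
]

for name, params, bits in configs:
    print(f"{name}: ~{weight_memory_gb(params, bits):.0f} GB of weights")
# 70B @ 4-bit: ~35 GB of weights
# 35B @ 8-bit: ~35 GB of weights
# 8B @ 16-bit: ~16 GB of weights
```

Under this approximation, a 4-bit 70B model and an 8-bit 35B model cost the same memory, which is why that pair makes a fair comparison at a fixed budget.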
@paulopacitti 1 month ago
a video about quantization would be awesome!
@nTu4Ka 1 month ago
There is definitely quality degradation with model distillation. This is especially noticeable with diffusion models.
@peteintania 1 month ago
Thank you very much for this video! Just wondering, have you done the video about 1.58 bit already?
@4.0.4 1 month ago
Ok, then why the hell didn't Llama release something between 8B and 70B for 3.1? (at least Mistral graced us with 12B and 22B, those are fantastic btw).
@koisher-k 1 month ago
Qwen 2.5 14B and 32B were recently released and they are even better than Mistral 22B (according to both real human reviews and good benchmarks like MMLU-Pro)
@markmuller7962 1 month ago
They could make money from LLMs by making voice mode uncensored for 18+
@dakidokino 1 month ago
Can you make sure to update your MoE research? Seems promising. Maybe also AI hierarchy stuff?
@elliotcossins8417 1 month ago
Literally was just telling a friend that I have had worse responses out of 4o than I have ever had before, and he brought up that it could be distillation, then this video popped up on my feed haha
@AB-wf8ek 1 month ago
This one made me lol'd. Very informative. Thanks!
@Words-. 1 month ago
Nice video👍
@fhub29 1 month ago
Loved the vid, as always
@dhruv-v8w 1 month ago
Brilliant video.
@markanderson7236 1 month ago
When will ChatGPT start replacing platforms like Masterclass and Coursera? After all, when the ChatGPT promo briefly demonstrated its language teaching capabilities, stocks for companies like Babel and others took a hit.
@bycloudAI 1 month ago
it’s harder but not impossible. The main selling point for them is UX, chatgpt can’t accommodate those as well rn
@AWriterWandering 1 month ago
The stock market is driven by people with no technical experience following whatever is hyped the most right now. It’s a terrible gauge of how practical these technologies actually are.
@tensorkelechi 1 month ago
Just what I needed 🤝🏾🗿
@sentry3118 1 month ago
I was jumpscared at 2:58 by that minecraft damage noise. i thought i was getting hit in my game. I don't have minecraft open. I've barely played in years.
@CODE7X 1 month ago
I was right, ever since they brought the feb2022 update they started degrading the original chatgpt, i started protesting in their servers but i got banned
@hydrohasspoken6227 1 month ago
didn't understand anything. Could you please repeat or reupload?
@bycloudAI 1 month ago
I think if you press the "join" button beside "subscribe" you will instantly understand the video
@__nemesis__1571 1 month ago
Real 😭
@q_cumber5936 1 month ago
System prompt makes anthropic response output really SHIT
@DJTechnosapien 1 month ago
Human teacher model: hmm, this text is likely 89% generated with chatgpt. Student must be cheating. F-. Failed. Expelled. AI Teacher Model: hmm, this is 89% similar to the best ChatGPT outputs, respect. A+. Passed. You’ve now graduated lil student model 🤭🥰
@RancorSnp 1 month ago
I have for sure noticed that GPT4 is much better, hallucinates less and actually gets things done, pretty much since the day GPT4o was released, and I haven't seen anything to convince me otherwise to this day. At least the new "pause to think" model is actually able to give a response that isn't pulled out of its ass
@gabrielt8466 1 month ago
Oh how I miss release week chatgpt-4
@klinstone 1 month ago
As a non-native speaker of English, I am responsible for saying that I didn't understand anything.
@mrrespected5948 1 month ago
Nice
@naromsky 1 month ago
Yeah, you basically explained why your sponsor "Brilliant" isn't the best way to learn.
@jblbasstester6198 1 month ago
Woah that hits hard, wasn't the best choice for this video 😂
@balu3811 1 month ago
ChatGPT-4 is better than the newer models…
@pxrposewithnopurpose5801 1 month ago
it got so much worse
@akirachisaka9997 1 month ago
I wonder why GPT-4o seems to perform better than GPT-4, when you look at arena leaderboards and stuff. I thought in theory, 4o is just “4 Turbo Version 2”? Is it because 4o is being used more for ChatGPT, which results in more first hand training data, thus being able to improve faster?
@ChanhDucTuong 1 month ago
Can someone please teach me how to make videos like this? The sheer amount of memes and B-roll in his videos makes me think that he might spend 50% of his time finding memes for each video. Is there any way to automate this process of searching for memes and b-roll?
@6DAMMK9 1 month ago
community: What is distillation? Let's finetune Flux anyway. *No full "checkpoints" since then 🎉
@RedOneM 1 month ago
Who is going to open source 300k GPUs to train state of the art models? 😅 We should already be very glad with what we have.
@punk3900 1 month ago
now that we have o1 the previous 4o is crap
@erkinalp 1 month ago
o1 mini and 4o large are about the same size
@punk3900 1 month ago
@@erkinalp The compute for 4o is almost unbearable for coding now that o1 is available :D
@punk3900 1 month ago
@@erkinalp Where can i read more about this?
@guillermorobledo2842 1 month ago
So, a drip by drip dumb down? This is gonna be leading to funny stuff later down the line. I think they're doing this to get rid of the devs creating it in order to splurge the money elsewhere. Either that or they're really strapped for cash. Given enough distillations passed down like matryoshka, there will only be distortions. I never thought that the downfall of AI would be due to human greed🤣
@elijahtrenton8351 1 month ago
So AI goes to college.
@cdkw2 1 month ago
All this and in future people are gonna make CS:GO cat edits with this
@punk3900 1 month ago
too fast... please... distill your pace
@Xotchkass 1 month ago
There is "playback" button. Put it on 0.5x and stop whining
@ckq 1 month ago
We're getting scammed
GPT 3.5
GPT 4 (costs money)
GPT 4 Turbo (worse)
GPT 4o (worse)
GPT 4o mini (worse)
o1 (not necessarily worse, but even smaller) way too expensive
o1 preview (even smaller)
o1 mini (super small)
@ckq 1 month ago
Parameter count
3.5: 175B ($3/4 per M)
4: 1.7T ($30/60 per M)
4 Turbo: 500B? ($10/30 per M)
4o: 250B? ($5/15 per M)
4o mini: 8B? ($0.15/0.60 per M)
@ckq 1 month ago
We should've been on GPT 4.5 which had 17T and anticipating GPT 5 which has 175T parameters. Instead they're giving 8B parameter models
@erkinalp 1 month ago
@@ckq 4o is 880B, 4 turbo and 4o mini are 220B
@guillermorobledo2842 1 month ago
Matryoshka Dolls
@erkinalp 1 month ago
@@ckq 8B ones are what free tier users are routed through when they invoke the "GPTs" tools, not directly selectable through the menu.
@ismbks 1 month ago
you lost me halfway through the video
@AinHab 1 month ago
guys, I'm just starting out as an AI enthusiast making similar content, would appreciate your feedback
@OnTheThirdDay 1 month ago
"A disciple is not above his teacher, but everyone when he is fully trained will be like his teacher." - Jesus, Luke 6:40
@AB-wf8ek 1 month ago
I didn't realize JC was an LLM developer
@OnTheThirdDay 1 month ago
@@AB-wf8ek The student models not being fully trained is what provides an LLM that is not as educated as the teacher model. I don't think LLMs existed back then but I would need to check my compSci textbook to see when they were invented.
@cc1drt 1 month ago
gpt4-o1 preview is a massive upgrade bro is onto nothing 🔥🔥🔥
@snylekkie 1 month ago
What about just pruning and fine tuning? @bycloud
@HyperUpscale 1 month ago