Chat GPT Rewards Model Explained!

  18,161 views

CodeEmporium

Days ago

Comments: 43
@ryanhewitt9902 1 year ago
This was excellent. I've seen a lot of videos discussing that same infographic and thought this would be more of the same. Your explanation has the perfect level of detail.
@CodeEmporium 1 year ago
Thanks so much! I really appreciate it! There is definitely more of this to come :)
@josephpareti9156 1 year ago
YESSS
@DAsiaView_ 1 year ago
Really nice video! I really like that you break it down in more technical details rather than many other videos that give a high-level "how to use it". Look forward to your other videos in the series :)
@CodeEmporium 1 year ago
Thanks so much. There is more to come :)
@SIVAKUMARSivaprahasam 1 year ago
Amazing video like your other videos!! I recently started watching your videos and subscribed to your channel. Good content with great clarity!!
@sonicsharma2507 1 year ago
Please create a machine learning course or some end to end projects if you have time. Your way of teaching is phenomenal and would love to learn from you.
@CodeEmporium 1 year ago
Thanks so much! In time, I shall! But in the meantime, please stick around for more educational content :)
@williamrich3909 1 year ago
Thank you CodeEmporium, another excellent video!
@CodeEmporium 1 year ago
Thanks so much :$
@paull923 1 year ago
Again, great explanation, well done, thank you very much! I'm already excited about the upcoming videos
@CodeEmporium 1 year ago
Thanks so much, Paul. Next week I just have a few shorts, but I should be back with the other parts the following week.
@haskidev 1 year ago
Thank you for this insightful explanation. Looking forward to seeing more on ChatGPT :)
@CodeEmporium 1 year ago
Of course! Thanks so much for the compliments!
@kevon217 1 year ago
Super helpful. You’re a great teacher!
@CodeEmporium 1 year ago
You are very welcome! And thanks!
@josephpareti9156 1 year ago
Outstanding presentation, just what I need for what I am planning to do. (i) When will you publish the details on steps 1 and 3? (ii) Why, when training the reward model, should r1 always be better than r2? How do you set them? Shouldn't that be done by the NN instead?
@CodeEmporium 1 year ago
I will publish these videos in early January (they are the next set of videos after the holiday season). r1 should be greater than r2 for that specific loss. If we wanted it the other way around, we’d need to change the loss function. The rewards model is a “model”, so it has a training phase and an inference phase. Step 2 covers the training phase, hence the labelers are required. In step 3, we infer from the trained model to assess the quality of the response.
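For readers following this exchange, here is a minimal sketch of a pairwise ranking loss of the kind the reply seems to describe. It assumes the loss is the log-sigmoid form reported in the InstructGPT paper, -log(sigmoid(r1 - r2)), where r1 is the reward for the response the labeler preferred and r2 the reward for the other one. The function name `pairwise_ranking_loss` is hypothetical, and the rewards here are plain floats rather than network outputs.

```python
import math


def pairwise_ranking_loss(r_preferred: float, r_rejected: float) -> float:
    """Return -log(sigmoid(r_preferred - r_rejected)), computed stably.

    Assumption: this is the pairwise loss form from the InstructGPT paper.
    The loss is small when the preferred response scores higher, and it
    grows when the reward model ranks the pair the wrong way round.
    """
    diff = r_preferred - r_rejected
    # -log(sigmoid(d)) equals log(1 + exp(-d)); branch to avoid overflow.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# The labeler's ranking decides which reward goes in which slot;
# the network only produces the two scalar scores.
loss_correct_order = pairwise_ranking_loss(2.0, 0.0)
loss_wrong_order = pairwise_ranking_loss(0.0, 2.0)
```

Under this form, swapping which response counts as "preferred" only means swapping the two arguments, which is why the ordering comes from the labelers rather than from the network.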
@heeroyuy298 1 year ago
Great video. What's the reason that the rewards model is trained siamese style instead of just training one to predict the reward with a mean square error loss function?
@aryamohan4230 1 year ago
This is amazing, thank you! I had a question though: where do we use the Likert scale in training? From the paper, I understand that we just use the rankings to train the model.
@saramirabi1485 7 months ago
Oh this video really opens my mind... Thanks for your great explanation. :-)
@Ltsoftware3139 1 year ago
Great video! I have some questions that baffled me. 1) How does ChatGPT deal with specific words, like framework names? Does it also tokenize them? 2) Many videos compare ChatGPT with Google and call it the Google killer. Do we have any info on how much it costs to run ChatGPT vs. a search in Google?
@Ltsoftware3139 1 year ago
Also, I asked math questions, like what is 35643 + 12352, and it was able to answer correctly. Does it have an internal mechanism for constructing math expressions, or maybe for generating code that would give the answer when run?
@CodeEmporium 1 year ago
Great questions. In terms of tokenization of inputs, I think they are broken down into subword tokens with Byte Pair Encoding. This is just a hunch, but I think this is how the GPT models process inputs (hence assuming the same). About it being a “Google Killer”: yes, I have heard this, but I don’t believe it to be true. Google actually accesses the internet in real time. ChatGPT may look like it has legit answers, but that’s probably because it was trained not too long ago. ChatGPT’s objective is to answer questions with “safe and ideally factual responses”, but there is no mechanism to say the response is truly correct.
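To make the Byte Pair Encoding idea in this reply concrete, here is a toy sketch of a single BPE training step: count adjacent symbol pairs across a small corpus, then merge the most frequent pair into one new symbol. The corpus and the helper names `most_frequent_pair` and `merge_pair` are made up for illustration. Real GPT tokenizers work on bytes with a large pre-learned merge table, so this only shows the core mechanism, not their actual implementation.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


# Hypothetical toy corpus: each word is a tuple of characters with a count.
corpus = {("h", "u", "g"): 5, ("p", "u", "g"): 3, ("h", "u", "n"): 2}
best = most_frequent_pair(corpus)    # ("u", "g") occurs 8 times
corpus_after = merge_pair(corpus, best)
```

Repeating this step many times yields a vocabulary in which common words become single tokens, while rare words (such as unusual framework names) fall back to several smaller subword pieces.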
@ajaytaneja111 1 year ago
Hi Ajay, can you point me to the references for the explanation of each block? They aren't in the comments section. Great video! Also, please correct me if I'm wrong: in the first block (step 1), ChatGPT responds to the user's question. In step 2 (and step 3), it uses the response and, through the rewards model, improves further so that next time step 1 is answered more appropriately.
@CodeEmporium 1 year ago
Hey Ajay. They should be in the description box under the video under the heading “RESOURCES”. And yep I think you summed up the steps pretty well :)
@Abdullahkbc 5 months ago
I don't get how the token is selected in top-k sampling. Is it picked randomly from the top k?
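On the question above: in the usual formulation of top-k sampling, the choice is random but not uniform. The k highest-scoring tokens are kept, their probabilities are renormalised, and one token is drawn according to those probabilities. The sketch below illustrates that standard definition; it is not a claim about what ChatGPT runs internally, and the function name `top_k_sample` and the example logits are hypothetical.

```python
import math
import random


def top_k_sample(logits, k, rng=random):
    """Sample a token index from the k highest-scoring logits.

    The k candidates are weighted by softmax over their own logits,
    so likelier tokens are still picked more often than less likely ones.
    """
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to the top-k (subtract the max for stability).
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


# Hypothetical scores for a 4-token vocabulary.
example_logits = [2.0, 1.0, 0.5, -1.0]
token = top_k_sample(example_logits, k=2, rng=random.Random(0))
```

With k=1 this reduces to always taking the single most likely token (greedy decoding); larger k lets lower-ranked tokens through, in proportion to their renormalised probability.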
@creativeuser9086 1 year ago
In the rewards model, how is it the same as the fine tuned model?
@rm175 1 year ago
What sampling technique does Chat GPT-3 use then? Is it a combination of the ones you mentioned or just top-k?
@jonathanlatouche7013 1 year ago
13:25
@kaitoukid1088 1 year ago
Not ChatGPT related but what book did u use to learn linear algebra?
@CodeEmporium 1 year ago
Hmm. Bits and pieces in college and school I guess. It’s been a while since I just sat down and read a book like this. I’d need to look into it
@prashantlawhatre7007 1 year ago
You may also go through the playlist by 3Blue1Brown.
@etmasikewo 1 year ago
Your courses are great. Love how you explain it. I still don't always get the math but sometimes it is better to accept how things work before getting a deeper understanding. (Sometimes getting to know first principles is better too) I hope to see more of this from you as I'm sure it will only progress further in everyday adoption.
@CodeEmporium 1 year ago
Thanks so much for commenting! And yea. It’s hard to strike that right balance of engagement and details :) I am trying to explore this for every video. Will continue to make more !
@paimeg 1 year ago
You sir, are at the vanguard of protecting us against our ChatGPT overlords
@CodeEmporium 1 year ago
“You can’t beat ChatGPT, you can only hope to understand it” ~ Code Emporium, 2023 (lol)
@ajaytaneja111 1 year ago
Hi Ajay, does ChatGPT use inconsistent values of temperature sampling so that it generates human-like responses?
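As background for this question, here is a small sketch of what temperature does to a next-token distribution: the logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it. This is the textbook definition only. Whether ChatGPT varies its temperature is not stated anywhere in this thread, and the function name and logits below are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after scaling by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a 3-token vocabulary.
logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.5)  # favours the top token more
flat = softmax_with_temperature(logits, 2.0)   # spreads probability out
```

Sampling from the flatter distribution gives more varied output, and sampling from the sharper one gives more predictable output.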
@creativeuser9086 1 year ago
In fact, in the second column, the output of the labeling process will not be a scalar, it will be the ranking of the 4 different answers from best to worst, then that will feed into the reward model which is trained to maximize the difference between best and worst responses.
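One way to picture the point made in this comment is that a single ranking of K responses can be unpacked into K-choose-2 "preferred vs. rejected" pairs, each of which becomes one training comparison for the reward model. The sketch below assumes that pairwise reading, which is how the InstructGPT paper describes it. The function name `ranking_to_pairs` and the placeholder response labels are hypothetical.

```python
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs.

    `combinations` keeps the input order, so in every pair the first
    element is the one the labeler ranked higher.
    """
    return list(combinations(ranked_responses, 2))


# Placeholder labels for four responses, ranked best to worst by a labeler.
ranking = ["A", "B", "C", "D"]
pairs = ranking_to_pairs(ranking)  # 6 pairs for 4 responses
```

Each pair then supplies the two rewards for a pairwise loss, so the labeler provides only the ordering and the model learns scalar scores that are consistent with it.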
@prashantlawhatre7007 1 year ago
Great Work Ajay. 👏👏👏
@CodeEmporium 1 year ago
Thanks a lot :)
@davidporterrealestate 1 year ago
good explanation
@CodeEmporium 1 year ago
Thanks so much!