Chat GPT Rewards Model Explained!

18,161 views


Comments: 43

@ryanhewitt9902 a year ago

This was excellent. I've seen a lot of videos discussing that same infographic and thought this would be more of the same. Your explanation has the perfect level of detail.

@CodeEmporium a year ago

Thanks so much! I really appreciate it! There is definitely more of this to come :)

@josephpareti9156 a year ago

YESSS

@DAsiaView_ a year ago

Really nice video! I really like that you break it down in more technical detail, unlike many other videos that give a high-level "how to use it". I look forward to your other videos in the series :)

@CodeEmporium a year ago

Thanks so much. There is more to come :)

@SIVAKUMARSivaprahasam a year ago

Amazing video, like your other videos!! I recently started watching your videos and subscribed to your channel. Good content with great clarity!!

@sonicsharma2507 a year ago

Please create a machine learning course or some end-to-end projects if you have time. Your way of teaching is phenomenal, and I would love to learn from you.

@CodeEmporium a year ago

Thanks so much! In time, I shall! But in the meantime, please stick around for more educational content :)

@williamrich3909 a year ago

Thank you CodeEmporium, another excellent video!

@CodeEmporium a year ago

Thanks so much :$

@paull923 a year ago

Again, great explanation, well done, thank you very much! I'm already excited about the upcoming videos.

@CodeEmporium a year ago

Thanks so much, Paul. Next week I just have a few Shorts, but I should be back with the other parts the following week.

@haskidev a year ago

Thank you for this insightful explanation. Looking forward to seeing more on ChatGPT :)

@CodeEmporium a year ago

Of course! Thanks so much for the compliments!

@kevon217 a year ago

Super helpful. You’re a great teacher!

@CodeEmporium a year ago

You are very welcome! And thanks!

@josephpareti9156 a year ago

Outstanding presentation, just what I need for what I am planning to do. (i) When will you publish the details on steps 1 and 3? (ii) Why, when training the reward model, should r1 always be better than r2? How do you set them? Shouldn't that be done by the NN instead?

@CodeEmporium a year ago

I will publish these videos in early January (they are the next set of videos after the holiday season). r1 should be greater than r2 for that specific loss; if we wanted it the other way around, we'd need to change the loss function. The rewards model is a "model", so it has a training phase and an inference phase. This video (step 2) covered the training phase, which is why the labelers are required. In step 3, we infer from the trained model to assess the quality of a response.
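To make the r1 > r2 point concrete, here is a minimal sketch assuming the InstructGPT-style pairwise loss, -log(sigmoid(r1 - r2)), where r1 is the reward model's score for the response the labelers preferred and r2 is the score for the other one. The function name `pairwise_loss` and the scores are hypothetical, not taken from the video.

```python
import math


def pairwise_loss(r_preferred: float, r_rejected: float) -> float:
    """-log(sigmoid(r_preferred - r_rejected)), computed in a numerically stable way.

    Small when the preferred response already scores higher,
    large when the reward model ranks the pair the wrong way round.
    """
    diff = r_preferred - r_rejected
    # -log(sigmoid(d)) == log(1 + exp(-d)); branch so exp() never overflows
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical reward scores for two responses to the same prompt.
print(pairwise_loss(2.0, -1.0))  # scores in the labelers' order: small loss
print(pairwise_loss(-1.0, 2.0))  # scores in the wrong order: large loss
```

In this sketch the network only produces the two scores; the labelers' ranking decides which score goes in the first argument, and swapping that convention is what "changing the loss function" would amount to.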

@heeroyuy298 a year ago

Great video. What's the reason that the rewards model is trained Siamese-style instead of just training one to predict the reward with a mean squared error loss function?

@aryamohan4230 a year ago

This is amazing, thank you! I had a question, though: where do we use the Likert scale in training? From the paper, I understand that we just use the rankings to train the model.

@saramirabi1485 7 months ago

Oh this video really opens my mind... Thanks for your great explanation. :-)

@Ltsoftware3139 a year ago

Great video! I have some questions that baffled me. 1) How does ChatGPT deal with very specific words, like framework names? Does it also tokenize them? 2) Many videos compare ChatGPT with Google and call it the Google killer. Do we have any info about how much it costs to run ChatGPT versus a Google search?

@Ltsoftware3139 a year ago

Also, I asked math questions, like what is 35643 + 12352, and it was able to answer correctly. Does it have an internal mechanism for constructing math expressions, or maybe for generating code that would give the answer when run?

@CodeEmporium a year ago

Great questions. In terms of tokenization of inputs, I think they are broken down into subword tokens with Byte Pair Encoding. This is just a hunch, but I think this is how the GPT models process inputs (hence I'm assuming the same here). About it being a "Google killer": yes, I have heard this, but I don't believe it to be true. Google actually accesses the internet in real time. ChatGPT may look like it has legit answers, but that's probably because it was trained not too long ago. ChatGPT's objective is to answer questions with "safe and ideally factual responses", but there is no mechanism to verify that a response is truly correct.
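To illustrate the Byte Pair Encoding idea mentioned above, here is a toy sketch that learns merges from a tiny made-up corpus by repeatedly merging the most frequent adjacent symbol pair. It is not OpenAI's tokenizer (the GPT tokenizers work on bytes with a large learned merge table); the corpus and function names are hypothetical. The point is that a word never seen during training, like a new framework name, is still split into known subword pieces instead of being unknown.

```python
from collections import Counter


def _apply_merge(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` with the merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Toy BPE training: greedily merge the most frequent adjacent pair."""
    corpus = Counter(tuple(w) for w in words)  # each word starts as characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        merges.append(best)
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            new_corpus[_apply_merge(symbols, best)] += freq
        corpus = new_corpus
    return merges


def tokenize(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Apply the learned merges, in order, to a (possibly unseen) word."""
    symbols = tuple(word)
    for pair in merges:
        symbols = _apply_merge(symbols, pair)
    return list(symbols)


words = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
merges = learn_bpe(words, num_merges=4)
print(tokenize("lowest", merges))  # ['low', 'est']
```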

@ajaytaneja111 a year ago

Hi Ajay, can you point me to the references for the explanation of each block? They aren't in the comments section. Great video! Also, please correct me if I'm wrong: in the first block (step 1), ChatGPT responds to the user's question. In step 2 (and step 3), it uses the response and, through the rewards model, improves further so that next time step 1 is answered more appropriately.

@CodeEmporium a year ago

Hey Ajay. They should be in the description box under the video, under the heading "RESOURCES". And yep, I think you summed up the steps pretty well :)

@Abdullahkbc 5 months ago

I don't get how the token is selected in top-k sampling. Is it picked randomly from the top k?
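A sketch of standard top-k sampling, offered as a general illustration rather than a confirmed description of what ChatGPT uses: the token is not picked uniformly from the top k. The k most probable tokens are kept, their probabilities are renormalized to sum to 1, and the next token is drawn in proportion to those probabilities. The vocabulary, probabilities, and the helper name `top_k_sample` are hypothetical.

```python
import random


def top_k_sample(probs: dict[str, float], k: int, rng: random.Random) -> str:
    """Keep the k most probable tokens, renormalize, and sample one of them."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    tokens = [tok for tok, _ in top]
    weights = [p for _, p in top]
    total = sum(weights)
    # random.choices draws in proportion to the weights, so more likely tokens win more often
    return rng.choices(tokens, weights=[w / total for w in weights], k=1)[0]


next_token_probs = {"cat": 0.5, "dog": 0.3, "car": 0.15, "xylophone": 0.05}
rng = random.Random(0)
samples = [top_k_sample(next_token_probs, k=2, rng=rng) for _ in range(1000)]
# With k=2, "cat" has renormalized probability 0.5 / 0.8 = 0.625 and "dog" 0.375;
# "car" and "xylophone" can never be chosen.
print(samples.count("cat"), samples.count("dog"))
```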

@creativeuser9086 a year ago

In the rewards model, how is it the same as the fine-tuned model?

@rm175 a year ago

What sampling technique does Chat GPT-3 use, then? Is it a combination of the ones you mentioned, or just top-k?

@jonathanlatouche7013 a year ago

13:25

@kaitoukid1088 a year ago

Not ChatGPT-related, but what book did you use to learn linear algebra?

@CodeEmporium a year ago

Hmm. Bits and pieces in college and school, I guess. It's been a while since I sat down and read a book like that. I'd need to look into it.

@prashantlawhatre7007 a year ago

You may also go through the 3Blue1Brown playlist.

@etmasikewo a year ago

Your courses are great. I love how you explain things. I still don't always get the math, but sometimes it is better to accept how things work before getting a deeper understanding (sometimes getting to know first principles is better too). I hope to see more of this from you, as I'm sure it will only progress further in everyday adoption.

@CodeEmporium a year ago

Thanks so much for commenting! And yeah, it's hard to strike the right balance between engagement and detail :) I am trying to explore this in every video. I will continue to make more!

@paimeg a year ago

You sir, are at the vanguard of protecting us against our ChatGPT overlords

@CodeEmporium a year ago

“You can’t beat ChatGPT, you can only hope to understand it” ~ Code Emporium, 2023 (lol)

@ajaytaneja111 a year ago

Hi Ajay, does ChatGPT use varying temperature values during sampling so that it generates human-like responses?
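To make the temperature question concrete, a minimal sketch of temperature scaling: the logits are divided by a temperature T before the softmax. A lower T sharpens the distribution toward the most likely token, and a higher T flattens it, which makes less likely tokens appear more often. Whether ChatGPT varies T between responses is not answered in this thread; the logits and function name below are hypothetical.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Softmax of logits / temperature (temperature must be > 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

The resulting probabilities would then be sampled from, for example with the top-k procedure discussed earlier in the thread.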

@creativeuser9086 a year ago

In fact, in the second column, the output of the labeling process will not be a scalar; it will be a ranking of the 4 different answers from best to worst. That ranking then feeds into the reward model, which is trained to maximize the difference between the best and worst responses.
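Building on the ranking described in the comment above: in the InstructGPT paper, a ranking of K responses is expanded into all K-choose-2 (better, worse) pairs, and the pairwise loss -log(sigmoid(r_better - r_worse)) is averaged over them, so one ranking of 4 answers yields 6 comparisons. A minimal sketch, assuming the rewards have already been produced by some reward model; the reward values and the name `ranking_loss` are hypothetical.

```python
import math
from itertools import combinations


def ranking_loss(rewards_best_to_worst: list[float]) -> float:
    """Average -log(sigmoid(r_better - r_worse)) over every pair implied by a ranking."""
    # combinations() keeps input order, so the first item of each pair is the better-ranked one
    pairs = list(combinations(rewards_best_to_worst, 2))
    total = 0.0
    for r_better, r_worse in pairs:
        d = r_better - r_worse
        # stable form of -log(sigmoid(d))
        total += math.log1p(math.exp(-d)) if d >= 0 else -d + math.log1p(math.exp(d))
    return total / len(pairs)


rewards = [1.5, 0.7, 0.2, -0.9]  # hypothetical scores for answers ranked A > B > C > D
print(len(list(combinations(rewards, 2))))  # 6
print(ranking_loss(rewards))
```

In this formulation every better/worse pair contributes to the loss, including adjacent ones such as B over C, not only the best versus the worst answer.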